From fef01461f9fff391303251ff704dab7b25a2bbc6 Mon Sep 17 00:00:00 2001
From: Jade Lovelace
Date: Thu, 30 Mar 2023 19:07:19 -0700
Subject: [PATCH] finishish the speedy ifd post

---
 content/posts/nix-evaluation-blocking.md | 322 +++++++++++++++++++++++
 content/posts/speedy-ifd.md              | 245 -----------------
 2 files changed, 322 insertions(+), 245 deletions(-)
 create mode 100644 content/posts/nix-evaluation-blocking.md
 delete mode 100644 content/posts/speedy-ifd.md

diff --git a/content/posts/nix-evaluation-blocking.md b/content/posts/nix-evaluation-blocking.md
new file mode 100644
index 0000000..de4301e
--- /dev/null
+++ b/content/posts/nix-evaluation-blocking.md
@@ -0,0 +1,322 @@
++++
+date = "2023-03-30"
+draft = true
+path = "/blog/nix-evaluation-blocking"
+tags = ["haskell", "nix"]
+title = "Stopping evaluation from blocking in Nix"
++++
+
+Nix has a feature called "import from derivation", which is sometimes called
+"such a nice foot gun" (grahamc, 2022). I can't argue with its usefulness; it
+lets Nix do amazing things that can't be accomplished any other way, and avoid
+pointlessly checking generated files into git. However, it has a dirty secret:
+it can *atrociously* slow down your builds.
+
+The essence of this feature is that Nix can perform operations such as
+building derivations whose results are used in the *evaluation stage*, before
+further evaluation takes place.
+
+## Nix build staging
+
+It's a reasonable mental model to consider that Nix, in its current
+implementation, can do one of two things at a given time:
+
+* Evaluate: run Nix expressions to create some derivations to build. This
+  stage outputs `.drv` files, which can then be realised (built).
+* Build: given some `.drv` files, fetch the result from a binary cache or
+  build from scratch.
+
+Nix evaluation happens in a lazy order, with neither concurrency nor
+parallelism. When Nix is working on evaluating something that is blocked on a
+build, it cannot go evaluate something else.
+
+There are efforts [such as tvix][tvix] to rewrite the Nix evaluator so it can
+simultaneously evaluate and build, but they are not shipped yet.
+
+[tvix]: https://code.tvl.fyi/about/tvix
+
+### How does import-from-derivation fit in?
+
+Import-from-derivation (IFD for short) lets you do magical things: since Nix
+builds can do arbitrary computation in any language, Nix expressions or other
+data can be generated by external programs that need to do pesky things like
+parsing cursed file formats such as cabal files.
+
+Features like IFD are useful in build systems because they allow build systems
+to be *Monadic*, in the words of the [Hadrian paper]: build things to decide
+what to build. By comparison, an *Applicative* build system such as Bazel can
+only take in things that are known statically; in Bazel it is thus common to
+check in generated build instructions. This distinction is reflected in the
+type signatures of the Monad and Applicative operations:
+
+* `apply :: Applicative f => f (a -> b) -> f a -> f b`<br>
+  which means that a computation `a -> b` inside the box can be applied to a
+  value inside the box, but the *shape* of the overall computation cannot
+  depend on that value.
+* `bind :: Monad m => m a -> (a -> m b) -> m b`<br>
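+
+As a minimal, self-contained sketch of IFD (assuming `<nixpkgs>` is available
+on the Nix search path), evaluating the expression below forces the
+`generated` derivation to be built first:
+
+```
+let
+  pkgs = import <nixpkgs> { };
+
+  # A build step whose output is Nix source code.
+  generated = pkgs.runCommand "generated-expr" { } ''
+    echo '{ greeting = "hello"; }' > $out
+  '';
+in
+# `import` of a derivation output: evaluation stops here until the
+# `generated` build finishes, then resumes with its result.
+(import generated).greeting
+```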
+  which means that if you have a value of type `a` in an `m` box, you can *get
+  the `a` out of the box* and create a further computation in a box depending
+  on its actual value.
+
+  Import-from-derivation is `bind` for Nix builds: imagine that we wrote a
+  derivation that generated the Nix language source for the `glibc` package
+  definition, and then wrote `import (make-glibc-definition someArgs)` in the
+  dependencies of `hello`. To *evaluate* `hello`, we need to *build*
+  `glibc-definition` to take it out of the box and put it into the `hello`
+  definition, which means that IFD is equivalent to `bind`.
+
+[Hadrian paper]: https://dl.acm.org/doi/10.1145/3241625.2976011
+
+In order to achieve IFD, the evaluation stage can demand builds be run. This
+would, by nature, block evaluating things that directly depend on building
+something, but since Nix's evaluator can only work on one thing at a time, it
+blocks everything else too.
+
+N.B. I've heard that someone wrote a PureScript compiler targeting the Nix
+language, which was then used to [parse a Cabal file to do cabal2nix's job][evil-cabal2nix]
+entirely within Nix without IFD. Nothing is sacred.
+
+[evil-cabal2nix]: https://github.com/cdepillabout/cabal2nixWithoutIFD
+
+### What blocks evaluation?
+
+Doing anything with a derivation that depends on the contents of its output
+blocks evaluation, for example:
+
+* `import someDrv`
+* `builtins.readFile "${someDrv}/blah"`
+
+Any use of the builtin fetchers also blocks evaluation, which is why they are
+banned in nixpkgs:
+
+* `builtins.fetchGit`
+* `builtins.fetchTree`
+* `builtins.fetchTarball`
+* `builtins.fetchurl`
+* etc.
+
+#### Builtin fetchers
+
+Use of builtin fetchers is a surprisingly common cause of blocking in
+evaluation. Sometimes it is done by mistake, but other times it is done for
+good reason, with unfortunate tradeoffs. I think it's reasonable to use a
+builtin fetcher plus IFD to import libraries such as nixpkgs, since
+fundamentally the thing needs to be fetched for evaluation to proceed, but
+other cases are more questionable.
+
+One reason one might use the builtin fetchers is that there is no way to use a
+fixed-output derivation such as is produced by `nixpkgs.fetchurl` to download
+a URL without knowing the hash ahead of time.
+
+An example I've seen of this being done on purpose is wanting to avoid
+requiring contributors to have Nix installed to update the hash of some
+tarball, since Nix has its own bespoke algorithm for hashing tarball contents
+that nobody has yet implemented outside Nix. So the maintainers used an impure
+network fetch (only feasible with a builtin), at the cost of slow Nix build
+times. I wound up writing [gridlock] to serve the use case of maintaining a
+lockfile of Nix-compatible hashes without needing Nix, allowing the use of
+fetchers based on fixed-output derivations.
+
+[gridlock]: https://github.com/lf-/gridlock
+
+The reason that impure fetching needs to be a builtin is that Nix has an
+important purity rule for derivations: either the input is fixed and network
+access is disallowed, or the output is fixed and network access is allowed. In
+the Nix model as designed ([content-addressed store] aside), derivations are
+identified only by what goes into them, not by their output.
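+
+To illustrate the two sides of that rule, here is a minimal sketch; the URL
+is made up and the hash is just a placeholder value:
+
+```
+let
+  pkgs = import <nixpkgs> { };
+in {
+  # Fixed output: the hash is declared up front, so the builder is allowed
+  # network access, and the store path is known before downloading anything.
+  pure = pkgs.fetchurl {
+    url = "https://example.com/foo-1.0.tar.gz";
+    hash = "sha256-47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=";
+  };
+
+  # Builtin fetcher: no hash given, so Nix must download the tarball *during
+  # evaluation* to learn its store path, stalling all other evaluation.
+  impure = builtins.fetchTarball "https://example.com/foo-1.0.tar.gz";
+}
+```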
+
+Let's see why that rule is needed. Assume that network access is allowed in
+normal builders. If this were the case, builds could have arbitrarily
+different inputs without changing the store path. Such an impurity would force
+Nix to either rebuild the derivation every time, since its actual inputs could
+arbitrarily change at any time ([implemented as impure derivations!][impure-drvs]),
+or follow the popular strategy of assuming that it didn't change, making a
+"clean build" necessary to have any confidence in the build output.
+
+[impure-drvs]: https://nixos.org/manual/nix/stable/release-notes/rl-2.8.html
+
+If one is doing a network fetch without knowing the hash, one needs to first
+download the file to know its eventual store path. Therefore the fetcher *has*
+to serialize any evaluation depending on the store path while waiting for the
+download. Nix doesn't know how to resume evaluating something else in the
+meantime, so it serializes all evaluation.
+
+That said, it is, in my opinion, a significant design flaw in the Nix
+evaluator that it cannot queue up all the reachable derivations, rather than
+stopping and building each one in order.
+
+[content-addressed store]: https://github.com/NixOS/rfcs/blob/master/rfcs/0062-content-addressed-paths.md
+
+## Story: `some-cabal-hashes`
+
+I worked at a Haskell shop which made extensive use of Nix. One day I pulled
+the latest sources and yet again ran into the old bug in their Nix setup where
+Nix would go and serially build derivations called
+"`all-cabal-hashes-component-*`". For several minutes, before doing anything
+useful. I decided enough was enough and went bug hunting.
+
+I fixed it in a couple of afternoons by refactoring the use of
+import-from-derivation to result in fewer switches between building and
+evaluating, which I will expand on in a bit.
+
+The result of this work is [available on GitHub][nix-lib].
+
+[nix-lib]: https://github.com/lf-/nix-lib
+
+### Background on nixpkgs Haskell
+
+The nixpkgs Haskell infrastructure provides a [Stackage]-based package set,
+built from some Stackage Long-Term Support release and comprising package
+versions that are all known to work together. The set is generated via a
+program called [`hackage2nix`][hackage2nix], which runs `cabal2nix` against
+the entirety of Hackage. What one gets in this package set is a snapshot of
+Hackage as of when the nixpkgs maintainers committed it, with both the latest
+version and the Stackage stable version of each package available (if that's
+different from the latest).
+
+`cabal2nix` is a program that generates metadata, input hashes, and wires
+together a set of Nix derivations for Haskell packages automatically, using
+Nix instead of Cabal to orchestrate builds of packages' dependencies.
+
+Often the package set is for an older compiler, and while there *are* some
+per-compiler-version overrides in the nixpkgs Haskell distribution, these may
+not switch the version of a package you need, or the package set just plain
+has too old a version.
+
+In that case, you can use [overlays], which can apply patches, override
+sources, introduce new packages, and make basically any other arbitrary
+modification to packages.
+
+Each package build will be provisioned with a GHC package database containing
+everything in its build inputs, using Nix exclusively for dependency version
+resolution and keeping Cabal out of it.
+
+Since Nix orchestrates the build of the package dependency graph, caching of
+dependencies for development shells, parallelism across package builds, remote
+builds, and other useful properties simply fall out for free.
+
+[Stackage]: https://www.stackage.org/
+[hackage2nix]: https://github.com/NixOS/cabal2nix/tree/master/cabal2nix/hackage2nix
+[overlays]: https://nixos.org/manual/nixpkgs/stable/#chap-overlays
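+
+To make the overlay story concrete, here is a sketch of what one might look
+like; the package names and the patch file are invented:
+
+```
+final: prev: {
+  haskellPackages = prev.haskellPackages.override {
+    overrides = hfinal: hprev: {
+      # Apply a local patch on top of the snapshot version of a package.
+      some-library =
+        prev.haskell.lib.appendPatch hprev.some-library ./fix-build.patch;
+      # Ignore another package's over-restrictive version bounds.
+      other-library = prev.haskell.lib.doJailbreak hprev.other-library;
+    };
+  };
+}
+```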
+
+### Where's the IFD?
+
+nixpkgs Haskell provides a wonderfully useful function called `callCabal2nix`,
+which executes `cabal2nix` to generate the Nix expression for some Haskell
+source code at Nix evaluation time. Uh oh.
+
+It also provides another wonderfully useful function called `callHackage`.
+This is a very sweet function: it takes a package name and a version and gives
+you a Nix derivation for that package which knows how to download it from
+Hackage.
+
+Wait, how does that work, since you can't just download stuff for fun without
+knowing its hash? Well, there's your problem.
+
+"Figuring out hashes of stuff on Hackage" was solved by someone publishing a
+comically large GitHub repo called `all-cabal-hashes` with hashes of all of
+the tarballs on Hackage, and CI to keep it up to date. Using this repo, you
+only have to deal with keeping one hash up to date: the hash of the version of
+`all-cabal-hashes` you're using, and the rest are just fetched from there.
+
+#### Oh no
+
+This repository has an obscene number of files in it, such that it takes
+dozens of seconds to unpack it. So it's simply not unpacked. Fetching a file
+out of it involves invoking tar to selectively extract the relevant file from
+the giant tarball of this repo.
+
+That, in turn, takes around 7 seconds on a very fast computer, for each and
+every package, in serial. Also, Nix checks the binary caches for each and
+every one, further compounding the oopsie.
+
+In my fixed version, it takes about 7 seconds *total* to gather all the
+hashes, and I think this is instructive in how a Nix expression can be
+structured to behave much better.
+
+## Making IFD go fast
+
+Nix is great at building big graphs of dependencies in parallel and caching
+them. So what if we ask Nix to do that?
+
+How can this be achieved? What if you only demand one big derivation be built
+with IFD, then reuse it across all the usage sites? You then only block
+evaluation once and can appropriately use parallelism.
+
+### Details of `some-cabal-hashes`
+
+My observation was that hot-cache builds with a bunch of IFD are fine; it's
+refilling the cache that's horribly painful, since Nix spends a lot of time
+doing pointless things in serial. What if we warmed up the cache by asking it
+to build all that stuff in one shot? Then, the rest of the IFD would hit a hot
+cache.
+
+The fix, `some-cabal-hashes` and `some-cabal2nix`, is that Nix builds *one*
+derivation with IFD that contains everything that will be needed for further
+evaluation; all the following imports then complete immediately.
+
+Specifically, my build dependency graph now looks like:
+
+```
+                /-> cabal2nix-pkg1 -\
+some-cabal2nix -+-> cabal2nix-pkg2 -+-> some-cabal-hashes -> all-cabal-hashes
+                \-> cabal2nix-pkg3 -/
+```
+
+There are two notable things about this graph:
+
+1. It is (approximately) the natural graph of the dependencies of the build,
+   assuming that the Nix evaluator could keep going when it encounters IFD.
+
+2. It allows Nix to parallelize all the `cabal2nix-*` derivations.
+
+Then, all of the usage sites are `import "${some-cabal2nix}/pkg1"` or similar.
+In this way, one derivation is built, letting Nix do what it's good at.
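+
+As a rough sketch of the shape (this is not the real implementation; the
+`expressionFor` helper and the `./vendor` paths are made up):
+
+```
+let
+  pkgs = import <nixpkgs> { };
+
+  # One derivation per package, running cabal2nix on an already-fetched
+  # source tree; these are independent, so Nix can build them in parallel.
+  expressionFor = name: src:
+    pkgs.runCommand "cabal2nix-${name}" { } ''
+      ${pkgs.cabal2nix}/bin/cabal2nix ${src} > $out
+    '';
+
+  # The single aggregate derivation that evaluation blocks on.
+  some-cabal2nix = pkgs.linkFarm "some-cabal2nix" [
+    { name = "pkg1"; path = expressionFor "pkg1" ./vendor/pkg1; }
+    { name = "pkg2"; path = expressionFor "pkg2" ./vendor/pkg2; }
+  ];
+in {
+  # Each usage site is still IFD, but every one blocks on the same single
+  # derivation; each import evaluates to a function for
+  # `haskellPackages.callPackage`.
+  pkg1 = import "${some-cabal2nix}/pkg1";
+  pkg2 = import "${some-cabal2nix}/pkg2";
+}
+```
+
+(`linkFarm` symlinks its entries rather than copying them, which matters
+below.)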
+
+Another trick is that I made `some-cabal2nix` have no runtime dependencies by
+*copying* all the resulting cabal files into the output directory instead of
+symlinking them. Thus, the whole thing can be fetched from a cache server and
+not built at all.
+
+Figuring out what will be required is the other piece of the puzzle. Initially,
+I wrote a drop-in solution: extract the information from the overlays that were
+actually using `callHackage`, without invoking `callHackage` and causing an
+IFD! I did this by making a fake version of the package set with stubs for
+`callHackage`, `callCabal2nix` and such, then invoking the overlay with this
+fake package set as the value of `super`. Once the overlays were evaluated
+with these stubs, I extracted the versions requested by the overlay, and used
+those to create the *real* `callHackage`.
+
+This kind of trick is also used by maintainer scripts to extract metadata from
+the overlays that nixpkgs Haskell uses for compiler-specific overrides.
+
+However, nixpkgs has something nicer for overriding versions of Hackage
+libraries, which the published version of the overlay imitates instead:
+`haskell.lib.packageSourceOverrides`. It constructs an overlay that overrides
+package versions from a nice clean input of what's required.
+
+The last and very important optimization I did was to fix the `tar`
+invocation, which was quite slow. `tar` files have a linear structure that is
+perfect for making `t`ape `ar`chives (hence the name of the tool) which can be
+streamed to a tape: one file after the other, without any index. Thus, finding
+a file in the tarball takes time proportional to the number of files in the
+archive.
+
+If you invoke `tar` once per file you need, it does
+`files-in-archive * invocations` work. However, if you call `tar` *once* with
+the whole set of files that you want, such that it can do one pass through the
+archive, then the overall execution time is just proportional to the number of
+files in the archive. I can assume that is what `tar` actually does, since it
+does the entire extraction in basically the same time with a long file list as
+with one file.
+
+I also found that if you use the `--wildcards` option, `tar` is extremely
+slow, and it seems worse with more files to extract, so it's better to use
+exact file paths. If you need to get those for a tarball from GitHub, there's
+a pesky top-level directory that everything is below, which needs to be
+included in the file paths; it can be found by using `tar tf` and grabbing the
+first line.
+
+At extraction time, the extra directory name can be removed with
+`--strip-components=1`, and `--one-top-level` can be used to dump everything
+below the desired directory.
+
+The source code for `some-cabal-hashes` is available here:
+https://github.com/lf-/nix-lib/blob/main/lib/some-cabal-hashes.nix
diff --git a/content/posts/speedy-ifd.md b/content/posts/speedy-ifd.md
deleted file mode 100644
index 65d4bde..0000000
--- a/content/posts/speedy-ifd.md
+++ /dev/null
@@ -1,245 +0,0 @@
-+++
-date = "2022-10-18"
-draft = true
-path = "/blog/speedy-ifd"
-tags = ["haskell", "nix"]
-title = "Speedy import-from-derivation in Nix?"
-+++
-
-Nix has a feature called "import from derivation", which is sometimes called
-"such a nice foot gun" (grahamc, 2022). I can't argue with its
-usefulness; it lets Nix do amazing things that can't be accomplished any other
-way, and avoid pointlessly checking build products into git.
However, it has a -dirty secret: it can *atrociously* slow down your builds. - -The essence of this feature is that Nix can do operations such as build -derivations whose result is used in the *evaluation stage*. - -## Nix build staging? - -Nix, in its current implementation (there are efforts [such as tvix][tvix] to -change this), can do one of two things at a given time. - -* Evaluate: run Nix expressions to create some derivations to build. This stage - outputs `.drv` files, which can then be instantiated. Nix evaluation happens - in serial (single threaded), and in a lazy fashion. -* Build: given some `.drv` files, fetch the result from a binary cache or build - from scratch. - -[tvix]: https://code.tvl.fyi/about/tvix - -### How does import-from-derivation fit in? - -Import-from-derivation (IFD for short) lets you do magical things: since Nix -derivations can do arbitrary computation in any language, Nix expressions or -other data can be generated by external programs that need to do pesky things -such as parse cursed file formats such as cabal files. - -N.B. I've heard that someone wrote a PureScript compiler targeting the Nix -language, which was then targeted at [parsing a Cabal file to do cabal2nix's job][evil-cabal2nix] -entirely within Nix. Nothing is sacred. - -[evil-cabal2nix]: https://github.com/cdepillabout/cabal2nixWithoutIFD - -In order to achieve this, however, the evaluation stage can demand builds be -run. In fact, such builds need to be run before proceeding with evaluation! So -IFD serializes builds. - -### What constitutes IFD? - -The following is a nonexhaustive list of things constituting IFD: -* `builtins.readFile someDerivation` -* `import someDerivation` -* *Any use* of builtin fetchers: - * `builtins.fetchGit` - * `builtins.fetchTree` - * `builtins.fetchTarball` - * `builtins.fetchurl` - * etc - -#### Builtin fetchers - -Use of builtin fetchers is a surprisingly common IFD problem. Sometimes it is -done by mistake, but other times it is done for good reason, with unfortunate -tradeoffs. I think it's reasonable to use IFD to import libraries such as -nixpkgs, since fundamentally the thing needs to be fetched for evaluation to -proceed, but other cases are more dubious. - -One reason one might use the builtin fetchers is that there is no way -(excepting calculated use of impure builders) to use a derivation to download a -URL without knowing the hash ahead of time. - -An example I've seen of this being done on purpose is wanting to avoid -requiring contributors to have Nix installed to update the hash of some -tarball, since Nix has its own bespoke algorithm for hashing tarball contents -that nobody has yet implemented outside Nix. So the maintainers used an impure -network fetch (only feasible with a builtin) and instituted a curse on the -build times of Nix users. - -The reason that impure fetching needs to be a builtin is because Nix has an -important purity rule for derivations: either input is fixed and network access -is disallowed, or output is fixed and network access is allowed. In the Nix -model as designed ([content-addressed store] aside), derivations are identified -only by what goes into them, but not the output. - -Let's see why that is. Assume that network access is allowed in normal -builders. If the URL but no hash goes in *and* network access is available, -anything can come out without changing the store path. 
Such an impurity would -completely break the fantastic property that Nix has no such thing as a clean -build since the builds don't get dirtied to begin with. - -Thus, if one is doing an impure network fetch, the resulting store path has to -depend on the content without knowing the hash ahead of time. Therefore the -fetch *has* to serialize all evaluation after it, since it affects the store -paths of anything downstream of it during evaluation. - -That said, it is, in my opinion, a significant design flaw in the Nix evaluator -that it cannot queue all the derivations that are reachable, rather than -stopping and building each in order. - -[content-addressed store]: https://github.com/NixOS/rfcs/blob/master/rfcs/0062-content-addressed-paths.md - -## Stories - -I work at a Haskell shop which makes extensive use of Nix. We had a bug where -Nix would go and serially build "`all-cabal-hashes-component-*`". For several -minutes. - -This was what I would call a "very frustrating and expensive developer UX bug". -I fixed it in a couple of afternoons by refactoring the use of -import-from-derivation to result in fewer switches between building and -evaluating, which I will expand on in a bit. - -## Background on nixpkgs Haskell - -The way that the nixpkgs Haskell infrastructure works is that it has a -[Stackage]-based package set based on some Stackage Long-Term Support release, -comprising package versions that are all known to work together. The set is -generated via a program called [`hackage2nix`][hackage2nix], which runs -`cabal2nix` against the entirety of Hackage. - -`cabal2nix` is a program that generates metadata, input hashes, and hooks up -dependencies declared in the `.cabal` files to Nix build inputs. - -This set can then be overridden by [overlays] which can apply patches, override -sources, introduce new packages, and do basically any other arbitrary -modification. - -At build time, the builder will be provisioned with a GHC package database with -everything in the build inputs of the package, and it will build and test the -package. - -In this way, each dependency is turned into a Nix derivation so caching -of dependencies for development shells, parallelism across package builds, and -other useful properties simply fall out for free. - -[Stackage]: https://www.stackage.org/ -[hackage2nix]: https://github.com/NixOS/cabal2nix/tree/master/cabal2nix/hackage2nix -[overlays]: https://nixos.org/manual/nixpkgs/stable/#chap-overlays - -## Where's the IFD? - -nixpkgs Haskell provides a wonderfully useful function called `callCabal2nix`, -which executes `cabal2nix` to generate the Nix expression for some Haskell -source code at Nix evaluation time. Uh oh. - -It also provides another wonderfully useful function called `callHackage`. This -is a very sweet function: it will grab a package of the specified version off -of Hackage, and call `cabal2nix` on it. - -Wait, how does that work, since you can't just download stuff for fun without -knowing its hash? Well, there's your problem. - -"Figuring out hashes of stuff on Hackage" was solved by someone publishing a -comically large GitHub repo called `all-cabal-hashes` with hashes of all of the -tarballs on Hackage and CI to keep it up to date. Using this repo, you only -have to deal with keeping one hash up to date: the hash of the version of -`all-cabal-hashes` you're using, and the rest are just fetched from there. - -### Oh no - -This repository has an obscene number of files in it, such that it takes dozens -of seconds to unpack it. 
So it's simply not unpacked. Fetching a file out of it -involves invoking tar to selectively extract the relevant file from the giant -tarball of this repo. - -That, in turn, takes around 7 seconds on the fastest MacBook available, for -each and every package, in serial. Also, Nix checks the binary caches for each -and every one, further compounding the fail. - -I optimized it to take about 7 seconds, *total*. Although I *am* a witch, I -think that there is some generally applicable intuition derived from this that -can be used to make IFD go fast. - -# Making IFD go fast - -Nix is great at building big graphs of dependencies in parallel and caching -them. So what if we ask Nix to do that? - -How can this be achieved? - -What if you only demand one big derivation be built with IFD then reuse it -across all the usage sites? - -## Details of `some-cabal2nix` - -My observation was that hot-cache builds with a bunch of IFD are fine; it's -refilling it that's horribly painful since Nix spends a lot of time doing -pointless things in serial. What if we warmed up the cache by asking it to -build all that stuff in one shot? Then, the rest of the IFD would hit a hot -cache. - -*The* major innovation in the fix, which I called `some-cabal-hashes`, is that -it builds *one* derivation with IFD that contains everything that will be -needed for further evaluation, then all the following imports will hit that -already-built derivation. - -Specifically, my build dependency graph now looks like: - -``` - /- cabal2nix-pkg1 -\ -some-cabal2nix -+- cabal2nix-pkg2 -+- some-cabal-hashes -> all-cabal-hashes - \- cabal2nix-pkg3 -/ -``` - -There are two notable things about this graph: - -1. It is (approximately) the natural graph of the dependencies of the build - assuming that the Nix evaluator could keep going when it encounters IFD. - -2. It allows Nix to naturally parallelize all the `cabal2nix-*` derivations. - -Then, all of the usage sites are `import "${some-cabal2nix}/pkg1` or similar. -In this way, one derivation is built, letting Nix do what it's good at. I did -something clever also: I made `some-cabal2nix` have no runtime dependencies by -*copying* all the resulting cabal files into the output directory. Thus, the -whole thing can be fetched from a cache server and not built at all. - -Acquiring the data to know what will be demanded by any IFD is the other piece -of the puzzle, of course. I extracted the data from the overlays by calling the -overlays with stubs first (to avoid a cyclic dependency), then evaluating for -real with a `callHackage` function using the `some-cabal2nix` created using -that information. - -The last and very important optimization I did was to fix the `tar` invocation. -`tar` files have a linear structure that is perfect for making `t`ape -`ar`chives (hence the name of the tool) which can be streamed to a tape: one -file after the other, without any index. Thus, finding a file in the tarball -takes an amount of time proportional to `O(n)` where `n` is the number of files -in the archive. - -If you call tar `m` times for the number of files you need, then you do -`O(n*m)` work. However, if you call `tar` *once* with a set of files that you -want such that it can do one parse through the file and an `O(1)` membership -check of that set, then the overall time complexity is `O(n)`. I can assume -that is what `tar` actually does, since it does the entire extraction in -basically the same time with a long file list as with one file. 
- -Enough making myself sound like I am in an ivory tower with big O notation; -regardless, extracting with a file list yielded a major performance win. - -I also found that if you use the `--wildcards` option, `tar` is extremely slow -and it seems worse with more files to extract. Use exact file paths instead. - -FIXME: get permission to release the sources of that file -