+++
date = "2023-03-30"
draft = true
path = "/blog/nix-evaluation-blocking"
tags = ["haskell", "nix"]
title = "Stopping evaluation from blocking in Nix"
+++

Nix has a feature called "import from derivation", which is sometimes called
"such a nice foot gun" (grahamc, 2022). I can't argue with its usefulness; it
lets Nix do amazing things that can't be accomplished any other way, and avoid
pointlessly checking generated files into git. However, it has a dirty secret:
it can *atrociously* slow down your builds.

The essence of this feature is that Nix can perform operations such as building
derivations whose results are used in the *evaluation stage*, before more
evaluation takes place.
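
As a minimal sketch of what that looks like (the derivation and its contents
are made up for illustration):

```nix
let
  pkgs = import <nixpkgs> { };

  # A derivation whose *output* is a file of Nix source.
  generated = pkgs.runCommand "generated.nix" { } ''
    echo '{ greeting = "hello"; }' > $out
  '';
in
# Importing the output forces Nix to *build* `generated` in the middle of
# evaluation -- this is import-from-derivation.
(import generated).greeting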

## Nix build staging

It's a reasonable mental model to consider that Nix, in its current
implementation, can do one of two things at a given time:

* Evaluate: run Nix expressions to create some derivations to build. This stage
  outputs `.drv` files, which can then be realised (built).
* Build: given some `.drv` files, fetch the result from a binary cache or build
  from scratch.

Nix evaluation happens lazily, with neither concurrency nor parallelism. When
Nix is working on evaluating something that is blocked on a build, it cannot go
and evaluate something else.

There are efforts [such as tvix][tvix] to rewrite the Nix evaluator so it can
simultaneously evaluate and build, but they are not shipped yet.

[tvix]: https://code.tvl.fyi/about/tvix

### How does import-from-derivation fit in?

Import-from-derivation (IFD for short) lets you do magical things: since Nix
builds can do arbitrary computation in any language, Nix expressions or other
data can be generated by external programs that need to do pesky things like
parsing cursed file formats such as Cabal files.

Features like IFD are useful in build systems because they allow build systems
to be *Monadic*, in the words of the [Hadrian paper]: build things to decide
what to build. By comparison, an *Applicative* build system such as Bazel can
only take in things that are known statically; in Bazel it is thus common to
check in generated build instructions. This property in category theory is
illustrated in the type signatures of the operations of Monad and Applicative:

* `apply :: Applicative f => f (a -> b) -> f a -> f b`<br/>
  which means that the function and its argument both already sit inside `f`
  boxes; the structure of the computation cannot depend on a value *inside* a
  box.
* `bind :: Monad m => m a -> (a -> m b) -> m b`<br/>
  which means that if you have a value of type `a` in an `m` box, you can *get
  the `a` out of the box* and create a further computation in a box depending
  on its actual value.

Import-from-derivation is `bind` for Nix builds: imagine that we wrote a
derivation that generated the Nix language source for the `glibc` package
definition, then we `import (make-glibc-definition someArgs)` in the
dependencies of `hello`. To *evaluate* `hello`, we need to *build* the glibc
definition to get it out of the box and put it into the `hello` definition,
which makes IFD equivalent to `bind`.

[Hadrian paper]: https://dl.acm.org/doi/10.1145/3241625.2976011
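
Sketched as a Nix expression, where `make-glibc-definition` and `someArgs` are
the hypothetical generator and arguments from above:

```nix
{ stdenv, make-glibc-definition, someArgs }:
stdenv.mkDerivation {
  name = "hello";
  # To evaluate this attribute set, Nix must first *build* the
  # generator's output so it can be imported: IFD is `bind` for builds.
  buildInputs = [ (import (make-glibc-definition someArgs)) ];
}
```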

In order to achieve IFD, the evaluation stage can demand builds be run. This
would, by nature, block evaluating things that directly depend on building
something, but since Nix's evaluator can only work on one thing at a time, it
blocks everything else too.

N.B. I've heard that someone wrote a PureScript compiler targeting the Nix
language, which was then used to [parse a Cabal file to do cabal2nix's
job][evil-cabal2nix] entirely within Nix without IFD. Nothing is sacred.

[evil-cabal2nix]: https://github.com/cdepillabout/cabal2nixWithoutIFD

### What blocks evaluation?

Doing anything with a derivation that depends on the contents of its output,
for example:

* `import someDrv`
* `builtins.readFile "${someDrv}/blah"`

Any use of the builtin fetchers blocks evaluation, which is why they are banned
in nixpkgs:

* `builtins.fetchGit`
* `builtins.fetchTree`
* `builtins.fetchTarball`
* `builtins.fetchurl`
* etc.

#### Builtin fetchers

Use of the builtin fetchers is a surprisingly common cause of blocked
evaluation. Sometimes it is done by mistake, but other times it is done for
good reason, with unfortunate tradeoffs. I think it's reasonable to use a
builtin fetcher plus IFD to import libraries such as nixpkgs, since
fundamentally the thing needs to be fetched for evaluation to proceed, but
other cases are more questionable.

One reason to use the builtin fetchers is that there is no way to use a
fixed-output derivation, such as is produced by `nixpkgs.fetchurl`, to download
a URL without knowing the hash ahead of time.

An example I've seen of this being done on purpose is wanting to avoid
requiring contributors to have Nix installed to update the hash of some
tarball, since Nix has its own bespoke algorithm for hashing tarball contents
that nobody has yet implemented outside Nix. So the maintainers used an impure
network fetch (only feasible with a builtin), at the cost of slow Nix build
times. I wound up writing [gridlock] to serve the use case of maintaining a
lockfile of Nix-compatible hashes without needing Nix, allowing the use of
fetchers based on fixed-output derivations.

[gridlock]: https://github.com/lf-/gridlock

The reason that impure fetching needs to be a builtin is that Nix has an
important purity rule for derivations: either the input is fixed and network
access is disallowed, or the output is fixed and network access is allowed. In
the Nix model as designed ([content-addressed store] aside), derivations are
identified only by what goes into them, not by their output.
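
For contrast, a fixed-output fetch looks something like this (the URL is
illustrative and the hash is a stand-in, not a real value):

```nix
{ fetchurl }:
fetchurl {
  url = "https://example.com/source-1.0.tar.gz";
  # The pinned output hash is what permits the builder to use the
  # network; Nix verifies the downloaded result against it afterwards.
  hash = "sha256-...";
}
```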

Let's see why that is. Assume that network access were allowed in normal
builders. If this were the case, builds could have arbitrarily different inputs
without changing the store path. Such an impurity would force Nix to either
rebuild the derivation every time, since its actual inputs could change
arbitrarily at any time ([implemented as impure derivations!][impure-drvs]), or
follow the popular strategy of assuming that it didn't change, making a "clean
build" necessary to have any confidence in the build output.

[impure-drvs]: https://nixos.org/manual/nix/stable/release-notes/rl-2.8.html

If one is doing a network fetch without knowing the hash, one needs to first
download the file to know its eventual store path. Therefore the fetcher *has*
to serialize any evaluation depending on the store path while waiting for the
download. Nix doesn't know how to resume evaluating something else in the
meantime, so it serializes all evaluation.

That said, it is, in my opinion, a significant design flaw in the Nix evaluator
that it cannot queue up all the reachable derivations to build at once, rather
than stopping to build each one in order.

[content-addressed store]: https://github.com/NixOS/rfcs/blob/master/rfcs/0062-content-addressed-paths.md

## Story: `some-cabal-hashes`

I worked at a Haskell shop which made extensive use of Nix. One day I pulled
the latest sources and yet again ran into the old bug in their Nix setup where
Nix would go and serially build derivations called
`all-cabal-hashes-component-*` for several minutes before doing anything
useful. I decided enough was enough and went bug hunting.

I fixed it in a couple of afternoons by refactoring the use of
import-from-derivation to result in fewer switches between building and
evaluating, which I will expand on in a bit.

The result of this work is [available on GitHub][nix-lib].

[nix-lib]: https://github.com/lf-/nix-lib

### Background on nixpkgs Haskell

The nixpkgs Haskell infrastructure has a [Stackage]-based package set built
from some Stackage Long-Term Support release, comprising package versions that
are all known to work together. The set is generated via a program called
[`hackage2nix`][hackage2nix], which runs `cabal2nix` against the entirety of
Hackage. What one gets in this package set is a snapshot of Hackage as of when
it was committed by the nixpkgs maintainers, with both the latest version and
the Stackage stable version of each package available (if those differ).

`cabal2nix` is a program that generates metadata and input hashes, and wires
together a set of Nix derivations for Haskell packages automatically, using Nix
instead of Cabal to orchestrate builds of packages' dependencies.

Often the package set is for an older compiler, and while there *are* some
per-compiler-version overrides in the nixpkgs Haskell distribution, these may
not switch the version of a package you need, or the package set just plain has
too old a version.

In that case, you can use [overlays], which can apply patches, override
sources, introduce new packages, and do basically any other arbitrary
modification to packages.
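
A small overlay sketch (package names, versions, and the patch file are purely
illustrative):

```nix
# Passed as `overrides` to haskellPackages.override; `hself` is the
# final package set, `hsuper` the unmodified one.
hself: hsuper: {
  # Pull a specific version from Hackage.
  foo = hself.callHackage "foo" "1.2.3" { };
  # Patch a broken package in place.
  bar = hsuper.bar.overrideAttrs (old: {
    patches = (old.patches or [ ]) ++ [ ./fix-bar.patch ];
  });
}
```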

Each package build will be provisioned with a GHC package database containing
everything in its build inputs, using Nix exclusively for dependency version
resolution and avoiding Cabal doing it.

Since Nix orchestrates the build of the package dependency graph, caching of
dependencies for development shells, parallelism across package builds, remote
builds, and other useful properties simply fall out for free.

[Stackage]: https://www.stackage.org/
[hackage2nix]: https://github.com/NixOS/cabal2nix/tree/master/cabal2nix/hackage2nix
[overlays]: https://nixos.org/manual/nixpkgs/stable/#chap-overlays

### Where's the IFD?

nixpkgs Haskell provides a wonderfully useful function called `callCabal2nix`,
which executes `cabal2nix` to generate the Nix expression for some Haskell
source code at Nix evaluation time. Uh oh.

It also provides another wonderfully useful function called `callHackage`. This
is a very sweet function: it takes a package name and a version and gives you a
Nix derivation for that package which knows how to download it from Hackage.
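
Usage is about as simple as it gets (package and version are illustrative,
and `hsPkgs` stands in for a Haskell package set):

```nix
# A derivation for lens 5.2, fetched from Hackage; the hash is looked
# up for you rather than written out by hand.
hsPkgs.callHackage "lens" "5.2" { }
```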

Wait, how does that work, since you can't just download stuff for fun without
knowing its hash? Well, there's your problem.

"Figuring out hashes of stuff on Hackage" was solved by someone publishing a
comically large GitHub repo called `all-cabal-hashes`, with hashes of all of
the tarballs on Hackage and CI to keep it up to date. Using this repo, you only
have to keep one hash up to date: the hash of the version of `all-cabal-hashes`
you're using; the rest are just fetched from there.

#### Oh no

This repository has an obscene number of files in it, such that it takes dozens
of seconds to unpack. So it's simply not unpacked: fetching a file out of it
involves invoking `tar` to selectively extract the relevant file from the giant
tarball of this repo.

That, in turn, takes around 7 seconds on a very fast computer, for each and
every package, in serial. Also, Nix checks the binary caches for each and every
one, further compounding the oopsie.

In my fixed version, it takes about 7 seconds *total* to gather all the hashes,
and I think this is instructive in how a Nix expression can be structured to
behave much better.

## Making IFD go fast

Nix is great at building big graphs of dependencies in parallel and caching
them. So what if we ask Nix to do that?

How? What if you only demand one big derivation be built with IFD, then reuse
it across all the usage sites? You then only block evaluation once and can
appropriately use parallelism.

### Details of `some-cabal-hashes`

My observation was that hot-cache builds with a bunch of IFD are fine; it's
refilling the cache that's horribly painful, since Nix spends a lot of time
doing pointless things in serial. What if we warmed up the cache by asking Nix
to build all that stuff in one shot? Then the rest of the IFD would hit a hot
cache.

The fix, `some-cabal-hashes` and `some-cabal2nix`, is that Nix builds *one*
derivation with IFD that contains everything that will be needed for further
evaluation; then all the following imports complete immediately.

Specifically, my build dependency graph now looks like:

```
                /-> cabal2nix-pkg1 -\
some-cabal2nix -+-> cabal2nix-pkg2 -+-> some-cabal-hashes -> all-cabal-hashes
                \-> cabal2nix-pkg3 -/
```

There are two notable things about this graph:

1. It is (approximately) the natural graph of the dependencies of the build,
   assuming that the Nix evaluator could keep going when it encounters IFD.
2. It allows Nix to parallelize all the `cabal2nix-*` derivations.

Then, all of the usage sites are `import "${some-cabal2nix}/pkg1"` or similar.
In this way, one derivation is built, letting Nix do what it's good at. Another
trick is that I made `some-cabal2nix` have no runtime dependencies by *copying*
all the resulting cabal files into the output directory instead of symlinking
them. Thus, the whole thing can be fetched from a cache server and not built at
all.

Figuring out what will be required is the other piece of the puzzle. Initially,
I wrote a drop-in solution: extract the information from the overlays that were
actually using `callHackage`, without invoking `callHackage` and causing an
IFD! The way I did this is by making a fake version of the package set with
stubs for `callHackage`, `callCabal2nix` and such, then invoking the overlay
with this fake package set as the value of `super`. Once the overlays are
evaluated with these stubs, I extracted the versions requested by the overlay
and used that to create the *real* `callHackage`.
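
The trick can be sketched like this (the names `stubSuper` and `myOverlay` are
mine, not from the real code):

```nix
let
  # Stubs that record their arguments instead of building anything.
  stubSuper = {
    callHackage = name: version: _args: { inherit name version; };
    callCabal2nix = name: src: _args: { inherit name src; };
  };
in
# An overlay has the shape `self: super: { ... }`; feeding it stubs for
# both arguments yields the (name, version) pairs it would request,
# without any IFD happening.
myOverlay stubSuper stubSuper
```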

This kind of trick is also used by maintainer scripts to extract metadata from
overlays that nixpkgs Haskell uses for compiler-specific overrides.

However, nixpkgs has something nicer for overriding versions of Hackage
libraries, which the published version of the overlay imitates instead:
`haskell.lib.packageSourceOverrides`. It constructs an overlay that overrides
package versions with a nice clean input of what's required.
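
In use it looks something like this (package names and the version are
illustrative):

```nix
# Builds an overlay from a clean mapping of package to version/source.
pkgs.haskell.lib.packageSourceOverrides {
  aeson = "2.1.2.1"; # a Hackage version
  my-app = ./.;      # or a local source tree
}
```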

The last and very important optimization I did was to fix the `tar` invocation,
which was quite slow. `tar` files have a linear structure that is perfect for
making `t`ape `ar`chives (hence the name of the tool) which can be streamed to
a tape: one file after the other, without any index. Thus, finding a file in
the tarball takes an amount of time proportional to the number of files in the
archive.

If you call `tar` once per file you need, it does `files-in-archive *
invocations` work. However, if you call `tar` *once* with the set of files that
you want, such that it can do one pass through the archive, then the overall
execution time is just proportional to the number of files in the archive. I
assume that is what `tar` actually does, since it completes the entire
extraction in basically the same time with a long file list as with one file.

I also found that if you use the `--wildcards` option, `tar` is extremely slow,
and it seems worse with more files to extract, so it's better to use exact file
paths. If you need to get those for a tarball from GitHub, there's a pesky
top-level directory that everything is below, which needs to be included in the
file paths; it can be found by using `tar tf` and grabbing the first line.

At extraction time, the extra directory name can be removed with
`--strip-components=1`, while `--one-top-level` dumps everything below the
desired directory.
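
Putting it together, the single-pass extraction can be sketched as one
derivation (the package paths are illustrative, and `all-cabal-hashes` stands
in for the repo tarball):

```nix
{ runCommand, all-cabal-hashes }:
runCommand "some-cabal-hashes" { } ''
  # The tarball has a single pesky top-level directory; find its name.
  topdir=$(tar tf ${all-cabal-hashes} | head -n 1 | cut -d/ -f1)
  mkdir -p $out
  # One pass over the archive extracts every file we need.
  tar xf ${all-cabal-hashes} -C $out --strip-components=1 \
    "$topdir/pkg1/1.0.0" "$topdir/pkg2/2.0.0"
''
```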

The source code for `some-cabal-hashes` is available here:
https://github.com/lf-/nix-lib/blob/main/lib/some-cabal-hashes.nix