From 398f83f92589dfe54a9fb86da46bc2f1502bc6a0 Mon Sep 17 00:00:00 2001
From: Jade Lovelace
Date: Tue, 7 Feb 2023 21:34:08 -0800
Subject: [PATCH] the nix pitch post

---
 content/posts/the-nix-pitch/index.md | 435 +++++++++++++++++++++++++++
 1 file changed, 435 insertions(+)
 create mode 100644 content/posts/the-nix-pitch/index.md

diff --git a/content/posts/the-nix-pitch/index.md b/content/posts/the-nix-pitch/index.md
new file mode 100644
index 0000000..ed273c3
--- /dev/null
+++ b/content/posts/the-nix-pitch/index.md
@@ -0,0 +1,435 @@
++++
+date = "2023-02-05"
+draft = true
+path = "/blog/the-nix-pitch"
+tags = ["nix"]
+title = "The Nix pitch"
++++
+
+I have probably acquired a reputation as a completely unrepentant Nix
+shill. This blog post is pure Nix shilling, but it intends to explain why Nix
+is so transformational that people shill it so much.
+
+This post is partially inspired by a line of thinking raised by [Houyhnhnm
+Computing] (pronounced "hyou-nam"), a blog series about an alternate universe
+of computing, framed as what would happen if the sentient horses from
+Gulliver's Travels saw human computers. If you want more ideas about the
+absurd ways our computers work and how they could be so much better, I highly
+recommend reading this site.
+
+[Houyhnhnm Computing]: https://ngnghm.github.io/
+
+There are a few properties that make the Nix ecosystem extremely interesting:
+
+* Reproducibility (I promise you care about this!)
+* Cross-language build system integration
+* Incremental builds are trustworthy due to sandboxing
+* Drift-free configuration management
+
+[A friend said][mgattozzi-nix] that "unfortunately all my problems are now nix
+problems if I don't understand it". This is essentially true: Nix is a machine
+for converting otherwise-unsolved packaging and deployment problems into Nix
+problems.
+
+[mgattozzi-nix]: https://twitter.com/mgattozzi/status/1617604038517817344
+
+The Nix ecosystem consists of the following components (which do not
+necessarily need to be used at the same time):
+
+* Nix, a sandboxed build system with binary caching.
+
+  Nix knows how to build "derivations", build descriptions that amount to shell
+  scripts. If the result of a derivation exists on a binary cache, it will be
+  fetched instead of built locally.
+* Nix language, a functional domain-specific language for writing things to be
+  built by Nix.
+
+  If we consider the Nix language to be "Haskell", and derivations to be
+  "Bash", Nix is a compiler from Haskell to Bash.
+* nixpkgs, the package collection used by NixOS (also usable without NixOS, and
+  on macOS). It is the most up-to-date and largest distro repository on the
+  planet.
+
+  It allows composition of multiple language ecosystems, with extensive
+  programming language support.
+
+  It has some of the best Haskell packaging available anywhere, and the only
+  distro packaging of Haskell worth using for development.
+* NixOS, a configuration management system masquerading as a Linux
+  distribution. It uses a domain-specific language embedded in the Nix language
+  to define system images that are then built with Nix.
+
+# Case studies
+
+Let's go through some case studies of frustrating problems you may have had
+with computers that don't happen in the Nix universe.
+
+## Case study: Docker image builds
+
+Traditionally, Docker images are built by running some shell scripts inside a
+containerized system, with full network access. These are *impossible* to
+reproduce: the very instant you run `apt update && apt upgrade`, your image is
+no longer reproducible. Let's tell a story.
+
+You're working on your software one day, and unbeknownst to you, `libfoo` has a
+minor upgrade in your distribution of choice. You rebuild your images and
+production starts experiencing random segmentation faults.
So you revert it and
+go investigate why the new image is broken. This has happened a few times
+before, and you never know why it happens. It seems to happen at random when
+you upgrade the base image, so you stick with the same base image from a year
+ago, from before the last upgrade.
+
+Today, you have received a feature request: generate screenshots of the website
+to embed in Discord previews. Sweet, just add headless Chromium to the
+Dockerfile, and... oops, it's been deleted from the mirror because the version
+is too old, and updating the package database with `apt update` would require
+fixing `libfoo` as well as `libbar` (since that also broke in the meantime).
+Damn it!
+
+Also, your image is 700MB, because it includes several toolchains, an ssh
+client, git, and other things necessary to build the software. You could copy
+the built product out, but doing so would require building an integration test
+for the whole thing to ensure that nothing of importance was removed.
+
+---
+
+What went wrong? Dockerfiles don't specify their dependencies fully: they fetch
+arbitrary content off the internet which may change without warning, and are
+thus impossible to reproduce. There is no way to tell if software actually
+requires some path to exist at runtime. It is impractical to use multiple
+versions of the package set concurrently while working through
+incompatibilities in other parts of the software: an upgrade is all or nothing.
+
+What if you could declaratively specify system packages in the build
+configuration, not pull in build dependencies for runtime, and have everything
+come from a lockfile so it doesn't change unexpectedly? What if you could pull
+only Chromium from the latest distro repositories while working on migrating
+the rest?
+
+Is this a broader failure of Linux distributions due to choosing global
+coherence with respect to [the golden rule of software
+distributions][golden-rule]? Can we have nice things?
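+
+As a concrete sketch of that alternative: with nixpkgs' `dockerTools`, the
+entire image contents come from pinned package definitions. This is a hedged
+example; `dockerTools.buildLayeredImage` is a real nixpkgs helper, but the
+`myapp` package is made up for illustration.
+
+```nix
+# Build a Docker image declaratively: only the runtime closure of the
+# listed packages ends up inside, and every input is pinned by the
+# nixpkgs revision in use. `pkgs.myapp` is a hypothetical package.
+{ pkgs ? import <nixpkgs> { } }:
+
+pkgs.dockerTools.buildLayeredImage {
+  name = "myapp";
+  tag = "latest";
+  contents = [ pkgs.myapp pkgs.cacert ];
+  config.Cmd = [ "${pkgs.myapp}/bin/myapp" ];
+}
+```
+
+Because the build is sandboxed and every input is pinned, rebuilding the image
+months later yields the same result, and pulling a newer Chromium alone is just
+a matter of importing a second, newer nixpkgs pin for that one package.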
+
+[golden-rule]: https://www.haskellforall.com/2022/05/the-golden-rule-of-software.html
+
+## Case study: configuration drift with Ansible
+
+Ansible is a popular configuration management system that brings systems into
+an expected state by ssh'ing into them and running Python scripts. I used it
+seven years ago to build a lab environment, and I hope it has improved since,
+but my beefs with it are at the design level. Story time!
+
+A year ago, you added a log sender with filebeat to ship logs to an
+Elasticsearch cluster to aggregate all the logs. Recently, you changed the
+application to send all logs to the systemd journal to introduce structured
+logging. You changed the service to use journalbeat and deleted the old
+filebeat service configuration, but for some reason you're getting duplicate
+log entries. What?
+
+You build a new machine and it does not exhibit the same behaviour.
+
+You look at one of the machines and realize it is concurrently running filebeat
+and journalbeat. Whoops. You forgot to set the state of the old filebeat
+service to `stopped`, and instead deleted the rule. Because Ansible doesn't
+know about things it does not manage, the system *contains configuration that
+diverges from what is checked in* to the git repository with the configurations.
+
+---
+
+What went wrong? Ansible doesn't own the system; it merely manages its own area
+of the system. "You should have used [HashiCorp Packer]" rings through your
+head. Building new system images and deleting the old machines is a great
+solution to this issue, but it experiences exactly the same problem as Docker
+during the image build process. If that is acceptable, it's honestly a great
+improvement over mutable configuration-management systems.
+
+[HashiCorp Packer]: https://www.packer.io/
+
+Imagine if you could change the configuration and know that none of the old one
+was still around.
Imagine being able to revert the entire system to an older
+version, even on bare metal, without needing such a big hammer as snapshots,
+which are also easy to forget to use for pure configuration changes.
+
+## Case study: zfs via dkms
+
+On most distributions, if you want to use a kernel module that's not available
+in the mainline kernel, you have to use `dkms`, which is essentially some
+scripts for invoking `make` to build the kernel modules in question. This is
+then generally hooked into the package manager so that the modules are rebuilt
+every time the Linux kernel is updated. `dkms` needs to be separate from the
+system package manager, since the system package manager does not know how to
+build patched packages, source-based packages, and similar things. Story time!
+
+Several months ago, a new Linux kernel update broke the compilation of
+zfs-on-linux. This is fine, it happens sometimes. I use zfs on the root
+filesystem of my desktop machine, and I currently run Arch Linux on it. Arch,
+like most distros, uses `dkms` to build these out-of-tree kernel modules.
+
+I ran `pacman -Syu` and waited a few minutes. I then thoughtlessly closed the
+terminal and restarted my computer, since there was no error visible at the
+bottom of the logs. Whoops, it can't mount the root filesystem. That seems
+rather important!
+
+I then had to get out an Arch ISO to `chroot` into the system, install an older
+Linux kernel, and rerun the `dkms` build.
+
+---
+
+What went wrong? The system package manager only knows how to handle binary
+packages, which means that anything source-based is second class, and is
+handled via hacks such as a hook to build out-of-tree modules at the end of the
+installation process. If this fails, it can't revert the upgrade it just
+finished. By design, most binary distros' package managers can have partial
+upgrade failures, and when the driver for the root filesystem is caught in such
+a failed upgrade, your system is rendered unbootable.
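+
+For contrast, source-based packages are first-class in the Nix world: patching
+any package is a few lines. A sketch using the standard nixpkgs `overrideAttrs`
+mechanism (the patch file here is hypothetical):
+
+```nix
+# Rebuild nixpkgs' htop with a local patch applied on top of the
+# upstream definition. `./my-fix.patch` is a made-up example file.
+{ pkgs ? import <nixpkgs> { } }:
+
+pkgs.htop.overrideAttrs (old: {
+  patches = (old.patches or [ ]) ++ [ ./my-fix.patch ];
+})
+```
+
+If the patch fails to apply or the build breaks, the failure happens at build
+time, before anything is switched onto the running system.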
+
+Since the distro is not "you", they may have diverging priorities or concerns:
+perhaps they don't feel comfortable shipping the zfs module as a binary, so you
+have to build it from source on your computer. You can't do anything about
+these decisions: do binary distributions actively enable software freedom or
+ownership over your computing experience?
+
+Imagine if system upgrades were atomic and would harmlessly fail if some
+source-based dependencies could not build. Imagine if you could seamlessly
+patch arbitrary packages without needing to change distributions or manually
+keep track of when you have to do so. Imagine if there weren't a distinction
+between distro packages and packages you have patched or written yourself.
+
+Imagine if you could check in compiler patches to your work repository and
+everyone would get the new binaries when they pull it next, without building
+anything on their machines.
+
+# The Nix pitch
+
+Leveraging the Nix ecosystem, you can get:
+
+* Consistent development environments for everyone working on your project,
+  with arbitrary dependencies in any language: you can ship your Rust SQL
+  migration tool to your TypeScript developers. The distro war is over:
+  everyone gets the exact same versions of the development tools with minimal
+  effort. People can run whatever distro they want, including macOS, and
+  distro issues are basically gone.
+
+  This also means that for personal projects, upgrading the system compiler
+  does not break the build. Upgrades are done on your terms, by updating a
+  lockfile in the project itself. You can have as many versions of a program as
+  you'd like on your system, and they don't interfere with each other.
+
+  It's possible to pull some tools from a newer version of nixpkgs than is used
+  for the rest of the system, and this has no negative effects besides disk
+  use.
+* Fast, reproducible, and small image builds for Docker, Amazon, and anything
+  else with the nixpkgs infrastructure or Determinate Systems [ephemera]. You
+  know it reproduces because everything going into it is locked to a version.
+* System configuration is no longer something to be avoided: when you work on
+  your NixOS system configuration, you get the results of your work on all your
+  machines, and you get it forever, since you check it into Git.
+* Patching software is easy, and you can ship arbitrary patches to the package
+  set for projects anywhere you use Nix.
+
+  There is no distinction besides binary caching between packages in the
+  official repositories and what you create yourself. You can run your own
+  binary cache for your project and build your patched dependencies in CI. I
+  didn't care about software freedom until I actually *had* it.
+* You can simply roll back to previous NixOS system images if an upgrade goes
+  sideways. The entire system is one derivation with dependencies on everything
+  it needs, and switching configurations is a matter of running a script that
+  more or less switches a symlink and fixes up any systemd services. System
+  upgrades cause extremely short downtime.
+
+  Workload configuration/version changes behave exactly the same as OS updates.
+* You don't have to think about the disparate configuration formats various
+  programs use on NixOS. You just write your nginx config in Nix and it's no
+  big deal.
+* Software is composable in Nix: you can build a Haskell program that depends
+  on a Rust library without tearing your hair out, since Cabal can just look in
+  pkg-config and not have to know how to build any of it.
+
+  Machine learning Python libraries require funny system packages? Nix just
+  makes the Python libraries depend on the system packages.
+* If you've used Arch, you may like the Arch User Repository.
This is
+  unnecessary under Nix: nixpkgs is liberal in what it accepts as packages,
+  and is both the largest and most up-to-date distro repository out there.
+
+  Since Nix is a source-based build system, you can just package what you need
+  and put it in your configuration, to upstream later or never.
+
+  You can get proprietary software: you can literally install the huge Intel
+  Quartus toolchain for FPGA development from nixpkgs.
+
+  Need patched software? Patch it: it's a few lines of Nix code to create a
+  modified package based on one defined in nixpkgs, which will naturally be
+  rebuilt if it changes upstream.
+
+  The critical insight into why nixpkgs is so large is that maintainers aren't
+  special. I maintain packaging in nixpkgs for packages which I also develop.
+  Another reason for its success is that packages can depend on older versions
+  of, for example, LLVM: global coherence is not required, so multiple versions
+  of libraries can and do exist.
+
+[ephemera]: https://twitter.com/grhmc/status/1575518762358513665
+
+## It's not all rosy
+
+Nix has a lot of ways it needs to grow, especially in governance.
+
+* Documentation is poor. Often the best choice is to read the nixpkgs source
+  code, an activity [for which I have a guide][finding-functions-in-nixpkgs].
+
+  There has been much work to make this better, but it is a somewhat fragmented
+  effort, hampered by both Flakes and the new CLI being in limbo for a long
+  time.
+* Tooling isn't the greatest.
+
+  The UX design of the `nix` CLI is not very good, with unfortunate design
+  decisions such as the command to update everything being:
+
+  > `nix flake update`
+
+  However, to update one input:
+
+  > `nix flake lock --update-input nixpkgs`
+
+  This is [filed upstream](https://github.com/NixOS/nix/issues/5110) and is
+  thankfully showing slow movement in a good direction.
+
+  The older `nix-build`/`nix-shell`/`nix-instantiate`/`nix-store` CLI design is
+  more troublesome, since it crystallized over many years rather than being
+  designed upfront.
+
+  There are some language servers for the Nix language, namely `rnix-lsp` and
+  `nil`, and both are OK, but their job is made much harder by Nix being a
+  dynamic language and by some of the patterns commonly used in Nix code being
+  implemented in libraries, rendering their analysis challenging at best.
+
+  For example, package definitions in nixpkgs are written as functions taking
+  their dependencies as arguments. Static analysis of this is nearly hopeless
+  without seeing the call site: you don't know anything about these values.
+
+  The NixOS domain-specific language is evaluated entirely in the Nix language,
+  which slows it down and makes diagnostics challenging.
+* Currently there are significant governance issues.
+
+  There are conflicts of interest with the major corporate sponsor of Nix,
+  Determinate Systems, which employs many people in the Nix community. For
+  example, the sudden introduction of [Zero to Nix][ztn] alienated [some of the
+  official docs team][ztn-docs].
+
+  This conflict of interest is especially relevant with respect to Flakes, the
+  "experimental" built-in lockfile/project-structure system, which was
+  developed *first* as consulting for Target (by people now working at
+  Determinate Systems), and only then brought [to RFC][flake-rfc] in
+  experimental form; that RFC was ultimately closed. The great flakification
+  was done amidst the `nix` CLI redesign (also experimental), which is now
+  strongly tied to flakes, with non-flakes usage as an afterthought. This is in
+  spite of composability issues with flakes, such as the inability to have
+  dependencies between flakes in the same Git repository, which makes them
+  incompatible with monorepos.
+
+  Currently, a lot of people use flakes in spite of their experimental status.
The people who don't want flakes as the only way of
+  doing things are understandably very frustrated, some of them even going so
+  far as to [rewrite Nix][tvix].
+
+  The maintenance of the C++ Nix implementation is not very healthy: there is a
+  large PR backlog, while at the same time the BDFL, Eelco Dolstra, commits
+  directly to master. This situation is disappointing.
+
+[ztn]: https://zero-to-nix.com/
+[ztn-docs]: https://discourse.nixos.org/t/parting-from-the-documentation-team/24900
+[finding-functions-in-nixpkgs]: https://jade.fyi/blog/finding-functions-in-nixpkgs/
+[flake-rfc]: https://github.com/NixOS/rfcs/pull/49
+[tvix]: https://tvl.fyi/blog/rewriting-nix
+
+## How did they do it?
+
+Every large company has rebuilt something Nix-like at some level, since at some
+point everyone needs every developer to have the same development environment,
+one which is also the same as production. Nix provides that tooling in a much
+more accessible form.
+
+Here are some of the things they did to achieve it (see also [Eelco Dolstra's
+PhD thesis for extensive details][eelco-thesis]):
+
+[eelco-thesis]: https://edolstra.github.io/pubs/phd-thesis.pdf
+
+### Every store path is a unique version and dependency closure
+
+One of the *key* insights of the Nix implementation is that every path in the
+Nix store (typically at `/nix/store`) has a hash in its name of either its
+build plan or its content.
+
+Nix derivations are either fixed-output, input-addressed, or
+content-addressed. Fixed-output derivations can access the network, but their
+output must match the expected hash. Input-addressed derivations are the
+bread and butter of Nix today: the hash in the name of the output depends on
+the content of the build plan.
[Content-addressed
+derivations][ca-derivations] are an experimental feature, potentially
+promising to save a lot of compute time on pointless rebuilds by allowing
+multiple build plans to generate the same output path if the output is
+identical (for example, consider the case of a shell comment change in a
+build script).
+
+All references to items in the Nix store (for example, in shebangs,
+derivation dependency lists, shared object paths, etc.) are by full path,
+thus effectively creating a [Merkle tree] when they are themselves hashed:
+the hashes of dependencies are included in the hash of the build plan.
+
+The upshot of this is that any number of versions of a package can coexist,
+allowing programs from older distribution versions, development versions, and
+any other weird versions to run on the same system without trampling over
+each others' libraries or requiring sandboxing.
+
+This feature is necessary to the implementation of something like `nix-shell`,
+which brings packages into scope for the duration of a shell session, after
+which they may be garbage collected.
+
+### Builds must be hermetic for trustworthy incremental builds
+
+Nix builds are either sandboxed or forced to have an expected output, leaving
+very little room for the typical incremental build issues everyone has run
+into, since it is known exactly what went into the build. If it built today,
+it's highly unlikely to have a different result tomorrow.
+
+### Archive encoding leaves room for creativity
+
+Nix chooses to hash archives *consistently*: when `.tar.gz` and other archive
+files are unpacked, they are repacked into an archive format (NAR, "Nix
+ARchive") that has exactly one encoding per directory structure, and that is
+then hashed.
+
+Recently, GitHub upgraded their Git version, changing the `tar` file encoder
+and changing the hashes of all their source archives.
These archives were never
+guaranteed to be bit-for-bit stable, just their contents. However, they had
+been stable in practice for years. Build systems that pin source archives from
+GitHub should hash contents instead of archives because of this.
+
+### Immutability/scratch-building system images make configuration drift impossible
+
+NixOS uses the most reliable paradigm for configuration management: full
+ownership over the system, effectively generating the system from scratch
+every time, modulo reusing some of the bits that didn't change.
+
+It keeps the configuration immutable once built. To change it, you rebuild the
+configuration and then switch to it relatively atomically.
+
+This contrasts with the way that other configuration management systems
+(besides Packer and other image-building tools) work, attempting to mutate
+the system into the desired state, potentially allowing unmanaged pieces to
+creep in and enable drift.
+
+[ca-derivations]: https://www.tweag.io/blog/2021-12-02-nix-cas-4/
+[Merkle tree]: https://en.wikipedia.org/wiki/Merkle_tree
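+
+Concretely, here is what drift-free configuration looks like: a minimal sketch
+of a NixOS configuration fragment (the `services.nginx` options are real NixOS
+module options, while the host name and path are made up):
+
+```nix
+# Fragment of a NixOS configuration. The whole system, including this
+# nginx setup, is built as one immutable derivation; deleting these
+# lines and rebuilding removes every trace of the configuration.
+{ config, pkgs, ... }:
+{
+  services.nginx = {
+    enable = true;
+    virtualHosts."example.com" = {
+      locations."/".root = "/var/www/example";
+    };
+  };
+}
+```
+
+Switching to the rebuilt system is `nixos-rebuild switch`; reverting is
+`nixos-rebuild switch --rollback`, or picking an older generation from the
+boot menu.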