diff --git a/content/posts/packaging-is-extremely-hard/antifa-demon-core.png b/content/posts/packaging-is-extremely-hard/antifa-demon-core.png new file mode 100644 index 0000000..ec6c1bd Binary files /dev/null and b/content/posts/packaging-is-extremely-hard/antifa-demon-core.png differ diff --git a/content/posts/packaging-is-extremely-hard/index.md b/content/posts/packaging-is-extremely-hard/index.md new file mode 100644 index 0000000..0a99540 --- /dev/null +++ b/content/posts/packaging-is-extremely-hard/index.md @@ -0,0 +1,256 @@ ++++ +date = "2024-01-27" +draft = true +path = "/blog/packaging-is-extremely-hard" +tags = ["build-systems", "arch-linux", "linux", "nix"] +title = "Packaging is extremely hard, or, why building AUR packages in CI is a nightmare" ++++ + +Packaging on a traditional distribution is challenging to say the least, and I +haven't seen any coherent descriptions of *why* hermetic build systems like Nix +eliminate an entire category of needing to think about certain things. Recently +a friend mentioned she was considering setting up a CI service for some AUR +packages by a trivial cron job, whereas my reaction to the idea of CI for Arch +packages is "that would take a month of work to do correctly". + +Let's explore the inherent complexity in writing a CI service for basically any +binary distro; picking on Arch Linux is only because it is what I have +experience with, though they tend to be especially fast and loose with inherent +complexity. One could argue that Arch in particular is the Go of distros, since +it ignores a lot of hard things in order to ship a working distro, similarly to +[how Go famously solves complexity by ignoring it][golang]. This is not about +factionalism; it is about the choices of where distro maintainers have spent +their energy, and ignoring complexity is something that has its place. + +Arch is known for having a large user maintained repository of non-reviewed +community-written packaging for most anything under the sun called the AUR. +This is a blessing and a curse, because Arch is extremely a binary distro. +Pretty much this entire post would apply to anyone maintaining a binary +repository for another distribution, except perhaps the part of building +packages maintained by other people in CI. + +[golang]: https://fasterthanli.me/articles/i-want-off-mr-golangs-wild-ride + +[rebuild-conds]: https://wiki.archlinux.org/title/DeveloperWiki:How_to_be_a_packager#The_workflow +[rebuild-detector]: https://github.com/maximbaz/rebuild-detector + +## "Rebuild conditions are indeterminate", or, why C++ people are always talking about ABI + +If you are a downstream consumer of an official binary package, such as being +an AUR packager, there is not really any obvious notice that you should rebuild +your package due to dependency updates, besides, perhaps, [rebuild-detector] +and upgrading your system regularly. + +The way that release management is done at Arch Linux is that maintainers +updating libraries go and [ping all their colleagues][soname-bump] when their +upstream changed their software so it is no longer binary-compatible +("ABI-compatible"), represented by a "soname bump", e.g. changing the file name +`libc.so.5` -> `libc.so.6`. This is not terribly unusual among distros. + +However, it's perfectly possible that packages break their ABI without updating +their soname, since most changes to C header files besides adding things will +break ABI in theory, for instance, changing `#define` constants or other such +things. So, if upstream is being impolite, they can cause bugs at any time, and +blatant changes can be caught by things like [abi-checker], though they don't +necessarily form part of the official process for Arch. + +[abi-checker]: https://lvc.github.io/abi-compliance-checker/ + +[soname-bump]: https://wiki.archlinux.org/title/DeveloperWiki:How_to_be_a_packager#Run_sogrep_on_identified_soname_change + +When packages are rebuilt without being updated, this is done by incrementing +`pkgrel` in the PKGBUILD, which is achieved automatically in the official repos +with `pkgctl build --rebuild` ([man page][pkgctl-build]) of the affected +packages. For example, for a version `0.20.10-1`, incrementing `pkgrel` would +produce a version `0.20.10-2`, which is uploaded to staging as well as pushed +to the package's own Git repo with `pkgctl release`. + +After all the builds are made, `pkgctl db move` is invoked to move all the +packages over. + + + +[pkgctl-build]: https://man.archlinux.org/man/pkgctl-build.1.en + +### Atomicity? Is that like a criticality incident? + +{% image(name="./antifa-demon-core.png", colocated=true) %} +an antifaschistische aktion sticker with a demon core in the middle, +"ausgerutscht, trotzdem da" on top and "kernphysiker antifa" on the bottom +{% end %} + + + +If the official repos operate by coordination between all the packagers, with a +staging area to atomically release rebuilds, it follows that AUR packagers can +expect that official repos can and will change at any time without notification +(unless one goes and looks at the development bug tracker). + + + +[arch-arm]: https://wiki.archlinux.org/title/Arch_Linux_Archive + +This is a relatively reasonable process for a distro that doesn't fully +automate everything and even one that does, but it is kind of a problem if you +aren't an official maintainer working in the official repos, since you aren't +in the notification list. + +Note also that the information that the AUR itself has on packages is not +sufficient to send emails about this either; this isn't the fault of the +Arch developers. + +However, the upshot of this is that if one is using an AUR package maintained +by someone else, there is no guarantee anyone has tried building it against the +latest versions of the official repos, and it is in fact also impossible to +know what versions it was successfully built against. A local build of an AUR +package can get arbitrarily out of sync with the official repos and it is not +easily possible to reconstruct the state of all the repos that went into +building it. + +Stuff randomly breaking due to repositories using the time of day as a software +version pinning mechanism is not just an AUR problem: it is much, much worse on +third-party binary repositories. For instance, even though [archzfs] is by far +one of the best executed third party repositories, in large part on account of +them running a CI service, it still can be out of time with the versions of the +kernel. + +[archzfs]: https://github.com/archzfs/archzfs + +However, the instance where third party repositories get *really* out of sync +with things is for things like Manjaro which have repositories delayed by two +weeks relative to Arch for "stability". This doesn't work out very well. + +## The source-build-source cycle + +For any package, a CI system that fully automates the packaging workflow needs +to be able to increment `pkgrel` on any dependency updates and trigger a +rebuild automatically. This is stored in the package source files: the CI +system has to be able to push to the sources automatically. + +This also means that a CI system building someone else's AUR packages needs to +*fork any packages it builds*, since it must be able to update `pkgrel` based +on its own detection of upstream changes, without worrying about the AUR +maintainer doing it. + +### Building someone else's stuff? Better reconcile it with automated local changes automatically + +However, the even worse corrolary of the above is if the other maintainer +*does* update `pkgrel`, since then you have to reconcile your own maintained +`pkgrel` and ensure that it strictly increases even with the maintainer's +changes. + +Another cause of needing to rebuild AUR sourced packages is the AUR package +itself changing, perhaps because upstream updated it and the AUR packager +updated their packaging. In that case, one has to discard local changes and +hope that versions strictly increased so pacman will install the new one. + +## Weightless! In the package manager! Loopy dependency graphs + +Debian ([documentedly so][debian-loopy]) and most other binary distros don't +have any tooling preventing packages forming circular build dependency graphs. +The most trivial one that exists in most any binary distribution is the C++ +compiler, which is itself likely a build dependency of the C++ compiler since +both clang and gcc are written in C++. + +How does one get the first compiler? In most distros, the answer is +"someone built it manually from somewhere and shoved it in /usr/local and then +built the first compiler package using some crimes". However, that path is, for +the most part, not documented or clearly reproducible. It is the typical state +of affairs to have the *distro repository itself* be a ball of inscrutable +mutable state. + +In NixOS it's [a tarball of compilers that's built with Nix and is occasionally +updated][nixos-bootstrap-tools], and will in the future [be rooted in a 256 +byte binary][nixos-minimal-bootstrap] after which everything is built from +source, which is what Guix also does. There's a bunch more information about +the efforts to bootstrap from nearly nothing at [bootstrappable.org], as well +as [on the Guix blog][fsb]. + +[bootstrappable.org]: https://bootstrappable.org/ + +[fsb]: https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/ +[nixos-bootstrap-tools]: https://github.com/nixos/nixpkgs/blob/d0efa70d8114756ca5aeb875b7f3cf6d61543d62/pkgs/stdenv/linux/make-bootstrap-tools.nix#L237-L256 +[nixos-minimal-bootstrap]: https://github.com/nixos/nixpkgs/blob/3dcd819caa03c848a9a06964857e12e4b789239e/pkgs/os-specific/linux/minimal-bootstrap/default.nix + +[debian-loopy]: https://wiki.debian.org/CircularBuildDependencies + +## Package tests? p--package integration t-tests?? + +So you want to write an integration test for your package on Arch Linux. That's +too bad, because there's not a testing framework, because there are not tests. +Packages can run the software's testsuite, but there is no officially supported +integration testing solution. + +# Software engineering fixes this + +I have spilled a thousand words on how traditional binary distros (that [are +not Fedora][fedora-ci]) spend a significant amount of labour doing rebuilds +largely by hand, with scripts on their local machines, coordinating amongst +maintainers. Most packages are built on developer machines, though [never on +Fedora][fedora-ci2] and only [sometimes on Debian][debian-ci], and thus cannot +necessarily be trusted to not be contaminated by the squishy mutable stuff that +happens on dev machines. Even though they are typically built in chroots, the +environment is not controlled. + +[debian-ci]: https://ci.debian.net/ + +I have addressed how packages require manually poking `pkgrel` every time a +rebuild is necessary, and how the need for rebuilds affects downstream +builders. This is, incidentally, [largely still true on +Fedora][fedora-updates]. + +The (pessimistic but sound) way to manage rebuilds is to just recompile every +downstream when a single bit of any dependency changes. This is the approach +used by Nix and it trades a significant but not unaffordably large (for a big +distro) amount of computer time in a build cluster for not having to think +about any of this. ABI breaks cannot affect the distribution because everything +was built against the exact same libraries, together. + +A Nix-like hermetic build system doesn't have a concept of `pkgrel`, because +packages are just what is in the single monorepo source tree on a given commit. +There is nothing wrong with the other approach of multiple repositories and +repository metadata that doesn't expose a single history, but it would be +useful to be able to cleanly ensure that a group of machines have exactly the +same packages on them as of some epoch, say. + +Facebook has made a tool for RPM distributions that builds OS images with +Buck2, called [Antlir]. This takes snapshots of repositories and builds OS +images with a hermetic build system, such that they receive the exact same +result every time. + +[Antlir]: https://facebookincubator.github.io/antlir/docs/ + +ABI breaks can *also* not break downstream consumers of `nixpkgs`, because Nix +builds out-of-tree stuff exactly the same using the same version set as +anything else: unlike every binary distribution, the distribution packages are +not special, and building out-of-tree stuff will never randomly break due to +ABI changes. + +NixOS has a robust and widely used (1040 of them) [integration +test][nixos-integration-tests] system, like Fedora, testing most parts of the +system and [gating repository updates][nixos-gating] like Fedora Bodhi. + +[nixos-gating]: https://status.nixos.org/ +[nixos-integration-tests]: https://nix.dev/tutorials/nixos/integration-testing-using-virtual-machines.html +[fedora-updates]: https://docs.fedoraproject.org/en-US/fesco/Updates_Policy/ +[fedora-ci2]: https://discussion.fedoraproject.org/t/report-from-the-reproducible-builds-hackfest-during-flock-2023/87469 +[fedora-ci]: https://docs.fedoraproject.org/en-US/ci/