blog/content/posts/debugging-nix-package-building.md
2023-10-19 14:32:17 -07:00

229 lines
7.9 KiB
Markdown

+++
date = "2023-10-19"
draft = false
path = "/blog/debugging-nix-package-building"
tags = ["nix"]
title = "Debugging Nix build inconsistencies: manual vs automatic build"
+++
This is a quick post about an issue I debugged in a package build that didn't
matter, which may however still be instructive.
I was packaging [ios-webkit-debug-proxy] for NixOS, not having noticed it was
already there (oops! I noticed after fixing my build), and I ran into an issue
where my build was not working, so I tried building it manually, and it worked.
Wat.
[ios-webkit-debug-proxy]: https://github.com/google/ios-webkit-debug-proxy
Here is the `package.nix` I was using:
```nix
{ stdenv, fetchFromGitHub, autoconf, automake, libtool, pkg-config, libimobiledevice }:
stdenv.mkDerivation {
pname = "ios-webkit-debug-proxy";
version = "unstable-2023-09-23";
src = fetchFromGitHub {
owner = "google";
repo = "ios-webkit-debug-proxy";
rev = "c5d2e959043293ee7d0587618af2bcc4dde7feef";
sha256 = "sha256-vBrYbCzH83BqB7uTybEmF/yp2N40qJp+97lu+nKulXM=";
};
preConfigure = ''
NOCONFIGURE=1 ./autogen.sh
'';
# These are the fix ;)
# dontAddDisableDepTrack = true;
# dontDisableStatic = true;
nativeBuildInputs = [
autoconf
automake
libtool
pkg-config
];
buildInputs = [
libimobiledevice
];
}
```
This is what was happening:
```
$ nix build -L --impure --expr 'with import <nixpkgs> {}; (callPackage ../package.nix {})'
error: builder for '/nix/store/0ci7kyfmcbbxq5g5svzlp316y98vvgng-ios-webkit-debug-proxy-unstable-202
3-09-23.drv' failed with exit code 2;
last 10 log lines:
> gcc -DHAVE_CONFIG_H -I. -I.. -I../include -I../src -I/nix/store/n4wwsam3rk07mghksrp46sn
lmf7cdi33-libimobiledevice-1.3.0+date=2023-04-30-dev/include -I/nix/store/cykm9dyc58l9jfilgrd0aflfn
znzrc39-libplist-2.3.0-dev/include -I/nix/store/cykm9dyc58l9jfilgrd0aflfnznzrc39-libplist-2.3.0-dev
/include -I/nix/store/3g9fg7cdh3234xi5b14v1fp135q8ix49-libusbmuxd-2.0.2+date=2023-04-30/include -I/
nix/store/lqsfdc4p9vax2h55csza9iw46zjpghl6-openssl-3.0.10-dev/include -g -O2 -Wall -Werror -c -o ..
/src/socket_manager.o ../src/socket_manager.c
> ../src/socket_manager.c:34:10: fatal error: socket_manager.h: No such file or directory
> 34 | #include "socket_manager.h"
> | ^~~~~~~~~~~~~~~~~~
> compilation terminated.
> make[2]: *** [Makefile:464: ../src/socket_manager.o] Error 1
> make[2]: Leaving directory '/build/source/examples'
> make[1]: *** [Makefile:410: all-recursive] Error 1
> make[1]: Leaving directory '/build/source'
> make: *** [Makefile:342: all] Error 2
For full logs, run 'nix log /nix/store/0ci7kyfmcbbxq5g5svzlp316y98vvgng-ios-webkit-debug-pro
$ nix develop --impure --expr 'with import <nixpkgs> {}; (callPackage ../package.nix {})'
$ cd manually-cloned-src
$ NOCONFIGURE=1 ./autogen.sh
$ make -j8
(succeeds)
```
So, we know that the issue is one of two problems, either:
* The build relies on an impurity of not being in the sandbox. This symptom
makes no sense for that, but if this is your problem, use [breakpointHook]
and [cntr] to enter the build environment.
* There is some inconsistency in the actual build process. If this is the case,
we *believe*, in this case, that Make must be doing something different,
since the `../src/socket_manager.c` target does not appear as being compiled
in the successful build (it is only used in the link stage!).
One way to verify this is to [manually run the build] and see if the
behaviour still is occurring. I expect that in this case, it would have
(extremely hilariously!) caused the bug to reproduce differently due to
`dontAddDisableDepTrack` being automatically set in nix shells.
[manually run the build]: https://jade.fyi/blog/building-nix-derivations-manually/
[breakpointHook]: https://nixos.org/manual/nixpkgs/stable/#breakpointhook
[cntr]: https://github.com/Mic92/cntr
What I did was to run the failing `nix build` with `--keep-failed`, then looked
in `/tmp/nix-build-*` for the failed tree. From this failed tree I found I
could reproduce the failure again using `make`.
Let's figure out what differs between these directory trees. To do this, we can
use [diffoscope], an advanced diff tool that can compare directory structures
with nice structural diffs.
[diffoscope]: https://diffoscope.org/
We believe that it's an inconsistent Makefile, so let's dig through the output and
test that:
```
--- ./src/Makefile
+++ /tmp/nix-build-ios-webkit-debug-proxy-unstable-2023-09-23.drv-0/source/src/Makefile
+#include ./$(DEPDIR)/sha1.Plo # am--include-marker
+#include ./$(DEPDIR)/socket_manager.Plo # am--include-marker
+#include ./$(DEPDIR)/webinspector.Plo # am--include-marker
+#include ./$(DEPDIR)/websocket.Plo # am--include-marker
-include ./$(DEPDIR)/base64.Plo # am--include-marker
-include ./$(DEPDIR)/char_buffer.Plo # am--include-marker
-include ./$(DEPDIR)/device_listener.Plo # am--include-marker
```
Interesting, so the Makefile would have included some things in the successful
build that were not in the other build. For context, the way that autotools
works is:
* `Makefile.am` defines the rules to generate the `Makefile.in`, using
`automake`.
* `Makefile.in` is templated to `Makefile` when `./configure` is run.
Autotools is a turducken of string templating on more string templating on more
string templating. String templates generate string templates which generate
new string templates which finally generate a Makefile.
In a distribution tarball, `autoconf` and `automake` were already run so
`configure` and `Makefile.in` will be included in the tarball.
If we look for where that probably came from in `src/Makefile.in`, we find:
```
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/webinspector.Plo@am__quote@ # am--include-marker
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/websocket.Plo@am__quote@ # am--include-marker
```
We also observe in `config.status`:
```
-AMDEPBACKSLASH='\'
-AMDEP_FALSE='#'
-AMDEP_TRUE=''
+AMDEPBACKSLASH=''
+AMDEP_FALSE=''
+AMDEP_TRUE='#'
...
-am_cv_CC_dependencies_compiler_type=gcc3
+am_cv_CC_dependencies_compiler_type=none
...
-am__fastdepCC_FALSE='#'
-am__fastdepCC_TRUE=''
+am__fastdepCC_FALSE=''
+am__fastdepCC_TRUE='#'
```
and in `config.log`:
```
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.
It was created by ios_webkit_debug_proxy configure 1.9.0, which was
generated by GNU Autoconf 2.71. Invocation command line was
- $ ./configure
+ $ ./configure --disable-static --disable-dependency-tracking --prefix=/nix/store/19fsba4r1c0jwsqcdjs76y83ndah7vyh-ios-webkit-debug-proxy-unstable-2023-09-23
...
configure:4539: checking dependency style of gcc
-configure:4651: result: gcc3
+configure:4651: result: none
...
configure:13417: checking whether to build static libraries
-configure:13421: result: yes
+configure:13421: result: no
```
Well. It looks like the synthesis of all of this is that different configure
arguments were used, causing this. But why?!
As is often the case,
[`setup.sh`](https://github.com/nixos/nixpkgs/blob/d5139e30179237517e09174561b82bdacc1c2a03/pkgs/stdenv/generic/setup.sh#L1262-L1276)
is responsible:
```
$ rg -C3 -- --disable-dependency-track ~/dev/nixpkgs
/home/jade/dev/nixpkgs/pkgs/stdenv/generic/setup.sh
1260- fi
1261-
1262- if [[ -f "$configureScript" ]]; then
1263: # Add --disable-dependency-tracking to speed up some builds.
1264- if [ -z "${dontAddDisableDepTrack:-}" ]; then
1265- if grep -q dependency-tracking "$configureScript"; then
1266: prependToVar configureFlags --disable-dependency-tracking
1267- fi
1268- fi
1269-
```
and likewise for `--disable-static`, if `dontDisableStatic` is unset.
Indeed, changing the derivation to set both `dontAddDisableDepTrack` and
`dontDisableStatic` fixes the build.
This all took altogether far longer than it should have to derive a hypothesis
for why it was broken.