diff --git a/content/posts/debugging-nix-package-building.md b/content/posts/debugging-nix-package-building.md new file mode 100644 index 0000000..5a2d584 --- /dev/null +++ b/content/posts/debugging-nix-package-building.md @@ -0,0 +1,229 @@ ++++ +date = "2023-10-19" +draft = false +path = "/blog/debugging-nix-package-building" +tags = ["nix"] +title = "Debugging Nix build inconsistencies: manual vs automatic build" ++++ + +This is a quick post about an issue I debugged in a package build that didn't +matter, which may however still be instructive. + +I was packaging [ios-webkit-debug-proxy] for NixOS, not having noticed it was +already there (oops! I noticed after fixing my build), and I ran into an issue +where my build was not working, so I tried building it manually, and it worked. +Wat. + +[ios-webkit-debug-proxy]: https://github.com/google/ios-webkit-debug-proxy + +Here is the `package.nix` I was using: + +```nix +{ stdenv, fetchFromGitHub, autoconf, automake, libtool, pkg-config, libimobiledevice }: +stdenv.mkDerivation { + pname = "ios-webkit-debug-proxy"; + version = "unstable-2023-09-23"; + src = fetchFromGitHub { + owner = "google"; + repo = "ios-webkit-debug-proxy"; + rev = "c5d2e959043293ee7d0587618af2bcc4dde7feef"; + sha256 = "sha256-vBrYbCzH83BqB7uTybEmF/yp2N40qJp+97lu+nKulXM="; + }; + + preConfigure = '' + NOCONFIGURE=1 ./autogen.sh + ''; + + # These are the fix ;) + # dontAddDisableDepTrack = true; + # dontDisableStatic = true; + + nativeBuildInputs = [ + autoconf + automake + libtool + pkg-config + ]; + + buildInputs = [ + libimobiledevice + ]; +} +``` + +This is what was happening: + +``` +$ nix build -L --impure --expr 'with import {}; (callPackage ../package.nix {})' +error: builder for '/nix/store/0ci7kyfmcbbxq5g5svzlp316y98vvgng-ios-webkit-debug-proxy-unstable-202 +3-09-23.drv' failed with exit code 2; + last 10 log lines: + > gcc -DHAVE_CONFIG_H -I. -I.. -I../include -I../src -I/nix/store/n4wwsam3rk07mghksrp46sn +lmf7cdi33-libimobiledevice-1.3.0+date=2023-04-30-dev/include -I/nix/store/cykm9dyc58l9jfilgrd0aflfn +znzrc39-libplist-2.3.0-dev/include -I/nix/store/cykm9dyc58l9jfilgrd0aflfnznzrc39-libplist-2.3.0-dev +/include -I/nix/store/3g9fg7cdh3234xi5b14v1fp135q8ix49-libusbmuxd-2.0.2+date=2023-04-30/include -I/ +nix/store/lqsfdc4p9vax2h55csza9iw46zjpghl6-openssl-3.0.10-dev/include -g -O2 -Wall -Werror -c -o .. +/src/socket_manager.o ../src/socket_manager.c + > ../src/socket_manager.c:34:10: fatal error: socket_manager.h: No such file or directory + > 34 | #include "socket_manager.h" + > | ^~~~~~~~~~~~~~~~~~ + > compilation terminated. + > make[2]: *** [Makefile:464: ../src/socket_manager.o] Error 1 + > make[2]: Leaving directory '/build/source/examples' + > make[1]: *** [Makefile:410: all-recursive] Error 1 + > make[1]: Leaving directory '/build/source' + > make: *** [Makefile:342: all] Error 2 + For full logs, run 'nix log /nix/store/0ci7kyfmcbbxq5g5svzlp316y98vvgng-ios-webkit-debug-pro + +$ nix develop --impure --expr 'with import {}; (callPackage ../package.nix {})' +$ cd manually-cloned-src +$ NOCONFIGURE=1 ./autogen.sh +$ make -j8 +(succeeds) +``` + +So, we know that the issue is one of two problems, either: + +* The build relies on an impurity of not being in the sandbox. This symptom + makes no sense for that, but if this is your problem, use [breakpointHook] + and [cntr] to enter the build environment. +* There is some inconsistency in the actual build process. If this is the case, + we *believe*, in this case, that Make must be doing something different, + since the `../src/socket_manager.c` target does not appear as being compiled + in the successful build (it is only used in the link stage!). + + One way to verify this is to [manually run the build] and see if the + behaviour still is occurring. I expect that in this case, it would have + (extremely hilariously!) caused the bug to reproduce differently due to + `dontAddDisableDepTrack` being automatically set in nix shells. + + +[manually run the build]: https://jade.fyi/blog/building-nix-derivations-manually/ + +[breakpointHook]: https://nixos.org/manual/nixpkgs/stable/#breakpointhook +[cntr]: https://github.com/Mic92/cntr + +What I did was to run the failing `nix build` with `--keep-failed`, then looked +in `/tmp/nix-build-*` for the failed tree. From this failed tree I found I +could reproduce the failure again using `make`. + +Let's figure out what differs between these directory trees. To do this, we can +use [diffoscope], an advanced diff tool that can compare directory structures +with nice structural diffs. + +[diffoscope]: https://diffoscope.org/ + +We believe that it's an inconsistent Makefile, so let's dig through the output and +test that: + +``` +--- ./src/Makefile ++++ /tmp/nix-build-ios-webkit-debug-proxy-unstable-2023-09-23.drv-0/source/src/Makefile ++#include ./$(DEPDIR)/sha1.Plo # am--include-marker ++#include ./$(DEPDIR)/socket_manager.Plo # am--include-marker ++#include ./$(DEPDIR)/webinspector.Plo # am--include-marker ++#include ./$(DEPDIR)/websocket.Plo # am--include-marker +-include ./$(DEPDIR)/base64.Plo # am--include-marker +-include ./$(DEPDIR)/char_buffer.Plo # am--include-marker +-include ./$(DEPDIR)/device_listener.Plo # am--include-marker +``` + +Interesting, so the Makefile would have included some things in the successful +build that were not in the other build. For context, the way that autotools +works is: + +* `Makefile.am` defines the rules to generate the `Makefile.in`, using + `automake`. +* `Makefile.in` is templated to `Makefile` when `./configure` is run. + +Autotools is a turducken of string templating on more string templating on more +string templating. String templates generate string templates which generate +new string templates which finally generate a Makefile. + +In a distribution tarball, `autoconf` and `automake` were already run so +`configure` and `Makefile.in` will be included in the tarball. + +If we look for where that probably came from in `src/Makefile.in`, we find: + +``` +@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/webinspector.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/websocket.Plo@am__quote@ # am--include-marker +``` + +We also observe in `config.status`: + +``` +-AMDEPBACKSLASH='\' +-AMDEP_FALSE='#' +-AMDEP_TRUE='' ++AMDEPBACKSLASH='' ++AMDEP_FALSE='' ++AMDEP_TRUE='#' + +... + +-am_cv_CC_dependencies_compiler_type=gcc3 ++am_cv_CC_dependencies_compiler_type=none + +... + +-am__fastdepCC_FALSE='#' +-am__fastdepCC_TRUE='' ++am__fastdepCC_FALSE='' ++am__fastdepCC_TRUE='#' +``` + +and in `config.log`: + +``` + This file contains any messages produced by compilers while + running configure, to aid debugging if configure makes a mistake. + + It was created by ios_webkit_debug_proxy configure 1.9.0, which was + generated by GNU Autoconf 2.71. Invocation command line was + +- $ ./configure ++ $ ./configure --disable-static --disable-dependency-tracking --prefix=/nix/store/19fsba4r1c0jwsqcdjs76y83ndah7vyh-ios-webkit-debug-proxy-unstable-2023-09-23 + +... + + configure:4539: checking dependency style of gcc +-configure:4651: result: gcc3 ++configure:4651: result: none + +... + + configure:13417: checking whether to build static libraries +-configure:13421: result: yes ++configure:13421: result: no +``` + +Well. It looks like the synthesis of all of this is that different configure +arguments were used, causing this. But why?! + +As is often the case, +[`setup.sh`](https://github.com/nixos/nixpkgs/blob/d5139e30179237517e09174561b82bdacc1c2a03/pkgs/stdenv/generic/setup.sh#L1262-L1276) +is responsible: + +``` +$ rg -C3 -- --disable-dependency-track ~/dev/nixpkgs +/home/jade/dev/nixpkgs/pkgs/stdenv/generic/setup.sh +1260- fi +1261- +1262- if [[ -f "$configureScript" ]]; then +1263: # Add --disable-dependency-tracking to speed up some builds. +1264- if [ -z "${dontAddDisableDepTrack:-}" ]; then +1265- if grep -q dependency-tracking "$configureScript"; then +1266: prependToVar configureFlags --disable-dependency-tracking +1267- fi +1268- fi +1269- +``` + +and likewise for `--disable-static`, if `dontDisableStatic` is unset. + +Indeed, changing the derivation to set both `dontAddDisableDepTrack` and +`dontDisableStatic` fixes the build. + +This all took altogether far longer than it should have to derive a hypothesis +for why it was broken.