diff --git a/content/posts/debugging-rr-children/index.md b/content/posts/debugging-rr-children/index.md index 6e1b11d..54554cf 100644 --- a/content/posts/debugging-rr-children/index.md +++ b/content/posts/debugging-rr-children/index.md @@ -2,35 +2,34 @@ date = "2022-02-24" draft = false path = "/blog/debugging-rr-children" -tags = [] +tags = ["debugging", "rr"] title = "Debugging: using rr to deal with unruly children (processes)" +++ -I have done multiple rounds of debugging blobs of processes that start together -and then something bad happens in one of the children several forks down. -Although gdb claims to support child processes with `set follow-fork-mode` +I have wasted a lot of time debugging multi-process systems, especially those +that are some native program with a shell startup script. Although gdb claims +to support child processes with `set follow-fork-mode` ([docs][gdb-follow-fork-mode]), in practice, this is extremely painful since -you may have to set it to multiple different things in one reproduction. +gdb needs to be told, at each fork, which way it should go. To deal with these, I've done such hacks as writing wrapper scripts for the -executable at fault that run it in a gdbserver. However, by far the worst one +executable at fault that run it in a gdbserver. Certainly, the worst such hack I've done is printing out the PID of the misbehaving process then waiting, to -give me time to attach the debugger ([this is even suggested in the gdb -documentation][gdb-sleep]). +give me time to attach the debugger ([which is even a method suggested in the gdb +documentation][gdb-sleep]). Thankfully, there is a better way. [gdb-follow-fork-mode]: https://docs.jade.fyi/gnu/gdb/gdb.html#index-set-follow_002dfork_002dmode [gdb-sleep]: https://docs.jade.fyi/gnu/gdb/gdb.html#Forks -## Using rr to do it the easy way - For this demo, I am using two programs I wrote: -- `crasher` just prints out that it's about to crash, then aborts. +- `crasher` prints out that it's about to crash, then aborts. - `caller` forks and executes `crasher`, then prints its return value once it exits. -These are written in C but their source is not super interesting. Nevertheless, -you can find their source code [at the bottom of the post](#source). +These are written in C, but they could equally be written in Rust or some other +native language. You can find their source code [at the bottom of the +post](#source). Here they are in action: @@ -41,21 +40,21 @@ Here they are in action: [caller] waitpid: 158938, exited? 0 status 0, signaled? 1 signal 6 ``` -Signal 6, if you consult the table in `man 'signal(7)'`, is `SIGABRT` as +Signal 6, after consulting the table in `man 'signal(7)'`, is `SIGABRT` as expected. -We want to figure out why the crasher is crashing. It's possible to do with -gdb, but that's unnecessarily hard because of gdb, even moreso if it forks -multiple times. +We want to figure out where the crasher is crashing. + +## Using rr to do it the easy way Let's use `rr` to do this more easily. First, record a run: ``` » rr record ./caller -rr: Saving execution to trace directory `/home/lf/.local/share/rr/caller-0'. -[caller] spawned pid 159146 +rr: Saving execution to trace directory `/home/jade/.local/share/rr/caller-0'. +[caller] spawned pid 1432674 [crasher] about to crash -[caller] waitpid: 159146, exited? 0 status 0, signaled? 1 signal 6 +[caller] waitpid: 1432674, exited? 0 status 0, signaled? 1 signal 6 ``` Then find the process ID of the crashing process: @@ -63,18 +62,132 @@ Then find the process ID of the crashing process: ``` » rr ps PID PPID EXIT CMD -159145 -- 0 ./caller -159146 159145 -6 ./crasher +1432673 -- 0 ./caller +1432674 1432673 -6 ./crasher ``` Next, use either `--onfork=` or `--onprocess=` to get a debugger on the problem process: ``` +» rr replay --onprocess=1432674 +Reading symbols from /home/jade/.local/share/rr/caller-0/mmap_hardlink_37_crasher... +Remote debugging using 127.0.0.1:17412 +-------------------------------------------------- + ---> Reached target process 1432674 at event 138. +-------------------------------------------------- +Reading symbols from /lib64/ld-linux-x86-64.so.2... +(No debugging symbols found in /lib64/ld-linux-x86-64.so.2) +BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file +0x00007f01f1ffb930 in _start () from /lib64/ld-linux-x86-64.so.2 +=> 0x00007f01f1ffb930 <_start+0>: 48 89 e7 mov %rsp,%rdi +(rr) continue +Continuing. +[caller] spawned pid 1432674 +[crasher] about to crash + +Program received signal SIGABRT, Aborted. +0x00007f01f1e0734c in __pthread_kill_implementation () from /usr/lib/libc.so.6 +=> 0x00007f01f1e0734c <__pthread_kill_implementation+284>: 89 c5 mov %eax,%ebp +(rr) backtrace +#0 0x00007f01f1e0734c in __pthread_kill_implementation () from /usr/lib/libc.so.6 +#1 0x00007f01f1dba4b8 in raise () from /usr/lib/libc.so.6 +#2 0x00007f01f1da4534 in abort () from /usr/lib/libc.so.6 +#3 0x000055757da48161 in main () at crasher.c:6 +(rr) frame 3 +#3 0x000055757da48161 in main () at crasher.c:6 +6 abort(); +(rr) list +1 #include +2 #include +3 +4 int main(void) { +5 printf("[crasher] about to crash\n"); +6 abort(); +7 } +(rr) +``` + +Sidebar: the lack of spew on gdb startup is a gdb 11 feature! Put `set +startup-quietly on` into `~/.config/gdb/gdbearlyinit` to get that. + +## Annotating output with event numbers + +Another neat feature I found in rr is the ability to annotate output with the +PIDs and event numbers with `--mark-stdio`, which can be useful if you have a +program that is doing a bunch of things before the event of interest. + +In this example, `--autopilot` makes it run without the debugger attached: + +``` +» rr replay --autopilot --mark-stdio +[rr 1432673 159][caller] spawned pid 1432674 +[rr 1432674 271][crasher] about to crash +[rr 1432673 289][caller] waitpid: 1432674, exited? 0 status 0, signaled? 1 signal 6 +``` + +Then I can dump myself into a debugger right at the print call that printed +"about to crash" like so: + +``` +» rr replay --onprocess=1432674 --goto 271 +[caller] spawned pid 1432674 +Reading symbols from /home/jade/.local/share/rr/caller-0/mmap_hardlink_37_crasher... +Remote debugging using 127.0.0.1:20652 + +-------------------------------------------------- + ---> Reached target process 1432674 at event 272. +-------------------------------------------------- +Reading symbols from /usr/bin/../lib64/rr/librrpreload.so... +Reading symbols from /usr/lib/libc.so.6... +(No debugging symbols found in /usr/lib/libc.so.6) +Reading symbols from /lib64/ld-linux-x86-64.so.2... +(No debugging symbols found in /lib64/ld-linux-x86-64.so.2) +BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file +0x0000000070000002 in syscall_traced () +=> 0x0000000070000002: c3 ret +(rr) bt +#0 0x0000000070000002 in syscall_traced () +#1 0x00007f01f1fd0430 in _raw_syscall () + at /home/jade/builds/rr/src/rr-5.5.0/src/preload/raw_syscall.S:120 +/* .... */ +#13 0x00007f01f1dfe543 in __GI__IO_file_overflow () from /usr/lib/libc.so.6 +#14 0x00007f01f1df36fa in puts () from /usr/lib/libc.so.6 +#15 0x000055757da4815c in main () at crasher.c:5 +(rr) frame 15 +#15 0x000055757da4815c in main () at crasher.c:5 +5 printf("[crasher] about to crash\n"); +(rr) list +1 #include +2 #include +3 +4 int main(void) { +5 printf("[crasher] about to crash\n"); +6 abort(); +7 } +``` + +## Slightly less bad way with gdb + +As of a release some time in the last decade, gdb supports [`set detach-on-fork +off`](detach-on-fork), which, as the docs say, keeps both parent and child as +debugees in gdb by pausing the one that's not actively being interacted with. +This is not significantly better than `follow-fork-mode`, since it instead +requires you to run each debugee to when it will wait next. + +I tried it while writing this post and found I had to set an `exec` +[catchpoint](catchpoint) (with `catch exec`) then continue the parent process, +interrupt it when it is stuck in the wait call, switch threads to the other +debugee, then `continue` such that it hits the catchpoint. + +I would love to hear about better ways to do this in gdb. + +[detach-on-fork]: https://docs.jade.fyi/gnu/gdb/gdb.html#index-set-detach_002don_002dfork ## Demo source {#source} {{ codefile(path="caller.c", code_lang="c", colocated=true, hide=true) }} {{ codefile(path="crasher.c", code_lang="c", colocated=true, hide=true) }} {{ codefile(path="Makefile", code_lang="make", colocated=true, hide=true) }} + diff --git a/preview.sh b/preview.sh new file mode 100755 index 0000000..8914dea --- /dev/null +++ b/preview.sh @@ -0,0 +1,34 @@ +#!/bin/bash + +set -euo pipefail + +print_error() { +cat >&2 <<-EOF +Please create a .deploy_config with content: +PREVIEW_HOST=[the host you rsync to] +PREVIEW_DIR=some-subdomain +PREVIEW=https://blah +EOF +} + +if [[ ! -e .deploy_config ]]; then + print_error + exit 1 +fi + +source ./.deploy_config + +if [[ -z "${PREVIEW_HOST:-}" || -z "${PREVIEW_DIR:-}" || -z "${PREVIEW:-}" ]]; then + print_error + exit 1 +fi + +PREVIEW_TEMP=/tmp/zola-preview + +rm -rf "$PREVIEW_TEMP" +zola build --drafts --base-url "$PREVIEW" --output-dir "$PREVIEW_TEMP" + +# trailing slash: copy contents of directory into destination +rsync --verbose --human-readable --recursive \ + --links --times --new-compress --delete-delay \ + ${EXTRA_OPTS_PREVIEW:-} "$PREVIEW_TEMP/" "${PREVIEW_HOST}:${PREVIEW_DIR}"