Add post on debugging with rr

This commit is contained in:
Jade Lovelace 2022-03-15 13:52:18 -07:00
parent 5a1c98d8a4
commit 7db1a7c496
2 changed files with 169 additions and 22 deletions

View file

@ -2,35 +2,34 @@
date = "2022-02-24"
draft = false
path = "/blog/debugging-rr-children"
tags = []
tags = ["debugging", "rr"]
title = "Debugging: using rr to deal with unruly children (processes)"
+++
I have done multiple rounds of debugging blobs of processes that start together
and then something bad happens in one of the children several forks down.
Although gdb claims to support child processes with `set follow-fork-mode`
I have wasted a lot of time debugging multi-process systems, especially those
that are some native program with a shell startup script. Although gdb claims
to support child processes with `set follow-fork-mode`
([docs][gdb-follow-fork-mode]), in practice, this is extremely painful since
you may have to set it to multiple different things in one reproduction.
gdb needs to be told, at each fork, which way it should go.
To deal with these, I've done such hacks as writing wrapper scripts for the
executable at fault that run it in a gdbserver. However, by far the worst one
executable at fault that run it in a gdbserver. Certainly, the worst such hack
I've done is printing out the PID of the misbehaving process then waiting, to
give me time to attach the debugger ([this is even suggested in the gdb
documentation][gdb-sleep]).
give me time to attach the debugger ([which is even a method suggested in the gdb
documentation][gdb-sleep]). Thankfully, there is a better way.
[gdb-follow-fork-mode]: https://docs.jade.fyi/gnu/gdb/gdb.html#index-set-follow_002dfork_002dmode
[gdb-sleep]: https://docs.jade.fyi/gnu/gdb/gdb.html#Forks
## Using rr to do it the easy way
For this demo, I am using two programs I wrote:
- `crasher` just prints out that it's about to crash, then aborts.
- `crasher` prints out that it's about to crash, then aborts.
- `caller` forks and executes `crasher`, then prints its return value once it
exits.
These are written in C but their source is not super interesting. Nevertheless,
you can find their source code [at the bottom of the post](#source).
These are written in C, but they could equally be written in Rust or some other
native language. You can find their source code [at the bottom of the
post](#source).
Here they are in action:
@ -41,21 +40,21 @@ Here they are in action:
[caller] waitpid: 158938, exited? 0 status 0, signaled? 1 signal 6
```
Signal 6, if you consult the table in `man 'signal(7)'`, is `SIGABRT` as
Signal 6, after consulting the table in `man 'signal(7)'`, is `SIGABRT` as
expected.
We want to figure out why the crasher is crashing. It's possible to do with
gdb, but that's unnecessarily hard because of gdb, even moreso if it forks
multiple times.
We want to figure out where the crasher is crashing.
## Using rr to do it the easy way
Let's use `rr` to do this more easily. First, record a run:
```
» rr record ./caller
rr: Saving execution to trace directory `/home/lf/.local/share/rr/caller-0'.
[caller] spawned pid 159146
rr: Saving execution to trace directory `/home/jade/.local/share/rr/caller-0'.
[caller] spawned pid 1432674
[crasher] about to crash
[caller] waitpid: 159146, exited? 0 status 0, signaled? 1 signal 6
[caller] waitpid: 1432674, exited? 0 status 0, signaled? 1 signal 6
```
Then find the process ID of the crashing process:
@ -63,18 +62,132 @@ Then find the process ID of the crashing process:
```
» rr ps
PID PPID EXIT CMD
159145 -- 0 ./caller
159146 159145 -6 ./crasher
1432673 -- 0 ./caller
1432674 1432673 -6 ./crasher
```
Next, use either `--onfork=<PID>` or `--onprocess=<PID>` to get a debugger on
the problem process:
```
» rr replay --onprocess=1432674
Reading symbols from /home/jade/.local/share/rr/caller-0/mmap_hardlink_37_crasher...
Remote debugging using 127.0.0.1:17412
--------------------------------------------------
---> Reached target process 1432674 at event 138.
--------------------------------------------------
Reading symbols from /lib64/ld-linux-x86-64.so.2...
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file
0x00007f01f1ffb930 in _start () from /lib64/ld-linux-x86-64.so.2
=> 0x00007f01f1ffb930 <_start+0>: 48 89 e7 mov %rsp,%rdi
(rr) continue
Continuing.
[caller] spawned pid 1432674
[crasher] about to crash
Program received signal SIGABRT, Aborted.
0x00007f01f1e0734c in __pthread_kill_implementation () from /usr/lib/libc.so.6
=> 0x00007f01f1e0734c <__pthread_kill_implementation+284>: 89 c5 mov %eax,%ebp
(rr) backtrace
#0 0x00007f01f1e0734c in __pthread_kill_implementation () from /usr/lib/libc.so.6
#1 0x00007f01f1dba4b8 in raise () from /usr/lib/libc.so.6
#2 0x00007f01f1da4534 in abort () from /usr/lib/libc.so.6
#3 0x000055757da48161 in main () at crasher.c:6
(rr) frame 3
#3 0x000055757da48161 in main () at crasher.c:6
6 abort();
(rr) list
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5 printf("[crasher] about to crash\n");
6 abort();
7 }
(rr)
```
Sidebar: the lack of spew on gdb startup is a gdb 11 feature! Put `set
startup-quietly on` into `~/.config/gdb/gdbearlyinit` to get that.
## Annotating output with event numbers
Another neat feature I found in rr is the ability to annotate output with the
PIDs and event numbers with `--mark-stdio`, which can be useful if you have a
program that is doing a bunch of things before the event of interest.
In this example, `--autopilot` makes it run without the debugger attached:
```
» rr replay --autopilot --mark-stdio
[rr 1432673 159][caller] spawned pid 1432674
[rr 1432674 271][crasher] about to crash
[rr 1432673 289][caller] waitpid: 1432674, exited? 0 status 0, signaled? 1 signal 6
```
Then I can dump myself into a debugger right at the print call that printed
"about to crash" like so:
```
» rr replay --onprocess=1432674 --goto 271
[caller] spawned pid 1432674
Reading symbols from /home/jade/.local/share/rr/caller-0/mmap_hardlink_37_crasher...
Remote debugging using 127.0.0.1:20652
--------------------------------------------------
---> Reached target process 1432674 at event 272.
--------------------------------------------------
Reading symbols from /usr/bin/../lib64/rr/librrpreload.so...
Reading symbols from /usr/lib/libc.so.6...
(No debugging symbols found in /usr/lib/libc.so.6)
Reading symbols from /lib64/ld-linux-x86-64.so.2...
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file
0x0000000070000002 in syscall_traced ()
=> 0x0000000070000002: c3 ret
(rr) bt
#0 0x0000000070000002 in syscall_traced ()
#1 0x00007f01f1fd0430 in _raw_syscall ()
at /home/jade/builds/rr/src/rr-5.5.0/src/preload/raw_syscall.S:120
/* .... */
#13 0x00007f01f1dfe543 in __GI__IO_file_overflow () from /usr/lib/libc.so.6
#14 0x00007f01f1df36fa in puts () from /usr/lib/libc.so.6
#15 0x000055757da4815c in main () at crasher.c:5
(rr) frame 15
#15 0x000055757da4815c in main () at crasher.c:5
5 printf("[crasher] about to crash\n");
(rr) list
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5 printf("[crasher] about to crash\n");
6 abort();
7 }
```
## Slightly less bad way with gdb
As of a release some time in the last decade, gdb supports [`set detach-on-fork
off`](detach-on-fork), which, as the docs say, keeps both parent and child as
debugees in gdb by pausing the one that's not actively being interacted with.
This is not significantly better than `follow-fork-mode`, since it instead
requires you to run each debugee to when it will wait next.
I tried it while writing this post and found I had to set an `exec`
[catchpoint](catchpoint) (with `catch exec`) then continue the parent process,
interrupt it when it is stuck in the wait call, switch threads to the other
debugee, then `continue` such that it hits the catchpoint.
I would love to hear about better ways to do this in gdb.
[detach-on-fork]: https://docs.jade.fyi/gnu/gdb/gdb.html#index-set-detach_002don_002dfork
## Demo source {#source}
{{ codefile(path="caller.c", code_lang="c", colocated=true, hide=true) }}
{{ codefile(path="crasher.c", code_lang="c", colocated=true, hide=true) }}
{{ codefile(path="Makefile", code_lang="make", colocated=true, hide=true) }}