Add post on debugging with rr
This commit is contained in:
parent
5a1c98d8a4
commit
7db1a7c496
2 changed files with 169 additions and 22 deletions
|
|
@ -2,35 +2,34 @@
|
|||
date = "2022-02-24"
|
||||
draft = false
|
||||
path = "/blog/debugging-rr-children"
|
||||
tags = []
|
||||
tags = ["debugging", "rr"]
|
||||
title = "Debugging: using rr to deal with unruly children (processes)"
|
||||
+++
|
||||
|
||||
I have done multiple rounds of debugging blobs of processes that start together
|
||||
and then something bad happens in one of the children several forks down.
|
||||
Although gdb claims to support child processes with `set follow-fork-mode`
|
||||
I have wasted a lot of time debugging multi-process systems, especially those
|
||||
that are some native program with a shell startup script. Although gdb claims
|
||||
to support child processes with `set follow-fork-mode`
|
||||
([docs][gdb-follow-fork-mode]), in practice, this is extremely painful since
|
||||
you may have to set it to multiple different things in one reproduction.
|
||||
gdb needs to be told, at each fork, which way it should go.
|
||||
|
||||
To deal with these, I've done such hacks as writing wrapper scripts for the
|
||||
executable at fault that run it in a gdbserver. However, by far the worst one
|
||||
executable at fault that run it in a gdbserver. Certainly, the worst such hack
|
||||
I've done is printing out the PID of the misbehaving process then waiting, to
|
||||
give me time to attach the debugger ([this is even suggested in the gdb
|
||||
documentation][gdb-sleep]).
|
||||
give me time to attach the debugger ([which is even a method suggested in the gdb
|
||||
documentation][gdb-sleep]). Thankfully, there is a better way.
|
||||
|
||||
[gdb-follow-fork-mode]: https://docs.jade.fyi/gnu/gdb/gdb.html#index-set-follow_002dfork_002dmode
|
||||
[gdb-sleep]: https://docs.jade.fyi/gnu/gdb/gdb.html#Forks
|
||||
|
||||
## Using rr to do it the easy way
|
||||
|
||||
For this demo, I am using two programs I wrote:
|
||||
|
||||
- `crasher` just prints out that it's about to crash, then aborts.
|
||||
- `crasher` prints out that it's about to crash, then aborts.
|
||||
- `caller` forks and executes `crasher`, then prints its return value once it
|
||||
exits.
|
||||
|
||||
These are written in C but their source is not super interesting. Nevertheless,
|
||||
you can find their source code [at the bottom of the post](#source).
|
||||
These are written in C, but they could equally be written in Rust or some other
|
||||
native language. You can find their source code [at the bottom of the
|
||||
post](#source).
|
||||
|
||||
Here they are in action:
|
||||
|
||||
|
|
@ -41,21 +40,21 @@ Here they are in action:
|
|||
[caller] waitpid: 158938, exited? 0 status 0, signaled? 1 signal 6
|
||||
```
|
||||
|
||||
Signal 6, if you consult the table in `man 'signal(7)'`, is `SIGABRT` as
|
||||
Signal 6, after consulting the table in `man 'signal(7)'`, is `SIGABRT` as
|
||||
expected.
|
||||
|
||||
We want to figure out why the crasher is crashing. It's possible to do with
|
||||
gdb, but that's unnecessarily hard because of gdb, even moreso if it forks
|
||||
multiple times.
|
||||
We want to figure out where the crasher is crashing.
|
||||
|
||||
## Using rr to do it the easy way
|
||||
|
||||
Let's use `rr` to do this more easily. First, record a run:
|
||||
|
||||
```
|
||||
» rr record ./caller
|
||||
rr: Saving execution to trace directory `/home/lf/.local/share/rr/caller-0'.
|
||||
[caller] spawned pid 159146
|
||||
rr: Saving execution to trace directory `/home/jade/.local/share/rr/caller-0'.
|
||||
[caller] spawned pid 1432674
|
||||
[crasher] about to crash
|
||||
[caller] waitpid: 159146, exited? 0 status 0, signaled? 1 signal 6
|
||||
[caller] waitpid: 1432674, exited? 0 status 0, signaled? 1 signal 6
|
||||
```
|
||||
|
||||
Then find the process ID of the crashing process:
|
||||
|
|
@ -63,18 +62,132 @@ Then find the process ID of the crashing process:
|
|||
```
|
||||
» rr ps
|
||||
PID PPID EXIT CMD
|
||||
159145 -- 0 ./caller
|
||||
159146 159145 -6 ./crasher
|
||||
1432673 -- 0 ./caller
|
||||
1432674 1432673 -6 ./crasher
|
||||
```
|
||||
|
||||
Next, use either `--onfork=<PID>` or `--onprocess=<PID>` to get a debugger on
|
||||
the problem process:
|
||||
|
||||
```
|
||||
» rr replay --onprocess=1432674
|
||||
Reading symbols from /home/jade/.local/share/rr/caller-0/mmap_hardlink_37_crasher...
|
||||
Remote debugging using 127.0.0.1:17412
|
||||
|
||||
--------------------------------------------------
|
||||
---> Reached target process 1432674 at event 138.
|
||||
--------------------------------------------------
|
||||
Reading symbols from /lib64/ld-linux-x86-64.so.2...
|
||||
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
|
||||
BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file
|
||||
0x00007f01f1ffb930 in _start () from /lib64/ld-linux-x86-64.so.2
|
||||
=> 0x00007f01f1ffb930 <_start+0>: 48 89 e7 mov %rsp,%rdi
|
||||
(rr) continue
|
||||
Continuing.
|
||||
[caller] spawned pid 1432674
|
||||
[crasher] about to crash
|
||||
|
||||
Program received signal SIGABRT, Aborted.
|
||||
0x00007f01f1e0734c in __pthread_kill_implementation () from /usr/lib/libc.so.6
|
||||
=> 0x00007f01f1e0734c <__pthread_kill_implementation+284>: 89 c5 mov %eax,%ebp
|
||||
(rr) backtrace
|
||||
#0 0x00007f01f1e0734c in __pthread_kill_implementation () from /usr/lib/libc.so.6
|
||||
#1 0x00007f01f1dba4b8 in raise () from /usr/lib/libc.so.6
|
||||
#2 0x00007f01f1da4534 in abort () from /usr/lib/libc.so.6
|
||||
#3 0x000055757da48161 in main () at crasher.c:6
|
||||
(rr) frame 3
|
||||
#3 0x000055757da48161 in main () at crasher.c:6
|
||||
6 abort();
|
||||
(rr) list
|
||||
1 #include <stdio.h>
|
||||
2 #include <stdlib.h>
|
||||
3
|
||||
4 int main(void) {
|
||||
5 printf("[crasher] about to crash\n");
|
||||
6 abort();
|
||||
7 }
|
||||
(rr)
|
||||
```
|
||||
|
||||
Sidebar: the lack of spew on gdb startup is a gdb 11 feature! Put `set
|
||||
startup-quietly on` into `~/.config/gdb/gdbearlyinit` to get that.
|
||||
|
||||
## Annotating output with event numbers
|
||||
|
||||
Another neat feature I found in rr is the ability to annotate output with the
|
||||
PIDs and event numbers with `--mark-stdio`, which can be useful if you have a
|
||||
program that is doing a bunch of things before the event of interest.
|
||||
|
||||
In this example, `--autopilot` makes it run without the debugger attached:
|
||||
|
||||
```
|
||||
» rr replay --autopilot --mark-stdio
|
||||
[rr 1432673 159][caller] spawned pid 1432674
|
||||
[rr 1432674 271][crasher] about to crash
|
||||
[rr 1432673 289][caller] waitpid: 1432674, exited? 0 status 0, signaled? 1 signal 6
|
||||
```
|
||||
|
||||
Then I can dump myself into a debugger right at the print call that printed
|
||||
"about to crash" like so:
|
||||
|
||||
```
|
||||
» rr replay --onprocess=1432674 --goto 271
|
||||
[caller] spawned pid 1432674
|
||||
Reading symbols from /home/jade/.local/share/rr/caller-0/mmap_hardlink_37_crasher...
|
||||
Remote debugging using 127.0.0.1:20652
|
||||
|
||||
--------------------------------------------------
|
||||
---> Reached target process 1432674 at event 272.
|
||||
--------------------------------------------------
|
||||
Reading symbols from /usr/bin/../lib64/rr/librrpreload.so...
|
||||
Reading symbols from /usr/lib/libc.so.6...
|
||||
(No debugging symbols found in /usr/lib/libc.so.6)
|
||||
Reading symbols from /lib64/ld-linux-x86-64.so.2...
|
||||
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
|
||||
BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file
|
||||
0x0000000070000002 in syscall_traced ()
|
||||
=> 0x0000000070000002: c3 ret
|
||||
(rr) bt
|
||||
#0 0x0000000070000002 in syscall_traced ()
|
||||
#1 0x00007f01f1fd0430 in _raw_syscall ()
|
||||
at /home/jade/builds/rr/src/rr-5.5.0/src/preload/raw_syscall.S:120
|
||||
/* .... */
|
||||
#13 0x00007f01f1dfe543 in __GI__IO_file_overflow () from /usr/lib/libc.so.6
|
||||
#14 0x00007f01f1df36fa in puts () from /usr/lib/libc.so.6
|
||||
#15 0x000055757da4815c in main () at crasher.c:5
|
||||
(rr) frame 15
|
||||
#15 0x000055757da4815c in main () at crasher.c:5
|
||||
5 printf("[crasher] about to crash\n");
|
||||
(rr) list
|
||||
1 #include <stdio.h>
|
||||
2 #include <stdlib.h>
|
||||
3
|
||||
4 int main(void) {
|
||||
5 printf("[crasher] about to crash\n");
|
||||
6 abort();
|
||||
7 }
|
||||
```
|
||||
|
||||
## Slightly less bad way with gdb
|
||||
|
||||
As of a release some time in the last decade, gdb supports [`set detach-on-fork
|
||||
off`](detach-on-fork), which, as the docs say, keeps both parent and child as
|
||||
debugees in gdb by pausing the one that's not actively being interacted with.
|
||||
This is not significantly better than `follow-fork-mode`, since it instead
|
||||
requires you to run each debugee to when it will wait next.
|
||||
|
||||
I tried it while writing this post and found I had to set an `exec`
|
||||
[catchpoint](catchpoint) (with `catch exec`) then continue the parent process,
|
||||
interrupt it when it is stuck in the wait call, switch threads to the other
|
||||
debugee, then `continue` such that it hits the catchpoint.
|
||||
|
||||
I would love to hear about better ways to do this in gdb.
|
||||
|
||||
[detach-on-fork]: https://docs.jade.fyi/gnu/gdb/gdb.html#index-set-detach_002don_002dfork
|
||||
|
||||
## Demo source {#source}
|
||||
|
||||
{{ codefile(path="caller.c", code_lang="c", colocated=true, hide=true) }}
|
||||
{{ codefile(path="crasher.c", code_lang="c", colocated=true, hide=true) }}
|
||||
{{ codefile(path="Makefile", code_lang="make", colocated=true, hide=true) }}
|
||||
|
||||
|
|
|
|||
34
preview.sh
Executable file
34
preview.sh
Executable file
|
|
@ -0,0 +1,34 @@
|
|||
#!/bin/bash
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
print_error() {
|
||||
cat >&2 <<-EOF
|
||||
Please create a .deploy_config with content:
|
||||
PREVIEW_HOST=[the host you rsync to]
|
||||
PREVIEW_DIR=some-subdomain
|
||||
PREVIEW=https://blah
|
||||
EOF
|
||||
}
|
||||
|
||||
if [[ ! -e .deploy_config ]]; then
|
||||
print_error
|
||||
exit 1
|
||||
fi
|
||||
|
||||
source ./.deploy_config
|
||||
|
||||
if [[ -z "${PREVIEW_HOST:-}" || -z "${PREVIEW_DIR:-}" || -z "${PREVIEW:-}" ]]; then
|
||||
print_error
|
||||
exit 1
|
||||
fi
|
||||
|
||||
PREVIEW_TEMP=/tmp/zola-preview
|
||||
|
||||
rm -rf "$PREVIEW_TEMP"
|
||||
zola build --drafts --base-url "$PREVIEW" --output-dir "$PREVIEW_TEMP"
|
||||
|
||||
# trailing slash: copy contents of directory into destination
|
||||
rsync --verbose --human-readable --recursive \
|
||||
--links --times --new-compress --delete-delay \
|
||||
${EXTRA_OPTS_PREVIEW:-} "$PREVIEW_TEMP/" "${PREVIEW_HOST}:${PREVIEW_DIR}"
|
||||
Loading…
Add table
Add a link
Reference in a new issue