Add post on debugging with rr
This commit is contained in:
parent
5a1c98d8a4
commit
7db1a7c496
2 changed files with 169 additions and 22 deletions
|
|
@ -2,35 +2,34 @@
|
||||||
date = "2022-02-24"
|
date = "2022-02-24"
|
||||||
draft = false
|
draft = false
|
||||||
path = "/blog/debugging-rr-children"
|
path = "/blog/debugging-rr-children"
|
||||||
tags = []
|
tags = ["debugging", "rr"]
|
||||||
title = "Debugging: using rr to deal with unruly children (processes)"
|
title = "Debugging: using rr to deal with unruly children (processes)"
|
||||||
+++
|
+++
|
||||||
|
|
||||||
I have done multiple rounds of debugging blobs of processes that start together
|
I have wasted a lot of time debugging multi-process systems, especially those
|
||||||
and then something bad happens in one of the children several forks down.
|
that are some native program with a shell startup script. Although gdb claims
|
||||||
Although gdb claims to support child processes with `set follow-fork-mode`
|
to support child processes with `set follow-fork-mode`
|
||||||
([docs][gdb-follow-fork-mode]), in practice, this is extremely painful since
|
([docs][gdb-follow-fork-mode]), in practice, this is extremely painful since
|
||||||
you may have to set it to multiple different things in one reproduction.
|
gdb needs to be told, at each fork, which way it should go.
|
||||||
|
|
||||||
To deal with these, I've done such hacks as writing wrapper scripts for the
|
To deal with these, I've done such hacks as writing wrapper scripts for the
|
||||||
executable at fault that run it in a gdbserver. However, by far the worst one
|
executable at fault that run it in a gdbserver. Certainly, the worst such hack
|
||||||
I've done is printing out the PID of the misbehaving process then waiting, to
|
I've done is printing out the PID of the misbehaving process then waiting, to
|
||||||
give me time to attach the debugger ([this is even suggested in the gdb
|
give me time to attach the debugger ([which is even a method suggested in the gdb
|
||||||
documentation][gdb-sleep]).
|
documentation][gdb-sleep]). Thankfully, there is a better way.
|
||||||
|
|
||||||
[gdb-follow-fork-mode]: https://docs.jade.fyi/gnu/gdb/gdb.html#index-set-follow_002dfork_002dmode
|
[gdb-follow-fork-mode]: https://docs.jade.fyi/gnu/gdb/gdb.html#index-set-follow_002dfork_002dmode
|
||||||
[gdb-sleep]: https://docs.jade.fyi/gnu/gdb/gdb.html#Forks
|
[gdb-sleep]: https://docs.jade.fyi/gnu/gdb/gdb.html#Forks
|
||||||
|
|
||||||
## Using rr to do it the easy way
|
|
||||||
|
|
||||||
For this demo, I am using two programs I wrote:
|
For this demo, I am using two programs I wrote:
|
||||||
|
|
||||||
- `crasher` just prints out that it's about to crash, then aborts.
|
- `crasher` prints out that it's about to crash, then aborts.
|
||||||
- `caller` forks and executes `crasher`, then prints its return value once it
|
- `caller` forks and executes `crasher`, then prints its return value once it
|
||||||
exits.
|
exits.
|
||||||
|
|
||||||
These are written in C but their source is not super interesting. Nevertheless,
|
These are written in C, but they could equally be written in Rust or some other
|
||||||
you can find their source code [at the bottom of the post](#source).
|
native language. You can find their source code [at the bottom of the
|
||||||
|
post](#source).
|
||||||
|
|
||||||
Here they are in action:
|
Here they are in action:
|
||||||
|
|
||||||
|
|
@ -41,21 +40,21 @@ Here they are in action:
|
||||||
[caller] waitpid: 158938, exited? 0 status 0, signaled? 1 signal 6
|
[caller] waitpid: 158938, exited? 0 status 0, signaled? 1 signal 6
|
||||||
```
|
```
|
||||||
|
|
||||||
Signal 6, if you consult the table in `man 'signal(7)'`, is `SIGABRT` as
|
Signal 6, after consulting the table in `man 'signal(7)'`, is `SIGABRT` as
|
||||||
expected.
|
expected.
|
||||||
|
|
||||||
We want to figure out why the crasher is crashing. It's possible to do with
|
We want to figure out where the crasher is crashing.
|
||||||
gdb, but that's unnecessarily hard because of gdb, even moreso if it forks
|
|
||||||
multiple times.
|
## Using rr to do it the easy way
|
||||||
|
|
||||||
Let's use `rr` to do this more easily. First, record a run:
|
Let's use `rr` to do this more easily. First, record a run:
|
||||||
|
|
||||||
```
|
```
|
||||||
» rr record ./caller
|
» rr record ./caller
|
||||||
rr: Saving execution to trace directory `/home/lf/.local/share/rr/caller-0'.
|
rr: Saving execution to trace directory `/home/jade/.local/share/rr/caller-0'.
|
||||||
[caller] spawned pid 159146
|
[caller] spawned pid 1432674
|
||||||
[crasher] about to crash
|
[crasher] about to crash
|
||||||
[caller] waitpid: 159146, exited? 0 status 0, signaled? 1 signal 6
|
[caller] waitpid: 1432674, exited? 0 status 0, signaled? 1 signal 6
|
||||||
```
|
```
|
||||||
|
|
||||||
Then find the process ID of the crashing process:
|
Then find the process ID of the crashing process:
|
||||||
|
|
@ -63,18 +62,132 @@ Then find the process ID of the crashing process:
|
||||||
```
|
```
|
||||||
» rr ps
|
» rr ps
|
||||||
PID PPID EXIT CMD
|
PID PPID EXIT CMD
|
||||||
159145 -- 0 ./caller
|
1432673 -- 0 ./caller
|
||||||
159146 159145 -6 ./crasher
|
1432674 1432673 -6 ./crasher
|
||||||
```
|
```
|
||||||
|
|
||||||
Next, use either `--onfork=<PID>` or `--onprocess=<PID>` to get a debugger on
|
Next, use either `--onfork=<PID>` or `--onprocess=<PID>` to get a debugger on
|
||||||
the problem process:
|
the problem process:
|
||||||
|
|
||||||
```
|
```
|
||||||
|
» rr replay --onprocess=1432674
|
||||||
|
Reading symbols from /home/jade/.local/share/rr/caller-0/mmap_hardlink_37_crasher...
|
||||||
|
Remote debugging using 127.0.0.1:17412
|
||||||
|
|
||||||
|
--------------------------------------------------
|
||||||
|
---> Reached target process 1432674 at event 138.
|
||||||
|
--------------------------------------------------
|
||||||
|
Reading symbols from /lib64/ld-linux-x86-64.so.2...
|
||||||
|
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
|
||||||
|
BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file
|
||||||
|
0x00007f01f1ffb930 in _start () from /lib64/ld-linux-x86-64.so.2
|
||||||
|
=> 0x00007f01f1ffb930 <_start+0>: 48 89 e7 mov %rsp,%rdi
|
||||||
|
(rr) continue
|
||||||
|
Continuing.
|
||||||
|
[caller] spawned pid 1432674
|
||||||
|
[crasher] about to crash
|
||||||
|
|
||||||
|
Program received signal SIGABRT, Aborted.
|
||||||
|
0x00007f01f1e0734c in __pthread_kill_implementation () from /usr/lib/libc.so.6
|
||||||
|
=> 0x00007f01f1e0734c <__pthread_kill_implementation+284>: 89 c5 mov %eax,%ebp
|
||||||
|
(rr) backtrace
|
||||||
|
#0 0x00007f01f1e0734c in __pthread_kill_implementation () from /usr/lib/libc.so.6
|
||||||
|
#1 0x00007f01f1dba4b8 in raise () from /usr/lib/libc.so.6
|
||||||
|
#2 0x00007f01f1da4534 in abort () from /usr/lib/libc.so.6
|
||||||
|
#3 0x000055757da48161 in main () at crasher.c:6
|
||||||
|
(rr) frame 3
|
||||||
|
#3 0x000055757da48161 in main () at crasher.c:6
|
||||||
|
6 abort();
|
||||||
|
(rr) list
|
||||||
|
1 #include <stdio.h>
|
||||||
|
2 #include <stdlib.h>
|
||||||
|
3
|
||||||
|
4 int main(void) {
|
||||||
|
5 printf("[crasher] about to crash\n");
|
||||||
|
6 abort();
|
||||||
|
7 }
|
||||||
|
(rr)
|
||||||
|
```
|
||||||
|
|
||||||
|
Sidebar: the lack of spew on gdb startup is a gdb 11 feature! Put `set
|
||||||
|
startup-quietly on` into `~/.config/gdb/gdbearlyinit` to get that.
|
||||||
|
|
||||||
|
## Annotating output with event numbers
|
||||||
|
|
||||||
|
Another neat feature I found in rr is the ability to annotate output with the
|
||||||
|
PIDs and event numbers with `--mark-stdio`, which can be useful if you have a
|
||||||
|
program that is doing a bunch of things before the event of interest.
|
||||||
|
|
||||||
|
In this example, `--autopilot` makes it run without the debugger attached:
|
||||||
|
|
||||||
|
```
|
||||||
|
» rr replay --autopilot --mark-stdio
|
||||||
|
[rr 1432673 159][caller] spawned pid 1432674
|
||||||
|
[rr 1432674 271][crasher] about to crash
|
||||||
|
[rr 1432673 289][caller] waitpid: 1432674, exited? 0 status 0, signaled? 1 signal 6
|
||||||
|
```
|
||||||
|
|
||||||
|
Then I can dump myself into a debugger right at the print call that printed
|
||||||
|
"about to crash" like so:
|
||||||
|
|
||||||
|
```
|
||||||
|
» rr replay --onprocess=1432674 --goto 271
|
||||||
|
[caller] spawned pid 1432674
|
||||||
|
Reading symbols from /home/jade/.local/share/rr/caller-0/mmap_hardlink_37_crasher...
|
||||||
|
Remote debugging using 127.0.0.1:20652
|
||||||
|
|
||||||
|
--------------------------------------------------
|
||||||
|
---> Reached target process 1432674 at event 272.
|
||||||
|
--------------------------------------------------
|
||||||
|
Reading symbols from /usr/bin/../lib64/rr/librrpreload.so...
|
||||||
|
Reading symbols from /usr/lib/libc.so.6...
|
||||||
|
(No debugging symbols found in /usr/lib/libc.so.6)
|
||||||
|
Reading symbols from /lib64/ld-linux-x86-64.so.2...
|
||||||
|
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
|
||||||
|
BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file
|
||||||
|
0x0000000070000002 in syscall_traced ()
|
||||||
|
=> 0x0000000070000002: c3 ret
|
||||||
|
(rr) bt
|
||||||
|
#0 0x0000000070000002 in syscall_traced ()
|
||||||
|
#1 0x00007f01f1fd0430 in _raw_syscall ()
|
||||||
|
at /home/jade/builds/rr/src/rr-5.5.0/src/preload/raw_syscall.S:120
|
||||||
|
/* .... */
|
||||||
|
#13 0x00007f01f1dfe543 in __GI__IO_file_overflow () from /usr/lib/libc.so.6
|
||||||
|
#14 0x00007f01f1df36fa in puts () from /usr/lib/libc.so.6
|
||||||
|
#15 0x000055757da4815c in main () at crasher.c:5
|
||||||
|
(rr) frame 15
|
||||||
|
#15 0x000055757da4815c in main () at crasher.c:5
|
||||||
|
5 printf("[crasher] about to crash\n");
|
||||||
|
(rr) list
|
||||||
|
1 #include <stdio.h>
|
||||||
|
2 #include <stdlib.h>
|
||||||
|
3
|
||||||
|
4 int main(void) {
|
||||||
|
5 printf("[crasher] about to crash\n");
|
||||||
|
6 abort();
|
||||||
|
7 }
|
||||||
|
```
|
||||||
|
|
||||||
|
## Slightly less bad way with gdb
|
||||||
|
|
||||||
|
As of a release some time in the last decade, gdb supports [`set detach-on-fork
|
||||||
|
off`](detach-on-fork), which, as the docs say, keeps both parent and child as
|
||||||
|
debugees in gdb by pausing the one that's not actively being interacted with.
|
||||||
|
This is not significantly better than `follow-fork-mode`, since it instead
|
||||||
|
requires you to run each debugee to when it will wait next.
|
||||||
|
|
||||||
|
I tried it while writing this post and found I had to set an `exec`
|
||||||
|
[catchpoint](catchpoint) (with `catch exec`) then continue the parent process,
|
||||||
|
interrupt it when it is stuck in the wait call, switch threads to the other
|
||||||
|
debugee, then `continue` such that it hits the catchpoint.
|
||||||
|
|
||||||
|
I would love to hear about better ways to do this in gdb.
|
||||||
|
|
||||||
|
[detach-on-fork]: https://docs.jade.fyi/gnu/gdb/gdb.html#index-set-detach_002don_002dfork
|
||||||
|
|
||||||
## Demo source {#source}
|
## Demo source {#source}
|
||||||
|
|
||||||
{{ codefile(path="caller.c", code_lang="c", colocated=true, hide=true) }}
|
{{ codefile(path="caller.c", code_lang="c", colocated=true, hide=true) }}
|
||||||
{{ codefile(path="crasher.c", code_lang="c", colocated=true, hide=true) }}
|
{{ codefile(path="crasher.c", code_lang="c", colocated=true, hide=true) }}
|
||||||
{{ codefile(path="Makefile", code_lang="make", colocated=true, hide=true) }}
|
{{ codefile(path="Makefile", code_lang="make", colocated=true, hide=true) }}
|
||||||
|
|
||||||
|
|
|
||||||
34
preview.sh
Executable file
34
preview.sh
Executable file
|
|
@ -0,0 +1,34 @@
|
||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
print_error() {
|
||||||
|
cat >&2 <<-EOF
|
||||||
|
Please create a .deploy_config with content:
|
||||||
|
PREVIEW_HOST=[the host you rsync to]
|
||||||
|
PREVIEW_DIR=some-subdomain
|
||||||
|
PREVIEW=https://blah
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
if [[ ! -e .deploy_config ]]; then
|
||||||
|
print_error
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
source ./.deploy_config
|
||||||
|
|
||||||
|
if [[ -z "${PREVIEW_HOST:-}" || -z "${PREVIEW_DIR:-}" || -z "${PREVIEW:-}" ]]; then
|
||||||
|
print_error
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
PREVIEW_TEMP=/tmp/zola-preview
|
||||||
|
|
||||||
|
rm -rf "$PREVIEW_TEMP"
|
||||||
|
zola build --drafts --base-url "$PREVIEW" --output-dir "$PREVIEW_TEMP"
|
||||||
|
|
||||||
|
# trailing slash: copy contents of directory into destination
|
||||||
|
rsync --verbose --human-readable --recursive \
|
||||||
|
--links --times --new-compress --delete-delay \
|
||||||
|
${EXTRA_OPTS_PREVIEW:-} "$PREVIEW_TEMP/" "${PREVIEW_HOST}:${PREVIEW_DIR}"
|
||||||
Loading…
Add table
Add a link
Reference in a new issue