+++ date = "2021-07-27" draft = false path = "/blog/workflow-unfamiliar-c-cpp-codebases" tags = ["workflow"] title = "My Workflow: Unfamiliar C and C++ codebases" +++ Improving and automating my workflow is something I have put considerable investment into, and I would like to share what I can. This is part of a series of posts about my development workflow. You can read the other installments here: - [Docs](./docs-tricks-and-gnus) --- I have a curse: I know how to program and I run Linux, so I naturally tend to fix things when I see them, which results in significant yak shaving, but also fixes things permanently. I also know that one of the most effective ways to learn things about libraries or programs that are misbehaving is reading their source code. # Acquiring source code Actually getting the source to things is surprisingly annoying: every project has its own Web site where they might link their source, and I would have to find those. I don't want to navigate websites as they're distracting and too often don't have the links to the code anywhere easy to find. ## Just grab it off GitHub Because of the monoculture in open source, very few projects don't at least have a mirror of their source on GitHub. This makes it very convenient to acquire source for things, as I only have to look in one place that's also quite uniform and machine accessible. The `gh` GitHub command line tool is probably one of the better ways to interact with the Web site without distractions. Unfortunately, there is an old [feature request](https://github.com/cli/cli/issues/1004) for adding search, which has not been implemented yet. Thankfully, you can add aliases. I've done so with a fancy alias for searching repos which emits nice coloured output for finding where the name of the repo for the thing I want. ```yaml aliases: search: 'api -X GET search/repositories -f q=''$1'' --template ''{{range .items}}{{ .full_name | color "white" }}: {{ .description }}{{"\n"}}{{ end }}''' ``` I can then do something like `gh search 'github cli'`, for example, and it will print out the name and description of `cli/cli` which is the one I want. It can then be cloned with `gh repo clone cli/cli`. ## Can't find it on GitHub At this point I would either Google it on DuckDuckGo, or look at where the distro package got it from (this is important in the case of things that have been forked or have multiple versions). I use Arch, so that would entail either looking at the package info with `pacman -Si PACKAGE-NAME` or grabbing the package source with `asp checkout PACKAGE-NAME`, then reading the `PKGBUILD`. Equivalent things exist for other distros, for example, reading `nixpkgs` source for NixOS. # Dealing with C or C++ C or C++ projects often have a lot of latitude to do creative things with their build processes, and I want an IDE to work on them. I use [`nvim`] and [`clangd`] for my IDE, so working on codebases with arbitrary build systems is a question of generating a compilation database (`compile_commands.json`). [`nvim`]: https://neovim.io/ [`clangd`]: https://clangd.llvm.org/ ## Using various build systems These are the build systems I have dealt with the most while working on random C or C++ projects. Sometimes they don't document how to use the build system, or I don't want to read the README. ### GNU autotools **Identifier**: `.in` files or `configure` script at the root of the repo. **Notes**: If `configure` is missing, there may be `bootstrap` or `bootstrap.sh` that will generate one, or you may have to run `autoreconf --install` if that's not there. **Usage**: `./configure`, then `make -jN` where N is the number of build jobs. `./configure` may need some options, it has a `--help` option that will list the possible ones. ### CMake **Identifier**: `CMakeLists.txt` at the root of the repo. **Usage**: `cmake -G Ninja -B build`, then `ninja -C build`. ### Meson **Identifier**: `meson.build` at the root of the repo. **Usage**: `meson ./build`, then `ninja -C build`. ## Compilation databases ### Unusual and obsolete build systems such as GNU autotools/GNU make Use [Bear]: configure, then `bear -- make [make options]`. This will do `LD_PRELOAD` magic and intercept the calls to the compiler and save them. This tool works on basically any build system, even silly shell scripts. Just remember to run it on a clean build or else it will miss some files! Sometimes [`compiledb`] works better: `compiledb make -- [make options]`. [Bear]: https://github.com/rizsotto/Bear [`compiledb`]: https://github.com/nickdiego/compiledb ### Linux kernel Pretty high up on the list of unusual build systems. [Build the kernel with clang]: `make CC=clang defconfig` then `make CC=clang -jN`, then run `scripts/clang-tools/gen_compile_commands.py`. [Build the kernel with clang]: https://www.kernel.org/doc/html/latest/kbuild/llvm.html ### CMake CMake is nice because it can generate Ninja. You can invoke it with `-G Ninja`, build, then ask Ninja for compile commands with `ninja -C build -t compdb > compile_commands.json`. ### Meson Like CMake, after building the software, you can use Ninja to get a compilation database with `ninja -C build -t compdb > compile_commands.json`. ## My IDE got confused because they're doing cursed stuff This has happened a couple of times, especially when reading source code to `glibc`, for instance, where there are definitions in headers and definitions in unrelated `.c` files, among other things. Fortunately, `ctags` is not smart enough to get confused by cursed stuff and works fine in parallel with a LSP server. Run `ctags -R .` at the root of the repo and use `nvim` to navigate with the tags: - CTRL-] jump to the identifier under the cursor. - CTRL-W g CTRL-] jump to the identifier under the cursor in a new split. - `:tj TAG_NAME` selects from the tags called `TAG_NAME` or jumps there directly if there's only one. Useful if there are multiple definitions of the same identifier. - CTRL-O goes back in history to the last jumped position. - CTRL-I goes forward in history to the last jumped position. # Finding things I use [`ripgrep`] as it has good defaults and is extremely fast. Usually the way I find things is I look for a unique word related to the thing I want in the documentation, for instance, a long command line option or an error message, then I search it case insensitively (`-i`) and start browsing code from there. Sometimes it's not that easy, and I have to use some more tricks as I can't find it by searching. I often pull out a debugger after `strace`ing the program to try to find an interesting system call I can set a breakpoint on to track down the code path. Or, for instance, I know that a program opens two dialogs before the interesting behaviour, so I set a breakpoint on `XCreateWindow`. I then take a backtrace and have somewhere to start looking in the codebase. Be creative! Usually my debugger of choice is either [`rr`] or `gdb`. [`ripgrep`]: https://github.com/burntsushi/ripgrep [`rr`]: https://github.com/rr-debugger/rr --- This is part of a series of posts about my development workflow. You can read the other installments here: - [Docs](./docs-tricks-and-gnus)