more tracing stuff i guess?
parent
fdef491dae
commit
3a1d2710d6
1 changed files with 62 additions and 11 deletions
160
content/posts/tracing-dx-ideas.md
Normal file
@ -0,0 +1,160 @@
+++
date = "2022-12-26"
draft = true
path = "/blog/tracing-dx-ideas"
tags = ["haskell", "opentelemetry", "developer-experience"]
title = "Make tracing easy easily! Developer experience ideas"
+++

I interned at Mercury for several months and built out a lot of developer
experience improvements. Many of these were driven by having a good sense of
whether something would be feasible in an afternoon, and knowing that I could
get away with spending an afternoon programming something nobody had asked for
yet.

# OpenTelemetry/Tracing

Many of the highest-impact ideas I had were related to OpenTelemetry tracing:
my goal was to make tracing the first choice for investigating any kind of
problem, from development to production. This blog post catalogues the ideas
that I implemented, how much work they were, and whether I think they're worth
it.

## Put a link to traces in a header

I made the back-end emit a `trace-link` response header, which contains a link
to the Honeycomb trace for the request.

#### How easy was it?

One afternoon of work, plus a couple of days' work later, once the new data
model with environments meant we had to start hitting the API. Most of this
work is open source and reusable for Haskell apps.

#### What did it accomplish?

This was probably the best tracing adoption improvement I made, because it lets
devs look directly at misbehaving requests in browser dev tools and then open
the trace in one click. It single-handedly got a handful of people to start
using tracing.

It doesn't really give you any capability that you don't already have: if
you're using the [w3c trace propagator], you can copy the trace ID out of the
second component of the `traceparent` header you're already sending. Doing
that by hand is arduous, though.

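To make "the second component" concrete, here is a minimal sketch of pulling the trace ID out of a `traceparent` value, using only `base`. The function names are mine for illustration, not part of any library, and it only accepts the version `00` layout (`version-traceid-parentid-flags`) from the W3C spec. It is slightly more lenient than the spec in that it also accepts uppercase hex digits.

```haskell
import Data.Char (isHexDigit)

-- | Split a string on a separator character (hypothetical helper).
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Extract the trace ID from a W3C traceparent header value.
-- Returns Nothing when the value does not look like a version-00 header.
traceIdFromTraceparent :: String -> Maybe String
traceIdFromTraceparent header =
  case splitOn '-' header of
    [version, traceId, parentId, flags]
      | version == "00"
      , length traceId == 32
      , length parentId == 16
      , length flags == 2
      , all isHexDigit (traceId ++ parentId ++ flags)
      , any (/= '0') traceId -- an all-zero trace ID is invalid
      -> Just traceId
    _ -> Nothing

main :: IO ()
main =
  print (traceIdFromTraceparent
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
```

The example header value is the one used in the W3C Trace Context spec.
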
If you have trace ID generation code, you can also start emitting trace IDs in
other places, such as logs, exception reporting systems, and anywhere else you
might want to follow requests through.

#### How to do it

If you're using the hs-opentelemetry ecosystem for Haskell, the relevant code
is here, in the package `hs-opentelemetry-vendor-honeycomb`:

https://github.com/iand675/hs-opentelemetry/tree/main/vendors/honeycomb

What this package does is:

1. Find where data is going using the [Honeycomb Auth API]: you need to know
   the dataset, tenancy name, and environment that the API key is going into.

   In my design, this data is acquired at startup time so trace link generation
   is just string concatenation thereafter.

2. Create [Direct Trace Links] using the trace ID then put them in a header.

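As a rough illustration of the "just string concatenation" part, here is a sketch using only `base`. It is not the code from `hs-opentelemetry-vendor-honeycomb`: the `HoneycombTarget` record, its field names, and `traceLink` are hypothetical, and the URL shape is an assumption based on my reading of the Direct Trace Links docs, so check it against them before relying on it.

```haskell
-- | Where the API key sends data. In the design described above this would
-- be looked up once at startup. Record and field names are hypothetical.
data HoneycombTarget = HoneycombTarget
  { targetTeam        :: String -- team (tenancy) slug
  , targetEnvironment :: String -- environment slug
  , targetDataset     :: String -- dataset slug
  }

-- | Build a link to a trace by plain string concatenation.
-- ASSUMPTION: the path layout below follows the Direct Trace Links docs
-- for environment-aware teams; verify it for your account.
traceLink :: HoneycombTarget -> String -> String
traceLink target traceId = concat
  [ "https://ui.honeycomb.io/", targetTeam target
  , "/environments/", targetEnvironment target
  , "/datasets/", targetDataset target
  , "/trace?trace_id=", traceId
  ]

main :: IO ()
main = do
  let target = HoneycombTarget "my-team" "production" "backend"
  -- The value you would put in the trace-link response header.
  putStrLn (traceLink target "4bf92f3577b34da6a3ce929d0e0e4736")
```

Because the target is fixed after startup, generating the header value costs nothing per request beyond the concatenation.
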
[Honeycomb Auth API]: https://docs.honeycomb.io/api/auth/
[Direct Trace Links]: https://docs.honeycomb.io/api/direct-trace-links/
[w3c trace propagator]: https://www.w3.org/TR/trace-context/

## Instrument the test suite

#### How easy was it?

Implementing the hspec integration originally took about half a week, since it
involved reading substantial amounts of hspec internals and poking around in a
debugger. I'd expect it to take a similar amount of time to add instrumentation
to any other test framework or language for the first time, with some
adjustment for how well documented it is (deduct some points from hspec for
confusing documentation).

However, once the integration for your test framework of choice exists, it
takes a few minutes to add it to a new codebase.

#### What did it accomplish?

I was initially surprised that this had as big an impact as it did, but
Honeycomb wound up being the easiest and cleanest way to view test suite runs
and get database logs, exceptions, and other useful debugging info to fix
broken tests. This was a very worthwhile project: it saved a handful of people
probably a couple of hours each debugging thorny test failures.

#### How to do it

I wrote a Haskell library that starts spans for each test case in hspec:
[hs-opentelemetry-instrumentation-hspec]. Plug this in per the example in the
sources, and then you're done.

Bonus points if you print out a trace link at the end, since you can just reuse
the trace link infrastructure from above for this.

You may also need to make some changes to the tests themselves: for example,
modifying the way you do database interactions in tests so that they go through
instrumentation.

[hs-opentelemetry-instrumentation-hspec]: https://github.com/iand675/hs-opentelemetry/tree/main/instrumentation/hspec

## Instrument scheduled tasks

#### How easy was it?

About 20 minutes: the app already had tracing, so this was just a matter of
initializing it in the scheduled tasks system as well.

#### What did it accomplish?

This one achieved ridiculously good results basically immediately: it became
significantly easier to debug scheduled task misbehaviour and performance.

#### How to do it

Initialize tracing in your scheduled task runner, then create a context and
root span for each task execution. Bonus points if you propagate the trace
context from whatever invoked the scheduled task, so you can correlate it with
the initiating request in your tracing system.

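One way to do the propagation part is to store a `traceparent` value alongside the task when it is enqueued. The sketch below uses only `base` and is purely illustrative: `ScheduledTask`, `enqueue`, and `describeRoot` are hypothetical names, and how the runner actually continues a trace from that value depends on your tracing library, which is not shown here.

```haskell
-- | A hypothetical queued task; only the traceparent field matters here.
data ScheduledTask = ScheduledTask
  { taskName        :: String
  , taskTraceparent :: Maybe String -- traceparent of the enqueuing request, if any
  }

-- | Render a version-00 W3C traceparent value, flagged as sampled ("01").
renderTraceparent :: String -> String -> String
renderTraceparent traceId spanId =
  "00-" ++ traceId ++ "-" ++ spanId ++ "-01"

-- | Enqueue a task, capturing the current (trace ID, span ID) when there is one.
enqueue :: String -> Maybe (String, String) -> ScheduledTask
enqueue name current =
  ScheduledTask name (uncurry renderTraceparent <$> current)

-- | What the runner would do when it picks the task up (described, not done).
describeRoot :: ScheduledTask -> String
describeRoot task = case taskTraceparent task of
  Just parent -> "continue trace from " ++ parent
  Nothing     -> "start a new root span for " ++ taskName task

main :: IO ()
main = do
  let ids = ("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
  putStrLn (describeRoot (enqueue "send-statements" (Just ids)))
  putStrLn (describeRoot (enqueue "nightly-cleanup" Nothing))
```

Tasks fired by a timer rather than a request simply carry `Nothing` and get a fresh root span.
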
# Database

While I was working at Mercury we were using Postgres, but these ideas are
fairly generic.

## Speedy test startup

After introducing instrumentation to the test suite, I debugged an issue in
which migrations would run for 15 seconds or so on every test startup. This was
because the migration system was running hundreds of migrations on startup. I
solved this by restoring a snapshot of a pre-migrated database with
`pg_restore`, saving about 10 seconds and not changing anything semantically
(by comparison, a persistent test database has more risk of divergence).

The fastest way that I know of to create a new Postgres database in a desired
state is to use the template feature: `createdb` with the `-T` option, or
`CREATE DATABASE yourname TEMPLATE yourtemplate`. This is a filesystem-level
copy, which makes it extremely fast (less than a second on a highly complex
schema, compared to approximately 5 seconds to load the equivalent SQL).

This can be used to create a database for each concurrent test. Those test
databases can in turn be wiped after each test case with some kind of function
that uses `TRUNCATE` (again, very low level: it doesn't look at the data) to
empty the tables in preparation for the next case.

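To make the two statements concrete, here is a minimal sketch that renders them as SQL strings, using only `base`. The function names, database names, and table names are hypothetical, and actually running the statements (with whatever Postgres client you use) is left out. One thing to keep in mind is that Postgres refuses to copy a template database while other sessions are connected to it.

```haskell
import Data.List (intercalate)

-- | Quote a Postgres identifier: wrap it in double quotes and double any
-- embedded double quotes.
quoteIdent :: String -> String
quoteIdent name = "\"" ++ concatMap escape name ++ "\""
  where
    escape '"' = "\"\""
    escape c   = [c]

-- | Statement that clones an already-migrated template database.
createFromTemplate :: String -> String -> String
createFromTemplate newDb template =
  "CREATE DATABASE " ++ quoteIdent newDb ++ " TEMPLATE " ++ quoteIdent template

-- | Statement that empties the given tables between test cases.
-- Returns Nothing for an empty list, since TRUNCATE needs at least one table.
truncateTables :: [String] -> Maybe String
truncateTables []     = Nothing
truncateTables tables = Just $
  "TRUNCATE TABLE " ++ intercalate ", " (map quoteIdent tables)
    ++ " RESTART IDENTITY CASCADE"

main :: IO ()
main = do
  -- One database per concurrent test worker (names are made up).
  mapM_
    (\n -> putStrLn (createFromTemplate ("app_test_" ++ show n) "app_template" ++ ";"))
    [1 .. 3 :: Int]
  -- Wipe between test cases.
  mapM_ (putStrLn . (++ ";")) (truncateTables ["users", "accounts"])
```

`RESTART IDENTITY` resets sequences so IDs are predictable across cases, and `CASCADE` also truncates tables that reference the listed ones via foreign keys. Drop either if that is not what your tests want.
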
This leads to:

## Make testing migrations easy: ban down migrations

I wrote a script for testing database migrations. The idea was born out of
frustration with wiped development databases while working on migrations
(which, to be clear, were easy to recreate, but doing so still took 30 seconds
or so): what if you just snapshot the development database, then repeatedly run
a migration against the snapshot?
