Bitcoin Core’s current tracepoint implementation uses SystemTap probes and eBPF for tracing, i.e. passing internal events to other processes via the Linux kernel. While powerful, this interface has its shortcomings:
- the current tracepoint macro definitions only work on Linux, so tracing can currently only be used on Linux. Support for e.g. macOS and FreeBSD would be possible, but there hasn’t been much developer interest in implementing it, and even if it were implemented, who would test and maintain it? The existing Linux tracing scripts also won’t work on other platforms.
- loading eBPF scripts into the Linux kernel requires privileges, which makes the current tracing scripts harder to develop, use, and test in CI
- the Python tooling we use to test these tracepoints, for example BCC, has been troublesome in our CI from time to time
- the interfaces are not typed: we pass strings as C-style strings, and we can’t pass objects, lists, or more than 12 arguments to a tracepoint
- …
In a recent discussion, the question came up whether a tracepoint-like IPC interface could be an alternative. Superficially, this would have a few benefits:
- available on all platforms, without the need for per-platform code
- connecting a tracing tool via IPC doesn’t require privileges, which makes tools easier to develop, use, and test
- no extra tooling is needed to test the IPC tracepoints: the interface can be tested alongside the other IPC interfaces
- interface definitions are explicit and data is typed: objects, lists, and strings can be passed, and doing so is a lot less error-prone
I have a few questions to figure out if this could be an alternative:
- I’m aware that there is a streaming pattern for Cap'n Proto. Could something like the following work with libmultiprocess?
  - a client tells the node “I want to learn about incoming P2P messages”
  - the node sends the client all incoming P2P messages as they arrive, possibly implemented similarly to https://github.com/bitcoin/bitcoin/blob/67ea4b9994e668dcea5e5d0f62f886d92e3737dc/src/net_processing.cpp#L5042-L5051
  - the client disconnects, and the node stops sending incoming P2P messages
- One nice thing about the current tracepoint interface is that the overhead when it is not used is limited to a single instruction (checking whether anyone is hooked into the tracepoint). And even when it is used, the overhead is minimal. Given the earlier example of passing all incoming P2P messages, is this doable over IPC without too much overhead? I guess the overhead depends on the implementation (passing references or copies?), so this question is more about gut feeling: yes, this sounds doable; or no, libmultiprocess/Cap'n Proto/IPC is not well suited for this.