Bitcoin Core has a tracing interface that exposes internal events in real time via a machine-to-machine interface. An event is, for example, a P2P message being received or a mempool transaction being replaced. The interface is implemented using "Userspace, Statically Defined Tracepoints" (USDT) and is primarily intended to be used with eBPF. The interface is currently being used by, for example, the peer-observer ebpf-extractor, the FIBRE network monitoring tools with custom tracepoints, researchers collecting and studying RBF replacements, and possibly more users. The interface has no overhead when not in use, and a low overhead when in use.
However, a few pain points make this interface hard to use for most users. I believe these are also the reasons why we haven't seen more usage of this interface, why there has been only limited review on tracing-interface-related PRs, and ultimately why the development of this interface has stalled. The pain points are:
- Requires privileges: Using the tracepoints by loading eBPF scripts into the Linux kernel requires root privileges. This makes the interface and tracing tools hard to develop, use, review, and test in CI. Ideally, nobody should need to run something downloaded from GitHub as root on their development machine.
- Troublesome in CI: The Python tooling we use to test these tracepoints (BCC) has been troublesome in our CI from time to time (https://github.com/bitcoin/bitcoin/issues/28600, #29788). While using the tracepoints with libbpf (written in C) has been working great for me, we don't want to and can't use this in the Bitcoin Core functional tests. Additionally, running tracing scripts in CI requires a VM. A simple docker container does not work.
- Untyped: The interface is not typed. Integers are passed as values and the interpretation of them is done in the tracing scripts. Strings are passed as pointers to C-style strings. We can’t pass objects, lists, or more than 12 arguments to a tracepoint. This makes the interface brittle and we might not notice the interface breaking due to an unrelated change.
- Linux-only: The current tracepoint macro definitions only work on Linux, and the interface can currently only be used on Linux. While e.g. macOS and FreeBSD support is possible (https://github.com/bitcoin/bitcoin/issues/31274#issue-2649982759 has some discussion on this), there hasn’t been much interest in adding support for these platforms, as the implementation and tools are quite different from the Linux ones. Even if implemented, who is going to test and maintain them? The Linux scripts won’t work on other platforms. Windows support is not possible.
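To make the "Untyped" point more concrete, here is a minimal sketch of how a consumer that interprets raw values by position can silently misread data after an unrelated producer change. This is only a loose analogy: it uses Python's `struct` module to stand in for raw tracepoint arguments, and the event layout and the function names (`emit_v1`, `emit_v2`, `script_decode`) are hypothetical, not actual Bitcoin Core tracepoint definitions.

```python
import struct

# Layout the (hypothetical) tracing script assumes for an inbound-message
# event: (peer_id: int64, msg_size: uint64). Illustration only.
SCRIPT_LAYOUT = struct.Struct("<qQ")


def emit_v1(peer_id, msg_size):
    """Producer, original version: two raw integer arguments."""
    return struct.pack("<qQ", peer_id, msg_size)


def emit_v2(peer_id, conn_type, msg_size):
    """Producer after an unrelated change: a new argument is inserted
    in the middle, shifting msg_size to a different position."""
    return struct.pack("<qQQ", peer_id, conn_type, msg_size)


def script_decode(raw):
    """Consumer: interprets the raw values purely by position.
    Nothing tells it that the meaning of the second value changed."""
    peer_id, msg_size = SCRIPT_LAYOUT.unpack_from(raw)
    return {"peer_id": peer_id, "msg_size": msg_size}


old = script_decode(emit_v1(7, 1500))
new = script_decode(emit_v2(7, 2, 1500))

print(old)  # msg_size is decoded correctly
print(new)  # msg_size now silently contains conn_type
```

The second decode does not fail. It just returns a wrong value, which is the kind of breakage that is easy to miss in review and in tests.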
With the recent progress on the IPC interface, we might have an alternative that solves these pain points. The idea would be a streaming-like interface that emits events to subscribers of one or more events. This should solve:
- No special privileges required: Using the IPC interfaces doesn’t require root privileges. This makes the interface much easier to develop, use, and test.
- Testing in CI: There are no tracing-only dependencies needed (like e.g. BCC) to test a potential IPC tracing interface in CI. There are already tests and test dependencies for the mining interface. My understanding is that these are here to stay and need to be maintained going forward.
- Platform support: IPC is available on Linux, macOS, and possibly Windows (https://github.com/bitcoin/bitcoin/pull/35084), and doesn’t need per-platform code and tracing scripts.
- Typed interface: Interface structs are well-defined as Cap'n Proto messages. Passing objects, lists, and strings is possible and a lot less error-prone than passing untyped integer values and pointers to C-style strings.
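As a rough illustration of the idea, the following sketch models a streaming interface where subscribers register for specific, typed events. It is an in-process stand-in only: it does not use Cap'n Proto or the Bitcoin Core IPC interface, and every name in it (`EventStream`, `MempoolReplaced`, `P2PMessageReceived`, the field names) is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MempoolReplaced:
    """Hypothetical typed event: a mempool transaction was replaced."""
    replaced_txid: str
    replacement_txid: str
    replaced_fee_sat: int
    replacement_fee_sat: int


@dataclass(frozen=True)
class P2PMessageReceived:
    """Hypothetical typed event: a P2P message was received."""
    peer_id: int
    msg_type: str
    size_bytes: int


class EventStream:
    """Delivers each emitted event to the subscribers of its type."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        self._subscribers[event_type].append(callback)

    def emit(self, event):
        # Only callbacks registered for this exact event type are called.
        for callback in self._subscribers.get(type(event), []):
            callback(event)


stream = EventStream()
seen = []
stream.subscribe(MempoolReplaced, seen.append)

# Not subscribed to P2P messages, so this event is not delivered.
stream.emit(P2PMessageReceived(peer_id=3, msg_type="inv", size_bytes=37))
stream.emit(MempoolReplaced("aa" * 32, "bb" * 32, 1000, 2500))

for event in seen:
    print(event.replacement_txid, event.replacement_fee_sat)
```

Consumers access fields by name and type rather than by argument position, so adding a field to an event does not change the meaning of the existing ones.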
The main downside I see is a likely higher overhead when tracing via IPC. This is something to evaluate and test, and potentially improve where possible. When not in use, the IPC-tracing should not cause any overhead (e.g. by gating the "tracepoints" with an if (script attached to this tracepoint) check), similar to how it's done for the USDT tracepoints.
Anecdotally, one of the goals of the initial introduction of the eBPF/USDT based tracing was to demonstrate a more general alternative to the --capturemessage functionality introduced in Per-Peer Message Capture. While the current tracing is an alternative to --capturemessage, it only works on Linux, requires running scripts with privileges, needs a dependency on bpftrace or BCC, and requires reading pointers to kernel-memory in the tracing script, and then an extra ringbuffer to get these to userspace. My hope is that with an IPC-tracing interface the --capturemessage functionality can finally be deprecated and removed (at least my memory is that this is functionality people would like to see removed again, once there's a good alternative).
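The gating idea can be sketched as follows: the event payload is only built when at least one subscriber is attached, so an unused tracepoint costs little more than a single check. This is a minimal illustration under my own assumptions, not a proposed implementation; the `Tracepoint` class, its methods, and the event contents are hypothetical.

```python
class Tracepoint:
    """Hypothetical tracepoint that does no work without subscribers."""

    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def attach(self, callback):
        self._subscribers.append(callback)

    def fire(self, build_event):
        # Gate: skip building and sending the event if nobody listens.
        if not self._subscribers:
            return
        event = build_event()
        for callback in self._subscribers:
            callback(event)


build_calls = []  # records how often the payload was actually built


def build_expensive_event():
    build_calls.append(1)
    return {"txid": "ab" * 32, "vsize": 141}


tracepoint = Tracepoint("mempool:added")

# No subscriber attached: the payload is never built.
tracepoint.fire(build_expensive_event)

received = []
tracepoint.attach(received.append)

# Subscriber attached: the payload is built once and delivered.
tracepoint.fire(build_expensive_event)

print(len(build_calls), len(received))
```

Passing a builder function instead of a ready-made event keeps the serialization cost behind the gate, which is the part that matters for the "no overhead when not in use" goal.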
Since we only know a part of the (likely small) user base of the eBPF/USDT tracing, we probably want to deprecate it for one or two releases once a potential IPC-tracing interface with good performance has been merged. The goal should be to have only one tracing interface, not two.
The current eBPF/USDT tracing interface is "experimental" and semi-stable. This means that, ideally, we don't break tracing scripts. However, since tracing exposes internals, when we change internals, we also need to change the "tracepoints" and related tracing scripts. An example of this is https://github.com/bitcoin/bitcoin/pull/31122/changes/5736d1ddacc4019101e7a5170dd25efbc63b622a where cluster mempool related changes (https://github.com/bitcoin/bitcoin/pull/31122) meant that the data passed in a tracepoint needed to change. I think a new tracing framework would need to behave similarly in terms of stability.
Currently, the eBPF/USDT tracepoints span a few different areas: net, mempool, utxocache, coinselection. When speaking with others about an IPC-tracing interface, something that came up frequently was that for some of these existing tracepoints we might want to offer a more stable interface. For example, it might make sense to have a mempool IPC interface that offers an interface to the node's mempool. This might be used by more than just "tracing" users, e.g. Lightning nodes that currently use the ZMQ interface, and mempool and block explorers that want to learn about additions and replacements. I haven't really decided what's a better fit here. This ultimately boils down to having one large events interface or potentially multiple interfaces with different levels of stability etc. A Lightning node implementation might want to have a more stable mempool interface than e.g. a researcher/developer needs for a tracing script. Additionally, a more general mempool IPC interface might contain more functionality than receiving events about transactions. It might also support submitting transactions to the mempool or querying mempool contents.
Some questions I'd like people to comment on:
- Do you disagree with the general direction of this?
- Where to draw the interface boundaries? Do you have a preference for an `events` interface that's just for streaming events vs. possibly multiple, separate interfaces for different components (a more general and stable `mempool` interface, a potentially less stable `net` interface for messages/connections)?
- Is removing `--capturemessage` really a goal? Would a potential IPC-based replacement with external tooling to write msgs to disk be enough to allow deprecating and removing it?
Prior discussion and other resources:
- #31274 (comment)
- https://github.com/bitcoin-core/libmultiprocess/issues/185
- #32898: an IPC-tracing PoC by @ryanofsky
- https://github.com/willcl-ark/bitcoin/tree/pr/trace A rebase of the IPC-tracing PoC (including the chain-interface PR)
- https://github.com/bitcoin-dev-tools/ipc-exporter-rust/blob/master/report.md: @willcl-ark's "USDT Tracepoints vs IPC Interfaces: Comparison Report"
- https://github.com/bitcoin-dev-tools/ipc-exporter-rust/: PoC for a Bitcoin Core IPC metric exporter for Prometheus
- https://tracing.fish.foo/: a dashboard showing the ipc-exporter-rust data
- https://github.com/peer-observer/peer-observer/pull/379: a basic IPC-extractor for peer-observer (limited to the mining interface for now)