RFC: IPC based tracing interface (alternative to eBPF/USDT) #35142

issue 0xB10C opened this issue on April 23, 2026
  1. 0xB10C commented at 10:38 AM on April 23, 2026: contributor

    Bitcoin Core has a tracing interface that exposes internal events in real time via a machine-to-machine interface. An event is, for example, a P2P message being received or a mempool transaction being replaced. The interface is implemented using "Userspace, Statically Defined Tracepoints" (USDT) and primarily intended to be used with eBPF. The interface is currently being used by, for example, the peer-observer ebpf-extractor, the FIBRE network monitoring tools with custom tracepoints, researchers collecting and studying RBF replacements, and possibly more users. The interface has no overhead when not in use, and a low overhead when in use.

    However, there are a few pain points that make this interface hard to use for most users. I believe these are also the reasons why we haven't seen more usage of this interface, why there has been only limited review on tracing-interface-related PRs, and ultimately why development of this interface has stalled. The pain points are:

    • Requires privileges: Using the tracepoints by loading an eBPF script into the Linux kernel requires root privileges. This makes the interface and tracing tools hard to develop, use, review, and test in CI. Ideally, nobody should need to run something downloaded from GitHub as root on their development machine.
    • Troublesome in CI: The Python tooling we use to test these tracepoints (BCC) has been troublesome in our CI from time to time (https://github.com/bitcoin/bitcoin/issues/28600, #29788). While using the tracepoints with libbpf (written in C) has been working great for me, we don't want to and can't use this in the Bitcoin Core functional tests. Additionally, running tracing scripts in CI requires a VM. A simple docker container does not work.
    • Untyped: The interface is not typed. Integers are passed as values and interpreted in the tracing scripts. Strings are passed as pointers to C-style strings. We can’t pass objects, lists, or more than 12 arguments to a tracepoint. This makes the interface brittle: we might not notice the interface breaking due to an unrelated change.
    • Linux-only: The current tracepoint macro definitions only work on Linux and the interface can currently only be used on Linux. While e.g. macOS and FreeBSD support is possible (https://github.com/bitcoin/bitcoin/issues/31274#issue-2649982759 has some discussion on this), there hasn’t been much interest in adding support for these platforms, as the implementation and tools are quite different from the Linux ones. Even if implemented, who is going to test and maintain them? The Linux scripts won’t work on other platforms. Windows support is not possible.

    With the recent progress on the IPC interface, we might have an alternative that solves these pain points. The idea would be a streaming-like interface that emits events to subscribers of one or more event types. This should solve:

    • No special privileges required: Using the IPC interface doesn’t require root privileges. This makes the interface much easier to develop, use, and test.
    • Testing in CI: There are no tracing-only dependencies (like BCC) needed to test a potential IPC tracing interface in CI. There are already tests and test dependencies for the mining interface. My understanding is that these are there to stay and need to be maintained going forward.
    • Platform support: IPC is available on Linux, macOS, and possibly Windows (https://github.com/bitcoin/bitcoin/pull/35084), and doesn’t need per-platform code and tracing scripts.
    • Typed interface: Interface structs are well-defined as Cap'n Proto messages. Passing objects, lists, and strings is possible and a lot less error prone than passing untyped integer values and pointers to C-style strings.

    The main downside I see is a likely higher overhead when tracing via IPC. This is something to evaluate and test, and potentially improve where possible. When not in use, the IPC-tracing should not cause any overhead, e.g. by gating the "tracepoints" with an if (script attached to this tracepoint) check, similar to how it's done for the USDT tracepoints.
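
    For illustration, a minimal sketch of such gating, assuming a per-event-type subscriber counter. None of these names exist in Bitcoin Core today; the real gate would live wherever the IPC layer tracks subscriptions.

    ```cpp
    // Hypothetical sketch of gating an IPC "tracepoint": when nobody is
    // subscribed, the hot path pays only for a relaxed atomic load and a
    // branch, analogous to the semaphore-gated USDT tracepoints.
    // All names are made up for illustration.
    #include <atomic>
    #include <cstdint>

    class TraceEventSource
    {
        std::atomic<uint32_t> m_subscribers{0};

    public:
        void Subscribe() { m_subscribers.fetch_add(1, std::memory_order_relaxed); }
        void Unsubscribe() { m_subscribers.fetch_sub(1, std::memory_order_relaxed); }
        bool Active() const { return m_subscribers.load(std::memory_order_relaxed) > 0; }
    };

    inline TraceEventSource g_mempool_replaced; // one gate per event type (illustrative)

    void OnTransactionReplaced(/* replaced/replacement tx details */)
    {
        if (!g_mempool_replaced.Active()) return; // cheap early-out when unused
        // Only build the (comparatively expensive) event message and hand it
        // to the IPC layer when at least one client is subscribed.
        // BuildAndEmitReplacedEvent(...);   // hypothetical
    }

    int main()
    {
        OnTransactionReplaced(); // no subscriber: returns immediately
        g_mempool_replaced.Subscribe();
        OnTransactionReplaced(); // subscriber present: event would be emitted
    }
    ```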

    Anecdotally, one of the goals of the initial introduction of the eBPF/USDT based tracing was to demonstrate a more general alternative to the --capturemessages functionality introduced in Per-Peer Message Capture. While the current tracing is an alternative to --capturemessages, it only works on Linux, requires running scripts with privileges, needs a dependency on bpftrace or BCC, and requires the tracing script to read the message data via pointers and push it through an extra ring buffer to get it to userspace. My hope is that with an IPC-tracing interface the --capturemessages functionality can finally be deprecated and removed (at least, my recollection is that this is functionality people would like to see removed again once there's a good alternative).

    Since we only know a part of the (likely small) user base of the eBPF/USDT tracing, we probably want to deprecate it for one or two releases once a potential IPC-tracing interface with good performance has been merged. The goal should be to have only one tracing interface, not two.

    The current eBPF/USDT tracing interface is "experimental" and semi-stable. This means that, ideally, we don't break tracing scripts. However, since tracing exposes internals, when we change internals we also need to change the "tracepoints" and related tracing scripts. An example of this is https://github.com/bitcoin/bitcoin/pull/31122/changes/5736d1ddacc4019101e7a5170dd25efbc63b622a where cluster mempool related changes (https://github.com/bitcoin/bitcoin/pull/31122) meant that the data passed in a tracepoint needed to change. I think a new tracing framework would need to behave similarly in terms of stability.

    Currently, the eBPF/USDT tracepoints span a few different areas: net, mempool, utxocache, and coinselection. Speaking with others about an IPC-tracing interface, something that came up frequently was that for some of these existing tracepoints we might want to offer a more stable interface. For example, it might make sense to have a mempool IPC interface that offers an interface to the node's mempool. This might be used by more than just "tracing" users, e.g. Lightning nodes that currently use the ZMQ interface, and mempool and block explorers that want to learn about additions and replacements. I haven't really decided what's a better fit here. This ultimately boils down to having one large events interface or potentially multiple interfaces with different levels of stability etc. A Lightning node implementation might want a more stable mempool interface than e.g. a researcher/developer needs for a tracing script. Additionally, a more general mempool IPC interface might contain more functionality than receiving events about transactions. It might also support submitting transactions to the mempool or querying mempool contents.
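
    To make the "one large events interface vs. multiple component interfaces" trade-off a bit more concrete, here is a purely hypothetical sketch of what a broader mempool interface (events plus queries plus submission) could look like. None of these types exist in Bitcoin Core, and a real interface would be defined as a Cap'n Proto schema rather than a C++ header; this is only meant to show the split between the streaming part and the operational part.

    ```cpp
    // Illustrative only: a broader mempool interface combining an event
    // stream (what tracing users or Lightning nodes would consume) with
    // query and submission methods (what explorers or wallets might use).
    // Not an existing or proposed Bitcoin Core interface; all types here
    // are stand-ins.
    #include <cstdint>
    #include <functional>
    #include <optional>
    #include <string>
    #include <vector>

    struct TxEvent {
        enum class Type { ADDED, REMOVED, REPLACED };
        Type type;
        std::string txid;
        std::optional<std::string> replaced_by_txid; // set for REPLACED events
    };

    struct MempoolEntry {
        std::string txid;
        int64_t fee_sats;
        int32_t vsize;
    };

    class MempoolInterface
    {
    public:
        virtual ~MempoolInterface() = default;

        // Streaming part: deliver additions, removals and replacements.
        virtual void SubscribeEvents(std::function<void(const TxEvent&)> handler) = 0;

        // Query part: look up a single mempool entry, if present.
        virtual std::optional<MempoolEntry> GetEntry(const std::string& txid) = 0;

        // Submission part: submit a raw transaction; returns false and sets
        // an error message on rejection.
        virtual bool SubmitTransaction(const std::vector<uint8_t>& raw_tx, std::string& error) = 0;
    };
    ```

    A tracing-only interface would need just the streaming part; the question below is essentially whether the rest belongs in the same interface or in separate, more stable ones.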


    Some questions I'd like people to comment on:

    • Do you disagree with the general direction of this?
    • Where to draw the interface boundaries? Do you have a preference between an events interface that's just for streaming events and possibly multiple, separate interfaces for different components (a more general and stable mempool interface, a potentially less stable net interface for messages/connections)?
    • Is removing --capturemessages really a goal? Would a potential IPC-based replacement, with external tooling to write messages to disk, be enough to allow deprecating and removing it?

    prior discussion and other resources:

  2. 0xB10C commented at 10:39 AM on April 23, 2026: contributor

    For evaluating the overhead and performance of an IPC-tracing interface, a few initial ideas are:

    Ping benchmark: This shows message throughput with many small messages.

    1. implement an event for P2P messages similar to the utxocache events in #32898
    2. write a simple tool that connects to this IPC interface and receives these events
    3. have a custom P2P client that connects to a local regtest node and sends pings to it as fast as possible, ignoring the pongs. Maybe send 1M pings over a span of a few minutes. Potentially start slow and increase the ping rate over time.
    4. With only the eBPF tracepoints, measure how many events per second you receive and if any are dropped
    5. With only the IPC events, measure how many events per second you receive and how many are dropped (a sketch of this counting follows after this list)
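
    As a rough sketch of the counting in steps 4 and 5 (the same bookkeeping works for the IBD benchmark below), assuming each event carries a sequence number so the consumer can detect drops. How events actually arrive (eBPF perf buffer vs. an IPC stream) is left out, and the main() below just feeds synthetic sequence numbers.

    ```cpp
    // Receive-side measurement sketch: events per second and detected drops,
    // assuming a per-event sequence number (an assumption about the eventual
    // interface, not something the current tracepoints provide).
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <optional>

    class EventRateCounter
    {
        std::optional<uint64_t> m_last_seq;
        uint64_t m_received{0};
        uint64_t m_dropped{0};
        uint64_t m_window_count{0};
        std::chrono::steady_clock::time_point m_window_start{std::chrono::steady_clock::now()};

    public:
        void Process(uint64_t seq)
        {
            ++m_received;
            ++m_window_count;
            if (m_last_seq && seq > *m_last_seq + 1) m_dropped += seq - *m_last_seq - 1;
            m_last_seq = seq;

            const auto now = std::chrono::steady_clock::now();
            if (now - m_window_start >= std::chrono::seconds{1}) {
                std::printf("events/s: %llu  total: %llu  dropped: %llu\n",
                            static_cast<unsigned long long>(m_window_count),
                            static_cast<unsigned long long>(m_received),
                            static_cast<unsigned long long>(m_dropped));
                m_window_count = 0;
                m_window_start = now;
            }
        }

        void PrintSummary() const
        {
            std::printf("final: received %llu, dropped %llu\n",
                        static_cast<unsigned long long>(m_received),
                        static_cast<unsigned long long>(m_dropped));
        }
    };

    int main()
    {
        EventRateCounter counter;
        // Stand-in for the real event loop (eBPF ring buffer or IPC stream):
        // feed synthetic sequence numbers and skip some to simulate drops.
        for (uint64_t seq = 0; seq < 1'000'000; ++seq) {
            if (seq % 1000 == 999) continue; // simulated drop
            counter.Process(seq);
        }
        counter.PrintSummary();
    }
    ```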

    IBD benchmark: This shows message throughput with many large messages:

    1. implement an event for P2P messages similar to the utxocache events in #32898
    2. Do a mainnet initial block download from a localhost node and measure how long it takes.
    3. With only the eBPF tracepoints, measure how many blocks per second we receive and if any are dropped.
    4. With only the IPC events, measure how many blocks per second you receive and how many are dropped. @willcl-ark also noted a while back that he saw high CPU usage with the utxocache events during IBD. So might be interesting to check this.
  3. fanquake added the label Brainstorming on Apr 23, 2026
  4. fanquake added the label interfaces on Apr 23, 2026
  5. ViniciusCestarii commented at 1:05 PM on April 24, 2026: contributor

    Great write-up. A few thoughts on the open questions:

    General direction: Strongly agree. The privilege requirement alone makes USDT inaccessible for most developers, and the CI friction compounds that. IPC-based tracing solves the right problems.

    Interface boundaries: I'd argue for keeping operational functionality separate from tracing: if a subsystem's interface starts wanting tx submission or mempool queries, that deserves its own dedicated interface following the mining interface precedent. For the tracing layer itself, I'm genuinely uncertain whether a single events interface or per-subsystem interfaces is better. It would help to know whether current USDT users typically want to correlate events across subsystems or tend to focus on one at a time. That seems like the right data to drive this decision.

  6. willcl-ark commented at 9:39 AM on April 27, 2026: member

    @willcl-ark also noted a while back that he saw high CPU usage with the utxocache events during IBD. So might be interesting to check this.

    Would only note here that this was done on a Hetzner CX22, with 2 vCPUs, 4 GB of RAM, and 40 GB of disk space. This is already pretty constrained for a pruned full node doing IBD, but I was also obviously running the IPC collector on the same machine.

  7. Bicaru20 commented at 5:38 PM on May 4, 2026: none

    Hi, I've been using the tracepoints to study the transactions replaced via RBF, specifically mempool:replaced. Personally, at the beginning it was difficult to set up all the dependencies and make the tracepoint work. It is also not easy to understand and use the tracing scripts (although the examples are really great!). On top of that, the fact that the tracepoints require privileges is not ideal, especially if you are running the program on an external server.

    I also tried to modify the tracing script but I was not successful. Now I see that maybe it was due to the interface not being typed. In any case, I don't feel it is trivial to modify these scripts for specific use cases.

    When extracting data about replacements, I noticed that the tracepoints sometimes return incorrect information. For example, if a transaction has a child in the mempool and it gets replaced, the child is also marked as replaced. The tracepoints then indicate that the child is being replaced by the parent transaction, while the parent appears to be replaced by nothing. I’m not sure whether issues like this would be resolved by introducing IPC. I also haven’t yet verified whether this behavior originates from a fault in the tracepoints themselves.

    I strongly agree with the general direction. Simply making the setup easier than the tracepoints and removing the dependency on special permissions makes this worth implementing.

    Interface boundaries: I would say that a subsystem-specific interface is better. In my case, I am only monitoring the mempool, so I would rather rely on a stable interface dedicated to that subsystem. But even when monitoring multiple components simultaneously, I think it still makes more sense for each subsystem to expose its own interface, with aggregation and comparison handled afterwards by a personalized backend.

  8. stickies-v commented at 2:30 PM on May 8, 2026: contributor

    Concept ACK, the general direction makes sense. I've not used eBPF/USDT in meaningful ways, but I think providing easier and more ways to monitor is useful for the project.

    Wrt interface, I generally prefer multiple, smaller, generic endpoints that different kinds of users can compose in ways that make most sense for their application.

  9. sipa commented at 2:31 PM on May 8, 2026: member

    Concept ACK on exploring this approach as a replacement for the eBPF interface, assuming it is found to be sufficiently performant. I believe eBPF has only seen pretty narrow use, and breaking compatibility there is acceptable (with a deprecation period etc.). Using IPC seems to be more flexible and easier to extend to more functionality. That said, I have not reviewed the code changes needed, and am not committing to reviewing this.

  10. edilmedeiros commented at 8:44 PM on May 9, 2026: contributor

    I'm favorable to the concept and willing to help with review.

    I would prefer to see a modular approach as much as possible, i.e., the user/developer enables smaller endpoints: better composability and probably less burden on the node for things it doesn't need.

    The main concern I see is not only runtime overhead, but API creep: once an internal event is exposed through a convenient IPC interface, external tools may start depending on it as if it were a stable API. I feel that #29912 is related in a broader sense: once interfaces are meant to be consumed by external tools, the schema and stability contract become part of the design. I would explicitly treat any design here as experimental for quite some time.

    Having said that, maybe the distinction is not only modularity by subsystem, but also modularity by stability level, so the split should be along two axes: subsystem boundaries (net, mempool, validation, utxocache, etc.) and stability boundaries. A potential mempool interface used by Lightning nodes or explorers has a different character from a net tracepoint used to understand message scheduling or compact block reconstruction. Making this somewhat explicit should make this interface useful for many audiences.

    For developers, tracepoints are valuable because they expose internal behavior without turning the program into a debugger session. Debuggers work best for inspecting a stopped state. Tracepoints, on the other hand, are useful for understanding how the program reached a specific state. I would guess developers are keen on questions like: when a P2P message is received, which thread processed it, whether a validation event followed, whether a cache flush or mempool mutation happened nearby; i.e. implementation details matter, and thus the interface should follow the implementation, not the other way around. I think the current tracepoints can't even help with many of these questions.

    For a network researcher, a node is more like a measurement instrument where the event stream is the product itself (the most notable works that come to my mind are https://www.dsn.kastel.kit.edu/bitcoin/ and https://github.com/peer-observer/). These tend to ask more abstract questions like how fast transactions and blocks propagate, how often compact block reconstruction fails, how peers behave under different relay conditions, whether we can fingerprint a node based on how it behaves on the network, etc. A researcher often needs to reconstruct sequences of events and correlate them with events from other nodes or other data sources, and probably benefits from more stable semantics that give some guarantees about timing, ordering, and loss accounting.

    A node operator usually does not want raw trace events; they want to know whether the node is healthy. I think of things like: number of connected peers, inbound/outbound peer counts, message rates, block download progress, stale tip status, mempool size, mempool churn, compact block reconstruction failures, validation queue backlog, UTXO cache flush durations, disk or network bottlenecks. These are metrics or status summaries, and some are already available via the RPC interface.

  11. morozow commented at 11:08 AM on May 11, 2026: none

    Hi, I went through the current issue discussion, and the first thing I noticed is that we do not yet have enough concrete metrics to reason about the trade-offs in a measurable way. I measured IPC tracing overhead vs eBPF on a -reindex of 200k mainnet blocks (heights 0–200,000), using the same binary and the same block data on Docker Linux arm64, with the conditions run sequentially and cache flushes between them.

    Condition                      Flag       Time (s)  Blocks/s  Overhead
    Baseline (no observers)        baseline   466.26    428.9     –
    eBPF kernel counters           ebpf       493.44    405.3     5.83%
    eBPF data delivery             ebpf_full  849.71    235.4     82.24%
    IPC pipe (6 workers, raw)      raw_ipc    495.58    403.6     6.29%
    IPC pipe (6 workers, managed)  ipc        489.00    409.0     4.88%

    The benchmark's IPC path serializes events to JSON, writes to POSIX pipes asynchronously through a bounded queue, and 6 forked worker processes consume from stdin. The USDT tracepoints are mirrored identically. The eBPF overhead comes from bpftrace polling perf buffers in userspace per event batch, whereas the pipe write is a non-blocking kernel buffer copy. In the benchmark, ebpf measures kernel-side counters only while no data leaves the kernel – useless for actual observability. ebpf_full extracts event fields and delivers them to userspace via a perf ring buffer – this is what any real eBPF monitoring tool does when it needs to show you block heights, transaction fees, or peer addresses.

    I would also like to note one architectural difference: with IPC, consumer-side processing scales horizontally. The IPC approach allows adding more worker processes without increasing overhead on the bitcoind side, since the cost is one enqueue and one pipe write regardless of how many consumers read downstream. With eBPF, the perf buffer is a single-reader bottleneck, so additional analysis means either running multiple bpftrace instances with separate probe attachments or post-processing the single output stream.
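
    A simplified, single-process sketch of that pipeline shape (bounded non-blocking enqueue, writer thread, newline-delimited JSON over a pipe); the names are made up and this is not the actual benchmark code:

    ```cpp
    // Simplified, single-process sketch of the described IPC path: the traced
    // thread does a bounded, non-blocking enqueue (dropping when full so slow
    // consumers cannot stall the node); a writer thread drains the queue and
    // writes newline-delimited JSON into a pipe that workers read from stdin.
    #include <condition_variable>
    #include <cstddef>
    #include <cstdio>
    #include <deque>
    #include <mutex>
    #include <string>
    #include <thread>
    #include <unistd.h>

    class BoundedEventQueue
    {
        std::mutex m_mutex;
        std::condition_variable m_cv;
        std::deque<std::string> m_queue;
        const std::size_t m_capacity;
        bool m_done{false};

    public:
        explicit BoundedEventQueue(std::size_t capacity) : m_capacity{capacity} {}

        // Hot path: never blocks; a full queue means the event is dropped.
        bool TryPush(std::string event_json)
        {
            {
                std::lock_guard lock{m_mutex};
                if (m_done || m_queue.size() >= m_capacity) return false;
                m_queue.push_back(std::move(event_json));
            }
            m_cv.notify_one();
            return true;
        }

        // Writer side: blocks until an event is available or the queue is closed.
        bool Pop(std::string& out)
        {
            std::unique_lock lock{m_mutex};
            m_cv.wait(lock, [&] { return !m_queue.empty() || m_done; });
            if (m_queue.empty()) return false;
            out = std::move(m_queue.front());
            m_queue.pop_front();
            return true;
        }

        void Close()
        {
            { std::lock_guard lock{m_mutex}; m_done = true; }
            m_cv.notify_all();
        }
    };

    int main()
    {
        int fds[2];
        if (pipe(fds) != 0) return 1; // fds[1] = node side, fds[0] = worker side

        BoundedEventQueue queue{/*capacity=*/4096};

        // Writer thread: one pipe write per event, regardless of consumer count.
        std::thread writer{[&] {
            std::string line;
            while (queue.Pop(line)) {
                line.push_back('\n');
                (void)write(fds[1], line.data(), line.size());
            }
            close(fds[1]); // EOF for the workers
        }};

        // Stand-in for tracepoint call sites emitting JSON events.
        for (int i = 0; i < 10; ++i) {
            queue.TryPush("{\"event\":\"ping\",\"seq\":" + std::to_string(i) + "}");
        }
        queue.Close();
        writer.join();

        // Stand-in for one forked worker: read everything from the pipe.
        char buf[4096];
        ssize_t n;
        while ((n = read(fds[0], buf, sizeof(buf))) > 0) std::fwrite(buf, 1, n, stdout);
        close(fds[0]);
    }
    ```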

    I added a reproduction Dockerfile and benchmark script to the fork: https://github.com/morozow/bitcoin_rd/tree/rfc/ebpf-ipc-tracing/contrib/perf/docker
    The benchmark can produce negative overhead values, e.g. raw_ipc at -1.47%. That's measurement noise, not a real speedup. Single-run variance on this workload is ±2-3% due to OS scheduling, thermal state, and I/O timing. A negative value means the condition's overhead is effectively zero – within the noise floor of the measurement.

  12. vasild commented at 8:59 AM on May 14, 2026: contributor

    Concept ACK

