RFC: IPC based tracing interface (alternative to eBPF/USDT) #35142

issue 0xB10C opened this issue on April 23, 2026
  1. 0xB10C commented at 10:38 AM on April 23, 2026: contributor

    Bitcoin Core has a tracing interface that exposes internal events in real time via a machine-to-machine interface. An event is, for example, a P2P message being received or a mempool transaction being replaced. The interface is implemented using "Userspace, Statically Defined Tracepoints" (USDT) and primarily intended to be used with eBPF. The interface is currently being used by, for example, the peer-observer ebpf-extractor, the FIBRE network monitoring tools with custom tracepoints, researchers collecting and studying RBF replacements, and possibly more users. The interface has no overhead when not in use, and a low overhead when in use.

    However, there are a few pain points that make this interface hard to use for most users. I believe these are also the reasons why we haven't seen more usage of this interface, why there has been only limited review on tracing-interface-related PRs, and ultimately why development of this interface has stalled. The pain points are:

    • Requires privileges: Using the tracepoints by loading an eBPF script into the Linux kernel requires root privileges. This makes the interface and tracing tools hard to develop, use, review, and test in CI. Ideally, nobody should need to run something downloaded from GitHub as root on their development machine.
    • Troublesome in CI: The Python tooling we use to test these tracepoints (BCC) has been troublesome in our CI from time to time (https://github.com/bitcoin/bitcoin/issues/28600, #29788). While using the tracepoints with libbpf (written in C) has been working great for me, we don't want to and can't use this in the Bitcoin Core functional tests. Additionally, running tracing scripts in CI requires a VM. A simple docker container does not work.
    • Untyped: The interface is not typed. Integers are passed as values and interpreted in the tracing scripts. Strings are passed as pointers to C-style strings. We can’t pass objects, lists, or more than 12 arguments to a tracepoint. This makes the interface brittle: we might not notice the interface breaking due to an unrelated change.
    • Linux-only: The current tracepoint macro definitions only work on Linux and the interface can currently only be used on Linux. While e.g. macOS and FreeBSD support is possible (https://github.com/bitcoin/bitcoin/issues/31274#issue-2649982759 has some discussion on this), there hasn’t been much interest in adding support for these platforms, as the implementation and tools are quite different from the Linux ones. Even if implemented, who is going to test and maintain them? The Linux scripts won’t work on other platforms. Windows support is not possible.

    With the recent progress on the IPC interface, we might have an alternative that solves these pain points. The idea would be a streaming-like interface that emits events to subscribers of one or more event types. This should solve:

    • No special privileges required: Using the IPC interface doesn’t require root privileges. This makes the interface much easier to develop, use, and test.
    • Testing in CI: There are no tracing-only dependencies (like BCC) needed to test a potential IPC tracing interface in CI. There are already tests and test dependencies for the mining interface. My understanding is that these are there to stay and need to be maintained going forward.
    • Platform support: IPC is available on Linux, macOS, and possibly Windows (https://github.com/bitcoin/bitcoin/pull/35084), and doesn’t need per-platform code and tracing scripts.
    • Typed interface: Interface structs are well-defined as Cap'n Proto messages. Passing objects, lists, and strings is possible and a lot less error prone than passing untyped integer values and pointers to C-style strings.

    The main downside I see is a likely higher overhead when tracing via IPC. This is something to evaluate and test, and potentially improve where possible. When not in use, the IPC-tracing should not cause any overhead, e.g. by gating the "tracepoints" with an if (script attached to this tracepoint) check, similar to how it's done for the USDT tracepoints.
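
    For illustration, a minimal sketch of such gating, assuming a per-event-type subscriber counter. None of these names exist in Bitcoin Core today; the real gate would live wherever the IPC layer tracks subscriptions.

    ```cpp
    // Hypothetical sketch of gating an IPC "tracepoint": when nobody is
    // subscribed, the hot path pays only for a relaxed atomic load and a
    // branch, analogous to the semaphore-gated USDT tracepoints.
    // All names are made up for illustration.
    #include <atomic>
    #include <cstdint>

    class TraceEventSource
    {
        std::atomic<uint32_t> m_subscribers{0};

    public:
        void Subscribe() { m_subscribers.fetch_add(1, std::memory_order_relaxed); }
        void Unsubscribe() { m_subscribers.fetch_sub(1, std::memory_order_relaxed); }
        bool Active() const { return m_subscribers.load(std::memory_order_relaxed) > 0; }
    };

    inline TraceEventSource g_mempool_replaced; // one gate per event type (illustrative)

    void OnTransactionReplaced(/* replaced/replacement tx details */)
    {
        if (!g_mempool_replaced.Active()) return; // cheap early-out when unused
        // Only build the (comparatively expensive) event message and hand it
        // to the IPC layer when at least one client is subscribed.
        // BuildAndEmitReplacedEvent(...);   // hypothetical
    }

    int main()
    {
        OnTransactionReplaced(); // no subscriber: returns immediately
        g_mempool_replaced.Subscribe();
        OnTransactionReplaced(); // subscriber present: event would be emitted
    }
    ```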

    Anecdotally, one of the goals of the initial introduction of the eBPF/USDT based tracing was to demonstrate a more general alternative to the --capturemessages functionality introduced in Per-Peer Message Capture. While the current tracing is an alternative to --capturemessages, it only works on Linux, requires running scripts with privileges, needs a dependency on bpftrace or BCC, and requires the tracing script to read the message data via pointers and push it through an extra ring buffer to get it to userspace. My hope is that with an IPC-tracing interface the --capturemessages functionality can finally be deprecated and removed (at least, my recollection is that this is functionality people would like to see removed again once there's a good alternative).

    Since we only know a part of the (likely small) user base of the eBPF/USDT tracing, we probably want to deprecate it for one or two releases once a potential IPC-tracing interface with good performance has been merged. The goal should be to have only one tracing interface, not two.

    The current eBPF/USDT tracing interface is "experimental" and semi-stable. This means that, ideally, we don't break tracing scripts. However, since tracing exposes internals, when we change internals we also need to change the "tracepoints" and related tracing scripts. An example of this is https://github.com/bitcoin/bitcoin/pull/31122/changes/5736d1ddacc4019101e7a5170dd25efbc63b622a where cluster mempool related changes (https://github.com/bitcoin/bitcoin/pull/31122) meant that the data passed in a tracepoint needed to change. I think a new tracing framework would need to behave similarly in terms of stability.

    Currently, the eBPF/USDT tracepoints span a few different areas: net, mempool, utxocache, and coinselection. Speaking with others about an IPC-tracing interface, something that came up frequently was that for some of these existing tracepoints we might want to offer a more stable interface. For example, it might make sense to have a mempool IPC interface that offers an interface to the node's mempool. This might be used by more than just "tracing" users, e.g. Lightning nodes that currently use the ZMQ interface, and mempool and block explorers that want to learn about additions and replacements. I haven't really decided what's a better fit here. This ultimately boils down to having one large events interface or potentially multiple interfaces with different levels of stability etc. A Lightning node implementation might want a more stable mempool interface than e.g. a researcher/developer needs for a tracing script. Additionally, a more general mempool IPC interface might contain more functionality than receiving events about transactions. It might also support submitting transactions to the mempool or querying mempool contents.
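
    To make the "one large events interface vs. multiple component interfaces" trade-off a bit more concrete, here is a purely hypothetical sketch of what a broader mempool interface (events plus queries plus submission) could look like. None of these types exist in Bitcoin Core, and a real interface would be defined as a Cap'n Proto schema rather than a C++ header; this is only meant to show the split between the streaming part and the operational part.

    ```cpp
    // Illustrative only: a broader mempool interface combining an event
    // stream (what tracing users or Lightning nodes would consume) with
    // query and submission methods (what explorers or wallets might use).
    // Not an existing or proposed Bitcoin Core interface; all types here
    // are stand-ins.
    #include <cstdint>
    #include <functional>
    #include <optional>
    #include <string>
    #include <vector>

    struct TxEvent {
        enum class Type { ADDED, REMOVED, REPLACED };
        Type type;
        std::string txid;
        std::optional<std::string> replaced_by_txid; // set for REPLACED events
    };

    struct MempoolEntry {
        std::string txid;
        int64_t fee_sats;
        int32_t vsize;
    };

    class MempoolInterface
    {
    public:
        virtual ~MempoolInterface() = default;

        // Streaming part: deliver additions, removals and replacements.
        virtual void SubscribeEvents(std::function<void(const TxEvent&)> handler) = 0;

        // Query part: look up a single mempool entry, if present.
        virtual std::optional<MempoolEntry> GetEntry(const std::string& txid) = 0;

        // Submission part: submit a raw transaction; returns false and sets
        // an error message on rejection.
        virtual bool SubmitTransaction(const std::vector<uint8_t>& raw_tx, std::string& error) = 0;
    };
    ```

    A tracing-only interface would need just the streaming part; the question below is essentially whether the rest belongs in the same interface or in separate, more stable ones.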


    Some questions I'd like people to comment on:

    • Do you disagree with the general direction of this?
    • Where to draw the interface boundaries? Do you have a preference between an events interface that's just for streaming events and possibly multiple, separate interfaces for different components (a more general and stable mempool interface, a potentially less stable net interface for messages/connections)?
    • Is removing --capturemessages really a goal? Would a potential IPC-based replacement, with external tooling to write messages to disk, be enough to allow deprecating and removing it?

    prior discussion and other resources:

  2. 0xB10C commented at 10:39 AM on April 23, 2026: contributor

    For evaluating the overhead and performance of an IPC-tracing interface, a few initial ideas are:

    Ping benchmark: This shows message throughput with many small messages.

    1. implement an event for P2P messages similar to the utxocache events in #32898
    2. write a simple tool that connects to this IPC interface and receives these events
    3. have a custom P2P client that connects to a local regtest node and sends pings to it as fast as possible, ignoring the pongs. Maybe send 1M pings over a span of a few minutes. Potentially start slow and increase the ping rate over time.
    4. With only the eBPF tracepoints, measure how many events per second you receive and if any are dropped
    5. With only the IPC events, measure how many events per second you receive and how many are dropped (a sketch of this counting follows after this list)
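
    As a rough sketch of the counting in steps 4 and 5 (the same bookkeeping works for the IBD benchmark below), assuming each event carries a sequence number so the consumer can detect drops. How events actually arrive (eBPF perf buffer vs. an IPC stream) is left out, and the main() below just feeds synthetic sequence numbers.

    ```cpp
    // Receive-side measurement sketch: events per second and detected drops,
    // assuming a per-event sequence number (an assumption about the eventual
    // interface, not something the current tracepoints provide).
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <optional>

    class EventRateCounter
    {
        std::optional<uint64_t> m_last_seq;
        uint64_t m_received{0};
        uint64_t m_dropped{0};
        uint64_t m_window_count{0};
        std::chrono::steady_clock::time_point m_window_start{std::chrono::steady_clock::now()};

    public:
        void Process(uint64_t seq)
        {
            ++m_received;
            ++m_window_count;
            if (m_last_seq && seq > *m_last_seq + 1) m_dropped += seq - *m_last_seq - 1;
            m_last_seq = seq;

            const auto now = std::chrono::steady_clock::now();
            if (now - m_window_start >= std::chrono::seconds{1}) {
                std::printf("events/s: %llu  total: %llu  dropped: %llu\n",
                            static_cast<unsigned long long>(m_window_count),
                            static_cast<unsigned long long>(m_received),
                            static_cast<unsigned long long>(m_dropped));
                m_window_count = 0;
                m_window_start = now;
            }
        }

        void PrintSummary() const
        {
            std::printf("final: received %llu, dropped %llu\n",
                        static_cast<unsigned long long>(m_received),
                        static_cast<unsigned long long>(m_dropped));
        }
    };

    int main()
    {
        EventRateCounter counter;
        // Stand-in for the real event loop (eBPF ring buffer or IPC stream):
        // feed synthetic sequence numbers and skip some to simulate drops.
        for (uint64_t seq = 0; seq < 1'000'000; ++seq) {
            if (seq % 1000 == 999) continue; // simulated drop
            counter.Process(seq);
        }
        counter.PrintSummary();
    }
    ```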

    IBD benchmark: This shows message throughput with many large messages:

    1. implement an event for P2P messages similar to the utxocache events in #32898
    2. Do a mainnet initial block download from a localhost node and measure how long it takes.
    3. With only the eBPF tracepoints, measure how many blocks per second we receive and if any are dropped.
    4. With only the IPC events, measure how many blocks per second you receive and how many are dropped. @willcl-ark also noted a while back that he saw high CPU usage with the utxocache events during IBD. So might be interesting to check this.
  3. fanquake added the label Brainstorming on Apr 23, 2026
  4. fanquake added the label interfaces on Apr 23, 2026
  5. ViniciusCestarii commented at 1:05 PM on April 24, 2026: contributor

    Great write-up. A few thoughts on the open questions:

    General direction: Strongly agree. The privilege requirement alone makes USDT inaccessible for most developers, and the CI friction compounds that. IPC-based tracing solves the right problems.

    Interface boundaries: I'd argue for keeping operational functionality separate from tracing: if a subsystem's interface starts wanting tx submission or mempool queries, that deserves its own dedicated interface following the mining interface precedent. For the tracing layer itself, I'm genuinely uncertain whether a single events interface or per-subsystem interfaces is better. It would help to know whether current USDT users typically want to correlate events across subsystems or tend to focus on one at a time. That seems like the right data to drive this decision.

  6. willcl-ark commented at 9:39 AM on April 27, 2026: member

    @willcl-ark also noted a while back that he saw high CPU usage with the utxocache events during IBD. So might be interesting to check this.

    Would only note here that this was done on a Hetzner CX22, with 2 vCPUs, 4 GB of RAM, and 40 GB of disk space. This is already pretty constrained for a pruned full node doing IBD, but I was also obviously running the IPC collector on the same machine.

  7. Bicaru20 commented at 5:38 PM on May 4, 2026: none

    Hi, I've been using the tracepoints to study the transactions replaced via RBF, specifically mempool:replaced. Personally, at the beginning it was difficult to set up all the dependencies and make the tracepoint work. It is also not easy to understand and use the tracing scripts (although the examples are really great!). On top of that, the fact that the tracepoints require privileges is not ideal, especially if you are running the program on an external server.

    I also tried to modify the tracing script but I was not successful. Now I see that maybe it was due to the interface not being typed. In any case, I don't feel it is trivial to modify these scripts for specific use cases.

    When extracting data about replacements, I noticed that the tracepoints sometimes return incorrect information. For example, if a transaction has a child in the mempool and it gets replaced, the child is also marked as replaced. The tracepoints then indicate that the child is being replaced by the parent transaction, while the parent appears to be replaced by nothing. I’m not sure whether issues like this would be resolved by introducing IPC. I also haven’t yet verified whether this behavior originates from a fault in the tracepoints themselves.

    I strongly agree with the general direction. Simply making the setup easier than the tracepoints and removing the dependency on special permissions makes this worth implementing.

    Interface boundaries: I would say that a subsystem-specific interface is better. In my case, I am only monitoring the mempool, so I would rather rely on a stable interface dedicated to that subsystem. But even when monitoring multiple components simultaneously, I think it still makes more sense for each subsystem to expose its own interface, with aggregation and comparison handled afterwards by a personalized backend.

  8. stickies-v commented at 2:30 PM on May 8, 2026: contributor

    Concept ACK, the general direction makes sense. I've not used eBPF/USDT in meaningful ways, but I think providing easier and more ways to monitor is useful for the project.

    Wrt interface, I generally prefer multiple, smaller, generic endpoints that different kinds of users can compose in ways that make most sense for their application.

  9. sipa commented at 2:31 PM on May 8, 2026: member

    Concept ACK on exploring this approach as a replacement for the eBPF interface, assuming it is found to be sufficiently performant. I believe eBPF has only seen pretty narrow use, and breaking compatibility there is acceptable (with a deprecation period etc.). Using IPC seems to be more flexible and easier to extend to more functionality. That said, I have not reviewed the code changes needed, and am not committing to reviewing this.

  10. edilmedeiros commented at 8:44 PM on May 9, 2026: contributor

    I'm favorable to the concept and willing to help with review.

    I would prefer to see a modular approach as much as possible, i.e., the user/developer enables smaller endpoints: better composability and probably less burden on the node for things it doesn't need.

    The main concern I see is not only runtime overhead, but API creep: once an internal event is exposed through a convenient IPC interface, external tools may start depending on it as if it were a stable API. I feel that #29912 is related in a broader sense: once interfaces are meant to be consumed by external tools, the schema and stability contract become part of the design. I would explicitly treat any design here as experimental for quite some time.

    Having said that, maybe the distinction is not only modularity by subsystem, but also modularity by stability level, so the split should be along two axes: subsystem boundaries (net, mempool, validation, utxocache, etc.) and stability boundaries. A potential mempool interface used by Lightning nodes or explorers has a different character from a net tracepoint used to understand message scheduling or compact block reconstruction. Making this somewhat explicit should make this interface useful for many audiences.

    For developers, tracepoints are valuable because they expose internal behavior without turning the program into a debugger session. Debuggers work best for inspecting a stopped state. Tracepoints, on the other hand, are useful for understanding how the program reached a specific state. I would guess developers are keen on questions like: when a P2P message is received, which thread processed it, whether a validation event followed, whether a cache flush or mempool mutation happened nearby; i.e. implementation details matter, and thus the interface should follow the implementation, not the other way around. I think the current tracepoints can't even help with many of these questions.

    For a network researcher, a node is more like a measurement instrument where the event stream is the product itself (the most notable works that come to my mind are https://www.dsn.kastel.kit.edu/bitcoin/ and https://github.com/peer-observer/). These tend to ask more abstract questions like how fast transactions and blocks propagate, how often compact block reconstruction fails, how peers behave under different relay conditions, whether we can fingerprint a node based on how it behaves on the network, etc. A researcher often needs to reconstruct sequences of events and correlate them with events from other nodes or other data sources, and probably benefits from more stable semantics that give some guarantees about timing, ordering, and loss accounting.

    A node operator usually does not want raw trace events; they want to know whether the node is healthy. I think of things like: number of connected peers, inbound/outbound peer counts, message rates, block download progress, stale tip status, mempool size, mempool churn, compact block reconstruction failures, validation queue backlog, UTXO cache flush durations, disk or network bottlenecks. These are metrics or status summaries, and some are already available via the RPC interface.

  11. morozow commented at 11:08 AM on May 11, 2026: none

    Hi, I went through the current issue discussion, and the first thing I noticed is that we do not yet have enough concrete metrics to reason about the trade-offs in a measurable way. I measured IPC tracing overhead vs eBPF on a -reindex of 200k mainnet blocks (heights 0–200,000), using the same binary and the same block data on Docker Linux arm64, with the conditions run sequentially and cache flushes between them.

    Condition                      Flag       Time (s)  Blocks/s  Overhead
    Baseline (no observers)        baseline   466.26    428.9     –
    eBPF kernel counters           ebpf       493.44    405.3     5.83%
    eBPF data delivery             ebpf_full  849.71    235.4     82.24%
    IPC pipe (6 workers, raw)      raw_ipc    495.58    403.6     6.29%
    IPC pipe (6 workers, managed)  ipc        489.00    409.0     4.88%

    The benchmark's IPC path serializes events to JSON, writes to POSIX pipes asynchronously through a bounded queue, and 6 forked worker processes consume from stdin. The USDT tracepoints are mirrored identically. The eBPF overhead comes from bpftrace polling perf buffers in userspace per event batch, whereas the pipe write is a non-blocking kernel buffer copy. In the benchmark, ebpf measures kernel-side counters only while no data leaves the kernel – useless for actual observability. ebpf_full extracts event fields and delivers them to userspace via a perf ring buffer – this is what any real eBPF monitoring tool does when it needs to show you block heights, transaction fees, or peer addresses.

    I would also like to note one architectural difference: with IPC, consumer-side processing scales horizontally. The IPC approach allows adding more worker processes without increasing overhead on the bitcoind side, since the cost is one enqueue and one pipe write regardless of how many consumers read downstream. With eBPF, the perf buffer is a single-reader bottleneck, so additional analysis means either running multiple bpftrace instances with separate probe attachments or post-processing the single output stream.
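
    A simplified, single-process sketch of that pipeline shape (bounded non-blocking enqueue, writer thread, newline-delimited JSON over a pipe); the names are made up and this is not the actual benchmark code:

    ```cpp
    // Simplified, single-process sketch of the described IPC path: the traced
    // thread does a bounded, non-blocking enqueue (dropping when full so slow
    // consumers cannot stall the node); a writer thread drains the queue and
    // writes newline-delimited JSON into a pipe that workers read from stdin.
    #include <condition_variable>
    #include <cstddef>
    #include <cstdio>
    #include <deque>
    #include <mutex>
    #include <string>
    #include <thread>
    #include <unistd.h>

    class BoundedEventQueue
    {
        std::mutex m_mutex;
        std::condition_variable m_cv;
        std::deque<std::string> m_queue;
        const std::size_t m_capacity;
        bool m_done{false};

    public:
        explicit BoundedEventQueue(std::size_t capacity) : m_capacity{capacity} {}

        // Hot path: never blocks; a full queue means the event is dropped.
        bool TryPush(std::string event_json)
        {
            {
                std::lock_guard lock{m_mutex};
                if (m_done || m_queue.size() >= m_capacity) return false;
                m_queue.push_back(std::move(event_json));
            }
            m_cv.notify_one();
            return true;
        }

        // Writer side: blocks until an event is available or the queue is closed.
        bool Pop(std::string& out)
        {
            std::unique_lock lock{m_mutex};
            m_cv.wait(lock, [&] { return !m_queue.empty() || m_done; });
            if (m_queue.empty()) return false;
            out = std::move(m_queue.front());
            m_queue.pop_front();
            return true;
        }

        void Close()
        {
            { std::lock_guard lock{m_mutex}; m_done = true; }
            m_cv.notify_all();
        }
    };

    int main()
    {
        int fds[2];
        if (pipe(fds) != 0) return 1; // fds[1] = node side, fds[0] = worker side

        BoundedEventQueue queue{/*capacity=*/4096};

        // Writer thread: one pipe write per event, regardless of consumer count.
        std::thread writer{[&] {
            std::string line;
            while (queue.Pop(line)) {
                line.push_back('\n');
                (void)write(fds[1], line.data(), line.size());
            }
            close(fds[1]); // EOF for the workers
        }};

        // Stand-in for tracepoint call sites emitting JSON events.
        for (int i = 0; i < 10; ++i) {
            queue.TryPush("{\"event\":\"ping\",\"seq\":" + std::to_string(i) + "}");
        }
        queue.Close();
        writer.join();

        // Stand-in for one forked worker: read everything from the pipe.
        char buf[4096];
        ssize_t n;
        while ((n = read(fds[0], buf, sizeof(buf))) > 0) std::fwrite(buf, 1, n, stdout);
        close(fds[0]);
    }
    ```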

    I added a reproduction Dockerfile and benchmark script to the fork: https://github.com/morozow/bitcoin_rd/tree/rfc/ebpf-ipc-tracing/contrib/perf/docker
    The benchmark can produce negative overhead values, e.g. raw_ipc at -1.47%. That's measurement noise, not a real speedup. Single-run variance on this workload is ±2-3% due to OS scheduling, thermal state, and I/O timing. A negative value means the condition's overhead is effectively zero – within the noise floor of the measurement.

  12. vasild commented at 8:59 AM on May 14, 2026: contributor

    Concept ACK

