Discussion: Potential USDTs (User Static Defined Traces) for Core #20981

0xB10C opened this issue on January 21, 2021
  1. 0xB10C commented at 7:49 pm on January 21, 2021: contributor

    USDTs are tracepoints and behave similarly to trace logging (no performance impact if not used), but can pass data into scripts running in a Linux kernel VM for collection and further evaluation. The required build system changes and a tracing framework for User Static Defined Traces (USDTs) were merged in #19866, and some additional discussion took place in #20960. Simpler scripts, such as ones collecting and printing metrics as histograms, can be created with the bpftrace tool. More complex bcc scripts are written in C and loaded into the kernel with, for example, the bcc Python module, where the passed data can be processed.
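
    As a rough illustration of the workflow, probes compiled into the binary can be listed and attached to with bpftrace one-liners like the following. This is a sketch only: the binary path and the net:push_message probe are assumptions based on the proposals below.

        # List the USDT probes compiled into a bitcoind binary.
        bpftrace -l "usdt:./src/bitcoind:*"

        # Attach to one probe and print a line per hit.
        bpftrace -e 'usdt:./src/bitcoind:net:push_message { printf("%s: %d bytes\n", str(arg2), arg3); }'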

    Following the core-dev-meeting discussion on 21st Jan 2021, the goal here is to list potentially interesting USDTs and what data they should pass. The USDT should have:

    1. a clear use-case and motivation
    2. a semi-stable API (the order of arguments doesn’t change, so people can rely on these for scripting)
    3. no expensive operations added specifically for this USDT (no parsing, serialization, … - just passing of already available data)
    4. documentation in doc/tracing.md
    5. (ideally) a usage example as a bpftrace or bcc script
    6. (need to double check this) a maximum of 512 bytes passed (max stack size in bpftrace)

    While some kind of testing of the USDT examples would be good, root privileges are required to load programs in the kernel VM. Might still be possible to run them in a CI, though?

    What could be useful USDTs? What data should they pass? Where should they be placed in the code?

  2. 0xB10C added the label Feature on Jan 21, 2021
  3. 0xB10C commented at 3:03 pm on January 23, 2021: contributor

    Some of the USDTs previously proposed by @jb55 in #19866:

    net:push_message

    Traces all sent P2P messages as TRACE5:

    • int node_id
    • char* node_addr
    • char* msg_type
    • size_t msg_size
    • uint8_t* msg_data

    https://github.com/bitcoin/bitcoin/blob/7d8a10a6f4d61df1f4afb138e379149565afa02d/src/net.cpp#L3001


    net:process_message

    Traces all received P2P messages as TRACE5:

    • int node_id
    • char* node_addr
    • char* msg_type
    • size_t msg_size
    • uint8_t* msg_data

    https://github.com/bitcoin/bitcoin/blob/7d8a10a6f4d61df1f4afb138e379149565afa02d/src/net_processing.cpp#L3812

    #19866 has an example bpftrace script using net:process_message and net:push_message.
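
    A minimal bpftrace sketch along those lines, assuming the probe names and argument order listed above (the binary path is an assumption too):

        #!/usr/bin/env bpftrace
        // Tally P2P traffic per message type and direction.
        // arg2 is the char* msg_type, arg3 the size_t msg_size.

        usdt:./src/bitcoind:net:push_message
        {
            @sent_msgs[str(arg2)] = count();
            @sent_bytes[str(arg2)] = sum(arg3);
        }

        usdt:./src/bitcoind:net:process_message
        {
            @recv_msgs[str(arg2)] = count();
            @recv_bytes[str(arg2)] = sum(arg3);
        }

    bpftrace prints the maps on exit (e.g. on Ctrl-C), so this yields per-message-type counters without any userspace processing.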


    validation:connect_block

    Traces all blocks for which ConnectBlock() is called as TRACE4:

    • uint32 height
    • int tx_count
    • uint8_t[32] hash
    • uint8_t[32] prevhash
    • (more header fields?)

    Passing the full serialized block is not possible, as this could be bigger than both the bcc and bpftrace stacks (the kernel VM fails gracefully, no stack overflow). While the serialized block header would not be bigger than the stack size, header serialization would need to happen even when no USDTs are attached. This has a performance impact and should not be done. @jb55 used an earlier version of validation:connect_block to measure IBD performance.

    Depending on placement in ConnectBlock(), this could trace either to-be-connected blocks or successfully connected blocks (no error while connecting). Either way, the documentation should make clear which blocks the data is passed for.

    https://github.com/bitcoin/bitcoin/blob/7d8a10a6f4d61df1f4afb138e379149565afa02d/src/validation.cpp#L1946
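
    A bpftrace sketch of how this could be consumed while staying within the stack limits discussed above (probe name and argument order as proposed here; the binary path is an assumption):

        #!/usr/bin/env bpftrace
        // Print each block the tracepoint fires for and keep a
        // histogram of transactions per block.
        // arg0: uint32 height, arg1: int tx_count
        // (the hash arguments are omitted for brevity).

        usdt:./src/bitcoind:validation:connect_block
        {
            printf("block %u: %d txs\n", arg0, arg1);
            @txs_per_block = hist(arg1);
        }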

  4. 0xB10C commented at 3:27 pm on January 23, 2021: contributor

    @laanwj mentioned that traces for peer connections and disconnections would be useful too. I’ll have a look at where these could be placed. I guess there are some places where there is already net debug logging that can be used for orientation. @jlopp’s statoshi diff could be helpful to look at too.

    Also interesting:

  5. laanwj commented at 7:42 pm on January 24, 2021: member

    @jlopp’s statoshi diff could be helpful to look at too.

    FWIW the statoshi diff can be found here: https://github.com/jlopp/statoshi/compare/1b90a7b61a37d450bbe5b485dcc9b8c591567e0d..28b88353a78ae19e3c5379bbda6b72f4fef65a8c

  6. jb55 commented at 11:07 pm on January 25, 2021: contributor

    “Wladimir J. van der Laan” notifications@github.com writes:

    @jlopp’s statoshi diff could be helpful to look at too.

    FWIW the statoshi diff can be found here: https://github.com/jlopp/statoshi/compare/1b90a7b61a37d450bbe5b485dcc9b8c591567e0d..28b88353a78ae19e3c5379bbda6b72f4fef65a8c

    I was looking at this a long time ago. In theory you could get the same functionality as statoshi (stats gathering via prometheus) with https://github.com/cloudflare/ebpf_exporter

    Would be an interesting project.

  7. laanwj commented at 11:53 am on January 26, 2021: member

    Would be an interesting project.

    I agree! This would be my preferred way to do this kind of reporting, to be honest. It’d be flexible and built into the operating system. It has low impact on the code and low impact on performance. It could potentially work for all software.

    One reason Statoshi never got merged is because it’s just too much hassle to integrate and maintain a Bitcoin Core specific framework for this. Not because it’s not a great thing to have.

  8. laanwj commented at 9:34 am on March 19, 2021: member

    Besides networking, one major thing that would be interesting to monitor through tracepoints is dbcache usage. This is useful to get metrics for hardware optimization of validation (e.g. to find where bottlenecks are). A few things come to mind (see the sketch after this list):

    • Gross statistics on changes (this is in the log but it’s good to have a well defined interface for it)
      • how much memory is used of how much
      • how many UTXOs are cached
    • Before/afters for timing:
      • Insertions and deletions of UTXOs in the CoinsCache with associated information
      • Queries: hits and misses
      • Flushes
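
    A hypothetical bpftrace sketch of the flush-timing idea: none of these probes exist yet, and the utxocache:flush name and its arguments are invented here purely for illustration.

        #!/usr/bin/env bpftrace
        // Hypothetical: assumes a utxocache:flush tracepoint passing
        // arg0 = flush duration in microseconds and
        // arg1 = number of coins flushed.

        usdt:./src/bitcoind:utxocache:flush
        {
            printf("flushed %d coins in %d us\n", arg1, arg0);
            @flush_duration_us = hist(arg0);
        }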
  9. jb55 commented at 5:21 pm on April 12, 2021: contributor

    On Fri, Mar 19, 2021 at 02:34:27AM -0700, W. J. van der Laan wrote:

    Besides networking, one major thing that would be interesting to monitor through tracepoints is dbcache usage. This is useful to get metrics for hardware optimization of validation (e.g. to find where bottlenecks are). A few things come to mind:

    • how much memory is used of how much

    I’ve been using heaptrack[1] for memory tracing; it hooks into malloc calls, which saves us from putting allocation traces everywhere.

    • Gross statistics on changes (this is in the log but it’s good to have a well defined interface for it)
    • how many UTXOs are cached
    • Before/afters for timing:
    • Insertions and deletions of UTXOs in the CoinsCache with associated information
    • Queries: hits and misses
    • Flushes

    💯👍

    [1] https://github.com/KDE/heaptrack

  10. laanwj commented at 7:02 pm on April 12, 2021: member

    I’ve been using heaptrack[1] for memory tracing; it hooks into malloc calls, which saves us from putting allocation traces everywhere.

    That sounds useful; however, I think it would show gross malloc usage, not specifically that of the utxo set? Or can it drill down to more specific things as well?

  11. jb55 commented at 7:19 pm on April 12, 2021: contributor

    On Mon, Apr 12, 2021 at 12:03:09PM -0700, W. J. van der Laan wrote:

    I’ve been using heaptrack[1] for memory tracing; it hooks into malloc calls, which saves us from putting allocation traces everywhere.

    That sounds useful; however, I think it would show gross malloc usage, not specifically that of the utxo set? Or can it drill down to more specific things as well?

    It shows memory usage from specific data structures over time, like the cacheCoins map. You can also drill down by which function is doing the allocation.

    See: #18849 (comment)

  12. 0xB10C commented at 8:40 am on May 5, 2021: contributor
    I’ve started working on a patch that adds tracing documentation (based on laanwj’s work) and the net:push_message + net:process_message + more (based on jb55’s work), together with both bpftrace and bcc examples. The goal is to have some initial, easy-to-work-with examples of what USDTs in Core could look like. There will be a Bitcoin Core Review Club on USDTs later this month.
  13. jb55 commented at 5:43 pm on May 5, 2021: contributor

    On Wed, May 05, 2021 at 01:41:18AM -0700, 0xB10C wrote:

    I’ve started working on a patch that adds tracing documentation (based on laanwj’s work) and the net:push_message + net:process_message + more (based on jb55’s work), together with both bpftrace and bcc examples. The goal is to have some initial, easy-to-work-with examples of what USDTs in Core could look like.

    Great! Looking forward to it. Thanks for moving this along.

    There will be a Bitcoin Core Review Club on USDTs later this month.

    I’ll be there

  14. 0xB10C commented at 9:54 am on May 21, 2021: contributor

    laanwj mentioned that traces for dis- and connections of peers would be useful too. I’ll have a look where these could be placed.

    @jonatack added that besides normal disconnects, tracepoints for peer evictions would be useful too. He provided input on which tracepoint arguments he’d find useful (see the sketch after this list):

    • message
    • connection direction and type
    • peer id
    • address
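
    A hypothetical bpftrace sketch with those arguments (the net:disconnected probe name and the argument order are invented for illustration):

        #!/usr/bin/env bpftrace
        // Hypothetical: assumes arg0 = peer id, arg1 = char* address,
        // arg2 = char* connection type, arg3 = inbound flag,
        // arg4 = char* disconnect/eviction message.

        usdt:./src/bitcoind:net:disconnected
        {
            printf("peer=%d addr=%s type=%s inbound=%d msg=%s\n",
                arg0, str(arg1), str(arg2), arg3, str(arg4));
        }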
  15. jonatack commented at 10:06 am on May 21, 2021: contributor

    @0xB10C thanks! Yes, my wishlist ties in a bit with issue #20576. I wish we had those in our net logging, separated into high/low frequency/level events.

    Edit: could add the network to the args list.

  16. 0xB10C commented at 12:02 pm on July 27, 2021: contributor

    I was looking at this a long time ago. In theory you could get the same functionality as statoshi (stats gathering via prometheus) with https://github.com/cloudflare/ebpf_exporter. Would be an interesting project.

    I agree! This would be my preferred way to do this kind of reporting, to be honest. It’d be flexible and built into the operating system. It has low impact on the code and low impact on performance. It could potentially work for all software.

    One reason Statoshi never got merged is because it’s just too much hassle to integrate and maintain a Bitcoin Core specific framework for this. Not because it’s not a great thing to have.

    I have a demo for Prometheus metrics via eBPF, based on the tracepoints added in #22006, in 0xB10C/bitcoind-observer. It’s written in Rust and uses the bcc crate under the hood, which recently added USDT support.

    A live demo can be found on bitcoind.observer. It currently shows block validation during IBD and per-message-type P2P traffic stats. I was surprised how well it works. It might be a real alternative to statoshi once we add more tracepoints about e.g. the UTXO set, peer dis/connections, and so on.

  17. laanwj referenced this in commit 61cefde7a7 on Jul 27, 2021
  18. sidhujag referenced this in commit 1f479d9c1a on Jul 28, 2021
  19. 0xB10C commented at 2:20 pm on August 10, 2021: contributor
    Tracepoints for Bitcoin Core startup and shutdown could be useful too, as could IBD latching to true/false.
  20. 0xB10C commented at 1:48 pm on August 13, 2021: contributor
    Banman tracepoints should be interesting too.
  21. maflcko referenced this in commit 5522408f30 on Sep 9, 2021
  22. maflcko referenced this in commit 5417758166 on Sep 9, 2021
  23. laanwj referenced this in commit 913b7148a2 on Nov 29, 2021
  24. RandyMcMillan referenced this in commit e5d47bc614 on Dec 23, 2021
  25. 0xB10C commented at 6:59 am on April 7, 2022: contributor

    NYDIG has a $4500 bounty for mempool tracepoints.

    https://nydig.com/bounties/add-bpf-tracepoints-for-the-mempool

  26. glozow commented at 7:32 pm on April 22, 2022: member

    Some mempool-related telemetry that I think would be helpful (a hypothetical sketch for one of these follows the list):

    • Transaction traversal count and memory usage when calculating mempool descendants and ancestors. This can inform us about performance/memory benefits of epochs and our current ancestor/descendant limits. I’m particularly interested to see information about this for reorgs.
    • Records of prevouts of transactions added to the mempool. This would let us track things like “how many times is a tx replaced / how many bytes were relayed before this payment confirmed?”
    • Signature and script cache usage.
    • {Resource usage, rate of valid transactions} of our data structures holding invalid/unvalidated transactions: m_orphanage, vExtraTxnForCompact, m_recent_rejects filter
    • Query and hit rate for LockPoints cached in mempool entries
    • The “shape” of packages in mempools, e.g. number of direct parents vs total ancestors
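
    As one example, a hypothetical bpftrace sketch for the signature cache point (the validation:sigcache_lookup probe and its argument are invented for illustration):

        #!/usr/bin/env bpftrace
        // Hypothetical: assumes a tracepoint that fires on every
        // signature cache lookup with arg0 = 1 on a hit, 0 on a miss.

        usdt:./src/bitcoind:validation:sigcache_lookup
        {
            @sigcache[arg0 ? "hit" : "miss"] = count();
        }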

    NYDIG has a $4500 bounty for mempool tracepoints.

    It’s nice that there’s a bounty to encourage contributions. I’m glad the bounty stresses the importance of a clearly-defined use case. We already have logs for {added, removed} mempool transactions, but there’s definitely room for improvement. Hopefully NYDIG would be okay with different but still useful instrumentation?

  27. russeree commented at 10:41 am on April 23, 2022: contributor

    @glozow those would be some awesome use cases for eBPF tracepoints, quite technical in their scope. I would like to work on a few of these points if desired. Number 1 has most of my interest.

    Regarding #24935

    Regarding my existing PR for mempool tracepoints: should I remove mempool addition/eviction tracepoints as something to even pursue? Even if it no longer qualifies for the NYDIG bounty, I want to contribute. Learning about the power of eBPF, USDT, and bcc has been a great addition to my skillset and has captured my interest.

    Existing logging/tracing solutions in the mempool (adds & evictions)

    The current implementation seems to use the ZMQ interface, but it doesn’t provide a complete solution for tracing all additions to and removals from the pool.

    There are two functions for broadcasting mempool events referenced in txmempool.cpp, and one of them is not even used: CValidationInterface::TransactionAddedToMempool and CZMQNotificationInterface::TransactionRemovedFromMempool.

    • Currently, CValidationInterface::TransactionAddedToMempool is not called anywhere in this repo.
    • CZMQNotificationInterface::TransactionRemovedFromMempool is only called if the tx was not removed for the reason ‘BLOCK’. The current implementation instead leaves a multiline comment stating that to find the txs evicted by a block, one should look at the latest txs in the mined block. A valid solution, but why not leverage a trace to a sandboxed kernel program with extremely little overhead? Even if the user doesn’t need this functionality, it can be compiled down to a nop instruction if built without tracing support.

    On the topic of block evictions: they seem to be one of the hardest events to trace/log if the 256-bit txid is included, since the volume of data being streamed is quite high, with up to thousands of evictions in a very short time. Using a ring buffer and a bpffs queue in tandem, all mempool evictions can be logged completely, and I am pretty sure they maintain the order in which the evictions were requested.
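
    bpftrace doesn’t expose the transport choice directly, but in-kernel aggregation is another way to tame the volume when the individual txids aren’t needed; a sketch assuming a hypothetical mempool:removed tracepoint:

        #!/usr/bin/env bpftrace
        // Hypothetical: assumes a mempool:removed tracepoint where
        // arg1 is a char* removal reason (e.g. "BLOCK", "EXPIRY").

        usdt:./src/bitcoind:mempool:removed
        {
            @removed[str(arg1)] = count();
        }

        // Dump and reset the counters every 10 seconds.
        interval:s:10
        {
            print(@removed);
            clear(@removed);
        }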

    #24935 eBPF solution to logging mempool additions/evictions

    Firstly, by using eBPF tracepoints, the overhead of tracing an event is minimal and isolated. bpftrace, Python, and bcc scripts can be written in any manner desired to accommodate the collection of performance stats, testing, and custom user metrics. These custom scripts have no impact on the underlying functionality of the user’s current bitcoind build, as all data is passed through the kernel sandbox, and the Bitcoin Core traces are unidirectional events that include a trace of all the txs being passed into and out of the mempool.

    The example .py script implemented to trace mempool events in #24935 logs each request for a tx addition to or removal from the mempool, plus the relevant information to validate/compare the current mempool with its output. These logs are complete, without erroneous data. The whole extended test suite was run, and it was possible to log all 328416 function calls that add or evict txs from the mempool. As a note, even if these traces are determined not to be a candidate, using some of the methods included in log_mempool_activity.py would be very useful for fixing some of the existing scripts that display incomplete data and/or have dropped events.

    Thanks

    Edit: a word…phrases

  28. 0xB10C commented at 2:13 pm on December 1, 2022: contributor
    ref: #26531, which adds mempool tracepoints too (https://github.com/bitcoin/bitcoin/pull/24935 has been closed)
  29. vijaydasmp referenced this in commit 54bcffbc99 on Nov 1, 2023
  30. vijaydasmp referenced this in commit c45e0c6f02 on Nov 6, 2023
  31. vijaydasmp referenced this in commit f9b72d525d on Nov 21, 2023
  32. vijaydasmp referenced this in commit 20b2524b30 on Nov 25, 2023
  33. 0xB10C commented at 2:45 pm on April 10, 2024: contributor
    Can probably close this, as there hasn’t been much activity. Many of the tracepoints proposed here have been implemented in some form; others haven’t sparked enough interest.
  34. 0xB10C closed this on Apr 10, 2024

