Discussion: Potential USDTs (User Static Defined Traces) for Core #20981

0xB10C opened this issue on January 21, 2021
  1. 0xB10C commented at 7:49 pm on January 21, 2021: contributor

    USDTs are tracepoints and behave similarly to trace logging (no performance impact if not used), but can pass data into scripts running in a Linux kernel VM for collection and further evaluation. The required build system changes and a tracing framework for User Static Defined Traces (USDTs) were merged in #19866, and some additional discussion took place in #20960. Simpler scripts, such as ones collecting and printing metrics as histograms, can be created with the bpftrace tool. More complex bcc scripts are written in C and loaded into the kernel with, for example, the bcc Python module, where the passed data can be processed.
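
    As a rough illustration of the workflow, probes compiled into the binary can be listed and attached to with bpftrace one-liners like the following. This is a sketch only: the binary path and the net:push_message probe are assumptions based on the proposals below.

        # List the USDT probes compiled into a bitcoind binary.
        bpftrace -l "usdt:./src/bitcoind:*"

        # Attach to one probe and print a line per hit.
        bpftrace -e 'usdt:./src/bitcoind:net:push_message { printf("%s: %d bytes\n", str(arg2), arg3); }'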

    Following the core-dev-meeting discussion on 21st Jan 2021, the goal here is to list potentially interesting USDTs and what data they should pass. The USDT should have:

    1. a clear use-case and motivation
    2. a semi-stable API (the order of arguments doesn’t change, so people can rely on these for scripting)
    3. no expensive operations added specifically for this USDT (no parsing, serialization, … - just passing of already available data)
    4. documentation in doc/tracing.md
    5. (ideally) a usage example as a bpftrace or bcc script
    6. (need to double check this) a maximum of 512 bytes passed (max stack size in bpftrace)

    While some kind of testing of the USDT examples would be good, root privileges are required to load programs in the kernel VM. Might still be possible to run them in a CI, though?

    What could be useful USDTs? What data should they pass? Where should they be placed in the code?

  2. 0xB10C added the label Feature on Jan 21, 2021
  3. 0xB10C commented at 3:03 pm on January 23, 2021: contributor

    Some of the USDTs previously proposed by @jb55 in #19866:

    net:push_message

    Traces all sent P2P messages as TRACE5:

    • int node_id
    • char* node_addr
    • char* msg_type
    • size_t msg_size
    • uint8_t* msg_data

    https://github.com/bitcoin/bitcoin/blob/7d8a10a6f4d61df1f4afb138e379149565afa02d/src/net.cpp#L3001


    net:process_message

    Traces all received P2P messages as TRACE5:

    • int node_id
    • char* node_addr
    • char* msg_type
    • size_t msg_size
    • uint8_t* msg_data

    https://github.com/bitcoin/bitcoin/blob/7d8a10a6f4d61df1f4afb138e379149565afa02d/src/net_processing.cpp#L3812

    #19866 has an example bpftrace script using net:process_message and net:push_message.
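
    A minimal bpftrace sketch along those lines, assuming the probe names and argument order listed above (the binary path is an assumption too):

        #!/usr/bin/env bpftrace
        // Tally P2P traffic per message type and direction.
        // arg2 is the char* msg_type, arg3 the size_t msg_size.

        usdt:./src/bitcoind:net:push_message
        {
            @sent_msgs[str(arg2)] = count();
            @sent_bytes[str(arg2)] = sum(arg3);
        }

        usdt:./src/bitcoind:net:process_message
        {
            @recv_msgs[str(arg2)] = count();
            @recv_bytes[str(arg2)] = sum(arg3);
        }

    bpftrace prints the maps on exit (e.g. on Ctrl-C), so this yields per-message-type counters without any userspace processing.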


    validation:connect_block

    Traces all blocks for which ConnectBlock() is called as TRACE4:

    • uint32 height
    • int tx_count
    • uint8_t[32] hash
    • uint8_t[32] prevhash
    • (more header fields?)

    Passing the full serialized block is not possible, as this could be bigger than both the bcc and bpftrace stacks (the kernel VM fails gracefully, no stack overflow). While the serialized block header would not be bigger than the stack size, header serialization would need to happen even when no USDTs are attached. This has a performance impact and should not be done. @jb55 used an earlier version of validation:connect_block to measure IBD performance.

    Depending on placement in ConnectBlock(), this could trace either to-be-connected blocks or successfully connected blocks (no error while connecting). Either way, the documentation should make clear which blocks the data is passed for.

    https://github.com/bitcoin/bitcoin/blob/7d8a10a6f4d61df1f4afb138e379149565afa02d/src/validation.cpp#L1946
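
    A bpftrace sketch of how this could be consumed while staying within the stack limits discussed above (probe name and argument order as proposed here; the binary path is an assumption):

        #!/usr/bin/env bpftrace
        // Print each block the tracepoint fires for and keep a
        // histogram of transactions per block.
        // arg0: uint32 height, arg1: int tx_count
        // (the hash arguments are omitted for brevity).

        usdt:./src/bitcoind:validation:connect_block
        {
            printf("block %u: %d txs\n", arg0, arg1);
            @txs_per_block = hist(arg1);
        }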

  4. 0xB10C commented at 3:27 pm on January 23, 2021: contributor

    @laanwj mentioned that traces for peer connections and disconnections would be useful too. I’ll have a look at where these could be placed. I guess there are some places where there is already net debug logging that can be used for orientation. @jlopp’s statoshi diff could be helpful to look at too.

    Also interesting:

  5. laanwj commented at 7:42 pm on January 24, 2021: member

    @jlopp’s statoshi diff could be helpful to look at too.

    FWIW the statoshi diff can be found here: https://github.com/jlopp/statoshi/compare/1b90a7b61a37d450bbe5b485dcc9b8c591567e0d..28b88353a78ae19e3c5379bbda6b72f4fef65a8c

  6. jb55 commented at 11:07 pm on January 25, 2021: contributor

    “Wladimir J. van der Laan” notifications@github.com writes:

    @jlopp’s statoshi diff could be helpful to look at too.

    FWIW the statoshi diff can be found here: https://github.com/jlopp/statoshi/compare/1b90a7b61a37d450bbe5b485dcc9b8c591567e0d..28b88353a78ae19e3c5379bbda6b72f4fef65a8c

    I was looking at this a long time ago. In theory you could get the same functionality as statoshi (stats gathering via prometheus) with https://github.com/cloudflare/ebpf_exporter

    Would be an interesting project.

  7. laanwj commented at 11:53 am on January 26, 2021: member

    Would be an interesting project.

    I agree! This would be my preferred way to do this kind of reporting, to be honest. It’d be flexible and built into the operating system. It has low impact on the code and low impact on performance. It could potentially work for all software.

    One reason Statoshi never got merged is because it’s just too much hassle to integrate and maintain a Bitcoin Core specific framework for this. Not because it’s not a great thing to have.

  8. laanwj commented at 9:34 am on March 19, 2021: member

    Besides networking, one major thing that would be interesting to monitor through tracepoints is dbcache usage. This is useful to get metrics for hardware optimization of validation (e.g. to find where bottlenecks are). A few things come to mind (see the sketch after this list):

    • Gross statistics on changes (this is in the log but it’s good to have a well defined interface for it)
      • how much memory is used of how much
      • how many UTXOs are cached
    • Before/afters for timing:
      • Insertions and deletions of UTXOs in the CoinsCache with associated information
      • Queries: hits and misses
      • Flushes
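
    A hypothetical bpftrace sketch of the flush-timing idea: none of these probes exist yet, and the utxocache:flush name and its arguments are invented here purely for illustration.

        #!/usr/bin/env bpftrace
        // Hypothetical: assumes a utxocache:flush tracepoint passing
        // arg0 = flush duration in microseconds and
        // arg1 = number of coins flushed.

        usdt:./src/bitcoind:utxocache:flush
        {
            printf("flushed %d coins in %d us\n", arg1, arg0);
            @flush_duration_us = hist(arg0);
        }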
  9. jb55 commented at 5:21 pm on April 12, 2021: contributor

    On Fri, Mar 19, 2021 at 02:34:27AM -0700, W. J. van der Laan wrote:

    Besides networking, one major thing that would be interesting to monitor through tracepoints is dbcache usage. This is useful to get metrics for hardware optimization of validation (e.g. to find where bottlenecks are). A few things come to mind:

    • how much memory is used of how much

    I’ve been using heaptrack[1] for memory tracing; it hooks into malloc calls, which saves us from putting allocation traces everywhere.

    • Gross statistics on changes (this is in the log but it’s good to have a well defined interface for it)
    • how many UTXOs are cached
    • Before/afters for timing:
    • Insertions and deletions of UTXOs in the CoinsCache with associated information
    • Queries: hits and misses
    • Flushes

    💯👍

    [1] https://github.com/KDE/heaptrack

  10. laanwj commented at 7:02 pm on April 12, 2021: member

    I’ve been using heaptrack[1] for memory tracing; it hooks into malloc calls, which saves us from putting allocation traces everywhere.

    That sounds useful; however, I think it would show gross malloc usage, not specifically that of the utxo set? Or can it drill down to more specific things as well?

  11. jb55 commented at 7:19 pm on April 12, 2021: contributor

    On Mon, Apr 12, 2021 at 12:03:09PM -0700, W. J. van der Laan wrote:

    I’ve been using heaptrack[1] for memory tracing; it hooks into malloc calls, which saves us from putting allocation traces everywhere.

    That sounds useful; however, I think it would show gross malloc usage, not specifically that of the utxo set? Or can it drill down to more specific things as well?

    It shows memory usage from specific data structures over time, like the cacheCoins map. You can also drill down by which function is doing the allocation.

    See: #18849 (comment)

  12. 0xB10C commented at 8:40 am on May 5, 2021: contributor
    I’ve started working on a patch that adds tracing documentation (based on laanwj’s work) and the net:push_message + net:process_message + more (based on jb55’s work), together with both bpftrace and bcc examples. The goal is to have some initial, easy-to-work-with examples of what USDTs in Core could look like. There will be a Bitcoin Core Review Club on USDTs later this month.
  13. jb55 commented at 5:43 pm on May 5, 2021: contributor

    On Wed, May 05, 2021 at 01:41:18AM -0700, 0xB10C wrote:

    I’ve started working on a patch that adds tracing documentation (based on laanwj’s work) and the net:push_message + net:process_message + more (based on jb55’s work), together with both bpftrace and bcc examples. The goal is to have some initial, easy-to-work-with examples of what USDTs in Core could look like.

    Great! Looking forward to it. Thanks for moving this along.

    There will be a Bitcoin Core Review Club on USDTs later this month.

    I’ll be there

  14. 0xB10C commented at 9:54 am on May 21, 2021: contributor

    laanwj mentioned that traces for dis- and connections of peers would be useful too. I’ll have a look where these could be placed.

    @jonatack added that besides normal disconnects, tracepoints for peer evictions would be useful too. He provided input on which tracepoint arguments he’d find useful (see the sketch after this list):

    • message
    • connection direction and type
    • peer id
    • address
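
    A hypothetical bpftrace sketch with those arguments (the net:disconnected probe name and the argument order are invented for illustration):

        #!/usr/bin/env bpftrace
        // Hypothetical: assumes arg0 = peer id, arg1 = char* address,
        // arg2 = char* connection type, arg3 = inbound flag,
        // arg4 = char* disconnect/eviction message.

        usdt:./src/bitcoind:net:disconnected
        {
            printf("peer=%d addr=%s type=%s inbound=%d msg=%s\n",
                arg0, str(arg1), str(arg2), arg3, str(arg4));
        }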
  15. jonatack commented at 10:06 am on May 21, 2021: contributor

    @0xB10C thanks! Yes, my wishlist ties in a bit with issue #20576. I wish we had those in our net logging, separated into high/low frequency/level events.

    Edit: could add the network to the args list.

  16. 0xB10C commented at 12:02 pm on July 27, 2021: contributor

    I was looking at this a long time ago. In theory you could get the same functionality as statoshi (stats gathering via prometheus) with https://github.com/cloudflare/ebpf_exporter. Would be an interesting project.

    I agree! This would be my preferred way to do this kind of reporting, to be honest. It’d be flexible and built into the operating system. It has low impact on the code and low impact on performance. It could potentially work for all software.

    One reason Statoshi never got merged is because it’s just too much hassle to integrate and maintain a Bitcoin Core specific framework for this. Not because it’s not a great thing to have.

    I have a demo for Prometheus metrics via eBPF, based on the tracepoints added in #22006, in 0xB10C/bitcoind-observer. It’s written in Rust and uses the bcc crate under the hood, which recently added USDT support.

    A live demo can be found on bitcoind.observer. It currently shows block validation during IBD and per-message-type P2P traffic stats. I was surprised how well it works. It might be a real alternative to statoshi once we add more tracepoints about e.g. the UTXO set, peer dis/connections, and so on.

  17. laanwj referenced this in commit 61cefde7a7 on Jul 27, 2021
  18. sidhujag referenced this in commit 1f479d9c1a on Jul 28, 2021
  19. 0xB10C commented at 2:20 pm on August 10, 2021: contributor
    Tracepoints for Bitcoin Core startup and shutdown could be useful too, as could IBD latching to true/false.
  20. 0xB10C commented at 1:48 pm on August 13, 2021: contributor
    Banman tracepoints should be interesting too.
  21. maflcko referenced this in commit 5522408f30 on Sep 9, 2021
  22. maflcko referenced this in commit 5417758166 on Sep 9, 2021
  23. laanwj referenced this in commit 913b7148a2 on Nov 29, 2021
  24. RandyMcMillan referenced this in commit e5d47bc614 on Dec 23, 2021
  25. 0xB10C commented at 6:59 am on April 7, 2022: contributor

    NYDIG has a $4500 bounty for mempool tracepoints.

    https://nydig.com/bounties/add-bpf-tracepoints-for-the-mempool

  26. glozow commented at 7:32 pm on April 22, 2022: member

    Some mempool-related telemetry that I think would be helpful (a hypothetical sketch for one of these follows the list):

    • Transaction traversal count and memory usage when calculating mempool descendants and ancestors. This can inform us about performance/memory benefits of epochs and our current ancestor/descendant limits. I’m particularly interested to see information about this for reorgs.
    • Records of prevouts of transactions added to the mempool. This would let us track things like “how many times is a tx replaced / how many bytes were relayed before this payment confirmed?”
    • Signature and script cache usage.
    • {Resource usage, rate of valid transactions} of our data structures holding invalid/unvalidated transactions: m_orphanage, vExtraTxnForCompact, m_recent_rejects filter
    • Query and hit rate for LockPoints cached in mempool entries
    • The “shape” of packages in mempools, e.g. number of direct parents vs total ancestors
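
    As one example, a hypothetical bpftrace sketch for the signature cache point (the validation:sigcache_lookup probe and its argument are invented for illustration):

        #!/usr/bin/env bpftrace
        // Hypothetical: assumes a tracepoint that fires on every
        // signature cache lookup with arg0 = 1 on a hit, 0 on a miss.

        usdt:./src/bitcoind:validation:sigcache_lookup
        {
            @sigcache[arg0 ? "hit" : "miss"] = count();
        }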

    NYDIG has a $4500 bounty for mempool tracepoints.

    It’s nice that there’s a bounty to encourage contributions. I’m glad the bounty stresses the importance of a clearly-defined use case. We already have logs for {added, removed} mempool transactions, but there’s definitely room for improvement. Hopefully NYDIG would be okay with different but still useful instrumentation?

  27. russeree commented at 10:41 am on April 23, 2022: contributor

    @glozow those would be some awesome use cases for eBPF tracepoints, quite technical in their scope. I would like to work on a few of these points if desired. Number 1 has most of my interest.

    Regarding #24935

    Regarding my existing PR for mempool tracepoints: should I remove mempool addition/eviction tracepoints as something to even pursue? Even if it no longer qualifies for the NYDIG bounty, I want to contribute. Learning about the power of eBPF, USDT, and bcc has been a great addition to my skillset and has captured my interest.

    Existing logging/tracing solutions in the mempool (adds & evictions)

    The current implementation seems to use the ZMQ interface, but it doesn’t provide a complete solution for tracing all additions to and removals from the pool.

    There are two functions for broadcasting mempool events referenced in txmempool.cpp, and one of them is not even used: CValidationInterface::TransactionAddedToMempool and CZMQNotificationInterface::TransactionRemovedFromMempool.

    • Currently, CValidationInterface::TransactionAddedToMempool is not called anywhere in this repo.
    • CZMQNotificationInterface::TransactionRemovedFromMempool is only called if the tx was not removed for the reason ‘BLOCK’. The current implementation instead leaves a multiline comment stating that to find the txs evicted by a block, one should look at the latest txs in the mined block. A valid solution, but why not leverage a trace to a sandboxed kernel program with extremely little overhead? Even if the user doesn’t need this functionality, it can be compiled down to a nop instruction if built without tracing support.

    On the topic of block evictions: they seem to be one of the hardest events to trace/log if the 256-bit txid is included, since the volume of data being streamed is quite high, with up to thousands of evictions in a very short time. Using a ring buffer and a bpffs queue in tandem, all mempool evictions can be logged completely, and I am pretty sure they maintain the order in which the evictions were requested.
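
    bpftrace doesn’t expose the transport choice directly, but in-kernel aggregation is another way to tame the volume when the individual txids aren’t needed; a sketch assuming a hypothetical mempool:removed tracepoint:

        #!/usr/bin/env bpftrace
        // Hypothetical: assumes a mempool:removed tracepoint where
        // arg1 is a char* removal reason (e.g. "BLOCK", "EXPIRY").

        usdt:./src/bitcoind:mempool:removed
        {
            @removed[str(arg1)] = count();
        }

        // Dump and reset the counters every 10 seconds.
        interval:s:10
        {
            print(@removed);
            clear(@removed);
        }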

    #24935 eBPF solution to logging mempool additions/evictions

    Firstly, by using eBPF tracepoints, the overhead of tracing an event is minimal and isolated. bpftrace, Python, and bcc scripts can be written in any manner desired to accommodate the collection of performance stats, testing, and custom user metrics. These custom scripts have no impact on the underlying functionality of the user’s current bitcoind build, as all data is passed through the kernel sandbox, and the Bitcoin Core traces are unidirectional events that include a trace of all the txs being passed into and out of the mempool.

    The example .py script implemented to trace mempool events in #24935 logs each request for a tx addition to or removal from the mempool, plus the relevant information to validate/compare the current mempool with its output. These logs are complete, without erroneous data. The whole extended test suite was run, and it was possible to log all 328416 function calls that add or evict txs from the mempool. As a note, even if these traces are determined not to be a candidate, using some of the methods included in log_mempool_activity.py would be very useful for fixing some of the existing scripts that display incomplete data and/or have dropped events.

    Thanks

    Edit: a word…phrases

  28. 0xB10C commented at 2:13 pm on December 1, 2022: contributor
    ref: #26531, which adds mempool tracepoints too (https://github.com/bitcoin/bitcoin/pull/24935 has been closed)
  29. vijaydasmp referenced this in commit 54bcffbc99 on Nov 1, 2023
  30. vijaydasmp referenced this in commit c45e0c6f02 on Nov 6, 2023
  31. vijaydasmp referenced this in commit f9b72d525d on Nov 21, 2023
  32. vijaydasmp referenced this in commit 20b2524b30 on Nov 25, 2023
  33. 0xB10C commented at 2:45 pm on April 10, 2024: contributor
    Can probably close this, as there hasn’t been much activity. Many of the tracepoints proposed here have been implemented in some form; others haven’t sparked enough interest.
  34. 0xB10C closed this on Apr 10, 2024

