rpc: add cpu_load to getpeerinfo #31672

pull vasild wants to merge 2 commits into bitcoin:master from vasild:peer_cpu_load changing 8 files +165 −4
  1. vasild commented at 2:02 PM on January 16, 2025: contributor

    Add a new field cpu_load to the output of getpeerinfo RPC.

    It represents the CPU time spent by the message handling thread for the given peer, weighted for the duration of the connection. That is, for example, if two peers are equally demanding and one is connected longer than the other, then they will have the same cpu_load number.


    Monitoring CPU usage is useful on its own. Also related to #31033.


    This PR uses clock_gettime() (POSIX) and GetThreadTimes() (Windows). An alternative to those, should this be explored are: getrusage() (POSIX) and QueryThreadCycleTime() (Windows, but it counts CPU cycles).

  2. DrahtBot commented at 2:02 PM on January 16, 2025: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31672.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept NACK rebroad
    Concept ACK sipa, theStack, BrandonOdiwuor, laanwj, mzumsande, 1440000bytes
    Stale ACK jonatack, yuvicc

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34242 (Prepare string and net utils for future HTTP operations by pinheadmz)
    • #32061 (Replace libevent with our own HTTP and socket-handling implementation by pinheadmz)
    • #19461 (multiprocess: Add bitcoin-gui -ipcconnect option by ryanofsky)
    • #19460 (multiprocess: Add bitcoin-wallet -ipcconnect option by ryanofsky)
    • #10102 (Multiprocess bitcoin by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • require it be validated -> require that it be validated [missing β€œthat” for clarity]

    <sup>drahtbot_id_4_m</sup>

  3. DrahtBot added the label RPC/REST/ZMQ on Jan 16, 2025
  4. vasild force-pushed on Jan 16, 2025
  5. DrahtBot added the label CI failed on Jan 16, 2025
  6. DrahtBot commented at 2:05 PM on January 16, 2025: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. <sub>Debug: https://github.com/bitcoin/bitcoin/runs/35717681069</sub>

    <details><summary>Hints</summary>

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

    </details>

  7. jonatack commented at 2:25 PM on January 16, 2025: member

    Concept ACK if this can be a reliable, useful metric and help pave the path to #31033.

  8. sipa commented at 3:24 PM on January 16, 2025: member

    Concept ACK

  9. vasild force-pushed on Jan 16, 2025
  10. theStack commented at 6:15 PM on January 16, 2025: contributor

    Concept ACK

    Might be worth noting that this is currently only available on POSIX systems (i.e. not available on Windows, but on all other systems that we support AFAICT) in both the RPC help and a release note.

  11. BrandonOdiwuor commented at 7:00 PM on January 16, 2025: contributor

    Concept ACK

  12. fanquake commented at 1:06 PM on January 17, 2025: member

    Seems like there was some very brief discussion in #31033, and the conclusion "I guess this should start with planting some metrics", but I'm not sure putting changes into Bitcoin Core is the right first step.

    Before changing our API, it'd be good to atleast show some usage that indicates that the changes here are useful. I assume you've already been running this locally, and collecting the data, so it'd be good to know what it turned up?

    As noted in #31033, others have also already conducted similar research (https://b10c.me/projects/023-cpu-usage-of-peers/ or https://delvingbitcoin.org/t/cpu-usage-of-peers/196), and it seems like the data you'd expose here, will be less detailed / useful than what can/has already been produced using tracepoints or similar, so I'm wondering if this is the best approach.

  13. vasild commented at 4:44 PM on January 17, 2025: contributor

    @fanquake IMO, monitoring CPU usage is useful on its own, even if we don't start treating peers differently based on that metric (https://github.com/bitcoin/bitcoin/issues/31033). In other words, this metric is useful at least as much as a bunch of other metrics in the getpeerinfo output.

    Yes, I have been running this locally - there is like 65x difference between the least and most demanding peers. I am curious to correlate this to the messages being sent/received to/from those peers and to have a histogram of the data, not just least / most demanding (e.g. histogram of bitcoin-cli getpeerinfo |jq ".[].cpu_load"). I view those as something nice to build on top of this PR, not as a blocker that's needed to justify the usefulness of the CPU time metric.

  14. 1440000bytes commented at 5:34 PM on January 17, 2025: none

    Concept ACK

  15. vasild commented at 12:25 PM on January 20, 2025: contributor

    Might be worth noting that this is currently only available on POSIX systems (i.e. not available on Windows

    Right. GetThreadTimes() looks like a promising Windows alternative. I will try to implement that here. Switching to draft because I will push some work-in-progress a bunch of times.

  16. vasild marked this as a draft on Jan 20, 2025
  17. vasild force-pushed on Jan 21, 2025
  18. vasild force-pushed on Jan 22, 2025
  19. vasild force-pushed on Jan 22, 2025
  20. DrahtBot removed the label CI failed on Jan 22, 2025
  21. vasild force-pushed on Jan 22, 2025
  22. vasild marked this as ready for review on Jan 23, 2025
  23. vasild commented at 10:07 AM on January 23, 2025: contributor

    Ready for review. Implemented for Windows as well.

    Here is a test program I played with to test this manually. I think it is not worth adding as a unit test or benchmark, but at least it can live here in the comments of this PR:

    <details> <summary>cpu_time_windows.cpp</summary>

    #include <assert.h>
    
    #include <windows.h>
    #include <winnt.h>
    
    #include <processthreadsapi.h>
    
    #include <iostream>
    #include <thread>
    
    void thread(size_t work)
    {
        FILETIME before_creation;
        FILETIME before_exit;
        FILETIME before_kernel;
        FILETIME before_user;
        bool ret = GetThreadTimes(GetCurrentThread(),
            &before_creation,
            &before_exit,
            &before_kernel,
            &before_user);
        assert(ret == 1);
    
        if (work > 10'000) {
            for (size_t i{0}; i < work; ++i) {
                (void)(i * i);
            }
        } else {
            std::this_thread::sleep_for(std::chrono::milliseconds{work});
        }
    
        FILETIME after_creation;
        FILETIME after_exit;
        FILETIME after_kernel;
        FILETIME after_user;
        ret = GetThreadTimes(GetCurrentThread(),
            &after_creation,
            &after_exit,
            &after_kernel,
            &after_user);
        assert(ret == 1);
    
        ULARGE_INTEGER before_kernel_;
        before_kernel_.LowPart = before_kernel.dwLowDateTime;
        before_kernel_.HighPart = before_kernel.dwHighDateTime;
    
        ULARGE_INTEGER before_user_;
        before_user_.LowPart = before_user.dwLowDateTime;
        before_user_.HighPart = before_user.dwHighDateTime;
    
        ULARGE_INTEGER after_kernel_;
        after_kernel_.LowPart = after_kernel.dwLowDateTime;
        after_kernel_.HighPart = after_kernel.dwHighDateTime;
    
        ULARGE_INTEGER after_user_;
        after_user_.LowPart = after_user.dwLowDateTime;
        after_user_.HighPart = after_user.dwHighDateTime;
    
        ULARGE_INTEGER elapsed;
        elapsed.QuadPart = after_kernel_.QuadPart + after_user_.QuadPart - before_kernel_.QuadPart - before_user_.QuadPart;
        ULONGLONG elapsed_ns{elapsed.QuadPart * 100};
        std::cout << "cpu time: " << elapsed_ns / 1'000'000'000.0 << " sec\n";
    }
    
    int main ()
    {
        std::thread t1{thread, 1000000000};
        std::thread t2{thread, 1000000000};
        std::thread t3{thread, 3000};
        t1.join();
        t2.join();
        t3.join();
    
        return 0;
    }
    

    </details>

  24. in src/rpc/net.cpp:150 in 28b2a67f13 outdated
     142 | @@ -143,6 +143,11 @@ static RPCHelpMan getpeerinfo()
     143 |                      {RPCResult::Type::NUM_TIME, "last_block", "The " + UNIX_EPOCH_TIME + " of the last block received from this peer"},
     144 |                      {RPCResult::Type::NUM, "bytessent", "The total bytes sent"},
     145 |                      {RPCResult::Type::NUM, "bytesrecv", "The total bytes received"},
     146 | +                    {RPCResult::Type::NUM, "cpu_load", /*optional=*/true,
     147 | +                        "The CPU time (user + system) spent processing messages from this peer "
     148 | +                        "and crafting messages for it expressed in per milles (‰) of the "
     149 | +                        "duration of the connection. Will be omitted on platforms that do not "
     150 | +                        "support this or if still not measured."},
    


    vasild commented at 10:19 AM on January 23, 2025:

    Currently I chose to represent the CPU load as 1/1000s of the connection time. For example, if we have spent 5 CPU seconds for a peer that has been connected for 1000 seconds, then cpu_load will be 5.

    Output of bitcoin-cli getpeerinfo |jq '.[].cpu_load ' |sort -n:

    • Shortly after startup:
    0.003207936170212766
    0.003909962962962963
    0.004000127659574468
    0.004340326086956522
    0.00904279365079365
    0.0270041914893617
    0.03776429787234042
    0.04154987234042553
    0.05750733333333333
    0.0828310843373494
    0.1014558936170213
    0.1453753214285714
    0.1619409404761905
    
    • After running for a few hours:
    0.5100616477272727
    0.7087315352941177
    0.7814365649717514
    0.8773131578947369
    0.9717403481228669
    1.069695995623632
    1.078479046189377
    1.153550253945481
    1.167638256213824
    1.187242140449438
    1.199645694868011
    1.199951082706767
    1.220689207655502
    1.264582222222222
    1.265083146221971
    1.282676876955162
    1.301380197644649
    1.303624877378436
    1.333201868001821
    1.341489760490639
    1.343411334541063
    1.361829386934673
    1.364938446935725
    1.372325493506493
    1.385421828947368
    1.472354355769231
    1.612321964879852
    1.641998848148148
    1.648207868421053
    1.681837521410579
    1.783343262569832
    1.78646682029841
    1.837577241385135
    1.844571151379764
    1.851067047263682
    1.908551290322581
    1.915087061408061
    1.923511220435511
    1.939589383639822
    1.940788162162162
    1.962705583687341
    2.000298126030624
    2.008423324931507
    2.00921122707588
    2.018911893157895
    2.019168323205742
    2.019358503865546
    2.030658536117768
    2.034708427345187
    2.03510252110758
    2.04022746739726
    2.054511708699122
    2.07339420882353
    2.076240002244949
    2.080369749066667
    2.08561637254902
    2.092916448969578
    2.095147154892331
    2.119004164227642
    2.129640033722438
    2.145774200383772
    2.164519192393115
    2.27069154817898
    2.29444800067659
    2.313547881255947
    2.31539280930693
    2.368903822619457
    2.445622954545454
    2.503377741525424
    2.557022695182724
    2.56755574523507
    2.709643024574987
    2.724320047021944
    2.913325884282384
    3.154521111713488
    3.706281482599432
    3.843126643942505
    3.865110431762718
    3.934123987370194
    4.634227207136824
    5.185085595879828
    5.233027217613057
    5.766649107296137
    6.105655600034141
    6.138089870962286
    10.09238865432725
    10.24402739230769
    10.56650271556122
    20.75054951145038
    47.69228849275363
    77.79810328289474
    145.2686368960396
    

    Maybe using an integer would be better for this? Then it would have to have higher resolution, e.g. instead of 1/1000s it should show 1/1Bs? Then the above numbers would vary from 3207 to 145'268'636. E.g. the lightest peer has caused 3207 nanoseconds of CPU time for each 1 second (=1B nanoseconds) of connection time.

  25. in src/util/time.cpp:128 in 28b2a67f13 outdated
     123 | +
     124 | +    timespec t;
     125 | +    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t) == -1) {
     126 | +        return std::chrono::nanoseconds{0};
     127 | +    }
     128 | +    return std::chrono::nanoseconds{static_cast<int64_t>(t.tv_sec) * 1'000'000'000 + t.tv_nsec};
    


    maflcko commented at 10:25 AM on January 23, 2025:

    nit: Instead of manual casting and multiplication, you can use chrono types and let the compiler insert what is needed.


    vasild commented at 12:38 PM on January 23, 2025:

    Right, much better and simpler this way. Thanks!

  26. in src/rpc/net.cpp:250 in 28b2a67f13 outdated
     245 | @@ -239,6 +246,11 @@ static RPCHelpMan getpeerinfo()
     246 |          obj.pushKV("last_block", count_seconds(stats.m_last_block_time));
     247 |          obj.pushKV("bytessent", stats.nSendBytes);
     248 |          obj.pushKV("bytesrecv", stats.nRecvBytes);
     249 | +        const auto cpu_time_ns{count_nanoseconds(stats.m_cpu_time)};
     250 | +        const auto connection_time_ns{count_nanoseconds(now - stats.m_connected)};
    


    maflcko commented at 10:29 AM on January 23, 2025:

    nit: Instead of manually casting, you can just divide two chrono types and let the compiler insert anything that is needed.


    vasild commented at 12:40 PM on January 23, 2025:

    Thanks for the suggestion! And now no need to add a new function count_nanoseconds().

  27. maflcko commented at 10:31 AM on January 23, 2025: member

    (left two nits, feel free to ignore)

  28. vasild force-pushed on Jan 23, 2025
  29. vasild commented at 12:38 PM on January 23, 2025: contributor

    28b2a67f13...0f68c47e93: address suggestions by @maflcko

  30. maflcko commented at 1:06 PM on January 23, 2025: member

    The code here assumes that Bitcoin Core is single threaded, which may be true right now, but could change in the future.

  31. luke-jr commented at 8:35 PM on January 25, 2025: member

    How so? Both implementations use thread-specific timers...?

  32. yuvicc commented at 4:40 PM on January 26, 2025: contributor

    Concept ACK

    I think this would be a clean way to get peers' CPU usage for message processing since Tracepoints is currently not compatible with macOS/Windows for which folks will be having a hard time.

    Also, this could pave the way for prioritising peers based on CPU usage #31033.

  33. yuvicc commented at 4:52 PM on January 26, 2025: contributor

    Also, during my observations I found value null as well.

    null
    0.0352094358974359
    0.54925525
    0.579001
    0.6585706
    1.390154842105263
    1.703986581081081
    2.427459408450704
    24.85140159887005
    

    Any idea why is it null? maybe it is about to get disconnected.

  34. vasild commented at 10:52 AM on January 27, 2025: contributor

    Any idea why is it null? maybe it is about to get disconnected.

    jq '.[].cpu_load' displays null when there is no cpu_load in the JSON for that peer. The new field is optional:

    $ bitcoin-cli help getpeerinfo
    ...
        "cpu_load" : n,                       (numeric, optional) The CPU time (user + system) spent processing messages from this peer and crafting messages for it expressed in per milles (‰) of the duration of the connection. Will be omitted on platforms that do not support this or if still not measured.
    

    The omission comes from this condition:

    if (stats.m_cpu_time > 0s && now > stats.m_connected) {
    

    Within 1 second of connecting, now will be equal to stats.m_connected (they use second-precision). I am open to suggestions if you think this can be done in a better or more intuitive way.

  35. vasild commented at 11:01 AM on January 27, 2025: contributor

    The code here assumes that Bitcoin Core is single threaded, which may be true right now,

    Hmm? https://github.com/bitcoin/bitcoin/blob/0a931a9787b196d7a620863cc143d9319ffd356d/doc/developer-notes.md#threads

    but could change in the future.

    I guess you mean that message processing is single threaded - the b-msghand thread. I agree it could change in the future.

  36. yuvicc commented at 12:46 PM on January 27, 2025: contributor

    Any idea why is it null? maybe it is about to get disconnected.

    jq '.[].cpu_load' displays null when there is no cpu_load in the JSON for that peer. The new field is optional:

    $ bitcoin-cli help getpeerinfo
    ...
        "cpu_load" : n,                       (numeric, optional) The CPU time (user + system) spent processing messages from this peer and crafting messages for it expressed in per milles (‰) of the duration of the connection. Will be omitted on platforms that do not support this or if still not measured.
    

    The omission comes from this condition:

    if (stats.m_cpu_time > 0s && now > stats.m_connected) {
    

    Within 1 second of connecting, now will be equal to stats.m_connected (they use second-precision). I am open to suggestions if you think this can be done in a better or more intuitive way.

    Got it, thanks!

  37. yuvicc approved
  38. yuvicc commented at 12:49 PM on January 27, 2025: contributor

    tACK 0f68c47e931de05200adeae639bcee50ea3c171d

  39. DrahtBot requested review from jonatack on Jan 27, 2025
  40. DrahtBot requested review from sipa on Jan 27, 2025
  41. DrahtBot requested review from theStack on Jan 27, 2025
  42. DrahtBot requested review from BrandonOdiwuor on Jan 27, 2025
  43. maflcko commented at 11:26 AM on February 5, 2025: member

    I guess you mean that message processing is single threaded - the b-msghand thread. I agree it could change in the future.

    I left the comment, because if the work is done in a different thread, this feature will silently break. I guess that is fine if the work is minimal, or code could be added to do the additional accounting if the work is substantial.

  44. vasild commented at 12:29 PM on February 5, 2025: contributor

    @maflcko yes, I agree. Future changes that make message processing multi-threaded will have to take this into account.

  45. luke-jr referenced this in commit 9febeb13b1 on Feb 22, 2025
  46. luke-jr referenced this in commit 13e161aa1a on Feb 28, 2025
  47. laanwj commented at 10:18 AM on April 9, 2025: member

    Ohh this is neat. Concept ACK.

    As noted in #31033, others have also already conducted similar research (https://b10c.me/projects/023-cpu-usage-of-peers/ or https://delvingbitcoin.org/t/cpu-usage-of-peers/196), and it seems like the data you'd expose here, will be less detailed / useful than what can/has already been produced using tracepoints or similar, so I'm wondering if this is the best approach.

    Where i think monitoring per-peer CPU usage is most concretely useful is for anti-DoS measures. Reporting it is a (still useful) first step.

    (do agree tracepoint are much better for gathering detailed information, but there's internal applications of this)

  48. vasild commented at 3:47 PM on April 9, 2025: contributor

    Hmm, https://b10c.me/projects/023-cpu-usage-of-peers/ and https://github.com/0xB10C/bitcoin/commit/59c0fe438a3e62d27eda237ae20c062608e047b3 measure the real time, aka wall clock time (cc @0xB10C).

    Message processing is single threaded, which means that we do not process messages from more than one peer at a given time, but still there are other threads in bitcoind and other processes on the system which influence the "real time". That is, bitcoind may perform identical tasks for two peers, but while doing the tasks for the second peer some other thread or process hogs the system and delays the completion. This will skew the results.

    To measure properly one has to measure "CPU time", not "real time". It can still be done with tracepoints, but the CPU measuring function from this PR would have to be used.

  49. maflcko commented at 5:23 PM on April 9, 2025: member

    I wouldn't say that one metric is more proper than the other one. Some people run on slow IO devices, so measuring the wall clock time is reasonable to find peers that eat CPU or IO (or both). Of course wall-clock is more fragile and it means you can't hammer the program over RPC at the same time, or do anything else on the whole system, but it is still a valid way to measure. (See also the recent commit to perf that added wall-clock measurements: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=32ecca8d7a3e2400805f8b5564a1b07e05cfcd54)

  50. in doc/release-notes-31672.md:8 in 0f68c47e93 outdated
       0 | @@ -0,0 +1,8 @@
       1 | +RPC
       2 | +---
       3 | +
       4 | +A new field `cpu_load` has been added to the `getpeerinfo` RPC output.
       5 | +It shows the CPU time (user + system) spent processing messages from the
       6 | +given peer and crafting messages for it expressed in per milles (‰) of
       7 | +the duration of the connection. The field is optional and will be omitted
       8 | +on platforms that do not support this or if still not measured. (#31672)
    


    mzumsande commented at 3:09 PM on April 14, 2025:

    nit: not yet measured sounds more correct to me.


    vasild commented at 8:29 AM on April 15, 2025:

    Done. Also elaborated a little bit the help text.

  51. mzumsande commented at 5:44 PM on April 14, 2025: contributor

    Concept ACK, but I think that the cpu_load is not straightforward to interpret.

    As I wrote in #31033 (comment) I'd say that in absence of an attack, the highest CPU load peers are our most useful ones - it would be a unfortunate if users attempted to disconnect "DoSy" their peers based on this statistics when there is no actual DoS situation.

    On the other hand, I think that this would be really useful in case of an actual attack. If users see their node becoming slow, cpu_load could help detect who the attacker is and what they are doing.

  52. vasild force-pushed on Apr 15, 2025
  53. vasild commented at 8:25 AM on April 15, 2025: contributor

    0f68c47e93...e374825246: address suggestion

  54. vasild force-pushed on Apr 15, 2025
  55. vasild commented at 8:30 AM on April 15, 2025: contributor

    e374825246...9cc8ca3b1f: also update getpeerinfo help

  56. in doc/release-notes-31672.md:9 in 9cc8ca3b1f outdated
       0 | @@ -0,0 +1,10 @@
       1 | +RPC
       2 | +---
       3 | +
       4 | +A new field `cpu_load` has been added to the `getpeerinfo` RPC output.
       5 | +It shows the CPU time (user + system) spent processing messages from the
       6 | +given peer and crafting messages for it expressed in per milles (‰) of
       7 | +the duration of the connection. The field is optional and will be omitted
       8 | +on platforms that do not support this or if not yet measured. Note that
       9 | +high CPU time is not necessary a bad thing - new, valid, transactions and
    


    jonatack commented at 4:36 PM on April 16, 2025:
    high CPU time is not necessarily a bad thing - new valid transactions and
    

    vasild commented at 7:21 AM on April 17, 2025:

    Done.

  57. in doc/release-notes-31672.md:6 in 9cc8ca3b1f outdated
       0 | @@ -0,0 +1,10 @@
       1 | +RPC
       2 | +---
       3 | +
       4 | +A new field `cpu_load` has been added to the `getpeerinfo` RPC output.
       5 | +It shows the CPU time (user + system) spent processing messages from the
       6 | +given peer and crafting messages for it expressed in per milles (‰) of
    


    jonatack commented at 5:53 PM on April 16, 2025:

    maybe more simply replace "processing messages from the given peer and crafting messages for it" with "processing messages to/from the peer"

    (same for the RPC help)


    vasild commented at 7:21 AM on April 17, 2025:

    Done.

  58. in src/rpc/net.cpp:152 in 9cc8ca3b1f outdated
     147 | +                        "The CPU time (user + system) spent processing messages from this peer "
     148 | +                        "and crafting messages for it expressed in per milles (‰) of the "
     149 | +                        "duration of the connection. Will be omitted on platforms that do not "
     150 | +                        "support this or if still not measured. Note that high CPU time is not "
     151 | +                        "necessary a bad thing - new, valid, transactions and blocks require "
     152 | +                        "CPU time to be validated."},
    


    jonatack commented at 5:57 PM on April 16, 2025:

    nit, this is very long

        "bytesrecv" : n,                      (numeric) The total bytes received
        "cpu_load" : n,                       (numeric, optional) The CPU time (user + system) spent processing messages from this peer and crafting messages for it expressed in per milles (‰) of the duration of the connection. Will be omitted on platforms that do not support this or if still not measured. Note that high CPU time is not necessary a bad thing - new, valid, transactions and blocks require CPU time to be validated.
        "conntime" : xxx,                     (numeric) The UNIX epoch time of the connection
    

    perhaps more simply:

    "The CPU time (user + system) spent processing messages to/from the peer, in per milles (‰) of the connection duration"


    vasild commented at 7:24 AM on April 17, 2025:

    Shortened a bit but not that much since this is the only help text (I assume release notes are not so useful for somebody looking into this for the first time in e.g. version 35.0, they will not go to search for past release notes).

    Other help texts include \n, maybe do that here as well?


    jonatack commented at 9:19 PM on April 18, 2025:

    Other help texts include \n, maybe do that here as well?

    I'm agnostic as to that.

    A few text compaction ideas, feel free to pick/choose/ignore:

    s/The CPU time (user + system)/Total CPU time/

    s/spent processing/processing/

    s/duration. Will be omitted on platforms that do not support this or if still not measured./duration, if supported by the platform and measured./

    s/Note that high CPU time is not necessarily a bad thing - new valid transactions and blocks require CPU time to be validated./High CPU time is not necessarily bad; validating new transactions and blocks requires it./


    vasild commented at 1:48 PM on May 13, 2025:

    Took some, thanks!

  59. jonatack commented at 6:05 PM on April 16, 2025: member

    Light initial tested ACK 9cc8ca3b1fe2b20e465ddc030c21ac36570f2935

    I did not review the cmake build code.

    Tested RPC getpeerinfo and have also been testing this by running the following branch https://github.com/jonatack/bitcoin/commits/2025-04-add-cpu_load-to-netinfo/ containing an additional commit, and then running a live dashboard with

    $ watch ./build/bin/bitcoin-cli -rpcwait -netinfo 4
    
  60. DrahtBot requested review from yuvicc on Apr 16, 2025
  61. DrahtBot requested review from mzumsande on Apr 16, 2025
  62. DrahtBot requested review from laanwj on Apr 16, 2025
  63. yuvicc commented at 9:29 PM on April 16, 2025: contributor

    ACK 9cc8ca3b1fe2b20e465ddc030c21ac36570f2935

  64. vasild force-pushed on Apr 17, 2025
  65. vasild commented at 7:21 AM on April 17, 2025: contributor

    9cc8ca3b1f...bb822a5ee1: address suggestions

    I think the bitcoin-cli extension in https://github.com/jonatack/bitcoin/commit/708a9502f8eca7aaa84236ea038a574f4350f298#r155516952 might be included in this PR. @i-am-yuvi, since you ACKed this PR without that extra commit, what do you think? Should it be included?

  66. in cmake/introspection.cmake:162 in bb822a5ee1 outdated
     158 | @@ -159,6 +159,40 @@ check_cxx_source_compiles("
     159 |    " HAVE_SYSCTL_ARND
     160 |  )
     161 |  
     162 | +# Check for clock_gettime() (POSIX.1b).
    


    fanquake commented at 9:09 AM on April 17, 2025:

    I don't think we need to introduce checks here, we already use clock_gettime. You could just #ifdef CLOCK_THREAD_CPUTIME_ID at the callsite.


    vasild commented at 9:34 AM on April 17, 2025:

    Done.

  67. in cmake/introspection.cmake:178 in bb822a5ee1 outdated
     173 | +  }
     174 | +  " HAVE_CLOCK_GETTIME
     175 | +)
     176 | +
     177 | +# Check for GetThreadTimes() (Windows).
     178 | +check_cxx_source_compiles("
    


    fanquake commented at 9:09 AM on April 17, 2025:

    I don't think we check for any other Windows functions like this, I think you can just #ifdef WIN32 at the callsite.


    vasild commented at 9:34 AM on April 17, 2025:

    Done. This removes all build system changes from this PR, thanks!

  68. in src/util/time.cpp:178 in bb822a5ee1 outdated
     177 | +    ULARGE_INTEGER user_;
     178 | +    user_.LowPart = user.dwLowDateTime;
     179 | +    user_.HighPart = user.dwHighDateTime;
     180 | +
     181 | +    // https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getthreadtimes
     182 | +    // "Thread kernel mode and user mode times are amounts of time. For example,
    


    fanquake commented at 9:10 AM on April 17, 2025:

    If you're going to link to external documentation, there's no need to copy paragraphs of it, verbatim, into our source code.


    vasild commented at 9:37 AM on April 17, 2025:

    The linked article is long and convoluted and here I want to highlight only a short part of it that is relevant. Also having the text here is helpful if the web page disappears in the future.


    fanquake commented at 9:41 AM on April 17, 2025:

    Then I'd suggest atleast trying to consolidate what is here, given you already link to the same place above, with just "GetThreadTimes():" as a comment (that seems unecessary).


    vasild commented at 10:06 AM on April 17, 2025:

    Reduced the text and removed the link to the doc since it is already a few lines earlier.

  69. in src/util/time.cpp:147 in bb822a5ee1 outdated
     139 | @@ -128,3 +140,62 @@ struct timeval MillisToTimeval(std::chrono::milliseconds ms)
     140 |  {
     141 |      return MillisToTimeval(count_milliseconds(ms));
     142 |  }
     143 | +
     144 | +std::chrono::nanoseconds ThreadCpuTime()
     145 | +{
     146 | +#ifdef HAVE_CLOCK_GETTIME
     147 | +    // An alternative to clock_gettime() is getrusage().
    


    fanquake commented at 9:11 AM on April 17, 2025:

    Comment seems unnecessary?


    vasild commented at 9:35 AM on April 17, 2025:

    Removed.

  70. vasild force-pushed on Apr 17, 2025
  71. vasild commented at 9:34 AM on April 17, 2025: contributor

    bb822a5ee1...ee16345af3: address suggestions

  72. in src/util/time.cpp:151 in ee16345af3 outdated
     146 | +    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t) == -1) {
     147 | +        return std::chrono::nanoseconds{0};
     148 | +    }
     149 | +    return std::chrono::seconds{t.tv_sec} + std::chrono::nanoseconds{t.tv_nsec};
     150 | +#elif defined(WIN32)
     151 | +    // An alternative to GetThreadTimes() is QueryThreadCycleTime() but it
    


    fanquake commented at 9:37 AM on April 17, 2025:

    Same here, as the clock_gettime alternative comment.


    vasild commented at 10:07 AM on April 17, 2025:

    Removed and added a note to the PR description about the alternatives.

  73. vasild force-pushed on Apr 17, 2025
  74. vasild commented at 10:05 AM on April 17, 2025: contributor

    ee16345af3...19c8336d97: address suggestions

  75. jonatack commented at 6:57 PM on April 19, 2025: member

    Latest changes since my last review LGTM. I plan to review the code more closely.

    Empirically, running a node on this branch, it looks like over time most peers settle into less than 0.5 per milles, a few are higher, with maybe one peer around 4. These higher-load connections seem to more often be outbound ones.

    (Edit, updated the branch with the -netinfo commit with a few improvements and to also only display cpu load for a peer (rounded to nearest integer) if it is non-zero (e.g. >= 0.5 in getpeerinfo), to more easily see which peers are using more CPU: https://github.com/jonatack/bitcoin/commits/2025-04-add-cpu_load-to-netinfo/

    <details><summary>Example 1</summary><p>

    <img width="600" alt="Screenshot 2025-04-19 at 12 54 32" src="https://github.com/user-attachments/assets/6800e5e2-3d5a-4d70-9e20-33bd45dbf606" />

    </p></details>

    <details><summary>Example 2 (a couple weeks later)</summary><p>

    <img width="602" alt="Screenshot 2025-05-03 at 12 51 29" src="https://github.com/user-attachments/assets/22dd2166-a39d-4edf-9a5b-eb1ba8216643" />

    </p></details>

  76. vasild force-pushed on May 13, 2025
  77. vasild commented at 1:45 PM on May 13, 2025: contributor

    19c8336d97...8b8b854346: address suggestions and take the bitcoin-cli change from https://github.com/jonatack/bitcoin/commits/2025-04-add-cpu_load-to-netinfo/, thanks!

  78. luke-jr referenced this in commit e05aeffe77 on Jun 6, 2025
  79. luke-jr referenced this in commit e109d55817 on Jun 6, 2025
  80. DrahtBot added the label CI failed on Jun 15, 2025
  81. DrahtBot commented at 11:06 AM on June 15, 2025: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. <sub>Task tidy: https://github.com/bitcoin/bitcoin/runs/42137973271</sub> <sub>LLM reason (✨ experimental): The CI failure is caused by clang-tidy detecting a deprecated header include in the code.</sub>

    <details><summary>Hints</summary>

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

    </details>

  82. rpc: add cpu_load to getpeerinfo
    Add a new field `cpu_load` to the output of `getpeerinfo` RPC.
    
    It represents the CPU time spent by the message handling thread for the
    given peer, weighted for the duration of the connection. That is, for
    example, if two peers are equally demanding and one is connected longer
    than the other, then they will have the same `cpu_load` number.
    f1517d96eb
  83. cli: add getpeerinfo#cpu_load to -netinfo b25b40ebd5
  84. vasild force-pushed on Jun 20, 2025
  85. vasild commented at 8:00 AM on June 20, 2025: contributor

    8b8b854346...b25b40ebd5: rebase and pet tidy: time.h -> ctime

  86. DrahtBot removed the label CI failed on Jun 20, 2025
  87. luke-jr referenced this in commit 9084ea96ac on Jul 17, 2025
  88. luke-jr referenced this in commit ca906b350e on Jul 17, 2025
  89. rebroad commented at 7:06 PM on July 28, 2025: contributor

    NACK. cpu load alone is a useless metric. However, CPU load divided by TXs accepted into the chain (or mempool) could be a useful metric.

  90. achow101 removed review request from BrandonOdiwuor on Oct 22, 2025
  91. achow101 requested review from ajtowns on Oct 22, 2025
  92. achow101 requested review from 0xB10C on Oct 22, 2025
  93. in doc/release-notes-31672.md:4 in b25b40ebd5
       0 | @@ -0,0 +1,9 @@
       1 | +RPC
       2 | +---
       3 | +
       4 | +A new field `cpu_load` has been added to the `getpeerinfo` RPC output. It
    


    ajtowns commented at 7:05 PM on October 29, 2025:

    Why monitor cpu time rather than real time? Isn't time spent waiting for disk access while looking up the utxo set equally problematic? Waiting on another thread's lock is arguably less problematic (it's the other thread that's delaying things), but if an attacker is deliberately triggering lock contention somehow to delay processing other thread's messages, that would still be a problem.

    Adding OS-specific logic rather than just using either real time or a steady clock seems over-complicated here.

    [EDIT: I guess in the context of rate-limiting expensive peers, it would be better to have a rolling average of time used for each peer, so that having sent expensive txs five hours ago has ~0 impact on how much cpu time you're getting now]

  94. DrahtBot added the label Needs rebase on Jan 23, 2026
  95. DrahtBot commented at 1:53 PM on January 23, 2026: contributor

    <!--cf906140f33d8803c4a75a2196329ecb-->

    πŸ™ This pull request conflicts with the target branch and needs rebase.

  96. 0xB10C commented at 3:28 PM on January 26, 2026: contributor

    fwiw I've been running a node with this patch for a while. Here a some numbers on min/max/mean/median I collected. I'm not sure yet how useful these metrics are from a monitoring standpoint.

    <img width="1297" height="887" alt="image" src="https://github.com/user-attachments/assets/dd6beb04-608c-4d19-b3ba-0aaa6a569f67" />


    I also briefly looked into which peers have a higher cpu_load by connection_type and by subver by looking at a getpeerinfo snapshot:

    <img width="854" height="685" alt="image" src="https://github.com/user-attachments/assets/1e4ec85c-d369-4e51-bb74-2ab58de73fee" />

    Generally, connections that we relay transactions to have a higher cpu_load. Possibly because they also relay transactions to us ( A few outliners are spy-nodes (bitnodes.earn.com, kit.dsn, ..) that we relay transactions to, but don't get any from us. See chart below).

    <img width="1129" height="685" alt="image" src="https://github.com/user-attachments/assets/f5b9f110-8263-4cf5-9d4a-77e67e8752ac" />


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me