rpc: add cpu_load to getpeerinfo #31672

pull vasild wants to merge 1 commits into bitcoin:master from vasild:peer_cpu_load changing 7 files +135 −0
  1. vasild commented at 2:02 pm on January 16, 2025: contributor

    Add a new field cpu_load to the output of getpeerinfo RPC.

    It represents the CPU time spent by the message handling thread for the given peer, weighted for the duration of the connection. That is, for example, if two peers are equally demanding and one is connected longer than the other, then they will have the same cpu_load number.


    Monitoring CPU usage is useful on its own. Also related to #31033.


    This PR uses clock_gettime() (POSIX) and GetThreadTimes() (Windows). An alternative to those, should this be explored are: getrusage() (POSIX) and QueryThreadCycleTime() (Windows, but it counts CPU cycles).

  2. DrahtBot commented at 2:02 pm on January 16, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31672.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK sipa, theStack, BrandonOdiwuor, 1440000bytes, laanwj, mzumsande
    Stale ACK jonatack, i-am-yuvi

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32061 (Replace libevent with our own HTTP and socket-handling implementation by pinheadmz)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter

    per milles → per mille
    a nanoseconds atomic → an atomic nanoseconds variable

  3. DrahtBot added the label RPC/REST/ZMQ on Jan 16, 2025
  4. vasild force-pushed on Jan 16, 2025
  5. DrahtBot added the label CI failed on Jan 16, 2025
  6. DrahtBot commented at 2:05 pm on January 16, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35717681069

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. jonatack commented at 2:25 pm on January 16, 2025: member
    Concept ACK if this can be a reliable, useful metric and help pave the path to #31033.
  8. sipa commented at 3:24 pm on January 16, 2025: member
    Concept ACK
  9. vasild force-pushed on Jan 16, 2025
  10. theStack commented at 6:15 pm on January 16, 2025: contributor

    Concept ACK

    Might be worth noting that this is currently only available on POSIX systems (i.e. not available on Windows, but on all other systems that we support AFAICT) in both the RPC help and a release note.

  11. BrandonOdiwuor commented at 7:00 pm on January 16, 2025: contributor
    Concept ACK
  12. fanquake commented at 1:06 pm on January 17, 2025: member

    Seems like there was some very brief discussion in #31033, and the conclusion “I guess this should start with planting some metrics”, but I’m not sure putting changes into Bitcoin Core is the right first step.

    Before changing our API, it’d be good to atleast show some usage that indicates that the changes here are useful. I assume you’ve already been running this locally, and collecting the data, so it’d be good to know what it turned up?

    As noted in #31033, others have also already conducted similar research (https://b10c.me/projects/023-cpu-usage-of-peers/ or https://delvingbitcoin.org/t/cpu-usage-of-peers/196), and it seems like the data you’d expose here, will be less detailed / useful than what can/has already been produced using tracepoints or similar, so I’m wondering if this is the best approach.

  13. vasild commented at 4:44 pm on January 17, 2025: contributor

    @fanquake IMO, monitoring CPU usage is useful on its own, even if we don’t start treating peers differently based on that metric (https://github.com/bitcoin/bitcoin/issues/31033). In other words, this metric is useful at least as much as a bunch of other metrics in the getpeerinfo output.

    Yes, I have been running this locally - there is like 65x difference between the least and most demanding peers. I am curious to correlate this to the messages being sent/received to/from those peers and to have a histogram of the data, not just least / most demanding (e.g. histogram of bitcoin-cli getpeerinfo |jq ".[].cpu_load"). I view those as something nice to build on top of this PR, not as a blocker that’s needed to justify the usefulness of the CPU time metric.

  14. 1440000bytes commented at 5:34 pm on January 17, 2025: none
    Concept ACK
  15. vasild commented at 12:25 pm on January 20, 2025: contributor

    Might be worth noting that this is currently only available on POSIX systems (i.e. not available on Windows

    Right. GetThreadTimes() looks like a promising Windows alternative. I will try to implement that here. Switching to draft because I will push some work-in-progress a bunch of times.

  16. vasild marked this as a draft on Jan 20, 2025
  17. vasild force-pushed on Jan 21, 2025
  18. vasild force-pushed on Jan 22, 2025
  19. vasild force-pushed on Jan 22, 2025
  20. DrahtBot removed the label CI failed on Jan 22, 2025
  21. vasild force-pushed on Jan 22, 2025
  22. vasild marked this as ready for review on Jan 23, 2025
  23. vasild commented at 10:07 am on January 23, 2025: contributor

    Ready for review. Implemented for Windows as well.

    Here is a test program I played with to test this manually. I think it is not worth adding as a unit test or benchmark, but at least it can live here in the comments of this PR:

     0#include <assert.h>
     1
     2#include <windows.h>
     3#include <winnt.h>
     4
     5#include <processthreadsapi.h>
     6
     7#include <iostream>
     8#include <thread>
     9
    10void thread(size_t work)
    11{
    12    FILETIME before_creation;
    13    FILETIME before_exit;
    14    FILETIME before_kernel;
    15    FILETIME before_user;
    16    bool ret = GetThreadTimes(GetCurrentThread(),
    17        &before_creation,
    18        &before_exit,
    19        &before_kernel,
    20        &before_user);
    21    assert(ret == 1);
    22
    23    if (work > 10'000) {
    24        for (size_t i{0}; i < work; ++i) {
    25            (void)(i * i);
    26        }
    27    } else {
    28        std::this_thread::sleep_for(std::chrono::milliseconds{work});
    29    }
    30
    31    FILETIME after_creation;
    32    FILETIME after_exit;
    33    FILETIME after_kernel;
    34    FILETIME after_user;
    35    ret = GetThreadTimes(GetCurrentThread(),
    36        &after_creation,
    37        &after_exit,
    38        &after_kernel,
    39        &after_user);
    40    assert(ret == 1);
    41
    42    ULARGE_INTEGER before_kernel_;
    43    before_kernel_.LowPart = before_kernel.dwLowDateTime;
    44    before_kernel_.HighPart = before_kernel.dwHighDateTime;
    45
    46    ULARGE_INTEGER before_user_;
    47    before_user_.LowPart = before_user.dwLowDateTime;
    48    before_user_.HighPart = before_user.dwHighDateTime;
    49
    50    ULARGE_INTEGER after_kernel_;
    51    after_kernel_.LowPart = after_kernel.dwLowDateTime;
    52    after_kernel_.HighPart = after_kernel.dwHighDateTime;
    53
    54    ULARGE_INTEGER after_user_;
    55    after_user_.LowPart = after_user.dwLowDateTime;
    56    after_user_.HighPart = after_user.dwHighDateTime;
    57
    58    ULARGE_INTEGER elapsed;
    59    elapsed.QuadPart = after_kernel_.QuadPart + after_user_.QuadPart - before_kernel_.QuadPart - before_user_.QuadPart;
    60    ULONGLONG elapsed_ns{elapsed.QuadPart * 100};
    61    std::cout << "cpu time: " << elapsed_ns / 1'000'000'000.0 << " sec\n";
    62}
    63
    64int main ()
    65{
    66    std::thread t1{thread, 1000000000};
    67    std::thread t2{thread, 1000000000};
    68    std::thread t3{thread, 3000};
    69    t1.join();
    70    t2.join();
    71    t3.join();
    72
    73    return 0;
    74}
    
  24. in src/rpc/net.cpp:150 in 28b2a67f13 outdated
    142@@ -143,6 +143,11 @@ static RPCHelpMan getpeerinfo()
    143                     {RPCResult::Type::NUM_TIME, "last_block", "The " + UNIX_EPOCH_TIME + " of the last block received from this peer"},
    144                     {RPCResult::Type::NUM, "bytessent", "The total bytes sent"},
    145                     {RPCResult::Type::NUM, "bytesrecv", "The total bytes received"},
    146+                    {RPCResult::Type::NUM, "cpu_load", /*optional=*/true,
    147+                        "The CPU time (user + system) spent processing messages from this peer "
    148+                        "and crafting messages for it expressed in per milles (‰) of the "
    149+                        "duration of the connection. Will be omitted on platforms that do not "
    150+                        "support this or if still not measured."},
    


    vasild commented at 10:19 am on January 23, 2025:

    Currently I chose to represent the CPU load as 1/1000s of the connection time. For example, if we have spent 5 CPU seconds for a peer that has been connected for 1000 seconds, then cpu_load will be 5.

    Output of bitcoin-cli getpeerinfo |jq '.[].cpu_load ' |sort -n:

    • Shortly after startup:
     00.003207936170212766
     10.003909962962962963
     20.004000127659574468
     30.004340326086956522
     40.00904279365079365
     50.0270041914893617
     60.03776429787234042
     70.04154987234042553
     80.05750733333333333
     90.0828310843373494
    100.1014558936170213
    110.1453753214285714
    120.1619409404761905
    
    • After running for a few hours:
     00.5100616477272727
     10.7087315352941177
     20.7814365649717514
     30.8773131578947369
     40.9717403481228669
     51.069695995623632
     61.078479046189377
     71.153550253945481
     81.167638256213824
     91.187242140449438
    101.199645694868011
    111.199951082706767
    121.220689207655502
    131.264582222222222
    141.265083146221971
    151.282676876955162
    161.301380197644649
    171.303624877378436
    181.333201868001821
    191.341489760490639
    201.343411334541063
    211.361829386934673
    221.364938446935725
    231.372325493506493
    241.385421828947368
    251.472354355769231
    261.612321964879852
    271.641998848148148
    281.648207868421053
    291.681837521410579
    301.783343262569832
    311.78646682029841
    321.837577241385135
    331.844571151379764
    341.851067047263682
    351.908551290322581
    361.915087061408061
    371.923511220435511
    381.939589383639822
    391.940788162162162
    401.962705583687341
    412.000298126030624
    422.008423324931507
    432.00921122707588
    442.018911893157895
    452.019168323205742
    462.019358503865546
    472.030658536117768
    482.034708427345187
    492.03510252110758
    502.04022746739726
    512.054511708699122
    522.07339420882353
    532.076240002244949
    542.080369749066667
    552.08561637254902
    562.092916448969578
    572.095147154892331
    582.119004164227642
    592.129640033722438
    602.145774200383772
    612.164519192393115
    622.27069154817898
    632.29444800067659
    642.313547881255947
    652.31539280930693
    662.368903822619457
    672.445622954545454
    682.503377741525424
    692.557022695182724
    702.56755574523507
    712.709643024574987
    722.724320047021944
    732.913325884282384
    743.154521111713488
    753.706281482599432
    763.843126643942505
    773.865110431762718
    783.934123987370194
    794.634227207136824
    805.185085595879828
    815.233027217613057
    825.766649107296137
    836.105655600034141
    846.138089870962286
    8510.09238865432725
    8610.24402739230769
    8710.56650271556122
    8820.75054951145038
    8947.69228849275363
    9077.79810328289474
    91145.2686368960396
    

    Maybe using an integer would be better for this? Then it would have to have higher resolution, e.g. instead of 1/1000s it should show 1/1Bs? Then the above numbers would vary from 3207 to 145'268'636. E.g. the lightest peer has caused 3207 nanoseconds of CPU time for each 1 second (=1B nanoseconds) of connection time.

  25. in src/util/time.cpp:128 in 28b2a67f13 outdated
    123+
    124+    timespec t;
    125+    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t) == -1) {
    126+        return std::chrono::nanoseconds{0};
    127+    }
    128+    return std::chrono::nanoseconds{static_cast<int64_t>(t.tv_sec) * 1'000'000'000 + t.tv_nsec};
    


    maflcko commented at 10:25 am on January 23, 2025:
    nit: Instead of manual casting and multiplication, you can use chrono types and let the compiler insert what is needed.

    vasild commented at 12:38 pm on January 23, 2025:
    Right, much better and simpler this way. Thanks!
  26. in src/rpc/net.cpp:250 in 28b2a67f13 outdated
    245@@ -239,6 +246,11 @@ static RPCHelpMan getpeerinfo()
    246         obj.pushKV("last_block", count_seconds(stats.m_last_block_time));
    247         obj.pushKV("bytessent", stats.nSendBytes);
    248         obj.pushKV("bytesrecv", stats.nRecvBytes);
    249+        const auto cpu_time_ns{count_nanoseconds(stats.m_cpu_time)};
    250+        const auto connection_time_ns{count_nanoseconds(now - stats.m_connected)};
    


    maflcko commented at 10:29 am on January 23, 2025:
    nit: Instead of manually casting, you can just divide two chrono types and let the compiler insert anything that is needed.

    vasild commented at 12:40 pm on January 23, 2025:
    Thanks for the suggestion! And now no need to add a new function count_nanoseconds().
  27. maflcko commented at 10:31 am on January 23, 2025: member
    (left two nits, feel free to ignore)
  28. vasild force-pushed on Jan 23, 2025
  29. vasild commented at 12:38 pm on January 23, 2025: contributor
    28b2a67f13...0f68c47e93: address suggestions by @maflcko
  30. maflcko commented at 1:06 pm on January 23, 2025: member
    The code here assumes that Bitcoin Core is single threaded, which may be true right now, but could change in the future.
  31. luke-jr commented at 8:35 pm on January 25, 2025: member
    How so? Both implementations use thread-specific timers…?
  32. i-am-yuvi commented at 4:40 pm on January 26, 2025: contributor

    Concept ACK

    I think this would be a clean way to get peers’ CPU usage for message processing since Tracepoints is currently not compatible with macOS/Windows for which folks will be having a hard time.

    Also, this could pave the way for prioritising peers based on CPU usage #31033.

  33. i-am-yuvi commented at 4:52 pm on January 26, 2025: contributor

    Also, during my observations I found value null as well.

    0null
    10.0352094358974359
    20.54925525
    30.579001
    40.6585706
    51.390154842105263
    61.703986581081081
    72.427459408450704
    824.85140159887005
    

    Any idea why is it null? maybe it is about to get disconnected.

  34. vasild commented at 10:52 am on January 27, 2025: contributor

    Any idea why is it null? maybe it is about to get disconnected.

    jq '.[].cpu_load' displays null when there is no cpu_load in the JSON for that peer. The new field is optional:

    0$ bitcoin-cli help getpeerinfo
    1...
    2    "cpu_load" : n,                       (numeric, optional) The CPU time (user + system) spent processing messages from this peer and crafting messages for it expressed in per milles () of the duration of the connection. Will be omitted on platforms that do not support this or if still not measured.
    

    The omission comes from this condition:

    0if (stats.m_cpu_time > 0s && now > stats.m_connected) {
    

    Within 1 second of connecting, now will be equal to stats.m_connected (they use second-precision). I am open to suggestions if you think this can be done in a better or more intuitive way.

  35. vasild commented at 11:01 am on January 27, 2025: contributor

    The code here assumes that Bitcoin Core is single threaded, which may be true right now,

    Hmm? https://github.com/bitcoin/bitcoin/blob/0a931a9787b196d7a620863cc143d9319ffd356d/doc/developer-notes.md#threads

    but could change in the future.

    I guess you mean that message processing is single threaded - the b-msghand thread. I agree it could change in the future.

  36. i-am-yuvi commented at 12:46 pm on January 27, 2025: contributor

    Any idea why is it null? maybe it is about to get disconnected.

    jq '.[].cpu_load' displays null when there is no cpu_load in the JSON for that peer. The new field is optional:

    0$ bitcoin-cli help getpeerinfo
    1...
    2    "cpu_load" : n,                       (numeric, optional) The CPU time (user + system) spent processing messages from this peer and crafting messages for it expressed in per milles () of the duration of the connection. Will be omitted on platforms that do not support this or if still not measured.
    

    The omission comes from this condition:

    0if (stats.m_cpu_time > 0s && now > stats.m_connected) {
    

    Within 1 second of connecting, now will be equal to stats.m_connected (they use second-precision). I am open to suggestions if you think this can be done in a better or more intuitive way.

    Got it, thanks!

  37. i-am-yuvi approved
  38. i-am-yuvi commented at 12:49 pm on January 27, 2025: contributor
    tACK 0f68c47e931de05200adeae639bcee50ea3c171d
  39. DrahtBot requested review from jonatack on Jan 27, 2025
  40. DrahtBot requested review from sipa on Jan 27, 2025
  41. DrahtBot requested review from theStack on Jan 27, 2025
  42. DrahtBot requested review from BrandonOdiwuor on Jan 27, 2025
  43. maflcko commented at 11:26 am on February 5, 2025: member

    I guess you mean that message processing is single threaded - the b-msghand thread. I agree it could change in the future.

    I left the comment, because if the work is done in a different thread, this feature will silently break. I guess that is fine if the work is minimal, or code could be added to do the additional accounting if the work is substantial.

  44. vasild commented at 12:29 pm on February 5, 2025: contributor
    @maflcko yes, I agree. Future changes that make message processing multi-threaded will have to take this into account.
  45. luke-jr referenced this in commit 9febeb13b1 on Feb 22, 2025
  46. luke-jr referenced this in commit 13e161aa1a on Feb 28, 2025
  47. laanwj commented at 10:18 am on April 9, 2025: member

    Ohh this is neat. Concept ACK.

    As noted in #31033, others have also already conducted similar research (https://b10c.me/projects/023-cpu-usage-of-peers/ or https://delvingbitcoin.org/t/cpu-usage-of-peers/196), and it seems like the data you’d expose here, will be less detailed / useful than what can/has already been produced using tracepoints or similar, so I’m wondering if this is the best approach.

    Where i think monitoring per-peer CPU usage is most concretely useful is for anti-DoS measures. Reporting it is a (still useful) first step.

    (do agree tracepoint are much better for gathering detailed information, but there’s internal applications of this)

  48. vasild commented at 3:47 pm on April 9, 2025: contributor

    Hmm, https://b10c.me/projects/023-cpu-usage-of-peers/ and https://github.com/0xB10C/bitcoin/commit/59c0fe438a3e62d27eda237ae20c062608e047b3 measure the real time, aka wall clock time (cc @0xB10C).

    Message processing is single threaded, which means that we do not process messages from more than one peer at a given time, but still there are other threads in bitcoind and other processes on the system which influence the “real time”. That is, bitcoind may perform identical tasks for two peers, but while doing the tasks for the second peer some other thread or process hogs the system and delays the completion. This will skew the results.

    To measure properly one has to measure “CPU time”, not “real time”. It can still be done with tracepoints, but the CPU measuring function from this PR would have to be used.

  49. maflcko commented at 5:23 pm on April 9, 2025: member
    I wouldn’t say that one metric is more proper than the other one. Some people run on slow IO devices, so measuring the wall clock time is reasonable to find peers that eat CPU or IO (or both). Of course wall-clock is more fragile and it means you can’t hammer the program over RPC at the same time, or do anything else on the whole system, but it is still a valid way to measure. (See also the recent commit to perf that added wall-clock measurements: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=32ecca8d7a3e2400805f8b5564a1b07e05cfcd54)
  50. in doc/release-notes-31672.md:8 in 0f68c47e93 outdated
    0@@ -0,0 +1,8 @@
    1+RPC
    2+---
    3+
    4+A new field `cpu_load` has been added to the `getpeerinfo` RPC output.
    5+It shows the CPU time (user + system) spent processing messages from the
    6+given peer and crafting messages for it expressed in per milles (‰) of
    7+the duration of the connection. The field is optional and will be omitted
    8+on platforms that do not support this or if still not measured. (#31672)
    


    mzumsande commented at 3:09 pm on April 14, 2025:
    nit: not yet measured sounds more correct to me.

    vasild commented at 8:29 am on April 15, 2025:
    Done. Also elaborated a little bit the help text.
  51. mzumsande commented at 5:44 pm on April 14, 2025: contributor

    Concept ACK, but I think that the cpu_load is not straightforward to interpret.

    As I wrote in #31033 (comment) I’d say that in absence of an attack, the highest CPU load peers are our most useful ones - it would be a unfortunate if users attempted to disconnect “DoSy” their peers based on this statistics when there is no actual DoS situation.

    On the other hand, I think that this would be really useful in case of an actual attack. If users see their node becoming slow, cpu_load could help detect who the attacker is and what they are doing.

  52. vasild force-pushed on Apr 15, 2025
  53. vasild commented at 8:25 am on April 15, 2025: contributor
    0f68c47e93...e374825246: address suggestion
  54. vasild force-pushed on Apr 15, 2025
  55. vasild commented at 8:30 am on April 15, 2025: contributor
    e374825246...9cc8ca3b1f: also update getpeerinfo help
  56. in doc/release-notes-31672.md:9 in 9cc8ca3b1f outdated
    0@@ -0,0 +1,10 @@
    1+RPC
    2+---
    3+
    4+A new field `cpu_load` has been added to the `getpeerinfo` RPC output.
    5+It shows the CPU time (user + system) spent processing messages from the
    6+given peer and crafting messages for it expressed in per milles (‰) of
    7+the duration of the connection. The field is optional and will be omitted
    8+on platforms that do not support this or if not yet measured. Note that
    9+high CPU time is not necessary a bad thing - new, valid, transactions and
    


    jonatack commented at 4:36 pm on April 16, 2025:
    0high CPU time is not necessarily a bad thing - new valid transactions and
    

    vasild commented at 7:21 am on April 17, 2025:
    Done.
  57. in doc/release-notes-31672.md:6 in 9cc8ca3b1f outdated
    0@@ -0,0 +1,10 @@
    1+RPC
    2+---
    3+
    4+A new field `cpu_load` has been added to the `getpeerinfo` RPC output.
    5+It shows the CPU time (user + system) spent processing messages from the
    6+given peer and crafting messages for it expressed in per milles (‰) of
    


    jonatack commented at 5:53 pm on April 16, 2025:

    maybe more simply replace “processing messages from the given peer and crafting messages for it” with “processing messages to/from the peer”

    (same for the RPC help)


    vasild commented at 7:21 am on April 17, 2025:
    Done.
  58. in src/rpc/net.cpp:152 in 9cc8ca3b1f outdated
    147+                        "The CPU time (user + system) spent processing messages from this peer "
    148+                        "and crafting messages for it expressed in per milles (‰) of the "
    149+                        "duration of the connection. Will be omitted on platforms that do not "
    150+                        "support this or if still not measured. Note that high CPU time is not "
    151+                        "necessary a bad thing - new, valid, transactions and blocks require "
    152+                        "CPU time to be validated."},
    


    jonatack commented at 5:57 pm on April 16, 2025:

    nit, this is very long

    0    "bytesrecv" : n,                      (numeric) The total bytes received
    1    "cpu_load" : n,                       (numeric, optional) The CPU time (user + system) spent processing messages from this peer and crafting messages for it expressed in per milles () of the duration of the connection. Will be omitted on platforms that do not support this or if still not measured. Note that high CPU time is not necessary a bad thing - new, valid, transactions and blocks require CPU time to be validated.
    2    "conntime" : xxx,                     (numeric) The UNIX epoch time of the connection
    

    perhaps more simply:

    “The CPU time (user + system) spent processing messages to/from the peer, in per milles (‰) of the connection duration”


    vasild commented at 7:24 am on April 17, 2025:

    Shortened a bit but not that much since this is the only help text (I assume release notes are not so useful for somebody looking into this for the first time in e.g. version 35.0, they will not go to search for past release notes).

    Other help texts include \n, maybe do that here as well?


    jonatack commented at 9:19 pm on April 18, 2025:

    Other help texts include \n, maybe do that here as well?

    I’m agnostic as to that.

    A few text compaction ideas, feel free to pick/choose/ignore:

    s/The CPU time (user + system)/Total CPU time/

    s/spent processing/processing/

    s/duration. Will be omitted on platforms that do not support this or if still not measured./duration, if supported by the platform and measured./

    s/Note that high CPU time is not necessarily a bad thing - new valid transactions and blocks require CPU time to be validated./High CPU time is not necessarily bad; validating new transactions and blocks requires it./

  59. jonatack commented at 6:05 pm on April 16, 2025: member

    Light initial tested ACK 9cc8ca3b1fe2b20e465ddc030c21ac36570f2935

    I did not review the cmake build code.

    Tested RPC getpeerinfo and have also been testing this by running the following branch https://github.com/jonatack/bitcoin/commits/2025-04-add-cpu_load-to-netinfo/ containing an additional commit, and then running a live dashboard with

    0$ watch ./build/bin/bitcoin-cli -rpcwait -netinfo 4
    
  60. DrahtBot requested review from i-am-yuvi on Apr 16, 2025
  61. DrahtBot requested review from mzumsande on Apr 16, 2025
  62. DrahtBot requested review from laanwj on Apr 16, 2025
  63. i-am-yuvi commented at 9:29 pm on April 16, 2025: contributor
    ACK 9cc8ca3b1fe2b20e465ddc030c21ac36570f2935
  64. vasild force-pushed on Apr 17, 2025
  65. vasild commented at 7:21 am on April 17, 2025: contributor

    9cc8ca3b1f...bb822a5ee1: address suggestions

    I think the bitcoin-cli extension in https://github.com/jonatack/bitcoin/commit/708a9502f8eca7aaa84236ea038a574f4350f298#r155516952 might be included in this PR. @i-am-yuvi, since you ACKed this PR without that extra commit, what do you think? Should it be included?

  66. in cmake/introspection.cmake:162 in bb822a5ee1 outdated
    158@@ -159,6 +159,40 @@ check_cxx_source_compiles("
    159   " HAVE_SYSCTL_ARND
    160 )
    161 
    162+# Check for clock_gettime() (POSIX.1b).
    


    fanquake commented at 9:09 am on April 17, 2025:
    I don’t think we need to introduce checks here, we already use clock_gettime. You could just #ifdef CLOCK_THREAD_CPUTIME_ID at the callsite.

    vasild commented at 9:34 am on April 17, 2025:
    Done.
  67. in cmake/introspection.cmake:178 in bb822a5ee1 outdated
    173+  }
    174+  " HAVE_CLOCK_GETTIME
    175+)
    176+
    177+# Check for GetThreadTimes() (Windows).
    178+check_cxx_source_compiles("
    


    fanquake commented at 9:09 am on April 17, 2025:
    I don’t think we check for any other Windows functions like this, I think you can just #ifdef WIN32 at the callsite.

    vasild commented at 9:34 am on April 17, 2025:
    Done. This removes all build system changes from this PR, thanks!
  68. in src/util/time.cpp:178 in bb822a5ee1 outdated
    177+    ULARGE_INTEGER user_;
    178+    user_.LowPart = user.dwLowDateTime;
    179+    user_.HighPart = user.dwHighDateTime;
    180+
    181+    // https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getthreadtimes
    182+    // "Thread kernel mode and user mode times are amounts of time. For example,
    


    fanquake commented at 9:10 am on April 17, 2025:
    If you’re going to link to external documentation, there’s no need to copy paragraphs of it, verbatim, into our source code.

    vasild commented at 9:37 am on April 17, 2025:
    The linked article is long and convoluted and here I want to highlight only a short part of it that is relevant. Also having the text here is helpful if the web page disappears in the future.

    fanquake commented at 9:41 am on April 17, 2025:
    Then I’d suggest atleast trying to consolidate what is here, given you already link to the same place above, with just “GetThreadTimes():” as a comment (that seems unecessary).

    vasild commented at 10:06 am on April 17, 2025:
    Reduced the text and removed the link to the doc since it is already a few lines earlier.
  69. in src/util/time.cpp:147 in bb822a5ee1 outdated
    139@@ -128,3 +140,62 @@ struct timeval MillisToTimeval(std::chrono::milliseconds ms)
    140 {
    141     return MillisToTimeval(count_milliseconds(ms));
    142 }
    143+
    144+std::chrono::nanoseconds ThreadCpuTime()
    145+{
    146+#ifdef HAVE_CLOCK_GETTIME
    147+    // An alternative to clock_gettime() is getrusage().
    


    fanquake commented at 9:11 am on April 17, 2025:
    Comment seems unnecessary?

    vasild commented at 9:35 am on April 17, 2025:
    Removed.
  70. vasild force-pushed on Apr 17, 2025
  71. vasild commented at 9:34 am on April 17, 2025: contributor
    bb822a5ee1...ee16345af3: address suggestions
  72. rpc: add cpu_load to getpeerinfo
    Add a new field `cpu_load` to the output of `getpeerinfo` RPC.
    
    It represents the CPU time spent by the message handling thread for the
    given peer, weighted for the duration of the connection. That is, for
    example, if two peers are equally demanding and one is connected longer
    than the other, then they will have the same `cpu_load` number.
    19c8336d97
  73. in src/util/time.cpp:151 in ee16345af3 outdated
    146+    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t) == -1) {
    147+        return std::chrono::nanoseconds{0};
    148+    }
    149+    return std::chrono::seconds{t.tv_sec} + std::chrono::nanoseconds{t.tv_nsec};
    150+#elif defined(WIN32)
    151+    // An alternative to GetThreadTimes() is QueryThreadCycleTime() but it
    


    fanquake commented at 9:37 am on April 17, 2025:
    Same here, as the clock_gettime alternative comment.

    vasild commented at 10:07 am on April 17, 2025:
    Removed and added a note to the PR description about the alternatives.
  74. vasild force-pushed on Apr 17, 2025
  75. vasild commented at 10:05 am on April 17, 2025: contributor
    ee16345af3...19c8336d97: address suggestions
  76. jonatack commented at 6:57 pm on April 19, 2025: member

    Latest changes since my last review LGTM. I plan to review the code more closely.

    Empirically, running a node on this branch, it looks like over time most peers settle into less than 0.5 per milles, a few are higher, with maybe one peer around 4. These higher-load connections seem to more often be outbound ones.

    (Edit, updated the branch with the -netinfo commit with a few improvements and to also only display cpu load for a peer (rounded to nearest integer) if it is non-zero (e.g. >= 0.5 in getpeerinfo), to more easily see which peers are using more CPU: https://github.com/jonatack/bitcoin/commits/2025-04-add-cpu_load-to-netinfo/


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-05-08 18:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me