rpc: add cpu_load to getpeerinfo

vasild commented at 2:02 pm on January 16, 2025: contributor

Add a new field cpu_load to the output of getpeerinfo RPC.

It represents the CPU time spent by the message handling thread for the given peer, weighted for the duration of the connection. That is, for example, if two peers are equally demanding and one is connected longer than the other, then they will have the same cpu_load number.

Monitoring CPU usage is useful on its own. Also related to #31033.

This PR uses clock_gettime() (POSIX) and GetThreadTimes() (Windows). An alternative to those, should this be explored are: getrusage() (POSIX) and QueryThreadCycleTime() (Windows, but it counts CPU cycles).

DrahtBot commented at 2:02 pm on January 16, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31672.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept NACK	rebroad
Concept ACK	sipa, theStack, BrandonOdiwuor, 1440000bytes, laanwj, mzumsande
Stale ACK	jonatack, yuvicc

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

require it be validated -> require that it be validated [missing “that” for clarity]

drahtbot_id_4_m

DrahtBot added the label RPC/REST/ZMQ on Jan 16, 2025

vasild force-pushed on Jan 16, 2025

DrahtBot added the label CI failed on Jan 16, 2025

DrahtBot commented at 2:05 pm on January 16, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35717681069

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

jonatack commented at 2:25 pm on January 16, 2025: member

Concept ACK if this can be a reliable, useful metric and help pave the path to #31033.

sipa commented at 3:24 pm on January 16, 2025: member

Concept ACK

vasild force-pushed on Jan 16, 2025

theStack commented at 6:15 pm on January 16, 2025: contributor

Concept ACK

Might be worth noting that this is currently only available on POSIX systems (i.e. not available on Windows, but on all other systems that we support AFAICT) in both the RPC help and a release note.

BrandonOdiwuor commented at 7:00 pm on January 16, 2025: contributor

Concept ACK

fanquake commented at 1:06 pm on January 17, 2025: member

Seems like there was some very brief discussion in #31033, and the conclusion “I guess this should start with planting some metrics”, but I’m not sure putting changes into Bitcoin Core is the right first step.

Before changing our API, it’d be good to atleast show some usage that indicates that the changes here are useful. I assume you’ve already been running this locally, and collecting the data, so it’d be good to know what it turned up?

As noted in #31033, others have also already conducted similar research (https://b10c.me/projects/023-cpu-usage-of-peers/ or https://delvingbitcoin.org/t/cpu-usage-of-peers/196), and it seems like the data you’d expose here, will be less detailed / useful than what can/has already been produced using tracepoints or similar, so I’m wondering if this is the best approach.

vasild commented at 4:44 pm on January 17, 2025: contributor

@fanquake IMO, monitoring CPU usage is useful on its own, even if we don’t start treating peers differently based on that metric (https://github.com/bitcoin/bitcoin/issues/31033). In other words, this metric is useful at least as much as a bunch of other metrics in the getpeerinfo output.

Yes, I have been running this locally - there is like 65x difference between the least and most demanding peers. I am curious to correlate this to the messages being sent/received to/from those peers and to have a histogram of the data, not just least / most demanding (e.g. histogram of bitcoin-cli getpeerinfo |jq ".[].cpu_load"). I view those as something nice to build on top of this PR, not as a blocker that’s needed to justify the usefulness of the CPU time metric.

1440000bytes commented at 5:34 pm on January 17, 2025: none

Concept ACK

vasild commented at 12:25 pm on January 20, 2025: contributor

Might be worth noting that this is currently only available on POSIX systems (i.e. not available on Windows

Right. GetThreadTimes() looks like a promising Windows alternative. I will try to implement that here. Switching to draft because I will push some work-in-progress a bunch of times.

vasild marked this as a draft on Jan 20, 2025

vasild force-pushed on Jan 21, 2025

vasild force-pushed on Jan 22, 2025

DrahtBot removed the label CI failed on Jan 22, 2025

vasild force-pushed on Jan 22, 2025

vasild marked this as ready for review on Jan 23, 2025

vasild commented at 10:07 am on January 23, 2025: contributor

Ready for review. Implemented for Windows as well.

Here is a test program I played with to test this manually. I think it is not worth adding as a unit test or benchmark, but at least it can live here in the comments of this PR:

 0#include <assert.h>
 1
 2#include <windows.h>
 3#include <winnt.h>
 4
 5#include <processthreadsapi.h>
 6
 7#include <iostream>
 8#include <thread>
 9
10void thread(size_t work)
11{
12    FILETIME before_creation;
13    FILETIME before_exit;
14    FILETIME before_kernel;
15    FILETIME before_user;
16    bool ret = GetThreadTimes(GetCurrentThread(),
17        &before_creation,
18        &before_exit,
19        &before_kernel,
20        &before_user);
21    assert(ret == 1);
22
23    if (work > 10'000) {
24        for (size_t i{0}; i < work; ++i) {
25            (void)(i * i);
26        }
27    } else {
28        std::this_thread::sleep_for(std::chrono::milliseconds{work});
29    }
30
31    FILETIME after_creation;
32    FILETIME after_exit;
33    FILETIME after_kernel;
34    FILETIME after_user;
35    ret = GetThreadTimes(GetCurrentThread(),
36        &after_creation,
37        &after_exit,
38        &after_kernel,
39        &after_user);
40    assert(ret == 1);
41
42    ULARGE_INTEGER before_kernel_;
43    before_kernel_.LowPart = before_kernel.dwLowDateTime;
44    before_kernel_.HighPart = before_kernel.dwHighDateTime;
45
46    ULARGE_INTEGER before_user_;
47    before_user_.LowPart = before_user.dwLowDateTime;
48    before_user_.HighPart = before_user.dwHighDateTime;
49
50    ULARGE_INTEGER after_kernel_;
51    after_kernel_.LowPart = after_kernel.dwLowDateTime;
52    after_kernel_.HighPart = after_kernel.dwHighDateTime;
53
54    ULARGE_INTEGER after_user_;
55    after_user_.LowPart = after_user.dwLowDateTime;
56    after_user_.HighPart = after_user.dwHighDateTime;
57
58    ULARGE_INTEGER elapsed;
59    elapsed.QuadPart = after_kernel_.QuadPart + after_user_.QuadPart - before_kernel_.QuadPart - before_user_.QuadPart;
60    ULONGLONG elapsed_ns{elapsed.QuadPart * 100};
61    std::cout << "cpu time: " << elapsed_ns / 1'000'000'000.0 << " sec\n";
62}
63
64int main ()
65{
66    std::thread t1{thread, 1000000000};
67    std::thread t2{thread, 1000000000};
68    std::thread t3{thread, 3000};
69    t1.join();
70    t2.join();
71    t3.join();
72
73    return 0;
74}

in src/rpc/net.cpp:150 in 28b2a67f13 outdated

142@@ -143,6 +143,11 @@ static RPCHelpMan getpeerinfo()
143                     {RPCResult::Type::NUM_TIME, "last_block", "The " + UNIX_EPOCH_TIME + " of the last block received from this peer"},
144                     {RPCResult::Type::NUM, "bytessent", "The total bytes sent"},
145                     {RPCResult::Type::NUM, "bytesrecv", "The total bytes received"},
146+                    {RPCResult::Type::NUM, "cpu_load", /*optional=*/true,
147+                        "The CPU time (user + system) spent processing messages from this peer "
148+                        "and crafting messages for it expressed in per milles (‰) of the "
149+                        "duration of the connection. Will be omitted on platforms that do not "
150+                        "support this or if still not measured."},

vasild commented at 10:19 am on January 23, 2025:

Currently I chose to represent the CPU load as 1/1000s of the connection time. For example, if we have spent 5 CPU seconds for a peer that has been connected for 1000 seconds, then cpu_load will be 5.

Output of bitcoin-cli getpeerinfo |jq '.[].cpu_load ' |sort -n:

Shortly after startup:

 00.003207936170212766
 10.003909962962962963
 20.004000127659574468
 30.004340326086956522
 40.00904279365079365
 50.0270041914893617
 60.03776429787234042
 70.04154987234042553
 80.05750733333333333
 90.0828310843373494
100.1014558936170213
110.1453753214285714
120.1619409404761905

After running for a few hours:

 00.5100616477272727
 10.7087315352941177
 20.7814365649717514
 30.8773131578947369
 40.9717403481228669
 51.069695995623632
 61.078479046189377
 71.153550253945481
 81.167638256213824
 91.187242140449438
101.199645694868011
111.199951082706767
121.220689207655502
131.264582222222222
141.265083146221971
151.282676876955162
161.301380197644649
171.303624877378436
181.333201868001821
191.341489760490639
201.343411334541063
211.361829386934673
221.364938446935725
231.372325493506493
241.385421828947368
251.472354355769231
261.612321964879852
271.641998848148148
281.648207868421053
291.681837521410579
301.783343262569832
311.78646682029841
321.837577241385135
331.844571151379764
341.851067047263682
351.908551290322581
361.915087061408061
371.923511220435511
381.939589383639822
391.940788162162162
401.962705583687341
412.000298126030624
422.008423324931507
432.00921122707588
442.018911893157895
452.019168323205742
462.019358503865546
472.030658536117768
482.034708427345187
492.03510252110758
502.04022746739726
512.054511708699122
522.07339420882353
532.076240002244949
542.080369749066667
552.08561637254902
562.092916448969578
572.095147154892331
582.119004164227642
592.129640033722438
602.145774200383772
612.164519192393115
622.27069154817898
632.29444800067659
642.313547881255947
652.31539280930693
662.368903822619457
672.445622954545454
682.503377741525424
692.557022695182724
702.56755574523507
712.709643024574987
722.724320047021944
732.913325884282384
743.154521111713488
753.706281482599432
763.843126643942505
773.865110431762718
783.934123987370194
794.634227207136824
805.185085595879828
815.233027217613057
825.766649107296137
836.105655600034141
846.138089870962286
8510.09238865432725
8610.24402739230769
8710.56650271556122
8820.75054951145038
8947.69228849275363
9077.79810328289474
91145.2686368960396

Maybe using an integer would be better for this? Then it would have to have higher resolution, e.g. instead of 1/1000s it should show 1/1Bs? Then the above numbers would vary from 3207 to 145'268'636. E.g. the lightest peer has caused 3207 nanoseconds of CPU time for each 1 second (=1B nanoseconds) of connection time.

in src/util/time.cpp:128 in 28b2a67f13 outdated

123+
124+    timespec t;
125+    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t) == -1) {
126+        return std::chrono::nanoseconds{0};
127+    }
128+    return std::chrono::nanoseconds{static_cast<int64_t>(t.tv_sec) * 1'000'000'000 + t.tv_nsec};

maflcko commented at 10:25 am on January 23, 2025:

nit: Instead of manual casting and multiplication, you can use chrono types and let the compiler insert what is needed.

vasild commented at 12:38 pm on January 23, 2025:

Right, much better and simpler this way. Thanks!

in src/rpc/net.cpp:250 in 28b2a67f13 outdated

245@@ -239,6 +246,11 @@ static RPCHelpMan getpeerinfo()
246         obj.pushKV("last_block", count_seconds(stats.m_last_block_time));
247         obj.pushKV("bytessent", stats.nSendBytes);
248         obj.pushKV("bytesrecv", stats.nRecvBytes);
249+        const auto cpu_time_ns{count_nanoseconds(stats.m_cpu_time)};
250+        const auto connection_time_ns{count_nanoseconds(now - stats.m_connected)};

maflcko commented at 10:29 am on January 23, 2025:

nit: Instead of manually casting, you can just divide two chrono types and let the compiler insert anything that is needed.

vasild commented at 12:40 pm on January 23, 2025:

Thanks for the suggestion! And now no need to add a new function count_nanoseconds().

maflcko commented at 10:31 am on January 23, 2025: member

(left two nits, feel free to ignore)

vasild force-pushed on Jan 23, 2025

vasild commented at 12:38 pm on January 23, 2025: contributor

28b2a67f13...0f68c47e93: address suggestions by @maflcko

maflcko commented at 1:06 pm on January 23, 2025: member

The code here assumes that Bitcoin Core is single threaded, which may be true right now, but could change in the future.

luke-jr commented at 8:35 pm on January 25, 2025: member

How so? Both implementations use thread-specific timers…?

yuvicc commented at 4:40 pm on January 26, 2025: contributor

Concept ACK

I think this would be a clean way to get peers’ CPU usage for message processing since Tracepoints is currently not compatible with macOS/Windows for which folks will be having a hard time.

Also, this could pave the way for prioritising peers based on CPU usage #31033.

yuvicc commented at 4:52 pm on January 26, 2025: contributor

Also, during my observations I found value null as well.

0null
10.0352094358974359
20.54925525
30.579001
40.6585706
51.390154842105263
61.703986581081081
72.427459408450704
824.85140159887005

Any idea why is it null? maybe it is about to get disconnected.

vasild commented at 10:52 am on January 27, 2025: contributor

Any idea why is it null? maybe it is about to get disconnected.

jq '.[].cpu_load' displays null when there is no cpu_load in the JSON for that peer. The new field is optional:

0$ bitcoin-cli help getpeerinfo
1...
2    "cpu_load" : n,                       (numeric, optional) The CPU time (user + system) spent processing messages from this peer and crafting messages for it expressed in per milles (‰) of the duration of the connection. Will be omitted on platforms that do not support this or if still not measured.

The omission comes from this condition:

0if (stats.m_cpu_time > 0s && now > stats.m_connected) {

Within 1 second of connecting, now will be equal to stats.m_connected (they use second-precision). I am open to suggestions if you think this can be done in a better or more intuitive way.

vasild commented at 11:01 am on January 27, 2025: contributor

The code here assumes that Bitcoin Core is single threaded, which may be true right now,

Hmm? https://github.com/bitcoin/bitcoin/blob/0a931a9787b196d7a620863cc143d9319ffd356d/doc/developer-notes.md#threads

but could change in the future.

I guess you mean that message processing is single threaded - the b-msghand thread. I agree it could change in the future.

yuvicc commented at 12:46 pm on January 27, 2025: contributor

Any idea why is it null? maybe it is about to get disconnected.

jq '.[].cpu_load' displays null when there is no cpu_load in the JSON for that peer. The new field is optional:
0$ bitcoin-cli help getpeerinfo
1...
2    "cpu_load" : n,                       (numeric, optional) The CPU time (user + system) spent processing messages from this peer and crafting messages for it expressed in per milles (‰) of the duration of the connection. Will be omitted on platforms that do not support this or if still not measured.
The omission comes from this condition:
0if (stats.m_cpu_time > 0s && now > stats.m_connected) {
Within 1 second of connecting, now will be equal to stats.m_connected (they use second-precision). I am open to suggestions if you think this can be done in a better or more intuitive way.

Got it, thanks!

yuvicc approved

yuvicc commented at 12:49 pm on January 27, 2025: contributor

tACK 0f68c47e931de05200adeae639bcee50ea3c171d

DrahtBot requested review from jonatack on Jan 27, 2025

DrahtBot requested review from sipa on Jan 27, 2025

DrahtBot requested review from theStack on Jan 27, 2025

DrahtBot requested review from BrandonOdiwuor on Jan 27, 2025

maflcko commented at 11:26 am on February 5, 2025: member

I guess you mean that message processing is single threaded - the b-msghand thread. I agree it could change in the future.

I left the comment, because if the work is done in a different thread, this feature will silently break. I guess that is fine if the work is minimal, or code could be added to do the additional accounting if the work is substantial.

vasild commented at 12:29 pm on February 5, 2025: contributor

@maflcko yes, I agree. Future changes that make message processing multi-threaded will have to take this into account.

luke-jr referenced this in commit 9febeb13b1 on Feb 22, 2025

luke-jr referenced this in commit 13e161aa1a on Feb 28, 2025

laanwj commented at 10:18 am on April 9, 2025: member

Ohh this is neat. Concept ACK.

As noted in #31033, others have also already conducted similar research (https://b10c.me/projects/023-cpu-usage-of-peers/ or https://delvingbitcoin.org/t/cpu-usage-of-peers/196), and it seems like the data you’d expose here, will be less detailed / useful than what can/has already been produced using tracepoints or similar, so I’m wondering if this is the best approach.

Where i think monitoring per-peer CPU usage is most concretely useful is for anti-DoS measures. Reporting it is a (still useful) first step.

(do agree tracepoint are much better for gathering detailed information, but there’s internal applications of this)

vasild commented at 3:47 pm on April 9, 2025: contributor

Hmm, https://b10c.me/projects/023-cpu-usage-of-peers/ and https://github.com/0xB10C/bitcoin/commit/59c0fe438a3e62d27eda237ae20c062608e047b3 measure the real time, aka wall clock time (cc @0xB10C).

Message processing is single threaded, which means that we do not process messages from more than one peer at a given time, but still there are other threads in bitcoind and other processes on the system which influence the “real time”. That is, bitcoind may perform identical tasks for two peers, but while doing the tasks for the second peer some other thread or process hogs the system and delays the completion. This will skew the results.

To measure properly one has to measure “CPU time”, not “real time”. It can still be done with tracepoints, but the CPU measuring function from this PR would have to be used.

maflcko commented at 5:23 pm on April 9, 2025: member

I wouldn’t say that one metric is more proper than the other one. Some people run on slow IO devices, so measuring the wall clock time is reasonable to find peers that eat CPU or IO (or both). Of course wall-clock is more fragile and it means you can’t hammer the program over RPC at the same time, or do anything else on the whole system, but it is still a valid way to measure. (See also the recent commit to perf that added wall-clock measurements: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=32ecca8d7a3e2400805f8b5564a1b07e05cfcd54)

in doc/release-notes-31672.md:8 in 0f68c47e93 outdated

0@@ -0,0 +1,8 @@
1+RPC
2+---
3+
4+A new field `cpu_load` has been added to the `getpeerinfo` RPC output.
5+It shows the CPU time (user + system) spent processing messages from the
6+given peer and crafting messages for it expressed in per milles (‰) of
7+the duration of the connection. The field is optional and will be omitted
8+on platforms that do not support this or if still not measured. (#31672)

mzumsande commented at 3:09 pm on April 14, 2025:

nit: not yet measured sounds more correct to me.

vasild commented at 8:29 am on April 15, 2025:

Done. Also elaborated a little bit the help text.

mzumsande commented at 5:44 pm on April 14, 2025: contributor

Concept ACK, but I think that the cpu_load is not straightforward to interpret.

As I wrote in #31033 (comment) I’d say that in absence of an attack, the highest CPU load peers are our most useful ones - it would be a unfortunate if users attempted to disconnect “DoSy” their peers based on this statistics when there is no actual DoS situation.

On the other hand, I think that this would be really useful in case of an actual attack. If users see their node becoming slow, cpu_load could help detect who the attacker is and what they are doing.

vasild force-pushed on Apr 15, 2025

vasild commented at 8:25 am on April 15, 2025: contributor

0f68c47e93...e374825246: address suggestion

vasild force-pushed on Apr 15, 2025

vasild commented at 8:30 am on April 15, 2025: contributor

e374825246...9cc8ca3b1f: also update getpeerinfo help

in doc/release-notes-31672.md:9 in 9cc8ca3b1f outdated

0@@ -0,0 +1,10 @@
1+RPC
2+---
3+
4+A new field `cpu_load` has been added to the `getpeerinfo` RPC output.
5+It shows the CPU time (user + system) spent processing messages from the
6+given peer and crafting messages for it expressed in per milles (‰) of
7+the duration of the connection. The field is optional and will be omitted
8+on platforms that do not support this or if not yet measured. Note that
9+high CPU time is not necessary a bad thing - new, valid, transactions and

jonatack commented at 4:36 pm on April 16, 2025:

0high CPU time is not necessarily a bad thing - new valid transactions and

vasild commented at 7:21 am on April 17, 2025:

Done.

in doc/release-notes-31672.md:6 in 9cc8ca3b1f outdated

0@@ -0,0 +1,10 @@
1+RPC
2+---
3+
4+A new field `cpu_load` has been added to the `getpeerinfo` RPC output.
5+It shows the CPU time (user + system) spent processing messages from the
6+given peer and crafting messages for it expressed in per milles (‰) of

jonatack commented at 5:53 pm on April 16, 2025:

maybe more simply replace “processing messages from the given peer and crafting messages for it” with “processing messages to/from the peer”

(same for the RPC help)

vasild commented at 7:21 am on April 17, 2025:

Done.

in src/rpc/net.cpp:152 in 9cc8ca3b1f outdated

147+                        "The CPU time (user + system) spent processing messages from this peer "
148+                        "and crafting messages for it expressed in per milles (‰) of the "
149+                        "duration of the connection. Will be omitted on platforms that do not "
150+                        "support this or if still not measured. Note that high CPU time is not "
151+                        "necessary a bad thing - new, valid, transactions and blocks require "
152+                        "CPU time to be validated."},

jonatack commented at 5:57 pm on April 16, 2025:

nit, this is very long

0    "bytesrecv" : n,                      (numeric) The total bytes received
1    "cpu_load" : n,                       (numeric, optional) The CPU time (user + system) spent processing messages from this peer and crafting messages for it expressed in per milles (‰) of the duration of the connection. Will be omitted on platforms that do not support this or if still not measured. Note that high CPU time is not necessary a bad thing - new, valid, transactions and blocks require CPU time to be validated.
2    "conntime" : xxx,                     (numeric) The UNIX epoch time of the connection

perhaps more simply:

“The CPU time (user + system) spent processing messages to/from the peer, in per milles (‰) of the connection duration”

vasild commented at 7:24 am on April 17, 2025:

Shortened a bit but not that much since this is the only help text (I assume release notes are not so useful for somebody looking into this for the first time in e.g. version 35.0, they will not go to search for past release notes).

Other help texts include \n, maybe do that here as well?

jonatack commented at 9:19 pm on April 18, 2025:

Other help texts include \n, maybe do that here as well?

I’m agnostic as to that.

A few text compaction ideas, feel free to pick/choose/ignore:

s/The CPU time (user + system)/Total CPU time/

s/spent processing/processing/

s/duration. Will be omitted on platforms that do not support this or if still not measured./duration, if supported by the platform and measured./

s/Note that high CPU time is not necessarily a bad thing - new valid transactions and blocks require CPU time to be validated./High CPU time is not necessarily bad; validating new transactions and blocks requires it./

vasild commented at 1:48 pm on May 13, 2025:

Took some, thanks!

jonatack commented at 6:05 pm on April 16, 2025: member

Light initial tested ACK 9cc8ca3b1fe2b20e465ddc030c21ac36570f2935

I did not review the cmake build code.

Tested RPC getpeerinfo and have also been testing this by running the following branch https://github.com/jonatack/bitcoin/commits/2025-04-add-cpu_load-to-netinfo/ containing an additional commit, and then running a live dashboard with

0$ watch ./build/bin/bitcoin-cli -rpcwait -netinfo 4

DrahtBot requested review from yuvicc on Apr 16, 2025

DrahtBot requested review from mzumsande on Apr 16, 2025

DrahtBot requested review from laanwj on Apr 16, 2025

yuvicc commented at 9:29 pm on April 16, 2025: contributor

ACK 9cc8ca3b1fe2b20e465ddc030c21ac36570f2935

vasild force-pushed on Apr 17, 2025

vasild commented at 7:21 am on April 17, 2025: contributor

9cc8ca3b1f...bb822a5ee1: address suggestions

I think the bitcoin-cli extension in https://github.com/jonatack/bitcoin/commit/708a9502f8eca7aaa84236ea038a574f4350f298#r155516952 might be included in this PR. @i-am-yuvi, since you ACKed this PR without that extra commit, what do you think? Should it be included?

in cmake/introspection.cmake:162 in bb822a5ee1 outdated

158@@ -159,6 +159,40 @@ check_cxx_source_compiles("
159   " HAVE_SYSCTL_ARND
160 )
161 
162+# Check for clock_gettime() (POSIX.1b).

fanquake commented at 9:09 am on April 17, 2025:

I don’t think we need to introduce checks here, we already use clock_gettime. You could just #ifdef CLOCK_THREAD_CPUTIME_ID at the callsite.

vasild commented at 9:34 am on April 17, 2025:

Done.

in cmake/introspection.cmake:178 in bb822a5ee1 outdated

173+  }
174+  " HAVE_CLOCK_GETTIME
175+)
176+
177+# Check for GetThreadTimes() (Windows).
178+check_cxx_source_compiles("

fanquake commented at 9:09 am on April 17, 2025:

I don’t think we check for any other Windows functions like this, I think you can just #ifdef WIN32 at the callsite.

vasild commented at 9:34 am on April 17, 2025:

Done. This removes all build system changes from this PR, thanks!

in src/util/time.cpp:178 in bb822a5ee1 outdated

177+    ULARGE_INTEGER user_;
178+    user_.LowPart = user.dwLowDateTime;
179+    user_.HighPart = user.dwHighDateTime;
180+
181+    // https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getthreadtimes
182+    // "Thread kernel mode and user mode times are amounts of time. For example,

fanquake commented at 9:10 am on April 17, 2025:

If you’re going to link to external documentation, there’s no need to copy paragraphs of it, verbatim, into our source code.

vasild commented at 9:37 am on April 17, 2025:

The linked article is long and convoluted and here I want to highlight only a short part of it that is relevant. Also having the text here is helpful if the web page disappears in the future.

fanquake commented at 9:41 am on April 17, 2025:

Then I’d suggest atleast trying to consolidate what is here, given you already link to the same place above, with just “GetThreadTimes():” as a comment (that seems unecessary).

vasild commented at 10:06 am on April 17, 2025:

Reduced the text and removed the link to the doc since it is already a few lines earlier.

in src/util/time.cpp:147 in bb822a5ee1 outdated

139@@ -128,3 +140,62 @@ struct timeval MillisToTimeval(std::chrono::milliseconds ms)
140 {
141     return MillisToTimeval(count_milliseconds(ms));
142 }
143+
144+std::chrono::nanoseconds ThreadCpuTime()
145+{
146+#ifdef HAVE_CLOCK_GETTIME
147+    // An alternative to clock_gettime() is getrusage().

fanquake commented at 9:11 am on April 17, 2025:

Comment seems unnecessary?

vasild commented at 9:35 am on April 17, 2025:

Removed.

vasild force-pushed on Apr 17, 2025

vasild commented at 9:34 am on April 17, 2025: contributor

bb822a5ee1...ee16345af3: address suggestions

in src/util/time.cpp:151 in ee16345af3 outdated

146+    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &t) == -1) {
147+        return std::chrono::nanoseconds{0};
148+    }
149+    return std::chrono::seconds{t.tv_sec} + std::chrono::nanoseconds{t.tv_nsec};
150+#elif defined(WIN32)
151+    // An alternative to GetThreadTimes() is QueryThreadCycleTime() but it

fanquake commented at 9:37 am on April 17, 2025:

Same here, as the clock_gettime alternative comment.

vasild commented at 10:07 am on April 17, 2025:

Removed and added a note to the PR description about the alternatives.

vasild force-pushed on Apr 17, 2025

vasild commented at 10:05 am on April 17, 2025: contributor

ee16345af3...19c8336d97: address suggestions

jonatack commented at 6:57 pm on April 19, 2025: member

Latest changes since my last review LGTM. I plan to review the code more closely.

Empirically, running a node on this branch, it looks like over time most peers settle into less than 0.5 per milles, a few are higher, with maybe one peer around 4. These higher-load connections seem to more often be outbound ones.

(Edit, updated the branch with the -netinfo commit with a few improvements and to also only display cpu load for a peer (rounded to nearest integer) if it is non-zero (e.g. >= 0.5 in getpeerinfo), to more easily see which peers are using more CPU: https://github.com/jonatack/bitcoin/commits/2025-04-add-cpu_load-to-netinfo/

vasild force-pushed on May 13, 2025

vasild commented at 1:45 pm on May 13, 2025: contributor

19c8336d97...8b8b854346: address suggestions and take the bitcoin-cli change from https://github.com/jonatack/bitcoin/commits/2025-04-add-cpu_load-to-netinfo/, thanks!

luke-jr referenced this in commit e05aeffe77 on Jun 6, 2025

luke-jr referenced this in commit e109d55817 on Jun 6, 2025

DrahtBot added the label CI failed on Jun 15, 2025

DrahtBot commented at 11:06 am on June 15, 2025: contributor

🚧 At least one of the CI tasks failed. Task tidy: https://github.com/bitcoin/bitcoin/runs/42137973271 LLM reason (✨ experimental): The CI failure is caused by clang-tidy detecting a deprecated header include in the code.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

rpc: add cpu_load to getpeerinfo

Add a new field `cpu_load` to the output of `getpeerinfo` RPC.

It represents the CPU time spent by the message handling thread for the
given peer, weighted for the duration of the connection. That is, for
example, if two peers are equally demanding and one is connected longer
than the other, then they will have the same `cpu_load` number.

f1517d96eb

cli: add getpeerinfo#cpu_load to -netinfo b25b40ebd5

vasild force-pushed on Jun 20, 2025

vasild commented at 8:00 am on June 20, 2025: contributor

8b8b854346...b25b40ebd5: rebase and pet tidy: time.h -> ctime

DrahtBot removed the label CI failed on Jun 20, 2025

luke-jr referenced this in commit 9084ea96ac on Jul 17, 2025

luke-jr referenced this in commit ca906b350e on Jul 17, 2025

rebroad commented at 7:06 pm on July 28, 2025: contributor

NACK. cpu load alone is a useless metric. However, CPU load divided by TXs accepted into the chain (or mempool) could be a useful metric.

rpc: add cpu_load to getpeerinfo #31672

Code Coverage & Benchmarks

Reviews

Conflicts

LLM Linter (✨ experimental)