Cuts out some wasted time in net socket handling. First, only calculates the current time once every 50ms, rather than once for each peer, which given we only care about second-level precision seems more than adequate. Second, caches the value of the -capturemessages setting in CConnman rather than re-evaluating it every time we invoke PushMessaage.
net: Waste less time in socket handling #34025
pull ajtowns wants to merge 2 commits into bitcoin:master from ajtowns:202512-netsplit-opt changing 4 files +23 −11-
ajtowns commented at 10:44 PM on December 6, 2025: contributor
-
cea443e246
net: Pass time to InactivityChecks fuctions
We run InactivityChecks() for each node everytime poll()/select() every 50ms or so. Rather than calculating the current time once for each node, just calculate it once and reuse it.
- DrahtBot added the label P2P on Dec 6, 2025
-
DrahtBot commented at 10:44 PM on December 6, 2025: contributor
<!--e57a25ab6845829454e8d69fc972939a-->
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
<!--006a51241073e994b41acfe9ec718e94-->
Code Coverage & Benchmarks
For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34025.
<!--021abf342d371248e50ceaed478a90ca-->
Reviews
See the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste <code><!--meta-tag:bot-skip--></code> into the comment that the bot should ignore.
<!--174a7506f384e20aa4161008e828411d-->
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #30951 (net: option to disallow v1 connection on ipv4 and ipv6 peers by stratospher)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
<!--5faf32d7da4f0f540f40219e4f7537a3-->
LLM Linter (✨ experimental)
Possible places where named args for integral literals may be used (e.g.
func(x, /*named_arg=*/0)in C++, andfunc(x, named_arg=0)in Python):- CaptureMessage(pnode->addr, msg.m_type, msg.data, /is_incoming=/false) in src/net.cpp
<sup>2025-12-09</sup>
-
ajtowns commented at 10:56 PM on December 6, 2025: contributor
Not sure if the flame graph is usable, but:
GetBoolArg takes up 0.31% of total time, as part of PushMessage that takes up 1.75% of total time, in b-msghand.
GetTime takes up 0.82% of total time, as part of InactivityCheck that takes up 1.78% of total time, in b-net.
Converting from std::chrono::microseconds to NodeClock::time_point is a lot more intrusive (impacting at least net_processing and node/eviction as well).
Note that CConnman was a friend of CNode until #27257(no longer relevant) -
sedited commented at 11:24 PM on December 6, 2025: contributor
Concept ACK
-
in src/net.cpp:2125 in b4d1007221 outdated
2121 | @@ -2123,6 +2122,8 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes, 2122 | { 2123 | AssertLockNotHeld(m_total_bytes_sent_mutex); 2124 | 2125 | + auto now = GetTime<std::chrono::microseconds>();
vasild commented at 8:34 AM on December 9, 2025:GetTime()is deprecated.const auto now = NodeClock::now();(this is moving the deprecated call from elsewhere, but now is a good time to change it)
ajtowns commented at 8:22 PM on December 9, 2025:Converting from std::chrono::microseconds to NodeClock::time_point is a lot more intrusive (impacting at least net_processing and node/eviction as well).
I did try that, it requires a lot of changes to all the things we compare
nowagainst, som_connected,m_last_send,m_last_recv.m_connectedin particular is a big hit compared to the rest of this PR.
vasild commented at 5:31 AM on December 10, 2025:This works:
@@ -2125 +2125 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes, - auto now = GetTime<std::chrono::microseconds>(); + auto now = NodeClock::now(); @@ -2218 +2218 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes, - if (InactivityCheck(*pnode, now)) pnode->fDisconnect = true; + if (InactivityCheck(*pnode, now.time_since_epoch())) pnode->fDisconnect = true;
vasild commented at 5:38 AM on December 10, 2025:A question beyond this PR: if
GetTime()is still useful because a lot of surrounding code uses e.g.std::chrono::secondswhich we need to compare against, then shouldGetTime()be un-deprecated?
maflcko commented at 6:53 AM on December 10, 2025:Converting from std::chrono::microseconds to NodeClock::time_point is a lot more intrusive (impacting at least net_processing and node/eviction as well).
Happy to take a look as well in a fresh commit, either here, or in a follow-up.
if
GetTime()is still useful because a lot of surrounding code uses e.g.std::chrono::secondswhich we need to compare against, then shouldGetTime()be un-deprecated?GetTime returning a time duration is wrong, because the current time point (now) is not a duration, but a time point. A duration arises as the difference of two points in time. This duration can then be compared with any other duration (e.g. peer timeout). I don't think it makes sense un-deprecate something just because it is used in the current code. If this was a valid reason, nothing could ever be marked deprecated as long as it is used.
vasild commented at 9:07 AM on December 10, 2025:@maflcko, yes, I agree. But existent code uses "duration":
CNode::m_last_send CNode::m_last_recv CNode::m_last_tx_time CNode::m_last_block_time CNode::m_connectedso, if one if writing new code that needs to compare "now" to those, which one is preferred:
- use the deprecated
GetTime(); or - use
NodeClock::now()and convert usingtime_since_epoch()in order to compare?
ajtowns commented at 9:07 PM on December 10, 2025:Happy to take a look as well in a fresh commit, either here, or in a follow-up.
Happy to review in a followup.
When looking at this previously, it seemed like having an
AtomicTimePoint<Clock>template would be helpful (for things likem_last_sendwhich is currentlyatomic<seconds>), because time_point doesn't support being an atomic. Here's a branch where I last looked at this topic; the "Add AtomicTimePoint" commit might be worth cherry-picking.
maflcko commented at 3:28 PM on December 12, 2025:because time_point doesn't support being an atomic.
Duration does not either. I think it is fine to type
.load()or.store(), where needed.I guess there is no rush here, and this can be done, when all other places (non-atomic) are switched.
ajtowns commented at 10:07 PM on December 12, 2025:Duration works as an atomic fine -- we have
atomic<seconds>andatomic<microseconds>in various places in net, net_processing and util/time.atomic<time_point>doesn't work because even if the duration isnoexcept(necessary for atomic), that doesn't propagate to the time_point's constructor also being noexcept which is a requirement for wrapping in atomic.
maflcko commented at 12:39 PM on December 13, 2025:I may be missing something obvious. https://godbolt.org/z/8GM97aYhs seems to compile fine. And https://eel.is/c++draft/time.duration.cons doesn't mention
noexcept. Can you give a minimal reproducer, or link to the std docs?
ajtowns commented at 9:59 PM on December 13, 2025:What I was seeing was that the duration constructor was noexcept in practice, but time_point has an explicit constructor that wouldn't throw but wasn't declared as noexcept, which was then rejected by atomic...
Stackoverflow answer about it: https://stackoverflow.com/questions/22701617/should-constructors-on-stdchrono-time-point-be-noexcept-or-why-arent
Ah, it looks like it changed between C++17 and C++20, going from
atomic() noexcept = default;toconstexpr atomic() noexcept(is_nothrow_default_constructible_v<T>);which fixes it.
ajtowns commented at 10:00 PM on December 13, 2025:(Having a custom class might still have some value in allowing direct comparisons/calculations without explicit loads, I guess)
ajtowns commented at 11:20 PM on December 13, 2025:Hmm, it looks to me like libstdc++ didn't match the C++20 spec until https://github.com/gcc-mirror/gcc/commit/613f8ddbe3d7da63d827e588bf0333813c184b8a which doesn't seem to have been merged until March this year in libstdc++15? C++20 spec change was https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0883r2.pdf
However I can't reproduce the "core" bug LWG#2165 CWG#1778 even with
-std=c++11, so apparently modern compilers just no longer behave this way? I can't quite figure out what the change was or when it was made though. :man_shrugging:EDIT: ah, looks like the underlying language issue was present in g++ 9.x but fixed in g++ 10.x (May 2020).
in src/net.cpp:2218 in b4d1007221 outdated
2214 | @@ -2214,7 +2215,7 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes, 2215 | } 2216 | } 2217 | 2218 | - if (InactivityCheck(*pnode)) pnode->fDisconnect = true; 2219 | + if (InactivityCheck(*pnode, now)) pnode->fDisconnect = true;
vasild commented at 8:38 AM on December 9, 2025:nit, feel free to ignore: given that we allow some nontrivial amount of time to pass between the variable initialization and usage, maybe
nowis not the best name for it. What abouttime_at_start_of_loopor something like that?
ajtowns commented at 8:51 PM on December 9, 2025:I'm not sure I'd say "we allow some nontrivial amount of time to pass" -- it's probably a bug if that were to actually happen?
in src/net.h:1677 in b4d1007221 outdated
1673 | @@ -1666,6 +1674,11 @@ class CConnman 1674 | */ 1675 | bool whitelist_relay; 1676 | 1677 | + /** 1678 | + * flag for whether messages are captured 1679 | + */ 1680 | + bool capture_messages{false};
vasild commented at 10:30 AM on December 9, 2025:naming:
m_capture_messagesin src/net.h:970 in b4d1007221
966 | @@ -978,6 +967,8 @@ class CNode 967 | } 968 | 969 | private: 970 | + friend class CConnman;
vasild commented at 10:33 AM on December 9, 2025:I do not like the second commit a542bf10f0
net: [refactor] move ReceiveMsgBytes to CConnman- it reduces the encapsulation ofCNodeand goes against #27257. Its commit message says:This allows ReceiveMsgBytes() to access CConnman members without needing to copy them to every CNode. This is used in the next commit.
But I don't think the next commit b4d1007221
net: Cache -capturemessages settingdoes any of that, hmm?
ajtowns commented at 8:23 PM on December 9, 2025:Ooof.
in src/net.cpp:2126 in b4d1007221 outdated
2121 | @@ -2123,6 +2122,8 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes, 2122 | { 2123 | AssertLockNotHeld(m_total_bytes_sent_mutex); 2124 | 2125 | + auto now = GetTime<std::chrono::microseconds>(); 2126 | +
vasild commented at 10:40 AM on December 9, 2025:I think the reasoning of the first commit cea443e246
net: Pass time to InactivityChecks fuctionsis well grounded. No need to retrieve the current time for every node, given that we only care about seconds-precision here. I measured on my slow node with a few tens of connections: all nodes are processed in a few milliseconds or less. So, at worst this is outdated that much which is fine IMO.ajtowns force-pushed on Dec 9, 2025dergoegge commented at 6:37 PM on December 9, 2025: memberRe: #34025 (comment)
Just to confirm, this flamegraph is from a node that has finished syncing to the tip? i.e. IBD is not included in this graph, right?
maflcko commented at 7:35 PM on December 9, 2025: memberGetTime takes up 0.82% of total time, as part of InactivityCheck that takes up 1.78% of total time, in b-net.
Interesting. I was wondering why getting the time eats so much CPU. Though, in the happy path,
InactivityCheckis just loading a few atomics, which means getting the time costs as much as loading a few atomics. Also, the flame graph probably shows the CPU time, and not the wall clock time. So the patch here likely won't cut the wall clock time between two calls ofSocketHandlerConnected, but only the CPU time inside a singleSocketHandlerConnectedcall?concept ack, Seems fine to still make the changes here.
ajtowns commented at 8:41 PM on December 9, 2025: contributorJust to confirm, this flamegraph is from a node that has finished syncing to the tip? i.e. IBD is not included in this graph, right?
Yes; it's taken over a longish period though, iirc, I think
either30mor 2h. You can see ProcessNewBlock at 0.20% just before ProcessTransaction at 4.56% fwiw.Also, the flame graph probably shows the CPU time, and not the wall clock time. So the patch here likely won't cut the wall clock time between two calls of
SocketHandlerConnected, but only the CPU time inside a singleSocketHandlerConnectedcall?Yeah, presuming
SocketHandlerConnectedisn't using 100% of a core, it'll be spending its time waiting for theSELECT_TIMEOUT_MILLISECONDStimeout to hit, which is 50ms.net: Cache -capturemessages setting 5f5c1ea019ajtowns force-pushed on Dec 9, 2025vasild approvedvasild commented at 5:37 AM on December 10, 2025: contributorACK 5f5c1ea01955d277581f6c2acbeb982949088960
Would be happy to see the call to the deprecated
GetTime()removed: #34025 (review). I do not see it as a blocker because this PR is actually moving the call around, not planting a new one.DrahtBot requested review from sedited on Dec 10, 2025maflcko commented at 6:59 AM on December 10, 2025: memberreview ACK 5f5c1ea01955d277581f6c2acbeb982949088960 🏣
<details><summary>Show signature</summary>
Signature:
untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}" RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM= trusted comment: review ACK 5f5c1ea01955d277581f6c2acbeb982949088960 🏣 DIf+EMRUrE9g+3ldiGUW0pHeRyThkZk3bbOL6sas10+I4f60l14Vo/S1nCwtrHEqiue3z1X4uTE3S7uN3Tg3DQ==</details>
sedited approvedsedited commented at 11:51 AM on December 10, 2025: contributorACK 5f5c1ea01955d277581f6c2acbeb982949088960
glozow requested review from theuni on Dec 11, 2025mzumsande commented at 10:37 PM on December 11, 2025: contributorACK 5f5c1ea01955d277581f6c2acbeb982949088960
Looks like an improvement, but I wonder if instead of doing micro-optimizations like the ones here, it wouldn't be better to have the entire
InactivityCheck()procedure run somewhere else than in theSocketHandlerConnected, or at least less often - it seems a bit absurd to check every50msfor a timeout of 20 minutes. Even the initial waiting timem_peer_connect_timeoutfromShouldRunInactivityChecks()for a new peer doesn't really be checked that frequently.fanquake merged this on Dec 12, 2025fanquake closed this on Dec 12, 2025stringintech referenced this in commit b7c3ec1af3 on Dec 12, 2025ajtowns commented at 12:13 AM on December 13, 2025: contributorit seems a bit absurd to check every
50msfor a timeout of 20 minutes. Even the initial waiting timem_peer_connect_timeoutfromShouldRunInactivityChecks()for a new peer doesn't really be checked that frequently.I think if you're running a listening node with many real inbound peers, on mainnet where there seems to be maybe 4.5 tx/s, you probably expect to receive socket data from each peer every 2 seconds or so (announcing the ~9 txs that have been received in that time). With 80 (inbound, tx relay) peers, that's an interrupt every 25ms, rather than every 50ms. Worse if you have more peers, better if many of your peers are block-relay only or inactive spy nodes or similar.
theuni commented at 3:09 PM on December 18, 2025: memberPost-merge ACK 5f5c1ea01955d277581f6c2acbeb982949088960. Nits aside, both of these are nice cleanups.
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-26 09:13 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me - use the deprecated