This continues the work of moving application layer data into net_processing, by moving all ping data into the new Peer object added in #19607.
For motivation, see #19398.
Rebased on master. This is now ready for review.
```diff
@@ -4076,31 +4076,15 @@ class CompareInvMempoolOrder
 };
 }
 
-bool PeerManager::SendMessages(CNode* pto)
+void PeerManager::MaybeSendPing(CNode& pto)
```
If you're changing every reference to pto anyway because it's now a reference not a pointer, maybe also call it node_to or similar, since it's not a "p" anymore?
Sure. I'll split the commit into a move and a rename/change to reference to make it easier to review.
Done
```diff
@@ -217,6 +217,9 @@ class PeerManager final : public CValidationInterface, public NetEventsInterface
     /** Send a version message to a peer */
     void PushNodeVersion(CNode& pnode, int64_t nTime);
 
+    /** Send a ping message on a regular schedule or if requested manually */
+    void MaybeSendPing(CNode& pto);
```
Yes, I don't think either needs to gate the other. The above was literally my thought process in getting to that PR -- sipa may not have nerd sniped me with twitter maths, but that doesn't mean I didn't get nerd sniped anyway...
This seems like a more productive form of aj-sniping.
I like this kind of aj-sniping.
```diff
+    // Wait 10 minutes
+    time += 10 * 60;
+    SetMockTime(time);
+
+    // Clear nPingNonceSent so we don't disconnect for a ping timeout
+    dummyNode1.nPingNonceSent = 0;
```
Move the test change to its own commit so it's easy to verify behaviour doesn't change with the move logic to net_processing commit? I don't follow why the test change is necessary/desirable though.
Move the test change to its own commit so it's easy to verify behaviour doesn't change
good idea. Will do.
I don't follow why the test change is necessary/desirable though.
The change is required because previously ping timeouts happened in CConnman, so it was fine to bump the time in this test because there was no CConnman to time out the connection. When the ping data and timeout logic are moved to PeerManager, the connection times out in this test if you bump the time forward by 21 minutes without resetting the ping nonce.
Makes sense; might be a useful commit message for the test change?
Seems kind-of annoying to have to remember to constrain your mocktime bumps by random ping timeout limits. Shouldn't this also (already) cause problems in functional tests?
The functional tests have an asyncio event loop thread that will automatically respond to ping messages:
The unit tests are single threaded, so any logic to handle messages needs to be done in the main test logic.
ok, split this into its own commit.
Seems okay. Though, "jnewbery deleted a comment from sipa 3 days ago" ?
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
"jnewbery deleted a comment from sipa 3 days ago" ?
ha. There were two comments from sheffine saying "why are you emailing me?" and one from sipa saying "sheffine, you're confused", so I deleted them all to reduce noise.
Censorship!!!
(yes, that's indeed what happened)
```diff
@@ -217,6 +217,9 @@ class PeerManager final : public CValidationInterface, public NetEventsInterface
     /** Send a version message to a peer */
     void PushNodeVersion(CNode& pnode, int64_t nTime);
 
+    /** Send a ping message on a regular schedule or if requested manually */
+    void MaybeSendPing(CNode* pto);
```
Noob / practices question: why not just create this with a bool return type, since you change it 3 commits later? I don't have an opinion, but why is it better to do it this way?
Just so that the commits are hygienic. For people reviewing commit by commit, it wouldn't make sense to have a function that returns a bool which is just ignored. Changing the return type in a subsequent commit is as easy as changing void to bool in the declaration and definition.
I was considering not returning a bool here, since it's a bit of a strange interface (the function sets pto.fDisconnect and returns false). Perhaps it should stay as returning void, and have SendMessages() check for fDisconnect after calling this function?
```diff
-    /** When the last ping was sent, or 0 if no ping was ever sent */
-    std::atomic<std::chrono::microseconds> m_ping_start{0us};
-    // Last measured round-trip time.
-    std::atomic<int64_t> nPingUsecTime{0};
-    // Best measured round-trip time.
     std::atomic<int64_t> nMinPingUsecTime{std::numeric_limits<int64_t>::max()};
```
Genuine question, why should best ping time be in CNode but all the others in Peer? We use best ping time for eviction logic which seems like an application-layer thing...
Answered here: #20721 (comment)
```diff
-        ping_wait = GetTime<std::chrono::microseconds>() - m_ping_start.load();
-    }
-
-    // Raw ping time is in microseconds, but show it to user as whole seconds (Bitcoin users should be well used to small numbers with many decimal places by now :)
-    stats.m_ping_usec = nPingUsecTime;
-    stats.m_min_ping_usec = nMinPingUsecTime;
```
Since you removed this, CNodeStats doesn't know min ping time anymore. The p2p_eviction and p2p_ping functional tests fail since getpeerinfo() always returns 0 for minping :(
Adding it back fixed the tests for me:
stats.m_min_ping_usec = m_best_ping_time;
Oh hey, if s/m_min_ping_usec/m_best_ping_time then
X(m_best_ping_time);
(wow so clean)
Good catch! I lost this in a rebase. I've now restored it.
I'm going to leave the m_min_ping_usec name to avoid increasing the diff further.
Can't really comment on the overall goal, as I don't really have a lot of context regarding either locks, mutexes, or peer management, but the linked motivation sounded good to me. The PR is easy to follow and each step makes sense to me. I found the first commit message to be a bit confusing:
No need to create another shared_ptr<Peer> if ProcessMessages() already has one in hand.
I take it to mean something along the lines of "Instead of creating another shared ptr, let's pass one that the callsite already has on hand."
code review ACK https://github.com/bitcoin/bitcoin/pull/20721/commits/6f0309b4743d05c512fc86d2d206f72d512c5c93
Concept ACK, although I have a bit of confusion about what constitutes net vs application layer, namely why best ping is still in CNode. You broke a few functional tests by removing minping from copyStats() which I assume is accidental?
For renaming, I personally think "best" is more informative than "min" but there's some inconsistency in naming now. It's still called 'minping' in getpeerinfo and m_min_ping_usec in CNodeStats.
```diff
-    std::atomic<int64_t> nPingUsecTime{0};
-    // Best measured round-trip time.
-    std::atomic<int64_t> nMinPingUsecTime{std::numeric_limits<int64_t>::max()};
-    // Whether a ping is requested.
-    std::atomic<bool> fPingQueued{false};
+    std::atomic<int64_t> m_best_ping_time{std::numeric_limits<int64_t>::max()};
```
```cpp
std::atomic<int64_t> m_min_ping_time{std::numeric_limits<int64_t>::max()};
```
I think renaming from "min" to "best" needlessly sows confusion or more needed code churn for no gain:
I agree. I've changed it to m_min_ping_time (it's still code churn since it was previously named nMinPingUsecTime, but I agree with you about the consistent naming).
```diff
@@ -237,6 +230,12 @@ static RPCHelpMan getpeerinfo()
         heights.push_back(height);
     }
     obj.pushKV("inflight", heights);
+    if (statestats.m_ping_usec > 0) {
+        obj.pushKV("pingtime", ((double)statestats.m_ping_usec) / 1e6);
+    }
+    if (statestats.m_ping_wait_usec > 0) {
+        obj.pushKV("pingwait", ((double)statestats.m_ping_wait_usec) / 1e6);
```
Can you use named casts for these two `(double)` conversions while touching this code?
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines.html#Res-casts-named
I'd rather not change this while moving it. There's another C-style cast from int64_t to double on line 206. Let's fix them all at the same time.
Concept ACK
A couple of thoughts on preliminary quick skim-through.
Thanks for the reviews, @glozow and @jonatack. I've addressed your comments: https://github.com/bitcoin/bitcoin/compare/6f0309b4743d05c512fc86d2d206f72d512c5c93..d666804b7b2d5006c37b86f5f88e344216ad0862.
rebased
Genuine question, why should best ping time be in CNode but all the others in Peer? We use best ping time for eviction logic which seems like an application-layer thing...
This is a very good question. I haven't specified the difference between connection-layer and application-layer well enough with respect to ping times.
Functionally, ping times are used in two places: timeout logic and eviction logic (they're also presented to the user through the RPC and GUI, but that doesn't affect what follows).
There's timeout logic in the connection layer (in InactivityCheck()), which should detect when there's a problem with the connection, such as:
There's also timeout logic in the application layer, such as here:
where we detect if block download has timed out. Ping timeouts belong in the application layer, since failing to respond to a ping message with a pong message is a P2P protocol violation. However, that ping timeout logic has previously existed in net.cpp because the ping data existed in CNode. In this PR, we move the ping timeout logic to the application layer, where it logically belongs. This logic requires the nPingNonceSent, m_ping_start and fPingQueued data to be in the net_processing layer.
(You may be asking the question: "isn't the connection layer timeout logic somewhat redundant with the ping timeout logic?" and you'd be right - if pings are not timing out, then we must be sending/receiving data from the peer, so the connection cannot time out. However, a bit of redundancy here is fine - the net layer is responsible for maintaining the data connection, and the net_processing layer is responsible for maintaining the p2p protocol peering.)
This exists entirely in the connection layer, in CConnman::AttemptToEvictConnection(). It requires data from both the connection layer (e.g. connection time, connection type, whether the connection is local, etc) and from the application layer (e.g. last block received time, last tx received time, etc). For that to work, the application layer must inform the connection layer of events, such as a valid tx received, a valid block received, and (now) a valid pong received, and the connection layer must store that data (as nLastBlockTime, nLastTXTime and m_min_ping_time) for use in the eviction logic. We could group those members in a structure called EvictionCriteria or similar to make it clearer what they're used for. For now, I've just expanded the comment for m_min_ping_time.
Writing this has made me realize that nPingUsecTime should probably stay in CNode, since it doesn't need to be used for any application logic. I've done that in the latest force-push.
Ping timeouts are being triggered on the feature_bip68_sequence.py test when run on cirrus. That test uses lots of mocktime jumps, so it's not surprising. I'll see if there's a way to make sure that pings don't time out when using mocktime.
This exists entirely in the connection layer, in
CConnman::AttemptToEvictConnection(). It requires data from both the connection layer (e.g. connection time, connection type, whether the connection is local, etc)
Not convinced that makes sense -- "connection type" and "how many local connections to evict" both seem like application logic to me. I was certainly expecting m_conn_type to eventually make its way into net_processing, given it deals with message types (block/tx relay/addrfetch/feeler) and trust issues (inbound/outbound/manual) which are all application level concepts.
```diff
@@ -1143,6 +1144,10 @@ bool AppInitParameterInteraction(const ArgsManager& args)
         return InitError(Untranslated("peertimeout cannot be configured with a negative value."));
     }
 
+    if (args.GetArg("-pingtimeout", TIMEOUT_INTERVAL) <= 0) {
```
nit: docs specify minimum is 1, but it seems here min is 0?
This error condition is hit if pingtimeout is set to 0.
I don't have a good understanding of how this all works yet to give an ack - but I did go through commit by commit and everything made sense and looked good to me (take that w/ a grain of salt). just noticed one nit
Pushed a fix for the intermittent ping timeout in wallet_resendwallettransactions.py
Rebased on master. All tests are now passing for me locally. Marking this as ready for review.
rebased
```diff
@@ -1144,6 +1145,10 @@ bool AppInitParameterInteraction(const ArgsManager& args)
         return InitError(Untranslated("peertimeout cannot be configured with a negative value."));
     }
 
+    if (args.GetArg("-pingtimeout", TIMEOUT_INTERVAL) <= 0) {
```
Are cases where the response is malformed (e.g. user inputs a string instead of a number) handled elsewhere?
Argument handling is in ArgsManager::GetArg(). Since the second argument here is an int, the GetArg(string, int) version is called:
which will try to convert the string into an int.
This might result in a slightly inaccurate error message, but I think it's fine. It's the same behaviour as for -peertimeout:
```
→ ./src/bitcoind --pingtimeout=abc
Error: pingtimeout cannot be configured with a negative value.
```
```diff
+    // Merely reporting pingtime might fool the caller into thinking the node was still responsive,
+    // since pingtime does not update until the ping is complete, which might take a while.
+    // So, if a ping is taking an unusually long time in flight,
+    // the caller can immediately detect that this is happening.
+    std::chrono::microseconds ping_wait{0};
+    if ((0 != peer->m_ping_nonce_sent) && (0 != peer->m_ping_start.load().count())) {
```
Are the inner sets of parenthesis required or is this a code-style thing?
No, these aren't required. In C++ && has a lower precedence than != so the parens aren't changing the order of operations (https://en.cppreference.com/w/cpp/language/operator_precedence). They're not doing any harm either.
This code is moved unaltered in this PR. I think it's fine to keep it as it is.
```diff
@@ -83,6 +83,9 @@ BOOST_AUTO_TEST_CASE(outbound_slow_chain_eviction)
     auto peerLogic = PeerManager::make(chainparams, *connman, nullptr, *m_node.scheduler,
                                        *m_node.chainman, *m_node.mempool, false);
 
+    // Don't let pings time out
```
```cpp
// Don't let pings time out due to mocktime
```
Sounds good. Added!
```diff
@@ -33,6 +33,7 @@ def set_test_params(self):
         [
             "-acceptnonstdtxn=1",
             "-peertimeout=9999",  # bump because mocktime might cause a disconnect otherwise
+            "-pingtimeout=9999",  # bump because mocktime might cause a disconnect otherwise
```
(Rookie question) Would the previous commit (9ad48f3965a32a6eba67ba49da19499fbe9b89e3) change anything with respect to mocktime unless -pingtimeout is used? If not, why did some tests start to fail due to ping timeouts?
Previously, when ping timeouts were handled in CConnman::InactivityCheck(), the whole logic was gated on the following guard:
```cpp
if (now <= node.nTimeConnected + m_peer_connect_timeout) {
    // Only run inactivity checks if the peer has been connected longer
    // than m_peer_connect_timeout.
    return false;
}
```
Because tests generally run for less than m_peer_connect_timeout, the ping timeout logic was not hit at all. Once the logic moves to net_processing, that guard no longer exists, so we start testing for ping timeouts as soon as the peer is connected.
```diff
@@ -315,7 +315,8 @@ class PeerManagerImpl final : public PeerManager
     /** Send a version message to a peer */
     void PushNodeVersion(CNode& pnode, int64_t nTime);
 
-    /** Send a ping message on a regular schedule or if requested manually */
+    /** Send a ping message on a regular schedule or if requested manually. May
+     * disconnect the peer if a ping has timed out. */
     void MaybeSendPing(CNode& node_to);
```
```cpp
/** Send a ping message every PING_INTERVAL or if requested via rpc. Disconnect
 * the peer if a ping has timed out. Return true upon disconnection. */
[[nodiscard]] bool ManagePing(CNode& node_to);
```
I've changed the signature to return void, so the nodiscard tag is no longer required (good suggestion though!)
I've taken the suggested comment changes.
I see around ~20 functional tests that use setmocktime... but you only added -pingtimeout=9999 to two of them. Are you not worried about ping timeouts due to mocktime in any of the other tests?
concept ACK, tests pass locally, scripted diff verified in 4180381f683fad8755b30a3a8e65edc26c1ce5e2
Makes sense to me, moves ping properties from CNode in net to Peer in net_processing.
Testing also by running `src/bitcoind -pingtimeout=1 -signet` (didn't want to disrupt mainnet peers) and got some expected log messages:

```
2021-02-03T15:34:23Z ping timeout: 1.018515s peer=5
```
```diff
@@ -1144,6 +1145,10 @@ bool AppInitParameterInteraction(const ArgsManager& args)
         return InitError(Untranslated("peertimeout cannot be configured with a negative value."));
     }
 
+    if (args.GetArg("-pingtimeout", TIMEOUT_INTERVAL) <= 0) {
+        return InitError(Untranslated("pingtimeout cannot be configured with a negative value."));
```
Nit: Is it a bit pedantic to call it a nonpositive value here to include zero?
Although I see it's consistent with -peertimeout error so that's probably why :)
I've changed to option to be a binary true/false, so this check is no longer required.
I see around ~20 functional tests that use setmocktime... but you only added -pingtimeout=9999 to two of them. Are you not worried about ping timeouts due to mocktime in any of the other tests?

@pinheadmz I imagine it's only necessary where setmocktime does an increase of more than the default ping timeout.
Dug into this a little bit looking for setmocktime calls with big jumps - @jnewbery do you think p2p_addr_relay.py, p2p_addrv2_relay.py, and/or p2p_getaddr_caching.py need a -pingtimeout?
Also a question that may or may not be relevant - how do the inactivity timers handle when setmocktime goes backwards in time?
I've updated the -pingtimeout option to be a binary enable/disable. We don't need fine control over it in the tests, and it was essentially being used to disable ping timeouts entirely.
Dug into this a little bit looking for setmocktime calls with big jumps - @jnewbery do you think p2p_addr_relay.py, p2p_addrv2_relay.py, and/or p2p_getaddr_caching.py need a -pingtimeout?
Very nice. Thanks @glozow. I've updated those tests to disable ping timeouts.
Also a question that may or may not be relevant - how do the inactivity timers handle when setmocktime goes backwards in time?
That's not a problem for the ping timeout. The ping timeout condition is now `now > peer.m_ping_start.load() + std::chrono::seconds{TIMEOUT_INTERVAL}`, i.e. "(the time now) is after (when the ping was sent) plus (the timeout interval)". If time goes backwards, then (the time now) is decreased and we won't hit this condition.
Thanks everyone for the reviews. All comments have now been addressed.
```diff
@@ -48,6 +48,7 @@ class AddrTest(BitcoinTestFramework):
     def set_test_params(self):
         self.setup_clean_chain = False
         self.num_nodes = 1
+        self.extra_args = [["-nopingtimeout"]]  # mocktime might cause a disconnect otherwise
```
So would -pingtimeout mean timeout=true? And -nopingtimeout is the same thing as -pingtimeout=0? I couldn't find where the implicit "no" logic was :(
I think it's done in InterpretOption() invoked from ArgsManager::ParseParameters() which is invoked in AppInit().
```diff
@@ -713,6 +707,12 @@ class CNode
 
     std::string ConnectionTypeAsString() const { return ::ConnectionTypeAsString(m_conn_type); }
 
+    //* We have completed a successful ping-pong keepalive. Update latest and best ping times. */
```
I appreciate the added documentation 🙏
nit
```cpp
/** We have completed a successful ping-pong keepalive. Update latest and best ping times. */
```
:facepalm: I've fixed the doxygen style and also tweaked the comment wording a bit.
Wooo 🙌 the boolean option makes much more sense to me - in the relevant tests, we more want to ignore ping timeouts than extend them. I feel like the interface would be more intuitive if it was just called -disablepingtimeout since it's an optional flag, but perhaps just personal preference?
I checked the functional tests by checking the input time against m_ping_wait_usec from GetNodeStateStats in the setmocktime RPC, and then running the functional tests that use setmocktime. Seems like feature_bip68_sequence.py is actually the only one where it tripped... but my suggestions still make sense to me, and I remember lightlike mentioning wallet_resendwallettransactions.py failing, so maybe it was a race with no pings in flight?
Thanks for the reviews!
I feel like the interface would be more intuitive if it was just called -disablepingtimeout since it's an optional flag, but perhaps just personal preference?
I think it's best to avoid negative configuration names. -disablepingtimeout=0 would mean "do not disable ping timeout", which is a double negative and slightly harder to parse than "enable ping timeout". See other options like -listen, -persistmempool, etc, where the default is true, and -nolisten, -nopersistmempool is used to disable.
my suggestions still make sense to me and I remember lightlike mentioning wallet_resendwallettransactions.py failing so maybe it was a race, no pings in flight?
Right, these are all races, so they won't fail consistently. I think it's just safer to disable pings timeouts on any test which makes large mocktime leaps.
rebased
ACK https://github.com/bitcoin/bitcoin/pull/20721/commits/bbd970a2123aec5071dd3dd901e478844c6e0fea 🍿
The negative config option stuff makes more sense to me now, thanks for explaining!
Light Code-Review ACK bbd970a2123aec5071dd3dd901e478844c6e0fea
This may have been answered before but could someone tell me the exact sequence of events in the test framework involving setmocktime that would make -nopingtimeout necessary? My naive reading is that send_and_ping blocks until it receives the pong and is therefore not going to timeout if the next call is a setmocktime.
I'll also do a deeper read into [net processing] Move ping timeout logic to net processing later this week to make sure that this move is indeed safe and doesn't cause us to miss any opportunities to disconnect a peer.
```diff
-            if (ping_time.count() >= 0) {
-                // Successful ping time measurement, replace previous
-                pfrom.nPingUsecTime = count_microseconds(ping_time);
-                pfrom.nMinPingUsecTime = std::min(pfrom.nMinPingUsecTime.load(), count_microseconds(ping_time));
+            const auto ping_time = ping_end - peer->m_ping_start.load();
+            if (ping_time.count() > 0) {
```
just noting that this changed from >= to >. makes sense to call it a "timing mishap" if the pong was received in the same microsecond as we recorded for the ping going out, so seems fine. just highlighting the change since it's easy to miss.
Oops, this was unintentional. You're right that it shouldn't have any impact, but I'm going to revert it to how it was. Good catch!
```diff
+    node_to.fPingQueued = false;
+    node_to.m_ping_start = GetTime<std::chrono::microseconds>();
+    if (node_to.GetCommonVersion() > BIP0031_VERSION) {
+        node_to.nPingNonceSent = nonce;
+        m_connman.PushMessage(&node_to, msgMaker.Make(NetMsgType::PING, nonce));
     } else {
```
tangential because you didn't change this logic, but what's the point of sending a ping message to peers who are too old to support / respond with a pong?
pong messages were introduced in 2012 with p2p version 60000. I imagine that prior to that, the reason for a ping message was simply a keepalive (make sure that TCP connections stay up, NAT pinholes stay open, etc). This logic maintains backwards compatibility with clients older than 60000.
In practice, I expect there are almost no clients older than 60000, but if there are any it'd be a shame to arbitrarily disconnect them for failing ping timeouts.
```diff
@@ -716,6 +710,12 @@ class CNode
 
     std::string ConnectionTypeAsString() const { return ::ConnectionTypeAsString(m_conn_type); }
 
+    /** A ping-pong keepalive has completed successfully. Update latest and minimum ping times. */
+    void PingReceived(std::chrono::microseconds ping_time) {
```
isn't this more of a PongReceived?
Yes. Much better. Fixed
```diff
+    MaybeSendPing(*pto, *peer);
+
+    // MaybeSendPing may have marked peer for disconnection
+    if (pto->fDisconnect) return true;
+
+    // If we get here, the outgoing message serialization version is set and can't change.
```
this comment previously followed the fSuccessfullyConnected check, which I think(?) was the intent -> fSuccessfullyConnected is set when processing the VERACK message, and m_greatest_common_version gets set when processing the VERSION message.
but now this comment is below MaybeSendPing, which also might retrieve m_greatest_common_version to construct the ping message. I find the intent of this comment to be less clear after these changes.
This is a comment on the line below, which is constructing a CNetMsgMaker object with the correct serialization version:

```cpp
const CNetMsgMaker msgMaker(pto->GetCommonVersion());
```
I've moved this back to immediately after the fSuccessfullyConnected check. Hopefully that's clearer.
```diff
+{
+    // We must call MaybeDiscourageAndDisconnect first, to ensure that we'll
+    // disconnect misbehaving peers even before the version handshake is complete.
+    if (MaybeDiscourageAndDisconnect(*pto)) return true;
+
+    // Don't send anything until the version handshake is complete
```
I know you only moved this code, but I think we could handle fDisconnect better.
This comment seems misleading since I think it only refers to fSuccessfullyConnected. Then, we might set fDisconnect to true in MaybeSendPing, so we check this bool again a couple of lines down to also return early.
A simpler logic flow might be removing this check here, starting MaybeSendPing with `if (node_to.fDisconnect) return;`, and then removing this first early return from SendMessages. And the comment on line 4377 could be updated to something along the lines of "Don't send messages to peers that are marked for disconnection".
That seems like a reasonable change, but I'm not sure it needs to happen in this PR. Very happy to review a PR that moves this around if you think it'd make it clearer.
```diff
@@ -315,6 +323,10 @@ class PeerManagerImpl final : public PeerManager
     /** Send a version message to a peer */
     void PushNodeVersion(CNode& pnode, int64_t nTime);
 
+    /** Send a ping message every PING_INTERVAL or if requested via RPC. May
+     * disconnect the peer if a ping has timed out. */
```
```cpp
/** Send a ping message every PING_INTERVAL or if requested via RPC. May
 * mark the peer to be disconnected if a ping has timed out. */
```
Done!
```diff
@@ -25,11 +25,14 @@ static const bool DEFAULT_PEERBLOOMFILTERS = false;
 static const bool DEFAULT_PEERBLOCKFILTERS = false;
 /** Threshold for marking a node to be discouraged, e.g. disconnected and added to the discouragement filter. */
 static const int DISCOURAGEMENT_THRESHOLD{100};
+/** -pingtimeout default */
+static const int64_t DEFAULT_PING_TIMEOUT = true;
```
another comment on code that you simply moved-
I found this definition odd. Why is it defined as an int but set as a boolean? Also, why is it an int64_t, that seems like more space needed than for a simple bool value?
also note that this could be constexpr, but I don't know if that actually impacts anything. I tried it out in godbolt explorer and looks like for such simple logic, it compiled down to the same assembly regardless 🤷🏽♀️
No, I added this code. It's an int because the config option originally set the length of the timeout. I later changed the option to a binary on/off (https://github.com/bitcoin/bitcoin/pull/20721#issuecomment-773227475), but failed to change the type to bool. Will fix.
code review ACK bbd970a212
carefully read through the diff and confirmed nothing substantial changed / thought about potential divergences. seems to me that the main functional difference is that marking a peer for disconnect based on ping time has changed from the socket handler thread to the message handler thread (and then the socket handler thread still executes the disconnection). I imagine this could sometimes take slightly longer to identify the peers for disconnect- eg. ThreadMessageHandler has a 100ms wait if !fMoreWork, or generally the message handler thread seems like it has more capacity for doing stuff. However, I can't see how this sort of small difference could cause any problems.
I left several comments, but they are all focused on style or documentation, and the majority of them are about code that you moved without changing. None of them are worth holding up this PR from merge or invalidating ACKs, just food for thought if you find any worthwhile for follow-ups, or have to push for a different reason.
Thanks for the review @amitiuttarwar. I've resolved all of your review comments.
This may have been answered before but could someone tell me the exact sequence of events in the test framework involving setmocktime that would make -nopingtimeout necessary? My naive reading is that send_and_ping blocks until it receives the pong and is therefore not going to timeout if the next call is a setmocktime.
The send_and_ping method involves the python test framework sending a ping message to the bitcoind node and waiting for a pong. The timeout that could cause a problem is the ping message in the other direction. Here's how it could fail:
1. bitcoind sends a ping message to the test framework at time 0
2. the test framework calls setmocktime to push the node's mocktime some large number of seconds into the future
3. before the pong response has been received/processed, SendMessages() is called by the message handler thread for that peer, and the test in MaybeSendPing() fails: now > peer.m_ping_start.load() + std::chrono::seconds{TIMEOUT_INTERVAL}

bitcoind periodically sends pings to all its peers, and is continually running the message handler thread loop, so this window condition will cause the test to fail intermittently.
reACK af23d37cab, all that changed was in response to my feedback (thanks!)
```diff
+    // This means that setmocktime may cause pings to time out.
+    auto now = GetTime<std::chrono::microseconds>();
+
+    if (gArgs.GetArg("-pingtimeout", DEFAULT_PING_TIMEOUT) &&
+        peer.m_ping_nonce_sent &&
+        now > peer.m_ping_start.load() + std::chrono::seconds{TIMEOUT_INTERVAL}) {
```
Recommend if (x < y) rather than if (y > x) generally -- smaller numbers generally go to the left of larger numbers, so keeping that convention is one less thing to have to think about. (This patch changes the tests from timeout < GetTime() to now > timeout)
I find that "now < timeout" or "now > timeout" is much more natural to parse and understand - the thing that we're measuring (the time now) is on the left, and the thing that it's being measured against (the timeout) is on the right. That's how most people speak ("Is it after 10 o'clock?"), and the other way is a bit yoda-esque ("was 10 o'clock before now?"). Ultimately, I think this is a matter of taste.
If you write "if (10 o'clock < now)" you've literally put now after 10 o'clock and are asking if that's a true statement. I do get what you're saying; I just find the always less-than strategy lets you read the ordering much more directly in general -- which one comes first is in exactly the order you're reading, and not dictated by a single bit of punctuation. It's also helpful for bounds checking (low_value < x && x < high_value) and (x < low_value || high_value < x) read much clearer than (x > low_value && x < high_value) and (x < low_value || x > high_value).
+    auto now = GetTime<std::chrono::microseconds>();
+
+    if (gArgs.GetArg("-pingtimeout", DEFAULT_PING_TIMEOUT) &&
+        node_to.nPingNonceSent &&
+        now > node_to.m_ping_start.load() + std::chrono::seconds{TIMEOUT_INTERVAL}) {
+        LogPrint(BCLog::NET, "ping timeout: %fs peer=%d\n", 0.000001 * count_microseconds(now - node_to.m_ping_start.load()), node_to.GetId());
Would probably be better to set m_pingtimeout = gArgs.GetArg() in the constructor, rather than re-query it every loop.
I think losing the check that GetSystemTimeInSeconds() <= node_to.nTimeConnected + peer_connect_timeout (which is done in InactivityCheck but not MaybeSendPing) is a mistake -- the docs for -peertimeout say "Specify a p2p connection timeout delay in seconds. After connecting to a peer, wait this amount of time before considering disconnection based on inactivity" which seems like it should apply here: not sending a ping reply certainly seems like a form of inactivity to me! (Setting m_peer_connect_timeout from gArgs in the PeerManagerImpl constructor seems like it would work here too, without violating layering)
Having -peertimeout continue to apply to (a lack of) PONG messages seems to make -pingtimeout unnecessary; the tests don't need to be modified anymore, and I'm not seeing any reasons other than initial mocktime confusion as to why you'd want to prevent disconnections?
I've removed pingtimeout as requested.
@@ -32,7 +32,7 @@ def set_test_params(self):
         self.extra_args = [
             [
                 "-acceptnonstdtxn=1",
-                "-peertimeout=9999",  # bump because mocktime might cause a disconnect otherwise
I think it might make more sense in general to set -peertimeout=9999 for all the test cases (ie, in the framework generated bitcoin.conf) -- or at least all the test cases that use mocktime -- and override it anywhere that's explicitly testing disconnection due to timeouts (p2p_ping.py).
I think it might make more sense in general to set -peertimeout=9999 for all the test cases (ie, in the framework generated bitcoin.conf)
Changing the behaviour of all tests is outside the scope of this PR, but I'm happy to review any proposal you have for making the tests more robust.
@@ -4299,7 +4312,7 @@ void PeerManagerImpl::MaybeSendPing(CNode& node_to)
         pingSend = true;
     }

-    if (node_to.nPingNonceSent == 0 && node_to.m_ping_start.load() + PING_INTERVAL < GetTime<std::chrono::microseconds>()) {
+    if (node_to.nPingNonceSent == 0 && now > node_to.m_ping_start.load() + PING_INTERVAL) {
If I'm following the logic here right, then if we backdate mocktime after sending a PING, we'll never send a PING to that node again. I'm not seeing an obvious problem with that (and don't think it's a change in this PR), but it's somewhat surprising.
That seems right to me. As you say, that's not changed by this PR.
@@ -4291,6 +4292,18 @@ class CompareInvMempoolOrder

 void PeerManagerImpl::MaybeSendPing(CNode& node_to)
 {
+    // Use mockable time for ping timeouts.
If you're doing disconnects here as well, doesn't that make it MaybeSendPingOrDisconnectDueToLackOfPong() ? Not sure the Maybe is useful -- SendMessages doesn't have a Maybe despite it not necessarily doing something every time it's called.
This moves ping disconnections from ThreadSocketHandler (which calls SocketHandler which calls InactivityCheck, I think on a 50ms cycle) to ThreadMessageHandler (which calls SendMessages which calls MaybeSendPing, I think on a 100ms cycle, though will reinvoke SendMessages more frequently if any node's ProcessMessages indicates there's more work). I think that's fine.
make it MaybeSendPingOrDisconnectDueToLackOfPong()
I don't think the function name needs to document all the functionality. It's documented in the function comment:
/** Send a ping message every PING_INTERVAL or if requested via RPC. May
* mark the peer to be disconnected if a ping has timed out. */
Not sure the Maybe is useful -- SendMessages doesn't have a Maybe despite it not necessarily doing something every time it's called.
Right, and MaybeDiscourageAndDisconnect() does have maybe in it.
I don't really care what this function is called. If you feel strongly, I can change it.
"MaybeSendPing" just kinda bugs me as a name. I'd suggest "HandleBIP31PingPong()" or something, but that would bug me a bit too. Maybe some day it'll be possible to just register a PingPongMessages class that deals with both the send and receive side :pray:
@@ -77,13 +77,12 @@ static RPCHelpMan ping()
         [&](const RPCHelpMan& self, const JSONRPCRequest& request) -> UniValue
 {
     NodeContext& node = EnsureNodeContext(request.context);
-    if(!node.connman)
+    if(!node.peerman) {
Space after "if" if you're changing punctuation?
Done
+        if (nonce == peer->m_ping_nonce_sent) {
             // Matching pong received, this ping is no longer outstanding
             bPingFinished = true;
-            const auto ping_time = ping_end - pfrom.m_ping_start.load();
+            const auto ping_time = ping_end - peer->m_ping_start.load();
             if (ping_time.count() >= 0) {
I think ping_time.count() == 0 is expected if you do setmocktime X, then a ping goes out, and the pong is received before you call setmocktime Y.
Leaving this so as not to change behaviour
(This was meant to be a reply to #20721 (review) except github wasn't letting me comment there)
I think I'm a NACK on the -pingtimeout changes as they stand -- at a minimum it should be clear from the PR description and commit messages why those changes are needed, but there's currently no motivation for them in those places.
I think duplicating the -peertimeout check into net_processing is enough to obsolete the need to change the tests (if it's not, I still don't understand the change well enough to ack it), and that seems like a much more obvious refactor to me, and also matches the help text for the option better. (That said, I think adding -peertimeout=9999 to those tests in case they end up taking longer than 60s to run would be justifiable, but that's true with or without these changes)
I believe we already have a bug in InactivityCheck when setting mocktime into the past, in that nLastRecv is set via GetTime<microseconds> in ReceiveMsgBytes but then compared against GetSystemTime in InactivityCheck leading to errors once -peertimeout has passed.
I think I'm a NACK on the -pingtimeout changes as they stand -- at a minimum it should be clear from the PR description and commit messages why those changes are needed, but there's currently no motivation for them in those places.
I think duplicating the -peertimeout check into net_processing is enough to obsolete the need to change the tests
Ok, done. I've removed -pingtimeout, and duplicated the -peertimeout setting and logic into MaybeSendPing() as requested.
Moving the application-layer data and logic is an opportunity to clean up and rationalize the weird edge cases and strange behavior. As you've noted from your observation about the InactivityCheck bug, and as sipa pointed out in #20027 (review), mixing mockable time and system time is too complicated and a source of bugs and confusion. Moving ping timeouts to the application layer means that we can lose one of those interactions between system time and mockable time - system time timeouts on the socket stay in net, and mockable timeouts live in the application layer. This is really only applicable for tests, and I don't think there's much downside to adding a hidden configuration option where it's needed.
However, I don't think this is important enough to hold up the rest of #19398. I've restored the behaviour to exactly how it was, so no changes are required to tests (except the modified PeerManager ctor signature). We can perhaps rationalize mocktime/system time for pings at some later date.
-        pto->m_ping_start = GetTime<std::chrono::microseconds>();
-        if (pto->GetCommonVersion() > BIP0031_VERSION) {
-            pto->nPingNonceSent = nonce;
-            m_connman.PushMessage(pto, msgMaker.Make(NetMsgType::PING, nonce));
+        peer.m_ping_queued = false;
+        peer.m_ping_start = GetTime<std::chrono::microseconds>();
think this could just be..
peer.m_ping_start = now;
Done!
@@ -335,6 +347,10 @@ class PeerManagerImpl final : public PeerManager
      * on extra block-relay-only peers. */
     bool m_initial_sync_finished{false};

+    /** After connecting to a peer, wait this amount of time before considering
+     * disconnection based on inactivity */
+    const int64_t m_peer_connect_timeout;
I'd prefer if this was a chrono duration, but I understand the context of keeping it consistent with net.
I've now removed this again. Sorry :grimacing:
RE -pingtimeout:
I understand how it could be surprising that -peertimeout would no longer apply to ping logic, especially since InactivityChecks would still bypass the version handshake timeout. However, I find it unfortunate to have the system time comparisons in MaybeSendPing. Having the application level logic be mockable makes sense to me, and seems in line with the discussion in #20027 (review). Maybe in the longer term, we could have a debug arg that handles the application level timeouts of ping & verack handshake, and there could be a param interaction that sets the new arg based on -peertimeout.
But that all seems out of the scope of this PR. I've reviewed the changes and the code looks correct as far as I can tell.
ACK 31cc466eb0
@@ -4297,6 +4298,20 @@ class CompareInvMempoolOrder

 void PeerManagerImpl::MaybeSendPing(CNode& node_to)
 {
+    // Use mockable time for ping timeouts.
+    // This means that setmocktime may cause pings to time out.
+    auto now = GetTime<std::chrono::microseconds>();
+
+    if (GetSystemTimeInSeconds() > node_to.nTimeConnected + m_peer_connect_timeout &&
Adding:
/** Return true if inactivity checks should be ignored */
inline bool IgnoreInactivityChecks(const CNode& node) const
{
return GetSystemTimeInSeconds() <= node.nTimeConnected + m_peer_connect_timeout;
}
to CConnman would let you avoid adding m_peer_connect_timeout in PeerManagerImpl and keep the non-mockable times in net.cpp. (You have access to connman anyway via PeerManagerImpl::m_connman and access to node in order to set the disconnect flag)
Done
+    // If we get here, the outgoing message serialization version is set and can't change.
+    const CNetMsgMaker msgMaker(pto->GetCommonVersion());
+
+    MaybeSendPing(*pto);
+
+    PeerRef peer = GetPeerRef(pto->GetId());
Should there be a if (peer == nullptr) return true; here? We check that everywhere else.
Rather than move this line, might be better to continue doing it first thing and pass the result into MaybeDiscourageAndDisconnect rather than having that function look it up too. (At least, that's where I'm expecting to end up with #20758 eventually)
Done
Looks good, couple of suggestions I think are worth taking.
FYI, I think the scripted diff only touches net_processing.cpp, so doesn't need the .* there.
@@ -3969,43 +3999,39 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
     return;
 }

-bool PeerManagerImpl::MaybeDiscourageAndDisconnect(CNode& pnode)
+bool PeerManagerImpl::MaybeDiscourageAndDisconnect(CNode& pnode, Peer& peer)
Question about 6a03a1f7e4: Is the idea that, in the future, MaybeDiscourageAndDisconnect() will only need to accept a Peer? And right now, even though you could grab the PeerRef from the m_peer_map in here again, passing in a Peer is a bit less overhead?
MaybeDiscourageAndDisconnect should continue needing a CNode in order to signal the disconnection, I think.
MaybeDiscourageAndDisconnect should continue needing a CNode in order to signal the disconnection, I think.
This is true. It's also currently needed for the connection type, which I don't think belongs in net_processing.
Even though you could grab the PeerRef from the m_peer_map in here again, passing in a Peer is a bit less overhead?
We want to grab a PeerRef at the entry functions (ProcessMessages and SendMessages for the message handler thread), and then pass that reference down through the call stack. Grabbing PeerRef incurs overhead because m_peer_mutex needs to be locked and the shared ptr's refcount needs to be incremented/decremented, so we shouldn't do it unnecessarily.
@@ -1221,18 +1221,17 @@ void CConnman::NotifyNumConnectionsChanged()
     }
 }

+bool CConnman::RunInactivityChecks(const CNode& node) const
+{
+    return GetSystemTimeInSeconds() > node.nTimeConnected + m_peer_connect_timeout;
4561423100c1b6a8e778525a027d021b918a6312
Definitely like peerman asking connman if it should RunInactivityChecks() instead of taking a peertimeout param 👌
Strictly speaking, I think it's possible for the time to be different from RunInactivityChecks to InactivityChecks since they each grab the time separately (as opposed to before, where they used the same now)? I don't think it'd be a huge issue since it's in seconds and the system time would only move forward though
~Currently system time isn't guaranteed to move forwards; and I believe we're now doing these checks in different threads so I don't think it's reasonable to try to use the same exact point in time for the two comparisons. We should be doing both checks 10-20 times per second for each peer anyway, I believe, so any lack of precision shouldn't matter.~
Could make this
bool CConnman::RunInactivityChecks(const CNode& node, const uint64_t* pnow = nullptr)
{
    const uint64_t now = pnow ? *pnow : GetSystemTimeInSeconds();
    return now > node.nTimeConnected + m_peer_connect_timeout;
}
and call it from within InactivityCheck() as if (!RunInactivityChecks(node, &now)) return false; to avoid unnecessarily getting the system time twice in quick succession. Might also make RunInactivityChecks easier to test since you could pass in different times and check the answer, without having to worry that GetSystemTime isn't mockable.
I'll take aj's suggestion here: #20721 (review) in a follow-up
Main changes since last review:
Using -peertimeout on both sets of timeout checks indeed looks cleaner than before. It was easier to review for me - I did my best to verify that the decision on whether or not to run ping timeout checks is identical, which makes me feel more confident than when I was trying to find which tests might timeout. 🧠 And then tried to verify that we're still timing out pings when we're supposed to through the MaybeSendPing route
code review ACK a3ec04b7a6
@@ -3969,43 +3970,39 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
     return;
 }

-bool PeerManagerImpl::MaybeDiscourageAndDisconnect(CNode& pnode)
+bool PeerManagerImpl::MaybeDiscourageAndDisconnect(CNode& pnode, Peer& peer)
 {
-    const NodeId peer_id{pnode.GetId()};
const NodeId peer_id{peer.m_id}; would've changed fewer lines :)
Indeed it would, but minimizing lines changed isn't the right metric to judge a commit. Instead we should aim to maximize the clarity and expressiveness of the code after the change. If I was writing this from scratch, I wouldn't create a local copy of the peer id just to use in log lines.
Hmm, actually I think I disagree with that (well, at least after exaggerating it into a bigger principle) -- review is a much more important bottleneck than just about anything else, so optimising the code to be easier to review is much more helpful than optimising it to make perfect sense if we were going to erase our git history and anonymously post the current tree to a mailing list. I think just about any time there's a choice between correct code that's easier to review and correct code that's better according to some other aesthetic, easier to review should be the winner. At least until review is no longer a serious bottleneck for the project.
Obviously, peer_id to peer.m_id is trivial, but if we're talking what we should aim to maximise, I think ease of review is the single right answer. I don't think you'll often find that being far off from optimising for clarity, though. (I don't think "expressiveness" applies either way tbh)
This discussion is a bit abstract, but I'm going to stand by my assertion. We should always be aiming to improve the quality of the code by standardizing to a clear and consistent style and by taking advantage of higher-level abstractions. One example: if I find a hand written loop that I need to modify, I may well replace it with an stl algorithm. That's undoubtedly a bit more effort to review, but results in higher-quality code. If we always aim to minimize deltas in code changes, then we'll only ever converge to local maxima of quality.
The point of review being a bottleneck is that we won't ever progress to a global maximum of quality -- if we could do that, by definition we wouldn't be bottlenecked. Every merge is a reflection of that review bottleneck: did we spend our limited review cycles on working out if a while loop does the same thing as the for loop it replaces, or did we spend it on working out if muhash or minisketch does what it claims on the box? If every PR spends 5% of review time checking that shuffling code around is correct, then that's 1-in-20 fewer PRs we can make that affect people that don't see the code. Or it's an incentive to get in the habit of merging code that's not thoroughly reviewed in order to avoid delaying things because we've run out of review resources, which is worse.
That doesn't mean we shouldn't always be aiming to improve the quality of the code; just that we should be spending the resources we have on the changes that are most valuable, even if that means leaving some good things for (possibly much) later.
(Personally, I find most of the STL algorithms worse than the loops they replace -- range based for loops in c++ are pretty pleasant and behave pretty sensibly; for me, dealing explicitly with iterators and lambdas generally isn't pleasant -- they're harder to understand and easier to make mistakes with, for at best a minimal performance benefit, and for me at least any safety benefits are offset by the lack of clarity. There are more important battles to fight, and I'm hoping that perhaps familiarity might eventually breed affection, but for now they certainly don't seem like moving toward even a local maximum to me)
Thanks AJ, these are good points.
You're wrong about stl algorithms though :grin:
@@ -1024,6 +1024,9 @@ class CConnman

     void SetAsmap(std::vector<bool> asmap) { addrman.m_asmap = std::move(asmap); }

+    /** Return true if the peer has been connected for long enough to do inactivity checks. */
if the peer should be connected for inactivity ? -- we could introduce a net permission to prevent that rather than just have a standard timeout for all peers, eg
connected for inactivity?? I don't follow.
are y'all talking about the comment being unclear or ?
@@ -1537,7 +1536,7 @@ void CConnman::SocketHandler()
             if (bytes_sent) RecordBytesSent(bytes_sent);
         }

-        if (InactivityCheck(*pnode)) pnode->fDisconnect = true;
+        if (RunInactivityChecks(*pnode) && InactivityCheck(*pnode)) pnode->fDisconnect = true;
I'm surprised this is pulled out of InactivityCheck(). Seems cleaner to keep the early exit to me
@@ -248,6 +255,7 @@ class PeerManagerImpl final : public PeerManager
     void CheckForStaleTipAndEvictPeers() override;
     bool GetNodeStateStats(NodeId nodeid, CNodeStateStats& stats) override;
     bool IgnoresIncomingTxs() override { return m_ignore_incoming_txs; }
+    void SendPings() override;
Only mentioning since you've got RunInactivityChecks as a separate commit, but I think you could reasonably have pulled SendPings() out as a separate commit prior to moving things from net to net_processing too.
Yes, good point.
ACK a3ec04b7a61378dab0a9a53840e5f504d25ac563
Plus some maybe nits.
Thanks for the review @ajtowns! I'll leave your nits for a follow-up PR. @dongcarl @amitiuttarwar - I think this would be ready for merge if you reACKed.
Refactor only. No change in behaviour.
Moves the logic to prevent running inactivity checks until
the peer has been connected for -peertimeout time into its
own function. This will be reused by net_processing later.
Ping messages are an application-level mechanism. Move timeout
logic from net to net processing.
-BEGIN VERIFY SCRIPT-
sed -i 's/fPingQueued/m_ping_queued/g' src/net_processing.cpp
sed -i 's/nMinPingUsecTime/m_min_ping_time/g' src/net.* src/net_processing.cpp src/test/net_tests.cpp
sed -i 's/nPingNonceSent/m_ping_nonce_sent/g' src/net_processing.cpp
sed -i 's/nPingUsecTime/m_last_ping_time/g' src/net.*
-END VERIFY SCRIPT-
Rebased
@@ -1221,18 +1208,17 @@ void CConnman::NotifyNumConnectionsChanged()
     }
 }

+bool CConnman::RunInactivityChecks(const CNode& node) const
I'd prefer if this was called something like InactivityChecksEnabled or ShouldRunInactivityChecks because right now the name suggests this function is where they are run (aka what InactivityCheck does).
I'll change this in a follow-up.
reACK https://github.com/bitcoin/bitcoin/commit/a5e15ae45ccae7948a6c5b95e6621f01afb88d55
rebase-only by git range-diff a3ec04b...a5e15ae
ACK a5e15ae45c
lost track of the different versions so just re-reviewed the whole diff commit by commit.
definitely like having the m_peer_connect_timeout check as a function in net more than duplicating system time check in net_processing
in the future, I think it could make sense to have -peertimeout be limited to regtest. it's hidden behind debug-help, but technically possible on mainnet. I don't see how it'd be useful on mainnet though.
In the scripted diff I don't like that it may modify non-git user files (e.g. a net.h.rej file) in the same folder. Would be better to use git grep -l '...', which would also simplify review, as reviewers don't have to call git grep to verify the replacement was complete over all files, including docs.
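A toy demonstration of that suggestion (hypothetical repo contents, not the PR's actual scripted diff): piping `git grep -l` into sed keeps the substitution to tracked files, so an untracked leftover such as a net.h.rej is not modified.

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q .
printf 'int nPingNonceSent;\n' > net.h
printf 'int nPingNonceSent;\n' > net.h.rej   # untracked leftover from a failed patch
git add net.h

# Only tracked files are rewritten (GNU sed -i assumed):
git grep -l 'nPingNonceSent' | xargs sed -i 's/nPingNonceSent/m_ping_nonce_sent/g'

grep -q 'm_ping_nonce_sent' net.h       # tracked file was renamed
grep -q 'nPingNonceSent' net.h.rej      # untracked file untouched
```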
review ACK a5e15ae45ccae7948a6c5b95e6621f01afb88d55 🥉
<details><summary>Show signature and timestamp</summary>
Signature:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
review ACK a5e15ae45ccae7948a6c5b95e6621f01afb88d55 🥉
-----BEGIN PGP SIGNATURE-----
iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUhixgv9EHlcd7Eo4IXpSgu8XZbUGYox8iHe1MU0pdDdnGPBjdnoB71xvMNUczaF
vDqmc9jIEfT19pPXGJfnOGQ53oWatA9be22GiLjZtjIA+w7slyjnTJ/WM2LCxA1V
EWX2X57ad+DwEHW7YCzmf5pxxm/KK9TV3h3kE9I1yUGwg6FR5INfIyzssPqULAt2
8BexUMoXXaqeAV6WOJRq/9wdtOcv2WwD7YLc340VjdmRRRkVflySKnx0/m2x6oo8
b3zBYj3YC6+UVmPRD2YIexoG+XHAVGXbc2kFRkeJ/Puknt7pWyurIsF6w+dgzM7B
YrsLzEwgO7mO4xvXfvHYwNKuNX3WJTFJ+gpL1dIN7UlJQis4p3bl56Sxtyw6cKM+
d8VZtjYkTcEpL3jKJYnIk4fHFA6WmHaRp7b4FQr6400G2bznF6AHMC2h0z5Jqhpy
63XIWO9kD89mIPxNgzUxA+id/3SoaNPRN0Mj62RxTWmbpRX2gIvCvF/MRZJsdEuC
TQhwMCuo
=FZw8
-----END PGP SIGNATURE-----
Timestamp of file with hash f094bc0ee463f6bec56659b9f20c20ab75a2f829388e57c839500e90f3c196a3 -
</details>
@@ -209,8 +208,8 @@ static RPCHelpMan getpeerinfo()
         if (stats.m_min_ping_usec < std::numeric_limits<int64_t>::max()) {
             obj.pushKV("minping", ((double)stats.m_min_ping_usec) / 1e6);
unrelated note: Can use the count seconds double helper from #21015
Thanks @MarcoFalke. I'll address this in the follow-up
#21015 isn't merged yet, but hopefully soon will.
I plan to review it soon
+    // This means that setmocktime may cause pings to time out.
+    auto now = GetTime<std::chrono::microseconds>();
+
+    if (m_connman.RunInactivityChecks(node_to) && peer.m_ping_nonce_sent &&
+        now > peer.m_ping_start.load() + std::chrono::seconds{TIMEOUT_INTERVAL}) {
+        LogPrint(BCLog::NET, "ping timeout: %fs peer=%d\n", 0.000001 * count_microseconds(now - peer.m_ping_start.load()), peer.m_id);
Same
same same