p2p: protect manual evictions #34743

pull willcl-ark wants to merge 4 commits into bitcoin:master from willcl-ark:protect-manual-evictions changing 5 files +131 −11
  1. willcl-ark commented at 1:28 pm on March 5, 2026: member

    Ref: #5097

    Manual peers added via -addnode, -connect, or the addnode RPC are currently disconnected during IBD if they trigger block-stalling logic. That can conflict with explicit operator intent to keep those peers connected, especially in -connect-based setups.

    This PR changes only the block-stalling path for manual peers. Instead of disconnecting the peer, it releases that peer’s in-flight block requests so other peers can pick them up and IBD can continue.

    The change also adds a one-round recovery flag to avoid immediately reassigning the freed blocks back to the same manual peer due to per-peer SendMessages ordering.

  2. willcl-ark renamed this:
    Protect manual evictions
    p2p: protect manual evictions
    on Mar 5, 2026
  3. DrahtBot commented at 1:29 pm on March 5, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK w0xlt, sedited, kevkevinpal, brunoerg

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #33854 (fix assumevalid is ignored during reindex by Eunovo)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter (✨ experimental)

    Possible places where comparison-specific test macros should replace generic comparisons:

    • [test/functional/p2p_ibd_stalling.py] assert manual_peer.is_connected -> Replace with a comparison-specific helper, e.g. assert_equal(manual_peer.is_connected, True) (or assert_equal(bool(manual_peer.is_connected), True)).

    2026-03-24 14:32:15

  4. DrahtBot added the label P2P on Mar 5, 2026
  5. w0xlt commented at 3:30 pm on March 5, 2026: contributor
    Concept ACK
  6. in test/functional/test_framework/test_node.py:921 in 547f119668
    916+
    917+        def addnode_callback(address, port):
    918+            if v2:
    919+                self.addnode('%s:%d' % (address, port), "onetry", v2transport=True)
    920+            else:
    921+                self.addnode('%s:%d' % (address, port), "onetry")
    


    kevkevinpal commented at 7:59 pm on March 5, 2026:

    Might be able to do this instead. It seems redundant to have the two calls.

    0self.addnode('%s:%d' % (address, port), "onetry", v2transport=v2)
    

    willcl-ark commented at 10:08 am on March 9, 2026:
    Taken in be002a7
  7. jonatack commented at 2:51 am on March 6, 2026: member
    Thanks for picking this up, Will.
  8. sedited commented at 7:35 am on March 6, 2026: contributor
    Concept ACK
  9. kevkevinpal commented at 3:45 pm on March 6, 2026: contributor
    Concept ACK
  10. willcl-ark force-pushed on Mar 6, 2026
  11. brunoerg commented at 2:20 pm on March 9, 2026: contributor
    Concept ACK
  12. DrahtBot added the label Needs rebase on Mar 11, 2026
  13. willcl-ark force-pushed on Mar 11, 2026
  14. DrahtBot removed the label Needs rebase on Mar 11, 2026
  15. sedited requested review from ajtowns on Mar 20, 2026
  16. in src/net_processing.cpp:6111 in 749f425094 outdated
    6113+                // Return early so the block download logic below doesn't
    6114+                // immediately re-assign them to this peer.
    6115+                while (!state.vBlocksInFlight.empty()) {
    6116+                    RemoveBlockRequest(state.vBlocksInFlight.front().pindex->GetBlockHash(), node.GetId());
    6117+                }
    6118+                state.m_stall_recovery = true;
    


    ajtowns commented at 1:36 am on March 23, 2026:
    Would it be better for this to be a timer – ie, if it stalls, stop requesting any blocks from this peer for X seconds? Having the peer immediately start providing blocks again seems like it sets things up for regular stall/clear/stall/clear behaviour. I could see an argument for setting X to be fairly large, eg 10 minutes, to minimise that behaviour during IBD, while still recovering gracefully after IBD is complete.
  17. in src/net_processing.cpp:6135 in accb5e4edf
    6130@@ -6115,27 +6131,39 @@ bool PeerManagerImpl::SendMessages(CNode& node)
    6131             QueuedBlock &queuedBlock = state.vBlocksInFlight.front();
    6132             int nOtherPeersWithValidatedDownloads = m_peers_downloading_from - 1;
    6133             if (current_time > state.m_downloading_since + std::chrono::seconds{consensusParams.nPowTargetSpacing} * (BLOCK_DOWNLOAD_TIMEOUT_BASE + BLOCK_DOWNLOAD_TIMEOUT_PER_PEER * nOtherPeersWithValidatedDownloads)) {
    6134-                LogInfo("Timeout downloading block %s, %s", queuedBlock.pindex->GetBlockHash().ToString(), node.DisconnectMsg());
    6135-                node.fDisconnect = true;
    6136-                return true;
    6137+                if (node.IsManualConn()) {
    6138+                    LogDebug(BCLog::NET, "Not disconnecting manual peer for block download timeout, %s", node.DisconnectMsg());
    


    ajtowns commented at 1:45 am on March 23, 2026:
    I’m not sure this change makes sense – if a peer can’t deliver a block to us in ten minutes, it’s not very useful even if it was a manual connection?
  18. in src/net_processing.cpp:6159 in accb5e4edf
    6160+                    // or manual peers; just reset their sync state.
    6161                     // Note: If all our peers are inbound, then we won't
    6162                     // disconnect our sync peer for stalling; we have bigger
    6163                     // problems if we can't get any outbound peers.
    6164-                    if (!node.HasPermission(NetPermissionFlags::NoBan)) {
    6165+                    if (!node.HasPermission(NetPermissionFlags::NoBan) && !node.IsManualConn()) {
    


    ajtowns commented at 1:50 am on March 23, 2026:
    Likewise here – if a peer can’t reply to our getheaders request in 15 minutes it doesn’t seem very useful; and the “NoBan” permission is already there for keeping connections to not very useful nodes.
  19. ajtowns commented at 1:54 am on March 23, 2026: contributor

    The PR description is quite complicated, could it be simplified?

    I believe the approach we currently have is “if stalling in block downloads is detected, unstall by disconnecting the peer with the oldest queued block”, and this introduces a new unstalling technique “reset the peer with the oldest queued block’s entire download queue, but let it restart downloading blocks almost immediately”.

  20. ajtowns commented at 2:05 am on March 23, 2026: contributor
    It might be good to run a test simulating this behaviour – I think having three nodes: A doing IBD, manually connected to B and C; B normal; C has it’s bandwidth shaped by the OS to, say, 10kB/s (trickle -u 10 bitcoind ...) ? With master, that should A disconnect C for stalling; with this PR, I think it should see C repeatedly stall, and as a result see a somewhat slower IBD than if C were disconnected.
  21. net: don't disconnect manual peers for block stalling
    Manual connections (-addnode, -connect) represent explicit user intent to
    maintain the connection, yet they currently get disconnected during IBD
    when the block download window stalls.
    
    Instead of disconnecting a stalled manual peer, release its in-flight
    blocks so other peers can request them. Set m_stall_recovery to skip one
    round of block download for this peer so its next SendMessages call
    doesn't immediately reclaim the same blocks before another peer can pick
    them up.
    
    This behavior first showed up as intermittent failures in the functional
    tests with few peers, but it can also delay recovery outside tests when a
    manual peer is revisited before other peers have a chance to request the
    freed blocks.
    409313bab6
  22. test: add add_manual_p2p_connection helper to TestNode
    Add a helper for creating manual (addnode onetry) p2p connections,
    mirroring the existing add_outbound_p2p_connection method. This will
    be used by tests that verify manual peer behavior during IBD.
    6c245778c6
  23. test: verify manual peers survive block stalling timeout
    test that a manual (addnode) peer which stalls on a block, is not
    disconnected when the stalling timeout fires.
    7c519790f5
  24. doc: update addnode rpc help and add release note 935afd4042
  25. willcl-ark force-pushed on Mar 24, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-03-30 12:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me