p2p: protect manual evictions

willcl-ark commented at 1:28 pm on March 5, 2026: member

Manual peers added via -addnode, -connect, or the addnode RPC are currently disconnected during IBD if they trigger block-stalling logic. That can conflict with explicit operator intent to keep those peers connected, especially in -connect-based setups.

This PR changes only the block-stalling path for manual peers. Instead of disconnecting the peer, it releases that peer’s in-flight block requests so other peers can pick them up and IBD can continue.

The change also adds a one-round recovery flag to avoid immediately reassigning the freed blocks back to the same manual peer due to per-peer SendMessages ordering.

willcl-ark renamed this:
~~Protect manual evictions~~
p2p: protect manual evictions
on Mar 5, 2026

DrahtBot commented at 1:29 pm on March 5, 2026: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	w0xlt, sedited, kevkevinpal, brunoerg

If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#33854 (fix assumevalid is ignored during reindex by Eunovo)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible places where comparison-specific test macros should replace generic comparisons:

[test/functional/p2p_ibd_stalling.py] assert manual_peer.is_connected -> Replace with a comparison-specific helper, e.g. assert_equal(manual_peer.is_connected, True) (or assert_equal(bool(manual_peer.is_connected), True)).

2026-03-24 14:32:15

DrahtBot added the label P2P on Mar 5, 2026

w0xlt commented at 3:30 pm on March 5, 2026: contributor

Concept ACK

in test/functional/test_framework/test_node.py:921 in 547f119668

916+
917+        def addnode_callback(address, port):
918+            if v2:
919+                self.addnode('%s:%d' % (address, port), "onetry", v2transport=True)
920+            else:
921+                self.addnode('%s:%d' % (address, port), "onetry")

kevkevinpal commented at 7:59 pm on March 5, 2026:

Might be able to do this instead. It seems redundant to have the two calls.

0self.addnode('%s:%d' % (address, port), "onetry", v2transport=v2)

willcl-ark commented at 10:08 am on March 9, 2026:

Taken in be002a7

jonatack commented at 2:51 am on March 6, 2026: member

Thanks for picking this up, Will.

sedited commented at 7:35 am on March 6, 2026: contributor

Concept ACK

kevkevinpal commented at 3:45 pm on March 6, 2026: contributor

Concept ACK

willcl-ark force-pushed on Mar 6, 2026

brunoerg commented at 2:20 pm on March 9, 2026: contributor

Concept ACK

DrahtBot added the label Needs rebase on Mar 11, 2026

willcl-ark force-pushed on Mar 11, 2026

DrahtBot removed the label Needs rebase on Mar 11, 2026

sedited requested review from ajtowns on Mar 20, 2026

in src/net_processing.cpp:6111 in 749f425094 outdated

6113+                // Return early so the block download logic below doesn't
6114+                // immediately re-assign them to this peer.
6115+                while (!state.vBlocksInFlight.empty()) {
6116+                    RemoveBlockRequest(state.vBlocksInFlight.front().pindex->GetBlockHash(), node.GetId());
6117+                }
6118+                state.m_stall_recovery = true;

ajtowns commented at 1:36 am on March 23, 2026:

Would it be better for this to be a timer – ie, if it stalls, stop requesting any blocks from this peer for X seconds? Having the peer immediately start providing blocks again seems like it sets things up for regular stall/clear/stall/clear behaviour. I could see an argument for setting X to be fairly large, eg 10 minutes, to minimise that behaviour during IBD, while still recovering gracefully after IBD is complete.

in src/net_processing.cpp:6135 in accb5e4edf

6130@@ -6115,27 +6131,39 @@ bool PeerManagerImpl::SendMessages(CNode& node)
6131             QueuedBlock &queuedBlock = state.vBlocksInFlight.front();
6132             int nOtherPeersWithValidatedDownloads = m_peers_downloading_from - 1;
6133             if (current_time > state.m_downloading_since + std::chrono::seconds{consensusParams.nPowTargetSpacing} * (BLOCK_DOWNLOAD_TIMEOUT_BASE + BLOCK_DOWNLOAD_TIMEOUT_PER_PEER * nOtherPeersWithValidatedDownloads)) {
6134-                LogInfo("Timeout downloading block %s, %s", queuedBlock.pindex->GetBlockHash().ToString(), node.DisconnectMsg());
6135-                node.fDisconnect = true;
6136-                return true;
6137+                if (node.IsManualConn()) {
6138+                    LogDebug(BCLog::NET, "Not disconnecting manual peer for block download timeout, %s", node.DisconnectMsg());

ajtowns commented at 1:45 am on March 23, 2026:

I’m not sure this change makes sense – if a peer can’t deliver a block to us in ten minutes, it’s not very useful even if it was a manual connection?

in src/net_processing.cpp:6159 in accb5e4edf

6160+                    // or manual peers; just reset their sync state.
6161                     // Note: If all our peers are inbound, then we won't
6162                     // disconnect our sync peer for stalling; we have bigger
6163                     // problems if we can't get any outbound peers.
6164-                    if (!node.HasPermission(NetPermissionFlags::NoBan)) {
6165+                    if (!node.HasPermission(NetPermissionFlags::NoBan) && !node.IsManualConn()) {

ajtowns commented at 1:50 am on March 23, 2026:

Likewise here – if a peer can’t reply to our getheaders request in 15 minutes it doesn’t seem very useful; and the “NoBan” permission is already there for keeping connections to not very useful nodes.

ajtowns commented at 1:54 am on March 23, 2026: contributor

The PR description is quite complicated, could it be simplified?

I believe the approach we currently have is “if stalling in block downloads is detected, unstall by disconnecting the peer with the oldest queued block”, and this introduces a new unstalling technique “reset the peer with the oldest queued block’s entire download queue, but let it restart downloading blocks almost immediately”.

ajtowns commented at 2:05 am on March 23, 2026: contributor

It might be good to run a test simulating this behaviour – I think having three nodes: A doing IBD, manually connected to B and C; B normal; C has it’s bandwidth shaped by the OS to, say, 10kB/s (trickle -u 10 bitcoind ...) ? With master, that should A disconnect C for stalling; with this PR, I think it should see C repeatedly stall, and as a result see a somewhat slower IBD than if C were disconnected.

net: don't disconnect manual peers for block stalling

Manual connections (-addnode, -connect) represent explicit user intent to
maintain the connection, yet they currently get disconnected during IBD
when the block download window stalls.

Instead of disconnecting a stalled manual peer, release its in-flight
blocks so other peers can request them. Set m_stall_recovery to skip one
round of block download for this peer so its next SendMessages call
doesn't immediately reclaim the same blocks before another peer can pick
them up.

This behavior first showed up as intermittent failures in the functional
tests with few peers, but it can also delay recovery outside tests when a
manual peer is revisited before other peers have a chance to request the
freed blocks.

409313bab6

test: add add_manual_p2p_connection helper to TestNode

Add a helper for creating manual (addnode onetry) p2p connections,
mirroring the existing add_outbound_p2p_connection method. This will
be used by tests that verify manual peer behavior during IBD.

6c245778c6

test: verify manual peers survive block stalling timeout

test that a manual (addnode) peer which stalls on a block, is not
disconnected when the stalling timeout fires.

7c519790f5

doc: update addnode rpc help and add release note 935afd4042

willcl-ark force-pushed on Mar 24, 2026

p2p: protect manual evictions #34743

Reviews

Conflicts

LLM Linter (✨ experimental)