Closes: #5097
Manual peers (added via -addnode, -connect, or the addnode RPC) are currently disconnected by the IBD timeout logic for block stalling, block download timeouts, and headers sync timeouts. This can conflict with the user's explicit intent to keep these peers connected (especially in privacy-focused -connect setups), and it causes unnecessary disconnect/reconnect churn.
This PR changes that behavior: manual peers are no longer disconnected by those timeout paths, and we try to reassign their work to other peers to maintain progress.
Automatically disconnecting manual peers causes reconnect loops and other behavior that can violate user expectations for manually configured peers.
Simply “not disconnecting” (for block stalling) is not sufficient by itself: a manual peer can retain in-flight block assignments and delay overall IBD progress.
I considered an approach closer to #32051, which simply protects these peers from some disconnects. However, without reassignment the stalling effects are just prolonged (which is worse, IMO).
This PR uses a different approach:
- For manual peers in the block-stalling/download-timeout paths:
  - release all in-flight blocks from that peer,
  - set a one-round recovery flag (`m_stall_recovery`) so scheduler ordering doesn't immediately reassign those same blocks back to the same peer.
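The manual-peer branch described above can be sketched roughly as follows. This is a simplified, hypothetical model, not the PR's actual diff: `PeerState`, `InFlightMap`, and `HandleStallOrTimeout` are illustrative names, and only the `m_stall_recovery` flag comes from this description.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical, simplified peer model (not Bitcoin Core's real structures).
struct PeerState {
    bool is_manual{false};        // added via -addnode, -connect, or addnode RPC
    std::set<int> in_flight;      // heights of blocks requested from this peer
    bool m_stall_recovery{false}; // if set, skip the next scheduling round
    bool disconnect{false};       // marked for disconnection
};

// Global assignment: block height -> id of the peer currently fetching it.
using InFlightMap = std::map<int, int>;

// Called when `peer_id` is detected as stalling or its download timed out.
void HandleStallOrTimeout(int peer_id, std::map<int, PeerState>& peers,
                          InFlightMap& assigned) {
    PeerState& peer = peers.at(peer_id);
    if (!peer.is_manual) {
        // Unchanged behavior for automatic peers: mark them for disconnection.
        // (Freeing their blocks on disconnect is not modeled here.)
        peer.disconnect = true;
        return;
    }
    // Manual peer: keep the connection but release every in-flight block
    // so that other peers can request it.
    for (int height : peer.in_flight) {
        assert(assigned.count(height) == 1 && assigned.at(height) == peer_id);
        assigned.erase(height);
    }
    peer.in_flight.clear();
    // Prevent this peer from re-capturing the freed blocks in the next round.
    peer.m_stall_recovery = true;
}
```

The manual peer stays connected but holds no work, and the freed heights become available to whichever peer the scheduler visits next.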
This attempts to both preserve manual connections and maintain download progress: it actively reassigns stalled work so IBD can continue, rather than just waiting longer. The one-round skip prevents the same manual peer from immediately re-capturing the freed blocks due to per-peer send ordering.
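To illustrate why the one-round skip matters, here is a minimal sketch of a scheduling round in which peers are visited in a fixed order and each claims the lowest unassigned heights. It assumes a hypothetical `Peer` struct, a `ScheduleRound` function, and a simple per-peer cap; Bitcoin Core's real block request logic is considerably more involved. Without the flag, a manual peer visited right after releasing its blocks would simply claim them again.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical peer model for illustration only (not Bitcoin Core code).
struct Peer {
    int id{0};
    std::set<int> in_flight;      // heights currently requested from this peer
    bool m_stall_recovery{false}; // set after a manual peer's blocks were freed
};

// One scheduling round: peers are visited in order, and each claims
// unassigned heights from `wanted` (in the given order) until its
// in-flight count reaches `per_peer`.
void ScheduleRound(std::vector<Peer>& peers, const std::vector<int>& wanted,
                   std::map<int, int>& assigned, std::size_t per_peer) {
    assert(per_peer > 0);
    for (Peer& peer : peers) {
        if (peer.m_stall_recovery) {
            // Skip exactly one round so other peers see the freed blocks first.
            peer.m_stall_recovery = false;
            continue;
        }
        for (int height : wanted) {
            if (peer.in_flight.size() >= per_peer) break;
            if (assigned.count(height) != 0) continue; // already in flight elsewhere
            assigned[height] = peer.id;
            peer.in_flight.insert(height);
        }
    }
}
```

For example, if a flagged manual peer is visited first and another peer second, the freed heights go to the second peer in that round, and the manual peer becomes eligible again from the next round onward.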
Compared with simpler “don’t disconnect” approaches, this is more robust, both in tests and under real-world slow-peer conditions.
I believe these changes do not increase remote attacker control, because they only apply to manual peers (-addnode, -connect, or the addnode RPC), which are explicitly chosen by the local operator. You were, and still are, free to add a malicious peer manually.
A malicious manually added peer can now remain connected longer, but that peer was already granted privileged trust through operator configuration. In fact, for block stalling and download timeouts this PR reduces the potential impact of a slow or malicious manual peer by releasing its in-flight blocks so other peers can fetch them.
Of course, if an operator relies only on bad manual peers, sync can still be delayed.