p2p: protect addnode peers during IBD #32051

pull jonatack wants to merge 4 commits into bitcoin:master from jonatack:2025-03-addnode-p2p changing 2 files +17 −8
  1. jonatack commented at 3:27 am on March 13, 2025: member

    When an addnode peer is disconnected by our IBD headers/blocks download timeout or stalling logic, ThreadOpenAddedConnections attempts to immediately reconnect it (unless “onetry” was passed to the addnode RPC) up to the limit of 8 addnode connections that is separate from regular peer connection limits.

    During IBD in a low-bandwidth environment, I see these disconnections+reconnections frequently with high-quality addnode peers; see https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2025-01-22#1083973.

    This is probably network, resource and time intensive to an extent, particularly in the case of I2P peers that involves destroying and rebuilding 2 tunnels per peer connection, and worth avoiding where it is simple to do so.

    Automatic (non-manually added) peers are also disconnected, but they are a different category and case (non-trusted non-protected peers, no immediate reconnection) that would require monitoring over time to adjust the timeouts accordingly; mzumsande was looking into this (see https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2025-01-22#1083993).

    The goal of this pull is therefore to avoid unnecessary frequent disconnections/immediate reconnections of addnode peers during IBD.

  2. p2p: allow BLOCK_STALLING_TIMEOUT_MAX for addnode peers 921b89e1d4
  3. DrahtBot commented at 3:27 am on March 13, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32051.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

  4. DrahtBot added the label P2P on Mar 13, 2025
  5. fanquake commented at 3:33 am on March 13, 2025: member
    0[23:32:25.656] /ci_container_base/src/rpc/net.cpp:310:125: error: invalid operands to binary expression ('const char[198]' and 'const char[196]')
    1[23:32:25.656]                 "Nodes added using addnode (or -connect) are protected from disconnection due to DoS or IBD header/block\n" +
    2[23:32:25.656]                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
    3[23:32:25.656] 1 error generated.
    
  6. jonatack force-pushed on Mar 13, 2025
  7. DrahtBot commented at 3:35 am on March 13, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/38682150505

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  8. DrahtBot added the label CI failed on Mar 13, 2025
  9. p2p: don't disconnect addnode peers for block download timeout 3463a7f481
  10. p2p: don't disconnect addnode peers for slow initial-headers-sync 64b956f422
  11. rpc, doc: update addnode documentation 93b07997e9
  12. jonatack force-pushed on Mar 13, 2025
  13. DrahtBot removed the label CI failed on Mar 13, 2025
  14. jonatack marked this as ready for review on Mar 13, 2025
  15. jonatack commented at 3:32 pm on March 13, 2025: member
    Ready for review.
  16. in src/net_processing.cpp:5806 in 93b07997e9
    5802@@ -5801,16 +5803,21 @@ bool PeerManagerImpl::SendMessages(CNode* pto)
    5803             return true;
    5804         }
    5805         // In case there is a block that has been in flight from this peer for block_interval * (1 + 0.5 * N)
    5806-        // (with N the number of peers from which we're downloading validated blocks), disconnect due to timeout.
    5807+        // (with N the number of peers from which we're downloading validated blocks), disconnect due to timeout
    


    yancyribbens commented at 4:58 pm on March 14, 2025:
    0        // (with N being the number of peers from which we're downloading validated blocks), disconnect due to timeout
    
  17. in src/net_processing.cpp:5818 in 3463a7f481 outdated
    5816-                LogInfo("Timeout downloading block %s, %s\n", queuedBlock.pindex->GetBlockHash().ToString(), pto->DisconnectMsg(fLogIPs));
    5817-                pto->fDisconnect = true;
    5818+                if (pto->IsManualConn()) {
    5819+                    LogInfo("Timeout downloading block %s from addnode peer, not %s\n", queuedBlock.pindex->GetBlockHash().ToString(), pto->DisconnectMsg(fLogIPs));
    5820+                } else {
    5821+                    LogInfo("Timeout downloading block %s, %s\n", queuedBlock.pindex->GetBlockHash().ToString(), pto->DisconnectMsg(fLogIPs));
    


    mzumsande commented at 5:38 pm on March 17, 2025:
    This is not the timeout specific for IBD stalling but the general one (which is generally longer). So we’d still disconnect for stalling (see log message “Peer is stalling block download”) - is that on purpose / can you explain a bit more the reasoning behind this?
  18. mzumsande commented at 5:45 pm on March 17, 2025: contributor

    I think the main conceptual question is what should happen in the case of a stalling situation where the addnode peer is actually slow and causing congestion?

    While we have more trust in manual peers than automatic ones, I don’t think that means the trust should be without bounds. Even trusted nodes can become unresponsive temporarily (e.g. very slow internet). I’m not sure if we should give one addnode peer the power to let our entire IBD come to a halt completely.

    An alternative would be to not give these peers unlimited time but clear their block requests (so that these blocks can be requested from other peers) so that they wouldn’t be disconnected but also wouldn’t be able to stall us indefinitely.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-28 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me