Disconnect from outbound peers with bad headers chains #11490

pull sdaftuar wants to merge 3 commits into bitcoin:master from sdaftuar:2017-10-outbound-peers-good-chain changing 4 files +200 −0
  1. sdaftuar commented at 9:06 pm on October 12, 2017: member

    The first commit will disconnect an outbound peer that serves us a headers chain with insufficient work while we’re in IBD.

    The second commit introduces a way to disconnect outbound peers whose chains fall out of sync with ours:

    For a given outbound peer, we check whether their best known block (which is known from the blocks they announce to us) has at least as much work as our tip. If it doesn’t, we set a 20 minute timeout, and if we still haven’t heard about a block with as much work as our tip had when we set the timeout, then we send a single getheaders message, and wait 2 more minutes. If after two minutes their best known block has insufficient work, we disconnect that peer.

    We protect 4 of our outbound peers (who provide some “good” headers chains, ie a chain with at least as much work as our tip at some point) from being subject to this logic, to prevent excessive network topology changes as a result of this algorithm, while still ensuring that we have a reasonable number of nodes not known to be on bogus chains.

    We also don’t require our peers to be on the same chain as us, to prevent accidental partitioning of the network in the event of a chain split. Note that if our peers are ever on a more work chain than our tip, then we will download and validate it, and then either reorg to it, or learn of a consensus incompatibility with that peer and disconnect. This PR is designed to protect against peers that are on a less work chain which we may never try to download and validate.

  2. fanquake added the label P2P on Oct 12, 2017
  3. fanquake added this to the milestone 0.15.0.2 on Oct 12, 2017
  4. in src/net_processing.cpp:520 in d5b4bf323f outdated
    516@@ -502,7 +517,7 @@ void PeerLogicValidation::InitializeNode(CNode *pnode) {
    517     NodeId nodeid = pnode->GetId();
    518     {
    519         LOCK(cs_main);
    520-        mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(addr, std::move(addrName)));
    521+        auto it = mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(addr, std::move(addrName)));
    


    sipa commented at 0:40 am on October 13, 2017:
    Unused variable
  5. in src/net_processing.cpp:3282 in d5b4bf323f outdated
    3277+            // This is an outbound peer subject to disconnection if their chain
    3278+            // lags behind ours (note: if their chain has more work than ours,
    3279+            // we should sync to it, unless it's invalid, in which case we
    3280+            // should find that out and disconnect from them elsewhere).
    3281+            if (state.pindexBestKnownBlock != nullptr && state.pindexBestKnownBlock->nChainWork >= chainActive.Tip()->nChainWork)
    3282+            {
    


    sipa commented at 0:48 am on October 13, 2017:
    Nit: same line
  6. sipa commented at 1:55 am on October 13, 2017: member

    I’ve spent some time verifying the logic, and it looks very good.

    I briefly misunderstood it to mean that if a peer makes progress during the 20 minute window, it would be extended. That would have been a problem (an incompatible chain that has less work but does grow continuously), but it’s not what happens (it requires reaching the same amount of work as we had at the beginning of the window).

    A test for the second commit would be great.

  7. in src/net_processing.cpp:2448 in 3203afcc7a outdated
    2375+                // This peer has too little work on their headers chain to help
    2376+                // us sync -- disconnect if using an outbound slot (unless
    2377+                // whitelisted).
    2378+                if (!(pfrom->fInbound || pfrom->fWhitelisted || pfrom->fAddnode)) {
    2379+                    pfrom->fDisconnect = true;
    2380+                }
    


    gmaxwell commented at 9:17 am on October 13, 2017:

    I think this needs to be backed off a bit: Consider– imagine we start and are fully synced up, but we are still in IsInitialBlockDownload because the network has been slow and so the tip’s timestamp is a ways back. Then we have a peer we get headers from who is a single block behind us. This would disconnect them, and I think that is probably undesirable.

    It’s probably a good idea to review all code as if IsInitialBlockDownload() were just return true; and if there would be misbehavior in that case, then the usage is perhaps suboptimal. An obvious conservative thing to do would be to just use the nMinimumChainWork … though perhaps too conservative. It might also be reasonable to use a GetBlockProofEquivalentTime like test, or just the work several blocks back from chainactive tip. I think if it was conservative enough it could apply at all times, not just IsIninitialBlockDownload().

    Maybe the existence of the second commit means that it would be okay for this one to be less aggressive?


    sdaftuar commented at 1:14 pm on October 13, 2017:
    I guess with the second commit, we will eventually disconnect some peers if they stay on chains with less work than our tip. So I think we could just leave this as nMinimumChainWork, to protect against the case that all our peers only have incomplete/bogus chains (which wouldn’t be prevented by the second commit).

    promag commented at 2:47 pm on October 24, 2017:

    Does it make sense to avoid the push above: https://github.com/bitcoin/bitcoin/blob/e9d58023aef97d954b7a75da806ee4f22c9760c3/src/net_processing.cpp#L2332 now that the node will disconnect?

    Edit: nevermind, since this is protected with nCount != MAX_HEADERS_RESULTS.

  8. sdaftuar commented at 1:16 pm on October 13, 2017: member
    Thanks for the review. I’ve pushed up a couple quick fixes for the nits. Will work on a test, as well as trying to implement a suggestion @gmaxwell gave me offline for improving the behavior in the situation where none of our peers are giving us any block announcements.
  9. sdaftuar force-pushed on Oct 14, 2017
  10. in src/net_processing.cpp:2433 in 6e21a4a793 outdated
    2382@@ -2383,6 +2383,21 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
    2383                 }
    2384             }
    2385         }
    2386+        // If we're in IBD, we want outbound peers that will serve us a useful
    2387+        // chain.  Disconnect peers that are behind us or on chains with
    2388+        // insufficient work.
    2389+        if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) {
    


    TheBlueMatt commented at 8:23 pm on October 18, 2017:

    I believe this adds a new invariant (currently true), that we shall never send a GETHEADERS or do something which expects, as a result, a HEADERS message, which is not MAX_HEADERS_RESULT if it doesn’t end with the peer’s tip (ie no GETHEADERS with hashStop of not-their-tip, or at least not when in IBD/before we’ve finished syncing the state of another peer). I don’t think this is bad, but it should be documented somewhere explicitly.

    Eventually I really think we need to have peers in a “mode” which describes were we are 1. connect-handshake (version/verack/sendcmpct/sendheaders/feefilter/etc) 2. header-chain-sync (slightly complicated by the fact that we want to block all peers but one during ibd until we’ve finished one) 3. normal operation and then we can simplify and document strange things like this significantly instead of making everything yet more of a mess.


    sdaftuar commented at 3:54 pm on October 19, 2017:

    I believe this adds a new invariant (currently true), that we shall never send a GETHEADERS or do something which expects, as a result, a HEADERS message, which is not MAX_HEADERS_RESULT if it doesn’t end with the peer’s tip (ie no GETHEADERS with hashStop of not-their-tip, or at least not when in IBD/before we’ve finished syncing the state of another peer). I don’t think this is bad, but it should be documented somewhere explicitly.

    I don’t think this is precisely correct. What we are assuming is the same thing we use for headers sync (eg on initial connection) – if we receive fewer than MAX_HEADERS_RESULTS, then we have no reason to request more headers in order to update pindexBestKnownBlock. Note that in the logic below, we are using pindexBestKnownBlock, not pindexLast.

    In other words, if we did add a reason for requesting a single header to a random location in the chain, I think this logic would still be correct.


    promag commented at 11:28 pm on October 19, 2017:
    Swap conditions as the second is faster?

    sdaftuar commented at 0:13 am on October 20, 2017:
    I don’t think this really matters? In the steady state (ie after we’ve caught up from initial sync), the second condition will generally be true, while the first condition will be false.

    TheBlueMatt commented at 0:16 am on October 20, 2017:

    if we receive fewer than MAX_HEADERS_RESULTS, then we have no reason to request more headers in order to update pindexBestKnownBlock. Note that in the logic below, we are using pindexBestKnownBlock, not pindexLast.

    I’m confused, then. If we randomly decide to request the header at height 1 in the middle of syncing our peer’s header chain, we’d do fine in the rest of the ::HEADERS handling here, just (correctly) not requesting any further headers built on top of it. However, we’d immediately disconnect them if we’re not in IBD by this new check. Previously this wouldn’t break anything, as the next HEADERS they send us will result in us requesting further headers.


    sdaftuar commented at 1:09 am on October 20, 2017:

    Ah, you’re right about that case – I was thinking of the case where we send a random getheaders after we’ve already synced the chain, and therefore updated pindexBestKnownBlock to be some recent-ish thing.

    I guess it is true that if you just added a random getheaders message in the middle of initial sync, you might not break the sync logic, but this new logic would cause a disconnect… I can add a comment indicating that we shouldn’t do anything funky until after we’ve synced a peer’s headers chain?

  11. in src/net_processing.cpp:207 in 4b8a4bd29c outdated
    202@@ -200,6 +203,14 @@ struct CNodeState {
    203      * otherwise: whether this peer sends non-witnesses in cmpctblocks/blocktxns.
    204      */
    205     bool fSupportsDesiredCmpctVersion;
    206+    //! Whether this peer is protected from disconnection due to a bad/slow chain
    207+    bool fProtectFromDisconnect;
    


    TheBlueMatt commented at 10:13 pm on October 18, 2017:
    new code should have the m_ prefix here (and other places), and probably drop the hungarian type prefix.
  12. in src/net_processing.cpp:3323 in b44013e9e7 outdated
    3318+                    // They've run out of time to catch up!
    3319+                    LogPrintf("Disconnecting outbound peer %d for old chain, best known block = %s\n", pto->GetId(), state.pindexBestKnownBlock != nullptr ? state.pindexBestKnownBlock->GetBlockHash().ToString() : "<none>");
    3320+                    pto->fDisconnect = true;
    3321+                } else {
    3322+                    const CBlockIndex *pindexStart = state.header_with_required_work;
    3323+                    assert (pindexStart != nullptr);
    


    TheBlueMatt commented at 11:21 pm on October 18, 2017:
    Ugh, just for a print?

    sdaftuar commented at 4:59 pm on October 19, 2017:
    agreed, chucking it
  13. TheBlueMatt commented at 11:24 pm on October 18, 2017: member
    Looks pretty good. This is begging for a clearer node-sync-state model, but its fine for 0.15.0.2. Does want a test.
  14. laanwj added the label Needs backport on Oct 19, 2017
  15. sdaftuar force-pushed on Oct 19, 2017
  16. sdaftuar commented at 5:05 pm on October 19, 2017: member

    Thanks for the review – pushed up some fixup commits, which I’ll squash.

    I switched to using GetTime() everywhere, and was then able to write up a unit test. This is currently next-to-impossible to test in the regtest environment, I believe, since we don’t support non-manual connections in regtest.

  17. sdaftuar force-pushed on Oct 19, 2017
  18. sdaftuar commented at 5:23 pm on October 19, 2017: member
    Squashed (https://github.com/sdaftuar/bitcoin/commits/11490.1 -> 032dc531b911a0c0ea570aa74531508f54a36073)
  19. jonasschnelli commented at 7:03 pm on October 19, 2017: contributor

    Travis found:

    0test/DoS_tests.cpp(80): error in "outbound_slow_chain_eviction": check dummyNode1.fDisconnect == true failed
    
  20. sdaftuar force-pushed on Oct 19, 2017
  21. sdaftuar force-pushed on Oct 19, 2017
  22. sdaftuar commented at 9:02 pm on October 19, 2017: member
    Unit test issue should be fixed now.
  23. in src/net_processing.cpp:552 in 0a3d957935 outdated
    548@@ -534,6 +549,8 @@ void PeerLogicValidation::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTim
    549     nPreferredDownload -= state->fPreferredDownload;
    550     nPeersWithValidatedDownloads -= (state->nBlocksInFlightValidHeaders != 0);
    551     assert(nPeersWithValidatedDownloads >= 0);
    552+    g_outbound_peers_with_protect_from_disconnect -= state->m_protect_from_disconnect;
    


    promag commented at 11:07 pm on October 19, 2017:
    int -= bool?

    promag commented at 11:26 pm on October 19, 2017:

    More verbose but IMO more clear:

    0if (state->m_protect_from_disconnect) {
    1    assert(g_outbound_peers_with_protect_from_disconnect > 0);
    2    --g_outbound_peers_with_protect_from_disconnect;
    3}
    

    sdaftuar commented at 0:12 am on October 20, 2017:
    I was following the pattern already in use in the lines above (nPreferredDownload -= state->fPreferredDownload). But I can change to something like what you suggest if this is frowned upon for some reason.
  24. in src/net_processing.cpp:128 in 0a3d957935 outdated
    123@@ -124,6 +124,9 @@ namespace {
    124     /** Number of peers from which we're downloading blocks. */
    125     int nPeersWithValidatedDownloads = 0;
    126 
    127+    /** Number of outbound peers with m_protect_from_disconnect. */
    128+    int g_outbound_peers_with_protect_from_disconnect = 0;
    


    promag commented at 11:10 pm on October 19, 2017:
    Remove since this is easily calculated by counting nodes with m_protect_from_disconnect?

    sdaftuar commented at 0:10 am on October 20, 2017:
    By looping over all the CNodeState objects when this value is needed? I don’t think that makes sense. Note that tracking the value here is analogous to what we do with nPeersWithValidatedDownloads and nPreferredDownload.
  25. in src/net_processing.h:27 in 0a3d957935 outdated
    20@@ -21,6 +21,12 @@ static const unsigned int DEFAULT_BLOCK_RECONSTRUCTION_EXTRA_TXN = 100;
    21  *  Timeout = base + per_header * (expected number of headers) */
    22 static constexpr int64_t HEADERS_DOWNLOAD_TIMEOUT_BASE = 15 * 60 * 1000000; // 15 minutes
    23 static constexpr int64_t HEADERS_DOWNLOAD_TIMEOUT_PER_HEADER = 1000; // 1ms/header
    24+/** Protect at least this many outbound peers from disconnection due to slow/
    25+ * behind headers chain.
    26+ */
    27+static constexpr int32_t MAX_OUTBOUND_PEERS_TO_PROTECT = 4;
    


    promag commented at 11:12 pm on October 19, 2017:
    _FROM_DISCONNECT?
  26. in src/net_processing.cpp:2405 in 0a3d957935 outdated
    2400@@ -2383,6 +2401,31 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
    2401                 }
    2402             }
    2403         }
    2404+        // If we're in IBD, we want outbound peers that will serve us a useful
    2405+        // chain.  Disconnect peers that are behind us or on chains with
    


    promag commented at 11:28 pm on October 19, 2017:
    Remove extra space.
  27. in src/net_processing.cpp:3317 in 0a3d957935 outdated
    3312+                state.m_headers_chain_timeout = time_in_seconds + CHAIN_SYNC_TIMEOUT;
    3313+                state.m_header_with_required_work = chainActive.Tip();
    3314+                state.m_sent_getheaders_to_check_chain_sync = false;
    3315+            } else if (state.m_headers_chain_timeout > 0 && time_in_seconds > state.m_headers_chain_timeout) {
    3316+                // No evidence yet that our peer has synced to a chain with work equal to that
    3317+                // of our tip, when we first detected it was behind.  Send a single getheaders
    


    promag commented at 11:34 pm on October 19, 2017:
    Remove extra space.
  28. sdaftuar force-pushed on Oct 20, 2017
  29. sdaftuar commented at 0:32 am on October 20, 2017: member
    Addressed some of @promag’s nits at https://github.com/sdaftuar/bitcoin/commits/11490.2, squashed -> 39e0c8e
  30. in src/net_processing.cpp:2435 in e9d58023ae outdated
    2382@@ -2383,6 +2383,20 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
    2383                 }
    2384             }
    2385         }
    2386+        // If we're in IBD, we want outbound peers that will serve us a useful
    2387+        // chain. Disconnect peers that are on chains with insufficient work.
    2388+        if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) {
    2389+            // When nCount < MAX_HEADERS_RESULTS, we know we have no more
    2390+            // headers to fetch from this peer.
    2391+            if (nodestate->pindexBestKnownBlock->nChainWork < nMinimumChainWork) {
    


    ryanofsky commented at 7:58 pm on October 23, 2017:

    In commit “Disconnecting from bad outbound peers in IBD”

    Can you update comment to say why this is checking against nMinimumChainWork (and not chainActive.Tip()->nChainWork)?


    ryanofsky commented at 4:20 pm on October 24, 2017:
    Thanks. New “compare their tip” text might make a little more sense as part of the “peer has too little work” comment below rather than the “no more headers to fetch from this peer” comment up here.
  31. in test/functional/minchainwork.py:30 in e9d58023ae outdated
    26@@ -27,7 +27,7 @@ class MinimumChainWorkTest(BitcoinTestFramework):
    27     def set_test_params(self):
    28         self.setup_clean_chain = True
    29         self.num_nodes = 3
    30-        self.extra_args = [[], ["-minimumchainwork=0x65"], ["-minimumchainwork=0x65"]]
    31+        self.extra_args = [['-whitelist=127.0.0.1'], ["-minimumchainwork=0x65", '-whitelist=127.0.0.1'], ["-minimumchainwork=0x65"]]
    


    ryanofsky commented at 8:06 pm on October 23, 2017:

    In commit “Disconnecting from bad outbound peers in IBD”

    A comment saying why -whitelist is needed would be helfpul I think

  32. in src/net_processing.cpp:206 in 9a70054f65 outdated
    202@@ -200,6 +203,14 @@ struct CNodeState {
    203      * otherwise: whether this peer sends non-witnesses in cmpctblocks/blocktxns.
    204      */
    205     bool fSupportsDesiredCmpctVersion;
    206+    //! Whether this peer is protected from disconnection due to a bad/slow chain
    


    ryanofsky commented at 8:19 pm on October 23, 2017:

    In commit “Permit disconnection of outbound peers on bad/slow chains”

    I think you should do something to more clearly group these variables together. You could use doxygen grouping syntax:

    0//! State used to enforce CHAIN_SYNC_TIMEOUT.
    1//! @{
    2... variable declarations here ...
    3//! @}
    

    You could also give them a common prefix like m_sync_timeout_enable / m_sync_timeout_deadline / m_sync_timeout_min_work / m_sync_timeout_sent_getheaders or move them to a nested struct, though that’s probably asking a lot this far into the review.

  33. in src/net_processing.cpp:3295 in 9a70054f65 outdated
    3288@@ -3261,6 +3289,52 @@ bool PeerLogicValidation::SendMessages(CNode* pto, std::atomic<bool>& interruptM
    3289             }
    3290         }
    3291 
    3292+        // Check that outbound peers have reasonable chains
    3293+        // GetTime() is used by this anti-DoS logic so we can test this using mocktime
    3294+        int64_t time_in_seconds = GetTime();
    3295+        if (!(state.m_protect_from_disconnect || pto->fInbound || pto->m_manual_connection || pto->fFeeler || pto->fOneShot) && state.fSyncStarted) {
    


    ryanofsky commented at 8:40 pm on October 23, 2017:

    In commit “Permit disconnection of outbound peers on bad/slow chains”

    It seems like it could be good to have an pto->isOutbound() helper returning !pto->fInbound && !pto->m_manual_connection && !pto->fFeeler && !pto->fOneShot so this long string of checks doesn’t need to be repeated here and above in getheaders.

  34. in src/net_processing.cpp:3297 in 9a70054f65 outdated
    3288@@ -3261,6 +3289,52 @@ bool PeerLogicValidation::SendMessages(CNode* pto, std::atomic<bool>& interruptM
    3289             }
    3290         }
    3291 
    3292+        // Check that outbound peers have reasonable chains
    3293+        // GetTime() is used by this anti-DoS logic so we can test this using mocktime
    3294+        int64_t time_in_seconds = GetTime();
    3295+        if (!(state.m_protect_from_disconnect || pto->fInbound || pto->m_manual_connection || pto->fFeeler || pto->fOneShot) && state.fSyncStarted) {
    3296+            // This is an outbound peer subject to disconnection if their chain
    3297+            // lags behind ours (note: if their chain has more work than ours,
    


    ryanofsky commented at 8:48 pm on October 23, 2017:

    In commit “Permit disconnection of outbound peers on bad/slow chains”

    Instead of “if their chain lags behind ours” maybe be more specific and say “if they don’t announce a block with as much work as the current tip within CHAIN_SYNC_TIMEOUT+HEADERS_RESPONSE_TIME seconds.”

  35. in src/net_processing.cpp:3328 in 9a70054f65 outdated
    3295+        if (!(state.m_protect_from_disconnect || pto->fInbound || pto->m_manual_connection || pto->fFeeler || pto->fOneShot) && state.fSyncStarted) {
    3296+            // This is an outbound peer subject to disconnection if their chain
    3297+            // lags behind ours (note: if their chain has more work than ours,
    3298+            // we should sync to it, unless it's invalid, in which case we
    3299+            // should find that out and disconnect from them elsewhere).
    3300+            if (state.pindexBestKnownBlock != nullptr && state.pindexBestKnownBlock->nChainWork >= chainActive.Tip()->nChainWork) {
    


    ryanofsky commented at 9:08 pm on October 23, 2017:

    In commit “Permit disconnection of outbound peers on bad/slow chains”:

    I don’t see why the first and second branches of this if/else are distinct. It seems like you could unite them and drop unneeded checks with:

    0if (!state.m_header_with_required_work || (state.pindexBestKnownBlock && state.pindexBestKnownBlock->nChainWork >= state.m_header_with_required_work->nChainWork) {
    1    // Reset timeout and min required work if peer caught up to previous requirement.
    2    state.m_headers_chain_timeout = time_in_seconds + CHAIN_SYNC_TIMEOUT;
    3    state.m_header_with_required_work = chainActive.Tip();
    4    state.m_sent_getheaders_to_check_chain_sync = false;
    5} else if (time_in_seconds > state.m_headers_chain_timeout) {
    6...
    

    sdaftuar commented at 1:22 pm on October 24, 2017:
    Hm, I’ll think about this. I haven’t quite convinced myself yet that there are no edge-case differences between your simplification and my original patch but you may be right.
  36. in src/test/DoS_tests.cpp:78 in 39e0c8e1e7 outdated
    71+    peerLogic->SendMessages(&dummyNode1, interruptDummy); // should result in getheaders
    72+
    73+    int64_t nStartTime = GetTime();
    74+    // Wait 21 minutes, to
    75+    SetMockTime(nStartTime+21*60);
    76+    peerLogic->SendMessages(&dummyNode1, interruptDummy); // should result in getheaders
    


    ryanofsky commented at 9:34 pm on October 23, 2017:

    In commit “Add unit test for outbound peer eviction”

    Maybe check dummyNode1.vSendMsg size or contents to see if there actually was a getheaders.

  37. ryanofsky commented at 9:44 pm on October 23, 2017: member
    utACK 39e0c8e1e7b5cdf57437aa0f957b41183fd0e197
  38. sdaftuar commented at 1:43 pm on October 24, 2017: member
    @ryanofsky Thanks for the review; I’ve addressed all but one of your comments, which I’m still thinking about.
  39. in src/net_processing.cpp:210 in b73dcd12b9 outdated
    213-    bool m_sent_getheaders_to_check_chain_sync;
    214+
    215+    /** State used to enforce CHAIN_SYNC_TIMEOUT
    216+      * Only in effect for outbound, non-manual connections, with
    217+      * m_protect == false
    218+      * Algorithm: if a peer's best known block has less work than our tip,
    


    ryanofsky commented at 4:17 pm on October 24, 2017:
    Could maybe be more specific about the timeouts in the description. Instead of “T in the future”, could write “CHAIN_SYNC_TIMEOUT seconds in the future”, instead of “bump the timeout” could write “reset the timeout”, instead of “new shorter timeout” could refer to HEADERS_RESPONSE_TIME.

    sdaftuar commented at 5:41 pm on October 24, 2017:
    Done
  40. sdaftuar force-pushed on Oct 24, 2017
  41. sdaftuar commented at 4:20 pm on October 24, 2017: member
  42. ryanofsky commented at 4:21 pm on October 24, 2017: member
    Thanks for the updates! utACK 9dafb4587baa13ead7fe78ced3deedd303989016
  43. in src/net_processing.cpp:2395 in 0c4f59b7f8 outdated
    2390+            // headers to fetch from this peer. Compare their tip to
    2391+            // nMinimumChainWork (rather than chainActive.Tip()) because we
    2392+            // won't start block download until we have a headers chain that
    2393+            // has at least nMinimumChainWork, even if a peer has a chain past
    2394+            // our tip, as an anti-DoS measure.
    2395+            if (nodestate->pindexBestKnownBlock->nChainWork < nMinimumChainWork) {
    


    theuni commented at 5:27 pm on October 24, 2017:
    I’m unsure if it would be possible to get here without setting a pindexBestKnownBlock, but please double-check to be safe: if (nodestate->pindexBestKnownBlock && ...

    sdaftuar commented at 5:41 pm on October 24, 2017:
    Done
  44. ryanofsky commented at 5:54 pm on October 24, 2017: member
    utACK d9a05060ce3c6b8d752898ef68d3dcc0f68c7fba (just adds two suggested comment updates).
  45. in src/net_processing.cpp:2455 in ac5d938ef7 outdated
    2445         }
    2446+
    2447+        if (!pfrom->fDisconnect && IsOutboundPeer(pfrom)) {
    2448+            // If this is an outbound peer, check to see if we should protect
    2449+            // it from the bad/lagging chain logic.
    2450+            if (g_outbound_peers_with_protect_from_disconnect < MAX_OUTBOUND_PEERS_TO_PROTECT_FROM_DISCONNECT && nodestate->pindexBestKnownBlock->nChainWork >= chainActive.Tip()->nChainWork && !nodestate->m_chain_sync.m_protect) {
    


    theuni commented at 6:36 pm on October 24, 2017:
    nodestate->pindexBestKnownBlock needs a nullptr check here too. Though after a quick look, it doesn’t look like we can get here without a pindexBestKnownBlock. If that’s the case (I haven’t traced it fully) an assert may make more sense.

    sdaftuar commented at 8:06 pm on October 24, 2017:
    Yeah it should be impossible due to the call to UpdateBlockAvailability() above, but I’ll add the nullptr check anyway.
  46. in src/net_processing.cpp:3320 in ac5d938ef7 outdated
    3314@@ -3265,6 +3315,54 @@ bool PeerLogicValidation::SendMessages(CNode* pto, std::atomic<bool>& interruptM
    3315             }
    3316         }
    3317 
    3318+        // Check that outbound peers have reasonable chains
    3319+        // GetTime() is used by this anti-DoS logic so we can test this using mocktime
    3320+        int64_t time_in_seconds = GetTime();
    


    theuni commented at 6:57 pm on October 24, 2017:
    Can this be factored out into a function? It seems we really only need to call it when we get new headers, or when a timer expires. Moving it out will make it easier to break up the SendMessages monster down the road.

    sdaftuar commented at 8:42 pm on October 24, 2017:
    Done
  47. in src/net_processing.cpp:537 in ac5d938ef7 outdated
    531@@ -502,6 +532,13 @@ void FindNextBlocksToDownload(NodeId nodeid, unsigned int count, std::vector<con
    532 
    533 } // namespace
    534 
    535+// Returns true for outbound peers, excluding manual connections, feelers, and
    536+// one-shots
    537+bool IsOutboundPeer(const CNode *node)
    


    theuni commented at 7:13 pm on October 24, 2017:
    This name is really deceiving. Maybe IsOutboundDisconnectionCandidate?

    sdaftuar commented at 8:42 pm on October 24, 2017:
    done
  48. in src/net_processing.cpp:3334 in ac5d938ef7 outdated
    3329+                if (state.m_chain_sync.m_timeout != 0) {
    3330+                    state.m_chain_sync.m_timeout = 0;
    3331+                    state.m_chain_sync.m_work_header = nullptr;
    3332+                    state.m_chain_sync.m_sent_getheaders = false;
    3333+                }
    3334+            } else if (state.m_chain_sync.m_timeout == 0 || (state.m_chain_sync.m_work_header != nullptr && state.pindexBestKnownBlock != nullptr && state.pindexBestKnownBlock->nChainWork >= state.m_chain_sync.m_work_header->nChainWork)) {
    


    theuni commented at 7:36 pm on October 24, 2017:

    I believe peers can avoid being added as disconnection candidates if they just never respond to requests for blocks/headers (pindexBestKnownBlock is unset). In that case (assuming the “Start block sync” fired off an initial GETHEADERS above), should the clock also start here if pindexBestKnownBlock is null?

    Note: This comment conflicts with the one above about possibly asserting pindexBestKnownBlock here. Ignore whichever doesn’t apply :)


    achow101 commented at 8:26 pm on October 24, 2017:
    If they don’t respond, doesn’t the timeout just expire and pindexBestKnownBlock doesn’t matter?

    sdaftuar commented at 8:31 pm on October 24, 2017:

    Nice catch, thanks!

    Actually I misread – I think if a peer just never responds to a headers request, then (a) they won’t be protected from this logic (because we only protect peers who give us headers with work at least equal to our tip), and (b) we’ll drop into this else if clause if m_timeout == 0, ie if we’ve never set a timeout. The rest of the condition is for the case where we had an existing timeout, then we’ll reset it if we received a header that has sufficient work.

    Agree?


    theuni commented at 8:54 pm on October 24, 2017:
    ok, agreed.
  49. sdaftuar commented at 8:43 pm on October 24, 2017: member
    Updated to address latest comments from @theuni.
  50. achow101 commented at 4:09 am on October 25, 2017: member
    utACK 011423049270dc288bb99d921620f9e0ba0bb43c Needs squash
  51. sdaftuar force-pushed on Oct 25, 2017
  52. sdaftuar commented at 3:55 pm on October 25, 2017: member
    Squashed https://github.com/sdaftuar/bitcoin/commits/11490.4 -> d6c4ad762a1f369a02367bfc18a79e79ab913857
  53. gmaxwell commented at 7:22 pm on October 25, 2017: contributor
    ACK. Looks good and I’ve been running it on a node for a couple days, seen some getheaders but no disconnections yet, though I’ve started disconnecting healthy outbound peers with the hope of finding some that fail naturally. :)
  54. gmaxwell commented at 3:22 am on October 26, 2017: contributor
    After a kicking all my outbounds every couple hours, I eventually hit on this disconnecting a peer… logs show that it was obviously broken. Good job patch.
  55. Disconnecting from bad outbound peers in IBD
    When in IBD, we'd like to use all our outbound peers to help us
    sync the chain.  Disconnect any outbound peers whose headers have
    insufficient work.
    c60fd71a65
  56. Permit disconnection of outbound peers on bad/slow chains
    Currently we have no rotation of outbound peers.  If an outbound peer
    stops serving us blocks, or is on a consensus-incompatible chain with
    less work than our tip (but otherwise valid headers), then we will never
    disconnect that peer, even though that peer is using one of our 8
    outbound connection slots.  Because we rely on our outbound peers to
    find an honest node in order to reach consensus, allowing an
    incompatible peer to occupy one of those slots is undesirable,
    particularly if it is possible for all such slots to be occupied by such
    peers.
    
    Protect against this by always checking to see if a peer's best known
    block has less work than our tip, and if so, set a 20 minute timeout --
    if the peer is still not known to have caught up to a chain with as much
    work as ours after 20 minutes, then send a single getheaders message,
    wait 2 more minutes, and if a better header hasn't been received by then,
    disconnect that peer.
    
    Note:
    
    - we do not require that our peer sync to the same tip as ours, just an
    equal or greater work tip.  (Doing otherwise would risk partitioning the
    network in the event of a chain split, and is also unnecessary.)
    
    - we pick 4 of our outbound peers and do not subject them to this logic,
    to be more conservative. We don't wish to permit temporary network
    issues (or an attacker) to excessively disrupt network topology.
    5a6d00c6de
  57. jnewbery commented at 5:48 pm on October 26, 2017: member
    utACK 453545ffb5b2843757e6644db98f4e8ac11092d2
  58. Add unit test for outbound peer eviction e065249c01
  59. sdaftuar force-pushed on Oct 26, 2017
  60. sdaftuar commented at 5:53 pm on October 26, 2017: member

    @jnewbery pointed out to that the functional test didn’t actually need a change, due to protections given to manual connections (and every connection in regtest is a manual connection). So I reverted the change to minchainwork and just added a comment that explains the situation. Also fixed a comment typo that he pointed out in the unit test.

    Squashed https://github.com/sdaftuar/bitcoin/commits/11490.6 -> e065249c014a070a8799b2ff947af5b8f012c5c1

  61. jnewbery commented at 6:25 pm on October 26, 2017: member
    utACK e065249c014a070a8799b2ff947af5b8f012c5c1
  62. laanwj added this to the "Review priority 0.15.0.2" column in a project

  63. laanwj merged this on Oct 26, 2017
  64. laanwj closed this on Oct 26, 2017

  65. laanwj referenced this in commit d93fa261f0 on Oct 26, 2017
  66. fanquake removed this from the "Review priority 0.15.0.2" column in a project

  67. MarcoFalke referenced this in commit a764b17d96 on Nov 1, 2017
  68. MarcoFalke referenced this in commit 9e599f1a2c on Nov 1, 2017
  69. MarcoFalke referenced this in commit b36650b1f4 on Nov 1, 2017
  70. MarcoFalke removed the label Needs backport on Nov 1, 2017
  71. MarcoFalke referenced this in commit bf191a7183 on Nov 2, 2017
  72. MarcoFalke referenced this in commit 9961abf9e4 on Nov 2, 2017
  73. MarcoFalke referenced this in commit e3272242e2 on Nov 2, 2017
  74. codablock referenced this in commit 7b909fe998 on Sep 26, 2019
  75. codablock referenced this in commit b451094e2e on Sep 29, 2019
  76. barrystyle referenced this in commit bb9d414a2d on Jan 22, 2020
  77. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me