net_processing: make any misbehavior trigger immediate discouragement #29575

pull sipa wants to merge 6 commits into bitcoin:master from sipa:202403_nomisbehave changing 10 files +100 −170
  1. sipa commented at 12:54 pm on March 6, 2024: member

    So far, discouragement of peers triggers when their misbehavior score exceeds 100 points. Most types of misbehavior increment the score by 100, triggering immediate discouragement, but some types do not. This PR makes all increments equal to either 100 (meaning any misbehavior will immediately cause disconnection and discouragement) or 0 (making the behavior effectively unconditionally allowed), and then removes the logic for score accumulation.

    This simplifies the code a bit, but also makes protocol expectations clearer: if a peer misbehaves, they get disconnected. There is no good reason why certain types of protocol violations should be permitted 4 times (howmuch=20) or 9 times (howmuch=10), while many others are never allowed. Furthermore, the distinction between these looks arbitrary.

    The specific types of misbehavior that are changed to 100 are:

    • Sending us a block which does not connect to our header tree (which necessarily must have been unsollicited). [used to be score 10]
    • Sending us a headers with a non-continuous headers sequence. [used to be score 20]
    • Sending us more than 1000 addresses in a single addr or addrv2 message [used to be score 20]
    • Sending us more than 50000 invs in a single inv message [used to be score 20]
    • Sending us more than 2000 headers in a single headers message [used to be score 20]

    The specific types of misbehavior that are changed to 0 are:

    • Sending us 10 (*) separate BIP130 headers announcements that do not connect to our block tree [used to be score 20]
    • Sending us more than 8 headers in a single headers message (which thus does not get treated as a BIP130 announcement) that does not connect to our block tree. [used to be score 10]

    I believe that none of these behaviors are unavoidable, except for the one marked (*) which can in theory happen still due to interaction between BIP130 and variations in system clocks (the max 2 hour in the future rule). This one has been removed entirely. In order to remove the impact of the bug it was designed to deal with, without relying on misbehavior, a separate improvement is included that makes getheaders-tracking more accurate.

    In another unrelated improvement, this also gets rid of the 8 header limit heuristic to determine whether an incoming non-connecting headers is a potential BIP130 announcement, as this rule is no longer needed to prevent spurious Misbehavior. Instead, any non-connecting headers is now treated as a potential announcement.

  2. DrahtBot commented at 12:54 pm on March 6, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK sr-gi, mzumsande, glozow, achow101
    Concept ACK instagibbs, 1440000bytes, brunoerg, sdaftuar, willcl-ark
    Stale ACK Sjors

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30277 ([DO NOT MERGE] Erlay: bandwidth-efficient transaction relay protocol (Full implementation) by sr-gi)
    • #30233 (refactor: move m_is_inbound out of CNodeState by sr-gi)
    • #30116 (p2p: Fill reconciliation sets (Erlay) attempt 2 by sr-gi)
    • #29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)
    • #25832 (tracing: network connection tracepoints by 0xB10C)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. willcl-ark added the label P2P on Mar 6, 2024
  4. in test/functional/p2p_sendheaders.py:571 in bdc5776295 outdated
    568@@ -569,7 +569,7 @@ def test_nonnull_locators(self, test_node, inv_node):
    569 
    570         # Now try to see how many unconnecting headers we can send
    571         # before we get disconnected.  Should be 5*MAX_NUM_UNCONNECTING_HEADERS_MSGS
    


    instagibbs commented at 2:55 pm on March 6, 2024:
    0        # Now try to see how many unconnecting headers we can send
    1        # before we get disconnected.  Should be MAX_NUM_UNCONNECTING_HEADERS_MSGS
    

    sipa commented at 4:44 pm on March 7, 2024:
    Fixed by deleting that code (and the corresponding test) entirely.
  5. instagibbs commented at 3:26 pm on March 6, 2024: member

    concept ACK

    going to run on a listening node with logging to test

  6. 1440000bytes commented at 6:42 pm on March 6, 2024: none
    Concept ACK
  7. brunoerg commented at 10:09 pm on March 6, 2024: contributor
    Concept ACK
  8. jess2505 approved
  9. jess2505 approved
  10. 0xB10C commented at 10:27 am on March 7, 2024: contributor

    I have been monitoring peer misbehavior with the misbehaving tracepoint from #25832. In the past weeks, I’ve seen the following misbehavior across 10 nodes:

    • header with invalid proof of work [100]: very frequent
    • mutated block [100]: these are frequent. I was running #29412 where we disconnect for each mutated block (compared to only disconnecting for mutated blocks we hadn’t seen before)
    • invalid header received [10]: these are infrequent (“Sending us more than 8 headers in a single headers message”)
    • headers message size = X [20]: only seen one occurrence (“Sending us more than 2000 headers in a single headers message that does not connect to our block tree”)

    I’ve looked at a few of these invalid header received misbehaviors and they all came from the same IP address with a (fake?) v26.0 Bitcoin Core user agent and always claiming their start height to be 153681 (in the version msg). Seems ok to disconnect these?

    I’ll keep an eye on further misbehavior.

  11. Sjors commented at 11:04 am on March 7, 2024: member

    due to interaction between BIP130 and variations in system clocks (the max 2 hour in the future rule)

    Perhaps we can tolerate this if the block is less than 10 hours in the future and disconnect if it’s more? It seems cheap enough to process slightly-too-far-in-the-future headers, and they are enormously expensive to produce.

    (for other reviewers, look for MAX_NUM_UNCONNECTING_HEADERS_MSGS to see where this is handled)

    The peer might also be telling us about a reorg longer than MAX_BLOCKS_TO_ANNOUNCE = 8. In UpdatedBlockTip we:

    limit announcements in case of a huge reorganization. Rely on the peer’s synchronization mechanism in that case.

    So we should not disconnect the first time around.

    The HandleFewUnconnectingHeaders documentation should probably mention this.

    https://github.com/Sjors/bitcoin/commit/fad8c5be6e05091e409186ea73e683a987e3b324 illustrates how we could limit this to just once (MAX_NUM_UNCONNECTING_HEADERS_MSGS = 2), since > 8 block reorgs should be rare. If we were to immedidately disconnect (MAX_NUM_UNCONNECTING_HEADERS_MSGS = 1 then step 5 of p2p_sendheaders.py fails). This doesn’t account for the time difference issue. And in any case it can be done in a followup. And of course we wouldn’t need an integer, but could instead call it m_chicken_little.

  12. Sjors commented at 3:00 pm on March 7, 2024: member

    https://github.com/Sjors/bitcoin/commit/00a7dccc1c6957ce5a6a00d5322a25f949956e6d is one approach of dealing with the time difference issue. If any of the headers are > MAX_FUTURE_BLOCK_TIME * 10 in the future we disconnect with time-too-new. We could perhaps increase the number to allow for even more incorrect clocks, but I’m not sure how useful that is.

    Some of the comments in that commit may be spurious; I can’t think of a useful attack where a peer (e.g. Bob) sends us headers, three hours in the future, with fake PoW. All Bob would achieve is that we ask another peer (Caroll) Bob for the missing headers, which Caroll either won’t have, or she does we disconnect her he either sends, for which we disconnect. Or he doesn’t send and tries again, at which point we disconnect him.

    I also don’t know if if it’s even worth checking if a header is in the far future, but at least it makes for more useful log entries? (it’s nice to know the difference between a peer with a broken clock and a peer that falsely announces a large reorg)

  13. in src/net_processing.cpp:1836 in 1b6cf1f456 outdated
    1832@@ -1833,8 +1833,7 @@ bool PeerManagerImpl::MaybePunishNodeForBlock(NodeId nodeid, const BlockValidati
    1833         return true;
    1834     // Conflicting (but not necessarily invalid) data or different policy:
    1835     case BlockValidationResult::BLOCK_MISSING_PREV:
    1836-        // TODO: Handle this much more gracefully (10 DoS points is super arbitrary)
    1837-        if (peer) Misbehaving(*peer, 10, message);
    1838+        if (peer) Misbehaving(*peer, 100, message);
    


    Sjors commented at 4:23 pm on March 7, 2024:

    This is only triggered by AcceptBlockHeader, which typically happens after if(!headers_connect_blockindex) causes an early return in ProcessHeadersMessage.

    In other words, disconnecting here does not get in the way of HandleUnconnectingHeaders processing.

    Presumably (I didn’t check) this can be reached if we get an unsolicited BLOCK message. In which case it seems fine to disconnect.

    It can also be reached when using the submitblock(header) RPC, but in that case there is no peer.


    sipa commented at 5:44 pm on March 7, 2024:
    This was my impression as well, that it can only be triggered by unsolicited block messages, not through headers logic.
  14. in src/net_processing.cpp:2535 in 1b6cf1f456 outdated
    2531@@ -2533,7 +2532,7 @@ bool PeerManagerImpl::CheckHeadersPoW(const std::vector<CBlockHeader>& headers,
    2532 
    2533     // Are these headers connected to each other?
    2534     if (!CheckHeadersAreContinuous(headers)) {
    2535-        Misbehaving(peer, 20, "non-continuous headers sequence");
    2536+        Misbehaving(peer, 100, "non-continuous headers sequence");
    


    Sjors commented at 4:32 pm on March 7, 2024:

    I don’t know why we ever tolerated this, but it’s been that way since headers-first was added in #4468. Perhaps to avoid a network split in the case where headers-first had an implementation bug?

    Note that the getheaders message is older, see https://github.com/bitcoin/bitcoin/pull/4468/commits/341735eb8f42e898cf9d4d130709471e5d01abe2#r17027597

  15. sipa force-pushed on Mar 7, 2024
  16. sipa commented at 4:40 pm on March 7, 2024: member
    I discussed this with @sdaftuar, @mzumsande, @sr-gi in person, and decided to instead just drop the Misbehavior score for unconnecting headers. The rationale is that this misbehavior was originally intended to prevent bandwidth wastage due to very broken (but likely non-malicious) nodes actually observed on the network that respond to GETHEADERS with a response unrelated to the request, triggering a request cycle. This has however largely been addressed by #25454, which limits the number of (not responded to) GETHEADERS requests in flight to one, with a 2 minute timeout. With that, the specific misbehavior is largely irrelevant (for inbound peers, it is now mostly harmless; for outbound peers, the eviction logic will eventually kick them out if they’re not keeping up with headers at all).
  17. in src/net_processing.cpp:3784 in aee3b67d43 outdated
    3780@@ -3782,7 +3781,7 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
    3781 
    3782         if (vAddr.size() > MAX_ADDR_TO_SEND)
    3783         {
    3784-            Misbehaving(*peer, 20, strprintf("%s message size = %u", msg_type, vAddr.size()));
    3785+            Misbehaving(*peer, 100, strprintf("%s message size = %u", msg_type, vAddr.size()));
    


    Sjors commented at 4:48 pm on March 7, 2024:
    I believe this limit was introduced a decade ago in #5419, at least on the sender side.
  18. in src/net_processing.cpp:3866 in aee3b67d43 outdated
    3862@@ -3864,7 +3863,7 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
    3863         vRecv >> vInv;
    3864         if (vInv.size() > MAX_INV_SZ)
    3865         {
    3866-            Misbehaving(*peer, 20, strprintf("inv message size = %u", vInv.size()));
    3867+            Misbehaving(*peer, 100, strprintf("inv message size = %u", vInv.size()));
    


    Sjors commented at 4:55 pm on March 7, 2024:
    The maximum size of an inv message has be 50,000 bytes since Satoshi sneaked in that limit in e2c2648c14f4b87d331dbc30f0f2bd4aab9ce7e6.
  19. in src/net_processing.cpp:4653 in aee3b67d43 outdated
    4649@@ -4651,7 +4650,7 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
    4650         // Bypass the normal CBlock deserialization, as we don't want to risk deserializing 2000 full blocks.
    4651         unsigned int nCount = ReadCompactSize(vRecv);
    4652         if (nCount > MAX_HEADERS_RESULTS) {
    4653-            Misbehaving(*peer, 20, strprintf("headers message size = %u", nCount));
    4654+            Misbehaving(*peer, 100, strprintf("headers message size = %u", nCount));
    


    Sjors commented at 4:59 pm on March 7, 2024:
    This limit has been in place since #4468
  20. Sjors commented at 5:00 pm on March 7, 2024: member

    I looked at each of the limits in aee3b67d43eea083d7299840dd11a61d4cf797b9 to confirm they’re plenty-old enough to just disconnect anyone that breaks them.

    Will look at the newly pushed commits later.

  21. in src/net_processing.cpp:543 in dee6ca279c outdated
    542@@ -545,7 +543,7 @@ class PeerManagerImpl final : public PeerManager
    543      * Increment peer's misbehavior score. If the new value >= DISCOURAGEMENT_THRESHOLD, mark the node
    


    mzumsande commented at 6:26 pm on March 7, 2024:
    Doc needs an update.

    sipa commented at 7:09 pm on March 12, 2024:
    Done.
  22. in src/net_processing.cpp:1772 in dee6ca279c outdated
    1780-    }
    1781+    peer.m_should_discourage = true;
    1782 
    1783-    LogPrint(BCLog::NET, "Misbehaving: peer=%d (%d -> %d)%s%s\n",
    1784-             peer.m_id, score_before, score_now, warning, message_prefixed);
    1785+    LogPrint(BCLog::NET, "Misbehaving: peer=%d (0 -> 100) DISCOURAGE THRESHOLD EXCEEDED%s\n",
    


    mzumsande commented at 6:29 pm on March 7, 2024:
    Could be simplified to stop mentioning a “threshold” and a score that doesn’t exist anymore.

    sr-gi commented at 6:24 pm on March 11, 2024:
    Does 100 even have a meaning anymore?

    sipa commented at 7:09 pm on March 12, 2024:
    Gone.
  23. Sjors commented at 7:36 pm on March 7, 2024: member

    Reviewed dee6ca279cac97e167c82b0b2a61d2d2f8df1d47

    I’ll run the log commit for a while.

    I’m still a bit on the fence about a903bba4d65720265659c6d4dae6cd8d9ec89c0f.

    With that, the specific misbehavior is largely irrelevant (for inbound peers, it is now mostly harmless; …

    Some context here for other reviewers, when you look at SelectNodeToEvict one of the criteria that prevents eviction is if they sent us novel blocks. This means that peers that keep sending us useless headers, will probably not enjoy this protection and eventually get replaced. Unless they also give use useful data, in which case they’re … useful - and we can just humor their headers shenaningans.

    … for outbound peers, the eviction logic will eventually kick them out if they’re not keeping up with headers at all).

    IIUC outbound peer eviction is handled in PeerManagerImpl::ConsiderEviction, which loses its patience after CHAIN_SYNC_TIMEOUT + HEADERS_RESPONSE_TIME. That seems a bit late though, if all our outbound peers are slow? Even if that’s rare, it seems better to have as many “fit” outbound peers as possible.

    Also… It might still be useful, even if not needed for security, to know if a peer sent us disconnecting headers. Although the NET log still provides a log message received header %s: missing prev block %s, sending getheaders (%d) to end (peer=%d), we don’t know the outcome.

    Tracking if the peer did this before, e.g. with something like m_chicken_little, would make this easier. But not sure if it’s worth the complexity. It seems all this would do is reveal that someone tried an attack that we believe to be useless.

  24. sipa commented at 8:39 pm on March 7, 2024: member

    @Sjors It’s worth taking into account that despite its history as originally being a “DoS score”, the discouragement logic isn’t particularly good at preventing intentional abuse of resources; scoring systems are inherently easy to bypass: just do enough to keep the score acceptable, and be as wasteful otherwise (or even better, if many IP addresses are available, use several). The only real solution, in my opinion, for actual DoS protection is keeping track of how many resources (bandwidth, CPU, I/O, …) we spend on behalf of every peer, and whenever those resources become a bottleneck for us, throttle the worst peers.

    So the purpose for the discouragement logic as I see it is avoiding wasting resources on (unintentionally) broken peers. These are somewhat distinct from useless peers (which is what the outbound eviction logic is aimed to prevent). There is an infinite amount of potential broken behavior one can imagine, and clearly we cannot have code for all of it.

    Another reason for the misbehavior detection is to help protocol implementers get a good idea for expectations around parts of the protocol that aren’t formally specified (because it triggers disconnections, and possibly logging). And arguably, unconnecting headers isn’t a protocol violation, at least in the context of BIP130 announcements, so disconnecting for something that’s very rare but not impossible to trigger in what we’d consider a perfectly compliant protocol implementation would be confusing.

  25. in src/net_processing.cpp:611 in 7712c5b2d7 outdated
    609@@ -610,7 +610,7 @@ class PeerManagerImpl final : public PeerManager
    610     /** Deal with state tracking and headers sync for peers that send the
    611      * occasional non-connecting header (this can happen due to BIP 130 headers
    


    mzumsande commented at 4:33 pm on March 11, 2024:
    doc could be updated, the function now deals with non-connecting headers in general.

    sipa commented at 7:52 pm on March 11, 2024:
    I think the comment is still accurate? Happy to take suggestions for what to change in it, though.

    mzumsande commented at 8:03 pm on March 11, 2024:
    maybe just drop the “occasional” (“that send non-connecting headers”), but no big deal either way.

    sipa commented at 7:09 pm on March 12, 2024:
    Done.
  26. in src/net_processing.cpp:134 in a903bba4d6 outdated
    128@@ -129,8 +129,6 @@ static constexpr double BLOCK_DOWNLOAD_TIMEOUT_BASE = 1;
    129 static constexpr double BLOCK_DOWNLOAD_TIMEOUT_PER_PEER = 0.5;
    130 /** Maximum number of headers to announce when relaying blocks with headers message.*/
    131 static const unsigned int MAX_BLOCKS_TO_ANNOUNCE = 8;
    132-/** Maximum number of unconnecting headers announcements before DoS score */
    133-static const int MAX_NUM_UNCONNECTING_HEADERS_MSGS = 10;
    


    mzumsande commented at 5:15 pm on March 11, 2024:
    The commit message of a903bba4d65720265659c6d4dae6cd8d9ec89c0f says that #25454 has largely addressed the problem of a request cycle, but I don’t understand why: If we receive an unconnecting header, we send a GetHeaders. But if our peer is broken in a way that they ignore the contents of our locator and send us unconnecting headers again, we clear m_last_getheaders_timestamp, call ProcessHeadersMessage and immediately send a GetHeaders again (we wold only delay that if m_last_getheaders_timestamp hadn’t been cleared) - and after this PR we wouldn’t discourage the peer anymore so that this could go on forever. How did #25454 address that?

    sipa commented at 7:59 pm on March 11, 2024:
    Oh, I was under the mistaken assumption that m_last_getheaders_timestamp was only cleared when a connecting header was received. I think that’s fine to change here: a response to a requested GETHEADERS will always be connecting.

    sr-gi commented at 2:07 pm on March 12, 2024:
    Does this mean m_last_getheaders_timestamp clearing should be moved under ProcessHeadersMessage conditionally to headers_connect_blockindex being true?

    sipa commented at 7:11 pm on March 12, 2024:

    Done in a separate new first commit. It turned out to be somewhat more complicated as there are 3 distinct conditions that must be treated as responses:

    • empty headers messages (they don’t announce anything, so they cannot be announcements)
    • continuations of low-work header syncs (they’re a response to a GETHEADERS, but do not connect to header tree!).
    • valid headers messages that connect to our header tree

    Sjors commented at 12:44 pm on March 22, 2024:
    Glad you caught this @mzumsande.
  27. mzumsande commented at 5:18 pm on March 11, 2024: contributor
    Concept ACK, a few doc nits and one question below.
  28. in src/net_processing.cpp:4804 in dee6ca279c outdated
    4487@@ -4529,7 +4488,7 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
    4488                 ReadStatus status = partialBlock.InitData(cmpctblock, vExtraTxnForCompact);
    4489                 if (status == READ_STATUS_INVALID) {
    4490                     RemoveBlockRequest(pindex->GetBlockHash(), pfrom.GetId()); // Reset in-flight state in case Misbehaving does not result in a disconnect
    


    sr-gi commented at 8:08 pm on March 11, 2024:
    Is this required anymore? Flagging as misbehaving will result in disconnecting now

    mzumsande commented at 4:30 pm on March 12, 2024:
    yes, because some peers (e.g. ones with special permissions) don’t get disconnected.
  29. in src/test/denialofservice_tests.cpp:362 in dee6ca279c outdated
    359     BOOST_CHECK(nodes[0]->fDisconnect);
    360     // [1] is not discouraged/disconnected yet.
    361     BOOST_CHECK(!banman->IsDiscouraged(addr[1]));
    362     BOOST_CHECK(!nodes[1]->fDisconnect);
    363-    peerLogic->UnitTestMisbehaving(nodes[1]->GetId(), 1); // [1] reaches discouragement threshold
    364+    peerLogic->UnitTestMisbehaving(nodes[1]->GetId()); // [1] reaches discouragement threshold
    


    sr-gi commented at 8:09 pm on March 11, 2024:
    Looks like this comment needs updating since there is no threshold anymore

    sipa commented at 7:11 pm on March 12, 2024:
    Fixed.
  30. in src/net_processing.cpp:1932 in aee3b67d43 outdated
    1827@@ -1828,8 +1828,7 @@ bool PeerManagerImpl::MaybePunishNodeForBlock(NodeId nodeid, const BlockValidati
    1828         return true;
    1829     // Conflicting (but not necessarily invalid) data or different policy:
    


    sr-gi commented at 8:29 pm on March 11, 2024:
    Is this comment accurate? I don’t see how it applies

    sipa commented at 7:08 pm on March 12, 2024:
    Heh, this seems to be a leftover comment that hasn’t applied in years. This code used to handle both block and transaction validation failures, but transactions were moved elsewhere. It seems unrelated to this PR, so I won’t touch it here.

    maflcko commented at 12:49 pm on June 28, 2024:
    @sr-gi Do you want to follow-up on this?

    sr-gi commented at 3:09 pm on June 28, 2024:

    I took a look at this to see if it would make sense to remove the comment, but looks like this is still consistent with the original approach, even after splitting the logic in two for txs and blocks:

    https://github.com/bitcoin/bitcoin/commit/ef54b486d5333dfc85c56e6b933c81735196a25d

    To me, it seems that, at the very least, this comment would need to be moved one case down, given BlockValidationResult::BLOCK_MISSING_PREV triggers misbehavior, so the comment doesn’t apply there.

  31. in src/net_processing.cpp:1954 in dee6ca279c outdated
    1834@@ -1848,7 +1835,7 @@ bool PeerManagerImpl::MaybePunishNodeForTx(NodeId nodeid, const TxValidationStat
    1835         break;
    1836     // The node is providing invalid data:
    1837     case TxValidationResult::TX_CONSENSUS:
    1838-        if (peer) Misbehaving(*peer, 100, "");
    1839+        if (peer) Misbehaving(*peer, "");
    


    sr-gi commented at 9:07 pm on March 11, 2024:
    Is there a reason why we don’t even signal this as a consensus discrepancy? Is it logged somewhere else?

    sipa commented at 7:04 pm on March 12, 2024:
    This is the network processing layer, which acts in response to validation failures (to the extent that it needs to punish peers for it). The actual validation failure should be logged through the validation code itself, see ConnectTip() there, which logs failures, and invokes the BlockChecked signal that is handled by net_processing.

    sr-gi commented at 2:58 pm on March 22, 2024:
    I see. The message passed to Misbehaving ranges in how explicit it is though (from the actual reason to the category, without really specifying what, e.g. “invalid compact block”). That’s why I was expecting this to at least signal the reason is due to a consensus discrepancy
  32. sr-gi commented at 3:54 pm on March 12, 2024: member

    Concept ACK. I left some comments inline, plus some general ones that may be addressable with the change of functionality:

    • Looks like banmah.h also needs updating, given there is no such thing as how much a peer misbehaves anymore:

      0(line 37) // 2. Discouragement. If a peer misbehaves enough (see Misbehaving() in
      1// net_processing.cpp)
      
    • p2p_filter.py and p2p_invalid_messages.py are asserting logs looking for Misbehaving. Shouldn’t it be better now to just check that the peer was disconnected in those cases? e.g:

      0self.log.info('Check that too large filter is rejected')
      1with self.nodes[0].assert_debug_log(['Misbehaving']):
      2    filter_peer.send_and_ping(msg_filterload(data=b'\xbb'*(MAX_BLOOM_FILTER_SIZE+1)))
      
  33. sipa force-pushed on Mar 12, 2024
  34. sipa commented at 7:20 pm on March 12, 2024: member

    Addressed most of the feedback I’ve gotten.

    As pointed out by @mzumsande in #29575 (review), the protection I imagined existed against buggy clients causing a getheaders/headers cycle didn’t actually exist. I’ve instead added that as a separate improvement, becoming the new first commit. @sr-gi I have not addressed the p2p_filter.py and p2p_invalid_messages.py using misbehavior logging. These tests use whitelisting to prevent disconnection, and I think reworking that isn’t necessarily an improvement.

  35. Sjors commented at 9:43 am on March 22, 2024: member

    I ran a slightly outdated version of this PR with the log commit e451338fa705a0a91b34f3fc53b8b921b1da9c63 for the last two weeks on a decently connected node. Only a single MISBEHAVE log entry:

    0168782-2024-03-07T15:58:10.608477Z Saw new header hash=00000000000000000002c88981de747ae237d80b74a5deb1a17943a76b296e3a height=833594 peer=1501 peeraddr=x.x.x.x:35630
    1168948-2024-03-07T15:58:10.947732Z UpdateTip: new best=00000000000000000002c88981de747ae237d80b74a5deb1a17943a76b296e3a height=833594 version=0x2319e000 log2_work=94.778914 tx=973821560 date='2024-03-07T15:57:26Z' progress=1.000000 cache=16.8MiB(112927txo)
    2169198:2024-03-07T15:58:19.055670Z MISBEHAVE: invalid header
    

    This invalid header was sent 9 seconds after receiving a new valid block.

    I’ll continue to run with the latest version…


    ACK e7ccaa01e4e0ef502d5727367baa54bcdcf1fb83

  36. DrahtBot requested review from instagibbs on Mar 22, 2024
  37. DrahtBot requested review from sr-gi on Mar 22, 2024
  38. DrahtBot requested review from mzumsande on Mar 22, 2024
  39. DrahtBot requested review from brunoerg on Mar 22, 2024
  40. sipa requested review from sdaftuar on Mar 22, 2024
  41. sdaftuar commented at 6:54 pm on April 4, 2024: member
    Concept and code review ACK. I haven’t done any testing (nor did I carefully review the tests), just read through the code and thought about the overall logical flow, which makes sense to me.
  42. Sjors commented at 8:00 am on April 5, 2024: member
    It’s been two weeks and I don’t see a single MISBEHAVE log entry.
  43. mzumsande commented at 4:49 pm on April 5, 2024: contributor
    Code Review ACK e7ccaa01e4e0ef502d5727367baa54bcdcf1fb83
  44. willcl-ark commented at 8:01 pm on April 6, 2024: member

    Concept ACK.

    IIUC this would not quite be enough to close down #9530 as it would not permit a header building on top of an invalid block with a valid header, as proposed there. Perhaps what is proposed here is superior though and ce238dd79bd3a1190a2229fac4502fa5742fc6e3 is enough to close that issue out if this goes in?

  45. in test/functional/p2p_sendheaders.py:79 in 78515e1857 outdated
    76    1. Announce a header that doesn't connect.
    77       Expect: getheaders message
    78    2. Send headers chain.
    79       Expect: getdata for the missing blocks, tip update.
    80-b. Then send 9 more headers that don't connect.
    81+b. Then send 100 more headers that don't connect.
    


    sr-gi commented at 6:41 pm on April 15, 2024:

    in: 78515e185755f5eba951917b8e2314ec69446d7d

    nit: isn’t this 99? (for i in range(1, NUM_HEADERS))


    sipa commented at 12:39 pm on May 30, 2024:
    Done.
  46. in test/functional/p2p_sendheaders.py:562 in ce238dd79b outdated
    554@@ -555,9 +555,13 @@ def test_nonnull_locators(self, test_node, inv_node):
    555             height += 1
    556 
    557         for i in range(1, MAX_NUM_UNCONNECTING_HEADERS_MSGS):
    558-            # Send a header that doesn't connect, check that we get a getheaders.
    559             with p2p_lock:
    560                 test_node.last_message.pop("getheaders", None)
    561+            # Send an empty header as failed response to the receive getheaders, to
    562+            # make marked responded to. New headers are not treated as announcements
    563+            # otherwise.
    


    sr-gi commented at 6:53 pm on April 15, 2024:

    in: ce238dd79bd3a1190a2229fac4502fa5742fc6e3

    nit: I found this wording a bit hard to follow, especially the “to make marked responded to” bit.

    IIUC, this is sending an empty block as a response to the outstanding getheaders request from the previous loop iteration, otherwise sending blocks[i] would be treated as the reply instead. Is that it? If so, I’d suggest something along the lines of:

    0            # Send an empty header as a failed response to the received getheaders
    1            # (from the previous iteration). Otherwise, the new headers will be
    2            # treated as a response instead of as an announcement.
    

    sipa commented at 12:39 pm on May 30, 2024:
    Done.
  47. in src/net_processing.cpp:2735 in ce238dd79b outdated
    2627@@ -2628,6 +2628,8 @@ bool PeerManagerImpl::IsContinuationOfLowWorkHeadersSync(Peer& peer, CNode& pfro
    2628 {
    2629     if (peer.m_headers_sync) {
    2630         auto result = peer.m_headers_sync->ProcessNextHeaders(headers, headers.size() == MAX_HEADERS_RESULTS);
    2631+        // If it is a valid continuation, we should treat the existing getheaders request as responded to.
    2632+        if (result.success) peer.m_last_getheaders_timestamp = {};
    


    sr-gi commented at 7:41 pm on April 15, 2024:

    in: ce238dd79bd3a1190a2229fac4502fa5742fc6e3

    I think the comment 6 lines below is inconsistent now:

    0// It should be impossible for the getheaders request to fail,
    1// because we should have cleared the last getheaders timestamp
    2// when processing the headers that triggered this call.
    3[...]
    

    IIUC, result.request_more cannot be true if result.success is false (based on my understanding of HeadersSyncState::ProcessNextHeaders), therefore we will clear m_last_getheaders_timestamp right before entering the request_more scope.

    I think this should also mean that there shouldn’t be a way of bypassing this check (as the second part of the comment used to suggest), so the logging can be simplified


    sipa commented at 12:40 pm on May 30, 2024:
    Nice catch, indeed. I’ve simplified the branches and comments, and added a few Assumes to document the expectations.
  48. sr-gi commented at 7:51 pm on April 15, 2024: member
    A couple of doc nits and a question below
  49. DrahtBot requested review from sr-gi on Apr 15, 2024
  50. DrahtBot added the label Needs rebase on Apr 25, 2024
  51. net_processing: do not treat non-connecting headers as response
    Since https://github.com/bitcoin/bitcoin/pull/25454 we keep track of the last
    GETHEADERS request that was sent and wasn't responded to. So far, every incoming
    HEADERS message is treated as a response to whatever GETHEADERS was last sent,
    regardless of its contents.
    
    This commit makes this tracking more accurate, by only treating HEADERS messages
    which (1) are empty, (2) connect to our existing block header tree, or (3) are a
    continuation of a low-work headers sync as responses that clear the "outstanding
    GETHEADERS" state (m_last_getheaders_timestamp).
    
    That means that HEADERS messages which do not satisfy any of the above criteria
    will be ignored, not triggering a GETHEADERS, and potentially (for now, but see
    later commit) increase misbehavior score.
    9f66ac7cf1
  52. net_processing: drop Misbehavior for unconnecting headers
    This misbehavior was originally intended to prevent bandwidth wastage due to
    actually observed very broken (but likely non-malicious) nodes that respond
    to GETHEADERS with a response unrelated to the request, triggering a request
    cycle.
    
    This has however largely been addressed by the previous commit, which causes
    non-connecting HEADERS that are received while a GETHEADERS has not been
    responded to, to be ignored, as long as they do not time out (2 minutes).
    With that, the specific misbehavior is largely irrelevant (for inbound peers,
    it is now harmless; for outbound peers, the eviction logic will eventually
    kick them out if they're not keeping up with headers at all).
    944c54290d
  53. net_processing: drop 8 headers threshold for incoming BIP130
    With the Misbehavior score gone for non-connecting headers (see previous
    commit), there is no need to only treat headers messages with up to 8
    headers as potential BIP130 announcements. BIP130 does not specify such
    a limit; it was purely a heuristic.
    5120ab1478
  54. net_processing: make all Misbehaving increments = 100
    This removes the need to actually track misbehavior score (see further commit), because any
    Misbehaving node will immediately hit the discouragement threshold.
    6457c31197
  55. net_processing: remove Misbehavior score and increments
    This is now all unused.
    ae60d485da
  56. sipa force-pushed on May 30, 2024
  57. DrahtBot removed the label Needs rebase on May 30, 2024
  58. DrahtBot added the label CI failed on May 30, 2024
  59. DrahtBot commented at 4:36 pm on May 30, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/25603357473

  60. DrahtBot removed the label CI failed on May 31, 2024
  61. sr-gi commented at 2:18 pm on June 6, 2024: member

    ACK ae60d48

    Something orthogonal to this, but that I realized working on the same part of the codebase and I wonder if it could be added here:

    Looks like the result of MaybePunishNodeForBlock is never consumed in the codebase, plus the docs regarding what to return seem not to follow what the code is doing (if peer is not found, Misbehaving is not called, but the method returns true).

    Should we just refactor the method to simply return void?

  62. DrahtBot requested review from Sjors on Jun 6, 2024
  63. DrahtBot requested review from mzumsande on Jun 6, 2024
  64. DrahtBot requested review from willcl-ark on Jun 6, 2024
  65. net_processing: make MaybePunishNodeFor{Block,Tx} return void 6eecba475e
  66. sipa commented at 5:51 pm on June 6, 2024: member
    @sr-gi Done, in a new commit, for both MaybePunishNodeForBlock and MaybePunishNodeForTx.
  67. sr-gi commented at 6:52 pm on June 6, 2024: member
    ACK 6eecba4
  68. mzumsande commented at 5:57 pm on June 17, 2024: contributor
    Code Review ACK 6eecba475efd025eb011400af58621ad5823994e
  69. glozow commented at 11:49 am on June 19, 2024: member
    light code review / concept ACK 6eecba475efd025eb011400af58621ad5823994e
  70. achow101 commented at 5:20 pm on June 20, 2024: member
    ACK 6eecba475efd025eb011400af58621ad5823994e
  71. achow101 merged this on Jun 20, 2024
  72. achow101 closed this on Jun 20, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-11 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me