test: adds outbound eviction functional tests, updates comment in ConsiderEviction #29122

pull sr-gi wants to merge 2 commits into bitcoin:master from sr-gi:2023-12-20-eviction-tests changing 3 files +261 −4
  1. sr-gi commented at 4:00 PM on December 20, 2023: member

    Motivation

    While checking the outbound eviction code I realized a case was not considered within the comments, which in turn made me realize we had no functional tests for the outbound eviction case (when I went to check/add the test case).

    This PR updates the aforementioned comment and adds functional tests to cover the outbound eviction logic, in addition to the existing unit tests found at src/test/denialofservice_tests.cpp.

  2. DrahtBot commented at 4:00 PM on December 20, 2023: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK davidgumberg, cbergqvist, tdb3, achow101
    Concept ACK sipa, stratospher, marcofleon

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    No conflicts as of last run.

  3. DrahtBot added the label Tests on Dec 20, 2023
  4. sr-gi commented at 4:02 PM on December 20, 2023: member

    While the tests seem to run fine, I noticed a warning that appears every now and then:

    TestFramework.p2p (WARNING): Connection lost to 0:0 due to [Errno 54] Connection reset by peer
    

    I'm not sure if something unintended is happening under the hood. Disconnections are expected, but it's our node who is supposed to trigger them, so maybe I'm missing something.

  5. sr-gi force-pushed on Dec 20, 2023
  6. DrahtBot added the label CI failed on Dec 20, 2023
  7. sr-gi force-pushed on Dec 20, 2023
  8. sr-gi force-pushed on Dec 20, 2023
  9. sr-gi renamed this:
    test: adds outbound eviction functional tests, updated comment on ConsiderEviction
    test: adds outbound eviction functional tests, updates comment in ConsiderEviction
    on Dec 20, 2023
  10. DrahtBot removed the label CI failed on Dec 20, 2023
  11. sr-gi force-pushed on Dec 20, 2023
  12. in src/net_processing.cpp:5101 in 9ba568c816 outdated
    5097 | @@ -5098,9 +5098,10 @@ void PeerManagerImpl::ConsiderEviction(CNode& pto, Peer& peer, std::chrono::seco
    5098 |                  state.m_chain_sync.m_sent_getheaders = false;
    5099 |              }
    5100 |          } else if (state.m_chain_sync.m_timeout == 0s || (state.m_chain_sync.m_work_header != nullptr && state.pindexBestKnownBlock != nullptr && state.pindexBestKnownBlock->nChainWork >= state.m_chain_sync.m_work_header->nChainWork)) {
    5101 | -            // Our best block known by this peer is behind our tip, and we're either noticing
    5102 | -            // that for the first time, OR this peer was able to catch up to some earlier point
    5103 | -            // where we checked against our tip.
    5104 | +            // Either:
    


    naumenkogs commented at 9:34 AM on December 22, 2023:

    I've spent a ridiculous amount of time understanding what's going on here, and I'm still far from being 100% sure. Perhaps if you touch this comment, you can expand it even further? E.g., connecting to each of the conditions in else if


    sr-gi commented at 3:32 PM on December 22, 2023:

    Looks like I'm not the only one who took a minute to get this 😅

    I wouldn't mind expanding the comment if there is am overall agreement that it is incomplete


    sr-gi commented at 6:48 PM on January 3, 2024:

    Rebased and updated the comment. I also extended the test slightly to cover all possible cases

  13. sr-gi force-pushed on Jan 3, 2024
  14. sr-gi force-pushed on Jan 3, 2024
  15. DrahtBot added the label CI failed on Jan 3, 2024
  16. sr-gi force-pushed on Jan 3, 2024
  17. DrahtBot removed the label CI failed on Jan 3, 2024
  18. sr-gi force-pushed on Jan 4, 2024
  19. DrahtBot added the label CI failed on Jan 15, 2024
  20. sr-gi force-pushed on Jan 29, 2024
  21. sr-gi commented at 10:06 PM on January 29, 2024: member

    Rebased to fix CI

  22. DrahtBot removed the label CI failed on Jan 29, 2024
  23. DrahtBot added the label CI failed on Feb 6, 2024
  24. sr-gi force-pushed on Feb 6, 2024
  25. sr-gi commented at 2:55 PM on February 6, 2024: member

    Rebased. This has failed CI once when waiting for the disconnection after:

    cur_mock_time += (HEADERS_RESPONSE_TIME + 1)
    

    I've tried to reproduce it but I cannot in my local setup, the only thing that comes to mind is that mock time may be too close to the timeout (only one second ahead), so I gave it a delta of 10 seconds instead.

    PS: For context, you can check the failure here: https://cirrus-ci.com/task/4624744360706048?logs=ci#L6520. See how the peer is given the final notice by sending a getheaders message, but the check is performed before the peer gets disconnected.

  26. sr-gi force-pushed on Feb 6, 2024
  27. DrahtBot removed the label CI failed on Feb 6, 2024
  28. in src/net_processing.cpp:5167 in edc14bc9da outdated
    5131 | @@ -5132,16 +5132,19 @@ void PeerManagerImpl::ConsiderEviction(CNode& pto, Peer& peer, std::chrono::seco
    5132 |          // unless it's invalid, in which case we should find that out and
    5133 |          // disconnect from them elsewhere).
    5134 |          if (state.pindexBestKnownBlock != nullptr && state.pindexBestKnownBlock->nChainWork >= m_chainman.ActiveChain().Tip()->nChainWork) {
    


    tdb3 commented at 11:44 PM on February 10, 2024:

    I could be missing something, but a test case covering the reset of eviction timeout when an outbound peer sends us a valid block with more chainwork (i.e. the timeout set to 0 to prevent eviction, and m_work_header/m_sent_getheaders also reinitialized) was not observed. Do you think it's worth adding?


    sr-gi commented at 3:28 PM on February 12, 2024:

    tdb3 commented at 4:12 PM on February 12, 2024:

    Correct. >= rather than >.

  29. in test/functional/p2p_eviction.py:134 in edc14bc9da outdated
     128 | @@ -122,6 +129,178 @@ def run_test(self):
     129 |          self.log.debug("{} protected peers: {}".format(len(protected_peers), protected_peers))
     130 |          assert evicted_peers[0] not in protected_peers
     131 |  
     132 | +        node.disconnect_p2ps()
     133 | +
     134 | +    def test_outbound_eviction_unprotected(self):
    


    tdb3 commented at 11:51 PM on February 10, 2024:

    Added notes below to provide insight into what was reviewed:

    If eviction does not occur when it should, then wait_for_disconnect() will raise (AssertionError, originally from wait_until_helper_internal()). Confirmed by temporarily setting cur_mock_time to a value significantly lower than CHAIN_SYNC_TIMEOUT.

    Similarly, if unexpected eviction occurs (e.g. in the "keep catching up" case), then send_and_ping() or sync_with_ping() will raise IOError (from send_raw_message()). Confirmed by temporarily inserting a call to disconnect_p2ps().

  30. in test/functional/p2p_eviction.py:222 in edc14bc9da outdated
     217 | +        self.log.info("Test that the peer does not get evicted")
     218 | +        test_node.sync_with_ping()
     219 | +
     220 | +        node.disconnect_p2ps()
     221 | +
     222 | +    def test_outbound_eviction_protected(self):
    


    tdb3 commented at 12:00 AM on February 11, 2024:

    Added notes below to provide insight into what was reviewed:

    In test_outbound_eviction_protected() if unexpected eviction occurs (i.e. the P2P connection is no longer present), then sync_with_ping() will raise (IOError('Not connected'), causing the test to fail. Moved node.disconnect_p2ps() above the last test_node.sync_with_ping() call and, as expected, this occurred.

    nit: The heading comment in the test_outbound_eviction_protected() omits that a protected peer is also non-block-only, but this is implicit in the call to add_outbound_p2p_connection() using outbound-full-relay as the connection_type.

    net_processing.cpp:

    A peer is marked as protected if all of these are true:
     - its connection type is IsBlockOnlyConn() == false
     - it gave us a valid connecting header
     - we haven't reached MAX_OUTBOUND_PEERS_TO_PROTECT_FROM_DISCONNECT yet
     - its chain tip has at least as much work as ours
    

    sr-gi commented at 3:44 PM on February 12, 2024:

    Protected peer actually implies outbound-full-relay. The comment you mentioned may be outdated. I may add an update to it to the PR.

    https://github.com/bitcoin/bitcoin/blob/edc14bc9da332cae39f8803db559b532b3c74e16/src/net_processing.cpp#L2875-L2885

  31. in test/functional/p2p_eviction.py:252 in edc14bc9da outdated
     247 | +        self.log.info("Test that the node does not get evicted")
     248 | +        test_node.sync_with_ping()
     249 | +
     250 | +        node.disconnect_p2ps()
     251 | +
     252 | +    def test_outbound_eviction_protected_mixed(self):
    


    tdb3 commented at 12:03 AM on February 11, 2024:

    nit: I could be missing something, but a test case covering the limit of protected outbound peers (i.e. MAX_OUTBOUND_PEERS_TO_PROTECT_FROM_DISCONNECT) was not observed. Do you think it's worth adding?

    Thinking out loud: MAX_OUTBOUND_PEERS_TO_PROTECT_FROM_DISCONNECT = 4, so maybe adding another peer that would normally be considered protected and seeing that it is not protected could test this?


    sr-gi commented at 3:58 PM on February 12, 2024:

    The limit is tested in this test. We create 8 connections, from which only 4 are protected. If you mean that there are no unprotected peers that behave properly, that could be added. e.g.

    4 protected, send initial headers, do nothing, don't get evicted 2 unprotected, keep in sync, don't get evicted 2 unprotected, do not keep up, get evicted


    tdb3 commented at 4:28 PM on February 12, 2024:

    Thank you, yes, I should have been clearer. The updated test_outbound_eviction_protected_mixed in 91005f7459993e1f7139e46b53eef1bb04860a7c looks great (includes 4 protected, 4 unprotected (2 honest, 2 misbehaving by sending no headers or old)). ACK 91005f7459993e1f7139e46b53eef1bb04860a7c

  32. tdb3 commented at 12:06 AM on February 11, 2024: contributor

    Excellent work finding a gap and filling it by increasing functional test robustness. ACK for edc14bc9da332cae39f8803db559b532b3c74e16

    Checked out the PR branch, built, and ran all functional tests (all passed).

    The review was focused on verifying edge case coverage of the tests and that tests will accurately catch both pass and fail conditions.

    Did not observe previously reported warning message:

    TestFramework.p2p (WARNING): Connection lost to 0:0 due to [Errno 54] Connection reset by peer
    
  33. sr-gi commented at 4:00 PM on February 12, 2024: member

    Thanks for the review @tdb3

  34. sr-gi commented at 4:22 PM on February 12, 2024: member

    I redefined how peers are split in p2p_eviction:test_outbound_eviction_protected_mixed to address #29122 (review). Both commits can be squashed (leaving them as they are for now in case someone does not agree with the approach)

  35. sr-gi force-pushed on Feb 12, 2024
  36. sr-gi commented at 9:31 PM on February 12, 2024: member

    Rebased. This has failed CI once when waiting for the disconnection after:

    cur_mock_time += (HEADERS_RESPONSE_TIME + 1)
    

    I've tried to reproduce it but I cannot in my local setup, the only thing that comes to mind is that mock time may be too close to the timeout (only one second ahead), so I gave it a delta of 10 seconds instead.

    PS: For context, you can check the failure here: https://cirrus-ci.com/task/4624744360706048?logs=ci#L6520. See how the peer is given the final notice by sending a getheaders message, but the check is performed before the peer gets disconnected.

    I've been looking more into this, given an error showed up again on CI. I think the issue arose from the initial mock time not being properly set. I've reset the increments to be just one second apart, plus set the initial mock time before the tests were run.

    Also, I've moved the outbound eviction tests to their own file, given p2p_eviction was being described as testing inbound eviction, plus used some custom classes that were only relevant for inbounds.

  37. sr-gi force-pushed on Feb 12, 2024
  38. sr-gi force-pushed on Feb 12, 2024
  39. DrahtBot commented at 9:42 PM on February 12, 2024: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    <sub>Debug: https://github.com/bitcoin/bitcoin/runs/21495911628</sub>

  40. DrahtBot added the label CI failed on Feb 12, 2024
  41. sr-gi force-pushed on Feb 12, 2024
  42. DrahtBot removed the label CI failed on Feb 12, 2024
  43. tdb3 commented at 12:41 AM on February 13, 2024: contributor

    Rebased. This has failed CI once when waiting for the disconnection after:

    cur_mock_time += (HEADERS_RESPONSE_TIME + 1)
    

    I've tried to reproduce it but I cannot in my local setup, the only thing that comes to mind is that mock time may be too close to the timeout (only one second ahead), so I gave it a delta of 10 seconds instead. PS: For context, you can check the failure here: https://cirrus-ci.com/task/4624744360706048?logs=ci#L6520. See how the peer is given the final notice by sending a getheaders message, but the check is performed before the peer gets disconnected.

    I've been looking more into this, given an error showed up again on CI. I think the issue arose from the initial mock time not being properly set. I've reset the increments to be just one second apart, plus set the initial mock time before the tests were run.

    Also, I've moved the outbound eviction tests to their own file, given p2p_eviction was being described as testing inbound eviction, plus used some custom classes that were only relevant for inbounds.

    (now non-stale) ACK for f06b50dfd45000f34bf258d48101718b4b5f9772. Pulled. Built. Ran p2p_eviction.py and p2p_outbound_eviction.py. Both passed.

  44. in test/functional/p2p_outbound_eviction.py:84 in f06b50dfd4 outdated
      79 | +        # Test that if the peer never catches up with our current tip, but it does with the
      80 | +        # expected work that we set when setting the timer (that is, our tip at the time)
      81 | +        # we do not disconnect the peer
      82 | +        test_node = node.add_outbound_p2p_connection(P2PInterface(), p2p_idx=0, connection_type="outbound-full-relay")
      83 | +        self.log.info("Mine a block so our peer starts lagging")
      84 | +        best_block_hash = self.generateblock(self.nodes[0], output="raw(42)", transactions=[])["hash"]
    


    cbergqvist commented at 1:22 PM on February 22, 2024:

    self.nodes[0] => node here and in a few other places.

  45. in test/functional/p2p_outbound_eviction.py:67 in f06b50dfd4 outdated
      62 | +
      63 | +        # Connect a peer and make it send us headers ending in our tip's parent
      64 | +        test_node = node.add_outbound_p2p_connection(P2PInterface(), p2p_idx=0, connection_type="outbound-full-relay")
      65 | +        headers_message = msg_headers()
      66 | +        headers_message.headers = [prev_header]
      67 | +        test_node.send_and_ping(headers_message)
    


    cbergqvist commented at 1:22 PM on February 22, 2024:

    Nit: Why not make these more succinct, as in

           test_node.send_and_ping(msg_headers([prev_header]))
    

    ? There's no risk of raising exceptions from the first 2 lines.


    sr-gi commented at 8:34 PM on February 28, 2024:

    You're right, I didn't realize that the msg_headers constructor accepted an optional headers param

  46. in test/functional/p2p_outbound_eviction.py:185 in f06b50dfd4 outdated
     180 | +        for i in range(2):
     181 | +            peer = node.add_outbound_p2p_connection(P2PInterface(), p2p_idx=4+i, connection_type="outbound-full-relay")
     182 | +            peer.send_and_ping(headers_message)
     183 | +            honest_unprotected_peers.append(peer)
     184 | +
     185 | +        unprotected_peers_misbehaving = []
    


    cbergqvist commented at 1:28 PM on February 22, 2024:

    Suggestion: rename unprotected_peers_misbehaving => misbehaving_unprotected_peers to match honest_unprotected_peers.

  47. in test/functional/p2p_outbound_eviction.py:71 in f06b50dfd4 outdated
      67 | +        test_node.send_and_ping(headers_message)
      68 | +
      69 | +        # Trigger the timeouts
      70 | +        cur_mock_time += (CHAIN_SYNC_TIMEOUT + 1)
      71 | +        node.setmocktime(cur_mock_time)
      72 | +        test_node.sync_with_ping()
    


    cbergqvist commented at 9:43 PM on February 22, 2024:

    It's a bit opaque to me that .sync_with_ping() doesn't sync test_node all the way up to the tip of node. What is preventing that from happening?


    sr-gi commented at 8:52 PM on February 28, 2024:

    From the sync_with_ping docs/comments:

    # Sending two pings back-to-back, requires that the node calls
    # `ProcessMessage` twice, and thus ensures `SendMessages` must have
    # been called at least once
    

    For the sake of this test, this means that we are triggering the outbound eviction checks given we mock the node time right before the call to be past the check timeout


    cbergqvist commented at 8:56 AM on February 29, 2024:

    Yeah, I saw that comment. Your explanation for the context of this test helps clarify it, thanks!

    Calling "sync" (synchronize) on a bitcoin node and having it get the highest tip from it's peers would be less surprising. Maybe "update" or "process" would be better verbs but let's not rename it in this PR.

  48. in test/functional/p2p_outbound_eviction.py:70 in f06b50dfd4 outdated
      68 | +
      69 | +        # Trigger the timeouts
      70 | +        cur_mock_time += (CHAIN_SYNC_TIMEOUT + 1)
      71 | +        node.setmocktime(cur_mock_time)
      72 | +        test_node.sync_with_ping()
      73 | +        cur_mock_time += (HEADERS_RESPONSE_TIME + 1)
    


    cbergqvist commented at 9:48 PM on February 22, 2024:

    How about also verifying whether MaybeSendGetHeaders() is actually sent (or not) from node to test_node by PeerManagerImpl::ConsiderEviction() in relevant cases?


    sr-gi commented at 9:05 PM on February 28, 2024:

    That is already being tested implicitly by sync_with_ping() + wait + check for the disconnection.

    By mocking a time passed CHAIN_SYNC_TIMEOUT and syncing (and not getting an error) we test that the connection has not been dropped, therefore node must have followed to send the get headers message.

    I'll add it for clarity anyway

  49. cbergqvist commented at 10:08 PM on February 22, 2024: contributor

    Reviewed f06b50d. Appreciate the additional test coverage!

  50. cbergqvist commented at 10:15 PM on February 22, 2024: contributor

    Couldn't reproduce the CI timeout issue btw. But I have encountered the warning:

    2024-02-22T22:01:47.354000Z TestFramework.p2p (WARNING): Connection lost to 0:0 due to [Errno 104] Connection reset by peer
    
  51. in src/net_processing.cpp:5141 in f06b50dfd4 outdated
    5141 | -        } else if (state.m_chain_sync.m_timeout == 0s || (state.m_chain_sync.m_work_header != nullptr && state.pindexBestKnownBlock != nullptr && state.pindexBestKnownBlock->nChainWork >= state.m_chain_sync.m_work_header->nChainWork)) {
    5142 | -            // Our best block known by this peer is behind our tip, and we're either noticing
    5143 | -            // that for the first time, OR this peer was able to catch up to some earlier point
    5144 | -            // where we checked against our tip.
    5145 | -            // Either way, set a new timeout based on current tip.
    5146 | +        } else if (state.m_chain_sync.m_timeout == 0s || (state.m_chain_sync.m_work_header != nullptr && state.pindexBestKnownBlock != nullptr && state.pindexBestKnownBlock->nChainWork >= state.m_chain_sync.m_work_header->nChainWork)) {\
    


    marcofleon commented at 12:30 AM on February 23, 2024:

    nit: It looks like the backslash at the end of this line is a typo, yes? Could be removed for clarity.


    sr-gi commented at 9:06 PM on February 28, 2024:

    Nice catch!


    cbergqvist commented at 10:03 PM on February 28, 2024:

    I assumed it was some super-rare syntax for making tooling understand the comment below applied to the line before it. :D

  52. marcofleon commented at 12:55 AM on February 23, 2024: contributor

    ACK. Well done on the new test. Seems like a good number of cases are included. Only thing I commented on was the backslash nit in net_processing.cpp.

    No issues or warnings when built and tested. Tests p2p_eviction.py and p2p_outbound_eviction.py ran successfully.

  53. sr-gi force-pushed on Feb 28, 2024
  54. sr-gi commented at 9:11 PM on February 28, 2024: member

    Thanks for the reviews @cbergqvist and @marcofleon, I've addressed the comments of both.

    I also squashed the two commits from @tdb3 given looks like no one ha opposed to those

  55. cbergqvist approved
  56. cbergqvist commented at 8:58 AM on February 29, 2024: contributor

    ACK bffbb17!

    (Diffed test/functional/p2p_outbound_eviction.py against previous implementation and ran it a couple of times).

  57. DrahtBot requested review from marcofleon on Feb 29, 2024
  58. DrahtBot requested review from tdb3 on Feb 29, 2024
  59. DrahtBot removed review request from marcofleon on Feb 29, 2024
  60. DrahtBot removed review request from tdb3 on Feb 29, 2024
  61. DrahtBot requested review from tdb3 on Feb 29, 2024
  62. DrahtBot requested review from marcofleon on Feb 29, 2024
  63. DrahtBot removed review request from tdb3 on Feb 29, 2024
  64. DrahtBot removed review request from marcofleon on Feb 29, 2024
  65. DrahtBot requested review from tdb3 on Feb 29, 2024
  66. DrahtBot requested review from marcofleon on Feb 29, 2024
  67. davidgumberg commented at 7:02 AM on March 1, 2024: contributor

    What do you think about testing that block-relay-only nodes get evicted and aren't eligible for protection?

    For example:

    # Ensure that block-relay-only peers are not granted protection and ARE evicted
    self.log.info("Create an outbound connection to a peer that shares our tip but is block-relay only, so it does not get protection")
    test_node = node.add_outbound_p2p_connection(P2PInterface(), p2p_idx=1, connection_type="block-relay-only")
    
    test_node.send_and_ping(msg_headers([from_hex(CBlockHeader(), node.getblockheader(node.getbestblockhash(), False))]))
    
    self.log.info("Mine a new block and sync with our peer")
    self.generateblock(node, output="raw(42)", transactions=[])
    test_node.sync_with_ping()
    
    self.log.info("Let enough time pass for the timeouts to go off")
    # Trigger the timeouts and check how we are still connected
    cur_mock_time += (CHAIN_SYNC_TIMEOUT + 1)
    node.setmocktime(cur_mock_time)
    test_node.sync_with_ping()
    test_node.wait_for_getheaders()
    cur_mock_time += (HEADERS_RESPONSE_TIME + 1)
    node.setmocktime(cur_mock_time)
    self.log.info("Test that the block-relay-only peer gets evicted")
    test_node.wait_for_disconnect()
    

    Otherwise, reviewed ConsiderEviction and the tests added here provide good coverage. I ran the p2p_outbound_eviction.py tests and they passed, I was unable to reproduce the connection reset mentioned above.

    ACK https://github.com/bitcoin/bitcoin/pull/29122/commits/bffbb176d8a4f545c26ff7e437ab33cb004852bb

  68. DrahtBot removed review request from tdb3 on Mar 1, 2024
  69. DrahtBot removed review request from marcofleon on Mar 1, 2024
  70. DrahtBot requested review from marcofleon on Mar 1, 2024
  71. DrahtBot requested review from tdb3 on Mar 1, 2024
  72. DrahtBot removed review request from marcofleon on Mar 1, 2024
  73. DrahtBot removed review request from tdb3 on Mar 1, 2024
  74. DrahtBot requested review from tdb3 on Mar 1, 2024
  75. DrahtBot requested review from marcofleon on Mar 1, 2024
  76. DrahtBot removed review request from tdb3 on Mar 1, 2024
  77. DrahtBot removed review request from marcofleon on Mar 1, 2024
  78. DrahtBot requested review from tdb3 on Mar 1, 2024
  79. DrahtBot requested review from marcofleon on Mar 1, 2024
  80. tdb3 commented at 12:32 PM on March 1, 2024: contributor

    re ACK bffbb176d8a4f545c26ff7e437ab33cb004852bb

  81. DrahtBot removed review request from tdb3 on Mar 1, 2024
  82. DrahtBot removed review request from marcofleon on Mar 1, 2024
  83. DrahtBot requested review from marcofleon on Mar 1, 2024
  84. sr-gi commented at 3:55 PM on March 1, 2024: member

    What do you think about testing that block-relay-only nodes get evicted and aren't eligible for protection?

    This was already pointed out by @tdb3 in his review: #29122 (review)

    TL;DR; Protected peers imply outbound-full-relay, but given this seems to be a recurring question I'll add it to the test. I guess it wouldn't hurt having it to make sure we don't change the logic without noticing it.

  85. DrahtBot removed review request from marcofleon on Mar 1, 2024
  86. DrahtBot requested review from marcofleon on Mar 1, 2024
  87. sr-gi force-pushed on Mar 1, 2024
  88. sr-gi force-pushed on Mar 1, 2024
  89. sr-gi commented at 4:19 PM on March 1, 2024: member

    I've added a test to cover for @tdb3 and @davidgumberg comments on non-outbound-full-relay peers eviction, plus amended the original commit to rename test_outbound_eviction_mixed and added a missing comment to the method.

  90. DrahtBot commented at 5:59 PM on March 1, 2024: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    <sub>Debug: https://github.com/bitcoin/bitcoin/runs/22178847360</sub>

  91. DrahtBot added the label CI failed on Mar 1, 2024
  92. sr-gi force-pushed on Mar 1, 2024
  93. DrahtBot removed the label CI failed on Mar 1, 2024
  94. in test/functional/p2p_outbound_eviction.py:244 in 0af28b019a outdated
     233 | +        node.setmocktime(cur_mock_time)
     234 | +        self.log.info("Test that the peer gets evicted")
     235 | +        test_node.wait_for_disconnect()
     236 | +
     237 | +
     238 | +    def run_test(self):
    


    marcofleon commented at 9:11 PM on March 1, 2024:

    I could be missing something here but the new test test_outbound_eviction_blocks_relay_only isn't included in run_test. It does look good from just reading it though :)


    sr-gi commented at 9:51 PM on March 1, 2024:

    You're right, this may have been removed while rebasing. Nice catch!

    This should be fixed now

  95. sr-gi force-pushed on Mar 1, 2024
  96. marcofleon approved
  97. marcofleon commented at 1:27 AM on March 2, 2024: contributor

    re-ACK 33353b7723d0704d64c0dfc1b0f5d072692e68b9. I've reviewed the new changes, focusing on the added negative test. I think it's a valuable addition, expanding the test's coverage of different scenarios in the network. I built and ran p2p_outbound_eviction multiple times and observed no issues or warnings.

    Well done on addressing the feedback and enhancing the test suite.

  98. DrahtBot requested review from tdb3 on Mar 2, 2024
  99. DrahtBot requested review from cbergqvist on Mar 2, 2024
  100. in test/functional/p2p_outbound_eviction.py:213 in 33353b7723 outdated
     206 | @@ -207,12 +207,40 @@ def test_outbound_eviction_mixed(self):
     207 |          for peer in misbehaving_unprotected_peers:
     208 |              peer.wait_for_disconnect()
     209 |  
     210 | +        node.disconnect_p2ps()
    


    davidgumberg commented at 2:05 AM on March 2, 2024:

    (feel free to disregard) nit: This should be in the previous commit


    sr-gi commented at 2:24 PM on March 21, 2024:

    I don't think that's the case. In the previous commit this was the last test so there was no need to manually disconnect the nodes. This is only needed after the last commit to start fresh


    davidgumberg commented at 10:50 PM on April 25, 2024:

    (still nit) On second thought it seems a bad idea to depend on the order in which tests are called for deciding whether or not to perform cleanup. I think that might be kind of a footgun for anyone modifying the test in the future (more relevant for test_outbound_eviction_blocks_relay_only)


    sr-gi commented at 3:18 PM on April 26, 2024:

    Ok, agreed

  101. DrahtBot removed review request from tdb3 on Mar 2, 2024
  102. DrahtBot removed review request from cbergqvist on Mar 2, 2024
  103. DrahtBot requested review from cbergqvist on Mar 2, 2024
  104. DrahtBot requested review from tdb3 on Mar 2, 2024
  105. DrahtBot requested review from davidgumberg on Mar 2, 2024
  106. cbergqvist approved
  107. cbergqvist commented at 12:14 PM on March 2, 2024: contributor

    ACK 33353b7.

    Diffed p2p_outbound_eviction.py against previously acked version.

    Ran the new version a few times, switched the node type from block-relay-only to outbound-full-relay and confirmed that wait_for_disconnect() times out (decreased timeout to 10s locally, but should still be more than enough).

  108. DrahtBot removed review request from tdb3 on Mar 2, 2024
  109. DrahtBot removed review request from davidgumberg on Mar 2, 2024
  110. DrahtBot requested review from tdb3 on Mar 2, 2024
  111. DrahtBot requested review from davidgumberg on Mar 2, 2024
  112. davidgumberg commented at 1:26 AM on March 9, 2024: contributor

    ACK https://github.com/bitcoin/bitcoin/pull/29122/commits/33353b7723d0704d64c0dfc1b0f5d072692e68b9

    Tested by modifying bool IsOutboundOrBlockRelayConn(), and CHAIN_SYNC_TIMEOUT and this test catches. It would be nice if we also had a test that made sure we don't punish peers and set the timeout to HEADERS_RESPONSE_TIME before we've given them CHAIN_SYNC_TIMEOUT to catch up, but I think this PR looks great as it stands!

  113. DrahtBot removed review request from tdb3 on Mar 9, 2024
  114. DrahtBot removed review request from davidgumberg on Mar 9, 2024
  115. DrahtBot requested review from tdb3 on Mar 9, 2024
  116. DrahtBot added the label Needs rebase on Mar 11, 2024
  117. sr-gi force-pushed on Mar 11, 2024
  118. sr-gi commented at 3:11 PM on March 11, 2024: member

    Rebased to deal with merge conflicts

  119. DrahtBot removed the label Needs rebase on Mar 11, 2024
  120. DrahtBot requested review from cbergqvist on Mar 12, 2024
  121. DrahtBot requested review from marcofleon on Mar 12, 2024
  122. tdb3 commented at 10:27 PM on March 12, 2024: contributor

    re ACK for 59affd0bf9042d746a6b32739edb0cd45d28d569. Built and re-ran functional tests locally. All passed.

  123. cbergqvist approved
  124. cbergqvist commented at 8:55 AM on March 14, 2024: contributor

    re ACK 59affd0bf9042d746a6b32739edb0cd45d28d569

    Built & ran test/functional/test_runner.py --cachedir=/mnt/tmp/cache --tmpdir=/mnt/tmp successfully.

  125. marcofleon approved
  126. marcofleon commented at 12:20 AM on March 15, 2024: contributor

    re-ACK 59affd0bf9042d746a6b32739edb0cd45d28d569. Ran all functional tests, no issues.

  127. sr-gi force-pushed on Mar 15, 2024
  128. sr-gi commented at 9:15 PM on March 15, 2024: member
  129. davidgumberg commented at 12:11 PM on March 16, 2024: contributor

    reACK https://github.com/bitcoin/bitcoin/pull/29122/commits/cd5b7a11ee55b1f86bb7662c4fc83616ffa735b1

    $ git range-diff a945f09...59affd0 015ac13...cd5b7a1                                              
    1:  a12a4dfa90 = 1:  8fa272dccc test: adds outbound eviction functional tests, updates comment in ConsiderEviction     
    2:  59affd0bf9 = 2:  cd5b7a11ee test: adds outbound eviction tests for non outbound-full-relay peers 
    
    $ python ./test/functional/test_runner.py
    ALL                                                    | ✓ Passed  | 1350 s (accumulated) 
    
  130. DrahtBot requested review from cbergqvist on Mar 16, 2024
  131. DrahtBot requested review from marcofleon on Mar 16, 2024
  132. cbergqvist approved
  133. cbergqvist commented at 9:38 PM on March 16, 2024: contributor

    reACK cd5b7a11ee55b1f86bb7662c4fc83616ffa735b1

    (Diff didn't change). Built & ran functional tests.

  134. sipa commented at 4:26 PM on March 17, 2024: member

    Concept ACK

  135. tdb3 commented at 12:07 AM on March 18, 2024: contributor

    re ACK for cd5b7a11ee55b1f86bb7662c4fc83616ffa735b1. Built and re-ran functional tests locally. All passed.

  136. DrahtBot requested review from sipa on Mar 18, 2024
  137. marcofleon commented at 7:22 PM on March 18, 2024: contributor

    re-ACK cd5b7a11ee55b1f86bb7662c4fc83616ffa735b1. All functional tests passed.

  138. in test/functional/p2p_outbound_eviction.py:8 in 8fa272dccc outdated
       0 | @@ -0,0 +1,219 @@
       1 | +#!/usr/bin/env python3
       2 | +# Copyright (c) 2019-2021 The Bitcoin Core developers
       3 | +# Distributed under the MIT software license, see the accompanying
       4 | +# file COPYING or http://www.opensource.org/licenses/mit-license.php.
       5 | +
       6 | +""" Test node outbound peer eviction logic
       7 | +
       8 | +A subset of out outbound peers are subject to eviction logic if they cannot keep up
    


    stratospher commented at 2:53 AM on March 20, 2024:

    8fa272d: typo - s/out/our

  139. in test/functional/p2p_outbound_eviction.py:13 in 8fa272dccc outdated
       8 | +A subset of out outbound peers are subject to eviction logic if they cannot keep up
       9 | +with our vision of the best chain. This criteria applies only to non-protected peers,
      10 | +and can be triggered by either not learning about any blocks from an outbound peer after
      11 | +a certain deadline, or by them not being able to catch up fast enough (under the same deadline).
      12 | +
      13 | +This tests the different eviction paths based on the peer' behavior and on whether they are protected
    


    stratospher commented at 2:54 AM on March 20, 2024:

    8fa272d: typo - peer's

  140. in test/functional/p2p_outbound_eviction.py:28 in 8fa272dccc outdated
      23 | +from test_framework.p2p import (
      24 | +    P2PInterface,
      25 | +)
      26 | +from test_framework.test_framework import BitcoinTestFramework
      27 | +
      28 | +# Timeouts (in minutes)
    


    stratospher commented at 2:56 AM on March 20, 2024:

    8fa272d: maybe remove minutes? since multiplying with 60 means seconds.


    sr-gi commented at 7:53 AM on March 25, 2024:

    I'll change it for seconds 🤦

  141. in test/functional/p2p_outbound_eviction.py:44 in 8fa272dccc outdated
      39 | +        node = self.nodes[0]
      40 | +        cur_mock_time = node.mocktime
      41 | +
      42 | +        self.log.info("Create an outbound connection and don't send any headers")
      43 | +        # Test disconnect due to no block being announced in 22+ minutes (headers are not even exchanged)
      44 | +        test_node = node.add_outbound_p2p_connection(P2PInterface(), p2p_idx=0, connection_type="outbound-full-relay")
    


    stratospher commented at 3:06 AM on March 20, 2024:

    8fa272d: can we call test_node variable peer instead? add_outbound_p2p_connection returns P2PInterface objects and not a TestNode object like test_node implies.


    sr-gi commented at 7:56 AM on March 25, 2024:

    Yeah, I see your point. The rationale was that it was the node being tested, but that's indeed confusing given TestNode, I'll rename it to peer.

  142. in test/functional/p2p_outbound_eviction.py:46 in 8fa272dccc outdated
      41 | +
      42 | +        self.log.info("Create an outbound connection and don't send any headers")
      43 | +        # Test disconnect due to no block being announced in 22+ minutes (headers are not even exchanged)
      44 | +        test_node = node.add_outbound_p2p_connection(P2PInterface(), p2p_idx=0, connection_type="outbound-full-relay")
      45 | +        # Wait for over 20 min to trigger the first eviction timeout. This sets the last call past 2 min in the future.
      46 | +        test_node.sync_with_ping()
    


    stratospher commented at 3:12 AM on March 20, 2024:

    8fa272d: add_outbound_p2p_connection calls sync_with_ping() internally. do we need to do it again?


    sr-gi commented at 8:02 AM on March 25, 2024:

    I don't think we do


    stratospher commented at 8:09 AM on March 25, 2024:

    this line - but not a blocker, maybe the test is more consistent to read this way.


    sr-gi commented at 12:51 PM on March 25, 2024:

    Oh no, I meant I don't think we need to do it again 😆

  143. in test/functional/p2p_outbound_eviction.py:99 in cd5b7a11ee outdated
      94 | +            # Send a header with the old tip
      95 | +            test_node.send_and_ping(msg_headers([tip_header]))
      96 | +            tip_header = from_hex(CBlockHeader(), node.getblockheader(best_block_hash, False))
      97 | +
      98 | +            # Check that we are not evicted
      99 | +            test_node.wait_for_getheaders()
    


    stratospher commented at 1:04 PM on March 22, 2024:

    8fa272dc: last_message needs to be explicitly cleared:

    with p2p_lock:
        test_node.last_message.pop("getheaders", None)
    

    before calling wait_for_getheaders(). otherwise, it will always pass.


    sr-gi commented at 8:06 AM on March 25, 2024:

    Ohh, TIL, I'm guessing this is because the collection is already populated with the getheaders from the interaction i-1 and we find it all the time


    stratospher commented at 8:14 AM on March 25, 2024:

    yeah! follow up idea, out of scope for this PR - it'd be awesome to modify wait_for_getheaders() to make sure that it's the same getheaders (which was sent from TestNode's MaybeSendGetHeaders) we're expecting that is received!


    stratospher commented at 12:31 PM on March 25, 2024:

    oh see #29318.


    sr-gi commented at 4:04 PM on March 25, 2024:

    Fixed it by popping from last_message, may open a followup simplifying it after #18614 gets fixed

  144. in test/functional/p2p_outbound_eviction.py:158 in 8fa272dccc outdated
     153 | +        node = self.nodes[0]
     154 | +        cur_mock_time = node.mocktime
     155 | +
     156 | +        self.log.info("Create a mix of protected and unprotected outbound connections to check against eviction")
     157 | +
     158 | +        # Lets try this logic having multiple peers, some protected and some unprotected
    


    stratospher commented at 4:58 AM on March 23, 2024:

    8fa272dc: Let's

  145. stratospher commented at 6:03 AM on March 25, 2024: contributor

    cool test! Concept ACK. left some suggestions, nits and 1 problem.

    something I found interesting was when generateblock is called - inv, getdata and block messages are exchanged between node and peer. so in the tests (ex: test_outbound_eviction_unprotected), the peer is sending the header of an older block it already knows about! I don't think it affects the behaviour in the tests. but would like to make sure that this is expected.

  146. sr-gi force-pushed on Mar 25, 2024
  147. sr-gi force-pushed on Mar 25, 2024
  148. sr-gi commented at 4:05 PM on March 25, 2024: member

    Rebased + addressed comments from @stratospher. I had to slightly reshape test_outbound_eviction_unprotected to fit #29122 (review) given some time mocks + waits were unnecessary

  149. in test/functional/p2p_outbound_eviction.py:24 in 0f5193199a outdated
      19 | +    from_hex,
      20 | +    msg_headers,
      21 | +    CBlockHeader,
      22 | +)
      23 | +from test_framework.p2p import (
      24 | +    P2PInterface, p2p_lock
    


    stratospher commented at 4:09 AM on March 26, 2024:

    0f51931: style nit - different lines

  150. in test/functional/p2p_outbound_eviction.py:46 in 0f5193199a outdated
      39 | +        node = self.nodes[0]
      40 | +        cur_mock_time = node.mocktime
      41 | +
      42 | +        self.log.info("Create an outbound connection and don't send any headers")
      43 | +        # Test disconnect due to no block being announced in 22+ minutes (headers are not even exchanged)
      44 | +        peer = node.add_outbound_p2p_connection(P2PInterface(), p2p_idx=0, connection_type="outbound-full-relay")
    


    stratospher commented at 4:41 AM on March 26, 2024:

    0f51931: I think we need to clear the initial getheaders which gets sent every time add_outbound_p2p_connection is done too. this makes test_outbound_eviction_protected fail :(


    sr-gi commented at 10:03 AM on March 26, 2024:

    Yeah, I noticed that while patching your last comments for test_outbound_eviction_unprotected (which already has that change applied). I'm assessing how easy would be to supersede #29318 so this PR does not become super verbose and hacky


    sr-gi commented at 11:44 AM on March 27, 2024:

    #29736

    Will rebase on top of that one once it gets merged


    sr-gi commented at 3:16 PM on March 27, 2024:

    I patched this locally on top of the aforementioned PR and looks like all issues are gone :)

  151. tdb3 commented at 3:09 PM on March 28, 2024: contributor

    reACK for 368389d973a85d21aa9ebf030ef5e7f7ac5343de. Pulled, built, ran all unit/functional tests (all passed).

  152. DrahtBot requested review from cbergqvist on Mar 28, 2024
  153. DrahtBot requested review from marcofleon on Mar 28, 2024
  154. DrahtBot requested review from davidgumberg on Mar 28, 2024
  155. DrahtBot requested review from stratospher on Mar 28, 2024
  156. in test/functional/p2p_outbound_eviction.py:223 in 368389d973 outdated
     217 | +        # This tests that other types of peers (blocks-relay-only for instance) are not granted protection
     218 | +        node = self.nodes[0]
     219 | +        cur_mock_time = node.mocktime
     220 | +
     221 | +        self.log.info("Create an blocks-only outbound connection to a peer that shares our tip. This would usually grant protection")
     222 | +        test_node = node.add_outbound_p2p_connection(P2PInterface(), p2p_idx=0, connection_type="block-relay-only")
    


    cbergqvist commented at 11:09 AM on April 22, 2024:

    nit: Noticed you renamed test_node -> peer above (based off #29122 (review)) but not inside test_outbound_eviction_blocks_relay_only().

    (Was diffing against older head (59affd0bf9042d746a6b32739edb0cd45d28d569)).

    Might be worth changing once you rebase on top of #29736.


    sr-gi commented at 6:16 PM on April 25, 2024:

    Nice catch!

  157. DrahtBot requested review from cbergqvist on Apr 22, 2024
  158. sr-gi force-pushed on Apr 25, 2024
  159. sr-gi force-pushed on Apr 25, 2024
  160. sr-gi commented at 6:18 PM on April 25, 2024: member

    Rebased on top of #29736. Now this is using wait_for_getheaders to reduce the boilerplate of having to manually pop the results from last_message.

    cc/ @stratospher

  161. sr-gi force-pushed on Apr 25, 2024
  162. DrahtBot added the label CI failed on Apr 25, 2024
  163. DrahtBot commented at 6:21 PM on April 25, 2024: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    <sub>Debug: https://github.com/bitcoin/bitcoin/runs/24266342152</sub>

  164. davidgumberg commented at 11:11 PM on April 25, 2024: contributor

    ACK https://github.com/bitcoin/bitcoin/pull/29122/commits/be2c5510521081119f62ef3b29f76bb9097d0f34 Re-reviewed code, and attempted to break tests by making ConsiderEviction misbehave.

    It is worth noting that these tests will not catch if outbound eviction logic gets triggered too early. e.g.:

    /** How long to wait for a peer to respond to a getheaders request */
    static constexpr auto HEADERS_RESPONSE_TIME{0s};
    // [...]
    /** Timeout for (unprotected) outbound peers to sync to our chainwork */
    static constexpr auto CHAIN_SYNC_TIMEOUT{0s};
    

    will pass. That seems fine to me, maybe something for a follow up.

  165. DrahtBot requested review from tdb3 on Apr 25, 2024
  166. DrahtBot removed the label CI failed on Apr 26, 2024
  167. test: adds outbound eviction functional tests, updates comment in ConsiderEviction a8d9a0edc7
  168. test: adds outbound eviction tests for non outbound-full-relay peers
    Peer protection is only given to outbound-full-relay peers. Add a negative
    test to check that other type of outbound peers are not given protection under
    the circumstances that outbound-full-relay would
    d53d848347
  169. sr-gi commented at 3:20 PM on April 26, 2024: member

    ACK be2c551 Re-reviewed code, and attempted to break tests by making ConsiderEviction misbehave.

    It is worth noting that these tests will not catch if outbound eviction logic gets triggered too early. e.g.:

    /** How long to wait for a peer to respond to a getheaders request */
    static constexpr auto HEADERS_RESPONSE_TIME{0s};
    // [...]
    /** Timeout for (unprotected) outbound peers to sync to our chainwork */
    static constexpr auto CHAIN_SYNC_TIMEOUT{0s};
    

    will pass. That seems fine to me, maybe something for a follow-up.

    I don't think I follow you here. HEADERS_RESPONSE_TIME and CHAIN_SYNC_TIMEOUT are constants hardcoded in net_processing.cpp (and mapped here). I would expect unit test to fail if those are updated.

  170. sr-gi force-pushed on Apr 26, 2024
  171. sr-gi commented at 3:21 PM on April 26, 2024: member

    Updated to address #29122 (review)

  172. davidgumberg commented at 11:38 PM on April 26, 2024: contributor

    reACK https://github.com/bitcoin/bitcoin/commit/d53d84834747c37f4060a9ef379e0a6b50155246

    I don't think I follow you here. HEADERS_RESPONSE_TIME and CHAIN_SYNC_TIMEOUT are constants hardcoded in net_processing.cpp (and mapped here). I would expect unit test to fail if those are updated.

    I didn't mean to say that these tests should be responsible for enforcing the particular values of HEADERS_RESPONSE_TIME and CHAIN_SYNC_TIMEOUT, just that they would not catch a regression where outbound eviction logic gets triggered earlier than expected on peers, which can be demonstrated by setting both values to zero.

    Anyways, I think if what I'm saying isn't confused in some way, it is more pedantic than useful.

  173. cbergqvist approved
  174. cbergqvist commented at 9:53 PM on April 28, 2024: contributor

    ACK d53d84834747c37f4060a9ef379e0a6b50155246

    Functional including --extended tests passed (skipped unrelated feature_dbcrash).

  175. tdb3 commented at 3:54 PM on May 5, 2024: contributor

    Re ACK for d53d84834747c37f4060a9ef379e0a6b50155246 Pulled, built, ran all unit/functional tests (all passed). Performed code review, but didn't see anything noteworthy over previous review.

  176. bitcoin deleted a comment on May 5, 2024
  177. marcofleon commented at 3:30 PM on May 9, 2024: contributor

    Re ACK. Reviewed the code and the tests look good to me. I ran p2p_outbound_eviction.py individually and along with all the other functional tests and everything looks good.

  178. achow101 commented at 8:13 PM on May 9, 2024: member

    ACK d53d84834747c37f4060a9ef379e0a6b50155246

  179. DrahtBot requested review from marcofleon on May 9, 2024
  180. achow101 merged this on May 9, 2024
  181. achow101 closed this on May 9, 2024

  182. PastaPastaPasta referenced this in commit 9db73d205c on Feb 17, 2025
  183. PastaPastaPasta referenced this in commit cc9330c98e on Feb 17, 2025
  184. bitcoin locked this on May 9, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-15 00:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me