test: Fix intermittent issue in p2p_handshake.py #29898

pull stratospher wants to merge 1 commits into bitcoin:master from stratospher:2024-04-addpeerid changing 1 files +1 −0
  1. stratospher commented at 8:16 am on April 17, 2024: contributor

    When establishing outbound connections [TestNode ——–> P2PConnection], P2PConnection listens for a single connection from TestNode on a port which is fixed based on p2p_idx.

    If we reuse the same port when disconnecting and establishing connections again, we might hit this scenario where:

    • disconnection is done on python side for P2PConnection
    • disconnection not complete on c++ side for TestNode
    • we’re trying to establish a new connection on same port again

    Prevent this scenario from happening by ensuring disconnection on c++ side for TestNode as well.

    One way to reproduce this on master would be adding a sleep statement before disconnection happens on c++ side.

     0diff --git a/src/net.cpp b/src/net.cpp
     1index e388f05b03..62507d1f39 100644
     2--- a/src/net.cpp
     3+++ b/src/net.cpp
     4@@ -2112,6 +2112,7 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
     5                 if (!pnode->fDisconnect) {
     6                     LogPrint(BCLog::NET, "socket closed for peer=%d\n", pnode->GetId());
     7                 }
     8+                std::this_thread::sleep_for(std::chrono::milliseconds(1000));
     9                 pnode->CloseSocketDisconnect();
    10             }
    11             else if (nBytes < 0)
    
  2. DrahtBot commented at 8:16 am on April 17, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK mzumsande, BrandonOdiwuor, theStack, maflcko, glozow
    Stale ACK alfonsoromanz

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label Tests on Apr 17, 2024
  4. DrahtBot added the label CI failed on Apr 17, 2024
  5. alfonsoromanz approved
  6. alfonsoromanz commented at 12:07 pm on April 17, 2024: none

    Tested ACK b199b323840cf4978bda3437eee36c3dd228702a

    I was able to reproduce this issue by using the sleep statement.

    Test execution before this fix (failed):

    Test execution after this fix (passed):

  7. theStack commented at 4:06 pm on April 17, 2024: contributor
    Concept ACK, thanks for fixing!
  8. DrahtBot removed the label CI failed on Apr 17, 2024
  9. in test/functional/p2p_handshake.py:50 in b199b32384 outdated
    46@@ -47,6 +47,7 @@ def test_desirable_service_flags(self, node, service_flag_tests, desirable_servi
    47            service flags in the VERSION message. The test is exercised for all relevant
    48            outbound connection types where the desirable service flags check is done."""
    49         CONNECTION_TYPES = ["outbound-full-relay", "block-relay-only", "addr-fetch"]
    50+        peer_id = 0
    


    mzumsande commented at 8:13 pm on April 17, 2024:
    I think it would be more stable to have a global counter, so that peer_id is never reused. Currently, it is still reused over different test_desirable_service_flags calls. E.g., if I add the net sleep and also change CONNECTION_TYPES = ["outbound-full-relay"] so that test_desirable_service_flags only makes one connection, I run into the timeout even with your fix.

    stratospher commented at 5:19 am on April 18, 2024:

    makes sense, less diff too. but peer_id is bounded to [0, 11]/p2p_idx = peer_id + 1 is bounded to [1, 12] here. so we’d cross that limit if we do global counters.

    EDIT: wondering why there are limits - looks like there weren’t limits before: https://github.com/bitcoin/bitcoin/commit/fabbf6bd62c1d8a290841b63fb1e5acec42ba3b0#diff-ca0d4d4c4cfe461c71633bbc358b2f38ef4aa1351f0e461c28b570928c9503eb. I guess it’s because each P2PConnection for obvious reasons can accept only 1 connection from TestNode and there could only be 12 possible simultaneous outbound connections among all TestNodes in a python test.


    mzumsande commented at 12:06 pm on April 18, 2024:
    oh - I didn’t know about that, interesting! Another thing we could do is to not only do peer.wait_for_disconnect() but also wait until the bitcoind side has disconnected - something like self.wait_until(lambda: len(self.nodes[0].getpeerinfo()) == X). Haven’t tried that though.

    maflcko commented at 1:38 pm on April 18, 2024:

    EDIT: wondering why there are limits

    There needs to be a limit, because each test gets allocated the full (maximum) range of ports for the duration of the test_runner (that is for the duration of all tests, not the test itself) to avoid port collisions. Thus, the limit needs to exist, because there is a finite number of ports on any system.


    stratospher commented at 6:57 pm on April 18, 2024:

    Another thing we could do is to not only do peer.wait_for_disconnect() but also wait until the bitcoind side has disconnected - something like self.wait_until(lambda: len(self.nodes[0].getpeerinfo()) == X). Haven’t tried that though.

    yeah this sounds like a better option! updated the PR to use this approach.

    (that is for the duration of all tests, not the test itself) to avoid port collisions.

    oh makes sense, a TIL!

  10. mzumsande commented at 8:24 pm on April 17, 2024: contributor
    Concept ACK
  11. stratospher force-pushed on Apr 18, 2024
  12. test: Fix intermittent issue in p2p_handshake.py
    If we reuse the same port when disconnecting and establishing connections
    again, we might hit this scenario:
    - disconnection is done on python side for P2PConnection
    - disconnection is not complete on c++ side for TestNode
    - we're trying to establish a new connection on same port again
    
    Prevent this scenario from happening by ensuring disconnection on c++
    side for TestNode as well.
    6b02c11d66
  13. stratospher force-pushed on Apr 18, 2024
  14. mzumsande commented at 2:27 pm on April 19, 2024: contributor
    Tested ACK 6b02c11d667adff24daf611f9b14815d27963674
  15. DrahtBot requested review from alfonsoromanz on Apr 19, 2024
  16. DrahtBot requested review from theStack on Apr 19, 2024
  17. BrandonOdiwuor approved
  18. BrandonOdiwuor commented at 3:19 pm on April 19, 2024: contributor
    Tested ACK 6b02c11d667adff24daf611f9b14815d27963674
  19. in test/functional/p2p_handshake.py:44 in 6b02c11d66
    40@@ -41,6 +41,7 @@ def add_outbound_connection(self, node, connection_type, services, wait_for_disc
    41             peer.sync_with_ping()
    42             peer.peer_disconnect()
    43             peer.wait_for_disconnect()
    44+        self.wait_until(lambda: len(node.getpeerinfo()) == 0)
    


    maflcko commented at 3:33 pm on April 19, 2024:
    I think this could still be racy? For example, the CNode may still exist, but the Peer may have been deleted already, in which case no peer info is returned, but the CNode would still cause the test to crash?

    mzumsande commented at 3:46 pm on April 19, 2024:
    Are you sure? DisconnectNodes() deletes the node first from m_nodes (pushing an entry to m_nodes_disconnected), and only after that deletes the Peer (in FinalizeNode()), so I can’t see how we can have a CNode but no Peer.

    maflcko commented at 2:42 pm on April 22, 2024:
    Ah, you are right. Sorry for the wrong comment.
  20. theStack approved
  21. theStack commented at 12:45 pm on April 22, 2024: contributor
    Tested ACK 6b02c11d667adff24daf611f9b14815d27963674
  22. maflcko commented at 2:43 pm on April 22, 2024: member
    lgtm ACK 6b02c11d667adff24daf611f9b14815d27963674
  23. maflcko commented at 2:43 pm on April 22, 2024: member
    Fixes #29896
  24. glozow commented at 4:01 pm on April 22, 2024: member
    ACK 6b02c11d667adff24daf611f9b14815d27963674
  25. glozow merged this on Apr 22, 2024
  26. glozow closed this on Apr 22, 2024

  27. achow101 commented at 4:07 pm on April 22, 2024: member
    ACK 6b02c11d667adff24daf611f9b14815d27963674
  28. stratospher deleted the branch on Apr 22, 2024
  29. dasibcryptoidology approved

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-29 04:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me