test: Fix all races after a socket is closed gracefully #34425

pull maflcko wants to merge 2 commits into bitcoin:master from maflcko:2601-test-missing-sync changing 11 files +21 −2
  1. maflcko commented at 9:35 pm on January 27, 2026: member

    Currently, functional tests may intermittently fail, due to a lack of synchronization after a node connection socket was closed gracefully: If node A is connected to node B, and node B closes the connection, node A must wait for the connection to be closed before continuing the test. Otherwise, subsequent re-connections may not work while a stale connection is still alive.

    This can be reproduced locally via something like:

    0diff --git a/src/net.cpp b/src/net.cpp
    1index 6b79e913e8..32bd061500 100644
    2--- a/src/net.cpp
    3+++ b/src/net.cpp
    4@@ -2200,2 +2200,3 @@ void CConnman::SocketHandlerConnected(const std::vector<CNode*>& nodes,
    5                 }
    6+            UninterruptibleSleep(599ms);
    7                 pnode->CloseSocketDisconnect();
    

    With this diff, the tests should fail on master, and pass after this fix.

    The fix involves simply calling self.wait_until(lambda: len(node_a.getpeerinfo()) == 0) after each disconnect, and before each re-connection.

  2. DrahtBot added the label Tests on Jan 27, 2026
  3. DrahtBot commented at 9:35 pm on January 27, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34425.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK davidgumberg
    Stale ACK w0xlt

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

  4. in test/functional/rpc_setban.py:29 in fa652a5d63
    27@@ -28,17 +28,19 @@ def run_test(self):
    28 
    29         # Node 0 get banned by Node 1
    


    w0xlt commented at 9:52 pm on January 27, 2026:

    nit not related to PR changes

    0        # Node 0 gets banned by Node 1
    
  5. w0xlt commented at 10:00 pm on January 27, 2026: contributor
    ACK fa652a5d63d9e1855e73fa72d86c0b54f67aa0c7
  6. maflcko renamed this:
    test: Add missing sync on node0 in rpc_setban.py
    test: Fix all races after a socket is closed gracefully
    on Jan 28, 2026
  7. maflcko force-pushed on Jan 28, 2026
  8. DrahtBot added the label CI failed on Jan 28, 2026
  9. DrahtBot removed the label CI failed on Jan 28, 2026
  10. maflcko commented at 11:17 am on January 28, 2026: member
    Pushed more commits to fix all such races in all tests
  11. davidgumberg commented at 6:47 pm on January 29, 2026: contributor
    Maybe some of these commits can be squashed?
  12. maflcko force-pushed on Jan 29, 2026
  13. maflcko commented at 6:51 pm on January 29, 2026: member
    Sure, squashed all down into one commit. It touches a few files, but 16 lines modified should be trivial enough for a single commit.
  14. maflcko force-pushed on Jan 29, 2026
  15. DrahtBot added the label CI failed on Jan 29, 2026
  16. in test/functional/p2p_disconnect_ban.py:107 in fa759a637c outdated
    103@@ -104,6 +104,7 @@ def run_test(self):
    104         # Keep mocktime, to avoid ban expiry when restart takes longer than
    105         # time_remaining
    106         self.restart_node(1, extra_args=[f"-mocktime={old_time+4}"])
    107+        self.wait_until(lambda: len(self.nodes[0].getpeerinfo()) == 0)
    


    davidgumberg commented at 7:13 pm on January 29, 2026:

    I don’t think wait_until is necessary here, since node 1 has banned node0 and is not connected, e.g.:

    0        assert(len(self.nodes[0].getpeerinfo()) == 0)
    1        self.restart_node(1, extra_args=[f"-mocktime={old_time+4}"])
    2        self.wait_until(lambda: len(self.nodes[0].getpeerinfo()) == 0)
    

    passes


    maflcko commented at 7:24 pm on January 29, 2026:

    The were connected, see line 25:

    0        self.connect_nodes(0, 1)
    

    A sync is required after the disconnect and before the next connection.

    Your test diff is expected to pass in the same way that the test is passing on current master.

    Did you apply the cpp diff as well to trigger the error?


    davidgumberg commented at 8:22 pm on January 29, 2026:
    Ah I see, and confirmed that the suggested cpp diff causes an error, feel free to mark to resolved
  17. DrahtBot removed the label CI failed on Jan 29, 2026
  18. davidgumberg commented at 9:19 pm on January 29, 2026: contributor

    Concept ACK, but I think len(getpeerinfo()) is kind of a brittle way / hacky way to enforce that some node is or isn’t connected to another node.

    What do you think of adding a TestNode method that checks if a node is connected to another node by looking for the subver in getpeerinfo(), ie:

    0class TestNode():
    1    def is_connected_to(self, other):
    2        assert isinstance(other, TestNode)
    3        other_subver = other.getnetworkinfo()['subversion']
    4
    5        return any(peer["subver"] == other_subver for peer in self.getpeerinfo())
    

    And then doing:

    0- self.wait_until(lambda: len(self.nodes[0].getpeerinfo()) == 0)
    1+ self.wait_until(lambda: not self.nodes[0].is_connected_to(self.nodes[1]))
    

    alternatively, could do it using the with style

    0class TestNode(): [@contextlib](/bitcoin-bitcoin/contributor/contextlib/).contextmanager
    1    def wait_for_disconnected_from(self, other, timeout=5):
    2        """
    3        Wait until this node is disconnected from other.
    4        """
    5        yield
    6        self.wait_until(lambda: not self.is_connected_to(other), timeout=timeout)
    

    and then doing:

    0with self.nodes[0].wait_for_disconnected_from(self.nodes[1]):
    1    self.restart_node(1, extra_args=[f"-mocktime={old_time+4}"])
    
  19. test: Add is_connected_to helper
    Needed in the next commit.
    
    Co-Authored-By: David Gumberg <davidzgumberg@gmail.com>
    fafa766c95
  20. maflcko force-pushed on Jan 30, 2026
  21. test: Fix all races after a socket is closed gracefully
    This waits for the disconnect to fully happen before the next connect.
    
    Also, in rpc_setban.py:
    * Fix a typo.
    * Move the `self.restart_node(1, [])` out of the context manager,
      because it nothing from the restart needs to be checked.
    fad3df9af6
  22. maflcko force-pushed on Jan 30, 2026
  23. DrahtBot added the label CI failed on Jan 30, 2026
  24. maflcko commented at 10:29 am on January 30, 2026: member

    Ah nice. I pushed the is_connected_to helper, because it makes the code more self-documenting and easier to understand.

    Not sure about the contextmanager, because connections/disconnections are often not the primary target of the test and using contexts for all of them could be confusing, so I’ll leave this as-is for now.

  25. DrahtBot removed the label CI failed on Jan 30, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-02-02 06:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me