test: fix intermittent failure in p2p_sendtxrcncl.py #26448

pull mzumsande wants to merge 1 commits into bitcoin:master from mzumsande:202211_fix_sendtxrcncl changing 2 files +19 −15
  1. mzumsande commented at 5:06 pm on November 3, 2022: contributor

    p2p_sendtxrcncl.py currently fails intermittently in the CI, see e.g. https://cirrus-ci.com/task/5511952184115200?logs=ci#L4024

    I believe that this is related to the reuse of the parameter p2p_idx=2 of add_outbound_p2p_connection in this test: When we call peer_disconnect, we don’t wait until the node has completed the disconnection. So there is a race between setting up the next connection (next addconnection RPC), and if the old one hasn’t been removed and has an identical port like the new one (because we didn’t increment p2p_idx), CConnman::OpenNetworkConnection just returns without establishing a connection, and the test fails.

    Fix this by using distinct disconnect_p2ps instead of peer_disconnect, which waits for the disconnect to complete. We can then use the same value for p2p_idx everywhere.

  2. fanquake added the label Tests on Nov 3, 2022
  3. in test/functional/test_framework/test_node.py:630 in b1cc46d57d outdated
    624@@ -625,6 +625,9 @@ def add_outbound_p2p_connection(self, p2p_conn, *, wait_for_verack=True, p2p_idx
    625 
    626         This method adds the p2p connection to the self.p2ps list and returns
    627         the connection to the caller.
    628+
    629+        p2p_idx must be different for simultaneously connected peers. Reusing it for the next peer
    630+        after disconnecting the previous one can lead to a race condition and should be avoided.
    


    maflcko commented at 5:16 pm on November 3, 2022:
    nit: the race wouldn’t happen with disconnect_p2ps

    mzumsande commented at 8:35 pm on November 3, 2022:
    Changed the wording.
  4. in test/functional/p2p_sendtxrcncl.py:165 in b1cc46d57d outdated
    161+            SendTxrcnclReceiver(), wait_for_verack=True, p2p_idx=ob_peer_id, connection_type="outbound-full-relay")
    162         assert peer.sendtxrcncl_msg_received
    163         assert peer.sendtxrcncl_msg_received.initiator
    164         assert not peer.sendtxrcncl_msg_received.responder
    165         assert_equal(peer.sendtxrcncl_msg_received.version, 1)
    166         peer.peer_disconnect()
    


    maflcko commented at 5:17 pm on November 3, 2022:
    nit: If there is no guarantee that this will disconnect the peer (due to races), it might be better to remove it, or replace it with disconnect_p2ps, which would also avoid having to modify the index.

    mzumsande commented at 8:38 pm on November 3, 2022:
    ok, I used disconnect_p2ps everywhere instead of peer_disconnect, and also changed p2p_idx=0 everywhere for consistency - this should also work.
  5. maflcko approved
  6. maflcko commented at 5:18 pm on November 3, 2022: member
    lgtm, but I am not really a fan of writing racy tests unless racyness is the goal of the test.
  7. mzumsande force-pushed on Nov 3, 2022
  8. test: fix intermittent failure in p2p_sendtxrcncl.py
    Using disconnect_p2ps instead of peer_disconnect makes
    the node wait for the disconnect to complete. As a result,
    we can reuse p2p_idx=0 in the add_outbound_p2p_connection calls.
    74d975318a
  9. mzumsande force-pushed on Nov 3, 2022
  10. DrahtBot commented at 9:55 am on November 4, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #26359 (p2p, refactor: #23443 (Erlay support signaling) follow-ups by naumenkogs)
    • #26257 (script, test: python linter fixups and updates by jonatack)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  11. in test/functional/p2p_sendtxrcncl.py:182 in 74d975318a
    185         peer = self.nodes[0].add_outbound_p2p_connection(
    186-            PeerNoVerack(), wait_for_verack=False, p2p_idx=3, connection_type="block-relay-only")
    187+            PeerNoVerack(), wait_for_verack=False, p2p_idx=0, connection_type="block-relay-only")
    188         with self.nodes[0].assert_debug_log(["we indicated no tx relay; disconnecting"]):
    189             peer.send_message(create_sendtxrcncl_msg(initiator=False))
    190             peer.wait_for_disconnect()
    


    maflcko commented at 11:10 am on November 4, 2022:
    I think this is racy, so you’d have to call self.nodes[0].disconnect_p2ps() or use a unique index

    mzumsande commented at 2:57 pm on November 4, 2022:
    Can you explain why? We called disconnect_p2ps() before add_outbound_p2p_connection, so are guaranteed to have no peer when doing so. Within thewith block we call wait_for_disconnect (which waits until the disconnect initiated by the node is completed), so the subsequent subtest also shouldn’t be able to run before the disconnect of the current one is completed. What am I missing?

    maflcko commented at 3:02 pm on November 4, 2022:
    Oh, right. As the disconnect happens on the side of the node, there shouldn’t be a race where the node is not aware of the disconnect.
  12. in test/functional/test_framework/test_node.py:631 in 74d975318a
    624@@ -625,6 +625,10 @@ def add_outbound_p2p_connection(self, p2p_conn, *, wait_for_verack=True, p2p_idx
    625 
    626         This method adds the p2p connection to the self.p2ps list and returns
    627         the connection to the caller.
    628+
    629+        p2p_idx must be different for simultaneously connected peers. When reusing it for the next peer
    630+        after disconnecting the previous one, it is necessary to wait for the disconnect to finish to avoid
    631+        a race condition.
    


    maflcko commented at 11:11 am on November 4, 2022:
    Maybe mention that disconnect_p2ps avoids the race?

    mzumsande commented at 3:46 pm on November 4, 2022:
    I tried to describe it in more general terms because disconnect_p2ps can only be used if there is no other, unrelated connection that we need to keep.
  13. maflcko approved
  14. maflcko commented at 11:12 am on November 4, 2022: member
    thanks, however this fixes one race, but introduces two more
  15. maflcko commented at 3:02 pm on November 4, 2022: member
    review ACK 74d975318a1443aebfbcee36e331df4e54ec1fbe
  16. maflcko merged this on Nov 4, 2022
  17. maflcko closed this on Nov 4, 2022

  18. mzumsande deleted the branch on Nov 4, 2022
  19. jonatack commented at 8:55 pm on November 4, 2022: contributor
    Post-merge ACK
  20. sidhujag referenced this in commit 7d09b61ed5 on Nov 5, 2022
  21. bitcoin locked this on Nov 4, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me