test: fix v2 transport intermittent test failure (#29002) #29006

pull theStack wants to merge 1 commits into bitcoin:master from theStack:202312-test-fix_v2_transport_intermittent_failure changing 2 files +26 −8
  1. theStack commented at 11:42 pm on December 5, 2023: contributor

    This PR improves the following fragile construct for detection of a new connection to the node under test in p2p_v2_transport.py: https://github.com/bitcoin/bitcoin/blob/6d5790956f45e3de5c6c4ee6fda21878b0d1287b/test/functional/p2p_v2_transport.py#L154-L156 Only relying on the number of peers for that suffers from race conditions, as unrelated previous peers could disconnect at anytime in-between. In the test run in #29002, the following happens:

    • getpeerinfo() is called the first time -> assigned to num_peers
    • previous peer disconnects, the node’s peer count is now num_peers - 1 (in most test runs, this happens before the first getpeerinfo call)
    • new peer connects, the node’s peer count is now num_peers
    • the condition that the node’s peer count is num_peers + 1 is never true, test fails

    Use the more robust approach of watching for an increased highest peer id instead (again using the getpeerinfo RPC call), with a newly introduced context manager method TestNode.wait_for_new_peer(). Note that for the opposite case of a disconnect, no new method is introduced; this is currently used only once in the test and is also simpler.

    Still happy to take suggestions for alternative solutions.

    Fixes #29002.

  2. test: fix v2 transport intermittent test failure (#29002)
    Only relying on the number of peers for detecting a new connection
    suffers from race conditions, as unrelated previous peers could
    disconnect at anytime in-between. Use the more robust approach of
    watching for an increased highest peer id instead (again using the
    `getpeerinfo` RPC call), with a newly introduced context manager
    method `TestNode.wait_for_new_peer()`.
    
    Fixes #29009.
    00e0658e77
  3. DrahtBot commented at 11:42 pm on December 5, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK stratospher, maflcko
    Concept ACK kevkevinpal

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  4. DrahtBot added the label Tests on Dec 5, 2023
  5. kevkevinpal commented at 5:03 am on December 6, 2023: contributor
    Concept ACK 00e0658
  6. in test/functional/p2p_v2_transport.py:136 in 00e0658e77
    132@@ -133,9 +133,8 @@ def run_test(self):
    133         V1_PREFIX = MAGIC_BYTES["regtest"] + b"version\x00\x00\x00\x00\x00"
    134         assert_equal(len(V1_PREFIX), 16)
    135         with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    136-            num_peers = len(self.nodes[0].getpeerinfo())
    137-            s.connect(("127.0.0.1", p2p_port(0)))
    138-            self.wait_until(lambda: len(self.nodes[0].getpeerinfo()) == num_peers + 1)
    139+            with self.nodes[0].wait_for_new_peer():
    


    stratospher commented at 6:25 am on December 6, 2023:
    00e0658: wondering if we need to wait for previous disconnections here since the unrelated disconnections happen only with peers where we directly open the sockets to the node right? (tests above use TestNode) i like how the approach is consistent with the rest of the test though.

    theStack commented at 9:03 am on December 8, 2023:
    You’re right, in this case there shouldn’t be any previous disconnect issues and hence the current connection-count approach would also work. I still decided to also use wait_for_new_peer here for consistency reasons and it’s less code and more readable.
  7. stratospher commented at 6:32 am on December 6, 2023: contributor

    ACK 00e0658.

    cross checked node0’s debug log to verify that this getpeerinfo gets called before disconnection is complete.

    0		2023-12-05T13:26:56.774275Z [net] [net.cpp:1106] [ProcessReceivedKeyBytes] [net] V2 transport error: V1 peer with wrong MessageStart 0a03cf40
    1		2023-12-05T13:26:56.857216Z [net] [net.cpp:554] [CloseSocketDisconnect] [net] disconnecting peer=5
    2		2023-12-05T13:26:56.857322Z [http] [httpserver.cpp:306] [http_request_cb] [http] Received a POST request for / from 127.0.0.1:32828
    3num_peers 	2023-12-05T13:26:56.857568Z [httpworker.2] [rpc/request.cpp:187] [parse] [rpc] ThreadRPCServer method=getpeerinfo user=__cookie__
    4		2023-12-05T13:26:56.858580Z [net] [logging/timer.h:58] [Log] [lock] Enter: lock contention m_nodes_mutex, net.cpp:1883 started
    5		2023-12-05T13:26:56.862378Z [http] [httpserver.cpp:306] [http_request_cb] [http] Received a POST request for / from 127.0.0.1:32828
    6		2023-12-05T13:26:56.862523Z [net] [logging/timer.h:58] [Log] [lock] Enter: lock contention m_nodes_mutex, net.cpp:1883 completed (3865μs)
    7num_peers-1     2023-12-05T13:26:56.862680Z [net] [net_processing.cpp:1651] [FinalizeNode] [net] Cleared nodestate for peer=5
    8num_peers-1	2023-12-05T13:26:56.863412Z [httpworker.3] [rpc/request.cpp:187] [parse] [rpc] ThreadRPCServer method=getpeerinfo user=__cookie__
    9num_peers-1+1   2023-12-05T13:26:56.863771Z [net] [net.cpp:3704] [CNode] [net] Added connection peer=6
    

    possible alternatives i could think of is checking whether the ip address:port combo is present/not present in getpeerinfo. something like: self.wait_until(lambda: "127.0.0.1:" + str(port) in [p["addrbind"] for p in self.nodes[0].getpeerinfo()]) but i think the current approach is slightly faster.

  8. in test/functional/p2p_v2_transport.py:149 in 00e0658e77
    142@@ -144,22 +143,23 @@ def run_test(self):
    143         # Check wrong network prefix detection (hits if the next 12 bytes correspond to a v1 version message)
    144         wrong_network_magic_prefix = MAGIC_BYTES["signet"] + V1_PREFIX[4:]
    145         with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    146-            s.connect(("127.0.0.1", p2p_port(0)))
    147+            with self.nodes[0].wait_for_new_peer():
    148+                s.connect(("127.0.0.1", p2p_port(0)))
    149             with self.nodes[0].assert_debug_log(["V2 transport error: V1 peer with wrong MessageStart"]):
    150                 s.sendall(wrong_network_magic_prefix + b"somepayload")
    


    maflcko commented at 8:14 am on December 6, 2023:
    nit: unrelated: Would be good to wait for CNode desctruction after disconnection by checking the getpeerinfo RPC. This is what I meant in #29002 (comment)

    maflcko commented at 11:53 am on December 8, 2023:
    Alternatively, it would be good to pass the timeout option to assert_debug_log, so that it is well-documented that the debug log is used to sync.
  9. in test/functional/test_framework/test_node.py:537 in 00e0658e77
    532+            peer_info = self.getpeerinfo()
    533+            return peer_info[-1]["id"] if peer_info else -1
    534+
    535+        initial_peer_id = get_highest_peer_id()
    536+        yield
    537+        wait_until_helper_internal(lambda: get_highest_peer_id() > initial_peer_id,
    


    maflcko commented at 8:16 am on December 6, 2023:
    nit: unrelated: Could have a self.wait_until to avoid having to pass the timeout factor every time?
  10. maflcko approved
  11. maflcko commented at 8:18 am on December 6, 2023: member
    lgtm. Can’t hurt to have this helper, because it also makes code less verbose
  12. theStack commented at 9:05 am on December 8, 2023: contributor
    Thanks for the reviews! I’m leaving the unrelated nits (https://github.com/bitcoin/bitcoin/pull/29006#discussion_r1416885349 and #29006 (review)) for a follow-up.
  13. maflcko commented at 9:30 am on December 8, 2023: member
    Ok, lgtm ACK 00e0658e77f66103ebdeb29def99dc9f937c049d
  14. fanquake merged this on Dec 8, 2023
  15. fanquake closed this on Dec 8, 2023

  16. theStack deleted the branch on Dec 8, 2023
  17. fanquake commented at 11:49 am on December 8, 2023: member
    Note there was a typo in the commit message. This fixed #29002, not #29009 (which accidently got closed).
  18. fanquake referenced this in commit 36fabb01b1 on Dec 11, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me