Also wait for the other node to notice the closed socket. Otherwise, the other node is not able to use the connect helper.
test: Avoid race in disconnect_nodes helper #26138
pull MarcoFalke wants to merge 1 commits into bitcoin:master from MarcoFalke:2209-test-race-🍄 changing 1 files +9 −8-
MarcoFalke commented at 1:49 PM on September 20, 2022: member
-
test: Avoid race in disconnect_nodes helper faeea28753
- MarcoFalke added this to the milestone 24.0 on Sep 20, 2022
- fanquake added the label Tests on Sep 20, 2022
-
stickies-v commented at 6:49 PM on September 20, 2022: contributor
Makes sense to verify the disconnect has been picked up by both nodes, so this seems like a good improvement to me.
In
connect_nodes(), relying on the number of connected/veracked peers instead of the actual peer seems like a very fragile approach though. I would expect it to break again in the future. I'm not sure we don't have better ways to check 2 nodes being connected?For example, I think building on faeea2875 this (MVP - open to improvements) would work:
<details> <summary>git diff</summary>
diff --git a/test/functional/test_framework/test_framework.py b/test/functional/test_framework/test_framework.py index b1164b98f..f79b931b6 100755 --- a/test/functional/test_framework/test_framework.py +++ b/test/functional/test_framework/test_framework.py @@ -581,8 +581,6 @@ class BitcoinTestFramework(metaclass=BitcoinTestMetaClass): def connect_nodes(self, a, b): from_connection = self.nodes[a] to_connection = self.nodes[b] - from_num_peers = 1 + len(from_connection.getpeerinfo()) - to_num_peers = 1 + len(to_connection.getpeerinfo()) ip_port = "127.0.0.1:" + str(p2p_port(b)) from_connection.addnode(ip_port, "onetry") # poll until version handshake complete to avoid race conditions @@ -590,30 +588,27 @@ class BitcoinTestFramework(metaclass=BitcoinTestMetaClass): # See comments in net_processing: # * Must have a version message before anything else # * Must have a verack message before anything else - self.wait_until(lambda: sum(peer['version'] != 0 for peer in from_connection.getpeerinfo()) == from_num_peers) - self.wait_until(lambda: sum(peer['version'] != 0 for peer in to_connection.getpeerinfo()) == to_num_peers) - self.wait_until(lambda: sum(peer['bytesrecv_per_msg'].pop('verack', 0) == 24 for peer in from_connection.getpeerinfo()) == from_num_peers) - self.wait_until(lambda: sum(peer['bytesrecv_per_msg'].pop('verack', 0) == 24 for peer in to_connection.getpeerinfo()) == to_num_peers) + def is_connection_established(from_node: TestNode, to_node: TestNode) -> bool: + connected = from_node.get_connected_peers_info(to_node.index) + veracked = [peer for peer in connected if peer['bytesrecv_per_msg'].pop('verack', 0) == 24] + return len(veracked) > 0 + + self.wait_until(lambda: is_connection_established(from_connection, to_connection)) + self.wait_until(lambda: is_connection_established(to_connection, from_connection)) + def disconnect_nodes(self, a, b): def disconnect_nodes_helper(node_a, node_b): - def get_peer_ids(from_connection, node_num): - result = [] - for peer in from_connection.getpeerinfo(): - if "testnode{}".format(node_num) in peer['subver']: - result.append(peer['id']) - return result - - peer_ids = get_peer_ids(node_a, node_b.index) - if not peer_ids: + peers = node_a.get_connected_peers_info(node_b.index) + if not peers: self.log.warning("disconnect_nodes: {} and {} were not connected".format( node_a.index, node_b.index, )) return - for peer_id in peer_ids: + for peer in peers: try: - node_a.disconnectnode(nodeid=peer_id) + node_a.disconnectnode(nodeid=peer["id"]) except JSONRPCException as e: # If this node is disconnected between calculating the peer id # and issuing the disconnect, don't worry about it. @@ -622,8 +617,8 @@ class BitcoinTestFramework(metaclass=BitcoinTestMetaClass): raise # wait to disconnect - self.wait_until(lambda: not get_peer_ids(node_a, node_b.index), timeout=5) - self.wait_until(lambda: not get_peer_ids(node_b, node_a.index), timeout=5) + self.wait_until(lambda: not node_a.get_connected_peers_info(node_b.index), timeout=5) + self.wait_until(lambda: not node_b.get_connected_peers_info(node_a.index), timeout=5) disconnect_nodes_helper(self.nodes[a], self.nodes[b]) diff --git a/test/functional/test_framework/test_node.py b/test/functional/test_framework/test_node.py index e35cae006..73ad492a5 100755 --- a/test/functional/test_framework/test_node.py +++ b/test/functional/test_framework/test_node.py @@ -21,6 +21,7 @@ import collections import shlex import sys from pathlib import Path +from typing import Any, Dict, List from .authproxy import JSONRPCException from .descriptors import descsum_create @@ -645,6 +646,13 @@ class TestNode(): return p2p_conn + def get_connected_peers_info(self, other_node_index: int) -> List[Dict[str, Any]]: + result = [] + for peer in self.getpeerinfo(): + if "testnode{}".format(other_node_index) in peer['subver']: + result.append(peer) + return result + def num_test_p2p_connections(self): """Return number of test framework p2p connections to the node.""" return len([peer for peer in self.getpeerinfo() if peer['subver'] == P2P_SUBVERSION])</details>
-
MarcoFalke commented at 8:08 AM on September 21, 2022: member
get_connected_peers_infowould only work with Bitcoin Core peers, not mininode peers, so I'd prefer not to make it public -
MarcoFalke commented at 8:10 AM on September 21, 2022: member
I would expect it to break again in the future.
The only way I can see it would "break" is when you attempt to connect to the same peer twice. In which case the failure is probably wanted. I am not sure if your approach silently dismisses that failure.
-
stickies-v commented at 10:41 AM on September 22, 2022: contributor
get_connected_peers_info would only work with Bitcoin Core peers, not mininode peers, so I'd prefer not to make it public
Good point, I did not consider that. I've updated the diff (below) to keep it private. Also reduces the diff quite a bit. (Note:
get_peer_ids()would benefit from the same list comprehension imo, but not a required change for this to work)The only way I can see it would "break" is when you attempt to connect to the same peer twice. In which case the failure is probably wanted. I am not sure if your approach silently dismisses that failure.
Updated to only return
Trueif we have exactly 1 connection tob, which I think addresses your concern. I don't have any specific ideas of how it would break otherwise, but sinceconnect_nodes()doesn't own the nodes, I think it's not unreasonable to assume that the node may update its peers in the background? If without additional complexity we can check thatais now connected toband not just to any new node, that seems like a strict improvement to me?<details> <summary>git diff</summary>
diff --git a/test/functional/test_framework/test_framework.py b/test/functional/test_framework/test_framework.py index b1164b98f..fa3ddebe3 100755 --- a/test/functional/test_framework/test_framework.py +++ b/test/functional/test_framework/test_framework.py @@ -581,8 +581,6 @@ class BitcoinTestFramework(metaclass=BitcoinTestMetaClass): def connect_nodes(self, a, b): from_connection = self.nodes[a] to_connection = self.nodes[b] - from_num_peers = 1 + len(from_connection.getpeerinfo()) - to_num_peers = 1 + len(to_connection.getpeerinfo()) ip_port = "127.0.0.1:" + str(p2p_port(b)) from_connection.addnode(ip_port, "onetry") # poll until version handshake complete to avoid race conditions @@ -590,10 +588,14 @@ class BitcoinTestFramework(metaclass=BitcoinTestMetaClass): # See comments in net_processing: # * Must have a version message before anything else # * Must have a verack message before anything else - self.wait_until(lambda: sum(peer['version'] != 0 for peer in from_connection.getpeerinfo()) == from_num_peers) - self.wait_until(lambda: sum(peer['version'] != 0 for peer in to_connection.getpeerinfo()) == to_num_peers) - self.wait_until(lambda: sum(peer['bytesrecv_per_msg'].pop('verack', 0) == 24 for peer in from_connection.getpeerinfo()) == from_num_peers) - self.wait_until(lambda: sum(peer['bytesrecv_per_msg'].pop('verack', 0) == 24 for peer in to_connection.getpeerinfo()) == to_num_peers) + def is_connection_established(from_node: TestNode, to_node: TestNode) -> bool: + connected = [peer for peer in from_connection.getpeerinfo() if f"testnode{b}" in peer["subver"]] + veracked = [peer for peer in connected if peer["bytesrecv_per_msg"].pop('verack', 0) == 24] + return len(veracked) == 1 + + self.wait_until(lambda: is_connection_established(from_connection, to_connection)) + self.wait_until(lambda: is_connection_established(to_connection, from_connection)) + def disconnect_nodes(self, a, b): def disconnect_nodes_helper(node_a, node_b):</details
-
in test/functional/test_framework/test_framework.py:611 in faeea28753
611 | if not peer_ids: 612 | self.log.warning("disconnect_nodes: {} and {} were not connected".format( 613 | - from_connection.index, 614 | - node_num, 615 | + node_a.index, 616 | + node_b.index,
stickies-v commented at 10:51 AM on September 22, 2022:nit: If you're touching, would use f-string instead:
self.log.warning(f"disconnect_nodes: {node_a.index} and {node_b.index} were not connected")in test/functional/test_framework/test_framework.py:600 in faeea28753
595 | @@ -596,24 +596,24 @@ def connect_nodes(self, a, b): 596 | self.wait_until(lambda: sum(peer['bytesrecv_per_msg'].pop('verack', 0) == 24 for peer in to_connection.getpeerinfo()) == to_num_peers) 597 | 598 | def disconnect_nodes(self, a, b): 599 | - def disconnect_nodes_helper(from_connection, node_num): 600 | - def get_peer_ids(): 601 | + def disconnect_nodes_helper(node_a, node_b): 602 | + def get_peer_ids(from_connection, node_num):
stickies-v commented at 10:52 AM on September 22, 2022:nit: Since you're touching, I think adding type hints here would be helpful. Also, for consistency would prefer calling it
nodeinstead ofconnectionandindexinstead ofnum?def disconnect_nodes(self, a: int, b: int) -> None: def disconnect_nodes_helper(node_a: TestNode, node_b: TestNode) -> None: def get_peer_ids(from_node: TestNode, to_node_index: int) -> List[int]:
MarcoFalke commented at 7:03 PM on September 28, 2022:I kept the name as-is to make the diff minimal. If there are additional changes, they can be made in a follow-up.
If there is a follow-up, it would probably be fine to also make
to_node_indexof typeTestNode.
fanquake commented at 9:24 AM on September 29, 2022:@stickies-v maybe you can follow up here?
stickies-v approvedstickies-v commented at 10:58 AM on September 22, 2022: contributorACK faeea2875
I would prefer making
connect_nodes()more robust too as per my suggestion, but I think this is a solid improvement on its own and my comment could be implemented in a separate PR if that's preferable or the proposal is contentious.A few nits, no blockers.
DrahtBot commented at 3:19 AM on September 23, 2022: contributor<!--e57a25ab6845829454e8d69fc972939a-->
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
<!--174a7506f384e20aa4161008e828411d-->
Conflicts
No conflicts as of last run.
in test/functional/test_framework/test_framework.py:626 in faeea28753
621 | @@ -622,9 +622,10 @@ def get_peer_ids(): 622 | raise 623 | 624 | # wait to disconnect 625 | - self.wait_until(lambda: not get_peer_ids(), timeout=5) 626 | + self.wait_until(lambda: not get_peer_ids(node_a, node_b.index), timeout=5) 627 | + self.wait_until(lambda: not get_peer_ids(node_b, node_a.index), timeout=5)
glozow commented at 3:19 PM on September 28, 2022:Ok I interpret this race to have been
- call
disconnect(1, 2), wait until 2 no longer shows up innode1.getpeerinfo() - call
connect(0, 2). getto_num_peers = 2when queryingnode2.getpeerinfo() - node1 actually disconnected from node2. now there's only 1 peer in
node2.getpeerinfo(). lambda: sum(peer['version'] != 0 for peer in to_connection.getpeerinfo()) == to_num_peers== 2- hang
Which is why we need to wait for both.
And now the order is
- call
disconnect(1, 2), wait until 2 no longer shows up innode1.getpeerinfo()AND wait until 1 no longer shows up innode2.getpeerinfo() - call
connect(0, 2). getto_num_peers = 1when queryingnode2.getpeerinfo() lambda: sum(peer['version'] != 0 for peer in to_connection.getpeerinfo()) == to_num_peers== 1
glozow commented at 3:19 PM on September 28, 2022: memberACK faeea28753a94c45618c1b0ba83bb8700c53009a
glozow merged this on Sep 28, 2022glozow closed this on Sep 28, 2022fanquake added the label Needs backport (24.x) on Sep 28, 2022MarcoFalke commented at 6:57 PM on September 28, 2022: memberI think it's not unreasonable to assume that the node may update its peers in the background?
The functional test main thread is a single thread, so there should be nothing happening in the background. If something were to happen in the background, it is either irrelevant to the test, or it could result in a rare intermittent false-failure.
MarcoFalke removed the label Needs backport (24.x) on Sep 28, 2022sidhujag referenced this in commit aec13d64ea on Sep 28, 2022fanquake referenced this in commit f34c98a460 on Sep 29, 2022MarcoFalke deleted the branch on Sep 29, 2022bitcoin locked this on Sep 29, 2023ContributorsLabelsMilestone
24.0
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-17 06:13 UTC
More mirrored repositories can be found on mirror.b10c.me