Intermittent test failure in p2p_v2_transport #29002

issue maflcko openend this issue on December 5, 2023
  1. maflcko commented at 4:48 pm on December 5, 2023: member

    https://drahtbot.space/temp_scratch/p2p_v2_transport_129.tar.xz

     0 test  2023-12-05T14:06:56.900000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: '''' 
     1                                               self.wait_until(lambda: len(self.nodes[0].getpeerinfo()) == num_peers + 1)
     2                                   '''
     3 test  2023-12-05T14:06:56.902000Z TestFramework (ERROR): Assertion failed 
     4                                   Traceback (most recent call last):
     5                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 131, in main
     6                                       self.run_test()
     7                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/p2p_v2_transport.py", line 156, in run_test
     8                                       self.wait_until(lambda: len(self.nodes[0].getpeerinfo()) == num_peers + 1)
     9                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 765, in wait_until
    10                                       return wait_until_helper_internal(test_function, timeout=timeout, timeout_factor=self.options.timeout_factor)
    11                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    12                                     File "/ci_container_base/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 276, in wait_until_helper_internal
    13                                       raise AssertionError("Predicate {} not true after {} seconds".format(predicate_source, timeout))
    14                                   AssertionError: Predicate ''''
    15                                               self.wait_until(lambda: len(self.nodes[0].getpeerinfo()) == num_peers + 1)
    16                                   ''' not true after 2400.0 seconds
    
  2. maflcko added the label Tests on Dec 5, 2023
  3. theStack commented at 9:54 pm on December 5, 2023: contributor

    It seems that the problem is the following construct for detection of a new incoming connection:

    0    num_peers = len(self.nodes[0].getpeerinfo())
    1    s.connect(("127.0.0.1", p2p_port(0))
    2    self.wait_until(lambda: len(self.nodes[0].getpeerinfo()) == num_peers + 1)
    

    Looking at the logs, I think in this test run the the disconnect of the previous connection finished after the first getpeerinfo RPC call, i.e. the peer count drops to num_peers-1, increases to num_peers after the connection is made and remains at this value, so the lambda expression can never be fulfilled.

    I wonder what would be a more robust alternative. Maybe checking that the last peer’s id has increased, i.e. something like:

    0    highest_peer_id = self.nodes[0].getpeerinfo()[-1]["id"]
    1    s.connect(("127.0.0.1", p2p_port(0))
    2    self.wait_until(lambda: self.nodes[0].getpeerinfo()[-1]["id"] > highest_peer_id)
    

    This ensures that at least one new connection has been made in-between, which should be good enough. And it needs special-casing for when .getpeerinfo() returns an empty list (as accessing [-1] would yield an “index out of range” exception), which is a bit annoying…

  4. maflcko added this to the milestone 27.0 on Dec 5, 2023
  5. maflcko commented at 10:10 pm on December 5, 2023: member

    It seems easier to add another self.wait_until.

    Generally, I am not a fan of using assert_debug_log to sync the test execution implicitly, because this may cause races such as this.

    Adding an explicit wait on what should happen (disconnect) seems better than extracting a “magic” string from the debug log in a loop.

  6. theStack commented at 10:45 pm on December 5, 2023: contributor

    I agree, but using assert_debug_log was not planned to be used for a potential solution anyway (note that parsing the log is not mentioned anywhere in my previous post). I would still use the getpeerinfo() RPC, but not look at the length, but wait for an increase of the highest peer id instead. At least that’s the current plan, happy to take suggestions.

    Regarding the interface, I think a context-manager method would make sense here, e.g.:

    0with self.nodes[0].wait_for_new_peer():
    1    s.connect(("127.0.0.1", p2p_port(0)))
    
  7. theStack referenced this in commit 00e0658e77 on Dec 5, 2023
  8. fanquake closed this on Dec 8, 2023

  9. fanquake referenced this in commit 03042fb6bb on Dec 8, 2023


maflcko theStack

Labels
Tests

Milestone
27.0


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 04:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me