valgrind will incur a slowdown of at least 2, so increase the default timeout factor.
This should reduce the number of reported issues. See also #27112 (comment)
valgrind will incur a slowdown of at least 2, so increase the default timeout factor.
This should reduce the number of reported issues. See also #27112 (comment)
<!--e57a25ab6845829454e8d69fc972939a-->
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
<!--021abf342d371248e50ceaed478a90ca-->
See the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Concept ACK - will test
I'm still seeing #27208 with this change (fa27cf4cc7c24aa00a66dffabab849d4b3cb1c41). See here for the full combined log:
...............................................................................................
162/256 - p2p_ibd_stalling.py failed, Duration: 126 s
stdout:
2023-03-08T04:33:24.745000Z TestFramework (INFO): PRNG seed is: 3703385557670057473
2023-03-08T04:33:24.745000Z TestFramework (INFO): Initializing test directory /home/ubuntu/ci_scratch/ci/scratch/test_runner/test_runner_₿_🏃_20230308_012459/p2p_ibd_stalling_97
2023-03-08T04:33:38.260000Z TestFramework (INFO): Prepare blocks without sending them to the node
2023-03-08T04:33:38.356000Z TestFramework (INFO): Check that a staller does not get disconnected if the 1024 block lookahead buffer is filled
2023-03-08T04:35:17.908000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: ''''
test_function = lambda: self.is_connected
'''
2023-03-08T04:35:17.908000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 132, in main
self.run_test()
File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/p2p_ibd_stalling.py", line 77, in run_test
peers.append(node.add_outbound_p2p_connection(P2PStaller(stall_block), p2p_idx=id, connection_type="outbound-full-relay"))
File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_node.py", line 663, in add_outbound_p2p_connection
p2p_conn.wait_for_connect()
File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/p2p.py", line 467, in wait_for_connect
wait_until_helper(test_function, timeout=timeout, lock=p2p_lock)
File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 281, in wait_until_helper
raise AssertionError("Predicate {} not true after {} seconds".format(predicate_source, timeout))
AssertionError: Predicate ''''
test_function = lambda: self.is_connected
''' not true after 60.0 seconds
2023-03-08T04:35:27.925000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: ''''
wait_until_helper(lambda: not self.network_event_loop.is_running(), timeout=timeout)
'''
[node 0] Cleaning up leftover process
stderr:
Traceback (most recent call last):
File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/p2p_ibd_stalling.py", line 164, in <module>
P2PIBDStallingTest().main()
File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 155, in main
exit_code = self.shutdown()
File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 310, in shutdown
self.network_thread.close()
File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/p2p.py", line 603, in close
wait_until_helper(lambda: not self.network_event_loop.is_running(), timeout=timeout)
File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 281, in wait_until_helper
raise AssertionError("Predicate {} not true after {} seconds".format(predicate_source, timeout))
AssertionError: Predicate ''''
wait_until_helper(lambda: not self.network_event_loop.is_running(), timeout=timeout)
''' not true after 10.0 seconds
Task was destroyed but it is pending!
task: <Task pending name='Task-9' coro=<BaseSelectorEventLoop._accept_connection2() running at /usr/lib/python3.10/asyncio/selector_events.py:193>>
/usr/lib/python3.10/asyncio/base_events.py:671: RuntimeWarning: coroutine 'BaseSelectorEventLoop._accept_connection2' was never awaited
......
p2p_ibd_stalling.py | ✖ Failed | 126 s
ALL | ✖ Failed | 42880 s (accumulated)
Runtime: 11402 s
Yeah, this doesn't affect the valgrind CI system. Only test runs with --valgrind passed (and no timeout factor).
Yeah, this doesn't affect the valgrind CI system.
Ok. Can we consolidate this at all? The disparity is somewhat confusing because you would assume that the Valgrind CI job would use the dedicated --valgrind option provided by the test framework.
Not sure if that makes sense. This would require special casing the non-test binaries bitcoind/-cli/-tx/-util etc in the valgrind wrapper. Also, if someone wants to interactively re-run a single test inside the CI pod, they'd have to pass --valgrind. Maybe open a new discussion thread or pull request?
Concept ACK. On my test machine --valgrind --timeout-factor=4 works, whereas without --timeout-factor it doesn't. Haven't tried a lower value.
I also have to set --jobs very low (e.g. 5 on a 32 thread CPU, well below the point where it uses ~100% CPU), or I still get timeouts and other errors (e.g. no RPC connection in p2p_tx_download.py). So perhaps 4 is not enough if you want to max-out a machine, but fine for CI. (Also, I compiled without BDB)
Even with that some tests fail, but they do so very quickly so that seems unrelated (see #27228).
ACK fa27cf4cc7c24aa00a66dffabab849d4b3cb1c41 - I still see at least one actual issue when running the functional tests under --valgrind (outside the CI system), but will follow up separately with that. Increasing the timeout here seems fine.