test: Fix multiprocess CI timeout in p2p_tx

fjahr commented at 11:30 pm on April 20, 2024: contributor

This addresses multiprocess CI failures in p2p_tx_download.py, likely introduced by #29827.

Example failure: https://cirrus-ci.com/task/5622109341220864

I was having a hard time reproducing or rationalizing the root cause of the issue but it seemed very likely the mock time wasn’t working as expected without another reset and I got a successful run with it when I temporarily introduced it to another PR I am working on: https://cirrus-ci.com/task/5109555795853312

DrahtBot commented at 11:30 pm on April 20, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process. A summary of reviews will appear here.

DrahtBot added the label Tests on Apr 20, 2024

fjahr force-pushed on Apr 21, 2024

DrahtBot added the label CI failed on Apr 21, 2024

fjahr commented at 0:03 am on April 21, 2024: contributor

Hm, still seems to be flakey at least, giving it one more shot with longer bump

fjahr commented at 0:34 am on April 21, 2024: contributor

#28313 / #28321 may be related

DrahtBot removed the label CI failed on Apr 21, 2024

in test/functional/p2p_tx_download.py:269 in c5b2d2cab6 outdated

264@@ -265,7 +265,8 @@ def test_rejects_filter_reset(self):
265         self.generate(self.wallet, 1, sync_fun=self.no_op)
266         peer.sync_with_ping()
267         peer.send_and_ping(msg_inv([CInv(t=MSG_WTX, h=int(low_fee_tx['wtxid'], 16))]))
268-        node.bumpmocktime(MAX_GETDATA_INBOUND_WAIT)
269+        node.setmocktime(int(time.time()))
270+        node.bumpmocktime(MAX_GETDATA_INBOUND_WAIT + 300)

glozow commented at 10:31 am on April 22, 2024:

I haven’t been able to reproduce this issue and am mostly guessing here, but if it’s a setmocktime problem, perhaps just do a node.setmocktime(0) to reset it a few lines up (shouldn’t impact the test except maybe make it take 2sec longer)?

glozow commented at 10:34 am on April 22, 2024:

I’m not convinced that the +300 would make a difference since the timing for getdata isn’t variable like it is for announcements (which was the case in #28321)

fjahr commented at 1:01 pm on April 22, 2024:

I initially had has only used the node.setmocktime(0) in that place and that succeeded when I temporarily pushed it in an unrelated PR but then here it failed when I opened this one. After adding the +300 it succeeded, but yeah, this may have been just luck again. I will push a version without +300 and and the node.setmocktime(0) moved higher up.

I just saw that the Re-Run button also appears for CI jobs that succeeded, I will try to re-rerun this version a few times manually so we can have bit more confidence before merging.

fjahr commented at 2:03 pm on April 22, 2024:

Had another failure, so I will add the +300 back and see if that is a permanent fix.

glozow commented at 10:32 am on April 22, 2024: member

(cc @theStack, indeed looks like it’s from #29827)

fjahr force-pushed on Apr 22, 2024

test: Add mocktime reset in p2p_tx_download

This addresses a timeout error in the multiprocess CI job.

fb154e3946

fjahr force-pushed on Apr 22, 2024

DrahtBot added the label CI failed on Apr 22, 2024

maflcko commented at 2:15 pm on April 22, 2024: member

Example failure: https://cirrus-ci.com/task/5622109341220864

If I download https://api.cirrus-ci.com/v1/task/5622109341220864/logs/ci.log, I get

 02024-04-21T00:10:09.801000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: ''''
 1        wait_until_helper_internal(lambda: not self.network_event_loop.is_running(), timeout=timeout)
 2'''
 3[node 1] Cleaning up leftover process
 4[node 0] Cleaning up leftover process
 5
 6
 7[1mstderr:
 8[0mTraceback (most recent call last):
 9  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/p2p_tx_download.py", line 302, in <module>
10    TxDownloadTest().main()
11  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 155, in main
12    exit_code = self.shutdown()
13                ^^^^^^^^^^^^^^^
14  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 314, in shutdown
15    self.network_thread.close()
16  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/p2p.py", line 732, in close
17    wait_until_helper_internal(lambda: not self.network_event_loop.is_running(), timeout=timeout)
18  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/util.py", line 293, in wait_until_helper_internal
19    raise AssertionError("Predicate {} not true after {} seconds".format(predicate_source, timeout))
20AssertionError: Predicate ''''
21        wait_until_helper_internal(lambda: not self.network_event_loop.is_running(), timeout=timeout)
22''' not true after 10.0 seconds

Which indicates that the problem is during shutdown, after the test, not in the test.

So I don’t think adding mocktime will fix it. Maybe you were running into a different issue, or are trying to fix a different bug?

maflcko commented at 3:10 pm on April 22, 2024: member

See also the current test failure, which remains: https://github.com/bitcoin/bitcoin/pull/29926/checks?check_run_id=24105987877

fjahr commented at 3:20 pm on April 22, 2024: contributor

Which indicates that the problem is during shutdown, after the test, not in the test.

So I don’t think adding mocktime will fix it. Maybe you were running into a different issue, or are trying to fix a different bug?

Hm, I didn’t see Stopping nodes being printed so I am not convinced it’s during shutdown but maybe I am just not familiar enough with the test framework. I am closing since it seems even with the +30 it’s still failing and it seems duplicate #29933 might have a better approach.

fjahr closed this on Apr 22, 2024

maflcko commented at 3:23 pm on April 22, 2024: member

Hm, I didn’t see Stopping nodes being printed so I am not convinced it’s during shutdown

This will be printed after the network thread is closed. However, the network thread not closing is the underlying bug.

You can see this in the traceback I posted above. (self.network_thread.close() is line 314, and “Stopping nodes” is line 316)

bitcoin locked this on Apr 22, 2025

test: Fix multiprocess CI timeout in p2p_tx_download #29926

Code Coverage

Reviews