test: Fix multiprocess CI timeout in p2p_tx_download #29926

pull fjahr wants to merge 1 commits into bitcoin:master from fjahr:2024-04-p2p-tx-multiprocess-fail changing 1 files +2 −1
  1. fjahr commented at 11:30 pm on April 20, 2024: contributor

    This addresses multiprocess CI failures in p2p_tx_download.py, likely introduced by #29827.

    Example failure: https://cirrus-ci.com/task/5622109341220864

    I was having a hard time reproducing or rationalizing the root cause of the issue but it seemed very likely the mock time wasn’t working as expected without another reset and I got a successful run with it when I temporarily introduced it to another PR I am working on: https://cirrus-ci.com/task/5109555795853312

  2. DrahtBot commented at 11:30 pm on April 20, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

  3. DrahtBot added the label Tests on Apr 20, 2024
  4. fjahr force-pushed on Apr 21, 2024
  5. DrahtBot added the label CI failed on Apr 21, 2024
  6. fjahr commented at 0:03 am on April 21, 2024: contributor
    Hm, still seems to be flakey at least, giving it one more shot with longer bump
  7. fjahr commented at 0:34 am on April 21, 2024: contributor
    #28313 / #28321 may be related
  8. DrahtBot removed the label CI failed on Apr 21, 2024
  9. in test/functional/p2p_tx_download.py:269 in c5b2d2cab6 outdated
    264@@ -265,7 +265,8 @@ def test_rejects_filter_reset(self):
    265         self.generate(self.wallet, 1, sync_fun=self.no_op)
    266         peer.sync_with_ping()
    267         peer.send_and_ping(msg_inv([CInv(t=MSG_WTX, h=int(low_fee_tx['wtxid'], 16))]))
    268-        node.bumpmocktime(MAX_GETDATA_INBOUND_WAIT)
    269+        node.setmocktime(int(time.time()))
    270+        node.bumpmocktime(MAX_GETDATA_INBOUND_WAIT + 300)
    


    glozow commented at 10:31 am on April 22, 2024:
    I haven’t been able to reproduce this issue and am mostly guessing here, but if it’s a setmocktime problem, perhaps just do a node.setmocktime(0) to reset it a few lines up (shouldn’t impact the test except maybe make it take 2sec longer)?

    glozow commented at 10:34 am on April 22, 2024:
    I’m not convinced that the +300 would make a difference since the timing for getdata isn’t variable like it is for announcements (which was the case in #28321)

    fjahr commented at 1:01 pm on April 22, 2024:

    I initially had has only used the node.setmocktime(0) in that place and that succeeded when I temporarily pushed it in an unrelated PR but then here it failed when I opened this one. After adding the +300 it succeeded, but yeah, this may have been just luck again. I will push a version without +300 and and the node.setmocktime(0) moved higher up.

    I just saw that the Re-Run button also appears for CI jobs that succeeded, I will try to re-rerun this version a few times manually so we can have bit more confidence before merging.


    fjahr commented at 2:03 pm on April 22, 2024:
    Had another failure, so I will add the +300 back and see if that is a permanent fix.
  10. glozow commented at 10:32 am on April 22, 2024: member
    (cc @theStack, indeed looks like it’s from #29827)
  11. fjahr force-pushed on Apr 22, 2024
  12. test: Add mocktime reset in p2p_tx_download
    This addresses a timeout error in the multiprocess CI job.
    fb154e3946
  13. fjahr force-pushed on Apr 22, 2024
  14. DrahtBot added the label CI failed on Apr 22, 2024
  15. maflcko commented at 2:15 pm on April 22, 2024: member

    Example failure: https://cirrus-ci.com/task/5622109341220864

    If I download https://api.cirrus-ci.com/v1/task/5622109341220864/logs/ci.log, I get

     02024-04-21T00:10:09.801000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: ''''
     1        wait_until_helper_internal(lambda: not self.network_event_loop.is_running(), timeout=timeout)
     2'''
     3[node 1] Cleaning up leftover process
     4[node 0] Cleaning up leftover process
     5
     6
     7stderr:
     8Traceback (most recent call last):
     9  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/p2p_tx_download.py", line 302, in <module>
    10    TxDownloadTest().main()
    11  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 155, in main
    12    exit_code = self.shutdown()
    13                ^^^^^^^^^^^^^^^
    14  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 314, in shutdown
    15    self.network_thread.close()
    16  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/p2p.py", line 732, in close
    17    wait_until_helper_internal(lambda: not self.network_event_loop.is_running(), timeout=timeout)
    18  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/util.py", line 293, in wait_until_helper_internal
    19    raise AssertionError("Predicate {} not true after {} seconds".format(predicate_source, timeout))
    20AssertionError: Predicate ''''
    21        wait_until_helper_internal(lambda: not self.network_event_loop.is_running(), timeout=timeout)
    22''' not true after 10.0 seconds
    

    Which indicates that the problem is during shutdown, after the test, not in the test.

    So I don’t think adding mocktime will fix it. Maybe you were running into a different issue, or are trying to fix a different bug?

  16. maflcko commented at 3:10 pm on April 22, 2024: member
    See also the current test failure, which remains: https://github.com/bitcoin/bitcoin/pull/29926/checks?check_run_id=24105987877
  17. fjahr commented at 3:20 pm on April 22, 2024: contributor

    Which indicates that the problem is during shutdown, after the test, not in the test.

    So I don’t think adding mocktime will fix it. Maybe you were running into a different issue, or are trying to fix a different bug?

    Hm, I didn’t see Stopping nodes being printed so I am not convinced it’s during shutdown but maybe I am just not familiar enough with the test framework. I am closing since it seems even with the +30 it’s still failing and it seems duplicate #29933 might have a better approach.

  18. fjahr closed this on Apr 22, 2024

  19. maflcko commented at 3:23 pm on April 22, 2024: member

    Hm, I didn’t see Stopping nodes being printed so I am not convinced it’s during shutdown

    This will be printed after the network thread is closed. However, the network thread not closing is the underlying bug.

    You can see this in the traceback I posted above. (self.network_thread.close() is line 314, and “Stopping nodes” is line 316)


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me