test: Fix intermittent timeout in p2p_tx_download.py #29933

pull maflcko wants to merge 1 commits into bitcoin:master from maflcko:2404-test-fix-p2p_tx_download-timeout- changing 1 files +11 −6
  1. maflcko commented at 12:54 pm on April 22, 2024: member

    Currently the test passes, but may fail during shutdown, because blocks and transactions are synced with NUM_INBOUND * self.num_nodes peers, which may take a long time.

    There is no need for this test to have this amount of inbounds.

    So avoid the extraneous inbounds to speed up the test and avoid the intermittent test failures.

  2. DrahtBot commented at 12:54 pm on April 22, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK instagibbs, fjahr, theStack, achow101

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label Tests on Apr 22, 2024
  4. test: Fix intermittent timeout in p2p_tx_download.py fa6c300a99
  5. maflcko force-pushed on Apr 22, 2024
  6. fjahr commented at 1:06 pm on April 22, 2024: contributor
    @maflcko did you see #29926?
  7. maflcko commented at 1:13 pm on April 22, 2024: member

    @maflcko did you see #29926?

    It looks like 29926 is adding a call to setmocktime in the test. As the issue happens after all tests have passed, this can not fix the issue. Or am I missing something?

    To explain it a bit more, the traceback is:

     0  File "/test/functional/p2p_tx_download.py", line 302, in <module>
     1    TxDownloadTest().main()
     2  File "/test/functional/test_framework/test_framework.py", line 155, in main
     3    exit_code = self.shutdown()
     4                ^^^^^^^^^^^^^^^
     5  File "/test/functional/test_framework/test_framework.py", line 314, in shutdown
     6    self.network_thread.close()
     7  File "/test/functional/test_framework/p2p.py", line 732, in close
     8    wait_until_helper_internal(lambda: not self.network_event_loop.is_running(), timeout=timeout)
     9  File "/test/functional/test_framework/util.py", line 293, in wait_until_helper_internal
    10    raise AssertionError("Predicate {} not true after {} seconds".format(predicate_source, timeout))
    11AssertionError: Predicate ''''
    12        wait_until_helper_internal(lambda: not self.network_event_loop.is_running(), timeout=timeout)
    13''' not true after 10.0 seconds
    

    Which is after the test has passed successfully.

  8. fjahr commented at 1:20 pm on April 22, 2024: contributor
    Hm, do you have an example failure where you have seen that traceback? Or is this from reproducing it locally? Because I haven’t seen this, at least not the shutdown part. Here is one of the ones I have seen and it seems to be a different failure: https://cirrus-ci.com/task/5622109341220864
  9. maflcko commented at 1:32 pm on April 22, 2024: member

    To get the full error log on Cirrus CI, you’ll have to use the “Open Full Logs” button to see the full log. Usually the output is:

    … (dots from the test runner) traceback (from the failed test) Full combined logs (from the failed test) Full test summary (from the test runner)

  10. glozow commented at 1:46 pm on April 22, 2024: member
    I can see the self.shutdown() traceback in this one: https://api.cirrus-ci.com/v1/task/6740372997537792/logs/ci.log
  11. maflcko commented at 2:04 pm on April 22, 2024: member

    I can see the self.shutdown() traceback in this one: https://api.cirrus-ci.com/v1/task/6740372997537792/logs/ci.log

    Thanks, in this case it seems the issue is transactions relayed to the extraneous peers. I’ve edited the pull request description.

    To see what message is the cause for the block, you can inspect the full combined log. In this example:

    0...
    1 test  2024-04-19T18:08:46.732000Z TestFramework.p2p (DEBUG): Received message from 127.0.0.1:17361: msg_tx(tx=CTransaction(nVersion=2 vin=[CTxIn(prevout=COutPoint(hash=25a2ce431090120926452cfbf24c43f9385a3228cee487df3dc952985a5a0e4f n=0) scriptSig= nSequence=0)] vout=[CTxOut(nValue=24.92720000 scriptPubKey=51202913b252fe537830f843bfdc5fa7d20ba48639a87c86ff837b92d083c55ad7c1), CTxOut(nValue=0.00000000 scriptPubKey=6a4e6d070100010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010... (msg truncated) 
    2 test  2024-04-19T18:08:46.861000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: '''' 
    3                                           wait_until_helper_internal(lambda: not self.network_event_loop.is_running(), timeout=timeout)
    4                                   '''
    5 test  2024-04-19T18:08:47.029000Z TestFramework.p2p (DEBUG): Received message from 127.0.0.1:17361: msg_tx(tx=CTransaction(nVersion=2 vin=[CTxIn(prevout=COutPoint(hash=a21e32ecb778b77cb1a46e21c3ae4e31a893dbf0d3dedae257a89dbe14bacf99 n=0) scriptSig= nSequence=0)] vout=[CTxOut(nValue=24.92330000 scriptPubKey=51202913b252fe537830f843bfdc5fa7d20ba48639a87c86ff837b92d083c55ad7c1), CTxOut(nValue=0.00000000 
    6...
    
  12. in test/functional/p2p_tx_download.py:291 in fa6c300a99
    288-                     self.test_rejects_filter_reset]:
    289+        for test, with_inbounds in [
    290+            (self.test_in_flight_max, True),
    291+            (self.test_inv_block, True),
    292+            (self.test_tx_requests, True),
    293+            (self.test_rejects_filter_reset, False),
    


    instagibbs commented at 2:39 pm on April 22, 2024:
    Took me way too long to realize it’s the fill_mempool function causing the sync timeout specifically. Can a note be left here to say why we’re not making additional inbounds?

    maflcko commented at 2:52 pm on April 22, 2024:

    I think the real reason would be “because they are not needed”. The secondary reason would be that the test will be faster on slower hardware. The final reason would be that this avoids intermittent test failures/timeouts. However, there are other ways to avoid the intermittent test failures, so I am not sure if this is the right place to document it.

    Should I just put a comment here # Avoid inbounds because they are not needed and slow down the test?


    instagibbs commented at 3:15 pm on April 22, 2024:

    Should I just put a comment here # Avoid inbounds because they are not needed and slow down the test?

    not really, doesn’t add anything, so leave it as is

  13. instagibbs commented at 3:16 pm on April 22, 2024: member

    makes sense

    ACK fa6c300a9926a1d35fdd0a80f59ea39769bd2596

  14. fjahr commented at 3:58 pm on April 22, 2024: contributor
    We should probably re-run the multiprocess job at least 2 more times manually here to ensure this actually fixes it and it wasn’t just lucky this one time.
  15. maflcko commented at 4:00 pm on April 22, 2024: member
    It is already re-run 10 times. You can double check by navigating to the Cirrus CI page for this pull request.
  16. fjahr commented at 4:03 pm on April 22, 2024: contributor

    It is already re-run 10 times. You can double check by navigating to the Cirrus CI page for this pull request.

    Thanks, ACK fa6c300a9926a1d35fdd0a80f59ea39769bd2596

  17. theStack approved
  18. theStack commented at 4:35 pm on April 22, 2024: contributor

    ACK fa6c300a9926a1d35fdd0a80f59ea39769bd2596

    Thanks for fixing!

  19. achow101 commented at 10:11 pm on April 22, 2024: member
    ACK fa6c300a9926a1d35fdd0a80f59ea39769bd2596
  20. achow101 merged this on Apr 22, 2024
  21. achow101 closed this on Apr 22, 2024

  22. maflcko deleted the branch on Apr 23, 2024
  23. glozow commented at 7:31 am on April 23, 2024: member
    Thank you @maflcko!

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me