p2p-fullblocktest.py fails occasionally #11632

issue fanquake openend this issue on November 8, 2017
  1. fanquake commented at 5:53 am on November 8, 2017: member

    Occasionally I see p2pblocktest.py failing when running the full test suite on master. Here is the log output from one such run at https://github.com/bitcoin/bitcoin/commit/dd561667cb7ccbbfed3134b05a565971ef6f5873.

     02017-11-08 05:21:23.349000 TestFramework.mininode (DEBUG): Send message to 127.0.0.1:15360: msg_generic()
     12017-11-08 05:21:23.353000 TestFramework.mininode (DEBUG): Received message from 127.0.0.1:15360: msg_getdata(inv=[CInv(type=WitnessBlock hash=7a5f81e8313467d8da722212207f35e0bcff96b4a3d6e1cebe5ccfb482db087f)])
     22017-11-08 05:21:23.353000 TestFramework.mininode (DEBUG): Send message to 127.0.0.1:15360: msg_generic()
     32017-11-08 05:21:24.178000 TestFramework.mininode (DEBUG): Send message to 127.0.0.1:15360: msg_getheaders(locator=CBlockLocator(nVersion=70014 vHave=[33173860673120626974204943459578886554419646910719604625846583459047057165823, 52157530850984047832996077923584578103692798581921566502174761457904424775340, 55771744249389265034119257773627890252119103790626904495918282008316386578353, 40290229870805499894305341831791586089287558798738487092320053131961894901286, 37760069128212280825591083525569990527938311412737803690734490477686551632842, 2599600305764309661780695053387170433851388736... (msg truncated)
     42017-11-08 05:21:24.178000 TestFramework.mininode (DEBUG): Send message to 127.0.0.1:15360: msg_ping(nonce=00000060)
     52017-11-08 05:21:24.179000 TestFramework.mininode (DEBUG): Received message from 127.0.0.1:15360: msg_headers(headers=[CBlockHeader(nVersion=1 hashPrevBlock=4957bd6082037ffe6281f5682a4d12c0b5d1a9e47204df4688a8e0841dec8dff hashMerkleRoot=66e51352dee323b53d2c35154943cd4391a831b6984315be8754a28144884771 nTime=Wed Nov  8 13:31:25 2017 nBits=207fffff nNonce=00000000)])
     62017-11-08 05:21:24.179000 TestFramework.mininode (DEBUG): Received message from 127.0.0.1:15360: msg_pong(nonce=00000060)
     72017-11-08 05:21:24.229000 TestFramework.comptool (INFO): Test 96: PASS
     82017-11-08 05:21:24.230000 TestFramework.mininode (DEBUG): Send message to 127.0.0.1:15360: msg_inv(inv=[CInv(type=Block hash=20b90caf62293ca0d8ce43c540fd259d58cf86e0ddc337cb6bfba7f5d22d303e)])
     92017-11-08 05:21:24.231000 TestFramework.mininode (DEBUG): Received message from 127.0.0.1:15360: msg_getheaders(locator=CBlockLocator(nVersion=70014 vHave=[46904847670625905688023635754964878807294805301808802967599765639267425940817, 33173860673120626974204943459578886554419646910719604625846583459047057165823, 52157530850984047832996077923584578103692798581921566502174761457904424775340, 55771744249389265034119257773627890252119103790626904495918282008316386578353, 40290229870805499894305341831791586089287558798738487092320053131961894901286, 3776006912821228082559108352556999052793831141... (msg truncated)
    102017-11-08 05:21:24.239000 TestFramework.mininode (DEBUG): Send message to 127.0.0.1:15360: msg_headers(headers=[CBlockHeader(nVersion=1 hashPrevBlock=0f9188f13cb7b2c71f2a335e3a4fc328bf5beb436012afca590b1a11466e2206 hashMerkleRoot=185c284f92ef22630b0514f2752250acafe69429c2ac6b7ef40cb36e05bf30fb nTime=Wed Nov  8 13:11:11 2017 nBits=207fffff nNonce=00000000), CBlockHeader(nVersion=1 hashPrevBlock=07c4a08273b99a25dcd79b931e8b94292dafe9b6eb9136cd1f4d527f43ab366b hashMerkleRoot=fad67e50ef2d86063d3cbe9b019a458a70856defa66f0eb7da8cb553dca2b6f8 nTime=Wed Nov  8 13:11:12 2017 nBits=207fffff nNo... (msg truncated)
    112017-11-08 05:21:24.271000 TestFramework.mininode (DEBUG): Received message from 127.0.0.1:15360: msg_getdata(inv=[CInv(type=WitnessBlock hash=20b90caf62293ca0d8ce43c540fd259d58cf86e0ddc337cb6bfba7f5d22d303e)])
    122017-11-08 05:21:24.271000 TestFramework.mininode (DEBUG): Send message to 127.0.0.1:15360: msg_generic()
    132017-11-08 05:21:25.026000 TestFramework.mininode (DEBUG): Send message to 127.0.0.1:15360: msg_getheaders(locator=CBlockLocator(nVersion=70014 vHave=[33173860673120626974204943459578886554419646910719604625846583459047057165823, 52157530850984047832996077923584578103692798581921566502174761457904424775340, 55771744249389265034119257773627890252119103790626904495918282008316386578353, 40290229870805499894305341831791586089287558798738487092320053131961894901286, 37760069128212280825591083525569990527938311412737803690734490477686551632842, 2599600305764309661780695053387170433851388736... (msg truncated)
    142017-11-08 05:21:25.026000 TestFramework.mininode (DEBUG): Send message to 127.0.0.1:15360: msg_ping(nonce=00000061)
    152017-11-08 05:22:25.065000 TestFramework (ERROR): Assertion failed
    16Traceback (most recent call last):
    17  File "/bitcoin/test/functional/test_framework/test_framework.py", line 120, in main
    18    self.run_test()
    19  File "/bitcoin/test/functional/p2p-fullblocktest.py", line 72, in run_test
    20    self.test.run()
    21  File "/bitcoin/test/functional/test_framework/comptool.py", line 347, in run
    22    self.sync_blocks(block.sha256, 1)
    23  File "/bitcoin/test/functional/test_framework/comptool.py", line 220, in sync_blocks
    24    self.wait_for_pings(self.ping_counter)
    25  File "/bitcoin/test/functional/test_framework/comptool.py", line 200, in wait_for_pings
    26    wait_until(received_pongs, lock=mininode_lock)
    27  File "/bitcoin/test/functional/test_framework/util.py", line 222, in wait_until
    28    assert_greater_than(timeout, time.time())
    29  File "/bitcoin/test/functional/test_framework/util.py", line 42, in assert_greater_than
    30    raise AssertionError("%s <= %s" % (str(thing1), str(thing2)))
    31AssertionError: 1510118545.0264661 <= 1510118545.065086
    322017-11-08 05:22:25.068000 TestFramework (INFO): Stopping nodes
    332017-11-08 05:22:25.068000 TestFramework.node0 (DEBUG): Stopping node
    342017-11-08 05:23:05.775000 TestFramework.mininode (DEBUG): Closing connection to: 127.0.0.1:15360
    352017-11-08 05:23:05.776000 TestFramework.mininode (DEBUG): Closing connection to: 127.0.0.1:15360
    362017-11-08 05:23:05.776000 TestFramework.mininode (DEBUG): Network thread closing
    372017-11-08 05:23:06.087000 TestFramework.node0 (DEBUG): Node stopped
    

    @achow101 also mentioned that he has seen the same test fail, possibly continually. IRC discussion here: https://botbot.me/freenode/bitcoin-core-dev/2017-11-08/?msg=93258444&page=1

  2. fanquake added the label Tests on Nov 8, 2017
  3. instagibbs commented at 2:22 pm on November 8, 2017: member

    fails deterministic for me on both of my machines for months now

    2017-11-08 14:20:36.884000 TestFramework (ERROR): Assertion failed Traceback (most recent call last): File “/home/greg/bitcoin-dev/bitcoin/test/functional/test_framework/test_framework.py”, line 120, in main self.run_test() File “/home/greg/bitcoin-dev/bitcoin/test/functional/p2p-fullblocktest.py”, line 72, in run_test self.test.run() File “/home/greg/bitcoin-dev/bitcoin/test/functional/test_framework/comptool.py”, line 347, in run self.sync_blocks(block.sha256, 1) File “/home/greg/bitcoin-dev/bitcoin/test/functional/test_framework/comptool.py”, line 220, in sync_blocks self.wait_for_pings(self.ping_counter) File “/home/greg/bitcoin-dev/bitcoin/test/functional/test_framework/comptool.py”, line 200, in wait_for_pings wait_until(received_pongs, lock=mininode_lock) File “/home/greg/bitcoin-dev/bitcoin/test/functional/test_framework/util.py”, line 222, in wait_until assert_greater_than(timeout, time.time()) File “/home/greg/bitcoin-dev/bitcoin/test/functional/test_framework/util.py”, line 42, in assert_greater_than raise AssertionError("%s <= %s" % (str(thing1), str(thing2))) AssertionError: 1510150836.847895 <= 1510150836.8841593 2017-11-08 14:20:36.886000 TestFramework (INFO): Stopping nodes 2017-11-08 14:21:13.775000 TestFramework (WARNING): Not cleaning up dir /tmp/bitcoin_test_runner_20171108_091358/p2p-fullblocktest_370 2017-11-08 14:21:13.775000 TestFramework (ERROR): Test failed. Test logging available at /tmp/bitcoin_test_runner_20171108_091358/p2p-fullblocktest_370/test_framework.log

  4. achow101 commented at 3:06 pm on November 8, 2017: member
    This has also been failing for me for a while now. It happens every time I run the full test suite and has been doing so for several months.
  5. jnewbery commented at 9:06 pm on November 27, 2017: member

    @achow101 @instagibbs - can you try running again on master? The new logging should make this slightly easier to debug.

    Can you also try running this test on #11773. That’s a full rewrite of this test in the BitcoinTestFramework style.

  6. achow101 commented at 1:18 am on November 28, 2017: member

    @jnewbery Here’s the end of the output from running it one master:

     02017-11-28 00:01:40.355000 TestFramework.comptool (INFO): Running test 98: /home/andy/bitcoin/bitcoin/test/functional/p2p-fullblocktest.py line 1280
     12017-11-28 00:01:40.456000 TestFramework.comptool (INFO): Running test 99: /home/andy/bitcoin/bitcoin/test/functional/p2p-fullblocktest.py line 1282
     22017-11-28 00:02:40.923000 TestFramework (ERROR): Assertion failed
     3Traceback (most recent call last):
     4  File "/home/andy/bitcoin/bitcoin/test/functional/test_framework/test_framework.py", line 120, in main
     5    self.run_test()
     6  File "/home/andy/bitcoin/bitcoin/test/functional/p2p-fullblocktest.py", line 72, in run_test
     7    self.test.run()
     8  File "/home/andy/bitcoin/bitcoin/test/functional/test_framework/comptool.py", line 350, in run
     9    self.sync_blocks(block.sha256, 1)
    10  File "/home/andy/bitcoin/bitcoin/test/functional/test_framework/comptool.py", line 220, in sync_blocks
    11    self.wait_for_pings(self.ping_counter)
    12  File "/home/andy/bitcoin/bitcoin/test/functional/test_framework/comptool.py", line 200, in wait_for_pings
    13    wait_until(received_pongs, lock=mininode_lock)
    14  File "/home/andy/bitcoin/bitcoin/test/functional/test_framework/util.py", line 222, in wait_until
    15    assert_greater_than(timeout, time.time())
    16  File "/home/andy/bitcoin/bitcoin/test/functional/test_framework/util.py", line 42, in assert_greater_than
    17    raise AssertionError("%s <= %s" % (str(thing1), str(thing2)))
    18AssertionError: 1511827360.89918 <= 1511827360.9237378
    192017-11-28 00:02:40.926000 TestFramework (INFO): Stopping nodes
    202017-11-28 00:03:15.855000 TestFramework (WARNING): Not cleaning up dir /tmp/bitcoin_test_runner_20171127_185812/p2p-fullblocktest_655
    212017-11-28 00:03:15.856000 TestFramework (ERROR): Test failed. Test logging available at /tmp/bitcoin_test_runner_20171127_185812/p2p-fullblocktest_655/test_framework.log
    
  7. jnewbery commented at 1:39 am on November 28, 2017: member

    Could one of you post the full output of test_framework.log and debug.log to a pastebin?

    The sub-test here is for a large re-org of 1088 blocks, and the timeout is one minute. My best guess for now is that you’re running this on a low-powered machine or under high load.

    Have you tried running this on #11773?

  8. achow101 commented at 2:20 am on November 28, 2017: member
    #11773 does appear to fix the problem.
  9. MarcoFalke commented at 2:23 am on November 28, 2017: member
    If #11773 is meant to be refactoring only, we should determine why it accidentally fixes this issue.
  10. achow101 commented at 2:24 am on November 28, 2017: member

    @jnewbery

    test_framework.log debug.log

    The machine I am running this on has sort of high load (background has vm, node, electrum server, and a few other things). It is by no means low powered.

  11. instagibbs commented at 2:49 pm on November 28, 2017: member
    @jnewbery I can extend the timeout and check if it passes, how would I change this?
  12. jnewbery commented at 4:12 pm on November 28, 2017: member
    Thanks @achow101 - your debug log shows that the large re-org isn’t completing within the 60 second timeout. The tip height is initially 1220 and the node needs to rewind 1088 blocks and then reconnect up to height 1221. You can see that after 60 seconds, the node has only connected up to block height 1187, so the test framework shuts it down and fails the test. Interestingly, the node is only just failing to connect up to the new tip in time, and after the stop rpc command is issued, the node catches up 3 seconds later. @instagibbs - you can change this timeout by adding a timeout=120 to the call to wait_until() here: https://github.com/bitcoin/bitcoin/blob/26efc220a13aa3413f6e55e311e8991445104f82/test/functional/test_framework/comptool.py#L200. Note that this changes the timeout for all sync’ing when using the comparison test framework. @MarcoFalke - the rewrite in #11773 changes the test to use headers first syncing, which perhaps makes the re-org faster.
  13. instagibbs commented at 4:17 pm on November 29, 2017: member
    My machine is also busy, but not weak, and times out with .043(seconds?) too late on master. Any minor performance improvement would likely beat this.
  14. Sjors commented at 9:03 am on December 13, 2017: member

    For me using ./configure --enable-debug causes this error, presumably because it would require an even longer timeout.

    Tested with master ad1820cb:

    Test passes:

    0./configure --with-incompatible-bdb --disable-wallet --without-gui
    1make
    2test/functional/p2p-fullblocktest.py
    

    Test fails:

     0make clean
     1./configure --enable-debug --with-incompatible-bdb --without-gui --disable-qt
     2make
     3test/functional/p2p-fullblocktest.py
     4
     52017-12-12 12:46:23.112000 TestFramework.comptool (INFO): Running test 99: test/functional/p2p-fullblocktest.py line 1283
     62017-12-12 12:47:25.092000 TestFramework (ERROR): Assertion failed
     7Traceback (most recent call last):
     8  File ".../test/functional/test_framework/test_framework.py", line 118, in main
     9    self.run_test()
    10  File "test/functional/p2p-fullblocktest.py", line 73, in run_test
    11    self.test.run()
    12  File ".../test/functional/test_framework/comptool.py", line 341, in run
    13    self.sync_blocks(block.sha256, 1)
    14  File "..../test/functional/test_framework/comptool.py", line 211, in sync_blocks
    15    self.wait_for_pings(self.ping_counter)
    16  File ".../test/functional/test_framework/comptool.py", line 191, in wait_for_pings
    17    wait_until(received_pongs, lock=mininode_lock)
    18  File ".../test/functional/test_framework/util.py", line 222, in wait_until
    19    assert_greater_than(timeout, time.time())
    20  File ".../test/functional/test_framework/util.py", line 42, in assert_greater_than
    21    raise AssertionError("%s <= %s" % (str(thing1), str(thing2)))
    22AssertionError: 1513082845.0654101 <= 1513082845.092903
    

    Added --disable-wallet --without-gui to create a simpler scenario to reproduce, but it happens without those flags too.

    I’ll try with #11773.

  15. Sjors commented at 2:09 pm on December 13, 2017: member
    I’m getting a similar error, see logs. As before, error goes away without --enable-debug
  16. jnewbery commented at 2:54 pm on December 13, 2017: member

    I’m getting a similar error, see logs. As before, error goes away without –enable-debug

    Building with debug would make the executable far slower, so I’m not surprised that it times out on a 1008 block re-org.

  17. Sjors commented at 3:07 pm on December 13, 2017: member
    Indeed, I’m happy with “don’t do that” as a “fix” :-)
  18. achow101 commented at 4:30 pm on December 13, 2017: member
    Oh, I have also been building with --enable-debug so that may also be the cause of the test failures for me.
  19. instagibbs commented at 4:37 pm on December 13, 2017: member
    Quite standard for developers to run this way, I thought. Oh well.
  20. jnewbery commented at 7:53 pm on December 13, 2017: member

    I’ve increased the timeout for the re-org in #11773 and also improved the assert message for when a timeout fails, so that it’s more obvious what the cause was. @instagibbs @achow101 @Sjors - please let me know if the new #11773 works when running with --enable-debug.

    If you’d like this merged into master, please review #11771 first, which is a pre-requisite.

  21. Sjors commented at 12:22 pm on December 14, 2017: member

    Test a re-org of one week's worth of blocks (1088 blocks) still fails on my 2015 MacBook Pro, but this time it’s more clear that it’s due to a timeout.

    I changed the timeout to 1000. This seems to trigger a different timeout in the same test: 'getbestblockhash' RPC took longer than 60.000000 seconds.. See log.

    Rather than an all or nothing timeout, is there be a way for the test framework to ping the node if it’s making any progress on processing the reorg?

  22. jnewbery commented at 12:52 pm on December 14, 2017: member

    Ok, that’s a different kind of timeout. The timeout that was initially failing was the test framework waiting for the re-org to complete. The timeout in your attached log is due to an RPC call not getting responded to within a 30 seconds, failing, and then the test framework barfing 30 seconds later when its timeout pops. My guess is that the re-org is holding onto cs_main for so long during the re-org that it can’t process new RPC calls within that timeout.

    The 30 second RPC timeout is hard-coded here: https://github.com/bitcoin/bitcoin/blob/master/test/functional/test_framework/authproxy.py#L45. The default RPC server timeout is also 30 seconds: https://github.com/bitcoin/bitcoin/blob/master/src/httpserver.h#L14.

    I don’t think changing those defaults globally to get this test to work is necessarily a good idea.

  23. Sjors commented at 5:18 pm on December 14, 2017: member

    So the real bug here is that the code handling reorgs blocks the main thread. :-)

    Perhaps when a the test calls a function with a 1000s timeout, and this function in turn calls another function which fails with a timeout, it would keep retrying that function.

  24. laanwj commented at 6:54 pm on January 18, 2018: member
    I can reproduce this every time on a (slower) PC, on current master, when built with --enable-debug.
  25. Sjors commented at 8:39 am on January 19, 2018: member

    For me it happens fairly consistently if I sync a Parity node even without --enable-debug :-)

    I think the more elegant, though non-trivial, solution would be to make reorgs non-blocking and allow tests to poll their progress.

  26. AkioNak commented at 5:30 am on February 21, 2018: contributor

    I hope that someone will try #11988 and #12324 for running feature_block.py (p2p_fullblocktest.py) . These PRs give good effect to the re-org problem with –enable-debug because less locking/overhead on prevector.

    In my case, it was as follows:

    [enviroment] MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

    [given command line]

    0$ ./test/functional/test_runner.py -j 1 feature_block.py feature_block.py
    

    [result] nodebug master e117cfe45

    0TEST             | STATUS    | DURATION
    1feature_block.py | ✓ Passed  | 140 s
    2feature_block.py | ✓ Passed  | 140 s
    3ALL              | ✓ Passed  | 280 s (accumulated) 
    

    nodebug master e117cfe45 + #11988 + #12324

    0TEST             | STATUS    | DURATION
    1feature_block.py | ✓ Passed  | 127 s
    2feature_block.py | ✓ Passed  | 129 s
    3ALL              | ✓ Passed  | 256 s (accumulated) 
    

    –enable-debug master e117cfe45

    0TEST             | STATUS    | DURATION
    1feature_block.py | ✓ Passed  | 533 s
    2feature_block.py | ✓ Passed  | 535 s
    3ALL              | ✓ Passed  | 1068 s (accumulated) 
    

    –enable-debug master e117cfe45 + #11988 + #12324

    0TEST             | STATUS    | DURATION
    1feature_block.py | ✓ Passed  | 366 s
    2feature_block.py | ✓ Passed  | 364 s
    3ALL              | ✓ Passed  | 730 s (accumulated)  
    

    Note: In the cases of –enable-debug, It is necessary to extend the timeout value from 60 seconds to 90 seconds in two places.

  27. MarcoFalke commented at 11:45 pm on March 29, 2018: member
    Please open a new issue if this is still reproducable
  28. MarcoFalke closed this on Mar 29, 2018

  29. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me