Rare walletbackup.py failure: cannot reproduce #8425

issue laanwj openend this issue on July 29, 2016
  1. laanwj commented at 8:23 am on July 29, 2016: member

    I had this problem once while I was running the walletbackup.py test on 0.13.0rc2:

     0walletbackup.py:
     1Initializing test directory /tmp/testo3khmc00/37
     2INFO: Generating initial blockchain
     3Assertion failed: 76.69962300 != 50
     4Stopping nodes
     5Not cleaning up dir /tmp/testo3khmc00/37
     6Failed
     7
     8stderr:
     9 Error: Unable to start HTTP server. See debug log for details.
    10
    11
    12************************
    13EXCEPTION: N5boost16exception_detail10clone_implINS0_19error_info_injectorINS_10lock_errorEEEEE       
    14boost: mutex lock failed in pthread_mutex_lock: Invalid argument       
    15bitcoin in scheduler       
    16
    17  File "/data/src/bitcoin-0.13/qa/rpc-tests/test_framework/test_framework.py", line 148, in main
    18    self.run_test()
    19  File "/home/user/src/bitcoin-0.13/qa/rpc-tests/walletbackup.py", line 114, in run_test
    20    assert_equal(self.nodes[0].getbalance(), 50)
    21  File "/data/src/bitcoin-0.13/qa/rpc-tests/test_framework/util.py", line 496, in assert_equal
    22    raise AssertionError("%s != %s"%(str(thing1),str(thing2)))
    

    I’m unfortunately unable to reproduce it again, looks like a race condition.

    Data directory available here: https://dev.visucore.com/bitcoin/tmp/8425_failed_testdata.tar.gz

  2. laanwj added the label Tests on Jul 29, 2016
  3. MarcoFalke commented at 8:51 am on July 29, 2016: member

    Could you also add the line where the assertion failed (from the python traceback, which should be in the stderr)

    On Fri, Jul 29, 2016 at 10:23 AM, Wladimir J. van der Laan < notifications@github.com> wrote:

    I had this problem once while I was running the walletbackup.py test on 0.13:

    walletbackup.py: Initializing test directory /tmp/testo3khmc00/37 INFO: Generating initial blockchain Assertion failed: 76.69962300 != 50 Stopping nodes Not cleaning up dir /tmp/testo3khmc00/37 Failed

    I’m unfortunately unable to reproduce it again, looks like a race condition.

    — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub #8425, or mute the thread https://github.com/notifications/unsubscribe-auth/AGGmv60NSH-svLHtS9wi2s23c_U3ZQlBks5qabiRgaJpZM4JX-fm .

  4. laanwj commented at 8:57 am on July 29, 2016: member
  5. fanquake commented at 2:09 am on August 1, 2016: member

    I’ve also seen walletbackup.py fail, but it’s a different error.

     0...Traceback (most recent call last):
     1  File "qa/pull-tester/rpc-tests.py", line 342, in <module>
     2    runtests()
     3  File "qa/pull-tester/rpc-tests.py", line 211, in runtests
     4    (name, stdout, stderr, passed, duration) = job_queue.get_next()
     5  File "qa/pull-tester/rpc-tests.py", line 266, in get_next
     6    (stdout, stderr) = proc.communicate(timeout=3)
     7  File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1072, in communicate
     8    stdout, stderr = self._communicate(input, endtime, timeout)
     9  File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1716, in _communicate
    10    self._check_timeout(endtime, orig_timeout)
    11  File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1098, in _check_timeout
    12    raise TimeoutExpired(self.args, orig_timeout)
    13subprocess.TimeoutExpired: Command '['/Users/xxx/github/bitcoin/qa/rpc-tests/walletbackup.py', '--srcdir=/Users/xxx/github/bitcoin/src', '--portseed=39']' timed out after 3 seconds
    
     0Traceback (most recent call last):
     1  File "qa/pull-tester/rpc-tests.py", line 342, in <module>
     2    runtests()
     3  File "qa/pull-tester/rpc-tests.py", line 211, in runtests
     4    (name, stdout, stderr, passed, duration) = job_queue.get_next()
     5  File "qa/pull-tester/rpc-tests.py", line 266, in get_next
     6    (stdout, stderr) = proc.communicate(timeout=3)
     7  File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1072, in communicate
     8    stdout, stderr = self._communicate(input, endtime, timeout)
     9  File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1716, in _communicate
    10    self._check_timeout(endtime, orig_timeout)
    11  File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1098, in _check_timeout
    12    raise TimeoutExpired(self.args, orig_timeout)
    13subprocess.TimeoutExpired: Command '['/Users/xxx/github/bitcoin/qa/rpc-tests/wallet-dump.py', '--srcdir=/Users/xxx/github/bitcoin/src', '--portseed=35']' timed out after 3 seconds
    
  6. fanquake commented at 2:14 am on August 1, 2016: member
     0walletbackup.py:
     1Initializing test directory /var/folders/rb/jqqd25w926s5429rf58_zsv00000gn/T/test34w8is11/39
     2Unexpected exception caught during testing: Exception('bitcoind exited with status -6 during initialization',)
     3Stopping nodes
     4
     5stderr:
     6 Error: Unable to start HTTP server. See debug log for details.
     7Assertion failed: (nThreadsServicingQueue == 0), function ~CScheduler, file scheduler.cpp, line 19.
     8Error: Unable to start HTTP server. See debug log for details.
     9  File "/Users/xxx/Github/bitcoin/qa/rpc-tests/test_framework/test_framework.py", line 146, in main
    10    self.setup_network()
    11  File "/Users/xxx/github/bitcoin/qa/rpc-tests/walletbackup.py", line 53, in setup_network
    12    self.nodes = start_nodes(self.num_nodes, self.options.tmpdir, extra_args)
    13  File "/Users/xxx/Github/bitcoin/qa/rpc-tests/test_framework/util.py", line 340, in start_nodes
    14    rpcs.append(start_node(i, dirname, extra_args[i], rpchost, binary=binary[i]))
    15  File "/Users/xxx/Github/bitcoin/qa/rpc-tests/test_framework/util.py", line 321, in start_node
    16    wait_for_bitcoind_start(bitcoind_processes[i], url, i)
    17  File "/Users/xxx/Github/bitcoin/qa/rpc-tests/test_framework/util.py", line 191, in wait_for_bitcoind_start
    18    raise Exception('bitcoind exited with status %i during initialization' % process.returncode)
    19Traceback (most recent call last):
    20  File "/Users/xxx/github/bitcoin/qa/rpc-tests/walletbackup.py", line 201, in <module>
    21    WalletBackupTest().main()
    22  File "/Users/xxx/Github/bitcoin/qa/rpc-tests/test_framework/test_framework.py", line 169, in main
    23    stop_nodes(self.nodes)
    24  File "/Users/xxx/Github/bitcoin/qa/rpc-tests/test_framework/util.py", line 358, in stop_nodes
    25    for node in nodes:
    26TypeError: 'NoneType' object is not iterable
    27
    28Pass: False, Duration: 11 s
    
  7. JeremyRubin commented at 6:16 pm on August 7, 2016: contributor
    I’m seeing this pretty frequently, seems to be part of why #8464 is failing.
  8. JeremyRubin commented at 2:17 am on August 8, 2016: contributor

    I think you’re right it is a race condition. To test the hypothesis, we can try running with a lot of workers, which should hopefully increase the probability of failure.

    Running ./rpc_tests.py -parallel=40 on master gives me:

     0TEST                           | PASSED | DURATION
     1
     2p2p-fullblocktest.py           | True   | 174 s
     3walletbackup.py                | False  | 9 s
     4bip68-112-113-p2p.py           | True   | 34 s
     5wallet.py                      | False  | 1 s
     6wallet-hd.py                   | False  | 33 s
     7wallet-dump.py                 | True   | 7 s
     8listtransactions.py            | True   | 38 s
     9receivedby.py                  | True   | 18 s
    10mempool_resurrect_test.py      | True   | 1 s
    11txn_doublespend.py --mineblock | True   | 4 s
    12txn_clone.py                   | True   | 3 s
    13getchaintips.py                | True   | 10 s
    14rawtransactions.py             | False  | 15 s
    15rest.py                        | False  | 62 s
    16mempool_spendcoinbase.py       | True   | 1 s
    17mempool_reorg.py               | True   | 6 s
    18mempool_limit.py               | False  | 4 s
    19httpbasics.py                  | True   | 1 s
    20multi_rpc.py                   | True   | 3 s
    21zapwallettxes.py               | False  | 13 s
    22proxy_test.py                  | True   | 2 s
    23merkle_blocks.py               | False  | 5 s
    24fundrawtransaction.py          | False  | 23 s
    25signrawtransactions.py         | True   | 2 s
    26nodehandling.py                | True   | 14 s
    27reindex.py                     | True   | 12 s
    28decodescript.py                | True   | 0 s
    29blockchain.py                  | False  | 1 s
    30disablewallet.py               | True   | 3 s
    31sendheaders.py                 | True   | 29 s
    32keypool.py                     | True   | 5 s
    33prioritise_transaction.py      | False  | 4 s
    34invalidblockrequest.py         | True   | 5 s
    35invalidtxrequest.py            | True   | 4 s
    36abandonconflict.py             | False  | 14 s
    37p2p-versionbits-warning.py     | False  | 7 s
    38p2p-segwit.py                  | False  | 10 s
    39segwit.py                      | False  | 4 s
    40importprunedfunds.py           | False  | 3 s
    41signmessages.py                | True   | 2 s
    42p2p-compactblocks.py           | False  | 13 s
    43zmq_test.py                    | False  | 0 s
    44
    45ALL                            | False  | 599 s (accumulated)
    

    Each one of these failures logs an Error: Unable to start HTTP server. See debug log for details..

    This supports the hypothesis that it’s a race condition.

    When I ran with -parallel=1 on my branch, I still observed a (Error: Unable to start HTTP server. See debug log for details.) on rawtransaction (see #8464). So this means that it’s likely not a race condition in the main test code (which should run in serial as per @MarcoFalke’s suggestion), but perhaps in some cleanup code which locks the bitcoind starting.

    Hope that helps track this down, still not sure exactly where it is.

  9. MarcoFalke commented at 7:16 am on August 8, 2016: member

    Interesting, I was hoping the walletbackup fail has been fixed by #8139 (related discussion in #8052)

    ./rpc_tests.py -parallel=40

    Running all tests in parallel may cause failures due to OOM or out of space (depending how your temp directory is set up).

  10. JeremyRubin commented at 6:45 pm on August 8, 2016: contributor
    Hm it’s possible my failures were related to not properly clearing bitcoind pids… I’ll check.
  11. MarcoFalke commented at 7:31 pm on August 15, 2016: member

    This just failed on travis due to insufficient funds:

     0   File "/home/travis/build/bitcoin/bitcoin/qa/rpc-tests/test_framework/test_framework.py", line 148, in main
     1    self.run_test()
     2  File "/home/travis/build/bitcoin/bitcoin/build/../qa/rpc-tests/walletbackup.py", line 122, in run_test
     3    self.do_one_round()
     4  File "/home/travis/build/bitcoin/bitcoin/build/../qa/rpc-tests/walletbackup.py", line 74, in do_one_round
     5    self.one_send(1, a2)
     6  File "/home/travis/build/bitcoin/bitcoin/build/../qa/rpc-tests/walletbackup.py", line 64, in one_send
     7    self.nodes[from_node].sendtoaddress(to_address, amount)
     8  File "/home/travis/build/bitcoin/bitcoin/qa/rpc-tests/test_framework/coverage.py", line 49, in __call__
     9    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
    10  File "/home/travis/build/bitcoin/bitcoin/qa/rpc-tests/test_framework/authproxy.py", line 146, in __call__
    11    raise JSONRPCException(response['error'])
    
  12. laanwj closed this on Mar 16, 2017

  13. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me