Rare walletbackup.py failure: cannot reproduce #8425

issue laanwj opened this issue on July 29, 2016
  1. laanwj commented at 8:23 AM on July 29, 2016: member

    I had this problem once while I was running the walletbackup.py test on 0.13.0rc2:

    walletbackup.py:
    Initializing test directory /tmp/testo3khmc00/37
    INFO: Generating initial blockchain
    Assertion failed: 76.69962300 != 50
    Stopping nodes
    Not cleaning up dir /tmp/testo3khmc00/37
    Failed
    
    stderr:
     Error: Unable to start HTTP server. See debug log for details.
    
    
    ************************
    EXCEPTION: N5boost16exception_detail10clone_implINS0_19error_info_injectorINS_10lock_errorEEEEE       
    boost: mutex lock failed in pthread_mutex_lock: Invalid argument       
    bitcoin in scheduler       
    
      File "/data/src/bitcoin-0.13/qa/rpc-tests/test_framework/test_framework.py", line 148, in main
        self.run_test()
      File "/home/user/src/bitcoin-0.13/qa/rpc-tests/walletbackup.py", line 114, in run_test
        assert_equal(self.nodes[0].getbalance(), 50)
      File "/data/src/bitcoin-0.13/qa/rpc-tests/test_framework/util.py", line 496, in assert_equal
        raise AssertionError("%s != %s"%(str(thing1),str(thing2)))
    

    I'm unfortunately unable to reproduce it again, looks like a race condition.

    Data directory available here: https://dev.visucore.com/bitcoin/tmp/8425_failed_testdata.tar.gz

  2. laanwj added the label Tests on Jul 29, 2016
  3. MarcoFalke commented at 8:51 AM on July 29, 2016: member

    Could you also add the line where the assertion failed (from the python traceback, which should be in the stderr)

    On Fri, Jul 29, 2016 at 10:23 AM, Wladimir J. van der Laan < notifications@github.com> wrote:

    I had this problem once while I was running the walletbackup.py test on 0.13:

    walletbackup.py: Initializing test directory /tmp/testo3khmc00/37 INFO: Generating initial blockchain Assertion failed: 76.69962300 != 50 Stopping nodes Not cleaning up dir /tmp/testo3khmc00/37 Failed

    I'm unfortunately unable to reproduce it again, looks like a race condition.

    — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub #8425, or mute the thread https://github.com/notifications/unsubscribe-auth/AGGmv60NSH-svLHtS9wi2s23c_U3ZQlBks5qabiRgaJpZM4JX-fm .

  4. laanwj commented at 8:57 AM on July 29, 2016: member
  5. fanquake commented at 2:09 AM on August 1, 2016: member

    I've also seen walletbackup.py fail, but it's a different error.

    ...Traceback (most recent call last):
      File "qa/pull-tester/rpc-tests.py", line 342, in <module>
        runtests()
      File "qa/pull-tester/rpc-tests.py", line 211, in runtests
        (name, stdout, stderr, passed, duration) = job_queue.get_next()
      File "qa/pull-tester/rpc-tests.py", line 266, in get_next
        (stdout, stderr) = proc.communicate(timeout=3)
      File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1072, in communicate
        stdout, stderr = self._communicate(input, endtime, timeout)
      File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1716, in _communicate
        self._check_timeout(endtime, orig_timeout)
      File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1098, in _check_timeout
        raise TimeoutExpired(self.args, orig_timeout)
    subprocess.TimeoutExpired: Command '['/Users/xxx/github/bitcoin/qa/rpc-tests/walletbackup.py', '--srcdir=/Users/xxx/github/bitcoin/src', '--portseed=39']' timed out after 3 seconds
    
    Traceback (most recent call last):
      File "qa/pull-tester/rpc-tests.py", line 342, in <module>
        runtests()
      File "qa/pull-tester/rpc-tests.py", line 211, in runtests
        (name, stdout, stderr, passed, duration) = job_queue.get_next()
      File "qa/pull-tester/rpc-tests.py", line 266, in get_next
        (stdout, stderr) = proc.communicate(timeout=3)
      File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1072, in communicate
        stdout, stderr = self._communicate(input, endtime, timeout)
      File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1716, in _communicate
        self._check_timeout(endtime, orig_timeout)
      File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1098, in _check_timeout
        raise TimeoutExpired(self.args, orig_timeout)
    subprocess.TimeoutExpired: Command '['/Users/xxx/github/bitcoin/qa/rpc-tests/wallet-dump.py', '--srcdir=/Users/xxx/github/bitcoin/src', '--portseed=35']' timed out after 3 seconds
    
  6. fanquake commented at 2:14 AM on August 1, 2016: member
    walletbackup.py:
    Initializing test directory /var/folders/rb/jqqd25w926s5429rf58_zsv00000gn/T/test34w8is11/39
    Unexpected exception caught during testing: Exception('bitcoind exited with status -6 during initialization',)
    Stopping nodes
    
    stderr:
     Error: Unable to start HTTP server. See debug log for details.
    Assertion failed: (nThreadsServicingQueue == 0), function ~CScheduler, file scheduler.cpp, line 19.
    Error: Unable to start HTTP server. See debug log for details.
      File "/Users/xxx/Github/bitcoin/qa/rpc-tests/test_framework/test_framework.py", line 146, in main
        self.setup_network()
      File "/Users/xxx/github/bitcoin/qa/rpc-tests/walletbackup.py", line 53, in setup_network
        self.nodes = start_nodes(self.num_nodes, self.options.tmpdir, extra_args)
      File "/Users/xxx/Github/bitcoin/qa/rpc-tests/test_framework/util.py", line 340, in start_nodes
        rpcs.append(start_node(i, dirname, extra_args[i], rpchost, binary=binary[i]))
      File "/Users/xxx/Github/bitcoin/qa/rpc-tests/test_framework/util.py", line 321, in start_node
        wait_for_bitcoind_start(bitcoind_processes[i], url, i)
      File "/Users/xxx/Github/bitcoin/qa/rpc-tests/test_framework/util.py", line 191, in wait_for_bitcoind_start
        raise Exception('bitcoind exited with status %i during initialization' % process.returncode)
    Traceback (most recent call last):
      File "/Users/xxx/github/bitcoin/qa/rpc-tests/walletbackup.py", line 201, in <module>
        WalletBackupTest().main()
      File "/Users/xxx/Github/bitcoin/qa/rpc-tests/test_framework/test_framework.py", line 169, in main
        stop_nodes(self.nodes)
      File "/Users/xxx/Github/bitcoin/qa/rpc-tests/test_framework/util.py", line 358, in stop_nodes
        for node in nodes:
    TypeError: 'NoneType' object is not iterable
    
    Pass: False, Duration: 11 s
    
  7. JeremyRubin commented at 6:16 PM on August 7, 2016: contributor

    I'm seeing this pretty frequently, seems to be part of why #8464 is failing.

  8. JeremyRubin commented at 2:17 AM on August 8, 2016: contributor

    I think you're right it is a race condition. To test the hypothesis, we can try running with a lot of workers, which should hopefully increase the probability of failure.

    Running ./rpc_tests.py -parallel=40 on master gives me:

    TEST                           | PASSED | DURATION
    
    p2p-fullblocktest.py           | True   | 174 s
    walletbackup.py                | False  | 9 s
    bip68-112-113-p2p.py           | True   | 34 s
    wallet.py                      | False  | 1 s
    wallet-hd.py                   | False  | 33 s
    wallet-dump.py                 | True   | 7 s
    listtransactions.py            | True   | 38 s
    receivedby.py                  | True   | 18 s
    mempool_resurrect_test.py      | True   | 1 s
    txn_doublespend.py --mineblock | True   | 4 s
    txn_clone.py                   | True   | 3 s
    getchaintips.py                | True   | 10 s
    rawtransactions.py             | False  | 15 s
    rest.py                        | False  | 62 s
    mempool_spendcoinbase.py       | True   | 1 s
    mempool_reorg.py               | True   | 6 s
    mempool_limit.py               | False  | 4 s
    httpbasics.py                  | True   | 1 s
    multi_rpc.py                   | True   | 3 s
    zapwallettxes.py               | False  | 13 s
    proxy_test.py                  | True   | 2 s
    merkle_blocks.py               | False  | 5 s
    fundrawtransaction.py          | False  | 23 s
    signrawtransactions.py         | True   | 2 s
    nodehandling.py                | True   | 14 s
    reindex.py                     | True   | 12 s
    decodescript.py                | True   | 0 s
    blockchain.py                  | False  | 1 s
    disablewallet.py               | True   | 3 s
    sendheaders.py                 | True   | 29 s
    keypool.py                     | True   | 5 s
    prioritise_transaction.py      | False  | 4 s
    invalidblockrequest.py         | True   | 5 s
    invalidtxrequest.py            | True   | 4 s
    abandonconflict.py             | False  | 14 s
    p2p-versionbits-warning.py     | False  | 7 s
    p2p-segwit.py                  | False  | 10 s
    segwit.py                      | False  | 4 s
    importprunedfunds.py           | False  | 3 s
    signmessages.py                | True   | 2 s
    p2p-compactblocks.py           | False  | 13 s
    zmq_test.py                    | False  | 0 s
    
    ALL                            | False  | 599 s (accumulated)
    

    Each one of these failures logs an Error: Unable to start HTTP server. See debug log for details..

    This supports the hypothesis that it's a race condition.

    When I ran with -parallel=1 on my branch, I still observed a (Error: Unable to start HTTP server. See debug log for details.) on rawtransaction (see #8464). So this means that it's likely not a race condition in the main test code (which should run in serial as per @MarcoFalke's suggestion), but perhaps in some cleanup code which locks the bitcoind starting.

    Hope that helps track this down, still not sure exactly where it is.

  9. MarcoFalke commented at 7:16 AM on August 8, 2016: member

    Interesting, I was hoping the walletbackup fail has been fixed by #8139 (related discussion in #8052)

    ./rpc_tests.py -parallel=40

    Running all tests in parallel may cause failures due to OOM or out of space (depending how your temp directory is set up).

  10. JeremyRubin commented at 6:45 PM on August 8, 2016: contributor

    Hm it's possible my failures were related to not properly clearing bitcoind pids... I'll check.

  11. MarcoFalke commented at 7:31 PM on August 15, 2016: member

    This just failed on travis due to insufficient funds:

       File "/home/travis/build/bitcoin/bitcoin/qa/rpc-tests/test_framework/test_framework.py", line 148, in main
        self.run_test()
      File "/home/travis/build/bitcoin/bitcoin/build/../qa/rpc-tests/walletbackup.py", line 122, in run_test
        self.do_one_round()
      File "/home/travis/build/bitcoin/bitcoin/build/../qa/rpc-tests/walletbackup.py", line 74, in do_one_round
        self.one_send(1, a2)
      File "/home/travis/build/bitcoin/bitcoin/build/../qa/rpc-tests/walletbackup.py", line 64, in one_send
        self.nodes[from_node].sendtoaddress(to_address, amount)
      File "/home/travis/build/bitcoin/bitcoin/qa/rpc-tests/test_framework/coverage.py", line 49, in __call__
        return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
      File "/home/travis/build/bitcoin/bitcoin/qa/rpc-tests/test_framework/authproxy.py", line 146, in __call__
        raise JSONRPCException(response['error'])
    
  12. laanwj closed this on Mar 16, 2017

  13. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-05 06:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me