test: Fix feature_dbcrash.py --usecli error #35379

pull maflcko wants to merge 4 commits into bitcoin:master from maflcko:2605-test-use-cli changing 2 files +10 −21
  1. maflcko commented at 10:02 AM on May 26, 2026: member

    This fixes a small intermittent issue that snuck in via commit fa8d4d5c35ef021c94e30ce7fe5a4c728b6decd5.

    Generally, it seems tedious and brittle trying to enumerate all possible exception types on all platforms and all test configs, all possible errno values, etc.

    So just treat any Exception as crash, and confirm it in the test. The test would still fail, if a crash did not happen after an Exception, but the failure may be minimally more tedious to debug. I think this is fine, because failures should be rare and having simpler and more flexible test code is preferable.


    Also, disable the test for now in CI, because it is quite slow.

  2. DrahtBot renamed this:
    test: Fix feature_dbcrash.py --usecli error
    test: Fix feature_dbcrash.py --usecli error
    on May 26, 2026
  3. DrahtBot added the label Tests on May 26, 2026
  4. DrahtBot commented at 10:02 AM on May 26, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35379.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK willcl-ark

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  5. fanquake requested review from willcl-ark on May 26, 2026
  6. maflcko force-pushed on May 26, 2026
  7. DrahtBot added the label CI failed on May 26, 2026
  8. DrahtBot commented at 10:26 AM on May 26, 2026: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. <sub>Task lint: https://github.com/bitcoin/bitcoin/actions/runs/26445727176/job/77852269720</sub> <sub>LLM reason (✨ experimental): CI failed because the Python lint step (py_lint) failed—ruff reported unused imports in test/functional/feature_dbcrash.py.</sub>

    <details><summary>Hints</summary>

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

    </details>

  9. willcl-ark commented at 11:09 AM on May 26, 2026: member

    Some conceptual thoughts:

    Is the runtime of the alpine job going to remain at ~50 minutes if we keep --usecli on it? I suppose it could, as dbcrash is very rpc-heavy when generating many transactions, and doing that through spawning clis is perhaps just ... slow. (note: I wonder if we can halve that nnumber by using sendrawtransaction rpc directly instead of the miniwallet helper which does a decoderawtransaction for each send?)

    I had assumed previously the time was just a full rebuild, but now I don't think so.

    Perhaps we keep the fix here (I think it's fine), but don't set dbcrash as usecli-eligible?

  10. DrahtBot closed this on May 26, 2026

  11. DrahtBot reopened this on May 26, 2026

  12. sedited commented at 3:20 PM on May 26, 2026: contributor

    generating many transactions, and doing that through spawning clis is perhaps just ... slow

    I was testing it for the parent PR and it took about 17 minutes on my machine.

  13. DrahtBot removed the label CI failed on May 26, 2026
  14. maflcko commented at 4:06 PM on May 26, 2026: member

    Is the runtime of the alpine job going to remain at ~50 minutes if we keep --usecli on it? I suppose it could, as dbcrash is very rpc-heavy when generating many transactions, and doing that through spawning clis is perhaps just ... slow. (note: I wonder if we can halve that nnumber by using sendrawtransaction rpc directly instead of the miniwallet helper which does a decoderawtransaction for each send?)

    Thx, disabled it again, for now.

    Even halving to 25 minutes seems a bit steep for a single test case?

  15. maflcko force-pushed on May 26, 2026
  16. DrahtBot added the label CI failed on May 26, 2026
  17. maflcko commented at 4:11 PM on May 26, 2026: member

    generating many transactions, and doing that through spawning clis is perhaps just ... slow

    I was testing it for the parent PR and it took about 17 minutes on my machine.

    I think the CI machine is using 7950X3D, which is quite fast, but also a heavy debug build with -O1 DEBUG DEBUG_LOCKORDER DEBUG_LOCKCONTENTION RPC_DOC_CHECK ABORT_ON_FAILED_ASSUME _GLIBCXX_DEBUG _GLIBCXX_DEBUG_PEDANTIC _LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG

  18. DrahtBot removed the label CI failed on May 26, 2026
  19. maflcko added this to the milestone 32.0 on May 26, 2026
  20. maflcko commented at 7:18 PM on May 27, 2026: member

    Example failure: https://github.com/bitcoin/bitcoin/actions/runs/26519150767/job/78106527120#step:11:4133

    2026-05-27T15:49:02.790941Z TestFramework (INFO): Iteration 37, generating 2500 transactions [11, 7, 4]
    2026-05-27T15:49:39.323797Z TestFramework (INFO): Iteration 38, generating 2500 transactions [12, 7, 4]
    2026-05-27T15:50:15.286848Z TestFramework (INFO): Iteration 39, generating 2500 transactions [13, 7, 4]
    2026-05-27T15:50:52.526040Z TestFramework (INFO): Verifying utxo hash matches for all nodes
    2026-05-27T15:50:52.595004Z TestFramework (ERROR): Called Process failed with stdout='error: Error while attempting to communicate with server 127.0.0.1:19421 (EOF)
    
    Make sure the bitcoind server is running and that you are connecting to the correct RPC port.
    Use "bitcoin-cli -help" for more info.
    '; stderr='None';
    Traceback (most recent call last):
      File "/home/admin/actions-runner/_work/_temp/test/functional/test_framework/test_framework.py", line 143, in main
        self.run_test()
      File "/home/admin/actions-runner/_work/_temp/build/test/functional/feature_dbcrash.py", line 273, in run_test
        self.verify_utxo_hash()
      File "/home/admin/actions-runner/_work/_temp/build/test/functional/feature_dbcrash.py", line 182, in verify_utxo_hash
        nodei_utxo_hash = self.nodes[i].gettxoutsetinfo()['hash_serialized_3']
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/admin/actions-runner/_work/_temp/test/functional/test_framework/test_node.py", line 967, in __call__
        return self.cli.send_cli(self.command, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/admin/actions-runner/_work/_temp/test/functional/test_framework/test_node.py", line 1047, in send_cli
        raise subprocess.CalledProcessError(returncode, p_args, output=cli_stderr)
    subprocess.CalledProcessError: Command '['bitcoin', '-m', 'rpc', '-nonamed', '-datadir=/home/admin/actions-runner/_work/_temp/ci/scratch_ ₿🧪_/test_runner/test_runner_₿_🏃_20260527_151953/feature_dbcrash_285/node1', '-rpcclienttimeout=9600', '-rpcconnect=127.0.0.1', '-rpcport=19421', 'gettxoutsetinfo']' returned non-zero exit status 1.
    2026-05-27T15:50:52.670092Z TestFramework (INFO): Not stopping nodes as test failed. The dangling processes will be cleaned up later.
    2026-05-27T15:50:52.670165Z TestFramework (WARNING): Not cleaning up dir /home/admin/actions-runner/_work/_temp/ci/scratch_ ₿🧪_/test_runner/test_runner_₿_🏃_20260527_151953/feature_dbcrash_285
    2026-05-27T15:50:52.670208Z TestFramework (ERROR): Test failed. Test logging available at /home/admin/actions-runner/_work/_temp/ci/scratch_ ₿🧪_/test_runner/test_runner_₿_🏃_20260527_151953/feature_dbcrash_285/test_framework.log
    2026-05-27T15:50:52.670290Z TestFramework (ERROR): 
    2026-05-27T15:50:52.670400Z TestFramework (ERROR): Hint: Call /home/admin/actions-runner/_work/_temp/test/functional/combine_logs.py '/home/admin/actions-runner/_work/_temp/ci/scratch_ ₿🧪_/test_runner/test_runner_₿_🏃_20260527_151953/feature_dbcrash_285' to consolidate all logs
    2026-05-27T15:50:52.670437Z TestFramework (ERROR): 
    2026-05-27T15:50:52.670466Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
    2026-05-27T15:50:52.670511Z TestFramework (ERROR): https://github.com/bitcoin/bitcoin/issues
    2026-05-27T15:50:52.670542Z TestFramework (ERROR): 
    
    
    stderr:
    [node 2] Cleaning up leftover process
    [node 1] Cleaning up leftover process
    [node 0] Cleaning up leftover process
    [node 3] Cleaning up leftover process
    
    
    
  21. in ci/test/00_setup_env_native_alpine_musl.sh:20 in faacd44941 outdated
      16 | @@ -17,5 +17,5 @@ export BITCOIN_CONFIG="\
      17 |   -DREDUCE_EXPORTS=ON \
      18 |   -DCMAKE_BUILD_TYPE=Debug \
      19 |  "
      20 | -export TEST_RUNNER_EXTRA="--v2transport --usecli --extended"
      21 | +export TEST_RUNNER_EXTRA="--v2transport --usecli --extended --exclude feature_dbcrash"  # Run extended tests under --usecli and --v2transport, but exclude the very slow dbcrash
    


    fanquake commented at 9:17 AM on May 28, 2026:

    Do we want to unexclude after #35394?


    maflcko commented at 9:56 AM on May 28, 2026:

    Even on that pull the test was taking:

    feature_dbcrash.py                                                               | ✓ Passed  | 1696 s
    

    Which seems a long time, just to test the cli a bit more?

    (Recall that the test was never running before, because it was implicitly excluded because it didn't support the --usecli)


    maflcko commented at 2:03 PM on May 29, 2026:

    (Closing thread for now, I presume it is addressed)


    fanquake commented at 2:10 PM on May 29, 2026:

    Fine by me.

  22. in test/functional/feature_dbcrash.py:139 in faded8d435 outdated
     135 | @@ -149,6 +136,7 @@ def sync_node3blocks(self, block_hashes):
     136 |                  if not self.submit_block_catch_error(i, block):
     137 |                      # TODO: more carefully check that the crash is due to -dbcrashratio
     138 |                      # (change the exit code perhaps, and check that here?)
     139 | +                    # wait_for_node_exit() enforces that bitcoind crashed.
    


    willcl-ark commented at 9:47 AM on May 28, 2026:

    In faded8d435afb230f048c27f9285070da355f3a2

    I think we might want wait_for_node_exit() in verify_utxo_hash() too, to ensure the process has crashed.


    maflcko commented at 10:14 AM on May 28, 2026:

    thx, pushed a commit, but I haven't tested it locally yet


    maflcko commented at 1:42 PM on May 28, 2026:

    Ok, this is a bit tedious to test, but to check if the code was hit, I added a print:

    diff --git a/test/functional/feature_dbcrash.py b/test/functional/feature_dbcrash.py
    index 24e663af82..08b2d83224 100755
    --- a/test/functional/feature_dbcrash.py
    +++ b/test/functional/feature_dbcrash.py
    @@ -172,6 +172,7 @@ class ChainstateWriteCrashTest(BitcoinTestFramework):
                     # probably a crash on db flushing
                     self.wait_for_node_exit(i, timeout=10)
                     nodei_utxo_hash = self.restart_node(i, expected_tip=self.nodes[3].getbestblockhash())
    +                print('AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA')
                 assert_equal(nodei_utxo_hash, node3_utxo_hash)
     
         def generate_small_transactions(self, node, count, utxo_list):
    

    After 7 runs I got:

    ...
    2026-05-28T13:32:10.069860Z TestFramework (INFO): Iteration 39, generating 2500 transactions [10, 5, 3]
    2026-05-28T13:32:29.601244Z TestFramework (INFO): Verifying utxo hash matches for all nodes
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    2026-05-28T13:32:30.487595Z TestFramework (INFO): Restarted nodes: [11, 5, 3]; crashes on restart: 26
    2026-05-28T13:32:30.553091Z TestFramework (INFO): Stopping nodes
    2026-05-28T13:32:30.658533Z TestFramework (INFO): Cleaning up /tmp/bitcoin_func_test_mp2hcv2o on exit
    2026-05-28T13:32:30.658594Z TestFramework (INFO): Tests successful
    

    willcl-ark commented at 1:47 PM on May 28, 2026:

    Oh nice, so it works.

    (also I wonder if you could have increased dbcrash ratio to crash more often to trigger it more)


    maflcko commented at 7:07 AM on May 29, 2026:

    Oh nice, so it works.

    (also I wonder if you could have increased dbcrash ratio to crash more often to trigger it more)

    Possibly. I'll leave it to other testers to find out, if they want to :)

  23. willcl-ark commented at 9:51 AM on May 28, 2026: member

    Looks OK to me. Left one comment about verifying the node has crashed before restarting.

    Do we want to unexclude after #35394?

    Hmm, I think the value of this is pretty low. The test is for chainstate recovery, and I don't think switching out RPC transports give much added value (and it's a slow test, in any case). The RPCs in use are all pretty standard mining/tx ones. Not sure it's worth it.

  24. test: [refactor] Simplify submit_block_catch_error
    Make it catch any Exception and let the caller verify it.
    
    This refactor does not change any behavior, but the code is simpler,
    more flexible and still correct, because wait_for_node_exit enforces the
    crash to happen.
    fa09de8b68
  25. test: Fix feature_dbcrash.py --usecli intermittent error
    Catch any Exception in verify_utxo_hash and let restart_node verify the
    crash via wait_for_node_exit.
    
    (Also, use named args in restart_node, while touching this test)
    
    Catching any Exception covers possible subprocess.CalledProcessError
    that may happen in a --usecli run. E.g.
    
      TestFramework (INFO): Verifying utxo hash matches for all nodes
      TestFramework.bitcoincli (DEBUG): Running bitcoin-cli ['-datadir=/tmp/bitcoin_func_test_gzufs0ht/node0', '-rpcclienttimeout=240', '-rpcconnect=127.0.0.1', '-rpcport=20963', 'gettxoutsetinfo']
      TestFramework.bitcoincli (DEBUG): Running bitcoin-cli ['-datadir=/tmp/bitcoin_func_test_gzufs0ht/node1', '-rpcclienttimeout=240', '-rpcconnect=127.0.0.1', '-rpcport=20964', 'gettxoutsetinfo']
      TestFramework (ERROR): Called Process failed with stdout='error: timeout on transient error: Could not connect to the server 127.0.0.1:20964 (error code 1 - "EOF reached")
    
      Make sure the bitcoind server is running and that you are connecting to the correct RPC port.
      Use "bitcoin-cli -help" for more info.
      '; stderr='None';
      Traceback (most recent call last):
        File "./test/functional/test_framework/test_framework.py", line 143, in main
          self.run_test()
          ~~~~~~~~~~~~~^^
        File "./test/functional/feature_dbcrash.py", line 273, in run_test
          self.verify_utxo_hash()
          ~~~~~~~~~~~~~~~~~~~~~^^
        File "./test/functional/feature_dbcrash.py", line 182, in verify_utxo_hash
          nodei_utxo_hash = self.nodes[i].gettxoutsetinfo()['hash_serialized_3']
                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
        File "./test/functional/test_framework/test_node.py", line 963, in __call__
          return self.cli.send_cli(self.command, *args, **kwargs)
                 ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "./test/functional/test_framework/test_node.py", line 1043, in send_cli
          raise subprocess.CalledProcessError(returncode, p_args, output=cli_stderr)
    fac27d702f
  26. ci: Exclude feature_dbcrash.py under --v2transport --usecli
    The test should now be passing, but it is slow.
    
    For example, the commit that enabled the test, took ~47 minutes on a
    fast CPU, but using a heavy debug build:
    
    https://github.com/bitcoin/bitcoin/actions/runs/26434786214/job/77815064664?pr=35363#step:11:4101
    
    ...
    Model name: AMD Ryzen 9 7950X3D 16-Core Processor
    ...
    C++ compiler .......................... GNU 15.2.0, /usr/bin/g++
    CMAKE_BUILD_TYPE ...................... Debug
    Preprocessor defined macros ........... DEBUG DEBUG_LOCKORDER DEBUG_LOCKCONTENTION RPC_DOC_CHECK ABORT_ON_FAILED_ASSUME _GLIBCXX_DEBUG _GLIBCXX_DEBUG_PEDANTIC _LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG
    C++ compiler flags .................... -m64 -O0 -ftrapv -O1 -g3 -g3 -std=c++20 ...
    ...
    feature_dbcrash.py | ✓ Passed  | 2806 s
    faf1475514
  27. test: Wait for node exit after crash in verify_utxo_hash fad585b6e5
  28. maflcko force-pushed on May 28, 2026
  29. maflcko commented at 10:15 AM on May 28, 2026: member

    .

  30. fanquake requested review from stickies-v on May 28, 2026
  31. Sjors commented at 7:01 PM on May 28, 2026: member

    The current master version of this test outputs about 280MB of logs when it fails, see e.g. https://github.com/bitcoin/bitcoin/actions/runs/26592252276/job/78354332966?pr=35403

    I guess that's solved by skipping it on CI?

  32. maflcko commented at 7:06 AM on May 29, 2026: member

    The current master version of this test outputs about 280MB of logs when it fails, see e.g. https://github.com/bitcoin/bitcoin/actions/runs/26592252276/job/78354332966?pr=35403

    I guess that's solved by skipping it on CI?

    The test is creating 40*2500 txs and logging them in hex to the debug log, which comes out to ~200MB. This has always been the case. Maybe it can be hidden via --tracerpc, but I don't really want to turn this single-test-bugfix into a whole unrelated test framework cleanup.

  33. willcl-ark approved
  34. willcl-ark commented at 2:00 PM on May 29, 2026: member

    ACK fad585b6e59452bcd0344653537001adc99450dc

    Nice to see test runtime on the apline job drop back to < 20 minutes :)

    IMO usecli gives little if any benefit on feature_dbcrash, so is fine to disable.

    This is also knackering loads of CI jobs currently, so I think we should get it in reasonably swiftly.

  35. fanquake merged this on May 29, 2026
  36. fanquake closed this on May 29, 2026

  37. maflcko deleted the branch on May 29, 2026


stickies-v

Labels

Milestone
32.0


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-31 17:50 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me