qa: Fix TxIndex race conditions #32010

pull hodlinator wants to merge 3 commits into bitcoin:master from hodlinator:2025/02/txindex_test_race changing 6 files +31 −9
  1. hodlinator commented at 11:08 pm on March 6, 2025: contributor
    • Add synchronization in 3 places where if the Transaction Index happens to be slow, we get rare test failures when querying it for transactions (one such case experienced on Windows, prompting investigation).
    • Remove unnecessary TxIndex initialization in some tests.
    • Add some test coverage where TxIndex aspect could be tested in feature_init.py.
  2. DrahtBot commented at 11:09 pm on March 6, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32010.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK fjahr, mzumsande, furszy
    Concept ACK Prabhat1308

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label Tests on Mar 6, 2025
  4. hodlinator force-pushed on Mar 6, 2025
  5. DrahtBot commented at 11:13 pm on March 6, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/38344306947

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  6. DrahtBot added the label CI failed on Mar 6, 2025
  7. in test/functional/feature_init.py:89 in 6e6f2f1f4e outdated
    85@@ -86,12 +86,11 @@ def check_clean_start():
    86         if self.is_wallet_compiled():
    87             lines_to_terminate_after.append(b'Verifying wallet')
    88 
    89-        for terminate_line in lines_to_terminate_after:
    


    mzumsande commented at 11:25 pm on March 6, 2025:

    why unnecessary? The point of the test (which has a very different “philosophy” from most of our other functional tests) is to trigger these restarts on purpose, simulating power loss / node crashes at multiple different points during the init sequence, to make sure that indexes or chainstate do not get corrupted - this has been a source of bugs in the past.

    If you do fewer of them, the test will obviously run faster, but you also reduce the scope of the test.


    hodlinator commented at 11:40 pm on March 6, 2025:

    Thanks! I was relying too much on the log lines from the test saying the node would “exit” and not looking deeper into why we were calling sigterm_node().

    Dropped that commit and changed the log line to say “terminate” instead of “exit”.


    mzumsande commented at 0:01 am on March 7, 2025:
    oh, the tests uses SIGTERM - so it doesn’t simulate a power loss like I said above, but a user abort - but the general point still stands. (I confused it with similar tests I sometimes run but never PR’ed which use SIGKILL).

    hodlinator commented at 9:36 pm on March 8, 2025:
    As this thread prompted me to understand the beginning of the test better, I realized I could take a stab at making it work on Windows, and found a way: #32021.
  8. hodlinator force-pushed on Mar 6, 2025
  9. DrahtBot removed the label CI failed on Mar 7, 2025
  10. in test/functional/rpc_rawtransaction.py:74 in 081ae231ff outdated
    69@@ -70,7 +70,7 @@ def set_test_params(self):
    70         self.num_nodes = 3
    71         self.extra_args = [
    72             ["-txindex"],
    73-            ["-txindex"],
    74+            [],
    


    fjahr commented at 3:14 pm on March 7, 2025:
    Did you check if this has an performance impact and maybe that’s why they were added here?

    hodlinator commented at 9:05 pm on March 7, 2025:

    At 081ae231ff542c254924a1770c9db666aff9f880^:

    0₿ hyperfine -w 1 "build/test/functional/rpc_rawtransaction.py --legacy-wallet"
    1Benchmark 1: build/test/functional/rpc_rawtransaction.py --legacy-wallet
    2  Time (mean ± σ):      4.776 s ±  0.025 s    [User: 1.420 s, System: 0.267 s]
    3  Range (min … max):    4.717 s …  4.811 s    10 runs
    

    At 081ae231ff542c254924a1770c9db666aff9f880:

    0₿ hyperfine -w 1 "build/test/functional/rpc_rawtransaction.py --legacy-wallet"
    1Benchmark 1: build/test/functional/rpc_rawtransaction.py --legacy-wallet
    2  Time (mean ± σ):      4.761 s ±  0.029 s    [User: 1.387 s, System: 0.278 s]
    3  Range (min … max):    4.713 s …  4.791 s    10 runs
    

    Conclusion: Negligible improvement.

    More discussion in other thread.

  11. in test/functional/wallet_avoid_mixing_output_types.py:120 in 081ae231ff outdated
    116@@ -117,7 +117,6 @@ def set_test_params(self):
    117         self.extra_args = [
    118             [
    119                 "-addresstype=bech32",
    120-                "-txindex",
    


    fjahr commented at 3:14 pm on March 7, 2025:
    Same as above, may check for performance impact on this test.

    hodlinator commented at 9:06 pm on March 7, 2025:

    At 081ae231ff542c254924a1770c9db666aff9f880^:

    0₿ hyperfine -w 1 "build/test/functional/wallet_avoid_mixing_output_types.py --descriptors"
    1Benchmark 1: build/test/functional/wallet_avoid_mixing_output_types.py --descriptors
    2  Time (mean ± σ):      1.581 s ±  0.588 s    [User: 0.610 s, System: 0.113 s]
    3  Range (min … max):    0.927 s …  2.288 s    10 runs
    

    At 081ae231ff542c254924a1770c9db666aff9f880:

    0₿ hyperfine -w 1 "build/test/functional/wallet_avoid_mixing_output_types.py --descriptors"
    1Benchmark 1: build/test/functional/wallet_avoid_mixing_output_types.py --descriptors
    2  Time (mean ± σ):      1.640 s ±  0.575 s    [User: 0.602 s, System: 0.117 s]
    3  Range (min … max):    0.879 s …  2.288 s    10 runs
    

    Conclusion: A slight increase in average run-time, but a faster min-run.

    I prefer turning off functionality that isn’t required for tests, eliminating them from this category of race conditions completely.

  12. in test/functional/feature_init.py:105 in 6212731c5c outdated
    101@@ -102,12 +102,21 @@ def check_clean_start():
    102             'blocks/index/*.ldb': 'Error opening block database.',
    103             'chainstate/*.ldb': 'Error opening coins database.',
    104             'blocks/blk*.dat': 'Error loading block database.',
    105+            # Not detected: 'indexes/blockfilter/basic/*.dat', 'indexes/blockfilter/basic/db/*.*', 'indexes/coinstats/db/*.*'
    


    fjahr commented at 3:56 pm on March 7, 2025:
    Can this be a bit more clear what this means? I guess “Removing these files does not result in a startup error”? Would be great if you could spell that out because I needed a moment to understand. You could also put the commented out line below and this one together and above the variable declaration. I am not sure if having them in line and split up helps with understanding. Same below.

    hodlinator commented at 9:27 pm on March 7, 2025:
    Done in latest push.
  13. fjahr commented at 3:59 pm on March 7, 2025: contributor
    Concept ACK
  14. qa: Add missing coverage of corrupt indexes 7ac281c19c
  15. qa: Remove unnecessary -txindex args
    (Parent commit ensured indexes in feature_init.py are actually used, otherwise they would be removed here as well).
    9bfb0d75ba
  16. hodlinator force-pushed on Mar 7, 2025
  17. in test/functional/test_framework/util.py:597 in 12a1d5d813 outdated
    591@@ -592,3 +592,10 @@ def find_vout_for_address(node, txid, addr):
    592         if addr == tx["vout"][i]["scriptPubKey"]["address"]:
    593             return i
    594     raise RuntimeError("Vout not found for address: txid=%s, addr=%s" % (txid, addr))
    595+
    596+
    597+def sync_txindex(test_framework, node):
    


    fjahr commented at 12:05 pm on March 8, 2025:
    nit: I would have put this into test_framework/test_framework.py so it can live among its peers sync_blocks et al and we would save the import from util. But fine to keep as is.

    hodlinator commented at 7:20 pm on March 8, 2025:
    Looking at the test_framework.py functions, they’re about syncing between nodes. Maybe this one could be called wait_on_txindex(), but the RPC object uses “synced”, so I’m not sure.

    fjahr commented at 11:37 pm on March 8, 2025:
    Right, I’m not passionate about the naming here. sync_blocks is syncing between two blockchains, sync_txindex is syncing between a blockchain and an index, so it’s not wrong IMO. But wait_for_txindex() sounds fine, too. So up to you what you want to choose and if you want to change this at all.

    hodlinator commented at 2:32 pm on March 10, 2025:
    Looked around for wait_for_-functions, but a bunch of those are used to check for last_message in P2PInterface. Staring a bit at sync_txindex I realized the wait condition could be simplified, decided to do that to pre-empt other reviewers. Excuse me for invalidating your ACK.
  18. fjahr commented at 12:07 pm on March 8, 2025: contributor
    Code review ACK 12a1d5d813640addc8186e7fc882777f5a078164
  19. qa: Wait for txindex to avoid race condition
    Can be verified to be necessary through adding std::this_thread::sleep_for(0.5s) at the beginning of TxIndex::CustomAppend.
    3301d2cbe8
  20. hodlinator force-pushed on Mar 10, 2025
  21. fjahr commented at 4:28 pm on March 10, 2025: contributor

    re-ACK 3301d2cbe8c3b76c97285d75fa59637cb6952d0b

    Only change was to make the helper internals a bit nicer.

  22. mzumsande commented at 6:14 pm on March 10, 2025: contributor
    Code Review ACK 3301d2cbe8c3b76c97285d75fa59637cb6952d0b
  23. Prabhat1308 commented at 4:05 am on March 13, 2025: none
    Are the files mentioned in files_to_delete and files_to_perturb comments just for informational purposes that they don’t trigger an error or you want to indicate that these files should trigger an error but currently they don’t.
  24. hodlinator commented at 7:30 pm on March 13, 2025: contributor

    Are the files mentioned in files_to_delete and files_to_perturb comments just for informational purposes that they don’t trigger an error or you want to indicate that these files should trigger an error but currently they don’t.

    Yes, some of them at least probably should trigger an error, but I’m not knowledgeable of those aspects of LevelDB. It would probably be good to dig deeper into that, but I wanted to keep the focus of this PR on the tx index race conditions.

    The reason I touched feature_init.py was because I was trying to eliminate unnecessary -txindex usage, eliminating the possibility for these race conditions - but it felt bad to just strip it out there, so I preferred to at least add file patterns in the test for what could be checked without changing the C++ code.

  25. in test/functional/test_framework/util.py:600 in 3301d2cbe8
    591@@ -592,3 +592,10 @@ def find_vout_for_address(node, txid, addr):
    592         if addr == tx["vout"][i]["scriptPubKey"]["address"]:
    593             return i
    594     raise RuntimeError("Vout not found for address: txid=%s, addr=%s" % (txid, addr))
    595+
    596+
    597+def sync_txindex(test_framework, node):
    598+    test_framework.log.debug("Waiting for node txindex to sync")
    599+    sync_start = int(time.time())
    600+    test_framework.wait_until(lambda: node.getindexinfo("txindex")["txindex"]["synced"])
    


    furszy commented at 8:57 pm on March 13, 2025:
    tiny nit: could provide a less aggressive check_interval. Like 0.1.

    hodlinator commented at 10:01 am on March 14, 2025:
    Thanks, yes it often takes a little while to sync up so I think 0.1 would be slightly better than the default 0.05. Will do if I re-touch.
  26. furszy commented at 9:03 pm on March 13, 2025: member
    Code review ACK 3301d2cbe8c3b76c97285d75fa59637cb6952d0b
  27. Prabhat1308 commented at 2:53 am on March 15, 2025: none

    Concept ACK 3301d2c I could not recreate a race condition but I have

    • Tested that the commented files dont trigger a startup error.
    • Ran all the tests to check that all pass .
  28. hodlinator commented at 9:41 pm on March 15, 2025: contributor

    Thanks for testing!

    I could not recreate a race condition but I have

    Did you try the recommended approach in the commit message of 3301d2cbe8c3b76c97285d75fa59637cb6952d0b? It is a bit terse.

    • Make sure you don’t have the added sync_txindex fix on the Python side.
    • Add the sleep_for() in the recommended place in the C++ side, (build).
    • You should be able to get the individual tests to fail (simplest is to run the test_runner.py suite so the --args get sent in correctly).

    Have a feeling I needed slightly different sleep-duration on Windows/Linux due to differences in native/Python execution speed, so maybe it would help to insert slightly longer durations than recommended on Mac to reproduce the issue?

    Edit: Should have also mentioned that one might have to tweak the --timeout_factor= value to not time out in unrelated ways (such as tests waiting for more blocks to be synced than in the fixed tests).

  29. fanquake merged this on Mar 17, 2025
  30. fanquake closed this on Mar 17, 2025

  31. stickies-v referenced this in commit 5443a6eac4 on Mar 21, 2025
  32. stickies-v referenced this in commit a8bd94de4e on Mar 21, 2025
  33. stickies-v referenced this in commit 5571e63c54 on Mar 21, 2025
  34. TheCharlatan referenced this in commit 50d4368616 on Mar 24, 2025
  35. hodlinator deleted the branch on Mar 25, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-29 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me