- Add synchronization in 3 places where if the Transaction Index happens to be slow, we get rare test failures when querying it for transactions (one such case experienced on Windows, prompting investigation).
- Remove unnecessary TxIndex initialization in some tests.
- Add some test coverage where TxIndex aspect could be tested in feature_init.py.
qa: Fix TxIndex race conditions #32010
pull hodlinator wants to merge 3 commits into bitcoin:master from hodlinator:2025/02/txindex_test_race changing 6 files +31 −9-
hodlinator commented at 11:08 pm on March 6, 2025: contributor
-
DrahtBot commented at 11:09 pm on March 6, 2025: contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Code Coverage & Benchmarks
For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32010.
Reviews
See the guideline for information on the review process.
Type Reviewers ACK fjahr, mzumsande, furszy Concept ACK Prabhat1308 If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
-
DrahtBot added the label Tests on Mar 6, 2025
-
hodlinator force-pushed on Mar 6, 2025
-
DrahtBot commented at 11:13 pm on March 6, 2025: contributor
🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/38344306947
Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:
-
Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
-
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
-
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
-
-
DrahtBot added the label CI failed on Mar 6, 2025
-
in test/functional/feature_init.py:89 in 6e6f2f1f4e outdated
85@@ -86,12 +86,11 @@ def check_clean_start(): 86 if self.is_wallet_compiled(): 87 lines_to_terminate_after.append(b'Verifying wallet') 88 89- for terminate_line in lines_to_terminate_after:
mzumsande commented at 11:25 pm on March 6, 2025:why unnecessary? The point of the test (which has a very different “philosophy” from most of our other functional tests) is to trigger these restarts on purpose, simulating power loss / node crashes at multiple different points during the init sequence, to make sure that indexes or chainstate do not get corrupted - this has been a source of bugs in the past.
If you do fewer of them, the test will obviously run faster, but you also reduce the scope of the test.
hodlinator commented at 11:40 pm on March 6, 2025:Thanks! I was relying too much on the log lines from the test saying the node would “exit” and not looking deeper into why we were calling
sigterm_node()
.Dropped that commit and changed the log line to say “terminate” instead of “exit”.
mzumsande commented at 0:01 am on March 7, 2025:oh, the tests uses SIGTERM - so it doesn’t simulate a power loss like I said above, but a user abort - but the general point still stands. (I confused it with similar tests I sometimes run but never PR’ed which use SIGKILL).
hodlinator commented at 9:36 pm on March 8, 2025:As this thread prompted me to understand the beginning of the test better, I realized I could take a stab at making it work on Windows, and found a way: #32021.hodlinator force-pushed on Mar 6, 2025DrahtBot removed the label CI failed on Mar 7, 2025in test/functional/rpc_rawtransaction.py:74 in 081ae231ff outdated
69@@ -70,7 +70,7 @@ def set_test_params(self): 70 self.num_nodes = 3 71 self.extra_args = [ 72 ["-txindex"], 73- ["-txindex"], 74+ [],
fjahr commented at 3:14 pm on March 7, 2025:Did you check if this has an performance impact and maybe that’s why they were added here?
hodlinator commented at 9:05 pm on March 7, 2025:At 081ae231ff542c254924a1770c9db666aff9f880^:
0₿ hyperfine -w 1 "build/test/functional/rpc_rawtransaction.py --legacy-wallet" 1Benchmark 1: build/test/functional/rpc_rawtransaction.py --legacy-wallet 2 Time (mean ± σ): 4.776 s ± 0.025 s [User: 1.420 s, System: 0.267 s] 3 Range (min … max): 4.717 s … 4.811 s 10 runs
At 081ae231ff542c254924a1770c9db666aff9f880:
0₿ hyperfine -w 1 "build/test/functional/rpc_rawtransaction.py --legacy-wallet" 1Benchmark 1: build/test/functional/rpc_rawtransaction.py --legacy-wallet 2 Time (mean ± σ): 4.761 s ± 0.029 s [User: 1.387 s, System: 0.278 s] 3 Range (min … max): 4.713 s … 4.791 s 10 runs
Conclusion: Negligible improvement.
More discussion in other thread.
in test/functional/wallet_avoid_mixing_output_types.py:120 in 081ae231ff outdated
116@@ -117,7 +117,6 @@ def set_test_params(self): 117 self.extra_args = [ 118 [ 119 "-addresstype=bech32", 120- "-txindex",
fjahr commented at 3:14 pm on March 7, 2025:Same as above, may check for performance impact on this test.
hodlinator commented at 9:06 pm on March 7, 2025:At 081ae231ff542c254924a1770c9db666aff9f880^:
0₿ hyperfine -w 1 "build/test/functional/wallet_avoid_mixing_output_types.py --descriptors" 1Benchmark 1: build/test/functional/wallet_avoid_mixing_output_types.py --descriptors 2 Time (mean ± σ): 1.581 s ± 0.588 s [User: 0.610 s, System: 0.113 s] 3 Range (min … max): 0.927 s … 2.288 s 10 runs
At 081ae231ff542c254924a1770c9db666aff9f880:
0₿ hyperfine -w 1 "build/test/functional/wallet_avoid_mixing_output_types.py --descriptors" 1Benchmark 1: build/test/functional/wallet_avoid_mixing_output_types.py --descriptors 2 Time (mean ± σ): 1.640 s ± 0.575 s [User: 0.602 s, System: 0.117 s] 3 Range (min … max): 0.879 s … 2.288 s 10 runs
Conclusion: A slight increase in average run-time, but a faster min-run.
I prefer turning off functionality that isn’t required for tests, eliminating them from this category of race conditions completely.
in test/functional/feature_init.py:105 in 6212731c5c outdated
101@@ -102,12 +102,21 @@ def check_clean_start(): 102 'blocks/index/*.ldb': 'Error opening block database.', 103 'chainstate/*.ldb': 'Error opening coins database.', 104 'blocks/blk*.dat': 'Error loading block database.', 105+ # Not detected: 'indexes/blockfilter/basic/*.dat', 'indexes/blockfilter/basic/db/*.*', 'indexes/coinstats/db/*.*'
fjahr commented at 3:56 pm on March 7, 2025:Can this be a bit more clear what this means? I guess “Removing these files does not result in a startup error”? Would be great if you could spell that out because I needed a moment to understand. You could also put the commented out line below and this one together and above the variable declaration. I am not sure if having them in line and split up helps with understanding. Same below.
hodlinator commented at 9:27 pm on March 7, 2025:Done in latest push.fjahr commented at 3:59 pm on March 7, 2025: contributorConcept ACKqa: Add missing coverage of corrupt indexes 7ac281c19cqa: Remove unnecessary -txindex args
(Parent commit ensured indexes in feature_init.py are actually used, otherwise they would be removed here as well).
hodlinator force-pushed on Mar 7, 2025in test/functional/test_framework/util.py:597 in 12a1d5d813 outdated
591@@ -592,3 +592,10 @@ def find_vout_for_address(node, txid, addr): 592 if addr == tx["vout"][i]["scriptPubKey"]["address"]: 593 return i 594 raise RuntimeError("Vout not found for address: txid=%s, addr=%s" % (txid, addr)) 595+ 596+ 597+def sync_txindex(test_framework, node):
fjahr commented at 12:05 pm on March 8, 2025:nit: I would have put this intotest_framework/test_framework.py
so it can live among its peerssync_blocks
et al and we would save the import from util. But fine to keep as is.
hodlinator commented at 7:20 pm on March 8, 2025:Looking at the test_framework.py functions, they’re about syncing between nodes. Maybe this one could be calledwait_on_txindex()
, but the RPC object uses “synced”, so I’m not sure.
fjahr commented at 11:37 pm on March 8, 2025:Right, I’m not passionate about the naming here.sync_blocks
is syncing between two blockchains,sync_txindex
is syncing between a blockchain and an index, so it’s not wrong IMO. Butwait_for_txindex()
sounds fine, too. So up to you what you want to choose and if you want to change this at all.
hodlinator commented at 2:32 pm on March 10, 2025:Looked around forwait_for_
-functions, but a bunch of those are used to check forlast_message
inP2PInterface
. Staring a bit atsync_txindex
I realized the wait condition could be simplified, decided to do that to pre-empt other reviewers. Excuse me for invalidating your ACK.fjahr commented at 12:07 pm on March 8, 2025: contributorCode review ACK 12a1d5d813640addc8186e7fc882777f5a078164qa: Wait for txindex to avoid race condition
Can be verified to be necessary through adding std::this_thread::sleep_for(0.5s) at the beginning of TxIndex::CustomAppend.
hodlinator force-pushed on Mar 10, 2025fjahr commented at 4:28 pm on March 10, 2025: contributorre-ACK 3301d2cbe8c3b76c97285d75fa59637cb6952d0b
Only change was to make the helper internals a bit nicer.
mzumsande commented at 6:14 pm on March 10, 2025: contributorCode Review ACK 3301d2cbe8c3b76c97285d75fa59637cb6952d0bPrabhat1308 commented at 4:05 am on March 13, 2025: noneAre the files mentioned infiles_to_delete
andfiles_to_perturb
comments just for informational purposes that they don’t trigger an error or you want to indicate that these files should trigger an error but currently they don’t.hodlinator commented at 7:30 pm on March 13, 2025: contributorAre the files mentioned in
files_to_delete
andfiles_to_perturb
comments just for informational purposes that they don’t trigger an error or you want to indicate that these files should trigger an error but currently they don’t.Yes, some of them at least probably should trigger an error, but I’m not knowledgeable of those aspects of LevelDB. It would probably be good to dig deeper into that, but I wanted to keep the focus of this PR on the tx index race conditions.
The reason I touched feature_init.py was because I was trying to eliminate unnecessary
-txindex
usage, eliminating the possibility for these race conditions - but it felt bad to just strip it out there, so I preferred to at least add file patterns in the test for what could be checked without changing the C++ code.in test/functional/test_framework/util.py:600 in 3301d2cbe8
591@@ -592,3 +592,10 @@ def find_vout_for_address(node, txid, addr): 592 if addr == tx["vout"][i]["scriptPubKey"]["address"]: 593 return i 594 raise RuntimeError("Vout not found for address: txid=%s, addr=%s" % (txid, addr)) 595+ 596+ 597+def sync_txindex(test_framework, node): 598+ test_framework.log.debug("Waiting for node txindex to sync") 599+ sync_start = int(time.time()) 600+ test_framework.wait_until(lambda: node.getindexinfo("txindex")["txindex"]["synced"])
furszy commented at 8:57 pm on March 13, 2025:tiny nit: could provide a less aggressivecheck_interval
. Like 0.1.
hodlinator commented at 10:01 am on March 14, 2025:Thanks, yes it often takes a little while to sync up so I think 0.1 would be slightly better than the default 0.05. Will do if I re-touch.furszy commented at 9:03 pm on March 13, 2025: memberCode review ACK 3301d2cbe8c3b76c97285d75fa59637cb6952d0bPrabhat1308 commented at 2:53 am on March 15, 2025: noneConcept ACK
3301d2c
I could not recreate a race condition but I have- Tested that the commented files dont trigger a startup error.
- Ran all the tests to check that all pass .
hodlinator commented at 9:41 pm on March 15, 2025: contributorThanks for testing!
I could not recreate a race condition but I have
Did you try the recommended approach in the commit message of 3301d2cbe8c3b76c97285d75fa59637cb6952d0b? It is a bit terse.
- Make sure you don’t have the added
sync_txindex
fix on the Python side. - Add the
sleep_for()
in the recommended place in the C++ side, (build). - You should be able to get the individual tests to fail (simplest is to run the test_runner.py suite so the
--args
get sent in correctly).
Have a feeling I needed slightly different sleep-duration on Windows/Linux due to differences in native/Python execution speed, so maybe it would help to insert slightly longer durations than recommended on Mac to reproduce the issue?
Edit: Should have also mentioned that one might have to tweak the
--timeout_factor=
value to not time out in unrelated ways (such as tests waiting for more blocks to be synced than in the fixed tests).fanquake merged this on Mar 17, 2025fanquake closed this on Mar 17, 2025
stickies-v referenced this in commit 5443a6eac4 on Mar 21, 2025stickies-v referenced this in commit a8bd94de4e on Mar 21, 2025stickies-v referenced this in commit 5571e63c54 on Mar 21, 2025TheCharlatan referenced this in commit 50d4368616 on Mar 24, 2025hodlinator deleted the branch on Mar 25, 2025
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-29 06:12 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me