wallet: Avoid potentially writing incorrect best block locator

ryanofsky commented at 5:30 pm on March 14, 2024: contributor

This PR updates the CWallet::chainStateFlushed method to write the m_last_block_processed locator to the database as the wallet’s best block instead of the chain tip locator.

Saving the last block processed locator as the best block is less fragile than saving chainStateFlushed locator as best block.

Right now wallet code relies on blockConnected notifications being sent before chainStateFlushed notifications, and on the chainStateFlushed locator pointing to the last connected block. But this is a fragile assumption that can be easily broken by ActivateBestChain/ConnectTrace changes, and in fact is currently broken (since d96c59cc5cd2f73f1f55c133c52208671fe75ef3 from #25740) when the assumeutxo snapshot block is validated. In that case the background validation chainStateFlushed notification is sent before the blockConnected notification so writing the locator as the best block would record a best block that was never processed. This specific change does not cause a problem for the wallet, because the wallet ignores events from the background validation chainstate. But nothing prevents similar out-of-order chainStateFlushed notifications from being sent in the future, so it is safer for the wallet to not rely on the chain tip sent in chainStateFlushed notifications.

A good followup to this change would probably be to drop the ChainStateFlushed locator argument entirely so it is not misused in the future. Indexing code already stopped using this locator argument a long time ago for identifying the best block (in 4368384f1d267b011e03a805f934f5c47e2ca1b2 from #14121), though it is currently using the locator argument to log diagnostic warnings.

DrahtBot commented at 5:30 pm on March 14, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29652.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	achow101

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#30221 (wallet: Ensure best block matches wallet scan state by achow101)
#28710 (Remove the legacy wallet and BDB dependency by achow101)
#19463 (Prune locks by luke-jr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label Wallet on Mar 14, 2024

ryanofsky commented at 5:49 pm on March 14, 2024: contributor

I couldn’t really come up with a test for this bug directly. I did find if I added an assert in chainStateFlushed that the locator block height matches the best block height, I could trigger it with test code below:

 0--- a/src/wallet/wallet.cpp
 1+++ b/src/wallet/wallet.cpp
 2@@ -634,6 +634,14 @@ void CWallet::chainStateFlushed(ChainstateRole role, const CBlockLocator& loc)
 3     if (m_attaching_chain || role == ChainstateRole::BACKGROUND) {
 4         return;
 5     }
 6+    if (!loc.IsNull()) {
 7+        LOCK(cs_wallet);
 8+        int loc_height{-1};
 9+        if (m_last_block_processed_height >= 0 && chain().findBlock(loc.vHave.front(), FoundBlock().height(loc_height))) {
10+            WalletLogPrintf("chainStateFlushed processed height %i locator height %i\n", m_last_block_processed_height, loc_height);
11+            assert(m_last_block_processed_height >= loc_height);
12+        }
13+    }
14     WalletBatch batch(GetDatabase());
15     batch.WriteBestBlock(loc);
16 }
17--- a/test/functional/feature_assumeutxo.py
18+++ b/test/functional/feature_assumeutxo.py
19@@ -52,6 +52,8 @@ COMPLETE_IDX = {'synced': True, 'best_block_height': FINAL_HEIGHT}
20 
21 
22 class AssumeutxoTest(BitcoinTestFramework):
23+    def add_options(self, parser):
24+        self.add_wallet_options(parser)
25 
26     def set_test_params(self):
27         """Use the pregenerated, deterministic chain up to height 199."""
28@@ -215,6 +217,7 @@ class AssumeutxoTest(BitcoinTestFramework):
29         self.test_invalid_chainstate_scenarios()
30 
31         self.log.info(f"Loading snapshot into second node from {dump_output['path']}")
32+        n1.createwallet('w', load_on_startup=True)
33         loaded = n1.loadtxoutset(dump_output['path'])
34         assert_equal(loaded['coins_loaded'], SNAPSHOT_BASE_HEIGHT)
35         assert_equal(loaded['base_height'], SNAPSHOT_BASE_HEIGHT)

However, this doesn’t trigger in the assumeutxo case where the snapshot block ChainStateFlushed notification is reliably sent before the BlockConnected notification, because wallet just ignores both notifications since they are associated with the background chain. The test above does trigger the assert with error chainStateFlushed processed height 199 locator height 299 but I think that is happening because one of walletInstance->chainStateFlushed(ChainstateRole::NORMAL, chain->getTipLocator()); calls on wallet startup.

This is all to say i couldn’t really find a practical bug that could be triggered here or a good test to add to this PR.

ryanofsky force-pushed on Mar 14, 2024

ryanofsky commented at 5:59 pm on March 14, 2024: contributor

Updated e295db870c676aa521e9ff1649514949940809cf -> c5583a35f3fa63c3fa0a015faa74dd54115464ad (pr/noloc.1 -> pr/noloc.2, compare) dropping unused chainStateFlushed argument Updated c5583a35f3fa63c3fa0a015faa74dd54115464ad -> 560063156a9ead2e21f9de4ee301d564a24488bd (pr/noloc.2 -> pr/noloc.3, compare) fixing missed chainStateFlushed call in previous push.

wallet: Avoid potentially writing incorrect best block locator

I noticed while debugging #24230 that the ChainStateFlushed notification can be
sent with a locator pointing to a different block than the block in the last
BlockConnected notification. This is bad because it could cause the wallet to
record that it is synced to a later block than it ever processed, and make it
potentially fail to scan blocks for relevant transactions if it is unloaded and
reloaded.

This problem should probably not happen in practice, because normally the
locator in the ChainStateFlushed notification does point to the block sent in
the last BlockConnected notification. But because of the way ActivateBestChain
accumulates block connected notifications, and the fact that FlushStateToDisk
is called from many different code paths, this isn't guaranteed. (The new
assumeutxo logic also introduces a situation where this is never the case: when
the background chain reaches the snapshot height and validates the snapshot,
the background chain is flushed before block connected notifications are sent.)

In any case, it is better if the wallet writes a locator actually pointing to
the last block that it processed instead of writing the locator validation code
passes to ChainStateFlushed, so this commit switches to the right locator.

A good followup to this change would probably be to drop the ChainStateFlushed
locator argument entirely so it is not misused in the future. Indexing code
already stopped using this locator argument a long time ago for identifying the
best block (in 4368384f1d267b011e03a805f934f5c47e2ca1b2 from #14121), but it is
still using the locator argument to log diagnostic warnings.

560063156a

in src/interfaces/chain.h:144 in c5583a35f3 outdated

136@@ -137,13 +137,6 @@ class Chain
137     //! pruned), and contains transactions.
138     virtual bool haveBlockOnDisk(int height) = 0;
139 
140-    //! Get locator for the current chain tip.
141-    virtual CBlockLocator getTipLocator() = 0;
142-
143-    //! Return a locator that refers to a block in the active chain.
144-    //! If specified block is not in the active chain, return locator for the latest ancestor that is in the chain.

ryanofsky commented at 6:09 pm on March 14, 2024:

note: This comment about getActiveChainLocator returning an ancestor locator if block was not on the active chain was true when the method was first added, but stopped being true in commit ed470940cddbeb40425960d51cefeec4948febe4 from #25717. So the description and name of this method where somewhat misleading after that point.

ryanofsky force-pushed on Mar 14, 2024

DrahtBot commented at 6:42 pm on March 14, 2024: contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/22673169748

DrahtBot added the label CI failed on Mar 14, 2024

DrahtBot removed the label CI failed on Mar 14, 2024

in src/wallet/wallet.cpp:638 in 560063156a

636         return;
637     }
638-    WalletBatch batch(GetDatabase());
639-    batch.WriteBestBlock(loc);
640+    CBlockLocator loc;
641+    WITH_LOCK(cs_wallet, chain().findBlock(m_last_block_processed, FoundBlock().locator(loc)));

furszy commented at 3:11 pm on March 20, 2024:

I know this shouldn’t happen but.. what about logging an error if findBlock() returns false?

Also, could lock cs_wallet only to copy m_last_block_processed.

in src/wallet/wallet.cpp:3297 in 560063156a

3293@@ -3290,7 +3294,7 @@ bool CWallet::AttachChain(const std::shared_ptr<CWallet>& walletInstance, interf
3294             }
3295         }
3296         walletInstance->m_attaching_chain = false;
3297-        walletInstance->chainStateFlushed(ChainstateRole::NORMAL, chain.getTipLocator());
3298+        walletInstance->chainStateFlushed(ChainstateRole::NORMAL);

furszy commented at 3:29 pm on March 20, 2024:

Testing note: you could probably trigger the behavior by adding a sleep call here. When a reorg is processed in-between the ScanForWalletTransactions call and this chainStateFlushed call.

achow101 commented at 8:37 pm on March 20, 2024: member

Concept ACK

I’m now wondering whether the wallet should even be doing anything on ChainStateFlushed since that doesn’t seem like it should have any bearing on what the wallet knows about. It seems like the better thing to do would be to update the best block record every time we finish processing a block in blockConnected, and just ignore chainStateFlushed. The best block is really only used during loading anyways to figure out how much we need to rescan. One thing that could happen if we did this is that on an unclean shutdown, the wallet could store a record that is ahead of the chainstate. On the next load, we would end up resetting to maybe a few blocks behind where the chainstate is at and therefore rescan a bit more than necessary, but that doesn’t seem like it would really be a problem.

in src/wallet/wallet.cpp:637 in 560063156a

635     if (m_attaching_chain || role == ChainstateRole::BACKGROUND) {
636         return;
637     }
638-    WalletBatch batch(GetDatabase());
639-    batch.WriteBestBlock(loc);
640+    CBlockLocator loc;

Eunovo commented at 1:58 pm on March 25, 2024:

Could be useful to check that chainStateFlushed was called after blockConnected like was done in https://github.com/bitcoin/bitcoin/blob/556074143f581efb3c57f6db946e60dd28094e2c/src/index/base.cpp#L341-L353. I think checking that the CBlockLocator argument points to the same block as m_last_block_processed will suffice.

DrahtBot added the label CI failed on Apr 19, 2024

DrahtBot removed the label CI failed on Apr 23, 2024

achow101 commented at 9:58 pm on June 3, 2024: member

I’m now wondering whether the wallet should even be doing anything on ChainStateFlushed since that doesn’t seem like it should have any bearing on what the wallet knows about. […]

I’ve implemented this idea in #30221

ryanofsky commented at 3:53 pm on June 4, 2024: contributor

I’m now wondering whether the wallet should even be doing anything on ChainStateFlushed since that doesn’t seem like it should have any bearing on what the wallet knows about.

Notifications are sent in order, so it does have some meaning. It means wallet is in-between blockConnected events and synchronized up to some block, and that the node has written some data to disk recently, so the wallet may want write its data to disk, too.

I think the only problem with the ChainStateFlushed notification is that because of the fact that FlushStateToDisk is called all over validation code, and notifications are asynchronous, the ChainStateFlushed locator argument is unreliable and buggy. The argument is also unnecessary for wallets and indexes because they already know the last block that they are synced to and don’t need a redundant pointer to a potentially incorrect block (due to a bug on assumeutxo code or other other fragile parts of validation code).

DrahtBot added the label CI failed on Sep 11, 2024

DrahtBot removed the label CI failed on Sep 15, 2024

DrahtBot commented at 1:52 am on September 27, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/22674943172

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot added the label CI failed on Sep 27, 2024

ryanofsky marked this as a draft on Oct 15, 2024

ryanofsky commented at 3:06 pm on October 15, 2024: contributor

Marking as draft, because this has a silent merge conflict and I don’t think I will be able to fix it right away. I still do think this PR is a good idea, but #30221 can be reviewed too if the alternate approach it implements is preferred (updating the wallet every block even it doesn’t change).

DrahtBot added the label Needs rebase on May 7, 2025

DrahtBot commented at 2:34 pm on May 7, 2025: contributor

🐙 This pull request conflicts with the target branch and needs rebase.

ryanofsky referenced this in commit 33f8f8ae4c on May 19, 2025

DrahtBot commented at 8:20 am on August 4, 2025: contributor

⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.

furszy commented at 1:55 pm on August 4, 2025: member

Since #30221 got merged, I think this one can be closed

ryanofsky commented at 5:13 pm on August 4, 2025: contributor

re: #29652#pullrequestreview-3084332144

Since #30221 got merged, I think this one can be closed

Thanks! #30221 avoids the issue of wallets potentially writing the incorrect locators on flush notifications by just ignoring them and flushing in a less coordinated way. So the main concern of this PR is no longer an issue and it makes sense to close.

I still think the changes here dropping the Chain::getTipLocator and Chain::getActiveChainLocator methods would be good to apply. The first method is unused and since #25717, the second method has a misleading name and description is redundant with Chain::findBlock. So it would be nice to see those changes in another PR.

ryanofsky closed this on Aug 4, 2025

ryanofsky added the label Up for grabs on Aug 4, 2025

pablomartin4btc commented at 0:24 am on August 11, 2025: member

I still think the changes here dropping the Chain::getTipLocator and Chain::getActiveChainLocator methods would be good to apply. The first method is unused and since #25717, the second method has a misleading name and description is redundant with Chain::findBlock. So it would be nice to see those changes in another PR. @ryanofsky, @furszy, done in #33169.

fanquake removed the label Up for grabs on Aug 11, 2025

fanquake removed the label Needs rebase on Aug 11, 2025

fanquake removed the label CI failed on Aug 11, 2025

wallet: Avoid potentially writing incorrect best block locator #29652

Code Coverage & Benchmarks

Reviews

Conflicts