wallet: Avoid potentially writing incorrect best block locator #29652

pull ryanofsky wants to merge 1 commits into bitcoin:master from ryanofsky:pr/noloc changing 4 files +14 −28
  1. ryanofsky commented at 5:30 pm on March 14, 2024: contributor

    This PR updates the CWallet::chainStateFlushed method to write the m_last_block_processed locator to the database as the wallet’s best block instead of the chain tip locator.

    Saving the last block processed locator as the best block is less fragile than saving chainStateFlushed locator as best block.

    Right now wallet code relies on blockConnected notifications being sent before chainStateFlushed notifications, and on the chainStateFlushed locator pointing to the last connected block. But this is a fragile assumption that can be easily broken by ActivateBestChain/ConnectTrace changes, and in fact is currently broken (since d96c59cc5cd2f73f1f55c133c52208671fe75ef3 from #25740) when the assumeutxo snapshot block is validated. In that case the background validation chainStateFlushed notification is sent before the blockConnected notification so writing the locator as the best block would record a best block that was never processed. This specific change does not cause a problem for the wallet, because the wallet ignores events from the background validation chainstate. But nothing prevents similar out-of-order chainStateFlushed notifications from being sent in the future, so it is safer for the wallet to not rely on the chain tip sent in chainStateFlushed notifications.

    A good followup to this change would probably be to drop the ChainStateFlushed locator argument entirely so it is not misused in the future. Indexing code already stopped using this locator argument a long time ago for identifying the best block (in 4368384f1d267b011e03a805f934f5c47e2ca1b2 from #14121), though it is currently using the locator argument to log diagnostic warnings.

  2. DrahtBot commented at 5:30 pm on March 14, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK achow101

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30221 (wallet: Ensure best block matches wallet scan state by achow101)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Wallet on Mar 14, 2024
  4. ryanofsky commented at 5:49 pm on March 14, 2024: contributor

    I couldn’t really come up with a test for this bug directly. I did find if I added an assert in chainStateFlushed that the locator block height matches the best block height, I could trigger it with test code below:

     0--- a/src/wallet/wallet.cpp
     1+++ b/src/wallet/wallet.cpp
     2@@ -634,6 +634,14 @@ void CWallet::chainStateFlushed(ChainstateRole role, const CBlockLocator& loc)
     3     if (m_attaching_chain || role == ChainstateRole::BACKGROUND) {
     4         return;
     5     }
     6+    if (!loc.IsNull()) {
     7+        LOCK(cs_wallet);
     8+        int loc_height{-1};
     9+        if (m_last_block_processed_height >= 0 && chain().findBlock(loc.vHave.front(), FoundBlock().height(loc_height))) {
    10+            WalletLogPrintf("chainStateFlushed processed height %i locator height %i\n", m_last_block_processed_height, loc_height);
    11+            assert(m_last_block_processed_height >= loc_height);
    12+        }
    13+    }
    14     WalletBatch batch(GetDatabase());
    15     batch.WriteBestBlock(loc);
    16 }
    17--- a/test/functional/feature_assumeutxo.py
    18+++ b/test/functional/feature_assumeutxo.py
    19@@ -52,6 +52,8 @@ COMPLETE_IDX = {'synced': True, 'best_block_height': FINAL_HEIGHT}
    20 
    21 
    22 class AssumeutxoTest(BitcoinTestFramework):
    23+    def add_options(self, parser):
    24+        self.add_wallet_options(parser)
    25 
    26     def set_test_params(self):
    27         """Use the pregenerated, deterministic chain up to height 199."""
    28@@ -215,6 +217,7 @@ class AssumeutxoTest(BitcoinTestFramework):
    29         self.test_invalid_chainstate_scenarios()
    30 
    31         self.log.info(f"Loading snapshot into second node from {dump_output['path']}")
    32+        n1.createwallet('w', load_on_startup=True)
    33         loaded = n1.loadtxoutset(dump_output['path'])
    34         assert_equal(loaded['coins_loaded'], SNAPSHOT_BASE_HEIGHT)
    35         assert_equal(loaded['base_height'], SNAPSHOT_BASE_HEIGHT)
    

    However, this doesn’t trigger in the assumeutxo case where the snapshot block ChainStateFlushed notification is reliably sent before the BlockConnected notification, because wallet just ignores both notifications since they are associated with the background chain. The test above does trigger the assert with error chainStateFlushed processed height 199 locator height 299 but I think that is happening because one of walletInstance->chainStateFlushed(ChainstateRole::NORMAL, chain->getTipLocator()); calls on wallet startup.

    This is all to say i couldn’t really find a practical bug that could be triggered here or a good test to add to this PR.

  5. ryanofsky force-pushed on Mar 14, 2024
  6. ryanofsky commented at 5:59 pm on March 14, 2024: contributor
    Updated e295db870c676aa521e9ff1649514949940809cf -> c5583a35f3fa63c3fa0a015faa74dd54115464ad (pr/noloc.1 -> pr/noloc.2, compare) dropping unused chainStateFlushed argument Updated c5583a35f3fa63c3fa0a015faa74dd54115464ad -> 560063156a9ead2e21f9de4ee301d564a24488bd (pr/noloc.2 -> pr/noloc.3, compare) fixing missed chainStateFlushed call in previous push.
  7. wallet: Avoid potentially writing incorrect best block locator
    I noticed while debugging #24230 that the ChainStateFlushed notification can be
    sent with a locator pointing to a different block than the block in the last
    BlockConnected notification. This is bad because it could cause the wallet to
    record that it is synced to a later block than it ever processed, and make it
    potentially fail to scan blocks for relevant transactions if it is unloaded and
    reloaded.
    
    This problem should probably not happen in practice, because normally the
    locator in the ChainStateFlushed notification does point to the block sent in
    the last BlockConnected notification. But because of the way ActivateBestChain
    accumulates block connected notifications, and the fact that FlushStateToDisk
    is called from many different code paths, this isn't guaranteed. (The new
    assumeutxo logic also introduces a situation where this is never the case: when
    the background chain reaches the snapshot height and validates the snapshot,
    the background chain is flushed before block connected notifications are sent.)
    
    In any case, it is better if the wallet writes a locator actually pointing to
    the last block that it processed instead of writing the locator validation code
    passes to ChainStateFlushed, so this commit switches to the right locator.
    
    A good followup to this change would probably be to drop the ChainStateFlushed
    locator argument entirely so it is not misused in the future. Indexing code
    already stopped using this locator argument a long time ago for identifying the
    best block (in 4368384f1d267b011e03a805f934f5c47e2ca1b2 from #14121), but it is
    still using the locator argument to log diagnostic warnings.
    560063156a
  8. in src/interfaces/chain.h:144 in c5583a35f3 outdated
    136@@ -137,13 +137,6 @@ class Chain
    137     //! pruned), and contains transactions.
    138     virtual bool haveBlockOnDisk(int height) = 0;
    139 
    140-    //! Get locator for the current chain tip.
    141-    virtual CBlockLocator getTipLocator() = 0;
    142-
    143-    //! Return a locator that refers to a block in the active chain.
    144-    //! If specified block is not in the active chain, return locator for the latest ancestor that is in the chain.
    


    ryanofsky commented at 6:09 pm on March 14, 2024:
    note: This comment about getActiveChainLocator returning an ancestor locator if block was not on the active chain was true when the method was first added, but stopped being true in commit ed470940cddbeb40425960d51cefeec4948febe4 from #25717. So the description and name of this method where somewhat misleading after that point.
  9. ryanofsky force-pushed on Mar 14, 2024
  10. DrahtBot commented at 6:42 pm on March 14, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/22673169748

  11. DrahtBot added the label CI failed on Mar 14, 2024
  12. DrahtBot removed the label CI failed on Mar 14, 2024
  13. in src/wallet/wallet.cpp:638 in 560063156a
    636         return;
    637     }
    638-    WalletBatch batch(GetDatabase());
    639-    batch.WriteBestBlock(loc);
    640+    CBlockLocator loc;
    641+    WITH_LOCK(cs_wallet, chain().findBlock(m_last_block_processed, FoundBlock().locator(loc)));
    


    furszy commented at 3:11 pm on March 20, 2024:

    I know this shouldn’t happen but.. what about logging an error if findBlock() returns false?

    Also, could lock cs_wallet only to copy m_last_block_processed.

  14. in src/wallet/wallet.cpp:3297 in 560063156a
    3293@@ -3290,7 +3294,7 @@ bool CWallet::AttachChain(const std::shared_ptr<CWallet>& walletInstance, interf
    3294             }
    3295         }
    3296         walletInstance->m_attaching_chain = false;
    3297-        walletInstance->chainStateFlushed(ChainstateRole::NORMAL, chain.getTipLocator());
    3298+        walletInstance->chainStateFlushed(ChainstateRole::NORMAL);
    


    furszy commented at 3:29 pm on March 20, 2024:
    Testing note: you could probably trigger the behavior by adding a sleep call here. When a reorg is processed in-between the ScanForWalletTransactions call and this chainStateFlushed call.
  15. achow101 commented at 8:37 pm on March 20, 2024: member

    Concept ACK

    I’m now wondering whether the wallet should even be doing anything on ChainStateFlushed since that doesn’t seem like it should have any bearing on what the wallet knows about. It seems like the better thing to do would be to update the best block record every time we finish processing a block in blockConnected, and just ignore chainStateFlushed. The best block is really only used during loading anyways to figure out how much we need to rescan. One thing that could happen if we did this is that on an unclean shutdown, the wallet could store a record that is ahead of the chainstate. On the next load, we would end up resetting to maybe a few blocks behind where the chainstate is at and therefore rescan a bit more than necessary, but that doesn’t seem like it would really be a problem.

  16. in src/wallet/wallet.cpp:637 in 560063156a
    635     if (m_attaching_chain || role == ChainstateRole::BACKGROUND) {
    636         return;
    637     }
    638-    WalletBatch batch(GetDatabase());
    639-    batch.WriteBestBlock(loc);
    640+    CBlockLocator loc;
    


    Eunovo commented at 1:58 pm on March 25, 2024:
    Could be useful to check that chainStateFlushed was called after blockConnected like was done in https://github.com/bitcoin/bitcoin/blob/556074143f581efb3c57f6db946e60dd28094e2c/src/index/base.cpp#L341-L353. I think checking that the CBlockLocator argument points to the same block as m_last_block_processed will suffice.
  17. DrahtBot added the label CI failed on Apr 19, 2024
  18. DrahtBot removed the label CI failed on Apr 23, 2024
  19. achow101 commented at 9:58 pm on June 3, 2024: member

    I’m now wondering whether the wallet should even be doing anything on ChainStateFlushed since that doesn’t seem like it should have any bearing on what the wallet knows about. […]

    I’ve implemented this idea in #30221

  20. ryanofsky commented at 3:53 pm on June 4, 2024: contributor

    I’m now wondering whether the wallet should even be doing anything on ChainStateFlushed since that doesn’t seem like it should have any bearing on what the wallet knows about.

    Notifications are sent in order, so it does have some meaning. It means wallet is in-between blockConnected events and synchronized up to some block, and that the node has written some data to disk recently, so the wallet may want write its data to disk, too.

    I think the only problem with the ChainStateFlushed notification is that because of the fact that FlushStateToDisk is called all over validation code, and notifications are asynchronous, the ChainStateFlushed locator argument is unreliable and buggy. The argument is also unnecessary for wallets and indexes because they already know the last block that they are synced to and don’t need a redundant pointer to a potentially incorrect block (due to a bug on assumeutxo code or other other fragile parts of validation code).

  21. DrahtBot added the label CI failed on Sep 11, 2024
  22. DrahtBot removed the label CI failed on Sep 15, 2024
  23. DrahtBot commented at 1:52 am on September 27, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/22674943172

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  24. DrahtBot added the label CI failed on Sep 27, 2024
  25. ryanofsky marked this as a draft on Oct 15, 2024
  26. ryanofsky commented at 3:06 pm on October 15, 2024: contributor
    Marking as draft, because this has a silent merge conflict and I don’t think I will be able to fix it right away. I still do think this PR is a good idea, but #30221 can be reviewed too if the alternate approach it implements is preferred (updating the wallet every block even it doesn’t change).

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me