wallet: Ensure best block matches wallet scan state #30221

pull achow101 wants to merge 10 commits into bitcoin:master from achow101:wallet-no-chainstateflushed changing 17 files +186 −149
  1. achow101 commented at 9:58 pm on June 3, 2024: member

    Implements the idea discussed in #29652 (comment)

    Currently, m_last_block_processed and m_last_block_processed_height are not guaranteed to match the block locator stored in the wallet, nor do either of those fields actually represent the last block that the wallet is synced up to. This is confusing and unintuitive.

    This PR changes the those last block fields to actually be in sync with the record stored on disk. This requires adding the block height to the BESTBLOCK_NOMERKLE record, which was done in a backwards compatible manner. Additionally, the issue with chainStateFlushed fixed in #29652 is also fixed here by removing chainStateFlushed entirely. The last block fields, now simply the best block, is updated for each blockConnected and blockDisconnected so that it does actually represent the block for which the wallet is synced up to.

    To make it easier to deal with, the best block fields are consolidated into a BestBlock struct which also has the serialization code with backwards compatibility.

  2. wallet: Update best block record after block dis/connect ae3b8280a7
  3. scripted-diff: Rename LastBlock to BestBlock
    -BEGIN VERIFY SCRIPT-
    sed -i "s/SetLastBlockProcessed/SetBestBlock/" $(git grep -l "SetLastBlockProcessed")
    sed -i "s/GetLastBlock/GetBestBlock/g" $(git grep -l "GetLastBlock")
    -END VERIFY SCRIPT-
    c37c702f47
  4. wallet: Combine last block processed hash and height into a struct
    m_last_block_processed and m_last_block_processed_height are combined
    into a new struct BestBlock.
    3f2b4e454d
  5. wallet: Replace chainStateFlushed in loading with WriteBestBlock
    The only reason to call chainStateFlushed during wallet loading is to
    ensure that the best block is written. Do these writes explicitly to
    prepare for removing chainStateFlushed.
    5e8164c86f
  6. wallet: Remove WriteBestBlock from chainStateFlushed 9df9027e2b
  7. wallet: Sync m_best_block with BESTBLOCK_NOMERKLE
    The BestBlock struct now contains the block locator, and is also
    serialized with the block height. For backwards compatibility, the
    height is optional. The best block hash is retrieved from the block
    locator when deserializing.
    
    Since the best block record has to store the height, additional
    changes to AttachChain are made to faciliate auto upgrade as well as
    setting the in-memory best block.
    0667b86bb4
  8. wallet: Load best block record in LoadWallet
    Instead of setting the best block info later during AttachChain, read it
    during LoadWallet so it is set (if it exists) before we get there.
    dfcddd1fa1
  9. wallet: Use best block locator in migrate instead of re-reading it
    Instead of calling ReadBestBlock to get the best block locator, use the
    m_best_block.m_locator that we already have in CWallet.
    0f97763e81
  10. wallet: use SetBestBlock when saving progress during rescan
    Instead of writing the best block record directly during a rescan, use
    SetBestBlock. This is safe because progress is only saved in a
    rescan when doing a rescan on loading. cs_wallet is held the entire time
    so new blocks or reorgs will not cause the best block record to be
    changed unexpectedly.
    09a14931e7
  11. wallet: Remove chainStateFlushed
    chainStateFlushed is no longer needed since the best block is updated
    after a block is scanned. Since the chainstate being flushed does not
    necessarily coincide with the wallet having processed said block, it
    does not entirely make sense for the wallet to be recording that block
    as its best block, and this can cause race conditions where some blocks
    are not processed. Thus, remove this notification.
    e5c9876525
  12. DrahtBot commented at 9:58 pm on June 3, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30343 (wallet, logging: Replace WalletLogPrintf() with LogInfo() by ryanofsky)
    • #29936 (fuzz: wallet: add target for CreateTransaction by brunoerg)
    • #29652 (wallet: Avoid potentially writing incorrect best block locator by ryanofsky)
    • #28710 (Remove the legacy wallet and BDB dependency by achow101)
    • #28616 (Show transactions as not fully confirmed during background validation by Sjors)
    • #28574 (wallet: optimize migration process, batch db transactions by furszy)
    • #27865 (wallet: Track no-longer-spendable TXOs separately by achow101)
    • #27286 (wallet: Keep track of the wallet’s own transaction outputs in memory by achow101)
    • #25722 (refactor: Use util::Result class for wallet loading by ryanofsky)
    • #24128 (wallet: BIP 326 sequence based anti-fee-snipe for taproot inputs by maflcko)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  13. DrahtBot added the label Wallet on Jun 3, 2024
  14. Thompson1985 approved
  15. ryanofsky commented at 3:34 pm on June 4, 2024: contributor

    I’m not sure, but it seems like a potentially good thing to me that last_block_processed and BESTBLOCK are two distinct concepts.

    The last block processed is the block that CWallet in-memory state has been synced to (particularly CWalletTx::m_state which includes mempool / abandoned information not serialized to disk).

    The BESTBLOCK record is the last block whose data has been flushed to disk, that the wallet should begin syncing from next time it is reloaded.

    It seems like the first commit ae3b8280a7dbb6cc87814ed97dbea5a122cf8215 could lead to a performance regression if it is writing the flushing the BESTBLOCK record on every single blockConnected event from the node. For example during reindexing it could write and sync the database each time a block is connected, even if the block is before the wallets birthday, or doesn’t have any relevant transactions?

    I think it probably makes sense to write BESTBLOCK in a smarter way, and probably write it more frequently, but it seems unnecessary to have to write it every block, and it might be bad for performance now or make optimizations harder in the future.

    I’m also curious about the idea of saving height with the best block record. I could imagine this being useful, since IIRC parts of wallet loading are complicated by not knowing heights of transactions before the chain is attached, but it doesn’t really seem like the height is used for much yet. Was this intended to be used by something else later?

  16. achow101 commented at 3:48 pm on June 4, 2024: member

    The last block processed is the block that CWallet in-memory state has been synced to (particularly CWalletTx::m_state which includes mempool / abandoned information not serialized to disk).

    But none of that state is relevant to the last block processed. Any state related to blocks (confirmed or conflicted) is written to disk.

    The BESTBLOCK record is the last block whose data has been flushed to disk, that the wallet should begin syncing from next time it is reloaded.

    The point is that this discrepancy can result in skipping blocks. It doesn’t make sense to me that we wouldn’t store the block the wallet’s state has been synced to, and it doesn’t make sense that we should rely on a field that means something else to determine what point to start rescanning from on the next load.

    If BESTBLOCK doesn’t match the chainstate, that’s fine because it’s a locator and we will just find the fork and sync from there. I don’t think it’s necessary to record which block we think the chainstate is synced to.

  17. ryanofsky commented at 3:56 pm on June 4, 2024: contributor

    It doesn’t make sense to me that we wouldn’t store the block the wallet’s state has been synced to

    You may be right this is the better approach. But I think the previous approach does make some sense, too. You might not want to write to the wallet database each time a block is connected if the block doesn’t contain any relevant transactions, especially during reindexing.

  18. achow101 commented at 9:03 pm on June 4, 2024: member

    You might not want to write to the wallet database each time a block is connected if the block doesn’t contain any relevant transactions

    I think you would since not writing it would result in possibly rescanning that block at the next loading which takes a little bit of time, regardless of whether any relevant transactions are in the block.

    especially during reindexing.

    Perhaps an easy solution to this is to just write the best block every 1000 blocks (or some other interval) when we are in IBD?

    I’m also curious about the idea of saving height with the best block record. I could imagine this being useful, since IIRC parts of wallet loading are complicated by not knowing heights of transactions before the chain is attached, but it doesn’t really seem like the height is used for much yet. Was this intended to be used by something else later?

    This was mainly done to avoid looking up the height every time we load since that was being problematic and causing a tests to fail. But it definitely could be more useful in the future.

  19. ryanofsky commented at 3:16 pm on June 5, 2024: contributor

    The idea of saving heights is interesting to me because wallet code assumes it know block heights many places, but it doesn’t actually store heights anywhere. So for example, if the wallet stored a mapping of block hashes to heights (or other block information) it might be useful for allowing the wallet to work in an offline mode or letting the wallet CLI tool have more capabilities. This is all pretty abstract though, so I’m not suggesting something specific.

    Perhaps an easy solution to this is to just write the best block every 1000 blocks (or some other interval) when we are in IBD?

    Yes but I think that just adds back the complexity you were trying to get rid of, in a form that seems worse than the status quo. If you implement that, the wallet will be back to tracking the last block processed separately from the best block written to the database. And now, instead of the node determining when data should be flushed to disk and having all wallets flush that data simultaneously, each wallet will have internal logic deciding when to flush. This would be more complicated, and could be worse for power consumption and performance if there are multiple wallets and flushes are happening more frequently at different times.

    I think overall this PR is doing some good things, but the goals seem either not clear, or not clearly good, so I wonder if maybe you could take these changes and make more targeted PRs for the things you want to achieve?

    Or, if this PR is definitely the direction you want to go, I’m happy to review it. I don’t like some things it is doing, but overall I think it is a reasonable change.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-29 07:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me