wallet: wrong balance and crash after reorg and unclean shutdown #31824

issue mzumsande openend this issue on February 7, 2025
  1. mzumsande commented at 7:13 pm on February 7, 2025: contributor

    The wallet can show a wrong balance and/or hit an assert crash in case of a unclean shutdown after a reorg:

    Initial state: Wallet and Node are synced to disk (e.g. a Flush just happened), the coinbase of the tip is a wallet tx. 1.) Tip is disconnected due to a reorg. This will change the status of affected wallet txns to Inactive (which will be synced with the wallet db immediately) - but neither the chainstate nor the wallet locator will be flushed. 2.) The node has an unclean shutdown for an unrelated reason (e.g. power goes off). 3.) The node restarts: The node will be back at the original tip, the wallet locator too (so no rescan is triggered), but the wallet tx state is Inactive (and abandoned for coinbase txns) 4.) The node reorgs once more to the better chain, and then the assert crash happens.

    See https://github.com/mzumsande/bitcoin/commit/c6b201d1a3cb6cf8b8a3dd0390c48b894c617a82 for a functional test that reproduces the above events.

    There are two problems:

    • The wallet will hit an assert if it has a wallet coinbase tx, and the block in question is disconnected again - this should be fixed by #31757
    • The wallet balance is wrong - this also affects non-coinbase txns. This is worse if the the tip disconnect from step 4) does not happen again (maybe the initial chain was extended while the node was offline) because then the transactions from the block will be missing indefinitely until the user performs a manual rescan.

    I think the root of the problems is that changes to the wallet db are synced immediately, but the best block locator isn’t. The solution could be #30221 - but even there, one would have to be very careful with the order of operations. Ideally, the SyncTransaction() updates and locator updates due to block connection / disconnection should be done atomically, in one batch, if possible.

    FYI @furszy @achow101 @ryanofsky

  2. furszy commented at 7:45 pm on February 7, 2025: member

    Nice finding! Cool functional test. Glad to see #31757 fixing the crash.

    Ideally, the SyncTransaction() updates and locator updates due to block connection / disconnection should be done atomically, in one batch, if possible.

    I actually implemented this in #25297, three years ago :). Will see how much work is needed to revive it next week.

  3. ryanofsky commented at 3:53 pm on February 11, 2025: contributor

    I actually implemented this in #25297, three years ago :). Will see how much work is needed to revive it next week. @furszy since I mentioned it the other day, this is a link to the PtrOrValue utility I used in libmultiprocess to allow functions to lock internally or optionally accept an external lock, without requiring two versions of each function.

    A similar mechanism could be used for wallet methods to allow them to use an external batch object or create one internally if not provided, e.g.

    0Result UpdateSomething(WalletBatch* optional_batch, Params params)
    1{
    2    PtrOrValue batch{optional_batch, GetDataBase()};
    3    batch->Write(params...);
    4   ...
    5}
    

    The batch variable there is a wrapper around std::variant<WalletBatch*, WalletBatch> that is set to the optional_batch pointer if it was non-null, or to a newly constructed WalletBatch object if it was null. It has * and -> operators giving access to whatever it was set to.

    This might not be applicable to #25297, but in general an approach like this could allow more functions to accept and use batches, incrementally, without needing big code changes.

  4. rkrux commented at 1:53 pm on February 19, 2025: none

    I think the root of the problems is that changes to the wallet db are synced immediately, but the best block locator isn’t.

    Is this not where the chainstate related changes are flushed to disk? https://github.com/bitcoin/bitcoin/blob/28.x/src/validation.cpp#L3069

    And here block disconnected notification fired that is listened to by the wallet: https://github.com/bitcoin/bitcoin/blob/28.x/src/validation.cpp#L3087

  5. mzumsande commented at 3:36 pm on February 19, 2025: contributor

    Is this not where the chainstate related changes are flushed to disk? https://github.com/bitcoin/bitcoin/blob/28.x/src/validation.cpp#L3069

    The call uses FlushStateMode::IF_NEEDED, which means it usually won’t do anything - it will only flush if it’s necessary because the cache has grown to a critical size.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-02-23 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me