indexes: Don't commit ahead of the flushed chainstate #34897

pull mzumsande wants to merge 3 commits into bitcoin:master from mzumsande:202603_index_sync_dont_commit_ahead changing 8 files +97 −8
  1. mzumsande commented at 5:54 AM on March 23, 2026: contributor

    If indexes commit their data ahead of the flushed chainstate, and there is an unclean shutdown, the index will be corrupted. This is especially the case for the coinstatsindex, which has state (the muhash) which can't easily be rolled back without access to the blocks. This was only partly fixed in #33212 (for reorg scenarios) but could still happen during initial sync.

    Fix this more thoroughly by having the node keep track of the last flushed block, and skipping index commits if the current block of the index is not an ancestor of the node's last flushed block (similar to the suggestion by stickies-v in #33212#pullrequestreview-31408570890.

    I tried to follow the spirit of #24230 by adding the call to the chain interface (which involved adding an oddly specific method there) in order to avoid directly reaching into the node from the index code.

    Fixes #33208 Fixes #34261

  2. DrahtBot added the label UTXO Db and Indexes on Mar 23, 2026
  3. DrahtBot commented at 5:54 AM on March 23, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34897.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK sedited, ryanofsky
    Concept ACK l0rinc, arejula27
    Stale ACK furszy, fjahr

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #35465 (coins: compact chainstate regularly by l0rinc)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  4. fjahr commented at 8:23 AM on March 23, 2026: contributor

    Concept ACK

  5. l0rinc commented at 1:12 PM on March 25, 2026: contributor

    I have reported something similar in #34489 (review) and #34489 (review) - will this PR fix those? Concept ACK regardless, thanks for fixing them!

  6. in src/index/base.cpp:283 in 5af4d6e2a6
     278 | +        const CBlockIndex* index_tip = m_best_block_index.load();
     279 | +        if (index_tip && !m_chain->isBlockInFlushedChain(index_tip->GetBlockHash(), index_tip->nHeight)) {
     280 | +            LogInfo("Skipping commit, index is ahead of flushed chainstate");
     281 | +            return false;
     282 | +        }
     283 | +
    


    arejula27 commented at 10:29 AM on March 29, 2026:

    This is a behavioral change: Commit() can now return false without an actual error, meaning "intentionally skipped." Callers in Sync() currently treat Commit() failures as non-fatal, so this works in practice. However, the documentation for Commit() in base.h:96-103 only describes error scenarios and recovery recommendations, it should be updated to reflect this new "skip" behavior and the conditions under which it triggers (index ahead of flushed chainstate). I migth think this could be a good place to use the new Expected type. Errors will return an error and a no-commit cuz blocks no flushed would return false


    furszy commented at 8:24 PM on May 21, 2026:

    We are currently not using the Commit return value. I wouldn't worry about this now.


    fjahr commented at 12:41 PM on May 29, 2026:

    nit: Expected indeed seems like overkill but I also agree the documentation in base.h could be updated with a sentence on this new behavior, independent of what's being returned. Something like:

    "Will not commit if the index's best block is ahead of the chainstate's last flushed block. This avoids persisting state an unclean shutdown could not roll back from. A later call commits when the chainstate has flushed far enough."


    l0rinc commented at 9:43 PM on May 31, 2026:

    We could at least return true for now and maybe document it, so the header no longer promises a write when the safe outcome is to defer:

    /// Try to write the current index state (eg. chain block locator and subclass-specific items) to disk.
    ///
    /// Returns true without writing if no block has been indexed yet or if the index tip is ahead of
    /// the last flushed chainstate block. A later call will commit once the chainstate has flushed
    /// far enough. This makes false returns mean real write failures only.
    ///
    /// Recommendations for error handling:
    /// If called on a successor of the previous committed best block in the index, the index can
    /// continue processing without risk of corruption, though the index state will need to catch up
    /// from further behind on reboot. If the new state is not a successor of the previous state (due
    /// to a chain reorganization), the index must halt until Commit succeeds or else it could end up
    /// getting corrupted.
    bool Commit();
    

    arejula27 commented at 6:38 PM on June 1, 2026:

    yes, i like this approach


    mzumsande commented at 6:39 PM on June 8, 2026:

    I added some documentation similar to @fjahr's suggestion (and removed the previous documentation which was wrong, because it assumed that we use the locator to track back, which we don't). I kept the return value logic at indicating whether a commit was done or not. I think changing the logic makes only sense when the return value is actually used for something.

  7. in src/test/coinstatsindex_tests.cpp:1 in 5af4d6e2a6 outdated


    arejula27 commented at 10:38 AM on March 29, 2026:

    Would it be interesting to add these test cases?

    • Reorg scenario: Verify behavior when the index is on one fork and the flushed chainstate is on another.
    • Recovery: Verify the index resumes from the last chainstate flush point. I think will be interesting to verify the corret initialization behaviour of isBlockInFlushedChain

    mzumsande commented at 6:40 PM on June 8, 2026:

    I rewrote the test so that it now covers the Sync() failures that this PR fixes.

  8. arejula27 commented at 10:48 AM on March 29, 2026: none

    <!--meta-tag:bot-skip-->

    Concept ACK [5af4d6e2a63e32f36b4dcb7602c562cf953a41ad]

    However, I'm concerned about a potential starvation scenario during IBD. Two threads run in parallel: the validation thread connects blocks and periodically flushes the UTXO set to disk (~1h intervals or when the cache is full), while the index thread (Sync()) processes already-connected blocks and calls Commit() every 30 seconds. Since index processing (filters, tx lookups) is significantly lighter than UTXO set updates, the index will typically stay ahead of the flush point, even though it's always behind the chain tip.

    This means that during IBD, every Commit() call could be skipped because the index is perpetually ahead of m_last_flushed_block. A crash at any point would lose all indexing progress back to the initial LoadChainTip value, potentially hours of work. I will run experiments to verify whether this actually happens and propose a solution in that case.

    I also suggest discussing an alternative approach for index commits: committing the locator up to the flushed block instead of skipping entirely. For example, if the index has processed up to block 5000 but the last flush was at block 3000, the locator could be written at 3000. This would preserve the safety invariant (locator never exceeds the flush point) while still persisting progress. That said, this would require rethinking how Commit() and CustomCommit() work across index subclasses.

    This request also concerns me from a performance perspective: if index commit progress is now bounded by the chainstate flush interval, any index-side speed optimization becomes less impactful. PRs like #34489 (batch DB writes) and #26966 (parallelized index sync) focus on making indexes faster, but with this change the bottleneck shifts to how often the chainstate flushes, something entirely outside the index's control. I would love to see those optimizations land in Core, so it would be worth ensuring this safety fix doesn't inadvertently cap their benefits.

  9. in src/index/base.cpp:275 in 7933e270ee
     271 | @@ -272,7 +272,15 @@ bool BaseIndex::Commit()
     272 |      // Don't commit anything if we haven't indexed any block yet
     273 |      // (this could happen if init is interrupted).
     274 |      bool ok = m_best_block_index != nullptr;
     275 | +
    


    arejula27 commented at 4:24 PM on March 29, 2026:

    nit: why new line? On lines 278–279 a new variable is being initialised and used in an if guard on the next line without no new line space


    mzumsande commented at 6:40 PM on June 8, 2026:

    fixed

  10. mzumsande commented at 6:38 AM on March 30, 2026: contributor

    However, I'm concerned about a potential starvation scenario during IBD. Two threads run in parallel: the validation thread connects blocks and periodically flushes the UTXO set to disk (~1h intervals or when the cache is full), while the index thread (Sync()) processes already-connected blocks and calls Commit() every 30 seconds. Since index processing (filters, tx lookups) is significantly lighter than UTXO set updates, the index will typically stay ahead of the flush point, even though it's always behind the chain tip.

    The Sync thread runs only temporary, when the index is behind the chainstate (e.g. newly started on a node that was running without an index before). For example, if a new node starts IBD with an index, Sync will stop immediately and never do any work. As soon as the index has caught up with the tip, the Sync thread is stopped and we use validation signals to keep on syncing. Once that happens, the index will commit with each ChainstateFlushed validation callback, and since validation signals are processed in order, I don't think its best block can ever be ahead of the flushed chainstate then.

    So I don't think the problem you describe is an issue. Commits will only be skipped in the limited time span where the Sync thread is almost caught up, due to additional blocks being connected during the earlier sync. This should be a relatively short period. They won't be skipped if the index is catching up from far behind, or once the index has reached the tip.

    This request also concerns me from a performance perspective

    Commit doesn't do any significant work, this is being done in ProcessBlock(), which will write things to disk independently. Commit only writes the data required to know where we should pick up at the next start (including state data such as the current MuHash of the coinstatsindex: If we don't Commit and have an unclean restart, we might have to do some work again in case of an unclean restart, if we do Commit ahead of the tip and have an unclean restart, the index is corrupted.

    I'll address the other comments later.

  11. ryanofsky commented at 8:57 PM on April 7, 2026: contributor

    Concept ACK 5af4d6e2a63e32f36b4dcb7602c562cf953a41ad. Good to follow up on this issue.

    I feel like this could skip adding the isBlockInFlushedChain method and access the chainstate directly since #24230 moves sync thread from the node to the index. But either way seems fine.

    I also implemented a similar fix in 6537f9655a292b09db7be465601883d4fe87d7d7 from #24230 which has some differences like the check being done inside Sync() instead of Commit(). Not sure if the differences are significant though.

  12. in src/validation.cpp:2816 in 489bb4a5d8 outdated
    2812 | @@ -2813,6 +2813,7 @@ bool Chainstate::FlushStateToDisk(
    2813 |                  }
    2814 |                  // Flush the chainstate (which may refer to block index entries).
    2815 |                  empty_cache ? CoinsTip().Flush() : CoinsTip().Sync();
    2816 | +                m_last_flushed_block = m_chain.Tip();
    


    furszy commented at 2:37 PM on May 21, 2026:

    Coming from the lack of m_chain usage in this function, I'm wondering if we should add a sanity-check here?

    Assume(m_last_flushed_block->GetBlockHash() == CoinsTip().GetBestBlock());
    

    or.. maybe directly use

    m_last_flushed_block =  m_blockman.LookupBlockIndex(CoinsTip().GetBestBlock());
    

    l0rinc commented at 9:48 PM on May 31, 2026:

    489bb4a validation: track last flushed block:

    Could this new ChainStateFlushed locator come from the recorded flushed block instead of rereading m_chain.Tip()?

    diff --git a/src/validation.cpp b/src/validation.cpp
    --- a/src/validation.cpp	(revision 30c92e346762ec5d046d81d46d08811ccd9e26b3)
    +++ b/src/validation.cpp	(revision 2296c4e0101ee7de230478d0dd7323b04a849fe4)
    @@ -2831,7 +2831,7 @@
         }
         if (full_flush_completed && m_chainman.m_options.signals) {
             // Update best block in wallet (so we can detect restored wallets).
    -        m_chainman.m_options.signals->ChainStateFlushed(this->GetRole(), GetLocator(m_chain.Tip()));
    +        m_chainman.m_options.signals->ChainStateFlushed(this->GetRole(), GetLocator(Assert(m_last_flushed_block)));
         }
         } catch (const std::runtime_error& e) {
             return FatalError(m_chainman.GetNotifications(), state, strprintf(_("System error while flushing: %s"), e.what()));
    

    mzumsande commented at 6:42 PM on June 8, 2026:

    I added the suggested sanity check, and reused m_last_flushed. Note that we did use m_chain for the ChainStateFlushed() notification there. However, I just noticed that some unit tests (validation_chainstate_resize_caches) don't populate m_chain and fail with the Assume. For now I have fixed this by making it conditional to a tip existing - I'll look later into changing the test instead.


    mzumsande commented at 8:21 PM on June 8, 2026:

    Still doesn't work. The Assume is also hit by the validation_chainstatemanager_tests which manually changes the tip to be lagging behind the CoinsTip()- this is a dirty hack to make AssumeUtxo testable in unit tests (Snapshots will only activate when they have more work than the tip...). I have removed the Assume() for now. Changing it to CoinsTip() would work, but then we should probably also change the notification. Would you prefer that?

  13. in src/test/coinstatsindex_tests.cpp:148 in 5af4d6e2a6
     143 | +            new_block = std::make_shared<CBlock>(block);
     144 | +
     145 | +            LOCK(cs_main);
     146 | +            BlockValidationState state;
     147 | +            BOOST_CHECK(CheckBlock(block, state, params.GetConsensus()));
     148 | +            BOOST_CHECK(m_node.chainman->AcceptBlock(new_block, state, &new_block_index, true, nullptr, nullptr, true));
    


    furszy commented at 7:18 PM on May 21, 2026:

    nit: it seems you can inline the std::make_shared<CBlock>(block) here. It is not used elsewhere.


    mzumsande commented at 6:43 PM on June 8, 2026:

    new_block was used a few lines below in BlockConnected(). But no longer relevant since I rewrote the test.

  14. in src/test/coinstatsindex_tests.cpp:141 in 5af4d6e2a6
     136 | +        // never flushed, m_last_flushed_block is still null.
     137 | +        std::shared_ptr<const CBlock> new_block;
     138 | +        CBlockIndex* new_block_index = nullptr;
     139 | +        {
     140 | +            const CScript script_pub_key{CScript() << ToByteVector(coinbaseKey.GetPubKey()) << OP_CHECKSIG};
     141 | +            const CBlock block = this->CreateBlock({}, script_pub_key, chainstate);
    


    furszy commented at 7:41 PM on May 21, 2026:

    nit

                const CBlock block = this->CreateBlock({}, CScript() << OP_TRUE, chainstate);
    

    mzumsande commented at 6:44 PM on June 8, 2026:

    rewrote the test

  15. in src/test/coinstatsindex_tests.cpp:150 in 5af4d6e2a6
     145 | +            LOCK(cs_main);
     146 | +            BlockValidationState state;
     147 | +            BOOST_CHECK(CheckBlock(block, state, params.GetConsensus()));
     148 | +            BOOST_CHECK(m_node.chainman->AcceptBlock(new_block, state, &new_block_index, true, nullptr, nullptr, true));
     149 | +            CCoinsViewCache view(&chainstate.CoinsTip());
     150 | +            BOOST_CHECK(chainstate.ConnectBlock(block, state, new_block_index, view));
    


    furszy commented at 7:45 PM on May 21, 2026:

    what do you think about flushing the validation queue here. So it does not interfere with the checks you do below?

    m_node.chain->context()->validation_signals->SyncWithValidationInterfaceQueue();
    

    mzumsande commented at 7:00 PM on June 8, 2026:

    no longer relevant, since I rewrote the test.

  16. in src/index/base.cpp:280 in 7933e270ee
     271 | @@ -272,7 +272,15 @@ bool BaseIndex::Commit()
     272 |      // Don't commit anything if we haven't indexed any block yet
     273 |      // (this could happen if init is interrupted).
     274 |      bool ok = m_best_block_index != nullptr;
     275 | +
     276 |      if (ok) {
     277 | +        // Don't commit if the index is ahead of the chainstate's last flushed block
     278 | +        const CBlockIndex* index_tip = m_best_block_index.load();
     279 | +        if (index_tip && !m_chain->isBlockInFlushedChain(index_tip->GetBlockHash(), index_tip->nHeight)) {
     280 | +            LogInfo("Skipping commit, index is ahead of flushed chainstate");
    


    furszy commented at 8:12 PM on May 21, 2026:

    nit: Could log where the index is at vs the chain height, so the diff gets logged.


    fjahr commented at 12:03 PM on May 29, 2026:

    nit: Also LogInfo on every skip seems a bit excessive. Maybe make this a debug log?


    mzumsande commented at 7:02 PM on June 8, 2026:

    I added the heights. It should be quite rare though (only happens at the end of Sync), so I'm not sure if it's worth adding a new debug log category for indexes just for this.

  17. in src/index/base.cpp:279 in 7933e270ee
     271 | @@ -272,7 +272,15 @@ bool BaseIndex::Commit()
     272 |      // Don't commit anything if we haven't indexed any block yet
     273 |      // (this could happen if init is interrupted).
     274 |      bool ok = m_best_block_index != nullptr;
     275 | +
     276 |      if (ok) {
     277 | +        // Don't commit if the index is ahead of the chainstate's last flushed block
     278 | +        const CBlockIndex* index_tip = m_best_block_index.load();
     279 | +        if (index_tip && !m_chain->isBlockInFlushedChain(index_tip->GetBlockHash(), index_tip->nHeight)) {
    


    furszy commented at 8:18 PM on May 21, 2026:

    nit: We are already checking index_tip is not null above (that is what ok is for), maybe just load it outside the if block.


    l0rinc commented at 5:52 PM on May 31, 2026:

    7933e27 index: Don't commit ahead of the flushed chainstate:

    coinstatsindex doesn't seem to be only index with a similar race, see: #34489 (review)

    Can we cover it in this PR?


    l0rinc commented at 6:33 PM on May 31, 2026:

    7933e27 index: Don't commit ahead of the flushed chainstate:

    Since this is basically the only usage of this method, we should be able to tailor it more, to let the chain interface take just the candidate block hash instead of a hash and height pair from the caller:

            if (!m_chain->isAncestorOfLastFlushedBlock(best_block_index->GetBlockHash())) {
    

    calling:

    bool isAncestorOfLastFlushedBlock(const uint256& block_hash) override
    {
        LOCK(::cs_main);
        const CBlockIndex* block{chainman().m_blockman.LookupBlockIndex(block_hash)};
        const CBlockIndex* last_flushed{chainman().ValidatedChainstate().GetLastFlushedBlock()};
        return block && last_flushed && last_flushed->GetAncestor(block->nHeight) == block;
    }
    

    mzumsande commented at 7:02 PM on June 8, 2026:

    removed the check


    mzumsande commented at 7:44 PM on June 8, 2026:

    All indexes are affected by it because we don't have a special logic for indexes without state. I added it to the coinstatsindex file because that index has state and actually needs to rewind. I think it would make sense to add a base_index tests for generic functionality that affects all indexes (and run the test for all index types, or maybe pick a random one), but I think it would be better to do that in a follow-up because it should affect other existing tests too.


    mzumsande commented at 7:46 PM on June 8, 2026:

    I took this change, seems like a simpler interface with just one parameter, and the added lookup should not be performance critical since commits are rare.

  18. furszy commented at 8:31 PM on May 21, 2026: member

    ACK 5af4d6e2a63e32f36b4dcb7602c562cf953a41ad , left a few suggestions. Nothing blocking.

  19. DrahtBot requested review from l0rinc on May 21, 2026
  20. DrahtBot requested review from fjahr on May 21, 2026
  21. DrahtBot requested review from arejula27 on May 21, 2026
  22. DrahtBot requested review from ryanofsky on May 21, 2026
  23. arejula27 commented at 9:13 AM on May 25, 2026: none

    ACK 5af4d6e2a63e32f36b4dcb7602c562cf953a41ad

    I ran the experiments I suggested in my previous Concept ACK.

    The skip path does trigger, but the loss window is bounded by 30s, which is negligible compared to a multi-day IBD, so the changes are correct.

    I wopudl encorague again to add a few more tests:

    • Reorg scenario: Verify behavior when the index is on one fork and the flushed chainstate is on another.
    • Recovery: Verify the index resumes from the last chainstate flush point. I think will be interesting to verify the corret initialization behaviour of isBlockInFlushedChain
  24. sedited commented at 10:03 AM on May 29, 2026: contributor

    @mzumsande There's a silent merge conflict here:

    /home/drgrid/bitcoin/src/test/coinstatsindex_tests.cpp:154:72: error: too many arguments to function call, expected 2, have 3
      154 |             const CBlock block = this->CreateBlock({}, script_pub_key, chainstate);
          |                                  ~~~~~~~~~~~~~~~~~                     ^~~~~~~~~~
    /home/drgrid/bitcoin/src/test/util/setup_common.h:153:12: note: 'CreateBlock' declared here
      153 |     CBlock CreateBlock(
          |            ^
      154 |         const std::vector<CMutableTransaction>& txns,
          |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      155 |         const CScript& scriptPubKey);
          |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
  25. fanquake added this to the milestone 32.0 on May 29, 2026
  26. in src/validation.h:895 in 489bb4a5d8 outdated
     891 | @@ -889,6 +892,7 @@ class Chainstate
     892 |          EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
     893 |  
     894 |      NodeClock::time_point m_next_write{NodeClock::time_point::max()};
     895 | +    CBlockIndex* m_last_flushed_block GUARDED_BY(cs_main){nullptr};
    


    fjahr commented at 12:00 PM on May 29, 2026:

    style-nit:

        CBlockIndex* m_last_flushed_block GUARDED_BY(::cs_main){nullptr};
    

    mzumsande commented at 7:03 PM on June 8, 2026:

    done

  27. in src/node/interfaces.cpp:661 in 489bb4a5d8 outdated
     652 | @@ -653,6 +653,15 @@ class ChainImpl : public Chain
     653 |          }
     654 |          return false;
     655 |      }
     656 | +    bool isBlockInFlushedChain(const uint256& block_hash, int height) override
     657 | +    {
     658 | +        LOCK(::cs_main);
     659 | +        const CBlockIndex* last_flushed = chainman().ValidatedChainstate().GetLastFlushedBlock();
     660 | +        if (!last_flushed) return false;
     661 | +        if (last_flushed->nHeight < height) return false;
    


    fjahr commented at 12:13 PM on May 29, 2026:

    This line is redundant here because GetAncestor does the same check in the first line of its body:

        if (height > nHeight || height < 0) {
            return nullptr;
        }
    

    mzumsande commented at 7:05 PM on June 8, 2026:

    good point, but since I took the suggestion from @l0rinc no longer relevant.

  28. in src/test/coinstatsindex_tests.cpp:123 in 5af4d6e2a6
     118 | +{
     119 | +    Chainstate& chainstate = Assert(m_node.chainman)->ActiveChainstate();
     120 | +    const CChainParams& params = Params();
     121 | +    const CBlockIndex* old_tip;
     122 | +    {
     123 | +        LOCK(cs_main);
    


    fjahr commented at 12:45 PM on May 29, 2026:

    style-nit: (once more below)

            LOCK(::cs_main);
    

    mzumsande commented at 7:05 PM on June 8, 2026:

    no longer relevant (rewrote the test)

  29. in src/test/coinstatsindex_tests.cpp:157 in 5af4d6e2a6
     152 | +
     153 | +        // Notify the index about the new block
     154 | +        ValidationInterfaceTest::BlockConnected(ChainstateRole{}, index, new_block, new_block_index);
     155 | +
     156 | +        // Trigger a commit via ChainStateFlushed. The chainstate was never
     157 | +        // flushed (m_last_flushed_block is null), so Commit() should skip.
    


    fjahr commented at 10:01 PM on May 29, 2026:

    nit: The test is a bit light as it only covers this "m_last_flushed_block is null" case and not the seemingly more common scenario where it is set. This test still passes even if isBlockInFlushedChain() just returns false unconditionally. This extension of the test gives coverage for the more common case: https://github.com/fjahr/bitcoin/commit/75a4dacf629bfd1e47f59f05b8f57d0a3948cda8 But I consider this non-blocking since I can also open a follow-up for this.


    mzumsande commented at 7:08 PM on June 8, 2026:

    It's kind of "cheating" to call ValidationInterfaceTest::ChainStateFlushed to get a Commit() call without actually flushing the chainstate (which my old test already did). Therefore I rewrote the test myself to actually trigger the Sync() scenario that this PR fixes.

  30. fjahr commented at 10:09 PM on May 29, 2026: contributor

    ACK 5af4d6e2a63e32f36b4dcb7602c562cf953a41ad

    Would be ok for me to merge as-is but I think particularly @furszy 's comments here and here would be good to address. Maybe some of mine as well but I don't consider them strictly blocking and I would open a follow-up if they are not addressed. I will re-review quickly if you decide to address the open comments.

    EDIT: Never mind, I guess you have to push anyway due to the silent merge conflict.

  31. DrahtBot requested review from arejula27 on May 29, 2026
  32. in src/test/coinstatsindex_tests.cpp:128 in 5af4d6e2a6
     123 | +        LOCK(cs_main);
     124 | +        old_tip = m_node.chainman->ActiveChain().Tip();
     125 | +    }
     126 | +
     127 | +    {
     128 | +        CoinStatsIndex index{interfaces::MakeChain(m_node), 1 << 20};
    


    l0rinc commented at 5:43 PM on May 31, 2026:

    5af4d6e test: add test to ensure indexes dont commit too early:

    nit:

            CoinStatsIndex index{interfaces::MakeChain(m_node), /*n_cache_size=*/1_MiB};
    

    mzumsande commented at 7:08 PM on June 8, 2026:

    done

  33. in src/test/coinstatsindex_tests.cpp:117 in 5af4d6e2a6 outdated
     109 | @@ -109,4 +110,68 @@ BOOST_FIXTURE_TEST_CASE(coinstatsindex_unclean_shutdown, TestChain100Setup)
     110 |      }
     111 |  }
     112 |  
     113 | +// Test that the index does not commit ahead of the chainstate's last
     114 | +// flushed block. If it did, a subsequent unclean shutdown would corrupt
     115 | +// the index, because during reverting it would require blocks that were
     116 | +// never flushed to disk.
     117 | +BOOST_FIXTURE_TEST_CASE(coinstatsindex_no_commit_ahead_of_flush, TestChain100Setup)
    


    l0rinc commented at 5:45 PM on May 31, 2026:

    5af4d6e test: add test to ensure indexes dont commit too early:

    This seems like a good example for #35260 - it would help me with the review to know what the state was before the change and which commit changes the assumption and how and where it changes it. Maybe something like this documenting the state before the fix:

    BOOST_FIXTURE_TEST_CASE(coinstatsindex_commits_ahead_of_chainstate_flush, TestChain100Setup)
    {
        Chainstate& chainstate = Assert(m_node.chainman)->ActiveChainstate();
        BlockValidationState state;
    
        auto current_tip{[&] { return Assert(WITH_LOCK(::cs_main, return chainstate.m_chain.Tip())); }};
        auto coinsdb_best_is{[&](const CBlockIndex& block) {
            LOCK(::cs_main);
            BOOST_CHECK(chainstate.CoinsDB().GetBestBlock() == block.GetBlockHash());
        }};
        auto sync_fresh_index{[&](int expected_height) {
            CoinStatsIndex index{interfaces::MakeChain(m_node), /*n_cache_size=*/1_MiB, /*f_memory=*/false, /*f_wipe=*/true};
            BOOST_REQUIRE(index.Init());
            index.Sync();
            BOOST_CHECK_EQUAL(index.GetSummary().best_block_height, expected_height);
            index.Stop();
        }};
        auto read_persisted_index_height{[&] {
            CoinStatsIndex index{interfaces::MakeChain(m_node), /*n_cache_size=*/1_MiB};
            BOOST_REQUIRE(index.Init());
            const int height{index.GetSummary().best_block_height};
            index.Stop();
            return height;
        }};
        auto flush_and_commit_index{[&] {
            CoinStatsIndex index{interfaces::MakeChain(m_node), /*n_cache_size=*/1_MiB};
            BOOST_REQUIRE(index.Init());
            index.Sync();
            BOOST_REQUIRE(chainstate.FlushStateToDisk(state, FlushStateMode::FORCE_FLUSH));
            m_node.validation_signals->SyncWithValidationInterfaceQueue();
            index.Stop();
        }};
        // For one unflushed index tip, check the durable height before and after the chainstate catches up.
        auto check_unflushed_index_commit{[&](const CBlockIndex& index_tip, int expected_persisted_height) {
            sync_fresh_index(index_tip.nHeight);
            BOOST_CHECK_EQUAL(read_persisted_index_height(), expected_persisted_height);
            flush_and_commit_index();
            BOOST_CHECK_EQUAL(read_persisted_index_height(), index_tip.nHeight);
        }};
    
        // Start from a known durable chainstate boundary.
        BOOST_REQUIRE(chainstate.FlushStateToDisk(state, FlushStateMode::FORCE_FLUSH));
        auto* flushed_tip{current_tip()};
        coinsdb_best_is(*flushed_tip);
    
        // Straight-line case: connect one more block without flushing the chainstate.
        CreateAndProcessBlock({}, CScript() << OP_TRUE);
        auto* unflushed_tip{current_tip()};
        BOOST_REQUIRE_GT(unflushed_tip->nHeight, flushed_tip->nHeight);
        coinsdb_best_is(*flushed_tip);
    
        check_unflushed_index_commit(*unflushed_tip, /*expected_persisted_height=*/unflushed_tip->nHeight); // TODO: Persists ahead of the flushed chainstate.
    
        auto* reorg_flushed_tip{current_tip()};
        coinsdb_best_is(*reorg_flushed_tip);
    
        // Fork case: leave the flushed chainstate behind, then index a competing unflushed tip.
        BOOST_REQUIRE(chainstate.InvalidateBlock(state, reorg_flushed_tip));
        NodeClockContext clock_ctx{};
        clock_ctx += 1s;
        CreateAndProcessBlock({}, CScript() << OP_TRUE);
        clock_ctx += 1s;
        CreateAndProcessBlock({}, CScript() << OP_TRUE);
    
        auto* fork_tip{current_tip()};
        BOOST_REQUIRE_EQUAL(fork_tip->nHeight, reorg_flushed_tip->nHeight + 1);
        BOOST_CHECK(fork_tip->GetAncestor(reorg_flushed_tip->nHeight) != reorg_flushed_tip);
        coinsdb_best_is(*reorg_flushed_tip);
    
        check_unflushed_index_commit(*fork_tip, /*expected_persisted_height=*/fork_tip->nHeight); // TODO: Persists a fork from the flushed chainstate.
    }
    

    so that the commit fixing it would have a minimal diff of what it affects exactly:

    diff --git a/src/test/coinstatsindex_tests.cpp b/src/test/coinstatsindex_tests.cpp
    --- a/src/test/coinstatsindex_tests.cpp	(revision 7177975c1d50fd4e0164fa6093a06b68302f0115)
    +++ b/src/test/coinstatsindex_tests.cpp	(revision e8197142a66c98e72d2f8c6f138f4e29a9d83b94)
    @@ -124,7 +124,7 @@
         }
     }
     
    -BOOST_FIXTURE_TEST_CASE(coinstatsindex_commits_ahead_of_chainstate_flush, TestChain100Setup)
    +BOOST_FIXTURE_TEST_CASE(coinstatsindex_commit_is_limited_by_chainstate_flush, TestChain100Setup)
     {
         Chainstate& chainstate = Assert(m_node.chainman)->ActiveChainstate();
         BlockValidationState state;
    @@ -175,7 +175,7 @@
         BOOST_REQUIRE_GT(unflushed_tip->nHeight, flushed_tip->nHeight);
         coinsdb_best_is(*flushed_tip);
     
    -    check_unflushed_index_commit(*unflushed_tip, /*expected_persisted_height=*/unflushed_tip->nHeight); // TODO: Persists ahead of the flushed chainstate.
    +    check_unflushed_index_commit(*unflushed_tip, /*expected_persisted_height=*/0);
     
         auto* reorg_flushed_tip{current_tip()};
         coinsdb_best_is(*reorg_flushed_tip);
    @@ -193,7 +193,7 @@
         BOOST_CHECK(fork_tip->GetAncestor(reorg_flushed_tip->nHeight) != reorg_flushed_tip);
         coinsdb_best_is(*reorg_flushed_tip);
     
    -    check_unflushed_index_commit(*fork_tip, /*expected_persisted_height=*/fork_tip->nHeight); // TODO: Persists a fork from the flushed chainstate.
    +    check_unflushed_index_commit(*fork_tip, /*expected_persisted_height=*/0);
     }
     
     BOOST_AUTO_TEST_SUITE_END()
    

    mzumsande commented at 7:39 PM on June 8, 2026:

    I have tried this approach in the past, I prefer to add the full test (which can be cherry-picked on master to see the failure). Otherwise, it adds a lot of churn to check every comment etc. if it already made sense before, without the context of the fix. But I did rewrite the unit test, it should be very obvious how it fails on master.


    ryanofsky commented at 4:42 PM on June 16, 2026:

    re: #34897 (review)

    Agree with l0rinc, there is no significant churn here and adding the test first would make the bugfix commit easier to understand, and avoid the need for reviewers to do manual work to verify the test meaningfully checks the changed behavior, when this could be checked by the "test ancestor commits" CI job.

    My diff to confirm the test was:

    --- b/src/test/coinstatsindex_tests.cpp
    +++ a/src/test/coinstatsindex_tests.cpp
    @@ -145,7 +145,7 @@ BOOST_FIXTURE_TEST_CASE(coinstatsindex_no_commit_ahead_of_flush, TestChain100Set
         {
             CoinStatsIndex index{interfaces::MakeChain(m_node), /*n_cache_size=*/1_MiB};
             BOOST_REQUIRE(index.Init());
    -        BOOST_CHECK_EQUAL(index.GetSummary().best_block_height, 100);
    +        BOOST_CHECK_EQUAL(index.GetSummary().best_block_height, 0);
             index.Stop();
         }
     
    @@ -182,7 +182,7 @@ BOOST_FIXTURE_TEST_CASE(coinstatsindex_no_commit_ahead_of_flush, TestChain100Set
         {
             CoinStatsIndex index{interfaces::MakeChain(m_node), /*n_cache_size=*/1_MiB};
             BOOST_REQUIRE(index.Init());
    -        BOOST_CHECK_EQUAL(index.GetSummary().best_block_height, 101);
    +        BOOST_CHECK_EQUAL(index.GetSummary().best_block_height, 100);
             index.Stop();
         }
     }
    
  34. in src/test/coinstatsindex_tests.cpp:164 in 5af4d6e2a6
     159 | +        ValidationInterfaceTest::ChainStateFlushed(ChainstateRole{}, index, locator);
     160 | +
     161 | +        index.Stop();
     162 | +    }
     163 | +
     164 | +    // Reload the index and verify the committed state.
    


    l0rinc commented at 6:05 PM on May 31, 2026:

    Shouldn't we also test the reorg scenario?


    mzumsande commented at 7:45 PM on June 8, 2026:

    I hope the updated test covers it.

  35. in src/interfaces/chain.h:192 in 489bb4a5d8
     184 | @@ -185,6 +185,12 @@ class Chain
     185 |      //! the height range from min_height to max_height, inclusive.
     186 |      virtual bool hasBlocks(const uint256& block_hash, int min_height = 0, std::optional<int> max_height = {}) = 0;
     187 |  
     188 | +    //! Check whether the given block is an ancestor of (or equal to) the last
     189 | +    //! block whose chainstate was flushed to disk. Uses the fully validated
     190 | +    //! chainstate, which is the one indexes follow. Returns false if no flush
     191 | +    //! has happened yet.
     192 | +    virtual bool isBlockInFlushedChain(const uint256& block_hash, int height) = 0;
    


    l0rinc commented at 6:55 PM on May 31, 2026:

    489bb4a validation: track last flushed block:

    We're adding dead code in this commit that is only used in the second one (where we could tailor it more to our needs). But because we don't actually see the usage, it's hard to review and we're jumping back and forth. Could you please document the current state with some characterization tests that help us understand the problem before jumping to the solution, followed by the fix which updates these assumptions minimally to document precisely the effect this change has on the behavior.


    If you decide to keep this split, could the existing chainstate write test also verify which block the new flushed-boundary state records:

    diff --git a/src/test/chainstate_write_tests.cpp b/src/test/chainstate_write_tests.cpp
    --- a/src/test/chainstate_write_tests.cpp	(revision 1200703e45251337b5d8e8cf437117224371fa72)
    +++ b/src/test/chainstate_write_tests.cpp	(revision 51a653915149ceaa023e2a2a82c68605969bf3df)
    @@ -85,6 +85,7 @@
     
         // Set m_next_write to current time
         chainstate.FlushStateToDisk(state_dummy, FlushStateMode::FORCE_FLUSH);
    +    BOOST_CHECK_EQUAL(WITH_LOCK(::cs_main, return chainstate.GetLastFlushedBlock()), second_from_tip->pprev);
         m_node.validation_signals->SyncWithValidationInterfaceQueue();
         // The periodic flush interval is between 50 and 70 minutes (inclusive)
         // The next call to a PERIODIC write will flush
    @@ -101,6 +102,7 @@
         // inside the outer loop.
         m_node.validation_signals->SyncWithValidationInterfaceQueue();
         BOOST_CHECK_EQUAL(sub->m_flushed_at_block, second_from_tip);
    +    BOOST_CHECK_EQUAL(WITH_LOCK(::cs_main, return chainstate.GetLastFlushedBlock()), second_from_tip);
     }
     
     BOOST_AUTO_TEST_SUITE_END()
    

    mzumsande commented at 7:46 PM on June 8, 2026:

    I added the lines to chainstate_write_tests, but I kept the test in one commit (see above for reasoning).

  36. in src/node/interfaces.cpp:660 in 489bb4a5d8
     652 | @@ -653,6 +653,15 @@ class ChainImpl : public Chain
     653 |          }
     654 |          return false;
     655 |      }
     656 | +    bool isBlockInFlushedChain(const uint256& block_hash, int height) override
     657 | +    {
     658 | +        LOCK(::cs_main);
     659 | +        const CBlockIndex* last_flushed = chainman().ValidatedChainstate().GetLastFlushedBlock();
     660 | +        if (!last_flushed) return false;
    


    l0rinc commented at 7:00 PM on May 31, 2026:

    489bb4a validation: track last flushed block:

    nit: this can go to the final return as well to simplify this a bit


    mzumsande commented at 7:49 PM on June 8, 2026:

    no longer relevant, since I took your suggestion to rewrite the function.

  37. in src/index/base.cpp:274 in 7933e270ee outdated
     271 | @@ -272,7 +272,15 @@ bool BaseIndex::Commit()
     272 |      // Don't commit anything if we haven't indexed any block yet
     273 |      // (this could happen if init is interrupted).
     274 |      bool ok = m_best_block_index != nullptr;
    


    l0rinc commented at 9:45 PM on May 31, 2026:

    7933e27 index: Don't commit ahead of the flushed chainstate:

    This isn't a "real" nullptr check, so we could probably simplify the structure a bit by something like:

    const CBlockIndex* best_block_index{m_best_block_index.load()};
    if (!best_block_index) return true;
    
    if (!m_chain->isAncestorOfLastFlushedBlock(best_block_index->GetBlockHash())) {
        return true;
    }
    ...
    

    mzumsande commented at 7:51 PM on June 8, 2026:

    this would change the logic, we currently return false if best_block_index is not set. This relates to above discussion about the (unused) return value.

  38. l0rinc changes_requested
  39. l0rinc commented at 10:28 PM on May 31, 2026: contributor

    Left a few more comments - given the other comments and the silent merge conflict. It would also be good to confirm whether the same unsafe restart boundary applies to other indexes using eager write paths, such as BlockFilterIndex.

  40. DrahtBot requested review from l0rinc on May 31, 2026
  41. mzumsande commented at 10:33 PM on June 7, 2026: contributor

    Sorry for the delay, I am currently working on addressing all comments and will rebase/push a new version in a few days.

  42. mzumsande force-pushed on Jun 8, 2026
  43. mzumsande force-pushed on Jun 8, 2026
  44. DrahtBot added the label CI failed on Jun 8, 2026
  45. DrahtBot commented at 6:54 PM on June 8, 2026: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. <sub>Task macOS native: https://github.com/bitcoin/bitcoin/actions/runs/27158855154/job/80168666015</sub> <sub>LLM reason (✨ experimental): CI failed because CTest crashed with a segmentation fault in validation_chainstate_tests (test validation_chainstate_resize_caches).</sub>

    <details><summary>Hints</summary>

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

    </details>

  46. validation: track last flushed block
    This will be used to prevent the indexes from flushing their state ahead of
    the chainstate.
    
    Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
    c8dc76f0a4
  47. index: Don't commit ahead of the flushed chainstate
    Otherwise, if the node has an unclean restart,
    indexes with state (coinstatsindex) couldn't reorg to the
    last flushed tip and would be corrupted.
    
    Also updates documentation of Commit() -
    the locator functionality isn't used, so the previous text was wrong:
    We must have the best block in our block index after a restart.
    
    Co-authored-by: Fabian Jahr <fjahr@protonmail.com>
    383b5e2cdf
  48. test: add test to ensure indexes dont commit too early c4096d9da5
  49. mzumsande force-pushed on Jun 8, 2026
  50. DrahtBot removed the label CI failed on Jun 8, 2026
  51. mzumsande commented at 10:02 PM on June 8, 2026: contributor

    Thanks for the feedback! I have now updated the PR - rebased and addressed comments. I also changed the added test to cover the actual problem in Sync().

    I feel like this could skip adding the isBlockInFlushedChain method and access the chainstate directly since #24230 moves sync thread from the node to the index. But either way seems fine.

    I also implemented a similar fix in 6537f96 from #24230 which has some differences like the check being done inside Sync() instead of Commit(). Not sure if the differences are significant though.

    I did the changes in Commit() because Sync() calls Commit() in multiple different spots.

  52. in src/node/interfaces.cpp:674 in c8dc76f0a4
     665 | @@ -666,6 +666,13 @@ class ChainImpl : public Chain
     666 |          }
     667 |          return false;
     668 |      }
     669 | +    bool isAncestorOfLastFlushedBlock(const uint256& block_hash) override
     670 | +    {
     671 | +        LOCK(::cs_main);
     672 | +        const CBlockIndex* block{chainman().m_blockman.LookupBlockIndex(block_hash)};
     673 | +        const CBlockIndex* last_flushed{chainman().ValidatedChainstate().GetLastFlushedBlock()};
     674 | +        return block && last_flushed && last_flushed->GetAncestor(block->nHeight) == block;
    


    sedited commented at 1:14 PM on June 9, 2026:

    Nit: Could we drop the lock before walking to the ancestor?

  53. sedited approved
  54. sedited commented at 1:21 PM on June 9, 2026: contributor

    ACK c4096d9da5c8bc2a06d7c22cfd08430d4634124d

  55. DrahtBot requested review from furszy on Jun 9, 2026
  56. DrahtBot requested review from fjahr on Jun 9, 2026
  57. in src/interfaces/chain.h:192 in c8dc76f0a4
     184 | @@ -185,6 +185,12 @@ class Chain
     185 |      //! the height range from min_height to max_height, inclusive.
     186 |      virtual bool hasBlocks(const uint256& block_hash, int min_height = 0, std::optional<int> max_height = {}) = 0;
     187 |  
     188 | +    //! Check whether the given block is an ancestor of (or equal to) the last
     189 | +    //! block whose chainstate was flushed to disk. Uses the fully validated
     190 | +    //! chainstate, which is the one indexes follow. Returns false if no flush
     191 | +    //! has happened yet.
     192 | +    virtual bool isAncestorOfLastFlushedBlock(const uint256& block_hash) = 0;
    


    ryanofsky commented at 4:35 PM on June 16, 2026:

    In commit "validation: track last flushed block" (c8dc76f0a4ecc89d02592adcbde04857f9ec8092)

    Not important but imo a name like isBlockDataFlushed would seem clearer (especially since it's a little confusing for a block to be considered its own ancestor here).

    I also think it would be reasonable to drop this method and use m_chainstate directly, since it's not good for the Chain interface to have so many unintegrated methods, and my long term hope for this (#24230) is for flush status to be sent with block notifications,

  58. in src/index/base.cpp:287 in 383b5e2cdf
     278 | @@ -279,6 +279,13 @@ bool BaseIndex::Commit()
     279 |      // (this could happen if init is interrupted).
     280 |      bool ok = m_best_block_index != nullptr;
     281 |      if (ok) {
     282 | +        // Don't commit if the index is ahead of the chainstate's last flushed block
     283 | +        const CBlockIndex* index_tip = m_best_block_index.load();
     284 | +        if (!m_chain->isAncestorOfLastFlushedBlock(index_tip->GetBlockHash())) {
     285 | +            LogInfo("Skipping commit, index is ahead of flushed chainstate (index height %d, chain height %d)",
     286 | +                    index_tip->nHeight, m_chain->getHeight().value_or(-1));
     287 | +            return false;
    


    ryanofsky commented at 4:37 PM on June 16, 2026:

    In commit "index: Don't commit ahead of the flushed chainstate" (9ffa017cc6eabba3b5633ebe1a1bbbb1f1985f5c)

    It seems like this change makes it no longer possible to distinguish between error and nonerror return values. Would seem better to just eliminate the return value since it's not used anywhere, instead of returning a more confusing value.

  59. in src/test/coinstatsindex_tests.cpp:130 in c4096d9da5
     122 | @@ -123,4 +123,68 @@ BOOST_FIXTURE_TEST_CASE(coinstatsindex_unclean_shutdown, TestChain100Setup)
     123 |      }
     124 |  }
     125 |  
     126 | +// Test that the index does not commit ahead of the chainstate's last
     127 | +// flushed block. If it did, a subsequent unclean shutdown would corrupt
     128 | +// the index, because during reverting it would require blocks that were
     129 | +// never flushed to disk.
     130 | +BOOST_FIXTURE_TEST_CASE(coinstatsindex_no_commit_ahead_of_flush, TestChain100Setup)
    


    ryanofsky commented at 5:09 PM on June 16, 2026:

    In commit "test: add test to ensure indexes dont commit too early" (c4096d9da5c8bc2a06d7c22cfd08430d4634124d)

    IMO, the test would be easier to understand if it were deduplicated, e.g:

    BOOST_FIXTURE_TEST_CASE(coinstatsindex_no_commit_ahead_of_flush, TestChain100Setup)
    {
        Chainstate& chainstate = Assert(m_node.chainman)->ActiveChainstate();
        auto sync_index = [&](bool do_flush, int expected_sync_height, int expected_commit_height) {
            CoinStatsIndex index{interfaces::MakeChain(m_node), /*n_cache_size=*/1_MiB};
            BOOST_REQUIRE(index.Init());
            index.Sync();
            if (do_flush) {
                chainstate.ForceFlushStateToDisk();
                m_node.chain->context()->validation_signals->SyncWithValidationInterfaceQueue();
            }
            BOOST_CHECK_EQUAL(index.GetSummary().best_block_height, expected_sync_height);
            index.Stop();
            // Reload index to see which block data was actually committed.
            BOOST_REQUIRE(index.Init());
            BOOST_CHECK_EQUAL(index.GetSummary().best_block_height, expected_commit_height);
            index.Stop();
        };
    
        // Part 1: Sync, then "crash" (stop without flushing). Models a node that
        // started up, had its index catch up, but never flushed before going down.
        // The end-of-sync Commit() runs at chain tip (height 100) but
        // m_last_flushed_block is null, so it is skipped.
        sync_index(false, 100, 0);
    
        // Part 2: Restart cleanly. Sync, force a chainstate flush, and drain the
        // validation queue so the index's ChainStateFlushed callback runs. Now
        // m_last_flushed_block == tip == 100 and the index can commit.
        sync_index(true, 100, 100);
    
        // Part 3: Connect a new block on the chain without flushing
        // (m_last_flushed_block stays at 100). For a real node this would happen
        // in parallel with Sync(). Here we do it before Sync() to make the race
        // state deterministic.
        CreateAndProcessBlock({}, CScript() << OP_TRUE);
        sync_index(false, 101, 100);
    }
    
  60. ryanofsky approved
  61. ryanofsky commented at 5:27 PM on June 16, 2026: contributor

    Code review ACK c4096d9da5c8bc2a06d7c22cfd08430d4634124d. Looks great! Left several suggestions but none are important


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-06-17 21:51 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me