indexes: Don't commit ahead of the flushed chainstate #34897

mzumsande commented at 5:54 AM on March 23, 2026: contributor

If indexes commit their data ahead of the flushed chainstate, and there is an unclean shutdown, the index will be corrupted. This is especially the case for the coinstatsindex, which has state (the muhash) which can't easily be rolled back without access to the blocks. This was only partly fixed in #33212 (for reorg scenarios) but could still happen during initial sync.

Fix this more thoroughly by having the node keep track of the last flushed block, and skipping index commits if the current block of the index is not an ancestor of the node's last flushed block (similar to the suggestion by stickies-v in #33212#pullrequestreview-31408570890.

I tried to follow the spirit of #24230 by adding the call to the chain interface (which involved adding an oddly specific method there) in order to avoid directly reaching into the node from the index code.

Fixes #33208 Fixes #34261

validation: track last flushed block

This will be used to prevent the indexes to flush their state ahead of
the chainstate.

489bb4a5d8

index: Don't commit ahead of the flushed chainstate

Otherwise, if the node has an unclean restart,
indexes with state (coinstatsindex) couldn't reorg to the
last flushed tip and would be corrupted.

7933e270ee

test: add test to ensure indexes dont commit too early 5af4d6e2a6

DrahtBot added the label UTXO Db and Indexes on Mar 23, 2026

DrahtBot commented at 5:54 AM on March 23, 2026: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34897.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	furszy
Concept ACK	fjahr, l0rinc, ryanofsky, arejula27

If your review is incorrectly listed, please copy-paste <code></code> into the comment that the bot should ignore.

Conflicts

No conflicts as of last run.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

//! Uses the fully validated chainstate, which is the one indexes follow. -> //! Uses the fully validated chainstate, which is the one the indexes follow. [Missing “the” makes the sentence grammatically incorrect and slightly harder to read.]

fjahr commented at 8:23 AM on March 23, 2026: contributor

Concept ACK

l0rinc commented at 1:12 PM on March 25, 2026: contributor

I have reported something similar in #34489 (review) and #34489 (review) - will this PR fix those? Concept ACK regardless, thanks for fixing them!

in src/index/base.cpp:283 in 5af4d6e2a6

 278 | +        const CBlockIndex* index_tip = m_best_block_index.load();
 279 | +        if (index_tip && !m_chain->isBlockInFlushedChain(index_tip->GetBlockHash(), index_tip->nHeight)) {
 280 | +            LogInfo("Skipping commit, index is ahead of flushed chainstate");
 281 | +            return false;
 282 | +        }
 283 | +

arejula27 commented at 10:29 AM on March 29, 2026:

This is a behavioral change: Commit() can now return false without an actual error, meaning "intentionally skipped." Callers in Sync() currently treat Commit() failures as non-fatal, so this works in practice. However, the documentation for Commit() in base.h:96-103 only describes error scenarios and recovery recommendations, it should be updated to reflect this new "skip" behavior and the conditions under which it triggers (index ahead of flushed chainstate). I migth think this could be a good place to use the new Expected type. Errors will return an error and a no-commit cuz blocks no flushed would return false

furszy commented at 8:24 PM on May 21, 2026:

We are currently not using the Commit return value. I wouldn't worry about this now.

in src/test/coinstatsindex_tests.cpp:1 in 5af4d6e2a6

arejula27 commented at 10:38 AM on March 29, 2026:

Would it be interesting to add these test cases?

Reorg scenario: Verify behavior when the index is on one fork and the flushed chainstate is on another.
Recovery: Verify the index resumes from the last chainstate flush point. I think will be interesting to verify the corret initialization behaviour of isBlockInFlushedChain

arejula27 commented at 10:48 AM on March 29, 2026: none

Concept ACK [5af4d6e2a63e32f36b4dcb7602c562cf953a41ad]

However, I'm concerned about a potential starvation scenario during IBD. Two threads run in parallel: the validation thread connects blocks and periodically flushes the UTXO set to disk (~1h intervals or when the cache is full), while the index thread (Sync()) processes already-connected blocks and calls Commit() every 30 seconds. Since index processing (filters, tx lookups) is significantly lighter than UTXO set updates, the index will typically stay ahead of the flush point, even though it's always behind the chain tip.

This means that during IBD, every Commit() call could be skipped because the index is perpetually ahead of m_last_flushed_block. A crash at any point would lose all indexing progress back to the initial LoadChainTip value, potentially hours of work. I will run experiments to verify whether this actually happens and propose a solution in that case.

I also suggest discussing an alternative approach for index commits: committing the locator up to the flushed block instead of skipping entirely. For example, if the index has processed up to block 5000 but the last flush was at block 3000, the locator could be written at 3000. This would preserve the safety invariant (locator never exceeds the flush point) while still persisting progress. That said, this would require rethinking how Commit() and CustomCommit() work across index subclasses.

This request also concerns me from a performance perspective: if index commit progress is now bounded by the chainstate flush interval, any index-side speed optimization becomes less impactful. PRs like #34489 (batch DB writes) and #26966 (parallelized index sync) focus on making indexes faster, but with this change the bottleneck shifts to how often the chainstate flushes, something entirely outside the index's control. I would love to see those optimizations land in Core, so it would be worth ensuring this safety fix doesn't inadvertently cap their benefits.

in src/index/base.cpp:275 in 7933e270ee

 271 | @@ -272,7 +272,15 @@ bool BaseIndex::Commit()
 272 |      // Don't commit anything if we haven't indexed any block yet
 273 |      // (this could happen if init is interrupted).
 274 |      bool ok = m_best_block_index != nullptr;
 275 | +

arejula27 commented at 4:24 PM on March 29, 2026:

nit: why new line? On lines 278–279 a new variable is being initialised and used in an if guard on the next line without no new line space

mzumsande commented at 6:38 AM on March 30, 2026: contributor

However, I'm concerned about a potential starvation scenario during IBD. Two threads run in parallel: the validation thread connects blocks and periodically flushes the UTXO set to disk (~1h intervals or when the cache is full), while the index thread (Sync()) processes already-connected blocks and calls Commit() every 30 seconds. Since index processing (filters, tx lookups) is significantly lighter than UTXO set updates, the index will typically stay ahead of the flush point, even though it's always behind the chain tip.

The Sync thread runs only temporary, when the index is behind the chainstate (e.g. newly started on a node that was running without an index before). For example, if a new node starts IBD with an index, Sync will stop immediately and never do any work. As soon as the index has caught up with the tip, the Sync thread is stopped and we use validation signals to keep on syncing. Once that happens, the index will commit with each ChainstateFlushed validation callback, and since validation signals are processed in order, I don't think its best block can ever be ahead of the flushed chainstate then.

So I don't think the problem you describe is an issue. Commits will only be skipped in the limited time span where the Sync thread is almost caught up, due to additional blocks being connected during the earlier sync. This should be a relatively short period. They won't be skipped if the index is catching up from far behind, or once the index has reached the tip.

This request also concerns me from a performance perspective

Commit doesn't do any significant work, this is being done in ProcessBlock(), which will write things to disk independently. Commit only writes the data required to know where we should pick up at the next start (including state data such as the current MuHash of the coinstatsindex: If we don't Commit and have an unclean restart, we might have to do some work again in case of an unclean restart, if we do Commit ahead of the tip and have an unclean restart, the index is corrupted.

I'll address the other comments later.

ryanofsky commented at 8:57 PM on April 7, 2026: contributor

Concept ACK 5af4d6e2a63e32f36b4dcb7602c562cf953a41ad. Good to follow up on this issue.

I feel like this could skip adding the isBlockInFlushedChain method and access the chainstate directly since #24230 moves sync thread from the node to the index. But either way seems fine.

I also implemented a similar fix in 6537f9655a292b09db7be465601883d4fe87d7d7 from #24230 which has some differences like the check being done inside Sync() instead of Commit(). Not sure if the differences are significant though.

in src/validation.cpp:2816 in 489bb4a5d8

2812 | @@ -2813,6 +2813,7 @@ bool Chainstate::FlushStateToDisk(
2813 |                  }
2814 |                  // Flush the chainstate (which may refer to block index entries).
2815 |                  empty_cache ? CoinsTip().Flush() : CoinsTip().Sync();
2816 | +                m_last_flushed_block = m_chain.Tip();

furszy commented at 2:37 PM on May 21, 2026:

Coming from the lack of m_chain usage in this function, I'm wondering if we should add a sanity-check here?

Assume(m_last_flushed_block->GetBlockHash() == CoinsTip().GetBestBlock());

or.. maybe directly use

m_last_flushed_block =  m_blockman.LookupBlockIndex(CoinsTip().GetBestBlock());

in src/test/coinstatsindex_tests.cpp:148 in 5af4d6e2a6

 143 | +            new_block = std::make_shared<CBlock>(block);
 144 | +
 145 | +            LOCK(cs_main);
 146 | +            BlockValidationState state;
 147 | +            BOOST_CHECK(CheckBlock(block, state, params.GetConsensus()));
 148 | +            BOOST_CHECK(m_node.chainman->AcceptBlock(new_block, state, &new_block_index, true, nullptr, nullptr, true));

furszy commented at 7:18 PM on May 21, 2026:

nit: it seems you can inline the std::make_shared<CBlock>(block) here. It is not used elsewhere.

in src/test/coinstatsindex_tests.cpp:141 in 5af4d6e2a6

 136 | +        // never flushed, m_last_flushed_block is still null.
 137 | +        std::shared_ptr<const CBlock> new_block;
 138 | +        CBlockIndex* new_block_index = nullptr;
 139 | +        {
 140 | +            const CScript script_pub_key{CScript() << ToByteVector(coinbaseKey.GetPubKey()) << OP_CHECKSIG};
 141 | +            const CBlock block = this->CreateBlock({}, script_pub_key, chainstate);

furszy commented at 7:41 PM on May 21, 2026:

nit

            const CBlock block = this->CreateBlock({}, CScript() << OP_TRUE, chainstate);

in src/test/coinstatsindex_tests.cpp:150 in 5af4d6e2a6

 145 | +            LOCK(cs_main);
 146 | +            BlockValidationState state;
 147 | +            BOOST_CHECK(CheckBlock(block, state, params.GetConsensus()));
 148 | +            BOOST_CHECK(m_node.chainman->AcceptBlock(new_block, state, &new_block_index, true, nullptr, nullptr, true));
 149 | +            CCoinsViewCache view(&chainstate.CoinsTip());
 150 | +            BOOST_CHECK(chainstate.ConnectBlock(block, state, new_block_index, view));

furszy commented at 7:45 PM on May 21, 2026:

what do you think about flushing the validation queue here. So it does not interfere with the checks you do below?

m_node.chain->context()->validation_signals->SyncWithValidationInterfaceQueue();

in src/index/base.cpp:280 in 7933e270ee

 271 | @@ -272,7 +272,15 @@ bool BaseIndex::Commit()
 272 |      // Don't commit anything if we haven't indexed any block yet
 273 |      // (this could happen if init is interrupted).
 274 |      bool ok = m_best_block_index != nullptr;
 275 | +
 276 |      if (ok) {
 277 | +        // Don't commit if the index is ahead of the chainstate's last flushed block
 278 | +        const CBlockIndex* index_tip = m_best_block_index.load();
 279 | +        if (index_tip && !m_chain->isBlockInFlushedChain(index_tip->GetBlockHash(), index_tip->nHeight)) {
 280 | +            LogInfo("Skipping commit, index is ahead of flushed chainstate");

furszy commented at 8:12 PM on May 21, 2026:

nit: Could log where the index is at vs the chain height, so the diff gets logged.

in src/index/base.cpp:279 in 7933e270ee

 271 | @@ -272,7 +272,15 @@ bool BaseIndex::Commit()
 272 |      // Don't commit anything if we haven't indexed any block yet
 273 |      // (this could happen if init is interrupted).
 274 |      bool ok = m_best_block_index != nullptr;
 275 | +
 276 |      if (ok) {
 277 | +        // Don't commit if the index is ahead of the chainstate's last flushed block
 278 | +        const CBlockIndex* index_tip = m_best_block_index.load();
 279 | +        if (index_tip && !m_chain->isBlockInFlushedChain(index_tip->GetBlockHash(), index_tip->nHeight)) {

furszy commented at 8:18 PM on May 21, 2026:

nit: We are already checking index_tip is not null above (that is what ok is for), maybe just load it outside the if block.

furszy commented at 8:31 PM on May 21, 2026: member

ACK 5af4d6e2a63e32f36b4dcb7602c562cf953a41ad , left a few suggestions. Nothing blocking.

DrahtBot requested review from l0rinc on May 21, 2026

DrahtBot requested review from fjahr on May 21, 2026

DrahtBot requested review from arejula27 on May 21, 2026

DrahtBot requested review from ryanofsky on May 21, 2026

arejula27 commented at 9:13 AM on May 25, 2026: none

ACK 5af4d6e2a63e32f36b4dcb7602c562cf953a41ad

I ran the experiments I suggested in my previous Concept ACK.

The skip path does trigger, but the loss window is bounded by 30s, which is negligible compared to a multi-day IBD, so the changes are correct.

Non-blocking suggestion for a follow-up: I wopudl encorague again to add a few more:

Reorg scenario: Verify behavior when the index is on one fork and the flushed chainstate is on another.
Recovery: Verify the index resumes from the last chainstate flush point. I think will be interesting to verify the corret initialization behaviour of isBlockInFlushedChain