indexes: Don't commit ahead of the flushed chainstate #34897

pull mzumsande wants to merge 3 commits into bitcoin:master from mzumsande:202603_index_sync_dont_commit_ahead changing 8 files +108 −0
  1. mzumsande commented at 5:54 AM on March 23, 2026: contributor

    If indexes commit their data ahead of the flushed chainstate, and there is an unclean shutdown, the index will be corrupted. This is especially the case for the coinstatsindex, which has state (the muhash) which can't easily be rolled back without access to the blocks. This was only partly fixed in #33212 (for reorg scenarios) but could still happen during initial sync.

    Fix this more thoroughly by having the node keep track of the last flushed block, and skipping index commits if the current block of the index is not an ancestor of the node's last flushed block (similar to the suggestion by stickies-v in #33212#pullrequestreview-31408570890.

    I tried to follow the spirit of #24230 by adding the call to the chain interface (which involved adding an oddly specific method there) in order to avoid directly reaching into the node from the index code.

    Fixes #33208 Fixes #34261

  2. validation: track last flushed block
    This will be used to prevent the indexes to flush their state ahead of
    the chainstate.
    489bb4a5d8
  3. index: Don't commit ahead of the flushed chainstate
    Otherwise, if the node has an unclean restart,
    indexes with state (coinstatsindex) couldn't reorg to the
    last flushed tip and would be corrupted.
    7933e270ee
  4. test: add test to ensure indexes dont commit too early 5af4d6e2a6
  5. DrahtBot added the label UTXO Db and Indexes on Mar 23, 2026
  6. DrahtBot commented at 5:54 AM on March 23, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK fjahr, l0rinc, arejula27, ryanofsky

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34917 (wallet: mark bip125-replaceable deprecated, remove walletrbf argument by rkrux)
    • #24230 (indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • //! Uses the fully validated chainstate, which is the one indexes follow. -> //! Uses the fully validated chainstate, which is the one the indexes follow. [Missing “the” makes the sentence grammatically incorrect and slightly harder to read.]

    <sup>2026-03-23 05:54:23</sup>

  7. fjahr commented at 8:23 AM on March 23, 2026: contributor

    Concept ACK

  8. l0rinc commented at 1:12 PM on March 25, 2026: contributor

    I have reported something similar in #34489 (review) and #34489 (review) - will this PR fix those? Concept ACK regardless, thanks for fixing them!

  9. in src/index/base.cpp:283 in 5af4d6e2a6
     278 | +        const CBlockIndex* index_tip = m_best_block_index.load();
     279 | +        if (index_tip && !m_chain->isBlockInFlushedChain(index_tip->GetBlockHash(), index_tip->nHeight)) {
     280 | +            LogInfo("Skipping commit, index is ahead of flushed chainstate");
     281 | +            return false;
     282 | +        }
     283 | +
    


    arejula27 commented at 10:29 AM on March 29, 2026:

    This is a behavioral change: Commit() can now return false without an actual error, meaning "intentionally skipped." Callers in Sync() currently treat Commit() failures as non-fatal, so this works in practice. However, the documentation for Commit() in base.h:96-103 only describes error scenarios and recovery recommendations, it should be updated to reflect this new "skip" behavior and the conditions under which it triggers (index ahead of flushed chainstate). I migth think this could be a good place to use the new Expected type. Errors will return an error and a no-commit cuz blocks no flushed would return false

  10. in src/test/coinstatsindex_tests.cpp:1 in 5af4d6e2a6


    arejula27 commented at 10:38 AM on March 29, 2026:

    Would it be interesting to add these test cases?

    • Reorg scenario: Verify behavior when the index is on one fork and the flushed chainstate is on another.
    • Recovery: Verify the index resumes from the last chainstate flush point. I think will be interesting to verify the corret initialization behaviour of isBlockInFlushedChain
  11. arejula27 commented at 10:48 AM on March 29, 2026: none

    Concept ACK [5af4d6e2a63e32f36b4dcb7602c562cf953a41ad]

    However, I'm concerned about a potential starvation scenario during IBD. Two threads run in parallel: the validation thread connects blocks and periodically flushes the UTXO set to disk (~1h intervals or when the cache is full), while the index thread (Sync()) processes already-connected blocks and calls Commit() every 30 seconds. Since index processing (filters, tx lookups) is significantly lighter than UTXO set updates, the index will typically stay ahead of the flush point, even though it's always behind the chain tip.

    This means that during IBD, every Commit() call could be skipped because the index is perpetually ahead of m_last_flushed_block. A crash at any point would lose all indexing progress back to the initial LoadChainTip value, potentially hours of work. I will run experiments to verify whether this actually happens and propose a solution in that case.

    I also suggest discussing an alternative approach for index commits: committing the locator up to the flushed block instead of skipping entirely. For example, if the index has processed up to block 5000 but the last flush was at block 3000, the locator could be written at 3000. This would preserve the safety invariant (locator never exceeds the flush point) while still persisting progress. That said, this would require rethinking how Commit() and CustomCommit() work across index subclasses.

    This request also concerns me from a performance perspective: if index commit progress is now bounded by the chainstate flush interval, any index-side speed optimization becomes less impactful. PRs like #34489 (batch DB writes) and #26966 (parallelized index sync) focus on making indexes faster, but with this change the bottleneck shifts to how often the chainstate flushes, something entirely outside the index's control. I would love to see those optimizations land in Core, so it would be worth ensuring this safety fix doesn't inadvertently cap their benefits.

  12. in src/index/base.cpp:275 in 7933e270ee
     271 | @@ -272,7 +272,15 @@ bool BaseIndex::Commit()
     272 |      // Don't commit anything if we haven't indexed any block yet
     273 |      // (this could happen if init is interrupted).
     274 |      bool ok = m_best_block_index != nullptr;
     275 | +
    


    arejula27 commented at 4:24 PM on March 29, 2026:

    nit: why new line? On lines 278–279 a new variable is being initialised and used in an if guard on the next line without no new line space

  13. mzumsande commented at 6:38 AM on March 30, 2026: contributor

    However, I'm concerned about a potential starvation scenario during IBD. Two threads run in parallel: the validation thread connects blocks and periodically flushes the UTXO set to disk (~1h intervals or when the cache is full), while the index thread (Sync()) processes already-connected blocks and calls Commit() every 30 seconds. Since index processing (filters, tx lookups) is significantly lighter than UTXO set updates, the index will typically stay ahead of the flush point, even though it's always behind the chain tip.

    The Sync thread runs only temporary, when the index is behind the chainstate (e.g. newly started on a node that was running without an index before). For example, if a new node starts IBD with an index, Sync will stop immediately and never do any work. As soon as the index has caught up with the tip, the Sync thread is stopped and we use validation signals to keep on syncing. Once that happens, the index will commit with each ChainstateFlushed validation callback, and since validation signals are processed in order, I don't think its best block can ever be ahead of the flushed chainstate then.

    So I don't think the problem you describe is an issue. Commits will only be skipped in the limited time span where the Sync thread is almost caught up, due to additional blocks being connected during the earlier sync. This should be a relatively short period. They won't be skipped if the index is catching up from far behind, or once the index has reached the tip.

    This request also concerns me from a performance perspective

    Commit doesn't do any significant work, this is being done in ProcessBlock(), which will write things to disk independently. Commit only writes the data required to know where we should pick up at the next start (including state data such as the current MuHash of the coinstatsindex: If we don't Commit and have an unclean restart, we might have to do some work again in case of an unclean restart, if we do Commit ahead of the tip and have an unclean restart, the index is corrupted.

    I'll address the other comments later.

  14. ryanofsky commented at 8:57 PM on April 7, 2026: contributor

    Concept ACK 5af4d6e2a63e32f36b4dcb7602c562cf953a41ad. Good to follow up on this issue.

    I feel like this could skip adding the isBlockInFlushedChain method and access the chainstate directly since #24230 moves sync thread from the node to the index. But either way seems fine.

    I also implemented a similar fix in 6537f9655a292b09db7be465601883d4fe87d7d7 from #24230 which has some differences like the check being done inside Sync() instead of Commit(). Not sure if the differences are significant though.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-18 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me