index: batch db writes during initial sync #34489

pull furszy wants to merge 4 commits into bitcoin:master from furszy:2026_index_batch_processing changing 9 files +88 −28
  1. furszy commented at 4:06 am on February 3, 2026: member

    Decouples part of #26966.

    Right now, index initial sync writes to disk for every block, which performs poorly on HDDs. This change batches those writes, so they happen less often during initial sync. It also cuts down cs_main contention by reducing the number of NextBlockSync calls (instead of calling it for every block, we call it once per block window), making the node more responsive (IBD and validation) while indexes sync up. On top of that, it lays the groundwork for the bigger speedups, since part of the parallelization pre-work is already in place.

    Just as a small summary:

    • Batch DB writes instead of flushing per block, which improves sync time on HDD due to the reduced number of disk write operations.
    • Reduce cs_main lock contention, which improves the overall node responsiveness (and primarily IBD) while the index threads are syncing.
    • Lays the groundwork for the real speedups, since part of #26966 parallelization pre-work is introduced here as well.

    Note: Pending initial sync benchmark and further testing. I have only focused on decoupling and simplifying what was in #26966 first. Commits should be simple to digest.
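The batching idea described above can be illustrated with a minimal sketch. It is not the PR's code: `MockDb`, `WriteBatch`, and `SyncWithBatches` are hypothetical names, and the in-memory map only stands in for an on-disk key-value store so that the number of commits can be counted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a key-value store (not Bitcoin Core's DB wrapper).
struct MockDb {
    std::map<std::string, std::string> data;
    std::size_t commits{0}; // number of simulated disk writes

    // Apply all queued entries in a single write.
    void WriteBatch(const std::vector<std::pair<std::string, std::string>>& entries)
    {
        for (const auto& [key, value] : entries) data[key] = value;
        ++commits;
    }
};

// Index `num_blocks` blocks, committing once every `batch_size` blocks.
// Returns the total commit count recorded on `db`.
std::size_t SyncWithBatches(MockDb& db, int num_blocks, int batch_size)
{
    assert(batch_size > 0);
    std::vector<std::pair<std::string, std::string>> pending;
    for (int height = 0; height < num_blocks; ++height) {
        pending.emplace_back("block_" + std::to_string(height), "entry");
        if (static_cast<int>(pending.size()) == batch_size) {
            db.WriteBatch(pending);
            pending.clear();
        }
    }
    if (!pending.empty()) db.WriteBatch(pending); // flush the final partial batch
    return db.commits;
}
```

With a batch size of 1 this degenerates to one commit per block, which is the behavior the description says the PR moves away from during initial sync.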

  2. DrahtBot added the label UTXO Db and Indexes on Feb 3, 2026
  3. DrahtBot commented at 4:07 am on February 3, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK l0rinc

    If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34440 (Refactor CChain methods to use references, tests by optout21)
    • #26966 (index: initial sync speedup, parallelize process by furszy)
    • #24230 (indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. furszy force-pushed on Feb 3, 2026
  5. DrahtBot added the label CI failed on Feb 3, 2026
  6. DrahtBot commented at 4:42 am on February 3, 2026: contributor

    🚧 At least one of the CI tasks failed. Task iwyu: https://github.com/bitcoin/bitcoin/actions/runs/21616547248/job/62296358632 LLM reason (✨ experimental): IWYU (include-what-you-use) reported a failure in the test script, causing the CI to fail.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. in src/index/base.cpp:201 in 8368abaa8b outdated
    197@@ -198,6 +198,22 @@ bool BaseIndex::ProcessBlock(const CBlockIndex* pindex, const CBlock* block_data
    198     return true;
    199 }
    200 
    201+bool BaseIndex::ProcessBlocks(const CBlockIndex* start, const CBlockIndex* end)
    


    fjahr commented at 9:42 am on February 3, 2026:
    Why keep ProcessBlock around? It should be simpler if we just have one function and it can be handed one or many blocks.

    fjahr commented at 10:06 am on February 3, 2026:
    Or, if you just want to keep it for encapsulation, consider making it private at least?

    furszy commented at 3:30 pm on February 3, 2026:

    Why keep ProcessBlock around? It should be simpler if we just have one function and it can be handed one or many blocks.

    Because ProcessBlocks will be specialized in a follow-up commit (please see #26966) to enable additional speedups, such as processing blocks out-of-order for indexes that support it.

    Also, I wouldn’t merge them because these two methods solve different problems. ProcessBlocks handles ordering, ProcessBlock handles a single unit of work. I think keeping them separate keeps the code clean and easy to follow while allowing us to introduce improvements in the future.

    Or, if you just want to keep it for encapsulation, consider making it private at least?

    Yeah, could do that.

    just realized that ProcessBlocks is private already.
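The split discussed in this thread (a range handler that owns ordering, and a per-block handler that owns one unit of work) could look roughly like the sketch below. `BlockEntry` and `ProcessRange` are hypothetical stand-ins, not the actual `CBlockIndex` or `BaseIndex::ProcessBlocks`, and the per-block step is passed in as a callback purely for illustration.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical minimal block index entry: a height and a link to the parent.
struct BlockEntry {
    int height;
    const BlockEntry* prev;
};

// Range handler: walks back from `end` to `start`, then hands each block,
// in ascending height order, to the single-block handler.
// Returns false if `start` is not an ancestor of `end` or a block fails.
bool ProcessRange(const BlockEntry* start, const BlockEntry* end,
                  const std::function<bool(const BlockEntry&)>& process_block)
{
    assert(start && end);
    std::vector<const BlockEntry*> range;
    for (const BlockEntry* it = end; it != nullptr; it = it->prev) {
        range.push_back(it);
        if (it == start) break;
    }
    if (range.back() != start) return false; // range is not connected
    for (auto it = range.rbegin(); it != range.rend(); ++it) {
        if (!process_block(**it)) return false;
    }
    return true;
}
```

Keeping the ordering logic in one place means a later change could swap in a different traversal (for example out-of-order processing) without touching the per-block step.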

  8. in src/index/base.h:32 in e7fa0d00e1 outdated
    26@@ -27,6 +27,10 @@ class CBlockIndex;
    27 class Chainstate;
    28 
    29 struct CBlockLocator;
    30+
    31+/** Range of blocks to process in batches */
    32+static constexpr int INDEX_BATCH_SIZE = 500;
    


    fjahr commented at 9:45 am on February 3, 2026:
    Has this been optimized with benchmarks or are you planning to do that? This might be something that differs between the indexes, so it might make sense to let each index define its own value.

    furszy commented at 3:13 pm on February 3, 2026:

    Has this been optimized with benchmarks or are you planning to do that? This might be something that differs between the indexes, so it might make sense to let each index define its own value.

    Let’s handle it in a focused follow-up so we don’t lose momentum here tuning a parameter.

    I’d rather land the structural improvements first, since they unblock other improvements and allow us to continue working on the parallelization goal (which introduces the major sync speedup).

  9. fjahr commented at 9:47 am on February 3, 2026: contributor
    This relates to what you wrote here, right? It would be helpful if you checked whether it solves the LevelDB file issue in coinstatsindex, and I am also curious about your benchmark results because @l0rinc did not find that this was leading to a speed up.
  10. l0rinc commented at 10:38 am on February 3, 2026: contributor

    Concept ACK, I prefer this over #33306 - and will investigate if this solves that problem once my benchmarking servers free up.

    because @l0rinc did not find that this was leading to a speed up.

    I experimented with something similar in https://github.com/l0rinc/bitcoin/pull/37/changes, will compare it against this change. I wrote:

    indexes: batch index writes: 34188s, 211M, 8 files

    It solved the fragmentation, but didn’t speed up anything. I still think this is a better direction than adding manual compactions.

    Most likely since writing isn’t the bottleneck but MuHash calculations were.

  11. furszy commented at 3:02 pm on February 3, 2026: member

    This relates to what you wrote here, right? It would be helpful if you checked whether it solves the LevelDB file issue in coinstatsindex, and I am also curious about your benchmark results because @l0rinc did not find that this was leading to a speed up.

    @fjahr the comment was merely an excuse to decouple the DB write batching code out of #26966 rather than me having a formed opinion in favor of or against #33306. That’s why I didn’t mention it in the PR description. Maybe the changes complement each other.

    To be crystal clear, I just updated the PR description with further details on why this change is worthwhile on its own, independently of #33306.

    To summarize it here, the goals of the changes are:

    • Batch DB writes instead of flushing per block, which will improve sync time on HDD due to the reduced number of IO operations.
    • Reduce cs_main lock contention, which orthogonally improves the overall node responsiveness (and primarily IBD) while the index threads are syncing.
    • Lays the groundwork for the real speedups, since part of #26966 parallelization pre-work is introduced here as well.
  12. furszy force-pushed on Feb 3, 2026
  13. furszy force-pushed on Feb 3, 2026
  14. furszy commented at 8:26 pm on February 3, 2026: member
    Updated per feedback from @hebasto (thanks!). Changed <cinttypes> include for <cstdint> to make IWYU happy.
  15. hebasto commented at 8:38 pm on February 3, 2026: member

    Updated per feedback from @hebasto (thanks!). Changed <cinttypes> include for <cstdint> to make IWYU happy.

    I guess, this is a bug in IWYU caused by https://github.com/include-what-you-use/include-what-you-use/commit/44480a2de0d8ef039b13391997f274ea33750be9.

    UPDATE: Fixed in #34498.

  16. maflcko commented at 1:57 pm on February 4, 2026: member
    Can you run all unit and functional tests on all commits? Or do they time out?
  17. index: add method to process block ranges
    No behavior change.
    
    Introduce ProcessBlocks(start, end) to handle a range of blocks
    in forward order. Currently used per-block, but this lays the
    foundation for future batch processing and parallelization.
    6da54d7cfa
  18. index: introduce BlockBatch struct for block range handling
    No behavior change.
    
    Lays the foundation for batch processing and future parallelization.
    e249e45ca4
  19. index: compute block batch window
    Sets the end of a block batch, starting from the next
    block to sync and stopping at either the configured
    batch size or the chain tip.
    eed9ea5fc5
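The window computation described in this commit message comes down to a capped addition. The sketch below assumes heights are plain integers and that a window holds at most `batch_size` blocks; `BatchEndHeight` is a hypothetical helper, not a function from the PR, and only the constant's value is taken from the diff quoted earlier in this thread.

```cpp
#include <algorithm>
#include <cassert>

// Value quoted in the review thread; reused here for illustration only.
static constexpr int INDEX_BATCH_SIZE = 500;

// Hypothetical helper: returns the height of the last block in the batch
// that begins at `next_height`, never going past `tip_height`.
int BatchEndHeight(int next_height, int tip_height, int batch_size = INDEX_BATCH_SIZE)
{
    assert(batch_size > 0 && next_height <= tip_height);
    return std::min(next_height + batch_size - 1, tip_height);
}
```

Under these assumptions, a sync starting at height 0 with the tip at 10000 gets a first window ending at height 499, and a window starting at 9800 ends at the tip.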
  20. index: enable DB writes batching during sync
    Pass CDBBatch to subclasses during sync so writes
    can be accumulated and committed together
    instead of flushing per block.
    8826900ddc
  21. furszy force-pushed on Feb 4, 2026
  22. furszy commented at 3:22 pm on February 4, 2026: member

    Can you run all unit and functional tests on all commits? Or do they time out?

    Bad squash, my bad. Thanks for the heads up. Fixed. Also rebased the branch to pick up the CI changes.

  23. DrahtBot removed the label CI failed on Feb 4, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-02-07 06:13 UTC
