index: batch db writes during initial sync #34489

pull furszy wants to merge 9 commits into bitcoin:master from furszy:2026_index_batch_processing changing 9 files +278 −55
  1. furszy commented at 4:06 am on February 3, 2026: member

    Decouples part of #26966.

    Right now, index initial sync writes to disk for every block, which is far from ideal on HDDs. This change batches those writes, so disk writes are less frequent during initial sync. It also cuts down cs_main contention by reducing the number of NextBlockSync calls (instead of calling it for every block, we call it once per block window), making the node more responsive (IBD and validation) while indexes sync up. On top of that, it lays the groundwork for the bigger speedups, since part of the parallelization pre-work is already in place.

    Just as a small summary:

    • Batches DB writes instead of flushing per block, which improves sync time on HDD due to the reduced number of disk write operations.
    • Reduces cs_main lock contention, which improves overall node responsiveness (primarily IBD) while the index threads are syncing.
    • Lays the groundwork for the real speedups, since part of #26966’s parallelization pre-work is introduced here as well.
    • Reduces the number of generated LDB files. See l0rinc’s comment.

    Note: Pending initial sync benchmark and further testing. I have only focused on decoupling and simplifying what was in #26966 first. Commits should be simple to digest.
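
    The batching idea can be sketched in isolation. This is an illustrative stand-in only (the struct and function names are invented, not Bitcoin Core's CDBBatch/BaseIndex API): accumulate per-block writes in memory and flush once per window instead of once per block.

    ```cpp
    #include <cassert>
    #include <string>
    #include <utility>
    #include <vector>

    // Toy stand-in for a DB write batch: accumulates key/value pairs in memory.
    struct DbBatch {
        std::vector<std::pair<std::string, std::string>> entries;
        void Write(std::string key, std::string value)
        {
            entries.emplace_back(std::move(key), std::move(value));
        }
    };

    // Index `num_blocks` blocks, flushing the in-memory batch to "disk" once
    // per `batch_size` window (and at the tip). Returns the number of flushes;
    // batch_size == 1 reproduces the old write-per-block behavior.
    int SyncFlushCount(int num_blocks, int batch_size)
    {
        DbBatch batch;
        int flushes{0};
        for (int height = 1; height <= num_blocks; ++height) {
            batch.Write("block" + std::to_string(height), "digested-data");
            if (height % batch_size == 0 || height == num_blocks) {
                batch.entries.clear(); // one disk write for the whole window
                ++flushes;
            }
        }
        return flushes;
    }
    ```

    With a window of 500 blocks, syncing 1000 blocks costs 2 disk writes instead of 1000, which is where the HDD benefit comes from.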

  2. DrahtBot added the label UTXO Db and Indexes on Feb 3, 2026
  3. DrahtBot commented at 4:07 am on February 3, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK l0rinc, arejula27
    Stale ACK optout21

    If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

    Conflicts

    No conflicts as of last run.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • already be -> already [duplicate “be” makes the sentence ungrammatical]

    Possible places where named args for integral literals may be used (e.g. func(x, /*named_arg=*/0) in C++, and func(x, named_arg=0) in Python):

    • WaitForHeight(&index, blocking_height + 2, 5s) in src/test/blockfilter_index_tests.cpp

    2026-02-18 17:18:37

  4. furszy force-pushed on Feb 3, 2026
  5. DrahtBot added the label CI failed on Feb 3, 2026
  6. DrahtBot commented at 4:42 am on February 3, 2026: contributor

    🚧 At least one of the CI tasks failed.
    Task iwyu: https://github.com/bitcoin/bitcoin/actions/runs/21616547248/job/62296358632
    LLM reason (✨ experimental): IWYU (include-what-you-use) reported a failure in the test script, causing the CI to fail.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. in src/index/base.cpp:201 in 8368abaa8b outdated
    197@@ -198,6 +198,22 @@ bool BaseIndex::ProcessBlock(const CBlockIndex* pindex, const CBlock* block_data
    198     return true;
    199 }
    200 
    201+bool BaseIndex::ProcessBlocks(const CBlockIndex* start, const CBlockIndex* end)
    


    fjahr commented at 9:42 am on February 3, 2026:
    Why keep ProcessBlock around? It should be simpler if we just have one function and it can be handed one or many blocks.

    fjahr commented at 10:06 am on February 3, 2026:
    Or, if you just want to keep it for encapsulation, consider making it private at least?

    furszy commented at 3:30 pm on February 3, 2026:

    Why keep ProcessBlock around? It should be simpler if we just have one function and it can be handed one or many blocks.

    Because ProcessBlocks will be specialized in a follow-up commit (please see #26966) to enable additional speedups, such as processing blocks out-of-order for indexes that support it.

    Also, I wouldn’t merge them because these two methods solve different problems. ProcessBlocks handles ordering, ProcessBlock handles a single unit of work. I think keeping them separate keeps the code clean and easy to follow while allowing us to introduce improvements in the future.

    Or, if you just want to keep it for encapsulation, consider making it private at least?

    Yeah, could do that.

    just realized that ProcessBlocks is private already.

  8. in src/index/base.h:32 in e7fa0d00e1 outdated
    26@@ -27,6 +27,10 @@ class CBlockIndex;
    27 class Chainstate;
    28 
    29 struct CBlockLocator;
    30+
    31+/** Range of blocks to process in batches */
    32+static constexpr int INDEX_BATCH_SIZE = 500;
    


    fjahr commented at 9:45 am on February 3, 2026:
    Has this been optimized with benchmarks, or are you planning to do that? This might be something that differs between the indexes, so it might make sense to let each index define its own value.

    furszy commented at 3:13 pm on February 3, 2026:

    Has this been optimized with benchmarks, or are you planning to do that? This might be something that differs between the indexes, so it might make sense to let each index define its own value.

    Let’s handle it in a focused follow-up so we don’t lose momentum here tuning a parameter.

    I’d rather land the structural improvements first, since they unblock other improvements and allow us to continue working on the parallelization goal (which introduces the major sync speedup).


    arejula27 commented at 4:40 pm on February 12, 2026:

    Bitcoin blocks do not have a consistent size, so batching by a fixed block count will result in irregular batch sizes. I suggest considering other approaches, such as a fixed transaction count (e.g., adding blocks until the number of transactions within them falls within a target range). This could be particularly interesting for txindex, as most batches would have a similar byte size. This would make batches more uniform and lead to more predictable computation and performance.

    However, this approach would make parallelisation more difficult, since generating the batch would require a precomputation step. This is a considerable trade-off, especially given that parallelisation of the batch computation (#26966) would likely be a more impactful enhancement.
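
    The count-based grouping described above could look roughly like this. This is a hypothetical sketch for discussion, not proposed code (the function name and signature are invented): heights are appended to the current batch until the batch's accumulated transaction count reaches a target.

    ```cpp
    #include <cstddef>
    #include <vector>

    // Hypothetical sketch: group block heights into batches whose accumulated
    // transaction count reaches `target_txs`, instead of a fixed block count.
    std::vector<std::vector<int>> BatchByTxCount(const std::vector<int>& tx_per_block, int target_txs)
    {
        std::vector<std::vector<int>> batches;
        std::vector<int> current;
        int acc{0};
        for (std::size_t height = 0; height < tx_per_block.size(); ++height) {
            current.push_back(static_cast<int>(height));
            acc += tx_per_block[height];
            if (acc >= target_txs) {
                batches.push_back(current);
                current.clear();
                acc = 0;
            }
        }
        if (!current.empty()) batches.push_back(current); // partial tail batch
        return batches;
    }
    ```

    Note the trade-off mentioned above: batch boundaries now depend on block contents, so a precomputation pass is needed before workers could pick up batches independently.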

  9. fjahr commented at 9:47 am on February 3, 2026: contributor
    This relates to what you wrote here, right? It would be helpful if you checked whether it solves the LevelDB file issue in coinstatsindex, and I am also curious about your benchmark results, because @l0rinc did not find that this was leading to a speed-up.
  10. l0rinc commented at 10:38 am on February 3, 2026: contributor

    Concept ACK, I prefer this over #33306 - and will investigate if this solves that problem once my benchmarking servers free up.

    because @l0rinc did not find that this was leading to a speed up.

    I experimented with something similar in https://github.com/l0rinc/bitcoin/pull/37/changes, will compare it against this change. I wrote:

    indexes: batch index writes: 34188s, 211M, 8 files. It solved the fragmentation, but didn’t speed up anything. I still think this is a better direction than adding manual compactions.

    Most likely since writing isn’t the bottleneck but MuHash calculations were.

  11. furszy commented at 3:02 pm on February 3, 2026: member

    This relates to what you wrote here, right? It would be helpful if you checked whether it solves the LevelDB file issue in coinstatsindex, and I am also curious about your benchmark results, because @l0rinc did not find that this was leading to a speed-up. @fjahr the comment was merely an excuse to decouple the DB writes batching code out of #26966 rather than me having a formed opinion in favor of or against #33306. That’s why I didn’t mention it in the PR description. Maybe the changes complement each other.

    To be crystal clear, I just updated the PR description with further details on why this change is worthwhile on its own, independently of #33306.

    To summarize it here, the goals of the changes are:

    • Batch DB writes instead of flushing per block, which will improve sync time on HDD due to the reduced number of IO operations.
    • Reduce cs_main lock contention, which orthogonally improves overall node responsiveness (primarily IBD) while the index threads are syncing.
    • Lay the groundwork for the real speedups, since part of #26966’s parallelization pre-work is introduced here as well.
  12. furszy force-pushed on Feb 3, 2026
  13. furszy force-pushed on Feb 3, 2026
  14. furszy commented at 8:26 pm on February 3, 2026: member
    Updated per feedback from @hebasto (thanks!). Changed <cinttypes> include for <cstdint> to make IWYU happy.
  15. hebasto commented at 8:38 pm on February 3, 2026: member

    Updated per feedback from @hebasto (thanks!). Changed <cinttypes> include for <cstdint> to make IWYU happy.

    I guess, this is a bug in IWYU caused by https://github.com/include-what-you-use/include-what-you-use/commit/44480a2de0d8ef039b13391997f274ea33750be9.

    UPDATE: Fixed in #34498.

  16. maflcko commented at 1:57 pm on February 4, 2026: member
    Can you run all unit and functional tests on all commits? Or do they time out?
  17. furszy force-pushed on Feb 4, 2026
  18. furszy commented at 3:22 pm on February 4, 2026: member

    Can you run all unit and functional tests on all commits? Or do they time out?

    Bad squash, my bad. Thanks for the heads up. Fixed. Also rebased the branch to pick up the CI changes.

  19. DrahtBot removed the label CI failed on Feb 4, 2026
  20. Jhackman2019 commented at 2:54 am on February 8, 2026: none
    Builds clean on ARM64 (Pi 5, Debian Bookworm). All index-related functional tests and blockfilter_index unit tests pass. :)
  21. in src/index/txindex.cpp:48 in 8826900ddc


    bvbfan commented at 8:31 am on February 8, 2026:
    This function doesn’t exist anymore.

    furszy commented at 4:55 pm on February 8, 2026:
    Done, thanks.

    optout21 commented at 10:22 pm on February 9, 2026:

    A minor comment on the commit description: I feel an ambiguity or typo around “during”.

    Pass CDBBatch to subclasses during so writes can be accumulated and committed together instead of flushing per block.
    

    furszy commented at 1:12 am on February 10, 2026:
    Fixed.
  22. furszy force-pushed on Feb 8, 2026
  23. in src/index/base.cpp:246 in eed9ea5fc5 outdated
    243+                if (block_batch.start_index) {
    244+                    const int start_height = block_batch.start_index->nHeight;
    245+                    const int tip_height = m_chainstate->m_chain.Height();
    246+                    // Compute the last height in the batch without exceeding the chain tip
    247+                    const int batch_end_height = std::min(start_height + m_num_blocks_batch - 1, tip_height);
    248+                    block_batch.end_index = m_chainstate->m_chain[batch_end_height];
    


    optout21 commented at 2:06 pm on February 9, 2026:
    The start and end are set here, according to the desired size. However, block_batch.start_index can change below. Wouldn’t it be better/cleaner to set it just before the ProcessBlocks call? (I mean to keep the logic up to the deleted pindex = pindex_next; line, and put the batch setup there.)

    furszy commented at 4:39 pm on February 9, 2026:

    The start and end are set here, according to the desired size. However, block_batch.start_index can change below.

    It seems the code I had for this case got dropped in the previous bad squash. Good catch. See the missing line https://github.com/furszy/bitcoin-core/commit/54874bf43b2df6846061fbb844b7026f69e7bc74#diff-5251d93616c116c364d2d502ad2a59e901a3512194462fabacc9756f96fd8dddR274. Will re-add it. Thanks.

    Wouldn’t it be better/cleaner to set it just before the ProcessBlocks call? (I mean to keep the logic up to the deleted pindex = pindex_next; line, and put the batch setup there.)

    That would not only require taking cs_main an extra time, but would also introduce race conditions. The extra locking comes from the chain lookup needed to determine the window’s end index. And, more importantly, if the tip changes between setting the start index and computing the end, we could end up on a fork at a lower height, resulting in an invalid window (start_index > end_index).
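
    The window-end computation furszy refers to, which must happen under the same cs_main acquisition as the start lookup, reduces to one clamped min. A standalone sketch (names mirror the diff above, but this is not the actual Core code):

    ```cpp
    #include <algorithm>

    // End of a batch window: `batch_size` blocks from `start_height`,
    // clamped so the window never runs past the current chain tip.
    int BatchEndHeight(int start_height, int batch_size, int tip_height)
    {
        return std::min(start_height + batch_size - 1, tip_height);
    }
    ```

    Computing start and end under one lock keeps the invariant start <= end; if the tip could move between the two lookups, a reorg to a lower fork could otherwise produce an inverted window.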


    optout21 commented at 6:02 pm on February 9, 2026:

    That would not only require taking cs_main an extra time, but would also introduce race conditions. The extra locking comes from the chain lookup needed to determine the window’s end index.

    What I had in mind is setting both the start and end a bit later, after the pindex is finalized. I will try it out and provide a concrete code snippet a bit later.


    furszy commented at 7:00 pm on February 9, 2026:

    What I had in mind is setting both the start and end a bit later, after the pindex is finalized. I will try it out and provide a concrete code snippet a bit later.

    I see the same two problems there, with an extra requirement of a reorg check and another rewind call. Which makes it even worse.


    optout21 commented at 9:56 pm on February 9, 2026:
    I acknowledge that it’s important to take the cs_main lock only once in the typical scenario. Also, the missing update of end was identified and fixed! Thx
  24. optout21 commented at 2:29 pm on February 9, 2026: contributor

    I am confused as to how this batching relates to the batching logic of the original code. Looking at the pre-change BaseIndex::Sync(), it seems that Commit/WriteBatch occurred only every 30 seconds (or at the chain tip), not after each block. This change does a WriteBatch after every 500 blocks, which in fact may be even more frequent. The original commit-after-every-30-secs is still there, so I don’t see how this change results in fewer writes. Maybe I misunderstand the change.

                if (!ProcessBlock(pindex)) return; // error logged internally
                ...
                if (current_time - last_locator_write_time >= SYNC_LOCATOR_WRITE_INTERVAL) {
                    SetBestBlockIndex(pindex);
                    last_locator_write_time = current_time;
                    // No need to handle errors in Commit. See rationale above.
                    Commit();
                }
    
  25. furszy commented at 6:48 pm on February 9, 2026: member

    I am confused as to how this batching relates to the batching logic of the original code. Looking at the pre-change BaseIndex::Sync(), it seems that Commit/WriteBatch occurred only every 30 seconds (or at the chain tip), not after each block. This change does a WriteBatch after every 500 blocks, which in fact may be even more frequent. The original commit-after-every-30-secs is still there, so I don’t see how this change results in fewer writes. Maybe I misunderstand the change.

                if (!ProcessBlock(pindex)) return; // error logged internally
                ...
                if (current_time - last_locator_write_time >= SYNC_LOCATOR_WRITE_INTERVAL) {
                    SetBestBlockIndex(pindex);
                    last_locator_write_time = current_time;
                    // No need to handle errors in Commit. See rationale above.
                    Commit();
                }
    

    Commit() only persists the locator (i.e. “how far the index has synced”) so we can resume from that point after a restart. It does not flush or batch the per-block digested data.

    The block data is written inside CustomAppend() which is called from ProcessBlock() per block, and it writes to the db immediately.

    We’re not replacing locator update time nor merging it with something else here, we’re only batching something that previously wasn’t batched at all.

    Also, using a block window instead of reusing the existing time-based window is intentional. Time-based flushing makes memory usage hard to predict and difficult to test (see #32878), because the amount of buffered data depends on runtime speed, which does not necessarily correlate with available memory (batched data stays in memory until it’s flushed). A fixed block window provides predictable memory usage and deterministic behavior, which is overall better and also integrates nicely with multi-worker sync (#26966). I would argue in favor of ditching the time window in the future, but that’s not related to this PR.

  26. furszy force-pushed on Feb 9, 2026
  27. furszy commented at 8:00 pm on February 9, 2026: member
    Updated per feedback, plus added test coverage for the introduced changes. Thanks @optout21.
  28. optout21 commented at 9:52 pm on February 9, 2026: contributor

    Thanks for the explanations, and apologies for asking trivial/irrelevant questions.

    Commit() only persists the locator (i.e. “how far the index has synced”) so we can resume from that point after a restart. The block data is written inside CustomAppend() which is called from ProcessBlock() per block, and it writes to the db immediately. These are two important points, I can see now.

    I was puzzled by the fact that there is a loop in Sync(), and another loop is introduced in ProcessBlocks(). But the two complement each other.

    Another approach would be to keep the outer loop in Sync() block-by-block: store a CDBBatch variable, process one block, count the blocks in the batch, and flush the batch once the batch size is reached (outside of ProcessBlock). This would be a slightly smaller change (no secondary loop), but – if I’m correct – the cs_main lock would still be acquired at every block.

    … I would argue in favor of ditching the time window in the future, but that’s not something related to this PR. Maybe Commit() could be called with the same granularity as batch writes, but that’s out of scope here.

  29. furszy commented at 10:11 pm on February 9, 2026: member

    Another approach would be to keep the outer loop in Sync() block-by-block: store a CDBBatch variable, process one block, count the blocks in the batch, and flush the batch once the batch size is reached (outside of ProcessBlock). This would be a slightly smaller change (no secondary loop), but – if I’m correct – the cs_main lock would still be acquired at every block.

    Yes. That would still lock cs_main on every block, which harms the overall node responsiveness, and it is also not very helpful for the parallelization goal, which requires certain isolation between batches so worker threads can process them concurrently.

  30. in src/index/base.cpp:276 in 360c43d7c5
    272                 FatalErrorf("Failed to rewind %s to a previous chain tip", GetName());
    273                 return;
    274             }
    275 
    276-            // For now, process a single block at time
    277             if (!ProcessBlocks(block_batch.start_index, block_batch.end_index)) return; // error logged internally
    


    optout21 commented at 10:17 pm on February 9, 2026:
    Code review shows that by this point start_index and end_index are not null, but there are several ifs to ensure that; it’s not trivial. I propose to document/ensure this invariant with an Assume before this line, especially having in mind that ProcessBlocks crashes if start or end is nullptr.

    furszy commented at 1:15 am on February 10, 2026:
    Probably better to just pass them by ref instead. Will do if I have to re-touch. Otherwise it can be tackled in a quick follow-up too.

    furszy commented at 5:18 pm on February 18, 2026:
    Done as suggested.
  31. in src/index/base.cpp:212 in 0d980feb95
    207+    }
    208+
    209+    // And process blocks in forward order: from start to end
    210+    for (auto it = ordered_blocks.rbegin(); it != ordered_blocks.rend(); ++it) {
    211+        if (m_interrupt) return false;
    212+        if (!ProcessBlock(*it)) return false;
    


    optout21 commented at 10:18 pm on February 9, 2026:
    nit: the “// error logged internally” comment could be passed on here as well.

    furszy commented at 1:12 am on February 10, 2026:
    Done as suggested.
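
    The pattern in the diff above, walking back from the window end via pprev (the only direction block indexes link) and then reverse-iterating so blocks are processed in forward order, can be sketched standalone (simplified types, not Core's actual CBlockIndex):

    ```cpp
    #include <vector>

    // Minimal stand-in for CBlockIndex: each block links only to its ancestor.
    struct BlockIndex {
        int height;
        const BlockIndex* pprev;
    };

    // Collect [start, end] by walking pprev links from the end, then iterate
    // the collected vector in reverse so blocks are handled start -> end.
    std::vector<int> ProcessForward(const BlockIndex* start, const BlockIndex* end)
    {
        std::vector<const BlockIndex*> ordered_blocks;
        for (const BlockIndex* it = end; it != start->pprev; it = it->pprev) {
            ordered_blocks.push_back(it);
        }
        std::vector<int> processed;
        for (auto it = ordered_blocks.rbegin(); it != ordered_blocks.rend(); ++it) {
            processed.push_back((*it)->height); // stand-in for ProcessBlock(*it)
        }
        return processed;
    }
    ```

    The reverse walk plus reverse iteration avoids any forward links or repeated chain lookups inside the batch.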
  32. optout21 commented at 10:24 pm on February 9, 2026: contributor

    crACK 3687ead892d207ddbdfec3f2ef5c086130414d9a

    LGTM after clarifications and changes! Code review, no assessment on performance.

    Prev: crACK d0f793ac61a0659f4f1e89f7943813244499bf7d

  33. DrahtBot requested review from l0rinc on Feb 9, 2026
  34. furszy force-pushed on Feb 10, 2026
  35. furszy commented at 1:19 am on February 10, 2026: member
    nits addressed, thanks @optout21. Also, just for completeness, added test coverage for the “reorg during initial sync” scenario that was not covered before.
  36. furszy force-pushed on Feb 10, 2026
  37. DrahtBot added the label CI failed on Feb 10, 2026
  38. DrahtBot removed the label CI failed on Feb 10, 2026
  39. l0rinc commented at 11:15 am on February 10, 2026: contributor

    I just finished measuring the size and file count for this PR (it was running previously for a week with the wrong tail -F grep I reused from #26966#pullrequestreview-2965376357)

    COMMITS="47c9297172b0866d2b5d0de6b4fd35b798db3929 dd76491174b3a3b60c174822f75abb03fda3b1d4"; \
    DATA_DIR="/mnt/my_storage/BitcoinData"; export DATA_DIR; \
    wait_index() { tail -F ${DATA_DIR}/debug.log 2>/dev/null | grep -q -m1 'coinstatsindex is enabled'; killall bitcoind 2>/dev/null || true; sleep 2; }; export -f wait_index; \
    for COMMIT in $COMMITS; do \
      killall bitcoind 2>/dev/null || true; rm -rf "$DATA_DIR/indexes" "$DATA_DIR/debug.log"; \
      git fetch -q origin "$COMMIT" >/dev/null 2>&1 || true && git checkout -q "$COMMIT" && git log -1 --pretty='%h %s' "$COMMIT"; \
      cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && ninja -C build >/dev/null 2>&1; \
      ./build/bin/bitcoind -datadir="$DATA_DIR" -connect=0 -coinstatsindex=1 -printtoconsole=0 & wait_index; \
      du -h "$DATA_DIR/indexes/coinstatsindex/db"; \
      find "$DATA_DIR/indexes/coinstatsindex/db" -type f -name '*.ldb' | wc -l; \
    done
    
    [1]+  Done                    ./build/bin/bitcoind -datadir="$DATA_DIR" -connect=0 -coinstatsindex=1 -printtoconsole=0
    47c9297172 Merge bitcoin/bitcoin#32420: mining, ipc: omit dummy extraNonce from coinbase
    [1] 3876795
    [1]+  Done                    ./build/bin/bitcoind -datadir="$DATA_DIR" -connect=0 -coinstatsindex=1 -printtoconsole=0
    216M    /mnt/my_storage/BitcoinData/indexes/coinstatsindex/db
    144
    

    and

    dd76491174 index: enable DB writes batching during sync
    [1] 3885937
    [1]+  Done                    ./build/bin/bitcoind -datadir="$DATA_DIR" -connect=0 -coinstatsindex=1 -printtoconsole=0
    216M    /mnt/my_storage/BitcoinData/indexes/coinstatsindex/db
    10
    

    I haven’t measured the speeds yet but will do that next.

  40. furszy commented at 11:31 pm on February 11, 2026: member
    Interesting, cool thanks. Another good point in favor of this change then. Will add it to the PR description.
  41. l0rinc commented at 8:32 am on February 12, 2026: contributor

    I have finished benchmarking them:

    BEFORE="64294c89094d5ab10d87236729cc267fde0a24ca"; AFTER="3687ead892d207ddbdfec3f2ef5c086130414d9a"; \
    DATA_DIR="/mnt/my_storage/BitcoinData"; export DATA_DIR; \
    RESULTS_FILE="${DATA_DIR}/index_benchmark_results.txt"; export RESULTS_FILE; \
    wait_index() { \
      tail -F ${DATA_DIR}/debug.log 2>/dev/null | grep -q -m1 'index is enabled\|is enabled at height'; \
      sleep 2; \
      killall bitcoind 2>/dev/null || true; \
      wait; \
    }; export -f wait_index; \
    log_size() { \
      local label="$1"; local index_dir="$2"; \
      local size=$(du -sh "$index_dir" 2>/dev/null | cut -f1); \
      local files=$(find "$index_dir" -type f -name '*.ldb' 2>/dev/null | wc -l); \
      echo "$label: size=$size, ldb_files=$files" | tee -a "$RESULTS_FILE"; \
    }; export -f log_size; \
    git reset --hard >/dev/null 2>&1 && git clean -fxd >/dev/null 2>&1 && git fetch origin $BEFORE $AFTER >/dev/null 2>&1; \
    for c in $BEFORE:build-before $AFTER:build-after; do \
      git checkout ${c%:*} >/dev/null 2>&1 && cmake -B ${c#*:} -G Ninja -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && ninja -C ${c#*:} bitcoind >/dev/null 2>&1; \
    done; \
    echo "indexes | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $DATA_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $DATA_DIR | tail -1) 2>/dev/null | grep -q 0 && echo SSD || echo HDD)" | tee -a "$RESULTS_FILE" && \
    for INDEX in txindex blockfilterindex coinstatsindex; do \
      echo -e "\n--- $INDEX ---" | tee -a "$RESULTS_FILE"; \
      if [ "$INDEX" = "blockfilterindex" ]; then \
        INDEX_DIR="${DATA_DIR}/indexes/blockfilter/basic"; \
      else \
        INDEX_DIR="${DATA_DIR}/indexes/${INDEX}"; \
      fi; \
      export INDEX_DIR; \
      for BUILD in before after; do \
        if [ "$BUILD" = "before" ]; then COMMIT="${BEFORE:0:7}"; else COMMIT="${AFTER:0:7}"; fi; \
        hyperfine --runs 1 --shell bash --sort command \
          --prepare "rm -rf ${DATA_DIR}/indexes/* ${DATA_DIR}/debug.log" \
          --cleanup "log_size '${INDEX} ${BUILD}' '${INDEX_DIR}'" \
          -n "${BUILD} (${COMMIT})" \
          "./build-${BUILD}/bin/bitcoind -datadir=${DATA_DIR} -${INDEX}=1 -connect=0 -printtoconsole=0 & wait_index" \
          2>&1 | tee -a "$RESULTS_FILE"; \
      done; \
    done;
    

    — txindex —

    Benchmark 1: before (64294c8)
      Time (abs ):        7665.128 s               [User: 11912.927 s, System: 1194.716 s]
    txindex before: size=63G, ldb_files=2046

    Benchmark 1: after (3687ead)
      Time (abs ):        7579.151 s               [User: 11918.671 s, System: 1120.304 s]
    txindex after: size=63G, ldb_files=2049
    

    — blockfilterindex —

    Benchmark 1: before (64294c8)
      Time (abs ):        8447.157 s               [User: 8034.353 s, System: 274.633 s]
    blockfilterindex before: size=12G, ldb_files=3

    Benchmark 1: after (3687ead)
      Time (abs ):        8391.136 s               [User: 7992.547 s, System: 270.205 s]
    blockfilterindex after: size=12G, ldb_files=3
    

    — coinstatsindex —

    Benchmark 1: before (64294c8)
      Time (abs ):        35219.938 s               [User: 34115.981 s, System: 276.583 s]
    coinstatsindex before: size=216M, ldb_files=145

    Benchmark 1: after (3687ead)
      Time (abs ):        35229.893 s               [User: 34125.606 s, System: 271.160 s]
    coinstatsindex after: size=216M, ldb_files=10
    

    Speed is basically the same and the number of files in coinstatsindex was fixed. The txindex file count change seems like noise. I will review the code soon, thanks for extracting it from the big PR.

  42. furszy commented at 3:40 pm on February 12, 2026: member

    Speed is basically the same

    You have a really nice machine there. The real batching gains should show up on HDDs rather than SSDs. Still, great to know this works properly on high-end hardware. Thanks for the benchmark.

  43. l0rinc commented at 4:08 pm on February 12, 2026: contributor

    The real batching gains should show up on HDDs rather than SSDs

    yes, I would expect that to be the case - that will also be part of my review before I can ack it.

  44. furszy commented at 4:58 pm on February 12, 2026: member

    yes, I would expect that to be the case - that will also be part of my review before I can ack it.

    You know, even if it runs at the same speed on an HDD (which I doubt), the other improvements this PR brings still make it worth it.

  45. in src/test/blockfilter_index_tests.cpp:389 in 3687ead892 outdated
    385+    WaitForHeight(&index, blocking_height + 2, 5s);
    386     index.Stop();
    387 }
    388 
    389+// Ensure the initial sync batch window behaves as expected.
    390+// Tests sync from the genesis block and from a higher block to mimic a restart.
    


    arejula27 commented at 5:37 pm on February 12, 2026:

    I wouldn’t say this test truly mimics a restart (i.e., shutting down the node and immediately restarting it, without the blockchain growing). It is closer to a continuation after a clean shutdown of the whole node for a long period of time. This test is still interesting, but it does not cover a new bug window introduced by the batch approach (shutting down the node during a batch).

    The test currently does the following:

    • Pre-mine 100 blocks
    • Mine 173 additional blocks
    • Create the index
    • Synchronize the index to the tip
    • Mine another 173 blocks
    • Recreate the index
    • Synchronize again to the new tip

    In a real-world scenario, all blocks would already exist on disk, and a restart would likely happen during a batch (for example at height 196, where the previous batch ended at 180).

    I would like to see a test (either a modification of this one or a new one) that verifies the index is not corrupted and can correctly resume if the node is restarted during a sync, before reaching the tip.

    A possible approach for such a test could be:

    1. Pre-populate the blockchain (e.g., 300 blocks already on disk)
    2. Start index sync with batch_size = 30
    3. Interrupt the sync after several batches but before completion
    4. Simulate a shutdown
    5. Restart the index and verify (with no new blocks added):
      • It resumes from the last checkpoint (not from genesis or the last processed block)
      • No corruption occurs in re-processed blocks
      • The sync successfully completes

    furszy commented at 3:37 pm on February 13, 2026:
    Sorry, I don’t see the point here. What you described is essentially a restart. The index gets recreated, and initial sync resumes from the last stored index. That’s all we need. Feel free to experiment in a follow-up, or I could also add it here if you have some code.

    arejula27 commented at 7:58 pm on February 13, 2026:

    Sorry if I didn’t explain this clearly before, let me try again.

    With this PR, some blocks are being indexed (on disk), but the index pointer is not updated because the Commit() function has not been called. The commit is executed every 30 seconds (which won’t happen during the test) or when the tip is processed (batch tip).

    What I’m suggesting is to test the scenario where a batch is processed but the tip is not processed. In that case, the index should sync starting from the last committed block.

    After debugging your test case, the second round started from the last element of the last batch, as it was the end of the chain at round 0. I would suggest checking the following cases (with a batch of 30 blocks and at least 101 blocks):

    • If the sync process is shut down when block 100 is processed, then after a restart the first block to sync should be 91 (not 0 or 120), verifying that if a batch is interrupted, after a restart the index will re-index from the last committed batch.
    • If the sync process is shut down when block 90 is processed, then after a restart the first block to sync should be 91, verifying an edge case of the previous scenario.

    Update

    I noticed that in the code, interruptions are handled by committing:

    if (m_interrupt) {
        LogInfo("%s: m_interrupt set; exiting ThreadSync", GetName());

        SetBestBlockIndex(pindex);
        Commit();
        return;
    }

    Because of this, the scenarios described above may not be useful, since the last processed block might always be committed before shutdown.

    I’m not sure whether this interruption only applies to graceful shutdowns or if it also covers crash scenarios.


    furszy commented at 8:21 pm on February 13, 2026:

    With this PR, some blocks are being indexed (on disk), but the index pointer is not updated because the Commit() function has not been called. The commit is executed every 30 seconds (which won’t happen during the test) or when the tip is processed (batch tip).

    Just to be accurate: batch writes and the index locator update are, on purpose, not related procedures. Commit() happens every 30 secs, during interruption or when the tip is reached prior to finishing initial sync.

    What I’m suggesting is to test the scenario where a batch is processed but the tip is not processed. In that case, the index should sync starting from the last committed block.

    Sure, we could return an error inside the test child index CustomAppend() that finishes the process early. It shouldn’t be hard to code. Feel encouraged to experiment with it. I can cherry-pick your changes if you have a test commit for it.


    arejula27 commented at 10:08 pm on February 13, 2026:

    Just to be accurate: batch writes and the index locator update are, on purpose, not related procedures. Commit() happens every 30 secs, during interruption or when the tip is reached prior to finishing initial sync.

    Yes, ofc :D

    It shouldn’t be hard to code. Feel encouraged to experiment with it. 🫡


    arejula27 commented at 1:47 am on February 14, 2026:

    Having played around with the code, I found that the case I mentioned didn’t work correctly:

    • Round 0:

      • Max height: 273
      • After sync, tip is at 273
    • Round 1:

      • Max height: 446 (273 + 173)
      • Sync fails at height 310
    • Round 2:

      • The first batch starts at 273; however, it should be 304 (the previous round synced one batch)

      This happens because CustomAppend() returns false and the process is stopped before the following branch is reached

      if (m_interrupt) {
          LogInfo("%s: m_interrupt set; exiting ThreadSync", GetName());
          SetBestBlockIndex(pindex);
          Commit();
          return;
      }

    Below is the code to reproduce the issue:

    // Extend BlockFilterIndex to create a custom test function CustomAppend
    class BlockFilterIndexTest : public BlockFilterIndex
    {
    public:
        BlockFilterIndexTest(std::unique_ptr<interfaces::Chain> chain, BlockFilterType filter_type, size_t n_cache_size, bool f_memory) :
            BlockFilterIndex(std::move(chain), filter_type, n_cache_size, f_memory) {}

        // Override CustomAppend to simulate an error during append at a specific block height
        bool CustomAppend(CDBBatch& batch, const interfaces::BlockInfo& block) override
        {
            if (block.height == 290) {
                return false;
            }
            return BlockFilterIndex::CustomAppend(batch, block);
        }
    };

    // Ensure the initial sync batch window behaves as expected.
    // Tests sync from the genesis block and from a higher block to mimic a restart.
    // Note: Test runs in /tmp by default, which is usually cached, so timings are not
    // a meaningful benchmark (use -testdatadir to run it elsewhere).
    BOOST_FIXTURE_TEST_CASE(initial_sync_batch_window, BuildChainTestingSetup)
    {
        constexpr int MINE_BLOCKS = 173;
        constexpr int BATCH_SIZE = 30;

        int expected_tip = 100; // pre-mined blocks
        for (int round = 0; round < 3; round++) { // two additional rounds to test sync from genesis and from a higher block
            mineBlocks(MINE_BLOCKS); // Generate blocks
            const int tip_height = WITH_LOCK(::cs_main, return m_node.chainman->ActiveChain().Height());
            BOOST_REQUIRE(tip_height == MINE_BLOCKS + expected_tip);
            expected_tip = tip_height;

            BlockFilterIndexTest filter_index(interfaces::MakeChain(m_node), BlockFilterType::BASIC, 1 << 20, /*f_memory=*/false);
            filter_index.SetProcessingBatchSize(BATCH_SIZE);
            BOOST_REQUIRE(filter_index.Init());
            BOOST_CHECK(!filter_index.BlockUntilSyncedToCurrentChain());

            // Ensure we can sync up to the tip
            const auto& summary_pre{filter_index.GetSummary()};
            // breakpoint here for the following GDB output
            filter_index.Sync();
            if (round == 1) { continue; }
            const auto& summary{filter_index.GetSummary()};
            BOOST_CHECK(summary.synced);

    GDB output before Sync() in round 2:

    gdb build/bin/test_bitcoin
    b blockfilter_index_tests.cpp:427
    b src/index/base.cpp:255
    run
    (gdb) p round
    $2 = 2
    (gdb) p expected_tip
    $3 = 619
    (gdb) p summary_pre.best_block_height
    $4 = 273

    furszy commented at 2:03 am on February 14, 2026:
    Haven’t gone through this in detail, but the locator won’t update in your second round unless you advance the mock time. Also, keep in mind that if any block within the batch fails, the entire process aborts. The process doesn’t save intermediate batch states.

    arejula27 commented at 2:18 am on February 14, 2026:

    but the locator won’t update in your second round unless you advance the mock time.

    Yes, I didn’t want to simulate 30 seconds; I wanted the error to happen before the timer commit (to fall into the m_interrupt error handling).

    . Also, keep in mind that if any block within the batch fails, the entire process aborts

    I saw that when CustomAppend returns false, it executes FatalErrorf, which simply stops the node (not gracefully) instead of raising m_interrupt. My bad!

    1. What do you think about modifying the error handling in the sync process to prevent losing stored data by raising the m_interrupt flag? In the worst case, we lose 29.59 seconds of execution; it might not be worth it, so we could leave everything as it currently is.

    2. I will try to modify CustomAppend to raise m_interrupt and create a test to show how it is correctly processed; I think this approach is more interesting.


    furszy commented at 5:20 pm on February 14, 2026:

    What do you think about modifying the error handling in the sync process to prevent losing stored data by raising the m_interrupt flag? In the worst case, we lose 29.59 seconds of execution, it might not be worth it and let everything as it currently is

    As you said, since we already save progress every 30 seconds, saving it during interruptions makes very little sense. The amount of work lost is minimal. Yet, this has been the case since the indexes were introduced, and I suspect it hasn’t received much attention, which is probably why no one removed it before. It’s something we could consider.


    arejula27 commented at 7:47 pm on February 14, 2026:
    Nice, if you agree, you can mark this thread as resolved then
  46. arejula27 commented at 5:39 pm on February 12, 2026: none
    Concept ACK [3687ead892d207ddbdfec3f2ef5c086130414d9a] I really like the idea and I think it will be a massive upgrade with your parallelisation proposal #26966
  47. in src/index/base.cpp:212 in 3687ead892 outdated
    207+        ordered_blocks.emplace_back(block);
    208+    }
    209+
    210+    // And process blocks in forward order: from start to end
    211+    for (auto it = ordered_blocks.rbegin(); it != ordered_blocks.rend(); ++it) {
    212+        if (m_interrupt) return false;
    


    arejula27 commented at 10:50 am on February 14, 2026:

    Returning false here causes the interrupt to be handled incorrectly as it will be handled in the Sync function like this:

    if (!ProcessBlocks(db_batch, block_batch.start_index, block_batch.end_index)) {
        return; // error logged internally
    }

    This early return prevents execution from reaching line 226, where the correct interrupt-handling logic resides. As a result, the expected recovery path is skipped.

    Specifically, this makes the following block effectively unreachable in this scenario:

    if (m_interrupt) {
        LogInfo("%s: m_interrupt set; exiting ThreadSync", GetName());

        SetBestBlockIndex(pindex);

        Commit();
        return;
    }

    I didn’t find a place in the file where m_interrupt is handled in this way; it might be interesting to make the code fall into the graceful path. A possible solution might be changing the return to a continue, so it will go back to the start of the loop and fall into the error-handling logic.


    arejula27 commented at 10:58 am on February 14, 2026:

    I debugged, and with this code the restart point is OK:

    if (!ProcessBlocks(db_batch, block_batch.start_index, block_batch.end_index)) {
        if (!m_interrupt) {
            return;
        }
        continue;
    }

    furszy commented at 5:10 pm on February 14, 2026:

    Nice catch. Pushed an update that fixes it and adds test coverage.

    I’m considering alternative options for the structure, like removing the interruption update, but will let the ideas settle over the weekend. At least we have some nice test coverage for it now.

    Some context on “removing the interruption update”: Since we save progress every 30 seconds, saving it during interruptions makes little sense, as the work lost is minimal. The code is probably there merely because it was introduced on the early index days, where things were way more chaotic, and no one really paid much attention to it afterwards.


    arejula27 commented at 8:04 pm on February 14, 2026:

    Nice catch. Pushed an update that fixes it and adds test coverage.

    Nice. I finally managed to explain the scenario I’ve been trying to describe from the beginning; sorry for not expressing it clearly before (I’m still new as a core reviewer). At least I found a potential bug.

    I don’t want to take up more of your time on it, so feel free to resolve this thread. I think the test implemented for the scenario is enough.

    Since we save progress every 30 seconds, saving it during interruptions makes little sense, as the work lost is minimal. The code is probably there merely because it was introduced in the early index days.

    That’s a fair point. However, I’m not sure I see the benefit of removing it. I agree that this could be an argument for not introducing the code in the first place, since the potential work lost is minimal. But given that it’s already there, I’m not sure whether removing it (even if the performance impact is small) is an improvement. Do you have something specific in mind? Perhaps something related to the parallelisation of the indexing process?

  48. furszy force-pushed on Feb 14, 2026
  49. DrahtBot added the label CI failed on Feb 14, 2026
  50. DrahtBot commented at 6:04 pm on February 14, 2026: contributor

    🚧 At least one of the CI tasks failed.
    Task `test max 6 ancestor commits`: https://github.com/bitcoin/bitcoin/actions/runs/22020957853/job/63629927583
    LLM reason (✨ experimental): Block filter index test initial_sync_reorg failed, causing the test suite to abort and CI failure.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  51. arejula27 commented at 8:06 pm on February 14, 2026: none
    re Concept ACK [c2953e446d53a8ac86ae3e4801208df5c9c6d2cc]
  52. furszy force-pushed on Feb 16, 2026
  53. furszy force-pushed on Feb 16, 2026
  54. furszy force-pushed on Feb 16, 2026
  55. DrahtBot removed the label CI failed on Feb 16, 2026
  56. index: sync, update iterator only after processing block
    No behavior change. This is just for correctness.
    dd75f85e08
  57. index: refactor, decouple interruption from sync loop
    No behavior change.
    43b03731a7
  58. index: add method to process block ranges
    No behavior change.
    
    Introduce ProcessBlocks(start, end) to handle a range of blocks
    in forward order. Currently used per-block, but this lays the
    foundation for future batch processing and parallelization.
    
    This has the nice property of allowing us to collect the block
    indexes we are about to process without locking 'cs_main'. Just
    by traversing the chain backwards via each block index pprev.
    90bbdf877f
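    The collect-backwards/process-forwards idea in this commit message can be sketched in isolation. The following is a simplified standalone model (BlockIndex, ProcessRange, and the tiny chain are illustrative stand-ins, not Bitcoin Core's actual types): walk back from the batch end through each index's pprev pointer without taking any lock, then reverse-iterate so blocks are processed oldest-first.

    ```cpp
    // Simplified sketch of the range-processing idea, not the PR's actual code.
    #include <cassert>
    #include <vector>

    struct BlockIndex {
        int height;
        const BlockIndex* pprev; // link to the previous block, set at creation
    };

    // Collect [start, end] by walking end->pprev (start must be an ancestor of
    // end), then process in forward order, mirroring ProcessBlocks(start, end).
    std::vector<int> ProcessRange(const BlockIndex& start, const BlockIndex& end)
    {
        std::vector<const BlockIndex*> ordered_blocks;
        for (const BlockIndex* it = &end;; it = it->pprev) {
            ordered_blocks.push_back(it);
            if (it == &start) break;
        }
        std::vector<int> processed;
        // Forward order: from start to end
        for (auto it = ordered_blocks.rbegin(); it != ordered_blocks.rend(); ++it) {
            processed.push_back((*it)->height);
        }
        return processed;
    }

    int main()
    {
        // Build a tiny chain 0..4 linked via pprev
        std::vector<BlockIndex> chain(5);
        for (int i = 0; i < 5; ++i) {
            chain[i] = {i, i > 0 ? &chain[i - 1] : nullptr};
        }
        const auto processed = ProcessRange(chain[1], chain[4]);
        assert((processed == std::vector<int>{1, 2, 3, 4}));
        return 0;
    }
    ```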
  59. index: introduce BlockBatch struct for block range handling
    No behavior change.
    
    Lays the foundation for batch processing and future parallelization.
    965d4299a7
  60. index: compute block batch window
    Sets the end of a block batch, starting from the next
    block to sync and stopping at either the configured
    batch size or the chain tip.
    b67920269c
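    The window computation this commit describes boils down to taking the smaller of two bounds: the configured batch size past the next block to sync, or the chain tip. A minimal sketch with illustrative names (BatchEndHeight is not the PR's actual identifier):

    ```cpp
    // Hedged sketch of the batch-window arithmetic; names are illustrative.
    #include <algorithm>
    #include <cassert>

    // The batch ends either batch_size blocks after the next block to sync,
    // or at the tip, whichever comes first.
    int BatchEndHeight(int next_height, int batch_size, int tip_height)
    {
        return std::min(next_height + batch_size - 1, tip_height);
    }

    int main()
    {
        // Full window available: a 30-block batch starting at 100 ends at 129.
        assert(BatchEndHeight(100, 30, 500) == 129);
        // Window clipped by the chain tip.
        assert(BatchEndHeight(480, 30, 500) == 500);
        // Single-block batch (already at the tip).
        assert(BatchEndHeight(500, 30, 500) == 500);
        return 0;
    }
    ```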
  61. index: enable DB writes batching during sync
    Pass CDBBatch to subclasses so writes can be accumulated
    and committed together instead of flushed per block.
    
    Note:
    Batch writes were intentionally not tied to the existing Commit() method.
    The rationale is to bound memory consumption as batches accumulate in RAM
    until flushed to disk. This leaves two separate workflows with different
    purposes:
    
    1) Batch writes flush data for a block range atomically: either all
    blocks in the batch land in LevelDB or none do.
    
    2) Commit() is about progress checkpointing: it writes the best block
    locator alongside any other index state position (e.g. last written
    file for the block filter index - which requires an fsync call), so the
    node always has a consistent recovery point after an interruption or crash.
    
    Reprocessing data after a crash is not really a problem because we would
    just overwrite the existing one.
    ca9838c3a6
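    The two decoupled workflows in this commit message can be modeled with a toy key-value store. Everything below (MockDB, WriteBatchAtomically, Commit) is an illustrative assumption, not the actual CDBBatch/LevelDB code; it only shows why reprocessing after a crash is harmless when the checkpoint lags behind the batched data.

    ```cpp
    // Toy model of the two workflows; not Bitcoin Core's CDBBatch/LevelDB code.
    #include <cassert>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    struct MockDB {
        std::map<int, std::string> entries; // height -> indexed data
        int best_block = -1;                // locator-like recovery checkpoint
    };

    // Workflow 1: accumulate writes in memory, then land them all at once.
    void WriteBatchAtomically(MockDB& db, const std::vector<std::pair<int, std::string>>& batch)
    {
        for (const auto& [height, data] : batch) db.entries[height] = data;
    }

    // Workflow 2: checkpoint progress; deliberately independent of the batch.
    void Commit(MockDB& db, int best_block) { db.best_block = best_block; }

    int main()
    {
        MockDB db;
        WriteBatchAtomically(db, {{0, "a"}, {1, "b"}, {2, "c"}});
        // Data landed, but the checkpoint hasn't moved yet...
        assert(db.entries.size() == 3 && db.best_block == -1);
        // ...so after a crash here, blocks 0-2 would be reprocessed and simply
        // overwritten, which is harmless, as the commit message notes.
        Commit(db, 2);
        assert(db.best_block == 2);
        return 0;
    }
    ```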
  62. test: index, add initial sync batch writes coverage 73e999c434
  63. test: index, add coverage for initial sync reorgs 8ef27711c9
  64. test: Add coverage for index locator persistence during shutdown
    Verifies the index persists the last processed index when interrupted.
    0bebf13c0e
  65. furszy force-pushed on Feb 18, 2026
  66. furszy commented at 5:21 pm on February 18, 2026: member
    Made a few improvements to the commits descriptions with additional rationale for the changes, and rebased on master to pick up the latest CI updates.
  67. DrahtBot added the label CI failed on Feb 18, 2026
  68. DrahtBot removed the label CI failed on Feb 18, 2026
  69. in src/index/base.cpp:256 in 965d4299a7
    256 
    257             // For now, process a single block at time
    258-            if (!ProcessBlocks(/*start=*/*pindex_next, /*end=*/*pindex_next)) {
    259-                // If failed due to an interruption, we haven't processed the block.
    260+            block_batch.last = block_batch.first;
    261+            if (!ProcessBlocks(*block_batch.first, *block_batch.last)) {
    


    sedited commented at 8:49 pm on February 18, 2026:

    In commit 965d4299a76badcf2257f390f5c02e6007ec6244

    Nit: Why introduce the BlockBatch at all when we don’t pass it on here? I don’t think we gain much by de-referencing here.

  70. in src/index/base.h:105 in 965d4299a7
    100+        BlockBatch() = default;
    101+
    102+        // Disallow copy
    103+        BlockBatch(const BlockBatch&) = delete;
    104+        BlockBatch& operator=(const BlockBatch&) = delete;
    105+        BlockBatch(BlockBatch&&) noexcept = default;
    


    sedited commented at 8:55 pm on February 18, 2026:
    Just a question: Why do we need to define all of these?
  71. in src/index/base.cpp:260 in b67920269c
    255@@ -245,14 +256,14 @@ void BaseIndex::Sync()
    256                     m_synced = true;
    257                     break;
    258                 }
    259+                // Just process one block in case of tip change
    260+                block_batch.last = block_batch.first;
    


    sedited commented at 9:21 pm on February 18, 2026:
    Why not re-apply the same logic from before here? Is there a reason why we should only ever process one block here? If that is possible, could we make NextSyncBlock absorb this logic and return a BlockBatch directly?
  72. in src/index/base.cpp:207 in 43b03731a7
    213-                // logged. The best way to recover is to continue, as index cannot be corrupted by
    214-                // a missed commit to disk for an advanced index state.
    215-                Commit();
    216-                return;
    217-            }
    218+        while (!m_interrupt) {
    


    sedited commented at 10:03 pm on February 18, 2026:
    Nit (clang-format): could drop the empty line after this too.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-02-27 09:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me