index: batch db writes during initial sync #34489

furszy commented at 4:06 AM on February 3, 2026: member

Pending on #34897 review. Please go there first.

Decouples part of #26966.

Right now, index initial sync writes to disk for every block, which is not the best for HDD. This change batches those, so disk writes are less frequent during initial sync. This also cuts down cs_main contention by reducing the number of NextBlockSync calls (instead of calling it for every block, we will call it once per block window), making the node more responsive (IBD and validation) while indexes sync up. On top of that, it lays the groundwork for the bigger speedups, since part of the parallelization pre-work is already in place.

Just as a small summary:

Batch DB writes instead of flushing per block, which improves sync time on HDD due to the reduced number of disk write operations.
Reduce cs_main lock contention, which improves the overall node responsiveness (and primarily IBD) while the indexes threads are syncing.
Lays the groundwork for the real speedups, since part of #26966 parallelization pre-work is introduced here as well.
Reduces the number of generated LDB files. See l0rinc's comment.

DrahtBot added the label UTXO Db and Indexes on Feb 3, 2026

DrahtBot commented at 4:07 AM on February 3, 2026: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34489.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	arejula27, l0rinc, polespinasa
Stale ACK	optout21

If your review is incorrectly listed, please copy-paste <code></code> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#34440 (Refactor CChain methods to use references, tests by optout21)
#32875 (index: handle case where pindex_prev equals chain tip in NextSyncBlock() by HowHsu)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

// attached while m_synced is still false, and it would not be -> // attached while m_synced is still false, and it would not be indexed. [fragmentary comment missing its conclusion; adding "indexed." completes the sentence and makes the intent clear]
// No need to handle errors in Commit. If it fails, the error will be already be -> // No need to handle errors in Commit. If it fails, the error will already be -> duplicate word "be" makes the sentence ungrammatical; remove the extra "be" to restore correct syntax.

Possible places where named args for integral literals may be used (e.g. func(x, /*named_arg=*/0) in C++, and func(x, named_arg=0) in Python):

WaitForHeight(&index, blocking_height + 2, 5s) in src/test/blockfilter_index_tests.cpp

2026-03-08 18:11:08

furszy force-pushed on Feb 3, 2026

DrahtBot added the label CI failed on Feb 3, 2026

DrahtBot commented at 4:42 AM on February 3, 2026: contributor

🚧 At least one of the CI tasks failed. Task iwyu: https://github.com/bitcoin/bitcoin/actions/runs/21616547248/job/62296358632 LLM reason (✨ experimental): IWYU (include-what-you-use) reported a failure in the test script, causing the CI to fail.

<details><summary>Hints</summary>

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

</details>

in src/index/base.cpp:201 in 8368abaa8b outdated

 197 | @@ -198,6 +198,22 @@ bool BaseIndex::ProcessBlock(const CBlockIndex* pindex, const CBlock* block_data
 198 |      return true;
 199 |  }
 200 |  
 201 | +bool BaseIndex::ProcessBlocks(const CBlockIndex* start, const CBlockIndex* end)

fjahr commented at 9:42 AM on February 3, 2026:

Why keep ProcessBlock around? It should be simpler if we just have one function and it can be handed one or many blocks.

fjahr commented at 10:06 AM on February 3, 2026:

Or, if you just want to keep it for encapsulation, consider making it private at least?

furszy commented at 3:30 PM on February 3, 2026:

Why keep ProcessBlock around? It should be simpler if we just have one function and it can be handed one or many blocks.

Because ProcessBlocks will be specialized in a follow-up commit (please see #26966) to enable additional speedups, such as processing blocks out-of-order for indexes that support it.

Also, I wouldn't merge them because these two methods solve different problems. ProcessBlocks handles ordering, ProcessBlock handles a single unit of work. I think keeping them separate keeps the code clean and easy to follow while allowing us to introduce improvements in the future.

Or, if you just want to keep it for encapsulation, consider making it private at least?

~~Yeah, could do that.~~

just realized that ProcessBlocks is private already.

in src/index/base.h:32 in e7fa0d00e1 outdated

  26 | @@ -27,6 +27,10 @@ class CBlockIndex;
  27 |  class Chainstate;
  28 |  
  29 |  struct CBlockLocator;
  30 | +
  31 | +/** Range of blocks to process in batches */
  32 | +static constexpr int INDEX_BATCH_SIZE = 500;

fjahr commented at 9:45 AM on February 3, 2026:

Has this been optimized with benchmarks or are you planning to do that? This might be something differs between the indexes so it might make sense to let each index define their own value.

furszy commented at 3:13 PM on February 3, 2026:

Has this been optimized with benchmarks or are you planning to do that? This might be something differs between the indexes so it might make sense to let each index define their own value.

Let's handle it in a focused follow-up so we don't lose momentum here tuning a parameter.

I'd rather land the structural improvements first, since they unblock other improvements and allow us to continue working on the parallelization goal (which introduces the major sync speedup).

arejula27 commented at 4:40 PM on February 12, 2026:

Bitcoin blocks do not have a consistent size, so batching by block size will result in irregular batch sizes. I suggest considering other approaches, such as a fixed transaction count (e.g., adding blocks until the number of transactions within them falls within a target range). This could be particularly interesting for txindex, as most batches would have a similar byte size. This would make batches more uniform and lead to more predictable computation and performance.

However, this approach would make parallelisation more difficult, since generating the batch would require a precomputation step. This is a considerable trade-off, especially given that parallelisation of the batch computation (#26966) would likely be a more impactful enhancement.

l0rinc commented at 3:23 PM on March 9, 2026:

so batching by block size will result in irregular batch sizes

I don't think that matters. Whichever is simpler.

Let's handle it in a focused follow-up so we don't lose momentum here tuning a parameter.

I agree with @fjahr that tuning this based on measurements should be part of the PR. This seems like a random number, not based on actual behavior. What are the memory implications of this batch? What would be the speedup with bigger ones, and what's the effect on LevelDB compaction and the number of files, etc.?

It's expected that any batching implementation adds some memory overhead, but that shouldn't be a problem

I personally find this hand-wavy approach concerning.

allow us to continue working on the parallelization goal

I also personally find this troubling. We should be concerned with correctness here, and not even mention parallelization before we're sure batching is actually correct. And since I have concerns about the current implementation (reorgs, save + load, state + tip writing inconsistencies) and the lack of testing, I don't think we should treat this as just a quick hack that helps us get to parallelization.

nit: inline constexpr is more idiomatic for headers:

static inline constexpr int INDEX_BATCH_SIZE{500};

fjahr commented at 12:43 PM on March 14, 2026:

FWIW, I didn't say that tweaking the value has to be part of this PR. I was just curious how this value was picked and didn't judge it. It isn't even related to this comment direct but the time of my initial pass there were no benchmarks as far as I can remember and I had a bit of fear that this might get merged quickly and could end up in v31 without benchmarks and without the multithreading for which this is part of the setup. In that case I would have been a bit concerned that this might still introduce a performance penalty in a release. But the benchmarks have been run now and quite extensively and I haven't seen any penalty from this change. I think the follow-up that tweaks the value (if it turns out it can be optimized) as well as the multithreading should be tagged accordingly so they can make it into the same release as this change, if that is the case then I don't think my comment should be considered a blocker.

l0rinc commented at 6:14 AM on March 17, 2026:

FWIW, I didn't say that tweaking the value has to be part of this PR

Ok, but I did, we can't just add a random value here, we have to back it by some measurements. I have measured a batch size of 1000 and it doesn't seem to reduce the size further. The implementation still contains some bugs, so we probably have to remeasure after they're fixed. I'm also curious about the peak memory usage here.

static inline constexpr int INDEX_BATCH_SIZE{500};

static inline constexpr int INDEX_BATCH_SIZE{1000};

<details><summary>INDEX_BATCH_SIZE = 1000</summary>

BEFORE="4c40a923f003420193aa574745f70788bcf35265"; AFTER="1497086cdb0e64df861100e0715ffd975d4c41cf"; \
DATA_DIR="/mnt/my_storage/BitcoinData"; export DATA_DIR; \
RESULTS_FILE="${DATA_DIR}/index_benchmark_results.txt"; export RESULTS_FILE; \
wait_index() { \
  tail -F ${DATA_DIR}/debug.log 2>/dev/null | grep -q -m1 'index is enabled\|is enabled at height'; \
  sleep 2; \
  killall bitcoind 2>/dev/null || true; \
  wait; \
}; export -f wait_index; \
log_size() { \
  local label="$1"; local index_dir="$2"; \
  local size=$(du -sh "$index_dir" 2>/dev/null | cut -f1); \
  local files=$(find "$index_dir" -type f -name '*.ldb' 2>/dev/null | wc -l); \
  echo "$label: size=$size, ldb_files=$files" | tee -a "$RESULTS_FILE"; \
}; export -f log_size; \
git reset --hard >/dev/null 2>&1 && git clean -fxd >/dev/null 2>&1 && git fetch origin $BEFORE $AFTER >/dev/null 2>&1; \
for c in $BEFORE:build-before $AFTER:build-after; do \
  git checkout ${c%:*} >/dev/null 2>&1 && \
  sed -i 's/INDEX_BATCH_SIZE = 500/INDEX_BATCH_SIZE = 1000/' src/index/base.h && \
  cmake -B ${c#*:} -G Ninja -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && ninja -C ${c#*:} bitcoind >/dev/null 2>&1; \
done; \
echo "indexes | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $DATA_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $DATA_DIR | tail -1) 2>/dev/null | grep -q 0 && echo SSD || echo HDD)" | tee -a "$RESULTS_FILE" && \
for INDEX in txindex blockfilterindex coinstatsindex txospenderindex; do \
  echo -e "\n--- $INDEX ---" | tee -a "$RESULTS_FILE"; \
  if [ "$INDEX" = "blockfilterindex" ]; then \
    INDEX_DIR="${DATA_DIR}/indexes/blockfilter/basic"; \
  else \
    INDEX_DIR="${DATA_DIR}/indexes/${INDEX}"; \
  fi; \
  export INDEX_DIR; \
  for BUILD in before after; do \
    if [ "$BUILD" = "before" ]; then COMMIT="${BEFORE:0:7}"; else COMMIT="${AFTER:0:7}"; fi; \
    hyperfine --runs 1 --shell bash --sort command \
      --prepare "rm -rf ${DATA_DIR}/indexes/* ${DATA_DIR}/debug.log" \
      --cleanup "log_size '${INDEX} ${BUILD}' '${INDEX_DIR}'" \
      -n "${BUILD} (${COMMIT})" \
      "./build-${BUILD}/bin/bitcoind -datadir=${DATA_DIR} -${INDEX}=1 -connect=0 -printtoconsole=0 & wait_index" \
      2>&1 | tee -a "$RESULTS_FILE"; \
  done; \
done;
indexes | i7-hdd | x86_64 | Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | 8 cores | 62Gi RAM | ext4 | HDD

--- txindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        23766.792 s               [User: 13249.392 s, System: 1700.229 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        21667.695 s               [User: 12747.999 s, System: 1444.935 s]
 

--- blockfilterindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        14074.504 s               [User: 8335.840 s, System: 419.536 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        13987.488 s               [User: 8275.042 s, System: 416.125 s]
 

--- coinstatsindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        40104.717 s               [User: 34332.346 s, System: 408.091 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        39729.018 s               [User: 34321.404 s, System: 400.228 s]
 

--- txospenderindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        36491.202 s               [User: 23663.602 s, System: 2216.541 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        26654.635 s               [User: 22664.639 s, System: 1659.436 s]

</details>

I will measure INDEX_BATCH_SIZE = 100 next,

l0rinc commented at 10:52 PM on March 19, 2026:

Finished INDEX_BATCH_SIZE = 100:

<details><summary>Details</summary>

BEFORE="4c40a923f003420193aa574745f70788bcf35265"; AFTER="1497086cdb0e64df861100e0715ffd975d4c41cf"; DATA_DIR="/mnt/my_storage/BitcoinData"; export DATA_DIR; RESULTS_FILE="${DATA_DIR}/index_benchmark_results.txt"; export RESULTS_FILE; wait_index() {   tail -F ${DATA_DIR}/debug.log 2>/dev/null 
| grep -q -m1 'index is enabled\|is enabled at height';   sleep 2;   killall bitcoind 2>/dev/null || true;   wait; }; export -f wait_index; log_size() {   local label="$1"; local index_dir="$2";   local size=$(du -sh "$index_dir" 2>/dev/null | cut -f1);   local files=$(find "$index_dir" -type f -name '*.ldb' 2>/dev/null | wc -l);   echo "$label: size=$size, ldb_files=$files" | tee -a "$RESULTS_FILE"; }; export -f log_size; git reset --hard >/dev/null 2>&1 && git clean -fxd >/dev/null 2>&1 && git fetch origin $BEFORE $AFTER >/dev/null 2>&1; for c in $BEFORE:build-before $AFTER:build-after; do   git checkout ${c%:*} >/dev/null 2>&1 &&   sed -i 's/INDEX_BATCH_SIZE = 500/INDEX_BATCH_SIZE = 100/' src/index/base.h &&   cmake -B ${c#*:} -G Ninja -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && ninja -C ${c#*:} bitcoind >/dev/null 2>&1; done; echo "indexes | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $DATA_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $DATA_DIR | tail -1) 2>/dev/null | grep -q 0 && echo SSD || echo HDD)" | tee -a "$RESULTS_FILE" && for INDEX in txindex blockfilterindex coinstatsindex txospenderindex; do   echo -e "\n--- $INDEX ---" | tee -a "$RESULTS_FILE";   if [ "$INDEX" = "blockfilterindex" ]; then     INDEX_DIR="${DATA_DIR}/indexes/blockfilter/basic";   else     INDEX_DIR="${DATA_DIR}/indexes/${INDEX}";   fi;   export INDEX_DIR;   for BUILD in before after; do     if [ "$BUILD" = "before" ]; then COMMIT="${BEFORE:0:7}"; else COMMIT="${AFTER:0:7}"; fi;     hyperfine --runs 1 --shell bash --sort command       --prepare "rm -rf ${DATA_DIR}/indexes/* ${DATA_DIR}/debug.log"       --cleanup "log_size '${INDEX} ${BUILD}' '${INDEX_DIR}'"       -n "${BUILD} (${COMMIT})"       "./build-${BUILD}/bin/bitcoind -datadir=${DATA_DIR} -${INDEX}=1 -connect=0 -printtoconsole=0 & wait_index"       2>&1 | tee -a "$RESULTS_FILE";   done; done;
indexes | i7-hdd | x86_64 | Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | 8 cores | 62Gi RAM | ext4 | HDD

--- txindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        23968.560 s               [User: 13214.417 s, System: 1732.668 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        22914.283 s               [User: 13153.074 s, System: 1706.380 s]
 

--- blockfilterindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        14056.593 s               [User: 8332.068 s, System: 414.447 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        14055.667 s               [User: 8282.042 s, System: 413.627 s]
 

--- coinstatsindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        40062.595 s               [User: 34294.097 s, System: 401.118 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        39953.151 s               [User: 34338.162 s, System: 394.930 s]
 

--- txospenderindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        36106.469 s               [User: 23462.741 s, System: 2216.890 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        32262.354 s               [User: 22250.924 s, System: 1963.755 s]

</details>

l0rinc commented at 1:30 PM on March 22, 2026:

Finished INDEX_BATCH_SIZE = 300:

<details><summary>Details</summary>

--- txindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        23916.303 s               [User: 13219.568 s, System: 1732.442 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        22366.743 s               [User: 12985.083 s, System: 1581.649 s]
 

--- blockfilterindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        14148.605 s               [User: 8334.721 s, System: 415.524 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        14078.675 s               [User: 8292.390 s, System: 414.992 s]
 

--- coinstatsindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        40208.707 s               [User: 34359.943 s, System: 401.578 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        40035.152 s               [User: 34408.491 s, System: 395.890 s]
 

--- txospenderindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        36154.602 s               [User: 23454.551 s, System: 2234.643 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        27110.711 s               [User: 21378.361 s, System: 1703.907 s]

</details>

I'd be interested in the extra memory consumption of these to help with choosing a good batch value.

l0rinc commented at 12:27 PM on March 25, 2026:

Finished batch=700:

<details><summary>Details</summary>

--- txindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        24201.390 s               [User: 13227.946 s, System: 1742.760 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        21362.662 s               [User: 12664.631 s, System: 1473.880 s]
 
--- blockfilterindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        14179.578 s               [User: 8343.194 s, System: 421.597 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        14020.574 s               [User: 8288.567 s, System: 420.177 s]
 
--- coinstatsindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        40028.484 s               [User: 34260.584 s, System: 409.004 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        39803.985 s               [User: 34369.698 s, System: 403.067 s]
 
--- txospenderindex ---
Benchmark 1: before (4c40a92)
  Time (abs ≡):        36574.351 s               [User: 23483.284 s, System: 2245.537 s]
 
Benchmark 1: after (1497086)
  Time (abs ≡):        26134.814 s               [User: 21970.124 s, System: 1648.552 s]

</details>

Summarizing the measurements so far looks like this:

Index	batch=100	batch=300	batch=500	batch=700	batch=1000
txindex	-4.4%	-6.5%	-11.0%	-11.7%	-8.8%
blockfilterindex	-0.0%	-0.5%	-1.1%	-1.1%	-0.6%
coinstatsindex	-0.3%	-0.4%	-0.5%	-0.6%	-0.9%
txospenderindex	-10.6%	-25.0%	-27.4%	-28.5%	-27.0%

polespinasa commented at 2:08 PM on May 27, 2026:

in 1628b1005a74becbd82e128023bcdd70b86cc834 index: add block range computation why 500? is it an arbitrary number?

arejula27 commented at 3:20 PM on May 27, 2026:

I think that was decided as an arbitrary small number just for the first approach, then each index will have different batch sizes based on the benchmarks

l0rinc commented at 12:11 PM on May 30, 2026:

#34489 (review)

fjahr commented at 9:47 AM on February 3, 2026: contributor

This relates to what you wrote here right? It would be helpful that you check if it solves the LevelDB file issue in coinstatsindex and I am also curious about your benchmark results because @l0rinc did not find that this was leading to a speed up.

l0rinc commented at 10:38 AM on February 3, 2026: contributor

Concept ACK, I prefer this over #33306 - and will investigate if this solves that problem once my benchmarking servers free up.

because @l0rinc did not find that this was leading to a speed up.

I experimented with something similar in https://github.com/l0rinc/bitcoin/pull/37/changes, will compare it against this change. I wrote:

indexes: batch index writes: 34188s, 211M, 8 files It solved the fragmentation, but didn't speed up anything. I still think this is a better direction than adding manual compactions.

Most likely since writing isn't the bottleneck but MuHash calculations were.

furszy commented at 3:02 PM on February 3, 2026: member

This relates to what you wrote here right? It would be helpful that you check if it solves the LevelDB file issue in coinstatsindex and I am also curious about your benchmark results because @l0rinc did not find that this was leading to a speed up. @fjahr the comment was merely an excuse to decouple the DB writes batching code out of #26966 rather than me having a formed opinion in favor or against #33306. That's why I didn't mention it in the PR description. Maybe the changes complement each other.

To be crystal clear, just updated the PR description with further details on why this change worth alone, independently on #33306.

To summarize it here, the goal of the changes are:

Batch DB writes instead of flushing per block, which will improve sync time on HDD due to the reduced number of IO operations.
Reduce cs_main lock contention, which orthogonally improves the overall node responsiveness (and primarily IBD) while the indexes threads are syncing.
Lays the groundwork for the real speedups, since part of #26966 parallelization pre-work is introduced here as well.

furszy force-pushed on Feb 3, 2026

furszy commented at 8:26 PM on February 3, 2026: member

Updated per feedback from @hebasto (thanks!). Changed <cinttypes> include for <cstdint> to make IWYU happy.

hebasto commented at 8:38 PM on February 3, 2026: member

Updated per feedback from @hebasto (thanks!). Changed <cinttypes> include for <cstdint> to make IWYU happy.

~I guess, this is a bug in IWYU caused by https://github.com/include-what-you-use/include-what-you-use/commit/44480a2de0d8ef039b13391997f274ea33750be9.~

UPDATE: Fixed in #34498.

maflcko commented at 1:57 PM on February 4, 2026: member

Can you run all unit and functional tests on all commits? Or do they time out?

furszy force-pushed on Feb 4, 2026

furszy commented at 3:22 PM on February 4, 2026: member

Can you run all unit and functional tests on all commits? Or do they time out?

Bad squash, my bad. Thanks for the heads up. Fixed. Also rebased the branch to pick up the CI changes.

DrahtBot removed the label CI failed on Feb 4, 2026

Jhackman2019 commented at 2:54 AM on February 8, 2026: none

Builds clean on ARM64 (Pi 5, Debian Bookworm). All index-related functional tests and blockfilter_index unit tests pass. :)

in src/index/txindex.cpp:48 in 8826900ddc

bvbfan commented at 8:31 AM on February 8, 2026:

This function doesn't exist anymore.

furszy commented at 4:55 PM on February 8, 2026:

Done, thanks.

optout21 commented at 10:22 PM on February 9, 2026:

A minor comment on the commit description: I feel an ambiguity or typo around "during".

Pass CDBBatch to subclasses during so writes can be accumulated and committed together instead of flushing per block.

furszy commented at 1:12 AM on February 10, 2026:

Fixed.

l0rinc commented at 3:34 PM on March 9, 2026:

I would focus in the commit message (and PR description) on batching reducing write amplification - i.e. fewer LevelDB compactions because the payload is more predictable

l0rinc commented at 4:47 PM on March 9, 2026:

unrelated nit:

    // No need to handle errors in Commit. If it fails, the error will already be logged. The

furszy force-pushed on Feb 8, 2026

in src/index/base.cpp:246 in eed9ea5fc5 outdated

 243 | +                if (block_batch.start_index) {
 244 | +                    const int start_height = block_batch.start_index->nHeight;
 245 | +                    const int tip_height = m_chainstate->m_chain.Height();
 246 | +                    // Compute the last height in the batch without exceeding the chain tip
 247 | +                    const int batch_end_height = std::min(start_height + m_num_blocks_batch - 1, tip_height);
 248 | +                    block_batch.end_index = m_chainstate->m_chain[batch_end_height];

optout21 commented at 2:06 PM on February 9, 2026:

The start and end are set here, according to the desired size. However, block_batch.start_index can change below. Wouldn't it be better/cleaner to set it just before the ProcessBlocks call? (I mean to keep the logic up to the deleted pindex = pindex_next; line, and put the bactch setup there.)

furszy commented at 4:39 PM on February 9, 2026:

The start and end are set here, according to the desired size. However, block_batch.start_index can change below.

It seems the code I had for this case got dropped in the previous bad squash. Good catch. See the missing line https://github.com/furszy/bitcoin-core/commit/54874bf43b2df6846061fbb844b7026f69e7bc74#diff-5251d93616c116c364d2d502ad2a59e901a3512194462fabacc9756f96fd8dddR274. Will re-add it. Thanks.

Wouldn't it be better/cleaner to set it just before the ProcessBlocks call? (I mean to keep the logic up to the deleted pindex = pindex_next; line, and put the bactch setup there.)

That would not only require taking cs_main an extra time, but would also introduce race conditions. The extra locking comes from the chain lookup needed to determine the window’s end index. And, more importantly, if the tip changes between setting the start index and computing the end, we could end up on a fork at a lower height, resulting in an invalid window (start_index > end_index).

optout21 commented at 6:02 PM on February 9, 2026:

That would not only require taking cs_main an extra time, but would also introduce race conditions. The extra locking comes from the chain lookup needed to determine the window’s end index.

What I had in mind is setting both the start and end a bit later, after the pindex is finalized. I will try it out and provide a concrete code snippet a bit later.

furszy commented at 7:00 PM on February 9, 2026:

What I had in mind is setting both the start and end a bit later, after the pindex is finalized. I will try it out and provide a concrete code snippet a bit later.

I see the same two problems there, with an extra requirement of a reorg check and another rewind call. Which makes it even worse.

optout21 commented at 9:56 PM on February 9, 2026:

I acknowledge that it's important to take the cs_main lock only once in the typical scenario. Also, the missing update of end was identified and fixed! Thx

optout21 commented at 2:29 PM on February 9, 2026: contributor

I am confused as to how this batching relates to the batching logic of the orginal code. Looking at the pre-change BaseIndex::Sync(), it seems that Commit/WriteBatch occured only every 30 seconds (or at the chain tip), not after each block. This change does a WriteBatch after every 500 blocks, which in fact may be even more frequent. The original commit-after-every-30-secs is still there, so I don't see how this change makes less writes. Maybe I misunderstand the change.

            if (!ProcessBlock(pindex)) return; // error logged internally
            ...
            if (current_time - last_locator_write_time >= SYNC_LOCATOR_WRITE_INTERVAL) {
                SetBestBlockIndex(pindex);
                last_locator_write_time = current_time;
                // No need to handle errors in Commit. See rationale above.
                Commit();
            }

furszy commented at 6:48 PM on February 9, 2026: member

I am confused as to how this batching relates to the batching logic of the orginal code. Looking at the pre-change BaseIndex::Sync(), it seems that Commit/WriteBatch occured only every 30 seconds (or at the chain tip), not after each block. This change does a WriteBatch after every 500 blocks, which in fact may be even more frequent. The original commit-after-every-30-secs is still there, so I don't see how this change makes less writes. Maybe I misunderstand the change.
            if (!ProcessBlock(pindex)) return; // error logged internally
            ...
            if (current_time - last_locator_write_time >= SYNC_LOCATOR_WRITE_INTERVAL) {
                SetBestBlockIndex(pindex);
                last_locator_write_time = current_time;
                // No need to handle errors in Commit. See rationale above.
                Commit();
            }

Commit() only persists the locator (i.e. "how far the index has synced") so we can resume from that point after a restart. It does not flush or batch the per-block digested data.

The block data is written inside CustomAppend() which is called from ProcessBlock() per block, and it writes to the db immediately.

We're not replacing locator update time nor merging it with something else here, we're only batching something that previously wasn't batched at all.

Also, using a block window instead of reusing the existing time-based window is intentional. Time-based flushing makes memory usage hard to predict, and difficult to test (see #32878), because the amount of buffered data depends on runtime speed, which does not necessarily correlate with available memory (batched data stays in memory until it's flushed..). A fixed block window provides predictable memory usage and deterministic behavior, which is overall better and also integrates nicely with multiple workers sync (#26966). I would argue in favor of ditching the time window in the future, but.. that's not something related to this PR.

furszy force-pushed on Feb 9, 2026

furszy commented at 8:00 PM on February 9, 2026: member

Updated per feedback, plus added test coverage for the introduced changes. Thanks @optout21.

optout21 commented at 9:52 PM on February 9, 2026: contributor

Thanks for the explanations, and apologies for asking trivial/irrelevant questions.

Commit() only persists the locator (i.e. "how far the index has synced") so we can resume from that point after a restart. The block data is written inside CustomAppend() which is called from ProcessBlock() per block, and it writes to the db immediately. These are two important points, I can see now.

I was puzzled by the fact that there is a loop in Sync(), and another loop is introduced in ProcessBlocks(). But the two complement each other.

Another approach would be to keep the outer loop in Sync() block-by-block, store a CDBBatch variable, process one block, count the blocks in the batch, and flush the batch if batch size is reached (outside of ProcessBlock). This would be slightly less change (no secondary loop), but -- if I'm correct -- the cs_main lock would be still accessed at every block.

... I would argue in favor of ditching the time window in the future, but.. that's not something related to this PR. Maybe Commit() could be called with the same granularity as batch writes, but that's out of scope here.

furszy commented at 10:11 PM on February 9, 2026: member

Another approach would be to keep the outer loop in Sync() block-by-block, store a CDBBatch variable, process one block, count the blocks in the batch, and flush the batch if batch size is reached (outside of ProcessBlock). This would be slightly less change (no secondary loop), but -- if I'm correct -- the cs_main lock would be still accessed at every block.

Yes. That would still lock cs_main on every block, which harms the overall node responsiveness, and it is also not very helpful for the parallelization goal, which requires certain isolation between batches so worker threads can process them concurrently.

in src/index/base.cpp:276 in 360c43d7c5

 272 |                  FatalErrorf("Failed to rewind %s to a previous chain tip", GetName());
 273 |                  return;
 274 |              }
 275 |  
 276 | -            // For now, process a single block at time
 277 |              if (!ProcessBlocks(block_batch.start_index, block_batch.end_index)) return; // error logged internally

optout21 commented at 10:17 PM on February 9, 2026:

Code review shows that by this point start_index and end_index are not null, but there is several if's to ensure that, it's not trivial. I propose to document/ensure this invariant with an Assume before this line, especially having in mind that ProcessBlocks crashes if start or end is nullptr.

furszy commented at 1:15 AM on February 10, 2026:

Probably better to just pass them by ref instead. Will do if have to re-touch. Otherwise it can be tackled in a quick follow-up too.

furszy commented at 5:18 PM on February 18, 2026:

Done as suggested.

in src/index/base.cpp:212 in 0d980feb95

 207 | +    }
 208 | +
 209 | +    // And process blocks in forward order: from start to end
 210 | +    for (auto it = ordered_blocks.rbegin(); it != ordered_blocks.rend(); ++it) {
 211 | +        if (m_interrupt) return false;
 212 | +        if (!ProcessBlock(*it)) return false;

optout21 commented at 10:18 PM on February 9, 2026:

nit: the "// error logged internally" comment could be passed on here as well.

furszy commented at 1:12 AM on February 10, 2026:

Done as suggested.

optout21 commented at 10:24 PM on February 9, 2026: contributor

crACK 3687ead892d207ddbdfec3f2ef5c086130414d9a

LGTM after clarifications and changes! Code review, no assessment on performance.

Prev: crACK d0f793ac61a0659f4f1e89f7943813244499bf7d

DrahtBot requested review from l0rinc on Feb 9, 2026

furszy force-pushed on Feb 10, 2026

furszy commented at 1:19 AM on February 10, 2026: member

nits addressed, thanks @optout21. Also, just for completeness, added test coverage for the "reorg during initial sync" scenario that was not covered before.

furszy force-pushed on Feb 10, 2026

DrahtBot added the label CI failed on Feb 10, 2026

DrahtBot removed the label CI failed on Feb 10, 2026

l0rinc commented at 11:15 AM on February 10, 2026: contributor

I just finished measuring the size and file count for this PR (it was running previously for a week with the wrong tail -F grep I reused from #26966#pullrequestreview-2965376357)

<details> <summary>CoinStatsIndex DB: Before vs After Batch Writes (47c9297 → dd76491)</summary>

COMMITS="47c9297172b0866d2b5d0de6b4fd35b798db3929 dd76491174b3a3b60c174822f75abb03fda3b1d4"; \
DATA_DIR="/mnt/my_storage/BitcoinData"; export DATA_DIR; \
wait_index() { tail -F ${DATA_DIR}/debug.log 2>/dev/null | grep -q -m1 'coinstatsindex is enabled'; killall bitcoind 2>/dev/null || true; sleep 2; }; export -f wait_index; \
for COMMIT in $COMMITS; do \
  killall bitcoind 2>/dev/null || true; rm -rf "$DATA_DIR/indexes" "$DATA_DIR/debug.log"; \
  git fetch -q origin "$COMMIT" >/dev/null 2>&1 || true && git checkout -q "$COMMIT" && git log -1 --pretty='%h %s' "$COMMIT"; \
  cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && ninja -C build >/dev/null 2>&1; \
  ./build/bin/bitcoind -datadir="$DATA_DIR" -connect=0 -coinstatsindex=1 -printtoconsole=0 & wait_index; \
  du -h "$DATA_DIR/indexes/coinstatsindex/db"; \
  find "$DATA_DIR/indexes/coinstatsindex/db" -type f -name '*.ldb' | wc -l; \
done

</details>

[1]+  Done                    ./build/bin/bitcoind -datadir="$DATA_DIR" -connect=0 -coinstatsindex=1 -printtoconsole=0
47c9297172 Merge bitcoin/bitcoin#32420: mining, ipc: omit dummy extraNonce from coinbase
[1] 3876795
[1]+  Done                    ./build/bin/bitcoind -datadir="$DATA_DIR" -connect=0 -coinstatsindex=1 -printtoconsole=0
216M    /mnt/my_storage/BitcoinData/indexes/coinstatsindex/db
144

and

dd76491174 index: enable DB writes batching during sync
[1] 3885937
[1]+  Done                    ./build/bin/bitcoind -datadir="$DATA_DIR" -connect=0 -coinstatsindex=1 -printtoconsole=0
216M    /mnt/my_storage/BitcoinData/indexes/coinstatsindex/db
10

I haven't measured the speeds yet but will do that next.

furszy commented at 11:31 PM on February 11, 2026: member

Interesting, cool thanks. Another good point in favor of this change then. Will add it to the PR description.

l0rinc commented at 8:32 AM on February 12, 2026: contributor

I have finished benchmarking them:

BEFORE="64294c89094d5ab10d87236729cc267fde0a24ca"; AFTER="3687ead892d207ddbdfec3f2ef5c086130414d9a"; \
DATA_DIR="/mnt/my_storage/BitcoinData"; export DATA_DIR; \
RESULTS_FILE="${DATA_DIR}/index_benchmark_results.txt"; export RESULTS_FILE; \
wait_index() { \
  tail -F ${DATA_DIR}/debug.log 2>/dev/null | grep -q -m1 'index is enabled\|is enabled at height'; \
  sleep 2; \
  killall bitcoind 2>/dev/null || true; \
  wait; \
}; export -f wait_index; \
log_size() { \
  local label="$1"; local index_dir="$2"; \
  local size=$(du -sh "$index_dir" 2>/dev/null | cut -f1); \
  local files=$(find "$index_dir" -type f -name '*.ldb' 2>/dev/null | wc -l); \
  echo "$label: size=$size, ldb_files=$files" | tee -a "$RESULTS_FILE"; \
}; export -f log_size; \
git reset --hard >/dev/null 2>&1 && git clean -fxd >/dev/null 2>&1 && git fetch origin $BEFORE $AFTER >/dev/null 2>&1; \
for c in $BEFORE:build-before $AFTER:build-after; do \
  git checkout ${c%:*} >/dev/null 2>&1 && cmake -B ${c#*:} -G Ninja -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && ninja -C ${c#*:} bitcoind >/dev/null 2>&1; \
done; \
echo "indexes | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $DATA_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $DATA_DIR | tail -1) 2>/dev/null | grep -q 0 && echo SSD || echo HDD)" | tee -a "$RESULTS_FILE" && \
for INDEX in txindex blockfilterindex coinstatsindex; do \
  echo -e "\n--- $INDEX ---" | tee -a "$RESULTS_FILE"; \
  if [ "$INDEX" = "blockfilterindex" ]; then \
    INDEX_DIR="${DATA_DIR}/indexes/blockfilter/basic"; \
  else \
    INDEX_DIR="${DATA_DIR}/indexes/${INDEX}"; \
  fi; \
  export INDEX_DIR; \
  for BUILD in before after; do \
    if [ "$BUILD" = "before" ]; then COMMIT="${BEFORE:0:7}"; else COMMIT="${AFTER:0:7}"; fi; \
    hyperfine --runs 1 --shell bash --sort command \
      --prepare "rm -rf ${DATA_DIR}/indexes/* ${DATA_DIR}/debug.log" \
      --cleanup "log_size '${INDEX} ${BUILD}' '${INDEX_DIR}'" \
      -n "${BUILD} (${COMMIT})" \
      "./build-${BUILD}/bin/bitcoind -datadir=${DATA_DIR} -${INDEX}=1 -connect=0 -printtoconsole=0 & wait_index" \
      2>&1 | tee -a "$RESULTS_FILE"; \
  done; \
done;

</details>

--- txindex ---

Benchmark 1: before (64294c8)
  Time (abs ≡):        7665.128 s               [User: 11912.927 s, System: 1194.716 s]
txindex before: size=63G, ldb_files=2046

Benchmark 1: after (3687ead)
  Time (abs ≡):        7579.151 s               [User: 11918.671 s, System: 1120.304 s] 
txindex after: size=63G, ldb_files=2049

--- blockfilterindex ---

Benchmark 1: before (64294c8)
  Time (abs ≡):        8447.157 s               [User: 8034.353 s, System: 274.633 s] 
blockfilterindex before: size=12G, ldb_files=3

Benchmark 1: after (3687ead)
  Time (abs ≡):        8391.136 s               [User: 7992.547 s, System: 270.205 s] 
blockfilterindex after: size=12G, ldb_files=3

--- coinstatsindex ---

Benchmark 1: before (64294c8)
  Time (abs ≡):        35219.938 s               [User: 34115.981 s, System: 276.583 s]
coinstatsindex before: size=216M, ldb_files=145

Benchmark 1: after (3687ead)
  Time (abs ≡):        35229.893 s               [User: 34125.606 s, System: 271.160 s]
coinstatsindex after: size=216M, ldb_files=10

Speed is basically the same and the number of files in coinstatsindex were fixed. The txindex file count change seems like noise. I will review the code soon, thanks for extracting it from the big PR.

furszy commented at 3:40 PM on February 12, 2026: member

Speed is basically the same

You have a really nice machine there. The real batching gains should show up on HDDs rather than SSDs. Still, great to know this works properly on high end hardware. Thanks for the benchmark.

l0rinc commented at 4:08 PM on February 12, 2026: contributor

The real batching gains should show up on HDDs rather than SSDs

yes, I would expect that to be the case - that will also be part of my review before I can ack it.

furszy commented at 4:58 PM on February 12, 2026: member

yes, I would expect that to be the case - that will also be part of my review before I can ack it.

you know, even if it runs at the same speed on an HDD (which I doubt), the other improvements this PR brings still make it worth it.

in src/test/blockfilter_index_tests.cpp:389 in 3687ead892 outdated

 385 | +    WaitForHeight(&index, blocking_height + 2, 5s);
 386 |      index.Stop();
 387 |  }
 388 |  
 389 | +// Ensure the initial sync batch window behaves as expected.
 390 | +// Tests sync from the genesis block and from a higher block to mimic a restart.

arejula27 commented at 5:37 PM on February 12, 2026:

I wouldn’t say this test truly mimics a restart (i.e., shutting down the node and immediately restarting it, without the blockchain growing). It is closer to a continuation after a clean shutdown of the whole node for a long period of time. This test is still interesting, but it does not cover a new bug window introduced by the batch approach (shutting down the node during a batch).

The test currently does the following:

Pre-mine 100 blocks
Mine 173 additional blocks
Create the index
Synchronize the index to the tip
Mine another 173 blocks
Recreate the index
Synchronize again to the new tip

In a real-world scenario, all blocks would already exist on disk, and a restart would likely happen during a batch (for example at height 196, where the previous batch ended at 180).

I would like to see a test (either a modification of this one or a new one) that verifies the index is not corrupted and can correctly resume if the node is restarted during a sync, before reaching the tip.

A possible approach for such a test could be:

Pre-populate the blockchain (e.g., 300 blocks already on disk)
Start index sync with batch_size = 30
Interrupt the sync after several batches but before completion
Simulate a shutdown
Restart the index and verify (with no new blocks added):
- It resumes from the last checkpoint (not from genesis or the last processed block)
- No corruption occurs in re-processed blocks
- The sync successfully completes

furszy commented at 3:37 PM on February 13, 2026:

Sorry, I don't see the point here. What you described is essentially a restart. The index gets recreated, initial sync resumes from the last stored index. That's all what we need. Feel free to experiment on a follow-up, or I could also add it here if you have some code too.

arejula27 commented at 7:58 PM on February 13, 2026:

Sorry if I didn’t explain this clearly before, let me try again.

With this PR, some blocks are being indexed (on disk), but the index pointer is not updated because the Commit() function has not been called. The commit is executed every 30 seconds (which won’t happen during the test) or when the tip is processed (batch tip).

What I’m suggesting is to test the scenario where a batch is processed but the tip is not processed. In that case, the index should sync starting from the last committed block.

After debugging your test case, the second round started from the last element of the last batch, as it was the end of the chain at round 0. I would suggest checking the following cases (with a batch of 30 blocks and at least 101 blocks):

If the sync process is shut down when block 100 is processed, after a restart, the first block to sync should be 91 (not 0, 91 or 120), verifying that if a batch is interrupted, after a restart, the index will reindex from the last committed batch.
If the sync process is shut down when block 90 is processed, after a restart, the first block to sync should be 91, verify and an edge case of the previous scenario.

Update

I noticed that in the code, interruptions are handled by committing:

if (m_interrupt) {
    LogInfo("%s: m_interrupt set; exiting ThreadSync", GetName());

    SetBestBlockIndex(pindex);
    Commit();
    return;
}

Because of this, the scenarios described above may not be useful, since the last processed block might always be committed before shutdown.

I’m not sure whether this interruption only applies to graceful shutdowns or if it also covers crash scenarios.

furszy commented at 8:21 PM on February 13, 2026:

With this PR, some blocks are being indexed (on disk), but the index pointer is not updated because the Commit() function has not been called. The commit is executed every 30 seconds (which won’t happen during the test) or when the tip is processed (batch tip).

Just to be accurate: batch writes and the index locator update are, on purpose, not related procedures. Commit() happens every 30 secs, during interruption or when the tip is reached prior to finishing initial sync.

What I’m suggesting is to test the scenario where a batch is processed but the tip is not processed. In that case, the index should sync starting from the last committed block.

Sure, we could return an error inside the test child index CustomAppend() that finishes the process early. It shouldn't be hard to code. Feel encouraged to experiment with it. I can cherry-pick your changes if you have a test commit for it.

arejula27 commented at 10:08 PM on February 13, 2026:

Just to be accurate: batch writes and the index locator update are, on purpose, not related procedures. Commit() happens every 30 secs, during interruption or when the tip is reached prior to finishing initial sync.

Yes, ofc :D

It shouldn't be hard to code. Feel encouraged to experiment with it. 🫡

arejula27 commented at 1:47 AM on February 14, 2026:

Having played around with the code, I found that the case i mentioned didnt worked correctly

Round 0:
- Max height: 273
- After sync, tip is at 273
Round 1:
- Max height: 446 (273 + 173)
- Sync fails at height 310
Round 2:
- The first batch starts at 273, however, it should be 304 (the previous round synced one batch)
This happens because CustomAppend() returns false and the process is stpped before the following branch is reached
```
if (m_interrupt) {
    LogInfo("%s: m_interrupt set; exiting ThreadSync", GetName());
     SetBestBlockIndex(pindex);
    Commit();
    return;
}
```
Below is the code to reproduce the issue:

// Extend BlockFilterIndex to create a custom test function CustomAppend
class BlockFilterIndexTest : public BlockFilterIndex
{
public:
    BlockFilterIndexTest(std::unique_ptr<interfaces::Chain> chain, BlockFilterType filter_type, size_t n_cache_size, bool f_memory) :
        BlockFilterIndex(std::move(chain), filter_type, n_cache_size, f_memory) {}

    // Override CustomAppend to simulate an error during append at a specific block height
    bool CustomAppend(CDBBatch& batch, const interfaces::BlockInfo& block) override
    {
        if (block.height == 290) {
            return false;
        }
        return BlockFilterIndex::CustomAppend(batch, block);
    }
};

// Ensure the initial sync batch window behaves as expected.
// Tests sync from the genesis block and from a higher block to mimic a restart.
// Note: Test runs in /tmp by default, which is usually cached, so timings are not
// a meaningful benchmark (use -testdatadir to run it elsewhere).
BOOST_FIXTURE_TEST_CASE(initial_sync_batch_window, BuildChainTestingSetup)
{
    constexpr int MINE_BLOCKS = 173;
    constexpr int BATCH_SIZE = 30;

    int expected_tip = 100; // pre-mined blocks
    for (int round = 0; round < 3; round++) { // two additional rounds to test sync from genesis and from a higher block
        mineBlocks(MINE_BLOCKS); // Generate blocks
        const int tip_height = WITH_LOCK(::cs_main, return m_node.chainman->ActiveChain().Height());
        BOOST_REQUIRE(tip_height == MINE_BLOCKS + expected_tip);
        expected_tip = tip_height;

        BlockFilterIndexTest filter_index(interfaces::MakeChain(m_node), BlockFilterType::BASIC, 1 << 20, /*f_memory=*/false);
        filter_index.SetProcessingBatchSize(BATCH_SIZE);
        BOOST_REQUIRE(filter_index.Init());
        BOOST_CHECK(!filter_index.BlockUntilSyncedToCurrentChain());

        // Ensure we can sync up to the tip
        const auto& summary_pre{filter_index.GetSummary()};
        // breakpoint here for following gdb ouptut
        filter_index.Sync();
        if (round == 1) { continue; }
        const auto& summary{filter_index.GetSummary()};
        BOOST_CHECK(summary.synced);

GDB output before Sync() in round 2:

gdb build/bin/test_bitcoin
b blockfilter_index_tests.cpp:427
b src/index/base.cpp:255
run
(gdb) p round
$2 = 2
(gdb) p expected_tip
$3 = 619
(gdb) p summary_pre.best_block_height
$4 = 273

furszy commented at 2:03 AM on February 14, 2026:

Haven't gone through this in detail, but the locator won't update in your second round unless you advance the mock time. Also, keep in mind that if any block within the batch fails, the entire process aborts. The process doesn't save intermediate batch states.

arejula27 commented at 2:18 AM on February 14, 2026:

but the locator won't update in your second round unless you advance the mock time.

yes i didnt want to simulate 30 sec, i wanted the error before the timmer commit (to fail into the error handling m_interrupt)

. Also, keep in mind that if any block within the batch fails, the entire process aborts

I saw that when CustomAppend returns false, it executes FatalErrorf, which simply stops the node (not gracefully) instead of raising m_interrupt, my bad

What do you think about modifying the error handling in the sync process to prevent losing stored data by raising the m_interrupt flag? In the worst case, we lose 29.59 seconds of execution, it might not be worth it and let everything as it currently is
I will try to modify the CustomAppend to raise the m_interrupt and create a test to show how it is correctly processed, i think it is more interesting this approach

furszy commented at 5:20 PM on February 14, 2026:

What do you think about modifying the error handling in the sync process to prevent losing stored data by raising the m_interrupt flag? In the worst case, we lose 29.59 seconds of execution, it might not be worth it and let everything as it currently is

As you said, since we already save progress every 30 seconds, saving it during interruptions makes very little sense. The amount of work lost is minimal. Yet, this has been the case since the indexes were introduced, and I suspect it hasn't received much attention, which is probably why no one removed it before.. It’s something we could consider.

arejula27 commented at 7:47 PM on February 14, 2026:

Nice, if you agree, you can mark this thread as resolved then

arejula27 commented at 5:39 PM on February 12, 2026: none

Concept ACK [3687ead892d207ddbdfec3f2ef5c086130414d9a] I really like the idea and I think it will be a massive upgrade with your parallelisation proposal #26966

in src/index/base.cpp:212 in 3687ead892 outdated

 207 | +        ordered_blocks.emplace_back(block);
 208 | +    }
 209 | +
 210 | +    // And process blocks in forward order: from start to end
 211 | +    for (auto it = ordered_blocks.rbegin(); it != ordered_blocks.rend(); ++it) {
 212 | +        if (m_interrupt) return false;

arejula27 commented at 10:50 AM on February 14, 2026:

Returning false here causes the interrupt to be handled incorrectly as it will be handled in the Sync function like this:

if (!ProcessBlocks(db_batch, block_batch.start_index, block_batch.end_index)) {
    return; // error logged internally
}

This early return prevents execution from reaching line 226, where the correct interrupt-handling logic resides. As a result, the expected recovery path is skipped.

Specifically, this makes the following block effectively unreachable in this scenario:

if (m_interrupt) {
    LogInfo("%s: m_interrupt set; exiting ThreadSync", GetName());

    SetBestBlockIndex(pindex);

    Commit();
    return;
}

I didn't find in the file a place where m_interrupt is handled in this way, might be interesting to try to make the code fall on the graceful path. A possible solution might be changing the return by continue, so it will go to the start of the loop and fall into the error handling logic

arejula27 commented at 10:58 AM on February 14, 2026:

I debbuged and with this code, the restart point is ok:

 if (!ProcessBlocks(db_batch, block_batch.start_index, block_batch.end_index)){
      if (!m_interrupt) {
           return;
}
                continue; 
}

furszy commented at 5:10 PM on February 14, 2026:

Nice catch. Pushed an update that fixes it and adds test coverage.

I'm considering alternative options for the structure, like removing the interruption update, but will let the ideas settle over the weekend. At least we have some nice test coverage for it now.

Some context on "removing the interruption update": Since we save progress every 30 seconds, saving it during interruptions makes little sense, as the work lost is minimal. The code is probably there merely because it was introduced on the early index days, where things were way more chaotic, and no one really paid much attention to it afterwards.

arejula27 commented at 8:04 PM on February 14, 2026:

Nice catch. Pushed an update that fixes it and adds test coverage.

Nice. I finally managed to explain the scenario I’ve been trying to describe from the beginning, sorry for not expressing it clearly before (I’m still new as a core reviewer), at least I found a potential bug.

I don’t want to take up more of your time on it, so feel free to resolve this thread. I think the test implemented for the scenario is enough.

Since we save progress every 30 seconds, saving it during interruptions makes little sense, as the work lost is minimal. The code is probably there merely because it was introduced in the early index days.

That’s a fair point. However, I’m not sure I see the benefit of removing it. I agree that this could be an argument for not introducing the code in the first place, since the potential work lost is minimal. But given that it’s already there, I’m not sure whether removing it (even if the performance impact is small) is an improvement. Do you have something specific in mind? Perhaps something related to the parallelisation of the indexing process?

furszy force-pushed on Feb 14, 2026

DrahtBot added the label CI failed on Feb 14, 2026

DrahtBot commented at 6:04 PM on February 14, 2026: contributor

🚧 At least one of the CI tasks failed. Task test max 6 ancestor commits: https://github.com/bitcoin/bitcoin/actions/runs/22020957853/job/63629927583 LLM reason (✨ experimental): Block filter index test initial_sync_reorg failed, causing the test suite to abort and CI failure.

<details><summary>Hints</summary>

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

</details>

arejula27 commented at 8:06 PM on February 14, 2026: none

re Concept ACK [c2953e446d53a8ac86ae3e4801208df5c9c6d2cc]

furszy force-pushed on Feb 16, 2026

DrahtBot removed the label CI failed on Feb 16, 2026

furszy force-pushed on Feb 18, 2026

furszy commented at 5:21 PM on February 18, 2026: member

Made a few improvements to the commits descriptions with additional rationale for the changes, and rebased on master to pick up the latest CI updates.

DrahtBot added the label CI failed on Feb 18, 2026

DrahtBot removed the label CI failed on Feb 18, 2026

in src/index/base.cpp:256 in 965d4299a7

 256 |  
 257 |              // For now, process a single block at time
 258 | -            if (!ProcessBlocks(/*start=*/*pindex_next, /*end=*/*pindex_next)) {
 259 | -                // If failed due to an interruption, we haven't processed the block.
 260 | +            block_batch.last = block_batch.first;
 261 | +            if (!ProcessBlocks(*block_batch.first, *block_batch.last)) {

sedited commented at 8:49 PM on February 18, 2026:

In commit 965d4299a76badcf2257f390f5c02e6007ec6244

Nit: Why introduce the BlockBatch at all when we don't pass it on here? I don't think we gain much by de-referencing here.

furszy commented at 9:29 PM on February 28, 2026:

Nit: Why introduce the BlockBatch at all when we don't pass it on here? I don't think we gain much by de-referencing here.

Mainly for the block range computation, it's less error-prone to use a range structure instead of two independent variables. In the concurrent future (#26966), many block ranges will be computed at once and submitted to the pool (which will avoid further cs_main locking). This requires storing them in a container, and I wasn't really a fan of using a less readable std::pair. But complain taken, have reworked it so NextBlockToSync returns the range, which gives this struct a more powerful use case.

Regarding ProcessBlocks usage, I thought about it a few times. In #26966, the structure takes on a more meaningful role, it becomes the work unit state (we call it Task there), and we expand it to store intermediate results there as well. Because of that, I didn't think that making ProcessBlocks to use it was the best, the function should only receive what it will use, not an struct that contains other stuff.

On a side comment, I'm going to rebase #26966 on top of this PR now it is mature enough, so we use the same terms on both PRs and what I described above is super clear there as well.

in src/index/base.h:105 in 965d4299a7

 100 | +        BlockBatch() = default;
 101 | +
 102 | +        // Disallow copy
 103 | +        BlockBatch(const BlockBatch&) = delete;
 104 | +        BlockBatch& operator=(const BlockBatch&) = delete;
 105 | +        BlockBatch(BlockBatch&&) noexcept = default;

sedited commented at 8:55 PM on February 18, 2026:

Just a question: Why do we need to define all of these?

furszy commented at 9:45 PM on February 28, 2026:

Just a question: Why do we need to define all of these?

Not needed here, will remove them.

But, just to describe why these methods exist in the parent PR (and refresh my memory while doing it):

In #26966, this struct is extended to hold the block range processed data, it essentially becomes the work unit that flows through the entire pipeline:

It is computed by the index initial sync thread.
It is submitted to the thread pool, as a task there.
Once processed, it is pushed into a shared "ready" queue, which is then opportunistically processed.
Workers that finish their work units, post-processes "ready" units (this is where we flush data to disk in order, so the block filter index can build the headers chain correctly, since each header depends on the previous header's hash, same will happen with the coinfilterindex in the future, etc).

The whole rationale behind this design is that work units are processed in parallel, no ordering required. And once a worker finishes a task, it opportunistically checks whether the next in-order unit is ready for post-processing. If it is, that worker performs the post-processing step and continues draining any subsequent ready units in order. This basically ensures we can process blocks in parallel while still preserving the ordered persistence requirement that some of the indexes require.

in src/index/base.cpp:260 in b67920269c

 255 | @@ -245,14 +256,14 @@ void BaseIndex::Sync()
 256 |                      m_synced = true;
 257 |                      break;
 258 |                  }
 259 | +                // Just process one block in case of tip change
 260 | +                block_batch.last = block_batch.first;

sedited commented at 9:21 PM on February 18, 2026:

Why not re-apply the same logic from before here? Is there a reason why we should only ever process one block here? If that is possible could we make NextSyncBlock absorb this logic and return a BlockBatch directly?

furszy commented at 2:34 PM on March 2, 2026:

Done as suggested. Thanks!

in src/index/base.cpp:207 in 43b03731a7 outdated

 213 | -                // logged. The best way to recover is to continue, as index cannot be corrupted by
 214 | -                // a missed commit to disk for an advanced index state.
 215 | -                Commit();
 216 | -                return;
 217 | -            }
 218 | +        while (!m_interrupt) {

sedited commented at 10:03 PM on February 18, 2026:

Nit (clang-format): could drop the empty line after this too.

furszy commented at 2:32 PM on March 2, 2026:

Done as suggested.

furszy force-pushed on Mar 2, 2026

furszy commented at 2:35 PM on March 2, 2026: member

Updated per feedback, thanks @sedited! Also had to rebase the PR due to silent conflicts with the new TxoSpender index.

DrahtBot added the label CI failed on Mar 2, 2026

furszy force-pushed on Mar 2, 2026

furszy commented at 1:56 PM on March 3, 2026: member

CI failure is unrelated.

furszy commented at 9:30 PM on March 3, 2026: member

Why would it be unrealated, when it consistently happens on this pull only?

This PR only touches indexes code and nothing else. CI was failing in interface_ipc.py this morning, and now after the restart fails on p2p_orphan_handling.py. None of these tests have any index enabled.

maflcko commented at 9:36 PM on March 3, 2026: member

None of these tests have any index enabled.

Yeah, I guess this is a good point. It is just odd that the p2p_orphan_handling.py happened twice here, but nowhere else so far. The IPC one is tracked in #34711

Edit: Let's see if the CI fails in #34726 ...

DrahtBot removed the label CI failed on Mar 3, 2026

furszy commented at 1:40 AM on March 4, 2026: member

Let's see if the CI fails in #34726

Could add some debug logging to see where it's getting stuck. At first glance, it looks like it stalls during block creation.

Also, just to be clear to other reviewers. Since the issue happens there as well, we know for sure it is not related to this PR.

arejula27 commented at 4:09 PM on March 4, 2026: none

looks like CI passed now

in src/index/blockfilterindex.h:40 in 80aea07a6e

  37 | @@ -37,7 +38,7 @@ static constexpr int CFCHECKPT_INTERVAL = 1000;
  38 |   *
  39 |   * This index is used to serve BIP 157 net requests.
  40 |   */
  41 | -class BlockFilterIndex final : public BaseIndex

arejula27 commented at 4:45 PM on March 4, 2026:

Im not sure if removing final from BlockFilterIndex only to allow IndexBlockSim to subclass it for testing is the better decision, i undertsand that it was used final for a reason, and i do not think modifying the code fo fit a test is the best option.

Restoring final makes the compilation to fail with error:

  [ 82%] Building CXX object src/test/CMakeFiles/test_bitcoin.dir/checkqueue_tests.cpp.o
/home/arejula27/workspaces/bitcoin/src/test/blockfilter_index_tests.cpp:440:7: error: cannot derive from ‘final’ base ‘BlockFilterIndex’ in derived type ‘blockfilter_index_tests::IndexBlockSim’
  440 | class IndexBlockSim : public BlockFilterIndex
      |       ^~~~~~~~~~~~~
gmake[2]: *** [src/test/CMakeFiles/test_bitcoin.dir/build.make:348: src/test/CMakeFiles/test_bitcoin.dir/blockfilter_index_tests.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:2217: src/test/CMakeFiles/test_bitcoin.dir/all] Error 2
gmake: *** [Makefile:146: all] Error 2

Maybe we should base BlockFilterIndex from BaseIndex like IndexReorgCrash

furszy commented at 8:12 PM on March 4, 2026:

Yeah, I went back and forth on this decision for a while before pushing it. The idea was to actually cover BlockFilterIndex methods during reorgs, not just the BaseIndex behavior, even if they aren't strictly required for this PR changes.

Ideally, we should have both testing scenarios: reorgs during initial sync against BaseIndex, and reorgs against BlockFilterIndex. But.. doing that would require extracting BuildChainTestingSetup into a separate file, and I didn't want to expand the scope of this PR any further. This test, and even the existing index_reorg_crash (the one that uses IndexReorgCrash) do not belong to blockfilter_index_tests.cpp. They only test against the base class. But well, that is something that already exists in our codebase, and not something we are including here, so it is better to leave it for a different PR.

Also, the final declaration here (and in all other indexes) seems really unnecessary. I cannot think of anything we gain from it. It only restricts our unit testing capabilities. I would remove it from all indexes.

That being said, most of this is just me ranting about improvements we could make. So will take your suggestion and use the BaseIndex class here for now. Let's go step by step. Thanks for the feedback.

arejula27 commented at 10:42 PM on March 4, 2026:

Ideally, we should have both testing scenarios: reorgs during initial sync against BaseIndex, and reorgs against BlockFilterIndex.

Why? Isn't that redundant?

They only test against the base class. But that is something that already exists in our codebase and is not included here, so it is better to leave it for a different PR.

Also, the final declaration here (and in all other indexes) seems really unnecessary. I cannot think of anything we gain from it. It only restricts our unit testing capabilities. I would remove it from all indexes.

Sounds reasonable to discuss in another PR. I like the idea of removing final if it is not needed, but as you said, it should be consistent across all indexes.

furszy commented at 1:46 PM on March 5, 2026:

Why? Isn't that redundant?

We can test at different levels. The BaseIndex test can verify that we always follow the best chain during initial sync, that the appropriate child methods are called in order (append - remove), and check the db chain locator updates. Then, BlockFilterIndex test, and others, should only focus on manually connecting and disconnecting blocks, verifying that its data is consistent in memory and disk.

furszy commented at 5:37 PM on March 4, 2026: member

looks like CI passed now

yes, review can continue.

in src/index/txindex.cpp:57 in 80aea07a6e

  53 | @@ -57,13 +54,11 @@ bool TxIndex::DB::ReadTxPos(const Txid& txid, CDiskTxPos& pos) const
  54 |      return Read(std::make_pair(DB_TXINDEX, txid.ToUint256()), pos);
  55 |  }
  56 |  
  57 | -void TxIndex::DB::WriteTxs(const std::vector<std::pair<Txid, CDiskTxPos>>& v_pos)
  58 | +void WriteTxs(CDBBatch& batch, const std::vector<std::pair<Txid, CDiskTxPos>>& v_pos)

arejula27 commented at 5:38 PM on March 4, 2026:

Since WriteTxs was moved out of TxIndex::DB, it no longer has static or anonymous namespace scoping, which gives it external linkage. To avoid potential linker conflicts with other translation units, it might be worth wrapping it in an anonymous namespace as this function is not expected to be used outside this file

 namespace {
  void WriteTxs(CDBBatch& batch, const std::vector<std::pair<Txid,
  CDiskTxPos>>& v_pos)
  {
      ...
  }
 }

I compiled and passed all tests with this change

furszy commented at 7:07 PM on March 4, 2026:

why not using static instead?

arejula27 commented at 10:37 PM on March 4, 2026:

idm, i read that anonymous namespaces are more standarized in C++ and static is more C-compatible. I'm fine keeping it as static

arejula27 commented at 7:05 PM on March 4, 2026: none

Two small comments, I feel the one about removing final is relevant, the other one not really but I saw the same pattern on other files of the project so if it is not an inconvenient you can add it, if not just resolve it.

Waiting for the HDD benchmark of @l0rinc. I can do it if he is busy

furszy commented at 7:22 PM on March 4, 2026: member

Waiting for the HDD benchmark of l0rinc. I can do it if he is busy

Having more benchmarks does not hurt :). Feel more than welcome adding your results as well!

index: sync, update iterator only after processing block

No behavior change. This is just for correctness.

95fc91fc99

index: refactor, decouple interruption from sync loop

No behavior change.

0cbe84d36d

index: add method to process block ranges

No behavior change.

Introduce ProcessBlocks(start, end) to handle a range of blocks
in forward order. Currently used per-block, but this lays the
foundation for future batch processing and parallelization.

This has the nice property of allowing us to collect the block
indexes we are about to process without locking 'cs_main'. Just
by traversing the chain backwards via each block index pprev.

0336c61aac

furszy force-pushed on Mar 4, 2026

furszy commented at 8:27 PM on March 4, 2026: member

Updated per feedback. Thanks @arejula27. Also rebased on master to pull the CI fixes.

in src/index/base.cpp:166 in 7c051bd518 outdated

 164 | +        first = chain.Next(pindex_prev);
 165 | +        if (!first) {
 166 | +            // If there is no next block, we might be synced
 167 | +            if (pindex_prev == chain.Tip()) {
 168 | +                return nullptr;
 169 | +            }

optout21 commented at 4:55 PM on March 7, 2026:

Due to this new if this is not a straight refactor, maybe this is worth mentioning in the commit description. This if is a minor optimization or readability improvement, as in this special case the FindFork/Next combo below would return nullptr. Discussed at length in #32875

furszy commented at 6:10 PM on March 8, 2026:

Sure, done. Added mention to the Tip() fast-path in the commit description.

l0rinc commented at 2:51 PM on March 9, 2026:

0e9f2aa index: prepare NextSyncBlock to return block ranges:

I still find it confusing, as mentioned, this is just an optimization, not sure how it fits into and a clearer special case for "already at tip". It's a reasonable change, but another one without any test coverage - please see the mentioned #32875 (review) and #32875 (review) for how this could be tested.

l0rinc commented at 10:21 AM on March 8, 2026: contributor

Did another index benchmark for the latest push on an HDD:

BEFORE="4c40a923f003420193aa574745f70788bcf35265"; AFTER="5be65dbcdceaafb6f5a0e34c69efc2b3dfcfe27a"; \
DATA_DIR="/mnt/my_storage/BitcoinData"; export DATA_DIR; \
RESULTS_FILE="${DATA_DIR}/index_benchmark_results.txt"; export RESULTS_FILE; \
wait_index() { \
  tail -F ${DATA_DIR}/debug.log 2>/dev/null | grep -q -m1 'index is enabled\|is enabled at height'; \
  sleep 2; \
  killall bitcoind 2>/dev/null || true; \
  wait; \
}; export -f wait_index; \
log_size() { \
  local label="$1"; local index_dir="$2"; \
  local size=$(du -sh "$index_dir" 2>/dev/null | cut -f1); \
  local files=$(find "$index_dir" -type f -name '*.ldb' 2>/dev/null | wc -l); \
  echo "$label: size=$size, ldb_files=$files" | tee -a "$RESULTS_FILE"; \
}; export -f log_size; \
git reset --hard >/dev/null 2>&1 && git clean -fxd >/dev/null 2>&1 && git fetch origin $BEFORE $AFTER >/dev/null 2>&1; \
for c in $BEFORE:build-before $AFTER:build-after; do \
  git checkout ${c%:*} >/dev/null 2>&1 && cmake -B ${c#*:} -G Ninja -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && ninja -C ${c#*:} bitcoind >/dev/null 2>&1; \
done; \
echo "indexes | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $DATA_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $DATA_DIR | tail -1) 2>/dev/null | grep -q 0 && echo SSD || echo HDD)" | tee -a "$RESULTS_FILE" && \
for INDEX in txindex blockfilterindex coinstatsindex txospenderindex;; do \
  echo -e "\n--- $INDEX ---" | tee -a "$RESULTS_FILE"; \
  if [ "$INDEX" = "blockfilterindex" ]; then \
    INDEX_DIR="${DATA_DIR}/indexes/blockfilter/basic"; \
  else \
    INDEX_DIR="${DATA_DIR}/indexes/${INDEX}"; \
  fi; \
  export INDEX_DIR; \
  for BUILD in before after; do \
    if [ "$BUILD" = "before" ]; then COMMIT="${BEFORE:0:7}"; else COMMIT="${AFTER:0:7}"; fi; \
    hyperfine --runs 1 --shell bash --sort command \
      --prepare "rm -rf ${DATA_DIR}/indexes/* ${DATA_DIR}/debug.log" \
      --cleanup "log_size '${INDEX} ${BUILD}' '${INDEX_DIR}'" \
      -n "${BUILD} (${COMMIT})" \
      "./build-${BUILD}/bin/bitcoind -datadir=${DATA_DIR} -${INDEX}=1 -connect=0 -printtoconsole=0 & wait_index" \
      2>&1 | tee -a "$RESULTS_FILE"; \
  done; \
done;

</details>

--- txindex ---

Benchmark 1: before (4c40a923f003420193aa574745f70788bcf35265) Time (abs ≡): 23757.916 s [User: 13234.906 s, System: 1697.284 s]

Benchmark 1: after (5be65dbcdceaafb6f5a0e34c69efc2b3dfcfe27a) Time (abs ≡): 21134.433 s [User: 12627.245 s, System: 1483.096 s]

--- blockfilterindex ---

Benchmark 1: before (4c40a923f003420193aa574745f70788bcf35265) Time (abs ≡): 14205.566 s [User: 8328.684 s, System: 418.981 s]

Benchmark 1: after (5be65dbcdceaafb6f5a0e34c69efc2b3dfcfe27a) Time (abs ≡): 14046.610 s [User: 8286.885 s, System: 415.464 s]

--- coinstatsindex ---

Benchmark 1: before (4c40a923f003420193aa574745f70788bcf35265) Time (abs ≡): 40075.612 s [User: 34320.885 s, System: 403.669 s]

Benchmark 1: after (5be65dbcdceaafb6f5a0e34c69efc2b3dfcfe27a) Time (abs ≡): 39869.235 s [User: 34307.695 s, System: 398.322 s]

--- txospenderindex --- Benchmark 1: before (4c40a92) Time (abs ≡): 36324.897 s [User: 23490.919 s, System: 2227.044 s]

Benchmark 1: after (5be65db) Time (abs ≡): 26375.223 s [User: 21783.716 s, System: 1690.540 s]

Edit: added txospenderindex

arejula27 commented at 10:27 AM on March 8, 2026: none

Did another index benchmark for the latest push on an HDD:

<img width="1784" height="594" alt="image" src="https://github.com/user-attachments/assets/3d929a7d-d919-4223-a598-90b7e73050da" />

<details><summary>indexes | i7-hdd | x86_64 | Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | 8 cores | 62Gi RAM | ext4 | HDD</summary>
BEFORE="4c40a923f003420193aa574745f70788bcf35265"; AFTER="5be65dbcdceaafb6f5a0e34c69efc2b3dfcfe27a"; \
DATA_DIR="/mnt/my_storage/BitcoinData"; export DATA_DIR; \
RESULTS_FILE="${DATA_DIR}/index_benchmark_results.txt"; export RESULTS_FILE; \
wait_index() { \
 tail -F ${DATA_DIR}/debug.log 2>/dev/null | grep -q -m1 'index is enabled\|is enabled at height'; \
 sleep 2; \
 killall bitcoind 2>/dev/null || true; \
 wait; \
}; export -f wait_index; \
log_size() { \
 local label="$1"; local index_dir="$2"; \
 local size=$(du -sh "$index_dir" 2>/dev/null | cut -f1); \
 local files=$(find "$index_dir" -type f -name '*.ldb' 2>/dev/null | wc -l); \
 echo "$label: size=$size, ldb_files=$files" | tee -a "$RESULTS_FILE"; \
}; export -f log_size; \
git reset --hard >/dev/null 2>&1 && git clean -fxd >/dev/null 2>&1 && git fetch origin $BEFORE $AFTER >/dev/null 2>&1; \
for c in $BEFORE:build-before $AFTER:build-after; do \
 git checkout ${c%:*} >/dev/null 2>&1 && cmake -B ${c#*:} -G Ninja -DCMAKE_BUILD_TYPE=Release >/dev/null 2>&1 && ninja -C ${c#*:} bitcoind >/dev/null 2>&1; \
done; \
echo "indexes | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(df -T $DATA_DIR | awk 'NR==2{print $2}') | $(lsblk -no ROTA $(df --output=source $DATA_DIR | tail -1) 2>/dev/null | grep -q 0 && echo SSD || echo HDD)" | tee -a "$RESULTS_FILE" && \
for INDEX in txindex blockfilterindex coinstatsindex; do \
 echo -e "\n--- $INDEX ---" | tee -a "$RESULTS_FILE"; \
 if [ "$INDEX" = "blockfilterindex" ]; then \
 INDEX_DIR="${DATA_DIR}/indexes/blockfilter/basic"; \
 else \
 INDEX_DIR="${DATA_DIR}/indexes/${INDEX}"; \
 fi; \
 export INDEX_DIR; \
 for BUILD in before after; do \
 if [ "$BUILD" = "before" ]; then COMMIT="${BEFORE:0:7}"; else COMMIT="${AFTER:0:7}"; fi; \
 hyperfine --runs 1 --shell bash --sort command \
 --prepare "rm -rf ${DATA_DIR}/indexes/* ${DATA_DIR}/debug.log" \
 --cleanup "log_size '${INDEX} ${BUILD}' '${INDEX_DIR}'" \
 -n "${BUILD} (${COMMIT})" \
 "./build-${BUILD}/bin/bitcoind -datadir=${DATA_DIR} -${INDEX}=1 -connect=0 -printtoconsole=0 & wait_index" \
 2>&1 | tee -a "$RESULTS_FILE"; \
 done; \
done;
</details>

--- txindex ---

Benchmark 1: before (4c40a92) Time (abs ≡): 23757.916 s [User: 13234.906 s, System: 1697.284 s]

Benchmark 1: after (5be65db) Time (abs ≡): 21134.433 s [User: 12627.245 s, System: 1483.096 s]

--- blockfilterindex ---

Benchmark 1: before (4c40a92) Time (abs ≡): 14205.566 s [User: 8328.684 s, System: 418.981 s]

Benchmark 1: after (5be65db) Time (abs ≡): 14046.610 s [User: 8286.885 s, System: 415.464 s]

--- coinstatsindex ---

Benchmark 1: before (4c40a92) Time (abs ≡): 40075.612 s [User: 34320.885 s, System: 403.669 s]

Benchmark 1: after (5be65db) Time (abs ≡): 39869.235 s [User: 34307.695 s, System: 398.322 s]

It looks like the improvement is not very noticeable for two of the indexes. It might be interesting to try increasing the batch size. Do you have any suggestions on what could be causing this?

sedited commented at 4:50 PM on March 8, 2026: contributor

It looks like the improvement is not very noticeable for two of the indexes. It might be interesting to try increasing the batch size. Do you have any suggestions on what could be causing this?

I'm not sure this is really surprising given the difference in the amount of data written to the latter two. I would expect a jump more similar to the txindex for the txospenderindex.

What is the expectation on the increase of memory usage here? I tried profiling it with massif, but usage when running with this patch seemed a bit erratic, which I guess is still an indication that it is likely to be higher. Can we really batch operations in this way without putting more memory pressure on the OS?

furszy commented at 5:54 PM on March 8, 2026: member

Can we really batch operations in this way without putting more memory pressure on the OS?

The batch size is intentionally small. I was very conservative with the chosen value to avoid having to worry about it here. The goal of this PR is to land the structural improvements, with a net positive: less cs_main locking, merging prerequisites for the parallelization goal, and the last discovery of reducing the number of created LBD files.

It's expected that any batching implementation adds some memory overhead, but that shouldn't be a problem, it pays off in many other areas, not only with a faster sync promise. And, if we ever wanted to, we could add a config flag to disable batching by setting the batch size to 1. But realistically speaking, I don't think anyone running indexes would want them to take hours to sync while locking cs_main and slowing down the whole node. People who run indexes want them to sync-up quickly so they can start using them.

Also, an important point. All these changes only matter during the index initial sync period. Once the index is synced, they are no longer relevant.

We can benchmark this on small devices like the Pi 4 to see the benefits more clearly, but I honestly wouldn't spend too much time on it. If we agree this PR lands structural, scalable improvements with many benefits and no noticeable downside, and also lets us move forward with the major parallelization speedup path, it seems reasonable to just move forward. This also has a good number of tests we currently lack from.

index: prepare NextSyncBlock to return block ranges

This is a refactoring that makes NextSyncBlock easily adaptable to
return block ranges instead of single blocks in the subsequent commit.

It also adds a fast-path that avoids calling 'FindFork()' and 'Next()'
when the index is synced.

Co-authored-by: Hao Xu <hao.xu@linux.dev>

0e9f2aa7ea

index: make NextSyncBlock return block ranges

No behavior change.

64578a7888

index: add block range computation

Set end of the window of blocks an index will process at time.
Stopping at either the configured window size or the chain tip.

This is the first step toward batch and parallel processing.
These block windows become units of work and live in memory until
flushed to disk in one go.

Since each index has a different data size, the range size
is configurable so it can be tuned considering memory
consumption.

1628b1005a

index: enable DB writes batching during sync

Pass CDBBatch to subclasses so writes can be accumulated
and committed together instead of flushed per block

Note:
Batch writes were intentionally not tied to the existing Commit() method.
The rationale is to bound memory consumption as batches accumulate in RAM
until flushed to disk. This leaves two separate workflows with different
purposes:

1) Batch writes flushes data for a block range atomically: either all
blocks in the batch land in LevelDB or none do.

2) Commit() is about progress checkpointing: it writes the best block
locator alongside any other index state position (e.g. last written
file for the block filter index - which requires an fsync call), so the
node always has a consistent recovery point after an interruption or crash.

Reprocessing data after a crash is not really a problem because we would
just overwrite the existing one.

e4b1fd09e4

test: index, add initial sync batch writes coverage 7c82cc88e8

test: index, add coverage for initial sync reorgs 28d8b04def

test: Add coverage for index locator persistence during shutdown

Verifies the index persists the last processed index when interrupted.

1497086cdb

furszy force-pushed on Mar 8, 2026

in src/index/base.cpp:244 in 95fc91fc99

 239 | @@ -240,10 +240,11 @@ void BaseIndex::Sync()
 240 |                  FatalErrorf("Failed to rewind %s to a previous chain tip", GetName());
 241 |                  return;
 242 |              }
 243 | -            pindex = pindex_next;
 244 |  
 245 | +            if (!ProcessBlock(pindex_next)) return; // error logged internally

l0rinc commented at 12:28 PM on March 9, 2026:

95fc91f index: sync, update iterator only after processing block:

We're changing behavior in a commit that claims it's not a behavior change, without adding context for why this change was needed or how to validate that it is correct. Given that it wasn't obvious to the original authors, an explanation is needed. Also, No test was added to exercise this behavior, to explain exactly why it was necessary, and make sure it doesn't happen again.

It also seems independent enough to be pushed in a separate PR, with a test documenting the current behavior in the first commit and, in the next commit, a fix and a test update that reflects the new behavior.

furszy commented at 4:37 PM on March 15, 2026:

95fc91f index: sync, update iterator only after processing block:

We're changing behavior in a commit that claims it's not a behavior change, without adding context for why this change was needed or how to validate that it is correct. Given that it wasn't obvious to the original authors, an explanation is needed. Also, No test was added to exercise this behavior, to explain exactly why it was necessary, and make sure it doesn't happen again.

It also seems independent enough to be pushed in a separate PR, with a test documenting the current behavior in the first commit and, in the next commit, a fix and a test update that reflects the new behavior.

I'm puzzled here. You are claiming this is a behavior change without explaining how, and based on that, you write a strong affirmation for other reviewers rather than engaging with me about what you think this change introduces. That is not something I can act on, and it risks misleading other reviewers.

This is not a behavior change, it is a correctness change. The loop iterator that tracks the last successfully processed block was modified to be updated after processing the block, not before. The reordering of pindex = pindex_next to after the ProcessBlock call has zero effects on any execution path, because we immediately return if block processing fails, and at that point pindex is never read again. If you believe otherwise, please point to the specific code path you think is affected. This change was made to prepare the ground for the batch end block set, which is the actual focus of this PR. Splitting it out into a separate PR would add unnecessary overhead for a one-line reordering with no behavioral effect.

On the commit message: if the description was not clear enough, just say so. I would have updated it. Engaging directly about what is unclear is far more constructive than making assertions aimed at other reviewers.

On tone: "it wasn't obvious to the original authors" is truly unhelpful. It does not engage with the code, it does not help improve anything, and it makes it genuinely harder to engage with your other comments in good faith, even the ones that may be perfectly valid. The purpose of review is to work with the author to improve the code, not to editorialize for other readers.

l0rinc commented at 11:39 PM on March 15, 2026:

rather than engaging with me about what you think this change introduces

Not sure what you mean, I commented on your PR, meant it for you to read it.

I would have updated it.

I'm not sure about the past tense, I reviewed it because I'm expecting further changes, especially because of the bugs introduced here.

On tone: "it wasn't obvious to the original authors" is truly unhelpful.

How so? I don't know or care who wrote the original code, they had an intention here, if we're changing it, it needs explanation and preferably tests.

furszy commented at 2:32 PM on March 16, 2026:

I'm not sure about the past tense, I reviewed it because I'm expecting further changes, especially because of the bugs introduced here.

Please stick to the topic in question. I have not yet read your other comments on this PR. This is the first one, and it makes a strong claim without a proper explanation to back it up.

Can you please point to the specific code path where this commit introduces a behavior change? Thanks.

l0rinc commented at 2:45 PM on March 16, 2026:

I have not yet read your other comments on this PR

That explains the confusion.

Can you please point to the specific code path where this commit introduces a behavior change

Please read the rest of my comments, that's why I posted them...

sipa commented at 3:05 PM on March 16, 2026:

It seems obvious to me that 95fc91fc99a14c6635c13ba7b6a8e7c2a1efe7be cannot change behavior; it's only swapping the change of local variable with a return statement, at which point the local variable disappears anyway. So I agree with @furszy that I don't understand what the comment in https://github.com/bitcoin/bitcoin/pull/34489/changes/95fc91fc99a14c6635c13ba7b6a8e7c2a1efe7be#r2905123979 is about, independently from all other comments left on this PR.

l0rinc commented at 3:33 PM on March 16, 2026:

This commit does not explain why the swap is needed. That only becomes clear later, when a non-adjacent commit relies on pindex meaning "last successfully processed block" for the new interrupt path, see #34489 (comment).

So the issue is not this line movement in isolation, but that it is a preparatory change whose purpose is only revealed several commits later, in a commit that changes behavior while claiming to be a refactor. I think those commits should be squashed and the behavior change either fixed or documented.

in src/index/base.cpp:253 in 0cbe84d36d

 249 | @@ -261,6 +250,17 @@ void BaseIndex::Sync()
 250 |          }
 251 |      }
 252 |  
 253 | +    if (m_interrupt) {

l0rinc commented at 12:51 PM on March 9, 2026:

0cbe84d index: refactor, decouple interruption from sync loop:

The commit claims there's no behavior change, but it doesn't explain why the move is safe. Given all the internal returns and breaks it's not self-evident.

It seems to me that previously a synced, and later interrupted, path could hit

LogInfo("%s is enabled at height %d", GetName(), pindex->nHeight);

, but after the change it hits

LogInfo("%s: m_interrupt set; exiting ThreadSync", GetName());

and returns.

I'm not sure there is any other change, but I would appreciate a commit message explanation, and maybe a test exercising this path, preferably a fuzz test to cover all these weird combinations.

in src/index/base.cpp:203 in 0336c61aac

 197 | @@ -198,6 +198,24 @@ bool BaseIndex::ProcessBlock(const CBlockIndex* pindex, const CBlock* block_data
 198 |      return true;
 199 |  }
 200 |  
 201 | +bool BaseIndex::ProcessBlocks(const CBlockIndex* start, const CBlockIndex* end)
 202 | +{
 203 | +    // Collect all block indexes from [end...start] in order

l0rinc commented at 1:35 PM on March 9, 2026:

0336c61 index: add method to process block ranges:

nit: for consistency we could use the same comment style as below:

    // Collect all block indexes from end to start

in src/index/base.cpp:206 in 0336c61aac

 197 | @@ -198,6 +198,24 @@ bool BaseIndex::ProcessBlock(const CBlockIndex* pindex, const CBlock* block_data
 198 |      return true;
 199 |  }
 200 |  
 201 | +bool BaseIndex::ProcessBlocks(const CBlockIndex* start, const CBlockIndex* end)
 202 | +{
 203 | +    // Collect all block indexes from [end...start] in order
 204 | +    std::vector<const CBlockIndex*> ordered_blocks;
 205 | +    ordered_blocks.reserve(end->nHeight - start->nHeight + 1);
 206 | +    for (const CBlockIndex* block = end; block && start->pprev != block; block = block->pprev) {

l0rinc commented at 1:48 PM on March 9, 2026:

0336c61 index: add method to process block ranges:

When can block be nullptr here? Wouldn't it be a fatal error if we walk past genesis? And what if the caller mixes up begin and end? Or if start and end have valid heights but are on different forks.

Given how intertwined the code is here, a serious mess-up could cause an infinite loop and would be really hard to trace back to the source.

I would probably sleep better if we asserted that end->nHeight >= start->nHeight and used a bounded for loop here instead.

const int range_size{end->nHeight - start->nHeight + 1};
Assert(range_size > 0);

// Collect all block indexes from end to start
std::vector<const CBlockIndex*> ordered_blocks;
{
    ordered_blocks.reserve(range_size);
    CBlockIndex* it = end;
    for (int i{0}; i < range_size; ++i) {
        ordered_blocks.emplace_back(Assert(it));
        it = it->pprev;
    }
    Assert(it == start->pprev);
}

in src/index/base.cpp:212 in 0336c61aac

 207 | +        ordered_blocks.emplace_back(block);
 208 | +    }
 209 | +
 210 | +    // And process blocks in forward order: from start to end
 211 | +    for (auto it = ordered_blocks.rbegin(); it != ordered_blocks.rend(); ++it) {
 212 | +        if (m_interrupt) return false;

l0rinc commented at 1:59 PM on March 9, 2026:

0336c61 index: add method to process block ranges:

The method comment states:

Returns false on unrecoverable failure or during interruption

But if the interruption happens before we call the first ProcessBlock, we should be safe since no state was changed - i.e. nothing to recover from.

in src/index/base.cpp:253 in 0336c61aac

 247 | @@ -230,7 +248,13 @@ void BaseIndex::Sync()
 248 |                  return;
 249 |              }
 250 |  
 251 | -            if (!ProcessBlock(pindex_next)) return; // error logged internally
 252 | +            // For now, process a single block at time
 253 | +            if (!ProcessBlocks(/*start=*/pindex_next, /*end=*/pindex_next)) {
 254 | +                // If failed due to an interruption, we haven't processed the block.

l0rinc commented at 2:09 PM on March 9, 2026:

0336c61 index: add method to process block ranges:

How can we tell this failed "due to an interruption"? m_interrupt could have been set independently

in src/index/base.h:111 in 0336c61aac

 107 | @@ -108,6 +108,10 @@ class BaseIndex : public CValidationInterface
 108 |  
 109 |      bool ProcessBlock(const CBlockIndex* pindex, const CBlock* block_data = nullptr);
 110 |  
 111 | +    /// Processes blocks in the range [start, end]. Calling 'ProcessBlock'.

l0rinc commented at 2:14 PM on March 9, 2026:

0336c61 index: add method to process block ranges:

    /// Processes blocks in the range [start, end], calling `ProcessBlock`.

in src/index/base.cpp:151 in 0e9f2aa7ea

 147 | @@ -147,22 +148,31 @@ bool BaseIndex::Init()
 148 |      return true;
 149 |  }
 150 |  
 151 | -static const CBlockIndex* NextSyncBlock(const CBlockIndex* pindex_prev, CChain& chain) EXCLUSIVE_LOCKS_REQUIRED(cs_main)
 152 | +// Returns the next block to sync, or null if fully synced

l0rinc commented at 2:27 PM on March 9, 2026:

0e9f2aa index: prepare NextSyncBlock to return block ranges:

// Returns the next block to sync, or `nullptr` if fully synced

in src/index/base.cpp:159 in 0e9f2aa7ea

 157 |  
 158 |      if (!pindex_prev) {
 159 | -        return chain.Genesis();
 160 | -    }
 161 | +        // Only the genesis block has no prev block
 162 | +        first = chain.Genesis();

l0rinc commented at 2:43 PM on March 9, 2026:

0e9f2aa index: prepare NextSyncBlock to return block ranges:

The diff would be a lot simpler if we kept the early return guard here instead:

        return chain.Genesis();

which would reduce the indentation of the following code and the awkward naming of "first" for "next"

in src/index/base.cpp:161 in 0e9f2aa7ea

 159 | -        return chain.Genesis();
 160 | -    }
 161 | +        // Only the genesis block has no prev block
 162 | +        first = chain.Genesis();
 163 | +    } else {
 164 | +        first = chain.Next(pindex_prev);

l0rinc commented at 2:47 PM on March 9, 2026:

0e9f2aa index: prepare NextSyncBlock to return block ranges:

I find it a bit confusing that the return value of Next is called first, which is clearly the next of the prev.

in src/index/base.cpp:174 in 0e9f2aa7ea

 178 |      }
 179 |  
 180 | -    // Since block is not in the chain, return the next block in the chain AFTER the last common ancestor.
 181 | -    // Caller will be responsible for rewinding back to the common ancestor.
 182 | -    return chain.Next(chain.FindFork(pindex_prev));
 183 | +    Assume(first);

l0rinc commented at 3:03 PM on March 9, 2026:

0e9f2aa index: prepare NextSyncBlock to return block ranges:

The Assume(first) at the end is non-obvious. After the pindex_prev == chain.Tip() fast path returns nullptr, the remaining path reaches chain.Next(chain.FindFork(pindex_prev)). A reader needs to convince themselves this can't be nullptr, i.e., that FindFork can never return the tip here. Could you add a comment explaining why? Or add it to the commit message to help with review.

Is it safe because the fork point can't be the tip, because that would mean pindex_prev is in the active chain, which contradicts chain.Next(pindex_prev) having returned nullptr above?

in src/index/base.cpp:170 in 64578a7888

 172 | +        range.first = chain.Next(pindex_prev);
 173 | +        if (!range.first) {
 174 |              // If there is no next block, we might be synced
 175 |              if (pindex_prev == chain.Tip()) {
 176 | -                return nullptr;
 177 | +                return std::nullopt;

l0rinc commented at 3:12 PM on March 9, 2026:

64578a7 index: make NextSyncBlock return block ranges:

Can you please avoid modifying the same line in different commits? The change is already quite difficult to review.

in src/index/base.cpp:151 in 64578a7888

 147 | @@ -148,31 +148,38 @@ bool BaseIndex::Init()
 148 |      return true;
 149 |  }
 150 |  
 151 | -// Returns the next block to sync, or null if fully synced
 152 | -static const CBlockIndex* NextSyncBlock(const CBlockIndex* pindex_prev, const CChain& chain) EXCLUSIVE_LOCKS_REQUIRED(cs_main)
 153 | +struct BlockRange {

l0rinc commented at 3:14 PM on March 9, 2026:

64578a7 index: make NextSyncBlock return block ranges:

Can we squash this with 0336c61aacf01a503f75a8e35d6bec9954ea6840? We just introduced it and now we're modifying the same code area.

in src/index/base.cpp:153 in 64578a7888

 147 | @@ -148,31 +148,38 @@ bool BaseIndex::Init()
 148 |      return true;
 149 |  }
 150 |  
 151 | -// Returns the next block to sync, or null if fully synced
 152 | -static const CBlockIndex* NextSyncBlock(const CBlockIndex* pindex_prev, const CChain& chain) EXCLUSIVE_LOCKS_REQUIRED(cs_main)
 153 | +struct BlockRange {
 154 | +    const CBlockIndex* first{nullptr};
 155 | +    const CBlockIndex* last{nullptr};

l0rinc commented at 3:17 PM on March 9, 2026:

64578a7 index: make NextSyncBlock return block ranges:

nit: const CBlockIndex* is confusing in a mutable struct whose fields are reassigned every iteration. I understand that it's not CBlockIndex* const last, but a "pointer-to-const block index". Would seem simpler to me to just have:

struct BlockRange {
    CBlockIndex* first{nullptr};
    CBlockIndex* last{nullptr};
};

in src/index/base.cpp:157 in 1628b1005a

 153 | @@ -154,7 +154,7 @@ struct BlockRange {
 154 |  };
 155 |  
 156 |  // Returns the next range of blocks to sync, or 'std::nullopt' if fully synced
 157 | -static std::optional<BlockRange> NextSyncRange(const CBlockIndex* pindex_prev, const CChain& chain) EXCLUSIVE_LOCKS_REQUIRED(cs_main)
 158 | +static std::optional<BlockRange> NextSyncRange(const CBlockIndex* pindex_prev, const CChain& chain, const int range_max_size) EXCLUSIVE_LOCKS_REQUIRED(cs_main)

l0rinc commented at 3:20 PM on March 9, 2026:

1628b10 index: add block range computation:

We don't usually const primitive arguments

in src/index/base.cpp:274 in e4b1fd09e4

 269 | @@ -270,12 +270,14 @@ void BaseIndex::Sync()
 270 |                  return;
 271 |              }
 272 |  
 273 | -            if (!ProcessBlocks(block_range->first, block_range->last)) {
 274 | +            CDBBatch db_batch(GetDB());
 275 | +            if (!ProcessBlocks(db_batch, block_range->first, block_range->last)) {

l0rinc commented at 3:40 PM on March 9, 2026:

e4b1fd0 index: enable DB writes batching during sync:

Is ProcessBlocks guaranteed to make only temporary changes that we can just revert if this gets interrupted?

Before this PR, CustomAppend wrote directly to the DB per block, so m_muhash and the DB were always in sync. With batching, in-memory state advances ahead of what's persisted, and the interrupt handler's Commit() can write the advanced m_muhash alongside the old locator. After reload it could probably crash, or worse, maybe just continue with incorrect state (not sure, please confirm).

If this is accurate, it directly contradicts the code comment:

// Intentionally do not update DB_MUHASH here so it stays in sync with // DB_BEST_BLOCK, and the index is not corrupted if there is an unclean shutdown.

@dergoegge, is this something we can test currently? I'm surprised this wasn't caught by our fuzzers. Maybe I'm missing some constraint in the review, since I haven't tried reproducing this race explicitly.

in src/index/base.cpp:244 in 64578a7888

 239 | @@ -233,10 +240,10 @@ void BaseIndex::Sync()
 240 |          auto last_log_time{NodeClock::now()};
 241 |          auto last_locator_write_time{last_log_time};
 242 |          while (!m_interrupt) {
 243 | -            const CBlockIndex* pindex_next = WITH_LOCK(cs_main, return NextSyncBlock(pindex, m_chainstate->m_chain));
 244 | -            // If pindex_next is null, it means pindex is the chain tip, so
 245 | +            auto block_range = WITH_LOCK(cs_main, return NextSyncRange(pindex, m_chainstate->m_chain));
 246 | +            // If range.first is null, it means pindex is the chain tip, so

l0rinc commented at 4:35 PM on March 9, 2026:

64578a7 index: make NextSyncBlock return block ranges:

This comment seem stale now and mismatches the new API.

polespinasa commented at 1:39 PM on May 27, 2026:

in 64578a7888ed78a11e8ccf1742ed2e6a8a033b46 index: make NextSyncBlock return block ranges Probably worth not using auto here so it's visible that it can not have a value and that means it is the tip. Also I think you meant if block_range is not set? not range.first

in src/index/base.h:192 in 1628b1005a

 187 | @@ -181,6 +188,9 @@ class BaseIndex : public CValidationInterface
 188 |      /// Stops the instance from staying in sync with blockchain updates.
 189 |      void Stop();
 190 |  
 191 | +    /// Number of blocks to process in a batch
 192 | +    void SetProcessingBatchSize(const int blocks_count) { m_num_blocks_batch = blocks_count; }

l0rinc commented at 4:38 PM on March 9, 2026:

1628b10 index: add block range computation:

Are we exposing this for testing only? An incorrect call to this could possibly cause an infinite loop, or at least continue until genesis. Not sure, but I would prefer if we made the public interface safer.

in src/test/blockfilter_index_tests.cpp:518 in 1497086cdb

 512 | @@ -513,4 +513,42 @@ BOOST_FIXTURE_TEST_CASE(initial_sync_reorg, BuildChainTestingSetup)
 513 |      BOOST_CHECK_EQUAL(summary.best_block_hash, tip_block->GetBlockHash());
 514 |  }
 515 |  
 516 | +// Verifies that the index persists its sync progress when interrupted during initial sync.
 517 | +// The index should resume from the last processed batch rather than restarting from genesis.
 518 | +BOOST_FIXTURE_TEST_CASE(shutdown_during_initial_sync, BuildChainTestingSetup)

l0rinc commented at 4:52 PM on March 9, 2026:

1497086 test: Add coverage for index locator persistence during shutdown:

This should have probably caught the partial write regression. Can we extend IndexCommitStateSim to implement CustomCommit as well?

in src/test/blockfilter_index_tests.cpp:388 in 7c82cc88e8

 384 | @@ -385,4 +385,46 @@ BOOST_FIXTURE_TEST_CASE(index_reorg_crash, BuildChainTestingSetup)
 385 |      index.Stop();
 386 |  }
 387 |  
 388 | +// Ensure the initial sync batch window behaves as expected.

l0rinc commented at 5:05 PM on March 9, 2026:

7c82cc8 test: index, add initial sync batch writes coverage:

Adding tests at the very end doesn't help document the progression of the changes introduced in this PR, commit by commit. We could add this test as a first commit:

// Ensure initial sync can be restarted cleanly without overwriting earlier
// results. Tests sync from genesis and from a higher block to mimic a restart.
BOOST_FIXTURE_TEST_CASE(initial_sync_restart, BuildChainTestingSetup)

and later, in the batch size commit, just add filter_index.SetProcessingBatchSize(BATCH_SIZE); to the test to prove that it still passes. That way, we are asserting that the previous behavior is retained, not just that the new behavior is covered.

in src/test/blockfilter_index_tests.cpp:478 in 28d8b04def

 473 | +};
 474 | +
 475 | +// Tests that indexes can complete its initial sync even if a reorg occurs mid-sync.
 476 | +// The index is paused at a specific block while a fork is introduced a few blocks before the tip.
 477 | +// Once unblocked, the index should continue syncing and correctly reach the new chain tip.
 478 | +BOOST_FIXTURE_TEST_CASE(initial_sync_reorg, BuildChainTestingSetup)

l0rinc commented at 5:10 PM on March 9, 2026:

28d8b04 test: index, add coverage for initial sync reorgs:

This test almost completely passes on master - we should add as much coverage before the refactor. But we need to make it more realistic, likely by overriding CustomAppend as well.

in src/test/blockfilter_index_tests.cpp:524 in 1497086cdb

 519 | +{
 520 | +    // The index will be interrupted at block 45. Due to the batch size of 10,
 521 | +    // the last synced block will be block 39 after interruption (end of batch range [30-39]).
 522 | +    constexpr int SHUTDOWN_HEIGHT = 45;
 523 | +    constexpr int BATCH_SIZE = 10;
 524 | +    constexpr int EXPECTED_LAST_SYNCED_BLOCK = 39;

l0rinc commented at 5:12 PM on March 9, 2026:

1497086 test: Add coverage for index locator persistence during shutdown:

This passes before the change with:

    // The index will be interrupted while processing block 45. Since sync is
    // still per-block here, the last persisted block is expected to be 45.
    constexpr int SHUTDOWN_HEIGHT = 45;
    constexpr int EXPECTED_LAST_SYNCED_BLOCK = 45;

(and no SetProcessingBatchSize, of course)

in src/index/blockfilterindex.cpp:262 in 1497086cdb

 260 |  }
 261 |  
 262 | -bool BlockFilterIndex::Write(const BlockFilter& filter, uint32_t block_height, const uint256& filter_header)
 263 | +bool BlockFilterIndex::Write(CDBBatch& batch, const BlockFilter& filter, uint32_t block_height, const uint256& filter_header)
 264 |  {
 265 |      size_t bytes_written = WriteFilterToDisk(m_next_filter_pos, filter);

l0rinc commented at 5:23 PM on March 9, 2026:

We seem to be writing to disk to eagerly here, this also is misaligned with the batching.

l0rinc changes_requested

l0rinc commented at 5:37 PM on March 9, 2026: contributor

Concept ACK on the goal: batching is the right direction for reducing write amplification, cs_main contention, and LDB file proliferation.

However, I think it's important to stress my concern that this change is more complex than it's being treated as. It alters the invariants every index subclass relies on: in-memory state can now advance ahead of what's persisted, and the interrupt/shutdown paths haven't been fully thought through for each subclass. The most serious issue seems to me that Commit() during interrupt writes advanced m_muhash alongside a stale locator, which causes CoinStatsIndex::CustomInit to reject the index as corrupted on restart. blockfilterindex has a milder variant (orphaned flat file data). I wonder if fuzzing could catch these.

More generally, several "no behavior change" refactoring commits don't explain why the restructuring is safe, and memory concerns are brushed away with "shouldn't be a problem". I also left a ton of comments about the ProcessBlocks loop needing tighter bounds and the shutdown test only exercising a no-op BaseIndex subclass. It doesn't catch the actual subclass corruption paths. Some of the earlier commits (e.g. the iterator ordering fix) could land as standalone PRs to reduce the review surface here.

See inline comments for details on these and other issues (interrupt handler scoping, NextSyncBlock invariants, loop safety, stale comments, etc.).

[!NOTE] in the meantime, I finished the HDD benchmarks with txospenderindex measurements, see #34489 (comment).

furszy commented at 4:40 PM on March 15, 2026: member

Clarifying one of the comments.

arejula27 commented at 10:01 PM on March 22, 2026: none

I was thinking to buy and external disk (HDD) and try to benchmark the memory impact, would it be interesting or having the blockchain and the index on different disks make the test not useful?

l0rinc commented at 11:19 PM on March 22, 2026: contributor

having the blockchain and the index on different disks make the test not useful?

for memory profiling it shouldn't matter - just know that these measurements are notoriously slow

arejula27 commented at 11:30 AM on March 29, 2026: none

@furszy Do you think #34897 has any implications on this PR and #26966? If indexes can only commit up to the chainstate's last flushed block, the batching and parallelisation speedups might be limited by the flush interval (~50-70 min) rather than by index processing itself. Would it make sense to keep the current per-block behaviour during IBD (where the index is catching up and flush lag matters most) and use batching only for fully synced nodes processing new blocks using new methods?? Another option could be each index sets its own batch size. For example, blockfilterindex (required during IBD) could use a batch size of 1 to keep commit granularity tight, while optional indexes like coinstatsindex or txindex could use larger batches since their commit timing is less critical.

PS: I got my chain synced, will try to do the memory benchmark this week

DrahtBot added the label Needs rebase on Mar 30, 2026

DrahtBot commented at 1:24 PM on March 30, 2026: contributor

🐙 This pull request conflicts with the target branch and needs rebase.

fanquake marked this as a draft on May 8, 2026

in src/index/base.cpp:268 in 64578a7888

 268 |                  return;
 269 |              }
 270 |  
 271 | -            // For now, process a single block at time
 272 | -            if (!ProcessBlocks(/*start=*/pindex_next, /*end=*/pindex_next)) {
 273 | +            if (!ProcessBlocks(block_range->first, block_range->last)) {

polespinasa commented at 1:43 PM on May 27, 2026:

in 64578a7888ed78a11e8ccf1742ed2e6a8a033b46 index: make NextSyncBlock return block ranges I guess ProcessBlocks can just take the new BlockRange struct.

polespinasa commented at 2:21 PM on May 27, 2026: member

concept ACK

Just did a swift code review for the Spanish PR review club :)

I'm 0 familiar with this part of the codebase, but left some comments that ¿might? be useful.

I found the first two commits a bit difficult to understand because, although it is obvious that it is not a behavior change, there's no explanation in the commit message on why those changes are needed. I think a small sentence saying why that helps is useful :)