This change is part of [IBD] - Tracking PR for speeding up Initial Block Download
Summary
When the in-memory UTXO set is flushed to LevelDB (after IBD or an AssumeUTXO load), the flush is done in batches to manage memory usage.
While a hidden `-dbbatchsize` config option exists to modify this value, this PR introduces dynamic calculation of the batch size based on the `-dbcache` setting. By using larger batches when more memory is available (i.e., higher `-dbcache`), we can reduce the overhead of numerous small writes, minimize the constant per-batch overhead, improve I/O efficiency (especially on HDDs), and potentially allow LevelDB to optimize writes more effectively (e.g., by sorting keys before writing).
Context
The UTXO set has grown significantly since 2017, when the original fixed 16MiB batch size was chosen.
With the current multi-gigabyte UTXO set and the common practice of using larger `-dbcache` values, the fixed 16MiB batch size leads to several inefficiencies:
- Flushing the entire UTXO set requires hundreds or thousands of separate 16MiB write operations.
- Particularly on HDDs, the cumulative disk seek time and per-operation overhead of numerous small writes significantly slow down the flushing process.
- Each `WriteBatch` call incurs internal LevelDB overhead (e.g., MemTable handling, compaction-triggering logic); more frequent, smaller batches amplify this cumulative overhead.
- Systems configured with large `-dbcache` values have sufficient memory to handle larger batches more efficiently, but the fixed size prevents leveraging this potential.
Flush times of 20-30 minutes are not uncommon, even on capable hardware.
Considerations
As [noted by @sipa](/bitcoin-bitcoin/31645/#issuecomment-2587500105), flushing involves a temporary memory usage increase while the batch is prepared, and a larger batch size naturally leads to a larger peak during this phase. Crashing due to OOM during a flush is highly undesirable. Future work like #30611 could mitigate this somewhat.
The dynamic approach implemented here only increases the batch size if the user has explicitly set `-dbcache` higher than the default 450MiB, implying they have more memory available. For default or lower `-dbcache` values, the batch size (and thus the memory spike) remains unchanged from current behavior. This makes the memory increase an opt-in consequence of allocating more resources via `-dbcache`.
The increased peak memory usage (detailed below) is primarily attributed to LevelDB's `leveldb::Arena` (backing the MemTables) and the temporary storage of serialized batch data (e.g., the `std::string` in `CDBBatch::WriteImpl`).
Performance gains are most pronounced on systems with slower I/O (HDDs), but some SSDs also show measurable improvements.
Solution
This change introduces dynamic sizing for the LevelDB write batch used in `FlushSnapshotToDisk`, based on the configured `-dbcache` size. The effective batch size keeps the current 16MiB for the default 450MiB `dbcache` and below, scales linearly upwards from there, and is capped at 256MiB (gains are barely measurable for bigger batches):
$$
\text{GetDbBatchSize}(\mathit{dbcache}) = \max\Bigl(16\,\mathrm{MiB},\ \min\Bigl(256\,\mathrm{MiB},\ \frac{\mathit{dbcache}}{450\,\mathrm{MiB}} \times 16\,\mathrm{MiB}\Bigr)\Bigr)
$$
This results in the following effective batch sizes and approximate number of batches required to flush the ~28 GiB UTXO set from the 880k snapshot:
| dbcache (MiB) | Effective Batch Size (MiB) | Batches Needed |
|---|---|---|
| <=450 | 16.0 | ~832 |
| 1000 | 35.6 | ~375 |
| 2000 | 71.1 | ~188 |
| 4500 | 160.0 | ~84 |
| >7000 | 256.0 (max) | ~52 |
Key Aspects:
- For `-dbcache <= 450`, the batch size remains 16MiB: no impact on low-memory systems or default configurations.
- Batch size increases proportionally with `-dbcache` above 450MiB.
- The size is clamped between 16MiB (minimum) and 256MiB (maximum).
Measurements
Performance (`loadtxoutset` RPC call, GCC, Linux):
Hyperfine benchmarks loading and flushing the `880000.dat` UTXO snapshot show consistent speedups with the dynamic batch sizing compared to the fixed 16MiB batch:
- `-dbcache=1000` (Batch: 16MiB → ~35.5MiB): ~11.6% faster (901s → 797s)
- `-dbcache=2000` (Batch: 16MiB → ~71.1MiB): ~15.7% faster (884s → 745s)
- `-dbcache=4500` (Batch: 16MiB → 160MiB): ~7.8% faster (823s → 760s)
- `-dbcache=10000` (Batch: 16MiB → 256MiB): ~16.4% faster (795s → 665s)
- `-dbcache=45000` (Batch: 16MiB → 256MiB): ~8.1% faster (570s → 527s)
Log analysis confirms the speedup originates from the flushing phase itself (e.g., an ~11.4% faster `FlushSnapshotToDisk` sum for `-dbcache=1000`).
HDD Performance (`-reindex-chainstate`, `-dbcache=30000`):
Comparing the old fixed 16MiB batch to a fixed 64MiB batch (for reference, not the dynamic size) showed a ~33% reduction in flush time (31m → 20.5m), indicating a substantial benefit on slower storage. The dynamic approach aims to capture similar gains when `dbcache` allows.
Memory Usage (Massif peak during `loadtxoutset`):
Peak memory usage increases modestly when `-dbcache` is raised above the default, reflecting the larger batches being processed during the flush:
| dbcache (MiB) | Batch Size (Before → After, MiB) | Peak Mem (Before, GiB) | Peak Mem (After, GiB) | Increase (%) |
|---|---|---|---|---|
| 1000 | 16 → ~35.5 | 1.24 | 1.31 | +5.9% |
| 4500 | 16 → 160 | 4.56 | 5.11 | +11.9% |
| 10000 | 16 → 256 | 9.79 | 10.77 | +10.1% |
The increase is primarily attributed to larger allocations in `leveldb::Arena` and for batch serialization (the `std::string` in `CDBBatch::WriteImpl`).
Reproducer:
```shell
# Set up a clean demo environment
mkdir -p demo && rm -rfd demo/chainstate demo/chainstate_snapshot demo/debug.log

# Build Bitcoin Core
cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc)

# Start bitcoind with minimal settings, without mempool and internet connection
build/bin/bitcoind -datadir=demo -stopatheight=1
build/bin/bitcoind -datadir=demo -blocksonly=1 -connect=0 -dbcache=30000 -daemon

# Load the AssumeUTXO snapshot, making sure the path is correct
# Expected output includes `"coins_loaded": 184821030`
build/bin/bitcoin-cli -datadir=demo -rpcclienttimeout=0 loadtxoutset ~/utxo-880000.dat

# Stop the daemon and verify snapshot flushes in the logs
build/bin/bitcoin-cli -datadir=demo stop
grep "FlushSnapshotToDisk: completed" demo/debug.log
```
For more details see: #31645 (comment)
Note that the scope of the PR has changed since it was opened: the measurements in the replies are all valid, but they may differ slightly with the latest dynamic calculations.