Work In Progress - I still need to investigate if it’s possible to do the sorting in-place somehow instead of allocating extra memory.
This is an extension of #31645 - drafted for now to receive comments.
It looks like sorting the coins before flushing speeds up MemTable
insertion considerably (will move the image to comments, after undrafting):
Ran a few assumeUTXO
loads and a few complete reindexes to check the LevelDB write batch size increase with/without sorting:
assumeUTXO final dumps:
0/mnt/my_storage/bitcoin # tail -3 ../logs/debug-35bf426e02210c1bbb04926f4ca2e0285fbfcd11.log
12025-01-14T00:46:32Z [warning] Flushing large (26 GiB) UTXO set to disk, it may take several minutes
22025-01-14T01:18:53Z Shutdown: done
3
4/mnt/my_storage/bitcoin # tail -3 ../logs/debug-256a5c04f2eb190e45ec7d7ffb058c189174780a.log
52025-01-14T13:44:14Z [warning] Flushing large (26 GiB) UTXO set to disk, it may take several minutes
62025-01-14T14:06:10Z Shutdown: done
7
8/mnt/my_storage/bitcoin # tail -3 ../logs/debug-a5920ec25f02b8b66b6f23f00e696a86b5352d69.log
92025-01-15T02:15:49Z [warning] Flushing large (26 GiB) UTXO set to disk, it may take several minutes
102025-01-15T02:24:16Z Shutdown: done
35bf426e02210c1bbb04926f4ca2e0285fbfcd11 (master): Duration: 0:32:21 256a5c04f2eb190e45ec7d7ffb058c189174780a (64 « 20): Duration: 0:21:56 a5920ec25f02b8b66b6f23f00e696a86b5352d69 (SortedWrite 64): Duration: 0:08:27
Meaning that the 64 MiB batch size increase
+ descending sorting
improves the final UTXO set dump from 32 minutes to 8 minutes.
reindex-chainstate until 878k blocks
0hyperfine --runs 2 --parameter-list COMMIT 35bf426e02210c1bbb04926f4ca2e0285fbfcd11,256a5c04f2eb190e45ec7d7ffb058c189174780a,a5920ec25f02b8b66b6f23f00e696a86b5352d69 --prepare 'rm -f /mnt/my_storage/BitcoinData/debug.log && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' --cleanup 'mv /mnt/my_storage/BitcoinData/debug.log /mnt/my_storage/logs/debug-{COMMIT}.log' 'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=878000 -dbcache=30000 -reindex-chainstate -connect=0'
1
2Benchmark 1: COMMIT=35bf426e02210c1bbb04926f4ca2e0285fbfcd11 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=878000 -dbcache=30000 -reindex-chainstate -connect=0
3 Time (mean ± σ): 23766.362 s ± 5.864 s [User: 36042.690 s, System: 701.176 s]
4 Range (min … max): 23762.216 s … 23770.509 s 2 runs
5
6Benchmark 2: COMMIT=256a5c04f2eb190e45ec7d7ffb058c189174780a ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=878000 -dbcache=30000 -reindex-chainstate -connect=0
7 Time (mean ± σ): 22966.570 s ± 97.248 s [User: 35741.689 s, System: 642.797 s]
8 Range (min … max): 22897.805 s … 23035.335 s 2 runs
9
10Benchmark 3: COMMIT=a5920ec25f02b8b66b6f23f00e696a86b5352d69 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=878000 -dbcache=30000 -reindex-chainstate -connect=0
11 Time (mean ± σ): 22119.481 s ± 50.626 s [User: 35457.379 s, System: 578.014 s]
12 Range (min … max): 22083.683 s … 22155.279 s 2 runs
0Summary
1 COMMIT=a5920ec25f02b8b66b6f23f00e696a86b5352d69 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=878000 -dbcache=30000 -reindex-chainstate -connect=0 ran
2 1.04 ± 0.00 times faster than COMMIT=256a5c04f2eb190e45ec7d7ffb058c189174780a ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=878000 -dbcache=30000 -reindex-chainstate -connect=0
3 1.07 ± 0.00 times faster than COMMIT=35bf426e02210c1bbb04926f4ca2e0285fbfcd11 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=878000 -dbcache=30000 -reindex-chainstate -connect=0
Indicating that the two changes combined result in 7% faster reindex (and likely IBD).
One of the drawbacks of this approach that I’d like guidance on (as noted by @sipa in the 64-MiB-bump PR as well), is that this will increase max memory usage temporarily at a critical time (e.g. while dumping the calculated final UTXO set), when we are already using the most memory. This can likely be mitigated in a few ways as far as I can tell:
- similarly to how we warn on low disk storage space, we could warn at the beginning when
dbcache
size is unrealistically big. (bitcoind -dbcache=300000
(300GiB) simply printsUsing 299990.0 MiB for in-memory UTXO set
- we could add a warning at least in this case). - merge #30611 by @andrewtoth and lose at most 1 hour’s worth of work.
- make the upstream LevelDB batch sortable (or automatically sort itself before insertion).