This change is part of [IBD] - Tracking PR for speeding up Initial Block Download
Summary
We can serialize the blocks and undos to any `Stream` which implements the appropriate read/write methods. `AutoFile` is one of these, writing the results “directly” to disk (through the OS file cache). Batching these in memory first and then reading/writing them to disk is measurably faster (likely because of fewer native `fread` calls or less locking, as observed by Martinus in a similar change).
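To make the batching idea concrete, here is a minimal sketch in plain C++ on top of `std::FILE`. It does not use Bitcoin Core's `AutoFile` or `DataStream`; `Record`, `WriteUnbuffered`, `WriteBuffered`, `Serialize` and `ReadBuffered` are hypothetical names chosen for illustration. Both write paths produce identical bytes, but the buffered one makes a single `fwrite` call per record instead of one per field (and per payload byte).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a block or undo record: fixed-width fields
// followed by a variable-length payload.
struct Record {
    uint32_t version;
    uint64_t timestamp;
    std::vector<unsigned char> payload;
};

// Write one integer as little-endian bytes with its own fwrite call.
template <typename T>
bool PutLE(std::FILE* f, T value) {
    unsigned char bytes[sizeof(T)];
    for (size_t i = 0; i < sizeof(T); ++i) bytes[i] = static_cast<unsigned char>(value >> (8 * i));
    return std::fwrite(bytes, 1, sizeof(T), f) == sizeof(T);
}

// Unbuffered style: every field and every payload byte is a separate call
// into the stdio layer, each of which may lock the FILE internally.
bool WriteUnbuffered(std::FILE* f, const Record& r) {
    if (!PutLE(f, r.version) || !PutLE(f, r.timestamp)) return false;
    if (!PutLE(f, static_cast<uint32_t>(r.payload.size()))) return false;
    for (unsigned char b : r.payload) {
        if (std::fwrite(&b, 1, 1, f) != 1) return false;
    }
    return true;
}

// Append one integer as little-endian bytes to an in-memory buffer.
template <typename T>
void AppendLE(std::vector<unsigned char>& out, T value) {
    for (size_t i = 0; i < sizeof(T); ++i) out.push_back(static_cast<unsigned char>(value >> (8 * i)));
}

// Serialize the whole record into memory first (same byte layout as above).
std::vector<unsigned char> Serialize(const Record& r) {
    assert(r.payload.size() <= UINT32_MAX);
    std::vector<unsigned char> buf;
    buf.reserve(4 + 8 + 4 + r.payload.size());
    AppendLE(buf, r.version);
    AppendLE(buf, r.timestamp);
    AppendLE(buf, static_cast<uint32_t>(r.payload.size()));
    buf.insert(buf.end(), r.payload.begin(), r.payload.end());
    return buf;
}

// Buffered style: a single fwrite for the whole serialized record.
bool WriteBuffered(std::FILE* f, const Record& r) {
    const std::vector<unsigned char> buf = Serialize(r);
    return std::fwrite(buf.data(), 1, buf.size(), f) == buf.size();
}

// Buffered read: one fread of a known size, then deserialize from memory.
bool ReadBuffered(std::FILE* f, size_t size, std::vector<unsigned char>& out) {
    out.resize(size);
    return std::fread(out.data(), 1, size, f) == size;
}
```

The trade-off is one extra in-memory copy of the serialized record in exchange for far fewer library calls per record.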
Unlocking new optimization opportunities
Buffered writes will also enable batched obfuscation calculations (implemented in #31144), especially since we currently need to copy the write input's `std::span` to do the obfuscation on it, whereas batching enables doing the operations on the internal buffer directly.
Measurements (micro benchmarks, full IBDs and reindexes)
Microbenchmarks for `[Read|Write]BlockBench` show a ~30%/~69% speedup (1.31x/1.69x) with macOS/Clang and ~19%/24% with Linux/GCC (the follow-up XOR batching improves these further):
macOS/Clang, before:
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
2,271,441.67 | 440.25 | 0.1% | 11.00 | ReadBlockBench |
5,149,564.31 | 194.19 | 0.8% | 10.95 | WriteBlockBench |
macOS/Clang, after:
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
1,738,683.04 | 575.15 | 0.2% | 11.04 | ReadBlockBench |
3,052,658.88 | 327.58 | 1.0% | 10.91 | WriteBlockBench |
Linux/GCC, before:
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
6,895,987.11 | 145.01 | 0.0% | 71,055,269.86 | 23,977,374.37 | 2.963 | 5,074,828.78 | 0.4% | 22.00 | ReadBlockBench |
5,152,973.58 | 194.06 | 2.2% | 19,350,886.41 | 8,784,539.75 | 2.203 | 3,079,335.21 | 0.4% | 23.18 | WriteBlockBench |
Linux/GCC, after:
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
5,771,882.71 | 173.25 | 0.0% | 65,741,889.82 | 20,453,232.33 | 3.214 | 3,971,321.75 | 0.3% | 22.01 | ReadBlockBench |
4,145,681.13 | 241.21 | 4.0% | 15,337,596.85 | 5,732,186.47 | 2.676 | 2,239,662.64 | 0.1% | 23.94 | WriteBlockBench |
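The quoted speedups follow directly from the `ns/op` columns: since lower is better, the factor is before divided by after (equivalently, the ratio of the `op/s` columns). The small helper below is a hypothetical illustration; it maps the first Before/After pair to macOS/Clang and the second to Linux/GCC, which is what the quoted percentages imply.

```cpp
#include <cassert>
#include <cstdio>

// Speedup factor from the ns/op column (lower is better):
// 1.30 means "after" completes 30% more operations per second than "before".
double Speedup(double before_ns_per_op, double after_ns_per_op) {
    assert(after_ns_per_op > 0);
    return before_ns_per_op / after_ns_per_op;
}

struct BenchRow {
    const char* name;
    double before_ns;
    double after_ns;
};

// ns/op values copied from the benchmark tables.
const BenchRow kRows[] = {
    {"macOS/Clang ReadBlockBench", 2271441.67, 1738683.04},
    {"macOS/Clang WriteBlockBench", 5149564.31, 3052658.88},
    {"Linux/GCC ReadBlockBench", 6895987.11, 5771882.71},
    {"Linux/GCC WriteBlockBench", 5152973.58, 4145681.13},
};

// Print each benchmark's speedup factor and the equivalent percentage.
void PrintSpeedups() {
    for (const BenchRow& row : kRows) {
        const double s = Speedup(row.before_ns, row.after_ns);
        std::printf("%-28s %.2fx (+%.0f%%)\n", row.name, s, (s - 1.0) * 100.0);
    }
}
```

This yields roughly 1.31x and 1.69x for read and write on macOS/Clang, and 1.19x and 1.24x on Linux/GCC.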
Two full IBD runs per commit against master (compiled with GCC, where the gains seem more modest) up to block 888888 (seeded from real nodes) indicate a ~7% total speedup.
```bash
COMMITS="d2b72b13699cf460ffbcb1028bcf5f3b07d3b73a 652b4e3de5c5e09fb812abe265f4a8946fa96b54"; \
STOP_HEIGHT=888888; DBCACHE=1000; \
C_COMPILER=gcc; CXX_COMPILER=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done) && \
hyperfine \
  --sort 'command' \
  --runs 2 \
  --export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$C_COMPILER.json" \
  --parameter-list COMMIT ${COMMITS// /,} \
  --prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
    cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF -DCMAKE_C_COMPILER=$C_COMPILER -DCMAKE_CXX_COMPILER=$CXX_COMPILER && \
    cmake --build build -j$(nproc) --target bitcoind && \
    ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 100" \
  --cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
  "COMPILER=$C_COMPILER COMMIT=${COMMIT:0:10} ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0"
```

```
d2b72b1369 refactor: rename leftover WriteBlockBench
652b4e3de5 optimization: Bulk serialization writes in `WriteBlockUndo` and `WriteBlock`
Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = d2b72b13699cf460ffbcb1028bcf5f3b07d3b73a)
  Time (mean ± σ): 41528.104 s ± 354.003 s [User: 44324.407 s, System: 3074.829 s]
  Range (min … max): 41277.786 s … 41778.421 s 2 runs

Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 652b4e3de5c5e09fb812abe265f4a8946fa96b54)
  Time (mean ± σ): 38771.457 s ± 441.941 s [User: 41930.651 s, System: 3222.664 s]
  Range (min … max): 38458.957 s … 39083.957 s 2 runs

Relative speed comparison
  1.07 ± 0.02 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = d2b72b13699cf460ffbcb1028bcf5f3b07d3b73a)
  1.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 652b4e3de5c5e09fb812abe265f4a8946fa96b54)
```
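The `1.07 ± 0.02` relative speed can be reproduced from the two means and standard deviations using first-order error propagation for a ratio, σ_r = r·sqrt((σ₁/m₁)² + (σ₂/m₂)²). We assume this is the formula hyperfine applies internally; it does match the reported figure. `CompareMeans` and `RelativeSpeed` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct RelativeSpeed {
    double ratio;   // baseline mean / candidate mean (> 1 means candidate is faster)
    double stddev;  // propagated uncertainty of the ratio
};

// Ratio of two mean run times with first-order (Gaussian) error propagation:
// sigma_r = r * sqrt((s1/m1)^2 + (s2/m2)^2).
RelativeSpeed CompareMeans(double baseline_mean, double baseline_sd,
                           double candidate_mean, double candidate_sd) {
    assert(baseline_mean > 0 && candidate_mean > 0);
    const double ratio = baseline_mean / candidate_mean;
    const double rel = std::sqrt(std::pow(baseline_sd / baseline_mean, 2) +
                                 std::pow(candidate_sd / candidate_mean, 2));
    return {ratio, ratio * rel};
}
```

Plugging in master (41528.104 s ± 354.003 s) and this change (38771.457 s ± 441.941 s) gives a ratio of about 1.071 with an uncertainty of about 0.015, i.e. the `1.07 ± 0.02` above. With only two runs per commit the σ values are themselves rough estimates, so the interval should be read loosely.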