This change is part of [IBD] - Tracking PR for speeding up Initial Block Download
Summary
We can serialize the blocks and undos to any `Stream` which implements the appropriate read/write methods. `AutoFile` is one of these, writing the results “directly” to disk (through the OS file cache). Batching these in memory first and then reading/writing them to disk in bulk is measurably faster (likely because of fewer native `fread` calls or less locking, as observed by @martinus in a similar change).
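The batching idea can be sketched in isolation. The class below is a hypothetical illustration, not Bitcoin Core's `AutoFile` or any type from this PR: it collects small writes in memory and passes them to `fwrite` in one call per full buffer, counting those calls so the reduction is visible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical sketch only; the name, capacity and interface are assumptions.
class BufferedWriterSketch
{
    std::FILE* m_file;
    std::vector<std::byte> m_buf;
    size_t m_used{0};
    size_t m_fwrite_calls{0};

public:
    explicit BufferedWriterSketch(std::FILE* file, size_t capacity = 1 << 16)
        : m_file{file}, m_buf(capacity)
    {
        assert(file != nullptr && capacity > 0);
    }

    // Write out whatever is still buffered when the writer goes away.
    ~BufferedWriterSketch() { flush(); }

    // Append bytes to the in-memory buffer, flushing only when it is full.
    bool write(const void* data, size_t len)
    {
        auto src{static_cast<const std::byte*>(data)};
        while (len > 0) {
            if (m_used == m_buf.size() && !flush()) return false;
            const size_t n{std::min(len, m_buf.size() - m_used)};
            std::memcpy(m_buf.data() + m_used, src, n);
            m_used += n;
            src += n;
            len -= n;
        }
        return true;
    }

    // Hand everything buffered so far to the C library in a single call.
    bool flush()
    {
        if (m_used == 0) return true;
        ++m_fwrite_calls;
        const bool ok{std::fwrite(m_buf.data(), 1, m_used, m_file) == m_used};
        m_used = 0;
        return ok;
    }

    size_t fwrite_calls() const { return m_fwrite_calls; }
};
```

With an 8-byte buffer, writing 3, 5 and 2 bytes followed by an explicit `flush()` results in two `fwrite` calls instead of three. With a realistic buffer size, many field-sized writes collapse into a handful of calls.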
Unlocking new optimization opportunities
Buffered writes will also enable batched obfuscation calculations (implemented in #31144). Currently we need to copy the write input’s `std::span` to obfuscate it, whereas batching lets us do the operation directly on the internal buffer.
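To make the copy-versus-in-place difference concrete, here is a minimal sketch. It assumes a rolling XOR key of 8 bytes, and every name in it (`KeySketch`, `XorInPlace`, `ObfuscatedCopy`) is hypothetical; it is not the code from #31144.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the obfuscation key; the 8-byte size is an assumption.
using KeySketch = std::array<std::byte, 8>;

// XOR `data` in place with a rolling key. `key_offset` is the position of
// data[0] in the output, so chunks processed separately use the same key bytes
// they would get in one big pass.
inline void XorInPlace(std::byte* data, size_t len, const KeySketch& key, size_t key_offset)
{
    assert(data != nullptr || len == 0);
    for (size_t i{0}; i < len; ++i) {
        data[i] ^= key[(key_offset + i) % key.size()];
    }
}

// Unbuffered shape: the caller's bytes are read-only, so every write has to
// copy them before they can be obfuscated.
inline std::vector<std::byte> ObfuscatedCopy(const std::byte* src, size_t len, const KeySketch& key, size_t key_offset)
{
    std::vector<std::byte> copy(src, src + len);
    XorInPlace(copy.data(), copy.size(), key, key_offset);
    return copy;
}
```

In the buffered shape the writer already owns a mutable buffer, so a single `XorInPlace(buffer.data(), buffer.size(), key, file_pos)` just before flushing replaces one copy-and-XOR per write. Both paths produce identical bytes as long as the offsets are tracked consistently.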
Measurements (micro benchmarks, full IBDs and reindexes)
Microbenchmarks for `[Read|Write]BlockBench` show a ~30%/69% speedup with macOS/Clang, and ~19%/24% with Linux/GCC (the follow-up XOR batching improves these further):
Before (macOS/Clang):
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
2,271,441.67 | 440.25 | 0.1% | 11.00 | ReadBlockBench |
5,149,564.31 | 194.19 | 0.8% | 10.95 | WriteBlockBench |
After (macOS/Clang):
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
1,738,683.04 | 575.15 | 0.2% | 11.04 | ReadBlockBench |
3,052,658.88 | 327.58 | 1.0% | 10.91 | WriteBlockBench |
Before (Linux/GCC):
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
6,895,987.11 | 145.01 | 0.0% | 71,055,269.86 | 23,977,374.37 | 2.963 | 5,074,828.78 | 0.4% | 22.00 | ReadBlockBench |
5,152,973.58 | 194.06 | 2.2% | 19,350,886.41 | 8,784,539.75 | 2.203 | 3,079,335.21 | 0.4% | 23.18 | WriteBlockBench |
After (Linux/GCC):
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
5,771,882.71 | 173.25 | 0.0% | 65,741,889.82 | 20,453,232.33 | 3.214 | 3,971,321.75 | 0.3% | 22.01 | ReadBlockBench |
4,145,681.13 | 241.21 | 4.0% | 15,337,596.85 | 5,732,186.47 | 2.676 | 2,239,662.64 | 0.1% | 23.94 | WriteBlockBench |
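For readers who want to check the percentages against the tables, the gain follows from the `ns/op` columns as `before / after - 1`. The helper below is a hypothetical convenience for that arithmetic, not part of the benchmark suite; the inputs are copied from the tables above.

```cpp
#include <cassert>

// Relative throughput gain in percent from two per-operation timings.
// A result of 30.0 means 30% more operations complete per second.
inline double SpeedupPercent(double before_ns_per_op, double after_ns_per_op)
{
    assert(before_ns_per_op > 0.0 && after_ns_per_op > 0.0);
    return (before_ns_per_op / after_ns_per_op - 1.0) * 100.0;
}

// First pair of tables (no hardware counters).
const double READ_GAIN_1{SpeedupPercent(2'271'441.67, 1'738'683.04)};  // roughly 30.6
const double WRITE_GAIN_1{SpeedupPercent(5'149'564.31, 3'052'658.88)}; // roughly 68.7

// Second pair of tables (with ins/op, cyc/op and IPC columns).
const double READ_GAIN_2{SpeedupPercent(6'895'987.11, 5'771'882.71)};  // roughly 19.5
const double WRITE_GAIN_2{SpeedupPercent(5'152'973.58, 4'145'681.13)}; // roughly 24.3
```

The same ratio applied to the `op/s` columns (after / before - 1) gives matching values, since `op/s` is the reciprocal of `ns/op`.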
Two full IBD runs against master (compiled with GCC, where the gains seem more modest) for 888888 blocks (seeded from real nodes) indicate a ~7% total speedup.
COMMITS="8af40aaf283cc81fe2b3cc125d21f090c562460e 5c21f6c26fd6b6eab33e208cd565577735fe9aa7 97d30e2534b0a8c388a4a2059277f64d3b2243af"; \
STOP_HEIGHT=888888; DBCACHE=1000; \
C_COMPILER=gcc; CXX_COMPILER=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="${BASE_DIR}/BitcoinData"; LOG_DIR="${BASE_DIR}/logs"; \
git fetch --all -q && (for c in $COMMITS; do git fetch origin $c -q && git log -1 --oneline $c || exit 1; done) && \
hyperfine \
  --export-json "${BASE_DIR}/ibd-${COMMITS// /-}-${STOP_HEIGHT}-${DBCACHE}-${C_COMPILER}.json" \
  --runs 2 \
  --parameter-list COMMIT ${COMMITS// /,} \
  --prepare "killall bitcoind; rm -rf ${DATA_DIR}/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
    cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF -DCMAKE_C_COMPILER=$C_COMPILER -DCMAKE_CXX_COMPILER=$CXX_COMPILER && \
    cmake --build build -j$(nproc) --target bitcoind && \
    ./build/bin/bitcoind -datadir=${DATA_DIR} -stopatheight=1 -printtoconsole=0" \
  --cleanup "cp ${DATA_DIR}/debug.log ${LOG_DIR}/debug-{COMMIT}-$(date +%s).log" \
  "COMPILER=$C_COMPILER COMMIT={COMMIT} ./build/bin/bitcoind -datadir=${DATA_DIR} -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0"

8af40aaf28 refactor: replace raw values in ReadRawBlock with HEADER_BYTE_SIZE
5c21f6c26f optimization: Bulk serialization reads in `UndoRead`, `ReadBlock`, and `ReadRawBlock`
97d30e2534 optimization: Bulk serialization writes in `WriteBlockUndo` and `WriteBlock`

Benchmark 1: COMPILER=gcc COMMIT=8af40aaf283cc81fe2b3cc125d21f090c562460e ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0
  Time (mean ± σ): 30899.137 s ± 501.097 s [User: 43094.744 s, System: 3295.980 s]
  Range (min … max): 30544.808 s … 31253.467 s 2 runs

Benchmark 2: COMPILER=gcc COMMIT=5c21f6c26fd6b6eab33e208cd565577735fe9aa7 ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0
  Time (mean ± σ): 29956.405 s ± 111.231 s [User: 42278.888 s, System: 3239.997 s]
  Range (min … max): 29877.754 s … 30035.057 s 2 runs

Benchmark 3: COMPILER=gcc COMMIT=97d30e2534b0a8c388a4a2059277f64d3b2243af ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0
  Time (mean ± σ): 28826.445 s ± 147.749 s [User: 41450.577 s, System: 2955.840 s]
  Range (min … max): 28721.970 s … 28930.919 s 2 runs

Summary
  COMPILER=gcc COMMIT=97d30e2534b0a8c388a4a2059277f64d3b2243af ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 ran
    1.04 ± 0.01 times faster than COMPILER=gcc COMMIT=5c21f6c26fd6b6eab33e208cd565577735fe9aa7 ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0
    1.07 ± 0.02 times faster than COMPILER=gcc COMMIT=8af40aaf283cc81fe2b3cc125d21f090c562460e ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0