This PR contains a few different optimizations I found by IBD profiling and via the newly added block serialization benchmarks.
The commits merge similar (de)serialization methods and separate them internally with `if constexpr`, similarly to how it has been done here before. This enabled further `SizeComputer` optimizations as well.
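A minimal sketch of the merging idea, not the PR's actual code: `CountingSink`, `VectorSink`, `SerializeInt` and `MergedSerializeExample` are hypothetical names introduced for illustration only. It shows one template standing in for several per-type overloads, with the size-only case and the single-byte case separated at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical sink that only counts bytes (in the spirit of a size computer).
struct CountingSink {
    std::size_t size{0};
};

// Hypothetical sink that stores the serialized bytes.
struct VectorSink {
    std::vector<std::byte> data;
    void write(const std::byte* p, std::size_t n) { data.insert(data.end(), p, p + n); }
};

// One template stands in for a family of per-type overloads. The branches are
// chosen at compile time, so the merged function adds no runtime dispatch.
template <typename Sink, typename T>
void SerializeInt(Sink& sink, [[maybe_unused]] T value)
{
    static_assert(std::is_integral_v<T>, "illustration covers integral types only");
    if constexpr (std::is_same_v<Sink, CountingSink>) {
        sink.size += sizeof(T); // only the size matters; skip the byte conversion
    } else if constexpr (sizeof(T) == 1) {
        const auto b = static_cast<std::byte>(static_cast<unsigned char>(value));
        sink.write(&b, 1);
    } else {
        // Fixed little-endian layout, independent of the host byte order.
        using U = std::make_unsigned_t<T>;
        const U u = static_cast<U>(value);
        std::byte buf[sizeof(T)];
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            buf[i] = static_cast<std::byte>((u >> (8 * i)) & 0xff);
        }
        sink.write(buf, sizeof(T));
    }
}

// Usage: the same call works for both sinks and for every integer width.
inline void MergedSerializeExample()
{
    VectorSink out;
    SerializeInt(out, std::uint8_t{0x2a});
    SerializeInt(out, std::uint32_t{0x01020304});
    assert(out.data.size() == 5);
    assert(out.data[1] == std::byte{0x04}); // least significant byte first

    CountingSink counter;
    SerializeInt(counter, true);
    SerializeInt(counter, std::int64_t{-1});
    assert(counter.size == 9);
}
```

Because the size-counting branch never touches the value's bytes, a size computation in this style can skip the conversion work entirely, which is one plausible reason a merge like this opens the door to size-only optimizations.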
Other than these, since single byte writes are used very often (for every `(u)int8_t`, `std::byte`, or `bool`, and for the first byte of every `VarInt`, which is also needed for every `(pre)Vector`), it makes sense to avoid the generalized serialization infrastructure where it isn't needed:
- `AutoFile` write doesn't need to allocate a 4k buffer for a single byte now;
- `VectorWriter` and `DataStream` avoid memcpy/insert calls.
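The single-byte fast path can be sketched as follows. This is a simplified illustration under assumptions, not the PR's implementation: `SketchVectorWriter` and `SingleByteExample` are hypothetical names, and the real classes have more responsibilities (positions, obfuscation, buffering) that are left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified writer that appends to a vector.
class SketchVectorWriter {
public:
    explicit SketchVectorWriter(std::vector<std::byte>& out) : m_out(out) {}

    // General path: handles any length through a range insert.
    void write(const std::byte* src, std::size_t len)
    {
        m_out.insert(m_out.end(), src, src + len);
    }

    // Fast path: a single byte needs no pointer/length handling or range insert.
    void write(std::byte b) { m_out.push_back(b); }

private:
    std::vector<std::byte>& m_out;
};

// Usage: both paths produce identical bytes; only the work done differs.
inline void SingleByteExample()
{
    std::vector<std::byte> a, b;
    SketchVectorWriter fast(a), general(b);
    const std::byte flag{0x01};
    fast.write(flag);        // dedicated single-byte overload
    general.write(&flag, 1); // generalized path, same result
    assert(a == b);
}
```

The dedicated overload keeps the output format unchanged, so callers only need to route one-byte types to it.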
`DeserializeBlock` is dominated by the hash calculations, so the optimizations barely affect it.
Before:
ns/block | block/s | err% | total | benchmark |
---|---|---|---|---|
936,285.45 | 1,068.05 | 0.1% | 11.01 | DeserializeBlock |
194,330.04 | 5,145.88 | 0.2% | 10.97 | SerializeBlock |
12,215.05 | 81,866.19 | 0.0% | 11.00 | SizeComputerBlock |
After:
ns/block | block/s | err% | total | benchmark |
---|---|---|---|---|
888,859.82 | 1,125.04 | 0.4% | 10.87 | DeserializeBlock |
168,502.88 | 5,934.62 | 0.1% | 10.99 | SerializeBlock |
10,200.88 | 98,030.75 | 0.1% | 11.00 | SizeComputerBlock |
- `DeserializeBlock` - 5.3% faster
- `SerializeBlock` - 15.3% faster
- `SizeComputerBlock` - 19.7% faster
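These percentages are consistent with taking the ratio of the ns/block values before and after the change. The formula below is an assumption about how they were derived, checked against the numbers in the two tables above:

```latex
\text{speedup} = \frac{t_{\text{before}}}{t_{\text{after}}} - 1
\qquad
\begin{aligned}
\text{DeserializeBlock:}\quad  & \frac{936{,}285.45}{888{,}859.82} - 1 \approx 0.053 \\
\text{SerializeBlock:}\quad    & \frac{194{,}330.04}{168{,}502.88} - 1 \approx 0.153 \\
\text{SizeComputerBlock:}\quad & \frac{12{,}215.05}{10{,}200.88} - 1 \approx 0.197
\end{aligned}
```

The same ratio follows from the block/s column, since throughput is the inverse of the time per block.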
Before:
ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
4,447,243.87 | 224.86 | 0.0% | 53,689,737.58 | 15,966,336.86 | 3.363 | 2,409,315.46 | 0.5% | 11.01 | DeserializeBlock |
869,833.14 | 1,149.65 | 0.0% | 8,015,883.90 | 3,123,013.80 | 2.567 | 1,517,035.87 | 0.5% | 10.81 | SerializeBlock |
26,535.51 | 37,685.36 | 0.0% | 225,261.03 | 95,278.40 | 2.364 | 53,037.03 | 0.6% | 11.00 | SizeComputerBlock |
After:
ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
4,460,428.52 | 224.19 | 0.0% | 53,692,507.13 | 16,015,347.97 | 3.353 | 2,410,105.48 | 0.5% | 11.01 | DeserializeBlock |
567,042.65 | 1,763.54 | 0.0% | 7,386,775.59 | 2,035,613.84 | 3.629 | 1,385,368.57 | 0.5% | 11.01 | SerializeBlock |
25,728.56 | 38,867.32 | 0.0% | 172,750.03 | 92,366.64 | 1.870 | 42,131.03 | 1.7% | 11.00 | SizeComputerBlock |
- `DeserializeBlock` - same speed
- `SerializeBlock` - 53.3% faster
- `SizeComputerBlock` - 3.1% faster
While this wasn't the main motivation for the change, IBD on Ubuntu/GCC on an SSD with an i9 indicates a roughly 2% speedup as well (1.02 ± 0.02 times faster over 3 runs per commit):
```bash
COMMITS="05314bde0b06b820225f10c6529b5afae128ff81 1cd94ec2511874ec68b92db34ad7ec7d9534fed1"; \
STOP_HEIGHT=880000; DBCACHE=10000; \
C_COMPILER=gcc; CXX_COMPILER=g++; \
hyperfine \
  --export-json "/mnt/my_storage/ibd-${COMMITS// /-}-${STOP_HEIGHT}-${DBCACHE}-${C_COMPILER}.json" \
  --runs 3 \
  --parameter-list COMMIT ${COMMITS// /,} \
  --prepare "killall bitcoind || true; rm -rf /mnt/my_storage/BitcoinData/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF -DCMAKE_C_COMPILER=$C_COMPILER -DCMAKE_CXX_COMPILER=$CXX_COMPILER && cmake --build build -j$(nproc) --target bitcoind && ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=1 -printtoconsole=0 || true" \
  --cleanup "cp /mnt/my_storage/BitcoinData/debug.log /mnt/my_storage/logs/debug-{COMMIT}-$(date +%s).log || true" \
  "COMPILER=$C_COMPILER COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -prune=550 -printtoconsole=0"
```

```
Benchmark 1: COMPILER=gcc COMMIT=05314bde0b06b820225f10c6529b5afae128ff81 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=10000 -prune=550 -printtoconsole=0
  Time (mean ± σ): 33647.918 s ± 508.655 s [User: 71503.409 s, System: 4404.899 s]
  Range (min … max): 33283.439 s … 34229.026 s 3 runs

Benchmark 2: COMPILER=gcc COMMIT=1cd94ec2511874ec68b92db34ad7ec7d9534fed1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=10000 -prune=550 -printtoconsole=0
  Time (mean ± σ): 33062.491 s ± 183.335 s [User: 71246.532 s, System: 4318.490 s]
  Range (min … max): 32888.211 s … 33253.706 s 3 runs

Summary
  COMPILER=gcc COMMIT=1cd94ec2511874ec68b92db34ad7ec7d9534fed1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=10000 -prune=550 -printtoconsole=0 ran
    1.02 ± 0.02 times faster than COMPILER=gcc COMMIT=05314bde0b06b820225f10c6529b5afae128ff81 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=10000 -prune=550 -printtoconsole=0
```