This change is part of [IBD] - Tracking PR for speeding up Initial Block Download
This PR is in draft until I remeasure everything after the recent merges and find a cleaner way to simplify the 1-byte writes; I don't like all the specializations.
Summary: This PR contains a few different optimizations found through IBD profiling and the newly added block serialization benchmarks. It also takes advantage of the recently merged `std::span` changes, which enable propagating static extents.
The commits merge similar (de)serialization methods and separate them internally with `if constexpr`, similar to how it has been done here before. This also enables further `SizeComputer` optimizations.
Context: Single-byte writes are used very often (for every `(u)int8_t`, `std::byte`, and `bool`, and for the first byte of every `VarInt`, which is also needed for every (pre)vector), so it makes sense to bypass the generalized serialization infrastructure where it isn't needed:
- `AutoFile::write` no longer needs to allocate a 4k buffer for a single byte;
- `VectorWriter` and `DataStream` avoid `memcpy`/`insert` calls;
- `CSHA256::Write` can avoid `memcpy`.
`DeserializeBlock` is dominated by hash calculations, so the optimizations barely affect it.
Measurements:
<details>
<summary>C compiler ............................ AppleClang 16.0.0.16000026</summary>

Before:

|            ns/block |             block/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|          195,610.62 |            5,112.20 |    0.3% |     11.00 | `SerializeBlock`
|           12,061.83 |           82,906.19 |    0.1% |     11.01 | `SizeComputerBlock`

After:

|            ns/block |             block/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|          174,569.19 |            5,728.39 |    0.6% |     10.89 | `SerializeBlock`
|           10,241.16 |           97,645.21 |    0.0% |     11.00 | `SizeComputerBlock`

</details>
- `SerializeBlock`: ~12% faster
- `SizeComputerBlock`: ~17.7% faster
<details>
<summary>C++ compiler .......................... GNU 13.3.0</summary>

Before:

|            ns/block |             block/s |    err% |       ins/block |       cyc/block |    IPC |      bra/block |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          867,857.55 |            1,152.26 |    0.0% |    8,015,883.90 |    3,116,099.08 |  2.572 |   1,517,035.87 |    0.5% |     10.81 | `SerializeBlock`
|           30,928.27 |           32,332.88 |    0.0% |      221,683.03 |      111,055.84 |  1.996 |      53,037.03 |    0.8% |     11.03 | `SizeComputerBlock`

After:

|            ns/block |             block/s |    err% |       ins/block |       cyc/block |    IPC |      bra/block |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          615,000.56 |            1,626.01 |    0.0% |    8,015,883.64 |    2,208,340.88 |  3.630 |   1,517,035.62 |    0.5% |     10.56 | `SerializeBlock`
|           25,676.76 |           38,945.72 |    0.0% |      159,390.03 |       92,202.10 |  1.729 |      42,131.03 |    0.9% |     11.00 | `SizeComputerBlock`

</details>
- `SerializeBlock`: ~41.1% faster
- `SizeComputerBlock`: ~20.4% faster
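The "faster" percentages appear to be throughput gains, computed as (ns/block before) / (ns/block after) - 1. Treating that definition as an assumption, the sketch below recomputes them from the ns/block columns of the tables above. Truncated to one decimal, it gives 12.0%, 17.7%, 41.1% and 20.4%. `SpeedupPercent`, `Row` and `kRows` are illustrative names.

```cpp
#include <cassert>
#include <cstdio>

// Assumed definition: relative throughput gain in percent,
// i.e. how many more blocks per second the new code processes.
double SpeedupPercent(double ns_before, double ns_after)
{
    return (ns_before / ns_after - 1.0) * 100.0;
}

struct Row {
    const char* compiler;
    const char* benchmark;
    double ns_before;  // ns/block from the "Before" table
    double ns_after;   // ns/block from the "After" table
};

// Values copied from the benchmark tables above.
constexpr Row kRows[] = {
    {"AppleClang 16.0.0", "SerializeBlock", 195'610.62, 174'569.19},
    {"AppleClang 16.0.0", "SizeComputerBlock", 12'061.83, 10'241.16},
    {"GCC 13.3.0", "SerializeBlock", 867'857.55, 615'000.56},
    {"GCC 13.3.0", "SizeComputerBlock", 30'928.27, 25'676.76},
};

void PrintSpeedups()
{
    for (const Row& r : kRows) {
        std::printf("%-18s %-18s %6.2f%% faster\n", r.compiler, r.benchmark,
                    SpeedupPercent(r.ns_before, r.ns_after));
    }
}
```

Measuring the reduction in time per block instead, (before - after) / before, gives smaller numbers for the same data (about 29.1% for GCC `SerializeBlock`), so the definition matters when comparing figures.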
While this wasn't the main motivation for the change, an IBD run on Ubuntu/GCC (SSD, i9 CPU) also indicates a ~2% speedup:
<details>
<summary>Details</summary>

```bash
COMMITS="05314bde0b06b820225f10c6529b5afae128ff81 1cd94ec2511874ec68b92db34ad7ec7d9534fed1"; \
STOP_HEIGHT=880000; DBCACHE=10000; \
C_COMPILER=gcc; CXX_COMPILER=g++; \
hyperfine \
  --export-json "/mnt/my_storage/ibd-${COMMITS// /-}-${STOP_HEIGHT}-${DBCACHE}-${C_COMPILER}.json" \
  --runs 3 \
  --parameter-list COMMIT ${COMMITS// /,} \
  --prepare "killall bitcoind || true; rm -rf /mnt/my_storage/BitcoinData/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF -DCMAKE_C_COMPILER=$C_COMPILER -DCMAKE_CXX_COMPILER=$CXX_COMPILER && cmake --build build -j$(nproc) --target bitcoind && ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=1 -printtoconsole=0 || true" \
  --cleanup "cp /mnt/my_storage/BitcoinData/debug.log /mnt/my_storage/logs/debug-{COMMIT}-$(date +%s).log || true" \
  "COMPILER=$C_COMPILER COMMIT={COMMIT} ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -prune=550 -printtoconsole=0"

Benchmark 1: COMPILER=gcc COMMIT=05314bde0b06b820225f10c6529b5afae128ff81 ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=10000 -prune=550 -printtoconsole=0
  Time (mean ± σ):     33647.918 s ± 508.655 s    [User: 71503.409 s, System: 4404.899 s]
  Range (min … max):   33283.439 s … 34229.026 s    3 runs

Benchmark 2: COMPILER=gcc COMMIT=1cd94ec2511874ec68b92db34ad7ec7d9534fed1 ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=10000 -prune=550 -printtoconsole=0
  Time (mean ± σ):     33062.491 s ± 183.335 s    [User: 71246.532 s, System: 4318.490 s]
  Range (min … max):   32888.211 s … 33253.706 s    3 runs

Summary
  COMPILER=gcc COMMIT=1cd94ec2511874ec68b92db34ad7ec7d9534fed1 ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=10000 -prune=550 -printtoconsole=0 ran
    1.02 ± 0.02 times faster than COMPILER=gcc COMMIT=05314bde0b06b820225f10c6529b5afae128ff81 ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=10000 -prune=550 -printtoconsole=0
```
</details>
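The hyperfine summary can be cross-checked from the two `Time (mean ± σ)` rows. The sketch below assumes that hyperfine reports the ratio of the mean run times and combines the two relative standard deviations in quadrature (standard first-order error propagation for a ratio). Under that assumption it gives roughly 1.018 ± 0.016, which rounds to the reported 1.02 ± 0.02. `Timing`, `RelativeSpeed` and `RelativeSpeedStddev` are illustrative names.

```cpp
#include <cassert>
#include <cmath>

struct Timing {
    double mean_s;    // mean wall-clock time in seconds
    double stddev_s;  // standard deviation in seconds
};

// How many times faster `fast` is than `slow` (ratio of means).
double RelativeSpeed(Timing slow, Timing fast)
{
    return slow.mean_s / fast.mean_s;
}

// First-order error propagation for a ratio: the relative standard
// deviations of numerator and denominator add in quadrature.
double RelativeSpeedStddev(Timing slow, Timing fast)
{
    const double rel_slow = slow.stddev_s / slow.mean_s;
    const double rel_fast = fast.stddev_s / fast.mean_s;
    return RelativeSpeed(slow, fast) * std::sqrt(rel_slow * rel_slow + rel_fast * rel_fast);
}
```

With only three runs per commit, the ± term is about as large as the effect itself, which is consistent with describing the 2% as an indication rather than a firm result.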