This change is part of [IBD] - Tracking PR for speeding up Initial Block Download
Summary
Block obfuscation is currently done byte-by-byte; this PR batches it into 64-bit primitives to speed up obfuscating larger memory chunks. This is especially relevant now that #31551 has been merged, which results in bigger obfuscatable chunks.
Since obfuscation is optional, the speedup measured here depends on whether the key is a random value or obfuscation is turned off completely (i.e. XOR-ing with 0).
Changes in testing, benchmarking and implementation
- Added new tests comparing randomized inputs against a trivial implementation and performing roundtrip checks with random chunks.
- An additional benchmark checks the effect of short-circuiting XOR when the key is zero, ensuring no speed regression occurs when the obfuscation feature is disabled.
- Migrated remaining `std::vector<std::byte>(8)` values to `uint64_t`.
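As a rough illustration of the first two bullets, the sketch below compares a batched XOR against a trivial byte-by-byte one on random inputs split at random chunk boundaries, then checks the roundtrip. `Key`, `XorSimple`, `XorBatched` and `RandomizedXorCheck` are hypothetical names made up for this example, not the functions touched by this PR, and the key offset is handled here by reordering key bytes rather than by bit rotation.

```cpp
#include <array>
#include <cassert> // for the assert-based usage shown below
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <vector>

using Key = std::array<std::byte, 8>;

// Trivial reference: one byte at a time, key position tracked via key_offset.
inline void XorSimple(std::byte* data, size_t size, const Key& key, size_t key_offset)
{
    for (size_t i = 0; i < size; ++i) data[i] ^= key[(key_offset + i) % key.size()];
}

// Candidate under test (illustrative stand-in): XOR in 64-bit batches.
inline void XorBatched(std::byte* data, size_t size, const Key& key, size_t key_offset)
{
    // Reorder the key bytes so that aligned[0] lines up with data[0].
    Key aligned;
    for (size_t i = 0; i < aligned.size(); ++i) aligned[i] = key[(key_offset + i) % key.size()];
    uint64_t k;
    std::memcpy(&k, aligned.data(), sizeof(k));
    if (k == 0) return; // obfuscation disabled: XOR with zero changes nothing

    size_t i = 0;
    for (; i + sizeof(k) <= size; i += sizeof(k)) {
        uint64_t word;
        std::memcpy(&word, data + i, sizeof(word));
        word ^= k;
        std::memcpy(data + i, &word, sizeof(word));
    }
    // Fewer than 8 bytes remain; i is a multiple of 8 here.
    for (; i < size; ++i) data[i] ^= aligned[i % aligned.size()];
}

// Returns true if the candidate matches the reference on random inputs and
// applying it twice restores the original bytes.
inline bool RandomizedXorCheck(uint32_t seed, int rounds)
{
    std::mt19937 rng(seed);
    for (int r = 0; r < rounds; ++r) {
        const bool zero_key = (r % 4 == 0); // also exercise the disabled case
        Key key;
        for (auto& b : key) b = zero_key ? std::byte{0} : static_cast<std::byte>(rng() & 0xff);

        std::vector<std::byte> original(rng() % 200);
        for (auto& b : original) b = static_cast<std::byte>(rng() & 0xff);
        const size_t start_offset = rng() % 64;

        // Expected result: the whole buffer through the trivial implementation.
        auto expected = original;
        XorSimple(expected.data(), expected.size(), key, start_offset);

        // Actual result: random-sized chunks, carrying the key offset along.
        auto actual = original;
        size_t pos = 0;
        while (pos < actual.size()) {
            const size_t len = 1 + rng() % (actual.size() - pos);
            XorBatched(actual.data() + pos, len, key, start_offset + pos);
            pos += len;
        }
        if (actual != expected) return false;

        // Roundtrip: XOR-ing again in one go must restore the input.
        XorBatched(actual.data(), actual.size(), key, start_offset);
        if (actual != original) return false;
    }
    return true;
}
```

A caller could run it as `assert(RandomizedXorCheck(42, 1000));`. A fixed seed keeps failures reproducible, which is a design choice of this sketch rather than something taken from the PR's tests.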
Reproducer and assembly
Memory alignment is handled via `std::memcpy`, optimized out on tested platforms (see https://godbolt.org/z/P4cWx91Kv):
- Clang (x86-64) - 128-bit SIMD (pxor), 256-bit unroll (4×64-bit)
- GCC (x86-64) - 64-bit XOR (QWORD), 128-bit unroll (2×64-bit)
- RISC-V (32-bit) - 64-bit via 32-bit registers, no unroll, byte-by-byte load/store
- s390x (big-endian) - 64-bit XOR (xc), 512-bit unroll (8×64-bit)
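A minimal sketch of the `std::memcpy` load/XOR/store pattern referred to above, assuming an 8-byte key held in a `uint64_t`. `XorWords` is a hypothetical name and this is not the code behind the godbolt link; it only shows the shape that compilers can typically reduce to plain or vectorized 64-bit loads and stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: XOR `size` bytes at `data` in place with a repeating
// 8-byte key, 64 bits at a time. `data` does not need to be 8-byte aligned.
inline void XorWords(std::byte* data, size_t size, uint64_t key)
{
    if (key == 0) return; // XOR with zero is a no-op, so skip the loop

    // Full 8-byte chunks: copy in, XOR, copy out. The copies express an
    // unaligned load/store without undefined behaviour.
    while (size >= sizeof(key)) {
        uint64_t word;
        std::memcpy(&word, data, sizeof(word));
        word ^= key;
        std::memcpy(data, &word, sizeof(word));
        data += sizeof(key);
        size -= sizeof(key);
    }

    // Tail of fewer than 8 bytes: only touch the bytes that exist.
    assert(size < sizeof(key));
    if (size > 0) {
        uint64_t word{0};
        std::memcpy(&word, data, size);
        word ^= key;
        std::memcpy(data, &word, size);
    }
}
```

Because the data is read, XOR-ed, and written back through the same in-memory representation, this part behaves the same on little- and big-endian hosts.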
Endianness
The only endianness issue was with bit rotation, intended to realign the key if obfuscation halted before full key consumption. Elsewhere, memory is read, processed, and written back in the same endianness, preserving byte order. Since CI lacks a big-endian machine, testing was done locally via Docker.
brew install podman pigz
softwareupdate --install-rosetta
podman machine init
podman machine start
docker run --platform linux/s390x -it ubuntu:latest /bin/bash
  apt update && apt install -y git build-essential cmake ccache pkg-config libevent-dev libboost-dev libssl-dev libsqlite3-dev && \
  cd /mnt && git clone https://github.com/bitcoin/bitcoin.git && cd bitcoin && git remote add l0rinc https://github.com/l0rinc/bitcoin.git && git fetch --all && git checkout l0rinc/optimize-xor && \
  cmake -B build && cmake --build build --target test_bitcoin -j$(nproc) && \
  ./build/bin/test_bitcoin --run_test=streams_tests
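The key realignment described in this section can be sketched as follows, assuming the key lives in a `uint64_t` whose in-memory bytes are the key bytes. `RotateKey`, `RotL`, `RotR` and `IsLittleEndian` are hypothetical helpers written so that they also compile as C++17; with C++20, `std::rotl`, `std::rotr` and `std::endian` from `<bit>` could replace them. The rotation direction is the only endianness-dependent part.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Rotations with the precondition bits < 64 (a shift by 64 would be undefined).
inline uint64_t RotL(uint64_t v, unsigned bits)
{
    assert(bits < 64);
    return bits == 0 ? v : (v << bits) | (v >> (64 - bits));
}

inline uint64_t RotR(uint64_t v, unsigned bits)
{
    assert(bits < 64);
    return bits == 0 ? v : (v >> bits) | (v << (64 - bits));
}

// Runtime endianness probe: inspect the first in-memory byte of the value 1.
inline bool IsLittleEndian()
{
    const uint16_t one{1};
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

// Hypothetical helper: given that `offset` bytes were already processed,
// return the key whose first in-memory byte is the next key byte to apply.
inline uint64_t RotateKey(uint64_t key, size_t offset)
{
    const unsigned bits = 8 * static_cast<unsigned>(offset % 8);
    // Little-endian: memory byte 0 is the lowest 8 bits, so shift bytes down.
    // Big-endian: memory byte 0 is the highest 8 bits, so shift bytes up.
    return IsLittleEndian() ? RotR(key, bits) : RotL(key, bits);
}
```

On a little-endian host, consuming 3 bytes rotates the key right by 24 bits so that key byte 3 becomes the first byte in memory; a big-endian host rotates left by the same amount.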
Measurements (micro benchmarks and full IBDs)
cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release \
 && cmake --build build -j$(nproc) \
 && build/bin/bench_bitcoin -filter='XorObfuscationBench' -min-time=10000
Before:
ns/MiB | MiB/s | err% | total | benchmark |
---|---|---|---|---|
731,927.62 | 1,366.26 | 0.2% | 10.67 | XorObfuscationBench |
After:
ns/MiB | MiB/s | err% | total | benchmark |
---|---|---|---|---|
14,730.40 | 67,886.80 | 0.1% | 11.01 | XorObfuscationBench |
Before:
ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
941,015.26 | 1,062.68 | 0.0% | 9,437,186.97 | 3,378,911.52 | 2.793 | 1,048,577.15 | 0.0% | 10.99 | XorObfuscationBench |
After:
ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
51,187.17 | 19,536.15 | 0.0% | 327,683.95 | 183,747.58 | 1.783 | 65,536.55 | 0.0% | 11.00 | XorObfuscationBench |
i.e. obfuscation is 49x faster on Mac (first pair of tables) and 18x faster on Linux (second pair, with performance counters)
A few other benchmarks also seem to have improved (tested with Clang only):
Before:
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
2,202,618.49 | 454.01 | 0.2% | 11.01 | ReadBlockBench |
734,444.92 | 1,361.57 | 0.3% | 10.66 | ReadRawBlockBench |
After:
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
1,912,308.06 | 522.93 | 0.4% | 10.98 | ReadBlockBench |
49,092.93 | 20,369.53 | 0.2% | 10.99 | ReadRawBlockBench |
i.e. `ReadRawBlockBench` is 15x faster, `ReadBlockBench` is 15% faster
Also visible on https://corecheck.dev/bitcoin/bitcoin/pulls/31144
Running an IBD up to block height 888888 shows a 4% speedup on both SSD and HDD.
SSD:
COMMITS="8324a00bd4a6a5291c841f2d01162d8a014ddb02 5ddfd31b4158a89b0007cfb2be970c03d9278525"; \
STOP_HEIGHT=888888; DBCACHE=1000; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done) && \
hyperfine \
  --sort 'command' \
  --runs 1 \
  --export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \
  --parameter-list COMMIT ${COMMITS// /,} \
  --prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
    cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && \
    cmake --build build -j$(nproc) --target bitcoind && \
    ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 100" \
  --cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
  "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0"
8324a00bd4 test: Compare util::Xor with randomized inputs against simple impl
5ddfd31b41 optimization: Xor 64 bits together instead of byte-by-byte
Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 8324a00bd4a6a5291c841f2d01162d8a014ddb02)
  Time (abs ≡): 25033.413 s [User: 33953.984 s, System: 2613.604 s]

Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 5ddfd31b4158a89b0007cfb2be970c03d9278525)
  Time (abs ≡): 24110.710 s [User: 33389.536 s, System: 2660.292 s]

Relative speed comparison
  1.04 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 8324a00bd4a6a5291c841f2d01162d8a014ddb02)
  1.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 5ddfd31b4158a89b0007cfb2be970c03d9278525)
HDD:
COMMITS="71eb6eaa740ad0b28737e90e59b89a8e951d90d9 46854038e7984b599d25640de26d4680e62caba7"; \
STOP_HEIGHT=888888; DBCACHE=4500; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done) && \
hyperfine \
  --sort 'command' \
  --runs 2 \
  --export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \
  --parameter-list COMMIT ${COMMITS// /,} \
  --prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
    cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && cmake --build build -j$(nproc) --target bitcoind && \
    ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 100" \
  --cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
  "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0"
71eb6eaa74 test: compare util::Xor with randomized inputs against simple impl
46854038e7 optimization: migrate fixed-size obfuscation from `std::vector<std::byte>` to `uint64_t`
Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 71eb6eaa740ad0b28737e90e59b89a8e951d90d9)
  Time (mean ± σ): 37676.293 s ± 83.100 s [User: 36900.535 s, System: 2220.382 s]
  Range (min … max): 37617.533 s … 37735.053 s 2 runs

Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 46854038e7984b599d25640de26d4680e62caba7)
  Time (mean ± σ): 36181.287 s ± 195.248 s [User: 34962.822 s, System: 1988.614 s]
  Range (min … max): 36043.226 s … 36319.349 s 2 runs

Relative speed comparison
  1.04 ± 0.01 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 71eb6eaa740ad0b28737e90e59b89a8e951d90d9)
  1.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 46854038e7984b599d25640de26d4680e62caba7)