This change is part of [IBD] - Tracking PR for speeding up Initial Block Download
Summary
Block obfuscation is currently done byte-by-byte; this PR batches it into 64-bit primitives to speed up obfuscating larger memory regions. This is especially relevant now that #31551 has been merged, which results in bigger obfuscatable chunks.
Since obfuscation is optional, the speedup measured here depends on whether the key is a random value or obfuscation is completely turned off (i.e. XOR-ing with 0).
Changes in testing, benchmarking and implementation
- Added new tests comparing randomized inputs against a trivial implementation and performing roundtrip checks with random chunks.
- Migrated `std::vector<std::byte>(8)` keys to plain `uint64_t`;
- Process unaligned bytes separately and unroll body to 64 bytes.
Assembly
Memory alignment is enforced by a small peel-loop (`std::memcpy` is optimized out on tested platform), with an `std::assume_aligned<8>` check, see the Godbolt listing at https://godbolt.org/z/35nveanf5 for details.
Target & Compiler | Stride (per hot-loop iter) | Main operation(s) in loop | Effective XORs / iter |
---|---|---|---|
Clang x86-64 (trunk) | 64 bytes | 4 × movdqu → pxor → store | 8 × 64-bit |
GCC x86-64 (trunk) | 64 bytes | 4 × movdqu/pxor sequence, enabled by 8-way unroll | 8 × 64-bit |
GCC RV32 (trunk) | 8 bytes | copy 8 B to temp → 2 × 32-bit XOR → copy back | 1 × 64-bit (as 2 × 32-bit) |
GCC s390x (big-endian 14.2) | 64 bytes | 8 × XC (mem-mem 8-B XOR) with key cached on stack | 8 × 64-bit |
Endianness
The only endianness issue was with bit rotation, intended to realign the key if obfuscation halted before full key consumption. Elsewhere, memory is read, processed, and written back in the same endianness, preserving byte order. Since CI lacks a big-endian machine, testing was done locally via Docker.
brew install podman pigz
softwareupdate --install-rosetta
podman machine init
podman machine start
docker run --platform linux/s390x -it ubuntu:latest /bin/bash
  apt update && apt install -y git build-essential cmake ccache pkg-config libevent-dev libboost-dev libssl-dev libsqlite3-dev python3 && \
  cd /mnt && git clone --depth=1 https://github.com/bitcoin/bitcoin.git && cd bitcoin && git remote add l0rinc https://github.com/l0rinc/bitcoin.git && git fetch --all && git checkout l0rinc/optimize-xor && \
  cmake -B build && cmake --build build --target test_bitcoin -j$(nproc) && \
  ./build/bin/test_bitcoin --run_test=streams_tests
Measurements (micro benchmarks and full IBDs)
cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc/clang -DCMAKE_CXX_COMPILER=g++/clang++ &&
cmake --build build -j$(nproc) &&
build/bin/bench_bitcoin -filter='ObfuscationBench' -min-time=5000
Before:
ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
926,379.31 | 1,079.47 | 0.1% | 6,815,747.36 | 3,325,871.40 | 2.049 | 524,289.23 | 0.0% | 5.50 | ObfuscationBench |
After:
ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
74,817.84 | 13,365.80 | 0.0% | 655,366.68 | 268,566.88 | 2.440 | 131,074.08 | 0.0% | 5.50 | ObfuscationBench |
and
ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
34,770.15 | 28,760.30 | 0.0% | 262,149.44 | 124,756.60 | 2.101 | 16,384.83 | 0.0% | 5.32 | ObfuscationBench |
Before:
ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
879,957.05 | 1,136.42 | 0.0% | 9,437,186.42 | 3,158,477.41 | 2.988 | 1,048,576.99 | 0.0% | 5.50 | ObfuscationBench |
After:
ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
51,665.69 | 19,355.21 | 0.0% | 327,684.05 | 185,409.22 | 1.767 | 65,536.55 | 0.0% | 5.50 | ObfuscationBench |
and
ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
37,281.01 | 26,823.31 | 0.1% | 475,139.64 | 133,808.24 | 3.551 | 81,920.54 | 0.0% | 5.35 | ObfuscationBench |
i.e. obfuscation is 26.6x faster with GCC (926,379.31 → 34,770.15 ns/MiB) and 23.6x faster with Clang (879,957.05 → 37,281.01 ns/MiB).
For other benchmark speedups see https://corecheck.dev/bitcoin/bitcoin/pulls/31144
Running IBD up to block height 888888 shows a roughly 4% speedup on both SSD and HDD.
SSD:
COMMITS="8324a00bd4a6a5291c841f2d01162d8a014ddb02 5ddfd31b4158a89b0007cfb2be970c03d9278525"; \
STOP_HEIGHT=888888; DBCACHE=1000; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done) && \
hyperfine \
  --sort 'command' \
  --runs 1 \
  --export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \
  --parameter-list COMMIT ${COMMITS// /,} \
  --prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
    cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && \
    cmake --build build -j$(nproc) --target bitcoind && \
    ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 100" \
  --cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
  "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0"
8324a00bd4 test: Compare util::Xor with randomized inputs against simple impl
5ddfd31b41 optimization: Xor 64 bits together instead of byte-by-byte
Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 8324a00bd4a6a5291c841f2d01162d8a014ddb02)
  Time (abs ≡): 25033.413 s [User: 33953.984 s, System: 2613.604 s]

Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 5ddfd31b4158a89b0007cfb2be970c03d9278525)
  Time (abs ≡): 24110.710 s [User: 33389.536 s, System: 2660.292 s]

Relative speed comparison
  1.04 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 8324a00bd4a6a5291c841f2d01162d8a014ddb02)
  1.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 5ddfd31b4158a89b0007cfb2be970c03d9278525)
HDD:
COMMITS="71eb6eaa740ad0b28737e90e59b89a8e951d90d9 46854038e7984b599d25640de26d4680e62caba7"; \
STOP_HEIGHT=888888; DBCACHE=4500; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done) && \
hyperfine \
  --sort 'command' \
  --runs 2 \
  --export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \
  --parameter-list COMMIT ${COMMITS// /,} \
  --prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
    cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && cmake --build build -j$(nproc) --target bitcoind && \
    ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 100" \
  --cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
  "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0"
71eb6eaa74 test: compare util::Xor with randomized inputs against simple impl
46854038e7 optimization: migrate fixed-size obfuscation from `std::vector<std::byte>` to `uint64_t`
Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 71eb6eaa740ad0b28737e90e59b89a8e951d90d9)
  Time (mean ± σ): 37676.293 s ± 83.100 s [User: 36900.535 s, System: 2220.382 s]
  Range (min … max): 37617.533 s … 37735.053 s 2 runs

Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 46854038e7984b599d25640de26d4680e62caba7)
  Time (mean ± σ): 36181.287 s ± 195.248 s [User: 34962.822 s, System: 1988.614 s]
  Range (min … max): 36043.226 s … 36319.349 s 2 runs

Relative speed comparison
  1.04 ± 0.01 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 71eb6eaa740ad0b28737e90e59b89a8e951d90d9)
  1.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 46854038e7984b599d25640de26d4680e62caba7)