Block obfuscation is currently done byte-by-byte; this PR batches it into 64-bit primitives to speed up obfuscating bigger memory chunks.
This is especially relevant now that #31551 was merged, which introduced bigger obfuscatable chunks.
Since this obfuscation is optional, the speedup measured here depends on whether the key is a random value or obfuscation is completely turned off (i.e. XOR-ing with 0).
Changes in testing, benchmarking and implementation
- Added new tests comparing randomized inputs against a trivial implementation, and performing roundtrip checks with random chunks.
- Migrated `std::vector<std::byte>(8)` keys to plain `uint64_t`.
- Process unaligned bytes separately and unroll the body to 64 bytes.
Assembly
Memory alignment is enforced by a small peel loop (the std::memcpy there is optimized out on the tested platforms), together with an std::assume_aligned<8> hint; see the Godbolt listing at https://godbolt.org/z/35nveanf5 for details.
| Target & Compiler | Stride (per hot-loop iter) | Main operation(s) in loop | Effective XORs / iter |
|---|---|---|---|
| Clang x86-64 (trunk) | 64 bytes | 4 × movdqu → pxor → store | 8 × 64-bit |
| GCC x86-64 (trunk) | 64 bytes | 4 × movdqu/pxor sequence, enabled by 8-way unroll | 8 × 64-bit |
| GCC RV32 (trunk) | 8 bytes | copy 8 B to temp → 2 × 32-bit XOR → copy back | 1 × 64-bit (as 2 × 32-bit) |
| GCC s390x (big-endian 14.2) | 64 bytes | 8 × XC (mem-mem 8-B XOR) with key cached on stack | 8 × 64-bit |
Endianness
The only endianness issue was with bit rotation, intended to realign the key if obfuscation halted before full key consumption.
Elsewhere, memory is read, processed, and written back in the same endianness, preserving byte order.
Since CI lacks a big-endian machine, testing was done locally via Docker.
71eb6eaa74 test: compare util::Xor with randomized inputs against simple impl
46854038e7 optimization: migrate fixed-size obfuscation from std::vector<std::byte> to uint64_t
```
Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT=71eb6eaa740ad0b28737e90e59b89a8e951d90d9)
  Time (mean ± σ):     37676.293 s ± 83.100 s    [User: 36900.535 s, System: 2220.382 s]
  Range (min … max):   37617.533 s … 37735.053 s    2 runs

Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT=46854038e7984b599d25640de26d4680e62caba7)
  Time (mean ± σ):     36181.287 s ± 195.248 s    [User: 34962.822 s, System: 1988.614 s]
  Range (min … max):   36043.226 s … 36319.349 s    2 runs

Relative speed comparison
  1.04 ± 0.01  COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT=71eb6eaa740ad0b28737e90e59b89a8e951d90d9)
  1.00         COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT=46854038e7984b599d25640de26d4680e62caba7)
```
DrahtBot
commented at 7:11 am on October 24, 2024:
contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #31860 (init: Take lock on blocks directory in BlockManager ctor by TheCharlatan)
- #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
- #29307 (util: explicitly close all AutoFiles that have been written by vasild)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
l0rinc force-pushed on Oct 24, 2024
DrahtBot added the label "CI failed" on Oct 24, 2024
in src/streams.h:40 in 10b9e68768 (outdated):

```diff
-    key_offset %= key.size();
-
-    for (size_t i = 0, j = key_offset; i != write.size(); i++) {
-        write[i] ^= key[j++];
+    if (size_t remaining = write.size() - i; key.size() == 8 && remaining >= 8) { // Xor in 64-bit chunks
+        const auto key64 = *std::bit_cast<uint64_t*>(key.data());
```
Thanks, I’ll investigate.
I assumed there would be more to check; that’s why it’s still a draft.
maflcko
commented at 8:27 am on October 24, 2024:
member
I think your example may be a bit skewed? It shows how much time is spent when deserializing a CScript from a block file. However, block files contain full blocks, where many (most?) of the writes are single bytes (or 4 bytes), see #30833 (comment). Thus, it would be useful to know what the overall end-to-end performance difference is. Also taking into account the utxo db.
If you want the micro-benchmark to be representative, I’d presume you’d have to mimic the histogram of the sizes of writes. Just picking one (1024, or 1004), which is never hit in reality, and then optimizing for that may be misleading.
l0rinc force-pushed on Oct 24, 2024
l0rinc force-pushed on Oct 24, 2024
l0rinc
commented at 10:02 am on October 24, 2024:
contributor
where many (most?) of the writes are single bytes (or 4 bytes)
Thanks, I’ve extended your previous benchmarks with both Autofile serialization and very small vectors.
I will also run a reindex of 400k blocks before and after to see if the effect is measurable or not.
in src/streams.h:44 in 6ae466bf11 (outdated):

```diff
+    if (key.size() == 8 && write.size() - i >= 8) { // Xor in 64-bit chunks
+        uint64_t key64;
+        std::memcpy(&key64, key.data(), 8);
+        for (; i <= write.size() - 8; i += 8) {
+            uint64_t write64;
+            std::memcpy(&write64, write.data() + i, 8);
```
I have a hard time believing this will make a large difference, especially with the two memcpys involved.
On modern CPUs, ALU operations (especially bitwise ones) are so fast compared to any kind of memory access.
And this isn’t some advanced crypto math, it’s one Xor operation per word with a fixed key.
Could avoid the memcpys if the code takes memory alignment into account, but that makes it even more complex. Not sure the pros/cons work out here.
The speedup comes from the vectorized operations, i.e. doing a 64-bit XOR instead of byte-by-byte XOR (the memcpy seems to be eliminated successfully on 64-bit architectures), see https://godbolt.org/z/Koscjconz
Added a RISC-V compiler to https://godbolt.org/z/n5rMeYeas, where it seems to my untrained eyes that it uses two separate 32-bit XORs to emulate the 64-bit operation (but even if it were byte-by-byte on 32-bit processors, that's still the same as what it was before on 64-bit CPUs, right?).
Edit:
Memory alignment is handled via std::memcpy, optimized out on the tested platforms (see https://godbolt.org/z/dcxvh6abq):

- Clang (x86-64): 32 bytes/iter using SSE vector operations
- GCC (x86-64): 16 bytes/iter using unrolled 64-bit XORs
- RISC-V (32-bit): 8 bytes/iter using a load/XOR/store sequence
- s390x (big-endian): 64 bytes/iter with unrolled 8-byte XORs
(please validate, my assembly knowledge is mostly academic)
in src/streams.h:51 in 6ae466bf11 (outdated):

```diff
-    // way instead of doing a %, which would effectively be a division
-    // for each byte Xor'd -- much slower than need be.
-    if (j == key.size())
-        j = 0;
+    for (size_t j = 0; i < write.size(); ++i, ++j) {
+        write[i] ^= key[j % key.size()];
```
Thanks for the hint, I deliberately removed that (please check the commit messages for details), since these are optimized away.
Also, this is just the leftover part, so for key of length 8 (the version used in most places) this will have 7 iterations at most.
Can you see any difference with any of the benchmarks?
It’s only up to 7 iterations (assuming the key size is 8); sure, you’re right.
But OK, yeah, I’m a bit divided about relying on specific non-trivial things being optimized out; it makes the output very dependent on specific compiler decisions (which may be fickle in some cases).
Often the simplest code gets optimized most, since it’s more predictable.
Would you like me to extend the test or benchmark suite or try something else to make sure we’re comfortable with the change?
To get rid of the goosebumps I’m handling the remaining 4 bytes as a single 32 bit xor now, so the final loop (when keys are 8 bytes long, which is mostly the case for us, I think) does 3 iterations at most. So even if it’s not optimized away, we should be fine doing 3 divisions by a nice round number like 8.
maflcko
commented at 12:17 pm on October 24, 2024:
divisions by a nice round number like 8.
I don’t think the compiler knows the number here, so can’t use it to optimize the code based on it.
Usually these optimizations concentrate on the measurable parts based on the profiling results that I’m getting during reindexing or IBD. Obfuscating a single bit (i.e. XorSmall) wasn’t my focus, it’s already very fast, didn’t seem like the bottleneck.
Would you like me to concentrate on that scenario instead? Or would it make more sense to serialize a block and use that as the basis for the benchmarks?
C++ compiler …………………….. AppleClang 16.0.0.16000026
Would you like me to concentrate on that scenario instead? Or would it make more sense to serialize a block and use that as the basis for the benchmarks?
Well no. I think this has been mentioned previously. Generally, optimizing for micro benchmarks may not yield results that are actually meaningful or visible for end-users, because the benchmarks capture only a very specific and narrow view. Optimizing for one could even make the code slower for another (as observed above). Adding a bench for the block couldn’t hurt, but I haven’t checked how representative it is. If such a bench represents the IBD behavior, it would be ideal. (There already is a block in the hex bench data, which could be used)
Usually these optimizations concentrate on the measurable parts based on the profiling results that I’m getting during reindexing or IBD
Yes, that is more useful. It would be good to share the number you got. Because the commit message simply claims that no benefit was found (“The if (j == key.size()) optimization wasn’t kept since the benchmarks couldn’t show any advantage anymore”).
XorSmall
Looks like you can reproduce the slowdown. I wonder if it is correlated with the use of libc++ vs libstdc++
The reindex-chainstate until 600k, 2 runs just finished - comparing master against the 64/32 bit packing (current state) on Linux (with GCC, showing the above inconsistency).
```
Benchmark 1: COMMIT=dea7e2faf1bc48f96741ef84e25e6f47cefd5a92 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=600000 -dbcache=500 -printtoconsole=0 -reindex-chainstate -connect=0
  Time (mean ± σ):     12819.367 s ± 35.155 s    [User: 11992.168 s, System: 2509.200 s]
  Range (min … max):   12794.508 s … 12844.225 s    2 runs

Benchmark 2: COMMIT=353915bae14b9704a209bc09b021d3dd2ee11cf2 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=600000 -dbcache=500 -printtoconsole=0 -reindex-chainstate -connect=0
  Time (mean ± σ):     12685.350 s ± 19.878 s    [User: 11918.349 s, System: 2523.819 s]
  Range (min … max):   12671.295 s … 12699.406 s    2 runs
```
Reindexing is a lot more stable than IBD (as seen from multiple measurements), showing a consistent 1% speedup.
Not earth-shattering, but at least this way the obfuscation isn’t causing a speed regression anymore.
Adding a bench for the block couldn’t hurt, but I haven’t checked how representative it is
I’ve changed the usages to avoid std::vector keys; this way GCC and Clang both agree that the new results are faster (even though Clang manages to compile to 32-byte SIMD, while GCC only to 16 bytes per iteration, see #31144 (review)).
DrahtBot removed the label "CI failed" on Oct 24, 2024
l0rinc force-pushed on Oct 24, 2024
l0rinc force-pushed on Oct 24, 2024
l0rinc renamed this from "optimization: pack util::Xor into 64 bit chunks instead of doing it byte-by-byte" to "optimization: pack util::Xor into 64/32 bit chunks instead of doing it byte-by-byte" on Oct 24, 2024
l0rinc marked this as ready for review on Oct 24, 2024
laanwj added the label "Block storage" on Oct 24, 2024
maflcko
commented at 7:14 pm on October 24, 2024:
member
It would be good to explain the jpegs in the description, or even remove them. They will be excluded from the merge commit and aren’t shown, unless GitHub happens to be reachable and online. Are they saying that IBD was 4% faster? Also, I think they were created with the UB version of this pull, so may be outdated either way?
I did a quick check on my laptop and it seems the XorSmall (1+4 bytes) is slower with this pull. The Xor (modified to check 40 bytes) was twice as fast. Overall, I’d expect it to be slower on my machine, due to the histogram of real data showing more small byte writes than long ones, IIRC.
I can try to bench on another machine later, to see if it makes a difference.
Can you clarify what type of machine you tested this on?
l0rinc
commented at 8:39 pm on October 24, 2024:
contributor
Are they saying that IBD was 4% faster?
That’s what I’m measuring currently, but I don’t expect more than 2% difference here.
Also, I think they were created with the UB version of this pull, so may be outdated either way?
Benchmarks indicated that the 64 bit compiled result was basically the same.
Overall, I’d expect it to be slower on my machine, due to the histogram of real data showing more small byte writes than long ones, IIRC.
I’ll investigate, thanks.
Posting the perf here for reference:
Reindexing until 300k blocks reveals that XOR usage was reduced:
in src/test/streams_tests.cpp:374 in a3dc138798 (outdated):

```diff
@@ -270,7 +297,7 @@ BOOST_AUTO_TEST_CASE(streams_buffered_file)
             BOOST_CHECK(false);
         } catch (const std::exception& e) {
             BOOST_CHECK(strstr(e.what(),
-                "Rewind limit must be less than buffer size") != nullptr);
+                        "Rewind limit must be less than buffer size") != nullptr);
```
hodlinator
commented at 12:41 pm on October 25, 2024:
Seems like the prior author just rounded off to some not-too-unreasonable tab indentation (efd2474d17098c754367b844ec646ebececc7c74). The function isn’t touched in this PR, so we should probably resist touching it here and below.
hodlinator
commented at 12:46 pm on October 25, 2024:
Experimented with changing to brace-initialization, which uncovered some slight narrowing/widening. (Thought I had an angle for making the code more robust in a more material way, but that attempt failed.)
Operating on CPU words rather than individual bytes. :+1:
Not entirely clear to me from #31144 (review) whether the optimizer is able to use SIMD. Guess picking through the binary of a GUIX-build would give a definitive answer.
The verbosity of std::memcpy hurts readability but alignment issues are real.
l0rinc marked this as a draft on Oct 25, 2024
l0rinc force-pushed on Oct 26, 2024
l0rinc force-pushed on Oct 26, 2024
l0rinc force-pushed on Oct 26, 2024
DrahtBot
commented at 9:58 pm on October 26, 2024:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

- Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
- A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
- An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot added the label "CI failed" on Oct 26, 2024
l0rinc force-pushed on Oct 26, 2024
l0rinc renamed this from "optimization: pack util::Xor into 64/32 bit chunks instead of doing it byte-by-byte" to "optimization: change XOR obfuscation key from `std::vector<std::byte>(8)` to `uint64_t`" on Oct 26, 2024
l0rinc force-pushed on Oct 27, 2024
DrahtBot removed the label "CI failed" on Oct 27, 2024
l0rinc force-pushed on Oct 27, 2024
l0rinc force-pushed on Oct 27, 2024
l0rinc marked this as ready for review on Oct 27, 2024
l0rinc force-pushed on Oct 29, 2024
l0rinc force-pushed on Oct 29, 2024
DrahtBot added the label "CI failed" on Oct 29, 2024
DrahtBot
commented at 12:49 pm on October 29, 2024:
contributor
DrahtBot removed the label "CI failed" on Oct 29, 2024
hodlinator
commented at 12:40 pm on October 31, 2024:
contributor
hodlinator
commented at 1:09 pm on October 31, 2024:
While key_exists is only created for this `if`-block, there are other conditions involved, and we test for the negation of the value, so I find it less surprising to revert to the previous approach.
Done both, thanks, good observation about the negation
hodlinator
commented at 3:48 pm on October 31, 2024:
My point about moving the negation and changing the name made more sense in the context of keeping it inside the if-block. If you are open to moving it out, I’d say it’s better to keep the original key_exists name and original negation to avoid the churn and make it easier to review.
(Realized another reason for not having it inside the if-block is that we are mutating obfuscate_key_vector, which is used after the block).
hodlinator
commented at 1:16 pm on October 31, 2024:
I find it more useful to (static) assert that the std::array size matches the uint64 size directly. Also don’t see a point in zeroing out the local variable before returning?
I’ve added the fills to make sure we’re not using them after conversion anymore.
What would be the advantage of the static asserts?
I don’t mind removing these failsafes if you think they’re redundant or noisy.
maflcko
commented at 1:18 pm on October 31, 2024:
member
Taking a step back, I wonder if this is worth it. IIRC it gives a +1% speedup when run on spinning storage, so it seems a higher speedup is possibly visible on faster, modern storage. However, it would be good if any IO delay was taken out of IBD completely, so that the speed of storage and the speed of XOR is largely irrelevant.
I haven’t looked, but this may be possible by asking for the next block to be read into memory in the background, as soon as work on the current block begins.
hodlinator
commented at 1:20 pm on October 31, 2024:
Don’t see much point in adding the assert here in 23fc898514bf9696facbaff65251b62c362d214e, where we still only have a fixed-size std::array with the asserted fixed size of 8. Seems sufficient with the assert in the BlockManager ctor.
l0rinc
commented at 1:24 pm on October 31, 2024:
contributor
Taking a step back, I wonder if this is worth it. IIRC it gives a +1% speedup
That’s not the main point; rather, we’re storing the key in a value that can be short-circuited easily, so that when the key is 0 (i.e. the XOR is a no-op) we can skip it entirely. Previously this would only have been possible by checking each byte of the key.
It’s also a lot cleaner to store it in a primitive instead, which supports xor natively.
Xor comes up in every profiling I do, we shouldn’t have a regression because of #28207 - this PR solves that.
in src/test/streams_tests.cpp:262 in 850214ffd9 (outdated):

```diff
@@ -235,7 +262,7 @@ BOOST_AUTO_TEST_CASE(streams_serializedata_xor)
     // Single character key
```
hodlinator
commented at 1:29 pm on October 31, 2024:
In 850214ffd9f56e887a18d0428d5881e6c1ee8652:
Single/Multi character key comments don’t make sense inside of this commit.
hodlinator
commented at 3:39 pm on October 31, 2024:
(Should be done in the initial commit which invalidates the comments IMO).
maflcko
commented at 1:39 pm on October 31, 2024:
member
we shouldn’t have a regression because of #28207 - this PR solves that.
It is not possible to do XOR without any cost at all. There will always be an overhead and I think calling #28207 a regression and this change a “fix” is not entirely accurate. This change reduces the overhead, according to the benchmarks.
That’s not the main point
The pull request title starts with “optimization”, so I got the impression that a speedup is the main point.
The reason that std::vector was picked is that switching to larger sizes is possible. However, if there was a need to do that, XOR would likely not be sufficient anyway. So limiting to 8 bytes fixed at compile time seems reasonable.
I am mostly saying that any speedup here may not be visible at all if IO is completely taken out of the critical path, but I haven’t looked into that in detail.
l0rinc
commented at 1:52 pm on October 31, 2024:
contributor
change a “fix” is not entirely accurate
When XOR is disabled we’re not XOR-ing at all now. Previously we still did the XOR, so this change restores the previous behavior when obfuscation isn’t needed.
The pull request title starts with “optimization”, so I got the impression that a speedup is the main point.
Yes, please see the updated description with the benchmarks: #31144#issue-2610689777
may not be visible at all if IO is completely taken out of the critical path
I’d argue the current implementation is slightly simpler (i.e. xor is stored and performed natively and can be disabled) and faster (2x for a representative dataset).
Since my previous Concept ACK, the PR was changed to switch the xor key more completely to uint64_t. Before the PR, we were already using fixed-size of 8 bytes for the obfuscation value in the file formats, so changing the type to uint64_t shouldn’t be noticeable to users. :+1:
Even if we could move reading and XOR-ing out of the hot path as suggested by maflcko, we might as well make use the CPU architectures we have. I would expect larger-sized XOR operations to have less overhead and energy waste (less heat).
maflcko
commented at 5:32 pm on October 31, 2024:
member
We could have used ReadLE64 to unify the byte order for keys and writable values, but that shouldn’t be necessary, since both have the same endianness locally, which shouldn’t be affected by a byte-by-byte XOR.
The s390x unit tests fail:
```
./src/test/streams_tests.cpp(40): error: in "streams_tests/xor_bytes": check { expected.begin(), expected.end() } == { actual.begin(), actual.end() } has failed.
Mismatch at position 0: � != |
Mismatch at position 1: Y != �
Mismatch at position 2: � != �
Mismatch at position 3: � != �
Mismatch at position 4: w != �
Mismatch at position 5: C != �
Mismatch at position 6:  != x
Mismatch at position 7: � != �
Mismatch at position 8: � != C
Mismatch at position 9: Y != �
Mismatch at position 10: � != �
Mismatch at position 11: , != �
Mismatch at position 12: � != R
Mismatch at position 13: 8 != �
Mismatch at position 14: � != �
Mismatch at position 15: � != �
Mismatch at position 16: � != l
Mismatch at position 17: � != n
Mismatch at position 18: � != �
Mismatch at position 19: t != B
Mismatch at position 20: ; != �
Mismatch at position 21:  != �
Mismatch at position 22: � != �
Mismatch at position 23: � != �
Mismatch at position 24: k != �
Mismatch at position 25: � != Z
Mismatch at position 26: � != �
Mismatch at position 27: � != �
Mismatch at position 28: � != #
Mismatch at position 29: 8 != �
Mismatch at position 30: � != �
Mismatch at position 31: � != �
Mismatch at position 32: g != �
Mismatch at position 33: � != ^
Mismatch at position 34: � != �
Mismatch at position 36: � != k
Mismatch at position 37: * != �
Mismatch at position 38: q != 
Mismatch at position 39: � != �
Mismatch at position 40: � != �
Mismatch at position 41: r != e
Mismatch at position 42: � != �
```
l0rinc
commented at 5:42 pm on October 31, 2024:
contributor
The s390x unit tests fail:
I don’t know how to access that; is it part of CI? Does the test suite pass on it otherwise, or was it just curiosity? Do you know why it fails? Is my assumption incorrect that endianness applies to both parts (key and value) the same way? Is the test wrong, or the XOR? Should I change the test so that XOR-ing twice reveals the original data (while the intermediate state does not)?
maflcko
commented at 6:37 pm on October 31, 2024:
member
I don’t know how to access that, is it part of CI?
It needs to be run manually. See https://github.com/bitcoin/bitcoin/tree/master/ci#running-a-stage-locally. (`podman run --rm --privileged docker.io/multiarch/qemu-user-static --reset -p yes` may be required to set up qemu-s390x, depending on your setup.) Then something like `MAKEJOBS="-j$(nproc)" FILE_ENV="./ci/test/00_setup_env_s390x.sh" ./ci/test_run_all.sh` should run it.
Does the test suite pass on it otherwise or was it just curiosity?
Yes, it should pass on s390x. If not, that is a bug somewhere.
l0rinc marked this as a draft on Oct 31, 2024
l0rinc force-pushed on Nov 2, 2024
l0rinc
commented at 7:45 pm on November 2, 2024:
contributor
Thanks for the hints @maflcko, I was under the impression that big-endian tests were run automatically.
Fix
It seems that std::rotr doesn’t take endianness into account, so the fix makes the key rotation byte-order aware.
- The change also includes an updated benchmarking suite, with 700k blocks inspected for all usages of util::Xor to make the benchmark representative.
- Changed the AutoFileXor benchmark to measure the regression of turning off obfuscation.
- I’ve also updated all key derivations to vector-to-uint64_t instead of generating the 64-bit key directly (it’s more consistent, more representative, and helps with endianness).
- Added a XOR roundtrip test which applies the XOR in random chunks, asserts that the result differs from the original, then reapplies it in different random chunks and asserts that it matches the original.
l0rinc marked this as ready for review on Nov 2, 2024
l0rinc force-pushed on Nov 5, 2024
l0rinc
commented at 4:40 pm on November 5, 2024:
contributor
Updated the benchmark to 860k blocks (September 2024):
This one contains a lot of very big arrays (96'233 separate sizes, biggest was 3'992'470 bytes long) - a big departure from the previous 400k and 700k blocks (having 1500 sizes, biggest was 9319 bytes long).
The performance characteristics are also quite different, now that we have more and bigger byte arrays:
C++ compiler …………………….. AppleClang 16.0.0.16000026
Before:

| ns/byte | byte/s | err% | total | benchmark |
|--------:|-------------------:|-----:|-------:|:-------------|
| 1.29 | 774,577,944.12 | 0.2% | 115.99 | XorHistogram |

After:

| ns/byte | byte/s | err% | total | benchmark |
|--------:|-------------------:|-----:|-------:|:-------------|
| 0.04 | 26,411,646,837.32 | 0.2% | 8.97 | XorHistogram |
i.e. ~35x faster with Clang at processing the data with representative histograms.
C++ compiler …………………….. GNU 13.2.0
Before:

| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |
|--------:|------------------:|-----:|---------:|---------:|------:|---------:|------:|------:|:-------------|
| 0.97 | 1,032,916,679.87 | 0.0% | 9.01 | 3.29 | 2.738 | 1.00 | 0.0% | 86.58 | XorHistogram |

After:

| ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark |
|--------:|------------------:|-----:|---------:|---------:|------:|---------:|------:|------:|:-------------|
| 0.10 | 10,369,097,976.62 | 0.0% | 0.32 | 0.33 | 0.985 | 0.06 | 0.6% | 8.63 | XorHistogram |
i.e. ~10x faster with GCC at processing the data with representative histograms.
Edit: note that I couldn’t use random byte generation for each benchmark value since it timed out on CI. I have replaced it with getting subsets of a single big random vector.
l0rinc force-pushed on Nov 5, 2024
DrahtBot added the label "CI failed" on Nov 5, 2024
DrahtBot
commented at 4:49 pm on November 5, 2024:
contributor
l0rinc force-pushed on Nov 6, 2024
l0rinc force-pushed on Nov 6, 2024
l0rinc force-pushed on Nov 7, 2024
l0rinc force-pushed on Nov 8, 2024
DrahtBot removed the label "CI failed" on Nov 8, 2024
l0rinc
commented at 10:59 am on November 11, 2024:
contributor
I ran a reindex-chainstate until 860k blocks (SSD, i7 CPU), before and after this change, 2 runs per commit.
As stated in the previous comment, the latter blocks (>700k) seem to contain a lot of very big vectors where the new algorithm shines.
3efc72ff7cbdfb788b23bf4346e29ba99362c120 - end-to-end vector to uint64
08db1794647c37f966c525411f931a4a0f6b6119 - dummy commit that forces obfuscation keys to be 0 to ignore xor operations (since -blocksxor=0 doesn’t affect dbwrapper), to see how xor affects speed in general
It doesn’t (at least not for all test cases); I’m deliberately generating vectors and converting them, instead of generating 64-bit values directly, since that’s what happens in production code (this should address several of your comments).
hodlinator
commented at 10:45 pm on November 11, 2024:
“Specify the optimization categories” means that the compiler will be able to optimize the cases where one of the parameters (the size) is a constant, separately from each other. The default statement would work, but would be very slow, since the 1-, 2- and 4-byte versions wouldn’t be specialized.
hodlinator
commented at 10:42 pm on November 11, 2024:
How about
```cpp
// Help optimizers along by sending constant parameter values into the inlined function,
// resulting in more efficient substitutions of memcpy() -> native pow-2 copy instructions.
switch (write.size()) {
case 0: break;
case 1: XorInt(write, key, 1); break;
case 2: XorInt(write, key, 2); break;
case 4: XorInt(write, key, 4); break;
default: XorInt(write, key, write.size());
}
```
l0rinc
commented at 11:07 pm on November 11, 2024:
I ended up with only `// Help the compiler specialize 1, 2 and 4 byte cases`, since the rest was just speculation.
The values in the histogram (i.e. total bytes streamed through xor) add up to 92gb, but 1 byte values occupy half of that. (edit: this is only true for the first row, in other cases we would need to multiply by the first column)
Since we have thousands of big values that represent vector of that size, we have to include all of those into the test set at least once.
I had to scale down the histogram such that the lower values, having a few hundred occurrences aren’t flattened out completely - to make the histogram still representative. Do you have a better idea?
XorHistogram claims to use 8 GB of RAM. Could be a bit much if we want to be able to also run benchmarks on low-end devices.
Yes, but if I scale it down more, more values will be equal in the histogram and it won’t reflect real usage.
That’s why I’ve set it to low priority, we don’t have to run these for every execution.
Edit: pushed some nits to git range-diff 866f4fa521f6932162570d6531055cc007e3d0cd..a1232973189126cfc9526713011461709685fcc8 866f4fa521f6932162570d6531055cc007e3d0cd..57caa965b5ae284e501f892415d60fcb536f4c0e
l0rinc force-pushed on Nov 11, 2024
l0rinc renamed this from "optimization: change XOR obfuscation key from `std::vector<std::byte>(8)` to `uint64_t`" to "optimization: change XOR obfuscation key from `std::vector<std::byte>{8}` to `uint64_t`" on Nov 11, 2024
hodlinator
commented at 11:43 pm on November 11, 2024:
contributor
Care to explain the scaling_factor value?
The values in the histogram (i.e. total bytes streamed through xor) add up to ~92gb, but 1 byte values occupy half of that. I had to scale down the histogram such that the lower values, having a few hundred occurrences aren’t flattened out completely - to make the histogram still representative. Do you have a better idea?
So something like 1.28TB instead of 92GB, but I can’t seem to get my head screwed on right today.
If I re-understood correctly what the code was doing with the scaling - it’s doing only 1'000'000 XOR-passes for the most common size (1 byte) instead of 47'584'838'861, and scaling down the number of XOR-passes for the others by the same factor.
fanquake
commented at 10:07 am on November 12, 2024:
member
I haven’t really looked at the changes here, but just looking at the diff (+96'000 lines), my feedback would be that you’ll need to change approach in regards to making your data available (if the intent is to have that included), as I doubt we’ll be adding 96'000 lines to bench/xor.cpp. You could look at how we generate a header from bench/data/block413567.raw for a different approach, as including a small binary blob, and parsing it into a header at compile time is far more palatable.
l0rinc
commented at 3:16 pm on November 12, 2024:
contributor
Thanks @fanquake, I thought of that, can you please help me understand the constraints?
Wouldn’t that require a cmake generation step from binary to header which would basically produce the exact same lines as what we have now?
Would it help if I simply extracted it to a separate header file instead?
So something like 1.28TB instead of 92GB, but I can’t seem to get my head screwed on right today.
Yeah, I’ve edited that part since; my napkin calculations were only true for the first row, in the other cases we would need to multiply by the first column, like you did. But the point is that it’s a lot of data that we have to scale down.
but seems slightly slower in my measurements
I thought of that as well, but wanted to avoid floating point conversion (likely the reason for the slowness in your example)
fanquake
commented at 5:51 pm on November 12, 2024:
member
Wouldn’t that require a cmake generation step from binary to header which would basically produce the exact same lines as what we have now?
Yes. See bench/data/block413567.raw & bench/data/block413567.raw.h, where at build time a header file of ~125'000 lines is produced.
Would it help if I simply extracted it to a separate header file instead?
I don’t think so. The point is more to not add 100'000s of lines of “data” to this repo, which doesn’t scale across many benchmarks, creates unusable diffs, leaves (source) files unviewable on GH etc.
l0rinc force-pushed
on Nov 13, 2024
l0rinc
commented at 3:27 pm on November 13, 2024:
contributor
I understand you were burned by endianness but I disagree that it’s worth sacrificing readability where endianness is a non-issue.
Thanks @hodlinator for the suggestions, I tried them all, but in the end decided that I value consistency more than coming up with a separate solution for each test case. These are ugly, I agree, but at least they’re testing the setup we’re using in prod.
I did however change the hard-coded 8 values to sizeof xor_key (for memcpy) or sizeof(uint64_t) for vector inits.
The point is more to not add 100'000s of lines of “data” to this repo
In the end I stored the sorted diffs (since the lines are correlated, i.e. more likely to have similar neighbours) and compressed them using .tar.gz (added the generator python script as a gist, please verify) - this way the histogram data is ~100 kb instead of 1.7 MB (thanks for the hint @fanquake).
I’ve extended GenerateHeaderFromRaw.cmake with compression support (adjusting GenerateHeaders.cmake to trim the suffix from the header name) and added more safety asserts to make sure the data read back is the same as before.
33+}
34+
35+static void XorHistogram(benchmark::Bench& bench)
36+{
37+ // The histogram represents util::Xor method's write.size() histograms for the first 860k blocks
38+ // aggregated and encoded with https://gist.github.com/l0rinc/a44da845ad32ec89c30525507cdd28ee
hodlinator
commented at 3:28 pm on December 3, 2024:
nits:
Although there is precedent for adding references to gists, I’m not sure we should encourage it. Would have preferred a file in this repo’s contrib/ directory.
Would also have preferred that it did the full calculation of the .tgz file instead of having a hard-coded array, computing it by looking at linearized blocks on disk.
git range-diff master 57caa96 f2fd1f7
git show e314bb7e00 > old
git show 91a8fde051 > new
meld old new
Thanks for using more constexpr std::arrays and clearer sizeofs!
Nice that block data could be compressed to such a large extent.
nit: Would prefer the *.cmake changes were broken out into their own commit, keeping only src/bench/CMakeLists.txt as part of the benchmark change.
ryanofsky
commented at 5:00 pm on December 3, 2024:
contributor
Concept ACK, but curious for more feedback from @maflcko about this PR. The actual code changes here do not seem too complicated but maybe they make the code less generic. I wonder if you think there are concrete downsides to this PR, or if the changes are ok but possibly not worth the review effort (as #31144 (comment) seems to suggest)
I’m happy to spend time reviewing this if it improves performance and doesn’t cause other problems.
this way the histogram data is ~100 kb instead of 1.7 MB
Current approach seems ok to me, but wondering if it might be better to just use a sampling of the most common write sizes instead of including the entire histogram. It seems like if you take the top 50 sizes it covers 99.6% of the writes, and might make the test more maintainable and the PR easier to understand without changing results too much.
maflcko
commented at 5:09 pm on December 3, 2024:
member
Concept ACK, but curious for more feedback from @maflcko about this PR. The actual code changes here do not seem too complicated but maybe they make the code less generic.
There is a good chance that increasing the size of the vector is insufficient, if there is ever a need to increase it to more than 8 bytes, so a complete rewrite may be needed in that case anyway. However, this is just my guess and only time will tell. So I’d say this change is probably fine for now.
Would still be nice if there were a way to take all of it out of the hot path (possibly with higher overall benefits), but I don’t know if such a change is possible and would replace this pull.
l0rinc
commented at 5:44 pm on December 3, 2024:
contributor
Would still be nice if there were a way to take all of it out of the hot path
but wondering it it might be better to just use a sampling of the most common write sizes
@fanquake mentioned that he thinks this benchmark could be useful - if he’s fine with the truncated version as well, I’ll simplify (would solve some of @hodlinator’s cmake concerns as well).
maflcko
commented at 6:33 pm on December 3, 2024:
member
It seems like if you take the top 50 sizes it covers 99.6% of the writes
I had the same thought. Obviously there could be an unlikely problem if the remaining 0.4% of writes accounted for the majority of the time, but that seems unlikely. Other than that, taking only the top N seems preferable.
l0rinc
commented at 9:38 am on December 5, 2024:
contributor
Would still be nice if there were a way to take all of it out of the hot path
Since blocks are XOR-ed as well, I can’t meaningfully test it with a reindex(-chainstate), so I did 2 full IBDs until 800k blocks, rebased after #30039, with -blocksxor=0 to test whether we can disable XOR completely now.
Benchmark 1: COMMIT=e1074081c9f1895a4f629dfee347ceae484a10d3 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -blocksxor=0 -dbcache=10000 -printtoconsole=0
  Time (mean ± σ): 25797.921 s ± 61.629 s [User: 26803.189 s, System: 1457.936 s]
  Range (min … max): 25754.343 s … 25841.500 s    2 runs

Benchmark 2: COMMIT=f2fd1f7c043a2782cb2bf3c9fe7e2f94c17728b5 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -blocksxor=0 -dbcache=10000 -printtoconsole=0
  Time (mean ± σ): 23751.046 s ± 342.376 s [User: 25322.345 s, System: 1509.236 s]
  Range (min … max): 23508.949 s … 23993.142 s    2 runs
Which indicates a 9% speedup compared to the baseline:
Summary
  COMMIT=f2fd1f7c043a2782cb2bf3c9fe7e2f94c17728b5 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -blocksxor=0 -dbcache=10000 -printtoconsole=0 ran
    1.09 ± 0.02 times faster than COMMIT=e1074081c9f1895a4f629dfee347ceae484a10d3 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -blocksxor=0 -dbcache=10000 -printtoconsole=0
maflcko
commented at 9:46 am on December 5, 2024:
member
Would still be nice to if there was a way to take all of it out of the hot path (possibly with higher overall benefits), but I don’t know if such a change is possible and will replace this pull.
Actually, this change here also affects RPC performance, not just internal validation, so this can be done in a follow-up or separate pull, if it is possible at all.
l0rinc
commented at 1:48 pm on December 6, 2024:
contributor
More context:
The previous benchmark was for completely turning off XOR - but we can only do that for a new IBD, by explicitly setting it to 0. For the majority of cases we likely want to still do the XOR, so this PR is meant to speed it up.
I have remeasured it by doing a full IBD until 800k blocks (two runs to measure stability, since reindex wouldn’t cover all usages of XOR):
Benchmark 1: COMMIT=e1074081c9f1895a4f629dfee347ceae484a10d3 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=10000 -printtoconsole=0
  Time (mean ± σ): 25601.461 s ± 65.686 s [User: 27025.116 s, System: 1586.908 s]
  Range (min … max): 25555.014 s … 25647.907 s    2 runs

Benchmark 2: COMMIT=f2fd1f7c043a2782cb2bf3c9fe7e2f94c17728b5 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=10000 -printtoconsole=0
  Time (mean ± σ): 24526.781 s ± 389.029 s [User: 25525.801 s, System: 1552.625 s]
  Range (min … max): 24251.697 s … 24801.866 s    2 runs
Which indicates that this will speed up IBD by roughly 4% on average (now that 30039 was merged the difference is more obvious):
Summary
  COMMIT=f2fd1f7c043a2782cb2bf3c9fe7e2f94c17728b5 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=10000 -printtoconsole=0 ran
    1.04 ± 0.02 times faster than COMMIT=e1074081c9f1895a4f629dfee347ceae484a10d3 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=10000 -printtoconsole=0
l0rinc renamed this:
optimization: change XOR obfuscation key from `std::vector<std::byte>{8}` to `uint64_t`
optimization: speed up XOR by 4% (9% when disabled) by applying it in larger batches
on Dec 6, 2024
l0rinc force-pushed
on Dec 6, 2024
l0rinc
commented at 5:30 pm on December 6, 2024:
contributor
Thanks for the reviews and hints, I’ve pushed the following changes:
Reverted all the cmake changes and histogram archives @hodlinator mentioned, and based on the hints of @ryanofsky and @maflcko I’ve kept only the top entries (by frequency, re-sorted by size), making sure that the really big write-vectors are also covered (so I kept the first 1000 instead of just the first 50). This enabled putting all the data in the source file.
Added Assumes to each xor to check that we don’t have any useless calls with 0 keys - making sure we “turn off” the feature when we can.
yes, it’s an optimization to avoid doing any rotation when it would wrap around - it would work without this as well.
It’s not a measurable speedup, though, so I can remove it if you insist.
in src/node/blockstorage.cpp:1178 in b9c847fd09 (outdated)
No, I mean just a simple struct. Have you seen my suggestion?
The benefits would be that passing the integral value around would be type-safe. Also, the endian-considerations are fully contained in a simple struct, as opposed to the xor internal implementation detail and all modules that use xor. Finally, the memcpy would also be contained in a single place. Overall this could make the code smaller, or not, depending on how many users there are. However, in any case, encapsulating the assumptions around type-safety and endianness would already be worth it in my view.
ryanofsky
commented at 3:49 pm on December 10, 2024:
In commit “test: Compare util::Xor with randomized inputs against simple impl” (e1074081c9f1895a4f629dfee347ceae484a10d3)
Would be good to add a comment explaining the test. Test seems to be encoding and then decoding random byte vectors with random 8-byte xor keys, using differently-sized and differently-aligned random chunks for encoding and decoding, and then making sure the byte vectors are unchanged after the round trip.
ryanofsky
commented at 3:56 pm on December 10, 2024:
In commit “test: Compare util::Xor with randomized inputs against simple impl” (e1074081c9f1895a4f629dfee347ceae484a10d3)
Again adding a test comment could be helpful here. This test is making sure the util::Xor function returns same results as a naive byte-by-byte xor with an 8-byte key, using 100 random sized random byte vectors.
Would suggest moving xor_bytes_reference test up before the xor_roundtrip_random_chunks since it seems like a simpler test that’s an easier introduction to this code and could be followed by more complicated tests.
16+static void XorHistogram(benchmark::Bench& bench)
17 {
18- FastRandomContext frc{/*fDeterministic=*/true};
19- auto data{frc.randbytes<std::byte>(1024)};
20- auto key{frc.randbytes<std::byte>(31)};
21+ // The top util::Xor method's [write.size(), frequency] calls for the IBD of the first 860k blocks
ryanofsky
commented at 5:07 pm on December 10, 2024:
I think instead of taking the top 1000 calls by count, it would make sense to take the top X calls by (size*count) as suggested #31144 (comment), where X could be smaller than 1000, because (size*count) should more closely approximate the time spent on all writes of a given size than count ignoring size. This should make the test more realistic and also allow shrinking the histogram.
This isn’t needed anymore, since in #31551 we batch all the tiny calls now, so this PR only deals with doing the obfuscation on 64 bits
in src/test/streams_tests.cpp:61 in e1074081c9 (outdated)
56+ for (size_t test{0}; test < 100; ++test) {
57+ const size_t write_size{1 + rng.randrange(100U)};
58+ const size_t key_offset{rng.randrange(3 * 8U)}; // Should wrap around
59+
60+ std::vector key_bytes{rng.randbytes<std::byte>(sizeof(uint64_t))};
61+ uint64_t key;
ryanofsky
commented at 11:00 pm on December 10, 2024:
In commit “test: Compare util::Xor with randomized inputs against simple impl” (e1074081c9f1895a4f629dfee347ceae484a10d3)
Would be good to use consistent variable names in these tests. The other tests are calling key vectors xor_pat and calling key values xor_key, while these tests are calling key vectors key_bytes and calling key values key. Would be clearer to use consistent names.
Also, after this PR each test is keeping two different variables and two different representations for each key. It would be good to clean this up afterwards and just have one variable per key. Using a dedicated type for keys like the XorKey struct Marco suggested #31144 (review) would be even more ideal.
ryanofsky
commented at 11:23 pm on December 10, 2024:
In commit “bench: Make Xor benchmark more representative” (caafbd069246848a8bdfc2f42fd1d692a824de94)
Would be a helpful to have a comment saying what this benchmark is measuring. Maybe: // Measure speed of util::Xor function applied to a set of byte vectors. The byte vectors are filled with random data and have sizes matching a distribution of data write sizes observed during IBD.
1040+ test_data.emplace_back(rand_bytes);
1041+ }
1042+ }
1043+ assert(total_bytes == 114'929'502);
1044+
1045+ std::ranges::shuffle(test_data, rng); // Make it more realistic & less predictable
ryanofsky
commented at 11:48 pm on December 10, 2024:
In commit “bench: Make Xor benchmark more representative” (caafbd069246848a8bdfc2f42fd1d692a824de94)
It seems awkward to have code with all these hardcoded values that will make it hard to update the histogram, and to end up with test data we can’t directly control the size of. Would be cleaner to just choose how much data to generate, and generate it without hardcoding values from the histogram. Would suggest something more like the following, with an easily adjustable size that doesn’t hardcode other values.
ryanofsky
commented at 0:03 am on December 11, 2024:
In commit “bench: Make Xor benchmark more representative” (caafbd069246848a8bdfc2f42fd1d692a824de94)
Not really sure I understand the goal of this benchmark. Is it significant that the xor key is 0? Would be helpful to have a description of what this benchmark is measuring and indicating.
ryanofsky
commented at 0:28 am on December 11, 2024:
In commit “optimization: Xor 64 bits together instead of byte-by-byte” (7a2e5ec97700584eeac6f8b08ef697df6a147606)
Seems like it would be less fragile and would simplify callers to replace this Assume(key) with if (!key) return;
l0rinc
commented at 12:19 pm on December 11, 2024:
I could do that, but the Assume here was meant to make sure no calls get here when the key is 0 in the first place - since that can usually eliminate other work as well (e.g. MakeWritableByteSpan in DataStream#Xor)
ryanofsky
commented at 0:36 am on December 11, 2024:
In commit “optimization: Xor 64 bits together instead of byte-by-byte” (7a2e5ec97700584eeac6f8b08ef697df6a147606)
I’m confused why this code is changing in this commit when it seems unrelated to the stream.h optimization; the behavior seems the same as before. If this is a refactoring cleanup it would be good to move it to a separate commit explaining that it is a refactoring and what the point of these changes may be.
We can’t write the obfuscation vector directly anymore (since we’re storing an Obfuscation object now) and Read can only read into a vector, so this part needed a temp vector - which looks a bit awkward, indeed.
Split it out into a new commit, thanks for the hint!
in src/node/mempool_persist.cpp:62 in 7a2e5ec977 (outdated)
ryanofsky
commented at 0:42 am on December 11, 2024:
In commit “optimization: Xor 64 bits together instead of byte-by-byte” (7a2e5ec97700584eeac6f8b08ef697df6a147606)
Are all the changes in this file also a refactoring that doesn’t change behavior? I don’t understand why these changes are in a commit that is supposed to be optimizing stream.h behavior. Would suggest splitting this commit up and explaining what the purpose of these changes is. Maybe they would make more sense in the next commit so this code is not changing twice?
Now that we have a dedicated Obfuscation type these are trivial; let me know if you still think they should be split out into dedicated commits.
ryanofsky
commented at 1:15 am on December 11, 2024:
contributor
Code review b9c847fd093d100628817af98fe837db938160f7. These changes look good and make sense, and I reviewed almost everything but have a few pieces of feedback:
I would very strongly endorse Marco’s suggestion to represent keys with an XorKey struct instead of raw uint64_t values so the code getting and setting keys is simpler, safer, and more uniform, and we can avoid a proliferation of memcpy calls.
I don’t think I understand structure of the third and fourth commits. The third commit seems to be adding an optimization to streams.h but also refactoring code not directly related to streams.h and then the fourth commit is refactoring a lot of the same code that was just refactored, and some of the code that was optimized as well. Would suggest doing this more cleanly in 3 commits:
First commit adding optimized stream.h/stream.cpp functions and wrappers to provide backwards compatibility so no other code or tests have to change in the commit.
Second commit updating code and tests to call the optimized stream.h API instead of the backwards compatibility wrappers.
Third commit deleting stream.h backwards compatibility wrappers.
l0rinc marked this as a draft
on Dec 14, 2024
l0rinc force-pushed
on Dec 21, 2024
DrahtBot added the label
CI failed
on Dec 21, 2024
DrahtBot
commented at 5:07 pm on December 21, 2024:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
l0rinc force-pushed
on Dec 21, 2024
l0rinc renamed this:
optimization: speed up XOR by 4% (9% when disabled) by applying it in larger batches
optimization: batch XOR operations 12% faster IBD
on Dec 22, 2024
l0rinc
commented at 11:42 am on December 22, 2024:
contributor
The PR has been split into 3 to simplify review, please check those out first:
Time excludes the 8 minutes to flush 26 GiB worth of chainstate to disk during shutdown. That itself is twice as fast as a few months ago: #30987 (comment)
After: 4 hours and 50 minutes
hodlinator
commented at 10:33 am on January 8, 2025:
contributor
3% speedup is less to write home about but still good. Having a less constrained -dbcache setting (30 GB, as Sjors did) would lead to fewer XOR reads/writes.
The 12.3% speedup (l0rinc) could at least in part be explained by the 1 GB setting, leading to more XOR operations.
An optimization that provides a bigger win for constrained devices should be welcomed.
l0rinc
commented at 9:30 am on January 10, 2025:
contributor
Thank you @Sjors for testing it.
I was surprised to see your config only revealed a 3% change so I reran the full IBDs with the configs you had: -dbcache=30000 -stopatheight=878000 (I had -dbcache=1000 -stopatheight=870000 before).
I suspect the difference in our measurements could stem from doing a single run and not including the final dump in the measurements.
I was using a HDD this time and wasn’t seeding from local nodes, so the variance was a bit bigger for me, but I ran both before and after several times and there’s an obvious clustering (the before case was consistently slower than any after run, showing a ~12% speedup on average even with high dbcache):
hyperfine \
  --runs 2 \
  --parameter-list COMMIT d73f37dda221835b5109ede1b84db2dc7c4b74a1,fe7365584bb3703e5691c93fb004772e84db3697 \
  --prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
  'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -printtoconsole=0'

Benchmark 1: COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -printtoconsole=0
  Time (mean ± σ): 40251.909 s ± 1669.663 s [User: 51304.669 s, System: 1889.767 s]
  Range (min … max): 39071.279 s … 41432.539 s    2 runs

Benchmark 2: COMMIT=fe7365584bb3703e5691c93fb004772e84db3697 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -printtoconsole=0
  Time (mean ± σ): 36062.225 s ± 916.289 s [User: 47770.885 s, System: 2097.642 s]
  Range (min … max): 35414.310 s … 36710.139 s    2 runs

Summary
  COMMIT=fe7365584bb3703e5691c93fb004772e84db3697 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -printtoconsole=0 ran
    1.12 ± 0.05 times faster than COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -printtoconsole=0
maflcko
commented at 10:05 am on January 10, 2025:
member
Not sure how useful it is to derive speed improvements from measurements where the variance is about half as large as the difference itself. Not claiming this is the case here, but if you measure from the public network, you could very well just measure the bandwidth of the picked nodes (completely unrelated to this pull).
It is fine if you want to do those measurements locally for fun, but putting them in the pull request title and description doesn’t seem ideal. It would be better to focus on stable and reproducible measurements there.
l0rinc
commented at 10:09 am on January 10, 2025:
contributor
Usually the variance is a lot lower (see previous measurements), but these are just my benchmarks (I want them to be as close to reality as possible, that’s why I’m repeating them to have some predictability), I would appreciate if you could provide independent measurements that you find more stable.
hodlinator
commented at 10:43 am on January 11, 2025:
contributor
bitcoind -dbcache=30000 -stopatheight=878000 -blocksdir=/magnetic/.bitcoin -addnode=local-network
@Sjors, what kind of drive is /magnetic/? (Edit: I’m thinking if the drive is a bit clunky, it will crowd out the speedup from optimizing XOR).
l0rinc
commented at 1:07 pm on January 11, 2025:
contributor
I think I managed to reproduce the ~2% difference - by not doing an IBD but a -reindex-chainstate.
@Sjors, was your datadir completely empty for the runs? My 12% comes from having nothing locally (e.g. no blocks) to being fully synced (i.e. has to include the final flush as well) - to be as close to the user’s experience as possible.
hyperfine \
  --runs 2 \
  --parameter-list COMMIT d73f37dda221835b5109ede1b84db2dc7c4b74a1,fe7365584bb3703e5691c93fb004772e84db3697 \
  --prepare 'git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
  'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -reindex-chainstate -printtoconsole=0 -connect=0'

Benchmark 1: COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -reindex-chainstate -printtoconsole=0 -connect=0
  Time (mean ± σ): 23664.320 s ± 111.385 s [User: 35795.225 s, System: 714.912 s]
  Range (min … max): 23585.559 s … 23743.081 s    2 runs

Benchmark 2: COMMIT=fe7365584bb3703e5691c93fb004772e84db3697 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -reindex-chainstate -printtoconsole=0 -connect=0
  Time (mean ± σ): 23277.741 s ± 172.333 s [User: 34509.524 s, System: 582.073 s]
  Range (min … max): 23155.883 s … 23399.599 s    2 runs

Summary
  COMMIT=fe7365584bb3703e5691c93fb004772e84db3697 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -reindex-chainstate -printtoconsole=0 -connect=0 ran
    1.02 ± 0.01 times faster than COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -reindex-chainstate -printtoconsole=0 -connect=0
andrewtoth
commented at 8:47 pm on January 11, 2025:
contributor
I benchmarked this PR rebased onto 35bf426e02210c1bbb04926f4ca2e0285fbfcd11 up to block 878k two times each, and I saw a 9% speedup. This was using dbcache=30000.
Time (mean ± σ): 18704.258 s ± 4.425 s [User: 37215.600 s, System: 891.293 s]
Range (min … max): 18701.130 s … 18707.387 s    2 runs

Summary
  'echo 1298bae74a1f690fd6cc0e029e490537cbeb301b && /usr/bin/time ./build/src/bitcoind -printtoconsole=0 -dbcache=30000 -connect=192.168.2.171 -stopatheight=878000' ran
    1.09 ± 0.00 times faster than 'echo 35bf426e02210c1bbb04926f4ca2e0285fbfcd11 && /usr/bin/time ./build/src/bitcoind -printtoconsole=0 -dbcache=30000 -connect=192.168.2.171 -stopatheight=878000'
Sjors
commented at 9:05 am on January 13, 2025:
member
@hodlinator a Western Digital spinning disk. Much slower than SSD but fine for just writing out blocks to disk.
(I’m assuming we’re not doing something foolish like writing the block, then reading it again to do the xor and writing it back)
@l0rinc I wiped the blocks, indexes and chainstate dirs between runs. Both times I mention exclude the 8 minute flush, which remained similar.
I agree with @maflcko that a difference of 10 minutes is probably not statistically significant. I don’t have a good offline benchmark setup.
l0rinc
commented at 9:44 am on January 13, 2025:
contributor
@Sjors, your setup is already extremely fast - it seems this optimization shines mostly on commodity hardware, which I assume is used more often.
Let’s wait for other reproducers.
Time (mean ± σ): 64403.836 s ± 9298.080 s
User: 57218.987 s, System: 3918.333 s
Range (min … max): 50760.210 s … 70526.366 s
Runs: 4
Commit: 5acf12bafe
Time (mean ± σ): 71941.764 s ± 2646.491 s
User: 60882.538 s, System: 3941.555 s
Range (min … max): 68480.815 s … 74310.881 s
Runs: 4
Summary
COMPILER=gcc COMMIT=caa68f79c11e5c444977ce8dee8a43020b7b3c5a
ran 1.12 ± 0.17 times faster than
COMPILER=gcc COMMIT=5acf12bafeb126f2190b3f401f95199e0eea90c9
Bitcoin Node Setup
CPU: Intel® Core™ i5-6500T @ 2.50GHz
RAM: 16 GB
Storage: 1TB SSD (Apacer AS350)
Internet:
Download: 732.86 Mbps
Upload: 322.55 Mbps
Database Cache Adjustments
Initially, the provided database cache (DBCACHE) size was too high for the available memory on this node, causing fluctuations in execution time. To account for this variability, four runs per commit were conducted. Several configurations were tested:
DBCACHE=30000: Execution halted.
DBCACHE=10000: Execution was too slow (probably because of swapping) without a significant difference between benchmarks.
DBCACHE=5000: Found to be the optimal configuration for this node.
l0rinc force-pushed
on Mar 11, 2025
l0rinc renamed this:
optimization: batch XOR operations 12% faster IBD
[IBD] multi-byte block obfuscation
on Mar 12, 2025
l0rinc marked this as ready for review
on Mar 12, 2025
DrahtBot added the label
Needs rebase
on Mar 20, 2025
l0rinc force-pushed
on Mar 20, 2025
DrahtBot removed the label
Needs rebase
on Mar 20, 2025
l0rinc force-pushed
on Mar 20, 2025
DrahtBot added the label
CI failed
on Apr 2, 2025
l0rinc force-pushed
on Apr 3, 2025
DrahtBot removed the label
CI failed
on Apr 4, 2025
l0rinc force-pushed
on Apr 5, 2025
l0rinc
commented at 5:30 pm on April 5, 2025:
contributor
As mentioned before, this PR only contains the multi-byte obfuscation now, batching the single-byte serializations is done in #31551
The PR achieves 18x faster obfuscation on Linux, 49x faster on Mac and an IBD speedup of 4%.
The latest push:
Changed the existing Xor benchmark to XorObfuscationBench, measuring a 10 MiB chunk of random memory;
dbwrapper.cpp and mempool_persist.cpp migrations are simplified by dedicated refactor commits;
streams_tests.cpp now uses the native m_rng, comments were added to the new big tests, unified key_bytes{"ff00ff00ff00ff00"_hex_v} name and storage;
Updated every measurement and commit message to reflect the current state after the split.
Thanks for the reviews so far, it’s ready for review again!
Edit: rebased in latest push to resolve CI failure
l0rinc force-pushed
on Apr 6, 2025
achow101 referenced this in commit
33df4aebae
on Apr 16, 2025
DrahtBot added the label
Needs rebase
on Apr 16, 2025
l0rinc force-pushed
on Apr 17, 2025
l0rinc
commented at 10:10 am on April 17, 2025:
contributor
Rebased, now that #31551 was merged - will redo the IBD benchmarks (since we have bigger obfuscatable chunks now) to see if any of the commit messages or descriptions need changing.
The PR is otherwise ready for review again!
DrahtBot removed the label
Needs rebase
on Apr 17, 2025
Do you feel a need to assert(m_obfuscation == 0) at the beginning of the block because the ctor has ~30 preceding lines? Can’t think of another reason.
assert(m_obfuscation == 0) is needed to document (and to make sure) that the obfuscation key isn’t itself written with obfuscation turned on, i.e. we have to make sure the obfuscation key is 0 to turn it off for this write only.
So without assert(m_obfuscation == 0) we might not read old stored obfuscation keys back correctly. Does that explain it? If not, let’s have a call.
hodlinator
commented at 10:14 am on April 24, 2025:
Ah, I was interpreting “Needed for unobfuscated Read” as applying to Read-calls after the ctor, hadn’t realized it was for the call 2 lines below. :man_facepalming: That’s why I didn’t think it was relevant to include in #31144 (review). Sorry for this detour. assert is good.
Maybe comment could spell out “Needed for unobfuscated Read directly below” for similar readers to me, but hopefully there aren’t too many of them.
No problem, glad it’s sorted. Extended the comment to make it even clearer.
in
src/dbwrapper.h:190
in
46854038e7outdated
@@ -188,16 +187,11 @@ class CDBWrapper
     std::string m_name;

     //! a key used for optional XOR-obfuscation of the database
-    std::vector<unsigned char> obfuscate_key;
+    Obfuscation m_obfuscation;
hodlinator
commented at 7:43 am on April 22, 2025:
hodlinator
commented at 9:00 am on April 22, 2025:
Commit message in 118d8083b913e130b073ed3ad0eeb5aca4887899:
-This commit inlines the obfuscate‑key initialization, replaces `key_exists` with `key_missing`, and simplifies the `if` condition that writes a new obfuscation key.
-The `CreateObfuscateKey` method and its private helper are removed.
+This commit inlines the `CreateObfuscateKey` method, replaces `key_exists` with `key_missing`, and simplifies the `if` condition that writes a new obfuscation key.
This private method is just reformatted. Maybe a case of moving changes between commits. Seems unnecessary.
“The CreateObfuscateKey method is inlined” might be more precise?
hodlinator
commented at 1:43 pm on April 23, 2025:
Seems you forgot to remove “The CreateObfuscateKey method and its private helper were also removed.”?
I think saying CreateObfuscateKey was inlined is sufficient, and no private helper was removed, as stated above in (1).
Taking only fixed-size spans in the interface of Obfuscation() encourages call-sites to perform error checking, or use std::array which implicitly converts if correctly sized - correct by construction. span::first() is unchecked as I said, so the current interface is unsafe as there are no checks.
I consider this thread unresolved in the current version of the PR.
I admit defeat.
Edit:
Initialized it in the tests similarly to InitBlocksdirXorKey via static extent arrays instead of vectors. This is closer to prod usage, so it’s even better
Edit: Superseded by #31144#pullrequestreview-2788618744
Didn’t get much deeper than surface level yet, but sharing what I found so far.
My primary suggestion is to change the Obfuscation-ctors to take static-extent spans to prevent accidental out-of-bounds access and also to clarify that we don’t consume bigger vectors (see inline comment). std::array of matching size maps well to such spans.
hodlinator
commented at 1:52 pm on April 22, 2025:
contributor
Changed the serialization of Obfuscation from vector -> array without re-testing my suggestions towards the end. Forgot that one serializes the size and the other does not. Seems to be responsible for some of the test failures on my suggestion-branch.
If you have a suggestion that passes ci (and local IBD for some blocks), let me know
How about CI above + local IBD on mainnet of a couple of months of blocks?
Have you ever encountered IBD failures for code that passes CI? Seems like missing test coverage in that case?
Scaled back some of the vector -> array, especially in context of serialization.
l0rinc
commented at 9:43 am on April 24, 2025:
contributor
Have you ever encountered IBD failures for code that passes CI? Seems like missing test coverage in that case?
yes, extended one of the tests now, hoping that will cover it next time.
then chided me in public
Definitely wasn’t my intention to scold you in any way, just didn’t (and still don’t) understand what you’re objecting to or suggesting in that part of the code. Pushed some changes, if it’s still not clear, let’s discuss in person.
DrahtBot added the label
CI failed
on Apr 24, 2025
DrahtBot
commented at 11:06 am on April 24, 2025:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
l0rinc force-pushed
on Apr 24, 2025
l0rinc force-pushed
on Apr 24, 2025
l0rinc
commented at 7:06 pm on April 24, 2025:
contributor
Thanks for the reviews, addressed most of your concerns - except for the vector constructor for Obfuscation - but if other reviewers also think it’s better that way, I’ll do it of course.
Also extended the BOOST_AUTO_TEST_CASE(dbwrapper) test case with asserting that the obfuscation key can be read back by an unobfuscated instance as well.
l0rinc force-pushed
on Apr 24, 2025
l0rinc force-pushed
on Apr 24, 2025
l0rinc force-pushed
on Apr 24, 2025
hodlinator
commented at 10:13 am on April 25, 2025:
contributor
Just letting the thread know that we’ve cleared up our miscommunication regarding #31144#pullrequestreview-2788618744. The combination of my prior suggestion failing CI and @l0rinc’s cautioning to make sure my further suggestions pass CI seem to be an unlucky coincidence within a very short time-frame. We should strive towards assuming good-faith when communicating over text, but on some days frustrations can get the best of me.
l0rinc force-pushed
on Apr 25, 2025
l0rinc force-pushed
on Apr 25, 2025
l0rinc
commented at 11:34 am on April 25, 2025:
contributor
Sorry for a few useless pushes, I was fighting some compilers for strings not always being constexpr-able…
Code is ready for review again.
DrahtBot
commented at 1:26 pm on April 25, 2025:
contributor
[07:30:05.813] ⚠️ Failure generated from target with exit code 77: ['/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/bin/fuzz', '-runs=1', PosixPath('/ci_container_base/ci/scratch/qa-assets/fuzz_corpora/validation_load_mempool')]
l0rinc force-pushed
on Apr 25, 2025
l0rinc
commented at 5:04 pm on April 25, 2025:
contributor
For now I’ve reverted the assert that checks the deserialized key size, since I couldn’t reproduce the fuzz failure locally (or when I could, even master failed for the same command).
Will continue trying, but without the assert it’s closer to the previous state, so it should be fine as it is as well.
DrahtBot removed the label
CI failed
on Apr 25, 2025
l0rinc force-pushed
on Apr 26, 2025
l0rinc
commented at 2:57 pm on April 26, 2025:
contributor
Restored the deserialization validation in Obfuscation::Unserialize; we can’t assert, but we can throw a std::logic_error, since during mempool fuzzing https://github.com/bitcoin/bitcoin/blob/master/src/node/mempool_persist.cpp#L141 catches and ignores these errors safely (managed to reproduce it on Linux, not sure why it’s not reproducible on Mac).
I’ve also split out all renames to a single commit before any other refactor or optimization to simplify the higher-risk changes.
PR is ready for review again!
in
src/dbwrapper.h:206
in
bba64732ffoutdated
199@@ -210,6 +200,11 @@ class CDBWrapper
200 auto& DBContext() const LIFETIMEBOUND { return *Assert(m_db_context); }
201202 public:
203+ // Prefixed with null character to avoid collisions with other keys
204+ //
205+ // We must use a string constructor which specifies length so that we copy past the null-terminator.
206+ inline static const std::string OBFUSCATION_KEY{"\000obfuscate_key", 14};
hodlinator
commented at 1:27 pm on April 30, 2025:
What is the point of changing this?
Renaming OBFUSCATE_KEY_KEY -> OBFUSCATION_KEY makes the first word nicer, but it is arguably the database key (1) for looking up the key (2) used to obfuscate (0) the data.
Making it inline might be shifting some work from the linker to the compiler (better for parallelism?). Having it all in the header is slightly nicer for humans, but seems like it would duplicate the constant where it is used. Could it just be declared as a file-local (static/anon namespace + constexpr) variable in the .CPP file instead?
If we keep this change, could it be made private constexpr?
Some compilers had problem with string constexpr, it’s why I inlined it instead.
I don’t mind reverting to OBFUSCATION_KEY_KEY (note OBFUSCATE -> OBFUSCATION), if you think it’s better.
Why change it from being private to public, some later PR?
nit: This compiles on GCC 14.2.1 and Clang 20.1.3:
diff --git a/src/dbwrapper.cpp b/src/dbwrapper.cpp
index ffb25b8ac1..ffe98a3fde 100644
--- a/src/dbwrapper.cpp
+++ b/src/dbwrapper.cpp
@@ -32,6 +32,11 @@
 #include <optional>
 #include <utility>

+// Prefixed with null character to avoid collisions with other keys
+//
+// We must use a string constructor which specifies length so that we copy past the null-terminator.
+static constexpr std::string OBFUSCATION_KEY_KEY{"\000obfuscate_key", 14};
+
 static auto CharCast(const std::byte* data) { return reinterpret_cast<const char*>(data); }

 bool DestroyDB(const std::string& path_str)
diff --git a/src/dbwrapper.h b/src/dbwrapper.h
index 7a027d2ce4..8e9c6a31ba 100644
--- a/src/dbwrapper.h
+++ b/src/dbwrapper.h
@@ -200,11 +200,6 @@ private:
     auto& DBContext() const LIFETIMEBOUND { return *Assert(m_db_context); }

 public:
-    // Prefixed with null character to avoid collisions with other keys
-    //
-    // We must use a string constructor which specifies length so that we copy past the null-terminator.
-    inline static const std::string OBFUSCATION_KEY_KEY{"\000obfuscate_key", 14};
-
     CDBWrapper(const DBParams& params);
     ~CDBWrapper();
nit in 366bffd1252a768e1161c7b632ef8c4816bb504e:
Could drop rename here from Xor -> XorObfuscationBench since next commit renames it again to ObfuscationBench.
Was done in the next commit, indeed - localized it, thanks for finding these rebase inconsistencies (likely happened when I changed the order of commits).
in
src/dbwrapper.cpp:253
in
bba64732ffoutdated
@@ -248,24 +248,24 @@ CDBWrapper::CDBWrapper(const DBParams& params)
         LogPrintf("Finished database compaction of %s\n", fs::PathToString(params.path));
     }

-    // The base-case obfuscation key, which is a noop.
-    obfuscate_key = std::vector<unsigned char>(OBFUSCATE_KEY_NUM_BYTES, '\000');
+    {
+        assert(m_obfuscation == 0); // Needed for unobfuscated Read() below
+        std::vector<uint8_t> obfuscation_key_vector(Obfuscation::SIZE_BYTES, '\000');
In the !key_missing case, we’ve just read the vector from a file on disk. It could have been corrupted and got a wildly unexpected length. If it is too short we will read out of bounds here, if it is too long we will continue when we probably should be erroring out. See #31144 (review)
Valid critique - which is a gateway-suggestion forcing me to admit that we need static extent spans/arrays here :/
in
src/streams.h:250
in
bba64732ffoutdated
-    /**
-     * XOR the contents of this stream with a certain key.
-     *
-     * @param[in] key The key used to XOR the data in this stream.
-     */
-    void Xor(const std::vector<unsigned char>& key)
+    void Obfuscate(const Obfuscation& obfuscation)
Should we really continue after reading an invalid key length? Feels like it would be good to throw an exception, or assert the condition instead of if.
(Could adjust the message/comment to include the possibility of us being old software running on some future data format we don’t recognize?)
I’m against throwing in general - especially for constructors, but we’re already reading files in there and already throwing via HandleError - changed and moved to the related refactoring commit.
in
src/node/mempool_persist.cpp:65
in
b085c52030outdated
note in b085c52030ec30294f68abfdbe5b05acbae60a7e:
Seems like there’s a newish footgun with curly-brace initialization of vectors.
This compiles because the passed value unambiguously specifies the length (std::byte isn’t implicitly constructible from integer literals):
std::vector<std::byte> obfuscation{512};
The following fails, because it’s interpreted as initializer-list construction, and the integer literal would be narrowed to fit into the first element of the vector:
std::vector<uint8_t> obfuscation{512};
So your code is correct, but I’m actually starting to feel some anti-curly sentiment brewing for vector initialization.
I don’t think so, if this isn’t true, we’re in trouble - so it has to be the case in prod as well.
The Godbolt reproducer indicates this line will be eliminated in prod if all calls are demonstrably so already.
I think I had a bug here at one point so added this line to make sure it’s easily detectable.
Still, there may be a case for this change, as it is a private function and the input is well covered by tests. Assume() will abort on failure in BUILD_FOR_FUZZING configurations in release.
WIP: Changing this gave a 1-2% difference in benchmarks.
Re-testing now I don’t see a difference, however. Same amount of instructions/MiB on both GCC & Clang for assert/Assume.
Tried removing the check altogether and still get same amount of instructions/MiB on GCC, indicating it is able to infer that it will never fail and is without side-effects. So seems fine to keep!
in
src/obfuscation.h:52
in
989537ff40outdated
+    {
+        if (!*this) return;
+        const uint64_t rot_key{m_rotations[key_offset_bytes % SIZE_BYTES]}; // Continue obfuscation from where we left off
+        for (; target.size() >= SIZE_BYTES; target = target.subspan(SIZE_BYTES)) { // Process multiple bytes at a time
+            Xor(target, rot_key, SIZE_BYTES);
+        }
One more thing - while I was testing this out, I couldn’t find a unit test that tested obfuscating a target buffer starting at different offsets in memory. Might be good to add one?
I’m not sure about the alignment part, seems dangerous - do you see an explanation in the assembly why it would be faster?
I tried reproducing it, but I’m getting a significant slowdown instead (ran each 3x):
void operator()(std::span<std::byte> target, const size_t key_offset_bytes = 0) const
{
    if (!*this) return;
    const uint64_t rot_key{m_rotations[key_offset_bytes % SIZE_BYTES]}; // Continue obfuscation from where we left off
    for (; target.size() >= SIZE_BYTES; target = target.subspan(SIZE_BYTES)) { // Process multiple bytes at a time
        Xor(target, rot_key, SIZE_BYTES);
    }
    Xor(target, rot_key, target.size());
}
| ns/MiB | MiB/s | err% | total | benchmark |
|---:|---:|---:|---:|:---|
| 14,543.90 | 68,757.33 | 0.1% | 11.01 | ObfuscationBench |
| 14,604.52 | 68,471.95 | 0.0% | 11.01 | ObfuscationBench |
| 14,573.71 | 68,616.72 | 0.2% | 11.04 | ObfuscationBench |
void operator()(std::span<std::byte> target, const size_t key_offset_bytes = 0) const
{
    if (!*this) return;
    uint64_t rot_key{m_rotations[key_offset_bytes % SIZE_BYTES]}; // Continue obfuscation from where we left off
    const size_t alignment_remaining{std::min(SIZE_BYTES - (reinterpret_cast<ptrdiff_t>(target.data()) % SIZE_BYTES), target.size())};
    Xor(target, rot_key, alignment_remaining);
    target = target.subspan(alignment_remaining);
    rot_key = m_rotations[(key_offset_bytes + alignment_remaining) % SIZE_BYTES];
    for (; target.size() >= SIZE_BYTES; target = target.subspan(SIZE_BYTES)) { // Process multiple bytes at a time
        *reinterpret_cast<uint64_t*>(target.data()) ^= rot_key;
    }
    Xor(target, rot_key, target.size());
}
| ns/MiB | MiB/s | err% | total | benchmark |
|---:|---:|---:|---:|:---|
| 19,552.98 | 51,143.11 | 0.1% | 11.01 | ObfuscationBench |
| 19,529.72 | 51,204.01 | 0.2% | 11.00 | ObfuscationBench |
| 20,046.53 | 49,883.94 | 0.1% | 11.03 | ObfuscationBench |
However, looking at the assembly more closely, I revisited why GCC doesn’t emit any SIMD, experimented with a few options, and it seems we can help GCC by doing a big unrolled iteration before the whole thing,
which helps GCC recognise that the loop is free of dependencies and perform a 128-bit SIMD pxor over 64-byte blocks instead of doing two 64-bit scalar xors per 16-byte stride - speeding up the benchmark from the mentioned ~19 GiB/s to ~27 GiB/s, i.e. an additional 40% speedup. (Clang is unaffected.)
I’ll do an IBD again to see if it’s significant enough to include it here.
do you see an explanation in the assembly why it would be faster?
Didn’t check the assembly yet, do you mind sharing your workflow for that? I was using a combination of objdump with a bunch of flags + searching in less for the consteval hex stuff ~9 months ago but suspect Obfuscation-methods get heavily inlined?
Re-measured with Clang to make sure I wasn’t running the wrong build, but can still reproduce the result:
At tip of PR 989537ff4026d7c3fa5ba99701e0a4b134d950f7:
Took me longer than anticipated, but I have measured different scenarios here to understand where alignment matters and where it doesn’t, with GCC, Clang and Apple Clang.
GCC 14 is 38.5% faster with alignment & 64 byte unroll compared to just a 64 bit xor
00d77f4097 refactor: unify xor vs obfuscation nomenclature
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| 926,379.31 | 1,079.47 | 0.1% | 6,815,747.36 | 3,325,871.40 | 2.049 | 524,289.23 | 0.0% | 5.50 | ObfuscationBench |
| 926,608.63 | 1,079.20 | 0.0% | 6,815,747.36 | 3,326,572.88 | 2.049 | 524,289.23 | 0.0% | 5.50 | ObfuscationBench |
2daaa83070 optimization: migrate fixed-size obfuscation from std::vector<std::byte> to uint64_t
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| 74,817.84 | 13,365.80 | 0.0% | 655,366.68 | 268,566.88 | 2.440 | 131,074.08 | 0.0% | 5.50 | ObfuscationBench |
| 75,351.98 | 13,271.05 | 0.1% | 655,366.68 | 270,405.84 | 2.424 | 131,074.08 | 0.0% | 5.51 | ObfuscationBench |
c69f7440db experiment: aligned xor only
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| 38,396.57 | 26,043.99 | 0.1% | 294,929.64 | 137,732.80 | 2.141 | 32,771.94 | 0.0% | 5.35 | ObfuscationBench |
| 38,151.47 | 26,211.31 | 0.2% | 294,929.64 | 136,877.12 | 2.155 | 32,771.94 | 0.0% | 5.38 | ObfuscationBench |
9472af65af experiment: unroll only
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| 35,556.53 | 28,124.23 | 0.1% | 262,148.44 | 127,610.00 | 2.054 | 16,384.64 | 0.0% | 5.32 | ObfuscationBench |
| 36,409.31 | 27,465.50 | 0.1% | 262,148.44 | 130,650.29 | 2.006 | 16,384.64 | 0.0% | 5.33 | ObfuscationBench |
e9fd88bb3b experiment: unroll + alignment
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| 34,770.15 | 28,760.30 | 0.0% | 262,149.44 | 124,756.60 | 2.101 | 16,384.83 | 0.0% | 5.32 | ObfuscationBench |
| 36,214.90 | 27,612.94 | 0.0% | 262,149.44 | 129,932.72 | 2.018 | 16,384.84 | 0.0% | 5.33 | ObfuscationBench |
Combining unrolling with aligning readjusts the speed for Apple Clang (which was already doing these automatically so there’s no speed difference because of the latest commit) and brings Clang and GCC closer to the max bus speed.
Clang needed your alignment to achieve that, and GCC needed the unrolled 64-byte XOR to take advantage of SIMD.
Added this as a separate commit with you as co-author, explaining the extra optimizations, and updated the Godbolt listing and the table from:
| Target & Compiler | Stride (per iter) | Main operation(s) in loop | Effective XORs |
|:---|:---|:---|:---|
| Clang x86-64 (trunk) | 32 bytes | two unaligned 128-bit loads → pxor with broadcast key → stores | 4 × 64-bit |
| GCC x86-64 (trunk) | 16 bytes | two scalar 64-bit xor reg-mem instructions directly on the target | 2 × 64-bit |
| GCC RV32 (trunk) | 8 bytes | copy 8 B to stack scratch → XOR via two 32-bit regs → copy back | 1 × 64-bit |
| GCC s390x (big-endian 14.2) | 64 bytes | eight XC ops using key cached on stack | 8 × 64-bit |
to:
| Target & Compiler | Stride (per hot-loop iter) | Main operation(s) in loop | Effective XORs / iter |
|:---|:---|:---|:---|
| Clang x86-64 (trunk) | 64 bytes | 4 × movdqu → pxor → store (aligned after peel) | 8 × 64-bit |
| GCC x86-64 (trunk) | 64 bytes | same 4 × movdqu/pxor sequence, enabled by 8-way unroll | 8 × 64-bit |
| GCC RV32 (trunk) | 8 bytes | copy 8 B to temp stack → 2 × 32-bit XOR → copy back | 1 × 64-bit (as 2 × 32-bit) |
| GCC s390x (big-endian 14.2) | 64 bytes | 8 × XC (mem-mem 8-B XOR) with key cached on stack | 8 × 64-bit |
There was a build failure on 32 bit systems because of the alignment changes - fixed it by moving the std::assume_aligned check inside the alignment condition itself.
Remeasured the benchmarks (all 3 compilers are the same as in the previous push), redid the endianness check (as described in the PR description) and updated the Godbolt listing at https://godbolt.org/z/35nveanf5 again.
hodlinator approved
hodlinator
commented at 8:25 am on May 20, 2025:
contributor
ACK 989537ff4026d7c3fa5ba99701e0a4b134d950f7
Optimizing by operating on whole-word units instead of byte-for-byte makes for more efficient use of common hardware. Extracting the optimization into the Obfuscation abstraction which caches whole-word “rotations” (the XOR-key pre-rotated at all possible byte-offsets) makes sense.
Clear speedup in benchmarks. Even if the least improved of those measured (ReadBlockBench) would be the most representative of real-world use, it is sped up by ~15-16%.
Benchmarks
ObfuscationBench: 12x speedup on Clang, 17x speedup on GCC.
ReadBlockBench: 15-16% speedup on both.
ReadRawBlockBench: ~9x speedup on both.
Used ./build/bin/bench_bitcoin --filter="ObfuscationBench|ReadBlockBench|ReadRawBlockBench" --min-time=10000 on NixOS.
GCC 14.2.1
Before (06353e23e19a116a8e1862bef4b1fcfece14b445, 2nd commit in PR):
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| 861,980.77 | 1,160.12 | 0.2% | 9,437,186.38 | 3,177,041.69 | 2.970 | 1,048,576.97 | 0.0% | 10.79 | ObfuscationBench |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| 5,509,795.20 | 181.49 | 0.5% | 64,457,565.88 | 20,006,289.95 | 3.222 | 3,636,177.14 | 0.3% | 11.03 | ReadBlockBench |
| 936,745.22 | 1,067.53 | 0.3% | 9,010,771.90 | 3,232,301.21 | 2.788 | 1,002,214.88 | 0.0% | 10.99 | ReadRawBlockBench |
After (989537ff4026d7c3fa5ba99701e0a4b134d950f7, last commit in PR):
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| 51,720.05 | 19,334.86 | 0.4% | 327,684.05 | 190,464.16 | 1.720 | 65,536.55 | 0.0% | 10.99 | ObfuscationBench |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| 4,675,373.37 | 213.89 | 0.3% | 55,774,507.70 | 16,981,281.76 | 3.284 | 2,699,256.61 | 0.5% | 10.98 | ReadBlockBench |
| 105,697.36 | 9,460.97 | 0.2% | 324,342.05 | 175,063.45 | 1.853 | 64,840.05 | 0.0% | 10.99 | ReadRawBlockBench |
Clang 20.1.3
Before (06353e23e19a116a8e1862bef4b1fcfece14b445, 2nd commit in PR):
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| 907,086.11 | 1,102.43 | 0.1% | 6,815,747.33 | 3,342,513.78 | 2.039 | 524,289.21 | 0.0% | 11.01 | ObfuscationBench |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| 5,509,068.75 | 181.52 | 0.1% | 61,830,240.13 | 19,977,540.00 | 3.095 | 3,020,308.24 | 0.4% | 10.98 | ReadBlockBench |
| 924,286.56 | 1,081.92 | 0.3% | 6,511,639.90 | 3,190,509.11 | 2.041 | 502,479.88 | 0.0% | 10.97 | ReadRawBlockBench |
After (989537ff4026d7c3fa5ba99701e0a4b134d950f7, last commit in PR):
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| 75,454.29 | 13,253.06 | 0.6% | 655,366.68 | 277,944.85 | 2.358 | 131,074.08 | 0.0% | 10.71 | ObfuscationBench |
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|:---|
| 4,632,323.51 | 215.87 | 0.1% | 55,587,644.66 | 16,818,777.18 | 3.305 | 2,552,825.55 | 0.4% | 11.00 | ReadBlockBench |
| 105,903.19 | 9,442.59 | 0.1% | 262,149.05 | 176,010.62 | 1.489 | 33,762.05 | 0.1% | 11.00 | ReadRawBlockBench |
test: compare util::Xor with randomized inputs against simple impl
Since production code only uses keys of length 8, we're not testing with other values anymore
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
f6689a7d9f
bench: make ObfuscationBench more representative
Since a previous PR already solved the tiny byte-array xors during serialization, we're only concentrating on big continuous chunks now.
> C++ compiler .......................... GNU 14.2.0
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 879,957.05 | 1,136.42 | 0.0% | 9,437,186.42 | 3,158,477.41 | 2.988 | 1,048,576.99 | 0.0% | 5.50 | `ObfuscationBench`
> C++ compiler .......................... Clang 20.1.6
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 926,379.31 | 1,079.47 | 0.1% | 6,815,747.36 | 3,325,871.40 | 2.049 | 524,289.23 | 0.0% | 5.50 | `ObfuscationBench`
be3435c2ff
refactor: unify xor-vs-obfuscation nomenclature
This simplifies the next commits.
2c8cb56fdd
refactor: prepare `DBWrapper` for obfuscation key change
Since `CDBWrapper::Read` still supports vector only, we can initialize `m_obfuscation` directly instead of using a separate helper.
`CreateObfuscation` was also inlined, replaced `key_exists` with `key_missing`, and simplified the `if` condition that writes a new obfuscation key.
Since we're already throwing via `HandleError` here, the obfuscation key is also validated by throwing if invalid.
1b3faf9788
refactor: prepare mempool_persist for obfuscation key change
f79e8b9a35
l0rinc force-pushed
on May 23, 2025
DrahtBot added the label
CI failed
on May 23, 2025
DrahtBot
commented at 2:15 pm on May 23, 2025:
contributor
🚧 At least one of the CI tasks failed.
Task MSan, depends: https://github.com/bitcoin/bitcoin/runs/42787791556
LLM reason (✨ experimental): The CI failure is due to a failed assertion related to memory alignment during streams_tests.
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
optimization: migrate fixed-size obfuscation from `std::vector<std::byte>` to `uint64_t`
Since `util::Xor` now operates on `uint64_t` keys, migrated the obfuscation key end‑to‑end - from use sites in util::Xor up through streams, LevelDB wrappers, and disk (de)serialization - replacing all former `std::vector<std::byte>` keys with `uint64_t` (we still serialize them as vectors but convert immediately to `uint64_t` on load). This is why tests still generate vector keys and convert them to `uint64_t` later instead of generating them directly.
We also short‑circuit `Xor` calls when the key is zero to avoid unnecessary calculations (e.g., `MakeWritableByteSpan`).
In `Obfuscation::Unserialize` we can safely throw a `std::logic_error`, since during mempool fuzzing `mempool_persist.cpp#L141` catches and ignores these errors safely.
> C++ compiler .......................... GNU 14.2.0
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 51,665.69 | 19,355.21 | 0.0% | 327,684.05 | 185,409.22 | 1.767 | 65,536.55 | 0.0% | 5.50 | `ObfuscationBench`
> C++ compiler .......................... Clang 20.1.6
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 74,817.84 | 13,365.80 | 0.0% | 655,366.68 | 268,566.88 | 2.440 | 131,074.08 | 0.0% | 5.50 | `ObfuscationBench`
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
3959a90106
optimization: peel-align head and unroll body to 64 bytes
Benchmarks indicated that processing 8-byte words instead of bytes already gives an order of magnitude speed-up, but:
* GCC still emitted scalar code;
* Clang’s auto-vectorised loop ran on the slow unaligned-load path.
The fix consists of:
* peeling the mis-aligned head enabled the hot loop starting at an 8-byte address;
* `std::assume_aligned<8>` tells the optimiser the promise holds - required to keep Apple Clang happy;
* manually unrolling the body to 64 bytes enabled GCC to auto-vectorise.
> C++ compiler .......................... GNU 14.2.0
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 37,281.01 | 26,823.31 | 0.1% | 475,139.64 | 133,808.24 | 3.551 | 81,920.54 | 0.0% | 5.35 | `ObfuscationBench`
> C++ compiler .......................... Clang 20.1.6
| ns/MiB | MiB/s | err% | ins/MiB | cyc/MiB | IPC | bra/MiB | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 34,770.15 | 28,760.30 | 0.0% | 262,149.44 | 124,756.60 | 2.101 | 16,384.83 | 0.0% | 5.32 | `ObfuscationBench`
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
63854e8a81
l0rinc force-pushed
on May 23, 2025
DrahtBot removed the label
CI failed
on May 23, 2025
hodlinator approved
hodlinator
commented at 9:17 am on May 28, 2025:
contributor
re-ACK 63854e8a81fdef3ca99ebd339db72563d053b9d0
While the code is made less straightforward due to optimizations, I think it’s still worth doing it to minimize overhead from obfuscation.
Adapts code to help compiler with loop unrolling and SIMD.
Instead of having *reinterpret_cast<uint64_t*>(target.data()) ^= rot_key, the unrolling allows for using Obfuscation::Xor() with memcpys back & forth and still have compilers optimize effectively.
I like how this version calculates the number of initial non-aligned bytes.
std::assume_aligned - nice find!
Testing
Guix + Windows
Was curious about Windows performance, so I patched the Guix build to include bench_bitcoin.exe.
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-06-03 09:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me