[IBD] multi-byte block obfuscation #31144

pull l0rinc wants to merge 6 commits into bitcoin:master from l0rinc:l0rinc/optimize-xor changing 13 files +350 −175
  1. l0rinc commented at 7:11 am on October 24, 2024: contributor

    This change is part of [IBD] - Tracking PR for speeding up Initial Block Download

    Summary

    Current block obfuscations are done byte-by-byte, this PR batches them to 64 bit primitives to speed up obfuscating bigger memory batches. This is especially relevant now that #31551 was merged, having bigger obfuscatable chunks.

    Since this obfuscation is optional, the speedup measured here depends on whether it’s a random value or completely turned off (i.e. XOR-ing with 0).

    Changes in testing, benchmarking and implementation

    • Added new tests comparing randomized inputs against a trivial implementation and performing roundtrip checks with random chunks.
    • An additional benchmark checks the effect of short-circuiting XOR when the key is zero, ensuring no speed regression occurs when the obfuscation feature is disabled.
    • Migrated remaining std::vector<std::byte>(8) values to uint64_t.

    Reproducer and assembly

    Memory alignment is handled via std::memcpy, optimized out on tested platforms (see https://godbolt.org/z/P4cWx91Kv):

    • Clang (x86-64) - 128-bit SIMD (pxor), 256-bit unroll (4×64-bit)
    • GCC (x86-64) - 64-bit XOR (QWORD), 128-bit unroll (2×64-bit)
    • RISC-V (32-bit) - 64-bit via 32-bit registers, no unroll, byte-by-byte load/store
    • s390x (big-endian) - 64-bit XOR (xc), 512-bit unroll (8×64-bit)

    Endianness

    The only endianness issue was with bit rotation, intended to realign the key if obfuscation halted before full key consumption. Elsewhere, memory is read, processed, and written back in the same endianness, preserving byte order. Since CI lacks a big-endian machine, testing was done locally via Docker.

    0brew install podman pigz
    1softwareupdate --install-rosetta
    2podman machine init
    3podman machine start
    4docker run --platform linux/s390x -it ubuntu:latest /bin/bash
    5    apt update && apt install -y git build-essential cmake ccache pkg-config libevent-dev libboost-dev libssl-dev libsqlite3-dev && \
    6    cd /mnt && git clone https://github.com/bitcoin/bitcoin.git && cd bitcoin && git remote add l0rinc https://github.com/l0rinc/bitcoin.git && git fetch --all && git checkout l0rinc/optimize-xor && \
    7    cmake -B build && cmake --build build --target test_bitcoin -j$(nproc) && \
    8    ./build/bin/test_bitcoin --run_test=streams_tests
    

    Measurements (micro benchmarks and full IBDs)

    cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release
    && cmake –build build -j$(nproc)
    && build/bin/bench_bitcoin -filter=‘XorObfuscationBench’ -min-time=10000

    Before:

    ns/MiB MiB/s err% total benchmark
    731,927.62 1,366.26 0.2% 10.67 XorObfuscationBench

    After:

    ns/MiB MiB/s err% total benchmark
    14,730.40 67,886.80 0.1% 11.01 XorObfuscationBench

    Before:

    ns/MiB MiB/s err% ins/MiB cyc/MiB IPC bra/MiB miss% total benchmark
    941,015.26 1,062.68 0.0% 9,437,186.97 3,378,911.52 2.793 1,048,577.15 0.0% 10.99 XorObfuscationBench

    After:

    ns/MiB MiB/s err% ins/MiB cyc/MiB IPC bra/MiB miss% total benchmark
    51,187.17 19,536.15 0.0% 327,683.95 183,747.58 1.783 65,536.55 0.0% 11.00 XorObfuscationBench

    i.e. 18x faster obfuscation on Linux, 49x faster on Mac


    A few other benchmarks that seem to have improved as well (tested with Clang only): Before:

    ns/op op/s err% total benchmark
    2,202,618.49 454.01 0.2% 11.01 ReadBlockBench
    734,444.92 1,361.57 0.3% 10.66 ReadRawBlockBench

    After:

    ns/op op/s err% total benchmark
    1,912,308.06 522.93 0.4% 10.98 ReadBlockBench
    49,092.93 20,369.53 0.2% 10.99 ReadRawBlockBench

    i.e. ReadRawBlockBench is 15x faster, ReadBlockBench is 15% faster

    Also visible on https://corecheck.dev/bitcoin/bitcoin/pulls/31144


    Running an IBD until 888888 blocks reveals a 4% speedup.

    SSD:

     0COMMITS="8324a00bd4a6a5291c841f2d01162d8a014ddb02 5ddfd31b4158a89b0007cfb2be970c03d9278525"; \
     1STOP_HEIGHT=888888; DBCACHE=1000; \
     2CC=gcc; CXX=g++; \
     3BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
     4(for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done) && \
     5hyperfine \
     6  --sort 'command' \
     7  --runs 1 \
     8  --export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \
     9  --parameter-list COMMIT ${COMMITS// /,} \
    10  --prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
    11    cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && \
    12    cmake --build build -j$(nproc) --target bitcoind && \
    13    ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 100" \
    14  --cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
    15  "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0"
    

    8324a00bd4 test: Compare util::Xor with randomized inputs against simple impl 5ddfd31b41 optimization: Xor 64 bits together instead of byte-by-byte

    0Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 8324a00bd4a6a5291c841f2d01162d8a014ddb02)
    1  Time (abs ):        25033.413 s               [User: 33953.984 s, System: 2613.604 s]
    2 
    3Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 5ddfd31b4158a89b0007cfb2be970c03d9278525)
    4  Time (abs ):        24110.710 s               [User: 33389.536 s, System: 2660.292 s]
    5 
    6Relative speed comparison
    7        1.04          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 8324a00bd4a6a5291c841f2d01162d8a014ddb02)
    8        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 (COMMIT = 5ddfd31b4158a89b0007cfb2be970c03d9278525)
    

    HDD:

     0COMMITS="71eb6eaa740ad0b28737e90e59b89a8e951d90d9 46854038e7984b599d25640de26d4680e62caba7"; \
     1STOP_HEIGHT=888888; DBCACHE=4500; \
     2CC=gcc; CXX=g++; \
     3BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
     4(for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done) && \
     5hyperfine \
     6  --sort 'command' \
     7  --runs 2 \
     8  --export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \
     9  --parameter-list COMMIT ${COMMITS// /,} \
    10  --prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
    11    cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && cmake --build build -j$(nproc) --target bitcoind && \
    12    ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 100" \
    13  --cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
    14  "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0"
    

    71eb6eaa74 test: compare util::Xor with randomized inputs against simple impl 46854038e7 optimization: migrate fixed-size obfuscation from std::vector<std::byte> to uint64_t

     0Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 71eb6eaa740ad0b28737e90e59b89a8e951d90d9)
     1  Time (mean ± σ):     37676.293 s ± 83.100 s    [User: 36900.535 s, System: 2220.382 s]
     2  Range (min  max):   37617.533 s  37735.053 s    2 runs
     3 
     4Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 46854038e7984b599d25640de26d4680e62caba7)
     5  Time (mean ± σ):     36181.287 s ± 195.248 s    [User: 34962.822 s, System: 1988.614 s]
     6  Range (min  max):   36043.226 s  36319.349 s    2 runs
     7 
     8Relative speed comparison
     9        1.04 ±  0.01  COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 71eb6eaa740ad0b28737e90e59b89a8e951d90d9)
    10        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=4500 -blocksonly -printtoconsole=0 (COMMIT = 46854038e7984b599d25640de26d4680e62caba7)
    
  2. DrahtBot commented at 7:11 am on October 24, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31144.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK hodlinator, ryanofsky

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32043 ([IBD] Tracking PR for speeding up Initial Block Download by l0rinc)
    • #31868 ([IBD] specialize block serialization by l0rinc)
    • #31860 (init: Take lock on blocks directory in BlockManager ctor by TheCharlatan)
    • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
    • #29307 (util: explicitly close all AutoFiles that have been written by vasild)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. l0rinc force-pushed on Oct 24, 2024
  4. DrahtBot added the label CI failed on Oct 24, 2024
  5. in src/streams.h:40 in 10b9e68768 outdated
    41-    key_offset %= key.size();
    42 
    43-    for (size_t i = 0, j = key_offset; i != write.size(); i++) {
    44-        write[i] ^= key[j++];
    45+    if (size_t remaining = write.size() - i; key.size() == 8 && remaining >= 8) { // Xor in 64-bit chunks
    46+        const auto key64 = *std::bit_cast<uint64_t*>(key.data());
    


    maflcko commented at 8:27 am on October 24, 2024:
    I fail to see how this is not UB. This is identical to #30349 (review)

    maflcko commented at 8:54 am on October 24, 2024:

    Isn’t this what we’re doing in CScript as well https://github.com/bitcoin/bitcoin/blob/master/src/script/script.h#L496 ?

    no? value_type is unsigned char (an 8-bit integer type) and this one here is uint64_t (an 64-bit integer type).


    l0rinc commented at 8:55 am on October 24, 2024:
    Yes, just saw it, my mistake

    l0rinc commented at 9:17 am on October 24, 2024:
    Replaced it with memcpy and it looks like the compiler successfully simplifies it to proper vector instructions: https://godbolt.org/z/Koscjconz
  6. in src/streams.h:42 in 10b9e68768 outdated
    43-    for (size_t i = 0, j = key_offset; i != write.size(); i++) {
    44-        write[i] ^= key[j++];
    45+    if (size_t remaining = write.size() - i; key.size() == 8 && remaining >= 8) { // Xor in 64-bit chunks
    46+        const auto key64 = *std::bit_cast<uint64_t*>(key.data());
    47+        const auto size64 = remaining / 8;
    48+        for (auto& write64 : std::span(std::bit_cast<uint64_t*>(write.data() + i), size64)) {
    


    maflcko commented at 8:27 am on October 24, 2024:
    same

    maflcko commented at 8:37 am on October 24, 2024:

    CI seems to agree:

    0/usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_iterator.h:1100:16: runtime error: reference binding to misaligned address 0x7f10961d9084 for type 'unsigned long', which requires 8 byte alignment
    10x7f10961d9084: note: pointer points here
    2  94 8e 20 eb 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
    3              ^ 
    4    [#0](/bitcoin-bitcoin/0/) 0x55780d85ab85 in __gnu_cxx::__normal_iterator<unsigned long*, std::span<unsigned long, 18446744073709551615ul>>::operator*() const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/stl_iterator.h:1100:9
    5    [#1](/bitcoin-bitcoin/1/) 0x55780d85ab85 in util::Xor(Span<std::byte>, Span<std::byte const>, unsigned long) ci/scratch/build-x86_64-pc-linux-gnu/src/test/./src/streams.h:42:28
    

    https://github.com/bitcoin/bitcoin/actions/runs/11495063168/job/31993791381?pr=31144#step:7:4647


    l0rinc commented at 8:44 am on October 24, 2024:
    Thanks, I’ll investigate. I assumed there will be more to check, that’s why it’s still a draft.
  7. maflcko commented at 8:27 am on October 24, 2024: member

    I think your example may be a bit skewed? It shows how much time is spent when deserializing a CScript from a block file. However, block files contain full blocks, where many (most?) of the writes are single bytes (or 4 bytes), see #30833 (comment). Thus, it would be useful to know what the overall end-to-end performance difference is. Also taking into account the utxo db.

    If you want the micro-benchmark to be representative, I’d presume you’d have to mimic the histogram of the sizes of writes. Just picking one (1024, or 1004), which is never hit in reality, and then optimizing for that may be misleading.

  8. l0rinc force-pushed on Oct 24, 2024
  9. l0rinc force-pushed on Oct 24, 2024
  10. l0rinc commented at 10:02 am on October 24, 2024: contributor

    where many (most?) of the writes are single bytes (or 4 bytes)

    Thanks, I’ve extended your previous benchmarks with both Autofile serialization and very small vectors. I will also run a reindex of 400k blocks before and after to see if the effect is measurable or not.

  11. in src/streams.h:44 in 6ae466bf11 outdated
    42+    if (key.size() == 8 && write.size() - i >= 8) { // Xor in 64-bit chunks
    43+        uint64_t key64;
    44+        std::memcpy(&key64, key.data(), 8);
    45+        for (; i <= write.size() - 8; i += 8) {
    46+            uint64_t write64;
    47+            std::memcpy(&write64, write.data() + i, 8);
    


    laanwj commented at 10:41 am on October 24, 2024:

    i have a hard time believing this will make a large difference, especially with the two memcpys involved On modern CPUs, ALU operations (especially bitwise ones) are so fast compared to any kind of memory access. And this isn’t some advanced crypto math, it’s one Xor operation per word with a fixed key.

    Could avoid the memcpys if the code takes memory alignment into account, but that makes it even more complex. Not sure the pros/cons work out here.


    l0rinc commented at 10:46 am on October 24, 2024:
    The speedup comes from the vectorized operations, i.e. doing 64 bit xor instead of byte-by-byte xor (memcpy seems to be eliminated on 64 bit architectures successfully), see https://godbolt.org/z/Koscjconz

    laanwj commented at 11:05 am on October 24, 2024:
    Right, that’s trivial for x86_64. Let’s also check for architectures that do require alignment for 64-bit reads and writes, like RISC-V.

    l0rinc commented at 12:06 pm on October 24, 2024:
    Added RISC-V compiler to https://godbolt.org/z/n5rMeYeas - where it seems to my untrained eyes that it uses two separate 32 bit xors to emulate the 64 bit operation (but even if it’s byte-by-byte on 32 bit processors, that’s still the same as what it was before on 64 bit CPUs, right?).

    l0rinc commented at 10:01 pm on October 26, 2024:

    I’ve replaced all vector keys with 64 bit ints

    Edit: Memory alignment is handled via std::memcpy, optimized out on tested platforms (see godbolt.org/z/dcxvh6abq):

    • Clang (x86-64) - 32 bytes/iter using SSE vector operations
    • GCC (x86-64) - 16 bytes/iter using unrolled 64-bit XORs
    • RISC-V (32-bit) - 8 bytes/iter using load/XOR/store sequence
    • s390x (big-endian) - 64 bytes/iter with unrolled 8-byte XORs

    (please validate, my assembly knowledge is mostly academic)

  12. in src/streams.h:51 in 6ae466bf11 outdated
    59-        // way instead of doing a %, which would effectively be a division
    60-        // for each byte Xor'd -- much slower than need be.
    61-        if (j == key.size())
    62-            j = 0;
    63+    for (size_t j = 0; i < write.size(); ++i, ++j) {
    64+        write[i] ^= key[j % key.size()];
    


    laanwj commented at 10:42 am on October 24, 2024:
    Please avoid % (especially with a dynamic value) in an inner loop. It’s essentially a division operation and those are not cheap.

    l0rinc commented at 10:45 am on October 24, 2024:
    Thanks for the hint, I deliberately removed that (please check the commit messages for details), since these are optimized away. Also, this is just the leftover part, so for key of length 8 (the version used in most places) this will have 7 iterations at most. Can you see any difference with any of the benchmarks?

    laanwj commented at 11:02 am on October 24, 2024:

    It’s only up to 7 iterations (assuming the key size is 8), sure, youre’re right.

    But ok, yea, i’m a bit divided about relying on specific non-trivial things being optimized out, makes the output very dependent on specific compiler decisions (which may be fickle in some cases).


    l0rinc commented at 11:05 am on October 24, 2024:
    Often the simplest code gets optimized most, since it’s more predictable. Would you like me to extend the test or benchmark suite or try something else to make sure we’re comfortable with the change?

    laanwj commented at 11:10 am on October 24, 2024:
    No, it’s fine. It just gives me goosebumps seeing code like this, but if it doesn’t affect the benchmark and no one else cares then it doesn’t matter.

    l0rinc commented at 12:09 pm on October 24, 2024:
    To get rid of the goosebumps I’m handling the remaining 4 bytes as a single 32 bit xor now, so the final loop (when keys are 8 bytes long, which is mostly the case for us, I think) does 3 iterations at most. So even if it’s not optimized away, we should be fine doing 3 divisions by a nice round number like 8.

    maflcko commented at 12:17 pm on October 24, 2024:

    divisions by a nice round number like 8.

    I don’t think the compiler knows the number here, so can’t use it to optimize the code based on it.


    l0rinc commented at 12:19 pm on October 24, 2024:
    Is that important for max 3 divisions?

    maflcko commented at 7:33 pm on October 24, 2024:

    Yes. At least for me locally, but I am getting widely different bench results anyway: #31144 (comment)

    With this one reverted, XorSmall is back to being just slightly slower than current master for me.


    l0rinc commented at 8:37 pm on October 24, 2024:

    Usually these optimizations concentrate on the measurable parts based on the profiling results that I’m getting during reindexing or IBD. Obfuscating a single bit (i.e. XorSmall) wasn’t my focus, it’s already very fast, didn’t seem like the bottleneck. Would you like me to concentrate on that scenario instead? Or would it make more sense to serialize a block and use that as the basis for the benchmarks?

    C++ compiler …………………….. AppleClang 16.0.0.16000026

    Before:

    ns/byte byte/s err% total benchmark
    1.99 501,740,412.05 0.5% 10.27 AutoFileXor
    1.24 807,597,761.92 0.0% 11.01 Xor
    1.29 776,238,564.27 0.0% 10.59 XorSmall

    After:

    ns/byte byte/s err% total benchmark
    0.73 1,364,622,484.81 0.9% 8.78 AutoFileXor
    0.02 40,999,383,920.82 0.0% 11.00 Xor
    0.54 1,862,525,472.57 0.0% 11.00 XorSmall

    C++ compiler …………………….. GNU 12.2.0

    Before:

    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    1.60 624,562,742.35 0.2% 9.20 3.57 2.579 1.03 0.1% 10.61 AutoFileXor
    0.91 1,102,205,071.31 0.0% 9.02 3.26 2.763 1.00 0.1% 11.00 Xor
    1.23 811,876,565.33 0.1% 14.60 4.43 3.295 1.80 0.0% 10.56 XorSmall

    After:

    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    0.74 1,346,798,809.87 0.1% 0.72 0.47 1.531 0.16 0.2% 11.02 AutoFileXor
    0.09 11,450,472,586.50 0.0% 0.59 0.31 1.882 0.14 1.9% 10.82 Xor
    5.65 177,092,223.60 0.1% 22.00 20.31 1.083 4.80 0.0% 10.99 XorSmall

    maflcko commented at 9:11 pm on October 24, 2024:

    Would you like me to concentrate on that scenario instead? Or would it make more sense to serialize a block and use that as the basis for the benchmarks?

    Well no. I think this has been mentioned previously. Generally, optimizing for micro benchmarks may not yield results that are actually meaningful or visible for end-users, because the benchmarks capture only a very specific and narrow view. Optimizing for one could even make the code slower for another (as observed above). Adding a bench for the block couldn’t hurt, but I haven’t checked how representative it is. If such a bench represents the IBD behavior, it would be ideal. (There already is a block in the hex bench data, which could be used)

    Usually these optimizations concentrate on the measurable parts based on the profiling results that I’m getting during reindexing or IBD

    Yes, that is more useful. It would be good to share the number you got. Because the commit message simply claims that no benefit was found (“The if (j == key.size()) optimization wasn’t kept since the benchmarks couldn’t show any advantage anymore”).

    XorSmall

    Looks like you can reproduce the slowdown. I wonder if it is correlated with the use of libc++ vs libstdc++


    l0rinc commented at 11:35 am on October 25, 2024:

    It would be good to share the number you got

    The reindex-chainstate until 600k, 2 runs just finished - comparing master against the 64/32 bit packing (current state) on Linux (with GCC, showing the above inconsistency).

    0hyperfine \
    1--runs 2 \
    2--show-output \
    3--export-json /mnt/my_storage/reindex-xor.json \
    4--parameter-list COMMIT dea7e2faf1bc48f96741ef
    584e25e6f47cefd5a92,353915bae14b9704a209bc09b021d3dd2ee11cf2 \
    6--prepare 'cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DI
    7NSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
    8'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=600000 -dbcache=500 -printtoconsole=0 -reindex-chainstate -connect=0'
    

    Before:

    • 3.554 hours
    • 3.567 hours

    After:

    • 3.523 hours
    • 3.527 hours
    0Benchmark 1: COMMIT=dea7e2faf1bc48f96741ef84e25e6f47cefd5a92 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=600000 -dbcache=500 -printtoconsole=0 -reindex-chains
    1tate -connect=0
    2  Time (mean ± σ):     12819.367 s ± 35.155 s    [User: 11992.168 s, System: 2509.200 s]
    3  Range (min … max):   12794.508 s … 12844.225 s    2 runs
    4
    5Benchmark 2: COMMIT=353915bae14b9704a209bc09b021d3dd2ee11cf2 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=600000 -dbcache=500 -printtoconsole=0 -reindex-chains
    6tate -connect=0
    7  Time (mean ± σ):     12685.350 s ± 19.878 s    [User: 11918.349 s, System: 2523.819 s]
    8  Range (min … max):   12671.295 s … 12699.406 s    2 runs
    

    Reindexing is a lot more stable than IBD (as seen from multiple measurements), showing a consistent 1% speedup. Not earth-shattering, but at least this way the obfuscation isn’t causing a speed regression anymore.


    Adding a bench for the block couldn’t hurt, but I haven’t checked how representative it is

    Let’s find out!


    l0rinc commented at 10:03 pm on October 26, 2024:
    I’ve change the usages to avoid std::vector keys, this way GCC and Clang both agree that the new results are faster (even though clang manages to compile to 32 byte SIMD, while GCC only to 16 bytes per iteration, see #31144 (review))
  13. DrahtBot removed the label CI failed on Oct 24, 2024
  14. l0rinc force-pushed on Oct 24, 2024
  15. l0rinc force-pushed on Oct 24, 2024
  16. l0rinc renamed this:
    optimization: pack util::Xor into 64 bit chunks instead of doing it byte-by-byte
    optimization: pack util::Xor into 64/32 bit chunks instead of doing it byte-by-byte
    on Oct 24, 2024
  17. l0rinc marked this as ready for review on Oct 24, 2024
  18. laanwj added the label Block storage on Oct 24, 2024
  19. maflcko commented at 7:14 pm on October 24, 2024: member

    It would be good to explain the jpegs in the description, or even remove them. They will be excluded from the merge commit and aren’t shown, unless GitHub happens to be reachable and online. Are they saying that IBD was 4% faster? Also, I think they were created with the UB version of this pull, so may be outdated either way?

    I did a quick check on my laptop and it seems the XorSmall (1+4 bytes) is slower with this pull. The Xor (modified to check 40 bytes) was twice as fast. Overall, I’d expect it to be slower on my machine, due to the histogram of real data showing more small byte writes than long ones, IIRC.

    I can try to bench on another machine later, to see if it makes a difference.

    Can you clarify what type of machine you tested this on?

  20. l0rinc commented at 8:39 pm on October 24, 2024: contributor

    Are they saying that IBD was 4% faster?

    That’s what I’m measuring currently, but I don’t expect more than 2% difference here.

    Also, I think they were created with the UB version of this pull, so may be outdated either way?

    Benchmarks indicated that the 64 bit compiled result was basically the same.

    Overall, I’d expect it to be slower on my machine, due to the histogram of real data showing more small byte writes than long ones, IIRC.

    I’ll investigate, thanks.

    Posting the perf here for reference: Reindexing until 300k blocks reveals that XOR usage was reduced:

  21. in src/test/streams_tests.cpp:386 in a3dc138798 outdated
    296@@ -270,7 +297,7 @@ BOOST_AUTO_TEST_CASE(streams_buffered_file)
    297         BOOST_CHECK(false);
    298     } catch (const std::exception& e) {
    299         BOOST_CHECK(strstr(e.what(),
    300-                        "Rewind limit must be less than buffer size") != nullptr);
    301+                           "Rewind limit must be less than buffer size") != nullptr);
    


    hodlinator commented at 12:41 pm on October 25, 2024:
    Seems like prior author just rounded off to some not too unreasonable tab-indentation (efd2474d17098c754367b844ec646ebececc7c74). Function isn’t touched in this PR so should probably resist touching here and below.

    l0rinc commented at 2:18 pm on October 25, 2024:
    Seems it was a deliberate formatting, so I’ll revert. Will push after I have the block serialization benchmark working.

    l0rinc commented at 9:58 pm on October 26, 2024:
    Ended up modifying a lot more in the latest push, so this line was formatted again
  22. in src/test/streams_tests.cpp:35 in a3dc138798 outdated
    30+        const size_t write_size = rng.randrange(100);
    31+        const size_t key_offset = rng.randrange(key_size + 2);
    32+
    33+        std::vector key(rng.randbytes<std::byte>(key_size));
    34+        std::vector expected(rng.randbytes<std::byte>(write_size));
    35+        std::vector actual(expected);
    


    hodlinator commented at 12:46 pm on October 25, 2024:

    Experimented with changed to brace-initialization which uncovered some slight narrowing/widening occurring. (Thought I had an angle for making the code more robust in a more material way but that attempt failed).

     0    auto expected_xor{[](Span<std::byte> write, Span<const std::byte> key, size_t key_offset) {
     1        if (key.size()) {
     2            for (auto& b : write) {
     3                b ^= key[key_offset++ % key.size()];
     4            }
     5        }
     6    }};
     7
     8    FastRandomContext rng{false};
     9    for (int test = 0; test < 100; ++test) {
    10        const int key_size{rng.randrange(10)};
    11        const int write_size{rng.randrange(100)};
    12        const int key_offset{rng.randrange(key_size + 2)};
    13
    14        std::vector key{rng.randbytes<std::byte>(key_size)};
    15        std::vector expected{rng.randbytes<std::byte>(write_size)};
    16        std::vector actual{expected};
    

    l0rinc commented at 2:18 pm on October 25, 2024:
    Seems a bit excessive for a test, but ended up changing it to e.g. const size_t write_size{rng.randrange(100U)};. Will push a bit later.
  23. in src/test/streams_tests.cpp:19 in a3dc138798 outdated
    13@@ -14,6 +14,33 @@ using namespace std::string_literals;
    14 
    15 BOOST_FIXTURE_TEST_SUITE(streams_tests, BasicTestingSetup)
    16 
    17+BOOST_AUTO_TEST_CASE(xor_bytes)
    18+{
    19+    auto expected_xor = [](Span<std::byte> write, Span<const std::byte> key, size_t key_offset) {
    


    hodlinator commented at 1:12 pm on October 25, 2024:
    Might as well use std::span for new code.

    l0rinc commented at 2:18 pm on October 25, 2024:
    Indeed!
  24. hodlinator commented at 2:02 pm on October 25, 2024: contributor

    Concept ACK a3dc138798e2f2c7aa1c9b37633c16c1b523a251

    Operating on CPU words rather than individual bytes. :+1:

    Not entirely clear to me from #31144 (review) whether the optimizer is able to use SIMD. Guess picking through the binary of a GUIX-build would give a definitive answer.

    The verbosity of std::memcpy hurts readability but alignment issues are real.

  25. l0rinc marked this as a draft on Oct 25, 2024
  26. l0rinc force-pushed on Oct 26, 2024
  27. l0rinc force-pushed on Oct 26, 2024
  28. l0rinc force-pushed on Oct 26, 2024
  29. DrahtBot commented at 9:58 pm on October 26, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/32109998785

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  30. DrahtBot added the label CI failed on Oct 26, 2024
  31. l0rinc force-pushed on Oct 26, 2024
  32. l0rinc renamed this:
    optimization: pack util::Xor into 64/32 bit chunks instead of doing it byte-by-byte
    optimization: change XOR obfuscation key from `std::vector<std::byte>(8)` to `uint64_t`
    on Oct 26, 2024
  33. l0rinc force-pushed on Oct 27, 2024
  34. DrahtBot removed the label CI failed on Oct 27, 2024
  35. l0rinc force-pushed on Oct 27, 2024
  36. l0rinc force-pushed on Oct 27, 2024
  37. l0rinc marked this as ready for review on Oct 27, 2024
  38. l0rinc force-pushed on Oct 29, 2024
  39. l0rinc force-pushed on Oct 29, 2024
  40. DrahtBot added the label CI failed on Oct 29, 2024
  41. DrahtBot commented at 12:49 pm on October 29, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/32217592364

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  42. DrahtBot removed the label CI failed on Oct 29, 2024
  43. hodlinator commented at 12:40 pm on October 31, 2024: contributor

    The CI failure in https://github.com/bitcoin/bitcoin/runs/32217592364 might come from a bad rebase reconciliation with master?

    0[12:44:30.723] Duplicate include(s) in src/bench/xor.cpp:
    1[12:44:30.723] #include <cstddef>
    2[12:44:30.723] #include <span.h>
    3[12:44:30.723] #include <streams.h>
    4[12:44:30.723] #include <random.h>
    5[12:44:30.723] #include <vector>
    6[12:44:30.723] #include <bench/bench.h>
    7[12:44:30.723] 
    8[12:44:30.728] ^---- ⚠️ Failure generated from lint-includes.py
    
  44. l0rinc commented at 12:59 pm on October 31, 2024: contributor

    The CI failure in bitcoin/bitcoin/runs/32217592364 might come from a bad rebase reconciliation with master?

    That’s an old build, right? Latest Lint seems fine

  45. in src/dbwrapper.cpp:259 in 8696184c2f outdated
    263-        // Initialize non-degenerate obfuscation if it won't upset
    264-        // existing, non-obfuscated data.
    265-        std::vector<unsigned char> new_key = CreateObfuscateKey();
    266+    obfuscate_key = 0; // needed for unobfuscated Read
    267+    std::vector<unsigned char> obfuscate_key_vector(OBFUSCATE_KEY_NUM_BYTES, '\000');
    268+    if (bool key_exists = Read(OBFUSCATE_KEY_KEY, obfuscate_key_vector); !key_exists && params.obfuscate && IsEmpty()) {
    


    hodlinator commented at 1:09 pm on October 31, 2024:

    While key_exists is only created for this ìf`-block, there are other conditions involved, and we test for the negation of the value, so I find it less surprising to revert to the previous approach.

    0    bool key_exists = Read(OBFUSCATE_KEY_KEY, obfuscate_key_vector);
    1    if (!key_exists && params.obfuscate && IsEmpty()) {
    

    l0rinc commented at 2:57 pm on October 31, 2024:
    This pattern has been used before, I meant to narrow the scope of the var here. If you feel strongly about it, I’ll separate them.

    hodlinator commented at 3:09 pm on October 31, 2024:

    This pattern has been used before

    In the example you give, the variable is used in a more “positive” sense. Here we test the negated value, which is part of what makes it jarring.

    One could maybe improve readability by moving the negation and renaming:

    0    if (bool missing_key = !Read(OBFUSCATE_KEY_KEY, obfuscate_key_vector); missing_key && params.obfuscate && IsEmpty()) {
    

    I meant to narrow the scope of the var here.

    That’s why I was referring to the if-block.

    If you feel strongly about it, I’ll separate them.

    Yes please, either that or my negation move+rename suggestion.


    l0rinc commented at 3:13 pm on October 31, 2024:
    Done both, thanks, good observation about the negation

    hodlinator commented at 3:48 pm on October 31, 2024:

    My point about moving the negation and changing the name made more sense in the context of keeping it inside the if-block. If you are open to moving it out, I’d say it’s better to keep the original key_exists name and original negation to avoid the churn and make it easier to review.

    (Realized another reason for not having it inside the if-block is that we are mutating obfuscate_key_vector, which is used after the block).


    l0rinc commented at 5:43 pm on October 31, 2024:
    I prefer the new key_missing part, it did confuse me once during development as well
  46. in src/node/blockstorage.cpp:1180 in 8696184c2f outdated
    1173@@ -1174,7 +1174,11 @@ static auto InitBlocksdirXorKey(const BlockManager::Options& opts)
    1174         };
    1175     }
    1176     LogInfo("Using obfuscation key for blocksdir *.dat files (%s): '%s'\n", fs::PathToString(opts.blocks_dir), HexStr(xor_key));
    1177-    return std::vector<std::byte>{xor_key.begin(), xor_key.end()};
    1178+    assert(xor_key.size() == 8);
    1179+    uint64_t key;
    1180+    std::memcpy(&key, xor_key.data(), 8);
    1181+    xor_key.fill(std::byte{0});
    


    hodlinator commented at 1:16 pm on October 31, 2024:

    I find it more useful to (static) assert that the std::array size matches the uint64 size directly. Also don’t see a point in zeroing out the local variable before returning?

    0    uint64_t key;
    1    static_assert(xor_key.size() == sizeof(key));
    2    std::memcpy(&key, xor_key.data(), sizeof(key));
    

    l0rinc commented at 2:59 pm on October 31, 2024:
    I’ve added the fills to make sure we’re not using them after conversion anymore. What would be the advantage of the static asserts? I don’t mind removing these failsafes if you think they’re redundant or noisy.

    l0rinc commented at 3:15 pm on October 31, 2024:
    Removed in the end, not that important
  47. in src/bench/xor.cpp:7 in 8696184c2f outdated
    2@@ -3,22 +3,1283 @@
    3 // file COPYING or https://opensource.org/license/mit/.
    4 
    5 #include <bench/bench.h>
    6+#include <cmath>
    7+#include <cstddef>
    8+#include <map>
    


    fanquake commented at 1:16 pm on October 31, 2024:
    You can keep the std lib headers separated from our own headers.

    l0rinc commented at 1:55 pm on October 31, 2024:
    Done, anything else?
  48. maflcko commented at 1:18 pm on October 31, 2024: member

    Taking a step back, I wonder if this is worth it. IIRC it gives a +1% speedup when run on spinning storage, so it seems a higher speedup is possibly visible on faster, modern storage. However, it would be good if any IO delay was taken out of IBD completely, so that the speed of storage and the speed of XOR is largely irrelevant.

    I haven’t looked, but this may be possible by asking the next block the be read into memory in the background, as soon as work on the current block begins.

    Something similar is done in net_processing, where the (presumed) current block is read into memory, and kept there, to avoid having to read it from storage again in validation. See https://github.com/bitcoin/bitcoin/blob/f07a533dfcb172321972e5afb3b38a4bd24edb87/src/net_processing.cpp#L823

  49. in src/node/blockstorage.cpp:1177 in 23fc898514 outdated
    1173@@ -1174,6 +1174,7 @@ static auto InitBlocksdirXorKey(const BlockManager::Options& opts)
    1174         };
    1175     }
    1176     LogInfo("Using obfuscation key for blocksdir *.dat files (%s): '%s'\n", fs::PathToString(opts.blocks_dir), HexStr(xor_key));
    1177+    assert(xor_key.size() == 8);
    


    hodlinator commented at 1:20 pm on October 31, 2024:
    Don’t see much point in adding the assert here in 23fc898514bf9696facbaff65251b62c362d214e were we still only have a fixed-size std::array with the asserted fixed size of 8. Seems sufficient with the assert in the BlockManager-ctor.

    l0rinc commented at 3:13 pm on October 31, 2024:
    Simplified, thanks
  50. l0rinc commented at 1:24 pm on October 31, 2024: contributor

    Taking a step back, I wonder if this is worth it. IIRC it gives a +1% speedup

    That’s not the main point, rather that we’re storing the key in a value that can be short-circuited easily so that when key is 0 (i.e. xor is NOOP), we can skip it. Previously this would have only been possible by checking each byte of the key. It’s also a lot cleaner to store it in a primitive instead, which supports xor natively. Xor comes up in every profiling I do, we shouldn’t have a regression because of #28207 - this PR solves that.

  51. in src/test/streams_tests.cpp:262 in 850214ffd9 outdated
    261@@ -235,7 +262,7 @@ BOOST_AUTO_TEST_CASE(streams_serializedata_xor)
    262     // Single character key
    


    hodlinator commented at 1:29 pm on October 31, 2024:
    In 850214ffd9f56e887a18d0428d5881e6c1ee8652: Single/Multi character key comments don’t make sense inside of this commit.

    l0rinc commented at 3:13 pm on October 31, 2024:
    Removed, thanks

    hodlinator commented at 3:39 pm on October 31, 2024:
    (Should be done in the initial commit which invalidates the comments IMO).
  52. maflcko commented at 1:39 pm on October 31, 2024: member

    we shouldn’t have a regression because of #28207 - this PR solves that.

    It is not possible to do XOR without any cost at all. There will always be an overhead and I think calling #28207 a regression and this change a “fix” is not entirely accurate. This change reduces the overhead, according to the benchmarks.

    That’s not the main point

    The pull request title starts with “optimization”, so I got the impression that a speedup is the main point.

    The reason that std::vector was picked is that switching to larger sizes is possible. However, if there was a need to do that, XOR would likely not be sufficient anyway. So limiting to 8 bytes fixed at compile time seems reasonable.

    I am mostly saying that any speedup here may not be visible at all if IO is completely taken out of the critical path, but I haven’t looked into that in detail.

  53. l0rinc commented at 1:52 pm on October 31, 2024: contributor

    change a “fix” is not entirely accurate

    when xor is disabled we’re not xor-ing now at all. Previously we did xor, so this change brings back the previous behavior when xor is not needed.

    The pull request title starts with “optimization”, so I got the impression that a speedup is the main point.

    Yes, please see the updated description with the benchmarks: #31144#issue-2610689777

    may not be visible at all if IO is completely taken out of the critical path

    I’d argue the current implementation is slightly simpler (i.e. xor is stored and performed natively and can be disabled) and faster (2x for a representative dataset).

  54. l0rinc force-pushed on Oct 31, 2024
  55. in src/streams.cpp:14 in 4437d4379c outdated
    10@@ -11,6 +11,7 @@
    11 AutoFile::AutoFile(std::FILE* file, std::vector<std::byte> data_xor)
    12     : m_file{file}, m_xor{std::move(data_xor)}
    13 {
    14+    assert(m_xor.size() == 8);
    


    hodlinator commented at 2:15 pm on October 31, 2024:
    Guess the point of adding this assert is to prove that it doesn’t break anything before switching to uint64?

    l0rinc commented at 3:01 pm on October 31, 2024:
    yes
  56. in src/streams.h:408 in 4437d4379c outdated
    404@@ -393,7 +405,7 @@ class AutoFile
    405     std::optional<int64_t> m_position;
    406 
    407 public:
    408-    explicit AutoFile(std::FILE* file, std::vector<std::byte> data_xor={});
    409+    explicit AutoFile(std::FILE* file, std::vector<std::byte> data_xor = {std::byte{0x00}, std::byte{0x00}, std::byte{0x00}, std::byte{0x00}, std::byte{0x00}, std::byte{0x00}, std::byte{0x00}, std::byte{0x00}});
    


    hodlinator commented at 2:27 pm on October 31, 2024:
    In 4437d4379c42a7b87bd01ad5ea6c450a732f4f95: Less verbose: {std::byte{0x00}, std::byte{0x00}, std::byte{0x00}, std::byte{0x00}, std::byte{0x00}, std::byte{0x00}, std::byte{0x00}, std::byte{0x00}} => {8, std::byte{0x00}}

    l0rinc commented at 3:01 pm on October 31, 2024:
    it’s removed in the next commit

    l0rinc commented at 3:13 pm on October 31, 2024:
    But done anyway
  57. in src/node/mempool_persist.cpp:189 in 33625e9d53 outdated
    188-            file.SetXor(xor_key);
    189+            std::vector<std::byte> xor_key_vector(8);
    190+            FastRandomContext{}.fillrand(xor_key_vector);
    191+            file << xor_key_vector;
    192+
    193+            uint64_t m_xor;
    


    hodlinator commented at 2:46 pm on October 31, 2024:
    Should not use m_-prefix for non-member variables.

    l0rinc commented at 3:13 pm on October 31, 2024:
    You’re right, I’ve used xor_key in the same case before
  58. l0rinc force-pushed on Oct 31, 2024
  59. hodlinator commented at 4:00 pm on October 31, 2024: contributor

    (Reviewed a5cad729c76cafa047a2b1897595669ae9b2b0d5)

    Since my previous Concept ACK, the PR was changed to switch the xor key more completely to uint64_t. Before the PR, we were already using fixed-size of 8 bytes for the obfuscation value in the file formats, so changing the type to uint64_t shouldn’t be noticeable to users. :+1:

    Even if we could move reading and XOR-ing out of the hot path as suggested by maflcko, we might as well make use the CPU architectures we have. I would expect larger-sized XOR operations to have less overhead and energy waste (less heat).

  60. maflcko commented at 5:32 pm on October 31, 2024: member

    We could have used ReadLE64 to unify the byte order for keys and writable values, but that shouldn’t be necessary, since both have the same endianness locally that shouldn’t be affected by a byte-by-byte xor.

    The s390x unit tests fail:

     0./src/test/streams_tests.cpp(40): error: in "streams_tests/xor_bytes": check { expected.begin(), expected.end() } == { actual.begin(), actual.end() } has failed. 
     1Mismatch at position 0: � != |
     2Mismatch at position 1: Y != �
     3Mismatch at position 2: � != �
     4Mismatch at position 3: � != �
     5Mismatch at position 4: w != �
     6Mismatch at position 5: C != �
     7Mismatch at position 6:  != x
     8Mismatch at position 7: � != �
     9Mismatch at position 8: � != C
    10Mismatch at position 9: Y != �
    11Mismatch at position 10: � != �
    12Mismatch at position 11: , != �
    13Mismatch at position 12: � != R
    14Mismatch at position 13: 8 != �
    15Mismatch at position 14: � != �
    16Mismatch at position 15: � != �
    17Mismatch at position 16: � != l
    18Mismatch at position 17: � != n
    19Mismatch at position 18: � != �
    20Mismatch at position 19: t != B
    21Mismatch at position 20: ; != �
    22Mismatch at position 21: != �
    23Mismatch at position 22: � != �
    24Mismatch at position 23: � != �
    25Mismatch at position 24: k != �
    26Mismatch at position 25: � != Z
    27Mismatch at position 26: � != �
    28Mismatch at position 27: � != �
    29Mismatch at position 28: � != #
    30Mismatch at position 29: 8 != �
    31Mismatch at position 30: � != �
    32Mismatch at position 31: � != �
    33Mismatch at position 32: g != �
    34Mismatch at position 33: � != ^
    35Mismatch at position 34: � != �
    36ismatch at position 36: � != k
    37Mismatch at position 37: * != �
    38Mismatch at position 38: q != 
    39Mismatch at position 39: � != �
    40Mismatch at position 40: � != �
    41Mismatch at position 41: r != e
    42Mismatch at position 42: � != �
    
  61. l0rinc commented at 5:42 pm on October 31, 2024: contributor

    The s390x unit tests fail:

    I don’t know how to access that, is it part of CI? Does the test suite pass on it otherwise or was it just curiosity? Do you know the reason it fails, is my assumption incorrect that endianness should apply to both parts (key and value) the same way? Is the test wrong or the xor, should I change the test such that xor-ing twice should reveal the original data instead (while the intermediary part should not)?

  62. maflcko commented at 6:37 pm on October 31, 2024: member

    I don’t know how to access that, is it part of CI?

    It needs to be run manually. See https://github.com/bitcoin/bitcoin/tree/master/ci#running-a-stage-locally. (podman run --rm --privileged docker.io/multiarch/qemu-user-static --reset -p yes may be required to setup qemu-s390x, depending on your setup). Then something like MAKEJOBS="-j$(nproc)" FILE_ENV="./ci/test/00_setup_env_s390x.sh" ./ci/test_run_all.sh should run it.

    Does the test suite pass on it otherwise or was it just curiosity?

    Yes, it should pass on s390x. If not, that is a bug somewhere.

  63. l0rinc marked this as a draft on Oct 31, 2024
  64. l0rinc force-pushed on Nov 2, 2024
  65. l0rinc commented at 7:45 pm on November 2, 2024: contributor

    Thanks for the hints @maflcko, I was under the impression that big-endian tests were run automatically.

    Fix

    It seem that that std::rotr doesn’t take endianness into account, so the fix basically looks like:

    0size_t key_rotation = 8 * key_offset;
    1if constexpr (std::endian::native == std::endian::big) key_rotation *= -1;
    2return std::rotr(key, key_rotation);
    

    I’ve emulated the s390x behavior locally like this:

    0brew install podman pigz
    1softwareupdate --install-rosetta
    2podman machine init
    3podman machine start
    4docker run --platform linux/s390x -it ubuntu:22.04 /bin/bash
    5    apt update && apt install -y git build-essential cmake ccache pkg-config libevent-dev libboost-dev libssl-dev libsqlite3-dev && \
    6    cd /mnt && git clone https://github.com/bitcoin/bitcoin.git && cd bitcoin && git remote add l0rinc https://github.com/l0rinc/bitcoin.git && git fetch --all && git checkout l0rinc/optimize-xor && \
    7    cmake -B build && cmake --build build --target test_bitcoin -j$(nproc) && \
    8    ./build/src/test/test_bitcoin --run_test=streams_tests
    

    Changes

    • The change also includes an updated benchmarking suite with 700k blocks inspected for all usages of utils::Xor to make the benchmark representative:

    • Changed AutoFileXor benchmark to measure the regression of turning off obfuscation.

    • I’ve also updated all key derivations to vector-to-uint64 instead of generating the 64 bit key directly (it’s more consistent, more representative and helps with endianness).

    • Added a xor roundtrip test which applies the xor in random chunks, asserts that it differs from the original and reapplies it in different random chunks and asserts that it’s equivalent to the original.

    See the complete diff: https://github.com/bitcoin/bitcoin/compare/a5cad729c76cafa047a2b1897595669ae9b2b0d5..af778c5bdea4175f84e82b41f3b51c7f453d8c7e

  66. l0rinc marked this as ready for review on Nov 2, 2024
  67. l0rinc force-pushed on Nov 5, 2024
  68. l0rinc commented at 4:40 pm on November 5, 2024: contributor

    Updated the benchmark to 860k blocks (September 2024):

    This one contains a lot of very big arrays (96'233 separate sizes, biggest was 3'992'470 bytes long) - a big departure from the previous 400k and 700k blocks (having 1500 sizes, biggest was 9319 bytes long).

    The performance characteristics are also quite different, now that we have more and bigger byte arrays:

    C++ compiler …………………….. AppleClang 16.0.0.16000026

    Before:

    ns/byte byte/s err% total benchmark
    1.29 774,577,944.12 0.2% 115.99 XorHistogram

    After:

    ns/byte byte/s err% total benchmark
    0.04 26,411,646,837.32 0.2% 8.97 XorHistogram

    i.e. ~35x faster with Clang at processing the data with representative histograms.

    C++ compiler …………………….. GNU 13.2.0

    Before:

    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    0.97 1,032,916,679.87 0.0% 9.01 3.29 2.738 1.00 0.0% 86.58 XorHistogram

    After:

    ns/byte byte/s err% ins/byte cyc/byte IPC bra/byte miss% total benchmark
    0.10 10,369,097,976.62 0.0% 0.32 0.33 0.985 0.06 0.6% 8.63 XorHistogram

    i.e. ~10x faster with GCC at processing the data with representative histograms.

    Edit: note that I couldn’t use random byte generation for each benchmark value since it timed out on CI. I have replaced it with getting subsets of a single big random vector.

  69. l0rinc force-pushed on Nov 5, 2024
  70. DrahtBot added the label CI failed on Nov 5, 2024
  71. DrahtBot commented at 4:49 pm on November 5, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/32549559124

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  72. l0rinc force-pushed on Nov 6, 2024
  73. l0rinc force-pushed on Nov 6, 2024
  74. l0rinc force-pushed on Nov 7, 2024
  75. l0rinc force-pushed on Nov 8, 2024
  76. DrahtBot removed the label CI failed on Nov 8, 2024
  77. l0rinc commented at 10:59 am on November 11, 2024: contributor

    I ran a reindex-chainstate until 860k blocks (SSD, i7 CPU), before and after this change, 2 runs per commit. As stated in the previous comment, the latter blocks (>700k) seem to contain a lot of very big vectors where the new algorithm shines.

    0hyperfine   \
    1--runs 2   \
    2--export-json /mnt/my_storage/reindex-chainstate-obfuscation-overhead.json  \
    3--parameter-list COMMIT 866f4fa521f6932162570d6531055cc007e3d0cd,3efc72ff7cbdfb788b23bf4346e29ba99362c120,08db1794647c37f966c525411f931a4a0f6b6119 \
    4--prepare 'git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
    5'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=860000 -dbcache=10000 -printtoconsole=0 -reindex-chainstate'
    

    Which indicates that this change speeds up IBD (or at least reindex-chainstate) by roughly ~3% (i.e. reverts the regression):

    • 866f4fa521f6932162570d6531055cc007e3d0cd - baseline
    • 3efc72ff7cbdfb788b23bf4346e29ba99362c120 - end-to-end vector to uint64
    • 08db1794647c37f966c525411f931a4a0f6b6119 - dummy commit that forces obfuscation keys to be 0 to ignore xor operations (since -blocksxor=0 doesn’t affect dbwrapper), to see how xor affects speed in general

    Benchmark 1: COMMIT=866f4fa521f6932162570d6531055cc007e3d0cd ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=860000 -dbcache=10000 -printtoconsole=0 -reindex-chainstate

    Time (mean ± σ): 20846.396 s ± 72.227 s [User: 22353.773 s, System: 2563.864 s] Range (min … max): 20795.323 s … 20897.468 s 2 runs

    Benchmark 2: COMMIT=3efc72ff7cbdfb788b23bf4346e29ba99362c120 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=860000 -dbcache=10000 -printtoconsole=0 -reindex-chainstate

    Time (mean ± σ): 20308.036 s ± 100.345 s [User: 21745.076 s, System: 2591.459 s] Range (min … max): 20237.081 s … 20378.991 s 2 runs

    Benchmark 3: COMMIT=08db1794647c37f966c525411f931a4a0f6b6119 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=860000 -dbcache=10000 -printtoconsole=0 -reindex-chainstate

    Time (mean ± σ): 20293.227 s ± 9.923 s [User: 21711.799 s, System: 2588.886 s] Range (min … max): 20286.210 s … 20300.244 s 2 runs

    0Summary
    1'COMMIT=08db1794647c37f966c525411f931a4a0f6b6119 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=860000 -dbcache=10000 -printtoconsole=0 -reindex-chainstate' ran
    2 1.00 ± 0.00 times faster than 'COMMIT=3efc72ff7cbdfb788b23bf4346e29ba99362c120 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=860000 -dbcache=10000 -printtoconsole=0 -reindex-chainstate'
    3 1.03 ± 0.00 times faster than 'COMMIT=866f4fa521f6932162570d6531055cc007e3d0cd ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=860000 -dbcache=10000 -printtoconsole=0 -reindex-chainstate'
    
  78. in src/test/streams_tests.cpp:301 in a123297318 outdated
    299+        uint64_t xor_key;
    300+        std::memcpy(&xor_key, xor_pat.data(), 8);
    301+
    302         DataStream ds{in};
    303-        ds.Xor({0xff});
    304+        ds.Xor(xor_key);
    


    hodlinator commented at 8:26 pm on November 11, 2024:

    Let’s not add bloat for this case.

    0        ds.Xor(0xffffffffffffffff);
    

    l0rinc commented at 9:59 pm on November 11, 2024:
    Please see: #31144 (review)

    hodlinator commented at 10:26 pm on November 11, 2024:
    All ones (binary) is not endian-sensitive. IMO it’s okay for the tests to look slightly different to reduce this kind of noise.

    l0rinc commented at 10:28 pm on November 11, 2024:
    I want to test the setup we have in production, these tests were very useful in revealing endianness problems.
  79. in src/test/streams_tests.cpp:315 in a123297318 outdated
    316+        uint64_t xor_key;
    317+        std::memcpy(&xor_key, xor_pat.data(), 8);
    318+
    319         DataStream ds{in};
    320-        ds.Xor({0xff, 0x0f});
    321+        ds.Xor(xor_key);
    


    hodlinator commented at 8:32 pm on November 11, 2024:

    Should work regardless of endianness.

    0        ds.Xor(0xff0fff0fff0fff0f);
    

    l0rinc commented at 9:56 pm on November 11, 2024:
    It doesn’t (at least not for all test cases), I’m deliberately testing generating vectors and converting them instead of generating 64 bit values since that’s what happens in production code (this should address multiple of your comments)

    hodlinator commented at 10:45 pm on November 11, 2024:

    l0rinc commented at 3:46 pm on April 5, 2025:

    Changed to:

    const Obfuscation obfuscation{“ff0fff0fff0fff0f”_hex};

  80. in src/test/streams_tests.cpp:82 in a123297318 outdated
    78     const std::vector<uint8_t> test1{1, 2, 3};
    79     const std::vector<uint8_t> test2{4, 5};
    80-    const std::vector<std::byte> xor_pat{std::byte{0xff}, std::byte{0x00}};
    81+    const std::vector xor_pat{std::byte{0xff}, std::byte{0x00}, std::byte{0xff}, std::byte{0x00}, std::byte{0xff}, std::byte{0x00}, std::byte{0xff}, std::byte{0x00}};
    82+    uint64_t xor_key;
    83+    std::memcpy(&xor_key, xor_pat.data(), 8);
    


    hodlinator commented at 8:35 pm on November 11, 2024:
    0    const uint64_t xor_pat{0xff00ff00ff00ff00};
    

    l0rinc commented at 9:56 pm on November 11, 2024:
    Please see: #31144 (review)

    hodlinator commented at 10:35 pm on November 11, 2024:
    Feel like I’m imitating ChatGPT: Ah, yes, I get now how the memcpy() interaction with endianness preserves the functioning of this specific test.
  81. in src/dbwrapper.cpp:257 in a123297318 outdated
    261-
    262-    if (!key_exists && params.obfuscate && IsEmpty()) {
    263-        // Initialize non-degenerate obfuscation if it won't upset
    264-        // existing, non-obfuscated data.
    265-        std::vector<unsigned char> new_key = CreateObfuscateKey();
    266+    obfuscate_key = 0; // needed for unobfuscated Read
    


    hodlinator commented at 9:19 pm on November 11, 2024:
    0    obfuscate_key = 0; // Needed for unobfuscated Read
    

    l0rinc commented at 11:07 pm on November 11, 2024:
    Done
  82. in src/node/mempool_persist.cpp:64 in a123297318 outdated
    62         if (version == MEMPOOL_DUMP_VERSION_NO_XOR_KEY) {
    63-            // Leave XOR-key empty
    64+            file.SetXor(0);
    65         } else if (version == MEMPOOL_DUMP_VERSION) {
    66-            file >> xor_key;
    67+            std::vector<std::byte> xor_key_vector(8);
    


    hodlinator commented at 9:27 pm on November 11, 2024:

    Here and on line 185:

    0            std::vector<std::byte> xor_key_vector{8};
    

    l0rinc commented at 11:07 pm on November 11, 2024:
    Now I had to rename the PR :D
  83. in src/streams.h:50 in a123297318 outdated
    59+    switch (write.size()) { // Specify the optimization categories
    60+    case 0: break;
    61+    case 1: XorInt(write, key, 1); break;
    62+    case 2: XorInt(write, key, 2); break;
    63+    case 4: XorInt(write, key, 4); break;
    64+    default: XorInt(write, key, write.size());
    


    hodlinator commented at 9:36 pm on November 11, 2024:
    Care to elaborate why we don’t just have the default statement?

    l0rinc commented at 9:59 pm on November 11, 2024:
    “Specify the optimization categories” means that the compiler will be able to optimize the cases where one of the parameters (the size) is a constant separately from each other. The default statement would work, but would be very slow, since the 1, 2 and 4 byte versions won’t be specialized.

    hodlinator commented at 10:42 pm on November 11, 2024:

    How about

    0    // Help optimizers along by sending constant parameter values into the inlined function,
    1    // resulting in more efficient substitutions of memcpy() -> native pow-2 copy instructions.
    2    switch (write.size()) {
    3    case 0: break;
    4    case 1: XorInt(write, key, 1); break;
    5    case 2: XorInt(write, key, 2); break;
    6    case 4: XorInt(write, key, 4); break;
    7    default: XorInt(write, key, write.size());
    

    l0rinc commented at 11:07 pm on November 11, 2024:
    I ended up with only // Help the compiler specialize 1, 2 and 4 byte cases, since the rest was just speculation

    l0rinc commented at 3:48 pm on April 5, 2025:
    This isn’t needed anymore, since #31551 will remove all those tiny writes
  84. hodlinator commented at 9:49 pm on November 11, 2024: contributor

    Reviewed a1232973189126cfc9526713011461709685fcc8

    git range-diff master a5cad72 a123297

    The awk script in the comment in c671dd17dced0d51845dc8d2148f288c4c44ecb2 doesn’t add the ' separators…?

    Care to explain the scaling_factor value?

    XorHistogram claims to use 8 GB of RAM. Could be a bit much if we want to be able to also run benchmarks on low-end devices.

     0diff --git a/src/bench/xor.cpp b/src/bench/xor.cpp
     1index 2ba8b17f08..bb193c7fcf 100644
     2--- a/src/bench/xor.cpp
     3+++ b/src/bench/xor.cpp
     4@@ -96269,18 +96269,13 @@ static void XorHistogram(benchmark::Bench& bench)
     5 
     6         total_bytes += scaled_count * size;
     7         for (size_t j{0}; j < scaled_count; ++j) {
     8-            std::vector<std::byte> ret;
     9-            ret.reserve(size);
    10-            ret.insert(ret.end(), pattern.begin(), pattern.begin() + size);
    11-            test_data.push_back(std::move(ret));
    12+            test_data.emplace_back(pattern.begin(), pattern.begin() + size);
    13         }
    14     }
    15     assert(total_bytes == 8'129'394'848); // ~8 GB of data
    16     std::shuffle(test_data.begin(), test_data.end(), rng); // Make it more realistic & less predictable
    17 
    18-    std::vector key_bytes{rng.randbytes<std::byte>(8)};
    19-    uint64_t key;
    20-    std::memcpy(&key, key_bytes.data(), 8);
    21+    const uint64_t key{rng.rand64()};
    22 
    23     size_t offset{0};
    24     bench.batch(total_bytes).unit("byte").run([&] {
    25@@ -96296,9 +96291,7 @@ static void AutoFileXor(benchmark::Bench& bench)
    26     FastRandomContext rng{/*fDeterministic=*/true};
    27     auto data{rng.randbytes<std::byte>(1'000'000)};
    28 
    29-    std::vector<std::byte> empty_key_bytes(8, std::byte{0}); // Test disabled xor
    30-    uint64_t empty_key;
    31-    std::memcpy(&empty_key, empty_key_bytes.data(), 8);
    32+    uint64_t empty_key{0};
    33 
    34     const fs::path test_path = fs::temp_directory_path() / "xor_benchmark.dat";
    35     AutoFile f{fsbridge::fopen(test_path, "wb+"), empty_key};
    
  85. l0rinc commented at 10:11 pm on November 11, 2024: contributor

    The awk script in the comment in https://github.com/bitcoin/bitcoin/commit/c671dd17dced0d51845dc8d2148f288c4c44ecb2 doesn’t add the ’ separators…?

    I’ve added them manually.

    Care to explain the scaling_factor value?

    The values in the histogram (i.e. total bytes streamed through xor) add up to 92gb, but 1 byte values occupy half of that. (edit: this is only true for the first row, in other cases we would need to multiply by the first column)

    Since we have thousands of big values that represent vector of that size, we have to include all of those into the test set at least once. I had to scale down the histogram such that the lower values, having a few hundred occurrences aren’t flattened out completely - to make the histogram still representative. Do you have a better idea?

    XorHistogram claims to use 8 GB of RAM. Could be a bit much if we want to be able to also run benchmarks on low-end devices.

    Yes, but if I scale it down more, more values will be equal in the histogram and it won’t reflect real usage. That’s why I’ve set it to low priority, we don’t have to run these for every execution.

    Edit: pushed some nits to git range-diff 866f4fa521f6932162570d6531055cc007e3d0cd..a1232973189126cfc9526713011461709685fcc8 866f4fa521f6932162570d6531055cc007e3d0cd..57caa965b5ae284e501f892415d60fcb536f4c0e

  86. l0rinc force-pushed on Nov 11, 2024
  87. l0rinc renamed this:
    optimization: change XOR obfuscation key from `std::vector<std::byte>(8)` to `uint64_t`
    optimization: change XOR obfuscation key from `std::vector<std::byte>{8}` to `uint64_t`
    on Nov 11, 2024
  88. hodlinator commented at 11:43 pm on November 11, 2024: contributor

    Care to explain the scaling_factor value?

    The values in the histogram (i.e. total bytes streamed through xor) add up to ~92gb, but 1 byte values occupy half of that. I had to scale down the histogram such that the lower values, having a few hundred occurrences aren’t flattened out completely - to make the histogram still representative. Do you have a better idea?

    So raw_histogram really is the raw data?

    Added:

    0    uint64_t raw_total_bytes{0};
    1    for (size_t i{0}; i < std::size(raw_histogram); i += 2) {
    2        const uint64_t size = raw_histogram[i];
    3        const uint64_t count = raw_histogram[i + 1];
    4        raw_total_bytes += size * count;
    5    }
    6    printf("raw_total_bytes: %ld", raw_total_bytes);
    

    Got:

    0raw_total_bytes: 1'277'637'746'542 # "'" added manually
    

    So something like 1.28TB instead of 92GB, but I can’t seem to get my head screwed on right today.


    If I re-understood correctly what the code was doing with the scaling - it’s doing only 1'000'000 XOR-passes for the most common size (1 byte) instead of 47'584'838'861, and scaling down the number of XOR-passes for the others by the same factor.

    Thought the following would be slightly faster:

     0diff --git a/src/bench/xor.cpp b/src/bench/xor.cpp
     1index 2ba8b17f08..5e933d74ea 100644
     2--- a/src/bench/xor.cpp
     3+++ b/src/bench/xor.cpp
     4@@ -96253,7 +96253,7 @@ static void XorHistogram(benchmark::Bench& bench)
     5     };
     6 
     7     const auto max_count{static_cast<double>(raw_histogram[1])}; // 1 byte is the most common
     8-    const auto scaling_factor{1'000'000U};
     9+    const auto scaling_factor{1'000'000.0 / max_count};
    10 
    11     FastRandomContext rng{/*fDeterministic=*/true};
    12     const size_t pattern_size = raw_histogram[std::size(raw_histogram) - 2];
    13@@ -96265,7 +96265,7 @@ static void XorHistogram(benchmark::Bench& bench)
    14     for (size_t i{0}; i < std::size(raw_histogram); i += 2) {
    15         const uint64_t size = raw_histogram[i];
    16         const uint64_t count = raw_histogram[i + 1];
    17-        const size_t scaled_count{static_cast<size_t>(std::ceil((static_cast<double>(count) / max_count) * scaling_factor))};
    18+        const size_t scaled_count{static_cast<size_t>(std::ceil(static_cast<double>(count) * scaling_factor))};
    19 
    20         total_bytes += scaled_count * size;
    21         for (size_t j{0}; j < scaled_count; ++j) {
    

    Still passes the total_bytes == 8'129'394'848-assert unmodified, but seems slightly slower in my measurements (including for posterity).

  89. hodlinator commented at 9:21 am on November 12, 2024: contributor

    Reviewed 57caa965b5ae284e501f892415d60fcb536f4c0e

    git range-diff master a123297 57caa96

    • Applied most of my nits that were still valid.

    I understand you were burned by endianness but I disagree that it’s worth sacrificing readability where endianness is a non-issue.

    For cases where endianness is a concern, I suggest the following pattern:

    0uint64_t xor_key;
    1constexpr std::array<uint8_t, sizeof xor_key> xor_pat{std::to_array<uint8_t>({0xff, 0x00, 0xff, 0x00, 0xff, 0x00, 0xff, 0x00})};
    2std::memcpy(&xor_key, xor_pat.data(), sizeof xor_key);
    
    1. Declare the uint64_t first so sizeof can be used in place of magic 8.
    2. Use constexpr std::array instead of const std::vector to constrain the size.
    3. Use to_array initialization since regular brace-initialization will only complain if too many elements are given, not too few.
    4. Use uint8_t in favor of std::byte to improve readability.
    5. Use sizeof <target> for the memcpy call since it is good practice to avoid writing out of bounds and also avoids the magic 8.

    Use ~0ULL to express a uint64_t with all bits set, ensuring one doesn’t forget an f.


     0diff --git a/src/bench/xor.cpp b/src/bench/xor.cpp
     1index f94a1a6f96..fdf41b3696 100644
     2--- a/src/bench/xor.cpp
     3+++ b/src/bench/xor.cpp
     4@@ -96275,10 +96275,7 @@ static void XorHistogram(benchmark::Bench& bench)
     5     assert(total_bytes == 8'129'394'848); // ~8 GB of data
     6     std::shuffle(test_data.begin(), test_data.end(), rng); // Make it more realistic & less predictable
     7 
     8-    std::vector key_bytes{rng.randbytes<std::byte>(8)};
     9-    uint64_t key;
    10-    std::memcpy(&key, key_bytes.data(), 8);
    11-
    12+    const uint64_t key{rng.rand64()};
    13     size_t offset{0};
    14     bench.batch(total_bytes).unit("byte").run([&] {
    15         for (auto& data : test_data) {
    16@@ -96293,10 +96290,7 @@ static void AutoFileXor(benchmark::Bench& bench)
    17     FastRandomContext rng{/*fDeterministic=*/true};
    18     auto data{rng.randbytes<std::byte>(1'000'000)};
    19 
    20-    std::vector<std::byte> empty_key_bytes(8, std::byte{0}); // Test disabled xor
    21-    uint64_t empty_key;
    22-    std::memcpy(&empty_key, empty_key_bytes.data(), 8);
    23-
    24+    const uint64_t empty_key{0}; // Test disabled xor
    25     const fs::path test_path = fs::temp_directory_path() / "xor_benchmark.dat";
    26     AutoFile f{fsbridge::fopen(test_path, "wb+"), empty_key};
    27     bench.batch(data.size()).unit("byte").run([&] {
    28diff --git a/src/test/streams_tests.cpp b/src/test/streams_tests.cpp
    29index 5e9b607b3a..ac51eb3e51 100644
    30--- a/src/test/streams_tests.cpp
    31+++ b/src/test/streams_tests.cpp
    32@@ -77,9 +77,9 @@ BOOST_AUTO_TEST_CASE(xor_file)
    33     auto raw_file{[&](const auto& mode) { return fsbridge::fopen(xor_path, mode); }};
    34     const std::vector<uint8_t> test1{1, 2, 3};
    35     const std::vector<uint8_t> test2{4, 5};
    36-    const std::vector xor_pat{std::byte{0xff}, std::byte{0x00}, std::byte{0xff}, std::byte{0x00}, std::byte{0xff}, std::byte{0x00}, std::byte{0xff}, std::byte{0x00}};
    37     uint64_t xor_key;
    38-    std::memcpy(&xor_key, xor_pat.data(), 8);
    39+    constexpr std::array<uint8_t, sizeof xor_key> xor_pat{std::to_array<uint8_t>({0xff, 0x00, 0xff, 0x00, 0xff, 0x00, 0xff, 0x00})};
    40+    std::memcpy(&xor_key, xor_pat.data(), sizeof xor_key);
    41 
    42     {
    43         // Check errors for missing file
    44@@ -293,12 +293,8 @@ BOOST_AUTO_TEST_CASE(streams_serializedata_xor)
    45     in.push_back(std::byte{0xf0});
    46 
    47     {
    48-        const std::vector xor_pat{std::byte{0xff}, std::byte{0xff}, std::byte{0xff}, std::byte{0xff}, std::byte{0xff}, std::byte{0xff}, std::byte{0xff}, std::byte{0xff}};
    49-        uint64_t xor_key;
    50-        std::memcpy(&xor_key, xor_pat.data(), 8);
    51-
    52         DataStream ds{in};
    53-        ds.Xor(xor_key);
    54+        ds.Xor(~0ULL);
    55         BOOST_CHECK_EQUAL("\xf0\x0f"s, ds.str());
    56     }
    57 
    58@@ -307,9 +303,9 @@ BOOST_AUTO_TEST_CASE(streams_serializedata_xor)
    59     in.push_back(std::byte{0x0f});
    60 
    61     {
    62-        const std::vector xor_pat{std::byte{0xff}, std::byte{0x0f}, std::byte{0xff}, std::byte{0x0f}, std::byte{0xff}, std::byte{0x0f}, std::byte{0xff}, std::byte{0x0f}};
    63         uint64_t xor_key;
    64-        std::memcpy(&xor_key, xor_pat.data(), 8);
    65+        constexpr std::array<uint8_t, sizeof xor_key> xor_pat{std::to_array<uint8_t>({0xff, 0x0f, 0xff, 0x0f, 0xff, 0x0f, 0xff, 0x0f})};
    66+        std::memcpy(&xor_key, xor_pat.data(), sizeof xor_key);
    67 
    68         DataStream ds{in};
    69         ds.Xor(xor_key);
    
  90. fanquake commented at 10:07 am on November 12, 2024: member
    I haven’t really looked at the changes here, but just looking at the diff (+96'000 lines), my feedback would be that you’ll need to change approach in regards to making your data available (if the intent is to have that included), as I doubt we’ll be adding 96'000 lines to bench/xor.cpp. You could look at how we generate a header from bench/data/block413567.raw for a different approach, as including a small binary blob, and parsing it into a header at compile time is far more palatable.
  91. l0rinc commented at 3:16 pm on November 12, 2024: contributor

    Thanks @fanquake, I thought of that, can you please help me understand the constraints? Wouldn’t that require a cmake generation step from binary to header which would basically produce the exact same lines as what we have now? Would it help if I simply extracted it to a separate header file instead?

    So something like 1.28TB instead of 92GB, but I can’t seem to get my head screwed on right today.

    Yeah, I’ve edited that part since, my napkin calculations were only true for the first row, in other cases we would need to multiply by the first column - like you did. But the point is that it’s a lot of data that we have to scale down.

    but seems slightly slower in my measurements

    I thought of that as well, but wanted to avoid floating point conversion (likely the reason for the slowness in your example)

  92. fanquake commented at 5:51 pm on November 12, 2024: member

    Wouldn’t that require a cmake generation step from binary to header which would basically produce the exact same lines as what we have now?

    Yes. See bench/data/block413567.raw & bench/data/block413567.raw.h, where at build time a header file of ~125'000 lines is produced.

    Would it help if I simply extracted it to a separate header file instead?

    I don’t think so. The point is more to not add 100'000s of lines of “data” to this repo, which doesn’t scale across many benchmarks, creates unusable diffs, leaves (source) files unviewable on GH etc.

  93. l0rinc force-pushed on Nov 13, 2024
  94. l0rinc commented at 3:27 pm on November 13, 2024: contributor

    I understand you were burned by endianness but I disagree that it’s worth sacrificing readability where endianness is a non-issue.

    Thanks @hodlinator for the suggestions, I tried them all, but in the end decided that I value consistency more than coming up with a separate solution for each test case. These are ugly, I agree, but at least they’re testing the setup we’re using in prod. I did however change the hard-coded 8 values to sizeof xor_key (for memcpy) or sizeof(uint64_t) for vector inits.

    The point is more to not add 100'000s of lines of “data” to this repo

    I have stored the sorted diffs in the end (since the lines are correlated, i.e. more likely to have similar neighbours) and compressed them using .tar.gz (added the generator python script as a gist, please verify) - this way the histogram data is ~100 kb instead of 1.7 MB (thanks for the hint @fanquake). I’ve extended GenerateHeaderFromRaw.cmake with compression support (adjusting GenerateHeaders.cmake to trim the suffix from the header name) and added more safety asserts to make sure the data read back is the same as before.

  95. l0rinc force-pushed on Dec 2, 2024
  96. in src/streams.h:43 in f2fd1f7c04 outdated
    41+}
    42+inline void Xor(Span<std::byte> write, const uint64_t key)
    43+{
    44+    assert(key);
    45+    for (constexpr auto size = sizeof(uint64_t); write.size() >= size; write = write.subspan(size)) {
    46+        XorInt(write, key, size);
    


    hodlinator commented at 2:51 pm on December 3, 2024:

    nit: Clearer name and source of sizeof expression.

    0    for (constexpr auto chunk_size = sizeof(key); write.size() >= chunk_size; write = write.subspan(chunk_size)) {
    1        XorInt(write, key, chunk_size);
    

    l0rinc commented at 5:31 pm on December 6, 2024:
    Not convinced chunk_size is better - but will use sizeof(key) on next push

    l0rinc commented at 5:36 pm on December 6, 2024:
    Done, kept the name
  97. in src/bench/xor.cpp:1075 in f2fd1f7c04 outdated
    108+    }
    109 }
    110 
    111-BENCHMARK(Xor, benchmark::PriorityLevel::HIGH);
    112+BENCHMARK(XorHistogram, benchmark::PriorityLevel::LOW);
    113+BENCHMARK(AutoFileXor, benchmark::PriorityLevel::LOW);
    


    hodlinator commented at 3:00 pm on December 3, 2024:
    Missing newline at EOF in latest version.

    l0rinc commented at 5:31 pm on December 6, 2024:
    I still don’t see the reason for keeping them, but I’ll add it back on next push if it’s important

    l0rinc commented at 5:36 pm on December 6, 2024:
    Done
  98. in src/bench/xor.cpp:34 in f2fd1f7c04 outdated
    33+}
    34+
    35+static void XorHistogram(benchmark::Bench& bench)
    36+{
    37+    // The histogram represents util::Xor method's write.size() histograms for the first 860k blocks
    38+    // aggregated and encoded with https://gist.github.com/l0rinc/a44da845ad32ec89c30525507cdd28ee
    


    hodlinator commented at 3:28 pm on December 3, 2024:

    nits: Although there is precedent for adding references to gists, I’m not sure we should encourage it. Would have preferred a file in this repo’s contrib/ directory.

    Would also have preferred that it did the full calculation of the .tgz file instead of having a hard-coded array, computing it by looking at linearalized blocks on disk.


    l0rinc commented at 5:32 pm on December 6, 2024:
    Removed
  99. hodlinator commented at 4:04 pm on December 3, 2024: contributor

    Code reviewed f2fd1f7c043a2782cb2bf3c9fe7e2f94c17728b5

    0₿ git range-diff master 57caa96 f2fd1f7
    1₿ git show e314bb7e00 > old
    2₿ git show 91a8fde051 > new
    3₿ meld old new
    

    Thanks for using more constexpr std::arrays and clearer sizeofs!

    Nice that block data could be compressed to such a large extent.

    nit: Would prefer the *.cmake changes where broken out into their own commit, keeping only src/bench/CMakeLists.txt as part of the benchmark change.

  100. ryanofsky commented at 5:00 pm on December 3, 2024: contributor

    Concept ACK, but curious for more feedback from @maflcko about this PR. The actual code changes here do not seem too complicated but maybe they make the code less generic. I wonder if you think there are concrete downsides to this PR, or if the changes are ok but possibly not be worth the review effort (as #31144 (comment) seems to suggest)

    I’m happy to spend time reviewing this if it improves performance and doesn’t cause other problems.

    this way the histogram data is ~100 kb instead of 1.7 MB

    Current approach seems ok to me, but wondering it it might be better to just use a sampling of the most common write sizes instead of including the entire histogram. It seems like if you take the top 50 sizes it covers 99.6% of the writes, and might make the test more maintainable and PR easier to understand without changing results too much.

    using histogram from https://gist.github.com/l0rinc/a44da845ad32ec89c30525507cdd28ee

     0    cut = 50
     1    hist_count = rest_count = 0
     2    histogram.sort(key=lambda h: (-h[1]*h[0]))
     3    for i, (size, count) in enumerate(histogram):
     4        if i < cut:
     5           print(f"{size=}, {count=}")
     6           hist_count += count
     7        else:
     8           rest_count += count
     9
    10    print()
    11    print(f"{hist_count=} {hist_count/(hist_count+rest_count)*100:.1f}%")
    12    print(f"{rest_count=} {rest_count/(hist_count+rest_count)*100:.1f}%")
    
     0size=32, count=5369404406
     1size=106, count=1193555153
     2size=71, count=1763349816
     3size=107, count=1064027363
     4size=25, count=2705183236
     5size=33, count=2025696499
     6size=4, count=14983070199
     7size=72, count=827712501
     8size=23, count=2357347372
     9size=1, count=47584838861
    10size=8, count=5939128629
    11size=21, count=2019807250
    12size=22, count=1616301504
    13size=34, count=682781008
    14size=139, count=155158814
    15size=253, count=83729340
    16size=105, count=198656556
    17size=64, count=275253628
    18size=4096, count=3905157
    19size=138, count=112918349
    20size=252, count=48105681
    21size=254, count=38940977
    22size=35, count=269339238
    23size=218, count=25918921
    24size=140, count=37826769
    25size=83, count=53818260
    26size=217, count=13625647
    27size=219, count=12992432
    28size=108, count=24662023
    29size=488, count=4909014
    30size=27, count=87057844
    31size=489, count=4483016
    32size=28, count=78184146
    33size=40, count=50600412
    34size=113, count=17829712
    35size=65, count=29197009
    36size=128, count=13323530
    37size=123, count=12986923
    38size=37, count=41467794
    39size=114, count=9826401
    40size=26, count=35870265
    41size=130, count=7111451
    42size=29, count=30858682
    43size=487, count=1836899
    44size=70, count=12772097
    45size=131, count=6398914
    46size=109, count=7506161
    47size=490, count=1380253
    48size=95901, count=6384
    49size=30, count=19197417
    50
    51hist_count=91959859913 99.6%
    52rest_count=382932220 0.4%
    
  101. maflcko commented at 5:09 pm on December 3, 2024: member

    Concept ACK, but curious for more feedback from @maflcko about this PR. The actual code changes here do not seem too complicated but maybe they make the code less generic.

    There is a good chance that increasing the size of the vector is insufficient, if there is ever a need to increase it to more than 8 bytes, so a complete rewrite may be needed in that case anyway. However, this is just my guess and only time will tell. So I’d say this change is probably fine for now.

    Would still be nice to if there was a way to take all of it out of the hot path (possibly with higher overall benefits), but I don’t know if such a change is possible and will replace this pull.

  102. l0rinc commented at 5:44 pm on December 3, 2024: contributor

    Would still be nice to if there was a way to take all of it out of the hot path

    Can you give me hints on how to do that? Since we have a primitive as a key now, we already skip xor with 0 value now, see https://github.com/bitcoin/bitcoin/pull/31144/files#diff-4020c723bb55e114bdc7ff769086a765dcc7ccfb61da2047a315db16c0c7a8b4R295

    but wondering it it might be better to just use a sampling of the most common write sizes @fanquake mentioned that he thinks this benchmark could be useful - if he’s fine with the truncated version as well, I’ll simplify (would solve some of @hodlinator’s cmake concerns as well).

  103. maflcko commented at 6:33 pm on December 3, 2024: member

    It seems like if you take the top 50 sizes it covers 99.6% of the writes

    I had the same thought. Obviously there could be an unlikely problem if the remaining 0.04% of writes accounted for the majority of the time, but that seems unlikely. Other than that, taking only the top N seems preferable.

  104. l0rinc commented at 9:38 am on December 5, 2024: contributor

    Would still be nice to if there was a way to take all of it out of the hot path

    Since blocks are XOR-ed as well, I can’t meaningfully test it with a reindex(-chainstate), so I did 2 full IBDs until 800k blocks, rebased after #30039 with -blocksxor=0 to test whether we can disable xor completely now.

    0hyperfine \
    1--runs 2 \
    2--export-json /mnt/my_storage/IBD-xor-rebased.json \
    3--parameter-list COMMIT e1074081c9f1895a4f629dfee347ceae484a10d3,f2fd1f7c043a2782cb2bf3c9fe7e2f94c17728b5 \
    4--prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
    5'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -blocksxor=0 -dbcache=10000 -printtoconsole=0'
    
    0Benchmark 1: COMMIT=e1074081c9f1895a4f629dfee347ceae484a10d3 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -blocksxor=0 -dbcache=10000 -printtoconsole=0
    1  Time (mean ± σ):     25797.921 s ± 61.629 s    [User: 26803.189 s, System: 1457.936 s]
    2  Range (min … max):   25754.343 s … 25841.500 s    2 runs
    3 
    4Benchmark 2: COMMIT=f2fd1f7c043a2782cb2bf3c9fe7e2f94c17728b5 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -blocksxor=0 -dbcache=10000 -printtoconsole=0
    5  Time (mean ± σ):     23751.046 s ± 342.376 s    [User: 25322.345 s, System: 1509.236 s]
    6  Range (min … max):   23508.949 s … 23993.142 s    2 runs
    

    Which indicates a 9% speedup compared to the baseline:

    0Summary
    1  COMMIT=f2fd1f7c043a2782cb2bf3c9fe7e2f94c17728b5 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -blocksxor=0 -dbcache=10000 -printtoconsole=0 ran
    2    1.09 ± 0.02 times faster than COMMIT=e1074081c9f1895a4f629dfee347ceae484a10d3 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -blocksxor=0 -dbcache=10000 -printtoconsole=0
    
  105. maflcko commented at 9:46 am on December 5, 2024: member

    Would still be nice to if there was a way to take all of it out of the hot path (possibly with higher overall benefits), but I don’t know if such a change is possible and will replace this pull.

    Actually, this change here also affects RPC performance, not just internal validation, so this can be done in a follow-up or separate pull, if it is possible at all.

  106. l0rinc commented at 1:48 pm on December 6, 2024: contributor

    More context: The previous benchmark was for completely turning off XOR - but we can only do that for new IBD by explicitly setting it to 0. But for the majority of cases we likely want to still to the xor, so this PR is meant to speed it up. I have remeasured it with doing full IBD until 800k blocks (two runs to measure stability, since reindex wouldn’t cover all usages of XOR):

    0hyperfine \
    1--runs 2 \
    2--export-json /mnt/my_storage/IBD-xor-rebased.json \
    3--parameter-list COMMIT e1074081c9f1895a4f629dfee347ceae484a10d3,f2fd1f7c043a2782cb2bf3c9fe7e2f94c17728b5 \
    4--prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
    5'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=10000 -printtoconsole=0'
    
    0Benchmark 1: COMMIT=e1074081c9f1895a4f629dfee347ceae484a10d3 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=10000 -printtoconsole=0
    1  Time (mean ± σ):     25601.461 s ± 65.686 s    [User: 27025.116 s, System: 1586.908 s]
    2  Range (min … max):   25555.014 s … 25647.907 s    2 runs
    3 
    4Benchmark 2: COMMIT=f2fd1f7c043a2782cb2bf3c9fe7e2f94c17728b5 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=10000 -printtoconsole=0
    5  Time (mean ± σ):     24526.781 s ± 389.029 s    [User: 25525.801 s, System: 1552.625 s]
    6  Range (min … max):   24251.697 s … 24801.866 s    2 runs
    

    Which indicates that this will speed up IBD by roughly 4% on average (now that 30039 was merged the difference is more obvious):

    0Summary
    1 COMMIT=f2fd1f7c043a2782cb2bf3c9fe7e2f94c17728b5 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=10000 -printtoconsole=0 ran
    2   1.04 ± 0.02 times faster than COMMIT=e1074081c9f1895a4f629dfee347ceae484a10d3 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=800000 -dbcache=10000 -printtoconsole=0
    
  107. l0rinc renamed this:
    optimization: change XOR obfuscation key from `std::vector<std::byte>{8}` to `uint64_t`
    optimization: speed up XOR by 4% (9% when disabled) by applying it in larger batches
    on Dec 6, 2024
  108. l0rinc force-pushed on Dec 6, 2024
  109. l0rinc commented at 5:30 pm on December 6, 2024: contributor

    Thanks for the reviews and hints, I’ve pushed the following changes:

    • Reverted all cmake changes @hodlinator mentioned and histogram archives, and based on the hints of @ryanofsky and @maflcko I’ve kept only the top entries (by frequency, re-sorted by size), making sure that the really big write-vectors are also covered (so kept the first 1000 instead of just the first 50). This enabled putting all the data in the sourcefile.
    • Added Assumes to each xor to check that we don’t have any useless calls with 0 keys - making sure we “turn off” the feature when we can.
  110. l0rinc force-pushed on Dec 6, 2024
  111. in src/streams.h:58 in b9c847fd09 outdated
    68+
    69+inline uint64_t RotateKey(const uint64_t key, const size_t key_offset)
    70+{
    71+    Assume(key);
    72+    size_t key_rotation{8 * key_offset};
    73+    if (key_rotation % 64 == 0) return key;
    


    maflcko commented at 1:37 pm on December 7, 2024:
    std::rotr can handle it fine, so this can be removed. Also the codegen is better without this for some reason?

    l0rinc commented at 2:12 pm on December 10, 2024:
    yes, it’s an optimization to avoid doing any rotation when it would wrap around - it would work without this as well. It’s not a measurable speedup, though, so I can remove it if you insist.
  112. in src/node/blockstorage.cpp:1178 in b9c847fd09 outdated
    1173@@ -1174,7 +1174,9 @@ static auto InitBlocksdirXorKey(const BlockManager::Options& opts)
    1174         };
    1175     }
    1176     LogInfo("Using obfuscation key for blocksdir *.dat files (%s): '%s'\n", fs::PathToString(opts.blocks_dir), HexStr(xor_key));
    1177-    return std::vector<std::byte>{xor_key.begin(), xor_key.end()};
    1178+    uint64_t key;
    1179+    std::memcpy(&key, xor_key.data(), sizeof key);
    


    maflcko commented at 1:20 pm on December 10, 2024:

    I wonder if it wouldn’t be better to have a class take care of construction, so that the memcpy is limited to one place. Something like:

    0class XorKey{
    1 uint64_t m_data;
    2 XorKey(std::span<std::byte, 8> sp);
    3}
    

    IIUC the compilers should still be able to treat this as an integral in codegen.


    l0rinc commented at 2:23 pm on December 10, 2024:

    You mean like a

    0union {
    1    uint64_t as_uint64; 
    2    std::array<std::byte, 8> as_bytes;
    3};
    

    or just a static method that initializes the uint64_t in a single place? Or should we encapsulate the XorKey concept behind a struct as well?


    sipa commented at 2:59 pm on December 10, 2024:

    It is undefined behavior in C++ to read from a union member that wasn’t the one most recently written to.

    Using memcpy between a data type and something (equivalent to) a char/byte array is not UB.


    maflcko commented at 5:49 pm on December 10, 2024:

    You mean like a

    0union {
    1    uint64_t as_uint64; 
    2    std::array<std::byte, 8> as_bytes;
    3};
    

    No, I mean just a simple struct. Have you seen my suggestion?

    The benefits would be that passing the integral value around would be type-safe. Also, the endian-considerations are fully contained in a simple struct, as opposed to the xor internal implementation detail and all modules that use xor. Finally, the memcpy would also be contained in a single place. Overall this could make the code smaller, or not, depending on how many users are there. However, in any case, encapsulating the assumptions around type-safety, and endianness would already be worth it in my view.


    l0rinc commented at 3:51 pm on April 5, 2025:
    The latest version uses a dedicated Obfuscation type - very good suggestion, it’s a lot cleaner this way, thanks
  113. in src/test/streams_tests.cpp:21 in e1074081c9 outdated
    13@@ -14,13 +14,73 @@ using namespace std::string_literals;
    14 
    15 BOOST_FIXTURE_TEST_SUITE(streams_tests, BasicTestingSetup)
    16 
    17+BOOST_AUTO_TEST_CASE(xor_roundtrip_random_chunks)
    


    ryanofsky commented at 3:49 pm on December 10, 2024:

    In commit “test: Compare util::Xor with randomized inputs against simple impl” (e1074081c9f1895a4f629dfee347ceae484a10d3)

    Would be good to add comment explaining test. Test seems to be to encoding and then decoding random byte vectors with random 8 byte xor keys using differently-sized and differently-aligned random chunks for encoding and decoding, and then making sure the byte vectors are unchanged after the round trip.


    l0rinc commented at 3:51 pm on April 5, 2025:
    Done, thanks (though different-aligned is a consequence of different-sized)
  114. in src/test/streams_tests.cpp:97 in e1074081c9 outdated
    42+        apply_random_xor_chunks(roundtrip, key_bytes, rng);
    43+        BOOST_CHECK(original == roundtrip);
    44+    }
    45+}
    46+
    47+BOOST_AUTO_TEST_CASE(xor_bytes_reference)
    


    ryanofsky commented at 3:56 pm on December 10, 2024:

    In commit “test: Compare util::Xor with randomized inputs against simple impl” (e1074081c9f1895a4f629dfee347ceae484a10d3)

    Again adding a test comment could be helpful here. This test is making sure the util::Xor function returns same results as a naive byte-by-byte xor with an 8-byte key, using 100 random sized random byte vectors.

    Would suggest moving xor_bytes_reference test up before the xor_roundtrip_random_chunks since it seems like a simpler test that’s an easier introduction to this code and could be followed by more complicated tests.


    l0rinc commented at 4:41 pm on April 5, 2025:
    Done
  115. in src/bench/xor.cpp:17 in caafbd0692 outdated
    16+static void XorHistogram(benchmark::Bench& bench)
    17 {
    18-    FastRandomContext frc{/*fDeterministic=*/true};
    19-    auto data{frc.randbytes<std::byte>(1024)};
    20-    auto key{frc.randbytes<std::byte>(31)};
    21+    // The top util::Xor method's [write.size(), frequency] calls for the IBD of the first 860k blocks
    


    ryanofsky commented at 5:07 pm on December 10, 2024:

    re: #31144 (comment)

    I think instead of taking the top 1000 calls by count, it would make sense to take the top X calls by (size*count) as suggested #31144 (comment), where X could be smaller than 1000, because (size*count) should more closely approximate the time spent on all writes of a given size than count ignoring size. This should make the test more realistic and also allow shrinking the histogram.


    l0rinc commented at 3:53 pm on April 5, 2025:
    This isn’t needed anymore, since in #31551 we batch all the tiny calls now, so this PR only deals with doing the obfuscation on 64 bits
  116. in src/test/streams_tests.cpp:61 in e1074081c9 outdated
    56+    for (size_t test{0}; test < 100; ++test) {
    57+        const size_t write_size{1 + rng.randrange(100U)};
    58+        const size_t key_offset{rng.randrange(3 * 8U)}; // Should wrap around
    59+
    60+        std::vector key_bytes{rng.randbytes<std::byte>(sizeof(uint64_t))};
    61+        uint64_t key;
    


    ryanofsky commented at 11:00 pm on December 10, 2024:

    In commit “test: Compare util::Xor with randomized inputs against simple impl” (e1074081c9f1895a4f629dfee347ceae484a10d3)

    Would be good to use consistent variable names in these tests. The other tests are calling key vectors xor_pat and calling key values xor_key, while these tests are calling key vectors key_bytes and calling key values key. Would be clearer to use consistent names.

    Also, after this PR each test is keeping two different variables and two different representations for each key. It would be good clean this up afterwards and just have one variable per key. Using a dedicated type for keys like the XorKey struct Marco suggested #31144 (review) would be even more ideal.


    l0rinc commented at 4:41 pm on April 5, 2025:
    Good find, unified the test in the first commit
  117. in src/bench/xor.cpp:15 in caafbd0692 outdated
    11 #include <cstddef>
    12+#include <map>
    13 #include <vector>
    14 
    15-static void Xor(benchmark::Bench& bench)
    16+static void XorHistogram(benchmark::Bench& bench)
    


    ryanofsky commented at 11:23 pm on December 10, 2024:

    In commit “bench: Make Xor benchmark more representative” (caafbd069246848a8bdfc2f42fd1d692a824de94)

    Would be a helpful to have a comment saying what this benchmark is measuring. Maybe: // Measure speed of util::Xor function applied to a set of byte vectors. The byte vectors are filled with random data and have sizes matching a distribution of data write sizes observed during IBD.


    l0rinc commented at 4:41 pm on April 5, 2025:
    This was removed since
  118. in src/bench/xor.cpp:1041 in caafbd0692 outdated
    1040+            test_data.emplace_back(rand_bytes);
    1041+        }
    1042+    }
    1043+    assert(total_bytes == 114'929'502);
    1044+
    1045+    std::ranges::shuffle(test_data, rng); // Make it more realistic & less predictable
    


    ryanofsky commented at 11:48 pm on December 10, 2024:

    In commit “bench: Make Xor benchmark more representative” (caafbd069246848a8bdfc2f42fd1d692a824de94)

    It seems awkward to have code with all these hardcoded values that will make it hard to update the histogram, and to end up with test data we can’t directly control the size of. Would be cleaner to just choose how much data to generate, and generate it without hardcoding values from the histogram. Would suggest something more like the following, with an easily adjustable size that doesn’t hardcode other values.

     0std::vector<std::vector<std::byte>> test_data(1'000'000);
     1FastRandomContext rng{/*fDeterministic=*/true};
     2
     3std::vector<std::pair<uint64_t, size_t>> cumulative_count;
     4uint64_t total_count = 0;
     5for (const auto& [size, count] : histogram) {
     6    total_count += count;
     7    cumulative_count.emplace_back(total_count, size);
     8}
     9
    10auto sample_size = [&]() -> size_t {
    11    uint64_t rand_val = rng.randrange<uint64_t>(0, total_count);
    12    return std::lower_bound(
    13        cumulative_count.begin(), cumulative_count.end(), rand_val,
    14        [](const std::pair<uint64_t, size_t>& entry, uint64_t value) {
    15            return entry.first < value;
    16        })->second;
    17};
    18
    19uint64_t total_bytes{0};
    20for (auto& entry : test_data) {
    21    entry = rng.randbytes<std::byte>(sample_size());
    22    total_bytes += entry.size();
    23}
    24
    25std::ranges::shuffle(test_data, rng);
    

    (code was generated by chatgpt)


    l0rinc commented at 4:40 pm on April 5, 2025:
    Thanks, this was removed since
  119. in src/bench/xor.cpp:1056 in caafbd0692 outdated
    1055+        }
    1056+        ankerl::nanobench::doNotOptimizeAway(test_data);
    1057+    });
    1058+}
    1059+
    1060+static void AutoFileXor(benchmark::Bench& bench)
    


    ryanofsky commented at 0:03 am on December 11, 2024:

    In commit “bench: Make Xor benchmark more representative” (caafbd069246848a8bdfc2f42fd1d692a824de94)

    Not really sure I understand the goal if this benchmark. Is it significant that the xor key is 0? Would be helpful to have a description of what this benchmark is measuring and indicating.


    l0rinc commented at 4:40 pm on April 5, 2025:
    Was removed since
  120. in src/bench/xor.cpp:1064 in b9c847fd09 outdated
    1063+    const auto data{rng.randbytes<std::byte>(1'000'000)};
    1064+
    1065+    const std::vector empty_key_bytes(8, std::byte{0}); // Test disabled xor
    1066+    uint64_t empty_key;
    1067+    std::memcpy(&empty_key, empty_key_bytes.data(), 8);
    1068+
    


    ryanofsky commented at 0:05 am on December 11, 2024:

    In commit “bench: Make Xor benchmark more representative” (caafbd069246848a8bdfc2f42fd1d692a824de94)

    Again this looks like a nice place to replace these dual key representations with a clean XorKey value.


    l0rinc commented at 4:40 pm on April 5, 2025:
    Yes, this was done in the last commit
  121. in src/streams.h:31 in 7a2e5ec977 outdated
    28 #include <vector>
    29+#include <util/check.h>
    30 
    31 namespace util {
    32-inline void Xor(Span<std::byte> write, Span<const std::byte> key, size_t key_offset = 0)
    33+inline void XorInt(Span<std::byte> write, const uint64_t key, const size_t size)
    


    ryanofsky commented at 0:17 am on December 11, 2024:

    In commit “optimization: Xor 64 bits together instead of byte-by-byte” (7a2e5ec97700584eeac6f8b08ef697df6a147606)

    Having this size parameter is confusing and seems unnecessary. This would be clearer as:

    0inline void XorInt(Span<std::byte> write, const uint64_t key)
    1{
    2    uint64_t raw = 0;
    3    memcpy(&raw, write.data(), write.size());
    4    raw ^= key;
    5    memcpy(write.data(), &raw, write.size());
    6}
    

    Call sites would need to use .subspan(0, size) but this would also make them more obvious.


    l0rinc commented at 4:40 pm on April 5, 2025:
    This isn’t needed anymore in the latest version
  122. in src/streams.h:33 in 7a2e5ec977 outdated
    32-inline void Xor(Span<std::byte> write, Span<const std::byte> key, size_t key_offset = 0)
    33+inline void XorInt(Span<std::byte> write, const uint64_t key, const size_t size)
    34 {
    35-    if (key.size() == 0) {
    36-        return;
    37+    Assume(key);
    


    ryanofsky commented at 0:28 am on December 11, 2024:

    In commit “optimization: Xor 64 bits together instead of byte-by-byte” (7a2e5ec97700584eeac6f8b08ef697df6a147606)

    Seems like it would be less fragile and would simplify callers to replace this Assume(key) with if (!key) return;


    l0rinc commented at 12:19 pm on December 11, 2024:
    I could do that, but the Assume here was meant to make sure no calls get here when the key is 0 in the first place - since that can usually eliminate other work as well (e.g. MakeWritableByteSpan in DataStream#Xor)
  123. in src/dbwrapper.cpp:262 in 7a2e5ec977 outdated
    266-        std::vector<unsigned char> new_key = CreateObfuscateKey();
    267+    obfuscate_key = std::vector<unsigned char>(OBFUSCATE_KEY_NUM_BYTES, '\000'); // Needed for unobfuscated Read
    268+    const bool key_missing = !Read(OBFUSCATE_KEY_KEY, obfuscate_key);
    269+    if (key_missing && params.obfuscate && IsEmpty()) {
    270+        // Initialize non-degenerate obfuscation if it won't upset existing, non-obfuscated data.
    271+        const std::vector<unsigned char> new_key = CreateObfuscateKey();
    


    ryanofsky commented at 0:36 am on December 11, 2024:

    In commit “optimization: Xor 64 bits together instead of byte-by-byte” (7a2e5ec97700584eeac6f8b08ef697df6a147606)

    I’m confused why this code is changing in this commit when it seems unrelated to the stream.h optimization the behavior seems the same as before? If it his is a refactoring cleanup it would be good to move it to a separate commit explaining that it is a refactoring and what the point of these changes may be.


    l0rinc commented at 4:40 pm on April 5, 2025:
    We can’t write the obfuscation vector directly anymore (since we’re storing an Obfuscation object now) and Read can only read into a vector, so this part needed a temp vector - which looks a bit awkward, indeed. Split it out into a new commit, thanks for the hint!
  124. in src/node/mempool_persist.cpp:62 in 7a2e5ec977 outdated
    57@@ -58,15 +58,16 @@ bool LoadMempool(CTxMemPool& pool, const fs::path& load_path, Chainstate& active
    58     try {
    59         uint64_t version;
    60         file >> version;
    61-        std::vector<std::byte> xor_key;
    62         if (version == MEMPOOL_DUMP_VERSION_NO_XOR_KEY) {
    63-            // Leave XOR-key empty
    64+            const std::vector xor_key(sizeof(uint64_t), std::byte{'\000'});
    


    ryanofsky commented at 0:42 am on December 11, 2024:

    In commit “optimization: Xor 64 bits together instead of byte-by-byte” (7a2e5ec97700584eeac6f8b08ef697df6a147606)

    Are all the changes in this file also a refactoring that don’t change behavior? I don’t understand why these changes are in a commit that is supposed to be optimizing stream.h behavior. Would suggest splitting this commit up and and explaining what purpose of these changes are. Maybe they would make more sense in the next commit so this code is not changing twice?


    l0rinc commented at 4:40 pm on April 5, 2025:
    Now that we have a dedicated Obfuscation type these should be trivial now - let me know if you still think these should be split out to dedicated commits
  125. ryanofsky commented at 1:15 am on December 11, 2024: contributor

    Code review b9c847fd093d100628817af98fe837db938160f7. These changes look good and make sense, and I reviewed almost everything but have a few pieces of feedback:

    • I would very strongly endorse Marco’s suggestion to represent keys with an XorKey struct instead of raw uint64_t values so the code getting and setting keys is simpler, safer, and more uniform, and we can avoid a proliferation of memcpy calls.
    • I don’t think I understand structure of the third and fourth commits. The third commit seems to be adding an optimization to streams.h but also refactoring code not directly related to streams.h and then the fourth commit is refactoring a lot of the same code that was just refactored, and some of the code that was optimized as well. Would suggest doing this more cleanly in 3 commits:
      • First commit adding optimized stream.h/stream.cpp functions and wrappers to provide backwards compatibility so no other code or tests have to change in the commit.
      • Second commit updating code and tests to call the optimized stream.h API instead of the backwards compatibility wrappers.
      • Third commit deleting stream.h backwards compatibility wrappers.
  126. l0rinc marked this as a draft on Dec 14, 2024
  127. l0rinc force-pushed on Dec 21, 2024
  128. DrahtBot added the label CI failed on Dec 21, 2024
  129. DrahtBot commented at 5:07 pm on December 21, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/34748857550

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  130. l0rinc force-pushed on Dec 21, 2024
  131. l0rinc renamed this:
    optimization: speed up XOR by 4% (9% when disabled) by applying it in larger batches
    optimization: batch XOR operations 12% faster IBD
    on Dec 22, 2024
  132. l0rinc commented at 11:42 am on December 22, 2024: contributor

    The PR has been split into 3 to simplify review, please check those out first:

    The 3 changes together achieve a 12.3% speedup in raw IBD (real nodes, multiple runs, 870k blocks, 1Gb dbcache, ssd)

     0hyperfine \
     1--runs 2 \
     2--parameter-list COMMIT d73f37dda221835b5109ede1b84db2dc7c4b74a1,fe7365584bb3703e5691c93fb004772e84db3697 \
     3--prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
     4'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -dbcache=1000 -printtoconsole=0'                                                          
     5  
     6Benchmark 1: COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -dbcache=1000 -printtoconsole=0
     7  Time (mean ± σ):     34098.690 s ± 175.888 s    [User: 51918.900 s, System: 2898.126 s]
     8  Range (min … max):   33974.318 s … 34223.062 s    2 runs
     9  
    10Benchmark 2: COMMIT=fe7365584bb3703e5691c93fb004772e84db3697 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -dbcache=1000 -printtoconsole=0
    11  Time (mean ± σ):     30359.460 s ± 417.536 s    [User: 48322.766 s, System: 2898.793 s]
    12  Range (min … max):   30064.218 s … 30654.703 s    2 runs
    
  133. l0rinc force-pushed on Dec 22, 2024
  134. l0rinc force-pushed on Dec 23, 2024
  135. l0rinc force-pushed on Dec 23, 2024
  136. l0rinc force-pushed on Dec 24, 2024
  137. DrahtBot removed the label CI failed on Dec 24, 2024
  138. Sjors commented at 1:19 pm on January 7, 2025: member

    I ran a benchmark to compare this PR (898a07e2ab3e5b653ddadc76f2d04d625f35607c, rebased) against master @ 433412fd8478923dfdb20044f74c5d1e19fa8dd8.

    AMD Ryzen 7950x machine with Ubuntu 24.10, with one local network peer and a gigabit internet connection.

    Got about a 3% speedup.

    0bitcoind -dbcache=30000 -stopatheight=878000 -blocksdir=/magnetic/.bitcoin -addnode=local-network
    

    Before: 5 hours 03 minutes (seems master already improved a bit)

    Time excludes the 8 minutes to flush 26Gib worth of chainstate to disk during shutdown. That itself is twice as fast as a few months ago: #30987 (comment)

    After: 4 hours and 50 minutes

  139. hodlinator commented at 10:33 am on January 8, 2025: contributor

    3% speedup is less to write home about but still good. Having a less constrained (Sjors) -dbcache setting (30GB) would lead to less XOR read/writes.

    12.3% speedup (l0rinc) could at least in part be explained by the 1GB setting, leading to more XOR-operations.

    An optimization that provides a bigger win for constrained devices should be welcomed.

  140. l0rinc commented at 9:30 am on January 10, 2025: contributor

    Thank you @Sjors for testing it. I was surprised to see your config only revealed a 3% change so I reran the full IBDs with the configs you had: -dbcache=30000 -stopatheight=878000 (I had -dbcache=1000 -stopatheight=870000 before). I suspect the difference in our measurements could stem from doing a single run and not including the final dump in the measurements.

    I was using a HDD this time and wasn’t seeding from local nodes, so the variance was a bit bigger for me, but I ran both before and after several times and there’s an obvious clustering (the before case was consistently slower than any after run, showing a ~12% speedup on average even with high dbcache):

     0hyperfine \
     1--runs 2 \
     2--parameter-list COMMIT d73f37dda221835b5109ede1b84db2dc7c4b74a1,fe7365584bb3703e5691c93fb004772e84db3697 \
     3--prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
     4'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -printtoconsole=0' 
     5
     6Benchmark 1: COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -printtoconsole=0
     7  Time (mean ± σ):     40251.909 s ± 1669.663 s    [User: 51304.669 s, System: 1889.767 s]
     8  Range (min … max):   39071.279 s … 41432.539 s    2 runs
     9 
    10Benchmark 2: COMMIT=fe7365584bb3703e5691c93fb004772e84db3697 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -printtoconsole=0
    11  Time (mean ± σ):     36062.225 s ± 916.289 s    [User: 47770.885 s, System: 2097.642 s]
    12  Range (min … max):   35414.310 s … 36710.139 s    2 runs
    13 
    14Summary
    15  COMMIT=fe7365584bb3703e5691c93fb004772e84db3697 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -printtoconsole=0 ran
    16    1.12 ± 0.05 times faster than COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -printtoconsole=0
    
  141. maflcko commented at 10:05 am on January 10, 2025: member

    Not sure how useful it is to derive speed improvements from measurements where the variance is about half as large as the difference itself. Not claiming this is the case here, but if you measure from the public network, you could very well just measure the bandwidth of the picked nodes (completely unrelated to this pull).

    It is fine if you want to do those measurements locally for fun, but putting them in the pull request title and description doesn’t seem ideal. It would be better to focus on stable and reproducible measurements there.

  142. l0rinc commented at 10:09 am on January 10, 2025: contributor
    Usually the variance is a lot lower (see previous measurements), but these are just my benchmarks (I want them to be as close to reality as possible, that’s why I’m repeating them to have some predictability), I would appreciate if you could provide independent measurements that you find more stable.
  143. hodlinator commented at 10:43 am on January 11, 2025: contributor

    bitcoind -dbcache=30000 -stopatheight=878000 -blocksdir=/magnetic/.bitcoin -addnode=local-network @Sjors, what kind of drive is /magnetic/? (Edit: I’m thinking if the drive is a bit clunky, it will crowd out the speedup from optimizing XOR).

  144. l0rinc commented at 1:07 pm on January 11, 2025: contributor

    I think I managed to reproduce the ~2% difference - by not doing an IBD but a -reindex-chainstate. @Sjors, was your datadir completely empty for the runs? My 12% comes from having nothing locally (e.g. no blocks) to being fully synced (i.e. has to include the final flush as well) - to be as close to the user’s experience as possible.

     0 hyperfine \
     1--runs 2 \
     2--parameter-list COMMIT d73f37dda221835b5109ede1b84db2dc7c4b74a1,fe7365584bb3703e5691c93fb004772e84db3697 \
     3--prepare 'git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
     4'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -reindex-chainstate -printtoconsole=0 -connect=0' 
     5
     6Benchmark 1: COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -reindex-chainstate -printtoconsole=0 -connect=0
     7  Time (mean ± σ):     23664.320 s ± 111.385 s    [User: 35795.225 s, System: 714.912 s]
     8  Range (min … max):   23585.559 s … 23743.081 s    2 runs
     9 
    10Benchmark 2: COMMIT=fe7365584bb3703e5691c93fb004772e84db3697 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -reindex-chainstate -printtoconsole=0 -connect=0
    11  Time (mean ± σ):     23277.741 s ± 172.333 s    [User: 34509.524 s, System: 582.073 s]
    12  Range (min … max):   23155.883 s … 23399.599 s    2 runs
    13 
    14Summary
    15  COMMIT=fe7365584bb3703e5691c93fb004772e84db3697 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -reindex-chainstate -printtoconsole=0 -connect=0 ran
    16    1.02 ± 0.01 times faster than COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -dbcache=30000 -stopatheight=878000 -reindex-chainstate -printtoconsole=0 -connect=0
    
  145. andrewtoth commented at 8:47 pm on January 11, 2025: contributor

    I benchmarked this PR rebased onto 35bf426e02210c1bbb04926f4ca2e0285fbfcd11 up to block 878k two times each, and I saw a 9% speedup. This was using dbcache=30000.

    0hyperfine --export-markdown ~/bench.md --show-output --parameter-list commit 1298bae74a1f690fd6cc0e029e490537cbeb301b,35bf426e02210c1bbb04926f4ca2e0285fbfcd11 --prepare 'git checkout {commit} && \
    1cmake --build build -j $(nproc); \
    2rm -rf /home/user/.bitcoin; \
    3sync; sudo /sbin/sysctl vm.drop_caches=3;' -M 2 'echo '{commit}' && /usr/bin/time ./build/src/bitcoind -printtoconsole=0 -dbcache=30000 -connect=local_node -stopatheight=878000'
    

    This branch:

    0  Time (mean ± σ):     17147.622 s ± 37.933 s    [User: 35867.007 s, System: 871.738 s]
    1  Range (min … max):   17120.799 s … 17174.444 s    2 runs
    

    Master branch commit 35bf426e02210c1bbb04926f4ca2e0285fbfcd11:

    0  Time (mean ± σ):     18704.258 s ±  4.425 s    [User: 37215.600 s, System: 891.293 s]
    1  Range (min … max):   18701.130 s … 18707.387 s    2 runs
    
    0Summary
    1  'echo 1298bae74a1f690fd6cc0e029e490537cbeb301b && /usr/bin/time ./build/src/bitcoind -printtoconsole=0 -dbcache=30000 -connect=192.168.2.171 -stopatheight=878000' ran
    2    1.09 ± 0.00 times faster than 'echo 35bf426e02210c1bbb04926f4ca2e0285fbfcd11 && /usr/bin/time ./build/src/bitcoind -printtoconsole=0 -dbcache=30000 -connect=192.168.2.171 -stopatheight=878000'
    
  146. Sjors commented at 9:05 am on January 13, 2025: member

    @hodlinator a Western Digital spinning disk. Much slower than SSD but fine for just writing out blocks to disk.

    (I’m assuming we’re not doing something foolish like writing the block, then reading it again to do the xor and writing it back) @l0rinc I wiped the blocks, indexes and chainstate dirs between runs. Both times I mention exclude the 8 minute flush, which remained similar.

    I agree with @maflcko that a difference of 10 minutes is probably not statistically significant. I don’t have a good offline benchmark setup.

  147. l0rinc commented at 9:44 am on January 13, 2025: contributor

    @Sjors, your setup is already extremely fast - it seems this optimization shines mostly on commodity hardware, which I assume is used more often. Let’s wait for other reproducers.

    Edit: I’ve created a reproducer tutorial to help others in benchmarking this change: https://gist.github.com/l0rinc/83d2bdfce378ad7396610095ceb7bed5

  148. hodlinator commented at 9:11 pm on January 13, 2025: contributor

    Nodes on local LAN, same commits, both on SSD. Syncing node was laptop running 13th Gen Intel i7, 20 logical cores.

    Full node / source

    Made to not have any other connections (verified through running with -debug=net for a while).

    Deleted anchors.dat & peers.dat.

    0₿ build/src/bitcoind -dbcache=30000 -nofixedseeds -nodnsseed
    

    Syncing node

    Deleted ~/.bitcoin.

    0₿ time build/src/bitcoind -dbcache=30000 -stopatheight=878000 -connect=<sourcenode>
    

    Results

    Commit Wall time
    433412fd8478923dfdb20044f74c5d1e19fa8dd8 3h55m23s (including 8m13s to flush UTXO set to disk)
    898a07e2ab3e5b653ddadc76f2d04d625f35607c (this PR) rebased onto 433412fd8478923dfdb20044f74c5d1e19fa8dd8 3h29m52s (including 8m11s to flush UTXO set to disk)

    => 11.9% speedup

  149. hodlinator commented at 8:28 am on January 14, 2025: contributor
    (I did a second run on 433412fd8478923dfdb20044f74c5d1e19fa8dd8, it took 3h45m53s, 9.5 minutes / 4.4% faster. So there is some variance).
  150. l0rinc commented at 8:29 am on January 14, 2025: contributor
    These are the fastest IBDs I’ve seen so far
  151. DrahtBot added the label Needs rebase on Jan 22, 2025
  152. l0rinc force-pushed on Jan 26, 2025
  153. DrahtBot removed the label Needs rebase on Jan 26, 2025
  154. mlori commented at 9:07 pm on February 17, 2025: none

    Benchmark Results for Bitcoin Core Optimization

    Based on the guide: https://gist.github.com/l0rinc/83d2bdfce378ad7396610095ceb7bed5, I executed the following benchmark tests on the commits listed below, with 4 runs per commit.

    • caa68f79c11e5c444977ce8dee8a43020b7b3c5a (Optimization: XOR 64 bits together instead of byte-by-byte)
    • 5acf12bafeb126f2190b3f401f95199e0eea90c9 (master)

    I began this quest on January 14 and obtained the final results on February 12, taking nearly a month in total.

    Results

    The execution of caa68f79c1 was 12% faster than 5acf12bafe.


    Executed Command

     0COMMITS="caa68f79c11e5c444977ce8dee8a43020b7b3c5a 5acf12bafeb126f2190b3f401f95199e0eea90c9"
     1STOP_HEIGHT=882000
     2DBCACHE=5000
     3C_COMPILER=gcc
     4CXX_COMPILER=g++
     5
     6hyperfine --export-json "../ibd-${COMMITS/ /-}-${STOP_HEIGHT}-${DBCACHE}-${C_COMPILER}.json" \
     7--runs 4 --parameter-list COMMIT ${COMMITS/ /,} \
     8--prepare "killall bitcoind; rm -rf ../BitcoinData/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
     9cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF -DCMAKE_C_COMPILER=$C_COMPILER -DCMAKE_CXX_COMPILER=$CXX_COMPILER && \
    10cmake --build build -j$(nproc) --target bitcoind && ./build/src/bitcoind -datadir=../BitcoinData -stopatheight=1 -printtoconsole=0 || true" \
    11--cleanup "mv ../BitcoinData/debug.log ../BitcoinData-logs/debug-{COMMIT}-$(date +%s).log" \
    12"COMPILER=$C_COMPILER COMMIT={COMMIT} ./build/src/bitcoind -datadir=../BitcoinData -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -printtoconsole=0"
    

    Commit: caa68f79c1

    0Time (mean ± σ):  64403.836 s ± 9298.080 s    
    1User: 57218.987 s, System: 3918.333 s  
    2Range (min … max):  50760.210 s … 70526.366 s    
    3Runs: 4
    

    Commit: 5acf12bafe

    0Time (mean ± σ):  71941.764 s ± 2646.491 s    
    1User: 60882.538 s, System: 3941.555 s  
    2Range (min … max):  68480.815 s … 74310.881 s    
    3Runs: 4
    

    Summary

    0COMPILER=gcc COMMIT=caa68f79c11e5c444977ce8dee8a43020b7b3c5a 
    1ran 1.12 ± 0.17 times faster than 
    2COMPILER=gcc COMMIT=5acf12bafeb126f2190b3f401f95199e0eea90c9
    

    Bitcoin Node Setup

    • CPU: Intel® Core™ i5-6500T @ 2.50GHz
    • RAM: 16 GB
    • Storage: 1TB SSD (Apacer AS350)
    • Internet:
      • Download: 732.86 Mbps
      • Upload: 322.55 Mbps

    Database Cache Adjustments

    Initially, the provided database cache (DBCACHE) size was too high for the available memory on this node, causing fluctuations in execution time. To account for this variability, four runs per commit were conducted. Several configurations were tested:

    • DBCACHE=30000: Execution halted.
    • DBCACHE=10000: Execution was too slow (probably because swapping) without a significant difference between benchmarks.
    • DBCACHE=5000: Found to be the optimal configuration for this node.

  155. l0rinc force-pushed on Mar 11, 2025
  156. l0rinc renamed this:
    optimization: batch XOR operations 12% faster IBD
    [IBD] multi-byte block obfuscation
    on Mar 12, 2025
  157. l0rinc marked this as ready for review on Mar 12, 2025
  158. DrahtBot added the label Needs rebase on Mar 20, 2025
  159. l0rinc force-pushed on Mar 20, 2025
  160. DrahtBot removed the label Needs rebase on Mar 20, 2025
  161. l0rinc force-pushed on Mar 20, 2025
  162. DrahtBot added the label CI failed on Apr 2, 2025
  163. l0rinc force-pushed on Apr 3, 2025
  164. DrahtBot removed the label CI failed on Apr 4, 2025
  165. l0rinc force-pushed on Apr 5, 2025
  166. l0rinc commented at 5:30 pm on April 5, 2025: contributor

    Applied the remaining suggestions, thanks @ryanofsky, @hodlinator, @maflcko.

    As mentioned before, this PR only contains the multi-byte obfuscation now, batching the single-byte serializations is done in #31551

    The PR achieves 18x faster obfuscation on Linux, 49x faster on Mac and an IBD speedup of 4%.

    The latest push:

    • Changed the existing Xor benchmark to XorObfuscationBench, measuring a 10 MiB chunk of random memory;
    • dbwrapper.cpp and mempool_persist.cpp migrations are simplified by dedicated refactor commits;
    • streams_tests.cpp now uses the native m_rng, comments were added to the new big tests, unified key_bytes{"ff00ff00ff00ff00"_hex_v} name and storage;
    • Updated every measurement and commit message to reflect the current state after the split.

    Thanks for the reviews so far, it’s ready for review again! Edit: rebased in latest push to resolve CI failure

  167. l0rinc force-pushed on Apr 6, 2025
  168. achow101 referenced this in commit 33df4aebae on Apr 16, 2025
  169. DrahtBot added the label Needs rebase on Apr 16, 2025
  170. l0rinc force-pushed on Apr 17, 2025
  171. l0rinc commented at 10:10 am on April 17, 2025: contributor
    Rebased, now that #31551 was merged - will redo the IBD benchmarks (since we have bigger obfuscatable chunks now) to see if any of the commit messages or descriptions need changing. The PR is otherwise ready for review again!
  172. DrahtBot removed the label Needs rebase on Apr 17, 2025
  173. in src/bench/xor.cpp:13 in 46854038e7 outdated
     5@@ -6,19 +6,27 @@
     6 #include <random.h>
     7 #include <span.h>
     8 #include <streams.h>
     9+#include <util/byte_units.h>
    10 
    11+#include <cmath>
    12 #include <cstddef>
    13+#include <map>
    


    hodlinator commented at 7:23 am on April 22, 2025:
    Unused?

    l0rinc commented at 4:35 pm on April 22, 2025:
    Done
  174. in src/dbwrapper.cpp:218 in 46854038e7 outdated
    212@@ -213,7 +213,11 @@ struct LevelDBContext {
    213 };
    214 
    215 CDBWrapper::CDBWrapper(const DBParams& params)
    216-    : m_db_context{std::make_unique<LevelDBContext>()}, m_name{fs::PathToString(params.path.stem())}, m_path{params.path}, m_is_memory{params.memory_only}
    217+    : m_db_context{std::make_unique<LevelDBContext>()},
    218+      m_name{fs::PathToString(params.path.stem())},
    219+      m_obfuscation{0},
    


    hodlinator commented at 7:27 am on April 22, 2025:
    nit: Could declare default initialization value for m_obfuscation in header to avoid touching these lines.

    l0rinc commented at 4:39 pm on April 22, 2025:
    Done
  175. in src/dbwrapper.cpp:256 in 46854038e7 outdated
    251@@ -248,24 +252,24 @@ CDBWrapper::CDBWrapper(const DBParams& params)
    252         LogPrintf("Finished database compaction of %s\n", fs::PathToString(params.path));
    253     }
    254 
    255-    // The base-case obfuscation key, which is a noop.
    256-    obfuscate_key = std::vector<unsigned char>(OBFUSCATE_KEY_NUM_BYTES, '\000');
    257+    {
    258+        m_obfuscation = 0; // Needed for unobfuscated Read
    


    hodlinator commented at 7:32 am on April 22, 2025:
    There’s no return before the assignment on line 271, and we already set this field to 0 during initialization, so could be removed?

    l0rinc commented at 12:46 pm on April 22, 2025:
    This is needed to be able to read without obfuscation - as the comment states. Can you suggest how to reword that to make it obvious?

    hodlinator commented at 1:21 pm on April 22, 2025:

    This is needed to be able to read without obfuscation - as the comment states.

    Why does it need to be set to zero again when it’s been set in the initializer list of this same ctor?


    l0rinc commented at 4:35 pm on April 22, 2025:
    I’ve changed it to an assert - if that’s what you meant.

    hodlinator commented at 1:35 pm on April 23, 2025:

    That’s better. But I still think it’s redundant as the code block now ends with

    0m_obfuscation = obfuscate_key_vector;
    

    and all code paths lead there.


    l0rinc commented at 2:18 pm on April 23, 2025:
    Yeah, can you tell me what’s wrong with it? If you have a suggestion that passes ci (and local IBD for some blocks), let me know

    hodlinator commented at 2:19 pm on April 23, 2025:
    I had that one coming. :)

    hodlinator commented at 6:41 am on April 24, 2025:

    Inside a ctor which initializes m_obfuscation{0} through the header, you have:

    0    ...
    1    {
    2        std::vector<unsigned char> obfuscate_key_vector(Obfuscation::SIZE_BYTES, '\000');
    3        ...
    4        if (...) {
    5            ... (modify obfuscate_key_vector)
    6        }
    7        m_obfuscation = obfuscate_key_vector;
    8    }
    9}
    

    Do you feel a need to assert(m_obfuscation == 0) at the beginning of the block because the ctor has ~30 preceeding lines? Can’t think of another reason.


    l0rinc commented at 9:33 am on April 24, 2025:

    assert(m_obfuscation == 0) is needed to document (and to make sure) that the obfuscation key is already written with obfuscation turned on, so we have to make sure the obfuscation key is 0 to turn it off for this write only.

    https://github.com/bitcoin/bitcoin/blob/master/src/test/dbwrapper_tests.cpp#L29-L46 demonstrates the usage of this call - I’ve updated it to verify that reopening the database will read the obfuscation key correctly.

    So without assert(m_obfuscation == 0) we might not read back old stored obfuscation keys back correctly. Does that explain it? If not, let’s have a call.


    hodlinator commented at 10:14 am on April 24, 2025:

    Ah, I was interpreting “Needed for unobfuscated Read” as applying to Read-calls after the ctor, hadn’t realized it was for the call 2 lines below. :man_facepalming: That’s why I didn’t think it was relevant to include in #31144 (review). Sorry for this detour. assert is good.

    Maybe comment could spell out “Needed for unobfuscated Read directly below” for similar readers to me, but hopefully there aren’t too many of them.


    l0rinc commented at 7:02 pm on April 24, 2025:
    Extended the comment

    l0rinc commented at 7:07 pm on April 24, 2025:
    No problem, glad it’s sorted. Extended the comment to make it even clearer.
  176. in src/dbwrapper.h:190 in 46854038e7 outdated
    186@@ -188,16 +187,11 @@ class CDBWrapper
    187     std::string m_name;
    188 
    189     //! a key used for optional XOR-obfuscation of the database
    190-    std::vector<unsigned char> obfuscate_key;
    191+    Obfuscation m_obfuscation;
    


    hodlinator commented at 7:43 am on April 22, 2025:
    (Thanks for restoring sanity with the m_-prefix).
  177. in src/obfuscation.h:1 in 46854038e7 outdated
    0@@ -0,0 +1,85 @@
    1+// Copyright (c) 2009-present The Bitcoin Core developers
    


    hodlinator commented at 7:45 am on April 22, 2025:
    2009?

    l0rinc commented at 12:46 pm on April 22, 2025:
    I’m moving code, I never know what that implies

    l0rinc commented at 4:35 pm on April 22, 2025:
    Changed anyway
  178. in src/obfuscation.h:24 in 46854038e7 outdated
    19+{
    20+public:
    21+    static constexpr size_t SIZE_BYTES{sizeof(uint64_t)};
    22+
    23+private:
    24+    std::array<uint64_t, SIZE_BYTES> rotations; // Cached key rotations
    


    hodlinator commented at 7:48 am on April 22, 2025:
    0    // Cached key rotations for different offsets.
    1    std::array<uint64_t, SIZE_BYTES> m_rotations;
    2    
    

    Most important for me is m_-prefix, newline and comment are nits.


    l0rinc commented at 4:35 pm on April 22, 2025:
    renamed to m_rotations
  179. in src/obfuscation.h:16 in 46854038e7 outdated
    11+#include <random>
    12+#include <span.h>
    13+#include <util/check.h>
    14+#include <cstring>
    15+#include <climits>
    16+#include <serialize.h>
    


    hodlinator commented at 8:38 am on April 22, 2025:

    util/check.h, random and cstring seem unused? bit-header seems required for rotr and std::endian.

    0#include <serialize.h>
    1#include <span.h>
    2
    3#include <array>
    4#include <bit>
    5#include <cassert>
    6#include <climits>
    7#include <cstdint>
    

    l0rinc commented at 4:35 pm on April 22, 2025:
    Did something similar (except for cstdint)
  180. in src/obfuscation.h:23 in 46854038e7 outdated
    18+class Obfuscation
    19+{
    20+public:
    21+    static constexpr size_t SIZE_BYTES{sizeof(uint64_t)};
    22+
    23+private:
    


    hodlinator commented at 8:41 am on April 22, 2025:
    nit: Might as well move the private section last and merge the two public ones?

    l0rinc commented at 4:35 pm on April 22, 2025:
    Want to avoid this, but I don’t actually mind either way - done
  181. in src/test/dbwrapper_tests.cpp:50 in 46854038e7 outdated
    46@@ -57,7 +47,7 @@ BOOST_AUTO_TEST_CASE(dbwrapper_basic_data)
    47         bool res_bool;
    48 
    49         // Ensure that we're doing real obfuscation when obfuscate=true
    50-        BOOST_CHECK(obfuscate != is_null_key(dbwrapper_private::GetObfuscateKey(dbw)));
    51+        BOOST_CHECK(obfuscate == dbwrapper_private::GetObfuscation(dbw));
    


    hodlinator commented at 8:42 am on April 22, 2025:

    nit:

    0        BOOST_CHECK_EQUAL(obfuscate, dbwrapper_private::GetObfuscation(dbw));
    

    Here and above on line 30.


    l0rinc commented at 4:35 pm on April 22, 2025:
    Done
  182. in src/obfuscation.h:58 in 46854038e7 outdated
    53+    Obfuscation(const std::vector<uint8_t>& key_vec) : Obfuscation(MakeByteSpan(key_vec).first<SIZE_BYTES>()) {}
    54+    Obfuscation(const std::vector<std::byte>& key_vec) : Obfuscation(std::span(key_vec).first<SIZE_BYTES>()) {}
    55+
    56+    uint64_t Key() const { return rotations[0]; }
    57+    operator bool() const { return Key() != 0; }
    58+    void operator()(std::span<std::byte> write, const size_t key_offset_bytes = 0) const
    


    hodlinator commented at 8:48 am on April 22, 2025:

    nit: This is inherited from stream.h, but could change to “target”/“dest” to make it more of a noun?

    0    void operator()(std::span<std::byte> target, const size_t key_offset_bytes = 0) const
    

    Same in xor_roundtrip_random_chunks and xor_bytes_reference test cases.


    l0rinc commented at 4:35 pm on April 22, 2025:
    Done
  183. in src/dbwrapper.cpp:405 in 118d8083b9 outdated
    401@@ -411,10 +402,6 @@ void CDBIterator::SeekToFirst() { m_impl_iter->iter->SeekToFirst(); }
    402 void CDBIterator::Next() { m_impl_iter->iter->Next(); }
    403 
    404 namespace dbwrapper_private {
    405-
    406-const std::vector<unsigned char>& GetObfuscateKey(const CDBWrapper &w)
    407-{
    408-    return w.obfuscate_key;
    409-}
    410+const std::vector<unsigned char>& GetObfuscateKey(const CDBWrapper& w) { return w.obfuscate_key; }
    


    hodlinator commented at 9:00 am on April 22, 2025:

    Commit message in 118d8083b913e130b073ed3ad0eeb5aca4887899:

    0- This commit inlines the obfuscate‑key initialization, replaces `key_exists` with `key_missing`, and simplifies the `if` condition that writes a new obfuscation key.
    1- The `CreateObfuscateKey` method and its private helper are removed.
    2+ This commit inlines the `CreateObfuscateKey` method, replaces `key_exists` with `key_missing`, and simplifies the `if` condition that writes a new obfuscation key.
    
    1. This private method is just reformatted. Maybe a case of moving changes between commits. Seems unnecessary.

    2. “The CreateObfuscateKey method is inlined” might be more precise?


    l0rinc commented at 4:35 pm on April 22, 2025:
    Did something similar

    hodlinator commented at 1:43 pm on April 23, 2025:
    Seems you forgot to remove “The CreateObfuscateKey method and its private helper were also removed.”? I think saying CreateObfuscateKey was inlined is sufficient and no private helper was removed as stated above in (1.

    l0rinc commented at 9:33 am on April 24, 2025:
    Sure, done
  184. in src/test/streams_tests.cpp:41 in 46854038e7 outdated
    36+        const auto key_bytes{m_rng.randbytes<std::byte>(Obfuscation::SIZE_BYTES)};
    37+        const Obfuscation obfuscation{key_bytes};
    38+        apply_random_xor_chunks(roundtrip, obfuscation);
    39+
    40+        // Verify intermediate state is different from original (unless key is zero)
    41+        const bool all_zero = !obfuscation || (HexStr(key_bytes).find_first_not_of('0') >= write_size * 2);
    


    hodlinator commented at 9:23 am on April 22, 2025:

    Would rather skip converting to string:

    0        const bool all_zero = !obfuscation || (static_cast<size_t>(std::count(key_bytes.begin(), key_bytes.begin() + write_size, std::byte{0})) == write_size);
    

    l0rinc commented at 4:37 pm on April 22, 2025:
    None are very readable, but I still find the stringy one simpler to read - the performance doesn’t matter here at all

    hodlinator commented at 2:10 pm on April 23, 2025:
    0        const bool all_zero = !obfuscation || std::ranges::all_of(std::span{key_bytes.begin(), write_size}, [](auto& b) { return b == std::byte{0}; });
    
    0        const bool all_zero = !obfuscation || std::all_of(key_bytes.begin(), key_bytes.begin() + write_size, [](auto& b) { return b == std::byte{0}; });
    

    l0rinc commented at 9:32 am on April 24, 2025:

    Ok, sure, changed it to something similar:

    0const bool all_zeros{!obfuscation || std::ranges::all_of(std::span(key_bytes).first(write_size), [](auto b) { return b == std::byte{0}; })};
    

    hodlinator commented at 10:04 am on April 24, 2025:
    Thanks! (If you re-touch: std::span{key_bytes})

    hodlinator commented at 12:32 pm on April 24, 2025:

    https://github.com/bitcoin/bitcoin/pull/31144/checks?check_run_id=41075184292 Was able to repro with GCC using:

    0cmake -B build -DCMAKE_BUILD_TYPE=Debug -DAPPEND_CPPFLAGS="-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG -D_GLIBCXX_DEBUG -D_GLIBCXX_DEBUG_PEDANTIC"
    

    We both forgot to limit the span:

    0const bool all_zeros{!obfuscation || std::ranges::all_of(std::span{key_bytes}.first(std::min(write_size, key_bytes.size())), [](auto b) { return b == std::byte{0}; })};
    

    Edit: Could drop !obfuscation || since I guess it was mainly there as an optimization to short-circuit the stringly code.


    l0rinc commented at 7:02 pm on April 24, 2025:
    Done (except for the edit part, it’s better documentation this way I think)

    l0rinc commented at 7:08 pm on April 24, 2025:
    Done, thanks
  185. in src/test/streams_tests.cpp:93 in 46854038e7 outdated
    88+    BOOST_CHECK_EQUAL(obf2.Key(), test_key);
    89+
    90+    // std::vector<uint8_t> constructor
    91+    std::vector<uint8_t> uint8_key(Obfuscation::SIZE_BYTES);
    92+    std::memcpy(uint8_key.data(), &test_key, uint8_key.size());
    93+    const Obfuscation obf4{uint8_key};
    


    hodlinator commented at 9:50 am on April 22, 2025:
    obf3 went missing.

    l0rinc commented at 4:38 pm on April 22, 2025:
    Hah, removed a constructor during refactoring and left these - moved them to braced scopes, this way the names can be the same
  186. in src/obfuscation.h:24 in 46854038e7 outdated
    49+
    50+public:
    51+    Obfuscation(const uint64_t key) { SetRotations(key); }
    52+    Obfuscation(const std::span<const std::byte, SIZE_BYTES> key_span) : Obfuscation(ToUint64(key_span)) {}
    53+    Obfuscation(const std::vector<uint8_t>& key_vec) : Obfuscation(MakeByteSpan(key_vec).first<SIZE_BYTES>()) {}
    54+    Obfuscation(const std::vector<std::byte>& key_vec) : Obfuscation(std::span(key_vec).first<SIZE_BYTES>()) {}
    


    hodlinator commented at 9:53 am on April 22, 2025:

    span::first() is unchecked (https://en.cppreference.com/w/cpp/container/span/first), so we may be reading out of bounds of these vectors.

    I managed to make it work with these constructors, part of larger diff which uses std::array over std::vector in most places:

    0    Obfuscation(std::span<const std::byte, SIZE_BYTES> key_span) : Obfuscation(ToUint64(key_span)) {}
    1    Obfuscation(std::span<const uint8_t, SIZE_BYTES> key_span) : Obfuscation(MakeByteSpan(key_span).first<SIZE_BYTES>()) {}
    

    l0rinc commented at 4:39 pm on April 22, 2025:
    I’m not yet sure about the array (tried it, CI or local IBD didn’t work for the reason you’ve also mentioned) - will take a look at that later
  187. in src/test/streams_tests.cpp:314 in 71eb6eaa74 outdated
    314     in.clear();
    315     in.push_back(std::byte{0xf0});
    316     in.push_back(std::byte{0x0f});
    317 
    318     {
    319+        const auto key_bytes{"ff0fff0fff0fff0f"_hex_v};
    


    hodlinator commented at 11:51 am on April 22, 2025:
    In 71eb6eaa740ad0b28737e90e59b89a8e951d90d9: key_bytes is unused.

    l0rinc commented at 4:40 pm on April 22, 2025:
    Was meant as a simplification for the final commit - but didn’t actually use these - thanks, fixed
  188. in src/test/streams_tests.cpp:302 in 71eb6eaa74 outdated
    299 
    300-    // Single character key
    301     {
    302+        const auto key_bytes{"ffffffffffffffff"_hex_v};
    303+        uint64_t xor_key;
    304+        std::memcpy(&xor_key, key_bytes.data(), sizeof(xor_key));
    


    hodlinator commented at 11:53 am on April 22, 2025:
    In 71eb6eaa740ad0b28737e90e59b89a8e951d90d9: key_bytes and xor_key are unused after this.

    l0rinc commented at 4:40 pm on April 22, 2025:
    Fixed
  189. l0rinc requested review from ryanofsky on Apr 22, 2025
  190. l0rinc requested review from hodlinator on Apr 22, 2025
  191. l0rinc requested review from laanwj on Apr 22, 2025
  192. l0rinc requested review from maflcko on Apr 22, 2025
  193. hodlinator commented at 12:30 pm on April 22, 2025: contributor

    Code Review 46854038e7984b599d25640de26d4680e62caba7

    Edit: Superseded by #31144#pullrequestreview-2788618744

    Didn’t get much deeper than surface level yet, but sharing what I found so far.

    My primary suggestion is to change the Obfuscation-ctors to take static-extent spans to prevent accidental out-of-bounds access and also to clarify that we don’t consume bigger vectors (see inline comment). std::array of matching size maps well to such spans.

    Branch with suggestions

    https://github.com/bitcoin/bitcoin/compare/master...hodlinator:bitcoin:pr/31144_suggestions

    Applies below diff in appropriate commit. Also adds initial commit de4a3ec4cc27aff527906f553f50a27eab51a418 “random: Add stack version of randbytes”.

    Diff of suggestions applied to reviewed code

      0diff --git a/src/bench/xor.cpp b/src/bench/xor.cpp
      1index b04f08bba5..fca9fdceb3 100644
      2--- a/src/bench/xor.cpp
      3+++ b/src/bench/xor.cpp
      4@@ -10,7 +10,6 @@
      5 
      6 #include <cmath>
      7 #include <cstddef>
      8-#include <map>
      9 #include <vector>
     10 
     11 static void XorObfuscationBench(benchmark::Bench& bench)
     12diff --git a/src/dbwrapper.cpp b/src/dbwrapper.cpp
     13index 00317138d5..a6c9c02623 100644
     14--- a/src/dbwrapper.cpp
     15+++ b/src/dbwrapper.cpp
     16@@ -213,11 +213,7 @@ struct LevelDBContext {
     17 };
     18 
     19 CDBWrapper::CDBWrapper(const DBParams& params)
     20-    : m_db_context{std::make_unique<LevelDBContext>()},
     21-      m_name{fs::PathToString(params.path.stem())},
     22-      m_obfuscation{0},
     23-      m_path{params.path},
     24-      m_is_memory{params.memory_only}
     25+    : m_db_context{std::make_unique<LevelDBContext>()}, m_name{fs::PathToString(params.path.stem())}, m_path{params.path}, m_is_memory{params.memory_only}
     26 {
     27     DBContext().penv = nullptr;
     28     DBContext().readoptions.verify_checksums = true;
     29@@ -253,22 +249,21 @@ CDBWrapper::CDBWrapper(const DBParams& params)
     30     }
     31 
     32     {
     33-        m_obfuscation = 0; // Needed for unobfuscated Read
     34-        std::vector<unsigned char> obfuscate_key_vector(Obfuscation::SIZE_BYTES, '\000');
     35-        const bool key_missing{!Read(OBFUSCATE_KEY_KEY, obfuscate_key_vector)};
     36+        std::array<uint8_t, Obfuscation::SIZE_BYTES> obfuscate_key; // Zero-initialized is needed for unobfuscated Read
     37+        const bool key_missing{!Read(OBFUSCATE_KEY_KEY, obfuscate_key)};
     38         if (key_missing && params.obfuscate && IsEmpty()) {
     39             // Initialize non-degenerate obfuscation if it won't upset existing, non-obfuscated data.
     40-            std::vector<uint8_t> new_key(Obfuscation::SIZE_BYTES);
     41+            std::array<uint8_t, Obfuscation::SIZE_BYTES> new_key;
     42             GetRandBytes(new_key);
     43 
     44             // Write `new_key` so we don't obfuscate the key with itself
     45             Write(OBFUSCATE_KEY_KEY, new_key);
     46-            obfuscate_key_vector = new_key;
     47+            obfuscate_key = new_key;
     48 
     49-            LogPrintf("Wrote new obfuscate key for %s: %s\n", fs::PathToString(params.path), HexStr(obfuscate_key_vector));
     50+            LogPrintf("Wrote new obfuscate key for %s: %s\n", fs::PathToString(params.path), HexStr(obfuscate_key));
     51         }
     52-        LogPrintf("Using obfuscation key for %s: %s\n", fs::PathToString(params.path), HexStr(obfuscate_key_vector));
     53-        m_obfuscation = obfuscate_key_vector;
     54+        LogPrintf("Using obfuscation key for %s: %s\n", fs::PathToString(params.path), HexStr(obfuscate_key));
     55+        m_obfuscation = {obfuscate_key};
     56     }
     57 }
     58 
     59diff --git a/src/dbwrapper.h b/src/dbwrapper.h
     60index 5f23f1dc43..feb31b7d48 100644
     61--- a/src/dbwrapper.h
     62+++ b/src/dbwrapper.h
     63@@ -187,7 +187,7 @@ private:
     64     std::string m_name;
     65 
     66     //! a key used for optional XOR-obfuscation of the database
     67-    Obfuscation m_obfuscation;
     68+    Obfuscation m_obfuscation{0};
     69 
     70     //! the key under which the obfuscation key is stored
     71     static const std::string OBFUSCATE_KEY_KEY;
     72diff --git a/src/node/blockstorage.cpp b/src/node/blockstorage.cpp
     73index d2c1f33205..56d96e58f4 100644
     74--- a/src/node/blockstorage.cpp
     75+++ b/src/node/blockstorage.cpp
     76@@ -1107,7 +1107,7 @@ static Obfuscation InitBlocksdirXorKey(const BlockManager::Options& opts)
     77 {
     78     // Bytes are serialized without length indicator, so this is also the exact
     79     // size of the XOR-key file.
     80-    std::array<std::byte, 8> xor_key{};
     81+    std::array<std::byte, Obfuscation::SIZE_BYTES> xor_key{};
     82 
     83     // Consider this to be the first run if the blocksdir contains only hidden
     84     // files (those which start with a .). Checking for a fully-empty dir would
     85diff --git a/src/obfuscation.h b/src/obfuscation.h
     86index 70bc844c6d..d4a732e286 100644
     87--- a/src/obfuscation.h
     88+++ b/src/obfuscation.h
     89@@ -5,15 +5,14 @@
     90 #ifndef BITCOIN_OBFUSCATION_H
     91 #define BITCOIN_OBFUSCATION_H
     92 
     93+#include <serialize.h>
     94+#include <span.h>
     95+
     96 #include <array>
     97+#include <bit>
     98 #include <cassert>
     99-#include <cstdint>
    100-#include <random>
    101-#include <span.h>
    102-#include <util/check.h>
    103-#include <cstring>
    104 #include <climits>
    105-#include <serialize.h>
    106+#include <cstdint>
    107 
    108 class Obfuscation
    109 {
    110@@ -21,13 +20,15 @@ public:
    111     static constexpr size_t SIZE_BYTES{sizeof(uint64_t)};
    112 
    113 private:
    114-    std::array<uint64_t, SIZE_BYTES> rotations; // Cached key rotations
    115+    // Cached key rotations for different offsets.
    116+    std::array<uint64_t, SIZE_BYTES> m_rotations;
    117+
    118     void SetRotations(const uint64_t key)
    119     {
    120         for (size_t i{0}; i < SIZE_BYTES; ++i) {
    121             size_t key_rotation_bits{CHAR_BIT * i};
    122             if constexpr (std::endian::native == std::endian::big) key_rotation_bits *= -1;
    123-            rotations[i] = std::rotr(key, key_rotation_bits);
    124+            m_rotations[i] = std::rotr(key, key_rotation_bits);
    125         }
    126     }
    127 
    128@@ -49,36 +50,35 @@ private:
    129 
    130 public:
    131     Obfuscation(const uint64_t key) { SetRotations(key); }
    132-    Obfuscation(const std::span<const std::byte, SIZE_BYTES> key_span) : Obfuscation(ToUint64(key_span)) {}
    133-    Obfuscation(const std::vector<uint8_t>& key_vec) : Obfuscation(MakeByteSpan(key_vec).first<SIZE_BYTES>()) {}
    134-    Obfuscation(const std::vector<std::byte>& key_vec) : Obfuscation(std::span(key_vec).first<SIZE_BYTES>()) {}
    135+    Obfuscation(std::span<const std::byte, SIZE_BYTES> key_span) : Obfuscation(ToUint64(key_span)) {}
    136+    Obfuscation(std::span<const uint8_t, SIZE_BYTES> key_span) : Obfuscation(MakeByteSpan(key_span).first<SIZE_BYTES>()) {}
    137 
    138-    uint64_t Key() const { return rotations[0]; }
    139+    uint64_t Key() const { return m_rotations[0]; }
    140     operator bool() const { return Key() != 0; }
    141-    void operator()(std::span<std::byte> write, const size_t key_offset_bytes = 0) const
    142+    void operator()(std::span<std::byte> target, const size_t key_offset_bytes = 0) const
    143     {
    144         if (!*this) return;
    145-        const uint64_t rot_key{rotations[key_offset_bytes % SIZE_BYTES]}; // Continue obfuscation from where we left off
    146-        for (; write.size() >= SIZE_BYTES; write = write.subspan(SIZE_BYTES)) { // Process multiple bytes at a time
    147-            Xor(write, rot_key, SIZE_BYTES);
    148+        const uint64_t rot_key{m_rotations[key_offset_bytes % SIZE_BYTES]}; // Continue obfuscation from where we left off
    149+        for (; target.size() >= SIZE_BYTES; target = target.subspan(SIZE_BYTES)) { // Process multiple bytes at a time
    150+            Xor(target, rot_key, SIZE_BYTES);
    151         }
    152-        Xor(write, rot_key, write.size());
    153+        Xor(target, rot_key, target.size());
    154     }
    155 
    156     template <typename Stream>
    157     void Serialize(Stream& s) const
    158     {
    159-        std::vector<std::byte> bytes(SIZE_BYTES);
    160-        std::memcpy(bytes.data(), &rotations[0], SIZE_BYTES);
    161+        std::array<std::byte, SIZE_BYTES> bytes;
    162+        std::memcpy(bytes.data(), &m_rotations[0], SIZE_BYTES); // MISMATCH IN size, byte vs i64
    163         s << bytes;
    164     }
    165 
    166     template <typename Stream>
    167     void Unserialize(Stream& s)
    168     {
    169-        std::vector<std::byte> bytes(SIZE_BYTES);
    170+        std::array<std::byte, SIZE_BYTES> bytes;
    171         s >> bytes;
    172-        SetRotations(ToUint64(MakeByteSpan(bytes).first<SIZE_BYTES>()));
    173+        SetRotations(ToUint64(bytes));
    174     }
    175 };
    176 
    177diff --git a/src/random.h b/src/random.h
    178index c702309d0c..ecc4d1e88f 100644
    179--- a/src/random.h
    180+++ b/src/random.h
    181@@ -301,6 +301,15 @@ public:
    182         return ret;
    183     }
    184 
    185+    /** Generate random bytes on the stack. */
    186+    template <size_t N, BasicByte B = unsigned char>
    187+    std::array<B, N> randbytes() noexcept
    188+    {
    189+        std::array<B, N> ret;
    190+        Impl().fillrand(MakeWritableByteSpan(ret));
    191+        return ret;
    192+    }
    193+
    194     /** Generate a random 32-bit integer. */
    195     uint32_t rand32() noexcept { return Impl().template randbits<32>(); }
    196 
    197diff --git a/src/test/dbwrapper_tests.cpp b/src/test/dbwrapper_tests.cpp
    198index 7d9135baf0..c7431e6e1f 100644
    199--- a/src/test/dbwrapper_tests.cpp
    200+++ b/src/test/dbwrapper_tests.cpp
    201@@ -27,7 +27,7 @@ BOOST_AUTO_TEST_CASE(dbwrapper)
    202         uint256 res;
    203 
    204         // Ensure that we're doing real obfuscation when obfuscate=true
    205-        BOOST_CHECK(obfuscate == dbwrapper_private::GetObfuscation(dbw));
    206+        BOOST_CHECK_EQUAL(obfuscate, dbwrapper_private::GetObfuscation(dbw));
    207 
    208         BOOST_CHECK(dbw.Write(key, in));
    209         BOOST_CHECK(dbw.Read(key, res));
    210@@ -47,7 +47,7 @@ BOOST_AUTO_TEST_CASE(dbwrapper_basic_data)
    211         bool res_bool;
    212 
    213         // Ensure that we're doing real obfuscation when obfuscate=true
    214-        BOOST_CHECK(obfuscate == dbwrapper_private::GetObfuscation(dbw));
    215+        BOOST_CHECK_EQUAL(obfuscate, dbwrapper_private::GetObfuscation(dbw));
    216 
    217         //Simulate block raw data - "b + block hash"
    218         std::string key_block = "b" + m_rng.rand256().ToString();
    219diff --git a/src/test/streams_tests.cpp b/src/test/streams_tests.cpp
    220index ab64825d98..ebd22d2a0c 100644
    221--- a/src/test/streams_tests.cpp
    222+++ b/src/test/streams_tests.cpp
    223@@ -33,12 +33,12 @@ BOOST_AUTO_TEST_CASE(xor_roundtrip_random_chunks)
    224         const std::vector original{m_rng.randbytes<std::byte>(write_size)};
    225         std::vector roundtrip{original};
    226 
    227-        const auto key_bytes{m_rng.randbytes<std::byte>(Obfuscation::SIZE_BYTES)};
    228+        const std::array key_bytes{m_rng.randbytes<Obfuscation::SIZE_BYTES, std::byte>()};
    229         const Obfuscation obfuscation{key_bytes};
    230         apply_random_xor_chunks(roundtrip, obfuscation);
    231 
    232         // Verify intermediate state is different from original (unless key is zero)
    233-        const bool all_zero = !obfuscation || (HexStr(key_bytes).find_first_not_of('0') >= write_size * 2);
    234+        const bool all_zero = !obfuscation || (static_cast<size_t>(std::count(key_bytes.begin(), key_bytes.begin() + write_size, std::byte{0})) == write_size);
    235         BOOST_CHECK_EQUAL(original != roundtrip, !all_zero);
    236 
    237         apply_random_xor_chunks(roundtrip, obfuscation);
    238@@ -60,7 +60,7 @@ BOOST_AUTO_TEST_CASE(xor_bytes_reference)
    239         const size_t write_size{1 + m_rng.randrange(100U)};
    240         const size_t key_offset{m_rng.randrange(3 * 8U)}; // Should wrap around
    241 
    242-        const auto key_bytes{m_rng.randbytes<std::byte>(Obfuscation::SIZE_BYTES)};
    243+        const std::array key_bytes{m_rng.randbytes<Obfuscation::SIZE_BYTES, std::byte>()};
    244         const Obfuscation obfuscation{key_bytes};
    245         std::vector expected{m_rng.randbytes<std::byte>(write_size)};
    246         std::vector actual{expected};
    247@@ -72,7 +72,6 @@ BOOST_AUTO_TEST_CASE(xor_bytes_reference)
    248     }
    249 }
    250 
    251-
    252 BOOST_AUTO_TEST_CASE(obfuscation_constructors)
    253 {
    254     constexpr uint64_t test_key = 0x0123456789ABCDEF;
    255@@ -81,23 +80,17 @@ BOOST_AUTO_TEST_CASE(obfuscation_constructors)
    256     const Obfuscation obf1{test_key};
    257     BOOST_CHECK_EQUAL(obf1.Key(), test_key);
    258 
    259-    // std::span constructor
    260+    // std::span<std::byte> constructor
    261     std::array<std::byte, Obfuscation::SIZE_BYTES> key_bytes{};
    262     std::memcpy(key_bytes.data(), &test_key, Obfuscation::SIZE_BYTES);
    263-    const Obfuscation obf2{std::span{key_bytes}};
    264+    const Obfuscation obf2{key_bytes};
    265     BOOST_CHECK_EQUAL(obf2.Key(), test_key);
    266 
    267-    // std::vector<uint8_t> constructor
    268-    std::vector<uint8_t> uint8_key(Obfuscation::SIZE_BYTES);
    269+    // std::span<uint8_t> constructor
    270+    std::array<uint8_t, Obfuscation::SIZE_BYTES> uint8_key;
    271     std::memcpy(uint8_key.data(), &test_key, uint8_key.size());
    272-    const Obfuscation obf4{uint8_key};
    273-    BOOST_CHECK_EQUAL(obf4.Key(), test_key);
    274-
    275-    // std::vector<std::byte> constructor
    276-    std::vector<std::byte> byte_vector_key(Obfuscation::SIZE_BYTES);
    277-    std::memcpy(byte_vector_key.data(), &test_key, byte_vector_key.size());
    278-    const Obfuscation obf5{byte_vector_key};
    279-    BOOST_CHECK_EQUAL(obf5.Key(), test_key);
    280+    const Obfuscation obf3{uint8_key};
    281+    BOOST_CHECK_EQUAL(obf3.Key(), test_key);
    282 }
    283 
    284 BOOST_AUTO_TEST_CASE(obfuscation_serialize)
    285@@ -129,13 +122,13 @@ BOOST_AUTO_TEST_CASE(xor_file)
    286     auto raw_file{[&](const auto& mode) { return fsbridge::fopen(xor_path, mode); }};
    287     const std::vector<uint8_t> test1{1, 2, 3};
    288     const std::vector<uint8_t> test2{4, 5};
    289-    auto key_bytes{"ff00ff00ff00ff00"_hex_v};
    290+    auto key_bytes{"ff00ff00ff00ff00"_hex};
    291     uint64_t xor_key;
    292     std::memcpy(&xor_key, key_bytes.data(), sizeof(xor_key));
    293 
    294     {
    295         // Check errors for missing file
    296-        AutoFile xor_file{raw_file("rb"), key_bytes};
    297+        AutoFile xor_file{raw_file("rb"), {key_bytes}};
    298         BOOST_CHECK_EXCEPTION(xor_file << std::byte{}, std::ios_base::failure, HasReason{"AutoFile::write: file handle is nullpt"});
    299         BOOST_CHECK_EXCEPTION(xor_file >> std::byte{}, std::ios_base::failure, HasReason{"AutoFile::read: file handle is nullpt"});
    300         BOOST_CHECK_EXCEPTION(xor_file.ignore(1), std::ios_base::failure, HasReason{"AutoFile::ignore: file handle is nullpt"});
    301@@ -345,7 +338,7 @@ BOOST_AUTO_TEST_CASE(streams_serializedata_xor)
    302     in.push_back(std::byte{0xf0});
    303 
    304     {
    305-        const Obfuscation obfuscation{"ffffffffffffffff"_hex_v};
    306+        const Obfuscation obfuscation{"ffffffffffffffff"_hex};
    307 
    308         DataStream ds{in};
    309         obfuscation(ds);
    310@@ -357,7 +350,7 @@ BOOST_AUTO_TEST_CASE(streams_serializedata_xor)
    311     in.push_back(std::byte{0x0f});
    312 
    313     {
    314-        const Obfuscation obfuscation{"ff0fff0fff0fff0f"_hex_v};
    315+        const Obfuscation obfuscation{"ff0fff0fff0fff0f"_hex};
    316 
    317         DataStream ds{in};
    318         obfuscation(ds);
    319@@ -673,7 +666,7 @@ BOOST_AUTO_TEST_CASE(buffered_reader_matches_autofile_random_content)
    320     const FlatFilePos pos{0, 0};
    321 
    322     const FlatFileSeq test_file{m_args.GetDataDirBase(), "buffered_file_test_random", node::BLOCKFILE_CHUNK_SIZE};
    323-    const std::vector obfuscation{m_rng.randbytes<std::byte>(8)};
    324+    const Obfuscation obfuscation{m_rng.randbytes<Obfuscation::SIZE_BYTES>()};
    325 
    326     // Write out the file with random content
    327     {
    328@@ -726,7 +719,7 @@ BOOST_AUTO_TEST_CASE(buffered_writer_matches_autofile_random_content)
    329 
    330     const FlatFileSeq test_buffered{m_args.GetDataDirBase(), "buffered_write_test", node::BLOCKFILE_CHUNK_SIZE};
    331     const FlatFileSeq test_direct{m_args.GetDataDirBase(), "direct_write_test", node::BLOCKFILE_CHUNK_SIZE};
    332-    const std::vector obfuscation{m_rng.randbytes<std::byte>(8)};
    333+    const Obfuscation obfuscation{m_rng.randbytes<Obfuscation::SIZE_BYTES>()};
    334 
    335     {
    336         DataBuffer test_data{m_rng.randbytes<std::byte>(file_size)};
    
  194. hodlinator commented at 1:52 pm on April 22, 2025: contributor
    Changed the serialization of Obfuscation from vector -> array without re-testing my suggestions towards the end. Forgot that one serializes the size and the other does not. Seems to be responsible for some of the test failures on my suggestion-branch.
  195. l0rinc force-pushed on Apr 22, 2025
  196. l0rinc force-pushed on Apr 22, 2025
  197. in src/bench/xor.cpp:22 in 2a50ec0d17 outdated
    23 
    24-    bench.batch(data.size()).unit("byte").run([&] {
    25-        util::Xor(data, key);
    26+    std::vector key_bytes{rng.randbytes<std::byte>(8)};
    27+    uint64_t key;
    28+    std::memcpy(&key, key_bytes.data(), 8);
    


    hodlinator commented at 8:29 pm on April 23, 2025:
    In 2a50ec0d17b12f2adb1bf8872592b898810e9ac4: key is no longer used.

    l0rinc commented at 9:32 am on April 24, 2025:
    Oops!…I Did It Again
  198. hodlinator commented at 7:02 am on April 24, 2025: contributor

    Got some mixed messaging here since you told me repeatedly in private not to investigate failures I reported (https://github.com/bitcoin/bitcoin/pull/31144#issuecomment-2821410082) on my previous branch (https://github.com/bitcoin/bitcoin/pull/31144#pullrequestreview-2783071854) but then chided me in public (https://github.com/bitcoin/bitcoin/pull/31144#discussion_r2056174501). Anyways, the previous branch was not the sliced bread I believed it to be, looking back.


    New branch: https://github.com/bitcoin/bitcoin/compare/master...hodlinator:bitcoin:pr/31144_suggestions_2 CI: https://github.com/hodlinator/bitcoin/actions/runs/14628005565/attempts/1 (one failure due to #32322)

    If you have a suggestion that passes ci (and local IBD for some blocks), let me know

    How about CI above + local IBD on mainnet of a couple of months of blocks? Have you ever encountered IBD failures for code that passes CI? Seems like missing test coverage in that case?

    Scaled back some of the vector -> array, especially in context of serialization.

    Comparison between current PR and suggestion branch: https://github.com/bitcoin/bitcoin/compare/6892c5a..hodlinator:bitcoin:pr/31144_suggestions_2

  199. l0rinc force-pushed on Apr 24, 2025
  200. l0rinc commented at 9:43 am on April 24, 2025: contributor

    Have you ever encountered IBD failures for code that passes CI? Seems like missing test coverage in that case?

    yes, extended one of the tests now, hoping that will cover it next time.

    then chided me in public

    Definitely wasn’t my intention to scold you in any way, just didn’t (and still don’t) understand what you’re objecting to or suggesting in that part of the code. Pushed some changes, if it’s still not clear, let’s discuss in person.

  201. DrahtBot added the label CI failed on Apr 24, 2025
  202. DrahtBot commented at 11:06 am on April 24, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/41075184281

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  203. l0rinc force-pushed on Apr 24, 2025
  204. l0rinc force-pushed on Apr 24, 2025
  205. l0rinc commented at 7:06 pm on April 24, 2025: contributor
    Thanks for the reviews, addressed most of your concerns - except for the vector constructor for Obfuscation - but if other reviewers also think it’s better that way, I’ll do it of course. Also extended the BOOST_AUTO_TEST_CASE(dbwrapper) test case with asserting that the obfuscation key can be read back by an unobfuscated instance as well.
  206. l0rinc force-pushed on Apr 24, 2025
  207. l0rinc force-pushed on Apr 24, 2025
  208. l0rinc force-pushed on Apr 24, 2025
  209. hodlinator commented at 10:13 am on April 25, 2025: contributor
    Just letting the thread know that we’ve cleared up our miscommunication regarding #31144#pullrequestreview-2788618744. The combination of my prior suggestion failing CI and @l0rinc’s cautioning to make sure my further suggestions pass CI seem to be an unlucky coincidence within a very short time-frame. We should strive towards assuming good-faith when communicating over text, but on some days frustrations can get the best of me.
  210. l0rinc force-pushed on Apr 25, 2025
  211. l0rinc force-pushed on Apr 25, 2025
  212. l0rinc commented at 11:34 am on April 25, 2025: contributor
    Sorry for a few useless pushes, I was fighting some compilers for strings not always being constexpr-able… Code is ready for review again.
  213. DrahtBot commented at 1:26 pm on April 25, 2025: contributor
    [07:30:05.813] ⚠️ Failure generated from target with exit code 77: [’/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/bin/fuzz’, ‘-runs=1’, PosixPath(’/ci_container_base/ci/scratch/qa-assets/fuzz_corpora/validation_load_mempool’)]
  214. l0rinc force-pushed on Apr 25, 2025
  215. l0rinc commented at 5:04 pm on April 25, 2025: contributor
    For now I’ve reverted the assert that checks the deserialized key size, since I couldn’t reproduce the fuzz failure locally (or when I could, even master failed for the same command). Will continue trying, but without the assert it’s closer to the previous state, so it should be fine as it is as well.
  216. DrahtBot removed the label CI failed on Apr 25, 2025
  217. test: compare util::Xor with randomized inputs against simple impl
    Since production code only uses keys of length 8, we're not testing with other values anymore
    b9c54ccd8c
  218. bench: make XorObfuscationBench more representative
    Since a previous PR already solved the tiny byte-array xors during serialization, we're only concentrating on big continuous chunks now.
    
    >  cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release \
    && cmake --build build -j$(nproc) \
    && build/bin/bench_bitcoin -filter='XorObfuscationBench' -min-time=10000
    
    C++ compiler .......................... AppleClang 17.0.0.17000013
    
    |              ns/MiB |               MiB/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |          731,927.62 |            1,366.26 |    0.2% |     10.67 | `XorObfuscationBench`
    
    C++ compiler .......................... GNU 13.3.0
    
    |              ns/MiB |               MiB/s |    err% |         ins/MiB |         cyc/MiB |    IPC |        bra/MiB |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |          941,015.26 |            1,062.68 |    0.0% |    9,437,186.97 |    3,378,911.52 |  2.793 |   1,048,577.15 |    0.0% |     10.99 | `XorObfuscationBench`
    366bffd125
  219. refactor: unify xor vs obfuscation nomenclature
    This simplifies the next commits.
    9712481ae7
  220. refactor: prepare `DBWrapper` for obfuscation key change
    Since `CDBWrapper::Read` still supports vector only, we can initialize `m_obfuscation` directly instead of using a separate helper.
    `CreateObfuscation` was also inlined, replaced `key_exists` with `key_missing`, and simplified the `if` condition that writes a new obfuscation key.
    13cc039f20
  221. refactor: prepare mempool_persist for obfuscation key change e80438d0e9
  222. optimization: migrate fixed-size obfuscation from `std::vector<std::byte>` to `uint64_t`
    Since `util::Xor` now operates on `uint64_t` keys, migrated the obfuscation key end‑to‑end - from use sites in util::Xor up through streams, LevelDB wrappers, and disk (de)serialization - replacing all former `std::vector<std::byte>` keys with `uint64_t` (we still serialize them as vectors but convert immediately to `uint64_t` on load). This is why tests still generate vector keys and convert them to `uint64_t` later instead of generating them directly.
    
    We also short‑circuit `Xor` calls when the key is zero to avoid unnecessary calculations (e.g., `MakeWritableByteSpan`).
    
    In `Obfuscation::Unserialize` we can safely throw an `std::logic_error` since during mempool fuzzing `mempool_persist.cpp#L141` catches and ignored these errors safely.
    
    >  cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release \
    && cmake --build build -j$(nproc) \
    && build/bin/bench_bitcoin -filter='XorObfuscationBench' -min-time=10000
    
    C++ compiler .......................... AppleClang 17.0.0.17000013
    
    |              ns/MiB |               MiB/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |           14,730.40 |           67,886.80 |    0.1% |     11.01 | `XorObfuscationBench`
    
    C++ compiler .......................... GNU 13.3.0
    
    |              ns/MiB |               MiB/s |    err% |         ins/MiB |         cyc/MiB |    IPC |        bra/MiB |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |           51,187.17 |           19,536.15 |    0.0% |      327,683.95 |      183,747.58 |  1.783 |      65,536.55 |    0.0% |     11.00 | `XorObfuscationBench`
    
    ----
    
    A few other benchmarks that seem to have improved as well (tested with Clang only):
    Before:
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        2,202,618.49 |              454.01 |    0.2% |     11.01 | `ReadBlockBench`
    |          734,444.92 |            1,361.57 |    0.3% |     10.66 | `ReadRawBlockBench`
    
    After:
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        1,912,308.06 |              522.93 |    0.4% |     10.98 | `ReadBlockBench`
    |           49,092.93 |           20,369.53 |    0.2% |     10.99 | `ReadRawBlockBench`
    
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    bba64732ff
  223. l0rinc force-pushed on Apr 26, 2025
  224. l0rinc commented at 2:57 pm on April 26, 2025: contributor
    Restored the deserialization validation in Obfuscation::Unserialize, we can’t assert, but we can throw a std::logic_error, since during mempool fuzzing https://github.com/bitcoin/bitcoin/blob/master/src/node/mempool_persist.cpp#L141 catches and ignored these errors safely (managed to reproduce it on Linux, not sure why it’s not reproducible on Mac). I’ve also split out all renames to a single commit before any other refactor or optimization to simplify the higher-risk changes. PR is ready for review again!

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-04-28 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me