optimization: bulk reads(27%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD #31551

pull l0rinc wants to merge 8 commits into bitcoin:master from l0rinc:l0rinc/bulk-block-read-write changing 7 files +183 −165
  1. l0rinc commented at 2:34 pm on December 21, 2024: contributor

    An alternative implementation of #31539, where the whole block is read into memory instead of doing it in chunks, as suggested by @theuni.


    This is a follow-up to #31490 (first few commits duplicated here, hence a draft) and a precursor to #31144.

    Currently, obfuscation operations are performed byte-by-byte during serialization. Buffering the reads allows batching these operations (implemented in #31144) and improves file access efficiency by reducing fread calls and associated locking overhead. For writes, buffering enables batched obfuscations directly on the internal buffer, avoiding the need to copy input spans for obfuscation. Xor key offsets are calculated based on the file position, and the batched obfuscation is applied before writing to disk.


    2 full IBD runs against master for 870k blocks (seeded from real nodes) indicates a 6% total speedup.

     0hyperfine --runs 2 --export-json /mnt/my_storage/ibd-xor-buffered.json --parameter-list COMMIT d73f37dda221835b5109ede1b84db2dc7c4b74a1,6853b2740851befffa3ca0b24d94212ed1e48d66 --prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' 'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0'
     1
     2Benchmark 1: COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0
     3  Time (mean ± σ):     40216.674 s ± 113.132 s    [User: 51496.289 s, System: 3541.340 s]
     4  Range (min … max):   40136.678 s … 40296.671 s    2 runs
     5 
     6Benchmark 2: COMMIT=6853b2740851befffa3ca0b24d94212ed1e48d66 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0
     7  Time (mean ± σ):     37964.015 s ± 624.115 s    [User: 49086.037 s, System: 3375.072 s]
     8  Range (min … max):   37522.699 s … 38405.331 s    2 runs
     9 
    10Summary
    11  COMMIT=6853b2740851befffa3ca0b24d94212ed1e48d66 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0 ran
    12    1.06 ± 0.02 times faster than COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0
    

    Microbenchmarks show a ~27%/290% speedup (the followup XOR batching improves this further):

    cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake –build build -j$(nproc) && build/src/bench/bench_bitcoin -filter=‘ReadBlockFromDiskBench|SaveBlockToDiskBench’ -min-time=10000

    Before:

    ns/op op/s err% total benchmark
    2,285,753.56 437.49 0.3% 11.03 ReadBlockFromDiskBench
    5,244,636.51 190.67 0.3% 11.04 SaveBlockToDiskBench

    After:

    ns/op op/s err% total benchmark
    1,787,371.46 559.48 1.6% 11.16 SaveBlockToDiskBench
    1,804,240.57 554.25 0.2% 11.01 ReadBlockFromDiskBench
  2. bench: add SaveBlockToDiskBench 3f0514f905
  3. refactor,blocks: inline `UndoWriteToDisk`
    `UndoWriteToDisk` wasn't really extracting a meaningful subset of the `WriteUndoDataForBlock` functionality, it's tied closely to the only caller (needs the header size twice, recalculated undo serializes size, returns multiple branches, modifies parameter, needs documentation).
    
    The inlined code should only differ in these parts (modernization will be done in other commits):
    * renamed `_pos` to `pos` in `WriteUndoDataForBlock` to match the parameter name;
    * inlined `hashBlock` parameter usage into `hasher << block.pprev->GetBlockHash()`;
    * changed `return false` to `return FatalError`.
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    b91d55cf26
  4. refactor,blocks: inline `WriteBlockToDisk`
    `WriteBlockToDisk` wasn't really extracting a meaningful subset of the `SaveBlockToDisk` functionality, it's tied closely to the only caller (needs the header size twice, recalculated block serializes size, returns multiple branches, mutates parameter).
    
    The inlined code should only differ in these parts (modernization will be done in other commits):
    * renamed `blockPos` to `pos` in `SaveBlockToDisk` to match the parameter name;
    * changed `return false` to `return FlatFilePos()`.
    
    Also removed remaining references to `SaveBlockToDisk`.
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    df77c0d704
  5. refactor,blocks: cache block serialized size for consecutive calls
    For consistency `UNDO_DATA_DISK_OVERHEAD` was also extracted to avoid the constant's ambiguity.
    4b8226b5b7
  6. refactor,blocks: remove costly asserts
    When the behavior was changes in a previous commit (caching `GetSerializeSize` and avoiding `AutoFile.tell`), asserts were added to make sure the behavior was kept - to make sure reviewers and CI validates it.
    We can safely remove them now.
    
    Co-authored-by: Anthony Towns <aj@erisian.com.au>
    42d157d596
  7. scripted-diff: rename block and undo functions for consistency
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    
    -BEGIN VERIFY SCRIPT-
    sed -i \
        -e 's/\bSaveBlockToDisk\b/SaveBlock/g' \
        -e 's/\bWriteUndoDataForBlock\b/SaveBlockUndo/g' \
        $(git ls-files)
    -END VERIFY SCRIPT-
    92abdde7bb
  8. DrahtBot commented at 2:34 pm on December 21, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31551.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31539 (optimization: buffer reads(23%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD by l0rinc)
    • #31144 (optimization: batch XOR operations 12% faster IBD by l0rinc)
    • #29307 (util: explicitly close all AutoFiles that have been written by vasild)
    • #27006 (reduce cs_main scope, guard block index ’nFile’ under a local mutex by furszy)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  9. optimization: Bulk serialization reads in `UndoReadFromDisk` and `ReadBlockFromDisk`
    The Obfuscation (XOR) operations are currently done byte-by-byte during serialization, buffering the reads will enable batching the obfuscation operations later (not yet done here).
    
    Also, different operating systems seem to handle file caching differently, so reading bigger batches (and processing those from memory) is also a bit faster (likely because of fewer native fread calls or less locking).
    
    Before:
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        2,285,753.56 |              437.49 |    0.3% |     11.03 | `ReadBlockFromDiskBench`
    
    After:
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        1,804,240.57 |              554.25 |    0.2% |     11.01 | `ReadBlockFromDiskBench`
    
    Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
    34eafa54a6
  10. l0rinc force-pushed on Dec 21, 2024
  11. l0rinc renamed this:
    optimization: bulk reads(27%)/writes(290%) in [undo]block [de]serialization
    optimization: bulk reads(27%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD
    on Dec 21, 2024
  12. optimization: Bulk serialization writes in `SaveBlockUndo` and `SaveBlock`
    Similarly to the serialization reads, buffered writes will enable batched xor calculations - especially since currently we need to copy the write inputs Span to do the obfuscation on it, batching enables doing the xor on the internal buffer instead.
    
    Before:
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        5,244,636.51 |              190.67 |    0.3% |     11.04 | `SaveBlockToDiskBench`
    
    After:
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        1,787,371.46 |              559.48 |    1.6% |     11.16 | `SaveBlockToDiskBench`
    
    Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
    1f736fbdb4
  13. l0rinc force-pushed on Dec 21, 2024


l0rinc DrahtBot


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-27 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me