optimization: buffer reads(23%)/writes(19%) in [undo]block [de]serialization #31539

pull l0rinc wants to merge 8 commits into bitcoin:master from l0rinc:l0rinc/buffered-block-read-write changing 10 files +308 −157
  1. l0rinc commented at 11:09 am on December 19, 2024: contributor

    This is a follow-up to #31490 (first few commits duplicated here, hence a draft) and a precursor to #31144.

    Currently, obfuscation operations are performed byte-by-byte during serialization. Buffering the reads allows batching these operations (implemented in #31144) and improves file access efficiency by reducing fread calls and associated locking overhead. Testing with various buffer sizes showed that 16 KiB provides the best performance.

    For writes, buffering enables batched obfuscations directly on the internal buffer, avoiding the need to copy input spans for obfuscation. Xor key offsets are calculated based on the file position, and the batched obfuscation is applied before writing to disk.

    BufferedFile was avoided for both reads and writes due to its unrelated operations and potential deprecation.


    Microbenchmarks show a ~23%/19% speedup (the followup XOR batching improves this further):

    cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake –build build -j$(nproc) && build/src/bench/bench_bitcoin -filter=‘ReadBlockFromDiskTest|SaveBlockToDiskBench’ -min-time=10000

    Before:

    ns/op op/s err% total benchmark
    2,288,264.16 437.01 0.2% 11.00 ReadBlockFromDiskBench
    5,260,934.43 190.08 1.2% 11.08 SaveBlockToDiskBench

    After optimization: Buffer serialization reads in UndoReadFromDisk and ReadBlockFromDisk (23% faster read):

    ns/op op/s err% total benchmark
    1,847,259.64 541.34 0.2% 11.03 ReadBlockFromDiskBench

    After optimization: Buffer serialization writes in UndoWriteToDisk and WriteBlockToDisk (19% faster write):

    ns/op op/s err% total benchmark
    4,394,044.07 227.58 1.3% 10.93 SaveBlockToDiskBench
  2. DrahtBot commented at 11:09 am on December 19, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31539.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31490 (refactor: cache block[undo] serialized size for consecutive calls by l0rinc)
    • #31144 (optimization: speed up XOR by 4% (9% when disabled) by applying it in larger batches by l0rinc)
    • #29307 (util: explicitly close all AutoFiles that have been written by vasild)
    • #27006 (reduce cs_main scope, guard block index ’nFile’ under a local mutex by furszy)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. l0rinc renamed this:
    optimization: buffer reads/writes in [undo]blocks
    optimization: buffer reads/writes in [undo]block [de]serialization
    on Dec 19, 2024
  4. bench: add SaveBlockToDiskBench 3f0514f905
  5. refactor,blocks: inline `UndoWriteToDisk`
    `UndoWriteToDisk` wasn't really extracting a meaningful subset of the `WriteUndoDataForBlock` functionality, it's tied closely to the only caller (needs the header size twice, recalculated undo serializes size, returns multiple branches, modifies parameter, needs documentation).
    
    The inlined code should only differ in these parts (modernization will be done in other commits):
    * renamed `_pos` to `pos` in `WriteUndoDataForBlock` to match the parameter name;
    * inlined `hashBlock` parameter usage into `hasher << block.pprev->GetBlockHash()`;
    * changed `return false` to `return FatalError`.
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    b91d55cf26
  6. refactor,blocks: inline `WriteBlockToDisk`
    `WriteBlockToDisk` wasn't really extracting a meaningful subset of the `SaveBlockToDisk` functionality, it's tied closely to the only caller (needs the header size twice, recalculated block serializes size, returns multiple branches, mutates parameter).
    
    The inlined code should only differ in these parts (modernization will be done in other commits):
    * renamed `blockPos` to `pos` in `SaveBlockToDisk` to match the parameter name;
    * changed `return false` to `return FlatFilePos()`.
    
    Also removed remaining references to `SaveBlockToDisk`.
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    df77c0d704
  7. refactor,blocks: cache block serialized size for consecutive calls
    For consistency `UNDO_DATA_DISK_OVERHEAD` was also extracted to avoid the constant's ambiguity.
    4b8226b5b7
  8. refactor,blocks: remove costly asserts
    When the behavior was changes in a previous commit (caching `GetSerializeSize` and avoiding `AutoFile.tell`), asserts were added to make sure the behavior was kept - to make sure reviewers and CI validates it.
    We can safely remove them now.
    
    Co-authored-by: Anthony Towns <aj@erisian.com.au>
    42d157d596
  9. scripted-diff: rename block and undo functions for consistency
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    
    -BEGIN VERIFY SCRIPT-
    sed -i \
        -e 's/\bSaveBlockToDisk\b/SaveBlock/g' \
        -e 's/\bWriteUndoDataForBlock\b/SaveBlockUndo/g' \
        $(git ls-files)
    -END VERIFY SCRIPT-
    92abdde7bb
  10. optimization: Buffer serialization reads in UndoReadFromDisk and ReadBlockFromDisk
    The Obfuscation (XOR) operations are currently done byte-by-byte during serialization, buffering the reads will enable batching the obfuscation operations later (not yet done here).
    
    Also, different operating systems seem to handle file caching differently, so reading bigger batches (and processing those from memory) is also a bit faster (likely because of fewer native fread calls or less locking).
    
    Buffered reading was done by first exhausting the buffer, and if more data is still needed, reading directly into the destination (to avoid copying into the buffer when we have huge destination Spans, such as the many megabyte vectors in later blocks), and lastly refilling the buffer completely.
    
    Running the `ReadBlockFromDiskTest` benchmarks with different buffer sizes indicated that 16 KiB is the optimal buffer size (roughly 25% faster than master).
    
    Testing was done by randomly reading data from the same file with `AutoFile` and `BufferedReadOnlyFile` and making sure that the exact same data is read.
    
    `BufferedFile` wasn't used here because it does too many unrelated operations (and might be removed in the future).
    
    Before:
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        2,288,264.16 |              437.01 |    0.2% |     11.00 | `ReadBlockFromDiskBench`
    
    After:
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        1,847,259.64 |              541.34 |    0.2% |     11.03 | `ReadBlockFromDiskBench`
    6bc190ee42
  11. optimization: Buffer serialization writes in UndoWriteToDisk and WriteBlockToDisk
    Similarly to the serialization reads, buffered writes will enable batched xor calculations - especially since currently we need to copy the write inputs Span to do the obfuscation on it, batching will enable doing the xor on the internal buffer instead.
    
    All write operations are delegated to `AutoFile`.
    Xor key offsets are also calculated based on where we are in the underlying file.
    To avoid the buffered write of 4096 in `AutoFile.write`, we're disabling obfuscation for the underlying `m_dest` and doing it ourselves for the whole buffer (instead of byte-by-byte) before writing it to file.
    We can't obfuscate the write's src directly since it would mutate the method's input (for very big writes it would avoid copying), but we can fill our own buffer and xor all of that safely.
    
    `BufferedFile` wasn't used here because it does too many unrelated operations (and might be removed in the future).
    
    Before:
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        5,260,934.43 |              190.08 |    1.2% |     11.08 | `SaveBlockToDiskBench`
    
    After:
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        4,394,044.07 |              227.58 |    1.3% |     10.93 | `SaveBlockToDiskBench`
    4ef1507ff7
  12. l0rinc force-pushed on Dec 21, 2024
  13. l0rinc renamed this:
    optimization: buffer reads/writes in [undo]block [de]serialization
    optimization: buffer reads(23%)/writes(19%) in [undo]block [de]serialization
    on Dec 21, 2024


l0rinc DrahtBot


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me