refactor: cache block[undo] serialized size for consecutive calls #31490

pull l0rinc wants to merge 6 commits into bitcoin:master from l0rinc:l0rinc/undo changing 7 files +133 −150
  1. l0rinc commented at 1:52 pm on December 13, 2024: contributor

    UndoWriteToDisk and WriteBlockToDisk were delegating a subset of their functionality to single-use methods that didn’t optimally capture a meaningful chunk of the algorithm, resulting in calculating things twice (serialized size, header size). This change inlines the awkward methods (asserting that all previous behavior was retained), and in separate commits makes the usages less confusing. Besides making the methods slightly more intuitive, the refactorings reduce duplicate calculations as well.

    The speed difference is insignificant for now (~0.5% for the new SaveBlockToDiskBench), but are a cleanup for follow-ups such as #31539

    This change is a precursor to https://github.com/bitcoin/bitcoin/pull/31539

  2. DrahtBot commented at 1:52 pm on December 13, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31490.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK TheCharlatan, andrewtoth, BrandonOdiwuor, theuni
    Stale ACK ryanofsky

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31539 (optimization: buffer reads/writes in [undo]block [de]serialization by l0rinc)
    • #29307 (util: explicitly close all AutoFiles that have been written by vasild)
    • #27006 (reduce cs_main scope, guard block index ’nFile’ under a local mutex by furszy)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Refactoring on Dec 13, 2024
  4. TheCharlatan commented at 2:04 pm on December 13, 2024: contributor
    Concept ACK
  5. andrewtoth commented at 6:04 pm on December 13, 2024: contributor

    Concept ACK

    Should we not prefer the more modern and explicit uint32_t vs unsigned int?

  6. l0rinc force-pushed on Dec 15, 2024
  7. l0rinc commented at 2:30 pm on December 15, 2024: contributor

    prefer the more modern and explicit uint32_t

    I was hoping someone will recommend that - done: https://github.com/bitcoin/bitcoin/compare/e913e773926ecb72e327acf60c68655b5611cb7a..d69766164d177707ec7be19c4c188bd79ba3e4a3

    Edit: Added block size calculation deduplication as well to the PR

  8. l0rinc force-pushed on Dec 15, 2024
  9. l0rinc renamed this:
    refactor: Cache blockundo serialized size for consecutive calls
    refactor: Cache block[undo] serialized size for consecutive calls
    on Dec 15, 2024
  10. in src/node/blockstorage.cpp:682 in 5fd6b3f0fd outdated
    678@@ -679,8 +679,8 @@ bool BlockManager::UndoWriteToDisk(const CBlockUndo& blockundo, FlatFilePos& pos
    679     }
    680 
    681     // Write index header
    682-    unsigned int nSize = GetSerializeSize(blockundo);
    683-    fileout << GetParams().MessageStart() << nSize;
    684+    Assume(blockundo_size == GetSerializeSize(blockundo));
    


    ajtowns commented at 1:08 am on December 17, 2024:
    Calling Assume(x) always fully evaluates x, so there’s no saving here… The only thing Assume() does differently in release mode is that it doesn’t abort.

    l0rinc commented at 8:48 am on December 17, 2024:

    Damn, you’re right, thank you! Pushed a separate commit that removes the Assumes (this way the reviewers and CI can make sure the values are the same before/after) - plus found another debug log that has a similar problem, snuck it into this PR as well.

    mkdir demo && cmake -B build -DCMAKE_BUILD_TYPE=Debug && cmake –build build -j$(nproc) && build/src/bitcoind -datadir=demo

    0static bool test()
    1{
    2    std::cout << "Assume was called!" << std::endl;
    3    return false;
    4}
    5
    6static auto InitBlocksdirXorKey(const BlockManager::Options& opts)
    7{
    8    Assume(test());
    9...
    

    prints in both cases, fails only with Debug


    davidgumberg commented at 2:07 am on December 18, 2024:

    Calling Assume(x) always fully evaluates x, so there’s no saving here… The only thing Assume() does differently in release mode is that it doesn’t abort.

    I thought this wasn’t the case if x can be determined by the compiler to have no side effects?

    In any case #31178 shows that one shouldn’t treat assume(x) as a no-op that can be placed in a hot path for free.


    l0rinc commented at 1:40 pm on December 18, 2024:
  11. in src/blockencodings.cpp:172 in a50f9d7152 outdated
    166@@ -167,7 +167,9 @@ ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& c
    167             break;
    168     }
    169 
    170-    LogDebug(BCLog::CMPCTBLOCK, "Initialized PartiallyDownloadedBlock for block %s using a cmpctblock of size %lu\n", cmpctblock.header.GetHash().ToString(), GetSerializeSize(cmpctblock));
    171+    if (LogAcceptCategory(BCLog::CMPCTBLOCK, BCLog::Level::Debug)) {
    172+        LogDebug(BCLog::CMPCTBLOCK, "Initialized PartiallyDownloadedBlock for block %s using a cmpctblock of size %lu\n", cmpctblock.header.GetHash().ToString(), GetSerializeSize(cmpctblock));
    173+    }
    


    maflcko commented at 10:03 am on December 17, 2024:
    LogDebug is already a macro doing exactly that. No need to do the same check twice.

    l0rinc commented at 10:36 am on December 17, 2024:

    // Use a macro instead of a function for conditional logging to prevent // evaluating arguments when logging for the category is not enabled.

    Thanks, dropped the logger related commit. What’s the reason for the logger avoiding evaluation but Assume not doing it?

    What is the goal? Is this an optimization?

    As stated in the description a costly calculation, we can deduplicate it. I’ll clarify further in the description.

    If yes, how can it be observed?

    A quick peek can be seen in https://github.com/bitcoin-dev-tools/benchcoin/pull/66, but I’m currently running several full reindex-chainstates to see how much effect it has - previous measurements show 2-3%.

  12. maflcko changes_requested
  13. maflcko commented at 10:06 am on December 17, 2024: member

    Not sure about the changes.

    What is the goal? Is this an optimization? If yes, how can it be observed?

    Also, the changes seem to be based on a misunderstanding of the log and check macros.

  14. l0rinc force-pushed on Dec 17, 2024
  15. l0rinc renamed this:
    refactor: Cache block[undo] serialized size for consecutive calls
    optimization: Cache block[undo] serialized size for consecutive calls
    on Dec 17, 2024
  16. in src/node/blockstorage.h:178 in 0cb15046b0 outdated
    174@@ -175,7 +175,7 @@ class BlockManager
    175      * After this call, it will point to the beginning of the serialized CBlock data, after the separator fields
    176      * (BLOCK_SERIALIZATION_HEADER_SIZE)
    177      */
    178-    bool WriteBlockToDisk(const CBlock& block, FlatFilePos& pos) const;
    179+    bool WriteBlockToDisk(const CBlock& block, uint32_t block_size, FlatFilePos& pos) const;
    


    ryanofsky commented at 5:16 pm on December 17, 2024:

    In commit “refactor: Cache block serialized size for consecutive calls” (0cb15046b039d0e6efd2b57075bade46e1a10237)

    Would be good to document new WriteBlockToDisk block_size and UndoWriteToDisk blockundo_size arguments to clarify how they relate to the pos argument. Specifically it would be good to say that size arguments should be set to the size of data without the header, but the pos argument will increase by size of the data plus the header.


    l0rinc commented at 12:15 pm on December 18, 2024:
    I tried doing that, but noticed that all variants were awkward. Realized that the code smell is caused by extracting a part of the write methods that isn’t a meaningful subset of the functionality: it’s tied closely to the only caller (needs the header size twice, recalculated undo serializes size, returns multiple branches, modifies parameter, needs documentation). So I’ve recreated the PR completely, inlining these single-use methods, which also solves the serialization size calculations more naturally.
  17. ryanofsky approved
  18. ryanofsky commented at 5:32 pm on December 17, 2024: contributor

    Code review ACK 9a7e1ced7c3bb17193c7401181365c4075d45ec2. This seems like a safer, more efficient approach, but I still think the API is very confusing.

    No need to address here, but for a followup I think it would make more sense for WriteBlockToDisk instead of SaveBlockToDisk call FindNextBlockPos, and for UndoWriteToDisk instead of WriteUndoDataForBlock to be call FindUndoPos, so higher level SaveBlockToDisk / WriteUndoDataForBlock functions don’t contain any logic dealing with header fields.

    I also find names of these functions to be inconsistent, overlong and confusing. Would rename:

    • SaveBlockToDisk to SaveBlock
    • WriteBlockToDisk to WriteBlock
    • WriteUndoDataForBlock to SaveBlockUndo
    • UndoWriteToDisk to WriteBlockUndo
  19. DrahtBot requested review from andrewtoth on Dec 17, 2024
  20. DrahtBot requested review from TheCharlatan on Dec 17, 2024
  21. l0rinc force-pushed on Dec 18, 2024
  22. in src/util/check.h:96 in a730402ab7 outdated
    92@@ -93,6 +93,7 @@ constexpr T&& inline_assertion_check(LIFETIMEBOUND T&& val, [[maybe_unused]] con
    93  * - For fatal errors, use Assert().
    94  * - For non-fatal errors in interactive sessions (e.g. RPC or command line
    95  *   interfaces), CHECK_NONFATAL() might be more appropriate.
    96+ *   Note that the assumption is always fully evaluated - even in non-debug builds.
    


    maflcko commented at 12:30 pm on December 18, 2024:

    This is part of “identity function”. A function can not return the value that was passed in without evaluating it. (Having a different return type, depending on build type would also make it harder to use the function).

    If the docs are worthy to change, it would be good to either change none of them, or all of them in this file for other identity functions as well.


    l0rinc commented at 1:33 pm on December 18, 2024:
    Since Assume is a macro, it could have been written differently to avoid parameter evaluation (similarly to LogDebug), so I think it needs a comment. But I’m fine doing it in a different PR is this is remotely controversial.

    l0rinc commented at 1:40 pm on December 18, 2024:

    sipa commented at 1:42 pm on December 18, 2024:

    @l0rinc The point is that Assume(x) and x have the same behavior, apart from error detection. So if the expression x has side-effects, and the error condition does not occur, debug (e.g., fuzz) builds and release builds will behave identically.

    Of course, if the expression x is simple enough that the compiler knows it has no side effects, the optimizer will just remove it in non-debug builds, so for such expressions, you get the best of both worlds: no runtime penalty, but still a guarantee that release and debug builds behave identically (modulo compiler bugs).

    Lastly, Assume(x) is an expression, not a statement. You can use it like if (!Assume(x != nullptr)) { ... }. This isn’t possible to avoid evaluating.


    l0rinc commented at 1:52 pm on December 18, 2024:
    Thank you for clarifying. But given that we’ve bumped into this multiple times, do you think #31528 isn’t a valid change?
  23. maflcko commented at 12:33 pm on December 18, 2024: member
    concept ack, if this improves a benchmark. In the future, it would be good to mention a speedup in the pull request description, so that reviewers can see the motivation and goal for the change.
  24. l0rinc force-pushed on Dec 18, 2024
  25. l0rinc referenced this in commit fe49ecea66 on Dec 18, 2024
  26. BrandonOdiwuor commented at 1:54 pm on December 18, 2024: contributor
    Concept ACK
  27. theuni commented at 7:43 pm on December 18, 2024: member

    concept ack, if this improves a benchmark. In the future, it would be good to mention a speedup in the pull request description, so that reviewers can see the motivation and goal for the change.

    Agree with this. If this actually shows up as significant in real workloads (IBD), concept ACK and we could potentially take this further by caching the size even earlier (as part of deserializing over the wire) to avoid the need for the calculation at all.

    But if there’s no noticeable speedup, I’m not a fan of muddying the api.

  28. l0rinc commented at 7:51 pm on December 18, 2024: contributor

    If this actually shows up as significant in real workloads

    Thanks for the reviews, I’m running full IBD benchmarks currently, we’ll see the results shortly (can’t just do a quick reindex-chainstate since the changes are undo and block writing related).

    I have two other changes in queue that will be based on this refactor (https://github.com/bitcoin/bitcoin/pull/31539 and #31144, which I’ve drafted until these are sorted). The 3 changes together seem to result in >5% speedup for IBD (every kind, regardless of dbcache or prunedness) - but those benchmarks are also still running.

  29. l0rinc renamed this:
    optimization: Cache block[undo] serialized size for consecutive calls
    optimization: cache block[undo] serialized size for consecutive calls
    on Dec 19, 2024
  30. l0rinc commented at 1:17 pm on December 19, 2024: contributor

    we could potentially take this further by caching the size even earlier

    Absolutely, but that’s a bigger change (would cache the serialized sizes in CBlock, guarding against any other mutation (which requires better encapsulation), storing GetSerializeSize for TX_NO_WITNESS() and TX_WITH_WITNESS() lazily, similarly to the existing checked* flags)… but that’s a big change, affecting a lot of consensus code, I’m still working on that and will push in a separate PR.

  31. bench: add SaveBlockToDiskBench 3f0514f905
  32. refactor,blocks: inline `UndoWriteToDisk`
    `UndoWriteToDisk` wasn't really extracting a meaningful subset of the `WriteUndoDataForBlock` functionality, it's tied closely to the only caller (needs the header size twice, recalculated undo serializes size, returns multiple branches, modifies parameter, needs documentation).
    
    The inlined code should only differ in these parts (modernization will be done in other commits):
    * renamed `_pos` to `pos` in `WriteUndoDataForBlock` to match the parameter name;
    * inlined `hashBlock` parameter usage into `hasher << block.pprev->GetBlockHash()`;
    * changed `return false` to `return FatalError`.
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    b91d55cf26
  33. refactor,blocks: inline `WriteBlockToDisk`
    `WriteBlockToDisk` wasn't really extracting a meaningful subset of the `SaveBlockToDisk` functionality, it's tied closely to the only caller (needs the header size twice, recalculated block serializes size, returns multiple branches, mutates parameter).
    
    The inlined code should only differ in these parts (modernization will be done in other commits):
    * renamed `blockPos` to `pos` in `SaveBlockToDisk` to match the parameter name;
    * changed `return false` to `return FlatFilePos()`.
    
    Also removed remaining references to `SaveBlockToDisk`.
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    df77c0d704
  34. refactor,blocks: cache block serialized size for consecutive calls
    For consistency `UNDO_DATA_DISK_OVERHEAD` was also extracted to avoid the constant's ambiguity.
    4b8226b5b7
  35. refactor,blocks: remove costly asserts
    When the behavior was changes in a previous commit (caching `GetSerializeSize` and avoiding `AutoFile.tell`), asserts were added to make sure the behavior was kept - to make sure reviewers and CI validates it.
    We can safely remove them now.
    
    Co-authored-by: Anthony Towns <aj@erisian.com.au>
    42d157d596
  36. scripted-diff: rename block and undo functions for consistency
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    
    -BEGIN VERIFY SCRIPT-
    sed -i \
        -e 's/\bSaveBlockToDisk\b/SaveBlock/g' \
        -e 's/\bWriteUndoDataForBlock\b/SaveBlockUndo/g' \
        $(git ls-files)
    -END VERIFY SCRIPT-
    92abdde7bb
  37. l0rinc force-pushed on Dec 19, 2024
  38. l0rinc renamed this:
    optimization: cache block[undo] serialized size for consecutive calls
    refactor: cache block[undo] serialized size for consecutive calls
    on Dec 19, 2024
  39. l0rinc commented at 10:21 pm on December 19, 2024: contributor

    @maflcko, @theuni, the reindex and IBD didn’t show any speedup, so I’ve added a SaveBlockToDiskBench microbenchmark which only revealed a tiny speedup in itself, so I’ve demoted this from an optimization to a refactoring. This PR is now meant as a cleanup for follow-ups such as #31539 and #31144 where the total IBD speedup (until 870k blocks) is 9% - and ReadBlockFromDiskTest benchmark is 2x faster.

     0hyperfine \
     1--runs 2 \
     2--export-json /mnt/my_storage/ibd-xor-buffered.json \
     3--parameter-list COMMIT b042c4f0538c6f9cdf8efbcef552796851e38a85,8be7a6f07b1887103b7613e3f913ae708adcb42f \
     4--prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
     5'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0'
     6
     7Benchmark 1: COMMIT=b042c4f0538c6f9cdf8efbcef552796851e38a85 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0
     8  Time (mean ± σ):     39790.800 s ± 26.465 s    [User: 51319.040 s, System: 3420.884 s]
     9  Range (min … max):   39772.086 s … 39809.514 s    2 runs
    10
    11Benchmark 2: COMMIT=8be7a6f07b1887103b7613e3f913ae708adcb42f ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0
    12  Time (mean ± σ):     36396.906 s ± 27.299 s    [User: 47531.210 s, System: 3384.100 s]
    13  Range (min … max):   36377.603 s … 36416.210 s    2 runs
    14
    15Summary
    16  COMMIT=8be7a6f07b1887103b7613e3f913ae708adcb42f ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0 ran
    17    1.09 ± 0.00 times faster than COMMIT=b042c4f0538c6f9cdf8efbcef552796851e38a85 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0
    

    I wanted to split this from #31144 to reduce risk, but if you think it’s easier to review them in a single PR, I don’t mind.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me