refactor: inline UndoWriteToDisk and WriteBlockToDisk to reduce serialization calls #31490

pull l0rinc wants to merge 7 commits into bitcoin:master from l0rinc:l0rinc/undo changing 20 files +179 −204
  1. l0rinc commented at 1:52 pm on December 13, 2024: contributor

    UndoWriteToDisk and WriteBlockToDisk were delegating a subset of their functionality to single-use methods that didn’t optimally capture a meaningful chunk of the algorithm, resulting in calculating things twice (serialized size, header size). This change inlines the awkward methods (asserting that all previous behavior was retained), and in separate commits makes the usages less confusing. Besides making the methods slightly more intuitive, the refactorings reduce duplicate calculations as well.

    The speed difference is insignificant for now (~0.5% for the new SaveBlockToDiskBench), but are a cleanup for follow-ups such as https://github.com/bitcoin/bitcoin/pull/31539

  2. DrahtBot commented at 1:52 pm on December 13, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31490.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK ryanofsky, hodlinator, TheCharlatan, andrewtoth
    Concept ACK BrandonOdiwuor, theuni

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31551 (optimization: bulk reads(27%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD by l0rinc)
    • #31539 (optimization: buffer reads(23%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD by l0rinc)
    • #31533 (fuzz: Add fuzz target for block index tree and related validation events by mzumsande)
    • #31144 (optimization: batch XOR operations 12% faster IBD by l0rinc)
    • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
    • #29307 (util: explicitly close all AutoFiles that have been written by vasild)
    • #26966 (index: initial sync speedup, parallelize process by furszy)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Refactoring on Dec 13, 2024
  4. TheCharlatan commented at 2:04 pm on December 13, 2024: contributor
    Concept ACK
  5. andrewtoth commented at 6:04 pm on December 13, 2024: contributor

    Concept ACK

    Should we not prefer the more modern and explicit uint32_t vs unsigned int?

  6. l0rinc force-pushed on Dec 15, 2024
  7. l0rinc commented at 2:30 pm on December 15, 2024: contributor

    prefer the more modern and explicit uint32_t

    I was hoping someone will recommend that - done: https://github.com/bitcoin/bitcoin/compare/e913e773926ecb72e327acf60c68655b5611cb7a..d69766164d177707ec7be19c4c188bd79ba3e4a3

    Edit: Added block size calculation deduplication as well to the PR

  8. l0rinc force-pushed on Dec 15, 2024
  9. l0rinc renamed this:
    refactor: Cache blockundo serialized size for consecutive calls
    refactor: Cache block[undo] serialized size for consecutive calls
    on Dec 15, 2024
  10. in src/node/blockstorage.cpp:682 in 5fd6b3f0fd outdated
    678@@ -679,8 +679,8 @@ bool BlockManager::UndoWriteToDisk(const CBlockUndo& blockundo, FlatFilePos& pos
    679     }
    680 
    681     // Write index header
    682-    unsigned int nSize = GetSerializeSize(blockundo);
    683-    fileout << GetParams().MessageStart() << nSize;
    684+    Assume(blockundo_size == GetSerializeSize(blockundo));
    


    ajtowns commented at 1:08 am on December 17, 2024:
    Calling Assume(x) always fully evaluates x, so there’s no saving here… The only thing Assume() does differently in release mode is that it doesn’t abort.

    l0rinc commented at 8:48 am on December 17, 2024:

    Damn, you’re right, thank you! Pushed a separate commit that removes the Assumes (this way the reviewers and CI can make sure the values are the same before/after) - plus found another debug log that has a similar problem, snuck it into this PR as well.

    mkdir demo && cmake -B build -DCMAKE_BUILD_TYPE=Debug && cmake –build build -j$(nproc) && build/src/bitcoind -datadir=demo

    0static bool test()
    1{
    2    std::cout << "Assume was called!" << std::endl;
    3    return false;
    4}
    5
    6static auto InitBlocksdirXorKey(const BlockManager::Options& opts)
    7{
    8    Assume(test());
    9...
    

    prints in both cases, fails only with Debug


    davidgumberg commented at 2:07 am on December 18, 2024:

    Calling Assume(x) always fully evaluates x, so there’s no saving here… The only thing Assume() does differently in release mode is that it doesn’t abort.

    I thought this wasn’t the case if x can be determined by the compiler to have no side effects?

    In any case #31178 shows that one shouldn’t treat assume(x) as a no-op that can be placed in a hot path for free.


    l0rinc commented at 1:40 pm on December 18, 2024:
  11. in src/blockencodings.cpp:172 in a50f9d7152 outdated
    166@@ -167,7 +167,9 @@ ReadStatus PartiallyDownloadedBlock::InitData(const CBlockHeaderAndShortTxIDs& c
    167             break;
    168     }
    169 
    170-    LogDebug(BCLog::CMPCTBLOCK, "Initialized PartiallyDownloadedBlock for block %s using a cmpctblock of size %lu\n", cmpctblock.header.GetHash().ToString(), GetSerializeSize(cmpctblock));
    171+    if (LogAcceptCategory(BCLog::CMPCTBLOCK, BCLog::Level::Debug)) {
    172+        LogDebug(BCLog::CMPCTBLOCK, "Initialized PartiallyDownloadedBlock for block %s using a cmpctblock of size %lu\n", cmpctblock.header.GetHash().ToString(), GetSerializeSize(cmpctblock));
    173+    }
    


    maflcko commented at 10:03 am on December 17, 2024:
    LogDebug is already a macro doing exactly that. No need to do the same check twice.

    l0rinc commented at 10:36 am on December 17, 2024:

    // Use a macro instead of a function for conditional logging to prevent // evaluating arguments when logging for the category is not enabled.

    Thanks, dropped the logger related commit. What’s the reason for the logger avoiding evaluation but Assume not doing it?

    What is the goal? Is this an optimization?

    As stated in the description a costly calculation, we can deduplicate it. I’ll clarify further in the description.

    If yes, how can it be observed?

    A quick peek can be seen in https://github.com/bitcoin-dev-tools/benchcoin/pull/66, but I’m currently running several full reindex-chainstates to see how much effect it has - previous measurements show 2-3%.

  12. maflcko changes_requested
  13. maflcko commented at 10:06 am on December 17, 2024: member

    Not sure about the changes.

    What is the goal? Is this an optimization? If yes, how can it be observed?

    Also, the changes seem to be based on a misunderstanding of the log and check macros.

  14. l0rinc force-pushed on Dec 17, 2024
  15. l0rinc renamed this:
    refactor: Cache block[undo] serialized size for consecutive calls
    optimization: Cache block[undo] serialized size for consecutive calls
    on Dec 17, 2024
  16. in src/node/blockstorage.h:178 in 0cb15046b0 outdated
    174@@ -175,7 +175,7 @@ class BlockManager
    175      * After this call, it will point to the beginning of the serialized CBlock data, after the separator fields
    176      * (BLOCK_SERIALIZATION_HEADER_SIZE)
    177      */
    178-    bool WriteBlockToDisk(const CBlock& block, FlatFilePos& pos) const;
    179+    bool WriteBlockToDisk(const CBlock& block, uint32_t block_size, FlatFilePos& pos) const;
    


    ryanofsky commented at 5:16 pm on December 17, 2024:

    In commit “refactor: Cache block serialized size for consecutive calls” (0cb15046b039d0e6efd2b57075bade46e1a10237)

    Would be good to document new WriteBlockToDisk block_size and UndoWriteToDisk blockundo_size arguments to clarify how they relate to the pos argument. Specifically it would be good to say that size arguments should be set to the size of data without the header, but the pos argument will increase by size of the data plus the header.


    l0rinc commented at 12:15 pm on December 18, 2024:
    I tried doing that, but noticed that all variants were awkward. Realized that the code smell is caused by extracting a part of the write methods that isn’t a meaningful subset of the functionality: it’s tied closely to the only caller (needs the header size twice, recalculated undo serializes size, returns multiple branches, modifies parameter, needs documentation). So I’ve recreated the PR completely, inlining these single-use methods, which also solves the serialization size calculations more naturally.
  17. ryanofsky approved
  18. ryanofsky commented at 5:32 pm on December 17, 2024: contributor

    Code review ACK 9a7e1ced7c3bb17193c7401181365c4075d45ec2. This seems like a safer, more efficient approach, but I still think the API is very confusing.

    No need to address here, but for a followup I think it would make more sense for WriteBlockToDisk instead of SaveBlockToDisk call FindNextBlockPos, and for UndoWriteToDisk instead of WriteUndoDataForBlock to be call FindUndoPos, so higher level SaveBlockToDisk / WriteUndoDataForBlock functions don’t contain any logic dealing with header fields.

    I also find names of these functions to be inconsistent, overlong and confusing. Would rename:

    • SaveBlockToDisk to SaveBlock
    • WriteBlockToDisk to WriteBlock
    • WriteUndoDataForBlock to SaveBlockUndo
    • UndoWriteToDisk to WriteBlockUndo
  19. DrahtBot requested review from andrewtoth on Dec 17, 2024
  20. DrahtBot requested review from TheCharlatan on Dec 17, 2024
  21. l0rinc force-pushed on Dec 18, 2024
  22. in src/util/check.h:96 in a730402ab7 outdated
    92@@ -93,6 +93,7 @@ constexpr T&& inline_assertion_check(LIFETIMEBOUND T&& val, [[maybe_unused]] con
    93  * - For fatal errors, use Assert().
    94  * - For non-fatal errors in interactive sessions (e.g. RPC or command line
    95  *   interfaces), CHECK_NONFATAL() might be more appropriate.
    96+ *   Note that the assumption is always fully evaluated - even in non-debug builds.
    


    maflcko commented at 12:30 pm on December 18, 2024:

    This is part of “identity function”. A function can not return the value that was passed in without evaluating it. (Having a different return type, depending on build type would also make it harder to use the function).

    If the docs are worthy to change, it would be good to either change none of them, or all of them in this file for other identity functions as well.


    l0rinc commented at 1:33 pm on December 18, 2024:
    Since Assume is a macro, it could have been written differently to avoid parameter evaluation (similarly to LogDebug), so I think it needs a comment. But I’m fine doing it in a different PR is this is remotely controversial.

    l0rinc commented at 1:40 pm on December 18, 2024:

    sipa commented at 1:42 pm on December 18, 2024:

    @l0rinc The point is that Assume(x) and x have the same behavior, apart from error detection. So if the expression x has side-effects, and the error condition does not occur, debug (e.g., fuzz) builds and release builds will behave identically.

    Of course, if the expression x is simple enough that the compiler knows it has no side effects, the optimizer will just remove it in non-debug builds, so for such expressions, you get the best of both worlds: no runtime penalty, but still a guarantee that release and debug builds behave identically (modulo compiler bugs).

    Lastly, Assume(x) is an expression, not a statement. You can use it like if (!Assume(x != nullptr)) { ... }. This isn’t possible to avoid evaluating.


    l0rinc commented at 1:52 pm on December 18, 2024:
    Thank you for clarifying. But given that we’ve bumped into this multiple times, do you think #31528 isn’t a valid change?
  23. maflcko commented at 12:33 pm on December 18, 2024: member
    concept ack, if this improves a benchmark. In the future, it would be good to mention a speedup in the pull request description, so that reviewers can see the motivation and goal for the change.
  24. l0rinc force-pushed on Dec 18, 2024
  25. l0rinc referenced this in commit fe49ecea66 on Dec 18, 2024
  26. BrandonOdiwuor commented at 1:54 pm on December 18, 2024: contributor
    Concept ACK
  27. theuni commented at 7:43 pm on December 18, 2024: member

    concept ack, if this improves a benchmark. In the future, it would be good to mention a speedup in the pull request description, so that reviewers can see the motivation and goal for the change.

    Agree with this. If this actually shows up as significant in real workloads (IBD), concept ACK and we could potentially take this further by caching the size even earlier (as part of deserializing over the wire) to avoid the need for the calculation at all.

    But if there’s no noticeable speedup, I’m not a fan of muddying the api.

  28. l0rinc commented at 7:51 pm on December 18, 2024: contributor

    If this actually shows up as significant in real workloads

    Thanks for the reviews, I’m running full IBD benchmarks currently, we’ll see the results shortly (can’t just do a quick reindex-chainstate since the changes are undo and block writing related).

    I have two other changes in queue that will be based on this refactor (https://github.com/bitcoin/bitcoin/pull/31539 and #31144, which I’ve drafted until these are sorted). The 3 changes together seem to result in >5% speedup for IBD (every kind, regardless of dbcache or prunedness) - but those benchmarks are also still running.

  29. l0rinc renamed this:
    optimization: Cache block[undo] serialized size for consecutive calls
    optimization: cache block[undo] serialized size for consecutive calls
    on Dec 19, 2024
  30. l0rinc commented at 1:17 pm on December 19, 2024: contributor

    we could potentially take this further by caching the size even earlier

    Absolutely, but that’s a bigger change (would cache the serialized sizes in CBlock, guarding against any other mutation (which requires better encapsulation), storing GetSerializeSize for TX_NO_WITNESS() and TX_WITH_WITNESS() lazily, similarly to the existing checked* flags)… but that’s a big change, affecting a lot of consensus code, I’m still working on that and will push in a separate PR.

  31. l0rinc force-pushed on Dec 19, 2024
  32. l0rinc renamed this:
    optimization: cache block[undo] serialized size for consecutive calls
    refactor: cache block[undo] serialized size for consecutive calls
    on Dec 19, 2024
  33. l0rinc commented at 10:21 pm on December 19, 2024: contributor

    @maflcko, @theuni, the reindex and IBD didn’t show any speedup, so I’ve added a SaveBlockToDiskBench microbenchmark which only revealed a tiny speedup in itself, so I’ve demoted this from an optimization to a refactoring. This PR is now meant as a cleanup for follow-ups such as #31539 and #31144 (comment) where the total IBD speedup (until 870k blocks) is 12% - and ReadBlockFromDiskBench benchmark is 2x faster and SaveBlockToDiskBench goes from 190.67 op/s to 1,529.59 op/s (8x).

    I wanted to split this from #31144 to reduce risk, but if you think it’s easier to review them in a single PR, I don’t mind.

  34. l0rinc force-pushed on Jan 2, 2025
  35. in src/index/coinstatsindex.cpp:126 in 0dd386994b outdated
    122@@ -123,7 +123,7 @@ bool CoinStatsIndex::CustomAppend(const interfaces::BlockInfo& block)
    123         // pindex variable gives indexing code access to node internals. It
    124         // will be removed in upcoming commit
    125         const CBlockIndex* pindex = WITH_LOCK(cs_main, return m_chainstate->m_blockman.LookupBlockIndex(block.hash));
    126-        if (!m_chainstate->m_blockman.UndoReadFromDisk(block_undo, *pindex)) {
    127+        if (!m_chainstate->m_blockman.UndoRead(block_undo, *pindex)) {
    


    TheCharlatan commented at 8:36 am on January 4, 2025:

    In commit 0dd386994bafd67081c4521673018ddd22e2d63c

    Can we call this ReadBlockUndo instead?


    l0rinc commented at 12:07 pm on January 4, 2025:
    That was inconsistent, indeed - fixed!
  36. l0rinc force-pushed on Jan 4, 2025
  37. in src/bench/readwriteblock.cpp:22 in 2ff0ea366c outdated
    17+#include <cassert>
    18+#include <cstdint>
    19+#include <memory>
    20+#include <vector>
    21+
    22+CBlock CreateTestBlock()
    


    hodlinator commented at 1:13 pm on January 7, 2025:

    nit:

    0static CBlock CreateTestBlock()
    

    l0rinc commented at 1:02 pm on January 9, 2025:
    Done
  38. in src/node/blockstorage.cpp:956 in 2ff0ea366c outdated
    976         }
    977-        if (!UndoWriteToDisk(blockundo, _pos, block.pprev->GetBlockHash())) {
    978+        // Open history file to append
    979+        AutoFile fileout{OpenUndoFile(pos)};
    980+        if (fileout.IsNull()) {
    981+            LogError("%s: OpenUndoFile failed\n", __func__);
    


    hodlinator commented at 1:28 pm on January 7, 2025:

    nit: Newer code shouldn’t need to explicitly log __func__ as we have -logsourcelocations.

    0            LogError("OpenUndoFile failed");
    

    l0rinc commented at 1:02 pm on January 9, 2025:
    Added it to the assert removal commits, thanks!
  39. in src/bench/readwriteblock.cpp:68 in 2ff0ea366c outdated
    63+    });
    64+}
    65+
    66+BENCHMARK(SaveBlockBench, benchmark::PriorityLevel::HIGH);
    67+BENCHMARK(ReadBlockBench, benchmark::PriorityLevel::HIGH);
    68+BENCHMARK(ReadRawBlockBench, benchmark::PriorityLevel::HIGH);
    


    hodlinator commented at 1:48 pm on January 7, 2025:
    nit: Could add missing newline before EOF.

    l0rinc commented at 1:03 pm on January 9, 2025:
    Done
  40. in src/node/blockstorage.h:327 in 2ff0ea366c outdated
    323@@ -330,7 +324,7 @@ class BlockManager
    324     /** Get block file info entry for one block file */
    325     CBlockFileInfo* GetBlockFileInfo(size_t n);
    326 
    327-    bool WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValidationState& state, CBlockIndex& block)
    328+    bool SaveBlockUndo(const CBlockUndo& blockundo, BlockValidationState& state, CBlockIndex& block)
    


    hodlinator commented at 2:48 pm on January 7, 2025:

    nit:

    Terminology

    To me read/write and save/load are distinct pairs, so ReadBlock/SaveBlock seems off. Would have preferred WriteBlockUndo and WriteBlock.

    I liked the explicitness of ToDisk/FromDisk for cases where that is actually what it’s doing, makes code more self-documenting, even if slightly verbose. I see the change was prompted by #31490#pullrequestreview-2509562827.

    Could the scripted diff commit message at least include a one-line rationale?


    l0rinc commented at 3:01 pm on January 7, 2025:
    Thanks @hodlinator! The new names were requested by @ryanofsky and @TheCharlatan - I don’t have strong feelings about them either way. What do other reviewers think?

    ryanofsky commented at 3:54 pm on January 8, 2025:

    In commit “scripted-diff: rename block and undo functions for consistency” (2ff0ea366c61b2fcb80ad1f711480915c7a7aa2e)

    Agree with hodlinator that Read/Write or Load/Save would now be better than Read/Save. The reason I suggested Save instead of Write in #31490#pullrequestreview-250956282 is that previously we had a high level function for writing called Save and lower level functions called Write, but now after 9b59f8b624cc641bd7216ececffa3111041fd760, the lower level functions are dropped, so there’s no need to have a naming distinction.

    I liked the explicitness of ToDisk/FromDisk for cases where that is actually what it’s doing, makes code more self-documenting, even if slightly verbose.

    One reason I think it is good to drop ToDisk/FromDisk suffixes is for consistency. Lots of block storage functions (particularly WriteUndoDataForBlock here) access the disk but don’t have “disk” in the name so having “disk” some places and not others without having a clear rule for where it is used seems likely to lead to misinterpretation and false inferences.

    But the main reason I think it is good to drop “disk” from the names is that the main functions here (the high-level functions called by application code, not the low-level functions called by benchmarks) do not accept any parameters referencing files or disk locations:

    0bool ReadBlock(CBlock& block, const CBlockIndex& index) const;
    1FlatFilePos SaveBlock(const CBlock& block, int nHeight);
    2bool ReadBlockUndo(CBlockUndo& blockundo, const CBlockIndex& index) const;
    3bool SaveBlockUndo(const CBlockUndo& blockundo, BlockValidationState& state, CBlockIndex& block);
    

    These functions don’t need to use disk storage, even if that is what they are doing now. They could just as easily be using any type of storage (cloud, mobile, browser, etc).

    However, I do think it would be good to call these Read/Write or Load/Save instead of Read/Save. And if we want to add ToDisk/FromDisk suffixes to lower level functions that seems ok as long as it is done consistently.


    l0rinc commented at 1:03 pm on January 9, 2025:
    I’ve updated the scripted diff to use consistent Read/Write terminology - dropping the [To/From]Disk part which also just seems like noise to me. Thanks for the proposals.

    hodlinator commented at 1:15 pm on January 10, 2025:
    Thanks for resolving the terminology @l0rinc and for the back-story @ryanofsky!
  41. hodlinator commented at 2:58 pm on January 7, 2025: contributor

    Concept ACK cba4378072a6bd43f3e37a7d5e662d3041681566

    Removes repetitive calls to GetSerializeSize(). And avoids calling .tell().

    Nothing blocking.

    PR summary

    Repetitive mention of #31539.

    Commit history

    Tested splitting up 96c9b578a6b85fac9977cdf25bf2e52d04645bd4 into one which refactors existing code (225d294070fc403e03028b9fab714bd9dc4f3307), and a second commit (61a7254e1b3d4b5159ffcab6e1096da32dc1ace0) which renames the file and adds SaveBlockBench(). Keeping the changes small in the rename-commit means that Git clients correctly show it as a rename instead of add+delete.

    Current PR

    0₿ git -c log.showSignature=false log --oneline --follow readwriteblock.cpp
    12ff0ea366c (HEAD -> pr/31490) scripted-diff: rename block and undo functions for consistency
    296c9b578a6 bench: add SaveBlockBench
    

    With suggested splitting of commit 96c9b578a6b85fac9977cdf25bf2e52d04645bd4

    0₿ git -c log.showSignature=false log --oneline --follow readwriteblock.cpp
    19eba6eae8b (HEAD -> pr/31490_suggestions, myfork/pr/31490_suggestions) scripted-diff: rename block and undo functions for consistency
    261a7254e1b bench: add SaveBlockBench
    3225d294070 refactor,bench: Stop calling benchmarks test, etc
    4faecca9a85 test: Use span for raw data
    5fab0e834b8 bench: [refactor] iwyu
    6d9e477c4dc validation, blockstorage: Separate code paths for reindex and saving new blocks
    71c4b9cbe90 bench: add readblock benchmark
    
  42. in src/node/blockstorage.cpp:949 in 6e22e55920 outdated
    944@@ -945,7 +945,9 @@ bool BlockManager::WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValid
    945     // Write undo information to disk
    946     if (block.GetUndoPos().IsNull()) {
    947         FlatFilePos pos;
    948-        if (!FindUndoPos(state, block.nFile, pos, ::GetSerializeSize(blockundo) + 40)) {
    949+        const uint32_t blockundo_size{static_cast<uint32_t>(GetSerializeSize(blockundo))};
    950+        assert(UNDO_DATA_DISK_OVERHEAD == 40);
    


    ryanofsky commented at 2:39 pm on January 8, 2025:

    In commit “refactor,blocks: cache block serialized size for consecutive calls” (6e22e55920da33dc8970793f9e6a854eb3faa3c4)

    Seems like this could be a static assert

    EDIT: but I guess it is removed in the upcoming commit, so doesn’t matter too much


    l0rinc commented at 1:03 pm on January 9, 2025:
    Done, also added TODOs to the commit (+ message) to obviate that they’re temporary
  43. in src/node/blockstorage.cpp:948 in 6e22e55920 outdated
    944@@ -945,7 +945,9 @@ bool BlockManager::WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValid
    945     // Write undo information to disk
    946     if (block.GetUndoPos().IsNull()) {
    947         FlatFilePos pos;
    948-        if (!FindUndoPos(state, block.nFile, pos, ::GetSerializeSize(blockundo) + 40)) {
    949+        const uint32_t blockundo_size{static_cast<uint32_t>(GetSerializeSize(blockundo))};
    


    ryanofsky commented at 2:47 pm on January 8, 2025:

    In commit “refactor,blocks: cache block serialized size for consecutive calls” (https://github.com/bitcoin/bitcoin/commit/6e22e55920da33dc8970793f9e6a854eb3faa3c4)

    IMO in new code it would be better to avoid using uint32_t and unsigned int for sizes and just use size_t for consistency and simplicity. static_casts like this could then be avoided except when calling older functions, and could eventually be removed when older functions are updated.

    However, if we do want to keep using narrower size types for efficiency or backwards compatibility, I think the correct type to use would be unsigned int not uint32_t, because unsigned int is the type used by existing code, and the type that should be more efficient for whatever platform is being used. unsigned int and size_t are the two types that are used currently so casting to uint32_t introduces a third type and adds unnecessary complexity.


    l0rinc commented at 1:03 pm on January 9, 2025:
    I usually went in the other direction to make sure we know the sizes exactly. Since size_t doesn’t really serialize (e.g. in fileout << GetParams().MessageStart() << blockundo_size -> call to 'Serialize' is ambiguous) - I made it into a unsigned int (I don’t particularly like it, but it’s not that important).

    ryanofsky commented at 3:48 pm on January 9, 2025:

    re: #31490 (review)

    I made it into a unsigned int (I don’t particularly like it, but it’s not that important).

    Thanks, your change makes sense and to be clear I don’t like code writing unsigned int everywhere either, even though I do think unsigned int is a reasonable type for this code to be using. I think more ideally src/flatfile.h would define something like:

    0using FilePos = unsigned int;
    

    and then code could reference the FilePos type instead of unsigned int.

  44. in src/node/blockstorage.cpp:1095 in 6e22e55920 outdated
    1091@@ -1092,33 +1092,24 @@ bool BlockManager::ReadRawBlockFromDisk(std::vector<uint8_t>& block, const FlatF
    1092 
    1093 FlatFilePos BlockManager::SaveBlockToDisk(const CBlock& block, int nHeight)
    1094 {
    1095-    unsigned int nBlockSize = ::GetSerializeSize(TX_WITH_WITNESS(block));
    1096-    // Account for the 4 magic message start bytes + the 4 length bytes (8 bytes total,
    1097-    // defined as BLOCK_SERIALIZATION_HEADER_SIZE)
    1098-    nBlockSize += static_cast<unsigned int>(BLOCK_SERIALIZATION_HEADER_SIZE);
    1099-    FlatFilePos pos{FindNextBlockPos(nBlockSize, nHeight, block.GetBlockTime())};
    1100+    const uint32_t block_size{static_cast<uint32_t>(GetSerializeSize(TX_WITH_WITNESS(block)))};
    


    ryanofsky commented at 3:01 pm on January 8, 2025:

    In commit “refactor,blocks: cache block serialized size for consecutive calls” (https://github.com/bitcoin/bitcoin/commit/6e22e55920da33dc8970793f9e6a854eb3faa3c4)

    Same comments here about inappropriate use of uint32_t. IMO, we should prefer size_t if trying to modernize code or unsigned int if trying to be backwards compatible. If there is really a reason for introducing a third size type, it should be stated and clarified somewhere.


    l0rinc commented at 1:03 pm on January 9, 2025:
    Done, went with unsigned int

    maflcko commented at 1:07 pm on January 9, 2025:

    My guess was that for serialization fixed-size ints are required and this was the motivation for the change in the first place, so I am surprised to see it reverted.

    Not that it matter in this codebase in practise, given that int is assumed to be 32-bit, but I wanted to mention it.


    l0rinc commented at 1:19 pm on January 9, 2025:

    for serialization fixed-size ints are required

    Yes, it’s the same argument as in #31490 (review)

  45. in src/node/blockstorage.cpp:1105 in 6e22e55920 outdated
    1102     if (pos.IsNull()) {
    1103         LogError("%s: FindNextBlockPos failed\n", __func__);
    1104         return FlatFilePos();
    1105     }
    1106-
    1107-    // Open history file to append
    


    ryanofsky commented at 3:08 pm on January 8, 2025:

    In commit “refactor,blocks: cache block serialized size for consecutive calls” (https://github.com/bitcoin/bitcoin/commit/6e22e55920da33dc8970793f9e6a854eb3faa3c4)

    I don’t see a reason to delete the comments from this function. I think they are helpful and it is inconsistent to get rid of these because this commit is not deleting the corresponding comments in the WriteUndoDataForBlock function.

    I particularly think the “Write index header” comments in both functions are helpful because they define the “header” term that is used elsewhere and make it clear it refers to the (magic constant + size) fields that are output before serialized CBlock and CBlockUndo objects. I think the other deleted comments also helped organize the code and make it legible.


    l0rinc commented at 1:03 pm on January 9, 2025:
    I’ve reverted the comments that were used to group functionality (write header … write [undo[block), but I’ve deleted the ones that were just repeating code. Especially after extracting the constant to names values, they don’t add any value.
  46. ryanofsky approved
  47. ryanofsky commented at 4:18 pm on January 8, 2025: contributor

    Code review ACK 2ff0ea366c61b2fcb80ad1f711480915c7a7aa2e. New version of this PR is different and I think easier to follow than the previous version. All the changes now seem like obvious code cleanups.

    Might consider renaming PR from “cache block[undo] serialized size” because I’m not sure it’s even accurate to call not pointlessly recomputing the same values twice in a row “caching”.

  48. DrahtBot requested review from TheCharlatan on Jan 8, 2025
  49. DrahtBot requested review from hodlinator on Jan 8, 2025
  50. DrahtBot requested review from BrandonOdiwuor on Jan 8, 2025
  51. DrahtBot requested review from theuni on Jan 8, 2025
  52. l0rinc renamed this:
    refactor: cache block[undo] serialized size for consecutive calls
    refactor: inline `UndoWriteToDisk` and `WriteBlockToDisk` to reduce serialization calls
    on Jan 9, 2025
  53. refactor,bench: rename bench/readblock.cpp to bench/readwriteblock.cpp
    Done in separate commit to simplify review.
    Also renames benchmarks, since they're not strictly tests.
    
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    34f9a0157a
  54. bench: add SaveBlockBench 86b85bb11f
  55. refactor,blocks: inline `UndoWriteToDisk`
    `UndoWriteToDisk` wasn't really extracting a meaningful subset of the `WriteUndoDataForBlock` functionality, it's tied closely to the only caller (needs the header size twice, recalculated undo serializes size, returns multiple branches, modifies parameter, needs documentation).
    
    The inlined code should only differ in these parts (modernization will be done in other commits):
    * renamed `_pos` to `pos` in `WriteUndoDataForBlock` to match the parameter name;
    * inlined `hashBlock` parameter usage into `hasher << block.pprev->GetBlockHash()`;
    * changed `return false` to `return FatalError`;
    * capitalize comment.
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    42bc491465
  56. refactor,blocks: inline `WriteBlockToDisk`
    Similarly, `WriteBlockToDisk` wasn't really extracting a meaningful subset of the `SaveBlockToDisk` functionality, it's tied closely to the only caller (needs the header size twice, recalculated block serializes size, returns multiple branches, mutates parameter).
    
    The inlined code should only differ in these parts (modernization will be done in other commits):
    * renamed `blockPos` to `pos` in `SaveBlockToDisk` to match the parameter name;
    * changed `return false` to `return FlatFilePos()`.
    
    Also removed remaining references to `SaveBlockToDisk`.
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    dfb2f9d004
  57. l0rinc force-pushed on Jan 9, 2025
  58. l0rinc commented at 1:05 pm on January 9, 2025: contributor

    Thanks a lot for the reviews, I’ve pushed these changes:

    • changed title and commit messages to reflect the new purpose of refactoring and code cleanup;
    • removed repetitive mention of follow-up;
    • split benchmark renaming into a separate diff to ease review;
    • replaced temporary assert to static_assert (with todos to help the reviewers);
    • added assertion to the scripted diff to make sure the new names don’t exist yet;
    • changed SaveBlock[Undo] replacements to WriteBlock[Undo] in the scripted diff;
    • unified type sizes in new code to unsigned int;
    • reverted comments that were used to group code parts.
  59. refactor,blocks: deduplicate block's serialized size calculations
    For consistency `UNDO_DATA_DISK_OVERHEAD` was also extracted to avoid the constant's ambiguity.
    Asserts were added to help with the review - they are removed in the next commit.
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    fa39f27a0f
  60. refactor,blocks: remove costly asserts and modernize affected logs
    When the behavior was changes in a previous commit (caching `GetSerializeSize` and avoiding `AutoFile.tell`), (static)asserts were added to make sure the behavior was kept - to make sure reviewers and CI validates it.
    We can safely remove them now.
    
    Logs were also slightly modernized since they were trivial to do.
    
    Co-authored-by: Anthony Towns <aj@erisian.com.au>
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    baaa3b2846
  61. scripted-diff: rename block and undo functions for consistency
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    
    -BEGIN VERIFY SCRIPT-
    grep -r -wE 'WriteBlock|ReadRawBlock|ReadBlock|WriteBlockUndo|ReadBlockUndo' $(git ls-files src/ ':!src/leveldb') && \
        echo "Error: One or more target names already exist!" && exit 1
    sed -i \
        -e 's/\bSaveBlockToDisk/WriteBlock/g' \
        -e 's/\bReadRawBlockFromDisk/ReadRawBlock/g' \
        -e 's/\bReadBlockFromDisk/ReadBlock/g' \
        -e 's/\bWriteUndoDataForBlock/WriteBlockUndo/g' \
        -e 's/\bUndoReadFromDisk/ReadBlockUndo/g' \
        $(git ls-files src/ ':!src/leveldb')
    -END VERIFY SCRIPT-
    223081ece6
  62. l0rinc force-pushed on Jan 9, 2025
  63. ryanofsky approved
  64. ryanofsky commented at 3:58 pm on January 9, 2025: contributor
    Code review ACK 223081ece651dc616ff63d9ac447eedc5c2a28fa. Since last review, “Save” was renamed to “Write”, uint32_t references were dropped, some log statements and comments were improved as suggested, and a lot of tweaks made to commits and commit messages which should make this easier to review.
  65. in src/bench/readwriteblock.cpp:30 in 223081ece6
    25+    CBlock block;
    26+    stream >> TX_WITH_WITNESS(block);
    27+    return block;
    28+}
    29+
    30+static void SaveBlockBench(benchmark::Bench& bench)
    


    ryanofsky commented at 4:01 pm on January 9, 2025:

    In commit “bench: add SaveBlockBench” (86b85bb11f8999eb59e34bd026b0791dc866f2eb)

    Could rename SaveBlock to WriteBlock here too


    l0rinc commented at 4:08 pm on January 9, 2025:
    Right, if I need to edit, I’ll rename this as well
  66. l0rinc requested review from maflcko on Jan 9, 2025
  67. in src/node/blockstorage.cpp:984 in 223081ece6
    1008             // fact it is. Note though, that a failed flush might leave the data
    1009             // file untrimmed.
    1010-            if (!FlushUndoFile(_pos.nFile, true)) {
    1011-                LogPrintLevel(BCLog::BLOCKSTORAGE, BCLog::Level::Warning, "Failed to flush undo file %05i\n", _pos.nFile);
    1012+            if (!FlushUndoFile(pos.nFile, true)) {
    1013+                LogPrintLevel(BCLog::BLOCKSTORAGE, BCLog::Level::Warning, "Failed to flush undo file %05i\n", pos.nFile);
    


    hodlinator commented at 11:09 am on January 10, 2025:

    nit: Could at least remove newline, and possibly switch to more modern way of non-filterable non-categorized warning:

    0                LogWarning("Failed to flush undo file %05i", pos.nFile);
    

    l0rinc commented at 2:06 pm on January 10, 2025:
    Wouldn’t LogWarning replace BCLog::BLOCKSTORAGE with BCLog::LogFlags::ALL?

    hodlinator commented at 2:32 pm on January 10, 2025:
    Correct, by “non-categorized” I was referring to BCLog::LogFlags::ALL.
  68. in src/node/blockstorage.cpp:956 in 223081ece6
    977         }
    978-        if (!UndoWriteToDisk(blockundo, _pos, block.pprev->GetBlockHash())) {
    979+        // Open history file to append
    980+        AutoFile fileout{OpenUndoFile(pos)};
    981+        if (fileout.IsNull()) {
    982+            LogError("OpenUndoFile failed");
    


    hodlinator commented at 11:12 am on January 10, 2025:
    nit: Could modernize logging when moving the log statement to avoid touching this line twice, resulting in git blame churn. Here and for “OpenBlockFile failed”. But I guess you prefer the current commit-style.

    l0rinc commented at 2:07 pm on January 10, 2025:
    Yeah, I prefer changing it minimally when moving, to make sure I don’t hide any changes there.
  69. hodlinator approved
  70. hodlinator commented at 1:55 pm on January 10, 2025: contributor

    ACK 223081ece651dc616ff63d9ac447eedc5c2a28fa

    Thanks for reorganizing the first commits! Confirmed that git -c log.showSignature=false log --oneline --follow src/bench/readwriteblock.cpp shows 7 commits.

    Cool with the sanity-check in the scripted diff, not sure I’ve seen that before.

    Commit message dfb2f9d004860c95fc6f0d4a016a9c038d53a475

    Might add some more context: “Similarly +to UndoWriteToDisk in parent commit+, WriteBlockToDisk wasn’t really extracting”

    Commit message 42bc4914658d9834a653bd1763aa8f0d54355480

    (What’s the inspiration for all the semicolons? Doesn’t appear to be one of the uses described here: https://grammarist.com/punctuation/how-to-use-semicolons-in-a-list/)

    Benchmarked with new bench target

    0₿ build/src/bench/bench_bitcoin -filter=SaveBlockBench -min-time=10000
    

    At second commit (86b85bb11f8999eb59e34bd026b0791dc866f2eb)

    0|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    2|        3,172,375.74 |              315.22 |    0.6% |   20,053,788.28 |    9,071,225.51 |  2.211 |   3,133,287.73 |    0.5% |     11.12 | `SaveBlockBench`
    

    (Median result of 3 runs).

    At final commit (223081ece651dc616ff63d9ac447eedc5c2a28fa)

    0|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    2|        3,159,241.92 |              316.53 |    1.4% |   19,805,232.51 |    8,963,495.75 |  2.210 |   3,080,238.64 |    0.4% |     11.08 | `SaveBlockBench`
    

    (Median result of 3 runs).

    Conclusion

    Unfortunately this confirms that serializing blocks is insanely fast, making this PR more of a refactor than an optimization.

  71. TheCharlatan approved
  72. TheCharlatan commented at 1:54 pm on January 14, 2025: contributor

    ACK 223081ece651dc616ff63d9ac447eedc5c2a28fa

    I don’t see a change in the runtime of the benchmarks, but these are some good refactors that I have wanted for some time.

  73. in src/node/blockstorage.cpp:948 in 223081ece6
    968     if (block.GetUndoPos().IsNull()) {
    969-        FlatFilePos _pos;
    970-        if (!FindUndoPos(state, block.nFile, _pos, ::GetSerializeSize(blockundo) + 40)) {
    971-            LogError("%s: FindUndoPos failed\n", __func__);
    972+        FlatFilePos pos;
    973+        const unsigned int blockundo_size{static_cast<unsigned int>(GetSerializeSize(blockundo))};
    


    andrewtoth commented at 5:32 pm on January 19, 2025:

    nit: Here and in WriteBlock

    0        const uint32_t blockundo_size{static_cast<uint32_t>(GetSerializeSize(blockundo))};
    

    l0rinc commented at 6:03 pm on January 19, 2025:
    That’s what I was going for originally, but this was explicitly requested by @ryanofsky in #31490 (review)
  74. andrewtoth approved
  75. andrewtoth commented at 6:01 pm on January 19, 2025: contributor
    ACK 223081ece651dc616ff63d9ac447eedc5c2a28fa

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me