[IBD] batch block reads/writes during AutoFile serialization #31551

pull l0rinc wants to merge 7 commits into bitcoin:master from l0rinc:l0rinc/bulk-block-read-write changing 7 files +297 −63
  1. l0rinc commented at 2:34 pm on December 21, 2024: contributor

    This change is part of [IBD] - Tracking PR for speeding up Initial Block Download

    Summary

    We can serialize the blocks and undo data to any Stream which implements the appropriate read/write methods. AutoFile is one of these, writing the results “directly” to disk (through the OS file cache). Batching these operations in memory first and reading/writing the whole buffer to disk at once is measurably faster (likely because of fewer native fread calls or less locking, as observed by @martinus in a similar change).

    Unlocking new optimization opportunities

    Buffered writes will also enable batched obfuscation calculations (implemented in #31144): currently we have to copy the write input’s std::span to do the obfuscation on it, whereas batching lets us apply the obfuscation to the internal buffer directly.

    Measurements (micro benchmarks, full IBDs and reindexes)

    Microbenchmarks for [Read|Write]BlockBench show a ~30%/168% speedup with macOS/Clang, and ~19%/24% with Linux/GCC (the follow-up XOR batching improves these further):

    Before:

    ns/op op/s err% total benchmark
    2,271,441.67 440.25 0.1% 11.00 ReadBlockBench
    5,149,564.31 194.19 0.8% 10.95 WriteBlockBench

    After:

    ns/op op/s err% total benchmark
    1,738,683.04 575.15 0.2% 11.04 ReadBlockBench
    3,052,658.88 327.58 1.0% 10.91 WriteBlockBench

    Before:

    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    6,895,987.11 145.01 0.0% 71,055,269.86 23,977,374.37 2.963 5,074,828.78 0.4% 22.00 ReadBlockBench
    5,152,973.58 194.06 2.2% 19,350,886.41 8,784,539.75 2.203 3,079,335.21 0.4% 23.18 WriteBlockBench

    After:

    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    5,771,882.71 173.25 0.0% 65,741,889.82 20,453,232.33 3.214 3,971,321.75 0.3% 22.01 ReadBlockBench
    4,145,681.13 241.21 4.0% 15,337,596.85 5,732,186.47 2.676 2,239,662.64 0.1% 23.94 WriteBlockBench

    2 full IBD runs against master (compiled with GCC, where the gains seem more modest) for 888888 blocks (seeded from real nodes) indicate a ~7% total speedup.

    COMMITS="8af40aaf283cc81fe2b3cc125d21f090c562460e 5c21f6c26fd6b6eab33e208cd565577735fe9aa7 97d30e2534b0a8c388a4a2059277f64d3b2243af"; \
    STOP_HEIGHT=888888; DBCACHE=1000; \
    C_COMPILER=gcc; CXX_COMPILER=g++; \
    BASE_DIR="/mnt/my_storage"; DATA_DIR="${BASE_DIR}/BitcoinData"; LOG_DIR="${BASE_DIR}/logs"; \
    git fetch --all -q && (for c in $COMMITS; do git fetch origin $c -q && git log -1 --oneline $c || exit 1; done) && \
    hyperfine \
      --export-json "${BASE_DIR}/ibd-${COMMITS// /-}-${STOP_HEIGHT}-${DBCACHE}-${C_COMPILER}.json" \
      --runs 2 \
      --parameter-list COMMIT ${COMMITS// /,} \
      --prepare "killall bitcoind; rm -rf ${DATA_DIR}/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
        cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF -DCMAKE_C_COMPILER=$C_COMPILER -DCMAKE_CXX_COMPILER=$CXX_COMPILER && \
        cmake --build build -j$(nproc) --target bitcoind && \
        ./build/bin/bitcoind -datadir=${DATA_DIR} -stopatheight=1 -printtoconsole=0" \
      --cleanup "cp ${DATA_DIR}/debug.log ${LOG_DIR}/debug-{COMMIT}-$(date +%s).log" \
      "COMPILER=$C_COMPILER COMMIT={COMMIT} ./build/bin/bitcoind -datadir=${DATA_DIR} -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -blocksonly -printtoconsole=0"

    8af40aaf28 refactor: replace raw values in ReadRawBlock with HEADER_BYTE_SIZE
    5c21f6c26f optimization: Bulk serialization reads in `UndoRead`, `ReadBlock`, and `ReadRawBlock`
    97d30e2534 optimization: Bulk serialization writes in `WriteBlockUndo` and `WriteBlock`

    Benchmark 1: COMPILER=gcc COMMIT=8af40aaf283cc81fe2b3cc125d21f090c562460e ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0
      Time (mean ± σ):     30899.137 s ± 501.097 s    [User: 43094.744 s, System: 3295.980 s]
      Range (min … max):   30544.808 s … 31253.467 s    2 runs

    Benchmark 2: COMPILER=gcc COMMIT=5c21f6c26fd6b6eab33e208cd565577735fe9aa7 ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0
      Time (mean ± σ):     29956.405 s ± 111.231 s    [User: 42278.888 s, System: 3239.997 s]
      Range (min … max):   29877.754 s … 30035.057 s    2 runs

    Benchmark 3: COMPILER=gcc COMMIT=97d30e2534b0a8c388a4a2059277f64d3b2243af ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0
      Time (mean ± σ):     28826.445 s ± 147.749 s    [User: 41450.577 s, System: 2955.840 s]
      Range (min … max):   28721.970 s … 28930.919 s    2 runs

    Summary
      COMPILER=gcc COMMIT=97d30e2534b0a8c388a4a2059277f64d3b2243af ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0 ran
        1.04 ± 0.01 times faster than COMPILER=gcc COMMIT=5c21f6c26fd6b6eab33e208cd565577735fe9aa7 ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0
        1.07 ± 0.02 times faster than COMPILER=gcc COMMIT=8af40aaf283cc81fe2b3cc125d21f090c562460e ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -dbcache=1000 -blocksonly -printtoconsole=0
    
  2. DrahtBot commented at 2:34 pm on December 21, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31551.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31144 ([IBD] multi-byte block obfuscation by l0rinc)
    • #29700 (kernel, refactor: return error status on all fatal errors by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. l0rinc force-pushed on Dec 21, 2024
  4. l0rinc renamed this:
    optimization: bulk reads(27%)/writes(290%) in [undo]block [de]serialization
    optimization: bulk reads(27%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD
    on Dec 21, 2024
  5. l0rinc force-pushed on Dec 21, 2024
  6. DrahtBot added the label Needs rebase on Jan 22, 2025
  7. l0rinc force-pushed on Jan 22, 2025
  8. DrahtBot removed the label Needs rebase on Jan 22, 2025
  9. l0rinc force-pushed on Jan 22, 2025
  10. l0rinc marked this as ready for review on Jan 22, 2025
  11. l0rinc force-pushed on Jan 22, 2025
  12. DrahtBot commented at 11:14 pm on January 22, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36027778098

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  13. DrahtBot added the label CI failed on Jan 22, 2025
  14. l0rinc force-pushed on Jan 22, 2025
  15. DrahtBot removed the label CI failed on Jan 23, 2025
  16. l0rinc marked this as a draft on Jan 23, 2025
  17. l0rinc marked this as ready for review on Jan 23, 2025
  18. in src/node/blockstorage.cpp:1029 in 0e9fb2ed33 outdated
    1026+
    1027     // Open history file to read
    1028     AutoFile filein{OpenBlockFile(pos, true)};
    1029     if (filein.IsNull()) {
    1030-        LogError("%s: OpenBlockFile failed for %s\n", __func__, pos.ToString());
    1031+        LogError("OpenBlockFile failed for %s", pos.ToString());
    


    maflcko commented at 1:58 pm on January 24, 2025:

    I am not a fan of mixing logging changes into a commit that does something completely different (claimed optimization).

    Also, not everyone has source-location logging enabled, so the log strings should still be unique.

    Looks like the prior pull did something similar and now there are two LogError("OpenUndoFile failed"); in two different contexts. So when removing the __func__, it would be good to add back any context that was removed (i.e. whether it was opened for reading or writing).


    l0rinc commented at 2:30 pm on January 24, 2025:
    Thank you, excellent observation. I have separated the comment updates to a final commit and fixed the previous PR’s comments as well. Done
  19. l0rinc force-pushed on Jan 24, 2025
  20. in src/node/blockstorage.cpp:690 in f8b1b218b4 outdated
    688-    HashVerifier verifier{filein}; // Use HashVerifier as reserializing may lose data, c.f. commit d342424301013ec47dc146a4beb49d5c9319d80a
    689     try {
    690+        // Read block
    691+        filein >> undo_size;
    692+        if (undo_size > MAX_SIZE) {
    693+            LogError("%s: Refusing to read undo data of size: %d", __func__, undo_size);
    


    maflcko commented at 3:42 pm on January 24, 2025:
    nit in f8b1b218b4f8533128b337168b6769e2e4c30c02: I think for new logs, __func__ is not needed. Refusing to read undo data of size looks unique as well.

    l0rinc commented at 4:26 pm on January 24, 2025:
    Done
  21. in src/node/blockstorage.cpp:682 in f8b1b218b4 outdated
    712         LogError("%s: Deserialize or I/O error - %s at %s\n", __func__, e.what(), pos.ToString());
    713         return false;
    714     }
    715 
    716-    // Verify checksum
    717-    if (hashChecksum != verifier.GetHash()) {
    


    maflcko commented at 3:46 pm on January 24, 2025:
    nit in https://github.com/bitcoin/bitcoin/commit/f8b1b218b4f8533128b337168b6769e2e4c30c02: Any reason to re-order this? Should be fine to keep the scopes as-is, since they are so small.

    l0rinc commented at 4:09 pm on January 24, 2025:
    You mean leaving a hashChecksum reference outside the try block, initializing it inside and validating it outside as well? I can do that (or bring them inside in a separate commit), but I didn’t like that. Can you show me a diff of the version you’d prefer?

    maflcko commented at 4:17 pm on January 24, 2025:
    All good, I see it wouldn’t make the diff smaller. Feel free to resolve.
  22. in src/node/blockstorage.cpp:966 in f72c2dd3a0 outdated
    962@@ -963,23 +963,34 @@ bool BlockManager::WriteBlockUndo(const CBlockUndo& blockundo, BlockValidationSt
    963             return false;
    964         }
    965         // Open history file to append
    966-        AutoFile fileout{OpenUndoFile(pos)};
    967+        AutoFile fileout{m_undo_file_seq.Open(pos, false), {}}; // We'll obfuscate ourselves
    


    maflcko commented at 3:54 pm on January 24, 2025:
    in f72c2dd3a0c5005bb527587dc5576a19dc6a790d: I wonder if it would be more encapsulated to add a new AutoFile::write_large(std::vector<std::byte>&&) method which consumes a large byte blob. This would allow to keep the xor internal to AutoFile, reducing the complexity here and allowing other places to use it as well?

    l0rinc commented at 4:10 pm on January 24, 2025:
    Interesting idea - what do others think?

    l0rinc commented at 7:16 pm on January 26, 2025:
    Done in a separate commit, the result is a lot cleaner, thanks for the suggestion
  23. in src/node/blockstorage.cpp:682 in 2a5c52ac4e outdated
    678@@ -679,7 +679,7 @@ bool BlockManager::ReadBlockUndo(CBlockUndo& blockundo, const CBlockIndex& index
    679     // Open history file to read
    680     AutoFile filein{OpenUndoFile(pos, true)};
    681     if (filein.IsNull()) {
    682-        LogError("OpenUndoFile failed for %s", pos.ToString());
    683+        LogError("%s: OpenUndoFile failed for %s", __func__, pos.ToString());
    


    maflcko commented at 3:57 pm on January 24, 2025:
    nit in 2a5c52ac4e0c098dc6dcf354d9e00fc5c78b6dfe: An alternative would be to say LogError("OpenUndoFile failed for %s while reading", _);

    l0rinc commented at 4:26 pm on January 24, 2025:
    Done
  24. maflcko commented at 4:04 pm on January 24, 2025: member

    2 full IBD runs against master for 870k blocks (seeded from real nodes) indicates a 6% total speedup. Note that this is only measurable with full IBD, not with reindex(-chainstate) since the gain comes from block reading/writing.

    I don’t understand this. -reindex(-chainstate) requires reading the blocks, so why would it not be measurable? See also #28226, which this pull is inspired by(?) and which claims a speedup in reindex-chainstate.

  25. l0rinc force-pushed on Jan 24, 2025
  26. l0rinc commented at 4:30 pm on January 24, 2025: contributor

    which claims a speedup in reindex-chainstate

    So the 9% improvement in unserializing seems to translate to roughly 1.2% total runtime improvement in -reindex-chainstate on my machine.

    I can measure several runs to see if it makes any difference (to be able to claim anything reliable about a ~1% potential speedup), but the reads are only 27% faster, while the writes are 2.9x faster, so I’m not surprised that reindex doesn’t shine compared to IBD. I’m currently rerunning the measurements with the rebased PR to make sure they were realistic. If you have any concrete measurements you’d like me to run, feel free to specify them.

  27. l0rinc force-pushed on Jan 26, 2025
  28. l0rinc force-pushed on Jan 26, 2025
  29. l0rinc renamed this:
    optimization: bulk reads(27%)/writes(290%) in [undo]block [de]serialization, 6% faster IBD
    optimization: bulk reads(32%)/writes(298%) in [undo]block [de]serialization, ~6% faster IBD
    on Jan 27, 2025
  30. l0rinc commented at 4:48 pm on January 27, 2025: contributor

    which this pull is inspired by(?)

    I did bump into it, but basically reinvented it (not the first time) while investigating why XOR is slow. I’ll add @martinus as a coauthor as well, thanks for the hint @maflcko, he deserves it.


    -reindex(-chainstate) requires the read of blocks, so why would it not be measurable?

    Thanks for the push, I have redone all the measurements: reindex-chainstate is very stable (see the updated PR description for details), two runs indicate a 4% speedup (with GCC, will try with Clang as well since it indicated a bigger speedup in the microbenchmarks):

    Summary
      COMMIT=97b4a50c714c801e08bc02648a9c259f284069c2 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=30000 -printtoconsole=0 -reindex-chainstate -connect=0 ran
        1.04 ± 0.00 times faster than COMMIT=5acf12bafeb126f2190b3f401f95199e0eea90c9 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=30000 -printtoconsole=0 -reindex-chainstate -connect=0
    

    While IBD is closer to the user’s experience (having actual block- and undo writes, which is important since they were modified here), it’s also less stable, so I ran 3 rounds of each, indicating a 6% speedup (bringing reindex and IBD speeds closer together):

    Summary
      COMMIT=0e9fb2ed330960c0e4cd36b077c64ac7d0f84240 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=30000 -printtoconsole=0 ran
        1.06 ± 0.02 times faster than COMMIT=5acf12bafeb126f2190b3f401f95199e0eea90c9 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=30000 -printtoconsole=0
    
  31. l0rinc force-pushed on Jan 27, 2025
  32. l0rinc renamed this:
    optimization: bulk reads(32%)/writes(298%) in [undo]block [de]serialization, ~6% faster IBD
    [IBD] batch `AutoFile` reads/writes during serialization
    on Mar 12, 2025
  33. l0rinc renamed this:
    [IBD] batch `AutoFile` reads/writes during serialization
    [IBD] batch block reads/writes during `AutoFile` serialization
    on Mar 12, 2025
  34. ryanofsky commented at 9:40 pm on March 13, 2025: contributor

    Code review d700b14491a27a6f4f9069dde86fe914784346ca

    This approach seems reasonable, but maybe it would be good to have a more general solution providing buffered reads and writes instead of these ad-hoc std::vector<uint8_t> / SpanReader / DataStream / write_large calls.

    Ideally instead of:

    std::vector<uint8_t> mem(blk_size);
    filein >> Span{mem};
    SpanReader(mem) >> TX_WITH_WITNESS(block);
    // ...
    fileout.write_large(DataStream(block_size) << TX_WITH_WITNESS(block));
    

    Code could look more like:

    BufferedReader{fileIn, blk_size} >> TX_WITH_WITNESS(block);
    // ...
    BufferedWriter{fileout, block_size} << TX_WITH_WITNESS(block);
    

    But I guess as long as the buffers are only going to be used once and thrown away in this case (and not be fixed-size ring buffers) having buffer classes wouldn’t substantively improve this code much aside from making it look nicer. Buffer stream classes would seem more useful in cases where more fields are serialized and sizes are not known in advance.

    Just wanted to suggest this in case there are other places where serialization is slow and buffering could speed it up. We might want a more general solution if that is the case.

  35. l0rinc force-pushed on Mar 14, 2025
  36. l0rinc commented at 2:20 pm on March 14, 2025: contributor

    Thanks for the review @ryanofsky

    would be good to have a more general solution to providing buffered reads and writes

    That’s what I had originally (with a fixed-size buffer, so it was even more general than your suggestion), making the call-site diff trivial: #31539. It was suggested by @theuni that it may be simpler to serialize to the buffer directly. This one does seem simpler to reason about, especially since they had basically the same speed. What do you think?

    What other places are candidates for similar buffering?

  37. ryanofsky commented at 3:33 pm on March 14, 2025: contributor

    No strong opinion. I guess I’d be a little disappointed if you initially had a general solution that made it easy to enable buffering any place and then were told that instead of using that, you should add kludges to this code and implement ad-hoc buffering here. I don’t think buffering is difficult to reason about and don’t think buffering is the type of thing that needs to be deployed on a case-by-case basis, especially since performance of buffering should mostly depend on where data is being written, not what data is being written. So if there is evidence that XORed file i/o is faster with buffering one place, probably we should expect XORed file i/o with buffering to be faster other places as well.

    I have no problems with the current version of this PR though, and I can understand the reasons why there might be pushback against the implementation in #31539, since the BufferedReadOnlyFile class implemented there duplicated functionality of AutoFile, didn’t support writing, and didn’t work with any type of stream. So it could be faulted not just for introducing a new abstraction, but for the abstraction not being usable in more places and not being compatible with the current stream classes.

    Would maybe suggest adding a little “Design considerations” or “Alternatives considered” section to end of PR description to link to #31539 and say how current design was chosen, because it would have been useful to see and know about the alternative when reviewing this PR.

  38. l0rinc commented at 4:20 pm on March 14, 2025: contributor

    since the BufferedReadOnlyFile class implemented there duplicated functionality of AutoFile, didn’t support writing, and didn’t work with any type of stream

    There was a writer as well, but it resulted in some weird CI failures and given this alternative I just closed the PR.

    disappointed if you initially had a general solution that made it easy to enable buffering any place and then were told that instead of using that

    No, the suggested approach was better, that’s why I kept this one. I’d be curious what other reviewers think, but I don’t mind re-generalizing it (now or in a follow-up, when we find other usages).

    XORed file i/o is faster with buffering one place, probably we should expect XORed file i/o with buffering to be faster other places as well

    It’s not just the XOR though: many small writes seemed to be slower in general, and it appears faster if we buffer ourselves instead of leaving it to the OS (likely because of locality; the file is a global resource that needs locking).

    don’t think buffering is the type of thing that needs to be deployed on a case-by-case basis

    I can get behind that, can you point me to other usages where we could use this?

  39. ryanofsky commented at 5:01 pm on March 14, 2025: contributor

    I can get behind that, can you point me to other usages where we could use this?

    I don’t know, but if I were looking I’d just grep for places where the AutoFile class is used. I don’t have any real concerns here and I’m fine with the current approach. To the extent I do have concerns, they are about readability and maintainability of the code, not performance, so I like an approach that makes it trivial to turn buffering on and off, which is why I thought the BufferedReader / BufferedWriter stream adapters suggested in #31551 (comment) might be a good idea. But when I posted that I didn’t know a similar idea had already been tried before, so I wouldn’t suggest changing the approach again at this point without feedback from the other reviewers.

  40. l0rinc force-pushed on Mar 14, 2025
  41. l0rinc commented at 11:09 pm on March 14, 2025: contributor

    Thanks again for the hint @ryanofsky, I went through all changes again and reworked the batching - it’s so much better this way and it includes your idea, @theuni’s and @maflcko’s.

    Additionally I fixed a few things after rebase:

    • There was a leftover SaveBlock from the previous refactor;
    • Commits and docs still contained build/src/benc/bench_bitcoin - changed them to build/bin/bench_bitcoin;
    • BLOCK_SERIALIZATION_HEADER_SIZE is used inline in the examples now and is too long - simplified it in a separate commit;
    • Added BufferedFileW and BufferedFileR, encapsulating the buffering (and the new AutoFile::write_large) so that we can have a similar usage to what you suggested;
    • I have also extracted the low-risk refactorings (which are needed for the batching later) to a separate commit to minimize the diffs in the risky commits.

    The performance is same as before, the commits should be a lot simpler to review.

  42. DrahtBot added the label Needs rebase on Mar 20, 2025
  43. l0rinc force-pushed on Mar 20, 2025
  44. l0rinc force-pushed on Mar 20, 2025
  45. DrahtBot added the label CI failed on Mar 20, 2025
  46. DrahtBot commented at 1:35 pm on March 20, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/39110927855

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  47. in src/streams.h:485 in cf019b73a7 outdated
    480+    explicit BufferedFileW(AutoFile& file, const uint32_t buffer_size)
    481+        : m_file(file), m_buffer_size{buffer_size}, m_buf{buffer_size} {}
    482+
    483+    ~BufferedFileW()
    484+    {
    485+        Assert(m_buf.size() <= m_buffer_size);
    


    maflcko commented at 2:25 pm on March 20, 2025:
    This could be an assume, given that the program can continue normally, albeit it is a pessimization or possible OOM on low memory machines?

    l0rinc commented at 4:48 pm on March 20, 2025:

    This could be an assume

    I’m fine with both, thanks, changed reader & writer.

    This could be an assume, given that the program can continue normally, albeit it is a pessimization or possible OOM on low memory machines?

    You mean reading the whole block into memory could be the last straw? Can you please detail the pessimization scenario?


    maflcko commented at 5:37 pm on March 20, 2025:
    I just mean that the assert gives a hard upper bound, but the assume is a bit weaker on release builds, so if the code is used in a place where the upper bound is too small, the assume-version would continue to work on machines with enough memory and only crash “when needed” (aka when running oom).
  48. DrahtBot removed the label Needs rebase on Mar 20, 2025
  49. l0rinc force-pushed on Mar 20, 2025
  50. DrahtBot removed the label CI failed on Mar 20, 2025
  51. ryanofsky commented at 8:00 pm on March 24, 2025: contributor

    Code review 2c073b90ae977af1ac01164a894edd978529fdc6. I think the new BufferedFile approach is reasonable and a little nicer than the previous one, which made lower-level calls in the block storage code to allocate and pass buffers. But the need to hardcode and compute sizes in 8782d6ed09bbb947426c1ed54ad4d5188326764d and 2c073b90ae977af1ac01164a894edd978529fdc6 still seems to make the storage code more awkward and fragile, and requires earlier refactoring changes that make the PR more complicated overall.

    If it would be possible, and if it would be ok with you and the other reviewers, I would prefer if we could make minimal changes to the blockstorage code and instead add BufferedReader/BufferedWriter streams that only add buffering without making other changes. A diff of what this could look like is:

    --- a/src/node/blockstorage.cpp
    +++ b/src/node/blockstorage.cpp
    @@ -660,11 +660,12 @@ bool BlockManager::ReadBlockUndo(CBlockUndo& blockundo, const CBlockIndex& index
         const FlatFilePos pos{WITH_LOCK(::cs_main, return index.GetUndoPos())};
     
         // Open history file to read
    -    AutoFile filein{OpenUndoFile(pos, true)};
    -    if (filein.IsNull()) {
    +    AutoFile file{OpenUndoFile(pos, true)};
    +    if (file.IsNull()) {
             LogError("OpenUndoFile failed for %s", pos.ToString());
             return false;
         }
    +    BufferedReader filein{file};
     
         // Read block
         uint256 hashChecksum;
    @@ -937,11 +938,12 @@ bool BlockManager::WriteBlockUndo(const CBlockUndo& blockundo, BlockValidationSt
                 return false;
             }
             // Open history file to append
    -        AutoFile fileout{OpenUndoFile(pos)};
    -        if (fileout.IsNull()) {
    +        AutoFile file{OpenUndoFile(pos)};
    +        if (file.IsNull()) {
                 LogError("OpenUndoFile failed");
                 return FatalError(m_opts.notifications, state, _("Failed to write undo data."));
             }
    +        BufferedWriter fileout{file};
     
             // Write index header
             fileout << GetParams().MessageStart() << blockundo_size;
    @@ -986,11 +988,12 @@ bool BlockManager::ReadBlock(CBlock& block, const FlatFilePos& pos) const
         block.SetNull();
     
         // Open history file to read
    -    AutoFile filein{OpenBlockFile(pos, true)};
    -    if (filein.IsNull()) {
    +    AutoFile file{OpenBlockFile(pos, true)};
    +    if (file.IsNull()) {
             LogError("%s: OpenBlockFile failed for %s\n", __func__, pos.ToString());
             return false;
         }
    +    BufferedReader filein{file};
     
         // Read block
         try {
    @@ -1039,11 +1042,12 @@ bool BlockManager::ReadRawBlock(std::vector<uint8_t>& block, const FlatFilePos&
             return false;
         }
         hpos.nPos -= 8; // Seek back 8 bytes for meta header
    -    AutoFile filein{OpenBlockFile(hpos, true)};
    -    if (filein.IsNull()) {
    +    AutoFile file{OpenBlockFile(hpos, true)};
    +    if (file.IsNull()) {
             LogError("%s: OpenBlockFile failed for %s\n", __func__, pos.ToString());
             return false;
         }
    +    BufferedReader filein{file};
     
         try {
             MessageStartChars blk_start;
    @@ -1082,12 +1086,13 @@ FlatFilePos BlockManager::WriteBlock(const CBlock& block, int nHeight)
             LogError("FindNextBlockPos failed");
             return FlatFilePos();
         }
    -    AutoFile fileout{OpenBlockFile(pos)};
    -    if (fileout.IsNull()) {
    +    AutoFile file{OpenBlockFile(pos)};
    +    if (file.IsNull()) {
             LogError("OpenBlockFile failed");
             m_opts.notifications.fatalError(_("Failed to write block."));
             return FlatFilePos();
         }
    +    BufferedWriter fileout{file};
     
         // Write index header
         fileout << GetParams().MessageStart() << block_size;
    --- a/src/streams.cpp
    +++ b/src/streams.cpp
    @@ -64,6 +64,15 @@ void AutoFile::read(std::span<std::byte> dst)
         }
     }
     
    +void AutoFile::read_buffer(std::span<std::byte>& dst)
    +{
    +    size_t size{detail_fread(dst)};
    +    if (size != dst.size() && !feof()) {
    +        throw std::ios_base::failure("AutoFile::read_buffer: fread failed");
    +    }
    +    dst = dst.first(size);
    +}
    +
     void AutoFile::ignore(size_t nSize)
     {
         if (!m_file) throw std::ios_base::failure("AutoFile::ignore: file handle is nullptr");
    @@ -102,6 +111,16 @@ void AutoFile::write(std::span<const std::byte> src)
         }
     }
     
    +void AutoFile::write_buffer(std::span<std::byte> src)
    +{
    +    if (!m_file) throw std::ios_base::failure("AutoFile::write_buffer: file handle is nullptr");
    +    util::Xor(src, m_xor, *m_position); // obfuscate in-place
    +    if (std::fwrite(src.data(), 1, src.size(), m_file) != src.size()) {
    +        throw std::ios_base::failure("AutoFile::write_buffer: write failed");
    +    }
    +    if (m_position) *m_position += src.size();
    +}
    +
     bool AutoFile::Commit()
     {
         return ::FileCommit(m_file);
    --- a/src/streams.h
    +++ b/src/streams.h
    @@ -465,6 +465,11 @@ public:
             ::Unserialize(*this, obj);
             return *this;
         }
    +
    +    //! Read into a mutable buffer more flexibly than read(), shortening the buffer if reached EOF.
    +    void read_buffer(std::span<std::byte>& dst);
    +    //! Write a mutable buffer more efficiently than write(), XORing the buffer in-place.
    +    void write_buffer(std::span<std::byte> src);
     };
     
     /** Wrapper around an AutoFile& that implements a ring buffer to
    @@ -614,4 +619,72 @@ public:
         }
     };
     
    +template<typename S>
    +class BufferedReader
    +{
    +    S& m_stream;
    +    std::vector<std::byte> m_buffer;
    +    size_t m_pos; //!< Next m_buffer position to read.
    +public:
    +    BufferedReader(S& stream, size_t size=65536) : m_stream{stream}, m_buffer{size}, m_pos{size} {}
    +    void read(std::span<std::byte> dst)
    +    {
    +        while (!dst.empty()) {
    +            if (m_pos == m_buffer.size()) {
    +                std::span<std::byte> data{m_buffer};
    +                m_stream.read_buffer(data);
    +                m_buffer.resize(data.size()); // shrink on eof
    +                m_pos = 0;
    +            }
    +            if (m_buffer.size() == 0) throw std::ios_base::failure("BufferedReader::read: end of file");
    +            size_t n = std::min(dst.size(), m_buffer.size() - m_pos);
    +            std::copy(m_buffer.begin() + m_pos, m_buffer.begin() + m_pos + n, dst.begin());
    +            m_pos += n;
    +            dst = dst.subspan(n);
    +        }
    +    }
    +
    +    template <typename T>
    +    BufferedReader& operator>>(T&& obj)
    +    {
    +        ::Unserialize(*this, obj);
    +        return *this;
    +    }
    +};
    +
    +template<typename S>
    +class BufferedWriter
    +{
    +    S& m_stream;
    +    std::vector<std::byte> m_buffer;
    +    size_t m_pos; //!< Next m_buffer position to write.
    +public:
    +    BufferedWriter(S& stream, size_t size=65536) : m_stream{stream}, m_buffer{size}, m_pos{0} {}
    +    ~BufferedWriter() { flush(); }
    +
    +    void write(std::span<const std::byte> src)
    +    {
    +        while (!src.empty()) {
    +            size_t n = std::min(src.size(), m_buffer.size() - m_pos);
    +            std::copy(src.begin(), src.begin() + n, m_buffer.begin() + m_pos);
    +            m_pos += n;
    +            if (m_pos == m_buffer.size()) flush();
    181+            src = src.subspan(n);
    182+        }
    183+    }
    184+
    185+    void flush()
    186+    {
    187+         m_stream.write_buffer(std::span{m_buffer}.first(m_pos));
    188+         m_pos = 0;
    189+    }
    190+
    191+    template <typename T>
    192+    BufferedWriter& operator<<(const T& obj)
    193+    {
    194+        ::Serialize(*this, obj);
    195+        return *this;
    196+    }
    197+};
    198+
    199 #endif // BITCOIN_STREAMS_H
    

    This is untested but compiles. If it’s workable, I think it would be a better direction to go in, both to keep the blockstorage code simpler and to provide better support for buffered reads and writes in the stream API, in case other parts of the code could benefit from them.

    I am also ok with the current approach and happy to review if there are issues with this or it’s just not the direction we want to go.

  52. l0rinc commented at 9:00 pm on March 24, 2025: contributor
    @ryanofsky, thanks again for the review. Your suggestion seems to point back to the initial implementation of this PR, i.e. https://github.com/bitcoin/bitcoin/pull/31539/commits, where we didn’t know the full size of the block and read and wrote in chunks. This current alternative was explicitly suggested by @theuni - and while it’s a bit messy to always read the size, the resulting code is faster and seems to have fewer moving parts (e.g. read doesn’t mutate the parameter, write doesn’t copy the write buffer, there is no internal loop over buffers of different sizes, etc.), which is why I went this way. Maybe there’s a middle ground in allocating the max block size and ignoring the actual size - that could simplify both approaches. Would that be better than the current solutions? I’ll investigate tomorrow…
  53. ryanofsky commented at 9:37 pm on March 24, 2025: contributor

    @ryanofsky, thanks again for the review. Your suggestion seems to point back to the initial implementation of this PR, i.e. https://github.com/bitcoin/bitcoin/pull/31539/commits

    Absolutely yes. There are a few things I would change about that PR but overall it seems a lot cleaner and simpler to me than this PR. If it doesn’t perform as well as this one that is surprising, because I thought the performance improvement here came from buffering, not from choosing buffers with magic sizes. But if you and other reviewers agree the current approach is better than the previous approach implemented in #31539, I guess this is the way to go, and I don’t mind reviewing it as-is.

  54. l0rinc force-pushed on Mar 26, 2025
  55. l0rinc force-pushed on Mar 26, 2025
  56. refactor: rename leftover WriteBlockBench
    The benchmark was referencing the old name of the method
    3e825068ef
  57. refactor: collect block read operations into try block
    In preparation for upcoming changes, `HashVerifier` was moved into the try/catch so that the `undo_size` serialization is covered as well, since the try block is about `Deserialize` errors. This is also why the final checksum verification was included in the try.
    fbe701bf32
  58. refactor: shorten BLOCK_SERIALIZATION_HEADER_SIZE
    We will be using it inline, so the name should be shorter
    4d90db8b96
  59. log: unify error messages for (read/write)[undo]block
    Co-authored-by: maflcko <6399679+maflcko@users.noreply.github.com>
    1b62425d4e
  60. refactor: replace raw values in ReadRawBlock with HEADER_BYTE_SIZE fc97b81996
  61. l0rinc force-pushed on Mar 26, 2025
  62. l0rinc commented at 2:54 pm on March 26, 2025: contributor

    @ryanofsky, I’ve experimented with your version, turns out I’m mostly fine with both solutions. https://github.com/bitcoin/bitcoin/compare/2c073b90ae977af1ac01164a894edd978529fdc6..06f37689a3b79f2f65ac95d8ae54a407ed2d4502

    I have based this version on the patch you’ve posted - with minor differences.

    • Removed all introduced undo_size seek backs - except in ReadRawBlock, which already steps back to get access to the size. I have cleaned it up in a separate commit - we don’t need buffering there since we’re not serializing anything there, just reading the whole block directly already.
    • read_buffer/write_buffer introduces non-trivial duck typing (i.e. users now have to know whether they want to call void write(std::span<const std::byte> src); or void write_buffer(std::span<std::byte> src);). I don’t particularly like it, but we can live with it.
    • std::copy can be simplified to std::copy_n
    • the new AutoFile::write_buffer can now be used in AutoFile::write to reduce duplication
    • Changed read_buffer to return the size instead of mutating the parameter, this enabled us to avoid resizing the buffer for eof in BufferedReader (the benchmarks caught the slowdown)
    • Used DataStream as buffers
    • Added tests for the buffered writers - for the read/write methods and separately for the serialization part

    The ReadBlockBench/WriteBlockBench speedups changed slightly from 24%/31% to 19%/24% on Linux/GCC, and from 32%/298% to 30%/194% on Mac. To see how real performance is affected, I’m rerunning a few full IBDs to measure the performance change after the buffering changes (a reindex won’t suffice here).

  63. optimization: Bulk serialization reads in `UndoRead`, `ReadBlock`
    The Obfuscation (XOR) operations are currently done byte-by-byte during serialization; buffering the reads will enable batching the obfuscation operations later (not yet done here).
    
    Also, different operating systems seem to handle file caching differently, so reading bigger batches (and processing those from memory) is also a bit faster (likely because of fewer native fread calls or less locking).
    
    Note that `ReadRawBlock` doesn't need buffering since it's already reading the whole block directly.
    
    The current implementation of looping over a fixed-sized buffer (based on a previous implementation, modified by Ryan Ofsky) is a more general alternative to the solution provided by Cory Fields of reading the whole block size in advance.
    
    ------
    
    > macOS Sequoia 15.3.1
    > C++ compiler .......................... Clang 19.1.7
    > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='WriteBlockBench' -min-time=10000
    
    Before:
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        2,271,441.67 |              440.25 |    0.1% |     11.00 | `ReadBlockBench`
    
    After:
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        1,738,683.04 |              575.15 |    0.2% |     11.04 | `ReadBlockBench`
    
    ------
    
    > Ubuntu 24.04.2 LTS
    > C++ compiler .......................... GNU 13.3.0
    > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='WriteBlockBench' -min-time=20000
    
    Before:
    
    |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |        6,895,987.11 |              145.01 |    0.0% |   71,055,269.86 |   23,977,374.37 |  2.963 |   5,074,828.78 |    0.4% |     22.00 | `ReadBlockBench`
    
    After:
    
    |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |        5,771,882.71 |              173.25 |    0.0% |   65,741,889.82 |   20,453,232.33 |  3.214 |   3,971,321.75 |    0.3% |     22.01 | `ReadBlockBench`
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    Co-authored-by: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
    Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
    29678d79e7
  64. optimization: Bulk serialization writes in `WriteBlockUndo` and `WriteBlock`
    Added `AutoFile::write_buffer` for batching obfuscation operations: instead of copying the data and doing the XOR in a 4096-byte array, we're doing it directly on the input.
    
    Similarly to the serialization reads, buffered writes will enable batched XOR calculations - especially since currently we need to copy the write input's std::span to do the obfuscation on it; batching enables doing the XOR on the internal buffer instead.
    
    ------
    
    > macOS Sequoia 15.3.1
    > C++ compiler .......................... Clang 19.1.7
    > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='WriteBlockBench' -min-time=10000
    
    Before:
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        5,149,564.31 |              194.19 |    0.8% |     10.95 | `WriteBlockBench`
    
    After:
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        3,052,658.88 |              327.58 |    1.0% |     10.91 | `WriteBlockBench`
    
    ------
    
    > Ubuntu 24.04.2 LTS
    > C++ compiler .......................... GNU 13.3.0
    > cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ && cmake --build build -j$(nproc) && build/bin/bench_bitcoin -filter='WriteBlockBench' -min-time=20000
    
    Before:
    
    |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |        5,152,973.58 |              194.06 |    2.2% |   19,350,886.41 |    8,784,539.75 |  2.203 |   3,079,335.21 |    0.4% |     23.18 | `WriteBlockBench`
    
    After:
    
    |               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    |        4,145,681.13 |              241.21 |    4.0% |   15,337,596.85 |    5,732,186.47 |  2.676 |   2,239,662.64 |    0.1% |     23.94 | `WriteBlockBench`
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    Co-authored-by: Cory Fields <cory-nospam-@coryfields.com>
    8fe73a102f
  65. l0rinc force-pushed on Mar 28, 2025
  66. l0rinc commented at 10:23 am on March 28, 2025: contributor
    @maflcko / @ryanofsky, this implementation seems to fail in the same weird way as https://github.com/bitcoin/bitcoin/actions/runs/12918565902/job/36027337187?pr=31539 - with 100% of tests passing. The last PR indicates it’s the batch-writing part - could it be an OOM caused by the sanitizer overhead of DataStream#resize(1 << 20)? WriteBlockBench indicates that even a buffer size of 1 << 16 is half the speed (Linux/GCC speed is the same) - but I’ll try it; let’s see if it passes CI at least.
  67. maflcko commented at 10:48 am on March 28, 2025: member

    Does CI pass locally before and after? Measuring locally, what is the memory usage of the CI before and after?

    It seems odd that using exact block sizes (this pull initially) passes, whereas using a 1<<20 approximation fails due to OOM.

  68. l0rinc commented at 7:47 pm on March 30, 2025: contributor

    It seems odd that using exact block sizes (this pull initially) passes, whereas using a 1<<20 approximation fails due to OOM

    Yes, I thought of the same, but the other PR had the same issue. I’m not familiar enough with the intricacies of ASan + LSan + UBSan + integer, no depends, USDT to have a good enough explanation - besides the instrumentation overhead causing it to overflow.

    What is the memory usage of the CI before and after?

    I have measured a valgrind/massif IBD until the benchmarked block #413567, indicating that there isn’t a significant memory usage change - not sure about the CI usage, that seems messy to measure.

    COMMITS="06f37689a3b79f2f65ac95d8ae54a407ed2d4502 8fe73a102f5b8406f166d7ac4b1a36bba84de7c4"; \
    STOP_HEIGHT=413567; DBCACHE=1000; \
    C_COMPILER=gcc; CXX_COMPILER=g++; \
    BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
    for COMMIT in $COMMITS; do \
      git fetch --all -q && git fetch origin $COMMIT && git log -1 --oneline $COMMIT && \
      killall bitcoind; rm -rf $DATA_DIR/* && git checkout $COMMIT && git clean -fxd && git reset --hard && \
      cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF -DCMAKE_C_COMPILER=$C_COMPILER -DCMAKE_CXX_COMPILER=$CXX_COMPILER && \
      cmake --build build -j$(nproc) --target bitcoind && \
      ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0 && \
      valgrind --tool=massif --time-unit=ms \
        --massif-out-file="$LOG_DIR/massif-$COMMIT-$STOP_HEIGHT-$DBCACHE.out" \
        ./build/bin/bitcoind -datadir=$DATA_DIR \
        -stopatheight=$STOP_HEIGHT \
        -dbcache=$DBCACHE \
        -blocksonly=1 \
        -printtoconsole=0 && \
      cp $DATA_DIR/debug.log "$LOG_DIR/debug-$COMMIT-$STOP_HEIGHT-$DBCACHE.log" && \
      ms_print "$LOG_DIR/massif-$COMMIT-$STOP_HEIGHT-$DBCACHE.out" | tee "$LOG_DIR/massif-$COMMIT-$STOP_HEIGHT-$DBCACHE.txt"; \
    done
    

    06f37689a3 - size_t size = 1 << 20

        GB
    1.322^              #
         |              #     :     :               @      :    :               :
         |             @#     :     :    :          @:     :    :    :     :    :
         |            :@#    ::    ::    :    ::    @:     :   ::    :    ::   ::
         |            :@#   @::    ::   ::    :     @:    ::   ::    :    ::   @:
         |            :@#   @::    ::   ::    :     @:    ::   ::   ::    ::   @:
         |            :@#   @::    ::   ::   ::     @:   :::   ::   ::   :::   @:
         |            :@#   @::  ::::  :::   ::     @:  @:::  :::  :::   :::  :@:
         |          :@:@#  :@::  : ::  :::  :::     @:  @::: ::::  :::   :::  :@:
         |         ::@:@#  :@::  : ::  :::  :::     @:  @::: ::::  :::   ::: ::@:
         |        @::@:@#  :@::  : :: @:::  :::    :@:  @::: ::::  :::  @::: ::@:
         |        @::@:@# ::@:: :: :: @:::  :::    :@:  @::: :::: :::: :@::: ::@::
         |       @@::@:@# ::@:: :: :: @:::  :::    :@: :@::::::::@:::: :@::: ::@::
         |     ::@@::@:@# ::@:: :: :::@::: :::: @  :@: :@::::::::@:::: :@::::::@::
         |    :: @@::@:@#:::@::::: :::@::: :::: @  :@: :@::::::::@:::: :@::::::@::
         |    :: @@::@:@#:::@::::: :::@:::::::: @:::@:::@::::::::@::::::@::::::@::
         |   ::: @@::@:@#:::@::::: :::@:::::::: @: :@:::@::::::::@::::::@::::::@::
         | ::::: @@::@:@#:::@::::: :::@:::::::: @: :@:::@::::::::@::::::@::::::@::
         | ::::: @@::@:@#:::@::::: :::@:::::::: @: :@:::@::::::::@::::::@::::::@::
         | ::::: @@::@:@#:::@::::: :::@:::::::: @: :@:::@::::::::@::::::@::::::@::
       0 +----------------------------------------------------------------------->h
         0                                                                   4.449
    

    vs

    8fe73a102f - size_t size = 1 << 16

        GB
    1.336^                 #
         |           @     #                   :   :           :     :
         |           @     #    :     :        :   :     :     :     :    :    :
         |          @@     #    :     :     :  :   :     :    ::    @:    :    :
         |          @@    :#    :    ::     :  :  ::     :   :::   :@:   ::   ::
         |          @@    :#    :    ::     :  :  ::    @:   :::   :@:   ::   ::
         |          @@    :#   @:    ::     :  :  ::    @:   :::  ::@:  :::   ::
         |        @@@@   ::#   @:   :::   :::  : :::  ::@:  @:::  ::@:  :::   ::
         |        @ @@   ::#  :@:   :::   : :  : :::  ::@:  @:::  ::@:  :::  :::
         |      @@@ @@   ::#  :@:  ::::   : : :: :::  ::@:  @::: :::@: ::::  :::
         |      @@@ @@ ::::#  :@:  ::::   : : :: :::  ::@: :@::: :::@: :::: :::: :
         |      @@@ @@ : ::# ::@:  ::::  @: : ::@:::  ::@: :@:::::::@:::::: :::: :
         |    ::@@@ @@ : ::# ::@:  ::::  @: : ::@::: :::@:::@:::::::@::::::@:::: :
         |   :: @@@ @@:: ::#:::@:  ::::  @: ::::@::: :::@:::@:::::::@::::::@:::: :
         |   :: @@@ @@:: ::#:::@:  ::::  @: ::::@:::@:::@:::@:::::::@::::::@::::::
         |  ::: @@@ @@:: ::#:::@:@@::::::@: ::::@:::@:::@:::@:::::::@::::::@::::::
         | :::: @@@ @@:: ::#:::@:@ ::::: @: ::::@:::@:::@:::@:::::::@::::::@::::::
         | :::: @@@ @@:: ::#:::@:@ ::::: @: ::::@:::@:::@:::@:::::::@::::::@::::::
         | :::: @@@ @@:: ::#:::@:@ ::::: @: ::::@:::@:::@:::@:::::::@::::::@::::::
         | :::: @@@ @@:: ::#:::@:@ ::::: @: ::::@:::@:::@:::@:::::::@::::::@::::::
       0 +----------------------------------------------------------------------->h
         0                                                                   4.111
    

    Does CI pass locally before and after?

    It’s passing here as well now with 1 << 16 instead of 1 << 20.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-31 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me