merkle: pre‑reserve leaves to prevent reallocs with odd vtx count #32497

pull l0rinc wants to merge 2 commits into bitcoin:master from l0rinc:l0rinc/pre‑reserve-merkle-leaves-to-max changing 4 files +26 −16
  1. l0rinc commented at 1:01 pm on May 14, 2025: contributor

    Summary

    ComputeMerkleRoot duplicates the last hash when the input size is odd. If the caller provides a std::vector whose capacity equals its size (common when the vector was created with resize()), that extra push_back forces a reallocation, doubling its capacity. We pay this penalty on every Merkle root calculation even though we know the final size in advance.

    Fix

    • Call sites: replace resize(n) + index assignment with reserve(n + (n & 1)) + emplace_back(...);
    • Guarantees one extra slot when vtx.size() is odd (assigned later);
    • Removes default construction of uint256 objects we overwrite anyway.

    Validation

    The benchmark was updated to use an even leaf count for simplicity and to focus on hashing speed rather than reallocations or vector copying.

    Asserts (not pushed in this PR) confirm that push_back never reallocates and that the coinbase witness hash remains null:

    0if (hashes.size() & 1) {
    1    assert(hashes.size() < hashes.capacity()); // TODO remove
    2    hashes.push_back(hashes.back());
    3}
    4
    5leaves.reserve(block.vtx.size() + (block.vtx.size() & 1)); // capacity rounded up to even
    6leaves.emplace_back();
    7assert(leaves.back().IsNull()); // TODO remove
    

    A full -reindex-chainstate up to block 896 408 ran without triggering the asserts.

  2. DrahtBot commented at 1:01 pm on May 14, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32497.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK yuvicc
    Concept ACK luke-jr

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32189 (refactor: Txid type safety (parent PR) by marcofleon)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. in src/consensus/merkle.cpp:69 in 39b6c139bd outdated
    65@@ -66,20 +66,20 @@ uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
    66 uint256 BlockMerkleRoot(const CBlock& block, bool* mutated)
    67 {
    68     std::vector<uint256> leaves;
    69-    leaves.resize(block.vtx.size());
    70-    for (size_t s = 0; s < block.vtx.size(); s++) {
    71-        leaves[s] = block.vtx[s]->GetHash();
    72+    leaves.reserve((block.vtx.size() + 1) & ~1ULL); // capacity rounded up to even
    


    luke-jr commented at 11:25 am on May 20, 2025:
    This assumes size_t is unsigned long long I think? Can we avoid that easily?

    l0rinc commented at 1:27 pm on May 20, 2025:
    we could either do size_t{1} or leaves.reserve(block.vtx.size() + (block.vtx.size() & 1)), I think, if you think this could be problematic on 32 bit systems.

    l0rinc commented at 9:36 am on June 15, 2025:
    I ended up with block.vtx.size() + (block.vtx.size() & 1) - if the value ends with a 1, it’s an odd number and we increment it to make it even
  4. in src/bench/merkle_root.cpp:15 in 899e466237 outdated
    10@@ -11,16 +11,15 @@
    11 
    12 static void MerkleRoot(benchmark::Bench& bench)
    13 {
    14-    FastRandomContext rng(true);
    15-    std::vector<uint256> leaves;
    16-    leaves.resize(9001);
    17+    FastRandomContext rng{/*fDeterministic=*/true};
    18+    std::vector<uint256> leaves(9000); // even number of leaves
    


    luke-jr commented at 11:33 am on May 20, 2025:
    Wouldn’t it be better to benchmark the less-performant input?

    l0rinc commented at 1:24 pm on May 20, 2025:
    After this change all call-sites are pre-sized to even number of leaves, so this reflects the usage better. I could also round to 9002 to mimic the new leaves.reserve((9001 + 1) & ~1ULL) behavior, if you thinks that’s more representative.

    l0rinc commented at 9:35 am on June 15, 2025:
    added back 9001
  5. in src/bench/merkle_root.cpp:22 in 899e466237 outdated
    23-        bool mutation = false;
    24-        uint256 hash = ComputeMerkleRoot(std::vector<uint256>(leaves), &mutation);
    25-        leaves[mutation] = hash;
    26+        bool mutation{false};
    27+        const uint256 hash{ComputeMerkleRoot(std::vector(leaves), &mutation)};
    28+        assert(mutation == false && hash == uint256{"18ff9b2bef46a834b2625871b2839fec3b67fc1243920c424858324fbde1a8de"});
    


    luke-jr commented at 11:33 am on May 20, 2025:
    How does this avoid optimisations?

    l0rinc commented at 1:21 pm on May 20, 2025:
    the ComputeMerkleRoot calculation has to be performed to be able to validate this assert - it’s can’t eliminate the call.
  6. luke-jr approved
  7. luke-jr commented at 11:34 am on May 20, 2025: member
    utACK main commit. Not sure why the benchmark is being changed.
  8. l0rinc commented at 1:28 pm on May 20, 2025: contributor
    Thanks for the review!
  9. yuvicc commented at 8:34 pm on June 13, 2025: contributor

    Concept ACK

    Memory allocation/deallocation is slow and expensive.

  10. yuvicc commented at 8:42 pm on June 13, 2025: contributor
    I have thought on changing the benchmark tests as we would want to test the worst case scenario right? or else we could have both case for testing?
  11. yuvicc commented at 9:35 pm on June 13, 2025: contributor

    Benchmark Testing

    master@5757de4ddd37f9321ee6b338b40888fd3561fc00

    • With 9000
    0|             ns/leaf |              leaf/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|               52.62 |       19,005,746.07 |    0.2% |      0.01 | `MerkleRoot`
    3|               52.64 |       18,998,504.40 |    0.3% |      0.01 | `MerkleRoot`
    4|               52.63 |       18,999,727.67 |    0.2% |      0.01 | `MerkleRoot`
    
    • With 9001
    0|             ns/leaf |              leaf/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|               53.50 |       18,693,063.88 |    0.3% |      0.01 | `MerkleRoot`
    3|               53.53 |       18,681,211.49 |    0.5% |      0.01 | `MerkleRoot`
    4|               53.49 |       18,694,053.87 |    0.5% |      0.01 | `MerkleRoot`
    

    Commit 39b6c139bd6be33699af781f1d71f6fed303d468

    • With 9000
    0|             ns/leaf |              leaf/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|               52.51 |       19,043,628.95 |    0.2% |      0.01 | `MerkleRoot`
    3|               52.52 |       19,040,989.96 |    0.2% |      0.01 | `MerkleRoot`
    4|               52.53 |       19,036,358.39 |    0.2% |      0.01 | `MerkleRoot`
    
    • With 9001
    0|             ns/leaf |              leaf/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|               53.40 |       18,713,525.67 |    0.3% |      0.01 | `MerkleRoot`
    3|               53.44 |       19,314,655.73 |    0.3% |      0.01 | `MerkleRoot`
    4|               53.41 |       18,883,462.75 |    0.3% |      0.01 | `MerkleRoot`
    
  12. l0rinc force-pushed on Jun 15, 2025
  13. l0rinc commented at 10:06 am on June 15, 2025: contributor
    Updated the benchmark to showcase the before/after state better (resembles production code changes), by splitting out the vector copies to the unmetered lambda and having an odd number of elements. Changed the previous leaves.reserve((block.vtx.size() + 1) & ~1ULL) rounding to leaves.reserve(block.vtx.size() + (block.vtx.size() & 1)).
  14. l0rinc requested review from luke-jr on Jun 15, 2025
  15. l0rinc requested review from yuvicc on Jun 15, 2025
  16. yuvicc commented at 5:26 pm on June 15, 2025: contributor

    Code review ACK 4cc5942895673591de4edceb6dd0c7188c302a72

    Testing

    Before Master@9a7eece5a4

    ns/leaf leaf/s err% total benchmark
    53.75 18,605,160.70 0.7% 0.01 MerkleRoot
    53.64 18,643,002.35 0.1% 0.01 MerkleRoot
    53.91 18,548,704.52 0.4% 0.01 MerkleRoot

    After commit@4cc5942895673591de4edceb6dd0c7188c302a72

    ns/leaf leaf/s err% total benchmark
    57.67 17,340,305.39 0.0% 0.52 MerkleRoot
    57.69 17,332,534.96 0.0% 0.52 MerkleRoot
    58.14 17,199,057.10 0.0% 0.52 MerkleRoot
  17. in src/bench/merkle_root.cpp:19 in 064f33bbe2 outdated
    18     std::vector<uint256> leaves;
    19-    leaves.resize(9001);
    20-    for (auto& item : leaves) {
    21-        item = rng.rand256();
    22+    leaves.resize(block_vtx_size);
    23+    for (auto& h : leaves) h = rng.rand256();
    


    maflcko commented at 9:11 am on June 20, 2025:
    why is this changed. touching a line to only remove the {} after the for loop seems odd and I had the impression that the dev notes preferred to use {}

    l0rinc commented at 11:19 am on June 20, 2025:
    I’m fine with both, reverted.
  18. in src/bench/merkle_root.cpp:22 in 064f33bbe2 outdated
    21-        item = rng.rand256();
    22+    leaves.resize(block_vtx_size);
    23+    for (auto& h : leaves) h = rng.rand256();
    24+
    25+    constexpr size_t iterations{1000};
    26+    std::vector<std::vector<uint256>> cases(iterations);
    


    maflcko commented at 9:14 am on June 20, 2025:
    not sure about pre-allocating. All real code paths allocate in-line for each iteration, so the benchmark should follow that. This would also avoid eating 100+mb for a microbenchmark?

    l0rinc commented at 10:42 am on June 20, 2025:
    I wanted to explicitly only measure the ComputeMerkleRoot call here without collecting hashes into a vector. But if you think that part should de included, I don’t mind, but in that case we won’t measure the behavior change in ComputeMerkleRoot anymore. However, I think the proper allocation makes sense regardless - e.g. for 2001 entries now we’ll allocate 2002 entries instead of 4002 like before.
  19. in src/consensus/merkle.cpp:71 in 4cc5942895 outdated
    65@@ -66,20 +66,20 @@ uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
    66 uint256 BlockMerkleRoot(const CBlock& block, bool* mutated)
    67 {
    68     std::vector<uint256> leaves;
    69-    leaves.resize(block.vtx.size());
    70-    for (size_t s = 0; s < block.vtx.size(); s++) {
    71-        leaves[s] = block.vtx[s]->GetHash();
    72+    leaves.reserve(block.vtx.size() + (block.vtx.size() & 1)); // capacity rounded up to even
    73+    for (const auto& tx : block.vtx) {
    74+        leaves.emplace_back(tx->GetHash());
    


    maflcko commented at 9:19 am on June 20, 2025:
    why is this changed? I don’t think a byte array can meaningfully be moved (of course it compiles, but moving a byte is no different than copying it)

    l0rinc commented at 11:19 am on June 20, 2025:
    After I have used leaves.emplace_back(); // The witness hash of the coinbase is 0 in BlockWitnessMerkleRoot it seemed more consistent to me to use emplace_back in BlockMerkleRoot as well - as you said it should do the same. I’m fine with both, I’ve changed them inside the loop to push_back.
  20. in src/consensus/merkle.cpp:81 in 4cc5942895 outdated
    83-    leaves[0].SetNull(); // The witness hash of the coinbase is 0.
    84-    for (size_t s = 1; s < block.vtx.size(); s++) {
    85-        leaves[s] = block.vtx[s]->GetWitnessHash();
    86+    leaves.reserve(block.vtx.size() + (block.vtx.size() & 1)); // capacity rounded up to even
    87+    leaves.emplace_back(); // The witness hash of the coinbase is 0.
    88+    for (size_t i{1}; i < block.vtx.size(); ++i) {
    


    maflcko commented at 9:20 am on June 20, 2025:
    why is this changed? I don’t think it is worth to touch a line just to change the name of the variable from one single char to another single char.

    l0rinc commented at 11:19 am on June 20, 2025:
    There was no other loops with the same variable after the change anymore, so it doesn’t need to be a non-standard loop variable name (wasn’t sure what s meant in this context) - but since it’s important to you I’ve reverted them.
  21. maflcko commented at 9:22 am on June 20, 2025: member
    not sure how meaningful this is, but it looks like there are a bunch of unrelated style changes that don’t seem to have a benefit?
  22. l0rinc force-pushed on Jun 20, 2025
  23. l0rinc commented at 11:24 am on June 20, 2025: contributor

    The changes revert the original benchmark’s behavior of including the leaf vector in the benchmark (more explicitly and reserved like the production code). I’ve also changed the emplace_back calls for the hashes to push_bash and reverted the original loop conditions to minimize the diff in critical code sections.

    Note to yuvicc: previous benchmark version was measuring something else than master (only the merkle call without the vector copy), so we can’t directly compare the results.

    The new results don’t show any significant speed increase, only the memory allocation is more predictable - which isn’t captured by the benchmark anymore.

  24. l0rinc requested review from yuvicc on Jun 25, 2025
  25. l0rinc requested review from maflcko on Jun 25, 2025
  26. l0rinc force-pushed on Jun 25, 2025
  27. l0rinc commented at 9:33 pm on June 25, 2025: contributor

    I’ve rebased the changed and adjusted the benchmark to be more similar to the other production usages in the first commit, rounded to even in the second (optimization) commit, so that we can realistically measure the speed difference before and after:

    % build/bin/bench_bitcoin -filter=‘MerkleRoot’ –min-time=1000

    Before 7f620cffebee593e48434cf182cc2fd64a6d76be:

    ns/leaf leaf/s err% total benchmark
    45.50 21,975,688.66 0.0% 1.10 MerkleRoot
    45.53 21,962,546.91 0.1% 1.10 MerkleRoot
    45.53 21,965,162.52 0.0% 1.10 MerkleRoot

    After a73dd45e95f5e909d0950ba7c76225589df74854:

    ns/leaf leaf/s err% total benchmark
    45.04 22,200,976.95 0.1% 1.10 MerkleRoot
    44.97 22,235,208.25 0.1% 1.10 MerkleRoot
    45.01 22,217,033.81 0.1% 1.10 MerkleRoot
  28. refactor: make `MerkleRoot` benchmark more representative
    Instead of mutating the input after each round to avoid unwanted compiler optimizations, we assert the expected hash, which likewise inhibits aggressive optimization.
    
    To make the benchmark more similar to other `ComputeMerkleRoot` the input leaves copying is made explicit before each run.
    
    % build/bin/bench_bitcoin -filter='MerkleRoot' --min-time=1000
    
    |             ns/leaf |              leaf/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |               45.50 |       21,975,688.66 |    0.0% |      1.10 | `MerkleRoot`
    |               45.53 |       21,962,546.91 |    0.1% |      1.10 | `MerkleRoot`
    |               45.53 |       21,965,162.52 |    0.0% |      1.10 | `MerkleRoot`
    e87430ed5f
  29. merkle: pre-reserve leaves to prevent reallocs with odd vtx count
    This prevents the input vector from doubling in size when the input count is odd.
    The rounding adds 1 when the size is odd (least significant bit is 1); otherwise it adds 0, keeping the value even.
    
    % build/bin/bench_bitcoin -filter='MerkleRoot' --min-time=1000
    
    |             ns/leaf |              leaf/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |               45.04 |       22,200,976.95 |    0.1% |      1.10 | `MerkleRoot`
    |               44.97 |       22,235,208.25 |    0.1% |      1.10 | `MerkleRoot`
    |               45.01 |       22,217,033.81 |    0.1% |      1.10 | `MerkleRoot`
    be8f3053a7
  30. l0rinc force-pushed on Jun 25, 2025
  31. yuvicc commented at 7:22 am on June 27, 2025: contributor

    lgtm re-ACK be8f3053a7ad526823f32f1a70847b9c06c4c54b

    Before e87430ed5fad6f9d90c1e7d256b9b8276b936c24

    ns/leaf leaf/s err% total benchmark
    53.83 18,577,129.91 0.1% 1.10 MerkleRoot
    53.62 18,648,858.81 0.1% 1.10 MerkleRoot
    53.70 18,621,594.40 0.1% 1.10 MerkleRoot

    After be8f3053a7ad526823f32f1a70847b9c06c4c54b

    ns/leaf leaf/s err% total benchmark
    53.02 18,860,232.62 0.2% 1.10 MerkleRoot
    52.88 18,910,755.68 0.0% 1.10 MerkleRoot
    52.89 18,906,671.63 0.1% 1.10 MerkleRoot

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-07-06 06:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me