merkle: pre‑reserve leaves to prevent reallocs with odd vtx count #32497

pull l0rinc wants to merge 3 commits into bitcoin:master from l0rinc:l0rinc/pre‑reserve-merkle-leaves-to-max changing 5 files +39 −25
  1. l0rinc commented at 1:01 pm on May 14, 2025: contributor

    Summary

    ComputeMerkleRoot duplicates the last hash when the input size is odd. If the caller provides a std::vector whose capacity equals its size, that extra push_back forces a reallocation, doubling its capacity (causing peak memory usage of 3x the necessary size).

    This affects roughly half of the created blocks (those with odd transaction counts), causing unnecessary memory fragmentation during every block validation.

    Fix

    • Pre-reserves vector capacity to account for the odd-count duplication using (size + 1) & ~1ULL.
      • This syntax produces optimal assembly across x86/ARM and 32/64-bit platforms for GCC & Clang.
    • Eliminates default construction of uint256 objects that are immediately overwritten by switching from resize to reserve + push_back.

    Memory Impact

    Memory profiling shows 50% reduction in peak allocation (576KB → 288KB) and elimination of reallocation overhead.

    Validation

    The benchmark was updated to use an odd leaf count to demonstrate the real-world scenario where the reallocation occurs.

    A full -reindex-chainstate up to block 896 408 ran without triggering the asserts.

    Temporary asserts (not included in this PR) confirm that push_back never reallocates and that the coinbase witness hash remains null:

    0if (hashes.size() & 1) {
    1    assert(hashes.size() < hashes.capacity()); // TODO remove
    2    hashes.push_back(hashes.back());
    3}
    4
    5leaves.reserve((block.vtx.size() + 1) & ~1ULL); // capacity rounded up to even
    6leaves.emplace_back();
    7assert(leaves.back().IsNull()); // TODO remove
    

    Benchmark Performance

    While the main purpose is to improve predictability, the reduced memory operations also improve hashing throughput slightly.

  2. DrahtBot commented at 1:01 pm on May 14, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32497.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK hodlinator, vasild
    Concept NACK darosior
    Concept ACK luke-jr
    Stale ACK yuvicc, optout21

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

    Conflicts

    No conflicts as of last run.

  3. in src/consensus/merkle.cpp:69 in 39b6c139bd outdated
    65@@ -66,20 +66,20 @@ uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
    66 uint256 BlockMerkleRoot(const CBlock& block, bool* mutated)
    67 {
    68     std::vector<uint256> leaves;
    69-    leaves.resize(block.vtx.size());
    70-    for (size_t s = 0; s < block.vtx.size(); s++) {
    71-        leaves[s] = block.vtx[s]->GetHash();
    72+    leaves.reserve((block.vtx.size() + 1) & ~1ULL); // capacity rounded up to even
    


    luke-jr commented at 11:25 am on May 20, 2025:
    This assumes size_t is unsigned long long I think? Can we avoid that easily?

    l0rinc commented at 1:27 pm on May 20, 2025:
    we could either do size_t{1} or leaves.reserve(block.vtx.size() + (block.vtx.size() & 1)), I think, if you think this could be problematic on 32 bit systems.

    l0rinc commented at 9:36 am on June 15, 2025:
    I ended up with block.vtx.size() + (block.vtx.size() & 1) - if the value ends with a 1, it’s an odd number and we increment it to make it even

    hodlinator commented at 7:44 pm on November 21, 2025:
    How about (txs.size() + 1) & ~(decltype(txs.size()){1})? Accessing txs.size() one time less at runtime. :)

    l0rinc commented at 12:10 pm on November 22, 2025:

    Accessing txs.size() one time less at runtime

    The compiler should be able to deduce that it’s the same (see assembly below): https://en.wikipedia.org/wiki/Common_subexpression_elimination


    I have considered a few alternatives:

     0BOOST_AUTO_TEST_CASE(merkle_test_even_sizing)
     1{
     2    for (size_t size = 0; size <= 1'000'000; ++size) {
     3        size_t reference = size + (size & 1U);
     4
     5        BOOST_CHECK_EQUAL(reference, (size + 1) & ~size_t{1});
     6        BOOST_CHECK_EQUAL(reference, (size + 1) & ~1ULL);
     7        BOOST_CHECK_EQUAL(reference, (size + 1) & -2);
     8        BOOST_CHECK_EQUAL(reference, (size + 1) >> 1 << 1);
     9    }
    10}
    

    So I’m reverting to my my original out.reserve((in.size() + 1) & ~1ULL); , it seems to have the simplest C++ code and the best assembly for gcc/clang and 32/64 bits and arm/x86, see: https://godbolt.org/z/xzscoq7nv

    Architecture Bits GCC Assembly Clang Assembly Notes
    x86 32 and eax, -2 and edi, 2147483646 Clang uses 0x7FFFFFFE (max positive signed 32-bit aligned).
    x86-64 64 and rax, -2 and rbp, rax GCC uses immediate -2; Clang uses a register mask here.
    ARMv7 32 bic r0, r0, [#1](/bitcoin-bitcoin/1/) bic r6, r0, [#1](/bitcoin-bitcoin/1/) Both use “Bit Clear” instruction.
    AArch64 64 and x0, x0, -2 and x24, x8, [#0](/bitcoin-bitcoin/0/)x7ffffffffffffffe Both use bitwise AND with ~1.

    GCC

    Clang


    hodlinator commented at 2:54 pm on November 27, 2025:

    (size + 1) & -2 eh? :)

    I’m happy with (size + 1) & ~1ULL. It should narrow down to 4-byte int on 32-bit.

    Diffed ull_literal vs decltype_trick for x86-64 and ARM32 (Edit: on GCC) and they are the same. :+1:


    l0rinc commented at 3:12 pm on November 27, 2025:
    Thanks for checking, I think it’s the best solution regarding C++ simplicity and CPU support.
  4. in src/bench/merkle_root.cpp:15 in 899e466237 outdated
    10@@ -11,16 +11,15 @@
    11 
    12 static void MerkleRoot(benchmark::Bench& bench)
    13 {
    14-    FastRandomContext rng(true);
    15-    std::vector<uint256> leaves;
    16-    leaves.resize(9001);
    17+    FastRandomContext rng{/*fDeterministic=*/true};
    18+    std::vector<uint256> leaves(9000); // even number of leaves
    


    luke-jr commented at 11:33 am on May 20, 2025:
    Wouldn’t it be better to benchmark the less-performant input?

    l0rinc commented at 1:24 pm on May 20, 2025:
    After this change all call-sites are pre-sized to even number of leaves, so this reflects the usage better. I could also round to 9002 to mimic the new leaves.reserve((9001 + 1) & ~1ULL) behavior, if you thinks that’s more representative.

    l0rinc commented at 9:35 am on June 15, 2025:
    added back 9001
  5. in src/bench/merkle_root.cpp:22 in 899e466237 outdated
    23-        bool mutation = false;
    24-        uint256 hash = ComputeMerkleRoot(std::vector<uint256>(leaves), &mutation);
    25-        leaves[mutation] = hash;
    26+        bool mutation{false};
    27+        const uint256 hash{ComputeMerkleRoot(std::vector(leaves), &mutation)};
    28+        assert(mutation == false && hash == uint256{"18ff9b2bef46a834b2625871b2839fec3b67fc1243920c424858324fbde1a8de"});
    


    luke-jr commented at 11:33 am on May 20, 2025:
    How does this avoid optimisations?

    l0rinc commented at 1:21 pm on May 20, 2025:
    the ComputeMerkleRoot calculation has to be performed to be able to validate this assert - it’s can’t eliminate the call.
  6. luke-jr approved
  7. luke-jr commented at 11:34 am on May 20, 2025: member
    utACK main commit. Not sure why the benchmark is being changed.
  8. l0rinc commented at 1:28 pm on May 20, 2025: contributor
    Thanks for the review!
  9. yuvicc commented at 8:34 pm on June 13, 2025: contributor

    Concept ACK

    Memory allocation/deallocation is slow and expensive.

  10. yuvicc commented at 8:42 pm on June 13, 2025: contributor
    I have thought on changing the benchmark tests as we would want to test the worst case scenario right? or else we could have both case for testing?
  11. yuvicc commented at 9:35 pm on June 13, 2025: contributor

    Benchmark Testing

    master@5757de4ddd37f9321ee6b338b40888fd3561fc00

    • With 9000
    0|             ns/leaf |              leaf/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|               52.62 |       19,005,746.07 |    0.2% |      0.01 | `MerkleRoot`
    3|               52.64 |       18,998,504.40 |    0.3% |      0.01 | `MerkleRoot`
    4|               52.63 |       18,999,727.67 |    0.2% |      0.01 | `MerkleRoot`
    
    • With 9001
    0|             ns/leaf |              leaf/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|               53.50 |       18,693,063.88 |    0.3% |      0.01 | `MerkleRoot`
    3|               53.53 |       18,681,211.49 |    0.5% |      0.01 | `MerkleRoot`
    4|               53.49 |       18,694,053.87 |    0.5% |      0.01 | `MerkleRoot`
    

    Commit 39b6c139bd6be33699af781f1d71f6fed303d468

    • With 9000
    0|             ns/leaf |              leaf/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|               52.51 |       19,043,628.95 |    0.2% |      0.01 | `MerkleRoot`
    3|               52.52 |       19,040,989.96 |    0.2% |      0.01 | `MerkleRoot`
    4|               52.53 |       19,036,358.39 |    0.2% |      0.01 | `MerkleRoot`
    
    • With 9001
    0|             ns/leaf |              leaf/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|               53.40 |       18,713,525.67 |    0.3% |      0.01 | `MerkleRoot`
    3|               53.44 |       19,314,655.73 |    0.3% |      0.01 | `MerkleRoot`
    4|               53.41 |       18,883,462.75 |    0.3% |      0.01 | `MerkleRoot`
    
  12. l0rinc force-pushed on Jun 15, 2025
  13. l0rinc commented at 10:06 am on June 15, 2025: contributor
    Updated the benchmark to showcase the before/after state better (resembles production code changes), by splitting out the vector copies to the unmetered lambda and having an odd number of elements. Changed the previous leaves.reserve((block.vtx.size() + 1) & ~1ULL) rounding to leaves.reserve(block.vtx.size() + (block.vtx.size() & 1)).
  14. l0rinc requested review from luke-jr on Jun 15, 2025
  15. l0rinc requested review from yuvicc on Jun 15, 2025
  16. yuvicc commented at 5:26 pm on June 15, 2025: contributor

    Code review ACK 4cc5942895673591de4edceb6dd0c7188c302a72

    Testing

    Before Master@9a7eece5a4

    ns/leaf leaf/s err% total benchmark
    53.75 18,605,160.70 0.7% 0.01 MerkleRoot
    53.64 18,643,002.35 0.1% 0.01 MerkleRoot
    53.91 18,548,704.52 0.4% 0.01 MerkleRoot

    After commit@4cc5942895673591de4edceb6dd0c7188c302a72

    ns/leaf leaf/s err% total benchmark
    57.67 17,340,305.39 0.0% 0.52 MerkleRoot
    57.69 17,332,534.96 0.0% 0.52 MerkleRoot
    58.14 17,199,057.10 0.0% 0.52 MerkleRoot
  17. in src/bench/merkle_root.cpp:19 in 064f33bbe2 outdated
    18     std::vector<uint256> leaves;
    19-    leaves.resize(9001);
    20-    for (auto& item : leaves) {
    21-        item = rng.rand256();
    22+    leaves.resize(block_vtx_size);
    23+    for (auto& h : leaves) h = rng.rand256();
    


    maflcko commented at 9:11 am on June 20, 2025:
    why is this changed. touching a line to only remove the {} after the for loop seems odd and I had the impression that the dev notes preferred to use {}

    l0rinc commented at 11:19 am on June 20, 2025:
    I’m fine with both, reverted.
  18. in src/bench/merkle_root.cpp:22 in 064f33bbe2 outdated
    21-        item = rng.rand256();
    22+    leaves.resize(block_vtx_size);
    23+    for (auto& h : leaves) h = rng.rand256();
    24+
    25+    constexpr size_t iterations{1000};
    26+    std::vector<std::vector<uint256>> cases(iterations);
    


    maflcko commented at 9:14 am on June 20, 2025:
    not sure about pre-allocating. All real code paths allocate in-line for each iteration, so the benchmark should follow that. This would also avoid eating 100+mb for a microbenchmark?

    l0rinc commented at 10:42 am on June 20, 2025:
    I wanted to explicitly only measure the ComputeMerkleRoot call here without collecting hashes into a vector. But if you think that part should de included, I don’t mind, but in that case we won’t measure the behavior change in ComputeMerkleRoot anymore. However, I think the proper allocation makes sense regardless - e.g. for 2001 entries now we’ll allocate 2002 entries instead of 4002 like before.
  19. in src/consensus/merkle.cpp:71 in 4cc5942895 outdated
    65@@ -66,20 +66,20 @@ uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
    66 uint256 BlockMerkleRoot(const CBlock& block, bool* mutated)
    67 {
    68     std::vector<uint256> leaves;
    69-    leaves.resize(block.vtx.size());
    70-    for (size_t s = 0; s < block.vtx.size(); s++) {
    71-        leaves[s] = block.vtx[s]->GetHash();
    72+    leaves.reserve(block.vtx.size() + (block.vtx.size() & 1)); // capacity rounded up to even
    73+    for (const auto& tx : block.vtx) {
    74+        leaves.emplace_back(tx->GetHash());
    


    maflcko commented at 9:19 am on June 20, 2025:
    why is this changed? I don’t think a byte array can meaningfully be moved (of course it compiles, but moving a byte is no different than copying it)

    l0rinc commented at 11:19 am on June 20, 2025:
    After I have used leaves.emplace_back(); // The witness hash of the coinbase is 0 in BlockWitnessMerkleRoot it seemed more consistent to me to use emplace_back in BlockMerkleRoot as well - as you said it should do the same. I’m fine with both, I’ve changed them inside the loop to push_back.
  20. in src/consensus/merkle.cpp:81 in 4cc5942895 outdated
    83-    leaves[0].SetNull(); // The witness hash of the coinbase is 0.
    84-    for (size_t s = 1; s < block.vtx.size(); s++) {
    85-        leaves[s] = block.vtx[s]->GetWitnessHash();
    86+    leaves.reserve(block.vtx.size() + (block.vtx.size() & 1)); // capacity rounded up to even
    87+    leaves.emplace_back(); // The witness hash of the coinbase is 0.
    88+    for (size_t i{1}; i < block.vtx.size(); ++i) {
    


    maflcko commented at 9:20 am on June 20, 2025:
    why is this changed? I don’t think it is worth to touch a line just to change the name of the variable from one single char to another single char.

    l0rinc commented at 11:19 am on June 20, 2025:
    There was no other loops with the same variable after the change anymore, so it doesn’t need to be a non-standard loop variable name (wasn’t sure what s meant in this context) - but since it’s important to you I’ve reverted them.
  21. maflcko commented at 9:22 am on June 20, 2025: member
    not sure how meaningful this is, but it looks like there are a bunch of unrelated style changes that don’t seem to have a benefit?
  22. l0rinc force-pushed on Jun 20, 2025
  23. l0rinc commented at 11:24 am on June 20, 2025: contributor

    The changes revert the original benchmark’s behavior of including the leaf vector in the benchmark (more explicitly and reserved like the production code). I’ve also changed the emplace_back calls for the hashes to push_bash and reverted the original loop conditions to minimize the diff in critical code sections.

    Note to yuvicc: previous benchmark version was measuring something else than master (only the merkle call without the vector copy), so we can’t directly compare the results.

    The new results don’t show any significant speed increase, only the memory allocation is more predictable - which isn’t captured by the benchmark anymore.

  24. l0rinc requested review from yuvicc on Jun 25, 2025
  25. l0rinc requested review from maflcko on Jun 25, 2025
  26. l0rinc force-pushed on Jun 25, 2025
  27. l0rinc commented at 9:33 pm on June 25, 2025: contributor

    I’ve rebased the changed and adjusted the benchmark to be more similar to the other production usages in the first commit, rounded to even in the second (optimization) commit, so that we can realistically measure the speed difference before and after:

    % build/bin/bench_bitcoin -filter=‘MerkleRoot’ –min-time=1000

    Before 7f620cffebee593e48434cf182cc2fd64a6d76be:

    ns/leaf leaf/s err% total benchmark
    45.50 21,975,688.66 0.0% 1.10 MerkleRoot
    45.53 21,962,546.91 0.1% 1.10 MerkleRoot
    45.53 21,965,162.52 0.0% 1.10 MerkleRoot

    After a73dd45e95f5e909d0950ba7c76225589df74854:

    ns/leaf leaf/s err% total benchmark
    45.04 22,200,976.95 0.1% 1.10 MerkleRoot
    44.97 22,235,208.25 0.1% 1.10 MerkleRoot
    45.01 22,217,033.81 0.1% 1.10 MerkleRoot
  28. l0rinc force-pushed on Jun 25, 2025
  29. yuvicc commented at 7:22 am on June 27, 2025: contributor

    lgtm re-ACK be8f3053a7ad526823f32f1a70847b9c06c4c54b

    Before e87430ed5fad6f9d90c1e7d256b9b8276b936c24

    ns/leaf leaf/s err% total benchmark
    53.83 18,577,129.91 0.1% 1.10 MerkleRoot
    53.62 18,648,858.81 0.1% 1.10 MerkleRoot
    53.70 18,621,594.40 0.1% 1.10 MerkleRoot

    After be8f3053a7ad526823f32f1a70847b9c06c4c54b

    ns/leaf leaf/s err% total benchmark
    53.02 18,860,232.62 0.2% 1.10 MerkleRoot
    52.88 18,910,755.68 0.0% 1.10 MerkleRoot
    52.89 18,906,671.63 0.1% 1.10 MerkleRoot
  30. l0rinc force-pushed on Jul 30, 2025
  31. l0rinc commented at 11:58 pm on July 30, 2025: contributor
    Rebased and added a new commit on top for deduplicating the integer rounding. Ready for review again.
  32. l0rinc force-pushed on Jul 30, 2025
  33. DrahtBot added the label CI failed on Jul 30, 2025
  34. DrahtBot commented at 11:58 pm on July 30, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/runs/47077763348 LLM reason (✨ experimental): Lint check failed due to missing include guards in src/util/ints.h.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  35. DrahtBot removed the label CI failed on Jul 31, 2025
  36. optout21 commented at 6:36 am on August 7, 2025: contributor

    LGTM! Minor comments:

    • Maybe add a note to the ComputeMerkleRoot declaration, that the hashes input is expected to have one extra capacity (in case size is odd). Minor, as this function is not used directly much.
    • Even more, it could debug-assert that capacity is even, but this may be a bit of a stretch.
  37. l0rinc force-pushed on Aug 7, 2025
  38. l0rinc commented at 7:03 pm on August 7, 2025: contributor

    it could debug-assert that capacity is even

    Added, thanks for the hint

    Edit: had to push a few variants, the fuzzing still seems to fail, so I removed the Assert/Assume from ComputeMerkleRoot (it’s not a requirement anyway, it’s just more performant)

  39. l0rinc force-pushed on Aug 7, 2025
  40. DrahtBot added the label CI failed on Aug 7, 2025
  41. DrahtBot commented at 7:31 pm on August 7, 2025: contributor

    🚧 At least one of the CI tasks failed. Task multiprocess, i686, DEBUG: https://github.com/bitcoin/bitcoin/runs/47624077326 LLM reason (✨ experimental): The CI failure is caused by an assertion failure in consensus/merkle.cpp during the merkle root computation.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  42. l0rinc force-pushed on Aug 7, 2025
  43. l0rinc force-pushed on Aug 7, 2025
  44. DrahtBot removed the label CI failed on Aug 8, 2025
  45. optout21 commented at 8:20 am on August 8, 2025: contributor

    reACK 8af0a3a72c8af781bebed7289657c637220df893

    (prev: ACK e3b02f1539e59a6a20a00b4161219b5aba6fefed ReACK. Nicely simplified. a7997d02ebe5b72e706712f43884644b705a1cd1 Good optimization, nice catch!)

  46. in src/test/fuzz/integer.cpp:85 in a7997d02eb outdated
    80@@ -80,7 +81,8 @@ FUZZ_TARGET(integer, .init = initialize_integer)
    81     }
    82     constexpr uint256 u256_min{"0000000000000000000000000000000000000000000000000000000000000000"};
    83     constexpr uint256 u256_max{"ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"};
    84-    const std::vector<uint256> v256{u256, u256_min, u256_max};
    85+    std::vector<uint256> v256{u256, u256_min, u256_max};
    86+    v256.reserve(RoundUpToEven(v256.size()));
    


    achow101 commented at 1:34 am on August 9, 2025:
    This reserve seems to be for ComputeMerkleRoot. However, I think it would be better to have ComputeMerkleRoot do any reservations necessary rather than relying on the caller to do so.

    l0rinc commented at 2:32 am on August 9, 2025:

    ComputeMerkleRoot already received a list of values, usually sized to fit their current size exactly (i.e. capacity() == size()). We can of course increase the capacity inside - it’s what:

    0if (hashes.size() & 1) {
    1    hashes.push_back(hashes.back());
    2}
    

    does already, but if we want to avoid reallocations, we have to predict the size of the vector before we start populating it.


    We could add some kind of lambda as a parameter and an expected size maybe, something like:

    0template <typename Gen>
    1uint256 GenerateMerkleRoot(std::size_t count, Gen&& gen, bool* mutated = nullptr)
    2{
    3    std::vector<uint256> leaves;
    4    leaves.reserve(RoundUpToEven(count));
    5    for (std::size_t i{0}; i < count; ++i) {
    6        leaves.push_back(gen(i));
    7    }
    8    return ComputeMerkleRoot(std::move(leaves), mutated);
    9}
    

    and the production code callers would look like:

     0static uint256 ComputeModifiedMerkleRoot(const CMutableTransaction& cb, const CBlock& block)
     1{
     2    return GenerateMerkleRoot(block.vtx.size(), [&](auto i) { return i == 0 ? cb.GetHash() : block.vtx[i]->GetHash(); });
     3}
     4
     5uint256 BlockMerkleRoot(const CBlock& block, bool* mutated)
     6{
     7    return GenerateMerkleRoot(block.vtx.size(), [&](auto i) { return block.vtx[i]->GetHash(); }, mutated);
     8}
     9uint256 BlockWitnessMerkleRoot(const CBlock& block, bool* mutated)
    10{
    11    return GenerateMerkleRoot(block.vtx.size(), [&](auto i) { return i == 0 ? uint256{} : uint256(block.vtx[i]->GetWitnessHash()); }, mutated);
    12}
    

    Would that be simpler in your opinion?

     0diff --git a/src/consensus/merkle.cpp b/src/consensus/merkle.cpp
     1index 8c56d979a5..a0e358b005 100644
     2--- a/src/consensus/merkle.cpp
     3+++ b/src/consensus/merkle.cpp
     4@@ -5,7 +5,6 @@
     5 #include <consensus/merkle.h>
     6 #include <hash.h>
     7 #include <util/check.h>
     8-#include <util/ints.h>
     9 
    10 /*     WARNING! If you're reading this because you're learning about crypto
    11        and/or designing a new system that will use merkle trees, keep in mind
    12@@ -63,26 +62,14 @@ uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
    13     return hashes[0];
    14 }
    15 
    16-
    17 uint256 BlockMerkleRoot(const CBlock& block, bool* mutated)
    18 {
    19-    std::vector<uint256> leaves;
    20-    leaves.reserve(RoundUpToEven(block.vtx.size()));
    21-    for (size_t s = 0; s < block.vtx.size(); s++) {
    22-        leaves.push_back(block.vtx[s]->GetHash());
    23-    }
    24-    return ComputeMerkleRoot(std::move(leaves), mutated);
    25+    return GenerateMerkleRoot(block.vtx.size(), [&](auto i) { return block.vtx[i]->GetHash(); }, mutated);
    26 }
    27 
    28 uint256 BlockWitnessMerkleRoot(const CBlock& block, bool* mutated)
    29 {
    30-    std::vector<uint256> leaves;
    31-    leaves.reserve(RoundUpToEven(block.vtx.size()));
    32-    leaves.emplace_back(); // The witness hash of the coinbase is 0.
    33-    for (size_t s = 1; s < block.vtx.size(); s++) {
    34-        leaves.push_back(block.vtx[s]->GetWitnessHash());
    35-    }
    36-    return ComputeMerkleRoot(std::move(leaves), mutated);
    37+    return GenerateMerkleRoot(block.vtx.size(), [&](auto i) { return i == 0 ? uint256{} : uint256(block.vtx[i]->GetWitnessHash()); }, mutated);
    38 }
    39 
    40 /* This implements a constant-space merkle root/path calculator, limited to 2^32 leaves. */
    41diff --git a/src/consensus/merkle.h b/src/consensus/merkle.h
    42index c722cbe446..8b5c14157a 100644
    43--- a/src/consensus/merkle.h
    44+++ b/src/consensus/merkle.h
    45@@ -9,9 +9,21 @@
    46 
    47 #include <primitives/block.h>
    48 #include <uint256.h>
    49+#include <util/ints.h>
    50 
    51 uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated = nullptr);
    52 
    53+template <typename Gen>
    54+uint256 GenerateMerkleRoot(std::size_t count, Gen&& gen, bool* mutated = nullptr)
    55+{
    56+    std::vector<uint256> leaves;
    57+    leaves.reserve(RoundUpToEven(count));
    58+    for (std::size_t i{0}; i < count; ++i) {
    59+        leaves.push_back(gen(i));
    60+    }
    61+    return ComputeMerkleRoot(std::move(leaves), mutated);
    62+}
    63+
    64 /*
    65  * Compute the Merkle root of the transactions in a block.
    66  * *mutated is set to true if a duplicated subtree was found.
    67diff --git a/src/signet.cpp b/src/signet.cpp
    68index 3104b7f2d7..5aea084f15 100644
    69--- a/src/signet.cpp
    70+++ b/src/signet.cpp
    71@@ -58,13 +58,8 @@ static bool FetchAndClearCommitmentSection(const std::span<const uint8_t> header
    72 
    73 static uint256 ComputeModifiedMerkleRoot(const CMutableTransaction& cb, const CBlock& block)
    74 {
    75-    std::vector<uint256> leaves;
    76-    leaves.reserve(RoundUpToEven(block.vtx.size()));
    77-    leaves.push_back(cb.GetHash());
    78-    for (size_t s = 1; s < block.vtx.size(); ++s) {
    79-        leaves.push_back(block.vtx[s]->GetHash());
    80-    }
    81-    return ComputeMerkleRoot(std::move(leaves));
    82+    return GenerateMerkleRoot(block.vtx.size(),
    83+        [&](auto i) { return i ? block.vtx[i]->GetHash() : cb.GetHash(); });
    84 }
    85 
    86 std::optional<SignetTxs> SignetTxs::Create(const CBlock& block, const CScript& challenge)
    

    achow101 commented at 6:15 am on August 9, 2025:

    Would that be simpler in your opinion?

    No.

    but if we want to avoid reallocations, we have to predict the size of the vector before we start populating it.

    We can also avoid inserting into the vector in the first place, e.g. this diff gives me about the same speedup without needing to change anything with allocations:

     0diff --git a/src/consensus/merkle.cpp b/src/consensus/merkle.cpp
     1index 7dd24e1868f..46056bfa3e0 100644
     2--- a/src/consensus/merkle.cpp
     3+++ b/src/consensus/merkle.cpp
     4@@ -44,6 +44,7 @@
     5 
     6 
     7 uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
     8+    std::array<uint256, 2> odd_hash;
     9     bool mutation = false;
    10     while (hashes.size() > 1) {
    11         if (mutated) {
    12@@ -51,11 +52,20 @@ uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
    13                 if (hashes[pos] == hashes[pos + 1]) mutation = true;
    14             }
    15         }
    16-        if (hashes.size() & 1) {
    17+        bool need_dup = hashes.capacity() == hashes.size() && (hashes.size() & 1);
    18+        if (need_dup) {
    19+            odd_hash[0] = hashes.back();
    20+            odd_hash[1] = hashes.back();
    21+        } else if (hashes.size() & 1) {
    22             hashes.push_back(hashes.back());
    23         }
    24         SHA256D64(hashes[0].begin(), hashes[0].begin(), hashes.size() / 2);
    25         hashes.resize(hashes.size() / 2);
    26+        if (need_dup) {
    27+            SHA256D64(odd_hash[0].begin(), odd_hash[0].begin(), 1);
    28+            assert(hashes.capacity() > hashes.size());
    29+            hashes.push_back(odd_hash[0]);
    30+        }
    31     }
    32     if (mutated) *mutated = mutation;
    33     if (hashes.size() == 0) return uint256();
    

    l0rinc commented at 6:21 am on August 9, 2025:
    Clever, we can of course calculate the extra value independently, but isn’t it a lot simpler to just precalculate the final size before insertions and keep ComputeMerkleRoot unchanged?

    achow101 commented at 6:58 am on August 9, 2025:
    I think it’s simpler if the caller does not have to be aware that they should allocate a larger vector to be passed in.

    l0rinc commented at 3:41 pm on August 9, 2025:
    The lambda solution is even simpler in that case, the caller doesn’t even have to allocate or iterate at all.

    l0rinc commented at 8:44 pm on August 13, 2025:
    Addressed it in a different way in latest push, let me know if this solves your concern or if it’s orthogonal.

    l0rinc commented at 1:04 pm on November 22, 2025:
    Reverted to the original implementation, it’s the least-invasive change with simplest diff

    vasild commented at 1:11 pm on December 9, 2025:

    I think it’s simpler if the caller does not have to be aware that they should allocate a larger vector to be passed in.

    +1

    I do not understand why the patch above works :-/ but I like that it is contained inside ComputeMerkleRoot().


    l0rinc commented at 1:42 pm on December 9, 2025:
    Given that reviewers found changing a resize to a reserve too risky for this part of the code, I doubt this would pass review.
  47. DrahtBot added the label Needs rebase on Aug 13, 2025
  48. l0rinc force-pushed on Aug 13, 2025
  49. l0rinc commented at 8:44 pm on August 13, 2025: contributor

    Rebased after #33116 - the only conflict was adding .ToUint256() to the merkle hashes.

    Based on the suggestion of @achow101 (since I do agree that it’s kinda’ leaking abstractions and there’s also a lot of duplication for collecting the merkle leaves), I have also deduplicated all call sites without needing to touch the consensusy Merkle calculation itself. We can still do that in a follow-up, if needed, but this way we’ve made the code both simpler and more performant (which wasn’t the case with either the original or the suggestion), and separated the leaves abstraction and kept the risky merkle root calculation intact. I have remeasured the before/after benchmarks, no change in performance since last push, but updated the commit messages anyway.

  50. DrahtBot added the label CI failed on Aug 13, 2025
  51. DrahtBot commented at 9:49 pm on August 13, 2025: contributor

    🚧 At least one of the CI tasks failed. Task tidy: https://github.com/bitcoin/bitcoin/runs/48036589106 LLM reason (✨ experimental): The failure is caused by an error in clang-tidy related to the use of std::move on a const variable, indicating a code warning treated as an error.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  52. l0rinc force-pushed on Aug 13, 2025
  53. DrahtBot removed the label Needs rebase on Aug 13, 2025
  54. DrahtBot removed the label CI failed on Aug 13, 2025
  55. darosior commented at 4:11 pm on November 13, 2025: member
    Sorry to be this guy again, but it seems to me unreasonable to touch such critical consensus code for such marginal benefits. Unless observable benefits can be shown, i lean toward Concept NACK.
  56. l0rinc commented at 4:14 pm on November 21, 2025: contributor

    Unless observable benefits can be shown

    The change itself is trivial, it’s just a +1 for the reserve before calling the Merkle calculation. I don’t mind simplifying the change if you think that’s easier to review (I could also just bump leaves.reserve(block.vtx.size()) to leaves.reserve(block.vtx.size() + 1) without the refactors). Originally it was similar to that, but based on concerns to decouple callers from needing to know the internals of Merkle calculations, I extracted the loops.

    I’m open to simplifying it back to just a reserve if you think that’s easier to review, but I’d argue that the underlying incorrect memory reservation is a minor bug that should be fixed.


    While this change does result in small speedup, the primary purpose is to avoid having to reallocate 2x additional memory for a predictable calculation - less memory fragmentation, lower memory usage, more predictable behavior. Let’s quantify the impact: running massif for the benchmark with an odd leaf count shows the memory peaks clearly:

     0COMMITS="f2334a0ae056a9e3cba218206d139b3688eeda96 de6a28396b8bf7a3fd1124418b4cd5beba1906d6"; \
     1CC=gcc; CXX=g++; \
     2BASE_DIR="/mnt/my_storage"; LOG_DIR="$BASE_DIR/logs"; \
     3FILTER="MerkleRoot"; MIN_TIME=200000; \
     4mkdir -p "$LOG_DIR"; \
     5(echo "memory benchmark | filter=$FILTER | min-time=${MIN_TIME}ms | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM";) &&\
     6(for c in $COMMITS; do git log -1 --pretty='%h %s' $c || exit 1; done; echo "") && \
     7for COMMIT in $COMMITS; do \
     8  SHORT_COMMIT=${COMMIT:0:8}; \
     9  echo "Processing commit $SHORT_COMMIT..."; \
    10  git fetch -q origin $COMMIT && git checkout -q $COMMIT && \
    11  git clean -fxd && git reset --hard && \
    12  cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_BENCH=ON -DENABLE_IPC=OFF && \
    13  ninja -C build bench_bitcoin -j2 && \
    14  valgrind --tool=massif --time-unit=ms \
    15    --massif-out-file="$LOG_DIR/massif-bench-$FILTER-$SHORT_COMMIT.out" \
    16    build/bin/bench_bitcoin -filter="$FILTER" --min-time=$MIN_TIME && \
    17  ms_print "$LOG_DIR/massif-bench-$FILTER-$SHORT_COMMIT.out" | tee "$LOG_DIR/massif-bench-$FILTER-$SHORT_COMMIT.txt" && \
    18  echo "Completed $SHORT_COMMIT"; \
    19done && \
    20echo "" && echo "Results in: $LOG_DIR/massif-bench-$FILTER-*.{out,txt}"
    

    We can also clearly see the difference in the capture stack traces. Before, having 1.3 MB peak memory usage:

     0    MB
     11.332^        :                                                               
     2     | #      :                                                               
     3     | #      :                                                               
     4     | #      :                                                               
     5     | #      :                                                               
     6     | #    @ :                                                               
     7     | #    @ :                                                               
     8     | #    @ :                                                               
     9     | #    @ :                                                               
    10     | #    @ :                                                               
    11     | #    @ :                                                               
    12     | #    @ :                                                               
    13     | #    @ :                                                               
    14     | #::::@::::::::::::::::::::::::::::::::::::::::::::::::::::::@:::::@::::
    15     | #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
    16     | #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
    17     | #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
    18     | #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
    19     | #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
    20     | #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
    21   0 +----------------------------------------------------------------------->s
    22     0                                                                   226.2
    

    showing the reallocations clearly in the stacks:

    097.87% (1,366,841B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
    1->41.25% (576,064B) 0x969717: allocate (new_allocator.h:151)
    2| ->41.25% (576,064B) 0x969717: allocate (allocator.h:203)
    3|   ->41.25% (576,064B) 0x969717: allocate (alloc_traits.h:614)
    4|     ->41.25% (576,064B) 0x969717: _M_allocate (stl_vector.h:387)
    5|       ->41.25% (576,064B) 0x969717: _M_realloc_append<const uint256&> (vector.tcc:572)
    6|         ->41.25% (576,064B) 0x969717: push_back (stl_vector.h:1427)
    7|           ->41.25% (576,064B) 0x969717: ComputeMerkleRoot(std::vector<uint256, std::allocator<uint256> >, bool*) (merkle.cpp:55)
    8|             ->41.25% (576,064B) 0x2235A7: operator() (merkle_root.cpp:31)
    9|               ->41.25% (576,064B) 0x2235A7: ankerl::nanobench::Bench& ankerl::nanobench::Bench::run<MerkleRoot(ankerl::nanobench::Bench&)::{lambda()
    

    After, having a 0.8 MB peak memory usage

     0    KB
     1801.4^ #                                                                      
     2     | #                                                                      
     3     | #                                                                      
     4     | #                                                                      
     5     | #                                                                      
     6     | #                                                                      
     7     | #                                                                      
     8     | #                                                         :::::@:::::@:
     9     | #:::@@@::@:::::::::::::::@::@:@:::@@:::::::::@::::::@:::::@::::@:::::@:
    10     | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
    11     | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
    12     | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
    13     | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
    14     | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
    15     | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
    16     | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
    17     | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
    18     | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
    19     | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
    20     | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
    21   0 +----------------------------------------------------------------------->s
    22     0                                                                   227.5
    

    and the stacks don’t show reallocs anymore:

    096.37% (790,809B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
    1->35.10% (288,064B) 0x2234AF: allocate (new_allocator.h:151)
    2| ->35.10% (288,064B) 0x2234AF: allocate (allocator.h:203)
    3|   ->35.10% (288,064B) 0x2234AF: allocate (alloc_traits.h:614)
    4|     ->35.10% (288,064B) 0x2234AF: _M_allocate (stl_vector.h:387)
    5|       ->35.10% (288,064B) 0x2234AF: reserve (vector.tcc:79)
    6|         ->35.10% (288,064B) 0x2234AF: ToMerkleLeaves<std::vector<uint256>, MerkleRoot(ankerl::nanobench::Bench&)::<lambda()>::<lambda(bool, const auto:46&)> > (merkle.h:19)
    7|           ->35.10% (288,064B) 0x2234AF: operator() (merkle_root.cpp:25)
    8|             ->35.10% (288,064B) 0x2234AF: ankerl::nanobench::Bench& ankerl::nanobench::Bench::run<MerkleRoot(ankerl::nanobench::Bench&)::{lambda()
    
  57. in src/test/merkle_tests.cpp:240 in de6a28396b
    241-    hashes[0].SetNull();
    242-    hashes[1] = block.vtx[1]->GetHash().ToUint256();
    243-
    244-    uint256 merkleRootofHashes = ComputeMerkleRoot(hashes);
    245+    auto leaves{ToMerkleLeaves(block.vtx, [&](bool first, const auto& tx) {
    246+        return first ? uint256{} : tx->GetWitnessHash().ToUint256();
    


    hodlinator commented at 6:54 pm on November 21, 2025:

    Why are we changing from GetHash() to GetWitnessHash() here?

    For txs without witness data they return the same value, which master seems to implicitly validate.


    l0rinc commented at 9:58 am on November 22, 2025:
    We’re comparing it against the results of BlockWitnessMerkleRoot which returns GetWitnessHash. #16865 didn’t specify why they’re comparing GetHash against GetWitnessHash here, I deduced it’s a coincidence. I don’t mind keeping GetHash if you think it was deliberate.
  58. in src/consensus/merkle.cpp:67 in de6a28396b
    68-    std::vector<uint256> leaves;
    69-    leaves.resize(block.vtx.size());
    70-    for (size_t s = 0; s < block.vtx.size(); s++) {
    71-        leaves[s] = block.vtx[s]->GetHash().ToUint256();
    72-    }
    73+    auto leaves{ToMerkleLeaves(block.vtx, [&](auto _, const auto& tx) {
    


    hodlinator commented at 7:02 pm on November 21, 2025:

    If we keep the lambda, _ is more the Python way of declaring unused variables, while C/C++ usually omits the identifier entirely:

    0    auto leaves{ToMerkleLeaves(block.vtx, [](bool, const auto& tx) {
    

    or

    0    auto leaves{ToMerkleLeaves(block.vtx, [](auto /*first*/, const auto& tx) {
    

    Notice I also removed the capture-& when it is not needed, which makes the lambda more pure (less cognitive load). (Also prefer bool over auto since it doesn’t cost a single char, but that’s more subjective).

     0diff --git a/src/bench/merkle_root.cpp b/src/bench/merkle_root.cpp
     1index bd8524c805..55baed6e02 100644
     2--- a/src/bench/merkle_root.cpp
     3+++ b/src/bench/merkle_root.cpp
     4@@ -22,7 +22,7 @@ static void MerkleRoot(benchmark::Bench& bench)
     5 
     6     constexpr uint256 expected_root{"d8d4dfd014a533bc3941b8663fa6e7f3a8707af124f713164d75b0c3179ecb08"};
     7     bench.batch(hashes.size()).unit("leaf").run([&] {
     8-        auto leaves{ToMerkleLeaves(hashes, [&](bool, const auto& h) { return h; })};
     9+        auto leaves{ToMerkleLeaves(hashes, [](bool, const auto& h) { return h; })};
    10         const uint256 root{ComputeMerkleRoot(std::move(leaves))};
    11         assert(root == expected_root);
    12     });
    13diff --git a/src/consensus/merkle.cpp b/src/consensus/merkle.cpp
    14index 4f637110dd..41eb1f6e5a 100644
    15--- a/src/consensus/merkle.cpp
    16+++ b/src/consensus/merkle.cpp
    17@@ -64,7 +64,7 @@ uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
    18 
    19 uint256 BlockMerkleRoot(const CBlock& block, bool* mutated)
    20 {
    21-    auto leaves{ToMerkleLeaves(block.vtx, [&](auto _, const auto& tx) {
    22+    auto leaves{ToMerkleLeaves(block.vtx, [](bool, const auto& tx) {
    23         return tx->GetHash().ToUint256();
    24     })};
    25     return ComputeMerkleRoot(std::move(leaves), mutated);
    26@@ -72,7 +72,7 @@ uint256 BlockMerkleRoot(const CBlock& block, bool* mutated)
    27 
    28 uint256 BlockWitnessMerkleRoot(const CBlock& block, bool* mutated)
    29 {
    30-    auto leaves{ToMerkleLeaves(block.vtx, [&](auto first, const auto& tx) {
    31+    auto leaves{ToMerkleLeaves(block.vtx, [](bool first, const auto& tx) {
    32         return first ? uint256{} : tx->GetWitnessHash().ToUint256();
    33     })};
    34     return ComputeMerkleRoot(std::move(leaves), mutated);
    35diff --git a/src/signet.cpp b/src/signet.cpp
    36index 9c564d66d7..ed42b554f3 100644
    37--- a/src/signet.cpp
    38+++ b/src/signet.cpp
    39@@ -57,7 +57,7 @@ static bool FetchAndClearCommitmentSection(const std::span<const uint8_t> header
    40 
    41 static uint256 ComputeModifiedMerkleRoot(const CMutableTransaction& cb, const CBlock& block)
    42 {
    43-    auto leaves{ToMerkleLeaves(block.vtx, [&](auto first, const auto& tx) {
    44+    auto leaves{ToMerkleLeaves(block.vtx, [&](bool first, const auto& tx) {
    45         return (first ? cb.GetHash() : tx->GetHash()).ToUint256();
    46     })};
    47     return ComputeMerkleRoot(std::move(leaves));
    48diff --git a/src/test/fuzz/merkle.cpp b/src/test/fuzz/merkle.cpp
    49index 4ac83ffb81..b4136008e2 100644
    50--- a/src/test/fuzz/merkle.cpp
    51+++ b/src/test/fuzz/merkle.cpp
    52@@ -44,7 +44,7 @@ FUZZ_TARGET(merkle)
    53     }
    54 
    55     const std::size_t num_txs{block->vtx.size()};
    56-    auto tx_hashes{ToMerkleLeaves(block->vtx, [&](bool, const auto& tx) {
    57+    auto tx_hashes{ToMerkleLeaves(block->vtx, [](bool, const auto& tx) {
    58         return tx->GetHash().ToUint256();
    59     })};
    60 
    61diff --git a/src/test/merkle_tests.cpp b/src/test/merkle_tests.cpp
    62index c31368c9ec..c8f8079d22 100644
    63--- a/src/test/merkle_tests.cpp
    64+++ b/src/test/merkle_tests.cpp
    65@@ -236,7 +236,7 @@ BOOST_AUTO_TEST_CASE(merkle_test_BlockWitness)
    66 
    67     uint256 blockWitness = BlockWitnessMerkleRoot(block);
    68 
    69-    auto leaves{ToMerkleLeaves(block.vtx, [&](bool first, const auto& tx) {
    70+    auto leaves{ToMerkleLeaves(block.vtx, [](bool first, const auto& tx) {
    71         return first ? uint256{} : tx->GetWitnessHash().ToUint256();
    72     })};
    73     uint256 merkleRootofHashes{ComputeMerkleRoot(std::move(leaves))};
    

    l0rinc commented at 1:02 pm on November 22, 2025:
    Thanks, ended up reverting ToMerkleLeaves completely to simplify the review
  59. hodlinator commented at 9:31 pm on November 21, 2025: contributor

    Concept ACK de6a28396b8bf7a3fd1124418b4cd5beba1906d6

    ComputeMerkleRoot(std::vector<uint256> hashes, ...) takes a vector by value, but luckily call sites already move into it, and since the underlying array/memory is transferred on move, the capacity “transfers as well” - neat!


    Sorry to be this guy again, but it seems to me unreasonable to touch such critical consensus code for such marginal benefits. Unless observable benefits can be shown, …

    Thanks for being that guy!

    Going by #32497 (comment) and #32497 (comment) there seems to be a 2% speed increase for computing merkle roots.

    I cannot spot any issue with how this PR currently changes the consensus code. However, maybe we could reduce the scope of the change somewhat to make it more palatable. Here is my attempt: https://github.com/bitcoin/bitcoin/compare/master...hodlinator:bitcoin:refs/heads/pr/32497_minimal#diff-706988c23877f8a557484053887f932b2cafb3b5998b50497ce7ff8118ac85a3 (link directly to src/consensus/merkle.cpp)

    This 2% speed increase in merkle root computation is probably not at all measurable for a full IBD though. So if there is no variant of this optimization that inspires confidence, I can understand the push-back.

  60. l0rinc commented at 1:00 pm on November 22, 2025: contributor

    Latest push:

    • rebases against latest master;
    • reverts code back to simply change leaves.resize(block.vtx.size()) constructs before calling ComputeMerkleRoot to .reserve((block.vtx.size() + 1) & ~1ULL); ** added godbolt reproducer for each important combination of compiler and architecture
    • Separates the test changes to a separate commit explaining the motivation in the commit message
    • updated commit messages and PR descriptions.
  61. l0rinc force-pushed on Nov 22, 2025
  62. in src/bench/merkle_root.cpp:27 in 45dc62f0ed outdated
    30+
    31+    constexpr uint256 expected_root{"d8d4dfd014a533bc3941b8663fa6e7f3a8707af124f713164d75b0c3179ecb08"};
    32+    bench.batch(hashes.size()).unit("leaf").run([&] {
    33+        std::vector<uint256> leaves;
    34+        leaves.reserve(hashes.size());
    35+        for (size_t s = 0; s < hashes.size(); s++) {
    


    optout21 commented at 10:37 am on November 27, 2025:

    nit: consistent coding style

    0-        for (size_t s = 0; s < hashes.size(); s++) {
    1+        for (auto s{0}; s < hashes.size(); ++s) {
    

    l0rinc commented at 11:06 am on November 27, 2025:
    I deliberately avoided changing any of that in this commit: #32497 (review)

    optout21 commented at 12:24 pm on November 27, 2025:
    Hmm, I see this as new code added by this commit, not present in the original.

    l0rinc commented at 2:08 pm on November 27, 2025:
    You’re right, it needs more explanation: I could change this to use a more modern style, but it would be different from the code that it’s trying to measure, namely: https://github.com/bitcoin/bitcoin/blob/9c24cda72edb2085edfa75296d6b42fab34433d9/src/consensus/merkle.cpp#L81 It’s a very tiny difference, but I usually prefer the benchmark being representative of the code so that it’s grep-friendly (in case we modify one and search for similar code)
  63. in src/bench/merkle_root.cpp:29 in 45dc62f0ed outdated
    32+    bench.batch(hashes.size()).unit("leaf").run([&] {
    33+        std::vector<uint256> leaves;
    34+        leaves.reserve(hashes.size());
    35+        for (size_t s = 0; s < hashes.size(); s++) {
    36+            leaves.push_back(hashes[s]);
    37+        }
    


    optout21 commented at 10:39 am on November 27, 2025:
    Retracted, vector size is different. Nevermind, sorry. nit: Could this copy be done simpler? (i.e. leaves = hashes, or leaves{hashes})
  64. bitcoin deleted a comment on Nov 27, 2025
  65. in src/bench/merkle_root.cpp:22 in e3b02f1539 outdated
    23+    for (auto& item : hashes) {
    24         item = rng.rand256();
    25     }
    26-    bench.batch(leaves.size()).unit("leaf").run([&] {
    27-        bool mutation = false;
    28-        uint256 hash = ComputeMerkleRoot(std::vector<uint256>(leaves), &mutation);
    


    hodlinator commented at 2:31 pm on November 27, 2025:
    in 45dc62f0edf71d52ae6b024dec6b11746e3c4076 “refactor: make MerkleRoot benchmark more representative”: Sending in the &mutation parameter makes the function perform more work (somewhat quadratically?). Are we sure we want to drop that?

    l0rinc commented at 3:06 pm on November 27, 2025:

    My reasoning was that I want the benchmark to reflect the actual usage, see: #32497 (comment)

    It wouldn’t be fair to see the ComputeMerkleRoot performance without the leaf constructions (and get an unrealistic speedup). What do you think?


    hodlinator commented at 8:25 pm on November 27, 2025:

    I’m fine with (edit: all for) including the leaf construction, but that’s not my concern in this case.

    mutation/mutated is used here: https://github.com/bitcoin/bitcoin/blob/14a4dca89adaab6f1ad42af6716f9065f1abe2e1/src/validation.cpp#L3944-L3961

    Which leads to this being run on every iteration through the outer while: https://github.com/bitcoin/bitcoin/blob/14a4dca89adaab6f1ad42af6716f9065f1abe2e1/src/consensus/merkle.cpp#L46-L53


    l0rinc commented at 8:54 pm on November 27, 2025:

    ComputeMerkleRoot is usually called without mutation checks: https://github.com/bitcoin/bitcoin/blob/fa283d28e261416f0785450ab687af199103776a/src/validation.cpp#L3929-L3932

    Most existing mutation checks are noops, see: #33805. The benchmark was modeling pre-segwit BlockMerkleRoot before, and now it’s closer to the behavior of BlockWitnessMerkleRoot.

    I haven’t measured it explicitly, but the latter should be more representative of modern usage - please let me know if you think I misunderstood something.


    hodlinator commented at 10:35 am on December 1, 2025:

    ComputeMerkleRoot is usually called without mutation checks

    That is true for witness-version of the roots but we still perform the checks for the non-witness root. Every time net_processing receives a regular block message (unless we are loading blocks), we check whether the block is mutated:

    https://github.com/bitcoin/bitcoin/blob/fa283d28e261416f0785450ab687af199103776a/src/net_processing.cpp#L4647

    IsBlockMutated() always calls CheckMerkleRoot() which always calls BlockMerkleRoot() (once for each block) with the bool param that ends up going into ComputeMerkleRoot().


     0diff --git a/src/bench/merkle_root.cpp b/src/bench/merkle_root.cpp
     1index 310fc56739..c0e63a09e7 100644
     2--- a/src/bench/merkle_root.cpp
     3+++ b/src/bench/merkle_root.cpp
     4@@ -33,4 +33,80 @@ static void MerkleRoot(benchmark::Bench& bench)
     5     });
     6 }
     7 
     8+static void MerkleRootMutation(benchmark::Bench& bench)
     9+{
    10+    FastRandomContext rng{/*fDeterministic=*/true};
    11+
    12+    std::vector<uint256> hashes;
    13+    hashes.resize(9001);
    14+    for (auto& item : hashes) {
    15+        item = rng.rand256();
    16+    }
    17+
    18+    constexpr uint256 expected_root{"d8d4dfd014a533bc3941b8663fa6e7f3a8707af124f713164d75b0c3179ecb08"};
    19+    bench.batch(hashes.size()).unit("leaf").run([&] {
    20+        std::vector<uint256> leaves;
    21+        bool mutation = false;
    22+        leaves.reserve((hashes.size() + 1) & ~1ULL); // capacity rounded up to even
    23+        for (size_t s = 0; s < hashes.size(); s++) {
    24+            leaves.push_back(hashes[s]);
    25+        }
    26+
    27+        const uint256 root{ComputeMerkleRoot(std::move(leaves), &mutation)};
    28+        assert(mutation == (root.GetUint64(0) & 1));
    29+        assert(root == expected_root);
    30+    });
    31+}
    32+
    33+static void MerkleRootMutationSimpleReserve(benchmark::Bench& bench)
    34+{
    35+    FastRandomContext rng{/*fDeterministic=*/true};
    36+
    37+    std::vector<uint256> hashes;
    38+    hashes.resize(9001);
    39+    for (auto& item : hashes) {
    40+        item = rng.rand256();
    41+    }
    42+
    43+    constexpr uint256 expected_root{"d8d4dfd014a533bc3941b8663fa6e7f3a8707af124f713164d75b0c3179ecb08"};
    44+    bench.batch(hashes.size()).unit("leaf").run([&] {
    45+        std::vector<uint256> leaves;
    46+        bool mutation = false;
    47+        leaves.reserve(hashes.size());
    48+        for (size_t s = 0; s < hashes.size(); s++) {
    49+            leaves.push_back(hashes[s]);
    50+        }
    51+
    52+        const uint256 root{ComputeMerkleRoot(std::move(leaves), &mutation)};
    53+        assert(mutation == (root.GetUint64(0) & 1));
    54+        assert(root == expected_root);
    55+    });
    56+}
    57+
    58+static void MerkleRootSimpleReserve(benchmark::Bench& bench)
    59+{
    60+    FastRandomContext rng{/*fDeterministic=*/true};
    61+
    62+    std::vector<uint256> hashes;
    63+    hashes.resize(9001);
    64+    for (auto& item : hashes) {
    65+        item = rng.rand256();
    66+    }
    67+
    68+    constexpr uint256 expected_root{"d8d4dfd014a533bc3941b8663fa6e7f3a8707af124f713164d75b0c3179ecb08"};
    69+    bench.batch(hashes.size()).unit("leaf").run([&] {
    70+        std::vector<uint256> leaves;
    71+        leaves.reserve(hashes.size());
    72+        for (size_t s = 0; s < hashes.size(); s++) {
    73+            leaves.push_back(hashes[s]);
    74+        }
    75+
    76+        const uint256 root{ComputeMerkleRoot(std::move(leaves))};
    77+        assert(root == expected_root);
    78+    });
    79+}
    80+
    81 BENCHMARK(MerkleRoot, benchmark::PriorityLevel::HIGH);
    82+BENCHMARK(MerkleRootMutation, benchmark::PriorityLevel::HIGH);
    83+BENCHMARK(MerkleRootMutationSimpleReserve, benchmark::PriorityLevel::HIGH);
    84+BENCHMARK(MerkleRootSimpleReserve, benchmark::PriorityLevel::HIGH);
    
    0₿ cmake --build build -t bench_bitcoin && build/bin/bench_bitcoin -filter="MerkleRoot.*" -min-time=10000
    1[1/4] Generating bitcoin-build-info.h
    2
    3|             ns/leaf |              leaf/s |    err% |        ins/leaf |        cyc/leaf |    IPC |       bra/leaf |   miss% |     total | benchmark
    4|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    5|              113.30 |        8,826,041.95 |    0.1% |        1,223.17 |          417.61 |  2.929 |           2.63 |    0.2% |     10.91 | `MerkleRoot`
    6|              114.78 |        8,712,680.79 |    0.0% |        1,239.18 |          423.01 |  2.929 |           3.63 |    0.2% |     10.98 | `MerkleRootMutation`
    7|              115.98 |        8,622,472.65 |    0.0% |        1,247.23 |          427.47 |  2.918 |           4.64 |    0.2% |     11.02 | `MerkleRootMutationSimpleReserve`
    8|              114.91 |        8,702,442.98 |    0.1% |        1,230.23 |          423.52 |  2.905 |           3.64 |    0.2% |     11.02 | `MerkleRootSimpleReserve`
    

    Relative improvement when not using the mutated parameter: MerkleRootSimpleReserve -> MerkleRoot (current PR): 113.30 -> 114.91 => 1.40% Relative improvement when bringing back the mutated parameter: MerkleRootMutationSimpleReserve (close to master) -> MerkleRootMutation: 115.98 -> 114.78 => 1.03%

    Conclusion:

    The somewhat quadratic aspect of calculating the mutated parameter of ComputeMerkleRoot() makes the improvement from reserving the correct size stand out less. The impact of keeping the mutated part had less impact than I feared though. I see 3 alternatives:

    1. Keep current PR code - But add clear note and justification to commit message & PR description of why we are no longer benchmarking the mutated param.
    2. Change back to no longer dropping the mutated param aspect of the benchmark.
    3. Maintain 2 benchmarks, one which exercises mutated and one which does not.

    My preference is 2). It accepts a lower relative improvement from reserving the correct amount, but I don’t see a clear justification for dropping mutated from benchmarking (as it is used in net_processing as described above). The signal of 1.03% is still clear. (Though some distance away from 2.4% currently stated in the PR description).


    l0rinc commented at 10:40 am on December 1, 2025:
    I can also just revert the benchmark completely, this is mostly a correctness (reserve the actual amount, not one less) and memory fix (don’t allocate 3x peak memory), I don’t expect it to speed up execution a lot, but I do expect it to fragment the memory less, see #32497 (comment)

    hodlinator commented at 10:52 am on December 1, 2025:

    I prefer doing the reserve-adjustment in the benchmark as well and doing the reserve inside of the benchmarked section (edited #32497 (review) earlier today).

    Makes sense that it could have a higher impact on memory fragmentation than execution speed.


    l0rinc commented at 12:10 pm on December 1, 2025:

    I have added a composite benchmark to measure both, added originally with resize + overwrite (as the code it measures), and in the last commit changes them to even reserve.

    Remeasured it as follows:

    0for commit in 5c77b621591a800095c0afd738e4b96c47400701 8bbc90a9811ee279265ad114376eff7e852bee10; do \
    1    git fetch origin $commit >/dev/null 2>&1 && git checkout $commit >/dev/null 2>&1 && echo "" && git log -1 --pretty='%h %s' && \
    2    rm -rf build >/dev/null 2>&1 && cmake -B build -DBUILD_BENCH=ON -DENABLE_IPC=OFF -DCMAKE_BUILD_TYPE=RelWithDebInfo >/dev/null 2>&1 && \
    3    cmake --build build -j$(nproc) >/dev/null 2>&1 && \
    4    for _ in $(seq 5); do \
    5      sleep 5; \
    6      sudo taskpolicy -t 5 -l 5 nice -n -20 ./build/bin/bench_bitcoin -filter='MerkleRoot.*' -min-time=1000; \
    7    done; \
    8done
    
     05c77b62159 refactor: make `MerkleRoot` benchmark more representative
     1
     2|             ns/leaf |              leaf/s |    err% |     total | benchmark
     3|--------------------:|--------------------:|--------:|----------:|:----------
     4|               44.26 |       22,592,347.11 |    0.0% |      1.10 | `MerkleRoot`
     5|               44.74 |       22,352,771.64 |    0.1% |      1.10 | `MerkleRootWithMutation`
     6
     7|             ns/leaf |              leaf/s |    err% |     total | benchmark
     8|--------------------:|--------------------:|--------:|----------:|:----------
     9|               44.26 |       22,593,951.64 |    0.1% |      1.10 | `MerkleRoot`
    10|               44.73 |       22,356,324.04 |    0.0% |      1.10 | `MerkleRootWithMutation`
    11
    12|             ns/leaf |              leaf/s |    err% |     total | benchmark
    13|--------------------:|--------------------:|--------:|----------:|:----------
    14|               44.36 |       22,544,297.38 |    0.1% |      1.10 | `MerkleRoot`
    15|               44.94 |       22,251,479.03 |    0.2% |      1.10 | `MerkleRootWithMutation`
    16
    17|             ns/leaf |              leaf/s |    err% |     total | benchmark
    18|--------------------:|--------------------:|--------:|----------:|:----------
    19|               44.31 |       22,566,269.08 |    0.0% |      1.10 | `MerkleRoot`
    20|               44.82 |       22,312,541.43 |    0.1% |      1.10 | `MerkleRootWithMutation`
    21
    22|             ns/leaf |              leaf/s |    err% |     total | benchmark
    23|--------------------:|--------------------:|--------:|----------:|:----------
    24|               44.32 |       22,563,549.17 |    0.0% |      1.10 | `MerkleRoot`
    25|               44.84 |       22,300,473.36 |    0.0% |      1.10 | `MerkleRootWithMutation`
    26
    278bbc90a981 merkle: pre-reserve leaves to prevent reallocs with odd vtx count
    28
    29|             ns/leaf |              leaf/s |    err% |     total | benchmark
    30|--------------------:|--------------------:|--------:|----------:|:----------
    31|               43.67 |       22,897,985.36 |    0.0% |      1.10 | `MerkleRoot`
    32|               44.18 |       22,636,422.13 |    0.1% |      1.10 | `MerkleRootWithMutation`
    33
    34|             ns/leaf |              leaf/s |    err% |     total | benchmark
    35|--------------------:|--------------------:|--------:|----------:|:----------
    36|               43.66 |       22,905,129.22 |    0.0% |      1.11 | `MerkleRoot`
    37|               44.21 |       22,618,981.30 |    0.1% |      1.10 | `MerkleRootWithMutation`
    38
    39|             ns/leaf |              leaf/s |    err% |     total | benchmark
    40|--------------------:|--------------------:|--------:|----------:|:----------
    41|               43.75 |       22,856,741.62 |    0.1% |      1.11 | `MerkleRoot`
    42|               44.32 |       22,560,920.73 |    0.1% |      1.10 | `MerkleRootWithMutation`
    43
    44|             ns/leaf |              leaf/s |    err% |     total | benchmark
    45|--------------------:|--------------------:|--------:|----------:|:----------
    46|               43.76 |       22,850,068.43 |    0.1% |      1.10 | `MerkleRoot`
    47|               44.57 |       22,437,748.76 |    0.2% |      1.10 | `MerkleRootWithMutation`
    48
    49|             ns/leaf |              leaf/s |    err% |     total | benchmark
    50|--------------------:|--------------------:|--------:|----------:|:----------
    51|               43.79 |       22,838,665.64 |    0.1% |      1.10 | `MerkleRoot`
    52|               44.29 |       22,579,859.06 |    0.1% |      1.10 | `MerkleRootWithMutation`
    

    Which shows a ~1% speedup for both cases:

    But since isn’t necessarily the point, I also added the previous memory measurements to the commit messages for clarity.

    Let me know what you think.


    hodlinator commented at 8:17 pm on December 1, 2025:

    Looks good!

    nit: With your version of having 2 benchmarks within the same function, -filter='MerkleRoot.*' in the commit message can be simplified to -filter='MerkleRoot'.


    l0rinc commented at 9:06 am on December 2, 2025:

    can be simplified to -filter=‘MerkleRoot’.

    Yeah, thanks, I know, I explicitly extended it since it produces two results now.

  66. in src/test/merkle_tests.cpp:231 in e3b02f1539
    226@@ -227,7 +227,8 @@ BOOST_AUTO_TEST_CASE(merkle_test_BlockWitness)
    227 {
    228     CBlock block;
    229 
    230-    block.vtx.resize(2);
    231+    size_t vtx_count{3};
    232+    block.vtx.resize(vtx_count);
    


    hodlinator commented at 2:36 pm on November 27, 2025:
    nit in 6c2d181bdc09024c5de5e29d9e913fcc2dba5071 “test: adjust ComputeMerkleRoot tests”: Why extract vtx_count when it’s only used once?

    l0rinc commented at 3:10 pm on November 27, 2025:
    Good point, I can use it in a few more places, pushed.
  67. hodlinator commented at 3:01 pm on November 27, 2025: contributor

    Reviewed e3b02f1539e59a6a20a00b4161219b5aba6fefed

    Thanks for reverting to a simpler version of this change!

    2 inline questions.

  68. l0rinc force-pushed on Nov 27, 2025
  69. l0rinc force-pushed on Dec 1, 2025
  70. hodlinator approved
  71. hodlinator commented at 8:26 pm on December 1, 2025: contributor

    ACK 8af0a3a72c8af781bebed7289657c637220df893

    While the benefit of this change to consensus code is low, it’s been simplified back to a more easily reviewable version.

    The benchmark now has two parts, one that detects mutation issues and one which does not, corresponding to the non-witness & witness merkle root computations performed by Bitcoin Core on mainnet.

    One extra level of paranoia?

    I think this change really is simple enough to approve as-is. It’s maybe not the same risk-level as changing standard library major versions, but adjacent to it. If people still object, one idea to improve confidence could be to splice in a new commit before the last one - it could add BlockMerkleRootNew() and BlockWitnessMerkleRootNew() alongside the old ones, and have a fuzz test which compares results between new & old versions. Then the last commit could merge new & old together, keeping the old name but using the new implementations, and removing the fuzz test again.

    Benchmarks

    0₿ cmake --build build -t bench_bitcoin && build/bin/bench_bitcoin -filter="MerkleRoot" -min-time=10000
    

    5fd62ede95900a6ea603c2b9f14c58ff4c32a4e2 “bench: make MerkleRoot benchmark more representative”

    0|             ns/leaf |              leaf/s |    err% |        ins/leaf |        cyc/leaf |    IPC |       bra/leaf |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    2|              115.53 |        8,655,444.89 |    0.0% |        1,216.23 |          425.94 |  2.855 |           2.14 |    0.3% |     11.03 | `MerkleRoot`
    3|              116.84 |        8,558,882.10 |    0.0% |        1,231.24 |          430.78 |  2.858 |           3.14 |    0.2% |     10.99 | `MerkleRootWithMutation`
    

    8af0a3a72c8af781bebed7289657c637220df893 “validation: pre-reserve leaves to prevent reallocs with odd vtx count”

    0|             ns/leaf |              leaf/s |    err% |        ins/leaf |        cyc/leaf |    IPC |       bra/leaf |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    2|              113.25 |        8,830,206.60 |    0.0% |        1,223.17 |          417.50 |  2.930 |           2.63 |    0.2% |     10.99 | `MerkleRoot`
    3|              114.67 |        8,720,906.94 |    0.0% |        1,238.18 |          422.72 |  2.929 |           3.63 |    0.2% |     11.01 | `MerkleRootWithMutation`
    

    Modified 8af0a3a72c8af781bebed7289657c637220df893 which just does leaves.reserve(hashes.size())

    0|             ns/leaf |              leaf/s |    err% |        ins/leaf |        cyc/leaf |    IPC |       bra/leaf |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    2|              114.81 |        8,709,871.88 |    0.0% |        1,230.23 |          423.26 |  2.907 |           3.64 |    0.2% |     11.01 | `MerkleRoot`
    3|              116.21 |        8,605,021.23 |    0.0% |        1,245.23 |          428.41 |  2.907 |           4.64 |    0.2% |     11.00 | `MerkleRootWithMutation`
    

    This last one shows that even just switching from resize() to reserve(), without always reserving an even number, does result in a measurable speedup by itself. Explains why I didn’t see as big differences in my previous testing (https://github.com/bitcoin/bitcoin/pull/32497#discussion_r2576533841) since all those variants used reserve().

  72. DrahtBot requested review from optout21 on Dec 1, 2025
  73. in src/test/merkle_tests.cpp:242 in 8af0a3a72c
    237@@ -237,12 +238,12 @@ BOOST_AUTO_TEST_CASE(merkle_test_BlockWitness)
    238     uint256 blockWitness = BlockWitnessMerkleRoot(block);
    239 
    240     std::vector<uint256> hashes;
    241-    hashes.resize(block.vtx.size());
    242-    hashes[0].SetNull();
    243-    hashes[1] = block.vtx[1]->GetHash().ToUint256();
    244+    hashes.resize(vtx_count); // Note: leaving odd count to exercise old behavior
    245+    for(size_t pos{1}; pos < vtx_count; pos++) {
    


    vasild commented at 11:04 am on December 9, 2025:

    style: space after for and ++i is preferred over i++ (from doc/developer-notes.md).

    0    for (size_t pos{1}; pos < vtx_count; ++pos) {
    

    (the loop a few lines above also uses pos++)


    l0rinc commented at 1:41 pm on December 9, 2025:
    This is deliberate to mimic the source since reviewers mentioned it explicitly: #32497#pullrequestreview-2945160215

    vasild commented at 5:42 am on December 10, 2025:
    This for(... without a space is a new code.

    l0rinc commented at 12:51 pm on December 11, 2025:
    Yes, I copied that from the source to simplify grepping for code (since I couldn’t adjust the source instead). If you insist, I can decouple the two if I need to touch again

    l0rinc commented at 1:48 pm on December 11, 2025:
    Rebased, fixed this, thanks!
  74. in src/bench/merkle_root.cpp:27 in 8af0a3a72c outdated
    31+
    32+    constexpr uint256 expected_root{"d8d4dfd014a533bc3941b8663fa6e7f3a8707af124f713164d75b0c3179ecb08"};
    33+    for (bool mutate : {false, true}) {
    34+        bench.name(mutate ? "MerkleRootWithMutation" : "MerkleRoot").batch(hashes.size()).unit("leaf").run([&] {
    35+            std::vector<uint256> leaves;
    36+            leaves.reserve((hashes.size() + 1) & ~1ULL); // capacity rounded up to even
    


    vasild commented at 11:20 am on December 9, 2025:

    This reminds me of the Obfuscated C Code Contest. I consider the dumb version easier to read:

    0x % 2 == 0 ? x : x + 1
    

    Remember that other people could get confused easier than you.

    Assembly for (x + 1) & ~1ULL:

    0t[0x304574] <+4>:  leaq   0x1(%rdi), %rax
    1t[0x304578] <+8>:  andq   $-0x2, %rax
    

    Assembly for x % 2 == 0 ? x : x + 1:

    0t[0x304574] <+4>:  movl   %edi, %eax
    1t[0x304576] <+6>:  andl   $0x1, %eax
    2t[0x304579] <+9>:  addq   %rdi, %rax
    

    If you really like the ~!@~*!& version, maybe put it in an inline function RoundUpToEven(n)?


    l0rinc commented at 1:41 pm on December 9, 2025:

    maybe put it in an inline function RoundUpToEven(n)

    I did that before, reviewers found it complicated.

  75. in src/bench/merkle_root.cpp:18 in 8af0a3a72c outdated
    17-    leaves.resize(9001);
    18-    for (auto& item : leaves) {
    19+    FastRandomContext rng{/*fDeterministic=*/true};
    20+
    21+    std::vector<uint256> hashes{};
    22+    hashes.resize(9001);
    


    vasild commented at 12:13 pm on December 9, 2025:

    In the commit message of 8af0a3a72c8af781bebed7289657c637220df893 validation: pre-reserve leaves to prevent reallocs with odd vtx count:

    … doubling its capacity (allocating 3x the necessary memory)

    Why 3x?


    l0rinc commented at 1:41 pm on December 9, 2025:

    Why 3x?

    The existing and the new doubled have to coexist for a while

  76. vasild commented at 1:15 pm on December 9, 2025: contributor
    Approach ACK 8af0a3a72c8af781bebed7289657c637220df893
  77. test: adjust `ComputeMerkleRoot` tests
    Update the integer fuzz test to move the vector into `ComputeMerkleRoot`, matching production usage patterns and avoiding unnecessary copies.
    
    Update `merkle_test_BlockWitness` to use an odd number of transactions to ensure the test covers the scenario where leaf duplication occurs. Also switch to `GetWitnessHash` to match `BlockWitnessMerkleRoot` semantics.
    The manual vector setup retains the exact-size `resize` to explicitly verify the behavior against the calculated root.
    f0a2183108
  78. bench: make `MerkleRoot` benchmark more representative
    Two versions are run now, one with the mutation calculations, the other without.
    To avoid unwanted compiler optimizations, we assert the expected hash, which should inhibit aggressive optimization.
    
    To make the benchmark more similar to production `ComputeMerkleRoot` call sites, the input leaves-copying is made explicit before each run.
    
    > ./build/bin/bench_bitcoin -filter='MerkleRoot.*' -min-time=1000
    
    |             ns/leaf |              leaf/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |               44.26 |       22,592,347.11 |    0.0% |      1.10 | `MerkleRoot`
    |               44.74 |       22,352,771.64 |    0.1% |      1.10 | `MerkleRootWithMutation`
    
    Massif memory measurements show the excessive memory reservations:
    
        MB
    1.332^        :
         | #      :
         | #      :
         | #      :
         | #      :
         | #    @ :
         | #    @ :
         | #    @ :
         | #    @ :
         | #    @ :
         | #    @ :
         | #    @ :
         | #    @ :
         | #::::@::::::::::::::::::::::::::::::::::::::::::::::::::::::@:::::@::::
         | #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
         | #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
         | #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
         | #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
         | #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
         | #: ::@::::: :::::::: :: ::: :::::: : : :: ::: ::: : : : ::::@:::::@::::
       0 +----------------------------------------------------------------------->s
         0                                                                   226.2
    
    showing the reallocations clearly in the stacks:
    97.87% (1,366,841B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
    ->41.25% (576,064B) 0x969717: allocate (new_allocator.h:151)
    | ->41.25% (576,064B) 0x969717: allocate (allocator.h:203)
    |   ->41.25% (576,064B) 0x969717: allocate (alloc_traits.h:614)
    |     ->41.25% (576,064B) 0x969717: _M_allocate (stl_vector.h:387)
    |       ->41.25% (576,064B) 0x969717: _M_realloc_append<const uint256&> (vector.tcc:572)
    |         ->41.25% (576,064B) 0x969717: push_back (stl_vector.h:1427)
    |           ->41.25% (576,064B) 0x969717: ComputeMerkleRoot(std::vector<uint256, std::allocator<uint256> >, bool*) (merkle.cpp:55)
    |             ->41.25% (576,064B) 0x2235A7: operator() (merkle_root.cpp:31)
    |               ->41.25% (576,064B) 0x2235A7: ankerl::nanobench::Bench& ankerl::nanobench::Bench::run<MerkleRoot(ankerl::nanobench::Bench&)::{lambda()
    
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    1a768c0d4c
  79. validation: pre-reserve leaves to prevent reallocs with odd vtx count
    `ComputeMerkleRoot` duplicates the last hash when the input size is odd. If the caller provides a `std::vector` whose capacity equals its size, that extra `push_back` forces a reallocation, doubling its capacity (allocating 3x the necessary memory).
    
    This affects roughly half of the created blocks (those with odd transaction counts), causing unnecessary memory fragmentation during every block validation.
    
    Fix this by pre-reserving the vector capacity to account for the odd-count duplication. The expression `(size + 1) & ~1ULL` adds 1 to the size and clears the last bit, effectively rounding up to the next even number. This syntax produces optimal assembly across x86/ARM and 32/64-bit platforms for gcc/clang, see https://godbolt.org/z/xzscoq7nv.
    
    Also switch from `resize` to `reserve` + `push_back` to eliminate the default construction of `uint256` objects that are immediately overwritten.
    
    > ./build/bin/bench_bitcoin -filter='MerkleRoot.*' -min-time=1000
    
    |             ns/leaf |              leaf/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |               43.66 |       22,905,129.22 |    0.0% |      1.11 | `MerkleRoot`
    |               44.18 |       22,636,422.13 |    0.1% |      1.10 | `MerkleRootWithMutation`
    
    Massif memory measurements after show 0.8 MB peak memory usage
    
        KB
    801.4^ #
         | #
         | #
         | #
         | #
         | #
         | #
         | #                                                         :::::@:::::@:
         | #:::@@@::@:::::::::::::::@::@:@:::@@:::::::::@::::::@:::::@::::@:::::@:
         | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
         | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
         | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
         | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
         | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
         | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
         | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
         | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
         | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
         | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
         | #:::@ @: @:::::::::::::::@::@:@:::@ :::: ::::@::::::@:::::@::::@:::::@:
       0 +----------------------------------------------------------------------->s
         0                                                                   227.5
    
    and the stacks don't show reallocs anymore:
    96.37% (790,809B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
    ->35.10% (288,064B) 0x2234AF: allocate (new_allocator.h:151)
    | ->35.10% (288,064B) 0x2234AF: allocate (allocator.h:203)
    |   ->35.10% (288,064B) 0x2234AF: allocate (alloc_traits.h:614)
    |     ->35.10% (288,064B) 0x2234AF: _M_allocate (stl_vector.h:387)
    |       ->35.10% (288,064B) 0x2234AF: reserve (vector.tcc:79)
    |         ->35.10% (288,064B) 0x2234AF: ToMerkleLeaves<std::vector<uint256>, MerkleRoot(ankerl::nanobench::Bench&)::<lambda()>::<lambda(bool, const auto:46&)> > (merkle.h:19)
    |           ->35.10% (288,064B) 0x2234AF: operator() (merkle_root.cpp:25)
    |             ->35.10% (288,064B) 0x2234AF: ankerl::nanobench::Bench& ankerl::nanobench::Bench::run<MerkleRoot(ankerl::nanobench::Bench&)::{lambda()
    
    Co-authored-by: optout21 <13562139+optout21@users.noreply.github.com>
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    9f39a8309c
  80. l0rinc force-pushed on Dec 11, 2025
  81. hodlinator approved
  82. hodlinator commented at 2:05 pm on December 11, 2025: contributor

    re-ACK 9f39a8309cb0a23df03937b0b225aff8ce9e1ec6

    Resolved conflict with #33805 and fixed code nit #32497 (review).

  83. DrahtBot requested review from vasild on Dec 11, 2025
  84. DrahtBot added the label CI failed on Dec 11, 2025
  85. l0rinc closed this on Dec 11, 2025

  86. l0rinc reopened this on Dec 11, 2025

  87. DrahtBot removed the label CI failed on Dec 11, 2025
  88. vasild approved
  89. vasild commented at 8:00 am on December 12, 2025: contributor
    ACK 9f39a8309cb0a23df03937b0b225aff8ce9e1ec6

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-12-12 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me