l0rinc commented at 1:01 pm on May 14, 2025: contributor

Summary

ComputeMerkleRoot duplicates the last hash when the input size is odd. If the caller provides a std::vector whose capacity equals its size (common when the vector was created with resize()), that extra push_back forces a reallocation, doubling its capacity. We pay this penalty on every Merkle root calculation even though we know the final size in advance.

Fix

Call sites: replace resize(n) + index assignment with reserve(n + (n & 1)) + emplace_back(...);
Guarantees one extra slot when vtx.size() is odd (assigned later);
Removes default construction of uint256 objects we overwrite anyway.

Validation

The benchmark was updated to use an even leaf count for simplicity and to focus on hashing speed rather than reallocations or vector copying.

Asserts (not pushed in this PR) confirm that push_back never reallocates and that the coinbase witness hash remains null:

0if (hashes.size() & 1) {
1    assert(hashes.size() < hashes.capacity()); // TODO remove
2    hashes.push_back(hashes.back());
3}
4
5leaves.reserve(block.vtx.size() + (block.vtx.size() & 1)); // capacity rounded up to even
6leaves.emplace_back();
7assert(leaves.back().IsNull()); // TODO remove

A full -reindex-chainstate up to block 896 408 ran without triggering the asserts.

DrahtBot commented at 1:01 pm on May 14, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32497.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	yuvicc
Concept ACK	luke-jr

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

in src/consensus/merkle.cpp:69 in 39b6c139bd outdated

65@@ -66,20 +66,20 @@ uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
66 uint256 BlockMerkleRoot(const CBlock& block, bool* mutated)
67 {
68     std::vector<uint256> leaves;
69-    leaves.resize(block.vtx.size());
70-    for (size_t s = 0; s < block.vtx.size(); s++) {
71-        leaves[s] = block.vtx[s]->GetHash();
72+    leaves.reserve((block.vtx.size() + 1) & ~1ULL); // capacity rounded up to even

luke-jr commented at 11:25 am on May 20, 2025:

This assumes size_t is unsigned long long I think? Can we avoid that easily?

l0rinc commented at 1:27 pm on May 20, 2025:

we could either do size_t{1} or leaves.reserve(block.vtx.size() + (block.vtx.size() & 1)), I think, if you think this could be problematic on 32 bit systems.

l0rinc commented at 9:36 am on June 15, 2025:

I ended up with block.vtx.size() + (block.vtx.size() & 1) - if the value ends with a 1, it’s an odd number and we increment it to make it even

in src/bench/merkle_root.cpp:15 in 899e466237 outdated

10@@ -11,16 +11,15 @@
11 
12 static void MerkleRoot(benchmark::Bench& bench)
13 {
14-    FastRandomContext rng(true);
15-    std::vector<uint256> leaves;
16-    leaves.resize(9001);
17+    FastRandomContext rng{/*fDeterministic=*/true};
18+    std::vector<uint256> leaves(9000); // even number of leaves

luke-jr commented at 11:33 am on May 20, 2025:

Wouldn’t it be better to benchmark the less-performant input?

l0rinc commented at 1:24 pm on May 20, 2025:

After this change all call-sites are pre-sized to even number of leaves, so this reflects the usage better. I could also round to 9002 to mimic the new leaves.reserve((9001 + 1) & ~1ULL) behavior, if you thinks that’s more representative.

l0rinc commented at 9:35 am on June 15, 2025:

added back 9001

in src/bench/merkle_root.cpp:22 in 899e466237 outdated

23-        bool mutation = false;
24-        uint256 hash = ComputeMerkleRoot(std::vector<uint256>(leaves), &mutation);
25-        leaves[mutation] = hash;
26+        bool mutation{false};
27+        const uint256 hash{ComputeMerkleRoot(std::vector(leaves), &mutation)};
28+        assert(mutation == false && hash == uint256{"18ff9b2bef46a834b2625871b2839fec3b67fc1243920c424858324fbde1a8de"});

luke-jr commented at 11:33 am on May 20, 2025:

How does this avoid optimisations?

l0rinc commented at 1:21 pm on May 20, 2025:

the ComputeMerkleRoot calculation has to be performed to be able to validate this assert - it’s can’t eliminate the call.

luke-jr approved

luke-jr commented at 11:34 am on May 20, 2025: member

utACK main commit. Not sure why the benchmark is being changed.

l0rinc commented at 1:28 pm on May 20, 2025: contributor

Thanks for the review!

yuvicc commented at 8:34 pm on June 13, 2025: contributor

Concept ACK

Memory allocation/deallocation is slow and expensive.

yuvicc commented at 8:42 pm on June 13, 2025: contributor

I have thought on changing the benchmark tests as we would want to test the worst case scenario right? or else we could have both case for testing?

yuvicc commented at 9:35 pm on June 13, 2025: contributor

Benchmark Testing

master@5757de4ddd37f9321ee6b338b40888fd3561fc00

With 9000

0|             ns/leaf |              leaf/s |    err% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------:|:----------
2|               52.62 |       19,005,746.07 |    0.2% |      0.01 | `MerkleRoot`
3|               52.64 |       18,998,504.40 |    0.3% |      0.01 | `MerkleRoot`
4|               52.63 |       18,999,727.67 |    0.2% |      0.01 | `MerkleRoot`

With 9001

0|             ns/leaf |              leaf/s |    err% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------:|:----------
2|               53.50 |       18,693,063.88 |    0.3% |      0.01 | `MerkleRoot`
3|               53.53 |       18,681,211.49 |    0.5% |      0.01 | `MerkleRoot`
4|               53.49 |       18,694,053.87 |    0.5% |      0.01 | `MerkleRoot`

Commit 39b6c139bd6be33699af781f1d71f6fed303d468

With 9000

0|             ns/leaf |              leaf/s |    err% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------:|:----------
2|               52.51 |       19,043,628.95 |    0.2% |      0.01 | `MerkleRoot`
3|               52.52 |       19,040,989.96 |    0.2% |      0.01 | `MerkleRoot`
4|               52.53 |       19,036,358.39 |    0.2% |      0.01 | `MerkleRoot`

With 9001

0|             ns/leaf |              leaf/s |    err% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------:|:----------
2|               53.40 |       18,713,525.67 |    0.3% |      0.01 | `MerkleRoot`
3|               53.44 |       19,314,655.73 |    0.3% |      0.01 | `MerkleRoot`
4|               53.41 |       18,883,462.75 |    0.3% |      0.01 | `MerkleRoot`

l0rinc force-pushed on Jun 15, 2025

l0rinc commented at 10:06 am on June 15, 2025: contributor

Updated the benchmark to showcase the before/after state better (resembles production code changes), by splitting out the vector copies to the unmetered lambda and having an odd number of elements. Changed the previous leaves.reserve((block.vtx.size() + 1) & ~1ULL) rounding to leaves.reserve(block.vtx.size() + (block.vtx.size() & 1)).

l0rinc requested review from luke-jr on Jun 15, 2025

l0rinc requested review from yuvicc on Jun 15, 2025

yuvicc commented at 5:26 pm on June 15, 2025: contributor

Code review ACK 4cc5942895673591de4edceb6dd0c7188c302a72

Testing

Before Master@9a7eece5a4

ns/leaf	leaf/s	err%	total	benchmark
53.75	18,605,160.70	0.7%	0.01	`MerkleRoot`
53.64	18,643,002.35	0.1%	0.01	`MerkleRoot`
53.91	18,548,704.52	0.4%	0.01	`MerkleRoot`

After commit@4cc5942895673591de4edceb6dd0c7188c302a72

ns/leaf	leaf/s	err%	total	benchmark
57.67	17,340,305.39	0.0%	0.52	`MerkleRoot`
57.69	17,332,534.96	0.0%	0.52	`MerkleRoot`
58.14	17,199,057.10	0.0%	0.52	`MerkleRoot`

in src/bench/merkle_root.cpp:19 in 064f33bbe2 outdated

18     std::vector<uint256> leaves;
19-    leaves.resize(9001);
20-    for (auto& item : leaves) {
21-        item = rng.rand256();
22+    leaves.resize(block_vtx_size);
23+    for (auto& h : leaves) h = rng.rand256();

maflcko commented at 9:11 am on June 20, 2025:

why is this changed. touching a line to only remove the {} after the for loop seems odd and I had the impression that the dev notes preferred to use {}

l0rinc commented at 11:19 am on June 20, 2025:

I’m fine with both, reverted.

in src/bench/merkle_root.cpp:22 in 064f33bbe2 outdated

21-        item = rng.rand256();
22+    leaves.resize(block_vtx_size);
23+    for (auto& h : leaves) h = rng.rand256();
24+
25+    constexpr size_t iterations{1000};
26+    std::vector<std::vector<uint256>> cases(iterations);

maflcko commented at 9:14 am on June 20, 2025:

not sure about pre-allocating. All real code paths allocate in-line for each iteration, so the benchmark should follow that. This would also avoid eating 100+mb for a microbenchmark?

l0rinc commented at 10:42 am on June 20, 2025:

I wanted to explicitly only measure the ComputeMerkleRoot call here without collecting hashes into a vector. But if you think that part should de included, I don’t mind, but in that case we won’t measure the behavior change in ComputeMerkleRoot anymore. However, I think the proper allocation makes sense regardless - e.g. for 2001 entries now we’ll allocate 2002 entries instead of 4002 like before.

in src/consensus/merkle.cpp:71 in 4cc5942895 outdated

65@@ -66,20 +66,20 @@ uint256 ComputeMerkleRoot(std::vector<uint256> hashes, bool* mutated) {
66 uint256 BlockMerkleRoot(const CBlock& block, bool* mutated)
67 {
68     std::vector<uint256> leaves;
69-    leaves.resize(block.vtx.size());
70-    for (size_t s = 0; s < block.vtx.size(); s++) {
71-        leaves[s] = block.vtx[s]->GetHash();
72+    leaves.reserve(block.vtx.size() + (block.vtx.size() & 1)); // capacity rounded up to even
73+    for (const auto& tx : block.vtx) {
74+        leaves.emplace_back(tx->GetHash());

maflcko commented at 9:19 am on June 20, 2025:

why is this changed? I don’t think a byte array can meaningfully be moved (of course it compiles, but moving a byte is no different than copying it)

l0rinc commented at 11:19 am on June 20, 2025:

After I have used leaves.emplace_back(); // The witness hash of the coinbase is 0 in BlockWitnessMerkleRoot it seemed more consistent to me to use emplace_back in BlockMerkleRoot as well - as you said it should do the same. I’m fine with both, I’ve changed them inside the loop to push_back.

in src/consensus/merkle.cpp:81 in 4cc5942895 outdated

83-    leaves[0].SetNull(); // The witness hash of the coinbase is 0.
84-    for (size_t s = 1; s < block.vtx.size(); s++) {
85-        leaves[s] = block.vtx[s]->GetWitnessHash();
86+    leaves.reserve(block.vtx.size() + (block.vtx.size() & 1)); // capacity rounded up to even
87+    leaves.emplace_back(); // The witness hash of the coinbase is 0.
88+    for (size_t i{1}; i < block.vtx.size(); ++i) {

maflcko commented at 9:20 am on June 20, 2025:

why is this changed? I don’t think it is worth to touch a line just to change the name of the variable from one single char to another single char.

l0rinc commented at 11:19 am on June 20, 2025:

There was no other loops with the same variable after the change anymore, so it doesn’t need to be a non-standard loop variable name (wasn’t sure what s meant in this context) - but since it’s important to you I’ve reverted them.

maflcko commented at 9:22 am on June 20, 2025: member

not sure how meaningful this is, but it looks like there are a bunch of unrelated style changes that don’t seem to have a benefit?

l0rinc force-pushed on Jun 20, 2025

l0rinc commented at 11:24 am on June 20, 2025: contributor

The changes revert the original benchmark’s behavior of including the leaf vector in the benchmark (more explicitly and reserved like the production code). I’ve also changed the emplace_back calls for the hashes to push_bash and reverted the original loop conditions to minimize the diff in critical code sections.

Note to yuvicc: previous benchmark version was measuring something else than master (only the merkle call without the vector copy), so we can’t directly compare the results.

~~The new results don’t show any significant speed increase, only the memory allocation is more predictable - which isn’t captured by the benchmark anymore.~~

l0rinc requested review from yuvicc on Jun 25, 2025

l0rinc requested review from maflcko on Jun 25, 2025

l0rinc force-pushed on Jun 25, 2025

l0rinc commented at 9:33 pm on June 25, 2025: contributor

I’ve rebased the changed and adjusted the benchmark to be more similar to the other production usages in the first commit, rounded to even in the second (optimization) commit, so that we can realistically measure the speed difference before and after:

% build/bin/bench_bitcoin -filter=‘MerkleRoot’ –min-time=1000

Before 7f620cffebee593e48434cf182cc2fd64a6d76be:

ns/leaf	leaf/s	err%	total	benchmark
45.50	21,975,688.66	0.0%	1.10	`MerkleRoot`
45.53	21,962,546.91	0.1%	1.10	`MerkleRoot`
45.53	21,965,162.52	0.0%	1.10	`MerkleRoot`

After a73dd45e95f5e909d0950ba7c76225589df74854:

ns/leaf	leaf/s	err%	total	benchmark
45.04	22,200,976.95	0.1%	1.10	`MerkleRoot`
44.97	22,235,208.25	0.1%	1.10	`MerkleRoot`
45.01	22,217,033.81	0.1%	1.10	`MerkleRoot`

refactor: make `MerkleRoot` benchmark more representative

Instead of mutating the input after each round to avoid unwanted compiler optimizations, we assert the expected hash, which likewise inhibits aggressive optimization.

To make the benchmark more similar to other `ComputeMerkleRoot` the input leaves copying is made explicit before each run.

% build/bin/bench_bitcoin -filter='MerkleRoot' --min-time=1000

|             ns/leaf |              leaf/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               45.50 |       21,975,688.66 |    0.0% |      1.10 | `MerkleRoot`
|               45.53 |       21,962,546.91 |    0.1% |      1.10 | `MerkleRoot`
|               45.53 |       21,965,162.52 |    0.0% |      1.10 | `MerkleRoot`

e87430ed5f

merkle: pre-reserve leaves to prevent reallocs with odd vtx count

This prevents the input vector from doubling in size when the input count is odd.
The rounding adds 1 when the size is odd (least significant bit is 1); otherwise it adds 0, keeping the value even.

% build/bin/bench_bitcoin -filter='MerkleRoot' --min-time=1000

|             ns/leaf |              leaf/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               45.04 |       22,200,976.95 |    0.1% |      1.10 | `MerkleRoot`
|               44.97 |       22,235,208.25 |    0.1% |      1.10 | `MerkleRoot`
|               45.01 |       22,217,033.81 |    0.1% |      1.10 | `MerkleRoot`

be8f3053a7

l0rinc force-pushed on Jun 25, 2025

yuvicc commented at 7:22 am on June 27, 2025: contributor

lgtm re-ACK be8f3053a7ad526823f32f1a70847b9c06c4c54b

Before e87430ed5fad6f9d90c1e7d256b9b8276b936c24

ns/leaf	leaf/s	err%	total	benchmark
53.83	18,577,129.91	0.1%	1.10	`MerkleRoot`
53.62	18,648,858.81	0.1%	1.10	`MerkleRoot`
53.70	18,621,594.40	0.1%	1.10	`MerkleRoot`

After be8f3053a7ad526823f32f1a70847b9c06c4c54b

ns/leaf	leaf/s	err%	total	benchmark
53.02	18,860,232.62	0.2%	1.10	`MerkleRoot`
52.88	18,910,755.68	0.0%	1.10	`MerkleRoot`
52.89	18,906,671.63	0.1%	1.10	`MerkleRoot`

merkle: pre‑reserve leaves to prevent reallocs with odd vtx count #32497

Summary

Fix

Validation

Code Coverage & Benchmarks

Reviews

Conflicts

Benchmark Testing

Testing

Before Master@9a7eece5a4

After commit@4cc5942895673591de4edceb6dd0c7188c302a72