bench: Match ConnectBlock tx output counts

monlovesmango commented at 5:37 pm on April 3, 2025: contributor

There turned out to be a mismatch in the tx output counts which caused ‘ConnectBlockMixedEcdsaSchnorr’ benchmark to run slower than ‘ConnectBlockAllEcdsa’ and ‘ConnectBlockAllSchnorr’. This commit makes the tx output counts uniform across all benchmarks.

This commit also renames the ’taproot_tx’ variable to ’tx’ to reflect that this variable represents a general tx and not just a taproot tx.

DrahtBot commented at 5:37 pm on April 3, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32216.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	davidgumberg, janb84, josibake, Prabhat1308

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#31668 (Added rescan option for import descriptors by saikiran57)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label Tests on Apr 3, 2025

monlovesmango commented at 5:40 pm on April 3, 2025: contributor

Background for this PR can be found in #31689.

Happy to remove the renaming of taproot_tx if reviewers feel that it is inappropriate.

janb84 commented at 6:31 pm on April 3, 2025: contributor

see review below.

Prabhat1308 commented at 6:57 pm on April 3, 2025: contributor

tACK 1f0aef6

My output is less verbose compared to the outputs by other reviewers for some reason.

0
1|            ns/block |             block/s |    err% |     total | benchmark
2|--------------------:|--------------------:|--------:|----------:|:----------
3|       30,309,900.52 |               32.99 |    0.2% |     65.84 | `ConnectBlockAllEcdsa`
4|       29,992,461.34 |               33.34 |    0.4% |     65.71 | `ConnectBlockAllSchnorr`
5|       30,330,863.62 |               32.97 |    0.2% |     66.21 | `ConnectBlockMixedEcdsaSchnorr`

Also agree with the naming change since not all transactions are taproot and better to keep it general.

This benchmark is expected to be slower than the AllSchnorr or Ecdsa benchmark

because it uses transactions with both Schnorr and Ecdsa signatures

which requires the transaction to be hashed multiple times for

the different signature algorithms

I see this mentioned but since for small numbers there isn’t much difference. The difference will be seen when number is increased to actual numbers ? I am of the view that benchmarks be close to the actual numbers and would probably like some actual input numbers to be put rather than the small dummy numbers since these aren’t the markers for this benchmark.

EDIT: After examining the code again , I see that 1000 transactions are taken . Although still far from real world scenario (with 5 inputs/output) . Should be okay for a benchmark .

DrahtBot requested review from janb84 on Apr 3, 2025

in src/bench/connectblock.cpp:1 in 1f0aef6d70

janb84 commented at 7:06 pm on April 3, 2025:

NIT: Please change the comment on line 114-119, your change removes the need for the comment

0/**
1 * This benchmark is expected to be slower than the AllSchnorr or Ecdsa benchmark
2 * because it uses transactions with both Schnorr and Ecdsa signatures
3 * which requires the transaction to be hashed multiple times for
4 * the different signature algorithms
5 */

monlovesmango commented at 9:35 pm on April 3, 2025:

done! good catch

janb84 commented at 6:54 am on April 4, 2025:

Thanks for changing this, it was after the comment of @Prabhat1308 that I noticed it was still there. So props to @Prabhat1308 👏

janb84 commented at 7:07 pm on April 3, 2025: contributor

Approach ACK 1f0aef6

The benchmark now runs with equal amount of outputs, this was also discussed during the PR review club. Seems logical to have an equal amount of outputs between the different benchmarks and the change also removes the outlier. Would like to know if the unequal number of outputs were there with a reason though.

monlovesmango force-pushed on Apr 3, 2025

bench: Match ConnectBlock tx output counts

There turned out to be a mismatch in the tx output counts which caused
'ConnectBlockMixedEcdsaSchnorr' benchmark to run slower than
'ConnectBlockAllEcdsa' and 'ConnectBlockAllSchnorr'. This commit makes
the tx output counts uniform across all benchmarks.

This commit also renames the 'taproot_tx' variable to 'tx' to reflect
that this variable represents a general tx and not just a taproot tx.

924f25f6fc

monlovesmango force-pushed on Apr 3, 2025

monlovesmango commented at 9:47 pm on April 3, 2025: contributor

My output is less verbose compared to the outputs by other reviewers for some reason.

This is how it looks for me too when I run it on my mac, not sure why.

davidgumberg commented at 1:19 am on April 4, 2025: contributor

Tested ACK 924f25f6fc766cb89a23fb

Nice find! Given that, as I understand, part of the motivation for this benchmark’s introduction (#31689) is to be able to compare ConnectBlock() performance with and without schnorr batch validation (#29491), and part of the motivation for using Schnorr signatures in the first place is that they’re susceptible to batch validation, it seems to me that it would be useful for these three benchmarks to be as like as possible, so that they can be compared, and this PR makes that the case in my testing.

Recent PR review club where this was discussed: (https://bitcoincore.reviews/31689#l-60).

Benchmark Results

master (c66f7dab33fb08dc323)

0$ git checkout --detach 924f25f~1
1$ cmake -B build -DBUILD_BENCH=ON && cmake --build build -j $(nproc)
2$ ./build/bin/bench_bitcoin --filter=ConnectBlock.* --min-time=5000

ns/block	block/s	err%	ins/block	cyc/block	IPC	bra/block	miss%	total	benchmark
46,549,571.00	21.48	1.2%	672,393,930.64	161,287,432.73	4.169	15,122,140.18	1.7%	5.46	`ConnectBlockAllEcdsa`
47,199,469.60	21.19	0.2%	680,249,499.27	164,132,414.09	4.145	15,120,173.64	1.7%	5.47	`ConnectBlockAllSchnorr`
59,705,754.22	16.75	0.7%	839,091,363.50	207,657,368.33	4.041	18,945,901.12	1.8%	5.55	`ConnectBlockMixedEcdsaSchnorr`

pr branch (924f25f6fc766cb)

0$ git checkout --detach 924f25f
1$ cmake -B build -DBUILD_BENCH=ON && cmake --build build -j $(nproc)
2$ ./build/bin/bench_bitcoin --filter=ConnectBlock.* --min-time=5000

ns/block	block/s	err%	ins/block	cyc/block	IPC	bra/block	miss%	total	benchmark
58,425,229.87	17.12	0.6%	840,316,339.12	203,096,705.00	4.138	18,694,845.12	1.7%	5.45	`ConnectBlockAllEcdsa`
59,756,200.89	16.73	0.6%	846,439,909.12	207,679,122.78	4.076	18,670,295.75	1.8%	5.52	`ConnectBlockAllSchnorr`
59,767,476.56	16.73	0.6%	840,513,726.44	207,604,950.00	4.049	18,968,510.88	1.8%	5.58	`ConnectBlockMixedEcdsaSchnorr`

My output is less verbose compared to the outputs by other reviewers for some reason.

This is because nanobench provides some extra stats via linux’s perf_events subsystem and omits them when they’re not available:

CPU statistics like instructions, cycles, branches, branch misses are only available on Linux, through perf events. On some systems you might need to change permissions through perf_event_paranoid or use ACL.

https://github.com/martinus/nanobench/blob/e4327893194f06928012eb81cabc606c4e4791ac/src/docs/tutorial.rst?plain=1#L79-L84

DrahtBot requested review from janb84 on Apr 4, 2025

DrahtBot requested review from Prabhat1308 on Apr 4, 2025

janb84 commented at 6:52 am on April 4, 2025: contributor

re ACK 924f25f

Changes sinds last review:

Removed comment block which was not relevant anymore.

josibake approved

josibake commented at 7:56 am on April 4, 2025: member

ACK https://github.com/bitcoin/bitcoin/pull/32216/commits/924f25f6fc766cb89a23fb1a6ad632947742da9f

Thanks for picking this up! Very cool to see this coming from the PR review club. While not strictly necessary for all of these benchmarks to have the same number of transactions, I think this change better aligns with the motivation of #31689 and will help avoid confusion in the future. Also agree that the variable rename improves the readability.

l0rinc changes_requested

l0rinc commented at 8:27 am on April 4, 2025: contributor

As mentioned in the original PR, I personally would prefer adjusting the bench.unit("block") instead (or besides), since a “block” doesn’t actually define what it contains, but number of transactions tells more about what we’re measuring. Changing it to

 0diff --git a/src/bench/connectblock.cpp b/src/bench/connectblock.cpp
 1index efe05dc2af..2317538f93 100644
 2--- a/src/bench/connectblock.cpp
 3+++ b/src/bench/connectblock.cpp
 4@@ -92,7 +92,7 @@ std::pair<std::vector<CKey>, std::vector<CTxOut>> CreateKeysAndOutputs(const CKe
 5 void BenchmarkConnectBlock(benchmark::Bench& bench, std::vector<CKey>& keys, std::vector<CTxOut>& outputs, TestChain100Setup& test_setup)
 6 {
 7     const auto& test_block{CreateTestBlock(test_setup, keys, outputs)};
 8-    bench.unit("block").run([&] {
 9+    bench.batch(outputs.size()).unit("tx").run([&] {
10         LOCK(cs_main);
11         auto& chainman{test_setup.m_node.chainman};
12         auto& chainstate{chainman->ActiveChainstate()};

for the original change results in:

ns/tx	tx/s	err%	total	benchmark
8,712,574.78	114.78	0.1%	11.00	`ConnectBlockAllEcdsa`
8,630,085.42	115.87	0.0%	10.98	`ConnectBlockAllSchnorr`
8,718,840.97	114.69	0.0%	10.95	`ConnectBlockMixedEcdsaSchnorr`

and after this PR:

ns/tx	tx/s	err%	total	benchmark
8,702,884.12	114.90	0.0%	10.93	`ConnectBlockAllEcdsa`
8,650,517.76	115.60	0.1%	11.04	`ConnectBlockAllSchnorr`
8,722,221.53	114.65	0.0%	10.95	`ConnectBlockMixedEcdsaSchnorr`

We can keep the current changes as well, but I think it would clarify things if we defined more precisely what we’re measuring.

Prabhat1308 commented at 1:21 pm on April 4, 2025: contributor

reACK 924f25f

monlovesmango commented at 4:46 pm on April 4, 2025: contributor

Hey @l0rinc ,

I appreciate you reiterating this feedback! I actually tend to agree with you, however do not quite agree with the strategy laid out in your suggested changes, and my lack of experience makes me hesitant to make any sweeping changes.

For your suggested changes, in order for the metric to be tx/s I believe the batch size would actually need to be the number of txs and not the outputs.size(). Additionally, I actually think the metric we would really want is tx outputs/s, as switching to tx/s wouldn’t actually resolve the issue this pr focused on. From a technical perspective it would be inaccurate to say that this benchmark is actually measuring tx outputs/s just by scaling the batch because what it actually would bench is just connectblock divided by the number of tx outputs which doesn’t change the metric in a meaningful way.

More broadly, for a “ConnectBlock” benchmark my assumption is that this could potentially be used to benchmark other aspects of connectblock functionality, not just signature validation, in which case tx outputs/s would not really be an appropriate metric. I do think it might be valuable to have a separate benchmark that is more focused at the tx output granularity level.

I think it would clarify things if we defined more precisely what we’re measuring.

In my view, block/s is the most precise definition of what we’re measuring and tx/s (or tx output/x) would be an obfuscation.

l0rinc commented at 8:34 pm on April 4, 2025: contributor

In my view, block/s is the most precise definition of what we’re measuring and tx/s (or tx output/x) would be an obfuscation.

Isn’t the point of this PR to change what blocks/s means? That’s why this PR exists, since a block doesn’t really narrow down what we’re measuring. We might as well call it thing/s.

the metric we would really want is tx outputs/s

I think that’s basically what I wrote (you can scale it to actual transaction outputs, but since the transaction count is the same for all, it’s just a multiplier, won’t change the ratios)

We can also write:

0bench.batch(test_block.vtx.size() * outputs.size()).unit("txo").run([&] {

or even

0bench.batch(std::accumulate(test_block.vtx.begin(), test_block.vtx.end(), size_t{0}, [](size_t sum, const auto& tx) { return sum + tx->vout.size(); })).unit("txo").run([&] {

if you think that’s more accurate, but calling it “block/s” and than changing the value and still calling it “block/s” is not the sign of a reliable benchmark.

Will let you decide, I basically already said these in the other PR as well.

monlovesmango commented at 11:35 pm on April 4, 2025: contributor

Isn’t the point of this PR to change what blocks/s means?

The point of this PR is simply to make the existing benchmark code more consistent across benchmarks, not to change anything in the design of the benchmark that has already been merged. As I said before, I do not have the experience nor expertise to tackle something like that and am not convinced that just changing the unit of the benchmark is meaningful.

ryanofsky assigned ryanofsky on Apr 8, 2025

ryanofsky merged this on Apr 8, 2025

ryanofsky closed this on Apr 8, 2025

bench: Match ConnectBlock tx output counts #32216

Code Coverage & Benchmarks

Reviews

Conflicts

Benchmark Results

master (c66f7dab33fb08dc323)

pr branch (924f25f6fc766cb)