Benchmark Chainstate::ConnectBlock duration

Eunovo commented at 10:34 am on January 20, 2025: contributor

Introduce benchmarks to evaluate ConnectBlock performance for:

Blocks containing only Schnorr signatures
Blocks containing both Schnorr and ECDSA signatures
Blocks containing only ECDSA signatures

The benchmarks in this PR focus on signature validation. Additional benchmarks may be added in the future to assess other aspects of ConnectBlock.

This is the first step toward implementing Batch Verification of Schnorr Signatures in Core. It provides a way to test and measure the performance improvements of batch verification on Core. For more details on batch validation, refer to the batch-verify module on secp and batch-verify on core.

DrahtBot commented at 10:34 am on January 20, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31689.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	l0rinc, josibake, fjahr
Approach ACK	TheCharlatan
Stale ACK	davidgumberg

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Eunovo marked this as a draft on Jan 20, 2025

Eunovo force-pushed on Jan 20, 2025

in src/bench/check_connectblock.cpp:26 in 0b13781d9f outdated

21+    Chainstate& chainstate{test_setup->m_node.chainman->ActiveChainstate()};
22+
23+    // Create a block with (500*4) schnorr signatures to verify
24+    // Typical blocks have high number of txs with few inputs
25+    const int num_txs = 500;
26+    const int num_inputs = 4;

l0rinc commented at 12:17 pm on January 20, 2025:

Could we use an existing block from Mainnet - to be sure we’re measuring a representative case and not one skewed towards our preferences? One with many schnorr sigs - but since we don’t yet have one that only contains them, I don’t think it makes sense to measure that (like we do here).

fjahr commented at 1:18 pm on January 21, 2025:

Optimizing for one specific block is also skewing towards something, just a bit different. Maybe the test should generate a number of blocks with some randomization of the txs and inputs and outputs instead.

l0rinc commented at 2:14 pm on January 21, 2025:

They’re always skewed, but if we use actual blocks as basis, at least we can’t trick ourselves so easily.

fjahr commented at 2:23 pm on January 21, 2025:

They’re always skewed,

Not if we use randomization.

but if we use actual blocks as basis, at least we can’t trick ourselves so easily.

We are just tricking ourselves differently.

After a brief discussion with @Eunovo and @josibake elsewhere I think there should be 3 separate benchmarks: No Schnorr Blocks, Random-mix Blocks and only schnorr blocks. This allows to see the impact of changes on each of these scenarios and the random mix blocks should cover what we currently see on chain.

l0rinc commented at 2:37 pm on January 21, 2025:

They’re always skewed

Not if we use randomization.

We always introduce bias, it’s unavoidable. Let’s try to come up with something that minimizes our own preference of how we wish the blocks looked like (as in this PR where the block only contained Schnorrs). We can avoid that by checking the actual usages and relying on them.

I think there should be 3 separate benchmarks: No Schnorr Blocks, Random-mix Blocks and only schnorr blocks.

This will likely minimize the biases that we introduce - especially, as mentioned before, if we rely on existing blocks, instead of an artificial setup that are skewed towards the settings we desire to measure.

Eunovo commented at 3:40 pm on January 21, 2025:

I’ve updated the PR to use 3 blocks

all schnorr
50% schnorr, 50% ecdsa
all ecdsa

If we used one block from Mainnet here, we would optimize for that block. This setup of 3 blocks is more balanced. We can always do an actual IBD benchmark or use tools like Benchcoin cc @josibake

Connecting random blocks from Mainnet is tricky. At best, I would only be able to use the Mainnet block as a template to create the blocks for the benchmark. I’m not sure it’s worth it to do this.

l0rinc commented at 3:45 pm on January 21, 2025:

we would optimize for that block

So what are we optimizing for now?

Eunovo force-pushed on Jan 20, 2025

fjahr commented at 12:41 pm on January 21, 2025: contributor

Concept ACK

What’s keeping this in draft status?

Eunovo commented at 1:06 pm on January 21, 2025: contributor

@fjahr I’m trying to use some real Mainnet blocks here instead. Block 861848 for example, has a lot of taproot inputs and could be good for measuring batch-validation impact on a taproot-heavy block. cc @l0rinc

Eunovo force-pushed on Jan 21, 2025

in src/bench/check_connectblock.cpp:102 in c0597320e1 outdated

 97+static void ConnectBlockMixed(benchmark::Bench& bench)
 98+{
 99+    const auto test_setup{MakeNoLogFileContext<TestChain100Setup>()};
100+
101+    size_t num_taproot{2};
102+    size_t num_nontaproot{2};

l0rinc commented at 3:41 pm on January 21, 2025:

As mentioned earlier, I would prefer basing the constants on data derived from actual usage. The current approach feels arbitrary and directly impacts the outcome.

fjahr commented at 4:32 pm on January 21, 2025:

Picking a random block is just as arbitrary and it would make sense to base the numbers on an outlook for the future rather than looking backwards. I think 50/50 is good as a starting point. I’m sure there has been a block in past that had exactly 50% schnorr sigs, so this argument about using a specific block is pointless. If you want to suggest a different split based on the past I would suggest taking a larger range of blocks and deriving a very specific split from that. Like over the last year it’s probably going to be something around 40%. I would still prefer to use 50/50 because it’s more representative of what we expect the future to be like but maybe you can convince other reviewers to use this.

l0rinc commented at 4:45 pm on January 21, 2025:

Picking a random block is just as arbitrary

Absolutely, nobody recommended that.

I’m sure there has been a block in past that had exactly 50% schnorr sigs

My point is that I’d appreciate putting in the work and actually testing that assumption. And I’m not recommending getting any block, but create one that is based on measured values, not guessed ones.

Like over the last year it’s probably going to be something around 40%.

Great, can we back that by actual measurements?

it’s more representative of what we expect the future to be like

How so? Whenever that changes we can easily adjust the benchmarks. Let’s base it on data instead of feelings.

What I’m proposing is to take inspiration from the actual usages instead of coming up with these trivial values - only to discover later that they’re not representative and don’t cause any measurable speedup in reality.

fjahr commented at 5:20 pm on January 21, 2025:

Absolutely, nobody recommended that.

You literally suggested it one day ago and when @Eunovo said he was going to follow through with it you gave him a thumbs up. That’s what started this whole argument.

Great, can we back that by actual measurements?

As I said, I think 50/50 is fine so I won’t do those measurements. Unless @Eunovo wants to do them it’s probably better if you do them and suggest to use them here.

How so? Whenever that changes we can easily adjust the benchmarks. Let’s base it on data instead of feelings.

What I’m proposing is to take inspiration from the actual usages instead of coming up with these trivial values - only to discover later that they’re not representative and don’t cause any measurable speedup in reality.

These aren’t feelings, they are assumptions based on what is being built in the bitcoin ecosystem. It seems more likely than not that we will see more taproot adoption rather than less.

l0rinc commented at 5:25 pm on January 21, 2025:

You literally suggested it one day ago

Read it again, there was no randomness suggested, rather the opposite, a well thought out investigation.

likely than not that we will see more taproot adoption rather than less

Exactly, so why 50/50 and not 80/20?

fjahr commented at 5:28 pm on January 21, 2025:

Exactly, so why 50/50 and not 80/20?

Why 80/20?

l0rinc commented at 5:29 pm on January 21, 2025:

Exactly!

fjahr commented at 5:31 pm on January 21, 2025:

Read it again, there was no randomness suggested

What investigation? You suggested to pick a random block from mainnet and @Eunovo followed up with this saying “Block 861848 for example”.

fjahr commented at 5:33 pm on January 21, 2025:

Exactly, so why 50/50 and not 80/20?

Why 80/20?

I hope you get the point. I can ask you why why why for anything you suggest too (why do you want to look at the past? Why do you want to look at the past year/6 months etc) and this PR will not go anywhere which would be a shame. This is a pointless debate.

l0rinc commented at 5:34 pm on January 21, 2025:

Please read what I wrote, there was no random block suggestion. All I’m saying is let’s back up our choices by data instead of random values guided by feelings. Did not expect that to be a controversial statement…

fjahr commented at 5:40 pm on January 21, 2025:

Please read what I wrote, there was no random block suggestion.

Of course and you confirmed it with the engagement in the comment.

All I’m saying is let’s back up our choices by data instead of random values guided by feelings.

What data? Your suggestion is as arbitrary as mine as long as you have not made an actual suggestion of which data you want to use and put in the work to run the analysis. When have done that I can degrade that effort just like you degrade my comments here by calling them “feelings” because you will have to make decisions, too, like which blocks you actually include.

pinheadmz commented at 5:52 pm on January 21, 2025:

Ok guys, cool it please. I recommend a 24 hour break on this thread, or take it off github.

mzumsande commented at 5:56 pm on January 21, 2025: contributor

I’m trying to use some real Mainnet blocks here instead.

How would that be feasible if we don’t have the UTXO set that this block is validated on?

l0rinc commented at 6:03 pm on January 21, 2025: contributor

A similar attempt I had (based on your previous comment): https://github.com/bitcoin/bitcoin/pull/31699/commits/d90a0b8c90cd662bd2588e7df5fa6f641eebe3ba

But we can of course just copy the structure to make sure the benchmarks measure something real and not something completely made up (happens all the time with benchmarks and tests). There is no perfect benchmark, but I’d like to investigate if we can do better than what we have currently.

Eunovo commented at 6:10 pm on January 21, 2025: contributor

How would that be feasible if we don’t have the UTXO set that this block is validated on? @mzumsande Copying the block structure could be feasible

fjahr commented at 6:43 pm on January 21, 2025: contributor

Approach ACK

IMO the check_ prefix on the benchmark file name is a bit odd, just connectblock.cpp should be fine.

And there is quite a bit of duplication in the file benchmarks file, I played around with it a bit and I would suggest something like this: https://github.com/fjahr/bitcoin/commit/12eab5df067915ed83f91bf0777fcd0189f02b85 feel free to use it.

And I think you can take this out of draft status :)

Eunovo force-pushed on Jan 22, 2025

Eunovo commented at 3:47 pm on January 22, 2025: contributor

I used a script to count Tx inputs in the 848000 to 868000 block range. The non-taproot to taproot ratio is 80% to 20%. I used that to create the mixed block. The mixed block can be used to gauge the effects of improvements like batch-validation on current blocks while we also see what it’s effect on all schnorr blocks look like.

Eunovo marked this as ready for review on Jan 22, 2025

in src/bench/CMakeLists.txt:26 in d186875d42 outdated

19@@ -20,6 +20,7 @@ add_executable(bench_bitcoin
20   chacha20.cpp
21   checkblock.cpp
22   checkblockindex.cpp
23+  connectblock.cpp
24   checkqueue.cpp
25   cluster_linearize.cpp
26   crypto_hash.cpp

l0rinc commented at 11:31 am on January 23, 2025:

We order these alphabetically to minimize needless merge conflicts

0  checkblockindex.cpp
1  checkqueue.cpp
2  cluster_linearize.cpp
3  connectblock.cpp
4  crypto_hash.cpp

in src/bench/connectblock.cpp:15 in d186875d42 outdated

12+#include <test/util/setup_common.h>
13+#include <validation.h>
14+
15+#include <cassert>
16+#include <utility>
17+#include <vector>

l0rinc commented at 11:39 am on January 23, 2025:

nit: utility seems unused and bench seems out of line:

 0#include <addresstype.h>
 1#include <bench/bench.h>
 2#include <interfaces/chain.h>
 3#include <kernel/cs_main.h>
 4#include <script/interpreter.h>
 5#include <sync.h>
 6#include <test/util/setup_common.h>
 7#include <validation.h>
 8
 9#include <cassert>
10#include <vector>

l0rinc changes_requested

l0rinc commented at 12:28 pm on January 23, 2025: contributor

I have started reviewing but it contains too many misunderstanding, a few quick notes:

use std::shared_ptr to clarify ownership instead of get
don’t clear the vectors, recreate them, they’re faster and more obvious
use brace inits where possible
for primitive parameters add comments on use site
use_schnorr seems redundant, num_taproot already defines it
bench.unit("block").run([&] { unit should likely be something that changes, e.g. total signatures or something that we can compare between the different runs
use emplace_back to construct objects in-place
destructure tuple returns to avoid calling .first
num_txs can be a param with default value

Stopped reviewing here, will continue based on follow ups

fjahr commented at 12:37 pm on January 23, 2025: contributor

The non-taproot to taproot ratio is 80% to 20%. I used that to create the mixed block.

I still think that 50/50 would have been a better choice. Just from looking at the UTXO set we know that this number will be different from 80/20 in the future. Currently P2TR outputs take up 32% of the UTXO set. P2TR grows much faster than all other output types in the UTXO set. The other outputs types have been mostly stagnating the past ~year (while P2TR utxos doubled in the last year, P2WPKH grew by 10%, the others didn’t grow). And you can subtract provably unspendables and lost keys from the rest of the output types in much higher numbers which will lead to an even higher share of P2TR inputs in future blocks. Could all current P2TR key holder throw them away and never spend their outputs? Sure, but much less likely than that the trend continues and future block have a lot more taproot inputs.

And future blocks are much more interesting for this benchmark than past blocks because we can do all kinds of tricks and optimizations to speed up IBD but having a new tip propagate through the network as fast as possible is a much harder task.

This is my last comment on the topic of the mixed block split and I will just focus on the code from now on. I just want to reiterate that it makes a lot more sense to optimize for future blocks rather than past blocks.

ACK to 80/20 split. Please leave it and let’s move on.

Eunovo force-pushed on Jan 24, 2025

TheCharlatan commented at 7:12 am on January 25, 2025: contributor

Re #31689#pullrequestreview-2569617843

use std::shared_ptr to clarify ownership instead of get

I don’t think this applies here, or is good advice in general. Shared pointers only seem useful when the object receiving a pointer possibly outlives the original, which I don’t think is the case here anywhere. I’d instead suggest to take a reference of the underlying object where possible.

 0diff --git a/src/bench/connectblock.cpp b/src/bench/connectblock.cpp
 1index 51471aa3d5..ec28646b0f 100644
 2--- a/src/bench/connectblock.cpp
 3+++ b/src/bench/connectblock.cpp
 4@@ -18 +18 @@ CBlock CreateTestBlock(
 5-    std::shared_ptr<TestChain100Setup> test_setup,
 6+    TestChain100Setup& test_setup,
 7@@ -23 +23 @@ CBlock CreateTestBlock(
 8-    Chainstate& chainstate{test_setup->m_node.chainman->ActiveChainstate()};
 9+    Chainstate& chainstate{test_setup.m_node.chainman->ActiveChainstate()};
10@@ -32 +32 @@ CBlock CreateTestBlock(
11-    const WitnessV1Taproot coinbase_taproot{XOnlyPubKey(test_setup->coinbaseKey.GetPubKey())};
12+    const WitnessV1Taproot coinbase_taproot{XOnlyPubKey(test_setup.coinbaseKey.GetPubKey())};
13@@ -36,2 +36,2 @@ CBlock CreateTestBlock(
14-    auto coinbase_to_spend{test_setup->m_coinbase_txns[0]};
15-    const auto [first_tx, _]{test_setup->CreateValidTransaction(
16+    auto coinbase_to_spend{test_setup.m_coinbase_txns[0]};
17+    const auto [first_tx, _]{test_setup.CreateValidTransaction(
18@@ -42 +42 @@ CBlock CreateTestBlock(
19-    test_setup->CreateAndProcessBlock(std::vector{first_tx}, test_block_parent_coinbase, &chainstate);
20+    test_setup.CreateAndProcessBlock(std::vector{first_tx}, test_block_parent_coinbase, &chainstate);
21@@ -55 +55 @@ CBlock CreateTestBlock(
22-        const auto [taproot_tx, _]{test_setup->CreateValidTransaction(
23+        const auto [taproot_tx, _]{test_setup.CreateValidTransaction(
24@@ -63 +63 @@ CBlock CreateTestBlock(
25-    return test_setup->CreateBlock(txs, coinbase_spk, chainstate);
26+    return test_setup.CreateBlock(txs, coinbase_spk, chainstate);
27@@ -89 +89 @@ std::pair<std::vector<CKey>, std::vector<CScript>> CreateKeysAndScripts(const CK
28-void BenchmarkConnectBlock(benchmark::Bench& bench, std::vector<CKey>& keys, std::vector<CScript>& spks, std::shared_ptr<TestChain100Setup> test_setup)
29+void BenchmarkConnectBlock(benchmark::Bench& bench, std::vector<CKey>& keys, std::vector<CScript>& spks, TestChain100Setup& test_setup)
30@@ -95 +95 @@ void BenchmarkConnectBlock(benchmark::Bench& bench, std::vector<CKey>& keys, std
31-    Chainstate& chainstate{test_setup->m_node.chainman->ActiveChainstate()};
32+    Chainstate& chainstate{test_setup.m_node.chainman->ActiveChainstate()};
33@@ -111 +111 @@ static void ConnectBlockAllSchnorr(benchmark::Bench& bench)
34-    const std::shared_ptr test_setup{MakeNoLogFileContext<TestChain100Setup>()};
35+    const std::unique_ptr test_setup{MakeNoLogFileContext<TestChain100Setup>()};
36@@ -113 +113 @@ static void ConnectBlockAllSchnorr(benchmark::Bench& bench)
37-    BenchmarkConnectBlock(bench, keys, spks, test_setup);
38+    BenchmarkConnectBlock(bench, keys, spks, *test_setup);
39@@ -118 +118 @@ static void ConnectBlockMixed(benchmark::Bench& bench)
40-    const std::shared_ptr test_setup{MakeNoLogFileContext<TestChain100Setup>()};
41+    const std::unique_ptr test_setup{MakeNoLogFileContext<TestChain100Setup>()};
42@@ -121 +121 @@ static void ConnectBlockMixed(benchmark::Bench& bench)
43-    BenchmarkConnectBlock(bench, keys, spks, test_setup);
44+    BenchmarkConnectBlock(bench, keys, spks, *test_setup);
45@@ -126 +126 @@ static void ConnectBlockNoSchnorr(benchmark::Bench& bench)
46-    const std::shared_ptr test_setup{MakeNoLogFileContext<TestChain100Setup>()};
47+    const std::unique_ptr test_setup{MakeNoLogFileContext<TestChain100Setup>()};
48@@ -128 +128 @@ static void ConnectBlockNoSchnorr(benchmark::Bench& bench)
49-    BenchmarkConnectBlock(bench, keys, spks, test_setup);
50+    BenchmarkConnectBlock(bench, keys, spks, *test_setup);

TheCharlatan commented at 8:08 am on January 25, 2025: contributor

Approach ACK

in src/bench/connectblock.cpp:66 in d54722be41 outdated

61+    // Coinbase output can use any output type as it is not spent and will not change the benchmark
62+    const CScript coinbase_spk{GetScriptForDestination(coinbase_taproot)};
63+    return test_setup->CreateBlock(txs, coinbase_spk, chainstate);
64+}
65+
66+std::pair<std::vector<CKey>, std::vector<CScript>> CreateKeysAndScripts(const CKey& coinbaseKey, size_t num_taproot, size_t num_nontaproot)

josibake commented at 3:33 pm on January 27, 2025:

I know that we generically call things “scripts” but I think it would be more clear to name this something like CreateKeysAndSignatures, as that better indicates what we are testing, i.e., ECDSA signatures and Schnorr signatures. I’d also recommend renaming the variables to num_ecdsa_sigs and num_schnorr_sigs, or something similar.

EDIT: reading this again, Scripts is more accurate since the signatures are created later, ignore me 😅

in src/bench/connectblock.cpp:105 in d54722be41 outdated

100+
101+    BlockValidationState test_block_state;
102+    bench.unit("block").run([&] {
103+        LOCK(cs_main);
104+        CCoinsViewCache viewNew{&chainstate.CoinsTip()};
105+        assert(chainstate.ConnectBlock(test_block, test_block_state, pindex.get(), viewNew, false));

josibake commented at 3:43 pm on January 27, 2025:

fJustCheck is already false by default, so I think this would be better as:

0        assert(chainstate.ConnectBlock(test_block, test_block_state, pindex.get(), viewNew));

in src/bench/connectblock.cpp:30 in d54722be41 outdated

25+    std::vector<CTxOut> outputs;
26+    // Each transaction will pay to the same outputs that will be spent in the next one
27+    outputs.reserve(scriptpubkeys.size());
28+    for (const auto& spk : scriptpubkeys) {
29+        outputs.emplace_back(COIN, spk);
30+    }

josibake commented at 3:45 pm on January 27, 2025:

It seems this was moved in here to deduplicate the code (which is great!), but I found this part to be a bit confusing. We only use this scriptpubkeys argument to create an outputs array, so why not create the CTxOut vector directly when we are creating the scriptpubkeys vector and then pass the CTxOut vector in as an argument? I also think a comment explicitly mentioning we are creating an same number input, same number output transaction would be worth adding.

Eunovo commented at 4:09 am on January 30, 2025:

so why not create the CTxOut vector directly when we are creating the scriptpubkeys vector and then pass the CTxOut vector in as an argument?

I updated the PR to do that.

I also think a comment explicitly mentioning we are creating an same number input, same number output transaction would be worth adding.

I also added a comment stating this

josibake commented at 3:46 pm on January 27, 2025: member

Concept ACK

Thanks for picking this up, @Eunovo! This is a necessary precursor for the batch validation work, but also a generally useful benchmark to have. I think the motivation for the PR could be a bit more detailed, specifically highlighting that the benchmarks we are adding now are focused on signature verification. I imagine there are other aspects of ConnectBlock we might want to benchmark, which can be added to this file in the future. I do think it’s a good idea to keep this initial PR focused on signature validation.

I ran the benchmark locally (after doing the recommended pyperf step) and I noticed the mixed block case was taking noticeably longer than the all ECDSA/all Schnorr cases. See the output below:

0./build/src/bench/bench_bitcoin -filter="ConnectBlock.*"
1|            ns/block |             block/s |    err% |       ins/block |       cyc/block |    IPC |      bra/block |   miss% |     total | benchmark
2|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
3|       66,432,404.00 |               15.05 |    0.1% |  675,467,204.00 |  165,553,635.00 |  4.080 |  15,187,165.00 |    1.7% |      0.74 | `ConnectBlockAllSchnorr`
4|       83,202,399.00 |               12.02 |    0.1% |  839,153,157.00 |  207,359,165.00 |  4.047 |  19,061,066.00 |    1.7% |      0.93 | `ConnectBlockMixed`
5|       66,390,509.00 |               15.06 |    0.1% |  670,872,896.00 |  165,494,875.00 |  4.054 |  15,186,590.00 |    1.7% |      0.74 | `ConnectBlockNoSchnorr`

This was surprising to me. I noticed that for the 80/20 split, you are doing it within a transaction, e.g., two taproot inputs and two ECDSA inputs in the same transaction (as opposed to a transaction with two Schnorr spends and a transaction with two ECDSA spends). My first guess is the extra time is coming from hashing the transaction multiple times for the different signature digest algorithms? Just a guess; will need to dig into the code more to understand. I think it would be good to document some of this in the comments, explaining how we are constructing the transactions and that we are only doing the simple case of key path spends for Schnorr and using SIGHASH_ALL/DEFAULT in these initial benchmarks.

I’d also echo @TheCharlatan ’s feedback regarding taking a reference. Still in the process of reviewing, so will follow up with more feedback on the implementation. In the meanwhile, left some nits, if you end up retouching this before I finish my review

Eunovo force-pushed on Jan 30, 2025

Eunovo commented at 4:12 am on January 30, 2025: contributor

I’ve updated the code to take a reference to the underlying test_setup object following @TheCharlatan and @josibake reviews. I’ve also added more comments explaining how the transactions in the test block are created.

Eunovo force-pushed on Jan 30, 2025

Eunovo commented at 4:40 am on January 30, 2025: contributor

My first guess is the extra time is coming from hashing the transaction multiple times for the different signature digest algorithms? Just a guess; will need to dig into the code more to understand.

I’m pretty sure this is the case after looking at SignatureHash and SignatureHashSchnorr in interpreter.cpp. I added a comment explaining this

in src/bench/connectblock.cpp:72 in 1c6b886465 outdated

67+ * Creates key pairs and corresponding outputs for the benchmark transactions.
68+ * - For Taproot outputs: Creates simple key path spendable outputs
69+ * - For non-Taproot outputs: Creates P2WPKH (native SegWit v0) outputs
70+ * - All outputs have value of 1 BTC
71+ */
72+std::pair<std::vector<CKey>, std::vector<CTxOut>> CreateKeysAndOutputs(const CKey& coinbaseKey, size_t num_taproot, size_t num_nontaproot)

josibake commented at 3:54 pm on February 10, 2025:

nit: int seems more appropriate here and I find num_ecdsa and num_schnorr to be more intuitive than taproot and nontaproot

0std::pair<std::vector<CKey>, std::vector<CTxOut>> CreateKeysAndOutputs(const CKey& coinbaseKey, int num_ecdsa, int num_schnorr)

l0rinc commented at 2:10 pm on February 12, 2025:

Agree with the rename (including benchmark names), but have a slight preference for natural numbers instead, since it’s a counter (we can’t have -1 transctions, a “size_t” documents that better)

Eunovo commented at 10:25 am on February 13, 2025:

I renamed num_taproot and num_nontaproot to num_schnorr and num_ecdsa respectively in https://github.com/bitcoin/bitcoin/pull/31689/commits/4b37efee90ee339b7bde7777b24a5268501cee9d but left type as size_t

in src/bench/connectblock.cpp:80 in 1c6b886465 outdated

75+    keys.reserve(num_taproot + num_nontaproot + 1);
76+
77+    std::vector<CTxOut> outputs;
78+    outputs.reserve(num_taproot + num_nontaproot);
79+
80+    for (size_t i{0}; i < num_nontaproot; i++) {

josibake commented at 3:56 pm on February 10, 2025:

per the style guide in doc/developer-notes.md, ++i is preferred over i++

0    for (int i{0}; i < num_schnorr; ++i) {

(same feedback for the other loops below)

Eunovo commented at 10:25 am on February 13, 2025:

Done in https://github.com/bitcoin/bitcoin/pull/31689/commits/4b37efee90ee339b7bde7777b24a5268501cee9d

josibake approved

josibake commented at 4:07 pm on February 10, 2025: member

ACK https://github.com/bitcoin/bitcoin/pull/31689/commits/1c6b886465df0f00549e7d10c3bfefd27be7f1c2

I spent some time trying to come up with a way to create a mixed block of all schnorr input transactions and all ecdsa input transactions (as opposed to transactions with both schnorr and ecdsa inputs), but I couldn’t find a simple way to do it. Furthermore, it felt unnecessary for what we are trying to accomplish: establish a baseline for signature validation of blocks with both schnorr and ecdsa signatures. Thought it was a bit surprising at first, I think the comment you added explaining that a mixed input transaction will take longer to validate is sufficient.

I left a few non-blocking style nits if you end up retouching, but overall this looks good!

DrahtBot requested review from TheCharlatan on Feb 10, 2025

DrahtBot requested review from fjahr on Feb 10, 2025

in src/bench/connectblock.cpp:45 in 1c6b886465 outdated

40+        chainstate.m_chain.Height() + 1, keys, outputs, std::nullopt, std::nullopt)};
41+    const auto test_block_parent_coinbase{GetScriptForDestination(coinbase_taproot)};
42+    test_setup.CreateAndProcessBlock(std::vector{first_tx}, test_block_parent_coinbase, &chainstate);
43+
44+    std::vector<CMutableTransaction> txs;
45+    txs.reserve(num_txs);

davidgumberg commented at 6:37 am on February 12, 2025:

not blocking, just a question: here and below, why reserve() for vectors that are outside of a critical path?

Eunovo commented at 11:54 am on February 12, 2025:

I understand that the use of reserve here is not critical, but, I generally use reserve() whenever I know the size of the vector beforehand in line with best practices

l0rinc commented at 1:43 pm on February 12, 2025:

It’s good general practice and it documents the code as well, e.g:

0    std::vector<CTxOut> outputs;
1    outputs.reserve(num_taproot + num_nontaproot);

in src/bench/connectblock.cpp:35 in 1c6b886465 outdated

30+    Chainstate& chainstate{test_setup.m_node.chainman->ActiveChainstate()};
31+
32+    const WitnessV1Taproot coinbase_taproot{XOnlyPubKey(test_setup.coinbaseKey.GetPubKey())};
33+
34+    // Create the outputs that will be spent in the first transaction of the test block
35+    // Doing this in a separate blocks excludes the validation of it's inputs from the benchmark

l0rinc commented at 1:50 pm on February 12, 2025:

0    // Doing this in a separate block excludes the validation of its inputs from the benchmark

But more importantly, why are we doing this in the first place? Why do we want to exclude the validation of the inputs? Is that an important usecase?

davidgumberg commented at 1:58 am on February 13, 2025:

As-is this gives callers that use this helper more control over the blocks that they’re testing.

Excluding the necessary transaction that spends a coinbase input and creates the first set of outputs of the size and arrangement that the caller has designed makes these benchmarks more ‘ideal’ since it avoids testing validation of an input whose spending conditions the caller doesn’t have any control over, since TestChain100Setup gets to decide the coinbase outputs that the first transaction will have to spend.

l0rinc commented at 12:24 pm on February 13, 2025:

I’m not expert in this area to decide if this is to sterile or not (i.e. testing a too narrow slice that isn’t representative, where the biggest speedup doesn’t cause any measurable macro change), will let others decide if this is too micro or not.

in src/bench/connectblock.cpp:126 in 1c6b886465 outdated

121+
122+/**
123+ * This benchmark is expected to be slower than the AllSchnorr or NoSchnorr benchmark
124+ * because it uses transactions with both Schnorr and ECDSA signatures
125+ * which requires the transaction to be hashed multiple times for
126+ * the different signature allgorithms

l0rinc commented at 1:56 pm on February 12, 2025:

0 * the different signature algorithms

Eunovo commented at 10:27 am on February 13, 2025:

Done in https://github.com/bitcoin/bitcoin/pull/31689/commits/4b37efee90ee339b7bde7777b24a5268501cee9d

in src/bench/connectblock.cpp:42 in 1c6b886465 outdated

37+    const auto [first_tx, _]{test_setup.CreateValidTransaction(
38+        std::vector{coinbase_to_spend},
39+        std::vector{COutPoint(coinbase_to_spend->GetHash(), 0)},
40+        chainstate.m_chain.Height() + 1, keys, outputs, std::nullopt, std::nullopt)};
41+    const auto test_block_parent_coinbase{GetScriptForDestination(coinbase_taproot)};
42+    test_setup.CreateAndProcessBlock(std::vector{first_tx}, test_block_parent_coinbase, &chainstate);

l0rinc commented at 2:08 pm on February 12, 2025:

nit: we don’t necessarily need the std::vector constructs here, we could reduce the nose slightly (please see other uses of it here):

0    const auto [first_tx, _]{test_setup.CreateValidTransaction(
1        {coinbase_to_spend},
2        {COutPoint(coinbase_to_spend->GetHash(), 0)},
3        chainstate.m_chain.Height() + 1, keys, outputs, {}, {})};
4    const auto test_block_parent_coinbase{GetScriptForDestination(coinbase_taproot)};
5    test_setup.CreateAndProcessBlock({first_tx}, test_block_parent_coinbase, &chainstate);

Eunovo commented at 10:27 am on February 13, 2025:

Done in https://github.com/bitcoin/bitcoin/pull/31689/commits/4b37efee90ee339b7bde7777b24a5268501cee9d

in src/bench/connectblock.cpp:105 in 1c6b886465 outdated

100+
101+    Chainstate& chainstate{test_setup.m_node.chainman->ActiveChainstate()};
102+
103+    pindex->nHeight = chainstate.m_chain.Height() + 1;
104+    pindex->phashBlock = test_blockhash.get();
105+    pindex->pprev = chainstate.m_chain.Tip();

l0rinc commented at 2:12 pm on February 12, 2025:

It looks to me like we could do this more consistently:

0    auto& chainman{test_setup.m_node.chainman};
1    auto& chainstate{chainman->ActiveChainstate()};
2    {
3        LOCK(::cs_main);
4        auto* pindex{chainman->m_blockman.AddToBlockIndex(test_block, chainman->m_best_header)};
5        chainman->ActiveChain().SetTip(*pindex);
6    }
7    ...
8        assert(chainstate.ConnectBlock(test_block, test_block_state, chainman->ActiveChain().Tip(), viewNew));

but even if this isn’t the same, we should be able to get rid of the unique conversion back and forth:

0uint256 blockhash{test_block.GetHash()};
1pindex.phashBlock = &blockhash;

Eunovo commented at 10:28 am on February 13, 2025:

Adjusted the code to look more like this in https://github.com/bitcoin/bitcoin/pull/31689/commits/4b37efee90ee339b7bde7777b24a5268501cee9d

l0rinc commented at 12:35 pm on February 13, 2025:

is the AddToBlockIndex cheap enough to be done inside the benchmark

in src/bench/connectblock.cpp:46 in 1c6b886465 outdated

41+    const auto test_block_parent_coinbase{GetScriptForDestination(coinbase_taproot)};
42+    test_setup.CreateAndProcessBlock(std::vector{first_tx}, test_block_parent_coinbase, &chainstate);
43+
44+    std::vector<CMutableTransaction> txs;
45+    txs.reserve(num_txs);
46+    CTransactionRef input_tx{MakeTransactionRef(first_tx)};

l0rinc commented at 2:26 pm on February 12, 2025:

I don’t like this separation of first and rest - it introduces state in this loop that’s really hard to follow.

I don’t yet understand this part (we do we need a separate CreateValidTransaction call for the first tx here, I don’t get why that’s special - since it’s not a coinbase, right?), but if we have to keep it, consider simplifying to something like this inside the loop:

0        const auto& tx_to_spend{i == 0 ? first_tx : txs.back()};

davidgumberg commented at 2:02 am on February 13, 2025:

The reason to separate the first tx is because it goes into a block of its own. https://github.com/bitcoin/bitcoin/pull/31689/files#r1953657923

in src/bench/connectblock.cpp:101 in 1c6b886465 outdated

105+    pindex->pprev = chainstate.m_chain.Tip();
106+
107+    BlockValidationState test_block_state;
108+    bench.unit("block").run([&] {
109+        LOCK(cs_main);
110+        CCoinsViewCache viewNew{&chainstate.CoinsTip()};

l0rinc commented at 2:47 pm on February 12, 2025:

I’ve checked the price of calling this in every iteration: 1,000,387.85 blocks / s without ConnectBlock (measuring CCoinsViewCache creation only) versus 28.06 blocks / s for both - so 👍 we’re mostly measuring ConnectBlock speed here.

in src/bench/connectblock.cpp:36 in 1c6b886465 outdated

31+
32+    const WitnessV1Taproot coinbase_taproot{XOnlyPubKey(test_setup.coinbaseKey.GetPubKey())};
33+
34+    // Create the outputs that will be spent in the first transaction of the test block
35+    // Doing this in a separate blocks excludes the validation of it's inputs from the benchmark
36+    auto coinbase_to_spend{test_setup.m_coinbase_txns[0]};

l0rinc commented at 2:48 pm on February 12, 2025:

we can avoid copying this:

0    const auto& coinbase_to_spend{test_setup.m_coinbase_txns[0]};

in src/bench/connectblock.cpp:27 in 1c6b886465 outdated

22+ * - Each transaction spends all outputs from the previous transaction
23+ */
24+CBlock CreateTestBlock(
25+    TestChain100Setup& test_setup,
26+    const std::vector<CKey>& keys,
27+    const std::vector<CTxOut>& outputs,

l0rinc commented at 2:56 pm on February 12, 2025:

we could use std::span here

0    std::span<CKey> keys,
1    std::span<CTxOut> outputs,

but unfortunately that would require modernizing CreateValidTransaction as well - mayne in a follow-up PR :)

in src/bench/connectblock.cpp:41 in 1c6b886465 outdated

36+    auto coinbase_to_spend{test_setup.m_coinbase_txns[0]};
37+    const auto [first_tx, _]{test_setup.CreateValidTransaction(
38+        std::vector{coinbase_to_spend},
39+        std::vector{COutPoint(coinbase_to_spend->GetHash(), 0)},
40+        chainstate.m_chain.Height() + 1, keys, outputs, std::nullopt, std::nullopt)};
41+    const auto test_block_parent_coinbase{GetScriptForDestination(coinbase_taproot)};

l0rinc commented at 3:03 pm on February 12, 2025:

isn’t test_block_parent_coinbase the same as coinbase_spk? Can we reuse it?

0    const CScript coinbase_script{GetScriptForDestination(coinbase_taproot)};

in src/bench/connectblock.cpp:83 in 1c6b886465 outdated

78+    outputs.reserve(num_taproot + num_nontaproot);
79+
80+    for (size_t i{0}; i < num_nontaproot; i++) {
81+        const CKey key{GenerateRandomKey()};
82+        keys.emplace_back(key);
83+        outputs.emplace_back(COIN, GetScriptForDestination(WitnessV0KeyHash{key.GetPubKey()}));

l0rinc commented at 3:06 pm on February 12, 2025:

to avoid possible copying we could take advantage of emplace_back constructing the key in-place:

0        keys.emplace_back(GenerateRandomKey());
1        outputs.emplace_back(COIN, GetScriptForDestination(WitnessV0KeyHash{keys.back().GetPubKey()}));

in src/bench/connectblock.cpp:145 in 1c6b886465 outdated

140+    BenchmarkConnectBlock(bench, keys, outputs, *test_setup);
141+}
142+
143+BENCHMARK(ConnectBlockAllSchnorr, benchmark::PriorityLevel::HIGH);
144+BENCHMARK(ConnectBlockMixed, benchmark::PriorityLevel::HIGH);
145+BENCHMARK(ConnectBlockNoSchnorr, benchmark::PriorityLevel::HIGH);

l0rinc commented at 3:13 pm on February 12, 2025:

for consistency:

0BENCHMARK(ConnectBlockAllEcdsa, benchmark::PriorityLevel::HIGH);

in src/bench/connectblock.cpp:144 in 1c6b886465 outdated

139+    auto [keys, outputs]{CreateKeysAndOutputs(test_setup->coinbaseKey, /*num_taproot=*/0, /*num_nontaproot=*/4)};
140+    BenchmarkConnectBlock(bench, keys, outputs, *test_setup);
141+}
142+
143+BENCHMARK(ConnectBlockAllSchnorr, benchmark::PriorityLevel::HIGH);
144+BENCHMARK(ConnectBlockMixed, benchmark::PriorityLevel::HIGH);

l0rinc commented at 3:14 pm on February 12, 2025:

These names are global, it may not be obvious what “ConnectBlockMixed” means outside of this context:

0BENCHMARK(ConnectBlockMixedEcdsaSchnorr, benchmark::PriorityLevel::HIGH);

in src/bench/connectblock.cpp:107 in 1c6b886465 outdated

110+        CCoinsViewCache viewNew{&chainstate.CoinsTip()};
111+        assert(chainstate.ConnectBlock(test_block, test_block_state, pindex.get(), viewNew));
112+    });
113+}
114+
115+static void ConnectBlockAllSchnorr(benchmark::Bench& bench)

l0rinc commented at 3:27 pm on February 12, 2025:

This is what the flames look like - I haven’t worked in this area yet, do you think it’s representative?

and

in src/bench/connectblock.cpp:136 in 1c6b886465 outdated

131+    // Blocks in range 848000 to 868000 have a roughly 20 to 80 ratio of schnorr to ecdsa inputs
132+    auto [keys, outputs]{CreateKeysAndOutputs(test_setup->coinbaseKey, /*num_taproot=*/1, /*num_nontaproot=*/4)};
133+    BenchmarkConnectBlock(bench, keys, outputs, *test_setup);
134+}
135+
136+static void ConnectBlockNoSchnorr(benchmark::Bench& bench)

l0rinc commented at 3:28 pm on February 12, 2025:

And these are the ECDSA ones:

and

l0rinc changes_requested

l0rinc commented at 3:33 pm on February 12, 2025: contributor

Forgot to post the diff with most of my suggestions (please double-check):

  0diff --git a/src/bench/connectblock.cpp b/src/bench/connectblock.cpp
  1--- a/src/bench/connectblock.cpp	(revision cb071470c2d9e595b5b771b153dfe64fdd1b1392)
  2+++ b/src/bench/connectblock.cpp	(date 1739374368802)
  3@@ -32,35 +32,33 @@
  4     const WitnessV1Taproot coinbase_taproot{XOnlyPubKey(test_setup.coinbaseKey.GetPubKey())};
  5 
  6     // Create the outputs that will be spent in the first transaction of the test block
  7-    // Doing this in a separate blocks excludes the validation of it's inputs from the benchmark
  8-    auto coinbase_to_spend{test_setup.m_coinbase_txns[0]};
  9+    // Doing this in a separate block excludes the validation of its inputs from the benchmark
 10+    const auto& coinbase_to_spend{test_setup.m_coinbase_txns[0]};
 11     const auto [first_tx, _]{test_setup.CreateValidTransaction(
 12-        std::vector{coinbase_to_spend},
 13-        std::vector{COutPoint(coinbase_to_spend->GetHash(), 0)},
 14-        chainstate.m_chain.Height() + 1, keys, outputs, std::nullopt, std::nullopt)};
 15-    const auto test_block_parent_coinbase{GetScriptForDestination(coinbase_taproot)};
 16-    test_setup.CreateAndProcessBlock(std::vector{first_tx}, test_block_parent_coinbase, &chainstate);
 17+        {coinbase_to_spend},
 18+        {COutPoint(coinbase_to_spend->GetHash(), 0)},
 19+        chainstate.m_chain.Height() + 1, keys, outputs, {}, {})};
 20+    const CScript coinbase_script{GetScriptForDestination(coinbase_taproot)};
 21+    test_setup.CreateAndProcessBlock({first_tx}, coinbase_script, &chainstate);
 22 
 23     std::vector<CMutableTransaction> txs;
 24     txs.reserve(num_txs);
 25-    CTransactionRef input_tx{MakeTransactionRef(first_tx)};
 26-    for (int i{0}; i < num_txs; i++) {
 27+    for (int i{0}; i < num_txs; ++i) {
 28+        const auto& tx_to_spend{i == 0 ? first_tx : txs.back()};
 29+
 30         std::vector<COutPoint> inputs;
 31         inputs.reserve(outputs.size());
 32-
 33         for (size_t j{0}; j < outputs.size(); j++) {
 34-            inputs.emplace_back(input_tx->GetHash(), j);
 35+            inputs.emplace_back(tx_to_spend.GetHash(), j);
 36         }
 37 
 38         const auto [taproot_tx, _]{test_setup.CreateValidTransaction(
 39-            std::vector{input_tx}, inputs, chainstate.m_chain.Height() + 1, keys, outputs, std::nullopt, std::nullopt)};
 40+            {MakeTransactionRef(tx_to_spend)}, inputs, chainstate.m_chain.Height() + 1, keys, outputs, {}, {})};
 41         txs.emplace_back(taproot_tx);
 42-        input_tx = MakeTransactionRef(taproot_tx);
 43     }
 44 
 45     // Coinbase output can use any output type as it is not spent and will not change the benchmark
 46-    const CScript coinbase_spk{GetScriptForDestination(coinbase_taproot)};
 47-    return test_setup.CreateBlock(txs, coinbase_spk, chainstate);
 48+    return test_setup.CreateBlock(txs, coinbase_script, chainstate);
 49 }
 50 
 51 /*
 52@@ -69,53 +67,51 @@
 53  * - For non-Taproot outputs: Creates P2WPKH (native SegWit v0) outputs
 54  * - All outputs have value of 1 BTC
 55  */
 56-std::pair<std::vector<CKey>, std::vector<CTxOut>> CreateKeysAndOutputs(const CKey& coinbaseKey, size_t num_taproot, size_t num_nontaproot)
 57+std::pair<std::vector<CKey>, std::vector<CTxOut>> CreateKeysAndOutputs(const CKey& coinbaseKey, size_t num_schnorr, size_t num_ecdsa)
 58 {
 59     std::vector<CKey> keys{coinbaseKey};
 60-    keys.reserve(num_taproot + num_nontaproot + 1);
 61+    keys.reserve(num_schnorr + num_ecdsa + 1);
 62 
 63     std::vector<CTxOut> outputs;
 64-    outputs.reserve(num_taproot + num_nontaproot);
 65+    outputs.reserve(num_schnorr + num_ecdsa);
 66 
 67-    for (size_t i{0}; i < num_nontaproot; i++) {
 68-        const CKey key{GenerateRandomKey()};
 69-        keys.emplace_back(key);
 70-        outputs.emplace_back(COIN, GetScriptForDestination(WitnessV0KeyHash{key.GetPubKey()}));
 71+    for (size_t i{0}; i < num_ecdsa; ++i) {
 72+        keys.emplace_back(GenerateRandomKey());
 73+        outputs.emplace_back(COIN, GetScriptForDestination(WitnessV0KeyHash{keys.back().GetPubKey()}));
 74     }
 75 
 76-    for (size_t i{0}; i < num_taproot; i++) {
 77-        CKey key{GenerateRandomKey()};
 78-        keys.emplace_back(key);
 79-        outputs.emplace_back(COIN, GetScriptForDestination(WitnessV1Taproot{XOnlyPubKey(key.GetPubKey())}));
 80+    for (size_t i{0}; i < num_schnorr; ++i) {
 81+        keys.emplace_back(GenerateRandomKey());
 82+        outputs.emplace_back(COIN, GetScriptForDestination(WitnessV1Taproot{XOnlyPubKey(keys.back().GetPubKey())}));
 83     }
 84 
 85     return {keys, outputs};
 86 }
 87 
 88-void BenchmarkConnectBlock(benchmark::Bench& bench, std::vector<CKey>& keys, std::vector<CTxOut>& outputs, TestChain100Setup& test_setup)
 89+void BenchmarkConnectBlock(benchmark::Bench& bench, const std::vector<CKey>& keys, const std::vector<CTxOut>& outputs, TestChain100Setup& test_setup)
 90 {
 91     const auto test_block{CreateTestBlock(test_setup, keys, outputs)};
 92-    auto pindex{std::make_unique<CBlockIndex>(test_block)};
 93-    auto test_blockhash{std::make_unique<uint256>(test_block.GetHash())};
 94+    auto& chainman{test_setup.m_node.chainman};
 95+    auto& chainstate{chainman->ActiveChainstate()};
 96 
 97-    Chainstate& chainstate{test_setup.m_node.chainman->ActiveChainstate()};
 98-
 99-    pindex->nHeight = chainstate.m_chain.Height() + 1;
100-    pindex->phashBlock = test_blockhash.get();
101-    pindex->pprev = chainstate.m_chain.Tip();
102+    {
103+        LOCK(::cs_main);
104+        auto* pindex{chainman->m_blockman.AddToBlockIndex(test_block, chainman->m_best_header)};
105+        chainman->ActiveChain().SetTip(*pindex);
106+    }
107 
108     BlockValidationState test_block_state;
109     bench.unit("block").run([&] {
110         LOCK(cs_main);
111         CCoinsViewCache viewNew{&chainstate.CoinsTip()};
112-        assert(chainstate.ConnectBlock(test_block, test_block_state, pindex.get(), viewNew));
113+        assert(chainstate.ConnectBlock(test_block, test_block_state, chainman->ActiveChain().Tip(), viewNew));
114     });
115 }
116 
117 static void ConnectBlockAllSchnorr(benchmark::Bench& bench)
118 {
119-    const std::unique_ptr test_setup{MakeNoLogFileContext<TestChain100Setup>()};
120-    auto [keys, outputs]{CreateKeysAndOutputs(test_setup->coinbaseKey, /*num_taproot=*/4, /*num_nontaproot=*/0)};
121+    const auto test_setup{MakeNoLogFileContext<TestChain100Setup>()};
122+    auto [keys, outputs]{CreateKeysAndOutputs(test_setup->coinbaseKey, /*num_schnorr=*/4, /*num_ecdsa=*/0)};
123     BenchmarkConnectBlock(bench, keys, outputs, *test_setup);
124 }
125 
126@@ -123,23 +119,23 @@
127  * This benchmark is expected to be slower than the AllSchnorr or NoSchnorr benchmark
128  * because it uses transactions with both Schnorr and ECDSA signatures
129  * which requires the transaction to be hashed multiple times for
130- * the different signature allgorithms
131+ * the different signature algorithms
132  */
133-static void ConnectBlockMixed(benchmark::Bench& bench)
134+static void ConnectBlockMixedEcdsaSchnorr(benchmark::Bench& bench)
135 {
136-    const std::unique_ptr test_setup{MakeNoLogFileContext<TestChain100Setup>()};
137+    const auto test_setup{MakeNoLogFileContext<TestChain100Setup>()};
138     // Blocks in range 848000 to 868000 have a roughly 20 to 80 ratio of schnorr to ecdsa inputs
139-    auto [keys, outputs]{CreateKeysAndOutputs(test_setup->coinbaseKey, /*num_taproot=*/1, /*num_nontaproot=*/4)};
140+    auto [keys, outputs]{CreateKeysAndOutputs(test_setup->coinbaseKey, /*num_schnorr=*/1, /*num_ecdsa=*/4)};
141     BenchmarkConnectBlock(bench, keys, outputs, *test_setup);
142 }
143 
144-static void ConnectBlockNoSchnorr(benchmark::Bench& bench)
145+static void ConnectBlockAllEcdsa(benchmark::Bench& bench)
146 {
147-    const std::unique_ptr test_setup{MakeNoLogFileContext<TestChain100Setup>()};
148-    auto [keys, outputs]{CreateKeysAndOutputs(test_setup->coinbaseKey, /*num_taproot=*/0, /*num_nontaproot=*/4)};
149+    const auto test_setup{MakeNoLogFileContext<TestChain100Setup>()};
150+    auto [keys, outputs]{CreateKeysAndOutputs(test_setup->coinbaseKey, /*num_schnorr=*/0, /*num_ecdsa=*/4)};
151     BenchmarkConnectBlock(bench, keys, outputs, *test_setup);
152 }
153 
154 BENCHMARK(ConnectBlockAllSchnorr, benchmark::PriorityLevel::HIGH);
155-BENCHMARK(ConnectBlockMixed, benchmark::PriorityLevel::HIGH);
156-BENCHMARK(ConnectBlockNoSchnorr, benchmark::PriorityLevel::HIGH);
157+BENCHMARK(ConnectBlockMixedEcdsaSchnorr, benchmark::PriorityLevel::HIGH);
158+BENCHMARK(ConnectBlockAllEcdsa, benchmark::PriorityLevel::HIGH);

davidgumberg referenced this in commit e98e173a57 on Feb 13, 2025

davidgumberg referenced this in commit d2f6f83cea on Feb 13, 2025

davidgumberg commented at 6:52 am on February 13, 2025: contributor

ACK https://github.com/bitcoin/bitcoin/pull/31689/commits/1c6b886465df0f00549e7d10c3bfefd27be7f1c2

I like that the benchmarks added here for ConnectBlock are idealized scenarios that map to specific paths we want to test the performance of, namely Schnorr signature validation and ECDSA signature validation. Measuring performance in realistic scenarios is valuable for establishing ground truth, but it seems to me that microbenchmarks like these are most useful when their focus is narrow, since they will never be a total substitute for a real end-to-end measurement I am willing to sacrifice a little bit of faithfulness in exchange for ease of interpretation.

Out of curiosity, I added a P2PKH scenario (https://github.com/davidgumberg/bitcoin/commit/b11348c50123999e413a5c1c373e108b78954375), where blocks are filled with 4 input 4 output P2PKH transactions, and performance was very similar to P2WPKH:

0$ bench_bitcoin -filter=ConnectBlock.* -min-time=60000

benchmark	ns/block	block/s	err%	ins/block	cyc/block	IPC	bra/block	miss%	total	flamegraph
`ConnectBlockAllSchnorr`	44,778,812.21	22.33	1.0%	648,504,262.83	155,268,849.44	4.177	11,676,225.67	2.2%	65.24	link
`ConnectBlockMixed`	56,068,853.07	17.84	1.1%	808,990,254.47	194,613,843.82	4.157	14,939,414.67	2.2%	66.10	link
`ConnectBlockNoSchnorr`	43,697,139.92	22.88	0.6%	648,840,910.97	151,969,288.79	4.270	11,937,769.88	2.1%	64.39	link
`ConnectBlockPKH`	43,607,527.65	22.93	0.3%	652,306,790.73	151,637,220.73	4.302	12,033,719.03	2.2%	63.64	link

Eunovo force-pushed on Feb 13, 2025

DrahtBot added the label CI failed on Feb 13, 2025

DrahtBot commented at 10:33 am on February 13, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37155591285

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

Eunovo commented at 10:50 am on February 13, 2025: contributor

Rebased https://github.com/bitcoin/bitcoin/commit/1c6b886465df0f00549e7d10c3bfefd27be7f1c2 to https://github.com/bitcoin/bitcoin/pull/31689/commits/6883704a0167aefc539dbae6ced1db976a128fca

I improved some variable and function names based on received feedback, and also made style changes to align with doc/developer-notes.md.

DrahtBot removed the label CI failed on Feb 13, 2025

in src/bench/connectblock.cpp:95 in bd27a83efc outdated

91+}
92+
93+void BenchmarkConnectBlock(benchmark::Bench& bench, std::vector<CKey>& keys, std::vector<CTxOut>& outputs, TestChain100Setup& test_setup)
94+{
95+    const auto test_block{CreateTestBlock(test_setup, keys, outputs)};
96+    auto test_blockhash{std::make_unique<uint256>(test_block.GetHash())};

l0rinc commented at 12:43 pm on February 13, 2025:

Seems unused now

in src/bench/connectblock.cpp:99 in bd27a83efc outdated

 95+    const auto test_block{CreateTestBlock(test_setup, keys, outputs)};
 96+    auto test_blockhash{std::make_unique<uint256>(test_block.GetHash())};
 97+    auto& chainman{test_setup.m_node.chainman};
 98+    auto& chainstate{chainman->ActiveChainstate()};
 99+
100+    BlockValidationState test_block_state;

l0rinc commented at 12:48 pm on February 13, 2025:

can be moved inside the benchmark to make sure previous run doesn’t pollute the next one (the lambda is unrolled and run multiple times, so we should reduce dependencies)

in src/bench/connectblock.cpp:35 in bd27a83efc outdated

30+    Chainstate& chainstate{test_setup.m_node.chainman->ActiveChainstate()};
31+
32+    const WitnessV1Taproot coinbase_taproot{XOnlyPubKey(test_setup.coinbaseKey.GetPubKey())};
33+
34+    // Create the outputs that will be spent in the first transaction of the test block
35+    // Doing this in a separate block excludes the validation of it's inputs from the benchmark

l0rinc commented at 12:48 pm on February 13, 2025:

0    // Doing this in a separate block excludes the validation of its inputs from the benchmark

in src/bench/connectblock.cpp:94 in bd27a83efc outdated

90+    return {keys, outputs};
91+}
92+
93+void BenchmarkConnectBlock(benchmark::Bench& bench, std::vector<CKey>& keys, std::vector<CTxOut>& outputs, TestChain100Setup& test_setup)
94+{
95+    const auto test_block{CreateTestBlock(test_setup, keys, outputs)};

l0rinc commented at 12:50 pm on February 13, 2025:

We could avoid copying here as well:

0    const auto& test_block{CreateTestBlock(test_setup, keys, outputs)};

Eunovo force-pushed on Feb 13, 2025

l0rinc commented at 12:54 pm on February 13, 2025: contributor

Concept ACK, please see my remaining comments:

 0diff --git a/src/bench/connectblock.cpp b/src/bench/connectblock.cpp
 1--- a/src/bench/connectblock.cpp	(revision b3a64758bc09eb60858c23292ba7f1b9a41b9bb9)
 2+++ b/src/bench/connectblock.cpp	(date 1739451327613)
 3@@ -33,7 +33,7 @@
 4 
 5     // Create the outputs that will be spent in the first transaction of the test block
 6     // Doing this in a separate block excludes the validation of it's inputs from the benchmark
 7-    auto coinbase_to_spend{test_setup.m_coinbase_txns[0]};
 8+    const auto& coinbase_to_spend{test_setup.m_coinbase_txns[0]};
 9     const auto [first_tx, _]{test_setup.CreateValidTransaction(
10         std::vector{coinbase_to_spend},
11         std::vector{COutPoint(coinbase_to_spend->GetHash(), 0)},
12@@ -90,25 +90,26 @@
13     return {keys, outputs};
14 }
15 
16-void BenchmarkConnectBlock(benchmark::Bench& bench, std::vector<CKey>& keys, std::vector<CTxOut>& outputs, TestChain100Setup& test_setup)
17+void BenchmarkConnectBlock(benchmark::Bench& bench, const std::vector<CKey>& keys, const std::vector<CTxOut>& outputs, TestChain100Setup& test_setup)
18 {
19-    const auto test_block{CreateTestBlock(test_setup, keys, outputs)};
20-    auto test_blockhash{std::make_unique<uint256>(test_block.GetHash())};
21+    const auto& test_block{CreateTestBlock(test_setup, keys, outputs)};
22     auto& chainman{test_setup.m_node.chainman};
23     auto& chainstate{chainman->ActiveChainstate()};
24 
25-    BlockValidationState test_block_state;
26+    LOCK(cs_main);
27+    auto* pindex{chainman->m_blockman.AddToBlockIndex(test_block, chainman->m_best_header)};
28+
29     bench.unit("block").run([&] {
30         LOCK(cs_main);
31-        auto* pindex{chainman->m_blockman.AddToBlockIndex(test_block, chainman->m_best_header)};
32         CCoinsViewCache viewNew{&chainstate.CoinsTip()};
33+        BlockValidationState test_block_state;
34         assert(chainstate.ConnectBlock(test_block, test_block_state, pindex, viewNew));
35     });
36 }
37 
38 static void ConnectBlockAllSchnorr(benchmark::Bench& bench)
39 {
40-    const std::unique_ptr test_setup{MakeNoLogFileContext<TestChain100Setup>()};
41+    const auto test_setup{MakeNoLogFileContext<TestChain100Setup>()};
42     auto [keys, outputs]{CreateKeysAndOutputs(test_setup->coinbaseKey, /*num_schnorr=*/4, /*num_ecdsa=*/0)};
43     BenchmarkConnectBlock(bench, keys, outputs, *test_setup);
44 }
45@@ -121,7 +122,7 @@
46  */
47 static void ConnectBlockMixedEcdsaSchnorr(benchmark::Bench& bench)
48 {
49-    const std::unique_ptr test_setup{MakeNoLogFileContext<TestChain100Setup>()};
50+    const auto test_setup{MakeNoLogFileContext<TestChain100Setup>()};
51     // Blocks in range 848000 to 868000 have a roughly 20 to 80 ratio of schnorr to ecdsa inputs
52     auto [keys, outputs]{CreateKeysAndOutputs(test_setup->coinbaseKey, /*num_schnorr=*/1, /*num_ecdsa=*/4)};
53     BenchmarkConnectBlock(bench, keys, outputs, *test_setup);
54@@ -129,7 +130,7 @@
55 
56 static void ConnectBlockAllEcdsa(benchmark::Bench& bench)
57 {
58-    const std::unique_ptr test_setup{MakeNoLogFileContext<TestChain100Setup>()};
59+    const auto test_setup{MakeNoLogFileContext<TestChain100Setup>()};
60     auto [keys, outputs]{CreateKeysAndOutputs(test_setup->coinbaseKey, /*num_schnorr=*/0, /*num_ecdsa=*/4)};
61     BenchmarkConnectBlock(bench, keys, outputs, *test_setup);
62 }

Eunovo force-pushed on Feb 13, 2025

Benchmark Chainstate::ConnectBlock duration

Measure ConnectBlock performance for
- blocks containing only schnorr sigs
- blocks containing both schnorr and ecdsa sigs
- blocks containing only ecdsa sigs

This will allow testing and measurement of performance improvement for features like
batch verification of schnorr signatures

7edaf8b64c

Eunovo force-pushed on Feb 13, 2025

l0rinc approved

l0rinc commented at 1:20 pm on February 13, 2025: contributor

ACK 7edaf8b64cb2d59ada22042fee62a417e52368b8

I ran the benchmarks and tests locally. I need a crypto expert to validate them conceptually as well, but I’m fine with the benchmarking code now.

(before merging please fix the typo in the PR description)

DrahtBot requested review from josibake on Feb 13, 2025

DrahtBot requested review from davidgumberg on Feb 13, 2025

josibake commented at 2:13 pm on February 13, 2025: member

reACK https://github.com/bitcoin/bitcoin/commit/7edaf8b64cb2d59ada22042fee62a417e52368b8

fjahr commented at 8:16 pm on March 5, 2025: contributor

utACK 7edaf8b64cb2d59ada22042fee62a417e52368b8

ryanofsky assigned ryanofsky on Mar 23, 2025

ryanofsky merged this on Mar 23, 2025

ryanofsky closed this on Mar 23, 2025

ryanofsky commented at 1:15 pm on March 23, 2025: contributor

(before merging please fix the typo in the PR description)

Note: I edited out an extra comma in the description in case that was the typo, but PR otherwise looked ready so I merged it

monlovesmango commented at 6:34 am on April 2, 2025: none

hey all,

is it possible that ConnectBlockMixedEcdsaSchnorr is taking longer simply because it has a total of 5 keys/outputs as opposed to ConnectBlockAllEcdsa and ConnectBlockAllSchnorr which both have a total of 4 keys/outputs?

when I make all three have a total of 5 keys/outputs they are all comparable:

ns/block	block/s	err%	ins/block	cyc/block	IPC	bra/block	miss%	total	benchmark
60,589,406.32	16.50	0.8%	841,408,256.25	208,383,527.67	4.038	18,821,289.97	1.7%	64.95	`ConnectBlockAllEcdsa`
61,600,777.26	16.23	0.5%	849,087,583.51	211,909,775.35	4.007	18,904,391.80	1.7%	64.16	`ConnectBlockAllSchnorr`
61,224,757.09	16.33	0.2%	843,700,462.04	210,410,191.25	4.010	19,149,296.85	1.7%	63.34	`ConnectBlockMixedEcdsaSchnorr`

same when I make all three have a total of 4 keys/outputs:

ns/block	block/s	err%	ins/block	cyc/block	IPC	bra/block	miss%	total	benchmark
49,623,666.70	20.15	0.8%	672,342,072.96	169,902,638.80	3.957	15,205,308.17	1.7%	64.03	`ConnectBlockAllEcdsa`
49,647,852.54	20.14	0.3%	678,295,781.02	170,690,886.42	3.974	15,250,284.70	1.7%	64.67	`ConnectBlockAllSchnorr`
49,319,385.60	20.28	0.4%	671,059,527.55	169,247,546.89	3.965	15,427,653.66	1.7%	64.44	`ConnectBlockMixedEcdsaSchnorr`

note I did not use pyperf system tune as it didn’t work on my machine (I think because I have AMD) but did use min-time=6000 which make my results consistent between runs.

josibake commented at 8:30 am on April 3, 2025: member

is it possible that ConnectBlockMixedEcdsaSchnorr is taking longer simply because it has a total of 5 keys/outputs as opposed to ConnectBlockAllEcdsa and ConnectBlockAllSchnorr which both have a total of 4 keys/outputs?

Great catch, I completely overlooked this when looking at the differences between the Mixed and All benchmarks! I double-checked this on my machine by running a few times (with pyperf), and got:

0|            ns/block |             block/s |    err% |       ins/block |       cyc/block |    IPC |      bra/block |   miss% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
2|       66,487,065.00 |               15.04 |    0.1% |  671,725,440.00 |  165,661,990.00 |  4.055 |  15,152,305.00 |    1.7% |      0.74 | `ConnectBlockAllEcdsa`
3|       66,615,336.00 |               15.01 |    0.1% |  672,934,222.00 |  166,056,020.00 |  4.052 |  15,142,026.00 |    1.7% |      0.74 | `ConnectBlockAllSchnorr`
4|       66,892,165.00 |               14.95 |    0.1% |  671,417,525.00 |  166,584,145.00 |  4.031 |  15,373,026.00 |    1.7% |      0.74 | `ConnectBlockMixedEcdsaSchnorr`

..which confirms your results. Even if there is a difference due to hashing the transaction twice in the Mixed case, it seems negligible. Happy to review/ACK a PR that updates this to make the benchmarks consistent.

l0rinc commented at 9:53 am on April 3, 2025: contributor

We’ve known about that inconsistency, it’s why I recommended:

bench.unit(“block”).run([&] { unit should likely be something that changes, e.g. total signatures or something that we can compare between the different runs

monlovesmango commented at 3:59 pm on April 3, 2025: none

do you have a suggestion on what the unit should be? ConnectBlock benchmark is supposed to benchmark connecting a new block (and may be used for things other than signature benchmarking in the future) so I don’t see any other unit than “block” making sense.

l0rinc commented at 4:04 pm on April 3, 2025: contributor

But it’s a different block, I don’t think we can just compare the two performances

monlovesmango commented at 9:08 pm on April 7, 2025: none

I have opened #32216 to make all the ConnectBlock benchmarks have a uniform 5 outputs per tx.

The main piece of context that might be good to have in that PR is whether the unequal number of outputs were there for a reason.

Benchmark Chainstate::ConnectBlock duration #31689

Code Coverage & Benchmarks

Reviews