[EXPERIMENTAL] Schnorr batch verification for blocks #29491

pull fjahr wants to merge 12 commits into bitcoin:master from fjahr:2024-02-batch-validation-updated changing 59 files +2947 −310
  1. fjahr commented at 5:06 pm on February 27, 2024: contributor

    This is a simple draft implementation of Schnorr signature batch verification in blocks, put here for conceptual discussion and collaborative benchmarking. Most of the code added here is from the secp256k1 implementation.

    The secp256k1 code is itself still under review; please spend some time giving feedback on the API etc. if you can:

    TODOs

    • Fix functional/feature_taproot.py
    • Extensive benchmarking using ConnectBlock() tracing
    • Add unit tests
    • Batch taproot tweak verification as well
    • Conceptual discussion of the form in which this could realistically be introduced into the code base and, later, a release
  2. DrahtBot commented at 5:06 pm on February 27, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29491.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #33101 (cmake: Proactively avoid use of SECP256K1_DISABLE_SHARED by hebasto)
    • #32724 (Musig2 tests by w0xlt)
    • #32575 (refactor: Split multithreaded case out of CheckInputScripts by fjahr)
    • #32317 (kernel: Separate UTXO set access from validation functions by TheCharlatan)
    • #31244 (descriptors: MuSig2 by achow101)
    • #31132 (validation: fetch block inputs on parallel threads 10% faster IBD by andrewtoth)
    • #29675 (wallet: Be able to receive and spend inputs involving MuSig2 aggregate keys by achow101)
    • #29247 (CAT in Tapscript (BIP-347) by arminsabouri)
    • #28690 (build: Introduce internal kernel library by TheCharlatan)
    • #28201 (Silent Payments: sending by josibake)
    • #28122 (Silent Payments: Implement BIP352 by josibake)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • In include/secp256k1_batch.h comment: “batch_add_xonpub_tweak_check” → “batch_add_xonlypub_tweak_check” [function name typo]
    • In src/secp256k1/src/bench.c printf: “Schnorr sigining algorithm” → “Schnorr signing algorithm” [spelling error]


  3. fjahr force-pushed on Feb 27, 2024
  4. DrahtBot added the label CI failed on Feb 27, 2024
  5. DrahtBot commented at 5:11 pm on February 27, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/22041704016

  6. Sjors commented at 9:40 am on February 28, 2024: member

    I guess this is most interesting to test against the last 50K blocks on mainnet, i.e. since early 2023, given the high percentage of taproot transactions? Such testing could benefit from an assumeutxo snapshot, saving everyone the pain of rewinding. I can bake one, though ideally I’d like #26045 to land first.

    Is the current PR already using batches? It’s unclear to me by glancing over validation.cpp if and how it’s collecting up to max_batch_size{106} signatures before calling batch.Verify().

    Maybe add a -schnorrbatchverify startup argument so more people can try it once it gets through review.

  7. fjahr commented at 12:53 pm on February 28, 2024: contributor

    I guess this is most interesting to test against the last 50K blocks on mainnet, i.e. since early 2023, given the high percentage of taproot transactions? Such testing could benefit from an assumeutxo snapshot, saving everyone the pain of rewinding. I can bake one, though ideally I’d like #26045 to land first.

    Yeah, that would be great. I have to get the tracing part to work in my setup again before I can start.

    Is the current PR already using batches? It’s unclear to me by glancing over validation.cpp if and how it’s collecting up to max_batch_size{106} signatures before calling batch.Verify().

    Yes, the rough architecture is as follows (sorry for not writing a better explanation above, but it’s still very rough and I expect it to change). The batch object is constructed and then handed down from ConnectBlock() to CheckInputScripts(), which hands it to the CScriptCheck constructor. When the CScriptCheck instance is executed and the batch object is present, BatchingCachingTransactionSignatureChecker is used, which only differs from the default CachingTransactionSignatureChecker in that its implementation of VerifySchnorrSignature() adds the Schnorr sig to the batch object instead of verifying it right there. (Here we have an issue because the function is no longer doing what the name says, but it is a much simpler change this way and I find it hard to predict where we will land with this in the end.) Then, back in ConnectBlock(), the batch object is verified after all transactions have been iterated over.

    The 106 is an implementation detail from secp256k1 that I am not sure really needs to be exposed here. But what matters for the question is: if more than 106 sigs are added to the batch, the verification of the already-added sigs happens when the next sig is added, and if that verification fails, the add itself fails. So, to call back to the last paragraph, every 106 sigs the naming of VerifySchnorrSignature() is actually still correct ;) and the one verify call at the end just verifies the last n % 106 sigs left for that block. I have not put any effort into documenting this properly here yet because the secp256k1 API is not finalized and the details on this particular topic might still change, see for example https://github.com/bitcoin-core/secp256k1/pull/1134#pullrequestreview-1083948472

    Everything should be conditional on the presence of a batch object which defaults to nullptr and if that is the case all the other functions should work as usual, for example when used outside of the context of a new block.
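    A minimal structural sketch of that flow, with toy stand-in types (the real code uses Span<const unsigned char>, XOnlyPubKey, uint256 and the secp256k1 batch object; the "validity rule" here is fake and only serves the illustration):

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <vector>

    // Toy stand-ins for the real signature/key/hash types.
    struct ToySig { uint64_t sig, pubkey, sighash; };
    static bool ToyValid(const ToySig& s) { return s.sig == (s.pubkey ^ s.sighash); }

    // Mirrors CachingTransactionSignatureChecker: verifies each signature eagerly.
    struct EagerChecker {
        virtual ~EagerChecker() = default;
        virtual bool VerifySchnorrSignature(const ToySig& s) const { return ToyValid(s); }
    };

    // Mirrors BatchingCachingTransactionSignatureChecker: if a batch is present,
    // append the signature and report success; the real result only materializes
    // when the batch is verified at the end of block connection.
    struct BatchingChecker : EagerChecker {
        std::vector<ToySig>* m_batch; // nullptr => behave exactly like the base class
        explicit BatchingChecker(std::vector<ToySig>* batch) : m_batch(batch) {}
        bool VerifySchnorrSignature(const ToySig& s) const override {
            if (!m_batch) return EagerChecker::VerifySchnorrSignature(s);
            m_batch->push_back(s);
            return true; // deferred
        }
    };

    // The once-per-block verification step, as done back in ConnectBlock().
    static bool VerifyBatch(const std::vector<ToySig>& batch) {
        for (const auto& s : batch) {
            if (!ToyValid(s)) return false;
        }
        return true;
    }

    int main() {
        std::vector<ToySig> batch;
        BatchingChecker checker(&batch);
        assert(checker.VerifySchnorrSignature({0x5 ^ 0x9, 0x5, 0x9})); // valid, deferred
        assert(checker.VerifySchnorrSignature({0x0, 0x5, 0x9}));      // invalid, but still "true"
        assert(batch.size() == 2);
        assert(!VerifyBatch(batch)); // the invalid signature only surfaces here
        return 0;
    }
    ```
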

    Maybe add a -schnorrbatchverify startup argument so more people can try it once it gets through review.

    Yeah, I have been thinking of that, or maybe even a compile-time option

  8. fjahr force-pushed on Feb 28, 2024
  9. fjahr commented at 1:14 pm on February 28, 2024: contributor
    @Sjors maybe what confused you is that BatchingCachingTransactionSignatureChecker::VerifySchnorrSignatureBatch() was unused. That was actually a leftover from an earlier design spike and I just forgot to remove it. It’s gone now.
  10. Sjors commented at 1:36 pm on February 28, 2024: member

    if there are more than 106 sigs added to the batch the verification for the already added sigs will happen when the next sig is added and if the verification fails the adding itself will fail.

    That clarifies a lot, and should be documented :-)
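    For the record, the flush-on-add behaviour described above can be sketched with a toy batch; the 106 capacity and the drain-on-Add semantics follow the description in the thread, everything else is invented for illustration:

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Toy batch mirroring the described secp256k1 behaviour: adds are buffered,
    // and once max_batch_size (106 in secp256k1) entries accumulate, the next
    // Add() first verifies and drains the buffer, so Add() itself can fail.
    class ToyBatch {
        std::vector<bool> m_pending; // toy: each entry is just "valid or not"
        const size_t m_capacity;
    public:
        size_t flushes = 0; // how many verification passes have run
        explicit ToyBatch(size_t capacity = 106) : m_capacity(capacity) {}
        bool Add(bool valid) {
            if (m_pending.size() == m_capacity && !Verify()) return false;
            m_pending.push_back(valid);
            return true;
        }
        // Verifies and clears whatever is currently buffered (the final n % 106).
        bool Verify() {
            ++flushes;
            bool ok = true;
            for (bool v : m_pending) ok = ok && v;
            m_pending.clear();
            return ok;
        }
    };

    int main() {
        ToyBatch batch(106);
        for (int i = 0; i < 250; ++i) assert(batch.Add(true));
        assert(batch.Verify());     // final call covers the last 250 % 106 = 38 sigs
        assert(batch.flushes == 3); // two implicit flushes inside Add(), one final

        ToyBatch small(2);
        assert(small.Add(true));
        assert(small.Add(false));   // adding succeeds; nothing verified yet
        assert(!small.Add(true));   // buffer full: verification runs and fails
        return 0;
    }
    ```
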

  11. DrahtBot added the label Needs rebase on Apr 6, 2024
  12. 0xB10C commented at 8:33 am on May 13, 2024: contributor

    @willcl-ark and I ran an IBD benchmark vs master for this PR: IBD from a local node to height 840000, currently only looking at the connect_block duration.

    In the current draft implementation:

    • master is currently faster
    • batch validation is attempted and decreases performance even before taproot activated
    • after taproot activation, batch validation is significantly slower than non-batch validation

    [images: connect_block duration graphs]

    Used this bpftrace script to collect CSV data:

    #!/usr/bin/env bpftrace

    usdt:./src/bitcoind:validation:block_connected
    {
      $height = (int32) arg1;
      $tx = (uint64) arg2;
      $ins = (int32) arg3;
      $sigops = (int64) arg4;
      $duration = (int64) arg5; // in nanoseconds

      printf("%d,%ld,%d,%ld,%ld\n", $height, $tx, $ins, $sigops, $duration);
    }
  13. willcl-ark commented at 8:38 am on May 13, 2024: member

    batch validation is attempted and decreases performance even before taproot activated

    From a quick skim of the changes ISTM that we always use the BatchingCachingTransactionSignatureChecker, and there is no switch to the CachingTransactionSignatureChecker, but this could well be because this is only in WIP state. It doesn’t account for why it’s currently slower in all cases though.

  14. fjahr force-pushed on Dec 5, 2024
  15. DrahtBot removed the label Needs rebase on Dec 5, 2024
  16. andrewtoth commented at 10:55 pm on December 7, 2024: contributor

    It seems in this approach the m_batch_mutex is held for the entirety of adding, which will cause lock contention. Verify is then a blocking call on the main thread, which negates the multithreading.

    I think a better approach would be to have a thread local batch for each worker thread and the main thread, so each thread can add all schnorr sigs without locking, and then each thread can verify their batches in parallel at the end.
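    A toy sketch of that suggestion, assuming nothing about the real checkqueue (the types and validity rule are invented): each worker owns its batch, so adding signatures needs no lock, and all batches are verified on parallel threads at the end.

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <thread>
    #include <vector>

    // Toy signature with a fake validity rule, for illustration only.
    struct ToySig {
        int pubkey, sighash, sig;
        bool Valid() const { return sig == pubkey + sighash; }
    };

    // One batch per worker: entries were added lock-free by their owning thread,
    // and here each batch is verified on its own thread in parallel.
    bool ParallelBatchVerify(const std::vector<std::vector<ToySig>>& per_thread_batches) {
        std::atomic<bool> all_ok{true};
        std::vector<std::thread> threads;
        threads.reserve(per_thread_batches.size());
        for (const auto& batch : per_thread_batches) {
            threads.emplace_back([&batch, &all_ok] {
                for (const auto& s : batch) {
                    if (!s.Valid()) all_ok = false; // only this thread reads this batch
                }
            });
        }
        for (auto& t : threads) t.join();
        return all_ok;
    }

    int main() {
        std::vector<std::vector<ToySig>> good{{{1, 2, 3}, {4, 5, 9}}, {{7, 1, 8}}};
        assert(ParallelBatchVerify(good));
        std::vector<std::vector<ToySig>> bad{{{1, 2, 3}}, {{4, 5, 0}}};
        assert(!ParallelBatchVerify(bad));
        return 0;
    }
    ```
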

  17. fjahr force-pushed on Mar 11, 2025
  18. fjahr renamed this:
    [DO NOT MERGE] Schnorr batch verification for blocks
    [EXPERIMENTAL] Schnorr batch verification for blocks
    on Mar 11, 2025
  19. in src/script/sigcache.h:81 in d333d5c6e3 outdated
    71@@ -71,6 +72,22 @@ class CachingTransactionSignatureChecker : public TransactionSignatureChecker
    72 
    73     bool VerifyECDSASignature(const std::vector<unsigned char>& vchSig, const CPubKey& vchPubKey, const uint256& sighash) const override;
    74     bool VerifySchnorrSignature(Span<const unsigned char> sig, const XOnlyPubKey& pubkey, const uint256& sighash) const override;
    75+
    76+    bool GetStore() const { return store; }
    77+    SignatureCache& GetSigCache() const { return m_signature_cache; }
    78+};
    79+
    80+class CollectingSignatureChecker : public CachingTransactionSignatureChecker {
    


    Eunovo commented at 6:47 am on March 16, 2025:
    https://github.com/bitcoin/bitcoin/pull/29491/commits/d333d5c6e3c4b454621998772ff74ff6656af691: This is nice! It sets us up nicely to later do batch taptweak checking as well, since we can “collect” it too.
  20. in src/checkqueue.h:163 in 4541038323 outdated
    160+                            break;
    161+                        }
    162+                        // Check succeeded; add signatures to shared batch
    163+                        const auto& signatures = std::get<std::vector<SchnorrSignatureToVerify>>(check_result);
    164+                        for (const auto& sig : signatures) {
    165+                            m_batch.Add(sig.sig, sig.pubkey, sig.sighash);
    


    Eunovo commented at 3:23 am on March 17, 2025:
    https://github.com/bitcoin/bitcoin/pull/29491/commits/454103832351311d11e16505ed1925bd5f1c6875: IIUC all the threads still share this batch. That means we still have the same locking problem; threads will be stuck waiting for the batch to be free. In my very rough implementation https://github.com/bitcoin-dev-tools/benchcoin/pull/41/commits/173f07dc4493de3d3a9420f6a966606913277e9d#diff-0f68d4063c2fb11000f9048e46478e26c0ea2622f7866d00b574f429c3a4db61 I opted to give each batch a thread to resolve this.

    fjahr commented at 0:17 am on March 20, 2025:

    You’re right, this wasn’t a complete solution. I have improved this now by using buckets of signatures for each thread, which are all added and only then verified by the master thread at the end. This could be further improved if we could add the signatures to a batch per thread right away, but this would require being able to merge batch objects, which the secp API currently doesn’t make possible. I’ll add that to my wish list because I think it should be even better than the current approach, but it should be a comparatively minor improvement. I considered other approaches as well, but this seems to be a very simple solution that does not rely on a more complex library or code, or use thread_local (which we try to avoid).

    Your approach does do batching in each thread, but also per each vChecks allocation, which are surprisingly small, especially when there are a lot of threads available. This is what caused me to look more into the nNow algo behavior. With the current approach here I try to do only one verify call per block, which should be the fastest approach once Pippenger is used in secp.


    andrewtoth commented at 0:45 am on March 20, 2025:

    I think @Eunovo’s approach is closer to what we want. Instead of a bucket per thread, have a batch per thread. However, instead of batch verifying after each vChecks batch, batch verify after the queue of checks is empty for the block. We need CCheckQueue::Complete to set a flag that no more checks will be added, and wake all threads again. The threads will all verify their batches once the global queue is empty. We would need to reset the flag after Complete.

    This approach of verifying on the master thread at the end will not be faster even with the Pippenger algo. The max speedup is closer to 2x, but multi-threaded is 16x faster.


    fjahr commented at 10:59 pm on March 23, 2025:

    Instead of a bucket per thread, have a batch per thread. However, instead of batch verifying after each vChecks batch, batch verify after the queue of checks is empty for the block. We need CCheckQueue::Complete to set a flag that no more checks will be added, and wake all threads again. The threads will all verify their batches once the global queue is empty. We would need to reset the flag after Complete.

    Yes, one batch per thread is the approach I mentioned above as well as preferred, if that wasn’t clear. My thinking was that the fastest approach overall would be if we would do that plus efficient merging of batches which should ideally have the same overhead as adding a single signature to a batch (to be verified that this is possible). It would need to be benchmarked across different scenarios for blocks with varying share of schnorr sigs and across different numbers of available threads, of course. I am sure there will be some scenarios where the “let each thread verify their own batch” will be faster and there will be other scenarios where “merge thread batches and verify once” is faster and we would need to decide based on that in the end. A definite upside of the merging of thread batches would be that it would be simpler code, we could save this round of waking up the threads with the flag set.

    However, in the meantime I have asked Siv how hard it would be to add the merging of batches to the secp API, and it appears it might be harder than I thought because it would require merging of scratch spaces, which is currently not possible. So, while maybe theoretically possible, it’s quite far off in terms of engineering effort. Siv will still look into it and check if it’s feasible at all, but until then I will disregard this approach.

    So, long story short, I have now implemented it with a batch per thread using a flag set in complete as you suggested.
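    A toy model of that scheme (invented types; the real implementation lives in src/checkqueue.h): Complete() sets a flag and wakes the workers, each worker verifies its own batch and bumps a counter, and the master only returns once every worker has reported.

    ```cpp
    #include <cassert>
    #include <condition_variable>
    #include <mutex>
    #include <thread>
    #include <vector>

    class ToyQueue {
    public:
        explicit ToyQueue(int workers) : m_workers(workers) {}

        // Worker side: sleep until Complete(), verify the local batch, report in.
        void WorkerLoop(const std::vector<bool>& local_batch) {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_cv.wait(lock, [&] { return m_complete; });
            lock.unlock();
            bool ok = true;
            for (bool v : local_batch) ok = ok && v; // verify own batch, no lock held
            lock.lock();
            if (!ok) m_all_ok = false;
            if (++m_verified == m_workers) m_cv.notify_all();
        }

        // Master side: signal that no more checks will be added, wait for every
        // worker's batch verification result, then reset for the next block.
        bool Complete() {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_complete = true;
            m_cv.notify_all();
            m_cv.wait(lock, [&] { return m_verified == m_workers; });
            m_complete = false;
            m_verified = 0;
            bool result = m_all_ok;
            m_all_ok = true;
            return result;
        }

    private:
        std::mutex m_mutex;
        std::condition_variable m_cv;
        const int m_workers;
        int m_verified{0};
        bool m_complete{false};
        bool m_all_ok{true};
    };

    int main() {
        ToyQueue q(3);
        std::vector<std::vector<bool>> batches{{true, true}, {true}, {true, true, true}};
        std::vector<std::thread> workers;
        for (const auto& b : batches) workers.emplace_back([&q, &b] { q.WorkerLoop(b); });
        assert(q.Complete()); // returns only after all three batches are verified
        for (auto& t : workers) t.join();

        ToyQueue q2(2);
        std::vector<std::vector<bool>> batches2{{true}, {true, false}};
        std::vector<std::thread> workers2;
        for (const auto& b : batches2) workers2.emplace_back([&q2, &b] { q2.WorkerLoop(b); });
        assert(!q2.Complete()); // one bad batch fails the whole block
        for (auto& t : workers2) t.join();
        return 0;
    }
    ```
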

  21. DrahtBot added the label Needs rebase on Mar 20, 2025
  22. in src/checkqueue.h:129 in 6055f3a6f5 outdated
    123@@ -114,6 +124,12 @@ class CCheckQueue
    124                     if (fMaster && nTodo == 0) {
    125                         // All checks are done; master performs batch verification
    126                         if constexpr (std::is_same_v<R, ScriptFailureResult>) {
    127+                            for (auto& bucket : m_signature_buckets) {
    128+                                for (const auto& sig : bucket) {
    129+                                    m_batch.Add(sig.sig, sig.pubkey, sig.sighash);
    


    Eunovo commented at 12:46 pm on March 20, 2025:
    https://github.com/bitcoin/bitcoin/pull/29491/commits/6055f3a6f530818e2805d11debb1fb2ce81fc5c0: This leaves the master thread to do all the “pubkey recovery” (I think that’s what it’s called?) work in each batch.Add call, which is substantial. We can call “secp256k1_xonly_pubkey_parse” when collecting the signatures so the master thread is not stuck doing all the work to parse the XOnlyPubkey.
  23. ryanofsky referenced this in commit b9c281011b on Mar 23, 2025
  24. fjahr force-pushed on Mar 23, 2025
  25. DrahtBot removed the label Needs rebase on Mar 23, 2025
  26. fjahr commented at 1:29 pm on March 24, 2025: contributor
    I’ve also rebased to get #31689 in here by the way.
  27. in src/script/sigcache.h:81 in 696bd7547e outdated
    76+
    77+    bool GetStore() const { return store; }
    78+    SignatureCache& GetSigCache() const { return m_signature_cache; }
    79+};
    80+
    81+class CollectingSignatureChecker : public CachingTransactionSignatureChecker {
    


    Eunovo commented at 8:05 pm on April 1, 2025:
    https://github.com/bitcoin/bitcoin/pull/29491/commits/696bd7547e0e4ab44388d3c55ed636c46f9bdad5: IIUC the original idea was to “collect the signatures” and possibly combine them into one batch. Now that a “one batch per thread” approach has been taken, should this still be collecting signatures into a vector? The original “BatchSignatureChecker” that had a reference to the batch might work better since it skips the vector completely.

    Eunovo commented at 10:35 am on April 2, 2025:

    the original “BatchSignatureChecker” that had a reference to the batch might work better since it skips the vector completely

    On second thought, it might be better to leave as is because we are still experimenting and trying to find the right number of batches to use

  28. Eunovo commented at 4:24 pm on May 5, 2025: contributor

    While testing this, I observed a decline in performance as the number of threads increased. I’m not sure why this happens, but I suspect a few factors play a role.

    I made some tweaks to my previous batch-verification implementation to produce https://github.com/Eunovo/bitcoin/commits/2025-batch-verification/ and retested. These are results from running the Microbenchmark build/bin/bench_bitcoin -filter=ConnectBlockAllSchnorr

    | Commit | Threads | Run 1 (block/s) | Run 2 (block/s) | Run 3 (block/s) | Min | Max | Mean (block/s) | StdDev |
    |---|---:|---:|---:|---:|---:|---:|---:|---:|
    | fjahr(4ab16a0b) | 1 | 8.0 | 8.07 | 7.6 | 7.6 | 8.07 | 7.89 | 0.207 |
    | fjahr(4ab16a0b) | 2 | 15.54 | 15.74 | 15.66 | 15.54 | 15.74 | 15.65 | 0.082 |
    | fjahr(4ab16a0b) | 4 | 29.58 | 29.14 | 29.07 | 29.07 | 29.58 | 29.26 | 0.226 |
    | fjahr(4ab16a0b) | 8 | 49.33 | 45.88 | 48.5 | 45.88 | 49.33 | 47.90 | 1.470 |
    | fjahr(4ab16a0b) | 16 | 50.75 | 51.23 | 52.17 | 50.75 | 52.17 | 51.38 | 0.590 |
    | master(baa848b8d) | 1 | 6.31 | 6.59 | 6.6 | 6.31 | 6.6 | 6.50 | 0.134 |
    | master(baa848b8d) | 2 | 12.88 | 12.93 | 12.65 | 12.65 | 12.93 | 12.82 | 0.122 |
    | master(baa848b8d) | 4 | 24.96 | 24.96 | 24.93 | 24.93 | 24.96 | 24.95 | 0.014 |
    | master(baa848b8d) | 8 | 46.01 | 46.18 | 46.27 | 46.01 | 46.27 | 46.15 | 0.108 |
    | master(baa848b8d) | 16 | 52.84 | 52.36 | 52.83 | 52.36 | 52.84 | 52.68 | 0.224 |
    | novo(0493ba332) | 1 | 8.19 | 8.14 | 8.11 | 8.11 | 8.19 | 8.15 | 0.033 |
    | novo(0493ba332) | 2 | 15.47 | 15.78 | 15.75 | 15.47 | 15.78 | 15.67 | 0.140 |
    | novo(0493ba332) | 4 | 30.23 | 30.46 | 30.28 | 30.23 | 30.46 | 30.32 | 0.099 |
    | novo(0493ba332) | 8 | 53.96 | 50.33 | 54.68 | 50.33 | 54.68 | 52.99 | 1.904 |
    | novo(0493ba332) | 16 | 64.4 | 64.59 | 64.29 | 64.29 | 64.59 | 64.43 | 0.124 |

    Here’s a graph of the mean block processing speed (block/s) plotted against the number of threads (higher is better). [image: benchmark_comparison]

    EDIT: I’m waiting on more results from a fresh IBD benchmark
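    For what it’s worth, the Mean and StdDev columns can be reproduced from the three runs; the StdDev is the population form (divide by n, not n − 1). A quick check against the first row:

    ```cpp
    #include <cassert>
    #include <cmath>
    #include <vector>

    double Mean(const std::vector<double>& xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.size();
    }

    // Population standard deviation: sqrt of the mean squared deviation.
    double PopulationStdDev(const std::vector<double>& xs) {
        const double m = Mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return std::sqrt(ss / xs.size());
    }

    int main() {
        // First row above: fjahr(4ab16a0b) at 1 thread.
        const std::vector<double> runs{8.0, 8.07, 7.6};
        assert(std::abs(Mean(runs) - 7.89) < 1e-6);
        assert(std::abs(PopulationStdDev(runs) - 0.207) < 5e-4);
        return 0;
    }
    ```
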

  29. fjahr commented at 5:10 pm on May 5, 2025: contributor

    While testing this, I observed a decline in performance as the number of threads increased.

    Can you share the setup for these benchmarks so I can test them? Thanks!

    EDIT: I see you added it, it wasn’t in the notification I received

  30. Eunovo commented at 6:40 am on May 6, 2025: contributor

    Can you share the setup for these benchmarks so I can test them? Thanks!

    OS is Ubuntu 24.04.2 LTS

    Output from lscpu

    Architecture:             x86_64
      CPU op-mode(s):         32-bit, 64-bit
      Address sizes:          48 bits physical, 48 bits virtual
      Byte Order:             Little Endian
    CPU(s):                   16
      On-line CPU(s) list:    0-15
    Vendor ID:                AuthenticAMD
      Model name:             AMD Ryzen 9 8945HS w/ Radeon 780M Graphics
        CPU family:           25
        Model:                117
        Thread(s) per core:   2
        Core(s) per socket:   8
        Socket(s):            1
        Stepping:             2
        Frequency boost:      enabled
        CPU(s) scaling MHz:   83%
        CPU max MHz:          5263.0000
        CPU min MHz:          400.0000

    I’m running the IBD benchmark on a different machine. I’ll share details when results are in.

    EDIT: It seems you were referring to the command I ran. I used the ConnectBlockAllSchnorr microbenchmark. Run it with build/bin/bench_bitcoin -filter=ConnectBlockAllSchnorr. For IBD benchmarking, I’m doing bitcoind -assumevalid=0 -par={par} -stopatheight=880000 (I’m testing 1, 2, 4, 6, 8, 12 threads) with Benchkit.

    Note that Benchkit is still experimental; it loads the 840000 assumeutxo snapshot and runs hyperfine.

  31. DrahtBot added the label Needs rebase on May 6, 2025
  32. fjahr force-pushed on May 18, 2025
  33. fjahr commented at 11:58 pm on May 18, 2025: contributor

    It seems you were referring to the command I ran. I used the ConnectBlockAllSchnorr microbenchmark. Run it with build/bin/bench_bitcoin -filter=ConnectBlockAllSchnorr. For IBD benchmarking, I’m doing bitcoind -assumevalid=0 -par={par} -stopatheight=880000 (I’m testing 1, 2, 4, 6, 8, 12 threads) with Benchkit.

    I couldn’t figure out how to test the different thread configurations for the microbench with Benchkit. Are you using Benchkit there as well or are you just setting worker_threads_num in test/util/setup_common.cpp? That’s what I have been doing for now.

    Copy operations in https://github.com/bitcoin/bitcoin/pull/29491/files#diff-612abe4a74f2e9f6903cebff057d47bfe4f8a57507c175743f4b62fde0d3cc58R95-R99

    I am not sure how to do this completely without copying; using a span is unsafe because vChecks is cleared. Using emplace_back is possible and I am using it now, but in the benchmarks I didn’t see a noticeable improvement from this. Can you be a bit more specific about what you had in mind?

    Doing batch-verification work within the critical section https://github.com/bitcoin/bitcoin/pull/29491/files#diff-0f68d4063c2fb11000f9048e46478e26c0ea2622f7866d00b574f429c3a4db61R151

    I have implemented this and see a performance boost from it for the case with 15 worker threads (I didn’t test all the other thread configurations yet). On my machine I now get these results (running build/bin/bench_bitcoin -filter=ConnectBlockAllSchnorr -min-time=1000):

    novo (15 worker threads)

    |            ns/block |             block/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        9,285,356.00 |              107.70 |    1.6% |      1.09 | `ConnectBlockAllSchnorr`

    this latest state 9d460e82e1404f7cfb29613d3852bbb99e0071ad (15 worker threads)

    |            ns/block |             block/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |        7,973,364.58 |              125.42 |    1.3% |      1.10 | `ConnectBlockAllSchnorr`

    Thank you for the continued valuable feedback, I have made you co-author of the relevant commit.

    A few comments on the latest code in the branch you linked to (0493ba332aba6ff0d6da55094fd8d834bb8cc18f):

    • Tracking if anything has been added to the batch and skipping verify makes sense as an idea, but I have some issues with it: First of all, using the return value of secp256k1_batch_add_schnorrsig seems wrong, because this will return false if something has gone wrong while adding the signature, and this will usually mean that we have a problem; at least in the block verification process it definitely means that. So taking that to mean “we don’t have to verify anymore, it’s fine” doesn’t seem right. Additionally, in my opinion we shouldn’t have to track this in Bitcoin Core because I believe secp256k1 should handle this. Such behavior should happen closest to the actual data IMO, and if this is actually the case, calling verify should be costless. In the current secp code the actual verification steps seem to be skipped here: https://github.com/bitcoin-core/secp256k1/pull/1134/files#diff-5a633b880f8591184912dc3e49b3dffc2af714b2fc27404d0a1b12f6a2df7a71R191 so it would be surprising to me if adding m_needs_verify gives a real performance boost. Did you benchmark this change in isolation and notice a performance impact? If you did see that, I would be interested to dig into why that is. And fwiw, I could also be persuaded to not trust secp with this and track it ourselves if this is what we usually do in Core, but I will need to look for some comparable reference code and understand the reasoning for this (seemingly) duplicate behavior.
    • IIUC you are using queue.empty() as the indicator that all the todos have been added. If I remember correctly, I tried this at some point too, but it turned out that there is no guarantee for that: it could be that the todos are just not added fast enough, hence the need for m_complete. So this means some threads could be verifying multiple times within a block. I don’t see why this would cause problems with the verification though; I think the result is still correct in this regard. But the question is how this impacts performance. Here as well I would find it kind of surprising if this is faster in practice, but maybe my thinking is wrong and your benchmarks tell us we should switch to this approach instead, letting threads verify their batch when they have the chance instead of letting them go to sleep. But I also think we need to wait until we have Pippenger as well, because that could influence these results significantly.
    • Running the loop one more time only with nNow = 0 to trigger the batch verification is very simple, but I don’t see how you ensure that all the thread batches have been verified before the master thread returns. It looks to me like as soon as the master thread has verified its batch it simply returns, no matter how many (if any) other threads have verified their batches already, and any failures in these batches would be missed. This is what I track with m_total_verified here in the code. Additionally, setting nNow = 0 has the unwanted side-effect that in the cleanup phase of the next iteration the loop seems to be in the very first iteration again, which leads to running nTotal++, giving the impression that there are more threads available. I guess this doesn’t cause issues only because there is no tracking of how many of the total threads have already verified their batch.

    A few more comments on this latest push:

    • I have also been warming up to the idea of having the thread-local batch object in Loop rather than in a checkqueue member; this saves us a few lines of code and I didn’t see an impact on performance.
    • Additionally I have been looking into some c++20 multithreading features since the latest push and tested using std::barrier instead of the counter but so far I didn’t see an improvement in performance or code clarity from it.
    • I have added a unit test for BatchSchnorrVerifier.
    • Just FYI, I don’t think rebasing currently makes sense for me here because this would require rebasing the secp batch branch and the author is working on doing that shortly themselves so I would rather wait for that.
  34. Eunovo commented at 6:20 am on June 12, 2025: contributor

    I couldn’t figure out how to test the different thread configurations for the microbench with Benchkit. Are you using Benchkit there as well or are you just setting worker_threads_num in test/util/setup_common.cpp? That’s what I have been doing for now.

    Sorry, I didn’t state that. That’s what I did for the microbench. I only use Benchkit for IBD benchmarks.

    I’m going to review and test latest changes.

  35. Eunovo commented at 8:22 am on June 12, 2025: contributor

    I am not sure how to do this completely without copying; using a span is unsafe because vChecks is cleared. Using emplace_back is possible and I am using it now, but in the benchmarks I didn’t see a noticeable improvement from this. Can you be a bit more specific about what you had in mind?

    I was thinking it might be better to move away from the collecting sigs idea if the copy operations present a significant slowdown. I’ll see if I can detect any performance changes in testing.
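    One way to avoid the per-element copies, sketched with a hypothetical helper and toy types (not code from the PR): move the whole collected vector into the per-thread bucket before vChecks is cleared, so no element is copied.

    ```cpp
    #include <cassert>
    #include <iterator>
    #include <string>
    #include <utility>
    #include <vector>

    // Toy stand-in for a collected Schnorr signature.
    struct ToySig { std::string sig; };

    // Hypothetical helper: steal the collected vector's allocation when the
    // bucket is empty, otherwise move-append the elements one by one.
    void AppendCollected(std::vector<ToySig>& bucket, std::vector<ToySig>&& collected) {
        if (bucket.empty()) {
            bucket = std::move(collected); // whole-allocation move, no per-element copy
        } else {
            bucket.insert(bucket.end(),
                          std::make_move_iterator(collected.begin()),
                          std::make_move_iterator(collected.end()));
        }
    }

    int main() {
        std::vector<ToySig> bucket;
        AppendCollected(bucket, {{"sig-a"}, {"sig-b"}});
        assert(bucket.size() == 2);
        AppendCollected(bucket, {{"sig-c"}});
        assert(bucket.size() == 3);
        assert(bucket[2].sig == "sig-c");
        return 0;
    }
    ```
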

    • Tracking if anything has been added to the batch and skipping verify makes sense as an idea, but I have some issues with it: First of all, using the return value of secp256k1_batch_add_schnorrsig seems wrong, because this will return false if something has gone wrong while adding the signature, and this will usually mean that we have a problem; at least in the block verification process it definitely means that. So taking that to mean “we don’t have to verify anymore, it’s fine” doesn’t seem right. Additionally, in my opinion we shouldn’t have to track this in Bitcoin Core because I believe secp256k1 should handle this. Such behavior should happen closest to the actual data IMO, and if this is actually the case, calling verify should be costless. In the current secp code the actual verification steps seem to be skipped here: https://github.com/bitcoin-core/secp256k1/pull/1134/files#diff-5a633b880f8591184912dc3e49b3dffc2af714b2fc27404d0a1b12f6a2df7a71R191 so it would be surprising to me if adding m_needs_verify gives a real performance boost. Did you benchmark this change in isolation and notice a performance impact? If you did see that, I would be interested to dig into why that is. And fwiw, I could also be persuaded to not trust secp with this and track it ourselves if this is what we usually do in Core, but I will need to look for some comparable reference code and understand the reasoning for this (seemingly) duplicate behavior.

    Makes sense. I agree. We do not need to do this in core.

    • IIUC you are using queue.empty() as the indicator that all the todos have been added. If I remember correctly, I tried this at some point too, but it turned out that there is no guarantee for that: it could be that the todos are just not added fast enough, hence the need for m_complete. So this means some threads could be verifying multiple times within a block. I don’t see why this would cause problems with the verification though; I think the result is still correct in this regard. But the question is how this impacts performance. Here as well I would find it kind of surprising if this is faster in practice, but maybe my thinking is wrong and your benchmarks tell us we should switch to this approach instead, letting threads verify their batch when they have the chance instead of letting them go to sleep. But I also think we need to wait until we have Pippenger as well, because that could influence these results significantly.
    • Running the loop one more time with nNow = 0 to trigger the batch verification is very simple, but I don’t see how you ensure that all the thread batches have been verified before the master thread returns. It looks to me like as soon as the master thread has verified its own batch it simply returns, no matter how many (if any) other threads have verified theirs, and any failures in those batches would be missed. This is what I track with m_total_verified here in the code. Additionally, setting nNow = 0 has the unwanted side effect that in the cleanup phase of the next iteration the loop appears to be in its very first iteration again, which runs nTotal++ and gives the impression that more threads are available. I guess this only avoids causing issues because nothing tracks how many of the total threads have already verified their batch.

    It seems like the major reason for the performance drop was the batch-verification work being performed inside the critical section. I’ll try to reproduce your latest results.
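    The completion-tracking concern discussed above (the master thread returning before all worker batches have been verified) can be sketched roughly as follows. This is a hedged illustration, not Bitcoin Core code: the class name `BatchCompletion` and its members are hypothetical stand-ins for the `m_total_verified` idea, and the verification work is kept outside the lock, matching the critical-section point.

    ```cpp
    #include <cassert>
    #include <condition_variable>
    #include <mutex>
    #include <thread>
    #include <vector>

    // Minimal sketch (not Bitcoin Core code): the master thread must not
    // report a result until every worker has verified its per-thread batch.
    // A counter guarded by a mutex plays the role of m_total_verified.
    class BatchCompletion
    {
        std::mutex m_mutex;
        std::condition_variable m_cv;
        int m_total_verified{0};
        bool m_all_ok{true};

    public:
        // Called by each worker (and the master) after verifying its batch.
        // The (expensive) verification itself happens before this call,
        // outside the critical section.
        void ReportVerified(bool ok)
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_all_ok = m_all_ok && ok;
            ++m_total_verified;
            m_cv.notify_all();
        }

        // The master waits here instead of returning right after verifying
        // its own batch, so failures in worker batches cannot be missed.
        bool WaitAll(int total_threads)
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_cv.wait(lock, [&] { return m_total_verified >= total_threads; });
            return m_all_ok;
        }
    };

    int main()
    {
        constexpr int WORKERS = 4;
        BatchCompletion completion;
        std::vector<std::thread> threads;
        for (int i = 0; i < WORKERS; ++i) {
            threads.emplace_back([&completion] {
                // ... verify this thread's batch here, outside any lock ...
                completion.ReportVerified(/*ok=*/true);
            });
        }
        // Master verifies its own batch, then waits for everyone else.
        completion.ReportVerified(/*ok=*/true);
        const bool ok = completion.WaitAll(WORKERS + 1);
        for (auto& t : threads) t.join();
        assert(ok);
        return ok ? 0 : 1;
    }
    ```

    The design choice here is simply that a shared counter plus a condition variable makes "all batches verified" an explicit postcondition, instead of relying on loop-iteration side effects like nNow = 0.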

  36. Squashed 'src/secp256k1/' changes from b9313c6e1a..410abb205a
    410abb205a batch: Generate speedup graphs
    e5df505bad batch, extrakeys: Add benchmarks
    642901f57a batch: Add tests for batch_add_* APIs
    6378b6bcdc batch,ecmult: Add tests for core batch APIs and strauss_batch refactor
    e5bbca63bf batch: Add example
    f818fc0e4b batch: Add batch_add_* APIs
    b47d8f3877 batch, ecmult: Add batch_verify and refactor strauss_batch
    535a94b6d2 batch: Add create and destroy APIs
    800233a2d4 batch: Initialize an experimental batch module
    REVERT: b9313c6e1a Merge bitcoin-core/secp256k1#1708: release cleanup: bump version after 0.7.0
    REVERT: a660a4976e Merge bitcoin-core/secp256k1#1707: release: Prepare for 0.7.0
    REVERT: 7ab8b0cc01 release cleanup: bump version after 0.7.0
    REVERT: a3e742d947 release: Prepare for 0.7.0
    REVERT: f67b0ac1a0 ci: Don't hardcode ABI version
    REVERT: 020ee60495 Merge bitcoin-core/secp256k1#1706: musig/tests: initialize keypair
    REVERT: cde4130898 musig/tests: initialize keypair
    REVERT: 6037833c9e Merge bitcoin-core/secp256k1#1702: changelog: update
    REVERT: 40b4a06520 changelog: update
    REVERT: 5e74086dc8 Merge bitcoin-core/secp256k1#1705: musig/test: Remove dead code
    REVERT: 7c3380423c Merge bitcoin-core/secp256k1#1696: build: Refactor visibility logic and add override
    REVERT: 8d967a602b musig/test: Remove dead code
    REVERT: 983711cd6d musig/tests: Refactor vectors_signverify
    REVERT: 73a695958a Merge bitcoin-core/secp256k1#1704: cmake: Make `secp256k1_objs` inherit interface defines from `secp256k1`
    REVERT: bf082221ff cmake: Make `secp256k1_objs` inherit interface defines from `secp256k1`
    REVERT: c82d84bb86 build: add CMake option for disabling symbol visibility attributes
    REVERT: ce7923874f build: Add SECP256K1_NO_API_VISIBILITY_ATTRIBUTES
    REVERT: e5297f6d79 build: Refactor visibility logic
    REVERT: cbbbf3bd6e Merge bitcoin-core/secp256k1#1699: ci: enable musig module for native macOS arm64 job
    REVERT: 943479a7a3 Merge bitcoin-core/secp256k1#1694: Revert "cmake: configure libsecp256k1.pc during install"
    REVERT: 3352f9d667 ci: enable musig module for native macOS arm64 job
    REVERT: ad60ef7ea7 Merge bitcoin-core/secp256k1#1689: ci: Convert `arm64` Cirrus tasks to GHA jobs
    REVERT: c498779096 Merge bitcoin-core/secp256k1#1687: cmake: support the use of launchers in ctest -S scripts
    REVERT: 44b205e9ee Revert "cmake: configure libsecp256k1.pc during install"
    REVERT: 0dfe387dbe cmake: support the use of launchers in ctest -S scripts
    REVERT: 89096c234d Merge bitcoin-core/secp256k1#1692: cmake: configure libsecp256k1.pc during install
    REVERT: 7106dce6fd cmake: configure libsecp256k1.pc during install
    REVERT: 004f57fcd8 ci: Move Valgrind build for `arm64` from Cirrus to GHA
    REVERT: 5fafdfc30f ci: Move `gcc-snapshot` build for `arm64` from Cirrus to GHA
    REVERT: e814b79a8b ci: Switch `arm64_debian` from QEMU to native `arm64` Docker image
    REVERT: bcf77346b9 ci: Add `arm64` architecture to `docker_cache` job
    REVERT: b77aae9226 ci: Rename Docker image tag to reflect architecture
    
    git-subtree-dir: src/secp256k1
    git-subtree-split: 410abb205a93b5c84a00a4e9e478c852b6dc6d69
    b6fc8c52d5
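    The squashed commits above add a create/add/verify/destroy lifecycle to the experimental secp256k1 batch module. As a rough mock of that calling pattern — these `mock_batch_*` functions are hypothetical stand-ins, NOT the real secp256k1 API, whose signatures are still under review in the upstream PRs — the semantics discussed in this thread look like this:

    ```cpp
    #include <cassert>
    #include <vector>

    // Hypothetical mock of the create/add/verify/destroy shape of the
    // experimental batch module. Each queued entry stands in for a
    // (signature, message, pubkey) triple.
    struct MockBatch {
        std::vector<bool> entries;
    };

    MockBatch* mock_batch_create() { return new MockBatch; }

    // Returns 0 on failure to add. As discussed above, the caller must
    // treat a 0 return as an error, not as "nothing left to verify".
    int mock_batch_add(MockBatch* batch, bool sig_is_valid)
    {
        if (batch == nullptr) return 0;
        batch->entries.push_back(sig_is_valid);
        return 1;
    }

    // Verifying an empty batch is trivially true and effectively costless,
    // matching the early-exit in the upstream code that makes an extra
    // m_needs_verify flag in Core look redundant.
    int mock_batch_verify(const MockBatch* batch)
    {
        for (bool ok : batch->entries) {
            if (!ok) return 0; // one bad signature fails the whole batch
        }
        return 1;
    }

    void mock_batch_destroy(MockBatch* batch) { delete batch; }

    int main()
    {
        MockBatch* batch = mock_batch_create();
        assert(mock_batch_verify(batch) == 1); // empty batch: trivially valid
        assert(mock_batch_add(batch, true) == 1);
        assert(mock_batch_verify(batch) == 1);
        assert(mock_batch_add(batch, false) == 1);
        assert(mock_batch_verify(batch) == 0);
        mock_batch_destroy(batch);
        return 0;
    }
    ```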
  37. Merge commit 'b6fc8c52d5a1ef0d20fb61db01cfb6d342e47b00' into 2024-02-batch-validation-updated2 0b18323777
  38. build: Enable secp256k1 experimental batch module 9a64747004
  39. validation: Add BatchVerify (unused) 82ce1d5196
  40. script: Add batch validation error 536e3a8dbd
  41. sigcache: Add CollectingSignatureChecker b024a84a5c
  42. validation: Add BatchableScriptChecker 0f77e46795
  43. validation: Use schnorr signature batch validation in parallel script verification
    Co-authored-by: Eunovo <eunovo9@gmail.com>
    2a6d5ff270
  44. validation: Use schnorr batch validation in single thread case 16479c2ef7
  45. fuzz: Add basic fuzz test for BatchSchnorrVerifier f663542588
  46. test: Add BatchSchnorrVerifier unit test 2e99dbebe8
  47. TODO: Fix Taproot functional test bcbbb574b1
  48. in src/batchverify.cpp:39 in bcd51a49f8 outdated
    34+    if (secp256k1_batch_usable(secp256k1_context_static, m_batch) == 0) {
    35+        return false;
    36+    }
    37+
    38+    secp256k1_xonly_pubkey pubkey_parsed;
    39+    (void)secp256k1_xonly_pubkey_parse(secp256k1_context_static, &pubkey_parsed, pubkey.data());
    


    Eunovo commented at 7:32 pm on June 15, 2025:
    https://github.com/bitcoin/bitcoin/pull/29491/commits/bcd51a49f8b8892254288d7522435287f055f92f: why discard the return value of secp256k1_xonly_pubkey_parse()?
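    One way to address this review comment is to propagate the parse result instead of casting it to void, so an unparseable key surfaces as an error. A minimal sketch, where `parse_xonly_pubkey` and `AddKeyToBatch` are hypothetical stand-ins (the real call is `secp256k1_xonly_pubkey_parse`, which likewise returns 1 on success and 0 on failure):

    ```cpp
    #include <array>
    #include <cassert>
    #include <cstdint>

    // Hypothetical stand-in for secp256k1_xonly_pubkey_parse: returns 1 on
    // success, 0 if the 32 bytes do not encode a valid x-only key. In this
    // mock, an all-zero buffer models an invalid encoding.
    static int parse_xonly_pubkey(std::array<uint8_t, 32>& out,
                                  const std::array<uint8_t, 32>& in)
    {
        for (uint8_t b : in) {
            if (b != 0) {
                out = in;
                return 1;
            }
        }
        return 0;
    }

    // Instead of `(void)parse(...)`, bail out on failure so an invalid key
    // cannot silently enter the batch with an uninitialized parsed value.
    static bool AddKeyToBatch(const std::array<uint8_t, 32>& pubkey)
    {
        std::array<uint8_t, 32> parsed{};
        if (parse_xonly_pubkey(parsed, pubkey) == 0) {
            return false; // report the parse failure to the caller
        }
        // ... add (parsed, sig, msg) to the batch here ...
        return true;
    }

    int main()
    {
        std::array<uint8_t, 32> good{};
        good[0] = 2; // non-zero: "valid" in this mock
        std::array<uint8_t, 32> bad{}; // all zeros: "invalid" in this mock
        assert(AddKeyToBatch(good));
        assert(!AddKeyToBatch(bad));
        return 0;
    }
    ```

    Whether discarding the return value is actually safe depends on whether the key was already validated earlier in script execution; the sketch just shows the defensive alternative the comment asks about.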
  49. fjahr force-pushed on Jul 24, 2025
  50. DrahtBot removed the label Needs rebase on Jul 24, 2025
  51. DrahtBot commented at 10:54 pm on July 31, 2025: contributor
    🐙 This pull request conflicts with the target branch and needs rebase.
  52. DrahtBot added the label Needs rebase on Jul 31, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-08-08 09:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me