[EXPERIMENTAL] Schnorr batch verification for blocks #29491

pull fjahr wants to merge 10 commits into bitcoin:master from fjahr:2024-02-batch-validation-updated changing 84 files +2791 −504
  1. fjahr commented at 5:06 pm on February 27, 2024: contributor

    This is a draft implementation of Schnorr signature batch verification in blocks, put up here for conceptual discussion and collaborative benchmarking.

    The secp256k1 code itself is still under review; please spend some time giving feedback there if you can.

    TODOs

    • Batch taproot tweak verification as well
  2. DrahtBot commented at 5:06 pm on February 27, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29491.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34184 (mining: add cooldown to createNewBlock() immediately after IBD by Sjors)
    • #32575 (refactor: Split multithreaded case out of CheckInputScripts by fjahr)
    • #32317 (kernel: Separate UTXO set access from validation functions by sedited)
    • #29843 (policy: Allow non-standard scripts with -acceptnonstdtxn=1 (test nets only) by ajtowns)
    • #28690 (build: Introduce internal kernel library by sedited)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • memeory -> memory [spelling error making the word unclear]

    Possible places where named args for integral literals may be used (e.g. func(x, /*named_arg=*/0) in C++, and func(x, named_arg=0) in Python):

    • secp256k1_batch_add_schnorrsig(…, 32, …) in src/batchverify.cpp

    2026-01-31 14:01:32

  3. fjahr force-pushed on Feb 27, 2024
  4. DrahtBot added the label CI failed on Feb 27, 2024
  5. DrahtBot commented at 5:11 pm on February 27, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/22041704016

  6. Sjors commented at 9:40 am on February 28, 2024: member

    I guess this is most interesting to test against the last 50K blocks on mainnet, i.e. since early 2023, given the high percentage of taproot transactions? Such testing could benefit from an assumeutxo snapshot, saving everyone the pain of rewinding. I can bake one, though ideally I’d like #26045 to land first.

    Is the current PR already using batches? Glancing over validation.cpp, it’s unclear to me whether and how it collects up to max_batch_size{106} signatures before calling batch.Verify().

    Maybe add a -schnorrbatchverify startup argument so more people can try it once it gets through review.

  7. fjahr commented at 12:53 pm on February 28, 2024: contributor

    I guess this is most interesting to test against the last 50K blocks on mainnet, i.e. since early 2023, given the high percentage of taproot transactions? Such testing could benefit from an assumeutxo snapshot, saving everyone the pain of rewinding. I can bake one, though ideally I’d like #26045 to land first.

    Yeah, that would be great. I have to get the tracing part to work in my setup again before I can start.

    Is the current PR already using batches? Glancing over validation.cpp, it’s unclear to me whether and how it collects up to max_batch_size{106} signatures before calling batch.Verify().

    Yes, the rough architecture is as follows (sorry for not writing a better explanation above, but it’s still very rough and I expect it to change). The batch object is constructed and then handed down from ConnectBlock() to CheckInputScripts(), which hands it to the CScriptCheck constructor. When the CScriptCheck instance is executed and the batch object is present, BatchingCachingTransactionSignatureChecker is used, which differs from the default CachingTransactionSignatureChecker only in that its implementation of VerifySchnorrSignature() adds the Schnorr sig to the batch object instead of verifying it right there. (Here we have an issue because the function no longer does what its name says, but it is a much simpler change this way and I find it hard to predict where we will land with this in the end.) Then, back in ConnectBlock(), the batch object is verified after all transactions have been iterated over.

    The 106 is an implementation detail from secp256k1 that I am not sure really needs to be exposed here. But what matters for the question is: if more than 106 sigs are added to the batch, verification of the already added sigs happens when the next sig is added, and if that verification fails, the adding itself fails. So, calling back to the last paragraph, every 106 sigs the naming of VerifySchnorrSignature() is actually still correct ;) and the one verify call at the end just verifies the last n % 106 sigs left for that block. I have not put any effort into documenting this properly here yet because the secp256k1 API is not finalized and the details on this particular topic might still change, see for example https://github.com/bitcoin-core/secp256k1/pull/1134#pullrequestreview-1083948472

    Everything should be conditional on the presence of a batch object, which defaults to nullptr; if no batch is present, all the other functions should work as usual, for example when used outside the context of a new block.

    Maybe add a -schnorrbatchverify startup argument so more people can try it once it gets through review.

    Yeah, I have been thinking about that, or maybe even a compile-time option.

  8. fjahr force-pushed on Feb 28, 2024
  9. fjahr commented at 1:14 pm on February 28, 2024: contributor
    @Sjors maybe what confused you is that BatchingCachingTransactionSignatureChecker::VerifySchnorrSignatureBatch() was unused. That was actually a leftover from an earlier design spike and I just forgot to remove it. It’s gone now.
  10. Sjors commented at 1:36 pm on February 28, 2024: member

    if there are more than 106 sigs added to the batch the verification for the already added sigs will happen when the next sig is added and if the verification fails the adding itself will fail.

    That clarifies a lot, and should be documented :-)

  11. DrahtBot added the label Needs rebase on Apr 6, 2024
  12. 0xB10C commented at 8:33 am on May 13, 2024: contributor

    @willcl-ark and I ran an IBD benchmark against master for this PR: IBD from a local node to height 840000, currently looking only at the connect_block duration.

    In the current draft implementation:

    • master is currently faster
    • batch validation is attempted and decreases performance even before taproot activated
    • after taproot activation, batch validation is significantly slower than non-batch validation

    [two benchmark graphs: connect_block duration, batch PR vs master]

    Used this bpftrace script to collect CSV data:

     #!/usr/bin/env bpftrace

     usdt:./src/bitcoind:validation:block_connected
     {
       $height = (int32) arg1;
       $tx = (uint64) arg2;
       $ins = (int32) arg3;
       $sigops = (int64) arg4;
       $duration = (int64) arg5; // in nanoseconds

       printf("%d,%ld,%d,%ld,%ld\n", $height, $tx, $ins, $sigops, $duration);
     }
    
  13. willcl-ark commented at 8:38 am on May 13, 2024: member

    batch validation is attempted and decreases performance even before taproot activated

    From a quick skim of the changes, ISTM that we always use the BatchingCachingTransactionSignatureChecker and there is no switch back to the CachingTransactionSignatureChecker, but this could well be because this is only in a WIP state. It doesn’t account for why it’s currently slower in all cases, though.

  14. fjahr force-pushed on Dec 5, 2024
  15. DrahtBot removed the label Needs rebase on Dec 5, 2024
  16. andrewtoth commented at 10:55 pm on December 7, 2024: contributor

    It seems in this approach the m_batch_mutex is held for the entirety of adding, which will cause lock contention. Verify is then a blocking call on the main thread, which negates the multithreading.

    I think a better approach would be to have a thread local batch for each worker thread and the main thread, so each thread can add all schnorr sigs without locking, and then each thread can verify their batches in parallel at the end.

  17. fjahr force-pushed on Mar 11, 2025
  18. fjahr renamed this:
    [DO NOT MERGE] Schnorr batch verification for blocks
    [EXPERIMENTAL] Schnorr batch verification for blocks
    on Mar 11, 2025
  19. in src/script/sigcache.h:81 in d333d5c6e3 outdated
    71@@ -71,6 +72,22 @@ class CachingTransactionSignatureChecker : public TransactionSignatureChecker
    72 
    73     bool VerifyECDSASignature(const std::vector<unsigned char>& vchSig, const CPubKey& vchPubKey, const uint256& sighash) const override;
    74     bool VerifySchnorrSignature(Span<const unsigned char> sig, const XOnlyPubKey& pubkey, const uint256& sighash) const override;
    75+
    76+    bool GetStore() const { return store; }
    77+    SignatureCache& GetSigCache() const { return m_signature_cache; }
    78+};
    79+
    80+class CollectingSignatureChecker : public CachingTransactionSignatureChecker {
    


    Eunovo commented at 6:47 am on March 16, 2025:
    https://github.com/bitcoin/bitcoin/pull/29491/commits/d333d5c6e3c4b454621998772ff74ff6656af691: This is nice! It sets us up nicely to later do batch taptweak checking as well, since we can “collect” it too.
  20. in src/checkqueue.h:163 in 4541038323 outdated
    160+                            break;
    161+                        }
    162+                        // Check succeeded; add signatures to shared batch
    163+                        const auto& signatures = std::get<std::vector<SchnorrSignatureToVerify>>(check_result);
    164+                        for (const auto& sig : signatures) {
    165+                            m_batch.Add(sig.sig, sig.pubkey, sig.sighash);
    


    Eunovo commented at 3:23 am on March 17, 2025:
    https://github.com/bitcoin/bitcoin/pull/29491/commits/454103832351311d11e16505ed1925bd5f1c6875: IIUC all the threads still share this batch. That means we still have the same locking problem: threads will be stuck waiting for the batch to be free. In my very rough implementation https://github.com/bitcoin-dev-tools/benchcoin/pull/41/commits/173f07dc4493de3d3a9420f6a966606913277e9d#diff-0f68d4063c2fb11000f9048e46478e26c0ea2622f7866d00b574f429c3a4db61 I opted to give each batch a thread to resolve this.

    fjahr commented at 0:17 am on March 20, 2025:

    You’re right, this wasn’t a complete solution. I have improved it now by using buckets of signatures for each thread, which are all added and only then verified by the master thread at the end. This could be further improved if we could add the signatures to a batch per thread right away, but that would require merging batch objects, which the secp API currently doesn’t make possible. I’ll add that to my wish list because I think it should be even better than the current approach, though it should be a comparatively minor improvement. I considered other approaches as well, but this seems to be a very simple solution that does not rely on more complex library code or use thread_local (which we try to avoid).

    Your approach does do batching in each thread, but also per each vChecks allocation, and those are surprisingly small, especially when there are a lot of threads available. This is what caused me to look more into the nNow algorithm’s behavior. With the current approach here I try to do only one verify call per block, which should be the fastest approach once Pippenger is used in secp.


    andrewtoth commented at 0:45 am on March 20, 2025:

    I think @Eunovo’s approach is closer to what we want. Instead of a bucket per thread, have a batch per thread. However, instead of batch verifying after each vChecks batch, batch verify after the queue of checks is empty for the block. We need CCheckQueue::Complete to set a flag that no more checks will be added, and wake all threads again. The threads will all verify their batches once the global queue is empty. We would need to reset the flag after Complete.

    This approach of verifying on the master thread at the end will not be faster even with the Pippenger algo. The max speedup is closer to 2x, but multi-threaded is 16x faster.


    fjahr commented at 10:59 pm on March 23, 2025:

    Instead of a bucket per thread, have a batch per thread. However, instead of batch verifying after each vChecks batch, batch verify after the queue of checks is empty for the block. We need CCheckQueue::Complete to set a flag that no more checks will be added, and wake all threads again. The threads will all verify their batches once the global queue is empty. We would need to reset the flag after Complete.

    Yes, one batch per thread is the approach I mentioned and preferred above as well, if that wasn’t clear. My thinking was that the fastest approach overall would be to do that plus efficient merging of batches, which should ideally have the same overhead as adding a single signature to a batch (to be verified that this is possible). It would need to be benchmarked across different scenarios, for blocks with varying shares of Schnorr sigs and across different numbers of available threads, of course. I am sure there will be some scenarios where “let each thread verify its own batch” is faster and others where “merge thread batches and verify once” is faster, and we would need to decide based on that in the end. A definite upside of merging the thread batches would be simpler code: we could save this round of waking up the threads with the flag set.

    However, in the meantime I have asked Siv how hard it would be to add merging of batches to the secp API, and it appears it might be harder than I thought because it would require merging of scratch spaces, which is currently not possible. So, while maybe theoretically possible, it’s quite far off in terms of engineering effort. Siv will still look into it and check whether it’s feasible at all, but until then I will disregard this approach.

    So, long story short, I have now implemented it with a batch per thread, using a flag set in Complete as you suggested.

  21. DrahtBot added the label Needs rebase on Mar 20, 2025
  22. in src/checkqueue.h:129 in 6055f3a6f5 outdated
    123@@ -114,6 +124,12 @@ class CCheckQueue
    124                     if (fMaster && nTodo == 0) {
    125                         // All checks are done; master performs batch verification
    126                         if constexpr (std::is_same_v<R, ScriptFailureResult>) {
    127+                            for (auto& bucket : m_signature_buckets) {
    128+                                for (const auto& sig : bucket) {
    129+                                    m_batch.Add(sig.sig, sig.pubkey, sig.sighash);
    


    Eunovo commented at 12:46 pm on March 20, 2025:
    https://github.com/bitcoin/bitcoin/pull/29491/commits/6055f3a6f530818e2805d11debb1fb2ce81fc5c0: This leaves the master thread to do all the “pubkey recovery” (I think that’s what it’s called?) work in each batch.Add call, which is substantial. We could call secp256k1_xonly_pubkey_parse when collecting the signatures so the master thread is not stuck doing all the work to parse the XOnlyPubKey.
  23. ryanofsky referenced this in commit b9c281011b on Mar 23, 2025
  24. fjahr force-pushed on Mar 23, 2025
  25. DrahtBot removed the label Needs rebase on Mar 23, 2025
  26. fjahr commented at 1:29 pm on March 24, 2025: contributor
    I’ve also rebased to get #31689 in here by the way.
  27. in src/script/sigcache.h:81 in 696bd7547e outdated
    76+
    77+    bool GetStore() const { return store; }
    78+    SignatureCache& GetSigCache() const { return m_signature_cache; }
    79+};
    80+
    81+class CollectingSignatureChecker : public CachingTransactionSignatureChecker {
    


    Eunovo commented at 8:05 pm on April 1, 2025:
    https://github.com/bitcoin/bitcoin/pull/29491/commits/696bd7547e0e4ab44388d3c55ed636c46f9bdad5: IIUC the original idea was to “collect the signatures” and possibly combine them into one batch. Now that a “one batch per thread” approach has been taken, should this still be collecting signatures into a vector? The original “BatchSignatureChecker” that had a reference to the batch might work better since it skips the vector completely.

    Eunovo commented at 10:35 am on April 2, 2025:

    the original “BatchSignatureChecker” that had a reference to the batch might work better since it skips the vector completely

    On second thought, it might be better to leave as is because we are still experimenting and trying to find the right number of batches to use

  28. Eunovo commented at 4:24 pm on May 5, 2025: contributor

    While testing this, I observed a decline in performance as the number of threads increased. I’m not sure why this happens, but I suspect a few factors play a role.

    I made some tweaks to my previous batch-verification implementation to produce https://github.com/Eunovo/bitcoin/commits/2025-batch-verification/ and retested. These are results from running the Microbenchmark build/bin/bench_bitcoin -filter=ConnectBlockAllSchnorr

    | Commit | Threads | Run 1 (block/s) | Run 2 (block/s) | Run 3 (block/s) | Min | Max | Mean (block/s) | StdDev |
    |---|---:|---:|---:|---:|---:|---:|---:|---:|
    | fjahr (4ab16a0b) | 1 | 8.0 | 8.07 | 7.6 | 7.6 | 8.07 | 7.89 | 0.207 |
    | fjahr (4ab16a0b) | 2 | 15.54 | 15.74 | 15.66 | 15.54 | 15.74 | 15.65 | 0.082 |
    | fjahr (4ab16a0b) | 4 | 29.58 | 29.14 | 29.07 | 29.07 | 29.58 | 29.26 | 0.226 |
    | fjahr (4ab16a0b) | 8 | 49.33 | 45.88 | 48.5 | 45.88 | 49.33 | 47.90 | 1.470 |
    | fjahr (4ab16a0b) | 16 | 50.75 | 51.23 | 52.17 | 50.75 | 52.17 | 51.38 | 0.590 |
    | master (baa848b8d) | 1 | 6.31 | 6.59 | 6.6 | 6.31 | 6.6 | 6.50 | 0.134 |
    | master (baa848b8d) | 2 | 12.88 | 12.93 | 12.65 | 12.65 | 12.93 | 12.82 | 0.122 |
    | master (baa848b8d) | 4 | 24.96 | 24.96 | 24.93 | 24.93 | 24.96 | 24.95 | 0.014 |
    | master (baa848b8d) | 8 | 46.01 | 46.18 | 46.27 | 46.01 | 46.27 | 46.15 | 0.108 |
    | master (baa848b8d) | 16 | 52.84 | 52.36 | 52.83 | 52.36 | 52.84 | 52.68 | 0.224 |
    | novo (0493ba332) | 1 | 8.19 | 8.14 | 8.11 | 8.11 | 8.19 | 8.15 | 0.033 |
    | novo (0493ba332) | 2 | 15.47 | 15.78 | 15.75 | 15.47 | 15.78 | 15.67 | 0.140 |
    | novo (0493ba332) | 4 | 30.23 | 30.46 | 30.28 | 30.23 | 30.46 | 30.32 | 0.099 |
    | novo (0493ba332) | 8 | 53.96 | 50.33 | 54.68 | 50.33 | 54.68 | 52.99 | 1.904 |
    | novo (0493ba332) | 16 | 64.4 | 64.59 | 64.29 | 64.29 | 64.59 | 64.43 | 0.124 |

    Here’s a graph of the mean block processing speed (block/s) plotted against the number of threads (higher is better). [graph: benchmark_comparison]

    EDIT I’m waiting on more results from a fresh IBD benchmark

  29. fjahr commented at 5:10 pm on May 5, 2025: contributor

    While testing this, I observed a decline in performance as the number of threads increased.

    Can you share the setup for these benchmarks so I can test them? Thanks!

    EDIT: I see you added it, it wasn’t in the notification I received

  30. Eunovo commented at 6:40 am on May 6, 2025: contributor

    Can you share the setup for these benchmarks so I can test them? Thanks!

    OS is Ubuntu 24.04.2 LTS

    Output from lscpu

     Architecture:             x86_64
       CPU op-mode(s):         32-bit, 64-bit
       Address sizes:          48 bits physical, 48 bits virtual
       Byte Order:             Little Endian
     CPU(s):                   16
       On-line CPU(s) list:    0-15
     Vendor ID:                AuthenticAMD
       Model name:             AMD Ryzen 9 8945HS w/ Radeon 780M Graphics
         CPU family:           25
         Model:                117
         Thread(s) per core:   2
         Core(s) per socket:   8
         Socket(s):            1
         Stepping:             2
         Frequency boost:      enabled
         CPU(s) scaling MHz:   83%
         CPU max MHz:          5263.0000
         CPU min MHz:          400.0000
    

    I’m running the IBD benchmark on a different machine. I’ll share details when results are in.

    EDIT It seems you were referring to the command I ran. I used the ConnectBlockAllSchnorr microbenchmark; run it with build/bin/bench_bitcoin -filter=ConnectBlockAllSchnorr. For IBD benchmarking, I’m doing bitcoind -assumevalid=0 -par={par} -stopatheight=880000 (I’m testing 1, 2, 4, 6, 8, 12 threads) with Benchkit.

    Note that Benchkit is still experimental; it loads the 840000 assumeutxo snapshot and runs hyperfine.

  31. DrahtBot added the label Needs rebase on May 6, 2025
  32. fjahr force-pushed on May 18, 2025
  33. fjahr commented at 11:58 pm on May 18, 2025: contributor

    It seems you were referring to the command I ran. I used the ConnectBlockAllSchnorr microbenchmark; run it with build/bin/bench_bitcoin -filter=ConnectBlockAllSchnorr. For IBD benchmarking, I’m doing bitcoind -assumevalid=0 -par={par} -stopatheight=880000 (I’m testing 1, 2, 4, 6, 8, 12 threads) with Benchkit.

    I couldn’t figure out how to test the different thread configurations for the microbench with Benchkit. Are you using Benchkit there as well or are you just setting worker_threads_num in test/util/setup_common.cpp? That’s what I have been doing for now.

    Copy operations in https://github.com/bitcoin/bitcoin/pull/29491/files#diff-612abe4a74f2e9f6903cebff057d47bfe4f8a57507c175743f4b62fde0d3cc58R95-R99

    I am not sure how to do this completely without copying; using a span is unsafe because vChecks is cleared. Using emplace_back is possible and I am using it now, but in the benchmarks I didn’t see a noticeable improvement from it. Can you be a bit more specific about what you had in mind?

    Doing batch-verification work within the critical section https://github.com/bitcoin/bitcoin/pull/29491/files#diff-0f68d4063c2fb11000f9048e46478e26c0ea2622f7866d00b574f429c3a4db61R151

    I have implemented this and see a performance boost from it for the case with 15 worker threads (I didn’t test all the other thread configuration yet). On my machine I now get these results (running build/bin/bench_bitcoin -filter=ConnectBlockAllSchnorr -min-time=1000):

    novo (15 worker threads)

     |            ns/block |             block/s |    err% |     total | benchmark
     |--------------------:|--------------------:|--------:|----------:|:----------
     |        9,285,356.00 |              107.70 |    1.6% |      1.09 | `ConnectBlockAllSchnorr`
    

    this latest state 9d460e82e1404f7cfb29613d3852bbb99e0071ad (15 worker threads)

     |            ns/block |             block/s |    err% |     total | benchmark
     |--------------------:|--------------------:|--------:|----------:|:----------
     |        7,973,364.58 |              125.42 |    1.3% |      1.10 | `ConnectBlockAllSchnorr`
    

    Thank you for the continued valuable feedback, I have made you co-author of the relevant commit.

    A few comments on the latest code in the branch you linked to (0493ba332aba6ff0d6da55094fd8d834bb8cc18f):

    • Tracking whether anything has been added to the batch and skipping verify makes sense as an idea, but I have some issues with it. First of all, using the return value of secp256k1_batch_add_schnorrsig seems wrong, because it returns false if something has gone wrong while adding the signature, and that usually means we have a problem; at least in the block verification process it definitely does. So taking that to mean “we don’t have to verify anymore, it’s fine” doesn’t seem right. Additionally, in my opinion we shouldn’t have to track this in Bitcoin Core, because I believe secp256k1 should handle it. Such behavior should happen closest to the actual data IMO, and if that is actually the case, calling verify should be costless. In the current secp code the actual verification steps seem to be skipped here: https://github.com/bitcoin-core/secp256k1/pull/1134/files#diff-5a633b880f8591184912dc3e49b3dffc2af714b2fc27404d0a1b12f6a2df7a71R191, so it would surprise me if adding m_needs_verify gives a real performance boost. Did you benchmark this change in isolation and notice a performance impact? If you did, I would be interested in digging into why that is. And fwiw, I could also be persuaded not to trust secp with this and track it ourselves, if that is what we usually do in Core, but I would need to find some comparable reference code and understand the reasoning for this (seemingly) duplicate behavior.
    • IIUC you are using queue.empty() as the indicator that all the todos have been added. If I remember correctly, I tried this at some point too, but it turned out that there is no guarantee of that; it could be that the todos are just not added fast enough, hence the need for m_complete. So some threads could end up verifying multiple times within a block. I don’t see why this would cause problems with the verification itself; I think the result is still correct in this regard. But the question is how it impacts performance. Here as well I would find it kind of surprising if this is faster in practice, but maybe my thinking is wrong and your benchmarks tell us we should switch to this approach instead, letting threads verify their batch when they have the chance instead of letting them go to sleep. But I also think we need to wait until we have Pippenger as well, because that could influence these results significantly.
    • Running the loop one more time, only with nNow = 0, to trigger the batch verification is very simple, but I don’t see how you ensure that all the thread batches have been verified before the master thread returns. It looks to me like as soon as the master thread has verified its own batch it simply returns, no matter how many (if any) other threads have verified their batches already, and any failures in those batches would be missed. This is what I track with m_total_verified here in the code. Additionally, setting nNow = 0 has the unwanted side effect that in the cleanup phase of the next iteration the loop appears to be in its very first iteration again, which leads to running nTotal++, giving the impression that there are more threads available. I guess this doesn’t cause issues only because there is no tracking of how many of the total threads have already verified their batch.

    A few more comments on this latest push:

    • I have also been warming up to the idea of having the thread-local batch object in Loop rather than in a checkqueue member; this saves us a few lines of code and I didn’t see an impact on performance.
    • Additionally, I have been looking into some C++20 multithreading features since the latest push and tested using std::barrier instead of the counter, but so far I didn’t see an improvement in performance or code clarity from it.
    • I have added a unit test for BatchSchnorrVerifier.
    • Just FYI, I don’t think rebasing currently makes sense for me here because it would require rebasing the secp batch branch, and the author is working on doing that shortly themselves, so I would rather wait for that.
  34. Eunovo commented at 6:20 am on June 12, 2025: contributor

    I couldn’t figure out how to test the different thread configurations for the microbench with Benchkit. Are you using Benchkit there as well or are you just setting worker_threads_num in test/util/setup_common.cpp? That’s what I have been doing for now.

    Sorry, I didn’t state that. That’s what I did for the microbench; I only use Benchkit for IBD benchmarks.

    I’m going to review and test latest changes.

  35. Eunovo commented at 8:22 am on June 12, 2025: contributor

    I am not sure how to do this completely without copy, using span is unsafe because vChecks is cleared. Using emplace_back is possible and I am using it now but in the benchmarks I didn’t see noticable improvement from this. Can you be a bit more specific what you had in mind?

    I was thinking it might be better to move away from the collecting sigs idea if the copy operations present a significant slowdown. I’ll see if I can detect any performance changes in testing.

    • Tracking whether anything has been added to the batch and skipping verify makes sense as an idea, but I have some issues with it. First of all, using the return value of secp256k1_batch_add_schnorrsig seems wrong, because it returns false if something has gone wrong while adding the signature, and that usually means we have a problem; at least in the block verification process it definitely does. So taking that to mean “we don’t have to verify anymore, it’s fine” doesn’t seem right. Additionally, in my opinion we shouldn’t have to track this in Bitcoin Core, because I believe secp256k1 should handle it. Such behavior should happen closest to the actual data IMO, and if that is actually the case, calling verify should be costless. In the current secp code the actual verification steps seem to be skipped here: https://github.com/bitcoin-core/secp256k1/pull/1134/files#diff-5a633b880f8591184912dc3e49b3dffc2af714b2fc27404d0a1b12f6a2df7a71R191, so it would surprise me if adding m_needs_verify gives a real performance boost. Did you benchmark this change in isolation and notice a performance impact? If you did, I would be interested in digging into why that is. And fwiw, I could also be persuaded not to trust secp with this and track it ourselves, if that is what we usually do in Core, but I would need to find some comparable reference code and understand the reasoning for this (seemingly) duplicate behavior.

    Makes sense. I agree. We do not need to do this in core.

    • IIUC you are using queue.empty() as the indicator that all the todos have been added. If I remember correctly, I tried this at some point too, but it turned out that there is no guarantee of that; it could be that the todos are just not added fast enough, hence the need for m_complete. So some threads could end up verifying multiple times within a block. I don’t see why this would cause problems with the verification itself; I think the result is still correct in this regard. But the question is how it impacts performance. Here as well I would find it kind of surprising if this is faster in practice, but maybe my thinking is wrong and your benchmarks tell us we should switch to this approach instead, letting threads verify their batch when they have the chance instead of letting them go to sleep. But I also think we need to wait until we have Pippenger as well, because that could influence these results significantly.
    • Running the loop one more time, only with nNow = 0, to trigger the batch verification is very simple, but I don’t see how you ensure that all the thread batches have been verified before the master thread returns. It looks to me like as soon as the master thread has verified its own batch it simply returns, no matter how many (if any) other threads have verified their batches already, and any failures in those batches would be missed. This is what I track with m_total_verified here in the code. Additionally, setting nNow = 0 has the unwanted side effect that in the cleanup phase of the next iteration the loop appears to be in its very first iteration again, which leads to running nTotal++, giving the impression that there are more threads available. I guess this doesn’t cause issues only because there is no tracking of how many of the total threads have already verified their batch.

    It seems like the major reason for the performance drop was the batch-verification work being performed inside the critical section. I'll try to reproduce your latest results.
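    The coordination problem described above could be sketched as follows. This is a minimal illustration of the m_total_verified idea, not the actual CCheckQueue members: the master must not return until every worker has accounted for its per-thread batch, otherwise a failing batch on a slow worker would be missed.

    ```cpp
    #include <atomic>
    #include <thread>

    // Illustrative sketch: each worker reports its batch result exactly
    // once; the master waits until all n_workers batches are accounted
    // for before reporting the joint result. In practice the master
    // would sleep on a condition variable rather than spin.
    struct BatchCoordinator {
        std::atomic<int> total_verified{0};
        std::atomic<bool> all_ok{true};

        void worker_done(bool batch_ok) {
            if (!batch_ok) all_ok = false;
            total_verified.fetch_add(1);
        }

        bool wait_all(int n_workers) {
            while (total_verified.load() < n_workers) std::this_thread::yield();
            return all_ok.load();
        }
    };
    ```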

  36. in src/batchverify.cpp:39 in bcd51a49f8 outdated
    34+    if (secp256k1_batch_usable(secp256k1_context_static, m_batch) == 0) {
    35+        return false;
    36+    }
    37+
    38+    secp256k1_xonly_pubkey pubkey_parsed;
    39+    (void)secp256k1_xonly_pubkey_parse(secp256k1_context_static, &pubkey_parsed, pubkey.data());
    


    Eunovo commented at 7:32 pm on June 15, 2025:
    https://github.com/bitcoin/bitcoin/pull/29491/commits/bcd51a49f8b8892254288d7522435287f055f92f: why discard the return value of secp256k1_xonly_pubkey_parse()?

    fjahr commented at 12:49 pm on January 26, 2026:
    Not discarding it anymore in the latest push.
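    The fix could look roughly like the following sketch. parse_xonly_pubkey here is a hypothetical stand-in for secp256k1_xonly_pubkey_parse (which returns 1 on success and 0 on failure); the point is only the pattern of propagating the parse result instead of casting it to void.

    ```cpp
    #include <array>
    #include <cstdint>

    // Hypothetical stand-in for secp256k1_xonly_pubkey_parse: returns 1
    // on success, 0 when the 32-byte input is not a valid x-only public
    // key. "Invalid" is mocked here as an all-zero buffer.
    static int parse_xonly_pubkey(const std::array<uint8_t, 32>& in)
    {
        for (uint8_t b : in) if (b != 0) return 1;
        return 0;
    }

    // Check the parse result instead of discarding it, so a malformed
    // key makes the batch add fail loudly.
    static bool batch_add_checked(const std::array<uint8_t, 32>& pubkey)
    {
        if (parse_xonly_pubkey(pubkey) == 0) {
            return false; // invalid pubkey: do not add to the batch
        }
        // ... add the signature/pubkey pair to the batch here ...
        return true;
    }
    ```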
  37. fjahr force-pushed on Jul 24, 2025
  38. DrahtBot removed the label Needs rebase on Jul 24, 2025
  39. DrahtBot added the label Needs rebase on Jul 31, 2025
  40. maflcko removed the label CI failed on Sep 26, 2025
  41. fjahr force-pushed on Jan 26, 2026
  42. fjahr commented at 3:57 pm on January 26, 2026: contributor
    Rebased on latest master and included the latest API changes in the batch module PR in secp256k1.
  43. DrahtBot added the label CI failed on Jan 26, 2026
  44. DrahtBot commented at 4:55 pm on January 26, 2026: contributor

    🚧 At least one of the CI tasks failed.
    Task fuzzer,address,undefined,integer: https://github.com/bitcoin/bitcoin/actions/runs/21364278324/job/61491586498
    LLM reason (✨ experimental): Fuzz target crash: libFuzzer reported a deadly signal (exit code 77) during fuzz run.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  45. DrahtBot removed the label Needs rebase on Jan 26, 2026
  46. fjahr force-pushed on Jan 27, 2026
  47. fjahr force-pushed on Jan 27, 2026
  48. fjahr force-pushed on Jan 27, 2026
  49. fjahr force-pushed on Jan 29, 2026
  50. fjahr force-pushed on Jan 29, 2026
  51. DrahtBot removed the label CI failed on Jan 29, 2026
  52. DrahtBot added the label Needs rebase on Jan 31, 2026
  53. validation: Add BatchVerify (unused) d406370c70
  54. Squashed 'src/secp256k1/' changes from 14e56970cb..15ea24cb8c
    15ea24cb8c batch: make add functions void & introduce reset
    bfcc479a35 batch: remove `batch_usable` api
    15e388e096 batch: make tests functions internal & static
    aac054a373 fix typos & index the right inputs for benchmarks
    c07e710003 batch: remove experimental status
    49fb753393 test: fix ci failures
    e96dabb4af batch: Generate speedup graphs
    b0b3425cd4 batch, extrakeys: Add benchmarks
    9d5115156b batch: Add tests for batch_add_* APIs
    668199c917 batch,ecmult: Add tests for core batch APIs and strauss_batch refactor
    53a158203f batch: Add example
    b40b4186b8 batch: Add batch_add_* APIs
    2bed1cb6ee batch, ecmult: Add batch_verify and refactor strauss_batch
    8f13eeae31 batch: Add create and destroy APIs
    0b6b0c87ad batch: Initialize an experimental batch module
    REVERT: 14e56970cb Merge bitcoin-core/secp256k1#1794: ecmult: Use size_t for array indices
    REVERT: c7a52400d6 Merge bitcoin-core/secp256k1#1809: release cleanup: bump version after 0.7.1
    REVERT: ae7eb729c0 release cleanup: bump version after 0.7.1
    REVERT: 1a53f4961f Merge bitcoin-core/secp256k1#1808: Prepare for 0.7.1
    REVERT: 20a209f11c release: prepare for 0.7.1
    REVERT: c4b6a81a60 changelog: update in preparation for the v0.7.1 release
    REVERT: ebb35882da Merge bitcoin-core/secp256k1#1796: bench: fail early if user inputs invalid value for SECP256K1_BENCH_ITERS
    REVERT: c09215f7af bench: fail early if user inputs invalid value for SECP256K1_BENCH_ITERS
    REVERT: 471e3a130d Merge bitcoin-core/secp256k1#1800: sage: verify Eisenstein integer connection for GLV constants
    REVERT: 29ac4d8491 sage: verify Eisenstein integer connection for GLV constants
    REVERT: 4721e077b4 Merge bitcoin-core/secp256k1#1793: doc/bench: added help text for SECP256K1_BENCH_ITERS env var for bench_ecmult
    REVERT: bd5ced1fe1 doc/bench: added help text for SECP256K1_BENCH_ITERS env var for bench_ecmult
    REVERT: 47eb70959a ecmult: Use size_t for array indices in _odd_multiplies_table
    REVERT: bb1d199de5 ecmult: Use size_t for array indices into tables
    REVERT: 2d9137ce9d Merge bitcoin-core/secp256k1#1764: group: Avoid using infinity field directly in other modules
    REVERT: f9a944ff2d Merge bitcoin-core/secp256k1#1790: doc: include arg -DSECP256K1_USE_EXTERNAL_DEFAULT_CALLBACKS=ON for cmake
    REVERT: 0406cfc4d1 doc: include arg -DUSE_EXTERNAL_DEFAULT_CALLBACKS=1 for cmake
    REVERT: 8d445730ec Merge bitcoin-core/secp256k1#1783: Add VERIFY_CHECKs and documentation that flags must be 0 or 1
    REVERT: aa2a39c1a7 Merge bitcoin-core/secp256k1#1778: doc/bench: Added cmake build options to bench error messages
    REVERT: 540fec8ae9 Merge bitcoin-core/secp256k1#1788: test: split monolithic ellswift test into independent cases
    REVERT: d822b29021 test: split monolithic ellswift test into independent cases
    REVERT: ae00c552df Add VERIFY_CHECKs that flags are 0 or 1
    REVERT: 5c75183344 Merge bitcoin-core/secp256k1#1784: refactor: remove ret from secp256k1_ec_pubkey_serialize
    REVERT: be5e4f02fd Merge bitcoin-core/secp256k1#1779: Add ARG_CHECKs to ensure "array of pointers" elements are non-NULL
    REVERT: 3daab83a60 refactor: remove ret from secp256k1_ec_pubkey_serialize
    REVERT: 8bcda186d2 test: Add non-NULL checks for "pointer of array" API functions
    REVERT: 5a08c1bcdc Add ARG_CHECKs to ensure "array of pointers" elements are non-NULL
    REVERT: 3b5b03f301 doc/bench: Added cmake build options to bench error messages
    REVERT: e7f7083b53 Merge bitcoin-core/secp256k1#1774: refactor: split up internal pubkey serialization function into compressed/uncompressed variants
    REVERT: b6c2a3cd77 Merge bitcoin-core/secp256k1#1761: ecmult_multi: reduce strauss memory usage by 30%
    REVERT: f5e815f430 remove secp256k1_eckey_pubkey_serialize function
    REVERT: 0d3659c547 use new `_eckey_pubkey_serialize{33,65}` functions in modules (ellswift,musig)
    REVERT: adb76f82ea use new `_eckey_pubkey_serialize{33,65}` functions in public API
    REVERT: fc7458ca3e introduce `secp256k1_eckey_pubkey_serialize{33,65}` functions
    REVERT: 2f73e5281d group: Avoid using infinity field directly in other modules
    REVERT: 26166c4f5f ecmult_multi: reduce strauss memory usage by 30%
    
    git-subtree-dir: src/secp256k1
    git-subtree-split: 15ea24cb8c1bd239a7a39939da1952cf6d3a35b0
    ea9a84ab3c
  55. Merge commit 'ea9a84ab3c3dccc8eebc9689b0242f67b83fed24' into HEAD 324a338e19
  56. script: Add batch validation error e6b7862943
  57. sigcache: Add CollectingSignatureChecker cd958dfcf4
  58. validation: Add BatchableScriptChecker 8508d08597
  59. validation: Use schnorr signature batch validation in parallel script verification
    Co-authored-by: Eunovo <eunovo9@gmail.com>
    13ab315a82
  60. validation: Use schnorr batch validation in single thread case c43c31c3fd
  61. fuzz: Add basic fuzz test for BatchSchnorrVerifier f03c43aed0
  62. test: Add BatchSchnorrVerifier unit test f178c6cba7
  63. fjahr force-pushed on Jan 31, 2026
  64. DrahtBot removed the label Needs rebase on Jan 31, 2026
  65. DrahtBot added the label Needs rebase on Feb 3, 2026
  66. DrahtBot commented at 1:29 pm on February 3, 2026: contributor
    🐙 This pull request conflicts with the target branch and needs rebase.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-02-03 18:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me