WIP: Add schnorrsig batch verification #760

pull jonasnick wants to merge 8 commits into bitcoin-core:master from jonasnick:schnorrsig-batch-verify changing 18 files +893 −24
  1. jonasnick commented at 3:10 pm on June 18, 2020: contributor

    This was part of #558 (for 20 months) to demonstrate the advantages of batch verification (see graph), but then removed to simplify #558 because there are still ongoing discussions:

    • there’s @real-or-random’s proposal to add synthetic randomness for batch verification (https://github.com/sipa/bips/issues/204)
    • batch verification is fairly well tested, but I still wouldn’t be comfortable using it in Bitcoin Core for consensus in its current state, because it relies on parts of the library that are otherwise unused, such as scratch spaces and ecmult_multi. Ideally we would have comprehensive fuzz tests for batch verification.
    • adding chacha20 may not be worth it, because it may only provide a negligible speedup over SHA256 (TODO: test this), plus we’re planning to allow overriding the SHA256 implementation at compile time (https://github.com/bitcoin-core/secp256k1/pull/558#issuecomment-619579991).
  2. jonasnick renamed this:
    Add schnorrsig batch verification
    WIP: Add schnorrsig batch verification
    on Jun 18, 2020
  3. gmaxwell commented at 10:54 pm on July 22, 2020: contributor
    Time to start rebasing on the nearly complete #558?
  4. jonasnick force-pushed on Jul 25, 2020
  5. real-or-random referenced this in commit 8ab24e8dad on Sep 11, 2020
  6. jonasnick force-pushed on Sep 11, 2020
  7. jonasnick commented at 9:50 pm on September 11, 2020: contributor
    rebased on master
  8. jonasnick commented at 9:51 pm on September 11, 2020: contributor
    schnorrsig_sign: min 25.7us / avg 25.8us / max 26.2us
    schnorrsig_verify: min 57.5us / avg 57.7us / max 58.0us
    schnorrsig_batch_verify_1: min 64.0us / avg 64.3us / max 64.7us
    schnorrsig_batch_verify_2: min 50.4us / avg 50.7us / max 51.0us
    schnorrsig_batch_verify_4: min 43.8us / avg 43.9us / max 44.1us
    schnorrsig_batch_verify_8: min 40.4us / avg 40.5us / max 40.5us
    schnorrsig_batch_verify_16: min 38.9us / avg 39.0us / max 39.1us
    schnorrsig_batch_verify_32: min 38.2us / avg 38.4us / max 38.7us
    schnorrsig_batch_verify_64: min 37.7us / avg 37.8us / max 37.9us
    schnorrsig_batch_verify_128: min 35.2us / avg 35.3us / max 35.3us
    schnorrsig_batch_verify_256: min 31.9us / avg 32.0us / max 32.1us
    schnorrsig_batch_verify_512: min 29.2us / avg 29.4us / max 29.7us
    schnorrsig_batch_verify_1024: min 27.5us / avg 27.5us / max 27.5us
    schnorrsig_batch_verify_2048: min 25.8us / avg 25.9us / max 26.0us
    schnorrsig_batch_verify_4096: min 24.5us / avg 24.7us / max 24.8us
    schnorrsig_batch_verify_8192: min 23.5us / avg 23.5us / max 23.6us
    
  9. sipa commented at 10:16 pm on September 11, 2020: contributor

    It’s a bit unfortunate that this API doesn’t really lend itself to cleanly supporting combined batches of BIP340 signature and taproot tweaks (which also need an EC multiplication).

    Given that this internally builds a batch object anyway, would it be reasonable to have that in the external API as well? So an idea could be that you:

    • Construct an (opaque) batch object
    • Add BIP340 verifications to it, using a variant of secp256k1_schnorrsig_verify that either fails immediately (parsing/decompression failures), or succeeds when the check was added to a batch object.
    • Add tweak checks to it, using a variant of secp256k1_xonly_pubkey_tweak_add_check.
    • In the end, a batch_verify function can be called on the batch to do all checks, and return true or false.
  10. elichai commented at 10:49 am on September 12, 2020: contributor

    It’s a bit unfortunate that this API doesn’t really lend itself to cleanly supporting combined batches of BIP340 signature and taproot tweaks (which also need an EC multiplication).

    Given that this internally builds a batch object anyway, would it be reasonable to have that in the external API as well? So an idea could be that you:

    * Construct an (opaque) batch object
    
    * Add BIP340 verifications to it, using a variant of `secp256k1_schnorrsig_verify` that either fails immediately (parsing/decompression failures), or succeeds when the check was added to a batch object.
    
    * Add tweak checks to it, using a variant of `secp256k1_xonly_pubkey_tweak_add_check`.
    
    * In the end, a batch_verify function can be called on the batch to do all checks, and return true or false.
    

    OoO I like that construction; it allows for lazy batching and verifying only when you’re ready, which is also very useful for non-bitcoin applications that verify things periodically when they have spare CPU time.

  11. jonasnick commented at 11:30 am on September 12, 2020: contributor
    Sounds like a reasonable plan, in particular because it would be easy to add functions that manipulate the batch object for other schemes that need an EC mult at the end of verification. The current batch object only holds pointers to the elements, so care must be taken to ensure that they still exist at batch_verify time if this becomes a multi-step process.
  12. gmaxwell commented at 7:18 pm on September 12, 2020: contributor

    I think if it can be avoided it would be best to minimize holding pointers to caller provided objects, except in narrow cases (e.g. scratch)… lifetime management is hard for everyone.

    An alternative might be to have a function that takes a sigs count and pointers to arrays of pubkeys/signatures/messagehashes, then a taproot count, and arrays for those. Less generic, but it would avoid needing to copy the inputs into library-provided memory or retain pointers to caller-provided objects.

  13. sipa commented at 1:24 am on September 14, 2020: contributor

    @gmaxwell The alternative is probably that the caller is going to do the copying into some batch object on their side instead, so I don’t think it’s that much of a difference.

    I think having the batch object have its own storage is probably better. That may mean that the caller should be able to select a maximum size (and once exceeded, transparently run validation of the already-provided batch?)

  14. gmaxwell commented at 2:15 am on September 14, 2020: contributor
    Sounds fine to me, though I hope it doesn’t need 2x the memory to store both the input and the intermediate work. :)
  15. elichai commented at 8:19 am on September 14, 2020: contributor

    I think having the batch object have its own storage is probably better. That may mean that the caller should be able to select a maximum size (and once exceeded, transparently run validation of the already-provided batch?)

    I agree, but I’m somewhat worried about how: this will probably require the caller to know the approximate size of the batch (or the number of sigs/tweaks) when starting the batch. I’d love to see if there’s some creative C API we can come up with.

  16. sipa commented at 8:30 am on September 14, 2020: contributor

    @elichai No, I mean the opposite!

    The caller shouldn’t need to predict how large the batch will become - if they knew that, they wouldn’t need it, as they could just choose to stop after a certain size instead.

    What I mean is that the caller gets to set a maximum memory usage limit, and when that limit would be exceeded, adding another entry to the batch just causes the batch validation to run on what was added so far - and remember the outcome of that.

  17. gmaxwell commented at 8:37 am on September 14, 2020: contributor
    If what it processed so far failed, all further calls can be super fast because it’s just going to return a fail. :P
  18. sipa commented at 8:49 am on September 14, 2020: contributor
    Taking short circuit evaluation of && to a next level.
  19. elichai commented at 8:52 am on September 14, 2020: contributor

    (and once exceeded, transparently run validation of the already-provided batch?)

    I like that :) It gives the caller a tradeoff between memory and CPU without crippling them if they mispredicted the max size.

  20. in src/scalar_8x32_impl.h:823 in a91510405c outdated
    820+        r2->d[1] = x14;
    821+        r2->d[0] = x15;
    822+
    823+        over1 = secp256k1_scalar_check_overflow(r1);
    824+        over2 = secp256k1_scalar_check_overflow(r2);
    825+        over_count++;
    


    roconnor-blockstream commented at 4:30 pm on March 9, 2021:
    BIP-0340 says we should also repeat if r1 or r2 are zero.
  21. in src/scalar_low_impl.h:141 in a91510405c outdated
    121@@ -122,4 +122,9 @@ static SECP256K1_INLINE void secp256k1_scalar_cmov(secp256k1_scalar *r, const se
    122     *r = (*r & mask0) | (*a & mask1);
    123 }
    124 
    125+SECP256K1_INLINE static void secp256k1_scalar_chacha20(secp256k1_scalar *r1, secp256k1_scalar *r2, const unsigned char *seed, uint64_t n) {
    126+    *r1 = (seed[0] + n) % EXHAUSTIVE_TEST_ORDER;
    127+    *r2 = (seed[1] + n) % EXHAUSTIVE_TEST_ORDER;
    


    roconnor-blockstream commented at 4:31 pm on March 9, 2021:
    Perhaps the same goes for here. BIP-0340 says that 0 should be excluded.
  22. elichai commented at 9:50 am on March 29, 2021: contributor

    How do people feel about the following API:

    int secp256k1_start_batch_size(size_t ops);
    secp256k1_batch* secp256k1_start_batch(const secp256k1_context* ctx, secp256k1_scratch_space* scratch);
    int secp256k1_batch_add_sig(ctx, batch, sig, msg, pubkey);
    int secp256k1_batch_add_xpubkey_tweak_add_check(ctx, batch, parity, tweaked_pubkey, pubkey, tweak);
    int secp256k1_batch_verify(ctx, batch);
    

    All the add functions for secp256k1_batch will use something like that:

    if (batch.len == batch.scratch_capacity) {
        if (batch.failed) { return; }
        batch.failed = !secp256k1_batch_verify(ctx, batch);
        // clear the rest of the state
    }
    // add to batch
    batch.len++;
    return;
    

    (all the names are subject to bikeshedding)

  23. jonasnick commented at 1:44 pm on March 29, 2021: contributor
    I like this idea of batch verifying in an add function if the scratch space is full. It’ll need quite a bit of refactoring in ecmult_multi to separate out scratch space allocation. @elichai that matches my understanding of the approach and looks good to me. What does secp256k1_start_batch_size do?
  24. roconnor-blockstream commented at 1:56 pm on March 29, 2021: contributor

    I’m starting to think that ecmult_multi_var is slightly too narrow an interface to be used for batch verification. Currently it does a “Multi-multiply: R = inp_g_sc * G + sum_i ni * Ai.” But what I think we want is one that does “Multi-multiply: R = (sum_i gi) * G + sum_i ni * Ai.” so that we can stream a series of equations to be batch verified without needing to add up all the G coefficients in advance.

    We have (attempted) this nice streamable API for ecmult_multi_var, but what’s the point of it if we just have to allocate a new buffer for all the inputs upfront?
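
    For reference, this is where the sum of G coefficients comes from in the BIP-340 case. The equation follows BIP-340's batch verification; reading a_i s_i as the g_i above is an interpretation, not something stated in the thread.

```latex
% Single BIP-340 check for signature (R_i, s_i), key P_i, message m_i,
% with e_i the challenge hash of (R_i, P_i, m_i):
%   s_i G = R_i + e_i P_i
% Batch of u signatures with random nonzero coefficients a_i (a_1 = 1):
\[
  \Bigl(\sum_{i=1}^{u} a_i s_i\Bigr) G
  \;=\; \sum_{i=1}^{u} a_i R_i \;+\; \sum_{i=1}^{u} (a_i e_i)\, P_i
\]
% Each signature contributes g_i = a_i s_i to the coefficient of G, so the
% G term has the form (\sum_i g_i) G and could be accumulated per entry.
```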

  25. elichai commented at 2:00 pm on March 29, 2021: contributor

    What does secp256k1_start_batch_size do?

    Tells you the size of the scratch space required for the amount of signatures/tweaks you want to batch

  26. roconnor-blockstream commented at 2:06 pm on March 29, 2021: contributor

    Barring such an enhanced ecmult_multi_var interface I would propose the following API for batch verification:

    typedef int (secp256k1_batch_verify_gi_callback)(secp256k1_scalar *gi, size_t idx, void *data);
    typedef int (secp256k1_batch_verify_callback)(secp256k1_scalar *na, secp256k1_scalar *nb, secp256k1_ge *pta, secp256k1_ge *ptb, size_t idx, void *data);

    /* Verifies na_i*A_i = nb_i*B_i + ng_i * G for all i < n (with high probability). */
    static int secp256k1_batch_verify(ctx, scratch, secp256k1_batch_verify_gi_callback cb_gi, secp256k1_batch_verify_callback  cb, void *cbdata, size_t n);

    secp256k1_batch_data_gi_from_sig(secp256k1_scalar *gi, sig);
    secp256k1_batch_data_from_sig(secp256k1_scalar *na, secp256k1_scalar *nb, secp256k1_ge *pta, secp256k1_ge *ptb, sig, msg, pubkey);

    secp256k1_batch_data_gi_from_xpubkey_tweak(secp256k1_scalar *gi);
    secp256k1_batch_data_from_xpubkey_tweak(secp256k1_scalar *na, secp256k1_scalar *nb, secp256k1_ge *pta, secp256k1_ge *ptb, parity, tweaked_pubkey, pubkey, tweak);
    

    Edit: There are a couple of possible variations here. We could drop the na scalar values, and instead verify A_i = nb_i*B_i + ng_i * G (though I think adding the na is fine as it comes nearly for free). We could also rearrange the verification equation to verify 0 = na_i*A_i + nb_i*B_i + ng_i * G or 0 = A_i + nb_i*B_i + ng_i * G. I don’t have any strong feelings about these variants.

  27. roconnor-blockstream commented at 3:46 pm on March 29, 2021: contributor
    My proposal was based on the idea that batch_verify must use secp256k1_ecmult_multi_var, but this line of thinking was wrong. batch_verify can call secp256k1_ecmult_pippenger_wnaf and friends directly. I withdraw my proposal until I give things more consideration.
  28. sipa commented at 4:01 pm on March 29, 2021: contributor

    @roconnor-blockstream I don’t see what the issue is with the multi-multiplication interface. The batch interface can do the aggregation of scalars before calling the multi-multiplication code.

    I also don’t think we should be exposing a public interface for arbitrary EC operations/verifications. This library aims for a high-level interface of protocols.

  29. roconnor-blockstream commented at 6:07 pm on March 29, 2021: contributor

    The issue is that if everything is done naively the following happens:

    1. Batch verification allocates a buffer, or it is allocated by the caller.
    2. Batch verification runs point decompression on the data from the signatures and tweaks, filling the buffer until it is full, copying this data from their own working copy of signature and tweak data.
    3. A chacha seed is computed by scanning this buffered data (note that the BIP-340 specification for the chacha seed as written doesn’t support mixing tweaks with signature data, so some liberty must be taken here).
    4. The buffered data is scanned again to compute the ng scalar value to be passed to ecmult_multi.
    5. ecmult_multi is called with this ng scalar value, the buffered data, and a custom callback to lookup points and scalars from this buffered data, and multiply it by the appropriate chacha coefficient.
    6. ecmult_multi calls either secp256k1_ecmult_pippenger_batch and/or secp256k1_ecmult_strauss_batch
    7. In either case yet another buffer is allocated, to hold yet another copy of the points which is filled in by calling the callback which simply copies from the previous allocated buffer.
    8. The result of ecmult_multi is tested for infinity.

    This naive approach involves three copies of an entire batch of points in working memory simultaneously:

    1. Group elements in compressed form from signature data, and public keys, and tap-tweak public keys that the user is starting from.
    2. a copy with decompressed points for the naive batch validation implementation itself (an alternative implementation could maybe keep this copy of points compressed, but it still needs to be buffered.)
    3. another copy of decompressed points for either the secp256k1_ecmult_pippenger_batch and/or secp256k1_ecmult_strauss_batch, depending on which one ends up being used.

    Having 3 simultaneous copies of a rather large amount of the same data just to conform to the existing ecmult_multi API doesn’t seem reasonable.

  30. gmaxwell commented at 0:49 am on March 30, 2021: contributor

    I don’t recall what this implementation does but at least at one point batch validation was implemented without duplicated buffering, reusing the scratch space for both the queue and the working space for the multi-exp.

    ecmult_multi is not a public API; it is entirely internal to the library and not exported (all things which are exposed are annotated with SECP256K1_API), and I think half the motivation for how it is particularly structured was so that the wnaf pippenger could be shimmed into a set of existing tests. If its interface needs changes or the layout of the scratch space needs to change to avoid making extra copies, then that is probably a perfectly reasonable thing to do, but it is also a behind-the-scenes optimization that shouldn’t change the public interface.

  31. jonasnick force-pushed on Mar 30, 2021
  32. jonasnick commented at 9:00 pm on March 30, 2021: contributor

    Rebased in order to benchmark with safegcd. Pre-rebase (EDIT: without endo):

    $ ./bench_schnorrsig
    schnorrsig_sign: min 28.0us / avg 28.4us / max 29.0us
    schnorrsig_verify: min 63.1us / avg 64.1us / max 64.9us
    schnorrsig_batch_verify_1: min 70.7us / avg 71.1us / max 71.5us
    schnorrsig_batch_verify_2: min 55.5us / avg 56.1us / max 56.5us
    schnorrsig_batch_verify_4: min 47.6us / avg 48.5us / max 48.9us
    schnorrsig_batch_verify_8: min 44.3us / avg 44.4us / max 44.7us
    schnorrsig_batch_verify_16: min 42.6us / avg 43.0us / max 43.4us
    schnorrsig_batch_verify_32: min 42.2us / avg 42.4us / max 42.6us
    schnorrsig_batch_verify_64: min 42.0us / avg 42.1us / max 42.2us
    schnorrsig_batch_verify_128: min 38.4us / avg 39.0us / max 39.4us
    schnorrsig_batch_verify_256: min 35.1us / avg 35.5us / max 35.8us
    schnorrsig_batch_verify_512: min 32.1us / avg 32.4us / max 33.0us
    schnorrsig_batch_verify_1024: min 30.2us / avg 30.4us / max 30.7us
    schnorrsig_batch_verify_2048: min 28.4us / avg 28.7us / max 28.8us
    schnorrsig_batch_verify_4096: min 27.4us / avg 27.4us / max 27.5us
    schnorrsig_batch_verify_8192: min 26.5us / avg 26.8us / max 27.0us
    

    Post-rebase:

    $ ./bench_schnorrsig
    schnorrsig_sign: min 26.6us / avg 27.2us / max 27.8us
    schnorrsig_verify: min 46.4us / avg 47.4us / max 48.6us
    schnorrsig_batch_verify_1: min 55.1us / avg 57.0us / max 58.4us
    schnorrsig_batch_verify_2: min 47.9us / avg 49.4us / max 52.0us
    schnorrsig_batch_verify_4: min 44.8us / avg 45.1us / max 45.6us
    schnorrsig_batch_verify_8: min 42.3us / avg 43.4us / max 44.9us
    schnorrsig_batch_verify_16: min 42.2us / avg 42.7us / max 43.0us
    schnorrsig_batch_verify_32: min 42.2us / avg 42.4us / max 42.5us
    schnorrsig_batch_verify_64: min 39.8us / avg 40.3us / max 41.2us
    schnorrsig_batch_verify_128: min 36.7us / avg 37.2us / max 38.1us
    schnorrsig_batch_verify_256: min 32.9us / avg 33.6us / max 34.7us
    schnorrsig_batch_verify_512: min 30.9us / avg 31.4us / max 32.0us
    schnorrsig_batch_verify_1024: min 29.2us / avg 29.5us / max 29.8us
    schnorrsig_batch_verify_2048: min 27.9us / avg 28.1us / max 28.3us
    schnorrsig_batch_verify_4096: min 26.8us / avg 26.9us / max 27.0us
    schnorrsig_batch_verify_8192: min 26.6us / avg 26.9us / max 27.2us
    
  33. jonasnick commented at 10:23 pm on March 30, 2021: contributor

    As @elichai noted on IRC, this is an unfair comparison because the pre-rebase benchmark was without endomorphism. So here’s pre-rebase with endo enabled:

    $ ./bench_schnorrsig
    schnorrsig_sign: min 28.3us / avg 28.6us / max 28.9us
    schnorrsig_verify: min 45.2us / avg 45.7us / max 46.4us
    schnorrsig_batch_verify_1: min 50.3us / avg 50.7us / max 51.2us
    schnorrsig_batch_verify_2: min 46.3us / avg 46.7us / max 47.0us
    schnorrsig_batch_verify_4: min 42.9us / avg 43.2us / max 43.5us
    schnorrsig_batch_verify_8: min 41.5us / avg 41.6us / max 41.9us
    schnorrsig_batch_verify_16: min 41.8us / avg 41.9us / max 42.0us
    schnorrsig_batch_verify_32: min 41.4us / avg 41.6us / max 41.7us
    schnorrsig_batch_verify_64: min 38.9us / avg 39.2us / max 39.6us
    schnorrsig_batch_verify_128: min 35.7us / avg 35.7us / max 35.8us
    schnorrsig_batch_verify_256: min 32.4us / avg 32.9us / max 33.7us
    schnorrsig_batch_verify_512: min 30.5us / avg 30.6us / max 30.7us
    schnorrsig_batch_verify_1024: min 28.5us / avg 28.6us / max 28.7us
    schnorrsig_batch_verify_2048: min 27.1us / avg 27.3us / max 27.6us
    schnorrsig_batch_verify_4096: min 25.9us / avg 26.4us / max 26.6us
    schnorrsig_batch_verify_8192: min 26.1us / avg 26.2us / max 26.3us
    

    EDIT: I cannot explain this performance regression right now; here’s the pre-rebase branch I’ve used.

  34. sipa commented at 5:11 pm on March 31, 2021: contributor

    I can’t reproduce those benchmark results.

    All numbers on AMD Ryzen Threadripper 2950X 16-Core Processor, GCC 10.2.1.

    old pre-safegcd branch with endo enabled and gmp enabled:

    schnorrsig_sign: min 29.2us / avg 29.4us / max 30.0us
    schnorrsig_verify: min 48.4us / avg 48.7us / max 49.2us
    schnorrsig_batch_verify_1: min 55.0us / avg 55.2us / max 55.6us
    schnorrsig_batch_verify_2: min 50.3us / avg 50.4us / max 50.4us
    schnorrsig_batch_verify_4: min 46.7us / avg 46.8us / max 46.9us
    schnorrsig_batch_verify_8: min 44.6us / avg 44.7us / max 44.7us
    schnorrsig_batch_verify_16: min 44.2us / avg 44.3us / max 44.5us
    schnorrsig_batch_verify_32: min 43.5us / avg 43.6us / max 43.6us
    schnorrsig_batch_verify_64: min 41.1us / avg 41.1us / max 41.2us
    schnorrsig_batch_verify_128: min 37.8us / avg 37.8us / max 37.9us
    schnorrsig_batch_verify_256: min 33.9us / avg 34.2us / max 34.4us
    schnorrsig_batch_verify_512: min 32.0us / avg 32.0us / max 32.0us
    schnorrsig_batch_verify_1024: min 29.8us / avg 29.9us / max 30.0us
    schnorrsig_batch_verify_2048: min 28.3us / avg 28.4us / max 28.4us
    schnorrsig_batch_verify_4096: min 27.1us / avg 27.2us / max 27.3us
    schnorrsig_batch_verify_8192: min 27.1us / avg 27.2us / max 27.3us
    

    old pre-safegcd branch with endo enabled and gmp disabled:

    schnorrsig_sign: min 29.1us / avg 29.2us / max 29.4us
    schnorrsig_verify: min 52.2us / avg 52.4us / max 52.8us
    schnorrsig_batch_verify_1: min 55.0us / avg 55.2us / max 55.4us
    schnorrsig_batch_verify_2: min 50.6us / avg 50.7us / max 50.9us
    schnorrsig_batch_verify_4: min 47.0us / avg 47.3us / max 47.5us
    schnorrsig_batch_verify_8: min 44.9us / avg 44.9us / max 44.9us
    schnorrsig_batch_verify_16: min 44.8us / avg 45.0us / max 45.4us
    schnorrsig_batch_verify_32: min 43.5us / avg 43.6us / max 43.6us
    schnorrsig_batch_verify_64: min 41.2us / avg 41.3us / max 41.3us
    schnorrsig_batch_verify_128: min 37.8us / avg 37.8us / max 37.9us
    schnorrsig_batch_verify_256: min 34.2us / avg 34.3us / max 34.4us
    schnorrsig_batch_verify_512: min 32.4us / avg 33.1us / max 33.9us
    schnorrsig_batch_verify_1024: min 30.3us / avg 30.4us / max 30.6us
    schnorrsig_batch_verify_2048: min 28.3us / avg 28.4us / max 28.6us
    schnorrsig_batch_verify_4096: min 27.1us / avg 27.3us / max 27.5us
    schnorrsig_batch_verify_8192: min 27.1us / avg 27.2us / max 27.3us
    

    new branch (endo and gmp are gone):

    schnorrsig_sign: min 26.3us / avg 26.5us / max 26.7us
    schnorrsig_verify: min 48.0us / avg 48.3us / max 48.6us
    schnorrsig_batch_verify_1: min 55.1us / avg 55.3us / max 55.3us
    schnorrsig_batch_verify_2: min 50.6us / avg 50.7us / max 50.9us
    schnorrsig_batch_verify_4: min 47.0us / avg 47.2us / max 47.5us
    schnorrsig_batch_verify_8: min 44.6us / avg 44.8us / max 45.0us
    schnorrsig_batch_verify_16: min 44.3us / avg 44.6us / max 44.7us
    schnorrsig_batch_verify_32: min 43.6us / avg 43.7us / max 43.8us
    schnorrsig_batch_verify_64: min 41.2us / avg 41.4us / max 41.6us
    schnorrsig_batch_verify_128: min 37.7us / avg 38.1us / max 38.8us
    schnorrsig_batch_verify_256: min 34.0us / avg 34.2us / max 34.5us
    schnorrsig_batch_verify_512: min 31.9us / avg 32.2us / max 32.4us
    schnorrsig_batch_verify_1024: min 30.1us / avg 30.2us / max 30.3us
    schnorrsig_batch_verify_2048: min 28.3us / avg 28.5us / max 28.7us
    schnorrsig_batch_verify_4096: min 27.2us / avg 27.2us / max 27.3us
    schnorrsig_batch_verify_8192: min 27.2us / avg 27.3us / max 27.5us
    
  35. sipa commented at 5:40 pm on March 31, 2021: contributor

    Similar results with GCC 7.5.0 on the same hardware

    pre-safegcd, with endo, with gmp:

    schnorrsig_sign: min 28.8us / avg 29.0us / max 29.8us
    schnorrsig_verify: min 48.3us / avg 48.6us / max 48.9us
    schnorrsig_batch_verify_1: min 54.6us / avg 54.8us / max 55.1us
    schnorrsig_batch_verify_2: min 50.1us / avg 50.2us / max 50.4us
    schnorrsig_batch_verify_4: min 46.7us / avg 46.8us / max 46.9us
    schnorrsig_batch_verify_8: min 44.3us / avg 44.3us / max 44.3us
    schnorrsig_batch_verify_16: min 44.0us / avg 44.1us / max 44.2us
    schnorrsig_batch_verify_32: min 43.4us / avg 43.8us / max 44.7us
    schnorrsig_batch_verify_64: min 41.1us / avg 41.2us / max 41.4us
    schnorrsig_batch_verify_128: min 37.5us / avg 37.6us / max 37.6us
    schnorrsig_batch_verify_256: min 33.8us / avg 33.9us / max 33.9us
    schnorrsig_batch_verify_512: min 31.8us / avg 31.9us / max 32.0us
    schnorrsig_batch_verify_1024: min 29.7us / avg 29.8us / max 29.9us
    schnorrsig_batch_verify_2048: min 28.2us / avg 28.2us / max 28.3us
    schnorrsig_batch_verify_4096: min 27.0us / avg 27.2us / max 27.3us
    schnorrsig_batch_verify_8192: min 27.0us / avg 27.1us / max 27.2us
    

    pre-safegcd, with endo, without gmp:

    schnorrsig_sign: min 29.0us / avg 29.2us / max 29.6us
    schnorrsig_verify: min 51.9us / avg 52.5us / max 54.2us
    schnorrsig_batch_verify_1: min 54.7us / avg 55.0us / max 55.5us
    schnorrsig_batch_verify_2: min 50.2us / avg 50.4us / max 50.7us
    schnorrsig_batch_verify_4: min 46.7us / avg 46.9us / max 47.1us
    schnorrsig_batch_verify_8: min 44.4us / avg 44.5us / max 44.6us
    schnorrsig_batch_verify_16: min 44.0us / avg 44.1us / max 44.2us
    schnorrsig_batch_verify_32: min 43.3us / avg 43.4us / max 43.5us
    schnorrsig_batch_verify_64: min 40.9us / avg 40.9us / max 41.0us
    schnorrsig_batch_verify_128: min 37.5us / avg 37.5us / max 37.7us
    schnorrsig_batch_verify_256: min 33.9us / avg 34.4us / max 34.8us
    schnorrsig_batch_verify_512: min 31.8us / avg 31.8us / max 32.0us
    schnorrsig_batch_verify_1024: min 29.6us / avg 29.7us / max 29.7us
    schnorrsig_batch_verify_2048: min 28.1us / avg 28.2us / max 28.4us
    schnorrsig_batch_verify_4096: min 26.9us / avg 27.0us / max 27.1us
    schnorrsig_batch_verify_8192: min 27.0us / avg 27.2us / max 27.3us
    

    post-safegcd:

    schnorrsig_sign: min 25.9us / avg 26.0us / max 26.3us
    schnorrsig_verify: min 47.9us / avg 48.1us / max 48.4us
    schnorrsig_batch_verify_1: min 54.7us / avg 54.9us / max 55.0us
    schnorrsig_batch_verify_2: min 50.2us / avg 50.4us / max 50.6us
    schnorrsig_batch_verify_4: min 46.8us / avg 48.2us / max 50.6us
    schnorrsig_batch_verify_8: min 44.5us / avg 45.1us / max 45.5us
    schnorrsig_batch_verify_16: min 44.0us / avg 44.2us / max 44.5us
    schnorrsig_batch_verify_32: min 43.6us / avg 43.6us / max 43.7us
    schnorrsig_batch_verify_64: min 40.9us / avg 41.0us / max 41.2us
    schnorrsig_batch_verify_128: min 37.6us / avg 37.9us / max 38.4us
    schnorrsig_batch_verify_256: min 33.7us / avg 33.8us / max 34.1us
    schnorrsig_batch_verify_512: min 31.8us / avg 32.1us / max 32.4us
    schnorrsig_batch_verify_1024: min 29.6us / avg 29.6us / max 29.7us
    schnorrsig_batch_verify_2048: min 28.1us / avg 28.2us / max 28.2us
    schnorrsig_batch_verify_4096: min 27.0us / avg 27.3us / max 27.7us
    schnorrsig_batch_verify_8192: min 27.1us / avg 27.2us / max 27.5us
    
  36. sipa commented at 9:38 pm on March 31, 2021: contributor

    Did benchmarks on an i7-7820HQ CPU with the clock fixed at 2.6 GHz.

    I do indeed observe a small regression on some GCC versions (7,8,10), but on clang it appears to go the other way around. I don’t think there is much reason for concern here - we know there are variations in performance between compiler versions, and it’s to be expected that different code will affect different compilers differently:

    pre-safegcd ENDO=on GMP=off CC=gcc-7
    schnorrsig_sign: min 38.5us / avg 38.5us / max 38.6us
    schnorrsig_verify: min 66.4us / avg 66.6us / max 67.1us
    schnorrsig_batch_verify_1: min 70.2us / avg 70.3us / max 70.5us
    schnorrsig_batch_verify_2: min 64.0us / avg 64.0us / max 64.1us
    schnorrsig_batch_verify_4: min 59.6us / avg 59.6us / max 59.7us
    schnorrsig_batch_verify_8: min 57.2us / avg 57.2us / max 57.3us
    schnorrsig_batch_verify_16: min 57.5us / avg 57.6us / max 57.6us
    schnorrsig_batch_verify_32: min 57.1us / avg 57.3us / max 57.5us
    schnorrsig_batch_verify_64: min 53.5us / avg 53.6us / max 53.6us
    schnorrsig_batch_verify_128: min 49.1us / avg 49.1us / max 49.2us
    schnorrsig_batch_verify_256: min 44.4us / avg 44.5us / max 44.5us
    schnorrsig_batch_verify_512: min 42.1us / avg 42.1us / max 42.2us
    schnorrsig_batch_verify_1024: min 39.3us / avg 39.4us / max 39.4us
    schnorrsig_batch_verify_2048: min 37.4us / avg 37.5us / max 37.6us
    schnorrsig_batch_verify_4096: min 36.0us / avg 36.0us / max 36.2us
    schnorrsig_batch_verify_8192: min 36.0us / avg 36.0us / max 36.1us

    post-safegcd CC=gcc-7
    schnorrsig_sign: min 35.3us / avg 35.3us / max 35.5us
    schnorrsig_verify: min 61.9us / avg 62.0us / max 62.3us
    schnorrsig_batch_verify_1: min 70.6us / avg 70.7us / max 70.8us
    schnorrsig_batch_verify_2: min 64.3us / avg 64.3us / max 64.4us
    schnorrsig_batch_verify_4: min 59.7us / avg 59.8us / max 59.9us
    schnorrsig_batch_verify_8: min 57.2us / avg 57.3us / max 57.4us
    schnorrsig_batch_verify_16: min 57.7us / avg 57.7us / max 57.8us
    schnorrsig_batch_verify_32: min 57.3us / avg 57.4us / max 57.5us
    schnorrsig_batch_verify_64: min 53.6us / avg 53.7us / max 53.7us
    schnorrsig_batch_verify_128: min 49.2us / avg 49.2us / max 49.3us
    schnorrsig_batch_verify_256: min 44.5us / avg 44.5us / max 44.6us
    schnorrsig_batch_verify_512: min 42.1us / avg 42.2us / max 42.3us
    schnorrsig_batch_verify_1024: min 39.4us / avg 39.4us / max 39.5us
    schnorrsig_batch_verify_2048: min 37.5us / avg 37.6us / max 37.7us
    schnorrsig_batch_verify_4096: min 36.0us / avg 36.1us / max 36.2us
    schnorrsig_batch_verify_8192: min 36.1us / avg 36.1us / max 36.2us


    pre-safegcd ENDO=on GMP=off CC=gcc-8
    schnorrsig_sign: min 38.3us / avg 38.4us / max 38.5us
    schnorrsig_verify: min 66.7us / avg 66.8us / max 67.0us
    schnorrsig_batch_verify_1: min 70.5us / avg 70.6us / max 70.7us
    schnorrsig_batch_verify_2: min 64.3us / avg 64.4us / max 64.5us
    schnorrsig_batch_verify_4: min 59.8us / avg 59.9us / max 60.0us
    schnorrsig_batch_verify_8: min 57.4us / avg 57.5us / max 57.6us
    schnorrsig_batch_verify_16: min 57.9us / avg 58.0us / max 58.1us
    schnorrsig_batch_verify_32: min 57.4us / avg 57.5us / max 57.6us
    schnorrsig_batch_verify_64: min 53.8us / avg 53.9us / max 53.9us
    schnorrsig_batch_verify_128: min 49.3us / avg 49.4us / max 49.5us
    schnorrsig_batch_verify_256: min 44.6us / avg 44.7us / max 44.7us
    schnorrsig_batch_verify_512: min 42.3us / avg 42.3us / max 42.4us
    schnorrsig_batch_verify_1024: min 39.5us / avg 39.5us / max 39.6us
    schnorrsig_batch_verify_2048: min 37.6us / avg 37.7us / max 37.8us
    schnorrsig_batch_verify_4096: min 36.1us / avg 36.2us / max 36.4us
    schnorrsig_batch_verify_8192: min 36.2us / avg 36.3us / max 36.3us

    post-safegcd CC=gcc-8
    schnorrsig_sign: min 35.0us / avg 35.1us / max 35.2us
    schnorrsig_verify: min 62.0us / avg 62.1us / max 62.6us
    schnorrsig_batch_verify_1: min 70.7us / avg 70.7us / max 70.7us
    schnorrsig_batch_verify_2: min 64.3us / avg 64.4us / max 64.4us
    schnorrsig_batch_verify_4: min 59.8us / avg 59.9us / max 60.1us
    schnorrsig_batch_verify_8: min 57.4us / avg 57.5us / max 57.5us
    schnorrsig_batch_verify_16: min 57.9us / avg 57.9us / max 57.9us
    schnorrsig_batch_verify_32: min 57.5us / avg 57.6us / max 57.6us
    schnorrsig_batch_verify_64: min 53.8us / avg 53.8us / max 53.9us
    schnorrsig_batch_verify_128: min 49.3us / avg 49.3us / max 49.4us
    schnorrsig_batch_verify_256: min 44.6us / avg 44.7us / max 44.7us
    schnorrsig_batch_verify_512: min 42.3us / avg 42.3us / max 42.3us
    schnorrsig_batch_verify_1024: min 39.5us / avg 39.5us / max 39.6us
    schnorrsig_batch_verify_2048: min 37.6us / avg 37.6us / max 37.7us
    schnorrsig_batch_verify_4096: min 36.1us / avg 36.2us / max 36.3us
    schnorrsig_batch_verify_8192: min 36.2us / avg 36.2us / max 36.2us


    pre-safegcd ENDO=on GMP=off CC=gcc-9
    schnorrsig_sign: min 38.4us / avg 38.4us / max 38.6us
    schnorrsig_verify: min 66.8us / avg 66.9us / max 67.1us
    schnorrsig_batch_verify_1: min 70.6us / avg 70.6us / max 70.7us
    schnorrsig_batch_verify_2: min 64.5us / avg 64.5us / max 64.6us
    schnorrsig_batch_verify_4: min 59.8us / avg 59.8us / max 59.9us
    schnorrsig_batch_verify_8: min 57.4us / avg 57.5us / max 57.5us
    schnorrsig_batch_verify_16: min 57.9us / avg 57.9us / max 58.0us
    schnorrsig_batch_verify_32: min 57.5us / avg 57.5us / max 57.6us
    schnorrsig_batch_verify_64: min 53.7us / avg 53.7us / max 53.8us
    schnorrsig_batch_verify_128: min 49.3us / avg 49.3us / max 49.3us
     85schnorrsig_batch_verify_256: min 44.6us / avg 44.6us / max 44.6us
     86schnorrsig_batch_verify_512: min 42.2us / avg 42.2us / max 42.2us
     87schnorrsig_batch_verify_1024: min 39.4us / avg 39.4us / max 39.5us
     88schnorrsig_batch_verify_2048: min 37.5us / avg 37.6us / max 37.6us
     89schnorrsig_batch_verify_4096: min 36.1us / avg 36.2us / max 36.3us
     90schnorrsig_batch_verify_8192: min 36.2us / avg 36.2us / max 36.2us
     91
     92post-safegcd CC=gcc-9
     93schnorrsig_sign: min 35.0us / avg 35.0us / max 35.2us
     94schnorrsig_verify: min 62.1us / avg 62.2us / max 62.9us
     95schnorrsig_batch_verify_1: min 70.6us / avg 70.7us / max 70.9us
     96schnorrsig_batch_verify_2: min 64.2us / avg 64.2us / max 64.3us
     97schnorrsig_batch_verify_4: min 59.6us / avg 59.6us / max 59.7us
     98schnorrsig_batch_verify_8: min 57.3us / avg 57.5us / max 57.7us
     99schnorrsig_batch_verify_16: min 57.9us / avg 57.9us / max 58.0us
    100schnorrsig_batch_verify_32: min 57.4us / avg 57.5us / max 57.5us
    101schnorrsig_batch_verify_64: min 53.9us / avg 53.9us / max 53.9us
    102schnorrsig_batch_verify_128: min 49.3us / avg 49.3us / max 49.4us
    103schnorrsig_batch_verify_256: min 44.6us / avg 44.6us / max 44.6us
    104schnorrsig_batch_verify_512: min 42.2us / avg 42.2us / max 42.2us
    105schnorrsig_batch_verify_1024: min 39.4us / avg 39.4us / max 39.5us
    106schnorrsig_batch_verify_2048: min 37.5us / avg 37.6us / max 37.7us
    107schnorrsig_batch_verify_4096: min 36.1us / avg 36.1us / max 36.2us
    108schnorrsig_batch_verify_8192: min 36.1us / avg 36.1us / max 36.2us
    109
    110
    111pre-safegcd ENDO=on GMP=off CC=gcc-10
    112schnorrsig_sign: min 39.3us / avg 39.4us / max 39.5us
    113schnorrsig_verify: min 66.6us / avg 66.6us / max 66.8us
    114schnorrsig_batch_verify_1: min 70.3us / avg 70.3us / max 70.4us
    115schnorrsig_batch_verify_2: min 64.2us / avg 64.2us / max 64.2us
    116schnorrsig_batch_verify_4: min 59.8us / avg 59.8us / max 59.8us
    117schnorrsig_batch_verify_8: min 57.3us / avg 57.3us / max 57.3us
    118schnorrsig_batch_verify_16: min 57.7us / avg 57.7us / max 57.8us
    119schnorrsig_batch_verify_32: min 57.3us / avg 57.3us / max 57.4us
    120schnorrsig_batch_verify_64: min 53.7us / avg 53.8us / max 53.8us
    121schnorrsig_batch_verify_128: min 49.4us / avg 49.4us / max 49.4us
    122schnorrsig_batch_verify_256: min 44.6us / avg 44.6us / max 44.7us
    123schnorrsig_batch_verify_512: min 42.2us / avg 42.3us / max 42.4us
    124schnorrsig_batch_verify_1024: min 39.4us / avg 39.5us / max 39.6us
    125schnorrsig_batch_verify_2048: min 37.6us / avg 37.6us / max 37.7us
    126schnorrsig_batch_verify_4096: min 36.1us / avg 36.2us / max 36.3us
    127schnorrsig_batch_verify_8192: min 36.2us / avg 36.2us / max 36.3us
    128
    129post-safegcd CC=gcc-10
    130schnorrsig_sign: min 35.9us / avg 35.9us / max 36.2us
    131schnorrsig_verify: min 61.9us / avg 61.9us / max 62.1us
    132schnorrsig_batch_verify_1: min 70.5us / avg 70.5us / max 70.5us
    133schnorrsig_batch_verify_2: min 64.4us / avg 64.4us / max 64.4us
    134schnorrsig_batch_verify_4: min 60.1us / avg 60.1us / max 60.2us
    135schnorrsig_batch_verify_8: min 57.7us / avg 57.7us / max 57.7us
    136schnorrsig_batch_verify_16: min 57.8us / avg 57.9us / max 57.9us
    137schnorrsig_batch_verify_32: min 57.4us / avg 57.4us / max 57.5us
    138schnorrsig_batch_verify_64: min 53.7us / avg 53.7us / max 53.8us
    139schnorrsig_batch_verify_128: min 49.3us / avg 49.3us / max 49.4us
    140schnorrsig_batch_verify_256: min 44.6us / avg 44.6us / max 44.6us
    141schnorrsig_batch_verify_512: min 42.2us / avg 42.2us / max 42.3us
    142schnorrsig_batch_verify_1024: min 39.4us / avg 39.5us / max 39.5us
    143schnorrsig_batch_verify_2048: min 37.6us / avg 37.6us / max 37.7us
    144schnorrsig_batch_verify_4096: min 36.1us / avg 36.3us / max 36.5us
    145schnorrsig_batch_verify_8192: min 36.2us / avg 36.2us / max 36.2us
    146
    147
    148pre-safegcd ENDO=on GMP=off CC=clang-8
    149schnorrsig_sign: min 35.8us / avg 35.9us / max 36.1us
    150schnorrsig_verify: min 66.4us / avg 66.4us / max 66.6us
    151schnorrsig_batch_verify_1: min 70.6us / avg 70.7us / max 70.7us
    152schnorrsig_batch_verify_2: min 63.6us / avg 63.7us / max 63.8us
    153schnorrsig_batch_verify_4: min 58.8us / avg 58.8us / max 58.8us
    154schnorrsig_batch_verify_8: min 56.3us / avg 56.4us / max 56.4us
    155schnorrsig_batch_verify_16: min 56.6us / avg 56.7us / max 56.9us
    156schnorrsig_batch_verify_32: min 56.5us / avg 56.6us / max 56.6us
    157schnorrsig_batch_verify_64: min 52.5us / avg 52.5us / max 52.6us
    158schnorrsig_batch_verify_128: min 48.1us / avg 48.2us / max 48.2us
    159schnorrsig_batch_verify_256: min 43.6us / avg 43.6us / max 43.7us
    160schnorrsig_batch_verify_512: min 41.3us / avg 41.4us / max 41.4us
    161schnorrsig_batch_verify_1024: min 38.6us / avg 38.6us / max 38.7us
    162schnorrsig_batch_verify_2048: min 36.8us / avg 36.9us / max 36.9us
    163schnorrsig_batch_verify_4096: min 35.4us / avg 35.5us / max 35.6us
    164schnorrsig_batch_verify_8192: min 35.4us / avg 35.5us / max 35.5us
    165
    166post-safegcd CC=clang-8
    167schnorrsig_sign: min 32.5us / avg 32.5us / max 32.6us
    168schnorrsig_verify: min 61.6us / avg 61.7us / max 62.3us
    169schnorrsig_batch_verify_1: min 69.8us / avg 69.8us / max 69.9us
    170schnorrsig_batch_verify_2: min 63.0us / avg 63.1us / max 63.1us
    171schnorrsig_batch_verify_4: min 58.3us / avg 58.3us / max 58.4us
    172schnorrsig_batch_verify_8: min 55.9us / avg 55.9us / max 56.0us
    173schnorrsig_batch_verify_16: min 56.4us / avg 56.4us / max 56.5us
    174schnorrsig_batch_verify_32: min 56.4us / avg 56.4us / max 56.5us
    175schnorrsig_batch_verify_64: min 52.6us / avg 52.7us / max 52.7us
    176schnorrsig_batch_verify_128: min 48.3us / avg 48.3us / max 48.3us
    177schnorrsig_batch_verify_256: min 43.7us / avg 43.8us / max 43.8us
    178schnorrsig_batch_verify_512: min 41.4us / avg 41.5us / max 41.5us
    179schnorrsig_batch_verify_1024: min 38.7us / avg 38.7us / max 38.8us
    180schnorrsig_batch_verify_2048: min 36.9us / avg 37.0us / max 37.1us
    181schnorrsig_batch_verify_4096: min 35.4us / avg 35.5us / max 35.6us
    182schnorrsig_batch_verify_8192: min 35.4us / avg 35.5us / max 35.5us
    183
    184
    185pre-safegcd ENDO=on GMP=off CC=clang-9
    186schnorrsig_sign: min 37.8us / avg 37.9us / max 38.5us
    187schnorrsig_verify: min 66.5us / avg 66.6us / max 67.4us
    188schnorrsig_batch_verify_1: min 70.0us / avg 70.1us / max 70.1us
    189schnorrsig_batch_verify_2: min 63.1us / avg 63.2us / max 63.3us
    190schnorrsig_batch_verify_4: min 58.4us / avg 58.5us / max 58.6us
    191schnorrsig_batch_verify_8: min 55.6us / avg 55.7us / max 55.9us
    192schnorrsig_batch_verify_16: min 56.1us / avg 56.2us / max 56.3us
    193schnorrsig_batch_verify_32: min 56.2us / avg 56.2us / max 56.3us
    194schnorrsig_batch_verify_64: min 52.5us / avg 52.5us / max 52.6us
    195schnorrsig_batch_verify_128: min 48.2us / avg 48.2us / max 48.2us
    196schnorrsig_batch_verify_256: min 43.7us / avg 43.7us / max 43.7us
    197schnorrsig_batch_verify_512: min 41.4us / avg 41.4us / max 41.4us
    198schnorrsig_batch_verify_1024: min 38.6us / avg 38.7us / max 38.7us
    199schnorrsig_batch_verify_2048: min 36.8us / avg 36.9us / max 37.0us
    200schnorrsig_batch_verify_4096: min 35.4us / avg 35.5us / max 35.6us
    201schnorrsig_batch_verify_8192: min 35.5us / avg 35.5us / max 35.5us
    202
    203post-safegcd CC=clang-9
    204schnorrsig_sign: min 34.5us / avg 34.5us / max 34.6us
    205schnorrsig_verify: min 61.5us / avg 61.6us / max 61.8us
    206schnorrsig_batch_verify_1: min 69.8us / avg 69.8us / max 69.9us
    207schnorrsig_batch_verify_2: min 63.3us / avg 63.3us / max 63.4us
    208schnorrsig_batch_verify_4: min 58.6us / avg 58.6us / max 58.7us
    209schnorrsig_batch_verify_8: min 55.8us / avg 55.9us / max 55.9us
    210schnorrsig_batch_verify_16: min 56.3us / avg 56.3us / max 56.4us
    211schnorrsig_batch_verify_32: min 56.4us / avg 56.4us / max 56.5us
    212schnorrsig_batch_verify_64: min 52.5us / avg 52.5us / max 52.5us
    213schnorrsig_batch_verify_128: min 48.2us / avg 48.2us / max 48.2us
    214schnorrsig_batch_verify_256: min 43.7us / avg 43.7us / max 43.7us
    215schnorrsig_batch_verify_512: min 41.4us / avg 41.4us / max 41.5us
    216schnorrsig_batch_verify_1024: min 38.7us / avg 38.7us / max 38.8us
    217schnorrsig_batch_verify_2048: min 36.8us / avg 36.9us / max 37.0us
    218schnorrsig_batch_verify_4096: min 35.4us / avg 35.5us / max 35.6us
    219schnorrsig_batch_verify_8192: min 35.4us / avg 35.5us / max 35.6us
    220
    221
    222pre-safegcd ENDO=on GMP=off CC=clang-10
    223schnorrsig_sign: min 37.6us / avg 37.6us / max 37.7us
    224schnorrsig_verify: min 66.5us / avg 66.6us / max 66.7us
    225schnorrsig_batch_verify_1: min 70.8us / avg 70.9us / max 70.9us
    226schnorrsig_batch_verify_2: min 63.7us / avg 63.7us / max 63.8us
    227schnorrsig_batch_verify_4: min 58.7us / avg 58.8us / max 58.8us
    228schnorrsig_batch_verify_8: min 56.1us / avg 56.2us / max 56.3us
    229schnorrsig_batch_verify_16: min 56.6us / avg 56.6us / max 56.7us
    230schnorrsig_batch_verify_32: min 56.5us / avg 56.5us / max 56.6us
    231schnorrsig_batch_verify_64: min 52.5us / avg 52.5us / max 52.6us
    232schnorrsig_batch_verify_128: min 48.2us / avg 48.3us / max 48.3us
    233schnorrsig_batch_verify_256: min 43.7us / avg 43.7us / max 43.7us
    234schnorrsig_batch_verify_512: min 41.4us / avg 41.4us / max 41.4us
    235schnorrsig_batch_verify_1024: min 38.7us / avg 38.7us / max 38.8us
    236schnorrsig_batch_verify_2048: min 36.9us / avg 36.9us / max 37.0us
    237schnorrsig_batch_verify_4096: min 35.4us / avg 35.5us / max 35.6us
    238schnorrsig_batch_verify_8192: min 35.5us / avg 35.5us / max 35.5us
    239
    240post-safegcd CC=clang-10
    241schnorrsig_sign: min 34.3us / avg 34.4us / max 34.4us
    242schnorrsig_verify: min 61.7us / avg 61.7us / max 61.9us
    243schnorrsig_batch_verify_1: min 69.8us / avg 69.8us / max 69.9us
    244schnorrsig_batch_verify_2: min 63.0us / avg 63.1us / max 63.1us
    245schnorrsig_batch_verify_4: min 58.3us / avg 58.3us / max 58.4us
    246schnorrsig_batch_verify_8: min 55.5us / avg 55.6us / max 55.7us
    247schnorrsig_batch_verify_16: min 55.9us / avg 56.0us / max 56.1us
    248schnorrsig_batch_verify_32: min 56.1us / avg 56.2us / max 56.3us
    249schnorrsig_batch_verify_64: min 52.6us / avg 52.6us / max 52.7us
    250schnorrsig_batch_verify_128: min 48.2us / avg 48.3us / max 48.4us
    251schnorrsig_batch_verify_256: min 43.7us / avg 43.8us / max 43.8us
    252schnorrsig_batch_verify_512: min 41.4us / avg 41.5us / max 41.5us
    253schnorrsig_batch_verify_1024: min 38.7us / avg 38.7us / max 38.8us
    254schnorrsig_batch_verify_2048: min 36.9us / avg 36.9us / max 37.1us
    255schnorrsig_batch_verify_4096: min 35.4us / avg 35.5us / max 35.7us
    256schnorrsig_batch_verify_8192: min 35.5us / avg 35.5us / max 35.6us
    257
    258
    259pre-safegcd ENDO=on GMP=off CC=clang-11
    260schnorrsig_sign: min 37.5us / avg 37.5us / max 37.6us
    261schnorrsig_verify: min 66.3us / avg 66.3us / max 66.5us
    262schnorrsig_batch_verify_1: min 70.4us / avg 70.4us / max 70.5us
    263schnorrsig_batch_verify_2: min 63.3us / avg 63.3us / max 63.4us
    264schnorrsig_batch_verify_4: min 58.5us / avg 58.5us / max 58.6us
    265schnorrsig_batch_verify_8: min 55.6us / avg 55.6us / max 55.7us
    266schnorrsig_batch_verify_16: min 56.1us / avg 56.2us / max 56.3us
    267schnorrsig_batch_verify_32: min 56.3us / avg 56.4us / max 56.4us
    268schnorrsig_batch_verify_64: min 52.5us / avg 52.5us / max 52.6us
    269schnorrsig_batch_verify_128: min 48.2us / avg 48.2us / max 48.2us
    270schnorrsig_batch_verify_256: min 43.6us / avg 43.7us / max 43.7us
    271schnorrsig_batch_verify_512: min 41.4us / avg 41.4us / max 41.4us
    272schnorrsig_batch_verify_1024: min 38.6us / avg 38.7us / max 38.7us
    273schnorrsig_batch_verify_2048: min 36.8us / avg 36.9us / max 36.9us
    274schnorrsig_batch_verify_4096: min 35.4us / avg 35.4us / max 35.5us
    275schnorrsig_batch_verify_8192: min 35.4us / avg 35.5us / max 35.5us
    276
    277post-safegcd CC=clang-11
    278schnorrsig_sign: min 34.3us / avg 34.4us / max 34.4us
    279schnorrsig_verify: min 61.3us / avg 61.5us / max 61.7us
    280schnorrsig_batch_verify_1: min 69.5us / avg 69.5us / max 69.6us
    281schnorrsig_batch_verify_2: min 62.7us / avg 62.8us / max 62.8us
    282schnorrsig_batch_verify_4: min 58.1us / avg 58.2us / max 58.3us
    283schnorrsig_batch_verify_8: min 55.9us / avg 56.0us / max 56.1us
    284schnorrsig_batch_verify_16: min 56.5us / avg 56.6us / max 56.6us
    285schnorrsig_batch_verify_32: min 56.2us / avg 56.3us / max 56.3us
    286schnorrsig_batch_verify_64: min 52.6us / avg 53.4us / max 55.1us
    287schnorrsig_batch_verify_128: min 48.2us / avg 48.8us / max 49.6us
    288schnorrsig_batch_verify_256: min 43.7us / avg 43.8us / max 43.8us
    289schnorrsig_batch_verify_512: min 41.4us / avg 41.6us / max 41.7us
    290schnorrsig_batch_verify_1024: min 38.7us / avg 38.8us / max 38.8us
    291schnorrsig_batch_verify_2048: min 36.9us / avg 37.0us / max 37.0us
    292schnorrsig_batch_verify_4096: min 35.4us / avg 35.5us / max 35.6us
    293schnorrsig_batch_verify_8192: min 35.4us / avg 35.5us / max 35.5us
    
  37. jonasnick commented at 1:14 pm on April 1, 2021: contributor
    I noticed a higher variance in my benchmark than I was used to and re-ran the experiment in a more controlled environment (gcc 10.2.0). I no longer see a performance degradation post-rebase. Single schnorrsig_verify was fastest post-rebase compared to pre-rebase (with endo, bignum=gmp and bignum=no), and batch verify was very similar across the three configurations.
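To compare runs like the ones above, a small script can turn the benchmark output into speedup figures. The following is a minimal sketch, not part of the PR: the function names `parse_bench` and `batch_speedup` are hypothetical, and it assumes that the `schnorrsig_batch_verify_N` figures are per-signature times, which the numbers suggest but the output does not state.

```python
import re

# Matches lines such as
#   "schnorrsig_verify: min 62.0us / avg 62.1us / max 62.6us"
# and tolerates a leading line-number prefix left over from page rendering.
LINE_RE = re.compile(
    r"^\s*\d*\s*(?P<name>schnorrsig_\w+): "
    r"min (?P<min>[\d.]+)us / avg (?P<avg>[\d.]+)us / max (?P<max>[\d.]+)us"
)


def parse_bench(text):
    """Return {benchmark name: {"min": .., "avg": .., "max": ..}} in microseconds."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            results[m.group("name")] = {
                key: float(m.group(key)) for key in ("min", "avg", "max")
            }
    return results


def batch_speedup(results):
    """Return {batch size: avg single-verify time / avg batch time per signature}."""
    single = results["schnorrsig_verify"]["avg"]
    prefix = "schnorrsig_batch_verify_"
    return {
        int(name[len(prefix):]): single / timing["avg"]
        for name, timing in results.items()
        if name.startswith(prefix)
    }


# Three lines taken from the "post-safegcd CC=gcc-8" run above.
SAMPLE = """\
schnorrsig_verify: min 62.0us / avg 62.1us / max 62.6us
schnorrsig_batch_verify_1: min 70.7us / avg 70.7us / max 70.7us
schnorrsig_batch_verify_8192: min 36.2us / avg 36.2us / max 36.2us
"""

if __name__ == "__main__":
    for size, ratio in sorted(batch_speedup(parse_bench(SAMPLE)).items()):
        print(f"batch of {size}: {ratio:.2f}x vs single verify")
```

On the sample lines, a batch of 1 is slower than single verification (ratio below 1), while a batch of 8192 comes out at roughly 1.72x.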
  38. jonasnick commented at 9:20 pm on April 7, 2021: contributor
    I added a commit to reduce the batch verification randomizers to 128 bits. This gives up to a 9% speedup.
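For readers unfamiliar with where the randomizers enter, here is a toy sketch of Schnorr batch verification by random linear combination. It is an illustration under loud assumptions: it uses a tiny multiplicative group (order 509) instead of secp256k1, a simplified challenge hash, and hypothetical function names, so it is neither BIP-340 nor the code in this PR. In such a small group the randomizer size has no real effect; the `randomizer_bits` parameter only marks the spot where a 128-bit versus 256-bit choice would apply.

```python
import hashlib
import secrets

# Toy parameters (NOT secp256k1): G generates the subgroup of prime order Q
# inside the multiplicative group modulo the prime P = 2*Q + 1.
P, Q, G = 1019, 509, 4


def challenge(r, pk, msg):
    """Simplified challenge hash e = H(r, pk, msg) mod Q (not BIP-340's tagged hash)."""
    data = f"{r}|{pk}|".encode() + msg
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def sign(sk, msg):
    k = secrets.randbelow(Q - 1) + 1          # nonce in [1, Q-1]
    r = pow(G, k, P)
    e = challenge(r, pow(G, sk, P), msg)
    return r, (k + e * sk) % Q


def verify(pk, msg, sig):
    r, s = sig
    return pow(G, s, P) == (r * pow(pk, challenge(r, pk, msg), P)) % P


def batch_verify(items, randomizer_bits=128):
    """Check all (pk, msg, (r, s)) triples with one combined equation.

    The first randomizer is fixed to 1; every other signature gets a fresh
    random coefficient so that errors cannot cancel each other out.
    """
    s_sum, rhs = 0, 1
    for i, (pk, msg, (r, s)) in enumerate(items):
        a = 1 if i == 0 else secrets.randbits(randomizer_bits) % Q
        e = challenge(r, pk, msg)
        s_sum = (s_sum + a * s) % Q
        rhs = (rhs * pow(r, a, P) * pow(pk, (a * e) % Q, P)) % P
    return pow(G, s_sum, P) == rhs
```

In a real implementation the right-hand side is one large multi-scalar multiplication, so shorter randomizers mean shorter scalars for the nonce points, which is plausibly where the reported speedup comes from.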
  39. jonasnick commented at 2:16 pm on May 17, 2021: contributor

    I’m intending to remove the batch verification speedup graph from BIP-340 and instead place it in libsecp’s doc directory. Therefore, I added a commit that allows recreating said graph (originally proposed for BIP-340).

    I removed the log fit from the graph and instead increased the granularity. The shape of the graph may change again once the optimal pippenger threshold/windows are updated to reflect the latest improvements.

  40. jonasnick force-pushed on May 30, 2021
  41. jonasnick commented at 7:30 pm on May 30, 2021: contributor

    Added two commits:

    1. A fix for a bug in the batch_verify benchmarks. Previously the same signatures would be checked in every iteration, which meant that the same randomizers would be used such that the optimization in 2. showed worse results for some number of sigs. Moreover, the graph in docs/speedup-batch/ looks much smoother now.
    2. An optimization to the range of randomizers by @roconnor-blockstream. Instead of choosing them from [0, 2^128-1], they are chosen from [-2^127, 2^127-1], which affects the scalars post endomorphism split and leads to an improvement of 3 to 9 percent. With the former randomizer range, one of the scalars would be 0 about 50% of the time. Now that scalar is always 0, which speeds up both Strauss’ and Pippenger’s algo.
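The change of range in item 2 can be sketched as follows. This is an assumption-laden illustration rather than the PR's code: the PR derives randomizers with ChaCha20, whereas this sketch uses SHA256 as a stand-in, and the function names are hypothetical. The only point is that the same 16 bytes can be read as an unsigned value in [0, 2^128-1] or as a two's-complement value in [-2^127, 2^127-1], with negative values wrapping around the group order when used as scalars.

```python
import hashlib

# Order of the secp256k1 group.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def _randomizer_bytes(seed, i):
    """16 pseudorandom bytes for the i-th signature (SHA256 stand-in, not ChaCha20)."""
    return hashlib.sha256(seed + i.to_bytes(4, "big")).digest()[:16]


def randomizer_unsigned(seed, i):
    """Former range: integer in [0, 2^128 - 1]."""
    return int.from_bytes(_randomizer_bytes(seed, i), "big")


def randomizer_signed(seed, i):
    """New range: the same bytes read as two's complement, in [-2^127, 2^127 - 1]."""
    return int.from_bytes(_randomizer_bytes(seed, i), "big", signed=True)


def to_scalar(a):
    """Map a (possibly negative) integer to its representative in [0, N - 1]."""
    return a % N
```

Both readings carry 128 bits of randomness; the signed one just centers the values around zero, so their magnitude never exceeds 2^127.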
  42. jonasnick force-pushed on May 31, 2021
  43. jonasnick commented at 3:44 pm on May 31, 2021: contributor
    Rebased the PR to (hopefully) fix CI issues.
  44. Add scalar_chacha20
    This is in preparation for schnorrsig_batch_verify.
    91cea99260
  45. schnorrsig: add batch_verify 03e125d92f
  46. fixup! Add scalar_chacha20 e8663014ef
  47. fixup! scratch space in benchmarks
    Without this commit, 8192 points require 2 batches.
    9646aad346
  48. bench_schnorrsig: stop verifying same sigs in each iter b40c6e52ef
  49. doc: add batch verification speedup graph c2f73913a2
  50. for benchmarks only: use 128 bit randomizer
    This is just a commit for benchmarks and should be improved if 128 bit
    randomizers are to be actually used.
    1) it does not follow bip-schnorr batch verification
    2) the randomizers are not uniformly distributed in [0, 2^128-1] for no reason
    3) chacha output is thrown away
    1b2e1a6d2b
  51. Choose batch randomizers in range [-2^127, 2^127-1]
    H/T roconnor-blockstream for this idea
    869e7097d9
  52. gmaxwell commented at 3:49 pm on June 1, 2021: contributor
    Would it be a win to skip applying the endomorphism where the corresponding scalar is equal to zero? (Or perhaps, for Strauss, more generally only applying the endomorphism to digits that get used, which includes not applying it at all for zero as a special case.)
  53. jonasnick commented at 3:19 pm on December 16, 2021: contributor

    If there’s concern that our current ecmult_multi implementation isn’t robust enough yet to be used for consensus applications, we could do the following:

    1. Disable pippenger for now. This decreases complexity and Strauss’ algo is already being used.
    2. Support only a fixed number of scratch sizes (e.g. small and large), which allows more exhaustive testing.
  54. Sajjon commented at 10:57 am on February 9, 2022: none

    Hey! What is the status of this PR? What are the blockers to get it merged? :)

    BIP340 mentions BatchVerify and schnorrsig/tests_impl.h on master contains some verify_batch (TODO) comments suggesting that BatchVerify will get merged, but not when.

    I’m asking because I would like to know when I can offer this as part of my API in a Swift wrapper around libsecp256k1.

    I guess there is no low-hanging fruit I can help with to get this merged? One does not simply do crypto…

    Thanks!

  55. jonasnick commented at 10:14 am on February 11, 2022: contributor
    Hey @Sajjon, this PR needs a significant overhaul before getting merged as discussed above. I proposed this as a project to https://www.summerofbitcoin.org/. If there are people wanting to do this project, my plan is that we will help them to get a PR ready this summer.
  56. bicrxm commented at 3:52 am on February 17, 2022: none

    Hey @Sajjon, this PR needs a significant overhaul before getting merged as discussed above. I proposed this as a project to https://www.summerofbitcoin.org/. If there are people wanting to do this project, my plan is that we will help them to get a PR ready this summer.

    I am interested in this. Should I start from PR #558 to understand it?

  57. fjahr commented at 10:08 pm on June 28, 2024: contributor
    @jonasnick should this be closed since it was basically superseded by #1134?
  58. jonasnick commented at 7:02 pm on July 9, 2024: contributor
    yes, thanks
  59. jonasnick closed this on Jul 9, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-24 05:15 UTC
