optimize additive pubkey tweaking with vartime generator point multiplication (>80% speedup) #1843

pull theStack wants to merge 4 commits into bitcoin-core:master from theStack:pubkey_tweak_add-boost changing 12 files +191 −35
  1. theStack commented at 11:47 PM on April 5, 2026: contributor

    Additive public key tweaking (i.e. given a public key $P$ and a tweak scalar $t$, calculating $P' = P + t \cdot G$) is currently implemented using the "double multiply" algorithm secp256k1_ecmult ($R = na \cdot A + ng \cdot G$), with $na=1$ and $ng=t$: https://github.com/bitcoin-core/secp256k1/blob/7262adb4b40074201fb30847035a82b8d742f350/src/ecmult.h#L43-L47 https://github.com/bitcoin-core/secp256k1/blob/7262adb4b40074201fb30847035a82b8d742f350/src/eckey_impl.h#L62-L65
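    To illustrate the algebra only (this is not the library's code): the double-multiply formulation with $na=1$ computes the same point as a single generator multiplication followed by one point addition. A toy C sketch on a tiny textbook curve ($y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$, with hypothetical helper names; nothing like the real 256-bit arithmetic):

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Toy short-Weierstrass curve y^2 = x^3 + 2x + 2 over F_17
     * (textbook example parameters; obviously NOT secp256k1). */
    #define P 17
    #define A 2

    typedef struct { int x, y, inf; } pt;

    static int md(int v) { int r = v % P; return r < 0 ? r + P : r; }

    /* Modular inverse via Fermat's little theorem: a^(P-2) mod P. */
    static int inv(int a) {
        int r = 1, e;
        for (e = 0; e < P - 2; e++) r = md(r * a);
        return r;
    }

    static pt ec_add(pt p, pt q) {
        pt r = {0, 0, 0};
        int l;
        if (p.inf) return q;
        if (q.inf) return p;
        if (p.x == q.x && md(p.y + q.y) == 0) { r.inf = 1; return r; }
        if (p.x == q.x)
            l = md((3 * p.x * p.x + A) * inv(md(2 * p.y)));  /* doubling */
        else
            l = md(md(q.y - p.y) * inv(md(q.x - p.x)));      /* addition */
        r.x = md(l * l - p.x - q.x);
        r.y = md(l * (p.x - r.x) - p.y);
        return r;
    }

    /* Scalar multiplication via plain double-and-add. */
    static pt ec_mul(int k, pt p) {
        pt r = {0, 0, 1};
        while (k > 0) {
            if (k & 1) r = ec_add(r, p);
            p = ec_add(p, p);
            k >>= 1;
        }
        return r;
    }

    int main(void) {
        pt G = {5, 1, 0};        /* generator of a 19-element group */
        pt pub = ec_mul(5, G);   /* some public key P = 5*G */
        int t = 7;               /* public tweak */

        /* Double-multiply formulation with na=1, ng=t: R = 1*P + t*G... */
        pt r1 = ec_add(ec_mul(1, pub), ec_mul(t, G));
        /* ...equals one generator multiplication plus a single point add. */
        pt r2 = ec_add(ec_mul(t, G), pub);

        assert(!r1.inf && r1.x == r2.x && r1.y == r2.y);
        printf("P + t*G = (%d, %d)\n", r1.x, r1.y); /* (0, 11) on this curve */
        return 0;
    }
    ```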

    After some tinkering I found that this could be sped up quite a bit. To take full advantage of the fact that the tweak scalar is typically not considered sensitive data (as opposed to secret keys or nonces), this PR first introduces a variable-time generator point multiplication routine secp256k1_ecmult_gen_var, which is essentially a copy of secp256k1_ecmult_gen [1], but with all the side-channel mitigations stripped out. This function is significantly faster than its constant-time original in all COMB table size configurations, by over 66% in all but the smallest 2KB setting (benchmarked via $ ./build/bin/bench_ecmult):

    | ECMULT_GEN_KB | ecmult_gen | ecmult_gen_var | speedup |
    |---|---|---|---|
    | 2 | 11.6 us | 8.7 us | ~33.3% |
    | 22 | 10.3 us | 6.18 us | ~66.6% |
    | 86 (default) | 9.85 us | 5.86 us | ~68.1% |

    Plugging that new routine into the internal function secp256k1_eckey_pubkey_tweak_add (which is easy, as the vartime variant is stateless and thus doesn't need a context object anymore) yields the following speedups for the API function secp256k1_ec_pubkey_tweak_add (benchmarked via $ ./build/bin/bench tweak on the second vs. third commit):

    | ECMULT_GEN_KB | ec_pubkey_tweak_add (using ecmult) | ec_pubkey_tweak_add (using new ecmult_gen_var) | speedup |
    |---|---|---|---|
    | 2 | 16.3 us | 11.6 us | ~40.5% |
    | 22 | 16.3 us | 8.98 us | ~81.5% |
    | 86 (default) | 16.3 us | 8.65 us | ~88.4% |
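    For reference, the speedup figures here and below are computed throughput-style, i.e. old_time/new_time − 1, rather than as a reduction in runtime. A quick sanity check (illustrative helper, not part of the library):

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Relative speedup in percent: how much more work per unit time the
     * faster variant does, i.e. old/new - 1 (NOT 1 - new/old). */
    static double speedup_pct(double old_us, double new_us) {
        return (old_us / new_us - 1.0) * 100.0;
    }

    int main(void) {
        /* Numbers from the ec_pubkey_tweak_add benchmark table. */
        printf("%.1f%%\n", speedup_pct(16.3, 11.6)); /* ~40.5% */
        printf("%.1f%%\n", speedup_pct(16.3, 8.98)); /* ~81.5% */
        printf("%.1f%%\n", speedup_pct(16.3, 8.65)); /* ~88.4% */
        return 0;
    }
    ```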

    Note that the improved code path (function secp256k1_eckey_pubkey_tweak_add) is also used in the following API functions, so they should all benefit from it:

    - secp256k1_xonly_pubkey_tweak_add
    - secp256k1_xonly_pubkey_tweak_add_check
    - secp256k1_keypair_xonly_tweak_add
    - secp256k1_musig_pubkey_ec_tweak_add
    - secp256k1_musig_pubkey_xonly_tweak_add

    Improving Silent Payments scanning performance was the original motivation to look into this (see also https://github.com/craigraw/bench_bip352/), and in fact the "common case" benchmarks show a 15-20% speedup if this branch is applied.

    [1] I've also experimented with reintroducing the much simpler ECMULT_GEN_PREC_BITS precomputation table (which was used prior to SDMC in versions earlier than 0.5.0) and that performs even better with PREC_BITS=8: https://github.com/theStack/secp256k1/commit/8f7fc93db4710ea2b7f88c7c458de7d4832968b4 (needing only 31 point additions rather than 42). However, given that it's unclear where to store this table data (having the choice of either bloating up the library size further or again needing some form of context, if it's calculated at runtime) I decided to stick with a solution that uses the already available table data first.

  2. sipa commented at 10:10 PM on April 6, 2026: contributor

    Neat.

    My guess would be that even bigger SDMC tables would be even better, because the cost of the constant-time table lookups (which need to read through the entire table) disappears.
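    A minimal sketch of the cost difference being discussed (illustrative only, not the library's actual table-lookup code): a constant-time lookup must touch every table entry and mask-select the result, so its cost grows linearly with the table size, while a variable-time lookup is a single indexed read:

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    #define TABLE_SIZE 16

    /* Constant-time table lookup: scan the whole table and mask-select the
     * wanted entry, so the memory access pattern doesn't depend on the
     * (possibly secret) index. */
    static uint32_t lookup_const_time(const uint32_t *table, uint32_t index) {
        uint32_t result = 0, i;
        for (i = 0; i < TABLE_SIZE; i++) {
            uint32_t mask = -(uint32_t)(i == index); /* all-ones iff i == index */
            result |= table[i] & mask;
        }
        return result;
    }

    /* Variable-time lookup: one indexed read. Fine when the index is derived
     * from public data (like a pubkey tweak), leaky otherwise. */
    static uint32_t lookup_var_time(const uint32_t *table, uint32_t index) {
        return table[index];
    }

    int main(void) {
        uint32_t table[TABLE_SIZE], i;
        for (i = 0; i < TABLE_SIZE; i++) table[i] = i * i;
        for (i = 0; i < TABLE_SIZE; i++)
            assert(lookup_const_time(table, i) == lookup_var_time(table, i));
        printf("ok\n");
        return 0;
    }
    ```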

  3. ecmult_gen: compute `ecmult_gen_scalar_diff` at compile-time
    Having this available as a global constant allows introducing
    an alternative `ecmult_gen` function that doesn't need access to
    a context, see next commit.
    
    Note that the precomputed constant takes the name of the function
    that previously generated it at run-time (`secp256k1_ecmult_gen_scalar_diff`),
    while the function is now renamed to include the "compute" verb
    (`secp256k1_ecmult_gen_compute_scalar_diff`), to match the naming
    of the table generation function.
    d45dc46598
  4. ecmult_gen: introduce `secp256k1_ecmult_gen_var`
    Add a faster variable-time variant for generator point multiplication.
    This is essentially `ecmult_gen` without side-channel mitigations and
    without requiring a context object. Intended for use cases where the
    scalar is not representing sensitive data (e.g. pubkey tweaking).
    
    On arm64, this is ~66% faster than the constant-time variant
    (with the default build table size, i.e. ECMULT_GEN_KB=86):
    
    $ ./build/bin/bench_ecmult
    Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)
    ecmult_gen                    ,     9.94      ,    10.0       ,    10.1
    ecmult_gen_var                ,     5.96      ,     5.99      ,     6.02
    .....
    6b5c0622b8
  5. bench: add benchmark for `secp256k1_ec_pubkey_tweak_add` 9e555979c5
  6. eckey: speed up `_pubkey_tweak_add` by using ecmult_gen_var
    Rather than using the generic double multiply (with na=1), use the
    newly introduced fast variable-time generator point multiplication
    and add it up to the base public key manually.
    
    On arm64, this improves the performance of the
    `secp256k1_ec_pubkey_tweak_add` API function by about 80%:
    
    ----- Before (prior this commit): -----
    ```
    $ ./build/bin/bench tweak
    Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)
    
    ec_pk_tweak_add               ,    16.1       ,    16.2       ,    16.6
    ```
    
    ----- After (this commit): -----
    ```
    $ ./build/bin/bench tweak
    Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)
    
    ec_pk_tweak_add               ,     8.94      ,     8.98      ,     9.29
    ```
    
    Note that the following API functions also benefit from the improved code path:
    - secp256k1_xonly_pubkey_tweak_add
    - secp256k1_xonly_pubkey_tweak_add_check
      (this one is consensus-critical for P2TR script path spends, see BIP341)
    - secp256k1_keypair_xonly_tweak_add
    - secp256k1_musig_pubkey_ec_tweak_add
    - secp256k1_musig_pubkey_xonly_tweak_add
    0ebb1aabf6
  7. in src/ecmult_gen_impl.h:349 in f64306aa5b
     344 | +                uint32_t bitdata = secp256k1_rotr32(recoded[bit_pos >> 5], bit_pos & 0x1f);
     345 | +
     346 | +                /* Clear the bit at position tooth */
     347 | +                bits &= ~(1 << tooth);
     348 | +
     349 | +                /* Write the bit into position tooth (and junk into higher bits). */
    


    sipa commented at 10:15 PM on April 6, 2026:

    I think you can avoid this power side-channel protection mechanism here. It's very cheap, but not worth the cost when the data isn't secret to begin with.


    theStack commented at 6:21 PM on April 8, 2026:

    Makes sense, removed it.

  8. theStack commented at 6:20 PM on April 8, 2026: contributor

    Added a preparatory commit that moves the ecmult_gen_scalar_diff constant precomputation from run-time to compile-time with the precompute_ecmult_gen binary (for the exhaustive tests though, it's generated at run-time, similar to how it's done with the table), and addressed #1843 (review).

    My guess would be that even bigger SDMC tables would be even better, because the cost of the constant-time table lookups (which need to read through the entire table) disappears.

    Ah, indeed! I've tried with table sizes up to ~18 MB, and at least on my arm64 machine, the ~5MB one (GEN_KB=5120) yields the best results (see https://github.com/theStack/secp256k1/commit/aa1a4e719f90c8b9ab5b418a7254f5487f969d51 and $ ./bench_gen_var_with_all_tables.sh):

    | ECMULT_GEN_KB | ecmult_gen | ecmult_gen_var | ec_pubkey_tweak_add | speedup vs. unoptimized ec_pubkey_tweak_add (16.3 us) |
    |---|---|---|---|---|
    | 2 | 11.7 us | 8.81 us | 11.7 us | 39.3% |
    | 22 | 10.5 us | 6.31 us | 9.05 us | 80.1% |
    | 86 | 9.96 us | 5.94 us | 8.67 us | 88.0% |
    | 148 | 11.4 us | 5.18 us | 7.91 us | 106.1% |
    | 256 | 14.3 us | 4.46 us | 7.22 us | 125.8% |
    | 464 | 21.1 us | 4.04 us | 6.85 us | 138.0% |
    | 832 | 33.1 us | 3.66 us | 6.46 us | 152.3% |
    | 1536 | 61.2 us | 3.46 us | 6.21 us | 162.5% |
    | 2816 | 103.0 us | 3.13 us | 6.02 us | 170.8% |
    | 5120 | 188.0 us | 2.85 us | 5.80 us | 181.0% |
    | 9728 | 371.0 us | 2.95 us | 6.33 us | 157.5% |
    | 18432 | 506.0 us | 4.02 us | 7.32 us | 122.7% |

    So we would need to embed two different tables to get the most out of it (without degrading the signing/nonce generation speed). Not sure how performance-critical pubkey tweaking is in practice though; the only currently widespread scenario I can think of is verification of taproot commitments. For Silent Payments I still have to check how these larger table values would affect overall performance.

  9. theStack force-pushed on Apr 8, 2026
  10. w0xlt commented at 8:39 PM on April 9, 2026: contributor

    Concept ACK

  11. sipa commented at 4:20 PM on April 10, 2026: contributor

    @theStack Yeah, I don't think it's worth having two built-in tables just for this, but it's good to see my intuition confirmed.

  12. w0xlt commented at 11:42 PM on April 12, 2026: contributor

    The code can be deduplicated.

    <details> <summary>diff</summary>

    diff --git a/src/eckey_impl.h b/src/eckey_impl.h
    index 9cee88d..566a318 100644
    --- a/src/eckey_impl.h
    +++ b/src/eckey_impl.h
    @@ -61,6 +61,7 @@ static int secp256k1_eckey_privkey_tweak_add(secp256k1_scalar *key, const secp25
     
     static int secp256k1_eckey_pubkey_tweak_add(secp256k1_ge *key, const secp256k1_scalar *tweak) {
         secp256k1_gej pt;
    +    /* `tweak` is public here, so the variable-time generator multiplication is safe. */
         secp256k1_ecmult_gen_var(&pt, tweak);
         secp256k1_gej_add_ge_var(&pt, &pt, key, NULL);
     
    diff --git a/src/ecmult_gen.h b/src/ecmult_gen.h
    index 74942ed..567cc48 100644
    --- a/src/ecmult_gen.h
    +++ b/src/ecmult_gen.h
    @@ -138,6 +138,10 @@ static void secp256k1_ecmult_gen_context_clear(secp256k1_ecmult_gen_context* ctx
     
     /** Multiply with the generator: R = a*G */
     static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context* ctx, secp256k1_gej *r, const secp256k1_scalar *a);
    +/** Multiply with the generator: R = a*G.
    + *
    + * Variable-time implementation. Only call this for public/non-secret scalars.
    + */
     static void secp256k1_ecmult_gen_var(secp256k1_gej *r, const secp256k1_scalar *a);
     
     static void secp256k1_ecmult_gen_blind(secp256k1_ecmult_gen_context *ctx, const secp256k1_hash_ctx *hash_ctx, const unsigned char *seed32);
    diff --git a/src/ecmult_gen_impl.h b/src/ecmult_gen_impl.h
    index 8e4209f..3c4460e 100644
    --- a/src/ecmult_gen_impl.h
    +++ b/src/ecmult_gen_impl.h
    @@ -30,17 +30,89 @@ static void secp256k1_ecmult_gen_context_clear(secp256k1_ecmult_gen_context *ctx
         secp256k1_fe_clear(&ctx->proj_blind);
     }
     
    +/* Convert a scalar to a zero-padded array of 32-bit words. Only the first 8 words
    + * can contain non-zero data, but the padding avoids out-of-bounds reads from the
    + * scalar when COMB_BITS > 256. */
    +static void secp256k1_ecmult_gen_scalar_to_recoded(uint32_t recoded[(COMB_BITS + 31) >> 5], const secp256k1_scalar *s) {
    +    int i;
    +
    +    memset(recoded, 0, ((COMB_BITS + 31) >> 5) * sizeof(*recoded));
    +    for (i = 0; i < 8 && i < ((COMB_BITS + 31) >> 5); ++i) {
    +        recoded[i] = secp256k1_scalar_get_bits_limb32(s, 32 * i, 32);
    +    }
    +}
    +
    +/* Gather the mask(block)-selected bits of the recoded scalar into a packed value:
    + * bits[tooth] = d[(block*COMB_TEETH + tooth)*COMB_SPACING + comb_off].
    + *
    + * This constant-time variant mirrors the power side-channel hardening used by
    + * secp256k1_ecmult_gen. */
    +static uint32_t secp256k1_ecmult_gen_lookup_bits(const uint32_t recoded[(COMB_BITS + 31) >> 5], uint32_t comb_off, uint32_t block) {
    +    uint32_t bits = 0;
    +    uint32_t bit_pos = comb_off + block * COMB_TEETH * COMB_SPACING;
    +    uint32_t tooth;
    +
    +    /* Instead of reading individual bits here to construct the bits variable,
    +     * build up the result by xoring rotated reads together. In every iteration,
    +     * one additional bit is made correct, starting at the bottom. The bits
    +     * above that contain junk. This reduces leakage by avoiding computations
    +     * on variables that can have only a low number of possible values (e.g.,
    +     * just two values when reading a single bit into a variable.) See:
    +     * https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-alam.pdf
    +     */
    +    for (tooth = 0; tooth < COMB_TEETH; ++tooth) {
    +        /* Construct bitdata s.t. the bottom bit is the bit we'd like to read.
    +         *
    +         * We could just set bitdata = recoded[bit_pos >> 5] >> (bit_pos & 0x1f)
    +         * but this would simply discard the bits that fall off at the bottom,
    +         * and thus, for example, bitdata could still have only two values if we
    +         * happen to shift by exactly 31 positions. We use a rotation instead,
    +         * which ensures that bitdata doesn't lose entropy. This relies on the
    +         * rotation being atomic, i.e., the compiler emitting an actual rot
    +         * instruction. */
    +        uint32_t bitdata = secp256k1_rotr32(recoded[bit_pos >> 5], bit_pos & 0x1f);
    +
    +        /* Clear the bit at position tooth, but sssh, don't tell clang. */
    +        uint32_t volatile vmask = ~(1 << tooth);
    +        bits &= vmask;
    +
    +        /* Write the bit into position tooth (and junk into higher bits). */
    +        bits ^= bitdata << tooth;
    +        bit_pos += COMB_SPACING;
    +    }
    +    return bits;
    +}
    +
    +/* Same bit packing as secp256k1_ecmult_gen_lookup_bits(), but without the
    + * constant-time/power-analysis hardening as the scalar is public here. */
    +static uint32_t secp256k1_ecmult_gen_lookup_bits_var(const uint32_t recoded[(COMB_BITS + 31) >> 5], uint32_t comb_off, uint32_t block) {
    +    uint32_t bits = 0;
    +    uint32_t bit_pos = comb_off + block * COMB_TEETH * COMB_SPACING;
    +    uint32_t tooth;
    +
    +    for (tooth = 0; tooth < COMB_TEETH; ++tooth) {
    +        uint32_t bit = (recoded[bit_pos >> 5] >> (bit_pos & 0x1f)) & 1;
    +        bits |= bit << tooth;
    +        bit_pos += COMB_SPACING;
    +    }
    +    return bits;
    +}
    +
    +static void secp256k1_ecmult_gen_lookup_table_index(uint32_t bits, uint32_t *sign, uint32_t *abs) {
    +    *sign = (bits >> (COMB_TEETH - 1)) & 1;
    +    *abs = (bits ^ -*sign) & (COMB_POINTS - 1);
    +    VERIFY_CHECK(*sign == 0 || *sign == 1);
    +    VERIFY_CHECK(*abs < COMB_POINTS);
    +}
    +
     static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context *ctx, secp256k1_gej *r, const secp256k1_scalar *gn) {
         uint32_t comb_off;
         secp256k1_ge add;
         secp256k1_fe neg;
         secp256k1_ge_storage adds;
         secp256k1_scalar d;
    -    /* Array of uint32_t values large enough to store COMB_BITS bits. Only the bottom
    -     * 8 are ever nonzero, but having the zero padding at the end if COMB_BITS>256
    -     * avoids the need to deal with out-of-bounds reads from a scalar. */
    -    uint32_t recoded[(COMB_BITS + 31) >> 5] = {0};
    -    int first = 1, i;
    +    uint32_t recoded[(COMB_BITS + 31) >> 5];
    +    int first = 1;
     
         memset(&adds, 0, sizeof(adds));
     
    @@ -88,9 +160,7 @@ static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context *ctx, secp25
         /* Compute the scalar d = (gn + ctx->scalar_offset). */
         secp256k1_scalar_add(&d, &ctx->scalar_offset, gn);
         /* Convert to recoded array. */
    -    for (i = 0; i < 8 && i < ((COMB_BITS + 31) >> 5); ++i) {
    -        recoded[i] = secp256k1_scalar_get_bits_limb32(&d, 32 * i, 32);
    -    }
    +    secp256k1_ecmult_gen_scalar_to_recoded(recoded, &d);
         secp256k1_scalar_clear(&d);
     
         /* In secp256k1_ecmult_gen_prec_table we have precomputed sums of the
    @@ -171,47 +241,11 @@ static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context *ctx, secp25
         comb_off = COMB_SPACING - 1;
         while (1) {
             uint32_t block;
    -        uint32_t bit_pos = comb_off;
             /* Inner loop: for each block, add table entries to the result. */
             for (block = 0; block < COMB_BLOCKS; ++block) {
    -            /* Gather the mask(block)-selected bits of d into bits. They're packed:
    -             * bits[tooth] = d[(block*COMB_TEETH + tooth)*COMB_SPACING + comb_off]. */
    -            uint32_t bits = 0, sign, abs, index, tooth;
    -            /* Instead of reading individual bits here to construct the bits variable,
    -             * build up the result by xoring rotated reads together. In every iteration,
    -             * one additional bit is made correct, starting at the bottom. The bits
    -             * above that contain junk. This reduces leakage by avoiding computations
    -             * on variables that can have only a low number of possible values (e.g.,
    -             * just two values when reading a single bit into a variable.) See:
    -             * https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-alam.pdf
    -             */
    -            for (tooth = 0; tooth < COMB_TEETH; ++tooth) {
    -                /* Construct bitdata s.t. the bottom bit is the bit we'd like to read.
    -                 *
    -                 * We could just set bitdata = recoded[bit_pos >> 5] >> (bit_pos & 0x1f)
    -                 * but this would simply discard the bits that fall off at the bottom,
    -                 * and thus, for example, bitdata could still have only two values if we
    -                 * happen to shift by exactly 31 positions. We use a rotation instead,
    -                 * which ensures that bitdata doesn't lose entropy. This relies on the
    -                 * rotation being atomic, i.e., the compiler emitting an actual rot
    -                 * instruction. */
    -                uint32_t bitdata = secp256k1_rotr32(recoded[bit_pos >> 5], bit_pos & 0x1f);
    -
    -                /* Clear the bit at position tooth, but sssh, don't tell clang. */
    -                uint32_t volatile vmask = ~(1 << tooth);
    -                bits &= vmask;
    -
    -                /* Write the bit into position tooth (and junk into higher bits). */
    -                bits ^= bitdata << tooth;
    -                bit_pos += COMB_SPACING;
    -            }
    -
    -            /* If the top bit of bits is 1, flip them all (corresponding to looking up
    -             * the negated table value), and remember to negate the result in sign. */
    -            sign = (bits >> (COMB_TEETH - 1)) & 1;
    -            abs = (bits ^ -sign) & (COMB_POINTS - 1);
    -            VERIFY_CHECK(sign == 0 || sign == 1);
    -            VERIFY_CHECK(abs < COMB_POINTS);
    +            uint32_t bits, sign, abs, index;
    +            bits = secp256k1_ecmult_gen_lookup_bits(recoded, comb_off, block);
    +            secp256k1_ecmult_gen_lookup_table_index(bits, &sign, &abs);
     
                 /** This uses a conditional move to avoid any secret data in array indexes.
                  *   _Any_ use of secret indexes has been demonstrated to result in timing
    @@ -275,38 +309,22 @@ static void secp256k1_ecmult_gen_var(secp256k1_gej *r, const secp256k1_scalar *g
         uint32_t comb_off;
         secp256k1_ge add;
         secp256k1_scalar d;
    -    uint32_t recoded[(COMB_BITS + 31) >> 5] = {0};
    -    int i;
    +    uint32_t recoded[(COMB_BITS + 31) >> 5];
     
         /* Adjust input scalar for difference and convert to recoded array. */
         secp256k1_scalar_add(&d, &secp256k1_ecmult_gen_scalar_diff, gn);
    -    for (i = 0; i < 8 && i < ((COMB_BITS + 31) >> 5); ++i) {
    -        recoded[i] = secp256k1_scalar_get_bits_limb32(&d, 32 * i, 32);
    -    }
    +    secp256k1_ecmult_gen_scalar_to_recoded(recoded, &d);
     
         /* Outer loop: iterate over comb_off from COMB_SPACING - 1 down to 0. */
         secp256k1_gej_set_infinity(r);
         comb_off = COMB_SPACING - 1;
         while (1) {
             uint32_t block;
    -        uint32_t bit_pos = comb_off;
             /* Inner loop: for each block, add table entries to the result. */
             for (block = 0; block < COMB_BLOCKS; ++block) {
    -            /* Gather the mask(block)-selected bits of d into bits. They're packed:
    -             * bits[tooth] = d[(block*COMB_TEETH + tooth)*COMB_SPACING + comb_off]. */
    -            uint32_t bits = 0, sign, abs, tooth;
    -            for (tooth = 0; tooth < COMB_TEETH; ++tooth) {
    -                uint32_t bit = (recoded[bit_pos >> 5] >> (bit_pos & 0x1f)) & 1;
    -                bits |= bit << tooth;
    -                bit_pos += COMB_SPACING;
    -            }
    -
    -            /* If the top bit of bits is 1, flip them all (corresponding to looking up
    -             * the negated table value), and remember to negate the result in sign. */
    -            sign = (bits >> (COMB_TEETH - 1)) & 1;
    -            abs = (bits ^ -sign) & (COMB_POINTS - 1);
    -            VERIFY_CHECK(sign == 0 || sign == 1);
    -            VERIFY_CHECK(abs < COMB_POINTS);
    +            uint32_t bits, sign, abs;
    +            bits = secp256k1_ecmult_gen_lookup_bits_var(recoded, comb_off, block);
    +            secp256k1_ecmult_gen_lookup_table_index(bits, &sign, &abs);
     
                 /* Perform lookup, negate if necessary and add to r. */
                 secp256k1_ge_from_storage(&add, &secp256k1_ecmult_gen_prec_table[block][abs]);
    

    </details>

  13. w0xlt commented at 12:21 AM on April 13, 2026: contributor

    ACK 0ebb1aabf62a201aef6da5e5a9e9719afbb26a90 mod above nit

  14. real-or-random added the label performance on Apr 13, 2026
  15. real-or-random added the label tweak/refactor on Apr 13, 2026
  16. real-or-random commented at 11:22 AM on April 13, 2026: contributor

    to take full advantage of the fact that the tweak scalar is typically not considered sensitive data

    The word "typically" is what worries me here. We don't really know what people use these functions for. What if they do have secret tweaks?

    And isn't the tweak secret even in known applications, e.g., Taproot tweaking, say Q = P + tG? One security feature of Taproot is that it hides the Merkle root until a script-path spend. This only works if the tweak t is secret. If t is public, then the attacker can get P = Q - tG and search for a Merkle root m such that t = hash(P, m).
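    The recovery argument can be sketched in any group where the tweak is known. A toy stand-in using integers mod N as the "curve group" (hypothetical names, for illustration only; the real attack would additionally brute-force candidate Merkle roots m until t = hash(P, m) matches):

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Toy additive-group stand-in for the curve group: "points" are integers
     * mod N and "t*G" is t*g mod N. Illustrates the observation above: if the
     * Taproot tweak t leaks, anyone can undo Q = P + t*G and recover the
     * internal key P. */
    #define N 101
    #define G_GEN 3

    static int tweak_add(int pubkey, int t) { return (pubkey + t * G_GEN) % N; }

    int main(void) {
        int P = 42;              /* internal key (stand-in) */
        int t = 17;              /* Taproot tweak */
        int Q = tweak_add(P, t); /* output key Q = P + t*G */

        /* With t known, anyone can compute P = Q - t*G. */
        int recovered = ((Q - t * G_GEN) % N + N) % N;
        assert(recovered == P);
        printf("recovered P = %d\n", recovered);
        return 0;
    }
    ```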

  17. theStack commented at 1:22 PM on April 13, 2026: contributor

    to take full advantage of the fact that the tweak scalar is typically not considered sensitive data

    The word "typically" is what worries me here. We don't really know what people use these functions for. What if they do have secret tweaks?

    Hm, good point, the "if something sounds too good to be true, it probably is" proverb comes to mind :sweat_smile: Fwiw, what made me believe that the change is fine (in the sense that it doesn't make things worse w.r.t. security compared to what we have now) is that public key tweaking is currently not side-channel resistant either, as variable-time scalar multiplication (ecmult) is used and the tweak scalar isn't cleared out anywhere. Seems like it would be better to go in the other direction and use ecmult_gen then? (IIRC, I experimented with that a while ago, and it was still a few microseconds faster than the current code.)

    And isn't the tweak secret even in known applications, e.g., Taproot tweaking, say Q = P + tG? One security feature of Taproot is that it hides the Merkle root until a script-path spend. This only works if the tweak t is secret. If t is public, then the attacker can get P = Q - tG and search for a Merkle root m such that t = hash(P, m).

    Makes sense. For verifying taproot tweaks, i.e. secp256k1_xonly_pubkey_tweak_add_check, treating the tweak as public seems fine though, as that one is only called after the Merkle root has been revealed? Maybe it's even worth having two different eckey_pubkey_tweak_add variants internally, one secure and one fast? The former could be used for creating tweaks, the latter for verifying them, or for special use cases like Silent Payments.

  18. sipa commented at 2:21 PM on April 13, 2026: contributor

    One security feature of Taproot is that it hides the Merkle root until a script-path spend.

    Hmm, I would consider that a privacy feature, and not a security feature. Generally, taproot script trees will involve multiple distrusted participants, and the leaf/script structure will be shared with them. This doesn't work if the script tree is supposed to be secret from a security perspective.

    And I think that we're generally not aiming to use constant time operations to protect privacy. Otherwise, it literally applies to everything. You may well want to keep your public keys private, so all functions involving public keys should be constant time?

    It may make sense to have a mode/extension/module for "do everything constant time", but I don't think it aligns with how the library has operated so far.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-14 15:15 UTC
