optimize additive pubkey tweaking with vartime generator point multiplication (>80% speedup) #1843

theStack commented at 11:47 PM on April 5, 2026: contributor

Additive public key tweaking (i.e. given a public key $P$ and a tweak scalar $t$, calculating $P' = P + t \cdot G$) is currently implemented using the "double multiply" algorithm secp256k1_ecmult ($R = na \cdot A + ng \cdot G$), with $na=1$ and $ng=t$: https://github.com/bitcoin-core/secp256k1/blob/7262adb4b40074201fb30847035a82b8d742f350/src/ecmult.h#L43-L47 https://github.com/bitcoin-core/secp256k1/blob/7262adb4b40074201fb30847035a82b8d742f350/src/eckey_impl.h#L62-L65
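To make the two computation routes concrete, here is a minimal sketch in a toy additive group (integers modulo `TOY_ORDER`) standing in for the curve group. Everything named `toy_*`, as well as `TOY_ORDER` and `TOY_G`, is a hypothetical stand-in and not part of libsecp256k1. It only illustrates that the double multiply with $na=1$, $ng=t$ and "generator multiplication followed by one addition" give the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the curve group: integers modulo TOY_ORDER under
 * addition. TOY_ORDER and TOY_G are illustrative assumptions only. */
#define TOY_ORDER 1000003u
#define TOY_G 5u

/* Stand-in for a point: a single group element. */
typedef uint32_t toy_point;

/* Scalar multiplication k*a in the toy group. */
static toy_point toy_mul(uint32_t k, toy_point a) {
    return (toy_point)(((uint64_t)(k % TOY_ORDER) * (a % TOY_ORDER)) % TOY_ORDER);
}

/* Group addition a + b in the toy group. */
static toy_point toy_add(toy_point a, toy_point b) {
    return (toy_point)(((uint64_t)a + b) % TOY_ORDER);
}

/* Analogue of the double multiply: R = na*A + ng*G. */
static toy_point toy_ecmult(toy_point a, uint32_t na, uint32_t ng) {
    return toy_add(toy_mul(na, a), toy_mul(ng, TOY_G));
}

/* Analogue of a generator-only multiplication: R = n*G. */
static toy_point toy_ecmult_gen(uint32_t n) {
    return toy_mul(n, TOY_G);
}

/* Tweak via the double multiply with na = 1 and ng = t. */
static toy_point toy_tweak_add_double(toy_point p, uint32_t t) {
    return toy_ecmult(p, 1, t);
}

/* Tweak via generator multiplication followed by one addition. */
static toy_point toy_tweak_add_gen(toy_point p, uint32_t t) {
    return toy_add(toy_ecmult_gen(t), p);
}

/* Self-check: both routes agree for a sample key and tweak. */
static void toy_self_check(void) {
    toy_point p = toy_ecmult_gen(1234);
    assert(toy_tweak_add_double(p, 42) == toy_tweak_add_gen(p, 42));
}
```

Calling `toy_self_check()` exercises both routes. In the real library the second route saves the generic multiplication by $na$, which is wasted work when $na=1$.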

After some tinkering I found that this could be sped up quite a bit. To take full advantage of the fact that the tweak scalar is typically not considered sensitive data (as opposed to secret keys or nonces), this PR first introduces a variable-time generator point multiplication routine secp256k1_ecmult_gen_var, which is essentially a copy of secp256k1_ecmult_gen [1], but with all the side-channel mitigations stripped out. This function is significantly faster than its constant-time original in all COMB table size configurations, by over 66% in all but the smallest 2KB setting (benchmarked via $ ./build/bin/bench_ecmult):

| ECMULT_GEN_KB | `ecmult_gen` | `ecmult_gen_var` | speedup |
|---|---|---|---|
| 2 | 11.6 us | 8.7 us | ~33.3% |
| 22 | 10.3 us | 6.18 us | ~66.6% |
| 86 (default) | 9.85 us | 5.86 us | ~68.1% |

Plugging that new routine into the internal function secp256k1_eckey_pubkey_tweak_add (which is easy as the vartime variant is stateless and thus doesn't need a context object anymore) yields the following speedups for the API function secp256k1_ec_pubkey_tweak_add (benchmarked via $ ./build/bin/bench tweak on the second vs. third commit):

| ECMULT_GEN_KB | `ec_pubkey_tweak_add`<br>(using `ecmult`) | `ec_pubkey_tweak_add`<br>(using new `ecmult_gen_var`) | speedup |
|---|---|---|---|
| 2 | 16.3 us | 11.6 us | ~40.5% |
| 22 | 16.3 us | 8.98 us | ~81.5% |
| 86 (default) | 16.3 us | 8.65 us | ~88.4% |

Note that the improved code path (function secp256k1_eckey_pubkey_tweak_add) is also used in the following API functions, so they should all benefit from it:

secp256k1_xonly_pubkey_tweak_add
secp256k1_xonly_pubkey_tweak_add_check (this one is used for verifying P2TR script path spends in Bitcoin Core and thus consensus-critical, see BIP341)
secp256k1_keypair_xonly_tweak_add
secp256k1_musig_pubkey_{ec,xonly}_tweak_add
secp256k1_silentpayments_recipient_scan_outputs in PR #1765

Improving Silent Payments scanning performance was the original motivation to look into this (see also https://github.com/craigraw/bench_bip352/), and in fact the "common case" benchmarks show a 15-20% speedup if this branch is applied.

[1] I've also experimented with reintroducing the much simpler ECMULT_GEN_PREC_BITS precomputation table (which was used prior to SDMC in versions earlier than 0.5.0) and that performs even better with PREC_BITS=8: https://github.com/theStack/secp256k1/commit/8f7fc93db4710ea2b7f88c7c458de7d4832968b4 (needing only 31 point additions rather than 42). However, given that it's unclear where to store this table data (having the choice of either bloating up the library size further or again needing some form of context, if it's calculated at runtime) I decided to stick with a solution that uses the already available table data first.

sipa commented at 10:10 PM on April 6, 2026: contributor

Neat.

My guess would be that even bigger SDMC tables would be even better, because the cost of constant-time table lookups disappears (which need to read through the entire table).
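The cost difference mentioned here can be sketched as follows. This is a simplified illustration over plain `uint32_t` entries, not the library's code: all `sketch_*` names and `SKETCH_TABLE_SIZE` are hypothetical, and the mask computation is only constant-time if the compiler does not turn it into a branch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TABLE_SIZE 32u /* hypothetical number of table entries */

/* Constant-time style lookup: read every entry and keep the wanted one via
 * a mask, so the memory access pattern does not depend on idx. The cost
 * grows linearly with the table size. */
static uint32_t sketch_lookup_scan(const uint32_t *table, uint32_t idx) {
    uint32_t result = 0;
    uint32_t i;
    for (i = 0; i < SKETCH_TABLE_SIZE; ++i) {
        /* All ones when i == idx, all zeros otherwise. */
        uint32_t mask = (uint32_t)0 - (uint32_t)(i == idx);
        result |= table[i] & mask;
    }
    return result;
}

/* Variable-time lookup: a single indexed read. The cost is independent of
 * the table size, but the access pattern reveals idx. */
static uint32_t sketch_lookup_direct(const uint32_t *table, uint32_t idx) {
    assert(idx < SKETCH_TABLE_SIZE);
    return table[idx];
}

/* Self-check: both lookups return the same entry for every index. */
static void sketch_lookup_self_check(void) {
    uint32_t table[SKETCH_TABLE_SIZE];
    uint32_t i;
    for (i = 0; i < SKETCH_TABLE_SIZE; ++i) {
        table[i] = i * 3u + 1u;
    }
    for (i = 0; i < SKETCH_TABLE_SIZE; ++i) {
        assert(sketch_lookup_scan(table, i) == sketch_lookup_direct(table, i));
    }
}
```

With the scanning variant, a larger table makes every lookup slower, so there is a sweet spot for the table size. With the direct variant, a larger table only costs memory and cache pressure.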

ecmult_gen: compute `ecmult_gen_scalar_diff` at compile-time

Having this available as a global constant allows introducing
an alternative `ecmult_gen` function that doesn't need access to
a context, see next commit.

Note that the precomputed constant takes the name of the function
that previously generated it at run-time (`secp256k1_ecmult_gen_scalar_diff`),
while the function is now renamed to include the "compute" verb
(`secp256k1_ecmult_gen_compute_scalar_diff`), to match the naming
of the table generation function.

d45dc46598

ecmult_gen: introduce `secp256k1_ecmult_gen_var`

Add a faster variable-time variant for generator point multiplication.
This is essentially `ecmult_gen` without side-channel mitigations and
without requiring a context object. Intended for use cases where the
scalar is not representing sensitive data (e.g. pubkey tweaking).

On arm64, this is ~66% faster than the constant-time variant
(with the default build table size, i.e. ECMULT_GEN_KB=86):

$ ./build/bin/bench_ecmult
Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)
ecmult_gen                    ,     9.94      ,    10.0       ,    10.1
ecmult_gen_var                ,     5.96      ,     5.99      ,     6.02
.....

6b5c0622b8

bench: add benchmark for `secp256k1_ec_pubkey_tweak_add` 9e555979c5

eckey: speed up `_pubkey_tweak_add` by using ecmult_gen_var

Rather than using the generic double multiply (with na=1), use the
newly introduced fast variable-time generator point multiplication
and add it up to the base public key manually.

On arm64, this improves the performance of the
`secp256k1_ec_pubkey_tweak_add` API function by about 80%:

----- Before (prior to this commit): -----
```
$ ./build/bin/bench tweak
Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)

ec_pk_tweak_add               ,    16.1       ,    16.2       ,    16.6
```

----- After (this commit): -----
```
$ ./build/bin/bench tweak
Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)

ec_pk_tweak_add               ,     8.94      ,     8.98      ,     9.29
```

Note that the following API functions also benefit from the improved code path:
- secp256k1_xonly_pubkey_tweak_add
- secp256k1_xonly_pubkey_tweak_add_check
  (this one is consensus-critical for P2TR script path spends, see BIP341)
- secp256k1_keypair_xonly_tweak_add
- secp256k1_musig_pubkey_ec_tweak_add
- secp256k1_musig_pubkey_xonly_tweak_add

0ebb1aabf6

in src/ecmult_gen_impl.h:349 in f64306aa5b

 344 | +                uint32_t bitdata = secp256k1_rotr32(recoded[bit_pos >> 5], bit_pos & 0x1f);
 345 | +
 346 | +                /* Clear the bit at position tooth */
 347 | +                bits &= ~(1 << tooth);
 348 | +
 349 | +                /* Write the bit into position tooth (and junk into higher bits). */

sipa commented at 10:15 PM on April 6, 2026:

I think you can avoid this power side-channel protection mechanism here. It's very cheap, but not worth the cost when the data isn't secret to begin with.

theStack commented at 6:21 PM on April 8, 2026:

Makes sense, removed it.
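For illustration, the two ways of gathering the comb bits can be compared in isolation. This is a sketch under assumptions: the `SK_*` parameters are example values chosen for this snippet, the `sk_*` helpers are hypothetical names, and the hardened variant omits the `volatile` mask used in the real code. Only the low `SK_TEETH` bits of the hardened result are meaningful; the higher bits contain junk by design.

```c
#include <assert.h>
#include <stdint.h>

/* Example comb parameters (assumptions made for this sketch only). */
#define SK_BLOCKS 11u
#define SK_TEETH 6u
#define SK_SPACING 4u
#define SK_BITS (SK_BLOCKS * SK_TEETH * SK_SPACING) /* 264 */
#define SK_WORDS ((SK_BITS + 31u) >> 5)             /* 9 */

/* Rotate right by n bits; well-defined for n == 0 as well. */
static uint32_t sk_rotr32(uint32_t x, unsigned int n) {
    return (x >> (n & 31u)) | (x << ((32u - n) & 31u));
}

/* Hardened style: xor rotated words together, so intermediate values never
 * collapse to a single bit. Bits at positions >= SK_TEETH are junk. */
static uint32_t sk_gather_hardened(const uint32_t *recoded, uint32_t comb_off, uint32_t block) {
    uint32_t bits = 0, tooth;
    uint32_t bit_pos = comb_off + block * SK_TEETH * SK_SPACING;
    for (tooth = 0; tooth < SK_TEETH; ++tooth) {
        uint32_t bitdata = sk_rotr32(recoded[bit_pos >> 5], bit_pos & 0x1f);
        bits &= ~((uint32_t)1 << tooth); /* clear the bit at position tooth */
        bits ^= bitdata << tooth;        /* write it, plus junk above it */
        bit_pos += SK_SPACING;
    }
    return bits;
}

/* Plain style: extract each bit on its own and OR it into place. */
static uint32_t sk_gather_var(const uint32_t *recoded, uint32_t comb_off, uint32_t block) {
    uint32_t bits = 0, tooth;
    uint32_t bit_pos = comb_off + block * SK_TEETH * SK_SPACING;
    for (tooth = 0; tooth < SK_TEETH; ++tooth) {
        uint32_t bit = (recoded[bit_pos >> 5] >> (bit_pos & 0x1f)) & 1u;
        bits |= bit << tooth;
        bit_pos += SK_SPACING;
    }
    return bits;
}

/* Self-check: after masking off the junk bits, both variants agree. */
static void sk_self_check(void) {
    const uint32_t recoded[SK_WORDS] = {
        0x01234567u, 0x89abcdefu, 0xdeadbeefu, 0x0badf00du, 0xfeedfaceu,
        0xcafebabeu, 0x13579bdfu, 0x2468ace0u, 0u /* zero padding word */
    };
    const uint32_t mask = ((uint32_t)1 << SK_TEETH) - 1u;
    uint32_t comb_off, block;
    for (comb_off = 0; comb_off < SK_SPACING; ++comb_off) {
        for (block = 0; block < SK_BLOCKS; ++block) {
            assert((sk_gather_hardened(recoded, comb_off, block) & mask)
                   == sk_gather_var(recoded, comb_off, block));
        }
    }
}
```

Since both variants yield the same packed bits once the junk is masked off, dropping the hardening changes only the leakage behavior, not the computed result.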

theStack commented at 6:20 PM on April 8, 2026: contributor

Added a preparatory commit that moves the ecmult_gen_scalar_diff constant precomputation from run-time to compile-time with the precompute_ecmult_gen binary (for the exhaustive tests though, it's generated at run-time, similar to how it's done with the table), and addressed #1843 (review).

> My guess would be that even bigger SDMC tables would be even better, because the cost of constant-time table lookups disappears (which need to read through the entire table).

Ah, indeed! I've tried with table sizes up to ~18 MB, and at least on my arm64 machine, the ~5MB one (GEN_KB=5120) yields the best results (see https://github.com/theStack/secp256k1/commit/aa1a4e719f90c8b9ab5b418a7254f5487f969d51 and $ ./bench_gen_var_with_all_tables.sh):

| ECMULT_GEN_KB | `ecmult_gen` | `ecmult_gen_var` | `ec_pubkey_tweak_add` | speedup to unoptimized `ec_pubkey_tweak_add` (16.3 us) |
|---|---|---|---|---|
| 2 | 11.7 us | 8.81 us | 11.7 us | 39.3% |
| 22 | 10.5 us | 6.31 us | 9.05 us | 80.1% |
| 86 | 9.96 us | 5.94 us | 8.67 us | 88.0% |
| 148 | 11.4 us | 5.18 us | 7.91 us | 106.1% |
| 256 | 14.3 us | 4.46 us | 7.22 us | 125.8% |
| 464 | 21.1 us | 4.04 us | 6.85 us | 138.0% |
| 832 | 33.1 us | 3.66 us | 6.46 us | 152.3% |
| 1536 | 61.2 us | 3.46 us | 6.21 us | 162.5% |
| 2816 | 103.0 us | 3.13 us | 6.02 us | 170.8% |
| 5120 | 188.0 us | 2.85 us | 5.80 us | 181.0% |
| 9728 | 371.0 us | 2.95 us | 6.33 us | 157.5% |
| 18432 | 506.0 us | 4.02 us | 7.32 us | 122.7% |
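As a reading aid for the speedup column: the figures in the table are consistent with `before / after - 1`, i.e. the relative gain in throughput, which is why values above 100% are possible. The helper names below are hypothetical and only serve this sketch. The second function shows the alternative "time saved" view of the same measurements.

```c
#include <assert.h>

/* Relative speedup in percent: how much more work fits into the same time.
 * 100% means the operation takes half as long as before. */
static double speedup_percent(double before_us, double after_us) {
    assert(before_us > 0.0 && after_us > 0.0);
    return (before_us / after_us - 1.0) * 100.0;
}

/* Share of the original run time that is saved, in percent. */
static double time_saved_percent(double before_us, double after_us) {
    assert(before_us > 0.0 && after_us > 0.0);
    return (1.0 - after_us / before_us) * 100.0;
}

/* Compare two values up to an absolute tolerance. */
static int approx_equal(double a, double b, double tol) {
    double d = a - b;
    if (d < 0.0) {
        d = -d;
    }
    return d <= tol;
}
```

For the GEN_KB=5120 row, `speedup_percent(16.3, 5.80)` gives roughly 181%, while `time_saved_percent(16.3, 5.80)` gives roughly 64%: the same measurement expressed in two ways.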

So we would need to embed two different tables to get the most out of it (without degrading the signing/nonce generation speed). Not sure how performance-critical pubkey tweaking is in practice though; the only currently widespread scenario I can think of is verification of Taproot commitments. For Silent Payments I still have to check how these larger table values would affect overall performance.

theStack force-pushed on Apr 8, 2026

w0xlt commented at 8:39 PM on April 9, 2026: contributor

Concept ACK

sipa commented at 4:20 PM on April 10, 2026: contributor

@theStack Yeah, I don't think it's worth having two built-in tables just for this, but it's good to see my intuition confirmed.

w0xlt commented at 11:42 PM on April 12, 2026: contributor

The code can be deduplicated.

diff --git a/src/eckey_impl.h b/src/eckey_impl.h
index 9cee88d..566a318 100644
--- a/src/eckey_impl.h
+++ b/src/eckey_impl.h
@@ -61,6 +61,7 @@ static int secp256k1_eckey_privkey_tweak_add(secp256k1_scalar *key, const secp25
 
 static int secp256k1_eckey_pubkey_tweak_add(secp256k1_ge *key, const secp256k1_scalar *tweak) {
     secp256k1_gej pt;
+    /* `tweak` is public here, so the variable-time generator multiplication is safe. */
     secp256k1_ecmult_gen_var(&pt, tweak);
     secp256k1_gej_add_ge_var(&pt, &pt, key, NULL);
 
diff --git a/src/ecmult_gen.h b/src/ecmult_gen.h
index 74942ed..567cc48 100644
--- a/src/ecmult_gen.h
+++ b/src/ecmult_gen.h
@@ -138,6 +138,10 @@ static void secp256k1_ecmult_gen_context_clear(secp256k1_ecmult_gen_context* ctx
 
 /** Multiply with the generator: R = a*G */
 static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context* ctx, secp256k1_gej *r, const secp256k1_scalar *a);
+/** Multiply with the generator: R = a*G.
+ *
+ * Variable-time implementation. Only call this for public/non-secret scalars.
+ */
 static void secp256k1_ecmult_gen_var(secp256k1_gej *r, const secp256k1_scalar *a);
 
 static void secp256k1_ecmult_gen_blind(secp256k1_ecmult_gen_context *ctx, const secp256k1_hash_ctx *hash_ctx, const unsigned char *seed32);
diff --git a/src/ecmult_gen_impl.h b/src/ecmult_gen_impl.h
index 8e4209f..3c4460e 100644
--- a/src/ecmult_gen_impl.h
+++ b/src/ecmult_gen_impl.h
@@ -30,17 +30,89 @@ static void secp256k1_ecmult_gen_context_clear(secp256k1_ecmult_gen_context *ctx
     secp256k1_fe_clear(&ctx->proj_blind);
 }
 
+/* Convert a scalar to a zero-padded array of 32-bit words. Only the first 8 words
+ * can contain non-zero data, but the padding avoids out-of-bounds reads from the
+ * scalar when COMB_BITS > 256. */
+static void secp256k1_ecmult_gen_scalar_to_recoded(uint32_t recoded[(COMB_BITS + 31) >> 5], const secp256k1_scalar *s) {
+    int i;
+
+    memset(recoded, 0, ((COMB_BITS + 31) >> 5) * sizeof(*recoded));
+    for (i = 0; i < 8 && i < ((COMB_BITS + 31) >> 5); ++i) {
+        recoded[i] = secp256k1_scalar_get_bits_limb32(s, 32 * i, 32);
+    }
+}
+
+/* Gather the mask(block)-selected bits of the recoded scalar into a packed value:
+ * bits[tooth] = d[(block*COMB_TEETH + tooth)*COMB_SPACING + comb_off].
+ *
+ * This constant-time variant mirrors the power side-channel hardening used by
+ * secp256k1_ecmult_gen. */
+static uint32_t secp256k1_ecmult_gen_lookup_bits(const uint32_t recoded[(COMB_BITS + 31) >> 5], uint32_t comb_off, uint32_t block) {
+    uint32_t bits = 0;
+    uint32_t bit_pos = comb_off + block * COMB_TEETH * COMB_SPACING;
+    uint32_t tooth;
+
+    /* Instead of reading individual bits here to construct the bits variable,
+     * build up the result by xoring rotated reads together. In every iteration,
+     * one additional bit is made correct, starting at the bottom. The bits
+     * above that contain junk. This reduces leakage by avoiding computations
+     * on variables that can have only a low number of possible values (e.g.,
+     * just two values when reading a single bit into a variable.) See:
+     * https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-alam.pdf
+     */
+    for (tooth = 0; tooth < COMB_TEETH; ++tooth) {
+        /* Construct bitdata s.t. the bottom bit is the bit we'd like to read.
+         *
+         * We could just set bitdata = recoded[bit_pos >> 5] >> (bit_pos & 0x1f)
+         * but this would simply discard the bits that fall off at the bottom,
+         * and thus, for example, bitdata could still have only two values if we
+         * happen to shift by exactly 31 positions. We use a rotation instead,
+         * which ensures that bitdata doesn't lose entropy. This relies on the
+         * rotation being atomic, i.e., the compiler emitting an actual rot
+         * instruction. */
+        uint32_t bitdata = secp256k1_rotr32(recoded[bit_pos >> 5], bit_pos & 0x1f);
+
+        /* Clear the bit at position tooth, but sssh, don't tell clang. */
+        uint32_t volatile vmask = ~(1 << tooth);
+        bits &= vmask;
+
+        /* Write the bit into position tooth (and junk into higher bits). */
+        bits ^= bitdata << tooth;
+        bit_pos += COMB_SPACING;
+    }
+    return bits;
+}
+
+/* Same bit packing as secp256k1_ecmult_gen_lookup_bits(), but without the
+ * constant-time/power-analysis hardening as the scalar is public here. */
+static uint32_t secp256k1_ecmult_gen_lookup_bits_var(const uint32_t recoded[(COMB_BITS + 31) >> 5], uint32_t comb_off, uint32_t block) {
+    uint32_t bits = 0;
+    uint32_t bit_pos = comb_off + block * COMB_TEETH * COMB_SPACING;
+    uint32_t tooth;
+
+    for (tooth = 0; tooth < COMB_TEETH; ++tooth) {
+        uint32_t bit = (recoded[bit_pos >> 5] >> (bit_pos & 0x1f)) & 1;
+        bits |= bit << tooth;
+        bit_pos += COMB_SPACING;
+    }
+    return bits;
+}
+
+static void secp256k1_ecmult_gen_lookup_table_index(uint32_t bits, uint32_t *sign, uint32_t *abs) {
+    *sign = (bits >> (COMB_TEETH - 1)) & 1;
+    *abs = (bits ^ -*sign) & (COMB_POINTS - 1);
+    VERIFY_CHECK(*sign == 0 || *sign == 1);
+    VERIFY_CHECK(*abs < COMB_POINTS);
+}
+
 static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context *ctx, secp256k1_gej *r, const secp256k1_scalar *gn) {
     uint32_t comb_off;
     secp256k1_ge add;
     secp256k1_fe neg;
     secp256k1_ge_storage adds;
     secp256k1_scalar d;
-    /* Array of uint32_t values large enough to store COMB_BITS bits. Only the bottom
-     * 8 are ever nonzero, but having the zero padding at the end if COMB_BITS>256
-     * avoids the need to deal with out-of-bounds reads from a scalar. */
-    uint32_t recoded[(COMB_BITS + 31) >> 5] = {0};
-    int first = 1, i;
+    uint32_t recoded[(COMB_BITS + 31) >> 5];
+    int first = 1;
 
     memset(&adds, 0, sizeof(adds));
 
@@ -88,9 +160,7 @@ static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context *ctx, secp25
     /* Compute the scalar d = (gn + ctx->scalar_offset). */
     secp256k1_scalar_add(&d, &ctx->scalar_offset, gn);
     /* Convert to recoded array. */
-    for (i = 0; i < 8 && i < ((COMB_BITS + 31) >> 5); ++i) {
-        recoded[i] = secp256k1_scalar_get_bits_limb32(&d, 32 * i, 32);
-    }
+    secp256k1_ecmult_gen_scalar_to_recoded(recoded, &d);
     secp256k1_scalar_clear(&d);
 
     /* In secp256k1_ecmult_gen_prec_table we have precomputed sums of the
@@ -171,47 +241,11 @@ static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context *ctx, secp25
     comb_off = COMB_SPACING - 1;
     while (1) {
         uint32_t block;
-        uint32_t bit_pos = comb_off;
         /* Inner loop: for each block, add table entries to the result. */
         for (block = 0; block < COMB_BLOCKS; ++block) {
-            /* Gather the mask(block)-selected bits of d into bits. They're packed:
-             * bits[tooth] = d[(block*COMB_TEETH + tooth)*COMB_SPACING + comb_off]. */
-            uint32_t bits = 0, sign, abs, index, tooth;
-            /* Instead of reading individual bits here to construct the bits variable,
-             * build up the result by xoring rotated reads together. In every iteration,
-             * one additional bit is made correct, starting at the bottom. The bits
-             * above that contain junk. This reduces leakage by avoiding computations
-             * on variables that can have only a low number of possible values (e.g.,
-             * just two values when reading a single bit into a variable.) See:
-             * https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-alam.pdf
-             */
-            for (tooth = 0; tooth < COMB_TEETH; ++tooth) {
-                /* Construct bitdata s.t. the bottom bit is the bit we'd like to read.
-                 *
-                 * We could just set bitdata = recoded[bit_pos >> 5] >> (bit_pos & 0x1f)
-                 * but this would simply discard the bits that fall off at the bottom,
-                 * and thus, for example, bitdata could still have only two values if we
-                 * happen to shift by exactly 31 positions. We use a rotation instead,
-                 * which ensures that bitdata doesn't lose entropy. This relies on the
-                 * rotation being atomic, i.e., the compiler emitting an actual rot
-                 * instruction. */
-                uint32_t bitdata = secp256k1_rotr32(recoded[bit_pos >> 5], bit_pos & 0x1f);
-
-                /* Clear the bit at position tooth, but sssh, don't tell clang. */
-                uint32_t volatile vmask = ~(1 << tooth);
-                bits &= vmask;
-
-                /* Write the bit into position tooth (and junk into higher bits). */
-                bits ^= bitdata << tooth;
-                bit_pos += COMB_SPACING;
-            }
-
-            /* If the top bit of bits is 1, flip them all (corresponding to looking up
-             * the negated table value), and remember to negate the result in sign. */
-            sign = (bits >> (COMB_TEETH - 1)) & 1;
-            abs = (bits ^ -sign) & (COMB_POINTS - 1);
-            VERIFY_CHECK(sign == 0 || sign == 1);
-            VERIFY_CHECK(abs < COMB_POINTS);
+            uint32_t bits, sign, abs, index;
+            bits = secp256k1_ecmult_gen_lookup_bits(recoded, comb_off, block);
+            secp256k1_ecmult_gen_lookup_table_index(bits, &sign, &abs);
 
             /** This uses a conditional move to avoid any secret data in array indexes.
              *   _Any_ use of secret indexes has been demonstrated to result in timing
@@ -275,38 +309,22 @@ static void secp256k1_ecmult_gen_var(secp256k1_gej *r, const secp256k1_scalar *g
     uint32_t comb_off;
     secp256k1_ge add;
     secp256k1_scalar d;
-    uint32_t recoded[(COMB_BITS + 31) >> 5] = {0};
-    int i;
+    uint32_t recoded[(COMB_BITS + 31) >> 5];
 
     /* Adjust input scalar for difference and convert to recoded array. */
     secp256k1_scalar_add(&d, &secp256k1_ecmult_gen_scalar_diff, gn);
-    for (i = 0; i < 8 && i < ((COMB_BITS + 31) >> 5); ++i) {
-        recoded[i] = secp256k1_scalar_get_bits_limb32(&d, 32 * i, 32);
-    }
+    secp256k1_ecmult_gen_scalar_to_recoded(recoded, &d);
 
     /* Outer loop: iterate over comb_off from COMB_SPACING - 1 down to 0. */
     secp256k1_gej_set_infinity(r);
     comb_off = COMB_SPACING - 1;
     while (1) {
         uint32_t block;
-        uint32_t bit_pos = comb_off;
         /* Inner loop: for each block, add table entries to the result. */
         for (block = 0; block < COMB_BLOCKS; ++block) {
-            /* Gather the mask(block)-selected bits of d into bits. They're packed:
-             * bits[tooth] = d[(block*COMB_TEETH + tooth)*COMB_SPACING + comb_off]. */
-            uint32_t bits = 0, sign, abs, tooth;
-            for (tooth = 0; tooth < COMB_TEETH; ++tooth) {
-                uint32_t bit = (recoded[bit_pos >> 5] >> (bit_pos & 0x1f)) & 1;
-                bits |= bit << tooth;
-                bit_pos += COMB_SPACING;
-            }
-
-            /* If the top bit of bits is 1, flip them all (corresponding to looking up
-             * the negated table value), and remember to negate the result in sign. */
-            sign = (bits >> (COMB_TEETH - 1)) & 1;
-            abs = (bits ^ -sign) & (COMB_POINTS - 1);
-            VERIFY_CHECK(sign == 0 || sign == 1);
-            VERIFY_CHECK(abs < COMB_POINTS);
+            uint32_t bits, sign, abs;
+            bits = secp256k1_ecmult_gen_lookup_bits_var(recoded, comb_off, block);
+            secp256k1_ecmult_gen_lookup_table_index(bits, &sign, &abs);
 
             /* Perform lookup, negate if necessary and add to r. */
             secp256k1_ge_from_storage(&add, &secp256k1_ecmult_gen_prec_table[block][abs]);


w0xlt commented at 12:21 AM on April 13, 2026: contributor

ACK 0ebb1aabf62a201aef6da5e5a9e9719afbb26a90 mod above nit

real-or-random added the label performance on Apr 13, 2026

real-or-random added the label tweak/refactor on Apr 13, 2026

real-or-random commented at 11:22 AM on April 13, 2026: contributor

> to take full advantage of the fact that the tweak scalar is typically not considered sensitive data

The word "typically" is what worries me here. We don't really know what people use these functions for. What if they do have secret tweaks?

And isn't the tweak secret even in known applications, e.g., Taproot tweaking, say Q = P + tG? One security feature of Taproot is that it hides the Merkle root until a script-path spend. This only works if the tweak t is secret. If t is public, then the attacker can get P = Q - tG and search for a Merkle root m such that t = hash(P, m).

theStack commented at 1:22 PM on April 13, 2026: contributor

> to take full advantage of the fact that the tweak scalar is typically not considered sensitive data
>
> The word "typically" is what worries me here. We don't really know what people use these functions for. What if they do have secret tweaks?

Hm, good point, the "if something sounds too good to be true, it probably is" proverb comes to mind :sweat_smile: Fwiw, what made me believe that the change is fine (in the sense that it doesn't make things worse w.r.t. security compared to what we have now) is that currently public key tweaking is not side-channel resistant either, as variable-time scalar multiplication (ecmult) is used and the tweak scalar isn't cleared out anywhere. Seems like it would be better to go in the other direction and use ecmult_gen then? (IIRC, I experimented with that a while ago, and it was still a few microseconds faster than the current code).

> And isn't the tweak secret even in known applications, e.g., Taproot tweaking, say Q = P + tG? One security feature of Taproot is that it hides the Merkle root until a script-path spend. This only works if the tweak t is secret. If t is public, then the attacker can get P = Q - tG and search for a Merkle root m such that t = hash(P, m).

Makes sense. For verifying Taproot tweaks, i.e. secp256k1_xonly_pubkey_tweak_add_check, treating the tweak as public seems to be fine though, as that one is called after revealing the Merkle root? Maybe it's even worth it to have two different eckey_pubkey_tweak_add variants internally, one secure and one fast? The former could be used for creating tweaks, the latter for verifying them, or for special use cases like Silent Payments.

sipa commented at 2:21 PM on April 13, 2026: contributor

> One security feature of Taproot is that it hides the Merkle root until a script-path spend.

Hmm, I would consider that a privacy feature, and not a security feature. Generally, taproot script trees will involve multiple distrusted participants, and the leaf/script structure will be shared with them. This doesn't work if the script tree is supposed to be secret from a security perspective.

And I think that we're generally not aiming to use constant time operations to protect privacy. Otherwise, it literally applies to everything. You may well want to keep your public keys private, so all functions involving public keys should be constant time?

It may make sense to have a mode/extension/module for "do everything constant time", but I don't think it aligns with how the library has operated so far.

sedited referenced this in commit 963ea38c0c on Apr 19, 2026

theStack commented at 8:25 PM on April 22, 2026: contributor

Fwiw the script verification benchmark for P2TR script-path spends in Bitcoin Core (recently introduced in #35038) shows a ~20% speedup on my machine if this branch is applied to the secp256k1 subtree:

master branch:

$ ./build/bin/bench_bitcoin -filter=VerifyScriptP2TR_ScriptPath

| ns/script | script/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 43,380.08 | 23,052.06 | 0.2% | 0.01 | VerifyScriptP2TR_ScriptPath

Branch apply-secp-pr-1843:

$ ./build/bin/bench_bitcoin -filter=VerifyScriptP2TR_ScriptPath

| ns/script | script/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 35,948.50 | 27,817.57 | 0.1% | 0.01 | VerifyScriptP2TR_ScriptPath

If using variable-time operations on all tweaking operations is seen as controversial (though #1843 (comment) sounds pretty convincing that it isn't, especially considering that they are currently variable-time as well?), it could still be selectively applied to secp256k1_xonly_pubkey_tweak_add_check. Though I assume script-path spends are still rare enough in practice that the average node runner wouldn't perceive much of a difference.

real-or-random commented at 8:26 AM on April 27, 2026: contributor

> And I think that we're generally not aiming to use constant time operations to protect privacy.

A counterexample is constant-time ECDH in EllSwift. This is clearly done to protect privacy. (In the case of our primary target BIP324, it's not even to protect long-term keys because there are none.)

But this is just an example that I picked because it's so clear. In general, things are much less clear.

> Otherwise, it literally applies to everything. You may well want to keep your public keys private, so all functions involving public keys should be constant time?

Security and privacy are properties of high-level protocols and applications, and they depend on the user's threat model.

This makes it (relatively) easy for EllSwift. We know that the primary security/privacy goal is confidentiality of the shared secret. Of course we'd like to prevent its leakage. This includes the leakage of the input secret keys because they can be used to compute the shared secret.

In contrast, the tweaking functions are low-level primitives. Maybe we shouldn't have exposed them for this reason, but we can't turn back time. At the level of our code, all we can do is distinguish "secret" inputs and outputs from "non-secret" ones and document this choice. But we don't know what the application context is, so we don't know which inputs and outputs should be considered secret and which not.

What makes it even more difficult is that there's no clear cut between low-level primitives and high-level systems. Indeed, perhaps someone wants to keep their public Schnorr or ECDSA keys private [1]. And sure, there's a good reason to do this, even in Bitcoin.

In the end, we need to make some assumptions and draw the line somewhere. I'm not sure if this matched your thinking when writing the library initially, but the way I see it is like this: while it would be nice to treat ECDSA public keys as "secrets", this would only be useful if done in the entire stack. And this seems a hopeless game, so it's pragmatic to accept that public keys are "non-secret".

What does all of this mean for the concrete case of public key tweaking? Let's say we tweak P = xG to P + tG. I agree that the Taproot example was not good. It's too far-fetched because it violates the aforementioned pragmatic rule that public keys are not secret. That is, P and P + tG are not secrets in Taproot.

But I want to make the point that t can be a secret in general. I can't rule out that there's a system out there somewhere in which the discrete logarithm of P is known to someone, and the security of that system relies on that someone not learning t (so that it doesn't learn x + t, the discrete logarithm of P + tG). This is particularly true because they could have looked at the implementation and seen that it's constant time, and also because we have treated scalars as secrets in the past. As a result, I believe weakening the constant-time guarantees is a (silent!) breaking change. As such, it requires a new API.
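Spelled out with the symbols used above ($P = xG$ and tweak $t$), the relation behind this concern is:

$$P + tG = xG + tG = (x + t)\,G$$

So a party that already knows $x$ and additionally learns $t$ (for example through a timing side channel) knows $x + t$, the discrete logarithm of the tweaked key.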

I think there are two reasonable ways to move forward:

1. Stick to the low-level paradigm, but offer different variants of the functions: the current one and one that explicitly treats t as public.
2. Offer a "Taproot (tweaking)" module that also takes care of the hashing. Then we will know the application context. (Or simpler, add _taproot versions of the functions without a separate module.)

I think, conceptually, 2 is the cleanest. If we'd also add a BIP32 module, then we could deprecate the low-level functions. But the latter has been mentioned a few times in the past, and so far, nobody has found the time to work on it.

For now, I think 1 is easier. The only case where performance clearly matters is secp256k1_xonly_pubkey_tweak_add_check. If we want to keep things simple, we could add a variant secp256k1_xonly_pubkey_tweak_add_check_var here.

In case of secp256k1_xonly_pubkey_tweak_add_check, it's rather likely that this function is only used for verification. So if we want to keep things even simpler, I think I could also be convinced to just change it to variable time and make the silent breaking change. But it feels wrong to, and if I had to decide, I'd prefer adding a variant.

[1] Not even considering all the quantum talk. I simply have user privacy in mind.

theStack commented at 5:18 PM on April 30, 2026: contributor

@real-or-random: Thanks for elaborating! What still confuses me is the claim about "constant-time guarantees" for the current pubkey tweak functions. As far as I can tell, these don't hold, considering that variable-time point multiplication (ecmult) is used [1], e.g. for _pubkey_tweak_add: https://github.com/bitcoin-core/secp256k1/blob/b11340b3ce2afac1b6ffda4ce5828c30621d2917/src/eckey_impl.h#L62-L72 The Jacobian->affine conversion at the end is constant-time, but that alone doesn't seem to help? Would this PR really weaken anything in that regard?

[1] Related shower thought: for consistency with other internal functions, maybe we should rename ecmult -> ecmult_var and ecmult_const -> ecmult? :thinking:

Kino1994 referenced this in commit 76fc5b2fb0 on Jun 28, 2026

theStack commented at 12:38 AM on June 29, 2026: contributor

Opened #1883, which only contains the ecmult_gen_var implementation of this PR (first two commits rebased) without applying it in any code path that is reachable for public API functions yet. Closing this one, as modifying the behavior of pubkey tweaking API functions w.r.t. constant-timing guarantees is clearly controversial (https://github.com/bitcoin-core/secp256k1/pull/1843#issuecomment-4325379402). I still think the current guarantees are not very strong, but if we do something there (strengthening them by e.g. multiplying via ecmult_const, or documenting them in the API headers as-is?) it would likely belong to a different PR.

theStack closed this on Jun 29, 2026