optimize pubkey tweaking by using vartime jacobian->affine conversion (~10% speedup) #1844

pull theStack wants to merge 2 commits into bitcoin-core:master from theStack:pubkey-tweak-use_vartime_group_to_affine changing 2 files +49 −5

theStack commented at 4:19 PM on April 11, 2026: contributor
This PR has a very similar theme and motivation to #1843, but is significantly simpler.

Tweaks are not considered sensitive data, and the point multiplications are not performed in constant time either, so we can also use the variable-time variant for converting to affine coordinates (ge_set_gej_var) to improve performance. On my arm64 machine, this shows a ~10% speedup for both the additive and multiplicative public key tweaking API functions:

master branch
```
$ ./build/bin/bench tweak
Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)

ec_pk_tweak_add               ,    16.3       ,    16.4       ,    16.7
ec_pk_tweak_mul               ,    19.7       ,    19.8       ,    19.8
```
PR branch
```
$ ./build/bin/bench tweak
Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)

ec_pk_tweak_add               ,    14.6       ,    14.7       ,    14.9
ec_pk_tweak_mul               ,    18.1       ,    18.1       ,    18.2
```
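Taking the Avg(us) column of the two runs above, the savings work out roughly as follows (a back-of-the-envelope calculation on the posted numbers, not a new measurement):

```latex
\begin{aligned}
\texttt{ec\_pk\_tweak\_add}:\quad & 16.4 - 14.7 = 1.7\ \mu\text{s}, & 1.7 / 16.4 &\approx 10.4\% \\
\texttt{ec\_pk\_tweak\_mul}:\quad & 19.8 - 18.1 = 1.7\ \mu\text{s}, & 1.7 / 19.8 &\approx 8.6\%
\end{aligned}
```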
The reduced execution time roughly matches the difference between fe_inv and fe_inv_var (see internal benchmarks), ~1.6 us on my machine.
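To illustrate why the inversion dominates the difference, here is a minimal, self-contained sketch of a Jacobian->affine conversion over a toy prime field. All names (`toy_gej`, `toy_ge`, `toy_inv`, `toy_inv_var`, `toy_ge_set_gej`) and the 31-bit prime are hypothetical and chosen for illustration only; this is not the library's code, and the two toy inverses (Fermat exponentiation with a fixed schedule, and the extended Euclidean algorithm) are stand-ins that do not mirror how `fe_inv` and `fe_inv_var` are actually implemented. The only point is that the conversion needs a single field inversion, and that this inversion can be swapped for an input-dependent, faster variant when the input is not secret.

```c
#include <assert.h>
#include <stdint.h>

/* Toy prime 2^31 - 1 (assumption for illustration; the real field is 256 bits). */
#define TOY_P 2147483647ULL

typedef struct { uint64_t x, y, z; } toy_gej; /* Jacobian coordinates (X, Y, Z) */
typedef struct { uint64_t x, y; } toy_ge;     /* affine coordinates (x, y) */

static uint64_t toy_mul(uint64_t a, uint64_t b) {
    return (a * b) % TOY_P; /* a, b < 2^31, so the product fits in 64 bits */
}

/* Fixed-schedule inverse: a^(p-2) mod p (Fermat). The loop always processes
 * all 31 bits of the public exponent, independent of the value of a.
 * (The % operator itself is not guaranteed to be constant time; toy only.) */
static uint64_t toy_inv(uint64_t a) {
    uint64_t result = 1, base = a, e = TOY_P - 2;
    int i;
    for (i = 0; i < 31; i++) {
        if ((e >> i) & 1) result = toy_mul(result, base);
        base = toy_mul(base, base);
    }
    return result;
}

/* Variable-time inverse: extended Euclidean algorithm. The number of loop
 * iterations depends on a, which is acceptable only for non-secret inputs. */
static uint64_t toy_inv_var(uint64_t a) {
    int64_t r0 = (int64_t)TOY_P, r1 = (int64_t)a;
    int64_t t0 = 0, t1 = 1;
    while (r1 != 0) {
        int64_t q = r0 / r1;
        int64_t tmp;
        tmp = r0 - q * r1; r0 = r1; r1 = tmp;
        tmp = t0 - q * t1; t0 = t1; t1 = tmp;
    }
    /* Invariant t0 * a == r0 (mod p); r0 is now gcd(p, a) = 1 for a != 0. */
    return (uint64_t)(t0 < 0 ? t0 + (int64_t)TOY_P : t0);
}

/* Jacobian -> affine: x = X / Z^2, y = Y / Z^3, using one field inversion.
 * The inversion routine is passed in so both variants share the same code. */
static toy_ge toy_ge_set_gej(const toy_gej *p, uint64_t (*inv)(uint64_t)) {
    toy_ge r;
    uint64_t zi, zi2, zi3;
    assert(p->z != 0); /* the point at infinity is not handled in this sketch */
    zi = inv(p->z);
    zi2 = toy_mul(zi, zi);
    zi3 = toy_mul(zi2, zi);
    r.x = toy_mul(p->x, zi2);
    r.y = toy_mul(p->y, zi3);
    return r;
}
```

Calling `toy_ge_set_gej(&p, toy_inv)` and `toy_ge_set_gej(&p, toy_inv_var)` on the same input yields the same affine result; only the running-time behaviour of the inversion differs, which is the trade-off this PR makes for public (non-secret) tweak results.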

Note that the improved code path for additive tweaking (function secp256k1_eckey_pubkey_tweak_add) is also used in the following API functions, so they should all benefit from it:
- secp256k1_xonly_pubkey_tweak_add
- secp256k1_xonly_pubkey_tweak_add_check (this one is used for verifying P2TR script path spends in Bitcoin Core and thus consensus-critical, see BIP341)
- secp256k1_keypair_xonly_tweak_add
- secp256k1_musig_pubkey_{ec,xonly}_tweak_add
- secp256k1_silentpayments_recipient_scan_outputs in PR #1765
bench: add benchmarks for `secp256k1_ec_pubkey_tweak_{add,mul}` 0a08281e59

eckey: use vartime jacobian->affine conversion for pubkey tweaking

Tweaks are not considered sensitive data, and the point multiplications
are not performed in constant-time either, so we can use the
variable-time variant for converting to affine coordinates (ge_set_gej_var)
as well to improve the performance. On my arm64 machine, this shows a
~10% speedup for both the additive and multiplicative public key tweaking
API functions:

----- Before (prior to this commit): -----
```
$ ./build/bin/bench tweak
Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)

ec_pk_tweak_add               ,    16.3       ,    16.4       ,    16.7
ec_pk_tweak_mul               ,    19.7       ,    19.8       ,    19.8
```

----- After (this commit): -----
```
$ ./build/bin/bench tweak
Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)

ec_pk_tweak_add               ,    14.6       ,    14.7       ,    14.9
ec_pk_tweak_mul               ,    18.1       ,    18.1       ,    18.2
```

The reduced execution time roughly matches the difference between
`fe_inv` and `fe_inv_var` (see internal benchmarks), ~1.6 us on my machine.

5a9e0a1dd8

theStack renamed this:
~~optimize pubkey tweaking by using vartime jabobian->affine conversion (~10% speedup)~~
optimize pubkey tweaking by using vartime jacobian->affine conversion (~10% speedup)
on Apr 11, 2026
w0xlt approved

w0xlt commented at 12:20 AM on April 13, 2026: contributor

ACK 5a9e0a1dd8dc2089fbf20f8658c9fa296dbb253c

nit:

```diff
diff --git a/src/bench.c b/src/bench.c
index c971b94..d1f550f 100644
--- a/src/bench.c
+++ b/src/bench.c
@@ -170,7 +170,7 @@ static void bench_tweak_setup(void* arg) {
     }
 }
 
-static void bench_pubkey_tweak_add(void *arg, int iters) {
+static void bench_pubkey_tweak_add_run(void *arg, int iters) {
     int i;
     bench_data *data = (bench_data*)arg;
 
@@ -183,7 +183,7 @@ static void bench_pubkey_tweak_add(void *arg, int iters) {
     }
 }
 
-static void bench_pubkey_tweak_mul(void *arg, int iters) {
+static void bench_pubkey_tweak_mul_run(void *arg, int iters) {
     int i;
     bench_data *data = (bench_data*)arg;
 
@@ -303,8 +303,8 @@ int main(int argc, char** argv) {
 
     if (d || have_flag(argc, argv, "ecdsa") || have_flag(argc, argv, "sign") || have_flag(argc, argv, "ecdsa_sign")) run_benchmark("ecdsa_sign", bench_sign_run, bench_sign_setup, NULL, &data, 10, iters);
     if (d || have_flag(argc, argv, "ec") || have_flag(argc, argv, "keygen") || have_flag(argc, argv, "ec_keygen")) run_benchmark("ec_keygen", bench_keygen_run, bench_keygen_setup, NULL, &data, 10, iters);
-    if (d || have_flag(argc, argv, "ec") || have_flag(argc, argv, "tweak") || have_flag(argc, argv, "ec_pk_tweak_add")) run_benchmark("ec_pk_tweak_add", bench_pubkey_tweak_add, bench_tweak_setup, NULL, &data, 10, iters);
-    if (d || have_flag(argc, argv, "ec") || have_flag(argc, argv, "tweak") || have_flag(argc, argv, "ec_pk_tweak_mul")) run_benchmark("ec_pk_tweak_mul", bench_pubkey_tweak_mul, bench_tweak_setup, NULL, &data, 10, iters);
+    if (d || have_flag(argc, argv, "ec") || have_flag(argc, argv, "tweak") || have_flag(argc, argv, "ec_pk_tweak_add")) run_benchmark("ec_pk_tweak_add", bench_pubkey_tweak_add_run, bench_tweak_setup, NULL, &data, 10, iters);
+    if (d || have_flag(argc, argv, "ec") || have_flag(argc, argv, "tweak") || have_flag(argc, argv, "ec_pk_tweak_mul")) run_benchmark("ec_pk_tweak_mul", bench_pubkey_tweak_mul_run, bench_tweak_setup, NULL, &data, 10, iters);
 
     secp256k1_context_destroy(data.ctx);
```

Contributors

theStack

w0xlt

Linked

#1765 Add "silentpayments" module implementing BIP352 (take 4, limited to full-node scanning)
#1843 optimize additive pubkey tweaking with vartime generator point multiplication (>80% speedup)