Additive public key tweaking (i.e., given a public key $P$ and a tweak scalar $t$, calculating $P' = P + t \cdot G$) is currently implemented using the "double multiply" algorithm `secp256k1_ecmult` ($R = na \cdot A + ng \cdot G$), with $na = 1$ and $ng = t$:
https://github.com/bitcoin-core/secp256k1/blob/7262adb4b40074201fb30847035a82b8d742f350/src/ecmult.h#L43-L47
https://github.com/bitcoin-core/secp256k1/blob/7262adb4b40074201fb30847035a82b8d742f350/src/eckey_impl.h#L62-L65
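For illustration only, the sketch below mimics what the current tweak path asks of `secp256k1_ecmult`: a Shamir-style joint double-and-add computing $na \cdot A + ng \cdot G$, called with $na = 1$. It is an assumption-laden toy: a hypothetical curve $y^2 = x^3 + 7$ over $\mathbb{F}_{1009}$, affine coordinates, 64-bit integers and made-up names (`toy_ecmult`, `pt_add`, ...). The real routine (wNAF, GLV, Jacobian coordinates) is far more involved. In this toy, with $na = 1$ the $A$ side of the joint loop contributes a single addition in the last step, so the work is essentially a generator multiplication plus one point addition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy curve y^2 = x^3 + 7 over F_TOY_P (same shape as
 * secp256k1, tiny prime). Illustration only: not constant-time, not secure. */
#define TOY_P 1009

typedef struct { int64_t x, y; int inf; } pt; /* inf = 1: point at infinity */

static int64_t md(int64_t a) { a %= TOY_P; return a < 0 ? a + TOY_P : a; }

/* Modular inverse via Fermat's little theorem (TOY_P is prime). */
static int64_t inv(int64_t a) {
    int64_t r = 1, b = md(a), e = TOY_P - 2;
    while (e) { if (e & 1) r = r * b % TOY_P; b = b * b % TOY_P; e >>= 1; }
    return r;
}

static int pt_eq(pt a, pt b) {
    return (a.inf && b.inf) || (!a.inf && !b.inf && a.x == b.x && a.y == b.y);
}

/* Affine addition; handles infinity, P + (-P) and doubling. */
static pt pt_add(pt a, pt b) {
    if (a.inf) return b;
    if (b.inf) return a;
    if (a.x == b.x && md(a.y + b.y) == 0) return (pt){0, 0, 1};
    int64_t l = (a.x == b.x)
        ? md(3 * a.x % TOY_P * a.x % TOY_P * inv(2 * a.y)) /* tangent slope */
        : md((b.y - a.y) * inv(b.x - a.x));                /* chord slope */
    int64_t x = md(l * l - a.x - b.x);
    return (pt){x, md(l * (a.x - x) - a.y), 0};
}

/* Plain right-to-left double-and-add: k * q for k >= 0. */
static pt pt_mul(int64_t k, pt q) {
    pt r = {0, 0, 1};
    for (; k > 0; k >>= 1) {
        if (k & 1) r = pt_add(r, q);
        q = pt_add(q, q);
    }
    return r;
}

/* Joint double-and-add (Shamir's trick): na*A + ng*G in one left-to-right
 * pass that shares the doublings. Scalars are assumed to be < 2^31. */
static pt toy_ecmult(pt a, int64_t na, pt g, int64_t ng) {
    assert(na >= 0 && ng >= 0 && na < (1LL << 31) && ng < (1LL << 31));
    pt ag = pt_add(a, g), r = {0, 0, 1};
    for (int i = 30; i >= 0; i--) {
        r = pt_add(r, r);
        int ba = (int)((na >> i) & 1), bg = (int)((ng >> i) & 1);
        if (ba && bg) r = pt_add(r, ag);
        else if (ba) r = pt_add(r, a);
        else if (bg) r = pt_add(r, g);
    }
    return r;
}

/* Pick some curve point with y != 0 to act as a toy "generator". */
static pt toy_generator(void) {
    for (int64_t x = 0; x < TOY_P; x++) {
        int64_t rhs = md(x * x % TOY_P * x + 7);
        for (int64_t y = 1; y < TOY_P; y++)
            if (y * y % TOY_P == rhs) return (pt){x, y, 0};
    }
    return (pt){0, 0, 1};
}
```

With a public key `pub = pt_mul(x, g)`, the call `toy_ecmult(pub, 1, g, t)` gives the same point as `pt_add(pub, pt_mul(t, g))` and as `pt_mul(x + t, g)`, which is why the tweak can equally be computed as "generator multiplication, then one addition".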
After some tinkering I found that this can be sped up quite a bit. To make full use of the fact that the tweak scalar is typically not considered sensitive data (as opposed to secret keys or nonces), this PR first introduces a variable-time generator point multiplication routine `secp256k1_ecmult_gen_var`, which is essentially a copy of `secp256k1_ecmult_gen` [1] with all the side-channel resistance mitigations stripped out. This function is significantly faster than its constant-time original in all COMB table size configurations, by over 66% in all but the smallest 2KB setting (benchmarked via `./build/bin/bench_ecmult`):
| ECMULT_GEN_KB | ecmult_gen | ecmult_gen_var | speedup |
|---|---|---|---|
| 2 | 11.6 us | 8.7 us | ~33.3% |
| 22 | 10.3 us | 6.18 us | ~66.6% |
| 86 (default) | 9.85 us | 5.86 us | ~68.1% |
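To illustrate the kind of mitigation that can be dropped when the scalar is public, the hypothetical sketch below contrasts a constant-time table lookup, which reads every entry and keeps the wanted one via a bit mask so that the memory access pattern does not depend on the index, with a variable-time lookup that simply indexes the table. Plain `uint32_t` values stand in for precomputed points and the function names are invented; this is not the library's code. Other mitigations in the constant-time path, such as scalar blinding, are not shown, and real constant-time code also has to make sure the compiler does not turn the mask into a branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constant-time: scan every entry and keep the one at idx via a mask, so the
 * set of memory locations touched is independent of the (secret) idx. */
static uint32_t lookup_ct(const uint32_t *table, size_t n, size_t idx) {
    uint32_t r = 0;
    assert(idx < n);
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = (uint32_t)0 - (uint32_t)(i == idx); /* all ones iff i == idx */
        r |= table[i] & mask;
    }
    return r;
}

/* Variable-time: one direct load whose address (and cache footprint) reveals
 * idx. Only acceptable when idx is derived from public data, e.g. a tweak. */
static uint32_t lookup_var(const uint32_t *table, size_t n, size_t idx) {
    assert(idx < n);
    return table[idx];
}
```

Both functions return the same value; the variable-time one does one load instead of `n`, and with larger tables the difference in memory traffic grows accordingly.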
Plugging that new routine into the internal function `secp256k1_eckey_pubkey_tweak_add` (which is easy, as the variable-time variant is stateless and thus no longer needs a context object) yields the following speedups for the API function `secp256k1_ec_pubkey_tweak_add` (benchmarked via `./build/bin/bench tweak` on the second vs. third commit):
| ECMULT_GEN_KB | ec_pubkey_tweak_add<br>(using ecmult) | ec_pubkey_tweak_add<br>(using new ecmult_gen_var) | speedup |
|---|---|---|---|
| 2 | 16.3 us | 11.6 us | ~40.5% |
| 22 | 16.3 us | 8.98 us | ~81.5% |
| 86 (default) | 16.3 us | 8.65 us | ~88.4% |
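The "speedup" figures in both tables match a throughput gain, i.e. old time divided by new time minus one, rather than a reduction in run time. The small helpers below (hypothetical names, not part of the benchmark tooling) reproduce the listed numbers from the measured times. For example, the default 86KB row of the `ecmult_gen` table goes from 9.85 us to 5.86 us, which is about 40% less time per call and, equivalently, about 68% more calls per second.

```c
#include <assert.h>

/* Throughput gain of t_new over t_old (per-call times in the same unit):
 * 0.5 means 50% more operations per second. */
static double speedup(double t_old, double t_new) {
    assert(t_old > 0.0 && t_new > 0.0);
    return t_old / t_new - 1.0;
}

/* Fraction of per-call run time saved, for comparison. */
static double time_saved(double t_old, double t_new) {
    assert(t_old > 0.0 && t_new > 0.0);
    return 1.0 - t_new / t_old;
}
```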
Note that the improved code path (function secp256k1_eckey_pubkey_tweak_add) is also used in the following API functions, so they should all benefit from it:
- `secp256k1_xonly_pubkey_tweak_add`
- `secp256k1_xonly_pubkey_tweak_add_check` (this one is used for verifying P2TR script path spends in Bitcoin Core and thus consensus-critical, see BIP341)
- `secp256k1_keypair_xonly_tweak_add`
- `secp256k1_musig_pubkey_{ec,xonly}_tweak_add`
- `secp256k1_silentpayments_recipient_scan_outputs` in PR #1765
Improving Silent Payments scanning performance was the original motivation to look into this (see also https://github.com/craigraw/bench_bip352/), and in fact the "common case" benchmarks show a 15-20% speedup if this branch is applied.
[1] I've also experimented with reintroducing the much simpler `ECMULT_GEN_PREC_BITS` precomputation table (which was used prior to the signed-digit multi-comb (SDMC) algorithm, in versions earlier than 0.5.0), and with `PREC_BITS=8` it performs even better, needing only 31 point additions rather than 42: https://github.com/theStack/secp256k1/commit/8f7fc93db4710ea2b7f88c7c458de7d4832968b4. However, given that it's unclear where to store this table data (the choice being either to bloat the library size further or to again require some form of context if it's computed at runtime), I decided to first stick with a solution that uses the already available table data.
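The count of 31 additions follows from the layout of such a table: with 8-bit windows a 256-bit scalar splits into 32 bytes, and if entry $[i][j]$ holds $j \cdot 2^{8i} \cdot G$, then $k \cdot G$ is the sum of one entry per byte, i.e. 32 lookups and 31 additions with no doublings. The sketch below shows only this structure, under a loud simplification: the "group" is the integers modulo the prime $2^{61} - 1$ under addition, standing in for curve points (so "point addition" is modular addition), and all names are hypothetical. It is not the code from the linked commit. Its table has $32 \cdot 256 = 8192$ entries, which illustrates the storage question above; at 64 bytes per point in affine storage form that would be roughly 512 KiB.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in group: integers mod GRP_N = 2^61 - 1 under addition. Each value
 * plays the role of a curve point and grp_add the role of point addition. */
#define GRP_N 2305843009213693951ULL
typedef uint64_t elem;

static elem grp_add(elem a, elem b) { return (a + b) % GRP_N; }

#define PREC_BITS 8
#define WINDOWS (256 / PREC_BITS) /* 32 windows of 8 bits */
#define ENTRIES (1 << PREC_BITS)  /* 256 entries per window */

/* table[i][j] = j * 2^(8i) * G; built once, then only read. */
static elem table[WINDOWS][ENTRIES];

static void build_table(elem g) {
    assert(g < GRP_N);
    elem base = g; /* 2^(8i) * G for the current window i */
    for (int i = 0; i < WINDOWS; i++) {
        table[i][0] = 0; /* identity element */
        for (int j = 1; j < ENTRIES; j++) table[i][j] = grp_add(table[i][j - 1], base);
        for (int d = 0; d < PREC_BITS; d++) base = grp_add(base, base); /* base *= 256 */
    }
}

/* k * G for a 256-bit little-endian scalar k[0..31]: one lookup per byte and
 * 31 additions in total; *adds reports the number of additions performed. */
static elem gen_mul_fixed_window(const uint8_t k[32], int *adds) {
    elem r = table[0][k[0]];
    *adds = 0;
    for (int i = 1; i < WINDOWS; i++) { r = grp_add(r, table[i][k[i]]); (*adds)++; }
    return r;
}

/* Reference: plain double-and-add over all 256 bits, most significant first. */
static elem gen_mul_naive(const uint8_t k[32], elem g) {
    elem r = 0;
    for (int i = 31; i >= 0; i--)
        for (int b = 7; b >= 0; b--) {
            r = grp_add(r, r);
            if ((k[i] >> b) & 1) r = grp_add(r, g);
        }
    return r;
}
```

The naive version needs 256 doublings plus up to 256 additions, while the table version trades all of that for 31 additions and a large precomputed table.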