- Selected Co-Z formulas from “Scalar Multiplication on Weierstraß Elliptic Curves from Co-Z Arithmetic” (Goundar, Joye, et. al.) added as group methods with new type sep256k1_coz_t.
- Co-Z methods used for point precomputations in secp256k1_ecmult.
- WINDOW_A size increased to 6 since the precomputation is much faster per-point.
- DBLU cost: 3M+4S, ZADDU cost: 5M+2S.
- From 2.4% to 3.8% faster ‘bench’ results, depending on configuration.
(UPDATE: Technique explicitly described in https://eprint.iacr.org/2008/051)
It’s probably best to have a read of the paper (http://joye.site88.net/papers/GJMRV11regpm.pdf) if not familiar with Co-Z arithmetic, but roughly it’s based on the observations that
- points with equal Z coordinate (Co-Z) can be added very cheaply.
- an input to a Co-Z double or add can be updated to have the same Z coordinate as the result, almost for free, allowing “chaining” of operations.
This works out well for generating a table of 1P, (2P), 3P, 5P, etc… as the 2P can be created with the same Z as 1P, then added to 1P, 3P, etc., in each case being updated to have the same Z as the previous output, which is also the next input. Then the total cost for a table of 8 values is 3M+4S + 7(5M+2S) = 38M+18S, compared to 3M+4S + 7(12M+4S) = 87M+32S. Alternatively, as this PR does, a table of 16 values can be built instead for 3M+4S + 15(5M+2S) = 78M+34S, which is still cheaper than the current table generation; then ~6 point additions are saved during the scalar multiplication (72M+24S), for a total saving of 72M+24S + 87M+32S - (78M+34S) = 81M+22S, or ~100M.
Timing results for ‘bench’ across a variety of configurations (bignum=gmp throughout):
Field | Endo? | Before | After | %Perf |
---|---|---|---|---|
64bit | yes | 1m10.623s | 1m8.147s | +3.6% |
64bit | no | 1m35.524s | 1m32.660s | +3.1% |
32bit | yes | 1m43.756s | 1m39.973s | +3.8% |
32bit | no | 2m21.871s | 2m17.690s | +3.0% |
gmp | yes | 1m41.278s | 1m37.710s | +3.7% |
gmp | no | 2m24.717s | 2m21.326s | +2.4% |