Signed-digit multi-comb ecmult

sipa commented at 10:51 pm on December 27, 2021: contributor

A third iteration of the signed-digit multi-comb ecmult_gen algorithm (earlier attempts: #693, and #546 by Peter Dettman). Short summary:

A new constant-time point multiplication algorithm with precomputation (so only used for multiply with G).
Based on section 3.3 of https://eprint.iacr.org/2012/309 by Mike Hamburg.
Configurable through two parameters: COMB_BLOCKS and COMB_TEETH
- Currently only 3 predefined configurations reachable through ./configure (tables 2 kB, 22 kB, 86 kB). All three are included in precomputed_ecmult_gen.c and tested in CI. The 2 kB option is already comparable in speed with the current code.
- Many more configurations can be reached by manually setting the macros. These are not tested.

Compared with the previous PR #693:

Updated to the new static-precomputation-only model (#893).
Just 3 curated configurations reachable through configure.
Removed some optimizations that do not matter (much).
Do blinding through an final correction add rather than an initial start point, which may later permit usage of incomplete addition formulae (#1051).
The recoding of the input scalar to signed bit representation is done slightly differently, which needs fewer special cases.

sipa renamed this:
~~WIP Reword of Signed-Digit Multicomb~~
WIP Rework of Signed-Digit Multicomb
on Dec 27, 2021

sipa force-pushed on Dec 28, 2021

peterdettman commented at 11:48 am on December 28, 2021: contributor

Just out of curiosity, some perf. numbers (best Min of 3 bench sign, 64-bit, i7-9750H):

branch	ecdsa_sign	schnorrsig_sign
master	29.5	23.1
this PR	25.6	19.2
experimental	23.8	17.7

(“experimental” is this PR plus the other PRs for normalize, group formulae, and the “vector” modinv).

Add secp256k1_scalar_half

This introduces a new secp256k1_scalar_half function which multiplies
a scalar with the multiplicative inverse of 2 (modulo order).

f8d148b6fa

Initial gej blinding -> final ge blinding

Instead of having the starting point of the ecmult_gen computation be
offset, do it with the final point. This enables reasoning over the
set of points reachable in intermediary computations, which can be
leveraged by potential future optimization.

Because the final point is in affine coordinates, its projective
blinding is no longer possible. It will be reintroduced again in
a different way, in a later commit.

Also introduce some more comments and more descriptive names.

6a792b6406

sipa force-pushed on Dec 29, 2021

sipa commented at 0:08 am on December 29, 2021: contributor

@peterdettman Care to redo benchmarks for the latest commit (I’ve removed the incomplete comb optimization, and re-added the uint32_t[9] recoded approach)?

siv2r commented at 3:28 am on December 29, 2021: contributor

These were the best min I got (running the benchmark thrice) on my machine (64-bit, i7-8750H).

branch	ecdsa_sign	schnorrsig_sign
master	64.4	49.6
this PR	57.0	42.0

peterdettman commented at 6:00 am on December 29, 2021: contributor

Updated perf. numbers (-O3, best Min of 3 bench sign, 64-bit, i7-9750H):

branch	ecdsa_sign	schnorrsig_sign
master	29.6	23.1
this PR	25.7	19.4
experimental	24.2	17.9

(“experimental” is this PR plus the other PRs for normalize, group formulae, and the “vector” modinv).

So this looks just slightly slower than before, but perfectly fine if we are merge-focused. We can go hunting the extra 2% once we’ve booked the 20%.

sipa force-pushed on Dec 29, 2021

sipa renamed this:
~~WIP Rework of Signed-Digit Multicomb~~
Signed-digit multi-comb ecmult_gen algorithm
on Dec 29, 2021

Signed-digit multi-comb ecmult_gen algorithm

This introduces the signed-digit multi-comb multiplication algorithm
for constant-time G multiplications (ecmult_gen). It is based on
section 3.3 of "Fast and compact elliptic-curve cryptography" by
Mike Hamburg (see https://eprint.iacr.org/2012/309).

Original implementation by Peter Dettman, with changes by Pieter Wuille
to use scalars for recoding, and additional comments.

1ae98bb58c

Always generate tables for current (blocks,teeth) config 1777163831

Provide 3 configurations accessible through ./configure 878e678976

Optimization: move 2^COMB_BITS-1 offset into ctx->scalar_offset

It is unnecessary to recompute the 2^COMB_BITS-1 scalar offset needed
by the SDMC algorithm for every multiplication; move it into the
context scalar_offset value instead.

a23da751af

Optimization: first table lookup needs no point addition d4ab037441

Optimization: avoid unnecessary doublings in precomputation 10e6d6be4b

Make secp256k1_scalar_get_bits support 32-bit reads eab993a9bc

Optimization: use Nx32 representation for recoded bits

The existing code needs to deal with the edge case that bit_pos >= 256,
which would lead to an out-of-bounds read from secp256k1_scalar.

Instead, recode the scalar into an array of uint32_t with enough zero
padding at the end to alleviate the issue. This also simplifies the
code, and is necessary for a security improvement in a follow-up
commit.

Original code by Peter Dettman, with modifications by Pieter Wuille.

1464b937da

Reduce side channels from single-bit reads 3d9027457a

Reintroduce projective blinding 4a4d16e83c

sipa force-pushed on Dec 29, 2021

sipa closed this on Dec 29, 2021

sipa commented at 9:00 pm on December 29, 2021: contributor

Restarting this in a new PR to avoid the WIP discussion: #1058.

Signed-digit multi-comb ecmult_gen algorithm #1057