Signed-digit based ecmult

sipa commented at 6:39 am on December 30, 2022: contributor

Using some insights learned from #1058, this replaces the fixed-wnaf ecmult_const algorithm with a signed-digit based one. Conceptually both algorithms are very similar, in that they boil down to summing precomputed odd multiples of the input points. Practically however, the new algorithm is simpler because it’s just using scalar operations, rather than relying on wnaf machinery with skew terms to guarantee odd multipliers.

The idea is that we can compute $q \cdot A$ as follows:

Let $s = f(q)$, for some function $f()$.
Compute $(s_1, s_2)$ such that $s = s_1 + \lambda s_2$, using secp256k1_scalar_lambda_split.
Let $v_1 = s_1 + 2^{128}$ and $v_2 = s_2 + 2^{128}$ (such that the $v_i$ are positive and $n$ bits long).
Computing the result as $$\sum_{i=0}^{n-1} (2v_1[i]-1) 2^i A + \sum_{i=0}^{n-1} (2v_2[i]-1) 2^i \lambda A$$ where $x[i]$ stands for the i’th bit of $x$, so summing positive and negative powers of two times $A$, based on the bits of $v_1.$

The comments in ecmult_const_impl.h show that if $f(q) = (q + (1+\lambda)(2^n - 2^{129} - 1))/2 \mod n$, the result will equal $q \cdot A$.

This last step can be performed in groups of multiple bits at once, by looking up entries in a precomputed table of odd multiples of $A$ and $\lambda A$, and then multiplying by a power of two before proceeding to the next group.

The result is slightly faster (I measure ~2% speedup), but significantly simpler as it only uses scalar arithmetic to determine the table lookup values. The speedup is due to the fact that no skew corrections at the end are needed, and less overhead to determine table indices. The precomputed table sizes are also made independent from the ecmult ones, after observing that the optimal table size is bigger here (which also gives a small speedup).

in src/tests.c:2219 in b4387056a5 outdated

2214+            /* 2^255 */
2215+            SECP256K1_SCALAR_CONST(0x80000000ul, 0, 0, 0, 0, 0, 0, 0),
2216+            /* 2^255 - 1 */
2217+            SECP256K1_SCALAR_CONST(0x7ffffffful, 0xfffffffful, 0xfffffffful, 0xfffffffful, 0xfffffffful, 0xfffffffful, 0xfffffffful, 0xfffffffful),
2218+        };
2219+        for (i = 0; (unsigned)i < sizeof(HALF_TESTS) / sizeof(HALF_TESTS[0]); ++i) {

real-or-random commented at 11:14 am on December 30, 2022:

nit: Maybe introduce a size_t instead of reusing i and casting it.. Easy given that you anyway open a new scope. (You could also scope the other uses of i properly as a clean up.)

sipa commented at 10:00 pm on December 30, 2022:

Added an unsigned one.

real-or-random commented at 11:20 am on December 30, 2022: contributor

Whoa nice! The skew correction is really annoying to reason about…

I just don’t know where to get all the review power from.

sipa force-pushed on Dec 30, 2022

sipa commented at 10:12 pm on December 30, 2022: contributor

Added a commit to remove secp256k1_scalar_shr_int, which is now unused apart from tests. It’s a net reduction diff now, even with all the comments it adds.

sipa commented at 3:30 am on January 5, 2023: contributor

@peterdettman Randomly tagging you, you may find this interesting.

apoelstra commented at 2:05 pm on January 7, 2023: contributor

Very cool! I wonder if the explanation might be clearer if you said something like “without any further optimizations, k0 would be 2^n - 1 with n = 256; however we are going to combine this scheme with the GLV endomorphism and do a windowed rather than bitwise approach. So instead we solve for k0” rather than “for some constant k0”.

In the end I was able to understand what you were doing without much trouble, so what you’ve written is fine, but it was a bit intimidating to see k0, k1, k2 all introduced at once just as unknowns.

apoelstra approved

apoelstra commented at 4:01 pm on January 7, 2023: contributor

ACK f16c5008ba0975b6111f4c1f1b5a2e3005c00c8f

sipa force-pushed on Jan 7, 2023

sipa commented at 6:50 pm on January 7, 2023: contributor

@apoelstra I’ve changed the explanation to introduce the offset terms a bit more gently.

I’ve also dropped the ecmult_const_get_scalar_bit_group function as secp256k1_scalar_get_bits_var can be used instead (the _var part is not an issue as it’s only variable-time in the offset/length, not the scalar).

apoelstra commented at 3:16 pm on January 8, 2023: contributor

@sipa this looks way better, thanks!

apoelstra approved

apoelstra commented at 3:59 pm on January 8, 2023: contributor

ACK b4ff4ebbb6486756f33c3f2119bc897d1c75179a

sipa force-pushed on Jan 19, 2023

sipa commented at 8:37 pm on January 19, 2023: contributor

Rebased after merge of #1170 and #1190.

sipa commented at 10:09 pm on January 20, 2023: contributor

Mental note: instead of adding $2^{128}$ to the split scalars, I believe it’s also possible to (a) conditionally negate the scalar (if it’s negative) and then (b) swap the meaning of positive/negate in the table (alternatively, bitwise negate the integers read from the table). This would mean we only need enough table coverage for 128 bits rather than 129, which for window=4 means one fewer addition.

EDIT: no, doesn’t work

sipa force-pushed on Jan 23, 2023

sipa commented at 4:16 pm on January 23, 2023: contributor

Oops, my previous rebase accidentally reverted the changes I made to address @apoelstra’s comment. Re-rebased.

in src/ecmult_const_impl.h:116 in 1c48861222 outdated

207+     *   - Let R1 = C_256((s1 + 2^256 - 1) / 2, A)
208+     *   - Let R2 = C_256((s2 + 2^256 - 1) / 2, lambda*A)
209+     *   - Return R1 + R2
210+     *
211+     * The issue is that while q1 and q2 are small-range numbers, (q1 + 2^256 - 1) / 2 mod n
212+     * and (q2 + 2^256 - 1) / 2 mod n are not, undoing the benefit of the splitting.

peterdettman commented at 5:10 am on February 10, 2023:

~~q1, q2~~ s1, s2

sipa commented at 9:58 pm on February 10, 2023:

Fixed.

in src/ecmult_const_impl.h:151 in 1c48861222 outdated

245 
246-static void secp256k1_ecmult_const(secp256k1_gej *r, const secp256k1_ge *a, const secp256k1_scalar *scalar, int size) {
247-    secp256k1_ge pre_a[ECMULT_TABLE_SIZE(WINDOW_A)];
248-    secp256k1_ge tmpa;
249+    /* The offset to add to s1 and s2 to make them positive. Equal to 2^128 - 1. */
250+    static const secp256k1_scalar S_OFFSET = SECP256K1_SCALAR_CONST(0, 0, 0, 0, 0xfffffffful, 0xfffffffful, 0xfffffffful, 0xfffffffful);

peterdettman commented at 5:43 am on February 10, 2023:

It seems 2^128 would work just as well for the offset. Is it easier to just add 1 to bit 128 than keep this offset around?

peterdettman commented at 6:25 am on February 10, 2023:

Also ~~positive~~ non-negative here.

sipa commented at 9:58 pm on February 10, 2023:

Fixed.

sipa commented at 10:01 pm on February 10, 2023:

We don’t have scalar code to add just one bit (except secp256k1_scalar_cadd_bit, but that doesn’t permit overflow), so there is little to gain from using $2^{128}$ instead, I think.

Arguably we could use the exact bounds for the lambda splitting too (see secp256k1_scalar_split_lambda_verify).

in src/ecmult_const_impl.h:118 in 1c48861222 outdated

209+     *   - Return R1 + R2
210+     *
211+     * The issue is that while q1 and q2 are small-range numbers, (q1 + 2^256 - 1) / 2 mod n
212+     * and (q2 + 2^256 - 1) / 2 mod n are not, undoing the benefit of the splitting.
213+     *
214+     * The make it work, we apply the corrections to the input scalar, including the division by

peterdettman commented at 5:58 am on February 10, 2023:

~~The~~ To make it work…

sipa commented at 9:58 pm on February 10, 2023:

Fixed.

in src/ecmult_const_impl.h:120 in 1c48861222 outdated

211+     * The issue is that while q1 and q2 are small-range numbers, (q1 + 2^256 - 1) / 2 mod n
212+     * and (q2 + 2^256 - 1) / 2 mod n are not, undoing the benefit of the splitting.
213+     *
214+     * The make it work, we apply the corrections to the input scalar, including the division by
215+     * two, before calling split_lambda. Let's introduce an offset k, and solve for it. We also
216+     * add 2^128-1 to s1 and s2 after splitting to make sure we end up with positive scalars.

peterdettman commented at 6:01 am on February 10, 2023:

I guess we technically want “non-negative” scalars (or rephrase as “to make sure neither are negative”).

sipa commented at 9:58 pm on February 10, 2023:

Fixed.

in src/ecmult_const_impl.h:143 in 1c48861222 outdated

234+     * <=> q   = 2*(s1 + s2*lambda) + (2^129 - 1 - 2^l)*(1 + lambda) mod n
235+     * <=> q   = 2*(q + k) / 2 + (2^129 - 1 - 2^l)*(1 + lambda) mod n
236+     * <=> q   = q + k + (2^129 - 1 - 2^l)*(1 + lambda) mod n
237+     * <=> k   = (2^l + 1 - 2^129)*(1 + lambda) mod n
238+     *
239+     * We will process the computation of C_l and C_l in groups of ECMULT_CONST_GROUP_SIZE, so we

peterdettman commented at 6:06 am on February 10, 2023:

Should this say “C_l(v1, A) and C_l(v2, lambda*A)”?

sipa commented at 9:58 pm on February 10, 2023:

Done.

sipa force-pushed on Feb 10, 2023

peterdettman commented at 5:35 am on February 11, 2023: contributor

Mental note: instead of adding 2128 to the split scalars, I believe it’s also possible to (a) conditionally negate the scalar (if it’s negative) and then (b) swap the meaning of positive/negate in the table (alternatively, bitwise negate the integers read from the table). This would mean we only need enough table coverage for 128 bits rather than 129, which for window=4 means one fewer addition.

EDIT: no, doesn’t work

Are you able to explain why that doesn’t work?

peterdettman commented at 8:06 am on February 11, 2023: contributor

Nice idea, simple and clear.

I would prefer to use 2^128 as the offset, allowing the split to use the full two’s complement range of [-2^128, 2^128). That generalizes better to other split schemes (I have in mind the basis reduction of https://ia.cr/2020/454).
The Straus ladder has the same need to deal with negative split outputs; should the mechanism be unified (one way or the other?)
I don’t have a build to test the performance of the current _scalar_split_lambda, but it’s doing quite a bit of unnecessary work compared to the “original” non-modular formula, which it seems could be done cmod 2^129 (i.e. 129-bit two’s complement representation) - and e.g. avoid calculating the 384 bits that are shifted away. Not to belabor the point, but if the split outputs were in 129-bit two’s complement, adding a 2^128 offset is just a bit flip (or maybe a single limb complement).

sipa force-pushed on Feb 11, 2023

sipa commented at 3:11 pm on February 11, 2023: contributor

I’ve switched to offset $2^{128}$.

in src/ecmult_const_impl.h:125 in dd21e2c5ca outdated

216+     * add 2^128 to s1 and s2 after splitting to make sure we end up with non-negative scalars
217+     * (a slightly smaller offset would work due to the bounds on the split, but we pick 2^128
218+     * for simplicity).
219+     *
220+     *   To compute q*A:
221+     *   - Let s1, s2 = split_lambda((q + k) / 2 mod n)

peterdettman commented at 5:07 pm on February 11, 2023:

It’s a bit jarring to see the form of the solution magically appear here as (q + k)/2 mod n. It could just be ’t’ (or ‘f(q)’) and the derivation below works out fine, but without the leap.

sipa commented at 3:23 am on February 12, 2023:

Good idea, that’s a lot easier to follow I imagine. Done.

in src/ecmult_const_impl.h:127 in dd21e2c5ca outdated

218+     * for simplicity).
219+     *
220+     *   To compute q*A:
221+     *   - Let s1, s2 = split_lambda((q + k) / 2 mod n)
222+     *   - Let v1 = s1 + 2^128
223+     *   - Let v2 = s2 + 2^128

peterdettman commented at 6:03 pm on February 11, 2023:

I kinda feel like these should say “mod n”, but it’s not quite clear. Let R1 = C_256((s1 + 2^256 - 1) / 2, A) implies that s1 is a 256-bit value so for v1 to become small enough for the following C_l presumably the mod is needed.

sipa commented at 3:26 am on February 12, 2023:

It really depends on whether we think of $s_1$ and $s_2$ as integers (in range $(-2^{128},2^{128})$, or as scalars (either in range $(n-2^{128},n]$ or in range $[0,2^{128})$ ).

The code more closely matches the split-range behavior, so I’ve added mod n.

peterdettman commented at 6:38 pm on February 11, 2023: contributor

I think if _scalar_split_lambda just output 129-bit two’s complement values (and then sign-extend to ECMULT_CONST_BITS), then they would be directly usable in the comb, no offset needed for s1, s2. Something for the future, perhaps.

sipa force-pushed on Feb 12, 2023

sipa commented at 3:40 am on February 12, 2023: contributor

Are you able to explain why that doesn’t work?

Yeah, the issue is that when using the signed-digit encoding, negating the (final, intended) scalar isn’t the same as negating the scalar whose bits are used to drive the point lookups. It’s possible of course to negate, but instead it corresponds to complementing the lookup scalar $s$: with $s \rightarrow 2^l - 1 - s$. This doesn’t help you get rid of the 129th bit.

It’s also possible to see it information theoretically: since $-C_{128}(s_1, P) = C_{128}(2^{128} - 1 - s_1, P)$, negating doesn’t “add” anything; the $C_{128}$ operation can already represent that with another input. So it’s not possible to compute $2^{129} - 1$ different results using $C_{128}$ (which takes a 128-bit input), even with an optional negation of the result at the end.

I would prefer to use 2^128 as the offset, allowing the split to use the full two’s complement range of [-2^128, 2^128). That generalizes better to other split schemes (I have in mind the basis reduction of https://ia.cr/2020/454).

Done.

The Straus ladder has the same need to deal with negative split outputs; should the mechanism be unified (one way or the other?)

I’d need to look. I think the technique used here is most directly applicable to the Pippenger multi-multiplication code (as it’s essentially a constant-time algorithm, except for some small optimizations). I plan to look at that when the code here is done.

I don’t have a build to test the performance of the current _scalar_split_lambda, but it’s doing quite a bit of unnecessary work compared to the “original” non-modular formula, which it seems could be done cmod 2^129 (i.e. 129-bit two’s complement representation) - and e.g. avoid calculating the 384 bits that are shifted away. Not to belabor the point, but if the split outputs were in 129-bit two’s complement, adding a 2^128 offset is just a bit flip (or maybe a single limb complement).

Of course, be my guest to optimize the lambda splitting code, but it is just run once per ECDH operation. On my Ryzen 5950X, lambda splitting takes 190 ns (for comparison, a constant time gej+ge point addition takes 320 ns; ECDH takes 51000 ns).

I think if _scalar_split_lambda just output 129-bit two’s complement values (and then sign-extend to ECMULT_CONST_BITS), then they would be directly usable in the comb, no offset needed for s1, s2. Something for the future, perhaps.

I believe that’s right. There would still need to be an offset/halving before splitting to compensate for the signed-digit representation, but none after splitting.

sipa force-pushed on Apr 10, 2023

sipa commented at 11:46 am on April 10, 2023: contributor

Rebased.

sipa force-pushed on May 11, 2023

sipa commented at 5:02 pm on May 11, 2023: contributor

Rebased.

sipa force-pushed on May 11, 2023

real-or-random added the label performance on May 11, 2023

real-or-random added the label refactor on May 11, 2023

in src/ecmult_const_impl.h:96 in f22eb66c07 outdated

182+     * point multiplication in that fashion. Let v be an n-bit non-negative integer (0 <= v < 2^n),
183+     * and v[i] its i'th bit (so v = sum(v[i] * 2^i, i=0..n-1)). Then define:
184+     *
185+     *   C_l(v, A) = sum((2*v[i] - 1) * 2^i*A, i=0..l-1)
186+     *
187+     * Then it holds that C_n(v, A) = sum((2*v[i] - 1) * 2^i*A, i=0..l-1)

jonasnick commented at 8:08 pm on June 14, 2023:

0     * Then it holds that C_l(v, A) = sum((2*v[i] - 1) * 2^i*A, i=0..l-1)

sipa commented at 8:31 pm on June 20, 2023:

Fixed.

in src/ecmult_const_impl.h:63 in f22eb66c07 outdated

71+ */
72+#define ECMULT_CONST_TABLE_GET_GE(r,pre,n) do { \
73+    unsigned m = 0; \
74+    /* If the top bit of n is 0, we want the negation. */ \
75+    volatile unsigned negative = ((n) >> (ECMULT_CONST_GROUP_SIZE - 1)) ^ 1; \
76+    /* The index is computed by looking the the bottom bits, after making positive. */ \

jonasnick commented at 8:08 pm on June 14, 2023:

0    /* The index is computed by looking the bottom bits, after making positive. */ \

sipa commented at 8:31 pm on June 20, 2023:

Fixed (replaced the first the with at instead).

in src/ecmult_const_impl.h:222 in f22eb66c07 outdated

355+     *
356+     * We proceed in groups of ECMULT_CONST_GROUP_SIZE bits, operating on that many bits
357+     * at a time, from high in v1/v2 to low. Call these bits1 (from v1) and bits2 (from v2).
358+     *
359+     * Now note that ECMULT_CONST_TABLE_GET_GE(&t, pre_a, bits1) loads into t a point equal
360+     * to C_{ECMULT_CONST_GROUP_SIZE}(bits, A), and analogously for pre_lam_a / bits2.

jonasnick commented at 8:09 pm on June 14, 2023:

0     * to C_{ECMULT_CONST_GROUP_SIZE}(bits1, A), and analogously for pre_lam_a / bits2.

sipa commented at 8:30 pm on June 20, 2023:

Fixed.

in src/ecmult_const_impl.h:212 in f22eb66c07 outdated

325-    secp256k1_ecmult_odd_multiples_table_globalz_windowa(pre_a, &Z, r);
326-    for (i = 0; i < ECMULT_TABLE_SIZE(WINDOW_A); i++) {
327+    secp256k1_gej_set_ge(&res, a);
328+    secp256k1_ecmult_const_odd_multiples_table_globalz(pre_a, &Z, &res);
329+    for (i = 0; i < ECMULT_CONST_TABLE_SIZE; i++) {
330         secp256k1_fe_normalize_weak(&pre_a[i].y);

jonasnick commented at 7:15 am on June 15, 2023:

Why normalize_weak here? As far as I can see, pre_a[i].y already has magnitude 1.

sipa commented at 8:30 pm on June 20, 2023:

I don’t remember why it was there; it seems unnecessary indeed.

in src/ecmult_const_impl.h:66 in f22eb66c07 outdated

72+#define ECMULT_CONST_TABLE_GET_GE(r,pre,n) do { \
73+    unsigned m = 0; \
74+    /* If the top bit of n is 0, we want the negation. */ \
75+    volatile unsigned negative = ((n) >> (ECMULT_CONST_GROUP_SIZE - 1)) ^ 1; \
76+    /* The index is computed by looking the the bottom bits, after making positive. */ \
77+    unsigned index = ((unsigned)(-negative) ^ n) & ((1U << (ECMULT_CONST_GROUP_SIZE - 1)) - 1U); \

jonasnick commented at 1:42 pm on June 15, 2023:

Wow, that this is quite a non-obvious way to get the index (but straightforward to see that it works).

jonasnick commented at 1:48 pm on June 15, 2023: contributor

Concept ACK. Besides being simpler than the existing approach I can get a reliable 1.5% speedup on my dev laptop. ECMULT_CONST_GROUP_SIZE 4 and 5 are best choices in my benchmarks and approximately equal in performance (with 4 being 0.16% faster).

I created a branch with a few minor suggestions. Feel free to cherry-pick.

sipa force-pushed on Jun 20, 2023

sipa commented at 8:29 pm on June 20, 2023: contributor

Made the following changes (includes the fixups and tests by @jonasnick, the latter combined into a separate commit). I didn’t include the macro -> function change, as I measure a slight slowdown from it.

  0diff --git a/CHANGELOG.md b/CHANGELOG.md
  1index 69542415..fa027fdb 100644
  2--- a/CHANGELOG.md
  3+++ b/CHANGELOG.md
  4@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
  5 
  6 ## [Unreleased]
  7 
  8+#### Changed
  9+ - The point multiplication algorithm used for ECDH operations (module `ecdh`) was replaced with a slightly faster one.
 10+
 11 ## [0.3.2] - 2023-05-13
 12 We strongly recommend updating to 0.3.2 if you use or plan to use GCC >=13 to compile libsecp256k1. When in doubt, check the GCC version using `gcc -v`.
 13 
 14@@ -54,7 +57,6 @@ The ABI is compatible with version 0.3.0.
 15 #### Changed
 16  - Forbade cloning or destroying `secp256k1_context_static`. Create a new context instead of cloning the static context. (If this change breaks your code, your code is probably wrong.)
 17  - Forbade randomizing (copies of) `secp256k1_context_static`. Randomizing a copy of `secp256k1_context_static` did not have any effect and did not provide defense-in-depth protection against side-channel attacks. Create a new context if you want to benefit from randomization.
 18- - The point multiplication algorithm used for ECDH operations (module `ecdh`) was replaced with a slightly faster one.
 19 
 20 #### Removed
 21  - Removed the configuration header `src/libsecp256k1-config.h`. We recommend passing flags to `./configure` or `cmake` to set configuration options (see `./configure --help` or `cmake -LH`). If you cannot or do not want to use one of the supported build systems, pass configuration flags such as `-DSECP256K1_ENABLE_MODULE_SCHNORRSIG` manually to the compiler (see the file `configure.ac` for supported flags).
 22diff --git a/src/ecmult_const_impl.h b/src/ecmult_const_impl.h
 23index 525d86f0..58cc19a8 100644
 24--- a/src/ecmult_const_impl.h
 25+++ b/src/ecmult_const_impl.h
 26@@ -60,7 +60,7 @@ static void secp256k1_ecmult_const_odd_multiples_table_globalz(secp256k1_ge *pre
 27     unsigned m = 0; \
 28     /* If the top bit of n is 0, we want the negation. */ \
 29     volatile unsigned negative = ((n) >> (ECMULT_CONST_GROUP_SIZE - 1)) ^ 1; \
 30-    /* The index is computed by looking the the bottom bits, after making positive. */ \
 31+    /* The index is computed by looking at the bottom bits, after making positive. */ \
 32     unsigned index = ((unsigned)(-negative) ^ n) & ((1U << (ECMULT_CONST_GROUP_SIZE - 1)) - 1U); \
 33     secp256k1_fe neg_y; \
 34     VERIFY_CHECK((n) >> ECMULT_CONST_GROUP_SIZE == 0); \
 35@@ -82,6 +82,25 @@ static void secp256k1_ecmult_const_odd_multiples_table_globalz(secp256k1_ge *pre
 36     secp256k1_fe_cmov(&(r)->y, &neg_y, negative); \
 37 } while(0)
 38 
 39+/* For K as defined in the comment of secp256k1_ecmult_const, we have several precomputed
 40+ * formulas/constants.
 41+ * - in exhaustive test mode, we give an explicit expression to compute it at compile time: */
 42+#ifdef EXHAUSTIVE_TEST_ORDER
 43+    static const secp256k1_scalar secp256k1_ecmult_const_K = ((SECP256K1_SCALAR_CONST(0, 0, 0, (1U << (ECMULT_CONST_BITS - 128)) - 2U, 0, 0, 0, 0) + EXHAUSTIVE_TEST_ORDER - 1U) * (1U + EXHAUSTIVE_TEST_LAMBDA)) % EXHAUSTIVE_TEST_ORDER;
 44+/* - for the real secp256k1 group we have constants for various ECMULT_CONST_BITS values. */
 45+#elif ECMULT_CONST_BITS == 129
 46+    /* For GROUP_SIZE = 1,3. */
 47+    static const secp256k1_scalar secp256k1_ecmult_const_K = SECP256K1_SCALAR_CONST(0xac9c52b3ul, 0x3fa3cf1ful, 0x5ad9e3fdul, 0x77ed9ba4ul, 0xa880b9fcul, 0x8ec739c2ul, 0xe0cfc810ul, 0xb51283ceul);
 48+#elif ECMULT_CONST_BITS == 130
 49+    /* For GROUP_SIZE = 2,5. */
 50+    static const secp256k1_scalar secp256k1_ecmult_const_K = SECP256K1_SCALAR_CONST(0xa4e88a7dul, 0xcb13034eul, 0xc2bdd6bful, 0x7c118d6bul, 0x589ae848ul, 0x26ba29e4ul, 0xb5c2c1dcul, 0xde9798d9ul);
 51+#elif ECMULT_CONST_BITS == 132
 52+    /* For GROUP_SIZE = 4,6 */
 53+    static const secp256k1_scalar secp256k1_ecmult_const_K = SECP256K1_SCALAR_CONST(0x76b1d93dul, 0x0fae3c6bul, 0x3215874bul, 0x94e93813ul, 0x7937fe0dul, 0xb66bcaaful, 0xb3749ca5ul, 0xd7b6171b);
 54+#else
 55+#  error "Unknown ECMULT_CONST_BITS"
 56+#endif
 57+
 58 static void secp256k1_ecmult_const(secp256k1_gej *r, const secp256k1_ge *a, const secp256k1_scalar *q) {
 59     /* The approach below combines the signed-digit logic from Mike Hamburg's
 60      * "Fast and compact elliptic-curve cryptography" (https://eprint.iacr.org/2012/309)
 61@@ -93,7 +112,7 @@ static void secp256k1_ecmult_const(secp256k1_gej *r, const secp256k1_ge *a, cons
 62      *
 63      *   C_l(v, A) = sum((2*v[i] - 1) * 2^i*A, i=0..l-1)
 64      *
 65-     * Then it holds that C_n(v, A) = sum((2*v[i] - 1) * 2^i*A, i=0..l-1)
 66+     * Then it holds that C_l(v, A) = sum((2*v[i] - 1) * 2^i*A, i=0..l-1)
 67      *                              = (2*sum(v[i] * 2^i, i=0..l-1) + 1 - 2^l) * A
 68      *                              = (2*v + 1 - 2^l) * A
 69      *
 70@@ -150,26 +169,8 @@ static void secp256k1_ecmult_const(secp256k1_gej *r, const secp256k1_ge *a, cons
 71      */
 72 
 73     /* The offset to add to s1 and s2 to make them non-negative. Equal to 2^128. */
 74-    static const secp256k1_scalar S_OFFSET = SECP256K1_SCALAR_CONST(0, 0, 0, 1, 0, 0, 0, 0);
 75-
 76-    /* For K, we have several precomputed formulas/constants.
 77-     * - in exhaustive test mode, we give an explicit expression to compute it at compile time: */
 78-#ifdef EXHAUSTIVE_TEST_ORDER
 79-    static const secp256k1_scalar K = ((SECP256K1_SCALAR_CONST(0, 0, 0, (1U << (ECMULT_CONST_BITS - 128)) - 2U, 0, 0, 0, 0) + EXHAUSTIVE_TEST_ORDER - 1U) * (1U + EXHAUSTIVE_TEST_LAMBDA)) % EXHAUSTIVE_TEST_ORDER;
 80-    /* - for the real secp256k1 group we have constants for various ECMULT_CONST_BITS values. */
 81-#elif ECMULT_CONST_BITS == 129
 82-    /* For GROUP_SIZE = 1,3. */
 83-    static const secp256k1_scalar K = SECP256K1_SCALAR_CONST(0xac9c52b3ul, 0x3fa3cf1ful, 0x5ad9e3fdul, 0x77ed9ba4ul, 0xa880b9fcul, 0x8ec739c2ul, 0xe0cfc810ul, 0xb51283ceul);
 84-#elif ECMULT_CONST_BITS == 130
 85-    /* For GROUP_SIZE = 2,5. */
 86-    static const secp256k1_scalar K = SECP256K1_SCALAR_CONST(0xa4e88a7dul, 0xcb13034eul, 0xc2bdd6bful, 0x7c118d6bul, 0x589ae848ul, 0x26ba29e4ul, 0xb5c2c1dcul, 0xde9798d9ul);
 87-#elif ECMULT_CONST_BITS == 132
 88-    /* For GROUP_SIZE = 4,6 */
 89-    static const secp256k1_scalar K = SECP256K1_SCALAR_CONST(0x76b1d93dul, 0x0fae3c6bul, 0x3215874bul, 0x94e93813ul, 0x7937fe0dul, 0xb66bcaaful, 0xb3749ca5ul, 0xd7b6171b);
 90-#else
 91-#  error "Unknown ECMULT_CONST_BITS"
 92-#endif
 93 
 94+    static const secp256k1_scalar S_OFFSET = SECP256K1_SCALAR_CONST(0, 0, 0, 1, 0, 0, 0, 0);
 95     secp256k1_scalar s, v1, v2;
 96     secp256k1_ge pre_a[ECMULT_CONST_TABLE_SIZE];
 97     secp256k1_ge pre_a_lam[ECMULT_CONST_TABLE_SIZE];
 98@@ -178,7 +179,7 @@ static void secp256k1_ecmult_const(secp256k1_gej *r, const secp256k1_ge *a, cons
 99     int group, i;
100 
101     /* Compute v1 and v2. */
102-    secp256k1_scalar_add(&s, q, &K);
103+    secp256k1_scalar_add(&s, q, &secp256k1_ecmult_const_K);
104     secp256k1_scalar_half(&s, &s);
105     secp256k1_scalar_split_lambda(&v1, &v2, &s);
106     secp256k1_scalar_add(&v1, &v1, &S_OFFSET);
107@@ -209,7 +210,6 @@ static void secp256k1_ecmult_const(secp256k1_gej *r, const secp256k1_ge *a, cons
108     secp256k1_gej_set_ge(&res, a);
109     secp256k1_ecmult_const_odd_multiples_table_globalz(pre_a, &Z, &res);
110     for (i = 0; i < ECMULT_CONST_TABLE_SIZE; i++) {
111-        secp256k1_fe_normalize_weak(&pre_a[i].y);
112         secp256k1_ge_mul_lambda(&pre_a_lam[i], &pre_a[i]);
113     }
114 
115@@ -219,7 +219,7 @@ static void secp256k1_ecmult_const(secp256k1_gej *r, const secp256k1_ge *a, cons
116      * at a time, from high in v1/v2 to low. Call these bits1 (from v1) and bits2 (from v2).
117      *
118      * Now note that ECMULT_CONST_TABLE_GET_GE(&t, pre_a, bits1) loads into t a point equal
119-     * to C_{ECMULT_CONST_GROUP_SIZE}(bits, A), and analogously for pre_lam_a / bits2.
120+     * to C_{ECMULT_CONST_GROUP_SIZE}(bits1, A), and analogously for pre_lam_a / bits2.
121      * This means that all we need to do is add these looked up values together, multiplied
122      * by 2^(ECMULT_GROUP_SIZE * group).
123      */
124diff --git a/src/tests.c b/src/tests.c
125index 1887d6aa..3ed8e49c 100644
126--- a/src/tests.c
127+++ b/src/tests.c
128@@ -2375,11 +2375,17 @@ static void run_scalar_tests(void) {
129         /* Test that halving and doubling roundtrips on some fixed values. */
130         static const secp256k1_scalar HALF_TESTS[] = {
131             /* 0 */
132-            SECP256K1_SCALAR_CONST(0, 0, 0, 0, 0, 0, 0, 0),
133+            secp256k1_scalar_one,
134             /* 1 */
135-            SECP256K1_SCALAR_CONST(0, 0, 0, 0, 0, 0, 0, 1),
136+            secp256k1_scalar_zero,
137             /* -1 */
138             SECP256K1_SCALAR_CONST(0xfffffffful, 0xfffffffful, 0xfffffffful, 0xfffffffeul, 0xbaaedce6ul, 0xaf48a03bul, 0xbfd25e8cul, 0xd0364140ul),
139+            /* -2 (largest odd value) */
140+            SECP256K1_SCALAR_CONST(0xfffffffful, 0xfffffffful, 0xfffffffful, 0xfffffffeul, 0xbaaedce6ul, 0xaf48a03bul, 0xbfd25e8cul, 0xd036413Ful),
141+            /* Half the secp256k1 order */
142+            SECP256K1_SCALAR_CONST(0x7ffffffful, 0xfffffffful, 0xfffffffful, 0xfffffffful, 0x5d576e73ul, 0x57a4501dul, 0xdfe92f46ul, 0x681b20a0ul),
143+            /* Half the secp256k1 order + 1*/
144+            SECP256K1_SCALAR_CONST(0x7ffffffful, 0xfffffffful, 0xfffffffful, 0xfffffffful, 0x5d576e73ul, 0x57a4501dul, 0xdfe92f46ul, 0x681b20a1ul),
145             /* 2^255 */
146             SECP256K1_SCALAR_CONST(0x80000000ul, 0, 0, 0, 0, 0, 0, 0),
147             /* 2^255 - 1 */
148@@ -4535,25 +4541,75 @@ static void ecmult_const_commutativity(void) {
149 static void ecmult_const_mult_zero_one(void) {
150     secp256k1_scalar zero = SECP256K1_SCALAR_CONST(0, 0, 0, 0, 0, 0, 0, 0);
151     secp256k1_scalar one = SECP256K1_SCALAR_CONST(0, 0, 0, 0, 0, 0, 0, 1);
152+    secp256k1_scalar s;
153     secp256k1_scalar negone;
154     secp256k1_gej res1;
155     secp256k1_ge res2;
156     secp256k1_ge point;
157-    secp256k1_scalar_negate(&negone, &one);
158+    secp256k1_ge inf;
159 
160+    random_scalar_order_test(&s);
161+    secp256k1_scalar_negate(&negone, &one);
162     random_group_element_test(&point);
163+    secp256k1_ge_set_infinity(&inf);
164+
165+    /* 0*point */
166     secp256k1_ecmult_const(&res1, &point, &zero);
167-    secp256k1_ge_set_gej(&res2, &res1);
168-    CHECK(secp256k1_ge_is_infinity(&res2));
169+    CHECK(secp256k1_gej_is_infinity(&res1));
170+
171+    /* s*inf */
172+    secp256k1_ecmult_const(&res1, &inf, &s);
173+    CHECK(secp256k1_gej_is_infinity(&res1));
174+
175+    /* 1*point */
176     secp256k1_ecmult_const(&res1, &point, &one);
177     secp256k1_ge_set_gej(&res2, &res1);
178     ge_equals_ge(&res2, &point);
179+
180+    /* -1*point */
181     secp256k1_ecmult_const(&res1, &point, &negone);
182     secp256k1_gej_neg(&res1, &res1);
183     secp256k1_ge_set_gej(&res2, &res1);
184     ge_equals_ge(&res2, &point);
185 }
186 
187+static void ecmult_const_check_result(const secp256k1_ge *A, const secp256k1_scalar* q, const secp256k1_gej *res) {
188+    secp256k1_scalar zero = SECP256K1_SCALAR_CONST(0, 0, 0, 0, 0, 0, 0, 0);
189+    secp256k1_gej pointj, res2j;
190+    secp256k1_ge res2;
191+    secp256k1_gej_set_ge(&pointj, A);
192+    secp256k1_ecmult(&res2j, &pointj, q, &zero);
193+    secp256k1_ge_set_gej(&res2, &res2j);
194+    ge_equals_gej(&res2, res);
195+}
196+
197+static void ecmult_const_mult_edges(void) {
198+    secp256k1_scalar q;
199+    secp256k1_ge point;
200+    secp256k1_gej res;
201+    size_t i;
202+    size_t cases = 1 + sizeof(scalars_near_split_bounds) / sizeof(scalars_near_split_bounds[0]);
203+
204+    /* We are trying to reach the following edge cases (variables are defined as
205+     * in ecmult_const_impl.h):
206+     *   1. i = 0: s = 0 <=> q = -K
207+     *   2. i > 0: v1, v2 large values
208+     *               <=> s1, s2 large values
209+     *               <=> s = scalars_near_split_bounds[i]
210+     *               <=> q = 2*scalars_near_split_bounds[i] - K
211+     */
212+    for (i = 0; i < cases; ++i) {
213+        secp256k1_scalar_negate(&q, &secp256k1_ecmult_const_K);
214+        if (i > 0) {
215+            secp256k1_scalar_add(&q, &q, &scalars_near_split_bounds[i - 1]);
216+            secp256k1_scalar_add(&q, &q, &scalars_near_split_bounds[i - 1]);
217+        }
218+        random_group_element_test(&point);
219+        secp256k1_ecmult_const(&res, &point, &q);
220+        ecmult_const_check_result(&point, &q, &res);
221+    }
222+}
223+
224 static void ecmult_const_mult_xonly(void) {
225     int i;
226 
227@@ -4644,6 +4700,7 @@ static void ecmult_const_chain_multiply(void) {
228 
229 static void run_ecmult_const_tests(void) {
230     ecmult_const_mult_zero_one();
231+    ecmult_const_mult_edges();
232     ecmult_const_random_mult();
233     ecmult_const_commutativity();
234     ecmult_const_chain_multiply();

sipa force-pushed on Jun 20, 2023

sipa commented at 8:40 pm on June 20, 2023: contributor

Fixed compilation error, reverting the following:

 0diff --git a/src/tests.c b/src/tests.c
 1index 3ed8e49c..88b5c3a4 100644
 2--- a/src/tests.c
 3+++ b/src/tests.c
 4@@ -2375,9 +2375,9 @@ static void run_scalar_tests(void) {
 5         /* Test that halving and doubling roundtrips on some fixed values. */
 6         static const secp256k1_scalar HALF_TESTS[] = {
 7             /* 0 */
 8-            secp256k1_scalar_one,
 9+            SECP256K1_SCALAR_CONST(0, 0, 0, 0, 0, 0, 0, 0),
10             /* 1 */
11-            secp256k1_scalar_zero,
12+            SECP256K1_SCALAR_CONST(0, 0, 0, 0, 0, 0, 0, 1),
13             /* -1 */
14             SECP256K1_SCALAR_CONST(0xfffffffful, 0xfffffffful, 0xfffffffful, 0xfffffffeul, 0xbaaedce6ul, 0xaf48a03bul, 0xbfd25e8cul, 0xd0364140ul),
15             /* -2 (largest odd value) */

in src/scalar_8x32_impl.h:248 in e6b14b947f outdated

217@@ -218,6 +218,37 @@ static void secp256k1_scalar_negate(secp256k1_scalar *r, const secp256k1_scalar
218     r->d[7] = t & nonzero;
219 }
220 
221+static void secp256k1_scalar_half(secp256k1_scalar *r, const secp256k1_scalar *a) {

real-or-random commented at 2:35 pm on June 22, 2023:

Notes for myself (that could be turned into a comment):

 0Writing `/` for field division and `//` for integer division, we compute
 1  a/2 = (a - (a&1))/2 + (a&1)/2
 2      = (a >> 1) + (a&1 ?    1/2 : 0)
 3      = (a >> 1) + (a&1 ? n//2+1 : 0),
 4where n is the group order and in the last equality we have used 1/2 = n//2+1 (mod n).
 5For n/2, we have the constants SECP256K1_N_H_0, ...
 6
 7The sum does not overflow:
 8The interesting case is a = -2, the largest odd scalar. Here, the left summand is
 9  a >> 1 = (a - a&1)/2  = (n-2-1)//2 = (n-3)//2,
10and the right summand is
11   a&1 ? n//2+1 : 0 = n//2+1 = (n-1)//2 + 2//2 = (n+1)//2.
12In sum, we have (n-3)//2 + (n+1)//2 = (2n-2)//2 = n - 1 < n.

sipa commented at 6:51 pm on June 23, 2023:

Not sure I understand the a[255..0] notation.

real-or-random commented at 8:09 am on June 24, 2023:

Yeah, I edited the text to remove it. It was a relic of an earlier draft.

sipa commented at 7:11 pm on June 26, 2023:

Included in the commit with some minor editing.

in src/scalar_8x32_impl.h:236 in e6b14b947f outdated

231+    r->d[2] = t; t >>= 32;
232+    t += (uint32_t)((a->d[3] >> 1) | (a->d[4] << 31));
233+    t += SECP256K1_N_H_3 & mask;
234+    r->d[3] = t; t >>= 32;
235+    t += (uint32_t)((a->d[4] >> 1) | (a->d[5] << 31));
236+    t += mask;

real-or-random commented at 2:39 pm on June 22, 2023:

0    t += SECP256K1_N_H_4 & mask;

and same below is easier to read and more general.

sipa commented at 7:01 pm on June 23, 2023:

Done.

in src/scalar_8x32_impl.h:294 in e6b14b947f outdated

244+    r->d[7] = (uint32_t)t + (uint32_t)(a->d[7] >> 1) + (SECP256K1_N_H_7 & mask);
245+#ifdef VERIFY
246+    /* The line above only computed the bottom 32 bits of r->d[7]. Redo the computation
247+     * in full 64 bits to make sure no overflow occured. */
248+    VERIFY_CHECK((t + (a->d[7] >> 1) + (SECP256K1_N_H_7 & mask)) >> 32 == 0);
249+#endif

real-or-random commented at 9:05 am on June 23, 2023:

Add a VERIFY(secp256k1_scalar_check_overflow(r)) == 0)?

(We should probably add a secp256k1_scalar_verify and check elements in every function, like for field and group. But that’s a separate PR.)

sipa commented at 7:01 pm on June 23, 2023:

Done.

sipa force-pushed on Jun 23, 2023

sipa force-pushed on Jun 26, 2023

real-or-random cross-referenced this on Jun 27, 2023 from issue scalar: Verify invariants on every entry by real-or-random

jonasnick commented at 12:22 pm on July 3, 2023: contributor

I didn’t include the macro -> function change, as I measure a slight slowdown from it.

Oh, ok. I did not run benchmarks.

1a26896eff6bc2f7b2edbcd0313c3e2e574b08a6 looks good

sipa force-pushed on Jul 9, 2023

sipa commented at 2:06 pm on July 9, 2023: contributor

Needed trivial rebase (CHANGELOG.md affected by #1354).

real-or-random commented at 11:32 am on August 24, 2023: contributor

needs rebase (and I plan to have a look soon :))

sipa force-pushed on Aug 24, 2023

sipa commented at 8:35 pm on August 24, 2023: contributor

Rebased.

jonasnick added this to the milestone 0.4.1 or 0.5.0 on Sep 20, 2023

in src/ecmult_const_impl.h:22 in 5dbf252935 outdated

17+ * the tables cannot have infinities in them (this breaks the effective-affine technique's
18+ * z-ratio tracking) */
19+#  if EXHAUSTIVE_TEST_ORDER == 199
20+#    define ECMULT_CONST_GROUP_SIZE 4
21+#  elif EXHAUSTIVE_TEST_ORDER == 13
22+#    define ECMULT_CONST_GROUP_SIZE 3

theStack commented at 3:43 pm on September 26, 2023:

Is there any reason why the smallest exhaustive group test order of the ones that we typically use (7) is not supported here? Adding it with group size 2, i.e.

0#  elif EXHAUSTIVE_TEST_ORDER == 7
1#    define ECMULT_CONST_GROUP_SIZE 2

seems to work at least (adapting EXHAUSTIVE_TEST_ORDER to 7 in ./src/tests_exhaustive.c and running them doesn’t return any error)

sipa commented at 3:28 pm on October 23, 2023:

Done. This PR (or its code) may have predated the test order 7 introduction.

in src/ecmult_const_impl.h:67 in 5dbf252935 outdated

78     secp256k1_fe neg_y; \
79-    VERIFY_CHECK(((n) & 1) == 1); \
80-    VERIFY_CHECK((n) >= -((1 << ((w)-1)) - 1)); \
81-    VERIFY_CHECK((n) <=  ((1 << ((w)-1)) - 1)); \
82+    VERIFY_CHECK((n) >> ECMULT_CONST_GROUP_SIZE == 0); \
83+    VERIFY_CHECK(index >> (ECMULT_CONST_GROUP_SIZE - 1) == 0); \

theStack commented at 3:49 pm on September 26, 2023:

nit: maybe slightly easier to grasp (at least for me), if the intended assertions here are “n must be in range [0,2^ECMULT_CONST_GROUP_SIZE[” and “index must be in range [0,2^(ECMULT_CONST_GROUP_SIZE-1)[”?

0    VERIFY_CHECK((n) < (1U << ECMULT_CONST_GROUP_SIZE)); \
1    VERIFY_CHECK(index < (1U << (ECMULT_CONST_GROUP_SIZE - 1))); \

(assuming callers of this macro only ever pass unsigned n)

sipa commented at 3:28 pm on October 23, 2023:

Done.

in src/ecmult_const_impl.h:119 in f773744172 outdated

204+     *
205+     * Then it holds that C_l(v, A) = sum((2*v[i] - 1) * 2^i*A, i=0..l-1)
206+     *                              = (2*sum(v[i] * 2^i, i=0..l-1) + 1 - 2^l) * A
207+     *                              = (2*v + 1 - 2^l) * A
208+     *
209+     * Thus, one can compute q*A as C_256((q + 2^l - 1) / 2, A). This is the basis for the

siv2r commented at 1:33 pm on October 21, 2023:

0     * Thus, one can compute q*A as C_256((q + 2^256 - 1) / 2, A). This is the basis for the

sipa commented at 3:29 pm on October 23, 2023:

Fixed.

sipa force-pushed on Oct 23, 2023

sipa commented at 3:29 pm on October 23, 2023: contributor

Rebased and addressed the comments above.

jonasnick approved

jonasnick commented at 12:53 pm on October 24, 2023: contributor

ACK mod nits c8eb787cd96995aefa79bf4bab18903c409db026

I added a couple of (micro-)nit fixups to my branch. Most notably, it tries to unify the first iteration in ecmult_const with the loop and adds an explanation for the index computation in ECMULT_CONST_TABLE_GET_GE.

in src/scalar_8x32_impl.h:292 in 878b49383c outdated

287+    r->d[7] = (uint32_t)t + (uint32_t)(a->d[7] >> 1) + (SECP256K1_N_H_7 & mask);
288+#ifdef VERIFY
289+    /* The line above only computed the bottom 32 bits of r->d[7]. Redo the computation
290+     * in full 64 bits to make sure no overflow occured. */
291+    VERIFY_CHECK((t + (a->d[7] >> 1) + (SECP256K1_N_H_7 & mask)) >> 32 == 0);
292+    VERIFY_CHECK(secp256k1_scalar_check_overflow(r) == 0);

real-or-random commented at 1:52 pm on October 25, 2023:

nit: This is redundant because it’s done by secp256k1_scalar_verify. (Same in 64 bit version)

sipa commented at 7:56 pm on November 4, 2023:

Gone.

in src/scalar_8x32_impl.h:290 in 878b49383c outdated

285+    t += SECP256K1_N_H_6 & mask;
286+    r->d[6] = t; t >>= 32;
287+    r->d[7] = (uint32_t)t + (uint32_t)(a->d[7] >> 1) + (SECP256K1_N_H_7 & mask);
288+#ifdef VERIFY
289+    /* The line above only computed the bottom 32 bits of r->d[7]. Redo the computation
290+     * in full 64 bits to make sure no overflow occured. */

real-or-random commented at 1:54 pm on October 25, 2023:

nit: I think the comment is slightly misleading. Unless I’m mistaken, we redo the computation to check that indeed we didn’t need to compute the upper 32 bits explicitly, because we know they’re zero. I don’t see how this is related to overflow.

sipa commented at 7:56 pm on November 4, 2023:

Fixed.

in src/ecmult_const_impl.h:66 in 402b3776a8 outdated

74+#define ECMULT_CONST_TABLE_GET_GE(r,pre,n) do { \
75+    unsigned m = 0; \
76+    /* If the top bit of n is 0, we want the negation. */ \
77+    volatile unsigned negative = ((n) >> (ECMULT_CONST_GROUP_SIZE - 1)) ^ 1; \
78+    /* The index is computed by looking at the bottom bits, after making positive. */ \
79+    unsigned index = ((unsigned)(-negative) ^ n) & ((1U << (ECMULT_CONST_GROUP_SIZE - 1)) - 1U); \

real-or-random commented at 2:09 pm on October 25, 2023:

nit: It’s purely stylistic, but I suggest preferring unsigned int over just unsigned, as we do almost everywhere.

(There are few more instances below.)

sipa commented at 7:56 pm on November 4, 2023:

Done.

in src/ecmult_const_impl.h:194 in 402b3776a8 outdated

258+     */
259 
260-    int i;
261+    /* The offset to add to s1 and s2 to make them non-negative. Equal to 2^128. */
262 
263+    static const secp256k1_scalar S_OFFSET = SECP256K1_SCALAR_CONST(0, 0, 0, 1, 0, 0, 0, 0);

real-or-random commented at 2:35 pm on October 25, 2023:

nit: Drop the empty line here?

sipa commented at 7:57 pm on November 4, 2023:

Done.

in src/ecmult_const_impl.h:207 in 402b3776a8 outdated

279+     * secp256k1_ecmult_const_odd_multiples_table_globalz) cannot deal with infinity in a
280+     * constant-time manner anyway. */
281     if (secp256k1_ge_is_infinity(a)) {
282         secp256k1_gej_set_infinity(r);
283         return;
284     }

real-or-random commented at 2:49 pm on October 25, 2023:

nit: Want to move this to the top of the function? This will make the return earlier, but I also think it’s generally more readable. (At the moment, this block is between computing v1, v2 and VERIFYing their range. That’s a bit unnatural.)

sipa commented at 7:57 pm on November 4, 2023:

Done.

in src/ecmult_const_impl.h:256 in 402b3776a8 outdated

262 
263+    static const secp256k1_scalar S_OFFSET = SECP256K1_SCALAR_CONST(0, 0, 0, 1, 0, 0, 0, 0);
264+    secp256k1_scalar s, v1, v2;
265+    secp256k1_ge pre_a[ECMULT_CONST_TABLE_SIZE];
266+    secp256k1_ge pre_a_lam[ECMULT_CONST_TABLE_SIZE];
267+    secp256k1_fe Z;

real-or-random commented at 3:06 pm on October 25, 2023:

nit: I think we should reserve uppercase for macros.

sipa commented at 7:57 pm on November 4, 2023:

Fixed.

in src/ecmult_const_impl.h:257 in 402b3776a8 outdated

263+    static const secp256k1_scalar S_OFFSET = SECP256K1_SCALAR_CONST(0, 0, 0, 1, 0, 0, 0, 0);
264+    secp256k1_scalar s, v1, v2;
265+    secp256k1_ge pre_a[ECMULT_CONST_TABLE_SIZE];
266+    secp256k1_ge pre_a_lam[ECMULT_CONST_TABLE_SIZE];
267+    secp256k1_fe Z;
268+    secp256k1_gej res;

real-or-random commented at 3:08 pm on October 25, 2023:

nit: Is there a reason not to work on r directly?

sipa commented at 7:57 pm on November 4, 2023:

No real reason; fixed.

in src/tests.c:4499 in 31969b5b81 outdated

4497     secp256k1_ge_set_gej(&res2, &res1);
4498     ge_equals_ge(&res2, &point);
4499 }
4500 
4501+static void ecmult_const_check_result(const secp256k1_ge *A, const secp256k1_scalar* q, const secp256k1_gej *res) {
4502+    secp256k1_scalar zero = SECP256K1_SCALAR_CONST(0, 0, 0, 0, 0, 0, 0, 0);

real-or-random commented at 3:17 pm on October 25, 2023:

nit: secp256k1_scalar_zero?

sipa commented at 7:58 pm on November 4, 2023:

Done.

in src/ecmult_const_impl.h:98 in c8eb787cd9 outdated

141+#elif ECMULT_CONST_BITS == 129
142+    /* For GROUP_SIZE = 1,3. */
143+    static const secp256k1_scalar secp256k1_ecmult_const_K = SECP256K1_SCALAR_CONST(0xac9c52b3ul, 0x3fa3cf1ful, 0x5ad9e3fdul, 0x77ed9ba4ul, 0xa880b9fcul, 0x8ec739c2ul, 0xe0cfc810ul, 0xb51283ceul);
144+#elif ECMULT_CONST_BITS == 130
145+    /* For GROUP_SIZE = 2,5. */
146+    static const secp256k1_scalar secp256k1_ecmult_const_K = SECP256K1_SCALAR_CONST(0xa4e88a7dul, 0xcb13034eul, 0xc2bdd6bful, 0x7c118d6bul, 0x589ae848ul, 0x26ba29e4ul, 0xb5c2c1dcul, 0xde9798d9ul);

real-or-random commented at 3:43 pm on October 25, 2023:

This K is odd, which means one strategy for computing (q + K) / 2 is

0(q + K) / 2 = if q is even then q/2 + K/2 else (q+K)/2

Halving even scalars wouldn’t even require our half function, it’s just a shift. But I don’t think we should do this because it works only for group sizes 2,5…

edit: Okay, after I wrote the above, I noticed that we want the algorithm to be constant-time in q. So this would save a half, but at the costs of two shifts, an additional add, and a cmov. That’s probably worse in performance anyway…

sipa commented at 7:59 pm on November 4, 2023:

I think a halving won’t be worse than say an add plus a shift, so I don’t think this would be an improvement.

real-or-random commented at 3:43 pm on October 25, 2023: contributor

utACK mod these nits

also utACK on jonas’ fixups (except a typo and one further comment here: https://github.com/jonasnick/secp256k1/commit/cdf545a9d30cf3805b6e0c34d1baeacefdbf9323 )

I believe I’ve done a thorough review, but I really have only nits.

in src/ecmult_const_impl.h:178 in 402b3776a8 outdated

177+#elif ECMULT_CONST_BITS == 130
178+    /* For GROUP_SIZE = 2,5. */
179+    static const secp256k1_scalar secp256k1_ecmult_const_K = SECP256K1_SCALAR_CONST(0xa4e88a7dul, 0xcb13034eul, 0xc2bdd6bful, 0x7c118d6bul, 0x589ae848ul, 0x26ba29e4ul, 0xb5c2c1dcul, 0xde9798d9ul);
180+#elif ECMULT_CONST_BITS == 132
181+    /* For GROUP_SIZE = 4,6 */
182+    static const secp256k1_scalar secp256k1_ecmult_const_K = SECP256K1_SCALAR_CONST(0x76b1d93dul, 0x0fae3c6bul, 0x3215874bul, 0x94e93813ul, 0x7937fe0dul, 0xb66bcaaful, 0xb3749ca5ul, 0xd7b6171b);

stratospher commented at 5:54 am on October 30, 2023:

micro nit if you touch this commit again:

0    static const secp256k1_scalar secp256k1_ecmult_const_K = SECP256K1_SCALAR_CONST(0x76b1d93dul, 0x0fae3c6bul, 0x3215874bul, 0x94e93813ul, 0x7937fe0dul, 0xb66bcaaful, 0xb3749ca5ul, 0xd7b6171bul);

sipa commented at 7:59 pm on November 4, 2023:

Done.

stratospher commented at 9:53 am on November 3, 2023: contributor

ACK c8eb787.

was able to follow the useful comments and code! also observed a 2% speedup on my machine.

Add secp256k1_scalar_half for halving scalars (+ tests/benchmarks).

Co-authored-by: Jonas Nick <jonasd.nick@gmail.com>
Co-authored-by: Tim Ruffing <crypto@timruffing.de>

2140da9cd5

make SECP256K1_SCALAR_CONST reduce modulo exhaustive group order ba523be067

Signed-digit based ecmult_const algorithm 4d16e90111

ecmult_const: add/improve tests

* add test case for a=infinity

  The corresponding ecmult_const branch was not tested before this commit.

* add test for edge cases

aa9f3a3c00

Remove unused secp256k1_wnaf_const 115fdc7232

Remove unused secp256k1_scalar_shr_int 21f49d9bec

Add changelog entry for signed-digit ecmult_const algorithm 355bbdf38a

sipa force-pushed on Nov 4, 2023

jonasnick approved

jonasnick commented at 10:33 am on November 7, 2023: contributor

ACK 355bbdf38a2f932daadd02325a0d90d902cb2af4

siv2r commented at 4:49 pm on November 7, 2023: contributor

ACK 355bbdf

real-or-random approved

real-or-random commented at 10:17 pm on November 7, 2023: contributor

ACK 355bbdf38a2f932daadd02325a0d90d902cb2af4

real-or-random merged this on Nov 7, 2023

real-or-random closed this on Nov 7, 2023

Signed-digit based ecmult_const algorithm #1184