Signed-digit multi-comb for ecmult

peterdettman commented at 10:27 am on August 4, 2018: contributor

See section 3.3 of https://eprint.iacr.org/2012/309 for a description of the algorithm. Briefly, the scalar is recoded into signed-binary form, then divided into several blocks. A separate precomp. table is prepared for each block, and performing a multiplication is done using one comb per block, interleaved.

This implementation is constant-time, preserves the existing scalar blinding, but the NUMS group element is not yet used, perhaps not really useful (no zeroes in the signed-digit recoding). ~~Static precomputation is not yet implemented.~~ Settings are overridden to let the exhaustive tests work.

You can play with the comb parameters in ecmult_gen.h .

Compared to the existing approach, this gives improved performance/memory tradeoffs, and allows considerable flexibility in the parameters depending on platform details.

The following table gives an idea of the sort of tradeoffs available (bench_sign results - best “min” of 3, asm=no, 64bit field and scalar, -O3, Haswell):

Blocks	Teeth	Spacing	Memory (KiB)	Time (us)
43	6	1	86	39.2
22	6	2	44	39.7
11	6	4	22	40.1
4	6	11	8	41.0
4	5	13	4	42.6
2	5	26	2	44.6
2	4	32	1	48.4
1	4	64	0.5	53.3
1	3	86	0.25	63.1
1	2	128	0.125	82.3
1	1	256	0.0625	140

For existing approach: 44.6us (64KiB precomp. data)

Signed-digit multi-comb for ecmult_gen

- see section 3.3 of https://eprint.iacr.org/2012/309

75d9e97b26

Support static precomputation with multi-comb 09ca1468d8

gmaxwell commented at 0:22 am on August 11, 2018: contributor

Oh wow! I wasn’t expecting to see a big (relative to what we normally get) speedup any time soon.

in src/ecmult_gen_impl.h:147 in 09ca1468d8 outdated

140@@ -141,7 +141,14 @@ static void secp256k1_ecmult_gen_context_build(secp256k1_ecmult_gen_context *ctx
141 #endif
142 #else
143     (void)cb;
144-    ctx->prec = (secp256k1_ge_storage (*)[64][16])secp256k1_ecmult_static_context;
145+#if USE_COMB
146+    ctx->prec = (secp256k1_ge_storage (*)[COMB_BLOCKS][COMB_POINTS])secp256k1_ecmult_gen_ctx_prec;
147+#if COMB_OFFSET
148+    secp256k1_ge_from_storage(&ctx->offset, &secp256k1_ecmult_gen_ctx_offset);

apoelstra commented at 0:44 am on September 23, 2018:

secp256k1_ecmult_gen_ctx_offset is not declared except in gen_context.c, so this line doesn’t compile for me when I set the parameters to 4/4/16.

peterdettman commented at 4:46 am on September 23, 2018:

This is a build system issue; there’s no dependency of gen_context on ecmult_gen.h (and presumably others). At the moment, after changing comb parameters in ecmult_gen.h, you’d need to touch gen_context.c (or just make clean).

apoelstra commented at 12:59 pm on September 24, 2018:

This has burned me multiple times while testing – @sipa can you advise how to fix this?

Add comments for context offset e8beef9ab2

in src/ecmult_gen.h:57 in 75d9e97b26 outdated

52+#endif
53+
54+/* The remaining COMB_* parameters are derived values, don't modify these. */
55+#define COMB_BITS (COMB_BLOCKS * COMB_TEETH * COMB_SPACING)
56+#define COMB_GROUPED ((COMB_SPACING == 1) && ((32 % COMB_TEETH) == 0))
57+#define COMB_OFFSET (COMB_BITS == 256)

apoelstra commented at 0:46 am on September 23, 2018:

Can you add some documentation somewhere about what the offset does?

peterdettman commented at 5:12 am on September 23, 2018:

Done: https://github.com/bitcoin-core/secp256k1/pull/546/commits/e8beef9ab2476da9a912a7a12982b1e86d2f166c .

apoelstra commented at 2:12 pm on September 23, 2018:

Great, thanks!

peterdettman commented at 5:52 am on September 23, 2018: contributor

The CI test failure (1407.7) seems spurious, might need investigation.

apoelstra commented at 1:54 pm on September 23, 2018: contributor

Agreed it seems spurious - I had the same issue on #557. I kicked travis to see if it works this time.

in src/ecmult_gen.h:47 in 75d9e97b26 outdated

42+
43+#else
44+
45+  /* COMB_BLOCKS, COMB_TEETH, COMB_SPACING must all be positive and the product of the three (COMB_BITS)
46+   * must evaluate to a value in the range [256, 288]. The resulting memory usage for precomputation
47+   * will be COMB_POINTS_TOTAL * sizeof(secp256k1_ge_storage). */

apoelstra commented at 4:27 pm on September 23, 2018:

Should comment that COMB_SPACING should not exceed 32 or else the bit_pos logic in secp256k1_ecmult_gen stops working.

peterdettman commented at 2:47 am on September 24, 2018:

I don’t see why, and all cases in the table above passed the tests before benchmarking.

apoelstra commented at 12:59 pm on September 24, 2018:

Oh, I think I forgot to make clean after recompiling again. The tests do pass now.

But my reasoning was that you’re only looking at one word of recoded at once, so if the bit_pos index jumps forward by more than one word, eventually bit_pos >> 5 will increment by more than one per iteration and you’ll have skipped an entire word. But I’m re-reading the code with fresher eyes and now I see that this is exactly the intended behaviour.

in src/ecmult_gen.h:56 in 75d9e97b26 outdated

51+
52+#endif
53+
54+/* The remaining COMB_* parameters are derived values, don't modify these. */
55+#define COMB_BITS (COMB_BLOCKS * COMB_TEETH * COMB_SPACING)
56+#define COMB_GROUPED ((COMB_SPACING == 1) && ((32 % COMB_TEETH) == 0))

apoelstra commented at 4:30 pm on September 23, 2018:

I think we should drop COMB_GROUPED because it’s impossible for 32 % COMB_TEETH to be 0. (If COMB_TEETH is zero, things obviously won’t work…but if it’s ≥ 32, then several other things break: COMB_MASK overflows; the bits = expression in ecmult_gen_impl.h:228 left-shifts too far; I also get this bizarre error about the size of prec being negative in ecmult_gen.h:71.)

apoelstra commented at 4:47 pm on September 23, 2018:

In fact, once COMB_TEETH gets much past 10 we start running into trouble in secp256k1_ecmult_gen_context_build because we put COMB_POINTS_TOTAL-many gejs on the stack.

Given that this is a ridiculous thing to do I don’t think we should try to support it, e.g. by using heap allocations rather than stack allocations in gen_context_build. We should just assume it doesn’t happen and drop COMB_GROUPED.

peterdettman commented at 2:26 am on September 24, 2018:

I assume you are misreading 32%T, which is 0 when T is a small power of 2. Still, practical values of COMB_TEETH probably peak at 8, and there is the stack concern as you say, so we should probably have the precompiler constrain it.

apoelstra commented at 12:46 pm on September 24, 2018:

You’re right, I misread 32 % COMB_TEETH (which is 0 for 1,2,4,8) as COMB_TEETH % 32 (which can’t be 0). Let me reassess COMB_GROUPED in that light.

peterdettman commented at 1:39 pm on September 24, 2018:

https://github.com/bitcoin-core/secp256k1/pull/546/commits/4c8ff9b648a7062133922d2d0c945712e77dd753 adds precompiler guards on the comb constants.

apoelstra commented at 4:57 pm on September 23, 2018: contributor

Since USE_COMB is now always set, I think it’s worth adding another commit that removes the old precomp code.

Other than that, and my nits about extreme parameter settings, ACK.

Reduce side-channels from single-bit reads 183a7ffe65

Avoid unnecessary doublings in precomputation e848342a1c

Add precompiler guards on comb constants 4c8ff9b648

Make use of negation optional via COMB_NEGATION 915661985d

Add missing COMB_NEGATION for exhaustive tests 91179c1acc

in src/ecmult_gen_impl.h:234 in 183a7ffe65 outdated

233             bits = 0;
234             for (tooth = 0; tooth < COMB_TEETH; ++tooth) {
235-                bit = (recoded[bit_pos >> 5] >> (bit_pos & 0x1F)) & 1;
236-                bits |= bit << tooth;
237+                bit = recoded[bit_pos >> 5] >> (bit_pos & 0x1F);
238+                bits &= ~(1 << tooth);

apoelstra commented at 5:34 pm on September 24, 2018:

I think this bit-clearing line isn’t actually needed, because bits starts at 0 and 1 << tooth is unique on each iteration.

peterdettman commented at 5:49 pm on September 24, 2018:

Note that ‘bit’ may now contain junk (or noise) bits beyond bit 0. Iterations after the first therefore need to clear the target bit (and we try to leave noise in the high bits). The change is in response to [1], which I think @gmaxwell mentioned on IRC a while back.

[1] https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-alam.pdf

apoelstra commented at 5:58 pm on September 24, 2018:

Ah! I understand, I was misreading bit << tooth as (bit & 1) << tooth.

apoelstra commented at 5:58 pm on September 24, 2018:

Because the variable is named bit :P

apoelstra commented at 6:34 pm on September 24, 2018: contributor

ACK except that I’d really like to fix the make clean issue.

in src/ecmult_gen_impl.h:233 in 91179c1acc

228+    comb_off = COMB_SPACING - 1;
229+    for (;;) {
230+        bit_pos = comb_off;
231+        for (block = 0; block < COMB_BLOCKS; ++block) {
232+#if COMB_GROUPED
233+            bits = recoded[bit_pos >> 5] >> (bit_pos & 0x1F);

apoelstra commented at 2:01 pm on October 2, 2018:

Should this be done as part of a cmov ladder, like the old algorithm, to avoid cache-timing attacks?

peterdettman commented at 7:09 pm on October 2, 2018:

This line is reading a window from the scalar. IIUC, the cmov ladder you mean is the _ge_storage_cmov loop at lines 255-257.

apoelstra commented at 7:11 pm on October 2, 2018:

Oh, yes! I missed that - you do indeed have a cmov ladder at the point where the secret data is actually used. Thanks.

gmaxwell cross-referenced this on May 25, 2019 from issue Use a static constant table for small ecmult WINDOW_G sizes. by gmaxwell

apoelstra cross-referenced this on Nov 9, 2019 from issue Don't put an absurd amount of data onto the stack in some configs by gmaxwell

sipa cross-referenced this on Nov 11, 2019 from issue Signed-digit multi-comb for ecmult_gen (by peterdettman) by sipa

sipa commented at 10:46 pm on November 11, 2019: contributor

I’ve opened a rebased PR with somewhat better integration added in #693.

real-or-random commented at 1:50 pm on February 25, 2020: contributor

I’ve opened a rebased PR with somewhat better integration added in #693.

Closing this in favor of #693.

real-or-random closed this on Feb 25, 2020

sipa cross-referenced this on Dec 29, 2021 from issue Signed-digit multi-comb ecmult_gen algorithm by sipa

sipa commented at 9:08 pm on December 29, 2021: contributor

Another iteration: #1058.

Signed-digit multi-comb for ecmult_gen #546