Signed-digit multi-comb for ecmult_gen #546

pull peterdettman wants to merge 8 commits into bitcoin-core:master from peterdettman:sdmc changing 9 files +414 −24
  1. peterdettman commented at 10:27 am on August 4, 2018: contributor

    See section 3.3 of https://eprint.iacr.org/2012/309 for a description of the algorithm. Briefly, the scalar is recoded into signed-binary form, then divided into several blocks. A separate precomp. table is prepared for each block, and performing a multiplication is done using one comb per block, interleaved.

    This implementation is constant-time, preserves the existing scalar blinding, but the NUMS group element is not yet used, perhaps not really useful (no zeroes in the signed-digit recoding). Static precomputation is not yet implemented. Settings are overridden to let the exhaustive tests work.

    You can play with the comb parameters in ecmult_gen.h .

    Compared to the existing approach, this gives improved performance/memory tradeoffs, and allows considerable flexibility in the parameters depending on platform details.

    The following table gives an idea of the sort of tradeoffs available (bench_sign results - best “min” of 3, asm=no, 64bit field and scalar, -O3, Haswell):

    Blocks Teeth Spacing Memory (KiB) Time (us)
    43 6 1 86 39.2
    22 6 2 44 39.7
    11 6 4 22 40.1
    4 6 11 8 41.0
    4 5 13 4 42.6
    2 5 26 2 44.6
    2 4 32 1 48.4
    1 4 64 0.5 53.3
    1 3 86 0.25 63.1
    1 2 128 0.125 82.3
    1 1 256 0.0625 140

    For existing approach: 44.6us (64KiB precomp. data)

  2. Signed-digit multi-comb for ecmult_gen
    - see section 3.3 of https://eprint.iacr.org/2012/309
    75d9e97b26
  3. Support static precomputation with multi-comb 09ca1468d8
  4. gmaxwell commented at 0:22 am on August 11, 2018: contributor
    Oh wow! I wasn’t expecting to see a big (relative to what we normally get) speedup any time soon.
  5. in src/ecmult_gen_impl.h:147 in 09ca1468d8 outdated
    140@@ -141,7 +141,14 @@ static void secp256k1_ecmult_gen_context_build(secp256k1_ecmult_gen_context *ctx
    141 #endif
    142 #else
    143     (void)cb;
    144-    ctx->prec = (secp256k1_ge_storage (*)[64][16])secp256k1_ecmult_static_context;
    145+#if USE_COMB
    146+    ctx->prec = (secp256k1_ge_storage (*)[COMB_BLOCKS][COMB_POINTS])secp256k1_ecmult_gen_ctx_prec;
    147+#if COMB_OFFSET
    148+    secp256k1_ge_from_storage(&ctx->offset, &secp256k1_ecmult_gen_ctx_offset);
    


    apoelstra commented at 0:44 am on September 23, 2018:
    secp256k1_ecmult_gen_ctx_offset is not declared except in gen_context.c, so this line doesn’t compile for me when I set the parameters to 4/4/16.

    peterdettman commented at 4:46 am on September 23, 2018:
    This is a build system issue; there’s no dependency of gen_context on ecmult_gen.h (and presumably others). At the moment, after changing comb parameters in ecmult_gen.h, you’d need to touch gen_context.c (or just make clean).

    apoelstra commented at 12:59 pm on September 24, 2018:
    This has burned me multiple times while testing – @sipa can you advise how to fix this?
  6. Add comments for context offset e8beef9ab2
  7. in src/ecmult_gen.h:57 in 75d9e97b26 outdated
    52+#endif
    53+
    54+/* The remaining COMB_* parameters are derived values, don't modify these. */
    55+#define COMB_BITS (COMB_BLOCKS * COMB_TEETH * COMB_SPACING)
    56+#define COMB_GROUPED ((COMB_SPACING == 1) && ((32 % COMB_TEETH) == 0))
    57+#define COMB_OFFSET (COMB_BITS == 256)
    


    apoelstra commented at 0:46 am on September 23, 2018:
    Can you add some documentation somewhere about what the offset does?


    apoelstra commented at 2:12 pm on September 23, 2018:
    Great, thanks!
  8. peterdettman commented at 5:52 am on September 23, 2018: contributor
    The CI test failure (1407.7) seems spurious, might need investigation.
  9. apoelstra commented at 1:54 pm on September 23, 2018: contributor
    Agreed it seems spurious - I had the same issue on #557. I kicked travis to see if it works this time.
  10. in src/ecmult_gen.h:47 in 75d9e97b26 outdated
    42+
    43+#else
    44+
    45+  /* COMB_BLOCKS, COMB_TEETH, COMB_SPACING must all be positive and the product of the three (COMB_BITS)
    46+   * must evaluate to a value in the range [256, 288]. The resulting memory usage for precomputation
    47+   * will be COMB_POINTS_TOTAL * sizeof(secp256k1_ge_storage). */
    


    apoelstra commented at 4:27 pm on September 23, 2018:
    Should comment that COMB_SPACING should not exceed 32 or else the bit_pos logic in secp256k1_ecmult_gen stops working.

    peterdettman commented at 2:47 am on September 24, 2018:
    I don’t see why, and all cases in the table above passed the tests before benchmarking.

    apoelstra commented at 12:59 pm on September 24, 2018:

    Oh, I think I forgot to make clean after recompiling again. The tests do pass now.

    But my reasoning was that you’re only looking at one word of recoded at once, so if the bit_pos index jumps forward by more than one word, eventually bit_pos >> 5 will increment by more than one per iteration and you’ll have skipped an entire word. But I’m re-reading the code with fresher eyes and now I see that this is exactly the intended behaviour.

  11. in src/ecmult_gen.h:56 in 75d9e97b26 outdated
    51+
    52+#endif
    53+
    54+/* The remaining COMB_* parameters are derived values, don't modify these. */
    55+#define COMB_BITS (COMB_BLOCKS * COMB_TEETH * COMB_SPACING)
    56+#define COMB_GROUPED ((COMB_SPACING == 1) && ((32 % COMB_TEETH) == 0))
    


    apoelstra commented at 4:30 pm on September 23, 2018:
    I think we should drop COMB_GROUPED because it’s impossible for 32 % COMB_TEETH to be 0. (If COMB_TEETH is zero, things obviously won’t work…but if it’s ≥ 32, then several other things break: COMB_MASK overflows; the bits = expression in ecmult_gen_impl.h:228 left-shifts too far; I also get this bizarre error about the size of prec being negative in ecmult_gen.h:71.)

    apoelstra commented at 4:47 pm on September 23, 2018:

    In fact, once COMB_TEETH gets much past 10 we start running into trouble in secp256k1_ecmult_gen_context_build because we put COMB_POINTS_TOTAL-many gejs on the stack.

    Given that this is a ridiculous thing to do I don’t think we should try to support it, e.g. by using heap allocations rather than stack allocations in gen_context_build. We should just assume it doesn’t happen and drop COMB_GROUPED.


    peterdettman commented at 2:26 am on September 24, 2018:
    I assume you are misreading 32%T, which is 0 when T is a small power of 2. Still, practical values of COMB_TEETH probably peak at 8, and there is the stack concern as you say, so we should probably have the precompiler constrain it.

    apoelstra commented at 12:46 pm on September 24, 2018:
    You’re right, I misread 32 % COMB_TEETH (which is 0 for 1,2,4,8) as COMB_TEETH % 32 (which can’t be 0). Let me reassess COMB_GROUPED in that light.

    peterdettman commented at 1:39 pm on September 24, 2018:
  12. apoelstra commented at 4:57 pm on September 23, 2018: contributor

    Since USE_COMB is now always set, I think it’s worth adding another commit that removes the old precomp code.

    Other than that, and my nits about extreme parameter settings, ACK.

  13. Reduce side-channels from single-bit reads 183a7ffe65
  14. Avoid unnecessary doublings in precomputation e848342a1c
  15. Add precompiler guards on comb constants 4c8ff9b648
  16. Make use of negation optional via COMB_NEGATION 915661985d
  17. Add missing COMB_NEGATION for exhaustive tests 91179c1acc
  18. in src/ecmult_gen_impl.h:234 in 183a7ffe65 outdated
    233             bits = 0;
    234             for (tooth = 0; tooth < COMB_TEETH; ++tooth) {
    235-                bit = (recoded[bit_pos >> 5] >> (bit_pos & 0x1F)) & 1;
    236-                bits |= bit << tooth;
    237+                bit = recoded[bit_pos >> 5] >> (bit_pos & 0x1F);
    238+                bits &= ~(1 << tooth);
    


    apoelstra commented at 5:34 pm on September 24, 2018:
    I think this bit-clearing line isn’t actually needed, because bits starts at 0 and 1 << tooth is unique on each iteration.

    peterdettman commented at 5:49 pm on September 24, 2018:

    Note that ‘bit’ may now contain junk (or noise) bits beyond bit 0. Iterations after the first therefore need to clear the target bit (and we try to leave noise in the high bits). The change is in response to [1], which I think @gmaxwell mentioned on IRC a while back.

    [1] https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-alam.pdf


    apoelstra commented at 5:58 pm on September 24, 2018:
    Ah! I understand, I was misreading bit << tooth as (bit & 1) << tooth.

    apoelstra commented at 5:58 pm on September 24, 2018:
    Because the variable is named bit :P
  19. apoelstra commented at 6:34 pm on September 24, 2018: contributor
    ACK except that I’d really like to fix the make clean issue.
  20. in src/ecmult_gen_impl.h:233 in 91179c1acc
    228+    comb_off = COMB_SPACING - 1;
    229+    for (;;) {
    230+        bit_pos = comb_off;
    231+        for (block = 0; block < COMB_BLOCKS; ++block) {
    232+#if COMB_GROUPED
    233+            bits = recoded[bit_pos >> 5] >> (bit_pos & 0x1F);
    


    apoelstra commented at 2:01 pm on October 2, 2018:
    Should this be done as part of a cmov ladder, like the old algorithm, to avoid cache-timing attacks?

    peterdettman commented at 7:09 pm on October 2, 2018:
    This line is reading a window from the scalar. IIUC, the cmov ladder you mean is the _ge_storage_cmov loop at lines 255-257.

    apoelstra commented at 7:11 pm on October 2, 2018:
    Oh, yes! I missed that - you do indeed have a cmov ladder at the point where the secret data is actually used. Thanks.
  21. gmaxwell cross-referenced this on May 25, 2019 from issue Use a static constant table for small ecmult WINDOW_G sizes. by gmaxwell
  22. apoelstra cross-referenced this on Nov 9, 2019 from issue Don't put an absurd amount of data onto the stack in some configs by gmaxwell
  23. sipa cross-referenced this on Nov 11, 2019 from issue Signed-digit multi-comb for ecmult_gen (by peterdettman) by sipa
  24. sipa commented at 10:46 pm on November 11, 2019: contributor
    I’ve opened a rebased PR with somewhat better integration added in #693.
  25. real-or-random commented at 1:50 pm on February 25, 2020: contributor

    I’ve opened a rebased PR with somewhat better integration added in #693.

    Closing this in favor of #693.

  26. real-or-random closed this on Feb 25, 2020

  27. sipa cross-referenced this on Dec 29, 2021 from issue Signed-digit multi-comb ecmult_gen algorithm by sipa
  28. sipa cross-referenced this on Dec 29, 2021 from issue Signed-digit multi-comb ecmult_gen algorithm by sipa
  29. sipa commented at 9:08 pm on December 29, 2021: contributor
    Another iteration: #1058.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-30 01:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me