Avoid division when decomposing scalars #21

pull peterdettman wants to merge 1 commits into bitcoin-core:master from peterdettman:split-without-div changing 4 files +72 −43
  1. peterdettman commented at 2:49 pm on May 25, 2014: contributor
    • In secp256k1_gej_split_exp, there are two divisions used. Since the denominator is a constant known at compile-time, each can be replaced by a multiplication followed by a right-shift (and rounding).
    • This change adds the constants g1, g2 for this purpose and rewrites secp256k1_gej_split_exp accordingly.

    The technique is discussed, amongst other places, in the paper “Efficient Software Implementation of Public-Key Cryptography on Sensor Networks Using the MSP430X Microcontroller” (Gouvea, Oliveira, Lopez) - in section 4.3.

  2. sipa commented at 5:27 pm on June 2, 2014: contributor
    Nice, this removes the need for secp256k1_num_div entirely.
  3. sipa commented at 11:11 pm on June 15, 2014: contributor

    Can you:

    • Remove the secp256k1_num_div function and its implementations.
    • Add some commentary about how you came up with the numbers?
  4. peterdettman commented at 2:36 am on June 21, 2014: contributor
    @sipa Apologies for the delay, I’m a bit pushed for time at the moment. I’ll return to these pull requests hopefully later this week.
  5. peterdettman commented at 4:21 am on June 24, 2014: contributor
    I’ve added some comments and removed _num_div, but not ready for merge yet as it’s not giving the expected performance improvement. Bear with me while I investigate!
  6. peterdettman commented at 9:54 am on June 24, 2014: contributor
    The split itself is a lot faster, but there’s some kind of one-off-error vs the original “bnc1/bnc2” values on a small proportion of inputs, so looking into that now.
  7. peterdettman commented at 1:27 pm on July 16, 2014: contributor

    This one’s got me a bit confused. Not the “one-off-error”, which was actually just the rough edges of what is inherently an approximation (I’ve increased the precision so now the split is virtually always exactly the same as the old code). Rather, I can’t see that performance (of bench) is actually improved.

    Naively an improvement is expected since two divisions are replaced by simple truncations (right-shift). If the split code is wrapped in a (pointless) loop of say 10 iterations to try and give it a heavier “weight”, then the difference is pretty clear: the new split is something like 1.8x as fast, and it shows in the bench results. However, for the unmodified bench, I think I am measuring the new code as very slightly slower (it’s hard to pick the signal out from the noise here though).

    So, I’m thinking it has something to do with needing to load g1/g2 (vs ‘order’) at each call i.e. some sort of memory/cacheing problem. This seems plausible as the split is only needed once per scalar mult, and so the constants could quite well be getting evicted between split calls. (Note that ‘order’, which is not used in the updated split, is used in the ECDSA code, so perhaps is being cached for the existing split code).

    On that theory I tried a few changes like copying things into local variables, using gcc prefetch intrinsics, changing the layout in memory to correspond to the order of usage (g1,g2,a1b2,a2,b1), etc., but without much luck and usually just making things worse!

    Anyway, I’d appreciate it if others could perhaps report their timings with/without this patch (of course it’s only relevant when the endomorphism is used), and if you have any thoughts on what’s going on here performance-wise, I’m all ears.

  8. in src/num_gmp_impl.h: in 941d4ffe99 outdated
    332+    r->limbs = a->limbs - limbs;
    333+    if (left == 0) {
    334+        mpn_copyi(r->data, a->data + limbs, r->limbs);
    335+    } else {
    336+        mpn_rshift(r->data, a->data + limbs, r->limbs, left);
    337+        while (r->limbs>1 && r->data[r->limbs-1]==0) r->limbs--;
    


    sipa commented at 3:02 pm on July 16, 2014:
    Who not >0?

    peterdettman commented at 2:00 pm on July 20, 2014:
    I’m guessing you’re referring to the “r->limbs>1” test? _num_trunc is just _num_split modified to ignore the low bits (perhaps I could have just extended _num_shift to support large shifts). I gather that a zero valued num still has limbs==1, but in any case, I’ve just copied and modified… Please clarify if I’ve misunderstood the question.
  9. peterdettman force-pushed on Aug 20, 2014
  10. sipa commented at 9:44 pm on September 1, 2014: contributor
    Indeed, I notice a slight slowdown with this (around 0.4%). I wonder whether the advantage of not needing division in the bignum API isn’t already worth it…
  11. theuni cross-referenced this on Sep 11, 2014 from issue WIP: Internal bignum by theuni
  12. sipa commented at 11:18 am on October 26, 2014: contributor
    @gmaxwell review requested
  13. peterdettman commented at 11:30 am on October 26, 2014: contributor
    @sipa I’m still very curious about the performance gap, but I think it’s reasonable to merge this since the native num stuff appears to be progressing rapidly. I still expect there will be a net performance gain in batch operations, and we may yet find a way around the single-op deficit.
  14. peterdettman force-pushed on Oct 31, 2014
  15. peterdettman force-pushed on Nov 3, 2014
  16. peterdettman force-pushed on Nov 4, 2014
  17. peterdettman force-pushed on Nov 4, 2014
  18. peterdettman force-pushed on Nov 4, 2014
  19. peterdettman force-pushed on Nov 13, 2014
  20. peterdettman commented at 3:39 am on November 13, 2014: contributor
    Re-measuring with -O3 -march=native, it looks like this PR actually does improve performance (I measure very roughly 0.5% faster on bench_verify).
  21. peterdettman force-pushed on Nov 15, 2014
  22. peterdettman force-pushed on Nov 18, 2014
  23. Avoid division when decomposing scalars
    - In secp256k1_gej_split_exp, there are two divisions used. Since the denominator is a constant known at compile-time, each can be replaced by a multiplication followed by a right-shift (and rounding).
    - Add the constants g1, g2 for this purpose and rewrite secp256k1_gej_split_exp accordingly.
    - Remove secp256k1_num_div since no longer used
    960f9a2e15
  24. peterdettman force-pushed on Nov 19, 2014
  25. sipa cross-referenced this on Dec 1, 2014 from issue Multiplication-only lambda splitter without bignum by sipa
  26. gmaxwell closed this on Dec 7, 2014

  27. roconnor-blockstream cross-referenced this on Sep 21, 2020 from issue Increase precision of g1 and g2. by roconnor-blockstream
  28. tecnovert referenced this in commit f1c601ed9d on Apr 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-24 02:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me