Minor optimizations to _scalar_inverse to save 4M #452

pull peterdettman wants to merge 2 commits into bitcoin-core:master from peterdettman:sinv_opt changing 1 files +67 −104
  1. peterdettman commented at 5:57 am on April 18, 2017: contributor

    The existing secp256k1_scalar_inverse uses a straight-forward “blocks-of-1” approach, but the exponent here admits of a few tweaks.

    Firstly, the pattern 01010 occurs 3 times, so if we have (u5=)x^5 available we can save 3M during accumulation. Actually u5 can be precalculated cheaply, replacing 1S with 1M.

    Secondly, the addition chain invites optimization since it ends with 127 (a critical threshold) and also contains unnecessary (to the chain)x4 and un-accumulated x7. The addition chain 2, 3, 6, 8, 14, 28, 56, 112, 126 looks inviting… and fortuitously, armed with u5, we can replace [x127 0 x] with [x126 u5] and [x4 0 x] with [x3 u5](two occurrences) - for free. The new shorter addition chain saves 2M.

    There should be ~256 squares and I counted 44 multiplies in the existing version, and a saving of 1S + 4M with this PR. Since performance of scalar mul/sqr are very similar in my config, I’d expect about 5/300 ~= 1.6% performance improvement, and measure roughly 1.5%.

  2. Minor optimizations to _scalar_inverse to save 4M
    - Precalculate x^5 and use for "01010" patterns during accumulation. (net -2M)
    - Further use of x^5 to allow shorter addition chain (net -2M)
    cf12fa13cb
  3. peterdettman cross-referenced this on Apr 18, 2017 from issue Scalar 4x64 performance improvements by peterdettman
  4. briansmith commented at 7:21 pm on April 24, 2017: contributor
    ICYMI: I submitted a PR to improve upon this by 1 square and 3 multiplications (total savings 2 squares + 7 multiplications), but because it was based on Peter’s unmerged PR, the PR ended up being a PR against his repo instead of this one: https://github.com/peterdettman/secp256k1/pull/1.
  5. Further shorten the addition chain for scalar inversion.
    Reduce the number of squarings by one and reduce the number of
    multiplications by three.
    465159c278
  6. gmaxwell commented at 4:45 pm on April 25, 2017: contributor
    These changes are awesome. Thanks. Will test and review carefully soon.
  7. sipa commented at 10:04 pm on April 25, 2017: contributor

    Benchmarked bench_verify with GMP disabled on a i7-6820HQ CPU, pegged to 2.6GHz.

    ACK on both. The result is just a sequence of mul/sqr calls, which by definition is an exponentiation ladder. If the result were incorrect, it would implement an incorrect exponent, which nearly every test run should catch.

  8. sipa cross-referenced this on Apr 25, 2017 from issue Further shorten the addition chain for scalar inversion. by briansmith
  9. briansmith commented at 10:12 pm on April 25, 2017: contributor
    FWIW, I also reviewed Peter’s PR and it is correct. I also agree with @sipa’s assessment that many things would fail if the sequence of multiplications and squarings were incorrect. Nonetheless, I also manually verified that Peter’s addition chain is correct when improving it.
  10. gmaxwell commented at 11:14 pm on April 25, 2017: contributor
    @peterdettman Can you pull briansmith’s commit into this PR?
  11. peterdettman commented at 5:04 am on April 26, 2017: contributor
    @gmaxwell Done. Cumulative 255S+44M -> 253S+37M, measured speed of scalar_inverse at +3% as expected.
  12. gmaxwell approved
  13. gmaxwell commented at 10:40 pm on April 26, 2017: contributor
    ACK.
  14. sipa merged this on Apr 26, 2017
  15. sipa closed this on Apr 26, 2017

  16. sipa referenced this in commit cbc20b8c34 on Apr 26, 2017

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-23 05:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me