Need info for NEON implementation of field multiplication #431

laanwj opened this issue on December 2, 2016
  1. laanwj commented at 6:08 AM on December 2, 2016: member

    @peterdettman In #173 (comment) you mentioned an alternative approach to the multiplication that would be good for SIMD parallelization.

    Back then you said "I have sample C code for 5x52 and could whip up a 10x26 version". Could you send this to me? I'm especially interested in 10x26 as ARM NEON has only 32x32->64 multiplication, but possibly I could figure it out myself with the 5x52 one when I know the approach.

  2. peterdettman commented at 9:49 AM on December 20, 2016: contributor

    Sorry for the slow reply. I managed to locate my experimental code from 2 years ago and pushed it to a branch here: https://github.com/peterdettman/secp256k1/tree/alt_mul . It actually has both 5x52 and 10x26 versions. Probably best read in conjunction with the paper: http://eprint.iacr.org/2014/852.pdf .

    Both versions appear to pass basic tests, but I have a vague recollection that the 10x26 one in particular might actually have potential overflows as written. These are definitely use-at-your-own-risk. Still, the basic structure should give you an idea whether there's good potential for SIMD there. I'm not entirely optimistic; the 5x52 version is currently a few percent slower than master, and the 10x26 version is something like 17% slower for me.
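
For readers following the 10x26 discussion, here is a minimal portable C sketch of why 26-bit limbs suit a 32x32->64 multiplier and where overflow headroom comes from. It is not the code from the alt_mul branch and does not implement the approach from the paper; it is a plain schoolbook column sum, and the names `mul_columns_10x26`, `LIMBS`, `LIMB_BITS` and `LIMB_MASK` are hypothetical. It assumes every limb is fully reduced to 26 bits, so each product is below 2^52 and a column of at most 10 products stays below 2^56.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define LIMBS 10        /* 10 limbs of 26 bits cover 260 bits */
#define LIMB_BITS 26
#define LIMB_MASK ((UINT32_C(1) << LIMB_BITS) - 1)

/*
 * Schoolbook column sums of a 10x26 product, without carry propagation
 * or modular reduction.
 *
 * Assumption: every input limb is at most LIMB_MASK. Then each
 * 32x32->64 product is below 2^52, and a column holds at most 10
 * products, so every c[k] is below 10 * 2^52 < 2^56.
 */
static void mul_columns_10x26(uint64_t c[2 * LIMBS - 1],
                              const uint32_t a[LIMBS],
                              const uint32_t b[LIMBS]) {
    int i, j;
    for (i = 0; i < 2 * LIMBS - 1; i++) {
        c[i] = 0;
    }
    for (i = 0; i < LIMBS; i++) {
        for (j = 0; j < LIMBS; j++) {
            /* Enforce the stated assumption on limb size. */
            assert(a[i] <= LIMB_MASK && b[j] <= LIMB_MASK);
            /* Widening multiply: the operation a 32x32->64 unit provides. */
            c[i + j] += (uint64_t)a[i] * b[j];
        }
    }
}
```

If limbs are allowed to exceed 26 bits between reductions, the per-column headroom shrinks accordingly, so the bound above would need to be rechecked for whatever limb sizes the real code permits.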

  3. laanwj closed this on Apr 14, 2022


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-14 18:15 UTC
