For discussion: compact field elements

peterdettman commented at 9:21 am on June 22, 2014: contributor

secp256k1_fec_t: compact field element representation
secp256k1_fec_* methods: import/export, mul/sqr/sqr_pow
only field “64bit” at the moment (no asm)
sqrt and constant-time inversion changed to use this

A compact representation allows faster multiplication/squaring, although the overheads of converting back and forth make it unwieldy in the point formulae. Nevertheless, sqrt and Fermat-based inversion require >256 consecutive mul/sqr operations, making the conversion worthwhile.

The measured performance improvement for isolated secp256k1_fe_sqrt testing (and by extension secp256k1_fe_inv) is around %18, including all overheads.

sipa commented at 1:05 am on June 25, 2014: contributor

What this really does is introduce a second field element representation, which is optimized for mul/sqr rather than additions/scalar multiplications. For some other field implementations, there may not be a need for distinguishing them, and even just a wrapper implementation would add unnecessary copying overhead.

What performance gain is there for non-microbenchmarks?

peterdettman commented at 5:01 am on June 25, 2014: contributor

I agree with your assessment, although I suspect the copying overheads could be made negligible for fields that don’t want/need it. There could well be some sort of cache impact from switching b/w mul/sqr implementations.

For ‘bench’, I just measured 1m10.232s vs 1m11.145s (+endo, 64bit), so roughly 1.3% faster. AFAICT that’s just sqrt, and I’m not sure if it’s a “fair” weighting.

Just for argument’s sake, I switched to USE_FIELD_INV_BUILTIN (for gmp; I can’t get openssl working yet), and then ‘bench’ takes 1m17.683s vs 1m19.926s, or roughly 2.9% faster. Probably not important if batch inversion becomes available.

I imagine the reaction will be a bit “meh” for those numbers, given the extra cruft in the API. Maybe some of the benefit could be had just by providing _sqr_pow, and any conversion could be a completely private detail of the field (when pow is “big enough”)?

Looking ahead a bit, some of the cruft might be needed for other purposes anyway. Import/export via char arrays is a bit annoying and quite costly. If a native bignum is implemented I could see wanting to export (import) directly into the “native limb format”, which would be basically the same as _fec_t (but not thought of as belonging to the field itself). I want this already for the modular inverse prototypes I’ve been working on.

Please don’t feel any pressure to decide on merging this; it’s more of a discussion piece, since there’s some reasonably low-hanging performance gain to be had, and I’m wondering if there’s some clean way of looking at it that doesn’t clutter up the API, or at least dovetails with other changes.

Add compact field element variant and methods

- secp256k1_fec_t: compact field element representation
- secp256k1_fec_* methods: import/export, mul/sqr/sqr_pow
- only field "64bit" at the moment (no asm)
- sqrt and constant-time inversion changed to use this

A compact representation allows faster multiplication/squaring, although the overheads of converting back and forth make it unwieldy in the point formulae. Nevertheless, sqrt and Fermat-based inversion require >256 consecutive mul/sqr operations, making the conversion worthwhile.

The measured performance improvement for isolated secp256k1_fe_sqrt testing (and by extension secp256k1_fe_inv) is around %18, including all overheads.

f22254230f

peterdettman force-pushed on Aug 20, 2014

sipa commented at 11:02 pm on December 16, 2014: contributor

The benefit of this for verification speed is now lower still, as it doesn’t require a field inverse anymore. Public key decompression still requires a square root though, so it may be worthwhile.

peterdettman commented at 4:18 am on December 17, 2014: contributor

In light of the recent mul/sqr improvements, I will probably revisit this to see whether it is still true that “A compact representation allows faster multiplication/squaring”. I think we should also try adding _sqr_pow to the field API and see how that measures up. That’s then a fairer baseline to try and beat with an addition chain focused version.

sipa commented at 12:05 pm on December 17, 2014: contributor

Well, _sqr_pow sounds good for readability, but it’s unlikely to be optimal. I expect that in fe_inverse and fe_sqrt, we’ll want to convert once to another type, and convert back once, instead of inside every single exponentiation done.

Also, have you thought about using Montgomery multiplication? It seems slightly faster (at least for 64-bit) for scalars, which are essentially the same representation as the one you’re proposing here for fields, AFAICT.

sipa commented at 0:37 am on February 13, 2015: contributor

Looks very outdated by now.

peterdettman commented at 5:47 am on February 13, 2015: contributor

@sipa Agreed, closing. mul/sqr are significantly faster now than when this was first mooted. Also, I was unable to get a _sqr_pow to show any significant improvement, against my expectations that a fully in-registers loop should have been faster. I never did investigate Montgomery in this context.

peterdettman closed this on Feb 13, 2015

For discussion: compact field elements #37