- Factor out common exponentiation prelude for _fe_sqrt_var/fe_inv
- Allow _fe_sqrt_var output to overwrite input
- Make argument for _fe_normalizes_to_zero(_var) const
- Add _fe_rsqrt_var for calculating reciprocal square root (1/sqrt(x))
- Add _fe_par_rsqrt_inv_var to calculate the r-sqrt of one input and the inverse of a second input, with a single exponentiation.
Because sqrt and inverse (by exponentiation) use very similar addition chains, some interesting functions are possible. x^((p - 3)/4) is the reciprocal square root 1/sqrt(x) (for quadratic residue x, else -1/sqrt(-x)), and already appears in the calculation of _fe_inverse.
The reciprocal root (RR) is in a way more useful than the sqrt itself, as we can calculate both the square root (x.RR) and the inverse (x.RR^4) from it very cheaply. Furthermore we can calculate RR(x) and 1/y for completely independent inputs x, y - with a single exponentiation, via RR(x.y^4). (Note that the current implementation does not try to handle x or y being 0).
Some representative benchmark results (bench_internal): field_inverse: min 4.74us / avg 4.77us / max 4.85us field_sqrt_var: min 4.67us / avg 4.72us / max 4.86us field_rsqrt_var: min 4.71us / avg 4.75us / max 4.84us field_par_rsqrt_inv_var: min 4.90us / avg 4.93us / max 5.13us
I think this can be applied to #262 to recover the output y coordinate. The trick there is to take a compressed input (i.e. x-only) X0, calculate K (== Y0^2) via the curve equation and then take the point (K.X0, K^2) as the “decompressed” point on an isomorphic curve (where the “u” value for the isom. is implicitly sqrt(K) == Y0). So we avoid a sqrt upfront. However at the end we calculate an inverse (using _fe_inv) already, so we can at that point calculate the actual RR(K) and 1/Z in parallel. Then we have sqrt(K) which is Y0 and we can choose which root based on the sign bit from the compressed input as usual. With sqrt(K) and 1/Z we can fully recover the output y.
I’ll try to find some time to re-do that pull request against the latest ECDH code.