fiat-crypto can generate verified field code for multiple targets, e.g., C and x86_64 asm. It has algorithm templates for our mul and sqr algorithm (under the name “Dettman”) for secp256k1’s field. But the current implementation misses at least one optimization that we have in our C code (but not asm), namely the one #810.
CryptOpt can optimize fiat-crypto’s output, producing significantly faster asm while retaining the formal guarantees.
A plausible route towards using these projects in this library is:
- Optimize the algorithm template in fiat-crypto code further by implementing the optimization from #810 and perhaps further refinements. This is tracked in https://github.com/mit-plv/fiat-crypto/issues/1582. This would ideally be done by the fiat-crypto folks (in particular @OwenConoly who contributed the initial implementation).
- If fiat-crypto’s C output is fast enough (ideally on par with our current C code), replace the current C code by fiat-crypto’s C code.
- If CryptoOpt provides a significant enough further speedup, replace the current x86_64 asm by fiat-crypto + CryptOpt x86_64 asm. Since the current output of CryptOpt is not fully generic x86_64 asm, we need either of these two:
- Add CPU id on our side
- Modify CryptOpt so that it uses only instructions available on all x86_64 CPUs