Would implementing the Intel ECC optimizations using the PCLMULQDQ instruction make sense for libsecp256k1?
According to this document, the speed boost can be "up to 600x": https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/polynomial-multiplication-instructions-paper.pdf
Searching around GitHub, I found an example that could potentially be used as a reference: https://github.com/wqweto/VbAsyncSocket/blob/4b7f4d8bc650688e2b6ad5460c997ed1df26d2e0/lib/thunks/gf128.c#L116-L165
- Is using hardware-accelerated ECC safe?
- Would it be worth doing?
Apologies if this has already been discussed and I missed it.