Original algorithm borrowed from this paper (LS3): https://www.researchgate.net/publication/304417579_Modular_Inverse_Algorithms_Without_Multiplications_for_Cryptographic_Applications
Was improved by Anton Bukov (@k06a) and Mikhail Melnik (@zumzoom) to use unsigned 256-bit integers: https://gist.github.com/k06a/b990b7c7dda766d4f661e653d6804a53
This would allow to avoid usage of 62-bit signed representation and compute modinv without 256-bit multiplications and division.
Code is passing all the tests, but we injected some code to benchmark new also, need to move it to the bench target for sure, maybe someone could help with this?
Running tests on ARM64 (Apple M1 Max) gives 20% improvement for modinv()
method:
0t1 = clock();
1for (i = 0; i < 1000000; i++) {
2 secp256k1_scalar_to_signed62(&s, x);
3 secp256k1_modinv64(&s, &secp256k1_const_modinfo_scalar);
4 secp256k1_scalar_from_signed62(r, &s);
5 CHECK(secp256k1_scalar_eq(r, &rrr));
6}
7t2 = clock();
8for (i = 0; i < 1000000; i++) {
9 secp256k1_modinv64_scalar(r, x, &secp256k1_const_mod_scalar);
10 CHECK(secp256k1_scalar_eq(r, &rrr));
11}
12t3 = clock();
13printf("Time old: %f\n", ((double)(t2 - t1)) / CLOCKS_PER_SEC);
14printf("Time new: %f\n", ((double)(t3 - t2)) / CLOCKS_PER_SEC);
15printf("Improvement: %.2f%%\n", (1 - (double)(t3 - t2) / (double)(t2 - t1)) * 100);
=>
0Time old: 3.284568
1Time new: 2.639177
2Improvement: 19.65%
Please help to test in on x86 and x64 architectures, we tested only ARM64.