I managed to convince autotools to compile your asm. It seems slower than our C code, at least on my machine:
12th Gen Intel(R) Core(TM) i7-1260P, TurboBoost disabled
./bench_internal field

Benchmark , Min(us) , Avg(us) , Max(us)

this PR (e2684293b1b72a1ab5974a2864549cea2788cf95), --with-asm=x86_64 (which is the default)
field_sqr , 0.0296 , 0.0297 , 0.0297
field_mul , 0.0339 , 0.0341 , 0.0344

master (464a9115b4eda46b464d22829ece4f51985944bf), --with-asm=x86_64 (which is the default)
field_sqr , 0.0270 , 0.0271 , 0.0272
field_mul , 0.0359 , 0.0359 , 0.0360

master, --with-asm=no (464a9115b4eda46b464d22829ece4f51985944bf), gcc 12.2.1 -O2:
Benchmark , Min(us) , Avg(us) , Max(us)
field_sqr , 0.0236 , 0.0238 , 0.0240
field_mul , 0.0283 , 0.0284 , 0.0286
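By the way, we should really get rid of our current ASM in the short term, at least for the field. GCC beats it significantly: going by the averages above, the current asm on master is about 14% slower on field_sqr and about 26% slower on field_mul than the C code built with gcc.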
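For anyone who wants to redo the comparison with their own numbers, here is a rough sketch that computes how much slower each asm build is relative to the `--with-asm=no` build. The averages are copied from the output above; the labels `pr_asm`, `master_asm`, `master_c` and the helper `slowdown_pct` are just names made up for this snippet, not anything from the repository.

```python
# Average times in microseconds, copied from the benchmark output above.
# The dictionary keys are illustrative labels, not names used by the project.
avg_us = {
    "pr_asm":     {"field_sqr": 0.0297, "field_mul": 0.0341},
    "master_asm": {"field_sqr": 0.0271, "field_mul": 0.0359},
    "master_c":   {"field_sqr": 0.0238, "field_mul": 0.0284},
}


def slowdown_pct(build, bench, baseline="master_c"):
    """Percent by which `build` is slower than `baseline` (positive = slower)."""
    return (avg_us[build][bench] / avg_us[baseline][bench] - 1.0) * 100.0


if __name__ == "__main__":
    for build in ("pr_asm", "master_asm"):
        for bench in ("field_sqr", "field_mul"):
            pct = slowdown_pct(build, bench)
            print(f"{build:10s} {bench}: {pct:+.1f}% vs. C")
```

These are single-machine numbers with TurboBoost disabled, so the percentages are only indicative; swapping in averages from another machine just means editing the dictionary.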