I managed to convince autotools to compile your asm. It seems slower than our C code, at least on my machine:
12th Gen Intel(R) Core(TM) i7-1260P, TurboBoost disabled
./bench_internal field
Benchmark , Min(us) , Avg(us) , Max(us)
this PR (e2684293b1b72a1ab5974a2864549cea2788cf95), --with-asm=x86_64 (which is the default)
field_sqr , 0.0296 , 0.0297 , 0.0297
field_mul , 0.0339 , 0.0341 , 0.0344
master (464a9115b4eda46b464d22829ece4f51985944bf), --with-asm=x86_64 (which is the default)
field_sqr , 0.0270 , 0.0271 , 0.0272
field_mul , 0.0359 , 0.0359 , 0.0360
master, --with-asm=no (464a9115b4eda46b464d22829ece4f51985944bf), gcc 12.2.1 -O2:
Benchmark , Min(us) , Avg(us) , Max(us)
field_sqr , 0.0236 , 0.0238 , 0.0240
field_mul , 0.0283 , 0.0284 , 0.0286
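By the way, we should really get rid of our current ASM in the short term, at least for the field. GCC beats it significantly: going by the averages above, the existing asm on master takes about 14% longer than the C code for field_sqr and about 26% longer for field_mul.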
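
For anyone who wants to redo the comparison on their own numbers, here is a rough sketch of the arithmetic. The values are the Avg(us) columns copied from the output above; the labels `pr_asm`, `master_asm`, `master_c` and the helper `extra_time_pct` are just names picked for illustration, not anything from the repo.

```python
# Avg(us) values copied from the ./bench_internal field output above.
# The keys are illustrative labels, not identifiers from the codebase.
avg_us = {
    "pr_asm": {"field_sqr": 0.0297, "field_mul": 0.0341},      # this PR, --with-asm=x86_64
    "master_asm": {"field_sqr": 0.0271, "field_mul": 0.0359},  # master, --with-asm=x86_64
    "master_c": {"field_sqr": 0.0238, "field_mul": 0.0284},    # master, --with-asm=no
}


def extra_time_pct(baseline, other):
    """Percent extra time `other` needs relative to `baseline`, per benchmark."""
    return {
        name: round(100.0 * (avg_us[other][name] / avg_us[baseline][name] - 1.0), 1)
        for name in avg_us[baseline]
    }


# Compare both asm builds against the plain C build.
pr_vs_c = extra_time_pct("master_c", "pr_asm")
master_asm_vs_c = extra_time_pct("master_c", "master_asm")

for label, result in (("PR asm vs C", pr_vs_c), ("master asm vs C", master_asm_vs_c)):
    for name, pct in result.items():
        print(f"{label}: {name} takes {pct:+.1f}% time")
```

A positive percentage means the asm build is slower than the C build for that benchmark. These are single-machine averages, so treat the exact figures as indicative only.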