I've been playing with the benchmarks and started thinking about the optimizer. Most of the benchmarks now reuse the variables from past iterations (thanks @apoelstra for showing me that :) ), but the compiler might still be able to figure out that it can remove at least the last iteration.
Another thing is the Jacobi symbol calculation: it has no side effects, so I would assume the optimizer just removes it (which it seems to do). I'm not sure what to make of these results, and I think I'll need to read the ASM itself to figure out what's going on.
Before (two runs):
$ ./bench_internal
scalar_add: min 0.00771us / avg 0.00811us / max 0.00934us
scalar_negate: min 0.00274us / avg 0.00277us / max 0.00280us
scalar_sqr: min 0.0296us / avg 0.0301us / max 0.0308us
scalar_mul: min 0.0305us / avg 0.0311us / max 0.0323us
scalar_inverse: min 8.65us / avg 8.76us / max 9.03us
scalar_inverse_var: min 2.08us / avg 2.14us / max 2.21us
field_normalize: min 0.00718us / avg 0.00744us / max 0.00797us
field_normalize_weak: min 0.00294us / avg 0.00316us / max 0.00360us
field_sqr: min 0.0140us / avg 0.0147us / max 0.0153us
field_mul: min 0.0185us / avg 0.0189us / max 0.0195us
field_inverse: min 4.09us / avg 4.12us / max 4.17us
field_inverse_var: min 2.10us / avg 2.11us / max 2.15us
field_sqrt: min 4.07us / avg 4.09us / max 4.14us
group_double_var: min 0.123us / avg 0.124us / max 0.126us
group_add_var: min 0.283us / avg 0.284us / max 0.287us
group_add_affine: min 0.242us / avg 0.244us / max 0.249us
group_add_affine_var: min 0.203us / avg 0.204us / max 0.208us
group_jacobi_var: min 0.179us / avg 0.186us / max 0.197us
wnaf_const: min 0.118us / avg 0.120us / max 0.124us
ecmult_wnaf: min 0.447us / avg 0.454us / max 0.465us
hash_sha256: min 0.252us / avg 0.257us / max 0.265us
hash_hmac_sha256: min 1.01us / avg 1.02us / max 1.06us
hash_rfc6979_hmac_sha256: min 5.56us / avg 5.72us / max 6.10us
context_verify: min 2652us / avg 2732us / max 2909us
context_sign: min 28.1us / avg 31.1us / max 33.2us
num_jacobi: min 0.0881us / avg 0.0893us / max 0.0913us
$ ./bench_internal
scalar_add: min 0.00793us / avg 0.00834us / max 0.00901us
scalar_negate: min 0.00285us / avg 0.00290us / max 0.00294us
scalar_sqr: min 0.0302us / avg 0.0310us / max 0.0321us
scalar_mul: min 0.0312us / avg 0.0318us / max 0.0323us
scalar_inverse: min 8.99us / avg 9.21us / max 9.47us
scalar_inverse_var: min 2.16us / avg 2.26us / max 2.39us
field_normalize: min 0.00753us / avg 0.00767us / max 0.00792us
field_normalize_weak: min 0.00305us / avg 0.00313us / max 0.00319us
field_sqr: min 0.0147us / avg 0.0152us / max 0.0158us
field_mul: min 0.0194us / avg 0.0196us / max 0.0200us
field_inverse: min 4.20us / avg 4.25us / max 4.34us
field_inverse_var: min 2.15us / avg 2.19us / max 2.26us
field_sqrt: min 4.16us / avg 4.19us / max 4.26us
group_double_var: min 0.126us / avg 0.128us / max 0.130us
group_add_var: min 0.291us / avg 0.293us / max 0.300us
group_add_affine: min 0.250us / avg 0.251us / max 0.252us
group_add_affine_var: min 0.208us / avg 0.213us / max 0.223us
group_jacobi_var: min 0.182us / avg 0.196us / max 0.204us
wnaf_const: min 0.124us / avg 0.126us / max 0.129us
ecmult_wnaf: min 0.456us / avg 0.464us / max 0.476us
hash_sha256: min 0.261us / avg 0.266us / max 0.274us
hash_hmac_sha256: min 1.04us / avg 1.06us / max 1.08us
hash_rfc6979_hmac_sha256: min 5.72us / avg 5.76us / max 5.87us
context_verify: min 2720us / avg 2789us / max 2922us
context_sign: min 28.4us / avg 29.6us / max 31.9us
num_jacobi: min 0.0867us / avg 0.0882us / max 0.0906us
After (two runs):
$ ./bench_internal
scalar_add: min 0.00950us / avg 0.00985us / max 0.0104us
scalar_negate: min 0.00338us / avg 0.00343us / max 0.00365us
scalar_sqr: min 0.0303us / avg 0.0311us / max 0.0325us
scalar_mul: min 0.0319us / avg 0.0325us / max 0.0335us
scalar_inverse: min 9.05us / avg 9.15us / max 9.39us
scalar_inverse_var: min 2.13us / avg 2.20us / max 2.35us
field_normalize: min 0.00853us / avg 0.00864us / max 0.00879us
field_normalize_weak: min 0.00394us / avg 0.00401us / max 0.00405us
field_sqr: min 0.0151us / avg 0.0154us / max 0.0161us
field_mul: min 0.0192us / avg 0.0198us / max 0.0204us
field_inverse: min 4.20us / avg 4.24us / max 4.34us
field_inverse_var: min 2.17us / avg 2.19us / max 2.22us
field_sqrt: min 4.18us / avg 4.25us / max 4.36us
group_double_var: min 0.128us / avg 0.129us / max 0.131us
group_add_var: min 0.294us / avg 0.298us / max 0.303us
group_add_affine: min 0.254us / avg 0.259us / max 0.268us
group_add_affine_var: min 0.212us / avg 0.215us / max 0.220us
group_jacobi_var: min 0.187us / avg 0.196us / max 0.205us
wnaf_const: min 0.124us / avg 0.126us / max 0.129us
ecmult_wnaf: min 0.469us / avg 0.475us / max 0.482us
hash_sha256: min 0.262us / avg 0.267us / max 0.272us
hash_hmac_sha256: min 1.05us / avg 1.06us / max 1.08us
hash_rfc6979_hmac_sha256: min 5.84us / avg 5.88us / max 5.97us
context_verify: min 2832us / avg 2899us / max 2999us
context_sign: min 29.9us / avg 31.5us / max 32.9us
num_jacobi: min 0.670us / avg 0.684us / max 0.699us
$ ./bench_internal
scalar_add: min 0.00910us / avg 0.00965us / max 0.0108us
scalar_negate: min 0.00326us / avg 0.00331us / max 0.00337us
scalar_sqr: min 0.0297us / avg 0.0304us / max 0.0316us
scalar_mul: min 0.0309us / avg 0.0316us / max 0.0320us
scalar_inverse: min 8.87us / avg 8.97us / max 9.08us
scalar_inverse_var: min 2.06us / avg 2.17us / max 2.27us
field_normalize: min 0.00828us / avg 0.00843us / max 0.00863us
field_normalize_weak: min 0.00383us / avg 0.00388us / max 0.00393us
field_sqr: min 0.0144us / avg 0.0148us / max 0.0153us
field_mul: min 0.0185us / avg 0.0189us / max 0.0197us
field_inverse: min 4.15us / avg 4.17us / max 4.26us
field_inverse_var: min 2.09us / avg 2.11us / max 2.15us
field_sqrt: min 4.09us / avg 4.11us / max 4.14us
group_double_var: min 0.125us / avg 0.126us / max 0.128us
group_add_var: min 0.284us / avg 0.286us / max 0.287us
group_add_affine: min 0.245us / avg 0.246us / max 0.249us
group_add_affine_var: min 0.205us / avg 0.207us / max 0.210us
group_jacobi_var: min 0.185us / avg 0.192us / max 0.198us
wnaf_const: min 0.119us / avg 0.121us / max 0.124us
ecmult_wnaf: min 0.456us / avg 0.462us / max 0.478us
hash_sha256: min 0.248us / avg 0.257us / max 0.261us
hash_hmac_sha256: min 1.02us / avg 1.05us / max 1.09us
hash_rfc6979_hmac_sha256: min 5.58us / avg 5.63us / max 5.72us
context_verify: min 2652us / avg 2668us / max 2730us
context_sign: min 27.7us / avg 29.0us / max 31.4us
num_jacobi: min 0.650us / avg 0.656us / max 0.663us
There does seem to be a big difference in num_jacobi: the average goes from 0.0882us before to 0.656us after, roughly 7.4x slower. Most of the other benchmarks stay within a few percent.
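
To make the dead-code concern concrete, here is a minimal sketch of the two loop shapes. It is not the actual bench_internal code: `toy_jacobi`, `bench_discarded`, `bench_observed` and `sink` are all made-up names, and the toy function works on plain 64-bit integers instead of the library's number type. The point is only that a call with no side effects and an unused result is something the compiler is allowed to delete, while a result that is accumulated and stored to a volatile has to be computed.

```c
#include <stdint.h>

/* Toy Jacobi symbol (a/n) for odd n > 0. Hypothetical stand-in for any
 * pure function: no side effects, result depends only on the inputs. */
static int toy_jacobi(uint64_t a, uint64_t n) {
    int t = 1;
    a %= n;
    while (a != 0) {
        /* Pull out factors of two; (2/n) is -1 when n = 3 or 5 mod 8. */
        while ((a & 1) == 0) {
            uint64_t r = n & 7;
            a >>= 1;
            if (r == 3 || r == 5) t = -t;
        }
        /* Quadratic reciprocity: swap, flip sign if both are 3 mod 4. */
        { uint64_t tmp = a; a = n; n = tmp; }
        if ((a & 3) == 3 && (n & 3) == 3) t = -t;
        a %= n;
    }
    return n == 1 ? t : 0;
}

/* Written through a volatile so the store cannot be optimized away. */
static volatile int sink;

/* Result is thrown away: an optimizing compiler may remove the calls,
 * and possibly the whole loop, since nothing observable depends on them. */
void bench_discarded(int iters) {
    int i;
    for (i = 0; i < iters; i++) {
        (void)toy_jacobi((uint64_t)i + 1, 7);
    }
}

/* Result feeds an accumulator that is stored to `sink` and returned,
 * so every iteration contributes to something observable. */
int bench_observed(int iters) {
    int i, acc = 0;
    for (i = 0; i < iters; i++) {
        acc += toy_jacobi((uint64_t)i + 1, 7);
    }
    sink = acc;
    return acc;
}
```

Timing both functions at -O2 and comparing the disassembly would show whether the compiler really drops the first loop. That depends on the compiler and flags, so I'd treat this as a way to check rather than a guaranteed outcome.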