I’ve been playing with the benchmarks and started thinking about the optimizer. Most of the benchmarks now reuse variables from previous iterations (thanks @apoelstra for showing me that :) ), but the compiler might still be able to work out that it can remove at least the last iteration.
Another thing is the Jacobi symbol calculation: it has no side effects, so I would assume the optimizer just removes it (which it seems to do). I’m not sure what to make of these results, and I think I’ll need to read the ASM itself to figure out what’s going on.
Before:
$ ./bench_internal
scalar_add: min 0.00771us / avg 0.00811us / max 0.00934us
scalar_negate: min 0.00274us / avg 0.00277us / max 0.00280us
scalar_sqr: min 0.0296us / avg 0.0301us / max 0.0308us
scalar_mul: min 0.0305us / avg 0.0311us / max 0.0323us
scalar_inverse: min 8.65us / avg 8.76us / max 9.03us
scalar_inverse_var: min 2.08us / avg 2.14us / max 2.21us
field_normalize: min 0.00718us / avg 0.00744us / max 0.00797us
field_normalize_weak: min 0.00294us / avg 0.00316us / max 0.00360us
field_sqr: min 0.0140us / avg 0.0147us / max 0.0153us
field_mul: min 0.0185us / avg 0.0189us / max 0.0195us
field_inverse: min 4.09us / avg 4.12us / max 4.17us
field_inverse_var: min 2.10us / avg 2.11us / max 2.15us
field_sqrt: min 4.07us / avg 4.09us / max 4.14us
group_double_var: min 0.123us / avg 0.124us / max 0.126us
group_add_var: min 0.283us / avg 0.284us / max 0.287us
group_add_affine: min 0.242us / avg 0.244us / max 0.249us
group_add_affine_var: min 0.203us / avg 0.204us / max 0.208us
group_jacobi_var: min 0.179us / avg 0.186us / max 0.197us
wnaf_const: min 0.118us / avg 0.120us / max 0.124us
ecmult_wnaf: min 0.447us / avg 0.454us / max 0.465us
hash_sha256: min 0.252us / avg 0.257us / max 0.265us
hash_hmac_sha256: min 1.01us / avg 1.02us / max 1.06us
hash_rfc6979_hmac_sha256: min 5.56us / avg 5.72us / max 6.10us
context_verify: min 2652us / avg 2732us / max 2909us
context_sign: min 28.1us / avg 31.1us / max 33.2us
num_jacobi: min 0.0881us / avg 0.0893us / max 0.0913us

$ ./bench_internal
scalar_add: min 0.00793us / avg 0.00834us / max 0.00901us
scalar_negate: min 0.00285us / avg 0.00290us / max 0.00294us
scalar_sqr: min 0.0302us / avg 0.0310us / max 0.0321us
scalar_mul: min 0.0312us / avg 0.0318us / max 0.0323us
scalar_inverse: min 8.99us / avg 9.21us / max 9.47us
scalar_inverse_var: min 2.16us / avg 2.26us / max 2.39us
field_normalize: min 0.00753us / avg 0.00767us / max 0.00792us
field_normalize_weak: min 0.00305us / avg 0.00313us / max 0.00319us
field_sqr: min 0.0147us / avg 0.0152us / max 0.0158us
field_mul: min 0.0194us / avg 0.0196us / max 0.0200us
field_inverse: min 4.20us / avg 4.25us / max 4.34us
field_inverse_var: min 2.15us / avg 2.19us / max 2.26us
field_sqrt: min 4.16us / avg 4.19us / max 4.26us
group_double_var: min 0.126us / avg 0.128us / max 0.130us
group_add_var: min 0.291us / avg 0.293us / max 0.300us
group_add_affine: min 0.250us / avg 0.251us / max 0.252us
group_add_affine_var: min 0.208us / avg 0.213us / max 0.223us
group_jacobi_var: min 0.182us / avg 0.196us / max 0.204us
wnaf_const: min 0.124us / avg 0.126us / max 0.129us
ecmult_wnaf: min 0.456us / avg 0.464us / max 0.476us
hash_sha256: min 0.261us / avg 0.266us / max 0.274us
hash_hmac_sha256: min 1.04us / avg 1.06us / max 1.08us
hash_rfc6979_hmac_sha256: min 5.72us / avg 5.76us / max 5.87us
context_verify: min 2720us / avg 2789us / max 2922us
context_sign: min 28.4us / avg 29.6us / max 31.9us
num_jacobi: min 0.0867us / avg 0.0882us / max 0.0906us
After:
$ ./bench_internal
scalar_add: min 0.00950us / avg 0.00985us / max 0.0104us
scalar_negate: min 0.00338us / avg 0.00343us / max 0.00365us
scalar_sqr: min 0.0303us / avg 0.0311us / max 0.0325us
scalar_mul: min 0.0319us / avg 0.0325us / max 0.0335us
scalar_inverse: min 9.05us / avg 9.15us / max 9.39us
scalar_inverse_var: min 2.13us / avg 2.20us / max 2.35us
field_normalize: min 0.00853us / avg 0.00864us / max 0.00879us
field_normalize_weak: min 0.00394us / avg 0.00401us / max 0.00405us
field_sqr: min 0.0151us / avg 0.0154us / max 0.0161us
field_mul: min 0.0192us / avg 0.0198us / max 0.0204us
field_inverse: min 4.20us / avg 4.24us / max 4.34us
field_inverse_var: min 2.17us / avg 2.19us / max 2.22us
field_sqrt: min 4.18us / avg 4.25us / max 4.36us
group_double_var: min 0.128us / avg 0.129us / max 0.131us
group_add_var: min 0.294us / avg 0.298us / max 0.303us
group_add_affine: min 0.254us / avg 0.259us / max 0.268us
group_add_affine_var: min 0.212us / avg 0.215us / max 0.220us
group_jacobi_var: min 0.187us / avg 0.196us / max 0.205us
wnaf_const: min 0.124us / avg 0.126us / max 0.129us
ecmult_wnaf: min 0.469us / avg 0.475us / max 0.482us
hash_sha256: min 0.262us / avg 0.267us / max 0.272us
hash_hmac_sha256: min 1.05us / avg 1.06us / max 1.08us
hash_rfc6979_hmac_sha256: min 5.84us / avg 5.88us / max 5.97us
context_verify: min 2832us / avg 2899us / max 2999us
context_sign: min 29.9us / avg 31.5us / max 32.9us
num_jacobi: min 0.670us / avg 0.684us / max 0.699us

$ ./bench_internal
scalar_add: min 0.00910us / avg 0.00965us / max 0.0108us
scalar_negate: min 0.00326us / avg 0.00331us / max 0.00337us
scalar_sqr: min 0.0297us / avg 0.0304us / max 0.0316us
scalar_mul: min 0.0309us / avg 0.0316us / max 0.0320us
scalar_inverse: min 8.87us / avg 8.97us / max 9.08us
scalar_inverse_var: min 2.06us / avg 2.17us / max 2.27us
field_normalize: min 0.00828us / avg 0.00843us / max 0.00863us
field_normalize_weak: min 0.00383us / avg 0.00388us / max 0.00393us
field_sqr: min 0.0144us / avg 0.0148us / max 0.0153us
field_mul: min 0.0185us / avg 0.0189us / max 0.0197us
field_inverse: min 4.15us / avg 4.17us / max 4.26us
field_inverse_var: min 2.09us / avg 2.11us / max 2.15us
field_sqrt: min 4.09us / avg 4.11us / max 4.14us
group_double_var: min 0.125us / avg 0.126us / max 0.128us
group_add_var: min 0.284us / avg 0.286us / max 0.287us
group_add_affine: min 0.245us / avg 0.246us / max 0.249us
group_add_affine_var: min 0.205us / avg 0.207us / max 0.210us
group_jacobi_var: min 0.185us / avg 0.192us / max 0.198us
wnaf_const: min 0.119us / avg 0.121us / max 0.124us
ecmult_wnaf: min 0.456us / avg 0.462us / max 0.478us
hash_sha256: min 0.248us / avg 0.257us / max 0.261us
hash_hmac_sha256: min 1.02us / avg 1.05us / max 1.09us
hash_rfc6979_hmac_sha256: min 5.58us / avg 5.63us / max 5.72us
context_verify: min 2652us / avg 2668us / max 2730us
context_sign: min 27.7us / avg 29.0us / max 31.4us
num_jacobi: min 0.650us / avg 0.656us / max 0.663us
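There does seem to be a big difference in num_jacobi: the average goes from 0.0882us before to 0.656us after (taking the second run of each), roughly 7x higher. The other benchmarks stay within a few percent of each other across all four runs.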
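To illustrate the point about side-effect-free computations, here is a minimal sketch of the kind of thing I suspect is happening. It is not the code from bench_internal: `jacobi_u64`, `bench_discard`, `bench_consume` and `bench_sink` are names I made up for this example, and the Jacobi routine works on plain 64-bit integers rather than the library's num type. In the first loop the result is thrown away, so an optimizing compiler is allowed (though not required) to delete the calls; in the second loop the results feed into a volatile store, so the work has to be done.

```c
#include <assert.h>
#include <stdint.h>

/* Jacobi symbol (a/n) for odd n > 0. Pure: it only reads its arguments. */
static int jacobi_u64(uint64_t a, uint64_t n) {
    int result = 1;
    uint64_t t;
    assert(n % 2 == 1);
    a %= n;
    while (a != 0) {
        while (a % 2 == 0) {
            a /= 2;
            /* (2/n) is -1 exactly when n is 3 or 5 mod 8. */
            if (n % 8 == 3 || n % 8 == 5) result = -result;
        }
        /* Quadratic reciprocity: swap, flip the sign if both are 3 mod 4. */
        t = a; a = n; n = t;
        if (a % 4 == 3 && n % 4 == 3) result = -result;
        a %= n;
    }
    return n == 1 ? result : 0;
}

/* Result discarded: nothing observable depends on the calls, so the
 * optimizer may remove some or all of them. */
static void bench_discard(uint64_t iters, uint64_t n) {
    uint64_t i;
    for (i = 0; i < iters; i++) {
        (void)jacobi_u64(i, n);
    }
}

/* Hypothetical sink; a volatile store cannot be optimized away. */
static volatile int bench_sink;

/* Result consumed: every call contributes to a value that is stored. */
static int bench_consume(uint64_t iters, uint64_t n) {
    uint64_t i;
    int acc = 0;
    for (i = 0; i < iters; i++) {
        acc += jacobi_u64(i, n);
    }
    bench_sink = acc;
    return acc;
}
```

Timing `bench_discard` against `bench_consume` at the same optimization level, and then checking the generated ASM for each, would be one way to confirm whether the calls were really removed. Whether that is what explains the num_jacobi numbers above is still a guess on my part.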