Fix Jacobi benchmarks and other benchmark improvements #797

sipa commented at 6:41 PM on August 11, 2020: contributor

A number of improvements to bench_internal:

Less confusing variable naming
Use random Z coordinates for the (initial) gej variables (instead of 1).
Vary the inputs to the Jacobi symbol benchmarks (they were constantly testing the same input, giving a non-representative result).
Add a benchmark for testing gej_to_ge_var.

sipa renamed this:
~~Benchmark improvements (Jacobi symbols, Z randomization, gej_to_ge_var)~~
Fix Jacobi benchmarks and other benchmark improvements
on Aug 12, 2020

in src/bench_internal.c:275 in 4d97c4a7d5 outdated

 261 | @@ -262,8 +262,12 @@ void bench_group_jacobi_var(void* arg, int iters) {
 262 |  
 263 |      for (i = 0; i < iters; i++) {
 264 |          j += secp256k1_gej_has_quad_y_var(&data->gej[0]);
 265 | +        secp256k1_fe_add(&data->gej[0].y, &data->fe[1]);
 266 | +        secp256k1_fe_add(&data->gej[0].z, &data->fe[2]);
 267 | +        secp256k1_fe_normalize_var(&data->gej[0].y);
 268 | +        secp256k1_fe_normalize_var(&data->gej[0].z);

real-or-random commented at 1:37 PM on August 12, 2020:

Maybe add a comment that this does not result in valid points but we don't care and that the additions and normalizations are negligible (if this agrees with your intention). The latter is also true for the other two changed functions.

sipa commented at 1:41 AM on September 10, 2020:

Done.

real-or-random commented at 1:40 PM on August 12, 2020: contributor

ACK mod nit

elichai commented at 10:16 AM on August 13, 2020: contributor

Isn't it more meaningful to test it with p as the prime of the jacobi symbol? (tested locally and the result are very similiar, but I think that for purposes like schnorr this is what we care about)

EDIT: I see that bench_group_jacobi_var actually encapsulates this already.

elichai commented at 11:10 AM on August 13, 2020: contributor

FYI this seem to be slightly slower than the actual random average.

before this PR I got: (locked on 2.4Ghz)

scalar_inverse: min 15.8us / avg 15.8us / max 16.1us                                                                   
scalar_inverse_var: min 3.13us / avg 3.16us / max 3.24us                                                               
field_inverse: min 7.39us / avg 7.42us / max 7.57us                                                                                                                                                                                           
field_inverse_var: min 3.10us / avg 3.11us / max 3.12us                                                                                                                                                                                       
group_jacobi_var: min 1.23us / avg 1.24us / max 1.26us                                                                 
num_jacobi: min 1.01us / avg 1.04us / max 1.20us

with this PR:

scalar_inverse: min 15.8us / avg 15.8us / max 16.0us
scalar_inverse_var: min 3.12us / avg 3.21us / max 3.48us
field_inverse: min 7.41us / avg 7.44us / max 7.55us
field_inverse_var: min 3.11us / avg 3.12us / max 3.24us
group_jacobi_var: min 3.67us / avg 3.68us / max 3.70us
num_jacobi: min 3.54us / avg 3.55us / max 3.57us

with randomly chosen inputs for field_inverse_var and num_jacobi:

scalar_inverse: min 15.8us / avg 15.8us / max 15.9us
scalar_inverse_var: min 3.13us / avg 3.15us / max 3.21us
field_inverse: min 7.39us / avg 7.41us / max 7.51us
field_inverse_var: min 3.16us / avg 3.21us / max 3.24us
group_jacobi_var: min 3.67us / avg 3.80us / max 4.43us
num_jacobi: min 3.42us / avg 3.44us / max 3.48us

(code diff for this https://pastebin.com/raw/ewuiYxGc)

jonasnick approved

jonasnick commented at 10:04 PM on August 18, 2020: contributor

ACK 38a7bb3fc7d82ca7ccd37d43c6648e422925fa70. I can't explain @elichai's results but they seem to be close enough and preallocating random elements could be a separate improvement (because it gets rid of the normalize, add, etc noise).

gmaxwell commented at 4:05 AM on August 22, 2020: contributor

ACK. I had assumed the different results with the precomputed values were probably largely alignment/cache/branch predictor issues. There has been some academic work to eliminate these effects by basically making a bunch of different binaries with the memory layout all randomized and benchmarking all of them. ... but unfortunately, like a billion other computer science research things that have shipped modified versions of clang, I wasn't able to get their code to work with the effort I was willing to put in.

I've contemplated before making benchmarks use precomputed values but it's also really nice if the benchmarks can run on low memory systems with minimal modification. Costs like add/etc could be subtracted out.

sipa commented at 2:41 AM on September 9, 2020: contributor

With the prospect of not needing Jacobi symbols anymore for BIP340 (and it possibly being removable from the codebase), the third commit may end up not being very useful long term. It doesn't hurt though.

I think this is ready.

real-or-random commented at 2:46 PM on September 9, 2020: contributor

@sipa I think my comment above is still unresolved (https://github.com/bitcoin-core/secp256k1/pull/797#discussion_r469265201)

Rename bench_internal variables

The _x and _y suffices are confusing; they don't actually correspond
to X and Y coordinates. Instead replace them with arrays.

c7a3424c5f

Randomize the Z coordinates in bench_internal

Also increase the number of fe inputs.

d0fdd5f009

Make jacobi benchmarks vary inputs

Also make the num_jacobi benchmark use the scalar order as modulus,
instead of a random number.

5c6af60ec5

Add benchmark for secp256k1_ge_set_gej_var cb5524adc5

sipa force-pushed on Sep 10, 2020

real-or-random commented at 10:28 AM on September 10, 2020: contributor

ACK cb5524adc589d3ae5066a1aa2f818bbfb91d0b1d

jonasnick commented at 11:35 AM on September 10, 2020: contributor

ACK cb5524adc589d3ae5066a1aa2f818bbfb91d0b1d

jonasnick merged this on Sep 10, 2020

jonasnick closed this on Sep 10, 2020

deadalnix referenced this in commit aade0f932f on Sep 28, 2020

deadalnix referenced this in commit a1e34a734e on Sep 29, 2020