Preventing compiler optimizations in benchmarks without a memory fence #678

pull elichai wants to merge 2 commits into bitcoin-core:master from elichai:2019-10-bench-small changing 1 files +32 −22

elichai commented at 2:31 PM on October 28, 2019: contributor

As asked #667 (comment) this is the parts of #667 that don't require an assembly memory fence.

I splitted them to 2 commits, one with obvious easy ones. and another that changes the logic a bit to achieve this (See #667 (review) )

in src/bench_internal.c:247 in 4cd0333071 outdated

 245 | +    CHECK(j == 20000);
 246 |  }
 247 |  
 248 |  void bench_ecmult_wnaf(void* arg) {
 249 | -    int i;
 250 | +    int i, bits=0, overflow = 0;

real-or-random commented at 2:35 PM on October 28, 2019:

    int i, bits = 0, overflow = 0;

in src/bench_internal.c:188 in 4cd0333071 outdated

 184 | @@ -182,15 +185,16 @@ void bench_field_inverse_var(void* arg) {
 185 |  }
 186 |  
 187 |  void bench_field_sqrt(void* arg) {
 188 | -    int i;
 189 | +    int i, j=0;

real-or-random commented at 2:36 PM on October 28, 2019:

    int i, j = 0;

in src/bench_internal.c:105 in 4cd0333071 outdated

 104 | -        secp256k1_scalar_split_lambda(&l, &r, &data->scalar_x);
 105 | -        secp256k1_scalar_add(&data->scalar_x, &data->scalar_x, &data->scalar_y);
 106 | +        secp256k1_scalar_split_lambda(&data->scalar_x, &data->scalar_y, &data->scalar_x);
 107 | +        j += secp256k1_scalar_add(&data->scalar_x, &data->scalar_x, &data->scalar_y);
 108 |      }
 109 | +    CHECK(j <= 2000);

jonasnick commented at 2:36 PM on October 28, 2019:

shouldn't this be <= 20000 (i.e. it's missing a 0)?

elichai commented at 2:48 PM on October 28, 2019:

You're right, I missed a 0.

Added accumulators and checks on benchmarks so they won't get optimized out 73a30c6b58
Modified bench_scalar_split so it won't get optimized out 362bb25608
elichai force-pushed on Oct 28, 2019
jonasnick approved
jonasnick commented at 3:54 PM on October 28, 2019: contributor

ACK 362bb256
real-or-random commented at 5:19 PM on October 28, 2019: contributor

code changes in 362bb25608dbcd724a07dd5170c4ebe081c3dd84 look good to me.

This fixes the two jacobi benchmark on gcc but for some reason it makes some of the benchmarks in clang noticeably faster (a few percents) on my machine, see the spreadsheet here: https://gist.github.com/real-or-random/68cf2e2ba7d9ab1fc6a50a7e3698fc8f/raw/e13e2485628016976cedc2ad58180050ddc6c071/362bb25.ods Any idea why this is the case? Can you reproduce this?
elichai commented at 5:36 PM on October 28, 2019: contributor

@real-or-random Is there a commend to convert the data to the excel thing? or did you do it manually? :)
elichai commented at 5:43 PM on October 28, 2019: contributor

that's my stats: https://gist.github.com/elichai/7cc8ac137c83565768d87387efae96bf

Seems like clang didn't optimize this from the beginning. Not sure why it's faster but in my machine it's only by 8us
real-or-random approved
real-or-random commented at 12:39 PM on November 5, 2019: contributor

Anyway this is certainly an improvement.

ACK 362bb25608dbcd724a07dd5170c4ebe081c3dd84 I read the diff and I ran the benchmarks
elichai commented at 8:46 AM on November 17, 2019: contributor

Can we get this merged? anything blocking? @sipa @apoelstra @gmaxwell
sipa commented at 7:54 PM on November 18, 2019: contributor

ACK, this is certainly an improvement. I'm not convinced that the compiler isn't still able to optimize some of these away, but there is only so much we can do.
jonasnick referenced this in commit 544002c008 on Nov 18, 2019
jonasnick merged this on Nov 18, 2019
jonasnick closed this on Nov 18, 2019
fanquake referenced this in commit 8c97780db8 on Jun 13, 2020
sidhujag referenced this in commit 8a3a072968 on Jun 13, 2020
ComputerCraftr referenced this in commit b98f1c6e6c on Jun 16, 2020
UdjinM6 referenced this in commit 9d36ba6570 on Aug 10, 2021
5tefan referenced this in commit 8ded2caa74 on Aug 12, 2021
gades referenced this in commit d855cc511d on May 8, 2022

Contributors

Linked (view graph)

#667 Is the compiler optimizing out some of the benchmarks?#680 Change benchmark output to CSV