As asked in #667 (comment), these are the parts of #667 that don’t require an assembly memory fence.
I split them into two commits: one with the obvious, easy ones, and another that changes the logic a bit to achieve this (see #667 (review)).
+    CHECK(j == 20000);
 }

 void bench_ecmult_wnaf(void* arg) {
-    int i;
+    int i, bits = 0, overflow = 0;
@@ -182,15 +185,16 @@ void bench_field_inverse_var(void* arg) {
 }

 void bench_field_sqrt(void* arg) {
-    int i;
+    int i, j = 0;
-        secp256k1_scalar_split_lambda(&l, &r, &data->scalar_x);
-        secp256k1_scalar_add(&data->scalar_x, &data->scalar_x, &data->scalar_y);
+        secp256k1_scalar_split_lambda(&data->scalar_x, &data->scalar_y, &data->scalar_x);
+        j += secp256k1_scalar_add(&data->scalar_x, &data->scalar_x, &data->scalar_y);
     }
+    CHECK(j <= 2000);
<= 20000 (i.e. it’s missing a 0)?
The code changes in 362bb25608dbcd724a07dd5170c4ebe081c3dd84 look good to me.
This fixes the two jacobi benchmarks on gcc, but for some reason it makes some of the benchmarks noticeably faster (a few percent) in clang on my machine; see the spreadsheet here: https://gist.github.com/real-or-random/68cf2e2ba7d9ab1fc6a50a7e3698fc8f/raw/e13e2485628016976cedc2ad58180050ddc6c071/362bb25.ods Any idea why this is the case? Can you reproduce it?
Here are my stats: https://gist.github.com/elichai/7cc8ac137c83565768d87387efae96bf
It seems like clang didn’t optimize this away from the beginning. I’m not sure why it’s faster, but on my machine it’s only by 8us.
Anyway, this is certainly an improvement.
ACK 362bb25608dbcd724a07dd5170c4ebe081c3dd84. I read the diff and ran the benchmarks.