As asked #667 (comment) this is the parts of #667 that don't require an assembly memory fence.
I splitted them to 2 commits, one with obvious easy ones. and another that changes the logic a bit to achieve this (See #667 (review) )
As asked #667 (comment) this is the parts of #667 that don't require an assembly memory fence.
I splitted them to 2 commits, one with obvious easy ones. and another that changes the logic a bit to achieve this (See #667 (review) )
245 | + CHECK(j == 20000); 246 | } 247 | 248 | void bench_ecmult_wnaf(void* arg) { 249 | - int i; 250 | + int i, bits=0, overflow = 0;
int i, bits = 0, overflow = 0;
184 | @@ -182,15 +185,16 @@ void bench_field_inverse_var(void* arg) { 185 | } 186 | 187 | void bench_field_sqrt(void* arg) { 188 | - int i; 189 | + int i, j=0;
int i, j = 0;
104 | - secp256k1_scalar_split_lambda(&l, &r, &data->scalar_x); 105 | - secp256k1_scalar_add(&data->scalar_x, &data->scalar_x, &data->scalar_y); 106 | + secp256k1_scalar_split_lambda(&data->scalar_x, &data->scalar_y, &data->scalar_x); 107 | + j += secp256k1_scalar_add(&data->scalar_x, &data->scalar_x, &data->scalar_y); 108 | } 109 | + CHECK(j <= 2000);
shouldn't this be <= 20000 (i.e. it's missing a 0)?
You're right, I missed a 0.
ACK 362bb256
code changes in 362bb25608dbcd724a07dd5170c4ebe081c3dd84 look good to me.
This fixes the two jacobi benchmark on gcc but for some reason it makes some of the benchmarks in clang noticeably faster (a few percents) on my machine, see the spreadsheet here: https://gist.github.com/real-or-random/68cf2e2ba7d9ab1fc6a50a7e3698fc8f/raw/e13e2485628016976cedc2ad58180050ddc6c071/362bb25.ods Any idea why this is the case? Can you reproduce this?
@real-or-random Is there a commend to convert the data to the excel thing? or did you do it manually? :)
that's my stats: https://gist.github.com/elichai/7cc8ac137c83565768d87387efae96bf
Seems like clang didn't optimize this from the beginning. Not sure why it's faster but in my machine it's only by 8us
Anyway this is certainly an improvement.
ACK 362bb25608dbcd724a07dd5170c4ebe081c3dd84 I read the diff and I ran the benchmarks
Can we get this merged? anything blocking? @sipa @apoelstra @gmaxwell
ACK, this is certainly an improvement. I'm not convinced that the compiler isn't still able to optimize some of these away, but there is only so much we can do.