@maflcko, my point was that changing the benchmark slightly seems to change its measured performance considerably:

    static void SipHash_32b(benchmark::Bench& bench)
    {
        uint256 x;
        uint64_t k1 = 0;
        bench.run([&] {
            *((uint64_t*)x.begin()) = SipHashUint256(0, ++k1, x);
        });
    }

works with inputs such as:
and the benchmark results in:

    make -j10 && ./src/bench/bench_bitcoin --filter='SipHash_32b' --min-time=10000

| ns/op | op/s | err% | total | benchmark |
|------:|--------------:|-----:|------:|:------------|
| 35.11 | 28,479,847.20 | 0.1% | 11.00 | SipHash_32b |

Changing the benchmark by adding starting values for each input, modifying every 64-bit chunk of `x`, and consuming the `SipHashUint256` result via `doNotOptimizeAway`, as follows:

    static void SipHash_32b_new(benchmark::Bench& bench)
    {
        FastRandomContext rng(true);
        auto k0{rng.rand64()}, k1{rng.rand64()};
        auto x{rng.rand256()};
        auto* x_ptr{reinterpret_cast<uint64_t*>(x.data())};
        bench.run([&] {
            ankerl::nanobench::doNotOptimizeAway(SipHashUint256(k0, k1, x));
            ++k0; ++k1; ++x_ptr[0]; ++x_ptr[1]; ++x_ptr[2]; ++x_ptr[3];
        });
    }

which would work with inputs such as:
results in the following benchmark:

    make -j10 && ./src/bench/bench_bitcoin --filter='SipHash_32b_new' --min-time=10000

| ns/op | op/s | err% | total | benchmark |
|------:|--------------:|-----:|------:|:----------------|
| 21.54 | 46,420,932.76 | 0.1% | 10.98 | SipHash_32b_new |

I’ve added `doNotOptimizeAway` to every other benchmark in `crypto_hash.cpp`, and their values didn’t change considerably.
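For readers unfamiliar with the idea, here is a minimal sketch of what "consuming" a result means. It assumes a hypothetical `Consume` helper that writes to a `volatile` sink, and a toy `ToyHash` function that is not SipHash. nanobench's real `doNotOptimizeAway` may be implemented differently; this only illustrates the intent of keeping the compiler from discarding the computed value.

```cpp
#include <cstdint>

// Hypothetical sink: a volatile store must be performed, so the value
// passed to Consume() has to actually be computed.
static volatile uint64_t g_sink = 0;

// Hypothetical stand-in for ankerl::nanobench::doNotOptimizeAway.
inline void Consume(uint64_t value) { g_sink = value; }

// Toy 64-bit mixer used only for illustration (NOT SipHash).
inline uint64_t ToyHash(uint64_t k, uint64_t x)
{
    uint64_t v = (x ^ k) * 0x9E3779B97F4A7C15ULL;
    return v ^ (v >> 31);
}

// Hashes `iters` inputs and consumes every result.
// Returns the last consumed value so that callers can inspect it.
uint64_t RunWithConsume(uint64_t iters)
{
    uint64_t k = 0;
    for (uint64_t i = 0; i < iters; ++i) {
        Consume(ToyHash(++k, i));
    }
    return g_sink;
}
```

Without the call to `Consume`, an optimizing compiler would be free to drop the `ToyHash` calls entirely, since nothing observes their results.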
For the record, the following is also running a lot faster:

    static void SipHash_32b_new(benchmark::Bench& bench)
    {
        uint256 x;
        uint64_t k1 = 0;
        bench.run([&] {
            auto result = SipHashUint256(0, ++k1, x);
            ankerl::nanobench::doNotOptimizeAway(result);
            *((uint64_t*)x.begin()) += 1;
        });
    }

but adding `result` instead of the `1` makes it slow again:

    *((uint64_t*)x.begin()) += result;
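One possible explanation, which is an assumption on my part and not something measured in this thread, is that feeding `result` back into `x` makes every iteration depend on the previous hash output, so the benchmark measures latency, while `+= 1` lets the CPU overlap consecutive hashes. The standalone sketch below reproduces the two loop shapes outside of the bench framework. `ToyMix`, `RunIndependent`, `RunDependent`, `NsPerOp`, and `g_result_sink` are hypothetical names, and `ToyMix` is not SipHash.

```cpp
#include <chrono>
#include <cstdint>

// Toy 64-bit mixer standing in for SipHashUint256 (hypothetical, NOT SipHash).
inline uint64_t ToyMix(uint64_t k0, uint64_t k1, uint64_t x)
{
    uint64_t v = x ^ k0;
    v *= 0x9E3779B97F4A7C15ULL;
    v ^= v >> 32;
    v += k1;
    v *= 0xC2B2AE3D27D4EB4FULL;
    v ^= v >> 29;
    return v;
}

// Volatile sink so that each result has to be materialized.
static volatile uint64_t g_result_sink = 0;

// Next input does not depend on the previous output (x += 1).
uint64_t RunIndependent(uint64_t iters)
{
    uint64_t k1 = 0, x = 0;
    for (uint64_t i = 0; i < iters; ++i) {
        const uint64_t result = ToyMix(0, ++k1, x);
        g_result_sink = result;
        x += 1;
    }
    return x;
}

// Next input depends on the previous output (x += result).
uint64_t RunDependent(uint64_t iters)
{
    uint64_t k1 = 0, x = 0;
    for (uint64_t i = 0; i < iters; ++i) {
        const uint64_t result = ToyMix(0, ++k1, x);
        g_result_sink = result;
        x += result;
    }
    return x;
}

// Wall-clock nanoseconds per iteration for fn(iters).
template <typename Fn>
double NsPerOp(Fn&& fn, uint64_t iters)
{
    const auto start = std::chrono::steady_clock::now();
    fn(iters);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() /
           static_cast<double>(iters);
}
```

Comparing `NsPerOp(RunIndependent, n)` with `NsPerOp(RunDependent, n)` for a large `n` in an optimized build would show whether the same gap appears with a toy function. The numbers will vary by compiler and CPU, so this is a way to test the hypothesis, not a confirmation of it.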