Hi, this builds on #18013.
Before anything, I want to point out that we have 3 SipHash implementations: CSipHasher, SipHashUint256, and SipHashUint256Extra. This PR touches only the first one (not used in any hashmap AFAIK).
I re-implemented CSipHasher, with performance up to 3x faster for big strings (BUFFER_SIZE = 1000*1000) and 5%-19% faster for small strings (3 bytes, because a minute of syncing showed me that hashing 3 bytes with SipHash happens quite often).
Benchmarks against other siphash implementations can be found here: https://gist.github.com/elichai/abdebeeaee7e581bc74c75cb9487b3af (code: https://github.com/elichai/siphash-bench)
My implementation was inspired by the one in Rust’s stdlib (https://github.com/rust-lang/rust/blob/master/src/libcore/hash/sip.rs), which rust-bitcoin uses in https://github.com/rust-bitcoin/bitcoin_hashes.
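For readers who want the algorithm in front of them, here is a minimal one-shot SipHash-2-4 sketch in C++. It is not the code from this PR: the names SipHash24Sketch, SipRound, Rotl and LoadLE64 are made up for illustration, and it assumes the whole message is available in one buffer (no streaming Write/Finalize interface like CSipHasher has). It only shows the general shape of consuming the input as whole 64-bit little-endian words and folding the leftover bytes plus the length into a final word.

```cpp
#include <cstddef>
#include <cstdint>

// Rotate a 64-bit value left by b bits (0 < b < 64).
inline uint64_t Rotl(uint64_t x, int b) { return (x << b) | (x >> (64 - b)); }

// One SipRound as defined in the SipHash paper.
inline void SipRound(uint64_t& v0, uint64_t& v1, uint64_t& v2, uint64_t& v3)
{
    v0 += v1; v1 = Rotl(v1, 13); v1 ^= v0; v0 = Rotl(v0, 32);
    v2 += v3; v3 = Rotl(v3, 16); v3 ^= v2;
    v0 += v3; v3 = Rotl(v3, 21); v3 ^= v0;
    v2 += v1; v1 = Rotl(v1, 17); v1 ^= v2; v2 = Rotl(v2, 32);
}

// Portable little-endian load of 8 bytes (hypothetical helper).
inline uint64_t LoadLE64(const unsigned char* p)
{
    uint64_t x = 0;
    for (int i = 0; i < 8; ++i) x |= static_cast<uint64_t>(p[i]) << (8 * i);
    return x;
}

// One-shot SipHash-2-4 over a contiguous buffer (illustrative sketch only).
uint64_t SipHash24Sketch(uint64_t k0, uint64_t k1, const unsigned char* data, size_t len)
{
    uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
    uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
    uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
    uint64_t v3 = 0x7465646279746573ULL ^ k1;

    // Compression: absorb every complete 8-byte word with 2 rounds each.
    const size_t full = len & ~static_cast<size_t>(7);
    for (size_t i = 0; i < full; i += 8) {
        const uint64_t m = LoadLE64(data + i);
        v3 ^= m;
        SipRound(v0, v1, v2, v3);
        SipRound(v0, v1, v2, v3);
        v0 ^= m;
    }

    // Final word: remaining 0..7 bytes, with (len mod 256) in the top byte.
    uint64_t tail = static_cast<uint64_t>(len & 0xff) << 56;
    for (size_t i = full; i < len; ++i) {
        tail |= static_cast<uint64_t>(data[i]) << (8 * (i - full));
    }
    v3 ^= tail;
    SipRound(v0, v1, v2, v3);
    SipRound(v0, v1, v2, v3);
    v0 ^= tail;

    // Finalization: 4 rounds after flipping the low byte of v2.
    v2 ^= 0xff;
    for (int i = 0; i < 4; ++i) SipRound(v0, v1, v2, v3);
    return v0 ^ v1 ^ v2 ^ v3;
}
```

With the reference key bytes 00..0f (k0 = 0x0706050403020100, k1 = 0x0f0e0d0c0b0a0908), an empty message should hash to 0x726fdb47dd0e0e31, which matches the first SipHash-2-4 reference test vector.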
Before:
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark evals iterations total min max median
SipHash 5 700 4.20809 0.0011912 0.00122256 0.00120163
SipHash_32b 5 40000000 4.1793 2.08632e-08 2.0948e-08 2.08949e-08
SipHash_3b 5 40000000 3.18892 1.56861e-08 1.64617e-08 1.5749e-08
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark evals iterations total min max median
SipHash 5 700 4.24318 0.00120808 0.00121676 0.00121336
SipHash_32b 5 40000000 4.23684 2.06753e-08 2.16015e-08 2.14555e-08
SipHash_3b 5 40000000 3.15998 1.54582e-08 1.61558e-08 1.58555e-08
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark evals iterations total min max median
SipHash 5 700 4.2472 0.0012113 0.00121558 0.00121324
SipHash_32b 5 40000000 4.20925 2.09789e-08 2.11288e-08 2.10327e-08
SipHash_3b 5 40000000 3.10727 1.54352e-08 1.55982e-08 1.55463e-08
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark evals iterations total min max median
SipHash 5 700 4.37224 0.00124528 0.00125769 0.0012473
SipHash_32b 5 40000000 4.26011 2.1214e-08 2.134e-08 2.13171e-08
SipHash_3b 5 40000000 3.18842 1.59033e-08 1.59832e-08 1.59432e-08
After:
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark evals iterations total min max median
SipHash 5 700 1.36254 0.000386656 0.000392219 0.000388635
SipHash_32b 5 40000000 4.31286 2.13773e-08 2.17857e-08 2.16181e-08
SipHash_3b 5 40000000 2.91375 1.44794e-08 1.46495e-08 1.45848e-08
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark evals iterations total min max median
SipHash 5 700 1.32683 0.000372232 0.000386258 0.000376842
SipHash_32b 5 40000000 4.15533 2.069e-08 2.08661e-08 2.07693e-08
SipHash_3b 5 40000000 2.77612 1.38154e-08 1.3988e-08 1.38665e-08
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark evals iterations total min max median
SipHash 5 700 1.36596 0.00038727 0.000392932 0.000391074
SipHash_32b 5 40000000 4.27694 2.13219e-08 2.14471e-08 2.13672e-08
SipHash_3b 5 40000000 2.75763 1.37529e-08 1.38244e-08 1.37862e-08
$ ./src/bench/bench_bitcoin -filter="SipHash|SipHash_3b|SipHash_32b"
#Benchmark evals iterations total min max median
SipHash 5 700 1.34316 0.000376846 0.000386059 0.000385079
SipHash_32b 5 40000000 4.23368 2.1066e-08 2.14124e-08 2.11283e-08
SipHash_3b 5 40000000 2.81931 1.40299e-08 1.42123e-08 1.40787e-08
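To make the comparison between the two sets of runs easier to read, the speedup can be derived from the median column. The small C++ sketch below does that arithmetic for the first run of each set. The helper name Speedup is hypothetical, and the medians are copied from the output above. It suggests roughly a 3.1x speedup for the large-buffer SipHash benchmark and about 8% for SipHash_3b, while SipHash_32b stays within noise.

```cpp
#include <cstdio>

// Ratio of the old median time to the new one; values above 1.0 mean faster.
// Hypothetical helper, not part of the benchmark framework.
double Speedup(double median_before, double median_after)
{
    return median_before / median_after;
}

// Prints the speedups for the first "Before" and "After" runs listed above.
void PrintFirstRunSpeedups()
{
    std::printf("SipHash     : %.2fx\n", Speedup(0.00120163, 0.000388635));
    std::printf("SipHash_32b : %.2fx\n", Speedup(2.08949e-08, 2.16181e-08));
    std::printf("SipHash_3b  : %.2fx\n", Speedup(1.5749e-08, 1.45848e-08));
}
```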
I also made the benchmarks print more readable output (https://gist.github.com/elichai/812c8866a69959404b480d968e080475). The benchmark name column is limited to 47 chars, so as long as we don’t add names longer than CHACHA20_POLY1305_AEAD_256BYTES_ENCRYPT_DECRYPT, it will be fine.
(It could probably be made adjustable, but that would require iterating over all the benchmarks before running them to determine the longest cell, and I thought the 47-char limit was more than reasonable.)
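For illustration, here is a minimal sketch of the two approaches to the name column: a fixed width versus one computed from the longest name. This is not the formatting code from the PR. The helpers LongestName and FormatRow, the " | " separator and the 12-character value column are assumptions made up for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Adjustable variant: scan all benchmark names up front to find the widest one.
// (Hypothetical helper; the PR uses a fixed 47-char limit instead.)
size_t LongestName(const std::vector<std::string>& names)
{
    size_t width = 0;
    for (const auto& name : names) width = std::max(width, name.size());
    return width;
}

// Format one row: left-aligned name padded to name_width, then the median.
// Names longer than name_width are not truncated, so they would break alignment.
std::string FormatRow(const std::string& name, double median, size_t name_width)
{
    std::ostringstream out;
    out << std::left << std::setw(static_cast<int>(name_width)) << name
        << " | " << std::right << std::setw(12) << median;
    return out.str();
}
```

With the fixed approach, a caller would pass 47 as name_width for every row. With the adjustable approach, it would first call LongestName over all registered benchmark names, which is the extra pass mentioned above.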