This change is part of [IBD] - Tracking PR for speeding up Initial Block Download
Summary
The in-memory representation of the UTXO set uses (salted) SipHash for avoiding key collision attacks.
Hashing a uint256
key is done so often that a specialized optimization was extracted to SipHashUint256Extra. The constant salting operations were already extracted in the general case, this PR adjust the main specialization similarly.
Details
Previously, only k0
and k1
were stored, causing the constant xor operations to be recomputed in every call to SipHashUint256Extra
.
This commit adds a dedicated Uint256ExtraSipHasher
class that caches the initial state (v0-v3) and to perform the SipHash
computation on a uint256
(with an extra parameter), hiding the constant computation details from higher-level code and improving efficiency (as suggested by @sipa in #30442 (comment)).
Measurements
cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake –build build -j$(nproc) && build/bin/bench_bitcoin -filter=‘SaltedOutpointHasherBench’ -min-time=10000
C++ compiler …………………….. AppleClang 16.0.0.16000026
Before:
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
58.60 | 17,065,922.04 | 0.3% | 11.02 | SaltedOutpointHasherBench_create_set |
11.97 | 83,576,684.83 | 0.1% | 11.01 | SaltedOutpointHasherBench_hash |
14.50 | 68,985,850.12 | 0.3% | 10.96 | SaltedOutpointHasherBench_match |
13.90 | 71,942,033.47 | 0.4% | 11.03 | SaltedOutpointHasherBench_mismatch |
After:
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
57.27 | 17,462,299.19 | 0.1% | 11.02 | SaltedOutpointHasherBench_create_set |
11.24 | 88,997,888.48 | 0.3% | 11.04 | SaltedOutpointHasherBench_hash |
13.91 | 71,902,014.20 | 0.2% | 11.01 | SaltedOutpointHasherBench_match |
13.29 | 75,230,390.31 | 0.1% | 11.00 | SaltedOutpointHasherBench_mismatch |
compared to master:
0create_set - 17,462,299.19 / 17,065,922.04 - 2.3% faster
1hash - 88,997,888.48 / 83,576,684.83 - 6.4% faster
2match - 71,902,014.20 / 68,985,850.12 - 4.2% faster
3mismatch - 75,230,390.31 / 71,942,033.47 - 4.5% faster
C++ compiler …………………….. GNU 13.3.0
Before:
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
136.76 | 7,312,133.16 | 0.0% | 1,086.67 | 491.12 | 2.213 | 119.54 | 1.1% | 11.01 | SaltedOutpointHasherBench_create_set |
23.82 | 41,978,882.62 | 0.0% | 252.01 | 85.57 | 2.945 | 4.00 | 0.0% | 11.00 | SaltedOutpointHasherBench_hash |
60.42 | 16,549,695.42 | 0.1% | 460.51 | 217.04 | 2.122 | 21.00 | 1.4% | 10.99 | SaltedOutpointHasherBench_match |
78.66 | 12,713,595.35 | 0.1% | 555.59 | 282.52 | 1.967 | 20.19 | 2.2% | 10.74 | SaltedOutpointHasherBench_mismatch |
After:
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
135.38 | 7,386,349.49 | 0.0% | 1,078.19 | 486.16 | 2.218 | 119.56 | 1.1% | 11.00 | SaltedOutpointHasherBench_create_set |
23.67 | 42,254,558.08 | 0.0% | 247.01 | 85.01 | 2.906 | 4.00 | 0.0% | 11.00 | SaltedOutpointHasherBench_hash |
58.95 | 16,962,220.14 | 0.1% | 446.55 | 211.74 | 2.109 | 20.86 | 1.4% | 11.01 | SaltedOutpointHasherBench_match |
76.98 | 12,991,047.69 | 0.1% | 548.93 | 276.50 | 1.985 | 20.25 | 2.3% | 10.72 | SaltedOutpointHasherBench_mismatch |
0compared to master:
1create_set - 7,386,349.49 / 7,312,133.16 - 1.0% faster
2hash - 42,254,558.08 / 41,978,882.62 - 0.6% faster
3match - 16,962,220.14 / 16,549,695.42 - 2.4% faster
4mismatch - 12,991,047.69 / 12,713,595.35 - 2.1% faster