[IBD] precalculate SipHash constant salt XORs

l0rinc commented at 3:01 pm on July 12, 2024: contributor

This change is part of [IBD] - Tracking PR for speeding up Initial Block Download

Summary

The in-memory representation of the UTXO set uses (salted) SipHash for avoiding key collision attacks.

Hashing a uint256 key is done so often that a specialized optimization was extracted to SipHashUint256Extra. The constant salting operations were already extracted in the general case, this PR adjust the main specialization similarly.

Details

Previously, only k0 and k1 were stored, causing the constant xor operations to be recomputed in every call to SipHashUint256Extra. This commit adds a dedicated Uint256ExtraSipHasher class that caches the initial state (v0-v3) and to perform the SipHash computation on a uint256 (with an extra parameter), hiding the constant computation details from higher-level code and improving efficiency (as suggested by Sipa in #30442 (comment)).

Measurements

  0diff --git a/src/bench/crypto_hash.cpp b/src/bench/crypto_hash.cpp
  1--- a/src/bench/crypto_hash.cpp	(revision 9b1a7c3e8dd78d97fbf47c2d056d043b05969176)
  2+++ b/src/bench/crypto_hash.cpp	(revision e1b4f056b3097e7e34b0eda31f57826d81c9d810)
  3@@ -2,7 +2,6 @@
  4 // Distributed under the MIT software license, see the accompanying
  5 // file COPYING or http://www.opensource.org/licenses/mit-license.php.
  6 
  7-
  8 #include <bench/bench.h>
  9 #include <crypto/muhash.h>
 10 #include <crypto/ripemd160.h>
 11@@ -12,9 +11,11 @@
 12 #include <crypto/sha512.h>
 13 #include <crypto/siphash.h>
 14 #include <random.h>
 15-#include <span.h>
 16 #include <tinyformat.h>
 17 #include <uint256.h>
 18+#include <primitives/transaction.h>
 19+#include <util/hasher.h>
 20+#include <unordered_set>
 21 
 22 #include <cstdint>
 23 #include <vector>
 24@@ -205,6 +206,98 @@
 25     });
 26 }
 27 
 28+static void SaltedOutpointHasherBench_hash(benchmark::Bench& bench)
 29+{
 30+    FastRandomContext rng{/*fDeterministic=*/true};
 31+    constexpr size_t size{1000};
 32+
 33+    std::vector<COutPoint> outpoints(size);
 34+    for (auto& outpoint : outpoints) {
 35+        outpoint = {Txid::FromUint256(rng.rand256()), rng.rand32()};
 36+    }
 37+
 38+    const SaltedOutpointHasher hasher;
 39+    bench.batch(size).run([&] {
 40+        size_t result{0};
 41+        for (const auto& outpoint : outpoints) {
 42+            result ^= hasher(outpoint);
 43+        }
 44+        ankerl::nanobench::doNotOptimizeAway(result);
 45+    });
 46+}
 47+
 48+static void SaltedOutpointHasherBench_match(benchmark::Bench& bench)
 49+{
 50+    FastRandomContext rng{/*fDeterministic=*/true};
 51+    constexpr size_t size{1000};
 52+
 53+    std::unordered_set<COutPoint, SaltedOutpointHasher> values;
 54+    std::vector<COutPoint> value_vector;
 55+    values.reserve(size);
 56+    value_vector.reserve(size);
 57+
 58+    for (size_t i{0}; i < size; ++i) {
 59+        COutPoint outpoint{Txid::FromUint256(rng.rand256()), rng.rand32()};
 60+        values.emplace(outpoint);
 61+        value_vector.push_back(outpoint);
 62+        assert(values.contains(outpoint));
 63+    }
 64+
 65+    bench.batch(size).run([&] {
 66+        bool result{true};
 67+        for (const auto& outpoint : value_vector) {
 68+            result ^= values.contains(outpoint);
 69+        }
 70+        ankerl::nanobench::doNotOptimizeAway(result);
 71+    });
 72+}
 73+
 74+static void SaltedOutpointHasherBench_mismatch(benchmark::Bench& bench)
 75+{
 76+    FastRandomContext rng{/*fDeterministic=*/true};
 77+    constexpr size_t size{1000};
 78+
 79+    std::unordered_set<COutPoint, SaltedOutpointHasher> values;
 80+    std::vector<COutPoint> missing_value_vector;
 81+    values.reserve(size);
 82+    missing_value_vector.reserve(size);
 83+
 84+    for (size_t i{0}; i < size; ++i) {
 85+        values.emplace(Txid::FromUint256(rng.rand256()), rng.rand32());
 86+        COutPoint missing_outpoint{Txid::FromUint256(rng.rand256()), rng.rand32()};
 87+        missing_value_vector.push_back(missing_outpoint);
 88+        assert(!values.contains(missing_outpoint));
 89+    }
 90+
 91+    bench.batch(size).run([&] {
 92+        bool result{false};
 93+        for (const auto& outpoint : missing_value_vector) {
 94+            result ^= values.contains(outpoint);
 95+        }
 96+        ankerl::nanobench::doNotOptimizeAway(result);
 97+    });
 98+}
 99+
100+static void SaltedOutpointHasherBench_create_set(benchmark::Bench& bench)
101+{
102+    FastRandomContext rng{/*fDeterministic=*/true};
103+    constexpr size_t size{1000};
104+
105+    std::vector<COutPoint> outpoints(size);
106+    for (auto& outpoint : outpoints) {
107+        outpoint = {Txid::FromUint256(rng.rand256()), rng.rand32()};
108+    }
109+
110+    bench.batch(size).run([&] {
111+        std::unordered_set<COutPoint, SaltedOutpointHasher> set;
112+        set.reserve(size);
113+        for (const auto& outpoint : outpoints) {
114+            set.emplace(outpoint);
115+        }
116+        ankerl::nanobench::doNotOptimizeAway(set.size());
117+    });
118+}
119+
120 static void MuHash(benchmark::Bench& bench)
121 {
122     MuHash3072 acc;
123@@ -276,6 +369,10 @@
124 BENCHMARK(SHA256_32b_AVX2, benchmark::PriorityLevel::HIGH);
125 BENCHMARK(SHA256_32b_SHANI, benchmark::PriorityLevel::HIGH);
126 BENCHMARK(SipHash_32b, benchmark::PriorityLevel::HIGH);
127+BENCHMARK(SaltedOutpointHasherBench_hash, benchmark::PriorityLevel::HIGH);
128+BENCHMARK(SaltedOutpointHasherBench_match, benchmark::PriorityLevel::HIGH);
129+BENCHMARK(SaltedOutpointHasherBench_mismatch, benchmark::PriorityLevel::HIGH);
130+BENCHMARK(SaltedOutpointHasherBench_create_set, benchmark::PriorityLevel::HIGH);
131 BENCHMARK(SHA256D64_1024_STANDARD, benchmark::PriorityLevel::HIGH);
132 BENCHMARK(SHA256D64_1024_SSE4, benchmark::PriorityLevel::HIGH);
133 BENCHMARK(SHA256D64_1024_AVX2, benchmark::PriorityLevel::HIGH);

cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake –build build -j$(nproc) && build/bin/bench_bitcoin -filter=‘SaltedOutpointHasherBench’ -min-time=10000

Before:

ns/op	op/s	err%	total	benchmark
58.60	17,065,922.04	0.3%	11.02	`SaltedOutpointHasherBench_create_set`
11.97	83,576,684.83	0.1%	11.01	`SaltedOutpointHasherBench_hash`
14.50	68,985,850.12	0.3%	10.96	`SaltedOutpointHasherBench_match`
13.90	71,942,033.47	0.4%	11.03	`SaltedOutpointHasherBench_mismatch`

After:

ns/op	op/s	err%	total	benchmark
57.27	17,462,299.19	0.1%	11.02	`SaltedOutpointHasherBench_create_set`
11.24	88,997,888.48	0.3%	11.04	`SaltedOutpointHasherBench_hash`
13.91	71,902,014.20	0.2%	11.01	`SaltedOutpointHasherBench_match`
13.29	75,230,390.31	0.1%	11.00	`SaltedOutpointHasherBench_mismatch`

compared to master:

0create_set - 17,462,299.19 / 17,065,922.04 - 2.3% faster
1hash       - 88,997,888.48 / 83,576,684.83 - 6.4% faster
2match      - 71,902,014.20 / 68,985,850.12 - 4.2% faster
3mismatch   - 75,230,390.31 / 71,942,033.47 - 4.5% faster

C++ compiler …………………….. GNU 13.3.0

Before:

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
136.76	7,312,133.16	0.0%	1,086.67	491.12	2.213	119.54	1.1%	11.01	`SaltedOutpointHasherBench_create_set`
23.82	41,978,882.62	0.0%	252.01	85.57	2.945	4.00	0.0%	11.00	`SaltedOutpointHasherBench_hash`
60.42	16,549,695.42	0.1%	460.51	217.04	2.122	21.00	1.4%	10.99	`SaltedOutpointHasherBench_match`
78.66	12,713,595.35	0.1%	555.59	282.52	1.967	20.19	2.2%	10.74	`SaltedOutpointHasherBench_mismatch`

After:

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
135.38	7,386,349.49	0.0%	1,078.19	486.16	2.218	119.56	1.1%	11.00	`SaltedOutpointHasherBench_create_set`
23.67	42,254,558.08	0.0%	247.01	85.01	2.906	4.00	0.0%	11.00	`SaltedOutpointHasherBench_hash`
58.95	16,962,220.14	0.1%	446.55	211.74	2.109	20.86	1.4%	11.01	`SaltedOutpointHasherBench_match`
76.98	12,991,047.69	0.1%	548.93	276.50	1.985	20.25	2.3%	10.72	`SaltedOutpointHasherBench_mismatch`

0compared to master:
1create_set -  7,386,349.49 / 7,312,133.16  - 1.0% faster
2hash       - 42,254,558.08 / 41,978,882.62 - 0.6% faster
3match      - 16,962,220.14 / 16,549,695.42 - 2.4% faster
4mismatch   - 12,991,047.69 / 12,713,595.35 - 2.1% faster

DrahtBot commented at 3:01 pm on July 12, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30442.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	jonatack
Stale ACK	laanwj, ismaelsadeeq

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

l0rinc force-pushed on Jul 12, 2024

l0rinc renamed this:
~~Precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher~~
optimization: Precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher
on Jul 12, 2024

l0rinc marked this as ready for review on Jul 13, 2024

l0rinc commented at 12:40 pm on July 14, 2024: contributor

@andrewtoth, this is another tiny addition to the coincache speedup, your review would be welcome.

andrewtoth commented at 3:36 pm on July 14, 2024: contributor

I did not see any improvement in the benchmark with this change. Running ./src/bench/bench_bitcoin -filter=.*Out[pP]oint.*

commit 3b7b82b4b0c39a38538aae2cba10bec3907c5cbf

0|               ns/op |                op/s |    err% |          ins/op |          cyc/op |         bra/op |   miss% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------------:|----------------:|---------------:|--------:|----------:|:----------
2|                1.31 |      766,241,272.84 |    0.2% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `COutPoint_equality_match`
3|                0.64 |    1,561,309,649.35 |    0.0% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `COutPoint_equality_mismatch`
4|              107.62 |        9,292,298.54 |    1.7% |            0.00 |            0.00 |           0.00 |    0.0% |      0.10 | `SaltedOutpointHasherBenchmark_create_set`
5|               23.36 |       42,799,791.14 |    0.0% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_hash`
6|               48.78 |       20,502,210.98 |    0.1% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_match`
7|               69.31 |       14,426,956.77 |    0.3% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_mismatch`

commit 7ac4e3aaf345a06fdde463100c45c4295d730820

0|               ns/op |                op/s |    err% |          ins/op |          cyc/op |         bra/op |   miss% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------------:|----------------:|---------------:|--------:|----------:|:----------
2|                1.30 |      766,440,414.89 |    0.0% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `COutPoint_equality_match`
3|                1.02 |      982,018,502.07 |    0.2% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `COutPoint_equality_mismatch`
4|              108.92 |        9,180,628.87 |    0.9% |            0.00 |            0.00 |           0.00 |    0.0% |      0.10 | `SaltedOutpointHasherBenchmark_create_set`
5|               23.69 |       42,213,365.83 |    0.1% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_hash`
6|               47.90 |       20,876,369.10 |    0.1% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_match`
7|               70.10 |       14,265,655.03 |    0.1% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_mismatch`

l0rinc commented at 5:58 pm on July 14, 2024: contributor

That’s disappointing, thanks for checking, something’s still off with my compiler, it seems. After I finish benching #28280 (comment), I’ll try this again on the dedicates server instead. I’m also a bit surprised that the ratios are completely different compared to my measurements, e.g. SaltedOutpointHasherBenchmark_match and _mismatch are 40% off, while in my case they’re basically equal. Are compilers this different? Could you please send me the exact command and compiler versions you’ve used? Thanks!

l0rinc marked this as a draft on Jul 15, 2024

andrewtoth commented at 3:55 pm on July 15, 2024: contributor

Could you please send me the exact command and compiler versions you’ve used?

gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

I ran ./configure --enable-bench make ./src/bench/bench_bitcoin -filter=.*Out[pP]oint.*

l0rinc commented at 10:15 am on July 16, 2024: contributor

Now that the other benchmark finished after a few days, I ran this on a gcc (Debian 12.2.0-14) 12.2.0.

After the first run I got a warning of:

Warning, results might be unstable:

CPU frequency scaling enabled: CPU 0 between 800.0 and 4,200.0 MHz

CPU governor is ‘powersave’ but should be ‘performance’

Turbo is enabled, CPU frequency will fluctuate

which I’ve fixed and also ran pyperf system tune. Additionally I set a --min-time=10000, otherwise the results weren’t consistent between runs. I ran each change twice to make sure they’re stable.

before:

0./configure --enable-bench
1git checkout 7ac4e3aaf345a06fdde463100c45c4295d730820 && git reset --hard
2make -j$(nproc) && ./src/bench/bench_bitcoin -filter='.*Out[pP]oint.*' --min-time=10000

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
135.01	7,406,978.51	0.1%	1,060.42	485.49	2.184	120.72	1.1%	11.05	`SaltedOutpointHasherBenchmark_create_set`
23.89	41,860,205.94	0.0%	252.01	85.94	2.932	4.00	0.0%	11.00	`SaltedOutpointHasherBenchmark_hash`
59.81	16,719,701.99	0.1%	451.14	215.16	2.097	20.83	1.5%	11.00	`SaltedOutpointHasherBenchmark_match`
79.21	12,624,726.37	0.0%	566.84	284.97	1.989	20.50	2.3%	10.74	`SaltedOutpointHasherBenchmark_mismatch`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
133.71	7,479,023.41	0.0%	1,060.38	480.97	2.205	120.72	1.1%	11.03	`SaltedOutpointHasherBenchmark_create_set`
23.87	41,894,947.89	0.0%	252.01	85.88	2.935	4.00	0.0%	11.00	`SaltedOutpointHasherBenchmark_hash`
60.05	16,652,312.90	0.1%	452.20	216.05	2.093	20.86	1.3%	11.00	`SaltedOutpointHasherBenchmark_match`
79.90	12,515,491.22	0.1%	566.85	287.46	1.972	20.50	2.3%	10.74

after:

0git checkout 3b7b82b4b0c39a38538aae2cba10bec3907c5cbf && git reset --hard
1make -j$(nproc) && ./src/bench/bench_bitcoin -filter='.*Out[pP]oint.*' --min-time=10000

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
132.60	7,541,596.87	0.0%	1,045.84	476.97	2.193	120.72	1.1%	11.03	`SaltedOutpointHasherBenchmark_create_set`
23.43	42,674,820.33	0.0%	246.01	84.29	2.918	4.00	0.0%	11.00	`SaltedOutpointHasherBenchmark_hash`
58.90	16,977,444.58	0.0%	444.45	211.91	2.097	20.92	1.3%	11.00	`SaltedOutpointHasherBenchmark_match`
78.29	12,773,764.99	0.0%	544.01	281.65	1.932	20.29	2.6%	10.73	`SaltedOutpointHasherBenchmark_mismatch`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
132.75	7,533,102.65	0.0%	1,045.87	477.53	2.190	120.72	1.1%	11.02	`SaltedOutpointHasherBenchmark_create_set`
23.44	42,660,977.02	0.0%	246.01	84.33	2.917	4.00	0.0%	11.00	`SaltedOutpointHasherBenchmark_hash`
59.05	16,934,798.45	0.0%	444.96	212.44	2.094	20.93	1.3%	11.01	`SaltedOutpointHasherBenchmark_match`
76.32	13,103,542.96	0.1%	531.55	274.56	1.936	19.99	2.4%	10.71	`SaltedOutpointHasherBenchmark_mismatch`

Resulting in:

SaltedOutpointHasherBenchmark_create_set: Before Avg = 7443000.96, After Avg = 7537349.76, Increase = 1.27%

SaltedOutpointHasherBenchmark_hash: Before Avg = 41877576.91, After Avg = 42667898.67, Increase = 1.89%

SaltedOutpointHasherBenchmark_match: Before Avg = 16686007.45, After Avg = 16956121.52, Increase = 1.62%

SaltedOutpointHasherBenchmark_mismatch: Before Avg = 12570108.79, After Avg = 12938653.98, Increase = 2.93% @andrewtoth, what do you think, can you try reproducing these results?

l0rinc marked this as ready for review on Jul 18, 2024

andrewtoth commented at 3:24 am on July 25, 2024: contributor

Based on your benchmark results, I don’t think this can be called an optimization. It seems to be worse for one benchmark, better in another, and roughly the same for the rest. The description also notes that this will not be noticeable by users. In light of that, does the motivation for this PR still hold?

l0rinc closed this on Jul 25, 2024

l0rinc commented at 6:11 pm on July 26, 2024: contributor

I’ve removed the COutPoint_equality changes and benchmarks (updated the comments to avoid confusion), the SaltedOutpointHasher speed gains remain. I’ll let you decide if it’s worth doing the change or not.

l0rinc reopened this on Jul 26, 2024

l0rinc force-pushed on Jul 26, 2024

DrahtBot commented at 4:18 pm on August 30, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/27977216903

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot added the label CI failed on Aug 30, 2024

l0rinc force-pushed on Aug 31, 2024

l0rinc renamed this:
~~optimization: Precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher~~
optimization: precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher
on Aug 31, 2024

DrahtBot removed the label CI failed on Aug 31, 2024

DrahtBot added the label CI failed on Oct 13, 2024

l0rinc force-pushed on Oct 15, 2024

achow101 requested review from josibake on Oct 15, 2024

l0rinc force-pushed on Oct 16, 2024

l0rinc commented at 1:15 pm on October 16, 2024: contributor

@laanwj This is the optimization that relies on #30349, would really appreciate you input on it.

DrahtBot removed the label CI failed on Oct 16, 2024

laanwj requested review from laanwj on Oct 17, 2024

laanwj added the label Utils/log/libs on Oct 17, 2024

l0rinc closed this on Jan 19, 2025

laanwj commented at 9:50 am on January 20, 2025: member

Sorry for taking so long.

FWIW, Code review ACK bc959f5ba9a62b41b36cc662aa176b03969be176. Moving the magic numbers to constants seems like a good idea to me. It’s good to add the benchmarks. And although the speed change is neglilible (as would be expected from factoring out ~four xor instructions), it doesn’t complicate the code much either.

l0rinc reopened this on Jan 20, 2025

in src/bench/crypto_hash.cpp:212 in 6ce13a33c6 outdated

199@@ -199,6 +200,98 @@ static void SipHash_32b(benchmark::Bench& bench)
200     });
201 }
202 
203+static void SaltedOutpointHasherBenchmark_hash(benchmark::Bench& bench)
204+{
205+    FastRandomContext rng(true);
206+    constexpr size_t size = 1000;

ismaelsadeeq commented at 9:33 pm on January 23, 2025:

In “bench: Add COutPoint and SaltedOutpointHasher benchmarks” 6ce13a33c6901d820467b9c24a03ca3861d91640

You could factor out size as a static global variable and reuse it in the benchmarks to avoid repetition!

l0rinc commented at 10:01 pm on January 23, 2025:

I thought of that but they each have different difficulties, so they should be tuned separately (i.e. if something takes 2x as much time per round, we likely want fewer repetitions)

in src/bench/crypto_hash.cpp:240 in 6ce13a33c6 outdated

229+    std::vector<COutPoint> value_vector;
230+    values.reserve(size);
231+    value_vector.reserve(size);
232+
233+    for (size_t i = 0; i < size; ++i) {
234+        COutPoint outpoint{Txid::FromUint256(rng.rand256()), rng.rand32()};

ismaelsadeeq commented at 9:34 pm on January 23, 2025:

In “bench: Add COutPoint and SaltedOutpointHasher benchmarks” 6ce13a33c6901d820467b9c24a03ca3861d91640

You could create a helper for inserting random outpoints to reduce duplication.

l0rinc commented at 10:04 pm on January 23, 2025:

If I need to edit again, I’ll consider it, thanks

in src/bench/crypto_hash.cpp:223 in 6ce13a33c6 outdated

212+
213+    const SaltedOutpointHasher hasher;
214+    bench.batch(size).run([&] {
215+        size_t result = 0;
216+        for (const auto& outpoint : outpoints) {
217+            result ^= hasher(outpoint);

ismaelsadeeq commented at 9:35 pm on January 23, 2025:

In “bench: Add COutPoint and SaltedOutpointHasher benchmarks” 6ce13a33c6901d820467b9c24a03ca3861d91640

Why are you xoring the hashes?

l0rinc commented at 10:00 pm on January 23, 2025:

Just to make sure they’re not optimized away - ankerl::nanobench::doNotOptimizeAway is likely more costly than a simple xor. Do you have a better idea?

in src/bench/crypto_hash.cpp:241 in 6ce13a33c6 outdated

230+    values.reserve(size);
231+    value_vector.reserve(size);
232+
233+    for (size_t i = 0; i < size; ++i) {
234+        COutPoint outpoint{Txid::FromUint256(rng.rand256()), rng.rand32()};
235+        values.emplace(outpoint);

ismaelsadeeq commented at 9:38 pm on January 23, 2025:

In “bench: Add COutPoint and SaltedOutpointHasher benchmarks” 6ce13a33c6901d820467b9c24a03ca3861d91640

Why aren’t you benching insertion to the unordered map?

l0rinc commented at 10:03 pm on January 23, 2025:

I’d be measuring the map’s behavior in that case, not the hashing

l0rinc commented at 10:04 pm on January 23, 2025:

But there is a SaltedOutpointHasherBenchmark_create_set which does something similar

ismaelsadeeq commented at 9:44 pm on January 23, 2025: member

I’ve thoroughly reviewed 6ce13a33c6901d820467b9c24a03ca3861d91640 and lightly reviewed bc959f5ba9a62b41b36cc662aa176b03969be176.

I’ve noticed a slight increase in performance.

Moving the magic numbers to constants seems like a good idea to me. It’s good to add the benchmarks.

Same!

ACK bc959f5ba9a62b41b36cc662aa176b03969be176

DrahtBot requested review from ismaelsadeeq on Jan 23, 2025

sipa commented at 3:24 pm on February 1, 2025: member

I’m not convinced about the approach here: it’s leaking a SipHash implementation detail (the way the initial v0..v3 values are computed) into a higher layer (util/hasher.h).

I would suggest to have a Uint256ExtraSipHasher class instead of the SipHashUint256Extra function, with a constructor that takes k0, k1 like the function current before, and computes the v0..v3 values, and then an uint64_t Uint256ExtraSipHasher::operator()(const uint256& val, uint32_t extra) const noexcept which contains the meat of the computation.

l0rinc force-pushed on Feb 1, 2025

l0rinc commented at 7:42 pm on February 1, 2025: contributor

Thanks a lot for the review @sipa. The SipHash leaking into the hasher also bothered me a lot (it’s also why I closed it a few weeks ago before @laanwj revived it), but I like the Uint256ExtraSipHasher wrapper suggestion (basically the same as CSipHasher now, except for the extra tmp and count fields: both constructors cache the constant parts now.

I’m not sure why the suggested name was Uint256ExtraSipHasher and not SipHasherUint256Extra (to be more similar to SipHashUint256), but I don’t really mind either way.

I pushed a separate rebase (without conflicts), to simplify review, and the requested changes (and a few more related cleanups) in https://github.com/bitcoin/bitcoin/compare/2885e7785417b57a3f3cad420196837965572258..d848da97f2e4fd0997507b2d1c2a1f4a2325193d

I’d like to clean up SipHash after this change (get rid of templates, reduce duplication) in a separate PR - if this gets merged.

l0rinc renamed this:
~~optimization: precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher~~
[IBD] Precalculate some SipHash constant calculations
on Mar 12, 2025

l0rinc renamed this:
~~[IBD] Precalculate some SipHash constant calculations~~
[IBD] precalculate some SipHash constant calculations
on Mar 12, 2025

l0rinc renamed this:
~~[IBD] precalculate some SipHash constant calculations~~
[IBD] precalculate SipHash constant salt calculations
on Mar 12, 2025

l0rinc force-pushed on Mar 12, 2025

l0rinc force-pushed on Apr 17, 2025

l0rinc commented at 6:48 pm on May 16, 2025: contributor

@laanwj, @sipa, I’ve addressed the concerns you’ve raised, I’d appreciate a re-review here

DrahtBot added the label Needs rebase on Aug 13, 2025

test: Rename k1/k2 to k0/k1 for consistency 585fa7e1e3

refactor: Extract C0-C3 Siphash constants 7042a3b96b

optimization: refactor: Introduce Uint256ExtraSipHasher to cache SipHash constant state

Previously, only k0 and k1 were stored, causing the constant xor operations to be recomputed in every call to `SipHashUint256Extra`.
This commit adds a dedicated `Uint256ExtraSipHasher` class that caches the initial state (v0-v3) and to perform the `SipHash` computation on a `uint256` (with an extra parameter), hiding the constant computation details from higher-level code and improving efficiency.
This basically brings the precalculations in the `CSipHasher` constructor to the `uint256` specialized SipHash implementation.

> cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc) && build/src/bench/bench_bitcoin -filter='SaltedOutpointHasherBench.*' -min-time=10000

> C++ compiler .......................... AppleClang 16.0.0.16000026

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               57.27 |       17,462,299.19 |    0.1% |     11.02 | `SaltedOutpointHasherBench_create_set`
|               11.24 |       88,997,888.48 |    0.3% |     11.04 | `SaltedOutpointHasherBench_hash`
|               13.91 |       71,902,014.20 |    0.2% |     11.01 | `SaltedOutpointHasherBench_match`
|               13.29 |       75,230,390.31 |    0.1% |     11.00 | `SaltedOutpointHasherBench_mismatch`

compared to master:
create_set - 17,462,299.19/17,065,922.04 - 2.3% faster
hash       - 88,997,888.48/83,576,684.83 - 6.4% faster
match      - 71,902,014.20/68,985,850.12 - 4.2% faster
mismatch   - 75,230,390.31/71,942,033.47 - 4.5% faster

> C++ compiler .......................... GNU 13.3.0

|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|              135.38 |        7,386,349.49 |    0.0% |        1,078.19 |          486.16 |  2.218 |         119.56 |    1.1% |     11.00 | `SaltedOutpointHasherBench_create_set`
|               23.67 |       42,254,558.08 |    0.0% |          247.01 |           85.01 |  2.906 |           4.00 |    0.0% |     11.00 | `SaltedOutpointHasherBench_hash`
|               58.95 |       16,962,220.14 |    0.1% |          446.55 |          211.74 |  2.109 |          20.86 |    1.4% |     11.01 | `SaltedOutpointHasherBench_match`
|               76.98 |       12,991,047.69 |    0.1% |          548.93 |          276.50 |  1.985 |          20.25 |    2.3% |     10.72 | `SaltedOutpointHasherBench_mismatch`

compared to master:
create_set -  7,386,349.49/7,312,133.16  - 1% faster
hash       - 42,254,558.08/41,978,882.62 - 0.6% faster
match      - 16,962,220.14/16,549,695.42 - 2.4% faster
mismatch   - 12,991,047.69/12,713,595.35 - 2% faster

Co-authored-by: sipa <pieter@wuille.net>

656da514c5

l0rinc commented at 10:46 pm on August 13, 2025: contributor

Rebased to fix a conflict with #33116.

Also dropped the benchmarking commit to make the change less scary - but kept the patch in the PRs description for anyone interested.

Change is ready for review again.

l0rinc force-pushed on Aug 13, 2025

jonatack commented at 10:52 pm on August 13, 2025: member

Concept ACK to be pinged by the bot, plan to review in a couple of weeks.

DrahtBot removed the label Needs rebase on Aug 13, 2025

l0rinc renamed this:
~~[IBD] precalculate SipHash constant salt calculations~~
[IBD] precalculate SipHash constant salt XORs
on Aug 13, 2025

[IBD] precalculate SipHash constant salt XORs #30442

Summary

Details

Measurements

Code Coverage & Benchmarks

Reviews

Conflicts