optimization: precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher #30442

pull l0rinc wants to merge 2 commits into bitcoin:master from l0rinc:paplorinc/siphash changing 7 files +132 −28
  1. l0rinc commented at 3:01 pm on July 12, 2024: contributor

    Continuing the IBD-related micro-optimizations (started in #30326), here I’m precalculating the SipHash constants XOR with k0 and k1 for the map hash calculations .

    before:

    ns/op op/s err% total benchmark
    60.68 16,481,085.68 0.3% 11.05 SaltedOutpointHasherBenchmark_create_set
    20.64 48,451,463.67 0.0% 11.00 SaltedOutpointHasherBenchmark_hash
    26.73 37,408,850.57 0.3% 11.02 SaltedOutpointHasherBenchmark_match
    24.25 41,243,487.98 0.6% 10.93 SaltedOutpointHasherBenchmark_mismatch

    after:

    ns/op op/s err% total benchmark
    58.81 17,004,413.32 0.4% 11.00 SaltedOutpointHasherBenchmark_create_set
    19.25 51,947,532.75 0.0% 11.01 SaltedOutpointHasherBenchmark_hash
    24.58 40,682,615.34 0.2% 11.01 SaltedOutpointHasherBenchmark_match
    23.92 41,812,410.34 0.5% 10.93 SaltedOutpointHasherBenchmark_mismatch

    i.e.:

    0SaltedOutpointHasherBenchmark_create_set - 17,004,413.32/16,481,085.68 - 3% faster
    1SaltedOutpointHasherBenchmark_hash - 51,947,532.75/48,451,463.67 - 7.2% faster
    2SaltedOutpointHasherBenchmark_match - 40,682,615.34/37,408,850.57 - 8.7% faster
    3SaltedOutpointHasherBenchmark_mismatch - 41,812,410.34/41,243,487.98 - 1.3% faster
    

    The tiny speed gains are probably not measurable by an actual IBD.

  2. DrahtBot commented at 3:01 pm on July 12, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30442.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK laanwj, ismaelsadeeq

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. l0rinc force-pushed on Jul 12, 2024
  4. l0rinc renamed this:
    Precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher
    optimization: Precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher
    on Jul 12, 2024
  5. l0rinc marked this as ready for review on Jul 13, 2024
  6. l0rinc commented at 12:40 pm on July 14, 2024: contributor
    @andrewtoth, this is another tiny addition to the coincache speedup, your review would be welcome.
  7. andrewtoth commented at 3:36 pm on July 14, 2024: contributor

    I did not see any improvement in the benchmark with this change. Running ./src/bench/bench_bitcoin -filter=.*Out[pP]oint.*

    commit 3b7b82b4b0c39a38538aae2cba10bec3907c5cbf

    0|               ns/op |                op/s |    err% |          ins/op |          cyc/op |         bra/op |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|---------------:|--------:|----------:|:----------
    2|                1.31 |      766,241,272.84 |    0.2% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `COutPoint_equality_match`
    3|                0.64 |    1,561,309,649.35 |    0.0% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `COutPoint_equality_mismatch`
    4|              107.62 |        9,292,298.54 |    1.7% |            0.00 |            0.00 |           0.00 |    0.0% |      0.10 | `SaltedOutpointHasherBenchmark_create_set`
    5|               23.36 |       42,799,791.14 |    0.0% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_hash`
    6|               48.78 |       20,502,210.98 |    0.1% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_match`
    7|               69.31 |       14,426,956.77 |    0.3% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_mismatch`
    

    commit 7ac4e3aaf345a06fdde463100c45c4295d730820

    0|               ns/op |                op/s |    err% |          ins/op |          cyc/op |         bra/op |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|---------------:|--------:|----------:|:----------
    2|                1.30 |      766,440,414.89 |    0.0% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `COutPoint_equality_match`
    3|                1.02 |      982,018,502.07 |    0.2% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `COutPoint_equality_mismatch`
    4|              108.92 |        9,180,628.87 |    0.9% |            0.00 |            0.00 |           0.00 |    0.0% |      0.10 | `SaltedOutpointHasherBenchmark_create_set`
    5|               23.69 |       42,213,365.83 |    0.1% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_hash`
    6|               47.90 |       20,876,369.10 |    0.1% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_match`
    7|               70.10 |       14,265,655.03 |    0.1% |            0.00 |            0.00 |           0.00 |    0.0% |      0.01 | `SaltedOutpointHasherBenchmark_mismatch`
    
  8. l0rinc commented at 5:58 pm on July 14, 2024: contributor
    That’s disappointing, thanks for checking, something’s still off with my compiler, it seems. After I finish benching #28280 (comment), I’ll try this again on the dedicates server instead. I’m also a bit surprised that the ratios are completely different compared to my measurements, e.g. SaltedOutpointHasherBenchmark_match and _mismatch are 40% off, while in my case they’re basically equal. Are compilers this different? Could you please send me the exact command and compiler versions you’ve used? Thanks!
  9. l0rinc marked this as a draft on Jul 15, 2024
  10. andrewtoth commented at 3:55 pm on July 15, 2024: contributor

    Could you please send me the exact command and compiler versions you’ve used?

    gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

    I ran ./configure --enable-bench make ./src/bench/bench_bitcoin -filter=.*Out[pP]oint.*

  11. l0rinc commented at 10:15 am on July 16, 2024: contributor

    Now that the other benchmark finished after a few days, I ran this on a gcc (Debian 12.2.0-14) 12.2.0.

    After the first run I got a warning of:

    Warning, results might be unstable:

    • CPU frequency scaling enabled: CPU 0 between 800.0 and 4,200.0 MHz
    • CPU governor is ‘powersave’ but should be ‘performance’
    • Turbo is enabled, CPU frequency will fluctuate

    which I’ve fixed and also ran pyperf system tune. Additionally I set a --min-time=10000, otherwise the results weren’t consistent between runs. I ran each change twice to make sure they’re stable.

    before:

    0./configure --enable-bench
    1git checkout 7ac4e3aaf345a06fdde463100c45c4295d730820 && git reset --hard
    2make -j$(nproc) && ./src/bench/bench_bitcoin -filter='.*Out[pP]oint.*' --min-time=10000
    
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    135.01 7,406,978.51 0.1% 1,060.42 485.49 2.184 120.72 1.1% 11.05 SaltedOutpointHasherBenchmark_create_set
    23.89 41,860,205.94 0.0% 252.01 85.94 2.932 4.00 0.0% 11.00 SaltedOutpointHasherBenchmark_hash
    59.81 16,719,701.99 0.1% 451.14 215.16 2.097 20.83 1.5% 11.00 SaltedOutpointHasherBenchmark_match
    79.21 12,624,726.37 0.0% 566.84 284.97 1.989 20.50 2.3% 10.74 SaltedOutpointHasherBenchmark_mismatch
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    133.71 7,479,023.41 0.0% 1,060.38 480.97 2.205 120.72 1.1% 11.03 SaltedOutpointHasherBenchmark_create_set
    23.87 41,894,947.89 0.0% 252.01 85.88 2.935 4.00 0.0% 11.00 SaltedOutpointHasherBenchmark_hash
    60.05 16,652,312.90 0.1% 452.20 216.05 2.093 20.86 1.3% 11.00 SaltedOutpointHasherBenchmark_match
    79.90 12,515,491.22 0.1% 566.85 287.46 1.972 20.50 2.3% 10.74

    after:

    0git checkout 3b7b82b4b0c39a38538aae2cba10bec3907c5cbf && git reset --hard
    1make -j$(nproc) && ./src/bench/bench_bitcoin -filter='.*Out[pP]oint.*' --min-time=10000
    
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    132.60 7,541,596.87 0.0% 1,045.84 476.97 2.193 120.72 1.1% 11.03 SaltedOutpointHasherBenchmark_create_set
    23.43 42,674,820.33 0.0% 246.01 84.29 2.918 4.00 0.0% 11.00 SaltedOutpointHasherBenchmark_hash
    58.90 16,977,444.58 0.0% 444.45 211.91 2.097 20.92 1.3% 11.00 SaltedOutpointHasherBenchmark_match
    78.29 12,773,764.99 0.0% 544.01 281.65 1.932 20.29 2.6% 10.73 SaltedOutpointHasherBenchmark_mismatch
    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    132.75 7,533,102.65 0.0% 1,045.87 477.53 2.190 120.72 1.1% 11.02 SaltedOutpointHasherBenchmark_create_set
    23.44 42,660,977.02 0.0% 246.01 84.33 2.917 4.00 0.0% 11.00 SaltedOutpointHasherBenchmark_hash
    59.05 16,934,798.45 0.0% 444.96 212.44 2.094 20.93 1.3% 11.01 SaltedOutpointHasherBenchmark_match
    76.32 13,103,542.96 0.1% 531.55 274.56 1.936 19.99 2.4% 10.71 SaltedOutpointHasherBenchmark_mismatch

    Resulting in:

    SaltedOutpointHasherBenchmark_create_set: Before Avg = 7443000.96, After Avg = 7537349.76, Increase = 1.27%

    SaltedOutpointHasherBenchmark_hash: Before Avg = 41877576.91, After Avg = 42667898.67, Increase = 1.89%

    SaltedOutpointHasherBenchmark_match: Before Avg = 16686007.45, After Avg = 16956121.52, Increase = 1.62%

    SaltedOutpointHasherBenchmark_mismatch: Before Avg = 12570108.79, After Avg = 12938653.98, Increase = 2.93% @andrewtoth, what do you think, can you try reproducing these results?

  12. l0rinc marked this as ready for review on Jul 18, 2024
  13. andrewtoth commented at 3:24 am on July 25, 2024: contributor
    Based on your benchmark results, I don’t think this can be called an optimization. It seems to be worse for one benchmark, better in another, and roughly the same for the rest. The description also notes that this will not be noticeable by users. In light of that, does the motivation for this PR still hold?
  14. l0rinc closed this on Jul 25, 2024

  15. l0rinc commented at 6:11 pm on July 26, 2024: contributor
    I’ve removed the COutPoint_equality changes and benchmarks (updated the comments to avoid confusion), the SaltedOutpointHasher speed gains remain. I’ll let you decide if it’s worth doing the change or not.
  16. l0rinc reopened this on Jul 26, 2024

  17. l0rinc force-pushed on Jul 26, 2024
  18. l0rinc force-pushed on Jul 26, 2024
  19. DrahtBot commented at 4:18 pm on August 30, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/27977216903

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  20. DrahtBot added the label CI failed on Aug 30, 2024
  21. l0rinc force-pushed on Aug 31, 2024
  22. l0rinc renamed this:
    optimization: Precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher
    optimization: precalculate SipHash constant XOR with k0 and k1 in SaltedOutpointHasher
    on Aug 31, 2024
  23. DrahtBot removed the label CI failed on Aug 31, 2024
  24. DrahtBot added the label CI failed on Oct 13, 2024
  25. l0rinc force-pushed on Oct 15, 2024
  26. achow101 requested review from josibake on Oct 15, 2024
  27. bench: Add COutPoint and SaltedOutpointHasher benchmarks
    This commit introduces new benchmarks to measure the performance of various operations using
    SaltedOutpointHasher, including hash computation, set operations, and set creation.
    
    These benchmarks are intended to provide insights about coin caching performance (e.g. during IBD).
    
    > make -j10 && ./src/bench/bench_bitcoin -filter='.*Out[pP]oint.*' --min-time=10000
    
    initial measurements:
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |               60.68 |       16,481,085.68 |    0.3% |     11.05 | `SaltedOutpointHasherBenchmark_create_set`
    |               20.64 |       48,451,463.67 |    0.0% |     11.00 | `SaltedOutpointHasherBenchmark_hash`
    |               26.73 |       37,408,850.57 |    0.3% |     11.02 | `SaltedOutpointHasherBenchmark_match`
    |               24.25 |       41,243,487.98 |    0.6% |     10.93 | `SaltedOutpointHasherBenchmark_mismatch`
    6ce13a33c6
  28. Precalculate SipHash constants XOR with k0 and k1
    Previously, only k0 and k1 were stored, requiring redundant calculations in `SipHashUint256Extra(k0, k1, id.hash, id.n)`.
    Now, the precomputed values (`v0`, `v1`, `v2`, `v3`) are stored and used, improving efficiency.
    
    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |               58.46 |       17,104,708.69 |    0.4% |     11.05 | `SaltedOutpointHasherBenchmark_create_set`
    |               19.26 |       51,919,263.84 |    0.1% |     11.00 | `SaltedOutpointHasherBenchmark_hash`
    |               25.51 |       39,207,035.36 |    0.2% |     11.00 | `SaltedOutpointHasherBenchmark_match`
    |               24.20 |       41,328,321.90 |    1.5% |     10.97 | `SaltedOutpointHasherBenchmark_mismatch`
    
    compared to master:
    SaltedOutpointHasherBenchmark_create_set - 17,104,708.69/16,481,085.68 - 3.7% faster
    SaltedOutpointHasherBenchmark_hash - 51,919,263.84/48,451,463.67 - 7% faster
    SaltedOutpointHasherBenchmark_match - 39,207,035.36/37,408,850.57 - 4.8% faster
    SaltedOutpointHasherBenchmark_mismatch - 41,328,321.90/41,243,487.98 - same
    bc959f5ba9
  29. l0rinc force-pushed on Oct 16, 2024
  30. l0rinc commented at 1:15 pm on October 16, 2024: contributor
    @laanwj This is the optimization that relies on #30349, would really appreciate you input on it.
  31. DrahtBot removed the label CI failed on Oct 16, 2024
  32. laanwj requested review from laanwj on Oct 17, 2024
  33. laanwj added the label Utils/log/libs on Oct 17, 2024
  34. l0rinc closed this on Jan 19, 2025

  35. laanwj commented at 9:50 am on January 20, 2025: member

    Sorry for taking so long.

    FWIW, Code review ACK bc959f5ba9a62b41b36cc662aa176b03969be176. Moving the magic numbers to constants seems like a good idea to me. It’s good to add the benchmarks. And although the speed change is neglilible (as would be expected from factoring out ~four xor instructions), it doesn’t complicate the code much either.

  36. l0rinc reopened this on Jan 20, 2025

  37. in src/bench/crypto_hash.cpp:206 in 6ce13a33c6 outdated
    199@@ -199,6 +200,98 @@ static void SipHash_32b(benchmark::Bench& bench)
    200     });
    201 }
    202 
    203+static void SaltedOutpointHasherBenchmark_hash(benchmark::Bench& bench)
    204+{
    205+    FastRandomContext rng(true);
    206+    constexpr size_t size = 1000;
    


    ismaelsadeeq commented at 9:33 pm on January 23, 2025:

    In “bench: Add COutPoint and SaltedOutpointHasher benchmarks” 6ce13a33c6901d820467b9c24a03ca3861d91640

    You could factor out size as a static global variable and reuse it in the benchmarks to avoid repetition!


    l0rinc commented at 10:01 pm on January 23, 2025:
    I thought of that but they each have different difficulties, so they should be tuned separately (i.e. if something takes 2x as much time per round, we likely want fewer repetitions)
  38. in src/bench/crypto_hash.cpp:234 in 6ce13a33c6 outdated
    229+    std::vector<COutPoint> value_vector;
    230+    values.reserve(size);
    231+    value_vector.reserve(size);
    232+
    233+    for (size_t i = 0; i < size; ++i) {
    234+        COutPoint outpoint{Txid::FromUint256(rng.rand256()), rng.rand32()};
    


    ismaelsadeeq commented at 9:34 pm on January 23, 2025:

    In “bench: Add COutPoint and SaltedOutpointHasher benchmarks” 6ce13a33c6901d820467b9c24a03ca3861d91640

    You could create a helper for inserting random outpoints to reduce duplication.


    l0rinc commented at 10:04 pm on January 23, 2025:
    If I need to edit again, I’ll consider it, thanks
  39. in src/bench/crypto_hash.cpp:217 in 6ce13a33c6 outdated
    212+
    213+    const SaltedOutpointHasher hasher;
    214+    bench.batch(size).run([&] {
    215+        size_t result = 0;
    216+        for (const auto& outpoint : outpoints) {
    217+            result ^= hasher(outpoint);
    


    ismaelsadeeq commented at 9:35 pm on January 23, 2025:

    In “bench: Add COutPoint and SaltedOutpointHasher benchmarks” 6ce13a33c6901d820467b9c24a03ca3861d91640

    Why are you xoring the hashes?


    l0rinc commented at 10:00 pm on January 23, 2025:
    Just to make sure they’re not optimized away - ankerl::nanobench::doNotOptimizeAway is likely more costly than a simple xor. Do you have a better idea?
  40. in src/bench/crypto_hash.cpp:235 in 6ce13a33c6 outdated
    230+    values.reserve(size);
    231+    value_vector.reserve(size);
    232+
    233+    for (size_t i = 0; i < size; ++i) {
    234+        COutPoint outpoint{Txid::FromUint256(rng.rand256()), rng.rand32()};
    235+        values.emplace(outpoint);
    


    ismaelsadeeq commented at 9:38 pm on January 23, 2025:

    In “bench: Add COutPoint and SaltedOutpointHasher benchmarks” 6ce13a33c6901d820467b9c24a03ca3861d91640

    Why aren’t you benching insertion to the unordered map?


    l0rinc commented at 10:03 pm on January 23, 2025:
    I’d be measuring the map’s behavior in that case, not the hashing

    l0rinc commented at 10:04 pm on January 23, 2025:
    But there is a SaltedOutpointHasherBenchmark_create_set which does something similar
  41. ismaelsadeeq commented at 9:44 pm on January 23, 2025: member

    I’ve thoroughly reviewed 6ce13a33c6901d820467b9c24a03ca3861d91640 and lightly reviewed bc959f5ba9a62b41b36cc662aa176b03969be176.

    I’ve noticed a slight increase in performance.

    Moving the magic numbers to constants seems like a good idea to me. It’s good to add the benchmarks.

    Same!

    ACK bc959f5ba9a62b41b36cc662aa176b03969be176

  42. DrahtBot requested review from ismaelsadeeq on Jan 23, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-30 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me