benchmark: Improve SipHash_32b accuracy to avoid potential optimization issues #30349

pull l0rinc wants to merge 1 commits into bitcoin:master from l0rinc:paplorinc/remove_siphash_benchmark_recursion changing 1 files +9 −3
  1. l0rinc commented at 2:05 pm on June 27, 2024: contributor

    This PR stems from the discussions in #30317 (review)

    The previous benchmark for SipHash was slightly less accurate in representing real-world usage and allowed for potential compiler optimizations that could invalidate the benchmark. This change aims to ensure the benchmark produces more realistic results.

    By modifying the initial values and only incrementing the bytes of val, the benchmark should reflects a more typical usage patterns - and prevent the compiler from optimizing away the calculations.


    On my M1 processor the benchmark’s speed changed significantly (but the CI seems to produce the same result as before):

    cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_BENCH=ON && cmake –build build -j10 && ./build/src/bench/bench_bitcoin –filter=SipHash_32b –min-time=1000

    Before:

    ns/op op/s err% total benchmark
    35.15 28,445,856.66 0.2% 1.10 SipHash_32b

    After (note that only the benchmark changed):

    ns/op op/s err% total benchmark
    22.05 45,350,886.64 0.3% 1.10 SipHash_32b
  2. DrahtBot commented at 2:05 pm on June 27, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30349.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK maflcko, hodlinator, achow101
    Stale ACK laanwj

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. l0rinc renamed this:
    benchmark: Refactor SipHash_32b to improve accuracy and potential avoid optimization issues
    refactor: Refactor SipHash_32b to improve accuracy and potential avoid optimization issues
    on Jun 27, 2024
  4. DrahtBot added the label Refactoring on Jun 27, 2024
  5. l0rinc renamed this:
    refactor: Refactor SipHash_32b to improve accuracy and potential avoid optimization issues
    refactor: Improve SipHash_32b accuracy and potential avoid optimization issues
    on Jun 27, 2024
  6. l0rinc renamed this:
    refactor: Improve SipHash_32b accuracy and potential avoid optimization issues
    refactor: Improve SipHash_32b accuracy to avoid potential optimization issues
    on Jun 27, 2024
  7. maflcko commented at 2:10 pm on June 27, 2024: member
    Changing the behavior of something is not a refactor.
  8. l0rinc renamed this:
    refactor: Improve SipHash_32b accuracy to avoid potential optimization issues
    benchmark: Improve SipHash_32b accuracy to avoid potential optimization issues
    on Jun 27, 2024
  9. maflcko removed the label Refactoring on Jun 27, 2024
  10. DrahtBot added the label CI failed on Jul 22, 2024
  11. DrahtBot commented at 10:17 am on July 22, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/26763788481

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  12. l0rinc force-pushed on Jul 22, 2024
  13. DrahtBot removed the label CI failed on Jul 22, 2024
  14. achow101 requested review from laanwj on Oct 15, 2024
  15. achow101 requested review from josibake on Oct 15, 2024
  16. laanwj added the label Tests on Oct 16, 2024
  17. in src/bench/crypto_hash.cpp:200 in 8c4e14deda outdated
    195+    auto k0 = rng.rand64(), k1 = rng.rand64();
    196+    auto val = rng.rand256();
    197     bench.run([&] {
    198-        *((uint64_t*)x.begin()) = SipHashUint256(0, ++k1, x);
    199+        ankerl::nanobench::doNotOptimizeAway(SipHashUint256(k0, k1, val));
    200+        ++k0; ++k1; ++(*val.begin());
    


    laanwj commented at 9:21 am on October 16, 2024:
    nit: please put multiple statements on multiple lines, this is pretty hard to read code as-is :sweat_smile:

    l0rinc commented at 1:11 pm on October 16, 2024:
    Thanks, done
  18. laanwj commented at 9:22 am on October 16, 2024: member
    ACK on overall change, small nit
  19. l0rinc force-pushed on Oct 16, 2024
  20. l0rinc force-pushed on Oct 16, 2024
  21. laanwj approved
  22. laanwj commented at 8:05 am on October 17, 2024: member
    ACK c49cc30b8185a50b4aa00cf705d313c8aa9482a3
  23. in src/bench/crypto_hash.cpp:198 in c49cc30b81 outdated
    196-    uint64_t k1 = 0;
    197+    FastRandomContext rng(true);
    198+    auto k0 = rng.rand64(), k1 = rng.rand64();
    199+    auto val = rng.rand256();
    200     bench.run([&] {
    201-        *((uint64_t*)x.begin()) = SipHashUint256(0, ++k1, x);
    


    maflcko commented at 10:01 am on October 17, 2024:
    Maybe I am wrong, but in theory I think that a byte array has weaker memory alignment requirements than uint64_t. So this is (in a strict sense) undefined behavior anyway. I presume it could be tested by adding a padding byte to uint256 and compile with ubsan, and then run this bench to observe the failure.

    laanwj commented at 10:24 am on October 17, 2024:

    This is a great point. Instead of the type-pun, we’d likely want to make this safer by assigning the result to a uint64_t then memcpy it in instead. That likely isn’t going to be significantly slower.

    Edit: Wait, this code is being removed. Sorry.


    maflcko commented at 10:30 am on October 17, 2024:

    Edit: Wait, this code is being removed. Sorry.

    Yeah, sorry, I was just trying to come up with another reason to change the code here :sweat_smile:


    l0rinc commented at 10:48 am on October 17, 2024:
    Do you have any concrete recommendation that I could apply here?

    maflcko commented at 11:31 am on October 17, 2024:

    Sorry again, I am just trying to add useful context that hasn’t been mentioned before. I just have an allergy against undefined behavior, so I wanted to mention that the previous code may have had one.

    As a fun fact, I tried it myself and in theory the UB should look like:

     0src/bench/crypto_hash.cpp:198:9: runtime error: store to misaligned address 0x7fffdb31cb81 for type 'uint64_t' (aka 'unsigned long'), which requires 8 byte alignment
     10x7fffdb31cb81: note: pointer points here
     2 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00
     3              ^ 
     4    [#0](/bitcoin-bitcoin/0/) 0x556ee2095df7 in SipHash_32b(ankerl::nanobench::Bench&)::$_0::operator()() const bld-cmake/src/bench/./src/bench/crypto_hash.cpp:198:33
     5    [#1](/bitcoin-bitcoin/1/) 0x556ee2095df7 in ankerl::nanobench::Bench& ankerl::nanobench::Bench::run<SipHash_32b(ankerl::nanobench::Bench&)::$_0>(SipHash_32b(ankerl::nanobench::Bench&)::$_0&&) bld-cmake/src/bench/./src/bench/nanobench.h:1221:13
     6    [#2](/bitcoin-bitcoin/2/) 0x556ee2092fc6 in SipHash_32b(ankerl::nanobench::Bench&) bld-cmake/src/bench/./src/bench/crypto_hash.cpp:197:11
     7
     8...
     9
    10SUMMARY: UndefinedBehaviorSanitizer: misaligned-pointer-use ./src/bench/crypto_hash.cpp:198:9 
    

    l0rinc commented at 11:39 am on October 17, 2024:
    Thank you for checking!
  24. maflcko approved
  25. maflcko commented at 10:02 am on October 17, 2024: member
    lgtm ACK c49cc30b8185a50b4aa00cf705d313c8aa9482a3
  26. maflcko added this to the milestone 29.0 on Oct 17, 2024
  27. in src/bench/crypto_hash.cpp:202 in c49cc30b81 outdated
    200     bench.run([&] {
    201-        *((uint64_t*)x.begin()) = SipHashUint256(0, ++k1, x);
    202+        ankerl::nanobench::doNotOptimizeAway(SipHashUint256(k0, k1, val));
    203+        ++k0;
    204+        ++k1;
    205+        ++(*val.begin());
    


    maflcko commented at 4:25 pm on October 17, 2024:

    This fails ubsan:

    0src/bench/crypto_hash.cpp:202:9: runtime error: implicit conversion from type 'int' of value 256 (32-bit, signed) to type 'unsigned char' changed the value to 0 (8-bit, unsigned)
    1    [#0](/bitcoin-bitcoin/0/) 0x55b763589f3f in SipHash_32b(ankerl::nanobench::Bench&)::$_0::operator()() const bld-cmake/src/bench/./src/bench/crypto_hash.cpp:202:9
    2
    3
    4...
    5
    6SUMMARY: UndefinedBehaviorSanitizer: implicit-signed-integer-truncation /home/marco/workspace/b-c/src/bench/crypto_hash.cpp:202:9 
    

    To reproduce you’ll have to run the bench under ubsan/isan without -sanity-check.


    l0rinc commented at 12:26 pm on October 18, 2024:
    I couldn’t reproduce the warning locally, but changed it to modify every value in val.data() instead to make sure none of them can be optimized away - would you be so kind and check if this solves the undefined behavior?

    laanwj commented at 10:09 am on October 20, 2024:
    i’m very confused about this one; begin() returns a unsigned char*, and unsigned values are definitely allowed to overflow within defined behavior. Also, m_data is an array (of fixed size) so by definition [0] will not be out of bounds.

    laanwj commented at 11:43 am on October 20, 2024:
    a workaround may be to do val.data()[0] = (val.data()[0] + 1) & 0xff; to make the wrap-around explicit, the C++ compiler will almost certainly just optimize it away. It’s also slightly more readable i think.

    sipa commented at 11:47 am on October 20, 2024:

    This is not UB, but ubsan is explicitly told to warn for certain error-prone but implementation-defined or well-defined behaviors.

    If the effect is deliberate, it is possible to add the function name to the ubsan suppressions.


    l0rinc commented at 11:48 am on October 20, 2024:
    Would you prefer I ‘& 0xFF’ the current implementation?

    laanwj commented at 12:56 pm on October 20, 2024:
    dunno, i would prefer &0xff to adding a supression (if that works to make the error go away, ofc), it more explicitly defines what is expected

    maflcko commented at 2:05 pm on October 23, 2024:

    Yeah, this is a bit confusing. UndefinedBehaviorSanitizer is just the name of the tool, not an implication that the issue it found is UB.

    If you want to reproduce, you’ll have to compile with the integer sanitizer (yes, to add to the confusion, the tool is called different here). Also, it only works with clang++/clang, so you’ll have to select that as well during cmake -B. Maybe this can be documented in doc/developer-notes.md?

    Edit: (reply to &0xff): An alternative to &0xff would be to cast to the underlying type uint8_t(bla), but both are fine and identical.


    maflcko commented at 7:31 pm on October 24, 2024:

    Maybe this can be documented in doc/developer-notes.md?

    (An alternative would be to add the CI configs to the cmake presets file, because the ISan is part of one. )


    l0rinc commented at 11:35 am on October 31, 2024:
    Thanks for the hints, I’ve replaced the increment of the bytes with a xor, this should keep the values in check. The overflow of i should be well-defined in this case, please let me know if you think any other change is needed since I couldn’t reproduce the sanitizer warnings. I have checked to make sure this still helps with the benchmarking of #30442. Done in https://github.com/bitcoin/bitcoin/compare/c88b1bde7500ace70a694f36a82521f92221936c..100cded580cfe9d69cc30866c29697d9658b4ce3 (also changed to brace inits since the spirit of @hodlinator was haunting me)

    maflcko commented at 11:40 am on October 31, 2024:

    The overflow of i should be well-defined in this case

    Unsigned integer overflow is well-defined (and relied upon) either way. This is just about documenting where one is expected and where one is unexpected.


    l0rinc commented at 11:43 am on October 31, 2024:
    So are you ok with the current version?

    maflcko commented at 11:54 am on October 31, 2024:

    I just think that it would be nice to not print the warning and to document that the unsigned overflow is expected here.

    Steps to reproduce on a fresh Ubuntu 24.04:

     0export DEBIAN_FRONTEND=noninteractive && apt update && apt install curl wget htop git vim ccache -y && git clone https://github.com/bitcoin/bitcoin.git  --depth=1 ./b-c && cd b-c && apt install build-essential cmake pkg-config  python3-zmq libzmq3-dev libevent-dev libboost-dev libsqlite3-dev  libdb++-dev clang llvm  -y
     1
     2git fetch origin --depth=1 100cded580cfe9d69cc30866c29697d9658b4ce3 && git checkout 100cded580cfe9d69cc30866c29697d9658b4ce3 && cmake -B ./bld-cmake -DSANITIZERS=integer -DCMAKE_C_COMPILER='clang' -DCMAKE_CXX_COMPILER='clang++'  -DBUILD_BENCH=ON                              && cmake --build ./bld-cmake --parallel  $(nproc)
     3
     4UBSAN_OPTIONS="suppressions=$(pwd)/test/sanitizer_suppressions/ubsan:print_stacktrace=1:halt_on_error=1:report_error_type=1" ./bld-cmake/src/bench/bench_bitcoin --filter=SipHash_32b
     5
     6...
     7
     8/b-c/src/bench/crypto_hash.cpp:203:9: runtime error: implicit conversion from type 'int' of value 256 (32-bit, signed) to type 'uint8_t' (aka 'unsigned char') changed the value to 0 (8-bit, unsigned)
     9    [#0](/bitcoin-bitcoin/0/) 0x5583348ec575 in SipHash_32b(ankerl::nanobench::Bench&)::$_0::operator()() const bld-cmake/src/bench/./src/bench/crypto_hash.cpp:203:9
    10    [#1](/bitcoin-bitcoin/1/) 0x5583348ec575 in ankerl::nanobench::Bench& ankerl::nanobench::Bench::run<SipHash_32b(ankerl::nanobench::Bench&)::$_0>(SipHash_32b(ankerl::nanobench::Bench&)::$_0&&) bld-cmake/src/bench/./src/bench/nanobench.h:1221:13
    11    [#2](/bitcoin-bitcoin/2/) 0x5583348ea256 in SipHash_32b(ankerl::nanobench::Bench&) bld-cmake/src/bench/./src/bench/crypto_hash.cpp:199:11
    12    [#3](/bitcoin-bitcoin/3/) 0x558334873009 in std::function<void (ankerl::nanobench::Bench&)>::operator()(ankerl::nanobench::Bench&) const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
    13    [#4](/bitcoin-bitcoin/4/) 0x558334873009 in benchmark::BenchRunner::RunAll(benchmark::Args const&) bld-cmake/src/bench/./src/bench/bench.cpp:127:13
    14    [#5](/bitcoin-bitcoin/5/) 0x55833486d35b in main bld-cmake/src/bench/./src/bench/bench_bitcoin.cpp:135:9
    15    [#6](/bitcoin-bitcoin/6/) 0x7f9aad6d11c9  (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
    16    [#7](/bitcoin-bitcoin/7/) 0x7f9aad6d128a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
    17    [#8](/bitcoin-bitcoin/8/) 0x558334840bc4 in _start (/b-c/bld-cmake/src/bench/bench_bitcoin+0x1b6bc4) (BuildId: 6155e3af28ae68265b04bfa46b763f94bded03e5)
    18
    19SUMMARY: UndefinedBehaviorSanitizer: implicit-signed-integer-truncation /b-c/src/bench/crypto_hash.cpp:203:9 
    

    l0rinc commented at 11:40 am on November 3, 2024:
    Thanks a lot @maflcko, that reproducer was super useful! Added a & 0xFF (or do we prefer 0xff? Found about the same number of occurrences in the code) and bumped the uint8_t to get rid of the overflow warning, see: https://github.com/bitcoin/bitcoin/compare/100cded580cfe9d69cc30866c29697d9658b4ce3..42066f45ff5d48e78a317eda63c035809bd657c6
  28. l0rinc force-pushed on Oct 18, 2024
  29. maflcko removed this from the milestone 29.0 on Oct 24, 2024
  30. maflcko commented at 1:37 pm on October 30, 2024: member
    Are you still working on this? If not, maybe turn into a draft for now?
  31. l0rinc force-pushed on Oct 31, 2024
  32. l0rinc force-pushed on Oct 31, 2024
  33. l0rinc force-pushed on Oct 31, 2024
  34. DrahtBot commented at 11:34 am on October 31, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/32331774653

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  35. DrahtBot added the label CI failed on Oct 31, 2024
  36. DrahtBot removed the label CI failed on Oct 31, 2024
  37. Refactor SipHash_32b benchmark to improve accuracy and avoid optimization issues
    - Modify `SipHash_32b` benchmark to use `FastRandomContext` for generating initial values.
    - Cycle through and modify each byte of the `uint256` value to ensure no part of it can be optimized away.
    
    The lack of "recursion" (where the method call overwrites the used inputs partially) and the systematic modification of each input byte makes the benchmark usage more reliable and thorough.
    42066f45ff
  38. l0rinc force-pushed on Nov 3, 2024
  39. maflcko commented at 7:58 am on November 4, 2024: member

    review ACK 42066f45ff5d48e78a317eda63c035809bd657c6

    Didn’t re-check ubsan

  40. DrahtBot requested review from laanwj on Nov 4, 2024
  41. hodlinator commented at 10:03 am on November 4, 2024: contributor

    Concept ACK 42066f45ff5d48e78a317eda63c035809bd657c6

    More realistic to also change k0 and val. (The modification of the uint256 val only happens for one byte at a time, so benchmark time shouldn’t be that affected by some unrelated expensive instructions / memory access :+1: ).

    UBSAN Repro

    Managed to repro maflcko’s concern on older push c88b1bde7500ace70a694f36a82521f92221936c:

     0₿ UBSAN_OPTIONS="suppressions=$(pwd)/test/sanitizer_suppressions/ubsan:print_stacktrace=1:halt_on_error=1:report_error_type=1" ./build/src/bench/bench_bitcoin --filter=SipHash_32b
     1/home/hodlinator/bitcoin/src/bench/crypto_hash.cpp:203:9: runtime error: implicit conversion from type 'int' of value 256 (32-bit, signed) to type 'unsigned char' changed the value to 0 (8-bit, unsigned)
     2    [#0](/bitcoin-bitcoin/0/) 0x5f6de628f68b  (/home/hodlinator/bitcoin/build/src/bench/bench_bitcoin+0x24d68b)
     3    [#1](/bitcoin-bitcoin/1/) 0x5f6de628d4ba  (/home/hodlinator/bitcoin/build/src/bench/bench_bitcoin+0x24b4ba)
     4    [#2](/bitcoin-bitcoin/2/) 0x5f6de6217da2  (/home/hodlinator/bitcoin/build/src/bench/bench_bitcoin+0x1d5da2)
     5    [#3](/bitcoin-bitcoin/3/) 0x5f6de6212d2a  (/home/hodlinator/bitcoin/build/src/bench/bench_bitcoin+0x1d0d2a)
     6    [#4](/bitcoin-bitcoin/4/) 0x75b2ee70127d  (/nix/store/sl141d1g77wvhr050ah87lcyz2czdxa3-glibc-2.40-36/lib/libc.so.6+0x2a27d) (BuildId: 6a788357de379aee7162f608d25a7692f7162b15)
     7    [#5](/bitcoin-bitcoin/5/) 0x75b2ee701338  (/nix/store/sl141d1g77wvhr050ah87lcyz2czdxa3-glibc-2.40-36/lib/libc.so.6+0x2a338) (BuildId: 6a788357de379aee7162f608d25a7692f7162b15)
     8    [#6](/bitcoin-bitcoin/6/) 0x5f6de61de094  (/home/hodlinator/bitcoin/build/src/bench/bench_bitcoin+0x19c094)
     9
    10SUMMARY: UndefinedBehaviorSanitizer: implicit-signed-integer-truncation /home/hodlinator/bitcoin/src/bench/crypto_hash.cpp:203:9 in
    

    Does not reproduce on current version (42066f45ff5d48e78a317eda63c035809bd657c6).

    Benchmark results

    Running on Intel workstation.

    Before:

    0 build/src/bench/bench_bitcoin -filter=SipHash_32b -min-time=5000
    1
    2|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    3|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    4|               24.63 |       40,601,125.32 |    0.0% |          256.00 |           90.84 |  2.818 |           3.00 |    0.0% |      5.50 | `SipHash_32b`
    

    After:

    0 build/src/bench/bench_bitcoin -filter=SipHash_32b -min-time=5000
    1
    2|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    3|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    4|               26.30 |       38,022,455.52 |    0.0% |          266.00 |           97.02 |  2.742 |           3.00 |    0.0% |      5.50 | `SipHash_32b`
    

    A decrease in op/s seems understandable.

    Concern

    I don’t understand why M1 Macs would have more ops after in the PR summary…? (Also - might need to refresh the results anyway after latest push).

  42. l0rinc commented at 2:49 pm on November 4, 2024: contributor

    I don’t understand why M1 Macs would have more ops after

    My understand is that when we’ve reused the hash as input, it could only run a single instance since each one depended on the previous one. The new way is more predictable and clang often optimizes more aggressively, I guess it unrolls the instances. I can investigate if you think it’s important.

    refresh the results anyway after latest push

    Update the description, thanks.

  43. hodlinator approved
  44. hodlinator commented at 4:14 pm on November 4, 2024: contributor

    ACK 42066f45ff5d48e78a317eda63c035809bd657c6

    My understand is that when we’ve reused the hash as input, it could only run a single instance since each one depended on the previous one. The new way is more predictable and clang often optimizes more aggressively, I guess it unrolls the instances. I can investigate if you think it’s important.

    Ah, that makes sense that the CPU/compiler wasn’t able to do as much out-of-order execution previously because of reusing the output from the last run as input for the next. Feels more accurate to change the benchmark in the way you’ve done. Finally was able to see the main tree in the forest of this little PR. :) No need to investigate further IMO.

  45. achow101 commented at 9:24 pm on November 14, 2024: member
    ACK 42066f45ff5d48e78a317eda63c035809bd657c6
  46. achow101 merged this on Nov 14, 2024
  47. achow101 closed this on Nov 14, 2024

  48. l0rinc deleted the branch on Nov 14, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me