refactor: C++20: Use std::rotl #29085

pull fjahr wants to merge 1 commits into bitcoin:master from fjahr:202312-cpp20 changing 5 files +46 −57
  1. fjahr commented at 10:49 pm on December 14, 2023: contributor

    While exploring some C++20 changes and checking against our code I found this potential improvement:

    1. We can replace our custom implementation of rotl32 in crypto/chacha20 with std::rotl from the new bit header.
  2. DrahtBot commented at 10:49 pm on December 14, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK fanquake
    Concept ACK hebasto
    Stale ACK maflcko

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. DrahtBot added the label Refactoring on Dec 14, 2023
  4. DrahtBot added the label CI failed on Dec 14, 2023
  5. maflcko commented at 8:52 am on December 15, 2023: member

    Not sure. This needs a benchmark to check against slowdown, see #29057 (comment)

    Also, C++20 is enabled for years, so I don’t understand the other commit. I think you either forgot to compile locally, or your compiler doesn’t emit any warnings that the changes are wrong?

  6. fanquake commented at 9:53 am on December 15, 2023: member
     0txmempool.cpp:88:61: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
     1   88 |             mapTx.modify(mapTx.iterator_to(descendant), [=, this](CTxMemPoolEntry& e) {
     2      |                                                           ~~^~~~
     3txmempool.cpp:99:32: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
     4   99 |     mapTx.modify(updateIt, [=, this](CTxMemPoolEntry& e) { e.UpdateDescendantState(modifySize, modifyFee, modifyCount); });
     5      |                              ~~^~~~
     6txmempool.cpp:300:38: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
     7  300 |         mapTx.modify(ancestorIt, [=, this](CTxMemPoolEntry& e) { e.UpdateDescendantState(updateSize, updateFee, updateCount); });
     8      |                                    ~~^~~~
     9txmempool.cpp:315:26: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
    10  315 |     mapTx.modify(it, [=, this](CTxMemPoolEntry& e){ e.UpdateAncestorState(updateSize, updateFee, updateCount, updateSigOpsCost); });
    11      |                        ~~^~~~
    12txmempool.cpp:345:39: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
    13  345 |                 mapTx.modify(dit, [=, this](CTxMemPoolEntry& e){ e.UpdateAncestorState(modifySize, modifyFee, -1, modifySigOps); });
    14      |                                     ~~^~~~
    15txmempool.cpp:903:46: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
    16  903 |                 mapTx.modify(ancestorIt, [=, this](CTxMemPoolEntry& e){ e.UpdateDescendantState(0, nFeeDelta, 0);});
    17      |                                            ~~^~~~
    18txmempool.cpp:910:48: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
    19  910 |                 mapTx.modify(descendantIt, [=, this](CTxMemPoolEntry& e){ e.UpdateAncestorState(0, nFeeDelta, 0, 0); });
    20      |                                              ~~^~~~
    
  7. fjahr commented at 10:26 pm on December 15, 2023: contributor

    Also, C++20 is enabled for years, so I don’t understand the other commit.

    I don’t understand this comment. What’s the ‘other’ commit?

    I think you either forgot to compile locally, or your compiler doesn’t emit any warnings that the changes are wrong?

    I compiled locally just fine and ran all test using Apple clang version 15.0.0 as the compiler. I will look into the warnings.

  8. mzumsande commented at 10:45 pm on December 15, 2023: contributor
    I think the issue is that most of the spots that were changed don’t actually make use of the implicit capture of this (e.g. by accessing a class or struct member in the body of the lambda) and therefore don’t need to be changed. Only the example in the scheduler does that (which is a doc). Not the use of [=] in general is deprecated, just using [=] in combination with accessing this via implicit capture. See https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0806r2.html for a longer explanation.
  9. fjahr force-pushed on Dec 15, 2023
  10. fjahr commented at 11:18 pm on December 15, 2023: contributor

    Seems right that the capture doesn’t always need to be explicit, I dropped the second commit.

    I don’t see any performance impact on chacha20 from the first commit:

    $ src/bench/bench_bitcoin -filter=“CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES”

    ns/byte byte/s err% total benchmark
    1.92 522,177,619.15 0.1% 0.02 CHACHA20_1MB
    1.94 514,584,551.79 0.3% 0.01 CHACHA20_256BYTES
    2.02 495,631,174.53 0.1% 0.01 CHACHA20_64BYTES

    $ src/bench/bench_bitcoin -filter=“CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES”

    ns/byte byte/s err% total benchmark
    1.91 522,318,337.55 0.1% 0.02 CHACHA20_1MB
    1.94 514,750,575.59 0.2% 0.01 CHACHA20_256BYTES
    2.02 495,682,263.12 0.0% 0.01 CHACHA20_64BYTES

    $ src/bench/bench_bitcoin -filter=“CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES”

    ns/byte byte/s err% total benchmark
    1.92 521,106,902.91 0.3% 0.02 CHACHA20_1MB
    1.94 515,778,495.78 0.0% 0.01 CHACHA20_256BYTES
    2.02 495,731,465.14 0.0% 0.01 CHACHA20_64BYTES

    $ src/bench/bench_bitcoin -filter=“CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES”

    ns/byte byte/s err% total benchmark
    1.92 520,578,875.51 0.3% 0.02 CHACHA20_1MB
    1.94 515,833,155.08 0.1% 0.01 CHACHA20_256BYTES
    2.02 495,727,504.36 0.0% 0.01 CHACHA20_64BYTES

    $ src/bench/bench_bitcoin -filter=“CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES”

    ns/byte byte/s err% total benchmark
    1.91 522,209,865.84 0.1% 0.02 CHACHA20_1MB
    1.94 515,822,396.86 0.0% 0.01 CHACHA20_256BYTES
    2.02 495,715,040.19 0.0% 0.01 CHACHA20_64BYTES

    $ src/bench/bench_bitcoin -filter=“CHACHA20_1MB|CHACHA20_256BYTES|CHACHA20_64BYTES”

    ns/byte byte/s err% total benchmark
    1.92 522,069,205.88 0.1% 0.02 CHACHA20_1MB
    1.94 514,707,650.50 0.2% 0.01 CHACHA20_256BYTES
    2.02 495,706,912.69 0.0% 0.01 CHACHA20_64BYTES
  11. DrahtBot removed the label CI failed on Dec 15, 2023
  12. fjahr renamed this:
    refactor: C++20: Use lambda implicit capture and std::rotl
    refactor: C++20: Use std::rotl
    on Dec 16, 2023
  13. fanquake commented at 1:08 pm on December 18, 2023: member

    If we are going to change it here, should also switch in hash.cpp:

    https://github.com/bitcoin/bitcoin/blob/c840dea27edfcfc67beb756ca6eda27b319f93d5/src/hash.cpp#L12-L15

    Also cc @theuni.

  14. fjahr force-pushed on Dec 24, 2023
  15. fjahr commented at 5:37 pm on December 24, 2023: contributor

    If we are going to change it here, should also switch in hash.cpp

    Done, thanks! I grepped for it didn’t find this one, probably because of the all-caps.

  16. maflcko added the label DrahtBot Guix build requested on Dec 25, 2023
  17. maflcko commented at 1:46 pm on December 25, 2023: member

    I guess this is fine, because the shift is constant and the compiler can skip the expensive modulo?

    https://github.com/llvm/llvm-project/blob/9fba1d5f3a52af0ae62f386d0c494bd9510fa845/libcxx/include/__bit/rotate.h#L35

  18. sipa commented at 2:10 pm on December 25, 2023: member

    @maflcko Constant propagation + strength reduction should make that code not emit an actual modulo operation in any compiler from the last 20 years (it’ll use x & (d-1) instead of x % d, if d is a known power of two).

    But hopefully that pattern in libc++ also matches whatever pattern matching clang has to make sure it instead emits an actual rot instruction on platforms where it exists.

  19. DrahtBot commented at 9:04 pm on December 25, 2023: contributor

    Guix builds (on x86_64)

    File commit 4b1196a9855dcd188a24f393aa2fa21e2d61f061(master) commit 1defdd46bbed418ebe19ed087d9d499c5fe7e12c(master and this pull)
    SHA256SUMS.part 7aefd803a584a152... f66270f8381da78c...
    *-aarch64-linux-gnu-debug.tar.gz 735f38fb37b239f8... 0178f4f1e25a9ccd...
    *-aarch64-linux-gnu.tar.gz 83a257c74e184b96... e70f485ce7c509e1...
    *-arm-linux-gnueabihf-debug.tar.gz 513acf435b18fab2... 66dcefcdfcdb8171...
    *-arm-linux-gnueabihf.tar.gz 581f911fcca5ae02... 419f97951c22103a...
    *-arm64-apple-darwin-unsigned.tar.gz 7a3f2b4e01f27896... ee665ca81967f2e3...
    *-arm64-apple-darwin-unsigned.zip dff45b014236e990... d8316d5dc9bd85c5...
    *-arm64-apple-darwin.tar.gz 0d9696b7bd76b36f... 0a812fb90bc27826...
    *-powerpc64-linux-gnu-debug.tar.gz dfe6cfdcd6ad3c6f... a5e96360b63b9c9b...
    *-powerpc64-linux-gnu.tar.gz 54c013201daad1ff... 13216629ab10e2fa...
    *-powerpc64le-linux-gnu-debug.tar.gz 8e944b002273b678... 6213e437a6763983...
    *-powerpc64le-linux-gnu.tar.gz 1ba8199984d7c43a... 75447353ea712f0a...
    *-riscv64-linux-gnu-debug.tar.gz e9abe48aeb7acc68... a884e5a19ce37301...
    *-riscv64-linux-gnu.tar.gz f4d1484dec65b967... 341b54b9c84ca385...
    *-x86_64-apple-darwin-unsigned.tar.gz 915650aa35e6e322... 8928011cd2a21165...
    *-x86_64-apple-darwin-unsigned.zip 6a6139cb415de743... d136c4f68df7be28...
    *-x86_64-apple-darwin.tar.gz c479d099d1df9a85... 226bc38c63f84332...
    *-x86_64-linux-gnu-debug.tar.gz 9e27161179f18a4a... e6abee7e9248190f...
    *-x86_64-linux-gnu.tar.gz f69632ca3dc98896... a7bce0cff5b25503...
    *.tar.gz c04997c9f9e51d80... a25b31625606e2cf...
    guix_build.log 412e8bd624c6cbcf... e6a2ef83e981021e...
    guix_build.log.diff 55ad9bb7e18060bd...
  20. DrahtBot removed the label DrahtBot Guix build requested on Dec 25, 2023
  21. maflcko commented at 4:20 pm on January 4, 2024: member
    lgtm ACK 842c288852c4d66de359e6907d9c82b7e618ab65
  22. theuni commented at 5:55 pm on January 4, 2024: member
    Any reason not to do Rotl in sha3 and ROTL in siphash as well?
  23. crypto, hash: replace custom rotl32 with std::rotl 6044628543
  24. fjahr force-pushed on Jan 5, 2024
  25. fjahr commented at 4:14 pm on January 5, 2024: contributor

    Any reason not to do Rotl in sha3 and ROTL in siphash as well?

    I don’t see any, apparently I just need to learn to grep patterns better…

    Done!

  26. hebasto commented at 5:48 pm on January 5, 2024: member
    Concept ACK.
  27. DrahtBot added the label CI failed on Jan 15, 2024
  28. fanquake approved
  29. fanquake commented at 4:23 pm on January 16, 2024: member

    ACK 60446285436da62adef1c0a9b11c3336d82b4d89

    Had a look at the difference in master vs this PR (rebased) on aarch64. Seems like for the most part, this gives the same output +- a handful of instructions. (Identical is identical minus difference in addresses / argument ordering etc).

    Function GCC 13.2 Clang 17.0.6
    MurmurHash3 Identical PR gives 5 less instructions: https://www.diffchecker.com/Jmlpl4rG/
    ChaCha20Aligned::Keystream PR gives 1 less instruction: https://www.diffchecker.com/d6dp5tsk/ Identical
    KeccakF Identical Identical
    CSipHasher:write (unsigned long) Identical Identical

    So this seems ok, unless anyone can find any major regressions. The output on corecheck also seems ok: https://corecheck.dev/bitcoin/bitcoin/pulls/29085.

  30. DrahtBot requested review from hebasto on Jan 16, 2024
  31. DrahtBot requested review from maflcko on Jan 16, 2024
  32. sipa commented at 4:25 pm on January 16, 2024: member
    @fanquake Which side of the diff is which version?
  33. fanquake commented at 4:27 pm on January 16, 2024: member

    Which side of the diff is which version?

    Left is master @ f1fcc9638cde7664b9642018fe6872148bbb0172 , right was the PR based on f1fcc9638cde7664b9642018fe6872148bbb0172.

  34. fanquake commented at 9:35 am on January 18, 2024: member
    Looks like the improvement is a bit more pronouced for x86_64, at least when using Clang (17.0.6). For example, comparing MurmurHash3, master (left), and this PR (right), we get less code and more rol instructions: https://www.diffchecker.com/I7s5JeTr/. For GCC 13.2, this PR makes no difference to the generated assembly: https://www.diffchecker.com/nM5DgKfN/.
  35. fanquake merged this on Jan 18, 2024
  36. fanquake closed this on Jan 18, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me