serialization: c++20 endian/byteswap/clz modernization #29263

pull theuni wants to merge 4 commits into bitcoin:master from theuni:cxx_20_endian2 changing 11 files +118 −324
  1. theuni commented at 7:47 pm on January 17, 2024: member

    This replaces #28674, #29036, and #29057. Now ready for testing and review.

    Replaces platform-specific endian and byteswap functions. This is especially useful for kernel, as it means that our deep serialization code no longer requires bitcoin-config.h.

    I apologize for the size of the last commit, but it’s hard to avoid making those changes at once.

    All platforms now use our internal functions rather than libc or platform-specific ones, with the exception of MSVC.

    Sadly, benchmarking showed that not all compilers are capable of detecting and optimizing byteswap functions, so compiler builtins are instead used where possible. However, they’re now detected via macros rather than autoconf checks.

    This matches how libc++ implements std::byteswap for c++23.

    I suggest we move/rename compat/endian.h, but I left that out of this PR to avoid bikeshedding.

    #29057 pointed out some irregularities in benchmarks. After messing with various compilers and configs for a few weeks with these changes, I’m of the opinion that we can’t win on every platform every time, so we should take the code that makes sense going forward. That said, if any real-world slowdowns are caused here, we should obviously investigate.

  2. DrahtBot commented at 7:47 pm on January 17, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK maflcko
    Concept ACK fanquake

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29494 (build: Assume HAVE_CONFIG_H, Add IWYU pragma keep to bitcoin-config.h includes by maflcko)
    • #29450 (build: replace custom MAC_OSX macro with standard __APPLE__ for consistent macOS detection by paplorinc)
    • #21590 (Safegcd-based modular inverses in MuHash3072 by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. theuni force-pushed on Jan 17, 2024
  4. theuni commented at 1:57 pm on January 19, 2024: member

    Here’s a godbolt test that shows how various compilers perform: https://gcc.godbolt.org/z/nTadqEP83

    To test with/without builtins, comment/un-comment the DISABLE_BUILTIN_BSWAPS define at line 14.

    From my tests:

    • clang: all c++20 supporting versions (>= 10.0) optimize without the need for builtins
    • gcc: no version optimizes without the builtins
    • msvc: versions >= 19.37 optimize without builtins
  5. maflcko added the label DrahtBot Guix build requested on Jan 23, 2024
  6. theuni force-pushed on Jan 23, 2024
  7. theuni commented at 9:30 pm on January 23, 2024: member
    Added a quick note about std::byteswap and c++23.
  8. DrahtBot commented at 7:23 am on January 24, 2024: contributor

    Guix builds (on x86_64)

    File commit e69796c79c0aa202087a13ba62d9fbcc1c8754d4(master) commit ea6e27ce012501e3dd0bb8948ca1d02d450633c4(master and this pull)
    SHA256SUMS.part e3cbeb9e75c9ac85... 99dcf4de09376e75...
    *-aarch64-linux-gnu-debug.tar.gz 49b7d884d40ea63b... be15822b12796d24...
    *-aarch64-linux-gnu.tar.gz 7ba0932cedca0b55... e73549352dae7a29...
    *-arm-linux-gnueabihf-debug.tar.gz ff1cc0114023daa7... 047df3ec91266906...
    *-arm-linux-gnueabihf.tar.gz c9f35eb8d6b70596... aeb1de17fe3d8b73...
    *-arm64-apple-darwin-unsigned.tar.gz c7b8e86e6f571427... 117f1a412fcdf132...
    *-arm64-apple-darwin-unsigned.zip 64263106e26d5f38... 698d08ae6a55eb6b...
    *-arm64-apple-darwin.tar.gz c451a701b31666a6... 160ddcae70fd28b7...
    *-powerpc64-linux-gnu-debug.tar.gz 8eaed3b25e35a166... 91eb43803645a409...
    *-powerpc64-linux-gnu.tar.gz 92192d5dcd2abc5d... 0e4f93325d2c1d4c...
    *-powerpc64le-linux-gnu-debug.tar.gz 7ba12a5e155f1f31... 865994169509c581...
    *-powerpc64le-linux-gnu.tar.gz 783da5e58254e1c7... 338e87d38582f88d...
    *-riscv64-linux-gnu-debug.tar.gz 123752c651ab537b... 36fde55fe604836a...
    *-riscv64-linux-gnu.tar.gz 6816d44ba62480e0... d06f75cb9639c743...
    *-x86_64-apple-darwin-unsigned.tar.gz 651a9ae2176cf7e1... 7da6af545e133a49...
    *-x86_64-apple-darwin-unsigned.zip 32787903bc08d809... e6f37321a45e076f...
    *-x86_64-apple-darwin.tar.gz 878b96f61a83b5d5... 8fad7f75ceced2b9...
    *-x86_64-linux-gnu-debug.tar.gz 7a7b930773e6723f... 7d4987ee307d625a...
    *-x86_64-linux-gnu.tar.gz d012051198112116... 637d9dce06441d0f...
    *.tar.gz 05ffca50f20c6f8c... f1e9711f7f4b1d8b...
    guix_build.log b307741814adae42... 3d60205e28f729b9...
    guix_build.log.diff 6b4c07847d5b2078...
  9. DrahtBot removed the label DrahtBot Guix build requested on Jan 24, 2024
  10. fanquake commented at 3:30 pm on January 24, 2024: member
    Concept ACK - I think moving to using the builtins is ok. With the eventual plan to migrate to std::byteswap. @aureleoules everytime I visit the coverage/benchmarks for this PR, I get a 500 error. Can you take a look?
  11. aureleoules commented at 4:30 pm on January 24, 2024: member
    @fanquake thanks for reaching out, I fixed the issue.
  12. theuni commented at 4:35 pm on January 24, 2024: member
    @aureleoules Thanks! These numbers really don’t make much sense to me and don’t match what I’ve seen locally, but I’ll work on trying to reproduce.
  13. theuni commented at 7:11 pm on January 24, 2024: member
    From some more local tests, it looks like it’s the clz changes slowing things down for whatever reason. Still investigating.
  14. theuni force-pushed on Feb 1, 2024
  15. theuni commented at 9:17 pm on February 1, 2024: member

    I’ve dropped the clz changes as a test and kept only the endian/byteswap change. Locally on my machine the benchmarks look the same before and after.

    If the corecheck benchmarks look better I’ll update the title/description here.

    Edit: looks good.

  16. theuni commented at 10:01 pm on February 1, 2024: member

    Aha, I finally tracked down the culprit! No wonder the benchmarks weren’t making sense!

    The problem was the removal of this from crypto/common.h:

    0#if defined(HAVE_CONFIG_H)
    1#include <config/bitcoin-config.h>
    2#endif
    

    Turns out, sha256.cpp was relying on that. Adding that include to sha256.cpp (where it belongs) fixes my local benchmarks.

    I’m going to put the clz changes back to match the PR title/description, but this time with the include fixed up. Hopefully after that the benchmarks will make sense and all will be good :)

    I suppose there might be other compilation units that were depending on the include indirectly as well. I’ll figure out a way to check for that.

  17. theuni force-pushed on Feb 1, 2024
  18. theuni marked this as a draft on Feb 1, 2024
  19. theuni force-pushed on Feb 1, 2024
  20. theuni commented at 10:43 pm on February 1, 2024: member

    Converted to a draft while I’m still messing with this.

    I added a new commit to add the bitcoin-config.h include where it’s necessary. I’m not done investigating that yet, but it’s at least needed for chacha20poly1305.cpp as well.

    The benchmarks still show some regressions, though it’s not nearly as bad as it was before. Will continue poking.

  21. theuni force-pushed on Feb 1, 2024
  22. DrahtBot added the label CI failed on Feb 2, 2024
  23. maflcko commented at 12:14 pm on February 5, 2024: member
    I’ve created #26972 to track ideas on how to detect those silent build issues, or at least make them easier to spot.
  24. theuni commented at 3:12 pm on February 5, 2024: member

    I’ve pushed a commit which temporarily puts the includes back in the low-level headers for the sake of addressing the benchmarks first, setting aside the possibility of missing defines. @aureleoules’s benchmarks show 2 small regressions, though I am unable to reproduce those locally.

    My tests show: ./bench/bench_bitcoin -min-time=5000 -filter="AddrManGetAddr|PrevectorDeserializeNontrivial" master (3d52cedb497e0ec029ba3cef95b00169c157d8ae):

    0|               ns/op |                op/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|          224,138.83 |            4,461.52 |    0.1% |      5.50 | `AddrManGetAddr`
    3|              155.96 |        6,411,753.11 |    0.1% |      5.49 | `PrevectorDeserializeNontrivial
    

    This PR:

    0|               ns/op |                op/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|          226,531.50 |            4,414.40 |    0.2% |      5.50 | `AddrManGetAddr`
    3|              156.91 |        6,372,901.08 |    0.1% |      5.50 | `PrevectorDeserializeNontrivial`
    

    Essentially no change. I’m curious if @maflcko sees the same?

  25. DrahtBot removed the label CI failed on Feb 5, 2024
  26. aureleoules commented at 3:53 pm on February 5, 2024: member

    @theuni yeah the AddrManGetAddr and PrevectorDeserializeNontrivial are known to be flaky. I’ll filter them out from the UI while I find a better solution. Some benchmarks depend on I/O or randomness so it’s hard a have consistent results between runs.

    You can see ignored benchmarks here in the meantime: https://github.com/corecheck/frontend/blob/master/src/routes/%5Bowner%5D/%5Brepo%5D/pulls/%5Bnumber%5D/Benchmarks.svelte#L334

  27. theuni commented at 4:17 pm on February 5, 2024: member
    @aureleoules Thanks, that’s helpful.
  28. fanquake referenced this in commit 45b2a91897 on Feb 20, 2024
  29. fanquake commented at 2:06 pm on February 20, 2024: member
    @theuni want to rebase this now that #29404 is in?
  30. theuni force-pushed on Feb 20, 2024
  31. theuni commented at 5:01 pm on February 20, 2024: member

    Rebased. We’ll see what c-i thinks now. Benchmarks should be all good.

    I think it makes sense for #29408 to go first though.

  32. maflcko commented at 1:28 pm on February 26, 2024: member
    Ready to be taken out of draft?
  33. crypto: replace non-standard CLZ builtins with c++20's bit_width
    Also some header cleanups.
    52f9bba889
  34. crypto: replace CountBits with std::bit_width
    bit_width is a drop-in replacement with an exact meaning in c++, so there is
    no need to continue testing/fuzzing/benchmarking.
    297367b3bb
  35. theuni force-pushed on Feb 26, 2024
  36. theuni marked this as ready for review on Feb 26, 2024
  37. theuni commented at 4:14 pm on February 26, 2024: member
    Rebased after #29408 merge and undrafted.
  38. fanquake added the label DrahtBot Guix build requested on Feb 27, 2024
  39. fanquake commented at 10:12 am on February 27, 2024: member
    Looks like the test coverage report now no-longer shows any significant difference in benchmarks.
  40. in src/compat/byteswap.h:53 in 5745aedce7 outdated
    47-          | ((x & 0x0000ff0000000000ull) >> 24)
    48-          | ((x & 0x000000ff00000000ull) >> 8)
    49-          | ((x & 0x00000000ff000000ull) << 8)
    50-          | ((x & 0x0000000000ff0000ull) << 24)
    51-          | ((x & 0x000000000000ff00ull) << 40)
    52-          | ((x & 0x00000000000000ffull) << 56));
    


    maflcko commented at 3:44 pm on February 27, 2024:

    Any reason to remove this header and move the code from one file to another?

    Seems easier to review (and a smaller diff) to just keep it as-is for now. Given that it will be removed anyway with C++23, that should be fine.


    theuni commented at 3:50 pm on February 27, 2024:
    Hmm, I don’t remember now honestly. My best guess is I thought this would make the change to std::byteswap more straightforward in the future, but agree that it’d be easier to review now if left in-place. I’ll move it back.

    maflcko commented at 3:53 pm on February 27, 2024:
    The bswap changes + renaming to internal_bswap_* could also be done in a separate commit?

    theuni commented at 6:35 pm on February 27, 2024:
    Ah, I guess I combined them because the constexpr-ness of the endian conversion functions is at the mercy of the byteswap functions, so compat/endian.h requires the use of the BSWAP_CONSTEXPR macro. I guess that’s still ok with them split up, though.

    maflcko commented at 6:45 pm on February 27, 2024:

    Is there a requirement that the functions can be called at compile time? Not sure if there is a use-case, so just doing the full switch to constexpr with C++23 seems sufficient.

    In any case, exporting BSWAP_CONSTEXPR from this header should be fine as well?


    theuni commented at 6:55 pm on February 27, 2024:
    No, but I figure compilers are more likely to be able to optimize in the non-builtin case if they’re constexpr.

    maflcko commented at 7:02 pm on February 27, 2024:
    tiny nit: I think the style changes in this line are still there, and could be dropped

    theuni commented at 8:08 pm on February 27, 2024:
    Indeed, thanks. The whitespace was already off on the first line, so I kept it that way to reduce the diff here.
  41. theuni force-pushed on Feb 27, 2024
  42. theuni commented at 7:01 pm on February 27, 2024: member
    Updated to split the last commit as suggested by @maflcko . I also replaced static inline with just inline in places where I had introduced that.
  43. theuni force-pushed on Feb 27, 2024
  44. DrahtBot commented at 11:43 pm on February 27, 2024: contributor

    Guix builds (on x86_64)

    File commit 6a7ed5e2374bfea943742595c76da8f2d25061f6(master) commit 88c60987167496ea19a75ce130d9364a41520929(master and this pull)
    SHA256SUMS.part 1c34f1e5d39e6e9f... acb7161ac7364ecc...
    *-aarch64-linux-gnu-debug.tar.gz 24e23354d1be9fc8... a20b13fcce939673...
    *-aarch64-linux-gnu.tar.gz 358ad6c1b0dd8ead... f6fc3fe4b27e5ee7...
    *-arm-linux-gnueabihf-debug.tar.gz 60042e298ffdd3c3... 00c47060ecb3abf4...
    *-arm-linux-gnueabihf.tar.gz fed6a8b2bdf2a303... 63d75b669ff14561...
    *-arm64-apple-darwin-unsigned.tar.gz 4c9a8d9d5b89a020... 8ebdbacf70159fe2...
    *-arm64-apple-darwin-unsigned.zip 2cfdf3c5b0a0db30... de2f41b558ff6626...
    *-arm64-apple-darwin.tar.gz 7960a667b50e1e1f... 9ca99330356a62d5...
    *-powerpc64-linux-gnu-debug.tar.gz 25e57cb4560928de... 9d2312f9cba01279...
    *-powerpc64-linux-gnu.tar.gz fccab49cbdd030c6... 623776d643b66d7a...
    *-powerpc64le-linux-gnu-debug.tar.gz 0dad8bf978d0ac93... 14d5019dca08f8bd...
    *-powerpc64le-linux-gnu.tar.gz 561cdb98080d9fe7... 5054cf4bbbe05f01...
    *-riscv64-linux-gnu-debug.tar.gz bf9f2e103904eb7f... 769f2383f6b80be5...
    *-riscv64-linux-gnu.tar.gz 4f166d86b4597cbd... 0d2dc190d20b4fa3...
    *-x86_64-apple-darwin-unsigned.tar.gz 7ed87d2ffa2a3860... e23e5a8d91623708...
    *-x86_64-apple-darwin-unsigned.zip 999c3327d8d208c2... 7135d75a120bccf8...
    *-x86_64-apple-darwin.tar.gz 99c857d9cd288193... 107e29285c059475...
    *-x86_64-linux-gnu-debug.tar.gz c4aa5e9c880bc333... 65794f06ace89f5d...
    *-x86_64-linux-gnu.tar.gz 63f3a64c51429d91... b9b4b601b9369cb3...
    *.tar.gz d9e5a8d2f97b2900... e882728fa5d23050...
    guix_build.log eacd4c991584db23... f88d51ce2cfa8976...
    guix_build.log.diff da8f5ab257e07d01...
  45. DrahtBot removed the label DrahtBot Guix build requested on Feb 27, 2024
  46. in src/compat/byteswap.h:8 in c788bd34f5 outdated
     4@@ -5,44 +5,67 @@
     5 #ifndef BITCOIN_COMPAT_BYTESWAP_H
     6 #define BITCOIN_COMPAT_BYTESWAP_H
     7 
     8-#if defined(HAVE_CONFIG_H)
     9-#include <config/bitcoin-config.h>
    10+#include <bit>
    


    maflcko commented at 12:20 pm on February 28, 2024:
    nit in c788bd34f540f31cd62985de26ad7961417a40ec: What is this used for?

    theuni commented at 1:43 pm on February 28, 2024:
    Nice catch, this was leftover from the combined commit. Removed.
  47. maflcko commented at 1:04 pm on February 28, 2024: member

    ACK 1d051e86098e721a47cebf7cfd73d82aa414c05e 📛

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: ACK 1d051e86098e721a47cebf7cfd73d82aa414c05e 📛
    36Obp6pvdrMxN7MXEPLAuvZe0UCy9RzNtRu/VNpLvSGe6P6thBW12J6z7gacvZBa1MGOdmgJFjbeCHc8MjWTyDA==
    
  48. DrahtBot requested review from fanquake on Feb 28, 2024
  49. serialization: detect byteswap builtins without autoconf tests
    Rather than a complicated set of tests to decide which bswap functions to
    use, always prefer the compiler built-ins when available.
    
    These builtins and fallbacks can all be removed once we're using c++23, which
    adds std::byteswap.
    432b18ca8d
  50. serialization: use internal endian conversion functions
    These replace our platform-specific mess in favor of c++20 endian detection
    via std::endian and internal byteswap functions when necessary.
    
    They no longer rely on autoconf detection.
    86b7f28d6c
  51. theuni force-pushed on Feb 28, 2024
  52. maflcko commented at 2:04 pm on February 28, 2024: member

    ACK 86b7f28d6c507155a9d3a15487ee883989b88943 📘

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: ACK 86b7f28d6c507155a9d3a15487ee883989b88943 📘
    3kQ4T5XBTsSlLG85FEzyZHq8zuOHIKVd6qy22hl3UGR99D8vEpfnGCcvGnQHmd2MS2GYEuSKzOYeFDt9w2Ch8Bw==
    
  53. fanquake referenced this in commit bbfddb3998 on Feb 28, 2024
  54. fanquake requested review from TheCharlatan on Feb 29, 2024
  55. fanquake approved
  56. fanquake commented at 4:19 pm on March 1, 2024: member
    ACK 86b7f28d6c507155a9d3a15487ee883989b88943 - we can finish pruning out the __builtin_clz* checks/usage once the minisketch code has been updated. This is more good cleanup pre-CMake & for the kernal.
  57. fanquake merged this on Mar 1, 2024
  58. fanquake closed this on Mar 1, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-01 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me