Replace non-standard CLZ builtins with c++20’s bit_width #29057

pull theuni wants to merge 3 commits into bitcoin:master from theuni:bit_width changing 5 files +5 โˆ’48
  1. theuni commented at 9:35 pm on December 11, 2023: member

    Split out of #28674

    Note that we can’t yet drop our configure checks because we pass the results on to minisketch. I’ve opened a PR for that upstream here: https://github.com/sipa/minisketch/pull/80

    fanquake suggested that we simply replace our CountBits call-sites with uses of std::bit_width directly and just drop the tests and fuzzers. I agree with that, but I wanted to allow our tests/fuzzers to run with this change first to help convince reviewers that std::bit_width is a drop-in replacement forCountBits.

    I can either add a commit on top of this PR to do the switch after the c-i has run or do it as a follow-up PR, I have no preference.

    I was curious to see what would happen under the hood with this change. Fortunately libc++ (as an example) does exactly what one would expect, using the built-ins: https://github.com/llvm/llvm-project/blob/main/libcxx/include/__bit/countl.h#L56

  2. DrahtBot commented at 9:35 pm on December 11, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK hebasto, fanquake
    Stale ACK theStack

    If your review is incorrectly listed, please react with ๐Ÿ‘Ž to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28674 ([POC] C++20 std::endian by fanquake)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. theStack approved
  4. theStack commented at 1:45 am on December 12, 2023: contributor

    Code-review ACK 273e4dc8da41f113b21a7ece54988b7bddf3612e

    Unit tests succeeded locally :heavy_check_mark:

    Seems reasonable to keep CountBits in a first step with its tests and fuzzers for reassurance and then replace all of its call-sites. Happy to re-review this PR or a follow-up, depending on where you decide to put the next commit.

  5. hebasto commented at 8:37 am on December 12, 2023: member
    Concept ACK.
  6. hebasto commented at 9:22 am on December 12, 2023: member

    … but I wanted to allow our tests/fuzzers to run with this change first to help convince reviewers that std::bit_width is a drop-in replacement forCountBits.

    I’m convinced :)

  7. fanquake commented at 1:57 pm on December 12, 2023: member

    Concept ACK. We can also check the benchmarks in https://corecheck.dev/bitcoin/bitcoin/pulls/29057 when they become available.

    I can either add a commit on top of this PR to do the switch after the c-i has run or do it as a follow-up PR, I have no preference.

    I think the changes here should be straightforward enough, that you could push another commit doing the cleanup.

  8. theuni commented at 4:24 pm on December 12, 2023: member
    Thanks @theStack and @hebasto for the quick review. @fanquake sounds good, will push a commit to do the switch once we see some benchmarks.
  9. maflcko commented at 2:42 pm on December 13, 2023: member
    I don’t think the benchmarks will come. I guess the machine has been preempted by AWS? @aureleoules
  10. aureleoules commented at 3:25 pm on December 13, 2023: member

    I don’t think the benchmarks will come. I guess the machine has been preempted by AWS? @aureleoules

    I’ve re-ran the job. I actually found a way to re-run preempted jobs automatically! A lot of benchmark jobs failed yesterday because of #29061 including this pull because I rebase all pulls on master before running jobs and I don’t properly handle failures ๐Ÿ˜ฌ

  11. aureleoules commented at 5:47 pm on December 13, 2023: member

    So the benchmark results are available now but they are pessimistic on a few benchs (WalletCreateTxUseOnlyPresetInputs, WalletAvailableCoins and PrevectorClearNontrivial).

    I ran the benchmarks locally without cachegrind and there does not seem to be any variation between master and the pull. But when I run them with cachegrind I notice the same variation as seen on corecheck.

    This may be due to a limitation of cachegrind because it cannot emulate L2 cache and so results may be pessimist? I’m not an expert in this field though so I’ll let you interpret the results๐Ÿ˜…

  12. crypto: add BitWidth bench 303aebbcf5
  13. crypto: replace non-standard CLZ builtins with c++20's bit_width b9b40cda7b
  14. theuni force-pushed on Dec 13, 2023
  15. theuni commented at 6:19 pm on December 13, 2023: member

    I can reproduce a slight slowdown here. I added a bench that demonstrates the difference, then the original commit, then the removal.

    Both libc++ and libstdc++ implement this in terms of the same built-ins we were using before, so I find this surprising. I’d hate to find that the c++ism’s have a cost.

    Going to convert this to a draft while I look into it.

  16. crypto: replace CountBits with std::bit_width
    bit_width is a drop-in replacement with an exact meaning in c++, so there is
    no need to continue testing/fuzzing/benchmarking.
    0c5d731256
  17. theuni force-pushed on Dec 13, 2023
  18. DrahtBot added the label CI failed on Dec 13, 2023
  19. DrahtBot removed the label CI failed on Dec 13, 2023
  20. theuni marked this as a draft on Dec 13, 2023
  21. maflcko commented at 5:02 pm on January 5, 2024: member

    I can reproduce a slight slowdown here. Both libc++ and libstdc++ implement this in terms of the same built-ins we were using before, so I find this surprising. I’d hate to find that the c++ism’s have a cost.

    I ran the first commit via ./src/bench/bench_bitcoin --filter=BitWidth.* and I can see a slowdown of the current implementation (std is faster). Which is also confusing, given that it is the same code. Doubly confusing, because it is the opposite result of yours?

  22. DrahtBot added the label CI failed on Jan 16, 2024
  23. theuni commented at 7:43 pm on January 17, 2024: member

    @maflcko I’m thinking we should just take the c++20 code and assume it will be more performant going forward. That makes sense to me anyway.

    I’m going to go ahead and close this and include it in another PR which gathers the serialization c++20 changes. We can do another round of benches there.

  24. theuni closed this on Jan 17, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-10 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me