Enable HW-accelerated implementations of SHA256 for MSVC builds #24773

pull hebasto wants to merge 5 commits into bitcoin:master from hebasto:220405-avx2 changing 7 files +41 −3
  1. hebasto commented at 3:43 pm on April 5, 2022: member

    This PR enables AVX2, SSE4.1 and x86 SHA-NI implementations of SHA256 instead of the “standard” one.

    NOTE about testing. During runtime the SHA-NI implementation is available only if a CPU has the sha_ni flag set.

    Here are benchmark results on my machine with Intel Core i5 8350U (no sha_ni flag) + Windows 11 Pro 22H2:

    0>.\src\bench_bitcoin.exe -filter=SHA256D64.*
    1
    2|             ns/byte |              byte/s |    err% |     total | benchmark
    3|--------------------:|--------------------:|--------:|----------:|:----------
    4|                2.79 |      357,807,381.52 |    0.2% |      0.01 | SHA256D64_1024_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation
    5|                4.29 |      232,954,767.62 |    0.0% |      0.01 | SHA256D64_1024_SHANI using the 'standard,sse41(4way)' SHA256 implementation
    6|                4.33 |      231,031,727.38 |    0.6% |      0.01 | SHA256D64_1024_SSE4 using the 'standard,sse41(4way)' SHA256 implementation
    7|               11.65 |       85,836,280.29 |    0.3% |      0.01 | SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation
    

    On another machine with Intel Core i9-12950HX (sha_ni flag set) + Windows 11 Home 22H2:

    • the master branch @ 058488276f8dc244fe534ba45ec8dd2b4b198a2e:
     0>.\src\bench_bitcoin.exe -filter=SHA256.*
     1
     2|             ns/byte |              byte/s |    err% |     total | benchmark
     3|--------------------:|--------------------:|--------:|----------:|:----------
     4|               24.74 |       40,421,883.67 |    1.2% |      0.02 | `SHA256D64_1024_AVX2 using the 'standard' SHA256 implementation`
     5|               25.09 |       39,856,473.88 |    2.7% |      0.02 | `SHA256D64_1024_SHANI using the 'standard' SHA256 implementation`
     6|               25.02 |       39,965,849.49 |    1.4% |      0.02 | `SHA256D64_1024_SSE4 using the 'standard' SHA256 implementation`
     7|               25.06 |       39,900,152.21 |    1.4% |      0.02 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`
     8|               17.87 |       55,953,892.74 |    4.0% |      0.01 | `SHA256_32b_AVX2 using the 'standard' SHA256 implementation`
     9|               17.59 |       56,839,372.12 |    2.5% |      0.01 | `SHA256_32b_SHANI using the 'standard' SHA256 implementation`
    10|               17.95 |       55,721,393.03 |    2.9% |      0.01 | `SHA256_32b_SSE4 using the 'standard' SHA256 implementation`
    11|               18.47 |       54,140,724.95 |    2.1% |      0.01 | `SHA256_32b_STANDARD using the 'standard' SHA256 implementation`
    12|                8.27 |      120,885,364.41 |    2.3% |      0.09 | `SHA256_AVX2 using the 'standard' SHA256 implementation`
    13|                8.10 |      123,519,312.24 |    0.5% |      0.09 | `SHA256_SHANI using the 'standard' SHA256 implementation`
    14|                8.15 |      122,720,467.32 |    1.7% |      0.09 | `SHA256_SSE4 using the 'standard' SHA256 implementation`
    15|                7.93 |      126,141,581.31 |    1.4% |      0.09 | `SHA256_STANDARD using the 'standard' SHA256 implementation`
    
    • this PR:
     0>.\src\bench_bitcoin.exe -filter=SHA256.*
     1
     2|             ns/byte |              byte/s |    err% |     total | benchmark
     3|--------------------:|--------------------:|--------:|----------:|:----------
     4|                2.24 |      446,126,616.75 |    1.7% |      0.01 | `SHA256D64_1024_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
     5|                1.05 |      953,349,958.44 |    3.3% |      0.01 | `SHA256D64_1024_SHANI using the 'x86_shani(1way,2way)' SHA256 implementation`
     6|                2.82 |      354,952,157.43 |    1.1% |      0.01 | `SHA256D64_1024_SSE4 using the 'standard,sse41(4way)' SHA256 implementation`
     7|                5.45 |      183,471,444.57 |    1.6% |      0.01 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`
     8|                4.37 |      228,701,362.34 |    1.4% |      0.01 | `SHA256_32b_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
     9|                1.34 |      748,698,312.44 |    2.4% |      0.01 | `SHA256_32b_SHANI using the 'x86_shani(1way,2way)' SHA256 implementation`
    10|                4.30 |      232,450,436.16 |    2.6% |      0.01 | `SHA256_32b_SSE4 using the 'standard,sse41(4way)' SHA256 implementation`
    11|                4.54 |      220,072,501.29 |    3.6% |      0.01 | `SHA256_32b_STANDARD using the 'standard' SHA256 implementation`
    12|                2.14 |      467,093,278.53 |    4.2% |      0.02 | `SHA256_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
    13|                0.44 |    2,279,721,873.93 |    2.9% |      0.01 | `SHA256_SHANI using the 'x86_shani(1way,2way)' SHA256 implementation`
    14|                2.09 |      478,148,608.59 |    2.2% |      0.02 | `SHA256_SSE4 using the 'standard,sse41(4way)' SHA256 implementation`
    15|                2.06 |      486,570,650.06 |    2.4% |      0.02 | `SHA256_STANDARD using the 'standard' SHA256 implementation`
    
  2. DrahtBot added the label Build system on Apr 5, 2022
  3. DrahtBot added the label Utils/log/libs on Apr 5, 2022
  4. laanwj commented at 9:12 am on April 6, 2022: member
    Concept ACK—but it does makes me wonder why AVX2 specifically? Would, say, enabling SHANI (which is better on hw that supports it) give a lot of extra work?
  5. hebasto commented at 9:16 am on April 6, 2022: member

    Concept ACK—but it does makes me wonder why AVX2 specifically? Would, say, enabling SHANI (which is better on hw that supports it) give a lot of extra work?

    I have no hardware that supports SHANI (

  6. laanwj commented at 9:34 am on April 6, 2022: member

    I have no hardware that supports SHANI (

    That’s a valid reason. I guess the base work in this PR will make it easier for people who care, to add support the other instruction sets too?

  7. hebasto commented at 9:46 am on April 6, 2022: member

    I have no hardware that supports SHANI (

    That’s a valid reason. I guess the base work in this PR will make it easier for people who care, to add support the other instruction sets too?

    Exactly.

  8. laanwj requested review from sipsorcery on Apr 6, 2022
  9. DrahtBot commented at 7:53 pm on April 6, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK laanwj, theuni
    Stale ACK sipsorcery

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29774 (build: Enable fuzz binary in MSVC by hebasto)
    • #29625 (Several randomness improvements by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  10. sipsorcery approved
  11. sipsorcery commented at 11:28 am on April 17, 2022: member

    tACK 70f36d816e823d300361f872923f2c3a5f90d476.

    *I’m not qualified to verify the safety of changing crypto logic.

    I performed an msvc build and then ran the benchmark with and without this PR and there doesn’t seem to be much of a difference for SHA256 on my Intel i7 CPU based PC. I’m not sure if that as expected or not?

    On master:

    0|             ns/byte |              byte/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|               12.70 |       78,766,206.15 |    2.6% |      0.14 | `SHA1`
    3|               16.99 |       58,841,181.77 |    1.2% |      0.19 | `SHA256`
    4|               46.23 |       21,630,470.66 |   11.5% |      0.04 | :wavy_dash: `SHA256D64_1024` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10)
    5|               36.54 |       27,365,792.76 |    1.8% |      0.01 | `SHA256_32b`
    6|               24.93 |       40,108,453.26 |    3.8% |      0.29 | `SHA3_256_1M`
    7|               11.64 |       85,924,678.43 |    1.7% |      0.13 | `SHA512`
    

    With #24773:

    0|             ns/byte |              byte/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|               13.79 |       72,504,223.37 |    4.9% |      0.16 | `SHA1`
    3|               18.06 |       55,382,638.65 |    4.6% |      0.21 | `SHA256`
    4|               88.30 |       11,324,888.97 |   13.0% |      0.06 | :wavy_dash: `SHA256D64_1024` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10)
    5|               56.46 |       17,710,386.66 |   18.7% |      0.01 | :wavy_dash: `SHA256_32b` (Unstable with ~618.0 iters. Increase `minEpochIterations` to e.g. 6180)
    6|               28.18 |       35,489,686.70 |    2.3% |      0.33 | `SHA3_256_1M`
    7|               12.35 |       80,963,792.99 |    6.3% |      0.14 | :wavy_dash: `SHA512` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g.10)
    
  12. laanwj referenced this in commit f8b2e9bcfc on Apr 19, 2022
  13. hebasto force-pushed on Apr 19, 2022
  14. hebasto commented at 4:20 pm on April 19, 2022: member
    Rebased on top of the merged #24772.
  15. sidhujag referenced this in commit f308cdc773 on Apr 19, 2022
  16. DrahtBot added the label Needs rebase on Apr 28, 2022
  17. hebasto marked this as a draft on May 28, 2022
  18. hebasto commented at 1:14 pm on May 28, 2022: member
    Converted to a “draft” until resolving discussion in #24774.
  19. fanquake commented at 2:15 am on October 13, 2022: member
    Closing for now, given you’ve closed the remaining base PR (#24774). Let me know if this is incorrect / should be reopened.
  20. fanquake closed this on Oct 13, 2022

  21. hebasto reopened this on Nov 28, 2022

  22. hebasto force-pushed on Nov 28, 2022
  23. hebasto renamed this:
    [POC] Enable AVX2 implementation of SHA256 for MSVC builds
    Enable HW-accelerated implementations of SHA256 for MSVC builds
    on Nov 28, 2022
  24. hebasto force-pushed on Nov 28, 2022
  25. hebasto force-pushed on Nov 28, 2022
  26. DrahtBot removed the label Needs rebase on Nov 28, 2022
  27. hebasto force-pushed on Nov 28, 2022
  28. hebasto marked this as ready for review on Nov 28, 2022
  29. hebasto commented at 4:49 pm on November 28, 2022: member

    Would, say, enabling SHANI (which is better on hw that supports it) give a lot of extra work?

    Implemented.

    Converted to a “draft” until resolving discussion in #24774.

    No longer based on #24774.

  30. in src/random.cpp:71 in b27defef42 outdated
    66@@ -67,7 +67,7 @@ static inline int64_t GetPerformanceCounter() noexcept
    67 #endif
    68 }
    69 
    70-#ifdef HAVE_GETCPUID
    71+#if defined(HAVE_GETCPUID) && !defined(_MSC_VER)
    


    fanquake commented at 4:54 pm on November 28, 2022:
    If GetCPUID is now implemented for MSVC, why is this skipped?

    hebasto commented at 5:07 pm on November 28, 2022:

    If GetCPUID is now implemented for MSVC

    It is implemented in the first commit.

    why is this skipped?

    MSVC does not support inline assembly on x64: https://github.com/bitcoin/bitcoin/blob/593979494042e77b5271d42b8f6e7b514e0007d8/src/random.cpp#L129-L136


    fanquake commented at 5:10 pm on November 28, 2022:

    It is implemented in the first commit.

    I’m aware. That’s why I said “is now implemented”.


    theuni commented at 11:27 pm on May 4, 2023:

    I’m so confused.

    Was GetCPUID not compiling already? If not, what was guarding it?

    And I don’t understand the comment about MSVC and assembly. It seems to deal with the randomness example ok?


    hebasto commented at 8:49 am on May 5, 2023:
    1. This #if ... #else ... #endif block guards two static functions–GetRdRand and GetRdSeed–which are not implemented for Windows. Not adding && !defined(_MSC_VER) ends with an error:
    0C:\Users\hebasto\bitcoin\src\random.cpp(137,1): fatal  error C1189: #error:  "RdRand is only supported on x86 and x86_64" [C:\Users\hebasto\bitcoin\build_msvc\li
    1bbitcoin_util\libbitcoin_util.vcxproj]
    
    1. And I don’t understand the comment about MSVC and assembly.

    Windows implementation for the GetRdRand and GetRdSeed was not added because “MSVC does not support inline assembly on x64”.

    1. Was GetCPUID not compiling already?

    It was. On Windows, it is used in the SHA256AutoDetect and AddCPUID functions.


    theuni commented at 3:19 pm on May 5, 2023:
    Hmm, ok, we are talking around each-other. I’ll ask some detailed questions in code and see if we can clear it up.

    sipa commented at 12:00 pm on April 8, 2024:
    @theuni I think what is going on is that before this PR, HAVE_GETCPUID wasn’t defined for MSVC builds, but with this PR, there now is such a feature. However, this PR still doesn’t enable rdrand/rdseed support for Windows (it would also need intrinsics versions), so this define conditional is needed to exclude support for that despite having getcpuid.
  31. in src/compat/attribute.h:1 in b27defef42 outdated
    0@@ -0,0 +1,16 @@
    1+// Copyright (c) 2022 The Bitcoin Core developers
    


    fanquake commented at 4:54 pm on November 28, 2022:
    We already have src/attributes.h. Any reason this is another file?

    hebasto commented at 5:20 pm on November 28, 2022:
    Thanks!. Updated.
  32. in src/compat/attribute.h:9 in b27defef42 outdated
    0@@ -0,0 +1,16 @@
    1+// Copyright (c) 2022 The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#ifndef BITCOIN_COMPAT_ATTRIBUTE_H
    6+#define BITCOIN_COMPAT_ATTRIBUTE_H
    7+
    8+#if defined(__GNUC__)
    9+#define FORCED_INLINE inline __attribute__((always_inline))
    


    fanquake commented at 4:54 pm on November 28, 2022:
    If we are going to add this, ALWAYS_INLINE seems like a better name.

    hebasto commented at 5:20 pm on November 28, 2022:
    Thanks!. Updated.
  33. hebasto force-pushed on Nov 28, 2022
  34. hebasto commented at 5:19 pm on November 28, 2022: member

    Updated b27defef4218b249a2ec5ec0dd033f62add8daf1 -> 674d3e9a405bebd6388632dbc23b733c65c7887c (pr24773.03 -> pr24773.04, diff):

  35. hebasto commented at 9:37 am on November 29, 2022: member
    Friendly ping @sipsorcery :)
  36. achow101 requested review from sipa on Apr 25, 2023
  37. achow101 requested review from theuni on Apr 25, 2023
  38. theuni changes_requested
  39. theuni commented at 2:54 pm on May 4, 2023: member

    Concept ACK. Coming to this after kill/shill/merge, sorry for the late re-review.

    Would you mind splitting up the inline changes from the rest? Looks good to me, but as it introduces a general abstraction in the crypto code I think it’d be nice not to restrict review to win32 devs for that part.

    After that the changes here look pretty uncontroversial to me (modulo actually verifying the cpuid bitwise stuff).

  40. hebasto commented at 3:06 pm on May 4, 2023: member

    Would you mind splitting up the inline changes from the rest?

    Just to clarify your point, you mean to put all s/inline __attribute__((always_inline))/ALWAYS_INLINE/ into the “Introduce platform-agnostic ALWAYS_INLINE macro” commit?

  41. theuni commented at 3:11 pm on May 4, 2023: member
    Sorry, yes, I meant to factor out those changes into a separate PR.
  42. hebasto commented at 5:50 pm on May 4, 2023: member

    Sorry, yes, I meant to factor out those changes into a separate PR.

    Thanks! Done in #27575.

  43. hebasto force-pushed on May 4, 2023
  44. hebasto commented at 5:57 pm on May 4, 2023: member

    Sorry, yes, I meant to factor out those changes into a separate PR.

    Thanks! Done in #27575.

    Rebased on top of the #27575 and drafted. Please review #27575 first.

  45. hebasto marked this as a draft on May 4, 2023
  46. in src/compat/cpuid.h:28 in 915bdbdf8c outdated
    22@@ -23,4 +23,22 @@ void static inline GetCPUID(uint32_t leaf, uint32_t subleaf, uint32_t& a, uint32
    23 }
    24 
    25 #endif // defined(__x86_64__) || defined(__amd64__) || defined(__i386__)
    26+
    27+#if defined(_MSC_VER) && defined(_M_X64)
    28+#define HAVE_GETCPUID
    


    theuni commented at 3:19 pm on May 5, 2023:
    How does this not get defined twice? Does MSVC not define __x86_64__ or __amd64__ or __i386__ ?

    hebasto commented at 6:36 pm on May 6, 2023:

    Does MSVC not define __x86_64__ or __amd64__ or __i386__ ?

    No, it does not. Easy to verify with the following diff:

    0--- a/src/compat/cpuid.h
    1+++ b/src/compat/cpuid.h
    2@@ -6,6 +6,7 @@
    3 #define BITCOIN_COMPAT_CPUID_H
    4
    5 #if defined(__x86_64__) || defined(__amd64__) || defined(__i386__)
    6+#error
    7 #define HAVE_GETCPUID
    8
    9 #include <cpuid.h>
    
  47. in src/crypto/sha256.cpp:587 in 915bdbdf8c outdated
    581+
    582+#include <intrin.h>
    583+
    584+bool AVXEnabled()
    585+{
    586+    auto a = _xgetbv(_XCR_XFEATURE_ENABLED_MASK);
    


    theuni commented at 3:20 pm on May 5, 2023:
    Why does the existing asm function not work? What is the specific error? Does it not like the inline asm? Or does it just have a problem with the opcode?

    hebasto commented at 6:48 pm on May 6, 2023:

    Why does the existing asm function not work?

    From https://learn.microsoft.com/en-us/cpp/assembler/inline/inline-assembler:

    Inline assembly is not supported on the ARM and x64 processors.


    What is the specific error? Does it not like the inline asm? Or does it just have a problem with the opcode?

    With the diff as follows:

     0--- a/src/crypto/sha256.cpp
     1+++ b/src/crypto/sha256.cpp
     2@@ -567,7 +567,6 @@ bool SelfTest() {
     3     return true;
     4 }
     5
     6-#if defined(USE_ASM) && (defined(__x86_64__) || defined(__amd64__) || defined(__i386__))
     7 /** Check whether the OS has enabled AVX registers. */
     8 bool AVXEnabled()
     9 {
    10@@ -575,7 +574,6 @@ bool AVXEnabled()
    11     __asm__("xgetbv" : "=a"(a), "=d"(d) : "c"(0));
    12     return (a & 6) == 6;
    13 }
    14-#endif
    15 } // namespace
    

    I have the following error:

    0C:\bitcoin\src\crypto\sha256.cpp(574,5): error C3861: '__asm__': identifier not found [C:\bitcoin\build_msvc\libbitcoin_crypto\libbitcoin_crypto.vcxproj]
    

    sipa commented at 11:26 am on May 16, 2023:
    Yeah, MSVC does not support inline asm at all. You have to use intrinsics.
  48. hebasto commented at 6:30 pm on May 6, 2023: member

    I had a chance to make some benchmarks on a Windows laptop equipped with Intel Core i9-12950HX with the sha_ni flag set.

    No significant difference has been observed. Going to close this PR. But that is curios.

    UPD. Well, the reason is that https://github.com/bitcoin/bitcoin/blob/322ec63b01499c1ec52d3912ee382ebd59f2366b/src/bench/bench_bitcoin.cpp#L64

    returns “standard” instead of expected " x86_shani(1way,2way)".

  49. fanquake referenced this in commit b13830eff6 on May 9, 2023
  50. DrahtBot added the label Needs rebase on May 9, 2023
  51. hebasto force-pushed on May 9, 2023
  52. hebasto marked this as ready for review on May 9, 2023
  53. hebasto commented at 5:29 pm on May 9, 2023: member
    Rebased on top of the merged #27575 and undrafted.
  54. DrahtBot removed the label Needs rebase on May 9, 2023
  55. hebasto marked this as a draft on May 10, 2023
  56. hebasto commented at 10:28 am on May 10, 2023: member

    Drafted again as it seems worth going #27598 and #27615 first.

    UPD. Well, the reason is that

    https://github.com/bitcoin/bitcoin/blob/322ec63b01499c1ec52d3912ee382ebd59f2366b/src/bench/bench_bitcoin.cpp#L64

    returns “standard” instead of expected " x86_shani(1way,2way)".

    Fixed in #27615.

  57. DrahtBot added the label Needs rebase on May 18, 2023
  58. hebasto force-pushed on May 18, 2023
  59. DrahtBot removed the label Needs rebase on May 18, 2023
  60. DrahtBot added the label CI failed on May 31, 2023
  61. DrahtBot removed the label CI failed on May 31, 2023
  62. DrahtBot added the label CI failed on Jun 8, 2023
  63. DrahtBot removed the label CI failed on Jun 9, 2023
  64. maflcko commented at 5:35 pm on September 20, 2023: member
    Rebase for fresh CI, if still relevant?
  65. hebasto force-pushed on Sep 20, 2023
  66. hebasto commented at 8:19 pm on September 20, 2023: member

    Rebase for fresh CI, if still relevant?

    Thanks! Done.

  67. DrahtBot added the label CI failed on Sep 20, 2023
  68. DrahtBot removed the label CI failed on Sep 21, 2023
  69. hebasto force-pushed on Sep 21, 2023
  70. hebasto force-pushed on Sep 21, 2023
  71. hebasto force-pushed on Sep 21, 2023
  72. hebasto force-pushed on Sep 21, 2023
  73. hebasto force-pushed on Sep 21, 2023
  74. hebasto force-pushed on Sep 21, 2023
  75. fanquake referenced this in commit 058488276f on Oct 4, 2023
  76. hebasto marked this as ready for review on Oct 4, 2023
  77. hebasto commented at 3:10 pm on October 4, 2023: member
    Rebased on top of the merged #27598 and undrafted.
  78. hebasto force-pushed on Oct 4, 2023
  79. hebasto commented at 3:23 pm on October 4, 2023: member

    @sipsorcery

    Mind having another look at this PR?

  80. sipsorcery commented at 8:36 pm on October 4, 2023: member

    tACK e94ae810a55da16fce6040714677b9d50781bebd.

    Relevant bench_bitcoin.exe results from my Intel i9 CPU machine. Looks up 5 to 10x faster than my i7 results above.

     0|             ns/byte |              byte/s |    err% |     total | benchmark
     1|--------------------:|--------------------:|--------:|----------:|:----------
     2|                1.37 |      729,927,007.30 |    0.3% |      0.02 | `SHA1`
     3|                2.20 |      453,902,805.03 |    3.0% |      0.01 | `SHA256D64_1024_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
     4|                3.32 |      301,453,541.86 |    0.3% |      0.01 | `SHA256D64_1024_SHANI using the 'standard,sse41(4way)' SHA256 implementation`
     5|                3.32 |      300,830,846.91 |    0.3% |      0.01 | `SHA256D64_1024_SSE4 using the 'standard,sse41(4way)' SHA256 implementation`
     6|                9.18 |      108,890,919.66 |    3.2% |      0.01 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`
     7|                7.42 |      134,746,633.08 |    0.4% |      0.01 | `SHA256_32b_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
     8|                7.41 |      134,917,592.95 |    0.3% |      0.01 | `SHA256_32b_SHANI using the 'standard,sse41(4way)' SHA256 implementation`
     9|                8.03 |      124,525,069.29 |    6.0% |      0.01 | :wavy_dash: `SHA256_32b_SSE4 using the 'standard,sse41(4way)' SHA256 implementation` (Unstable with ~3,798.2 iters. Increase `minEpochIterations` to e.g. 37982)
    10|                7.46 |      133,974,122.00 |    0.7% |      0.01 | `SHA256_32b_STANDARD using the 'standard' SHA256 implementation`
    11|                3.55 |      281,825,099.34 |    3.2% |      0.04 | `SHA256_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
    12|                3.45 |      289,939,112.79 |    2.4% |      0.04 | `SHA256_SHANI using the 'standard,sse41(4way)' SHA256 implementation`
    13|                3.47 |      287,952,084.77 |    1.1% |      0.04 | `SHA256_SSE4 using the 'standard,sse41(4way)' SHA256 implementation`
    14|                3.44 |      290,875,243.61 |    0.6% |      0.04 | `SHA256_STANDARD using the 'standard' SHA256 implementation`
    15|                3.39 |      294,828,704.52 |    1.9% |      0.04 | `SHA3_256_1M`
    16|                2.21 |      453,185,896.85 |    2.0% |      0.02 | `SHA512`
    
  81. hebasto commented at 8:40 pm on October 5, 2023: member

    I had an opportunity to benchmark this branch on a machine with Intel Core i9-12950HX, which has the sha_ni flag set. Results are posted in the PR description.

    The performance gain is huge…

    cc @sipa @theuni

  82. Frank-GER referenced this in commit 004c46e90e on Oct 13, 2023
  83. fanquake referenced this in commit 41e378a0a1 on Dec 11, 2023
  84. in build_msvc/bitcoin_config.h.in:33 in e94ae810a5 outdated
    28@@ -29,6 +29,9 @@
    29 /* Copyright year */
    30 #define COPYRIGHT_YEAR $
    31 
    32+/* Define this symbol to build in assembly routines */
    33+#define USE_ASM
    


    m3dwards commented at 1:42 pm on December 20, 2023:

    NIT: From what I understand we are enabling the special CPU instructions using intrinsics and not inline assembly. This makes the comment a little misleading and makes the symbol unfortunately named.

    Perhaps the symbol could be?

    0/* Define this symbol to enable CPU SHA Extensions */
    1#define USE_SHA_EXTENTIONS
    

    or?

    0/* Define this symbol to enable CPU Extensions */
    1#define USE_INTRINSICS
    
  85. m3dwards commented at 1:52 pm on December 20, 2023: contributor

    Tested e94ae81 on Windows 11 with VS2022. CPU: i7-4870HQ and saw big speed improvement.

    On master:

    0$ ./src/bench_bitcoin.exe -filter=SHA256D64.*
    1
    2|             ns/byte |              byte/s |    err% |     total | benchmark
    3|--------------------:|--------------------:|--------:|----------:|:----------
    4|               11.57 |       86,459,102.90 |    0.0% |      0.01 | `SHA256D64_1024_AVX2 using the 'standard' SHA256 implementation`
    5|               11.89 |       84,074,406.67 |    0.0% |      0.01 | `SHA256D64_1024_SHANI using the 'standard' SHA256 implementation`
    6|               11.97 |       83,559,862.30 |    0.6% |      0.01 | `SHA256D64_1024_SSE4 using the 'standard' SHA256 implementation`
    7|               11.90 |       84,031,286.06 |    0.0% |      0.01 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`
    

    Using #24773

    0$ ./src/bench_bitcoin.exe -filter=SHA256D64.*
    1
    2|             ns/byte |              byte/s |    err% |     total | benchmark
    3|--------------------:|--------------------:|--------:|----------:|:----------
    4|                2.77 |      361,557,983.01 |    0.1% |      0.01 | `SHA256D64_1024_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation`
    5|                4.33 |      230,699,639.18 |    0.1% |      0.01 | `SHA256D64_1024_SHANI using the 'standard,sse41(4way)' SHA256 implementation`
    6|                4.57 |      219,037,433.16 |    2.1% |      0.01 | `SHA256D64_1024_SSE4 using the 'standard,sse41(4way)' SHA256 implementation`
    7|               11.88 |       84,171,590.03 |    0.2% |      0.01 | `SHA256D64_1024_STANDARD using the 'standard' SHA256 implementation`
    

    One tiny nit on naming, not important to hold it up.

  86. DrahtBot requested review from theuni on Dec 20, 2023
  87. DrahtBot requested review from laanwj on Dec 20, 2023
  88. PastaPastaPasta referenced this in commit a3da2f13ee on Jan 2, 2024
  89. PastaPastaPasta referenced this in commit feb8aaa25b on Jan 2, 2024
  90. PastaPastaPasta referenced this in commit a641d17d0d on Jan 2, 2024
  91. PastaPastaPasta referenced this in commit 2ad5d26c3b on Jan 2, 2024
  92. DrahtBot added the label CI failed on Jan 14, 2024
  93. DrahtBot added the label Needs rebase on Jan 27, 2024
  94. Add MSVC implementation of GetCPUID() cf96097de3
  95. Add MSVC implementation of AVXEnabled() 1dc8641dad
  96. msvc: Enable AVX2 implementation of SHA256 2df3619a97
  97. msvc: Enable SSE4.1 implementation of SHA256 422d813081
  98. msvc: Enable x86 SHA-NI implementation of SHA256 6e2513f03e
  99. hebasto force-pushed on Feb 12, 2024
  100. hebasto commented at 2:22 pm on February 12, 2024: member
    Rebased.
  101. DrahtBot removed the label Needs rebase on Feb 12, 2024
  102. DrahtBot removed the label CI failed on Feb 12, 2024
  103. achow101 commented at 3:54 pm on April 9, 2024: member
    Deferring to after cmake
  104. achow101 closed this on Apr 9, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-05 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me