SHA256 implementations based on Intel SHA Extensions #13386

pull sipa wants to merge 3 commits into bitcoin:master from sipa:201806_shani changing 7 files +464 −32
  1. sipa commented at 11:32 PM on June 3, 2018: member

    Based on #13191.

    This adds SHA256 implementations that use Intel's SHA Extension instructions (using intrinsics). This needs GCC 4.9 or Clang 3.4.

    In addition to #13191, two extra implementations are provided:

    • (a) A variable-length SHA256 implementation using SHA extensions.
    • (b) A 2-way 64-byte input double-SHA256 implementation using SHA extensions.

    Benchmarks for 9001-element Merkle tree root computation on an AMD Ryzen 1800X system:

    • Using generic C++ code (pre-#10821): 6.1ms
    • Using SSE4 (master, #10821): 4.6ms
    • Using 4-way SSE4 specialized for 64-byte inputs (#13191): 2.8ms
    • Using 8-way AVX2 specialized for 64-byte inputs (#13191): 2.1ms
    • Using 2-way SHA-NI specialized for 64-byte inputs (this PR): 0.56ms

    Benchmarks for 32-byte SHA256 on the same system:

    • Using SSE4 (master, #10821): 190ns
    • Using SHA-NI (this PR): 53ns

    Benchmarks for 1000000-byte SHA256 on the same system:

    • Using SSE4 (master, #10821): 2.5ms
    • Using SHA-NI (this PR): 0.51ms
  2. in src/crypto/sha256.cpp:541 in 1773155bb2 outdated
     539 | +        TransformD64 = TransformD64Wrapper<sha256_sse4::Transform>;
     540 | +        ret = "sse4";
     541 | +    }
     542 | +
     543 | +#if defined(ENABLE_SSE41) && !defined(BUILD_BITCOIN_INTERNAL)
     544 | +    if (have_sse4) {
    


    kallewoof commented at 12:19 AM on June 4, 2018:

    What about

            ret = "sse4";
    #if defined(ENABLE_SSE41) && !defined(BUILD_BITCOIN_INTERNAL)
            TransformD64_4way = sha256d64_sse41::Transform_4way;
            ret += ",sse41";
    #endif
        }
    

    ?


    sipa commented at 12:35 AM on June 4, 2018:

    Done.

  3. sipa force-pushed on Jun 4, 2018
  4. fanquake added the label Validation on Jun 4, 2018
  5. ghost commented at 9:08 AM on June 4, 2018: none

    @sipa

    Great works as usual!

    I just came cross this thread

    https://github.com/armfazh/flo-shani-aesni/blob/master/README.md

    I hope you will have time to look at what they did!

  6. DrahtBot commented at 12:41 PM on June 4, 2018: member

    Needs rebase

  7. promag commented at 12:45 PM on June 4, 2018: member

    Concept ACK, nice numbers.

  8. sipa force-pushed on Jun 4, 2018
  9. sipa commented at 4:54 PM on June 4, 2018: member

    Rebased. @Kick1986 Nice, I'll have a look.

  10. DrahtBot commented at 12:49 AM on June 5, 2018: member

    <!--e57a25ab6845829454e8d69fc972939a-->Note to reviewers: This pull request conflicts with the following ones:

    • #13442 (Convert the 1-way SSE4 SHA256 code from asm to intrinsics by sipa)
    • #13203 (Add POWER8 ASM for 4-way SHA256 by TheBlueMatt)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  11. Empact commented at 9:34 AM on June 5, 2018: member

    For clang version, looks like they were added in 3.4, but never noted in the release notes. Source: went from commit date[1] to release date[2] to file in release version[3]. Did not check for every intrinsic used. [1] https://github.com/llvm-mirror/clang/commit/b83f5a77335ca8a5b41ba4e17aa8d4bb6248c1e4#diff-c4f203a0f202bf56c364a657b0aecae9 [2] http://releases.llvm.org/ [3] https://github.com/llvm-mirror/clang/blob/release_34/lib/Headers/shaintrin.h

  12. in src/crypto/sha256_shani.cpp:87 in bb80ab2596 outdated
      86 | +{
      87 | +    __m128i m0, m1, m2, m3, s0, s1, so0, so1;
      88 | +
      89 | +    /* Load state */
      90 | +    s0 = _mm_loadu_si128((const __m128i*)s);
      91 | +    s1 = _mm_loadu_si128((const __m128i*)(s + 4));
    


    theuni commented at 8:33 PM on June 5, 2018:

    I think these could be _mm_load_si128 if s[] was alignas(16).

  13. in src/Makefile.am:35 in bb80ab2596 outdated
      31 | @@ -32,6 +32,7 @@ LIBBITCOIN_UTIL=libbitcoin_util.a
      32 |  LIBBITCOIN_CRYPTO=crypto/libbitcoin_crypto.a
      33 |  LIBBITCOIN_CRYPTO_SSE41=crypto/libbitcoin_crypto_sse41.a
      34 |  LIBBITCOIN_CRYPTO_AVX2=crypto/libbitcoin_crypto_avx2.a
      35 | +LIBBITCOIN_CRYPTO_SHANI=crypto/libbitcoin_crypto_shani.a
    


    theuni commented at 8:49 PM on June 5, 2018:

    These are starting to get out of hand. I think we should take @TheBlueMatt's suggestion and treat LIBBITCOIN_CRYPTO as a collection of these helpers. That way we can just add $(LIBBITCOIN_CRYPTO) everywhere, and that will pull in the cpu-specific libs as well. Something like:

    ...
    LIBBITCOIN_CRYPTO=crypto/libbitcoin_crypto.a
    LIBBITCOIN_CRYPTO_AVX2=crypto/libbitcoin_crypto_avx2.a
    LIBBITCOIN_CRYPTO_SHANI=crypto/libbitcoin_crypto_shani.a
    LIBBITCOIN_CRYPTO+=$(LIBBITCOIN_CRYPTO_AVX2)
    LIBBITCOIN_CRYPTO+=$(LIBBITCOIN_CRYPTO_SHANI)
    ...
    

    Then the cpu-specific ones can be dropped from the LDADDs all over the place. I'm happy to do up a patch on top of this if you'd like.


    sipa commented at 12:43 AM on June 6, 2018:

    @theuni Actually, feel like PRing that as a separate PR, before this one goes in? Then @TheBlueMatt and I can both rebase ours on top of yours and not conflict with each other.

  14. theuni commented at 9:03 PM on June 5, 2018: member

    concept ACK.

    I noticed while testing #13400 that I added a bug, but bitcoind started up fine anyway, due to missing sanity checks for the double/4way/8way hashes. Mind adding those?

  15. in src/crypto/sha256_shani.cpp:12 in bb80ab2596 outdated
       8 | +
       9 | +#ifdef ENABLE_SHANI
      10 | +
      11 | +#include <stdint.h>
      12 | +#if defined(_MSC_VER)
      13 | +#include <immintrin.h>
    


    DesWurstes commented at 4:12 AM on June 7, 2018:

    I believe including immintrin.h is enough for both platforms. x86intrin includes immintrin.h, and immintrin.h includes everything that is needed: https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/immintrin.h

    EDIT: Tested on Linux GCC and Clang, including immintrin.h is enough for all platforms.


    sipa commented at 4:59 PM on June 11, 2018:

    Thanks, fixed.

  16. laanwj referenced this in commit 70a03c635b on Jun 11, 2018
  17. DrahtBot added the label Needs rebase on Jun 11, 2018
  18. sipa force-pushed on Jun 11, 2018
  19. sipa commented at 4:52 PM on June 11, 2018: member

    Rebased.

  20. DrahtBot removed the label Needs rebase on Jun 11, 2018
  21. DrahtBot commented at 7:00 PM on June 12, 2018: member

    <!--cf906140f33d8803c4a75a2196329ecb-->Needs rebase

  22. DrahtBot added the label Needs rebase on Jun 12, 2018
  23. sipa force-pushed on Jun 12, 2018
  24. sipa commented at 7:23 PM on June 12, 2018: member

    Rebased.

  25. DrahtBot removed the label Needs rebase on Jun 12, 2018
  26. in src/crypto/sha256_shani.cpp:14 in b318acd6ee outdated
       9 | +#ifdef ENABLE_SHANI
      10 | +
      11 | +#include <stdint.h>
      12 | +#include <immintrin.h>
      13 | +
      14 | +#include "crypto/common.h"
    


    theuni commented at 8:48 PM on June 13, 2018:

    Linter is yelling about include style here.

  27. sipa force-pushed on Jun 14, 2018
  28. DrahtBot added the label Needs rebase on Jun 24, 2018
  29. DrahtBot commented at 2:19 PM on June 24, 2018: member

    <!--cf906140f33d8803c4a75a2196329ecb-->Needs rebase

  30. [Refactor] CPU feature detection logic for SHA256 268400d318
  31. sipa force-pushed on Jun 24, 2018
  32. sipa commented at 5:56 PM on June 24, 2018: member

    Rebased after #13471 merge.

    Also split up the CPU feature detection logic change into its own commit.

    I noticed while testing #13400 that I added a bug, but bitcoind started up fine anyway, due to missing sanity checks for the double/4way/8way hashes. Mind adding those?

    That was addressed in #13438.

  33. DrahtBot removed the label Needs rebase on Jun 24, 2018
  34. theuni commented at 7:51 PM on June 25, 2018: member
    
    utACK otherwise.
    
  35. Add SHA256 implementation using using Intel SHA intrinsics 4c935e2eee
  36. Use immintrin.h everywhere for intrinsics 66b2cf1ccf
  37. sipa force-pushed on Jun 26, 2018
  38. theuni commented at 6:56 PM on June 26, 2018: member

    Thanks! utACK 66b2cf1ccfad545a8ec3f2a854e23f647322bf30.

  39. jb55 commented at 3:32 AM on June 27, 2018: member

    Not sure what compelled me to do this, and it's probably overkill... but...

    Tested ACK 66b2cf1ccfad545a8ec3f2a854e23f647322bf30 with 100k rounds of quickcheck at various optimization levels, but only with the non-two way transform for now.

  40. DesWurstes commented at 11:54 AM on June 27, 2018: contributor

    Just a nit from older pull requests: Now that it has a custom CPUID function https://github.com/bitcoin/bitcoin/blob/d96bdd78307bc5469cb8a4d5ca0e6cbc21fe4073/src/crypto/sha256.cpp#L534-L538 including <cpuid.h> is not necessary now:

    https://github.com/bitcoin/bitcoin/blob/d96bdd78307bc5469cb8a4d5ca0e6cbc21fe4073/src/crypto/sha256.cpp#L12-L16

    Thank you for your awesome contributions!

  41. gmaxwell commented at 8:06 AM on July 7, 2018: contributor

    ACK

  42. laanwj commented at 7:10 PM on July 9, 2018: member

    utACK 66b2cf1ccfad545a8ec3f2a854e23f647322bf30 tested that build passes on FreeBSD+OpenBSD

  43. laanwj merged this on Jul 9, 2018
  44. laanwj closed this on Jul 9, 2018

  45. laanwj referenced this in commit 3a3eabef40 on Jul 9, 2018
  46. codablock referenced this in commit cd16312712 on Oct 1, 2019
  47. codablock referenced this in commit dda04a300e on Oct 1, 2019
  48. codablock referenced this in commit adfd2e1a83 on Oct 1, 2019
  49. codablock referenced this in commit 1b2252c28c on Oct 1, 2019
  50. barrystyle referenced this in commit 84d549d17b on Jan 22, 2020
  51. barrystyle referenced this in commit 9be0431b4c on Jan 22, 2020
  52. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-14 18:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me