Avoid non-trivial global constants in SHA-NI code #18553

pull sipa wants to merge 1 commits into bitcoin:master from sipa:202004_no_global_init_shani changing 1 files +17 −19
  1. sipa commented at 5:28 AM on April 7, 2020: member

    This is a potential solution for #18456.

    It seems that the compiler cannot turn _mm_set_epi64x(<constant>,<constnant>) into a constant itself, and thus emits a global initializer for the MASK, INIT0, and INIT1 global constants in the sha-ni SHA256 implementation.

    Change this by turning them into dumb byte arrays, loading them into an SSE variable whenever needed.

    Tested on a SHA-NI capable machine. I do not observe any obvious performance impact (but this is hard to measure, it's already very fast...).

  2. DrahtBot added the label Utils/log/libs on Apr 7, 2020
  3. in src/crypto/sha256_shani.cpp:16 in e0e42b73d0 outdated
      16 |  namespace {
      17 |  
      18 | -const __m128i MASK = _mm_set_epi64x(0x0c0d0e0f08090a0bULL, 0x0405060700010203ULL);
      19 | -const __m128i INIT0 = _mm_set_epi64x(0x6a09e667bb67ae85ull, 0x510e527f9b05688cull);
      20 | -const __m128i INIT1 = _mm_set_epi64x(0x3c6ef372a54ff53aull, 0x1f83d9ab5be0cd19ull);
      21 | +const uint8_t MASK[16] alignas(__m128i) = {0x03, 0x02, 0x01, 0x00, 0x07, 0x06, 0x05, 0x04, 0x0b, 0x0a, 0x09, 0x08, 0x0f, 0x0e, 0x0d, 0x0c};
    


    elichai commented at 8:28 AM on April 7, 2020:

    This fails to compile on clang. Try moving the alignas to the beginning of the expression: alignas(__m128i) const uint8_t MASK[16] = {0x03, 0x02, 0x01, 0x00, 0x07, 0x06, 0x05, 0x04, 0x0b, 0x0a, 0x09, 0x08, 0x0f, 0x0e, 0x0d, 0x0c};


    sipa commented at 6:50 PM on April 7, 2020:

    Done.

  4. elichai commented at 8:35 AM on April 7, 2020: contributor

    Looking at the source of this function: https://github.com/gcc-mirror/gcc/blob/4a3895903c29ed85da6fcb886f31ff23d4c6e935/gcc/config/i386/emmintrin.h#L694 It seems to be doing just a read operation from that location, so I'm not too worried about the performance of this

    And looking at godbolt you seem to be right, it initializes it in the global initializers https://godbolt.org/z/M2b6cg

  5. MarcoFalke added the label Waiting for author on Apr 7, 2020
  6. MarcoFalke commented at 10:10 AM on April 7, 2020: member

    Alternative suggested on IRC, which might compile on clang:

    [13:29] <aj> +inline __m128i MASK() { return _mm_set_epi64x(0x0c0d0e0f08090a0bULL, 0x0405060700010203ULL); }
    
  7. DrahtBot commented at 11:09 AM on April 7, 2020: member

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #13789 (crypto/sha256: Use pragmas to enforce necessary intrinsics for GCC and Clang by luke-jr)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  8. MarcoFalke removed the label Waiting for author on Apr 7, 2020
  9. sipa force-pushed on Apr 7, 2020
  10. sipa commented at 6:50 PM on April 7, 2020: member

    @MarcoFalke Going to try #18553 (review) first.

  11. Avoid non-trivial global constants in SHA-NI code 8508473094
  12. sipa force-pushed on Apr 7, 2020
  13. elichai commented at 11:28 AM on April 8, 2020: contributor

    Looks Good. Can someone with SHA-NI run this under valgrind and sanitizers? I'm a bit afraid this violates some weird C++ rule (I find manually modifying alignments quite scary)

  14. MarcoFalke added this to the milestone 0.20.0 on Apr 8, 2020
  15. laanwj commented at 2:33 PM on April 8, 2020: member

    Code review ACK 850847309458f43fc7ce6c13fa08c86e1cae042a

  16. laanwj commented at 12:04 PM on April 9, 2020: member

    I've diffed the assembly code generated by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0. The only difference with this patch seems to be that the following initialization code is removed:

    --- ./src/crypto/crypto_libbitcoin_crypto_shani_a-sha256_shani.o.old_d  2020-04-09 14:01:37.073839875 +0200
    +++ ./src/crypto/crypto_libbitcoin_crypto_shani_a-sha256_shani.o.new_d  2020-04-09 14:01:43.009818616 +0200
    @@ -1,5 +1,5 @@
     
    -./src/crypto/crypto_libbitcoin_crypto_shani_a-sha256_shani.o.old:     file format elf64-x86-64
    +./src/crypto/crypto_libbitcoin_crypto_shani_a-sha256_shani.o.new:     file format elf64-x86-64
     
     
     Disassembly of section .text:
    @@ -1163,28 +1163,3 @@
         1502:      48 83 c4 68             add    $0x68,%rsp
         1506:      c3                      retq   
         1507:      e8 00 00 00 00          callq  150c <_ZN15sha256d64_shani14Transform_2wayEPhPKh+0x115c>
    -
    -Disassembly of section .text.startup:
    -
    -0000000000000000 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm>:
    -   0:  48 83 ec 18             sub    $0x18,%rsp
    -   4:  66 0f 6f 05 00 00 00    movdqa 0x0(%rip),%xmm0        # c <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0xc>
    -   b:  00 
    -   c:  64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
    -  13:  00 00 
    -  15:  48 89 44 24 08          mov    %rax,0x8(%rsp)
    -  1a:  31 c0                   xor    %eax,%eax
    -  1c:  0f 29 05 00 00 00 00    movaps %xmm0,0x0(%rip)        # 23 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x23>
    -  23:  48 8b 44 24 08          mov    0x8(%rsp),%rax
    -  28:  64 48 33 04 25 28 00    xor    %fs:0x28,%rax
    -  2f:  00 00 
    -  31:  66 0f 6f 05 00 00 00    movdqa 0x0(%rip),%xmm0        # 39 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x39>
    -  38:  00 
    -  39:  0f 29 05 00 00 00 00    movaps %xmm0,0x0(%rip)        # 40 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x40>
    -  40:  66 0f 6f 05 00 00 00    movdqa 0x0(%rip),%xmm0        # 48 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x48>
    -  47:  00 
    -  48:  0f 29 05 00 00 00 00    movaps %xmm0,0x0(%rip)        # 4f <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x4f>
    -  4f:  75 05                   jne    56 <_GLOBAL__sub_I__ZN12sha256_shani9TransformEPjPKhm+0x56>
    -  51:  48 83 c4 18             add    $0x18,%rsp
    -  55:  c3                      retq   
    -  56:  e8 00 00 00 00          callq  5b <.LC5+0xb>
    

    This means that there is no extra overhead, at least with this compiler.

  17. MarcoFalke commented at 12:54 PM on April 9, 2020: member

    @rebroad This should fix your issue #18456

  18. theuni commented at 6:41 PM on April 9, 2020: member

    LGTM. @MarcoFalke does clang have problems with the fix here? If so, @ajtowns's solution could be tweaked to be even simpler:

    diff --git a/src/crypto/sha256_shani.cpp b/src/crypto/sha256_shani.cpp
    index 92f67710fb..b139bceb37 100644
    --- a/src/crypto/sha256_shani.cpp
    +++ b/src/crypto/sha256_shani.cpp
    @@ -15,9 +15,9 @@
    
     namespace {
    
    -const __m128i MASK = _mm_set_epi64x(0x0c0d0e0f08090a0bULL, 0x0405060700010203ULL);
    -const __m128i INIT0 = _mm_set_epi64x(0x6a09e667bb67ae85ull, 0x510e527f9b05688cull);
    -const __m128i INIT1 = _mm_set_epi64x(0x3c6ef372a54ff53aull, 0x1f83d9ab5be0cd19ull);
    +#define MASK _mm_set_epi64x(0x0c0d0e0f08090a0bULL, 0x0405060700010203ULL)
    +#define INIT0 _mm_set_epi64x(0x6a09e667bb67ae85ull, 0x510e527f9b05688cull)
    +#define INIT1  _mm_set_epi64x(0x3c6ef372a54ff53aull, 0x1f83d9ab5be0cd19ull)
    
     void inline  __attribute__((always_inline)) QuadRound(__m128i& state0, __m128i& state1, uint64_t k1, uint64_t k0)
     {
    

    I assume the compiler would be smart enough to avoid actually doing the load each time.

  19. MarcoFalke commented at 6:43 PM on April 9, 2020: member

    Clang on travis is green now

  20. theuni commented at 7:42 PM on April 9, 2020: member

    Sidenote: we could potentially add a check for illegal instructions in the .text.startup section in one of our python binary checking tools. Though we'd have to create a per-arch blacklist/whitelist to define "illegal".

  21. laanwj commented at 8:20 PM on April 9, 2020: member

    Sidenote: we could potentially add a check for illegal instructions in the .text.startup section in one of our python binary checking tools. Though we'd have to create a per-arch blacklist/whitelist to define "illegal".

    My idea was to forbid .text.startup sections completely in all 'special' compilation units, e.g. those compiled with non-default instruction sets. I think that's easier to implement a check for than instruction white/blacklists.

    (in any case, not for this PR)

  22. theuni commented at 8:33 PM on April 9, 2020: member

    @laanwj Aha, even simpler.

    Edit: And yes, I certainly meant that this could be done as a follow-up, not for this PR.

  23. fanquake added the label Needs backport (0.20) on Apr 12, 2020
  24. elichai commented at 1:09 PM on April 12, 2020: contributor

    Sounds like it solved the problem: #18456 (comment)

    ACK 850847309458f43fc7ce6c13fa08c86e1cae042a

  25. laanwj merged this on Apr 22, 2020
  26. laanwj closed this on Apr 22, 2020

  27. sidhujag referenced this in commit a4fe4e7bb9 on Apr 23, 2020
  28. fanquake referenced this in commit 842b13a5f4 on Apr 23, 2020
  29. fanquake removed the label Needs backport (0.20) on Apr 23, 2020
  30. laanwj referenced this in commit fb5b098598 on May 11, 2020
  31. deadalnix referenced this in commit fa124e895b on May 28, 2020
  32. ftrader referenced this in commit 3fb97c444d on Aug 17, 2020
  33. backpacker69 referenced this in commit c8657db884 on Mar 28, 2021
  34. codeofalltrades referenced this in commit 8736fc78be on May 27, 2021
  35. PastaPastaPasta referenced this in commit f66330da12 on Dec 22, 2021
  36. PastaPastaPasta referenced this in commit 34ad1fd1a8 on Dec 22, 2021
  37. PastaPastaPasta referenced this in commit d4279129c0 on Dec 22, 2021
  38. PastaPastaPasta referenced this in commit b330a1adbd on Dec 28, 2021
  39. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 18:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me