[CI test, dontmerge] PR #1000 with __(u)mul128() based abstractions in int128_struct #1130

pull sipa wants to merge 4 commits into bitcoin-core:master from sipa:pr1000_mul128 changing 16 files +751 −290
  1. sipa commented at 6:25 pm on July 22, 2022: contributor
  2. Simulated int128 type. 8cf46673a3
  3. int128: Tidy #includes of int128.h and int128_impl.h
    After this commit, int128.h and int128_impl.h are included as follows:
     - .c files which use int128 include int128_impl.h (after util.h)
     - .h files which use int128 include int128.h (after util.h)
    
    This list is exhaustive. util.h needs to included first because it sets
    up necessary #defines.
    62554ab655
  4. Switch to int128_struct implementation using (u)mul128 abstractions 2fe03cda23
  5. Use int128_struct on 64-bit MSVC 412f359f1a
  6. sipa force-pushed on Jul 22, 2022
  7. in src/util.h:247 in 412f359f1a
    243@@ -244,6 +244,9 @@ static SECP256K1_INLINE void secp256k1_int_cmov(int *r, const int *a, int flag)
    244 #elif defined(UINT128_MAX) || defined(__SIZEOF_INT128__)
    245 # define SECP256K1_WIDEMUL_INT128 1
    246 # define SECP256K1_INT128_NATIVE 1
    247+#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_ARM64))
    


    real-or-random commented at 5:25 pm on July 29, 2022:

    I know this is just a CI test but I think we want something like this after #1000. Then I’d suggest this as a heuristic:

    0#elif UINTPTR_MAX >= UINT64_MAX
    

    sipa commented at 5:27 pm on July 29, 2022:
    That doesn’t guarantee the existence of __mulh and __umulh?

    real-or-random commented at 5:35 pm on July 29, 2022:

    No, it doesn’t. But even without access to an efficient 64x64->128 mul, it’s still a good idea to use the int128_struct implementation on a 64bit machine (with the simulated 64x64->128 mul then) because then the 64-bit field/scalar code will be used.

    (I admit I don’t know if we’ll run into such a machine/compiler. It may be a good idea to output a preprocessor warning in this case, or actually remove the simulated 64x64->128 mul code because our tests will never exercise it.)


    sipa commented at 5:38 pm on July 29, 2022:
    Hmm, perhaps we should benchmark the 10x26 field code against the 5x52 + int128_struct code. I’m not convinced about which one is faster.

    real-or-random commented at 8:25 am on July 30, 2022:

    It’s faster on 64 bit platforms because all the other operations except muls will profit:

     0Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz (x86_64), gcc 12.1.0 -O2
     1
     2Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)    
     3
     4./configure --with-test-override-wide-multiply=int64
     5
     6ecdsa_verify                  ,    95.3       ,    96.2       ,   102.0    
     7ecdsa_sign                    ,    64.0       ,    64.1       ,    64.7  
     8
     9./configure  --with-test-override-wide-multiply=int128_struct
    10
    11ecdsa_verify                  ,    55.9       ,    56.4       ,    57.4    
    12ecdsa_sign                    ,    42.2       ,    42.4       ,    43.0    
    13
    14./configure 
    15
    16ecdsa_verify                  ,    54.9       ,    55.4       ,    56.1    
    17ecdsa_sign                    ,    39.9       ,    40.1       ,    40.7    
    

    sipa commented at 12:38 pm on July 30, 2022:
    Ok, that’s convincing.

    roconnor-blockstream commented at 2:00 pm on August 8, 2022:
    Want to benchmark on 32-bit systems?

    sipa commented at 8:58 pm on August 16, 2022:

    When compiling for i686 (but executing on a Ryzen 5950X system), the 32-bit code is over 2x faster than the int128_struct 64-bit code.

    There is another problem with using int128_struct on 32-bit systems: it doesn’t result in constant-time code for me (GCC 10.3.0 on x86_64 cross-compiling for i686).


    roconnor-blockstream commented at 9:02 pm on August 16, 2022:
    Any idea the source of non-constant time?

    sipa commented at 9:03 pm on August 16, 2022:
    0  r->hi += hi + (r->lo < lo);
    

    roconnor-blockstream commented at 10:22 pm on August 16, 2022:
    We should probably change this right?

    sipa commented at 1:42 pm on August 17, 2022:
    Change it to what? There are instrinsics for some platforms to do addition-with-carry, which likely result in better performance here (as well as constant-time operation), but they’re obviously not standard C.

    roconnor-blockstream commented at 1:49 pm on August 17, 2022:
    I understood there were similar non-constant time issues in libsecp256k1 where comparisons were being transformed into branches, and that refactoring the code somehow fixed the problem.

    sipa commented at 2:02 pm on August 17, 2022:

    Maybe, but I’m not sure that’s needed.

    All of this is empirical of course - it’s very much possible for vaguely reasonable compilers to convert code like this into either constant time or non-constant time code.

    But based on what I’ve seen from GCC, this isn’t too suprising: i686 doesn’t have a (CPU) native 64-bit type; all (u)int64_t operations are effectively already translated to operations on pairs of 32-bit integers. And it seems that on several platforms, GCC compiles comparisons on tuple-represented integers to branches (this is also true on 64-bit platforms with __int128).

    Code of this form already occurs in scalar_4x64_impl.h, but it’s not a problem there, because it’s only used on 64-bit platforms, where uint64_t isn’t represented as a pair. For the same reason, it does not seem to be a problem if int128_struct_impl.h is used on 64-bit systems.


    real-or-random commented at 3:24 pm on August 19, 2022:

    all (u)int64_t operations are effectively already translated to operations on pairs of 32-bit integers. And it seems that on several platforms, GCC compiles comparisons on tuple-represented integers to branches

    That makes a lot of sense but recent GCC >=8 seems to get away without a branch, at least in this simple test… https://gcc.godbolt.org/z/nss58G3Ko

    If we find a nice of rewriting these comparisons, fine. But I don’t see any way of fixing this, so I wouldn’t bother too much. The build is seriously misconfigured if you use int128_struct on a 32bit machine, and we don’t need to support this.

  8. roconnor-blockstream commented at 10:07 pm on August 8, 2022: contributor

    It may be a good idea to output a preprocessor warning in this case, or actually remove the simulated 64x64->128 mul code because our tests will never exercise it.)

    We could add a #warning pragma saying “Beware of bugs in the above code; We have only proved it correct, not tested it.”

  9. sipa commented at 8:03 pm on November 16, 2022: contributor
    Superseded by #1156.
  10. sipa closed this on Nov 16, 2022


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 21:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me