[CI test, dontmerge] PR #1000 with __(u)mul128() based abstractions in int128_struct #1130
pull sipa wants to merge 4 commits into bitcoin-core:master from sipa:pr1000_mul128 changing 16 files +751 −290-
sipa commented at 6:25 pm on July 22, 2022: contributor
-
Simulated int128 type. 8cf46673a3
-
int128: Tidy #includes of int128.h and int128_impl.h
After this commit, int128.h and int128_impl.h are included as follows: - .c files which use int128 include int128_impl.h (after util.h) - .h files which use int128 include int128.h (after util.h) This list is exhaustive. util.h needs to included first because it sets up necessary #defines.
-
Switch to int128_struct implementation using (u)mul128 abstractions 2fe03cda23
-
Use int128_struct on 64-bit MSVC 412f359f1a
-
sipa force-pushed on Jul 22, 2022
-
in src/util.h:247 in 412f359f1a
243@@ -244,6 +244,9 @@ static SECP256K1_INLINE void secp256k1_int_cmov(int *r, const int *a, int flag) 244 #elif defined(UINT128_MAX) || defined(__SIZEOF_INT128__) 245 # define SECP256K1_WIDEMUL_INT128 1 246 # define SECP256K1_INT128_NATIVE 1 247+#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_ARM64))
real-or-random commented at 5:25 pm on July 29, 2022:I know this is just a CI test but I think we want something like this after #1000. Then I’d suggest this as a heuristic:
0#elif UINTPTR_MAX >= UINT64_MAX
sipa commented at 5:27 pm on July 29, 2022:That doesn’t guarantee the existence of__mulh
and__umulh
?
real-or-random commented at 5:35 pm on July 29, 2022:No, it doesn’t. But even without access to an efficient 64x64->128 mul, it’s still a good idea to use the int128_struct implementation on a 64bit machine (with the simulated 64x64->128 mul then) because then the 64-bit field/scalar code will be used.
(I admit I don’t know if we’ll run into such a machine/compiler. It may be a good idea to output a preprocessor warning in this case, or actually remove the simulated 64x64->128 mul code because our tests will never exercise it.)
sipa commented at 5:38 pm on July 29, 2022:Hmm, perhaps we should benchmark the 10x26 field code against the 5x52 + int128_struct code. I’m not convinced about which one is faster.
real-or-random commented at 8:25 am on July 30, 2022:It’s faster on 64 bit platforms because all the other operations except muls will profit:
0Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz (x86_64), gcc 12.1.0 -O2 1 2Benchmark , Min(us) , Avg(us) , Max(us) 3 4./configure --with-test-override-wide-multiply=int64 5 6ecdsa_verify , 95.3 , 96.2 , 102.0 7ecdsa_sign , 64.0 , 64.1 , 64.7 8 9./configure --with-test-override-wide-multiply=int128_struct 10 11ecdsa_verify , 55.9 , 56.4 , 57.4 12ecdsa_sign , 42.2 , 42.4 , 43.0 13 14./configure 15 16ecdsa_verify , 54.9 , 55.4 , 56.1 17ecdsa_sign , 39.9 , 40.1 , 40.7
sipa commented at 12:38 pm on July 30, 2022:Ok, that’s convincing.
roconnor-blockstream commented at 2:00 pm on August 8, 2022:Want to benchmark on 32-bit systems?
sipa commented at 8:58 pm on August 16, 2022:When compiling for i686 (but executing on a Ryzen 5950X system), the 32-bit code is over 2x faster than the int128_struct 64-bit code.
There is another problem with using int128_struct on 32-bit systems: it doesn’t result in constant-time code for me (GCC 10.3.0 on x86_64 cross-compiling for i686).
roconnor-blockstream commented at 9:02 pm on August 16, 2022:Any idea the source of non-constant time?
sipa commented at 9:03 pm on August 16, 2022:0 r->hi += hi + (r->lo < lo);
roconnor-blockstream commented at 10:22 pm on August 16, 2022:We should probably change this right?
sipa commented at 1:42 pm on August 17, 2022:Change it to what? There are instrinsics for some platforms to do addition-with-carry, which likely result in better performance here (as well as constant-time operation), but they’re obviously not standard C.
roconnor-blockstream commented at 1:49 pm on August 17, 2022:I understood there were similar non-constant time issues in libsecp256k1 where comparisons were being transformed into branches, and that refactoring the code somehow fixed the problem.
sipa commented at 2:02 pm on August 17, 2022:Maybe, but I’m not sure that’s needed.
All of this is empirical of course - it’s very much possible for vaguely reasonable compilers to convert code like this into either constant time or non-constant time code.
But based on what I’ve seen from GCC, this isn’t too suprising: i686 doesn’t have a (CPU) native 64-bit type; all (u)int64_t operations are effectively already translated to operations on pairs of 32-bit integers. And it seems that on several platforms, GCC compiles comparisons on tuple-represented integers to branches (this is also true on 64-bit platforms with __int128).
Code of this form already occurs in scalar_4x64_impl.h, but it’s not a problem there, because it’s only used on 64-bit platforms, where uint64_t isn’t represented as a pair. For the same reason, it does not seem to be a problem if int128_struct_impl.h is used on 64-bit systems.
real-or-random commented at 3:24 pm on August 19, 2022:all (u)int64_t operations are effectively already translated to operations on pairs of 32-bit integers. And it seems that on several platforms, GCC compiles comparisons on tuple-represented integers to branches
That makes a lot of sense but recent GCC >=8 seems to get away without a branch, at least in this simple test… https://gcc.godbolt.org/z/nss58G3Ko
If we find a nice of rewriting these comparisons, fine. But I don’t see any way of fixing this, so I wouldn’t bother too much. The build is seriously misconfigured if you use int128_struct on a 32bit machine, and we don’t need to support this.
roconnor-blockstream commented at 10:07 pm on August 8, 2022: contributorIt may be a good idea to output a preprocessor warning in this case, or actually remove the simulated 64x64->128 mul code because our tests will never exercise it.)
We could add a
#warning
pragma saying “Beware of bugs in the above code; We have only proved it correct, not tested it.”sipa closed this on Nov 16, 2022
This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-22 12:15 UTC
More mirrored repositories can be found on mirror.b10c.me