Followups to int128_struct arithmetic

sipa commented at 7:51 pm on November 16, 2022: contributor

This is a follow-up to #1000:

Add randomized unit tests for int128 logic.
Add CI for the _(u)mulh code path (on non-ARM64 MSVC).
Add heuristic logic to enable int128_struct based arithmetic on 64-bit MSVC, or systems with pointers wider than 32 bits.
Fix signed overflow in ARM64 MSVC code.

in src/util.h:256 in 5b753e4830 outdated

254+# define SECP256K1_WIDEMUL_INT128 1
255+# define SECP256K1_INT128_STRUCT 1
256+#elif UINTPTR_MAX > 0xffffffff
257+/* Systems with 64-bit pointers (and thus registers) very likely benefit from
258+ * using 64-bit based arithmetic (even if we need to fall back to 32x32->64 based
259+ * multiplication logic. */

real-or-random commented at 8:00 pm on November 16, 2022:

0/* Systems with 64-bit pointers (and thus registers) very likely benefit from
1 * using 64-bit based arithmetic (even if we need to fall back to 32x32->64 based
2 * multiplication logic). */

sipa commented at 8:49 pm on November 16, 2022:

Done.

sipa cross-referenced this on Nov 16, 2022 from issue [CI test, dontmerge] PR #1000 with __(u)mul128() based abstractions in int128_struct by sipa

in src/util.h:245 in 5b753e4830 outdated

243 # define SECP256K1_INT128_NATIVE 1
244 #elif defined(USE_FORCE_WIDEMUL_INT64)
245+/* If USE_FORCE_WIDEMUL_INT64 is set, use int64. */
246 # define SECP256K1_WIDEMUL_INT64 1
247 #elif defined(UINT128_MAX) || defined(__SIZEOF_INT128__)
248+/* If __int128 exists, use int128. */

real-or-random commented at 8:04 pm on November 16, 2022:

0/* If native 128-bit types exist, use int128. */

(because our logic will also happily use int128_t, not only __int128.)

sipa commented at 8:49 pm on November 16, 2022:

Done.

in src/int128_native_impl.h:45 in 5b753e4830 outdated

40@@ -37,6 +41,10 @@ static SECP256K1_INLINE int secp256k1_u128_check_bits(const secp256k1_uint128 *r
41    return (*r >> n == 0);
42 }
43 
44+static SECP256K1_INLINE void secp256k1_i128_load(secp256k1_int128 *r, int64_t hi, uint64_t lo) {
45+    *r = (((uint128_t)(uint64_t)hi) << 64) + lo;

roconnor-blockstream commented at 8:09 pm on November 16, 2022:

why not ((int128_t)hi << 64) + lo?

sipa commented at 8:17 pm on November 16, 2022:

((int128_t)hi << 64) + lo is UB if hi is negative (left-shift of negative value is undefined).

((uint128_t)hi << 64) + lo would work, but involves sign-extending hi, and then throwing those extension bits away.

((uint128_t)(uint64_t)hi << 64) + lo is equivalent, but avoids the sign-extension, which I thought would be more obviously correct.

roconnor-blockstream commented at 8:28 pm on November 16, 2022:

Somehow I missed the fact the left-shift of negative values is UB. In retrospect, that makes a bit of sense if you cannot count on two’s complement, even if the formal definition of multiplying by 2^n still makes sense for negative numbers.

in src/util.h:253 in 5b753e4830 outdated

251+#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_ARM64))
252+/* On 64-bit MSVC targets (x86_64 and arm64), use int128_struct
253+ * (which has special logic to implement using intrinsics on those systems). */
254+# define SECP256K1_WIDEMUL_INT128 1
255+# define SECP256K1_INT128_STRUCT 1
256+#elif UINTPTR_MAX > 0xffffffff

real-or-random commented at 8:46 pm on November 16, 2022:

0#elif SIZE_MAX > 0xffffffff

maybe this is a better heuristic because uintptr_t is an optional type. Though even if it’s not defined, your code at least won’t error because then UINTPTR_MAX will be treated as 0 in the preprocessor`. I don’t know.

sipa commented at 8:49 pm on November 16, 2022:

Done.

sipa force-pushed on Nov 16, 2022

Add int128 randomized tests f2b7e88768

int128: Add test override for testing __(u)mulh on MSVC X64

Also add a corresponding CI job

63ff064d2f

Heuristically decide whether to use int128_struct 9b5f589d30

sipa force-pushed on Nov 17, 2022

Avoid signed overflow in MSVC AMR64 secp256k1_mul128 3afce0af7c

real-or-random approved

real-or-random commented at 3:35 pm on November 17, 2022: contributor

utACK 3afce0af7c00eb4c5ca6d303e36a48c91a800459

in src/tests.c:1981 in 3afce0af7c outdated

1977+    {   /* secp256k1_u128_accum_mul */
1978+        secp256k1_uint128 res;
1979+
1980+        /* Check secp256k1_u128_accum_mul overflow */
1981+        secp256k1_u128_from_u64(&res, 0);
1982+        secp256k1_u128_accum_mul(&res, UINT64_MAX, UINT64_MAX);

roconnor-blockstream commented at 3:42 pm on November 17, 2022:

maybe just secp256k1_u128_mul(&res, UINT64_MAX, UINT64_MAX); without the secp256k1_u128_from_u64? And similarly below.

sipa commented at 3:47 pm on November 17, 2022:

This is code that’s already merged (from your PR, even!) and just being moved here. If you have nits on it, I’m happy to make small changes, but it sounds like you’re suggesting to change the test from testing secp256k1_u128_accum_mul to secp256k1_u128_mul instead. Maybe that’s better addressed in a separate PR to improve the tests?

roconnor-blockstream commented at 4:45 pm on November 17, 2022:

I believe what I’m proposing is a small change. I’m proposing changing the logic from

res = 0
res += UINT64_MAX*UINT64_MAX
res += UINT64_MAX*UINT64_MAX

to

res = UINT64_MAX*UINT64_MAX
res += UINT64_MAX*UINT64_MAX

It still tests accumulating multiplication overflow.

But maybe it isn’t worth changing.

sipa commented at 5:22 pm on November 17, 2022:

Done, in a new commit.

Make int128 overflow test use secp256k1_[ui]128_mul 99bd335599

in src/tests.c:1900 in 3afce0af7c outdated

1896+    load256i128(rswz, &swz);
1897+    CHECK(secp256k1_memcmp_var(rswr, rswz, 16) == 0);
1898+    /* test secp256k1_i128_accum_mul */
1899+    mulmod256(rswr, rsb, rsc, NULL);
1900+    add256(rswr, rswr, rswa);
1901+    if (int256is127(rswr)) {

roconnor-blockstream commented at 4:36 pm on November 17, 2022:

how often is this true?

sipa commented at 5:26 pm on November 17, 2022:

Often, from having tried this before adding this check.

sipa commented at 6:05 pm on November 17, 2022:

A Monte-Carlo simulation suggests int256is127() succeeds with probability ~15/16 (93.75%).

sipa commented at 8:21 pm on November 18, 2022:

In fact, testing this exhaustively with smaller integer types suggests the probability is exactly 15/16.

roconnor-blockstream commented at 6:07 pm on November 17, 2022: contributor

utACK 99bd335

real-or-random approved

real-or-random commented at 11:52 pm on November 17, 2022: contributor

utACK 99bd3355994a436e25d148c68e097cca11f3c63e

sipa cross-referenced this on Nov 17, 2022 from issue Add a secp256k1_i128_to_u64 function. by roconnor-blockstream

in src/util.h:258 in 99bd335599

256+#elif SIZE_MAX > 0xffffffff
257+/* Systems with 64-bit pointers (and thus registers) very likely benefit from
258+ * using 64-bit based arithmetic (even if we need to fall back to 32x32->64 based
259+ * multiplication logic). */
260+# define SECP256K1_WIDEMUL_INT128 1
261+# define SECP256K1_INT128_STRUCT 1

jonasnick commented at 7:59 pm on November 18, 2022:

Do we have a benchmark for a system where this is the case? @sipa’s benchmarks suggest that on 64 bit aarch64, int64 is faster than int128_struct.

sipa commented at 8:13 pm on November 18, 2022:

on 64 bit aarch64, int64 is faster than int128_struct.

Yes, but my benchmark also showed int64 is faster than int128, as this was on a system supporting __int128. So if there are systems where we want to prefer widemul=int64 over widemul=int128, that’s a discussion independent from the int128_struct decision logic.

To find an example where this branch here (use int128_struct over int64, because int128 isn’t available) matters, we’ll need to go outside of gcc/clang/icc/MSVC, because on those we all have access to true 64x64->128 hardware multiplication if it exists. I don’t know if there are any vaguely popular C compilers for 64-bit platforms out there… does Borland still exist?

EDIT: GCC 4.5 and below, or clang 3.0 and below, for x86_64, would be relevant platforms…

jonasnick commented at 9:04 pm on November 18, 2022:

Ok, if you think that’s a significant improvement on these systems, fine with me.

jonasnick commented at 9:04 pm on November 18, 2022: contributor

utACK 99bd3355994a436e25d148c68e097cca11f3c63e

EDIT: Maybe #1158 (comment) should be merged first?

real-or-random approved

real-or-random commented at 9:46 pm on November 18, 2022: contributor

ACK 99bd3355994a436e25d148c68e097cca11f3c63e tested this also on MSVC locally with the override, including all the benchmark binaries

EDIT: Maybe #1158 (comment) should be merged first?

I think we should be greedy and merge it now that it has enough ACKs. The other one will be rather easy to rebase.

real-or-random merged this on Nov 18, 2022

real-or-random closed this on Nov 18, 2022

real-or-random cross-referenced this on Dec 1, 2022 from issue 64x64->64 muls are not constant-time with MSVC on 32bit x86 by real-or-random

sipa referenced this in commit 9d47e7b71b on Dec 13, 2022

dhruv referenced this in commit 55ffd47cc6 on Dec 14, 2022

dhruv referenced this in commit 967c65b158 on Dec 14, 2022

dhruv referenced this in commit 78b5ddf28b on Jan 11, 2023

dhruv referenced this in commit 215394a1d5 on Jan 11, 2023

div72 referenced this in commit 945b094575 on Mar 14, 2023

str4d referenced this in commit 0df7b459f6 on Apr 21, 2023

vmta referenced this in commit e1120c94a1 on Jun 4, 2023

vmta referenced this in commit 8f03457eed on Jul 1, 2023

jonasnick cross-referenced this on Jul 17, 2023 from issue Upstream PRs 1147, 1149, 1000, 1155, 1156 by jonasnick

Followups to int128_struct arithmetic #1156