Increase precision of g1 and g2.

roconnor-blockstream commented at 3:49 pm on September 21, 2020: contributor

This allows us to shift by 256+128 = 384 bits, which is a multiple of the limb size of the scalar representation. This also happens to be the most precision possible for g2 that still fits into a 256-bit value.

This allows us to shift by 256+128 = 384 bits, which is a multiple of the limb size of the scalar representation.
This also happens to be the most precision possible for g2 that still fits into a 256-bit value.

57d4b0df2c

roconnor-blockstream commented at 3:53 pm on September 21, 2020: contributor

Historically speaking the values of g1 and g2 were set in #127 where they were copied from #21. However #21 was using secp256k1_num_mul where there was some advantage to using small values for g1 and g2. But with secp256k1_scalar_mul_shift_var, there is no advantage to using small values for g1 and g2.

Instead there is a very slight advantage of shifting by a multiple of the limb size.

in src/scalar_impl.h:282 in a582c3f475 outdated

281+ * roots of X^2 + X + 1.  Therefore lambda^2 + lamba == -1 mod n and beta^2 + beta == -1 mod p.
282+ * (The other primitive cube roots of unity are lambda^2 and beta^2 respectively.)
283+ *
284+ * Let l = -1/2 + i*sqrt(3)/2, the complex root of X^2 + X + 1. We can define a ring
285+ * homomorphism phi : Z[l] -> Z_n where phi(a + b*l) == a + b*lambda mod n. The kernel of phi
286+ * is a lattice over Z[l]. A reduced basis of this lattice is generated by the values  a1 + b1*l

roconnor-blockstream commented at 10:36 pm on September 21, 2020:

Rephrase as This lattice is generated by a reduced basis {a1 + b1*l, a2 + b2*l} where:.

in src/scalar_impl.h:431 in a582c3f475 outdated

430+ *  - either r2 <= (-b1 + b2) / 2 or n - (-b1 + b2)/2 <= r2.
431  *
432- * (Note that 'd' is also equal to the curve order here because [a1,b1] and [a2,b2] are found
433- * as outputs of the Extended Euclidean Algorithm on inputs 'order' and 'lambda').
434+ * Q.E.D.
435  *

roconnor-blockstream commented at 10:39 pm on September 21, 2020:

Delete this blank line.

in src/scalar_impl.h:349 in a582c3f475 outdated

346+ *
347+ *    |c2 - a*(-b1)/d|
348+ *  =
349+ *    |c2 - a*g2/2^384 + a*g2/2^384 - a*(-b1)/d|
350+ * <=   {triangle inequality}
351+ *    |c1 - a*g2/2^384| + |a*g2/2^384 - a*(-b1)/d|

sipa commented at 2:57 am on September 22, 2020:

I think you mean c2 here and the few lines below.

in src/scalar_impl.h:402 in a582c3f475 outdated

399+ *  =
400+ *    (-b1 + b2)(2^-1 + 2^-129)
401+ * <    {calculation}
402+ *    (-b1 + b2 + 2)/2
403+ *
404+ * Corollary 4: |k2| <= (-b1 + b2)/2.

sipa commented at 3:05 am on September 22, 2020:

Reuse of number 4.

in src/scalar_impl.h:317 in a582c3f475 outdated

314+ * can be found as outputs of the Extended Euclidean Algorithm on inputs 'n' and 'lambda').
315+ *
316+ * The function below splits a in r1 and r2, such that
317+ * - r1 + lambda * r2 == a (mod n)
318+ * - either 0 <= r1 <= (a1 + a2 + 1) / 2 or n - (a1 + a2 + 1)/2 <= r1 < n.
319+ * - either 0 <= r2 <= (-b1 + b2) / 2 or n - (-b1 + b2)/2 <= r2 < n.

sipa commented at 3:07 am on September 22, 2020:

Maybe add here that this implies either r1 or -r1 is in range [0..2^128-1] (and similarly for r2), as that’s what we ultimately care about.

in src/scalar_impl.h:322 in a582c3f475 outdated

319+ * - either 0 <= r2 <= (-b1 + b2) / 2 or n - (-b1 + b2)/2 <= r2 < n.
320+ *
321+ * Proof.
322+ *
323+ * Let
324+ *  - epsilon1 = |g1/2^384 - b2/d|

sipa commented at 3:51 am on September 22, 2020:

FWIW, epsilon1 and epsilon2 can be computed exactly, and are somewhat smaller than 2^-385.

epsilon1 = 9722325547791800083761969591266328053001952849235129972941425083593053056369/4562440617622195218641171605700291324876190276497652512765600377628314188904572302704052175358268500448500017515730822496526831756972782880663319377092625557245625225500848619780486929676500992 =~ 2^-387.574091440504.
epsilon2 = 31846006334326197673727480983349989536770120240999881061201988246081714647887/4562440617622195218641171605700291324876190276497652512765600377628314188904572302704052175358268500448500017515730822496526831756972782880663319377092625557245625225500848619780486929676500992 =~ 2^-385.862352326689.

roconnor-blockstream commented at 11:47 am on September 22, 2020:

Now that I see these written out like this, I can improve the bound on k1 by 1 using the fact that epsilon1 is so very small. The bound on k2 is already more or less optimal with this proof technique.

in src/scalar_impl.h:341 in a582c3f475 outdated

338+ * <=   {property of rounding in c1 & definition of epsilon1
339+ *    2^-1 + a*epsilon1
340+ * <    {a < 2^256 and epsilon1 < 2^-385}
341+ *    2^-1 + 2^256 * 2^-385
342+ *  =
343+ *    2^-1 + 2^-129

sipa commented at 3:57 am on September 22, 2020:

When using the exact value of epsilon1 here, I get 2^-1 + 2^-131.574091440504…

in src/scalar_impl.h:357 in a582c3f475 outdated

354+ * <=   {property of rounding in c1 & definition of epsilon2
355+ *    2^-1 + a*epsilon2
356+ * <    {a < 2^256 and epsilon2 < 2^-385}
357+ *    2^-1 + 2^256 * 2^-385
358+ *  =
359+ *    2^-1 + 2^-129

sipa commented at 3:58 am on September 22, 2020:

When using the exact value of epsilon2 here, I get 2^-1 + 2^-129.862352326689…

in src/scalar_impl.h:379 in a582c3f475 outdated

376+ * <    {Lemma 1 and Lemma 2}
377+ *    a1*(2^-1 + 2^-129) + a2*(2^-1 + 2^-129)
378+ *  =
379+ *    (a1 + a2)(2^-1 + 2^-129)
380+ * <    {calculation}
381+ *    (a1 + a2 + 3)/2

sipa commented at 4:06 am on September 22, 2020:

Using exact numbers I get (a1+a2)/2 + 0.3132789511161…

roconnor-blockstream commented at 11:56 am on September 22, 2020:

Oh I see you already wrote this.

roconnor-blockstream force-pushed on Sep 22, 2020

roconnor-blockstream commented at 3:16 pm on September 22, 2020: contributor

I’ve updated the proof. Please note that epsilon1 and epsilon2 are now defined to be 2^256 times larger than they were before. But don’t worry, they are still pretty small values.

roconnor-blockstream commented at 3:18 pm on September 22, 2020: contributor

I’ve added a secp256k1_split_lambda_verify function to verify the results against the bounds computed in the proof.

Be aware that the verification fails with GCC 9.3 and GCC 10.2 because these compilers are badly broken. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189.

in src/scalar_impl.h:424 in d6cc066dde outdated

425+ *
426+ * Q.E.D.
427  */
428 
429+#ifdef VERIFY
430+static void secp256k1_split_lambda_verify(const secp256k1_scalar *r1, const secp256k1_scalar *r2, const secp256k1_scalar *a) {

roconnor-blockstream commented at 7:41 pm on September 22, 2020:

should be called secp256k1_scalar_split_lambda_verify.

in src/scalar_impl.h:307 in d6cc066dde outdated

303@@ -293,16 +304,161 @@ static void secp256k1_scalar_split_lambda(secp256k1_scalar *r1, secp256k1_scalar
304  * The derivation is described in the paper "Efficient Software Implementation of Public-Key
305  * Cryptography on Sensor Networks Using the MSP430X Microcontroller" (Gouvea, Oliveira, Lopez),
306  * Section 4.3 (here we use a somewhat higher-precision estimate):
307- * d = a1*b2 - b1*a2
308- * g1 = round((2^272)*b2/d)
309- * g2 = round((2^272)*b1/d)
310+ * d = a1*b2 - b1*a

real-or-random commented at 10:10 pm on September 22, 2020:

0 * d = a1*b2 - b1*a2

in src/scalar_impl.h:315 in d6cc066dde outdated

312+ * g2 = round(2^384 * (-b1)/d)
313+ *
314+ * (Note that 'd' is also equal to the curve order, n, here because [a1,b1] and [a2,b2]
315+ * can be found as outputs of the Extended Euclidean Algorithm on inputs 'n' and 'lambda').
316+ *
317+ * The function below splits a in r1 and r2, such that

real-or-random commented at 10:16 pm on September 22, 2020:

In general, a and (a1,a2) are confusing because they’re not related. Maybe we can just rename a1,a2? They’re used only in comments, so that should be easy.

roconnor-blockstream commented at 10:49 pm on September 22, 2020:

How do we feel about renaming the input parameter as k?

real-or-random commented at 10:50 pm on September 22, 2020:

Totally fine, that matches the papers. I just didn’t suggest this one because it’s more work.

roconnor-blockstream commented at 11:28 pm on September 22, 2020:

And rename it in the header file, scalar.h too?

real-or-random commented at 1:47 pm on September 23, 2020:

Yes, consistency with the header is a good thing IMO.

real-or-random commented at 6:56 pm on September 24, 2020:

I think there are a few places in this comment where it’s still a.

real-or-random commented at 10:20 pm on September 22, 2020: contributor

Nice. I can create a follow up PR to add a sage script that computes all the constants here.

One could write optimized multiplication routines for the current shorter g1, g2. But that’s a lot of work for a tiny bit of performance so unless we have this, I agree that the new constants are an improvement.

real-or-random commented at 10:27 pm on September 22, 2020: contributor

Be aware that the verification fails with GCC 9.3 and GCC 10.2 because these compilers are badly broken. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189.

Yeah, I don’t think we want “expected failures”. But tbh, I don’t know what to do. We could disable the tests for these GCC versions but this is not a good idea either. We could also write our own memcmp for the tests but this is strange, too.

roconnor-blockstream commented at 1:10 am on September 23, 2020: contributor

What would you do if we already had such a test in place before GCC released their broken compilers?

Keep in mind these broken compilers are already compromising our existing tests @ https://github.com/bitcoin-core/secp256k1/blob/770b3dcd6f1edb0a6355a55d92b7bea1e9524042/src/tests.c#L3458-L3460 for example. It is simply that those tests are improperly succeeding instead of improperly failing.

real-or-random commented at 8:12 am on September 23, 2020: contributor

@roconnor-blockstream That’s different.

If the test succeeds incorrectly, then that’s a problem because that single test does not work. And yes, maybe we should fix this.

If the tests fail incorrectly, it’s much worse. Then the test binary aborts early and a lot of the tests are simply skipped. Moreover, people will be trained to ignore failing tests / Travis, which is bad.

elichai commented at 8:39 am on September 23, 2020: contributor

What about the other memcmp’s we have in actual production code? (bip340 tag, tweak add check, scratch impl, sha256 selftest) does this bug affect those too?

real-or-random cross-referenced this on Sep 23, 2020 from issue memcmp may be miscompiled by GCC by real-or-random

roconnor-blockstream commented at 1:06 pm on September 23, 2020: contributor

The fix for the tests that succeed incorrectly is the same as for fixing these tests. Stop using GCC 9.3 and 10.2 or find some flags to disable the use of intrinsics for memcmp for those particular compiler versions that are broken.

Making code accommodations for compilers with such egregious errors doesn’t seem reasonable to me.

real-or-random commented at 1:43 pm on September 23, 2020: contributor

The fix for the tests that succeed incorrectly is the same as for fixing these tests.

I agree.

Stop using GCC 9.3 and 10.2

We’re not in control. We control our code, not the compiler. We can blacklist these compilers but that’s ugly and people would override it.

or find some flags to disable the use of intrinsics for memcmp for those particular compiler versions that are broken.

-fno-builtin-memcmp should do the trick but then we need to detect GCC versions in autoconf. That’s also addiitonal code (autotools code) and work, and not everyone uses autoconf.

roconnor-blockstream commented at 1:51 pm on September 23, 2020: contributor

I propose we argue about -fno-builtin-memcmp for a few weeks until GCC 9 and GCC 10 have new point releases and then ignore the problem.

gmaxwell commented at 3:28 pm on September 23, 2020: contributor

Concept ACK, but the test cannot be included as is– not even in a couple weeks because the broken compilers have been massively shipped and some users will still be running them for a couple of years (e.g. imagine you setup an offline signer using install media…).

At a minimum the tests should add an explicit memcmp bug test so that the error the user gets is a useful report that tells them that their compiler is faulty and not all other tests are bypassed in case the user wants to continue with the broken compiler. But I think the library should probably ship a workaround (see my comments in #823).

And congrats on finding a compiler bug (though you weren’t the first in the case)– good testing (sadly) finds compiler bugs. The fact that none of the other tests find this suggest to me that there isn’t enough mutation testing going on here (or we just got really unlucky, and none of the mutation hittable differences with constant comparisons have an early null).

real-or-random commented at 4:50 pm on September 23, 2020: contributor

The fact that none of the other tests find this suggest to me that there isn’t enough mutation testing going on here (or we just got really unlucky, and none of the mutation hittable differences with constant comparisons have an early null).

We can for sure have better mutation testing but in our defense, we use memcmp in the tests mostly to assert equality and not inequality, and this is a natural thing to do. There are a few inequality checks with constants, e.g., https://github.com/bitcoin-core/secp256k1/blob/d7838ba6a6ac77cec173080f20efcd0e311ebfaa/src/modules/extrakeys/tests_impl.h#L432 But in this case, GCC emits a memcmp call, and I can’t convince it to emit wrong code. It’s not that easy to hit this bug, it seems to depend on specific contexts and optimizations.

roconnor-blockstream force-pushed on Sep 23, 2020

roconnor-blockstream commented at 5:10 pm on September 23, 2020: contributor

Tests now pass. :angry:

roconnor-blockstream force-pushed on Sep 23, 2020

sipa cross-referenced this on Sep 23, 2020 from issue memcmp with constants that contain zero bytes are broken in GCC by sipa

sipa commented at 3:30 am on September 26, 2020: contributor

utACK d91408ea4c3661c172f6fa2da5f586b5096146eb

sipa commented at 8:47 am on September 26, 2020: contributor

If we want to add unit tests for lambda splitting that result in numbers very close to the bound:

split(0x7fffffffffffffffffffffffffffffff5d576e7357a4501ddfe92f46681b20a0) = (0xa2a8918ca85bafe22016d0b917e4dd76, -0x59de565a2c9d0e2d4373f7623c1d7cd7) [where 0xa2a89… = k1_bound]
split(0x2c9c52b33fa3cf1f5ad9e3fd77ed9ba54b294b893722e9a500e698ca4cf7632e) = (0x7221bf6b0087441437aa3fd4855ff261, 0x8a65287bd47179fb2be08846cea267eb) [where 0x8a652… = k2_bound-1]

I found these by randomly generating r1 values (within range 0..k1_bound) and r2 values (very close to k2_bound) such that split(r1+lambda*r2)=(r1,r2), and observing that valid r1 values were within a very small range. As I brought r2 closer and closer to k2_bound, the range became narrowing, making it easy to see where it converges. Turns out that it works all the way up to k2_bound-1.

The same approach works for r1 very close to k1_bound, except it needs r2 values in range (-k2_bound…0). There it actually lets us go to k1_bound itself.

gmaxwell commented at 6:44 pm on September 26, 2020: contributor

That scalar that creates maximal values might be a useful scalar for ecmult tests and as a private key for ECDH too bad you don’t achieve the k2_bound.

sipa commented at 8:17 pm on September 26, 2020: contributor

Here is a commit to add tests for splitting and EC multiplications with those scalars: https://github.com/sipa/secp256k1/commits/202009_pr822. The endomorphism test fails when I introduce small deviations in g1/g2 (though only when the error is around ~2^128 at least).

Feel free to cherry pick, or I can open as a separate PR later.

sipa commented at 9:26 pm on September 28, 2020: contributor

As it appears that all these large-output values follow a simple pattern ((((ORDER+a)/2 + LAMBDA*b) % ORDER) for small signed odd integers a and small signed integers b), I’ve expanded the set a bit and added a comment.

roconnor-blockstream commented at 3:08 pm on September 29, 2020: contributor

Should I add

0#ifdef VERIFY
1#include <string.h>
2#endif

?

real-or-random commented at 3:41 pm on September 29, 2020: contributor

Should I add

0#ifdef VERIFY
1#include <string.h>
2#endif

?

I don’t have strong opinions (and I assume noone has). If you add it, I’ll remove it in #825 . :P

sipa commented at 8:22 pm on October 4, 2020: contributor

Should I add

If we expect to merge #825 afterwards, no need.

Detailed comments for secp256k1_scalar_split_lambda. 362decd428

Add secp256k1_split_lambda_verify. 2456616cb1

Allow secp256k1_split_lambda_verify to pass even in the prensence of GCC bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189. bff7fff23e

Add tests to exercise lambda split near bounds 341196c641

roconnor-blockstream force-pushed on Oct 5, 2020

roconnor-blockstream commented at 12:04 pm on October 5, 2020: contributor

I added the include of string.h. I’ve updated the documentation. I’ve pick @sipa’s tests.

I expect the k2_bound is unachievable. With some effort could probably enumerate the few possible candidate points that could possibly achieve the k2_bound and inspect them to eliminate them as possibilities.

gmaxwell commented at 9:41 pm on October 5, 2020: contributor

looks good to me

sipa commented at 9:57 pm on October 5, 2020: contributor

ACK 57d4b0df2c1ba9c6718607b3c3175b25b93b120b

Reviewed the diff with the earlier version, created a merge of this with #826 and built/tested it (40000 runs of the unit tests with it). @roconnor-blockstream It’s my belief that the k2_bound is indeed unreachable; if it was, one of the inputs in the scalars_near_split_bounds constant now should trigger it.

sipa cross-referenced this on Oct 10, 2020 from issue Rip out non-endomorphism code by sipa

gmaxwell commented at 8:54 pm on October 10, 2020: contributor

For your consideration, I’ve made some test improvements: https://github.com/gmaxwell/secp256k1/commit/6c3e1fab91766aa0c424b678371a0c8099726655

gmaxwell commented at 8:55 pm on October 10, 2020: contributor

The bounds check in secp256k1_scalar_split_lambda_verify now doesn’t avoid triggering the GCC bug, it just falsely passes in the presence of it (e.g. set the two bounds to all zeros and it keeps passing). Not ideal considering essentially all developers are using impacted compilers. I didn’t realize this at first and burnt a few cpu years running tests that could not fail. It appears to be right with the code in #825.

sipa cross-referenced this on Oct 11, 2020 from issue Rip out non-endomorphism code + dependencies by sipa

sipa commented at 6:14 pm on October 11, 2020: contributor

PR that merges this (including @gmaxwell’s extra tests) with others: #830

sipa cross-referenced this on Oct 11, 2020 from issue Switch to our own memcmp function by real-or-random

elichai commented at 9:59 am on October 13, 2020: contributor

Is it expected to see any meaningful difference in performance here? because I can’t see any in the benchmakrs

real-or-random commented at 10:39 am on October 13, 2020: contributor

No, this just saves a few instructions and we run this only once per ecmult.

sipa referenced this in commit c6b6b8f1bb on Oct 14, 2020

sipa commented at 7:40 pm on October 14, 2020: contributor

Merged as part of #830.

sipa closed this on Oct 14, 2020

roconnor-blockstream deleted the branch on Jan 25, 2021

Increase precision of g1 and g2. #822