Correct `secp256k1_gej_add_ge` with 3% performance hit on signing

apoelstra commented at 1:24 pm on June 24, 2015: contributor

Fixes https://github.com/bitcoin/secp256k1/issues/257 by catching the degenerate case, which appears as an intermediate variable being 0/0, and using an alternate expression for that variable in this case.

This results in only a ~3% perf hit on signing. There are no changes in the multiplication or squaring count; this is entirely due to additions and cmovs. “I doubt we can do better” but I said that about an earlier revision with a 6% hit and Peter Dettman found a way to beat it, so my doubts aren’t worth much :)

gej_add_ge: rearrange algebra

There is zero functionality or opcount changes here; I need to do
this to make sure both R and M are computed before they are used,
since a future patch will replace either none or both of them.

Also compute r->y directly in terms of r->x, which again will be
used in a future patch.

bcf2fcfd3a

apoelstra cross-referenced this on Jun 24, 2015 from issue Add constant-time point multplication for ECDH by apoelstra

in src/group_impl.h: in 80efd6849a outdated

519+     * this case the correct value of lambda is (y1 - y2)/(x1 - x2),
520+     * so set R/M equal to this. */
521+    secp256k1_fe_negate(&rr_alt, &s2, 1);   /* rr = -Y2*Z1^3  */
522+    secp256k1_fe_add(&rr_alt, &s1);         /* rr = Y1*Z2^3 - Y2*Z1^3 */
523+    secp256k1_fe_negate(&m_alt, &u2, 1);    /* m = -X2*Z1^2 */
524+    secp256k1_fe_add(&m_alt, &u1);          /* m = X1*Z2^2 - Y1*Z2^2 */

peterdettman commented at 2:09 pm on June 24, 2015:

0/* m = X1*Z2^2 - X2*Z1^2 */

in src/group_impl.h: in 80efd6849a outdated

513+    /** If lambda = R/M = 0/0 we have a problem (except in the "trivial"
514+     *  case that Z = z1z2 = 0, and this is special-cased later on). */
515+    degenerate = secp256k1_fe_normalizes_to_zero(&m) *
516+                 secp256k1_fe_normalizes_to_zero(&rr) *
517+                 (1 - a->infinity);
518+    /* This only occurs when x1 = beta*x2 or x1 = beta^2*x2. In

peterdettman commented at 2:34 pm on June 24, 2015:

0x1 = beta*x2 or beta*x1 = x2

might make it clearer there are just the two possibilities.

apoelstra commented at 2:42 pm on June 24, 2015:

Yeah, agreed. And I should probably add that these are exactly the nontrivial cases for which x1^3 == x2^3, which explains why there are only the two of them.

in src/group_impl.h: in 80efd6849a outdated

514+     *  case that Z = z1z2 = 0, and this is special-cased later on). */
515+    degenerate = secp256k1_fe_normalizes_to_zero(&m) *
516+                 secp256k1_fe_normalizes_to_zero(&rr) *
517+                 (1 - a->infinity);
518+    /* This only occurs when x1 = beta*x2 or x1 = beta^2*x2. In
519+     * this case the correct value of lambda is (y1 - y2)/(x1 - x2),

apoelstra commented at 2:37 pm on June 24, 2015:

Oh, this is the reciprocal of what it should be.

in src/group_impl.h: in 80efd6849a outdated

510-    secp256k1_fe_mul(&r->z, &m, &z);                                /* r->z = M*Z (1) */
511+    secp256k1_fe_mul(&tt, &u1, &u2); secp256k1_fe_negate(&tt, &tt, 1); /* t = -U1*U2 (2) */
512+    secp256k1_fe_add(&rr, &tt);                                        /* rr = R = T^2-U1*U2 (3) */
513+    /** If lambda = R/M = 0/0 we have a problem (except in the "trivial"
514+     *  case that Z = z1z2 = 0, and this is special-cased later on). */
515+    degenerate = secp256k1_fe_normalizes_to_zero(&m) *

sipa commented at 2:44 pm on June 24, 2015:

Using & instead of * is probably slightly faster.

apoelstra commented at 2:50 pm on June 24, 2015:

Oh, I can also probably drop the * (1 - a->infinity) since if this is 0, the cmov code at the bottom of the function will overwrite everything I do anyway.

sipa commented at 2:51 pm on June 24, 2015:

Indeed!

in src/group_impl.h: in 80efd6849a outdated

523+    secp256k1_fe_negate(&m_alt, &u2, 1);    /* m = -X2*Z1^2 */
524+    secp256k1_fe_add(&m_alt, &u1);          /* m = X1*Z2^2 - Y1*Z2^2 */
525+
526+    secp256k1_fe_cmov(&rr_alt, &rr, !degenerate);
527+    secp256k1_fe_cmov(&m_alt, &m, !degenerate);
528+    /* Now Ralt / Malt = lambda and is guaranteed not to be 0/0.

sipa commented at 2:52 pm on June 24, 2015:

Would be nice if you could also update the algebra in the comments above to reflect this case.

in src/group_impl.h: in 80efd6849a outdated

529+     * We keep the old values in R, M because we will need them
530+     * for other reasons. */
531+    secp256k1_fe_sqr(&n, &m_alt);                       /* n = M^2 (1) */
532+    secp256k1_fe_mul(&q, &n, &t);                       /* q = Q = T*M^2 (1) */
533+    secp256k1_fe_mul(&n, &n, &m);                       /* n = M^4 (1) */
534+    secp256k1_fe_mul(&n, &n, &m_alt);                   /* n = M^4 (1) */

peterdettman commented at 3:02 pm on June 24, 2015:

I can’t test code right now, but it seems to me that 526,527:

0n = n * m * m_alt

could be replaced with:

0n = n * m^2

because either m == m_alt or m == 0.

Or even:

0n = n^2; cmov n,m,degenerate

apoelstra commented at 3:28 pm on June 24, 2015:

I can’t test for another hour, but if this cmov thing works I think it erases the perf hit!

apoelstra commented at 3:47 pm on June 24, 2015:

I tried it (using vim on phone keyboard, ugh, but I couldn’t wait..). It works, and drops the perf hit I measure from 5.7% to 3.8%! :D

Edit: Now I see 2.9% hit for signing: 48.7us vs 47.3us.

apoelstra force-pushed on Jun 24, 2015

apoelstra commented at 4:45 pm on June 24, 2015: contributor

Push to fix everyone’s comments (comment fixes and Peter’s awesome speedup).

in src/group_impl.h: in 335ac43b8d outdated

516+                 secp256k1_fe_normalizes_to_zero(&rr);
517+    /* This only occurs when y1 == -y2 and x1^3 == x2^3, but x1 != x2.
518+     * This means either x1 == beta*x2 or beta*x1 == x2, where beta is
519+     * a nontrivial cube root of one. In either case, an alternate
520+     * non-indeterminate expression for lambda is (x1 - x2)/(y1 - y2),
521+     * so we set R/M equal to this. */

apoelstra commented at 10:13 pm on June 24, 2015:

Oh, I had the expression for lambda right the first time :) it should be (y1 - y2)/(x1 - x2).

sipa commented at 9:56 pm on June 25, 2015:

Is it (y1 - y2)/(x1 - x2) or the inverse? The comment still says the inverse.

gmaxwell commented at 3:26 am on June 25, 2015: contributor

We might want to update the citation above to point out that the additive law given in the citation is actually not complete, lest some unwary traveler see the citation and go there rather than reading the code.

in src/group_impl.h: in 335ac43b8d outdated

533+    secp256k1_fe_sqr(&n, &m_alt);                       /* n = Malt^2 (1) */
534+    secp256k1_fe_mul(&q, &n, &t);                       /* q = Q = T*Malt^2 (1) */
535+    /* These two lines use a clever observation of Peter Dettman's,
536+     * that either M == Malt or M == 0. So M^3 * Malt is either Malt^4
537+     * (which is computed by squaring), or zero (which is computed by
538+     * the cmov). So the cost is one squaring versus two multiplications. */

peterdettman commented at 5:24 am on June 25, 2015:

It’s of course nice to be appreciated, but I think code comments would be better kept to the immediate technical details.

gmaxwell commented at 5:56 am on June 25, 2015:

Direct experience shows that people do find attribution in commit messages just fine. E.g. I had people contact me on the inversion free R == r trick.

apoelstra commented at 10:15 pm on June 25, 2015:

If you’d like me to remove your name, I will. But I agree with @gmaxwell that it’s probably good for readers that it be there.

gmaxwell commented at 11:26 pm on June 25, 2015:

Actually I was saying that people will find it if it’s just in the commit message rather than a comment in the code.

apoelstra commented at 2:52 am on June 26, 2015:

Oh :) well if you’re both against me I’ll take it out of the source comment.

sipa commented at 11:54 am on June 26, 2015:

I think we eventually want some AUTHORS document that explains who contributed what - that’s easier to access than digging through commit messages, and less invasive than putting it in the code.

Some documentation in the source tree that explains the various tricks and optimizations implemented would be even nicer.

in src/group_impl.h: in 335ac43b8d outdated

506     secp256k1_fe_sqr(&rr, &t);                          /* rr = T^2 (1) */
507-    secp256k1_fe_mul(&t, &u1, &u2); secp256k1_fe_negate(&t, &t, 1); /* t = -U1*U2 (2) */
508-    secp256k1_fe_add(&rr, &t);                                      /* rr = R = T^2-U1*U2 (3) */
509-    secp256k1_fe_sqr(&t, &rr);                                      /* t = R^2 (1) */
510-    secp256k1_fe_mul(&r->z, &m, &z);                                /* r->z = M*Z (1) */
511+    secp256k1_fe_mul(&tt, &u1, &u2); secp256k1_fe_negate(&tt, &tt, 1); /* t = -U1*U2 (2) */

sipa commented at 8:19 am on June 25, 2015:

comment: tt instead of t

apoelstra commented at 12:38 pm on June 25, 2015: contributor

For what it’s worth, the citation does say that it’s assuming the y1 = -y2 case does not occur.

apoelstra force-pushed on Jun 25, 2015

apoelstra commented at 9:57 pm on June 25, 2015: contributor

Push with comment fixes and a much-expanded comment about our high-level strategy for obtaining constant-time multiplication for all pairs of points (well, except those with a = infinity).

We might be the first to have produced a fast, constant-time algorithm which works for all pairs of points. When developing this fix I checked both Trezor and OpenSSL, and their constant-time additions both have degenerate cases (specifically both don’t work when asked to double) which they avoid by being careful about how constadds are used in their larger algorithms.

Edit: We did discuss using this strategy; however we were uncertain how it interacted with our existing blinding and optimization tricks throughout the codebase (which OpenSSL and Trezor do not have to worry about) and none of us wanted to have to keep “oh, provably make sure you avoid the cases that break constadd” in our heads for all future work.

apoelstra renamed this:
~~Correct `secp256k1_gej_add_ge` with 2M - S performance hit~~
Correct `secp256k1_gej_add_ge` with 3% performance hit on signing
on Jun 25, 2015

gej_add_ge: fix degenerate case when computing P + (-lambda)P

If two points (x1, y1) and (x2, y2) are given to gej_add_ge with
x1 != x2 but y1 = -y2, the function gives a wrong answer since
this causes it to compute "lambda = 0/0" during an intermediate
step. (Here lambda refers to an auxiallary variable in the point
addition formula, not the cube-root of 1 used by the endomorphism
optimization.)

This commit catches the 0/0 and replaces it with an alternate
expression for lambda, cmov'ing it in place if necessary.

5de4c5dffd

tests: Add failing unit test for #257 (bad addition formula) 8c5d5f7b5b

Add tests for adding P+Q with P.x!=Q.x and P.y=-Q.y 765742021a

apoelstra force-pushed on Jun 29, 2015

sipa commented at 4:27 pm on June 29, 2015: contributor

ACK

sipa merged this on Jun 29, 2015

sipa closed this on Jun 29, 2015

sipa referenced this in commit 17f7148606 on Jun 29, 2015

apoelstra deleted the branch on Jun 19, 2017

Correct secp256k1_gej_add_ge with 3% performance hit on signing #261

Correct `secp256k1_gej_add_ge` with 3% performance hit on signing #261