Correct secp256k1_gej_add_ge with 3% performance hit on signing #261

pull apoelstra wants to merge 4 commits into bitcoin-core:master from apoelstra:add-fix changing 2 files +157 −20
  1. apoelstra commented at 1:24 pm on June 24, 2015: contributor

    Fixes https://github.com/bitcoin/secp256k1/issues/257 by catching the degenerate case, which appears as an intermediate variable being 0/0, and using an alternate expression for that variable in this case.

    This results in only a ~3% perf hit on signing. There are no changes in the multiplication or squaring count; this is entirely due to additions and cmovs. “I doubt we can do better” but I said that about an earlier revision with a 6% hit and Peter Dettman found a way to beat it, so my doubts aren’t worth much :)

  2. gej_add_ge: rearrange algebra
    There is zero functionality or opcount changes here; I need to do
    this to make sure both R and M are computed before they are used,
    since a future patch will replace either none or both of them.
    
    Also compute r->y directly in terms of r->x, which again will be
    used in a future patch.
    bcf2fcfd3a
  3. apoelstra cross-referenced this on Jun 24, 2015 from issue Add constant-time point multplication for ECDH by apoelstra
  4. in src/group_impl.h: in 80efd6849a outdated
    519+     * this case the correct value of lambda is (y1 - y2)/(x1 - x2),
    520+     * so set R/M equal to this. */
    521+    secp256k1_fe_negate(&rr_alt, &s2, 1);   /* rr = -Y2*Z1^3  */
    522+    secp256k1_fe_add(&rr_alt, &s1);         /* rr = Y1*Z2^3 - Y2*Z1^3 */
    523+    secp256k1_fe_negate(&m_alt, &u2, 1);    /* m = -X2*Z1^2 */
    524+    secp256k1_fe_add(&m_alt, &u1);          /* m = X1*Z2^2 - Y1*Z2^2 */
    


    peterdettman commented at 2:09 pm on June 24, 2015:
    0/* m = X1*Z2^2 - X2*Z1^2 */
    
  5. in src/group_impl.h: in 80efd6849a outdated
    513+    /** If lambda = R/M = 0/0 we have a problem (except in the "trivial"
    514+     *  case that Z = z1z2 = 0, and this is special-cased later on). */
    515+    degenerate = secp256k1_fe_normalizes_to_zero(&m) *
    516+                 secp256k1_fe_normalizes_to_zero(&rr) *
    517+                 (1 - a->infinity);
    518+    /* This only occurs when x1 = beta*x2 or x1 = beta^2*x2. In
    


    peterdettman commented at 2:34 pm on June 24, 2015:
    0x1 = beta*x2 or beta*x1 = x2
    

    might make it clearer there are just the two possibilities.


    apoelstra commented at 2:42 pm on June 24, 2015:
    Yeah, agreed. And I should probably add that these are exactly the nontrivial cases for which x1^3 == x2^3, which explains why there are only the two of them.
  6. in src/group_impl.h: in 80efd6849a outdated
    514+     *  case that Z = z1z2 = 0, and this is special-cased later on). */
    515+    degenerate = secp256k1_fe_normalizes_to_zero(&m) *
    516+                 secp256k1_fe_normalizes_to_zero(&rr) *
    517+                 (1 - a->infinity);
    518+    /* This only occurs when x1 = beta*x2 or x1 = beta^2*x2. In
    519+     * this case the correct value of lambda is (y1 - y2)/(x1 - x2),
    


    apoelstra commented at 2:37 pm on June 24, 2015:
    Oh, this is the reciprocal of what it should be.
  7. in src/group_impl.h: in 80efd6849a outdated
    510-    secp256k1_fe_mul(&r->z, &m, &z);                                /* r->z = M*Z (1) */
    511+    secp256k1_fe_mul(&tt, &u1, &u2); secp256k1_fe_negate(&tt, &tt, 1); /* t = -U1*U2 (2) */
    512+    secp256k1_fe_add(&rr, &tt);                                        /* rr = R = T^2-U1*U2 (3) */
    513+    /** If lambda = R/M = 0/0 we have a problem (except in the "trivial"
    514+     *  case that Z = z1z2 = 0, and this is special-cased later on). */
    515+    degenerate = secp256k1_fe_normalizes_to_zero(&m) *
    


    sipa commented at 2:44 pm on June 24, 2015:
    Using & instead of * is probably slightly faster.

    apoelstra commented at 2:50 pm on June 24, 2015:
    Oh, I can also probably drop the * (1 - a->infinity) since if this is 0, the cmov code at the bottom of the function will overwrite everything I do anyway.

    sipa commented at 2:51 pm on June 24, 2015:
    Indeed!
  8. in src/group_impl.h: in 80efd6849a outdated
    523+    secp256k1_fe_negate(&m_alt, &u2, 1);    /* m = -X2*Z1^2 */
    524+    secp256k1_fe_add(&m_alt, &u1);          /* m = X1*Z2^2 - Y1*Z2^2 */
    525+
    526+    secp256k1_fe_cmov(&rr_alt, &rr, !degenerate);
    527+    secp256k1_fe_cmov(&m_alt, &m, !degenerate);
    528+    /* Now Ralt / Malt = lambda and is guaranteed not to be 0/0.
    


    sipa commented at 2:52 pm on June 24, 2015:
    Would be nice if you could also update the algebra in the comments above to reflect this case.
  9. in src/group_impl.h: in 80efd6849a outdated
    529+     * We keep the old values in R, M because we will need them
    530+     * for other reasons. */
    531+    secp256k1_fe_sqr(&n, &m_alt);                       /* n = M^2 (1) */
    532+    secp256k1_fe_mul(&q, &n, &t);                       /* q = Q = T*M^2 (1) */
    533+    secp256k1_fe_mul(&n, &n, &m);                       /* n = M^4 (1) */
    534+    secp256k1_fe_mul(&n, &n, &m_alt);                   /* n = M^4 (1) */
    


    peterdettman commented at 3:02 pm on June 24, 2015:

    I can’t test code right now, but it seems to me that 526,527:

    0n = n * m * m_alt
    

    could be replaced with:

    0n = n * m^2
    

    because either m == m_alt or m == 0.

    Or even:

    0n = n^2; cmov n,m,degenerate
    

    apoelstra commented at 3:28 pm on June 24, 2015:
    I can’t test for another hour, but if this cmov thing works I think it erases the perf hit!

    apoelstra commented at 3:47 pm on June 24, 2015:

    I tried it (using vim on phone keyboard, ugh, but I couldn’t wait..). It works, and drops the perf hit I measure from 5.7% to 3.8%! :D

    Edit: Now I see 2.9% hit for signing: 48.7us vs 47.3us.

  10. apoelstra force-pushed on Jun 24, 2015
  11. apoelstra commented at 4:45 pm on June 24, 2015: contributor
    Push to fix everyone’s comments (comment fixes and Peter’s awesome speedup).
  12. in src/group_impl.h: in 335ac43b8d outdated
    516+                 secp256k1_fe_normalizes_to_zero(&rr);
    517+    /* This only occurs when y1 == -y2 and x1^3 == x2^3, but x1 != x2.
    518+     * This means either x1 == beta*x2 or beta*x1 == x2, where beta is
    519+     * a nontrivial cube root of one. In either case, an alternate
    520+     * non-indeterminate expression for lambda is (x1 - x2)/(y1 - y2),
    521+     * so we set R/M equal to this. */
    


    apoelstra commented at 10:13 pm on June 24, 2015:
    Oh, I had the expression for lambda right the first time :) it should be (y1 - y2)/(x1 - x2).

    sipa commented at 9:56 pm on June 25, 2015:
    Is it (y1 - y2)/(x1 - x2) or the inverse? The comment still says the inverse.
  13. gmaxwell commented at 3:26 am on June 25, 2015: contributor
    We might want to update the citation above to point out that the additive law given in the citation is actually not complete, lest some unwary traveler see the citation and go there rather than reading the code.
  14. in src/group_impl.h: in 335ac43b8d outdated
    533+    secp256k1_fe_sqr(&n, &m_alt);                       /* n = Malt^2 (1) */
    534+    secp256k1_fe_mul(&q, &n, &t);                       /* q = Q = T*Malt^2 (1) */
    535+    /* These two lines use a clever observation of Peter Dettman's,
    536+     * that either M == Malt or M == 0. So M^3 * Malt is either Malt^4
    537+     * (which is computed by squaring), or zero (which is computed by
    538+     * the cmov). So the cost is one squaring versus two multiplications. */
    


    peterdettman commented at 5:24 am on June 25, 2015:
    It’s of course nice to be appreciated, but I think code comments would be better kept to the immediate technical details.

    gmaxwell commented at 5:56 am on June 25, 2015:
    Direct experience shows that people do find attribution in commit messages just fine. E.g. I had people contact me on the inversion free R == r trick.

    apoelstra commented at 10:15 pm on June 25, 2015:
    If you’d like me to remove your name, I will. But I agree with @gmaxwell that it’s probably good for readers that it be there.

    gmaxwell commented at 11:26 pm on June 25, 2015:
    Actually I was saying that people will find it if it’s just in the commit message rather than a comment in the code.

    apoelstra commented at 2:52 am on June 26, 2015:
    Oh :) well if you’re both against me I’ll take it out of the source comment.

    sipa commented at 11:54 am on June 26, 2015:

    I think we eventually want some AUTHORS document that explains who contributed what - that’s easier to access than digging through commit messages, and less invasive than putting it in the code.

    Some documentation in the source tree that explains the various tricks and optimizations implemented would be even nicer.

  15. in src/group_impl.h: in 335ac43b8d outdated
    506     secp256k1_fe_sqr(&rr, &t);                          /* rr = T^2 (1) */
    507-    secp256k1_fe_mul(&t, &u1, &u2); secp256k1_fe_negate(&t, &t, 1); /* t = -U1*U2 (2) */
    508-    secp256k1_fe_add(&rr, &t);                                      /* rr = R = T^2-U1*U2 (3) */
    509-    secp256k1_fe_sqr(&t, &rr);                                      /* t = R^2 (1) */
    510-    secp256k1_fe_mul(&r->z, &m, &z);                                /* r->z = M*Z (1) */
    511+    secp256k1_fe_mul(&tt, &u1, &u2); secp256k1_fe_negate(&tt, &tt, 1); /* t = -U1*U2 (2) */
    


    sipa commented at 8:19 am on June 25, 2015:
    comment: tt instead of t
  16. apoelstra commented at 12:38 pm on June 25, 2015: contributor
    For what it’s worth, the citation does say that it’s assuming the y1 = -y2 case does not occur.
  17. apoelstra force-pushed on Jun 25, 2015
  18. apoelstra force-pushed on Jun 25, 2015
  19. apoelstra commented at 9:57 pm on June 25, 2015: contributor

    Push with comment fixes and a much-expanded comment about our high-level strategy for obtaining constant-time multiplication for all pairs of points (well, except those with a = infinity).

    We might be the first to have produced a fast, constant-time algorithm which works for all pairs of points. When developing this fix I checked both Trezor and OpenSSL, and their constant-time additions both have degenerate cases (specifically both don’t work when asked to double) which they avoid by being careful about how constadds are used in their larger algorithms.

    Edit: We did discuss using this strategy; however we were uncertain how it interacted with our existing blinding and optimization tricks throughout the codebase (which OpenSSL and Trezor do not have to worry about) and none of us wanted to have to keep “oh, provably make sure you avoid the cases that break constadd” in our heads for all future work.

  20. apoelstra renamed this:
    Correct `secp256k1_gej_add_ge` with 2M - S performance hit
    Correct `secp256k1_gej_add_ge` with 3% performance hit on signing
    on Jun 25, 2015
  21. gej_add_ge: fix degenerate case when computing P + (-lambda)P
    If two points (x1, y1) and (x2, y2) are given to gej_add_ge with
    x1 != x2 but y1 = -y2, the function gives a wrong answer since
    this causes it to compute "lambda = 0/0" during an intermediate
    step. (Here lambda refers to an auxiallary variable in the point
    addition formula, not the cube-root of 1 used by the endomorphism
    optimization.)
    
    This commit catches the 0/0 and replaces it with an alternate
    expression for lambda, cmov'ing it in place if necessary.
    5de4c5dffd
  22. tests: Add failing unit test for #257 (bad addition formula) 8c5d5f7b5b
  23. Add tests for adding P+Q with P.x!=Q.x and P.y=-Q.y 765742021a
  24. apoelstra force-pushed on Jun 29, 2015
  25. sipa commented at 4:27 pm on June 29, 2015: contributor
    ACK
  26. sipa merged this on Jun 29, 2015
  27. sipa closed this on Jun 29, 2015

  28. sipa referenced this in commit 17f7148606 on Jun 29, 2015
  29. apoelstra deleted the branch on Jun 19, 2017

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-30 01:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me