Eliminate scratch memory used when generating contexts #557

pull apoelstra wants to merge 6 commits into bitcoin-core:master from apoelstra:2018-09-no-scratch-memory-in-context-build changing 6 files +174 −59
  1. apoelstra commented at 9:43 pm on September 22, 2018: contributor
    Builds on #553
  2. in src/group_impl.h:175 in e1dacce4e6 outdated
    171@@ -172,20 +172,23 @@ static void secp256k1_ge_set_all_gej_var(secp256k1_ge *r, const secp256k1_gej *a
    172     }
    173 }
    174 
    175-static void secp256k1_ge_set_table_gej_var(secp256k1_ge *r, const secp256k1_gej *a, const secp256k1_fe *zr, size_t len) {
    176+static void secp256k1_ge_set_table_gej_var(secp256k1_ge_storage *r, const secp256k1_gej *a, const secp256k1_fe *zr, size_t len) {
    


    apoelstra commented at 9:51 pm on September 22, 2018:
    Oh, this function can be eliminated entirely actually.
  3. apoelstra force-pushed on Sep 22, 2018
  4. in src/ecmult_impl.h:196 in c4273d07cc outdated
    205+    secp256k1_ge_set_gej_zinv(&p_ge, &pj, &zi);
    206+    secp256k1_ge_from_storage(&last_ge, &pre[n - 1]);
    207+    secp256k1_ge_to_storage(&pre[n - 1], &p_ge);
    208+
    209+    /* Compute the actual x-coordinate of D, which will be needed below. */
    210+    secp256k1_fe_inv_var(&d.z, &d.z);
    


    apoelstra commented at 9:54 pm on September 22, 2018:
    should batch this inversion with the one 5 lines up
  5. apoelstra force-pushed on Sep 22, 2018
  6. apoelstra cross-referenced this on Sep 23, 2018 from issue Signed-digit multi-comb for ecmult_gen by peterdettman
  7. gmaxwell commented at 0:30 am on September 30, 2018: contributor
    Concept ACK. Are you going to batch that extra inv?
  8. apoelstra commented at 9:01 pm on October 1, 2018: contributor
    Yep, done.
  9. apoelstra commented at 2:49 pm on October 17, 2018: contributor
    @sipa can you take a look at this?
  10. real-or-random cross-referenced this on Oct 22, 2018 from issue Enable context creation in preallocated memory by real-or-random
  11. in src/group_impl.h:132 in 6d7b9b8a54 outdated
    132+static void secp256k1_ge_set_all_gej_var(secp256k1_ge *r, const secp256k1_gej *a, size_t len) {
    133+    secp256k1_fe u;
    134     size_t i;
    135-    size_t count = 0;
    136-    az = (secp256k1_fe *)checked_malloc(cb, sizeof(secp256k1_fe) * len);
    137+    size_t last_i = (size_t)-1;
    


    gmaxwell commented at 0:19 am on October 28, 2018:
    Why no SIZE_MAX?
  12. real-or-random cross-referenced this on Oct 28, 2018 from issue Fix integer overflow in ecmult_multi_var when n is large by jonasnick
  13. apoelstra force-pushed on Oct 30, 2018
  14. apoelstra commented at 12:52 pm on October 30, 2018: contributor
    Replaced all instances of (size_t)-1) with SIZE_MAX. Confirmed that git grep -n '(size_t)\w*-1' runs clean.
  15. sipa commented at 2:36 am on November 6, 2018: contributor
    Needs rebase on #553 (not urgent, I’ll review soon rebased or not).
  16. apoelstra force-pushed on Nov 6, 2018
  17. apoelstra commented at 3:55 pm on November 6, 2018: contributor
    Rebased.
  18. in src/group_impl.h:133 in 8b9cc17d8c outdated
    133+    secp256k1_fe u;
    134     size_t i;
    135-    size_t count = 0;
    136-    az = (secp256k1_fe *)checked_malloc(cb, sizeof(secp256k1_fe) * len);
    137+    size_t last_i = SIZE_MAX;
    138+    size_t first_i = SIZE_MAX;
    


    sipa commented at 2:07 am on November 8, 2018:
    It seems you’re not actually using first_i for anything except an assert at the end. Is that intentional?

    apoelstra commented at 2:29 pm on November 8, 2018:
    Yeah, though this was an algorithm-debugging check that probably should be removed now.
  19. in src/group_impl.h:152 in 8b9cc17d8c outdated
    156+    secp256k1_fe_inv_var(&u, &r[last_i].x);
    157 
    158-    azi = (secp256k1_fe *)checked_malloc(cb, sizeof(secp256k1_fe) * count);
    159-    secp256k1_fe_inv_all_var(azi, az, count);
    160-    free(az);
    161+    i = last_i + 1;
    


    sipa commented at 2:22 am on November 8, 2018:

    I think this can be written more clearly as:

    0i = last_i;
    1while (i > 0) {
    2    --i;
    3    if (!a[i].infinity) {
    4        secp256k1_fe_mul(&r[last_i].x, &r[i].x, &u);
    5        secp256k1_fe_mul(&u, &u, &a[last_i].z);
    6        last_i = i;
    7    }
    8}
    

    apoelstra commented at 2:46 pm on November 8, 2018:
    Ah, nice
  20. in src/ecmult_impl.h:172 in f868315946 outdated
    178+    pj.y = a_ge.y;
    179+    pj.z = a->z;
    180+    pj.infinity = 0;
    181+
    182+    zr = d.z;
    183+    secp256k1_fe_normalize(&zr);
    


    sipa commented at 2:52 am on November 8, 2018:
    You can use secp256k1_fe_normalize_var everywhere in this function.

    apoelstra commented at 2:46 pm on November 8, 2018:
    done
  21. in src/ecmult_impl.h:205 in f868315946 outdated
    213+
    214+    i = n - 1;
    215+    while (i > 0) {
    216+        secp256k1_fe zi2, zi3;
    217+        i--;
    218+        /* For the remaining points, we extract the z-ratio from the stored
    


    sipa commented at 8:22 am on November 8, 2018:
    Maybe point out that the stored z-ratio is extracted by the ge_from_storage at the end of the previous iteration of this loop.

    apoelstra commented at 2:46 pm on November 8, 2018:
    Added comment.
  22. in src/ecmult_impl.h:210 in f868315946 outdated
    219+         * x-coordinate, compute its z^-1 from that, and compute the full
    220+         * point from that: */
    221+        secp256k1_fe_mul(&zi, &zi, &last_ge.x);
    222+        secp256k1_fe_sqr(&zi2, &zi);
    223+        secp256k1_fe_mul(&zi3, &zi2, &zi);
    224+        /* To compute x, we observe that the z-ratio is simply `h` from
    


    sipa commented at 8:27 am on November 8, 2018:

    Took me a minute to realize you were reasoning backwards here.

    Perhaps explain that we’re given the rzr and r->y output of gej_add_ge_var, as well as the inverse of r->y (by multiplying the Z ratios with the overall Z inverse in the first half of this loop), and the (constant) b->x and b->y input (which refer to D), and are trying to compute (the affine version of) the a input to it.


    apoelstra commented at 2:46 pm on November 8, 2018:
    Expanded and rearranged this comment block.
  23. apoelstra commented at 2:46 pm on November 8, 2018: contributor
    Addressed all comments.
  24. apoelstra force-pushed on Nov 8, 2018
  25. sipa commented at 0:11 am on November 9, 2018: contributor

    ACK, please squash the fixes in the respective commits?

    It’s a really cool trick; I hadn’t expected it would be possible to reconstruct everything from Y and Z-ratios, but the formulas work out.

    It seems this actually makes secp256k1_context_create(SECP256K1_CONTEXT_VERIFY) around 6% faster for me (when disabling static precomputation…).

  26. ecmult_gen_impl: eliminate scratch memory used when generating context 7f7a2ed3a8
  27. apoelstra force-pushed on Nov 9, 2018
  28. apoelstra commented at 0:17 am on November 9, 2018: contributor
    Squashed.
  29. ecmult_impl: eliminate scratch memory used when generating context 47045270fa
  30. ecmult_impl: save one fe_inv_var 84740acd2a
  31. add `secp256k1_ge_set_all_gej_var` test which deals with many infinite points ffd3b346fe
  32. apoelstra force-pushed on Nov 9, 2018
  33. peterdettman commented at 8:09 am on November 9, 2018: contributor

    I think it’s a bit easier to understand if the z-ratios are stored with the y-values that they’ll be used with (instead of off by 1).

    Like this: https://github.com/peterdettman/secp256k1/commit/f1ab49c5dfb11933ea64299c0c769a36eed4fa87

  34. in src/ecmult_impl.h:232 in ffd3b346fe outdated
    241+         *
    242+         * Rearranging and dividing by `z^2` to convert to affine, we get
    243+         *
    244+         *     x = d_x - rzr / z^2
    245+         *       = d_x - rzr * zi2
    246+         */
    


    peterdettman commented at 8:13 am on November 9, 2018:
    The comments here could be improved with respect to the d.z factors that are missing/implicit in the equations.

    apoelstra commented at 1:17 pm on November 9, 2018:

    Sure, so above where I just say “this is equal to h” I could write out

    0rzr = h
    1    = d.x * z^2 - x * d.z^2
    2    = d.x * z^2 - x    because d.z = 1
    

    Would that be sufficient?


    peterdettman commented at 4:35 pm on November 9, 2018:
    I’m referring to the original d.z from the doubling at line 152 (call this d_z), not d_ge.z == 1. So the conversion to affine also involves this d_z: x_affine = d_x / d_z^2 - rzr / z^2 / d_z^2 and note that: zi == 1 / z / d_z in each loop iteration.

    apoelstra commented at 4:41 pm on November 9, 2018:
    Ah, right – I say d_x in the comments but then use a variable called dx_over_dz_squared, which is confusing.

    peterdettman commented at 8:13 am on November 10, 2018:
    Yes, also the comments gloss over the 1/d_z factor in zi.
  35. apoelstra commented at 4:04 pm on November 9, 2018: contributor
    I think @peterdettman’s commit is great, it does simplify the logic a fair bit.
  36. Store z-ratios in the 'x' coord they'll recover efa783f8f0
  37. apoelstra commented at 1:44 pm on November 10, 2018: contributor
    Added both of Peter’s commits squashed into one, and an addition commit that adds even more comments explaining how the effective-affine isomorphic curve plays into everything. @peterdettman do you think this is better? @sipa can you comment on this? We’re back to “working backward” in the sense that I do a series of equations starting from abstract coordinates and ending up with values that have variable bindings, but it’s all pretty explicit now.
  38. in src/ecmult_impl.h:208 in 2e93fed97c outdated
    203+     *           to move from our isomorphic curve back to secp256k1. We
    204+     *           applied this multiplier above to the inverse-z-coordinate
    205+     *           computing `pre[n-1]`, and since all subsequent inverse-zs
    206+     *           are computed as multiples of this value, they will also
    207+     *           have the multiplier applied to them without any further
    208+     *           computation.
    


    apoelstra commented at 1:52 pm on November 10, 2018:
    Actually this block is confusing and does not distinguish between the two curves. Will rewrite.
  39. apoelstra force-pushed on Nov 10, 2018
  40. apoelstra force-pushed on Nov 10, 2018
  41. ecmult_impl: expand comment to explain how effective affine interacts with everything b3bf5f99a3
  42. apoelstra force-pushed on Nov 10, 2018
  43. sipa commented at 3:43 am on November 11, 2018: contributor
    @apoelstra Looks very readable now; I’ll review in detail soon.
  44. peterdettman commented at 5:49 am on November 12, 2018: contributor
    Yes, better, thanks for improving that.
  45. gmaxwell commented at 5:19 pm on November 17, 2018: contributor
    Nice!
  46. sipa commented at 2:33 am on November 20, 2018: contributor
    utACK b3bf5f99a3251e3d72ffde1f39158af6ea133e33
  47. apoelstra commented at 4:13 pm on November 26, 2018: contributor
    Ready for merge?
  48. sipa merged this on Nov 26, 2018
  49. sipa closed this on Nov 26, 2018

  50. sipa referenced this in commit e34ceb333b on Nov 26, 2018
  51. apoelstra deleted the branch on Nov 26, 2018
  52. gmaxwell commented at 11:44 pm on May 13, 2019: contributor
    Addition/subtraction with constants are effectively free except sometimes in the tightest of loops, the compiler will usually remove them, when it doesn’t they usually get hidden by pipelines. This code is run-once code, it should be optimized for clarity and maintainability most and testability much more than for size (within reason) and performance.
  53. gmaxwell referenced this in commit a484e0008b on May 25, 2019
  54. real-or-random cross-referenced this on Jun 17, 2019 from issue Low-footprint mode by gmaxwell
  55. apoelstra cross-referenced this on Jul 12, 2023 from issue group: ge(j) should have as invariant that the curve equation holds by real-or-random

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-23 23:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me