Eliminate scratch memory used when generating contexts

apoelstra commented at 9:43 pm on September 22, 2018: contributor

Builds on #553

in src/group_impl.h:175 in e1dacce4e6 outdated

171@@ -172,20 +172,23 @@ static void secp256k1_ge_set_all_gej_var(secp256k1_ge *r, const secp256k1_gej *a
172     }
173 }
174 
175-static void secp256k1_ge_set_table_gej_var(secp256k1_ge *r, const secp256k1_gej *a, const secp256k1_fe *zr, size_t len) {
176+static void secp256k1_ge_set_table_gej_var(secp256k1_ge_storage *r, const secp256k1_gej *a, const secp256k1_fe *zr, size_t len) {

apoelstra commented at 9:51 pm on September 22, 2018:

Oh, this function can be eliminated entirely actually.

apoelstra force-pushed on Sep 22, 2018

in src/ecmult_impl.h:196 in c4273d07cc outdated

205+    secp256k1_ge_set_gej_zinv(&p_ge, &pj, &zi);
206+    secp256k1_ge_from_storage(&last_ge, &pre[n - 1]);
207+    secp256k1_ge_to_storage(&pre[n - 1], &p_ge);
208+
209+    /* Compute the actual x-coordinate of D, which will be needed below. */
210+    secp256k1_fe_inv_var(&d.z, &d.z);

apoelstra commented at 9:54 pm on September 22, 2018:

should batch this inversion with the one 5 lines up

apoelstra force-pushed on Sep 22, 2018

apoelstra cross-referenced this on Sep 23, 2018 from issue Signed-digit multi-comb for ecmult_gen by peterdettman

gmaxwell commented at 0:30 am on September 30, 2018: contributor

Concept ACK. Are you going to batch that extra inv?

apoelstra commented at 9:01 pm on October 1, 2018: contributor

Yep, done.

apoelstra commented at 2:49 pm on October 17, 2018: contributor

@sipa can you take a look at this?

real-or-random cross-referenced this on Oct 22, 2018 from issue Enable context creation in preallocated memory by real-or-random

in src/group_impl.h:132 in 6d7b9b8a54 outdated

132+static void secp256k1_ge_set_all_gej_var(secp256k1_ge *r, const secp256k1_gej *a, size_t len) {
133+    secp256k1_fe u;
134     size_t i;
135-    size_t count = 0;
136-    az = (secp256k1_fe *)checked_malloc(cb, sizeof(secp256k1_fe) * len);
137+    size_t last_i = (size_t)-1;

gmaxwell commented at 0:19 am on October 28, 2018:

Why no SIZE_MAX?

real-or-random cross-referenced this on Oct 28, 2018 from issue Fix integer overflow in ecmult_multi_var when n is large by jonasnick

apoelstra force-pushed on Oct 30, 2018

apoelstra commented at 12:52 pm on October 30, 2018: contributor

Replaced all instances of (size_t)-1) with SIZE_MAX. Confirmed that git grep -n '(size_t)\w*-1' runs clean.

sipa commented at 2:36 am on November 6, 2018: contributor

Needs rebase on #553 (not urgent, I’ll review soon rebased or not).

apoelstra force-pushed on Nov 6, 2018

apoelstra commented at 3:55 pm on November 6, 2018: contributor

Rebased.

in src/group_impl.h:133 in 8b9cc17d8c outdated

133+    secp256k1_fe u;
134     size_t i;
135-    size_t count = 0;
136-    az = (secp256k1_fe *)checked_malloc(cb, sizeof(secp256k1_fe) * len);
137+    size_t last_i = SIZE_MAX;
138+    size_t first_i = SIZE_MAX;

sipa commented at 2:07 am on November 8, 2018:

It seems you’re not actually using first_i for anything except an assert at the end. Is that intentional?

apoelstra commented at 2:29 pm on November 8, 2018:

Yeah, though this was an algorithm-debugging check that probably should be removed now.

in src/group_impl.h:152 in 8b9cc17d8c outdated

156+    secp256k1_fe_inv_var(&u, &r[last_i].x);
157 
158-    azi = (secp256k1_fe *)checked_malloc(cb, sizeof(secp256k1_fe) * count);
159-    secp256k1_fe_inv_all_var(azi, az, count);
160-    free(az);
161+    i = last_i + 1;

sipa commented at 2:22 am on November 8, 2018:

I think this can be written more clearly as:

0i = last_i;
1while (i > 0) {
2    --i;
3    if (!a[i].infinity) {
4        secp256k1_fe_mul(&r[last_i].x, &r[i].x, &u);
5        secp256k1_fe_mul(&u, &u, &a[last_i].z);
6        last_i = i;
7    }
8}

apoelstra commented at 2:46 pm on November 8, 2018:

Ah, nice

in src/ecmult_impl.h:172 in f868315946 outdated

178+    pj.y = a_ge.y;
179+    pj.z = a->z;
180+    pj.infinity = 0;
181+
182+    zr = d.z;
183+    secp256k1_fe_normalize(&zr);

sipa commented at 2:52 am on November 8, 2018:

You can use secp256k1_fe_normalize_var everywhere in this function.

apoelstra commented at 2:46 pm on November 8, 2018:

done

in src/ecmult_impl.h:205 in f868315946 outdated

213+
214+    i = n - 1;
215+    while (i > 0) {
216+        secp256k1_fe zi2, zi3;
217+        i--;
218+        /* For the remaining points, we extract the z-ratio from the stored

sipa commented at 8:22 am on November 8, 2018:

Maybe point out that the stored z-ratio is extracted by the ge_from_storage at the end of the previous iteration of this loop.

apoelstra commented at 2:46 pm on November 8, 2018:

Added comment.

in src/ecmult_impl.h:210 in f868315946 outdated

219+         * x-coordinate, compute its z^-1 from that, and compute the full
220+         * point from that: */
221+        secp256k1_fe_mul(&zi, &zi, &last_ge.x);
222+        secp256k1_fe_sqr(&zi2, &zi);
223+        secp256k1_fe_mul(&zi3, &zi2, &zi);
224+        /* To compute x, we observe that the z-ratio is simply `h` from

sipa commented at 8:27 am on November 8, 2018:

Took me a minute to realize you were reasoning backwards here.

Perhaps explain that we’re given the rzr and r->y output of gej_add_ge_var, as well as the inverse of r->y (by multiplying the Z ratios with the overall Z inverse in the first half of this loop), and the (constant) b->x and b->y input (which refer to D), and are trying to compute (the affine version of) the a input to it.

apoelstra commented at 2:46 pm on November 8, 2018:

Expanded and rearranged this comment block.

apoelstra commented at 2:46 pm on November 8, 2018: contributor

Addressed all comments.

apoelstra force-pushed on Nov 8, 2018

sipa commented at 0:11 am on November 9, 2018: contributor

ACK, please squash the fixes in the respective commits?

It’s a really cool trick; I hadn’t expected it would be possible to reconstruct everything from Y and Z-ratios, but the formulas work out.

It seems this actually makes secp256k1_context_create(SECP256K1_CONTEXT_VERIFY) around 6% faster for me (when disabling static precomputation…).

ecmult_gen_impl: eliminate scratch memory used when generating context 7f7a2ed3a8

apoelstra force-pushed on Nov 9, 2018

apoelstra commented at 0:17 am on November 9, 2018: contributor

Squashed.

ecmult_impl: eliminate scratch memory used when generating context 47045270fa

ecmult_impl: save one fe_inv_var 84740acd2a

add `secp256k1_ge_set_all_gej_var` test which deals with many infinite points ffd3b346fe

apoelstra force-pushed on Nov 9, 2018

peterdettman commented at 8:09 am on November 9, 2018: contributor

I think it’s a bit easier to understand if the z-ratios are stored with the y-values that they’ll be used with (instead of off by 1).

Like this: https://github.com/peterdettman/secp256k1/commit/f1ab49c5dfb11933ea64299c0c769a36eed4fa87

in src/ecmult_impl.h:232 in ffd3b346fe outdated

241+         *
242+         * Rearranging and dividing by `z^2` to convert to affine, we get
243+         *
244+         *     x = d_x - rzr / z^2
245+         *       = d_x - rzr * zi2
246+         */

peterdettman commented at 8:13 am on November 9, 2018:

The comments here could be improved with respect to the d.z factors that are missing/implicit in the equations.

apoelstra commented at 1:17 pm on November 9, 2018:

Sure, so above where I just say “this is equal to h” I could write out

0rzr = h
1    = d.x * z^2 - x * d.z^2
2    = d.x * z^2 - x    because d.z = 1

Would that be sufficient?

peterdettman commented at 4:35 pm on November 9, 2018:

I’m referring to the original d.z from the doubling at line 152 (call this d_z), not d_ge.z == 1. So the conversion to affine also involves this d_z: x_affine = d_x / d_z^2 - rzr / z^2 / d_z^2 and note that: zi == 1 / z / d_z in each loop iteration.

apoelstra commented at 4:41 pm on November 9, 2018:

Ah, right – I say d_x in the comments but then use a variable called dx_over_dz_squared, which is confusing.

peterdettman commented at 8:13 am on November 10, 2018:

Yes, also the comments gloss over the 1/d_z factor in zi.

apoelstra commented at 4:04 pm on November 9, 2018: contributor

I think @peterdettman’s commit is great, it does simplify the logic a fair bit.

Store z-ratios in the 'x' coord they'll recover efa783f8f0

apoelstra commented at 1:44 pm on November 10, 2018: contributor

Added both of Peter’s commits squashed into one, and an addition commit that adds even more comments explaining how the effective-affine isomorphic curve plays into everything. @peterdettman do you think this is better? @sipa can you comment on this? We’re back to “working backward” in the sense that I do a series of equations starting from abstract coordinates and ending up with values that have variable bindings, but it’s all pretty explicit now.

in src/ecmult_impl.h:208 in 2e93fed97c outdated

203+     *           to move from our isomorphic curve back to secp256k1. We
204+     *           applied this multiplier above to the inverse-z-coordinate
205+     *           computing `pre[n-1]`, and since all subsequent inverse-zs
206+     *           are computed as multiples of this value, they will also
207+     *           have the multiplier applied to them without any further
208+     *           computation.

apoelstra commented at 1:52 pm on November 10, 2018:

Actually this block is confusing and does not distinguish between the two curves. Will rewrite.

apoelstra force-pushed on Nov 10, 2018

ecmult_impl: expand comment to explain how effective affine interacts with everything b3bf5f99a3

apoelstra force-pushed on Nov 10, 2018

sipa commented at 3:43 am on November 11, 2018: contributor

@apoelstra Looks very readable now; I’ll review in detail soon.

peterdettman commented at 5:49 am on November 12, 2018: contributor

Yes, better, thanks for improving that.

gmaxwell commented at 5:19 pm on November 17, 2018: contributor

Nice!

sipa commented at 2:33 am on November 20, 2018: contributor

utACK b3bf5f99a3251e3d72ffde1f39158af6ea133e33

apoelstra commented at 4:13 pm on November 26, 2018: contributor

Ready for merge?

sipa merged this on Nov 26, 2018

sipa closed this on Nov 26, 2018

sipa referenced this in commit e34ceb333b on Nov 26, 2018

apoelstra deleted the branch on Nov 26, 2018

gmaxwell commented at 11:44 pm on May 13, 2019: contributor

Addition/subtraction with constants are effectively free except sometimes in the tightest of loops, the compiler will usually remove them, when it doesn’t they usually get hidden by pipelines. This code is run-once code, it should be optimized for clarity and maintainability most and testability much more than for size (within reason) and performance.

gmaxwell referenced this in commit a484e0008b on May 25, 2019

real-or-random cross-referenced this on Jun 17, 2019 from issue Low-footprint mode by gmaxwell

apoelstra cross-referenced this on Jul 12, 2023 from issue group: ge(j) should have as invariant that the curve equation holds by real-or-random

Eliminate scratch memory used when generating contexts #557