Eliminate scratch memory used when generating contexts #557
pull apoelstra wants to merge 6 commits into bitcoin-core:master from apoelstra:2018-09-no-scratch-memory-in-context-build changing 6 files +174 −59-
apoelstra commented at 9:43 pm on September 22, 2018: contributorBuilds on #553
-
in src/group_impl.h:175 in e1dacce4e6 outdated
171@@ -172,20 +172,23 @@ static void secp256k1_ge_set_all_gej_var(secp256k1_ge *r, const secp256k1_gej *a 172 } 173 } 174 175-static void secp256k1_ge_set_table_gej_var(secp256k1_ge *r, const secp256k1_gej *a, const secp256k1_fe *zr, size_t len) { 176+static void secp256k1_ge_set_table_gej_var(secp256k1_ge_storage *r, const secp256k1_gej *a, const secp256k1_fe *zr, size_t len) {
apoelstra commented at 9:51 pm on September 22, 2018:Oh, this function can be eliminated entirely actually.apoelstra force-pushed on Sep 22, 2018in src/ecmult_impl.h:196 in c4273d07cc outdated
205+ secp256k1_ge_set_gej_zinv(&p_ge, &pj, &zi); 206+ secp256k1_ge_from_storage(&last_ge, &pre[n - 1]); 207+ secp256k1_ge_to_storage(&pre[n - 1], &p_ge); 208+ 209+ /* Compute the actual x-coordinate of D, which will be needed below. */ 210+ secp256k1_fe_inv_var(&d.z, &d.z);
apoelstra commented at 9:54 pm on September 22, 2018:should batch this inversion with the one 5 lines upapoelstra force-pushed on Sep 22, 2018apoelstra cross-referenced this on Sep 23, 2018 from issue Signed-digit multi-comb for ecmult_gen by peterdettmangmaxwell commented at 0:30 am on September 30, 2018: contributorConcept ACK. Are you going to batch that extra inv?apoelstra commented at 9:01 pm on October 1, 2018: contributorYep, done.real-or-random cross-referenced this on Oct 22, 2018 from issue Enable context creation in preallocated memory by real-or-randomin src/group_impl.h:132 in 6d7b9b8a54 outdated
132+static void secp256k1_ge_set_all_gej_var(secp256k1_ge *r, const secp256k1_gej *a, size_t len) { 133+ secp256k1_fe u; 134 size_t i; 135- size_t count = 0; 136- az = (secp256k1_fe *)checked_malloc(cb, sizeof(secp256k1_fe) * len); 137+ size_t last_i = (size_t)-1;
gmaxwell commented at 0:19 am on October 28, 2018:Why no SIZE_MAX?real-or-random cross-referenced this on Oct 28, 2018 from issue Fix integer overflow in ecmult_multi_var when n is large by jonasnickapoelstra force-pushed on Oct 30, 2018apoelstra commented at 12:52 pm on October 30, 2018: contributorReplaced all instances of(size_t)-1)
withSIZE_MAX
. Confirmed thatgit grep -n '(size_t)\w*-1'
runs clean.apoelstra force-pushed on Nov 6, 2018apoelstra commented at 3:55 pm on November 6, 2018: contributorRebased.in src/group_impl.h:133 in 8b9cc17d8c outdated
133+ secp256k1_fe u; 134 size_t i; 135- size_t count = 0; 136- az = (secp256k1_fe *)checked_malloc(cb, sizeof(secp256k1_fe) * len); 137+ size_t last_i = SIZE_MAX; 138+ size_t first_i = SIZE_MAX;
sipa commented at 2:07 am on November 8, 2018:It seems you’re not actually usingfirst_i
for anything except an assert at the end. Is that intentional?
apoelstra commented at 2:29 pm on November 8, 2018:Yeah, though this was an algorithm-debugging check that probably should be removed now.in src/group_impl.h:152 in 8b9cc17d8c outdated
156+ secp256k1_fe_inv_var(&u, &r[last_i].x); 157 158- azi = (secp256k1_fe *)checked_malloc(cb, sizeof(secp256k1_fe) * count); 159- secp256k1_fe_inv_all_var(azi, az, count); 160- free(az); 161+ i = last_i + 1;
sipa commented at 2:22 am on November 8, 2018:I think this can be written more clearly as:
0i = last_i; 1while (i > 0) { 2 --i; 3 if (!a[i].infinity) { 4 secp256k1_fe_mul(&r[last_i].x, &r[i].x, &u); 5 secp256k1_fe_mul(&u, &u, &a[last_i].z); 6 last_i = i; 7 } 8}
apoelstra commented at 2:46 pm on November 8, 2018:Ah, nicein src/ecmult_impl.h:172 in f868315946 outdated
178+ pj.y = a_ge.y; 179+ pj.z = a->z; 180+ pj.infinity = 0; 181+ 182+ zr = d.z; 183+ secp256k1_fe_normalize(&zr);
sipa commented at 2:52 am on November 8, 2018:You can usesecp256k1_fe_normalize_var
everywhere in this function.
apoelstra commented at 2:46 pm on November 8, 2018:donein src/ecmult_impl.h:205 in f868315946 outdated
213+ 214+ i = n - 1; 215+ while (i > 0) { 216+ secp256k1_fe zi2, zi3; 217+ i--; 218+ /* For the remaining points, we extract the z-ratio from the stored
sipa commented at 8:22 am on November 8, 2018:Maybe point out that the stored z-ratio is extracted by thege_from_storage
at the end of the previous iteration of this loop.
apoelstra commented at 2:46 pm on November 8, 2018:Added comment.in src/ecmult_impl.h:210 in f868315946 outdated
219+ * x-coordinate, compute its z^-1 from that, and compute the full 220+ * point from that: */ 221+ secp256k1_fe_mul(&zi, &zi, &last_ge.x); 222+ secp256k1_fe_sqr(&zi2, &zi); 223+ secp256k1_fe_mul(&zi3, &zi2, &zi); 224+ /* To compute x, we observe that the z-ratio is simply `h` from
sipa commented at 8:27 am on November 8, 2018:Took me a minute to realize you were reasoning backwards here.
Perhaps explain that we’re given the
rzr
andr->y
output ofgej_add_ge_var
, as well as the inverse ofr->y
(by multiplying the Z ratios with the overall Z inverse in the first half of this loop), and the (constant)b->x
andb->y
input (which refer to D), and are trying to compute (the affine version of) thea
input to it.
apoelstra commented at 2:46 pm on November 8, 2018:Expanded and rearranged this comment block.apoelstra commented at 2:46 pm on November 8, 2018: contributorAddressed all comments.apoelstra force-pushed on Nov 8, 2018sipa commented at 0:11 am on November 9, 2018: contributorACK, please squash the fixes in the respective commits?
It’s a really cool trick; I hadn’t expected it would be possible to reconstruct everything from Y and Z-ratios, but the formulas work out.
It seems this actually makes
secp256k1_context_create(SECP256K1_CONTEXT_VERIFY)
around 6% faster for me (when disabling static precomputation…).ecmult_gen_impl: eliminate scratch memory used when generating context 7f7a2ed3a8apoelstra force-pushed on Nov 9, 2018apoelstra commented at 0:17 am on November 9, 2018: contributorSquashed.ecmult_impl: eliminate scratch memory used when generating context 47045270faecmult_impl: save one fe_inv_var 84740acd2aadd `secp256k1_ge_set_all_gej_var` test which deals with many infinite points ffd3b346feapoelstra force-pushed on Nov 9, 2018peterdettman commented at 8:09 am on November 9, 2018: contributorI think it’s a bit easier to understand if the z-ratios are stored with the y-values that they’ll be used with (instead of off by 1).
Like this: https://github.com/peterdettman/secp256k1/commit/f1ab49c5dfb11933ea64299c0c769a36eed4fa87
in src/ecmult_impl.h:232 in ffd3b346fe outdated
241+ * 242+ * Rearranging and dividing by `z^2` to convert to affine, we get 243+ * 244+ * x = d_x - rzr / z^2 245+ * = d_x - rzr * zi2 246+ */
peterdettman commented at 8:13 am on November 9, 2018:The comments here could be improved with respect to thed.z
factors that are missing/implicit in the equations.
apoelstra commented at 1:17 pm on November 9, 2018:Sure, so above where I just say “this is equal to
h
” I could write out0rzr = h 1 = d.x * z^2 - x * d.z^2 2 = d.x * z^2 - x because d.z = 1
Would that be sufficient?
peterdettman commented at 4:35 pm on November 9, 2018:I’m referring to the original d.z from the doubling at line 152 (call this d_z), not d_ge.z == 1. So the conversion to affine also involves this d_z:x_affine = d_x / d_z^2 - rzr / z^2 / d_z^2
and note that:zi == 1 / z / d_z
in each loop iteration.
apoelstra commented at 4:41 pm on November 9, 2018:Ah, right – I sayd_x
in the comments but then use a variable calleddx_over_dz_squared
, which is confusing.
peterdettman commented at 8:13 am on November 10, 2018:Yes, also the comments gloss over the 1/d_z factor in zi.apoelstra commented at 4:04 pm on November 9, 2018: contributorI think @peterdettman’s commit is great, it does simplify the logic a fair bit.Store z-ratios in the 'x' coord they'll recover efa783f8f0apoelstra commented at 1:44 pm on November 10, 2018: contributorAdded both of Peter’s commits squashed into one, and an addition commit that adds even more comments explaining how the effective-affine isomorphic curve plays into everything. @peterdettman do you think this is better? @sipa can you comment on this? We’re back to “working backward” in the sense that I do a series of equations starting from abstract coordinates and ending up with values that have variable bindings, but it’s all pretty explicit now.in src/ecmult_impl.h:208 in 2e93fed97c outdated
203+ * to move from our isomorphic curve back to secp256k1. We 204+ * applied this multiplier above to the inverse-z-coordinate 205+ * computing `pre[n-1]`, and since all subsequent inverse-zs 206+ * are computed as multiples of this value, they will also 207+ * have the multiplier applied to them without any further 208+ * computation.
apoelstra commented at 1:52 pm on November 10, 2018:Actually this block is confusing and does not distinguish between the two curves. Will rewrite.apoelstra force-pushed on Nov 10, 2018apoelstra force-pushed on Nov 10, 2018ecmult_impl: expand comment to explain how effective affine interacts with everything b3bf5f99a3apoelstra force-pushed on Nov 10, 2018sipa commented at 3:43 am on November 11, 2018: contributor@apoelstra Looks very readable now; I’ll review in detail soon.peterdettman commented at 5:49 am on November 12, 2018: contributorYes, better, thanks for improving that.gmaxwell commented at 5:19 pm on November 17, 2018: contributorNice!sipa commented at 2:33 am on November 20, 2018: contributorutACK b3bf5f99a3251e3d72ffde1f39158af6ea133e33apoelstra commented at 4:13 pm on November 26, 2018: contributorReady for merge?sipa merged this on Nov 26, 2018sipa closed this on Nov 26, 2018
sipa referenced this in commit e34ceb333b on Nov 26, 2018apoelstra deleted the branch on Nov 26, 2018gmaxwell commented at 11:44 pm on May 13, 2019: contributorAddition/subtraction with constants are effectively free except sometimes in the tightest of loops, the compiler will usually remove them, when it doesn’t they usually get hidden by pipelines. This code is run-once code, it should be optimized for clarity and maintainability most and testability much more than for size (within reason) and performance.gmaxwell referenced this in commit a484e0008b on May 25, 2019real-or-random cross-referenced this on Jun 17, 2019 from issue Low-footprint mode by gmaxwellapoelstra cross-referenced this on Jul 12, 2023 from issue group: ge(j) should have as invariant that the curve equation holds by real-or-random
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-30 03:15 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me