- Trades 1 _half for 3 _mul_int and 2 _normalize_weak
Gives around 2-3% faster signing and ECDH, depending on compiler/platform.
ACK b6d109ddc80254e4ceedd7f22e6195a1ef37a087
I’ve written a basic test for the new function directly: https://github.com/sipa/secp256k1/commits/pr1033
```diff
@@ -124,4 +124,6 @@ static void secp256k1_fe_storage_cmov(secp256k1_fe_storage *r, const secp256k1_f
 /** If flag is true, set *r equal to *a; otherwise leave it. Constant-time. Both *r and *a must be initialized.*/
 static void secp256k1_fe_cmov(secp256k1_fe *r, const secp256k1_fe *a, int flag);
 
+static void secp256k1_fe_half(secp256k1_fe *r);
```
Looking at the comment in `gej_double`: https://github.com/bitcoin-core/secp256k1/blob/master/src/group_impl.h#L274-L280

I wonder if `half` can save the normalization when switching to the other formula.
https://hyperelliptic.org/EFD/g1p/auto-shortw-jacobian-0.html#doubling-dbl-2009-l gives:

```
A = X1^2
B = Y1^2
C = B^2
D = 2*((X1+B)^2-A-C)
E = 3*A
F = E^2
X3 = F-2*D
Y3 = E*(D-X3)-8*C
Z3 = 2*Y1*Z1
```
I think this is equivalent to:

```
A = X1^2
B = Y1^2
C = B^2
D = (X1+B)^2-A-C
E = 3*(A/2)
F = E^2
X3 = F-D
Y3 = E*(D/2-X3)-C
Z3 = Y1*Z1
```
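As a sanity check, here is a small Python sketch (mine, not part of the library) verifying that the halved variant yields the same affine point as the original dbl-2009-l on secp256k1, starting from the standard generator; `pow(v, -1, p)` needs Python 3.8+.

```python
# Compare the original dbl-2009-l doubling against the halved variant.
p = 2**256 - 2**32 - 977
X1 = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Y1 = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
Z1 = 1

def to_affine(X, Y, Z):
    zi = pow(Z, -1, p)
    return (X * zi * zi % p, Y * zi * zi * zi % p)

# Original dbl-2009-l
A = X1 * X1 % p; B = Y1 * Y1 % p; C = B * B % p
D = 2 * ((X1 + B)**2 - A - C) % p
E = 3 * A % p; F = E * E % p
X3 = (F - 2 * D) % p
Y3 = (E * (D - X3) - 8 * C) % p
Z3 = 2 * Y1 * Z1 % p

# Halved variant: D loses its factor 2, E gains a half, Z3 loses its factor 2
half = pow(2, -1, p)
D2 = ((X1 + B)**2 - A - C) % p
E2 = 3 * A * half % p; F2 = E2 * E2 % p
X3h = (F2 - D2) % p
Y3h = (E2 * (D2 * half - X3h) - C) % p
Z3h = Y1 * Z1 % p

assert to_affine(X3, Y3, Z3) == to_affine(X3h, Y3h, Z3h)
print("variants agree")
```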
> I wonder if `half` can save the normalization when switching to the other formula.
I might give it a try out of curiosity. It looks like the _half calls and extra _add calls can be paid for by removing several _mul_int calls, so maybe there’s a net gain.
Here is my reasoning for the output magnitude in `_fe_half`. Please review carefully, since the bound is exact (i.e. very tight).
Given the formula m_out = (m_in >> 1) + 1, we can just consider the worst case of an odd m_in. Also, the top-limb case is the same as for the lower limbs, except that there is no incoming carry from above and the corresponding limb(s) of P have a smaller value; so we deal only with a “middle limb” example where the added limb of P is maximal and a carry in from above is assumed.

Let odd m_in = 2.k + 1. Then the largest initial value of a lower limb permitted by the magnitude constraints is (2.k + 1).(2.X), where X = 2^52-1 (resp. 2^26-1 in 32-bit code). This value potentially has a limb of P added to it (the worst case for an individual limb is +X), is then shifted down, and finally has a carry added. The carry adds 2^51 (resp. 2^25) == (X + 1)/2:
floor(((2.k + 1).(2.X) + X)/2) + (X + 1)/2
Since the carry term (X + 1)/2 is an integer, we can move it inside the floor and rearrange:

floor((4.k.X + 4.X + 1)/2) = 2.k.X + 2.X = 2.(k + 1).X

which is exactly the per-limb bound corresponding to the calculated output magnitude: floor((2.k + 1)/2) + 1 == k + 1.
QED.
Edit: I have intentionally ignored any special treatment for a magnitude 0 input which technically could be left unchanged.
Hm, it didn’t occur to me that the analysis of the magnitude is that involved and I made a wrong assumption when reviewing this…
I believe your proof is correct but then we should maybe add a polished version of the proof to a comment and introduce tests that check the function with the exact largest input value (for even and odd input magnitudes).
I agree, especially about wanting better tests. I have some here that use _fe_negate to get “OK” inputs, but ideally I’d like to be able to generate a field element with maximal limbs for a given magnitude, except that the LSB should be set so that the adding of P will be triggered.
Possibly we need some testing-oriented method(s) alongside `_fe_verify` to allow precise construction of otherwise maybe-unreachable representations. E.g. a method to assert a new magnitude value would be helpful, since we can generally get the limbs we want but lose control of the magnitude. Then again, maybe it’s easier to just directly manipulate the specific field representation(s) under SECP256K1_WIDEMUL_INT64/128 defines?
> E.g. a method to assert a new magnitude value would be helpful, since we can generally get the limbs we want but lose control of the magnitude.
Sorry I can’t follow here. When you “assert a new magnitude” value, wouldn’t you have control then?
Could we just add 2P (?) to increase the magnitude artificially for testing?
> Then again, maybe it’s easier to just directly manipulate the specific field representation(s) under SECP256K1_WIDEMUL_INT64/128 defines?
You mean constructing special values by setting the fe members directly? Yeah, I think that’s what we should do for testing. Go’s crypto library has a similar thing, see this commit: https://github.com/golang/go/commit/d95ca9138026cbe40e0857d76a81a16d03230871
I clearly also went too fast over the new bound.
Here is my reasoning for it. It applies to `x = r->n[i]` (for i=0..8) in the 32-bit code. Think of x as a real number. Let `C = 2*0x3ffffff`.
On entrance to the function, we know that `x <= m*C`. In the first half of the function, at most `0x3ffffff` is added (the value of `mask`). Now `x <= (m+1/2)*C`. Then it is divided by two, leading to `x <= (m/2+1/4)*C`. Lastly, at most `1 << 25` is added (which is equal to `C/4 + 1/2`), giving `x <= (m/2+1/2)*C + 1/2`. This implies `x <= ceil(m/2 + 1/2)*C + 1/2`. And since `x`, `ceil(m/2 + 1/2)` and `C` are all integral, we can drop the `+ 1/2`, giving `x <= ceil(m/2 + 1/2)*C`, and `ceil(m/2 + 1/2)` equals the new magnitude `(m>>1)+1` for all natural m.
Let `D = 2*0x3fffff`. For the top limb `x = r->n[9]` we instead have on entrance `x <= m*D`. In the first half of the function, at most `0x3fffff` is added. Now `x <= (m+1/2)*D`. Then it is divided by two, leading to `x <= (m/2+1/4)*D`. This implies `x <= ceil(m/2+1/4)*D`, and `ceil(m/2+1/4)` equals the new magnitude `(m>>1)+1` for all natural m.
Similar reasoning can be used for the 64-bit code.
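A small Python sketch (my own check, not from the codebase) of the 32-bit bounds above, evaluating the worst-case integer limb values directly:

```python
# Verify the claimed output bounds for the 10x26 (32-bit) representation.
C = 2 * 0x3ffffff   # per-unit bound for the lower limbs n[0..8]
D = 2 * 0x3fffff    # per-unit bound for the top limb n[9]

for m in range(1, 33):
    # Lower limb: start at m*C, add the mask (0x3ffffff), halve, add carry 1 << 25.
    x = (m * C + 0x3ffffff) // 2 + (1 << 25)
    assert x <= ((m >> 1) + 1) * C
    # Top limb: start at m*D, add 0x3fffff, halve; no incoming carry from above.
    y = (m * D + 0x3fffff) // 2
    assert y <= ((m >> 1) + 1) * D
print("bounds hold for magnitudes 1..32")
```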
It’d be good to document this reasoning in the code.
@real-or-random By just following the “basic” doubling formula (affine `λ = (3/2)*X1^2/Y1`, `X3 = λ^2 - 2*X1`, `Y3 = λ*(X1 - X3) - Y1`), but computing the division by 2 with halving rather than moving it into the Z coordinate when going to Jacobian, you get:
```
L = (3/2) * X1^2
S = Y1^2
T = X1*S
X3 = L^2 - 2*T
Y3 = L*(T - X3) - S^2
Z3 = Z1*Y1
```
which seems even simpler.
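A quick numeric check (a Python sketch, not from the PR) that this formula matches textbook affine doubling on secp256k1, using the standard generator coordinates; `pow(v, -1, p)` needs Python 3.8+.

```python
# Double the generator both ways and compare.
p = 2**256 - 2**32 - 977
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

# Textbook affine doubling: lambda = 3*x^2 / (2*y)
lam = 3 * Gx * Gx * pow(2 * Gy, -1, p) % p
ax3 = (lam * lam - 2 * Gx) % p
ay3 = (lam * (Gx - ax3) - Gy) % p

# Proposed Jacobian formula, starting from (X1, Y1, Z1) = (Gx, Gy, 1)
half = pow(2, -1, p)
L = 3 * Gx * Gx * half % p       # L = (3/2) * X1^2
S = Gy * Gy % p                  # S = Y1^2
T = Gx * S % p                   # T = X1*S
X3 = (L * L - 2 * T) % p
Y3 = (L * (T - X3) - S * S) % p
Z3 = Gy % p                      # Z1 = 1, so Z3 = Z1*Y1 = Y1

# Convert back to affine and compare
zi2 = pow(Z3 * Z3 % p, -1, p)
assert X3 * zi2 % p == ax3
assert Y3 * zi2 * pow(Z3, -1, p) % p == ay3
print("doubling formula matches affine doubling")
```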
> Sorry I can’t follow here. When you “assert a new magnitude” value, wouldn’t you have control then?
Rephrased: we can generally create the limbs we want by simple addition through the _fe API, but not without the value of the magnitude variable going higher than we want. Therefore a new VERIFY-only method to let us just set the magnitude to what we want (this method would check the bounds of course) could be helpful.
However, I think I’ll start by using the direct-setting approach to get some worst-case tests in place.
I feel like it should be possible with one less negation:

```
Y3 = -(S^2 + L*(X3 - T))
```

Then only 2 negates are needed, for T and Y3. You could also choose to negate Z3 instead of Y3 (which is probably a speedup in itself).
By operation count this should really be faster; I will test it shortly here. I wanted to note though that it also has a pleasant effect on the magnitudes of the output field elements, which can be useful with #1032. With the input magnitudes of X1, Y1 constrained, I think you could even negate them instead of T and Z3, and then the output magnitudes would get even smaller.
> By operation count this should really be faster; I will test it shortly here.
For me it seems an improvement, most noticeably in ecdh (as expected), which it makes around 1-2% faster.
Cherry-picked new formula from @sipa and added a further refinement on top. Rebased to current master.
With new doubling formula ECDH is now around 4-5% faster (for the whole PR).
Edit: Interestingly, moving the negate from Y3 to Z3 can’t be done without changing some tests that appear to check for an exact expected z-ratio in some way that I haven’t investigated.
> Edit: Interestingly, moving the negate from Y3 to Z3 can’t be done without changing some tests that appear to check for an exact expected z-ratio in some way that I haven’t investigated.
Did you add a `secp256k1_fe_negate(rzr, &a->y)` in `secp256k1_gej_double_var`?
```diff
@@ -47,6 +47,37 @@ static void uncounting_illegal_callback_fn(const char* str, void* data) {
     (*p)--;
 }
 
+void max_fe_for_magnitude(secp256k1_fe *r, int m) {
```
There’s an `#ifdef VERIFY` there. We use `#ifdef VERIFY` in a few other places (also introduced by you ;)), so it may be good to keep it like this. In fact I think we should run the tests also without `VERIFY`, e.g., to catch compiler bugs that don’t appear when `VERIFY` is defined.
```diff
@@ -326,8 +324,6 @@ static void secp256k1_gej_double_var(secp256k1_gej *r, const secp256k1_gej *a, s
 
     if (rzr != NULL) {
         *rzr = a->y;
-        secp256k1_fe_normalize_weak(rzr);
```
> Did you add a `secp256k1_fe_negate(rzr, &a->y)` in `secp256k1_gej_double_var`?
Thanks, that was the problem. It didn’t pan out faster though, despite my hope that scheduling a negate of Z3 along with other linear ops earlier in the method might be slightly faster than a final negate of Y3.
The Sage group prover works with this new doubling formula:
```diff
--- a/sage/prove_group_implementations.sage
+++ b/sage/prove_group_implementations.sage
@@ -8,25 +8,20 @@ load("weierstrass_prover.sage")
 def formula_secp256k1_gej_double_var(a):
   """libsecp256k1's secp256k1_gej_double_var, used by various addition functions"""
   rz = a.Z * a.Y
-  rz = rz * 2
-  t1 = a.X^2
-  t1 = t1 * 3
-  t2 = t1^2
-  t3 = a.Y^2
-  t3 = t3 * 2
-  t4 = t3^2
-  t4 = t4 * 2
-  t3 = t3 * a.X
-  rx = t3
-  rx = rx * 4
-  rx = -rx
-  rx = rx + t2
-  t2 = -t2
-  t3 = t3 * 6
-  t3 = t3 + t2
-  ry = t1 * t3
-  t2 = -t4
-  ry = ry + t2
+  s = a.Y^2
+  l = a.X^2
+  l = l * 3
+  l = l / 2
+  t = -s
+  t = t * a.X
+  rx = l^2
+  rx = rx + t
+  rx = rx + t
+  s = s^2
+  t = t + rx
+  ry = t * l
+  ry = ry + s
+  ry = -ry
   return jacobianpoint(rx, ry, rz)
```
ACK b6631a809189fdbc9f31b710f995e7075160d536
Curious observation: in `secp256k1_gej_add_ge`, using `degenerate = secp256k1_fe_normalizes_to_zero(&m);` (dropping the `& secp256k1_fe_normalizes_to_zero(&rr)`) also works (and passes sage symbolic verification). I haven’t thought through why.
@sipa An interesting observation. I have been playing around with reversing which of the two formulae is considered “primary” and which the “alternate”, and presumably there is actually a set of cases where either formula works. I think there may be some small gains to be had here.
It might not be relevant, but do these sage scripts understand that the endomorphism exists so that (y1 == +/-y2) does not imply (x1 == x2)?
> I have been playing around with reversing which of the two formulae is considered “primary” and which the “alternate”
Actually, on review, that’s not quite what I was doing. Rather I had noted that in the degenerate case m and rr were 0 and so some of the cmov calls can actually be cadd or czero, which are marginally cheaper.
> It might not be relevant, but do these sage scripts understand that the endomorphism exists so that (y1 == +/-y2) does not imply (x1 == x2)?
They reason over (fractions and ideals of) polynomials over Q. My intuition says that that will discover all properties that could hold in some field, but I don’t know enough theory to know whether that’s true. The script does find the error in the “old” gej_add_ge code, which was only wrong due to the existence of (x1,y) and (x2,-y) points, IIRC.
```diff
@@ -56,6 +56,19 @@ static void secp256k1_fe_verify(const secp256k1_fe *a) {
     }
     VERIFY_CHECK(r == 1);
 }
+
+static void secp256k1_fe_get_bounds(secp256k1_fe *r, int m) {
```
This breaks when building without `VERIFY` (for example if you configure with `--enable-coverage`). I’d suggest to put only the magnitude/normalized/`fe_verify` check inside an `#ifdef VERIFY` block.
Or a `#define` that is set only when compiling tests.
```diff
+    /* Let m = r->magnitude
+     *     C = 0x3FFFFFFUL * 2
+     *     D = 0x03FFFFFUL * 2
+     *
+     * Initial bounds: t0..t8 <= C * m
+     *                 t9 <= D * m
```
This was fun to review. I get a 6% speedup in double and 2.5% in add.
I can’t run the sage code. I tried it on two different implementations, but I’m getting a `no canonical coercion from Multivariate Polynomial Ring` error.
- Trades 1 _half for 3 _mul_int and 2 _normalize_weak
- Updated formula and comments in _gej_add_ge
- Added internal benchmark for _fe_half
At count=64, this makes the test take around 1% of the total time.
- Add field method _fe_get_bounds
- formula_secp256k1_gej_double_var
- formula_secp256k1_gej_add_ge
> Curious observation: in `secp256k1_gej_add_ge`, using `degenerate = secp256k1_fe_normalizes_to_zero(&m);` (dropping the `& secp256k1_fe_normalizes_to_zero(&rr)`) also works (and passes sage symbolic verification). I haven’t thought through why.
I thought about this and I believe it works. The code currently switches to the alternative formula for lambda only if (R,M) = (0,0), but the alternative formula works whenever M = 0. Specifically, M = 0 implies y1 = -y2. If x1 = x2, then a = -b; this is the r = infinity case that we handle separately. If x1 != x2, then the denominator in the alternative formula is non-zero, so this formula is well-defined. (And I understand this means that it gives the right result?)
One needs to carefully check that the `infinity =` assignment is still correct, because now the definition of `m_alt` at this point in the code has changed. But this is true:
Case y1 = -y2 ==> degenerate = true ==> infinity = ((x1 - x2)Z = 0) & ~a->infinity. a->infinity is handled separately, and if ~a->infinity, then Z = Z1 != 0, so infinity = (x1 - x2 = 0) = (a = -b) by the case condition.

Case y1 != -y2 ==> degenerate = false ==> infinity = ((y1 + y2)Z = 0) & ~a->infinity. a->infinity is handled separately, and if ~a->infinity, then Z = Z1 != 0, so infinity = (y1 + y2 = 0) = false by the case condition.
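To illustrate the degenerate case concretely, here is a Python sketch (my own, not from the PR) that constructs two secp256k1 points with y2 = -y1 but x1 != x2 (possible because a nontrivial cube root of unity exists mod p) and checks that the chord formula lambda = (y1 - y2)/(x1 - x2) is well-defined there and yields a curve point:

```python
# Degenerate case M = 0 with x1 != x2 on secp256k1.
p = 2**256 - 2**32 - 977
x1 = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
y1 = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

# Find a nontrivial cube root of unity mod p (exists since p % 3 == 1)
g = 2
while pow(g, (p - 1) // 3, p) == 1:
    g += 1
beta = pow(g, (p - 1) // 3, p)

x2, y2 = beta * x1 % p, (-y1) % p        # same y^2, so (x2, -y1) is on the curve
assert y2 * y2 % p == (x2**3 + 7) % p
assert x1 != x2 and (y1 + y2) % p == 0   # M = 0 case, but x1 != x2

lam = (y1 - y2) * pow((x1 - x2) % p, -1, p) % p
x3 = (lam * lam - x1 - x2) % p
y3 = (lam * (x1 - x3) - y1) % p
assert y3 * y3 % p == (x3**3 + 7) % p    # the result is on the curve
print("alternative formula well-defined in the degenerate case")
```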
ACK e848c3799c4f31367c3ed98d17e3b7de504d4c6e
If you ever touch this again, maybe squash the commits that affect the doubling function. But no need to invalidate the ACKs for this.
Edit: Changed the PR title to reflect the change to `_gej_double`.