Native jacobi symbol algorithm

sipa commented at 10:54 pm on September 23, 2021: contributor

This introduces variants of the vartime divsteps-based GCD algorithm used for modular inverses to compute Jacobi symbols. Changes compared to the normal vartime divsteps:

Only positive matrices are used, guaranteeing that f and g remain positive.
An additional jac variable is updated to track sign changes during matrix computation.
There is (so far) no proof that this algorithm terminates within reasonable amount of time for every input, but experimentally it appears to almost always need less than 900 iterations. To account for that, only a bounded number of iterations is performed (1500), after which failure is returned. The field logic then falls back to using square roots to determining the result.
The algorithm converges to f=g=gcd(f0,g0) rather than g=0. To keep this test simple, the end condition is f=1, which won’t be reached if started with g=0. That case is dealt with specially.

This code is currently unused, except for tests. I don’t aim for it to be merged until there is a need for it, but this demonstrates its feasibility.

In terms of performance:

0field_inverse: min 1.76us / avg 1.76us / max 1.78us
1field_inverse_var: min 0.991us / avg 0.993us / max 0.996us
2field_jacobi_var: min 1.31us / avg 1.31us / max 1.31us
3field_sqrt: min 4.36us / avg 4.37us / max 4.40us

while with the older (f24e122d13db7061b1086ddfd21d3a1c5294213b) libgmp based Jacobi code on the same system:

0num_jacobi: min 1.53us / avg 1.54us / max 1.55us

sipa force-pushed on Sep 23, 2021

sipa cross-referenced this on Sep 28, 2021 from issue Add Elligator Square module by sipa

dhruv cross-referenced this on Nov 3, 2021 from issue BIP324: CKey encode/decode to elligator-swift by dhruv

in src/modinv32.h:41 in 042f91bf4a outdated

38@@ -39,4 +39,7 @@ static void secp256k1_modinv32_var(secp256k1_modinv32_signed30 *x, const secp256
39 /* Same as secp256k1_modinv32_var, but constant time in x (not in the modulus). */
40 static void secp256k1_modinv32(secp256k1_modinv32_signed30 *x, const secp256k1_modinv32_modinfo *modinfo);
41 
42+/* Compute the Jacobi symbol for x (where gcd(x,p) == 1), 0 if it cannot be determined. */
43+static int secp256k1_jacobi32_maybe_var(const secp256k1_modinv32_signed30 *x, const secp256k1_modinv32_modinfo *modinfo);

robot-dreams commented at 5:32 pm on November 8, 2021:

Nit: This API is probably ok since you specify that gcd(x, p) == 1, but is it still dangerous to return 0 here, instead of some other constant (like #define JACOBI_VAR_FAILURE -2 or whatever) that can’t possibly be the value of a jacobi symbol?

sipa commented at 9:31 pm on November 9, 2021:

Good idea. Done.

in src/tests.c:963 in 042f91bf4a outdated

828+            negone[0] = mod[0] - 1;
829+            for (i = 1; i < 16; ++i) negone[i] = mod[i];
830+            mulmod256(sqr, sqr, negone, mod);
831+            uint16_to_signed30(&x, sqr);
832+            CHECK(secp256k1_jacobi32_maybe_var(&x, &m) == -nonzero);
833+        }

robot-dreams commented at 5:44 pm on November 8, 2021:

Given that (-1 | n) = (-1)^((n - 1) / 2), I think we can make a stronger claim here that x and -x have opposite jacobi symbols if and only if mod % 4 == 3.

Would it make sense to get some more test cases here (and in the jacobi64 version below) by considering both possibilities, e.g. as follows?

0        negone[0] = mod[0] - 1;
1        for (i = 1; i < 16; ++i) negone[i] = mod[i];
2        mulmod256(sqr, sqr, negone, mod);
3        uint16_to_signed30(&x, sqr);
4        /* x and -x have opposite jacobi symbols if and only if mod % 4 == 3. */
5        if ((mod[0] & 3) == 3) {
6            CHECK(secp256k1_jacobi32_maybe_var(&x, &m) == -nonzero);
7        } else {
8            CHECK(secp256k1_jacobi32_maybe_var(&x, &m) == nonzero);
9        }

sipa commented at 9:32 pm on November 9, 2021:

Nice catch, done. I was hoping to have a test that guarantees finding a -1 result, but that’s hard without a per-modulus search for a non-square factor. This is pretty useful too, though.

in src/field_5x52_impl.h:691 in 042f91bf4a outdated

591+         * to computing a square root. This should be extremely rare with random
592+         * input. */
593+        secp256k1_fe dummy;
594+        ret = 2*secp256k1_fe_sqrt(&dummy, &tmp) - 1;
595+    }
596+    return ret;

robot-dreams commented at 8:01 pm on November 8, 2021:

Nit: Is it worth adding an extra verification step here, to always compute the jacobi symbol both ways for tests?

0    } else {
1#ifdef VERIFY
2        secp256k1_fe dummy;
3        VERIFY_CHECK(2*secp256k1_fe_sqrt(&dummy, &tmp) - 1 == ret);
4#endif
5    }
6    return ret;

sipa commented at 9:47 pm on November 9, 2021:

Done.

robot-dreams commented at 10:01 pm on November 8, 2021: contributor

tACK 042f91bf4acca0e1d70ca6afffe045a525a7b614

./tests passed, including the additional checks I suggested.

I’m still brushing up on the divsteps algorithm, and will review this change again once I’ve worked through the various details / optimizations.

Benchmark results look good on x86_64 (MacBook Pro): native jacobi version is faster than libgmp. However, on arm64 (Mac M1), native jacobi version was slightly slower!

On M1 I also tried ./configure --with-asm=arm --enable-experimental to see if that would speed up the native jacobi version, but attempting to build resulted in a lot of compile errors in field_10x26_arm.s.

MacBook Pro (x86_64)

native jacobi branch

0$ ./bench_internal field
1...
2field_jacobi_var: min 2.03us / avg 2.25us / max 3.85us
3field_sqrt: min 6.33us / avg 6.38us / max 6.46us
4
5$ ./bench_internal jacobi
6field_jacobi_var: min 2.00us / avg 2.06us / max 2.15us

libgmp branch f24e122d13db7061b1086ddfd21d3a1c5294213b:

0$ ./bench_internal field
1...
2field_sqrt: min 6.05us / avg 6.13us / max 6.38us
3
4$ ./bench_internal num
5num_jacobi: min 2.44us / avg 2.47us / max 2.53us

Mac M1 (arm64)

native jacobi branch

0% ./bench_internal field
1...
2field_jacobi_var: min 1.54us / avg 1.55us / max 1.56us
3field_sqrt: min 4.04us / avg 4.09us / max 4.29us
4
5% ./bench_internal jacobi
6field_jacobi_var: min 1.55us / avg 1.74us / max 3.18us

libgmp branch f24e122d13db7061b1086ddfd21d3a1c5294213b:

0% ./bench_internal field
1...
2field_sqrt: min 4.04us / avg 4.04us / max 4.05us
3
4% ./bench_internal jacobi
5...
6num_jacobi: min 1.41us / avg 1.41us / max 1.42us

real-or-random commented at 8:09 am on November 9, 2021: contributor

@robot-dreams Always nice to see new contributors in the repo. :) Just curious, are you simply helping or are you interested in getting this merged? If the latter, I’m further curious about your use case. Feel free just ignore my intrusive questions.

robot-dreams commented at 5:25 pm on November 9, 2021: contributor

@real-or-random Thanks for the reply! I’m simply helping and try to learn about this repo, I don’t have a personal use case.

in src/modinv64_impl.h:327 in 042f91bf4a outdated

322+        eta -= zeros;
323+        i -= zeros;
324+        /* Update the bottom bit of jac: when dividing g by an odd power of 2,
325+         * if (f mod 8) is 3 or 5, the Jacobi symbol changes sign. */
326+        jac ^= (zeros & ((f >> 1) ^ (f >> 2)));
327+        /* We're done once we've done 60 posdivsteps. */

robot-dreams commented at 5:30 pm on November 9, 2021:

Nit: 62 posdivsteps?

sipa commented at 4:30 pm on November 10, 2021:

Done.

in src/modinv32_impl.h:350 in 042f91bf4a outdated

345+        eta -= zeros;
346+        i -= zeros;
347+        /* Update the bottom bit of jac: when dividing g by an odd power of 2,
348+         * if (f mod 8) is 3 or 5, the Jacobi symbol changes sign. */
349+        jac ^= (zeros & ((f >> 1) ^ (f >> 2)));
350+        /* We're done once we've done 60 posdivsteps. */

robot-dreams commented at 5:37 pm on November 9, 2021:

Nit: 30 posdivsteps?

sipa commented at 4:30 pm on November 10, 2021:

Done.

sipa force-pushed on Nov 9, 2021

sipa commented at 10:24 pm on November 9, 2021: contributor

@robot-dreams Thanks for your comments!

The failure with asm=arm is expected, as Apple M1 is 64-bit ARM, while the assembly code used here is 32-bit.

sipa commented at 10:57 pm on November 9, 2021: contributor

Made another change: in VERIFY mode, the upper bound on the number of posdivsteps iterations is reduced to ~750 (which is close to the median of how many iterations are needed for random input mod the field size), so that the uncomputable path is actually exercised in the tests.

in src/modinv64_impl.h:333 in 6e3fcc4f51 outdated

328+        if (i == 0) break;
329+        VERIFY_CHECK((f & 1) == 1);
330+        VERIFY_CHECK((g & 1) == 1);
331+        VERIFY_CHECK((u * f0 + v * g0) == f << (62 - i));
332+        VERIFY_CHECK((q * f0 + r * g0) == g << (62 - i));
333+        /* If eta is negative, negate it and replace f,g with g,-f. */

robot-dreams commented at 3:35 am on November 10, 2021:

Nit: replace f,g with g,f?

sipa commented at 4:30 pm on November 10, 2021:

Done.

in src/modinv32_impl.h:356 in 6e3fcc4f51 outdated

351+        if (i == 0) break;
352+        VERIFY_CHECK((f & 1) == 1);
353+        VERIFY_CHECK((g & 1) == 1);
354+        VERIFY_CHECK((u * f0 + v * g0) == f << (30 - i));
355+        VERIFY_CHECK((q * f0 + r * g0) == g << (30 - i));
356+        /* If eta is negative, negate it and replace f,g with g,-f. */

robot-dreams commented at 3:54 am on November 10, 2021:

Nit: replace f,g with g,f?

sipa commented at 4:30 pm on November 10, 2021:

Done.

in src/modinv64_impl.h:345 in 6e3fcc4f51 outdated

340+            /* Update bottom bit of jac: when swapping f and g, the Jacobi symbol changes sign
341+             * if both f and g are 3 mod 4. */
342+            jac ^= ((f & g) >> 1);
343+            /* Use a formula to cancel out up to 6 bits of g. Also, no more than i can be cancelled
344+             * out (as we'd be done before that point), and no more than eta+1 can be done as its
345+             * will flip again once that happens. */

robot-dreams commented at 4:02 am on November 10, 2021:

Nit: sign will flip?

sipa commented at 4:29 pm on November 10, 2021:

Done.

robot-dreams commented at 4:44 am on November 10, 2021: contributor

ACK 6e3fcc4f5149c96ab052f3312091dfc0b8eab397

I agree with your modifications to the original safegcd for calculating Jacobi symbol instead of modular inverse. I also replicated the logic for the 64-bit version in Python, and checked that the result matches a “slow” Jacobi function for a few randomly generated inputs (see https://gist.github.com/robot-dreams/ceb00162b80384f2ae1913aaa2b35e75).

One potential concern is that with this change, there are now 4 slightly different implementations of a similar algorithm (32/64-bit, inv/jacobi). Do you think there’s a nice way to consolidate the logic, at least among a given word size? (If there’s no nice way or if it’s too much trouble, I’m fine with leaving it as is and relying on having good tests.)

Other questions:

What was the motivation for using different approaches to clear the low-order bits of g between the 32 and 64 bit implementations (8-bit lookup table vs 4 and 6-bit tricks from Hacker’s Delight)?
I agree that replacing f, g = g, -f with f, g = g, f ensures that all matrix entries are always positive. But aside from actually observing that it converges to f = 1 in practice, do you have any other intuitive justification for this? (The best I can come up with is that since you’re constantly dividing g by 2 and swapping the roles of f and g, you’re quickly moving towards a solution even if you replace g - f with g + f in the original GCD algorithm.)
Is there already a writeup describing this change? If not, would it make sense to add a brief section to https://github.com/bitcoin-core/secp256k1/blob/master/doc/safegcd_implementation.md? (If you think so, I’d be happy to give that a shot.)

robot-dreams commented at 4:57 am on November 10, 2021: contributor

@sipa Thanks for responding to the comments!

Yes, 32-bit makes sense (in retrospect the fact that the file was called field_10x26 instead of field_5x52 should have given that away)
Great idea to reduce the number of iterations for validation!

sipa force-pushed on Nov 10, 2021

sipa commented at 4:28 pm on November 10, 2021: contributor

One potential concern is that with this change, there are now 4 slightly different implementations of a similar algorithm (32/64-bit, inv/jacobi). Do you think there’s a nice way to consolidate the logic, at least among a given word size? (If there’s no nice way or if it’s too much trouble, I’m fine with leaving it as is and relying on having good tests.)

Arguably there are 6 implementations (32/64-bit, and const/var/varpos divsteps), and they do share some of the code (in particular the matrix application), but not everything. I don’t think there is an obvious way of merging more code without reducing performance, but I also haven’t tried very hard.

What was the motivation for using different approaches to clear the low-order bits of g between the 32 and 64 bit implementations (8-bit lookup table vs 4 and 6-bit tricks from Hacker’s Delight)?

Just dumb benchmarking. The formula was faster on x86_64 systems we tried, and the table was faster on ARMv7 systems (it’s explained in one of the commit messages that introduced it).

I agree that replacing f, g = g, -f with f, g = g, f ensures that all matrix entries are always positive. But aside from actually observing that it converges to f = 1 in practice, do you have any other intuitive justification for this? (The best I can come up with is that since you’re constantly dividing g by 2 and swapping the roles of f and g, you’re quickly moving towards a solution even if you replace g - f with g + f in the original GCD algorithm.)

I don’t, no. I do have a proof that any f>0,g>0 converges to f=g somewhere in a finite number of steps, but the upper bound is something like f*g*log2(f*g), which is obviously useless for practical purposes. The technique explained in https://github.com/sipa/safegcd-bounds completely fails for posdivsteps (doesn’t converge at all). With a lot of tricks I got it to converge, but in what appears quadratic (O(log2(f)^2)), and it’s only tractable to evaluate for f up to 2^20 or so.

The difficulty is that there are both steps that take f and g further apart (g /= 2) and steps that bring them closer (g = (f+g)/2), and it’s hard to infer tight bounds on how many times one can happen in between two executions of the other.

So in short, no; the actual argument why this works is purely empirical; all out of a 2.5 billion random inputs (with the field size modulus) need between 630 and 921 iterations.

Is there already a writeup describing this change? If not, would it make sense to add a brief section to https://github.com/bitcoin-core/secp256k1/blob/master/doc/safegcd_implementation.md? (If you think so, I’d be happy to give that a shot.)

I think that would be valuable, yes. I haven’t done so yet, but was considering it. Feel free to take a shot at it.

sipa force-pushed on Nov 12, 2021

sipa commented at 4:37 pm on November 12, 2021: contributor

Included @robot-dreams’s update to the writeup.

in src/modinv32_impl.h:675 in 902d57e075 outdated

660@@ -584,4 +661,69 @@ static void secp256k1_modinv32_var(secp256k1_modinv32_signed30 *x, const secp256
661     *x = d;
662 }
663 
664+/* Compute the Jacobi symbol of x modulo modinfo->modulus (variable time). gcd(x,modulus) must be 1. */

robot-dreams commented at 6:46 pm on November 15, 2021:

Nits (also applies to modinv64_impl):

Is gcd(x,modulus) must be 1 still true, since you’re handling x=0 here?
Is it necessary to also mention x must be normalized here for consistency with modinv32.h?

sipa commented at 8:28 pm on November 16, 2021:

I made a few changes to address this.

robot-dreams commented at 6:52 pm on November 15, 2021: contributor

ACK 5af3d290b6eabd6dd5cb2565caab24c0ceb8b8f9 aside from comment nits—I checked the range diff against the old commit and it looks good.

sipa force-pushed on Nov 16, 2021

robot-dreams commented at 9:22 pm on November 16, 2021: contributor

ACK 53b7ce28bef8008787c1aeb202ec17ef3a3b3325

sipa closed this on Jan 29, 2022

sipa reopened this on Jan 29, 2022

sipa force-pushed on Jun 28, 2022

sipa cross-referenced this on Jul 3, 2022 from issue Add x-only ecmult_const version with x specified as n/d by sipa

sipa cross-referenced this on Jul 21, 2022 from issue ElligatorSwift + integrated x-only DH by sipa

mratsim cross-referenced this on Aug 8, 2022 from issue Perf: `isSquare` - constant-time Jacobi/Kronecker/Legendre symbol using fast GCD by mratsim

sipa force-pushed on Nov 2, 2022

sipa force-pushed on Nov 4, 2022

in doc/safegcd_implementation.md:789 in d70f0fefc9 outdated

784+        # After
785+        if delta > 0 and g & 1:
786+            delta, f, g = 1 - delta, g, (g + f) // 2
787+```
788+
789+The algorithm is still correct, since the changed divstep, called a "posdivstep" (see section 8.4 and E.5 in the paper) preserves *gcd(f, g)*. However, there's no proof that the modified algorithm will converge. The justification for posdivsteps is completely empirical: in practice, it appears that the vast majority of inputs converge to *f=g=gcd(f<sub>0</sub>, g<sub>0<sub>)* in a number of steps proportional to their logarithm.

jonasnick commented at 10:25 pm on December 5, 2022:

g<sub>0<sub> -> g<sub>0</sub>

sipa commented at 4:45 pm on December 10, 2022:

Fixed.

in src/bench_internal.c:233 in d70f0fefc9 outdated

225+    for (i = 0; i < iters; i++) {
226+        j += secp256k1_fe_jacobi_var(&data->fe[0]);
227+        secp256k1_fe_add(&data->fe[0], &data->fe[1]);
228+    }
229+    CHECK(j <= iters);
230+}

jonasnick commented at 3:55 pm on December 10, 2022:

./bench_internal jacobi crashes on my machine when configured with CPPFLAGS=-DVERIFY. Backtrace:

 0src/field_5x52_impl.h:57: test condition failed: r == 1
 1
 2Program received signal SIGABRT, Aborted.
 30x00007ffff7e3fbc7 in __pthread_kill_implementation () from /nix/store/hsk71z8admvgykn7vzjy11dfnar9f4r1-glibc-2.35-163/lib/libc.so.6
 4(gdb) bt
 5[#0](/bitcoin-core-secp256k1/0/)  0x00007ffff7e3fbc7 in __pthread_kill_implementation () from /nix/store/hsk71z8admvgykn7vzjy11dfnar9f4r1-glibc-2.35-163/lib/libc.so.6
 6[#1](/bitcoin-core-secp256k1/1/)  0x00007ffff7df2b46 in raise () from /nix/store/hsk71z8admvgykn7vzjy11dfnar9f4r1-glibc-2.35-163/lib/libc.so.6
 7[#2](/bitcoin-core-secp256k1/2/)  0x00007ffff7ddd4b5 in abort () from /nix/store/hsk71z8admvgykn7vzjy11dfnar9f4r1-glibc-2.35-163/lib/libc.so.6
 8[#3](/bitcoin-core-secp256k1/3/)  0x0000000000402920 in secp256k1_fe_verify (a=<optimized out>) at src/field_5x52_impl.h:57
 9[#4](/bitcoin-core-secp256k1/4/)  0x000000000040c1f8 in secp256k1_fe_add (a=0x99, r=0x69) at src/field_5x52_impl.h:443
10[#5](/bitcoin-core-secp256k1/5/)  bench_field_jacobi_var (arg=arg@entry=0x7fffffffcf20, iters=iters@entry=20000) at src/bench_internal.c:227
11[#6](/bitcoin-core-secp256k1/6/)  0x0000000000413d5a in run_benchmark (name=0x416221 "field_jacobi_var", benchmark=0x40bdd0 <bench_field_jacobi_var>, setup=0x40cd70 <bench_setup>,
12    teardown=0x0, data=0x7fffffffcf20, count=10, iter=20000) at src/bench.h:109
13[#7](/bitcoin-core-secp256k1/7/)  0x000000000040147d in main (argc=2, argv=0x7fffffffd798) at src/bench_internal.c:393

sipa commented at 4:45 pm on December 10, 2022:

Fixed. It was just a missing normalize in the benchmark.

in src/tests.c:3205 in d70f0fefc9 outdated

2893@@ -2854,8 +2894,10 @@ void run_sqrt(void) {
2894         for (j = 0; j < count; j++) {
2895             random_fe(&x);
2896             secp256k1_fe_sqr(&s, &x);
2897+            CHECK(secp256k1_fe_jacobi_var(&s) == 1);
2898             test_sqrt(&s, &x);
2899             secp256k1_fe_negate(&t, &s, 1);
2900+            CHECK(secp256k1_fe_jacobi_var(&t) == -1);

jonasnick commented at 3:59 pm on December 10, 2022:

Is it possible that that secp256k1_fe_jacobi_var(&s) is taking the slow secp256k1_fe_sqrt branch about 50% of the time? I discovered this on my machine with this simple patch that adds a printf:

 0diff --git a/src/field_5x52_impl.h b/src/field_5x52_impl.h
 1index 26e89123..d3eb3347 100644
 2--- a/src/field_5x52_impl.h
 3+++ b/src/field_5x52_impl.h
 4@@ -681,6 +681,7 @@ static int secp256k1_fe_jacobi_var(const secp256k1_fe *x) {
 5          * to computing a square root. This should be extremely rare with random
 6          * input. */
 7         secp256k1_fe dummy;
 8+        printf("slow");
 9         ret = 2*secp256k1_fe_sqrt(&dummy, &tmp) - 1;
10 #ifdef VERIFY
11     } else {

I’m quite confused because should be “extremely rare with random input”, but s appears to be a random field element. Furthermore, “slow” isn’t printed in the jacobi benchmarks unless I compile with -DVERIFY (which crashes after a few iterations, see other comment).

sipa commented at 4:04 pm on December 10, 2022:

Yes, deliberately.

WIth -DVERIFY, the cut-off is at ~900 iterations of the algorithm, chosen to be roughly the median of the number of iterations needed. Without -DVERIFY, ~1500 iterations is the limit, which has probability less than 1 in a billion (probably a lot less).

It’s using a lower limit in -DVERIFY to make sure the slow branch is exercised in tests.

jonasnick commented at 4:46 pm on December 10, 2022:

Ah ok, that’s good. I did not looked into secp256k1_jacobi64_maybe_var so far and hadn’t considered that its behavior would change on VERIFY.

sipa force-pushed on Dec 10, 2022

sipa commented at 5:02 pm on December 10, 2022: contributor

I removed a determinant check in the 64-bit jacobi code, as this new algorithm can result in determinant either 2^62 or -2^62, but the int128 code can’t deal with that yet.

EDIT: fixed now, added two commits to deal with that.

sipa force-pushed on Dec 10, 2022

in src/int128_struct_impl.h:195 in 55822babf2 outdated

194+        return n >= 64 ? r->hi == (uint64_t)1 << (n - 64) && r->lo == 0
195+                       : r->hi == 0 && r->lo == (uint64_t)1 << n;
196+    } else {
197+        return n >= 64 ? r->hi == (~(uint64_t)0) << (n - 64) && r->lo == 0
198+                       : r->hi == (~(uint64_t)0) && r->lo == (~(uint64_t)0) << n;
199+    }

real-or-random commented at 4:48 pm on January 3, 2023:

nit:

0    return n >= 64 ? r->hi == (uint64_t)sign << (n - 64) && r->lo == 0
1                   : r->hi == (uint64_t)((sign - 1) >> 2) && r->lo == (uint64_t)sign << n;

sipa commented at 11:28 pm on January 3, 2023:

Done.

in src/tests.c:2001 in 55822babf2 outdated

1997+        int pos = ub % 127;
1998+        if (expect) {
1999+            /* If expect==1, set swz to exactly -(2 << pos). */
2000+            uint64_t hi = ~(uint64_t)0;
2001+            uint64_t lo = ~(uint64_t)0;
2002+            if (pos & 64) {

real-or-random commented at 5:03 pm on January 3, 2023:

nit:

0            if (pos >= 64) {

I find this much clearer, it took me a while to convince myself that they’re equivalent (which holds because pos < 128).

If you happen to accept this suggestion, you should change it below too and also the two occurrences in the sign == 1 test block, where you got this code from.

sipa commented at 11:28 pm on January 3, 2023:

Done.

in src/modinv64_impl.h:74 in 55822babf2 outdated

70@@ -71,11 +71,13 @@ static int secp256k1_modinv64_mul_cmp_62(const secp256k1_modinv64_signed62 *a, i
71     return 0;
72 }
73 
74-/* Check if the determinant of t is equal to 1 << n. */
75-static int secp256k1_modinv64_det_check_pow2(const secp256k1_modinv64_trans2x2 *t, unsigned int n) {
76+/* Check if the determinant of t is equal to 1 << n. If abs, |det| == 1 << n. */

real-or-random commented at 5:14 pm on January 3, 2023:

readability nit:

0/* Check if the determinant of t is equal to 1 << n. If abs, check if |det t| == 1 << n. */

or if you prefer shorter

0/* Check if the determinant of t equals 1 << n. If abs, check if |det t| == 1 << n. */

real-or-random commented at 7:17 am on January 5, 2023:

This one is still unresolved – (Feel free to dismiss, I’m mostly checking if the PR is ready for review again.)

sipa commented at 3:40 pm on January 5, 2023:

Done.

in src/field.h:138 in 55822babf2 outdated

138@@ -139,4 +139,7 @@ static void secp256k1_fe_half(secp256k1_fe *r);
139  *  magnitude set to 'm' and is normalized if (and only if) 'm' is zero. */
140 static void secp256k1_fe_get_bounds(secp256k1_fe *r, int m);
141 
142+/** Compute the Jacobi symbol of a / p. 0 if a=0; 1 if a square; -1 if a non-square. */

real-or-random commented at 5:25 pm on January 3, 2023:

nit:

0/** Compute the Jacobi symbol of a / p. 0 if a=0; 1 if a square and non-zero; -1 if a non-square. */

in src/field_5x52_impl.h:680 in 55822babf2 outdated

679+    if (ret == -2) {
680+        /* secp256k1_jacobi64_maybe_var failed to compute the Jacobi symbol. Fall back
681+         * to computing a square root. This should be extremely rare with random
682+         * input. */
683+        secp256k1_fe dummy;
684+        ret = 2*secp256k1_fe_sqrt(&dummy, &tmp) - 1;

real-or-random commented at 6:05 pm on January 3, 2023:

Should instead have a function fe_is_square_var?

I think this will be better because it

avoids the return value translations here and below
is more ergonomic to use (I think, remember that we had it this in the old version of BIP340 too),
and it aligns with the uses in ellswift: in the corresponding PR, we only test >=0 and <0, so we actually don’t even need the special casing for ==0 currently.

This is related to my comment below about the jacp function argument.

sipa commented at 9:52 pm on January 4, 2023:

I’ve renamed the function to secp256k1_fe_is_square_var; the internal (modinv) one is still actually computing a Jacobi symbol, I’m less convinced about changing things there.

real-or-random commented at 7:16 am on January 5, 2023:

makes sense

in src/modinv64_impl.h:330 in 55822babf2 outdated

323+    /* Transformation matrix; see comments in secp256k1_modinv64_divsteps_62. */
324+    uint64_t u = 1, v = 0, q = 0, r = 1;
325+    uint64_t f = f0, g = g0, m;
326+    uint32_t w;
327+    int i = 62, limit, zeros;
328+    int jac = *jacp;

real-or-random commented at 6:26 pm on January 3, 2023:

0    int jac = *jacp;
1    VERIFY_CHECK(jac == 0 || jac == 1);

sipa commented at 10:02 pm on January 4, 2023:

That’s not actually correct; the other bits of *jacp will take on non-zero values too, but are irrelevant. I’ve hopefully addressed that in the added comments.

real-or-random commented at 6:42 am on January 5, 2023:

Ok, I got it after reading your new comments.

in src/modinv64_impl.h:317 in 55822babf2 outdated

313+ * of the Jacobi symbol along the way. f0 and g0 must be f and g mod 2^64 rather than 2^62, because
314+ * Jacobi tracking requires knowing (f mod 8) rather than just (f mod 2).
315+ *
316+ * Input:  eta: initial eta
317+ *         f0:  bottom limb of initial f
318+ *         g0:  bottom limb of initial g

real-or-random commented at 6:29 pm on January 3, 2023:

should document the input/output parameter jacp. (And the variable name is confusing. This is not the jacobi symbol but its sign.)

sipa commented at 9:52 pm on January 4, 2023:

I’ve added a comment explaining it (in both the 32-bit and 64-bit versions).

in src/modinv64_impl.h:729 in 55822babf2 outdated

721+ * In VERIFY mode use a lower number of iterations (744, close to the median 756), so failure actually occurs. */
722+#ifdef VERIFY
723+#define JACOBI64_ITERATIONS 12
724+#else
725+#define JACOBI64_ITERATIONS 25
726+#endif

real-or-random commented at 6:34 pm on January 3, 2023:

On the one hand, I really like this trick. But on the other hand, it creates a difference between VERIFY and non-VERIFY (and who knows, maybe this influences loop unrolling etc.)

Would it be a good idea to keep this 25 in the real code but create a test-only version of it (either in the tests file or here) that just runs with 12? Or maybe better, make the number of iterations an argument of the function? Since it’s a “maybe” function, it should be correct for each iteration value.

sipa commented at 9:56 pm on January 4, 2023:

it creates a difference between VERIFY and non-VERIFY (and who knows, maybe this influences loop unrolling etc.)

Is that a problem? VERIFY vs non-VERIFY may already affect loop unrolling by having more code inside the loop for example. I feel like VERIFY mode isn’t to test the compiler’s correctness, but to test the correctness of our source code.

So… I’m not sure. I do see the uneasiness about it, but I feel that the current approach is really the only one that guarantees (a) it’s actually exercised in realistic settings and (b) using the same code as is used in production. A copy in tests could go out of sync, and having the iteration count be an argument still means somewhere higher in the stack there needs to be a decision what to set it to, which I think has the same problems.

If we don’t like it, I think the best alternative is just having maintainers occasionally manually setting the iteration count lower, and seeing if tests still pass.

real-or-random commented at 7:13 am on January 5, 2023:

I feel like VERIFY mode isn’t to test the compiler’s correctness, but to test the correctness of our source code.

That’s the way I see it. too. [1] I think the fundamental problem here is that we don’t really run tests in non-VERIFY mode. I’ll comment in #1037, which is somewhat related.

A copy in tests could go out of sync, […]

Agreed, that’s bad.

If we don’t like it, I think the best alternative is just having maintainers occasionally manually setting the iteration count lower, and seeing if tests still pass.

That’s even worse, we’ll forget to run it.

and having the iteration count be an argument still means somewhere higher in the stack there needs to be a decision what to set it to, which I think has the same problems.

So… I’m not sure. I do see the uneasiness about it, but I feel that the current approach is really the only one that guarantees (a) it’s actually exercised in realistic settings and (b) using the same code as is used in production.

Hm, yeah, I don’t think (a) is a big concern here: What this VERIFY tests is just handling the loop termination in case of too many iterations, and that’s a few lines of code which doesn’t care too much if the failure happened in an realistic input or not. So what I imagined it is that we run it for values of the iteration count in the test, and this will be good enough to exercise this code.

This would have the advantage that it minimizes the difference between VERIFY and non-VERIFY mode. But after thinking more about it, I agree that it’s a somewhat moot goal. [2] Your comment about loop unrolling is right and we have a lot of code locations where we add some substantial amount of code in VERIFY (you introduce another one here in this PR, which I don’t mind at all.)

So yes, let’s keep this as it is.

[1] Though to some extent, VERIFY can also help checking compiler, e.g., when a function is miscompiles and its result is VERIFIED somewhere else. [2] I think my brain mixed up my thoughts with the discussion in #1169#pullrequestreview-1224682895, but this was about SECP256K1_CHECKMEM_ENABLED being defined, not VERIFY.

sipa commented at 3:19 pm on January 5, 2023:

Another thing is that the approach here guarantees that (in VERIFY mode) all potential callers of this function actually deal with failure being returned. An approach where the caller has to explicitly ask for “low iterations” doesn’t have that property, as a hypothetical oblivious caller just wouldn’t ask for it.

That’s a minor concern of course; I think it’s pretty unlikely there will be other callers than secp256k1_fe_is_square_var any time soon (and tests).

real-or-random commented at 6:39 pm on January 3, 2023: contributor

This is really good, I have almost only nits.

note to self: I haven’t looked at the tests for secp256k1_jacobi32/64_maybe_var yet

in src/modinv32_impl.h:333 in 55822babf2 outdated

325+ *         f0:  bottom limb of initial f
326+ *         g0:  bottom limb of initial g
327+ * Output: t: transition matrix
328+ * Return: final eta
329+ */
330+static int32_t secp256k1_modinv32_posdivsteps_30_var(int32_t eta, uint32_t f0, uint32_t g0, secp256k1_modinv32_trans2x2 *t, int *jacp) {

real-or-random commented at 6:40 pm on January 3, 2023:

Since this is so similar to the other function, have you considered using macros to avoid code duplication? (I’m not saying it’s necessarily better.)

sipa commented at 9:58 pm on January 4, 2023:

I have, but it’s a bunch of small changes:

Different transformation matrices (which in terms of code means some negation signs are gone).
Higher upper bounds on eta.
*jacp accounting.
Different constraints on the determinant of the transformation matrix.

I don’t think macros to reduce code reuse would be very readable.

real-or-random commented at 6:41 am on January 5, 2023:

Makes sense!

sipa force-pushed on Jan 3, 2023

sipa force-pushed on Jan 4, 2023

sipa force-pushed on Jan 5, 2023

in doc/safegcd_implementation.md:775 in ef2ac449d7 outdated

768@@ -769,3 +769,30 @@ def modinv_var(M, Mi, x):
769         d, e = update_de(d, e, t, M, Mi)
770     return normalize(f, d, Mi)
771 ```
772+
773+## 8. From GCDs to Jacobi symbol
774+
775+We can also use a similar approach to calculate Jacobi symbol *(x | M)* by keeping track of an extra variable *j*, for which at every step *(x | M) = j (g | f)*. As we update *f* and *g*, we make corresponding updates to *j* using [properties of the Jacobi symbol](https://en.wikipedia.org/wiki/Jacobi_symbol#Properties). In particular, we update *j* whenever we divide *g* by *2* or swap *f* and *g*; these updates depend only on the values of *f* and *g* modulo *4* or *8*, and can thus be applied very quickly. Overall, this calculation is slightly simpler than the one for modular inverse because we no longer need to keep track of *d* and *e*.

real-or-random commented at 4:24 pm on January 5, 2023:

It wouldn’t hurt to spell the two properties out here for self-containedness.

sipa commented at 8:57 pm on January 26, 2023:

Done.

in src/modinv32.h:38 in ef2ac449d7 outdated

34@@ -35,4 +35,8 @@ static void secp256k1_modinv32_var(secp256k1_modinv32_signed30 *x, const secp256
35 /* Same as secp256k1_modinv32_var, but constant time in x (not in the modulus). */
36 static void secp256k1_modinv32(secp256k1_modinv32_signed30 *x, const secp256k1_modinv32_modinfo *modinfo);
37 
38+/* Compute the Jacobi symbol for (x | modinfo->modulus). Either x must be 0, or x must be coprime with

real-or-random commented at 4:26 pm on January 5, 2023:

maybe this should spell out that the Jacobi symbol is only defined for odd mod?

(Do we need mod > 1? I see the termination condition f == 1 is only checked after applying a the first round of posdivsteps.)

sipa commented at 6:54 pm on January 5, 2023:

If mod=1, then x=0, so I think the output is trivial in any case then?

real-or-random commented at 10:46 am on January 6, 2023:

Okay yes, I don’t immediately see why. If this is the case, then it should probably be explained.

(This can be done later as you said below.)

jonasnick commented at 8:39 am on February 27, 2023:

Nit: The jacobi symbol (0|1) is commonly defined to be 1 (see e.g. wikipedia), but secp256k1_jacobi_maybe_var returns 0 in that case. I think it would be valuable to document secp256k1_jacobi_maybe_var’s behavior in this edge case.

real-or-random commented at 9:30 am on February 27, 2023:

That’s related to what I said there: #979 (review). Though I don’t understand why I said there that I don’t see why the output is trivial. It’s trivial, of course. Maybe what I meant is that I don’t see immediately why the output is correct, which matches your comment here, @jonasnick.

Perhaps it’s better to simply exclude (0|1) as a valid input, and then document and VERIFY_CHECK that. That makes it easier to recognize this corner case in future code. (And I feel it’s hard to predict what definition of jac(0|1) future code would expect to work. It’s better to make callers remind of the corner case explicitly.)

sipa commented at 9:28 pm on February 27, 2023:

@real-or-random Has this been addressed by only supporting coprime inputs? The secp256k1_modinv32_modinfo requires an odd modulus already.

sipa commented at 9:29 pm on February 27, 2023:

Done.

real-or-random commented at 3:30 pm on February 28, 2023:

Yes, that can be considered resolved.

The secp256k1_modinv32_modinfo requires an odd modulus already.

I wonder if it’s a good idea to add a function secp256k1_modinv32_modinfo verify that checks consistency and call it on entry of every function that reads modinfo, similar to how we do this for other data structures. But if yes, that should happen in a separate PR.

real-or-random commented at 4:34 pm on January 5, 2023: contributor

Some final comments you can address or ignore.

I’m running some testing locally and will then ACK if it’s successful.

real-or-random approved

real-or-random commented at 7:03 pm on January 5, 2023: contributor

ACK ef2ac449d757189b67be27437824c2d0c0663bde diff and writeup is good and I tested every commit

sipa commented at 7:17 pm on January 5, 2023: contributor

I think I’ll leave your remaining nits for now, as they seem to intersect a bit with the existing safegcd/modinv functions, so addressing them is perhaps better done afterwards generally.

real-or-random commented at 8:32 pm on January 16, 2023: contributor

needs (trivial) rebase

sipa force-pushed on Jan 17, 2023

sipa commented at 4:45 am on January 17, 2023: contributor

Rebased after merge of #1190.

real-or-random approved

real-or-random commented at 9:39 am on January 18, 2023: contributor

reACK 642b9f42 diff and writeup is good and I tested every commit

in src/tests.c:1962 in f6470b6afe outdated

1961+    /* test secp256k1_i128_check_pow2 (sign == -1) */
1962+    {
1963+        int expect = (uc & 1);
1964+        int pos = ub % 127;
1965+        if (expect) {
1966+            /* If expect==1, set swz to exactly -(2 << pos). */

siv2r commented at 10:49 am on January 21, 2023:

Shouldn’t it be -2^pos instead of -(2 << pos)? Similarly, in the lines: 1973, 1937, 1947.

sipa commented at 8:57 pm on January 26, 2023:

Good catch, fixed.

sipa force-pushed on Jan 26, 2023

sipa commented at 8:57 pm on January 26, 2023: contributor

Rebased, and addressed a few nits.

jonasnick commented at 8:51 am on February 27, 2023: contributor

ACK mod nit

I was able to implement the algorithm based on the descriptions and python snippets in safegcd_implementation.md and compare it against the C implementation.

Make secp256k1_i128_check_pow2 support -(2^n) 5fffb2c7af

Make secp256k1_modinv64_det_check_pow2 support abs val 04c6c1b181

sipa force-pushed on Feb 27, 2023

sipa commented at 9:25 pm on February 27, 2023: contributor

Rebased, and updated to only support coprime inputs in the secp256k1_secp256k1_jacobiXX_maybe_var functions. The return value of those functions for “unknown/timeout” has been changed from -2 to 0.

in src/modinv32.h:38 in 4a8d82de3c outdated

34@@ -35,4 +35,8 @@ static void secp256k1_modinv32_var(secp256k1_modinv32_signed30 *x, const secp256
35 /* Same as secp256k1_modinv32_var, but constant time in x (not in the modulus). */
36 static void secp256k1_modinv32(secp256k1_modinv32_signed30 *x, const secp256k1_modinv32_modinfo *modinfo);
37 
38+/* Compute the Jacobi symbol for (x | modinfo->modulus). x must be coprime with modulus (and thus cannot be 0).

jonasnick commented at 12:52 pm on February 28, 2023:

Isn’t 0 coprime with 1 (and no other positive integer)? The gcd is 1, since 1 divides 0 and 1 divides 1.

sipa commented at 3:04 pm on February 28, 2023:

Good call. And iterating (f=1,g=0) doesn’t change the state, so it’s inaccurate to say it converges to f=g=gcd(f0,g0) as well.

sipa commented at 3:33 pm on February 28, 2023:

I’ve changed some comments, and clarified the explanation in safegcd_implementation.md too.

jonasnick commented at 4:01 pm on February 28, 2023:

Maybe I’m missing something but this line still says x coprime with modulus => x != 0.

sipa commented at 4:03 pm on February 28, 2023:

Ah, oops. It is correct, because the modinfo structure requires modulus >= 3, but I forgot to point that out in the comment here (it is added elsewhere).

sipa commented at 4:23 pm on February 28, 2023:

Added a comment about that too now.

sipa force-pushed on Feb 28, 2023

in src/tests.c:1046 in 1c1c202268 outdated

1042+        negone[0] = mod[0] - 1;
1043+        for (i = 1; i < 16; ++i) negone[i] = mod[i];
1044+        mulmod256(sqr, sqr, negone, mod);
1045+        uint16_to_signed30(&x, sqr);
1046+        jac = secp256k1_jacobi32_maybe_var(&x, &m);
1047+        CHECK(jac == 0 || jac == (1 - (mod[0] & 2)) * nonzero);

jonasnick commented at 8:51 pm on February 28, 2023:

nonzero is always 1 in this branch (same in modinv64 tests)

sipa commented at 8:57 pm on February 28, 2023:

Fixed.

jonasnick commented at 8:51 pm on February 28, 2023: contributor

reACK mod nit

Native jacobi symbol algorithm

This introduces variants of the divsteps-based GCD algorithm used for
modular inverses to compute Jacobi symbols. Changes compared to
the normal vartime divsteps:
* Only positive matrices are used, guaranteeing that f and g remain
  positive.
* An additional jac variable is updated to track sign changes during
  matrix computation.
* There is (so far) no proof that this algorithm terminates within
  reasonable amount of time for every input, but experimentally it
  appears to almost always need less than 900 iterations. To account
  for that, only a bounded number of iterations is performed (1500),
  after which failure is returned. In VERIFY mode a lower iteration
  count is used to make sure that callers exercise their fallback.
* The algorithm converges to f=g=gcd(f0,g0) rather than g=0. To keep
  this test simple, the end condition is f=1, which won't be reached
  if started with non-coprime or g=0 inputs. Because of that we only
  support coprime non-zero inputs.

1de2a01c2b

Add secp256k1_fe_is_square_var function

The implementation calls the secp256k1_modinvNN_jacobi_var code, falling back
to computing a square root in the (extremely rare) case it failed converge.

6be01036c8

doc: Describe Jacobi calculation in safegcd_implementation.md ce3cfc78a6

sipa force-pushed on Feb 28, 2023

jonasnick commented at 9:10 pm on February 28, 2023: contributor

ACK ce3cfc78a6020d21be299e1e4f22cf8ef089194d

real-or-random approved

real-or-random commented at 11:50 am on March 1, 2023: contributor

reACK ce3cfc78a6020d21be299e1e4f22cf8ef089194d diff and writeup is good and I tested every commit

real-or-random merged this on Mar 1, 2023

real-or-random closed this on Mar 1, 2023

real-or-random cross-referenced this on Mar 1, 2023 from issue modinv: Verify invariant of _modinfo struct by real-or-random

dhruv referenced this in commit a5df79db12 on Mar 7, 2023

dhruv referenced this in commit 77b510d84c on Mar 7, 2023

sipa referenced this in commit 763079a3f1 on Mar 8, 2023

div72 referenced this in commit 945b094575 on Mar 14, 2023

in src/int128_struct_impl.h:196 in ce3cfc78a6

195-                  : r->hi == 0 && r->lo == (uint64_t)1 << n;
196+static SECP256K1_INLINE int secp256k1_i128_check_pow2(const secp256k1_int128 *r, unsigned int n, int sign) {
197+    VERIFY_CHECK(n < 127);
198+    VERIFY_CHECK(sign == 1 || sign == -1);
199+    return n >= 64 ? r->hi == (uint64_t)sign << (n - 64) && r->lo == 0
200+                   : r->hi == (uint64_t)((sign - 1) >> 1) && r->lo == (uint64_t)sign << n;

roconnor-blockstream commented at 10:37 am on March 22, 2023:

There is no need to subtract 1 before doing a right shift.

real-or-random commented at 6:29 am on April 10, 2023:

solved in https://github.com/bitcoin-core/secp256k1/pull/1250

real-or-random referenced this in commit 145078c418 on Apr 10, 2023

sipa cross-referenced this on May 11, 2023 from issue Abstract out and merge all the magnitude/normalized logic by sipa

vmta referenced this in commit e1120c94a1 on Jun 4, 2023

vmta referenced this in commit 8f03457eed on Jul 1, 2023

jonasnick cross-referenced this on Jul 20, 2023 from issue Upstream PRs 1160, 1193, 1169, 1190, 1192, 1194, 1196, 1195, 1170, 1172, 1200, 1199, 1203, 1201, 1206, 1078, 1209, 979, 1212, 1218, 1217, 1221, 1222 by jonasnick

Native jacobi symbol algorithm #979

MacBook Pro (x86_64)

Mac M1 (arm64)