Native jacobi symbol algorithm #979

pull sipa wants to merge 5 commits into bitcoin-core:master from sipa:202109_native_jacobi changing 13 files +553 −40
  1. sipa commented at 10:54 pm on September 23, 2021: contributor

    This introduces variants of the vartime divsteps-based GCD algorithm used for modular inverses to compute Jacobi symbols. Changes compared to the normal vartime divsteps:

    • Only positive matrices are used, guaranteeing that f and g remain positive.
    • An additional jac variable is updated to track sign changes during matrix computation.
    • There is (so far) no proof that this algorithm terminates within reasonable amount of time for every input, but experimentally it appears to almost always need less than 900 iterations. To account for that, only a bounded number of iterations is performed (1500), after which failure is returned. The field logic then falls back to using square roots to determining the result.
    • The algorithm converges to f=g=gcd(f0,g0) rather than g=0. To keep this test simple, the end condition is f=1, which won’t be reached if started with g=0. That case is dealt with specially.

    This code is currently unused, except for tests. I don’t aim for it to be merged until there is a need for it, but this demonstrates its feasibility.

    In terms of performance:

    0field_inverse: min 1.76us / avg 1.76us / max 1.78us
    1field_inverse_var: min 0.991us / avg 0.993us / max 0.996us
    2field_jacobi_var: min 1.31us / avg 1.31us / max 1.31us
    3field_sqrt: min 4.36us / avg 4.37us / max 4.40us
    

    while with the older (f24e122d13db7061b1086ddfd21d3a1c5294213b) libgmp based Jacobi code on the same system:

    0num_jacobi: min 1.53us / avg 1.54us / max 1.55us
    
  2. sipa force-pushed on Sep 23, 2021
  3. sipa force-pushed on Sep 23, 2021
  4. sipa cross-referenced this on Sep 28, 2021 from issue Add Elligator Square module by sipa
  5. dhruv cross-referenced this on Nov 3, 2021 from issue BIP324: CKey encode/decode to elligator-swift by dhruv
  6. in src/modinv32.h:41 in 042f91bf4a outdated
    38@@ -39,4 +39,7 @@ static void secp256k1_modinv32_var(secp256k1_modinv32_signed30 *x, const secp256
    39 /* Same as secp256k1_modinv32_var, but constant time in x (not in the modulus). */
    40 static void secp256k1_modinv32(secp256k1_modinv32_signed30 *x, const secp256k1_modinv32_modinfo *modinfo);
    41 
    42+/* Compute the Jacobi symbol for x (where gcd(x,p) == 1), 0 if it cannot be determined. */
    43+static int secp256k1_jacobi32_maybe_var(const secp256k1_modinv32_signed30 *x, const secp256k1_modinv32_modinfo *modinfo);
    


    robot-dreams commented at 5:32 pm on November 8, 2021:
    Nit: This API is probably ok since you specify that gcd(x, p) == 1, but is it still dangerous to return 0 here, instead of some other constant (like #define JACOBI_VAR_FAILURE -2 or whatever) that can’t possibly be the value of a jacobi symbol?

    sipa commented at 9:31 pm on November 9, 2021:
    Good idea. Done.
  7. in src/tests.c:963 in 042f91bf4a outdated
    828+            negone[0] = mod[0] - 1;
    829+            for (i = 1; i < 16; ++i) negone[i] = mod[i];
    830+            mulmod256(sqr, sqr, negone, mod);
    831+            uint16_to_signed30(&x, sqr);
    832+            CHECK(secp256k1_jacobi32_maybe_var(&x, &m) == -nonzero);
    833+        }
    


    robot-dreams commented at 5:44 pm on November 8, 2021:

    Given that (-1 | n) = (-1)^((n - 1) / 2), I think we can make a stronger claim here that x and -x have opposite jacobi symbols if and only if mod % 4 == 3.

    Would it make sense to get some more test cases here (and in the jacobi64 version below) by considering both possibilities, e.g. as follows?

    0        negone[0] = mod[0] - 1;
    1        for (i = 1; i < 16; ++i) negone[i] = mod[i];
    2        mulmod256(sqr, sqr, negone, mod);
    3        uint16_to_signed30(&x, sqr);
    4        /* x and -x have opposite jacobi symbols if and only if mod % 4 == 3. */
    5        if ((mod[0] & 3) == 3) {
    6            CHECK(secp256k1_jacobi32_maybe_var(&x, &m) == -nonzero);
    7        } else {
    8            CHECK(secp256k1_jacobi32_maybe_var(&x, &m) == nonzero);
    9        }
    

    sipa commented at 9:32 pm on November 9, 2021:
    Nice catch, done. I was hoping to have a test that guarantees finding a -1 result, but that’s hard without a per-modulus search for a non-square factor. This is pretty useful too, though.
  8. in src/field_5x52_impl.h:691 in 042f91bf4a outdated
    591+         * to computing a square root. This should be extremely rare with random
    592+         * input. */
    593+        secp256k1_fe dummy;
    594+        ret = 2*secp256k1_fe_sqrt(&dummy, &tmp) - 1;
    595+    }
    596+    return ret;
    


    robot-dreams commented at 8:01 pm on November 8, 2021:

    Nit: Is it worth adding an extra verification step here, to always compute the jacobi symbol both ways for tests?

    0    } else {
    1#ifdef VERIFY
    2        secp256k1_fe dummy;
    3        VERIFY_CHECK(2*secp256k1_fe_sqrt(&dummy, &tmp) - 1 == ret);
    4#endif
    5    }
    6    return ret;
    

    sipa commented at 9:47 pm on November 9, 2021:
    Done.
  9. robot-dreams commented at 10:01 pm on November 8, 2021: contributor

    tACK 042f91bf4acca0e1d70ca6afffe045a525a7b614

    ./tests passed, including the additional checks I suggested.

    I’m still brushing up on the divsteps algorithm, and will review this change again once I’ve worked through the various details / optimizations.


    Benchmark results look good on x86_64 (MacBook Pro): native jacobi version is faster than libgmp. However, on arm64 (Mac M1), native jacobi version was slightly slower!

    On M1 I also tried ./configure --with-asm=arm --enable-experimental to see if that would speed up the native jacobi version, but attempting to build resulted in a lot of compile errors in field_10x26_arm.s.

    MacBook Pro (x86_64)

    native jacobi branch

    0$ ./bench_internal field
    1...
    2field_jacobi_var: min 2.03us / avg 2.25us / max 3.85us
    3field_sqrt: min 6.33us / avg 6.38us / max 6.46us
    4
    5$ ./bench_internal jacobi
    6field_jacobi_var: min 2.00us / avg 2.06us / max 2.15us
    

    libgmp branch f24e122d13db7061b1086ddfd21d3a1c5294213b:

    0$ ./bench_internal field
    1...
    2field_sqrt: min 6.05us / avg 6.13us / max 6.38us
    3
    4$ ./bench_internal num
    5num_jacobi: min 2.44us / avg 2.47us / max 2.53us
    

    Mac M1 (arm64)

    native jacobi branch

    0% ./bench_internal field
    1...
    2field_jacobi_var: min 1.54us / avg 1.55us / max 1.56us
    3field_sqrt: min 4.04us / avg 4.09us / max 4.29us
    4
    5% ./bench_internal jacobi
    6field_jacobi_var: min 1.55us / avg 1.74us / max 3.18us
    

    libgmp branch f24e122d13db7061b1086ddfd21d3a1c5294213b:

    0% ./bench_internal field
    1...
    2field_sqrt: min 4.04us / avg 4.04us / max 4.05us
    3
    4% ./bench_internal jacobi
    5...
    6num_jacobi: min 1.41us / avg 1.41us / max 1.42us
    
  10. real-or-random commented at 8:09 am on November 9, 2021: contributor
    @robot-dreams Always nice to see new contributors in the repo. :) Just curious, are you simply helping or are you interested in getting this merged? If the latter, I’m further curious about your use case. Feel free just ignore my intrusive questions.
  11. robot-dreams commented at 5:25 pm on November 9, 2021: contributor
    @real-or-random Thanks for the reply! I’m simply helping and try to learn about this repo, I don’t have a personal use case.
  12. in src/modinv64_impl.h:327 in 042f91bf4a outdated
    322+        eta -= zeros;
    323+        i -= zeros;
    324+        /* Update the bottom bit of jac: when dividing g by an odd power of 2,
    325+         * if (f mod 8) is 3 or 5, the Jacobi symbol changes sign. */
    326+        jac ^= (zeros & ((f >> 1) ^ (f >> 2)));
    327+        /* We're done once we've done 60 posdivsteps. */
    


    robot-dreams commented at 5:30 pm on November 9, 2021:
    Nit: 62 posdivsteps?

    sipa commented at 4:30 pm on November 10, 2021:
    Done.
  13. in src/modinv32_impl.h:350 in 042f91bf4a outdated
    345+        eta -= zeros;
    346+        i -= zeros;
    347+        /* Update the bottom bit of jac: when dividing g by an odd power of 2,
    348+         * if (f mod 8) is 3 or 5, the Jacobi symbol changes sign. */
    349+        jac ^= (zeros & ((f >> 1) ^ (f >> 2)));
    350+        /* We're done once we've done 60 posdivsteps. */
    


    robot-dreams commented at 5:37 pm on November 9, 2021:
    Nit: 30 posdivsteps?

    sipa commented at 4:30 pm on November 10, 2021:
    Done.
  14. sipa force-pushed on Nov 9, 2021
  15. sipa force-pushed on Nov 9, 2021
  16. sipa force-pushed on Nov 9, 2021
  17. sipa force-pushed on Nov 9, 2021
  18. sipa force-pushed on Nov 9, 2021
  19. sipa commented at 10:24 pm on November 9, 2021: contributor

    @robot-dreams Thanks for your comments!

    The failure with asm=arm is expected, as Apple M1 is 64-bit ARM, while the assembly code used here is 32-bit.

  20. sipa commented at 10:57 pm on November 9, 2021: contributor
    Made another change: in VERIFY mode, the upper bound on the number of posdivsteps iterations is reduced to ~750 (which is close to the median of how many iterations are needed for random input mod the field size), so that the uncomputable path is actually exercised in the tests.
  21. in src/modinv64_impl.h:333 in 6e3fcc4f51 outdated
    328+        if (i == 0) break;
    329+        VERIFY_CHECK((f & 1) == 1);
    330+        VERIFY_CHECK((g & 1) == 1);
    331+        VERIFY_CHECK((u * f0 + v * g0) == f << (62 - i));
    332+        VERIFY_CHECK((q * f0 + r * g0) == g << (62 - i));
    333+        /* If eta is negative, negate it and replace f,g with g,-f. */
    


    robot-dreams commented at 3:35 am on November 10, 2021:
    Nit: replace f,g with g,f?

    sipa commented at 4:30 pm on November 10, 2021:
    Done.
  22. in src/modinv32_impl.h:356 in 6e3fcc4f51 outdated
    351+        if (i == 0) break;
    352+        VERIFY_CHECK((f & 1) == 1);
    353+        VERIFY_CHECK((g & 1) == 1);
    354+        VERIFY_CHECK((u * f0 + v * g0) == f << (30 - i));
    355+        VERIFY_CHECK((q * f0 + r * g0) == g << (30 - i));
    356+        /* If eta is negative, negate it and replace f,g with g,-f. */
    


    robot-dreams commented at 3:54 am on November 10, 2021:
    Nit: replace f,g with g,f?

    sipa commented at 4:30 pm on November 10, 2021:
    Done.
  23. in src/modinv64_impl.h:345 in 6e3fcc4f51 outdated
    340+            /* Update bottom bit of jac: when swapping f and g, the Jacobi symbol changes sign
    341+             * if both f and g are 3 mod 4. */
    342+            jac ^= ((f & g) >> 1);
    343+            /* Use a formula to cancel out up to 6 bits of g. Also, no more than i can be cancelled
    344+             * out (as we'd be done before that point), and no more than eta+1 can be done as its
    345+             * will flip again once that happens. */
    


    robot-dreams commented at 4:02 am on November 10, 2021:
    Nit: sign will flip?

    sipa commented at 4:29 pm on November 10, 2021:
    Done.
  24. robot-dreams commented at 4:44 am on November 10, 2021: contributor

    ACK 6e3fcc4f5149c96ab052f3312091dfc0b8eab397

    I agree with your modifications to the original safegcd for calculating Jacobi symbol instead of modular inverse. I also replicated the logic for the 64-bit version in Python, and checked that the result matches a “slow” Jacobi function for a few randomly generated inputs (see https://gist.github.com/robot-dreams/ceb00162b80384f2ae1913aaa2b35e75).

    One potential concern is that with this change, there are now 4 slightly different implementations of a similar algorithm (32/64-bit, inv/jacobi). Do you think there’s a nice way to consolidate the logic, at least among a given word size? (If there’s no nice way or if it’s too much trouble, I’m fine with leaving it as is and relying on having good tests.)

    Other questions:

    • What was the motivation for using different approaches to clear the low-order bits of g between the 32 and 64 bit implementations (8-bit lookup table vs 4 and 6-bit tricks from Hacker’s Delight)?

    • I agree that replacing f, g = g, -f with f, g = g, f ensures that all matrix entries are always positive. But aside from actually observing that it converges to f = 1 in practice, do you have any other intuitive justification for this? (The best I can come up with is that since you’re constantly dividing g by 2 and swapping the roles of f and g, you’re quickly moving towards a solution even if you replace g - f with g + f in the original GCD algorithm.)

    • Is there already a writeup describing this change? If not, would it make sense to add a brief section to https://github.com/bitcoin-core/secp256k1/blob/master/doc/safegcd_implementation.md? (If you think so, I’d be happy to give that a shot.)

  25. robot-dreams commented at 4:57 am on November 10, 2021: contributor

    @sipa Thanks for responding to the comments!

    • Yes, 32-bit makes sense (in retrospect the fact that the file was called field_10x26 instead of field_5x52 should have given that away)
    • Great idea to reduce the number of iterations for validation!
  26. sipa force-pushed on Nov 10, 2021
  27. sipa commented at 4:28 pm on November 10, 2021: contributor

    One potential concern is that with this change, there are now 4 slightly different implementations of a similar algorithm (32/64-bit, inv/jacobi). Do you think there’s a nice way to consolidate the logic, at least among a given word size? (If there’s no nice way or if it’s too much trouble, I’m fine with leaving it as is and relying on having good tests.)

    Arguably there are 6 implementations (32/64-bit, and const/var/varpos divsteps), and they do share some of the code (in particular the matrix application), but not everything. I don’t think there is an obvious way of merging more code without reducing performance, but I also haven’t tried very hard.

    What was the motivation for using different approaches to clear the low-order bits of g between the 32 and 64 bit implementations (8-bit lookup table vs 4 and 6-bit tricks from Hacker’s Delight)?

    Just dumb benchmarking. The formula was faster on x86_64 systems we tried, and the table was faster on ARMv7 systems (it’s explained in one of the commit messages that introduced it).

    I agree that replacing f, g = g, -f with f, g = g, f ensures that all matrix entries are always positive. But aside from actually observing that it converges to f = 1 in practice, do you have any other intuitive justification for this? (The best I can come up with is that since you’re constantly dividing g by 2 and swapping the roles of f and g, you’re quickly moving towards a solution even if you replace g - f with g + f in the original GCD algorithm.)

    I don’t, no. I do have a proof that any f>0,g>0 converges to f=g somewhere in a finite number of steps, but the upper bound is something like f*g*log2(f*g), which is obviously useless for practical purposes. The technique explained in https://github.com/sipa/safegcd-bounds completely fails for posdivsteps (doesn’t converge at all). With a lot of tricks I got it to converge, but in what appears quadratic (O(log2(f)^2)), and it’s only tractable to evaluate for f up to 2^20 or so.

    The difficulty is that there are both steps that take f and g further apart (g /= 2) and steps that bring them closer (g = (f+g)/2), and it’s hard to infer tight bounds on how many times one can happen in between two executions of the other.

    So in short, no; the actual argument why this works is purely empirical; all out of a 2.5 billion random inputs (with the field size modulus) need between 630 and 921 iterations.

    Is there already a writeup describing this change? If not, would it make sense to add a brief section to https://github.com/bitcoin-core/secp256k1/blob/master/doc/safegcd_implementation.md? (If you think so, I’d be happy to give that a shot.)

    I think that would be valuable, yes. I haven’t done so yet, but was considering it. Feel free to take a shot at it.

  28. sipa force-pushed on Nov 12, 2021
  29. sipa commented at 4:37 pm on November 12, 2021: contributor
    Included @robot-dreams’s update to the writeup.
  30. in src/modinv32_impl.h:675 in 902d57e075 outdated
    660@@ -584,4 +661,69 @@ static void secp256k1_modinv32_var(secp256k1_modinv32_signed30 *x, const secp256
    661     *x = d;
    662 }
    663 
    664+/* Compute the Jacobi symbol of x modulo modinfo->modulus (variable time). gcd(x,modulus) must be 1. */
    


    robot-dreams commented at 6:46 pm on November 15, 2021:

    Nits (also applies to modinv64_impl):

    • Is gcd(x,modulus) must be 1 still true, since you’re handling x=0 here?
    • Is it necessary to also mention x must be normalized here for consistency with modinv32.h?

    sipa commented at 8:28 pm on November 16, 2021:
    I made a few changes to address this.
  31. robot-dreams commented at 6:52 pm on November 15, 2021: contributor
    ACK 5af3d290b6eabd6dd5cb2565caab24c0ceb8b8f9 aside from comment nits—I checked the range diff against the old commit and it looks good.
  32. sipa force-pushed on Nov 16, 2021
  33. robot-dreams commented at 9:22 pm on November 16, 2021: contributor
    ACK 53b7ce28bef8008787c1aeb202ec17ef3a3b3325
  34. sipa closed this on Jan 29, 2022

  35. sipa reopened this on Jan 29, 2022

  36. sipa force-pushed on Jun 28, 2022
  37. sipa cross-referenced this on Jul 3, 2022 from issue Add x-only ecmult_const version with x specified as n/d by sipa
  38. sipa cross-referenced this on Jul 21, 2022 from issue ElligatorSwift + integrated x-only DH by sipa
  39. mratsim cross-referenced this on Aug 8, 2022 from issue Perf: `isSquare` - constant-time Jacobi/Kronecker/Legendre symbol using fast GCD by mratsim
  40. sipa force-pushed on Nov 2, 2022
  41. sipa force-pushed on Nov 4, 2022
  42. in doc/safegcd_implementation.md:789 in d70f0fefc9 outdated
    784+        # After
    785+        if delta > 0 and g & 1:
    786+            delta, f, g = 1 - delta, g, (g + f) // 2
    787+```
    788+
    789+The algorithm is still correct, since the changed divstep, called a "posdivstep" (see section 8.4 and E.5 in the paper) preserves *gcd(f, g)*. However, there's no proof that the modified algorithm will converge. The justification for posdivsteps is completely empirical: in practice, it appears that the vast majority of inputs converge to *f=g=gcd(f<sub>0</sub>, g<sub>0<sub>)* in a number of steps proportional to their logarithm.
    


    jonasnick commented at 10:25 pm on December 5, 2022:
    g<sub>0<sub> -> g<sub>0</sub>

    sipa commented at 4:45 pm on December 10, 2022:
    Fixed.
  43. in src/bench_internal.c:233 in d70f0fefc9 outdated
    225+    for (i = 0; i < iters; i++) {
    226+        j += secp256k1_fe_jacobi_var(&data->fe[0]);
    227+        secp256k1_fe_add(&data->fe[0], &data->fe[1]);
    228+    }
    229+    CHECK(j <= iters);
    230+}
    


    jonasnick commented at 3:55 pm on December 10, 2022:

    ./bench_internal jacobi crashes on my machine when configured with CPPFLAGS=-DVERIFY. Backtrace:

     0src/field_5x52_impl.h:57: test condition failed: r == 1
     1
     2Program received signal SIGABRT, Aborted.
     30x00007ffff7e3fbc7 in __pthread_kill_implementation () from /nix/store/hsk71z8admvgykn7vzjy11dfnar9f4r1-glibc-2.35-163/lib/libc.so.6
     4(gdb) bt
     5[#0](/bitcoin-core-secp256k1/0/)  0x00007ffff7e3fbc7 in __pthread_kill_implementation () from /nix/store/hsk71z8admvgykn7vzjy11dfnar9f4r1-glibc-2.35-163/lib/libc.so.6
     6[#1](/bitcoin-core-secp256k1/1/)  0x00007ffff7df2b46 in raise () from /nix/store/hsk71z8admvgykn7vzjy11dfnar9f4r1-glibc-2.35-163/lib/libc.so.6
     7[#2](/bitcoin-core-secp256k1/2/)  0x00007ffff7ddd4b5 in abort () from /nix/store/hsk71z8admvgykn7vzjy11dfnar9f4r1-glibc-2.35-163/lib/libc.so.6
     8[#3](/bitcoin-core-secp256k1/3/)  0x0000000000402920 in secp256k1_fe_verify (a=<optimized out>) at src/field_5x52_impl.h:57
     9[#4](/bitcoin-core-secp256k1/4/)  0x000000000040c1f8 in secp256k1_fe_add (a=0x99, r=0x69) at src/field_5x52_impl.h:443
    10[#5](/bitcoin-core-secp256k1/5/)  bench_field_jacobi_var (arg=arg@entry=0x7fffffffcf20, iters=iters@entry=20000) at src/bench_internal.c:227
    11[#6](/bitcoin-core-secp256k1/6/)  0x0000000000413d5a in run_benchmark (name=0x416221 "field_jacobi_var", benchmark=0x40bdd0 <bench_field_jacobi_var>, setup=0x40cd70 <bench_setup>,
    12    teardown=0x0, data=0x7fffffffcf20, count=10, iter=20000) at src/bench.h:109
    13[#7](/bitcoin-core-secp256k1/7/)  0x000000000040147d in main (argc=2, argv=0x7fffffffd798) at src/bench_internal.c:393
    

    sipa commented at 4:45 pm on December 10, 2022:
    Fixed. It was just a missing normalize in the benchmark.
  44. in src/tests.c:3205 in d70f0fefc9 outdated
    2893@@ -2854,8 +2894,10 @@ void run_sqrt(void) {
    2894         for (j = 0; j < count; j++) {
    2895             random_fe(&x);
    2896             secp256k1_fe_sqr(&s, &x);
    2897+            CHECK(secp256k1_fe_jacobi_var(&s) == 1);
    2898             test_sqrt(&s, &x);
    2899             secp256k1_fe_negate(&t, &s, 1);
    2900+            CHECK(secp256k1_fe_jacobi_var(&t) == -1);
    


    jonasnick commented at 3:59 pm on December 10, 2022:

    Is it possible that that secp256k1_fe_jacobi_var(&s) is taking the slow secp256k1_fe_sqrt branch about 50% of the time? I discovered this on my machine with this simple patch that adds a printf:

     0diff --git a/src/field_5x52_impl.h b/src/field_5x52_impl.h
     1index 26e89123..d3eb3347 100644
     2--- a/src/field_5x52_impl.h
     3+++ b/src/field_5x52_impl.h
     4@@ -681,6 +681,7 @@ static int secp256k1_fe_jacobi_var(const secp256k1_fe *x) {
     5          * to computing a square root. This should be extremely rare with random
     6          * input. */
     7         secp256k1_fe dummy;
     8+        printf("slow");
     9         ret = 2*secp256k1_fe_sqrt(&dummy, &tmp) - 1;
    10 #ifdef VERIFY
    11     } else {
    

    I’m quite confused because should be “extremely rare with random input”, but s appears to be a random field element. Furthermore, “slow” isn’t printed in the jacobi benchmarks unless I compile with -DVERIFY (which crashes after a few iterations, see other comment).


    sipa commented at 4:04 pm on December 10, 2022:

    Yes, deliberately.

    WIth -DVERIFY, the cut-off is at ~900 iterations of the algorithm, chosen to be roughly the median of the number of iterations needed. Without -DVERIFY, ~1500 iterations is the limit, which has probability less than 1 in a billion (probably a lot less).

    It’s using a lower limit in -DVERIFY to make sure the slow branch is exercised in tests.


    jonasnick commented at 4:46 pm on December 10, 2022:
    Ah ok, that’s good. I did not looked into secp256k1_jacobi64_maybe_var so far and hadn’t considered that its behavior would change on VERIFY.
  45. sipa force-pushed on Dec 10, 2022
  46. sipa force-pushed on Dec 10, 2022
  47. sipa force-pushed on Dec 10, 2022
  48. sipa commented at 5:02 pm on December 10, 2022: contributor

    I removed a determinant check in the 64-bit jacobi code, as this new algorithm can result in determinant either 2^62 or -2^62, but the int128 code can’t deal with that yet.

    EDIT: fixed now, added two commits to deal with that.

  49. sipa force-pushed on Dec 10, 2022
  50. sipa force-pushed on Dec 10, 2022
  51. sipa force-pushed on Dec 10, 2022
  52. sipa force-pushed on Dec 10, 2022
  53. in src/int128_struct_impl.h:195 in 55822babf2 outdated
    194+        return n >= 64 ? r->hi == (uint64_t)1 << (n - 64) && r->lo == 0
    195+                       : r->hi == 0 && r->lo == (uint64_t)1 << n;
    196+    } else {
    197+        return n >= 64 ? r->hi == (~(uint64_t)0) << (n - 64) && r->lo == 0
    198+                       : r->hi == (~(uint64_t)0) && r->lo == (~(uint64_t)0) << n;
    199+    }
    


    real-or-random commented at 4:48 pm on January 3, 2023:

    nit:

    0    return n >= 64 ? r->hi == (uint64_t)sign << (n - 64) && r->lo == 0
    1                   : r->hi == (uint64_t)((sign - 1) >> 2) && r->lo == (uint64_t)sign << n;
    

    sipa commented at 11:28 pm on January 3, 2023:
    Done.
  54. in src/tests.c:2001 in 55822babf2 outdated
    1997+        int pos = ub % 127;
    1998+        if (expect) {
    1999+            /* If expect==1, set swz to exactly -(2 << pos). */
    2000+            uint64_t hi = ~(uint64_t)0;
    2001+            uint64_t lo = ~(uint64_t)0;
    2002+            if (pos & 64) {
    


    real-or-random commented at 5:03 pm on January 3, 2023:

    nit:

    0            if (pos >= 64) {
    

    I find this much clearer, it took me a while to convince myself that they’re equivalent (which holds because pos < 128).

    If you happen to accept this suggestion, you should change it below too and also the two occurrences in the sign == 1 test block, where you got this code from.


    sipa commented at 11:28 pm on January 3, 2023:
    Done.
  55. in src/modinv64_impl.h:74 in 55822babf2 outdated
    70@@ -71,11 +71,13 @@ static int secp256k1_modinv64_mul_cmp_62(const secp256k1_modinv64_signed62 *a, i
    71     return 0;
    72 }
    73 
    74-/* Check if the determinant of t is equal to 1 << n. */
    75-static int secp256k1_modinv64_det_check_pow2(const secp256k1_modinv64_trans2x2 *t, unsigned int n) {
    76+/* Check if the determinant of t is equal to 1 << n. If abs, |det| == 1 << n. */
    


    real-or-random commented at 5:14 pm on January 3, 2023:

    readability nit:

    0/* Check if the determinant of t is equal to 1 << n. If abs, check if |det t| == 1 << n. */
    

    or if you prefer shorter

    0/* Check if the determinant of t equals 1 << n. If abs, check if |det t| == 1 << n. */
    

    real-or-random commented at 7:17 am on January 5, 2023:
    This one is still unresolved – (Feel free to dismiss, I’m mostly checking if the PR is ready for review again.)

    sipa commented at 3:40 pm on January 5, 2023:
    Done.
  56. in src/field.h:138 in 55822babf2 outdated
    138@@ -139,4 +139,7 @@ static void secp256k1_fe_half(secp256k1_fe *r);
    139  *  magnitude set to 'm' and is normalized if (and only if) 'm' is zero. */
    140 static void secp256k1_fe_get_bounds(secp256k1_fe *r, int m);
    141 
    142+/** Compute the Jacobi symbol of a / p. 0 if a=0; 1 if a square; -1 if a non-square. */
    


    real-or-random commented at 5:25 pm on January 3, 2023:

    nit:

    0/** Compute the Jacobi symbol of a / p. 0 if a=0; 1 if a square and non-zero; -1 if a non-square. */
    
  57. in src/field_5x52_impl.h:680 in 55822babf2 outdated
    679+    if (ret == -2) {
    680+        /* secp256k1_jacobi64_maybe_var failed to compute the Jacobi symbol. Fall back
    681+         * to computing a square root. This should be extremely rare with random
    682+         * input. */
    683+        secp256k1_fe dummy;
    684+        ret = 2*secp256k1_fe_sqrt(&dummy, &tmp) - 1;
    


    real-or-random commented at 6:05 pm on January 3, 2023:

    Should instead have a function fe_is_square_var?

    I think this will be better because it

    • avoids the return value translations here and below
    • is more ergonomic to use (I think, remember that we had it this in the old version of BIP340 too),
    • and it aligns with the uses in ellswift: in the corresponding PR, we only test >=0 and <0, so we actually don’t even need the special casing for ==0 currently.

    This is related to my comment below about the jacp function argument.


    sipa commented at 9:52 pm on January 4, 2023:
    I’ve renamed the function to secp256k1_fe_is_square_var; the internal (modinv) one is still actually computing a Jacobi symbol, I’m less convinced about changing things there.

    real-or-random commented at 7:16 am on January 5, 2023:
    makes sense
  58. in src/modinv64_impl.h:330 in 55822babf2 outdated
    323+    /* Transformation matrix; see comments in secp256k1_modinv64_divsteps_62. */
    324+    uint64_t u = 1, v = 0, q = 0, r = 1;
    325+    uint64_t f = f0, g = g0, m;
    326+    uint32_t w;
    327+    int i = 62, limit, zeros;
    328+    int jac = *jacp;
    


    real-or-random commented at 6:26 pm on January 3, 2023:
    0    int jac = *jacp;
    1    VERIFY_CHECK(jac == 0 || jac == 1);
    

    sipa commented at 10:02 pm on January 4, 2023:
    That’s not actually correct; the other bits of *jacp will take on non-zero values too, but are irrelevant. I’ve hopefully addressed that in the added comments.

    real-or-random commented at 6:42 am on January 5, 2023:
    Ok, I got it after reading your new comments.
  59. in src/modinv64_impl.h:317 in 55822babf2 outdated
    313+ * of the Jacobi symbol along the way. f0 and g0 must be f and g mod 2^64 rather than 2^62, because
    314+ * Jacobi tracking requires knowing (f mod 8) rather than just (f mod 2).
    315+ *
    316+ * Input:  eta: initial eta
    317+ *         f0:  bottom limb of initial f
    318+ *         g0:  bottom limb of initial g
    


    real-or-random commented at 6:29 pm on January 3, 2023:
    should document the input/output parameter jacp. (And the variable name is confusing. This is not the jacobi symbol but its sign.)

    sipa commented at 9:52 pm on January 4, 2023:
    I’ve added a comment explaining it (in both the 32-bit and 64-bit versions).
  60. in src/modinv64_impl.h:729 in 55822babf2 outdated
    721+ * In VERIFY mode use a lower number of iterations (744, close to the median 756), so failure actually occurs. */
    722+#ifdef VERIFY
    723+#define JACOBI64_ITERATIONS 12
    724+#else
    725+#define JACOBI64_ITERATIONS 25
    726+#endif
    


    real-or-random commented at 6:34 pm on January 3, 2023:

    On the one hand, I really like this trick. But on the other hand, it creates a difference between VERIFY and non-VERIFY (and who knows, maybe this influences loop unrolling etc.)

    Would it be a good idea to keep this 25 in the real code but create a test-only version of it (either in the tests file or here) that just runs with 12? Or maybe better, make the number of iterations an argument of the function? Since it’s a “maybe” function, it should be correct for each iteration value.


    sipa commented at 9:56 pm on January 4, 2023:

    it creates a difference between VERIFY and non-VERIFY (and who knows, maybe this influences loop unrolling etc.)

    Is that a problem? VERIFY vs non-VERIFY may already affect loop unrolling by having more code inside the loop for example. I feel like VERIFY mode isn’t to test the compiler’s correctness, but to test the correctness of our source code.

    So… I’m not sure. I do see the uneasiness about it, but I feel that the current approach is really the only one that guarantees (a) it’s actually exercised in realistic settings and (b) using the same code as is used in production. A copy in tests could go out of sync, and having the iteration count be an argument still means somewhere higher in the stack there needs to be a decision what to set it to, which I think has the same problems.

    If we don’t like it, I think the best alternative is just having maintainers occasionally manually setting the iteration count lower, and seeing if tests still pass.


    real-or-random commented at 7:13 am on January 5, 2023:

    I feel like VERIFY mode isn’t to test the compiler’s correctness, but to test the correctness of our source code.

    That’s the way I see it. too. [1] I think the fundamental problem here is that we don’t really run tests in non-VERIFY mode. I’ll comment in #1037, which is somewhat related.

    A copy in tests could go out of sync, […]

    Agreed, that’s bad.

    If we don’t like it, I think the best alternative is just having maintainers occasionally manually setting the iteration count lower, and seeing if tests still pass.

    That’s even worse, we’ll forget to run it.

    and having the iteration count be an argument still means somewhere higher in the stack there needs to be a decision what to set it to, which I think has the same problems.

    So… I’m not sure. I do see the uneasiness about it, but I feel that the current approach is really the only one that guarantees (a) it’s actually exercised in realistic settings and (b) using the same code as is used in production.

    Hm, yeah, I don’t think (a) is a big concern here: What this VERIFY tests is just handling the loop termination in case of too many iterations, and that’s a few lines of code which doesn’t care too much if the failure happened in an realistic input or not. So what I imagined it is that we run it for values of the iteration count in the test, and this will be good enough to exercise this code.

    This would have the advantage that it minimizes the difference between VERIFY and non-VERIFY mode. But after thinking more about it, I agree that it’s a somewhat moot goal. [2] Your comment about loop unrolling is right and we have a lot of code locations where we add some substantial amount of code in VERIFY (you introduce another one here in this PR, which I don’t mind at all.)

    So yes, let’s keep this as it is.

    [1] Though to some extent, VERIFY can also help checking compiler, e.g., when a function is miscompiles and its result is VERIFIED somewhere else. [2] I think my brain mixed up my thoughts with the discussion in #1169#pullrequestreview-1224682895, but this was about SECP256K1_CHECKMEM_ENABLED being defined, not VERIFY.


    sipa commented at 3:19 pm on January 5, 2023:

    Another thing is that the approach here guarantees that (in VERIFY mode) all potential callers of this function actually deal with failure being returned. An approach where the caller has to explicitly ask for “low iterations” doesn’t have that property, as a hypothetical oblivious caller just wouldn’t ask for it.

    That’s a minor concern of course; I think it’s pretty unlikely there will be other callers than secp256k1_fe_is_square_var any time soon (and tests).

  61. real-or-random commented at 6:39 pm on January 3, 2023: contributor

    This is really good, I have almost only nits.

    note to self: I haven’t looked at the tests for secp256k1_jacobi32/64_maybe_var yet

  62. in src/modinv32_impl.h:333 in 55822babf2 outdated
    325+ *         f0:  bottom limb of initial f
    326+ *         g0:  bottom limb of initial g
    327+ * Output: t: transition matrix
    328+ * Return: final eta
    329+ */
    330+static int32_t secp256k1_modinv32_posdivsteps_30_var(int32_t eta, uint32_t f0, uint32_t g0, secp256k1_modinv32_trans2x2 *t, int *jacp) {
    


    real-or-random commented at 6:40 pm on January 3, 2023:
    Since this is so similar to the other function, have you considered using macros to avoid code duplication? (I’m not saying it’s necessarily better.)

    sipa commented at 9:58 pm on January 4, 2023:

    I have, but it’s a bunch of small changes:

    • Different transformation matrices (which in terms of code means some negation signs are gone).
    • Higher upper bounds on eta.
    • *jacp accounting.
    • Different constraints on the determinant of the transformation matrix.

    I don’t think macros to reduce code reuse would be very readable.


    real-or-random commented at 6:41 am on January 5, 2023:
    Makes sense!
  63. sipa force-pushed on Jan 3, 2023
  64. sipa force-pushed on Jan 4, 2023
  65. sipa force-pushed on Jan 5, 2023
  66. in doc/safegcd_implementation.md:775 in ef2ac449d7 outdated
    768@@ -769,3 +769,30 @@ def modinv_var(M, Mi, x):
    769         d, e = update_de(d, e, t, M, Mi)
    770     return normalize(f, d, Mi)
    771 ```
    772+
    773+## 8. From GCDs to Jacobi symbol
    774+
    775+We can also use a similar approach to calculate Jacobi symbol *(x | M)* by keeping track of an extra variable *j*, for which at every step *(x | M) = j (g | f)*. As we update *f* and *g*, we make corresponding updates to *j* using [properties of the Jacobi symbol](https://en.wikipedia.org/wiki/Jacobi_symbol#Properties). In particular, we update *j* whenever we divide *g* by *2* or swap *f* and *g*; these updates depend only on the values of *f* and *g* modulo *4* or *8*, and can thus be applied very quickly. Overall, this calculation is slightly simpler than the one for modular inverse because we no longer need to keep track of *d* and *e*.
    


    real-or-random commented at 4:24 pm on January 5, 2023:
    It wouldn’t hurt to spell the two properties out here for self-containedness.

    sipa commented at 8:57 pm on January 26, 2023:
    Done.
  67. in src/modinv32.h:38 in ef2ac449d7 outdated
    34@@ -35,4 +35,8 @@ static void secp256k1_modinv32_var(secp256k1_modinv32_signed30 *x, const secp256
    35 /* Same as secp256k1_modinv32_var, but constant time in x (not in the modulus). */
    36 static void secp256k1_modinv32(secp256k1_modinv32_signed30 *x, const secp256k1_modinv32_modinfo *modinfo);
    37 
    38+/* Compute the Jacobi symbol for (x | modinfo->modulus). Either x must be 0, or x must be coprime with
    


    real-or-random commented at 4:26 pm on January 5, 2023:

    maybe this should spell out that the Jacobi symbol is only defined for odd mod?

    (Do we need mod > 1? I see the termination condition f == 1 is only checked after applying a the first round of posdivsteps.)


    sipa commented at 6:54 pm on January 5, 2023:
    If mod=1, then x=0, so I think the output is trivial in any case then?

    real-or-random commented at 10:46 am on January 6, 2023:

    Okay yes, I don’t immediately see why. If this is the case, then it should probably be explained.

    (This can be done later as you said below.)


    jonasnick commented at 8:39 am on February 27, 2023:
    Nit: The jacobi symbol (0|1) is commonly defined to be 1 (see e.g. wikipedia), but secp256k1_jacobi_maybe_var returns 0 in that case. I think it would be valuable to document secp256k1_jacobi_maybe_var’s behavior in this edge case.

    real-or-random commented at 9:30 am on February 27, 2023:

    That’s related to what I said there: #979 (review). Though I don’t understand why I said there that I don’t see why the output is trivial. It’s trivial, of course. Maybe what I meant is that I don’t see immediately why the output is correct, which matches your comment here, @jonasnick.

    Perhaps it’s better to simply exclude (0|1) as a valid input, and then document and VERIFY_CHECK that. That makes it easier to recognize this corner case in future code. (And I feel it’s hard to predict what definition of jac(0|1) future code would expect to work. It’s better to make callers remind of the corner case explicitly.)


    sipa commented at 9:28 pm on February 27, 2023:
    @real-or-random Has this been addressed by only supporting coprime inputs? The secp256k1_modinv32_modinfo requires an odd modulus already.

    sipa commented at 9:29 pm on February 27, 2023:
    Done.

    real-or-random commented at 3:30 pm on February 28, 2023:

    Yes, that can be considered resolved.

    The secp256k1_modinv32_modinfo requires an odd modulus already.

    I wonder if it’s a good idea to add a function secp256k1_modinv32_modinfo verify that checks consistency and call it on entry of every function that reads modinfo, similar to how we do this for other data structures. But if yes, that should happen in a separate PR.

  68. real-or-random commented at 4:34 pm on January 5, 2023: contributor

    Some final comments you can address or ignore.

    I’m running some testing locally and will then ACK if it’s successful.

  69. real-or-random approved
  70. real-or-random commented at 7:03 pm on January 5, 2023: contributor
    ACK ef2ac449d757189b67be27437824c2d0c0663bde diff and writeup is good and I tested every commit
  71. sipa commented at 7:17 pm on January 5, 2023: contributor
    I think I’ll leave your remaining nits for now, as they seem to intersect a bit with the existing safegcd/modinv functions, so addressing them is perhaps better done afterwards generally.
  72. real-or-random commented at 8:32 pm on January 16, 2023: contributor
    needs (trivial) rebase
  73. sipa force-pushed on Jan 17, 2023
  74. sipa commented at 4:45 am on January 17, 2023: contributor
    Rebased after merge of #1190.
  75. real-or-random approved
  76. real-or-random commented at 9:39 am on January 18, 2023: contributor
    reACK 642b9f42 diff and writeup is good and I tested every commit
  77. in src/tests.c:1962 in f6470b6afe outdated
    1961+    /* test secp256k1_i128_check_pow2 (sign == -1) */
    1962+    {
    1963+        int expect = (uc & 1);
    1964+        int pos = ub % 127;
    1965+        if (expect) {
    1966+            /* If expect==1, set swz to exactly -(2 << pos). */
    


    siv2r commented at 10:49 am on January 21, 2023:
    Shouldn’t it be -2^pos instead of -(2 << pos)? Similarly, in the lines: 1973, 1937, 1947.

    sipa commented at 8:57 pm on January 26, 2023:
    Good catch, fixed.
  78. sipa force-pushed on Jan 26, 2023
  79. sipa commented at 8:57 pm on January 26, 2023: contributor
    Rebased, and addressed a few nits.
  80. jonasnick commented at 8:51 am on February 27, 2023: contributor

    ACK mod nit

    I was able to implement the algorithm based on the descriptions and python snippets in safegcd_implementation.md and compare it against the C implementation.

  81. Make secp256k1_i128_check_pow2 support -(2^n) 5fffb2c7af
  82. Make secp256k1_modinv64_det_check_pow2 support abs val 04c6c1b181
  83. sipa force-pushed on Feb 27, 2023
  84. sipa commented at 9:25 pm on February 27, 2023: contributor
    Rebased, and updated to only support coprime inputs in the secp256k1_secp256k1_jacobiXX_maybe_var functions. The return value of those functions for “unknown/timeout” has been changed from -2 to 0.
  85. in src/modinv32.h:38 in 4a8d82de3c outdated
    34@@ -35,4 +35,8 @@ static void secp256k1_modinv32_var(secp256k1_modinv32_signed30 *x, const secp256
    35 /* Same as secp256k1_modinv32_var, but constant time in x (not in the modulus). */
    36 static void secp256k1_modinv32(secp256k1_modinv32_signed30 *x, const secp256k1_modinv32_modinfo *modinfo);
    37 
    38+/* Compute the Jacobi symbol for (x | modinfo->modulus). x must be coprime with modulus (and thus cannot be 0).
    


    jonasnick commented at 12:52 pm on February 28, 2023:
    Isn’t 0 coprime with 1 (and no other positive integer)? The gcd is 1, since 1 divides 0 and 1 divides 1.

    sipa commented at 3:04 pm on February 28, 2023:
    Good call. And iterating (f=1,g=0) doesn’t change the state, so it’s inaccurate to say it converges to f=g=gcd(f0,g0) as well.

    sipa commented at 3:33 pm on February 28, 2023:
    I’ve changed some comments, and clarified the explanation in safegcd_implementation.md too.

    jonasnick commented at 4:01 pm on February 28, 2023:
    Maybe I’m missing something but this line still says x coprime with modulus => x != 0.

    sipa commented at 4:03 pm on February 28, 2023:
    Ah, oops. It is correct, because the modinfo structure requires modulus >= 3, but I forgot to point that out in the comment here (it is added elsewhere).

    sipa commented at 4:23 pm on February 28, 2023:
    Added a comment about that too now.
  86. sipa force-pushed on Feb 28, 2023
  87. sipa force-pushed on Feb 28, 2023
  88. in src/tests.c:1046 in 1c1c202268 outdated
    1042+        negone[0] = mod[0] - 1;
    1043+        for (i = 1; i < 16; ++i) negone[i] = mod[i];
    1044+        mulmod256(sqr, sqr, negone, mod);
    1045+        uint16_to_signed30(&x, sqr);
    1046+        jac = secp256k1_jacobi32_maybe_var(&x, &m);
    1047+        CHECK(jac == 0 || jac == (1 - (mod[0] & 2)) * nonzero);
    


    jonasnick commented at 8:51 pm on February 28, 2023:
    nonzero is always 1 in this branch (same in modinv64 tests)

    sipa commented at 8:57 pm on February 28, 2023:
    Fixed.
  89. jonasnick commented at 8:51 pm on February 28, 2023: contributor
    reACK mod nit
  90. Native jacobi symbol algorithm
    This introduces variants of the divsteps-based GCD algorithm used for
    modular inverses to compute Jacobi symbols. Changes compared to
    the normal vartime divsteps:
    * Only positive matrices are used, guaranteeing that f and g remain
      positive.
    * An additional jac variable is updated to track sign changes during
      matrix computation.
    * There is (so far) no proof that this algorithm terminates within
      reasonable amount of time for every input, but experimentally it
      appears to almost always need less than 900 iterations. To account
      for that, only a bounded number of iterations is performed (1500),
      after which failure is returned. In VERIFY mode a lower iteration
      count is used to make sure that callers exercise their fallback.
    * The algorithm converges to f=g=gcd(f0,g0) rather than g=0. To keep
      this test simple, the end condition is f=1, which won't be reached
      if started with non-coprime or g=0 inputs. Because of that we only
      support coprime non-zero inputs.
    1de2a01c2b
  91. Add secp256k1_fe_is_square_var function
    The implementation calls the secp256k1_modinvNN_jacobi_var code, falling back
    to computing a square root in the (extremely rare) case it failed converge.
    6be01036c8
  92. doc: Describe Jacobi calculation in safegcd_implementation.md ce3cfc78a6
  93. sipa force-pushed on Feb 28, 2023
  94. jonasnick commented at 9:10 pm on February 28, 2023: contributor
    ACK ce3cfc78a6020d21be299e1e4f22cf8ef089194d
  95. real-or-random approved
  96. real-or-random commented at 11:50 am on March 1, 2023: contributor
    reACK ce3cfc78a6020d21be299e1e4f22cf8ef089194d diff and writeup is good and I tested every commit
  97. real-or-random merged this on Mar 1, 2023
  98. real-or-random closed this on Mar 1, 2023

  99. real-or-random cross-referenced this on Mar 1, 2023 from issue modinv: Verify invariant of _modinfo struct by real-or-random
  100. dhruv referenced this in commit a5df79db12 on Mar 7, 2023
  101. dhruv referenced this in commit 77b510d84c on Mar 7, 2023
  102. sipa referenced this in commit 763079a3f1 on Mar 8, 2023
  103. div72 referenced this in commit 945b094575 on Mar 14, 2023
  104. in src/int128_struct_impl.h:196 in ce3cfc78a6
    195-                  : r->hi == 0 && r->lo == (uint64_t)1 << n;
    196+static SECP256K1_INLINE int secp256k1_i128_check_pow2(const secp256k1_int128 *r, unsigned int n, int sign) {
    197+    VERIFY_CHECK(n < 127);
    198+    VERIFY_CHECK(sign == 1 || sign == -1);
    199+    return n >= 64 ? r->hi == (uint64_t)sign << (n - 64) && r->lo == 0
    200+                   : r->hi == (uint64_t)((sign - 1) >> 1) && r->lo == (uint64_t)sign << n;
    


    roconnor-blockstream commented at 10:37 am on March 22, 2023:
    There is no need to subtract 1 before doing a right shift.

    real-or-random commented at 6:29 am on April 10, 2023:
  105. real-or-random referenced this in commit 145078c418 on Apr 10, 2023
  106. sipa cross-referenced this on May 11, 2023 from issue Abstract out and merge all the magnitude/normalized logic by sipa
  107. vmta referenced this in commit e1120c94a1 on Jun 4, 2023
  108. vmta referenced this in commit 8f03457eed on Jul 1, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-30 01:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me