Based on #979.
This adds a module with an implementation of the Elligator Squared algorithm for encoding/decoding public keys in uniformly random byte arrays.
```c
 93 | +    for (i = 0; i < 1000*count; i++) {
 94 | +        secp256k1_fe t;
 95 | +        secp256k1_ge g;
 96 | +        random_field_element_test(&t);
 97 | +        secp256k1_ellsq_fe_to_ge_var(&g, &t);
 98 | +        CHECK(secp256k1_ge_is_valid_var(&g));
```
Is it worth checking that t appears exactly once among preimages of g?
```c
{
    /* t should appear exactly once in preimages */
    int j, matches = 0;
    secp256k1_fe f;
    for (j = 0; j < 4; j++) {
        if (secp256k1_ellsq_ge_to_fe_var(&f, &g, j) && check_fe_equal(&f, &t)) matches++;
    }
    CHECK(matches == 1);
}
```
Good idea, done.
```c
 96 | +        random_field_element_test(&t);
 97 | +        secp256k1_ellsq_fe_to_ge_var(&g, &t);
 98 | +        CHECK(secp256k1_ge_is_valid_var(&g));
 99 | +    }
100 | +    /* Verify that secp256k1_ellsq_ge_to_fe_var + fe_to_ge_var roundtrips for random inputs. */
101 | +    for (i = 0; i < 1000*count; i++) {
```
Is it worth checking that all preimages are distinct?
```c
int j, valid[4];
secp256k1_fe f[4];
secp256k1_ge g;
random_group_element_test(&g);
for (j = 0; j < 4; j++) {
    valid[j] = 0;
    if (secp256k1_ellsq_ge_to_fe_var(&f[j], &g, j)) {
        secp256k1_ge g2;
        secp256k1_ellsq_fe_to_ge_var(&g2, &f[j]);
        ge_equals_ge(&g, &g2);
        valid[j] = 1;
    }
}
/* Preimages should all be distinct */
for (j = 0; j < 4; j++) {
    int k;
    if (!valid[j]) continue;
    for (k = j + 1; k < 4; k++) {
        if (!valid[k]) continue;
        CHECK(!check_fe_equal(&f[j], &f[k]));
    }
}
```
Good idea, added (slightly differently).
tACK 7b1e6260e215174561a2d1fdd7f500ba6eaf1489
I'll give a full ack once I carefully review:
Basic question: what would go wrong if we used naive approaches such as:
(1) Given a random curve point (x1, y1), increment the x coordinate until you get another curve point (x2, y2). Use a random field element in [x1, x2) as the encoding. Decode by decrementing until you get an x corresponding to a curve point.
(2) Just generate and use a random field element as the encoding; both parties decrement x until they get a valid curve point.
I'm guessing the reason we don't do something like that is (1) will produce field elements that don't look random, and (2) will generate keys in a biased way?
(1) Indeed, this will result in biased encodings, because the distance between field elements which are valid X coordinates varies and is known to the attacker, so they can detect a bias: encodings for which the previous and next valid X coordinate are close together will occur more frequently.
(2) A biased private key isn't really a problem as long as (a) the bias is small and (b) the bias isn't recognizable from the encodings/public keys, which is the case here. There is another problem, however: the encoder wouldn't know the corresponding private key.
Oops, good point about (2), I am silly. 🤣
But about (1), if there were a known upper bound on the distance between valid X coordinates (AFAIK there isn't one, but more than ~128 is probably negligibly rare), you could actually use f(x) = (first point with largest X coordinate not larger than x) as mapping function in the scheme. I suspect that's significantly slower than what we have now though, as you'd need avg 128 iterations, and counting the distance between valid X coordinates around each...
```c
249 | +         secp256k1_ge_neg(&q, &q);
250 | +         secp256k1_gej_set_ge(&qj, &q);
251 | +         secp256k1_gej_add_ge_var(&qj, &qj, &p, NULL);
252 | +         if (!secp256k1_gej_is_infinity(&qj)) {
253 | +             secp256k1_ge_set_gej_var(&q, &qj);
254 | +         } else {
```
Quick question, is there an easy way to hit this branch in a test, that doesn't involve something convoluted like mocking out the hash function?
No worries if not but just thought I'd double check, since I added a VERIFY_CHECK(0) here and it doesn't get hit by either the test or benchmark.
See the comment below:
Note that if u1 is generated using a secure hash function, the probability of reaching this branch is negligible.
So I think it's indeed impossible to construct a testcase for it without stubbing out the hash function.
I ended up just checking this case manually with these steps:
- Set u1 directly to a preimage of p (if one exists)
- Add a VERIFY_CHECK(0) in the else case and make sure it gets hit
- Remove the VERIFY_CHECK(0) and make sure the tests (especially roundtrips) pass

Looks good to me!
Code Review ACK 7b1e6260e215174561a2d1fdd7f500ba6eaf1489, modulo verifying the proof that the encoding is actually indistinguishable from 64 random bytes
I opened the different implementations of f, r side by side and checked that:
Basic question:

rnd32 as the key)? (Is this a well-known construction?)

```c
272 | +         }
273 | +         memset(hashdata, 0, sizeof(hashdata));
274 | +         return 1;
275 | +     }
276 | +     /* Only returned in case the provided pubkey is invalid. */
277 | +     return 0;
```
Could this happen? The coverage report shows that this line doesn't get hit, and secp256k1_pubkey_load never seems to return 0.
If not, would it make sense to either:
- Replace `if (secp256k1_pubkey_load(ctx, &p, pubkey)) {` with a VERIFY_CHECK and de-indent the body

Hmm, good point. I will address this.
Perhaps we should also make secp256k1_pubkey_load return void (not in this PR, though).
secp256k1_pubkey_load can actually return 0, through its ARG_CHECK macro.
Oh nice, I missed that.
Do we want stricter checks here than ARG_CHECK(!secp256k1_fe_is_zero(&ge->x))? For example the following passes, but I wouldn't expect 1000 randomly generated 64-byte strings to all produce valid pubkeys:

```c
int N = 1000;
int j = 0, success = 0;
secp256k1_ge g;
secp256k1_pubkey pubkey;
for (j = 0; j < N; j++) {
    /* Fresh random 64-byte string on each iteration */
    secp256k1_testrand256(pubkey.data);
    secp256k1_testrand256(pubkey.data + 32);
    if (secp256k1_pubkey_load(ctx, &g, &pubkey)) success++;
}
CHECK(success == N);
```
I don't think that's needed. That check is a last-resort sanity check (several functions will set pubkeys on output to all-zeros when they return failure). Triggering that situation implies you're already using the API incorrectly; a secp256k1_pubkey object must contain a valid public key, in normal usage.
```c
268 | +                 secp256k1_fe_normalize_var(&u2);
269 | +                 secp256k1_fe_get_b32(ell64 + 32, &u2);
270 | +                 break;
271 | +             }
272 | +         }
273 | +         memset(hashdata, 0, sizeof(hashdata));
```
Do you know if this memset gets optimized out? e.g. here's how the disassembly for the tests binary looked on my machine:
```
callq  _secp256k1_ellsq_ge_to_fe_var
testl  %eax, %eax
movq   -0xd0(%rbp), %r15
movq   -0xc0(%rbp), %rbx
leaq   -0x2c0(%rbp), %r14
leaq   -0x90(%rbp), %r12
leaq   -0x320(%rbp), %r13
jne    0x1000097f1
...
0x1000097f1:
leaq   -0x2f0(%rbp), %r14
movq   %r14, %rdi
callq  _secp256k1_fe_normalize_var
addq   $0x20, %r15
movq   %r15, %rdi
movq   %r14, %rsi
callq  _secp256k1_fe_get_b32
movl   $0x1, %eax
movq   0x1867e5(%rip), %rcx  ## literal pool symbol address: ___stack_chk_guard
movq   (%rcx), %rcx
cmpq   -0x30(%rbp), %rcx
jne    0x10000988d
addq   $0x2f8, %rsp          ## imm = 0x2F8
popq   %rbx
popq   %r12
popq   %r13
popq   %r14
popq   %r15
popq   %rbp
retq
```
I see calls corresponding to the if statement on line 266 above, as well as the return 1, but I don't see anything corresponding to the memset.
Yes, it is optimized out, but currently that's true for all of these "safety" memsets in the codebase. This is something we should really fix. I have a PR here, but it's old and needs to be rebased and finished... #636
ACK 7b1e6260e215174561a2d1fdd7f500ba6eaf1489, I don't think the rest of my comments are blocking.
rnd32.

The only thing I haven't verified is that the function f is indeed "well-distributed", but I don't want to block this PR until I've learned the background material needed to check that. I'm happy to accept it without verifying the proof for now.
Finally, would it make sense to either:
- Elaborate on the statement "not terribly non-uniform" in https://github.com/sipa/writeups/tree/main/elligator-square-for-bn#25-dealing-with-infinity
- Make the encoder NOT target the f(u) = -f(v) case, so that it's really easy to show that the statistical distance between the implemented distribution and the one in the paper is negligible (the decoder can still handle that case, so that every 64-byte string decodes to a valid public key)

Elaborate on the statement "not terribly non-uniform" in https://github.com/sipa/writeups/tree/main/elligator-square-for-bn#25-dealing-with-infinity
I should drop that, as it's confusing. Given the immensely low probability of hitting infinity in the first place, how it is handled is totally irrelevant.
This comment aims to indicate that it's better than e.g. mapping infinity to the generator or so (which it is, as it'd mean the generator has ~2n ellsq preimages, while other points only have ~n), but I don't think that's a useful justification given it's only changing a negligibly frequent event anyway. The real justification is: it's probably the easiest way to implement this edge case.
Make the encoder NOT target the f(u) = -f(v) case, so that it's really easy to show that the statistical distance between the implemented distribution and the one in the paper is negligible (the decoder can still handle that case, so that every 64-byte string decodes to a valid public key)
I don't think it should be hard to show the difference is negligible, given the fact that even triggering (much less observing) a case in which the outcome is different is negligible (it requires picking a uniformly random u for which f(u)=-P).
OK, I'm fully on board with this now! If you're curious, see https://gist.github.com/robot-dreams/86302bfca28bf9dc83ced365cd428f64 for way too much detail that nobody needed or asked for.
ACK 07a4ef5a08cea9916cf93603670d8ee4d5340540
Rebased.
ACK f997dadf592131054d1cfca6175730e53dee52a1 (aside from most recent build / CI changes) based on:
This introduces variants of the divsteps-based GCD algorithm used for
modular inverses to compute Jacobi symbols. Changes compared to
the normal vartime divsteps:
* Only positive matrices are used, guaranteeing that f and g remain
positive.
* An additional jac variable is updated to track sign changes during
matrix computation.
* There is (so far) no proof that this algorithm terminates within a
  reasonable amount of time for every input, but experimentally it
  appears to almost always need fewer than 900 iterations. To account
  for that, only a bounded number of iterations is performed (1500),
  after which failure is returned. The field logic then falls back to
  using square roots to determine the result.
* The algorithm converges to f=g=gcd(f0,g0) rather than g=0. To keep
this test simple, the end condition is f=1, which won't be reached
if started with g=0. That case is dealt with specially.
This adds a module with an implementation of the Elligator Squared
algorithm for encoding/decoding public keys in uniformly random
byte arrays.
Closing, as Elligator Square seems strictly inferior in performance and complexity to ElligatorSwift.