Based on #979.
This adds a module with an implementation of the Elligator Squared algorithm for encoding/decoding public keys in uniformly random byte arrays.
```c
 93 | +    for (i = 0; i < 1000*count; i++) {
 94 | +        secp256k1_fe t;
 95 | +        secp256k1_ge g;
 96 | +        random_field_element_test(&t);
 97 | +        secp256k1_ellsq_fe_to_ge_var(&g, &t);
 98 | +        CHECK(secp256k1_ge_is_valid_var(&g));
```
Is it worth checking that t appears exactly once among preimages of g?
```c
{
    /* t should appear exactly once in preimages */
    int j, matches = 0;
    secp256k1_fe f;
    for (j = 0; j < 4; j++) {
        if (secp256k1_ellsq_ge_to_fe_var(&f, &g, j) && check_fe_equal(&f, &t)) matches++;
    }
    CHECK(matches == 1);
}
```
Good idea, done.
```c
 96 | +        random_field_element_test(&t);
 97 | +        secp256k1_ellsq_fe_to_ge_var(&g, &t);
 98 | +        CHECK(secp256k1_ge_is_valid_var(&g));
 99 | +    }
100 | +    /* Verify that secp256k1_ellsq_ge_to_fe_var + fe_to_ge_var roundtrips for random inputs. */
101 | +    for (i = 0; i < 1000*count; i++) {
```
Is it worth checking that all preimages are distinct?
```c
int j, valid[4];
secp256k1_fe f[4];
secp256k1_ge g;
random_group_element_test(&g);
for (j = 0; j < 4; j++) {
    valid[j] = 0;
    if (secp256k1_ellsq_ge_to_fe_var(&f[j], &g, j)) {
        secp256k1_ge g2;
        secp256k1_ellsq_fe_to_ge_var(&g2, &f[j]);
        ge_equals_ge(&g, &g2);
        valid[j] = 1;
    }
}
/* Preimages should all be distinct */
for (j = 0; j < 4; j++) {
    int k;
    if (!valid[j]) continue;
    for (k = j + 1; k < 4; k++) {
        if (!valid[k]) continue;
        CHECK(!check_fe_equal(&f[j], &f[k]));
    }
}
```
Good idea, added (slightly differently).
tACK 7b1e6260e215174561a2d1fdd7f500ba6eaf1489
I'll give a full ack once I carefully review:
Basic question: what would go wrong if we used naive approaches such as:
(1) Given a random curve point (x1, y1), increment the x coordinate until you get another curve point (x2, y2). Use a random field element in [x1, x2) as the encoding. Decode by decrementing until you get an x corresponding to a curve point.
(2) Just generate and use a random field element as the encoding; both parties decrement x until they get a valid curve point.
I'm guessing the reason we don't do something like that is (1) will produce field elements that don't look random, and (2) will generate keys in a biased way?
(1) Indeed, this will result in biased encodings, because the distance between field elements which are valid X coordinates varies and is known to the attacker, so they can detect a bias: encodings for which the previous and next valid X coordinate are close together will occur more frequently.
(2) A biased private key isn't really a problem as long as (a) the bias is small and (b) the bias isn't recognizable from the encodings/public keys, which is the case here. There is another problem, however: the encoder wouldn't know the corresponding private key.
Oops, good point about (2), I am silly. 🤣
But about (1), if there were a known upper bound on the distance between valid X coordinates (AFAIK there isn't one, but more than ~128 is probably negligibly rare), you could actually use f(x) = (first point with largest X coordinate not larger than x) as mapping function in the scheme. I suspect that's significantly slower than what we have now though, as you'd need avg 128 iterations, and counting the distance between valid X coordinates around each...
```c
249 | +         secp256k1_ge_neg(&q, &q);
250 | +         secp256k1_gej_set_ge(&qj, &q);
251 | +         secp256k1_gej_add_ge_var(&qj, &qj, &p, NULL);
252 | +         if (!secp256k1_gej_is_infinity(&qj)) {
253 | +             secp256k1_ge_set_gej_var(&q, &qj);
254 | +         } else {
```
Quick question, is there an easy way to hit this branch in a test, that doesn't involve something convoluted like mocking out the hash function?
No worries if not but just thought I'd double check, since I added a VERIFY_CHECK(0) here and it doesn't get hit by either the test or benchmark.
See the comment below:
Note that if u1 is generated using a secure hash function, the probability of reaching this branch is negligible.
So I think it's indeed impossible to construct a testcase for it without stubbing out the hash function.
I ended up just checking this case manually with these steps:
- Set u1 directly to a preimage of p (if one exists)
- Add a VERIFY_CHECK(0) in the else case and make sure it gets hit
- Remove the VERIFY_CHECK(0) and make sure the tests (especially roundtrips) pass

Looks good to me!
Code Review ACK 7b1e6260e215174561a2d1fdd7f500ba6eaf1489, modulo verifying the proof that the encoding is actually indistinguishable from 64 random bytes
I opened the different implementations of f, r side by side and checked that:
Basic question:

rnd32 as the key)? (Is this a well-known construction?)

```c
272 | +         }
273 | +         memset(hashdata, 0, sizeof(hashdata));
274 | +         return 1;
275 | +     }
276 | +     /* Only returned in case the provided pubkey is invalid. */
277 | +     return 0;
```
Could this happen? The coverage report shows that this line doesn't get hit, and secp256k1_pubkey_load never seems to return 0.
If not, would it make sense to either:
- Replace `if (secp256k1_pubkey_load(ctx, &p, pubkey)) {` with a VERIFY_CHECK and de-indent the body

Hmm, good point. I will address this.
Perhaps we should also make secp256k1_pubkey_load return void (not in this PR, though).
secp256k1_pubkey_load can actually return 0, through its ARG_CHECK macro.
Oh nice, I missed that.
Do we want stricter checks here than ARG_CHECK(!secp256k1_fe_is_zero(&ge->x))? For example the following passes, but I wouldn't expect 1000 randomly generated 64-byte strings to all produce valid pubkeys:

```c
int N = 1000;
int j = 0, success = 0;
secp256k1_ge g;
secp256k1_pubkey pubkey;
for (j = 0; j < N; j++) {
    /* Fresh random 64-byte string on each iteration */
    secp256k1_testrand256(pubkey.data);
    secp256k1_testrand256(pubkey.data + 32);
    if (secp256k1_pubkey_load(ctx, &g, &pubkey)) success++;
}
CHECK(success == N);
```
I don't think that's needed. That check is a last-resort sanity check (several functions will set pubkeys on output to all-zeros when they return failure). Triggering that situation implies you're already using the API incorrectly; a secp256k1_pubkey object must contain a valid public key, in normal usage.
```c
268 | +                 secp256k1_fe_normalize_var(&u2);
269 | +                 secp256k1_fe_get_b32(ell64 + 32, &u2);
270 | +                 break;
271 | +             }
272 | +         }
273 | +         memset(hashdata, 0, sizeof(hashdata));
```
Do you know if this memset gets optimized out? e.g. here's how the disassembly for the tests binary looked on my machine:
```
callq  _secp256k1_ellsq_ge_to_fe_var
testl  %eax, %eax
movq   -0xd0(%rbp), %r15
movq   -0xc0(%rbp), %rbx
leaq   -0x2c0(%rbp), %r14
leaq   -0x90(%rbp), %r12
leaq   -0x320(%rbp), %r13
jne    0x1000097f1
...
0x1000097f1:
leaq   -0x2f0(%rbp), %r14
movq   %r14, %rdi
callq  _secp256k1_fe_normalize_var
addq   $0x20, %r15
movq   %r15, %rdi
movq   %r14, %rsi
callq  _secp256k1_fe_get_b32
movl   $0x1, %eax
movq   0x1867e5(%rip), %rcx  ## literal pool symbol address: ___stack_chk_guard
movq   (%rcx), %rcx
cmpq   -0x30(%rbp), %rcx
jne    0x10000988d
addq   $0x2f8, %rsp          ## imm = 0x2F8
popq   %rbx
popq   %r12
popq   %r13
popq   %r14
popq   %r15
popq   %rbp
retq
```
I see calls corresponding to the if statement on line 266 above, as well as the return 1, but I don't see anything corresponding to the memset.
Yes, it is optimized out, but currently that's true for all of these "safety" memsets in the codebase. This is something we should really fix. I have a PR here, but it's old and needs to be rebased and finished... #636
ACK 7b1e6260e215174561a2d1fdd7f500ba6eaf1489, I don't think the rest of my comments are blocking.
rnd32.

The only thing I haven't verified is that the function f is indeed "well-distributed", but I don't want to block this PR until I've learned the background material needed to check that. I'm happy to accept it without verifying the proof for now.
Finally, would it make sense to either:
- Elaborate on the statement "not terribly non-uniform" in https://github.com/sipa/writeups/tree/main/elligator-square-for-bn#25-dealing-with-infinity
- Make the encoder NOT target the f(u) = -f(v) case, so that it's really easy to show that the statistical distance between the implemented distribution and the one in the paper is negligible (the decoder can still handle that case, so that every 64-byte string decodes to a valid public key)

Elaborate on the statement "not terribly non-uniform" in https://github.com/sipa/writeups/tree/main/elligator-square-for-bn#25-dealing-with-infinity
I should drop that, as it's confusing. Given the immensely low probability of hitting infinity in the first place, how it is handled is totally irrelevant.
This comment aims to indicate that it's better than e.g. mapping infinity to the generator or so (which it is, as it'd mean the generator has ~2n ellsq preimages, while other points only have ~n), but I don't think that's a useful justification given it's only changing a negligibly frequent event anyway. The real justification is: it's probably the easiest way to implement this edge case.
Make the encoder NOT target the f(u) = -f(v) case, so that it's really easy to show that the statistical distance between the implemented distribution and the one in the paper is negligible (the decoder can still handle that case, so that every 64-byte string decodes to a valid public key)
I don't think it should be hard to show the difference is negligible, given the fact that even triggering (much less observing) a case in which the outcome is different is negligible (it requires picking a uniformly random u for which f(u)=-P).
OK, I'm fully on board with this now! If you're curious, see https://gist.github.com/robot-dreams/86302bfca28bf9dc83ced365cd428f64 for way too much detail that nobody needed or asked for.
ACK 07a4ef5a08cea9916cf93603670d8ee4d5340540
Rebased.
ACK f997dadf592131054d1cfca6175730e53dee52a1 (aside from most recent build / CI changes) based on:
This introduces variants of the divsteps-based GCD algorithm used for
modular inverses to compute Jacobi symbols. Changes compared to
the normal vartime divsteps:
* Only positive matrices are used, guaranteeing that f and g remain
positive.
* An additional jac variable is updated to track sign changes during
matrix computation.
* There is (so far) no proof that this algorithm terminates within a
  reasonable amount of time for every input, but experimentally it
  appears to almost always need fewer than 900 iterations. To account
  for that, only a bounded number of iterations is performed (1500),
  after which failure is returned. The field logic then falls back to
  using square roots to determine the result.
* The algorithm converges to f=g=gcd(f0,g0) rather than g=0. To keep
this test simple, the end condition is f=1, which won't be reached
if started with g=0. That case is dealt with specially.
This adds a module with an implementation of the Elligator Squared
algorithm for encoding/decoding public keys in uniformly random
byte arrays.
Closing, as Elligator Square seems strictly inferior in performance and complexity to ElligatorSwift.