There were several places where the code was non-constant time for invalid secret inputs. These are harmless under sane use but get in the way of automatic const-time validation.
(Nonce overflow in signing is not addressed, nor is s==0 in signing)
There were several places where the code was non-constant time for invalid secret inputs. These are harmless under sane use but get in the way of automatic const-time validation.
(Nonce overflow in signing is not addressed, nor is s==0 in signing)
Only the “Eliminate harmless” “declassify” commits are new, the rest are in other PRs.
This also doesn’t touch the privkey tweak functions yet, because I had forgotten about them.
With this PR the valgrind test is totally clean except for overflow and s==0 in signing (and except for whatever functions like the tweak that should be tested but aren’t tested yet).
Nonce overflow is a real non-constant timeness but security irrelevant because it never happens.
s==0 test is not actually a branch on secret data, but is instead a false positive owing to the fact that I haven’t put in anything yet to tell valgrind that the data is no longer secret at that point– though that will be easy to do.
This was made somewhat more complicated by preserving the output-zeroization on failure.
I added a fixups commit which fixes the seckey verify and tweak operations which were previously not tested.
There is a remaining issue that the the scalar code uses “th += (c0 < tl) ? 1 : 0;” which doesn’t always compile to constant time code. E.g. 32-bit scalar on power9 w/ GCC 9.2.1 turns some of the muladds into branches (e.g. src/scalar_8x32_impl.h:444).
I could use some help figuring out an alternative for that sort of carry which is equally fast and doesn’t sometimes get turned into branches.
Similarly, testing this with different compilers/build-options/arches would be useful to learn if there are any other missed cases.
I think this is probably enough progress to merge– it does at least make the tests clean on my desktop–, if people would like I can drop off the tester commit and I could work towards merging this much even though there are a few platform/compiler specific issues remaining.
191- } while (retry); /* This branch true is cryptographically unreachable. Requires sha256_hmac output > Fp. */
192+ /* Accept unobservably small non-uniformity. */
193+ secp256k1_rfc6979_hmac_sha256_generate(&rng, nonce32, 32);
194+ range = !secp256k1_fe_set_b32(&s, nonce32);
195+ range |= secp256k1_fe_is_zero(&s);
196+ secp256k1_fe_cmov(&s, &secp256k1_fe_one, range);
s
remains uninitialized if secp256k1_fe_set_b32
fails; doing anything on it can cause failure (and it actually can in practice in debug mode, because the magnitude/normalized flags will be garbage).
@gmaxwell I think you also can’t call secp256k1_fe_is_zero
in case secp256k1_fe_set_b32
above fails. Perhaps it’s easiest to call secp256k1_fe_clear
on s
before the set_b32?
204- retry = retry || secp256k1_scalar_is_zero(&b);
205- } while (retry); /* This branch true is cryptographically unreachable. Requires sha256_hmac output > order. */
206+ secp256k1_rfc6979_hmac_sha256_generate(&rng, nonce32, 32);
207+ secp256k1_scalar_set_b32(&b, nonce32, NULL);
208+ /* A blinding value of 0 works, but would undermine the projection hardening. */
209+ secp256k1_scalar_cmov(&b, &secp256k1_scalar_one, secp256k1_scalar_is_zero(&b));
331@@ -331,15 +332,15 @@ static int secp256k1_fe_set_b32(secp256k1_fe *r, const unsigned char *a) {
332 r->n[8] = (uint32_t)a[5] | ((uint32_t)a[4] << 8) | ((uint32_t)a[3] << 16) | ((uint32_t)(a[2] & 0x3) << 24);
333 r->n[9] = (uint32_t)((a[2] >> 2) & 0x3f) | ((uint32_t)a[1] << 6) | ((uint32_t)a[0] << 14);
334
335- if (r->n[9] == 0x3FFFFFUL && (r->n[8] & r->n[7] & r->n[6] & r->n[5] & r->n[4] & r->n[3] & r->n[2]) == 0x3FFFFFFUL && (r->n[1] + 0x40UL + ((r->n[0] + 0x3D1UL) >> 26)) > 0x3FFFFFFUL) {
336- return 0;
337- }
338+ ret = !((r->n[9] == 0x3FFFFFUL) & ((r->n[8] & r->n[7] & r->n[6] & r->n[5] & r->n[4] & r->n[3] & r->n[2]) == 0x3FFFFFFUL) & ((r->n[1] + 0x40UL + ((r->n[0] + 0x3D1UL) >> 26)) > 0x3FFFFFFUL));
.. > 0x3FFFFFFUL
still a branch?
317@@ -317,15 +318,15 @@ static int secp256k1_fe_set_b32(secp256k1_fe *r, const unsigned char *a) {
318 | ((uint64_t)a[2] << 24)
319 | ((uint64_t)a[1] << 32)
320 | ((uint64_t)a[0] << 40);
321- if (r->n[4] == 0x0FFFFFFFFFFFFULL && (r->n[3] & r->n[2] & r->n[1]) == 0xFFFFFFFFFFFFFULL && r->n[0] >= 0xFFFFEFFFFFC2FULL) {
322- return 0;
323- }
324+ ret = !((r->n[4] == 0x0FFFFFFFFFFFFULL) & ((r->n[3] & r->n[2] & r->n[1]) == 0xFFFFFFFFFFFFFULL) & (r->n[0] >= 0xFFFFEFFFFFC2FULL));
81+ memset(x, 0, 32);
82+ memset(y, 0, 32);
83 secp256k1_scalar_clear(&s);
84- return ret;
85+
86+ return !!ret & !overflow;
hashfp
that returned 5 this function will now return 1 instead of 5
(we use this behavior in rust-secp https://github.com/rust-bitcoin/rust-secp256k1/blob/master/src/ecdh.rs#L162 @apoelstra)
You are relying on behavior that directly contradicts the documentation and isn’t tested.
0: scalar was invalid (zero or overflow)
Nonce functions have an opaque data, so they can return information in other ways…
In either case the interface should be tested. I knew I was tightening up the behavior to better match the public interface but didn’t consider the existence of bespoke nonce functions that would return something else only that further changes might accidentally return non-0/1 values.
Thanks for noticing this! I consider the return values not agreeing with the docs to be a pretty serious bug, and it’s unfortunate that there aren’t tests for cases where the nonce function induces it. That said, the noncefp docs specify that the return is 0 or 1 too: " * Returns: 1 if a nonce was successfully generated. 0 will cause signing to fail."
~Opened an issue here: #710~
EDIT: I mean here: https://github.com/rust-bitcoin/rust-secp256k1/issues/196
Opened an issue here: #710
This is a link to this PR here and also I don’t see any other new issue.
717@@ -718,4 +718,18 @@ SECP256K1_INLINE static void secp256k1_scalar_mul_shift_var(secp256k1_scalar *r,
718 secp256k1_scalar_cadd_bit(r, 0, (l[(shift - 1) >> 5] >> ((shift - 1) & 0x1f)) & 1);
719 }
720
721+static SECP256K1_INLINE void secp256k1_scalar_cmov(secp256k1_scalar *r, const secp256k1_scalar *a, int flag) {
-O3 -march=native
uses vmovdqu
which might be variable time on AMD Bolldozer according to https://www.agner.org/optimize/instruction_tables.pdf
945@@ -946,4 +946,14 @@ SECP256K1_INLINE static void secp256k1_scalar_mul_shift_var(secp256k1_scalar *r,
946 secp256k1_scalar_cadd_bit(r, 0, (l[(shift - 1) >> 6] >> ((shift - 1) & 0x3f)) & 1);
947 }
948
949+static SECP256K1_INLINE void secp256k1_scalar_cmov(secp256k1_scalar *r, const secp256k1_scalar *a, int flag) {
-O3 -march=native
uses vmovdqu64
which might be variable time on AMD bolldozer
503+ }
504+ secp256k1_scalar_set_b32(&non, nonce32, &koverflow);
505+ koverflow |= secp256k1_scalar_is_zero(&non);
506+ /* The nonce is still secret here, but it overflowing or being zero is is less likely than 1:2^255. */
507+ declassify(ctx, &koverflow, sizeof(koverflow));
508+ if (!koverflow) {
cmov
on this too? (or just let it overflow? altough that will be a behavior change)
if koverflow
put 1 into the k and sign, but then clear and return 0
Letting it overflow would bias the nonce, which is extremely bad for security. The order of secp256k1 is fortunately such that the bias would not actually be a problem, though this isn’t the case for many other curves. Personally, if I were auditing an someone elses crypto code and saw it producing biased nonces like that I would instant fail the audit and not bother looking at anything else because the software was clearly written people by people who didn’t notice or didn’t heed the pile of dead bodies of their forebearers that thought leaving nonces a little biased was okay.
There isn’t much point in cmoving there because the loop must run again that case, making the function take a non-constant time. So the cmov would just be constant time theatre with no improvement. It would still have to be declassified too, since constant time analysis would notice the loop being conditioned on it– so the cmov would be just pure theater, not actually improve the constant timeness and just adding code.
Personally, if I were auditing an someone elses crypto code and saw it producing biased nonces like that I would instant fail the audit and not bother looking at anything else because the software was clearly written people by people who didn’t notice or didn’t heed the pile of dead bodies of their forebearers that thought leaving nonces a little biased was okay.
We ignore the overflow in the BIP 340 draft by the way. :X
@real-or-random I think the situation is a little different in BIP340, because it’s not using a generic retrying PRG-based approach, but a specified scheme (and curve) specific algorithm, with a rationale explaining that this approach is only safe because the order is so close to 2^256.
If we’d decide that’s worrisome because it may end up being reused for another curve, I’d prefer the EdDSA approach (hash to twice the width, and then modulo reduce it) over an iterative approach.
schnorrsig_sign
and could instead continue with the 0 nonce and memczero
the signature before returning 0.
159@@ -160,4 +160,16 @@ static SECP256K1_INLINE void *manual_alloc(void** prealloc_ptr, size_t alloc_siz
160 SECP256K1_GNUC_EXT typedef unsigned __int128 uint128_t;
161 #endif
162
163+/* Zero memory if flag == 1. Constant time. */
164+static SECP256K1_INLINE void memczero(void *s, size_t len, int flag) {
unsigned char flag
? why int and then case?
536+ secp256k1_scalar_clear(&msg);
537+ secp256k1_scalar_clear(&non);
538+ secp256k1_scalar_clear(&sec);
539+ secp256k1_ecdsa_signature_save(signature, &r, &s);
540+ memczero(signature, sizeof(*signature), (!ret) | overflow);
541+ return !!ret & !overflow;
ret
- Returns: 1: signature created
0: the nonce generation function failed, or the private key was invalid.
Fixes a bug. Like above behavior should get a test.
I reviewed all the cmov implementations in godbolt with latest 3 versions of gcc,clang,icc and WINE MSVC 2017 x64
Personally I’d prefer to split between the changes and the valgrind addition./
I like that the valgrind call is wrapped in #if defined(VALGRIND)
but Idk about the addition of int to the context just for this, but I don’t feel knowledgeable enough to comment on this seriously.
500+ koverflow |= secp256k1_scalar_is_zero(&non);
501+ /* The nonce is still secret here, but it overflowing or being zero is is less likely than 1:2^255. */
502+ declassify(ctx, &koverflow, sizeof(koverflow));
503+ if (!koverflow) {
504+ int success = secp256k1_ecdsa_sig_sign(&ctx->ecmult_gen_ctx, &r, &s, &sec, &msg, &non, NULL);
505+ /* The final signature is no longer a secret. */
81+ int ret;
82+ ret = !secp256k1_scalar_is_zero(tweak);
83
84 secp256k1_scalar_mul(key, key, tweak);
85- return 1;
86+ return ret;
314- *recid ^= 1;
315- }
316+ high = secp256k1_scalar_is_high(sigs);
317+ secp256k1_scalar_cond_negate(sigs, high);
318+ if (recid) {
319+ *recid ^= high;
secp256k1_scalar_is_high
) returns only 0/1. In general, it would be good to document this and have simple explicit tests for this.
Yes. And the tests do check that:
/* Check that comparison with the half order is equal to testing for high scalar. */
CHECK(secp256k1_scalar_is_high(&s) == (secp256k1_num_cmp(&snum, &half_order) > 0));
/* Check that comparison with the half order is equal to testing for high scalar after negation. */
CHECK(secp256k1_scalar_is_high(&neg) == (secp256k1_num_cmp(&negnum, &half_order) > 0));
For an “is” function is there any other logical return that might show up?
For an “is” function is there any other logical return that might show up? I think a lot of C code doesn’t bother to return exactly 1 and will be fine with every non-zero value instead, which is not an unreasonable choice.
General convention: Okay, then we should double-check this and document it… (another PR of course).
Tests: Yes, I’ve seen those. My feeling was that it’s not very explicit but yes, it’s there.
23@@ -24,6 +24,9 @@
24 #error "Please select scalar implementation"
25 #endif
26
27+static const secp256k1_scalar secp256k1_scalar_one = SECP256K1_SCALAR_CONST(0, 0, 0, 0, 0, 0, 0, 1);
28+static const secp256k1_scalar secp256k1_scalar_zero = SECP256K1_SCALAR_CONST(0, 0, 0, 0, 0, 0, 0, 0);
137@@ -132,6 +138,7 @@ secp256k1_context* secp256k1_context_preallocated_create(void* prealloc, unsigne
138 if (flags & SECP256K1_FLAGS_BIT_CONTEXT_VERIFY) {
139 secp256k1_ecmult_context_build(&ret->ecmult_ctx, &prealloc);
140 }
141+ ret->declassify = !!(flags & SECP256K1_FLAGS_BIT_CONTEXT_DECLASSIFY);
!!
here
lol Github was helpfully hiding this one from me, I only found it because you said two… :)
I don’t like having an apparently boolean variable have its truth value be other than one. It would totally be fine to not !! here with the current code, but then later we go and refactor something and the boolean test gets turned into an ==1 or used in an arithmetic expression and it works incorrectly.
So in general, I usually tend to boolenize a boolean variable unless its never used outside of a single scope, or in some performance critical inner loop where the single test opcode might actually matter. Though for flags the more ordinary way to booleanize is (flags & flag) == flag.
221@@ -215,6 +222,20 @@ void secp256k1_scratch_space_destroy(const secp256k1_context *ctx, secp256k1_scr
222 secp256k1_scratch_destroy(&ctx->error_callback, scratch);
223 }
224
225+/* Mark memory as no-longer-secret for the purpose of analysing constant-time behaviour
226+ * of the software. This is setup for use with valgrind but could be substituted with
227+ * the appropriate instrumentation for other analysis tools.
228+ */
229+static SECP256K1_INLINE void declassify(const secp256k1_context* ctx, void *p, size_t len) {
secp256k1_declassify
. Or maybe not? We also don’t have the prefix in util.h, hm.
util.h
then.
506+ /* The nonce is still secret here, but it overflowing or being zero is is less likely than 1:2^255. */
507+ declassify(ctx, &koverflow, sizeof(koverflow));
508+ if (!koverflow) {
509+ int success = secp256k1_ecdsa_sig_sign(&ctx->ecmult_gen_ctx, &r, &s, &sec, &msg, &non, NULL);
510+ /* The final signature is no longer a secret. */
511+ declassify(ctx, &success, sizeof(success));
success == ret
.
So we could simply get rid of success
(or alternatively VERIFY that success == ret
).
I think this matters here for the reader because we also “declassify” ret in a sense but that’s not obvious.
535+ memset(nonce32, 0, 32);
536+ secp256k1_scalar_clear(&msg);
537+ secp256k1_scalar_clear(&non);
538+ secp256k1_scalar_clear(&sec);
539+ secp256k1_ecdsa_signature_save(signature, &r, &s);
540+ memczero(signature, sizeof(*signature), (!ret) | overflow);
!ret | overflow
here, then we can get rid of the call to memczero
, which is probably not for free. Is there a reason why you preferred the call?
It’s easy to see that the declassification is not an issue. ret
is already declassified (see my comment above) and overflow
will anyway be zero.
It would then be replaced with a regular memset which I doubt would be significantly slower. My preference was to make it actually constant time unless it had a non-trivial cost.
I can give a contrived attack against secret overflow: If I can give you a tweak that you apply to you secret key with naive addition, and I can query you log(n) times, I can use the time signing takes to read off your secret key. It’s contrived not only because of the bad tweak function, but also because it assumes I can’t distinguish your failures by some way other than time.
It would then be replaced with a regular memset which I doubt would be significantly slower. My preference was to make it actually constant time unless it had a non-trivial cost.
Makes sense to me.
29+ fprintf(stderr, "This test can only usefully be run inside valgrind.\n");
30+ fprintf(stderr, "Usage: libtool --mode=execute valgrind ./valgrind_ctime_test\n");
31+ exit(1);
32+ }
33+
34+ /** In theory, testing with a single secret input should be sufficient:
I think I’ve lost track of where this is in review. I fixed some things, explained some things, closed some comments, but feel kind of lost.
I understand… I always feel lost when looking at this PR because the comments are on many different commits… I pinged you on the two conversations which you haven’t addressed. They’re both not crucial, it’s just my try of making everyone feel less lost and organize the PR.
There’s nothing else from my side, I can ACK this after your reply.
534- return ret;
535+ memset(nonce32, 0, 32);
536+ secp256k1_scalar_clear(&msg);
537+ secp256k1_scalar_clear(&non);
538+ secp256k1_scalar_clear(&sec);
539+ secp256k1_ecdsa_signature_save(signature, &r, &s);
This reads uninitialized memory if the nonce function returned 0.
We could now start a long discussion if this yields UB (see in #699 (comment)). But I think it’s preferable to just avoid the situation entirely by cmoving zero scalars into r
and s
before calling secp256k1_ecdsa_signature_save
instead of cmoving zero bytes afterwards.
189- retry = !secp256k1_fe_set_b32(&s, nonce32);
190- retry = retry || secp256k1_fe_is_zero(&s);
191- } while (retry); /* This branch true is cryptographically unreachable. Requires sha256_hmac output > Fp. */
192+ /* Accept unobservably small non-uniformity. */
193+ secp256k1_rfc6979_hmac_sha256_generate(&rng, nonce32, 32);
194+ range = !secp256k1_fe_set_b32(&s, nonce32);
range
is typically called overflow
in other code here. range
is a somewhat ambiguous identifier because it’s not clear whether it means “in range” or “out of range”.
There were several places where the code was non-constant time
for invalid secret inputs. These are harmless under sane use
but get in the way of automatic const-time validation.
(Nonce overflow in signing is not addressed, nor is s==0 in
signing)
ECDSA signing has a retry loop for the exceptionally unlikely case
that S==0. S is not a secret at this point and this case is so
rare that it will never be observed but branching on it will trip
up tools analysing if the code is constant time with respect to
secrets.
Derandomized ECDSA can also loop on k being zero or overflowing,
and while k is a secret these cases are too rare (1:2^255) to
ever observe and are also of no concern.
This adds a function for marking memory as no-longer-secret and
sets it up for use with the valgrind memcheck constant-time
test.