Add constant-time point multplication for ECDH

apoelstra commented at 7:48 pm on May 15, 2015: contributor

Add constant-time secp256k1_point_multiply for ECDH

Designed with clear separation of the wNAF conversion, precomputation and exponentiation (since the precomp at least we will probably want to separate in the API for users who reuse points a lot).

Future work:

actually separate precomp in the API
do multiexp rather than single exponentiation

apoelstra force-pushed on May 15, 2015

sipa commented at 5:48 am on May 17, 2015: contributor

I’m on vacation until june 1st. I’ll review afterwards.

gmaxwell commented at 5:30 am on May 20, 2015: contributor

“/** Do an ellitic curve scalar multiplication in constant time.”

spelling.

apoelstra force-pushed on May 22, 2015

apoelstra commented at 5:10 pm on May 22, 2015: contributor

Added endomorphism optimization.

On my laptop (Lenovo T440p i7-4700MQ 2.4Ghz) I see 85.6us without endomorphism, 107us with. I will need to investigate this..

Edit: Oh, I needed to shrink the wNAF input size from 256 bits to 128 when the endomorphism is enabled. When this is done I see 64.4us, a 25% improvement :)

apoelstra force-pushed on May 22, 2015

apoelstra force-pushed on May 23, 2015

apoelstra commented at 4:52 pm on May 23, 2015: contributor

Pushed; had to do some algebraic changes to get the endomorphism stuff to work. The pre-endomorphism commits have been changed in some minor ways (the unit tests are a bit cleaner, secp256k1_scalar_wnaf_force_odd is replaced by secp256k1_scalar_wnaf_cond_negate and secp256k1_scalar_is_even, and the return type of secp256k1_ecdh_wnaf is now void since it used to return a fixed value, which was confusing. This value is now computed by a macro.)

As for the endomorphism stuff, the wNAF conversion is actually a bit different. We require all wNAF digits to be odd, so to represent even 256-bit numbers, we (a) negate the number, making it odd since our modulus is odd; (b) convert this to wNAF; (c) negate the individual digits, un-negating the number while keeping all digits odd. In this way all 256-bit numbers can be represented. However, with the endomorphism optimization, we find ourselves converting 128-bit numbers to wNAF, which cannot be negated this way. (They can, but their negations are 256 bits, so we lose the optimization.) We therefore use a different strategy for handling even numbers when endomorphism optimization is enabled. Specifically, we add 1 to even numbers, add 2 to odd ones, and compensate after the multiplication by subtracting either P or 2P from the result. This is suggested in the Okeya/Tagaki paper.

With this strategy, the perf numbers are less impressive: with endomorphism optimization on my laptop can do a multiplication in 68.9us.

sipa commented at 12:34 pm on June 13, 2015: contributor

I have a few nits regarding naming and a question (see above), but the code looks well-written, well-documented and well-tested. Nice!

apoelstra commented at 3:29 pm on June 13, 2015: contributor

When I add a test against secp256k1_ec_pubkey_create that runs 100 times with a random_scalar_order scalar it fails usually when the endomorphism optimization is enabled. Investigating.

Edit: So this occurs specifically with scalars whose lambda-decomposition consists of a positive and a negative number (where “negative” means high bit set). Two positives or two negatives are OK, which seems inexplicable … those numbers are treated totally independently by the point-multiply code, so it is weird that there’d be a problem with their relative sign and not their absolute ones.

I doubt this has anything to do with the basepoint being the generator or not; it happens that the existing test code always uses the same hardcoded scalar whose lambda-decomposition consists of two positive values, hence our not seeing this bug with the existing tests.

apoelstra cross-referenced this on Jun 16, 2015 from issue `secp256k1_gej_add_ge` does the wrong thing with points P, Q with P_y = -Q_y but P≠-Q by apoelstra

apoelstra force-pushed on Jun 23, 2015

apoelstra commented at 1:29 pm on June 24, 2015: contributor

Rebased on https://github.com/bitcoin/secp256k1/pull/261, which fixes the problems I was having earlier with the endomorphism. Also add the additional test @sipa suggested (which didn’t pass until #261 :))

peterdettman cross-referenced this on Jun 26, 2015 from issue x-only ECDH without sqrt by peterdettman

in src/secp256k1.c: in 2896833062 outdated

240+        if (overflow) {
241+            ret = -2;
242+        } else {
243+            secp256k1_ecdh_point_multiply(&res, &pt, &s);
244+            secp256k1_ge_set_gej(&pt, &res);
245+            ret = secp256k1_eckey_pubkey_serialize(&pt, point, pointlen, *pointlen <= 33);

peterdettman commented at 12:01 pm on June 27, 2015:

The output point is supposed to be secret (ECDH agreement), but _pubkey_serialize uses _normalize_var internally because it’s assuming a public key point.

apoelstra commented at 3:25 pm on June 29, 2015:

Yikes! Thanks for catching this.

peterdettman commented at 12:32 pm on June 27, 2015: contributor

A couple of issues with the documentation for secp256k1_point_multiply:

The return code -1 is listed for “scalar was zero”, but it looks to me like it would actually return 0 in that case (via secp256k1_eckey_pubkey_serialize).
Is the scalar actually restricted to not overflow the curve order? In which case, “a 32-byte scalar” is a little misleading.

in include/secp256k1.h: in 2896833062 outdated

221+/** Do an elliptic curve scalar multiplication in constant time.
222+ *  Returns: 1: exponentiation was successful
223+ *          -1: scalar was zero (cannot serialize output point)
224+ *          -2: scalar overflow
225+ *          -3: invalid input point
226+ *  In:      scalar:   a 32-byte scalar with which to multiple the point

peterdettman commented at 12:32 pm on June 27, 2015:

multiple -> multiply

apoelstra force-pushed on Jun 29, 2015

apoelstra commented at 4:59 pm on June 29, 2015: contributor

Rebase and fix minor comments from Peter. TODO fix serialization timing leak.

apoelstra force-pushed on Jun 29, 2015

apoelstra commented at 8:19 pm on June 29, 2015: contributor

I’ve pushed a new version which fixes the timing leak. I’ve changed the API to output the hash of the computed point rather than the point itself. @gmaxwell had suggested early on that giving the user a curvepoint was “too low level” and not consistent with our policy of not exposing raw cryptographic objects.

The concern was that a user who has access to a “ECDH secret” which is a curvepoint might try to build his or her own cryptosystem in a fragile or broken way. I think this timing leak is an example of me getting burned by this – I didn’t think to look for this timing leak because I was thinking in terms of “points” rather than “secrets”.

apoelstra commented at 8:30 pm on June 29, 2015: contributor

This means that #262 is going to need to be changed to use secp256k1_point_multiply rather than secp256k1_ecdh_point_multiply since the latter now outputs a hash (which is algebraically useless).

in src/bench_ecdh.c: in 370c270267 outdated

35+static void bench_multiply(void* arg) {
36+    int i;
37+    bench_multiply_t *data = (bench_multiply_t*)arg;
38+
39+    for (i = 0; i < 20000; i++) {
40+        CHECK(secp256k1_ecdh_point_multiply(data->point, &data->pointlen, data->scalar) == 1);

peterdettman commented at 4:07 am on June 30, 2015:

Seems to have been missed in the API change?

apoelstra commented at 6:45 pm on June 30, 2015:

Yeah, oops. Should we put -Werror in Travis to prevent this oversight in future?

apoelstra force-pushed on Jun 30, 2015

in include/secp256k1.h: in 48dc8e3001 outdated

216@@ -217,6 +217,24 @@ SECP256K1_WARN_UNUSED_RESULT int secp256k1_ecdsa_recover_compact(
217   int recid
218 ) SECP256K1_ARG_NONNULL(1) SECP256K1_ARG_NONNULL(2) SECP256K1_ARG_NONNULL(3) SECP256K1_ARG_NONNULL(4) SECP256K1_ARG_NONNULL(5);
219 
220+/** Compute an EC Diffie-Hellman secret in constant time
221+ *  Returns: 1: exponentiation was successful
222+ *           0: scalar was zero (cannot serialize output point)
223+ *          -1: scalar overflow
224+ *          -2: invalid input point

peterdettman commented at 2:22 am on July 1, 2015:

The error codes seem out-of-sync with the method body.

peterdettman commented at 2:26 am on July 1, 2015: contributor

@apoelstra @gmaxwell Is there somewhere I can review the discussion about exposing low-level cryptographic operations (alluded to above)?

in src/tests.c: in 48dc8e3001 outdated

1391+    ecdh_chain_multiply();
1392+}
1393+
1394+void run_ecdh_api_tests(void) {
1395+    ecdh_generator_basepoint();
1396+}

peterdettman commented at 9:44 am on July 1, 2015:

Method run_ecdh_api_tests not actually called?

apoelstra commented at 6:52 pm on July 2, 2015:

Ugh, thanks, this was a rebasing error.

apoelstra commented at 7:01 pm on July 2, 2015: contributor

@peterdettman, @gmaxwell and I had a private conversation on May 15 with the lines

013:18:19 <gmaxwell> One thing I wonder about is that we've tried to avoid offering users APIs that were raw EC operations... both becaues the users won't use them safely, and because they'll construct less efficient implementations of crypto algorithims (e.g. if we expose add and double they can go write a non-constant time, slow as heck ecdh.).  I wonder if there are any api affordances we can make to make that multiply s
113:23:47 <andytoshi> there thave been a lot of comments on -dev and -wizards from people who are pretty jacked to have ECDH but who IMHO are probably gonna do something unsafe with it

This was right before I started working on this PR. Then a few days ago in #secp256k1 on IRC:

 010:51:09 <andytoshi> in https://github.com/bitcoin/secp256k1/pull/252#discussion_r33412268 peter dettman notes that pubkey_serialize uses fe_normalize_var which is a timing leak for ECDH
 110:51:23 <andytoshi> what do we do about this? do we need to split serialize into serialize and serialize_var?
 211:02:55 <sipa> i suppose
 311:03:00 <sipa> wait
 411:03:08 <sipa> right
 511:03:28 <sipa> ecdh is weird because it has an EC point which is secret information
 613:12:02 <gmaxwell> well I slightly protested before that returning a point was maybe too low level.
 713:31:22 <andytoshi> i recall that ... this might be a concrete reason to -not- do it
 813:31:50 <andytoshi> i mean, this feels like we got cut by our own sharp corner here..
 914:36:30 <sipa> we could hash it before returning
1016:31:30 <gmaxwell> that was my suggestion.

To the best of my knowledge this was the entirety of our discussion.

apoelstra force-pushed on Jul 2, 2015

peterdettman commented at 5:09 am on July 3, 2015: contributor

Thanks for chasing that up. Not sure how I feel about the hashing yet, but I would like to try and get agreement that the result be just the x-coordinate (or the hash of it) and not include the y-bit. Of course, this is somewhat with a view to integrating #262 cleanly (as an optimization for compressed input points), but possibly x-only output could be considered sufficient impediment to building adhoc schemes on top of ECDH-as-scalar-mult. In any case, I think taking just the x-coordinate as the shared secret is fairly standard, or at least noncontroversial.

gmaxwell commented at 5:21 am on July 3, 2015: contributor

It depends on the application, I’m sure I can construct a contrived protocol where the ‘malleability’ of someone flipping the point but still getting the same secret would be bad; e.g. where it causes you to reuse a key in a stream cipher where you otherwise wouldn’t.

peterdettman commented at 8:35 am on July 3, 2015: contributor

Yes, I understand this to be “benign malleability” per Shoup e.g. http://www.shoup.net/papers/iso-2_1.pdf section 2.3. I would like to simply say that it can be addressed in the outer scheme, but I gather we’re trying to foolproof things. I suppose one alternative here is to include the respective public keys under the result hash.

in src/secp256k1.c: in 0436dfcf62 outdated

231+    secp256k1_gej_t res;
232+    secp256k1_ge_t pt;
233+    secp256k1_scalar_t s;
234+    DEBUG_CHECK(point != NULL);
235+    DEBUG_CHECK(pointlen != NULL);
236+    DEBUG_CHECK(scalar != NULL);

peterdettman commented at 0:26 am on July 9, 2015:

DEBUG_CHECK(result != NULL). Also, point can now be const, and pointlen by-value.

in src/ecdh_impl.h: in 0436dfcf62 outdated

170+        VERIFY_CHECK(n != 0);
171+        secp256k1_gej_add_ge(r, r, &tmpa);
172+#else
173+        n = wnaf[i];
174+        VERIFY_CHECK(n != 0);
175+        ECMULT_TABLE_GET_GE(&tmpa, pre_a, n, WINDOW_A);

peterdettman commented at 10:23 am on July 9, 2015:

_GET_GE is a simple table lookup, with the index derived from bits of the scalar, so we’ll need a _cmov-based alternative.

sipa commented at 1:39 pm on July 9, 2015:

Ugh, nice catch!

in src/ecdh_impl.h: in 0436dfcf62 outdated

177+#endif
178+    }
179+
180+    if (!r->infinity) {
181+        secp256k1_fe_mul(&r->z, &r->z, &Z);
182+    }

peterdettman commented at 12:06 pm on July 9, 2015:

Just perform the _fe_mul unconditionally?

apoelstra commented at 0:22 am on July 14, 2015: contributor

Fixed all of @peterdettman’s minor comments. Added a new commit to fix the timing leak in GET_GE. The perf hit I measure, using bench_ecdh, is 10%, both with and without endomorphism. :/

Edit: Actually I can get this down to 3% without much trouble, updating the PR with that. I can do even better with a multi-cmov but that will be a bit more work..

apoelstra force-pushed on Jul 14, 2015

sipa commented at 0:49 am on July 14, 2015: contributor

Rebase?

apoelstra force-pushed on Jul 14, 2015

apoelstra commented at 0:58 am on July 14, 2015: contributor

Rebased.

Gustav-Simonsson cross-referenced this on Jul 15, 2015 from issue Use libsecp256k1 for scalar mult in ECDH in ECIES by Gustav-Simonsson

peterdettman commented at 1:37 pm on July 15, 2015: contributor

@apoelstra See https://github.com/peterdettman/secp256k1/commit/de5436efcc37a0b051058ba60647284d7bef4337 for some ideas on how to recover another 1% or so from the cmov table-lookup.

apoelstra commented at 6:19 pm on July 15, 2015: contributor

Thanks Peter. I’ve cherry-picked that commit. The total perf hit from doing constant-time lookups now looks like 1.7% to me.

apoelstra force-pushed on Jul 29, 2015

apoelstra commented at 3:42 pm on July 29, 2015: contributor

Rebase; change API to take a pubkey_t rather than point/pointlen, and to take a context_t.

tests: add a couple tests

  - Add zero/one sanity check tests for ecmult

  - Add unit test for secp256k1_scalar_split_lambda_var

  - Typo fix in `ge_equals_ge`; was comparing b->y to itself, should
    have been comparing a->y to b->y

  - Normalize y-coordinate in `random_group_element_test`; this is
    needed to pass random group elements as the first argument to
    `ge_equals_ge`, which I will do in a future commit.

baa75da59d

apoelstra force-pushed on Jul 29, 2015

apoelstra commented at 11:14 pm on July 29, 2015: contributor

I’ve added a commit which moves the ECDH code into its own module but leaves the constant-time multiply code in the main library. It also renames the const-multiply functions, so the result is kinda a mess of renaming and moving.

I’ll go back and rebase the original changeset as though this had been the layout from the start, but first I’d like an ACK that this is how we want to organize things.

in src/bench_ecdh.c: in e0e79874d2 outdated

44+}
45+
46+int main(void) {
47+    bench_multiply_t data;
48+
49+    run_benchmark("ecdh_mult", bench_multiply, bench_multiply_setup, NULL, &data, 10, 20000);

sipa commented at 2:09 pm on July 31, 2015:

Technically, ECDH is no longer just a multiplication.

in src/ecmult_const_impl.h: in e0e79874d2 outdated

43+
44+/** Convert a number to WNAF notation. The number becomes represented by sum(2^{wi} * wnaf[i], i=0..return_val)
45+ *  with the following guarantees:
46+ *  - each wnaf[i] an odd integer between -(1 << w) and (1 << w)
47+ *  - each wnaf[i] is nonzero
48+ *  - the number of words set is returned; this is always (256 + w - 1) / w

sipa commented at 2:11 pm on July 31, 2015:

Is this true, even with USE_ENDOMORPHISM?

in src/ecmult_const_impl.h: in e0e79874d2 outdated

180+        for (j = 0; j < WINDOW_A - 1; ++j) {
181+            /* This is a variable-time doubling, but it is actually constant-time for
182+             * nonzero points. We know on the first iteration that `r` will be zero
183+             * and know (by uniqueness of wNAF) that `r` will never be zero after
184+             * that iteration, so this does not result in a timing leak. */
185+            secp256k1_gej_double_var(r, r, NULL);

sipa commented at 2:19 pm on July 31, 2015:

Would you mind adding a secp256k1_gej_double_nonzero, which just VERIFY_CHECK’s for nonzeroness, and dispatches to secp256k1_gej_double_var? Placing the assumption about algorithm internals in a higher level is a bit ugly.

sipa commented at 2:31 pm on July 31, 2015: contributor

ACK design, with a few more nits.

Please squash and I’ll merge.

EDIT: no need to squash everything, but some logical steps are nice… ecmult_const then ecmult_const + endo then optimizations then ECDH module… or whatever you like.

Add constant-time multiply `secp256k1_ecmult_const` for ECDH

Designed with clear separation of the wNAF conversion, precomputation
and exponentiation (since the precomp at least we will probably want
to separate in the API for users who reuse points a lot.

Future work:
  - actually separate precomp in the API
  - do multiexp rather than single exponentiation

4401500060

apoelstra force-pushed on Jul 31, 2015

apoelstra commented at 7:53 pm on July 31, 2015: contributor

Thanks for the comments @sipa. All fixed, and rebased for a cleaner history.

sipa commented at 2:16 pm on August 1, 2015: contributor

Thanks! Can you also make some changes to .travis so this gets tested?

Add ECDH module which works by hashing the output of ecmult_const 0739bbb6f0

Add benchmarks for ECDH and const-time multiplication 91c0ce95ca

Make `secp256k1_scalar_add_bit` conditional; make `secp256k1_scalar_split_lambda_var` constant time

This has the effect of making `secp256k1_scalar_mul_shift_var` constant
time in both input scalars. Keep the _var name because it is NOT constant
time in the shift amount.

As used in `secp256k1_scalar_split_lambda_var`, the shift is always
the constant 272, so this function becomes constant time, and it
loses the `_var` suffix.

ed35d43a0c

Implement endomorphism optimization for secp256k1_ecmult_const 92e53fc4c8

Improve perf. of cmov-based table lookup 72ae443afb

apoelstra force-pushed on Aug 1, 2015

sipa merged this on Aug 2, 2015

sipa closed this on Aug 2, 2015

sipa referenced this in commit d84a3784f4 on Aug 2, 2015

sipa cross-referenced this on Aug 28, 2015 from issue Support for optional parts by sipa

Kagami commented at 11:51 am on November 5, 2015: none

It’s a bit unfortunate that secp256k1 always hash the derived secret, because it means I can’t use it for existing protocols like Bitmessage.

Any ideas how can I get Px with secp256k1? Only by implementing my own version of secp256k1_ecdh? (Note that Bitmessage uses SHA512(Px), so it’s not unsafe.)

apoelstra commented at 12:59 pm on November 5, 2015: contributor

@Kagami As you say, ECIES actually only needs a hash of Px, though it’s not our hash. Can you open an issue requesting we extend the ECDH API to support a custom hash function (like the signing functions support custom nonce functions)?

This is something we’ve considered but haven’t had any concrete usecases and I guess it forgot to get written down..

Kagami cross-referenced this on Nov 5, 2015 from issue Support custom hash functions in ECDH API by Kagami

sipa cross-referenced this on Nov 28, 2016 from issue add generic constant time multi-exp for ECDH by gmaxwell

apoelstra deleted the branch on Jun 19, 2017

Add constant-time point multplication for ECDH #252