apoelstra
commented at 7:48 pm on May 15, 2015:
contributor
Add constant-time secp256k1_point_multiply for ECDH
Designed with clear separation of the wNAF conversion, precomputation
and exponentiation (since the precomp at least we will probably want
to separate in the API for users who reuse points a lot).
Future work:
actually separate precomp in the API
do multiexp rather than single exponentiation
apoelstra force-pushed
on May 15, 2015
apoelstra force-pushed
on May 15, 2015
apoelstra force-pushed
on May 15, 2015
apoelstra force-pushed
on May 15, 2015
sipa
commented at 5:48 am on May 17, 2015:
contributor
I’m on vacation until june 1st. I’ll review afterwards.
gmaxwell
commented at 5:30 am on May 20, 2015:
contributor
“/** Do an ellitic curve scalar multiplication in constant time.”
spelling.
apoelstra force-pushed
on May 22, 2015
apoelstra
commented at 5:10 pm on May 22, 2015:
contributor
Added endomorphism optimization.
On my laptop (Lenovo T440p i7-4700MQ 2.4Ghz) I see 85.6us without endomorphism, 107us with. I will need to investigate this..
Edit: Oh, I needed to shrink the wNAF input size from 256 bits to 128 when the endomorphism is enabled. When this is done I see 64.4us, a 25% improvement :)
apoelstra force-pushed
on May 22, 2015
apoelstra force-pushed
on May 22, 2015
apoelstra force-pushed
on May 23, 2015
apoelstra force-pushed
on May 23, 2015
apoelstra
commented at 4:52 pm on May 23, 2015:
contributor
Pushed; had to do some algebraic changes to get the endomorphism stuff to work. The pre-endomorphism commits have been changed in some minor ways (the unit tests are a bit cleaner, secp256k1_scalar_wnaf_force_odd is replaced by secp256k1_scalar_wnaf_cond_negate and secp256k1_scalar_is_even, and the return type of secp256k1_ecdh_wnaf is now void since it used to return a fixed value, which was confusing. This value is now computed by a macro.)
As for the endomorphism stuff, the wNAF conversion is actually a bit different. We require all wNAF digits to be odd, so to represent even 256-bit numbers, we (a) negate the number, making it odd since our modulus is odd; (b) convert this to wNAF; (c) negate the individual digits, un-negating the number while keeping all digits odd. In this way all 256-bit numbers can be represented. However, with the endomorphism optimization, we find ourselves converting 128-bit numbers to wNAF, which cannot be negated this way. (They can, but their negations are 256 bits, so we lose the optimization.) We therefore use a different strategy for handling even numbers when endomorphism optimization is enabled. Specifically, we add 1 to even numbers, add 2 to odd ones, and compensate after the multiplication by subtracting either P or 2P from the result. This is suggested in the Okeya/Tagaki paper.
With this strategy, the perf numbers are less impressive: with endomorphism optimization on my laptop can do a multiplication in 68.9us.
sipa
commented at 12:34 pm on June 13, 2015:
contributor
I have a few nits regarding naming and a question (see above), but the code looks well-written, well-documented and well-tested. Nice!
apoelstra
commented at 3:29 pm on June 13, 2015:
contributor
When I add a test against secp256k1_ec_pubkey_create that runs 100 times with a random_scalar_order scalar it fails usually when the endomorphism optimization is enabled. Investigating.
Edit: So this occurs specifically with scalars whose lambda-decomposition consists of a positive and a negative number (where “negative” means high bit set). Two positives or two negatives are OK, which seems inexplicable … those numbers are treated totally independently by the point-multiply code, so it is weird that there’d be a problem with their relative sign and not their absolute ones.
I doubt this has anything to do with the basepoint being the generator or not; it happens that the existing test code always uses the same hardcoded scalar whose lambda-decomposition consists of two positive values, hence our not seeing this bug with the existing tests.
peterdettman cross-referenced this
on Jun 26, 2015
from issue
x-only ECDH without sqrt
by peterdettman
in
src/secp256k1.c:
in
2896833062outdated
240+ if (overflow) {
241+ ret = -2;
242+ } else {
243+ secp256k1_ecdh_point_multiply(&res, &pt, &s);
244+ secp256k1_ge_set_gej(&pt, &res);
245+ ret = secp256k1_eckey_pubkey_serialize(&pt, point, pointlen, *pointlen <= 33);
peterdettman
commented at 12:01 pm on June 27, 2015:
The output point is supposed to be secret (ECDH agreement), but _pubkey_serialize uses _normalize_var internally because it’s assuming a public key point.
peterdettman
commented at 12:32 pm on June 27, 2015:
contributor
A couple of issues with the documentation for secp256k1_point_multiply:
The return code -1 is listed for “scalar was zero”, but it looks to me like it would actually return 0 in that case (via secp256k1_eckey_pubkey_serialize).
Is the scalar actually restricted to not overflow the curve order? In which case, “a 32-byte scalar” is a little misleading.
in
include/secp256k1.h:
in
2896833062outdated
221+/** Do an elliptic curve scalar multiplication in constant time.
222+ * Returns: 1: exponentiation was successful
223+ * -1: scalar was zero (cannot serialize output point)
224+ * -2: scalar overflow
225+ * -3: invalid input point
226+ * In: scalar: a 32-byte scalar with which to multiple the point
peterdettman
commented at 12:32 pm on June 27, 2015:
multiple -> multiply
apoelstra force-pushed
on Jun 29, 2015
apoelstra
commented at 4:59 pm on June 29, 2015:
contributor
Rebase and fix minor comments from Peter. TODO fix serialization timing leak.
apoelstra force-pushed
on Jun 29, 2015
apoelstra force-pushed
on Jun 29, 2015
apoelstra
commented at 8:19 pm on June 29, 2015:
contributor
I’ve pushed a new version which fixes the timing leak. I’ve changed the API to output the hash of the computed point rather than the point itself. @gmaxwell had suggested early on that giving the user a curvepoint was “too low level” and not consistent with our policy of not exposing raw cryptographic objects.
The concern was that a user who has access to a “ECDH secret” which is a curvepoint might try to build his or her own cryptosystem in a fragile or broken way. I think this timing leak is an example of me getting burned by this – I didn’t think to look for this timing leak because I was thinking in terms of “points” rather than “secrets”.
apoelstra
commented at 8:30 pm on June 29, 2015:
contributor
This means that #262 is going to need to be changed to use secp256k1_point_multiply rather than secp256k1_ecdh_point_multiply since the latter now outputs a hash (which is algebraically useless).
in
src/bench_ecdh.c:
in
370c270267outdated
35+static void bench_multiply(void* arg) {
36+ int i;
37+ bench_multiply_t *data = (bench_multiply_t*)arg;
38+
39+ for (i = 0; i < 20000; i++) {
40+ CHECK(secp256k1_ecdh_point_multiply(data->point, &data->pointlen, data->scalar) == 1);
peterdettman
commented at 4:07 am on June 30, 2015:
013:18:19<gmaxwell> One thing I wonder about is that we've tried to avoid offering users APIs that were raw EC operations... both becaues the users won't use them safely, and because they'll construct less efficient implementations of crypto algorithims (e.g. if we expose add and double they can go write a non-constant time, slow as heck ecdh.). I wonder if there are any api affordances we can make to make that multiply s113:23:47<andytoshi> there thave been a lot of comments on -dev and-wizards from people who are pretty jacked to have ECDH but who IMHO are probably gonna do something unsafe with it
This was right before I started working on this PR. Then a few days ago in #secp256k1 on IRC:
010:51:09<andytoshi>in https://github.com/bitcoin/secp256k1/pull/252#discussion_r33412268 peter dettman notes that pubkey_serialize uses fe_normalize_var which is a timing leak for ECDH 110:51:23<andytoshi> what do we do about this?do we need to split serialize into serialize and serialize_var? 211:02:55<sipa> i suppose
311:03:00<sipa> wait
411:03:08<sipa> right
511:03:28<sipa> ecdh is weird because it has an EC point which is secret information
613:12:02<gmaxwell> well I slightly protested before that returning a point was maybe too low level. 713:31:22<andytoshi> i recall that ... this might be a concrete reason to -not-do it
813:31:50<andytoshi> i mean, this feels like we got cut by our own sharp corner here.. 914:36:30<sipa> we could hash it before returning
1016:31:30<gmaxwell> that was my suggestion.
To the best of my knowledge this was the entirety of our discussion.
apoelstra force-pushed
on Jul 2, 2015
peterdettman
commented at 5:09 am on July 3, 2015:
contributor
Thanks for chasing that up. Not sure how I feel about the hashing yet, but I would like to try and get agreement that the result be just the x-coordinate (or the hash of it) and not include the y-bit. Of course, this is somewhat with a view to integrating #262 cleanly (as an optimization for compressed input points), but possibly x-only output could be considered sufficient impediment to building adhoc schemes on top of ECDH-as-scalar-mult. In any case, I think taking just the x-coordinate as the shared secret is fairly standard, or at least noncontroversial.
gmaxwell
commented at 5:21 am on July 3, 2015:
contributor
It depends on the application, I’m sure I can construct a contrived protocol where the ‘malleability’ of someone flipping the point but still getting the same secret would be bad; e.g. where it causes you to reuse a key in a stream cipher where you otherwise wouldn’t.
peterdettman
commented at 8:35 am on July 3, 2015:
contributor
Yes, I understand this to be “benign malleability” per Shoup e.g. http://www.shoup.net/papers/iso-2_1.pdf section 2.3. I would like to simply say that it can be addressed in the outer scheme, but I gather we’re trying to foolproof things. I suppose one alternative here is to include the respective public keys under the result hash.
peterdettman
commented at 12:06 pm on July 9, 2015:
Just perform the _fe_mul unconditionally?
apoelstra
commented at 0:22 am on July 14, 2015:
contributor
Fixed all of @peterdettman’s minor comments. Added a new commit to fix the timing leak in GET_GE. The perf hit I measure, using bench_ecdh, is 10%, both with and without endomorphism. :/
Edit: Actually I can get this down to 3% without much trouble, updating the PR with that. I can do even better with a multi-cmov but that will be a bit more work..
apoelstra force-pushed
on Jul 14, 2015
sipa
commented at 0:49 am on July 14, 2015:
contributor
Rebase?
apoelstra force-pushed
on Jul 14, 2015
apoelstra force-pushed
on Jul 14, 2015
apoelstra
commented at 0:58 am on July 14, 2015:
contributor
apoelstra
commented at 6:19 pm on July 15, 2015:
contributor
Thanks Peter. I’ve cherry-picked that commit. The total perf hit from doing constant-time lookups now looks like 1.7% to me.
apoelstra force-pushed
on Jul 29, 2015
apoelstra
commented at 3:42 pm on July 29, 2015:
contributor
Rebase; change API to take a pubkey_t rather than point/pointlen, and to take a context_t.
tests: add a couple tests
- Add zero/one sanity check tests for ecmult
- Add unit test for secp256k1_scalar_split_lambda_var
- Typo fix in `ge_equals_ge`; was comparing b->y to itself, should
have been comparing a->y to b->y
- Normalize y-coordinate in `random_group_element_test`; this is
needed to pass random group elements as the first argument to
`ge_equals_ge`, which I will do in a future commit.
baa75da59d
apoelstra force-pushed
on Jul 29, 2015
apoelstra force-pushed
on Jul 29, 2015
apoelstra
commented at 11:14 pm on July 29, 2015:
contributor
I’ve added a commit which moves the ECDH code into its own module but leaves the constant-time multiply code in the main library. It also renames the const-multiply functions, so the result is kinda a mess of renaming and moving.
I’ll go back and rebase the original changeset as though this had been the layout from the start, but first I’d like an ACK that this is how we want to organize things.
Technically, ECDH is no longer just a multiplication.
in
src/ecmult_const_impl.h:
in
e0e79874d2outdated
43+
44+/** Convert a number to WNAF notation. The number becomes represented by sum(2^{wi} * wnaf[i], i=0..return_val)
45+ * with the following guarantees:
46+ * - each wnaf[i] an odd integer between -(1 << w) and (1 << w)
47+ * - each wnaf[i] is nonzero
48+ * - the number of words set is returned; this is always (256 + w - 1) / w
180+ for (j = 0; j < WINDOW_A - 1; ++j) {
181+ /* This is a variable-time doubling, but it is actually constant-time for
182+ * nonzero points. We know on the first iteration that `r` will be zero
183+ * and know (by uniqueness of wNAF) that `r` will never be zero after
184+ * that iteration, so this does not result in a timing leak. */
185+ secp256k1_gej_double_var(r, r, NULL);
Would you mind adding a secp256k1_gej_double_nonzero, which just VERIFY_CHECK’s for nonzeroness, and dispatches to secp256k1_gej_double_var? Placing the assumption about algorithm internals in a higher level is a bit ugly.
sipa
commented at 2:31 pm on July 31, 2015:
contributor
ACK design, with a few more nits.
Please squash and I’ll merge.
EDIT: no need to squash everything, but some logical steps are nice… ecmult_const then ecmult_const + endo then optimizations then ECDH module… or whatever you like.
Add constant-time multiply `secp256k1_ecmult_const` for ECDH
Designed with clear separation of the wNAF conversion, precomputation
and exponentiation (since the precomp at least we will probably want
to separate in the API for users who reuse points a lot.
Future work:
- actually separate precomp in the API
- do multiexp rather than single exponentiation
4401500060
apoelstra force-pushed
on Jul 31, 2015
apoelstra
commented at 7:53 pm on July 31, 2015:
contributor
Thanks for the comments @sipa. All fixed, and rebased for a cleaner history.
sipa
commented at 2:16 pm on August 1, 2015:
contributor
Thanks! Can you also make some changes to .travis so this gets tested?
Add ECDH module which works by hashing the output of ecmult_const0739bbb6f0
Add benchmarks for ECDH and const-time multiplication91c0ce95ca
Make `secp256k1_scalar_add_bit` conditional; make `secp256k1_scalar_split_lambda_var` constant time
This has the effect of making `secp256k1_scalar_mul_shift_var` constant
time in both input scalars. Keep the _var name because it is NOT constant
time in the shift amount.
As used in `secp256k1_scalar_split_lambda_var`, the shift is always
the constant 272, so this function becomes constant time, and it
loses the `_var` suffix.
ed35d43a0c
Implement endomorphism optimization for secp256k1_ecmult_const92e53fc4c8
Improve perf. of cmov-based table lookup72ae443afb
apoelstra force-pushed
on Aug 1, 2015
sipa merged this
on Aug 2, 2015
sipa closed this
on Aug 2, 2015
sipa referenced this in commit
d84a3784f4
on Aug 2, 2015
Kagami
commented at 11:51 am on November 5, 2015:
none
It’s a bit unfortunate that secp256k1 always hash the derived secret, because it means I can’t use it for existing protocols like Bitmessage.
Any ideas how can I get Px with secp256k1? Only by implementing my own version of secp256k1_ecdh? (Note that Bitmessage uses SHA512(Px), so it’s not unsafe.)
apoelstra
commented at 12:59 pm on November 5, 2015:
contributor
@Kagami As you say, ECIES actually only needs a hash of Px, though it’s not our hash. Can you open an issue requesting we extend the ECDH API to support a custom hash function (like the signing functions support custom nonce functions)?
This is something we’ve considered but haven’t had any concrete usecases and I guess it forgot to get written down..
This is a metadata mirror of the GitHub repository
bitcoin-core/secp256k1.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-05-03 07:15 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me