variable sized precomputed table for signing

douglasbakkum commented at 10:04 am on October 18, 2015: none

This pull request gives an option to reduce the precomputed table size for the signing context (ctx) by setting #define ECMULT_GEN_PREC_BITS [N_BITS].

Motivation: Per #251 and #254, the static table can be reduced to 64kB. However, this is still too big for some of my embedded applications. Setting #define ECMULT_GEN_PREC_BITS 2 produces a 32kB table at a tradeoff of about 75% of the signing speed. Not defining this value will default to the existing implementation of 4 bits. Statistics:

 0ECMULT_GEN_PREC_BITS = 1
 1Precomputed table size: 32kB
 2./bench_sign 
 3ecdsa_sign: min 195us / avg 200us / max 212us
 4
 5ECMULT_GEN_PREC_BITS = 2
 6Precomputed table size: 32kB
 7./bench_sign 
 8ecdsa_sign: min 119us / avg 126us / max 134us
 9
10ECMULT_GEN_PREC_BITS = 4 (default)
11Precomputed table size: 64kB
12./bench_sign
13ecdsa_sign: min 83.5us / avg 89.6us / max 95.3us
14
15ECMULT_GEN_PREC_BITS = 8
16Precomputed table size: 512kB
17./bench_sign 
18ecdsa_sign: min 96.4us / avg 99.4us / max 104us

Only values of 2 and 4 make sense. 8 bits causes a larger table size with no increase in speed. 1 bit runs, actually, but does not reduce table size and is slower than 2 bits.

douglasbakkum cross-referenced this on Oct 18, 2015 from issue Low-footprint mode by gmaxwell

douglasbakkum commented at 10:35 am on October 18, 2015: none

See also comments about alternatively using wNAF: https://github.com/bitcoin/secp256k1/issues/189#issuecomment-149000292

gmaxwell commented at 4:27 pm on October 18, 2015: contributor

Reasonable enough.

Can you help me understand your memory trade-off requirements? There is a way to halve the table size without largely changing the performance though it requires getting rid of the nums blinding. I’d been under the impression that with the table completely static most of the reasonable target embedded devices (e.g. arm m3s with 192k ram) would end up with the table in their relatively copious flash and there was no real need to reduce below 64k on them.

douglasbakkum commented at 5:33 pm on October 18, 2015: none

Sure. The current device has 256kB of flash and 64kB of RAM on a Cortex M4. The secp256k1 library (including the 64kB table) takes up about 140kB in total.

jonasschnelli commented at 6:03 pm on October 18, 2015: contributor

Concept ACK. I think hardware wallets and other use-cases on embedded devices don’t really seek for super-performance. But they like to use a reliable and well-testes library that fits on a chip with tiny ram and tiny flash. Removing precomputed tables sounds after a logical tradeoff in that territory.

douglasbakkum cross-referenced this on Mar 22, 2016 from issue use secp ecc lib by default, with smaller precomputed table for signing by douglasbakkum

ddustin commented at 7:44 pm on October 23, 2018: none

Is there a reason this was never merged in?

gmaxwell commented at 0:31 am on May 23, 2019: contributor

@real-or-random This complements the WINDOW_G work, would you mind reviewing it?

real-or-random commented at 8:57 am on May 23, 2019: contributor

I’ll have a look. :)

in src/ecmult_gen_impl.h:60 in abcaa73da5 outdated

48@@ -49,39 +49,39 @@ static void secp256k1_ecmult_gen_context_build(secp256k1_ecmult_gen_context *ctx
49 
50     /* compute prec. */
51     {
52-        secp256k1_gej precj[1024]; /* Jacobian versions of prec. */
53+        secp256k1_gej precj[ECMULT_GEN_PREC_N * ECMULT_GEN_PREC_G]; /* Jacobian versions of prec. */
54         secp256k1_gej gbase;
55         secp256k1_gej numsbase;
56         gbase = gj; /* 16^j * G */

real-or-random commented at 7:57 am on May 24, 2019:

there are some comments that should be changed here and up to line 67

douglasbakkum commented at 10:37 am on May 24, 2019:

I updated the comments, assuming you meant to change, for example, 16 to PREC_G. Let me know if there is something in addition.

real-or-random commented at 8:04 am on May 24, 2019: contributor

@douglasbakkum I know this is very old. Are you still interested in this PR? If yes, it needs rebase. If not, we could adopt it.

douglasbakkum commented at 9:29 am on May 24, 2019: none

Awesome. Yes definitely interested. We have been using this in production BitBox’s since the PR was submitted.

I will rebase.

douglasbakkum closed this on May 24, 2019

douglasbakkum force-pushed on May 24, 2019

douglasbakkum reopened this on May 24, 2019

douglasbakkum commented at 10:22 am on May 24, 2019: none

@real-or-random The PR is now rebased. Let me know if you would like further changes.

douglasbakkum force-pushed on May 24, 2019

real-or-random commented at 11:52 am on May 24, 2019: contributor

Thanks, looks good to me!

What we need to do is make this and #596 consistent (for naming of constants, ./configure, etc.) I think the easiest way forward is to get #596 merged and then adapt this PR here. (I know this is also the easiest way for me as the author of #596 but I’m of course open to other suggestions too. :P) @gmaxwell What do you think? I’m not convinced that “window size” would be an appropriate term here, so maybe PREC_BITS makes more sense in general?

gmaxwell commented at 10:12 am on May 25, 2019: contributor

Sounds fine to me (both on the sequence and the naming)

gmaxwell commented at 12:08 pm on May 29, 2019: contributor

#596 is now merged and I don’t believe there is anything at risk of imminent merger which will cause difficult conflicts with this PR.

douglasbakkum force-pushed on May 29, 2019

gmaxwell commented at 8:47 pm on May 29, 2019: contributor

@real-or-random ^

real-or-random commented at 11:40 am on May 30, 2019: contributor

Great, so if ECMULT_GEN_PREC_BITS is fine, then this constant should be ./configure-able as in #596. See this commit for reference: https://github.com/bitcoin-core/secp256k1/pull/596/commits/2842dc523e88ff8f01c16486a1601ee49f4e60ff @douglasbakkum Do you want to give it a try?

douglasbakkum commented at 2:05 pm on May 31, 2019: none

@real-or-random Yea, agree. If it is quick for you, would you mind doing so?

vhnatyk referenced this in commit dbc9d46c7d on Jun 18, 2019

vhnatyk cross-referenced this on Jun 18, 2019 from issue Implementing pre allocation context creation by elichai

vhnatyk cross-referenced this on Jun 18, 2019 from issue Dev/vhnat/new master by vhnatyk

vhnatyk commented at 12:19 pm on June 19, 2019: none

Hi! any progress with this?

douglasbakkum commented at 1:36 pm on June 21, 2019: none

Sorry for the delays. My colleague @benma made a commit to make the value configurable.

douglasbakkum force-pushed on Jun 21, 2019

benma commented at 7:56 am on June 29, 2019: contributor

@real-or-random did you have a chance to test the configuration?

real-or-random commented at 9:33 am on June 29, 2019: contributor

No, sorry, not yet. It’s on my long “GitHub saved notifications” list, I’ll have a look next week. :)

in src/gen_context.c:56 in df3224d89a outdated

45@@ -45,23 +46,23 @@ int main(int argc, char **argv) {
46     fprintf(fp, "#define _SECP256K1_ECMULT_STATIC_CONTEXT_\n");
47     fprintf(fp, "#include \"src/group.h\"\n");
48     fprintf(fp, "#define SC SECP256K1_GE_STORAGE_CONST\n");
49-    fprintf(fp, "static const secp256k1_ge_storage secp256k1_ecmult_static_context[64][16] = {\n");
50+    fprintf(fp, "static const secp256k1_ge_storage secp256k1_ecmult_static_context[ECMULT_GEN_PREC_N][ECMULT_GEN_PREC_G] = {\n");

real-or-random commented at 3:10 pm on July 3, 2019:

This is somewhat weird if you change the config in between but for some reason gen_context is not run again.

So I think the generated ecmult_static_context.h should have the gen_context.c compile-time values of ECMULT_GEN_PREC_N and ECMULT_GEN_PREC_G hardcoded and contain a precompiler assertion (#error) that verifies they’re identical to the ecmult_static_context.h-compile-time values. This should make sure that we never compile with a ecmult_static_context.h generated with a different configuration. (Alternatively, we could make the file name of ecmult_static_context.h make dependent on the configured values, e.g., ecmult_static_context_64_16.h or something.)

But maybe I make a mistake here, it’s easy to get these things wrong. So correct me if you think I’m wrong.

benma commented at 8:58 am on July 4, 2019:

That’s a great catch. Normally gen_context is run with make if the configuration changed, but it might not be absolutely failsafe (e.g. by manually overwriting the file with a different version etc.).

Change incoming.

real-or-random commented at 8:26 am on July 15, 2019:

Thanks, looks good! Can you add a sentence like “Try deleting ecmult_static_context.h before the build.” or similar?

vhnatyk commented at 5:42 am on July 12, 2019: none

Seems we are so close 😄, sorry for spamming - what else needs to be done?

real-or-random commented at 12:45 pm on July 12, 2019: contributor

Ah I haven’t seen the changes. I’ll have a look next week!

in configure.ac:171 in 7abe39b298 outdated

164@@ -165,6 +165,14 @@ AC_ARG_WITH([ecmult-window], [AS_HELP_STRING([--with-ecmult-window=SIZE|auto],
165 )],
166 [req_ecmult_window=$withval], [req_ecmult_window=auto])
167 
168+AC_ARG_WITH([ecmult-gen-precision], [AS_HELP_STRING([--with-ecmult-gen-precision=2|4|8|auto],
169+[Precision bits to tune the precomputed table size for signing.]
170+[The size of the table is 32kB for 2 bits, 64kB for 4 bits, 512kB for 8 bits of precision.]
171+[A smaller table size usually results in slower signing.]

real-or-random commented at 8:23 am on July 15, 2019:

nit: maybe phrase it the around way around: “A larger table size usually results in possible faster signing.”

(to be consistent with “Larger values result in possibly better performance” for the other config entry)

benma commented at 10:14 am on July 15, 2019:

With ./configure --disable-openssl-tests --with-ecmult-gen-precision=8 (but not 2 and 4) and valgrind ./exhaustive_tests I get millions of warnings like

There is no issue if you call it like this:

valgrind --max-stackframe=2097912 ./exhaustive_tests

Reference:

-max-stackframe= [default: 2000000] The maximum size of a stack frame. If the stack pointer moves by more than this amount then Valgrind will assume that the program is switching to a different stack. You may need to use this option if your program has large stack-allocated arrays.

Seems like the big option just needs more stack space. It’s a valgrind heuristic only and the tests run fine otherwise.

Will add a note to the README.

real-or-random commented at 10:39 am on July 15, 2019:

Oh indeed, thanks!

in Makefile.am:175 in 7abe39b298 outdated

171@@ -172,7 +172,7 @@ src/ecmult_static_context.h: $(gen_context_BIN)
172 CLEANFILES = $(gen_context_BIN) src/ecmult_static_context.h $(JAVAROOT)/$(JAVAORG)/*.class .stamp-java
173 endif
174 
175-EXTRA_DIST = autogen.sh src/gen_context.c src/basic-config.h $(JAVA_FILES)
176+EXTRA_DIST = autogen.sh src/gen_context.c src/libsecp256k1-config.h src/basic-config.h $(JAVA_FILES)

real-or-random commented at 8:47 am on July 15, 2019:

I don’t think we want to distribute src/libsecp256k1-config.h. It will be generated by configure.

benma commented at 10:23 am on July 15, 2019:

Can you add a sentence like “Try deleting ecmult_static_context.h before the build.” or similar?

Done.

I don’t think we want to distribute src/libsecp256k1-config.h. It will be generated by configure.

That was a misguided attempt at having gen_context rebuild the static context upon configuration change. It seems to work now anyway, so I removed it again.

real-or-random commented at 11:49 am on July 15, 2019:

Oh hm, apparently make distcheck fails here: https://travis-ci.org/bitcoin-core/secp256k1/jobs/558864214#L847

On a second thought, I’m not sure that including src/libsecp256k1-config.h is a good idea. There are downstream users of this library (e.g., https://github.com/rust-bitcoin/rust-secp256k1) that don’t use the autotools. But I’m not sure what the best thing is. Maybe including it only if ECMULT_GEN_PREC_BITS is not defined. Sounds pragmatic to me.

benma commented at 3:24 pm on July 15, 2019:

fixed

real-or-random commented at 9:41 am on July 15, 2019: contributor

With ./configure --disable-openssl-tests --with-ecmult-gen-precision=8 (but not 2 and 4) and valgrind ./exhaustive_tests I get millions of warnings like

 0==6831== Invalid read of size 16
 1==6831==    at 0x1111AB: secp256k1_fe_add (field_5x52_impl.h:405)
 2==6831==    by 0x1111AB: secp256k1_gej_double_var.part.0.constprop.0 (group_impl.h:358)
 3==6831==    by 0x1154ED: secp256k1_gej_double_var (group_impl.h:326)
 4==6831==    by 0x1154ED: secp256k1_ecmult_gen_context_build (ecmult_gen_impl.h:83)
 5==6831==    by 0x11A55E: secp256k1_context_preallocated_create (secp256k1.c:130)
 6==6831==    by 0x11A55E: secp256k1_context_create (secp256k1.c:142)
 7==6831==    by 0x109130: main (tests_exhaustive.c:460)
 8==6831==  Address 0x1ffedfe740 is on thread 1's stack
 9==6831==  in frame [#1](/bitcoin-core-secp256k1/1/), created by secp256k1_ecmult_gen_context_build (group_impl.h:29)
10==6831==
11==6831== Invalid read of size 16
12==6831==    at 0x1111B0: secp256k1_fe_add (field_5x52_impl.h:405)
13==6831==    by 0x1111B0: secp256k1_gej_double_var.part.0.constprop.0 (group_impl.h:358)
14==6831==    by 0x1154ED: secp256k1_gej_double_var (group_impl.h:326)
15==6831==    by 0x1154ED: secp256k1_ecmult_gen_context_build (ecmult_gen_impl.h:83)
16==6831==    by 0x11A55E: secp256k1_context_preallocated_create (secp256k1.c:130)
17==6831==    by 0x11A55E: secp256k1_context_create (secp256k1.c:142)
18==6831==    by 0x109130: main (tests_exhaustive.c:460)
19==6831==  Address 0x1ffedfe730 is on thread 1's stack
20==6831==  in frame [#1](/bitcoin-core-secp256k1/1/), created by secp256k1_ecmult_gen_context_build (group_impl.h:29)
21==6831==
22==6831== Invalid write of size 4
23==6831==    at 0x1111B5: secp256k1_fe_add (field_5x52_impl.h:412)
24==6831==    by 0x1111B5: secp256k1_gej_double_var.part.0.constprop.0 (group_impl.h:358)
25==6831==    by 0x1154ED: secp256k1_gej_double_var (group_impl.h:326)
26==6831==    by 0x1154ED: secp256k1_ecmult_gen_context_build (ecmult_gen_impl.h:83)
27==6831==    by 0x11A55E: secp256k1_context_preallocated_create (secp256k1.c:130)
28==6831==    by 0x11A55E: secp256k1_context_create (secp256k1.c:142)
29==6831==    by 0x109130: main (tests_exhaustive.c:460)
30==6831==  Address 0x1ffedfe75c is on thread 1's stack
31==6831==  in frame [#1](/bitcoin-core-secp256k1/1/), created by secp256k1_ecmult_gen_context_build (group_impl.h:29)
32==6831==
33==6831== Invalid write of size 8
34==6831==    at 0x1111C8: secp256k1_fe_add (field_5x52_impl.h:405)
35==6831==    by 0x1111C8: secp256k1_gej_double_var.part.0.constprop.0 (group_impl.h:358)
36==6831==    by 0x1154ED: secp256k1_gej_double_var (group_impl.h:326)
37==6831==    by 0x1154ED: secp256k1_ecmult_gen_context_build (ecmult_gen_impl.h:83)
38==6831==    by 0x11A55E: secp256k1_context_preallocated_create (secp256k1.c:130)
39==6831==    by 0x11A55E: secp256k1_context_create (secp256k1.c:142)
40==6831==    by 0x109130: main (tests_exhaustive.c:460)
41==6831==  Address 0x1ffedfe740 is on thread 1's stack
42==6831==  in frame [#1](/bitcoin-core-secp256k1/1/), created by secp256k1_ecmult_gen_context_build (group_impl.h:29)

I haven’t looked at this further.

in README.md:70 in deec07d51e outdated

65+
66+    $ ./exhaustive_tests
67+
68+With valgrind, you might need to increase the max stack size:
69+
70+    $ valgrind --max-stackframe=2097912 ./exhaustive_tests

real-or-random commented at 11:34 am on July 15, 2019:

I need slightly larger values…. Maybe recommend 2500000?

benma commented at 2:05 pm on July 15, 2019:

Done

in src/gen_context.c:7 in deec07d51e outdated

3@@ -4,9 +4,10 @@
4  * file COPYING or http://www.opensource.org/licenses/mit-license.php.*
5  **********************************************************************/
6 
7+#include "libsecp256k1-config.h" // for ECMULT_GEN_PREC_BITS

real-or-random commented at 11:36 am on July 15, 2019:

It’s not crucial but we should make sure that this warning disappears: /configure --disable-openssl-tests --with-ecmult-gen-precision=8 --enable-experimental --enable-module-recovery --enable-module-ecdh --with-ecmult-window=22 CC=clang

0In file included from src/gen_context.c:9:
1src/basic-config.h:33: warning: "ECMULT_WINDOW_SIZE" redefined
2   33 | #define ECMULT_WINDOW_SIZE 15
3      |
4In file included from src/gen_context.c:7:
5src/libsecp256k1-config.h:18: note: this is the location of the previous definition
6   18 | #define ECMULT_WINDOW_SIZE 22

I guess other compilers warn, too.

In the end it does not matter what ECMULT_WINDOW_SIZE is for use in gen_context.c. I think sticking to 15 is reasonable (but the fact that we ignore the config setting here should be more prominent in the source).

real-or-random commented at 12:07 pm on July 15, 2019:

In the end it does not matter what ECMULT_WINDOW_SIZE is for use in gen_context.c. I think sticking to 15 is reasonable (but the fact that we ignore the config setting here should be more prominent in the source).

Actually it does not matter too much because gen_context does not use ECMULT_WINDOW_SIZE at all.

benma commented at 2:05 pm on July 15, 2019:

Fixed via #undef ECMULT_WINDOW_SIZE in basic-config.h, like with the other definitions.

vhnatyk commented at 10:48 pm on July 22, 2019:

@real-or-random sorry to bug, seems ok now? 😊

benma commented at 10:52 am on September 4, 2019:

Ping @real-or-random - seems we are really close now :P

real-or-random commented at 11:40 am on July 15, 2019: contributor

Sorry for bringing up more and more stuff. We should be really close now. :)

real-or-random commented at 5:31 pm on September 4, 2019: contributor

Yes, we’re close. :)

ACK 1d26b27ac90092306bfbc9cdd5123e8a5035202a (mod squashing) I read the changes, and tested them both with and without static precomputation, with different precision values and also with valgrind.

Can you squash some of the commits?

real-or-random requested review from sipa on Sep 4, 2019

real-or-random commented at 5:44 pm on September 4, 2019: contributor

@sipa Can you have look at this? I think you know the algorithm best.

sipa commented at 6:29 pm on September 4, 2019: contributor

Code review ACK 1d26b27ac90092306bfbc9cdd5123e8a5035202a. Looks great; all comments I wanted to make were addressed in later commits.

Please squash the fixup commits.

sipa approved

elichai commented at 6:46 pm on September 4, 2019: contributor

Maybe better docs to differentiate the different between ecmult_static_precomputation,ecmult-window,ecmult-gen-precision?

benma commented at 9:36 pm on September 4, 2019: contributor

@elichai okay to do this in a different PR? Not strictly related.

real-or-random commented at 5:34 am on September 5, 2019: contributor

I think this can happen in a different PR.

elichai commented at 5:37 am on September 5, 2019: contributor

Sure, just getting to a lot of tables that it’s getting confusing which table will affect what.

variable signing precompute table

make ECMULT_GEN_PREC_BITS configurable

ecmult_static_context.h: add compile time config assertion (#3) - Prevents accidentally using a file which was generated with a
different configuration.

README: mention valgrind issue

With --with-ecmult-gen-precision=8, valgrind needs a max stack size
adjustment to not run into a stack switching heuristic:

http://valgrind.org/docs/manual/manual-core.html

> -max-stackframe= [default: 2000000]
> The maximum size of a stack frame. If the stack pointer moves by more than this amount then Valgrind will assume that the program is switching to a different stack.
You may need to use this option if your program has large stack-allocated arrays.

basic-config: undef ECMULT_WINDOW_SIZE before (re-)defining it

dcb2e3b3ff

douglasbakkum force-pushed on Sep 5, 2019

douglasbakkum commented at 7:24 am on September 5, 2019: none

Thanks all. The commits are now squashed and rebased.

real-or-random approved

real-or-random commented at 8:45 am on September 5, 2019: contributor

ACK dcb2e3b3fff0b287d576842aabe5c79f2fe4df30 verified that all changes to the previous ACKed 1d26b27ac90092306bfbc9cdd5123e8a5035202a were due to the rebase

jonasnick approved

jonasnick commented at 12:20 pm on September 5, 2019: contributor

ACK dcb2e3b3fff0b287d576842aabe5c79f2fe4df30 read the code and tested various configurations with valgrind

real-or-random referenced this in commit 96cd94e385 on Sep 5, 2019

real-or-random merged this on Sep 5, 2019

real-or-random closed this on Sep 5, 2019

vhnatyk commented at 7:30 pm on September 5, 2019: none

So happy this got merged finally!) Hopefully I can start getting rid of black magik in my code now))

benma cross-referenced this on Oct 8, 2019 from issue rebase on bitcoin-core/secp256k1 by benma

vhnatyk cross-referenced this on Oct 19, 2019 from issue Clarify RAM footprint on MCUs by vhnatyk

gmaxwell cross-referenced this on Nov 9, 2019 from issue Don't put an absurd amount of data onto the stack in some configs by gmaxwell

sipa cross-referenced this on Jun 9, 2020 from issue Update libsecp256k1 subtree by sipa

fanquake referenced this in commit 8c97780db8 on Jun 13, 2020

sidhujag referenced this in commit 8a3a072968 on Jun 13, 2020

ComputerCraftr referenced this in commit b98f1c6e6c on Jun 16, 2020

UdjinM6 referenced this in commit 9d36ba6570 on Aug 10, 2021

5tefan referenced this in commit 8ded2caa74 on Aug 12, 2021

gades referenced this in commit d855cc511d on May 8, 2022

variable sized precomputed table for signing #337