Add pippenger_wnaf for multi-multiplication #486

pull jonasnick wants to merge 9 commits into bitcoin-core:master from jonasnick:pippenger changing 16 files +1601 −83
  1. jonasnick commented at 2:45 pm on November 13, 2017: contributor

    This PR is based on #473 and adds a variant of “Pippengers algorithm” (see Bernstein et al., Faster batch forgery identification, page 15 and https://github.com/scipr-lab/libff/pull/10) for point multi-multiplication that performs better with a large number of points than Strauss’ algorithm.

    aggsig

    Thanks to @sipa for providing wnaf_fixed, benchmarking, and the crucial suggestion to use affine addition.

    The PR also makes ecmult_multi decide which algorithm to use, based on the number of points and the available scratch space. For restricted scratch spaces this can be further optimized in the future (f.e. a 35kB scratch space allows batches of 11 points with strauss or 95 points with pippenger; choosing pippenger would be 5% faster).

    As soon as this PR has received some feedback I’ll repeat the benchmarks to determine the optimal pippenger_bucket_window with the new benchmarking code in #473.

  2. apoelstra commented at 2:47 pm on November 13, 2017: contributor

    Hooray!

    Please rebase – should be straightforward, we merged only some minor things last week.

  3. apoelstra cross-referenced this on Nov 13, 2017 from issue Multi-point multiplication support by sipa
  4. jonasnick force-pushed on Nov 14, 2017
  5. jonasnick commented at 10:45 am on November 14, 2017: contributor
    Rebased
  6. in src/ecmult_impl.h:783 in be3248f50a outdated
    778+    } else if(bucket_window == 10) {
    779+        return 17400;
    780+    } else if(bucket_window == 11) {
    781+        return 28600;
    782+    } else if(bucket_window == PIPPENGER_MAX_BUCKET_WINDOW) {
    783+        return INT_MAX;
    


    sipa commented at 10:39 pm on November 14, 2017:
    INT_MAX is not defined.

    apoelstra commented at 4:52 pm on November 15, 2017:
    #include <limits.h> ?
  7. jonasnick force-pushed on Nov 16, 2017
  8. apoelstra commented at 5:53 pm on November 16, 2017: contributor

    In secp256k1_ecmult_multi_split_strauss_wnaf we immediately convert a secp256k1_ge received from the callback to a secp256k1_gej. Would be better to change the callback API to just return a gej in the first place, otherwise the caller may be forced to do an expensive and pointless gej->ge conversion.

    Edit Oh, I was looking at old code. I see that Strauss does the ge->gej conversion but Pippenger does not (and Pippenger derives a speed benefit from using ge). And our two main applications, aggsig and bulletproofs, don’t need to do a gej->ge conversion anyway, at least during verification. Never mind.

  9. sipa commented at 5:45 am on November 17, 2017: contributor
    @apoelstra Yes, I believe it was a conscious choice to making the interface pass affine points because no callers need more.
  10. in include/secp256k1.h:264 in be8abb5ad1 outdated
    255@@ -243,6 +256,28 @@ SECP256K1_API void secp256k1_context_set_error_callback(
    256     const void* data
    257 ) SECP256K1_ARG_NONNULL(1);
    258 
    259+/** Create a secp256k1 scratch space object.
    260+ *
    261+ *  Returns: a newly created scratch space.
    262+ *  Args: ctx:  an existing context object (cannot be NULL)
    263+ *  In:  init_size: initial amount of memory to allocate
    264+ *        max_size: maximum amount of memory to allocat
    


    sipa commented at 10:19 pm on November 23, 2017:
    Nit: allocat
  11. in src/scratch_impl.h:16 in be8abb5ad1 outdated
     7+#ifndef _SECP256K1_SCRATCH_IMPL_H_
     8+#define _SECP256K1_SCRATCH_IMPL_H_
     9+
    10+#include "scratch.h"
    11+
    12+#define ALIGNMENT 16
    


    sipa commented at 10:22 pm on November 23, 2017:
    Add a comment to explain that we assume that common architectures never have alignment requirements above 8 for any of the types we care about (and make it 16 because who cares about a few byes)?

    sipa commented at 10:19 am on November 25, 2017:
    Perhaps add a TODO that we should try to determine this at configure time (using sizeof(max_align_t), if available, for example).
  12. in src/ecmult_impl.h:502 in 541ea3fc14 outdated
    497+    if (max_points == 0) return 0;
    498+    if (max_points > 160) max_points = 160; /* At this point, gains are not longer compensating for locality degradation */
    499+    n_batches = (n + max_points - 1) / max_points;
    500+    points_per_batch = (n + n_batches - 1) / n_batches;
    501+
    502+    /* Attempt to allocate sufficient space for Strauss */
    


    sipa commented at 10:36 pm on November 23, 2017:
    @apoelstra Is this halving loop still necessary? The max allocation call above should be perfect, so I think we can just assert that this never fails.

    apoelstra commented at 0:30 am on December 7, 2017:
    Agreed, please change this to a CHECK on the return value of secp256k1_scratch_resize.

    apoelstra commented at 3:57 pm on December 7, 2017:
    Oh, derp, Jonas points out that this is fixed in a later commit. Never mind.

    jonasnick commented at 4:01 pm on December 7, 2017:
    This is outdated by “Add pippenger_wnaf ecmult_multi”: if resize fails then return 0.
  13. in src/ecmult_impl.h:2 in 118ad20838 outdated
    0@@ -1,5 +1,5 @@
    1 /**********************************************************************
    2- * Copyright (c) 2013, 2014 Pieter Wuille                             *
    3+ * Copyright (c) 2013, 2014, 2017 Pieter Wuille, Andrew Poelstra      *
    


    sipa commented at 3:17 am on November 25, 2017:
    This needs a “Jonas Nick”, I think.
  14. in src/ecmult_impl.h:487 in 118ad20838 outdated
    483@@ -403,4 +484,540 @@ static void secp256k1_ecmult(const secp256k1_ecmult_context *ctx, secp256k1_gej
    484     }
    485 }
    486 
    487+static void secp256k1_ecmult(const secp256k1_ecmult_context *ctx, secp256k1_gej *r, const secp256k1_gej *a, const secp256k1_scalar *na, const secp256k1_scalar *ng) {
    


    sipa commented at 6:27 am on November 25, 2017:
    At some point this should be renamed to _var as well, but let’s leave that for later.
  15. in src/ecmult_impl.h:564 in 118ad20838 outdated
    559+}
    560+
    561+/** Convert a number to WNAF notation.
    562+ *  The number becomes represented by sum(2^{wi} * wnaf[i], i=0..WNAF_SIZE(w)+1) - return_val.
    563+ *  It has the following guarantees:
    564+ *  - each wnaf[i] an odd integer between -(1 << w) and (1 << w)
    


    sipa commented at 6:31 am on November 25, 2017:
    These two lines are more accurately and correctly written as each wnaf[i] is either zero or an odd integer between -(1 << w) and (1 << w).

    jonasnick commented at 1:43 pm on November 27, 2017:

    Specifically, do you suggest to replace

    0 *  - each wnaf[i] is an odd integer between -(1 << w) and (1 << w)
    1 *  - each wnaf[i] is nonzero, unless the given number is 0
    

    with

    0* wnaf[i] is either zero or an odd integer between -(1 << w) and (1 << w)
    

    ? If so, that sounds just as accurate but less helpful.


    apoelstra commented at 3:07 pm on November 27, 2017:
    To be clear: is it possible for a nonzero number to have 0 wnaf digits in our representation? I believe the answer is yes, though for wnaf_const (for example) it is not, so this is an important distinction.

    sipa commented at 7:32 pm on November 27, 2017:
    @jonasnick The first statement “each wnaf[i] is an odd integer between -(1 « w) and (1 « w)” is just incorrect. 0 is not an odd integer, and 0 is possible in case the input is 0.

    jonasnick commented at 10:47 pm on November 27, 2017:
    Oh, of course 0 is even. But I believe the second statement should stay because at least empirically it doesn’t happen that a nonzero number has 0 wnaf digits (we test for this, copied from wnaf_const). @sipa can you confirm?

    sipa commented at 2:10 am on November 28, 2017:
    I don’t care that it doesn’t happen often. The first sentence is stated as a condition which is always true, and it is not. You could say “There result satisfies one of the two following conditions:” and leave the two in. But leaving them as-is is just wrong.
  16. in src/ecmult_impl.h:719 in 118ad20838 outdated
    708+ * Returns optimal bucket_window (number of bits of a scalar represented by a
    709+ * set of buckets) for a given number of points.
    710+ */
    711+static int secp256k1_pippenger_bucket_window(size_t n) {
    712+#ifdef USE_ENDOMORPHISM
    713+    if (n <= 4) {
    


    sipa commented at 6:40 am on November 25, 2017:
    Maybe write this in a tree structure rather than linear (as in: top level is is a bisection for first half/second half) etc.

    jonasnick commented at 2:00 pm on November 27, 2017:
    What’s the goal of this?

    sipa commented at 7:30 pm on November 27, 2017:
    Only having a logarithmic number of conditionals on average.

    jonasnick commented at 2:43 pm on November 30, 2017:
    While being better in the average case I think this is worse for small batches (where this code has more impact compared to larger batches) and looks ugly. Maybe that’s something for another PR.

    sipa commented at 11:49 pm on December 4, 2017:
    Fair enough.
  17. in src/ecmult_impl.h:767 in 118ad20838 outdated
    762+#endif
    763+}
    764+
    765+static size_t secp256k1_pippenger_bucket_window_inv(int bucket_window) {
    766+#ifdef USE_ENDOMORPHISM
    767+    if(bucket_window == 1) {
    


    sipa commented at 6:41 am on November 25, 2017:
    Nit: space after if.

    sipa commented at 7:00 am on November 25, 2017:
    I think this can be written more efficiently using a switch case.
  18. in src/ecmult_impl.h:692 in 118ad20838 outdated
    688+                secp256k1_ge_neg(&tmp, &pt[point_state.input_pos]);
    689+                secp256k1_gej_add_ge_var(&buckets[idx], &buckets[idx], &tmp, NULL);
    690+            }
    691+        }
    692+        secp256k1_gej_set_infinity(&running_sum);
    693+        secp256k1_gej_set_infinity(&walking_sum);
    


    sipa commented at 6:51 am on November 25, 2017:
    Do you need a separate variable for walking_sum? Can’t you reuse r for this?

    jonasnick commented at 1:58 pm on November 27, 2017:
    I don’t see how. If r stores the result of previous iterations it can’t be used in the addition ladder.
  19. in src/ecmult_impl.h:961 in 118ad20838 outdated
    956+        n_points = space_for_points/entry_size;
    957+        n_points = n_points > max_points ? max_points : n_points;
    958+        if (n_points > res) {
    959+            res = n_points;
    960+        }
    961+        if(n_points < max_points) {
    


    sipa commented at 6:55 am on November 25, 2017:
    Nit: space after if.
  20. in src/ecmult_impl.h:999 in 118ad20838 outdated
    994+        max_points = ECMULT_MAX_POINTS_PER_BATCH;
    995+    }
    996+    n_batches = (n+max_points-1)/max_points;
    997+    n_batch_points = (n+n_batches-1)/n_batches;
    998+
    999+    if(n_batch_points >= ECMULT_PIPPENGER_THRESHOLD) {
    


    sipa commented at 6:56 am on November 25, 2017:
    Nit: space after if.
  21. in src/ecmult_impl.h:1014 in 118ad20838 outdated
    1009+    }
    1010+    for(i = 0; i < n_batches; i++) {
    1011+        size_t nbp = n < n_batch_points ? n : n_batch_points;
    1012+        size_t offset = n_batch_points*i;
    1013+        secp256k1_gej tmp;
    1014+        if(!(*f)(ctx, scratch, error_callback, &tmp, i == 0 ? inp_g_sc : NULL, cb, cbdata, nbp, offset)) {
    


    sipa commented at 6:56 am on November 25, 2017:
    Nit: you can just use f(...) instead of (*f)(...).
  22. in src/ecmult_impl.h:1008 in 118ad20838 outdated
    1003+        if (max_points == 0) {
    1004+            return 0;
    1005+        }
    1006+        n_batches = (n+max_points-1)/max_points;
    1007+        n_batch_points = (n+n_batches-1)/n_batches;
    1008+        f = &secp256k1_ecmult_strauss_batch;
    


    sipa commented at 6:57 am on November 25, 2017:
    Nit: ampersand is optional here.
  23. in src/scratch.h:11 in 118ad20838 outdated
     7+#ifndef _SECP256K1_SCRATCH_
     8+#define _SECP256K1_SCRATCH_
     9+#include <sys/types.h>
    10+
    11+/* The typedef is used internally; the struct name is used in the public API
    12+ * (where it is exposed as a different typedef) */
    


    sipa commented at 10:16 am on November 25, 2017:
    The typedef and struct name are the same publicly; the struct is just not defined there.

    apoelstra commented at 3:02 pm on November 27, 2017:
    I’m not sure what you mean, the typedef here is secp256k1_scratch, the public one is secp256k1_scratch_space.

    sipa commented at 7:38 pm on November 27, 2017:
    Oops, ignore me!
  24. jonasnick force-pushed on Nov 30, 2017
  25. jonasnick commented at 3:16 pm on November 30, 2017: contributor
    Pushed fixes
  26. peterdettman commented at 8:01 pm on November 30, 2017: contributor
    @jonasnick I opened a PR here: https://github.com/jonasnick/secp256k1/pull/1 for suggested changes to save some additions in each _pippenger_wnaf window.
  27. jonasnick commented at 4:46 pm on December 1, 2017: contributor
    As expected, @peterdettman delivers. I see about 2% to 4% speedup. Merged the commit into this PR.
  28. jonasnick force-pushed on Dec 1, 2017
  29. sipa commented at 11:54 pm on December 4, 2017: contributor

    utACK c7723997941576dd03bb1db77dad8d5dc6255924 tree 6f9376c5ebdfe589593fabde10c5d61f5d07ebcc

    Please squash fixes.

  30. jonasnick force-pushed on Dec 6, 2017
  31. jonasnick force-pushed on Dec 6, 2017
  32. jonasnick force-pushed on Dec 6, 2017
  33. yeastplume referenced this in commit 9eb90c9fd9 on Dec 6, 2017
  34. yeastplume cross-referenced this on Dec 6, 2017 from issue Aggsig work to date - merging by yeastplume
  35. jonasnick force-pushed on Dec 6, 2017
  36. jonasnick commented at 4:38 pm on December 6, 2017: contributor
    Squashed fixes and updated bucket windows. Previous bucket windows did not not include peterdettman’s optimization, did not use bench_ecmult and had a bug where the number of points for a bucket window were twice of what they should have been. The result is that the performance without endomorphism is very similar, with endomorphism it’s up to 8% faster for some number of points. The performance graph looks very similar to the one above.
  37. sipa commented at 11:56 pm on December 6, 2017: contributor
    ACK 50f0779be9c8d67fc5c8ca3a7a21b17da74cfa33
  38. in src/scratch.h:9 in 56fb2ccbe9 outdated
    0@@ -0,0 +1,35 @@
    1+/**********************************************************************
    2+ * Copyright (c) 2017 Andrew Poelstra	                              *
    3+ * Distributed under the MIT software license, see the accompanying   *
    4+ * file COPYING or http://www.opensource.org/licenses/mit-license.php.*
    5+ **********************************************************************/
    6+
    7+#ifndef _SECP256K1_SCRATCH_
    8+#define _SECP256K1_SCRATCH_
    9+#include <sys/types.h>
    


    apoelstra commented at 0:10 am on December 7, 2017:

    We can remove this #include. There are two places that ssize_t are used in ecmult_impl.h which need to be removed. The first, ssize_t max_alloc = can be changed to size_t because it is assigned a size_t anyway (we changed secp256k1_scratch_max_allocation to not return a sign value).

    The second use is ssize_t space_for_points. This can also be made into a size_t. The check on line 938, if (space_for_points < 0), will need to be rearranged to compare the two things that are subtracted on line 937 to get space_for_points.

  39. in src/scratch.h:16 in 56fb2ccbe9 outdated
    12+ * (where it is exposed as a different typedef) */
    13+typedef struct secp256k1_scratch_space_struct {
    14+    void *data;
    15+    size_t offset;
    16+    size_t init_size;
    17+    size_t max_size;
    


    apoelstra commented at 0:13 am on December 7, 2017:
    Can we add an error_callback pointer to this structure and remove it from the arguments of secp256k1_scratch_resize? It would avoid the need to pass error callbacks into a bunch of higher level functions.
  40. in src/ecmult.h:35 in da61b77821 outdated
    29@@ -28,4 +30,9 @@ static int secp256k1_ecmult_context_is_built(const secp256k1_ecmult_context *ctx
    30 /** Double multiply: R = na*A + ng*G */
    31 static void secp256k1_ecmult(const secp256k1_ecmult_context *ctx, secp256k1_gej *r, const secp256k1_gej *a, const secp256k1_scalar *na, const secp256k1_scalar *ng);
    32 
    33+typedef int (secp256k1_ecmult_multi_callback)(secp256k1_scalar *sc, secp256k1_ge *pt, size_t idx, void *data);
    34+
    35+/** Multi-multiply: R = inp_g_sc * G + sum_i ni * Ai. */
    


    apoelstra commented at 0:29 am on December 7, 2017:
    Update comment to say that this function will clobber the scratch space that gets passed in? I’ve nearly been burned by that in the Bulletproofs code a couple times.
  41. in src/tests.c:2702 in deabf0d88f outdated
    2697+    memset(&sc[0], 0, sizeof(sc[0]));
    2698+    CHECK(secp256k1_ecmult_multi(&ctx->ecmult_ctx, scratch, &ctx->error_callback, &r, &szero, ecmult_multi_callback, &data, 20));
    2699+    memset(&sc[1], 0, sizeof(sc[0]));
    2700+    memset(&sc[2], 0, sizeof(sc[0]));
    2701+    memset(&sc[3], 0, sizeof(sc[0]));
    2702+    memset(&sc[4], 0, sizeof(sc[0]));
    


    apoelstra commented at 0:45 am on December 7, 2017:
    Can we use secp256k1_scalar_clear instead of memsetting?
  42. jonasnick force-pushed on Dec 7, 2017
  43. jonasnick force-pushed on Dec 7, 2017
  44. jonasnick commented at 3:49 pm on December 7, 2017: contributor
    Pushed fixes
  45. in src/ecmult_impl.h:869 in 396615f1b5 outdated
    907+static int secp256k1_ecmult_pippenger_batch_single(const secp256k1_ecmult_context *actx, secp256k1_scratch *scratch, const secp256k1_callback* error_callback, secp256k1_gej *r, const secp256k1_scalar *inp_g_sc, secp256k1_ecmult_multi_callback cb, void *cbdata, size_t n) {
    908+    return secp256k1_ecmult_pippenger_batch(actx, scratch, error_callback, r, inp_g_sc, cb, cbdata, n, 0);
    909+}
    910+
    911+#define MAX_BATCH_SIZE 1024
    912+typedef int (*secp256k1_ecmult_multi_func)(const secp256k1_ecmult_context*, secp256k1_scratch*, const secp256k1_callback*, secp256k1_gej*, const secp256k1_scalar*, secp256k1_ecmult_multi_callback cb, void*, size_t);
    


    apoelstra commented at 4:29 pm on December 7, 2017:
    I’d like this definition and the two wrapper functions to be moved to src/ecmult.h rather than the _impl file. @sipa do you have thoughts on that? These are used (only) in bench_ecmult.c and tests.c.
  46. apoelstra commented at 5:00 pm on December 7, 2017: contributor
    ACK https://github.com/bitcoin-core/secp256k1/pull/486/commits/5979a573829b166f05ea8e03747bad9225451919 .. please squash. The only nit is the position of the secp256k1_ecmult_multi_func typedef, which I don’t care about that much.
  47. add resizeable scratch space API
    Alignment support by Pieter Wuille.
    548de42ecf
  48. Generalize Strauss to support multiple points
    API by Andrew Poelstra.
    8c1c831bdb
  49. Add ecmult_multi tests dba5471b69
  50. Add bench_ecmult bc65aa794e
  51. Add pippenger_wnaf ecmult_multi 355a38f113
  52. Use scratch space dependent batching in ecmult_multi 36b22c9337
  53. Add flags for choosing algorithm in ecmult_multi benchmark a58f543f5a
  54. Save some additions per window in _pippenger_wnaf 4c950bbeaf
  55. Use more precise pippenger bucket windows d2f9c6b5dc
  56. jonasnick force-pushed on Dec 7, 2017
  57. jonasnick commented at 8:13 pm on December 7, 2017: contributor
    Squashed
  58. sipa merged this on Dec 8, 2017
  59. sipa closed this on Dec 8, 2017

  60. sipa referenced this in commit c77fc08597 on Dec 8, 2017
  61. str4d cross-referenced this on Dec 11, 2017 from issue Merge current secp256k1 subtree by str4d
  62. anargle cross-referenced this on May 29, 2018 from issue Kernel half-aggregation by anargle

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-30 09:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me