ecmult_multi: Replace scratch space with malloc, use abcd cost model #1789

pull fjahr wants to merge 2 commits into bitcoin-core:master from fjahr:2025-11-mem-multi-var changing 7 files +722 −829
  1. fjahr commented at 11:47 pm on December 15, 2025: contributor

    This is a draft of the discussed changes from an in-person meeting in October. It removes usage of scratch space in batch validation and replaces it with internal malloc usage. It also adds an ABCD cost model for algorithm selection.

    The API and internals follow the drafted spec from the meeting very closely: https://gist.github.com/fjahr/c2a009487dffe7a1fbf17ca1821976ca There are a few minor changes that should not alter the intended behavior. The test coverage is currently somewhat less than it was previously. I expect an adapted form of the test_ecmult_multi test should be added back, and there are a few TODOs left in the test code which I plan to address after a first round of conceptual feedback.

    The second commit demonstrates the calibration tooling I have been using, though it is not exactly what produced the results in the PR. I have been struggling with the calibration code and never seem to get a result that just works without manual tweaking. The calibration itself, as well as the code added for it, is therefore still a work in progress. I assume some version of the calibration code should be added to the repo, but I haven't thought much about the best place for it. Putting it into the benchmark and combining it with the Python script was simply a convenient way to experiment. I am very open to suggestions on how to change this.

  2. ecmult: Refactor ecmult algo selection 8760b3c148
  3. WIP: Tooling for algo selection abcd calibration 8b62524b0f
  4. fjahr force-pushed on Dec 16, 2025
  5. in src/modules/musig/keyagg_impl.h:152 in 8760b3c148
    163-#endif
    164-    secp256k1_musig_keyaggcoef_internal(sc, ctx->pks_hash, pt, &ctx->second_pk);
    165-    return 1;
    166-}
    167-
    168 int secp256k1_musig_pubkey_agg(const secp256k1_context* ctx, secp256k1_xonly_pubkey *agg_pk, secp256k1_musig_keyagg_cache *keyagg_cache, const secp256k1_pubkey * const* pubkeys, size_t n_pubkeys) {
    


    siv2r commented at 1:33 pm on December 26, 2025:
    It might be better to pass mem_limit as an input argument rather than defining it internally
  6. in src/modules/musig/keyagg_impl.h:206 in 8760b3c148
    207+        secp256k1_musig_keyaggcoef_internal(&scalars[i], ecmult_data.pks_hash, &points[i], &ecmult_data.second_pk);
    208+    }
    209+
    210+    /* TODO: Previously this just used TRIVIAL algorithm by not providing
    211+     * scratch space. To be discussed if this should be changed. */
    212+    mem_limit = n_pubkeys * (sizeof(secp256k1_ge) + sizeof(secp256k1_scalar)) * 4;
    


    siv2r commented at 1:51 pm on December 26, 2025:
    nit: Why the constant 4? I assume it covers internal overhead. I’m wondering whether we can define a ratio k such that mem_limit = n_points * (sizeof(secp256k1_ge) + sizeof(secp256k1_scalar)) * k yields a batch size of n_points when given to _ecmult_multi_batch_size()

    siv2r commented at 1:57 pm on December 26, 2025:
    Hmm, we could probably do mem_limit = max(STRAUSS_POINT_SIZE * n, PIPPENGER_POINT_SIZE * n + PIPPENGER_FIXED_SIZE). Just thinking out loud, not suggesting a change here.
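    The formula siv2r floats above can be sketched as a small helper. The size constants below are placeholders for illustration only; the real values would come from SECP256K1_STRAUSS_POINT_SIZE, SECP256K1_PIPPENGER_POINT_SIZE, and SECP256K1_PIPPENGER_FIXED_SIZE in ecmult_impl.h, and this is not code from the PR:

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Stand-in sizes; placeholders, not the library's actual values. */
    #define STRAUSS_POINT_SIZE   1536u
    #define PIPPENGER_POINT_SIZE  160u
    #define PIPPENGER_FIXED_SIZE 4096u

    /* A budget guaranteed to admit a batch of n points under either
     * algorithm's memory model, per the max() suggestion above. */
    static size_t mem_limit_for(size_t n) {
        size_t strauss = STRAUSS_POINT_SIZE * n;
        size_t pippenger = PIPPENGER_POINT_SIZE * n + PIPPENGER_FIXED_SIZE;
        return strauss > pippenger ? strauss : pippenger;
    }

    int main(void) {
        /* For small n the fixed Pippenger overhead dominates; for larger n
         * the bigger Strauss per-point footprint takes over. */
        assert(mem_limit_for(1) == 4256u);
        assert(mem_limit_for(100) == 153600u);
        printf("ok\n");
        return 0;
    }
    ```

    Compared to the hard-coded factor of 4, this ties the budget directly to the two algorithms' memory models rather than to a magic constant.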
  7. in src/tests_exhaustive.c:187 in 8760b3c148
    185@@ -186,17 +186,13 @@ typedef struct {
    186     secp256k1_ge pt[2];
    187 } ecmult_multi_data;
    


    siv2r commented at 2:02 pm on December 26, 2025:
    We can probably remove this data structure. In the old code, I only see it acting as the cbdata argument (initialized from local scalars and points arrays) passed to ecmult_multi_var.
  8. in src/bench_ecmult.c:339 in 8760b3c148
    332@@ -306,11 +333,11 @@ static void run_ecmult_multi_bench(bench_data* data, size_t count, int includes_
    333 int main(int argc, char **argv) {
    334     bench_data data;
    335     int i, p;
    336-    size_t scratch_size;
    337 
    338     int iters = get_iters(10000);
    339 
    340-    data.ecmult_multi = secp256k1_ecmult_multi_var;
    341+    data.forced_algo = -1;
    


    siv2r commented at 2:09 pm on December 26, 2025:
    nit: We could define #define BENCH_ALGO_AUTO (-1) to make this more readable. Then, the check on line 252 would be: if (data->forced_algo != BENCH_ALGO_AUTO)
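    The sentinel siv2r suggests could look like the sketch below; the struct and helper are illustrative stand-ins, not the PR's bench_ecmult.c code:

    ```c
    #include <assert.h>

    /* Hypothetical sentinel, per the suggestion above: -1 means "let the
     * library's ABCD model pick the algorithm". */
    #define BENCH_ALGO_AUTO (-1)

    typedef struct { int forced_algo; } bench_data;

    static int algo_is_forced(const bench_data *data) {
        return data->forced_algo != BENCH_ALGO_AUTO;
    }

    int main(void) {
        bench_data data;
        data.forced_algo = BENCH_ALGO_AUTO;
        assert(!algo_is_forced(&data));
        data.forced_algo = 2; /* e.g. force a specific Pippenger window */
        assert(algo_is_forced(&data));
        return 0;
    }
    ```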
  9. in src/ecmult_impl.h:586 in 8760b3c148
    636+
    637+/* Pippenger fixed overhead: buckets + state + 2 extra entries */
    638+#define SECP256K1_PIPPENGER_FIXED_SIZE(w) \
    639+    ((sizeof(secp256k1_gej) << (w)) \
    640+     + sizeof(struct secp256k1_pippenger_state) \
    641+     + 2 * SECP256K1_PIPPENGER_ENTRY_SIZE(w))
    


    siv2r commented at 2:15 pm on December 26, 2025:
    0#define SECP256K1_PIPPENGER_FIXED_SIZE(w) \
    1    ((sizeof(secp256k1_gej) << (w)) \
    2     + 2 * SECP256K1_PIPPENGER_ENTRY_SIZE(w))
    

    We can remove the pippenger_state since it’s a stack variable. IIUC, the old code allocated this variable on the heap, hence it included it in the size calculation.

  10. in src/ecmult_impl.h:732 in 8760b3c148
    858+            n * ECMULT_TABLE_SIZE(WINDOW_A) * sizeof(secp256k1_ge));
    859+        state.ps = (struct secp256k1_strauss_point_state *)checked_malloc(error_callback,
    860+            n * sizeof(struct secp256k1_strauss_point_state));
    861+        points_gej = (secp256k1_gej *)checked_malloc(error_callback,
    862+            n * sizeof(secp256k1_gej));
    863+        scalars_copy = (secp256k1_scalar *)checked_malloc(error_callback,
    


    siv2r commented at 2:22 pm on December 26, 2025:
    Is there a specific reason for copying the scalars here? Since they are managed by the upper layer (like batch verification or MuSig), could we just pass the pointer to _strauss_wnaf directly?
  11. in src/ecmult_impl.h:750 in 8760b3c148
    876+        secp256k1_ecmult_strauss_wnaf(&state, r, n, points_gej, scalars_copy, scalar_g);
    877+    } else {
    878+        secp256k1_gej tmpj;
    879+        secp256k1_gej_set_infinity(&tmpj);
    880+        secp256k1_ecmult(r, &tmpj, &secp256k1_scalar_zero, scalar_g);
    881+    }
    


    siv2r commented at 2:25 pm on December 26, 2025:
    We can probably remove this else block. The n=0 case is already handled by the call to ecmult_strauss_wnaf in the preceding if block. Plus, secp256k1_ecmult calls strauss_wnaf internally, so this seems redundant.
  12. in src/ecmult_impl.h:598 in 8760b3c148
    648+ * Index 2-13 = PIPPENGER with window sizes 1-12
    649+ */
    650+static const struct secp256k1_ecmult_multi_abcd secp256k1_ecmult_multi_abcds[SECP256K1_ECMULT_MULTI_NUM_ALGOS] = {
    651+/*  A (per-point bytes)                     B (fixed bytes)                        C        D           */
    652+    {0,                                     0,                                     1000,    0      },
    653+    {SECP256K1_STRAUSS_POINT_SIZE,          0,                                     100,     0      },
    


    siv2r commented at 9:00 am on December 27, 2025:
    D shouldn’t be 0 here right?
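    As I read the PR, the ABCD model predicts memory use as A*n + B bytes and runtime cost as C*n + D, and selection picks the cheapest algorithm whose memory model fits the limit. A self-contained sketch with illustrative (not calibrated) values:

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative ABCD entries: memory = a*n + b, cost = c*n + d.
     * Values are made up for the example, not the PR's calibrated table. */
    struct abcd { size_t a, b; uint64_t c, d; };

    static const struct abcd algos[] = {
        {0,    0,    1000, 0},    /* TRIVIAL: no allocation, highest per-point cost */
        {1536, 0,    100,  150},  /* STRAUSS-like */
        {160,  4096, 70,   3000}, /* PIPPENGER-like: cheap per point, big fixed cost */
    };

    /* Pick the cheapest algorithm whose memory model fits within mem_limit. */
    static size_t select_algo(size_t n, size_t mem_limit) {
        size_t best = 0; /* TRIVIAL always fits */
        uint64_t best_cost = algos[0].c * n + algos[0].d;
        size_t i;
        for (i = 1; i < sizeof(algos) / sizeof(algos[0]); i++) {
            uint64_t cost = algos[i].c * n + algos[i].d;
            if (algos[i].a * n + algos[i].b <= mem_limit && cost < best_cost) {
                best = i;
                best_cost = cost;
            }
        }
        return best;
    }

    int main(void) {
        assert(select_algo(10, 0) == 0);          /* no memory: TRIVIAL */
        assert(select_algo(10, 1u << 20) == 1);   /* small batch: Strauss */
        assert(select_algo(1000, 1u << 20) == 2); /* large batch: Pippenger */
        return 0;
    }
    ```

    Under this reading, a nonzero D for Strauss matters because it shifts the crossover point against the TRIVIAL and Pippenger entries for small batches.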
  13. in src/bench_ecmult.c:76 in 8b62524b0f
    71+ * Measures the performance of each algorithm at various batch sizes and
    72+ * outputs the results. Use tools/ecmult_multi_calib.py to calculate
    73+ * optimal C and D values from the output.
    74+ */
    75+static void run_ecmult_multi_calib(bench_data* data) {
    76+    static const size_t batch_sizes[] = {10, 20, 30, 50, 75, 100, 150, 200, 300, 500, 750, 1000, 1500, 2000, 3000, 5000, 7500, 10000, 15000, 20000, 30000};
    


    siv2r commented at 9:17 am on December 27, 2025:

    I think it would be nice to have more batch sizes in the crossover region of Strauss and Pippenger (~80-150 points). So, something like this:

     0    static const size_t batch_sizes[] = {
     1        /* Small (Strauss region) */
     2        5, 10, 15, 20, 30, 50, 70,
     3        /* Crossover region */
     4        85, 88, 90, 100, 120, 150, 175,
     5        /* Medium (Pippenger small windows, w=6..8) */
     6        200, 300, 500, 750, 1000, 1200,
     7        /* Large (Pippenger large windows, w=9..12) */
     8        1500, 2000, 3000, 5000, 7500,
     9        10000, 15000, 20000, 30000
    10    };
    

    siv2r commented at 9:26 am on December 27, 2025:

    We should probably cap the batch size for Strauss calibration (7,500 or less?). At higher sizes, the benchmark results curve and become outliers, which will skew the linear regression model. Since ecmult_multi_select won’t choose Strauss for large batches anyway, we can avoid these sizes. I’ve attached a visualization below.

  14. siv2r commented at 9:38 am on December 27, 2025: contributor

    I ran the calibration tool, and I’m getting the following results. They are very close to the existing results, up until PIPPENGER_8, after which the C value increases for me (this shouldn’t happen, right?). I’ll run this a few more times.

     0static const struct secp256k1_ecmult_multi_abcd secp256k1_ecmult_multi_abcds[SECP256K1_ECMULT_MULTI_NUM_ALGOS] = {
     1    {0,                                     0,                                     1000,  0     },
     2    {SECP256K1_STRAUSS_POINT_SIZE,          0,                                     113,   143  },
     3    {SECP256K1_PIPPENGER_POINT_SIZE(1),  SECP256K1_PIPPENGER_FIXED_SIZE(1),  199,   303  },
     4    {SECP256K1_PIPPENGER_POINT_SIZE(2),  SECP256K1_PIPPENGER_FIXED_SIZE(2),  152,   460  },
     5    {SECP256K1_PIPPENGER_POINT_SIZE(3),  SECP256K1_PIPPENGER_FIXED_SIZE(3),  117,   782  },
     6    {SECP256K1_PIPPENGER_POINT_SIZE(4),  SECP256K1_PIPPENGER_FIXED_SIZE(4),  100,   1158 },
     7    {SECP256K1_PIPPENGER_POINT_SIZE(5),  SECP256K1_PIPPENGER_FIXED_SIZE(5),  86,    1837 },
     8    {SECP256K1_PIPPENGER_POINT_SIZE(6),  SECP256K1_PIPPENGER_FIXED_SIZE(6),  77,    3013 },
     9    {SECP256K1_PIPPENGER_POINT_SIZE(7),  SECP256K1_PIPPENGER_FIXED_SIZE(7),  72,    4845 },
    10    {SECP256K1_PIPPENGER_POINT_SIZE(8),  SECP256K1_PIPPENGER_FIXED_SIZE(8),  69,    8775 },
    11    {SECP256K1_PIPPENGER_POINT_SIZE(9),  SECP256K1_PIPPENGER_FIXED_SIZE(9),  73,    14373},
    12    {SECP256K1_PIPPENGER_POINT_SIZE(10), SECP256K1_PIPPENGER_FIXED_SIZE(10), 78,    26442},
    13    {SECP256K1_PIPPENGER_POINT_SIZE(11), SECP256K1_PIPPENGER_FIXED_SIZE(11), 80,    48783},
    14    {SECP256K1_PIPPENGER_POINT_SIZE(12), SECP256K1_PIPPENGER_FIXED_SIZE(12), 106,   88289},
    15};
    

    I’ve visualized the benchmarks for all algorithms (see https://github.com/siv2r/secp256k1/commit/311938612268866bd3dd0cf6c1d006e418518b1d for the diagrams). I’m planning to do a deeper dive into the regression model fit to ensure they align with these results. Will share any useful findings here

  15. siv2r commented at 2:11 pm on December 27, 2025: contributor

    I did some analysis on the linear regression model for the CD values (see https://github.com/siv2r/secp256k1/commit/b5985a60454e68c7f88bfe7a9d41b89a47f2b530). I basically ran ./bench_ecmult calib twice. Used the first run to compute the linear regression model, and the second run to test it against new benchmark values.

     0static const struct secp256k1_ecmult_multi_abcd secp256k1_ecmult_multi_abcds[SECP256K1_ECMULT_MULTI_NUM_ALGOS] = {
     1    {0,                                     0,                                     1000,  0     },
     2    {SECP256K1_STRAUSS_POINT_SIZE,          0,                                     112,   173  },
     3    {SECP256K1_PIPPENGER_POINT_SIZE(1),  SECP256K1_PIPPENGER_FIXED_SIZE(1),  197,   285  },
     4    {SECP256K1_PIPPENGER_POINT_SIZE(2),  SECP256K1_PIPPENGER_FIXED_SIZE(2),  152,   480  },
     5    {SECP256K1_PIPPENGER_POINT_SIZE(3),  SECP256K1_PIPPENGER_FIXED_SIZE(3),  117,   767  },
     6    {SECP256K1_PIPPENGER_POINT_SIZE(4),  SECP256K1_PIPPENGER_FIXED_SIZE(4),  100,   1167 },
     7    {SECP256K1_PIPPENGER_POINT_SIZE(5),  SECP256K1_PIPPENGER_FIXED_SIZE(5),  86,    1887 },
     8    {SECP256K1_PIPPENGER_POINT_SIZE(6),  SECP256K1_PIPPENGER_FIXED_SIZE(6),  78,    3023 },
     9    {SECP256K1_PIPPENGER_POINT_SIZE(7),  SECP256K1_PIPPENGER_FIXED_SIZE(7),  73,    4906 },
    10    {SECP256K1_PIPPENGER_POINT_SIZE(8),  SECP256K1_PIPPENGER_FIXED_SIZE(8),  70,    8889 },
    11    {SECP256K1_PIPPENGER_POINT_SIZE(9),  SECP256K1_PIPPENGER_FIXED_SIZE(9),  74,    14544},
    12    {SECP256K1_PIPPENGER_POINT_SIZE(10), SECP256K1_PIPPENGER_FIXED_SIZE(10), 79,    26764},
    13    {SECP256K1_PIPPENGER_POINT_SIZE(11), SECP256K1_PIPPENGER_FIXED_SIZE(11), 84,    49179},
    14    {SECP256K1_PIPPENGER_POINT_SIZE(12), SECP256K1_PIPPENGER_FIXED_SIZE(12), 106,   89518},
    15};
    

    Metric Definitions

    Metric    | Meaning                              | Good Value for Benchmarking
    R²        | How well the linear model fits (0-1) | > 0.95 (strong linear relationship)
    Std Error | Uncertainty in slope estimate        | Low relative to slope
    p-value   | Probability slope = 0 by chance      | < 0.05 (statistically significant)
    Slope     | Time increase per additional n (μs)  | Positive, algorithm-dependent
    Intercept | Fixed overhead time (μs)             | Algorithm-dependent

    Per-Algorithm Metrics

    Algorithm    | R²       | Std Error | p-value  | Slope  | Intercept
    STRAUSS      | 0.983536 | 0.1660    | 1.29e-25 | 6.2956 | -1917.23
    PIPPENGER_1  | 0.999995 | 0.0039    | 6.29e-73 | 8.5860 | -133.20
    PIPPENGER_2  | 0.999992 | 0.0036    | 9.98e-71 | 6.7066 | -115.04
    PIPPENGER_3  | 0.999993 | 0.0026    | 8.15e-71 | 4.9746 | -34.11
    PIPPENGER_4  | 0.999996 | 0.0015    | 5.34e-75 | 4.1772 | -13.27
    PIPPENGER_5  | 0.999995 | 0.0015    | 1.16e-73 | 3.5473 | 56.56
    PIPPENGER_6  | 0.999991 | 0.0019    | 2.30e-69 | 3.1035 | 105.03
    PIPPENGER_7  | 0.999994 | 0.0013    | 4.69e-72 | 2.7083 | 219.62
    PIPPENGER_8  | 0.999989 | 0.0016    | 2.64e-68 | 2.4879 | 414.58
    PIPPENGER_9  | 0.999977 | 0.0021    | 3.76e-64 | 2.2872 | 719.46
    PIPPENGER_10 | 0.999953 | 0.0028    | 5.02e-60 | 2.1705 | 1287.92
    PIPPENGER_11 | 0.999847 | 0.0049    | 4.73e-53 | 2.0592 | 2321.53
    PIPPENGER_12 | 0.999583 | 0.0075    | 3.62e-47 | 1.9331 | 4153.43

    The linear fit ($R^2$) and standard error for STRAUSS are relatively worse. We can improve this by using smaller batch sizes when calibrating Strauss. I also see negative intercepts (i.e., D values) for PIPPENGER_1..4, which should be positive. I think this could likewise be addressed by capping these algorithms to medium batch sizes during calibration. Otherwise, the calibrated values are excellent!


This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-01-07 22:15 UTC
