[WIP, Please benchmark] Use Co-Z in pippenger #1782

peterdettman wants to merge 1 commit into bitcoin-core:master from peterdettman:proto-coz-pippenger, changing 4 files (+455 −10)
  1. peterdettman commented at 11:56 am on December 9, 2025: contributor

    Modifies the pippenger bucket-summing loop. Very early code, just here to get benchmarks of the basic concept.
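For readers unfamiliar with the loop in question, here is a minimal sketch of the running-sum trick used to combine buckets. It is a simplification under stated assumptions: plain integers stand in for curve points, and bucket `i` is weighted by `i + 1`, whereas the library's wNAF variant weights buckets differently. The name `bucket_sum` is hypothetical and does not appear in the PR.

```python
def bucket_sum(buckets):
    """Return sum((i + 1) * buckets[i]) using additions only.

    Integers stand in for group elements (an assumption made for
    illustration); with curve points, each '+' is a point addition.
    """
    running_sum = 0  # sum of all buckets visited so far
    r = 0            # accumulator for the weighted total
    for b in reversed(buckets):
        running_sum += b  # general addition of an arbitrary bucket
        r += running_sum  # the addition that a Co-Z scheme can make cheaper
    return r


print(bucket_sum([5, 7, 11]))  # 1*5 + 2*7 + 3*11
```

Each iteration performs two group additions, which is why speeding up the second one affects the whole summing phase.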

    In short, we ensure running_sum always has the same z coordinate as the accumulator (“r”), so that adding running_sum to r can be done in 5M + 2S (with free update of running_sum for the new Co-Z coordinate); this is often called a ZADDU (or DBLU when it devolves to doubling). On the downside, when adding each bucket to the running_sum, we now need to also update r to keep them Co-Z; cost 3M + 1S. So a typical iteration of the summing loop costs 20M + 7S instead of 24M + 8S.
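To make the ZADDU step concrete, here is a rough sketch of a Co-Z addition over the secp256k1 field using Python integers. It follows the commonly cited Meloni-style Co-Z formulas and is not taken from the PR; the helper names (`affine_add`, `to_coz`, `to_affine`, `zaddu`) are hypothetical, and the special cases mentioned above (equal points, infinity) are deliberately ignored.

```python
P_FIELD = 2**256 - 2**32 - 977  # secp256k1 field prime
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def affine_add(p1, p2):
    """Reference affine add/double on y^2 = x^3 + 7 (no infinity handling)."""
    (x1, y1), (x2, y2) = p1, p2
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1 % P_FIELD, -1, P_FIELD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD)
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return x3, (lam * (x1 - x3) - y1) % P_FIELD


def to_coz(pt, z):
    """Lift an affine point to Jacobian (X, Y) for a chosen shared Z."""
    x, y = pt
    return x * z * z % P_FIELD, y * z * z * z % P_FIELD


def to_affine(pt, z):
    """Convert Jacobian (X, Y) with coordinate Z back to affine."""
    zi = pow(z, -1, P_FIELD)
    return pt[0] * zi * zi % P_FIELD, pt[1] * zi * zi * zi % P_FIELD


def zaddu(running_sum, r, z):
    """Co-Z addition: both inputs share z. Cost: 5M + 2S.

    Returns (new_r, updated_running_sum, z3), where both outputs share z3.
    """
    (x1, y1), (x2, y2) = running_sum, r
    dx = (x1 - x2) % P_FIELD
    dy = (y1 - y2) % P_FIELD
    c = dx * dx % P_FIELD                  # 1S
    w1 = x1 * c % P_FIELD                  # 1M
    w2 = x2 * c % P_FIELD                  # 2M
    d = dy * dy % P_FIELD                  # 2S
    a1 = y1 * (w1 - w2) % P_FIELD          # 3M
    x3 = (d - w1 - w2) % P_FIELD
    y3 = (dy * (w1 - x3) - a1) % P_FIELD   # 4M
    z3 = z * dx % P_FIELD                  # 5M
    return (x3, y3), (w1, a1), z3


G2 = affine_add(G, G)
Z = 12345  # arbitrary shared Z for the demonstration
new_r, new_running_sum, Z3 = zaddu(to_coz(G2, Z), to_coz(G, Z), Z)
print(to_affine(new_r, Z3) == affine_add(G2, G))  # sum matches the reference
print(to_affine(new_running_sum, Z3) == G2)       # same point, rescaled to Z3
```

On the cost figures: if a general Jacobian addition is taken as 12M + 4S (consistent with 24M + 8S for the two additions per iteration), the modified iteration comes to (12M + 4S) + (3M + 1S) + (5M + 2S) = 20M + 7S.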

    I measure a few % overall improvement for pippenger_wnaf, depending on the number of points, although I would appreciate others sanity-checking that. Unfortunately it involves rather a lot of new code, not helped by the fact that there are special cases everywhere.

  2. Prototype using CoZ arithmetic in pippenger_wnaf d95e48b089
  3. peterdettman marked this as a draft on Dec 9, 2025
  4. john-moffett commented at 6:52 pm on December 9, 2025: contributor

    Apple M2 Max, default build. Master vs d95e48b089778de9ca992cc5dfbabd6674d8c2d4

    secp256k1 configure summary
    ===========================
    Build artifacts:
      library type ........................ Shared
    Optional modules:
      ECDH ................................ ON
      ECDSA pubkey recovery ............... OFF
      extrakeys ........................... ON
      schnorrsig .......................... ON
      musig ............................... ON
      ElligatorSwift ...................... ON
    Parameters:
      ecmult window size .................. 15
      ecmult gen table size ............... 86 KiB
    Optional features:
      assembly ............................ OFF
      external callbacks .................. OFF
    Optional binaries:
      benchmark ........................... ON
      noverify_tests ...................... ON
      tests ............................... ON
      exhaustive tests .................... ON
      ctime_tests ......................... OFF
      examples ............................ OFF

    Cross compiling ....................... FALSE
    API visibility attributes ............. ON
    Valgrind .............................. OFF
    Preprocessor defined macros ........... ECMULT_WINDOW_SIZE=15 COMB_BLOCKS=43 COMB_TEETH=6
    C compiler ............................ AppleClang 17.0.0.17000013, /usr/bin/cc
    CFLAGS ................................
    Compile options ....................... -Wall -pedantic -Wcast-align -Wconditional-uninitialized -Wextra -Wnested-externs -Wno-long-long -Wno-overlength-strings -Wno-unused-function -Wreserved-identifier -Wshadow -Wstrict-prototypes -Wundef
    Build type:
     - CMAKE_BUILD_TYPE ................... RelWithDebInfo
     - CFLAGS ............................. -O2 -g
     - LDFLAGS for executables ............
     - LDFLAGS for shared libraries .......
    
    | Benchmark | Before Avg (us) | After Avg (us) | Δ Avg (us) | Δ Avg % |
    |---|---|---|---|---|
    | ecmult_multi_79p_g | 7.81 | 7.85 | +0.04 | +0.5% |
    | ecmult_multi_95p_g | 7.47 | 7.32 | -0.15 | -2.0% |
    | ecmult_multi_111p_g | 7.24 | 7.03 | -0.21 | -2.9% |
    | ecmult_multi_127p_g | 7.15 | 6.87 | -0.28 | -3.9% |
    | ecmult_multi_159p_g | 6.85 | 6.49 | -0.36 | -5.3% |
    | ecmult_multi_191p_g | 6.57 | 6.25 | -0.32 | -4.9% |
    | ecmult_multi_223p_g | 6.28 | 6.08 | -0.20 | -3.2% |
    | ecmult_multi_255p_g | 6.23 | 5.99 | -0.24 | -3.9% |
    | ecmult_multi_319p_g | 5.86 | 5.71 | -0.15 | -2.6% |
    | ecmult_multi_383p_g | 5.71 | 5.59 | -0.12 | -2.1% |
    | ecmult_multi_447p_g | 5.48 | 5.41 | -0.07 | -1.3% |
    | ecmult_multi_511p_g | 5.35 | 5.28 | -0.07 | -1.3% |
    | ecmult_multi_639p_g | 5.33 | 5.20 | -0.13 | -2.4% |
    | ecmult_multi_767p_g | 5.04 | 5.06 | +0.02 | +0.4% |
    | ecmult_multi_895p_g | 4.96 | 4.88 | -0.08 | -1.6% |
    | ecmult_multi_1023p_g | 4.88 | 4.89 | +0.01 | +0.2% |
    | ecmult_multi_1279p_g | 4.76 | 4.67 | -0.09 | -1.9% |
    | ecmult_multi_1535p_g | 4.59 | 4.51 | -0.08 | -1.7% |
    | ecmult_multi_1791p_g | 4.46 | 4.36 | -0.10 | -2.2% |
    | ecmult_multi_2047p_g | 4.36 | 4.29 | -0.07 | -1.6% |
    | ecmult_multi_2559p_g | 4.24 | 4.18 | -0.06 | -1.4% |
    | ecmult_multi_3071p_g | 4.14 | 4.06 | -0.08 | -1.9% |
    | ecmult_multi_3583p_g | 4.19 | 4.04 | -0.15 | -3.6% |
    | ecmult_multi_4095p_g | 4.04 | 3.95 | -0.09 | -2.2% |
    | ecmult_multi_5119p_g | 3.93 | 3.86 | -0.07 | -1.8% |
    | ecmult_multi_6143p_g | 3.82 | 3.78 | -0.04 | -1.0% |
    | ecmult_multi_7167p_g | 3.79 | 3.76 | -0.03 | -0.8% |
    | ecmult_multi_8191p_g | 3.75 | 3.65 | -0.10 | -2.7% |
    | ecmult_multi_10239p_g | 3.63 | 3.57 | -0.06 | -1.7% |
    | ecmult_multi_12287p_g | 3.57 | 3.51 | -0.06 | -1.7% |
    | ecmult_multi_14335p_g | 3.50 | 3.48 | -0.02 | -0.6% |
    | ecmult_multi_16383p_g | 3.53 | 3.38 | -0.15 | -4.2% |
    | ecmult_multi_20479p_g | 3.42 | 3.31 | -0.11 | -3.2% |
    | ecmult_multi_24575p_g | 3.29 | 3.30 | +0.01 | +0.3% |
    | ecmult_multi_28671p_g | 3.22 | 3.14 | -0.08 | -2.5% |
    | ecmult_multi_32767p_g | 3.19 | 3.10 | -0.09 | -2.8% |
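The Δ columns appear to follow the usual definitions, Δ = after − before and Δ% = 100 · Δ / before. A small sketch that recomputes them for three rows copied from the results above (the helper name `delta` is hypothetical, and this is not the tooling used to produce the table):

```python
# Three (benchmark, before_us, after_us) rows copied from the results above.
rows = [
    ("ecmult_multi_79p_g", 7.81, 7.85),
    ("ecmult_multi_95p_g", 7.47, 7.32),
    ("ecmult_multi_159p_g", 6.85, 6.49),
]


def delta(before, after):
    """Return (absolute change, percent change) formatted like the table."""
    diff = after - before
    return f"{diff:+.2f}", f"{100 * diff / before:+.1f}%"


for name, before, after in rows:
    print(name, *delta(before, after))
```

Negative values mean the PR branch is faster, since the figures are average times in microseconds.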
  5. siv2r commented at 6:48 pm on December 15, 2025: contributor

    I benchmarked this pull request on a MacBook Pro (M4 Pro, ARM64, 12 cores: 8P + 4E) running macOS 15.6.1, plugged in with no background apps. Based on my results, this pull request performs better than master, with an average speedup of ~2%.

    Anyone who wants to reproduce the benchmarks can use this Python script. It benchmarks both this pull request and the master branch, and outputs an .xlsx file comparing their performance.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-01-07 22:15 UTC
