bench: replace wall-clock timer with per-process CPU timer #1732

pull Raimo33 wants to merge 4 commits into bitcoin-core:master from Raimo33:benchmark-precise changing 9 files +111 −28
  1. Raimo33 commented at 2:58 pm on September 2, 2025: none

    Goal

    This PR refactors the benchmarking functions as per #1701, in order to make benchmarks more deterministic and less influenced by the environvment.

    This is achieved by replacing Wall-Clock Timer with Per-Process CPU Timer when possible.

  2. Raimo33 marked this as a draft on Sep 2, 2025
  3. real-or-random commented at 3:05 pm on September 2, 2025: contributor

    Just some quick comments:

    [x] remove the number of runs (count) in favor of simpler cleaner approach with just number of iterations (iter).

    I think there’s a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.

    [x] remove min and max statistics in favor of simpler approach with just avg.

    I think min and max are useful. For constant-time code, you can also compare min. And max gives you an idea if there were fluctuations or not.

    [x] remove needless fixed point conversion in favor of simpler floating point divisions.

    Well, okay, that has a history; see #689. It’s debatable if it makes sense to avoid floating point math, but as long as it doesn’t get in your way here, it’s a cool thing to keep it. :D

  4. real-or-random commented at 3:06 pm on September 2, 2025: contributor
    It will be useful to split your changes into meaningful and separate commits, see https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#committing-patches.
  5. Raimo33 commented at 3:09 pm on September 2, 2025: none
    I think min and max just complicate things. let me explain: first of all, as it is right now, they don’t even measure the min and max, they just measure the min/max of the averages of all runs. aka not the absolute. Furthermore, in order to have them, one would need to run all the iterations 10 times more. benchmarks are already slow, adding this min/max slows them by 10 fold. imho it’s completely unnecessary. @real-or-random
  6. sipa commented at 3:15 pm on September 2, 2025: contributor

    If we’re going to rework this, I’d suggest using the stabilized quartiles approach from https://cr.yp.to/papers/rsrst-20250727.pdf:

    • StQ1: the average of all samples between 1st and 3rd octile
    • StQ2: the average of all samples between 3rd and 5th octile
    • StQ3: the average of all samples between 5th and 7th octile
  7. Raimo33 force-pushed on Sep 2, 2025
  8. Raimo33 commented at 3:23 pm on September 2, 2025: none

    I think there’s a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.

    right now all benchmarks are run with count=10 and fixed iters (apart from ecmult_multi which adjusts the number of iters, not count).

    therefore count is only useful to extrapolate min and max

  9. Raimo33 commented at 3:33 pm on September 2, 2025: none

    Well, okay, that has a history; see #689. It’s debatable if it makes sense to avoid floating point math, but as long as it doesn’t get in your way here, it’s a cool thing to keep it. :D

    I disagree with #689. It overcomplicate things for the sake of not having floating point math. those divisions aren’t even in the hot path, they’re outside the benchmarks.

  10. sipa commented at 3:42 pm on September 2, 2025: contributor
    Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.
  11. Raimo33 commented at 3:57 pm on September 2, 2025: none

    Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.

    what is the usefulness of measuring min/max when we are removing OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is.

  12. Raimo33 force-pushed on Sep 2, 2025
  13. Raimo33 force-pushed on Sep 2, 2025
  14. Raimo33 force-pushed on Sep 2, 2025
  15. Raimo33 renamed this:
    [WIP] Refactor benchmark
    [WIP] refactor: remove system interference from benchmarks
    on Sep 2, 2025
  16. Raimo33 force-pushed on Sep 2, 2025
  17. Raimo33 force-pushed on Sep 2, 2025
  18. Raimo33 commented at 5:32 pm on September 2, 2025: none
    by the way, gettimeofday() is officially discouraged since 2008 in favor of clock_gettime(). The POSIX standard marks it as obsolescent but still provides the API for backward compatibility.
  19. Raimo33 force-pushed on Sep 2, 2025
  20. Raimo33 force-pushed on Sep 2, 2025
  21. Raimo33 commented at 7:45 pm on September 2, 2025: none

    even though the manual says that CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core, modern CPUs have largely addressed this issue. So I think it is fair to compile CLOCK_PROCESS_CPUTIME_ID even though we don’t have the guarantee that the user has pinned the benchmarking process to a core. The worst case scenario is a unreliable benchmark, which the current repo has anyways.

    I added a line in the README.md for best practices to run the benchmarks.

    I also tried adding a function to pin the process to a core directly in C, but there’s no standard POSIX compliant way to do so. There is pthread_setaffinity_np() on linux, where ’np’ stands for ’not portable'

  22. Raimo33 force-pushed on Sep 2, 2025
  23. Raimo33 force-pushed on Sep 2, 2025
  24. Raimo33 force-pushed on Sep 2, 2025
  25. Raimo33 force-pushed on Sep 2, 2025
  26. real-or-random commented at 8:47 pm on September 2, 2025: contributor

    Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.

    what is the usefulness of measuring min/max when we are removing OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is.

    The point is exactly having a simple way of verifying that there’s indeed no interference. Getting rid of sources of variance is hard to get right, and it’s impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off.

    I like the stabilized quartiles idea.

  27. Raimo33 commented at 8:52 pm on September 2, 2025: none

    I like the stabilized quartiles idea.

    tbh it scares me a bit, will see what I can do. Maybe in a future PR.

  28. Raimo33 force-pushed on Sep 2, 2025
  29. Raimo33 marked this as ready for review on Sep 2, 2025
  30. in CMakeLists.txt:34 in 66745f741f outdated
    29@@ -30,6 +30,8 @@ set(${PROJECT_NAME}_LIB_VERSION_AGE 0)
    30 #=============================
    31 set(CMAKE_C_STANDARD 90)
    32 set(CMAKE_C_EXTENSIONS OFF)
    33+# Enable POSIX features while maintaining ISO C compliance
    34+add_compile_definitions(_POSIX_C_SOURCE=200112L) #needed for `clock_gettime()` in bench.c
    


    real-or-random commented at 6:22 am on September 3, 2025:
    we’ll only need this in the benchmarks. Can you define it at the top of bench.c?

    Raimo33 commented at 6:33 pm on September 4, 2025:
    that was my initial setup. but it doesn’t work because it has to be before any include. I added it to the src/CMakeLists.txt targeting only the benchmarks.
  31. in README.md:162 in 66745f741f outdated
    158@@ -159,6 +159,8 @@ Benchmark
    159 ------------
    160 If configured with `--enable-benchmark` (which is the default), binaries for benchmarking the libsecp256k1 functions will be present in the root directory after the build.
    161 
    162+For an accurate benchmark, it is strongly recommended to pin the process to a dedicated CPU core and to disable CPU frequency scaling.
    



    l0rinc commented at 8:17 pm on September 6, 2025:
    This will be more reliable at best, but not more “accurate”, since it doesn’t reflect actual usage.

    Raimo33 commented at 8:47 pm on September 6, 2025:
    agreed. changed.
  32. in src/bench.h:14 in 66745f741f outdated
     6@@ -7,30 +7,63 @@
     7 #ifndef SECP256K1_BENCH_H
     8 #define SECP256K1_BENCH_H
     9 
    10+#if defined(_WIN32)
    11+# include <windows.h>
    12+#else
    13+# include <time.h>
    14+# include <sys/time.h>
    


    real-or-random commented at 6:38 am on September 3, 2025:
    Do we need both?

    Raimo33 commented at 8:11 am on September 3, 2025:
    yes, one for gettimeofday() and one for clock_gettime(). they don’t conflict.

    real-or-random commented at 6:56 pm on September 3, 2025:
    I think then it could be slightly more portable to include only what we’ll use.

    Raimo33 commented at 9:36 pm on September 3, 2025:

    just tried. unfortunately the macros to detect wether gettimeofday() or clock_gettime() is needed are defined in the time.h header. so the only viable solution is:

    0#include <time.h>
    1#if !defined(CLOCK_PROCESS_CPUTIME_ID) && !defined(CLOCK_MONOTONIC) && !defined(CLOCK_REALTIME)
    2#  include <sys/time.h>
    3#endif
    

    which every LLM tells me is ‘inherently unreliable’ and can lead to portability issues.

    ‘Including sys/time.h on modern POSIX systems that already have time.h and clock_gettime doesn’t hurt, and ensures gettimeofday is available if needed.’

    ‘Trying to skip it adds complexity with almost no benefit.’


    real-or-random commented at 8:39 am on September 8, 2025:
    Okay, but if we include time.h anyway, then let’s stick to clock_gettime on POSIX? https://pubs.opengroup.org/onlinepubs/9799919799/functions/clock_getres.html says that the proper macro to check is _POSIX_CPUTIME and that CLOCK_MONOTONIC must be supported, so there’s no need to check for it.

    Raimo33 commented at 9:37 am on September 8, 2025:
    You’re right, the latest specification makes CLOCK_MONOTONIC support mandatory, this makes it way simpler. no need to include <sys/time.h> anymore.

    Raimo33 commented at 10:22 am on September 8, 2025:
    This would be possible if every compiler supported the Std 1003.1-2024 version of POSIX, which is unrealistic. Until then, for backward compatibility, I kept CLOCK_REALTIME as the last fallback.

    Raimo33 commented at 10:26 am on September 8, 2025:
    and btw, the _POSIX_CPUTIME and _POSIX_MONOTONIC flags are to be used when you want to ensure that the library has mandatory support for those features. In my case for example, gcc doesnt expose _POSIX_MONOTONIC but still supports it. Therefore it’s better to check CLOCK_PROCESS_CPUTIME_ID and CLOCK_MONOTONIC directly.
  33. in src/bench.h:42 in 66745f741f outdated
    49+
    50+    return (int64_t)((k.QuadPart + u.QuadPart) / 10);
    51+#else /* POSIX */
    52+
    53+# if defined(CLOCK_PROCESS_CPUTIME_ID)
    54+    /* in theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core. see https://linux.die.net/man/3/clock_gettime */
    


    real-or-random commented at 6:40 am on September 3, 2025:
    and in practice?

    Raimo33 commented at 8:12 am on September 3, 2025:

    real-or-random commented at 6:57 pm on September 3, 2025:
    Okay, sorry. I was aware of the comment in the issue, but a future reader of the code comment will not. So what I’m trying to say is that the code comment can be improved.

    Raimo33 commented at 8:46 pm on September 3, 2025:
    Added comment and link to chatgpt deepsearch. might not be very professional but it includes the full analysis and lots of citations.
  34. in src/bench.h:64 in 66745f741f outdated
    75+    /* WARN: timer is influenced by environvment (OS, scheduling, interrupts...) */
    76     struct timeval tv;
    77-    gettimeofday(&tv, NULL);
    78-    return (int64_t)tv.tv_usec + (int64_t)tv.tv_sec * 1000000LL;
    79+    gettimeofday((struct timeval*)&tv, NULL);
    80+    return (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
    


    real-or-random commented at 6:42 am on September 3, 2025:

    How is this diff an improvement?

    • The pointer cast is a no-op.
    • LL is not needed.

    Raimo33 commented at 8:49 am on September 3, 2025:
    you’re right. removed.
  35. in src/bench.h:39 in 66745f741f outdated
    69+    struct timespec ts;
    70+    if (clock_gettime(CLOCK_REALTIME, &ts) != 0) {
    71+        return 0;
    72+    }
    73+    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
    74+# else
    


    real-or-random commented at 6:43 am on September 3, 2025:
    This could be DRYed more. Maybe it’s a good idea to print to the user which clock method is used.

    Raimo33 commented at 8:16 am on September 3, 2025:

    yes, I tried everything you mentioned.

    removing redundancy here can be done by just using the ifdefs for the constant, but you’d need a different path for the last fallback. the last case uses tv, not ts.

    I tried issuing warnings with #warning to tell the user about the precision of the clock but it’s not compatible with c90.


    Raimo33 commented at 8:56 am on September 3, 2025:

    Here’s the DRY’d version:

     0# if defined(CLOCK_PROCESS_CPUTIME_ID)
     1    /* in theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core. see https://linux.die.net/man/3/clock_gettime */
     2const clockid_t clk_id = CLOCK_PROCESS_CPUTIME_ID;
     3# elif defined(CLOCK_MONOTONIC)
     4    /* WARN: timer is influenced by environvment (OS, scheduling, interrupts...) */
     5const clockid_t clk_id = CLOCK_MONOTONIC;
     6# elif defined(CLOCK_REALTIME)
     7    /* WARN: timer is influenced by environvment (OS, scheduling, interrupts...) */
     8const clockid_t clk_id = CLOCK_REALTIME;
     9# else
    10# define USE_GETTIMEOFDAY
    11# endif
    12
    13# ifndef USE_GETTIMEOFDAY
    14struct timespec ts;
    15if (clock_gettime(clk_id, &ts) != 0) {
    16    return 0;
    17}
    18return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
    19# else
    20    /* WARN: timer is influenced by environvment (OS, scheduling, interrupts...) */
    21struct timeval tv;
    22gettimeofday(&tv, NULL);
    23return tv.tv_usec + (int64_t)tv.tv_sec * 1000000;
    24# endif
    

    less readable imo


    real-or-random commented at 6:55 pm on September 3, 2025:

    I tried issuing warnings with #warning to tell the user about the precision of the clock but it’s not compatible with c90.

    What I had in mind is printing the warning out at runtime.


    Raimo33 commented at 9:36 pm on September 3, 2025:
    good suggestion. done.
  36. in src/bench.h:28 in 66745f741f outdated
    35-    struct timespec tv;
    36-    if (!timespec_get(&tv, TIME_UTC)) {
    37-        fputs("timespec_get failed!", stderr);
    38-        exit(EXIT_FAILURE);
    39+    if (GetProcessTimes(GetCurrentProcess(), &creation, &exit, &kernel, &user) == 0) {
    40+        return 0;
    


    real-or-random commented at 6:56 am on September 3, 2025:
    What’s the rationale for the change on Windows?

    Raimo33 commented at 8:19 am on September 3, 2025:
    previously wall-clock times. now per-process cpu time.
  37. Raimo33 commented at 8:30 am on September 3, 2025: none

    The point is exactly having a simple way of verifying that there’s indeed no interference. Getting rid of sources of variance is hard to get right, and it’s impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off.

    fine, but at least let’s make it optional. I don’t like my benchmarks being 10 times slower just because min/max need to be computed. if I say 20'000 iterations I want it to be 20`000 iterations.

  38. Raimo33 force-pushed on Sep 3, 2025
  39. Raimo33 requested review from real-or-random on Sep 3, 2025
  40. Raimo33 force-pushed on Sep 3, 2025
  41. Raimo33 force-pushed on Sep 3, 2025
  42. Raimo33 force-pushed on Sep 3, 2025
  43. Raimo33 force-pushed on Sep 4, 2025
  44. Raimo33 force-pushed on Sep 4, 2025
  45. Raimo33 force-pushed on Sep 4, 2025
  46. Raimo33 force-pushed on Sep 6, 2025
  47. Raimo33 force-pushed on Sep 6, 2025
  48. Raimo33 force-pushed on Sep 6, 2025
  49. real-or-random commented at 8:43 am on September 8, 2025: contributor

    The point is exactly having a simple way of verifying that there’s indeed no interference. Getting rid of sources of variance is hard to get right, and it’s impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off.

    fine, but at least let’s make it optional. I don’t like my benchmarks being 10 times slower just because min/max need to be computed. if I say 20'000 iterations I want it to be 20`000 iterations.

    Not sure. I suggest making this PR focused on a single (uncontroversial) change, which is switching to per process clocks.

  50. Raimo33 force-pushed on Sep 8, 2025
  51. Raimo33 renamed this:
    [WIP] refactor: remove system interference from benchmarks
    bench: replace wall-clock timer with per-process CPU timer
    on Sep 8, 2025
  52. Raimo33 force-pushed on Sep 8, 2025
  53. Raimo33 force-pushed on Sep 8, 2025
  54. in src/bench.h:44 in f9d2ddec4a outdated
    56+
    57+# if defined(CLOCK_PROCESS_CPUTIME_ID)
    58+    /* In theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core. 
    59+    * See https://linux.die.net/man/3/clock_gettime. 
    60+    * In practice, modern CPUs have largely addressed this issue. The worst case scenario is a unreliable benchmark. 
    61+    * See https://chatgpt.com/s/dr_68b7486a05a481919d9f121d182cc0cf 
    


    sipa commented at 3:18 pm on September 8, 2025:
    Can you find a source that’s a bit more authoritative than ChatGPT?

    Raimo33 commented at 3:55 pm on September 8, 2025:
    yes. Added multiple links.
  55. in src/bench.h:209 in f9d2ddec4a outdated
    203@@ -176,6 +204,14 @@ static int get_iters(int default_iters) {
    204     }
    205 }
    206 
    207+static void print_clock_info(void) {
    208+#if defined(_WIN32) || defined(CLOCK_PROCESS_CPUTIME_ID)
    209+    printf("INFO: using CPU timer, results are not influenced by other running processes\n\n");
    


    sipa commented at 3:19 pm on September 8, 2025:
    That seems like an exaggeration; hyperthreading, task swapping, and competing memory bandwidth, may all still affect the results.

    Raimo33 commented at 3:50 pm on September 8, 2025:
    you’re right. memory bandwith, cache pollution, hyperthreading, hot/cold branch predictor… But task swapping shouldn’t influence CLOCK_PROCESS_CPUTIME_ID. I changed the message to notify the user that he should probably pin the process.

    real-or-random commented at 7:22 am on September 9, 2025:

    I changed the message to notify the user that he should probably pin the process.

    If that’s not required on modern CPUs, why advise to do this?

    We could just say “INFO: Using per-process CPU timer”.

  56. Raimo33 force-pushed on Sep 8, 2025
  57. Raimo33 force-pushed on Sep 8, 2025
  58. in src/bench.h:42 in bbc8bb7003 outdated
    54+
    55+#else /* POSIX */
    56+
    57+# if defined(CLOCK_PROCESS_CPUTIME_ID)
    58+    /* In theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core. see https://man7.org/linux/man-pages/man2/clock_gettime.2.html *
    59+     * in practice, modern CPUs have synchronized TSCs which addresses this issue. see https://docs.amd.com/r/en-US/ug1586-onload-user/Timer-TSC-Stability   */
    


    real-or-random commented at 7:14 am on September 9, 2025:
    0    /* In theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core,
    1     * see `man clock_gettime` on Linux. In practice, modern CPUs have synchronized TSCs which
    2     * address this issue, see https://docs.amd.com/r/en-US/ug1586-onload-user/Timer-TSC-Stability . */
    
  59. Raimo33 force-pushed on Sep 9, 2025
  60. Raimo33 commented at 9:11 am on September 9, 2025: none
    I’ve applied the recommended changes and split the PR into multiple commits for simpler diffs
  61. Raimo33 force-pushed on Sep 9, 2025
  62. real-or-random added the label tweak/refactor on Sep 9, 2025
  63. l0rinc commented at 5:09 pm on September 9, 2025: none
    I haven’t investigated this in detail but couldn’t find any related issue: were there any efforts to use https://nanobench.ankerl.com like we do in Core instead? That will likely need some adjustments on the benchmarks themselves, but at least we wouldn’t be rolling our own framework.
  64. Raimo33 commented at 5:51 pm on September 9, 2025: none

    not familiar with nano bench, but I can see it uses std::chrono. which uses CLOCK_MONOTONIC (our fallback) under the hood, not TSC.

    also the user would need to have a c++ compiler, not a big real but I think this repo needs to strictly adhere to c89.

  65. Raimo33 commented at 6:02 pm on September 9, 2025: none
    I realized the windows version I implemented was using a unprecise clock. drafting. will fix.
  66. Raimo33 marked this as a draft on Sep 9, 2025
  67. Raimo33 force-pushed on Sep 9, 2025
  68. Raimo33 force-pushed on Sep 9, 2025
  69. Raimo33 commented at 11:12 pm on September 9, 2025: none
    I changed windows clock to QueryPerformanceCounter() which has microsecond precision. I also removed if checks from clock_gettime() as it can only fail for programming errors.
  70. real-or-random commented at 6:26 am on September 10, 2025: contributor

    I haven’t investigated this in detail but couldn’t find any related issue: were there any efforts to use nanobench.ankerl.com like we do in Core instead?

    nanobench is a framework for benchmarking C++ but this is C. Of course, we could still call our C from C++, but adding C++ to the code base seems a bit overkill to me if if’s “just” for the purpose of benchmarks. I agree that using an existing framework may be a good idea, but I’m not aware of any in pure C.

  71. Raimo33 marked this as ready for review on Sep 10, 2025
  72. l0rinc commented at 9:57 pm on September 10, 2025: none

    but adding C++ to the code base seems a bit overkill to me if if’s “just” for the purpose of benchmarks

    The purpose would be to standardize the benchmarking to avoid reinventing solutions to problems that aren’t strictly related to the purpose of this library. It would also help with trusting the result more, since we already know how that benchmarking library behaves. The main client of this library is C++ code, it doesn’t seem far-fetched to me to test that.

  73. real-or-random commented at 6:36 pm on September 15, 2025: contributor

    but adding C++ to the code base seems a bit overkill to me if if’s “just” for the purpose of benchmarks

    The purpose would be to standardize the benchmarking to avoid reinventing solutions to problems that aren’t strictly related to the purpose of this library. It would also help with trusting the result more, since we already know how that benchmarking library behaves. The main client of this library is C++ code, it doesn’t seem far-fetched to me to test that.

    Sorry, I didn’t want to be discouraging. I don’t think at all that nanobench is a far-fetched idea. And I truly agree that reinventing the wheel is not great.

    In the end, what counts is the maintenance burden. My main concern with C++ is that I’m not sure how fluent our regular contributors are in C++. (I’m not, but luckily I’m not the only one here.) If the bit of C++ is harder to touch (for us) than this code, then nanobench is not a good idea. If, on the other hand, we have people who can maintain C++, or if we get support from the Core contributors who are familiar with nanobench, then this can be a win.

    I’d be happy to see a demo/WIP PR but if you feel that this is a lot of work, then it might be a good idea to see what other contributors and maintainers think.

  74. Raimo33 commented at 3:34 pm on September 17, 2025: none
    This PR currently conflicts with https://github.com/bitcoin-core/secp256k1/pull/1734
  75. furszy commented at 4:16 pm on September 17, 2025: member

    This PR currently conflicts with #1734

    Don’t worry about it. #1734 only moves gettime() so it can be accessed by the test framework too. Any changes you make here will function correctly there as well, and vice versa.

  76. in src/bench.h:197 in 3e58a91b77
    189@@ -176,6 +190,14 @@ static int get_iters(int default_iters) {
    190     }
    191 }
    192 
    193+static void print_clock_info(void) {
    194+#if defined(CLOCK_PROCESS_CPUTIME_ID)
    195+    printf("INFO: Using per-process CPU timer\n\n");
    196+#else
    197+    printf("WARN: using Wall-Clock timer, results are highly influenced by other running processes\n\n");
    


    real-or-random commented at 11:03 am on September 22, 2025:
    0    printf("WARN: Using wall-clock timer instead of per-process CPU timer.\n\n");
    

    Let’s just stick to the facts and omit the interpretation of them, this makes it easier to move forward. ;)

  77. in README.md:162 in 3e58a91b77
    158@@ -159,6 +159,8 @@ Benchmark
    159 ------------
    160 If configured with `--enable-benchmark` (which is the default), binaries for benchmarking the libsecp256k1 functions will be present in the root directory after the build.
    161 
    162+For reliable benchmarks, it is strongly recommended to pin the process to a dedicated CPU core and to disable CPU frequency scaling.
    


    real-or-random commented at 11:05 am on September 22, 2025:
    0For reliable benchmarks, it is strongly recommended to pin the process to a single CPU core and to disable CPU frequency scaling.
    

    Dedicated sounds as if the core is only responsible for this process, which would be nice but it’s typically not doable.

  78. real-or-random approved
  79. real-or-random commented at 11:06 am on September 22, 2025: contributor
    ACK mod nits, still need to test this
  80. Raimo33 force-pushed on Sep 22, 2025
  81. in src/bench.c:7 in 8925b95882
    3@@ -4,6 +4,8 @@
    4  * file COPYING or https://www.opensource.org/licenses/mit-license.php.*
    5  ***********************************************************************/
    6 
    7+#define _POSIX_C_SOURCE 200112L /* for clock_gettime() */
    


    hebasto commented at 2:26 pm on September 22, 2025:
    Are there any references explaining why the value 200112L was chosen?

    Raimo33 commented at 2:53 pm on September 22, 2025:
    no particular reason besides the fact that it is the bare minimum version to make it work on my machine and CI. The reported 199309L version number didn’t expose the necessary macros.

    hebasto commented at 2:20 pm on September 25, 2025:

    The reported 199309L version number didn’t expose the necessary macros.

    I can’t reproduce this on my machine.

    no particular reason besides the fact that it is the bare minimum version to make it work on my machine…

    Could you share your build environment details so I can try to reproduce the issue?

    … and CI.

    It seems that CI is fine with the documented 199309L value. See: https://github.com/hebasto/secp256k1/actions/runs/18008380590

  82. hebasto commented at 2:27 pm on September 22, 2025: member

    According to https://www.man7.org/linux/man-pages/man3/clock_gettime.3.html:

    Link with -lrt (only for glibc versions before 2.17).

    I think this check, and adding the -lrt flag if necessary, should be included in both build systems.

  83. Raimo33 commented at 3:06 pm on September 22, 2025: none

    I think this check, and adding the -lrt flag if necessary, should be included in both build systems.

    Agreed, good catch! what do you think about something like this?

    0include(CheckFunctionExists)
    1check_function_exists(clock_gettime HAVE_CLOCK_GETTIME)
    2if(NOT HAVE_CLOCK_GETTIME)
    3    # On some platforms, clock_gettime requires librt
    4    target_link_libraries(your_target PRIVATE rt)
    5endif()
    
  84. hebasto commented at 3:27 pm on September 22, 2025: member

    Friendly ping @martinus, benchmark connoisseur :)

    Could you please give a rough estimate of these changes at the concept level?

  85. real-or-random commented at 3:41 pm on September 22, 2025: contributor

    Could you please give a rough estimate of these changes at the concept level?

    And what’s your take on using nanobench instead, even though this is a pure C library instead of C++?

  86. hebasto commented at 3:48 pm on September 22, 2025: member

    Goal

    This PR refactors the benchmarking functions as per #1701, in order to make benchmarks more deterministic and less influenced by the environvment.

    This is achieved by replacing Wall-Clock Timer with Per-Process CPU Timer when possible.

    I’ve tested 8925b9588259d27257f42eb4fb8bebd4a7be4d95 by running it alongside stress --cpu $(nproc) and observed the same effect on benchmark average values as on the master branch.

    Could you please clarify how one would measure or observe the claimed “less influenced by the environment” behaviour?

  87. Raimo33 commented at 8:53 am on September 25, 2025: none

    stress --cpu spawns workers that use 100% of the cpu. your test pegs every core to 100% utilization. and each gets probably pinned to a physical core. therefore you won’t see any difference in terms of variance.

    a better example would be starting various threads with random usage spikes at random times on the same core the benchmark is running on.

    basically, if you have no other processes running on that CPU, then this PR won’t show any improvement.

    this PR helps in the more realistic scenario where a user has background processes running, and the scheduler assigns multiple of them on the same CPU that’s running the benchmark, putting the thread to sleep, which the wall clock timer doesn’t account for.

    also, this PR is not merely a replacement of wall clock time with CPU time, but it also modernizes the clock function as per the Unix standard.

  88. hebasto commented at 3:08 pm on September 25, 2025: member

    I think this check, and adding the -lrt flag if necessary, should be included in both build systems.

    Agreed, good catch! what do you think about something like this?

    0include(CheckFunctionExists)
    1check_function_exists(clock_gettime HAVE_CLOCK_GETTIME)
    2if(NOT HAVE_CLOCK_GETTIME)
    3    # On some platforms, clock_gettime requires librt
    4    target_link_libraries(your_target PRIVATE rt)
    5endif()
    

    Feel free to grab f59c45f2cef03681074eef00046d3b146daf42e8 from https://github.com/hebasto/secp256k1/commits/pr1732/0925.cmake.

    UPD. @real-or-random do we need a CI job to build on some old system with old glibc?

  89. in src/bench.h:166 in 8925b95882 outdated
    195+    printf("INFO: Using per-process CPU timer\n\n");
    196+#else
    197+    printf("WARN: Using wall-clock timer instead of per-process CPU timer.\n\n");
    198+#endif
    199+}
    200+
    


    hebasto commented at 3:10 pm on September 25, 2025:

    5fe7c017682240295d701570f052b6ddfb2d1fb7:

    Do these messages make sense on Windows?


    Raimo33 commented at 3:19 pm on September 25, 2025:
    while on windows there’s no option for per-process cpu clock (or at least not high precision), I still think issuing the warning is fine.

    hebasto commented at 3:35 pm on September 25, 2025:
    It seems a bit of a stretch to call the native Windows QPC framework a “wall-clock timer”.

    Raimo33 commented at 4:26 pm on September 25, 2025:
    what about this? “WARN: Using global timer instead of per-process CPU timer.”
  90. Raimo33 commented at 4:54 pm on September 25, 2025: none

    Feel free to grab f59c45f from https://github.com/hebasto/secp256k1/commits/pr1732/0925.cmake.

    This seems overcomplicated and convoluted. but if you manage to simplify it I’ll include your commit.

  91. Raimo33 force-pushed on Sep 25, 2025
  92. hebasto commented at 5:12 pm on September 25, 2025: member

    Approach NACK cce014725032a58416237e38aedd2033ba2202f5.

    The current implementation breaks compatibility with systems using glibc versions prior to 2.17:

     0$ ldd --version | head -1
     1ldd (Ubuntu EGLIBC 2.15-0ubuntu10.23) 2.15
     2$ cmake -B build
     3$ cmake --build build -t bench
     4<snip>
     5[100%] Linking C executable ../bin/bench
     6CMakeFiles/bench.dir/bench.c.o: In function `gettime_us':
     7/secp256k1/src/bench.h:45: undefined reference to `clock_gettime'
     8collect2: ld returned 1 exit status
     9make[3]: *** [bin/bench] Error 1
    10make[2]: *** [src/CMakeFiles/bench.dir/all] Error 2
    11make[1]: *** [src/CMakeFiles/bench.dir/rule] Error 2
    12make: *** [bench] Error 2
    
  93. Raimo33 commented at 3:07 pm on September 28, 2025: none

    Fixed by unconditionally adding -lrt when available.

    0root@5997b2bd1aca:/workspace# ldd --version | head -1
    1ldd (Ubuntu EGLIBC 2.15-0ubuntu10.23) 2.15
    2root@5997b2bd1aca:/workspace# cmake --build build -t bench
    3[ 33%] Built target secp256k1_precomputed
    4[ 66%] Built target secp256k1
    5[100%] Built target bench
    
  94. in src/CMakeLists.txt:132 in 9128541044 outdated
    124@@ -125,12 +125,17 @@ endif()
    125 unset(${PROJECT_NAME}_soversion)
    126 
    127 if(SECP256K1_BUILD_BENCHMARK)
    128+  find_library(RT_LIBRARY rt)
    129+  add_library(optional_rt INTERFACE)
    130+  if(RT_LIBRARY)
    131+    target_link_libraries(optional_rt INTERFACE rt)
    132+  endif()
    


    purpleKarrot commented at 3:16 pm on September 29, 2025:

    Could you please extract this logic into a find-module, like FindRT.cmake? That module should provide an IMPORTED target with a namespace in the name.

    Have a look at the FindIconv.cmake module as an example. This module also provides an interface library for functionality that may be found in an actual library or built into the C standard library.


    Raimo33 commented at 1:20 pm on October 1, 2025:
    This looks compilicated, almost overcomplicated, without much benefits, I won’t be able to replicate FindIconv.cmake on my own. but you’re welcome to provide a commit and I’ll see if it should be cherrypicked.
  95. in README.md:160 in 9128541044 outdated
    158@@ -157,6 +159,8 @@ Benchmark
    159 ------------
    160 If configured with `--enable-benchmark` (which is the default), binaries for benchmarking the libsecp256k1 functions will be present in the root directory after the build.
    161 
    162+For reliable benchmarks, it is strongly recommended to pin the process to a single CPU core and to disable CPU frequency scaling.
    


    purpleKarrot commented at 3:35 pm on September 29, 2025:

    It would be great if the benchmark was configured in a way such that the guideline is followed.

    0add_test(NAME secp256k1_bench COMMAND ...)
    1set_tests_properties(secp256k1_bench PROPERTIES
    2  PROCESSORS 1
    3  PROCESSOR_AFFINITY ON
    4)
    

    I am not aware how to disable scaling, unfortunately.


    Raimo33 commented at 1:54 pm on October 1, 2025:
    This is a great idea, added.
  96. in src/CMakeLists.txt:133 in 9128541044
    124@@ -125,12 +125,17 @@ endif()
    125 unset(${PROJECT_NAME}_soversion)
    126 
    127 if(SECP256K1_BUILD_BENCHMARK)
    128+  find_library(RT_LIBRARY rt)
    129+  add_library(optional_rt INTERFACE)
    130+  if(RT_LIBRARY)
    131+    target_link_libraries(optional_rt INTERFACE rt)
    132+  endif()
    133   add_executable(bench bench.c)
    


    purpleKarrot commented at 3:43 pm on September 29, 2025:

    Note that CMake requires that

    • test names are unique per directory
    • target names are unique.

    While it is not strictly required to prefix test names with the project, it is still recommended, because it allows users to control which tests to run: ctest -R secp256k1_. This recommendation is already followed, good.

    For target names, this requirement is much stronger. Since this project is used as a subproject in bitcoin, neither bitcoin nor any other of its subprojects can define a target with the name bench. It would be recommended to rename the target to secp256k1_bench.


    Raimo33 commented at 1:25 pm on October 1, 2025:
    Agreed, done!

    Raimo33 commented at 1:54 pm on October 1, 2025:
    although CI scripts now need to change to reflect the executable name changes
  97. Raimo33 commented at 2:01 pm on October 1, 2025: none

    As per @purpleKarrot suggestion, benchmarks now can be run via ctest as well (as opposed to ./bench_name). This would be the preferable way to run them as it automatically handles CPU pinning and affinity. This means that, if both SECP256K1_BUILD_BENCHMARK and SECP256K1_BUILD_TESTS flags are set, the standard ctest command will run benchs+tests. This can be avoided by selectively choosing which “tests” to run with the following ctest syntax: ctest -R bench or ctest -R test (which uses regex).

    Otherwise we can evaluate the possibility of using labels for groups of tests.

  98. Raimo33 force-pushed on Oct 1, 2025
  99. Raimo33 force-pushed on Oct 1, 2025
  100. Raimo33 force-pushed on Oct 1, 2025
  101. Raimo33 force-pushed on Oct 1, 2025
  102. Raimo33 force-pushed on Oct 1, 2025
  103. Raimo33 force-pushed on Oct 1, 2025
  104. Raimo33 force-pushed on Oct 2, 2025
  105. real-or-random commented at 11:57 am on October 13, 2025: contributor
    @hebasto @purpleKarrot Want to re-review this? :)
  106. real-or-random commented at 6:51 am on October 15, 2025: contributor
    This needs to be rebased after the merge of #1734, sorry.
  107. Raimo33 force-pushed on Oct 15, 2025
  108. Raimo33 force-pushed on Oct 15, 2025
  109. Raimo33 force-pushed on Oct 15, 2025
  110. Raimo33 force-pushed on Oct 18, 2025
  111. Raimo33 commented at 7:25 am on October 21, 2025: none

    there appears to be an issue in the CI. @real-or-random can you trigger it again?

    also there’s one last thing I don’t like about this PR: having to define POSIX_C_SOURCE in every source file. if anyone finds a way to define it once, perhaps in the CMakeLists.txt please submit it.

  112. Raimo33 requested review from real-or-random on Oct 24, 2025
  113. Raimo33 requested review from hebasto on Oct 24, 2025
  114. Raimo33 requested review from purpleKarrot on Oct 24, 2025
  115. hebasto commented at 8:18 pm on October 25, 2025: member

    #1732 (comment):

    I suggest making this PR focused on a single (uncontroversial) change, which is switching to per process clocks.

    I strongly support this approach and I suggest postponing the following changes for follow-up PRs::

    • 4f3403ea10ffb91efdef8698e5f2464ec53460c7 “build: rename executables with prefix”, because it touches not only benchmarks but tests as well;
    • e6914740a74b40fb919c97c0f30c7579212fe922 “build: addbenchmarks to ctest”, because it would be reasonable to consider this after #1760.

    I also suggest dropping a93078fbb6fd8a62c8e7f67a7a0960dc9c6bf45c “refactor: reorder cmake commands for better readability” as it does the opposite given the style used for all other targets.

  116. in src/tests_exhaustive.c:8 in e6914740a7
    3@@ -4,6 +4,8 @@
    4  * file COPYING or https://www.opensource.org/licenses/mit-license.php.*
    5  ***********************************************************************/
    6 
    7+#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
    8+
    


    hebasto commented at 8:41 pm on October 25, 2025:
    This doesn’t seem necessary here.

    hebasto commented at 9:06 pm on October 25, 2025:
    That’s annoying that test sources have to be modified. See #1734 (comment).
  117. hebasto commented at 0:15 am on October 26, 2025: member

    #1732 (comment):

    This seems overcomplicated and convoluted.

    #1732 (review):

    This looks compilicated, almost overcomplicated, without much benefits…

    The PR author appears to favor simplicity over correctness (see the comment above). I disagree with that approach.

    but you’re welcome to provide a commit and I’ll see if it should be cherrypicked.

    Sure. Here is a branch that addresses some of @purpleKarrot’s comments and most of mine: https://github.com/hebasto/secp256k1/commits/251026-pr1732.alt/. However, it doesn’t yet include the necessary changes to the Autotools build system.

    UPD. Asked a question about handling _POSIX_C_SOURCE on IRC.

  118. cmake: Add `FindClockGettime` module 28f273d7fe
  119. bench: replace wall-clock timers with cpu-timers where possible
    This commit improves the reliability of benchmarks by removing some of the influence of other background running processes. This is achieved by using CPU bound clocks that aren't influenced by interrupts, sleeps, blocked I/O, etc.
    a1ea09a1dd
  120. bench: print clock info 76a4fc65e6
  121. docs: add benchmarking best practices e14981e28e
  122. Raimo33 force-pushed on Oct 26, 2025
  123. Raimo33 commented at 2:34 pm on October 26, 2025: none
    I agree, droppped the 3 unrelated commits from this PR, removed the unnecessary #define, cherry-picked your solution for the library detection
  124. Raimo33 marked this as a draft on Oct 26, 2025
  125. furszy commented at 2:17 pm on October 27, 2025: member

    Just started reviewing it; CMake compilation fails locally. It should set _POSIX_C_SOURCE=199309L instead, so the compiler expands correctly to -D_POSIX_C_SOURCE=199309L. See

     0diff --git a/cmake/FindClockGettime.cmake b/cmake/FindClockGettime.cmake
     1--- a/cmake/FindClockGettime.cmake	(revision e14981e28e1c1e4b2fb6321cd94342d0b2849be7)
     2+++ b/cmake/FindClockGettime.cmake	(date 1761574207222)
     3@@ -20,7 +20,7 @@
     4 
     5 cmake_push_check_state(RESET)
     6 
     7-set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L)
     8+set(CMAKE_REQUIRED_DEFINITIONS _POSIX_C_SOURCE=199309L)
     9 check_symbol_exists(clock_gettime "time.h" CLOCK_GETTIME_IS_BUILT_IN)
    10 set(${CMAKE_FIND_PACKAGE_NAME}_FOUND ${CLOCK_GETTIME_IS_BUILT_IN})
    
  126. hebasto commented at 2:28 pm on October 27, 2025: member

    Just started reviewing it; CMake compilation fails locally. It should set _POSIX_C_SOURCE=199309L instead, so the compiler expands correctly to -D_POSIX_C_SOURCE=199309L during check_symbol_exists.

    I can’t reproduce it. What CMake version are you using?

    From CMake docs:

    CMAKE_REQUIRED_DEFINITIONS

    A semicolon-separated list of compiler definitions, each of the form -DFOO or -DFOO=bar.

  127. Raimo33 commented at 2:31 pm on October 27, 2025: none

    Just started reviewing it; CMake compilation fails locally. It should set _POSIX_C_SOURCE=199309L instead, so the compiler expands correctly to -D_POSIX_C_SOURCE=199309L during check_symbol_exists. See

    Can you provide the error output? my CMake doesn’t complain

  128. furszy commented at 2:42 pm on October 27, 2025: member

    Can you provide the error output? my CMake doesn’t complain

    Sure.

     0In file included from In file included from <built-in><built-in>:418:
     1<command line>:In file included from 1<built-in>:418:
     2:9<command line>:: 1In file included from :<built-in>::418:
     34189error: :
     4<command line>:macro name must be an identifier<command line>:1: 1:9: 
     5:error: 9error: : #define -D_POSIX_C_SOURCE 199309L
     6        ^macro name must be an identifier
     7macro name must be an identifier
     8error: 
     9macro name must be an identifier
    10#define -D_POSIX_C_SOURCE 199309L
    

    I can’t reproduce it. What CMake version are you using?

    cmake version 3.22.3

  129. hebasto commented at 2:47 pm on October 27, 2025: member

    cmake version 3.22.3

    Confirming. I’ll suggest a fix shortly.

    UPD. There was a change in CMake 3.26:

    For all COMPILE_DEFINITIONS properties, any leading -D on an item is removed…

  130. hebasto commented at 3:15 pm on October 27, 2025: member

    Just started reviewing it; CMake compilation fails locally. It should set _POSIX_C_SOURCE=199309L instead, so the compiler expands correctly to -D_POSIX_C_SOURCE=199309L. See

     0diff --git a/cmake/FindClockGettime.cmake b/cmake/FindClockGettime.cmake
     1--- a/cmake/FindClockGettime.cmake	(revision e14981e28e1c1e4b2fb6321cd94342d0b2849be7)
     2+++ b/cmake/FindClockGettime.cmake	(date 1761574207222)
     3@@ -20,7 +20,7 @@
     4 
     5 cmake_push_check_state(RESET)
     6 
     7-set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L)
     8+set(CMAKE_REQUIRED_DEFINITIONS _POSIX_C_SOURCE=199309L)
     9 check_symbol_exists(clock_gettime "time.h" CLOCK_GETTIME_IS_BUILT_IN)
    10 set(${CMAKE_FIND_PACKAGE_NAME}_FOUND ${CLOCK_GETTIME_IS_BUILT_IN})
    

    Here is the minimal diff to fix the error for CMake older thank 3.26:

     0--- a/cmake/FindClockGettime.cmake
     1+++ b/cmake/FindClockGettime.cmake
     2@@ -34,7 +34,7 @@ if(${CMAKE_FIND_PACKAGE_NAME}_FOUND)
     3   if(NOT TARGET POSIX::clock_gettime)
     4     add_library(POSIX::clock_gettime INTERFACE IMPORTED)
     5     set_target_properties(POSIX::clock_gettime PROPERTIES
     6-      INTERFACE_COMPILE_DEFINITIONS "${CMAKE_REQUIRED_DEFINITIONS}"
     7+      INTERFACE_COMPILE_DEFINITIONS _POSIX_C_SOURCE=199309L
     8       INTERFACE_LINK_LIBRARIES "${CMAKE_REQUIRED_LIBRARIES}"
     9     )
    10   endif()
    
  131. furszy commented at 3:21 pm on October 27, 2025: member
    Isn’t that going to make CMAKE_REQUIRED_DEFINITIONS unused? the set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L) line.
  132. hebasto commented at 3:22 pm on October 27, 2025: member

    Isn’t that going to make CMAKE_REQUIRED_DEFINITIONS unused? the set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L) line.

    It is still used by check_symbol_exists().

  133. in src/tests_common.h:30 in a1ea09a1dd outdated
    42-    return (int64_t)tv.tv_usec + (int64_t)tv.tv_sec * 1000000LL;
    43+static int64_t gettime_us(void) {
    44+#if defined(_WIN32)
    45+
    46+    LARGE_INTEGER freq, counter;
    47+    QueryPerformanceFrequency(&freq);
    


    furszy commented at 3:32 pm on October 27, 2025:

    Per https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancefrequency, freq should be static and initialized only once. The function’s description says:

    The frequency of the performance counter is fixed at system boot and is consistent across all processors. Therefore, the frequency need only be queried upon application initialization, and the result can be cached.


    Raimo33 commented at 3:54 pm on October 27, 2025:
    I know, it’s not mandatory but recommended. It is redundant to initialize it every time but I found no other solution since we are not in an OOP environment. and using an if statement to initialize it only the first time doesn’t seem optimal.

    furszy commented at 5:52 pm on October 27, 2025:

    I see. Using an init_time(void) function could work in that case, but it would require every binary to call this method early in every program’s execution. All good anyway.

    Another nit; QueryPerformanceFrequency returns 0 if the call fails, so it would be good to handle that as well.


    Raimo33 commented at 5:55 pm on October 27, 2025:

    Another nit; QueryPerformanceFrequency returns 0 if the call fails, so it would be good to handle that as well.

    my reasoning Is the same here. It would be one extra branch. and a failure in the clock is not a huge deal.

  134. furszy commented at 3:45 pm on October 27, 2025: member
    Have you considered GetProcessTimes or GetThreadTimes to ignore sleep and waiting times on Windows? It seems to me that the current approach is semantically different across platforms: on POSIX systems, you get CPU time (if enabled), whereas on Windows you’re using QueryPerformanceCounter, which measures elapsed wall-clock time. So results will not be comparable across platforms?
  135. Raimo33 commented at 3:52 pm on October 27, 2025: none

    Have you considered GetProcessTimes or GetThreadTimes

    I have, but they lack precision.

  136. furszy commented at 5:38 pm on October 27, 2025: member

    Have you considered GetProcessTimes or GetThreadTimes

    I have, but they lack precision.

    That’s interesting. What do you think about the semantical difference between the Windows and POSIX approaches? It seems to me that we should at least mention in the docs that results aren’t comparable across platforms.

  137. Raimo33 commented at 12:39 pm on October 28, 2025: none

    What do you think about the semantical difference between the Windows and POSIX approaches? It seems to me that we should at least mention in the docs that results aren’t comparable across platforms.

    We added print statements for that exact reason. On windows there simply isn’t a reliable way to get per-process time.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-11-04 03:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me