bench: replace wall-clock timer with per-process CPU timer #1732

pull Raimo33 wants to merge 7 commits into bitcoin-core:master from Raimo33:benchmark-precise changing 10 files +178 −130
  1. Raimo33 commented at 2:58 pm on September 2, 2025: none

    Goal

    This PR refactors the benchmarking functions as per #1701, in order to make benchmarks more deterministic and less influenced by the environvment.

    This is achieved by replacing Wall-Clock Timer with Per-Process CPU Timer when possible.

  2. Raimo33 marked this as a draft on Sep 2, 2025
  3. real-or-random commented at 3:05 pm on September 2, 2025: contributor

    Just some quick comments:

    [x] remove the number of runs (count) in favor of simpler cleaner approach with just number of iterations (iter).

    I think there’s a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.

    [x] remove min and max statistics in favor of simpler approach with just avg.

    I think min and max are useful. For constant-time code, you can also compare min. And max gives you an idea if there were fluctuations or not.

    [x] remove needless fixed point conversion in favor of simpler floating point divisions.

    Well, okay, that has a history; see #689. It’s debatable if it makes sense to avoid floating point math, but as long as it doesn’t get in your way here, it’s a cool thing to keep it. :D

  4. real-or-random commented at 3:06 pm on September 2, 2025: contributor
    It will be useful to split your changes into meaningful and separate commits, see https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#committing-patches.
  5. Raimo33 commented at 3:09 pm on September 2, 2025: none
    I think min and max just complicate things. let me explain: first of all, as it is right now, they don’t even measure the min and max, they just measure the min/max of the averages of all runs. aka not the absolute. Furthermore, in order to have them, one would need to run all the iterations 10 times more. benchmarks are already slow, adding this min/max slows them by 10 fold. imho it’s completely unnecessary. @real-or-random
  6. sipa commented at 3:15 pm on September 2, 2025: contributor

    If we’re going to rework this, I’d suggest using the stabilized quartiles approach from https://cr.yp.to/papers/rsrst-20250727.pdf:

    • StQ1: the average of all samples between 1st and 3rd octile
    • StQ2: the average of all samples between 3rd and 5th octile
    • StQ3: the average of all samples between 5th and 7th octile
  7. Raimo33 force-pushed on Sep 2, 2025
  8. Raimo33 commented at 3:23 pm on September 2, 2025: none

    I think there’s a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.

    right now all benchmarks are run with count=10 and fixed iters (apart from ecmult_multi which adjusts the number of iters, not count).

    therefore count is only useful to extrapolate min and max

  9. Raimo33 commented at 3:33 pm on September 2, 2025: none

    Well, okay, that has a history; see #689. It’s debatable if it makes sense to avoid floating point math, but as long as it doesn’t get in your way here, it’s a cool thing to keep it. :D

    I disagree with #689. It overcomplicate things for the sake of not having floating point math. those divisions aren’t even in the hot path, they’re outside the benchmarks.

  10. sipa commented at 3:42 pm on September 2, 2025: contributor
    Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.
  11. Raimo33 commented at 3:57 pm on September 2, 2025: none

    Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.

    what is the usefulness of measuring min/max when we are removing OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is.

  12. Raimo33 force-pushed on Sep 2, 2025
  13. Raimo33 force-pushed on Sep 2, 2025
  14. Raimo33 force-pushed on Sep 2, 2025
  15. Raimo33 renamed this:
    [WIP] Refactor benchmark
    [WIP] refactor: remove system interference from benchmarks
    on Sep 2, 2025
  16. Raimo33 force-pushed on Sep 2, 2025
  17. Raimo33 force-pushed on Sep 2, 2025
  18. Raimo33 commented at 5:32 pm on September 2, 2025: none
    by the way, gettimeofday() is officially discouraged since 2008 in favor of clock_gettime(). The POSIX standard marks it as obsolescent but still provides the API for backward compatibility.
  19. Raimo33 force-pushed on Sep 2, 2025
  20. Raimo33 force-pushed on Sep 2, 2025
  21. Raimo33 commented at 7:45 pm on September 2, 2025: none

    even though the manual says that CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core, modern CPUs have largely addressed this issue. So I think it is fair to compile CLOCK_PROCESS_CPUTIME_ID even though we don’t have the guarantee that the user has pinned the benchmarking process to a core. The worst case scenario is a unreliable benchmark, which the current repo has anyways.

    I added a line in the README.md for best practices to run the benchmarks.

    I also tried adding a function to pin the process to a core directly in C, but there’s no standard POSIX compliant way to do so. There is pthread_setaffinity_np() on linux, where ’np’ stands for ’not portable'

  22. Raimo33 force-pushed on Sep 2, 2025
  23. Raimo33 force-pushed on Sep 2, 2025
  24. Raimo33 force-pushed on Sep 2, 2025
  25. Raimo33 force-pushed on Sep 2, 2025
  26. real-or-random commented at 8:47 pm on September 2, 2025: contributor

    Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.

    what is the usefulness of measuring min/max when we are removing OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is.

    The point is exactly having a simple way of verifying that there’s indeed no interference. Getting rid of sources of variance is hard to get right, and it’s impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off.

    I like the stabilized quartiles idea.

  27. Raimo33 commented at 8:52 pm on September 2, 2025: none

    I like the stabilized quartiles idea.

    tbh it scares me a bit, will see what I can do. Maybe in a future PR.

  28. Raimo33 force-pushed on Sep 2, 2025
  29. Raimo33 marked this as ready for review on Sep 2, 2025
  30. in CMakeLists.txt:34 in 66745f741f outdated
    29@@ -30,6 +30,8 @@ set(${PROJECT_NAME}_LIB_VERSION_AGE 0)
    30 #=============================
    31 set(CMAKE_C_STANDARD 90)
    32 set(CMAKE_C_EXTENSIONS OFF)
    33+# Enable POSIX features while maintaining ISO C compliance
    34+add_compile_definitions(_POSIX_C_SOURCE=200112L) #needed for `clock_gettime()` in bench.c
    


    real-or-random commented at 6:22 am on September 3, 2025:
    we’ll only need this in the benchmarks. Can you define it at the top of bench.c?

    Raimo33 commented at 6:33 pm on September 4, 2025:
    that was my initial setup. but it doesn’t work because it has to be before any include. I added it to the src/CMakeLists.txt targeting only the benchmarks.
  31. in README.md:162 in 66745f741f outdated
    158@@ -159,6 +159,8 @@ Benchmark
    159 ------------
    160 If configured with `--enable-benchmark` (which is the default), binaries for benchmarking the libsecp256k1 functions will be present in the root directory after the build.
    161 
    162+For an accurate benchmark, it is strongly recommended to pin the process to a dedicated CPU core and to disable CPU frequency scaling.
    



    l0rinc commented at 8:17 pm on September 6, 2025:
    This will be more reliable at best, but not more “accurate”, since it doesn’t reflect actual usage.

    Raimo33 commented at 8:47 pm on September 6, 2025:
    agreed. changed.
  32. in src/bench.h:14 in 66745f741f outdated
     6@@ -7,30 +7,63 @@
     7 #ifndef SECP256K1_BENCH_H
     8 #define SECP256K1_BENCH_H
     9 
    10+#if defined(_WIN32)
    11+# include <windows.h>
    12+#else
    13+# include <time.h>
    14+# include <sys/time.h>
    


    real-or-random commented at 6:38 am on September 3, 2025:
    Do we need both?

    Raimo33 commented at 8:11 am on September 3, 2025:
    yes, one for gettimeofday() and one for clock_gettime(). they don’t conflict.

    real-or-random commented at 6:56 pm on September 3, 2025:
    I think then it could be slightly more portable to include only what we’ll use.

    Raimo33 commented at 9:36 pm on September 3, 2025:

    just tried. unfortunately the macros to detect wether gettimeofday() or clock_gettime() is needed are defined in the time.h header. so the only viable solution is:

    0#include <time.h>
    1#if !defined(CLOCK_PROCESS_CPUTIME_ID) && !defined(CLOCK_MONOTONIC) && !defined(CLOCK_REALTIME)
    2#  include <sys/time.h>
    3#endif
    

    which every LLM tells me is ‘inherently unreliable’ and can lead to portability issues.

    ‘Including sys/time.h on modern POSIX systems that already have time.h and clock_gettime doesn’t hurt, and ensures gettimeofday is available if needed.’

    ‘Trying to skip it adds complexity with almost no benefit.’


    real-or-random commented at 8:39 am on September 8, 2025:
    Okay, but if we include time.h anyway, then let’s stick to clock_gettime on POSIX? https://pubs.opengroup.org/onlinepubs/9799919799/functions/clock_getres.html says that the proper macro to check is _POSIX_CPUTIME and that CLOCK_MONOTONIC must be supported, so there’s no need to check for it.

    Raimo33 commented at 9:37 am on September 8, 2025:
    You’re right, the latest specification makes CLOCK_MONOTONIC support mandatory, this makes it way simpler. no need to include <sys/time.h> anymore.

    Raimo33 commented at 10:22 am on September 8, 2025:
    This would be possible if every compiler supported the Std 1003.1-2024 version of POSIX, which is unrealistic. Until then, for backward compatibility, I kept CLOCK_REALTIME as the last fallback.

    Raimo33 commented at 10:26 am on September 8, 2025:
    and btw, the _POSIX_CPUTIME and _POSIX_MONOTONIC flags are to be used when you want to ensure that the library has mandatory support for those features. In my case for example, gcc doesnt expose _POSIX_MONOTONIC but still supports it. Therefore it’s better to check CLOCK_PROCESS_CPUTIME_ID and CLOCK_MONOTONIC directly.
  33. in src/bench.h:42 in 66745f741f outdated
    49+
    50+    return (int64_t)((k.QuadPart + u.QuadPart) / 10);
    51+#else /* POSIX */
    52+
    53+# if defined(CLOCK_PROCESS_CPUTIME_ID)
    54+    /* in theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core. see https://linux.die.net/man/3/clock_gettime */
    


    real-or-random commented at 6:40 am on September 3, 2025:
    and in practice?

    Raimo33 commented at 8:12 am on September 3, 2025:

    real-or-random commented at 6:57 pm on September 3, 2025:
    Okay, sorry. I was aware of the comment in the issue, but a future reader of the code comment will not. So what I’m trying to say is that the code comment can be improved.

    Raimo33 commented at 8:46 pm on September 3, 2025:
    Added comment and link to chatgpt deepsearch. might not be very professional but it includes the full analysis and lots of citations.
  34. in src/bench.h:64 in 66745f741f outdated
    75+    /* WARN: timer is influenced by environvment (OS, scheduling, interrupts...) */
    76     struct timeval tv;
    77-    gettimeofday(&tv, NULL);
    78-    return (int64_t)tv.tv_usec + (int64_t)tv.tv_sec * 1000000LL;
    79+    gettimeofday((struct timeval*)&tv, NULL);
    80+    return (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
    


    real-or-random commented at 6:42 am on September 3, 2025:

    How is this diff an improvement?

    • The pointer cast is a no-op.
    • LL is not needed.

    Raimo33 commented at 8:49 am on September 3, 2025:
    you’re right. removed.
  35. in src/bench.h:39 in 66745f741f outdated
    69+    struct timespec ts;
    70+    if (clock_gettime(CLOCK_REALTIME, &ts) != 0) {
    71+        return 0;
    72+    }
    73+    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
    74+# else
    


    real-or-random commented at 6:43 am on September 3, 2025:
    This could be DRYed more. Maybe it’s a good idea to print to the user which clock method is used.

    Raimo33 commented at 8:16 am on September 3, 2025:

    yes, I tried everything you mentioned.

    removing redundancy here can be done by just using the ifdefs for the constant, but you’d need a different path for the last fallback. the last case uses tv, not ts.

    I tried issuing warnings with #warning to tell the user about the precision of the clock but it’s not compatible with c90.


    Raimo33 commented at 8:56 am on September 3, 2025:

    Here’s the DRY’d version:

     0# if defined(CLOCK_PROCESS_CPUTIME_ID)
     1    /* in theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core. see https://linux.die.net/man/3/clock_gettime */
     2const clockid_t clk_id = CLOCK_PROCESS_CPUTIME_ID;
     3# elif defined(CLOCK_MONOTONIC)
     4    /* WARN: timer is influenced by environvment (OS, scheduling, interrupts...) */
     5const clockid_t clk_id = CLOCK_MONOTONIC;
     6# elif defined(CLOCK_REALTIME)
     7    /* WARN: timer is influenced by environvment (OS, scheduling, interrupts...) */
     8const clockid_t clk_id = CLOCK_REALTIME;
     9# else
    10# define USE_GETTIMEOFDAY
    11# endif
    12
    13# ifndef USE_GETTIMEOFDAY
    14struct timespec ts;
    15if (clock_gettime(clk_id, &ts) != 0) {
    16    return 0;
    17}
    18return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
    19# else
    20    /* WARN: timer is influenced by environvment (OS, scheduling, interrupts...) */
    21struct timeval tv;
    22gettimeofday(&tv, NULL);
    23return tv.tv_usec + (int64_t)tv.tv_sec * 1000000;
    24# endif
    

    less readable imo


    real-or-random commented at 6:55 pm on September 3, 2025:

    I tried issuing warnings with #warning to tell the user about the precision of the clock but it’s not compatible with c90.

    What I had in mind is printing the warning out at runtime.


    Raimo33 commented at 9:36 pm on September 3, 2025:
    good suggestion. done.
  36. in src/bench.h:28 in 66745f741f outdated
    35-    struct timespec tv;
    36-    if (!timespec_get(&tv, TIME_UTC)) {
    37-        fputs("timespec_get failed!", stderr);
    38-        exit(EXIT_FAILURE);
    39+    if (GetProcessTimes(GetCurrentProcess(), &creation, &exit, &kernel, &user) == 0) {
    40+        return 0;
    


    real-or-random commented at 6:56 am on September 3, 2025:
    What’s the rationale for the change on Windows?

    Raimo33 commented at 8:19 am on September 3, 2025:
    previously wall-clock times. now per-process cpu time.
  37. Raimo33 commented at 8:30 am on September 3, 2025: none

    The point is exactly having a simple way of verifying that there’s indeed no interference. Getting rid of sources of variance is hard to get right, and it’s impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off.

    fine, but at least let’s make it optional. I don’t like my benchmarks being 10 times slower just because min/max need to be computed. if I say 20'000 iterations I want it to be 20`000 iterations.

  38. Raimo33 force-pushed on Sep 3, 2025
  39. Raimo33 requested review from real-or-random on Sep 3, 2025
  40. Raimo33 force-pushed on Sep 3, 2025
  41. Raimo33 force-pushed on Sep 3, 2025
  42. Raimo33 force-pushed on Sep 3, 2025
  43. Raimo33 force-pushed on Sep 4, 2025
  44. Raimo33 force-pushed on Sep 4, 2025
  45. Raimo33 force-pushed on Sep 4, 2025
  46. Raimo33 force-pushed on Sep 6, 2025
  47. Raimo33 force-pushed on Sep 6, 2025
  48. Raimo33 force-pushed on Sep 6, 2025
  49. real-or-random commented at 8:43 am on September 8, 2025: contributor

    The point is exactly having a simple way of verifying that there’s indeed no interference. Getting rid of sources of variance is hard to get right, and it’s impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off.

    fine, but at least let’s make it optional. I don’t like my benchmarks being 10 times slower just because min/max need to be computed. if I say 20'000 iterations I want it to be 20`000 iterations.

    Not sure. I suggest making this PR focused on a single (uncontroversial) change, which is switching to per process clocks.

  50. Raimo33 force-pushed on Sep 8, 2025
  51. Raimo33 renamed this:
    [WIP] refactor: remove system interference from benchmarks
    bench: replace wall-clock timer with per-process CPU timer
    on Sep 8, 2025
  52. Raimo33 force-pushed on Sep 8, 2025
  53. Raimo33 force-pushed on Sep 8, 2025
  54. in src/bench.h:44 in f9d2ddec4a outdated
    56+
    57+# if defined(CLOCK_PROCESS_CPUTIME_ID)
    58+    /* In theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core. 
    59+    * See https://linux.die.net/man/3/clock_gettime. 
    60+    * In practice, modern CPUs have largely addressed this issue. The worst case scenario is a unreliable benchmark. 
    61+    * See https://chatgpt.com/s/dr_68b7486a05a481919d9f121d182cc0cf 
    


    sipa commented at 3:18 pm on September 8, 2025:
    Can you find a source that’s a bit more authoritative than ChatGPT?

    Raimo33 commented at 3:55 pm on September 8, 2025:
    yes. Added multiple links.
  55. in src/bench.h:209 in f9d2ddec4a outdated
    203@@ -176,6 +204,14 @@ static int get_iters(int default_iters) {
    204     }
    205 }
    206 
    207+static void print_clock_info(void) {
    208+#if defined(_WIN32) || defined(CLOCK_PROCESS_CPUTIME_ID)
    209+    printf("INFO: using CPU timer, results are not influenced by other running processes\n\n");
    


    sipa commented at 3:19 pm on September 8, 2025:
    That seems like an exaggeration; hyperthreading, task swapping, and competing memory bandwidth, may all still affect the results.

    Raimo33 commented at 3:50 pm on September 8, 2025:
    you’re right. memory bandwith, cache pollution, hyperthreading, hot/cold branch predictor… But task swapping shouldn’t influence CLOCK_PROCESS_CPUTIME_ID. I changed the message to notify the user that he should probably pin the process.

    real-or-random commented at 7:22 am on September 9, 2025:

    I changed the message to notify the user that he should probably pin the process.

    If that’s not required on modern CPUs, why advise to do this?

    We could just say “INFO: Using per-process CPU timer”.

  56. Raimo33 force-pushed on Sep 8, 2025
  57. Raimo33 force-pushed on Sep 8, 2025
  58. in src/bench.h:42 in bbc8bb7003 outdated
    54+
    55+#else /* POSIX */
    56+
    57+# if defined(CLOCK_PROCESS_CPUTIME_ID)
    58+    /* In theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core. see https://man7.org/linux/man-pages/man2/clock_gettime.2.html *
    59+     * in practice, modern CPUs have synchronized TSCs which addresses this issue. see https://docs.amd.com/r/en-US/ug1586-onload-user/Timer-TSC-Stability   */
    


    real-or-random commented at 7:14 am on September 9, 2025:
    0    /* In theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core,
    1     * see `man clock_gettime` on Linux. In practice, modern CPUs have synchronized TSCs which
    2     * address this issue, see https://docs.amd.com/r/en-US/ug1586-onload-user/Timer-TSC-Stability . */
    
  59. Raimo33 force-pushed on Sep 9, 2025
  60. Raimo33 commented at 9:11 am on September 9, 2025: none
    I’ve applied the recommended changes and split the PR into multiple commits for simpler diffs
  61. Raimo33 force-pushed on Sep 9, 2025
  62. real-or-random added the label tweak/refactor on Sep 9, 2025
  63. l0rinc commented at 5:09 pm on September 9, 2025: none
    I haven’t investigated this in detail but couldn’t find any related issue: were there any efforts to use https://nanobench.ankerl.com like we do in Core instead? That will likely need some adjustments on the benchmarks themselves, but at least we wouldn’t be rolling our own framework.
  64. Raimo33 commented at 5:51 pm on September 9, 2025: none

    not familiar with nano bench, but I can see it uses std::chrono. which uses CLOCK_MONOTONIC (our fallback) under the hood, not TSC.

    also the user would need to have a c++ compiler, not a big real but I think this repo needs to strictly adhere to c89.

  65. Raimo33 commented at 6:02 pm on September 9, 2025: none
    I realized the windows version I implemented was using a unprecise clock. drafting. will fix.
  66. Raimo33 marked this as a draft on Sep 9, 2025
  67. Raimo33 force-pushed on Sep 9, 2025
  68. Raimo33 force-pushed on Sep 9, 2025
  69. Raimo33 commented at 11:12 pm on September 9, 2025: none
    I changed windows clock to QueryPerformanceCounter() which has microsecond precision. I also removed if checks from clock_gettime() as it can only fail for programming errors.
  70. real-or-random commented at 6:26 am on September 10, 2025: contributor

    I haven’t investigated this in detail but couldn’t find any related issue: were there any efforts to use nanobench.ankerl.com like we do in Core instead?

    nanobench is a framework for benchmarking C++ but this is C. Of course, we could still call our C from C++, but adding C++ to the code base seems a bit overkill to me if if’s “just” for the purpose of benchmarks. I agree that using an existing framework may be a good idea, but I’m not aware of any in pure C.

  71. Raimo33 marked this as ready for review on Sep 10, 2025
  72. l0rinc commented at 9:57 pm on September 10, 2025: none

    but adding C++ to the code base seems a bit overkill to me if if’s “just” for the purpose of benchmarks

    The purpose would be to standardize the benchmarking to avoid reinventing solutions to problems that aren’t strictly related to the purpose of this library. It would also help with trusting the result more, since we already know how that benchmarking library behaves. The main client of this library is C++ code, it doesn’t seem far-fetched to me to test that.

  73. real-or-random commented at 6:36 pm on September 15, 2025: contributor

    but adding C++ to the code base seems a bit overkill to me if if’s “just” for the purpose of benchmarks

    The purpose would be to standardize the benchmarking to avoid reinventing solutions to problems that aren’t strictly related to the purpose of this library. It would also help with trusting the result more, since we already know how that benchmarking library behaves. The main client of this library is C++ code, it doesn’t seem far-fetched to me to test that.

    Sorry, I didn’t want to be discouraging. I don’t think at all that nanobench is a far-fetched idea. And I truly agree that reinventing the wheel is not great.

    In the end, what counts is the maintenance burden. My main concern with C++ is that I’m not sure how fluent our regular contributors are in C++. (I’m not, but luckily I’m not the only one here.) If the bit of C++ is harder to touch (for us) than this code, then nanobench is not a good idea. If, on the other hand, we have people who can maintain C++, or if we get support from the Core contributors who are familiar with nanobench, then this can be a win.

    I’d be happy to see a demo/WIP PR but if you feel that this is a lot of work, then it might be a good idea to see what other contributors and maintainers think.

  74. kmk142789 approved
  75. Raimo33 commented at 3:34 pm on September 17, 2025: none
    This PR currently conflicts with https://github.com/bitcoin-core/secp256k1/pull/1734
  76. furszy commented at 4:16 pm on September 17, 2025: member

    This PR currently conflicts with #1734

    Don’t worry about it. #1734 only moves gettime() so it can be accessed by the test framework too. Any changes you make here will function correctly there as well, and vice versa.

  77. in src/bench.h:197 in 3e58a91b77
    189@@ -176,6 +190,14 @@ static int get_iters(int default_iters) {
    190     }
    191 }
    192 
    193+static void print_clock_info(void) {
    194+#if defined(CLOCK_PROCESS_CPUTIME_ID)
    195+    printf("INFO: Using per-process CPU timer\n\n");
    196+#else
    197+    printf("WARN: using Wall-Clock timer, results are highly influenced by other running processes\n\n");
    


    real-or-random commented at 11:03 am on September 22, 2025:
    0    printf("WARN: Using wall-clock timer instead of per-process CPU timer.\n\n");
    

    Let’s just stick to the facts and omit the interpretation of them, this makes it easier to move forward. ;)

  78. in README.md:162 in 3e58a91b77
    158@@ -159,6 +159,8 @@ Benchmark
    159 ------------
    160 If configured with `--enable-benchmark` (which is the default), binaries for benchmarking the libsecp256k1 functions will be present in the root directory after the build.
    161 
    162+For reliable benchmarks, it is strongly recommended to pin the process to a dedicated CPU core and to disable CPU frequency scaling.
    


    real-or-random commented at 11:05 am on September 22, 2025:
    0For reliable benchmarks, it is strongly recommended to pin the process to a single CPU core and to disable CPU frequency scaling.
    

    Dedicated sounds as if the core is only responsible for this process, which would be nice but it’s typically not doable.

  79. real-or-random approved
  80. real-or-random commented at 11:06 am on September 22, 2025: contributor
    ACK mod nits, still need to test this
  81. Raimo33 force-pushed on Sep 22, 2025
  82. in src/bench.c:7 in 8925b95882
    3@@ -4,6 +4,8 @@
    4  * file COPYING or https://www.opensource.org/licenses/mit-license.php.*
    5  ***********************************************************************/
    6 
    7+#define _POSIX_C_SOURCE 200112L /* for clock_gettime() */
    


    hebasto commented at 2:26 pm on September 22, 2025:
    Are there any references explaining why the value 200112L was chosen?

    Raimo33 commented at 2:53 pm on September 22, 2025:
    no particular reason besides the fact that it is the bare minimum version to make it work on my machine and CI. The reported 199309L version number didn’t expose the necessary macros.

    hebasto commented at 2:20 pm on September 25, 2025:

    The reported 199309L version number didn’t expose the necessary macros.

    I can’t reproduce this on my machine.

    no particular reason besides the fact that it is the bare minimum version to make it work on my machine…

    Could you share your build environment details so I can try to reproduce the issue?

    … and CI.

    It seems that CI is fine with the documented 199309L value. See: https://github.com/hebasto/secp256k1/actions/runs/18008380590

  83. hebasto commented at 2:27 pm on September 22, 2025: member

    According to https://www.man7.org/linux/man-pages/man3/clock_gettime.3.html:

    Link with -lrt (only for glibc versions before 2.17).

    I think this check, and adding the -lrt flag if necessary, should be included in both build systems.

  84. Raimo33 commented at 3:06 pm on September 22, 2025: none

    I think this check, and adding the -lrt flag if necessary, should be included in both build systems.

    Agreed, good catch! what do you think about something like this?

    0include(CheckFunctionExists)
    1check_function_exists(clock_gettime HAVE_CLOCK_GETTIME)
    2if(NOT HAVE_CLOCK_GETTIME)
    3    # On some platforms, clock_gettime requires librt
    4    target_link_libraries(your_target PRIVATE rt)
    5endif()
    
  85. hebasto commented at 3:27 pm on September 22, 2025: member

    Friendly ping @martinus, benchmark connoisseur :)

    Could you please give a rough estimate of these changes at the concept level?

  86. real-or-random commented at 3:41 pm on September 22, 2025: contributor

    Could you please give a rough estimate of these changes at the concept level?

    And what’s your take on using nanobench instead, even though this is a pure C library instead of C++?

  87. hebasto commented at 3:48 pm on September 22, 2025: member

    Goal

    This PR refactors the benchmarking functions as per #1701, in order to make benchmarks more deterministic and less influenced by the environvment.

    This is achieved by replacing Wall-Clock Timer with Per-Process CPU Timer when possible.

    I’ve tested 8925b9588259d27257f42eb4fb8bebd4a7be4d95 by running it alongside stress --cpu $(nproc) and observed the same effect on benchmark average values as on the master branch.

    Could you please clarify how one would measure or observe the claimed “less influenced by the environment” behaviour?

  88. Raimo33 commented at 8:53 am on September 25, 2025: none

    stress --cpu spawns workers that use 100% of the cpu. your test pegs every core to 100% utilization. and each gets probably pinned to a physical core. therefore you won’t see any difference in terms of variance.

    a better example would be starting various threads with random usage spikes at random times on the same core the benchmark is running on.

    basically, if you have no other processes running on that CPU, then this PR won’t show any improvement.

    this PR helps in the more realistic scenario where a user has background processes running, and the scheduler assigns multiple of them on the same CPU that’s running the benchmark, putting the thread to sleep, which the wall clock timer doesn’t account for.

    also, this PR is not merely a replacement of wall clock time with CPU time, but it also modernizes the clock function as per the Unix standard.

  89. hebasto commented at 3:08 pm on September 25, 2025: member

    I think this check, and adding the -lrt flag if necessary, should be included in both build systems.

    Agreed, good catch! what do you think about something like this?

    0include(CheckFunctionExists)
    1check_function_exists(clock_gettime HAVE_CLOCK_GETTIME)
    2if(NOT HAVE_CLOCK_GETTIME)
    3    # On some platforms, clock_gettime requires librt
    4    target_link_libraries(your_target PRIVATE rt)
    5endif()
    

    Feel free to grab f59c45f2cef03681074eef00046d3b146daf42e8 from https://github.com/hebasto/secp256k1/commits/pr1732/0925.cmake.

    UPD. @real-or-random do we need a CI job to build on some old system with old glibc?

  90. in src/bench.h:200 in 8925b95882 outdated
    195+    printf("INFO: Using per-process CPU timer\n\n");
    196+#else
    197+    printf("WARN: Using wall-clock timer instead of per-process CPU timer.\n\n");
    198+#endif
    199+}
    200+
    


    hebasto commented at 3:10 pm on September 25, 2025:

    5fe7c017682240295d701570f052b6ddfb2d1fb7:

    Do these messages make sense on Windows?


    Raimo33 commented at 3:19 pm on September 25, 2025:
    while on windows there’s no option for per-process cpu clock (or at least not high precision), I still think issuing the warning is fine.

    hebasto commented at 3:35 pm on September 25, 2025:
    It seems a bit of a stretch to call the native Windows QPC framework a “wall-clock timer”.

    Raimo33 commented at 4:26 pm on September 25, 2025:
    what about this? “WARN: Using global timer instead of per-process CPU timer.”
  91. bench: replace wall-clock timers with cpu-timers where possible
    This commit improves the reliability of benchmarks by removing some of the influence of other background running processes. This is achieved by using CPU bound clocks that aren't influenced by interrupts, sleeps, blocked I/O, etc.
    7da009204b
  92. bench: print clock info ce6d39cba5
  93. docs: add benchmarking best practices cce0147250
  94. Raimo33 commented at 4:54 pm on September 25, 2025: none

    Feel free to grab f59c45f from https://github.com/hebasto/secp256k1/commits/pr1732/0925.cmake.

    This seems overcomplicated and convoluted. but if you manage to simplify it I’ll include your commit.

  95. Raimo33 force-pushed on Sep 25, 2025
  96. hebasto commented at 5:12 pm on September 25, 2025: member

    Approach NACK cce014725032a58416237e38aedd2033ba2202f5.

    The current implementation breaks compatibility with systems using glibc versions prior to 2.17:

     0$ ldd --version | head -1
     1ldd (Ubuntu EGLIBC 2.15-0ubuntu10.23) 2.15
     2$ cmake -B build
     3$ cmake --build build -t bench
     4<snip>
     5[100%] Linking C executable ../bin/bench
     6CMakeFiles/bench.dir/bench.c.o: In function `gettime_us':
     7/secp256k1/src/bench.h:45: undefined reference to `clock_gettime'
     8collect2: ld returned 1 exit status
     9make[3]: *** [bin/bench] Error 1
    10make[2]: *** [src/CMakeFiles/bench.dir/all] Error 2
    11make[1]: *** [src/CMakeFiles/bench.dir/rule] Error 2
    12make: *** [bench] Error 2
    
  97. Raimo33 commented at 3:07 pm on September 28, 2025: none

    Fixed by unconditionally adding -lrt when available.

    0root@5997b2bd1aca:/workspace# ldd --version | head -1
    1ldd (Ubuntu EGLIBC 2.15-0ubuntu10.23) 2.15
    2root@5997b2bd1aca:/workspace# cmake --build build -t bench
    3[ 33%] Built target secp256k1_precomputed
    4[ 66%] Built target secp256k1
    5[100%] Built target bench
    
  98. in src/CMakeLists.txt:132 in 9128541044 outdated
    124@@ -125,12 +125,17 @@ endif()
    125 unset(${PROJECT_NAME}_soversion)
    126 
    127 if(SECP256K1_BUILD_BENCHMARK)
    128+  find_library(RT_LIBRARY rt)
    129+  add_library(optional_rt INTERFACE)
    130+  if(RT_LIBRARY)
    131+    target_link_libraries(optional_rt INTERFACE rt)
    132+  endif()
    


    purpleKarrot commented at 3:16 pm on September 29, 2025:

    Could you please extract this logic into a find-module, like FindRT.cmake? That module should provide an IMPORTED target with a namespace in the name.

    Have a look at the FindIconv.cmake module as an example. This module also provides an interface library for functionality that may be found in an actual library or built into the C standard library.


    Raimo33 commented at 1:20 pm on October 1, 2025:
    This looks compilicated, almost overcomplicated, without much benefits, I won’t be able to replicate FindIconv.cmake on my own. but you’re welcome to provide a commit and I’ll see if it should be cherrypicked.
  99. in README.md:162 in 9128541044 outdated
    158@@ -157,6 +159,8 @@ Benchmark
    159 ------------
    160 If configured with `--enable-benchmark` (which is the default), binaries for benchmarking the libsecp256k1 functions will be present in the root directory after the build.
    161 
    162+For reliable benchmarks, it is strongly recommended to pin the process to a single CPU core and to disable CPU frequency scaling.
    


    purpleKarrot commented at 3:35 pm on September 29, 2025:

    It would be great if the benchmark was configured in a way such that the guideline is followed.

    0add_test(NAME secp256k1_bench COMMAND ...)
    1set_tests_properties(secp256k1_bench PROPERTIES
    2  PROCESSORS 1
    3  PROCESSOR_AFFINITY ON
    4)
    

    I am not aware how to disable scaling, unfortunately.


    Raimo33 commented at 1:54 pm on October 1, 2025:
    This is a great idea, added.
  100. refactor: reorder cmake commands for better readability 9219b13918
  101. in src/CMakeLists.txt:133 in 9128541044
    124@@ -125,12 +125,17 @@ endif()
    125 unset(${PROJECT_NAME}_soversion)
    126 
    127 if(SECP256K1_BUILD_BENCHMARK)
    128+  find_library(RT_LIBRARY rt)
    129+  add_library(optional_rt INTERFACE)
    130+  if(RT_LIBRARY)
    131+    target_link_libraries(optional_rt INTERFACE rt)
    132+  endif()
    133   add_executable(bench bench.c)
    


    purpleKarrot commented at 3:43 pm on September 29, 2025:

    Note that CMake requires that

    • test names are unique per directory
    • target names are unique.

    While it is not strictly required to prefix test names with the project, it is still recommended, because it allows users to control which tests to run: ctest -R secp256k1_. This recommendation is already followed, good.

    For target names, this requirement is much stronger. Since this project is used as a subproject in bitcoin, neither bitcoin nor any other of its subprojects can define a target with the name bench. It would be recommended to rename the target to secp256k1_bench.


    Raimo33 commented at 1:25 pm on October 1, 2025:
    Agreed, done!

    Raimo33 commented at 1:54 pm on October 1, 2025:
    although CI scripts now need to change to reflect the executable name changes
  102. Raimo33 commented at 2:01 pm on October 1, 2025: none

    As per @purpleKarrot suggestion, benchmarks now can be run via ctest as well (as opposed to ./bench_name). This would be the preferable way to run them as it automatically handles CPU pinning and affinity. This means that, if both SECP256K1_BUILD_BENCHMARK and SECP256K1_BUILD_TESTS flags are set, the standard ctest command will run benchs+tests. This can be avoided by selectively choosing which “tests” to run with the following ctest syntax: ctest -R bench or ctest -R test (which uses regex).

    Otherwise we can evaluate the possibility of using labels for groups of tests.

  103. Raimo33 force-pushed on Oct 1, 2025
  104. Raimo33 force-pushed on Oct 1, 2025
  105. Raimo33 force-pushed on Oct 1, 2025
  106. Raimo33 force-pushed on Oct 1, 2025
  107. Raimo33 force-pushed on Oct 1, 2025
  108. Raimo33 force-pushed on Oct 1, 2025
  109. build: rename executables with prefix d2d790b4d4
  110. build: add benchmark compatibility with older glibc versions e95d277aac
  111. build: addbenchmarks to ctest 6f0e5d4461
  112. Raimo33 force-pushed on Oct 2, 2025
  113. real-or-random commented at 11:57 am on October 13, 2025: contributor
    @hebasto @purpleKarrot Want to re-review this? :)

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-10-13 19:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me