Goal
This PR refactors the benchmarking functions as per #1701, in order to make benchmarks more deterministic and less influenced by the environment. This is achieved by replacing the wall-clock timer with a per-process CPU timer when possible.
Just some quick comments:
> [x] remove the number of runs (`count`) in favor of a simpler, cleaner approach with just the number of iterations (`iter`).
I think there’s a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.
> [x] remove `min` and `max` statistics in favor of a simpler approach with just `avg`.
I think `min` and `max` are useful. For constant-time code, you can also compare `min`. And `max` gives you an idea of whether there were fluctuations or not.
> [x] remove the needless fixed-point conversion in favor of simpler floating-point divisions.
Well, okay, that has a history; see #689. It's debatable if it makes sense to avoid floating-point math, but as long as it doesn't get in your way here, it's a cool thing to keep. :D
`min` and `max` just complicate things. Let me explain: first of all, as it is right now, they don't even measure the min and max; they just measure the min/max of the *averages* of all runs, i.e. not the absolute values. Furthermore, in order to have them, one would need to run all the iterations 10 times more. Benchmarks are already slow; adding this min/max slows them down 10-fold. IMHO it's completely unnecessary. @real-or-random
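For illustration, here is a minimal sketch of the scheme being described (hypothetical code, not the actual bench.c; `bench_sketch` and `now_us` are made-up names): min/max come from the per-run averages, and reporting them costs `count * iter` total iterations.

```c
#include <float.h>
#include <stdint.h>

/* Hypothetical sketch, not the actual bench code: `count` runs of `iter`
 * iterations each. Note that min/max are the min/max of the per-run
 * *averages*, and the total work is count * iter iterations. */
static void bench_sketch(void (*fn)(void), int count, int iter,
                         int64_t (*now_us)(void),
                         double *min_us, double *max_us, double *avg_us) {
    int run, i;
    double total = 0.0;
    *min_us = DBL_MAX;
    *max_us = 0.0;
    for (run = 0; run < count; run++) {
        int64_t begin = now_us();
        double run_avg;
        for (i = 0; i < iter; i++) {
            fn();
        }
        run_avg = (double)(now_us() - begin) / iter;
        if (run_avg < *min_us) *min_us = run_avg;
        if (run_avg > *max_us) *max_us = run_avg;
        total += run_avg;
    }
    *avg_us = total / count;
}
```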
If we're going to rework this, I'd suggest using the stabilized quartiles approach from https://cr.yp.to/papers/rsrst-20250727.pdf.
> I think there's a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.
Right now all benchmarks are run with `count=10` and fixed iters (apart from `ecmult_multi`, which adjusts the number of iters, not the count). Therefore `count` is only useful to extrapolate `min` and `max`.
> Well, okay, that has a history; see #689. It's debatable if it makes sense to avoid floating-point math, but as long as it doesn't get in your way here, it's a cool thing to keep. :D
I disagree with #689. It overcomplicates things for the sake of not having floating-point math. Those divisions aren't even in the hot path; they're outside the benchmarks.
Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving them is welcome, but removing them is a step backwards.
What is the usefulness of measuring min/max when we are taking OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is.
`gettimeofday()` has been officially discouraged since 2008 in favor of `clock_gettime()`. The POSIX standard marks it as obsolescent but still provides the API for backward compatibility.
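For reference, a minimal sketch of the non-obsolescent equivalent (still a wall clock either way; `wall_time_us` is a made-up name):

```c
#include <stdint.h>
#include <time.h>

/* Wall-clock microseconds via clock_gettime(); same behavior as the old
 * gettimeofday() code path, just through the non-obsolescent API. */
static int64_t wall_time_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}
```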
Even though the manual says that `CLOCK_PROCESS_CPUTIME_ID` is only useful if the process is locked to a core, modern CPUs have largely addressed this issue. So I think it is fair to compile with `CLOCK_PROCESS_CPUTIME_ID` even though we don't have the guarantee that the user has pinned the benchmarking process to a core. The worst case scenario is an unreliable benchmark, which is what the current repo has anyway.
I added a line to README.md with best practices for running the benchmarks.
I also tried adding a function to pin the process to a core directly in C, but there's no standard, POSIX-compliant way to do so. There is `pthread_setaffinity_np()` on Linux, where "np" stands for "non-portable".
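For reference, a Linux-only sketch of what pinning in C could look like (this uses `sched_setaffinity()`, which, like `pthread_setaffinity_np()`, is a non-portable extension, hence it was left out of the PR):

```c
/* Linux-only sketch; sched_setaffinity() is a non-POSIX extension,
 * so this cannot go into portable code. */
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling process (pid 0) to the given core; returns 0 on success. */
static int pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```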
> Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving them is welcome, but removing them is a step backwards.
>
> What is the usefulness of measuring min/max when we are taking OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is.
The point is exactly having a simple way of verifying that there’s indeed no interference. Getting rid of sources of variance is hard to get right, and it’s impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off.
I like the stabilized quartiles idea.
> I like the stabilized quartiles idea.
tbh it scares me a bit, will see what I can do. Maybe in a future PR.
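For whoever picks this up later, a rough sketch of quartile-based reporting (one reading of the idea, not necessarily the paper's exact procedure): collect one timing per batch, sort, and report Q1/median/Q3 instead of min/avg/max.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_int64(const void *a, const void *b) {
    int64_t x = *(const int64_t *)a;
    int64_t y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

/* Sort the per-batch timings in place and print the quartiles.
 * Unlike min/max, quartiles are robust against the occasional
 * interrupted batch (interrupts, migrations). */
static void print_quartiles(int64_t *times, size_t n) {
    qsort(times, n, sizeof(*times), cmp_int64);
    printf("q1=%ld median=%ld q3=%ld\n",
           (long)times[n / 4], (long)times[n / 2], (long)times[(3 * n) / 4]);
}
```

The spread between Q1 and Q3 would still give the variance-spotting property discussed above, without being dominated by outliers.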
```diff
@@ -30,6 +30,8 @@ set(${PROJECT_NAME}_LIB_VERSION_AGE 0)
 #=============================
 set(CMAKE_C_STANDARD 90)
 set(CMAKE_C_EXTENSIONS OFF)
+# Enable POSIX features while maintaining ISO C compliance
+add_compile_definitions(_POSIX_C_SOURCE=200112L) # needed for `clock_gettime()` in bench.c
```
Only needed for `bench.c`? Then this could go in `src/CMakeLists.txt`, targeting only the benchmarks.
```diff
@@ -159,6 +159,8 @@ Benchmark
 ------------
 If configured with `--enable-benchmark` (which is the default), binaries for benchmarking the libsecp256k1 functions will be present in the root directory after the build.
 
+For an accurate benchmark, it is strongly recommended to pin the process to a dedicated CPU core and to disable CPU frequency scaling.
```
```diff
@@ -7,30 +7,63 @@
 #ifndef SECP256K1_BENCH_H
 #define SECP256K1_BENCH_H
 
+#if defined(_WIN32)
+# include <windows.h>
+#else
+# include <time.h>
+# include <sys/time.h>
```
One include is for `gettimeofday()` and one for `clock_gettime()`; they don't conflict.
Just tried. Unfortunately, the macros to detect whether `gettimeofday()` or `clock_gettime()` is needed are defined in the `time.h` header itself, so the only viable solution is:
```c
#include <time.h>
#if !defined(CLOCK_PROCESS_CPUTIME_ID) && !defined(CLOCK_MONOTONIC) && !defined(CLOCK_REALTIME)
# include <sys/time.h>
#endif
```
which every LLM tells me is "inherently unreliable" and can lead to portability issues: "Including sys/time.h on modern POSIX systems that already have time.h and clock_gettime doesn't hurt, and ensures gettimeofday is available if needed." "Trying to skip it adds complexity with almost no benefit."
Why not check for `clock_gettime` the POSIX way? https://pubs.opengroup.org/onlinepubs/9799919799/functions/clock_getres.html says that the proper macro to check is `_POSIX_CPUTIME`, and that `CLOCK_MONOTONIC` must be supported, so there's no need to check for it.
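For context, the compile-time check POSIX recommends would look something like this (sketch; `BENCH_HAVE_CPUTIME` is a made-up name):

```c
#include <unistd.h> /* defines the _POSIX_* option macros */
#include <time.h>

/* POSIX option check: a value greater than 0 means the per-process
 * CPU-time clock is supported at compile time; a value of 0 means
 * "ask at runtime via sysconf(_SC_CPUTIME)". */
#if defined(_POSIX_CPUTIME) && _POSIX_CPUTIME > 0
# define BENCH_HAVE_CPUTIME 1
#endif
```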
With `CLOCK_MONOTONIC` support being mandatory, this makes it way simpler: no need to include `<sys/time.h>` anymore. `CLOCK_REALTIME` stays as the last fallback.
The `_POSIX_CPUTIME` and `_POSIX_MONOTONIC_CLOCK` macros are to be used when you want to ensure that the library has mandatory support for those features. In my case, for example, gcc doesn't expose `_POSIX_MONOTONIC_CLOCK` but still supports the monotonic clock. Therefore it's better to check `CLOCK_PROCESS_CPUTIME_ID` and `CLOCK_MONOTONIC` directly.
```diff
+
+    return (int64_t)((k.QuadPart + u.QuadPart) / 10);
+#else /* POSIX */
+
+# if defined(CLOCK_PROCESS_CPUTIME_ID)
+    /* in theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core. see https://linux.die.net/man/3/clock_gettime */
```

[…]

```diff
+    /* WARN: timer is influenced by environment (OS, scheduling, interrupts...) */
     struct timeval tv;
-    gettimeofday(&tv, NULL);
-    return (int64_t)tv.tv_usec + (int64_t)tv.tv_sec * 1000000LL;
+    gettimeofday((struct timeval*)&tv, NULL);
+    return (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
```
How is this diff an improvement?
`LL` is not needed.

```diff
+    struct timespec ts;
+    if (clock_gettime(CLOCK_REALTIME, &ts) != 0) {
+        return 0;
+    }
+    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
+# else
```
Yes, I tried everything you mentioned. Removing the redundancy here can be done by just using the ifdefs for the clock constant, but you'd need a different path for the last fallback: the last case uses `tv`, not `ts`.
I tried issuing warnings with `#warning` to tell the user about the precision of the clock, but it's not compatible with C90.
Here’s the DRY’d version:
```c
# if defined(CLOCK_PROCESS_CPUTIME_ID)
    /* in theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core. see https://linux.die.net/man/3/clock_gettime */
    const clockid_t clk_id = CLOCK_PROCESS_CPUTIME_ID;
# elif defined(CLOCK_MONOTONIC)
    /* WARN: timer is influenced by environment (OS, scheduling, interrupts...) */
    const clockid_t clk_id = CLOCK_MONOTONIC;
# elif defined(CLOCK_REALTIME)
    /* WARN: timer is influenced by environment (OS, scheduling, interrupts...) */
    const clockid_t clk_id = CLOCK_REALTIME;
# else
#  define USE_GETTIMEOFDAY
# endif

# ifndef USE_GETTIMEOFDAY
    struct timespec ts;
    if (clock_gettime(clk_id, &ts) != 0) {
        return 0;
    }
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
# else
    /* WARN: timer is influenced by environment (OS, scheduling, interrupts...) */
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_usec + (int64_t)tv.tv_sec * 1000000;
# endif
```
Less readable, IMO.
> I tried issuing warnings with `#warning` to tell the user about the precision of the clock, but it's not compatible with C90.
What I had in mind is printing the warning out at runtime.
```diff
-    struct timespec tv;
-    if (!timespec_get(&tv, TIME_UTC)) {
-        fputs("timespec_get failed!", stderr);
-        exit(EXIT_FAILURE);
+    if (GetProcessTimes(GetCurrentProcess(), &creation, &exit, &kernel, &user) == 0) {
+        return 0;
```
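Piecing the two Windows fragments together, the full conversion presumably looks something like this (sketch; `cpu_time_us` is a made-up name, and `GetProcessTimes()` reports kernel and user time as `FILETIME` values in 100 ns units):

```c
#include <windows.h>
#include <stdint.h>

/* Per-process CPU time (kernel + user) in microseconds. */
static int64_t cpu_time_us(void) {
    FILETIME creation, exit, kernel, user;
    ULARGE_INTEGER k, u;
    if (GetProcessTimes(GetCurrentProcess(), &creation, &exit, &kernel, &user) == 0) {
        return 0;
    }
    k.LowPart = kernel.dwLowDateTime;
    k.HighPart = kernel.dwHighDateTime;
    u.LowPart = user.dwLowDateTime;
    u.HighPart = user.dwHighDateTime;
    /* FILETIME counts 100 ns intervals; divide by 10 for microseconds. */
    return (int64_t)((k.QuadPart + u.QuadPart) / 10);
}
```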
> The point is exactly having a simple way of verifying that there's indeed no interference. Getting rid of sources of variance is hard to get right, and it's impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off.
Fine, but at least let's make it optional. I don't like my benchmarks being 10 times slower just because min/max need to be computed. If I say 20'000 iterations, I want it to be 20'000 iterations.
> > The point is exactly having a simple way of verifying that there's indeed no interference. Getting rid of sources of variance is hard to get right, and it's impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off.
>
> Fine, but at least let's make it optional. I don't like my benchmarks being 10 times slower just because min/max need to be computed. If I say 20'000 iterations, I want it to be 20'000 iterations.
Not sure. I suggest making this PR focused on a single (uncontroversial) change, which is switching to per-process clocks.
```diff
+
+# if defined(CLOCK_PROCESS_CPUTIME_ID)
+    /* In theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core.
+     * See https://linux.die.net/man/3/clock_gettime.
+     * In practice, modern CPUs have largely addressed this issue. The worst case scenario is an unreliable benchmark.
+     * See https://chatgpt.com/s/dr_68b7486a05a481919d9f121d182cc0cf
```
```diff
@@ -176,6 +204,14 @@ static int get_iters(int default_iters) {
     }
 }
 
+static void print_clock_info(void) {
+#if defined(_WIN32) || defined(CLOCK_PROCESS_CPUTIME_ID)
+    printf("INFO: using CPU timer, results are not influenced by other running processes\n\n");
```
…`CLOCK_PROCESS_CPUTIME_ID`. I changed the message to notify the user that they should probably pin the process.

> I changed the message to notify the user that they should probably pin the process.
If that’s not required on modern CPUs, why advise to do this?
We could just say “INFO: Using per-process CPU timer”.
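i.e. something like this (sketch; the `WARN` wording for the fallback branch is just a suggestion):

```c
#include <stdio.h>
#include <time.h>

static void print_clock_info(void) {
#if defined(_WIN32) || defined(CLOCK_PROCESS_CPUTIME_ID)
    printf("INFO: Using per-process CPU timer\n\n");
#else
    /* Hypothetical wording for the wall-clock fallback branch. */
    printf("WARN: using wall-clock timer, results may be influenced by other running processes\n\n");
#endif
}
```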
```diff
+
+#else /* POSIX */
+
+# if defined(CLOCK_PROCESS_CPUTIME_ID)
+    /* In theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core. see https://man7.org/linux/man-pages/man2/clock_gettime.2.html *
+     * in practice, modern CPUs have synchronized TSCs which addresses this issue. see https://docs.amd.com/r/en-US/ug1586-onload-user/Timer-TSC-Stability */
```
```c
    /* In theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core,
     * see `man clock_gettime` on Linux. In practice, modern CPUs have synchronized TSCs which
     * address this issue, see https://docs.amd.com/r/en-US/ug1586-onload-user/Timer-TSC-Stability . */
```
Not familiar with nanobench, but I can see it uses `std::chrono`, which uses `CLOCK_MONOTONIC` (our fallback) under the hood, not the TSC. Also, the user would need to have a C++ compiler; not a big deal, but I think this repo needs to strictly adhere to C89.
This commit improves the reliability of benchmarks by removing some of the influence of other processes running in the background. This is achieved by using CPU-bound clocks that aren't influenced by interrupts, sleeps, blocked I/O, etc.
…`QueryPerformanceCounter()`, which has microsecond precision. I also removed the `if` checks around `clock_gettime()`, as it can only fail due to programming errors.
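For reference, a sketch of how `QueryPerformanceCounter()` ticks could be converted to microseconds (hypothetical helper, not necessarily the PR's exact code):

```c
#include <windows.h>
#include <stdint.h>

/* High-resolution wall-clock microseconds on Windows. */
static int64_t perf_time_us(void) {
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&count);
    /* Split into whole seconds and remainder to avoid 64-bit overflow. */
    return (int64_t)(count.QuadPart / freq.QuadPart) * 1000000
         + (int64_t)(count.QuadPart % freq.QuadPart) * 1000000 / freq.QuadPart;
}
```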
I haven’t investigated this in detail but couldn’t find any related issue: were there any efforts to use nanobench.ankerl.com like we do in Core instead?
nanobench is a framework for benchmarking C++, but this is C. Of course, we could still call our C from C++, but adding C++ to the code base seems a bit overkill to me if it's "just" for the purpose of benchmarks. I agree that using an existing framework may be a good idea, but I'm not aware of any in pure C.
> but adding C++ to the code base seems a bit overkill to me if it's "just" for the purpose of benchmarks

The purpose would be to standardize the benchmarking and avoid reinventing solutions to problems that aren't strictly related to the purpose of this library. It would also help with trusting the results more, since we already know how that benchmarking library behaves. The main client of this library is C++ code, so it doesn't seem far-fetched to me to test that.
> > but adding C++ to the code base seems a bit overkill to me if it's "just" for the purpose of benchmarks
>
> The purpose would be to standardize the benchmarking and avoid reinventing solutions to problems that aren't strictly related to the purpose of this library. It would also help with trusting the results more, since we already know how that benchmarking library behaves. The main client of this library is C++ code, so it doesn't seem far-fetched to me to test that.
Sorry, I didn’t want to be discouraging. I don’t think at all that nanobench is a far-fetched idea. And I truly agree that reinventing the wheel is not great.
In the end, what counts is the maintenance burden. My main concern with C++ is that I’m not sure how fluent our regular contributors are in C++. (I’m not, but luckily I’m not the only one here.) If the bit of C++ is harder to touch (for us) than this code, then nanobench is not a good idea. If, on the other hand, we have people who can maintain C++, or if we get support from the Core contributors who are familiar with nanobench, then this can be a win.
I’d be happy to see a demo/WIP PR but if you feel that this is a lot of work, then it might be a good idea to see what other contributors and maintainers think.