Goal
This PR refactors the benchmarking functions as per #1701, in order to make benchmarks more deterministic and less influenced by the environment.
This is achieved by replacing the wall-clock timer with a per-process CPU timer when possible.
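As a minimal sketch of the idea (an illustration only, not the PR’s actual bench.h code; the function name `gettime_us` matches the diffs quoted below):

```c
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime() in <time.h> */
#include <stdint.h>
#include <time.h>

/* Prefer the per-process CPU clock, which does not advance while the
 * process is descheduled; fall back to the monotonic wall clock. */
static int64_t gettime_us(void) {
    struct timespec ts;
#if defined(CLOCK_PROCESS_CPUTIME_ID)
    clockid_t clk_id = CLOCK_PROCESS_CPUTIME_ID; /* CPU time: ignores other processes */
#else
    clockid_t clk_id = CLOCK_MONOTONIC;          /* wall clock: includes scheduler noise */
#endif
    if (clock_gettime(clk_id, &ts) != 0) {
        return 0;
    }
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}
```

Timing a benchmark is then just two calls to `gettime_us()` around the loop.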
Just some quick comments:
[x] remove the number of runs (`count`) in favor of a simpler, cleaner approach with just the number of iterations (`iter`).
I think there’s a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.
[x] remove `min` and `max` statistics in favor of a simpler approach with just `avg`.
I think min and max are useful. For constant-time code, you can also compare min. And max gives you an idea if there were fluctuations or not.
[x] remove needless fixed-point conversion in favor of simpler floating-point divisions.
Well, okay, that has a history; see #689. It’s debatable if it makes sense to avoid floating point math, but as long as it doesn’t get in your way here, it’s a cool thing to keep it. :D
min and max just complicate things. Let me explain:
First of all, as it is right now, they don’t even measure the min and max; they measure the min/max of the averages of all runs, not the absolute values. Furthermore, in order to have them, one would need to run all the iterations 10 times more. Benchmarks are already slow; adding this min/max slows them down 10-fold. imho it’s completely unnecessary. @real-or-random
If we’re going to rework this, I’d suggest using the stabilized quartiles approach from https://cr.yp.to/papers/rsrst-20250727.pdf:
> I think there’s a reason to have this. Some benchmarks take much longer than others, so it probably makes sense to run fewer iters for these.
right now all benchmarks are run with `count=10` and fixed iters (apart from ecmult_multi, which adjusts the number of iters, not count).
Therefore `count` is only useful to extrapolate min and max.
> Well, okay, that has a history; see #689. It’s debatable if it makes sense to avoid floating point math, but as long as it doesn’t get in your way here, it’s a cool thing to keep it. :D
I disagree with #689. It overcomplicates things for the sake of not having floating-point math. Those divisions aren’t even in the hot path; they’re outside the benchmarks.
Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.
what is the usefulness of measuring min/max when we are removing OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is.
`gettimeofday()` has been officially discouraged since 2008 in favor of `clock_gettime()`. The POSIX standard marks it as obsolescent but still provides the API for backward compatibility.
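The replacement is mechanical. A sketch of both interfaces producing microsecond timestamps (assuming a glibc-style system where both are available):

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <time.h>
#include <sys/time.h>

/* The obsolescent interface: microsecond resolution, wall clock only. */
static int64_t now_us_gettimeofday(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}

/* The modern replacement: nanosecond resolution and a choice of clocks
 * (CLOCK_REALTIME here, but also monotonic and per-process CPU clocks). */
static int64_t now_us_clock_gettime(void) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}
```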
Even though the manual says that CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core, modern CPUs have largely addressed this issue. So I think it is fair to compile CLOCK_PROCESS_CPUTIME_ID even though we don’t have the guarantee that the user has pinned the benchmarking process to a core. The worst case scenario is an unreliable benchmark, which the current repo has anyways.
I added a line in the README.md for best practices to run the benchmarks.
I also tried adding a function to pin the process to a core directly in C, but there’s no standard POSIX-compliant way to do so. There is pthread_setaffinity_np() on Linux, where ’np’ stands for ’not portable’.
> Concept NACK on removing any ability to observe variance in timing. The current min/avg/max are far from perfect, but they work fairly well in practice. Improving is welcome, but removing them is a step backwards.
> what is the usefulness of measuring min/max when we are removing OS interference & thermal throttling out of the equation? min/max will be extremely close to the avg no matter how bad the benchmarked function is.
The point is exactly having a simple way of verifying that there’s indeed no interference. Getting rid of sources of variance is hard to get right, and it’s impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off.
I like the stabilized quartiles idea.
> I like the stabilized quartiles idea.
tbh it scares me a bit, will see what I can do. Maybe in a future PR.
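As a rough illustration (not the paper’s exact procedure; `quartiles` and `compute_quartiles` are names made up here), reporting order statistics over sorted per-iteration timings is robust to outliers in a way the mean is not:

```c
#include <stdint.h>
#include <stdlib.h>

static int cmp_int64(const void *a, const void *b) {
    int64_t x = *(const int64_t *)a, y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

typedef struct { int64_t q1, median, q3; } quartiles;

/* Sort the samples and pick the 25th/50th/75th percentiles; a single
 * interrupt-induced outlier barely moves them, unlike the average. */
static quartiles compute_quartiles(int64_t *samples, size_t n) {
    quartiles q;
    qsort(samples, n, sizeof(*samples), cmp_int64);
    q.q1 = samples[n / 4];
    q.median = samples[n / 2];
    q.q3 = samples[(3 * n) / 4];
    return q;
}
```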
```diff
@@ -30,6 +30,8 @@ set(${PROJECT_NAME}_LIB_VERSION_AGE 0)
 #=============================
 set(CMAKE_C_STANDARD 90)
 set(CMAKE_C_EXTENSIONS OFF)
+# Enable POSIX features while maintaining ISO C compliance
+add_compile_definitions(_POSIX_C_SOURCE=200112L) # needed for `clock_gettime()` in bench.c
```
bench.c?
src/CMakeLists.txt targeting only the benchmarks.
```diff
@@ -159,6 +159,8 @@ Benchmark
 ------------
 If configured with `--enable-benchmark` (which is the default), binaries for benchmarking the libsecp256k1 functions will be present in the root directory after the build.
 
+For an accurate benchmark, it is strongly recommended to pin the process to a dedicated CPU core and to disable CPU frequency scaling.
```
```diff
@@ -7,30 +7,63 @@
 #ifndef SECP256K1_BENCH_H
 #define SECP256K1_BENCH_H
 
+#if defined(_WIN32)
+# include <windows.h>
+#else
+# include <time.h>
+# include <sys/time.h>
```
gettimeofday() and one for clock_gettime(). they don’t conflict.
just tried. unfortunately the macros to detect whether gettimeofday() or clock_gettime() is needed are defined in the time.h header. so the only viable solution is:
```c
#include <time.h>
#if !defined(CLOCK_PROCESS_CPUTIME_ID) && !defined(CLOCK_MONOTONIC) && !defined(CLOCK_REALTIME)
#  include <sys/time.h>
#endif
```
which every LLM tells me is ‘inherently unreliable’ and can lead to portability issues.
‘Including sys/time.h on modern POSIX systems that already have time.h and clock_gettime doesn’t hurt, and ensures gettimeofday is available if needed.’
‘Trying to skip it adds complexity with almost no benefit.’
clock_gettime on POSIX?  https://pubs.opengroup.org/onlinepubs/9799919799/functions/clock_getres.html says that the proper macro to check is _POSIX_CPUTIME and that CLOCK_MONOTONIC must be supported, so there’s no need to check for it.
CLOCK_MONOTONIC support mandatory, this makes it way simpler. no need to include <sys/time.h> anymore.
CLOCK_REALTIME as the last fallback.
_POSIX_CPUTIME and _POSIX_MONOTONIC flags are to be used when you want to ensure that the library has mandatory support for those features. In my case, for example, gcc doesn’t expose _POSIX_MONOTONIC but still supports it. Therefore it’s better to check CLOCK_PROCESS_CPUTIME_ID and CLOCK_MONOTONIC directly.
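Both approaches can be combined: use the macro at compile time and confirm kernel support at runtime with clock_getres() (a sketch, not code from the PR; `cpu_clock_available` is a made-up name):

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* The macro only proves the clock ID exists in the headers; passing NULL to
 * clock_getres() probes whether the running kernel actually supports it. */
static int cpu_clock_available(void) {
#if defined(CLOCK_PROCESS_CPUTIME_ID)
    return clock_getres(CLOCK_PROCESS_CPUTIME_ID, NULL) == 0;
#else
    return 0;
#endif
}
```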
```diff
+
+    return (int64_t)((k.QuadPart + u.QuadPart) / 10);
+#else /* POSIX */
+
+# if defined(CLOCK_PROCESS_CPUTIME_ID)
+    /* in theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core. see https://linux.die.net/man/3/clock_gettime */
```
```diff
+    /* WARN: timer is influenced by environment (OS, scheduling, interrupts...) */
     struct timeval tv;
-    gettimeofday(&tv, NULL);
-    return (int64_t)tv.tv_usec + (int64_t)tv.tv_sec * 1000000LL;
+    gettimeofday((struct timeval*)&tv, NULL);
+    return (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
```
How is this diff an improvement? LL is not needed.

```diff
+    struct timespec ts;
+    if (clock_gettime(CLOCK_REALTIME, &ts) != 0) {
+        return 0;
+    }
+    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
+# else
```
yes, I tried everything you mentioned.
removing redundancy here can be done by just using the ifdefs for the constant, but you’d need a different path for the last fallback: the last case uses tv, not ts.
I tried issuing warnings with #warning to tell the user about the precision of the clock, but it’s not compatible with C90.
Here’s the DRY’d version:
```c
# if defined(CLOCK_PROCESS_CPUTIME_ID)
    /* in theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core. see https://linux.die.net/man/3/clock_gettime */
    const clockid_t clk_id = CLOCK_PROCESS_CPUTIME_ID;
# elif defined(CLOCK_MONOTONIC)
    /* WARN: timer is influenced by environment (OS, scheduling, interrupts...) */
    const clockid_t clk_id = CLOCK_MONOTONIC;
# elif defined(CLOCK_REALTIME)
    /* WARN: timer is influenced by environment (OS, scheduling, interrupts...) */
    const clockid_t clk_id = CLOCK_REALTIME;
# else
#  define USE_GETTIMEOFDAY
# endif

# ifndef USE_GETTIMEOFDAY
    struct timespec ts;
    if (clock_gettime(clk_id, &ts) != 0) {
        return 0;
    }
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
# else
    /* WARN: timer is influenced by environment (OS, scheduling, interrupts...) */
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_usec + (int64_t)tv.tv_sec * 1000000;
# endif
```
less readable imo
> I tried issuing warnings with `#warning` to tell the user about the precision of the clock but it’s not compatible with c90.
What I had in mind is printing the warning out at runtime.
```diff
-    struct timespec tv;
-    if (!timespec_get(&tv, TIME_UTC)) {
-        fputs("timespec_get failed!", stderr);
-        exit(EXIT_FAILURE);
+    if (GetProcessTimes(GetCurrentProcess(), &creation, &exit, &kernel, &user) == 0) {
+        return 0;
```
> The point is exactly having a simple way of verifying that there’s indeed no interference. Getting rid of sources of variance is hard to get right, and it’s impossible to get a perfect solution. (This discussion shows this!) So we better have a way of spotting if something is off.
fine, but at least let’s make it optional. I don’t like my benchmarks being 10 times slower just because min/max need to be computed. if I say 20,000 iterations I want it to be 20,000 iterations.
Not sure. I suggest making this PR focused on a single (uncontroversial) change, which is switching to per process clocks.
```diff
+
+# if defined(CLOCK_PROCESS_CPUTIME_ID)
+    /* In theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core.
+     * See https://linux.die.net/man/3/clock_gettime.
+     * In practice, modern CPUs have largely addressed this issue. The worst case scenario is an unreliable benchmark.
+     * See https://chatgpt.com/s/dr_68b7486a05a481919d9f121d182cc0cf
```
```diff
@@ -176,6 +204,14 @@ static int get_iters(int default_iters) {
     }
 }
 
+static void print_clock_info(void) {
+#if defined(_WIN32) || defined(CLOCK_PROCESS_CPUTIME_ID)
+    printf("INFO: using CPU timer, results are not influenced by other running processes\n\n");
```
CLOCK_PROCESS_CPUTIME_ID. I changed the message to notify the user that he should probably pin the process.
If that’s not required on modern CPUs, why advise to do this?
We could just say “INFO: Using per-process CPU timer”.
```diff
+
+#else /* POSIX */
+
+# if defined(CLOCK_PROCESS_CPUTIME_ID)
+    /* In theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core. see https://man7.org/linux/man-pages/man2/clock_gettime.2.html *
+     * in practice, modern CPUs have synchronized TSCs which addresses this issue. see https://docs.amd.com/r/en-US/ug1586-onload-user/Timer-TSC-Stability   */
```
```c
    /* In theory, CLOCK_PROCESS_CPUTIME_ID is only useful if the process is locked to a core,
     * see `man clock_gettime` on Linux. In practice, modern CPUs have synchronized TSCs which
     * address this issue, see https://docs.amd.com/r/en-US/ug1586-onload-user/Timer-TSC-Stability . */
```
not familiar with nanobench, but I can see it uses std::chrono, which uses CLOCK_MONOTONIC (our fallback) under the hood, not TSC.
also the user would need to have a C++ compiler. not a big deal, but I think this repo needs to strictly adhere to C89.
QueryPerformanceCounter() which has microsecond precision.
I also removed if checks from clock_gettime() as it can only fail for programming errors.
I haven’t investigated this in detail but couldn’t find any related issue: were there any efforts to use nanobench.ankerl.com like we do in Core instead?
nanobench is a framework for benchmarking C++ but this is C. Of course, we could still call our C from C++, but adding C++ to the code base seems a bit overkill to me if it’s “just” for the purpose of benchmarks. I agree that using an existing framework may be a good idea, but I’m not aware of any in pure C.
> but adding C++ to the code base seems a bit overkill to me if it’s “just” for the purpose of benchmarks
The purpose would be to standardize the benchmarking to avoid reinventing solutions to problems that aren’t strictly related to the purpose of this library. It would also help with trusting the result more, since we already know how that benchmarking library behaves. The main client of this library is C++ code, it doesn’t seem far-fetched to me to test that.
Sorry, I didn’t want to be discouraging. I don’t think at all that nanobench is a far-fetched idea. And I truly agree that reinventing the wheel is not great.
In the end, what counts is the maintenance burden. My main concern with C++ is that I’m not sure how fluent our regular contributors are in C++. (I’m not, but luckily I’m not the only one here.) If the bit of C++ is harder to touch (for us) than this code, then nanobench is not a good idea. If, on the other hand, we have people who can maintain C++, or if we get support from the Core contributors who are familiar with nanobench, then this can be a win.
I’d be happy to see a demo/WIP PR but if you feel that this is a lot of work, then it might be a good idea to see what other contributors and maintainers think.
```diff
@@ -176,6 +190,14 @@ static int get_iters(int default_iters) {
     }
 }
 
+static void print_clock_info(void) {
+#if defined(CLOCK_PROCESS_CPUTIME_ID)
+    printf("INFO: Using per-process CPU timer\n\n");
+#else
+    printf("WARN: using Wall-Clock timer, results are highly influenced by other running processes\n\n");
```

Suggested change:

```c
    printf("WARN: Using wall-clock timer instead of per-process CPU timer.\n\n");
```
Let’s just stick to the facts and omit the interpretation of them, this makes it easier to move forward. ;)
```diff
@@ -159,6 +159,8 @@ Benchmark
 ------------
 If configured with `--enable-benchmark` (which is the default), binaries for benchmarking the libsecp256k1 functions will be present in the root directory after the build.
 
+For reliable benchmarks, it is strongly recommended to pin the process to a dedicated CPU core and to disable CPU frequency scaling.
```

Suggested change:

```
For reliable benchmarks, it is strongly recommended to pin the process to a single CPU core and to disable CPU frequency scaling.
```
Dedicated sounds as if the core is only responsible for this process, which would be nice but it’s typically not doable.
```diff
@@ -4,6 +4,8 @@
  * file COPYING or https://www.opensource.org/licenses/mit-license.php.*
  ***********************************************************************/
 
+#define _POSIX_C_SOURCE 200112L /* for clock_gettime() */
```
200112L was chosen?

The 199309L version number didn’t expose the necessary macros.

> The reported 199309L version number didn’t expose the necessary macros.
I can’t reproduce this on my machine.
no particular reason besides the fact that it is the bare minimum version to make it work on my machine…
Could you share your build environment details so I can try to reproduce the issue?
… and CI.
It seems that CI is fine with the documented 199309L value.
See: https://github.com/hebasto/secp256k1/actions/runs/18008380590
According to https://www.man7.org/linux/man-pages/man3/clock_gettime.3.html:

> Link with `-lrt` (only for glibc versions before 2.17).
I think this check, and adding the -lrt flag if necessary, should be included in both build systems.
> I think this check, and adding the `-lrt` flag if necessary, should be included in both build systems.
Agreed, good catch! What do you think about something like this?
```cmake
include(CheckFunctionExists)
check_function_exists(clock_gettime HAVE_CLOCK_GETTIME)
if(NOT HAVE_CLOCK_GETTIME)
    # On some platforms, clock_gettime requires librt
    target_link_libraries(your_target PRIVATE rt)
endif()
```
Could you please give a rough estimate of these changes at the concept level?
And what’s your take on using nanobench instead, even though this is a pure C library instead of C++?
> Goal
> This PR refactors the benchmarking functions as per #1701, in order to make benchmarks more deterministic and less influenced by the environment.
> This is achieved by replacing the wall-clock timer with a per-process CPU timer when possible.
I’ve tested 8925b9588259d27257f42eb4fb8bebd4a7be4d95 by running it alongside `stress --cpu $(nproc)` and observed the same effect on benchmark average values as on the master branch.
Could you please clarify how one would measure or observe the claimed “less influenced by the environment” behaviour?
`stress --cpu` spawns workers that use 100% of the CPU. your test pegs every core to 100% utilization, and each worker probably gets pinned to a physical core, therefore you won’t see any difference in terms of variance.
a better example would be starting various threads with random usage spikes at random times on the same core the benchmark is running on.
basically, if you have no other processes running on that CPU, then this PR won’t show any improvement.
this PR helps in the more realistic scenario where a user has background processes running, and the scheduler assigns several of them to the same CPU that’s running the benchmark, putting the thread to sleep, which the wall-clock timer doesn’t account for.
also, this PR is not merely a replacement of wall clock time with CPU time, but it also modernizes the clock function as per the Unix standard.
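The sleep effect described above is easy to demonstrate with a standalone sketch (not part of the PR; assumes a Linux-like system where CLOCK_PROCESS_CPUTIME_ID is available): while a process sleeps, as if descheduled, the monotonic wall clock keeps advancing but the per-process CPU clock barely moves.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

static int64_t clock_us(clockid_t clk) {
    struct timespec ts;
    if (clock_gettime(clk, &ts) != 0) return 0;
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Measure a 50 ms sleep with both clocks: the wall clock sees ~50000 us,
 * the CPU clock only the few microseconds actually spent executing. */
static void demo_sleep_divergence(int64_t *wall_us, int64_t *cpu_us) {
    struct timespec delay;
    int64_t w0 = clock_us(CLOCK_MONOTONIC);
    int64_t c0 = clock_us(CLOCK_PROCESS_CPUTIME_ID);
    delay.tv_sec = 0;
    delay.tv_nsec = 50000000L; /* 50 ms */
    nanosleep(&delay, NULL);
    *wall_us = clock_us(CLOCK_MONOTONIC) - w0;
    *cpu_us = clock_us(CLOCK_PROCESS_CPUTIME_ID) - c0;
}
```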
Feel free to grab f59c45f2cef03681074eef00046d3b146daf42e8 from https://github.com/hebasto/secp256k1/commits/pr1732/0925.cmake.
UPD. @real-or-random do we need a CI job to build on some old system with old glibc?
```diff
+    printf("INFO: Using per-process CPU timer\n\n");
+#else
+    printf("WARN: Using wall-clock timer instead of per-process CPU timer.\n\n");
+#endif
+}
+
```
5fe7c017682240295d701570f052b6ddfb2d1fb7:
Do these messages make sense on Windows?
> Feel free to grab f59c45f from https://github.com/hebasto/secp256k1/commits/pr1732/0925.cmake.
This seems overcomplicated and convoluted. But if you manage to simplify it I’ll include your commit.
Approach NACK cce014725032a58416237e38aedd2033ba2202f5.
The current implementation breaks compatibility with systems using glibc versions prior to 2.17:
```
$ ldd --version | head -1
ldd (Ubuntu EGLIBC 2.15-0ubuntu10.23) 2.15
$ cmake -B build
$ cmake --build build -t bench
<snip>
[100%] Linking C executable ../bin/bench
CMakeFiles/bench.dir/bench.c.o: In function `gettime_us':
/secp256k1/src/bench.h:45: undefined reference to `clock_gettime'
collect2: ld returned 1 exit status
make[3]: *** [bin/bench] Error 1
make[2]: *** [src/CMakeFiles/bench.dir/all] Error 2
make[1]: *** [src/CMakeFiles/bench.dir/rule] Error 2
make: *** [bench] Error 2
```
Fixed by unconditionally adding -lrt when available.
```
root@5997b2bd1aca:/workspace# ldd --version | head -1
ldd (Ubuntu EGLIBC 2.15-0ubuntu10.23) 2.15
root@5997b2bd1aca:/workspace# cmake --build build -t bench
[ 33%] Built target secp256k1_precomputed
[ 66%] Built target secp256k1
[100%] Built target bench
```
```diff
@@ -125,12 +125,17 @@ endif()
 unset(${PROJECT_NAME}_soversion)
 
 if(SECP256K1_BUILD_BENCHMARK)
+  find_library(RT_LIBRARY rt)
+  add_library(optional_rt INTERFACE)
+  if(RT_LIBRARY)
+    target_link_libraries(optional_rt INTERFACE rt)
+  endif()
```
Could you please extract this logic into a find-module, like FindRT.cmake? That module should provide an IMPORTED target with a namespace in the name.
Have a look at the FindIconv.cmake module as an example. This module also provides an interface library for functionality that may be found in an actual library or built into the C standard library.
```diff
@@ -157,6 +159,8 @@ Benchmark
 ------------
 If configured with `--enable-benchmark` (which is the default), binaries for benchmarking the libsecp256k1 functions will be present in the root directory after the build.
 
+For reliable benchmarks, it is strongly recommended to pin the process to a single CPU core and to disable CPU frequency scaling.
```
It would be great if the benchmark was configured in a way such that the guideline is followed.
```cmake
add_test(NAME secp256k1_bench COMMAND ...)
set_tests_properties(secp256k1_bench PROPERTIES
  PROCESSORS 1
  PROCESSOR_AFFINITY ON
)
```
I am not aware how to disable scaling, unfortunately.
```diff
@@ -125,12 +125,17 @@ endif()
 unset(${PROJECT_NAME}_soversion)
 
 if(SECP256K1_BUILD_BENCHMARK)
+  find_library(RT_LIBRARY rt)
+  add_library(optional_rt INTERFACE)
+  if(RT_LIBRARY)
+    target_link_libraries(optional_rt INTERFACE rt)
+  endif()
   add_executable(bench bench.c)
```
Note that CMake requires the following:

While it is not strictly required to prefix test names with the project, it is still recommended, because it allows users to control which tests to run: `ctest -R secp256k1_`. This recommendation is already followed, good.

For target names, this requirement is much stronger. Since this project is used as a subproject in bitcoin, neither bitcoin nor any other of its subprojects can define a target with the name `bench`. It would be recommended to rename the target to `secp256k1_bench`.
As per @purpleKarrot’s suggestion, benchmarks can now be run via ctest as well (as opposed to ./bench_name). This would be the preferable way to run them, as it automatically handles CPU pinning and affinity. This means that, if both SECP256K1_BUILD_BENCHMARK and SECP256K1_BUILD_TESTS flags are set, the standard ctest command will run benchmarks and tests together.
This can be avoided by selectively choosing which “tests” to run with the following ctest syntax: `ctest -R bench` or `ctest -R test` (which uses regex).
Otherwise we can evaluate the possibility of using labels for groups of tests.
there appears to be an issue in the CI. @real-or-random can you trigger it again?
also there’s one last thing I don’t like about this PR: having to define _POSIX_C_SOURCE in every source file. if anyone finds a way to define it once, perhaps in the CMakeLists.txt, please submit it.
> I suggest making this PR focused on a single (uncontroversial) change, which is switching to per process clocks.
I strongly support this approach and I suggest postponing the following changes for follow-up PRs:
I also suggest dropping a93078fbb6fd8a62c8e7f67a7a0960dc9c6bf45c “refactor: reorder cmake commands for better readability” as it does the opposite given the style used for all other targets.
```diff
@@ -4,6 +4,8 @@
  * file COPYING or https://www.opensource.org/licenses/mit-license.php.*
  ***********************************************************************/
 
+#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
+
```
> This seems overcomplicated and convoluted.
> This looks complicated, almost overcomplicated, without much benefits…
The PR author appears to favor simplicity over correctness (see the comment above). I disagree with that approach.
but you’re welcome to provide a commit and I’ll see if it should be cherrypicked.
Sure. Here is a branch that addresses some of @purpleKarrot’s comments and most of mine: https://github.com/hebasto/secp256k1/commits/251026-pr1732.alt/. However, it doesn’t yet include the necessary changes to the Autotools build system.
UPD. Asked a question about handling _POSIX_C_SOURCE on IRC.
This commit improves the reliability of benchmarks by removing some of the influence of other background running processes. This is achieved by using CPU bound clocks that aren't influenced by interrupts, sleeps, blocked I/O, etc.
#define, cherry-picked your solution for the library detection
      Just started reviewing it; CMake compilation fails locally.
It should set `_POSIX_C_SOURCE=199309L` instead, so the compiler expands correctly to `-D_POSIX_C_SOURCE=199309L`. See:
```diff
diff --git a/cmake/FindClockGettime.cmake b/cmake/FindClockGettime.cmake
--- a/cmake/FindClockGettime.cmake	(revision e14981e28e1c1e4b2fb6321cd94342d0b2849be7)
+++ b/cmake/FindClockGettime.cmake	(date 1761574207222)
@@ -20,7 +20,7 @@
 
 cmake_push_check_state(RESET)
 
-set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L)
+set(CMAKE_REQUIRED_DEFINITIONS _POSIX_C_SOURCE=199309L)
 check_symbol_exists(clock_gettime "time.h" CLOCK_GETTIME_IS_BUILT_IN)
 set(${CMAKE_FIND_PACKAGE_NAME}_FOUND ${CLOCK_GETTIME_IS_BUILT_IN})
```
> Just started reviewing it; CMake compilation fails locally. It should set `_POSIX_C_SOURCE=199309L` instead, so the compiler expands correctly to `-D_POSIX_C_SOURCE=199309L` during `check_symbol_exists`.
I can’t reproduce it. What CMake version are you using?
From CMake docs:

> `CMAKE_REQUIRED_DEFINITIONS` — A semicolon-separated list of compiler definitions, each of the form -DFOO or -DFOO=bar.
> Just started reviewing it; CMake compilation fails locally. It should set `_POSIX_C_SOURCE=199309L` instead, so the compiler expands correctly to `-D_POSIX_C_SOURCE=199309L` during `check_symbol_exists`.
Can you provide the error output? my CMake doesn’t complain
> Can you provide the error output? my CMake doesn’t complain
Sure.
The interleaved parallel compiler output boils down to:

```
<command line>:1:9: error: macro name must be an identifier
#define -D_POSIX_C_SOURCE 199309L
        ^
```
> I can’t reproduce it. What CMake version are you using?
cmake version 3.22.3
> Just started reviewing it; CMake compilation fails locally. It should set `_POSIX_C_SOURCE=199309L` instead, so the compiler expands correctly to `-D_POSIX_C_SOURCE=199309L`.
Here is the minimal diff to fix the error for CMake older than 3.26:
```diff
--- a/cmake/FindClockGettime.cmake
+++ b/cmake/FindClockGettime.cmake
@@ -34,7 +34,7 @@ if(${CMAKE_FIND_PACKAGE_NAME}_FOUND)
   if(NOT TARGET POSIX::clock_gettime)
     add_library(POSIX::clock_gettime INTERFACE IMPORTED)
     set_target_properties(POSIX::clock_gettime PROPERTIES
-      INTERFACE_COMPILE_DEFINITIONS "${CMAKE_REQUIRED_DEFINITIONS}"
+      INTERFACE_COMPILE_DEFINITIONS _POSIX_C_SOURCE=199309L
       INTERFACE_LINK_LIBRARIES "${CMAKE_REQUIRED_LIBRARIES}"
     )
   endif()
```
Isn’t that going to make `CMAKE_REQUIRED_DEFINITIONS` unused? the `set(CMAKE_REQUIRED_DEFINITIONS -D_POSIX_C_SOURCE=199309L)` line.
It is still used by check_symbol_exists().
```diff
-    return (int64_t)tv.tv_usec + (int64_t)tv.tv_sec * 1000000LL;
+static int64_t gettime_us(void) {
+#if defined(_WIN32)
+
+    LARGE_INTEGER freq, counter;
+    QueryPerformanceFrequency(&freq);
```
Per https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancefrequency, freq should be static and initialized only once. The function’s description says:
> The frequency of the performance counter is fixed at system boot and is consistent across all processors. Therefore, the frequency need only be queried upon application initialization, and the result can be cached.
I see. Using an init_time(void) function could work in that case, but it would require every binary to call this method early in every program’s execution. All good anyway.
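The caching pattern could look like the sketch below; `query_counter_frequency()` is a hypothetical stand-in for `QueryPerformanceFrequency()` so that the idea can be shown portably.

```c
#include <stdint.h>

/* Hypothetical stand-in for QueryPerformanceFrequency(): something that is
 * expensive to query but fixed for the lifetime of the process. */
static int64_t query_counter_frequency(void) {
    return 10000000; /* pretend: a 10 MHz counter */
}

/* Query on first use and cache the result; later calls are just a load.
 * (Fine for single-threaded benchmark binaries; a threaded program should
 * do this from an explicit init function instead.) */
static int64_t counter_frequency(void) {
    static int64_t cached = 0;
    if (cached == 0) {
        cached = query_counter_frequency();
    }
    return cached;
}
```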
Another nit; QueryPerformanceFrequency returns 0 if the call fails, so it would be good to handle that as well.
> Another nit; `QueryPerformanceFrequency` returns 0 if the call fails, so it would be good to handle that as well.
My reasoning is the same here. It would be one extra branch, and a failure in the clock is not a huge deal.
Have you considered GetProcessTimes or GetThreadTimes to ignore sleep and waiting times on Windows?
It seems to me that the current approach is semantically different across platforms: on POSIX systems, you get CPU time (if enabled), whereas on Windows you’re using QueryPerformanceCounter, which measures elapsed wall-clock time.
So results will not be comparable across platforms?
> Have you considered GetProcessTimes or GetThreadTimes

I have, but they lack precision.
> Have you considered GetProcessTimes or GetThreadTimes
> I have, but they lack precision.
That’s interesting. What do you think about the semantical difference between the Windows and POSIX approaches? It seems to me that we should at least mention in the docs that results aren’t comparable across platforms.
> What do you think about the semantical difference between the Windows and POSIX approaches? It seems to me that we should at least mention in the docs that results aren’t comparable across platforms.
We added print statements for that exact reason. On Windows there simply isn’t a reliable way to get per-process time.