bench: add -quiet and -iters=<n> benchmark config args #22999

pull jonatack wants to merge 6 commits into bitcoin:master from jonatack:bench-add-quiet-and-iters-args changing 5 files +42 −27
  1. jonatack commented at 4:04 pm on September 16, 2021: member

    Running a benchmark 10 times, twice (before and after), for #22974 and then editing the output by hand to remove the warnings and recommendations brought home the point that it would be nice to be able to do that automatically. This PR adds a -quiet arg to silence warnings and recommendations and an -iters=<n> arg to run each benchmark for the number of iterations passed.

     $ src/bench/bench_bitcoin -?
     Options:

       -?
            Print this help message and exit

       -asymptote=<n1,n2,n3,...>
            Test asymptotic growth of the runtime of an algorithm, if supported by
            the benchmark

       -filter=<regex>
            Regular expression filter to select benchmark by name (default: .*)

       -iters=<n>
            Iterations of each benchmark to run (default: 1)

       -list
            List benchmarks without executing them

       -output_csv=<output.csv>
            Generate CSV file with the most important benchmark results

       -output_json=<output.json>
            Generate JSON file with all benchmark results

       -quiet
            Silence warnings and recommendations in benchmark results
    

    examples

    $ ./src/bench/bench_bitcoin -filter=AddrManGood -iters=5 -quiet

    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |    2,538,968,665.00 |                0.39 |   15.9% |     12.12 | `AddrManGood`
    |    2,536,901,200.00 |                0.39 |   13.0% |     13.73 | `AddrManGood`
    |    2,337,840,590.00 |                0.43 |    3.9% |     12.07 | `AddrManGood`
    |    1,997,515,936.00 |                0.50 |    2.6% |     10.09 | `AddrManGood`
    |    2,217,950,210.00 |                0.45 |    1.3% |     11.30 | `AddrManGood`
    
     $ ./src/bench/bench_bitcoin -filter=PrevectorDes*.* -iters=2 -quiet=1

     |               ns/op |                op/s |    err% |     total | benchmark
     |--------------------:|--------------------:|--------:|----------:|:----------
     |            8,062.56 |          124,030.15 |    5.7% |      0.09 | `PrevectorDeserializeNontrivial`
     |            7,784.81 |          128,455.29 |    1.5% |      0.09 | `PrevectorDeserializeNontrivial`

     |              356.44 |        2,805,497.65 |    1.5% |      0.00 | `PrevectorDeserializeTrivial`
     |              354.52 |        2,820,715.33 |    0.9% |      0.00 | `PrevectorDeserializeTrivial`

     |              241.27 |        4,144,791.38 |    0.9% |      0.00 | `PrevectorDestructorNontrivial`
     |              241.45 |        4,141,658.77 |    0.9% |      0.00 | `PrevectorDestructorNontrivial`

     |              146.64 |        6,819,400.81 |    0.9% |      0.00 | `PrevectorDestructorTrivial`
     |              147.98 |        6,757,806.43 |    0.6% |      0.00 | `PrevectorDestructorTrivial`
    
     $ ./src/bench/bench_bitcoin -filter=PrevectorDes*.* -iters=-1 -quiet=0
     $ ./src/bench/bench_bitcoin -filter=PrevectorDes*.* -iters=0 -quiet=0
     $ ./src/bench/bench_bitcoin -filter=PrevectorDes*.* -iters=1 -quiet=0
     Warning, results might be unstable:
     * DEBUG defined
     * CPU frequency scaling enabled: CPU 0 between 400.0 and 3,100.0 MHz
     * Turbo is enabled, CPU frequency will fluctuate

     Recommendations
     * Make sure you compile for Release
     * Use 'pyperf system tune' before benchmarking. See https://github.com/psf/pyperf

     |               ns/op |                op/s |    err% |     total | benchmark
     |--------------------:|--------------------:|--------:|----------:|:----------
     |            6,204.87 |          161,163.71 |   15.2% |      0.07 | :wavy_dash: `PrevectorDeserializeNontrivial` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10)
     |              214.33 |        4,665,680.65 |    0.1% |      0.00 | `PrevectorDeserializeTrivial`
     |              257.23 |        3,887,584.03 |    8.6% |      0.00 | :wavy_dash: `PrevectorDestructorNontrivial` (Unstable with ~43.5 iters. Increase `minEpochIterations` to e.g. 435)
     |              151.34 |        6,607,846.82 |    1.9% |      0.00 | `PrevectorDestructorTrivial`
    
  2. bench: add -quiet benchmark option argument 12a455cf1e
  3. bench: add Bench::m_quiet class member dc4e5c32bb
  4. bench: pass Bench::m_quiet to isWarningsEnabled() 6fb8798d2b
  5. bench: drop unneeded header includes 308b9ffc71
  6. bench: various args improvements
    - use ALLOW_BOOL for -list arg instead of ALLOW_ANY
    - touch up `-asymptote=<n1,n2,n3...>` help
    - pack Args struct a bit more efficiently
    - handle args in alphabetical order
    089a83a837
  7. bench: add -iters=<n> benchmark option argument 7cdad2cabb
  8. jonatack commented at 4:08 pm on September 16, 2021: member
    @martinus please let me know if this is ok, as the -quiet arg makes some minor changes to nanobench.h to allow the equivalent of NANOBENCH_SUPPRESS_WARNINGS=1 as an arg listed in the help.
  9. laanwj added the label Tests on Sep 16, 2021
  10. theStack commented at 8:41 pm on September 16, 2021: member
    Strong Concept ACK on introducing an -iters parameter, after also having reviewed #22974 :) As for -quiet, I’m less convinced of its usefulness; I think the warnings and recommendations almost always make sense, and I personally wouldn’t see a need to silence them – if a result is marked as “Unstable”, I usually throw it away, assuming it has only limited value. (Note, though, that I’m not a heavy user of the benchmarks in general; maybe there is a need for silencing that I’m not aware of.)
  11. jonatack commented at 9:02 pm on September 16, 2021: member
    Thanks! The -quiet arg was actually my first motivation: to be able to compare or share results less painfully. For me, most results carry the verbose >5% err “unstable” warnings even when all the recommendations are followed, and it takes an annoying amount of time to edit the warnings out by hand. People can see the err%, so for sharing results the noisy warnings are mostly pollution, and it’s handy to be able to silence them.
  12. fanquake commented at 2:47 am on September 17, 2021: member

    I don’t think adding more code, and modifying a dependency, just to replicate functionality that already exists is a good idea. What’s the problem with using NANOBENCH_SUPPRESS_WARNINGS? In almost all cases I’m sure we’d rather people heed the warnings, and produce more useful benchmarks, than just ignore them.

    and it takes an annoying amount of time to edit out all the warnings by hand.

    Couldn’t you write a bash one-liner to post-process your benchmark output that throws away lines that don’t start with a |?

  13. in src/bench/nanobench.h:2282 in 6fb8798d2b outdated
    2278@@ -2278,7 +2279,7 @@ struct IterationLogic::Impl {
    2279                     os << col.value();
    2280                 }
    2281                 os << "| ";
    2282-                auto showUnstable = isWarningsEnabled() && rErrorMedian >= 0.05;
    2283+                auto showUnstable = isWarningsEnabled(mBench.m_quiet) && rErrorMedian >= 0.05;
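
    For context, nanobench’s isWarningsEnabled() normally consults the NANOBENCH_SUPPRESS_WARNINGS environment variable, and this commit threads the new m_quiet flag through as an override. A minimal sketch of the idea (a reconstruction, not the exact nanobench code):

     #include <cstdlib>

     // Sketch: warnings stay enabled unless either the NANOBENCH_SUPPRESS_WARNINGS
     // environment variable is set to a non-zero value or the caller passed
     // quiet=true. The upstream helper takes no argument; the parameter is this
     // PR's addition.
     bool isWarningsEnabled(bool quiet = false)
     {
         if (quiet) return false;
         const char* env = std::getenv("NANOBENCH_SUPPRESS_WARNINGS");
         return env == nullptr || std::atoi(env) == 0;
     }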
    


    martinus commented at 7:31 am on September 17, 2021:
    I don’t think that warning should ever be hidden. It shows that the value you are seeing is not reliable because the error is too high. If you regularly see that warning in a benchmark, the benchmark should be improved or the machine should be somehow stabilized.

    jonatack commented at 1:01 pm on September 17, 2021:
    Yes, unfortunately many, if not most, of the benchmarks show the warning for me, despite my taking the steps to stabilize the machine. I encountered this extensively while working on #22284, and every time I run benchmarks to test pulls. I now use NANOBENCH_SUPPRESS_WARNINGS to be able to share my results without running a bench dozens of times to get a few without the warnings.
  14. in src/bench/nanobench.h:628 in dc4e5c32bb outdated
    623@@ -624,6 +624,9 @@ class Bench {
    624     Bench& operator=(Bench const& other);
    625     ~Bench() noexcept;
    626 
    627+    //! Whether to suppress warnings and recommendations. Equivalent to NANOBENCH_SUPPRESS_WARNINGS.
    628+    bool m_quiet{false};
    


    martinus commented at 7:42 am on September 17, 2021:

    I’m not sure about adding support for this though, because this can already be done with the environment variable NANOBENCH_SUPPRESS_WARNINGS, and it is not obvious which of these settings should override the other. Maybe it would be better to just document NANOBENCH_SUPPRESS_WARNINGS, and also NANOBENCH_ENDLESS, in the usage documentation?

    If we really want that, in nanobench all the configuration should live inside the Config class. That way the configuration can be reused in other benchmarks. Bench should then have a well-documented getter/setter for this. I’d prefer to keep https://github.com/martinus/nanobench and this code here in sync though.
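
    A rough sketch of the shape being suggested, mirroring nanobench’s existing fluent setters such as Bench::minEpochIterations() (the names here are illustrative, not actual nanobench API):

     namespace ankerl::nanobench {

     struct Config {
         bool mSuppressWarnings = false; // hypothetical new setting
         // ... existing settings (epochs, min epoch time, ...) ...
     };

     class Bench {
     public:
         // Fluent setter, chainable like the existing Bench configuration methods.
         Bench& suppressWarnings(bool b) noexcept
         {
             mConfig.mSuppressWarnings = b;
             return *this;
         }
         bool suppressWarnings() const noexcept { return mConfig.mSuppressWarnings; }

     private:
         Config mConfig{};
     };

     } // namespace ankerl::nanobench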


    jonatack commented at 1:02 pm on September 17, 2021:
    Makes sense.
  15. in src/bench/bench.cpp:61 in 7cdad2cabb
    63-            p.second(bench);
    64-        } else {
    65-            for (auto n : args.asymptote) {
    66-                bench.complexityN(n);
    67+        for (int i = 0; i < args.iters; ++i) {
    68+            if (i == 0 && args.iters > 1) {
    


    martinus commented at 8:21 am on September 17, 2021:

    I do not think adding an iterations argument that works like that is a good idea. nanobench itself already has the ability to perform multiple measurements (called “epochs” in nanobench); in fact, each benchmark is already measured 11 times. That way it is able to show a measurement error. The number of iterations in each of the measurements is determined automatically, based on the computer’s clock accuracy.

    If you get unstable benchmark results, the first thing to do should be to make sure the computer is really stable: no frequency scaling, no turbo, no other interfering programs. nanobench shows the warnings for good reason :slightly_smiling_face:

    If that doesn’t help, make sure the actual benchmark itself is stable and always does the same work (little randomness in it, and preferably not many allocations, threading, locks, etc.).

    If that too doesn’t help, you can e.g. increase the number of iterations with minEpochIterations. That’s a bit problematic though, because some benchmarks need a huge setting here and others a very low one. So it is generally better to use minEpochTime and expose that setting in the arguments (probably as a double value in seconds, e.g. -minEpochTime=0.5).
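
    For reference, both knobs mentioned above are existing settings on nanobench’s Bench. A minimal sketch of using them from a Bitcoin Core bench (the function name and workload are stand-ins):

     #include <bench/nanobench.h>

     #include <chrono>
     #include <cstdlib>

     void stable_bench()
     {
         ankerl::nanobench::Bench bench;
         // Force more iterations per epoch, per the "Increase minEpochIterations"
         // hints in the output above...
         bench.minEpochIterations(435);
         // ...or bound each epoch by time instead, which adapts to both fast and
         // slow benchmarks (minEpochTime takes a std::chrono duration).
         bench.minEpochTime(std::chrono::milliseconds{500});
         bench.run("example", [] {
             ankerl::nanobench::doNotOptimizeAway(std::rand()); // stand-in workload
         });
     }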


    jonatack commented at 1:04 pm on September 17, 2021:
    Yes, I’ve gone through those steps. In practice, what I see people doing to share and compare results in PR reviews is running a benchmark repeatedly, for which the -iters=<n> proposal here is a handy convenience.
  16. jonatack commented at 1:22 pm on September 17, 2021: member

    I’m going to close and use these features for my own Bitcoin Core benchmarking. If anyone else would like to use them, they can cherry-pick this branch for their own use.

    Thanks for your replies, @martinus. Feel free to add bench documentation as you suggest in #22999 (review) and I’ll be happy to review it.

  17. jonatack closed this on Sep 17, 2021

  18. laanwj referenced this in commit 03cb2b480b on Sep 24, 2021
  19. DrahtBot locked this on Oct 30, 2022
