Tests: Improve benchmark precision #11517

pull martinus wants to merge 2 commits into bitcoin:master from martinus:improved-benchmarking changing 15 files +285 −211
  1. martinus commented at 4:28 PM on October 17, 2017: contributor

    The benchmark's KeepRunning() used to make a function call for each call, inflating measurement times for short-running code. This change inlines the critical code that is executed on each run and moves the slow timer updates into a new function.

    This change reduces the measured average runtime for Trig from 0.000000082339208 sec to 0.000000080948591 sec.

  2. TheBlueMatt commented at 5:20 PM on October 17, 2017: member

    Cool! The benchmark suite is really not at all very good, sadly. If you feel up to it, I think we'd do better by just hardcoding the number of runs in each benchmark - it is often very easy to see very significant differences in the average runtime based on number of runs (even for things which take 1+ms per run!). Unless you have some better intuition than I about why this is the case, that is.

  3. martinus commented at 7:10 PM on October 17, 2017: contributor

    A while ago I wrote a microbenchmarking library similar to Google Benchmark. There I used the geometric mean, which I think is a much better statistic than the average here, and it's better when you have outliers. I also think having the mean and confidence interval is much more reasonable than just min, max, and avg values. Would you be interested in updates like this?

  4. TheBlueMatt commented at 10:25 PM on October 17, 2017: member

    I'd be happy with anything that gives more reliable values, really. Right now, to compare benchmark runs, I often have to pin to a CPU, then run multiple times, and only consider differences if both runs had the same iteration count, which is a huge pain.

  5. fanquake added the label Tests on Oct 17, 2017
  6. laanwj commented at 1:59 PM on October 18, 2017: member

    Cool! The benchmark suite is really not at all very good, sadly. If you feel up to it, I think we'd do better by just hardcoding the number of runs in each benchmark

    The idea of doing it this way is that the benchmarks can be run in roughly the same time on slower hardware, whereas on faster hardware they are more precise. The secp256k1 benchmarks have the problem that they take basically forever on lower-end CPUs. I agree it's probably broken though, and the range of CPUs that bitcoind can realistically run on is much smaller than secp256k1's. (if this is an issue, maybe a fixed scaling factor could be passed on the command line for this purpose, that scales all the benchmark iterations equally and is at least deterministic?)

  7. TheBlueMatt commented at 3:28 PM on October 18, 2017: member

    Yea, a scaling factor seems fine, even an auto-calculated one would probably also be fine, as long as it is very stable/deterministic on the same hardware between runs, which the current version is not.

  8. martinus commented at 6:43 AM on October 19, 2017: contributor

    I don't think using the exact same number of iterations is a huge benefit for stability. The problem is there will always be some random events in the system that cause delays (unwanted fluctuations). The other problem is fluctuations that are inherent to what's benchmarked: e.g. if the code does any allocations, the runtime will always fluctuate. This fluctuation won't go away with a fixed number of runs, and I think the best way to deal with it is with better statistics. In my old library I do this:

    1. Perform 5 runs, each approximately 0.5 seconds.
    2. Calculate geometric mean of the runs.
    3. Calculate the standard deviation of the runtime's logarithm, and print the 95% confidence interval of the runtime.

    Using the logarithm is much more appropriate for benchmark results, because runtime simply can't be negative. With that data we can use a Student's t-test to compare the means of two different algorithms to show whether the difference is actually statistically significant or might be just random fluctuation.

    That's a bit more complicated, but not by much; Boost has all the features for calculating these statistics in a few lines of code. I can work on that, but unfortunately I'm quite busy, so that will take some time.
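    The steps above can be sketched in a few lines of C++ (illustrative names, not the library's actual API; a normal-approximation quantile stands in here for the Student's t quantile a real implementation would use):

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Geometric mean of positive runtimes: exp of the mean of the logs.
double GeometricMean(const std::vector<double>& runtimes)
{
    double log_sum = 0.0;
    for (double t : runtimes) log_sum += std::log(t);
    return std::exp(log_sum / runtimes.size());
}

// 95% confidence interval computed in log space and mapped back with
// exp(), so the bounds can never become negative.
std::pair<double, double> ConfidenceInterval95(const std::vector<double>& runtimes)
{
    const std::size_t n = runtimes.size();
    const double mean_log = std::log(GeometricMean(runtimes));
    double var_log = 0.0;
    for (double t : runtimes) {
        const double d = std::log(t) - mean_log;
        var_log += d * d;
    }
    var_log /= (n - 1); // sample variance of the log-runtimes
    // 1.96 is the normal-approximation quantile; for only 5 runs a real
    // implementation would use the Student's t quantile instead.
    const double half_width = 1.96 * std::sqrt(var_log / n);
    return {std::exp(mean_log - half_width), std::exp(mean_log + half_width)};
}
```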

  9. TheBlueMatt commented at 3:04 PM on October 19, 2017: member

    For my largest use of bench (FIBRE stuff, which has pretty wide-ranging benches that run through FEC encode/decode + mempool scan + block decode) the number of runs makes a huge difference. I'd be happy to be wrong, but my guess has always been cache-related/branch-prediction effects (the issues still appear when you give bench_bitcoin a dedicated core, minimizing any system disruption). Fluctuation is fine, but I often see fluctuations of 2-4% when the iteration count changes (on roughly-1ms-runtime!) when trying to tease out 5% changes. For something running that long, it seems strange to me that you'd see such huge fluctuations.

  10. martinus commented at 7:11 AM on October 20, 2017: contributor

    That's strange. Using a fixed number of iterations should be a simple change. How about adding a command line option to specify a fixed number of iterations? A filter would also be useful so only a specified test runs.

  11. thebostik commented at 10:14 PM on October 21, 2017: none

    The premise is a bit flawed (though this version might still be better; I like the idea of visualizing the data):

    used to make a function call for each call

    isn't true, because the value of countMask changes in two different ways inside this logic, increasing to prevent time and cycles from being collected on every other iteration (which is just how it starts). countMask jumps up pretty quickly through values of the form 2^k - 1, so the timing branch is only taken once the count reaches a pretty high power of 2 minus 1.

    There's a comment that says as much, but it's non-obvious since it's tucked pretty far inside that logic: // If the execution was much too fast (1/128th of maxElapsed), increase the count mask by 8x and restart timing. // The restart avoids including the overhead of this code in the measurement.

    I added another counter inside of KeepRunning to track those function invocations. It looks like when the benchmark has scaled up for the final timing loop for each, there are no more than 20 of these timing calls in the final output of the benchmark for the fastest benches (example of this, new column overheadCount, and new row TimeLookup). The timing calls themselves take 25ns avg each according to the same benchmark code (heh).

    So I guess the complexity/readability cost of achieving resource-independent 1s bench runs is the main thing that could be improved in the current version. I thought the same thing too, until really digging into it and playing with countMask some.
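    As a sanity check on that description, an 8x growth step of the form `mask * 8 + 7` keeps the mask a power of 2 minus 1, so `(count & mask) == 0` fires only once every `mask + 1` iterations (a standalone illustration, not the framework's actual code):

```cpp
#include <cstdint>

// Grow a mask the way the framework's comment describes: starting from 1
// and applying mask = mask * 8 + 7 keeps the value of the form 2^k - 1.
uint64_t GrowMask(uint64_t mask, int times)
{
    for (int i = 0; i < times; ++i) mask = mask * 8 + 7;
    return mask;
}
```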

  12. martinus commented at 7:13 AM on October 22, 2017: contributor

    isn't true because the value of countMask changes in two different ways inside of this logic, increasing to prevent time and cycles from being collected on every other iteration (how it starts at first).

    That's right, but KeepRunning() itself used to introduce a function call overhead. With my change the fast path is inlined, and the slow measuring path is in the non-inlined method.

    I've benchmarked an extreme example (with cout so the compiler doesn't optimize it away):

    double d = 0.01;
    while (state.KeepRunning()) {
        d += 0.000001;
    }
    std::cout << d << std::endl;

    Before my change the measurement was 3.19 ns; with the inlined change I measure 0.83 ns.
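    The split looks roughly like this (a simplified sketch with illustrative names and a stand-in stop condition, not the PR's exact code): the hot path is a header-inlined mask test and increment, and only the rare slow path crosses a non-inlined function boundary.

```cpp
#include <chrono>
#include <cstdint>

class State
{
    uint64_t m_count = 0;
    uint64_t m_count_mask = 127;

    // Out-of-line slow path: reads the clock, updates elapsed time,
    // and may grow m_count_mask.
    bool UpdateTimer(std::chrono::high_resolution_clock::time_point now);

public:
    // Inlined fast path: one increment and one mask test per iteration.
    inline bool KeepRunning()
    {
        if (m_count & m_count_mask) {
            ++m_count;
            return true;
        }
        return UpdateTimer(std::chrono::high_resolution_clock::now());
    }
};

// Would live in the .cpp file; the fixed stop condition here is a
// stand-in for the real elapsed-time logic.
bool State::UpdateTimer(std::chrono::high_resolution_clock::time_point /* now */)
{
    ++m_count;
    return m_count < 1000;
}
```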

    I also find the plot extremely useful, and the command line options. I can now do this:

    ./bench/bench_bitcoin -filter=Deser.* -evals=20 -printer=plot |tee out.html

    The output is a nice plot; I've uploaded it here: http://lawyer-maggie-58725.bitballoon.com/ There you can clearly see that there is an outlier for DeserializeAndCheckBlockTest. Using the median of multiple runs, it is nicely filtered out.

  13. martinus renamed this:
    Tests: Improve benchmark precision by inlining performance critical code
    Tests: Improve benchmark precision
    on Oct 27, 2017
  14. in src/bench/bench.h:119 in f4d9ade8f6 outdated
     144 | -        static void RunAll(double elapsedTimeForOne=1.0);
     145 | +        static void RunAll(Printer& printer, uint64_t numEvals = 1, double elapsedTimeForOneEval = 0.5, const std::string& filter = ".*", bool isListOnly = false);
     146 |      };
     147 |  }
     148 |  
     149 |  // BENCHMARK(foo) expands to:  benchmark::BenchRunner bench_11foo("foo", foo);
    


    ryanofsky commented at 6:14 PM on October 30, 2017:

    This comment is now out of date.

  15. in src/bench/bench.h:120 in f4d9ade8f6 outdated
     147 |  }
     148 |  
     149 |  // BENCHMARK(foo) expands to:  benchmark::BenchRunner bench_11foo("foo", foo);
     150 | -#define BENCHMARK(n) \
     151 | -    benchmark::BenchRunner BOOST_PP_CAT(bench_, BOOST_PP_CAT(__LINE__, n))(BOOST_PP_STRINGIZE(n), n);
     152 | +#define BENCHMARK(n, avgTime) \
    


    ryanofsky commented at 6:15 PM on October 30, 2017:

    Usage example at the top of bench.h is now out of date. Should also add comment there explaining the new macro argument.

  16. in src/bench/bench.h:121 in f4d9ade8f6 outdated
     148 |  
     149 |  // BENCHMARK(foo) expands to:  benchmark::BenchRunner bench_11foo("foo", foo);
     150 | -#define BENCHMARK(n) \
     151 | -    benchmark::BenchRunner BOOST_PP_CAT(bench_, BOOST_PP_CAT(__LINE__, n))(BOOST_PP_STRINGIZE(n), n);
     152 | +#define BENCHMARK(n, avgTime) \
     153 | +    benchmark::BenchRunner BOOST_PP_CAT(bench_, BOOST_PP_CAT(__LINE__, n))(BOOST_PP_STRINGIZE(n), n, avgTime);
    


    ryanofsky commented at 6:22 PM on October 30, 2017:

    Suggest expected_time or estimated_duration instead of avgTime. Developer guide recommends using snake case for variables, and avg doesn't tell you what it's supposed to be the average of.

  17. in src/bench/bench_bitcoin.cpp:27 in f4d9ade8f6 outdated
      23 |      SetupEnvironment();
      24 |      fPrintToDebugLog = false; // don't want to write to debug.log file
      25 |  
      26 | -    benchmark::BenchRunner::RunAll();
      27 | +    int64_t evaluations = gArgs.GetArg("-evals", 5);
      28 | +    std::string regexFilter = gArgs.GetArg("-filter", ".*");
    


    ryanofsky commented at 6:25 PM on October 30, 2017:

    Should add -help output describing what the options do.

    Should use snake_case for variable names (and ideally match them with option names).


    martinus commented at 5:49 AM on October 31, 2017:

    Thanks a lot for the review! I'll see if I can fix all that over the weekend.

  18. in src/bench/bench.h:50 in f4d9ade8f6 outdated
      37 | @@ -36,53 +38,86 @@ BENCHMARK(CODE_TO_TIME);
      38 |   */
      39 |   
      40 |  namespace benchmark {
      41 | +class State;
      42 |  
      43 | -    class State {
    


    ryanofsky commented at 6:49 PM on October 30, 2017:

    New code is missing indentation, so now it is inconsistent, and BenchRunner looks like it is nested inside State. Also, the State class is key for someone trying to understand the benchmark framework, so would recommend leaving on top of the printer classes (you can forward declare Printer).

  19. in src/bench/bench.h:115 in f4d9ade8f6 outdated
     140 |      public:
     141 | -        BenchRunner(std::string name, BenchFunction func);
     142 | +        BenchRunner(std::string name, BenchFunction func, double avgTime);
     143 |  
     144 | -        static void RunAll(double elapsedTimeForOne=1.0);
     145 | +        static void RunAll(Printer& printer, uint64_t numEvals = 1, double elapsedTimeForOneEval = 0.5, const std::string& filter = ".*", bool isListOnly = false);
    


    ryanofsky commented at 6:54 PM on October 30, 2017:

    Default argument values never actually seem to be used. Would drop so they aren't repeated and don't become inconsistent in the future.

  20. ryanofsky commented at 7:29 PM on October 30, 2017: member

    utACK f028ada09ee92a1950ccbc60415301b146eb349c if code comments/whitespace/variable names are cleaned up. This change adds some features, makes the code more portable, and effectively makes benchmarks run with a fixed number of iterations like Matt wanted, though I think it would be a little more straightforward if the BENCHMARK macro just took the number of KeepRunning iterations literally instead of taking an average time estimate and using 0.5/average_time as the number of iterations.
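    The iteration-count derivation being discussed can be sketched as follows (illustrative names, not the PR's exact code): a deterministic count from the estimated average runtime of one iteration and a user-supplied scaling factor, targeting roughly 0.5 seconds per evaluation.

```cpp
#include <algorithm>
#include <cstdint>

// Deterministic iteration count from the estimated average runtime of a
// single iteration (the BENCHMARK macro argument) and a scaling factor.
uint64_t NumIterations(double est_avg_time_sec, double scaling)
{
    const double target_total_sec = 0.5 * scaling; // ~0.5 s per evaluation
    const auto iters = static_cast<uint64_t>(target_total_sec / est_avg_time_sec);
    return std::max<uint64_t>(1, iters); // always run at least once
}
```

Taking the literal iteration count as the macro argument, as suggested, would remove the division entirely and make the count trivially reproducible across machines.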

  21. in src/bench/bench_bitcoin.cpp:32 in 08e77aa51a outdated
      27 | +        std::cout << HelpMessageGroup(_("Options:"))
      28 | +                  << HelpMessageOpt("-?", _("Print this help message and exit"))
      29 | +                  << HelpMessageOpt("-list", _("List benchmarks without executing them. Can be combined with -scaling and -filter"))
      30 | +                  << HelpMessageOpt("-evals=<n>", strprintf(_("Number of measurement evaluations to perform. (default: %u)"), DEFAULT_BENCH_EVALUATIONS))
      31 | +                  << HelpMessageOpt("-filter=<regex>", strprintf(_("Regular expression filter to select benchmark by name (default: %s)"), DEFAULT_BENCH_FILTER))
      32 | +                  << HelpMessageOpt("-scaling=<n>", strprintf(_("Scaling factor for iterations. The specified factor is then divided by 1000, so e.g. 500 results in a factor 0.5 (default: %u)"), DEFAULT_BENCH_SCALING))
    


    TheBlueMatt commented at 8:33 PM on November 2, 2017:

    This is somewhat confusing. Maybe call it the scaling factor for runtime to make it clear what it actually means?


    martinus commented at 6:55 PM on November 7, 2017:

    I've updated the description now too

  22. in src/bench/bench.cpp:49 in 08e77aa51a outdated
      45 | +void benchmark::ConsolePrinter::footer() {}
      46 | +void benchmark::PlotlyPrinter::header()
      47 |  {
      48 | -    benchmarks().insert(std::make_pair(name, func));
      49 | +    std::cout << "<html><head>"
      50 | +              << "<script src=\"https://cdn.plot.ly/plotly-latest.min.js\"></script>"
    


    TheBlueMatt commented at 8:37 PM on November 2, 2017:

    Grr, I kinda feel yucky having code that generates a tracking request for people who want to see the output. Is it possible to check plotly in (does it have a compatible license?) or can we just put the graphs in simple cli form and skip the html output?


    martinus commented at 6:50 PM on November 7, 2017:

    Plotly.js is released under the MIT license: https://github.com/plotly/plotly.js/blob/master/LICENSE, so that should be pretty compatible :) The HTML output is optional, and the default output is the command line, so I did not see a problem with that. I've chosen it because the box plots are really nice; especially when using more evaluations, the output makes it very visible how the runtime actually fluctuates. That makes comparisons much easier. It would be very hard to reproduce that on the command line.

  23. in src/bench/bench.cpp:17 in d61637bf94 outdated
      16 | -benchmark::BenchRunner::BenchmarkMap &benchmark::BenchRunner::benchmarks() {
      17 | -    static std::map<std::string, benchmark::BenchFunction> benchmarks_map;
      18 | -    return benchmarks_map;
      19 | +void benchmark::ConsolePrinter::header()
      20 | +{
      21 | +    std::cout << "# Benchmark, evals, iterations, total, min, max, median" << std::endl;
    


    TheBlueMatt commented at 8:55 PM on November 6, 2017:

    This is wrong for is_list_only - maybe is_list_only mode should also call result() with some other parameter for the mode?


    martinus commented at 6:51 PM on November 7, 2017:

    fixed in the following commit

  24. in src/bench/bench.h:57 in d61637bf94 outdated
      77 | -    typedef std::function<void(State&)> BenchFunction;
      78 | +    std::chrono::time_point<std::chrono::high_resolution_clock> m_start_time;
      79 |  
      80 | -    class BenchRunner
      81 | +    bool UpdateTimer(std::chrono::time_point<std::chrono::high_resolution_clock> finish_time);
      82 | +    void PrintResults() const;
    


    TheBlueMatt commented at 8:57 PM on November 6, 2017:

    This function is neither defined, nor used.


    martinus commented at 6:51 PM on November 7, 2017:

    thanks, I've removed it

  25. TheBlueMatt commented at 8:58 PM on November 6, 2017: member

    Looks awesome, thanks for tackling this!

  26. theuni commented at 10:26 PM on November 7, 2017: member

    Concept ACK. The plotly.js issue could be worked around by requiring that a url be supplied when plotting.

    It'd be nice if this could be rebased on #11562, as I think that's a simple logical subset of the changes here.

  27. laanwj commented at 3:50 AM on November 8, 2017: member

    Concept ACK. Adding HTML as an optional format is neat, please keep CSV support though I need it (but you do so that's great).

    NACK on checking plotly (or any JS code) into the repository.

    I think it's ok to use this URI by default for the HTML output, but agree that it should be possible to provide your own (e.g. if you've installed it somewhere on your local network, and want to avoid the internet access).

    Also agree on rebasing on top of #11562, that one is ready for merge and will go in today.

  28. martinus force-pushed on Nov 8, 2017
  29. martinus commented at 5:48 AM on November 9, 2017: contributor

    I've rebased and squashed my changes on top of the current master, and added a configurable plotly.js URL. That was the first time I've done a rebase; I hope I've done it correctly! The travis build is green, but for some reason it shows as "build is in progress"?

  30. MarcoFalke commented at 3:55 PM on November 9, 2017: member

    Don't worry about the travis-yellow. This happens from time to time.

  31. MarcoFalke commented at 3:34 PM on November 10, 2017: member

    Needs rebase

  32. martinus force-pushed on Nov 10, 2017
  33. in src/bench/bench.cpp:12 in 6c31a3599a outdated
       7 | @@ -8,98 +8,135 @@
       8 |  #include <assert.h>
       9 |  #include <iostream>
      10 |  #include <iomanip>
      11 | +#include <algorithm>
      12 | +#include <regex>
    


    MarcoFalke commented at 9:09 PM on November 13, 2017:

    Don't forget to #include <numeric> for std::accumulate.


    martinus commented at 9:24 AM on November 18, 2017:

    thanks, updated!

  34. MarcoFalke commented at 9:47 PM on November 13, 2017: member

    Looks good 6c31a3599a66a6a59b23919f6e3724b80e936485. Would you mind updating "RollingBloom-refresh" to the same interface/output as well?

    Also, you removed the cycles statistics from the output. Would be nice to briefly mention the reason for that.

    Also, you removed the cycles statistics from the output. Would be nice to shortly mention the reason for that.

  35. martinus force-pushed on Nov 18, 2017
  36. martinus commented at 9:42 AM on November 18, 2017: contributor

    I have removed the cycles statistics because I personally don't think they are necessary, and it simplifies the code. I could add them back, though, if others think it's a helpful statistic.

  37. in src/bench/bench.cpp:100 in 9b4567c7b8 outdated
     101 | -void
     102 | -benchmark::BenchRunner::RunAll(benchmark::duration elapsedTimeForOne)
     103 | +void benchmark::BenchRunner::RunAll(Printer& printer, uint64_t num_evals, double scaling, const std::string& filter, bool is_list_only)
     104 |  {
     105 |      perf_init();
     106 |      if (std::ratio_less_equal<benchmark::clock::period, std::micro>::value) {
    


    MarcoFalke commented at 3:31 PM on November 27, 2017:

    nit (unrelated to your changes): Condition should be negated.

    Mind to add a commit for this?


    martinus commented at 10:02 AM on December 2, 2017:

    Ah, thanks for noting. I have fixed that message and rebased & squashed everything

  38. MarcoFalke commented at 3:32 PM on November 27, 2017: member
  39. martinus force-pushed on Dec 2, 2017
  40. MarcoFalke commented at 10:47 PM on December 13, 2017: member

    utACK 846ae3e5143a3959a03fe9bdff78e0253ef38a37

  41. TheBlueMatt commented at 9:44 PM on December 14, 2017: member

    Code-level utACK 846ae3e5143a3959a03fe9bdff78e0253ef38a37, but there are a few iteration counts that seem much too high (or low). My output is below (take note of CCheckQueueSpeed, which took much, much too long, as well as CCheckQueueSpeedPrevectorJob, and a few which ran much too short).

    Benchmark, evals, iterations, total, min, max, median
    Base58CheckEncode, 5, 320000, 4.95443, 3.03136e-06, 3.15128e-06, 3.10137e-06
    Base58Decode, 5, 800000, 4.70946, 1.15549e-06, 1.19147e-06, 1.17997e-06
    Base58Encode, 5, 470000, 4.69162, 1.9516e-06, 2.03356e-06, 2.00767e-06
    BenchLockedPool, 5, 530, 4.77861, 0.00174317, 0.00186048, 0.00180833
    CCheckQueueSpeed, 5, 29000, 169.862, 0.00116512, 0.00117693, 0.00117196
    CCheckQueueSpeedPrevectorJob, 5, 1400, 24.7343, 0.00344175, 0.00358993, 0.00354375
    CCoinsCaching, 5, 170000, 5.63809, 6.55766e-06, 6.74362e-06, 6.64618e-06
    CoinSelection, 5, 650, 7.56354, 0.00229932, 0.00235094, 0.00234165
    DeserializeAndCheckBlockTest, 5, 160, 6.90174, 0.00853605, 0.00879497, 0.00859933
    DeserializeBlockTest, 5, 130, 4.66008, 0.00716357, 0.00717861, 0.00716775
    FastRandom_1bit, 5, 44000000, 0.552348, 2.49383e-09, 2.5533e-09, 2.50127e-09
    FastRandom_32bit, 5, 110000000, 5.47546, 9.9329e-09, 1.00259e-08, 9.93521e-09
    MempoolEviction, 5, 41000, 4.77384, 2.32089e-05, 2.34167e-05, 2.3244e-05
    PrevectorClear, 5, 5600, 5.79097, 0.000206768, 0.000206858, 0.000206819
    PrevectorDestructor, 5, 5700, 5.12332, 0.00017952, 0.00018031, 0.000179633
    RIPEMD160, 5, 440, 5.56629, 0.00252712, 0.00254049, 0.00252763
    RollingBloom, 5, 1500000, 6.21419, 8.1511e-07, 8.77143e-07, 8.16809e-07
    SHA1, 5, 570, 5.67104, 0.00198533, 0.00199761, 0.00198967
    SHA256, 5, 340, 5.67169, 0.00322694, 0.00361889, 0.00325424
    SHA256_32b, 5, 4700000, 5.55119, 2.33977e-07, 2.40388e-07, 2.35284e-07
    SHA512, 5, 330, 5.45279, 0.00328487, 0.00334672, 0.00329141
    SipHash_32b, 5, 40000000, 5.66765, 2.82588e-08, 2.84166e-08, 2.83323e-08
    Sleep100ms, 5, 10, 5.0036, 0.10007, 0.100075, 0.100071
    Trig, 5, 12000000, 1.45462, 2.28395e-08, 2.80425e-08, 2.35008e-08
    VerifyScriptBench, 5, 6300, 5.76116, 0.000181074, 0.000186656, 0.000182651

  42. jtimon commented at 11:38 PM on December 14, 2017: contributor

    Concept ACK

  43. MarcoFalke commented at 11:51 PM on December 15, 2017: member

    @martinus Can you share your results? @TheBlueMatt That is interesting. Running on my laptop gives similar results to yours with the exception of CCheckQueueSpeed and CCheckQueueSpeedPrevectorJob

    differ: ccheckqueuespeed

    similar: coinselection

    MarcoFalke.txt:

    # Benchmark, evals, iterations, total, min, max, median
    Base58CheckEncode, 5, 320000, 4.5474, 2.817e-06, 2.8636e-06, 2.84568e-06
    Base58Decode, 5, 800000, 3.82362, 9.30481e-07, 1.00586e-06, 9.49333e-07
    Base58Encode, 5, 470000, 4.7699, 1.86009e-06, 2.11548e-06, 2.05627e-06
    BenchLockedPool, 5, 530, 5.17102, 0.00175682, 0.00213842, 0.00190727
    CCheckQueueSpeed, 5, 29000, 23.8431, 0.000159138, 0.000169364, 0.000164902
    CCheckQueueSpeedPrevectorJob, 5, 1400, 6.47687, 0.000917759, 0.000934733, 0.000922342
    CCoinsCaching, 5, 170000, 4.86768, 5.59884e-06, 5.89609e-06, 5.68206e-06
    CoinSelection, 5, 650, 7.6276, 0.00218004, 0.00252972, 0.00234302
    DeserializeAndCheckBlockTest, 5, 160, 6.98462, 0.00763635, 0.0101189, 0.00814532
    DeserializeBlockTest, 5, 130, 4.60714, 0.00664361, 0.00766447, 0.00692787
    FastRandom_1bit, 5, 44000000, 0.428713, 1.93345e-09, 1.97208e-09, 1.94739e-09
    FastRandom_32bit, 5, 110000000, 4.87081, 8.54002e-09, 9.68865e-09, 8.64202e-09
    MempoolEviction, 5, 41000, 4.25868, 2.01973e-05, 2.16433e-05, 2.0531e-05
    PrevectorClear, 5, 5600, 4.10787, 0.000139865, 0.000152129, 0.000148594
    PrevectorDestructor, 5, 5700, 3.80067, 0.000130306, 0.000137978, 0.000133524
    RIPEMD160, 5, 440, 5.45262, 0.00240383, 0.00258753, 0.00243835
    RollingBloom, 5, 1500000, 6.32018, 7.26596e-07, 9.93319e-07, 7.94326e-07
    SHA1, 5, 570, 5.27207, 0.00177787, 0.00190894, 0.00186143
    SHA256, 5, 340, 5.01164, 0.00285627, 0.00317001, 0.00289666
    SHA256_32b, 5, 4700000, 4.91959, 2.04949e-07, 2.16811e-07, 2.09041e-07
    SHA512, 5, 330, 4.41158, 0.00265473, 0.00268366, 0.00267743
    SipHash_32b, 5, 40000000, 4.93749, 2.42605e-08, 2.48857e-08, 2.47638e-08
    Sleep100ms, 5, 10, 5.0058, 0.100097, 0.100133, 0.100118
    Trig, 5, 12000000, 1.16958, 1.89401e-08, 2.04877e-08, 1.90709e-08
    VerifyScriptBench, 5, 6300, 4.50527, 0.000139754, 0.000145372, 0.000143697
    
  44. martinus commented at 7:18 AM on December 16, 2017: contributor

    That's interesting, here are my results:

    # Benchmark, evals, iterations, total, min, max, median
    Base58CheckEncode, 5, 320000, 5.79601, 3.47514e-06, 3.90062e-06, 3.56941e-06
    Base58Decode, 5, 800000, 5.68374, 1.35576e-06, 1.47075e-06, 1.43981e-06
    Base58Encode, 5, 470000, 5.97008, 2.29896e-06, 2.8671e-06, 2.47352e-06
    BenchLockedPool, 5, 530, 5.74901, 0.00202647, 0.00244975, 0.00208988
    CCheckQueueSpeed, 5, 29000, 4.64417, 3.06441e-05, 3.39189e-05, 3.14987e-05
    CCheckQueueSpeedPrevectorJob, 5, 1400, 5.73779, 0.000769057, 0.000853848, 0.000828896
    CCoinsCaching, 5, 170000, 5.82152, 6.65853e-06, 7.02337e-06, 6.89329e-06
    CoinSelection, 5, 650, 5.41396, 0.00160549, 0.00172544, 0.00165988
    DeserializeAndCheckBlockTest, 5, 160, 6.57906, 0.00803611, 0.00852187, 0.00820457
    DeserializeBlockTest, 5, 130, 4.45676, 0.00673637, 0.00707143, 0.00679454
    FastRandom_1bit, 5, 44000000, 0.545512, 2.36596e-09, 2.6505e-09, 2.48319e-09
    FastRandom_32bit, 5, 110000000, 5.45518, 9.6993e-09, 1.00983e-08, 9.95917e-09
    MempoolEviction, 5, 41000, 5.60891, 2.63059e-05, 3.01792e-05, 2.64325e-05
    PrevectorClear, 5, 5600, 5.59075, 0.000189645, 0.000213659, 0.000199472
    PrevectorDestructor, 5, 5700, 5.83946, 0.000179477, 0.000234454, 0.000197302
    RIPEMD160, 5, 440, 5.58772, 0.00239881, 0.00266644, 0.00254531
    RollingBloom, 5, 1500000, 6.05577, 6.86181e-07, 9.65984e-07, 7.73467e-07
    SHA1, 5, 570, 5.15691, 0.00179377, 0.00183837, 0.00179627
    SHA256, 5, 340, 5.21523, 0.00300338, 0.0030959, 0.00308264
    SHA256_32b, 5, 4700000, 5.28716, 2.20308e-07, 2.30364e-07, 2.25683e-07
    SHA512, 5, 330, 5.22515, 0.00310101, 0.00323767, 0.00316392
    SipHash_32b, 5, 40000000, 5.31217, 2.60762e-08, 2.69529e-08, 2.68093e-08
    Sleep100ms, 5, 10, 5.03343, 0.10037, 0.10093, 0.100714
    Trig, 5, 12000000, 5.09343, 8.40781e-08, 8.57396e-08, 8.51671e-08
    VerifyScriptBench, 5, 6300, 5.47213, 0.000165754, 0.00018609, 0.000169795
    

    I have configured the runtimes so that the total time is somewhat close to 5 seconds on my PC; the only mistake I see is that I set FastRandom_1bit to 10 times too few iterations. Trig is a pretty useless benchmark, just a demonstration, so the runtimes there do not really matter IMHO. All other benchmarks take roughly a bit over 5 seconds for me. I'm using g++ 5.4.0 on Linux.

    It's really interesting that the runtimes of CCheckQueueSpeed are so different. What compiler are you using?

  45. TheBlueMatt commented at 1:06 AM on December 18, 2017: member

    Heh, I went and checked, and the CCheckQueue bench tests are just (rather uselessly) testing inter-core latency... when I turn the core count way down using taskset, things get much faster (2 non-HT cores seems to be the fastest; I didn't test across physical CPUs, but assume it would be very slow). I'd suggest we either turn the bench off wholesale or put some semi-realistic work in the check there.

  46. TheBlueMatt commented at 1:08 AM on December 18, 2017: member

    Oh, also this is on a many-core system, so the checkqueue bench spawns a ton of threads (it uses a dynamic count based on hardware), making the bench even more inconsistent between systems.

  47. MarcoFalke commented at 10:34 PM on December 19, 2017: member

    @martinus I don't want to hold up your pull request any longer. I really like how flexible the new design is to jump in and hack your own stuff into it. However, since there seem to be issues arising from hardcoding the number of iterations, I'd propose the following solution:

    (Struck through later:) Add a commit that also hardcodes the number of cores in checkqueue.cpp to 2. E.g.

    diff --git a/src/bench/checkqueue.cpp b/src/bench/checkqueue.cpp
    index e79c986792..55590c4a72 100644
    --- a/src/bench/checkqueue.cpp
    +++ b/src/bench/checkqueue.cpp
    @@ -5,3 +5,2 @@
     #include <bench/bench.h>
    -#include <util.h>
     #include <validation.h>
    @@ -17,3 +16,3 @@
     // particularly visible
    -static const int MIN_CORES = 2;
    +static constexpr int NUM_CORES = 2;
     static const size_t BATCHES = 101;
    @@ -33,4 +32,4 @@ static void CCheckQueueSpeed(benchmark::State& state)
         boost::thread_group tg;
    -    for (auto x = 0; x < std::max(MIN_CORES, GetNumCores()); ++x) {
    -       tg.create_thread([&]{queue.Thread();});
    +    for (auto x = 0; x < NUM_CORES; ++x) {
    +        tg.create_thread([&] { queue.Thread(); });
         }
    @@ -80,4 +79,4 @@ static void CCheckQueueSpeedPrevectorJob(benchmark::State& state)
         boost::thread_group tg;
    -    for (auto x = 0; x < std::max(MIN_CORES, GetNumCores()); ++x) {
    -       tg.create_thread([&]{queue.Thread();});
    +    for (auto x = 0; x < NUM_CORES; ++x) {
    +        tg.create_thread([&] { queue.Thread(); });
         }
    
  48. MarcoFalke commented at 10:36 PM on December 19, 2017: member

    @TheBlueMatt suggested to get rid of the test, since it is not really a useful benchmark. I am fine with that, too.

    (Make sure to add it as separate commit, so it is easier to review)

  49. in src/bench/bench_bitcoin.cpp:12 in 33dbba421d outdated
       8 | @@ -9,17 +9,64 @@
       9 |  #include <validation.h>
      10 |  #include <util.h>
      11 |  #include <random.h>
      12 | +#include <init.h>
    


    laanwj commented at 8:59 PM on December 20, 2017:

    Why does this add a dependency on init.h?

  50. in contrib/devtools/check-doc.py:18 in 33dbba421d outdated
      14 | @@ -15,9 +15,9 @@
      15 |  import sys
      16 |  
      17 |  FOLDER_GREP = 'src'
      18 | -FOLDER_TEST = 'src/test/'
      19 | +FOLDER_TEST = 'src/(test|bench)/'
    


    MarcoFalke commented at 9:01 PM on December 20, 2017:

    What's the reason for this change? It hides a valid warning for me at least:

    Args undocumented: 1
    set(['-plot-height'])
    
  51. in src/bench/bench_bitcoin.cpp:41 in 33dbba421d outdated
      36 | +                  << HelpMessageOpt("-filter=<regex>", strprintf(_("Regular expression filter to select benchmark by name (default: %s)"), DEFAULT_BENCH_FILTER))
      37 | +                  << HelpMessageOpt("-scaling=<n>", strprintf(_("Scaling factor for benchmark's runtime (default: %u)"), DEFAULT_BENCH_SCALING))
      38 | +                  << HelpMessageOpt("-printer=(console|plot)", strprintf(_("Choose printer format. console: print data to console. plot: Print results as HTML graph (default: %s)"), DEFAULT_BENCH_PRINTER))
      39 | +                  << HelpMessageOpt("-plot-plotlyurl=<uri>", strprintf(_("URL to use for plotly.js (default: %s)"), DEFAULT_PLOT_PLOTLYURL))
      40 | +                  << HelpMessageOpt("-plot-width=<x>", strprintf(_("Plot width in pixel (default: %u)"), DEFAULT_PLOT_WIDTH))
      41 | +                  << HelpMessageOpt("-plot-width=<x>", strprintf(_("Plot height in pixel (default: %u)"), DEFAULT_PLOT_HEIGHT));
    


    MarcoFalke commented at 9:02 PM on December 20, 2017:

    Arg should say "-plot-height=<x>" to make the python check happy.

  52. MarcoFalke commented at 9:02 PM on December 20, 2017: member

    re-ACK. Needs rebase and two nits fixed.

  53. martinus commented at 8:42 PM on December 21, 2017: contributor

    I think I made my branch kaputt while trying to rebase :-(

  54. MarcoFalke commented at 10:10 PM on December 21, 2017: member

    Git commits are unkaputtbar. Should be trivial to recover. Give me a sec...

    On Dec 21, 2017 15:43, "Martin Ankerl" notifications@github.com wrote:

    I think I made my branch kaputt while trying to rebase :-(


  55. MarcoFalke commented at 10:27 PM on December 21, 2017: member

    git checkout improved-benchmarking
    git reset --hard 48ea572e79e1f63a5de4025d02ba72a666bcdc95
    git rebase 604e08c83cf58ca7e7cda2ab284c1ace7bb12977

    Solve the merge conflict (should be straightforward): vim src/bench/checkqueue.cpp

    git add src/bench/checkqueue.cpp && git rebase --continue
    git rebase --interactive HEAD~4

    Adjust the prefixes and the order of the commits so that it reads (approximately):

    pick foobaa0 Improved microbenchmarking with multiple features.

    f foobaa1 increased number of iterations so it takes about 5 seconds

    f foobaa2 removed unused init.h include fixed argument help check for bench

    pick foobaa3 Removed CCheckQueueSpeed benchmark

    git push origin improved-benchmarking --force

  56. Improved microbenchmarking with multiple features.
    * inline performance critical code
    * Average runtime is specified and used to calculate iterations.
    * Console: show median of multiple runs
    * plot: show box plot
    * filter benchmarks
    * specify scaling factor
    * ignore src/test and src/bench in command line check script
    * number of iterations instead of time
    * Replaced runtime in BENCHMARK macro with number of iterations.
    * Added -? to bench_bitcoin
    * Benchmark plotly.js URL, width, height can be customized
    * Fixed incorrect precision warning
    00721e69f8
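    The core of the "inline performance critical code" item, per the PR description, is that the per-iteration check becomes a cheap inlined countdown while the expensive clock reads move into an out-of-line slow path. A self-contained sketch of that idea, with simplified names and batch handling that are illustrative rather than the exact bench.h interface:

    ```cpp
    #include <cassert>
    #include <chrono>
    #include <cstdint>

    class State
    {
        static constexpr uint64_t BATCH = 128;  // iterations between timer updates
        int64_t num_iters_left{0};              // fast countdown for current batch
        uint64_t total_done{0};
        uint64_t max_iterations;
        bool started{false};
        std::chrono::steady_clock::time_point batch_start{};
        std::chrono::steady_clock::duration elapsed{};

        bool UpdateTimer()                      // slow path, called once per batch
        {
            const auto now = std::chrono::steady_clock::now();
            if (started) {
                elapsed += now - batch_start;   // account for the finished batch
                total_done += BATCH;
            }
            started = true;
            if (total_done >= max_iterations) return false;
            num_iters_left = BATCH - 1;         // this call's `true` is one iteration
            batch_start = std::chrono::steady_clock::now();
            return true;
        }

    public:
        explicit State(uint64_t iters) : max_iterations{iters} {}

        inline bool KeepRunning()               // hot path: decrement and compare only
        {
            if (num_iters_left-- > 0) return true;
            return UpdateTimer();
        }

        uint64_t Iterations() const { return total_done; }
    };

    int main()
    {
        State state{256};                       // 256 is a multiple of BATCH
        uint64_t body_runs = 0;
        while (state.KeepRunning()) ++body_runs; // benchmark body goes here
        assert(body_runs == 256);
        assert(state.Iterations() == 256);
        return 0;
    }
    ```

    Because no clock call sits on the hot path, short-running benchmark bodies are no longer inflated by measurement overhead; the timer cost is amortized over each batch of iterations.
    
    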
  57. Removed CCheckQueueSpeed benchmark
    This benchmark's runtime was rather unpredictable across different machines, so it was not really a useful benchmark.
    760af84072
  58. in contrib/devtools/check-doc.py:20 in e63f31138c outdated
      16 | @@ -17,7 +17,7 @@
      17 |  FOLDER_GREP = 'src'
      18 |  FOLDER_TEST = 'src/test/'
      19 |  CMD_ROOT_DIR = '`git rev-parse --show-toplevel`/%s' % FOLDER_GREP
      20 | -CMD_GREP_ARGS = r"egrep -r -I '(map(Multi)?Args(\.count\(|\[)|Get(Bool)?Arg\()\"\-[^\"]+?\"' %s | grep -v '%s'" % (CMD_ROOT_DIR, FOLDER_TEST)
      21 | +CMD_GREP_ARGS = r"egrep -r -I '(map(Multi)?Args(\.count\(|\[)|Get(Bool)?Arg\()\"\-[^\"]+?\"' %s | grep -Ev '%s'" % (CMD_ROOT_DIR, FOLDER_TEST)
    


    MarcoFalke commented at 10:35 PM on December 21, 2017:

    Don't forget to remove this as well

  59. martinus force-pushed on Dec 23, 2017
  60. martinus commented at 10:22 AM on December 23, 2017: contributor

    Thanks a lot @MarcoFalke!

  61. laanwj commented at 1:52 PM on December 23, 2017: member

    utACK 760af84

  62. laanwj merged this on Dec 23, 2017
  63. laanwj closed this on Dec 23, 2017

  64. laanwj referenced this in commit 5180a86c96 on Dec 23, 2017
  65. MarcoFalke commented at 5:25 PM on December 23, 2017: member

    Post-merge checked that the rebase was done correctly:

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA512
    
    ACK 760af84072408ba53d009e868fccc25fb186d40c
    -----BEGIN PGP SIGNATURE-----
    
    iQIcBAEBCgAGBQJaPpFpAAoJENLqSFDnUosl6n4QAKHR/UGqeuxq0CJHWsaP2gX/
    JIG2y99LkAdepKBIUAaHxwSiLjUVPFX+8m+v5sdgcUuTMzhnF/oto59WDG7UJ5RN
    UTw+rGcZuFN8aBBcgtrLCKASjLVsAtImIESnx8uOgT+NQ2ShNtmwlbazNBmupx6e
    yChi665E30Olw5w7CCDHPKttzZozp5ZoCCTjMlYChJC9F7fr9+jmgr4cLhHrRCIt
    WtO874M6lzi45LJEPL6waLpV+DakCmss733iVwf9eb6aEvrJ6Ix2TSGiIRyLlarZ
    MSCEBEHJbEmYOxQmZtGAuTd+9dZHR1vuEQiWB9wbJaggb9K+/7aBTsHd7/h/XEvr
    DXb1wFiFmNjNytpqsIgYeOz1T3SK7vuLod9uFXSXx9TEc4/i91RuHZgQ9kBr1T8w
    BrYBzSlSAziJ5fqKthfShj1C3X3Wpj9axcVAxFyPo1h+JbamVUDAl8B5afSPtkY3
    h62TWy4n4KhYEqqBQfMi2UUNhyEvwR46MOVj9i3UrEemRnL/OtxnkitFa5mhqBi5
    eye5MeRXgYvLSRRHUoErtx3W5wcpC+8DCF/CnHLKUiyvCWD74XDhhss7TRdePcjq
    o103bXGqBvR8IjQT6WYqVZbkDkHBBIpqPrkEW/icp8MJLHsiqvI1ZR7eqXQxlHEC
    nfaCXEa0qfx+Qvd5+Ozp
    =siBw
    -----END PGP SIGNATURE-----
    
  66. martinus deleted the branch on May 22, 2018
  67. laanwj referenced this in commit 7c7508c268 on Jun 5, 2018
  68. PastaPastaPasta referenced this in commit 3301b73c37 on Jun 17, 2020
  69. PastaPastaPasta referenced this in commit 9a9aa3f467 on Jul 2, 2020
  70. PastaPastaPasta referenced this in commit 516aa1f4d5 on Jul 18, 2020
  71. PastaPastaPasta referenced this in commit f08b2f82f8 on Jul 19, 2020
  72. PastaPastaPasta referenced this in commit 7ccd49aede on Jul 19, 2020
  73. UdjinM6 referenced this in commit 83726d286e on Jul 20, 2020
  74. gades referenced this in commit cf5d8c4d57 on Jun 30, 2021
  75. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-16 18:15 UTC
