bench: update nanobench add -min_time #23025

pull martinus wants to merge 9 commits into bitcoin:master from martinus:2021-09-update-nanobench changing 9 files +128 −72
  1. martinus commented at 10:44 am on September 18, 2021: contributor

    This PR updates the nanobench with the latest release from upstream, v4.3.6. It fixes the missing performance counters.

    Due to discussions on #22999 I have done some work that should make the benchmark results more reliable. It introduces a new flag -min_time that allows to run a benchmark for much longer then the default. When results are unreliable, choosing a large timeframe here should usually get repeatable results even when frequency scaling cannot be disabled. The default is now 10ms. For this to work I have changed the AddrManGood and EvictionProtection benchmarks so they work with any number of iterations.

    Also, this adds more usage documentation to bench_bitcoin -h and I’ve cherry-picked two changes from #22999 authored by Jon Atack

  2. fanquake added the label Upstream on Sep 18, 2021
  3. in src/bench/peer_eviction.cpp:31 in 354fbc0022 outdated
    32-    size_t i{0};
    33+
    34     bench.run([&] {
    35-        ProtectEvictionCandidatesByRatio(copies.at(i));
    36-        ++i;
    37+        // creating a copy has an overhead of about 3%, so it does not influcence the benchmark results much.
    


    jonatack commented at 1:12 pm on September 18, 2021:
    3154897f nit, s/influcence/influence/

    martinus commented at 10:52 am on September 19, 2021:
    fixed in eb399ab0577dffe7e1080bb497ff642de1c8c359
  4. in src/bench/addrman.cpp:123 in 354fbc0022 outdated
    121+
    122     bench.run([&] {
    123-        markSomeAsGood(*addrmans.at(i));
    124-        ++i;
    125+        // To make the benchmark independent of the number of evaluations, we always prepare a new addrman.
    126+        // This is necessary because CAddrMan::Good() method modifies theobject, affecting the timing of subsequent calls
    


    jonatack commented at 1:19 pm on September 18, 2021:
    c67f365 nit s/method modifies theobject/modifies the object/

    martinus commented at 10:53 am on September 19, 2021:
    fixed in 652faa16a1410b51a5ad78a3121cf6ab9329f640
  5. in src/bench/bench_bitcoin.cpp:15 in 354fbc0022 outdated
    10 #include <util/system.h>
    11 
    12 #include <memory>
    13 
    14 static const char* DEFAULT_BENCH_FILTER = ".*";
    15+static const int64_t DEFAULT_MIN_DURATION_MS = 10;
    


    jonatack commented at 1:25 pm on September 18, 2021:

    f167647

    0static constexpr int64_t DEFAULT_MIN_DURATION_MS{10};
    

    martinus commented at 10:54 am on September 19, 2021:
    fixed in b8f16c323e02c3d06e5ad57b6c0f96196d9b12e0
  6. in src/bench/bench_bitcoin.cpp:64 in 354fbc0022 outdated
    60+                     "  highly depend on the stability of the machine. It can sometimes be very\n"
    61+                     "  difficult to get repeatable results, so here are a few tips to get stable\n"
    62+                     "  results:\n"
    63+                     "\n"
    64+                     "  * Use pyperf [1] to disable frequency scaling, turbo boost etc. For best\n"
    65+                     "    use CPU pinning and CPU isolation (see [2]).\n"
    


    jonatack commented at 1:31 pm on September 18, 2021:

    fc2d44dd missing word

    0                     "    results, use CPU pinning and CPU isolation (see [2]).\n"
    
  7. in src/bench/bench_bitcoin.cpp:71 in 354fbc0022 outdated
    67+                     "  * Each call of run() should do exactly the same work. E.g. inserting into\n"
    68+                     "    a std::vector doesn't do that as it will reallocate on certain calls. Make\n"
    69+                     "    sure each run has exactly the same preconditions.\n"
    70+                     "\n"
    71+                     "  * If results are still not reliable increase runtime with e.g.\n"
    72+                     "    -min_duration_ms=5000 to let a benchmark run for at least 5 seconds\n"
    


    jonatack commented at 1:32 pm on September 18, 2021:
    fc2d44d s/reliable increase/reliable, increase/ and s/seconds/seconds./

    martinus commented at 10:55 am on September 19, 2021:
    wordings fixed in 57a7a020a87f13c231b44ad1861b0bb6e6f92e70
  8. in src/bench/bench.h:49 in 354fbc0022 outdated
    45     bool is_list_only;
    46     std::vector<double> asymptote;
    47     std::string output_csv;
    48     std::string output_json;
    49+    std::string regex_filter;
    50+    std::chrono::milliseconds min_duration;
    


    jonatack commented at 1:37 pm on September 18, 2021:
    f167647 or 3e1a8ca maybe place this line after bool is_list_only, as it is a very lightweight data type in terms of packing

    martinus commented at 10:55 am on September 19, 2021:
    done!
  9. in src/bench/rollingbloom.cpp:25 in 354fbc0022 outdated
    28-        data[2] = count >> 8;
    29-        data[3] = count;
    30+        data[0] = static_cast<unsigned char>(count >> 24);
    31+        data[1] = static_cast<unsigned char>(count >> 16);
    32+        data[2] = static_cast<unsigned char>(count >> 8);
    33+        data[3] = static_cast<unsigned char>(count);
    


    jonatack commented at 1:43 pm on September 18, 2021:

    354fbc0 Would it be a good idea to instead just make data a vector of uint8? There has been ongoing refactoring in the codebase to move from unsigned char to uint8.

    0-    std::vector<unsigned char> data(32);
    1-    uint32_t count = 0;
    2+    std::vector<uint8_t> data(32);
    3+    uint8_t count = 0;
    

    martinus commented at 10:56 am on September 19, 2021:
    I can’t easily change that because the RollingBloomHash’s API uses that and I don’t want the change to be too large. I’ve changed this though to use & 0xFF like it’s done in many other places in the code. Done in b901174bfb6895c8e332c2b5e280348836398e68
  10. in src/bench/bench_bitcoin.cpp:60 in 354fbc0022 outdated
    56+                     "\n"
    57+                     "Description:\n"
    58+                     "\n"
    59+                     "  bench_bitcoin executes microbenchmarks. The quality of the benchmark results\n"
    60+                     "  highly depend on the stability of the machine. It can sometimes be very\n"
    61+                     "  difficult to get repeatable results, so here are a few tips to get stable\n"
    


    jonatack commented at 1:53 pm on September 18, 2021:

    fc2d44dd suggestion

    “It can sometimes be difficult to get stable, repeatable results, so here are a few tips:”

  11. in src/bench/bench_bitcoin.cpp:88 in 354fbc0022 outdated
    84+                     "  In rare cases it can be useful to suppress stability warnings. This can be\n"
    85+                     "  done with the environment variable NANOBENCH_SUPPRESS_WARNINGS, e.g:\n"
    86+                     "\n"
    87+                     "    NANOBENCH_SUPPRESS_WARNINGS=1 ./bench_bitcoin\n"
    88+                     "\n"
    89+                  << argsman.GetHelpMessage()
    


    jonatack commented at 1:55 pm on September 18, 2021:
    fc2d44d I think it would be most helpful to begin with the Options, followed by the Description and Notes, as they go together.

    jonatack commented at 6:41 pm on September 18, 2021:
    Another alternative would be to add this Description and Notes documentation to doc/benchmarking.md.

    martinus commented at 10:57 am on September 19, 2021:
    changed in 57a7a020a87f13c231b44ad1861b0bb6e6f92e70
  12. in src/bench/bench_bitcoin.cpp:24 in 354fbc0022 outdated
    21-    argsman.AddArg("-asymptote=n1,n2,n3,...", "Test asymptotic growth of the runtime of an algorithm, if supported by the benchmark", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    22+    argsman.AddArg("-asymptote=<n1,n2,n3,...>", "Test asymptotic growth of the runtime of an algorithm, if supported by the benchmark", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    23     argsman.AddArg("-filter=<regex>", strprintf("Regular expression filter to select benchmark by name (default: %s)", DEFAULT_BENCH_FILTER), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    24-    argsman.AddArg("-list", "List benchmarks without executing them", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    25+    argsman.AddArg("-list", "List benchmarks without executing them", ArgsManager::ALLOW_BOOL, OptionsCategory::OPTIONS);
    26+    argsman.AddArg("-min_duration_ms=<milliseconds>", strprintf("Minimum runtime per benchmark, in milliseconds (default: %d)", DEFAULT_MIN_DURATION_MS), ArgsManager::ALLOW_INT, OptionsCategory::OPTIONS);
    


    jonatack commented at 1:58 pm on September 18, 2021:
    f167647c consider shortening the argument name to -min_time (faster and easier to type!)

    martinus commented at 10:56 am on September 19, 2021:
    done in b8f16c323e02c3d06e5ad57b6c0f96196d9b12e0
  13. jonatack commented at 2:07 pm on September 18, 2021: member

    Nice–thank you for working on this! Using the min duration arg does improve stability for me (lower err%) and is convenient as a runtime option.

    Tested Concept/Approach ACK with various suggestions below.

  14. in src/bench/bench.cpp:64 in 354fbc0022 outdated
    58@@ -61,6 +59,11 @@ void benchmark::BenchRunner::RunAll(const Args& args)
    59 
    60         Bench bench;
    61         bench.name(p.first);
    62+        if (args.min_duration > std::chrono::milliseconds::zero()) {
    63+            // convert to nanos before dividing to reduce rounding errors
    64+            bench.minEpochTime(std::chrono::nanoseconds(args.min_duration) / bench.epochs());
    


    jonatack commented at 2:32 pm on September 18, 2021:

    Looking at https://en.cppreference.com/w/cpp/chrono/duration/duration_cast, would something like this be a good idea?

    0// converting integral duration to integral duration of shorter divisible time unit:
    1    // no duration_cast needed
    2    std::chrono::duration<long, std::micro> int_usec = int_ms;
    

    also line 62, can shorten if preferred

    0-        if (args.min_duration > std::chrono::milliseconds::zero()) {
    1+        if (args.min_duration > 0ms) {
    

    also, where std::chrono is added, look at include headers for <chrono> where needed


    martinus commented at 10:59 am on September 19, 2021:

    hm, I think it’s fine to use std::chrono::nanoseconds, it’s basically the same but with int64_t. It should be large so there’s no easy chance for overflow. I generally try to not use duration_cast, because that’s a sign that at that place there will be some loss of information due to the casting.

    I’ve changed line 64 to 0ms in b8f16c323e02c3d06e5ad57b6c0f96196d9b12e0

  15. DrahtBot commented at 2:50 am on September 19, 2021: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #22950 ([p2p] Pimpl AddrMan to abstract implementation details by amitiuttarwar)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  16. martinus force-pushed on Sep 19, 2021
  17. martinus commented at 11:00 am on September 19, 2021: contributor
    Thanks for the reviews! I’ve incorporated all the suggestions into the right commits and rebased everything.
  18. martinus renamed this:
    bench: update nanobench add `-min_duration_ms`
    bench: update nanobench add `-min_time`
    on Sep 19, 2021
  19. in src/bench/bench_bitcoin.cpp:71 in b901174bfb outdated
    67+                     "  * Each call of run() should do exactly the same work. E.g. inserting into\n"
    68+                     "    a std::vector doesn't do that as it will reallocate on certain calls. Make\n"
    69+                     "    sure each run has exactly the same preconditions.\n"
    70+                     "\n"
    71+                     "  * If results are still not reliable, increase runtime with e.g.\n"
    72+                     "    -min_duration_ms=5000 to let a benchmark run for at least 5 seconds.\n"
    


    jonatack commented at 3:26 pm on September 19, 2021:

    57a7a02

    0                     "    -min_time=5000 to let a benchmark run for at least 5 seconds.\n"
    

    martinus commented at 5:31 am on September 20, 2021:
    damn, forgot that one! Fixed in the latest rebase.
  20. in src/bench/bench.cpp:73 in b901174bfb outdated
    58@@ -61,6 +59,12 @@ void benchmark::BenchRunner::RunAll(const Args& args)
    59 
    60         Bench bench;
    61         bench.name(p.first);
    62+        if (args.min_time > 0ms) {
    63+            // convert to nanos before dividing to reduce rounding errors
    64+            std::chrono::nanoseconds min_time_ns = args.min_time;
    


    jonatack commented at 3:40 pm on September 19, 2021:

    b8f16c3

    • per doc/developer-notes.md, it looks like #include <chrono> should be added to bench.cpp and bench_bitcoin.cpp:
    0- Every `.cpp` and `.h` file should `#include` every header file it directly uses classes, functions or other
    1  definitions from, even if those headers are already included indirectly through other headers.
    2
    3  - *Rationale*: Excluding headers because they are already indirectly included results in compilation
    4    failures when those indirect dependencies change. Furthermore, it obscures what the real code
    5    dependencies are.
    
    • pico-nit, feel free to ignore
    0            const std::chrono::nanoseconds min_time_ns{args.min_time};
    

    martinus commented at 5:37 pm on September 19, 2021:

    hm, that include guideline is a bit hard to follow. There are lots of other missing includes, e.g. bench_bitcoin.cpp would need at least chrono, cstdint, vector, sstream, string, iostream. Should I add these as well?

    About the constructor, I think that is one of the cases where assignment is better: chrono’s assignment is more strict than the constructor, you can’t do e.g. std::chrono::nanoseconds x = 123 but you can do std::chrono::nanoseconds x{123}. So with the assignment it’s certain that args.min_time is actually another chrono type that convertible without loss of information, and not just any number that could represent anything.


    martinus commented at 5:32 am on September 20, 2021:
    I’ve now added the headers in clean up includes commit
  21. jonatack commented at 3:52 pm on September 19, 2021: member
    Almost-ACK b901174bfb6
  22. martinus force-pushed on Sep 20, 2021
  23. martinus commented at 5:32 am on September 20, 2021: contributor
    Rebased with header cleanups, and signed all commits.
  24. jonatack commented at 8:53 am on September 20, 2021: member
    ACK e7e890079e92e346997cbdf491b5b0717cd4bcab
  25. in src/bench/addrman.cpp:123 in cf84fc6f4a outdated
    121-        ++i;
    122+        // To make the benchmark independent of the number of evaluations, we always prepare a new addrman.
    123+        // This is necessary because CAddrMan::Good() method modifies the object, affecting the timing of subsequent calls
    124+        // to the same method and we want to do the same amount of work in every loop iteration.
    125+        //
    126+        // This has some overhead (about 8% on my machine), but that overhead is constant so improvements in
    


    MarcoFalke commented at 9:08 am on September 20, 2021:
    Likely going to change with #22974

    martinus commented at 11:12 am on September 20, 2021:
    hm, I don’t know much about what addrman does, but currently markSomeAsGood in the benchmark only uses every 32nd element. Can be bumped up to call Good() on every entry, maybe in random order?

    martinus commented at 12:49 pm on September 21, 2021:
    I’ve rebased because #22974 is already merged, and updated the benchmark in 153e686 so it makes more sense with the much faster Good(). Nothing else has changed

    MarcoFalke commented at 1:28 pm on September 21, 2021:
    If the overhead is now 99%, then changes in the order of -50% or +100% are not noticed. Just asking to make sure the overhead isn’t too much.

    martinus commented at 1:46 pm on September 21, 2021:

    This is what I get when I run ./src/bench/bench_bitcoin -min_time=10000 -filter="AddrManAdd.*" on my machine:

    ns/op op/s err% ins/op cyc/op IPC bra/op miss% total benchmark
    44,370,397.17 22.54 0.6% 448,854,352.19 141,515,648.83 3.172 33,913,128.43 0.1% 11.05 AddrManAdd
    114,389,015.11 8.74 0.2% 1,201,264,095.33 364,837,620.00 3.293 78,812,510.33 0.1% 10.87 AddrManAddThenGood

    So with the change it’s ~44ms for AddrManAdd, ~114ms for AddrManAddThenGood, leaving ~70ms for just the Good() part of the benchmark

  26. bench: update nanobench from 4.3.4 to 4.3.6
    Most importantly, this update fixes a bug in nanobench that always
    disabled performance counters on linux.
    
    It also adds another sanitizer suppression that is caught in clang++ 12.
    eed99cf272
  27. bench: remove unnecessary & incorrect multiplication in MuHashDiv
    Introduced in #19055, MuHashDiv benchmark used to multiply with a loop
    based on epochIterations. That does not do what it is supposed to do,
    because epochIterations() is determined automatically from nanobench.
    
    Also, multiplication is not needed for the algorithm (as pointed out by
    a comment in #19055), so it's better to remove this loop.
    468b232f71
  28. laanwj added the label Tests on Sep 21, 2021
  29. bench: change AddrManGood to AddrManAddThenGood
    Moves some of the setup into the benchmark loop so it is possible to run
    the loop for an arbitrary number of times. Due to recent optimizations
    in #22974 the benchmark now runs much faster, so the inner loop now calls
    Good() 32 times as often to get better numbers.
    
    Renamed the benchmark to AddrManAddThenGood because that's now what is
    actually tested. To get the the number of just Good(), one needs to
    subtract the benchmark result of AddrManAdd.
    153e6860e8
  30. bench: make EvictionProtection.* work with any number of iterations
    Moves copying of the setup into the benchmark loop so it is possible
    to run the loop for an arbitrary number of times.
    
    The overhead due to copying the candidates inside the loop is about 3%.
    9fef832932
  31. bench: introduce -min_time argument
    When it is not easily possible to stabilize benchmark machine and code
    the argument -min_time can be used to specify a minimum duration
    that a benchmark should take. E.g. choose -min_time=1000 if you
    are willing to wait about 1 second for each benchmark result.
    
    The default is now set to 10ms instead of 0, which should make runs on
    fast machines more stable with negligible slowdown.
    d3c6f8bfa1
  32. bench: add usage description and documentation
    This adds some usage description with tips to `bench_bitcoin -h`.
    1f10f1663e
  33. bench: clean up includes
    Drops unneeded and adds missing includes
    d312fd94a1
  34. bench: various args improvements
    - use ALLOW_BOOL for -list arg instead of ALLOW_ANY
    - touch up `-asymptote=<n1,n2,n3...>` help
    - pack Args struct a bit more efficiently
    - handle args in alphabetical order
    da4e2f1da0
  35. bench: fixed ubsan implicit conversion
    The benchmarks can now run much longer due to the minimum of 10ms or
    directly with -min_time. With -min_time=20000 I could trigger two ubsan
    errors in the benchmarks, which are fixed in this commit by using
    unsigned type and adding "& 0xFF".
    e148a52332
  36. martinus force-pushed on Sep 21, 2021
  37. jonatack commented at 1:27 pm on September 21, 2021: member

    re-ACK e148a5233292d156cda76cb20afb6641fc20f25e

    The Win64 CI failure looks unrelated.

    0$ NANOBENCH_SUPPRESS_WARNINGS=1 ./bench_bitcoin -filter=AddrMan*.* -min_time=20000
    1
    2|               ns/op |                op/s |    err% |     total | benchmark
    3|--------------------:|--------------------:|--------:|----------:|:----------
    4|      762,448,657.67 |                1.31 |    6.8% |     23.20 | `AddrManAdd`
    5|    1,794,860,430.00 |                0.56 |    2.5% |     19.71 | `AddrManAddThenGood`
    6|        9,101,993.99 |              109.87 |    7.8% |     20.56 | `AddrManGetAddr`
    7|           11,379.90 |           87,874.25 |    2.1% |     21.93 | `AddrManSelect`
    

    If you retouch, maybe order the addrman benchmark functions alphabetically, which would be the same order as the bench result output and place the two similar functions, AddrManAdd and AddrManAddThenGood, next to each other.

  38. martinus commented at 1:51 pm on September 21, 2021: contributor
    @jonatack Interesting that your benchmark results are so different. Did you compile for release? I can’t believe my computer is ~20 times faster than yours… I compile with ./autogen.sh && ./configure --with-incompatible-bdb CC=clang CXX=clang++
  39. jonatack commented at 1:58 pm on September 21, 2021: member
    @martinus no, it’s a debug build and the computer is not optimized. Re-building a non-debug build would take a long time (I have a slow 2-core laptop) and I’d already tested it properly above. Sorry for the weird example.
  40. laanwj commented at 4:38 pm on September 24, 2021: member
    Code review ACK e148a5233292d156cda76cb20afb6641fc20f25e
  41. laanwj merged this on Sep 24, 2021
  42. laanwj closed this on Sep 24, 2021

  43. sidhujag referenced this in commit 1f347afcd1 on Sep 24, 2021
  44. sidhujag referenced this in commit fbf3524358 on Sep 24, 2021
  45. martinus deleted the branch on Sep 26, 2021
  46. DrahtBot locked this on Oct 30, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 13:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me