RFC: bench: Add multi thread benchmarking #33740

pull fjahr wants to merge 1 commits into bitcoin:master from fjahr:202510-multi-thread-bench changing 4 files +88 −31
  1. fjahr commented at 11:36 pm on October 29, 2025: contributor

    This adds a rough way to run specific benchmarks with different numbers of worker threads to see how these impact performance. This is useful for me in the batch validation context and for potential refactorings of checkqueue but I am not sure if this is useful in many other contexts so I am leaving this as a RFC draft for now to see if there is any interest in merging something like this.

    Example output:

     0$ build/bin/bench_bitcoin -filter=ConnectBlockAllSchnorr -min-time=1000 -scale-threads
     1Running benchmarks with worker threads from 1 to 15
     2
     3|            ns/block |             block/s |    err% |     total | benchmark
     4|--------------------:|--------------------:|--------:|----------:|:----------
     5|       43,118,187.50 |               23.19 |    0.1% |      0.99 | `ConnectBlockAllSchnorr_1_threads`
     6|       29,015,444.33 |               34.46 |    0.2% |      1.10 | `ConnectBlockAllSchnorr_2_threads`
     7|       22,241,437.50 |               44.96 |    0.1% |      1.09 | `ConnectBlockAllSchnorr_3_threads`
     8|       17,893,361.00 |               55.89 |    0.3% |      1.09 | `ConnectBlockAllSchnorr_4_threads`
     9|       15,105,819.50 |               66.20 |    0.2% |      1.09 | `ConnectBlockAllSchnorr_5_threads`
    10|       13,145,510.38 |               76.07 |    0.2% |      1.08 | `ConnectBlockAllSchnorr_6_threads`
    11|       11,646,614.62 |               85.86 |    0.2% |      1.06 | `ConnectBlockAllSchnorr_7_threads`
    12|       10,476,425.89 |               95.45 |    0.2% |      1.07 | `ConnectBlockAllSchnorr_8_threads`
    13|        9,547,125.00 |              104.74 |    0.2% |      1.08 | `ConnectBlockAllSchnorr_9_threads`
    14|        9,574,246.27 |              104.45 |    0.6% |      1.08 | `ConnectBlockAllSchnorr_10_threads`
    15|        9,557,633.40 |              104.63 |    1.1% |      1.08 | `ConnectBlockAllSchnorr_11_threads`
    16|        9,495,175.00 |              105.32 |    0.8% |      1.08 | `ConnectBlockAllSchnorr_12_threads`
    17|        9,297,012.50 |              107.56 |    0.3% |      1.07 | `ConnectBlockAllSchnorr_13_threads`
    18|        9,775,795.45 |              102.29 |    0.7% |      1.11 | `ConnectBlockAllSchnorr_14_threads`
    19|        9,776,140.18 |              102.29 |    0.9% |      1.06 | `ConnectBlockAllSchnorr_15_threads`
    
  2. DrahtBot commented at 11:36 pm on October 29, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33740.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #33300 (fuzz: compact block harness by Crypt-iQ)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. fjahr commented at 11:44 pm on October 29, 2025: contributor
    I people could test the exact command as above it might be interesting to post the results here. For me at least, 13 worker threads are consistently delivering a better result than 15 worker threads. EDIT: After some time I started to see the best results with 14 and 15, so maybe this was nothing.
  4. bench: Add step-wise multi thread benchmarking de25d1a855
  5. fjahr force-pushed on Oct 29, 2025
  6. in src/bench/bench.cpp:154 in de25d1a855
    161+        for (int threads : thread_scales) {
    162+            std::vector<std::string> current_setup_args = args.setup_args;
    163+            if (threads > 0) {
    164+                bool found = false;
    165+                for (auto& arg : current_setup_args) {
    166+                    if (arg.starts_with("-worker-threads=") == 0) {
    


    l0rinc commented at 9:33 am on November 25, 2025:

    hmmm, this looks like a refactor gone wrong, was this arg.find("-worker-threads=") == 0 originally?

    0                    if (arg.starts_with("-worker-threads=")) {
    

    so I guess if (!found) { should also be adjusted after this


    Alternatively, instead of finding and overwriting, what if we unconditionally erased and pushed back the actual thread count, something like:

    0std::vector current_setup_args {args.setup_args};
    1if (threads > 0) {
    2    std::erase_if(current_setup_args, [](const auto& arg) { return arg.starts_with("-worker-threads="); });
    3    current_setup_args.push_back(strprintf("-worker-threads=%d", threads));
    4}
    
  7. in src/bench/bench.cpp:125 in de25d1a855
    127+    if (is_thread_scaling) {
    128+        // Scale from 1 to MAX_SCRIPTCHECK_THREADS
    129+        constexpr int MAX_SCRIPTCHECK_THREADS = 15;
    130+        for (int i = 1; i <= MAX_SCRIPTCHECK_THREADS; ++i) {
    131+            thread_scales.push_back(i);
    132+        }
    


    l0rinc commented at 9:38 am on November 25, 2025:

    Script checks are currently the main source of threading, but not necessarily limited to this (e.g. https://github.com/bitcoin/bitcoin/pull/31132/commits or #26966 or #33689 or #32747), so I would suggest untangling multithreading from script validation here. If you insist this being about script validation (which sounds a bit too specific to include in a general benchmark), ignore the comment.

    nit: redundant comment, explains exactly what the code already explains without adding additional context

  8. in src/test/util/setup_common.cpp:274 in de25d1a855
    269+            // Use no worker threads while fuzzing to avoid non-determinism
    270+            worker_threads = 0;
    271+        } else if (m_node.args->IsArgSet("-worker-threads")) {
    272+            worker_threads = m_node.args->GetIntArg("-worker-threads", 2);
    273+            if (worker_threads < 0) {
    274+                throw std::runtime_error("-worker-threads must be non-negative");
    


    l0rinc commented at 9:40 am on November 25, 2025:
    What should happen when worker_threads > MAX_SCRIPTCHECK_THREADS?
  9. in src/bench/bench.cpp:119 in de25d1a855
    121+    bool is_thread_scaling = args.scale_threads && (args.regex_filter != ".*");
    122+    if (args.scale_threads && !is_thread_scaling) {
    123+        std::cout << "Warning: -scale-threads requires -filter to be set. Running with default thread count." << std::endl;
    124+    }
    125+
    126+    std::vector<int> thread_scales;
    


    l0rinc commented at 9:50 am on November 25, 2025:

    What’s the purpose of a vector over a range? Are we planning on testing threads 2,4,8,16 instead of 1,2,3,4,5,6...? If not, consider making it a range, something like:

    0constexpr int DEFAULT_MAX_BENCH_THREADS{16};
    1const auto [start_threads, end_threads]{is_thread_scaling ? std::pair{1, DEFAULT_MAX_BENCH_THREADS} : std::pair{0, 0}};
    

    and maybe iterate as

    0for (int threads{start_threads}; threads <= end_threads; ++threads) {
    1    ...
    2}
    
  10. in src/bench/bench.cpp:114 in de25d1a855
    116-        std::vector<const char*> ret;
    117-        ret.reserve(args.setup_args.size());
    118-        for (const auto& arg : args.setup_args) ret.emplace_back(arg.c_str());
    119-        return ret;
    120-    };
    121+    bool is_thread_scaling = args.scale_threads && (args.regex_filter != ".*");
    


    l0rinc commented at 9:51 am on November 25, 2025:

    Might be out of scope for this PR, but instead of inlining DEFAULT_BENCH_FILTER here, can we move it over to src/bench/bench.cpp

    0    bool is_thread_scaling = args.scale_threads && (args.regex_filter != DEFAULT_BENCH_FILTER);
    
  11. l0rinc commented at 10:11 am on November 25, 2025: contributor
    Not sure whether the concept of script validation has to creep into the benchmarking parameters, maybe we could generalize it to -thread-count or -par or something. Left a few nits.
  12. fjahr commented at 10:58 pm on November 26, 2025: contributor

    Not sure whether the concept of script validation has to creep into the benchmarking parameters, maybe we could generalize it to -thread-count or -par or something.

    Thanks, happy to change this and address the other comments. But aside from that do you think this is generally interesting enough to be considered merging? I was unsure and since I haven’t worked much on benchmarking I would like to get some concept-ack-ish feedback from the benchmarking wg :)


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-12-12 00:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me