RFC: bench: Add multi thread benchmarking

fjahr commented at 11:36 pm on October 29, 2025: contributor

This adds a rough way to run specific benchmarks with different numbers of worker threads to see how these impact performance. This is useful for me in the batch validation context and for potential refactorings of checkqueue but I am not sure if this is useful in many other contexts so I am leaving this as a RFC draft for now to see if there is any interest in merging something like this.

Example output:

 0$ build/bin/bench_bitcoin -filter=ConnectBlockAllSchnorr -min-time=1000 -scale-threads
 1Running benchmarks with worker threads from 1 to 15
 2
 3|            ns/block |             block/s |    err% |     total | benchmark
 4|--------------------:|--------------------:|--------:|----------:|:----------
 5|       43,118,187.50 |               23.19 |    0.1% |      0.99 | `ConnectBlockAllSchnorr_1_threads`
 6|       29,015,444.33 |               34.46 |    0.2% |      1.10 | `ConnectBlockAllSchnorr_2_threads`
 7|       22,241,437.50 |               44.96 |    0.1% |      1.09 | `ConnectBlockAllSchnorr_3_threads`
 8|       17,893,361.00 |               55.89 |    0.3% |      1.09 | `ConnectBlockAllSchnorr_4_threads`
 9|       15,105,819.50 |               66.20 |    0.2% |      1.09 | `ConnectBlockAllSchnorr_5_threads`
10|       13,145,510.38 |               76.07 |    0.2% |      1.08 | `ConnectBlockAllSchnorr_6_threads`
11|       11,646,614.62 |               85.86 |    0.2% |      1.06 | `ConnectBlockAllSchnorr_7_threads`
12|       10,476,425.89 |               95.45 |    0.2% |      1.07 | `ConnectBlockAllSchnorr_8_threads`
13|        9,547,125.00 |              104.74 |    0.2% |      1.08 | `ConnectBlockAllSchnorr_9_threads`
14|        9,574,246.27 |              104.45 |    0.6% |      1.08 | `ConnectBlockAllSchnorr_10_threads`
15|        9,557,633.40 |              104.63 |    1.1% |      1.08 | `ConnectBlockAllSchnorr_11_threads`
16|        9,495,175.00 |              105.32 |    0.8% |      1.08 | `ConnectBlockAllSchnorr_12_threads`
17|        9,297,012.50 |              107.56 |    0.3% |      1.07 | `ConnectBlockAllSchnorr_13_threads`
18|        9,775,795.45 |              102.29 |    0.7% |      1.11 | `ConnectBlockAllSchnorr_14_threads`
19|        9,776,140.18 |              102.29 |    0.9% |      1.06 | `ConnectBlockAllSchnorr_15_threads`

DrahtBot commented at 11:36 pm on October 29, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33740.

Reviews

See the guideline for information on the review process. A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#34210 (bench: Remove -priority-level= option by maflcko)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

fjahr commented at 11:44 pm on October 29, 2025: contributor

I people could test the exact command as above it might be interesting to post the results here. ~~For me at least, 13 worker threads are consistently delivering a better result than 15 worker threads.~~ EDIT: After some time I started to see the best results with 14 and 15, so maybe this was nothing.

fjahr force-pushed on Oct 29, 2025

in src/bench/bench.cpp:121 in de25d1a855 outdated

161+        for (int threads : thread_scales) {
162+            std::vector<std::string> current_setup_args = args.setup_args;
163+            if (threads > 0) {
164+                bool found = false;
165+                for (auto& arg : current_setup_args) {
166+                    if (arg.starts_with("-worker-threads=") == 0) {

l0rinc commented at 9:33 am on November 25, 2025:

hmmm, this looks like a refactor gone wrong, was this arg.find("-worker-threads=") == 0 originally?

0                    if (arg.starts_with("-worker-threads=")) {

so I guess if (!found) { should also be adjusted after this

Alternatively, instead of finding and overwriting, what if we unconditionally erased and pushed back the actual thread count, something like:

0std::vector current_setup_args {args.setup_args};
1if (threads > 0) {
2    std::erase_if(current_setup_args, [](const auto& arg) { return arg.starts_with("-worker-threads="); });
3    current_setup_args.push_back(strprintf("-worker-threads=%d", threads));
4}

in src/bench/bench.cpp:98 in de25d1a855 outdated

127+    if (is_thread_scaling) {
128+        // Scale from 1 to MAX_SCRIPTCHECK_THREADS
129+        constexpr int MAX_SCRIPTCHECK_THREADS = 15;
130+        for (int i = 1; i <= MAX_SCRIPTCHECK_THREADS; ++i) {
131+            thread_scales.push_back(i);
132+        }

l0rinc commented at 9:38 am on November 25, 2025:

Script checks are currently the main source of threading, but not necessarily limited to this (e.g. https://github.com/bitcoin/bitcoin/pull/31132/commits or #26966 or #33689 or #32747), so I would suggest untangling multithreading from script validation here. If you insist this being about script validation (which sounds a bit too specific to include in a general benchmark), ignore the comment.

nit: redundant comment, explains exactly what the code already explains without adding additional context

in src/test/util/setup_common.cpp:274 in de25d1a855 outdated

269+            // Use no worker threads while fuzzing to avoid non-determinism
270+            worker_threads = 0;
271+        } else if (m_node.args->IsArgSet("-worker-threads")) {
272+            worker_threads = m_node.args->GetIntArg("-worker-threads", 2);
273+            if (worker_threads < 0) {
274+                throw std::runtime_error("-worker-threads must be non-negative");

l0rinc commented at 9:40 am on November 25, 2025:

What should happen when worker_threads > MAX_SCRIPTCHECK_THREADS?

in src/bench/bench.cpp:92 in de25d1a855 outdated

121+    bool is_thread_scaling = args.scale_threads && (args.regex_filter != ".*");
122+    if (args.scale_threads && !is_thread_scaling) {
123+        std::cout << "Warning: -scale-threads requires -filter to be set. Running with default thread count." << std::endl;
124+    }
125+
126+    std::vector<int> thread_scales;

l0rinc commented at 9:50 am on November 25, 2025:

What’s the purpose of a vector over a range? Are we planning on testing threads 2,4,8,16 instead of 1,2,3,4,5,6...? If not, consider making it a range, something like:

0constexpr int DEFAULT_MAX_BENCH_THREADS{16};
1const auto [start_threads, end_threads]{is_thread_scaling ? std::pair{1, DEFAULT_MAX_BENCH_THREADS} : std::pair{0, 0}};

and maybe iterate as

0for (int threads{start_threads}; threads <= end_threads; ++threads) {
1    ...
2}

in src/bench/bench.cpp:87 in de25d1a855 outdated

116-        std::vector<const char*> ret;
117-        ret.reserve(args.setup_args.size());
118-        for (const auto& arg : args.setup_args) ret.emplace_back(arg.c_str());
119-        return ret;
120-    };
121+    bool is_thread_scaling = args.scale_threads && (args.regex_filter != ".*");

l0rinc commented at 9:51 am on November 25, 2025:

Might be out of scope for this PR, but instead of inlining DEFAULT_BENCH_FILTER here, can we move it over to src/bench/bench.cpp

0    bool is_thread_scaling = args.scale_threads && (args.regex_filter != DEFAULT_BENCH_FILTER);

l0rinc commented at 10:11 am on November 25, 2025: contributor

Not sure whether the concept of script validation has to creep into the benchmarking parameters, maybe we could generalize it to -thread-count or -par or something. Left a few nits.

fjahr commented at 10:58 pm on November 26, 2025: contributor

Not sure whether the concept of script validation has to creep into the benchmarking parameters, maybe we could generalize it to -thread-count or -par or something.

Thanks, happy to change this and address the other comments. But aside from that do you think this is generally interesting enough to be considered merging? I was unsure and since I haven’t worked much on benchmarking I would like to get some concept-ack-ish feedback from the benchmarking wg :)

DrahtBot added the label Needs rebase on Jan 14, 2026

DrahtBot commented at 11:13 pm on January 14, 2026: contributor

🐙 This pull request conflicts with the target branch and needs rebase.

bench: Add step-wise multi thread benchmarking 4538c002ed

fjahr force-pushed on Jan 15, 2026

fjahr commented at 10:11 pm on January 15, 2026: contributor

Rebased because it’s still useful for me but not seeing any concept ACKs, so I am closing this for now. Happy to re-open and address the comments if there is broader interest in this but I guess it’s too niece for general interest in merging it.

fjahr closed this on Jan 15, 2026

RFC: bench: Add multi thread benchmarking #33740

Code Coverage & Benchmarks

Reviews

Conflicts