Split out from #18710.
Some results (borrowed from #18710):
Hmmmm…
I’d maybe rather just abort the test and return early if only 1 thread gets made because we shouldn’t ever be running with the checkqueue on a single core machine…
Hmmmm…
I’d maybe rather just abort the test and return early if only 1 thread gets made because we shouldn’t ever be running with the checkqueue on a single core machine…
We do not know for sure who many CPU cores/threads users dedicate to Bitcoin Core :)
With this PR benchmark results are quite consistent:
0$ taskset --cpu-list 0-7 src/bench/bench_bitcoin -filter=CCheckQueueSpeedPrevectorJob
1| ns/job | job/s | err% | total | benchmark
2|--------------------:|--------------------:|--------:|----------:|:----------
3| 177.62 | 5,629,902.65 | 1.9% | 0.01 | `CCheckQueueSpeedPrevectorJob`
0$ taskset --cpu-list 0-3 src/bench/bench_bitcoin -filter=CCheckQueueSpeedPrevectorJob
1| ns/job | job/s | err% | total | benchmark
2|--------------------:|--------------------:|--------:|----------:|:----------
3| 176.05 | 5,680,327.83 | 3.3% | 0.01 | `CCheckQueueSpeedPrevectorJob`
0$ taskset --cpu-list 0-1 src/bench/bench_bitcoin -filter=CCheckQueueSpeedPrevectorJob
1| ns/job | job/s | err% | total | benchmark
2|--------------------:|--------------------:|--------:|----------:|:----------
3| 55.45 | 18,034,367.56 | 2.6% | 0.00 | `CCheckQueueSpeedPrevectorJob`
0$ taskset --cpu-list 0 src/bench/bench_bitcoin -filter=CCheckQueueSpeedPrevectorJob
1| ns/job | job/s | err% | total | benchmark
2|--------------------:|--------------------:|--------:|----------:|:----------
3| 54.73 | 18,271,118.07 | 4.0% | 0.00 | `CCheckQueueSpeedPrevectorJob`
And early return do not disable the test, right?
If it’s a benchmark that benchmarks thread synchronization/interaction (which is, I think, the only valid reason to run a benchmark multi-threaded) then I agree running it without threads makes little sense.
(And if so you could even see the “two threads on a single-core machine” case as a realistic case, because bitcoind will create threads on such a machine)
Co-authored-by: Martin Ankerl <Martin.Ankerl@gmail.com>
Updated d786765721e02e7d2c888c8276c02064e5056593 -> a65c1adc7638c2b6f2038cbe11a637b285dd8dd2 (pr19710.01 -> pr19710.02, diff).
Addressed:
Hmmmm…
I’d maybe rather just abort the test and return early if only 1 thread gets made because we shouldn’t ever be running with the checkqueue on a single core machine…
If it’s a benchmark that benchmarks thread synchronization/interaction (which is, I think, the only valid reason to run a benchmark multi-threaded) then I agree running it without threads makes little sense.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
23@@ -26,6 +24,10 @@ static const unsigned int QUEUE_BATCH_SIZE = 128;
24 // and there is a little bit of work done between calls to Add.
25 static void CCheckQueueSpeedPrevectorJob(benchmark::Bench& bench)
26 {
27+ if (GetNumCores() < 2) {
28+ return;
29+ }
I think this warrants a comment explaining “why?”.
Also, maybe don’t drop the MIN_CORES
constant and use it here instead of the magic number “2”?
45@@ -44,7 +46,7 @@ static void CCheckQueueSpeedPrevectorJob(benchmark::Bench& bench)
46 };
47 CCheckQueue<PrevectorJob> queue {QUEUE_BATCH_SIZE};
48 boost::thread_group tg;
49- for (auto x = 0; x < std::max(MIN_CORES, GetNumCores()); ++x) {
50+ for (auto x = 0; x < GetNumCores() - 1; ++x) {
Updated a65c1adc7638c2b6f2038cbe11a637b285dd8dd2 -> 4f9164389a078f64258f3d7f2ecda5ab68968e8c (pr19710.02 -> pr19710.03, diff).
Addressed @vasild’s comments:
Also, maybe don’t drop the MIN_CORES constant and use it here instead of the magic number “2”?
Maybe also comment here “why we do - 1”.
25@@ -26,6 +26,10 @@ static const unsigned int QUEUE_BATCH_SIZE = 128;
26 // and there is a little bit of work done between calls to Add.
27 static void CCheckQueueSpeedPrevectorJob(benchmark::Bench& bench)
28 {
29+ if (GetNumCores() < MIN_CORES) {
4f9164389a078f64258f3d7f2ecda5ab68968e8c
nit, single line and also add comment.
Updated 4f9164389a078f64258f3d7f2ecda5ab68968e8c -> 5e7acd65443d3fc7be084ea81be3089e4b28bc9a (pr19710.03 -> pr19710.04, diff).
nit, single line and also add comment.
ACK 5e7acd654
Thanks for the comments!
This change decreases the variance of benchmark results.
25@@ -26,6 +26,9 @@ static const unsigned int QUEUE_BATCH_SIZE = 128;
26 // and there is a little bit of work done between calls to Add.
27 static void CCheckQueueSpeedPrevectorJob(benchmark::Bench& bench)
28 {
29+ // We shouldn't ever be running with the checkqueue on a single core machine.
30+ if (GetNumCores() < MIN_CORES) return;
MIN_CORES
and change this to GetNumCores() == 1
?
https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency:
If the value is not well defined or not computable, returns
0
.
<=1
?
Updated 5e7acd65443d3fc7be084ea81be3089e4b28bc9a -> 3edc4e34fe2f92e7066c1455f5e42af2fdb43b99 (pr19710.04 -> pr19710.05, diff).
Maybe ditch
MIN_CORES
and change this toGetNumCores() == 1
?