Split out from #18710.
Some results (borrowed from #18710):

Hmmmm...
I'd maybe rather just abort the test and return early if only 1 thread gets made because we shouldn't ever be running with the checkqueue on a single core machine...
Hmmmm...
I'd maybe rather just abort the test and return early if only 1 thread gets made because we shouldn't ever be running with the checkqueue on a single core machine...
We do not know for sure who many CPU cores/threads users dedicate to Bitcoin Core :)
With this PR benchmark results are quite consistent:
$ taskset --cpu-list 0-7 src/bench/bench_bitcoin -filter=CCheckQueueSpeedPrevectorJob
| ns/job | job/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 177.62 | 5,629,902.65 | 1.9% | 0.01 | `CCheckQueueSpeedPrevectorJob`
$ taskset --cpu-list 0-3 src/bench/bench_bitcoin -filter=CCheckQueueSpeedPrevectorJob
| ns/job | job/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 176.05 | 5,680,327.83 | 3.3% | 0.01 | `CCheckQueueSpeedPrevectorJob`
$ taskset --cpu-list 0-1 src/bench/bench_bitcoin -filter=CCheckQueueSpeedPrevectorJob
| ns/job | job/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 55.45 | 18,034,367.56 | 2.6% | 0.00 | `CCheckQueueSpeedPrevectorJob`
$ taskset --cpu-list 0 src/bench/bench_bitcoin -filter=CCheckQueueSpeedPrevectorJob
| ns/job | job/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 54.73 | 18,271,118.07 | 4.0% | 0.00 | `CCheckQueueSpeedPrevectorJob`
And early return do not disable the test, right?
If it's a benchmark that benchmarks thread synchronization/interaction (which is, I think, the only valid reason to run a benchmark multi-threaded) then I agree running it without threads makes little sense.
(And if so you could even see the "two threads on a single-core machine" case as a realistic case, because bitcoind will create threads on such a machine)
Co-authored-by: Martin Ankerl <Martin.Ankerl@gmail.com>
Updated d786765721e02e7d2c888c8276c02064e5056593 -> a65c1adc7638c2b6f2038cbe11a637b285dd8dd2 (pr19710.01 -> pr19710.02, diff).
Addressed:
Hmmmm...
I'd maybe rather just abort the test and return early if only 1 thread gets made because we shouldn't ever be running with the checkqueue on a single core machine...
If it's a benchmark that benchmarks thread synchronization/interaction (which is, I think, the only valid reason to run a benchmark multi-threaded) then I agree running it without threads makes little sense.
<!--e57a25ab6845829454e8d69fc972939a-->
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
<!--174a7506f384e20aa4161008e828411d-->
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
cr ACK a65c1adc7638c2b6f2038cbe11a637b285dd8dd2
23 | @@ -26,6 +24,10 @@ static const unsigned int QUEUE_BATCH_SIZE = 128; 24 | // and there is a little bit of work done between calls to Add. 25 | static void CCheckQueueSpeedPrevectorJob(benchmark::Bench& bench) 26 | { 27 | + if (GetNumCores() < 2) { 28 | + return; 29 | + }
I think this warrants a comment explaining "why?".
Also, maybe don't drop the MIN_CORES constant and use it here instead of the magic number "2"?
45 | @@ -44,7 +46,7 @@ static void CCheckQueueSpeedPrevectorJob(benchmark::Bench& bench) 46 | }; 47 | CCheckQueue<PrevectorJob> queue {QUEUE_BATCH_SIZE}; 48 | boost::thread_group tg; 49 | - for (auto x = 0; x < std::max(MIN_CORES, GetNumCores()); ++x) { 50 | + for (auto x = 0; x < GetNumCores() - 1; ++x) {
Maybe also comment here "why we do - 1".
ACK a65c1adc
Updated a65c1adc7638c2b6f2038cbe11a637b285dd8dd2 -> 4f9164389a078f64258f3d7f2ecda5ab68968e8c (pr19710.02 -> pr19710.03, diff).
Addressed @vasild's comments:
Also, maybe don't drop the MIN_CORES constant and use it here instead of the magic number "2"?
Maybe also comment here "why we do - 1".
25 | @@ -26,6 +26,10 @@ static const unsigned int QUEUE_BATCH_SIZE = 128; 26 | // and there is a little bit of work done between calls to Add. 27 | static void CCheckQueueSpeedPrevectorJob(benchmark::Bench& bench) 28 | { 29 | + if (GetNumCores() < MIN_CORES) {
4f9164389a078f64258f3d7f2ecda5ab68968e8c
nit, single line and also add comment.
Concept ACK.
Updated 4f9164389a078f64258f3d7f2ecda5ab68968e8c -> 5e7acd65443d3fc7be084ea81be3089e4b28bc9a (pr19710.03 -> pr19710.04, diff).
nit, single line and also add comment.
Code review ACK 5e7acd65443d3fc7be084ea81be3089e4b28bc9a.
ACK 5e7acd654
Thanks for the comments!
@MarcoFalke Mind having another look at this PR?
This change decreases the variance of benchmark results.
25 | @@ -26,6 +26,9 @@ static const unsigned int QUEUE_BATCH_SIZE = 128; 26 | // and there is a little bit of work done between calls to Add. 27 | static void CCheckQueueSpeedPrevectorJob(benchmark::Bench& bench) 28 | { 29 | + // We shouldn't ever be running with the checkqueue on a single core machine. 30 | + if (GetNumCores() < MIN_CORES) return;
Maybe ditch MIN_CORES and change this to GetNumCores() == 1?
https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency:
If the value is not well defined or not computable, returns
0.
Maybe <=1 ?
Updated 5e7acd65443d3fc7be084ea81be3089e4b28bc9a -> 3edc4e34fe2f92e7066c1455f5e42af2fdb43b99 (pr19710.04 -> pr19710.05, diff).
Maybe ditch
MIN_CORESand change this toGetNumCores() == 1?
Code review ACK 3edc4e34fe2f92e7066c1455f5e42af2fdb43b99
ACK