JeremyRubin
commented at 6:27 am on April 21, 2020:
contributor
Nice! From a simple code review PoV it looks good.
I remember there being a bunch of subtle nasties in the specifics of using std::threads v.s. boost threads around interrupt system and the details of how condition variables work. I can’t remember what they all were though now. I think #14464 closed because of that…
But generally huge concept ack on anything that makes #9938 more likely to move forward! Note though that the cuckoocache sort of stole it’s thunder – a lot of the contention in the CheckQueue was around the sigcache locking and the cuckoocache made that stuff way faster so it’s not obvious #9938 will show as impressive gains, and the trade offs for consensus code may be less worth it. Can discuss more via IRC; I’d love to get some of the improvements through in any case.
hebasto
commented at 5:08 pm on April 21, 2020:
member
138@@ -132,16 +139,31 @@ class CCheckQueue
139 {
140 }
141142+ //! Create a pool of new worker threads.
143+ void StartWorkerThreads(const int threads_num)
144+ {
145+ if (threads_num <= 0) return;
MarcoFalke
commented at 11:35 pm on April 23, 2020:
Is this needed? the for loop will never run anyway when this is true.
100@@ -95,6 +101,7 @@ class CCheckQueue
101 return fRet;
102 }
103 nIdle++;
104+ if (m_stopped) return true; // TODO: This return value is unused.
MarcoFalke
commented at 11:37 pm on April 23, 2020:
Also, shouldn’t the if (m_stopped) be after the condition variable wait? Otherwise notifying it will first empty the queue.
Finally, m_stopped could be renamed to m_request_stop and only briefly set in StopWorkerThreads to true and then immediately after to false. No other member functions (like StartWorkerThreads) touch it
Finally, m_stopped could be renamed to m_request_stop and only briefly set in StopWorkerThreads to true and then immediately after to false.
How this approach could work when StopWorkerThreads() is called during check queue being processed? I think m_stopped should keep its state in that case.
In that case in the commit “bench: Use CCheckQueue local thread pool” the lint-includes.sh linter will produce:
0Good job! The Boost dependency "boost/thread/thread.hpp" is no longer used.
1Please remove it from EXPECTED_BOOST_INCLUDES in ./test/lint/lint-includes.sh
2to make sure this dependency is not accidentally reintroduced.
which seems a bit falsely.
MarcoFalke
commented at 11:23 am on April 24, 2020:
I hope that no one is running the linters on every commit. They are meant to warn authors and reviewers as well as maintainers of code smell in a pull request as a whole. They are not meant as unit tests that absolutely must pass on every commit.
in
src/test/checkqueue_tests.cpp:390
in
d64dd5ffeeoutdated
354-
355-
356-/** Test that CCheckQueueControl is threadsafe */
357-BOOST_AUTO_TEST_CASE(test_CheckQueueControl_Locks)
358-{
359- auto queue = MakeUnique<Standard_Queue>(QUEUE_BATCH_SIZE);
MarcoFalke
commented at 11:48 pm on April 23, 2020:
d64dd5ffee064b0b943b7e00b16b0ad28ac65474: Why remove this test?
I think if there isn’t an interface for something, but you can expose one to make testing more robust, it’s worthwhile.
E.g., the CheckQueue does not need to be templated as we know a concrete single type for it, but templating it allows us to mock out the job type and probe the internal execution, which is a good thing.
What’s the missing interface here? I don’t see one being used in the code for this test.
@MarcoFalke@JeremyRubin
I was confused about this test (I don’t know the exact reason of that confusion now).
The test has been restored.
in
src/checkqueue.h:39
in
629dae2482outdated
34- boost::mutex mutex;
35-
36- //! Worker threads block on this when out of work
37- boost::condition_variable condWorker;
38-
39- //! Master thread blocks on this when out of work
MarcoFalke
commented at 11:50 pm on April 23, 2020:
629dae24827121412fb24d31ac19a918ade1db3f: why remove the documentation?
MarcoFalke
commented at 1:51 pm on April 24, 2020:
member
In the last commit: Could replace the last occurrence of thread.hpp with #include <boost/thread/shared_mutex.hpp> and kill boost thread from the linter?
hebasto force-pushed
on Apr 24, 2020
hebasto
commented at 2:36 pm on April 24, 2020:
member
In the last commit: Could replace the last occurrence of thread.hpp with #include <boost/thread/shared_mutex.hpp> and kill boost thread from the linter?
hebasto
commented at 4:50 pm on April 24, 2020:
member
hebasto
commented at 5:45 pm on May 26, 2020:
member
Rebased 0b5e2a5528fae627e7be98ac231bfb9ae24cd504 -> 82d5e650ee2b72d1447cbfc7bf1bb107dfd390b6 (pr18710.09 -> pr18710.10) on top of the #18881.
DrahtBot added the label
Needs rebase
on May 28, 2020
hebasto force-pushed
on May 28, 2020
hebasto
commented at 8:22 am on May 28, 2020:
member
Rebased 82d5e650ee2b72d1447cbfc7bf1bb107dfd390b6 -> 46838e94da7c54389bce1fb7c0951ab338dcc304 (pr18710.10 -> pr18710.11) on top of the #16127.
DrahtBot removed the label
Needs rebase
on May 28, 2020
DrahtBot added the label
Needs rebase
on Jun 2, 2020
hebasto force-pushed
on Jun 4, 2020
hebasto
commented at 9:34 am on June 4, 2020:
member
Rebased 46838e94da7c54389bce1fb7c0951ab338dcc304 -> 8457827f784e0e2586b19e78785d143b8eeb9513 (pr18710.11 -> pr18710.12) on top of the #18792.
DrahtBot removed the label
Needs rebase
on Jun 4, 2020
DrahtBot added the label
Needs rebase
on Jun 4, 2020
hebasto force-pushed
on Jun 4, 2020
hebasto
commented at 2:37 pm on June 4, 2020:
member
Rebased 8457827f784e0e2586b19e78785d143b8eeb9513 -> b392ef178725795d9e38a4dd000b20ee47c0d60a (pr18710.12 -> pr18710.13) due to the conflict with #19142.
DrahtBot removed the label
Needs rebase
on Jun 4, 2020
DrahtBot added the label
Needs rebase
on Jun 5, 2020
hebasto force-pushed
on Jun 5, 2020
hebasto
commented at 5:12 am on June 5, 2020:
member
Rebased b392ef178725795d9e38a4dd000b20ee47c0d60a -> eee80a1c36d886583e387934abce6dac9b3117b5 (pr18710.13 -> pr18710.14) due to the conflict with #18758.
fanquake removed the label
Needs rebase
on Jun 5, 2020
JeremyRubin
commented at 11:58 pm on June 5, 2020:
contributor
utack eee80a1
Failure is a build pause, job needs restart. I’m still not sure why the test needed to be removed, maybe you can explain that more clearly.
hebasto force-pushed
on Jun 6, 2020
hebasto
commented at 6:24 am on June 6, 2020:
member
No chance, unfortunately. See: #19171.
The #18731 has been built on top of this PR successfully.
JeremyRubin
commented at 5:47 pm on June 6, 2020:
contributor
utACKec44014
Annoying note:
As noted in #18710 (comment), Would be good if someone with access to some of the different platforms/arches we support can do benching/real network profiling to just to confirm there’s no perf regression on the condition variable and mutex behavior. Otherwise seems ready to merge.
But to be clear, these potential regressions would probably be small compared to other improvements realizable in the checkqueue design, so I don’t think it’s a blocker for accepting these changes but we do want to know if merging this means we should be prioritizing some optimizations to neutralize any degradation.
hebasto closed this
on Jun 23, 2020
hebasto reopened this
on Jun 23, 2020
laanwj
commented at 2:50 pm on July 1, 2020:
member
Concept ACK, will review
laanwj added this to the "Blockers" column in a project
adamjonas
commented at 1:29 pm on July 27, 2020:
member
To summarize the state of review for this PR:
Concept ACK (laanwj), Approach ACK (MarcoFalke), Code review ACK (Promag), utACK (JeremyRubin) with no NACKs or blocking concerns raised.
@JeremyRubinasked if someone “with access to some of the different platforms/arches we support can do benching/real network profiling to just to confirm there’s no perf regression on the condition variable and mutex behavior.”
DrahtBot added the label
Needs rebase
on Jul 28, 2020
hebasto force-pushed
on Jul 29, 2020
hebasto
commented at 8:39 am on July 29, 2020:
member
Rebased ec4401480246f60c6b22d98d205a509a181010d7 -> e7893d2e522477374ef9bf01997e9776c86b9813 (pr18710.15 -> pr18710.16) due to the conflicts with #18637 and #19589.
hebasto
commented at 9:01 am on July 29, 2020:
member
DrahtBot removed the label
Needs rebase
on Jul 29, 2020
DrahtBot added the label
Needs rebase
on Jul 30, 2020
hebasto force-pushed
on Jul 31, 2020
DrahtBot removed the label
Needs rebase
on Jul 31, 2020
hebasto
commented at 11:21 am on July 31, 2020:
member
Rebased e7893d2e522477374ef9bf01997e9776c86b9813 -> a57a0e526c94b35b1b018fde5a8a2ea46fca8899 (pr18710.16 -> pr18710.17) due to the conflicts with #18011 and #19604.
63@@ -62,8 +64,11 @@ class CCheckQueue
64 //! The maximum number of elements to be processed in one batch
65 const unsigned int nBatchSize;
6667+ std::vector<std::thread> m_worker_threads;
PR is much faster here, quite an improvement here! Results are also relatively stable, at least stable enough to see that the PR is faster on average.
Intel Celeron N3050 (2 threads)
Here the runtime fluctuates so much that it doesn’t make much sense to show the raw numbers. Especially the PR seems much more prone to fluctuations. To get a better understanding of the fluctuations, I’ve set epoch(100), and generated boxplots with
Then I get two html’s for each run, which I’ve merged by hand:
The PR seems to be slower here, but this might also be an artifact of the benchmark. I don’t think this benchmark is very good.
martinus
commented at 4:36 pm on August 1, 2020:
contributor
I ran the benchmark again on the Intel Celeron N3050, but with BATCHES=5001 instead of 101. The benchmark takes a long time, and is a bit more stable. It still fluctuates by a factor of 4. On average the PR is slower than master on this machine.
hebasto force-pushed
on Aug 2, 2020
hebasto
commented at 9:53 am on August 2, 2020:
member
There are some concerns about thread oversubscription impact on benchmarking.
Mind repeating your benchmarks on 2-thread system with updated branch?
martinus
commented at 10:36 am on August 2, 2020:
contributor
Sure, I ran it again. i7-8700 is faster too. I also tested it with queue.StartWorkerThreads(1);, which is by far the fastest for that benchmark. It seems that the work units are far too small to be able to reasonably distribute the work. So I am not sure if this benchmark is much useful.
ns/job
job/s
err%
ins/job
cyc/job
IPC
bra/job
miss%
total
benchmark
582.25
1,717,469.22
0.5%
267.01
347.19
0.769
50.65
3.7%
0.02
master f7c73b03d975a72f609ded2bbe250c1c8a76a944
371.47
2,692,026.21
0.6%
260.57
334.12
0.780
49.78
3.7%
0.01
PR a57a0e526c94b35b1b018fde5a8a2ea46fca8899
351.57
2,844,406.84
1.7%
261.09
332.71
0.785
49.90
3.6%
0.11
PR e084c3b6051fc596d0b9a38ae7a5da58e3637a31
131.95
7,578,398.22
1.1%
262.19
252.28
1.039
50.15
3.8%
0.04
queue.StartWorkerThreads(1);
Updated graph with BATCHES = 5001 on the Celeron N3050:
hebasto
commented at 10:48 am on August 2, 2020:
member
benchmarking results mean improved on all tested platforms (including 2-threaded Celeron N3050)
What did I miss?
hebasto
commented at 2:30 pm on August 13, 2020:
member
To make reviewing easier, the first commit is split out to #19710.
laanwj removed this from the "Blockers" column in a project
hebasto force-pushed
on Aug 21, 2020
hebasto
commented at 6:33 am on August 21, 2020:
member
Rebased e084c3b6051fc596d0b9a38ae7a5da58e3637a31 -> fa815a305108bc8b25ac468abaec7fa60c261f41 (pr18710.18 -> pr18710.19) on top of #19710 to prevent a merge conflict.
MarcoFalke referenced this in commit
afffbb1bc6
on Aug 31, 2020
DrahtBot added the label
Needs rebase
on Aug 31, 2020
hebasto force-pushed
on Aug 31, 2020
hebasto
commented at 9:00 am on August 31, 2020:
member
Rebased fa815a305108bc8b25ac468abaec7fa60c261f41 -> 89c4f7591e02df4fd36071995740113c1c06db3d (pr18710.19 -> pr18710.20) due to the conflict with #19710.
DrahtBot removed the label
Needs rebase
on Aug 31, 2020
sidhujag referenced this in commit
bfafd05ac3
on Aug 31, 2020
laanwj added this to the "Blockers" column in a project
DrahtBot added the label
Needs rebase
on Sep 23, 2020
refactor: Use member initializers in CCheckQueue0ef938685b
Add local thread pool to CCheckQueue01511776ac
test: Use CCheckQueue local thread pooldba30695fc
bench: Use CCheckQueue local thread pool6784ac471b
refactor: Drop boost::thread stuff in CCheckQueuebb6fcc75d1
hebasto force-pushed
on Sep 24, 2020
hebasto
commented at 4:04 am on September 24, 2020:
member
Rebased 89c4f7591e02df4fd36071995740113c1c06db3d -> bb6fcc75d1ec94b733d1477c816351c50be5faf9 (pr18710.20 -> pr18710.21) due to the conflict with #19927.
DrahtBot removed the label
Needs rebase
on Sep 24, 2020
laanwj removed this from the milestone 0.21.0
on Oct 8, 2020
laanwj added this to the milestone 0.22.0
on Oct 8, 2020
jamesob
commented at 5:12 am on December 11, 2020:
member
Concept ACK, will plan to review.
LarryRuane
commented at 0:54 am on December 16, 2020:
contributor
nit: remove the #include <boost/thread/thread.hpp> from checkqueue_tests.cpp.
Looks good. I ran the affected tests (and single-stepped through most of the new test code), and verified using debugger that a mainnet node correctly reaches the real script checking code.
ACKbb6fcc75d1ec94b733d1477c816351c50be5faf9
hebasto
commented at 1:21 am on December 16, 2020:
member
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2024-12-18 18:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me