ken2812221
commented at 1:36 pm on October 11, 2018:
contributor
This PR removes use of the following object in checkqueue
boost::condition_variable
boost::mutex
boost::unique_lock
Instead of using boost::thread_group::interrupt_all to trigger boost::thread_interrupted exception, it should call InterruptScriptCheck() or scriptcheckqueue.Interrupt() to interrupt the checkqueue thread.
Also, it adds three functions:
StartScriptCheck()
InterruptScriptCheck()
StopScriptCheck()
for clearer logic for script check threads.
fanquake added the label
Refactoring
on Oct 11, 2018
ken2812221 force-pushed
on Oct 11, 2018
ken2812221 force-pushed
on Oct 11, 2018
in
test/lint/lint-includes.sh:69
in
5bb441ce2eoutdated
practicalswift
commented at 1:52 pm on October 11, 2018:
According to PR title boost/thread/condition_variable.hpp should go as well? :-)
ken2812221
commented at 1:54 pm on October 11, 2018:
I only remove them in checkqueue, the another place to use it is in scheduler.
practicalswift
commented at 1:52 pm on October 11, 2018:
contributor
Concept ACK
Towards a Boost free future! :-)
ken2812221 force-pushed
on Oct 11, 2018
ken2812221 force-pushed
on Oct 11, 2018
in
src/checkqueue.h:65
in
0e9a6f40e8outdated
61@@ -62,17 +62,19 @@ class CCheckQueue
62 //! The maximum number of elements to be processed in one batch
63 unsigned int nBatchSize;
6465+ std::atomic<bool> fInterrupted;
If an if only has a single-statement then-clause, it can appear
on the same line as the if, without braces. In every other case,
braces are required, and the then and else clauses must appear
correctly indented on a new line.
ken2812221
commented at 3:26 pm on October 11, 2018:
clang-format doesn’t catch this. Done manually.
ken2812221 force-pushed
on Oct 11, 2018
ken2812221 renamed this:
rafactor: Remove use of boost::condition_variable and boost::mutex in checkqueue
refactor: Remove use of boost::condition_variable and boost::mutex in checkqueue
on Oct 11, 2018
ken2812221 force-pushed
on Oct 11, 2018
in
src/validation.h:265
in
76c233a135outdated
259@@ -260,8 +260,12 @@ bool LoadBlockIndex(const CChainParams& chainparams) EXCLUSIVE_LOCKS_REQUIRED(cs
260 bool LoadChainTip(const CChainParams& chainparams);
261 /** Unload database information */
262 void UnloadBlockIndex();
263-/** Run an instance of the script checking thread */
264-void ThreadScriptCheck();
265+/** Start script checking thread */
266+void StartScriptCheck();
267+/** Interrupt all script checking thread */
ch4ot1c
commented at 7:48 pm on October 11, 2018:
contributor
utACK
ken2812221 force-pushed
on Oct 11, 2018
ken2812221 force-pushed
on Oct 11, 2018
ken2812221 force-pushed
on Oct 11, 2018
ken2812221 renamed this:
refactor: Remove use of boost::condition_variable and boost::mutex in checkqueue
refactor: Remove boost in checkqueue
on Oct 12, 2018
ken2812221 force-pushed
on Oct 12, 2018
sipa
commented at 7:51 pm on October 12, 2018:
member
utACK9aacef638f0c2159930640762daf524942052a9d
donaloconnor
commented at 9:56 pm on October 12, 2018:
contributor
Before merge I’d like to see some benchmarks, the checkqueue code is very performance sensitive, sometimes the boost condition variables and mutexs difference from std’s can be substantial. (I’m not kidding – the time to wake-up and condition variable sleep behavior is one of the major performance issues with this code).
ken2812221 force-pushed
on Oct 16, 2018
ken2812221
commented at 4:26 pm on October 16, 2018:
contributor
JeremyRubin
commented at 3:38 am on October 18, 2018:
contributor
That’s great, thanks for checking that – much bigger change than I expected it to be.
Looking at the numbers, it seems not only to be faster on the microbenchmark, but more consistent.
This merits running two additional tests, if you have energy for it:
Time to reindex (with assumevalid up to last month or something so it doesn’t take forever
Two nodes (one with this PR one without) connected behind a third node (on latest release), with debug bench set. Collect the logs on ConnectBlock.
Also collect a general system profile of which processes are using what on the cpu.
The second test is important because one thing I noticed when working on this code previously is that it was trivially easy to get speedups on this code at the expense of general performance. E.g., getting rid of condition variables (and just spinning until jobs came in) was much faster, but meant that the OS scheduler would be giving our CheckQueue workers run-time when there was nothing to do, impacting other processes on the computer (most likely because of causing more cache invalidation, rather than the impact of losing a few cycles of a process switch). Hence my suspicion: the 20% speedup is because the standard condition variables are not using system sleep (which plays better with the OS). This can actually negatively impact the process too as our main thread might be getting context switched out while we are trying to add jobs or something rather than waiting for notify.
JeremyRubin
commented at 3:42 am on October 18, 2018:
I think you could assert(! (fMaster && nTodo == 0)) here.
If fMaster were ever true here, we would deadlock.
Hence, there is no point to check that…
DrahtBot
commented at 10:02 am on October 20, 2018:
member
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
No conflicts as of last run.
ken2812221 force-pushed
on Oct 20, 2018
ken2812221 force-pushed
on Oct 20, 2018
ken2812221 force-pushed
on Oct 20, 2018
ken2812221 force-pushed
on Oct 20, 2018
ken2812221 force-pushed
on Oct 20, 2018
ken2812221 force-pushed
on Oct 20, 2018
ken2812221 force-pushed
on Oct 20, 2018
in
src/checkqueue.h:101
in
02aae28c9doutdated
98 // return the current status
99 return fRet;
100+ } else {
101+ nIdle++;
102+ while (!(interrupted || !queue.empty() || (fMaster && nTodo == 0))) cond.wait(lock);
103+ if (interrupted) return false;
ken2812221
commented at 2:26 pm on October 24, 2018:
After tested a few times, return false or somthing else? seem to make the node have to reindex next time.
JeremyRubin
commented at 7:16 am on October 25, 2018:
The issue could be that when you interrupt, you don’t want to make fMaster return false causing the block to be marked false (which could be written into cache somehow).
Is interrupt guaranteed to wait until the current job is done? Making it guaranteed to wait probably could help fix that behavior… One way of doing that is to separate out when you wake up if there’s an active job or not, and don’t allow to quit when there is an active job until it’s done.
JeremyRubin
commented at 7:22 am on October 25, 2018:
ken2812221 renamed this:
refactor: Remove boost in checkqueue
[WIP] refactor: Remove boost in checkqueue
on Oct 24, 2018
JeremyRubin
commented at 8:32 am on October 25, 2018:
contributor
This version is not correct either if I’m correct about the block getting flushed with validity cached during shutdown.
Now you might incorrectly cache that a block was valid when it may not have been.
You need to finish processing the block, then shut down. Or, if a shutdown is triggered, abandon the block in the master (you could make a trivalent return: false, true, shutdown).
JeremyRubin
commented at 8:36 am on October 25, 2018:
contributor
Actually I think that’s not right, I think this version will just hang because nTodo will be != 0 and queue.empty() will be true, so the master would sleep and never get woken?
ken2812221
commented at 3:31 am on October 26, 2018:
contributor
@JeremyRubinmaster would never wait/sleep if interrupted is true. It would keep looping until nTodo == 0. Oh, I know what you mean.
JeremyRubin
commented at 6:24 am on October 26, 2018:
contributor
Your best bet IMO is to have a CheckQueueControl mutex which is separate from the CheckQueue mutex. Make interrupting get both…
ken2812221 force-pushed
on Oct 26, 2018
ken2812221 force-pushed
on Oct 26, 2018
ken2812221 force-pushed
on Oct 26, 2018
ken2812221 force-pushed
on Oct 26, 2018
ken2812221 force-pushed
on Oct 26, 2018
ken2812221 force-pushed
on Oct 26, 2018
ken2812221 force-pushed
on Oct 26, 2018
ken2812221 force-pushed
on Oct 26, 2018
ken2812221 force-pushed
on Oct 26, 2018
ken2812221
commented at 2:46 pm on October 26, 2018:
contributor
@JeremyRubin
How about now? I think the logic here would be more clear. It would keep checking until finish.
If there are new tasks added into queue after the worker threads ended, it would be only master worker checking the tasks unfortunately. However I think this is better than interrupt the master thread. Alternatively, we can detach these worker threads.
I am going to test this to see if there is anything broken.
ken2812221 force-pushed
on Oct 26, 2018
ken2812221 force-pushed
on Oct 26, 2018
JeremyRubin
commented at 11:53 pm on October 26, 2018:
contributor
I still think it’s best to make control of the checkqueue (as opposed to participation) protected by a mutex.
Easiest to audit/review.
But do let me know if that fixes the corruption issue…
ken2812221
commented at 0:01 am on October 27, 2018:
contributor
I still think it’s best to make control of the checkqueue (as opposed to participation) protected by a mutex.
There already have ControlMutex to ensure that there is only one CCheckQueueControl to access the CCheckQueue.
Edit: I don’t think it’s necessary to interrupt the master queue though. We can wait it to return the check result.
The second thought, if we detach the worker thread, then that is not necessary to interrupt the checkqueue. So I start to suspect that if the Interrupt is necessary here.
JeremyRubin
commented at 8:29 am on October 27, 2018:
contributor
I am saying that a simple change to make is to have Interrupt take the ControlMutex.
ken2812221 force-pushed
on Oct 27, 2018
ken2812221 force-pushed
on Oct 27, 2018
ken2812221 force-pushed
on Oct 27, 2018
ken2812221 force-pushed
on Oct 27, 2018
ken2812221
commented at 12:42 pm on October 27, 2018:
contributor
I am saying that a simple change to make is to have Interrupt take the ControlMutex.
That would cause recursive lock in test_bitcoin. Also, I don’t think that it’s necessary to lock ControlMutex. Can you explain why it’s necessary for Interrupt?
But do let me know if that fixes the corruption issue…
Seems the node won’t have to reindex after restarting now.
ken2812221 renamed this:
[WIP] refactor: Remove boost in checkqueue
refactor: Remove boost in checkqueue
on Oct 27, 2018
JeremyRubin
commented at 9:31 pm on October 27, 2018:
contributor
Where’s the recursive lock?
ken2812221
commented at 10:53 pm on October 27, 2018:
contributor
In test_bitcoin, it constuct CCheckQueueControl and call Interrupt in same thread
JeremyRubin
commented at 11:06 pm on October 27, 2018:
contributor
Interrupt should be called in the destructor of the test which should happen after any queuecontrols are destructed though?
ken2812221 force-pushed
on Oct 28, 2018
ken2812221
commented at 5:05 am on October 28, 2018:
contributor
Interrupt should be called in the destructor of the test which should happen after any queuecontrols are destructed though?
I think it would be better to make the checkqueue to control their worker threads instead of calling Thread() in somewhere.
ken2812221 force-pushed
on Oct 28, 2018
ken2812221 force-pushed
on Oct 28, 2018
ken2812221 force-pushed
on Oct 28, 2018
ken2812221 force-pushed
on Oct 28, 2018
ken2812221 force-pushed
on Oct 30, 2018
ken2812221 force-pushed
on Oct 30, 2018
ken2812221 force-pushed
on Oct 30, 2018
ken2812221 force-pushed
on Oct 30, 2018
ken2812221 force-pushed
on Oct 30, 2018
ken2812221 force-pushed
on Oct 30, 2018
ken2812221 force-pushed
on Oct 30, 2018
ken2812221 force-pushed
on Oct 30, 2018
ken2812221 force-pushed
on Oct 30, 2018
ken2812221 force-pushed
on Oct 30, 2018
ken2812221 force-pushed
on Oct 30, 2018
ken2812221 force-pushed
on Oct 31, 2018
ken2812221 force-pushed
on Nov 5, 2018
ken2812221 force-pushed
on Nov 9, 2018
ken2812221 force-pushed
on Nov 22, 2018
ken2812221 force-pushed
on Nov 22, 2018
ken2812221 force-pushed
on Nov 22, 2018
ken2812221 renamed this:
refactor: Remove boost in checkqueue
refactor: make checkqueue manage the threads by itself (also removed some boost dependencies)
on Nov 23, 2018
DrahtBot added the label
Needs rebase
on Dec 13, 2018
ken2812221 force-pushed
on Dec 17, 2018
DrahtBot removed the label
Needs rebase
on Dec 17, 2018
DrahtBot added the label
Needs rebase
on Dec 29, 2018
ken2812221 force-pushed
on Dec 29, 2018
fanquake deleted a comment
on Jan 3, 2019
fanquake removed the label
Needs rebase
on Jan 3, 2019
fanquake deleted a comment
on Jan 3, 2019
DrahtBot added the label
Needs rebase
on Jan 21, 2019
ken2812221 force-pushed
on Jan 22, 2019
DrahtBot removed the label
Needs rebase
on Jan 22, 2019
MarcoFalke
commented at 8:21 pm on March 6, 2019:
member
Could try a git rebase -Xignore-all-space df36ddf9ce8a4dcf0d7f324e57a13abb6cf6a57c?
DrahtBot added the label
Needs rebase
on Mar 6, 2019
refactor: Make CCheckQueue interruptible
This commit add a Interrupt function for CCheckQueue that it can handle interrupt by itself instead of relying on boost thread interrupt
d998ca40e1
refactor: Drop boost in CCheckQueue
replace boost::mutex with debuggable Mutex
replace boost::condition_variable with std::condition_variable
add const specifier to fMaster and nBatchSize
add clang thread safety attributes
move init value of member of CCheckQueue from constructor to definition
8a6e1a9531
refactor: Make CCheckQueue worker threads be owned by itself
Manage CCheckQueue workder threads by itself. Do not expose the threads outside the object.
f8145436ad
ken2812221 force-pushed
on Mar 7, 2019
DrahtBot removed the label
Needs rebase
on Mar 7, 2019
ken2812221 closed this
on Mar 15, 2019
practicalswift
commented at 6:10 am on March 15, 2019:
contributor
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2024-11-17 06:12 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me