Avoid excessive lock contention in CCheckQueue::Add

hebasto commented at 5:26 pm on October 30, 2021: member

This PR significantly reduces lock contention in the CCheckQueue class by releasing a mutex before calling std::condition_variable::notify_one and std::condition_variable::notify_all.

From C++ docs:

The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s); in fact doing so is a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock.

Related to:

hebasto renamed this:
~~211030 contention~~
Avoid excessive lock contention in CCheckQueue::Add
on Oct 30, 2021

hebasto commented at 5:30 pm on October 30, 2021: member

cc @ajtowns @theuni @martinus @jonatack

lsilva01 approved

lsilva01 commented at 9:16 pm on October 30, 2021: contributor

tACK 5bcfd70 on Ubuntu 20.04

This PR moves LOCK(m_mutex) to a smaller scope, so when code reaches notify_one()or notify_all(), the mutex is already released. It also changes CCheckQueue ::Add to exit earlier if vChecks is empty.

jamesob commented at 2:11 pm on November 1, 2021: member

@hebasto do you expect there are significant performance implications for this change? Should I benchmark?

hebasto commented at 2:25 pm on November 1, 2021: member

@jamesob

@hebasto do you expect there are significant performance implications for this change?

Not sure about “significant” :)

Should I benchmark?

I’ll appreciate it! Also I have a plan to improve CCheckQueue performance further (not here, of course).

in src/checkqueue.h:170 in 0e41699722 outdated

166@@ -167,15 +167,20 @@ class CCheckQueue
167     //! Add a batch of checks to the queue
168     void Add(std::vector<T>& vChecks)
169     {
170-        LOCK(m_mutex);
171-        for (T& check : vChecks) {
172-            queue.push_back(T());
173-            check.swap(queue.back());
174+        unsigned int num_of_new_checks;

sipa commented at 3:31 pm on November 1, 2021:

unsigned int is typically smaller than the size of a vector. Use size_t or std::vector<T>::size_type perhaps?

I very much doubt we’ll ever have more than 4 billion checks queued up, but it also seems simpler to make the type can just account for everything.

hebasto commented at 3:54 pm on November 1, 2021:

I’ve chosen unsinged int due to https://github.com/bitcoin/bitcoin/blob/5adc5c02800f00d1e6e8812a2b0559b1800e82e9/src/checkqueue.h#L60

Maybe use size_t for nTodo as well?

sipa commented at 4:49 pm on November 1, 2021:

Ah, I see, that makes sense. I think changing to size_t everywhere can be done in a separate PR, if desires.

JeremyRubin commented at 6:56 pm on November 1, 2021:

the max number of checks is IIRC something slightly above 100k

in src/checkqueue.h:183 in 0e41699722 outdated

185-        if (vChecks.size() == 1)
186+
187+        if (num_of_new_checks == 1)
188             m_worker_cv.notify_one();
189-        else if (vChecks.size() > 1)
190+        else if (num_of_new_checks > 1)

sipa commented at 3:32 pm on November 1, 2021:

If you’re touching this code, make it follow style? (no indentation without brances).

hebasto commented at 3:51 pm on November 1, 2021:

Style is fixed in the second commit. Or do you want me to follow style here?

sipa commented at 5:45 pm on November 1, 2021:

Oh, that’s fine.

sipa commented at 3:33 pm on November 1, 2021: member

Concept ACK, but left some nits.

in src/checkqueue.h:177 in 0e41699722 outdated

176+            LOCK(m_mutex);
177+            for (T& check : vChecks) {
178+                queue.push_back(T());
179+                check.swap(queue.back());
180+            }
181+            num_of_new_checks = vChecks.size();

martinus commented at 6:24 pm on November 1, 2021:

nit: I think you could also write const auto num_of_new_checks = vChecks.size(); above, saves one line of code in the lock and doesn’t keep an uninitialized variable around for a few lines.

sipa commented at 7:09 pm on November 1, 2021:

Actually none of that is necessary. vChecks doesn’t change size during the adding loop. The old vChecks.size() based checks can be kept.

hebasto commented at 8:48 pm on November 1, 2021:

vChecks doesn’t change size during the adding loop.

Correct. But vChecks is passed in CCheckQueue::Add by a non-const reference, and that makes possible its modification by a concurrent thread between leaving a critical section and checking num_of_new_checks == 1.

Although, the only caller is located in CChainState::ConnectBlock which is guarded by the cs_main mutex. Therefore, you are right.

The old vChecks.size() based checks can be kept.

Yes, it will work. But while looking into CCheckQueue::Add code, it is not obvious that vChecks.size() is constant in multi-threading environment. I think that adding a local variable makes code more readable and robust considering any possible changes at the caller site(s) in the future.

OTOH, it seems all could look clearer if we pass vChecks as an rvalue reference using move semantics. What do you think?

vasild commented at 3:50 pm on November 2, 2021:

I would never think that a reference argument could be changed by another thread while the function is executing. I hope we don’t do that anywhere in the code. Because how would the function avoid races? Which mutex protects the argument? That would require something like (it does not compile, of course):

0void func(T& arg GUARDED_BY(some_mutex), ...)
1{
2    ...
3    LOCK(some_mutex);
4    access arg;
5    ...

So, I think it is safe to assume that vChecks will not change while Add() is executing. Indeed the only caller passes in a local variable that is not visible by other threads (and then throws it away, so the “swap” is actually a “move”).

… it seems all could look clearer if we pass vChecks as an rvalue reference using move semantics …

Yes :) and std::move() each element from vChecks into queue. It is unnecessary to first create a default-constructed object only to swap it with another and throw it away. This is what std::move() is for.

sipa commented at 4:06 pm on November 2, 2021:

@hebasto That’s not how const works in C++. The lack thereof means you (the called function in this case) are allowed to modify the argument; it says nothing at all about whether another thread might be modifying the same variable. Even if the argument was const, nothing prevents another thread from holding a mutable reference to the same object.

That said, passing an argument (const or not) to a function while it is being modified by other threads would be completely non-idiomatic, dangerous code. It is possible, and sometimes necessary of course, but at least you’d expect the passed object to have internal mutexes or other synchronization mechanisms to protect its state in that case; not something the callee should deal with. Virtually every STL function will exhibit undefined behavior if you pass it references to arguments that are being modified concurrently. Outside of very exceptional cases that should raise big red flags, you can very much assume that arguments passed to a function are under exclusive control of that function for the duration of its execution.

hebasto commented at 4:36 pm on November 2, 2021:

@sipa Thanks for your detailed explanation.

Going to address all of the comments.

in src/checkqueue.h:174 in 0e41699722 outdated

173-            check.swap(queue.back());
174+        unsigned int num_of_new_checks;
175+        {
176+            LOCK(m_mutex);
177+            for (T& check : vChecks) {
178+                queue.push_back(T());

martinus commented at 6:25 pm on November 1, 2021:

nit: maybe not part of this PR, but using queue.emplace_back() instead queue.push_back(T()) would save one move/copy constructor and one destructor because the object could be directly constructed in the queue

vasild commented at 3:36 pm on November 2, 2021:

What about queue.push_back(std::move(check))?

martinus commented at 4:21 pm on November 2, 2021:

I tried that once but then tests fail, I think for the types used here std::swap is implemented but a move constructor isn’t. I didn’t have a closer look why that fails.

I think the nicest c++ solution would be to replace the whole loop with

0queue.insert(queue.end(), std::make_move_iterator(vChecks.begin()), std::make_move_iterator(vChecks.end()));

But this too requires properly implementing move constructor.

martinus approved

martinus commented at 6:34 pm on November 1, 2021: contributor

utACK 5bcfd70a3b0e008d7a80e37a29fe90887a0f8c66. I doubt that this makes any measurable difference, but I’ve often been wrong with benchmark guess and shorter locks are always better :+1:

JeremyRubin commented at 6:59 pm on November 1, 2021: contributor

would love to see some thorough testing on this. I could be misremembering, but check cross platform that there’s no reason to continue to hold the lock. IIRC there was some sort of edge condition (maybe in window?) around a notify being able to be missed or something, but the details are like 5-6 years out of my cache :)

in src/checkqueue.h:170 in 5bcfd70a3b outdated

166@@ -167,16 +167,26 @@ class CCheckQueue
167     //! Add a batch of checks to the queue
168     void Add(std::vector<T>& vChecks)
169     {
170-        LOCK(m_mutex);
171-        for (T& check : vChecks) {
172-            queue.push_back(T());
173-            check.swap(queue.back());
174+        if (vChecks.empty()) {

vasild commented at 3:56 pm on November 2, 2021:

The code below assumes vChecks may be modified by another thread while this is executing (it cannot actually). I guess either remove that assumption or move this access to vChecks under m_mutex (if that mutex it supposed to protect vChecks!?).

It is a race to call std::vector::empty() while modifying it concurrently.

hebasto commented at 4:38 pm on November 2, 2021:

You are right.

I guess either remove that assumption or move this access to vChecks under m_mutex (if that mutex it supposed to protect vChecks!?).

The former will go.

Avoid excessive lock contention in CCheckQueue::Add c43aa62343

Exit early for an empty vChecks in CCheckQueue::Add 459e208276

hebasto force-pushed on Nov 3, 2021

hebasto commented at 9:36 am on November 3, 2021: member

@sipa @martinus @vasild

Thank you for your reviews and your suggestions!

Updated 5bcfd70a3b0e008d7a80e37a29fe90887a0f8c66 -> 459e208276a4d1457d37bf41c977e62caf05456d (pr23397.01 -> pr23397.02, diff):

addressed all of the comments

vasild approved

vasild commented at 11:47 am on November 3, 2021: member

ACK 459e208276a4d1457d37bf41c977e62caf05456d

The CI failure looks unrelated.

jamesob commented at 12:44 pm on November 3, 2021: member

ACKish - I haven’t reviewed the code, but it works for the benchmarks and so I guess in some sense I’ve tested it. As expected, no big diff in performance.

#23397 vs. $mergebase (absolute)

bench name	x	#23397	$mergebase
ibd.local.range.500000.540000.total_secs	2	4826.3510 (± 0.4951)	4831.4970 (± 6.8399)
ibd.local.range.500000.540000.peak_rss_KiB	2	6410470.0000 (± 1750.0000)	6406678.0000 (± 2714.0000)
ibd.local.range.500000.540000.cpu_kernel_secs	2	345.3800 (± 3.1200)	348.2850 (± 0.8050)
ibd.local.range.500000.540000.cpu_user_secs	2	30102.6800 (± 5.2600)	30086.9150 (± 24.1150)

#23397 vs. $mergebase (relative)

bench name	x	#23397	$mergebase
ibd.local.range.500000.540000.total_secs	2	1.000	1.001
ibd.local.range.500000.540000.peak_rss_KiB	2	1.001	1.000
ibd.local.range.500000.540000.cpu_kernel_secs	2	1.000	1.008
ibd.local.range.500000.540000.cpu_user_secs	2	1.001	1.000

martinus commented at 5:23 pm on November 3, 2021: contributor

ACK 459e208, codereview and tested. I first thought this introduced a segfault in psbt_wallet_tests/psbt_updater_test because that test failed for me, but thats a different issue fixed in #23403.

theStack approved

theStack commented at 2:38 pm on November 24, 2021: member

Code-review ACK 459e208276a4d1457d37bf41c977e62caf05456d

Maybe I’m missing something, but IMHO there is currently no possible way that CCheckQueue:Add gets called with an empty vector as argument, as the size always equals the number of inputs of a transaction (see last out parameter of CheckInputScripts). It’d be only empty for coinbase transactions, but for those control.Add(...) doesn’t get called. That said, it doesn’t hurt if the empty check in Add is still there.

MarcoFalke merged this on Nov 29, 2021

MarcoFalke closed this on Nov 29, 2021

sidhujag referenced this in commit 537b8a836d on Nov 29, 2021

hebasto deleted the branch on Nov 29, 2021

RandyMcMillan referenced this in commit a6aa71207f on Dec 23, 2021

PastaPastaPasta referenced this in commit c8374bd577 on Apr 7, 2022

PastaPastaPasta referenced this in commit 25f8e062c0 on Apr 7, 2022

PastaPastaPasta referenced this in commit 572737fff5 on Apr 7, 2022

PastaPastaPasta referenced this in commit 22838d1b8c on Apr 11, 2022

DrahtBot locked this on Nov 29, 2022

Avoid excessive lock contention in CCheckQueue::Add #23397

#23397 vs. $mergebase (absolute)

#23397 vs. $mergebase (relative)