Upgrade and Shared Lock support and locking test cases. #1551

TheBlueMatt commented at 2:52 AM on July 3, 2012: contributor

This adds SHARED_LOCK, TRY_SHARED_LOCK, and UPGRADE_LOCK to complement LOCK and TRY_LOCK.

It implements the many-readers, single-writer model where any thread holding a LOCK excludes other threads from a SHARED_LOCK, but many threads can hold a SHARED_LOCK. Only one thread can hold a UPGRADE_LOCK, it can be held while others hold a SHARED_LOCK, but it cannot be held while another thread holds a LOCK.

This should create tons of low-hanging fruit for optimization, even though it doesn't do any itself.

sipa commented at 8:22 AM on July 3, 2012: member

Interesting. I definitely want shared locks at some point in the future.

You seem to implement some own implementation for a recursive shared lock? That certainly needs a lot of testing, as it can easily have subtle bugs. Maybe it is easier to clean up the code so that we don't need recursive locking at all (if all data structures are properly encapsulated, you can put the taking of locks in all entry functions, and never further inside).

Also, to get the best performance, you'll also need support for upgradable locks. For example, connecting a block has two stages, one where data is read from the backend (inputs fetched from db), and one where the changes are committed. If the first stage happens in an upgradable lock, it can happen concurrently with read-only locks (most RPC queries...), even while protecting that only a single (potential) writer is active.

TheBlueMatt commented at 2:15 PM on July 3, 2012: contributor

Sadly, boost only provides recursive locking support in boost::recursive_mutex, not in boost::shared_mutex. Thus I had to add recursive checking to CCriticalSection. In terms of making bitcoin not need recursive locks...that would get very ugly. Bitcoin is such a mess as it is, trying to track down all the cases where we depend on recursive locks, I would think, would be quite a bit of effort. Yes something like this needs to be beaten on for a while before merge, but the code is pretty straightforward. Also, Im still adding more test cases.

Added upgrade lock.

Make test_bitcoin quiet again. 1e037c96c8

Add shared/upgrade lock support to CCriticalSection. 86f8a46ab0

Upgrade CCriticalBlock to add Shared/Upgrade locks. a1ce2c520b

Add locking test cases. b3226fffab

Add a warning if we upgrade directly from shared->exclusive lock. d3d8bf6f92

in src/sync.cpp:None in 92dd7a5bb1 outdated

 141 | +        nHasExclusive.reset(nHasExclusive.get() + 1);
 142 | +        return;
 143 | +    }
 144 | +
 145 | +    if (nHasShared.get() > (int*)0)
 146 | +    {

sipa commented at 2:22 PM on July 3, 2012:

This should just cause a fatal error. It's a programming error to try to go from a shared lock to an exclusive lock, and this implementation is almost certainly never what someone wants (it releases the lock entirely).

TheBlueMatt commented at 2:31 PM on July 3, 2012:

In the traditional strict model, sure. But if you are holding a SHARED_LOCK in one function which then calls another which requests a LOCK (and which would have been called in the same way no matter what the originally read data held), its not necessarily an error. By providing support for that case here, adding SHARED_LOCKs to bitcoin takes a ton less auditing.

In terms of the implementation not being what the user really wants, meh on the caller-facing side it functions identical to a unlock_shared_and_lock() even if it may end up waiting as other threads grab a lock in between. (Note that that may cause an error in cases where moving from a SHARED_LOCK to a LOCK would be a programming error, but not in cases like the one above)

sipa commented at 2:42 PM on July 3, 2012:

There's no point in implementing it. The only case where you'd use it, is something that needs to be solved in another way: you'd need to have taken an upgradable lock in the first place. If data was read while in the shared lock, and used to make an update while having the exclusive lock, the results will be wrong if an update happens in between.

Ideally, every public method of a well-encapsulated data structure will either take a shared or upgradable lock, and the inner code can take exclusive locks when necessary. Not taking an upgradable lock if you intend to make modification in the same lock-session is simply an error.

Put otherwise: providing the ability to switch from shared lock to exclusive lock will only lead to hard-to-detect programming errors. Yes, there may be cases where you want two separate lock-sessions (the first one shared, and the second one exclusive), but doing it implicitly makes it only more confusing.

TheBlueMatt commented at 11:48 PM on July 3, 2012:

My point was less of doing something in one session and more of doing it it multiple "sessions" that are really recursive calls that are very non-obvious. This is especially a problem with cs_main, which covers multiple, largely unrelated data structures. Yes, if Bitcoin were designed properly and if the switch to using shared/upgrade/unique locks were instant, it likely wouldn't be an issue. However, take, ProcessMessage, it currently takes a cs_main exclusive lock, however it is conceivable that many of its branches only need a SHARED or UPGRADE lock. Take, specifically, tx processing. It is conceivable that tx processing will only need a mempool exclusive lock, and only a shared lock on cs_main. However, once the tx is accepted and the wallet is called to process, the wallet may end up taking a cs_main lock as a part of one of its routines (even though it may only need a shared lock, but the old code only knows exclusive). Debugging such issues and tracking down each case where this may occur can get very complicated in the current bitcoin codebase. The idea behind providing shared->exclusive upgrade is to take away one more thing that would otherwise have to be audited for when converting to shared locking. Yes, such things /should/ be caught, but there is no guarantee and it takes away just one more failure mode.

TheBlueMatt commented at 12:00 AM on July 4, 2012:

Added a printf("WARNING: Upgrading directly from shared to exclusive mutex, something is wrong here: %s:%d\n" if we upgrade directly from a shared to an exclusive lock.

BitcoinPullTester commented at 12:50 PM on August 9, 2012: none

Automatic sanity-testing: FAILED, see http://jenkins.bluematt.me/pull-tester/d3d8bf6f924678961c5f8200757f1deaa8630981 for binaries and test log.

This could happen for one of several reasons:

It chanages paths in makefile.linux-mingw or otherwise changes build scripts in a way that made them incompatible with the automated testing scripts
It does not merge cleanly onto current master
It does not build on either Linux i386 or Win32 (via MinGW cross compile)
The test suite fails on either Linux i386 or Win32

jgarzik commented at 1:12 AM on September 5, 2012: contributor

Recommend closing, for now. No ACKs gathered, and it seems to me like a valid tool for our toolbox, without an immediate demonstrated need. Our locks do not seem highly contended, which seems to bump this quite down the priority scale down to "theoretically useful."

sipa commented at 1:16 AM on September 5, 2012: member

@jgarzik Not contended? We have a cs_main that blocks anything useful being done in parallel. Since many tasks only need read-only access to data structures, shared locks could increase parallellism massively.

That said, I still disagree with an implementation that releases a lock when trying to go from shared to exclusive.

TheBlueMatt commented at 1:42 AM on September 5, 2012: contributor

ACK on lack of ACKs

TheBlueMatt closed this on Sep 5, 2012

luke-jr commented at 10:56 PM on September 20, 2012: member

FWIW, the new tests in this seem to fail occasionally.

TheBlueMatt commented at 11:02 PM on September 20, 2012: contributor

Yea, it didnt work 100%, but I never got around to debugging it, and it was never a high priority (no interest anyway...)

laanwj commented at 11:47 AM on September 21, 2012: member

This is a nice idea and will allow for some extra concurrency, but it makes reasoning in the current mess that is bitcoin locking even harder.

I agree with sipa that we first need to get rid of recursive locks (and in the meantime, get getter insight into the current mess), before introducing more complex locking primitives.

suprnurd referenced this in commit 8b7dffbb64 on Dec 5, 2017

lateminer referenced this in commit 8b25a1eaf5 on May 6, 2020

bitcoin locked this on Sep 8, 2021