A while back I started looking at bitcoin, and I was disappointed with the code quality wrt locking.
I tried to fix the code to be clang -Wthread-safety clean, but I don't know the code enough to understand some of the entanglements that are currently present.
I've added lots of annotations that specifies the locking requirements on functions and data. This might seem a little overzealous, but I believe that this is necessary to "get back on track" and make the locking strategy clear and consistent.
Even with this patch there are lots of locking issues left - probably benign like things happening during startup and shutdown. There are also false positives from me not getting clang to grok TRY_LOCK correctly.
Below is a long note I wrote while looking through the source code a while back. Maybe you guys agree or maybe not - at least it explains the whats and the whys.
For an object that is accessed by multiple threads we have, for every field:
It is const, constant after initialization, or guarded by a mutex.
Re-using an object is a bad idea. It should never be done. If we do, we are mutating the constant-after-initialization fields. This leads to exessive locking and overcomplicated code. It is a lot simpler to discard the object and re-create it. This works, because the object is owned by some other object, and in that other object, it is a "guarded by mutex" field. Thus by re-creating the object, we lift the responsibility for locking into the parent.
A result of the above is that "constructors" must be separate from other functions in the class. Constructors that are not strictly C++ constructors, such as deserializing should be static functions if possible. This makes it impossible for them to to be used to mutate constant-after-initialization fields.
Locks should not be exported outside of a class. Java did this mistake and it is considered an anti-pattern.
Locks should never serialize code, they should protect data from being accessed while invariants on that data are broken.
Locks should be textually close to the fields. Given a lock, it should be easy to understand what it guards, and given a field, it should be easy to see which locks guard that field. Thus, locks should be private members of a class.
A good way to obfuscate this is to put a lock in a parent class, as done in keystore.h where CKeyStore. It is hard to know what that lock guards. It is a lot cleaner to duplicate that lock in the data section of the subclasses.
Recursive locks should never be used. They lead to bugs, and are only necessary in order to write sloppy code. If some functionality is both a transaction that ensures that the invariants of the data are obeyed, and part of larger transactions, then it is much clearer to separate these conserns into two functions:
void DoFoo() { LOCK(cs); DoFooUnlocked(); } void DoFooUnlocked() { ... }
Using locks to protect simple setters and getters is often buggy and/or complex. In most cases, mutating some state is part of a larger transaction, so the setter is often protected by some other lock external to the class. In these cases, re-building a new object is usually the best approach, but otherwise, explicitly marking setters as SetFooUnlocked() and not lock is clearer. It makes it explicit that you can't just mutate randomly without doing some external locking.
Do not hold locks over callbacks out of an object. This means that objects should signal some state change, and not guarantee which state the object is in when a callback happens. Sometimes, it makes sense to break this, but then the callback will have to be treated like a signal handler, or irq handler - i.e. it can only do simple things and not call out to other random code.
Every time you call TRY_LOCK, a kitten dies. The reason why this is bad is that it makes reasoning, and testing the code incredibly hard. TRY_LOCK is basically this:
if (rand() % 1) { DoA(); } else { DoB(); }
It is simply unnecessary randomness. The TRY_LOCK macro should be removed, it kills kittens.
If something like TRY_LOCK is truly needed, then at least it should be abstracted out of the actual functions, so that the functions themselves are testable. Thus in the above case, DoA() would be a separate, testable, function.
In some really large systems (millions of large tps), sharding work queues by lock is a reasonable design to reduce resource usage by threads (so you do cs_someLock.DoWorkNowOrWhenUnlocked(&DoA), and it is more efficient than TRY_LOCK, but for bitcoin - this is just overkill.
Having lots of semantically different locks in a class is not ideal. For example CNode has 4 locks. This means that there are 4 sets of invariants that can be established independently of each other in this class. To put it another way, when you call a function on this object, it should work on a subset of the state, or it should hold all 4 locks. This suggests that either the class is a composite of multiple objects, and should be broken up, or the number of locks should be reduced.
We cannot compose multiple critical sections which all take a lock into a large critical section. After we have released locks, all checks that we have done are invalidated. Most of CWallet is just plain wrong and based on this false assumption. Locking is done in CCryptoKeyStore instead of in CWallet. See above "Locks should be textually close to the fields" for a reason why people confuse themselves.
There is a lot of convoluted code like this: bool result; { LOCK(cs_KeyStore); result = (mapKeys.count(address) > 0); } return result;
which is equivalent to this:
LOCK(cs_KeyStore); return (mapKeys.count(address) > 0);The destructor is guaranteed to run after the expression is evaluated. There is no need to over-engineer.
Similarly, there is a lot of code like this:
void SomeFunction(...) { something(); { LOCK(cs_something); ... } }
or this: void SomeFunction(...) { { LOCK(cs_something); .. rest of function ... } return false; }
There is no point in all the excessive blocks. What are they for?