sync: use proper TSA attributes

vasild commented at 7:03 pm on September 9, 2020: member

Use the proper attribute from Clang’s thread safety analysis for AssertLockHeld():

if DEBUG_LOCKORDER is defined then AssertLockHeld() will check if the caller owns the given mutex and will abort() if not. Clang has an attribute exactly for that - ASSERT_EXCLUSIVE_LOCK(), documented as: “These are attributes on a function or method that does a run-time test to see whether the calling thread holds the given capability. The function is assumed to fail (no return) if the capability is not held.” [1]
if DEBUG_LOCKORDER is not defined, then AssertLockHeld() does nothing, thus don’t tag it with any attributes (don’t fool the compiler that we do something which we don’t).

Replace LockAssertion with AssertLockHeld and remove the former.

On the places where Clang cannot deduce that a mutex is held, use NO_THREAD_SAFETY_ANALYSIS, intended to be used when a code is “too complicated for the analysis to understand” [2].

[1] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#assert-capability-and-assert-shared-capability [2] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#no-thread-safety-analysis

PR is compared with alternatives in https://github.com/bitcoin-core/bitcoin-devwiki/wiki/AssertLockHeld-PRs

sync: use proper TSA attributes

Use the proper attribute from Clang's thread safety analysis for
`AssertLockHeld()`:

* if `DEBUG_LOCKORDER` is defined then `AssertLockHeld()` will check if
  the caller owns the given mutex and will `abort()` if not. Clang has
  an attribute exactly for that - `ASSERT_EXCLUSIVE_LOCK()`, documented
  as: "These are attributes on a function or method that does a run-time
  test to see whether the calling thread holds the given capability. The
  function is assumed to fail (no return) if the capability is not
  held." [1]

* if `DEBUG_LOCKORDER` is not defined, then `AssertLockHeld()` does
  nothing, thus don't tag it with any attributes (don't fool the
  compiler that we do something which we don't).

Replace `LockAssertion` with `AssertLockHeld` and remove the former.

On the places where Clang cannot deduce that a mutex is held, use
`NO_THREAD_SAFETY_ANALYSIS`, intended to be used when a code is
"too complicated for the analysis to understand" [2].

[1] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#requires-requires-shared
[2] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#no-thread-safety-analysis

2a4081e487

vasild commented at 7:05 pm on September 9, 2020: member

This overlaps with #19865 and #19918.

cc @hebasto @ryanofsky @MarcoFalke @ajtowns

DrahtBot added the label P2P on Sep 9, 2020

hebasto commented at 7:50 pm on September 9, 2020: member

Tested with the following patch:

 0--- a/src/txmempool.cpp
 1+++ b/src/txmempool.cpp
 2@@ -1108,7 +1108,7 @@ bool CTxMemPool::IsLoaded() const
 3 
 4 void CTxMemPool::SetIsLoaded(bool loaded)
 5 {
 6-    LOCK(cs);
 7+    AssertLockHeld(cs);
 8     m_is_loaded = loaded;
 9 }
10

on master (564e1ab0f3dc573bd3ea60a80f6649c361243df9) clang emits warning:

0txmempool.cpp:1112:5: warning: writing variable 'm_is_loaded' requires holding mutex 'cs' exclusively [-Wthread-safety-analysis]
1    m_is_loaded = loaded;
2    ^

with this PR (2a4081e487f1b83a5c9f1194a73d5a01a4c35cc6) clang compiles without warnings

Not sure if this is correct as in both cases the patch causes run time abort().

vasild commented at 8:21 pm on September 9, 2020: member

This is exactly how it is supposed to work: https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#assert-capability-and-assert-shared-capability

m_is_loaded = loaded; is never going to be executed if cs is not locked and thus clang does not emit a warning about it.

vasild commented at 7:00 am on September 10, 2020: member

Just to elaborate - expecting a warning in the example above is similar to expecting a warning in:

0int Func(int x)
1{
2    if (x == 1) {
3        return 10;
4    }       
5    abort();
6} // should we get "control reaches end of non-void function" here?

The compiler does not emit a warning because it sees that abort() will not return and thus code after it is unreachable.

What we are doing now in master looks like removing the no return attribute from abort() in order to get a warning for a code that will never be executed (we have removed the ASSERT_EXCLUSIVE_LOCK attribute from AssertLockHeld() to warn on unreachable code).

hebasto commented at 7:31 am on September 10, 2020: member

My testing patch makes the code wrong from the point of concurrency – writing to m_is_loaded is not protected by a mutex.

On master compiler emits a warning about wrong code.

With this PR we get know about wrong code only at run time.

I believe that the former is strictly better then latter.

ryanofsky commented at 3:15 pm on September 10, 2020: member

@vasild, the textualist interpretation of the clang documentation here is very smart, but I think is misplaced because https://clang.llvm.org/docs/ThreadSafetyAnalysis.html is not a legal document or programming language standard, and this interpretation causes practical problems that will make the code less verifiably thread safe and harder to work with. Here are the practical problems with this PR:

It makes AssertLockHeld and AssertNotLockHeld calls now have different TSA annotations depending on whether the DEBUG_LOCKORDER macro is defined or not defined. I think this is bad. TSA annotations are already confusing to developers and difficult to work with. They cause errors in some configurations, or warning in other configurations, or may be completely ignored. They can happen locally or not, and are enabled and disabled in various CI jobs pretty much randomly for no reason. I think they are already about as crazily inconsistent as anyone can stand. Applying different annotations based on whether DEBUG_LOCKORDER is defined makes things worse and I think is not acceptable because:
- DEBUG_LOCKORDER flag is supposed to a debug flag turning on runtime debug features. There isn’t a reason for someone to expect it to affect thread safety analysis in non-debug functions.
- Making individual annotations conditional on the DEBUG_LOCKORDER flag is worse than turning annotations off entirely based on DEBUG_LOCKORDER or another flag. Presence of an individual annotation doesn’t just cause errors or suppress errors but can actually cause some errors and suppress different errors at the same time. With this PR, instead of developers just facing the inconvenient scenario of errors happening in one configuration but not other configurations, they may now face a new (nightmare) scenario of an error happening in one configuration, where fixing it can break a different configuration, and fixing that can break the original configuration, and so on. We should avoid this by just having a master switch that turns off all thread safety annotations, and not go down the road of applying different individual annotations based on different switches.
(I think you know this, but to make it clear to others:) Adding ASSERT_EXCLUSIVE_LOCK annotation to existing AssertLockHeld calls throughout the codebase may make that code less thread safe when it is changed in the future because ASSERT_EXCLUSIVE_LOCK suppresses compile time checks which verify that the mutex is really locked in all those places. Unsafety of ASSERT_EXCLUSIVE_LOCK is the reason why #19668 switched away from ASSERT_EXCLUSIVE_LOCK to EXCLUSIVE_LOCKS_REQUIRED and is the reason why I think #19865 and #19918 are both safer alternatives to this PR. (#19865 removes all existing AssertLockHeld calls to keeps compile-time checking as strict as possible while eliminating redundant runtime checks. #19668 keeps existing AssertLockHeld calls because Marco and others believe they are useful, and it adds a different function to replace the pre-#19668 behavior).
(This is also acknowledged in your PR description:) This PR forces to us to use NO_THREAD_SAFETY_ANALYSIS in situations where we don’t need and shouldn’t want to disable thread safety analysis. All we need is ability to inform the compiler that a specific mutex is locked at a specific time, which is exactly what ASSERT_EXCLUSIVE_LOCK does and why we should use it.

On the textual disagreement, I think while the text of https://clang.llvm.org/docs/ThreadSafetyAnalysis.html could be improved, it is compatible with how #19865 and #19668 are using ASSERT_EXCLUSIVE_LOCK and how pre-#19668 code was using it it. The text says if a function is annotated with that attribute “The function is assumed to fail (no return) if the capability is not held.” I agree it’s bad to add an annotation that causes the compiler to make a false assumption, but in the actual situations where this would be relevant, that assumption is the least of our problems. If AssertLockHeld is called and the capability is not held, then fact that we’re calling AssertLockHeld indicates the actual program has undefined behavior, and theoretical optimizations compiler might apply to the AssertLockHeld invocation are a sideshow. It would be good to fix the clang documentation upstream to match what the compiler actually does: assuming the capability is held, not optimizing the debug function call. Much better than practical harms to safety and maintainability of future code which I think this PR is making as unnecessary tradeoffs.

vasild commented at 7:38 am on September 14, 2020: member

@hebasto, @ryanofsky, thanks for the review!

I re-read your comments a few times and I see your points, but I disagree with some of the above. I see neither side is convinced and so I am closing this PR because for it to get merged an agreement is needed.

It would have been too boring if everybody agreed on everything all the time! :)

I still think that sticking to the documentation and accepting the limitations of the tools used is the right approach in the long term. OTOH telling the compiler that we do something while we actually do something else and misusing its directives is confusing and may backfire at some point.

vasild closed this on Sep 14, 2020

ryanofsky commented at 8:05 pm on September 14, 2020: member

@vasild I can’t speak for others, but I agree with you there is a potential problem. I just don’t think this PR is an good solution to the problem due to bigger problems it creates.

I think the best solution would be a documentation fix. It’d be helpful if you could take a look at https://reviews.llvm.org/D87629 and see if it resolves your concerns.

aaronpuchert commented at 11:38 pm on September 16, 2020: none

if DEBUG_LOCKORDER is not defined, then AssertLockHeld() does nothing, thus don’t tag it with any attributes (don’t fool the compiler that we do something which we don’t).

The attribute is meant to model the standard assert, so the function doesn’t have to actually check anything. My comment on @ryanofsky’s documentation change goes into a bit more detail about that.

https://clang.llvm.org/docs/ThreadSafetyAnalysis.html is not a legal document or programming language standard

Technically right, but we’d appreciate if you treat it like a language standard. So if there are discrepancies between documentation and implementation we’d like to know them.

It makes AssertLockHeld and AssertNotLockHeld calls now have different TSA annotations depending on whether the DEBUG_LOCKORDER macro is defined or not defined. I think this is bad.

Agreed, conceptually it’s independent of the build profile whether a mutex is held in this case. It’s just that in some profiles you check that it is, but how you handle assertions (like if you drop them in non-debug builds) doesn’t concern the Analysis.

If AssertLockHeld is called and the capability is not held, then fact that we’re calling AssertLockHeld indicates the actual program has undefined behavior, and theoretical optimizations compiler might apply to the AssertLockHeld invocation are a sideshow.

To be clear, Thread Safety annotations have absolutely no effect on the generated code. They are part of the Clang AST, and are considered in the Analysis library, but CodeGen ignores them, so the IR emitted by the Frontend already contains no trace of the attributes. Meaning that whatever the middle end or backend do cannot possibly be influenced by them. No optimization will ever depend on them.

DrahtBot locked this on Feb 15, 2022

sync: use proper TSA attributes #19929