thread-safety: fix annotations with REVERSE_LOCK #32465

pull theuni wants to merge 4 commits into bitcoin:master from theuni:fix-reverselock changing 6 files +33 −23
  1. theuni commented at 0:49 am on May 10, 2025: member

    This is one of several PRs to cleanup/modernize our threading primitives.

    While replacing the old critical section locks in the mining code with a REVERSE_LOCK, I noticed that our thread-safety annotations weren’t hooked up to it. This PR gets REVERSE_LOCK working properly.

    Firstly it modernizes the attributes as-recommended by the clang docs (ctrl+f for USE_LOCK_STYLE_THREAD_SAFETY_ATTRIBUTES). There’s a subtle difference between the old unlock_function and new release_capability, where our reverse_lock only works with the latter. I believe this is an upstream bug. I’ve reported and attempted a fix here, but either way it makes sense to me to modernize.

    The second commit just fixes an unrelated logging bug that I noticed while hacking around in here.

    The third adds a missing annotation pointed out by a fixed REVERSE_LOCK. Because clang’s thread-safety annotations aren’t passed through a reference to UniqueLock as one may assume (see here for more details), cs_main has to be listed explicitly as a requirement.

    The last commit actually fixes the reverse_lock by making it a SCOPED_LOCK and using the pattern found in a clang test. Though the docs don’t describe how to accomplish it, the functionality was added in this commit. Due to aliasing issues (see link above), in order to work correctly, the original mutex has to be passed along with the lock, so all existing REVERSE_LOCKs have been updated. To ensure that the mutexes actually match, a runtime assertion is added.

  2. thread-safety: modernize thread safety macros
    Clang added new "capability"-based thread-safety attributes years ago, but the
    old ones remain supported for backwards-compatibility.
    
    However, while adding annotations for our reverse_lock, I noticed that there
    is a difference between the unlock_function and release_capability attributes.
    
    unlock_function actually maps to release_generic_capability, which does not
    work properly when implementing a scoped unlocker.
    
    To be consistent, the other capability-based attributes are updated here as
    well. To avoid having to update our macro usage throughout the codebase, I
    reused our existing ones.
    
    Additionally, SHARED_UNLOCK_FUNCTION is added here, as a subsequent PR will
    introduce annotations for shared_mutex and shared_lock.
    832c57a534
  3. thread-safety: add missing lock annotation
    No warning is currently emitted because our reverse_lock does not enforce our
    thread-safety annotations. Once it is fixed, the unlock would cause a warning.
    aeea5f0ec1
  4. thread-safety: add missing lockname for reverse_lock
    Noticed while adding annotations to reverse_lock.
    0065b9673d
  5. DrahtBot commented at 0:49 am on May 10, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32465.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK TheCharlatan, davidgumberg, ryanofsky, fjahr
    Concept ACK hebasto, achow101

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32592 (threading: remove ancient CRITICAL_SECTION macros by theuni)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  6. thread-safety: fix annotations with REVERSE_LOCK
    Without proper annotations, clang thinks that mutexes are still held for the
    duration of a reverse_lock. This could lead to subtle bugs as
    EXCLUSIVE_LOCKS_REQUIRED(foo) passes when it shouldn't.
    
    As mentioned in the docs [0], clang's thread-safety analyzer is unable to deal
    with aliases of mutexes, so it is not possible to use the lock's copy of the
    mutex for that purpose. Instead, the original mutex needs to be passed back to
    the reverse_lock for the sake of thread-safety analysis, but it is not actually
    used otherwise.
    
    [0]: https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
    96a341d406
  7. theuni force-pushed on May 10, 2025
  8. hebasto commented at 7:43 am on May 10, 2025: member
    Concept ACK.
  9. laanwj added the label Utils/log/libs on May 10, 2025
  10. laanwj added the label Tests on May 10, 2025
  11. in src/threadsafety.h:19 in 832c57a534 outdated
    14@@ -15,23 +15,24 @@
    15 // See https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
    16 // for documentation.  The clang compiler can do advanced static analysis
    17 // of locking when given the -Wthread-safety option.
    18-#define LOCKABLE __attribute__((lockable))
    19+#define LOCKABLE __attribute__((capability("")))
    20 #define SCOPED_LOCKABLE __attribute__((scoped_lockable))
    


    TheCharlatan commented at 11:10 am on May 10, 2025:
    This threw me off for a bit, but while some attributes have been renamed, some have been kept the same, and their macros renamed instead. See: https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#mutex-h
  12. TheCharlatan approved
  13. TheCharlatan commented at 12:07 pm on May 10, 2025: contributor
    ACK 96a341d4064132292621af404de908f01fbe3e2f
  14. DrahtBot requested review from hebasto on May 10, 2025
  15. fanquake referenced this in commit 0a8ab55951 on May 22, 2025
  16. in src/threadsafety.h:28 in 832c57a534 outdated
    29-#define UNLOCK_FUNCTION(...) __attribute__((unlock_function(__VA_ARGS__)))
    30+#define EXCLUSIVE_LOCK_FUNCTION(...) __attribute__((acquire_capability(__VA_ARGS__)))
    31+#define SHARED_LOCK_FUNCTION(...) __attribute__((acquire_shared_capability(__VA_ARGS__)))
    32+#define EXCLUSIVE_TRYLOCK_FUNCTION(...) __attribute__((try_acquire_capability(__VA_ARGS__)))
    33+#define SHARED_TRYLOCK_FUNCTION(...) __attribute__((try_acquire_shared_capability(__VA_ARGS__)))
    34+#define UNLOCK_FUNCTION(...) __attribute__((release_capability(__VA_ARGS__)))
    


    davidgumberg commented at 11:13 pm on May 22, 2025:

    In commit https://github.com/bitcoin/bitcoin/commit/832c57a53410870471a52647ce107454de3bc69c (thread-safety: modernize thread safety macros)


    Feel free to disregard, I am just trying to understand the motivation of this commit:

    unlock_function actually maps to release_generic_capability, which does not work properly when implementing a scoped unlocker.

    Looking at the example code in clang’s documentation:

    0  // [ s/mutex/capability ] 
    1  // Release/unlock an exclusive mutex.
    2  void Unlock() RELEASE();
    3
    4  // Release/unlock a shared mutex.
    5  void ReaderUnlock() RELEASE_SHARED();
    6
    7  // Generic unlock, can unlock exclusive and shared mutexes.
    8  void GenericUnlock() RELEASE_GENERIC();
    

    So, my assumption here is that when the “unlock capability” that ReverseLock takes is released generically, the thread safety analyzer can’t be sure whether this was an exclusive capability or a shared one, consequently it is ambiguous whether the resource is locked again, because ReverseLock may have been one of many that took “unlock capability” on the lock, is that what the problem is? If release_generic can’t disambiguate these situations, what is it even useful for annotating?


    I can see empirically why undoing just this part of the diff

    0
    1-#define UNLOCK_FUNCTION(...) __attribute__((release_capability(__VA_ARGS__)))
    2+#define UNLOCK_FUNCTION(...) __attribute__((unlock_function(__VA_ARGS__)))
    

    this passage:

    0    if (block.m_data) {
    1        REVERSE_LOCK(lock, cs_main);
    2        if (!blockman.ReadBlock(*block.m_data, *index)) block.m_data->SetNull();
    3    }
    4    block.found = true;
    

    results in the following error:

    0src/node/interfaces.cpp:449:19: warning: mutex 'cs_main' is not held on every path through here [-Wthread-safety-analysis]
    1  449 |     block.found = true;
    

    Which indicates that the thread safety analyzer does not understand the “unlock capability” was released and the lock was taken again when the reverse lock went out of scope, I’m just not sure I understand why.


    theuni commented at 2:25 pm on May 28, 2025:

    So, my assumption here is that when the “unlock capability” that ReverseLock takes is released generically, the thread safety analyzer can’t be sure whether this was an exclusive capability or a shared one, consequently it is ambiguous whether the resource is locked again, because ReverseLock may have been one of many that took “unlock capability” on the lock, is that what the problem is? If release_generic can’t disambiguate these situations, what is it even useful for annotating?

    My best guess is that a generic unlock is intended to be usable for either a shared/unshared mutex. There’s conflicting info about what’s supposed to happen when you call unlock() on a shared mutex, so perhaps the generic annotation is intended to simply work as both without complaining either way?

    Which indicates that the thread safety analyzer does not understand the “unlock capability” was released and the lock was taken again when the reverse lock went out of scope, I’m just not sure I understand why.

    This was my observation as well, and I think it’s as simple as “this is a clang bug”. See the PR description. I PR’d an upstream fix here.

  17. in src/sync.h:228 in 96a341d406
    226             LeaveCritical();
    227             lock.swap(templock);
    228         }
    229 
    230-        ~reverse_lock() {
    231+        ~reverse_lock() UNLOCK_FUNCTION() {
    


    davidgumberg commented at 8:35 pm on May 28, 2025:

    I was pretty puzzled by the annotations I think in part because of the deprecated “lock/unlock” terminology instead of “acquire/release”, but also because the meaning of UNLOCK_FUNCTION/RELEASE appears to shift from the constructor to the destructor.

    A reasonable explanation of these semantics is provided by their author: https://bugs.llvm.org/show_bug.cgi?id=36162#c7

     0class SCOPED_CAPABILITY AutoUnlock {
     1public:
     2  // Release mu, implicitly acquire *this and connect it to mu.
     3  AutoUnlock(Mutex &mu) RELEASE(mu);
     4
     5  // Reacquire all underlying capabilities and warn on double lock.
     6  void lock() RELEASE();
     7
     8  // Release all underlying capabilities, warn on double unlock.
     9  void unlock() ACQUIRE();
    10
    11  // Release *this and acquire all underlying capabilities, if they are not held.
    12  // There is no warning on double lock.
    13  ~AutoUnlock() RELEASE();
    14};
    

    Note that the destructor says RELEASE(), which seems a bit counter-intuitive. We release the scoped capability, which acquires the underlying capabilities.

    More discussion here: https://bugs.llvm.org/show_bug.cgi?id=36162 and here: https://reviews.llvm.org/D52578


    I wonder how persuaded one should be by the arguments not to use this pattern (RAII unlocks/reverse-locks) from above

    • Destructor calls are usually hidden behind return/break/continue statements and the regular exit from a block, which is why they are supposed to do only clean-up. Locking as part of cleaning up seems weird. As a consequence, there are potentially a bunch of implicit lock operations that are hard to see.
    • The destructor will also be called on stack unwinding after an exception was thrown. This is an important reason for using AutoLock, but for AutoUnlock it means that we might acquire a mutex when unwinding the stack.
    • Destructors are not supposed to throw exceptions, but locking can (generally) do that. Chromium’s mutexes might not, but std::mutex::lock “throws std::system_error when errors occur, including errors from the underlying operating system that would prevent lock from meeting its specifications.” [1] Note that std::mutex::unlock is specified as noexcept.

    I think in the case of ReadBlock() a reverse lock could be avoided by moving locking from ReadBlock() to its callers, but, this topic is probably out of scope for this PR.


    ryanofsky commented at 7:40 pm on May 29, 2025:

    re: #32465 (review)

    Thanks for finding the bugzilla link. Explanation there was very helpful.

    I also do find the arguments against AutoUnlock pretty persuasive, and looking at REVERSE_LOCK calls in our codebase, it seems like they could pretty easily be dropped and replaced with plain unlock/lock calls.

    Doesn’t seem too important though. I think realistically if an exception is thrown, locking in the destructor should either be fine and succeed, or there must be a more serious bug or condition like an out of memory error where it wouldn’t make a huge difference if the relock throws or deadlocks.

  18. davidgumberg commented at 9:06 pm on May 28, 2025: contributor

    crACK 96a341d4064132292621af404de908f01fbe3e2f

    src/node/interfaces.cpp’s FillBlock() is incorrectly annotated, and no warning is emitted, cherry-picking the commits that fix reverse_lock’s annotations, the thread safety analyzer complains as it should about FillBlock():

    0$ git checkout --detach master && git cherry-pick 0065b9673db5da2994b0b07c1d50ebfb19af39d0^..96a341d4064132292621af404de908f01fbe3e2f
    1$ CC=clang CXX=clang++ cmake -B build && cmake --build build -j $(nproc)
    2# [...]
    3/bitcoin/src/node/interfaces.cpp:446:9: warning: releasing mutex 'cs_main' that was not held [-Wthread-safety-analysis]
    4  446 |         REVERSE_LOCK(lock, cs_main);
    5      |         ^
    6/bitcoin/src/sync.h:251:82: note: expanded from macro 'REVERSE_LOCK'
    7  251 | #define REVERSE_LOCK(g, cs) typename std::decay<decltype(g)>::type::reverse_lock UNIQUE_NAME(revlock)(g, cs, #g, __FILE__, __LINE__)
    

    Tested that without 832c57a53410870471a52647ce107454de3bc69c’s change to UNLOCK_FUNCTION() thread safety analysis also fails.

  19. fanquake requested review from ryanofsky on May 29, 2025
  20. in src/node/interfaces.cpp:434 in aeea5f0ec1 outdated
    430@@ -431,7 +431,7 @@ class NodeImpl : public Node
    431 };
    432 
    433 // NOLINTNEXTLINE(misc-no-recursion)
    434-bool FillBlock(const CBlockIndex* index, const FoundBlock& block, UniqueLock<RecursiveMutex>& lock, const CChain& active, const BlockManager& blockman)
    435+bool FillBlock(const CBlockIndex* index, const FoundBlock& block, UniqueLock<RecursiveMutex>& lock, const CChain& active, const BlockManager& blockman) EXCLUSIVE_LOCKS_REQUIRED(cs_main)
    


    ryanofsky commented at 6:42 pm on May 29, 2025:

    In commit “thread-safety: add missing lock annotation” (aeea5f0ec112f9ec29da37806925e961c4998365)

    Would it also make sense to add AssertLockHeld(cs_main) here? Developer notes say to “Combine annotations in function declarations with run-time asserts in function definitions”

  21. ryanofsky approved
  22. ryanofsky commented at 7:52 pm on May 29, 2025: contributor

    Code review ACK 96a341d4064132292621af404de908f01fbe3e2f. Nice changes modernizing the annotations and supporting better thread safety warnings. @theuni I’d be happy to merge this since it now has 3 acks but I’m wondering what you think of the arguments davidgumberg pointed to in #32465 (review) against having a REVERSE_LOCK object at all?

    I think those arguments are pretty compelling, and think maybe the ideal thing would be to close this PR (to avoid confusing different approaches) and open a new PR dropping REVERSE_LOCK entirely and replacing with it lock/unlock calls, or perhaps with a WITH_UNLOCK() macro that runs a fragment of code with unlock() before and lock() after, and does not relock when an exception is thrown, and does not make an unsafe lock() call inside a destructor.

  23. theuni commented at 8:18 pm on May 30, 2025: member

    I think those arguments are pretty compelling, and think maybe the ideal thing would be to close this PR (to avoid confusing different approaches) and open a new PR dropping REVERSE_LOCK entirely and replacing with it lock/unlock calls, or perhaps with a WITH_UNLOCK() macro that runs a fragment of code with unlock() before and lock() after, and does not relock when an exception is thrown, and does not make an unsafe lock() call inside a destructor.

    Hmm.

    For context, my primary motivation here is preparing for the follow-up commits that will actually remove the lock/unlock methods completely, in order to force the exception-safe RAII wrappers. See these two commits: https://github.com/theuni/bitcoin/commit/d4b09184ae8ce38862a758423653f606e8b96045 and https://github.com/theuni/bitcoin/commit/1f2a9afdb302dd3f1686fe3461b9d7918a6b1859. My logic there was that there’s basically no good reason why lock/unlock shouldn’t be called together in the same scope, and any attempt to do so is almost guaranteed to be either a bad design or a bug.

    For even more context, this is in preparation for adding a SharedMutex (std::shared_mutex/shared_lock wrappers with proper thread-safety annotations). Much of that work involves copying our current implementations, so I wanted them cleaned/tightened up as much as possible first.

    Point taken about dtor exceptions and unwinding, but yeah, I think in practice if we’re throwing in a lock() while unwinding, things have already gone sideways.

    Imo the REVERSE_LOCK pattern clearly demonstrates intent and, together with bare lock()/unlock() removal, makes it impossible to do crazy things. For example, it’s not possible to reverse or re-lock a reverse_lock, so one level deep is as far as you can go before having to re-architect your code to be more reasonable.

    So maybe before deciding what to do here, I should ask what you think of the two commits above? If you’re not onboard with removing bare lock/unlock functionality as a follow-up, this PR becomes much less interesting.

  24. fjahr commented at 10:09 am on May 31, 2025: contributor
    utACK 96a341d4064132292621af404de908f01fbe3e2f
  25. achow101 commented at 9:23 pm on June 2, 2025: member

    Concept ACK

    preparing for the follow-up commits that will actually remove the lock/unlock methods completely, in order to force the exception-safe RAII wrappers.

    I think that’s a good idea.

  26. davidgumberg commented at 8:02 am on June 3, 2025: contributor
    Concept ACK on dropping non-RAII locks/unlocks. Maybe this can be done without reverse locks, e.g. a (maybe dumb) approach at replacing REVERSE_LOCK with LOCK*’s: https://github.com/davidgumberg/bitcoin/commit/fde4e277550095ade852ebbe09bf2f0c7533a42e
  27. ryanofsky commented at 12:45 pm on June 3, 2025: contributor

    re: theuni #32465 (comment)

    For context, my primary motivation here is preparing for the follow-up commits that will actually remove the lock/unlock methods completely, in order to force the exception-safe RAII wrappers. See these two commits: theuni@d4b0918 and theuni@1f2a9af.

    Yeah I think I’d disagree with this approach and I don’t think it makes sense to remove the unlock + lock methods for the reasons davidgumberg quoted in #32465 (review). RAII locking is great and makes sense. RAII unlocking seems dangerous and inferior to static verification provided by the thread annotations.

    Temporary unlocking can be useful when performing slow operations, and sometimes necessary when dealing with condition variables and external notifications, so the C++ library provides manual unlock/lock methods, but does not provide an automatic unlocking+relocking class for good reasons.

    My logic there was that there’s basically no good reason why lock/unlock shouldn’t be called together in the same scope, and any attempt to do so is almost guaranteed to be either a bad design or a bug.

    This is true for lock+unlock, but not for unlock+relock. If you acquire a lock and perform an operation, and the operation fails, part of the cleanup afterward should be to release the lock, and it makes sense to enforce this. But if you temporarily release a lock to perform an operation and the operation fails, it usually will not make sense to reacquire the lock as part of cleanup, and this should probably not even be the default behavior, let alone something strictly enforced by an API.

    For even more context, this is in preparation for adding a SharedMutex (std::shared_mutex/shared_lock wrappers with proper thread-safety annotations). Much of that work involves copying our current implementations, so I wanted them cleaned/tightened up as much as possible first.

    That makes sense. If you think this PR helps with that, I don’t think this change is worse than the status quo. I just think it would be better if we eliminated REVERSE_LOCK entirely and replaced it with manual unlock+lock calls or a WITH_UNLOCK macro that unlocked before performing an operation, and relocked afterwards when the operation does not throw an exception.

    Since implementing that would touch all the same lines of code that this PR touches, it would be nice to make that change instead of this change to reduce churn. But it also seems fine if we think it is good to improve REVERSE_LOCK now in this PR and then remove it later in a future PR.

    Point taken about dtor exceptions and unwinding, but yeah, I think in practice if we’re throwing in a lock() while unwinding, things have already gone sideways.

    I think the point about having a call that throws in a destructor is a small part of the argument. The bigger point is that it does not make sense to relock in the error case to begin with. In the error case, even when the relock call succeeds, it will usually be pointless and unnecessary because it will be immediately followed by another unlock call as the stack unwinds

    Also, I’d reject the the idea that if a lock call fails it means there has been some catastrophic failure. We are using pretty basic locks, but more generally locks can support things like deadlock detection and timeouts so a good locking API will closely resemble the C++ standard API with a locking RAII class and manual lock and unlock methods. There will not be a funky unlocking RAII class that makes a lock call in the destructor because it’s not really a good idea.

    re: davidgumberg #32465 (comment)

    Concept ACK on dropping non-RAII locks/unlocks. Maybe this can be done without reverse locks, e.g. a (maybe dumb) approach at replacing REVERSE_LOCK with LOCK*’s: davidgumberg@fde4e27

    IMO that commit is a good demonstration of why it would be good to replace REVERSE_LOCK with manual unlock() + lock() calls or with a WITH_UNLOCK macro that makes them automatically.

    For example there would be no need to restructure ValidationSignalsImpl::Iterate. We could simply replace { REVERSE_LOCK(lock); f(); } with { lock.unlock(); f(); lock.lock(); } or WITH_UNLOCK(lock, f());

  28. ryanofsky commented at 4:49 pm on June 3, 2025: contributor

    To be clear, I all think the changes in this PR are improvements, even the last commit annotating REVERSE_LOCK, so I’d be happy to merge see this PR merged.

    I’m just saying the last commit changing and annotating REVERSE_LOCK could be unnecessary if we decide to remove it later, and think we should remove it for reasons discussed above.

  29. theuni commented at 6:25 pm on June 4, 2025: member

    To be clear, I all think the changes in this PR are improvements, even the last commit annotating REVERSE_LOCK, so I’d be happy to merge see this PR merged.

    I’m just saying the last commit changing and annotating REVERSE_LOCK could be unnecessary if we decide to remove it later, and think we should remove it for reasons discussed above.

    Understood, and points taken.

    Currently playing around with your WITH_UNLOCK idea to see what that looks like. I’ll report back.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-06-15 09:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me