Preserve the LockData initial state if “potential deadlock detected” exception thrown #19340

pull hebasto wants to merge 3 commits into bitcoin:master from hebasto:200620-stack changing 3 files +33 −9
  1. hebasto commented at 8:26 pm on June 20, 2020: member

    On master (e3fa3c7d671e34038a262bb2db16b30bee82153d) if the push_lock() throws the “potential deadlock detected” exception (via the potential_deadlock_detected() call), the LockData instance internal state differs from one when the push_lock() was called. This non-well behaviour makes (at least) testing brittle.

    This PR preserves the LockData instance initial state if push_lock() throws an exception, and improves the sync_tests unit test.

  2. DrahtBot added the label Tests on Jun 20, 2020
  3. DrahtBot commented at 10:03 pm on June 20, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #19337 (sync: detect double lock from the same thread by vasild)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. in src/sync.cpp:272 in c810cd4d17 outdated
    263@@ -255,6 +264,14 @@ void DeleteLock(void* cs)
    264     }
    265 }
    266 
    267+bool LockStackEmpty()
    268+{
    269+    LockData& lockdata = GetLockData();
    270+    std::lock_guard<std::mutex> lock(lockdata.dd_mutex);
    271+    LockStack& lock_stack = lockdata.m_lock_stacks[std::this_thread::get_id()];
    272+    return lock_stack.empty();
    


    vasild commented at 7:50 am on June 22, 2020:

    operator[] has a side effect that it would insert a new element in m_lock_stacks if the thread that called it does not own any locks.

    0    const auto it = lockdata.m_lock_stacks.find(std::this_thread::get_id());
    1    if (it == lockdata.m_lock_stacks.end()) {
    2        return true;
    3    }
    4    return it->second.empty().
    

    hebasto commented at 11:46 am on June 22, 2020:
  5. in src/sync.cpp:131 in c810cd4d17 outdated
    130@@ -131,7 +131,7 @@ static void potential_deadlock_detected(const LockPair& mismatch, const LockStac
    131     throw std::logic_error("potential deadlock detected");
    


    vasild commented at 9:27 am on June 22, 2020:

    I would suggest to unconditionally remove the entry from the stack here, after printing the stack and before abort()/throw. This cleanup is warranted from all code paths that call EnterCritical().

    0    // Undo the insertion from push_lock(). We will not lock the mutex.
    1    s2.pop_back();
    2    throw std::logic_error("potential deadlock detected");
    

    (this should be before the if, 4 lines above, but github does not allow me to put the suggestion there)

  6. vasild commented at 9:42 am on June 22, 2020: member

    It would be good if the offending entry is still in the lock stack by the time potential_deadlock_detected() is called because it prints the stack and we want to see the offender in the debug log.

    Also, I think that it would be best to undo all mods before abort()/throw. We also inserted into lockdata.lockorders and in lockdata.invlockorders. Maybe something like:

     0diff --git i/src/sync.cpp w/src/sync.cpp
     1index 3098e402c..b21154d6e 100644
     2--- i/src/sync.cpp
     3+++ w/src/sync.cpp
     4@@ -124,17 +124,12 @@ static void potential_deadlock_detected(const LockPair& mismatch, const LockStac
     5         }
     6         if (i.first == mismatch.second) {
     7             LogPrintf(" (2)"); /* Continued */
     8         }
     9         LogPrintf(" %s\n", i.second.ToString());
    10     }
    11-    if (g_debug_lockorder_abort) {
    12-        tfm::format(std::cerr, "Assertion failed: detected inconsistent lock order at %s:%i, details in debug log.\n", __FILE__, __LINE__);
    13-        abort();
    14-    }
    15-    throw std::logic_error("potential deadlock detected");
    16 }
    17 
    18 static void double_lock_detected(const void* mutex, LockStack& lock_stack)
    19 {
    20     LogPrintf("DOUBLE LOCK DETECTED\n");
    21     LogPrintf("Lock order:\n");
    22@@ -186,14 +181,28 @@ static void push_lock(MutexType* c, const CLockLocation& locklocation)
    23         if (lockdata.lockorders.count(p1))
    24             continue;
    25         lockdata.lockorders.emplace(p1, lock_stack);
    26 
    27         const LockPair p2 = std::make_pair(c, i.first);
    28         lockdata.invlockorders.insert(p2);
    29-        if (lockdata.lockorders.count(p2))
    30+        if (lockdata.lockorders.count(p2)) {
    31             potential_deadlock_detected(p1, lockdata.lockorders[p2], lockdata.lockorders[p1]);
    32+
    33+            lockdata.invlockorders.erase(p2);
    34+            lockdata.lockorders.erase(p1);
    35+            lock_stack.pop_back();
    36+
    37+            if (g_debug_lockorder_abort) {
    38+                tfm::format(std::cerr,
    39+                    "Assertion failed: detected inconsistent lock order at %s:%i, "
    40+                    "details in debug log.\n",
    41+                    __FILE__, __LINE__);
    42+                abort();
    43+            }
    44+            throw std::logic_error("potential deadlock detected");
    45+        }
    46     }
    47 }
    48 
    49 static void pop_lock()
    50 {
    51     LockData& lockdata = GetLockData();
    
  7. hebasto commented at 9:56 am on June 22, 2020: member

    @vasild

    It would be good if the offending entry is still in the lock stack by the time potential_deadlock_detected() is called because it prints the stack and we want to see the offender in the debug log.

    It prints from LockOrders instances, not from the lock stack of the current thread, no?

  8. vasild commented at 10:59 am on June 22, 2020: member

    You are right! I did not realize that the LockStack objects are copied.

    Anyway - I think, for EnterCritical() to be a good citizen, if it throws, it should leave its internal structures in the state they were before it was called.

  9. hebasto force-pushed on Jun 22, 2020
  10. hebasto commented at 11:46 am on June 22, 2020: member

    Updated c810cd4d17f1f3854045c7e6f9093c856ba28406 -> 236e76047fd0cf260d4aed0c0926e9942e15d828 (pr19340.01 -> pr19340.02, diff):

    operator[] has a side effect that it would insert a new element in m_lock_stacks if the thread that called it does not own any locks. @vasild Anyway - I think, for EnterCritical() to be a good citizen, if it throws, it should leave its internal structures in the state they were before it was called.

    I’d leave it as is since it seems unrelated to this PR goal.

  11. hebasto force-pushed on Jun 22, 2020
  12. hebasto commented at 6:53 pm on June 22, 2020: member

    Reworked after the discussion with @vasild on IRC.

    ~Going to update the OP soon.~ The OP has been updated.

  13. hebasto renamed this:
    Fix lock stack after "potential deadlock detected" exception
    Preserve the `LockData` initial state if "potential deadlock detected" exception thrown
    on Jun 22, 2020
  14. vasild approved
  15. vasild commented at 8:00 am on June 23, 2020: member
    ACK f88222a9
  16. JeremyRubin commented at 9:05 pm on July 9, 2020: contributor
    I’m not sure I fully get what’s going on here, but the change looks fine to me. Can you explain how making the copy of the lockstack actually helps? It looks like a race condition to even copy it?
  17. vasild commented at 7:58 am on July 10, 2020: member

    @JeremyRubin thanks for asking! The key here is the added lock_stack.pop_back(), not the copying. The copying is needed because without it the prints from potential_deadlock_detected() would not be complete (because we already removed one element). There is no race because that code is operating under lockdata.dd_mutex.

    Let me elaborate with an example:

    • Thread t1 acquires lockA
    • Thread t2 acquires lockB
    • Thread t1 tries to acquire lockB
      • EnterCritical() adds lockB to the lock stack of t1 and succeeds
      • The lock stacks are: t1: lockA, lockB, t2: lockB
      • Now t1 hangs inside the OS mutex lock() function, waiting for lockB to be released.
    • Thread t2 tries to acquire lockA
      • EnterCritical() adds lockA to the lock stack of t2 and it is really upset because some threads first (try to) acquire lockA and then lockB and some other threads try in reverse order. It is going to either abort() or throw, lets assume it is going to throw.
      • The lock stacks are: t1: lockA, lockB, t2: lockB, lockA
      • EnterCritical() prints both lock stacks and throws.
      • The caller (in t2) catches the exception and continues executing (this is only happening from tests).

    Notice that at the end t2’s lock stack is lockB, lockA but it actually only owns lockB – this is the problem being fixed by this PR. This only matters in tests which catch the exception and continue executing. The other code ends up with abort() and the whole program terminates immediately.

  18. in src/sync.cpp:154 in f88222a9a1 outdated
    142@@ -143,14 +143,17 @@ static void push_lock(void* c, const CLockLocation& locklocation)
    143             break;
    144 
    145         const LockPair p1 = std::make_pair(i.first, c);
    146+        const LockPair p2 = std::make_pair(c, i.first);
    147+        if (lockdata.lockorders.count(p2)) {
    


    MarcoFalke commented at 6:12 am on July 31, 2020:
    Any reason this is now being done before if (lockdata.lockorders.count(p1))continue?

    hebasto commented at 1:48 pm on August 2, 2020:
  19. in src/sync.h:74 in f88222a9a1 outdated
    71@@ -71,6 +72,7 @@ template <typename MutexType>
    72 void static inline AssertLockHeldInternal(const char* pszName, const char* pszFile, int nLine, MutexType* cs) ASSERT_EXCLUSIVE_LOCK(cs) {}
    73 void static inline AssertLockNotHeldInternal(const char* pszName, const char* pszFile, int nLine, void* cs) {}
    74 void static inline DeleteLock(void* cs) {}
    


    MarcoFalke commented at 6:12 am on July 31, 2020:
    0void inline DeleteLock(void* cs) {}
    

    Could remove the confusing static’s here?


    hebasto commented at 1:48 pm on August 2, 2020:
  20. MarcoFalke approved
  21. MarcoFalke commented at 6:13 am on July 31, 2020: member

    ACK f88222a9a1b218b85c442654ee50ddce290484b6

    just some questions. Feel free to ignore.

  22. hebasto force-pushed on Aug 2, 2020
  23. Preserve initial state if push_lock() throws exception 1f96be25b0
  24. test: Repeat deadlock tests 42b2a95373
  25. test: Add LockStackEmpty() 63e9e40b73
  26. hebasto force-pushed on Aug 2, 2020
  27. hebasto commented at 1:47 pm on August 2, 2020: member

    Updated f88222a9a1b218b85c442654ee50ddce290484b6 -> cab80f4578e5c9998206b3a8d3cc1019d0841bba (pr19340.03 -> pr19340.04, diff):

    Any reason this is now being done before if (lockdata.lockorders.count(p1))continue?

    Could remove the confusing static’s here?


    Rebased cab80f4578e5c9998206b3a8d3cc1019d0841bba -> 63e9e40b73507b0c9361fc8728d4e97fd72c9ec9 (pr19340.04 -> pr19340.05) to fix Travis “lint” job errors.

  28. vasild approved
  29. vasild commented at 11:24 am on August 4, 2020: member
    ACK 63e9e40
  30. MarcoFalke commented at 2:59 pm on August 4, 2020: member
    re-ACK 63e9e40b73
  31. MarcoFalke merged this on Aug 4, 2020
  32. MarcoFalke closed this on Aug 4, 2020

  33. hebasto deleted the branch on Aug 4, 2020
  34. sidhujag referenced this in commit 6f9d723560 on Aug 4, 2020
  35. deadalnix referenced this in commit 100e3ff738 on Sep 7, 2021
  36. kittywhiskers referenced this in commit da19a4c732 on Oct 8, 2021
  37. kittywhiskers referenced this in commit 4f35330c45 on Oct 12, 2021
  38. PastaPastaPasta referenced this in commit 790a4f9d60 on Oct 12, 2021
  39. PastaPastaPasta referenced this in commit 3264e26813 on Oct 20, 2021
  40. pravblockc referenced this in commit 3f992e344f on Nov 18, 2021
  41. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me