test: Fix inconsistent lock order in wallet_tests/CreateWallet #19982

pull hebasto wants to merge 3 commits into bitcoin:master from hebasto:200920-leave-cs changing 7 files +75 −23
  1. hebasto commented at 9:10 am on September 20, 2020: member

    This PR:

    • fixes #19049 that was caused by #16426
    • removes wallet_tests::CreateWallet suppression from the test/sanitizer_suppressions/tsan

    The example of the improved CheckLastCritical()/LEAVE_CRITICAL_SECTION() log (could be got when compiled without the last commit):

    02020-09-20T08:34:28.429485Z [test] INCONSISTENT LOCK ORDER DETECTED
    12020-09-20T08:34:28.429493Z [test] Current lock order (least recent first) is:
    22020-09-20T08:34:28.429501Z [test]  'walletInstance->cs_wallet' in wallet/wallet.cpp:4007 (in thread 'test')
    32020-09-20T08:34:28.429508Z [test]  'cs_wallets' in wallet/wallet.cpp:4089 (in thread 'test')
    

    Currently, there are other “naked” LEAVE_CRITICAL_SECTION() in the code base: https://github.com/bitcoin/bitcoin/blob/b99a1633b270e0e89479b2bb2ae19a8a8dc0fa05/src/rpc/mining.cpp#L698

    https://github.com/bitcoin/bitcoin/blob/b99a1633b270e0e89479b2bb2ae19a8a8dc0fa05/src/checkqueue.h#L208

  2. hebasto force-pushed on Sep 20, 2020
  3. hebasto force-pushed on Sep 20, 2020
  4. hebasto marked this as ready for review on Sep 20, 2020
  5. DrahtBot added the label Wallet on Sep 20, 2020
  6. hebasto force-pushed on Sep 20, 2020
  7. hebasto force-pushed on Sep 20, 2020
  8. DrahtBot commented at 6:46 pm on September 20, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    No conflicts as of last run.

  9. promag commented at 8:59 am on September 21, 2020: member

    This change reveals a bug in the wallet_tests/CreateWalletFromFile test, that will be fixed in the following commit. @hebasto I’m not following, how is it revealed?

  10. hebasto commented at 9:18 am on September 21, 2020: member

    @promag

    This change reveals a bug in the wallet_tests/CreateWalletFromFile test, that will be fixed in the following commit.

    @hebasto I’m not following, how is it revealed?

     0$ git checkout 7ac14ec58740226af85401f879f37f09755b990b
     1$ ./autogen.sh && ./configure --enable-debug && make clean && make
     2$ src/test/test_bitcoin --run_test=wallet_tests/CreateWalletFromFile
     3Running 1 test case...
     4wallet/test/wallet_tests.cpp:786 wallet->wallet()->cs_wallet was not most recent critical section locked, details in debug log.
     5unknown location(0): fatal error: in "wallet_tests/CreateWalletFromFile": signal: SIGABRT (application abort requested)
     6wallet/test/wallet_tests.cpp(785): last checkpoint
     7
     8*** 1 failure is detected in the test module "Bitcoin Core Test Suite"
     9^C
    10$ tail -4 '/tmp/test_common_Bitcoin Core/03a41ec8724b4ea8adc603b0dc139290a6f9c765607df0bc277823d7b0165c20/regtest/debug.log'
    112020-09-21T09:16:36.739551Z [test] INCONSISTENT LOCK ORDER DETECTED
    122020-09-21T09:16:36.739557Z [test] Current lock order (least recent first) is:
    132020-09-21T09:16:36.739562Z [test]  'walletInstance->cs_wallet' in wallet/wallet.cpp:4007 (in thread 'test')
    142020-09-21T09:16:36.739568Z [test]  'cs_wallets' in wallet/wallet.cpp:4089 (in thread 'test')
    
  11. promag commented at 9:25 am on September 21, 2020: member
    Thanks @hebasto, I tried make check instead. I’d suggest to add [skip ci] to that commit body
  12. hebasto commented at 9:43 am on September 21, 2020: member

    @promag

    Thanks @hebasto, I tried make check instead.

     0$ make check
     1...
     22020-09-21T09:41:56.850457Z [test] INCONSISTENT LOCK ORDER DETECTED
     32020-09-21T09:41:56.850463Z [test] Current lock order (least recent first) is:
     42020-09-21T09:41:56.850470Z [test]  'walletInstance->cs_wallet' in wallet/wallet.cpp:4007 (in thread 'test')
     52020-09-21T09:41:56.850475Z [test]  'cs_wallets' in wallet/wallet.cpp:4089 (in thread 'test')
     6wallet/test/wallet_tests.cpp:786 wallet->wallet()->cs_wallet was not most recent critical section locked, details in debug log.
     7make[3]: *** [Makefile:20887: wallet/test/wallet_tests.cpp.test] Error 1
     8make[3]: Leaving directory '/home/hebasto/guix/GitHub/bitcoin/src'
     9make[2]: *** [Makefile:19327: check-am] Error 2
    10make[2]: Leaving directory '/home/hebasto/guix/GitHub/bitcoin/src'
    11make[1]: *** [Makefile:19012: check-recursive] Error 1
    12make[1]: Leaving directory '/home/hebasto/guix/GitHub/bitcoin/src'
    13make: *** [Makefile:798: check-recursive] Error 1
    
  13. hebasto force-pushed on Sep 21, 2020
  14. hebasto commented at 9:53 am on September 21, 2020: member

    Updated 283ddf104573fcf2ae648147e5ecf24c776246df -> d89c332c597a1129aa9fd3293bd37e7ab90171eb (pr19982.03 -> pr19982.04, diff):

    I’d suggest to add [skip ci] to that commit body

  15. promag commented at 10:08 am on September 21, 2020: member
    Oh I did’t have --enable-debug.
  16. in src/wallet/test/wallet_tests.cpp:773 in d89c332c59 outdated
    784             m_coinbase_txns.push_back(CreateAndProcessBlock({}, GetScriptForRawPubKey(coinbaseKey.GetPubKey())).vtx[0]);
    785             block_tx = TestSimpleSpend(*m_coinbase_txns[2], 0, coinbaseKey, GetScriptForRawPubKey(key.GetPubKey()));
    786             m_coinbase_txns.push_back(CreateAndProcessBlock({block_tx}, GetScriptForRawPubKey(coinbaseKey.GetPubKey())).vtx[0]);
    787             mempool_tx = TestSimpleSpend(*m_coinbase_txns[3], 0, coinbaseKey, GetScriptForRawPubKey(key.GetPubKey()));
    788             BOOST_CHECK(chain->broadcastTransaction(MakeTransactionRef(mempool_tx), DEFAULT_TRANSACTION_MAXFEE, false, error));
    789+            LEAVE_CRITICAL_SECTION(cs_wallets);
    


    promag commented at 10:18 am on September 21, 2020:

    d89c332c597a1129aa9fd3293bd37e7ab90171eb

    Can’t we refactor the test to remove these (LEAVE/ENTER)CRITICAL_SECTION? Looks like this block could go after TestLoadWallet?


    hebasto commented at 10:32 am on September 21, 2020:
    I don’t think so as HandleLoadWallet() puts a lambda closure into the g_load_wallet_fns which in turn is used by CWallet::Create(): https://github.com/bitcoin/bitcoin/blob/b99a1633b270e0e89479b2bb2ae19a8a8dc0fa05/src/wallet/wallet.cpp#L4090-L4092
  17. in src/sync.cpp:252 in 422d35f421 outdated
    213+    }
    214+    if (g_debug_lockorder_abort) {
    215+        tfm::format(std::cerr, "%s:%s %s was not most recent critical section locked, details in debug log.\n", file, line, guardname);
    216+        abort();
    217+    }
    218+    throw std::logic_error(strprintf("%s was not most recent critical section locked", guardname));
    


    ariard commented at 3:55 pm on September 23, 2020:
    If this throw is only exercised in reverselock_tests you can comment it’s a testing only code path.

    hebasto commented at 11:42 am on October 5, 2020:
    Also it can throw in “naked” LEAVE_CRITICAL_SECTION that we have in the code base.
  18. in src/test/sync_tests.cpp:92 in ac6ead4d99 outdated
    87+    // The second test ensures that lock tracking data have not been broken by exception.
    88+    TestInconsistentLockOrderDetected(rmutex1, rmutex2);
    89+
    90+    Mutex mutex1, mutex2;
    91+    TestInconsistentLockOrderDetected(mutex1, mutex2);
    92+    // The second test ensures that lock tracking data have not been broken by exception.
    


    ariard commented at 4:09 pm on September 23, 2020:

    If I understand correctly, the CheckLastCritical call added by this commit in LEAVE_CRITICAL_SECTION should detect any inconsistent lock order without any modification on the lock stack.

    Maybe “By checking lock order consistency (CheckLastCritical) before any unlocking (LeaveCritical) the lock tracking data must not have been broken by exception”


    hebasto commented at 11:51 am on October 5, 2020:
    I’ll address this comment if the reviewers request any other changes.

    hebasto commented at 8:01 pm on November 14, 2020:
  19. in src/wallet/test/wallet_tests.cpp:777 in d89c332c59 outdated
    788             BOOST_CHECK(chain->broadcastTransaction(MakeTransactionRef(mempool_tx), DEFAULT_TRANSACTION_MAXFEE, false, error));
    789+            LEAVE_CRITICAL_SECTION(cs_wallets);
    790             LEAVE_CRITICAL_SECTION(wallet->wallet()->cs_wallet);
    791             SyncWithValidationInterfaceQueue();
    792             ENTER_CRITICAL_SECTION(wallet->wallet()->cs_wallet);
    793+            ENTER_CRITICAL_SECTION(cs_wallets);
    


    ariard commented at 4:58 pm on September 23, 2020:

    Okay I looked on, before this commit, behavior is :

    • TestLoadWallet() in src/wallet/test/wallet_tests.cpp L794.
    • Calling down CWallet::Create() which takes the cs_wallet lock in src/wallet/wallet.cpp L4007 then cs_wallets lock in src/wallet/wallet.cpp L4089
    • Calling down this handler, which attempt to leave the former first triggering a lock order inconsistency

    For tracking the faultive commit is 6a72f26.

  20. ariard commented at 5:05 pm on September 23, 2020: member

    Code Review ACK d89c332, feel free to take comments, thanks for fixing this.

    AFAICT this deadlock wasn’t catch by CI at #16426 merge because LEAVE_CRITICAL didn’t call neither potential_deadlock_detected or CheckLastCritical. Otherwise if a new lock would have been taken in the rest of CreateWalletFromFile it should have been detected.

    So this PR does improve deadlock detection by flagging any lock order inconsistency even if it didn’t materialize in an effective one as this case sounds to suggest ?

  21. ryanofsky approved
  22. ryanofsky commented at 8:24 pm on October 27, 2020: member
    Code review ACK d89c332c597a1129aa9fd3293bd37e7ab90171eb. Impressive work by hebasto not only debugging and fixing the buggy wallet test, but also improving deadlock detection, and improving deadlock detection output, and writing new deadlock detection tests! @hebasto if you feel like adding more information to #19049, I’d be curious to know more you debugged this, since I gave up there struggling to figure out which locks were involved
  23. ryanofsky commented at 8:31 pm on October 27, 2020: member

    Note to anyone else interested in reviewing this that the change is simpler than it probably looks.

    The actual bugfix in the third commit is very straightforward and just two lines. The first two commits are just improvements to error handling and debug output in lock order code, ignoring a big whitespace diff in the first commit.

  24. hebasto commented at 8:50 pm on October 27, 2020: member

    @ryanofsky

    @hebasto if you feel like adding more information to #19049, I’d be curious to know more you debugged this, since I gave up there struggling to figure out which locks were involved

    Our own lock stack being printed at some points appears very useful for debugging.

  25. DrahtBot added the label Needs rebase on Nov 12, 2020
  26. hebasto force-pushed on Nov 14, 2020
  27. hebasto commented at 7:17 pm on November 14, 2020: member

    Rebased d89c332c597a1129aa9fd3293bd37e7ab90171eb -> e2dc75f88fed8a4c0ca4baab485715001dce03d0 (pr19982.04 -> pr19982.05) due to the conflict with #20285.

    Also updated comments as requested by @ariard.

  28. DrahtBot removed the label Needs rebase on Nov 14, 2020
  29. hebasto force-pushed on Nov 15, 2020
  30. DrahtBot added the label Needs rebase on Nov 25, 2020
  31. hebasto renamed this:
    test: Fix inconsistent lock order in wallet_tests/CreateWalletFromFile
    test: Fix inconsistent lock order in wallet_tests/CreateWallet
    on Nov 28, 2020
  32. hebasto force-pushed on Nov 28, 2020
  33. hebasto commented at 8:16 pm on November 28, 2020: member
    Rebased d15646b4accabc2129fb4fa7049fb09c8def1c56 -> f2933e7a8636df9d428f89afb7208d35bb4c4041 (pr19982.06 -> pr19982.07) due to the conflict with #19337.
  34. DrahtBot removed the label Needs rebase on Nov 28, 2020
  35. ryanofsky approved
  36. ryanofsky commented at 9:58 pm on December 2, 2020: member
    Code review ACK f2933e7a8636df9d428f89afb7208d35bb4c4041. Since last review just rebase updating test name and tweaking comments
  37. ryanofsky commented at 10:03 pm on December 2, 2020: member

    Notes that might be helpful to other reviewers: #19982 (comment).

    The actual fix is very simple and contained in the third commit. The other two commits are error checking improvements.

  38. fanquake requested review from ajtowns on Dec 3, 2020
  39. fanquake requested review from ariard on Dec 3, 2020
  40. fanquake requested review from vasild on Dec 3, 2020
  41. in src/test/reverselock_tests.cpp:63 in 18dc61229b outdated
    59-    } catch(...) { }
    60+    } catch(const std::logic_error& e) {
    61+        BOOST_CHECK_EQUAL(e.what(), "lock2 was not most recent critical section locked");
    62+        error_thrown = true;
    63+    }
    64+    BOOST_CHECK(error_thrown);
    


    vasild commented at 7:16 am on December 3, 2020:

    BOOST_CHECK_EXCEPTION() can help simplify this:

    0    BOOST_CHECK_EXCEPTION(REVERSE_LOCK(lock2), std::logic_error, HasReason("lock2 was not most recent critical section locked"));
    

    hebasto commented at 8:50 am on December 3, 2020:
    Thanks! Updated.
  42. in src/test/sync_tests.cpp:81 in 93e77f66dd outdated
    76+        LEAVE_CRITICAL_SECTION(mutex1);
    77+    } catch (const std::logic_error& e) {
    78+        BOOST_CHECK_EQUAL(e.what(), "mutex1 was not most recent critical section locked");
    79+        error_thrown = true;
    80+    }
    81+    BOOST_CHECK(error_thrown);
    


    vasild commented at 7:20 am on December 3, 2020:

    ditto:

    0    BOOST_CHECK_EXCEPTION(LEAVE_CRITICAL_SECTION(mutex1), std::logic_error, HasReason("mutex1 was not most recent critical section locked"));
    

    hebasto commented at 8:50 am on December 3, 2020:
  43. vasild approved
  44. vasild commented at 7:39 am on December 3, 2020: member

    ACK f2933e7a

    0lock m1
    1lock m2
    2unlock m1
    3unlock m2
    

    the above was allowed before this PR and is forbidden after the PR. Such code cannot cause deadlock or any other issue and is legit. However it looks strange, may be a signal of a problem elsewhere and better be avoided. In our code in particular it would also brick the debugging mechanisms because LeaveCritical() from unlock m1 would remove the wrong lock from the lock stack (m2).

    I think it is ok to enforce “locks should be released in reverse order in which they were acquired”

  45. hebasto force-pushed on Dec 3, 2020
  46. hebasto commented at 8:50 am on December 3, 2020: member

    Updated f2933e7a8636df9d428f89afb7208d35bb4c4041 -> 608e14a2b69665b33df046cb96c74fefa2613969 (pr19982.07 -> pr19982.08, diff):

  47. practicalswift commented at 11:28 am on December 3, 2020: contributor

    Concept ACK

    Nice work!

  48. promag commented at 12:39 pm on December 3, 2020: member

    Tested ACK 608e14a2b69665b33df046cb96c74fefa2613969.

    As expected make check fails on 79bfd2e27280eb0ad794b1f3389f4b00b38376b3 with ./configure --enable-debug.

  49. vasild commented at 2:31 pm on December 3, 2020: member
    ACK 608e14a
  50. DrahtBot added the label Needs rebase on Dec 8, 2020
  51. sync: Improve CheckLastCritical()
    This commit adds actual lock stack logging if check fails.
    c5e3e74f70
  52. [skip ci] sync: Check precondition in LEAVE_CRITICAL_SECTION() macro
    This change reveals a bug in the wallet_tests/CreateWalletFromFile test,
    that will be fixed in the following commit.
    cb23fe01c1
  53. test: Fix inconsistent lock order in wallet_tests/CreateWallet e1e68b6305
  54. hebasto force-pushed on Dec 10, 2020
  55. hebasto commented at 6:56 pm on December 10, 2020: member
    Rebased 608e14a2b69665b33df046cb96c74fefa2613969 -> e1e68b6305beb47ebf7ee48f14e12fdebdfea1ef (pr19982.08 -> pr19982.09) due to the conflict with #19425.
  56. DrahtBot removed the label Needs rebase on Dec 10, 2020
  57. MarcoFalke commented at 7:21 pm on December 10, 2020: member

    review ACK e1e68b6305beb47ebf7ee48f14e12fdebdfea1ef 💂

    Signature:

     0-----BEGIN PGP SIGNED MESSAGE-----
     1Hash: SHA512
     2
     3review ACK e1e68b6305beb47ebf7ee48f14e12fdebdfea1ef 💂
     4-----BEGIN PGP SIGNATURE-----
     5
     6iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
     7pUjlrAv/QIaj8D90XXyl/5Sw/WeG+cGB2diICRF1Pbv4qvEvG7sr+ETsScHeJvK2
     8TVlXBWujZcYo50jpPMl9j8XcnLR9MfnT0GYTwOi+z+fcXmi4NULQRqx6JrNQS0Cb
     9cZQXp/2BNo1cMBEKmUmfnG221SzZ+rKuhiJXBte3JprpIhQyJTMvSAwW1yoVwwfJ
    10P4OQp0gvW+kOt1+Gy3DK7u9DLN848blddAvwqKaH276oemN92iClv7O9YUJAs4qS
    11gWLnyFGOMhUICJVOKYzneSAWBM+xRC/Ujx1pE1V18uv+F9lqjZIS95V2OLEHu8CQ
    12OBRHx1LqruQOHvpS/SC3317KCv68QJtCuiz254G5zxGFsMBAl9wGY2XZjBYdiazD
    137GTUVSXzVwa474uVJRPVnuLvKLF0IgqDGrNQo2t/x1kW+8qJ23UpuKOhJ+1nOReG
    146la8BU2ng9UKcvlwJ6PqFN/0A/GlLeNzfGPGxT1OzRtyu/4wrFeDvPxvLd5jXyJx
    15Ea6eO6Vm
    16=A3Z6
    17-----END PGP SIGNATURE-----
    

    Timestamp of file with hash b9199fb93572c3373d95d9880b18d20bf1295e74f013c4bf476dda6306523525 -

  58. ryanofsky approved
  59. ryanofsky commented at 7:38 pm on December 10, 2020: member
    Code review ACK e1e68b6305beb47ebf7ee48f14e12fdebdfea1ef. Just trivial rebase and suggested switch to BOOST_CHECK_EXCEPTION since last review
  60. vasild approved
  61. vasild commented at 8:13 pm on December 10, 2020: member
    ACK e1e68b630
  62. fanquake merged this on Dec 11, 2020
  63. fanquake closed this on Dec 11, 2020

  64. sidhujag referenced this in commit 9d8bc3e82d on Dec 11, 2020
  65. hebasto deleted the branch on Dec 11, 2020
  66. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me