scheduler: crash after releasing wallet #16307

issue fanquake openend this issue on June 29, 2019
  1. fanquake commented at 5:16 am on June 29, 2019: member

    I’ve seen this crash multiple times. It always seems to occur after closing a wallet from the gui. Have confirmed it still occurs using master (0f309541aa6dee91ac1cf6058c1ab5f51447b3c1). This has been split out of #16027.

    debug.log:

     02019-05-20T09:16:13Z UpdateTip: new best=00000000000000000015161d9def3e63c1907e8804d13c9c55d3a7886b59b3c3 height=576299 version=0x20000000 log2_work=90.642833 tx=413927861 date='2019-05-16T12:25:50Z' progress=0.997033 cache=21.0MiB(159346txo)
     12019-05-20T09:16:14Z UpdateTip: new best=0000000000000000002394c1e3c0d81c7c9c63920c77fbb4484e9dc02352137b height=576300 version=0x2000e000 log2_work=90.642854 tx=413930977 date='2019-05-16T12:28:52Z' progress=0.997035 cache=21.9MiB(166320txo)
     22019-05-20T09:16:14Z [default wallet] Releasing wallet
     32019-05-20T09:16:14Z Pre-allocating up to position 0xb00000 in rev01640.dat
     42019-05-20T09:16:14Z UpdateTip: new best=000000000000000000181dc5e64237d3394afa8163d0c1d29184b4e52e6f567c height=576301 version=0x20000000 log2_work=90.642876 tx=413934113 date='2019-05-16T12:39:48Z' progress=0.997040 cache=22.7MiB(173339txo)
     52019-05-20T09:16:14Z New outbound peer connected: version: 70015, blocks=576912, peer=2
     62019-05-20T09:16:14Z Pre-allocating up to position 0x6000000 in blk01640.dat
     72019-05-20T09:16:15Z UpdateTip: new best=00000000000000000007af1fa44f3ce1f3afc4407b08076c44536396954c1766 height=576302 version=0x20000000 log2_work=90.642897 tx=413936803 date='2019-05-16T12:41:49Z' progress=0.997042 cache=23.7MiB(180794txo)
     82019-05-20T09:16:16Z UpdateTip: new best=000000000000000000033acc17d6a30317c0ef56a47fd398d67eefe4bc16cf58 height=576303 version=0x20c00000 log2_work=90.642919 tx=413939413 date='2019-05-16T12:59:49Z' progress=0.997051 cache=24.6MiB(188220txo)
     92019-05-20T09:16:16Z 
    10
    11************************
    12EXCEPTION: NSt3__112system_errorE       
    13recursive_mutex lock failed: Invalid argument       
    14bitcoin in scheduler 
    15
    162019-06-27T05:15:35Z UpdateTip: new best=000000000000050ea601ed98623e384481b8e334c90b566dddd790cd1c6d3fc0 height=188601 version=0x00000001 log2_work=68.380951 tx=4868711 date='2012-07-11T17:55:23Z' progress=0.011452 cache=275.7MiB(2041444txo)
    172019-06-27T05:15:35Z [test] Releasing wallet
    182019-06-27T05:15:35Z UpdateTip: new best=00000000000007154af859c8aff075f5c1a0a4d76a89c6968441169cdcd56323 height=188602 version=0x00000001 log2_work=68.380979 tx=4868862 date='2012-07-11T18:15:20Z' progress=0.011452 cache=275.7MiB(2041494txo)
    192019-06-27T05:15:35Z UpdateTip: new best=00000000000005e9e4e20d35657073899bc0ce3f0604f4438733144d4693253c height=188603 version=0x00000001 log2_work=68.381007 tx=4868985 date='2012-07-11T18:17:03Z' progress=0.011452 cache=275.7MiB(2041508txo)
    202019-06-27T05:15:35Z UpdateTip: new best=00000000000008942df84dd2a8989c60b3be8735d1cfa61d1f299354a7dde974 height=188604 version=0x00000001 log2_work=68.381036 tx=4869147 date='2012-07-11T18:26:03Z' progress=0.011453 cache=275.8MiB(2041621txo)
    212019-06-27T05:15:35Z UpdateTip: new best=00000000000004175c1c03262a976d87f88ac4aad54e349eb4505b5194223acd height=188605 version=0x00000001 log2_work=68.381064 tx=4869187 date='2012-07-11T18:33:03Z' progress=0.011453 cache=275.8MiB(2041625txo)
    222019-06-27T05:15:35Z UpdateTip: new best=000000000000043d0bfe2bdf8216c9b98aa43ce5d1d3b9abb7bbe0b1db0c1be4 height=188606 version=0x00000001 log2_work=68.381092 tx=4869729 date='2012-07-11T18:40:54Z' progress=0.011454 cache=275.7MiB(2041603txo)
    232019-06-27T05:15:35Z UpdateTip: new best=00000000000007b3e12deb141b82db69845013083b2523ffa1b9da66a06191c5 height=188607 version=0x00000001 log2_work=68.38112 tx=4870331 date='2012-07-11T19:23:04Z' progress=0.011456 cache=275.8MiB(2041976txo)
    242019-06-27T05:15:35Z UpdateTip: new best=00000000000007be6fe72369eb95c3431c97277616f91348aa7a3b5dd558ea48 height=188608 version=0x00000001 log2_work=68.381149 tx=4870341 date='2012-07-11T19:33:29Z' progress=0.011456 cache=275.8MiB(2041979txo)
    252019-06-27T05:15:35Z UpdateTip: new best=000000000000014686e1934331d8545c986f2bd2f3eea673a7664194e419cd4d height=188609 version=0x00000001 log2_work=68.381177 tx=4871114 date='2012-07-11T19:38:03Z' progress=0.011457 cache=275.9MiB(2042711txo)
    262019-06-27T05:15:35Z UpdateTip: new best=00000000000002662763d987286539ec06092398e9cb7adadf963ff543c7ae92 height=188610 version=0x00000001 log2_work=68.381205 tx=4871692 date='2012-07-11T19:47:31Z' progress=0.011459 cache=275.9MiB(2042914txo)
    272019-06-27T05:15:35Z 
    28
    29************************
    30EXCEPTION: NSt3__112system_errorE       
    31recursive_mutex lock failed: Invalid argument       
    32bitcoin in scheduler 
    
     0lldb Bitcoin-Qt.app
     1(lldb) target create "Bitcoin-Qt.app"
     2Current executable set to 'Bitcoin-Qt.app' (x86_64).
     3(lldb) env QT_FATAL_WARNINGS=1 
     4(lldb) run
     5Process 8732 launched: '/Users/michael/github/bitcoin/Bitcoin-Qt.app/Contents/MacOS/Bitcoin-Qt' (x86_64)
     62019-06-29 13:02:05.709815+0800 Bitcoin-Qt[8732:1733426] MessageTracer: Falling back to default whitelist
     7Process 8732 stopped
     8* thread [#19](/bitcoin-bitcoin/19/), name = 'bitcoin-scheduler', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
     9    frame [#0](/bitcoin-bitcoin/0/): 0x00000001001b2958 Bitcoin-Qt`___lldb_unnamed_symbol14364$$Bitcoin-Qt + 40
    10Bitcoin-Qt`___lldb_unnamed_symbol14364$$Bitcoin-Qt:
    11->  0x1001b2958 <+40>: movq   (%rbx), %rcx
    12    0x1001b295b <+43>: movq   0x20(%rcx), %rcx
    13    0x1001b295f <+47>: movq   0xc4072a(%rip), %rdx      ; (void *)0x00007fff8fc15070: __stack_chk_guard
    14    0x1001b2966 <+54>: movq   (%rdx), %rdx
    15Target 0: (Bitcoin-Qt) stopped.
    
     0* thread [#19](/bitcoin-bitcoin/19/), name = 'bitcoin-scheduler', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
     1  * frame [#0](/bitcoin-bitcoin/0/): 0x00000001001b2958 Bitcoin-Qt`___lldb_unnamed_symbol14364$$Bitcoin-Qt + 40
     2    frame [#1](/bitcoin-bitcoin/1/): 0x00000001005139ca Bitcoin-Qt`___lldb_unnamed_symbol48305$$Bitcoin-Qt + 106
     3    frame [#2](/bitcoin-bitcoin/2/): 0x0000000100513783 Bitcoin-Qt`boost::detail::function::void_function_obj_invoker3<std::__1::__bind<void (CValidationInterface::*)(std::__1::shared_ptr<CBlock const> const&, CBlockIndex const*, std::__1::vector<std::__1::shared_ptr<CTransaction const>, std::__1::allocator<std::__1::shared_ptr<CTransaction const> > > const&), CValidationInterface*&, std::__1::placeholders::__ph<1> const&, std::__1::placeholders::__ph<2> const&, std::__1::placeholders::__ph<3> const&>, void, std::__1::shared_ptr<CBlock const> const&, CBlockIndex const*, std::__1::vector<std::__1::shared_ptr<CTransaction const>, std::__1::allocator<std::__1::shared_ptr<CTransaction const> > > const&>::invoke(boost::detail::function::function_buffer&, std::__1::shared_ptr<CBlock const> const&, CBlockIndex const*, std::__1::vector<std::__1::shared_ptr<CTransaction const>, std::__1::allocator<std::__1::shared_ptr<CTransaction const> > > const&) + 35
     4    frame [#3](/bitcoin-bitcoin/3/): 0x00000001005301a1 Bitcoin-Qt`___lldb_unnamed_symbol49499$$Bitcoin-Qt + 65
     5    frame [#4](/bitcoin-bitcoin/4/): 0x00000001005300d9 Bitcoin-Qt`___lldb_unnamed_symbol49498$$Bitcoin-Qt + 73
     6    frame [#5](/bitcoin-bitcoin/5/): 0x000000010052feaf Bitcoin-Qt`___lldb_unnamed_symbol49491$$Bitcoin-Qt + 79
     7    frame [#6](/bitcoin-bitcoin/6/): 0x000000010052fb48 Bitcoin-Qt`___lldb_unnamed_symbol49481$$Bitcoin-Qt + 72
     8    frame [#7](/bitcoin-bitcoin/7/): 0x000000010052f97b Bitcoin-Qt`___lldb_unnamed_symbol49477$$Bitcoin-Qt + 539
     9    frame [#8](/bitcoin-bitcoin/8/): 0x000000010081bee0 Bitcoin-Qt`___lldb_unnamed_symbol79231$$Bitcoin-Qt + 208
    10    frame [#9](/bitcoin-bitcoin/9/): 0x000000010081b32a Bitcoin-Qt`___lldb_unnamed_symbol79209$$Bitcoin-Qt + 378
    11    frame [#10](/bitcoin-bitcoin/10/): 0x00000001001d8a2b Bitcoin-Qt`void TraceThread<std::__1::function<void ()> >(char const*, std::__1::function<void ()>) + 91
    12    frame [#11](/bitcoin-bitcoin/11/): 0x00000001001fb894 Bitcoin-Qt`___lldb_unnamed_symbol17375$$Bitcoin-Qt + 84
    13    frame [#12](/bitcoin-bitcoin/12/): 0x0000000101090d78 libboost_thread-mt.dylib`boost::(anonymous namespace)::thread_proxy(void*) + 120
    14    frame [#13](/bitcoin-bitcoin/13/): 0x00007fff594a52eb libsystem_pthread.dylib`_pthread_body + 126
    15    frame [#14](/bitcoin-bitcoin/14/): 0x00007fff594a8249 libsystem_pthread.dylib`_pthread_start + 66
    16    frame [#15](/bitcoin-bitcoin/15/): 0x00007fff594a440d libsystem_pthread.dylib`thread_start + 13
    
  2. fanquake commented at 4:15 am on August 8, 2019: member

    I’ve put together an AppleScript that can reproduce this issue fairly easily. It only requires a syncing Bitcoin-Core and creating a wallet named “test”.

     0tell application "System Events" to tell process "Bitcoin Core"
     1	-- sane window location and size
     2	set position of window 1 to {0, 20}
     3	set size of window 1 to {500, 500}
     4	set frontmost to true
     5	
     6	repeat
     7		click menu item "Close Wallet..." of menu "File" of menu bar 1
     8		click button "Yes" of group 1 of window 1
     9		delay 0.2
    10		click menu item "Close Wallet..." of menu "File" of menu bar 1
    11		click button "Yes" of group 1 of window 1
    12		delay 0.2
    13		click menu item "[default wallet]" of menu "Open Wallet" of menu item "Open Wallet" of menu "File" of menu bar 1
    14		delay 0.2
    15		click menu item "test" of menu "Open Wallet" of menu item "Open Wallet" of menu "File" of menu bar 1
    16		delay 0.2
    17	end repeat
    18end tell
    

    cc @promag.

  3. fanquake referenced this in commit d0d5b4eb34 on Aug 9, 2019
  4. fanquake commented at 8:00 am on February 21, 2020: member

    From what I can tell, I’ve tracked down the cause of this crash.

    A call to CWallet::BlockConnected while we are in wallet->Flush() phase of ReleaseWallet gives the scheduler the cs_wallet lock. We then loop calling SyncTransaction -> AddToWalletIfInvolvingMe which also calls AssertLockHeld(cs_wallet);. At some point we’ll be done flushing, and call delete wallet, which blows up as we are trying to delete cs_wallet which is still being held in the scheduler thread.

    You can cause this crash running src/bitcoind and a Python script that is essentially:

    0while 1:
    1    rpc.unloadwallet('test')
    2    rpc.unloadwallet('test2')
    3    rpc.loadwallet('test')
    4    rpc.loadwallet('test2')
    

    debug.log

     02020-02-21T06:47:47.281600Z [scheduler] [test] SCHED: bool CWallet::AddToWalletIfInvolvingMe(const CTransactionRef &, CWalletTx::Confirmation, bool)
     12020-02-21T06:47:47.281738Z [scheduler] [test] SCHED: bool CWallet::AddToWalletIfInvolvingMe(const CTransactionRef &, CWalletTx::Confirmation, bool)
     22020-02-21T06:47:47.281919Z [scheduler] [test] SCHED: bool CWallet::AddToWalletIfInvolvingMe(const CTransactionRef &, CWalletTx::Confirmation, bool)
     32020-02-21T06:47:47.289817Z [httpworker.3] BerkeleyEnvironment::Flush: wallet.dat closed
     42020-02-21T06:47:47.289912Z [msghand] received: inv (541 bytes) peer=4
     52020-02-21T06:47:47.290027Z [scheduler] [test] SCHED: bool CWallet::AddToWalletIfInvolvingMe(const CTransactionRef &, CWalletTx::Confirmation, bool)
     62020-02-21T06:47:47.290102Z [httpworker.3] BerkeleyEnvironment::Flush: Flush(false) took              43ms
     72020-02-21T06:47:47.290204Z [scheduler] [test] SCHED: bool CWallet::AddToWalletIfInvolvingMe(const CTransactionRef &, CWalletTx::Confirmation, bool)
     8Assertion failed: (e == 0), function ~recursive_mutex, file /BuildRoot/Library/Caches/com.apple.xbs/Sources/libcxx/libcxx-400.9.4/src/mutex.cpp, line 64.
     9Process 72347 stopped
    10* thread [#2](/bitcoin-bitcoin/2/), name = 'b-httpworker.3', stop reason = signal SIGABRT
    11    frame [#0](/bitcoin-bitcoin/0/): 0x00007fff5c4182c2 libsystem_kernel.dylib`__pthread_kill + 10
    12libsystem_kernel.dylib`__pthread_kill:
    13->  0x7fff5c4182c2 <+10>: jae    0x7fff5c4182cc            ; <+20>
    14    0x7fff5c4182c4 <+12>: movq   %rax, %rdi
    15    0x7fff5c4182c7 <+15>: jmp    0x7fff5c412453            ; cerror_nocancel
    16    0x7fff5c4182cc <+20>: retq   
    17Target 0: (bitcoind) stopped.
    

    httpworker (unloadwallet) thread

     0
     1(AnnotatedMixin<std::__1::recursive_mutex> *) this = 0x000000015b7ccf98
     2
     3(lldb) frame select 6
     4frame [#6](/bitcoin-bitcoin/6/): 0x00000001000aa6f3 bitcoind`AnnotatedMixin<std::__1::recursive_mutex>::~AnnotatedMixin(this=0x000000015b7ccf98) at sync.h:82:23
     5  79  	class LOCKABLE AnnotatedMixin : public PARENT
     6  80  	{
     7  81  	public:
     8-> 82  	    ~AnnotatedMixin() {
     9  83  	        DeleteLock((void*)this);
    10  84  	    }
    11  85  	
    12
    13(lldb) frame select 7
    14frame [#7](/bitcoin-bitcoin/7/): 0x00000001007d9ea0 bitcoind`CWallet::~CWallet(this=0x000000015b7cce20) at wallet.h:758:5
    15  755 	    {
    16  756 	        // Should not have slots connected at this point.
    17  757 	        assert(NotifyUnload.empty());
    18-> 758 	    }
    19  759 	
    20  760 	    bool IsCrypted() const;
    21  761 	    bool IsLocked() const override;
    22(lldb) frame select 8
    23frame [#8](/bitcoin-bitcoin/8/): 0x00000001007c4373 bitcoind`CWallet::~CWallet(this=0x000000015b7cce20) at wallet.h:755:5
    24  752 	    }
    25  753 	
    26  754 	    ~CWallet()
    27-> 755 	    {
    28  756 	        // Should not have slots connected at this point.
    29  757 	        assert(NotifyUnload.empty());
    30  758 	    }
    

    scheduler thread (10)

     0
     1// in frame 14
     2frame variable criticalblock27
     3(DebugLock<decltype(this->cs_wallet)>) criticalblock27 = {
     4 std::__1::unique_lock<std::__1::recursive_mutex> = {
     5   __m_ = 0x000000015b7ccf98
     6   __owns_ = true
     7 }
     8}
     9
    10
    11
    12(lldb) frame select 12
    13frame [#12](/bitcoin-bitcoin/12/): 0x00000001007a76b3 bitcoind`CWallet::AddToWalletIfInvolvingMe(this=0x000000015b7cce20, ptx=std::__1::shared_ptr<const CTransaction>::element_type @ 0x000000015c6c3908 strong=1 weak=1, confirm=Confirmation @ 0x0000700007d2ab70, fUpdate=true) at wallet.cpp:914:9
    14  911 	{
    15  912 	    const CTransaction& tx = *ptx;
    16  913 	    {
    17-> 914 	        WalletLogPrintf("SCHED: %s\n", __PRETTY_FUNCTION__);
    18  915 	        // This assertion fails?
    19  916 	        AssertLockHeld(cs_wallet);
    20  917
    21(lldb) frame select 13
    22frame [#13](/bitcoin-bitcoin/13/): 0x00000001007a91ca bitcoind`CWallet::SyncTransaction(this=0x000000015b7cce20, ptx=std::__1::shared_ptr<const CTransaction>::element_type @ 0x000000015c6c3908 strong=1 weak=1, confirm=Confirmation @ 0x0000700007d2ac10, update_tx=true) at wallet.cpp:1089:10
    23  1086	void CWallet::SyncTransaction(const CTransactionRef& ptx, CWalletTx::Confirmation confirm, bool update_tx)
    24  1087	{
    25  1088	    //WalletLogPrintf("SCHED: %s\n", __PRETTY_FUNCTION__);
    26-> 1089	    if (!AddToWalletIfInvolvingMe(ptx, confirm, update_tx))
    27  1090	        return; // Not one of ours
    28  1091	
    29  1092	    // If a transaction changes 'conflicted' state, that changes the balance
    30(lldb) frame variable cs_wallet
    31(RecursiveMutex) cs_wallet = {
    32 std::__1::recursive_mutex = {
    33   __m_ = (__sig = 1297437784, __opaque = "")
    34 }
    35}
    36(lldb) frame select 14
    37frame [#14](/bitcoin-bitcoin/14/): 0x00000001007a99f2 bitcoind`CWallet::BlockConnected(this=0x000000015b7cce20, block=0x000000015c765d28, vtxConflicted=size=0, height=351323) at wallet.cpp:1129:9
    38  1126	    m_last_block_processed = block_hash;
    39  1127	    for (size_t index = 0; index < block.vtx.size(); index++) {
    40  1128	        CWalletTx::Confirmation confirm(CWalletTx::Status::CONFIRMED, height, block_hash, index);
    41-> 1129	        SyncTransaction(block.vtx[index], confirm);
    42  1130	        TransactionRemovedFromMempool(block.vtx[index]);
    43  1131	    }
    44  1132	    for (const CTransactionRef& ptx : vtxConflicted) {
    45(lldb) frame variable cs_wallet
    46(RecursiveMutex) cs_wallet = {
    47 std::__1::recursive_mutex = {
    48   __m_ = (__sig = 1297437784, __opaque = "")
    49 }
    50}
    51(lldb) frame select 15
    52frame [#15](/bitcoin-bitcoin/15/): 0x00000001007a9bb2 bitcoind`non-virtual thunk to CWallet::BlockConnected(this=0x000000015b7cce20, block=0x000000015c765d28, vtxConflicted=size=0, height=351323) at wallet.cpp:0
    53  1   	// Copyright (c) 2009-2010 Satoshi Nakamoto
    54  2   	// Copyright (c) 2009-2020 The Bitcoin Core developers
    55  3   	// Distributed under the MIT software license, see the accompanying
    56  4   	// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    57  5   	
    58  6   	#include <wallet/wallet.h>
    59  7   	
    60(lldb) frame variable cs_wallet
    61error: no variable named 'cs_wallet' found in this frame
    

    httpworker (unloadwallet) bt

     0* thread [#2](/bitcoin-bitcoin/2/), name = 'b-httpworker.3', stop reason = signal SIGABRT
     1    frame [#0](/bitcoin-bitcoin/0/): 0x00007fff5c4182c2 libsystem_kernel.dylib`__pthread_kill + 10
     2    frame [#1](/bitcoin-bitcoin/1/): 0x00007fff5c4d3bf1 libsystem_pthread.dylib`pthread_kill + 284
     3    frame [#2](/bitcoin-bitcoin/2/): 0x00007fff5c3826a6 libsystem_c.dylib`abort + 127
     4    frame [#3](/bitcoin-bitcoin/3/): 0x00007fff5c34b20d libsystem_c.dylib`__assert_rtn + 324
     5    frame [#4](/bitcoin-bitcoin/4/): 0x00007fff59543092 libc++.1.dylib`std::__1::recursive_mutex::~recursive_mutex() + 46
     6    frame [#5](/bitcoin-bitcoin/5/): 0x00000001000f1edb bitcoind`AnnotatedMixin<std::__1::recursive_mutex>::~AnnotatedMixin(this=0x000000015b7ccf98) at sync.h:84:5
     7  * frame [#6](/bitcoin-bitcoin/6/): 0x00000001000aa6f3 bitcoind`AnnotatedMixin<std::__1::recursive_mutex>::~AnnotatedMixin(this=0x000000015b7ccf98) at sync.h:82:23
     8    frame [#7](/bitcoin-bitcoin/7/): 0x00000001007d9ea0 bitcoind`CWallet::~CWallet(this=0x000000015b7cce20) at wallet.h:758:5
     9    frame [#8](/bitcoin-bitcoin/8/): 0x00000001007c4373 bitcoind`CWallet::~CWallet(this=0x000000015b7cce20) at wallet.h:755:5
    10    frame [#9](/bitcoin-bitcoin/9/): 0x00000001007c703a bitcoind`CWallet::~CWallet(this=0x000000015b7cce20) at wallet.h:755:5
    11    frame [#10](/bitcoin-bitcoin/10/): 0x00000001007c4691 bitcoind`ReleaseWallet(wallet=0x000000015b7cce20) at wallet.cpp:117:5
    12    frame [#11](/bitcoin-bitcoin/11/): 0x000000010083115e bitcoind`std::__1::__shared_ptr_pointer<CWallet*, void (*)(CWallet*), std::__1::allocator<CWallet> >::__on_zero_shared(this=0x00000001577e76c0) at memory:3511:5
    13    frame [#12](/bitcoin-bitcoin/12/): 0x00000001000ebd1b bitcoind`std::__1::__shared_count::__release_shared(this=0x00000001577e76c0) at memory:3415:9
    14    frame [#13](/bitcoin-bitcoin/13/): 0x00000001000ebc8d bitcoind`std::__1::__shared_weak_count::__release_shared(this=0x00000001577e76c0) at memory:3457:27
    15    frame [#14](/bitcoin-bitcoin/14/): 0x000000010065a49a bitcoind`std::__1::shared_ptr<CWallet>::~shared_ptr(this=0x0000700007fb9738) at memory:4393:19
    16    frame [#15](/bitcoin-bitcoin/15/): 0x000000010065a3f3 bitcoind`std::__1::shared_ptr<CWallet>::~shared_ptr(this=0x0000700007fb9738) at memory:4391:1
    17    frame [#16](/bitcoin-bitcoin/16/): 0x0000000100797521 bitcoind`std::__1::shared_ptr<CWallet>::reset(this=0x0000700007fb9938) at memory:4528:5
    18    frame [#17](/bitcoin-bitcoin/17/): 0x000000010079740e bitcoind`UnloadWallet(wallet=nullptr) at wallet.cpp:143:12
    19    frame [#18](/bitcoin-bitcoin/18/): 0x000000010074f1e2 bitcoind`unloadwallet(request=0x0000700007fba990) at rpcwallet.cpp:2779:5
    

    scheduler thread bt

     0* thread [#10](/bitcoin-bitcoin/10/), name = 'b-scheduler'
     1    frame [#0](/bitcoin-bitcoin/0/): 0x00007fff5c489799 libsystem_malloc.dylib`tiny_malloc_should_clear + 1176
     2    frame [#1](/bitcoin-bitcoin/1/): 0x00007fff5c48920f libsystem_malloc.dylib`szone_malloc_should_clear + 66
     3    frame [#2](/bitcoin-bitcoin/2/): 0x00007fff5c487c99 libsystem_malloc.dylib`malloc_zone_malloc + 103
     4    frame [#3](/bitcoin-bitcoin/3/): 0x00007fff5c487c15 libsystem_malloc.dylib`malloc + 24
     5    frame [#4](/bitcoin-bitcoin/4/): 0x00007fff5955b378 libc++abi.dylib`operator new(unsigned long) + 40
     6    frame [#5](/bitcoin-bitcoin/5/): 0x00007fff59509521 libc++.1.dylib`std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__grow_by(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) + 141
     7    frame [#6](/bitcoin-bitcoin/6/): 0x00007fff5950a3d7 libc++.1.dylib`std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::push_back(char) + 85
     8    frame [#7](/bitcoin-bitcoin/7/): 0x0000000100a29562 bitcoind`std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::operator+=(this="[test] SCHED: bool CWa", __c='l') at string:992:84
     9    frame [#8](/bitcoin-bitcoin/8/): 0x000000010099eccc bitcoind`BCLog::LogEscapeMessage(str="[test] SCHED: bool CWallet::AddToWalletIfInvolvingMe(const CTransactionRef &, CWalletTx::Confirmation, bool)\n") at logging.cpp:253:21
    10    frame [#9](/bitcoin-bitcoin/9/): 0x000000010099eef1 bitcoind`BCLog::Logger::LogPrintStr(this=0x0000000103300000, str="[test] SCHED: bool CWallet::AddToWalletIfInvolvingMe(const CTransactionRef &, CWalletTx::Confirmation, bool)\n") at logging.cpp:265:32
    11    frame [#10](/bitcoin-bitcoin/10/): 0x00000001007f7c86 bitcoind`void LogPrintf<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, char const*>(fmt="%s SCHED: %s\n", args="[test]", args=0x0000700007d2a790) at logging.h:173:23
    12    frame [#11](/bitcoin-bitcoin/11/): 0x00000001007a7e5e bitcoind`void CWallet::WalletLogPrintf<char const*>(this=0x000000015b7cce20, fmt="SCHED: %s\n", parameters="bool CWallet::AddToWalletIfInvolvingMe(const CTransactionRef &, CWalletTx::Confirmation, bool)") const at wallet.h:1139:9
    13  * frame [#12](/bitcoin-bitcoin/12/): 0x00000001007a76b3 bitcoind`CWallet::AddToWalletIfInvolvingMe(this=0x000000015b7cce20, ptx=std::__1::shared_ptr<const CTransaction>::element_type @ 0x000000015c6c3908 strong=1 weak=1, confirm=Confirmation @ 0x0000700007d2ab70, fUpdate=true) at wallet.cpp:914:9
    14    frame [#13](/bitcoin-bitcoin/13/): 0x00000001007a91ca bitcoind`CWallet::SyncTransaction(this=0x000000015b7cce20, ptx=std::__1::shared_ptr<const CTransaction>::element_type @ 0x000000015c6c3908 strong=1 weak=1, confirm=Confirmation @ 0x0000700007d2ac10, update_tx=true) at wallet.cpp:1089:10
    15    frame [#14](/bitcoin-bitcoin/14/): 0x00000001007a99f2 bitcoind`CWallet::BlockConnected(this=0x000000015b7cce20, block=0x000000015c765d28, vtxConflicted=size=0, height=351323) at wallet.cpp:1129:9
    16    frame [#15](/bitcoin-bitcoin/15/): 0x00000001007a9bb2 bitcoind`non-virtual thunk to CWallet::BlockConnected(this=0x000000015b7cce20, block=0x000000015c765d28, vtxConflicted=size=0, height=351323) at wallet.cpp:0
    17    frame [#16](/bitcoin-bitcoin/16/): 0x00000001000a1e2e bitcoind`interfaces::(anonymous namespace)::NotificationsHandlerImpl::BlockConnected(this=0x00000001190f2ec0, block=std::__1::shared_ptr<const CBlock>::element_type @ 0x000000015c765d28 strong=2 weak=1, index=0x000000011309fd20, tx_conflicted=size=0) at chain.cpp:179:26
    18    frame [#17](/bitcoin-bitcoin/17/): 0x00000001000a2114 bitcoind`non-virtual thunk to interfaces::(anonymous namespace)::NotificationsHandlerImpl::BlockConnected(this=0x00000001190f2ec8, block=std::__1::shared_ptr<const CBlock>::element_type @ 0x000000015c765d28 strong=2 weak=1, index=0x000000011309fd20, tx_conflicted=size=0) at chain.cpp:0
    
  5. Sjors commented at 10:47 am on February 21, 2020: member

    Or as a bash script:

    0#!/bin/bash
    1while [ 1 ]
    2do
    3	src/bitcoin-cli -testnet unloadwallet test
    4	src/bitcoin-cli -testnet unloadwallet test2
    5	src/bitcoin-cli -testnet loadwallet test
    6	src/bitcoin-cli -testnet loadwallet test2
    7done
    

    After an hour or so, on macOS Catalina:

    0EXCEPTION: NSt3__112system_errorE       
    1recursive_mutex lock failed: Invalid argument       
    2bitcoin in scheduler       
    3
    4libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: recursive_mutex lock failed: Invalid argument
    
  6. ryanofsky commented at 3:45 pm on February 24, 2020: member

    Maybe the wallet needs to unregister from notifications before flushing (reset m_chain_notifications_handler before calling Flush() instead of after):

    https://github.com/bitcoin/bitcoin/blob/ab9de435880c9d77e4137b65050591ef2d14f809/src/wallet/wallet.cpp#L114-L115

  7. MarcoFalke commented at 9:03 pm on March 5, 2020: member
    Might be good to test this with #18234 if @ryanofsky ’s suggestion does not work.
  8. fanquake commented at 2:15 pm on March 6, 2020: member

    Might be good to test this with #18234

    I’ve tested with #18234 and can still produce the same crash. I’m not sure the changes there would help.

    Maybe the wallet needs to unregister from notifications before flushing (reset m_chain_notifications_handler before calling Flush() instead of after):

    I’ve briefly tested this patch (on top of master):

     0diff --git a/src/wallet/wallet.cpp b/src/wallet/wallet.cpp
     1index 5a97f62b7..1a787bc57 100644
     2--- a/src/wallet/wallet.cpp
     3+++ b/src/wallet/wallet.cpp
     4@@ -110,8 +110,8 @@ static void ReleaseWallet(CWallet* wallet)
     5     const std::string name = wallet->GetName();
     6     wallet->WalletLogPrintf("Releasing wallet\n");
     7     wallet->BlockUntilSyncedToCurrentChain();
     8-    wallet->Flush();
     9     wallet->m_chain_notifications_handler.reset();
    10+    wallet->Flush();
    11     delete wallet;
    12     // Wallet is now released, notify UnloadWallet, if any.
    

    and it seems to cause deadlocks. Eventually an unloadwallet call times out and threads seem to be stalled on SyncWithValidationInterfaceQueue. I can post a backtrace shortly.

  9. bvbfan commented at 2:40 pm on March 6, 2020: contributor
    I can clarify what’s happen, BlockUntilSyncedToCurrentChain does not do what is advertised in its name, it just waits a notification but it not ensures that wallet processed block and tip block are same. Furthermore it makes a race condition in delete wallet since mutex is still held in process thread, it depends of which is faster flush or processing that’s why it’s not crash always.
  10. ryanofsky commented at 4:23 pm on March 10, 2020: member

    I can post a backtrace shortly. @fanquake or others, it would really help to see a “thread apply all bt” backtrace of the deadlock. From @promags’s fix #18280 I’m guessing it has something to do with the unload wallet notification, but it’s not really clear

  11. promag commented at 4:43 pm on March 10, 2020: member

    @ryanofsky I think the problem is in https://github.com/bitcoin/bitcoin/blob/5d92ac26ed8984c29eabc4b78bcddd2423e68dac/src/wallet/wallet.cpp#L114-L115 Because m_chain_notifications_handler.reset() disconnects but it’s not thread safe - some notification might be on way - and then delete wallet will screw that.

    So I think we need to ensure that after UnregisterValidationInterface nothing can be delivered to the wallet.

    BTW, when I say disconnect is not thread safe, I actually think there’s 2 problems there. First is that g_signals.m_internals->m_connMainSignals stores a raw pointer to the wallet and we are managing it with shared_ptr - should be fixed with the above suggestion. Second problem, even though boost::signal is thread safe, it doesn’t mean connect/disconnect of multiple signals is atomic which means that, for instance, a notification could be delivered before RegisterValidationInterface returns. I’m not sure if this the root of any problem but I suppose that register/unregister could be atomic - and for this I’m trying with #18280.

  12. ryanofsky commented at 8:37 pm on March 10, 2020: member

    It looks like promag found a good and pretty minimal workaround for this problem in #18280

    I don’t think #18280 is 100% perfect because BlockUntilSyncedToCurrentChain doesn’t always call SyncWithValidationInterfaceQueue (due to the old_tip == ::ChainActive().Tip()) so some notifications could still happen after deletion, but the current bug seems already pretty rare so I would expect this to probably be a good enough fix.

    I implemented a change in 06d2b530e57b88c0542e2bd8db00ca0b034b589d (branch), which I think could solve the problem in a more reliable way, and also maybe make unloading faster, since it drops the BlockUntilSyncedToCurrentChain call in ReleaseWallet

    EDIT: Replaced references to #16367 above with #18280

  13. promag commented at 8:49 pm on March 10, 2020: member

    I don’t think #16367 is 100% perfect because BlockUntilSyncedToCurrentChain doesn’t always call SyncWithValidationInterfaceQueue

    Right, I though about just calling SyncWithValidationInterfaceQueue (something in interfaces::Chain that calls it).

  14. fanquake commented at 8:18 am on March 11, 2020: member

    @fanquake or others, it would really help to see a “thread apply all bt” backtrace @ryanofsky sorry; I forgot about this. You can find a full backtrace, master + this patch applied, here.

    It looks like promag found a good and pretty minimal workaround for this problem in #16367

    I assume you mean #18280?

  15. ryanofsky commented at 2:34 pm on March 11, 2020: member

    I assume you mean #18280?

    Yes sorry edited comment above

    You can find a full backtrace, master + this patch applied, here.

    This backtrace is really interesting and shows a different deadlock that 2a42fc1bdb393f88d72ec4d5c93b43b2b86dcc91 from #18280 will not fix, but my change 06d2b530e57b88c0542e2bd8db00ca0b034b589d (branch, comment) does fix

    The backtrace shows SyncWithValidationInterfaceQueue being called in the CScheduler::serviceQueue thread which will always deadlock because the thread that’s supposed to be processing notifications can’t block waiting for notifications that it should be processing.

    This happens because MaybeCompactWalletDB accesses the shared_ptr and can trigger ReleaseWallet which calls SyncWithValidationInterfaceQueue


    Correction for above: 2a42fc1bdb393f88d72ec4d5c93b43b2b86dcc91 from #18280 should actually fix the issue because it deletes the BlockUntilSyncedToCurrentChain call from ReleaseWallet. I was thinking of an earlier version of that fix bba30cb4f4fe4aa99b5fad2420b0cc05527756d0 which didn’t do this


    Correction to correction: 2a42fc1bdb393f88d72ec4d5c93b43b2b86dcc91 from #18280 doesn’t fix the issue because it doesn’t really delete the BlockUntilSyncedToCurrentChain call from ReleaseWallet, it just replaces it with a waitForNotifications call which is effectively the same thing and deadlocks in the same way: #18280 (review)

  16. ryanofsky commented at 2:54 pm on March 11, 2020: member

    Not on IRC but caught this in logs:

    [23:16:45] <promag> re https://github.com/bitcoin/bitcoin/commit/06d2b530e57b88c0542e2bd8db00ca0b034b589d [23:17:23] <promag> AFAICT the problem remains [23:19:12] <promag> m_proxy.reset() races with signal slots right? [23:19:51] <promag> so fix it each slot should have the shared_ptr [23:20:04] <promag> so the last slot would destroy the proxy

    06d2b530e57b88c0542e2bd8db00ca0b034b589d should reference count correctly with no races, not because of slot lifetime but because of callback lifetime. By callback I mean object returned by std::bind that captures the shared pointer. Slot will be deleted when signal is disconnected but callback object should still exist while it is being called

  17. promag commented at 3:26 pm on March 11, 2020: member
    Oh nice one! But what prevents any of those slots to access the wallet concurrently with delete wallet?
  18. ryanofsky commented at 5:31 pm on March 11, 2020: member

    Oh nice one! But what prevents any of those slots to access the wallet concurrently with delete wallet?

    delete wallet is only called while the last shared ptr with refcount 1 is reset or destroyed. Normally this will be during wallet.reset() in UnloadWallet but in the case where MaybeCompactWalletDB or a validationinterface notification is running in another thread, it will happen as soon as the relevant shared_ptrs there are destroyed, at the earliest possible time the wallet can be deleted, but not before.

  19. promag commented at 6:03 pm on March 11, 2020: member
    Right right, RelaseWallet is called after all shared_ptr are gone.
  20. promag commented at 6:23 pm on March 11, 2020: member

    @ryanofsky I think you should open a PR as I think your approach is better. Could also add some comments explaining the change to shared_ptr and that std::bind now guarantees that the wallet is only released after all slots are called.

    On a side note, we should revisit all boost::signals2 usages to check for similar issue.

  21. ryanofsky commented at 6:58 pm on March 11, 2020: member

    @ryanofsky I think you should open a PR as I think your approach is better. Could also add some comments explaining the change to shared_ptr and that std::bind now guarantees that the wallet is only released after all slots are called.

    On a side note, we should revisit all boost::signals2 usages to check for similar issue.

    I’d be happy if someone wants to take code from 06d2b530e57b88c0542e2bd8db00ca0b034b589d (branch) and make it a real PR, but right now my plan is just to help get #18280 merged as a straightforward fix to the problem, and leave the pointer change for a future improvement, since it’s incomplete (breaks bench and c++ tests) and I haven’t really tested it at all (just ran python tests)

  22. promag commented at 11:40 pm on April 2, 2020: member
    This can be closed.
  23. fanquake closed this on Apr 2, 2020

  24. DrahtBot locked this on Aug 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 13:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me