Fixes #16307
Signed-off-by: Anthony Fieroni bvbfan@abv.bg
Fixes #16307
Signed-off-by: Anthony Fieroni bvbfan@abv.bg
1155@@ -1156,8 +1156,11 @@ void CWallet::BlockUntilSyncedToCurrentChain() {
1156 // ::ChainActive().Tip(), otherwise put a callback in the validation interface queue and wait
1157 // for the queue to drain enough to execute it (indicating we are caught up
1158 // at least with the time we entered this function).
1159- uint256 last_block_hash = WITH_LOCK(cs_wallet, return m_last_block_processed);
1160- chain().waitForNotificationsIfTipChanged(last_block_hash);
1161+ bool tipIsSame = false;
1162+ do {
1163+ uint256 last_block_hash = WITH_LOCK(cs_wallet, return m_last_block_processed);
1164+ tipIsSame = chain().waitForNotificationsIfTipIsNotSame(last_block_hash);
Shouldn’t this whole scope happen under the cs_wallet lock?
Also, shouldn’t m_chain_notifications_handler.reset()
happen inside here under the wallet lock?
Otherwise a new block could come in and put new notifications in the handler while we reset it.
Shouldn’t this whole scope happen under the cs_wallet lock?
No, we leave mutex for process thread to finish.
Also, shouldn’t m_chain_notifications_handler.reset() happen inside here under the wallet lock?
No.
Otherwise a new block could come in and put new notifications in the handler while we reset it.
I’m aware of that, we can have a function that takes a mutex and reset notifications
I think this is the wrong approach.
That’s at least ensure sync between wallet and chain tips otherwise all calls to BlockUntilSyncedToCurrentChain
are just pointless, they don’t do what is supposed to do. Your patch looks fine but not what we want when calls a function like BlockUntilSyncedToCurrentChain
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
@promag the problem is not the queued connection / disconnection, actually
0void UnregisterValidationInterface(CValidationInterface* pwalletIn) {
1 if (g_signals.m_internals) {
2 g_signals.m_internals->m_connMainSignals.erase(pwalletIn);
3 }
4}
signal is received before disconnect is called by erasing the caller.
Fix unit test and rebase.
Let’s clarify a bit more.
0void disconnect() override
1{
2 if (m_notifications) {
3 m_notifications = nullptr; // <---------------- it makes no sense to be before unregister
4 UnregisterValidationInterface(this);
5 }
6}
Since ValidationInterfaceConnections
calls destructors in reverse order, for a bad luck UpdatedBlockTip
is disconnected last which is not held the mutex but update time by atomic variable.
Running in testnet for at least an hour in parallel with
0#!/bin/bash
1while [ 1 ]
2do
3 src/bitcoin-cli -testnet loadwallet test
4 src/bitcoin-cli -testnet unloadwallet test
5done
no crashes anymore.
I think I’ve hit into #16307 issue while doing refactoring on this part of the code. Current refactoring direction is to move towards the wallet and chain being independent tips, not more synchronized so another way to solve this would be to reset handler correctly while holding wallet lock and then test handler still exists in BlockConnected/BlockDisconnected/…, flushing correctly and committing right locator means worst-case we redo a rescan at wallet loading.
Doing it race-condition free likely need a handler lock though (or `NotificationsHandlerImpl locking wallet…)
while holding wallet lock and then test handler still exists in BlockConnected/BlockDisconnected
i don’t think that’s right approach to me.
159@@ -160,8 +160,8 @@ class NotificationsHandlerImpl : public Handler, CValidationInterface
160 void disconnect() override
161 {
162 if (m_notifications) {
163- m_notifications = nullptr;
m_notifications
should be an atomic pointer too since m_notifications
is get and set from different threads, though probably the synchronization done by boost signals & the scheduler prevents bugs from it not being atomic
Maybe m_notifications should be an atomic pointer too since m_notifications is get and set from different threads, though probably the synchronization done by boost signals & the scheduler prevents bugs from it not being atomic
You’re right
345@@ -346,13 +346,14 @@ class ChainImpl : public Chain
346 {
347 return MakeUnique<NotificationsHandlerImpl>(*this, notifications);
348 }
349- void waitForNotificationsIfTipChanged(const uint256& old_tip) override
350+ bool waitForNotificationsIfTipIsNotSame(const uint256& tip) override
109@@ -110,8 +110,8 @@ static void ReleaseWallet(CWallet* wallet)
110 const std::string name = wallet->GetName();
111 wallet->WalletLogPrintf("Releasing wallet\n");
112 wallet->BlockUntilSyncedToCurrentChain();
113- wallet->Flush();
114 wallet->m_chain_notifications_handler.reset();
115+ WITH_LOCK(wallet->cs_wallet, wallet->Flush());
I think there is a misunderstanding here (maybe my own), but my understanding is that the point of BlockUntilSyncedToCurrentChain is just to ensure that if you make two RPC calls in a row the second RPC call is always aware of the result of first call. Ensuring that notification in the queue from the time of the call are processed is sufficient to do this, without trying to guarantee that the wallet tip and node tips are equal.
Yes, but mostly after BlockUntilSyncedToCurrentChain lock is acquired and there is no guarantee that notification is faster than this held, i.e. you can leave BlockUntilSyncedToCurrentChain and don’t do what you want, you gain mutex before notification thread and you’re not sync anything.
re: #18279 (comment)
That’s at least ensure sync between wallet and chain tips otherwise all calls to
BlockUntilSyncedToCurrentChain
are just pointless, they don’t do what is supposed to do. Your patch looks fine but not what we want when calls a function likeBlockUntilSyncedToCurrentChain
I think there is a misunderstanding here (maybe my own), but my understanding is that the point of BlockUntilSyncedToCurrentChain is just to ensure that if you make two RPC calls in a row the second RPC call is always aware of the result of first call. Ensuring that notification in the queue from the time of the call are processed is sufficient to do this, without trying to guarantee that the wallet tip and node tips are equal.
Checking tips for equality is just an optimization to avoid waiting for the queue, and guaranteeing that the tips are equal isn’t actually possible (despite the attempt here) because BlockUntilSyncedToCurrentChain isn’t called with any locks held.
@promag, @ryanofsky i have another, looking better to me, solution ensuring there is no background tasks, but i’ve not push since it’s not well tested
0diff --git a/src/scheduler.cpp b/src/scheduler.cpp
1index 7cb7754fd..17c26be74 100644
2--- a/src/scheduler.cpp
3+++ b/src/scheduler.cpp
4@@ -199,12 +199,16 @@ void SingleThreadedSchedulerClient::AddToProcessQueue(std::function<void ()> fun
5 }
6
7 void SingleThreadedSchedulerClient::EmptyQueue() {
8- assert(!m_pscheduler->AreThreadsServicingQueue());
9- bool should_continue = true;
10+ if (!m_pscheduler->AreThreadsServicingQueue())
11+ return;
12+ auto hasCallbacks = [this]() -> bool {
13+ LOCK(m_cs_callbacks_pending);
14+ return !m_callbacks_pending.empty();
15+ };
16+ auto should_continue = hasCallbacks();
17 while (should_continue) {
18 ProcessQueue();
19- LOCK(m_cs_callbacks_pending);
20- should_continue = !m_callbacks_pending.empty();
21+ should_continue = hasCallbacks();
22 }
23 }
24
25diff --git a/src/validationinterface.cpp b/src/validationinterface.cpp
26index 1deb93c97..87ade954c 100644
27--- a/src/validationinterface.cpp
28+++ b/src/validationinterface.cpp
29@@ -90,6 +90,7 @@ void RegisterValidationInterface(CValidationInterface* pwalletIn) {
30 void UnregisterValidationInterface(CValidationInterface* pwalletIn) {
31 if (g_signals.m_internals) {
32 g_signals.m_internals->m_connMainSignals.erase(pwalletIn);
33+ g_signals.FlushBackgroundCallbacks();
34 }
35 }
36
37diff --git a/src/wallet/wallet.cpp b/src/wallet/wallet.cpp
38index 4c36caed8..51732c0ae 100644
39--- a/src/wallet/wallet.cpp
40+++ b/src/wallet/wallet.cpp
41@@ -111,7 +111,7 @@ static void ReleaseWallet(CWallet* wallet)
42 wallet->WalletLogPrintf("Releasing wallet\n");
43 wallet->BlockUntilSyncedToCurrentChain();
44 wallet->m_chain_notifications_handler.reset();
45- WITH_LOCK(wallet->cs_wallet, wallet->Flush());
46+ wallet->Flush();
47 delete wallet;
48 // Wallet is now released, notify UnloadWallet, if any.
49 {
Flushing callback right after disconnecting signals should effectively ensure we don’t have pending and it will not need atomic notification pointer or so.
re: #18279 (comment)
@promag, @ryanofsky i have another, looking better to me, solution ensuring there is no background tasks, but i’ve not push since it’s not well tested
This fix is very similar to c3722190bc4f0efb76008dff16c6721ef0248856 from #18280, and probably we should review and merge #18280 when it’s ready and close this PR.
In my opinion though, waiting for notifications is a fragile fix that will slow down wallet unloading and node shutdown without good reason. A better fix would be to switch to shared pointers instead of raw pointers and fix the use-after-delete bug more directly: 06d2b530e57b88c0542e2bd8db00ca0b034b589d (branch, comment)
UnregisterValidationInterface
to be applied in all cases.
Still no to me, changes in BlockUntilSyncedToCurrentChain are significant to me.
Of course no need to close this PR if you want to make other improvements, but I hope you can review #18280 when it’s ready as a fix for the original issue. Also would encourage closing this PR and opening a new one if the new version will differ substantially, so the review comment history will be intelligible.
I would also caution that while BlockUntilSyncedToCurrentChain can definitely be cleaned up, its purpose is ensuring consistent results between RPC calls and it shouldn’t be changed to block longer than actually necessary to fix real bugs.
Also sync should be in UnregisterValidationInterface to be applied in all cases.
This is only true when unregistering wallets, as far as I know. Not things like indexes. Also if we fix shared_ptr reference counting with 06d2b530e57b88c0542e2bd8db00ca0b034b589d (branch, comment) it should not even be necessary for wallets. No reason conceptually wallets should have to wait for events they don’t care about just to fix a pointer lifetime issue
This is only true when unregistering wallets, as far as I know. Not things like indexes.
No, that’s not correct to me,
Maybe m_notifications should be an atomic pointer too since m_notifications is get and set from different threads, though probably the synchronization done by boost signals & the scheduler prevents bugs from it not being atomic
doing in UnregisterValidationInterface
it ensures notification is not accessed in race condition.
No reason conceptually wallets should have to wait for events they don’t care about just to fix a pointer lifetime issue
Agree with this.
From #18280:
From boost signal documentation at https://www.boost.org/doc/libs/1_72_0/doc/html/signals2/thread-safety.html:
When a signal is invoked by calling signal::operator(), the invocation first acquires a lock on the signal’s mutex. Then it obtains a handle to the signal’s slot list and combiner. Next it releases the signal’s mutex, before invoking the combiner to iterate through the slot list.
This means that
UnregisterValidationInterface
doesn’t prevent more calls to that interface.
But just calling SyncWithValidationInterfaceQueue
(which BlockUntilSyncedToCurrentChain
might call) doesn’t fully fix the issue because on shutdown the scheduler thread is stopped earlier.
Obviously there are various possible fixes, like #18280 or 06d2b53. But this approach is clearly not good.