Ensure wallet and chain tip are in sync #18279

pull bvbfan wants to merge 0 commits into bitcoin:master from bvbfan:master changing 0 files +0 −0
  1. bvbfan commented at 2:34 pm on March 6, 2020: contributor

    Fixes #16307

    Signed-off-by: Anthony Fieroni bvbfan@abv.bg

  2. fanquake added the label Wallet on Mar 6, 2020
  3. in src/wallet/wallet.cpp:1163 in ddcc4b23b1 outdated
    1155@@ -1156,8 +1156,11 @@ void CWallet::BlockUntilSyncedToCurrentChain() {
    1156     // ::ChainActive().Tip(), otherwise put a callback in the validation interface queue and wait
    1157     // for the queue to drain enough to execute it (indicating we are caught up
    1158     // at least with the time we entered this function).
    1159-    uint256 last_block_hash = WITH_LOCK(cs_wallet, return m_last_block_processed);
    1160-    chain().waitForNotificationsIfTipChanged(last_block_hash);
    1161+    bool tipIsSame = false;
    1162+    do {
    1163+        uint256 last_block_hash = WITH_LOCK(cs_wallet, return m_last_block_processed);
    1164+        tipIsSame = chain().waitForNotificationsIfTipIsNotSame(last_block_hash);
    


    MarcoFalke commented at 2:58 pm on March 6, 2020:

    Shouldn’t this whole scope happen under the cs_wallet lock?

    Also, shouldn’t m_chain_notifications_handler.reset() happen inside here under the wallet lock?

    Otherwise a new block could come in and put new notifications in the handler while we reset it.


    bvbfan commented at 3:10 pm on March 6, 2020:

    Shouldn’t this whole scope happen under the cs_wallet lock?

    No, we leave mutex for process thread to finish.

    Also, shouldn’t m_chain_notifications_handler.reset() happen inside here under the wallet lock?

    No.

    Otherwise a new block could come in and put new notifications in the handler while we reset it.

    I’m aware of that, we can have a function that takes a mutex and reset notifications

  4. promag commented at 3:00 pm on March 6, 2020: member
    I think this is the wrong approach.
  5. bvbfan commented at 3:16 pm on March 6, 2020: contributor

    I think this is the wrong approach.

    That’s at least ensure sync between wallet and chain tips otherwise all calls to BlockUntilSyncedToCurrentChain are just pointless, they don’t do what is supposed to do. Your patch looks fine but not what we want when calls a function like BlockUntilSyncedToCurrentChain

  6. DrahtBot commented at 2:45 am on March 7, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #18338 (wip: Fix wallet unload race condition by promag)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  7. bvbfan commented at 9:22 am on March 7, 2020: contributor

    @promag the problem is not the queued connection / disconnection, actually

    0void UnregisterValidationInterface(CValidationInterface* pwalletIn) {
    1    if (g_signals.m_internals) {
    2        g_signals.m_internals->m_connMainSignals.erase(pwalletIn);
    3    }
    4}
    

    signal is received before disconnect is called by erasing the caller.

    Fix unit test and rebase.

  8. bvbfan commented at 10:47 am on March 7, 2020: contributor

    Let’s clarify a bit more.

    0void disconnect() override
    1{
    2    if (m_notifications) {
    3        m_notifications = nullptr; // <---------------- it makes no sense to be before unregister
    4        UnregisterValidationInterface(this);
    5    }
    6}
    

    Since ValidationInterfaceConnections calls destructors in reverse order, for a bad luck UpdatedBlockTip is disconnected last which is not held the mutex but update time by atomic variable. Running in testnet for at least an hour in parallel with

    0#!/bin/bash
    1while [ 1 ]
    2do
    3    src/bitcoin-cli -testnet loadwallet test
    4    src/bitcoin-cli -testnet unloadwallet test
    5done
    

    no crashes anymore.

  9. ariard commented at 4:47 am on March 9, 2020: member

    I think I’ve hit into #16307 issue while doing refactoring on this part of the code. Current refactoring direction is to move towards the wallet and chain being independent tips, not more synchronized so another way to solve this would be to reset handler correctly while holding wallet lock and then test handler still exists in BlockConnected/BlockDisconnected/…, flushing correctly and committing right locator means worst-case we redo a rescan at wallet loading.

    Doing it race-condition free likely need a handler lock though (or `NotificationsHandlerImpl locking wallet…)

  10. bvbfan commented at 1:06 pm on March 10, 2020: contributor

    while holding wallet lock and then test handler still exists in BlockConnected/BlockDisconnected

    i don’t think that’s right approach to me.

  11. in src/interfaces/chain.cpp:163 in 78a88ef667 outdated
    159@@ -160,8 +160,8 @@ class NotificationsHandlerImpl : public Handler, CValidationInterface
    160     void disconnect() override
    161     {
    162         if (m_notifications) {
    163-            m_notifications = nullptr;
    


    ryanofsky commented at 4:35 pm on March 10, 2020:
    This change seems clearly correct. Good catch! Maybe m_notifications should be an atomic pointer too since m_notifications is get and set from different threads, though probably the synchronization done by boost signals & the scheduler prevents bugs from it not being atomic

    bvbfan commented at 8:30 am on March 11, 2020:

    Maybe m_notifications should be an atomic pointer too since m_notifications is get and set from different threads, though probably the synchronization done by boost signals & the scheduler prevents bugs from it not being atomic

    You’re right

  12. in src/interfaces/chain.cpp:349 in 78a88ef667 outdated
    345@@ -346,13 +346,14 @@ class ChainImpl : public Chain
    346     {
    347         return MakeUnique<NotificationsHandlerImpl>(*this, notifications);
    348     }
    349-    void waitForNotificationsIfTipChanged(const uint256& old_tip) override
    350+    bool waitForNotificationsIfTipIsNotSame(const uint256& tip) override
    


    ryanofsky commented at 4:39 pm on March 10, 2020:
    I like the name change, and I can see how the having the bool return value should make races and the deadlock reported #16307 (comment) less likely, though I suspect there is also probably a more direct and reliable fix we can find
  13. in src/wallet/wallet.cpp:114 in 78a88ef667 outdated
    109@@ -110,8 +110,8 @@ static void ReleaseWallet(CWallet* wallet)
    110     const std::string name = wallet->GetName();
    111     wallet->WalletLogPrintf("Releasing wallet\n");
    112     wallet->BlockUntilSyncedToCurrentChain();
    113-    wallet->Flush();
    114     wallet->m_chain_notifications_handler.reset();
    115+    WITH_LOCK(wallet->cs_wallet, wallet->Flush());
    


    ryanofsky commented at 4:46 pm on March 10, 2020:
    Acquiring the cs_wallet lock here is unexpected. If this is really neccessary, I think the lock should be acquired closer to where it’s used inside the Flush implementation, probably with a comment if the purpose of the lock isn’t more obvious there

    bvbfan commented at 8:15 am on March 11, 2020:

    I think there is a misunderstanding here (maybe my own), but my understanding is that the point of BlockUntilSyncedToCurrentChain is just to ensure that if you make two RPC calls in a row the second RPC call is always aware of the result of first call. Ensuring that notification in the queue from the time of the call are processed is sufficient to do this, without trying to guarantee that the wallet tip and node tips are equal.

    Yes, but mostly after BlockUntilSyncedToCurrentChain lock is acquired and there is no guarantee that notification is faster than this held, i.e. you can leave BlockUntilSyncedToCurrentChain and don’t do what you want, you gain mutex before notification thread and you’re not sync anything.


    bvbfan commented at 8:17 am on March 11, 2020:
    Here after BlockUntilSyncedToCurrentChain is ensuring it’s on current tip, the lock tries to guarantee that notification thread is finished before we flush and release memory.
  14. ryanofsky commented at 4:59 pm on March 10, 2020: member

    re: #18279 (comment)

    That’s at least ensure sync between wallet and chain tips otherwise all calls to BlockUntilSyncedToCurrentChain are just pointless, they don’t do what is supposed to do. Your patch looks fine but not what we want when calls a function like BlockUntilSyncedToCurrentChain

    I think there is a misunderstanding here (maybe my own), but my understanding is that the point of BlockUntilSyncedToCurrentChain is just to ensure that if you make two RPC calls in a row the second RPC call is always aware of the result of first call. Ensuring that notification in the queue from the time of the call are processed is sufficient to do this, without trying to guarantee that the wallet tip and node tips are equal.

    Checking tips for equality is just an optimization to avoid waiting for the queue, and guaranteeing that the tips are equal isn’t actually possible (despite the attempt here) because BlockUntilSyncedToCurrentChain isn’t called with any locks held.

  15. bvbfan commented at 1:21 pm on March 11, 2020: contributor

    @promag, @ryanofsky i have another, looking better to me, solution ensuring there is no background tasks, but i’ve not push since it’s not well tested

     0diff --git a/src/scheduler.cpp b/src/scheduler.cpp
     1index 7cb7754fd..17c26be74 100644
     2--- a/src/scheduler.cpp
     3+++ b/src/scheduler.cpp
     4@@ -199,12 +199,16 @@ void SingleThreadedSchedulerClient::AddToProcessQueue(std::function<void ()> fun
     5 }
     6 
     7 void SingleThreadedSchedulerClient::EmptyQueue() {
     8-    assert(!m_pscheduler->AreThreadsServicingQueue());
     9-    bool should_continue = true;
    10+    if (!m_pscheduler->AreThreadsServicingQueue())
    11+        return;
    12+    auto hasCallbacks = [this]() -> bool {
    13+        LOCK(m_cs_callbacks_pending);
    14+        return !m_callbacks_pending.empty();
    15+    };
    16+    auto should_continue = hasCallbacks();
    17     while (should_continue) {
    18         ProcessQueue();
    19-        LOCK(m_cs_callbacks_pending);
    20-        should_continue = !m_callbacks_pending.empty();
    21+        should_continue = hasCallbacks();
    22     }
    23 }
    24 
    25diff --git a/src/validationinterface.cpp b/src/validationinterface.cpp
    26index 1deb93c97..87ade954c 100644
    27--- a/src/validationinterface.cpp
    28+++ b/src/validationinterface.cpp
    29@@ -90,6 +90,7 @@ void RegisterValidationInterface(CValidationInterface* pwalletIn) {
    30 void UnregisterValidationInterface(CValidationInterface* pwalletIn) {
    31     if (g_signals.m_internals) {
    32         g_signals.m_internals->m_connMainSignals.erase(pwalletIn);
    33+        g_signals.FlushBackgroundCallbacks();
    34     }
    35 }
    36 
    37diff --git a/src/wallet/wallet.cpp b/src/wallet/wallet.cpp
    38index 4c36caed8..51732c0ae 100644
    39--- a/src/wallet/wallet.cpp
    40+++ b/src/wallet/wallet.cpp
    41@@ -111,7 +111,7 @@ static void ReleaseWallet(CWallet* wallet)
    42     wallet->WalletLogPrintf("Releasing wallet\n");
    43     wallet->BlockUntilSyncedToCurrentChain();
    44     wallet->m_chain_notifications_handler.reset();
    45-    WITH_LOCK(wallet->cs_wallet, wallet->Flush());
    46+    wallet->Flush();
    47     delete wallet;
    48     // Wallet is now released, notify UnloadWallet, if any.
    49     {
    

    Flushing callback right after disconnecting signals should effectively ensure we don’t have pending and it will not need atomic notification pointer or so.

  16. ryanofsky commented at 2:10 pm on March 11, 2020: member

    re: #18279 (comment)

    @promag, @ryanofsky i have another, looking better to me, solution ensuring there is no background tasks, but i’ve not push since it’s not well tested

    This fix is very similar to c3722190bc4f0efb76008dff16c6721ef0248856 from #18280, and probably we should review and merge #18280 when it’s ready and close this PR.

    In my opinion though, waiting for notifications is a fragile fix that will slow down wallet unloading and node shutdown without good reason. A better fix would be to switch to shared pointers instead of raw pointers and fix the use-after-delete bug more directly: 06d2b530e57b88c0542e2bd8db00ca0b034b589d (branch, comment)

  17. bvbfan commented at 2:22 pm on March 11, 2020: contributor

    This fix is very similar to c372219 from #18280, and probably we should review and merge #18280 when it’s ready and close this PR.

    Still no to me, changes in BlockUntilSyncedToCurrentChain are significant to me.

  18. bvbfan commented at 2:25 pm on March 11, 2020: contributor
    Also sync should be in UnregisterValidationInterface to be applied in all cases.
  19. ryanofsky commented at 3:02 pm on March 11, 2020: member

    Still no to me, changes in BlockUntilSyncedToCurrentChain are significant to me.

    Of course no need to close this PR if you want to make other improvements, but I hope you can review #18280 when it’s ready as a fix for the original issue. Also would encourage closing this PR and opening a new one if the new version will differ substantially, so the review comment history will be intelligible.

    I would also caution that while BlockUntilSyncedToCurrentChain can definitely be cleaned up, its purpose is ensuring consistent results between RPC calls and it shouldn’t be changed to block longer than actually necessary to fix real bugs.

    Also sync should be in UnregisterValidationInterface to be applied in all cases.

    This is only true when unregistering wallets, as far as I know. Not things like indexes. Also if we fix shared_ptr reference counting with 06d2b530e57b88c0542e2bd8db00ca0b034b589d (branch, comment) it should not even be necessary for wallets. No reason conceptually wallets should have to wait for events they don’t care about just to fix a pointer lifetime issue

  20. bvbfan commented at 3:31 pm on March 11, 2020: contributor

    This is only true when unregistering wallets, as far as I know. Not things like indexes.

    No, that’s not correct to me,

    Maybe m_notifications should be an atomic pointer too since m_notifications is get and set from different threads, though probably the synchronization done by boost signals & the scheduler prevents bugs from it not being atomic

    doing in UnregisterValidationInterface it ensures notification is not accessed in race condition.

  21. promag commented at 4:01 pm on March 11, 2020: member

    No reason conceptually wallets should have to wait for events they don’t care about just to fix a pointer lifetime issue

    Agree with this.

    From #18280:

    From boost signal documentation at https://www.boost.org/doc/libs/1_72_0/doc/html/signals2/thread-safety.html:

    When a signal is invoked by calling signal::operator(), the invocation first acquires a lock on the signal’s mutex. Then it obtains a handle to the signal’s slot list and combiner. Next it releases the signal’s mutex, before invoking the combiner to iterate through the slot list.

    This means that UnregisterValidationInterface doesn’t prevent more calls to that interface.

    But just calling SyncWithValidationInterfaceQueue (which BlockUntilSyncedToCurrentChain might call) doesn’t fully fix the issue because on shutdown the scheduler thread is stopped earlier.

    Obviously there are various possible fixes, like #18280 or 06d2b53. But this approach is clearly not good.

  22. bvbfan closed this on Mar 15, 2020

  23. DrahtBot locked this on Aug 16, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-19 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me