wallet: UnregisterValidationInterface before SyncWithValidationInterfaceQueue #18280

pull promag wants to merge 2 commits into bitcoin:master from promag:2020-03-sync-unregistervalidationinterface changing 4 files +14 −3
  1. promag commented at 3:09 pm on March 6, 2020: member

    From boost signal documentation at https://www.boost.org/doc/libs/1_72_0/doc/html/signals2/thread-safety.html:

    When a signal is invoked by calling signal::operator(), the invocation first acquires a lock on the signal’s mutex. Then it obtains a handle to the signal’s slot list and combiner. Next it releases the signal’s mutex, before invoking the combiner to iterate through the slot list.

    This means that UnregisterValidationInterface doesn’t prevent more calls to that interface.

    This PR fixes the assumption that no validation calls happen after UnregisterValidationInterface in the following code https://github.com/bitcoin/bitcoin/blob/5d92ac26ed8984c29eabc4b78bcddd2423e68dac/src/wallet/wallet.cpp#L114-L115 The actual bug is that delete wallet races with the validation notifications.

    The fix consists in synchronizing with the validation queue (which happens in BlockUntilSyncedToCurrentChain) after UnregisterValidationInterface.

    Alternative to #18279. Fixes #16307.

  2. in src/validationinterface.cpp:99 in ce46396cf7 outdated
    87@@ -88,9 +88,11 @@ void RegisterValidationInterface(CValidationInterface* pwalletIn) {
    88 }
    89 
    90 void UnregisterValidationInterface(CValidationInterface* pwalletIn) {
    91-    if (g_signals.m_internals) {
    92-        g_signals.m_internals->m_connMainSignals.erase(pwalletIn);
    93-    }
    94+    AssertLockNotHeld(cs_main);
    95+    std::promise<void> promise;
    


    bvbfan commented at 3:13 pm on March 6, 2020:
    You don’t need make promise or calls function if g_signals.m_internals is nullptr.

    promag commented at 3:34 pm on March 6, 2020:
    Done.
  3. promag force-pushed on Mar 6, 2020
  4. promag force-pushed on Mar 6, 2020
  5. DrahtBot added the label Validation on Mar 6, 2020
  6. promag force-pushed on Mar 6, 2020
  7. promag force-pushed on Mar 7, 2020
  8. promag force-pushed on Mar 7, 2020
  9. DrahtBot commented at 2:44 am on March 7, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #18338 (wip: Fix wallet unload race condition by promag)
    • #18279 (Ensure wallet and chain tip are in sync by bvbfan)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  10. ryanofsky commented at 3:30 pm on March 10, 2020: member

    Alternative to #18279. Similar to #15700. More details in #18065. Possibly fixes #16307.

    About to look into this but this PR description seems like it is going to take a lot of digging to understand.

    Is it possible to summarize in a paragraph what this change is doing, what problem it fixes, and how it compares to the alternative? Thank you! :pray:

  11. in src/wallet/wallet.cpp:4097 in afe8d39bdc outdated
    3984@@ -3975,14 +3985,6 @@ std::shared_ptr<CWallet> CWallet::CreateWalletFromFile(interfaces::Chain& chain,
    3985     // Register with the validation interface. It's ok to do this after rescan since we're still holding locked_chain.
    


    ryanofsky commented at 3:35 pm on March 10, 2020:
    This comment is no longer true, and seems like it means there can now be missing notifications. I wonder if a different approach would be to send the loadwallet notification before locking the chain & scanning instead of after?
  12. ryanofsky commented at 3:54 pm on March 10, 2020: member

    Just to help other reviewers, this seems to make three changes:

    1. Instead of RegisterValidationInterface being non-blocking and beginning to send notifications right away, even if they preceded the RegisterValidationInterface call, it will now block waiting for any previously queued notifications to be drained, then attach the notification handler, then finally return.

    2. Instead of UnregisterValidationInterface being non-blocking and turning off notifications right away, it waits for any notifications that were queued prior to the UnregisterValidationInterface to be handled and blocks. Then it disconnects and returns after they were handled.

    3. Stops holding cs_main while sending the loadwallet notification.

    Change 3 seems like a good change just because it reduces unnecessary locking, but it’s not clear what motivated the change. Maybe it would fix the deadlock @fanquake alluded to in #16307 (comment), but it’s not possible to to tell without a backtrace.

    Change 1 at first glance seems like it more likely lead to missing notifications on startup, but maybe it’s intended to prevent the wallet redundantly processing mempool transactions blocks that have just been rescanned? Unclear

    Change 2 seems like it could fix missing notifications on shutdown, but I’m not sure where that’s been reported. It also seems like it could slow down shutdown unnecessarily.

    Note: I have a change which handles attaching notifications in a more deliberate nonracy way in #15719. Not sure if it overlaps with bugs that are being seen now, though. And it’s nice to have more minimal bugfixes anyway.

  13. promag renamed this:
    fix: Disconnect validation interface on the queue dispacher
    wip: fix: Disconnect validation interface on the queue dispacher
    on Mar 10, 2020
  14. promag commented at 4:18 pm on March 10, 2020: member
    Sorry @ryanofsky, should have tagged wip. Will address your comments and address each required change once this is good. Thanks for reviewing and your comments - will also revisit #15719.
  15. promag force-pushed on Mar 10, 2020
  16. promag force-pushed on Mar 10, 2020
  17. promag renamed this:
    wip: fix: Disconnect validation interface on the queue dispacher
    wallet: UnregisterValidationInterface before SyncWithValidationInterfaceQueue
    on Mar 10, 2020
  18. promag commented at 8:31 pm on March 10, 2020: member
    Updated OP. Reduced the scope to just fix #16307. Will submit other PRs with relevant changes from previous commits.
  19. promag force-pushed on Mar 10, 2020
  20. promag force-pushed on Mar 11, 2020
  21. promag force-pushed on Mar 11, 2020
  22. promag force-pushed on Mar 11, 2020
  23. promag force-pushed on Mar 11, 2020
  24. promag commented at 3:16 pm on March 11, 2020: member
    Updated to include dc62f3ce5228a611d973a154eb338e1fd34df38f which ensures that SyncWithValidationInterfaceQueue doesn’t hang due to !AreThreadsServicingQueue.
  25. in src/scheduler.cpp:154 in dc62f3ce52 outdated
    147@@ -148,6 +148,10 @@ bool CScheduler::AreThreadsServicingQueue() const {
    148 
    149 
    150 void SingleThreadedSchedulerClient::MaybeScheduleProcessQueue() {
    151+    if (!m_pscheduler->AreThreadsServicingQueue()) {
    152+        EmptyQueue();
    153+        return;
    154+    }
    


    ryanofsky commented at 4:58 pm on March 11, 2020:

    In commit “Handle no threads serving in MaybeScheduleProcessQueue” (dc62f3ce5228a611d973a154eb338e1fd34df38f)

    I think it would make more sense to add this logic at the bottom of AddToProcessQueue (and avoid calling MaybeScheduleProcessQueue in no thread case) than having it at the top of MaybeScheduleProcessQueue and returning early.

    Few reasons:

    • Having this logic doesn’t seem useful in ProcessQueue, the other place where MaybeScheduleProcessQueue is called
    • Adding this code here creates recursion ProcessQueue -> MaybeScheduleProcessQueue -> EmptyQueue -> ProcessQueue
    • If code is added here the name MaybeScheduleProcessQueue is now misleading because this is running callbacks immediately, not scheduling them for later
  26. ryanofsky approved
  27. ryanofsky commented at 5:06 pm on March 11, 2020: member
    Code review ACK 2a42fc1bdb393f88d72ec4d5c93b43b2b86dcc91. I left one comment below that would be nice to address. Also dc62f3ce5228a611d973a154eb338e1fd34df38f could use a better commit description. It’s unclear when SyncWithValidationInterfaceQueue would hang now. Is there a shutdown race where wallets are outliving the scheduler? Or is this just a test workaround for cases where scheduler is not running?
  28. promag force-pushed on Mar 11, 2020
  29. in src/scheduler.cpp:201 in 201eebf13b outdated
    195@@ -196,6 +196,9 @@ void SingleThreadedSchedulerClient::AddToProcessQueue(std::function<void ()> fun
    196         m_callbacks_pending.emplace_back(std::move(func));
    197     }
    198     MaybeScheduleProcessQueue();
    199+    if (!m_pscheduler->AreThreadsServicingQueue()) {
    200+        EmptyQueue();
    


    ryanofsky commented at 5:39 pm on March 11, 2020:

    In commit “Handle no threads serving in MaybeScheduleProcessQueue” (201eebf13b2723d7fe821d7222600fa04fadc0b0)

    Commit message subject is out of date since MaybeScheduleProcessQueue is no longer changing.

    Also I’m surprised by “SyncWithValidationInterfaceQueue is called after scheduler thread exists, which happen at shutdown” since I would hope wallets are getting unloaded before the scheduler stops, but maybe this is not the case currently. Probably beyond the scope of this PR to address, though


    promag commented at 3:16 pm on March 12, 2020:

    since I would hope wallets are getting unloaded before the scheduler stops

    Not currently the case.

    Probably beyond the scope of this PR to address, though

    Agree, I’ve tried and it caused some other failures.

    Commit message subject is out of date since MaybeScheduleProcessQueue is no longer changing.

    Ops need to update.

  30. in src/scheduler.cpp:199 in 201eebf13b outdated
    195@@ -196,6 +196,9 @@ void SingleThreadedSchedulerClient::AddToProcessQueue(std::function<void ()> fun
    196         m_callbacks_pending.emplace_back(std::move(func));
    197     }
    198     MaybeScheduleProcessQueue();
    199+    if (!m_pscheduler->AreThreadsServicingQueue()) {
    


    ryanofsky commented at 5:44 pm on March 11, 2020:

    In commit “Handle no threads serving in MaybeScheduleProcessQueue” (201eebf13b2723d7fe821d7222600fa04fadc0b0)

    If we can make this code:

    0if (m_pscheduler->AreThreadsServicingQueue()) [
    1    MaybeScheduleProcessQueue();
    2} else {
    3    EmptyQueue();
    4}
    

    it would be more obvious what’s going on here.

    Otherwise if the unconditional MaybeScheduleProcessQueue call is needed it would be good have a comment saying what it’s needed for.


    bvbfan commented at 6:23 pm on March 11, 2020:
    Should it be inside EmptyQueue

    ryanofsky commented at 7:04 pm on March 11, 2020:

    Should it be inside EmptyQueue

    I wouldn’t change emptyqueue because it’s called other places which don’t need this logic, and I can’t see how the behavior would fit in there (how you could coherently describe it or change the emptyqueue name to reflect it)


    promag commented at 7:28 pm on March 11, 2020:
    Changed. However if the scheduler thread is stopped after AreThreadsServicingQueue then the function won’t be called right?

    bvbfan commented at 6:28 am on March 12, 2020:
    Test wallet_multiwallet fails due to this code path, see my changes in EmptyQueue, you should schedule process queue when event queue is not empty, the logic is reversed there.

    bvbfan commented at 2:55 pm on March 12, 2020:

    I wouldn’t change emptyqueue because it’s called other places

    Where? It’s called in FlushBackgroundCallbacks only in init.cpp i’m pretty sure it will not affect anything wrong.


    ryanofsky commented at 3:12 pm on March 12, 2020:

    I wouldn’t change emptyqueue because it’s called other places

    Where? It’s called in FlushBackgroundCallbacks only in init.cpp i’m pretty sure it will not affect anything wrong.

    Not saying it’s wrong, saying I don’t understand how it makes sense for what how the EmptyQueue function is named, or what it currently does, or how it’s called other places.

    Maybe your suggestion to change EmptyQueue to schedule tasks instead of running them on the current thread is good, but it doesn’t make sense to me right now, and since no rationale was given and I can’t read your mind, I’m saying I wouldn’t change it here.


    ryanofsky commented at 3:35 pm on March 12, 2020:

    Test wallet_multiwallet fails due to this code path, see my changes in EmptyQueue, you should schedule process queue when event queue is not empty, the logic is reversed there.

    wallet_multiwallet.py is failing on appveyor with “‘unloadwallet’ RPC took longer than 60.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)”

    https://ci.appveyor.com/project/DrahtBot/bitcoin/builds/31405838

    So it looks like there is a problem with the current implementation. But I don’t understand the comment about EmptyQueue or logic being reversed


    bvbfan commented at 3:36 pm on March 12, 2020:

    It makes much more sense compare to current version

     0void SingleThreadedSchedulerClient::EmptyQueue() {
     1    if (!m_pscheduler->AreThreadsServicingQueue())
     2        return;
     3    auto pendingCallbacks = [this]() -> bool {
     4        LOCK(m_cs_callbacks_pending);
     5        return !m_callbacks_pending.empty();
     6    };
     7    bool should_continue = pendingCallbacks();
     8    while (should_continue) {
     9        ProcessQueue();
    10        should_continue = pendingCallbacks();
    11    }
    12}
    
    1. You don’t need to do stupid things, like here one, if (!m_pscheduler->AreThreadsServicingQueue())
    2. You don’t need to check against is there pending callbacks it’s done there, so if you don’t have pending ones you don’t need to enter eventually endless waiting.

    promag commented at 3:40 pm on March 12, 2020:
    Iv’e been trying to reproduce appveyor error wihtout any luck.

    ryanofsky commented at 5:16 pm on March 12, 2020:
    1. You don’t need to do stupid things, like here one, if (!m_pscheduler->AreThreadsServicingQueue())
    2. You don’t need to check against is there pending callbacks it’s done there, so if you don’t have pending ones you don’t need to enter eventually endless waiting. @bvbfan, I’m confused by what you are saying here. You seem to be saying you don’t need two things, but your code sample is doing both those things. Are you saying that for some reason it is better to do (1) in EmptyQueue instead of AddToProcessQueue? If so, what is the reason you think this? To me it seems better to do (1) in AddToProcessQueue instead of EmptyQueue, because AddToProcessQueue has to be aware of thread state while EmptyQueue does not.

    I’m even more confused what you are saying about (2) because you’re saying we don’t need a check for pending callbacks but your code is actually adding an extra check (before the first loop iteration).

    More importantly, I can’t figure out what this suggestion has to do with wallet_multiwallet.py. If you can clarify, it would be much appreciated. I’ve also been trying to reproduce the failure there and look for possible causes and haven’t been able to figure it out.


    bvbfan commented at 5:56 pm on March 12, 2020:

    I’m confused by what you are saying here. You seem to be saying you don’t need two things, but your code sample is doing both those things.

    I mean you don’t need to do these things outside else, i was not clear. These 2 things should be done here in EmptyQueue

    More importantly, I can’t figure out what this suggestion has to do with wallet_multiwallet.py

    0if (m_pscheduler->AreThreadsServicingQueue()) [
    1    MaybeScheduleProcessQueue();
    2} else {
    3    EmptyQueue();
    4}
    

    This code is wrong, you try to process / flush events but it does not do that. Let analyse current code

    0void SingleThreadedSchedulerClient::EmptyQueue() {
    1assert(!m_pscheduler->AreThreadsServicingQueue()); // <- assert (you can avoid that)
    2   bool should_continue = true;
    3   while (should_continue) {
    4       ProcessQueue(); // <--- here is the bigger mistake, it tries to process events before it's checked that has pending callbacks, you can enter this and wait to infinity (since app is shutting down) it should be checked for pendings before process queue
    5       LOCK(m_cs_callbacks_pending);
    6       should_continue = !m_callbacks_pending.empty();
    7   }
    

    ryanofsky commented at 6:31 pm on March 12, 2020:
    0       ProcessQueue(); // <--- here is the bigger mistake, it tries to process events before it's checked that has pending callbacks, you can enter this and wait to infinity (since app is shutting down) it should be checked for pendings before process queue
    

    Explain this more? ProcessQueue doesn’t wait for callbacks if there aren’t any, it just runs a callbacks if there is one.

    Iv’e been trying to reproduce appveyor error wihtout any luck.

    I finally reproduced the problem cutting down the test and running in a loop with –usecli. Looking at the stack trace I see unloadwallet RPC thread waiting for scheduler thread MaybeCompactWalletDB call, which is stuck in ReleaseWallet calling SyncWithValidationInterfaceQueue, which is stuck because SyncWithValidationInterfaceQueue will always hang if called from the scheduler thread. This is basically the same deadlock that was in fanquake’s backtrace from #16307 (comment), which suggests that 2a42fc1bdb393f88d72ec4d5c93b43b2b86dcc91 and 2097e35a635cd58bfc21e8ab149a903d58e94bf2 aren’t going to work as fixes for this problem in their current form. Maybe more can be done to get them working but an alternate approach like 06d2b530e57b88c0542e2bd8db00ca0b034b589d (branch, comment) might be easier at this point

  31. ryanofsky approved
  32. ryanofsky commented at 5:49 pm on March 11, 2020: member
    Code review ACK c9ca6c407adc3bf43a968b6aecd3ea4cbec1ea54. Just suggested scheduler changes since last review
  33. Handle no threads serving in MaybeScheduleProcessQueue
    This is necessary if SyncWithValidationInterfaceQueue is called after
    scheduler thread exists, which happen at shutdown.
    03cd591f47
  34. wallet: UnregisterValidationInterface before SyncWithValidationInterfaceQueue 2097e35a63
  35. promag force-pushed on Mar 11, 2020
  36. ryanofsky approved
  37. ryanofsky commented at 3:04 pm on March 12, 2020: member
    Code review ACK 2097e35a635cd58bfc21e8ab149a903d58e94bf2. Just suggested change clarifying AddToProcessQueue / MaybeScheduleProcessQueue logic since last review
  38. in src/scheduler.cpp:198 in 03cd591f47 outdated
    194@@ -195,7 +195,11 @@ void SingleThreadedSchedulerClient::AddToProcessQueue(std::function<void ()> fun
    195         LOCK(m_cs_callbacks_pending);
    196         m_callbacks_pending.emplace_back(std::move(func));
    197     }
    198-    MaybeScheduleProcessQueue();
    199+    if (m_pscheduler->AreThreadsServicingQueue()) {
    


    ryanofsky commented at 4:04 pm on March 12, 2020:

    In commit “Handle no threads serving in MaybeScheduleProcessQueue” (03cd591f47f852e0a2c2e6262367eff4acefca62)

    Was experimenting and wrote a small test for this change:

    0// Ensure scheduled tasks run even when there are no scheduler threads
    1BOOST_AUTO_TEST_CASE(singlethreadedscheduler_nothreads)
    2{
    3    CScheduler scheduler;
    4    SingleThreadedSchedulerClient client(&scheduler);
    5    bool task_ran = false;
    6    client.AddToProcessQueue([&] { task_ran = true; });
    7    BOOST_CHECK(task_ran);
    8}
    

    Could add this to scheduler_tests.cpp. It fails without this commit and passes with it.

  39. promag commented at 8:46 am on March 13, 2020: member
    Replaced by #18338.
  40. promag closed this on Mar 13, 2020

  41. promag deleted the branch on Mar 13, 2020
  42. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me