wallet: Fix wallet loading race during node start #19876

pull fjahr wants to merge 3 commits into bitcoin:master from fjahr:wallet_race changing 5 files +56 −106
  1. fjahr commented at 8:31 pm on September 4, 2020: contributor

    Intended to fix 19853

    There appears to be a race condition when using the wallet RPCs right after starting a node and that causes the tests to be flaky: in the wallet_basic.py a node is restarted and getbalance gets called right after. getbalance then sometimes reports a balance of 0 because the wallet is not loaded yet. It looks like this only appeared after the zapwallettxs tests were removed from this test but it should not directly be caused by the removal, it rather hides the issue less often. I wouldn’t be surprised if this is a known issue that has popped up in a different context already but I didn’t find anything on it yet.

    My naive solution is to track a ready state for CWallet but I am very much looking for concept feedback on that. Returning Null seems to be the default behavior for wallet RPCs in early failure cases like this. An explicit error might be more helpful for users to figure out what happened but I am opting for consistency for now. Either option is much better than reporting a balance 0 that is incorrect but that could be real and could cause alarm systems to go off because all of a sudden a wallet balance is 0 after a node restarts.

    Whether this ends up being the final solution or not, after the conceptual discussion I think it will make sense to add this check to other wallet RPCs and I will also check if there are other open flaky wallet test issues that might be caused by the same bug.

    To reproduce the issue see the commented out block of code that I left in the test file. This reproduces the issue on ~4 out of 5 tries on my machine.

    EDIT: The actual error that appears in the CI is hit here where node0_balance is 0 and then sendtoaddress is called with a negative amount.

  2. DrahtBot added the label RPC/REST/ZMQ on Sep 4, 2020
  3. DrahtBot added the label Wallet on Sep 4, 2020
  4. jonatack commented at 8:40 pm on September 4, 2020: contributor
    Thanks for looking at this, will review.
  5. fjahr force-pushed on Sep 4, 2020
  6. fjahr force-pushed on Sep 4, 2020
  7. achow101 commented at 9:58 pm on September 4, 2020: member
    We currently track what wallets are currently loading (see CWallet::LoadWallet) to avoid a different race condition. I think we could reuse that for the startup wallet loading. Then in the RPC, GetWalletForJSONRPCRequest should check that the wallet isn’t in the list of loading wallets. If it is, it should raise an exception. I don’t think that returning NullUniValue is the correct way to handle this error.
  8. promag commented at 5:06 pm on September 5, 2020: member
    Just needs a call to syncwithvalidationinterfacequeue before getbalance?
  9. fjahr commented at 6:09 pm on September 5, 2020: contributor

    Just needs a call to syncwithvalidationinterfacequeue before getbalance?

    Yes, this fixes the test. Initially, I was hoping I could fix the test and the underlying race issue in one PR but since the race fix is a bit more complicated I think it makes sense to split it from the test fix, so the CI failures stop as fast as possible. I opened it in #19887 and made you co-author.

  10. promag commented at 7:53 pm on September 5, 2020: member
    Is this really a bug? The wallet is just “catching up” (?)
  11. fjahr commented at 8:01 pm on September 5, 2020: contributor

    Is this really a bug? The wallet is just “catching up” (?)

    Yes, it’s catching up but while it’s doing that it responds with a false balance to a getbalance request, which could mean the wallet has been hacked or it’s broken. Instead, there should be an actual error that indicates that it’s catching up or just anything that does not look like a real balance.

  12. promag commented at 8:30 pm on September 5, 2020: member
    The balance is fine I think, the wallet isn’t aware of the mempool yet right? So, the wallet itself is loaded, but not synced.
  13. fjahr commented at 9:21 pm on September 5, 2020: contributor

    The balance is fine I think, the wallet isn’t aware of the mempool yet right? So, the wallet itself is loaded, but not synced.

    As a user, I would expect that the wallet would report my own change output in the balance even if the mempool is still syncing. But I will have to review that code and maybe I am off on that. The many reports in the CI failure issue in the span of 2 days suggest that this is hit pretty frequently on startup and I think we should minimize the potential of reporting unexpected balances to users. But if others agree that this is not a big deal, I will close it.

  14. maflcko referenced this in commit c91f955f44 on Sep 6, 2020
  15. maflcko commented at 11:08 am on September 6, 2020: member

    When this bug is fixed, it would be good to remove the temporary workarounds. git grep -W syncwithval test. Specifically:

    0test/functional/mempool_persist.py-        # Verify accounting of mempool transactions after restart is correct
    1test/functional/mempool_persist.py:        self.nodes[2].syncwithvalidationinterfacequeue()  # Flush mempool to wallet
    2test/functional/mempool_persist.py-        assert_equal(node2_balance, self.nodes[2].getbalance())
    3
    4test/functional/wallet_basic.py-        # Prevent potential race condition when calling wallet RPCs right after restart
    5test/functional/wallet_basic.py:        self.nodes[0].syncwithvalidationinterfacequeue()
    
  16. rpc, wallet: Ensure wallet has synced with mempool on startup
    This is used to prevent race conditions when using wallet RPCs right after starting a node. A wallet that is not synced with the mempool could temporarily report false balances because it's own transactions with change outputs do not appear to be in the mempool and are not trusted because of that.
    d37756f380
  17. test: Remove obsolete calls to syncwithvalidationinterfacequeue 40433cbf77
  18. rpc, wallet: Use EnsureWalletIsReady in all wallet RPCs c34dd415ca
  19. fjahr force-pushed on Sep 18, 2020
  20. fjahr marked this as ready for review on Sep 18, 2020
  21. fjahr commented at 9:54 pm on September 18, 2020: contributor

    I think this is ready for review now. It got harder to reproduce the issue with my script for some reason I can not explain. But through detailed logging I am certain that the issue is that postInitProcess() has not run yet after startup and thus the transactions (our own with change) that are in the mempool are not marked as in the mempool. This means they are not trusted and don’t get included in getbalance.

    We only want to ensure that the mempool sync runs once on startup. A flag indicating that it has in fact run still seems a reasonable solution after thinking about it a bit more, but happy to consider other approaches. Otherwise, I have improved the implementation a lot by not failing anymore when the issue is encountered and I also gave the flag a more reasonable name. I have combined this with a refactor which let me remove a lot of repetitive code. I have also removed the temporary fixes for the tests which used syncwithvalidationinterfacequeue.

  22. DrahtBot commented at 2:28 am on September 20, 2020: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #20243 (rpc, wallet: Expose wallet id in getwalletinfo RPC output by hebasto)
    • #20205 (wallet: Properly support a wallet id by achow101)
    • #19980 (refactor: Some wallet cleanups by promag)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  23. in src/wallet/wallet.h:739 in d37756f380 outdated
    734      * This lock protects all the fields added by CWallet.
    735      */
    736     mutable RecursiveMutex cs_wallet;
    737 
    738+    /** Indicated the wallet was initially synced with the mempool. */
    739+    void setMempoolSynced() { m_mempool_synced = true; };
    


    promag commented at 10:34 pm on October 13, 2020:

    d37756f380b27d259a7818078ddf02caa0a4578a

    Should be private?

  24. in src/wallet/rpcwallet.cpp:88 in d37756f380 outdated
    83+    // Make sure the results are valid at least up to the most recent block
    84+    // the user could have gotten from another RPC command prior to now
    85+    pwallet->BlockUntilSyncedToCurrentChain();
    86+
    87+    // Ensure that the mempool has been synced at least once on startup
    88+    if (!pwallet->mempoolSynced()) pwallet->syncMempool();
    


    promag commented at 10:35 pm on October 13, 2020:

    d37756f380b27d259a7818078ddf02caa0a4578a

    Just call pwallet->syncMempool() and early return there - no need to expose mempool synced flag.

  25. in src/wallet/wallet.h:729 in d37756f380 outdated
    724@@ -725,13 +725,22 @@ class CWallet final : public WalletStorage, public interfaces::Chain::Notificati
    725 
    726     bool CreateTransactionInternal(const std::vector<CRecipient>& vecSend, CTransactionRef& tx, CAmount& nFeeRet, int& nChangePosInOut, bilingual_str& error, const CCoinControl& coin_control, bool sign);
    727 
    728+    /** Wallet was initially synced with the mempool. */
    729+    bool m_mempool_synced{false};
    


    promag commented at 10:36 pm on October 13, 2020:

    d37756f380b27d259a7818078ddf02caa0a4578a

    Make it atomic? Currently accessed in multiple threads.

  26. in src/wallet/rpcwallet.cpp:80 in d37756f380 outdated
    76@@ -77,6 +77,20 @@ static bool ParseIncludeWatchonly(const UniValue& include_watchonly, const CWall
    77 }
    78 
    79 
    80+static CWallet* GetReadyWallet(std::shared_ptr<CWallet> const wallet) {
    


    promag commented at 10:43 pm on October 13, 2020:

    d37756f380b27d259a7818078ddf02caa0a4578a

    NAK the dereferencing because the wallet can be destroyed after the call, like

    0CWallet* const pwallet = GetReadyWallet(GetWalletForJSONRPCRequest(request));
    

    Also it’s not clear the function does the dereference.


    maflcko commented at 2:26 pm on March 21, 2022:

    Maybe the following signature might be better?

    0 static CWallet& GetReadyWallet(CWallet& wallet)
    1{
    
  27. promag commented at 10:45 pm on October 13, 2020: member
    Why not ensure wallet is ready on load/startup? No changes would be needed on RPC code.
  28. DrahtBot commented at 0:33 am on December 2, 2020: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  29. DrahtBot added the label Needs rebase on Dec 2, 2020
  30. bitcoin deleted a comment on Jan 19, 2021
  31. PastaPastaPasta referenced this in commit 8de6694794 on Jun 27, 2021
  32. PastaPastaPasta referenced this in commit 8dab9f488d on Jun 28, 2021
  33. PastaPastaPasta referenced this in commit 201c606366 on Jun 29, 2021
  34. PastaPastaPasta referenced this in commit b88d413046 on Jul 1, 2021
  35. PastaPastaPasta referenced this in commit 843e1d573c on Jul 1, 2021
  36. PastaPastaPasta referenced this in commit 7b796a1e1b on Jul 15, 2021
  37. PastaPastaPasta referenced this in commit 76eb5e3a03 on Jul 16, 2021
  38. Fabcien referenced this in commit 314bdf8203 on Sep 23, 2021
  39. DrahtBot commented at 11:21 am on December 15, 2021: contributor
    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  40. DrahtBot commented at 1:07 pm on March 21, 2022: contributor
    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  41. achow101 commented at 6:15 pm on October 12, 2022: member
    Closing this as it has not had any activity in a while. If you are interested in continuing work on this, please leave a comment so that it can be reopened.
  42. achow101 closed this on Oct 12, 2022

  43. bitcoin locked this on Oct 12, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-06 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me