Batch write imported stuff in importmulti #15741

pull achow101 wants to merge 6 commits into bitcoin:master from achow101:batch-write-importmulti changing 5 files +163 −89
  1. achow101 commented at 11:18 pm on April 3, 2019: member

    Instead of writing each item to the wallet database individually, do them in batches so that the import runs faster.

    This was tested by importing a ranged descriptor for 10,000 keys.

    Current master

    0$ time src/bitcoin-cli -regtest -rpcwallet=importbig importmulti '[{"desc": "sh(wpkh([73111820/44h/1h/0h]tpubDDoT2SgEjaU5rerQpfcRDWPAcwyZ5g7xxHgVAfPwidgPDKVjm89d6jJ8AQotp35Np3m6VaysfUY1C2g68wFqUmraGbzhSsMF9YBuTGxpBaW/1/*))#3w7php47", "range": [0, 10000], "timestamp": "now", "internal": true, "keypool": false, "watchonly": true}]'
    1...
    2
    3real	7m45.29s
    

    This PR:

    0$ time src/bitcoin-cli -regtest -rpcwallet=importbig4 importmulti '[{"desc": "pkh([73111820/44h/1h/0h]tpubDDoT2SgEjaU5rerQpfcRDWPAcwyZ5g7xxHgVAfPwidgPDKVjm89d6jJ8AQotp35Np3m6VaysfUY1C2g68wFqUmraGbzhSsMF9YBuTGxpBaW/1/*)#v65yjgmc", "range": [0, 10000], "timestamp": "now", "internal": true, "keypool": false, "watchonly": true}]'
    1...
    2
    3real	3.93s
    

    Fixes #15739

  2. fanquake added the label RPC/REST/ZMQ on Apr 3, 2019
  3. fanquake added the label Wallet on Apr 3, 2019
  4. gwillen commented at 11:30 pm on April 3, 2019: contributor
    utACK although it would be good to factor out the counting into a WalletWriteCache or something, that flushes itself at intervals so you don’t have to do it manually.
  5. in src/wallet/rpcdump.cpp:1290 in 508c486064 outdated
    1286@@ -1282,33 +1287,46 @@ static UniValue ProcessImport(CWallet * const pwallet, const UniValue& data, con
    1287             }
    1288              const CPubKey& pubkey = entry->second;
    1289              CPubKey temp;
    1290-             if (!pwallet->GetPubKey(id, temp) && !pwallet->AddWatchOnly(GetScriptForRawPubKey(pubkey), timestamp)) {
    1291+             if (!pwallet->GetPubKey(id, temp) && !pwallet->AddWatchOnlyWithDB(*batch, GetScriptForRawPubKey(pubkey), timestamp)) {
    


    gwillen commented at 11:31 pm on April 3, 2019:
    There is a weird indentation glitch here, although it looks like it was preexisting.
  6. in src/wallet/rpcdump.cpp:1329 in 508c486064 outdated
    1328             ExtractDestination(script, dest);
    1329             if (!internal && IsValidDestination(dest)) {
    1330                 pwallet->SetAddressBook(dest, label, "receive");
    1331             }
    1332         }
    1333+        batch.reset();
    


    gwillen commented at 11:32 pm on April 3, 2019:
    Is this necessary? It’s about to go out of scope, right?

    achow101 commented at 6:12 pm on April 4, 2019:
    Probably not. I did that just as a belt-and-suspenders thing.
  7. DrahtBot commented at 1:10 am on April 4, 2019: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #15876 ([rpc] signer bump fee by Sjors)
    • #15761 (Replace -upgradewallet startup option with upgradewallet RPC by achow101)
    • #15424 ([wallet] wallet-tool: command to remove key metadata by Sjors)
    • #15414 ([wallet] allow adding pubkeys from imported private keys to keypool by Sjors)
    • #14912 ([WIP] External signer support (e.g. hardware wallet) by Sjors)
    • #13756 (wallet: “avoid_reuse” wallet flag for improved privacy by kallewoof)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  8. Empact commented at 9:18 am on April 4, 2019: member

    https://github.com/Empact/bitcoin/commit/dd5b0aaf58d843c2532b3ca1829bf90482e9dc52 CWallet::AddKeyOrigin(const CPubKey& pubkey, const KeyOriginInfo& info) and CWallet::WriteKeyMetadata(const CKeyMetadata& meta, const CPubKey& pubkey, const bool overwrite) can be removed.

    https://github.com/Empact/bitcoin/commit/c5d9eec6529cb2ce185be92d199ac50790a400ff How about moving this functionality into CWallet, so that we don’t operate externally on CWallet::database any more than we already do. As the GetDBHandle comments say: https://github.com/bitcoin/bitcoin/blob/ba54342c9dd3f2e5cdeed9ac57f1924f0d885cc6/src/wallet/wallet.h#L750-L753

    Practically speaking, this also explicitly limits the scope of the batch operations on database to within the lifecycle of the CWallet instance.

    https://github.com/Empact/bitcoin/commit/b363aa785eb2d2cfb95a69f6cb1ddb6ebe89db18 Could also apply the batch treatment to SetAddressBook

  9. in src/wallet/wallet.h:1227 in 508c486064 outdated
    1223@@ -1222,6 +1224,7 @@ class CWallet final : public CCryptoKeyStore, private interfaces::Chain::Notific
    1224 
    1225     /** Add a KeyOriginInfo to the wallet */
    1226     bool AddKeyOrigin(const CPubKey& pubkey, const KeyOriginInfo& info);
    1227+    bool AddKeyOriginWithDB(WalletBatch& batch, const CPubKey& pubkey, const KeyOriginInfo& info);
    


    Empact commented at 9:52 am on April 4, 2019:
    nit: May be a bad example to follow, but AddKeypoolPubkeyWithDB has batch as the last argument, which makes as much sense to me.

    achow101 commented at 6:12 pm on April 4, 2019:
    I was following the AddKeyPubKeyWithDB way of argument ordering.
  10. Empact commented at 9:53 am on April 4, 2019: member
    Concept ACK
  11. in src/wallet/rpcdump.cpp:1279 in 508c486064 outdated
    1275-             if (!pwallet->HaveKey(id) && !pwallet->AddKeyPubKey(key, pubkey)) {
    1276+             if (!pwallet->HaveKey(id) && !pwallet->AddKeyPubKeyWithDB(*batch, key, pubkey)) {
    1277                  throw JSONRPCError(RPC_WALLET_ERROR, "Error adding key to wallet");
    1278              }
    1279              pwallet->UpdateTimeFirstKey(timestamp);
    1280+             if (++cnt % 1000 == 0) {
    


    promag commented at 12:57 pm on April 4, 2019:
    What is the problem of one batch only?

    achow101 commented at 6:11 pm on April 4, 2019:
    The batch can get very large in memory if lots of keys are involved, so we limit the size to avoid blowing up memory during imports.
  12. promag commented at 1:00 pm on April 4, 2019: member
    Concept ACK, nice improvement.
  13. MarcoFalke commented at 5:46 pm on April 4, 2019: member

    See also:

    • Use a single wallet batch for UpgradeKeyMetadata #15433
  14. achow101 force-pushed on Apr 4, 2019
  15. achow101 commented at 6:10 pm on April 4, 2019: member
    I’ve pulled in @Empact’s changes (Empact@dd5b0aa was squashed into 59ec41257f37f2a0438b10ef9776bf7ea3488e9a, cherry-picked the other two commits).
  16. gwillen commented at 10:59 pm on April 4, 2019: contributor
    Especially if there are now multiple places we’re manually batching writes like this, and more where we’re considering it: Let me suggest again and more strongly that the batch object should be handling autoflushing after every X calls, instead of open-coding it in the caller like this. If it’s appropriate for every caller, it could go into WalletBatch, else into a new class that’s an extremely thin wrapper around it. I bet the total number of lines added in this PR would actually go down.
  17. Sjors commented at 1:32 am on April 5, 2019: member
    Concept ACK. I tried the first two commits (so without address book change) with #14145 and importing 2x 1000 keys seems about 6 times faster.
  18. Empact commented at 4:09 am on April 5, 2019: member
    @gwillen Agree it should be done but would be appropriate as a follow-up
  19. practicalswift commented at 11:45 am on April 5, 2019: contributor
    Looks like AddKeypoolPubkey is no longer used and can be removed?
  20. achow101 force-pushed on Apr 5, 2019
  21. achow101 commented at 7:24 pm on April 5, 2019: member

    I’ve implemented @gwillen’s suggestion. WalletBatch will flush to disk automatically after every 1000 writes. This appears to be faster too as it does not close and reopen the database on every flush. With this change, I’ve also updated UpgradeKeyMetadata.

    Looks like AddKeypoolPubkey is no longer used and can be removed?

    Done

  22. achow101 force-pushed on Apr 5, 2019
  23. in src/wallet/wallet.cpp:409 in ebb0ea8553 outdated
    395-                }
    396+                batch.WriteKeyMetadata(meta, pubkey, true);
    397             }
    398         }
    399     }
    400-    batch.reset(); //write before setting the flag
    


    MarcoFalke commented at 8:29 pm on April 5, 2019:

    in commit ebb0ea85539b3e2f515fca89231c20ac86fb52d5:

    Why is this no longer needed?


    achow101 commented at 9:44 pm on April 5, 2019:
    When batch goes out of scope, it will write. This was just a belt and suspenders.

    gwillen commented at 9:53 pm on April 5, 2019:
    The comment suggests that in this case – unlike the batch write in importmulti – there’s a specific reason to force it to write before the end of the scope.

    achow101 commented at 10:42 pm on April 5, 2019:
    reverted
  24. achow101 commented at 9:50 pm on April 5, 2019: member
    I’m unsure why tests are currently failing.
  25. achow101 force-pushed on Apr 5, 2019
  26. in src/wallet/walletdb.h:162 in f2b9b32a63 outdated
    156@@ -157,6 +157,9 @@ class WalletBatch
    157             return false;
    158         }
    159         m_database.IncrementUpdateCounter();
    160+        if (m_database.nUpdateCounter % 1000 == 0) {
    


    promag commented at 10:08 pm on April 5, 2019:
    I get that this makes the code simple, but IMO this defeats the batch purpose. Like, it shouldn’t be something to worry about here.

    gwillen commented at 10:28 pm on April 5, 2019:

    I don’t understand what you mean by ‘defeats the batch purpose’. Are there cases where the caller is depending on the batch not being flushed?

    I think it is far better engineering practice to deal with periodic flushing once, inside the batch-management code, than to deal with it independently at each callsite. If there’s some situation where periodic flushing is undesirable, I think it would be better to offer both versions.


    promag commented at 1:50 pm on April 8, 2019:
    @gwillen I mean you should be able to create a batch with 1001 changes no? I was suggesting something on top of WalletBatch to do that.

    gwillen commented at 4:15 pm on April 8, 2019:
    Well, you can add as many changes as you want, it just flushes after every 1000. This should be completely transparent to the user. Unless there is some reason that flushing can hurt, like the caller is using the batch as though it were an atomic transaction?

    achow101 commented at 4:58 pm on April 8, 2019:

    Unless there is some reason that flushing can hurt, like the caller is using the batch as though it were an atomic transaction?

    There is only one place where an atomic transaction is used, and if an atomic transaction is active, then Flush() will do nothing.


    Empact commented at 9:07 am on April 10, 2019:
    How about add an optional argument to WalletBatch that indicates the # of updates to flush after? This makes the operation opt-in/explicit and also allows for optionally adjusting the flushes to the number of operations - e.g. 2x600 rather than 1000 and 200, if that’s desirable.

    promag commented at 2:24 pm on April 14, 2019:
    Well I may be wrong but I see the batch as an atomic transaction. Making 1500 updates and then aborting the batch should not commit 1000 updates IMO.

    achow101 commented at 3:39 pm on April 14, 2019:

    Well I may be wrong but I see the batch as an atomic transaction. Making 1500 updates and then aborting the batch should not commit 1000 updates IMO.

    Except that’s not how the batch works normally. Batches are not atomic unless you explicitly make them so by starting a transaction with the underlying BerkeleyBatch. In that case, then the flushing won’t do anything. In the normal case, each update in a batch itself is atomic and should be expected to be written to the wallet file. There is no aborting. As soon as a batch goes out of scope, every single update is committed to the file. If you make 1500 updates, the first 1000 will be written after you do the 1000th write operation, and then the last 500 will be written when the batch object is deleted.


    Sjors commented at 4:31 pm on May 16, 2019:
    So is the description of WalletBatch out of date? It says This represents a single transaction at the database. It will be committed when the object goes out of scope.

    achow101 commented at 4:21 pm on May 18, 2019:
    I believe that is out of date

    achow101 commented at 4:30 pm on May 18, 2019:
    I’ve updated the comment.
  27. in src/wallet/db.cpp:617 in f2b9b32a63 outdated
    613@@ -614,7 +614,9 @@ void BerkeleyBatch::Flush()
    614     if (fReadOnly)
    615         nMinutes = 1;
    616 
    617-    env->dbenv->txn_checkpoint(nMinutes ? gArgs.GetArg("-dblogsize", DEFAULT_WALLET_DBLOGSIZE) * 1024 : 0, nMinutes, 0);
    618+    if (env) {
    


    promag commented at 10:14 pm on April 5, 2019:
    Why this change?

    achow101 commented at 10:29 pm on April 5, 2019:
    So that tests pass. When there’s a dummy database, env will be nulltpr and this causes a segfault in tests which use a dummy database.

    promag commented at 1:48 pm on April 8, 2019:
    Can’t we make env is defined instead? Otherwise a comment would make sense as this doesn’t really happen.

    achow101 commented at 5:04 pm on April 8, 2019:

    Can’t we make env is defined instead?

    No. env is nullptr for dummy databases which are used in tests. A check like this is used in many other places in this file in the form of BerkeleyDatabase::IsDummy().

    Otherwise a comment would make sense as this doesn’t really happen.

    Done.


    Empact commented at 8:50 am on April 10, 2019:
    nit: could deal with it earlier (i.e. early return), and use IsDummy to make it more explicit.

    achow101 commented at 3:52 pm on April 14, 2019:
    IsDummy is done by BerkeleyDatabase, not BerkeleyBatch so is not available to do here (although the check is completely identical).
  28. achow101 force-pushed on Apr 5, 2019
  29. jnewbery added this to the "Blockers" column in a project

  30. achow101 force-pushed on Apr 8, 2019
  31. DrahtBot added the label Needs rebase on Apr 9, 2019
  32. achow101 force-pushed on Apr 9, 2019
  33. DrahtBot removed the label Needs rebase on Apr 9, 2019
  34. in src/wallet/wallet.cpp:405 in cc707d5e12 outdated
    388@@ -390,10 +389,6 @@ void CWallet::UpgradeKeyMetadata()
    389             CPubKey pubkey;
    390             if (GetPubKey(meta_pair.first, pubkey)) {
    391                 batch->WriteKeyMetadata(meta, pubkey, true);
    392-                if (++cnt % 1000 == 0) {
    393-                    // avoid creating overlarge in-memory batches in case the wallet contains large amounts of keys
    394-                    batch.reset(new WalletBatch(*database));
    395-                }
    


    Empact commented at 8:56 am on April 10, 2019:
    nit: could squash this into the previous

    achow101 commented at 3:51 pm on April 14, 2019:
    Done
  35. in src/wallet/wallet.cpp:3374 in a14a5ba2f9 outdated
    3408-{
    3409-    WalletBatch batch(*database);
    3410-    AddKeypoolPubkeyWithDB(pubkey, internal, batch);
    3411-    NotifyCanGetAddressesChanged();
    3412-}
    3413-
    



    achow101 commented at 3:51 pm on April 14, 2019:
    Done
  36. achow101 force-pushed on Apr 14, 2019
  37. DrahtBot added the label Needs rebase on Apr 17, 2019
  38. laanwj removed this from the "Blockers" column in a project

  39. achow101 force-pushed on Apr 20, 2019
  40. DrahtBot removed the label Needs rebase on Apr 22, 2019
  41. in src/wallet/rpcdump.cpp:1299 in 68d2fa32df outdated
    1296             pwallet->mapKeyMetadata[id].nCreateTime = timestamp;
    1297 
    1298             // Add to keypool only works with pubkeys
    1299             if (add_keypool) {
    1300-                pwallet->AddKeypoolPubkey(pubkey, internal);
    1301+                pwallet->AddKeypoolPubkeyWithDB(pubkey, internal, batch);
    


    Sjors commented at 4:10 pm on May 16, 2019:
    NotifyCanGetAddressesChanged() (I already tested that fixes QT receive screen)

    achow101 commented at 4:30 pm on May 18, 2019:
    Done
  42. Sjors changes_requested
  43. Sjors commented at 5:08 pm on May 16, 2019: member

    tACK 396d065 modulo notification bug (inline)

    I’m particularly happy that this moves some complexity out of the RPC codebase.

    Importing 1000 pubkeys on an iMac with 3,4 GHz Intel Core i5 goes from 25 to 6 seconds.

  44. laanwj added this to the "Blockers" column in a project

  45. promag commented at 10:05 am on May 18, 2019: member
    @achow101 looks like CWallet::UnsetWalletFlag should reuse the batch instance: Please test/banchmark with 76c3906d751442ddb1b9bf1662d3e2effb53e950 (https://github.com/promag/bitcoin/tree/pr15741), at least here it gives a significant improvement.
  46. jonasschnelli commented at 10:14 am on May 18, 2019: contributor
    Concept ACK
  47. jb55 commented at 2:15 pm on May 18, 2019: member
    @promag’s patch brings my import speeds down from 8 minutes to about 10 seconds for 10000 keys
  48. achow101 force-pushed on May 18, 2019
  49. achow101 commented at 4:30 pm on May 18, 2019: member
    I’ve pulled in @promag’s patch and updated it to match the convention we’ve been using for passing in a WalletBatch.
  50. achow101 force-pushed on May 18, 2019
  51. promag commented at 4:37 pm on May 18, 2019: member
    @achow101 please update times in OP.
  52. Add AddWatchOnlyWithDB, AddKeyOriginWithDB, AddCScriptWithDB functions
    AddWatchOnlyWithDB, AddKeyOriginWithDB, and AddCScriptWithDB add their
    respective data to the wallet using the provided WalletBatch instead
    of creating a new WalletBatch object every time. This allows for batching
    writes to the database.
    366fe0be0b
  53. Have WalletBatch automatically flush every 1000 updates
    Since it now automatically flushes, we don't need to have
    UpgradeKeyMetadata count and flush separately
    d6576e349e
  54. Batch writes for importmulti
    When writing all of the imported data to the wallet, use a common
    WalletBatch object so that batch writes are done and the writes
    finish more quickly.
    
    AddKeypoolPubkey is no longer needed so it is also removed
    ccb26cf347
  55. achow101 force-pushed on May 18, 2019
  56. achow101 commented at 5:21 pm on May 18, 2019: member

    I noticed that importing scripts would also be slow so I’ve added AddCScriptWIthDB in order to apply the same batching performance boost to such imports.

    Also updated OP.

  57. jb55 commented at 6:52 am on May 19, 2019: member
    tACK 87f7befca04a095a06af003f9fe20d2e3c155d2f sooo fast. tested a rescan with my all my trezor keys from HWI and it worked great. thanks @achow101, hw wallet + core is actually usable now (at least for watching :)!
  58. promag commented at 5:06 pm on May 19, 2019: member

    Restarted travis, it was failing with 😕

     047/121 - rpc_psbt.py failed, Duration: 5 s
     1stdout:
     22019-05-18T17:46:43.670000Z TestFramework (INFO): Initializing test directory /tmp/test_runner_₿_🏃_20190518_173555/rpc_psbt_71
     32019-05-18T17:46:47.538000Z TestFramework (ERROR): Assertion failed
     4Traceback (most recent call last):
     5  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/test_framework.py", line 194, in main
     6    self.run_test()
     7  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/rpc_psbt.py", line 156, in run_test
     8    assert_raises_rpc_error(-22, "", self.nodes[0].converttopsbt, signedtx['hex'])
     9  File "/home/travis/build/bitcoin/bitcoin/build/bitcoin-x86_64-unknown-linux-gnu/test/functional/test_framework/util.py", line 105, in assert_raises_rpc_error
    10    assert try_rpc(code, message, fun, *args, **kwds), "No exception raised"
    11AssertionError: No exception raised
    
  59. promag commented at 8:36 pm on May 19, 2019: member
    utACK 87f7bef. nit, ampersand should be before space.
  60. in src/wallet/wallet.h:1207 in 87f7befca0 outdated
    1203@@ -1204,6 +1204,7 @@ class CWallet final : public CCryptoKeyStore, private interfaces::Chain::Notific
    1204 
    1205     /** Unsets a single wallet flag */
    1206     void UnsetWalletFlag(uint64_t flag);
    1207+    void UnsetWalletFlagWithDB(WalletBatch& batch, uint64_t flag);
    


    Empact commented at 9:54 pm on May 27, 2019:
    nit: could be private

    achow101 commented at 3:04 pm on May 28, 2019:
    Done
  61. in src/wallet/wallet.h:886 in 87f7befca0 outdated
    882@@ -872,6 +883,7 @@ class CWallet final : public CCryptoKeyStore, private interfaces::Chain::Notific
    883     //! Adds an encrypted key to the store, without saving it to disk (used by LoadWallet)
    884     bool LoadCryptedKey(const CPubKey &vchPubKey, const std::vector<unsigned char> &vchCryptedSecret);
    885     bool AddCScript(const CScript& redeemScript) override;
    886+    bool AddCScriptWithDB(WalletBatch& batch, const CScript& script);
    


    Empact commented at 9:56 pm on May 27, 2019:
    nit: could be private

    achow101 commented at 3:04 pm on May 28, 2019:
    Done
  62. Move some of ProcessImport into CWallet::Import*
    This maintains encapsulation of CWallet::database in the face of
    batching, e.g. allows making the `WithDB` methods private.
    6154a09e01
  63. Apply the batch treatment to CWallet::SetAddressBook via ImportScriptPubKeys 6cb888b37d
  64. wallet: Pass WalletBatch to CWallet::UnsetWalletFlag 0db94e55dc
  65. achow101 force-pushed on May 28, 2019
  66. jb55 commented at 9:40 pm on May 28, 2019: member
    utACK 0db94e5
  67. ariard commented at 2:16 am on May 29, 2019: member

    Tested ACK 0db94e5

    Drop import time for 10000 keys ranged descriptor from 495.57s to 3.626s with 2.8 GHz Intel Core i7.

  68. Empact commented at 3:05 am on May 29, 2019: member
    re-utACK https://github.com/bitcoin/bitcoin/pull/15741/commits/0db94e55dcbbfc741df463c404818d9033b4fff1 only change is re the privacy of UnsetWalletFlagWithDB and AddCScriptWithDB.
  69. meshcollider merged this on May 29, 2019
  70. meshcollider closed this on May 29, 2019

  71. meshcollider referenced this in commit ed40fbb02a on May 29, 2019
  72. meshcollider removed this from the "Blockers" column in a project

  73. sidhujag referenced this in commit be1228c147 on May 30, 2019
  74. MarcoFalke referenced this in commit dbf4f3f86a on Jul 26, 2019
  75. konez2k referenced this in commit b55323b771 on Jul 27, 2019
  76. deadalnix referenced this in commit 8792ff7cef on Jun 5, 2020
  77. kittywhiskers referenced this in commit c7d1af61b5 on Dec 4, 2021
  78. kittywhiskers referenced this in commit e51ca247d8 on Dec 8, 2021
  79. kittywhiskers referenced this in commit 002ceee795 on Dec 8, 2021
  80. kittywhiskers referenced this in commit 05a4924256 on Dec 12, 2021
  81. kittywhiskers referenced this in commit 7aca6750f3 on Dec 13, 2021
  82. kittywhiskers referenced this in commit 8e764eb64e on Dec 13, 2021
  83. kittywhiskers referenced this in commit 4cc7405afe on Dec 13, 2021
  84. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-18 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me