wallet: reduce loading time by using unordered maps #16910

pull achow101 wants to merge 6 commits into bitcoin:master from achow101:reduce-wallet-load changing 15 files +208 −92
  1. achow101 commented at 9:16 pm on September 18, 2019: member

    For large wallets with many transactions and keys, inserting and fetching from a std::map can have a performance impact. Since we largely do not rely on the sorting properties of std::map, we can use std::unordered_map instead which is a hash table and insertion and retrieval are constant time. We also use std::unordered_multimap for some things that were std::multimap.

    The changed maps are:

    • mapWallet: Map of all transactions
    • mapTxSpends: Map of outpoints to the txids of the txs that spent them *mapKeyMetadata: Map of key metadata *mapScriptMetadata: Map of script metadata *mapKeys: Map of private keys *mapCryptedKeys: Map of encrypted private keys *mapWatchKeys: Set of watch only keys. *setWatchOnly: Set of watch only scripts (std::unordered_set, not a map)

    Change mapWallet and mapTxSpends did require a few other changes and thus they are in their own commits. Additionally, the GUI relied on getting transactions from the wallet in a sorted order which unordered_map does not provide, so the commit “Change getWalletTxs to return a set instead of a vector” is needed in order to put all of the txs into a std::set (which is ordered) instead of a vector in order to retain the same behavior.

    mapTxSpends also relied on the sorted order to have some quick lookups, but these were changed to just do normal lookups over the same thing. It should be just as fast, or even faster, since std::unordered_map is a hash table.

    The hash function used for these unordered maps and sets is SipHash, using the SipHash module that we already have. SaltedTxidHasher and SaltedOutPointHasher were moved from their original locations in utxo set and validation code in order to also be used from the wallet. Additionally SaltedIDHasher was added to hash uint160s (i.e. CKeyID and CScriptID) and SaltedScriptHasher was added to hash CScripts.

    I did some becnhmarks with a large wallet I created on regtest using this script. This wallet is 169 MB in size with 117035 transactions and 103264 keys. It took ~20 secs to load on master, and ~18 secs with this change. So this change reduces wallet loading time by ~10%.

  2. DrahtBot added the label Mempool on Sep 18, 2019
  3. DrahtBot added the label Tests on Sep 18, 2019
  4. DrahtBot added the label UTXO Db and Indexes on Sep 18, 2019
  5. DrahtBot added the label Wallet on Sep 18, 2019
  6. achow101 force-pushed on Sep 18, 2019
  7. achow101 commented at 9:36 pm on September 18, 2019: member
    Shuffled some things around to get rid of the circular dependency. Also fixed a typo
  8. promag commented at 9:47 pm on September 18, 2019: member
    Concept ACK, especially improvements targeting big wallets (either lots of keys or lots of transactions).
  9. jonatack commented at 9:50 pm on September 18, 2019: member
    Concept ACK, will review and run valgrind+massif visualizer and flamegraphs to compare. Anywhere we can get a big wallet file? EDIT: Saw the script you posted in the PR body above… running.
  10. achow101 commented at 10:09 pm on September 18, 2019: member

    To benchmark this, I applied this commit: https://github.com/achow101/bitcoin/commit/16ebe1362eb8b35c42b61f73bf844bb29b29fdf5. It adds logging of the load time and also stops bitcoind after the wallet has been loaded in order to get more consistent results.

    Here are a couple graphs that may be interesting to people.

    Flame graph (svg hosted on my website, couldn’t inline in this comment) of master: https://achow101.com/bitcoin-master-flame.svg

    Flame graph of this PR: https://achow101.com/bitcoin-reduce-wallet-load-flame.svg

    Massif (heap profiler) memory usage over time on master: image

    Massif memory usage over time on this PR: image

  11. in src/wallet/wallet.cpp:752 in 4ed9dd6ffb outdated
    748@@ -749,7 +749,7 @@ std::set<uint256> CWallet::GetConflicts(const uint256& txid) const
    749 bool CWallet::HasWalletSpend(const uint256& txid) const
    750 {
    751     AssertLockHeld(cs_wallet);
    752-    auto iter = mapTxSpends.lower_bound(COutPoint(txid, 0));
    753+    auto iter = mapTxSpends.find(COutPoint(txid, 0));
    


    ryanofsky commented at 11:27 pm on September 18, 2019:
    Looks like a bug here. Previously it would return true if any output was spent, now it will only return true if the first output was spent. Could fix pretty easily by passing const CTransaction& instead of txid, and looping over the outputs.

    achow101 commented at 0:31 am on September 19, 2019:
    Done

    hebasto commented at 9:59 am on July 2, 2020:

    Looks like a bug here. Previously it would return true if any output was spent, now it will only return true if the first output was spent. Could fix pretty easily by passing const CTransaction& instead of txid, and looping over the outputs.

    A suggestion for a followup: it could be useful to add a test for it to prevent regressions in the future.

  12. achow101 force-pushed on Sep 19, 2019
  13. achow101 force-pushed on Sep 19, 2019
  14. achow101 force-pushed on Sep 19, 2019
  15. achow101 force-pushed on Sep 19, 2019
  16. fanquake removed the label Tests on Sep 19, 2019
  17. DrahtBot commented at 0:55 am on September 19, 2019: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #19671 (wallet: Remove -zapwallettxes by achow101)
    • #19653 (wallet: Replace -zapwallettxes with zapwallettxes RPC by achow101)
    • #19289 (wallet: GetWalletTx and IsMine require cs_wallet lock by promag)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  18. achow101 commented at 10:11 pm on September 19, 2019: member
    I did 1000 runs of my wallet loading benchmark. The mean loading time on master was 18408.14423 ms and the mean loading time on with this PR was 17498.71984. This is ~1 second faster, which is a ~5% performance increase with this PR. I also did a 2 sample t-test to check that these results are statistically significant, and it appears that they are.
  19. achow101 force-pushed on Sep 19, 2019
  20. DrahtBot added the label Needs rebase on Sep 30, 2019
  21. achow101 force-pushed on Sep 30, 2019
  22. DrahtBot removed the label Needs rebase on Sep 30, 2019
  23. laanwj added the label Resource usage on Oct 2, 2019
  24. DrahtBot added the label Needs rebase on Oct 15, 2019
  25. achow101 force-pushed on Oct 16, 2019
  26. achow101 force-pushed on Oct 21, 2019
  27. DrahtBot removed the label Needs rebase on Oct 21, 2019
  28. DrahtBot added the label Needs rebase on Oct 29, 2019
  29. achow101 force-pushed on Oct 29, 2019
  30. DrahtBot removed the label Needs rebase on Oct 29, 2019
  31. DrahtBot added the label Needs rebase on Nov 4, 2019
  32. achow101 force-pushed on Nov 4, 2019
  33. DrahtBot removed the label Needs rebase on Nov 4, 2019
  34. DrahtBot added the label Needs rebase on Nov 8, 2019
  35. achow101 force-pushed on Nov 8, 2019
  36. DrahtBot removed the label Needs rebase on Nov 8, 2019
  37. DrahtBot added the label Needs rebase on Nov 22, 2019
  38. achow101 force-pushed on Nov 22, 2019
  39. DrahtBot removed the label Needs rebase on Nov 22, 2019
  40. DrahtBot added the label Needs rebase on Dec 12, 2019
  41. achow101 force-pushed on Dec 12, 2019
  42. DrahtBot removed the label Needs rebase on Dec 12, 2019
  43. achow101 force-pushed on Dec 12, 2019
  44. elichai commented at 5:43 pm on December 15, 2019: contributor
    Concept ACK. I’m just playing with replacing a couple std::set with std::unordered_set and std::map with std::unordered_map and I also need access to a more public SaltedOutpointHasher.
  45. DrahtBot added the label Needs rebase on Jan 30, 2020
  46. achow101 force-pushed on Jan 30, 2020
  47. DrahtBot removed the label Needs rebase on Jan 30, 2020
  48. DrahtBot added the label Needs rebase on Feb 3, 2020
  49. achow101 force-pushed on Feb 11, 2020
  50. DrahtBot removed the label Needs rebase on Feb 11, 2020
  51. DrahtBot added the label Needs rebase on Mar 9, 2020
  52. achow101 force-pushed on Mar 10, 2020
  53. DrahtBot removed the label Needs rebase on Mar 10, 2020
  54. DrahtBot added the label Needs rebase on Apr 22, 2020
  55. achow101 force-pushed on Apr 22, 2020
  56. DrahtBot removed the label Needs rebase on Apr 22, 2020
  57. DrahtBot added the label Needs rebase on May 4, 2020
  58. achow101 force-pushed on May 4, 2020
  59. DrahtBot removed the label Needs rebase on May 4, 2020
  60. DrahtBot added the label Needs rebase on May 6, 2020
  61. achow101 force-pushed on May 12, 2020
  62. DrahtBot removed the label Needs rebase on May 12, 2020
  63. hebasto commented at 2:33 pm on June 27, 2020: member
    Concept ACK on using hashed containers.
  64. hebasto commented at 2:38 pm on June 27, 2020: member
    Why not using auto for iterators type? It could improve readability a bit :)
  65. in src/saltedhash.h:57 in 28dddb91b9 outdated
    49@@ -50,4 +50,28 @@ class SaltedOutpointHasher
    50     }
    51 };
    52 
    53+class SaltedIDHasher
    54+{
    55+private:
    56+    /** Salt */
    57+    uint64_t k0, k1;
    


    hebasto commented at 2:52 pm on June 27, 2020:

    28dddb91b9744987125cd2248577f38c0b2e77c7

    0    const uint64_t m_k0, m_k1;
    

    achow101 commented at 6:14 pm on June 29, 2020:
    Done

    achow101 commented at 7:03 pm on June 29, 2020:
    Turns out this can’t be const as it causes the swap in LegacyScriptPubKeyMan::EncryptKeys to not compile.

    hebasto commented at 9:08 am on July 2, 2020:

    Agree.

    Turns out this can’t be const as it causes the swap in LegacyScriptPubKeyMan::EncryptKeys to not compile.

    You mean LegacyScriptPubKeyMan::Encrypt() ?

  66. in src/saltedhash.h:69 in 28dddb91b9 outdated
    64+
    65+class SaltedScriptHasher
    66+{
    67+private:
    68+    /** Salt */
    69+    const uint64_t k0, k1;
    


    hebasto commented at 2:53 pm on June 27, 2020:

    28dddb91b9744987125cd2248577f38c0b2e77c7

    0    const uint64_t m_k0, m_k1;
    

    achow101 commented at 6:15 pm on June 29, 2020:
    Done
  67. in src/saltedhash.h:40 in 42bc524e82 outdated
    33+    SaltedOutpointHasher();
    34+
    35+    /**
    36+     * This *must* return size_t. With Boost 1.46 on 32-bit systems the
    37+     * unordered_map will behave unpredictably if the custom hasher returns a
    38+     * uint64_t, resulting in failures when syncing the chain (#4634).
    


    hebasto commented at 2:57 pm on June 27, 2020:

    42bc524e82e627909ef4d4825728cd0f30ac740f

    This comment could be dropped as all available reference docs refer to the size_t return type for a hasher operator().


    achow101 commented at 6:15 pm on June 29, 2020:
    Since this is just moved code, I’m going to leave as is.
  68. hebasto commented at 2:59 pm on June 27, 2020: member

    05afffb4396b3a542d9a001642cad5b1afbc7978 changes look correct.

    In commit 05afffb4396b3a542d9a001642cad5b1afbc7978 message s/mapScriptMetadata/m_script_metadata/

  69. achow101 force-pushed on Jun 29, 2020
  70. achow101 commented at 6:15 pm on June 29, 2020: member

    Why not using auto for iterators type? It could improve readability a bit :)

    Done

  71. in src/wallet/wallet.h:15 in 5fbf2f735b outdated
    11@@ -12,6 +12,7 @@
    12 #include <outputtype.h>
    13 #include <policy/feerate.h>
    14 #include <psbt.h>
    15+#include <saltedhash.h>
    


    hebasto commented at 6:25 pm on June 29, 2020:
    This should be moved to the next commit, no?

    achow101 commented at 7:03 pm on June 29, 2020:
    Done
  72. in src/script/signingprovider.h:11 in f9304b6da3 outdated
     7@@ -8,6 +8,7 @@
     8 
     9 #include <key.h>
    10 #include <pubkey.h>
    11+#include <saltedhash.h>
    


    hebasto commented at 6:31 pm on June 29, 2020:

    f9304b6da3237f4a5eb4ee40a149f125cfa12e1d

    0#include <unordered_map>
    

    achow101 commented at 7:03 pm on June 29, 2020:
    Done
  73. in src/wallet/scriptpubkeyman.h:19 in f9304b6da3 outdated
    15@@ -16,6 +16,8 @@
    16 #include <wallet/walletdb.h>
    17 #include <wallet/walletutil.h>
    18 
    19+#include <unordered_set>
    


    hebasto commented at 6:32 pm on June 29, 2020:

    f9304b6da3237f4a5eb4ee40a149f125cfa12e1d

    0#include <unordered_map>
    1#include <unordered_set>
    

    achow101 commented at 7:03 pm on June 29, 2020:
    Done
  74. achow101 force-pushed on Jun 29, 2020
  75. achow101 commented at 7:04 pm on June 29, 2020: member
    Had to rebase onto master to pick up the uint160 changes to CKeyID and CScriptID
  76. in src/primitives/transaction.h:11 in 2ba96b0438 outdated
     7@@ -8,6 +8,7 @@
     8 
     9 #include <stdint.h>
    10 #include <amount.h>
    11+#include <crypto/siphash.h>
    


    hebasto commented at 9:45 am on July 2, 2020:
    2ba96b0438fe734b788e56ba881303c84a3f249d Why this header added?

    achow101 commented at 3:10 pm on July 2, 2020:
    Oops, that’s from a previous iteration of this PR. Removed.
  77. in src/saltedhash.cpp:6 in 2ba96b0438 outdated
    0@@ -0,0 +1,10 @@
    1+// Copyright (c) 2019 The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#include <random.h>
    6+#include <saltedhash.h>
    


    hebasto commented at 9:47 am on July 2, 2020:

    2ba96b0438fe734b788e56ba881303c84a3f249d nit:

    0#include <saltedhash.h>
    1
    2#include <random.h>
    3
    4#include <limits>
    

    achow101 commented at 3:10 pm on July 2, 2020:
    Added #include <limits>. random.h is one of our headers, so it should be grouped with saltedhash.h.

    hebasto commented at 3:23 pm on July 2, 2020:

    I mean the common pattern in the repo for somefeature.cpp:

    0// Copyright...
    1
    2#include <somefeature.h>
    3
    4#include <anotherheader.h>
    5#include <otherheader.h>
    6//...
    

    nm, this is not documented :)

  78. hebasto approved
  79. hebasto commented at 9:57 am on July 2, 2020: member

    ACK 4b3439ffa93e7de1ce2fefcb16e989e5a1805569, tested on Linux Mint 20 (x86_64).

    Did not make benchmarks, though.


    e89c19a157d504492e0665fee81cac496d5fc7f6 Why are operator() definitions not placed in the header, that could imply inlining?

  80. achow101 commented at 2:56 pm on July 2, 2020: member

    Why are operator() definitions not placed in the header, that could imply inlining?

    I’m not sure that it matters since we never call operator() directly ourselves.

  81. Move SaltedTxidHasher and SaltedOutPointHasher to saltedhash.{cpp/h} 76bd5b05dc
  82. Change getWalletTxs to return a set instead of a vector 23eb02c755
  83. Change mapWallet to be a std::unordered_map 864bc3bfda
  84. Change mapTxSpends to be a std::unordered_multimap dec73a3188
  85. Add SaltedHashers for CKeyID, CScriptID, CPubKey, and CScript
    SaltedKeyIDHasher uses CSipHasher for hashing CKeyIDs
    SaltedScriptIDHasher uses CSipHasher for hashing CScriptIDs
    SaltedPubkeyasher uses CSipHasher for hashing CPubKeys
    SaltedScriptHasher uses CSipHasher for hashing CScripts
    11d172ca41
  86. Use std::unordered_map and std::unordered_set in various places in the wallet
    Change mapKeys, mapScripts, mapCryptedKeys, mapKeyMetadata, m_script_metadata, mapWatchKeys
    to use std::unordered_map.
    
    Change setWatchOnly to use std::unordered_set
    8e3d1e73cd
  87. achow101 force-pushed on Jul 2, 2020
  88. hebasto approved
  89. hebasto commented at 3:25 pm on July 2, 2020: member
    re-ACK 8e3d1e73cdc560c42349989e9a41e71485a873c9, only suggested changes since the previous review.
  90. achow101 commented at 4:00 pm on August 10, 2020: member
    This has been open for a while without much review. Closing for now, I’ll try to integrate some of these changes the next time I touch the relevant places (e.g. mapWallet when I get around to changing how transactions are stored).
  91. achow101 closed this on Aug 10, 2020

  92. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me