addrman: improve performance by using more suitable containers #18722

pull vasild wants to merge 1 commits into bitcoin:master from vasild:addrman_map changing 3 files +36 −15
  1. vasild commented at 1:17 pm on April 21, 2020: member

    CAddrMan uses std::map internally even though it does not require that the map’s elements are sorted. std::map’s access time is O(log(map size)). std::unordered_map is more suitable as it has a O(1) access time.

    This patch lowers the execution times of CAddrMan’s methods as follows (as per src/bench/addrman.cpp):

    0AddrMan::Add(): -3.5%
    1AddrMan::GetAddr(): -76%
    2AddrMan::Good(): -0.38%
    3AddrMan::Select(): -45%
    
  2. vasild commented at 1:17 pm on April 21, 2020: member

    Full output from the benchmark (./src/bench/bench_bitcoin -filter="AddrMan.*"):

    std::map

    Benchmark evals iterations total min max median
    AddrManAdd 5 5 4.45807 0.176704 0.179945 0.178443
    AddrManGetAddr 5 500 4.09847 0.00163663 0.00164301 0.0016392
    AddrManGood 5 2 4.66502 0.460193 0.474046 0.467368
    AddrManSelect 5 1000000 2.07299 4.12892e-07 4.15564e-07 4.147e-07

    std::unordered_map

    Benchmark evals iterations total min max median
    AddrManAdd 5 5 4.30032 0.169753 0.1738 0.172047
    AddrManGetAddr 5 500 0.967398 0.000384406 0.00038996 0.000387176
    AddrManGood 5 2 4.64742 0.460599 0.471216 0.464143
    AddrManSelect 5 1000000 1.14085 2.23899e-07 2.36962e-07 2.26767e-07
     0map$ ./src/bench/bench_bitcoin -filter="AddrMan.*"
     1# Benchmark, evals, iterations, total, min, max, median
     2AddrManAdd, 5, 5, 4.45807, 0.176704, 0.179945, 0.178443
     3AddrManGetAddr, 5, 500, 4.09847, 0.00163663, 0.00164301, 0.0016392
     4AddrManGood, 5, 2, 4.66502, 0.460193, 0.474046, 0.467368
     5AddrManSelect, 5, 1000000, 2.07299, 4.12892e-07, 4.15564e-07, 4.147e-07
     6
     7unordered_map$ ./src/bench/bench_bitcoin -filter="AddrMan.*"
     8# Benchmark, evals, iterations, total, min, max, median
     9AddrManAdd, 5, 5, 4.30032, 0.169753, 0.1738, 0.172047
    10AddrManGetAddr, 5, 500, 0.967398, 0.000384406, 0.00038996, 0.000387176
    11AddrManGood, 5, 2, 4.64742, 0.460599, 0.471216, 0.464143
    12AddrManSelect, 5, 1000000, 1.14085, 2.23899e-07, 2.36962e-07, 2.26767e-07
    
  3. fanquake added the label P2P on Apr 21, 2020
  4. vasild renamed this:
    addrman: use unordered_map instead of map
    addrman: improve performance by using more suitable containers
    on Apr 21, 2020
  5. in src/addrman.h:186 in 52faf86a4f outdated
    181@@ -181,14 +182,17 @@ friend class CAddrManTest;
    182     mutable RecursiveMutex cs;
    183 
    184 private:
    185+    typedef std::unordered_map<int, CAddrInfo> mapInfoType;
    186+    typedef std::unordered_map<CNetAddr, int, CNetAddrHash> mapAddrType;
    


    Empact commented at 5:17 pm on April 21, 2020:
    nit: using and capitalize class

  6. in src/addrman.h:26 in 52faf86a4f outdated
    20@@ -21,6 +21,7 @@
    21 #include <set>
    22 #include <stdint.h>
    23 #include <streams.h>
    24+#include <unordered_map>
    


    Empact commented at 5:23 pm on April 21, 2020:
    nit: looks like you can move the <map> include to addrman.cpp now.

    vasild commented at 7:55 pm on April 21, 2020:

    Right! And even better - std::map is only used inside CAddrMan::Check_() in a way that fits unordered_map better - we only insert elements, query elements by key and check container size.

    So I changed CAddrMan::Check_() to use unordered_(set|map) and removed the #include <map>.

  7. vasild force-pushed on Apr 21, 2020
  8. fanquake requested review from sipa on Apr 22, 2020
  9. MarcoFalke commented at 2:53 pm on April 22, 2020: member
    Would it be possible to submit the benchmark upfront as a separate pull, so that https://codespeed.bitcoinperf.com/ can start running it?
  10. vasild commented at 9:40 am on April 24, 2020: member
  11. vasild force-pushed on Apr 27, 2020
  12. vasild commented at 10:25 am on April 27, 2020: member
    Removed the first commit (which introduced the benchmark), now that it is already merged into master and rebased.
  13. in src/netaddress.h:113 in 4591c3d3a3 outdated
    106@@ -107,6 +107,19 @@ class CNetAddr
    107         }
    108 
    109         friend class CSubNet;
    110+        friend struct CNetAddrHash;
    111+};
    112+
    113+struct CNetAddrHash
    


    sipa commented at 5:40 pm on April 27, 2020:
    You’ll want to use a salted hash here (use CSipHasher from crypto/siphash.h or a specialization; see SaltedOutpointHasher in coins.h for inspiration). Otherwise an attacker may be able to degrade our performance by giving us a bunch of entries that all end up in the same bucket).

    vasild commented at 7:53 pm on May 4, 2020:
    Done, thanks for the suggestion!
  14. in src/addrman.h:500 in 4591c3d3a3 outdated
    498-                it++;
    499+                ++it;
    500             }
    501         }
    502+#else // pre C++14, drop this once we switch to C++14 or higher
    503+        std::vector<MapInfo::const_iterator> toDelete;
    


    sipa commented at 5:41 pm on April 27, 2020:
    Nit: new code should use the coding style described in doc/developer-.nodes.md (e.g. to_delete instead).

    vasild commented at 7:53 pm on May 4, 2020:
    Done
  15. vasild force-pushed on May 4, 2020
  16. vasild force-pushed on May 5, 2020
  17. DrahtBot commented at 9:52 pm on May 20, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #19238 (refactor: Make CAddrMan::cs non-recursive by hebasto)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  18. luke-jr referenced this in commit 7d6bcaf21d on Jun 9, 2020
  19. sipa commented at 6:00 am on June 10, 2020: member
    utACK 7cc317285f388f0cda974a321761992248257fb8
  20. hebasto commented at 6:27 pm on June 12, 2020: member
    Concept ACK.
  21. hebasto commented at 5:42 am on June 13, 2020: member

    Benchmarking results:

    • master (8c97780db8c9dd33efed134385573ba97e9cd165)
    0$ ./src/bench/bench_bitcoin -filter="AddrMan.*"
    1# Benchmark, evals, iterations, total, min, max, median
    2AddrManAdd, 5, 5, 1.15889, 0.0458257, 0.0469068, 0.0464069
    3AddrManGetAddr, 5, 500, 1.94467, 0.000772949, 0.000789915, 0.000775272
    4AddrManGood, 5, 2, 5.69597, 0.563135, 0.57652, 0.570197
    5AddrManSelect, 5, 1000000, 1.17994, 2.32975e-07, 2.37942e-07, 2.36478e-07
    
    • master + this PR
    0$ ./src/bench/bench_bitcoin -filter="AddrMan.*"
    1# Benchmark, evals, iterations, total, min, max, median
    2AddrManAdd, 5, 5, 0.972042, 0.0383488, 0.0396571, 0.0386554
    3AddrManGetAddr, 5, 500, 0.649199, 0.000256967, 0.000262957, 0.00025942
    4AddrManGood, 5, 2, 5.35751, 0.531294, 0.540723, 0.534681
    5AddrManSelect, 5, 1000000, 0.657766, 1.29313e-07, 1.36428e-07, 1.30873e-07
    

    Nice :)

  22. in src/addrman.cpp:79 in 7cc317285f outdated
    75@@ -73,12 +76,12 @@ double CAddrInfo::GetChance(int64_t nNow) const
    76 
    77 CAddrInfo* CAddrMan::Find(const CNetAddr& addr, int* pnId)
    78 {
    79-    std::map<CNetAddr, int>::iterator it = mapAddr.find(addr);
    80+    auto it = mapAddr.find(addr);
    


    hebasto commented at 5:50 am on June 13, 2020:
    0    const auto it = mapAddr.find(addr);
    

    vasild commented at 2:57 pm on June 15, 2020:
    Right, I will add the const if I end up modifying this PR for some bigger reason. But I think it is not worth invalidating the ACK.

    vasild commented at 6:15 pm on June 25, 2020:
    Done.
  23. in src/addrman.cpp:84 in 7cc317285f outdated
    81     if (it == mapAddr.end())
    82         return nullptr;
    83     if (pnId)
    84         *pnId = (*it).second;
    85-    std::map<int, CAddrInfo>::iterator it2 = mapInfo.find((*it).second);
    86+    auto it2 = mapInfo.find((*it).second);
    


    hebasto commented at 5:51 am on June 13, 2020:
    0    const auto it2 = mapInfo.find((*it).second);
    

    vasild commented at 2:57 pm on June 15, 2020:
    Right, I will add the const if I end up modifying this PR for some bigger reason. But I think it is not worth invalidating the ACK.

    vasild commented at 6:15 pm on June 25, 2020:
    Done.
  24. in src/netaddress.h:462 in 7cc317285f outdated
    117+public:
    118+    size_t operator()(const CNetAddr& a) const noexcept
    119+    {
    120+        CSipHasher hasher(salt_k0, salt_k1);
    121+        hasher.Write(a.ip, sizeof(a.ip));
    122+        return (size_t)hasher.Finalize();
    


    hebasto commented at 7:09 am on June 13, 2020:
    Is this safe on 32-bit system (e.g., ARM7)?

    vasild commented at 2:55 pm on June 15, 2020:

    Yes, I think it is.

    Finalize() returns uint64_t and even if size_t is 32 bits, then it will just take “half” of the hash. SaltedOutpointHasher already does such a conversion from uint64_t to size_t and it contains some comment that size_t must be returned.


    hebasto commented at 10:01 am on June 25, 2020:

    nit: All calls could be chained:

    0        return CSipHasher(salt_k0, salt_k1).Write(a.ip, sizeof(a.ip)).Finalize();
    

    vasild commented at 11:51 am on June 25, 2020:

    I actually find the 3-line variant easier to read (that could be subjective). The 3-line variant is more manageable wrt setting a breakpoint or reading a backtrace (e.g. if you only see it crashed on that multi-chained-line it wouldn’t be clear whether it crashed in the constructor, in the Write() method or in the Finalize() method).

    I would prefer to leave it as is.


    sipa commented at 8:40 pm on May 20, 2021:
    Any reason to not include m_net as well?

    vasild commented at 11:24 am on May 21, 2021:
    No. That code is from before m_net existed! I added it now, thanks!
  25. hebasto commented at 7:22 am on June 13, 2020: member

    nit: If #includes in addrman.h are already touched, mind moving

    0#include <fs.h>
    1#include <hash.h>
    2#include <streams.h>
    

    into the first group?

  26. vasild commented at 3:00 pm on June 15, 2020: member

    nit: If #includes in addrman.h are already touched, mind moving…

    Good catch, they should be moved! Given that there is already one ACK, I will move the headers if I alter this PR or as a followup PR if this gets merged as is now.

  27. hebasto approved
  28. hebasto commented at 4:17 pm on June 15, 2020: member
    ACK 7cc317285f388f0cda974a321761992248257fb8 @vasild feel free to ignore my nits :)
  29. in src/netaddress.h:127 in 7cc317285f outdated
    122+        return (size_t)hasher.Finalize();
    123+    }
    124+
    125+private:
    126+    const uint64_t salt_k0 = GetRand(std::numeric_limits<uint64_t>::max());
    127+    const uint64_t salt_k1 = GetRand(std::numeric_limits<uint64_t>::max());
    


    hebasto commented at 10:02 am on June 25, 2020:
    nit: Class member naming convention?

    vasild commented at 11:48 am on June 25, 2020:
    Right! A convention is a convention. Renamed.
  30. vasild force-pushed on Jun 25, 2020
  31. vasild commented at 11:47 am on June 25, 2020: member
    Prefixed salt_k0 and salt_k1 with m_ to abide to the naming convention. diff.
  32. hebasto commented at 4:42 pm on June 25, 2020: member

    @vasild

    Prefixed salt_k0 and salt_k1 with m_ to abide to the naming convention. diff.

    If some changes are made already, maybe make these ones ((https://github.com/bitcoin/bitcoin/pull/18722#discussion_r440238143), #18722 (review), #18722 (comment)) too?

    :)

  33. vasild force-pushed on Jun 25, 2020
  34. vasild commented at 6:18 pm on June 25, 2020: member

    I forgot about those, sorry. Done now.

    Since the ACKs - renamed variables to follow convention, added const, moved #includes. diff

  35. hebasto approved
  36. hebasto commented at 6:44 pm on June 25, 2020: member
    re-ACK d6e782174ecb7298e9d67303e52cc470ec9e2be7, only suggested changes since the previous review.
  37. adamjonas commented at 1:30 am on August 14, 2020: member

    Updated benchmarking results (now with nanobench, which meant I had to look up what these categories represented). In sum, it looks like a performance upgrade, especially for AddrManGetAddr though the error rate did jump up for me and I had to run the PR version a bunch of times to get something stable while master was stable the first time (though reading through 18011 this may be OS specific - this was run on OSX).

    • master (609ce2d0da495ee09db03ba0c2d9b38a3b3f541e)
    0|               ns/op |                op/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|       47,378,973.00 |               21.11 |    2.1% |      0.53 | `AddrManAdd`
    3|        1,585,044.00 |              630.90 |    1.8% |      0.02 | `AddrManGetAddr`
    4|      427,775,318.00 |                2.34 |    1.0% |      2.13 | `AddrManGood`
    5|              468.57 |        2,134,157.56 |    1.5% |      0.00 | `AddrManSelect`
    
    • master + PR
    0|               ns/op |                op/s |    err% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------:|:----------
    2|       42,701,933.00 |               23.42 |    2.6% |      0.47 | `AddrManAdd`
    3|          572,564.00 |            1,746.53 |    3.8% |      0.01 | `AddrManGetAddr`
    4|      429,306,009.00 |                2.33 |    2.6% |      2.17 | `AddrManGood`
    5|              406.60 |        2,459,419.58 |    5.0% |      0.00 | `AddrManSelect`
    
  38. DrahtBot added the label Needs rebase on Aug 25, 2020
  39. vasild force-pushed on Sep 1, 2020
  40. vasild commented at 1:08 pm on September 1, 2020: member
    Rebased to resolve conflicts.
  41. DrahtBot removed the label Needs rebase on Sep 1, 2020
  42. in src/netaddress.h:14 in 1e1cc9d859 outdated
    11@@ -12,6 +12,8 @@
    12 #include <attributes.h>
    13 #include <compat.h>
    14 #include <prevector.h>
    15+#include <crypto/siphash.h>
    


    hebasto commented at 5:33 pm on September 12, 2020:
    nit: The latest rebasing destroyed alphabetic order of headers :)

    vasild commented at 7:18 am on September 14, 2020:
    Fixed, someday we may consider to have an automated clang-format enforcement in CI…
  43. hebasto approved
  44. hebasto commented at 5:33 pm on September 12, 2020: member
    re-ACK 1e1cc9d8594b0ffba1173d8e431e351228691536, only rebased.
  45. vasild force-pushed on Sep 14, 2020
  46. hebasto approved
  47. hebasto commented at 7:16 am on September 14, 2020: member
    re-ACK 7dddf330fe9f17e20df7bc7431ad3f7b740a3a9c, only headers got correct order.
  48. in src/netaddress.h:259 in 7dddf330fe outdated
    191@@ -190,6 +192,7 @@ class CNetAddr
    192         }
    193 
    194         friend class CSubNet;
    195+        friend class CNetAddrHash;
    


    jonatack commented at 7:36 am on September 14, 2020:
    nit if you retouch, inverse the order of these two friend class declarements (both for alphabetical order and code order of the two classes, below)

    vasild commented at 2:02 pm on November 13, 2020:
    Done
  49. jonatack commented at 7:56 am on September 14, 2020: member
    Code review ACK 7dddf330fe9f17e20df7bc7431ad3f7b740a3a9c also: debug build is clean, unit tests are green, ran bitcoind and observed debug log and -netinfo
  50. fanquake commented at 12:30 pm on October 31, 2020: member
    Concept ACK. I think we can merge this after the next branch off, once we start requiring C++17, and skip the addition of the pre-C++14 code.
  51. fanquake added this to the milestone 22.0 on Oct 31, 2020
  52. jnewbery commented at 10:23 am on November 12, 2020: member
    concept ACK. Will review after split-off when the pre-C++14 code is removed.
  53. DrahtBot added the label Needs rebase on Nov 12, 2020
  54. vasild force-pushed on Nov 13, 2020
  55. vasild commented at 1:59 pm on November 13, 2020: member
    Rebased to resolve conflicts.
  56. vasild force-pushed on Nov 13, 2020
  57. vasild commented at 2:02 pm on November 13, 2020: member

    Addressed a nit and removed pre-C++14 code.

    This should be merged only after -std=c++17 becomes mandatory.

  58. DrahtBot removed the label Needs rebase on Nov 13, 2020
  59. vasild commented at 3:12 pm on December 17, 2020: member

    … after -std=c++17 becomes mandatory

    it is now

  60. DrahtBot added the label Needs rebase on Feb 1, 2021
  61. vasild force-pushed on Feb 10, 2021
  62. vasild commented at 3:43 pm on February 10, 2021: member
    d517c9d37…6f0c9f539: rebase due to conflicts
  63. DrahtBot removed the label Needs rebase on Feb 10, 2021
  64. DrahtBot added the label Needs rebase on May 20, 2021
  65. vasild force-pushed on May 21, 2021
  66. vasild commented at 11:25 am on May 21, 2021: member
    6f0c9f5390...6a1a28f2ad: rebased due to conflicts, plus add m_net to the CNetAddr hashing.
  67. DrahtBot removed the label Needs rebase on May 21, 2021
  68. in src/netaddress.h:489 in 6a1a28f2ad outdated
    484+{
    485+public:
    486+    size_t operator()(const CNetAddr& a) const noexcept
    487+    {
    488+        CSipHasher hasher(m_salt_k0, m_salt_k1);
    489+        hasher.Write(reinterpret_cast<const uint8_t*>(&a.m_net), sizeof(a.m_net));
    


    sipa commented at 5:29 pm on May 21, 2021:
    Micronit: this reinterpret cast is a bit ugly. You could avoid it by just doing hasher.Write(a.m_net);; it’ll interpret a.m_net as a uint64_t and use the optimized integer hasher.

    vasild commented at 8:41 am on May 24, 2021:
    Done. I hope the implicit conversion from int to uint64_t will not upset some compiler.

    jonatack commented at 9:57 am on May 24, 2021:
    The ugliness could maybe be considered a virtue; it’s explicit that this is an “on your head be it!” type of cast and highlights that the code could ideally be improved. https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es49-if-you-must-use-a-cast-use-a-named-cast

    vasild commented at 10:10 am on May 24, 2021:
    Here we can do without a cast since there is Write() method that takes an integer argument (modulo int -> uint64 conversion which is ok because we never pass negative values).
  69. sipa commented at 5:43 pm on May 21, 2021: member
    utACK 6a1a28f2ad0d84bf30debbe94c5bd8db80a45702
  70. vasild force-pushed on May 24, 2021
  71. vasild commented at 8:40 am on May 24, 2021: member
    6a1a28f2ad...c22e98b30d: address review suggestion
  72. jonatack commented at 10:24 am on May 24, 2021: member
    ACK c22e98b30d9cd5d84ee3c14488defc25b3f998dd
  73. in src/addrman.h:575 in c22e98b30d outdated
    571@@ -569,13 +572,13 @@ friend class CAddrManTest;
    572 
    573         // Prune new entries with refcount 0 (as a result of collisions).
    574         int nLostUnk = 0;
    575-        for (std::map<int, CAddrInfo>::const_iterator it = mapInfo.begin(); it != mapInfo.end(); ) {
    576+        for (MapInfo::const_iterator it = mapInfo.begin(); it != mapInfo.end(); ) {
    


    hebasto commented at 11:54 am on May 24, 2021:

    nit:

    0        for (auto it = mapInfo.cbegin(); it != mapInfo.cend(); ) {
    

    vasild commented at 11:11 am on May 28, 2021:
    Done. Also, skipping the mention of MapInfo here makes it used in just one place. Dropped it and used std::unordered_map<... directly since there is no repetition of the tedious std::unordered_map<... now. Thanks!
  74. hebasto approved
  75. hebasto commented at 11:54 am on May 24, 2021: member
    ACK c22e98b30d9cd5d84ee3c14488defc25b3f998dd
  76. DrahtBot added the label Needs rebase on May 27, 2021
  77. vasild force-pushed on May 28, 2021
  78. vasild commented at 11:16 am on May 28, 2021: member
    c22e98b30d...1f4ee9c905: rebase due to conflicts and address review suggestion
  79. DrahtBot removed the label Needs rebase on May 28, 2021
  80. in src/addrman.h:442 in 1f4ee9c905 outdated
    439+        for (auto it = mapInfo.cbegin(); it != mapInfo.cend(); ) {
    440             if (it->second.fInTried == false && it->second.nRefCount == 0) {
    441-                std::map<int, CAddrInfo>::const_iterator itCopy = it++;
    442+                auto itCopy = it++;
    443                 Delete(itCopy->first);
    444                 nLostUnk++;
    


    jonatack commented at 12:42 pm on May 28, 2021:

    nit

    0-                auto itCopy = it++;
    1+                const auto itCopy = it++;
    2                 Delete(itCopy->first);
    3-                nLostUnk++;
    4+                ++nLostUnk;
    

    vasild commented at 2:41 pm on May 28, 2021:
    Done
  81. jonatack commented at 12:45 pm on May 28, 2021: member
    ACK 1f4ee9c90531986f7f045f1987df8b3a8fa60433
  82. addrman: use unordered_map instead of map
    `CAddrMan` uses `std::map` internally even though it does not require
    that the map's elements are sorted. `std::map`'s access time is
    `O(log(map size))`. `std::unordered_map` is more suitable as it has a
    `O(1)` access time.
    
    This patch lowers the execution times of `CAddrMan`'s methods as follows
    (as per `src/bench/addrman.cpp`):
    
    ```
    AddrMan::Add(): -3.5%
    AddrMan::GetAddr(): -76%
    AddrMan::Good(): -0.38%
    AddrMan::Select(): -45%
    ```
    a92485b2c2
  83. vasild force-pushed on May 28, 2021
  84. vasild commented at 2:42 pm on May 28, 2021: member
    1f4ee9c905...a92485b2c2: address review suggestion
  85. jonatack commented at 5:30 pm on May 28, 2021: member
    ACK a92485b2c250fd18f55d22aa32722bf52ab32bfe
  86. hebasto approved
  87. hebasto commented at 7:49 pm on May 29, 2021: member
    re-ACK a92485b2c250fd18f55d22aa32722bf52ab32bfe, only suggested changes and rebased since my previous review.
  88. achow101 commented at 7:24 pm on June 11, 2021: member
    ACK a92485b2c250fd18f55d22aa32722bf52ab32bfe
  89. fanquake merged this on Jun 12, 2021
  90. fanquake closed this on Jun 12, 2021

  91. vasild deleted the branch on Jun 13, 2021
  92. sidhujag referenced this in commit 37a5ba0692 on Jun 13, 2021
  93. gwillen referenced this in commit fdeeae492c on Jun 1, 2022
  94. DrahtBot locked this on Aug 16, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me