Several changes to mruset #6064

pull sipa wants to merge 4 commits into bitcoin:master from sipa:smallmru changing 8 files +255 −118
  1. sipa commented at 4:36 pm on April 25, 2015: member

    Including:

    • @gavinandresen’s #6066
    • Maintain the mru order using iterators rather than copies.
    • Use a ring buffer (implemented using a statically allocated vector) rather than a deque for keeping the iterators.
    • Replace the mruset unit tests with a single stronger unified one.

    This reduces the memory consumption of a maximal setAddrKnown by +- half.

  2. sipa commented at 4:41 pm on April 25, 2015: member
    The HMAC for address inventory hashing is absolute overkill, but was easy to write. I doubt the performance matters.
  3. sipa force-pushed on Apr 25, 2015
  4. sipa commented at 9:47 pm on April 25, 2015: member

    Rebased on top of @gavinandresen’s #6066 (so that I don’t need the address hasher here).

    Removed the boost::unordered_set usage for now, as it is incompatible with older boost versions (which don’t have unordered_set::reserve).

  5. in src/mruset.h: in 8ba8474594 outdated
    40-    void clear()
    41+    void clear(size_type nMaxSizeIn = 1)
    42     {
    43         set.clear();
    44-        queue.clear();
    45+        std::vector<iterator>(nMaxSizeIn, set.end()).swap(order);
    


    gavinandresen commented at 6:58 pm on April 26, 2015:
    I’m probably just an old fuddy-duddy… … but I find the “use swap to assign” idiom hard to review. Why not just: order.assign(nMaxSizeIn, set.end());

    sipa commented at 8:39 pm on April 26, 2015:
    It’s horrible, but it’s the only way to make a vector release its memory, afaik.

    laanwj commented at 1:44 pm on April 30, 2015:

    AFAIK too. Though this is only important if we expect the size to change (become smaller) dynamically.

    I’m not sure I like having clear() double as set-maximum-size method. I’d prefer the maximum size to be passed to the constructor and be static over the lifetime of the object. If that’s not possible, let’s add a special resize() call for resizing (that happens to clear the structure too).

  6. in src/mruset.h: in 8ba8474594 outdated
    55-                queue.pop_front();
    56+            if (set.size() == nMaxSize + 1) {
    57+                set.erase(order[first_used]);
    58+                order[first_used] = set.end();
    59+                first_used++;
    60+                if (first_used == nMaxSize) { first_used = 0; }
    


    gavinandresen commented at 7:03 pm on April 26, 2015:
    Save a line of code with: first_used = (first_used+1)%nMaxSize;

    sipa commented at 8:40 pm on April 26, 2015:
    Modulo operator is slow, if it’s not a compile-time known power-of-two constant.

    sipa commented at 8:58 pm on April 26, 2015:
    Updated to use another one-line construct.
  7. gavinandresen commented at 7:24 pm on April 26, 2015: contributor

    Code review ACK– note I changed the CRollingBloomFilter implementation commit in #6066 .

    Measured heap usage on OSX for a 1,000-entry mruset : 72,192 bytes. (before this change: 100,992 bytes).

    I’m running a node with these changes now.

  8. sipa force-pushed on Apr 26, 2015
  9. sipa commented at 8:59 pm on April 26, 2015: member
    Rebased on updated #6066, and addressed a nit.
  10. sipa commented at 9:04 pm on April 26, 2015: member
    @gavinandresen I measure around a 50 byte overhead per mruset entry now. Before, it was 50 bytes + double storage of the elements.
  11. sipa force-pushed on Apr 26, 2015
  12. sandakersmann commented at 1:49 pm on April 27, 2015: contributor
    Heading in src/mruset.h should be: // Copyright (c) 2012-2015 The Bitcoin Core developers
  13. sipa commented at 2:27 pm on April 27, 2015: member
    @sandakersmann Why? It’s only worked on in 2012 and 2015.
  14. sandakersmann commented at 2:34 pm on April 27, 2015: contributor
    @sipa Last update was in 2014.
  15. sipa force-pushed on Apr 27, 2015
  16. sipa commented at 5:30 pm on April 27, 2015: member
    @sandakersmann Ok, fixed.
  17. laanwj added the label Improvement on Apr 29, 2015
  18. Rolling bloom filter class
    For when you need to keep track of the last N items
    you've seen, and can tolerate some false-positives.
    
    Rebased-by: Pieter Wuille <pieter.wuille@gmail.com>
    69a5f8be0a
  19. in src/bloom.cpp: in 5f35e5be57 outdated
    39@@ -40,6 +40,17 @@ nFlags(nFlagsIn)
    40 {
    41 }
    42 
    43+// Private constructor used by CRollingBloomFilter
    44+CBloomFilter::CBloomFilter(unsigned int nElements, double nFPRate, unsigned int nTweakIn) :
    45+vData((unsigned int)(-1  / LN2SQUARED * nElements * log(nFPRate)) / 8),
    


    laanwj commented at 12:57 pm on April 30, 2015:
    Nit: indentation
  20. in src/bloom.cpp: in 5f35e5be57 outdated
    207@@ -197,3 +208,38 @@ void CBloomFilter::UpdateEmptyFull()
    208     isFull = full;
    209     isEmpty = empty;
    210 }
    211+
    212+CRollingBloomFilter::CRollingBloomFilter(unsigned int nElements, double fpRate, unsigned int nTweak) :
    213+    b1(nElements*2, fpRate, nTweak), b2(nElements*2, fpRate, nTweak)
    214+{
    215+    // Implemented using two bloom filters of nElements each.
    


    laanwj commented at 12:59 pm on April 30, 2015:
    Comment should be two filters of nElements*2 each, if I understand the above initialization correctly?
  21. in src/bloom.cpp: in 5f35e5be57 outdated
    226+        b1.clear();
    227+    else if (nInsertions%nBloomSize == nBloomSize/2)
    228+        b2.clear();
    229+    b1.insert(vKey);
    230+    b2.insert(vKey);
    231+    ++nInsertions;
    


    laanwj commented at 1:10 pm on April 30, 2015:
    I’d prefer to wrap this counter around at nBloomSize, this avoids some % operations, and also avoids trouble with integer overflows when UIINT_MAX is not divisible by nBloomSize.
  22. in src/main.cpp: in 5f35e5be57 outdated
    4798@@ -4799,8 +4799,9 @@ bool SendMessages(CNode* pto, bool fSendTrickle)
    4799             BOOST_FOREACH(const CAddress& addr, pto->vAddrToSend)
    4800             {
    4801                 // returns true if wasn't already contained in the set
    


    laanwj commented at 1:36 pm on April 30, 2015:
    This comment is no longer relevant: contains is self-documenting in contrast to insert’s second return value
  23. in src/test/mruset_tests.cpp: in 5f35e5be57 outdated
    109         }
    110     }
    111 }
    112 
    113-// 16-bit permutation function
    114-int static permute(int n)
    


    laanwj commented at 1:58 pm on April 30, 2015:
    I’ve checked that the old test still passes on the new code (a drawback of replacing the tests as well as the code that is tested :grin:)
  24. sipa force-pushed on Apr 30, 2015
  25. sipa force-pushed on Apr 30, 2015
  26. sipa commented at 3:14 pm on April 30, 2015: member

    Made several changes:

    • The MRU size of mruset is now a constant (set at constructor time).
    • Restyled some of the rolling bloom filter code.

    I think there is a significant improvement possible to the size of the rolling bloom filter, by using a different tweak for multiple filters, and checking several filters for a ‘contains’. I’ll check the math and write a separate PR for that if found useful.

  27. Replace mruset setAddrKnown with CRollingBloomFilter addrKnown
    Use a probabilistic bloom filter to keep track of which addresses
    we think we have given our peers, instead of a list.
    
    This uses much less memory, at the cost of sometimes failing to
    relay an address to a peer-- worst case if the bloom filter happens
    to be as full as it gets, 1-in-1,000.
    
    Measured memory usage of a full mruset setAddrKnown: 650Kbytes
    Constant memory usage of CRollingBloomFilter addrKnown: 37Kbytes.
    
    This will also help heap fragmentation, because the 37K of storage
    is allocated when a CNode is created (when a connection to a peer
    is established) and then there is no per-item-remembered memory
    allocation.
    
    I plan on testing by restarting a full node with an empty peers.dat,
    running a while with -debug=addrman and -debug=net, and making sure
    that the 'addr' message traffic out is reasonable.
    (suggestions for better tests welcome)
    d81cff32e5
  28. Use ring buffer of set iterators instead of deque of copies in mruset d4d5022cfc
  29. Better mruset unit test f46a680f42
  30. sipa force-pushed on Apr 30, 2015
  31. gavinandresen commented at 3:30 pm on April 30, 2015: contributor
    @sipa: There’s a good survey paper by Sasu Tarkoma et al: “Theory and Practice of Bloom Filters for Distributed Systems.” Which pointed me to: https://home.kookmin.ac.kr/~mkyoon/publications/2010TKDE.pdf … which has a more complicated scheme (maybe similar to what you’re thinking).
  32. gavinandresen commented at 3:41 pm on April 30, 2015: contributor
    ACK
  33. laanwj commented at 5:09 am on May 1, 2015: member

    I think there is a significant improvement possible to the size of the rolling bloom filter, by using a different tweak for both, and checking both filters for a ‘contains’. I’ll check the math and write a separate PR for that if found useful. @sipa sounds great to me, although let’s do that in a later pull, this is already a great improvement compared to the current state, so let’s get this reviewed and merged.

  34. laanwj commented at 10:52 am on May 1, 2015: member
    Tested ACK
  35. laanwj merged this on May 1, 2015
  36. laanwj closed this on May 1, 2015

  37. laanwj referenced this in commit b46e7c24e5 on May 1, 2015
  38. Cryptoslave referenced this in commit fc20c942e0 on Mar 8, 2016
  39. random-zebra referenced this in commit 8bbc0650e6 on Jul 1, 2020
  40. fanquake deleted a comment on Feb 4, 2021
  41. fanquake locked this on Feb 4, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-18 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me