Use std::unordered_set instead of set in blockfilter interface #14074

pull jimpo wants to merge 2 commits into bitcoin:master from jimpo:blockfilter-unordered-set changing 18 files +283 −208
  1. jimpo commented at 7:04 pm on August 26, 2018: contributor

    Use std::unordered_set (hash set) instead of std::set (tree set) in blockfilter interface, as suggested by @ryanofsky in #12254. This may result in a very minor speedup, but I haven’t measured.

    This moves CSipHasher to it’s own file crypto/siphash.h, so that it can be used in the libbitcoin_util library without including hash.{h,cpp}. I’m open to other suggestions on solving this issue if people would prefer to leave CSipHasher where it is.

  2. DrahtBot commented at 8:31 pm on August 26, 2018: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #14387 (Faster Input Deduplication Algorithm by JeremyRubin)
    • #14224 (Document intentional and unintentional unsigned integer overflows (wraparounds) using annotations by practicalswift)
    • #14121 (Index for BIP 157 block filters by jimpo)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. fanquake added the label Refactoring on Aug 26, 2018
  4. in src/util.h:362 in aa370700df outdated
    357+    uint64_t m_k0, m_k1;
    358+
    359+public:
    360+    ByteVectorHash();
    361+    size_t operator()(const std::vector<unsigned char>& input) const;
    362+};
    


    laanwj commented at 6:41 pm on August 27, 2018:
    I don’t think util.h is the best place for a generic utility data structure such as this, it’s more for assorted operating system functions

    jimpo commented at 9:47 pm on August 27, 2018:
    @laanwj What would be the right place? A new file utiltypes.{h,cpp}? Or util/types.{h,cpp}?

    laanwj commented at 10:32 am on August 28, 2018:
    or maybe support/bytevectorhash.h ? unless there’s a strong need to group this with other things
  5. in src/crypto/siphash.h:1 in aa370700df outdated
    0@@ -0,0 +1,47 @@
    1+// Copyright (c) 2016-2018 The Bitcoin Core developers
    


    laanwj commented at 6:41 pm on August 27, 2018:
    ACK on moving siphash to a separate unit
  6. jimpo force-pushed on Aug 28, 2018
  7. laanwj commented at 12:51 pm on August 31, 2018: member
    utACK f4608359e4e643c6e197df822e03226146e37a49 verified move-onlyness of extraction commit
  8. in src/support/bytevectorhash.cpp:7 in f4608359e4 outdated
    0@@ -0,0 +1,18 @@
    1+// Copyright (c) 2018 The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#include <crypto/siphash.h>
    6+#include <support/bytevectorhash.h>
    7+#include <random.h>
    


    practicalswift commented at 6:37 am on September 10, 2018:
    0src/support/bytevectorhash.cpp:6:1: warning: #includes are not sorted properly [llvm-include-order]
    
  9. in src/crypto/siphash.cpp:25 in f4608359e4 outdated
    20+    v[0] = 0x736f6d6570736575ULL ^ k0;
    21+    v[1] = 0x646f72616e646f6dULL ^ k1;
    22+    v[2] = 0x6c7967656e657261ULL ^ k0;
    23+    v[3] = 0x7465646279746573ULL ^ k1;
    24+    count = 0;
    25+    tmp = 0;
    


    practicalswift commented at 6:41 am on September 10, 2018:
    Initialize count and tmp using default member initializers or in the member initializer list of the constructor?

    jimpo commented at 6:39 pm on September 10, 2018:
    This is a move-only commit. I’d rather not modify the contents of the code here.

    practicalswift commented at 7:55 pm on September 10, 2018:
    Makes sense!
  10. laanwj commented at 8:11 pm on September 10, 2018: member
    @ryanofsky can you take a look here please? according to @jimpo this was your suggestion
  11. in src/support/bytevectorhash.h:9 in f4608359e4 outdated
    0@@ -0,0 +1,23 @@
    1+// Copyright (c) 2018 The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#ifndef BITCOIN_SUPPORT_BYTEVECTORHASH_H
    6+#define BITCOIN_SUPPORT_BYTEVECTORHASH_H
    7+
    8+/**
    9+ * Implementation of Hash named requirement for a byte vector. This may be used the hash function in
    


    promag commented at 9:37 pm on October 8, 2018:
    This may be used as the hash.. ?
  12. in src/support/bytevectorhash.h:15 in f4608359e4 outdated
    10+ * std::unordered_set or std::unordered_map over std::vector<unsigned char>. Internally, a random
    11+ * instance of SipHash-2-4 is used.
    12+ */
    13+class ByteVectorHash
    14+{
    15+private:
    


    promag commented at 9:41 pm on October 8, 2018:
    Drop private.

    jimpo commented at 1:34 am on October 9, 2018:
    I think it’s more clear to leave it explicit even if the default for class is private.
  13. in src/txmempool.h:18 in f4608359e4 outdated
    14@@ -15,6 +15,7 @@
    15 
    16 #include <amount.h>
    17 #include <coins.h>
    18+#include <crypto/siphash.h>
    


    promag commented at 9:50 pm on October 8, 2018:
    Can be removed?

    jimpo commented at 3:57 pm on October 10, 2018:
    It’s used in SaltedTxidHasher
  14. in src/util.cpp:9 in f4608359e4 outdated
    5@@ -6,6 +6,7 @@
    6 #include <util.h>
    7 
    8 #include <chainparamsbase.h>
    9+#include <crypto/siphash.h>
    


    promag commented at 9:56 pm on October 8, 2018:
    Can be removed?
  15. in src/support/bytevectorhash.h:8 in f4608359e4 outdated
    0@@ -0,0 +1,23 @@
    1+// Copyright (c) 2018 The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#ifndef BITCOIN_SUPPORT_BYTEVECTORHASH_H
    6+#define BITCOIN_SUPPORT_BYTEVECTORHASH_H
    7+
    8+/**
    


    promag commented at 9:57 pm on October 8, 2018:
    Missing includes <stdint.h> and <vector>.
  16. in src/support/bytevectorhash.h:13 in f4608359e4 outdated
     8+/**
     9+ * Implementation of Hash named requirement for a byte vector. This may be used the hash function in
    10+ * std::unordered_set or std::unordered_map over std::vector<unsigned char>. Internally, a random
    11+ * instance of SipHash-2-4 is used.
    12+ */
    13+class ByteVectorHash
    


    promag commented at 9:58 pm on October 8, 2018:
    nit, final.
  17. promag commented at 10:37 pm on October 8, 2018: member

    @jimpo can you improve PR description of the advantages of this change (beside linking @ryanofsky suggestion)?

    Do you have numbers to support this change?

    Left some nits.

  18. in src/support/bytevectorhash.h:10 in f4608359e4 outdated
     5+#ifndef BITCOIN_SUPPORT_BYTEVECTORHASH_H
     6+#define BITCOIN_SUPPORT_BYTEVECTORHASH_H
     7+
     8+/**
     9+ * Implementation of Hash named requirement for a byte vector. This may be used the hash function in
    10+ * std::unordered_set or std::unordered_map over std::vector<unsigned char>. Internally, a random
    


    ryanofsky commented at 0:58 am on October 9, 2018:

    In commit “blockfilter: Use unordered_set instead of set in blockfilter.” (f4608359e4e643c6e197df822e03226146e37a49)

    The way this is written makes it seem like this whole class is tied to std::vector<unsigned char>, when actually only the call operator is. I guess my concern that if someone wants to reuse this hash function on a similar type like Span<char>, prevector, or std::string, the naming and comments here will lead them to copy/paste/rename this whole class instead of doing something simpler like overloading the call operator or adding a Span conversion.

    I’d suggest:

    1. Mentioning in this comment that operator() overloads for new types could be added here in the future.
    2. Changing the name of the class from ByteVectorHash to something like RandomizedSipHash to avoid tying it to std::vector.
    3. Maybe replacing std::vector with Span.

    jimpo commented at 4:07 pm on October 10, 2018:
    I updated the comment to say that it works for any types that internally store (or reference) a byte array, but decided not to change the class name.
  19. ryanofsky approved
  20. ryanofsky commented at 2:07 am on October 9, 2018: member
    utACK f4608359e4e643c6e197df822e03226146e37a49. Sorry for missing this previously. I confirmed the first commit is move-only, and I think the changes in the second commit should make it easier to use hash maps more places in the future. I left a suggestion below, but also think this change looks good as it is.
  21. jimpo force-pushed on Oct 10, 2018
  22. jimpo force-pushed on Oct 10, 2018
  23. etscrivner commented at 10:21 pm on October 10, 2018: contributor
    utACK cfabca8749a132220e649d4563fb876afc21ceec
  24. sipa commented at 9:38 am on October 11, 2018: member
    Just one not comment: I always imagined the things in ‘support’ as low level features that are independently usable, and find it a bit strange to put something there that depends on random and crypto/siphash. Perhaps others have a different understanding, as I don’t think it was ever written down somewhere what it was for, but I would just put it in src/.
  25. jimpo commented at 7:37 pm on October 11, 2018: contributor
    @sipa I’d prefer to start creating more directory structure. How would you feel about a util/ directory instead and we can put it in there?
  26. sipa commented at 8:01 pm on October 12, 2018: member
    @jimpo Sounds great, but perhaps something to do independently? I’d prefer to not start a new convention, and then have it linger in a half finished state if there’s unclarity about how to move forward.
  27. ryanofsky commented at 6:00 pm on October 19, 2018: member
    utACK cfabca8749a132220e649d4563fb876afc21ceec. Only changes are updating ByteVectorHash comment, adding final specifier, and tweaking includes.
  28. laanwj referenced this in commit bccb4d29a8 on Nov 5, 2018
  29. Extract CSipHasher to it's own file in crypto/ directory.
    This is a move-only commit with the exception of changes to includes.
    4fb789e9b2
  30. blockfilter: Use unordered_set instead of set in blockfilter. fef5adcc33
  31. jimpo force-pushed on Nov 5, 2018
  32. jimpo commented at 11:12 pm on November 5, 2018: contributor
    @sipa I moved bytevectorhash to util/ now.
  33. laanwj commented at 2:36 pm on November 6, 2018: member
    utACK fef5adcc331c4d7b92b71e03fc8a73343a865599
  34. laanwj merged this on Nov 6, 2018
  35. laanwj closed this on Nov 6, 2018

  36. laanwj referenced this in commit 880bc728b4 on Nov 6, 2018
  37. jimpo deleted the branch on Nov 6, 2018
  38. kittywhiskers referenced this in commit 904da1fca5 on Jun 15, 2021
  39. kittywhiskers referenced this in commit 7c564a4316 on Jun 15, 2021
  40. kittywhiskers referenced this in commit a4ee94238b on Jun 16, 2021
  41. kittywhiskers referenced this in commit abf1c91bf2 on Jun 25, 2021
  42. kittywhiskers referenced this in commit f90ceb395a on Jun 25, 2021
  43. kittywhiskers referenced this in commit fa92d7eb88 on Jun 26, 2021
  44. kittywhiskers referenced this in commit 67bb702341 on Jun 26, 2021
  45. kittywhiskers referenced this in commit 9a5a285cfd on Jun 27, 2021
  46. kittywhiskers referenced this in commit 680319643f on Jun 27, 2021
  47. UdjinM6 referenced this in commit 7d664c7c53 on Jun 27, 2021
  48. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 19:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me