bloom: use Span instead of std::vector for insert and contains [ZAP3] #18985

pull jb55 wants to merge 1 commits into bitcoin:master from jb55:2020-05-bloom-span changing 4 files +20 −24
  1. jb55 commented at 1:09 am on May 16, 2020: member

    This builds off #18468 since it simplifies this PR a bunch. Review that first.

    We can avoid many unnecessary std::vector allocations by changing CBloomFilter to take Spans instead of std::vectors for the insert and contains operations.

    CBloomFilter currently converts types such as CDataStream and uint256 to std::vector on insert and contains. This is unnecessary because CDataStreams and uint256s are already std::vectors internally. We just need a way to point to the right data within those types. Span gives us this ability.

    This is a part of the Zero Allocations Project #18849 (ZAP3). This code came up as a place where many allocations occur. Mainly due to allocations of CService::GetKey which is passed to these functions, but this PR is needed before we get to that.

  2. DrahtBot added the label Consensus on May 16, 2020
  3. DrahtBot added the label P2P on May 16, 2020
  4. DrahtBot added the label Tests on May 16, 2020
  5. practicalswift commented at 6:05 am on May 16, 2020: contributor
    Do you have a good way to quantify the impact of this change? That would be interesting to see :)
  6. fanquake removed the label Tests on May 16, 2020
  7. DrahtBot commented at 8:32 am on May 16, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    No conflicts as of last run.

  8. jb55 commented at 2:24 pm on May 16, 2020: member

    @practicalswift this is something I want to do once I figure out how to use the benchmarking suite. This one might be nontrivial though since it’s networking code. I might just do the benchmarking in #18849 to start.

    One piece of data that I can show here is the number of allocations before and after which I’ll do

  9. sipa commented at 8:48 pm on May 16, 2020: member

    Concept ACK, I think this is the right approach. In general I think that most functions that take either a const vector/prevector as input, or begin/end pointer, or a begin pointer + size, should be replaced with versions that just take in a Span.

    It would be worthwhile to make uint256 amendable to the range Span constructor. I think all you’d need is rename the data field to m_data, and then add a data() member function that does the same as begin().

    Nit: uint256 is not a vector internally, but an array.

  10. practicalswift commented at 10:13 pm on May 16, 2020: contributor

    I’ve noticed that we consistently pass Span by const-ref instead of by value in the project (this PR follows that convention): is that an intentional choice? Just curious :)

     0$ git grep -E '\([^(]*Span<.*>[^)]*\)' ":(exclude)src/span.h"
     1src/script/descriptor.cpp:std::string DescriptorChecksum(const Span<const char>& span)
     2src/script/descriptor.cpp:NODISCARD bool ParseKeyPath(const std::vector<Span<const char>>& split, KeyPath& out, std::string& error)
     3src/script/descriptor.cpp:std::unique_ptr<PubkeyProvider> ParsePubkeyInner(uint32_t key_exp_index, const Span<const char>& sp, bool permit_uncompressed, FlatSigningProvider& out, std::string& error)
     4src/script/descriptor.cpp:std::unique_ptr<PubkeyProvider> ParsePubkey(uint32_t key_exp_index, const Span<const char>& sp, bool permit_uncompressed, FlatSigningProvider& out, std::string& error)
     5src/script/descriptor.cpp:std::unique_ptr<DescriptorImpl> ParseScript(uint32_t key_exp_index, Span<const char>& sp, ParseScriptContext ctx, FlatSigningProvider& out, std::string& error)
     6src/script/descriptor.cpp:bool CheckChecksum(Span<const char>& sp, bool require_checksum, std::string& error, std::string* out_checksum = nullptr)
     7src/script/interpreter.cpp:static bool ExecuteWitnessScript(const Span<const valtype>& stack_span, const CScript& scriptPubKey, unsigned int flags, SigVersion sigversion, const BaseSignatureChecker& checker, ScriptError* serror)
     8src/serialize.h:template<typename Stream> inline void Serialize(Stream& s, const Span<const unsigned char>& span) { s.write(CharCast(span.data()), span.size()); }
     9src/serialize.h:template<typename Stream> inline void Serialize(Stream& s, const Span<unsigned char>& span) { s.write(CharCast(span.data()), span.size()); }
    10src/serialize.h:template<typename Stream> inline void Unserialize(Stream& s, Span<unsigned char>& span) { s.read(CharCast(span.data()), span.size()); }
    11src/test/util_tests.cpp:static std::string SpanToStr(Span<const char>& span)
    12src/util/spanparsing.cpp:bool Const(const std::string& str, Span<const char>& sp)
    13src/util/spanparsing.cpp:bool Func(const std::string& str, Span<const char>& sp)
    14src/util/spanparsing.cpp:Span<const char> Expr(Span<const char>& sp)
    15src/util/spanparsing.cpp:std::vector<Span<const char>> Split(const Span<const char>& sp, char sep)
    16src/util/spanparsing.h:bool Const(const std::string& str, Span<const char>& sp);
    17src/util/spanparsing.h:bool Func(const std::string& str, Span<const char>& sp);
    18src/util/spanparsing.h:Span<const char> Expr(Span<const char>& sp);
    19src/util/spanparsing.h:std::vector<Span<const char>> Split(const Span<const char>& sp, char sep);
    
  11. DrahtBot added the label Needs rebase on Aug 3, 2020
  12. theStack commented at 9:44 pm on August 11, 2020: member

    Concept ACK. Will review as soon as this is rebased.

    I’ve noticed that we consistently pass Span by const-ref instead of by value in the project (this PR follows that convention): is that an intentional choice? Just curious :) @practicalswift: Good point. I guess seen from a performance perspective it doesn’t really make a difference, considering how lightweight spans are. In any case I agree that we should agree on one passing type and use it consistently in the code-base (my personal preference would be to always pass it by value, though).

  13. jb55 commented at 11:22 pm on August 11, 2020: member

    Sebastian Falbesoner notifications@github.com writes:

    Concept ACK. Will review as soon as this is rebased.

    I’ve noticed that we consistently pass Span by const-ref instead of by value in the project (this PR follows that convention): is that an intentional choice? Just curious :)

    @practicalswift: Good point. I guess seen from a performance perspective it doesn’t really make a difference, considering how lightweight spans are. In any case I agree that we should agree on one passing type and use it consistently in the code-base (my personal preference would be to always pass it by value, though).

    It looks like they are being passed by value in other PRs, I don’t mind switching to that style here.

  14. practicalswift commented at 9:41 am on August 12, 2020: contributor

    @theStack @jb55 FWIW, throughout the C++ Core Guidelines span is consistently passed by value, and the following is said about it:

    Note: A span<T> object does not own its elements and is so small that it can be passed by value. Passing a span object as an argument is exactly as efficient as passing a pair of pointer arguments or passing a pointer and an integer count. (F.24)

    Same goes for en.cppreference.com FWIW.

  15. jb55 force-pushed on Aug 14, 2020
  16. jb55 commented at 4:18 pm on August 14, 2020: member
    rebased and switched from const Span<const unsigned char>& style to Span<const unsigned char>
  17. DrahtBot removed the label Needs rebase on Aug 14, 2020
  18. in src/bloom.cpp:62 in 1466b65662 outdated
    58@@ -59,17 +59,15 @@ void CBloomFilter::insert(const COutPoint& outpoint)
    59 {
    60     CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
    61     stream << outpoint;
    62-    std::vector<unsigned char> data(stream.begin(), stream.end());
    63-    insert(data);
    64+    insert(Span<const unsigned char>((const unsigned char*)stream.data(), stream.size()));
    


    theStack commented at 12:51 pm on August 15, 2020:
    Could simply use the recently introduced MakeUCharSpan() helper here and on other occurences of the commit.
  19. promag commented at 2:21 pm on August 15, 2020: member
    Concept ACK.
  20. jb55 commented at 0:37 am on August 17, 2020: member

    Sebastian Falbesoner notifications@github.com writes:

    Could simply use the recently introduced MakeUCharSpan() helper here and on other occurences of the commit.

    oh nice, I didn’t know that was a thing. I’ll do that.

  21. jb55 force-pushed on Aug 17, 2020
  22. jb55 commented at 3:21 pm on August 17, 2020: member

    pushed 897f5be28c072c700a55a511bf0a88efc3b71602

    git diff 1466b656625b3896176d3aa6f794d1d8d0f7c3e0 897f5be28c072c700a55a511bf0a88efc3b71602

    much cleaner 🤩

     0diff --git a/src/bloom.cpp b/src/bloom.cpp
     1index 436e913aa5..c19d6e7d7d 100644
     2--- a/src/bloom.cpp
     3+++ b/src/bloom.cpp
     4@@ -59,12 +59,12 @@ void CBloomFilter::insert(const COutPoint& outpoint)
     5 {
     6     CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
     7     stream << outpoint;
     8-    insert(Span<const unsigned char>((const unsigned char*)stream.data(), stream.size()));
     9+    insert(MakeUCharSpan(stream));
    10 }
    11 
    12 void CBloomFilter::insert(const uint256& hash)
    13 {
    14-    insert(Span<const unsigned char>(hash.begin(), hash.end()));
    15+    insert(MakeUCharSpan(hash));
    16 }
    17 
    18 bool CBloomFilter::contains(Span<const unsigned char> vKey) const
    19@@ -85,13 +85,12 @@ bool CBloomFilter::contains(const COutPoint& outpoint) const
    20 {
    21     CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
    22     stream << outpoint;
    23-    return contains(Span<const unsigned char>((const unsigned char*)stream.data(),
    24-        stream.size()));
    25+    return contains(MakeUCharSpan(stream));
    26 }
    27 
    28 bool CBloomFilter::contains(const uint256& hash) const
    29 {
    30-    return contains(Span<const unsigned char>(hash.begin(), hash.end()));
    31+    return contains(MakeUCharSpan(hash));
    32 }
    33 
    34 bool CBloomFilter::IsWithinSizeConstraints() const
    35@@ -241,7 +240,7 @@ void CRollingBloomFilter::insert(Span<const unsigned char> vKey)
    36 
    37 void CRollingBloomFilter::insert(const uint256& hash)
    38 {
    39-    insert(Span<const unsigned char>(hash.begin(), hash.end()));
    40+    insert(MakeUCharSpan(hash));
    41 }
    42 
    43 bool CRollingBloomFilter::contains(Span<const unsigned char> vKey) const
    44@@ -260,7 +259,7 @@ bool CRollingBloomFilter::contains(Span<const unsigned char> vKey) const
    45 
    46 bool CRollingBloomFilter::contains(const uint256& hash) const
    47 {
    48-    return contains(Span<const unsigned char>(hash.begin(), hash.end()));
    49+    return contains(MakeUCharSpan(hash));
    50 }
    51 
    52 void CRollingBloomFilter::reset()
    53diff --git a/src/test/bloom_tests.cpp b/src/test/bloom_tests.cpp
    54index 1ad5607429..9cf6e352e4 100644
    55--- a/src/test/bloom_tests.cpp
    56+++ b/src/test/bloom_tests.cpp
    57@@ -91,7 +91,7 @@ BOOST_AUTO_TEST_CASE(bloom_create_insert_key)
    58     CBloomFilter filter(2, 0.001, 0, BLOOM_UPDATE_ALL);
    59     filter.insert(vchPubKey);
    60     uint160 hash = pubkey.GetID();
    61-    filter.insert(Span<unsigned char>(hash.begin(), hash.end()));
    62+    filter.insert(MakeUCharSpan(hash));
    63 
    64     CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
    65     stream << filter;
    
  23. theStack commented at 6:10 pm on August 24, 2020: member
    LGTM now 👍 Only the include change in net.h is a bit confusing… there is no direct part in net.h that needs a Span as far as I can see. Wouldn’t it be more obvious to simply include <span.h> in bloom.h as it is used there in this PR?
  24. DrahtBot added the label Needs rebase on Sep 29, 2020
  25. bloom: use Span instead of std::vector for `insert` and `contains`
    We can avoid many unnecessary std::vector allocations by changing
    CBloomFilter to take Spans instead of std::vector's for the `insert`
    and `contains` operations.
    
    CBloomFilter currently converts types such as CDataStream and uint256
    to std::vector on `insert` and `contains`. This is unnecessary because
    CDataStreams and uint256 are already std::vectors internally. We just
    need a way to point to the right data within those types. Span gives
    us this ability.
    
    Signed-off-by: William Casarin <jb55@jb55.com>
    be8c3b4d6f
  26. jb55 force-pushed on Apr 28, 2021
  27. jb55 commented at 9:27 pm on April 28, 2021: member
    rebased
  28. DrahtBot removed the label Needs rebase on Apr 28, 2021
  29. vincenzopalazzo approved
  30. vincenzopalazzo commented at 10:28 pm on May 4, 2021: none

    utACK https://github.com/bitcoin/bitcoin/commit/be8c3b4d6f86064914fd2151720bd5f80932a442

    Theoretically, this could be a good improvement because it removes the smart pointer from the game, but I never use it in practice the span and I’m curious to see how much improvement brings this game.

  31. jb55 commented at 5:49 pm on May 5, 2021: member

    On Tue, May 04, 2021 at 03:28:59PM -0700, Vincenzo Palazzo wrote:

    Theoretically, this could be a good improvement because it removes the smart pointer from the game, but I never use it in practice the span and I’m curious to see how much improvement brings this game.

    See my comment here:

    This code came up as a place where many allocations occur. Mainly due to allocations of CService::GetKey which is passed to these functions, but this PR is needed before we get to that.

    This PR is a stepping stone to:

    [ZAP4] netaddress: return a prevector from CService::GetKey() [1]
    

    Which removes a bunch of unnecessary heap allocations during network operations.

    [1] #18849

  32. 0xB10C commented at 1:51 pm on August 3, 2021: member
    Concept ACK. Have you figured out a way to benchmark this?
  33. MarcoFalke removed the label Consensus on Aug 5, 2021
  34. MarcoFalke removed the label P2P on Aug 5, 2021
  35. MarcoFalke added the label Refactoring on Aug 5, 2021
  36. in src/bloom.h:118 in be8c3b4d6f
    111@@ -112,9 +112,9 @@ class CRollingBloomFilter
    112 public:
    113     CRollingBloomFilter(const unsigned int nElements, const double nFPRate);
    114 
    115-    void insert(const std::vector<unsigned char>& vKey);
    116+    void insert(Span<const unsigned char> vKey);
    117     void insert(const uint256& hash);
    118-    bool contains(const std::vector<unsigned char>& vKey) const;
    119+    bool contains(Span<const unsigned char> vKey) const;
    120     bool contains(const uint256& hash) const;
    


    MarcoFalke commented at 3:52 pm on August 5, 2021:
    you can remove this method now
  37. in src/bloom.h:75 in be8c3b4d6f
    72     void insert(const uint256& hash);
    73 
    74-    bool contains(const std::vector<unsigned char>& vKey) const;
    75+    bool contains(Span<const unsigned char> vKey) const;
    76     bool contains(const COutPoint& outpoint) const;
    77     bool contains(const uint256& hash) const;
    


    MarcoFalke commented at 3:52 pm on August 5, 2021:
    same
  38. MarcoFalke commented at 3:53 pm on August 5, 2021: member
    Needs rebase
  39. in src/test/bloom_tests.cpp:86 in be8c3b4d6f
    82@@ -83,7 +83,7 @@ BOOST_AUTO_TEST_CASE(bloom_create_insert_key)
    83     CBloomFilter filter(2, 0.001, 0, BLOOM_UPDATE_ALL);
    84     filter.insert(vchPubKey);
    85     uint160 hash = pubkey.GetID();
    86-    filter.insert(std::vector<unsigned char>(hash.begin(), hash.end()));
    87+    filter.insert(MakeUCharSpan(hash));
    


    martinus commented at 9:17 am on August 16, 2021:
    No need for MakeUCharSpan, filter.insert(hash); works now too
  40. in src/bloom.cpp:62 in be8c3b4d6f
    58@@ -59,17 +59,15 @@ void CBloomFilter::insert(const COutPoint& outpoint)
    59 {
    60     CDataStream stream(SER_NETWORK, PROTOCOL_VERSION);
    61     stream << outpoint;
    62-    std::vector<unsigned char> data(stream.begin(), stream.end());
    63-    insert(data);
    64+    insert(MakeUCharSpan(stream));
    


    martinus commented at 9:18 am on August 16, 2021:
    No need for MakeUCharSpan
  41. MarcoFalke added the label Waiting for author on Aug 16, 2021
  42. MarcoFalke added the label Needs rebase on Aug 16, 2021
  43. DrahtBot removed the label Needs rebase on Aug 16, 2021
  44. laanwj commented at 2:43 pm on August 16, 2021: member

    MarcoFalke added Waiting for author Needs rebase labels 3 hours ago DrahtBot removed the Needs rebase label 2 hours ago

    I think DrahtBot screwed up here. No rebase happened in between.

  45. jb55 closed this on Sep 20, 2021

  46. sipa commented at 5:54 pm on September 20, 2021: member
    :(
  47. laanwj commented at 7:40 pm on September 20, 2021: member
    Any specific reason for closing?
  48. jb55 commented at 9:19 pm on September 20, 2021: member
    lost interest. It’s fine with me if someone else wants to pick it up.
  49. vincenzopalazzo commented at 11:00 pm on September 20, 2021: none

    @jb55, I can be interested in it.

    I love your optimization, and maybe can be also a good starting point to push some code on bitcoin :-)

  50. MarcoFalke removed the label Waiting for author on Sep 21, 2021
  51. MarcoFalke added the label Up for grabs on Sep 21, 2021
  52. fanquake removed the label Up for grabs on Sep 28, 2021
  53. fanquake commented at 7:39 am on September 28, 2021: member
    Rebased, with comments addressed in #23115.
  54. MarcoFalke referenced this in commit 829c441af2 on Sep 29, 2021
  55. jb55 deleted the branch on Oct 5, 2021
  56. DrahtBot locked this on Oct 30, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 13:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me