Bloom filters #1795

pull TheBlueMatt wants to merge 17 commits into bitcoin:master from TheBlueMatt:bloom changing 21 files +1315 −62
  1. TheBlueMatt commented at 7:49 PM on September 6, 2012: member

    Filter tx invs and add MSG_FILTERED_BLOCK to provide blocks as header+vector<merkle block> (not including tx itself)

  2. BitcoinPullTester commented at 4:32 AM on September 8, 2012: none

    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/bd93f61c2abe5b19e77030fa5859bc370c789a3a for binaries and test log.

  3. gavinandresen commented at 11:19 PM on September 18, 2012: contributor

    Very exciting.

    Did you see this: http://matthias.vallentin.net/blog/2011/06/a-garden-variety-of-bloom-filters/

    He has a really interesting looking C++11 headers-only implementation here at github that might give you inspiration (e.g. I think he uses boost to do the hashing; he seems to get a ton done in a very small number of lines of code). https://github.com/mavam/libbf

  4. jgarzik commented at 1:09 AM on September 27, 2012: contributor

    implementation ACK, I like it a lot. This seems to match the initial sketch from @mikehearn and myself on IRC.

  5. jgarzik commented at 9:25 PM on October 3, 2012: contributor

    Poke, let's get some additional review on this.

  6. mikehearn commented at 1:11 PM on October 24, 2012: contributor

    Hey Matt, could you rebase this on top of ultraprune? I'm ready to start merging the bitcoinj side but want to test this with the latest code. Will do a review now-ish.

  7. mikehearn commented at 1:45 PM on October 24, 2012: contributor

    ACK on this. Most of my comments are purely cosmetic except for the unit test commit being in the wrong place (it wouldn't compile in the order given).

    I guess we'll need a BIP? I'll start on one that documents what we've come up with.

  8. TheBlueMatt commented at 8:45 PM on November 2, 2012: member

    Thanks mike for the good review (as usual). I updated the code to addresses most of the issues you mentioned. Changes:

    • many additional comments
    • filteradd is now limited to 1MB, any larger and the node gets a Misbehaving(100).
    • fRelayTxn now defaults to false - this means we will no longer relay tx inv message before receiving the remote peer's version message, if this irks anyone it can be discussed. More changes to come (sipa's new merkle block stuff, more unit tests and a seed twiddle value for bloom filters)
  9. TheBlueMatt commented at 8:58 PM on November 2, 2012: member

    @mikehearn there are a few comments which note what the protocol defines as spec (these are obviously up for discussion):

    • there is no formal definition for filter parameters for a filteradd command if no filter is loaded - it is up to the serving node (if the client doest care, why should the spec determine the parameters?)
    • as noted above, 1MB limit to filteradd
    • txes in MSG_FILTERED_BLOCK can be relayed even when a node already has seen it (the current code does this) however it MUST always forward txn that the node has not seen (I think the bip says this, but its not really clearly worded imho, I updated the BIP to make this more clear)
  10. BitcoinPullTester commented at 9:12 PM on November 2, 2012: none

    Automatic sanity-testing: FAILED BUILD/TEST, see http://jenkins.bluematt.me/pull-tester/1860b72e356f0d4279b25bf1ca562ac00511eded for binaries and test log.

    This could happen for one of several reasons:

    1. It chanages paths in makefile.linux-mingw or otherwise changes build scripts in a way that made them incompatible with the automated testing scripts
    2. It does not build on either Linux i386 or Win32 (via MinGW cross compile)
    3. The test suite fails on either Linux i386 or Win32
    4. The block test-cases failed (lookup the first bNN identifier which failed in https://github.com/TheBlueMatt/test-scripts/blob/master/FullBlockTestGenerator.java)
  11. TheBlueMatt commented at 9:18 PM on November 2, 2012: member

    @BitcoinPullTester hey, I wasnt done yet!

  12. BitcoinPullTester commented at 11:23 AM on November 3, 2012: none

    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/5be9d251b5901399c6f70b99f1c2bddc4e14e7ec for binaries and test log.

  13. TheBlueMatt commented at 12:01 AM on November 12, 2012: member

    Added the requested nTweak value and merged sipa's work on partial merkle tree representations. The BIP now needs rewritten and this is ready for second (third?) round reviews.

  14. sipa commented at 7:55 PM on November 18, 2012: member

    Mostly ACK;

    • I'd prefer to have the Murmurhash3 implementation separate from CBloomFilter
    • The 0xFFFFFFFF/(n-1)_i seed value seems intended to result in large bit differences between the different hash function's seeds. Together with the tweak, however, the first and the last now get seeds tweak and tweak-1. I think something simpler like k1_i+k2*n+tweak is better (with k1 and k2 arbitrary large odd 32-bit integers).
    • I feel uneasy about the arbitrary filter parameters used for the implicitly created filter when sending filteradd without filterload. The server cannot be expected to make a reasonable guess how the client is going to use the filter, and the client still has to track how large the server-side filter grows anyway. I'd prefer to make this simply illegal (DoS 100): filteradd always requires an active filter. Maybe the actual filter data in filterload can be made optional: if it is omitted, it's assumed to be all zeroes (though that would mean the size has to be specified).
  15. BitcoinPullTester commented at 6:56 PM on November 27, 2012: none

    Automatic sanity-testing: FAILED MERGE, see http://jenkins.bluematt.me/pull-tester/21e780e05c7b18ab318e57bd9e24961a34378e2d for test log.

    This pull does not merge cleanly onto current master

  16. BitcoinPullTester commented at 9:09 AM on December 18, 2012: none

    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/a5a72aa691296af6de92658ff998da120c4b34d8 for binaries and test log.

  17. in src/main.cpp:None in a5a72aa691 outdated
    3379 | +        vector<unsigned char> vData;
    3380 | +        vRecv >> vData;
    3381 | +
    3382 | +        // Nodes must NEVER send a data item > 520 bytes (the max size for a script data object,
    3383 | +        // and thus, the maximum size any matched object can have) in a filteradd message
    3384 | +        if (vData.size() > 520)
    


    mikehearn commented at 1:27 PM on January 4, 2013:

    If vData is too large, it will still be added to the filter. There needs to be an else block here.


    TheBlueMatt commented at 12:39 AM on January 7, 2013:

    yep, fixed.


    Diapolo commented at 7:28 PM on January 11, 2013:

    Wouldn't it be nice if that 520 was a constant somewhere or a define? So it could be used for the max size for a script data object and in here.

  18. in src/main.cpp:None in a5a72aa691 outdated
    3386 | +
    3387 | +        LOCK(pfrom->cs_filter);
    3388 | +        if (pfrom->pfilter)
    3389 | +            pfrom->pfilter->insert(vData);
    3390 | +        else
    3391 | +            pfrom->Misbehaving(100);
    


    mikehearn commented at 1:27 PM on January 4, 2013:

    Maybe a log message here would be appropriate?


    TheBlueMatt commented at 12:39 AM on January 7, 2013:

    Misbehaving already logs info on who/how much, it would be kinda nice to pass it a why/where in code param, but that should be in a separate pull.

  19. mikehearn commented at 2:34 PM on January 9, 2013: contributor

    Can you add a new feature that lets clients opt out of auto-adding? Sipa points out that it's only necessary for wallets that contain outputs which don't require predictable data in the corresponding inputs, which (for now) is most of them. This should resolve the excessive expansion issues.

  20. gmaxwell commented at 2:47 PM on January 9, 2013: contributor

    @mikehearn Kind of an an unfortunate privacy loss for ones who do set it, and it still leaves the feature potentially useless for any who set it. (But I agree on having the option, if for nothing else because it makes the load-balancing lark I suggested viable)

  21. mikehearn commented at 3:11 PM on January 9, 2013: contributor

    Well not really, you can still set the FP rate to whatever you want. If all your outputs are pay-to-address then it's a no-op functionality-wise, the auto-adding didn't buy you anything. For the case where you have pay-to-pubkey or other scripts, the client has to refresh the filter every so often .... it can be implemented in bcj later.

  22. gmaxwell commented at 6:09 PM on January 9, 2013: contributor

    Setting that bit or not is still another anonymity set reduction, but I'm splitting hairs there ignore it. I guess I don't have a good feel for how fast the filter fails and needs to be reset. If it's too fast periodic refreshes won't be sufficient (E.g. will the size of the refresh plus the unwanted data end up being more than just not using the filter).

  23. mikehearn commented at 7:18 PM on January 9, 2013: contributor

    I think we can explore different algorithms later. The exact choices will depend on the users privacy preferences and device situation anyway.

    On Wed, Jan 9, 2013 at 7:09 PM, Gregory Maxwell notifications@github.comwrote:

    Setting that bit or not is still another anonymity set reduction, but I'm splitting hairs there ignore it. I guess I don't have a good feel for how fast the filter fails and needs to be reset. If it's too fast periodic refreshes won't be sufficient (E.g. will the size of the refresh plus the unwanted data end up being more than just not using the filter).

    — Reply to this email directly or view it on GitHubhttps://github.com/bitcoin/bitcoin/pull/1795#issuecomment-12058432.

  24. TheBlueMatt commented at 3:44 AM on January 11, 2013: member

    Added a nFlags to let the peer pick how/when it wants the filter updated...also added a second two commits to make it possible to match an arbitrary script template as a discussion point (that second part is not yet tested). Would like comments.

  25. gavinandresen commented at 4:19 PM on January 16, 2013: contributor

    Code review notes:

    I don't like the entire implementation of CPartialMerkleTree being stuffed into main.h, it should be its own .cpp/.h

    Message handling looks good from a potential vulnerability point of view.

    NACK on the script.cpp / script.h changes -- they are unused (yes? I don't see where MatchesTemplate is used outside of script.cpp/.h) and I don't see unit tests for the new features like OP_OPCODE.

  26. TheBlueMatt commented at 4:58 PM on January 16, 2013: member

    @gavinandresen Yes, after discussion I believe we are currently targeting skipping the 2 MatchesTemplate commits for 0.8 and maybe skipping those entirely depending on what we may need in the future.

  27. Add const versions of base_uint.end()/begin(), make size() const. 68feac96b6
  28. Add MurmurHash3 implementation to hash.h/add hash.cpp. 7ab026f449
  29. Add a CBloomFilter class for use as a transaction filter. bd21612c37
  30. Bump PROTOCOL_VERSION for filter messages. 133a546074
  31. Add a filter field in CNode, add filterload+filteradd+filterclear 422d122537
  32. Replace RelayMessage with RelayTransaction. 269d9c6492
  33. Automatically add any matching outputs to a filter during matching. d3b26f7077
  34. Add a CBlock.GetBlockHeader 587f0f855e
  35. Add a CMerkleBlock to store merkle branches of filtered txes. 9fb106e757
  36. Add test cases for CMerkleBlock and CBloomFilter. 2878c67cb5
  37. Relay CMerkleBlocks when asked for MSG_FILTERED_BLOCK b02ddbedcb
  38. Let a node opt out of tx invs before we get a their bloom filter
    Note that the default value for fRelayTxes is false, meaning we
    now no longer relay tx inv messages before receiving the remote
    peer's version message.
    4c8fc1a588
  39. Add a nTweak to bloom filters to tweak the seed. b1f99bed6f
  40. Add CPartialMerkleTree
    This adds a compact representation for a subset of a merkle tree's
    nodes.
    4bedfa9223
  41. Use CPartialMerkleTree for CMerkleBlock transactions. 21aaf255ff
  42. Add nFlags to CBloomFilter to make filter updating optional. e1a4f3778c
  43. Filter mempool command c51694eb9b
  44. BitcoinPullTester commented at 7:49 PM on January 16, 2013: none

    Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/e1a4f3778cb90ba9f0d4e736752f78dad1703caa for binaries and test log.

  45. TheBlueMatt commented at 7:52 PM on January 16, 2013: member

    @gavinandresen moved it to main.cpp, is that ok for you?

  46. gavinandresen commented at 8:59 PM on January 16, 2013: contributor

    ACK, looks good.

  47. gavinandresen referenced this in commit 91f70a75da on Jan 17, 2013
  48. gavinandresen merged this on Jan 17, 2013
  49. gavinandresen closed this on Jan 17, 2013

  50. rebroad commented at 9:45 AM on March 27, 2013: contributor

    So.. what was the BIP for these protocol changes? It would be nice if it could be referenced in the pull requests.

  51. in src/main.cpp:None in c51694eb9b
    2968 | @@ -2815,6 +2969,10 @@ bool static ProcessMessage(CNode* pfrom, string strCommand, CDataStream& vRecv)
    2969 |              vRecv >> pfrom->strSubVer;
    2970 |          if (!vRecv.empty())
    2971 |              vRecv >> pfrom->nStartingHeight;
    2972 | +        if (!vRecv.empty())
    2973 | +            vRecv >> pfrom->fRelayTxes; // set to true after we get the first filter* message
    


    rebroad commented at 9:51 AM on March 27, 2013:

    Somehow this doesn't feel like the best way to implement this....

  52. mikehearn commented at 10:26 AM on March 27, 2013: contributor

    https://en.bitcoin.it/wiki/BIP_0037 as was discussed on the mailing list. Regardless, this code was already merged and shipped months ago, it's a bit late to be commenting on it now. You can always open up new pull reqs if you want to change how the protocol works.

  53. laudney referenced this in commit 2481237f38 on Mar 19, 2014
  54. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-19 12:16 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me