[POC] Rust based Cuckoo Filter for m_addr_known #21837

pull fanquake wants to merge 2 commits into bitcoin:master from fanquake:rust_cuckoo_filters changing 29 files +1351 −9
  1. fanquake commented at 4:37 am on May 3, 2021: member

    This proof of concept PR (leveraging work done by Cory Fields & Jeremy Rubin, see #16834) replaces the rolling bloom filter used for m_addr_known with a Cuckoo Filter written in Rust. I’ve have made some minor build related adjustments to the Rust code, which can be seen here: https://github.com/fanquake/rust-cuckoofilter/tree/cabi_build_adjustments.

    See “Cuckoo Filter: Practically Better Than Bloom”:

    In many networking systems, Bloom filters are used for high-speed set membership tests. They permit a small fraction of false positive answers with very good space efficiency. However, they do not permit deletion of items from the set, and previous attempts to extend “standard” Bloom filters to support deletion all degrade either space or performance.

    We propose a new data structure called the cuckoo filter that can replace Bloom filters for approximate set member- ship tests. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters.

    For applications that store many items and target moderately low false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom filters. Our experimental results also show that cuckoo filters out-perform previous data structures that extend Bloom filters to support deletions substantially in both time and space.

    Using this is a matter of:

    0./autogen.sh
    1./configure --enable-experimental-rust
    2make
    3make -C src rusty-check
    4src/bitcoind 
    

    Note that sometimes compilation will finish (i.e due to ccache) before cargo has finished generating the header and Rust lib, which will result in a compile error:

    0./net.h:43:10: fatal error: rusty/out/rcf_cuckoofilter.h: No such file or directory
    1   43 | #include <rusty/out/rcf_cuckoofilter.h>
    

    In this case you can just re-run make. Has been tested on macOS and Linux. Sometimes p2p_getaddr_caching.py fails, because the number of records returned falls a few short of MAX_ADDR_TO_SEND. Need investigating.

    I’m not suggesting that this be merged as-is, or that this is the ideal way of integrating Rust code (i.e copying sources in tree, using cbindgen), into Bitcoin Core. What I am suggesting is that the Rust discussion should continue, particularly in regards to integrations that can be done in a very non-invasive / modular fashion.

    Doesn’t crash, but may catch your machine on fire 🔥, use with caution. More Rust related discussion available in #17090.

  2. [POC] build: Integrate rust-cuckoofilter library
    Integrates a slightly modified version of https://github.com/axiomhq/rust-cuckoofilter,
    so that headers are generated for use from our C++ code. You can see the
    modifications in my fork: https://github.com/fanquake/rust-cuckoofilter.
    
    This leverages existing work from Cory & Jeremy.
    
    Co-Authored-By: Jeremy Rubin <j@rubin.io>
    Co-Authored-By: Cory Fields <cory-nospam-@coryfields.com>
    9ebc927e0e
  3. p2p: experimental Rust based cuckoo filter dd78b53d94
  4. fanquake added the label Brainstorming on May 3, 2021
  5. fanquake added the label P2P on May 3, 2021
  6. MarcoFalke commented at 5:19 am on May 3, 2021: member

    What I am suggesting is that the Rust discussion should continue, particularly in regards to integrations that can be done in a very non-invasive / modular fashion.

    I am not sure if net processing is the right place for a rust playground. It might be better to allow devs to write some non-production code in rust. For example, let them write the unit tests in rust if they wish to do so. This wouldn’t be much different than the option to write tests today in either C++ or Python.

    I understand that your goal is to make the newly added code optional by effectively duplicating it, but the more-than-doubling of the review and maintenance burden is reason enough to Approach NACK this pull. Wasn’t one of the suggestions last time this came up to have a separate net-processing module written in rust that speaks with Bitcoin Core to give it blocks (completely separate from the existing C++ net-processing)?

  7. sipa commented at 5:28 am on May 3, 2021: member

    I don’t have too much opinion on the Rust integration side of things. If that’s a direction people want to go in, integration of a existing, functioning, library with a limited interface is probably one of the better places to start. As @MarcoFalke says, something test-only at first might be even better.

    Regarding cuckoo filters… they’re great, and there are really nice ways of using them for rolling probabilistic data structures.

    I don’t think this library is really what we want though. As I understand it, it deletes random elements when they overflow. That implies that there is basically 0 minimal retaining time for elements in the filter. For m_addr_known I don’t think that matters very much, but we have half a dozen other RollingBloomFilters, and for some of them the guaranteed time of retaining matters.

    I’m a bit biased here of course; @gmaxwell and I have done some research on constructing a rolling cuckoo filter, and found a way to build one with pretty good properties. I have a old branch, but it may be a while before I get it picked up again.

  8. DrahtBot commented at 7:51 am on May 3, 2021: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #21859 (Add minisketch subtree and integrate in build/test by sipa)
    • #21667 (build: Add -Werror=missing-noreturn by hebasto)
    • #21186 (net/net processing: Move addr data into net_processing by jnewbery)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  9. fanquake closed this on Jun 9, 2021

  10. kiminuo commented at 10:26 am on June 9, 2021: contributor
    Too bad. I hoped it would get more traction.
  11. DrahtBot locked this on Aug 18, 2022
  12. fanquake deleted the branch on Nov 9, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 19:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me