doc: clarify CRollingBloomFilter size estimate #19968

pull robot-dreams wants to merge 1 commits into bitcoin:master from robot-dreams:bloom-doc changing 1 files +12 −1
  1. robot-dreams commented at 10:34 AM on September 17, 2020: contributor

    Based on #19130, this change improves the comment for CRollingBloomFilter in bloom.h:

    • Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate"
    • Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter)
    • Reconcile the newly added code with the existing approximation
  2. fanquake added the label Docs on Sep 17, 2020
  3. robot-dreams force-pushed on Sep 17, 2020
  4. robot-dreams force-pushed on Sep 18, 2020
  5. in src/bloom.h:111 in 01d14f2bc5 outdated
     107 | +    logfp = math.log(fprate)
     108 | +    nhash = max(1, min(round(logfp/math.log(0.5)), 50))
     109 | +    nfilbits = math.ceil(-1.0*nhash*(3*int((nel+1)/2)) / math.log(1.0 - math.exp(logfp / nhash)))
     110 | +    return (nfilbits+63)/64*2*8
     111 | +
     112 | + * If we make these assumptions:
    


    dhruv commented at 6:36 PM on September 26, 2020:

    Nit: I found these next 6 lines added complexity to something that was very clear thus far. Do you think the simpler formula makes it easier to reason about the size of the bloom filter?


    laanwj commented at 11:23 AM on September 30, 2020:

    How does this render on doxygen? If it is markdown style you'd likely want to quote an inline script using triple-backquotes.

    TBH I'm not sure the python script adds much in the first place. Sometimes a simple but good-enough approximation is better for documentation purposes. Is this level of precision necessary?

    (edit: whoops, didn't mean to click the resolve conversation button)


    robot-dreams commented at 3:31 AM on October 2, 2020:

    Yeah, fair question @laanwj . I picked this up from the "up for grabs" #19130 but I could go either way—if we think the extra precision doesn't really help then we can close this PR.


    sipa commented at 3:42 AM on October 2, 2020:

    See my comment: #19130 (comment)

    Given how accurate the formula is in all reasonable cases (at most 0.115% off, and usually much less), I don't think it's worth including the code.

    The rest of the PR seems useful, though.


    robot-dreams commented at 4:10 AM on October 2, 2020:

    Thanks for the input! Agreed that the extra complexity from the Python script is not worthwhile for the small accuracy gain; I'll update.


    robot-dreams commented at 5:30 AM on October 2, 2020:

    Ooh, didn't know about doxygen use before, thanks for the reminder! The formatting should be reasonable now:

    <img width="807" alt="Doxygen" src="https://user-images.githubusercontent.com/4276679/94890948-9a186580-0435-11eb-808e-4902ae092b69.png">

  6. in src/bloom.h:97 in 01d14f2bc5 outdated
      93 | @@ -94,7 +94,26 @@ class CBloomFilter
      94 |   * insert()'ed ... but may also return true for items that were not inserted.
      95 |   *
      96 |   * It needs around 1.8 bytes per element per factor 0.1 of false positive rate.
      97 | - * (More accurately: 3/(log(256)*log(2)) * log(1/fpRate) * nElements bytes)
      98 | + * e.g. for 1000 elements, we'd need:
    


    dhruv commented at 6:38 PM on September 26, 2020:

    Nit: Maybe add "Lower tolerance for false positives increases the size of the bloom filter manifesting a trade-off between memory and compute (in the case of a false positive)."

  7. dhruv commented at 6:42 PM on September 26, 2020: member

    ACK 01d14f2

    I tested this by being new to bitcoin and having only a high-level understanding of bloom filters. The commit made the trade-off between the acceptable false positive rate and memory footprint easy to understand. A couple optional nits below.

  8. fanquake requested review from laanwj on Sep 29, 2020
  9. fanquake requested review from sipa on Sep 29, 2020
  10. robot-dreams closed this on Oct 2, 2020

  11. robot-dreams reopened this on Oct 2, 2020

  12. doc: clarify CRollingBloomFilter size estimate d9141a0002
  13. robot-dreams force-pushed on Oct 2, 2020
  14. robot-dreams renamed this:
    doc: make it easier to work out size of bloom filter
    doc: clarify CRollingBloomFilter size estimate
    on Oct 2, 2020
  15. laanwj commented at 10:44 AM on November 19, 2020: member

    ACK d9141a0002bb508b2e94e206a1bd28ef8f97ffde

  16. laanwj merged this on Nov 19, 2020
  17. laanwj closed this on Nov 19, 2020

  18. sidhujag referenced this in commit 283a159512 on Nov 19, 2020
  19. zkbot referenced this in commit be459619a8 on Mar 5, 2021
  20. zkbot referenced this in commit 78de0cdf46 on Apr 15, 2021
  21. PastaPastaPasta referenced this in commit d4c943a872 on Jun 27, 2021
  22. PastaPastaPasta referenced this in commit fb5804a0b2 on Jun 28, 2021
  23. PastaPastaPasta referenced this in commit 108c9b1891 on Jun 29, 2021
  24. PastaPastaPasta referenced this in commit fc333b0a9a on Jul 1, 2021
  25. PastaPastaPasta referenced this in commit 1d3ab5aa9d on Jul 1, 2021
  26. PastaPastaPasta referenced this in commit 101a0bba40 on Jul 15, 2021
  27. PastaPastaPasta referenced this in commit e376c205dd on Jul 15, 2021
  28. PastaPastaPasta referenced this in commit a9b5723556 on Jul 16, 2021
  29. gabriel-bjg referenced this in commit 105338f255 on Jul 16, 2021
  30. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-22 12:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me