net: calculate compact filters on the fly for an already advertised block #35262

pull randomlogin wants to merge 2 commits into bitcoin:master from randomlogin:p2p-cbf-advertised-block-not-found changing 6 files +437 −29
  1. randomlogin commented at 5:55 PM on May 11, 2026: none

    A compact block filter request might arrive when a block has already been processed (and advertised) but its filter has not yet been constructed. In that window the node fails to respond, which the requesting peer treats as misbehaviour and may disconnect/ban.

    Fixes: #29655, #27085 Related discussion: 1, 2

    The problem: Bitcoin Core receives a new block and processes it, then advertises it; however, by that moment the compact filter might not yet have been constructed. If a peer asks for a filter in that window, Bitcoin Core simply does not respond. From the peer's perspective this is a misbehaviour (we advertised a block and don't provide filters for it), which results in disconnects or bans. While it does not critically affect core nodes, it breaks the network topology for BIP 157 clients.

    More precisely, BIP157 specifies that for getcfilters requestor:

    StopHash MUST be known to belong to a block accepted by the receiving peer. This is the case if the peer had previously sent a headers or inv message with that block or any descendents. A node that receives getcfilters with an unknown StopHash SHOULD NOT respond.
    

    Current behaviour clearly contradicts the above specification.

    I (re-)discovered it while testing "light" compact block filter nodes kyoto and its addition to ldk-node, it was also discussed in context of neutrino.

    Proposed solution: compute filters on the fly when they are requested for blocks we already have. Unlike filter headers, individual filters do not commit to their predecessors, so we can compute them independently for a set of blocks. Moreover we compute each filter at most once and persist it, so subsequent requests proceed as usual by reading the already-computed filter from disk.

    Changes:

    • LookupFilter
    • LookupFilterHeader
    • LookupFilterRange
    • LookupFilterHashRange

    Schematically LookupFilter now works as follows:

    1. On filter request, check if we know the requested block. If not, abort.
    2. Check if the filter is already computed and stored. If so, respond with it.
    3. Otherwise compute the filter, persist it, respond with it.

    LookupFilterRange (and LookupFilterHashRange) now work as follows:

    1. We are asked for a filter range, say [tip-N, tip] (BIP 157 caps N at 1000).
    2. Check if filters for the full range [tip-N, tip] are already on disk. If so, respond with them.
    3. Otherwise check if filters for the stable subrange [tip-N, tip-10] are all on disk. If any are missing, fail — those are too far behind the tip to be a race-window miss and we refuse to recompute them on demand.
    4. For each block in [tip-9, tip] that is missing from the index, compute its filter on the fly. Then respond with the complete range.

    LookupFilterHeader is used by ProcessGetCFHeaders to fetch prev_filter_header at start - 1 of a requested range. For a getcfheaders [tip, tip] request, that predecessor is at tip - 1, which can itself be inside the race window. Without on-the-fly fallback on this path, the response would fail even though the filter hashes are recoverable. All four lookup paths now share the same chain-aware fallback.

    This change exposes the possibility that an external peer can force our node to compute filters on the fly and thus burn CPU. However this is not a concern, because:

    • It can only happen once per filter — subsequent requests read from disk.
    • There is a hard limit on the depth of on-the-fly computation: only the 10 blocks below the active chain tip (ONTHEFLY_TIP_WINDOW).
    • The whole window is tiny (single-digit milliseconds on testnet4, tens of milliseconds on mainnet — see numbers below).

    The value 10 for constant ONTHEFLY_TIP_WINDOW is somewhat arbitrary and perhaps can be lowered. Myself I observed a window of maximum depth 2, on regtest.

    Benchmarks

    I benchmarked the four scenarios below on testnet4 (heights 134703–134802). The benchmark files are not included in this PR because they require a synced node and real block data, but I can include them in an additional commit on request.

    • 1a. Request 100 filters, all already computed and stored.
    • 1b. Request 100 filters, 90 already stored, 10 computed on the fly (the worst case for a 100-block request).
    • 2a. Request 10 filters, all already stored.
    • 2b. Request 10 filters, none stored — all computed on the fly (the worst case for a 10-block request).

    Results on testnet4:

    Scenario Per-request total Per-block on-the-fly cost Overhead vs all-indexed
    1a (100 indexed) 0.97 ms baseline
    1b (100, 10 on the fly) 1.25 ms ~28 µs +0.28 ms
    2a (10 indexed) 0.099 ms baseline
    2b (10 all on the fly) 0.50 ms ~40 µs +0.40 ms

    Mainnet figures extrapolation (compute scales roughly with block size and script count; estimated from the per-block compute cost of ~3.4 ms measured against a real mainnet datadir):

    Scenario Per-request total Overhead vs all-indexed
    1a (100 indexed) ~2.5 ms baseline
    1b (100, 10 on the fly) ~36 ms +~34 ms
    2a (10 indexed) ~0.25 ms baseline
    2b (10 all on the fly) ~34 ms +~34 ms

    In other words, the worst-case extra cost a hostile peer can extract is ~34 ms per request, bounded to the 10 tip blocks, paid at most once per block per node lifetime due to the write-back.

    In theory, on-the-fly entries interact correctly with reorgs: the standard hash-vs-height fallback in LookUpOne ensures CustomAppend overwrites them with the new chain's filter, and CustomRemove preserves them via the hash index just like any other filter. I haven't tested it with block reorganizations.

    This PR does not affect block propagation, as the very root of this bug is the independence of block processing and filter construction in CustomAppend. Nor does it affect the initial block download, since during IBD a node does not respond to compact filter messages.

    I've used Claude Code for this PR.

  2. DrahtBot added the label P2P on May 11, 2026
  3. DrahtBot commented at 5:55 PM on May 11, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35262.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK pinheadmz
    Approach NACK mzumsande

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #35229 (refactor: Use CBlockIndex parameters as reference by optout21)
    • #24230 (indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  4. randomlogin renamed this:
    net: calculate compact filters on-the-fly for an already advertised block
    net: calculate compact filters on the fly for an already advertised block
    on May 12, 2026
  5. test: add cfilter race condition tests
    Compact block filter requests can arrive after a block has been
    connected and advertised (NODE_COMPACT_FILTERS) but before the
    filter index's CustomAppend has written the filter on the
    validation-signals scheduler thread. In that window LookupFilter,
    LookupFilterRange, LookupFilterHashRange and LookupFilterHeader
    all fail, and peers requesting filters for a block we already have
    receive no reply.
    dafacfd564
  6. net: compute missing compact filters on the fly
    When a filter request arrives for a block that is connected and
    has block/undo data but has not yet been written by the filter
    index's CustomAppend (the gap between block-connect and the
    scheduler-thread callback), compute the filter from raw block
    data and persist it.
    75e525590f
  7. randomlogin force-pushed on May 12, 2026
  8. randomlogin marked this as ready for review on May 12, 2026
  9. in src/test/cfilter_race_tests.cpp:112 in dafacfd564
     107 | +        LOCK(cs_main);
     108 | +        new_tip = m_node.chainman->m_blockman.LookupBlockIndex(new_block.GetHash());
     109 | +    }
     110 | +    BOOST_REQUIRE(new_tip);
     111 | +    BOOST_REQUIRE_EQUAL(new_tip->nHeight, old_tip->nHeight + 1);
     112 | +    BOOST_REQUIRE(new_tip->nStatus & BLOCK_HAVE_DATA);
    


    pinheadmz commented at 7:03 PM on May 12, 2026:
     warning: reading variable 'nStatus' requires holding  'cs_main' [-Wthread-safety-analysis]
      112 |     BOOST_REQUIRE(new_tip->nStatus & BLOCK_HAVE_DATA);
    
  10. in src/test/cfilter_race_tests.cpp:1 in dafacfd564
       0 | @@ -0,0 +1,251 @@
       1 | +// Copyright (c) 2026-present The Bitcoin Core developers
    


    pinheadmz commented at 7:10 PM on May 12, 2026:

    nit: can drop the year

    // Copyright (c) The Bitcoin Core developers
    
  11. in src/test/cfilter_race_tests.cpp:31 in dafacfd564
      26 | +// A CValidationInterface that, when armed, stalls the validation-signals
      27 | +// scheduler thread inside BlockConnected until Release() is called. Register
      28 | +// this before the BlockFilterIndex so our callback fires first — at that point
      29 | +// the filter index's CustomAppend has not yet run and the filter is not yet
      30 | +// written.
      31 | +class PreFilterBlocker : public CValidationInterface
    


    pinheadmz commented at 7:16 PM on May 12, 2026:
    -class PreFilterBlocker : public CValidationInterface                                                                                                                                                                                                                                                                  
    +class PreFilterBlocker final : public CValidationInterface 
    

    This is needed to silence the compiler warning on line 69 below when the class is instantiated:

    warning: destructor called on non-final '(anonymous namespace)::PreFilterBlocker' that has virtual functions but non-virtual destructor [-Wdelete-non-abstract-non-virtual-dtor]

    The reason is because the compiler is afraid you might make a child class and then call the destructor on the parent, leading to undefined behavior. The keyword final tells the compiler to calm down, we aren't having any kids ;-)

  12. pinheadmz commented at 7:22 PM on May 12, 2026: member

    concept ACK

    A few compiler warnings I got locally, would be errors when CI runs. I'll do a more in depth review next

  13. mzumsande commented at 6:26 AM on May 13, 2026: contributor

    This looks like a very complicated approach - it adds a lot of complexity to index blocks out of order, and do the indexing directly from the message handler thread. It also seems architecturally unsound, with attempts to separate indexing from the node (#24230).

    It seems that there should be simpler solutions. Could we alternatively, for example, make sure this is just 1 or 2 blocks ahead of the last indexed blocks, wait (with a SyncWithValidationInterfaceQueue() call or similar ) until the pending BlockConnected() signal is processed and then try again?

    So I tend to Approach NACK.

  14. DrahtBot added the label CI failed on May 13, 2026
  15. DrahtBot commented at 8:20 AM on May 13, 2026: contributor

    <!--85328a0da195eb286784d51f73fa0af9-->

    🚧 At least one of the CI tasks failed. <sub>Task test ancestor commits: https://github.com/bitcoin/bitcoin/actions/runs/25751125589/job/75736916591</sub> <sub>LLM reason (✨ experimental): CI failed because the build of test_bitcoin was stopped by Clang -Werror compile errors in cfilter_race_tests.cpp (thread-safety analysis violation, plus a virtual/non-virtual destructor warning).</sub>

    <details><summary>Hints</summary>

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

    </details>

  16. DrahtBot marked this as a draft on May 15, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-18 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me