net: calculate compact filters on the fly for an already advertised block #35262

randomlogin commented at 5:55 PM on May 11, 2026: none

A compact block filter request might arrive when a block has already been processed (and advertised) but its filter has not yet been constructed. In that window the node fails to respond, which the requesting peer treats as misbehaviour and may disconnect/ban.

Fixes: #29655, #27085 Related discussion: 1, 2

The problem: Bitcoin Core receives a new block and processes it, then advertises it; however, by that moment the compact filter might not yet have been constructed. If a peer asks for a filter in that window, Bitcoin Core simply does not respond. From the peer's perspective this is a misbehaviour (we advertised a block and don't provide filters for it), which results in disconnects or bans. While it does not critically affect core nodes, it breaks the network topology for BIP 157 clients.

More precisely, BIP157 specifies that for getcfilters requestor:

StopHash MUST be known to belong to a block accepted by the receiving peer. This is the case if the peer had previously sent a headers or inv message with that block or any descendents. A node that receives getcfilters with an unknown StopHash SHOULD NOT respond.

Current behaviour clearly contradicts the above specification.

I (re-)discovered it while testing "light" compact block filter nodes kyoto and its addition to ldk-node, it was also discussed in context of neutrino.

Proposed solution: compute filters on the fly when they are requested for blocks we already have. Unlike filter headers, individual filters do not commit to their predecessors, so we can compute them independently for a set of blocks. Moreover we compute each filter at most once and persist it, so subsequent requests proceed as usual by reading the already-computed filter from disk.

Changes:

LookupFilter
LookupFilterHeader
LookupFilterRange
LookupFilterHashRange

Schematically LookupFilter now works as follows:

On filter request, check if we know the requested block. If not, abort.
Check if the filter is already computed and stored. If so, respond with it.
Otherwise compute the filter, persist it, respond with it.

LookupFilterRange (and LookupFilterHashRange) now work as follows:

We are asked for a filter range, say [tip-N, tip] (BIP 157 caps N at 1000).
Check if filters for the full range [tip-N, tip] are already on disk. If so, respond with them.
Otherwise check if filters for the stable subrange [tip-N, tip-10] are all on disk. If any are missing, fail — those are too far behind the tip to be a race-window miss and we refuse to recompute them on demand.
For each block in [tip-9, tip] that is missing from the index, compute its filter on the fly. Then respond with the complete range.

LookupFilterHeader is used by ProcessGetCFHeaders to fetch prev_filter_header at start - 1 of a requested range. For a getcfheaders [tip, tip] request, that predecessor is at tip - 1, which can itself be inside the race window. Without on-the-fly fallback on this path, the response would fail even though the filter hashes are recoverable. All four lookup paths now share the same chain-aware fallback.

This change exposes the possibility that an external peer can force our node to compute filters on the fly and thus burn CPU. However this is not a concern, because:

It can only happen once per filter — subsequent requests read from disk.
There is a hard limit on the depth of on-the-fly computation: only the 10 blocks below the active chain tip (ONTHEFLY_TIP_WINDOW).
The whole window is tiny (single-digit milliseconds on testnet4, tens of milliseconds on mainnet — see numbers below).

The value 10 for constant ONTHEFLY_TIP_WINDOW is somewhat arbitrary and perhaps can be lowered. Myself I observed a window of maximum depth 2, on regtest.

Benchmarks

I benchmarked the four scenarios below on testnet4 (heights 134703–134802). The benchmark files are not included in this PR because they require a synced node and real block data, but I can include them in an additional commit on request.

1a. Request 100 filters, all already computed and stored.
1b. Request 100 filters, 90 already stored, 10 computed on the fly (the worst case for a 100-block request).
2a. Request 10 filters, all already stored.
2b. Request 10 filters, none stored — all computed on the fly (the worst case for a 10-block request).

Results on testnet4:

Scenario	Per-request total	Per-block on-the-fly cost	Overhead vs all-indexed
1a (100 indexed)	0.97 ms	—	baseline
1b (100, 10 on the fly)	1.25 ms	~28 µs	+0.28 ms
2a (10 indexed)	0.099 ms	—	baseline
2b (10 all on the fly)	0.50 ms	~40 µs	+0.40 ms

Mainnet figures extrapolation (compute scales roughly with block size and script count; estimated from the per-block compute cost of ~3.4 ms measured against a real mainnet datadir):

Scenario	Per-request total	Overhead vs all-indexed
1a (100 indexed)	~2.5 ms	baseline
1b (100, 10 on the fly)	~36 ms	+~34 ms
2a (10 indexed)	~0.25 ms	baseline
2b (10 all on the fly)	~34 ms	+~34 ms

In other words, the worst-case extra cost a hostile peer can extract is ~34 ms per request, bounded to the 10 tip blocks, paid at most once per block per node lifetime due to the write-back.

In theory, on-the-fly entries interact correctly with reorgs: the standard hash-vs-height fallback in LookUpOne ensures CustomAppend overwrites them with the new chain's filter, and CustomRemove preserves them via the hash index just like any other filter. I haven't tested it with block reorganizations.

This PR does not affect block propagation, as the very root of this bug is the independence of block processing and filter construction in CustomAppend. Nor does it affect the initial block download, since during IBD a node does not respond to compact filter messages.

I've used Claude Code for this PR.

DrahtBot added the label P2P on May 11, 2026

DrahtBot commented at 5:55 PM on May 11, 2026: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35262.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	pinheadmz
Approach NACK	mzumsande

If your review is incorrectly listed, please copy-paste <code></code> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#35229 (refactor: Use CBlockIndex parameters as reference by optout21)
#24230 (indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

randomlogin renamed this:
~~net: calculate compact filters on-the-fly for an already advertised block~~
net: calculate compact filters on the fly for an already advertised block
on May 12, 2026

test: add cfilter race condition tests

Compact block filter requests can arrive after a block has been
connected and advertised (NODE_COMPACT_FILTERS) but before the
filter index's CustomAppend has written the filter on the
validation-signals scheduler thread. In that window LookupFilter,
LookupFilterRange, LookupFilterHashRange and LookupFilterHeader
all fail, and peers requesting filters for a block we already have
receive no reply.

dafacfd564

net: compute missing compact filters on the fly

When a filter request arrives for a block that is connected and
has block/undo data but has not yet been written by the filter
index's CustomAppend (the gap between block-connect and the
scheduler-thread callback), compute the filter from raw block
data and persist it.

75e525590f

randomlogin force-pushed on May 12, 2026

randomlogin marked this as ready for review on May 12, 2026

in src/test/cfilter_race_tests.cpp:112 in dafacfd564

 107 | +        LOCK(cs_main);
 108 | +        new_tip = m_node.chainman->m_blockman.LookupBlockIndex(new_block.GetHash());
 109 | +    }
 110 | +    BOOST_REQUIRE(new_tip);
 111 | +    BOOST_REQUIRE_EQUAL(new_tip->nHeight, old_tip->nHeight + 1);
 112 | +    BOOST_REQUIRE(new_tip->nStatus & BLOCK_HAVE_DATA);

pinheadmz commented at 7:03 PM on May 12, 2026:

 warning: reading variable 'nStatus' requires holding  'cs_main' [-Wthread-safety-analysis]
  112 |     BOOST_REQUIRE(new_tip->nStatus & BLOCK_HAVE_DATA);

in src/test/cfilter_race_tests.cpp:1 in dafacfd564

   0 | @@ -0,0 +1,251 @@
   1 | +// Copyright (c) 2026-present The Bitcoin Core developers

pinheadmz commented at 7:10 PM on May 12, 2026:

nit: can drop the year

// Copyright (c) The Bitcoin Core developers

in src/test/cfilter_race_tests.cpp:31 in dafacfd564

  26 | +// A CValidationInterface that, when armed, stalls the validation-signals
  27 | +// scheduler thread inside BlockConnected until Release() is called. Register
  28 | +// this before the BlockFilterIndex so our callback fires first — at that point
  29 | +// the filter index's CustomAppend has not yet run and the filter is not yet
  30 | +// written.
  31 | +class PreFilterBlocker : public CValidationInterface

pinheadmz commented at 7:16 PM on May 12, 2026:

-class PreFilterBlocker : public CValidationInterface                                                                                                                                                                                                                                                                  
+class PreFilterBlocker final : public CValidationInterface

This is needed to silence the compiler warning on line 69 below when the class is instantiated:

warning: destructor called on non-final '(anonymous namespace)::PreFilterBlocker' that has virtual functions but non-virtual destructor [-Wdelete-non-abstract-non-virtual-dtor]

The reason is because the compiler is afraid you might make a child class and then call the destructor on the parent, leading to undefined behavior. The keyword final tells the compiler to calm down, we aren't having any kids ;-)

pinheadmz commented at 7:22 PM on May 12, 2026: member

concept ACK

A few compiler warnings I got locally, would be errors when CI runs. I'll do a more in depth review next

mzumsande commented at 6:26 AM on May 13, 2026: contributor

This looks like a very complicated approach - it adds a lot of complexity to index blocks out of order, and do the indexing directly from the message handler thread. It also seems architecturally unsound, with attempts to separate indexing from the node (#24230).

It seems that there should be simpler solutions. Could we alternatively, for example, make sure this is just 1 or 2 blocks ahead of the last indexed blocks, wait (with a SyncWithValidationInterfaceQueue() call or similar ) until the pending BlockConnected() signal is processed and then try again?

So I tend to Approach NACK.

DrahtBot added the label CI failed on May 13, 2026

DrahtBot commented at 8:20 AM on May 13, 2026: contributor

🚧 At least one of the CI tasks failed. <sub>Task test ancestor commits: https://github.com/bitcoin/bitcoin/actions/runs/25751125589/job/75736916591</sub> <sub>LLM reason (✨ experimental): CI failed because the build of test_bitcoin was stopped by Clang -Werror compile errors in cfilter_race_tests.cpp (thread-safety analysis violation, plus a virtual/non-virtual destructor warning).</sub>

<details><summary>Hints</summary>

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

</details>

DrahtBot marked this as a draft on May 15, 2026

maflcko commented at 2:03 PM on May 21, 2026: member

are you still working on this, or can it be closed?

randomlogin commented at 5:48 PM on May 21, 2026: none

are you still working on this, or can it be closed?

I'm working on another approach to fix this bug, and will create a new PR soon. We can either close directly this PR now or keep it for another few days and after a new one receives comments close one of PRs (or both).

fanquake commented at 9:21 AM on May 22, 2026: member

Closing this in favour of #35354. Ideally you don't need to open multiple PRs, you could have pushed the alternative approach here, and keep all the discussion in a single place.

fanquake closed this on May 22, 2026