Description
This PR is another take on using BIP 157 block filters (enabled by `-blockfilterindex=1`) for faster wallet rescans and is a modern revival of #15845. For reviewers new to this topic, I highly recommend reading the corresponding PR review club (https://bitcoincore.reviews/15845).
The basic idea is to skip the deeper inspection of a block (looking at every single tx for matches) if the block's filter doesn’t match any of our wallet's scriptPubKeys, i.e. if none of the block's spent or created outputs are relevant for our wallet. Note that there can be false positives (see https://bitcoincore.reviews/15845#l-199 for a PR review club discussion about false-positive rates), but no false negatives, i.e. it is safe to skip blocks if the filter doesn’t match. If the filter does match even though there are no wallet-relevant txs in the block, no harm is done; only a little extra time is spent.
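The skip logic can be sketched as follows. This is a toy model, not Bitcoin Core code: `ToyBlockFilter` and `ToyRescan` are hypothetical names, and the filter simply hashes elements into a fixed range instead of building the Golomb-coded set BIP 158 specifies. It keeps the one property the argument relies on: collisions can cause false positives, but an inserted element always matches, so a non-matching block can be skipped safely.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a BIP 158 filter (NOT a Golomb-coded set): each element is
// hashed into [0, range). Hash collisions cause false positives, but an element
// that was inserted always matches, so there are no false negatives.
class ToyBlockFilter {
public:
    ToyBlockFilter(const std::vector<std::string>& block_scripts, uint64_t range)
        : m_range(range) {
        assert(range > 0);
        for (const auto& script : block_scripts) m_hashes.insert(Reduce(script));
    }

    // True if any of the given scripts *may* be in the block.
    bool MatchAny(const std::set<std::string>& scripts) const {
        for (const auto& script : scripts) {
            if (m_hashes.count(Reduce(script)) > 0) return true;
        }
        return false;
    }

private:
    uint64_t Reduce(const std::string& script) const {
        return std::hash<std::string>{}(script) % m_range;
    }
    uint64_t m_range;
    std::set<uint64_t> m_hashes;
};

// Hypothetical rescan loop: filters[h] is the prebuilt filter of block h.
// Returns the heights whose transactions would be inspected one by one.
std::vector<int> ToyRescan(const std::vector<ToyBlockFilter>& filters,
                           const std::set<std::string>& wallet_scripts) {
    std::vector<int> inspected;
    for (int height = 0; height < static_cast<int>(filters.size()); ++height) {
        // No match: safe to skip, since the filter has no false negatives.
        if (!filters[height].MatchAny(wallet_scripts)) continue;
        // Match: inspect every tx (this may still be a false positive).
        inspected.push_back(height);
    }
    return inspected;
}
```

With a range of 1, every query collides, which shows the worst case: every block is a (possibly false) match and nothing is skipped, yet correctness is unaffected.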
In contrast to #15845, this solution only supports descriptor wallets, which are far more widespread now than they were more than three years ago. With that approach, we never have to derive the relevant scriptPubKeys from keys ourselves before populating the filter, and can instead shift the full responsibility for that to the `DescriptorScriptPubKeyMan`, which already does this automatically. Compared to legacy wallets, the `IsMine` logic for descriptor wallets is trivial: it merely checks whether a scriptPubKey is included in the `ScriptPubKeyMan`’s set of scriptPubKeys (`m_map_script_pub_keys`): https://github.com/bitcoin/bitcoin/blob/e191fac4f3c37820f0618f72f0a8e8b524531ab8/src/wallet/scriptpubkeyman.cpp#L1703-L1710
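To illustrate why no key derivation is needed, here is a minimal sketch of a descriptor-style `IsMine` as a plain map lookup. `ToyDescriptorSPKM` and `ToyScript` are hypothetical simplifications (raw bytes instead of `CScript`, a `bool` result instead of an `isminetype`); the map only mirrors the idea behind `m_map_script_pub_keys`, storing each derived scriptPubKey with its range index.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Simplified stand-in for CScript: the raw scriptPubKey bytes.
using ToyScript = std::vector<unsigned char>;

// Toy model of a descriptor SPKM: every scriptPubKey derived from the
// descriptor is stored with the range index it was derived at.
class ToyDescriptorSPKM {
public:
    void AddScriptPubKey(const ToyScript& spk, int32_t index) {
        assert(index >= 0);
        m_map_script_pub_keys.emplace(spk, index);
    }

    // No key derivation or script-template solving: just a lookup.
    // (The real IsMine returns an isminetype rather than a bool.)
    bool IsMine(const ToyScript& spk) const {
        return m_map_script_pub_keys.count(spk) > 0;
    }

private:
    std::map<ToyScript, int32_t> m_map_script_pub_keys;
};
```

Because the set of scripts is already materialized, the same map can be fed directly into a block filter query.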
One of the unaddressed issues of #15845 was that the filter was only created once outside the loop and as such didn’t take into account possible top-ups. This is solved here by keeping track of the descriptor end ranges of ranged `DescriptorScriptPubKeyMan`s and checking at each iteration whether a range has increased since the last check. If it has, we update the filter with all scriptPubKeys added since the last filter update, i.e. those with a range index equal to or higher than the previous end range. Note that finding new scriptPubKeys could be made more efficient than linearly iterating through the whole `m_map_script_pub_keys` map (e.g. by introducing a bidirectional map), but this would introduce additional complexity and state, and it’s probably not worth it at this time, considering that the performance gain is already significant.
Output scripts from non-ranged `DescriptorScriptPubKeyMan`s (i.e. ones with a fixed set of output scripts that is never extended) are added only once, when the filter is first created.
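The update strategy for ranged and non-ranged managers can be sketched like this. All names here (`ToySPKM`, `ToyRescanFilterSet`, `UpdateAfterTopUps`) are hypothetical and the types are heavily simplified; it is a sketch of the described bookkeeping, not the PR's implementation. Everything is added once at creation, only ranged managers get their end range remembered, and on each iteration a grown range triggers a linear scan that adds entries whose index is at or above the previous end.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

using ToyScript = std::string; // stand-in for CScript

// Hypothetical, simplified view of a DescriptorScriptPubKeyMan.
struct ToySPKM {
    bool ranged;                               // e.g. wpkh(xpub/0/*) vs. a fixed addr()
    int32_t range_end;                         // end (exclusive) of the derived range
    std::map<ToyScript, int32_t> spk_to_index; // like m_map_script_pub_keys
};

// Holds the elements a rescan would match block filters against.
class ToyRescanFilterSet {
public:
    explicit ToyRescanFilterSet(const std::vector<const ToySPKM*>& spkms) {
        for (const ToySPKM* spkm : spkms) {
            assert(spkm != nullptr);
            // All current scripts (ranged or not) are added once at creation.
            for (const auto& entry : spkm->spk_to_index) m_elements.insert(entry.first);
            // Only ranged SPKMs can grow, so only their end ranges are tracked.
            if (spkm->ranged) m_last_range_ends[spkm] = spkm->range_end;
        }
    }

    // Called once per rescan iteration to pick up scripts from top-ups.
    void UpdateAfterTopUps() {
        for (auto& [spkm, last_end] : m_last_range_ends) {
            if (spkm->range_end <= last_end) continue; // range unchanged
            // Linear scan over the whole map; only indices >= last_end are new.
            for (const auto& [spk, index] : spkm->spk_to_index) {
                if (index >= last_end) m_elements.insert(spk);
            }
            last_end = spkm->range_end;
        }
    }

    const std::set<ToyScript>& Elements() const { return m_elements; }

private:
    std::map<const ToySPKM*, int32_t> m_last_range_ends;
    std::set<ToyScript> m_elements;
};
```

The linear scan is deliberately kept: a reverse index from range position to script would avoid it, at the cost of extra state.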
Benchmark results
Obviously, the speed-up is inversely correlated with the frequency of wallet txs in the scanned range: the more blocks contain wallet-related transactions, the fewer blocks can be skipped thanks to the block filters.
In a simple benchmark, a regtest chain of 1008 blocks (corresponding to one week) is mined, with each block containing 20000 scriptPubKeys (25 txs * 800 outputs). The blocks each have a weight of ~2500000 WU and hence are about 62.5% full (relative to the maximum block weight of 4000000 WU). A global constant `WALLET_TX_BLOCK_FREQUENCY` defines how often wallet-related txs are included in a block. The created descriptor wallet (with the default setting of `keypool=1000`, it has 8*1000 = 8000 scriptPubKeys at the start) is backed up via the `backupwallet` RPC before the mining starts and imported via the `restorewallet` RPC afterwards. The time taken by this import process (which involves a rescan), measured once with block filters (`-blockfilterindex=1`) and once without block filters (`-blockfilterindex=0`), yields the benchmark's result numbers.
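The setup figures can be checked with a little arithmetic. The constant names below are illustrative only, and `CountWalletTxBlocks` assumes a hypothetical rule for `WALLET_TX_BLOCK_FREQUENCY` = n (a wallet tx at heights n, 2n, 3n, ...); the benchmark script may place them differently.

```cpp
#include <cassert>

// Figures from the benchmark setup; the constant names are illustrative.
constexpr int kBlocksPerDay = 144;                    // one block per ~10 minutes
constexpr int kChainLength = 7 * kBlocksPerDay;       // one week of blocks
constexpr int kScriptsPerBlock = 25 * 800;            // 25 txs * 800 outputs each
constexpr double kBlockFill = 2500000.0 / 4000000.0;  // weight / max block weight
constexpr int kInitialWalletScripts = 8 * 1000;       // 8 descriptors * keypool=1000

static_assert(kChainLength == 1008);
static_assert(kScriptsPerBlock == 20000);
static_assert(kBlockFill == 0.625);
static_assert(kInitialWalletScripts == 8000);

// Hypothetical rule for WALLET_TX_BLOCK_FREQUENCY = n: blocks at heights
// n, 2n, 3n, ... carry a wallet-related tx; all others are irrelevant.
int CountWalletTxBlocks(int frequency) {
    assert(frequency > 0);
    int count = 0;
    for (int height = 1; height <= kChainLength; ++height) {
        if (height % frequency == 0) ++count;
    }
    return count;
}
```

For n = 6, 168 of the 1008 blocks carry wallet txs, so 5/6 (about 83.33%) of the blocks are irrelevant, which is the ratio listed for that row in the results table.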
The following table lists the results, sorted from worst case (all blocks contain wallet-relevant txs, 0% can be skipped) to best case (no blocks contain wallet-relevant txs, 100% can be skipped), where the frequencies have been picked arbitrarily:
wallet-related tx frequency; 1 tx per… | ratio of irrelevant blocks | w/o filters | with filters | speed gain |
---|---|---|---|---|
~ 10 minutes (every block) | 0% | 56.806s | 63.554s | ~0.9x |
~ 20 minutes (every 2nd block) | 50% (1/2) | 58.896s | 36.076s | ~1.6x |
~ 30 minutes (every 3rd block) | 66.67% (2/3) | 56.781s | 25.430s | ~2.2x |
~ 1 hour (every 6th block) | 83.33% (5/6) | 58.193s | 15.786s | ~3.7x |
~ 6 hours (every 36th block) | 97.22% (35/36) | 57.500s | 6.935s | ~8.3x |
~ 1 day (every 144th block) | 99.31% (143/144) | 68.881s | 6.107s | ~11.3x |
(no txs) | 100% | 58.529s | 5.630s | ~10.4x |
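The two derived columns follow directly from the measurements: the speed gain is the time without filters divided by the time with filters, and the ratio of irrelevant blocks is (n - 1)/n for one wallet tx every n blocks. A small sketch for recomputing them; `BenchRow`, `SpeedGain` and `IrrelevantRatio` are hypothetical helper names, and the "(no txs)" row is excluded since it has no finite n.

```cpp
#include <cassert>

// One row of the table: n = one wallet tx every n blocks, plus the two
// measured rescan times in seconds.
struct BenchRow {
    int blocks_per_wallet_tx;
    double secs_without_filters;
    double secs_with_filters;
};

// "speed gain" column: time without filters divided by time with filters.
double SpeedGain(const BenchRow& row) {
    assert(row.secs_with_filters > 0.0);
    return row.secs_without_filters / row.secs_with_filters;
}

// "ratio of irrelevant blocks" column: (n - 1) / n.
double IrrelevantRatio(const BenchRow& row) {
    assert(row.blocks_per_wallet_tx >= 1);
    return 1.0 - 1.0 / row.blocks_per_wallet_tx;
}
```

For example, the every-6th-block row gives 58.193 / 15.786 ≈ 3.69, matching the listed ~3.7x.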
Since even the (rather unrealistic) worst-case scenario of having wallet-related txs in every block of the rescan range takes only about 12% longer (63.554s vs. 56.806s), I’d argue it’s reasonable to always take advantage of block filters if they are available, and there’s no need to provide an option for the user.
Feedback about the general approach (but also about details like naming, where I struggled a lot) would be greatly appreciated. Thanks go out to furszy for discussing this subject and patiently answering basic questions about descriptor wallets!