wallet: fast rescan with BIP157 block filters for descriptor wallets

theStack commented at 10:04 pm on August 29, 2022: contributor

Description

This PR is another take of using BIP 157 block filters (enabled by -blockfilterindex=1) for faster wallet rescans and is a modern revival of #15845. For reviewers new to this topic I can highly recommend to read the corresponding PR review club (https://bitcoincore.reviews/15845).

The basic idea is to skip blocks for deeper inspection (i.e. looking at every single tx for matches) if our block filter doesn’t match any of the block’s spent or created UTXOs are relevant for our wallet. Note that there can be false-positives (see https://bitcoincore.reviews/15845#l-199 for a PR review club discussion about false-positive rates), but no false-negatives, i.e. it is safe to skip blocks if the filter doesn’t match; if the filter does match even though there are no wallet-relevant txs in the block, no harm is done, only a little more time is spent extra.

In contrast to #15845, this solution only supports descriptor wallets, which are way more widespread now than back in the time >3 years ago. With that approach, we don’t have to ever derive the relevant scriptPubKeys ourselves from keys before populating the filter, and can instead shift the full responsibility to that to the DescriptorScriptPubKeyMan which already takes care of that automatically. Compared to legacy wallets, the IsMine logic for descriptor wallets is as trivial as checking if a scriptPubKey is included in the ScriptPubKeyMan’s set of scriptPubKeys (m_map_script_pub_keys): https://github.com/bitcoin/bitcoin/blob/e191fac4f3c37820f0618f72f0a8e8b524531ab8/src/wallet/scriptpubkeyman.cpp#L1703-L1710

One of the unaddressed issues of #15845 was that the filter was only created once outside the loop and as such didn’t take into account possible top-ups that have happened. This is solved here by keeping a state of ranged DescriptorScriptPubKeyMan’s descriptor end ranges and check at each iteration whether that range has increased since last time. If yes, we update the filter with all scriptPubKeys that have been added since the last filter update with a range index equal or higher than the last end range. Note that finding new scriptPubKeys could be made more efficient than linearly iterating through the whole m_script_pub_keys map (e.g. by introducing a bidirectional map), but this would mean introducing additional complexity and state and it’s probably not worth it at this time, considering that the performance gain is already significant.

Output scripts from non-ranged DescriptorScriptPubKeyMans (i.e. ones with a fixed set of output scripts that is never extended) are added only once when the filter is created first.

Benchmark results

Obviously, the speed-up indirectly correlates with the wallet tx frequency in the scanned range: the more blocks contain wallet-related transactions, the less blocks can be skipped due to block filter detection.

In a simple benchmark, a regtest chain with 1008 blocks (corresponding to 1 week) is mined with 20000 scriptPubKeys contained (25 txs * 800 outputs) each. The blocks each have a weight of ~2500000 WUs and hence are about 62.5% full. A global constant WALLET_TX_BLOCK_FREQUENCY defines how often wallet-related txs are included in a block. The created descriptor wallet (default setting of keypool=1000, we have 8*1000 = 8000 scriptPubKeys at the start) is backuped via the backupwallet RPC before the mining starts and imported via restorewallet RPC after. The measured time for taking this import process (which involves a rescan) once with block filters (-blockfilterindex=1) and once without block filters (-blockfilterindex=0) yield the relevant result numbers for the benchmark.

The following table lists the results, sorted from worst-case (all blocks contain wallte-relevant txs, 0% can be skipped) to best-case (no blocks contain walltet-relevant txs, 100% can be skipped) where the frequencies have been picked arbitrarily:

wallet-related tx frequency; 1 tx per…	ratio of irrelevant blocks	w/o filters	with filters	speed gain
~ 10 minutes (every block)	0%	56.806s	63.554s	~0.9x
~ 20 minutes (every 2nd block)	50% (1/2)	58.896s	36.076s	~1.6x
~ 30 minutes (every 3rd block)	66.67% (2/3)	56.781s	25.430s	~2.2x
~ 1 hour (every 6th block)	83.33% (5/6)	58.193s	15.786s	~3.7x
~ 6 hours (every 36th block)	97.22% (35/36)	57.500s	6.935s	~8.3x
~ 1 day (every 144th block)	99.31% (143/144)	68.881s	6.107s	~11.3x
(no txs)	100%	58.529s	5.630s	~10.4x

Since even the (rather unrealistic) worst-case scenario of having wallet-related txs in every block of the rescan range obviously doesn’t take significantly longer, I’d argue it’s reasonable to always take advantage of block filters if they are available and there’s no need to provide an option for the user.

Feedback about the general approach (but also about details like naming, where I struggled a lot) would be greatly appreciated. Thanks fly out to furszy for discussing this subject and patiently answering basic question about descriptor wallets!

DrahtBot added the label Wallet on Aug 29, 2022

DrahtBot commented at 11:11 pm on August 29, 2022: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

Sjors commented at 12:12 pm on August 31, 2022: member

Concept ACK

Limiting the functionality to descriptor wallets is fine. We could add an argument to loadwallet that prevents a rescan, so a user can migrate (#19602) to a descriptor wallet and then rescan.

Is there anything in @jamesob’s #23549 you can reuse here?

The rescanwallet RPC help should mention that having -blockfilterindex=1 dramatically speeds things up. It might also be good to have log message somewhere that it’s using the block filter.

I tried a rescan from genesis on a simple mainnet wallet. It was blazing fast (in comparison) and it found all transactions. It’s also slightly faster than the little helper script I wrote for #23549.

Bonus feature for a followup: fetch any matching pruned blocks.

DrahtBot added the label Needs rebase on Sep 1, 2022

theStack force-pushed on Sep 2, 2022

DrahtBot removed the label Needs rebase on Sep 2, 2022

theStack commented at 4:23 pm on September 2, 2022: contributor

Rebased on master, with a few changes and an extra commit (see below). @Sjors:

Thanks for your initial feedback, much appreciated!

Limiting the functionality to descriptor wallets is fine. We could add an argument to loadwallet that prevents a rescan, so a user can migrate (#19602) to a descriptor wallet and then rescan.

Agree, I think this sounds like a useful follow-up idea.

Is there anything in @jamesob’s #23549 you can reuse here?

Not that I could see; on contrast to scanblocks, the filter set is created from wallet-internal source rather than external user input and also has to be updated continuously.

The rescanwallet RPC help should mention that having -blockfilterindex=1 dramatically speeds things up. It might also be good to have log message somewhere that it’s using the block filter.

Done, added an extra commit that documents the speedup possibilty for all RPCs that work on descriptor wallet and (possibly) involve a rescan (rescanblockchain, importdescriptors, restorewallet and loadwallet; did I miss any?). Also extended the “Rescan started from block…” debug message, showing if the fast or slow variant is used.

I tried a rescan from genesis on a simple mainnet wallet. It was blazing fast (in comparison) and it found all transactions. It’s also slightly faster than the little helper script I wrote for #23549.

🚀 🚀 🚀

Bonus feature for a followup: fetch any matching pruned blocks.

Sounds like a charming idea 👌

Note that this PR is covered in the PR review club for next wednesday, hosted by LarryRuane.

in src/wallet/wallet.cpp:275 in d0114e02cd outdated

270+        assert(!m_wallet.IsLegacy());
271+
272+        // create initial filter with scripts from both active and non-active ScriptPubKeyMans
273+        auto active_spkms = m_wallet.GetActiveScriptPubKeyMans();
274+        for (auto spkm : m_wallet.GetAllScriptPubKeyMans()) {
275+            auto* desc_spkm = dynamic_cast<DescriptorScriptPubKeyMan*>(spkm);

LarryRuane commented at 4:27 am on September 5, 2022:

d0114e02cddd9bf1cc8760ebe95941f80609259b nit

0            auto desc_spkm = dynamic_cast<DescriptorScriptPubKeyMan*>(spkm);

theStack commented at 2:24 pm on September 6, 2022:

Thanks, done.

in src/interfaces/chain.h:148 in d0114e02cd outdated

143@@ -143,6 +144,13 @@ class Chain
144     //! or one of its ancestors.
145     virtual std::optional<int> findLocatorFork(const CBlockLocator& locator) = 0;
146 
147+    //! Returns whether a block filter index is available.
148+    virtual bool haveBlockFilterIndex() = 0;

stickies-v commented at 1:31 pm on September 5, 2022:

nit: Some methods here are called has..., others have.... I’m not sure where the have... comes from, but it sounds weird. Could you change that to hasBlockFilterIndex() ?

0    virtual bool hasBlockFilterIndex() = 0;

theStack commented at 2:17 pm on September 6, 2022:

Agree that naming the method has... reads more fluent, considering that “chain” is singular. Done.

in src/node/interfaces.cpp:540 in d0114e02cd outdated

536@@ -536,6 +537,20 @@ class ChainImpl : public Chain
537         }
538         return std::nullopt;
539     }
540+    bool haveBlockFilterIndex() override

stickies-v commented at 1:42 pm on September 5, 2022:

There’s a bit of an interface inconsistency here (and in blockFilterMatchesAny ) in that so far most of the blockfilter functions don’t assume the type to be basic but rather expect them to be passed. Have you considered this? One alternative could be add “basic” in the function name, which can then be refactored later if/when we have more filtertypes, but at least then it’ll be more explicit? At first glance, I think I’d prefer just passing the FilterType::BASIC to stay consistent but no strong view (yet).

theStack commented at 2:17 pm on September 6, 2022:

Good point. To be frank I was just lazy and assumed that there won’t be any other block-filter types than “basic” for a long time anyways 😅 Fully agree though that passing the filter type is more consistent. Done for both methods.

in src/wallet/scriptpubkeyman.cpp:2661 in d0114e02cd outdated

2660+
2661+    for (auto const& [script_pub_key, index] : m_map_script_pub_keys) {
2662+        if (index >= minimum_index) script_pub_keys.emplace_back(script_pub_key);
2663+    }
2664+    return script_pub_keys;
2665+}

stickies-v commented at 1:49 pm on September 5, 2022:

This is a lot of code duplication from GetScriptPubKeys. At first sight, I don’t really see why this couldn’t be an overloaded function or default parameter?

theStack commented at 2:23 pm on September 6, 2022:

In an earlier version the method GetScriptPubKeys was indeed simply extended with a default parameter (minimum_index = 0). After the latest rebase however, this method is now part of the base class ScriptPubKeyMan interface: https://github.com/bitcoin/bitcoin/blob/5291933fedceb9df16eb9e4627b1d7386b53ba07/src/wallet/scriptpubkeyman.h#L246 Passing a minimum index only makes sense for DescriptorScriptPubKeyMans, that’s why I refrained from imposing a new parameter; overloading the method only for this class also seemed wrong to me, that’s why I chose to use a more explicit naming. Happy to overload though if other’s wish that too (maybe it’s not as problematic as I think :)).

stickies-v commented at 8:34 am on September 7, 2022:

I don’t really see the issue with overloading a child class function, especially with an optional parameter? Definitely beats the code duplication in my view, but yeah would be nice to hear others’ thoughts.

theStack commented at 9:04 am on September 7, 2022:

If the method of the child class has a different number of parameters than the method of the base class, the override doesn’t work. E.g., the following doesn’t compile:

0class Base {
1public:
2    virtual int get_foobar() const { return 1; }
3};
4
5class Inherited : public Base {
6    int get_foobar(int param = 0) const override { return 2; }
7};

0$ clang++ -c /tmp/override.cpp
1/tmp/override.cpp:7:45: error: non-virtual member function marked 'override' hides virtual member function
2        int get_foobar(int param = 0) const override { return 2; }
3                                            ^
4/tmp/override.cpp:3:25: note: hidden overloaded virtual function 'Base::get_foobar' declared here: different number of parameters (0 vs 1)
5            virtual int get_foobar() const { return 1; }
6                        ^

The only way to avoid code deduplication would be to also introduce the minimum_index parameter to the GetScriptPubKeys method of the base class interface, but that seemed excessive to me, considering that we would only evaluate that parameter in DescriptorScriptPubKeyMans.

stickies-v commented at 9:51 am on September 7, 2022:

There’s no need to override it, I’m just talking about overloading it for DescriptorScriptPubKeyMan. For example, I think the below could work (not sure if the vector return type was required, just kept it to an unordered_set here for minimal diff):

 0diff --git a/src/wallet/scriptpubkeyman.cpp b/src/wallet/scriptpubkeyman.cpp
 1index f7d7f99b8..4c42ef8e0 100644
 2--- a/src/wallet/scriptpubkeyman.cpp
 3+++ b/src/wallet/scriptpubkeyman.cpp
 4@@ -2642,24 +2642,17 @@ const WalletDescriptor DescriptorScriptPubKeyMan::GetWalletDescriptor() const
 5 
 6 const std::unordered_set<CScript, SaltedSipHasher> DescriptorScriptPubKeyMan::GetScriptPubKeys() const
 7 {
 8-    LOCK(cs_desc_man);
 9-    std::unordered_set<CScript, SaltedSipHasher> script_pub_keys;
10-    script_pub_keys.reserve(m_map_script_pub_keys.size());
11-
12-    for (auto const& script_pub_key: m_map_script_pub_keys) {
13-        script_pub_keys.insert(script_pub_key.first);
14-    }
15-    return script_pub_keys;
16+    return GetScriptPubKeys(0);
17 }
18 
19-const std::vector<CScript> DescriptorScriptPubKeyMan::GetScriptPubKeysWithMinIndex(int32_t minimum_index) const
20+const std::unordered_set<CScript, SaltedSipHasher> DescriptorScriptPubKeyMan::GetScriptPubKeys(int32_t minimum_index) const
21 {
22     LOCK(cs_desc_man);
23-    std::vector<CScript> script_pub_keys;
24+    std::unordered_set<CScript, SaltedSipHasher> script_pub_keys;
25     script_pub_keys.reserve(std::max<size_t>(m_map_script_pub_keys.size() - minimum_index, 0));
26 
27     for (auto const& [script_pub_key, index] : m_map_script_pub_keys) {
28-        if (index >= minimum_index) script_pub_keys.emplace_back(script_pub_key);
29+        if (index >= minimum_index) script_pub_keys.insert(script_pub_key);
30     }
31     return script_pub_keys;
32 }
33diff --git a/src/wallet/scriptpubkeyman.h b/src/wallet/scriptpubkeyman.h
34index ac44de0e4..4ff7fdb15 100644
35--- a/src/wallet/scriptpubkeyman.h
36+++ b/src/wallet/scriptpubkeyman.h
37@@ -643,7 +643,7 @@ public:
38 
39     const WalletDescriptor GetWalletDescriptor() const EXCLUSIVE_LOCKS_REQUIRED(cs_desc_man);
40     const std::unordered_set<CScript, SaltedSipHasher> GetScriptPubKeys() const override;
41-    const std::vector<CScript> GetScriptPubKeysWithMinIndex(int32_t minimum_index) const;
42+    const std::unordered_set<CScript, SaltedSipHasher> GetScriptPubKeys(int32_t minimum_index) const;
43 
44     bool GetDescriptorString(std::string& out, const bool priv) const;
45 
46diff --git a/src/wallet/wallet.cpp b/src/wallet/wallet.cpp
47index d79d9e76f..d327ab021 100644
48--- a/src/wallet/wallet.cpp
49+++ b/src/wallet/wallet.cpp
50@@ -289,7 +289,7 @@ public:
51         for (auto [desc_spkm, last_range_end] : m_last_range_ends) {
52             int32_t current_range_end = GetDescriptorRangeEnd(*desc_spkm);
53             if (current_range_end > last_range_end) {
54-                for (const auto& script_pub_key : desc_spkm->GetScriptPubKeysWithMinIndex(last_range_end)) {
55+                for (const auto& script_pub_key : desc_spkm->GetScriptPubKeys(last_range_end)) {
56                     m_filter_set.emplace(script_pub_key.begin(), script_pub_key.end());
57                 }
58                 m_last_range_ends.at(desc_spkm) = current_range_end;

theStack commented at 10:07 am on September 7, 2022:

Ah, now I fully understand what you mean, looks good at a first glance and should work. Somehow I missed the simple idea of just calling the overloaded method with from the one that is overrided (i.e. GetScriptPubKeys(0)) and was too focused on solving it with default parameter. Returning a std::unordered_set rather than std::vector should also be fine for our purposes. Will try that out in a bit 👍

in src/wallet/wallet.cpp:313 in d0114e02cd outdated

302+        return m_wallet.chain().blockFilterMatchesAny(block_hash, m_filter_set);
303+    }
304+
305+private:
306+    const CWallet& m_wallet;
307+    std::map<const DescriptorScriptPubKeyMan*, int32_t> m_last_range_ends;

stickies-v commented at 1:59 pm on September 5, 2022:

nit: Maybe a brief docstring would be helpful here? The name is not immediately obvious imo.

theStack commented at 2:22 pm on September 6, 2022:

Thanks, done.

in src/wallet/wallet.cpp:310 in d0114e02cd outdated

305+private:
306+    const CWallet& m_wallet;
307+    std::map<const DescriptorScriptPubKeyMan*, int32_t> m_last_range_ends;
308+    GCSFilter::ElementSet m_filter_set;
309+
310+    static int32_t GetDescriptorSPKRangeEnd(const DescriptorScriptPubKeyMan& desc_spkm)

stickies-v commented at 2:01 pm on September 5, 2022:

nit: Do we need SPK in the name here? GetDescriptorRangeEnd seems to describe it accurately?

theStack commented at 2:22 pm on September 6, 2022:

Agree that having the “SPK” in there is not really needed. Done.

stickies-v commented at 3:23 pm on September 5, 2022: contributor

Concept ACK

It might be too controversial to be part of this PR, but I wonder (not suggesting, yet) if after this PR we should enable blockfilterindex by default when a user has at least one descriptor wallet?

in src/wallet/wallet.cpp:1867 in d0114e02cd outdated

1846+        }
1847+
1848         // Read block data
1849         CBlock block;
1850-        chain().findBlock(block_hash, FoundBlock().data(block));
1851+        if (!skip_block) chain().findBlock(block_hash, FoundBlock().data(block));

LarryRuane commented at 6:52 pm on September 5, 2022:

d0114e02cddd9bf1cc8760ebe95941f80609259b nit, here we construct a CBlock that isn’t used if skip_block is true (we can use the block filter). I wonder if it would be worth the performance improvement to avoid the construction.

0        std::unique_ptr<CBlock> block;
1        if (!skip_block) {
2            block = std::make_unique<CBlock>();
3            chain().findBlock(block_hash, FoundBlock().data(*block));
4        }

You would also have to change 3 occurrences of block. to block-> since it now would be a pointer.

theStack commented at 2:32 pm on September 6, 2022:

After some consideration, I decided to keep as it is for now to keep the code simpler, as I think default-ctoring a CBlock is neglectible here compared to other operations (didn’t do any benchmarking or look deeper into it though). Happy to follow-up here though, will keep this conversation open. (Maybe also a topic to discuss for PR review club tomorrow, in case there is some time at the end?).

in src/wallet/scriptpubkeyman.cpp:2655 in d0114e02cd outdated

2651@@ -2652,6 +2652,18 @@ const std::unordered_set<CScript, SaltedSipHasher> DescriptorScriptPubKeyMan::Ge
2652     return script_pub_keys;
2653 }
2654 
2655+const std::vector<CScript> DescriptorScriptPubKeyMan::GetScriptPubKeysWithMinIndex(int32_t minimum_index) const
2656+{
2657+    LOCK(cs_desc_man);
2658+    std::vector<CScript> script_pub_keys;
2659+    script_pub_keys.reserve(m_map_script_pub_keys.size());

LarryRuane commented at 7:04 pm on September 5, 2022:

d0114e02cddd9bf1cc8760ebe95941f80609259b Because of the index >= minimum_index guard (just below, line 2662), could this reserve many more entries than needed? I would probably let std::vector increase the capacity as needed.

theStack commented at 2:27 pm on September 6, 2022:

Good catch! We actually know how large the resulting vector should be (m_map_script_pub_keys.size() minus the passed minimum index); changed that accordingly and clamped to a minimum of zero to avoid potential integer underflow issues (if e.g. the caller would pass a minimum index higher than the current ~~end range~~ count of scriptPubKeys).

LarryRuane commented at 7:35 pm on September 5, 2022: contributor

Concept ACK, I’ll try to do a more complete code review in the next few days.

theStack force-pushed on Sep 6, 2022

theStack commented at 2:35 pm on September 6, 2022: contributor

Force-pushed with most of the suggestions from @stickies-v and @LarryRuane. Thanks for your reviews, very good points!

Sjors commented at 4:50 pm on September 6, 2022: member

Suggest renaming to “turbo rescan” for more review :-)

LarryRuane commented at 7:01 pm on September 6, 2022: contributor

Performance test report: With the master branch, and an empty wallet (which is the best case for this PR), on mainnet, the rescanblockchain time was 90 minutes. With this PR, it was 10 minutes. This ratio of 9x is pretty close to the last row in the table in the description.

On signet, the PR branch is actually slower; this is probably due to the fact that so many blocks are empty or nearly empty; the time needed to check the filter is greater than the time to check the block directly.

But of course, mainnet is what really matters.

amovfx commented at 11:30 pm on September 6, 2022: none

Concept ACK, This looks like it is over my head but I’m going to give this a shot for the pr review club.

theStack force-pushed on Sep 8, 2022

theStack commented at 4:56 pm on September 8, 2022: contributor

Force-pushed again with the following changes:

took @stickies-v’s suggestion to overload DescriptorScriptPubKeyMan::GetScriptPubKeys(...) in order to deduplicate code (https://github.com/bitcoin/bitcoin/pull/25957#discussion_r964632557)
removed the adaption (earlier first commit) of wallet_import_rescan.py: as @LarryRuane found out, this test only works for legacy wallets (it uses RPCs like importprivkey, importaddress etc.) and hence is at this stage not useful for testing this PR
added a new test wallet_fast_rescan.py which tests the top-up detection by repeatedly creating txs sending to the address of each wallet descriptor’s range end, and then comparing the found wallet txs after slow and fast rescan each

theStack force-pushed on Sep 8, 2022

amovfx approved

amovfx commented at 0:39 am on September 9, 2022: none

I don’t have the skill to really review this. But I ran this over testnet and def got a speed improvement. 3:08.55 vs 3:50.80. This looks like a win.

blockfilterindex=1 { "start_height": 0, "stop_height": 2345883 } /Users/andrewoseen/git/bitcoin/build/src/bitcoin-cli -chain=test 0.00s user 0.00s system 0% cpu 3:08.55 total

blockfilterindex=0 { "start_height": 0, "stop_height": 2345883 } /Users/andrewoseen/git/bitcoin/build/src/bitcoin-cli -chain=test 0.00s user 0.00s system 0% cpu 3:50.80 total

in src/wallet/rpc/wallet.cpp:203 in 137347fad5 outdated

197@@ -198,7 +198,9 @@ static RPCHelpMan loadwallet()
198     return RPCHelpMan{"loadwallet",
199                 "\nLoads a wallet from a wallet file or directory."
200                 "\nNote that all wallet command-line options used when starting bitcoind will be"
201-                "\napplied to the new wallet.\n",
202+                "\napplied to the new wallet."
203+                "\nThe rescan is significantly faster if a descriptor wallet is loaded"
204+                "\nand block filters are available (bitcoind option \"-blockfilterindex=1\").\n",

luke-jr commented at 1:51 am on September 9, 2022:

Not sure it makes sense to talk about rescans on loadwallet. It can happen, but hopefully shouldn’t be very often.

theStack commented at 2:59 pm on September 13, 2022:

Thanks, restored the original help text for loadwallet on the latest force-push.

in src/wallet/scriptpubkeyman.h:646 in 137347fad5 outdated

642@@ -643,6 +643,7 @@ class DescriptorScriptPubKeyMan : public ScriptPubKeyMan
643 
644     const WalletDescriptor GetWalletDescriptor() const EXCLUSIVE_LOCKS_REQUIRED(cs_desc_man);
645     const std::unordered_set<CScript, SaltedSipHasher> GetScriptPubKeys() const override;
646+    const std::unordered_set<CScript, SaltedSipHasher> GetScriptPubKeys(int32_t minimum_index) const;

luke-jr commented at 1:52 am on September 9, 2022:

0    const std::unordered_set<CScript, SaltedSipHasher> GetScriptPubKeys(int32_t minimum_index = 0) const;

w0xlt commented at 11:10 am on September 13, 2022:

In that case, it would be also need to change LegacyScriptPubKeyMan::GetScriptPubKeys to LegacyScriptPubKeyMan::GetScriptPubKeys(int32_t minimum_index)

theStack commented at 3:01 pm on September 13, 2022:

In that case, it would be also need to change LegacyScriptPubKeyMan::GetScriptPubKeys to LegacyScriptPubKeyMan::GetScriptPubKeys(int32_t minimum_index)

Yes, we would need to impose the parameter on the base class, which I think wouldn’t make much sense (the minimum_index for this method is only useful for descriptor wallets). Also see the discussion here: #25957 (review)

Sjors commented at 9:36 am on September 29, 2022:

Legacy wallets have similiar topup functionality, but only for the keypool. LegacyScriptPubKeyMan::GetScriptPubKeys on the other hands gets it keys from places like KeyMap mapKeys in FillableSigningProvider, which is an std::map<CScriptID, CScript>. There’s no obvious way to get the first N keys for legacy without significant refactoring, afaik you’d have to look up the metadata for that which some ancient wallets don’t have?

So I agree with having GetScriptPubKeys(int32_t minimum_index) for DescriptorScriptPubKeyMan only.

in src/wallet/wallet.cpp:1847 in 137347fad5 outdated

1838@@ -1775,9 +1839,21 @@ CWallet::ScanResult CWallet::ScanForWalletTransactions(const uint256& start_bloc
1839             WalletLogPrintf("Still rescanning. At block %d. Progress=%f\n", block_height, progress_current);
1840         }
1841 
1842+        bool skip_block = false;
1843+        if (fast_rescan_filter) {
1844+            fast_rescan_filter->UpdateIfNeeded();
1845+            auto match_result = fast_rescan_filter->MatchesBlock(block_hash);
1846+            if (match_result.has_value() && !match_result.value()) {
1847+                // Block is skipped, pretend that we found it successfully

luke-jr commented at 1:54 am on September 9, 2022:

stickies-v commented at 9:52 pm on September 12, 2022:

I’m quite confused by this comment, so +1 on removing or clarifying. “Block does not match filter, mark it as scanned so …”

theStack commented at 3:01 pm on September 13, 2022:

Thanks, removed it.

luke-jr commented at 1:58 am on September 9, 2022: member

We could add an argument to loadwallet that prevents a rescan, so a user can migrate (#19602) to a descriptor wallet and then rescan.

Feels like a footgun. Would rather a param that migrates without “loading”. But I suspect extending this to support legacy wallets (outside the scope of this PR) wouldn’t be very hard either.

luke-jr changes_requested

in src/wallet/wallet.cpp:323 in 1daead3b4a outdated

315+
316+    static int32_t GetDescriptorRangeEnd(const DescriptorScriptPubKeyMan& desc_spkm)
317+    {
318+        LOCK(desc_spkm.cs_desc_man);
319+        return desc_spkm.GetWalletDescriptor().range_end;
320+    }

furszy commented at 3:51 pm on September 12, 2022:

What about using desc_spkm->m_max_cached_index directly in the caller side instead?

theStack commented at 12:08 pm on September 13, 2022:

m_max_cached_index is a private member and therefore not accessible from the outside. Not sure if it’s worth it to extend the descriptor SPK-man interface and add a corresponding getter member function (e.g. DescriptorScriptPubKeyMan::GetMaxCachedIndex()) only for this use-case.

in src/node/interfaces.cpp:12 in 137347fad5 outdated

 7@@ -8,6 +8,7 @@
 8 #include <chainparams.h>
 9 #include <deploymentstatus.h>
10 #include <external_signer.h>
11+#include <index/blockfilterindex.h>

stickies-v commented at 4:33 pm on September 12, 2022:

I think you also need to include blockfilter.h here.

theStack commented at 3:01 pm on September 13, 2022:

Done.

in src/wallet/wallet.cpp:313 in 1daead3b4a outdated

308+      * This information is used to detect whether new addresses were derived
309+      * (that is, if the current end range is larger than the saved end range)
310+      * after processing a block and hence a filter set update is needed to
311+      * take possible keypool top-ups into account.
312+      */
313+    std::map<const DescriptorScriptPubKeyMan*, int32_t> m_last_range_ends;

furszy commented at 7:54 pm on September 12, 2022:

I don’t think that pointers are a good design choice for map keys. Would rather use the spkm id:

0    std::map<uint256, int32_t> m_last_range_ends;

theStack commented at 2:06 pm on September 13, 2022:

Valid point, I struggled with the key type choice a lot. In an earlier version (before the PR was opened) I even used IDs at keys, but the drawback of this is that it leads to more code, as we (artificially?) need to convert between SPKM-Id and DescriptorScriptPubKey* pointers again and again, i.e. needing to call CWallet::GetScriptPubKeyMan(const uint256& id) (+casting the result to DescriptorScriptPubKeyMan*) at every iteration of the map and call DescriptorScriptPubKeyMan::GetID() before every insertion into the map. Considering that the result is always the same, it seems a bit excessive to recalculate the ID (involving creation of a string and hashing it, see DescriptorScriptPubKeyMan::GetID()) again and again. As long as we are sure that the different pointers are constant and unique, using them as keys should be okay? (At least, I couldn’t find anything stating that it should be avoided, as long as we know what we are doing; e.g. https://stackoverflow.com/questions/25122932/pointers-as-keys-in-map-c-stl).

Any thoughts?

theStack commented at 3:13 pm on September 13, 2022:

Decided to use SPKM-IDs, as I think it’s indeed the clearer design choice – the recalculation of the IDs only happens on repopulation of the filter (due to detected top-up), and shouldn’t affect the performance at all. Leaving this conversation open for further discussion though, if anyone has objections.

in src/wallet/wallet.cpp:279 in 137347fad5 outdated

274+        for (auto spkm : m_wallet.GetAllScriptPubKeyMans()) {
275+            auto desc_spkm = dynamic_cast<DescriptorScriptPubKeyMan*>(spkm);
276+            for (const auto& script_pub_key : desc_spkm->GetScriptPubKeys()) {
277+                m_filter_set.emplace(script_pub_key.begin(), script_pub_key.end());
278+            }
279+            // save active ScriptPubKeyMans's end ranges for possible future filter updates

stickies-v commented at 9:05 pm on September 12, 2022:

nit: structure

0            // save each active ScriptPubKeyMans's range end for possible future filter updates

theStack commented at 3:02 pm on September 13, 2022:

Done.

in src/wallet/wallet.cpp:273 in 137347fad5 outdated

268+    {
269+        // fast rescanning via block filters is only supported by descriptor wallets right now
270+        assert(!m_wallet.IsLegacy());
271+
272+        // create initial filter with scripts from both active and non-active ScriptPubKeyMans
273+        auto active_spkms = m_wallet.GetActiveScriptPubKeyMans();

stickies-v commented at 9:22 pm on September 12, 2022:

nit: {} list initialization (in a few other places too)

0        auto active_spkms{m_wallet.GetActiveScriptPubKeyMans()};

theStack commented at 3:02 pm on September 13, 2022:

Changed to list initialization in all commits (hope I didn’t miss any instances).

in src/wallet/wallet.cpp:278 in 137347fad5 outdated

273+        auto active_spkms = m_wallet.GetActiveScriptPubKeyMans();
274+        for (auto spkm : m_wallet.GetAllScriptPubKeyMans()) {
275+            auto desc_spkm = dynamic_cast<DescriptorScriptPubKeyMan*>(spkm);
276+            for (const auto& script_pub_key : desc_spkm->GetScriptPubKeys()) {
277+                m_filter_set.emplace(script_pub_key.begin(), script_pub_key.end());
278+            }

stickies-v commented at 9:32 pm on September 12, 2022:

You could put this into its own function, e.g. FastWalletRescanFilter::AddScriptPubKeys(DescriptorScriptPubKeyMan* desc_spkm, int32_t last_range_end = 0), so you can avoid the duplication in UpdateIfNeeded()? I think it’d be a bit more readable too (not that it’s bad now)

theStack commented at 3:02 pm on September 13, 2022:

Good idea, introduced a new private method AddScriptPubKeys as suggested.

in src/wallet/wallet.cpp:289 in 137347fad5 outdated

284+    }
285+
286+    void UpdateIfNeeded()
287+    {
288+        // repopulate filter with new scripts if top-up has happened since last iteration
289+        for (auto [desc_spkm, last_range_end] : m_last_range_ends) {

stickies-v commented at 9:39 pm on September 12, 2022:

I’m not very familiar with the wallet rescanning process, but I wonder if we should also check if there are any new spkms? If so, and if you would then want to track active_spkms as an internal member, you could extend FastWalletRescanFilter::AddScriptPubKeys from my other comment and have it update m_last_range_ends too for more deduplication.

furszy commented at 0:03 am on September 13, 2022:

Not needed. Every import command checks if a rescan is being performed prior execution (it’s the WalletRescanReserver purpose).

stickies-v commented at 10:46 am on September 13, 2022:

Thanks for clarifying. At first sight, that seems quite a fragile assumption since it’s implicit and requires every user of FastWalletRescanFilter to do that check voluntarily. Some kind of assertion would be nice but would go at the cost of performance to call and cast all spkms after every block, so maybe not worth it.

in src/wallet/wallet.cpp:1818 in 137347fad5 outdated

1807@@ -1748,7 +1808,11 @@ CWallet::ScanResult CWallet::ScanForWalletTransactions(const uint256& start_bloc
1808     uint256 block_hash = start_block;
1809     ScanResult result;
1810 
1811-    WalletLogPrintf("Rescan started from block %s...\n", start_block.ToString());
1812+    std::unique_ptr<FastWalletRescanFilter> fast_rescan_filter;
1813+    if (!IsLegacy() && chain().hasBlockFilterIndex(BlockFilterType::BASIC)) fast_rescan_filter.reset(new FastWalletRescanFilter(*this));

stickies-v commented at 9:47 pm on September 12, 2022:

Should include blockfilter.h

theStack commented at 3:03 pm on September 13, 2022:

Thanks, done.

in src/wallet/wallet.cpp:1850 in 137347fad5 outdated

1838@@ -1775,9 +1839,21 @@ CWallet::ScanResult CWallet::ScanForWalletTransactions(const uint256& start_bloc
1839             WalletLogPrintf("Still rescanning. At block %d. Progress=%f\n", block_height, progress_current);
1840         }
1841 
1842+        bool skip_block = false;
1843+        if (fast_rescan_filter) {
1844+            fast_rescan_filter->UpdateIfNeeded();
1845+            auto match_result = fast_rescan_filter->MatchesBlock(block_hash);
1846+            if (match_result.has_value() && !match_result.value()) {

stickies-v commented at 9:50 pm on September 12, 2022:

0            if (match_result.value_or(false)) {

theStack commented at 12:15 pm on September 13, 2022:

I don’t think that’s logically equivalent.

“we have a result, and the result is negative” (PR), versus
“we have a result, and the result is positive” (your suggestion)

(according to https://en.cppreference.com/w/cpp/utility/optional/value_or: “Returns the contained value if *this has a value, otherwise returns default_value.”)

stickies-v commented at 12:54 pm on September 13, 2022:

You’re right, I should have inversed it. This should be equivalent though:

0            if (!match_result.value_or(true)) {

theStack commented at 3:04 pm on September 13, 2022:

While I agree that this is now logically equivalent, I doubt that it’s more readable.

stickies-v commented at 4:11 pm on September 13, 2022:

Maybe it’s me, but for some reason I’m often confused by ! for negating in general, including in the second hand of the original version (as per my initial wrong suggestion, apparently). What about match_result.value_or(true) == false)? I think that’s concise and clear. I’m just not a fan of “re-implementing” the value_or() when it’s so common.

Don’t want to bikeshed over this obviously, so as you see fit - no further comment.

theStack commented at 2:16 pm on September 23, 2022:

I’m still not convinced, as I think there is more mental effort needed for a reader to grasp exactly what the “block skip condition” is. “the filter matching routine returned a result, and that result is negative” is expressed more closely with the PR variant IMHO. Will still leave the discussion open for a while in case anyone wants to chime in. Maybe it’s more a matter of taste and I have a bad one 😛

furszy commented at 2:32 pm on September 23, 2022:

As we are not using the returned optional value (which, if nullopt denotes that we don’t have the filter index for it) and we only want to know if the set matches the block filter, could change MatchesBlock to return a simple boolean instead.

Something like:

0std::optional<bool> MatchesBlock(const uint256& block_hash) const
1{
2    auto matches{m_wallet.chain().blockFilterMatchesAny(BlockFilterType::BASIC, block_hash, m_filter_set)};
3    return matches && *matches;
4}

And then this specific line would just be:

0            if (match_result) {

theStack commented at 6:10 pm on September 23, 2022:

@furszy: Unfortunately that doesn’t work. Remember that the condition for skipping a block if the filter does not match the block. If we just use !match_result as condition (I guess in your suggested change your just forgot the negation operator !), then we would also skip a block if the match function didn’t return a result (e.g. because the block filter is not available), which is incorrect.

furszy commented at 7:05 pm on September 23, 2022:

I guess in your suggested change your just forgot the negation operator !

it was a !match_result, yeah.

then we would also skip a block if the match function didn’t return a result (e.g. because the block filter is not available), which is incorrect.

That is guarded by the fast_rescan_filter object existence (before create the object, you check if the filter index exist). If there isn’t a filter index, then the object is never created, there by this line is never called.

As block filters are created with blocks (on the same connection signal), we shouldn’t need to check for filter existence at this point of the code.

Still.. there is still a tiny window of time for a race condition.. when a block gets stored and we are still creating the block filter.. So.. better to check the filter existence on every loop. And forget all what I just said :p

stickies-v commented at 9:21 pm on September 28, 2022:

A less controversial idea to improve readability: I think if the name represents what true/false means, it becomes clearer: for example matches_block instead of match_result. Also in this case, I think dereferencing is clearer than using .value()

 0diff --git a/src/wallet/wallet.cpp b/src/wallet/wallet.cpp
 1index d3f1fb418..2c2f80cfa 100644
 2--- a/src/wallet/wallet.cpp
 3+++ b/src/wallet/wallet.cpp
 4@@ -1846,8 +1846,8 @@ CWallet::ScanResult CWallet::ScanForWalletTransactions(const uint256& start_bloc
 5         bool skip_block{false};
 6         if (fast_rescan_filter) {
 7             fast_rescan_filter->UpdateIfNeeded();
 8-            auto match_result{fast_rescan_filter->MatchesBlock(block_hash)};
 9-            if (match_result.has_value() && !match_result.value()) {
10+            auto matches_block{fast_rescan_filter->MatchesBlock(block_hash)};
11+            if (matches_block.has_value() && !*matches_block) {
12                 result.last_scanned_block = block_hash;
13                 result.last_scanned_height = block_height;
14                 skip_block = true;
15///
16
17</details>

theStack commented at 12:13 pm on October 1, 2022:

Thanks, done 👌 I agree this is more readable than the previous version.

Sjors commented at 11:09 am on October 3, 2022:

The next commit aa613697e9c0b9005d411562bfdf8bd727c0b02f makes things more readable anyway because both the true and false case are handled for logging.

in src/wallet/wallet.cpp:1863 in 137347fad5 outdated

1847+                // Block is skipped, pretend that we found it successfully
1848+                result.last_scanned_block = block_hash;
1849+                result.last_scanned_height = block_height;
1850+                skip_block = true;
1851+            }
1852+        }

stickies-v commented at 10:03 pm on September 12, 2022:

What do you think about putting this in a separate function that returns bool skip_block? CWallet::ScanForWalletTransactions() is already quite bloated, I think we should try not to add to it?

theStack commented at 3:53 pm on September 13, 2022:

Thought about that already and agree that bloat should be avoided, but couldn’t come up with a satisfying function name yet (CanBlockBeSkippedDueToFastRescanFilter?) and was concerned that the “fast rescan” functionality in the wallet would become too scattered. One class and once place where to use this class seems to be more compact and self-contained than class + function (that passes the class instance) + another function where this is called.

Any conrete suggestions there? Happy to follow-up. (Another possible variant would be to put the check directly into FastWalletRescanFilter, as a method? Not sure.)

stickies-v commented at 4:54 pm on September 13, 2022:

I like the idea of moving (part of the logic) to FastWalletRescanFilter, yeah. What do you think about the below diff? I didn’t test it very carefully (tests pass though) but it seems equivalent and imo more readable?

 0diff --git a/src/wallet/wallet.cpp b/src/wallet/wallet.cpp
 1index d3f1fb418..fdc5fdc7f 100644
 2--- a/src/wallet/wallet.cpp
 3+++ b/src/wallet/wallet.cpp
 4@@ -282,6 +282,14 @@ public:
 5         }
 6     }
 7 
 8+    bool BlockNotInFilter(uint256 block_hash)
 9+    {
10+        UpdateIfNeeded();
11+        auto block_in_filter{MatchesBlock(block_hash)};
12+        return block_in_filter.value_or(true) == false;
13+    }
14+
15+private:
16     void UpdateIfNeeded()
17     {
18         // repopulate filter with new scripts if top-up has happened since last iteration
19@@ -300,7 +308,6 @@ public:
20         return m_wallet.chain().blockFilterMatchesAny(BlockFilterType::BASIC, block_hash, m_filter_set);
21     }
22 
23-private:
24     const CWallet& m_wallet;
25     /** Map for keeping track of each active descriptor's last seen end range.
26       * This information is used to detect whether new addresses were derived
27@@ -1843,16 +1850,7 @@ CWallet::ScanResult CWallet::ScanForWalletTransactions(const uint256& start_bloc
28             WalletLogPrintf("Still rescanning. At block %d. Progress=%f\n", block_height, progress_current);
29         }
30 
31-        bool skip_block{false};
32-        if (fast_rescan_filter) {
33-            fast_rescan_filter->UpdateIfNeeded();
34-            auto match_result{fast_rescan_filter->MatchesBlock(block_hash)};
35-            if (match_result.has_value() && !match_result.value()) {
36-                result.last_scanned_block = block_hash;
37-                result.last_scanned_height = block_height;
38-                skip_block = true;
39-            }
40-        }
41+        bool skip_block{fast_rescan_filter ? fast_rescan_filter->BlockNotInFilter(block_hash) : false};
42 
43         // Read block data
44         CBlock block;
45@@ -1865,7 +1863,10 @@ CWallet::ScanResult CWallet::ScanForWalletTransactions(const uint256& start_bloc
46         uint256 next_block_hash;
47         chain().findBlock(block_hash, FoundBlock().inActiveChain(block_still_active).nextBlock(FoundBlock().inActiveChain(next_block).hash(next_block_hash)));
48 
49-        if (!skip_block) {
50+        if (skip_block) {
51+            result.last_scanned_block = block_hash;
52+            result.last_scanned_height = block_height;
53+        } else {
54             if (!block.IsNull()) {
55                 LOCK(cs_wallet);
56                 if (!block_still_active) {

Edit: not blocking of course, can be a follow-up too.

theStack commented at 2:18 pm on September 23, 2022:

Would prefer that for a follow-up, could possibly even be a larger one that generally tries to divide up CWallet::ScanForWalletTransactions() into more functions to reduce bloat. Just one thing about naming, though I also unfortunately don’t have a better idea: BlockNotInFilter suggests that this method only retrieves information and as such doesn’t have any side-effects (but it does, due to the UpdateIfNeeded call).

stickies-v commented at 9:19 pm on September 28, 2022:

Initially I agreed with your view (and had the same consideration when proposing) about BlockNotInFilter having side-effects. However, in this case upon further thinking I’m not sure that’s a problem:

both BlockNotInFilter() and UpdateIfNeeded() are idempotent, we don’t need to be concerned about running it “too often”. UpdateIfNeeded() only has internal side-effects.
UpdateIfNeeded() is a separate/manual function to improve performance, so we don’t continuously need to monitor/update the filter when it’s not needed. All it does, is ensure the filter works as expected. From that perspective, I don’t think BlockNotInFilter() has the kind of side-effects we should normally be worried about.

theStack commented at 12:42 pm on October 1, 2022:

Good valid points, I tend to agree that in this case the side-effect is acceptable and we shouldn’t worry. With the current version of the PR that introduced logging messages on two cases of the tri-state result (block filter not found or wallet filter matches block), as suggested by Sjors, it get’s maybe a bit more difficult to put all this logic into a single method though; the logging would also need to be part of the method, if we want to distinguish between the concrete reason of why we don’t skip a block. That said, I’m of course still open though for a potential follow-up that reduces ScanForWalletTransaction bloat.

in test/functional/wallet_fast_rescan.py:28 in 137347fad5 outdated

23+
24+    def skip_test_if_missing_module(self):
25+        self.skip_if_no_wallet()
26+        self.skip_if_no_sqlite()
27+
28+    def get_wallet_txids(self, node, wallet_name):

stickies-v commented at 10:10 pm on September 12, 2022:

nit: PEP484

0    def get_wallet_txids(self, node: TestNode, wallet_name: str) -> List[str]:

theStack commented at 3:05 pm on September 13, 2022:

Done.

in src/node/interfaces.cpp:552 in 137347fad5 outdated

546+        const BlockFilterIndex* block_filter_index = GetBlockFilterIndex(filter_type);
547+        if (!block_filter_index) return std::nullopt;
548+
549+        BlockFilter filter;
550+        const CBlockIndex* index = WITH_LOCK(::cs_main, return chainman().m_blockman.LookupBlockIndex(block_hash));
551+        if (!block_filter_index->LookupFilter(index, filter)) return std::nullopt;

w0xlt commented at 12:59 pm on September 13, 2022:

No need to add block_filter_index->BlockUntilSyncedToCurrentChain ? Currently it seems to assume the chain is already synced.

0        if (!block_filter_index->BlockUntilSyncedToCurrentChain() || !block_filter_index->LookupFilter(index, filter)) return std::nullopt;

theStack commented at 3:28 pm on September 13, 2022:

Have to look deeper into the details here, but I think if we are not fully synced yet, it’s still better to use block filters for a (possibly small) subset of the chain rather to not use them at all?

w0xlt commented at 1:30 am on September 14, 2022:

Reviewing the code again, it seems to me that if the index not fully synced, the wallet will normally scan the remaining blocks. In this case, this change is really not necessary.

in src/wallet/wallet.cpp:1876 in df89e3c0c4 outdated

1875-                SyncTransaction(block.vtx[posInBlock], TxStateConfirmed{block_hash, block_height, static_cast<int>(posInBlock)}, fUpdate, /*rescanning_old_block=*/true);
1876-            }
1877-            // scan succeeded, record block as most recent successfully scanned
1878-            result.last_scanned_block = block_hash;
1879-            result.last_scanned_height = block_height;
1880+        if (!skip_block) {

w0xlt commented at 1:14 pm on September 13, 2022:

Doesn’t if (!skip_block) chain().findBlock(block_hash, FoundBlock().data(block)); guarantee that the block will be null if skip_block is true ?

theStack commented at 4:05 pm on September 13, 2022:

Note that the if (!block.IsNull()) { ... } block has an else-condition below that we don’t want to reach (it sets ScanResult::FAILURE) if we skipped the block due to the fast rescan filter. Reviewing with --ignore-all-space should show the diff a bit more clearly to see it. (Or did I interpret your comment wrongly and you meant something else?)

w0xlt commented at 1:26 am on September 14, 2022:

Got it. Originally if the block is null, this is a reason for failure. With skip_block, this can happen if the filter was applied.

theStack force-pushed on Sep 13, 2022

theStack commented at 4:05 pm on September 13, 2022: contributor

Force-pushed with most reviewer’s suggestions taken into account (thanks a lot!). Most significantly, the std::map key type for storing the end ranges was changed from DescriptorPubKeyMan pointers to SPK-Man IDs. This leads to a little more code (and unnecessary re-computation of IDs after filter repopulation), but seems to be the more clean design choice; see also discussion #25957 (review).

w0xlt approved

w0xlt commented at 2:06 am on September 14, 2022: contributor

ACK https://github.com/bitcoin/bitcoin/pull/25957/commits/0d611a80b43be1246047d60d93288e0648e8aeb8

furszy commented at 1:43 pm on September 16, 2022: member

Could make 1b6b09e3 simpler by removing the second if (!skip_block) { and adding it only inside the else path (this works because the block is null).

0} else if (!skip_block) {
1    // could not scan block, keep scanning but record this block as the most recent failure
2    result.last_failed_block = block_hash;
3    result.status = ScanResult::FAILURE;
4}

in src/wallet/wallet.cpp:283 in 456097763d outdated

277+            AddScriptPubKeys(desc_spkm);
278+            // save each active ScriptPubKeyMans's range end for possible future filter updates
279+            if (active_spkms.find(spkm) != active_spkms.end()) {
280+                m_last_range_ends.emplace(desc_spkm->GetID(), GetDescriptorRangeEnd(*desc_spkm));
281+            }
282+        }

furszy commented at 1:56 pm on September 16, 2022:

Can remove the active_spkms lookup by using desc_spkm->IsRange(). E.g.

0for (auto spkm : m_wallet.GetAllScriptPubKeyMans()) {
1     auto desc_spkm{dynamic_cast<DescriptorScriptPubKeyMan*>(spkm)};
2     AddScriptPubKeys(desc_spkm);
3     // save each spkm range end for possible future filter updates
4     if (desc_spkm->IsRange()) {
5         m_last_range_ends.emplace(desc_spkm->GetID(), GetDescriptorRangeEnd(*desc_spkm));
6     }
7}

Note: IsRange() can be a simple:

0bool DescriptorScriptPubKeyMan::IsRange()
1{
2   return WITH_LOCK(cs_desc_man, return GetWalletDescriptor().descriptor->IsRange());
3}

theStack commented at 2:24 pm on September 23, 2022:

Similar to the rationale in #25957 (review), I’d prefer to not extend the SPK-man interface with new methods unless absolutely needed, if I don’t even know if it would be useful outside of this PR (which is “only” an optimization and doesn’t add any functionality of its own). Only for saving one line of code in the ctor I don’t think it’s worth it.

furszy commented at 2:19 pm on September 16, 2022: member

As the purpose of the FastWalletRescanFilter is track the descriptors to update the filter set structure, what about renaming the class to something like FilterSetTracker?

in src/wallet/wallet.cpp:291 in 0d611a80b4 outdated

285+    void UpdateIfNeeded()
286+    {
287+        // repopulate filter with new scripts if top-up has happened since last iteration
288+        for (auto [desc_spkm_id, last_range_end] : m_last_range_ends) {
289+            auto desc_spkm{dynamic_cast<DescriptorScriptPubKeyMan*>(m_wallet.GetScriptPubKeyMan(desc_spkm_id))};
290+            int32_t current_range_end{GetDescriptorRangeEnd(*desc_spkm)};

aureleoules commented at 10:58 am on September 26, 2022:

make sure desc_spkm is not nullptr before dereferencing

theStack commented at 10:58 am on October 1, 2022:

Thanks, done (in both loops).

in src/wallet/wallet.cpp:288 in 0d611a80b4 outdated

283+    }
284+
285+    void UpdateIfNeeded()
286+    {
287+        // repopulate filter with new scripts if top-up has happened since last iteration
288+        for (auto [desc_spkm_id, last_range_end] : m_last_range_ends) {

aureleoules commented at 10:59 am on September 26, 2022:

0        for (const auto& [desc_spkm_id, last_range_end] : m_last_range_ends) {

theStack commented at 10:58 am on October 1, 2022:

Done.

in src/wallet/scriptpubkeyman.cpp:2651 in 0d611a80b4 outdated

2640@@ -2641,13 +2641,18 @@ const WalletDescriptor DescriptorScriptPubKeyMan::GetWalletDescriptor() const
2641 }
2642 
2643 const std::unordered_set<CScript, SaltedSipHasher> DescriptorScriptPubKeyMan::GetScriptPubKeys() const
2644+{
2645+    return GetScriptPubKeys(0);
2646+}
2647+
2648+const std::unordered_set<CScript, SaltedSipHasher> DescriptorScriptPubKeyMan::GetScriptPubKeys(int32_t minimum_index) const

aureleoules commented at 11:00 am on September 26, 2022:

it is returned by value anyway

0std::unordered_set<CScript, SaltedSipHasher> DescriptorScriptPubKeyMan::GetScriptPubKeys(int32_t minimum_index) const

theStack commented at 10:59 am on October 1, 2022:

Thanks, done with a preparatory extra refactoring commit which changes the ScriptPubKeyMan::GetScriptPubKeys() interface (which imposes this on us).

in src/wallet/scriptpubkeyman.cpp:2652 in 0d611a80b4 outdated

2648+const std::unordered_set<CScript, SaltedSipHasher> DescriptorScriptPubKeyMan::GetScriptPubKeys(int32_t minimum_index) const
2649 {
2650     LOCK(cs_desc_man);
2651     std::unordered_set<CScript, SaltedSipHasher> script_pub_keys;
2652-    script_pub_keys.reserve(m_map_script_pub_keys.size());
2653+    script_pub_keys.reserve(std::max<size_t>(m_map_script_pub_keys.size() - minimum_index, 0));

stickies-v commented at 1:16 pm on September 28, 2022:

Since you’re subtracting from an unsigned, minimum_index gets cast to unsigned and the expression overflows if minimum_index > m_map_script_pub_keys.size().

0    script_pub_keys.reserve(std::max(static_cast<int32_t>(m_map_script_pub_keys.size()) - minimum_index, 0));

(Note: the static_cast<int32_t> may overflow too)

Sjors commented at 9:42 am on September 29, 2022:

Or we could check m_map_script_pub_keys.size() >= minimum_index and return early if not.

theStack commented at 11:02 am on October 1, 2022:

Excellent find, thanks! Decided for the proposed solution of casting the .size() return value to int32_t.

Or we could check m_map_script_pub_keys.size() >= minimum_index and return early if not.

The comparison suffers from the same problem I think (i.e. signed value minimum_index could overflow in the course of getting casted to unsigned)?

in src/interfaces/chain.h:152 in f83bc26a5d outdated

143@@ -143,6 +144,13 @@ class Chain
144     //! or one of its ancestors.
145     virtual std::optional<int> findLocatorFork(const CBlockLocator& locator) = 0;
146 
147+    //! Returns whether a block filter index is available.
148+    virtual bool hasBlockFilterIndex(BlockFilterType filter_type) = 0;
149+
150+    //! Returns whether any of the elements match the block via a BIP 157 block filter
151+    //! or std::nullopt if the block filter for this block couldn't be found.
152+    virtual std::optional<bool> blockFilterMatchesAny(BlockFilterType filter_type, const uint256& block_hash, const GCSFilter::ElementSet& filter_set) = 0;

Sjors commented at 8:20 am on September 29, 2022:

f83bc26a5d2c93ff6d11e41b766e61cc8df7b45c @ryanofsky does GCSFilter::ElementSet present any problems for #10102?

in src/wallet/wallet.cpp:269 in 456097763d outdated

260@@ -260,6 +261,69 @@ std::shared_ptr<CWallet> LoadWalletInternal(WalletContext& context, const std::s
261         return nullptr;
262     }
263 }
264+
265+class FastWalletRescanFilter
266+{
267+public:
268+    FastWalletRescanFilter(const CWallet& wallet) : m_wallet(wallet)

Sjors commented at 8:31 am on September 30, 2022:

456097763d4bdf7ba35de2b3eb33d2160c8024c1: alternatively, maybe have the caller pass in a bunch of DescriptorScriptPubKeyMans (separate argument for active and inactive / all)? Then you might not need the assert and dynamic_cast. In that case I’d rename it to FastRescanFilterSPKM. It would also make this class more reusable for a non-wallet descriptor rescan, like #23549.

On the other hand, it would just move a bunch of complexity into 1b6b09e399773c4bac56afead60e4d29e7fbb010.

theStack commented at 12:18 pm on October 1, 2022:

Sounds like an interesting idea that is worth to be followed-up, primarily for the reusability argument. I think in the end we’d not get rid of the assert and dynamic_cast though, instead it’s just shifted to the caller? (I don’t see a way of avoiding the cast, since the wallet class methods GetAllScriptPubKeyMans() and GetActiveScriptPubKeyMans() only give us the SPKmans with the base class interface ScriptPubKeyMan*).

Sjors commented at 9:57 am on October 3, 2022:

While writing the suggestion, I implicitly assumed that a CWallet rescan function would call rescan on individual DescriptorScriptPubKeyMans, which would have removed the need for any casting. But scanning is done globally, which makes sense because otherwise you’d have to scan the blocks many times over.

A larger refactor might add MaybeMatch() to the SPKM interface, which would return true for legacy and would use the filter for descriptors. But this would mean having N separate filters. Not sure how that impacts the overall false-positive rate.

furszy commented at 2:54 pm on October 24, 2022:

A larger refactor might add MaybeMatch() to the SPKM interface, which would return true for legacy and would use the filter for descriptors. But this would mean having N separate filters. Not sure how that impacts the overall false-positive rate.

I actually have this coded in a separated branch for an on-going project that have (will push the milestone PR soon..). But essentially, Each spkm have their own elements set, and the update is signal-based instead of have to poll at every block.

I think that, as we will get another use case for it, it might be better to add it in follow-up PR.

Sjors commented at 9:18 am on September 30, 2022: member

Code for 0d611a80b43be1246047d60d93288e0648e8aeb8 looks good modulo potential overflow issue in https://github.com/bitcoin/bitcoin/pull/25957/commits/ae320ebc5ec857e783064ea8188b95cdd4b8a5ee#r982393105

However the top up doesn’t seem to work for me, e.g. when I set keypool=20 it misses at least one address (with a receive and send on it) that had index 21. (getaddressinfo knows about it, so it’s not an import mistake)

Update 1: actually I tested an older version, trying again… Update 2: ~~no, it’s really missing stuff, including a transaction received on a wpkh() descriptor with index 2.~~ Update 3 no, it’s not, I had a few super stingy transactions in the mempool for several weeks, so the order is different. Maybe I should just sort by amount to compare. Update 4 I set keypool=10 and it found a transaction received at index 9, but not at index 10. Update 5 I initially imported with "range": [0-3] but with a higher keypool value. I tried again, this time dropping the range argument entirely. But I still get the same issue. The two transactions are many blocks apart.

Review pro tip:

export descriptors from an existing wallet with listdescriptors
put them in a file, strip the newlines, remove {"descriptors": ...} and only keep the array, drop range and next (do not drop the range for inactive descriptors)
launch with keypool=3, create a watch-only wallet
cat tmp | src/bitcoin-cli -rpcwallet=... -stdin importdescriptors

With QT it’s easy to compare the wallets by switching between them in the transaction list. Only the labels should change. Some timetamps will change because the original wallet saw them enter the mempool, while the rescan only knows what block it was in.

You may need larger number for keypool if something is missing (`keypool is effectively the gap limit). But don’t set it too high, because then you won’t test the on the fly topup mechanism.

in src/wallet/wallet.cpp:1853 in 1b6b09e399 outdated

1848+            fast_rescan_filter->UpdateIfNeeded();
1849+            auto match_result{fast_rescan_filter->MatchesBlock(block_hash)};
1850+            if (match_result.has_value() && !match_result.value()) {
1851+                result.last_scanned_block = block_hash;
1852+                result.last_scanned_height = block_height;
1853+                skip_block = true;

Sjors commented at 9:41 am on September 30, 2022:

1b6b09e399773c4bac56afead60e4d29e7fbb010: it may be worth adding a new log category scan (in src/logging.cpp) that outputs which blocks are not skipped. That way I can look up the block hash for missing transactions and see if the problem was the filter or some other step.

In addition, this should log when a block doesn’t have a filter (but filters have been enabled).

theStack commented at 11:54 am on October 1, 2022:

Thanks, done. As for now, I didn’t introduce a new log category, but print it unconditionally via WalletLogPrintf.

Sjors commented at 10:04 am on October 3, 2022:

That’s probably a bit too much noise for a default log, and also confusing for anyone not familiar with these filters (and not trying to debug them). On the mainnet wallet I’m testing with (which has slightly more descriptors than the default) I get hundreds of these messages for every Still rescanning line.

theStack commented at 4:46 pm on October 4, 2022:

Agree, I ended up following the suggestion of adding a new log category scan now.

theStack force-pushed on Oct 1, 2022

theStack commented at 10:58 am on October 1, 2022: contributor

Thanks for the detailled reviews, much appreciated! Force-pushed with the following changes:

introduced a new commit which removes the unneeded “const” return value qualifier from the ScriptPubKeyMan::GetScriptPubKeys() interface and overruled methods (could also be done in the same commit extending the method for descriptor SPKs, but I think it’s good practice to not mix refactor commits with ones introducing new functionality) (https://github.com/bitcoin/bitcoin/pull/25957#discussion_r979865164)
fixed potential overflow in DescriptorScriptPubKeyMan::GetScriptPubKeys(...) by casting m_map_script_pub_keys.size() result to int32_t (I’d argue that the overflow risk for this cast is neglectable, considering that we’d need > 2 billion entries for it to hit) (https://github.com/bitcoin/bitcoin/pull/25957#discussion_r982393105)
assert desc_spkm after dynamic casting to DescriptorScriptPubKeyMan (https://github.com/bitcoin/bitcoin/pull/25957#discussion_r979863337)
use const auto& in iteration loop of the last range ends (https://github.com/bitcoin/bitcoin/pull/25957#discussion_r979863823)
rename matches_result to matches_block and using the dereference operator for value access (https://github.com/bitcoin/bitcoin/pull/25957#discussion_r982873205)
introduced a commit for showing a log message for each block that is not skipped (can either be if our filter matches, or if the block filter is not found), if we use fast rescan (https://github.com/bitcoin/bitcoin/pull/25957#discussion_r984412126)

The following suggestions I think would be good candidates for follow-ups:

encapsulate the fast rescan block filter update and matching logic into an own method in FastWalletRescanFilter (https://github.com/bitcoin/bitcoin/pull/25957#discussion_r982871332)
change the FastWalletRescanFilter interface by letting user pass in descriptor SPKs (https://github.com/bitcoin/bitcoin/pull/25957#discussion_r984347822)
introduce new logging category like e.g. “scan” (this could be seen as controversial and probably needs more detailled work, e.g. “do we really need an extra logging category for this?” / “what other messages have to be in this category?”)

However the top up doesn’t seem to work for me, e.g. when I set keypool=20 it misses at least one address (with a receive and send on it) that had index 21. (getaddressinfo knows about it, so it’s not an import mistake) @Sjors: interesting, could you share more details about the specific test-case (either here, or if preferred, in private via IRC)? Were any of the addresses on indices 0-20 used before in earlier blocks? (Do you think it’s possible to easily reproduce this scenario by simply modifying the functional test in the last commit?)

theStack commented at 12:54 pm on October 1, 2022: contributor

I’m responsing to @furszy’s review comments directly here as they were not given in code-review discussion threads:

Could make 1b6b09e simpler by removing the second if (!skip_block) { and adding it only inside the else path (this works because the block is null).
0} else if (!skip_block) {
1    // could not scan block, keep scanning but record this block as the most recent failure
2    result.last_failed_block = block_hash;
3    result.status = ScanResult::FAILURE;
4}

Did consider your suggestion, and I agree that it reduces the diff complexity (wouldn’t need to alter indenation), but preferred the current version of as I think it’s slightly more readable to surround the whole logic with the !skip_block condition (having the condition only at the else-branch could lead to confusion)

As the purpose of the FastWalletRescanFilter is track the descriptors to update the filter set structure, what about renaming the class to something like FilterSetTracker?

I think that rename would fit along well with Sjor’s idea of changing the class interface to be more generic (https://github.com/bitcoin/bitcoin/pull/25957#discussion_r984347822), for a potential follow-up.

Sjors commented at 9:19 am on October 3, 2022: member

Were any of the addresses on indices 0-20 used before in earlier blocks?

Yes, the number was ficitious, but the wallet received coins on index keypoolsize - 1 and then many blocks later on keypoolsize. The latter did not get picked up, suggesting the topup didn’t happen.

(still the case as of c6c08603a334c1f2cd13023c878b9e42418f9b13)

I’ll see if I can reproduce with a test case or with a doxxable wallet.

in src/wallet/wallet.cpp:1860 in c6c08603a3 outdated

1855+        if (fast_rescan_filter) {
1856+            fast_rescan_filter->UpdateIfNeeded();
1857+            auto matches_block{fast_rescan_filter->MatchesBlock(block_hash)};
1858+            if (matches_block.has_value()) {
1859+                if (*matches_block) {
1860+                    WalletLogPrintf("Fast rescan: inspect block %d (filter matched)\n", block_height);

Sjors commented at 10:01 am on October 3, 2022:

Can you add the hash here? (more useful when trying to see if a given block was matched)

theStack commented at 4:44 pm on October 4, 2022:

Done.

Sjors commented at 10:10 am on October 3, 2022: member

introduce new logging category like e.g. “scan” (this could be seen as controversial and probably needs more detailled work, e.g. “do we really need an extra logging category for this?” / “what other messages have to be in this category?”)

Adding a log category is not that much overhead. They’ve been added on an ad hoc basis before. See git blame, with not much drama. E.g. #23235 added four new categories with just the motivation that the main log was too noisy.

I’m concept meh on 5eb58340b21f6f7814eaee626e2263f1f1e95564. It seems fine to keep it const, unless someone has a good reason to edit the filter set.

fixed potential overflow in DescriptorScriptPubKeyMan::GetScriptPubKeys(...) by casting m_map_script_pub_keys.size() result to int32_t (I’d argue that the overflow risk for this cast is neglectable, considering that we’d need > 2 billion entries for it to hit)

It’s probably not neglible for a determined fuzzer :-)

BIP32 allows 2^32 child keys, but because a descriptor can’t be both hardened and unhardened, the maximum for a regular descriptor is 2^31. Combo descriptors return 4 script_pub_keys. This reveals two problems:

m_map_script_pub_keys size should be treated as a 64 bit (with a max of 33 bit)
We should not assume a 1 on 1 relation between the maximum index and the size of m_map_script_pub_keys, only m_map_script_pub_keys[script] can be used to determine an index.

Sjors commented at 11:42 am on October 3, 2022: member

In the test KEYPOOL_SIZE was larger than NUM_BLOCKS ~~so it would not actually top up~~ (actually it did topup). I ~~also~~ refactored the test to use the “natural” wallet topup mechanism through getnewaddress, which I find easier to follow.

This unfortunately doesn’t reproduce the problem… https://github.com/bitcoin/bitcoin/commit/81324b9b5c617d01e64d25a37ec5b1345b6ae9d1

Sjors commented at 12:08 pm on October 3, 2022: member

Aaah, oops. This was an inactive descriptor. There’s an open question* on how to deal with inactive descriptors when restoring a backup, but that’s orthogonal to this PR.

= if it’s a regular backup, then if the backup was made when the descriptor was active, it will topup and find new transactions. If the backup was made when it was inactive, the backup would have the right range. The only problem is when importing the descriptors directly. When importing a descriptor that’s inactive we should probably top up it anyway (perhaps unless the user explicitly sets a range for an inactive descriptor?)

furszy commented at 1:40 pm on October 3, 2022: member

The only problem is when importing the descriptors directly. When importing a descriptor that’s inactive we should probably top up it anyway (perhaps unless the user explicitly sets a range for an inactive descriptor?) @Sjors, have you checked CWallet::AddWalletDescriptor?, imported descriptors are topped up right before adding them to the wallet there (before setting them as active or inactive).

Sjors commented at 2:29 pm on October 3, 2022: member

@furszy that’s only once. The example inactive descriptor I imported would need ongoing topups. But I think the slow rescan would have the same issue.

furszy commented at 2:43 pm on October 3, 2022: member

@furszy that’s only once. The example inactive descriptor I imported would need ongoing topups. But I think the slow rescan would have the same issue.

Guess that you are deriving keys externally, that are outside of the pre-generated keypool range?

Because otherwise the wallet should detect them, we don’t differentiate active from inactive descriptors for the, per block, used keys top up.

edit, just saw #25957#pullrequestreview-1124847220..

theStack commented at 10:55 am on October 4, 2022: contributor

@Sjors: Thanks, I can confirm that running importdescriptors with a range descriptor that is inactive (which is the default if active is not explicitly specified) misses transactions for fast wallet rescan in the current version, while the slow rescan finds them all. See the following functional test patch to reproduce (slow rescan finds all 48 txs, while the fast rescan only finds 8, failing the assertion assert_equal(len(txids_fast), NUM_DESCRIPTORS * NUM_BLOCKS)):

 0diff --git a/test/functional/wallet_fast_rescan.py b/test/functional/wallet_fast_rescan.py
 1index d3a62dbc0b..9180bd2aab 100755
 2--- a/test/functional/wallet_fast_rescan.py
 3+++ b/test/functional/wallet_fast_rescan.py
 4@@ -41,7 +41,8 @@ class WalletFastRescanTest(BitcoinTestFramework):
 5         WALLET_BACKUP_FILENAME = os.path.join(node.datadir, 'wallet.bak')
 6         node.createwallet(wallet_name='topup_test', descriptors=True)
 7         w = node.get_wallet_rpc('topup_test')
 8-        assert_equal(len(w.listdescriptors()['descriptors']), NUM_DESCRIPTORS)
 9+        descriptors = w.listdescriptors()['descriptors']
10+        assert_equal(len(descriptors), NUM_DESCRIPTORS)
11         w.backupwallet(WALLET_BACKUP_FILENAME)
12
13         self.log.info(f"Create txs sending to end range address of each descriptor, triggering top-ups")
14@@ -56,14 +57,18 @@ class WalletFastRescanTest(BitcoinTestFramework):
15             self.generate(node, 1)
16
17         self.log.info("Import wallet backup with block filter index")
18+        node.createwallet(wallet_name='rescan_fast', descriptors=True, disable_private_keys=True, blank=True)
19+        w = node.get_wallet_rpc('rescan_fast')
20         with node.assert_debug_log(['fast variant using block filters']):
21-            node.restorewallet('rescan_fast', WALLET_BACKUP_FILENAME)
22+            res = w.importdescriptors([{"desc": descriptor['desc'], "timestamp": 0} for descriptor in descriptors])
23         txids_fast = self.get_wallet_txids(node, 'rescan_fast')
24
25         self.restart_node(0, [f'-keypool={KEYPOOL_SIZE}', '-blockfilterindex=0'])
26         self.log.info("Import wallet backup w/o block filter index")
27+        node.createwallet(wallet_name='rescan_slow', descriptors=True, disable_private_keys=True, blank=True)
28+        w = node.get_wallet_rpc('rescan_slow')
29         with node.assert_debug_log(['slow variant inspecting all blocks']):
30-            node.restorewallet("rescan_slow", WALLET_BACKUP_FILENAME)
31+            res = w.importdescriptors([{"desc": descriptor['desc'], "timestamp": 0} for descriptor in descriptors])
32         txids_slow = self.get_wallet_txids(node, 'rescan_slow')
33
34         assert_equal(len(txids_slow), NUM_DESCRIPTORS * NUM_BLOCKS)

The obvious solution is to not only take active SPKMs for potential filter top-ups into account, but simply all SPKMs with range descriptors (which was proposed already earlier by @furszy: #25957 (review)). Will tackle this and other code-review comments (w.r.t. logging, overflow handling) in a bit.

theStack force-pushed on Oct 4, 2022

theStack commented at 4:37 pm on October 4, 2022: contributor

Force-pushed with the following changes (can be easily reviewed via $ git range-diff c6c08603...4fbbb9ff):

fixed the potential overflow issue by ~~removing the script_pub_keys.reserve() call completely~~ reserving the worst-case of m_map_scriptPubKeys.size() (as in master now), considering

“We should not assume a 1 on 1 relation between the maximum index and the size of m_map_script_pub_keys, only m_map_script_pub_keys[script] can be used to determine an index.”

(see #25957 (comment); also, following the principle of “premature optimization is the root of all evil” here; compared to the performance boost by fast rescans (i.e. skipping blocks), the time spent for collecting the scriptPubKeys for top-up into a std::unordered_set every now and then seems to be small enough to be neglectable and not bother optimizing hash-table buckets)

fixed the missing top-ups for non-active SPKMs by taking all SPKMs with range descriptors into account, instead of only active SPKMs
introduced new logging category BCLog::SCAN ("scan") that is used for showing messages for each block that is not skipped in the course of a fast rescan (https://github.com/bitcoin/bitcoin/pull/25957#issuecomment-1265224042)
extended the functional test with the case of “import non-active range descriptors” (which is fixed above)

I’m concept meh on https://github.com/bitcoin/bitcoin/commit/5eb58340b21f6f7814eaee626e2263f1f1e95564. It seems fine to keep it const, unless someone has a good reason to edit the filter set.

Using const for return-by-value types is AFAIK pretty pointless (note that we are not returning a pointer or reference here) or at least obsolete advice, see e.g.: https://stackoverflow.com/a/8406966 / https://stackoverflow.com/a/8716406

theStack force-pushed on Oct 4, 2022

kouloumos referenced this in commit 88edf52153 on Oct 7, 2022

theStack force-pushed on Oct 13, 2022

theStack commented at 11:50 am on October 13, 2022: contributor

Force-pushed again with the following changes (git range-diff 4fbbb9ffa4...21763d8717), inspired by feedback from @achow101:

removed IsRangeDescriptor helper, use already available IsHDEnabled method instead for checking if a SPKM is containing a ranged descriptor
removed GetDescriptorRangeEnd helper, introduce a SPKM method GetRangeEnd method instead that returns the m_max_cached_index field (plus one) to avoid inspecting the actual descriptor object
removed the commit that got rid of the const qualifier from GetScriptPubKeys, since it was seen as controversial and is not really related to this PR (can be done as a follow-up).

We now don’t need to grab the lock cs_desc_man anymore for retrieving information about the descriptor (ranged property, end range).

DrahtBot added the label Needs rebase on Oct 17, 2022

theStack force-pushed on Oct 18, 2022

theStack commented at 10:52 pm on October 18, 2022: contributor

Rebased on master.

DrahtBot removed the label Needs rebase on Oct 18, 2022

Sjors commented at 12:40 pm on October 20, 2022: member

re-tACK 6fc1f5f92b4fed2ec0c92ba34ef83ab710876d79

Using a small -keypool I was able to import my non-active descriptors without the need to provide a range hint.

Another benefit I just noticed / realised: when re-opening an old wallet it now catches up a lot faster.

in src/node/interfaces.cpp:552 in 0141e0ab48 outdated

547+        const BlockFilterIndex* block_filter_index{GetBlockFilterIndex(filter_type)};
548+        if (!block_filter_index) return std::nullopt;
549+
550+        BlockFilter filter;
551+        const CBlockIndex* index{WITH_LOCK(::cs_main, return chainman().m_blockman.LookupBlockIndex(block_hash))};
552+        if (!block_filter_index->LookupFilter(index, filter)) return std::nullopt;

furszy commented at 2:23 pm on October 24, 2022:

As LookupBlockIndex can return a nullptr, better if we check it. (Currently, it cannot fail due our single use case but it might fail in the future)

theStack commented at 2:29 pm on October 25, 2022:

Good find, thanks, done. Now index == nullptr is another criteria for returning std::nullopt.

in src/wallet/wallet.cpp:1818 in 73964bc159 outdated

1813@@ -1814,7 +1814,11 @@ CWallet::ScanResult CWallet::ScanForWalletTransactions(const uint256& start_bloc
1814     uint256 block_hash = start_block;
1815     ScanResult result;
1816 
1817-    WalletLogPrintf("Rescan started from block %s...\n", start_block.ToString());
1818+    std::unique_ptr<FastWalletRescanFilter> fast_rescan_filter;
1819+    if (!IsLegacy() && chain().hasBlockFilterIndex(BlockFilterType::BASIC)) fast_rescan_filter.reset(new FastWalletRescanFilter(*this));

achow101 commented at 6:29 pm on October 24, 2022:

In 73964bc1590f0ba1113ecda7895744bd5f870ee5 “wallet: take use of FastWalletRescanFilter”

nit: I think we prefer to use make_unique

0    if (!IsLegacy() && chain().hasBlockFilterIndex(BlockFilterType::BASIC)) fast_rescan_filter = std::make_unique<FastWalletRescanFilter>(*this);

theStack commented at 2:29 pm on October 25, 2022:

Thanks, done.

in src/wallet/wallet.cpp:1848 in 73964bc159 outdated

1844@@ -1841,9 +1845,20 @@ CWallet::ScanResult CWallet::ScanForWalletTransactions(const uint256& start_bloc
1845             WalletLogPrintf("Still rescanning. At block %d. Progress=%f\n", block_height, progress_current);
1846         }
1847 
1848+        bool skip_block{false};

achow101 commented at 6:32 pm on October 24, 2022:

In 73964bc1590f0ba1113ecda7895744bd5f870ee5 “wallet: take use of FastWalletRescanFilter”

nit: Since only skip_block’s negation is checked, this could perhaps be clearer as bool fetch_block{true}, so that we are checking whether a block should be fetched and scanned, rather than whether a block should not be skipped.

in src/wallet/wallet.cpp:1861 in 73964bc159 outdated

1857+        }
1858+
1859         // Read block data
1860         CBlock block;
1861-        chain().findBlock(block_hash, FoundBlock().data(block));
1862+        if (!skip_block) chain().findBlock(block_hash, FoundBlock().data(block));

achow101 commented at 6:33 pm on October 24, 2022:

In 73964bc1590f0ba1113ecda7895744bd5f870ee5 “wallet: take use of FastWalletRescanFilter”

This could be inside of the if (!skip_block) below, and thus avoid having to construct a CBlock when we aren’t going to use it.

in src/wallet/rpc/backup.cpp:1593 in 1f0198a3fe outdated

1588@@ -1589,7 +1589,8 @@ RPCHelpMan importdescriptors()
1589     return RPCHelpMan{"importdescriptors",
1590                 "\nImport descriptors. This will trigger a rescan of the blockchain based on the earliest timestamp of all descriptors being imported. Requires a new wallet backup.\n"
1591             "\nNote: This call can take over an hour to complete if using an early timestamp; during that time, other rpc calls\n"
1592-            "may report that the imported keys, addresses or scripts exist but related transactions are still missing.\n",
1593+            "may report that the imported keys, addresses or scripts exist but related transactions are still missing.\n"
1594+            "The rescan is significantly faster if block filters are available (bitcoind option \"-blockfilterindex=1\").\n",

achow101 commented at 6:38 pm on October 24, 2022:

In 1f0198a3fe58d42dce9e8ddaa09dc2ba8171930e “rpc: doc: mention rescan speedup using blockfilterindex=1 in affected wallet RPCs”

I think we generally try to avoid using our binary names directly in help text like this.

0            "The rescan is significantly faster if block filters are available (Using startup option \"-blockfilterindex=1\").\n",

in src/wallet/rpc/transactions.cpp:843 in 1f0198a3fe outdated

838@@ -839,7 +839,9 @@ RPCHelpMan rescanblockchain()
839 {
840     return RPCHelpMan{"rescanblockchain",
841                 "\nRescan the local blockchain for wallet related transactions.\n"
842-                "Note: Use \"getwalletinfo\" to query the scanning progress.\n",
843+                "Note: Use \"getwalletinfo\" to query the scanning progress.\n"
844+                "The rescan is significantly faster if a descriptor wallet is loaded\n"

achow101 commented at 6:40 pm on October 24, 2022:

In 1f0198a3fe58d42dce9e8ddaa09dc2ba8171930e “rpc: doc: mention rescan speedup using blockfilterindex=1 in affected wallet RPCs”

Since rescanblockchain operates per-wallet, this should that it is faster when used on a descriptor wallet, as saying “if a descriptor wallet is loaded” incorrectly suggests that it may be faster for a legacy wallet if a descriptor wallet was also loaded at the same time.

0                "The rescan is significantly faster when used on a descriptor wallet\n"

achow101 commented at 6:54 pm on October 24, 2022: member

Output scripts from inactive DescriptorScriptPubKeyMans (i.e. ones with a fixed set of output scripts that is never extended) are added only once when the filter is created first.

That’s not correct, nor what this PR does. Inactive DescriptorSPKMs can still be ranged and be derived from like the active descriptors. Inactive just means that it is not being used for new addresses. However this PR does handle inactive ranged descriptor correctly.

What this should say is that non-ranged descriptors are added only once when the filter is created. There should also be a test for this case as there currently is not.

theStack commented at 2:09 am on October 25, 2022: contributor

Output scripts from inactive DescriptorScriptPubKeyMans (i.e. ones with a fixed set of output scripts that is never extended) are added only once when the filter is created first.

That’s not correct, nor what this PR does. Inactive DescriptorSPKMs can still be ranged and be derived from like the active descriptors. Inactive just means that it is not being used for new addresses. However this PR does handle inactive ranged descriptor correctly.

Indeed, corrected that in the PR description. In an earlier phase of the PR I had a misconception on what “active”/“inactive” descriptors really mean (unfortunately also leading to incorrect code that missed non-active ranged descriptor top-ups, as found by Sjors) and forgot to update the PR description after the logic was fixed to track ranged descriptors for top-ups, rather than only active ones.

Thanks for the thorough review, will also tackle the nits and the missing test for non-ranged descriptors in a bit.

in src/wallet/scriptpubkeyman.cpp:2651 in 6fc1f5f92b outdated

2643@@ -2644,17 +2644,27 @@ const WalletDescriptor DescriptorScriptPubKeyMan::GetWalletDescriptor() const
2644 }
2645 
2646 const std::unordered_set<CScript, SaltedSipHasher> DescriptorScriptPubKeyMan::GetScriptPubKeys() const
2647+{
2648+    return GetScriptPubKeys(0);
2649+}
2650+
2651+const std::unordered_set<CScript, SaltedSipHasher> DescriptorScriptPubKeyMan::GetScriptPubKeys(int32_t minimum_index) const

aureleoules commented at 10:42 am on October 25, 2022:

Thanks, done with a preparatory extra refactoring commit which changes the ScriptPubKeyMan::GetScriptPubKeys() interface (which imposes this on us).

const is still there

theStack commented at 2:34 pm on October 25, 2022:

Thanks, done with a preparatory extra refactoring commit which changes the ScriptPubKeyMan::GetScriptPubKeys() interface (which imposes this on us).

const is still there

See #25957 (comment):

removed the commit that got rid of the const qualifier from GetScriptPubKeys, since it was seen as controversial and is not really related to this PR (can be done as a follow-up).

aureleoules approved

aureleoules commented at 10:43 am on October 25, 2022: member

I benchmarked the PR using a testnet miner address. I imported a watchonly address of a testnet miner (addr(tb1qcq670zweall6zz4f96flfrefhr8myfxz9ll9l2)#whefl5al).

Without blockfilterindex: Rescan completed in 601714ms With blockfilterindex: Rescan completed in 84442ms ~7 times faster :rocket:

add chain interface methods for using BIP 157 block filters

This is useful for speeding up wallet rescans and is based on an
earlier version from PR #15845 ("wallet: Fast rescan with BIP157 block
filters"), which was never merged.

Co-authored-by: MacroFake <falke.marco@gmail.com>

088e38d3bb

wallet: support fetching scriptPubKeys with minimum descriptor range index

This extra method will be needed for updating the filter set for
faster wallet rescans; after an internal top-up has happened, we only
want to add the newly created scriptPubKeys.

845279132b

wallet: add method for retrieving the end range for a ScriptPubKeyMan c051026586

wallet: add `FastWalletRescanFilter` class for speeding up rescans

This only supports wallet descriptors right now.

70b3513904

wallet: take use of `FastWalletRescanFilter`

Can be reviewed with `--ignore-all-space`.

935c6c4b23

wallet: fast rescan: show log message for every non-skipped block

For that purpose, a new logging category BCLog::SCAN is introduced.

3449880b49

rpc: doc: mention rescan speedup using `blockfilterindex=1` in affected wallet RPCs ca48a4694f

test: add test for fast rescan using block filters (top-up detection) 0582932260

theStack force-pushed on Oct 25, 2022

theStack commented at 2:42 pm on October 25, 2022: contributor

Force-pushed with the review suggestions from @furszy and @achow101 taken into account (range-diff):

return-value of LookupBlockIndex is now checked against nullptr (https://github.com/bitcoin/bitcoin/pull/25957#discussion_r1003378783)
change the block inspect/skip boolean variable to indicate whether the block should be inspected rather than if the block should be skipped (s/skip_block/fetch_block/) (https://github.com/bitcoin/bitcoin/pull/25957#discussion_r1003628912), move construction and reading of CBlock into if (fetch_block) { ... } code block (https://github.com/bitcoin/bitcoin/pull/25957#discussion_r1003629894)
improved RPC documentation for calls that are affected by the fast rescan (https://github.com/bitcoin/bitcoin/pull/25957#discussion_r1003633631, #25957 (review))
updated the functional test to also include a fixed non-ranged descriptor (we simply send to the same address in each block)

Re-review would be much appreciated!

achow101 commented at 8:03 pm on October 25, 2022: member

ACK 0582932260e7de4e8aba01d63e7c8a9ddb9c3685

aureleoules approved

aureleoules commented at 11:54 pm on October 25, 2022: member

ACK 0582932260e7de4e8aba01d63e7c8a9ddb9c3685 - minor changes, documentation and updated test since last review

Sjors commented at 2:25 pm on October 26, 2022: member

re-utACK 0582932260e7de4e8aba01d63e7c8a9ddb9c3685

w0xlt approved

w0xlt commented at 2:51 pm on October 26, 2022: contributor

re-ACK https://github.com/bitcoin/bitcoin/pull/25957/commits/0582932260e7de4e8aba01d63e7c8a9ddb9c3685

achow101 merged this on Oct 26, 2022

achow101 closed this on Oct 26, 2022

theStack deleted the branch on Oct 26, 2022

sidhujag referenced this in commit c570842516 on Oct 27, 2022

theStack referenced this in commit 783288334c on Jan 19, 2023

fanquake referenced this in commit 2343886217 on Jan 19, 2023

sidhujag referenced this in commit d87235f70f on Jan 19, 2023

aguycalled referenced this in commit 1184856133 on Jan 27, 2023

bitcoin locked this on Nov 12, 2023

wallet: fast rescan with BIP157 block filters for descriptor wallets #25957

Description

Benchmark results

Conflicts