This PR speeds up wallet fast-rescan by running the filter checks in parallel while ensuring that the filters are updated properly, so that no output scripts are missed. The benchmarks below show a considerable improvement that plateaus at roughly a 5x speedup with 8 threads.
Prerequisite PRs:
- #34667 - modify the fast-rescan test to ensure that it fails when the filter is not updated properly.
- #34681 - refactor `CWallet::ScanForWalletTransactions` to prepare for the work in this PR.
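The idea of checking filters in parallel while keeping the watched script set current can be sketched as a toy Python model. This is not the C++ code in this PR: `matches_filter`, `process_block`, the dict-based block representation, and the batch size are hypothetical stand-ins, and the re-check strategy shown is only one assumed way to avoid missing scripts that appear partway through a batch.

```python
from concurrent.futures import ThreadPoolExecutor


def matches_filter(block, scripts):
    # Hypothetical stand-in for a block filter query:
    # does the block touch any of the given scripts?
    return not scripts.isdisjoint(block["scripts"])


def process_block(block, scripts):
    # Hypothetical stand-in for scanning a full block. Returns scripts the
    # wallet would start watching as a result (empty if nothing changes).
    return set(block.get("reveals", ()))


def fast_rescan(blocks, scripts, num_threads=4, batch_size=100):
    """Return heights of blocks that need full processing, in chain order."""
    scripts = set(scripts)
    matched_heights = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for start in range(0, len(blocks), batch_size):
            batch = blocks[start:start + batch_size]
            snapshot = frozenset(scripts)
            # Parallel phase: every block is checked against the same snapshot.
            hits = list(pool.map(lambda b: matches_filter(b, snapshot), batch))
            # Ordered phase: handle matches and repair stale results.
            for i, block in enumerate(batch):
                if not hits[i]:
                    continue
                matched_heights.append(block["height"])
                new_scripts = process_block(block, scripts) - scripts
                if new_scripts:
                    scripts |= new_scripts
                    # Later blocks were checked against a stale snapshot,
                    # so re-check the misses against the new scripts only.
                    for j in range(i + 1, len(batch)):
                        if not hits[j] and matches_filter(batch[j], new_scripts):
                            hits[j] = True
    return matched_heights
```

In this model, a block that pays to a script revealed by an earlier block in the same batch is still picked up, because its stale "no match" result is re-evaluated once the script set grows.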
Benchmarks:
NOTE: to reproduce, please tune your system with `pyperf system tune`.

EDIT: Set up your node to use block filters by setting `blockfilterindex=1` in your `bitcoin.conf` file, and ensure your block filter index is synced to the tip before attempting to reproduce.
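One way to confirm that the index has caught up before benchmarking is to compare the output of the `getindexinfo` RPC with the current block height. The Python sketch below assumes the basic filter index is reported under the key `basic block filter index` with `synced` and `best_block_height` fields, and that `bitcoin-cli` lives at `build/bin/bitcoin-cli`; the helper names are made up for illustration, so check the field names against your node's actual output.

```python
import json
import subprocess


def filter_index_synced(index_info, tip_height):
    # index_info: parsed JSON from `getindexinfo` (key name is an assumption).
    entry = index_info.get("basic block filter index")
    if entry is None:
        return False  # index not enabled or not reported
    return bool(entry.get("synced")) and entry.get("best_block_height", -1) >= tip_height


def node_ready_for_benchmark(cli="build/bin/bitcoin-cli"):
    # Queries a running node; the cli path is an assumed default.
    info = json.loads(subprocess.check_output([cli, "getindexinfo"]))
    tip = int(subprocess.check_output([cli, "getblockcount"]))
    return filter_index_synced(info, tip)
```

Calling `node_ready_for_benchmark()` against a running node returns `True` only when the index reports itself as synced and its best height has reached the chain tip.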
Using the following command on mainnet with a wallet with no scripts and hyperfine version 1.20.0:
hyperfine --show-output --export-markdown results.md --export-json results.json \
--sort command \
--runs 3 \
-L commit 68f030ee,new-rescan \
-L num_threads 1,2,3,4,5,6,7,8,9,16 \
--prepare 'git checkout {commit} && cmake --build build -j 20 && build/bin/bitcoind -blockfilterindex=1 -walletpar={num_threads} && sleep 10 && build/bin/bitcoin-cli loadwallet <wallet-name>' \
--conclude 'build/bin/bitcoin-cli stop && sleep 10' \
'build/bin/bitcoin-cli rescanblockchain 500000 900000'
I obtained the following results:
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `build/bin/bitcoin-cli rescanblockchain 500000 900000 (commit = baseline, num_threads = 1)` | 532.395 ± 0.883 | 531.662 | 533.376 | 5.12 ± 0.01 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000 (commit = new-rescan, num_threads = 1)` | 540.736 ± 1.016 | 539.906 | 541.869 | 5.20 ± 0.01 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000 (commit = new-rescan, num_threads = 2)` | 292.880 ± 1.045 | 292.043 | 294.051 | 2.82 ± 0.01 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000 (commit = new-rescan, num_threads = 3)` | 204.811 ± 0.331 | 204.433 | 205.047 | 1.97 ± 0.00 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000 (commit = new-rescan, num_threads = 4)` | 161.883 ± 0.424 | 161.456 | 162.304 | 1.56 ± 0.00 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000 (commit = new-rescan, num_threads = 5)` | 137.347 ± 0.339 | 136.995 | 137.672 | 1.32 ± 0.00 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000 (commit = new-rescan, num_threads = 6)` | 121.098 ± 0.410 | 120.739 | 121.544 | 1.16 ± 0.00 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000 (commit = new-rescan, num_threads = 7)` | 109.306 ± 0.319 | 108.938 | 109.514 | 1.05 ± 0.00 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000 (commit = new-rescan, num_threads = 8)` | 104.008 ± 0.127 | 103.934 | 104.154 | 1.00 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000 (commit = new-rescan, num_threads = 9)` | 104.788 ± 1.882 | 103.430 | 106.937 | 1.01 ± 0.02 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000 (commit = new-rescan, num_threads = 16)` | 103.978 ± 0.161 | 103.801 | 104.116 | 1.00 ± 0.00 |
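To make the scaling easier to read, the mean times above can be converted into speedup and per-thread efficiency. The Python sketch below copies the means from the table. The function name and the choice of the single-threaded baseline run as the reference are assumptions made for illustration.

```python
# Mean wall-clock times in seconds, copied from the results table.
mean_seconds = {
    ("baseline", 1): 532.395,
    ("new-rescan", 1): 540.736,
    ("new-rescan", 2): 292.880,
    ("new-rescan", 3): 204.811,
    ("new-rescan", 4): 161.883,
    ("new-rescan", 5): 137.347,
    ("new-rescan", 6): 121.098,
    ("new-rescan", 7): 109.306,
    ("new-rescan", 8): 104.008,
    ("new-rescan", 9): 104.788,
    ("new-rescan", 16): 103.978,
}


def speedup_table(means, reference=("baseline", 1)):
    """Return (commit, threads, speedup, efficiency) rows relative to `reference`."""
    ref = means[reference]
    rows = []
    for (commit, threads), mean in sorted(means.items()):
        speedup = ref / mean
        # Efficiency: fraction of ideal linear scaling actually achieved.
        rows.append((commit, threads, round(speedup, 2), round(speedup / threads, 2)))
    return rows


for commit, threads, speedup, efficiency in speedup_table(mean_seconds):
    print(f"{commit:>10} threads={threads:<2} speedup={speedup:.2f}x efficiency={efficiency:.2f}")
```

On these numbers the speedup over the baseline is about 5.12x at 8 threads, with an efficiency of about 0.64. Going to 16 threads leaves the speedup unchanged, so efficiency halves to about 0.32.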
<details> <summary>System information:</summary>
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) Ultra 7 265
CPU family: 6
Model: 198
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 20
Stepping: 2
CPU(s) scaling MHz: 41%
CPU max MHz: 4800.0000
CPU min MHz: 800.0000
BogoMIPS: 4761.60
</details>
The speedup plateaus at roughly 5x (from about 532 s to about 104 s) at 8 threads, even though the machine has 20 cores available.
Further benchmarks were run with a Python script on custom chains containing payments at specified intervals, producing the following graph:
<img width="4760" height="8950" alt="benchmark_comparison" src="https://github.com/user-attachments/assets/43ab59be-d3e2-4cd0-8035-f0ff076ae315" />
<details> <summary>This graph was produced on a laptop with the following CPU specifications:</summary>
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 9 8945HS w/ Radeon 780M Graphics
CPU family: 25
Model: 117
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 2
Frequency boost: enabled
CPU(s) scaling MHz: 63%
CPU max MHz: 5263.0000
CPU min MHz: 400.0000
</details>
All materials for the custom benchmarks can be found here.
Memory usage was not explicitly checked with Valgrind, but hyperfine reported that it stayed the same across all runs. I'm not sure how far hyperfine's memory usage report can be trusted, but the PR limits the number of block hashes that can be held in memory for processing to 1000 (not configurable by the user).
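As an illustration of why such a cap keeps memory flat regardless of the rescan range, the following Python sketch feeds block hashes to a consumer in batches of at most 1000. It is only a model of the general technique: the generator name and constant are hypothetical, and the PR's actual data structure may differ.

```python
from itertools import islice

MAX_PENDING_HASHES = 1000  # mirrors the limit stated above; name is hypothetical


def bounded_batches(block_hashes, limit=MAX_PENDING_HASHES):
    """Yield lists of at most `limit` hashes, pulling lazily from the source."""
    it = iter(block_hashes)
    while True:
        batch = list(islice(it, limit))
        if not batch:
            return
        yield batch  # only this batch is resident; earlier ones can be freed


# Example: a 400,001-block range never holds more than 1000 entries at once.
largest = max(len(batch) for batch in bounded_batches(range(500000, 900001)))
```

Because the source is consumed lazily, peak memory depends on `limit` rather than on the size of the rescan range.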
All benchmarks were performed against https://github.com/bitcoin/bitcoin/pull/34400/commits/68f030eef7b14d5ac6372a12864a813317ae0f4f instead of master because `-walletpar` is implemented in that commit. Benchmarking against master would either break the benchmark scripts or require more complicated scripts.