This PR uses @furszy’s ThreadPool from #33689 to implement parallel fast rescan. The ThreadPool commit has been cherry-picked into this PR for use.
This PR:
- Adds a `ThreadPool` to the `WalletContext` to be shared by all the wallets
- Adds a `-walletpar` parameter to configure the number of threads to be used for parallel scanning
- Updates the `wallet_fast_rescan.py` test to ensure it catches cases where the FastRescan filter wasn’t properly updated. This is crucial to ensure that changes in the PR do not cause newly added output scripts to be missed.
- Refactors `ScanForWalletTransactions` to make the implementation of parallel scanning easier.
- Implements parallel scanning
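To illustrate the general idea behind the last bullet, here is a minimal Python sketch of checking block filters on a shared thread pool while keeping results in chain order. It is not the PR's C++ implementation: `parallel_filter_scan` and `filter_matches` are hypothetical names, and the sketch assumes the filter stays fixed during the scan, so it ignores the case the updated test covers (the filter changing when new output scripts are added).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def parallel_filter_scan(
    block_hashes: Sequence[str],
    filter_matches: Callable[[str], bool],  # hypothetical per-block filter check
    num_threads: int,
) -> List[str]:
    """Return the block hashes whose filter matches, in the original order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Executor.map() yields results in input order, even when workers
        # finish out of order, so chain order is preserved.
        flags = list(pool.map(filter_matches, block_hashes))
    return [h for h, hit in zip(block_hashes, flags) if hit]


if __name__ == "__main__":
    # Toy data: pretend only blocks "b" and "d" contain wallet scripts.
    relevant = {"b", "d"}
    hits = parallel_filter_scan(["a", "b", "c", "d"], lambda h: h in relevant, 2)
    print(hits)
```

Only the matching blocks would then need to be read and scanned in full, which is where a fast rescan saves time when most blocks contain nothing for the wallet.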
Benchmarks:
NOTE: to reproduce, please tune your system with `pyperf system tune`
Using the following command on mainnet with a wallet with no scripts:
    hyperfine --show-output --export-markdown results.md --export-json results.json \
      --sort command \
      --runs 3 \
      -L commit ef847e8,37d356b \
      -L num_threads 1,2,4,8,16 \
      --prepare 'git checkout {commit} && cmake --build build -j 20 && build/bin/bitcoind -walletpar={num_threads} && sleep 10' \
      --conclude 'build/bin/bitcoin-cli stop && sleep 10' \
      'build/bin/bitcoin-cli rescanblockchain 500000 900000'
I obtained the following results:
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| `build/bin/bitcoin-cli rescanblockchain 500000 900000` (commit = baseline, num_threads = ..) | 536.996 ± 0.722 | 536.257 | 537.701 | 4.64 ± 0.01 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000` (commit = parallel_scan, num_threads = 1) | 540.210 ± 2.696 | 537.172 | 542.320 | 4.67 ± 0.03 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000` (commit = parallel_scan, num_threads = 2) | 358.190 ± 0.515 | 357.675 | 358.706 | 3.10 ± 0.01 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000` (commit = parallel_scan, num_threads = 4) | 230.217 ± 2.321 | 228.814 | 232.896 | 1.99 ± 0.02 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000` (commit = parallel_scan, num_threads = 8) | 151.144 ± 1.748 | 149.506 | 152.984 | 1.31 ± 0.02 |
| `build/bin/bitcoin-cli rescanblockchain 500000 900000` (commit = parallel_scan, num_threads = 16) | 115.642 ± 0.305 | 115.390 | 115.982 | 1.00 |
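For readers who prefer speedup figures over hyperfine's "Relative" column, the short Python sketch below converts the mean times from the table into speedup over the baseline and per-thread efficiency. The numbers are copied from the table; the function name `speedup_and_efficiency` is only illustrative.

```python
from typing import Dict, Tuple

# Mean wall-clock times in seconds, copied from the table above.
BASELINE_MEAN_S = 536.996
PARALLEL_MEANS_S: Dict[int, float] = {
    1: 540.210,
    2: 358.190,
    4: 230.217,
    8: 151.144,
    16: 115.642,
}


def speedup_and_efficiency(
    baseline_s: float, means_s: Dict[int, float]
) -> Dict[int, Tuple[float, float]]:
    """Map thread count -> (speedup vs. baseline, speedup per thread)."""
    rows = {}
    for threads, mean_s in sorted(means_s.items()):
        speedup = baseline_s / mean_s
        rows[threads] = (speedup, speedup / threads)
    return rows


if __name__ == "__main__":
    for threads, (speedup, eff) in speedup_and_efficiency(
        BASELINE_MEAN_S, PARALLEL_MEANS_S
    ).items():
        print(f"{threads:>2} threads: {speedup:.2f}x speedup, {eff:.0%} efficiency")
```

With these inputs the 16-thread run comes out at roughly 4.6x faster than the baseline, matching the 4.64 in the Relative column, while the single-thread run is within about 1% of the baseline.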
System information:
    Architecture: x86_64
      CPU op-mode(s): 32-bit, 64-bit
      Address sizes: 46 bits physical, 48 bits virtual
      Byte Order: Little Endian
    CPU(s): 20
      On-line CPU(s) list: 0-19
    Vendor ID: GenuineIntel
      Model name: Intel(R) Core(TM) Ultra 7 265
        CPU family: 6
        Model: 198
        Thread(s) per core: 1
        Core(s) per socket: 1
        Socket(s): 20
        Stepping: 2
        CPU(s) scaling MHz: 41%
        CPU max MHz: 4800.0000
        CPU min MHz: 800.0000
        BogoMIPS: 4761.60
Further benchmarks were performed using a Python script with custom chains designed with payments at specified intervals, and the following graph was produced:
This graph was produced on a laptop with the following CPU specifications:
    Architecture: x86_64
      CPU op-mode(s): 32-bit, 64-bit
      Address sizes: 48 bits physical, 48 bits virtual
      Byte Order: Little Endian
    CPU(s): 16
      On-line CPU(s) list: 0-15
    Vendor ID: AuthenticAMD
      Model name: AMD Ryzen 9 8945HS w/ Radeon 780M Graphics
        CPU family: 25
        Model: 117
        Thread(s) per core: 2
        Core(s) per socket: 8
        Socket(s): 1
        Stepping: 2
        Frequency boost: enabled
        CPU(s) scaling MHz: 63%
        CPU max MHz: 5263.0000
        CPU min MHz: 400.0000
All materials for the custom benchmarks can be found here.
Memory usage was not explicitly checked with Valgrind, but hyperfine reported that it stayed the same across all runs. I’m not sure to what degree hyperfine’s memory usage report can be trusted, but the PR limits the number of block hashes that can be held in memory for processing to 1000 (not configurable by the user).
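As a rough illustration of why such a cap keeps memory bounded, the Python sketch below feeds block hashes to a consumer in chunks of at most 1000. This is only one way to enforce a limit. Whether the PR uses fixed batches, a sliding window, or something else is not stated here, and `bounded_batches` and `MAX_PENDING_HASHES` are hypothetical names.

```python
from itertools import islice
from typing import Iterable, Iterator, List

MAX_PENDING_HASHES = 1000  # cap mentioned above; the constant name is made up


def bounded_batches(
    block_hashes: Iterable[str], limit: int = MAX_PENDING_HASHES
) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `limit` hashes, in order."""
    it = iter(block_hashes)
    while True:
        # islice pulls at most `limit` items, so no more than that many
        # hashes are materialised at once, however long the range is.
        batch = list(islice(it, limit))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Toy stand-in for a long range of block hashes.
    fake_hashes = (f"hash{i}" for i in range(2500))
    for batch in bounded_batches(fake_hashes):
        print(len(batch))
```

Because the input is consumed lazily, peak memory for pending hashes depends on the cap rather than on the size of the rescan range.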
All benchmarks were performed against https://github.com/bitcoin/bitcoin/pull/34400/commits/37d356bbbe3efed3c7c9e64fae1bac3f4d0ec6eb instead of master because `-walletpar` is implemented here, and the benchmark scripts would otherwise break, or more complicated scripts would be required to accommodate master.