wallet: parallel fast rescan (approx 5x speed up with 16 threads) #34400

Pull request: Eunovo wants to merge 10 commits into bitcoin:master from Eunovo:new-rescan, changing 15 files (+534 −275)
  1. Eunovo commented at 7:46 pm on January 24, 2026: contributor

    EDIT: Some parts of this PR have been split into #34667 and #34681 to ease review. This PR will be kept in draft until those PRs have been merged.

    This PR uses the ThreadPool to implement parallel fast rescan.

    This PR:

    • Adds a ThreadPool to the WalletContext to be shared by all the wallets
    • Adds -walletpar parameter to configure the number of threads to be used for parallel scanning
    • Updates the wallet_fast_rescan.py test to ensure it catches cases where the FastRescan filter wasn’t properly updated. This is crucial to ensure that changes in the PR do not cause newly added output scripts to be missed.
    • Refactors ScanForWalletTransactions to make the implementation of parallel scanning easier.
    • Implements parallel scanning

    Benchmarks: NOTE: to reproduce, please tune your system with `pyperf system tune`.

    EDIT: Set up your node to use block filters by setting `blockfilterindex=1` in your bitcoin.conf file, and make sure the block filter index is synced to the tip before attempting to reproduce.

    I used the following command on mainnet, with a wallet containing no scripts and hyperfine version 1.20.0:

    hyperfine --show-output --export-markdown results.md --export-json results.json \
    --sort command \
    --runs 3 \
    -L commit ef847e8,37d356b \
    -L num_threads 1,2,4,8,16 \
    --prepare 'git checkout {commit} && cmake --build build -j 20 && build/bin/bitcoind -daemon -blockfilterindex=1 -walletpar={num_threads} && sleep 10 && build/bin/bitcoin-cli loadwallet <wallet-name>' \
    --conclude 'build/bin/bitcoin-cli stop && sleep 10' \
    'build/bin/bitcoin-cli rescanblockchain 500000 900000'
    

    I obtained the following results:

    | Command | Mean [s] | Min [s] | Max [s] | Relative |
    |:---|---:|---:|---:|---:|
    | `build/bin/bitcoin-cli rescanblockchain 500000 900000` (commit = baseline, num_threads = ..) | 536.996 ± 0.722 | 536.257 | 537.701 | 4.64 ± 0.01 |
    | `build/bin/bitcoin-cli rescanblockchain 500000 900000` (commit = parallel_scan, num_threads = 1) | 540.210 ± 2.696 | 537.172 | 542.320 | 4.67 ± 0.03 |
    | `build/bin/bitcoin-cli rescanblockchain 500000 900000` (commit = parallel_scan, num_threads = 2) | 358.190 ± 0.515 | 357.675 | 358.706 | 3.10 ± 0.01 |
    | `build/bin/bitcoin-cli rescanblockchain 500000 900000` (commit = parallel_scan, num_threads = 4) | 230.217 ± 2.321 | 228.814 | 232.896 | 1.99 ± 0.02 |
    | `build/bin/bitcoin-cli rescanblockchain 500000 900000` (commit = parallel_scan, num_threads = 8) | 151.144 ± 1.748 | 149.506 | 152.984 | 1.31 ± 0.02 |
    | `build/bin/bitcoin-cli rescanblockchain 500000 900000` (commit = parallel_scan, num_threads = 16) | 115.642 ± 0.305 | 115.390 | 115.982 | 1.00 |
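To make the "approx 5x" figure concrete, speedup (baseline mean divided by run mean) and parallel efficiency (speedup divided by thread count) can be computed directly from the results table. The helper below is only an illustration: the timings are copied from the table, and the names `Run`, `Speedup` and `Efficiency` are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Mean wall-clock time [s] of the baseline commit, copied from the table.
constexpr double kBaselineMean = 536.996;

struct Run {
    int threads;    // value passed to -walletpar
    double mean_s;  // hyperfine mean [s]
};

// parallel_scan runs, copied from the table.
std::vector<Run> ParallelScanRuns() {
    return {{1, 540.210}, {2, 358.190}, {4, 230.217}, {8, 151.144}, {16, 115.642}};
}

// Speedup of a run relative to the baseline.
double Speedup(double baseline_s, double run_s) { return baseline_s / run_s; }

// Fraction of ideal linear scaling that was achieved.
double Efficiency(double speedup, int threads) { return speedup / threads; }

void PrintScaling() {
    for (const Run& r : ParallelScanRuns()) {
        const double s = Speedup(kBaselineMean, r.mean_s);
        std::printf("threads=%2d  speedup=%.2fx  efficiency=%3.0f%%\n",
                    r.threads, s, 100.0 * Efficiency(s, r.threads));
    }
}
```

From these numbers the speedup is roughly 1.5x at 2 threads, 2.3x at 4, 3.6x at 8 and 4.6x at 16 (matching the Relative column), while parallel efficiency falls from about 75% to about 29%, so each additional thread buys progressively less.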

    System information:

    Architecture:             x86_64
      CPU op-mode(s):         32-bit, 64-bit
      Address sizes:          46 bits physical, 48 bits virtual
      Byte Order:             Little Endian
    CPU(s):                   20
      On-line CPU(s) list:    0-19
    Vendor ID:                GenuineIntel
      Model name:             Intel(R) Core(TM) Ultra 7 265
        CPU family:           6
        Model:                198
        Thread(s) per core:   1
        Core(s) per socket:   1
        Socket(s):            20
        Stepping:             2
        CPU(s) scaling MHz:   41%
        CPU max MHz:          4800.0000
        CPU min MHz:          800.0000
        BogoMIPS:             4761.60
    

    Further benchmarks were performed using a Python script on custom chains designed with payments at specified intervals, and a graph of the results was produced.

    The graph was produced on a laptop with the following CPU specifications:

    Architecture:                x86_64
      CPU op-mode(s):            32-bit, 64-bit
      Address sizes:             48 bits physical, 48 bits virtual
      Byte Order:                Little Endian
    CPU(s):                      16
      On-line CPU(s) list:       0-15
    Vendor ID:                   AuthenticAMD
      Model name:                AMD Ryzen 9 8945HS w/ Radeon 780M Graphics
        CPU family:              25
        Model:                   117
        Thread(s) per core:      2
        Core(s) per socket:      8
        Socket(s):               1
        Stepping:                2
        Frequency boost:         enabled
        CPU(s) scaling MHz:      63%
        CPU max MHz:             5263.0000
        CPU min MHz:             400.0000
    

    All materials for the custom benchmarks can be found here.

    Although I did not check explicitly with Valgrind, hyperfine reported that memory usage stayed the same across all runs. I’m not sure to what degree hyperfine’s memory usage report can be trusted, but the PR limits the number of block hashes that can be held in memory for processing to 1000 (not configurable by the user).
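The 1000-hash cap can be pictured as a fixed-capacity window: the reader stops fetching block hashes once the window is full, and the consumer drains entries from the front in chain order. A minimal single-threaded sketch, assuming a plain `std::deque`; `QueuedBlock` and `BoundedHashWindow` are hypothetical names, and the PR's actual container may differ (its commit notes call the cap `m_max_blockqueue_size`).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Hypothetical queued item: a block hash and its height.
struct QueuedBlock {
    std::string hash;
    int height;
};

// Fixed-capacity FIFO of block hashes. However long the rescan range is, at
// most `capacity` entries are held in memory at once.
class BoundedHashWindow {
public:
    explicit BoundedHashWindow(std::size_t capacity) : m_capacity{capacity} {}

    bool Full() const { return m_blocks.size() >= m_capacity; }
    std::size_t Size() const { return m_blocks.size(); }

    // Producer side: refuses (returns false) when full, which tells the reader
    // to stop fetching hashes until the consumer has drained some entries.
    bool TryPush(QueuedBlock block) {
        if (Full()) return false;
        m_blocks.push_back(std::move(block));
        return true;
    }

    // Consumer side: removes the oldest entry, so chain order is preserved.
    std::optional<QueuedBlock> Pop() {
        if (m_blocks.empty()) return std::nullopt;
        QueuedBlock front = std::move(m_blocks.front());
        m_blocks.pop_front();
        return front;
    }

private:
    std::size_t m_capacity;
    std::deque<QueuedBlock> m_blocks;
};
```

Because the cap is a constant rather than a function of the scan range, memory use stays flat whether 1,000 or 400,000 blocks are rescanned, which is consistent with the unchanged memory figures reported above.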

    All benchmarks were performed against https://github.com/bitcoin/bitcoin/pull/34400/commits/37d356bbbe3efed3c7c9e64fae1bac3f4d0ec6eb instead of master because -walletpar is implemented in this PR; against master, the benchmark scripts would break or would need to be more complicated.

  2. DrahtBot commented at 7:46 pm on January 24, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34400.

    Reviews

    See the guideline for information on the review process.

    | Type | Reviewers |
    |---|---|
    | Concept ACK | w0xlt |

    If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34681 (wallet: refactor ScanForWalletTransactions by Eunovo)
    • #34667 (test: ensure FastWalletRescanFilter is correctly updated during scanning by Eunovo)
    • #33008 (wallet: support bip388 policy with external signer by Sjors)
    • #30343 (wallet, logging: Replace WalletLogPrintf() with LogInfo() by ryanofsky)
    • #27865 (wallet: Track no-longer-spendable TXOs separately by achow101)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label CI failed on Jan 24, 2026
  4. DrahtBot commented at 8:44 pm on January 24, 2026: contributor

    🚧 At least one of the CI tasks failed.
    Task test max 6 ancestor commits: https://github.com/bitcoin/bitcoin/actions/runs/21320629356/job/61369934184
    LLM reason (✨ experimental): Compilation failed due to an unused private member (m_thread_pool) in wallet.h being treated as an error under -Werror.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  5. Eunovo force-pushed on Jan 25, 2026
  6. DrahtBot removed the label CI failed on Jan 25, 2026
  7. Eunovo renamed this:
    Parallel Fast Rescan (approx 5x speed up with 16 threads)
    wallet: parallel fast rescan (approx 5x speed up with 16 threads)
    on Jan 27, 2026
  8. DrahtBot added the label Wallet on Jan 27, 2026
  9. luke-jr commented at 4:50 am on January 29, 2026: member
    I would have expected rescanning to be I/O bound rather than CPU, in which case parallelization could make things worse (more random seeking). Have you benchmarked this on a non-SSD?
  10. Eunovo commented at 8:42 am on January 29, 2026: contributor

    I would have expected rescanning to be I/O bound rather than CPU, in which case parallelization could make things worse (more random seeking). Have you benchmarked this on a non-SSD?

    Fast rescan checks block filters, which involves considerable hashing. This PR parallelises the checking of block filters, and my benchmarks show considerable improvements in rescan speeds with block filters. Slow rescan, which is I/O bound, remains the same. I expect the speedup to be transferable to non-SSD machines, but I haven’t benchmarked this.
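Why the filter checks parallelise well: each block's filter can be tested against the wallet's scripts independently, so the work splits into per-block tasks whose results are collected in chain order. A simplified sketch, assuming a toy filter represented as a set of hashed scripts (real BIP158 filters are Golomb-coded sets queried via SipHash, which is where the hashing cost comes from) and `std::async` in place of the wallet's ThreadPool. `ToyFilter`, `FilterMatchesAny` and `MatchBlocksInParallel` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <unordered_set>
#include <vector>

// Toy per-block filter: the set of hashed output scripts in the block.
using ToyFilter = std::unordered_set<std::size_t>;

// True if any wallet script hashes into the block's filter.
bool FilterMatchesAny(const ToyFilter& filter, const std::vector<std::string>& scripts) {
    std::hash<std::string> hasher;
    for (const std::string& script : scripts) {
        if (filter.count(hasher(script))) return true;
    }
    return false;
}

// Checks every block's filter concurrently and returns one flag per block,
// in the same order as `filters` (i.e. chain order).
std::vector<bool> MatchBlocksInParallel(const std::vector<ToyFilter>& filters,
                                        const std::vector<std::string>& scripts) {
    std::vector<std::future<bool>> futures;
    futures.reserve(filters.size());
    for (const ToyFilter& f : filters) {
        // Tasks only read shared data, so no locking is needed while they run.
        futures.push_back(std::async(std::launch::async, FilterMatchesAny,
                                     std::cref(f), std::cref(scripts)));
    }
    std::vector<bool> matches;
    matches.reserve(futures.size());
    for (auto& fut : futures) matches.push_back(fut.get());
    return matches;
}
```

Launching one `std::async` task per block keeps the sketch short; a real implementation would bound concurrency with a fixed-size pool, as the PR does with `-walletpar`.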

  11. DrahtBot added the label Needs rebase on Feb 4, 2026
  12. in src/wallet/scan.cpp:203 in ef847e8bce outdated
    198+        // If m_max_blockqueue_size blocks have been filtered,
    199+        // stop reading more blocks for now, to give the
    200+        // main scanning loop a chance to update progress
    201+        // and erase some blocks from the queue.
    202+        if (m_continue && completed < m_max_blockqueue_size) m_continue = ReadBlockHash(result);
    203+        else if (!futures.empty()) thread_pool->ProcessTask();
    


    bvbfan commented at 8:59 am on February 8, 2026:
    Doesn't this slow down scanning? All workers already process submitted tasks in their own WorkThread; having the scanning thread randomly try to acquire the mutex makes no sense to me.

    Eunovo commented at 11:40 am on February 8, 2026:
    Are you referring to the ThreadPool::m_mutex? This mutex is not held during task processing. It is only briefly held to access the work queue. Calling ProcessTask() from the main thread does not slow down scanning; it gives the main thread work to do instead of wasting cycles waiting for results.
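The reply above says the pool mutex is held only long enough to pop a task, never while the task runs. A minimal sketch of that "caller helps" pattern, assuming a tiny hypothetical pool (`HelpingPool` and `HelpUntilReady` are illustrative names, not Bitcoin Core's `ThreadPool` API): a thread waiting on a result runs queued tasks itself instead of idling. It does not settle the separate question of keeping the queue full; it only shows where the lock is and is not held.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class HelpingPool {
public:
    explicit HelpingPool(int workers) {
        for (int i = 0; i < workers; ++i) m_threads.emplace_back([this] { WorkerLoop(); });
    }
    ~HelpingPool() {
        {
            std::lock_guard<std::mutex> lock{m_mutex};
            m_stop = true;
        }
        m_cv.notify_all();
        for (auto& t : m_threads) t.join();
    }

    template <typename F>
    auto Submit(F fn) -> std::future<decltype(fn())> {
        using R = decltype(fn());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(fn));
        std::future<R> fut = task->get_future();
        {
            std::lock_guard<std::mutex> lock{m_mutex};
            m_queue.emplace_back([task] { (*task)(); });
        }
        m_cv.notify_one();
        return fut;
    }

    // Runs one queued task on the calling thread, if any is pending.
    // The lock is held only while popping; the task body runs unlocked.
    bool ProcessTask() {
        std::function<void()> job;
        {
            std::lock_guard<std::mutex> lock{m_mutex};
            if (m_queue.empty()) return false;
            job = std::move(m_queue.front());
            m_queue.pop_front();
        }
        job();
        return true;
    }

private:
    void WorkerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock{m_mutex};
                m_cv.wait(lock, [this] { return m_stop || !m_queue.empty(); });
                if (m_stop && m_queue.empty()) return;
                job = std::move(m_queue.front());
                m_queue.pop_front();
            }
            job();  // executed without holding m_mutex
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<std::function<void()>> m_queue;
    std::vector<std::thread> m_threads;
    bool m_stop{false};
};

// Waits for `fut`, executing pending tasks instead of idling; blocks only
// when there is nothing left to help with.
template <typename T>
T HelpUntilReady(HelpingPool& pool, std::future<T>& fut) {
    while (fut.wait_for(std::chrono::seconds{0}) != std::future_status::ready) {
        if (!pool.ProcessTask()) fut.wait();
    }
    return fut.get();
}
```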

    bvbfan commented at 6:29 pm on February 9, 2026:
    Yep, it’s not held during task execution, but if the main thread is running a task, it cannot put new tasks on the queue, i.e. the workers “fight” each other to read something and end up doing nothing. The idea is that the main thread submits tasks faster than the workers can finish them, to keep all of them busy; otherwise there is no difference between 3 and 16 threads (~13 threads do nothing).

    Eunovo commented at 9:34 am on February 10, 2026:
    The main thread intentionally submits only up to WORKERS_COUNT tasks before waiting, rather than continuously submitting. This allows it to pause and update filters whenever a payment is found, preventing unnecessary work on wallets with many transactions packed into a short block range.
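A sketch of the batching strategy described in this reply, under simplifying assumptions: filter checks are submitted `workers_count` blocks at a time, all results of a round are collected before anything is processed, and when processing a matching block changes the wallet's scripts (as a TopUp can), the rest of that round is discarded and re-checked against the new scripts. `ScanInBatches`, `MatchFn` and `ProcessFn` are hypothetical names, and `std::async` stands in for the shared ThreadPool.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Returns true if the block at `height` may pay the wallet, judged against the
// wallet's current scripts. Called concurrently, so it must only read.
using MatchFn = std::function<bool(int height)>;
// Processes a matching block on the calling thread. Returns true if this
// changed the wallet's scripts (e.g. after a TopUp), invalidating older checks.
using ProcessFn = std::function<bool(int height)>;

void ScanInBatches(int start, int end, std::size_t workers_count,
                   const MatchFn& matches, const ProcessFn& process) {
    if (workers_count == 0) workers_count = 1;  // avoid an empty, endless round
    int next = start;
    while (next <= end) {
        // Submit at most workers_count filter checks for this round.
        std::vector<std::future<bool>> batch;
        for (int h = next; h <= end && batch.size() < workers_count; ++h) {
            batch.push_back(std::async(std::launch::async, matches, h));
        }
        // Collect every result first: no check is still running (and reading
        // the scripts) once `process` is allowed to modify them.
        std::vector<bool> results;
        for (auto& fut : batch) results.push_back(fut.get());

        int consumed = 0;
        for (bool matched : results) {
            const int height = next + consumed;
            ++consumed;
            // Scripts changed: results for later blocks in this round are stale,
            // so stop here and re-check them in the next round.
            if (matched && process(height)) break;
        }
        next += consumed;
    }
}
```

Keeping rounds small bounds how much work is thrown away when a payment is found, which is the trade-off against keeping every worker saturated raised in this thread.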
  13. Eunovo force-pushed on Feb 8, 2026
  14. DrahtBot removed the label Needs rebase on Feb 8, 2026
  15. Eunovo force-pushed on Feb 11, 2026
  16. Eunovo commented at 5:20 pm on February 11, 2026: contributor
    #33689 has been merged; the cherry-picked ThreadPool commit has been removed.
  17. furszy commented at 8:43 pm on February 11, 2026: member

    I like the PR conceptually, but I think it would be nice to first improve the current scanning code structure, then land the parallelization feature. The current code mixes a lot of responsibilities. Similar to what you did in 633531614f69de49733642fd19cc9eba830fbdea, but in a separate PR, so we can first land some good building blocks for this to happen.

    Some quick pseudo-code structuring how I imagine it, which is similar to yours:

    Scan(wallet, start_block_hash, end_block_hash, fn_filter_block, fn_process_block, interrupt) {
         it_current_hash = start_block_hash;

         while (it_current_hash != end_block_hash && !interrupt) {
             // Skip block if needed (this function contains the BlockFilterIndex check if enabled).
             // The iterator must still advance, otherwise `continue` would revisit the same block forever.
             if (fn_filter_block(it_current_hash)) {
                 it_current_hash = chain.next_block_hash(it_current_hash); // hypothetical helper: look up only the next hash
                 continue;
             }

             // (this is more or less how we currently do it: we fetch the block and the next block hash at the same time)
             block = chain.find_block(it_current_hash).next_block(it_current_hash);

             // (inside this function the wallet will digest the block, update the filter and save progress if needed)
             fn_process_block(block);
          }
    }
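A compilable C++ rendering of that loop structure, under stated assumptions: the chain is modelled as a vector of block hashes indexed by height, the end bound is exclusive as in the pseudo-code, and the interrupt flag is a `std::atomic<bool>`. All names here (`ScanRange`, `FilterFn`, `ProcessFn`, `ScanOutcome`) are hypothetical and not the wallet's actual API.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using BlockHash = std::string;
// Returns true if the block can be skipped (e.g. its filter does not match).
using FilterFn = std::function<bool(const BlockHash&)>;
// Digests a block: sync transactions, update the filter, save progress.
using ProcessFn = std::function<void(const BlockHash&)>;

enum class ScanOutcome { SUCCESS, USER_ABORT };

// Visits chain[start..end) in order, skipping filtered-out blocks and
// stopping early if `interrupt` is set.
ScanOutcome ScanRange(const std::vector<BlockHash>& chain, std::size_t start, std::size_t end,
                      const FilterFn& skip_block, const ProcessFn& process_block,
                      const std::atomic<bool>& interrupt) {
    for (std::size_t i = start; i < end && i < chain.size(); ++i) {
        if (interrupt.load()) return ScanOutcome::USER_ABORT;
        const BlockHash& hash = chain[i];
        // The index advances in the for-header, so a skipped block cannot
        // stall the loop.
        if (skip_block(hash)) continue;
        process_block(hash);
    }
    return ScanOutcome::SUCCESS;
}
```

Passing the filter and processing steps as callbacks keeps the loop itself free of wallet state, which is the separation of responsibilities the comment is asking for.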
    
  18. w0xlt commented at 8:12 am on February 21, 2026: contributor
    Concept ACK
  19. test/wallet: ensure FastWalletRescanFilter is updated during scanning
    The fixed non-range descriptor address ensured that the FastWalletRescanFilter would match all blocks even if the filter wasn't properly updated.
    This commit moves the non-range descriptor tx to a different block, so that the filters must be updated after each TopUp for the test to pass.
    90d347b40b
  20. wallet: move scanning logic to wallet/scan.cpp
    Move rescan logic to new class to allow the scanning loop
    to be simplified by delegating some logic to member
    functions in future commits.
    
    CWallet::ScanForWalletTransactions impl is moved to scan.cpp
    to prevent circular dependency of the form
    "wallet/wallet -> wallet/scan -> wallet/wallet".
    81ec69cbbf
  21. wallet/scan: extract block filtering logic to ShouldFetchBlock method
    Pure extraction of block filter matching logic into a dedicated method.
    This prepares for further refactoring of the scanning loop.
    6e662fe9e1
  22. wallet/scan: extract block scanning logic to ScanBlock method
    Pure extraction of block transaction processing logic into a dedicated
    method. This isolates the logic for fetching blocks and syncing their
    transactions.
    34d4cd27e3
  23. wallet/scan: extract block iteration logic into ReadNextBlock
    Extract block reading logic into ReadNextBlock method which returns
    std::optional<pair<hash, height>>.
    Introduce m_next_block member to track iteration state, consolidating
    the loop termination logic into ReadNextBlock.
    6909533e4a
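The iteration shape described in this commit message, a `ReadNextBlock` that returns `std::optional<std::pair<hash, height>>` with a member tracking the next block so that all termination checks live in one place, might look roughly like the following. This is a hypothetical sketch: the chain is a vector of hashes, and `BlockCursor` and its members are illustrative names rather than the PR's code.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using BlockHash = std::string;

class BlockCursor {
public:
    BlockCursor(std::vector<BlockHash> chain, int start_height, int stop_height)
        : m_chain{std::move(chain)}, m_stop_height{stop_height}, m_next_block{start_height} {}

    // Returns the next (hash, height) pair, or std::nullopt once the requested
    // range or the chain itself is exhausted.
    std::optional<std::pair<BlockHash, int>> ReadNextBlock() {
        if (m_next_block > m_stop_height) return std::nullopt;
        if (m_next_block < 0 || m_next_block >= static_cast<int>(m_chain.size())) return std::nullopt;
        const int height = m_next_block++;
        return std::make_pair(m_chain[height], height);
    }

private:
    std::vector<BlockHash> m_chain;
    int m_stop_height;
    int m_next_block;  // height of the block the next call will return
};
```

With this shape the main loop reduces to `while (auto block = cursor.ReadNextBlock()) { ... }`.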
  24. wallet/scan: extract progress tracking helper methods
    Extract UpdateProgress, UpdateTipIfChanged, and LogScanStatus methods.
    Move progress tracking variables to class members. This simplifies the
    main scanning loop and groups related progress tracking logic together.
    9f02e8b067
  25. wallet/scan: simplify ChainScanner::Scan main loop
    Extract ProcessBlock method and simplify the main scanning loop. This
    final refactoring demonstrates the clean separation of concerns with
    each helper method handling a specific aspect of the scanning process.
    e392eb5f93
  26. wallet: setup wallet threadpool
    All wallets will use the same ThreadPool owned by the WalletContext.
    ThreadPool::Submit() is thread-safe, so there's no need for
    external synchronization primitives when submitting tasks.
    All threads started by the ThreadPool will be destroyed with the
    ThreadPool and WalletContext.
    881ebc4730
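The ownership model in this commit message (one pool owned by the context and shared by every wallet, a thread-safe `Submit()`, and worker threads joined when the owner is destroyed) can be sketched as below. `SharedPool`, `ContextSketch` and `WalletSketch` are hypothetical stand-ins for `ThreadPool`, `WalletContext` and `CWallet`; only the ownership and lifetime relationships are meant to carry over.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class SharedPool {
public:
    explicit SharedPool(unsigned n_threads) {
        for (unsigned i = 0; i < n_threads; ++i) m_workers.emplace_back([this] { Run(); });
    }
    // RAII: destroying the pool finishes queued work, then joins every worker.
    ~SharedPool() {
        { std::lock_guard<std::mutex> lock{m_mutex}; m_stop = true; }
        m_cv.notify_all();
        for (std::thread& t : m_workers) t.join();
    }
    // Safe to call from several threads (e.g. several wallets) at once: the
    // queue is only touched under m_mutex.
    std::future<void> Submit(std::function<void()> fn) {
        auto task = std::make_shared<std::packaged_task<void()>>(std::move(fn));
        std::future<void> fut = task->get_future();
        { std::lock_guard<std::mutex> lock{m_mutex}; m_queue.emplace_back([task] { (*task)(); }); }
        m_cv.notify_one();
        return fut;
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock{m_mutex};
                m_cv.wait(lock, [this] { return m_stop || !m_queue.empty(); });
                if (m_stop && m_queue.empty()) return;
                job = std::move(m_queue.front());
                m_queue.pop_front();
            }
            job();
        }
    }
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<std::function<void()>> m_queue;
    std::vector<std::thread> m_workers;
    bool m_stop{false};
};

// The context owns the pool; its thread count would come from -walletpar.
struct ContextSketch {
    explicit ContextSketch(unsigned walletpar) : pool{std::make_unique<SharedPool>(walletpar)} {}
    std::unique_ptr<SharedPool> pool;
};

// Wallets only borrow the pool and must not outlive the context.
struct WalletSketch {
    SharedPool& pool;
};
```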
  27. wallet/scan: combine block iteration and filtering in ReadNextBlocks
    This commit refactors the block filtering logic from ShouldFetchBlock
    into a new ReadNextBlocks method that works with a queue of blocks
    (m_blocks). This prepares the code for parallel block filter checking
    while keeping the current single-threaded behaviour.
    555ea0148e
  28. wallet: check blockfilters in parallel
    This commit implements parallel block filter checking using the wallet
    threadpool. The main thread reads block hashes and queues them while
    worker threads check filters in parallel.
    
    Synchronization:
    - Operations requiring cs_wallet (GetLastBlockHeight, SyncTransaction)
      remain on the main thread since cs_wallet is a RecursiveMutex and
      ScanForWalletTransactions is called from AttachChain which locks cs_wallet
    - Main thread uses ThreadPool::ProcessTask() to join workers when the
      block queue is full, avoiding busy-waiting
    
    Batching:
    - Up to m_max_blockqueue_size (1000) blocks are queued for filtering
    - When queue is full, main loop processes filtered blocks before reading more
    
    Thread safety:
    - All futures (at most `workers_count`) are waited on before returning to
      avoid data races on `FastWalletRescanFilter::m_filter_set`.
    
    Benchmarks show considerable improvement (approx 5x with 16 threads).
    48154b87e2
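The thread-safety rule in this commit message, that every outstanding future is waited on before returning so no worker can still be reading `FastWalletRescanFilter::m_filter_set` when it is mutated or destroyed, has to hold on early exits too (a match, an interrupt, an exception). One way to make it hard to forget is a scope guard. The sketch below uses a hypothetical `FutureDrainer` and is not taken from the PR. Note that futures from `std::async` already block in their destructor, but futures obtained from a thread pool via `std::packaged_task` do not, which is why an explicit drain matters there.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Waits on every still-valid future when it goes out of scope, so no task can
// outlive the data it reads, even if the caller returns early or throws.
class FutureDrainer {
public:
    explicit FutureDrainer(std::vector<std::future<bool>>& futures) : m_futures{futures} {}
    FutureDrainer(const FutureDrainer&) = delete;
    FutureDrainer& operator=(const FutureDrainer&) = delete;
    ~FutureDrainer() {
        for (auto& fut : m_futures) {
            if (fut.valid()) fut.wait();
        }
    }

private:
    std::vector<std::future<bool>>& m_futures;
};

// Hypothetical scan step: returns the index of the first matching block, or
// -1. It returns as soon as a match is seen, yet the drainer still waits for
// the tasks whose results were never consumed.
int FirstMatch(std::vector<std::future<bool>>& futures) {
    FutureDrainer drain{futures};
    for (std::size_t i = 0; i < futures.size(); ++i) {
        if (futures[i].get()) return static_cast<int>(i);
    }
    return -1;
}
```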
  29. Eunovo force-pushed on Feb 26, 2026
  30. Eunovo commented at 4:21 pm on February 26, 2026: contributor

    I like the PR conceptually, but I think it would be nice to first improve the current scanning code structure, then land the parallelization feature. The current code mixes a lot of responsibilities. Similar to what you did in 6335316, but in a separate PR, so we can first land some good building blocks for this to happen.

    I moved the test change into #34667 and the ScanForWalletTransactions refactor into #34681. I’ll be putting this PR in draft while those PRs are open.

  31. Eunovo marked this as a draft on Feb 26, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-03-03 21:13 UTC
