validation: fetch block input prevouts in parallel during ConnectBlock #35295

pull andrewtoth wants to merge 11 commits into bitcoin:master from andrewtoth:threaded-inputs changing 13 files +585 −42
  1. andrewtoth commented at 12:53 AM on May 15, 2026: contributor

    This PR is a continuation of #31132. All outstanding issues raised there have been resolved, but the volume of stale comments can make that change difficult to review.

    Currently, when connecting a block, each input prevout is looked up one at a time. For every input we first check the in-memory coins cache, and on a miss we make a synchronous round-trip to the chainstate LevelDB to read the coin from disk. Because these lookups happen serially as the block is being validated, the disk read latency stacks up and dominates the time spent in ConnectBlock whenever many inputs are not already in the cache.

    This PR moves those disk reads onto a pool of worker threads that run in parallel with block connection. Before entering ConnectBlock the block is handed to a CoinsViewOverlay, which kicks off the workers to begin fetching all of the block's prevouts from disk and warming the cache. The main validation thread continues to do exactly the same work it does today, hitting the cache for each input in order. The only difference is that by the time it asks, the coin is much more likely to already be there. There are no validation logic or consensus behavior changes. This is purely a parallelization of an existing read pattern.

    The number of fetcher threads is configurable via -prevoutfetchthreads=<n>, defaulting to 8 and capped at 16. Setting it to 0 disables input fetching entirely and reverts to the previous serial behavior.

    We have measured large performance gains for IBD and -reindex-chainstate, as well as worst-case steady-state block connection at the tip. l0rinc ran many thorough benchmarking passes on the original PR across multiple machines, storage types, dbcache sizes^1, operating systems[^2], and fetcher thread counts[^3]. Many other contributors also posted their benchmark results in the original PR. IBD speedups range from 1.18× to over 3× faster[^4]. Worst-case block connection time for network-attached storage was over 2× faster[^5]. Flamegraph comparisons before and after this change are available[^6].

    On safety: ConnectBlock runs while holding cs_main, so nothing else in the node can mutate the chainstate while the fetchers are reading it.

    On LevelDB: concurrent reads are fully supported and documented as such. We already rely on this in production today against our other LevelDB-backed databases. The txindex DB is read by multiple simultaneous HTTP RPC worker threads via the getrawtransaction RPC. The blockfilterindex DB is called concurrently from both the P2P cfilters / cfheaders / cfcheckpt message handlers on the msghand thread, and from the getblockfilter RPC on the HTTP RPC worker threads. We have not yet been issuing concurrent reads against the chainstate DB, but there is no LevelDB-side reason we can't. In fact, the chainstate DB is already being touched by more than one thread on master, because LevelDB schedules its own background compaction work.

    For reviewers:

    The main change is CoinsViewOverlay gets 1 new public and 2 new private methods.

    • StartFetching: public method called in lieu of CreateResetGuard before we enter ConnectBlock. It still returns a ResetGuard so the view is Reset before the block it is working on leaves scope. This kicks off worker threads who each just run while (ProcessInput()) {} and then return.
    • StopFetching: private method called on Reset whenever the guard leaves scope, or a method that mutates base is called (Flush/Sync/SetBackend). Stops all threads and clears multi threaded state.
    • ProcessInput: private method that fetches a single input prevout. Returns true if there are more inputs to fetch and false otherwise. This is the only method on CoinsViewOverlay that is called concurrently by multiple threads. Every other method on the overlay is still called synchronously on the main thread.

    The CoinsViewOverlay::FetchCoinFromBase method is also extended to lookup the coins fetched from ProcessInput first before falling back to base->PeekCoin.

    Mutating methods Reset, Flush, Sync and SetBackend are overridden in CoinsViewOverlay to call StopFetching first.

    See https://github.com/bitcoin/bitcoin/pull/35295/changes#diff-095ce1081a930998a10b37358fae5499ac47f8cb6f25f5df5d88e920a54e0341R568-R624 for a thorough description and ASCII diagram detailing the full behavior.

    [^2]: #31132 (comment) [^3]: #31132 (comment) [^4]: #31132 (comment) [^5]: #31132 (comment) [^6]: #31132 (comment)

  2. DrahtBot added the label Validation on May 15, 2026
  3. DrahtBot commented at 12:53 AM on May 15, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35295.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK rkrux, theStack
    Stale ACK l0rinc

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34514 (refactor: remove unnecessary std::move for a few trivially copyable types by l0rinc)
    • #34132 (coins: drop error catcher, centralize fatal read handling by l0rinc)
    • #32317 (kernel: Separate UTXO set access from validation functions by sedited)
    • #28690 (build: Introduce internal kernel library by sedited)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  4. in src/node/chainstatemanager_args.cpp:63 in 22c4c1d737
      59 | @@ -60,6 +60,13 @@ util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManage
      60 |      // Subtract 1 because the main thread counts towards the par threads.
      61 |      opts.worker_threads_num = script_threads - 1;
      62 |  
      63 | +    if (auto value{args.GetIntArg("-inputfetchthreads")}) {
    


    l0rinc commented at 2:02 PM on May 15, 2026:

    22c4c1d validation: add -inputfetchthreads configuration option:

    https://corecheck.dev/bitcoin/bitcoin/pulls/35295 indicates we're not testing this part of the code - could we add a unit test for this?

    diff --git a/src/test/validation_chainstatemanager_tests.cpp b/src/test/validation_chainstatemanager_tests.cpp
    --- a/src/test/validation_chainstatemanager_tests.cpp	(revision d6946a12ab306c5054b535b0b7fac0101fb306ae)
    +++ b/src/test/validation_chainstatemanager_tests.cpp	(revision 82be6bdbe682cf9b4d1fb46820302cbbfb5c34b6)
    @@ -991,6 +991,13 @@
     
         BOOST_CHECK(!get_opts({"-minimumchainwork=xyz"}));                                                               // invalid hex characters
         BOOST_CHECK(!get_opts({"-minimumchainwork=01234567890123456789012345678901234567890123456789012345678901234"})); // > 64 hex chars
    +
    +    // test -inputfetchthreads
    +    BOOST_CHECK_EQUAL(get_valid_opts({}).inputfetch_threads_num, DEFAULT_INPUTFETCH_THREADS);
    +    BOOST_CHECK_EQUAL(get_valid_opts({"-inputfetchthreads=0"}).inputfetch_threads_num, 0);
    +    BOOST_CHECK_EQUAL(get_valid_opts({"-inputfetchthreads=3"}).inputfetch_threads_num, 3);
    +    BOOST_CHECK_EQUAL(get_valid_opts({"-inputfetchthreads=100"}).inputfetch_threads_num, MAX_INPUTFETCH_THREADS);
    +    BOOST_CHECK(!get_opts({"-inputfetchthreads=-1"}));
     }
     
     BOOST_AUTO_TEST_SUITE_END()
    
  5. in src/validation.cpp:1859 in a3981aa8a0
    1856 |      AssertLockHeld(::cs_main);
    1857 |      m_cacheview = std::make_unique<CCoinsViewCache>(&m_catcherview);
    1858 | -    m_connect_block_view = std::make_unique<CoinsViewOverlay>(&*m_cacheview);
    1859 | +    auto thread_pool{std::make_shared<ThreadPool>("inputfetch")};
    1860 | +    if (inputfetch_threads > 0) {
    1861 | +        thread_pool->Start(inputfetch_threads);
    


    l0rinc commented at 2:22 PM on May 15, 2026:

    a3981aa coins: introduce thread pool in CoinsViewOverlay:

    nit: do we want to be able to init the cache with > MAX_INPUTFETCH_THREADS threads? That's not really possible currently as far as I can tell, but we may want to clamp here again to make it future-proof.

  6. in src/coins.h:591 in 70ee8ba660
     586 |          const COutPoint& outpoint;
     587 |          //! The coin that workers will fetch and main thread will insert into cache.
     588 |          mutable std::optional<Coin> coin{std::nullopt};
     589 |  
     590 | +        //! The move constructor will never be used, since m_inputs will never need to reallocate.
     591 | +        InputToFetch(InputToFetch&& other) noexcept : outpoint{other.outpoint} { Assert(false); }
    


    l0rinc commented at 9:51 AM on May 18, 2026:

    70ee8ba coins: add ready flag to InputToFetch:

    nit: https://corecheck.dev/bitcoin/bitcoin/pulls/35295 coverage indicates this is unused - maybe a simple unit test could cover it. Not sure it matters...

  7. in src/coins.h:709 in 1f0c8957e1 outdated
     712 | +                    while (ProcessInput()) {}
     713 | +                });
     714 | +                if (auto futures{m_thread_pool->Submit(std::move(tasks))}; futures.has_value()) {
     715 | +                    m_futures = std::move(*futures);
     716 | +                } else {
     717 | +                    m_inputs.clear();
    


    l0rinc commented at 9:52 AM on May 18, 2026:

    1f0c895 coins: fetch inputs in parallel:

    https://corecheck.dev/bitcoin/bitcoin/pulls/35295 indicates this path is never taken


    ryanofsky commented at 7:12 PM on May 27, 2026:

    In commit "coins: fetch inputs in parallel" (1f0c8957e1e483608148dc4bf7f396b82d2c6e44)

    Unclear why clearing the inputs is the right thing to do when the thread pool submit fails.

    Since thread pool is passed externally and not only controlled by this code, it would seem good to handle failures by doing something like raising an exception or logging an error. Of if clearing m_inputs is a good way to handle this, it would be worth saying what this does in a comment.


    andrewtoth commented at 9:18 PM on May 31, 2026:

    I added a comment explaining why it's the correct action to take, and I also added a warning log here.

  8. in src/coins.h:685 in fec8a2b732 outdated
     677 | @@ -678,6 +678,24 @@ class CoinsViewOverlay : public CCoinsViewCache
     678 |          while (ProcessInput()) {}
     679 |          return CreateResetGuard();
     680 |      }
     681 | +
     682 | +    void SetBackend(CCoinsView& view) override
     683 | +    {
     684 | +        StopFetching();
     685 | +        CCoinsViewCache::SetBackend(view);
    


    l0rinc commented at 9:56 AM on May 18, 2026:

    fec8a2b coins: stop fetching before mutating base:

    Same, https://corecheck.dev/bitcoin/bitcoin/pulls/35295 indicates this is never actually called. This is probably a consequence of the Coins interface bloat; maybe we could throw, log, or Assume instead?


    andrewtoth commented at 9:01 PM on May 31, 2026:

    It is called in the fuzz harness.


    l0rinc commented at 9:09 PM on May 31, 2026:

    If it's test-only code maybe we could throw, log, or Assume instead?

  9. in src/coins.h:694 in fec8a2b732 outdated
     689 | +    {
     690 | +        StopFetching();
     691 | +        CCoinsViewCache::Flush(reallocate_cache);
     692 | +    }
     693 | +
     694 | +    void Sync() override
    


    l0rinc commented at 9:56 AM on May 18, 2026:

    fec8a2b coins: stop fetching before mutating base:

    I'm a bit surprised this is never covered in https://corecheck.dev/bitcoin/bitcoin/pulls/35295 - can you please investigate?

    What is the expected behavior when we have a second fetch session attached to the same underlying base, e.g.

    BOOST_AUTO_TEST_CASE(fetch_state_is_cleared_by_mutating_operations)
    {
        const auto block{CreateBlock()};
        CCoinsViewDB db{{.path = "", .cache_bytes = 1_MiB, .memory_only = true}, {}};
        CCoinsViewCache main_cache{&db};
        CCoinsViewCache alternate_cache{&db};
        PopulateView(block, main_cache);
        PopulateView(block, alternate_cache);
    
        for (const bool flush : {false, true}) {
            CoinsViewOverlay view{&main_cache, StartedThreadPool()};
            const auto first_guard{view.StartFetching(block)};
            view.SetBestBlock(uint256::ONE);
            if (flush) {
                view.Flush();
            } else {
                view.Sync();
            }
    
            const auto second_guard{view.StartFetching(block)};
            CheckCache(block, view);
        }
    
        CoinsViewOverlay view{&main_cache, StartedThreadPool()};
        const auto first_guard{view.StartFetching(block)};
        view.SetBackend(alternate_cache);
    
        const auto second_guard{view.StartFetching(block)};
        CheckCache(block, view);
    }
    
  10. in src/coins.h:645 in a15470468b outdated
     640 | +            for (const auto& input : tx->vin) {
     641 | +                m_inputs.emplace_back(input.prevout);
     642 | +            }
     643 | +        }
     644 | +        while (ProcessInput()) {}
     645 | +        return CreateResetGuard();
    


    l0rinc commented at 10:07 AM on May 18, 2026:

    a154704 validation: collect block inputs in CoinsViewOverlay before ConnectBlock:

    It should be safe to start twice, we could cover that with a test case, maybe something like:

    BOOST_AUTO_TEST_CASE(fetch_state_is_reusable_after_reset)
    {
        const auto block{CreateBlock()};
        CCoinsViewDB db{{.path = "", .cache_bytes = 1_MiB, .memory_only = true}, {}};
        CCoinsViewCache main_cache{&db};
        PopulateView(block, main_cache);
        CoinsViewOverlay view{&main_cache, StartedThreadPool()};
    
        {
            const auto reset_guard{view.StartFetching(block)};
            CheckCache(block, view);
            BOOST_CHECK_GT(view.GetCacheSize(), 0U);
        }
    
        BOOST_CHECK_EQUAL(view.GetCacheSize(), 0U);
    
        const auto reset_guard{view.StartFetching(block)};
        CheckCache(block, view);
    }
    
  11. in src/coins.h:646 in a3981aa8a0 outdated
     641 | @@ -640,10 +642,13 @@ class CoinsViewOverlay : public CCoinsViewCache
     642 |              return std::move(input.coin);
     643 |          }
     644 |  
     645 | -        // We will only get here for BIP30 checks.
     646 | +        // We will only get here for BIP30 checks or when parallel fetching is disabled.
     647 |          return base->PeekCoin(outpoint);
    


    l0rinc commented at 10:09 AM on May 18, 2026:

    a3981aa coins: introduce thread pool in CoinsViewOverlay:

    Since the workers call PeekCoin() concurrently on the shared base cache, maybe we could force routing the lookup through std::as_const to make the concurrent-read contract explicit at the access site:

    std::optional<Coin> CCoinsViewCache::PeekCoin(const COutPoint& outpoint) const
    {
        const auto& coins{std::as_const(cacheCoins)};
        if (auto it{coins.find(outpoint)}; it != coins.end()) {
            return it->second.coin.IsSpent() ? std::nullopt : std::optional{it->second.coin};
        }
        return base->PeekCoin(outpoint);
    }
    

    andrewtoth commented at 9:03 PM on May 31, 2026:

    I did not take this. This seems out of scope for this change.

  12. in src/test/fuzz/coins_view.cpp:452 in d6946a12ab outdated
     447 | +        .path = "",
     448 | +        .cache_bytes = 1_MiB,
     449 | +        .memory_only = true,
     450 | +    };
     451 | +    CCoinsViewDB backend_base_coins_view{std::move(db_params), CoinsViewOptions{}};
     452 | +    CCoinsViewCache backend_cache{&backend_base_coins_view, /*deterministic=*/true};
    


    l0rinc commented at 10:23 AM on May 18, 2026:

    d6946a1 fuzz: add coins_view_stacked fuzz harness to test concurrent leveldb reads:

    The generated block inputs are mostly arbitrary misses this way - could we help the fuzzer by allowing a fuzz-selected subset of the staged external prevouts into the DB before starting the overlay, so the same concurrent read path covers real backing-store hits as well as misses:

    void PopulateFetchInputs(CCoinsView& view, const CBlock& block, FuzzedDataProvider& fuzzed_data_provider)
    {
        CCoinsViewCache cache{&view, /*deterministic=*/true};
        cache.SetBestBlock(uint256::ONE);
        for (const auto& tx : block.vtx | std::views::drop(1)) {
            for (const auto& input : tx->vin) {
                if (!fuzzed_data_provider.ConsumeBool()) continue;
                auto coin{ConsumeDeserializable<Coin>(fuzzed_data_provider)};
                if (!coin || coin->IsSpent()) continue;
                cache.AddCoin(input.prevout, std::move(*coin), /*possible_overwrite=*/true);
            }
        }
        cache.Flush();
    }
    ...
    CCoinsViewDB backend_base_coins_view{std::move(db_params), CoinsViewOptions{}};
    CBlock block{BuildRandomBlock(fuzzed_data_provider)};
    PopulateFetchInputs(backend_base_coins_view, block, fuzzed_data_provider);
    CCoinsViewCache backend_cache{&backend_base_coins_view, /*deterministic=*/true};
    

    andrewtoth commented at 9:03 PM on May 31, 2026:

    I did something similar. I add the utxos when I create the block.

  13. in src/test/coinsviewoverlay_tests.cpp:196 in 650416ec87 outdated
     191 | @@ -175,5 +192,65 @@ BOOST_AUTO_TEST_CASE(fetch_no_inputs)
     192 |      BOOST_CHECK_EQUAL(view.GetCacheSize(), 0);
     193 |  }
     194 |  
     195 | +// Access coins that are not block inputs
     196 | +BOOST_AUTO_TEST_CASE(access_non_input_coins)
    


    l0rinc commented at 1:32 PM on May 18, 2026:

    650416e test: add unit tests for CoinsViewOverlay::StartFetching:

    Would it make sense to cover the thread pool interruption explicitly?

    BOOST_AUTO_TEST_CASE(fetch_interrupted_thread_pool_uses_normal_lookup)
    {
        const auto block{CreateBlock()};
        CCoinsViewDB db{{.path = "", .cache_bytes = 1_MiB, .memory_only = true}, {}};
        CCoinsViewCache main_cache{&db};
        PopulateView(block, main_cache);
    
        auto thread_pool{std::make_shared<ThreadPool>("fetch_interrupted")};
        thread_pool->Start(DEFAULT_INPUTFETCH_THREADS);
        thread_pool->Interrupt();
    
        {
            CoinsViewOverlay view{&main_cache, thread_pool};
            const auto reset_guard{view.StartFetching(block)};
            CheckCache(block, view);
        }
    
        thread_pool->Stop();
    }
    

    and maybe the theoretic-but-documented out-of-order flow:

    BOOST_AUTO_TEST_CASE(fetch_out_of_order_input_uses_normal_lookup)
    {
        const auto block{CreateBlock()};
        CCoinsViewDB db{{.path = "", .cache_bytes = 1_MiB, .memory_only = true}, {}};
        CCoinsViewCache main_cache{&db};
        PopulateView(block, main_cache);
    
        std::vector<COutPoint> fetched_inputs;
        std::unordered_set<Txid, SaltedTxidHasher> txids;
        txids.reserve(block.vtx.size() - 1);
        for (const auto& tx : block.vtx | std::views::drop(1)) {
            for (const auto& input : tx->vin) {
                if (!txids.contains(input.prevout.hash)) fetched_inputs.push_back(input.prevout);
            }
            txids.emplace(tx->GetHash());
        }
        BOOST_REQUIRE_GE(fetched_inputs.size(), 2U);
    
        CoinsViewOverlay view{&main_cache, StartedThreadPool()};
        const auto reset_guard{view.StartFetching(block)};
    
        const auto& out_of_order_input{fetched_inputs[1]};
        BOOST_CHECK(!view.AccessCoin(out_of_order_input).IsSpent());
        BOOST_CHECK(view.HaveCoinInCache(out_of_order_input));
    
        CheckCache(block, view);
    }
    
  14. in src/coins.h:631 in a15470468b
     626 | +        StopFetching();
     627 | +        CCoinsViewCache::Reset();
     628 | +    }
     629 | +
     630 |  public:
     631 |      using CCoinsViewCache::CCoinsViewCache;
    


    l0rinc commented at 1:35 PM on May 18, 2026:

    a154704 validation: collect block inputs in CoinsViewOverlay before ConnectBlock:

    nit: maybe we could harden destruction of the overlay (though callers must still keep the reset guard inside the overlay lifetime)

        ~CoinsViewOverlay() noexcept override { StopFetching(); }
    
  15. in src/test/coinsviewoverlay_tests.cpp:61 in d6946a12ab
      56 |      for (const auto& tx : block.vtx | std::views::drop(1)) {
      57 |          for (const auto& in : tx->vin) {
      58 | +            if (txids.contains(in.prevout.hash)) continue;
      59 |              Coin coin{};
      60 |              if (!spent) coin.out.nValue = 1;
      61 |              cache.EmplaceCoinInternalDANGER(COutPoint{in.prevout}, std::move(coin));
    


    l0rinc commented at 1:39 PM on May 18, 2026:

    Since the prod usually uses AddCoin for the unspent setup paths maybe we could exercise that as well:

                if (spent) {
                    cache.EmplaceCoinInternalDANGER(COutPoint{in.prevout}, std::move(coin));
                } else {
                    coin.out.nValue = 1;
                    cache.AddCoin(in.prevout, std::move(coin), /*possible_overwrite=*/false);
                }
    
  16. l0rinc approved
  17. l0rinc commented at 1:48 PM on May 18, 2026: contributor

    ACK d6946a12ab306c5054b535b0b7fac0101fb306ae

    My concerns from the original PR were addressed, left a few ideas on how to extend code coverage but the implementation finally seems complete to me, thanks for your persistence @andrewtoth :D/

  18. rkrux commented at 2:34 PM on May 20, 2026: contributor

    I don't know when I will get around to reviewing this PR but I do want to leave a Concept ACK d6946a1 here at this stage.

    It's also very very good to see multiple use cases of the common thread pool coming up in the codebase: http/rpcserver, wallet chain scanning, and this.

  19. in src/coins.h:698 in d6946a12ab
     698 | +            // We advance the tail since the input is cached and not accessed through this method again.
     699 | +            auto& input{m_inputs[m_input_tail++]};
     700 | +            // Check if the coin is ready to be read. We need acquire so we match the worker thread's release.
     701 | +            while (!input.ready.test(std::memory_order_acquire)) {
     702 | +                // Work instead of waiting if the coin is not ready
     703 | +                if (!ProcessInput()) {
    


    ryanofsky commented at 2:09 PM on May 21, 2026:

    In commit "coins: add ready flag to InputToFetch" (70ee8ba66055aed16cfa9e0a8ebeea6db0aad1b9)

    Is there a reason to call ProcessInput() here instead of input.ready.wait()?

    The comment above says this lets the main thread make progress instead of waiting for a specific worker, but was this benchmarked, or is there a reason to expect it to help? Since the main thread is the only thread that can continue validation, it seems possible that it should prioritize waiting for the needed input instead of switching to fetching later inputs, which may involve blocking I/O and may not make the current input ready any sooner.

    (Sorry if this was already asked somewhere. Either way it would be good to update the "Otherwise the main thread calls ProcessInput()..." comment above with a more specific rationale.)


    andrewtoth commented at 9:07 PM on May 31, 2026:

    I think this was done with an older design where we might not have any threads in the thread pool, and therefore wouldn't make any progress in that case. With the current design where we clear m_inputs if we didn't successfully submit tasks to at least 1 worker thread, it makes sense to just wait here. It simplifies this code and continues validation immediately instead of having the main thread potentially wait for expensive IO. I also bumped the default worker threads from 4 to 8. I believe we saw a speedup here due to this, but I think that was because we would now be fetching on a 5th thread by default. So bumping the default and just waiting on the main thread should be even faster.

  20. ryanofsky commented at 2:15 PM on May 21, 2026: contributor

    Code review d6946a12ab306c5054b535b0b7fac0101fb306ae. I've been meaning to review this for a while and finally getting to it now. Just one question so far below

  21. theStack commented at 8:53 PM on May 26, 2026: contributor

    Concept ACK

    I haven't had a chance yet to look at the code in-depth, but noticed that starting with commit a3981aa8a01195e3d4442e5d258f8f3198b49400 ("coins: introduce thread pool in CoinsViewOverlay"), the seemingly unrelated unit test test_LockDirectory fails on my arm64 Linux machine:

    $ git rev-parse --short HEAD
    a3981aa8a0
    $ ./build/bin/test_bitcoin --log_level=test_suite --run_test=coinsviewoverlay_tests,util_tests
    ./test/util_tests.cpp(1056): Entering test case "test_LockDirectory"     
    unknown location(0): fatal error: in "util_tests/test_LockDirectory": memory access violation at address: 0xffffb44470f0: no mapping at fault address
    ./test/util_tests.cpp(1068): last checkpoint                                                
    Test is aborted                                                                             
    ./test/util_tests.cpp(1056): Leaving test case "test_LockDirectory"; testing time: 2099us
    Test is aborted                                                                                                                                                                         
    ./test/util_tests.cpp(71): Leaving test suite "util_tests"; testing time: 10658us
    Test is aborted                                                                                                                                                                         
    Leaving test module "Bitcoin Core Test Suite"; testing time: 1008us                       
                                                                                                                                                                                            
    *** 1 failure is detected in the test module "Bitcoin Core Test Suite"                                                                                                                  
    ./test/util_tests.cpp(1135): error: in "util_tests/test_LockDirectory": check processstatus == 0 has failed [51456 != 0]      
    

    I suspect that the problem is that the forked unit test process in test_LockDirectory gets shut down hard, while the thread pool from the coinsviewoverlay_tests (started by StartedThreadPool) is still running. Keeping only the following minimal unit test in coinsviewoverlay_tests.cpp still fails on my machine with the command from above:

    BOOST_AUTO_TEST_CASE(lockdirectory_test_fail_reproducer)
    {
        CCoinsViewDB db{{.path = "", .cache_bytes = 1_MiB, .memory_only = true}, {}};
        CCoinsViewCache main_cache{&db};
        CoinsViewOverlay view{&main_cache, StartedThreadPool()};
    }
    

    If I add StartedThreadPool()->Stop(); at the end, the LockDirectory unit test failure is gone. Happy to share more details and debug if there is a suspicion that this could point to a deeper issue, though ISTM that this is only a test problem and stopping the thread pool at the end of the coinsviewoverlay test suite (with a test fixture?) would solve the problem.

  22. in src/coins.h:637 in a15470468b
     632 | +
     633 | +    //! Start fetching inputs from block.
     634 | +    [[nodiscard]] ResetGuard StartFetching(const CBlock& block LIFETIMEBOUND) noexcept
     635 | +    {
     636 | +        Assume(m_inputs.empty());
     637 | +        Assume(m_input_head.load(std::memory_order_relaxed) == 0);
    


    ryanofsky commented at 9:20 PM on May 26, 2026:

    In commit "validation: collect block inputs in CoinsViewOverlay before ConnectBlock" (a15470468ba89f2f86184191423c2a4d5c2ffcc1)

    This is not a performance hotspot, and if these preconditions are wrong it would seem to indicate a serious problem with validation logic. Would suggest either replacing Assume with Assert or at least taking some action such logging a warning if the Assume returns false, not just ignoring the errors

  23. in src/coins.h:626 in a15470468b outdated
     621 |      }
     622 |  
     623 | +protected:
     624 | +    void Reset() noexcept override
     625 | +    {
     626 | +        StopFetching();
    


    ryanofsky commented at 9:52 PM on May 26, 2026:

    In commit "validation: collect block inputs in CoinsViewOverlay before ConnectBlock" (a15470468ba89f2f86184191423c2a4d5c2ffcc1)

    Since this is a virtual method it means StopFetching could be called (through CreateResetGuard) even if StartFetching is never called, which is surprising, even if safe. Seems worth mentioning StopFetching could be called without StartFetching if this is intentional.

    More ideally it would seem good to just drop the CCoinsViewCache::CreateResetGuard method now that the only non-test code calling it is StartFetching. None of the test code actually uses the reset guard, except as a kludge to immediately reset the cache, so it seems like the tests would be better off either recreating the cache or calling a test-only reset method.


    andrewtoth commented at 9:11 PM on May 31, 2026:

    I added a comment to note that StopFetching is idempotent, and can be safely called when not fetching.

    I'm not sure it's worth refactoring the change to drop CreateResetGuard in this PR, as there is already a lot happening. I opted to not do that (can be refactored in a follow-up).

  24. in src/coins.h:613 in a15470468b
     608 | +    }
     609 | +
     610 |      std::optional<Coin> FetchCoinFromBase(const COutPoint& outpoint) const override
     611 |      {
     612 | +        // TODO: linear scan; replaced with O(1) m_input_tail lookup in a follow-up commit.
     613 | +        for (const auto i : std::views::iota(0U, m_inputs.size())) {
    


    ryanofsky commented at 9:56 PM on May 26, 2026:

    In commit "validation: collect block inputs in CoinsViewOverlay before ConnectBlock" (a15470468ba89f2f86184191423c2a4d5c2ffcc1)

    i is replaced with m_inputs_head immediately in the next commit, so it seems like it would reduce churn in the PR and make it clearer by introducing both m_input_head and m_input_tail this commit, instead of adding m_input_head and i now and replacing i later.


    andrewtoth commented at 9:11 PM on May 31, 2026:

    Bundled m_input_head into the first commit.

  25. in src/coins.h:647 in ec77ed6146
     641 | @@ -635,11 +642,16 @@ class CoinsViewOverlay : public CCoinsViewCache
     642 |      {
     643 |          Assume(m_inputs.empty());
     644 |          Assume(m_input_head.load(std::memory_order_relaxed) == 0);
     645 | +        Assume(m_input_tail == 0);
     646 |          // Loop through the inputs of the block and set them in the queue.
     647 | +        // Filter txs that are spending inputs created earlier in the same block.
    


    ryanofsky commented at 10:03 PM on May 26, 2026:

    In commit "coins: filter same-block spends in StartFetching" (ec77ed61467a80112e172e27ebafa6e8011b7a5c)

    It's not obvious that these inputs "won't be requested by FetchCoinFromBase" as stated in the commit message. That fact seems worth mentioning in this comment, otherwise it looks like this will result in extra PeekCoin calls.

  26. in src/coins.h:659 in 70ee8ba660
     651 | @@ -635,7 +652,12 @@ class CoinsViewOverlay : public CCoinsViewCache
     652 |      }
     653 |  
     654 |  public:
     655 | -    using CCoinsViewCache::CCoinsViewCache;
     656 | +    explicit CoinsViewOverlay(CCoinsView* in_base, bool deterministic = false) noexcept
     657 | +        : CCoinsViewCache{in_base, deterministic}
     658 | +    {
     659 | +        // Reserve to maximum theoretical number so emplace_back in StartFetching never reallocates m_inputs.
     660 | +        m_inputs.reserve(MAX_INPUTS_PER_BLOCK);
    


    ryanofsky commented at 10:20 PM on May 26, 2026:

    In commit "coins: add ready flag to InputToFetch" (70ee8ba66055aed16cfa9e0a8ebeea6db0aad1b9)

    This seems like a fragile way to ensure vector elements will not be moved. It would seem safer to replace the vector with a std::array (and an m_num_inputs field) to guarantee they will never move.


    l0rinc commented at 3:28 PM on May 29, 2026:

    Isn't the stack limit 1MB on Windows? std::array would be inline in CoinsViewOverlay, so local instances would put 80 * 24390 = 1,951,200 bytes on the stack just for m_inputs. A heap-backed std::unique_ptr<std::array<...>> could work, but a direct std::array member seems risky. It would also need another state field to track the number of valid inputs. We might be able to get around that with a noop sentinel, but that seems more risky than keeping the current explicit size.


    andrewtoth commented at 5:22 PM on May 29, 2026:

    I was experimenting with this, and I don't think an array is ideal here. I think instead of asserting that we will never move, we can properly implement the move constructor here but assert in it that ready will be false and coin will be nullopt. We can get rid of the MAX_INPUTS_PER_BLOCK commit and just let it resize inside of StartFetching.


    l0rinc commented at 1:00 PM on May 31, 2026:

    andrewtoth commented at 2:00 PM on May 31, 2026:

    What changed since https://mirror.b10c.me/bitcoin-bitcoin/31132#discussion_r3166350935?

    The comment above stating that it seems fragile. The previous version in the referenced thread of #31132 was just partially moving the outpoint and ignoring the other fields. Instead, we can move all fields and assert that ready is false and coin is nullopt.


    andrewtoth commented at 9:12 PM on May 31, 2026:

    Updated to drop the MAX_INPUTS_PER_BLOCK and implement the move constructor but assert that ready is false and coin is nullopt.

  27. in src/coins.h:684 in fec8a2b732 outdated
     677 | @@ -678,6 +678,24 @@ class CoinsViewOverlay : public CCoinsViewCache
     678 |          while (ProcessInput()) {}
     679 |          return CreateResetGuard();
     680 |      }
     681 | +
     682 | +    void SetBackend(CCoinsView& view) override
     683 | +    {
     684 | +        StopFetching();
    


    ryanofsky commented at 6:43 PM on May 27, 2026:

    In commit "coins: stop fetching before mutating base" (fec8a2b732bda4c2287bbeab8cfb29af43e9e254)

    Commit description seems makes it clear why SetBackend / Flush / Sync would not be safe to call while inputs are being fetched, but it's unclear (1) what would be calling these methods while inputs are being fetched and (2) why stopping fetching is the right behavior in these cases. It would be good to say more about this in the commit message or code comments.

    Also, it seems fragile for CoinsViewOverlay to have to override these 3 methods individually, because if new methods or overloads are added to the parent class, CoinsViewOverlay could be silently broken. If CoinsViewOverlay needs to stop fetching when the base view is mutated it would seem better to have a single UpdatingBaseView method it could override, or something like that, instead of needing to override 3 methods.


    andrewtoth commented at 9:17 PM on May 31, 2026:

    I updated the commit message to explain why these changes were needed. In production code we don't need any of them, since Flush is the only method to be called on CoinsViewOverlay, and for a valid block all inputs would have already been fetched to get there. Nevertheless these methods are called by our fuzz harnesses, which exercise the full API of CCoinsViewCache. So, it allows us to substitute CoinsViewOverlay with CCoinsViewCache there. I also added unit tests to cover all these.

    The UpdatingBaseView idea would be an interesting refactor but seems out of scope here, so I opted not to take it.

  28. in src/init.cpp:520 in 22c4c1d737
     516 | @@ -517,6 +517,7 @@ void SetupServerArgs(ArgsManager& argsman, bool can_listen_ipc)
     517 |      argsman.AddArg("-minimumchainwork=<hex>", strprintf("Minimum work assumed to exist on a valid chain in hex (default: %s, testnet3: %s, testnet4: %s, signet: %s)", defaultChainParams->GetConsensus().nMinimumChainWork.GetHex(), testnetChainParams->GetConsensus().nMinimumChainWork.GetHex(), testnet4ChainParams->GetConsensus().nMinimumChainWork.GetHex(), signetChainParams->GetConsensus().nMinimumChainWork.GetHex()), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::OPTIONS);
     518 |      argsman.AddArg("-par=<n>", strprintf("Set the number of script verification threads (0 = auto, up to %d, <0 = leave that many cores free, default: %d)",
     519 |          MAX_SCRIPTCHECK_THREADS, DEFAULT_SCRIPTCHECK_THREADS), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
     520 | +    argsman.AddArg("-inputfetchthreads=<n>", strprintf("Set the number of input fetch threads (0 disables, up to %d, default: %d). Negative values are rejected.", MAX_INPUTFETCH_THREADS, DEFAULT_INPUTFETCH_THREADS), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    


    ryanofsky commented at 6:51 PM on May 27, 2026:

    In commit "validation: add -inputfetchthreads configuration option" (22c4c1d737dc64d60dd22f9a635866b8ff5236a3)

    "input fetch threads" sounds pretty generic, and it would seem good to be more specific about what these threads are. LLM suggests the following name & description which I think would be better:

    argsman.AddArg("-prevoutfetchthreads=<n>", strprintf("Set the number of threads used to prefetch block input prevouts from the chainstate database (0 disables, up to %d, default: %d). Negative values are rejected.", MAX_INPUTFETCH_THREADS, DEFAULT_INPUTFETCH_THREADS), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    
  29. in src/node/chainstatemanager_args.cpp:65 in 22c4c1d737
      59 | @@ -60,6 +60,13 @@ util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManage
      60 |      // Subtract 1 because the main thread counts towards the par threads.
      61 |      opts.worker_threads_num = script_threads - 1;
      62 |  
      63 | +    if (auto value{args.GetIntArg("-inputfetchthreads")}) {
      64 | +        if (*value < 0) {
      65 | +            return util::Error{Untranslated(strprintf("-inputfetchthreads must be non-negative (got %d). Use 0 to disable input fetching.", *value))};
    


    ryanofsky commented at 6:52 PM on May 27, 2026:

    In commit "validation: add -inputfetchthreads configuration option" (22c4c1d737dc64d60dd22f9a635866b8ff5236a3)

    Should probably say "parallel input fetching" instead of "input fetching"

  30. in src/test/coinsviewoverlay_tests.cpp:61 in a3981aa8a0
      57 | @@ -55,6 +58,17 @@ void PopulateView(const CBlock& block, CCoinsView& view, bool spent = false)
      58 |      cache.Flush();
      59 |  }
      60 |  
      61 | +//! Returns a started thread pool shared across tests, mirroring how production reuses pools.
    


    ryanofsky commented at 7:02 PM on May 27, 2026:

    In commit "coins: introduce thread pool in CoinsViewOverlay" (a3981aa8a01195e3d4442e5d258f8f3198b49400)

    Unless there's a specific reason to share state across tests, it would seem better to make them self contained and not leak resources after they finish executing. Would suggest using a fixture instead of global like:

    <details><summary>diff</summary> <p>

    --- a/src/test/coinsviewoverlay_tests.cpp
    +++ b/src/test/coinsviewoverlay_tests.cpp
    @@ -20,7 +20,15 @@
     #include <memory>
     #include <ranges>
     
    -BOOST_AUTO_TEST_SUITE(coinsviewoverlay_tests)
    +struct CoinsViewOverlaySetup {
    +    std::shared_ptr<ThreadPool> m_thread_pool{[] {
    +        auto pool{std::make_shared<ThreadPool>("fetch_test")};
    +        pool->Start(DEFAULT_INPUTFETCH_THREADS);
    +        return pool;
    +    }()};
    +};
    +
    +BOOST_FIXTURE_TEST_SUITE(coinsviewoverlay_tests, CoinsViewOverlaySetup)
     
     namespace {
     
    @@ -58,17 +66,6 @@ void PopulateView(const CBlock& block, CCoinsView& view, bool spent = false)
         cache.Flush();
     }
     
    -//! Returns a started thread pool shared across tests, mirroring how production reuses pools.
    -std::shared_ptr<ThreadPool> StartedThreadPool()
    -{
    -    static const auto thread_pool{[] {
    -        auto pool{std::make_shared<ThreadPool>("fetch_test")};
    -        pool->Start(DEFAULT_INPUTFETCH_THREADS);
    -        return pool;
    -    }()};
    -    return thread_pool;
    -}
    -
     void CheckCache(const CBlock& block, const CCoinsViewCache& cache)
     {
         uint32_t counter{0};
    @@ -98,7 +95,7 @@ BOOST_AUTO_TEST_CASE(fetch_inputs_from_db)
         CCoinsViewDB db{{.path = "", .cache_bytes = 1_MiB, .memory_only = true}, {}};
         PopulateView(block, db);
         CCoinsViewCache main_cache{&db};
    -    CoinsViewOverlay view{&main_cache, StartedThreadPool()};
    +    CoinsViewOverlay view{&main_cache, m_thread_pool};
         const auto& outpoint{block.vtx[1]->vin[0].prevout};
     
         BOOST_CHECK(view.HaveCoin(outpoint));
    @@ -125,7 +122,7 @@ BOOST_AUTO_TEST_CASE(fetch_inputs_from_cache)
         CCoinsViewDB db{{.path = "", .cache_bytes = 1_MiB, .memory_only = true}, {}};
         CCoinsViewCache main_cache{&db};
         PopulateView(block, main_cache);
    -    CoinsViewOverlay view{&main_cache, StartedThreadPool()};
    +    CoinsViewOverlay view{&main_cache, m_thread_pool};
         CheckCache(block, view);
     
         const auto& outpoint{block.vtx[1]->vin[0].prevout};
    @@ -145,7 +142,7 @@ BOOST_AUTO_TEST_CASE(fetch_no_double_spend)
         CCoinsViewCache main_cache{&db};
         // Add all inputs as spent already in cache
         PopulateView(block, main_cache, /*spent=*/true);
    -    CoinsViewOverlay view{&main_cache, StartedThreadPool()};
    +    CoinsViewOverlay view{&main_cache, m_thread_pool};
         for (const auto& tx : block.vtx) {
             for (const auto& in : tx->vin) {
                 const auto& c{view.AccessCoin(in.prevout)};
    @@ -163,7 +160,7 @@ BOOST_AUTO_TEST_CASE(fetch_no_inputs)
         const auto block{CreateBlock()};
         CCoinsViewDB db{{.path = "", .cache_bytes = 1_MiB, .memory_only = true}, {}};
         CCoinsViewCache main_cache{&db};
    -    CoinsViewOverlay view{&main_cache, StartedThreadPool()};
    +    CoinsViewOverlay view{&main_cache, m_thread_pool};
         for (const auto& tx : block.vtx) {
             for (const auto& in : tx->vin) {
                 const auto& c{view.AccessCoin(in.prevout)};
    

    </p> </details>


    l0rinc commented at 3:56 PM on May 29, 2026:

    Yeah, I suggested deduplicating, this is likely the cause of @theStack's failure as well, your solution is more elegant

  31. in src/coins.h:625 in 1f0c8957e1
     621 |      void StopFetching() noexcept
     622 |      {
     623 | +        if (m_futures.empty()) {
     624 | +            Assume(m_inputs.empty());
     625 | +            Assume(m_input_head.load(std::memory_order_relaxed) == 0);
     626 | +            Assume(m_input_tail == 0);
    


    ryanofsky commented at 7:15 PM on May 27, 2026:

    In commit "coins: fetch inputs in parallel" (1f0c8957e1e483608148dc4bf7f396b82d2c6e44)

    Similar to other comments made previously it would seem good to replace these Assume checks with asserts or do something besides ignoring failures they return

  32. andrewtoth force-pushed on May 31, 2026
  33. andrewtoth renamed this:
    validation: fetch block inputs in parallel during ConnectBlock
    validation: fetch block input prevouts in parallel during ConnectBlock
    on May 31, 2026
  34. andrewtoth commented at 10:34 PM on May 31, 2026: contributor

    Thanks for your reviews @l0rinc @ryanofsky @theStack. I have taken most of your suggestions.

    • We no longer call ProcessInput on main thread, and just wait on the ready flag. This let's us drop const from ProcessInput.
    • Renamed all inputfetch|InputFetch to prevoutfetch|PrevoutFetch.
    • Bumped DEFAULT_PREVOUTFETCH_THREADS from 4 to 8. This was precipitated by no longer working on main thread, so we should have more workers.
    • Addressed various other suggestions.
    • The increase in diff size is mainly from adding unit tests for full code coverage.

    git diff d6946a12ab306c5054b535b0b7fac0101fb306ae..e71a395dfcc01d66ad0cafdc67fd87ba1de04a4b

  35. in src/coins.h:616 in d0aa46a717
     611 | @@ -602,6 +612,9 @@ class CoinsViewOverlay : public CCoinsViewCache
     612 |  
     613 |          auto& input{m_inputs[i]};
     614 |          input.coin = base->PeekCoin(input.outpoint);
     615 | +        // Use release so writing coin above happens before the main thread acquires.
     616 | +        input.ready.test_and_set(std::memory_order_release);
    


    l0rinc commented at 7:00 PM on June 1, 2026:

    d0aa46a coins: add ready flag to InputToFetch:

    Since each input slot should only be claimed once via m_input_head, could this assume the flag was not already set? It shouldn't fail if multiple threads are fetching the same (or the same, multiple times), but it would signal some kind of bug:

    Assume(!input.ready.test_and_set(std::memory_order_release));
    
  36. in src/coins.h:713 in e71a395dfc outdated
     713 | +
     714 | +        // We will only get here for BIP30 checks or when parallel fetching is disabled.
     715 |          return base->PeekCoin(outpoint);
     716 |      }
     717 |  
     718 | +    //! Non-null. May have zero workers when input fetching is disabled.
    


    l0rinc commented at 7:18 PM on June 1, 2026:

    If #34844 gets merged first, this is a good candidate

  37. in src/test/coinsviewoverlay_tests.cpp:64 in cfdc1b9aef
      57 | @@ -55,6 +58,17 @@ void PopulateView(const CBlock& block, CCoinsView& view, bool spent = false)
      58 |      cache.Flush();
      59 |  }
      60 |  
      61 | +//! Returns a started thread pool shared across tests, mirroring how production reuses pools.
      62 | +std::shared_ptr<ThreadPool> StartedThreadPool()
      63 | +{
      64 | +    static const auto thread_pool{[] {
    


    l0rinc commented at 7:27 PM on June 1, 2026:

    cfdc1b9 coins: introduce thread pool in CoinsViewOverlay:

    How come these are still static? It's moved in a later commit, can you please reduce churn and correct it here, too?

  38. in src/node/chainstatemanager_args.cpp:63 in 78ea7793f9
      59 | @@ -60,6 +60,13 @@ util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManage
      60 |      // Subtract 1 because the main thread counts towards the par threads.
      61 |      opts.worker_threads_num = script_threads - 1;
      62 |  
      63 | +    if (auto value{args.GetIntArg("-prevoutfetchthreads")}) {
    


    l0rinc commented at 11:48 AM on June 2, 2026:

    78ea779 validation: add -prevoutfetchthreads configuration option:

    Is it critical that we read this as a 64 bit value and cast later? It seems to me this would also work:

    diff --git a/src/node/chainstatemanager_args.cpp b/src/node/chainstatemanager_args.cpp
    --- a/src/node/chainstatemanager_args.cpp	(revision 371c525623e42424ead869114baff07ab60505b8)
    +++ b/src/node/chainstatemanager_args.cpp	(date 1780400814309)
    @@ -18,6 +18,8 @@
     #include <util/translation.h>
     #include <validation.h>
     
     #include <algorithm>
     #include <chrono>
     #include <cstdint>
     #include <string>
    @@ -60,11 +62,11 @@
         // Subtract 1 because the main thread counts towards the par threads.
         opts.worker_threads_num = script_threads - 1;
     
    -    if (auto value{args.GetIntArg("-prevoutfetchthreads")}) {
    +    if (auto value{args.GetArg<int32_t>("-prevoutfetchthreads")}) {
             if (*value < 0) {
                 return util::Error{Untranslated(strprintf("-prevoutfetchthreads must be non-negative (got %d). Use 0 to disable parallel input fetching.", *value))};
             }
    -        opts.prevoutfetch_threads_num = static_cast<int32_t>(std::min<int64_t>(*value, MAX_PREVOUTFETCH_THREADS));
    +        opts.prevoutfetch_threads_num = std::min(*value, MAX_PREVOUTFETCH_THREADS);
         }
     
         if (auto max_size = args.GetIntArg("-maxsigcachesize")) {
    
  39. in src/test/fuzz/coins_view.cpp:96 in e71a395dfc
      91 | +Mutex g_thread_pool_mutex;
      92 | +
      93 | +void StartPoolIfNeeded() EXCLUSIVE_LOCKS_REQUIRED(!g_thread_pool_mutex)
      94 | +{
      95 | +    LOCK(g_thread_pool_mutex);
      96 | +    if (g_thread_pool->WorkersCount() == DEFAULT_PREVOUTFETCH_THREADS) return;
    


    l0rinc commented at 3:29 PM on June 2, 2026:

    nit, we can either have an ThreadPool::IsStarted() method or use WorkersCount as a boolean:

        if (!g_thread_pool->WorkersCount()) g_thread_pool->Start(DEFAULT_PREVOUTFETCH_THREADS);
    

    cc: @theStack


    l0rinc commented at 10:53 AM on June 3, 2026:

    Applies to src/test/fuzz/coinscache_sim.cpp as well

  40. in src/coins.h:758 in e71a395dfc
     759 | +            // Only start threads if we have something to fetch.
     760 | +            if (!m_inputs.empty()) {
     761 | +                std::vector<std::function<void()>> tasks(workers_count, [this] {
     762 | +                    while (ProcessInput()) {}
     763 | +                });
     764 | +                if (auto futures{m_thread_pool->Submit(std::move(tasks))}; futures.has_value()) {
    


    l0rinc commented at 3:53 PM on June 2, 2026:

    nit:

                    if (auto futures{m_thread_pool->Submit(std::move(tasks))}) {
    
  41. in src/coins.h:691 in 0f70a37afe outdated
     695 | -        std::unordered_set<Txid, SaltedTxidHasher> txids;
     696 | -        txids.reserve(block.vtx.size());
     697 | -        for (const auto& tx : block.vtx | std::views::drop(1)) {
     698 | -            for (const auto& input : tx->vin) {
     699 | -                if (!txids.contains(input.prevout.hash)) m_inputs.emplace_back(input.prevout);
     700 | +        if (const auto workers_count{m_thread_pool->WorkersCount()}; workers_count > 0) {
    


    l0rinc commented at 3:57 PM on June 2, 2026:

    0f70a37 coins: fetch inputs in parallel:

    Up until the previous commit, the cache was also warmed on a single thread; here we're reverting to the original behavior when there are no threads (because the main thread isn't fetching anymore). It's not incorrect, but it changes behavior back and forth. It's also not obvious what the speed difference should be between 1 thread and 0 threads. My understanding is that with 0 threads there is no pre-warming (i.e., done during validation), while with 1 thread it is prefetched so the validation code always finds the value already in cache. And the fact that StartFetching doesn't always "start" anything complicates things further (and prefixing with Maybe* wouldn't help). I can accept that keeping the old behavior as a fallback is necessary, but the 0 vs 1 ambiguity adds complexity that could maybe be avoided (similarly to how -par=1 means 0 additional workers for scripts, which is also quite confusing). cc: @theStack


    andrewtoth commented at 12:54 AM on June 3, 2026:

    The parallel fetching is disabled with 0 threads. Current behavior is retained. StartFetching is just essentially CreateResetGuard. With 1 or more threads, the prevouts are fetched in the background, and the main thread will not fetch them from main cache or disk. With only 1 worker thread we can expect only slightly better performance, since the main thread can continue any validation while the background worker fetches from disk. This would be especially true with -par=1, since the main thread can do CPU intensive validation while the background thread waits on disk IO.


    l0rinc commented at 10:49 AM on June 3, 2026:

    So what's the advantage of keeping the old behavior?


    andrewtoth commented at 3:06 PM on June 3, 2026:

    I don't think there's an advantage. I would not configure it to 0 myself. It was requested as an option in #31132 (comment). BIP30 checks will still read concurrently with background reads with 1 thread.


    l0rinc commented at 8:36 AM on June 5, 2026:

    It was requested as an option in #31132 (comment).

    I don't see it, can you please quote that part? Especially since the fetches are still single threaded with -prevoutfetchthreads=1.

    I don't think there's an advantage. I would not configure it to 0 myself.

    I've compared -prevoutfetchthreads=0 with -prevoutfetchthreads=1, there's basically no difference. My preference would be to disallow -prevoutfetchthreads=0 and always pre-warm the cache from now on - it's just confusing to have this optionality. cc: @willcl-ark, @ryanofsky, @theStack

    0637a1cc6f fuzz: add coins_view_stacked fuzz harness to test concurrent leveldb reads
    
    2026-06-04 | reindex-chainstate | 950059 blocks | dbcache 1000 | i7-hdd | x86_64 | Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | 8 cores | 62Gi RAM | HDD
    
    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=1000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -prevoutfetchthreads=0 (COMMIT = 0637a1cc6f2b2f28d4ab5d97aaf8a91ec436fc8f)
      Time (abs ≡):        30800.474 s               [User: 34040.905 s, System: 1333.384 s]
     
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=1000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -prevoutfetchthreads=1 (COMMIT = 0637a1cc6f2b2f28d4ab5d97aaf8a91ec43
      Time (abs ≡):        30736.822 s               [User: 36206.873 s, System: 2918.100 s]
    

    andrewtoth commented at 1:33 PM on June 5, 2026:

    quote that part?

    It is this:

    this could also be an argument for adding a config option to disable parallel fetch, if we cannot assure ourselves enough?

    the fetches are still single threaded with -prevoutfetchthreads=1

    Sorry if I was unclear from my initial reply. For BIP30 checks they will fall back to reading from the chainstate db concurrently with the single background thread.

    One other advantage is less memory overhead from one less thread, if the user is very concerned about memory.

  42. in src/coins.h:644 in cfdc1b9aef outdated
     640 | @@ -639,10 +641,13 @@ class CoinsViewOverlay : public CCoinsViewCache
     641 |              return std::move(input.coin);
     642 |          }
     643 |  
     644 | -        // We will only get here for BIP30 checks.
     645 | +        // We will only get here for BIP30 checks or when parallel fetching is disabled.
    


    l0rinc commented at 3:59 PM on June 2, 2026:

    cfdc1b9 coins: introduce thread pool in CoinsViewOverlay:

    This is actually only guarded in the next commit

  43. in src/coins.h:704 in 0f70a37afe
     708 | +                    if (!txids.contains(input.prevout.hash)) m_inputs.emplace_back(input.prevout);
     709 | +                }
     710 | +                txids.emplace(tx->GetHash());
     711 | +            }
     712 | +            // Only start threads if we have something to fetch.
     713 | +            if (!m_inputs.empty()) {
    


    l0rinc commented at 4:14 PM on June 2, 2026:

    0f70a37 coins: fetch inputs in parallel:

    nit: Personally I find the !empty pattern harder to read than if (m_inputs.size()) { or if (m_inputs.size() > 0) {.

  44. in src/test/coinsviewoverlay_tests.cpp:49 in 3e995afb46
      41 | @@ -32,10 +42,13 @@ CBlock CreateBlock() noexcept
      42 |      coinbase.vin.emplace_back();
      43 |      block.vtx.push_back(MakeTransactionRef(coinbase));
      44 |  
      45 | +    Txid prevhash{Txid::FromUint256(uint256{1})};
      46 | +
      47 |      for (const auto i : std::views::iota(1, NUM_TXS)) {
      48 |          CMutableTransaction tx;
      49 | -        Txid txid{Txid::FromUint256(uint256(i))};
      50 | +        const Txid txid{i % 2 == 0 ? Txid::FromUint256(uint256(i)) : prevhash};
    


    l0rinc commented at 4:40 PM on June 2, 2026:

    nit: we don't usually have 50% internap spends, I remember only having around 5% on average

  45. in src/test/coinsviewoverlay_tests.cpp:73 in 3e995afb46 outdated
      70 | -            cache.EmplaceCoinInternalDANGER(COutPoint{in.prevout}, std::move(coin));
      71 | +            if (spent) {
      72 | +                cache.EmplaceCoinInternalDANGER(COutPoint{in.prevout}, std::move(coin));
      73 | +            } else {
      74 | +                coin.out.nValue = 1;
      75 | +                cache.AddCoin(in.prevout, std::move(coin), /*possible_overwrite=*/false);
    


    l0rinc commented at 4:41 PM on June 2, 2026:

    3e995af test: add unit tests for CoinsViewOverlay::StartFetching:

    nit: the EmplaceCoinInternalDANGER vs AddCoin difference could likely be added in earlier commit.

  46. in src/test/coinsviewoverlay_tests.cpp:100 in 3e995afb46
      93 | @@ -82,9 +94,12 @@ void CheckCache(const CBlock& block, const CCoinsViewCache& cache)
      94 |                  const auto& first{cache.AccessCoin(outpoint)};
      95 |                  const auto& second{cache.AccessCoin(outpoint)};
      96 |                  BOOST_CHECK_EQUAL(&first, &second);
      97 | -                ++counter;
      98 | -                BOOST_CHECK(cache.HaveCoinInCache(outpoint));
      99 | +                const auto should_have{!txids.contains(outpoint.hash)};
     100 | +                if (should_have) ++counter;
     101 | +                const auto have{cache.HaveCoinInCache(outpoint)};
     102 | +                BOOST_CHECK_EQUAL(should_have, have);
    


    l0rinc commented at 4:47 PM on June 2, 2026:

    3e995af test: add unit tests for CoinsViewOverlay::StartFetching:

    What we're checking here is that cache presence and an internal spend are mutually exclusive. It might be simpler to use BOOST_CHECK_NE for that:

                    const auto have{cache.HaveCoinInCache(outpoint)};
                    BOOST_CHECK_NE(txids.contains(outpoint.hash), have);
                    counter += have;
    
  47. in src/test/coinsviewoverlay_tests.cpp:207 in 3e995afb46
     203 | +    CCoinsViewCache main_cache{&db};
     204 | +    Coin coin{};
     205 | +    coin.out.nValue = 1;
     206 | +    const COutPoint outpoint{Txid::FromUint256(uint256::ZERO), 0};
     207 | +    main_cache.EmplaceCoinInternalDANGER(COutPoint{outpoint}, std::move(coin));
     208 | +    const COutPoint missing_outpoint{Txid::FromUint256(uint256::ONE), 0};
    


    l0rinc commented at 4:52 PM on June 2, 2026:

    3e995af test: add unit tests for CoinsViewOverlay::StartFetching:

    We could move this closer to the first usage to make it obvious that this was unknown during construction:

    const COutPoint missing_outpoint{Txid::FromUint256(uint256::ONE), 0};
    BOOST_CHECK(view.AccessCoin(missing_outpoint).IsSpent());
    BOOST_CHECK(!view.HaveCoinInCache(missing_outpoint));
    
  48. in src/test/coinsviewoverlay_tests.cpp:200 in 3e995afb46 outdated
     193 | @@ -175,5 +194,140 @@ BOOST_AUTO_TEST_CASE(fetch_no_inputs)
     194 |      BOOST_CHECK_EQUAL(view.GetCacheSize(), 0);
     195 |  }
     196 |  
     197 | -BOOST_AUTO_TEST_SUITE_END()
     198 | +// Access coins that are not block inputs
     199 | +BOOST_AUTO_TEST_CASE(access_non_input_coins)
     200 | +{
     201 | +    const auto block{CreateBlock()};
    


    theStack commented at 4:57 PM on June 2, 2026:

    nit: could as well start out with an empty block here, to ensure that it's actually a non-input hit below?


    andrewtoth commented at 10:11 PM on June 3, 2026:

    Ah this actually uncovered that the test was passing for the wrong reason!

    Below we define the missing_outpoint as having txid 1. But, the block does have txid 1! It was still missing from the db, but it wasn't taking the fallback path.

  49. in src/test/coinsviewoverlay_tests.cpp:214 in 3e995afb46
     210 | +    CoinsViewOverlay view{&main_cache, m_thread_pool};
     211 | +    const auto reset_guard{view.StartFetching(block)};
     212 | +
     213 | +    // Non-input fallback hit.
     214 | +    const auto& accessed_coin{view.AccessCoin(outpoint)};
     215 | +    BOOST_CHECK(!accessed_coin.IsSpent());
    


    l0rinc commented at 4:57 PM on June 2, 2026:

    3e995af test: add unit tests for CoinsViewOverlay::StartFetching:

    nit: can likely be inlined for simplicity (like we do in fetch_out_of_order_input_uses_normal_lookup or use HaveCoin directly):

        BOOST_CHECK(!view.AccessCoin(outpoint).IsSpent());
    

    cc: @theStack

  50. in src/test/coinsviewoverlay_tests.cpp:229 in 3e995afb46
     225 | +{
     226 | +    const auto block{CreateBlock()};
     227 | +    CCoinsViewDB db{{.path = "", .cache_bytes = 1_MiB, .memory_only = true}, {}};
     228 | +    CCoinsViewCache main_cache{&db};
     229 | +    PopulateView(block, main_cache);
     230 | +    auto thread_pool{std::make_shared<ThreadPool>("fetch_none")};
    


    l0rinc commented at 5:02 PM on June 2, 2026:

    3e995af test: add unit tests for CoinsViewOverlay::StartFetching:

    It's a bit confusing that we have a started shared thread pool in the fixture that we're not actually using in a few of the tests. This one doesn't start the thread pool, hence the fallback, but the fixture has a started one. Maybe we could add these to a different fixture to make this distinction obvious. cc: @theStack

  51. in src/test/coinsviewoverlay_tests.cpp:223 in 3e995afb46
     219 | +    BOOST_CHECK(missing_coin.IsSpent());
     220 | +    BOOST_CHECK(!view.HaveCoinInCache(missing_outpoint));
     221 | +}
     222 | +
     223 | +// Test that disabled input fetching falls back to normal cache lookups via base->PeekCoin.
     224 | +BOOST_AUTO_TEST_CASE(fetch_unstarted_thread_pool)
    


    l0rinc commented at 5:06 PM on June 2, 2026:

    3e995af test: add unit tests for CoinsViewOverlay::StartFetching:

    Coverage proves this does indeed exercise the fallback 👍

    <img width="855" height="307" alt="Image" src="https://github.com/user-attachments/assets/77b92691-9dcd-4bd9-b986-3b4e704d88cc" />

  52. in src/test/coinsviewoverlay_tests.cpp:277 in 3e995afb46 outdated
     273 | +    CoinsViewOverlay view{&main_cache, m_thread_pool};
     274 | +    const auto reset_guard{view.StartFetching(block)};
     275 | +
     276 | +    const auto& out_of_order_input{fetched_inputs[1]};
     277 | +    BOOST_CHECK(!view.AccessCoin(out_of_order_input).IsSpent());
     278 | +    BOOST_CHECK(view.HaveCoinInCache(out_of_order_input));
    


    l0rinc commented at 5:25 PM on June 2, 2026:

    3e995af test: add unit tests for CoinsViewOverlay::StartFetching:

    We might want to add a comment to the test header about FetchCoinFromBase refusing to advance for out-of-order requests (causing the fallback) and maybe prove that AccessCoin makes HaveCoinInCache pass:

    const auto& out_of_order_input{fetched_inputs[1]};
    BOOST_CHECK(!view.HaveCoinInCache(out_of_order_input));
    BOOST_CHECK(!view.AccessCoin(out_of_order_input).IsSpent());
    BOOST_CHECK(view.HaveCoinInCache(out_of_order_input));
    

    cc: @theStack

  53. in src/test/coinsviewoverlay_tests.cpp:285 in 3e995afb46 outdated
     281 | +}
     282 | +
     283 | +// Mutating operations on the overlay (Flush/Sync/SetBackend) must stop in-flight
     284 | +// fetching and clear all per-block state, so a second StartFetching on the same
     285 | +// overlay starts from clean preconditions.
     286 | +BOOST_AUTO_TEST_CASE(fetch_state_is_cleared_by_mutating_operations)
    


    l0rinc commented at 5:32 PM on June 2, 2026:

    3e995af test: add unit tests for CoinsViewOverlay::StartFetching:

    Commenting out StopFetching for SetBackend/Flush/Sync separately all makes this test fail 👍

  54. in src/validation.h:92 in 78ea7793f9
      88 | @@ -89,6 +89,9 @@ static const uint64_t MIN_DISK_SPACE_FOR_BLOCK_FILES{550_MiB};
      89 |  /** Maximum number of dedicated script-checking threads allowed */
      90 |  static constexpr int MAX_SCRIPTCHECK_THREADS{15};
      91 |  
      92 | +/** Maximum number of dedicate threads allowed for prefetching block input prevouts */
    


    l0rinc commented at 5:54 PM on June 2, 2026:

    78ea779 validation: add -prevoutfetchthreads configuration option:

    /** Maximum number of dedicated threads allowed for prefetching block input prevouts */
    
  55. l0rinc changes_requested
  56. l0rinc commented at 9:31 PM on June 2, 2026: contributor

    The number of fetcher threads is configurable via -prevoutfetchthreads=<n>, defaulting to 4

    Could you please update the PR description? Btw, I'm measuring the effect of 8 new threads on minimum node memory usage, will report back when it finishes.

    My remaining concern after the recent changes is that I don't understand the difference between parallelism of 0 and 1 - see comments.

    I remeasured reindex-chainstate with 1GB and 50GB dbcache on SSD and HDD to see the extremes (rebased to get seek competition as well): they're looking really good:

    <img width="2340" height="2340" alt="Image" src="https://github.com/user-attachments/assets/d88714a8-d676-4b01-a40b-900314ea4fbd" />

    <details><summary>i7</summary>

    for DBCACHE in 1000 50000; do \
        COMMITS="fbe628756cc417dd4b6ccd9d3a709ca8e2f6023c 3e90c5e1ceecb8fcb6a4271b98e84ded171252a7"; \
        STOP=950059; CC=gcc; CXX=g++; \
        BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
        (echo ""; for c in $COMMITS; do git fetch -q origin "$c" 2>/dev/null || true; git log -1 --pretty='%h %s' $c || exit 1; done) && \
        (echo "" && echo "$(date -I) | reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 1 && echo HDD || echo SSD)"; echo "") && \
        hyperfine \
        --sort command \
        --runs 1 \
        --export-json "$BASE_DIR/rdx-$(sed -E 's/[^ ]+/\L&/g;s/[.]/_/g;s/ /-/g'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
        --parameter-list COMMIT ${COMMITS// /,} \
        --prepare "killall -9 bitcoind 2>/dev/null; rm -f ./build/bin/bitcoind; done"; \he=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconso
    
    fbe628756c Merge bitcoin/bitcoin#35131: guix, refactor: Minor script cleanups and improvements
    3e90c5e1ce fuzz: add coins_view_stacked fuzz harness to test concurrent leveldb reads
    
    2026-05-31 | reindex-chainstate | 950059 blocks | dbcache 1000 | i7-hdd | x86_64 | Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | 8 cores | 62Gi RAM | HDD
    
    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=1000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = fbe628756cc417dd4b6ccd9d3a709ca8e2f6023c)
      Time (abs ≡):        30962.748 s               [User: 33946.611 s, System: 1343.201 s]
    
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=1000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 3e90c5e1ceecb8fcb6a4271b98e84ded171252a7)
      Time (abs ≡):        26650.647 s               [User: 38790.705 s, System: 1775.691 s]
    
    Relative speed comparison
            1.16          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=1000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = fbe628756cc417dd4b6ccd9d3a709ca8e2f6023c)
            1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=1000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 3e90c5e1ceecb8fcb6a4271b98e84ded171252a7)
    
    fbe628756c Merge bitcoin/bitcoin#35131: guix, refactor: Minor script cleanups and improvements
    3e90c5e1ce fuzz: add coins_view_stacked fuzz harness to test concurrent leveldb reads
    
    2026-06-01 | reindex-chainstate | 950059 blocks | dbcache 50000 | i7-hdd | x86_64 | Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz | 8 cores | 62Gi RAM | HDD
    
    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=50000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = fbe628756cc417dd4b6ccd9d3a709ca8e2f6023c)
      Time (abs ≡):        24869.226 s               [User: 28333.773 s, System: 779.739 s]
    
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=50000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 3e90c5e1ceecb8fcb6a4271b98e84ded171252a7)
      Time (abs ≡):        25187.463 s               [User: 30291.512 s, System: 885.903 s]
    
    Relative speed comparison
            1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=50000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = fbe628756cc417dd4b6ccd9d3a709ca8e2f6023c)
            1.01          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=50000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 3e90c5e1ceecb8fcb6a4271b98e84ded171252a7)
    

    </details>

    <details><summary>i9</summary>

     for DBCACHE in 1000 50000; do \
        COMMITS="fbe628756cc417dd4b6ccd9d3a709ca8e2f6023c 3e90c5e1ceecb8fcb6a4271b98e84ded171252a7"; \
        STOP=950059; CC=gcc; CXX=g++; \
        BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
        (echo ""; for c in $COMMITS; do git fetch -q origin "$c" 2>/dev/null || true; git log -1 --pretty='%h %s' $c || exit 1; done) && \
        (echo "" && echo "$(date -I) | reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 1 && echo HDD || echo SSD)"; echo "") && \
        hyperfine \
        --sort command \
        --runs 1 \
        --export-json "$BASE_DIR/rdx-$(sed -E 's/[^ ]+/\L&/g;s/[.]/_/g;s/ /-/g'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
        --parameter-list COMMIT ${COMMITS// /,} \
        --prepare "killall -9 bitcoind 2>/dev/null; rm -f ./build/bin/bitcoind;done=0"; \e=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtocon
    
    fbe628756c Merge bitcoin/bitcoin#35131: guix, refactor: Minor script cleanups and improvements
    3e90c5e1ce fuzz: add coins_view_stacked fuzz harness to test concurrent leveldb reads
    
    2026-05-31 | reindex-chainstate | 950059 blocks | dbcache 1000 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | SSD
    
    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=1000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = fbe628756cc417dd4b6ccd9d3a709ca8e2f6023c)
      Time (abs ≡):        19558.763 s               [User: 31796.752 s, System: 1064.759 s]
    
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=1000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 3e90c5e1ceecb8fcb6a4271b98e84ded171252a7)
      Time (abs ≡):        14958.313 s               [User: 35308.810 s, System: 1574.040 s]
    
    Relative speed comparison
            1.31          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=1000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = fbe628756cc417dd4b6ccd9d3a709ca8e2f6023c)
            1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=1000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 3e90c5e1ceecb8fcb6a4271b98e84ded171252a7)
    
    fbe628756c Merge bitcoin/bitcoin#35131: guix, refactor: Minor script cleanups and improvements
    3e90c5e1ce fuzz: add coins_view_stacked fuzz harness to test concurrent leveldb reads
    
    2026-06-01 | reindex-chainstate | 950059 blocks | dbcache 50000 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | SSD
    
    Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=50000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = fbe628756cc417dd4b6ccd9d3a709ca8e2f6023c)
      Time (abs ≡):        17399.201 s               [User: 25711.641 s, System: 684.138 s]
    
    Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=50000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = 3e90c5e1ceecb8fcb6a4271b98e84ded171252a7)
      Time (abs ≡):        16674.033 s               [User: 27487.513 s, System: 800.722 s]
    
    Relative speed comparison
            1.04          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=50000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMMIT = fbe628756cc417dd4b6ccd9d3a709ca8e2f6023c)
            1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=50000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 (COMM
    

    </details>

  57. andrewtoth force-pushed on Jun 3, 2026
  58. andrewtoth commented at 12:55 AM on June 3, 2026: contributor

    Thanks for the review @l0rinc, addressed all suggestions.

  59. DrahtBot added the label CI failed on Jun 3, 2026
  60. in src/coins.h:619 in d1756187fe
     619 | +        // This assumes ConnectBlock accesses all inputs in the same order as
     620 | +        // they are added to m_inputs in StartFetching.
     621 | +        if (m_input_tail < m_inputs.size() && m_inputs[m_input_tail].outpoint == outpoint) {
     622 |              // We advance the tail since the input is cached and not accessed through this method again.
     623 |              auto& input{m_inputs[m_input_tail++]};
     624 | -            if (input.outpoint != outpoint) continue;
    


    l0rinc commented at 10:52 AM on June 3, 2026:

    d175618 coins: filter same-block spends in StartFetching:

    Any reason for doing this differently in the previous commit?


    andrewtoth commented at 3:03 PM on June 3, 2026:

    We had to scan in the previous commit, because we were also fetching prevouts for inputs created in that block. The next commit we filter those out, so here we won't have to scan.

  61. l0rinc commented at 10:59 AM on June 3, 2026: contributor

    CI would probably pass after a rebase (which would also get us the seek compaction fixes)

  62. andrewtoth force-pushed on Jun 3, 2026
  63. andrewtoth commented at 3:10 PM on June 3, 2026: contributor

    Rebased on master to test compatibility with https://github.com/bitcoin-core/leveldb-subtree/pull/61.

    Also addressed #35295 (review) (thanks @l0rinc).

    git range-diff 9961229360..22066bfb4c 7c2718a4b8..9db6f241c7

  64. DrahtBot removed the label CI failed on Jun 3, 2026
  65. in src/coins.h:596 in a92444eedc
     591 | +
     592 | +        //! Move ctor is required for resizing m_inputs in StartFetching. Elements will never move once parallel tasks
     593 | +        //! are started, so we can assert that coin is nullopt and ready is false.
     594 | +        InputToFetch(InputToFetch&& other) noexcept : outpoint{other.outpoint}
     595 | +        {
     596 | +            Assert(!coin);
    


    theStack commented at 5:06 PM on June 3, 2026:

    in a92444eedc27537953bf797364bbd3196a8f7adc: was this meant to be

                Assert(!other.coin);
    

    ? That would make more sense to me, as the current assert is just "stating the obvious", i.e. that a freshly-default constructed std::optional member is std::nullopt.


    andrewtoth commented at 10:11 PM on June 3, 2026:

    Nice catch! Done.

  66. andrewtoth force-pushed on Jun 3, 2026
  67. andrewtoth commented at 10:12 PM on June 3, 2026: contributor

    Addressed comments #35295 (review) and #35295 (review) (thanks @theStack!).

    git diff 9db6f241c74709be7565219220b481a586b6cc6d..73df01964cdc3a8b48ddc45010c469a0af283106

  68. l0rinc approved
  69. in src/coins.h:650 in d3dac3c9dd
     644 | @@ -642,10 +645,15 @@ class CoinsViewOverlay : public CCoinsViewCache
     645 |          Assert(m_input_head.load(std::memory_order_relaxed) == 0);
     646 |          Assert(m_input_tail == 0);
     647 |          // Loop through the inputs of the block and set them in the queue.
     648 | +        // Filter txs that are spending inputs created earlier in the same block. These inputs will be created
     649 | +        // directly in the cache from the tx that creates them, so they will not be requested from a base view.
     650 | +        std::unordered_set<Txid, SaltedTxidHasher> txids;
    


    l0rinc commented at 11:07 AM on June 4, 2026:

    d3dac3c coins: filter same-block spends in StartFetching:

    In previous versions this collected all txids, but now it's the earlier txids, i.e. no more forward references possible: maybe we could rename to reflect that

            std::unordered_set<Txid, SaltedTxidHasher> earlier_txids;
    

    Note: technically it's "earlier, except coinbase", which means that an invalid block trying to spend its own coinbase would still go to disk (likely not something we should care about)

  70. in src/coins.h:582 in d7bb17e2f9
     582 | + * objects. StartFetching then submits worker tasks to a ThreadPool and keeps the returned futures alive until fetching
     583 | + * is stopped.
     584 | + *
     585 | + * ProcessInput() atomically fetches and increments m_input_head, so each thread can only access a single element of the
     586 | + * m_inputs vector at a time. Workers race to claim inputs, so they may fetch elements in any order. If the fetched
     587 | + * index is greater than the size of m_inputs, no more inputs can be fetched and false is returned.
    


    l0rinc commented at 11:09 AM on June 4, 2026:

    d7bb17e doc: update CoinsViewOverlay docstring to describe parallel fetching:

    nit: size is an invalid index:

     * index is greater than or equal to the size of m_inputs, no more inputs can be fetched and false is returned.
    
  71. in src/coins.h:568 in d7bb17e2f9
     564 | @@ -565,13 +565,63 @@ class CCoinsViewCache : public CCoinsViewBacked
     565 |  };
     566 |  
     567 |  /**
     568 | - * CCoinsViewCache overlay that avoids populating/mutating parent cache layers on cache misses.
     569 | + * CCoinsViewCache subclass that asynchronously fetches all block inputs in parallel during ConnectBlock without
    


    l0rinc commented at 11:21 AM on June 4, 2026:

    d7bb17e doc: update CoinsViewOverlay docstring to describe parallel fetching:

    nit: not "all" inputs can be prefeched

     * CCoinsViewCache subclass that asynchronously fetches most block input prevouts in parallel during ConnectBlock without
    
  72. in src/coins.h:594 in 2c9efd46f9
     589 | +
     590 | +    /**
     591 | +     * Claim and fetch the next input in the queue.
     592 | +     *
     593 | +     * @return true if there are more inputs in the queue to fetch
     594 | +     * @return false if there are no more inputs in the queue to fetch
    


    l0rinc commented at 11:30 AM on June 4, 2026:

    2c9efd4 validation: collect block inputs in CoinsViewOverlay before ConnectBlock:

    nit: wouldn't this return true for the last element?

         * [@return](/bitcoin-bitcoin/contributor/return/) true if an input prevout was fetched
         * [@return](/bitcoin-bitcoin/contributor/return/) false if there are no more input prevouts in the queue to fetch
    
  73. in src/coins.h:573 in d7bb17e2f9
     571 |   *
     572 | - * This is achieved by fetching coins from the base view using PeekCoin() instead of GetCoin(),
     573 | - * so intermediate CCoinsViewCache layers are not filled.
     574 | + * Only used in ConnectBlock to pass as an ephemeral view that can be reset if the block is invalid.
     575 | + * It provides the same interface as CCoinsViewCache. It overrides all methods that mutate base,
     576 | + * stopping threads before calling superclass.
    


    l0rinc commented at 11:42 AM on June 4, 2026:

    d7bb17e doc: update CoinsViewOverlay docstring to describe parallel fetching:

    nit: StopFetching does not stop or destroy the ThreadPool worker threads anymore:

     * stopping fetch tasks before calling superclass.
    
  74. in src/coins.h:1 in 2c9efd46f9 outdated


    l0rinc commented at 11:45 AM on June 4, 2026:

    2c9efd46f98e9c0b2e9698d7550e96a0082e14ea

    Fetch coins from the m_inputs vector in FetchCoinFromBase by scanning all inputs starting at m_input_tail until we discover the input with the correct outpoint.

    Is this still accurate?

  75. in src/coins.h:703 in e92862f7e6
     707 | +                for (const auto& input : tx->vin) {
     708 | +                    if (!txids.contains(input.prevout.hash)) m_inputs.emplace_back(input.prevout);
     709 | +                }
     710 | +                txids.emplace(tx->GetHash());
     711 | +            }
     712 | +            // Only start threads if we have something to fetch.
    


    l0rinc commented at 11:50 AM on June 4, 2026:

    e92862f coins: fetch inputs in parallel:

    nit: this also seems to be a pre-threadpool framing:

                // Only submit tasks if we have something to fetch.
    
  76. in src/coins.h:648 in d3dac3c9dd
     644 | @@ -642,10 +645,15 @@ class CoinsViewOverlay : public CCoinsViewCache
     645 |          Assert(m_input_head.load(std::memory_order_relaxed) == 0);
     646 |          Assert(m_input_tail == 0);
     647 |          // Loop through the inputs of the block and set them in the queue.
     648 | +        // Filter txs that are spending inputs created earlier in the same block. These inputs will be created
    


    l0rinc commented at 11:51 AM on June 4, 2026:

    d3dac3c coins: filter same-block spends in StartFetching:

    nit: We're spending outputs

            // Loop through the block inputs and set their prevouts in the queue.
            // Filter inputs that spend outputs created earlier in the same block. These outputs will be created
    
  77. in src/test/fuzz/coins_view.cpp:395 in 4548f6468a
     399 |          }
     400 | +
     401 | +        // Stop async workers before accessing backend_coins_view to avoid data race
     402 | +        // (HaveCoin/GetCoin on CCoinsViewCache mutate cacheCoins via FetchCoin)
     403 | +        if (auto* async_cache{dynamic_cast<CoinsViewOverlay*>(&coins_view_cache)}) {
     404 | +            (void)async_cache->CreateResetGuard();
    


    l0rinc commented at 11:59 AM on June 4, 2026:

    4548f64 fuzz: update harnesses to cover CoinsViewOverlay::StartFetching:

    If I understand correctly, this stops fetching because the final backend consistency check calls HaveCoin/GetCoin which can mutate CCoinsViewCache.

    Maybe we can avoid that by using PeekCoin when mutation is prohibited, something like:

    std::optional<Coin> coin_in_backend;
    bool exists_using_have_coin_in_backend;
    if (dynamic_cast<CoinsViewOverlay*>(&coins_view_cache)) {
        // PeekCoin does not mutate cacheCoins, so async workers can keep running.
        coin_in_backend = backend_coins_view->PeekCoin(random_out_point);
        exists_using_have_coin_in_backend = coin_in_backend.has_value();
    } else {
        exists_using_have_coin_in_backend = backend_coins_view->HaveCoin(random_out_point);
        coin_in_backend = backend_coins_view->GetCoin(random_out_point);
    }
    

    which allows us to avoid those calls later:

    // If the backend has the coin, it must also be on the cache if the coin wasn't spent.
    if (!is_spent_using_access_coin && exists_using_have_coin_in_backend) {
        assert(exists_using_have_coin);
    }
    if (coin_in_backend) {
        assert(exists_using_have_coin_in_backend);
        // Note we can't assert that the backend coin matches the cache coin because the coin in
        // the cache may have been modified but not yet flushed.
    } else {
        assert(!exists_using_have_coin_in_backend);
    }
    
  78. in src/coins.h:683 in e0a95b547d outdated
     678 | +    {
     679 | +        StopFetching();
     680 | +        CCoinsViewCache::SetBackend(view);
     681 | +    }
     682 | +
     683 | +    void Flush(bool reallocate_cache = true) override
    


    l0rinc commented at 12:01 PM on June 4, 2026:

    e0a95b5 coins: stop fetching before mutating base:

    Should BatchWrite also stop fetching for API consistency? The fuzz harness can call the inherited BatchWrite while fetching is active (even though this cannot happen in production, as far as I can tell).

    void BatchWrite(CoinsViewCacheCursor& cursor, const uint256& block_hash) override
    {
        StopFetching();
        CCoinsViewCache::BatchWrite(cursor, block_hash);
    }
    

    andrewtoth commented at 9:45 PM on June 4, 2026:

    BatchWrite() only modifies this cacheCoins, it does not modify base in any way! So, it's safe to call. Flush() and Sync() call BatchWrite() on base.

  79. l0rinc approved
  80. l0rinc commented at 12:24 PM on June 4, 2026: contributor

    While I'm running IBD, pruned-IBD, reindex-chainstate, memory profiling, and -prevoutfetchthreads=0/1/4/8 comparisons I have added a few nits for the comments and docs. A few still reflect the pre-Threadpool state, we should probably fix it. But if the above benchmarks don't reveal anything new I'm fine with it as is, most of these can be follow-ups.

  81. validation: collect block inputs in CoinsViewOverlay before ConnectBlock
    Introduce CoinsViewOverlay::StartFetching, which maps all input prevouts of a
    block to a new m_inputs vector of InputToFetch elements. Returns a ResetGuard
    which is lifetime bound to the block, while the InputToFetch elements are
    lifetime bound to the block as well.
    
    Introduce StopFetching to clear the m_inputs vector.
    CCoinsViewCache::Reset is made virtual and is overridden in CoinsViewOverlay.
    StopFetching is called on Reset, so the InputToFetch objects will not
    exceed the lifetime of the block.
    
    Introduce ProcessInput to fetch the utxo of an individual input in m_inputs.
    Each caller fetches the input at m_input_head and increments it, so each call
    will fetch the next input in the queue.
    
    Fetch coins from the m_inputs vector in FetchCoinFromBase by scanning all inputs
    starting at m_input_tail until we discover the input with the correct outpoint.
    Since CoinsViewOverlay will cache the returned prevout and not request it again,
    we advance m_input_tail so we don't have to scan that input again.
    
    This is designed deliberately so multiple threads can call ProcessInput independently.
    
    Co-authored-by: l0rinc <pap.lorinc@gmail.com>
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    aeb4bf72d9
  82. coins: filter same-block spends in StartFetching
    Inputs spending outputs of an earlier transaction in the same block won't
    be in the cache or the db. They also won't be requested by FetchCoinFromBase,
    so we can filter them out to not waste time trying to fetch them.
    
    Build an unordered set of seen txids while flattening m_inputs and skip
    any prevout whose hash is already in the set.
    
    Co-authored-by: l0rinc <pap.lorinc@gmail.com>
    38b18bc5b3
  83. coins: add ready flag to InputToFetch
    Prepares for ProcessInput to be called from multiple threads.
    
    This flag acts as a memory fence around InputToFetch::coin. There is no lock
    guarding reads and writes of the coin field.
    Instead we use the flag's release/acquire semantics to ensure that when the
    main thread reads the coin it will have happened after a worker thread has
    finished writing it.
    
    Co-authored-by: l0rinc <pap.lorinc@gmail.com>
    9d6ae3c616
  84. coins: stop fetching before mutating base
    Prepares for ProcessInput to be called from multiple threads.
    
    ProcessInput reads from base via base->PeekCoin(). For it to be safe to call
    in parallel on separate threads, base must not be mutated while fetching is
    in progress.
    
    Flush, Sync, and SetBackend all mutate base, so override them on
    CoinsViewOverlay to StopFetching before delegating to the base class.
    
    In production these do not matter, since only Flush will be called on a
    valid block after all inputs have already been fetched and validated.
    Sync and SetBackend are not called on CoinsViewOverlay in production code.
    
    Our fuzz harnesses call Flush, Sync, and SetBackend at any time, trying to
    fully exercise the CCoinsViewOverlay API which CoinsViewOverlay inherits.
    By overriding these methods, we can use the same fuzz harnesses and
    substitute a CoinsViewOverlay for a CCoinsViewCache with minimal
    modification.
    
    Co-authored-by: l0rinc <pap.lorinc@gmail.com>
    d492a00e15
  85. validation: add -prevoutfetchthreads configuration option
    Add a configuration option for the number of worker threads used for
    parallel UTXO prevout prefetching during block connection.
    
    Default is 8 threads, max is 16, 0 disables parallel fetching.
    285a2cfb21
  86. coins: introduce thread pool in CoinsViewOverlay
    Prepares for ProcessInput to be called from multiple threads.
    
    Introduce a ThreadPool shared pointer to CoinsViewOverlay. A pool managed
    externally can be passed in the constructor.
    
    A global thread pool is used in fuzz harnesses since iterations can happen
    faster than the OS can create and tear down thread pools.
    This can cause a memory leak when fuzzing.
    
    Co-authored-by: l0rinc <pap.lorinc@gmail.com>
    963d0fa402
  87. coins: fetch inputs in parallel
    Leverages the thread pool to fetch inputs on multiple threads, while the overlay
    serves inputs on the main thread.
    
    This is a performance improvement over blocking the main thread to fetch inputs.
    
    Co-authored-by: l0rinc <pap.lorinc@gmail.com>
    56a72f5b82
  88. doc: update CoinsViewOverlay docstring to describe parallel fetching
    Co-authored-by: l0rinc <pap.lorinc@gmail.com>
    55613ef9e6
  89. test: add unit tests for CoinsViewOverlay::StartFetching
    Co-authored-by: l0rinc <pap.lorinc@gmail.com>
    f5623611c1
  90. fuzz: update harnesses to cover CoinsViewOverlay::StartFetching
    Co-authored-by: l0rinc <pap.lorinc@gmail.com>
    Co-authored-by: sedited <seb.kung@gmail.com>
    f2bb2bb7ea
  91. fuzz: add coins_view_stacked fuzz harness to test concurrent leveldb reads 14832b2302
  92. andrewtoth force-pushed on Jun 6, 2026
  93. andrewtoth commented at 7:56 PM on June 6, 2026: contributor

    Thanks @l0rinc. Addressed all your suggestions.

    git diff 73df01964cdc3a8b48ddc45010c469a0af283106..14832b23025c3ee9af62d9c762bca59726b614c7


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-06-07 13:51 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me