This PR is a continuation of #31132. All outstanding issues raised there have been resolved, but the volume of stale comments can make that change difficult to review.
Currently, when connecting a block, each input prevout is looked up one at a time. For every input we first check the in-memory coins cache, and on a miss we make a synchronous round-trip to the chainstate LevelDB to read the coin from disk. Because these lookups happen serially as the block is being validated, the disk read latency stacks up and dominates the time spent in ConnectBlock whenever many inputs are not already in the cache.
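To make the cost concrete, here is a minimal, self-contained sketch of that access pattern. The types are illustrative stand-ins, not the actual Bitcoin Core coins classes; the point is only that a cache miss turns into a blocking disk read on the validation thread.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Illustrative stand-ins; not the actual Bitcoin Core types.
struct OutPoint {
    uint64_t txid_lo; uint32_t n;
    bool operator==(const OutPoint& o) const { return txid_lo == o.txid_lo && n == o.n; }
};
struct OutPointHasher {
    size_t operator()(const OutPoint& o) const { return o.txid_lo ^ o.n; }
};
struct Coin { int64_t value; };

// Stand-in for the chainstate LevelDB: in reality GetCoin() is a blocking
// round-trip to disk, which is the cost this PR moves off the hot path.
struct DiskCoinsView {
    std::unordered_map<OutPoint, Coin, OutPointHasher> disk;
    std::optional<Coin> GetCoin(const OutPoint& out) const {
        auto it = disk.find(out);
        if (it == disk.end()) return std::nullopt;
        return it->second;
    }
};

// In-memory coins cache layered on top of the disk-backed view.
struct CoinsCache {
    DiskCoinsView& base;
    std::unordered_map<OutPoint, Coin, OutPointHasher> cache;

    const Coin* Fetch(const OutPoint& out) {
        if (auto it = cache.find(out); it != cache.end()) return &it->second; // cache hit
        if (auto coin = base.GetCoin(out)) {                                   // miss: synchronous disk read
            return &cache.emplace(out, *coin).first->second;
        }
        return nullptr;
    }
};

// Current pattern during block connection: each miss blocks validation,
// so the per-input disk latencies are paid one after another.
void ConnectBlockInputs(CoinsCache& view, const std::vector<OutPoint>& prevouts)
{
    for (const auto& out : prevouts) {
        const Coin* coin = view.Fetch(out);
        (void)coin; // ...spend checks, script verification, etc...
    }
}
```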
This PR moves those disk reads onto a pool of worker threads that run in parallel with block connection. Before entering ConnectBlock, the block is handed to a CoinsViewOverlay, which kicks off the workers to fetch all of the block's prevouts from disk and warm the cache. The main validation thread does exactly the same work it does today, looking up each input in the cache in order; the only difference is that by the time it asks, the coin is much more likely to already be there. There are no changes to validation logic or consensus behavior. This is purely a parallelization of an existing read pattern.
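A hypothetical sketch of the parallelization, reusing the illustrative types from the example above. The PR's actual classes, scheduling, and locking differ, and in the PR the fetchers overlap with validation rather than completing up front, but the shape is the same: stripe the prevouts over worker threads, let each issue its own concurrent LevelDB read, and insert the results into the cache so validation mostly hits.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical prefetch sketch (not the PR's implementation). For simplicity
// the workers finish before validation starts; in the PR they run in parallel
// with ConnectBlock and validation simply finds the cache already warm.
void PrefetchAndConnect(CoinsCache& view, const std::vector<OutPoint>& prevouts,
                        unsigned n_threads)
{
    std::mutex cache_mutex;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            // Stripe the prevouts across workers; LevelDB reads are safe to
            // issue from multiple threads concurrently.
            for (size_t i = t; i < prevouts.size(); i += n_threads) {
                if (auto coin = view.base.GetCoin(prevouts[i])) {
                    std::lock_guard<std::mutex> lock(cache_mutex);
                    view.cache.emplace(prevouts[i], *coin);
                }
            }
        });
    }
    for (auto& w : workers) w.join();

    // Validation proceeds exactly as before; Fetch() now almost always hits.
    ConnectBlockInputs(view, prevouts);
}
```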
The number of fetcher threads is configurable via -inputfetchthreads=<n>, defaulting to 4 and capped at 16. Setting it to 0 disables input fetching entirely and reverts to the previous serial behavior.
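For reference, assuming the option is handled like other bitcoind startup options, it can also be set in the config file (same name, without the leading dash):

```
# bitcoin.conf — equivalent to passing -inputfetchthreads=8 on the command line
inputfetchthreads=8   # number of input fetcher threads (default 4, max 16, 0 disables)
```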
We have measured large performance gains for IBD and -reindex-chainstate, as well as for worst-case steady-state block connection at the tip. l0rinc ran many thorough benchmarking passes on the original PR across multiple machines, storage types, dbcache sizes[^1], operating systems[^2], and fetcher thread counts[^3]. Many other contributors also posted their benchmark results in the original PR. IBD speedups range from 1.18× to over 3×[^4]. Worst-case block connection time on network-attached storage was over 2× faster[^5]. Flamegraph comparisons before and after this change are available[^6].
On safety: ConnectBlock runs while holding cs_main, so nothing else in the node can mutate the chainstate while the fetchers are reading it.
On LevelDB: concurrent reads are fully supported and documented as such. We already rely on this in production today against our other LevelDB-backed databases. The txindex DB is read by multiple simultaneous HTTP RPC worker threads via the getrawtransaction RPC. The blockfilterindex DB is read concurrently both from the P2P cfilters / cfheaders / cfcheckpt message handlers on the msghand thread and from the getblockfilter RPC on the HTTP RPC worker threads. We have not yet been issuing concurrent reads against the chainstate DB, but there is no LevelDB-side reason we can't. In fact, the chainstate DB is already touched by more than one thread on master, because LevelDB schedules its own background compaction work.
[^2]: #31132 (comment)
[^3]: #31132 (comment)
[^4]: #31132 (comment)
[^5]: #31132 (comment)
[^6]: #31132 (comment)