This PR parallelizes fetching all input prevouts of a block right before block connection, achieving up to 46% faster initial block download (IBD) performance [1][2][3][4].
Problem
Currently, when fetching inputs in ConnectBlock, each input is fetched from the cache sequentially. A cache miss requires a round trip to the disk database to fetch the coin for that outpoint and insert it into the cache. Since the database is read-only during ConnectBlock, we can fetch all inputs of a block in parallel on multiple threads while connecting.
Solution
We introduce CoinsViewCacheAsync, a new CCoinsViewCache subclass that manages worker threads to fetch block inputs in parallel. The block is passed to the CoinsViewCacheAsync view before entering ConnectBlock, which kicks off the worker threads to begin fetching the inputs. The view is then passed to ConnectBlock, since it provides the same API as CCoinsViewCache. The cache returns fetched coins as they become available via the overridden FetchCoin method. If a coin is not available yet, the main thread also fetches coins while it waits.
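The shape of this design can be illustrated with a minimal, self-contained sketch. It is not the PR's code: `SimpleCache`, `AsyncCache`, `StartFetching`, and `SumInputs` are hypothetical stand-ins, coins are plain integers, and `std::async` futures replace the PR's worker threads to keep the example short. What it shows is the override pattern: validation code written against the base class transparently picks up prefetched coins.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <map>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Bitcoin Core types.
using OutPoint = uint32_t;
using Coin = int64_t;
using Database = std::map<OutPoint, Coin>; // treated as read-only

class SimpleCache {
public:
    explicit SimpleCache(const Database& db) : m_db(db) {}
    virtual ~SimpleCache() = default;

    // Public API used by validation: return the coin, filling the cache on a miss.
    std::optional<Coin> GetCoin(OutPoint out) {
        auto it = m_cache.find(out);
        if (it != m_cache.end()) return it->second;
        std::optional<Coin> coin = FetchCoin(out);
        if (coin) m_cache.emplace(out, *coin);
        return coin;
    }

protected:
    // Sequential path: one database lookup per cache miss.
    virtual std::optional<Coin> FetchCoin(OutPoint out) {
        auto it = m_db.find(out);
        if (it == m_db.end()) return std::nullopt;
        return it->second;
    }

    const Database& m_db;
    std::unordered_map<OutPoint, Coin> m_cache;
};

class AsyncCache : public SimpleCache {
public:
    using SimpleCache::SimpleCache;

    // Called with the block's prevouts before validation starts.
    void StartFetching(const std::vector<OutPoint>& prevouts) {
        for (OutPoint out : prevouts) {
            if (m_pending.count(out)) continue; // already being fetched
            m_pending.emplace(out, std::async(std::launch::async, [this, out] {
                return SimpleCache::FetchCoin(out); // read-only database lookup
            }));
        }
    }

protected:
    // Return the prefetched coin if one was requested, else fetch directly.
    std::optional<Coin> FetchCoin(OutPoint out) override {
        auto it = m_pending.find(out);
        if (it == m_pending.end()) return SimpleCache::FetchCoin(out);
        assert(it->second.valid());
        std::optional<Coin> coin = it->second.get(); // blocks until the fetch is done
        m_pending.erase(it);
        return coin;
    }

private:
    std::unordered_map<OutPoint, std::future<std::optional<Coin>>> m_pending;
};

// Stand-in for ConnectBlock: it only sees the base-class API.
Coin SumInputs(SimpleCache& view, const std::vector<OutPoint>& prevouts) {
    Coin total = 0;
    for (OutPoint out : prevouts) {
        if (auto coin = view.GetCoin(out)) total += *coin;
    }
    return total;
}
```

In this sketch a caller would construct an `AsyncCache`, call `StartFetching` with the block's prevouts, and then pass the view to `SumInputs` exactly as it would pass a `SimpleCache`. Unlike the PR, the main thread here simply blocks on the future instead of helping with fetches.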
Implementation Details
The CoinsViewCacheAsync implements a lock-free MPSC (Multiple Producer, Single Consumer) queue design:
- Work Distribution: Collects all input prevouts into a queue and uses a barrier to start all worker threads simultaneously
- Synchronization: Worker threads use an atomic counter to claim which inputs to fetch, and each input has an atomic flag to signal completion to the main thread
- Main Thread Processing: The main thread waits for inputs in order and moves their Coin into the cacheCoins map as they become available
- Work Stealing: If the main thread catches up to the workers, it assists with fetching to maximize parallelism
- Completion: All threads synchronize on a barrier via a Reset method, which ensures all threads are parked before proceeding
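The claim-and-publish scheme described in the list above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the PR's implementation: `InputFetcher`, `InputToFetch`, `FetchOne`, and `Run` are hypothetical names, the database is a `std::map`, and worker threads are spawned and joined per call instead of being kept alive and parked on a barrier.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Bitcoin Core types.
using OutPoint = uint32_t;
using Coin = int64_t;
using Database = std::map<OutPoint, Coin>; // read-only while fetching

struct InputToFetch {
    OutPoint outpoint{0};
    std::optional<Coin> coin;        // written by exactly one fetching thread
    std::atomic<bool> ready{false};  // set with release order after coin is written
};

class InputFetcher {
public:
    InputFetcher(const Database& db, const std::vector<OutPoint>& prevouts)
        : m_db(db), m_inputs(prevouts.size()) {
        for (size_t i = 0; i < prevouts.size(); ++i) m_inputs[i].outpoint = prevouts[i];
    }

    // Claim the next unclaimed input via the atomic counter, fetch it, publish it.
    // Returns false once every input has been claimed.
    bool FetchOne() {
        const size_t i = m_next.fetch_add(1, std::memory_order_relaxed);
        if (i >= m_inputs.size()) return false;
        InputToFetch& input = m_inputs[i];
        auto it = m_db.find(input.outpoint);
        if (it != m_db.end()) input.coin = it->second;
        input.ready.store(true, std::memory_order_release);
        return true;
    }

    // Main-thread side: consume inputs in order, helping to fetch while waiting.
    std::vector<std::optional<Coin>> Run(unsigned worker_count) {
        std::vector<std::thread> workers;
        for (unsigned w = 0; w < worker_count; ++w) {
            workers.emplace_back([this] { while (FetchOne()) {} });
        }
        std::vector<std::optional<Coin>> result;
        result.reserve(m_inputs.size());
        for (InputToFetch& input : m_inputs) {
            while (!input.ready.load(std::memory_order_acquire)) {
                // Nothing left to claim: another thread is finishing this input.
                if (!FetchOne()) std::this_thread::yield();
            }
            result.push_back(input.coin);
        }
        for (std::thread& t : workers) t.join();
        assert(result.size() == m_inputs.size());
        return result;
    }

private:
    const Database& m_db;
    std::vector<InputToFetch> m_inputs;
    std::atomic<size_t> m_next{0};
};
```

The release store on `ready` paired with the acquire load on the main thread is what makes the plain `coin` field safe to read without a lock. With `worker_count` set to 0 the main thread ends up claiming and fetching every input itself, which is the degenerate case of the work-stealing behavior.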
Safety and Correctness
- The CoinsViewCacheAsync works on a block that has not been fully validated, but it does not interfere with or modify any of the validation during ConnectBlock
- It simply fetches inputs in parallel, which must be fetched before a transaction is validated anyway
- Invalid blocks: If an invalid block is mined, the temporary cache is discarded without being flushed. This is an improvement over the current behavior, where inputs are inserted into the main CoinsTip() cache when pulled through for an invalid block
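The invalid-block behavior can be illustrated with a small two-level cache model. This is a sketch under stated assumptions, not Bitcoin Core code: `MainCache`, `TempCache`, and `ConnectSketch` are hypothetical names, spending is not modeled, and block validity is passed in as a flag instead of being computed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Bitcoin Core types.
using OutPoint = uint32_t;
using Coin = int64_t;
using Database = std::map<OutPoint, Coin>;

// Long-lived cache, playing the role of the main cache in this sketch.
class MainCache {
public:
    explicit MainCache(const Database& db) : m_db(db) {}

    // Read-through lookup that does NOT insert into the main cache on a miss.
    std::optional<Coin> Lookup(OutPoint out) const {
        auto cached = m_coins.find(out);
        if (cached != m_coins.end()) return cached->second;
        auto it = m_db.find(out);
        if (it == m_db.end()) return std::nullopt;
        return it->second;
    }

    void Insert(OutPoint out, Coin coin) {
        assert(m_db.count(out) == 1); // only coins that exist in the database
        m_coins[out] = coin;
    }

    size_t Size() const { return m_coins.size(); }

private:
    const Database& m_db;
    std::unordered_map<OutPoint, Coin> m_coins;
};

// Short-lived cache layered on top; its contents reach the parent only on Flush.
class TempCache {
public:
    explicit TempCache(MainCache& parent) : m_parent(parent) {}

    std::optional<Coin> GetCoin(OutPoint out) {
        auto it = m_coins.find(out);
        if (it != m_coins.end()) return it->second;
        std::optional<Coin> coin = m_parent.Lookup(out);
        if (coin) m_coins.emplace(out, *coin);
        return coin;
    }

    void Flush() {
        for (const auto& [out, coin] : m_coins) m_parent.Insert(out, coin);
        m_coins.clear();
    }

private:
    MainCache& m_parent;
    std::unordered_map<OutPoint, Coin> m_coins;
};

// Stand-in for block connection; block_valid is supplied by the caller here.
bool ConnectSketch(MainCache& main_cache, const std::vector<OutPoint>& prevouts,
                   bool block_valid) {
    TempCache temp(main_cache);
    for (OutPoint out : prevouts) temp.GetCoin(out);
    if (!block_valid) return false; // temp is destroyed; main cache is untouched
    temp.Flush();
    return true;
}
```

After connecting an invalid block in this model, `MainCache::Size()` is unchanged, because the fetched coins lived only in the temporary layer that was dropped.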
Performance
Benchmarks show up to 46% faster IBD performance [1][2][3][4]. The parallelization of expensive disk lookups provides significant speedup.
Credits
Inspired by this comment.