Background
During the first part of reindexing, LoadExternalBlockFile()
sequentially reads raw blocks from the blocks/blk00nnn.dat
files (rather than receiving them from peers, as with initial block download) and eventually adds all of them to the block index. When an individual block is initially read, it can’t be immediately added unless all its ancestors have been added, which is rare (only about 8% of the time), because the blocks are not sorted by height. When the block can’t be immediately added to the block index, its disk location is saved in a map so it can be added later. When its parent is later added to the block index, LoadExternalBlockFile()
reads and deserializes the block from disk a second time and adds it to the block index. Most blocks (92%) get deserialized twice.
This PR
During the initial read, it’s rarely useful to deserialize the entire block; only the header is needed to determine if the block can be added to the block index immediately. This change to LoadExternalBlockFile()
initially deserializes only a block’s header, then deserializes the entire block only if it can be added immediately. This reduces reindex time on mainnet by 7 hours on a Raspberry Pi, which translates to around a 25% reduction in the first part of reindexing (adding blocks to the index), and about a 6% reduction in overall reindex time.
Summary: The performance gain is the result of deserializing each block only once, except its header which is deserialized twice, but the header is only 80 bytes.