Resurrection of #16718
tl;dr
I try swapping out std::unordered_map
for a faster third-party implementation where it matters, see something like 15% speedup for initial block download coincident with a 19.3% reduction in memory usage.
When profiling initial block download, it’s evident that a decent chunk of time on the critical path is spent in CCoinsViewCache
, the data structure responsible for the in-memory representation of the UTXO set.
Profile of the msghand
thread at height ~300,000.
The essential data member of the cache, cacheCoins
, is a std::unordered_map
that holds as many UTXO as can fit into memory given the dbcache
setting. While this map implementation is definitely preferable to std::map
(which retains order and has O(log(n)) lookups insted of O(1)), both stdlib maps are subpar relative to non-std
alternatives.
After seeing cacheCoins as a bottleneck, I decided to try swapping one of these alternative implementations in to see if there is any benefit. It looks like there is.
For 20,000 blocks at a representative part of the chain, we see something like a 15% speedup with a 19.3% reduction in memory usage (dbcache was set to 3000).
The CCoinsCaching
benchmark shows improvement but not as dramatically as IBD does:
Obviously adding 2000 lines of new consensus-critical code isn’t appealing, but if those numbers hold for the full IBD process I think there might be a compelling case to be made for the added code.
Implementations considered
Of the few best-looking options, I picked the one with the most succinct implementation and fewest dependencies, robin_hood::unordered_node_map
(it’s just a header file). I also looked at Facebook’s F14NodeMap
and Abseil’s node_hash_map
but both required too much elbow-grease (for now) to integrate here without pulling in a whole lot of code.
Next steps
I’m going to be running more comprehensive benchmarks, including a near-full IBD from the network, to see if these improvements hold.
I’m also interested in benching this along with a similar change to mapBlockIndex
(currently a std::unordered_map
).
Questions
If we decide to include this or another hashmap implementation, should we do a full fork inclusion a la leveldb or should we try to slim down the implementation to a single header file as is done here (but more)?