Resurrection of #16718
tl;dr
I try swapping out std::unordered_map for a faster third-party implementation where it matters, see something like 15% speedup for initial block download coincident with a 19.3% reduction in memory usage.
When profiling initial block download, it’s evident that a decent chunk of time on the critical path is spent in CCoinsViewCache, the data structure responsible for the in-memory representation of the UTXO set.
Profile of the msghand thread at height ~300,000.
The essential data member of the cache, cacheCoins, is a std::unordered_map that holds as many UTXO as can fit into memory given the dbcache setting. While this map implementation is definitely preferable to std::map (which retains order and has O(log(n)) lookups insted of O(1)), both stdlib maps are subpar relative to non-std alternatives.
After seeing cacheCoins as a bottleneck, I decided to try swapping one of these alternative implementations in to see if there is any benefit. It looks like there is.

For 20,000 blocks at a representative part of the chain, we see something like a 15% speedup with a 19.3% reduction in memory usage (dbcache was set to 3000).
The CCoinsCaching benchmark shows improvement but not as dramatically as IBD does:

Obviously adding 2000 lines of new consensus-critical code isn’t appealing, but if those numbers hold for the full IBD process I think there might be a compelling case to be made for the added code.
Implementations considered
Of the few best-looking options, I picked the one with the most succinct implementation and fewest dependencies, robin_hood::unordered_node_map (it’s just a header file). I also looked at Facebook’s F14NodeMap and Abseil’s node_hash_map but both required too much elbow-grease (for now) to integrate here without pulling in a whole lot of code.
Next steps
I’m going to be running more comprehensive benchmarks, including a near-full IBD from the network, to see if these improvements hold.
I’m also interested in benching this along with a similar change to mapBlockIndex (currently a std::unordered_map).
Questions
If we decide to include this or another hashmap implementation, should we do a full fork inclusion a la leveldb or should we try to slim down the implementation to a single header file as is done here (but more)?

