Since #17487 we no longer need to clear the coins cache when syncing to disk. A warm coins cache significantly speeds up block connection, and it only needs to be fully flushed when nearing the `dbcache` limit.
Periodic flushes occur every 24 hours, which empties the cache and slows down block connection afterwards. By keeping the cache through periodic flushes, a node can run for several days with an increasingly hotter cache and connect blocks much more quickly. A higher `dbcache` value is now beneficial not only for IBD, but also for connecting new blocks faster.
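To illustrate the difference, here is a minimal, self-contained sketch of a write-back cache with both behaviors: a flush that writes dirty entries and drops everything, versus a sync that writes dirty entries but keeps them warm. This is a toy model with hypothetical names, not Bitcoin Core's actual `CCoinsViewCache` API.

```cpp
#include <cstdint>
#include <iostream>
#include <unordered_map>

// Toy model of a write-back coins cache over a slow backing store.
struct Coin { int64_t value; bool dirty; };

struct ToyCache {
    std::unordered_map<uint64_t, Coin> cache;   // hot, in-memory entries
    std::unordered_map<uint64_t, int64_t> disk; // slow backing store

    // Old periodic-flush behavior: write dirty entries, then drop all
    // entries. The next block connection starts with a cold cache.
    void Flush() {
        for (auto& [id, coin] : cache)
            if (coin.dirty) disk[id] = coin.value;
        cache.clear();
    }

    // New periodic-flush behavior: write dirty entries but keep them in
    // memory (now clean), so subsequent lookups still hit the cache.
    void Sync() {
        for (auto& [id, coin] : cache) {
            if (coin.dirty) disk[id] = coin.value;
            coin.dirty = false;
        }
    }
};

int main() {
    ToyCache c;
    c.cache[1] = {50, true};
    c.Sync();                            // durable on disk...
    std::cout << c.cache.size() << '\n'; // ...but still cached: prints 1
    c.Flush();
    std::cout << c.cache.size() << '\n'; // cold cache: prints 0
}
```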
To benchmark real-world usage, I spun up 6 identical `t2.small` AWS EC2 instances, all running in the same region in the same VPC. I configured 2 instances to run master, 2 to run the change in this PR, and 2 to run the change in this PR but with `dbcache=1000`. All instances had `prune=5000` and a 20 GB `gp2` EBS volume. A 7th EC2 instance in the same VPC ran master and connected only to some trusted nodes in the outside network. Each of the 6 nodes under test connected directly only to this 7th instance. I manually pruned as much as possible and uploaded the same `blocks`, `chainstate`, and `mempool.dat` to all instances. I started all 6 peers simultaneously at block height 835245 and ran them for over a week, until block 836534.
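For reference, the configuration of the nodes under test amounted to roughly the following `bitcoin.conf`. This is a reconstruction from the description above, not the exact file used; the peer address is a placeholder, and `dbcache=1000` was set only on the third pair of instances:

```
# All six nodes under test
prune=5000

# Connect only to the 7th (relay) instance; placeholder address
connect=<ip-of-7th-instance>

# Only on the two dbcache=1000 nodes
dbcache=1000
```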
The results were much faster block connection times for this branch compared to master, and much faster for this branch with `dbcache=1000` compared to default `dbcache`.
| branch | avg connect time |
| --- | --- |
| master 1 | 1995.49 ms/blk |
| master 2 | 2129.78 ms/blk |
| branch, default `dbcache` 1 | 1189.65 ms/blk |
| branch, default `dbcache` 2 | 1037.74 ms/blk |
| branch, `dbcache=1000` 1 | 393.69 ms/blk |
| branch, `dbcache=1000` 2 | 427.77 ms/blk |
The log files of all 6 instances are here.
There is a lot of noise in the exact times of individual block connections, so I plotted rolling 20-block averages of connect time. The large dots mark the points where the cache is emptied. For the red master nodes, this happens every 24 hours. The blue branch nodes with default `dbcache` filled up and emptied their caches only once, visible in the middle of the chart. The green branch nodes with `dbcache=1000` never emptied the cache. It is very clear from the chart that whenever the cache is emptied, block connect speed degrades significantly.
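For clarity on what was plotted: each point is a simple moving average over a window of 20 consecutive per-block connect times. A minimal sketch of that computation (a hypothetical standalone helper, not part of this PR):

```cpp
#include <deque>
#include <vector>

// Rolling mean over a fixed window of per-block connect times (ms),
// one sample per block in height order.
std::vector<double> RollingMean(const std::vector<double>& ms_per_block,
                                size_t window = 20)
{
    std::vector<double> out;
    std::deque<double> win;
    double sum = 0.0;
    for (double ms : ms_per_block) {
        win.push_back(ms);
        sum += ms;
        if (win.size() > window) {
            sum -= win.front();
            win.pop_front();
        }
        if (win.size() == window) out.push_back(sum / window);
    }
    return out;
}
```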
Also note that this PR still clears the cache on pruning flushes. With frequent pruning flushes, a large cache that never clears is less performant than the status quo (#15265 (comment)). See #28280.