Flush without erasing cache during periodic and pruning flushes

sdaftuar commented at 8:09 pm on January 25, 2019: member

Allow syncing the UTXO state to disk without clearing the in-memory cache. This provides better support for using a large -dbcache, which otherwise would get cleared every 24 hours.
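To make the distinction concrete, here is a minimal sketch in Python (not Bitcoin Core's C++). `ToyCache`, `flush` and `sync` are hypothetical names chosen for illustration, and a plain dict stands in for the on-disk UTXO database. A full flush writes the dirty entries and empties the cache; a sync writes the same entries but keeps them in memory as clean.

```python
class ToyCache:
    """Toy write-back cache in front of a dict standing in for the on-disk DB."""

    def __init__(self, db):
        self.db = db          # backing store (stand-in for the UTXO database)
        self.entries = {}     # key -> [value, dirty]

    def put(self, key, value):
        # New or modified entries stay dirty until written to the backing store.
        self.entries[key] = [value, True]

    def get(self, key):
        if key in self.entries:
            return self.entries[key][0]
        value = self.db[key]                # cache miss: read from "disk"
        self.entries[key] = [value, False]  # cached copy starts out clean
        return value

    def _write_dirty(self):
        written = 0
        for key, entry in self.entries.items():
            if entry[1]:
                self.db[key] = entry[0]
                entry[1] = False
                written += 1
        return written

    def flush(self):
        """Write dirty entries, then drop everything from memory."""
        written = self._write_dirty()
        self.entries.clear()
        return written

    def sync(self):
        """Write dirty entries but keep them cached as clean."""
        return self._write_dirty()
```

After `sync()`, a later `get()` for the same key is served from memory; after `flush()`, it has to go back to the backing store.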

Flush without erasing cache during periodic and pruning flushes 59eae4a006

sdaftuar commented at 8:09 pm on January 25, 2019: member

@sipa @gmaxwell Thoughts?

andcoisqu commented at 8:30 pm on January 25, 2019: none

If I understand correctly, this would remove the performance hit that #15218 would introduce?

sdaftuar commented at 8:34 pm on January 25, 2019: member

> If I understand correctly, this would remove the performance hit that #15218 would introduce?

That’s the idea. I’m not totally sure this doesn’t introduce other problems, however – right now I’m watching a pruning node do an IBD with this patch (with a large -dbcache), and I’m observing that the frequent UTXO writes (as files get pruned) are very slow, presumably because each write loops through a large UTXO cache to find the dirty entries. So if this doesn’t actually help with pruning, then I might need to tighten the scope to only do this for the periodic writes.

sipa commented at 8:42 pm on January 25, 2019: member

@sdaftuar By adding two pointers per CCoinsCacheEntry object we can construct two doubly linked lists in the cache, one for all dirty and one for all clean entries. An alternative is separate maps for the two (which uses less memory, but needs two lookups every time).
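As a rough illustration of the two-list idea, the following Python sketch (not Bitcoin Core's C++) gives each entry a `prev` and a `next` link and threads it onto either a dirty or a clean list. All names here (`Entry`, `TwoListCache`, `push_back`, `unlink`) are hypothetical; the point is only that a sync can walk the dirty list instead of scanning the whole map.

```python
class Entry:
    """Cache entry carrying its own list links (the two extra pointers)."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.dirty = False
        self.prev = self   # self-linked: not on any list (or an empty sentinel)
        self.next = self


def unlink(entry):
    """Remove entry from whichever list it is on; harmless if on none."""
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.prev = entry.next = entry


def push_back(head, entry):
    """Move entry to the tail of the circular list anchored at head."""
    unlink(entry)
    entry.prev = head.prev
    entry.next = head
    head.prev.next = entry
    head.prev = entry


class TwoListCache:
    def __init__(self):
        self.map = {}
        self.dirty_head = Entry(None, None)  # sentinel for the dirty list
        self.clean_head = Entry(None, None)  # sentinel for the clean list

    def put(self, key, value):
        entry = self.map.get(key)
        if entry is None:
            entry = self.map[key] = Entry(key, value)
        entry.value = value
        entry.dirty = True
        push_back(self.dirty_head, entry)

    def sync(self, db):
        """Write only dirty entries; work is proportional to the dirty count."""
        written = 0
        entry = self.dirty_head.next
        while entry is not self.dirty_head:
            nxt = entry.next                 # save before relinking
            db[entry.key] = entry.value
            entry.dirty = False
            push_back(self.clean_head, entry)
            written += 1
            entry = nxt
        return written
```

The clean list is not needed for syncing itself; it would matter if clean entries were to be evicted first when memory runs low.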

gmaxwell commented at 2:23 am on January 26, 2019: contributor

I tested something like this before and I think I found it had almost(?) as much negative impact on sync time as the full flush. This isn’t too surprising if you assume that the purpose of the dbcache is to act as a write buffer that eliminates writes of short-lived objects (and the additional writes to delete them), rather than as a read cache. Maybe on a slow disk the read cache component is more important, but on SSD it doesn’t appear to be.
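The write-buffer effect can be shown with a toy model, written in Python rather than Bitcoin Core's C++. It assumes a simplified "fresh" marker, loosely modeled on the FRESH/DIRTY flags Bitcoin Core keeps on cache entries; the class and its counters are hypothetical.

```python
class WriteBuffer:
    """Toy model: count the DB operations caused by creating and spending coins."""

    def __init__(self):
        self.entries = {}     # outpoint -> (coin, fresh); coin is None if spent
        self.db = {}          # stand-in for the on-disk UTXO database
        self.db_writes = 0
        self.db_deletes = 0

    def add_coin(self, outpoint, coin):
        # A newly created coin is not in the DB yet, so it is "fresh".
        self.entries[outpoint] = (coin, True)

    def spend_coin(self, outpoint):
        entry = self.entries.get(outpoint)
        if entry is not None and entry[1]:
            # Created and spent between flushes: the DB never hears about it.
            del self.entries[outpoint]
        else:
            # The coin is (or may be) on disk: queue a delete for the next flush.
            self.entries[outpoint] = (None, False)

    def flush(self):
        for outpoint, (coin, _fresh) in self.entries.items():
            if coin is None:
                self.db.pop(outpoint, None)
                self.db_deletes += 1
            else:
                self.db[outpoint] = coin
                self.db_writes += 1
        self.entries.clear()
```

In this model a coin that is created and spent between two flushes costs no database operations, while a flush that lands between creation and spend costs one write plus one later delete, whether or not the cache is erased afterwards.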

If benchmarks support doing this, I’m all for it! But it sounds to me like they might not.

Long term, what I’d like to see is keeping a linked list of writes and, after letting the cache fill with a day’s or a week’s worth of activity (interesting open question: what is the amount of write suppression as a function of the buffer horizon?), starting to write out (and erase) the oldest surviving block’s worth of activity after each connected block… this way we’re always writing rather than blocking on writes in big bursts.
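One way to picture this scheme is the Python sketch below (not Bitcoin Core's C++, and not an existing design). It assumes a hypothetical `HorizonWriter` that keeps one change set per connected block and, once more than `horizon_blocks` are buffered, writes out the oldest one. Coins spent while still inside the horizon are dropped without ever reaching the database.

```python
from collections import deque


class HorizonWriter:
    def __init__(self, horizon_blocks):
        self.horizon = horizon_blocks
        self.pending = deque()   # one dict per connected block, oldest first
        self.db = {}             # stand-in for the on-disk UTXO database
        self.db_ops = 0          # writes plus deletes that reached the DB

    def connect_block(self, created, spent):
        """created: dict outpoint -> coin; spent: iterable of outpoints."""
        changes = dict(created)
        for outpoint in spent:
            # Look for a still-buffered creation of this coin (linear scan;
            # a real design would index this, it is kept simple here).
            holder = next((b for b in [changes, *self.pending]
                           if b.get(outpoint) is not None), None)
            if holder is not None:
                del holder[outpoint]        # suppressed: never hits disk
            else:
                changes[outpoint] = None    # coin is on disk: queue a delete
        self.pending.append(changes)
        while len(self.pending) > self.horizon:
            self._write_oldest()

    def _write_oldest(self):
        # Write out and erase the surviving changes of the oldest block.
        for outpoint, coin in self.pending.popleft().items():
            if coin is None:
                self.db.pop(outpoint, None)
            else:
                self.db[outpoint] = coin
            self.db_ops += 1
```

Comparing `db_ops` across different `horizon_blocks` values on the same block stream is one way to explore the open question about write suppression versus buffer horizon.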

By writing in block order (or something close to it) we would get the ability to continually advance the synced-to point. (I also assumed that we’d change to some kind of open hash table for the random lookups, to reduce the pointer chasing; have a collision resolution which will overwrite non-dirty entries whenever it’s convenient; and have write flushing simply make entries non-dirty rather than erasing them.)
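The hash-table variant might look roughly like the Python sketch below (not Bitcoin Core's C++). It is only one possible reading of the idea: `OpenTable`, its fixed probe window and its eviction rule are assumptions made for illustration. Clean entries may be overwritten on collision because the database already holds them, and a sync marks entries clean instead of removing them.

```python
class OpenTable:
    def __init__(self, slots, max_probe=4):
        self.slots = [None] * slots   # each slot: [key, value, dirty] or None
        self.max_probe = max_probe
        self.db = {}                  # stand-in for the on-disk UTXO database

    def _probe(self, key):
        # Linear probing over a short, fixed window of candidate slots.
        start = hash(key) % len(self.slots)
        for i in range(min(self.max_probe, len(self.slots))):
            yield (start + i) % len(self.slots)

    def put(self, key, value):
        victim = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is None or slot[0] == key:
                victim = idx          # free slot, or the key's existing slot
                break
            if victim is None and not slot[2]:
                victim = idx          # clean entry: safe to overwrite
        if victim is None:
            # Every candidate slot is dirty: write them out, then reuse one.
            self.sync()
            victim = next(self._probe(key))
        self.slots[victim] = [key, value, True]

    def get(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is not None and slot[0] == key:
                return slot[1]
        return self.db.get(key)       # overwritten or never cached: ask the DB

    def sync(self):
        """Write dirty entries and mark them clean; nothing is erased."""
        for slot in self.slots:
            if slot is not None and slot[2]:
                self.db[slot[0]] = slot[1]
                slot[2] = False
```

Lookups that miss the table fall back to the database, so overwriting a clean entry costs at most one extra read later.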

Or, at least, this was the kind of thing I had in mind when suggesting non-atomic flushing. But all that stuff is a big project. :( I suspect the periodic flushes are by far the biggest source of block acceptance/GBT latency though, so it’s one which would almost certainly have a big payoff.

sipa commented at 2:36 am on January 26, 2019: member

@gmaxwell I think that’s all true, but not really relevant here.

The goal isn’t speeding up synchronization given the choice of flushing or not flushing entirely. In that case we’ve already shown several times that not flushing is indeed better.

The scenario here is what to do when we’re forced to flush for reasons other than the cache being full. And it would seem to me that flushing without wiping should be strictly better than flushing with wiping if the cost per write is constant. Unfortunately it seems that there is an unrelated extra cost of iterating over the non-dirty entries, making it not necessarily better.

gmaxwell commented at 2:57 am on January 26, 2019: contributor

> And it would seem to me that flushing without wiping should be strictly better than flushing with wiping in that case.

I believe we found otherwise in benchmarking. If that isn’t actually the case, great, better is better.

laanwj added the label Validation on Jan 27, 2019

sdaftuar commented at 6:42 pm on January 29, 2019: member

I’ve done some benchmarking on the pruning component of this, comparing a pruning node with a big dbcache doing an initial sync using either master or a version of this PR that has been modified to loop over just the dirty entries in the cache when doing a BatchWrite. @gmaxwell’s recollection that preserving the read cache is of limited benefit seems correct on both SSDs and spinning disks; I saw virtually no difference in performance (and in one of my tests, master seemed to perform a bit better).

I expected the read-caching would be beneficial on spinning disks, but perhaps OS-level disk caching is more effective than I realized. At any rate this approach seems like it’s not worth pursuing, so I’ll close.

sdaftuar closed this on Jan 29, 2019

Sjors commented at 4:31 pm on November 19, 2019: member

I rebased this on top of #17487 in case someone wants to take another stab at benchmarking: https://github.com/Sjors/bitcoin/tree/2019/11/prune-no-erase

With the recent merge of Android support (#17078), mobile devices would make an interesting benchmark. In my experience on a Xiaomi A1 (4 GB RAM, 32 / 64 GB disk) IBD becomes unbearably slow after block 500,000 or so. This might be partly due to frequent pruning, which prevents it from leveraging its RAM. However, as discussed above, these changes may not be enough to solve that. Pruning more than 10% (added in #11658), to reduce the frequency of prune events and thus flushes, might also help.

DrahtBot locked this on Dec 16, 2021

Flush without erasing cache during periodic and pruning flushes #15265