Add a pruning 'high water mark' to reduce the frequency of pruning events

esotericnonsense commented at 10:23 pm on September 17, 2017: contributor

Partial fix for issue #11315.

Every prune event flushes the dbcache to disk. By default this happens roughly every 160MiB, so the benefit of a high dbcache value is negated and IBD takes far longer than without pruning enabled.

This change allows a ‘high water mark’ for pruning such that the actual size of blk/rev on disk can increase a reasonable amount before flushing.
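As a rough illustration of the idea (a minimal sketch, not the code in this PR): nothing is removed until disk usage exceeds the high water mark, and only then are the oldest blk/rev pairs removed until usage is back at or below the target. The function name, the list-of-sizes model, and the 128MiB file size are assumptions for illustration, and the sketch ignores any other constraints the real pruning code applies.

```python
def prune_with_hwm(file_sizes_mib, target_mib, hwm_mib):
    """Return (remaining_files, removed_count).

    file_sizes_mib is a hypothetical list of blk/rev pair sizes, oldest first.
    Nothing is removed while total usage is at or below hwm_mib. Once usage
    exceeds hwm_mib, the oldest pairs are removed until usage is at or below
    target_mib.
    """
    remaining = list(file_sizes_mib)
    total = sum(remaining)
    if total <= hwm_mib:
        return remaining, 0  # below the high water mark: no prune, no flush
    removed = 0
    while remaining and total > target_mib:
        total -= remaining.pop(0)  # drop the oldest pair first
        removed += 1
    return remaining, removed


# 27 files of 128MiB = 3456MiB, still below hwm=3540MiB: nothing removed.
print(prune_with_hwm([128] * 27, target_mib=550, hwm_mib=3540)[1])  # 0

# 28 files = 3584MiB exceeds the hwm: prune down to 4 files (512MiB).
remaining, removed = prune_with_hwm([128] * 28, target_mib=550, hwm_mib=3540)
print(removed, sum(remaining))  # 24 512
```

The point of the gap between target and hwm is that a single larger prune replaces many small ones, so the dbcache is flushed far less often.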

On a machine with prune=550 and dbcache=3000:

2017-09-17 22:04:56 Prune: target=550MiB hwm=3540MiB actual=3510MiB diff=-2960MiB max_prune_height=292477 removed 0 blk/rev pairs
2017-09-17 22:04:56 Prune: target=550MiB hwm=3540MiB actual=3516MiB diff=-2966MiB max_prune_height=292499 removed 0 blk/rev pairs
2017-09-17 22:04:57 Prune: target=550MiB hwm=3540MiB actual=468MiB diff=81MiB max_prune_height=292537 removed 21 blk/rev pairs
2017-09-17 22:04:57 Prune: UnlinkPrunedFiles deleted blk/rev (00103)
...

I haven’t changed the ‘diff’ column in the debug log (it could perhaps be hwm - actual rather than target - actual).

Not sure if this could potentially increase disk space requirements in some cases - may need documentation. With a very high dbcache value, if say 10GiB of blocks come in that only produce 2GiB of chainstate then you’d overshoot quite a bit, I think. It’s a tradeoff - more frequent flushing = slower IBD.

Thanks to sipa and gmaxwell for helping out on IRC.

Add a pruning 'high water mark' to reduce the frequency of pruning events ace88465f8

fanquake added the label Validation on Sep 18, 2017

esotericnonsense commented at 3:21 am on September 18, 2017: contributor

Benchmarks, syncing against a localhost node. Sending node on HDD, syncing node on SSD. Clock starts at UpdateTip height=1. prune=550 dbcache=3000.

+--------+----------+------------------+-----------------+
| height | unpruned | pruned (this PR) | pruned (master) |
+--------+----------+------------------+-----------------+
| 250000 |     427s |             593s |            724s |
| 300000 |     916s |            1076s |           1402s |
| 350000 |    1443s |            1979s |           2707s |
+--------+----------+------------------+-----------------+

At height 350000, this PR results in a 529MiB dbcache vs. a 2646MiB dbcache unpruned. The pruned node ends up with approx. 30MiB. 110 seconds should be added to serialize the dbcache on the unpruned node’s shutdown (the other two cases were single digit seconds).

Final result is that the node can sync to height 350000 27% faster than without the PR by giving the prune target ~3GiB leeway. I didn’t want to spend the time to reach the end but I suspect results would be similar or better.
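For reference, the 27% figure appears to be the reduction in wall-clock sync time at height 350000 relative to master. A short sketch of that arithmetic, using only the times from the benchmark table above (the function and variable names are illustrative):

```python
# Sync times in seconds, copied from the benchmark table.
times = {
    250000: {"unpruned": 427, "pr": 593, "master": 724},
    300000: {"unpruned": 916, "pr": 1076, "master": 1402},
    350000: {"unpruned": 1443, "pr": 1979, "master": 2707},
}


def time_saved_pct(master_s, pr_s):
    """Percentage reduction in sync time of the PR relative to master."""
    return 100.0 * (master_s - pr_s) / master_s


for height, t in times.items():
    print(height, round(time_saved_pct(t["master"], t["pr"]), 1))
# 250000 18.1
# 300000 23.3
# 350000 26.9
```

The saving grows with height in this run, which is consistent with the expectation that results further along the chain would be similar or better.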

As in the above post, this is only a ‘partial fix’ because in practice the dbcache is still limited to far less than the configured value.

edit: Ah yes, space requirements. In this test the chainstate folder’s final size is ~1GiB and the prune is allowed to overshoot by ~3GiB, so it raises the maximum disk space requirement by ~2GiB in this example.

esotericnonsense commented at 4:51 am on September 18, 2017: contributor

An additional performance gain could be gotten by tying this HWM to a percentage of the prune target.

For example, with prune=100000 you could let the data get to 100G x 1.10 before pruning, or cap it at 100G and prune down to 100G x 0.90 (similar effect on dbcache in both cases).

Looking at the documentation in -help:

automatically prune block files to stay under the specified target size in MiB

So the ‘remain below’ option probably makes more sense, but it retains the far slower IBD behaviour at low prune levels.
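The two percentage-based variants can be sketched as a pair of thresholds: the usage at which a prune is triggered, and the usage that the prune brings the data down to. This is a hypothetical sketch; the function names and the 10% default are assumptions taken from the example above, not options that exist.

```python
def overshoot_policy(target_mib, pct=10):
    """Let usage grow to target + pct%, then prune back down to target."""
    return {"trigger": target_mib * (100 + pct) // 100, "prune_to": target_mib}


def remain_below_policy(target_mib, pct=10):
    """Never exceed target; when it is reached, prune down to target - pct%."""
    return {"trigger": target_mib, "prune_to": target_mib * (100 - pct) // 100}


def gap_mib(policy):
    """Data that can accumulate between prune events (and dbcache flushes)."""
    return policy["trigger"] - policy["prune_to"]


for target in (100000, 550):
    print(target,
          gap_mib(overshoot_policy(target)),
          gap_mib(remain_below_policy(target)))
# 100000 10000 10000
# 550 55 55
```

With prune=100000 both variants leave the same 10000MiB between prune events, differing only in whether the target is ever exceeded. With prune=550 the gap shrinks to 55MiB, which is why a pure percentage does little for IBD speed at low prune levels.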

sdaftuar commented at 7:09 pm on September 29, 2017: member

No strong feelings from me, but when we worked on the pruning implementation our goal was to have the target be something that should be achievable. So if we were to decide that it’s worth exceeding it intentionally (eg for performance reasons during ibd), we should remember that we need to clearly communicate that to users.

But now that we in theory support non-atomic flushes, perhaps we can use that to flush less often during IBD even while we prune.

luke-jr commented at 9:42 am on November 10, 2017: member

Indeed, users expect that if they set prune=5000, the blockchain size remains <= 5000 MB. Perhaps it would make sense to have a prune-extra option to specify additional space to free when doing pruning and reduce its frequency (we can default it to 10% or something).

Sjors commented at 2:06 pm on February 10, 2018: member

Add a pruning ‘high water mark’ to reduce the frequency of pruning events #11359