Add a pruning ‘high water mark’ to reduce the frequency of pruning events #11359

pull esotericnonsense wants to merge 1 commits into bitcoin:master from esotericnonsense:2017-09-add-pruning-hwm changing 1 files +9 −3
  1. esotericnonsense commented at 10:23 pm on September 17, 2017: contributor

    Partial fix for issue #11315.

    Every prune event flushes the dbcache to disk. By default this happens roughly every 160MiB, which negates a high dbcache value and makes IBD take far longer than it does without pruning enabled.

    This change allows a ‘high water mark’ for pruning such that the actual size of blk/rev on disk can increase a reasonable amount before flushing.

    On a machine with prune=550 and dbcache=3000:

    2017-09-17 22:04:56 Prune: target=550MiB hwm=3540MiB actual=3510MiB diff=-2960MiB max_prune_height=292477 removed 0 blk/rev pairs
    2017-09-17 22:04:56 Prune: target=550MiB hwm=3540MiB actual=3516MiB diff=-2966MiB max_prune_height=292499 removed 0 blk/rev pairs
    2017-09-17 22:04:57 Prune: target=550MiB hwm=3540MiB actual=468MiB diff=81MiB max_prune_height=292537 removed 21 blk/rev pairs
    2017-09-17 22:04:57 Prune: UnlinkPrunedFiles deleted blk/rev (00103)
    ...
    

    I haven’t changed the ‘diff’ column in debug log (it could perhaps be hwm - actual rather than target - actual).
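A minimal sketch of the high water mark idea described above, not the code in this PR. It assumes a simplified model in which block data arrives in fixed-size steps and a prune event drops the on-disk size exactly to the target; the function names `prune_step`, `count_prune_events` and `log_diff` are hypothetical. How the PR derives the hwm value itself is not shown in this thread, so it is passed in as a parameter here.

```python
def prune_step(actual_mib, target_mib, hwm_mib):
    """Return the on-disk blk/rev size after one prune check (simplified model)."""
    if actual_mib <= hwm_mib:
        # Still at or below the high water mark: nothing is removed, no flush.
        return actual_mib
    # Above the mark: assume we prune straight back down to the target.
    return min(actual_mib, target_mib)


def count_prune_events(total_mib, step_mib, target_mib, hwm_mib):
    """Count how many prune checks actually remove data while total_mib arrives."""
    actual = 0
    events = 0
    for _ in range(total_mib // step_mib):
        actual += step_mib
        after = prune_step(actual, target_mib, hwm_mib)
        if after < actual:
            events += 1
        actual = after
    return events


def log_diff(target_mib, actual_mib):
    """The 'diff' column as currently logged: target - actual."""
    return target_mib - actual_mib


if __name__ == "__main__":
    # hwm equal to the target models pruning as soon as the target is exceeded.
    print(count_prune_events(10000, 128, 550, 550))
    # hwm taken from the log excerpt above.
    print(count_prune_events(10000, 128, 550, 3540))
    print(log_diff(550, 3510))
```

In this toy model the number of prune events (and therefore forced flushes) falls sharply once the on-disk size is allowed to climb to the hwm before pruning, which is the effect the change is aiming for.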

    Not sure if this could potentially increase disk space requirements in some cases - may need documentation. With a very high dbcache value, if say 10GiB of blocks come in that only produce 2GiB of chainstate then you’d overshoot quite a bit, I think. It’s a tradeoff - more frequent flushing = slower IBD.

    Thanks to sipa and gmaxwell for helping out on IRC.

  2. Add a pruning 'high water mark' to reduce the frequency of pruning events ace88465f8
  3. fanquake added the label Validation on Sep 18, 2017
  4. esotericnonsense commented at 3:21 am on September 18, 2017: contributor

    Benchmarks, syncing against a localhost node. Sending node on HDD, syncing node on SSD. Clock starts at UpdateTip height=1. prune=550 dbcache=3000.

    +--------+----------+------------------+-----------------+
    | height | unpruned | pruned (this PR) | pruned (master) |
    +--------+----------+------------------+-----------------+
    | 250000 |     427s |             593s |            724s |
    | 300000 |     916s |            1076s |           1402s |
    | 350000 |    1443s |            1979s |           2707s |
    +--------+----------+------------------+-----------------+
    

    At height 350000, this PR results in a 529MiB dbcache vs. a 2646MiB dbcache unpruned. The pruned node ends up with approx. 30MiB. 110 seconds should be added to serialize the dbcache on the unpruned node’s shutdown (the other two cases were single digit seconds).

    Final result is that the node can sync to height 350000 27% faster than without the PR by giving the prune target ~3GiB leeway. I didn’t want to spend the time to reach the end but I suspect results would be similar or better.
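The 27% figure can be reproduced from the benchmark table with a few lines of arithmetic. This is only a worked check of the numbers quoted in this comment; the helper name `time_saved_pct` is made up for illustration.

```python
# Sync times in seconds from the table above: (unpruned, this PR, master).
TIMES = {
    250000: (427, 593, 724),
    300000: (916, 1076, 1402),
    350000: (1443, 1979, 2707),
}


def time_saved_pct(master_s, pr_s):
    """Percentage of the master sync time saved by the PR."""
    return 100.0 * (master_s - pr_s) / master_s


if __name__ == "__main__":
    for height, (_unpruned, pr_s, master_s) in sorted(TIMES.items()):
        print(height, round(time_saved_pct(master_s, pr_s)))
```

The saving grows with height in this data set (roughly 18%, 23% and 27% at the three heights), which is consistent with the suspicion that results further along the chain would be similar or better.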

    As in the above post, this is only a ‘partial fix’ because in practice the dbcache still fills to far less than its configured size.

    edit: Ah yes, space requirements. In this test the chainstate folder’s final size is ~1GiB and the prune is allowed to overshoot by ~3GiB, so it raises the maximum disk space requirement by ~2GiB in this example.

  5. esotericnonsense commented at 4:51 am on September 18, 2017: contributor

    An additional performance gain could be obtained by tying this HWM to a percentage of the prune target.

    For example, with prune=100000 you could let the data get to 100G x 1.10 before pruning, or cap it at 100G and prune down to 100G x 0.90 (similar effect on dbcache in both cases).

    Looking at the documentation in -help:

    automatically prune block files to stay under the specified target size in MiB
    

    So the ‘remain below’ option probably makes more sense, but it retains the far slower IBD behaviour at low prune levels.
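The two percentage-based variants discussed in this comment can be compared with a short sketch. This illustrates the proposal only, not anything implemented in the PR; the function names and the 10% default are assumptions taken from the example figures above.

```python
def exceed_target(target_mib, pct=10):
    """Variant A: let data grow to target * (1 + pct%), then prune to the target.

    Returns (size that triggers a prune, size pruned down to).
    """
    return target_mib * (100 + pct) // 100, target_mib


def remain_below(target_mib, pct=10):
    """Variant B: prune once the target is reached, down to target * (1 - pct%).

    Returns (size that triggers a prune, size pruned down to).
    """
    return target_mib, target_mib * (100 - pct) // 100


def leeway(thresholds):
    """Data that can accumulate between two prune events, in MiB."""
    trigger, prune_to = thresholds
    return trigger - prune_to


if __name__ == "__main__":
    for target in (100000, 550):
        print(target, exceed_target(target), remain_below(target),
              leeway(remain_below(target)))
```

With prune=100000 both variants leave 10000 MiB between prune events, but with prune=550 the leeway is only 55 MiB, which is why a purely percentage-based scheme would still flush very often at low prune targets.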

  6. sdaftuar commented at 7:09 pm on September 29, 2017: member

    No strong feelings from me, but when we worked on the pruning implementation our goal was to have the target be something that should be achievable. So if we were to decide that it’s worth exceeding it intentionally (eg for performance reasons during ibd), we should remember that we need to clearly communicate that to users.

    But now that we in theory support non-atomic flushes, perhaps we can use that to flush less often during IBD even while we prune.

  7. luke-jr commented at 9:42 am on November 10, 2017: member
    Indeed, users expect that if they set prune=5000, the blockchain size remains <= 5000 MB. Perhaps it would make sense to have a prune-extra option to specify additional space to free when doing pruning and reduce its frequency (we can default it to 10% or something).
  8. Sjors commented at 2:06 pm on February 10, 2018: member
    See also #12404.
  9. sipa referenced this in commit 9a1ad2c5cb on Jul 14, 2018
  10. DrahtBot added the label Needs rebase on Jul 14, 2018
  11. DrahtBot commented at 11:23 am on July 14, 2018: member
  12. Sjors commented at 2:45 pm on July 14, 2018: member
    This is superseded by #11658 which was just merged.
  13. MarcoFalke commented at 3:07 pm on July 14, 2018: member
    Closing for now as per @Sjors
  14. MarcoFalke closed this on Jul 14, 2018

  15. laanwj removed the label Needs rebase on Oct 24, 2019
  16. PastaPastaPasta referenced this in commit d7b71db2b9 on Jul 19, 2020
  17. PastaPastaPasta referenced this in commit 0c7ba8e4c9 on Jul 24, 2020
  18. PastaPastaPasta referenced this in commit 474e20ff38 on Jul 27, 2020
  19. UdjinM6 referenced this in commit d3f7be5b84 on Jul 27, 2020
  20. UdjinM6 referenced this in commit 0eb337028b on Jul 27, 2020
  21. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 19:12 UTC
