Prune more aggressively during IBD #12404

pull Sjors wants to merge 1 commit into bitcoin:master from Sjors:2018/02/ibd_prune_extra changing 2 files +32 −1
  1. Sjors commented at 1:50 pm on February 10, 2018: member

    Pruning forces a chainstate flush, which can defeat the dbcache and harm performance significantly.

    During IBD we now prune based on the worst case size of the remaining blocks, but no further than the minimum prune size of 550 MB.

    Using MAX_BLOCK_SERIALIZED_SIZE is complete overkill on testnet and usually too high on mainnet. It doesn’t take the SegWit activation block into account either. This causes the node to be pruned further than strictly needed after IBD. It also makes it more difficult to test. One improvement could be to use a moving average of actual block sizes, or a hard-coded educated guess. However, there’s something to be said for keeping this simple.
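
    For illustration, a minimal standalone sketch of the strategy described above. This is not the PR’s actual diff; the function and parameter names are mine, but the constants mirror Bitcoin Core’s values at the time:

    #include <algorithm>
    #include <cstdint>

    // Constants mirroring Bitcoin Core's values (consensus/consensus.h, validation.h).
    static const uint64_t MAX_BLOCK_SERIALIZED_SIZE = 4000000;            // bytes
    static const uint64_t MIN_DISK_SPACE_FOR_BLOCK_FILES = 550ull << 20;  // 550 MiB

    // During IBD, prune down far enough that the worst-case size of the blocks
    // still to be downloaded fits under the user's prune target, so later prune
    // passes (and the chainstate flushes they force) happen less often.
    // Never prune below the 550 MiB floor.
    uint64_t PruneTargetDuringIBD(uint64_t user_prune_target, int tip_height, int header_height)
    {
        const uint64_t remaining_blocks =
            header_height > tip_height ? uint64_t(header_height - tip_height) : 0;
        const uint64_t worst_case_remaining = remaining_blocks * MAX_BLOCK_SERIALIZED_SIZE;
        const uint64_t target =
            user_prune_target > worst_case_remaining ? user_prune_target - worst_case_remaining : 0;
        return std::max(target, MIN_DISK_SPACE_FOR_BLOCK_FILES);
    }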

  2. fanquake added the label Validation on Feb 10, 2018
  3. Sjors commented at 2:39 pm on February 10, 2018: member
    @fanquake probably also needs “Block storage” label.
  4. Sjors force-pushed on Feb 10, 2018
  5. Sjors force-pushed on Feb 10, 2018
  6. fanquake added the label Block storage on Feb 11, 2018
  7. esotericnonsense commented at 3:30 am on February 19, 2018: contributor

    Untested ACK, would kill off #11658 and #11359.

    I personally don’t think the details of how much we over- or under-prune here are that important, given that the long-term solution is to fix the cache so that it doesn’t require a complete flush. Basically any change here will speed up pruned IBD by a large amount.

  8. Sjors force-pushed on Feb 19, 2018
  9. Sjors force-pushed on Feb 19, 2018
  10. Sjors force-pushed on Feb 19, 2018
  11. in src/validation.cpp:3662 in 86bef23e65 outdated
    3569@@ -3570,6 +3570,9 @@ static void FindFilesToPrune(std::set<int>& setFilesToPrune, uint64_t nPruneAfte
    3570 
    3571     unsigned int nLastBlockWeCanPrune = chainActive.Tip()->nHeight - MIN_BLOCKS_TO_KEEP;
    3572     uint64_t nCurrentUsage = CalculateCurrentUsage();
    3573+    // Worst case remaining block space:
    


    luke-jr commented at 3:21 pm on February 27, 2018:
    Seems like this ought to be using best case…?

    Sjors commented at 4:06 pm on February 27, 2018:
    Best case would be empty blocks. That would lead to a lot of flushes.
  12. morcos commented at 3:05 pm on March 5, 2018: member
    ACK 86bef23
  13. eklitzke commented at 7:54 am on March 11, 2018: contributor
    utACK 86bef23e6550cdcf989ae6ac22dbbc45bbf613e4
  14. Sjors force-pushed on Mar 12, 2018
  15. Sjors commented at 9:03 pm on March 12, 2018: member
    Rebased due to release notes change.
  16. Sjors force-pushed on Mar 26, 2018
  17. Sjors commented at 4:29 pm on March 26, 2018: member
    Rebased due to release notes change.
  18. Sjors commented at 6:17 pm on March 26, 2018: member
    p2p_leak.py failure on Travis seems a bit random (and passes on my local machine)…
  19. eklitzke commented at 4:00 am on March 27, 2018: contributor
    utACK 82efbf1e8ac67ad9d04cba9b64cb79ece86209f8
  20. luke-jr commented at 8:26 pm on March 31, 2018: member
    Before merging, please remove the name and PR reference from the commit message, so it doesn’t ping us every time someone adds it to their random fork.
  21. fanquake deleted a comment on Apr 1, 2018
  22. Sjors commented at 9:43 am on April 3, 2018: member
    @luke-jr will do. Should I also remove it from the PR description, since that also ends up in the merge commit message? Or do those merge commits rarely make it into upstream work because commits are cherry-picked?
  23. Sjors force-pushed on Apr 3, 2018
  24. Sjors commented at 9:49 am on April 3, 2018: member
    Done. Also: rebased for release notes.
  25. Prune more aggressively during IBD
    Pruning forces a chainstate flush, which can defeat the dbcache and harm performance significantly.
    
    During IBD we now prune based on the worst case size of the remaining blocks, but no further than
    the minimum prune size of 550 MB.
    949cbca28e
  26. Sjors force-pushed on May 15, 2018
  27. Sjors commented at 11:24 am on May 15, 2018: member
    Rebased so I can do some benchmarking.
  28. Sjors commented at 8:54 am on May 20, 2018: member

    I’ve been racing AWS instances for the past few days, using master, #11658 (rebased on master) and this PR. I use a t2.micro with 1 vCPU, 1 GiB RAM and 20 GB storage. I set prune to 10 GB, dbcache=300 and maxmempool=5.

    After 72 hours master is currently at block 341909, @luke-jr’s branch is at 364905 and mine is at 360719.

    I enabled T2 Unlimited to prevent CPU throttling, although the process doesn’t seem to be CPU bound beyond the first 100-200K blocks (that will change after the assumed-valid block).

    I tried higher values for dbcache but that led to out of memory crashes (sometimes during a cache flush) and once even to a machine freezing. I didn’t try adding swap to prevent these crashes; I’m not sure how to manage that properly, i.e. in a way that too much swap usage doesn’t end up cancelling the benefits of these caches.

    I’ll leave them running for a bit. So far it seems clear that merging either of these PRs would be quite helpful on low-end machines, but which one is better is less clear. It probably depends on the choices for dbcache and prune; my guess is that machines with more RAM would benefit from pruning as aggressively as possible to minimize the number of cache flushes (though beyond ~8 GB of RAM it wouldn’t matter, because the cache would never flush).

    I just started three t2.medium instances with 2 vCPU, using dbcache=3000.

  29. Sjors commented at 10:30 am on May 20, 2018: member

    To clarify: is IsInitialBlockDownload() something that is true only once in the lifetime of a node, or is it also true when the node needs to do a large catch-up? If the latter, there is a case to be made for conservative pruning (or for putting aggressive pruning behind a config flag).

    When you run something like c-lightning against a pruned bitcoind node, it’s constantly processing blocks as they come in. A large prune event could mess with this process if for some reason the other application isn’t completely caught up. This is less of a problem for the initial sync if that other application doesn’t care about prehistory. E.g. c-lightning doesn’t need history from before its wallet creation block, so the trick there is to wait to launch it until bitcoind finishes IBD, and then keep both running.

    But there may be other applications that need to track some sort of index all the way through IBD where it’s important they don’t lose sync.
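
    For context, a simplified standalone paraphrase of the latch inside IsInitialBlockDownload() as validation.cpp implemented it around this time (details compressed; the struct and parameter names here are illustrative, not Core’s): once the node decides it is out of IBD, the answer latches to false for the rest of the process lifetime, but after a restart with a stale tip it can be true again, so a large catch-up after downtime does count as IBD.

    #include <atomic>
    #include <cstdint>
    #include <ctime>

    struct TipInfo { int64_t chain_work; int64_t block_time; };  // stand-in for CBlockIndex

    bool IsInitialBlockDownloadSim(const TipInfo* tip, int64_t min_chain_work, int64_t max_tip_age)
    {
        // Latch: once we have left IBD, stay out until the process restarts.
        static std::atomic<bool> latch_to_false{false};
        if (latch_to_false.load(std::memory_order_relaxed)) return false;
        if (tip == nullptr) return true;                                 // no chain yet
        if (tip->chain_work < min_chain_work) return true;               // not enough verified work
        if (tip->block_time < time(nullptr) - max_tip_age) return true;  // tip too old (~24h default)
        latch_to_false.store(true, std::memory_order_relaxed);
        return false;
    }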

  30. Sjors commented at 7:49 am on May 21, 2018: member

    After a little under 24 hours, the t2.medium instances are at:

    • master: 458269
    • 10% pruning: 471894
    • this PR: 396051

    Notice how this PR so far seems to perform worse than master on this instance and with these settings (it’s still better than master on the t2.micro instance). I’ll keep an eye on it. Maybe it has something to do with the large dbcache? Because of the more frequent prune events, the dbcache doesn’t grow as much on master and the 10% prune branch. See IRC. Paging @eklitzke.

    It would be nice to have a script that parses debug.log and spits out a CSV file with block height and cache size. Scrolling through the log, I notice that on master the cache mostly stays below 200 MB, on the 10% pruning branch it stays below 1 GB and usually under 500 MB, whereas in this PR it grows up to 2 GB.

  31. Sjors commented at 8:57 am on May 21, 2018: member
    echo "height, cache" > cache.csv
    grep UpdateTip ~/.bitcoin/debug.log | awk -F'[ =M]'  '{print $7", " $19 }' >> cache.csv

    I’ll update the plots later.

    Source data and Thunderplot file: plot.zip

  32. Sjors commented at 3:44 pm on May 25, 2018: member

    This extracts block height, cache size and a unix timestamp from the log:

    grep UpdateTip prune300_master.log | gawk -F'[ =M]'  '{print $7", " $19", " gsub(/[-T:Z]/," ") ", " mktime($1 " " $2 " " $3 " " $4 " " $5 " " $6)  }' >> prune300_master.csv

    IBD duration with dbcache=300MB:

    The vertical axis is in days. They ran for more than a week but didn’t finish. The 10% prune strategy (green line) was the fastest, master the slowest.

    Cache usage:

    Both strategies use more cache than master, but don’t differ much for such a small dbcache.

    IBD duration with dbcache=3000MB:

    The 10% pruning strategy (green) was the only one that finished IBD before I gave up. This PR (red line) is dramatically slower than even master.

    Cache usage:

    Both strategies use more cache than master. Aggressive pruning uses way more cache, but for some reason that seems to make things worse.

  33. n1bor commented at 9:15 am on May 31, 2018: none
    FYI, on AWS I’ve been running a node with 4 GB RAM and sc1 disks (very cheap, $0.025/GB-month) with txindex on, and it keeps up fine (i.e. it does not use its burst allowance). It’s used by a lightning node, so it gets a reasonable number of RPC requests. It’s useless for IBD, but you can start on SSD and then, once IBD is done, switch to sc1 with the click of a button!
  34. Sjors commented at 10:00 am on May 31, 2018: member

    @n1bor for my own project on AWS, I also use the strategy of doing IBD on a fast machine (i3.2xlarge): anything with > 10 GB RAM to prevent the cache from flushing, and a disk big enough to avoid pruning (it doesn’t have to be SSD). The bigger disk is ephemeral and goes away after you downgrade to a regular instance type, so I do a manual prune at the end and copy the data to the persistent disk.

    I’ll look into Cold HDD (sc1) volumes for that project, because c-lightning doesn’t fully love pruned nodes yet, but that’s off topic…

    But you don’t have that luxury on a physical machine like the various [Raspberry / Orange / Nano] Pi’s out there, so it would be quite useful if IBD performed better on constrained machines. Also, even at that price point, pruning still saves $5 a month in storage at the current chain size.

  35. Sjors commented at 9:02 pm on June 9, 2018: member

    Zooming in a little bit: this branch started dramatically slowing down compared to master around block 375000, which is around the September 2015 UTXO set explosion.

    Perhaps the performance of read or write operations involving CCoinsCacheEntry with the DIRTY flag dramatically decreases when the cache is larger than ~1 GB? That would explain why higher-frequency pruning, which generally keeps the cache below 300 MB in that period, doesn’t slow down, while at the same time explaining why 10 GB of cache, where all entries are FRESH, doesn’t slow down either. Does that even seem plausible?

    I could test this by deliberately interrupting IBD on master on a non-pruned node at roughly the same blocks where this branch flushed the cache: 244000, 298000, 332000, 355000, 373000, 388000, 401000, 414000, 424000, 436000, 446000, 455000 (the last two being the range where master starts slowing down).
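
    For reference, the cache entry flags being speculated about, paraphrased and simplified from src/coins.h of that era (the Coin member is omitted here): the relevant property is that a spent FRESH entry can simply be erased on flush with no disk write at all, while DIRTY entries have to be written down to the parent, which is what makes big flushes expensive.

    struct CCoinsCacheEntry {
        unsigned char flags{0};
        enum Flags {
            DIRTY = (1 << 0),  // entry is potentially different from the version in the parent cache
            FRESH = (1 << 1),  // the parent cache does not have this entry
        };
    };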

  36. Sjors commented at 9:03 am on June 10, 2018: member
  37. n1bor commented at 7:57 pm on June 10, 2018: none
    @Sjors not sure if you ever saw this: https://gist.github.com/n1bor/d5b0330a9addb0bf5e0f869518883522 Feels to me that the time spent on IBD for pruned nodes would be better spent on a chainstate-only download type of solution. Factor of 50x speed-up. But it needs a softfork, so maybe not!
  38. sipa commented at 8:02 pm on June 10, 2018: member
    @n1bor That seems orthogonal. Synchronizing from chainstate is a very interesting idea, but it’s also a completely different security model (trusting that the historical chain has correct commitments rather than computing things yourself).
  39. n1bor commented at 8:31 pm on June 10, 2018: none

    My take is we have, in order of “goodness”:

    1. Full Node
    2. Pruned Full Node
    3. Chain-State Downloaded Full Node, with a soft-fork to commit the chainstate to headers (what my post was about)
    4. SPV
    5. Web-Wallets

    Currently Core only offers 1 & 2. Just think: if Core offered 3, it would reduce the number of users using web-wallets/SPV, which can only be a good thing.

  40. sipa commented at 8:33 pm on June 10, 2018: member

    I agree that would be a good thing, but it in no way changes the fact that we should have a performant implementation for those who do not want to rely on trusting such hypothetical commitments (which this issue is about).

    Also, this is not the place to discuss changes to the Bitcoin protocol.

  41. Sjors commented at 7:21 pm on June 11, 2018: member

    I launched two new t2.medium nodes on AWS, running Ubuntu 16.04, 2 CPUs (uncapped), 4 GB RAM, no swap. I set prune=10000, dbcache=3000 and maxmempool=5 on both, like I did earlier. The blue lines are current master; the orange line is this PR rebased on master.

    Again, this branch slows down dramatically quite early on. This time I captured some metrics:

    There are prune events at 15:28 (block 244388, cache 929 MB), 15:38 (297332, cache 1024 MB), 15:48 (331446, cache 1235 MB), 15:58 (354941, cache 1279 MB) and 16:11 (373100, cache 2951 MB). The last two are right before and after the network activity drops.

    Note how this branch has dramatically more read throughput.

    I’ll try spinning up a node with dbcache=1000.

    I ran the same configuration on my iMac (which has 4 CPUs and a USB 3.1 Gen 2 external SSD drive) and don’t get any noticeable performance difference between these two branches (2 hours 20 minutes to run from block 360,000 to 480,000).

  42. Sjors commented at 8:29 pm on June 11, 2018: member

    I don’t see LogPrint(BCLog::PRUNE, "Prune: target=%dMiB actual=%dMiB diff=%dMiB max_prune_height=%d removed %d blk/rev pairs\n", ...) appear in the logs, not even for master. That category isn’t disabled by default, is it?

    Trying to figure out what could explain the extra disk read activity. Does anything related to pruning happen in a separate thread that we don’t wait for (before the next UpdateTip can happen)?
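
    (For what it’s worth on the log question above: a paraphrase of the LogPrint macro from util.h around this time, under the assumption that PRUNE is an ordinary opt-in debug category. LogPrint only emits when its category has been enabled, so the PRUNE line stays silent unless bitcoind runs with -debug=prune.)

    // Paraphrase of util.h's LogPrint macro (~0.16-era); LogAcceptCategory
    // and LogPrintf are Core's own logging functions.
    #define LogPrint(category, ...)              \
        do {                                     \
            if (LogAcceptCategory((category))) { \
                LogPrintf(__VA_ARGS__);          \
            }                                    \
        } while (0)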

  43. Sjors commented at 9:17 am on June 12, 2018: member

    Running this branch with dbcache=1000 doesn’t cause the same high disk read activity:

    It’s still running, so I don’t know if it’s faster than master or the 10% prune strategy, but at least it doesn’t suffer a slowdown like dbcache=3000 did.

  44. Sjors commented at 3:12 pm on June 14, 2018: member

    The thick line shows this PR with dbcache set to 1000. It no longer shows the performance hit you see with dbcache=3000, and it’s faster than master, but not necessarily faster than the 10% pruning strategy.

    Closing this in favor of #11658, since the benefit seems small and an unexplained massive performance hit needs… explaining :-)

  45. Sjors closed this on Jun 14, 2018

  46. ajtowns commented at 7:58 am on July 12, 2018: member
    FWIW, one effect I’m seeing that might cause the difference between dbcache 3000 vs 1000 is that when the cache is flushed, it takes a little while (presumably about 3x as long with a 3x larger dbcache), during which the block download queues pretty much empty; then, after the cache is flushed, the queues take a while to even out and get back up to the same download speed.
  47. MarcoFalke locked this on Sep 8, 2021
