Prune more aggressively during IBD #12404

pull Sjors wants to merge 1 commit into bitcoin:master from Sjors:2018/02/ibd_prune_extra changing 2 files +32 −1
  1. Sjors commented at 1:50 pm on February 10, 2018: member

    Pruning forces a chainstate flush, which can defeat the dbcache and harm performance significantly.

    During IBD we now prune based on the worst case size of the remaining blocks, but no further than the minimum prune size of 550 MB.

    Using MAX_BLOCK_SERIALIZED_SIZE is complete overkill on testnet and usually too high on mainnet. It doesn’t take the SegWit activation block into account either. This causes the node to be pruned further than strictly needed after IBD. It also makes it more difficult to test. One improvement could be to use a moving average of actual block sizes, or a hard-coded educated guess. However, there’s something to be said for keeping this simple.
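
    For illustration, a minimal standalone sketch of the strategy described above. This is not the PR’s actual diff; the function and parameter names are mine, but the constants mirror Bitcoin Core’s values at the time:

    #include <algorithm>
    #include <cstdint>

    // Constants mirroring Bitcoin Core's values (consensus/consensus.h, validation.h).
    static const uint64_t MAX_BLOCK_SERIALIZED_SIZE = 4000000;            // bytes
    static const uint64_t MIN_DISK_SPACE_FOR_BLOCK_FILES = 550ull << 20;  // 550 MiB

    // During IBD, prune down far enough that the worst-case size of the blocks
    // still to be downloaded fits under the user's prune target, so later prune
    // passes (and the chainstate flushes they force) happen less often.
    // Never prune below the 550 MiB floor.
    uint64_t PruneTargetDuringIBD(uint64_t user_prune_target, int tip_height, int header_height)
    {
        const uint64_t remaining_blocks =
            header_height > tip_height ? uint64_t(header_height - tip_height) : 0;
        const uint64_t worst_case_remaining = remaining_blocks * MAX_BLOCK_SERIALIZED_SIZE;
        const uint64_t target =
            user_prune_target > worst_case_remaining ? user_prune_target - worst_case_remaining : 0;
        return std::max(target, MIN_DISK_SPACE_FOR_BLOCK_FILES);
    }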

  2. fanquake added the label Validation on Feb 10, 2018
  3. Sjors commented at 2:39 pm on February 10, 2018: member
    @fanquake probably also needs “Block storage” label.
  4. Sjors force-pushed on Feb 10, 2018
  5. Sjors force-pushed on Feb 10, 2018
  6. fanquake added the label Block storage on Feb 11, 2018
  7. esotericnonsense commented at 3:30 am on February 19, 2018: contributor

    Untested ACK, would kill off #11658 and #11359.

    I personally don’t think the details of how much we over- or under-prune here are that important, given that the long-term solution is to fix the cache so that it doesn’t require a complete flush. Basically any change here will speed up pruned IBD by a large amount.

  8. Sjors force-pushed on Feb 19, 2018
  9. Sjors force-pushed on Feb 19, 2018
  10. Sjors force-pushed on Feb 19, 2018
  11. in src/validation.cpp:3662 in 86bef23e65 outdated
    3569@@ -3570,6 +3570,9 @@ static void FindFilesToPrune(std::set<int>& setFilesToPrune, uint64_t nPruneAfte
    3570 
    3571     unsigned int nLastBlockWeCanPrune = chainActive.Tip()->nHeight - MIN_BLOCKS_TO_KEEP;
    3572     uint64_t nCurrentUsage = CalculateCurrentUsage();
    3573+    // Worst case remaining block space:
    


    luke-jr commented at 3:21 pm on February 27, 2018:
    Seems like this ought to be using best case…?

    Sjors commented at 4:06 pm on February 27, 2018:
    Best case would be empty blocks. That would lead to a lot of flushes.
  12. morcos commented at 3:05 pm on March 5, 2018: member
    ACK 86bef23
  13. eklitzke commented at 7:54 am on March 11, 2018: contributor
    utACK 86bef23e6550cdcf989ae6ac22dbbc45bbf613e4
  14. Sjors force-pushed on Mar 12, 2018
  15. Sjors commented at 9:03 pm on March 12, 2018: member
    Rebased due to release notes change.
  16. Sjors force-pushed on Mar 26, 2018
  17. Sjors commented at 4:29 pm on March 26, 2018: member
    Rebased due to release notes change.
  18. Sjors commented at 6:17 pm on March 26, 2018: member
    p2p_leak.py failure on Travis seems a bit random (and passes on my local machine)…
  19. eklitzke commented at 4:00 am on March 27, 2018: contributor
    utACK 82efbf1e8ac67ad9d04cba9b64cb79ece86209f8
  20. luke-jr commented at 8:26 pm on March 31, 2018: member
    Before merging, please remove the name and PR reference from the commit message, so it doesn’t ping us every time someone adds it to their random fork.
  21. fanquake deleted a comment on Apr 1, 2018
  22. Sjors commented at 9:43 am on April 3, 2018: member
    @luke-jr will do. Should I also remove it from the PR description, since that also ends up in the merge commit message? Or do those merge commits rarely make it into upstream work because commits are cherry-picked?
  23. Sjors force-pushed on Apr 3, 2018
  24. Sjors commented at 9:49 am on April 3, 2018: member
    Done. Also: rebased for release notes.
  25. Prune more aggressively during IBD
    Pruning forces a chainstate flush, which can defeat the dbcache and harm performance significantly.
    
    During IBD we now prune based on the worst case size of the remaining blocks, but no further than
    the minimum prune size of 550 MB.
    949cbca28e
  26. Sjors force-pushed on May 15, 2018
  27. Sjors commented at 11:24 am on May 15, 2018: member
    Rebased so I can do some benchmarking.
  28. Sjors commented at 8:54 am on May 20, 2018: member

    I’ve been racing AWS instances for the past few days, using master, #11658 (rebased on master) and this PR. I use a t2.micro with 1 vCPU, 1 GiB RAM and 20 GB storage. I set prune to 10 GB, dbcache=300 and maxmempool=5.

    After 72 hours master is currently at block 341909, @luke-jr’s branch is at 364905 and mine is at 360719.

    I enabled T2 Unlimited to prevent CPU throttling, although the process doesn’t seem to be CPU bound beyond the first 100-200K blocks (that will change after the assumed-valid block).

    I tried higher values for dbcache but that led to out of memory crashes (sometimes during a cache flush) and once even to a machine freezing. I didn’t try adding swap to prevent these crashes; I’m not sure how to manage that properly, i.e. in a way that too much swap usage doesn’t end up cancelling the benefits of these caches.

    I’ll leave them running for a bit. So far it seems clear that merging either of these PRs would be quite helpful on low-end machines, but which one is better is less clear. It probably depends on the choices for dbcache and prune; my guess is that machines with more RAM would benefit from pruning as aggressively as possible to minimize the number of cache flushes (though beyond ~8 GB of RAM it wouldn’t matter, because the cache would never flush).

    I just started three t2.medium instances with 2 vCPU, using dbcache=3000.

  29. Sjors commented at 10:30 am on May 20, 2018: member

    To clarify: is IsInitialBlockDownload() something that is true only once in the lifetime of a node, or is it also true when the node needs to do a large catch-up? If the latter, there is a case to be made for conservative pruning (or for putting aggressive pruning behind a config flag).

    When you run something like c-lightning against a pruned bitcoind node, it’s constantly processing blocks as they come in. A large prune event could mess with this process if for some reason the other application isn’t completely caught up. This is less of a problem for the initial sync if that other application doesn’t care about prehistory. E.g. c-lightning doesn’t need history from before its wallet creation block, so the trick there is to wait to launch it until bitcoind finishes IBD, and then keep both running.

    But there may be other applications that need to track some sort of index all the way through IBD where it’s important they don’t lose sync.
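
    For context, a simplified standalone paraphrase of the latch inside IsInitialBlockDownload() as validation.cpp implemented it around this time (details compressed; the struct and parameter names here are illustrative, not Core’s): once the node decides it is out of IBD, the answer latches to false for the rest of the process lifetime, but after a restart with a stale tip it can be true again, so a large catch-up after downtime does count as IBD.

    #include <atomic>
    #include <cstdint>
    #include <ctime>

    struct TipInfo { int64_t chain_work; int64_t block_time; };  // stand-in for CBlockIndex

    bool IsInitialBlockDownloadSim(const TipInfo* tip, int64_t min_chain_work, int64_t max_tip_age)
    {
        // Latch: once we have left IBD, stay out until the process restarts.
        static std::atomic<bool> latch_to_false{false};
        if (latch_to_false.load(std::memory_order_relaxed)) return false;
        if (tip == nullptr) return true;                                 // no chain yet
        if (tip->chain_work < min_chain_work) return true;               // not enough verified work
        if (tip->block_time < time(nullptr) - max_tip_age) return true;  // tip too old (~24h default)
        latch_to_false.store(true, std::memory_order_relaxed);
        return false;
    }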

  30. Sjors commented at 7:49 am on May 21, 2018: member

    After a little under 24 hours, the t2.medium instances are at:

    • master: 458269
    • 10% pruning: 471894
    • this PR: 396051

    Notice how this PR so far seems to perform worse than master on this instance and with these settings (it’s still better than master on the t2.micro instance). I’ll keep an eye on it. Maybe it has something to do with the large dbcache? Because of the more frequent prune events, the dbcache doesn’t grow as much on master and the 10% prune branch. See IRC. Paging @eklitzke.

    It would be nice to have a script that parses debug.log and spits out a CSV file with block height and cache size. Scrolling through the log, I notice that on master the cache mostly stays below 200 MB, on the 10% pruning branch it stays below 1 GB and usually under 500 MB, whereas in this PR it grows up to 2 GB.

  31. Sjors commented at 8:57 am on May 21, 2018: member
    echo "height, cache" > cache.csv
    grep UpdateTip ~/.bitcoin/debug.log | awk -F'[ =M]'  '{print $7", " $19 }' >> cache.csv

    I’ll update the plots later.

    Source data and Thunderplot file: plot.zip

  32. Sjors commented at 3:44 pm on May 25, 2018: member

    This extracts block height, cache size and a unix timestamp from the log:

    grep UpdateTip prune300_master.log | gawk -F'[ =M]'  '{print $7", " $19", " gsub(/[-T:Z]/," ") ", " mktime($1 " " $2 " " $3 " " $4 " " $5 " " $6)  }' >> prune300_master.csv

    IBD duration with dbcache=300MB:

    The vertical axis is in days. They ran for more than a week but didn’t finish. The 10% prune strategy (green line) was the fastest, master the slowest.

    Cache usage:

    Both strategies use more cache than master, but don’t differ much for such a small dbcache.

    IBD duration with dbcache=3000MB:

    The 10% pruning strategy (green) was the only one that finished IBD before I gave up. This PR (red line) is dramatically slower than even master.

    Cache usage:

    Both strategies use more cache than master. Aggressive pruning uses way more cache, but for some reason that seems to make things worse.

  33. n1bor commented at 9:15 am on May 31, 2018: none
    FYI, on AWS I’ve been running a node with 4 GB RAM and sc1 disks (very cheap, $0.025/GB-month) with txindex on, and it keeps up fine (i.e. it does not use its burst allowance). It’s used by a lightning node, so it gets a reasonable number of RPC requests. It’s useless for IBD, but you can start on SSD and then, once IBD is done, switch to sc1 with the click of a button!
  34. Sjors commented at 10:00 am on May 31, 2018: member

    @n1bor for my own project on AWS, I also use the strategy of doing IBD on a fast machine (i3.2xlarge): anything with > 10 GB RAM to prevent the cache from flushing, and a disk big enough to avoid pruning (it doesn’t have to be SSD). The bigger disk is ephemeral and goes away after you downgrade to a regular instance type, so I do a manual prune at the end and copy the data to the persistent disk.

    I’ll look into Cold HDD (sc1) volumes for that project, because c-lightning doesn’t fully love pruned nodes yet, but that’s off topic…

    But you don’t have that luxury on a physical machine like the various [Raspberry / Orange / Nano] Pi’s out there, so it would be quite useful if IBD performed better on constrained machines. Also, even at that price point, pruning still saves $5 a month in storage at the current chain size.

  35. Sjors commented at 9:02 pm on June 9, 2018: member

    Zooming in a little bit: this branch started dramatically slowing down compared to master around block 375000, which is around the September 2015 UTXO set explosion.

    Perhaps the performance of read or write operations involving CCoinsCacheEntry with the DIRTY flag dramatically decreases when the cache is larger than ~1 GB? That would explain why higher-frequency pruning, which generally keeps the cache below 300 MB in that period, doesn’t slow down, while at the same time explaining why 10 GB of cache, where all entries are FRESH, doesn’t slow down either. Does that even seem plausible?

    I could test this by deliberately interrupting IBD on master on a non-pruned node at roughly the same blocks where this branch flushed the cache: 244000, 298000, 332000, 355000, 373000, 388000, 401000, 414000, 424000, 436000, 446000, 455000 (the last two being the range where master starts slowing down).
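
    For reference, the cache entry flags being speculated about, paraphrased and simplified from src/coins.h of that era (the Coin member is omitted here): the relevant property is that a spent FRESH entry can simply be erased on flush with no disk write at all, while DIRTY entries have to be written down to the parent, which is what makes big flushes expensive.

    struct CCoinsCacheEntry {
        unsigned char flags{0};
        enum Flags {
            DIRTY = (1 << 0),  // entry is potentially different from the version in the parent cache
            FRESH = (1 << 1),  // the parent cache does not have this entry
        };
    };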

  36. Sjors commented at 9:03 am on June 10, 2018: member
  37. n1bor commented at 7:57 pm on June 10, 2018: none
    @Sjors not sure if you ever saw this: https://gist.github.com/n1bor/d5b0330a9addb0bf5e0f869518883522 Feels to me that the time spent on IBD for pruned nodes would be better spent on a chainstate-only download type of solution. Factor of 50x speed-up. But it needs a softfork, so maybe not!
  38. sipa commented at 8:02 pm on June 10, 2018: member
    @n1bor That seems orthogonal. Synchronizing from chainstate is a very interesting idea, but it’s also a completely different security model (trusting that the historical chain has correct commitments rather than computing things yourself).
  39. n1bor commented at 8:31 pm on June 10, 2018: none

    My take is we have, in order of “goodness”:

    1. Full Node
    2. Pruned Full Node
    3. Chain-State Downloaded Full Node, with a soft-fork to commit the chainstate to headers (what my post was about)
    4. SPV
    5. Web-Wallets

    Currently Core only offers 1 & 2. Just think: if Core offered 3, it would reduce the number of users using web-wallets/SPV, which can only be a good thing.

  40. sipa commented at 8:33 pm on June 10, 2018: member

    I agree that would be a good thing, but it in no way changes the fact that we should have a performant implementation for those who do not want to rely on trusting such hypothetical commitments (which this issue is about).

    Also, this is not the place to discuss changes to the Bitcoin protocol.

  41. Sjors commented at 7:21 pm on June 11, 2018: member

    I launched two new t2.medium nodes on AWS, running Ubuntu 16.04, 2 CPUs (uncapped), 4 GB RAM, no swap. I set prune=10000, dbcache=3000 and maxmempool=5 on both, like I did earlier. The blue lines are current master; the orange line is this PR rebased on master.

    Again, this branch slows down dramatically quite early on. This time I captured some metrics:

    There are prune events at 15:28 (block 244388, cache 929 MB), 15:38 (297332, cache 1024 MB), 15:48 (331446, cache 1235 MB), 15:58 (354941, cache 1279 MB) and 16:11 (373100, cache 2951 MB). The last two are right before and after the network activity drops.

    Note how this branch has dramatically more read throughput.

    I’ll try spinning up a node with dbcache=1000.

    I ran the same configuration on my iMac (which has 4 CPUs and a USB 3.1 Gen 2 external SSD drive) and don’t get any noticeable performance difference between these two branches (2 hours 20 minutes to run from block 360,000 to 480,000).

  42. Sjors commented at 8:29 pm on June 11, 2018: member

    I don’t see LogPrint(BCLog::PRUNE, "Prune: target=%dMiB actual=%dMiB diff=%dMiB max_prune_height=%d removed %d blk/rev pairs\n", ...) appear in the logs, not even for master. That category isn’t disabled by default, is it?

    Trying to figure out what could explain the extra disk read activity. Does anything related to pruning happen in a separate thread that we don’t wait for (before the next UpdateTip can happen)?
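
    (For what it’s worth on the log question above: a paraphrase of the LogPrint macro from util.h around this time, under the assumption that PRUNE is an ordinary opt-in debug category. LogPrint only emits when its category has been enabled, so the PRUNE line stays silent unless bitcoind runs with -debug=prune.)

    // Paraphrase of util.h's LogPrint macro (~0.16-era); LogAcceptCategory
    // and LogPrintf are Core's own logging functions.
    #define LogPrint(category, ...)              \
        do {                                     \
            if (LogAcceptCategory((category))) { \
                LogPrintf(__VA_ARGS__);          \
            }                                    \
        } while (0)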

  43. Sjors commented at 9:17 am on June 12, 2018: member

    Running this branch with dbcache=1000 doesn’t cause the same high disk read activity:

    It’s still running, so I don’t know if it’s faster than master or the 10% prune strategy, but at least it doesn’t suffer a slowdown like dbcache=3000 did.

  44. Sjors commented at 3:12 pm on June 14, 2018: member

    The thick line shows this PR with dbcache set to 1000. It no longer shows the performance hit you see with dbcache=3000, and it’s faster than master, but not necessarily faster than the 10% pruning strategy.

    Closing this in favor of #11658, since the benefit seems small and an unexplained massive performance hit needs… explaining :-)

  45. Sjors closed this on Jun 14, 2018

  46. ajtowns commented at 7:58 am on July 12, 2018: member
    FWIW, one effect I’m seeing that might cause the difference between dbcache 3000 vs 1000 is that when the cache is flushed, it takes a little while (presumably about 3x as long with a 3x larger dbcache), during which the block download queues pretty much empty; then, after the cache is flushed, the queues take a while to even out and get back up to the same download speed.
  47. MarcoFalke locked this on Sep 8, 2021
