Current default settings are broken, some fix is needed #29603

reivanen opened this issue on March 8, 2024
  1. reivanen commented at 4:34 pm on March 8, 2024: none

    Please describe the feature you’d like to see added.

    Better default settings, or fix to database cache handling

    When doing IBD using default settings, on a relatively fast machine on relatively fast HDD everything works well until ~when segwit was activated. After that ETA estimate shot up from 4 days to 3 weeks. Investigating, my 1MB/s internet was downloading only about 15% of the time. The operating system became extremely laggy, and i found that the HDD was stuck on 100% IO load, doing 4-6MB/s read, plus the normal occasional writes for downloaded parts.

    I tried increasing db cache by 2x to 900MB, but absolutely nothing changed. RAM usage etc. same.

    Searching for issues i found that many “fixed” it by putting chainstate on SSD. This was not an option, so i started looking into putting it into a ramdrive. But then in a random comment about using a ramdrive someone said using more than 8000 dbcache makes the full chainstate fit into ram, essentially achieving the same. I was skeptical, since doubling it had not made any difference. But i decided to test it with 8500.

    And Bitcoin started working perfectly. Continuous download, low IO load, much faster write throughput. Something bordering on unusable became smooth. So why is this a feature request instead of “that’s expected with such high db cache”?

    Because even a device with only 3 gigs of ram can use a value of 8500 and start working well, instead of being useless. It takes something like 8 hours before bitcoin’s RAM usage reaches 3 gigs when syncing at 1MB/s. And at this point one can restart bitcoin which frees the ram, and sync for another 8 hours.

    This is very weird that there exists an undocumented “threshold” where dbcache’s operation changes completely, starting to work as an actual cache, freeing the HDD from doing constant small random read operations.

    Describe the solution you’d like

    Just drop the dbcache setting completely, let the cache grow as needed, and when RAM starts to get full, flush it as if restarting bitcoin. I wonder how many HDDs have broken down due to weeks-long constant 100% random read IO stress caused by this issue, and how many users gave up on the idea of running a node because they thought their machine was not up to the task.

    Describe any alternatives you’ve considered

    change default setting to the minimum value that can support proper cache behavior after segwit blocks. Warn user that reducing the cache will cause extreme slowdown and disk stress when doing IBD after segwit blocks start, practically only doable with SSD.

  2. reivanen added the label Feature on Mar 8, 2024
  3. willcl-ark commented at 5:01 pm on March 8, 2024: contributor

    Hi @reivanen, thanks for your report.

    It is documented in https://github.com/bitcoin/bitcoin/blob/master/doc/reduce-memory.md#in-memory-caches that higher dbcache settings give better IBD performance, and I think it stands to reason that this effect would be more pronounced on slower hardware.

    I do agree though that it could be stated more clearly (other than in a doc titled “reduce memory”) that this known effect occurs.

    There is also a somewhat-related PR open to increase the maximum dbcache value, to further speed up IBD.

    In my opinion the current program model of having a constrained memory footprint is preferable to making it unconstrained. If you want to speed up IBD then you can temporarily increase your cache size (to another known maximum size), and after IBD is complete and you are in sync, the (known) memory constraints make Bitcoin Core a more stable program to run and use overall.

    Additionally, having the program notified that “memory is running out” is often more easily said than done, usually requiring workarounds which in turn can make other internals harder to reason about than fixed sizes.

    I have a spinning HDD so may get around to try your cache size cutoff and see if I can also notice the performance differences you mention.

  4. reivanen commented at 10:56 am on March 9, 2024: none

    This is not just an issue of “better performance with larger dbcache” but a completely different way of working (depending on how much dbcache is used of total allocated?).

    I tested by increasing the network speed to 2MB/s. It worked as well as with 1MB/s; the HDD was more than able to keep up. That held until RAM usage had reached around 8000 of the 8500 total, at which point the 100% IO read load started and downloading happened only a fraction of the time. Restart bitcoin, and it continues to download at 2MB/s continuously, presumably until 8 gigs of RAM has filled.

    So after these data points i don’t see any other easy “fix” than making the default dbcache size large enough to properly process segwit blocks, and then purge the cache every time it is starting to fill up, before the “phase change” happens in cache behavior.

    I should not need to periodically restart bitcoin to make it work properly. (Also, it seems like the source I read is outdated, because currently the whole chainstate does not seem to fit into 8 gigs, and it starts throttling download speed due to the random reads. I will test with dbcache 10000 to see if it finally syncs to the end properly.)

  5. reivanen commented at 11:16 am on March 9, 2024: none

    One more data point came to mind: this seems to be related to segwit?

    Because i was able to properly process the blockchain until around segwit activation using 450MB dbcache. But after that even 8500 dbcache was not enough, but started throttling as it (or RAM) came close to filling up.

  6. maflcko removed the label Feature on Mar 11, 2024
  7. maflcko added the label Resource usage on Mar 11, 2024
  8. Sjors commented at 11:48 am on March 11, 2024: member

    @reivanen is your node pruned? If it is, make sure to test on a recent version of master - #20827 may have improved performance for pruned nodes.

    If your node is not pruned, then #28358 should help, but only until you run out of RAM (and if you don’t there may be diminishing returns).

    The -dbcache feature reduces frequent disk writes, because it delays the process of writing new coins to disk. If you use it all the way through IBD, then it also reduces disk reads, since all the coins you created are in RAM.

    Once you run out of RAM the cache is flushed. So from that point onwards you still have the benefit of fewer writes (it waits until the next flush), but for reading you have to fetch coins from disk (unless the coins are from after the last flush, in which case they’re in RAM).
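The behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Bitcoin Core's actual data structures: `simulate`, its parameters, and the random spending pattern are all hypothetical, and coins fetched from disk are not re-cached here, which is a simplification.

```python
import random


def simulate(cache_limit, n_blocks=500, spends_per_block=8,
             coins_per_block=10, seed=1):
    """Toy write-back coin cache that is emptied whenever it fills up.

    Returns (disk_reads, flushes). Hypothetical model for illustration only.
    """
    rng = random.Random(seed)
    unspent = []      # ids of all unspent coins, in RAM or on disk
    cache = set()     # coins created since the last flush (still in RAM)
    disk_reads = 0
    flushes = 0
    next_id = 0

    for _ in range(n_blocks):
        # Spend a few randomly chosen existing coins.
        for _ in range(min(spends_per_block, len(unspent))):
            i = rng.randrange(len(unspent))
            unspent[i], unspent[-1] = unspent[-1], unspent[i]
            coin = unspent.pop()
            if coin in cache:
                cache.discard(coin)   # created and spent without touching disk
            else:
                disk_reads += 1       # coin was flushed earlier: read from disk

        # Create new coins; they live in the cache until the next flush.
        for _ in range(coins_per_block):
            unspent.append(next_id)
            cache.add(next_id)
            next_id += 1

        # When the cache exceeds its limit, write everything out and empty it.
        if len(cache) > cache_limit:
            cache.clear()
            flushes += 1

    return disk_reads, flushes


if __name__ == "__main__":
    for limit in (100, 1000, 10**6):
        reads, flushes = simulate(limit)
        print(f"limit={limit}: disk_reads={reads}, flushes={flushes}")
```

In this model a limit large enough to hold every coin never flushes and never reads from disk, while a small limit flushes repeatedly and most spends then hit disk. Real workloads differ, so the numbers are only qualitative.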

    > I tried increasing db cache by 2x to 900MB, but absolutely nothing changed. RAM usage etc. same.

    The RAM is only used when needed. So it takes a while before you should see a difference.

    > even a device with only 3 gigs of ram can use a value of 8500

    You would run out of memory and crash once dbcache grows beyond the size of your RAM.

    Before that, things may grind to a halt if you have swap configured; your operating system will start writing RAM to disk, defeating the purpose of -dbcache. I’m not sure at what point that kicks in. But for benchmarking you may want to turn swap off.

    > Until RAM usage had gone to around 8000 used from 8500 total, then the 100% IO READ started and download happening only a fraction of the time.

    If you’re pushing this close to the limit of your machine, swap might be the reason it slows down (and you’re risking a crash).

  9. reivanen commented at 10:26 pm on March 11, 2024: none

    i’m not running pruned.

    I did indeed turn off swap as first mitigation to see what is actually going on with RAM usage.

    From what you wrote it’s curious why restarting bitcoin-qt will allow for it to continue syncing at high speed until dbcache fills up.

  10. Sjors commented at 8:16 am on March 12, 2024: member

    Because if your dbcache is no longer in RAM but in swap on disk, it’s going to slow down. When you restart dbcache is empty, so it’s in RAM. Until it fills again and your OS moves it to swap. That’s my theory anyway, you’d have to check if that’s what’s happening.
  11. willcl-ark commented at 9:09 am on March 12, 2024: contributor

    Right, if you set dbcache higher than the RAM available then the OS should start swapping it out when it runs out of (real) RAM. This should be expected to have bad performance because you are not using RAM cache for as much of the time as possible, and you become reliant on flushing the cache manually by restarting Bitcoin Core.

    In the case that dbcache is set lower than available RAM, Bitcoin Core will process until the cache is filled, then flush it all to disk, largely emptying its (fast, RAM) cache again. This should provide better performance, analogous to how you noticed that “manually” emptying the cache (by restarting bitcoind) did.

    Because I was curious about the spinning disk speed I left a node syncing from a single LAN connection last night. This had a 7200rpm HDD and bitcoind used default settings apart from the following:

    daemon=1
    port=18833
    rpcport=18832
    blocksonly=1
    connect=localhost:8833
    

    This implies a dbcache of 450MB.
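As a small illustration of why those settings imply the default, the sketch below reads a bitcoin.conf-style text and reports the dbcache value it would yield. It assumes the 450MB default quoted in this thread. The helper `effective_dbcache` is hypothetical, ignores config sections, and takes the last occurrence of the key, which may not match Bitcoin Core's own precedence rules.

```python
DEFAULT_DBCACHE_MB = 450  # default as quoted in this thread (assumption elsewhere)


def effective_dbcache(conf_text, default=DEFAULT_DBCACHE_MB):
    """Return the dbcache value a bitcoin.conf-style text would imply."""
    value = default
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "dbcache":
            value = int(val.strip())          # simplification: last one wins
    return value


if __name__ == "__main__":
    conf = """
    daemon=1
    port=18833
    rpcport=18832
    blocksonly=1
    connect=localhost:8833
    """
    print(effective_dbcache(conf))                      # no dbcache line set
    print(effective_dbcache(conf + "\ndbcache=8500"))   # explicit override
```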

    I plotted the times which blocks were processed according to the logfile in matplotlib, also marking segwit activation height:

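    [Image: spinning_IBD_529122, plot of block processing times during IBD on a spinning disk, with segwit activation height marked]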

    I also marked with red dots the blocks that had an interval > 30 seconds from the previous block, which in some cases might indicate cache flushes, but I did not log cache flushes so can’t be sure.
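The analysis described above could be sketched roughly as follows. This is not the script actually used for the plot. It assumes log lines of the form `<ISO timestamp>Z UpdateTip: ... height=<n> ...`, and the names `block_times` and `slow_blocks` are hypothetical.

```python
import re
from datetime import datetime

# Assumed line shape: "2024-03-11T21:00:00Z UpdateTip: new best=... height=100 ..."
TIP_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z UpdateTip: .*?\bheight=(\d+)"
)


def block_times(log_lines):
    """Extract (height, timestamp) pairs from UpdateTip log lines."""
    out = []
    for line in log_lines:
        m = TIP_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
            out.append((int(m.group(2)), ts))
    return out


def slow_blocks(times, threshold_s=30):
    """Return heights whose gap from the previous block exceeds threshold_s."""
    slow = []
    for (_, prev_ts), (height, ts) in zip(times, times[1:]):
        if (ts - prev_ts).total_seconds() > threshold_s:
            slow.append(height)
    return slow


if __name__ == "__main__":
    sample = [
        "2024-03-11T21:00:00Z UpdateTip: new best=00aa height=100 version=0x20000000",
        "2024-03-11T21:00:05Z UpdateTip: new best=00bb height=101 version=0x20000000",
        "2024-03-11T21:00:50Z UpdateTip: new best=00cc height=102 version=0x20000000",
    ]
    times = block_times(sample)
    print(slow_blocks(times))
```

The `(height, timestamp)` pairs can be passed to a plotting library such as matplotlib, with the heights from `slow_blocks` drawn as separate markers.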

    It does not appear to me from the shape of the graph that there’s any specific issue with segwit activation height and block processing times on IBD with default settings on a spinning disk. Rather the slope seems to follow general levels of the chain’s usage, which seems entirely in line with expectations.

  12. reivanen commented at 5:27 pm on March 18, 2024: none

    yes i have reinstalled the system with faster HDD access, and i can see similar performance as you posted.

    What i was suffering from was probably extremely bad IO latency and throughput, so that everything that was not in RAM took a ridiculous amount of time.

    I thought an average 2MB/s throughput was OK for this particular workload, but now after reconfiguration the same HW gets up to 30MB/s.

  13. reivanen closed this on Mar 18, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-26 12:12 UTC
