Drop -dbcache limit #28358

pull Sjors wants to merge 1 commits into bitcoin:master from Sjors:2023/08/double-your-coins---cache changing 6 files +9 −7
  1. Sjors commented at 3:57 pm on August 28, 2023: member

    Due to recent UTXO set growth, the current maximum value for -dbcache of 16GB is ~just months away from being~ insufficient (for those who wish to complete IBD with the UTXO set held in RAM).

    This drops the limit. It also adds a warning that it’s up to users to check that they have enough RAM.

    Fixes #28249.


    A previous version of this PR increased the maximum to 64GB. It also made startup abort if the value provided is too high, rather than quietly round it down. But this didn’t get much support.

  2. DrahtBot commented at 3:57 pm on August 28, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30059 (Add option dbfilesize to control LevelDB target (“max”) file size by luke-jr)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. Sjors marked this as ready for review on Aug 28, 2023
  4. Sjors force-pushed on Aug 28, 2023
  5. DrahtBot added the label CI failed on Aug 28, 2023
  6. maflcko commented at 4:11 pm on August 28, 2023: member

    insufficient

    Can you explain this? Why is there a requirement that the whole utxo set must fit in the cache? This can easily lead to OOM, as you explain, in which case the user would have to start from zero, because potentially nothing has been persisted at all. Also, for slow storage it could lead to hangs, which could lead the user (or OS) to kill the process, causing the same issues. Finally, with assumeutxo, I wonder why there is any need to chase this? No objection to this pull, but I wonder if this really makes users happier on average, or will just lead to more hangs, crashes and -reindex.

  7. in doc/release-notes-25358.md:4 in f311e4549b outdated
    0@@ -0,0 +1,5 @@
    1+Node
    2+------
    3+
    4+- The maximum value for `-dbcache` has been increased to 32,000 MiB due to recent UTXO set growth. Be sure to check that you have enough RAM before increasing this.
    


    maflcko commented at 4:17 pm on August 28, 2023:
    Instead of implying that users should check their RAM and then increase this, it would be more helpful to explain why a user should change the setting in the first place, what they can expect, and what the tradeoffs are, but this seems better put in a separate document or the -help.

    Sjors commented at 4:41 pm on August 28, 2023:
    That was not the intended implication. If they feel like increasing it, they should check their RAM first.

    jonatack commented at 2:33 pm on August 30, 2023:
    0- The maximum allowed value for the `-dbcache` configuration option has been increased to 32,000 MiB due to recent UTXO set growth. Be sure to check that you have enough RAM before increasing this. (#28358)
    

    Sjors commented at 2:44 pm on August 30, 2023:
    I might also add something like “Even with sufficient RAM the maximum setting may not be optimal.”

    jonatack commented at 2:49 pm on August 30, 2023:
    Yes – and why, or link to an improved In-memory caches section in doc/reduce-memory.md.

    maflcko commented at 2:53 pm on August 30, 2023:
    It is just that this is the only thing you mention in the release notes. In any case, I don’t think the release notes are the right place for extended documentation. Either put it in the -help or a separate md.

    Sjors commented at 2:07 pm on February 13, 2024:
    Taken the suggested text diff.

    Sjors commented at 2:11 pm on February 13, 2024:
    I dropped the “check RAM” sentence.
  8. jamesob commented at 4:25 pm on August 28, 2023: member

    This can easily lead to OOM

    Wonder if it’s worth warning/failing if the dbcache param exceeds available memory.

  9. Sjors commented at 4:40 pm on August 28, 2023: member

    My recollection is that an IBD without a single flush is faster, but have not benched this in recent years.

    We can always recommend a lower setting if that turns out to be better. By not increasing it to infinity at least we constrain the worst case here, e.g. if it turns out the failures you mean get exponentially worse as the UTXO set keeps growing. While at the same time not removing the existing ability to sync in one go.

  10. Sjors commented at 4:41 pm on August 28, 2023: member

    if the dbcache param exceeds available memory.

    My understanding was that we can’t measure available RAM reliably.

  11. andrewtoth commented at 4:50 pm on August 28, 2023: contributor

    My recollection is that an IBD without a single flush is faster, but have not benched this in recent years.

    A single flush throughout the entire multi-hour IBD will not make a noticeable difference to a user. My initial version of #28280 had 3 extra pointers per cache entry, which caused the dbcache limit to be hit. It was still >2% faster than master which did not have a single flush.

  12. kristapsk commented at 5:00 pm on August 28, 2023: contributor
    Concept ACK. This does not change defaults, just gives more options to power users. And UTXO set will grow in future. For some server environments 16 GB more or less will mean nothing.
  13. DrahtBot removed the label CI failed on Aug 28, 2023
  14. TheCharlatan commented at 5:21 am on August 29, 2023: contributor
    Concept ACK.
  15. henrikhk commented at 4:51 pm on August 29, 2023: none
    Concept ACK.
  16. in doc/release-notes-25358.md:5 in f311e4549b outdated
    0@@ -0,0 +1,5 @@
    1+Node
    2+------
    3+
    4+- The maximum value for `-dbcache` has been increased to 32,000 MiB due to recent UTXO set growth. Be sure to check that you have enough RAM before increasing this.
    5+- Setting `-dbcache` above the maximum causes the node to abort startup. Previously it would round the number down.
    


    BenWestgate commented at 2:25 am on August 30, 2023:

    I would prefer it quietly round down to the maximum. Especially since it rounds up to the minimum.

    I know a few scripts people use that set dbcache to a percentage of available memory that this part of the PR would break for 64GB RAM.

    What problem is failing to start meant to solve?

  17. BenWestgate commented at 2:37 am on August 30, 2023: contributor

    Concept ACK.

    Except for:

    make startup abort if the value provided is too high, rather than quietly rounding it down.

    This breaks scripts that set dbcache to a % of available RAM.

    And is inconsistent with it rounding up silently rather than fail to start if the value is too low.

  18. 0xB10C commented at 7:46 am on August 30, 2023: contributor

    Concept ACK on increasing the maximum dbcache.

    I had to manually bump the dbcache to run bitcoin-cli verifychain 4 0 last year (https://github.com/bitcoin/bitcoin/pull/24851#issuecomment-1214040286). This came close to using 32 GB and might not be enough anymore (or soon - I haven’t checked again). Maybe worth considering going directly to 64 GB here to be able to actually use the verifychain 4 0 RPC?

  19. darosior commented at 7:57 am on August 30, 2023: member
    Concept ACK on giving more options.
  20. Sjors commented at 8:14 am on August 30, 2023: member

    This breaks scripts that set dbcache to a % of available RAM.

    Do you know any example of people doing that?

    In general I think we should fail to startup if the user provides an invalid configuration option, as this often leads to problems. I believe this has been gradually introduced for various other options too (see also #16545).

    At the same time I don’t want to needless break scripts out there.

    From my own experience I was very surprised to find that this rounding down behavior happened, which is which I prefer an error. But we could also print a warning in the debug log.

    Maybe worth considering going directly to 64 GB here to be able to actually use the verifychain 4 0 RPC? @sipa any thoughts on a sane maximum?

  21. BenWestgate commented at 12:57 pm on August 30, 2023: contributor

    This breaks scripts that set dbcache to a % of available RAM.

    Do you know any example of people doing that?

    My project Bails puts this in bitcoin-qt.desktop as well as .config/autostart/Bitcoin/bitcoin.desktop: -dbcache=$(($(grep Available /proc/meminfo | sed s/[^0-9]//g)/1024-2000))

    It’s an alpha with about 20 users, makes a portable USB stick node that keeps Bitcoin Core up to date, none probably own a 64GB computer, but it will break if they do.

    I found a couple other projects with the idea: https://github.com/Start9Labs/bitcoind-wrapper/issues/39 https://github.com/JWWeatherman/yeticold/issues/145 https://github.com/epiccurious/bitcoin-core-node-builder/issues/38

    They’ll waste time if they miss the change notes. Maybe a one time warning would be better when the value rounds up or down? It took me a long time to notice that it did that.

  22. in src/init.cpp:449 in f311e4549b outdated
    445@@ -446,7 +446,7 @@ void SetupServerArgs(ArgsManager& argsman)
    446     argsman.AddArg("-conf=<file>", strprintf("Specify path to read-only configuration file. Relative paths will be prefixed by datadir location (only useable from command line, not configuration file) (default: %s)", BITCOIN_CONF_FILENAME), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    447     argsman.AddArg("-datadir=<dir>", "Specify data directory", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    448     argsman.AddArg("-dbbatchsize", strprintf("Maximum database write batch size in bytes (default: %u)", nDefaultDbBatchSize), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::OPTIONS);
    449-    argsman.AddArg("-dbcache=<n>", strprintf("Maximum database cache size <n> MiB (%d to %d, default: %d). In addition, unused mempool memory is shared for this cache (see -maxmempool).", nMinDbCache, nMaxDbCache, nDefaultDbCache), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    450+    argsman.AddArg("-dbcache=<n>", strprintf("Maximum database cache size <n> MiB (%d to %d, default: %d). Make sure you have enough RAM. In addition, unused mempool memory is shared for this cache (see -maxmempool).", nMinDbCache, nMaxDbCache, nDefaultDbCache), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    


    jonatack commented at 2:28 pm on August 30, 2023:

    While touching this help, I think the following would be much clearer (particularly s/for/with).

    0    argsman.AddArg("-dbcache=<n>", strprintf("Maximum database cache size <n> MiB (%d to %d, default: %d). Make sure you have enough RAM. In addition, unused memory allocated to the mempool is shared with this cache (see -maxmempool).", nMinDbCache, nMaxDbCache, nDefaultDbCache), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    

    Sjors commented at 2:07 pm on February 13, 2024:
    Taken
  23. in src/init.cpp:974 in f311e4549b outdated
    924@@ -925,6 +925,10 @@ bool AppInitParameterInteraction(const ArgsManager& args)
    925         return InitError(Untranslated("Cannot set -listen=0 together with -listenonion=1"));
    926     }
    927 
    928+    if (args.GetIntArg("-dbcache", nDefaultDbCache) >  nMaxDbCache) {
    929+        return InitError(Untranslated(strprintf("-dbcache must be at most %d MiB", nMaxDbCache)));
    930+    }
    


    jonatack commented at 2:31 pm on August 30, 2023:

    Could add a test.

    Edit: maybe -dbcache configuration option can be at most %d MiB, but %d was passed.


    Sjors commented at 2:07 pm on February 13, 2024:
    Used the text, added a test.
  24. in doc/release-notes-25358.md:1 in f311e4549b outdated
    0@@ -0,0 +1,5 @@
    1+Node
    


    jonatack commented at 2:35 pm on August 30, 2023:

    Maybe (I’m not sure this matters).

    0Updated settings
    

    Sjors commented at 2:46 pm on August 30, 2023:

    Release notes get edited later anyway, but I’ll change once I retouch.

    (waiting for more inputs in the discussion on warning/failure and the max size)


    Sjors commented at 2:07 pm on February 13, 2024:
    Updated.
  25. jonatack commented at 2:36 pm on August 30, 2023: member

    Concept ACK

    In general I think we should fail to startup if the user provides an invalid configuration option, as this often leads to problems. I believe this has been gradually introduced for various other options too (see also #16545).

    #27632 was another recent example of switching from a warning to raising an error. It’s too easy for users, even experienced ones, not to notice a warning line in the log.

  26. BenWestgate commented at 2:50 pm on August 30, 2023: contributor

    Perhaps it should be set so high that you can’t easily buy a Desktop with more RAM than the limit.

    I could go to Microcenter today and walk out with a 64GB gaming tower, or a Xeon workstation with 128GB, likely a 64GB laptop as well.

    Lots of guides on internet (foolishly) suggest setting it to half the total RAM.

  27. in src/init.cpp:929 in f311e4549b outdated
    924@@ -925,6 +925,10 @@ bool AppInitParameterInteraction(const ArgsManager& args)
    925         return InitError(Untranslated("Cannot set -listen=0 together with -listenonion=1"));
    926     }
    927 
    928+    if (args.GetIntArg("-dbcache", nDefaultDbCache) >  nMaxDbCache) {
    929+        return InitError(Untranslated(strprintf("-dbcache must be at most %d MiB", nMaxDbCache)));
    


    maflcko commented at 2:51 pm on August 30, 2023:

    Given that this will break software out there, why not simply keep the previous behavior and leave it as-is? What problem are you trying to solve?

    Printing a warning and/or updating the docs should be more than enough.


    Sjors commented at 8:03 am on August 31, 2023:
    That indeed seems like the best approach. Silently rounding down is annoying, but doesn’t break anything, so a warning should be fine.

    jonatack commented at 12:59 pm on August 31, 2023:
    OTOH, it seems that raising would be a clearer user interface than rounding down silently as now, or with an added warning. See https://en.wikipedia.org/wiki/Principle_of_least_astonishment. I don’t want to break anything, but the only software mentioned so far that this might marginally affect is that of @BenWestgate (thank you for reporting), who would be aware of the change.

    Sjors commented at 2:08 pm on February 13, 2024:

    I’m leaving this error as is for now, unless there are other examples of breaking software.

    I would also rather drop nMaxDbCache entirely than quietly enforce it.


    maflcko commented at 11:07 am on February 22, 2024:

    See https://en.wikipedia.org/wiki/Principle_of_least_astonishment

    If I imagined myself running into this error. I’d be indeed astonished and ask myself why the dbcache is capped at all. Again, if there is a reason for this change, it should be explained. It can’t hurt to put the reason in the error string.

  28. DrahtBot added the label CI failed on Sep 3, 2023
  29. DrahtBot removed the label CI failed on Sep 5, 2023
  30. luke-jr commented at 8:52 pm on September 5, 2023: member

    Why do we have an upper limit at all?

    My understanding was that we can’t measure available RAM reliably.

    #19873 does exactly that

  31. in src/txdb.h:41 in f311e4549b outdated
    37@@ -38,7 +38,7 @@ static const int64_t nDefaultDbCache = 450;
    38 //! -dbbatchsize default (bytes)
    39 static const int64_t nDefaultDbBatchSize = 16 << 20;
    40 //! max. -dbcache (MiB)
    41-static const int64_t nMaxDbCache = sizeof(void*) > 4 ? 16384 : 1024;
    42+static const int64_t nMaxDbCache = sizeof(void*) > 4 ? 32768 : 1024;
    


    luke-jr commented at 9:55 pm on September 5, 2023:
    Might as well make it constexpr while you’re touching it

    Sjors commented at 2:08 pm on February 13, 2024:
    Done
  32. DrahtBot added the label CI failed on Jan 12, 2024
  33. Sjors force-pushed on Feb 13, 2024
  34. Sjors commented at 2:10 pm on February 13, 2024: member

    Rebased, incorporated most of the text feedback above. Dropped the RAM warning from the release note, since that’s already in the help text.

    Why do we have an upper limit at all? @sipa seemed worried about users going overboard with this setting: #28249 (comment)

  35. tdb3 commented at 10:53 pm on February 17, 2024: contributor
    crACK
  36. Sjors force-pushed on Feb 22, 2024
  37. Sjors commented at 10:33 am on February 22, 2024: member
    Fixed broken feature_config_args.py test, taking into account the lower limit on 32 bit systems.
  38. maflcko commented at 11:09 am on February 22, 2024: member
    I still don’t understand the state of the current pull request. This leaves the currently hard-failing use case un-addressed (https://github.com/bitcoin/bitcoin/pull/28358#issuecomment-1698663976) and adds more hard-failing behavior for no reason (https://github.com/bitcoin/bitcoin/pull/28358#discussion_r1310404672)?
  39. maflcko commented at 11:34 am on February 22, 2024: member
    Also, CI still fails
  40. Sjors commented at 1:22 pm on February 22, 2024: member

    I was still waiting for someone to suggest a new value. But it seems fine to bump to 64 GB in order to support verifychain 4 0, so added that. @maflcko do you prefer to drop the limit? We should not quietly change a user setting, so imo an invalid value should explicitly fail, rather than silently fail.

    Also, CI still fails

    Indeed. I copied the is_64bits = sys.maxsize > 2**32 check from netutil.py, but it seems the check doesn’t actually work. So there might be a bug in netutil.py.

    https://cirrus-ci.com/task/5137529851084800?logs=ci#L10796

    Meanwhile the multiprocess, i686 CI compiler seems to think it’s 32 bit (while the test framework thinks it’s 64 bit), so the sizeof(void*) check in txdb.h seems broken too. (which could be another reason to just drop the limit entirely)

    https://cirrus-ci.com/task/4715317386018816?logs=ci#L3814

  41. Sjors force-pushed on Feb 22, 2024
  42. Sjors marked this as a draft on Feb 22, 2024
  43. Sjors renamed this:
    Double -dbache maximum to 32GB
    Double -dbache maximum to 64GB
    on Feb 22, 2024
  44. maflcko commented at 1:32 pm on February 22, 2024: member

    i686 is 32 bit and python on that CI machine is compiled to 64 bit.

    Edit: Assuming they are the same is the broken assumption

  45. Sjors force-pushed on Feb 22, 2024
  46. Sjors commented at 2:38 pm on February 22, 2024: member
    We could expose whether the program is compiled for 32 or 64 bit in the test framework… but I just added a regex for the error message instead.
  47. DrahtBot removed the label CI failed on Feb 22, 2024
  48. Sjors marked this as ready for review on Feb 23, 2024
  49. in src/node/caches.cpp:16 in 39a3d48a4a outdated
    12@@ -13,7 +13,7 @@ CacheSizes CalculateCacheSizes(const ArgsManager& args, size_t n_indexes)
    13 {
    14     int64_t nTotalCache = (args.GetIntArg("-dbcache", nDefaultDbCache) << 20);
    15     nTotalCache = std::max(nTotalCache, nMinDbCache << 20); // total cache cannot be less than nMinDbCache
    16-    nTotalCache = std::min(nTotalCache, nMaxDbCache << 20); // total cache cannot be greater than nMaxDbcache
    17+    assert(nTotalCache <= nMaxDbCache << 20);
    


    cbergqvist commented at 10:19 pm on March 4, 2024:

    nit: How about appending a comment to the assert such as // error already handled in AppInitParameterInteraction()?

    (Someone might have accidentally implemented a version of this function “properly” returning util::Result<CacheSizes> before realizing this was the case :) ).


    Sjors commented at 8:43 am on March 5, 2024:
    I’ll add a comment if I have to retouch.

    Sjors commented at 10:54 am on March 12, 2024:
    No longer needed since the limit is dropped now.
  50. cbergqvist approved
  51. cbergqvist commented at 11:04 pm on March 4, 2024: contributor

    ACK 39a3d48.

    Concept

    Adding this kind of hard error might make it more annoying to run old Bitcoin Core software in the far future. (But the consequence of a few extra flushes should be fine).

    Tested

    Starting bitcoind with -dbcache set to low value, max value, and beyond max value with expected results.

    Ran ./test/functional/feature_config_args.py with and without modified expected assert message, with expected results.

  52. DrahtBot requested review from TheCharlatan on Mar 4, 2024
  53. DrahtBot requested review from tdb3 on Mar 4, 2024
  54. DrahtBot requested review from jonatack on Mar 4, 2024
  55. DrahtBot requested review from 0xB10C on Mar 4, 2024
  56. DrahtBot requested review from darosior on Mar 4, 2024
  57. DrahtBot removed review request from tdb3 on Mar 4, 2024
  58. DrahtBot requested review from tdb3 on Mar 4, 2024
  59. DrahtBot removed review request from tdb3 on Mar 5, 2024
  60. DrahtBot requested review from tdb3 on Mar 5, 2024
  61. Sjors renamed this:
    Double -dbache maximum to 64GB
    Quadruple -dbache maximum to 64GB
    on Mar 11, 2024
  62. maflcko commented at 8:43 am on March 12, 2024: member

    @maflcko do you prefer to drop the limit? We should not quietly change a user setting, so imo an invalid value should explicitly fail, rather than silently fail.

    Why would the value be “invalid”?

  63. Sjors commented at 8:56 am on March 12, 2024: member
    If there is a limit, a value above the limit is invalid. That is the entire point of a limit. Which is why I asked if you prefer to drop the limit.
  64. maflcko commented at 9:47 am on March 12, 2024: member

    I don’t know if it is safe to drop the limit. For example, one could run into a signed integer overflow for some values. If that is the reason for the limit, you should add a comment for it.

    However, if there is no reason for the limit, I don’t see why a limit should exist.

  65. Sjors force-pushed on Mar 12, 2024
  66. Sjors commented at 10:50 am on March 12, 2024: member

    It seems there’s more enthusiasm for dropping this limit entirely, so I did that. This also impacts the GUI (but none of the translated strings), cc @hebasto.

    For future reference, c016405dadcdc645f31dc77cb0a92e2e10fe358a + 39a3d48a4ae87fd659a58ce93109a722fc436dd8 is the last version of this PR which does have a limit.

  67. Sjors renamed this:
    Quadruple -dbache maximum to 64GB
    Drop -dbache limit
    on Mar 12, 2024
  68. in src/qt/optionsdialog.cpp:96 in 5f313495ee outdated
    93@@ -94,8 +94,7 @@ OptionsDialog::OptionsDialog(QWidget* parent, bool enableWallet)
    94     ui->setupUi(this);
    95 
    96     /* Main elements init */
    97-    ui->databaseCache->setMinimum(nMinDbCache);
    98-    ui->databaseCache->setMaximum(nMaxDbCache);
    99+    ui->databaseCache->setRange(nMinDbCache, std::numeric_limits<int>::max());
    


    Sjors commented at 10:56 am on March 12, 2024:
    Even on a 32 bit system the maximum would be 2 petabyte (2 billion MiB), so not a practical concern imo.
  69. Sjors renamed this:
    Drop -dbache limit
    Drop -dbcache limit
    on Mar 12, 2024
  70. sipa commented at 11:24 am on March 12, 2024: member

    On 32-bit systems we shouldn’t raise the cache size too close to 4 GiB or more, as it’ll just result in memory allocation errors.

    On 64-bit systems, the limit is much less necessary, but I’d still worry about wildly too large numbers, as they also affect leveldb cache sizes. Setting those (way) too large is probably counterproductive.

  71. Sjors commented at 12:26 pm on March 12, 2024: member

    @sipa since 32 bit systems can’t have more than 4 GiB of RAM, the “Make sure you have enough RAM” wording in the documentation should cover that. Though perhaps “Stay at least 1 GiB below your RAM size” may be safer, also because of swap behaviour (see e.g. #29603 (comment))

    With something like #19873 we could warn the use if they pick a value too close to the limit (on both 32 and 64 bit systems).

    wildly too large

    We could suggest an optimum, but without further experimentation we don’t know where that is.

    Ideally we would measure performance in real time and flush when it starts degrading (unless the user really insists that we don’t).

  72. cbergqvist commented at 8:12 pm on March 14, 2024: contributor

    Built 5f313495eea36b566109894697b4a90085c53113 and ran bitcoin-qt because I was worried that setting a very high max value might cause issues with the QT UI control. It’s not some kind of horizontal slider that goes crazy, but rather a textbox with up/down arrows. It ignores key inputs that would make the value go further than std::numeric_limits<int>::max(). So all good from that perspective.

    bitcoin-qt_screenshot

  73. hebasto commented at 4:34 pm on March 18, 2024: member

    This also impacts the GUI (but none of the translated strings), cc @hebasto.

    5f313495eea36b566109894697b4a90085c53113: changes to the GUI code look good to me. It might worth considering adding the “Make sure you have enough RAM.” warning to the GUI as well (as a follow-up).

  74. Sjors commented at 4:37 pm on March 18, 2024: member

    “Make sure you have enough RAM.” warning to the GUI as well (as a follow-up).

    Will do if I need to retouch.

  75. andrewtoth commented at 3:53 am on May 14, 2024: contributor
    After #28233 keeping the entire utxo set in memory will also help connect new blocks as fast as possible, not just for IBD.
  76. Sjors force-pushed on May 14, 2024
  77. Sjors commented at 6:54 am on May 14, 2024: member

    Rebased to make it easier to test the interaction with (merged) #28233. However, any potential benefit goes away after the first restart. The only time -dbcache is filled beyond its previous 16GB limit is during IDB.

    In order to benefit from having the full UTXO set in RAM after a restart, we’d have to proactively load the entire UTXO set into -dbcache if space permits. Or maybe fill half the available space with coins, picked randomly.

    Added “Make sure you have enough RAM.” warning to the GUI.

  78. andrewtoth commented at 1:38 am on May 15, 2024: contributor

    The only time -dbcache is filled beyond its previous 16GB limit is during IDB

    I’m sure it will happen if you restart your node now and leave it running for long enough ;)

    In order to benefit from having the full UTXO set in RAM after a restart, we’d have to proactively load the entire UTXO set into -dbcache if space permits.

    I did make a crude attempt at that years ago #18941. Not sure it is worth it.

  79. BenWestgate approved
  80. BenWestgate commented at 11:22 am on May 23, 2024: contributor
    crACK
  81. in doc/release-notes-25358.md:1 in 2a71819af6 outdated
    0@@ -0,0 +1,6 @@
    1+Updated settings
    


    sdaftuar commented at 12:27 pm on May 23, 2024:
    Just noticed the filename here should be fixed (25358 vs 28358).
  82. sdaftuar commented at 12:32 pm on May 23, 2024: member

    Concept ACK.

    I don’t see much of a philosophical difference between having no limit and having a limit of 16GB on systems where a user has less than 16GB actually available… If we’re already requiring users to do something smart, we can keep doing that – and if we are worried about someone using an inappropriate value we can try to solve that separately (maybe by testing that we can allocate that much at startup or something?).

  83. BenWestgate commented at 3:53 pm on May 23, 2024: contributor

    If we are worried about someone using an inappropriate value we can try to solve that separately (maybe by testing that we can allocate that much at startup or something?).

    The best would be never setting it greater than total RAM. -dbcache=<total RAM> is still higher than optimal as there will be applications in memory that won’t be swapped to disk including Bitcoin. But measuring Available RAM or swap is precarious as it changes.

    Since other reviewers mentioned only wildly excessive values are seriously detrimental with swap, capping to max RAM is sensible.

    In order to benefit from having the full UTXO set in RAM after a restart, we’d have to proactively load the entire UTXO set into -dbcache if space permits. Or maybe fill half the available space with coins, picked randomly.

    Some have used cat .bitcoin/chainstate/* > /dev/null to make a node that’s far behind start syncing faster. Because of the disk io to read the chainstate folder, there’s a trade off between initial speed and medium-term speed. A node mere hours behind shouldn’t read the entire UTXO set into memory. But one that’s weeks behind will reach tip faster if it does.

    Though perhaps “Stay at least 1 GiB below your RAM size” may be safer, also because of swap behaviour (see e.g. #29603 (comment))

    That’s barely safe without swap, Bitcoin-QT and the OS will use most of the 1 GiB offset leaving it pretty close to locking up.

    Without a swap file, an offset from total ram (1-2 GiB?) may avoid OOM, but this becomes arbitrary and may limit performance if set too low. It’s better users w/o swap who set it too high OOM, than to limit performance of systems w/ swap that may benefit from more.

    So total memory is still the best limit, if any is enforced.

    We could suggest an optimum, but without further experimentation we don’t know where that is.

    I’ve done a lot of tests on 4 GB and 8 GB RAM machines with zram swap and the -datadir on USB sticks and available_RAM - 2 GiB is about optimal for keeping those systems responsive. See this closed issue for more data: https://github.com/BenWestgate/Bails/issues/41

    With -dbcache=<available_RAM - 2 GiB> both 4 GB and 8 GB RAM systems can unlock the screen and manipulate the desktop without exceeding a 10 second lag, after a user locks the screen and lets Bitcoin sync several hours. With 4 GB RAM, this is approximately our default -dbcache=450 as suggested elsewhere to NOT raise it on 4 GB. If -dbcache is too high with a slow swap the machine may take minutes to respond. That’s true even when the value is set lower than total or available memory. But whether that lag is undesirable or not depends on whether the user intends to use the machine or not before sync completes.

    Suggesting dbcache=$(total_ram - 3650) as an optimum place to start matches what some of our devs have recommended on stack exchange.

  84. validation: drop maximum -dbcache
    Fixes #28249
    bb3b980dfd
  85. Sjors force-pushed on May 28, 2024
  86. Sjors commented at 6:42 am on May 28, 2024: member

    @BenWestgate thanks for the analysis. If we want to add a recommendation to the help text, I suggest we do that in another PR though. It’s probably going to take multiple people doing benchmark on wildly different machines.

    Checking against RAM can also be done in a followup.

    I rebased and changed the .md file name.

  87. jlopp commented at 1:49 pm on June 24, 2024: contributor

    Concept ACK.

    For reference, I ran several sync tests with different dbcache sizes to see how much of an effect they have on performance. A good 25% improvement vs default when my machine abstained from flushing to disk. https://blog.lopp.net/effects-dbcache-size-bitcoin-node-sync-speed/

  88. Sjors commented at 5:24 pm on June 25, 2024: member

    @jlopp thanks for the testing and your article.

    I assume “to reach a given block height with a 28 GB dbcache size” you did not run with this patch? I’d be curious to see the result.

    And if you feel like battle testing this further, you could make a custom signet, produce a similar UTXO set to mainnet (maybe #24952 can help), increase it to a few hundred gigabyte and then see if and where there’s a value -dbcache where performance peaks and goes down.

  89. tdb3 approved
  90. tdb3 commented at 6:37 pm on September 7, 2024: contributor

    ACK bb3b980dfd96d9a89e6b04aef2859ce0555c6807

    Seems reasonable that a user explicitly setting dbcache would have the freedom (and responsibility) to allocate as much memory as desirable.

    Rebased this PR branch on top of master (d661e2b1b771abafb0b152842d775d3150032230).

    02024-09-06T03:37:15Z Cache configuration:
    12024-09-06T03:37:15Z * Using 2.0 MiB for block index database
    22024-09-06T03:37:15Z * Using 8.0 MiB for chain state database
    32024-09-06T03:37:15Z * Using 42998.0 MiB for in-memory UTXO set (plus up to 286.1 MiB of unused mempool space)
    

    Sync-from-Genesis Tests

    Ran three sync tests with another local node (running 27.1, stuck at 860036) on a LAN, with both nodes configured to only connect to each other. Tests ran with an inexpensive DRAMless SATA SSD.

    Observed a significantly faster sync with dbcache > 16384, with the tradeoff of having to spend more time writing out to disk (e.g. on exit). Overall, seems like an advantage to give the user the choice.

    Default dbcache

    Time to reach block 860036: 27 hours 29 minutes Time to exit bitcoind (writing out): 4 minutes

    dbcache=16384

    Time to reach block 860036: 8 hours 31 minutes Time to exit bitcoind (writing out): 8 minutes (cache was 4285.7MiB at the time) Significant write out seen at blocks 804273 and 849619.

    dbcache=43008

    Time to reach block 860036: 6 hours 42 minutes Time to exit bitcoind (writing out): 110 minutes (cache was 26691.3MiB at the time) Significant write out seen only when exiting bitcoins.

    Value of 43008 (42GiB) was chosen as a balance of being above expected cache high water mark during sync and low enough to not impact OS RAM needs for other programs.

  91. DrahtBot requested review from BenWestgate on Sep 7, 2024
  92. DrahtBot requested review from cbergqvist on Sep 7, 2024
  93. DrahtBot requested review from sdaftuar on Sep 7, 2024
  94. BenWestgate commented at 3:12 am on September 8, 2024: contributor

    dbcache=43008

    Time to reach block 860036: 6 hours 42 minutes Time to exit bitcoind (writing out): 110 minutes (cache was 26691.3MiB at the time) Significant write out seen only when exiting bitcoins.

    The long writing out would go significantly faster on a higher performance SSD, it is also CPU-intensive as it’s compressing the 27GB in memory. So the faster initial sync is a win.

  95. andrewtoth commented at 4:04 pm on September 8, 2024: contributor
    If we merge #30611, then write outs will be more evenly distributed throughout the sync and will not all happen on shutdown with a high dbcache.
  96. l0rinc commented at 7:02 pm on September 8, 2024: contributor
    Concept ACK, I’ll run some tests later
  97. l0rinc commented at 9:36 pm on September 12, 2024: contributor

    I ran benchmarks to evaluate the impact of removing the limit on IBD performance. The tests were conducted on an Intel Core i7-7700 CPU, 64 GB of RAM, and HDD storage. I synced up 3 times to block height 600,000 using various -dbcache settings: 2 GB, 5 GB, 10 GB, 20 GB, 30 GB, and 40 GB.

    0hyperfine \
    1--runs 3 \
    2--export-json /mnt/ibd_dbcache.json \
    3--parameter-list DBCACHE 2048,5120,10240,20480,30720,40960 \
    4--prepare 'rm -rf /mnt/BitcoinData/*' \
    5'./build/src/bitcoind -datadir=/mnt/BitcoinData -stopatheight=600000 -dbcache={DBCACHE} -printtoconsole=0'
    

    Results:

    • 2 GB: 4 hours, 39 minutes, 52 seconds
    • 5 GB: 4 hours, 5 minutes, 50 seconds
    • 10 GB: 3 hours, 51 minutes, 35 seconds
    • 20 GB: 3 hours, 48 minutes, 25 seconds
    • 30 GB: 3 hours, 45 minutes, 53 seconds
    • 40 GB: 3 hours, 45 minutes, 43 seconds

    dbcache | seconds

    2048 | 18565.20009 2048 | 16450.20701 2048 | 15362.17072 5120 | 14805.76276 5120 | 14797.24122 5120 | 14646.52724 10240 | 14453.19471 10240 | 13883.38449 10240 | 13347.80065 20480 | 14022.99233 20480 | 13658.22833 20480 | 13433.18961 30720 | 13688.31978 30720 | 13595.47899 30720 | 13376.19299 40960 | 13692.83482 40960 | 13672.43475 40960 | 13263.82877

    Findings:

    • Increasing -dbcache from 2 GB to 10 GB resulted in a significant improvement in IBD time, reducing it by about 48 minutes (~17% reduction).
    • Beyond 10 GB, the performance gains diminished, with less than a 2% improvement when increasing from 10 GB to 40 GB (approximately 6 minutes saved).

    Edit: I’ll run a few more rounds with -stopatheight=860000 -prune=550 for 10240,16384,20480,30720,40960

  98. BenWestgate commented at 2:37 am on September 13, 2024: contributor

    I ran benchmarks to evaluate the impact of removing the limit on IBD performance. … I synced up 3 times to block height 600,000…

    Thanks for this benchmark. The current block height is 861,090 and the size of the utxo set has more than doubled since 600,000 so this is why you do not see performance improvements from dbcache > 10 GB.

    Based on this data I think removing the limit may not be substantiated…

    Try syncing to 860,000 there should be performance benefit up to 30 GiB dbcache as some syncs grew to use 27GiB cache.

  99. in src/init.cpp:477 in bb3b980dfd
    473@@ -474,7 +474,7 @@ void SetupServerArgs(ArgsManager& argsman)
    474     argsman.AddArg("-conf=<file>", strprintf("Specify path to read-only configuration file. Relative paths will be prefixed by datadir location (only useable from command line, not configuration file) (default: %s)", BITCOIN_CONF_FILENAME), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    475     argsman.AddArg("-datadir=<dir>", "Specify data directory", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    476     argsman.AddArg("-dbbatchsize", strprintf("Maximum database write batch size in bytes (default: %u)", nDefaultDbBatchSize), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::OPTIONS);
    477-    argsman.AddArg("-dbcache=<n>", strprintf("Maximum database cache size <n> MiB (%d to %d, default: %d). In addition, unused mempool memory is shared for this cache (see -maxmempool).", nMinDbCache, nMaxDbCache, nDefaultDbCache), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    478+    argsman.AddArg("-dbcache=<n>", strprintf("Maximum database cache size <n> MiB (minimum %d, default: %d). Make sure you have enough RAM. In addition, unused memory allocated to the mempool is shared with this cache (see -maxmempool).", nMinDbCache, nDefaultDbCache), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    


    BenWestgate commented at 2:50 am on September 13, 2024:

    nit: This should say “available RAM” rather than “RAM”.

    On a 64 GB system without swap: it will crash or OOM kill apps if you set dbcache=32000 but only have only have 10 GB available RAM.

    On a 64 GB system with enough swap: it will perform MUCH worse and prematurely wear out the flash storage, if you set dbcache=32000 but only have only have 10 GB available RAM.

    (I’ve destroyed new USB sticks overnight, putting a swapfile on them and doing IBD with dbcache set between available RAM and total RAM.)

    The only reason people raise this setting is to sync faster so we should tell them it’s available RAM that counts, not total RAM, to make that possible.


    Sjors commented at 8:21 am on September 13, 2024:

    I’m not sure what “available” means here? The amount of RAM used by other processes can vary wildly, especially on a desktop computer.

    In any case, I prefer to leave changes to the text to a followup.


    BenWestgate commented at 1:17 am on September 16, 2024:

    I’m not sure what “available” means here?

    From man free:

    available Estimation of how much memory is available for starting new applications, without swapping. Unlike the data provided by the cache or free fields, this field takes into account page cache and also that not all reclaimable memory slabs will be reclaimed due to items being in use

    So having “enough RAM” will work but having “enough available memory” is a more accurate number to stay under to avoid swapping or OOM errors.

    Someone may assume with 16 GB RAM they can safely set dbcache to 8-15 GB but on most desktop systems that would swap or crash.

  100. in src/qt/forms/optionsdialog.ui:108 in bb3b980dfd
    104@@ -105,7 +105,7 @@
    105          <item>
    106           <widget class="QLabel" name="databaseCacheLabel">
    107            <property name="toolTip">
    108-            <string extracomment="Tooltip text for Options window setting that sets the size of the database cache. Explains the corresponding effects of increasing/decreasing this value.">Maximum database cache size. A larger cache can contribute to faster sync, after which the benefit is less pronounced for most use cases. Lowering the cache size will reduce memory usage. Unused mempool memory is shared for this cache.</string>
    109+            <string extracomment="Tooltip text for Options window setting that sets the size of the database cache. Explains the corresponding effects of increasing/decreasing this value.">Maximum database cache size. Make sure you have enough RAM. A larger cache can contribute to faster sync, after which the benefit is less pronounced for most use cases. Lowering the cache size will reduce memory usage. Unused mempool memory is shared for this cache.</string>
    


    BenWestgate commented at 3:04 am on September 13, 2024:
    Nit: “Make sure you have enough available RAM.” seems more correct.
  101. BenWestgate commented at 3:12 am on September 13, 2024: contributor

    crACK bb3b980dfd96d9a89e6b04aef2859ce0555c6807.

    Setting dbcache below available RAM performs better and swaps less than setting it below total RAM, which may even crash the machine or OOM kill other applications.

    I feel the tool tip and argument texts should say “available RAM” but it’s not a blocker for me.

  102. achow101 commented at 7:46 pm on September 16, 2024: member
    ACK bb3b980dfd96d9a89e6b04aef2859ce0555c6807
  103. achow101 merged this on Sep 16, 2024
  104. achow101 closed this on Sep 16, 2024

  105. jonatack commented at 5:12 pm on September 17, 2024: member

    ACK bb3b980dfd96d9a89e6b04aef2859ce0555c6807

    Agee with more freedom/responsibility for the user and the ability to reduce flushing to disk for better IBD performance.

    It’s probably a good idea to avoid using strike-through tags in PR descriptions, as they aren’t rendered well by the merge script and look like emphasis instead.

    Due to recent UTXO set growth, the current maximum value for `-dbcache` of 16GB is ~just months away from being~ insufficient (for those who wish to complete IBD with the UTXO set held in RAM).
  106. Sjors deleted the branch on Sep 18, 2024
  107. l0rinc commented at 10:15 am on September 20, 2024: contributor

    Unsurprisingly (though somewhat surprisingly - should we document this?), setting -prune makes the -dbcache value less important:

    0// 6d7f24595b08b8d1eba53e648533bcf87c30b48f
    1hyperfine \
    2--runs 2 \
    3--export-json /mnt/ibd_dbcache_prune.json \
    4--parameter-list DBCACHE 10240,16384,20480,30720,40960 \
    5--prepare 'rm -rf /mnt/BitcoinData/*' \
    6'./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -prune=550 -dbcache={DBCACHE}'
    
     0Benchmark 1: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -prune=550 -dbcache=10240
     1  Time (mean ± σ):     54047.663 s ± 196.467 s    [User: 48343.781 s, System: 5183.690 s]
     2  Range (min … max):   53908.739 s … 54186.586 s    2 runs
     3
     4Benchmark 2: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -prune=550 -dbcache=16384
     5  Time (mean ± σ):     54971.917 s ± 33.433 s    [User: 47761.439 s, System: 4482.604 s]
     6  Range (min … max):   54948.276 s … 54995.558 s    2 runs
     7
     8Benchmark 3: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -prune=550 -dbcache=20480
     9  Time (mean ± σ):     54625.234 s ± 372.120 s    [User: 47902.577 s, System: 4480.376 s]
    10  Range (min … max):   54362.105 s … 54888.362 s    2 runs
    11
    12Benchmark 4: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -prune=550 -dbcache=30720
    13  Time (mean ± σ):     54996.086 s ± 500.933 s    [User: 47059.146 s, System: 3619.912 s]
    14  Range (min … max):   54641.873 s … 55350.299 s    2 runs
    15
    16Benchmark 5: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -prune=550 -dbcache=40960
    17  Time (mean ± σ):     56834.434 s ± 569.458 s    [User: 47748.418 s, System: 3785.085 s]
    18  Range (min … max):   56431.766 s … 57237.102 s    2 runs
    
  108. BenWestgate commented at 5:27 am on September 23, 2024: contributor

    Didn’t #28280 resolve this?

    On Fri, Sep 20, 2024 at 05:16, l0rinc @.***(mailto:On Fri, Sep 20, 2024 at 05:16, l0rinc «a href=)> wrote:

    Unsurprisingly (should we document this?), setting -prune makes the -dbcache value less important:

    benchmark

    // 6d7f24595b08b8d1eba53e648533bcf87c30b48f hyperfine
    –runs 2
    –export-json /mnt/ibd_dbcache_prune.json
    –parameter-list DBCACHE 10240,16384,20480,30720,40960
    –prepare

    '

    rm -rf /mnt/BitcoinData/*

    '

    \

    '

    ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -prune=550 -dbcache={DBCACHE}

    '

    Benchmark 1: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -prune=550 -dbcache=10240 Time (mean ± σ): 54047.663 s ± 196.467 s [User: 48343.781 s, System: 5183.690 s] Range (min … max): 53908.739 s … 54186.586 s 2 runs

    Benchmark 2: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -prune=550 -dbcache=16384 Time (mean ± σ): 54971.917 s ± 33.433 s [User: 47761.439 s, System: 4482.604 s] Range (min … max): 54948.276 s … 54995.558 s 2 runs

    Benchmark 3: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -prune=550 -dbcache=20480 Time (mean ± σ): 54625.234 s ± 372.120 s [User: 47902.577 s, System: 4480.376 s] Range (min … max): 54362.105 s … 54888.362 s 2 runs

    Benchmark 4: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -prune=550 -dbcache=30720 Time (mean ± σ): 54996.086 s ± 500.933 s [User: 47059.146 s, System: 3619.912 s] Range (min … max): 54641.873 s … 55350.299 s 2 runs

    Benchmark 5: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -prune=550 -dbcache=40960 Time (mean ± σ): 56834.434 s ± 569.458 s [User: 47748.418 s, System: 3785.085 s] Range (min … max): 56431.766 s … 57237.102 s 2 runs

    — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

  109. l0rinc commented at 12:03 pm on September 23, 2024: contributor

    Didn’t #28280 resolve this?

    That’s why I was checking it, I’ve rebased it on 9cb9651d92ddb5d92724f6a52440601c7a0bbcf8 before running it with -prune. @andrewtoth is this what you expected to happen, or was I measuring something wrong here?

    0# git rev-parse HEAD
    16d7f24595b08b8d1eba53e648533bcf87c30b48f
    2
    3# git log --grep="bitcoin/bitcoin#28280"
    4commit 27a770b34b8f1dbb84760f442edb3e23a0c2420b
    5Merge: 0f68a05c08 589db872e1
    6Author: Ava Chow <github@achow101.com>
    7Date:   Wed Aug 7 20:06:39 2024 -0400
    8
    9    Merge bitcoin/bitcoin#28280: Don't empty dbcache on prune flushes: >30% faster IBD
    

    I ran it since without -prune as well since, and the differences are still minuscule (5% better than before - likely because of the HDD of the server):

    0hyperfine \
    1--runs 1 \
    2--export-json /mnt/ibd_dbcache_full.json \
    3--parameter-list DBCACHE 10240,16384,20480,30720,40960 \
    4--prepare 'rm -rf /mnt/BitcoinData/*' \
    5'./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -dbcache={DBCACHE}'
    
     0Benchmark 1: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -dbcache=10240
     1  Time (abs ≡):        43454.890 s               [User: 36019.244 s, System: 3368.345 s]
     2
     3Benchmark 2: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -dbcache=16384
     4  Time (abs ≡):        40668.815 s               [User: 35550.351 s, System: 2825.697 s]
     5
     6Benchmark 3: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -dbcache=20480
     7  Time (abs ≡):        40441.252 s               [User: 35192.236 s, System: 2627.069 s]
     8
     9Benchmark 4: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -dbcache=30720
    10  Time (abs ≡):        40075.069 s               [User: 34405.162 s, System: 1974.380 s]
    11
    12Benchmark 5: ./build/src/bitcoind -datadir=/mnt/BitcoinData -printtoconsole=0 -stopatheight=860000 -dbcache=40960
    13  Time (abs ≡):        38508.128 s               [User: 33951.127 s, System: 1771.498 s]
    
  110. andrewtoth commented at 1:41 pm on September 23, 2024: contributor
    I really think there’s too much variance when syncing with untrusted peers.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-23 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me