Add -pruneduringinit option to temporarily use another prune target during IBD #31845

pull luke-jr wants to merge 1 commits into bitcoin:master from luke-jr:pruneduringinit changing 5 files +59 −12
  1. luke-jr commented at 1:29 am on February 12, 2025: member

    Low prune sizes can force flushing long before dbcache is full, making a significant impact on real-world IBD time.

    This new option allows a higher (or lower) prune size during IBD to mitigate this. When assumeutxo is active, the entire additional budget is allocated to the background chainstate.

  2. DrahtBot commented at 1:29 am on February 12, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31845.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Approach NACK stickies-v

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31260 (scripted-diff: Type-safe settings retrieval by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label CI failed on Feb 12, 2025
  4. DrahtBot commented at 1:33 am on February 12, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37065721535

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  5. andrewtoth commented at 2:10 am on February 12, 2025: contributor
    After #28280 and #30039 how significant is the impact of this?
  6. luke-jr force-pushed on Feb 12, 2025
  7. luke-jr commented at 2:14 am on February 12, 2025: member

    After #28280 and #30039 how significant is the impact of this?

    I don’t know, but logically I would expect this to still improve things significantly.

  8. luke-jr force-pushed on Feb 12, 2025
  9. DrahtBot removed the label CI failed on Feb 12, 2025
  10. maflcko added the label Needs Benchmark on Feb 12, 2025
  11. in src/node/blockmanager_args.cpp:29 in d612e2637e outdated
    24+    } else if (nPruneArg == 0) { // disabled
    25+        return 0;
    26+    } else if (nPruneArg == 1) { // manual pruning: -prune=1
    27+        return BlockManager::PRUNE_TARGET_MANUAL;
    28+    }
    29+    const uint64_t nPruneTarget{uint64_t(nPruneArg) * 1024 * 1024};
    


    l0rinc commented at 10:29 am on February 12, 2025:

    In many places in the codebase we use bit shifts for Kib/MiB/GiB, consider using them here as well:

    0    const uint64_t nPruneTarget{uint64_t(nPruneArg) << 20};
    

    luke-jr commented at 5:33 pm on February 12, 2025:
    It’s clearer using * 1024 IMO
  12. in src/node/blockmanager_args.cpp:31 in d612e2637e outdated
    26+    } else if (nPruneArg == 1) { // manual pruning: -prune=1
    27+        return BlockManager::PRUNE_TARGET_MANUAL;
    28+    }
    29+    const uint64_t nPruneTarget{uint64_t(nPruneArg) * 1024 * 1024};
    30+    if (nPruneTarget < MIN_DISK_SPACE_FOR_BLOCK_FILES) {
    31+        return util::Error{strprintf(_("%s configured below the minimum of %d MiB.  Please use a higher number."), opt_name, MIN_DISK_SPACE_FOR_BLOCK_FILES / 1024 / 1024)};
    


    l0rinc commented at 10:30 am on February 12, 2025:

    I know the extra space was there before, but it might be time to fix it now that we’re touching it:

    0        return util::Error{strprintf(_("%s configured below the minimum of %d MiB. Please use a higher number."), opt_name, MIN_DISK_SPACE_FOR_BLOCK_FILES / 1024 / 1024)};
    
  13. in src/init.cpp:511 in d612e2637e outdated
    507@@ -508,6 +508,9 @@ void SetupServerArgs(ArgsManager& argsman)
    508     argsman.AddArg("-prune=<n>", strprintf("Reduce storage requirements by enabling pruning (deleting) of old blocks. This allows the pruneblockchain RPC to be called to delete specific blocks and enables automatic pruning of old blocks if a target size in MiB is provided. This mode is incompatible with -txindex. "
    509             "Warning: Reverting this setting requires re-downloading the entire blockchain. "
    510             "(default: 0 = disable pruning blocks, 1 = allow manual pruning via RPC, >=%u = automatically prune block files to stay under the specified target size in MiB)", MIN_DISK_SPACE_FOR_BLOCK_FILES / 1024 / 1024), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
    511+    argsman.AddArg("-pruneduringinit", "Temporarily adjusts the -prune setting until initial sync completes."
    


    l0rinc commented at 10:36 am on February 12, 2025:
    is this compatible with -fastprune? If so, we should cover the new feature with tests, separating IBD pruning from up-to-date pruning (maybe in feature_index_prune.py)
  14. in src/kernel/blockmanager_opts.h:27 in d612e2637e outdated
    23@@ -24,6 +24,7 @@ struct BlockManagerOpts {
    24     const CChainParams& chainparams;
    25     bool use_xor{DEFAULT_XOR_BLOCKSDIR};
    26     uint64_t prune_target{0};
    27+    int64_t prune_target_during_init{-1};
    


    l0rinc commented at 10:38 am on February 12, 2025:
    how many block do we absolutely need (for reorgs, I guess) during IBD?

    luke-jr commented at 5:43 pm on February 12, 2025:
    idk, not sure why it matters

    l0rinc commented at 9:01 pm on February 12, 2025:
    To see how low we can go safely to speed up IBD

    luke-jr commented at 9:36 pm on February 13, 2025:
    That’s backward. We need to go higher to speed up IBD. Lower slows it down.

    l0rinc commented at 9:57 pm on February 13, 2025:

    I meant that if we could avoid writing blocks (and especially undos, which likely don’t serve any purpose during IBD as far as I can tell), we could speed up IBD measurably for pruned nodes.

    If the reason for writing blocks is to avoid flushing to LevelDB, there may be better ways (as mentioned, #31645 + sorting speeds up dumping from e.g. 30 minutes to 7).

  15. in src/node/blockmanager_args.cpp:25 in d612e2637e outdated
    20+{
    21+    // block pruning; get the amount of disk space (in MiB) to allot for block & undo files
    22+    if (nPruneArg < 0) {
    23+        return util::Error{strprintf(_("%s cannot be configured with a negative value."), opt_name)};
    24+    } else if (nPruneArg == 0) { // disabled
    25+        return 0;
    


    l0rinc commented at 10:40 am on February 12, 2025:
    can we clarify what disabling this feature means exactly? Can we rather add a default value for minimal blocks kept during IBD instead of disabling it by default?

    luke-jr commented at 5:35 pm on February 12, 2025:

    It’s to disable pruning during IBD.

    I’m not seeing a use case to set this lower than -prune (we already prune extra during IBD when we prune), only higher (to avoid flushing).

  16. in src/init.cpp:1 in d612e2637e


    l0rinc commented at 10:47 am on February 12, 2025:
    Could you please rebase the build to make sure the latest tests are passing? It’s ~7 months old and the build system is complaining when switching branches.

    luke-jr commented at 5:36 pm on February 12, 2025:
    Merge it into master for testing. Rebasing for no reason is a bad practice.

    l0rinc commented at 5:42 pm on February 12, 2025:
    It’s a new PR, why base it on an old state? I ran the test locally and they’re passing, but rebasing the PR assures that CI also runs against the latest changes.
  17. in src/node/blockmanager_args.cpp:47 in d612e2637e outdated
    52+    const auto prune_parsed = ParsePruneOption(nPruneArg, "Prune");
    53+    if (!prune_parsed) return util::Error{util::ErrorString(prune_parsed)};
    54+    opts.prune_target = *prune_parsed;
    55+
    56+    if (args.IsArgSet("-pruneduringinit")) {
    57+        const auto pdi_parsed = ParsePruneOption(args.GetIntArg("-pruneduringinit", 0 /* unused */), "-pruneduringinit");
    


    l0rinc commented at 10:55 am on February 12, 2025:

    Does unused mean that the optional is never empty because we’ve just checked IsArgSet? Given that it’s an optional, we could include it directly in the condition instead:

    0    if (auto prune_during_init{args.GetIntArg("-pruneduringinit")}) {
    1        const auto pdi_parsed{ParsePruneOption(*prune_during_init, "-pruneduringinit")};
    

    l0rinc commented at 11:42 am on February 12, 2025:
    to avoid non-standard abbreviations such as pdi - and given that above we call the parsed prune option prune_parsed -, we could rather name this as prune_init_parsed.
  18. in src/node/blockmanager_args.cpp:21 in d612e2637e outdated
    16+#include <string_view>
    17 
    18 namespace node {
    19+util::Result<uint64_t> ParsePruneOption(const int64_t nPruneArg, const std::string_view opt_name)
    20+{
    21+    // block pruning; get the amount of disk space (in MiB) to allot for block & undo files
    


    l0rinc commented at 11:02 am on February 12, 2025:
    MIN_DISK_SPACE_FOR_BLOCK_FILES docs also mention block and undo data - should we add orphan blocks as well in the description here?

    luke-jr commented at 5:38 pm on February 12, 2025:

    There’s no such thing as orphan blocks. Stale blocks are implicitly included in the more general “blocks”.

    Anyway, this is just a move.


    l0rinc commented at 5:43 pm on February 12, 2025:
    Looks like we should update MIN_DISK_SPACE_FOR_BLOCK_FILES in that case in a separate PR.
  19. in src/node/blockmanager_args.cpp:44 in d612e2637e outdated
    49-        if (nPruneTarget < MIN_DISK_SPACE_FOR_BLOCK_FILES) {
    50-            return util::Error{strprintf(_("Prune configured below the minimum of %d MiB.  Please use a higher number."), MIN_DISK_SPACE_FOR_BLOCK_FILES / 1024 / 1024)};
    51-        }
    52+    const auto prune_parsed = ParsePruneOption(nPruneArg, "Prune");
    53+    if (!prune_parsed) return util::Error{util::ErrorString(prune_parsed)};
    54+    opts.prune_target = *prune_parsed;
    


    l0rinc commented at 11:06 am on February 12, 2025:

    To obviate that prune_parsed has a very narrow scope here we could encapsulate it into an if/else:

    0    if (auto prune_parsed{ParsePruneOption(nPruneArg, "Prune")}) {
    1        opts.prune_target = *prune_parsed;
    2    } else {
    3        return util::Error{ErrorString(prune_parsed)};
    4    }
    
  20. in src/node/blockstorage.cpp:330 in d612e2637e outdated
    321     // Distribute our -prune budget over all chainstates.
    322-    const auto target = std::max(
    323-        MIN_DISK_SPACE_FOR_BLOCK_FILES, GetPruneTarget() / chainman.GetAll().size());
    324+    target = (target / number_of_chainstates) + target_boost;
    325+    target = std::max(MIN_DISK_SPACE_FOR_BLOCK_FILES, target);
    326     const uint64_t target_sync_height = chainman.m_best_header->nHeight;
    


    l0rinc commented at 11:36 am on February 12, 2025:

    Previously the target calculations were trivial, now it might be ripe for an extraction to unburden FindFilesToPrune (making this sub-feature unit-testable as well):

     0uint64_t BlockManager::CalculateTarget(const ChainstateRole role, ChainstateManager& chainman) const
     1{
     2    uint64_t target{GetPruneTarget()};
     3    uint64_t target_boost{0};
     4    if (m_opts.prune_target_during_init > -1 && chainman.IsInitialBlockDownload()) {
     5        if (uint64_t(m_opts.prune_target_during_init) <= target) {
     6            target = m_opts.prune_target_during_init;
     7        } else if (role != ChainstateRole::ASSUMEDVALID) {
     8            // Only the background/normal gets the benefit
     9            // NOTE: This assumes only one such chainstate exists
    10            target_boost = m_opts.prune_target_during_init - target;
    11        }
    12    }
    13    // Distribute our -prune budget over all chainstates.
    14    const auto number_of_chainstates{chainman.GetAll().size()};
    15    target = (target / number_of_chainstates) + target_boost;
    16    return std::max(MIN_DISK_SPACE_FOR_BLOCK_FILES, target);
    17}
    

    with

    0const uint64_t target{CalculateTarget(chain.GetRole(), chainman)};
    
  21. in src/node/blockmanager_args.cpp:30 in d612e2637e outdated
    25+        return 0;
    26+    } else if (nPruneArg == 1) { // manual pruning: -prune=1
    27+        return BlockManager::PRUNE_TARGET_MANUAL;
    28+    }
    29+    const uint64_t nPruneTarget{uint64_t(nPruneArg) * 1024 * 1024};
    30+    if (nPruneTarget < MIN_DISK_SPACE_FOR_BLOCK_FILES) {
    


    l0rinc commented at 11:50 am on February 12, 2025:
    Do we really need the same standards for IBD block retention and normal flow? The comment states 288 blocks (and only 1 mb per block for some pre-SegWit reason) which seems absurd during IBD, doesn’t it?

    luke-jr commented at 5:41 pm on February 12, 2025:
    Outside the scope of this PR

    l0rinc commented at 5:45 pm on February 12, 2025:
    How so? This is basically my main concern: why are we storing blocks during IBD at all, when pruned? Or if they’re needed for some reason, do we really need to store 288 blocks throughout the IBD as well?

    andrewtoth commented at 6:20 pm on February 12, 2025:

    I think the scope of this PR is to reduce chainstate writes, which occur when we need to prune blocks. Pruning blocks means we can’t reindex them and would have to redownload, so we write the chainstate then to be safe.

    If we were to not store blocks, we would have to write the chainstate after indexing every block to avoid potentially redownloading them.


    l0rinc commented at 8:59 pm on February 12, 2025:
    Thanks for the details, @andrewtoth. So basically having a low pruning target during IBD (preferably 0) with high dbcache (preferably max) would be the fastest possible option (especially after #31645), right? @luke-jr, it would be great if we could lower the IBD block writing threshold further (especially since the mentioned 288-block limit already seems incorrect - I’ll run an IBD to measure it exactly).
  22. l0rinc changes_requested
  23. l0rinc commented at 12:17 pm on February 12, 2025: contributor

    Before merging I would like to run an IBD, a reindex and an assumeUTXO on the branch.

    My main concern currently (or rather the part I don’t yet fully understand) is that if “during IBD there shouldn’t be reorgs in the first place” (given a reliable block-header-first sync), why is this even configurable, why not always skip storage of blocks and undos completely during IBD (or at least reduce it to 6 blocks or something trivial) when prune is enabled (e.g. only enable it after the last 100 blocks or after assumevalid or something similar)?

    Also note that #31144 is meant to optimize block writing speed considerably.

  24. luke-jr force-pushed on Feb 12, 2025
  25. Add -pruneduringinit option to temporarily use another prune target during IBD d4a3abf6d4
  26. luke-jr force-pushed on Feb 12, 2025
  27. andrewtoth commented at 9:41 pm on February 12, 2025: contributor
    Can’t we just do an IBD without pruning, then shutdown and restart with a prune height set? Or just call pruneblockchain after we are done IBD?
  28. stickies-v commented at 3:14 pm on February 13, 2025: contributor

    Can’t we just do an IBD without pruning

    Disabling pruning is only possible on devices with sufficient disk space, but I agree that a simple restart seems like an easy enough workaround if benchmarks indicate that this a worthwhile performance improvement:

    0bitcoind --prune=4000 --stopatheight={current_tip} && bitcoind --prune=300
    

    Approach NACK, I don’t think this needs a separate startup option. If benchmarks agree, I think a PR that improves the --prune documentation would be worthwhile.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-02-22 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me