Manual block file pruning.

mrbandrews commented at 2:29 pm on April 13, 2016: contributor

Implements #7365.

Now there is auto-prune and manual-prune. The user enables manual pruning on the command line with prune=1 and then uses an RPC command to prune: “pruneblockchain X” to prune up to height X.

Updated the python test (pruning.py).

jonasschnelli added the label Block storage on Apr 13, 2016

jonasschnelli commented at 2:30 pm on April 13, 2016: contributor

Nice! Concept ACK.

instagibbs commented at 8:13 pm on April 13, 2016: member

What use cases are motivating this mode?

laanwj commented at 6:28 am on April 15, 2016: member

Concept ACK

What use cases are motivating this mode?

This is explained in the original issue, #7365: if you have another application that needs to consume the block data before it’s gone.

instagibbs commented at 11:12 am on April 15, 2016: member

Should have read the issue, sorry. Concept ACK

sipa commented at 2:30 pm on June 2, 2016: member

Concept ACK. Needs rebase.

mrbandrews force-pushed on Jun 8, 2016

mrbandrews commented at 6:26 pm on June 8, 2016: contributor

Rebased.

in src/main.cpp: in 2695b56531 outdated

3617@@ -3617,10 +3618,44 @@ void UnlinkPrunedFiles(std::set<int>& setFilesToPrune)
3618     }
3619 }
3620 
3621+/* This function is called from the RPC code for pruneblockchain */
3622+void PruneBlockFilesManual(int manualPruneHeight)
3623+{
3624+    // stuff the prune height into a global variable, and flush

rebroad commented at 10:16 am on August 24, 2016:

I usually get told off for using global variables. Curious to know when they are permitted and when they are discouraged.

sipa commented at 11:58 am on August 25, 2016: member

Needs rebase.

mrbandrews force-pushed on Sep 7, 2016

in src/init.cpp: in 518a942674 outdated

1337@@ -1335,7 +1338,7 @@ bool AppInit2(boost::thread_group& threadGroup, CScheduler& scheduler)
1338 
1339                 // Check for changed -prune state.  What we are concerned about is a user who has pruned blocks
1340                 // in the past, but is now trying to run unpruned.
1341-                if (fHavePruned && !fPruneMode) {
1342+                if (fHavePruned && pruneMode==PRUNE_NONE) {

luke-jr commented at 9:47 pm on October 4, 2016:

Many places, you left (pruneMode) in a boolean context. I don’t see a need to change to == NONE here.

in src/main.cpp: in 518a942674 outdated

70@@ -71,13 +71,14 @@ bool fImporting = false;
71 bool fReindex = false;
72 bool fTxIndex = false;
73 bool fHavePruned = false;
74-bool fPruneMode = false;
75+PruneMode pruneMode = PRUNE_NONE;

luke-jr commented at 9:48 pm on October 4, 2016:

A bit ugly to have the enum and variable differ only by case.

laanwj commented at 8:29 am on November 21, 2016:

I think it’s fine.

in src/main.cpp: in 518a942674 outdated

77 bool fRequireStandard = true;
78 bool fCheckBlockIndex = false;
79 bool fCheckpointsEnabled = DEFAULT_CHECKPOINTS_ENABLED;
80 size_t nCoinCacheUsage = 5000 * 300;
81 uint64_t nPruneTarget = 0;
82+unsigned int nManualPruneHeight = 0;

luke-jr commented at 9:50 pm on October 4, 2016:

This appears to be used exclusively for passing a value from a caller to a callee, so it shouldn’t be a global variable.

in src/main.cpp: in 518a942674 outdated

3842@@ -3842,10 +3843,44 @@ void UnlinkPrunedFiles(std::set<int>& setFilesToPrune)
3843     }
3844 }
3845 
3846+/* This function is called from the RPC code for pruneblockchain */
3847+void PruneBlockFilesManual(int manualPruneHeight)

luke-jr commented at 9:59 pm on October 4, 2016:

int is the wrong type here. It is only guaranteed to hold up to 32768, which isn’t very useful for block heights.

Lock heights will break before 29 bits are exceeded, so I suggest using either unsigned long or uint32_t

in src/main.cpp: in 518a942674 outdated

3878-void FindFilesToPrune(std::set<int>& setFilesToPrune, uint64_t nPruneAfterHeight)
3879+void FindFilesToPruneAuto(std::set<int>& setFilesToPrune)
3880 {
3881     LOCK2(cs_main, cs_LastBlockFile);
3882+    const CChainParams& chainparams = Params();
3883+    uint64_t nPruneAfterHeight = chainparams.PruneAfterHeight();

luke-jr commented at 10:08 pm on October 4, 2016:

We’re trying to reduce calls to global Params() by passing things as arguments. This change has the opposite effect for no clear reason.

in src/rpc/blockchain.cpp: in 518a942674 outdated

685+
686+    LOCK(cs_main);
687+
688+    int height = params[0].get_int();
689+    if (height < 0) {
690+        throw JSONRPCError(RPC_INTERNAL_ERROR, "Negative block height.");

luke-jr commented at 10:11 pm on October 4, 2016:

RPC_INVALID_PARAMETER seems more appropriate here.

in src/rpc/blockchain.cpp: in 518a942674 outdated

688+    int height = params[0].get_int();
689+    if (height < 0) {
690+        throw JSONRPCError(RPC_INTERNAL_ERROR, "Negative block height.");
691+    } else if ((unsigned int) chainActive.Height() < Params().PruneAfterHeight()) {
692+        throw JSONRPCError(RPC_INTERNAL_ERROR, "Blockchain is too short for pruning.");
693+    }

luke-jr commented at 10:12 pm on October 4, 2016:

Should this also check if the requested height is too close to the tip? Or maybe just return the height it was able to successfully prune up to…

in src/rpc/blockchain.cpp: in 518a942674 outdated

677+            "pruneblockchain\n"
678+            "\nArguments:\n"
679+            "1. \"height\"       (int, required) The block height to prune up to.\n");
680+
681+    if (pruneMode != PRUNE_MANUAL) {
682+        throw JSONRPCError(RPC_INTERNAL_ERROR, "Cannot prune via RPC unless in manual prune mode.");

luke-jr commented at 10:14 pm on October 4, 2016:

RPC_METHOD_NOT_FOUND seems better for this (already used for wallet RPCs when the wallet is not enabled)

luke-jr changes_requested

luke-jr commented at 10:18 pm on October 4, 2016: member

Oh, it might make sense to add “prunemode”: “auto”/“manual” to getblockchaininfo also.

petertodd commented at 11:28 pm on October 13, 2016: contributor

Concept ACK

My OpenTimestamps Server could use this!

mrbandrews force-pushed on Oct 26, 2016

mrbandrews commented at 6:33 pm on October 26, 2016: contributor

Rebased and feedback addressed. I didn’t make a couple of the suggested changes, though:

“int” for block height is used elsewhere in the code (chain.h/chain.cpp); perhaps a future PR could change the type used in each place.
Params() - this was just a code move, although I had left one unnecessary line of code in there, which I removed now.
heights close to tip - it will prune up to the limit and log.

in src/init.cpp: in 98470f9d5b outdated

927@@ -928,12 +928,15 @@ bool AppInit2(boost::thread_group& threadGroup, CScheduler& scheduler)
928         return InitError(_("Prune cannot be configured with a negative value."));
929     }
930     nPruneTarget = (uint64_t) nSignedPruneTarget;
931-    if (nPruneTarget) {
932+    if (nPruneTarget / 1024 / 1024 == 1) {  // manual pruning: -prune=1
933+        LogPrintf("Manual block pruning enabled.  Use RPC call pruneblockchain=<height> to prune block and undo files.\n");

laanwj commented at 8:20 am on November 21, 2016:

Nit: Use RPC call pruneblockchain(height) - otherwise the syntax can be easily confused with a command line option.

in src/main.h: in 98470f9d5b outdated

190@@ -191,8 +191,10 @@ static const uint64_t nMinDiskSpace = 52428800;
191 /** Pruning-related variables and constants */
192 /** True if any block files have ever been pruned. */
193 extern bool fHavePruned;
194-/** True if we're running in -prune mode. */
195-extern bool fPruneMode;
196+enum PruneMode {PRUNE_NONE = 0, PRUNE_AUTO, PRUNE_MANUAL };

laanwj commented at 8:21 am on November 21, 2016:

Let’s use C++11 scoped enums in new code

0enum struct PruneMode { NONE=0, AUTO, MANUAL };

Then refer to PruneMode::NONE etc.

in src/init.cpp: in 98470f9d5b outdated

927@@ -928,12 +928,15 @@ bool AppInit2(boost::thread_group& threadGroup, CScheduler& scheduler)
928         return InitError(_("Prune cannot be configured with a negative value."));
929     }
930     nPruneTarget = (uint64_t) nSignedPruneTarget;
931-    if (nPruneTarget) {
932+    if (nPruneTarget / 1024 / 1024 == 1) {  // manual pruning: -prune=1

laanwj commented at 8:25 am on November 21, 2016:

Can we please store and compare the option value before multiplication? Doing a division here, though it achieves the correct effect, seems circuitous and unclear.

in src/main.cpp: in 98470f9d5b outdated

3319@@ -3317,7 +3320,7 @@ bool FindBlockPos(CValidationState &state, CDiskBlockPos &pos, unsigned int nAdd
3320         unsigned int nOldChunks = (pos.nPos + BLOCKFILE_CHUNK_SIZE - 1) / BLOCKFILE_CHUNK_SIZE;
3321         unsigned int nNewChunks = (vinfoBlockFile[nFile].nSize + BLOCKFILE_CHUNK_SIZE - 1) / BLOCKFILE_CHUNK_SIZE;
3322         if (nNewChunks > nOldChunks) {
3323-            if (fPruneMode)
3324+            if (pruneMode == PRUNE_AUTO)

laanwj commented at 8:27 am on November 21, 2016:

Are you sure this only needs to be done in auto-pruning mode?

in src/wallet/rpcdump.cpp: in 98470f9d5b outdated

114@@ -115,7 +115,7 @@ UniValue importprivkey(const JSONRPCRequest& request)
115     if (request.params.size() > 2)
116         fRescan = request.params[2].get_bool();
117 
118-    if (fRescan && fPruneMode)
119+    if (fRescan && pruneMode)

laanwj commented at 8:29 am on November 21, 2016:

As you already have to change every line on which the variable occurs anyhow, I’d prefer explicitly comparing the enumeration against a value instead of using it like a boolean: e.g. pruneMode != PruneMode::NONE. This makes it clear to developers reading the code that this is an enumeration and not a boolean and can avoid them introducing silly bugs.

in src/main.cpp: in 98470f9d5b outdated

3853@@ -3851,10 +3854,40 @@ void UnlinkPrunedFiles(std::set<int>& setFilesToPrune)
3854     }
3855 }
3856 
3857+/* Calculate the block/rev files to delete based on height specified by user with RPC command pruneblockchain */
3858+void FindFilesToPruneManual(std::set<int>& setFilesToPrune, int nManualPruneHeight)
3859+{
3860+    LOCK2(cs_main, cs_LastBlockFile);
3861+    if (chainActive.Tip() == NULL || pruneMode != PRUNE_MANUAL) {

laanwj commented at 8:31 am on November 21, 2016:

Calling this function with pruneMode != PRUNE_MANUAL should be an assertion error, as it must be a bug in the code.

in src/main.h: in 98470f9d5b outdated

279@@ -273,7 +280,8 @@ CAmount GetBlockSubsidy(int nHeight, const Consensus::Params& consensusParams);
280  *
281  * @param[out]   setFilesToPrune   The set of file indices that can be unlinked will be returned
282  */
283-void FindFilesToPrune(std::set<int>& setFilesToPrune, uint64_t nPruneAfterHeight);
284+void FindFilesToPruneAuto(std::set<int>& setFilesToPrune);
285+void FindFilesToPruneManual(std::set<int>& setFilesToPrune, int nManualPruneHeight);

laanwj commented at 8:36 am on November 21, 2016:

Aside: these functions are only used within main.cpp, why are we exporting them? (thinking about it, let’s just keep it for now, it seems common in main.cpp to export everything whether necessary or not, and changing will probably interfere with attempts of splitting up main such as #9183)

sipa commented at 6:51 pm on November 28, 2016:

There are many functions in main that are not exported (see the whole namespace block for the handling of NodeState, for example).

in src/rpc/blockchain.cpp: in 98470f9d5b outdated

822+            "\nArguments:\n"
823+            "1. \"height\"       (int, required) The block height to prune up to.\n");
824+
825+    if (pruneMode != PRUNE_MANUAL) {
826+        throw JSONRPCError(RPC_METHOD_NOT_FOUND, "Cannot prune via RPC unless in manual prune mode.");
827+        return 0;

laanwj commented at 8:39 am on November 21, 2016:

No need for a return if you have a throw

in src/rpc/blockchain.cpp: in 98470f9d5b outdated

832+    int height = request.params[0].get_int();
833+    if (height < 0) {
834+        throw JSONRPCError(RPC_INVALID_PARAMETER, "Negative block height.");
835+    } else if ((unsigned int) chainActive.Height() < Params().PruneAfterHeight()) {
836+        throw JSONRPCError(RPC_INTERNAL_ERROR, "Blockchain is too short for pruning.");
837+    }

laanwj commented at 8:48 am on November 21, 2016:

Do we need any foot-shooting checks: e.g. that the last 144 blocks are being retained to be robust against reorgs?

Edit: apparently the check for MIN_BLOCKS_TO_KEEP happens deeper in the pruning logic, and it continues in this case by pruning the allowed blocks only. Ok, makes sense I think, although a warning in the log may make it more transparent what happens.

mrbandrews commented at 4:31 pm on November 21, 2016: contributor

Feedback addressed. Re: the code in FindBlockPos (which reset fCheckForPruning only in auto-prune mode), yes that should only be auto-prune. I designed manual pruning to be a one-time act of pruning to the specified height. (In looking at this code, though, I noticed that nManualPruneHeight should probably be set=0 after we do the manual pruning, so I added that line of code, in a separate commit.)

So, there’s a commit responding to laanwj’s feedback, and another with that one line of code.

mrbandrews force-pushed on Nov 28, 2016

mrbandrews commented at 4:06 pm on November 28, 2016: contributor

There was a merge conflict (the declaration of FlushStateToDisk near the top of main.cpp conflicted with my adding an optional parameter), so I rebased and squashed everything. This should work now.

in src/main.cpp: in 342262d9a9 outdated

4174@@ -4140,7 +4175,7 @@ bool CVerifyDB::VerifyDB(const CChainParams& chainparams, CCoinsView *coinsview,
4175         uiInterface.ShowProgress(_("Verifying blocks..."), percentageDone);
4176         if (pindex->nHeight < chainActive.Height()-nCheckDepth)
4177             break;
4178-        if (fPruneMode && !(pindex->nStatus & BLOCK_HAVE_DATA)) {
4179+        if (pruneMode != PruneMode::NONE && !(pindex->nStatus & BLOCK_HAVE_DATA)) {

sipa commented at 6:49 pm on November 28, 2016:

FindFilesToPruneAuto is only called when pruneMode == PruneMode::AUTO, so why do we need this test? Maybe turn it into an assert at the beginning of the function.

sipa commented at 6:50 pm on November 28, 2016:

Also, if (pruneMode) is equivalent to if (pruneMode != PruneMode::NONE), since PruneMode::NONE is explicitly defined as 0.

(applies to many changed lines in this PR)

in src/main.h: in 342262d9a9 outdated

190@@ -191,8 +191,10 @@ static const uint64_t nMinDiskSpace = 52428800;
191 /** Pruning-related variables and constants */
192 /** True if any block files have ever been pruned. */
193 extern bool fHavePruned;
194-/** True if we're running in -prune mode. */
195-extern bool fPruneMode;
196+enum struct PruneMode {NONE=0, AUTO, MANUAL};

sipa commented at 6:55 pm on November 28, 2016:

Meta question: why do we need a tristate here? I think we could allow manual pruning even when in ‘auto` mode. In that case, manual-only pruning could be requested by setting the limit very high.

mrbandrews force-pushed on Dec 1, 2016

mrbandrews commented at 7:17 pm on December 1, 2016: contributor

So, originally I thought it was simpler to make manual pruning a separate mode from autoprune. After sipa’s question and giving this a little more thought, I agree that it should be the same. So, now setting prune=1 (manual pruning) means autoprune with target=max.

Sorry for the late change in approach. This makes the diff smaller, though.

Also, now I only expose in main.h that which is used elsewhere.

mrbandrews force-pushed on Dec 7, 2016

mrbandrews commented at 3:48 pm on December 7, 2016: contributor

Rebased.

in qa/rpc-tests/pruning.py: in 485684c748 outdated

30@@ -31,6 +31,10 @@ def __init__(self):
31         self.utxo_cache_0 = []
32         self.utxo_cache_1 = []
33 
34+    def setup_chain(self):

ryanofsky commented at 9:43 pm on December 13, 2016:

Do you actually need to override this method? Would it work to just change num_nodes from 3 to 5 in the constructor above? Maybe add a comment here if keeping this is necessary.

in qa/rpc-tests/pruning.py: in 485684c748 outdated

298+            start_node(2, self.options.tmpdir, ["-debug=1","-prune=550"])
299+            print("Success")
300+        except Exception as detail:
301+            raise AssertionError("Wallet test: unable to re-start the pruning node")
302+
303+        # connect a new node for IBD while pruning, then check wallet (see issue #7494)

ryanofsky commented at 9:59 pm on December 13, 2016:

I think this would be clearer if the comment said specifically what was being tested and what the connection to #7494 is, e.g. “Check that wallet loads loads successfully when restarting a pruned node after an IBD. This was reported to fail in #7494.”

in qa/rpc-tests/pruning.py: in 485684c748 outdated

231+        self.nodes[3] = start_node(3, self.options.tmpdir, ["-debug=0","-prune=1"], timewait=900)
232+        assert_equal(self.nodes[3].getblockcount(), 995)
233+
234+        # should not prune because chain tip of node 3 (995) < PruneAfterHeight (1000)
235+        try:
236+            self.nodes[3].pruneblockchain(500)

ryanofsky commented at 10:06 pm on December 13, 2016:

Might be good to check for the specific error messages here and below, e.g. assert_raises_message(JSONRPCException, "Blockchain is too short for pruning.", self.nodes[3].pruneblockchain, 500)

in src/rpc/blockchain.cpp: in 485684c748 outdated

821+            "pruneblockchain\n"
822+            "\nArguments:\n"
823+            "1. \"height\"       (int, required) The block height to prune up to.\n");
824+
825+    if (!fPruneMode)
826+        throw JSONRPCError(RPC_METHOD_NOT_FOUND, "Cannot prune blocks because node is not in prune mode.");

ryanofsky commented at 10:07 pm on December 13, 2016:

Could add a python test for this condition.

ryanofsky approved

ryanofsky commented at 10:11 pm on December 13, 2016: member

Lightly tested ACK 485684c748b477caee06bcaad207eeb805b6c3a2 (just ran pruning.py).

Code looks really good. I made a few suggestions you could consider, but they’re all on the python test.

mrbandrews force-pushed on Dec 14, 2016

mrbandrews commented at 2:18 pm on December 14, 2016: contributor

Added a commit with edits to python test per ryanofsky review.

neocogent commented at 9:19 am on December 29, 2016: none

This is very useful for me. I’m doing some testing on this now. My application is a sql db on top of the blockchain that depends on processing incoming blocks. After processed into sql the block data is never used again so it’s ideal if it can be pruned to save space, especially as I’m paying per GB on a cloud server.

Manual pruning allows me to start processing blocks while still syncing the chain without worry about blocks being pruned before I get to them. I had a workaround for this that disabled network access when the pruning was approaching the current block being worked on. So this is a much nicer approach and I feel there are probably more use cases for this.

I noticed that the man pages are not updated with this. The cmd line help does indicate prune=1 available but the man page misses that updated detail. So it may be nice to update that page too.

Currently it appears to be working but I am still in progress with a new full sync and I’m watching that pruning of block files trails behind my block processor as expected.

ryanofsky force-pushed on Jan 3, 2017

ryanofsky commented at 3:28 pm on January 3, 2017: member

I rebased this, updated the man pages, and squashed the test commit.

ryanofsky force-pushed on Jan 6, 2017

gmaxwell commented at 3:40 am on January 8, 2017: contributor

utACK. Great feature!

gmaxwell commented at 3:52 am on January 8, 2017: contributor

One question: I view this feature as the perfect compliment to importmulti, that lets you be sure you’ve imported all your keys before you prune. But importmulti takes its scanning argument as a timestamp, while this takes it’s argument as a height. Should we also support a timestamp option here that uses the same criteria as import multi?

Edit: I think I will fix this by adding a height based option to importmulti.

Add pruneblockchain RPC to enable manual block file pruning. 1fc4ec7bf2

fixup! Add pruneblockchain RPC to enable manual block file pruning.

Extend pruneblockchain RPC to accept block timestamps as well as block indices.

afffeea7d9

ryanofsky force-pushed on Jan 10, 2017

ryanofsky commented at 10:32 pm on January 10, 2017: member

Rebased for named arguments. @gmaxwell, I extended pruneblockchain in afffeea7d98ba358acd54a89bc0e7ae1c4d54023 to be able to take a timestamp instead of a block index. The code change is pretty small. The test change is bigger but straightforward.

laanwj commented at 1:16 pm on January 11, 2017: member

utACK afffeea

laanwj merged this on Jan 11, 2017

laanwj closed this on Jan 11, 2017

laanwj referenced this in commit e2e624d9ce on Jan 11, 2017

in src/rpc/blockchain.cpp: in afffeea7d9

846+    if (chainHeight < Params().PruneAfterHeight())
847+        throw JSONRPCError(RPC_INTERNAL_ERROR, "Blockchain is too short for pruning.");
848+    else if (height > chainHeight)
849+        throw JSONRPCError(RPC_INVALID_PARAMETER, "Blockchain is shorter than the attempted prune height.");
850+    else if (height > chainHeight - MIN_BLOCKS_TO_KEEP)
851+        LogPrint("rpc", "Attempt to prune blocks close to the tip.  Retaining the minimum number of blocks.");

jonasschnelli commented at 1:30 pm on January 11, 2017:

one post merge nit: We should probably report the pruned height in case the user given value was overrode.

ryanofsky commented at 7:32 pm on January 11, 2017:

Done in https://github.com/bitcoin/bitcoin/pull/9518

ryanofsky referenced this in commit 1c328f8de8 on Jan 11, 2017

ryanofsky referenced this in commit 918d1fb86b on Jan 11, 2017

codablock referenced this in commit 54b384b7ce on Jan 19, 2018

codablock referenced this in commit 32681a826d on Jan 20, 2018

codablock referenced this in commit 88b9aa45de on Jan 21, 2018

lateminer referenced this in commit 349d33ca55 on Jan 4, 2019

andvgal referenced this in commit 0593123342 on Jan 6, 2019

CryptoCentric referenced this in commit 04db1389a6 on Feb 27, 2019

CryptoCentric referenced this in commit e11d1ae195 on Feb 27, 2019

CryptoCentric referenced this in commit 56dc1cec07 on Mar 5, 2019

CryptoCentric referenced this in commit e961199b19 on Mar 5, 2019

MarcoFalke locked this on Sep 8, 2021

Manual block file pruning. #7871