Implements #7365.
Now there is auto-prune and manual-prune. The user enables manual pruning on the command line with prune=1 and then uses an RPC command to prune: "pruneblockchain X" to prune up to height X.
Updated the python test (pruning.py).
Implements #7365.
Now there is auto-prune and manual-prune. The user enables manual pruning on the command line with prune=1 and then uses an RPC command to prune: "pruneblockchain X" to prune up to height X.
Updated the python test (pruning.py).
Nice! Concept ACK.
What use cases are motivating this mode?
Should have read the issue, sorry. Concept ACK
Concept ACK. Needs rebase.
Rebased.
3617 | @@ -3617,10 +3618,44 @@ void UnlinkPrunedFiles(std::set<int>& setFilesToPrune) 3618 | } 3619 | } 3620 | 3621 | +/* This function is called from the RPC code for pruneblockchain */ 3622 | +void PruneBlockFilesManual(int manualPruneHeight) 3623 | +{ 3624 | + // stuff the prune height into a global variable, and flush
I usually get told off for using global variables. Curious to know when they are permitted and when they are discouraged.
Needs rebase.
1337 | @@ -1335,7 +1338,7 @@ bool AppInit2(boost::thread_group& threadGroup, CScheduler& scheduler) 1338 | 1339 | // Check for changed -prune state. What we are concerned about is a user who has pruned blocks 1340 | // in the past, but is now trying to run unpruned. 1341 | - if (fHavePruned && !fPruneMode) { 1342 | + if (fHavePruned && pruneMode==PRUNE_NONE) {
Many places, you left (pruneMode) in a boolean context. I don't see a need to change to == NONE here.
70 | @@ -71,13 +71,14 @@ bool fImporting = false; 71 | bool fReindex = false; 72 | bool fTxIndex = false; 73 | bool fHavePruned = false; 74 | -bool fPruneMode = false; 75 | +PruneMode pruneMode = PRUNE_NONE;
A bit ugly to have the enum and variable differ only by case.
I think it's fine.
77 | bool fRequireStandard = true;
78 | bool fCheckBlockIndex = false;
79 | bool fCheckpointsEnabled = DEFAULT_CHECKPOINTS_ENABLED;
80 | size_t nCoinCacheUsage = 5000 * 300;
81 | uint64_t nPruneTarget = 0;
82 | +unsigned int nManualPruneHeight = 0;
This appears to be used exclusively for passing a value from a caller to a callee, so it shouldn't be a global variable.
3842 | @@ -3842,10 +3843,44 @@ void UnlinkPrunedFiles(std::set<int>& setFilesToPrune) 3843 | } 3844 | } 3845 | 3846 | +/* This function is called from the RPC code for pruneblockchain */ 3847 | +void PruneBlockFilesManual(int manualPruneHeight)
int is the wrong type here. It is only guaranteed to hold up to 32768, which isn't very useful for block heights.
Lock heights will break before 29 bits are exceeded, so I suggest using either unsigned long or uint32_t
3878 | -void FindFilesToPrune(std::set<int>& setFilesToPrune, uint64_t nPruneAfterHeight) 3879 | +void FindFilesToPruneAuto(std::set<int>& setFilesToPrune) 3880 | { 3881 | LOCK2(cs_main, cs_LastBlockFile); 3882 | + const CChainParams& chainparams = Params(); 3883 | + uint64_t nPruneAfterHeight = chainparams.PruneAfterHeight();
We're trying to reduce calls to global Params() by passing things as arguments. This change has the opposite effect for no clear reason.
685 | + 686 | + LOCK(cs_main); 687 | + 688 | + int height = params[0].get_int(); 689 | + if (height < 0) { 690 | + throw JSONRPCError(RPC_INTERNAL_ERROR, "Negative block height.");
RPC_INVALID_PARAMETER seems more appropriate here.
688 | + int height = params[0].get_int(); 689 | + if (height < 0) { 690 | + throw JSONRPCError(RPC_INTERNAL_ERROR, "Negative block height."); 691 | + } else if ((unsigned int) chainActive.Height() < Params().PruneAfterHeight()) { 692 | + throw JSONRPCError(RPC_INTERNAL_ERROR, "Blockchain is too short for pruning."); 693 | + }
Should this also check if the requested height is too close to the tip? Or maybe just return the height it was able to successfully prune up to...
677 | + "pruneblockchain\n" 678 | + "\nArguments:\n" 679 | + "1. \"height\" (int, required) The block height to prune up to.\n"); 680 | + 681 | + if (pruneMode != PRUNE_MANUAL) { 682 | + throw JSONRPCError(RPC_INTERNAL_ERROR, "Cannot prune via RPC unless in manual prune mode.");
RPC_METHOD_NOT_FOUND seems better for this (already used for wallet RPCs when the wallet is not enabled)
Oh, it might make sense to add "prunemode": "auto"/"manual" to getblockchaininfo also.
Concept ACK
My OpenTimestamps Server could use this!
Rebased and feedback addressed. I didn't make a couple of the suggested changes, though:
927 | @@ -928,12 +928,15 @@ bool AppInit2(boost::thread_group& threadGroup, CScheduler& scheduler) 928 | return InitError(_("Prune cannot be configured with a negative value.")); 929 | } 930 | nPruneTarget = (uint64_t) nSignedPruneTarget; 931 | - if (nPruneTarget) { 932 | + if (nPruneTarget / 1024 / 1024 == 1) { // manual pruning: -prune=1 933 | + LogPrintf("Manual block pruning enabled. Use RPC call pruneblockchain=<height> to prune block and undo files.\n");
Nit: Use RPC call pruneblockchain(height) - otherwise the syntax can be easily confused with a command line option.
190 | @@ -191,8 +191,10 @@ static const uint64_t nMinDiskSpace = 52428800; 191 | /** Pruning-related variables and constants */ 192 | /** True if any block files have ever been pruned. */ 193 | extern bool fHavePruned; 194 | -/** True if we're running in -prune mode. */ 195 | -extern bool fPruneMode; 196 | +enum PruneMode {PRUNE_NONE = 0, PRUNE_AUTO, PRUNE_MANUAL };
Let's use C++11 scoped enums in new code
enum struct PruneMode { NONE=0, AUTO, MANUAL };
Then refer to PruneMode::NONE etc.
927 | @@ -928,12 +928,15 @@ bool AppInit2(boost::thread_group& threadGroup, CScheduler& scheduler) 928 | return InitError(_("Prune cannot be configured with a negative value.")); 929 | } 930 | nPruneTarget = (uint64_t) nSignedPruneTarget; 931 | - if (nPruneTarget) { 932 | + if (nPruneTarget / 1024 / 1024 == 1) { // manual pruning: -prune=1
Can we please store and compare the option value before multiplication? Doing a division here, though it achieves the correct effect, seems circuitous and unclear.
3319 | @@ -3317,7 +3320,7 @@ bool FindBlockPos(CValidationState &state, CDiskBlockPos &pos, unsigned int nAdd 3320 | unsigned int nOldChunks = (pos.nPos + BLOCKFILE_CHUNK_SIZE - 1) / BLOCKFILE_CHUNK_SIZE; 3321 | unsigned int nNewChunks = (vinfoBlockFile[nFile].nSize + BLOCKFILE_CHUNK_SIZE - 1) / BLOCKFILE_CHUNK_SIZE; 3322 | if (nNewChunks > nOldChunks) { 3323 | - if (fPruneMode) 3324 | + if (pruneMode == PRUNE_AUTO)
Are you sure this only needs to be done in auto-pruning mode?
114 | @@ -115,7 +115,7 @@ UniValue importprivkey(const JSONRPCRequest& request) 115 | if (request.params.size() > 2) 116 | fRescan = request.params[2].get_bool(); 117 | 118 | - if (fRescan && fPruneMode) 119 | + if (fRescan && pruneMode)
As you already have to change every line on which the variable occurs anyhow, I'd prefer explicitly comparing the enumeration against a value instead of using it like a boolean: e.g. pruneMode != PruneMode::NONE. This makes it clear to developers reading the code that this is an enumeration and not a boolean and can avoid them introducing silly bugs.
3853 | @@ -3851,10 +3854,40 @@ void UnlinkPrunedFiles(std::set<int>& setFilesToPrune) 3854 | } 3855 | } 3856 | 3857 | +/* Calculate the block/rev files to delete based on height specified by user with RPC command pruneblockchain */ 3858 | +void FindFilesToPruneManual(std::set<int>& setFilesToPrune, int nManualPruneHeight) 3859 | +{ 3860 | + LOCK2(cs_main, cs_LastBlockFile); 3861 | + if (chainActive.Tip() == NULL || pruneMode != PRUNE_MANUAL) {
Calling this function with pruneMode != PRUNE_MANUAL should be an assertion error, as it must be a bug in the code.
279 | @@ -273,7 +280,8 @@ CAmount GetBlockSubsidy(int nHeight, const Consensus::Params& consensusParams); 280 | * 281 | * @param[out] setFilesToPrune The set of file indices that can be unlinked will be returned 282 | */ 283 | -void FindFilesToPrune(std::set<int>& setFilesToPrune, uint64_t nPruneAfterHeight); 284 | +void FindFilesToPruneAuto(std::set<int>& setFilesToPrune); 285 | +void FindFilesToPruneManual(std::set<int>& setFilesToPrune, int nManualPruneHeight);
Aside: these functions are only used within main.cpp, why are we exporting them?
(thinking about it, let's just keep it for now, it seems common in main.cpp to export everything whether necessary or not, and changing will probably interfere with attempts of splitting up main such as #9183)
There are many functions in main that are not exported (see the whole namespace block for the handling of NodeState, for example).
822 | + "\nArguments:\n" 823 | + "1. \"height\" (int, required) The block height to prune up to.\n"); 824 | + 825 | + if (pruneMode != PRUNE_MANUAL) { 826 | + throw JSONRPCError(RPC_METHOD_NOT_FOUND, "Cannot prune via RPC unless in manual prune mode."); 827 | + return 0;
No need for a return if you have a throw
832 | + int height = request.params[0].get_int(); 833 | + if (height < 0) { 834 | + throw JSONRPCError(RPC_INVALID_PARAMETER, "Negative block height."); 835 | + } else if ((unsigned int) chainActive.Height() < Params().PruneAfterHeight()) { 836 | + throw JSONRPCError(RPC_INTERNAL_ERROR, "Blockchain is too short for pruning."); 837 | + }
Do we need any foot-shooting checks: e.g. that the last 144 blocks are being retained to be robust against reorgs?
Edit: apparently the check for MIN_BLOCKS_TO_KEEP happens deeper in the pruning logic, and it continues in this case by pruning the allowed blocks only. Ok, makes sense I think, although a warning in the log may make it more transparent what happens.
Feedback addressed. Re: the code in FindBlockPos (which reset fCheckForPruning only in auto-prune mode), yes that should only be auto-prune. I designed manual pruning to be a one-time act of pruning to the specified height. (In looking at this code, though, I noticed that nManualPruneHeight should probably be set=0 after we do the manual pruning, so I added that line of code, in a separate commit.)
So, there's a commit responding to laanwj's feedback, and another with that one line of code.
There was a merge conflict (the declaration of FlushStateToDisk near the top of main.cpp conflicted with my adding an optional parameter), so I rebased and squashed everything. This should work now.
4174 | @@ -4140,7 +4175,7 @@ bool CVerifyDB::VerifyDB(const CChainParams& chainparams, CCoinsView *coinsview, 4175 | uiInterface.ShowProgress(_("Verifying blocks..."), percentageDone); 4176 | if (pindex->nHeight < chainActive.Height()-nCheckDepth) 4177 | break; 4178 | - if (fPruneMode && !(pindex->nStatus & BLOCK_HAVE_DATA)) { 4179 | + if (pruneMode != PruneMode::NONE && !(pindex->nStatus & BLOCK_HAVE_DATA)) {
FindFilesToPruneAuto is only called when pruneMode == PruneMode::AUTO, so why do we need this test? Maybe turn it into an assert at the beginning of the function.
Also, if (pruneMode) is equivalent to if (pruneMode != PruneMode::NONE), since PruneMode::NONE is explicitly defined as 0.
(applies to many changed lines in this PR)
190 | @@ -191,8 +191,10 @@ static const uint64_t nMinDiskSpace = 52428800; 191 | /** Pruning-related variables and constants */ 192 | /** True if any block files have ever been pruned. */ 193 | extern bool fHavePruned; 194 | -/** True if we're running in -prune mode. */ 195 | -extern bool fPruneMode; 196 | +enum struct PruneMode {NONE=0, AUTO, MANUAL};
Meta question: why do we need a tristate here? I think we could allow manual pruning even when in 'auto` mode. In that case, manual-only pruning could be requested by setting the limit very high.
So, originally I thought it was simpler to make manual pruning a separate mode from autoprune. After sipa's question and giving this a little more thought, I agree that it should be the same. So, now setting prune=1 (manual pruning) means autoprune with target=max.
Sorry for the late change in approach. This makes the diff smaller, though.
Also, now I only expose in main.h that which is used elsewhere.
Rebased.
30 | @@ -31,6 +31,10 @@ def __init__(self): 31 | self.utxo_cache_0 = [] 32 | self.utxo_cache_1 = [] 33 | 34 | + def setup_chain(self):
Do you actually need to override this method? Would it work to just change num_nodes from 3 to 5 in the constructor above? Maybe add a comment here if keeping this is necessary.
298 | + start_node(2, self.options.tmpdir, ["-debug=1","-prune=550"]) 299 | + print("Success") 300 | + except Exception as detail: 301 | + raise AssertionError("Wallet test: unable to re-start the pruning node") 302 | + 303 | + # connect a new node for IBD while pruning, then check wallet (see issue #7494)
231 | + self.nodes[3] = start_node(3, self.options.tmpdir, ["-debug=0","-prune=1"], timewait=900) 232 | + assert_equal(self.nodes[3].getblockcount(), 995) 233 | + 234 | + # should not prune because chain tip of node 3 (995) < PruneAfterHeight (1000) 235 | + try: 236 | + self.nodes[3].pruneblockchain(500)
Might be good to check for the specific error messages here and below, e.g. assert_raises_message(JSONRPCException, "Blockchain is too short for pruning.", self.nodes[3].pruneblockchain, 500)
821 | + "pruneblockchain\n" 822 | + "\nArguments:\n" 823 | + "1. \"height\" (int, required) The block height to prune up to.\n"); 824 | + 825 | + if (!fPruneMode) 826 | + throw JSONRPCError(RPC_METHOD_NOT_FOUND, "Cannot prune blocks because node is not in prune mode.");
Could add a python test for this condition.
Lightly tested ACK 485684c748b477caee06bcaad207eeb805b6c3a2 (just ran pruning.py).
Code looks really good. I made a few suggestions you could consider, but they're all on the python test.
Added a commit with edits to python test per ryanofsky review.
This is very useful for me. I'm doing some testing on this now. My application is a sql db on top of the blockchain that depends on processing incoming blocks. After processed into sql the block data is never used again so it's ideal if it can be pruned to save space, especially as I'm paying per GB on a cloud server.
Manual pruning allows me to start processing blocks while still syncing the chain without worry about blocks being pruned before I get to them. I had a workaround for this that disabled network access when the pruning was approaching the current block being worked on. So this is a much nicer approach and I feel there are probably more use cases for this.
I noticed that the man pages are not updated with this. The cmd line help does indicate prune=1 available but the man page misses that updated detail. So it may be nice to update that page too.
Currently it appears to be working but I am still in progress with a new full sync and I'm watching that pruning of block files trails behind my block processor as expected.
I rebased this, updated the man pages, and squashed the test commit.
utACK. Great feature!
One question: I view this feature as the perfect compliment to importmulti, that lets you be sure you've imported all your keys before you prune. But importmulti takes its scanning argument as a timestamp, while this takes it's argument as a height. Should we also support a timestamp option here that uses the same criteria as import multi?
Edit: I think I will fix this by adding a height based option to importmulti.
Extend pruneblockchain RPC to accept block timestamps as well as block indices.
Rebased for named arguments. @gmaxwell, I extended pruneblockchain in afffeea7d98ba358acd54a89bc0e7ae1c4d54023 to be able to take a timestamp instead of a block index. The code change is pretty small. The test change is bigger but straightforward.
utACK afffeea
846 | + if (chainHeight < Params().PruneAfterHeight()) 847 | + throw JSONRPCError(RPC_INTERNAL_ERROR, "Blockchain is too short for pruning."); 848 | + else if (height > chainHeight) 849 | + throw JSONRPCError(RPC_INVALID_PARAMETER, "Blockchain is shorter than the attempted prune height."); 850 | + else if (height > chainHeight - MIN_BLOCKS_TO_KEEP) 851 | + LogPrint("rpc", "Attempt to prune blocks close to the tip. Retaining the minimum number of blocks.");
one post merge nit: We should probably report the pruned height in case the user given value was overrode.