Addressing what I consider to be non- or not-too-controversial comments from #27596.
Let me know if I missed anything among the many comments that can be easily included here.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
For detailed information about the code coverage, see the test coverage report.
See the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
762 if (cursor) {
763- // The cursor may not exist after a snapshot has been loaded but before any
764- // blocks have been downloaded.
765 return FlushBlockFile(cursor->file_num, /*fFinalize=*/false, /*finalize_undo=*/false);
766 }
767 return false;
return true;
here in order to avoid spurious warnings?
Github ACK https://github.com/bitcoin/bitcoin/pull/28562/commits/26c815b60c8c4aec27de3fc6c7849ea3c7bdd91e
Thanks for addressing these!
161 self.sync_blocks(nodes=(n0, n1))
162
163 self.log.info("Ensuring background validation completes")
164 # N.B.: the `snapshot` key disappears once the background validation is complete.
165- wait_until_helper(lambda: not n1.getchainstates().get('snapshot'))
166+ wait_until_helper(lambda: not n1.getchainstates().get('snapshot'), timeout_factor=2)
not a fan of overwriting and hard-coding the user-picked timeout factor. Why not use self.wait_until
?
I think it could make sense to rename wait_until_helper
to wait_until_helper_internal
?
HaveTxsDownloaded
to HaveNumChainTxs
to fix the TODO in the function?
2854@@ -2855,6 +2855,9 @@ return RPCHelpMan{
2855 return data;
2856 };
2857
2858+ // We present the IBD/background chainstate as "normal" to users, since
2859+ // it should not matter to them if there is also a snapshot chainstate.
2860+ // To users, the IBD chainstate should always be considered "normal".
In commit “doc: Add and edit some comments around assumeutxo” (42d585d981350052b00001fa52a895ddca642c11)
It does not seem accurate that a user would not care if there is a snapshot chainstate, otherwise why would they have loaded a snapshot to begin with?
I’m not sure exactly which question this comment is trying to answer, but if the goal is just to explain the code maybe it can say “// If a chainstate has m_from_snapshot_blockhash set and there is more than one chainstate, then it a partially validated snapshot chainstate. Otherwise it is a fully validated normal chainstate.”
I think we have a simple misunderstanding of my wording, probably my English makes the comment harder to parse than it needs to be :)
My goal is to explain to developers why we talk about a “normal” chainstate when speaking to the user (via RPC interface) while internally referring to it as ibd/background in the code. I am trying to convey the same thing you said in your comment here: #27596 (review)
Particularly when you say The idea is for the "normal" chainstate to refer to a typical fully validated chainstate [...] I think the term "IBD chainstate" is basically meaningless because both chainstates are downloaded during IBD.
I am trying to say the same thing here: The normal chainstate is the ibd chainstate, no matter if there is a snapshot chainstate or not. Hence “it does not matter” that there is a snapshot with regards to the naming of the ibd/background/normal chainstate from the user perspective.
I am happy to take suggestions for better wording but I don’t want to just explain what the code does. I want to document the internal/external divergence in the naming of the chainstates for future developers.
re: #28562 (review)
Thanks, that is helpful. You’re convincing me it’s not a good idea to add new names and definitions in this RPC.
I think a good solution would be to get rid of the names entirely and switch to an array. This should be both more clear and more convenient for figuring out the active chain. Something like the following (draft change, untested):
0--- a/src/rpc/blockchain.cpp
1+++ b/src/rpc/blockchain.cpp
2@@ -2807,6 +2807,7 @@ const std::vector<RPCResult> RPCHelpForChainstate{
3 {RPCResult::Type::STR_HEX, "snapshot_blockhash", /*optional=*/true, "the base block of the snapshot this chainstate is based on, if any"},
4 {RPCResult::Type::NUM, "coins_db_cache_bytes", "size of the coinsdb cache"},
5 {RPCResult::Type::NUM, "coins_tip_cache_bytes", "size of the coinstip cache"},
6+ {RPCResult::Type::BOOL, "validated", "whether the chainstate is fully validated. True if all blocks in the chainstate were validated, false if the chain is based on a snapshot and the snapshot has not yet been validated."},
7 };
8
9 static RPCHelpMan getchainstates()
10@@ -2818,8 +2819,7 @@ return RPCHelpMan{
11 RPCResult{
12 RPCResult::Type::OBJ, "", "", {
13 {RPCResult::Type::NUM, "headers", "the number of headers seen so far"},
14- {RPCResult::Type::OBJ, "normal", /*optional=*/true, "fully validated chainstate containing blocks this node has validated starting from the genesis block", RPCHelpForChainstate},
15- {RPCResult::Type::OBJ, "snapshot", /*optional=*/true, "only present if an assumeutxo snapshot is loaded. Partially validated chainstate containing blocks this node has validated starting from the snapshot. After the snapshot is validated (when the 'normal' chainstate advances far enough to validate it), this chainstate will replace and become the 'normal' chainstate.", RPCHelpForChainstate},
16+ {RPCResult::Type::ARR, "chainstates", "list of the chainstates ordered by work, with the most-work (active) chainstate last", RPCHelpForChainstate},
17 }
18 },
19 RPCExamples{
20@@ -2834,7 +2834,7 @@ return RPCHelpMan{
21 NodeContext& node = EnsureAnyNodeContext(request.context);
22 ChainstateManager& chainman = *node.chainman;
23
24- auto make_chain_data = [&](const Chainstate& cs) EXCLUSIVE_LOCKS_REQUIRED(::cs_main) {
25+ auto make_chain_data = [&](const Chainstate& cs, bool validated) EXCLUSIVE_LOCKS_REQUIRED(::cs_main) {
26 AssertLockHeld(::cs_main);
27 UniValue data(UniValue::VOBJ);
28 if (!cs.m_chain.Tip()) {
29@@ -2852,20 +2852,18 @@ return RPCHelpMan{
30 if (cs.m_from_snapshot_blockhash) {
31 data.pushKV("snapshot_blockhash", cs.m_from_snapshot_blockhash->ToString());
32 }
33+ data.pushKV("validated", validated);
34 return data;
35 };
36
37- if (chainman.GetAll().size() > 1) {
38- for (Chainstate* chainstate : chainman.GetAll()) {
39- obj.pushKV(
40- chainstate->m_from_snapshot_blockhash ? "snapshot" : "normal",
41- make_chain_data(*chainstate));
42- }
43- } else {
44- obj.pushKV("normal", make_chain_data(chainman.ActiveChainstate()));
45- }
46 obj.pushKV("headers", chainman.m_best_header ? chainman.m_best_header->nHeight : -1);
47
48+ const auto& chainstates = chainman.GetAll();
49+ UniValue obj_chainstates{UniValue::VARR};
50+ for (Chainstate* cs : chainstates) {
51+ obj_chainstates.push_back(make_chain_data(*cs, !cs->m_from_snapshot_blockhash || chainstates.size() == 1));
52+ }
53+ obj.pushKV("chainstates", std::move(obj_chainstates));
54 return obj;
55 }
56 };
The reason for introducing new “normal” and “snapshot” names was just to use terms that could be defined and understood very simply. A “normal” chainstate is a chainstate where every block has been validated. A “snapshot” chainstate is a chainstate loaded from a snapshot where every block has not been validated.
Exposing these new names was a lot better than exposing our internal “ibd chainstate” and “snapshot chainstate” names which are can’t be defined without reference to temporary implementation details of our code like the fact that IBD snapshot is not deleted until the next node restart after it is done being used. I think these names are also just objectively very bad and add unneeded complexity to our code. I’m working on a change to get rid of them (fb02d9f7cc90f606516e3c3bc55ba177049c4411).
EDIT: Did some more work on the internal representation change, draft PR is #28608. The getchainstates RPC change was implemented in #28590
re: #28562 (review)
I implemented the changes I suggested above in #28590, so will mark this thread resolved to avoid unnecessarily slowing down this PR.
80@@ -81,7 +81,7 @@ def run_test(self):
81 self.sync_blocks()
82
83 def no_sync():
84- pass
85+ self.no_op
In commit “test: Increase timeout tolerance of feature_assumeutxo” (613e8317275ecb0051f9dcea57786aabe7dfdb27)
This seem strange since it is just referencing the function without calling it. Is it missing ()
? Also, it’s not clear why no_op is better than pass. Would be helpful to have a comment.
self.no_op
directly. I think it’s clearer then because we don’t need the same no_op function declaration that we already have in the framework.
155@@ -156,19 +156,19 @@ def no_sync():
156 self.connect_nodes(0, 1)
157
158 self.log.info(f"Ensuring snapshot chain syncs to tip. ({FINAL_HEIGHT})")
159- wait_until_helper(lambda: n1.getchainstates()['snapshot']['blocks'] == FINAL_HEIGHT)
160+ self.wait_until(lambda: n1.getchainstates()['snapshot']['blocks'] == FINAL_HEIGHT, timeout=120)
In commit “test: Increase timeout tolerance of feature_assumeutxo” (613e8317275ecb0051f9dcea57786aabe7dfdb27)
It would be great to eliminate all these duplicate 120s. I’m surprised the test framework doesn’t provide a way to set a default time within the unit tests, but if it doesn’t I think it would be better to just declare a WAIT_TIMEOUT = 120
constant at the top of the test, or add a helper method that calls wait_until.
wait_until
helper, I implemented this here for now but it may be better to pull it out into its own PR… What do you think about this idea @MarcoFalke ?
461@@ -462,8 +462,8 @@ void SetupServerArgs(ArgsManager& argsman)
462 argsman.AddArg("-prune=<n>", strprintf("Reduce storage requirements by enabling pruning (deleting) of old blocks. This allows the pruneblockchain RPC to be called to delete specific blocks and enables automatic pruning of old blocks if a target size in MiB is provided. This mode is incompatible with -txindex. "
463 "Warning: Reverting this setting requires re-downloading the entire blockchain. "
464 "(default: 0 = disable pruning blocks, 1 = allow manual pruning via RPC, >=%u = automatically prune block files to stay under the specified target size in MiB)", MIN_DISK_SPACE_FOR_BLOCK_FILES / 1024 / 1024), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
465- argsman.AddArg("-reindex", "Rebuild chain state and block index from the blk*.dat files on disk. This will also rebuild active optional indexes.", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
466- argsman.AddArg("-reindex-chainstate", "Rebuild chain state from the currently indexed blocks. When in pruning mode or if blocks on disk might be corrupted, use full -reindex instead.", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
467+ argsman.AddArg("-reindex", "Rebuild chain state and block index from the blk*.dat files on disk. This will also rebuild active optional indexes. Warning: Not yet validated snapshot chainstates will be removed.", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
468+ argsman.AddArg("-reindex-chainstate", "Rebuild chain state from the currently indexed blocks. When in pruning mode or if blocks on disk might be corrupted, use full -reindex instead. Warning: Not yet validated snapshot chainstates will be removed.", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
In commit “doc: Add snapshot chainstate removal warning to reindexing documentation” (2bd1d39491f8ad811dcb6d3787e1e18ca0b928ad)
These warnings seem too alarmist, and also not very actionable. The point of reindex commands is to wipe state and rebuild it. So it should not be surprising that state loaded from an assumeutxo snapshot is also wiped. I think it would be a better to add a clarification instead of a warning here, and a suggestion to reload the snapshot. Maybe: “If an assumeutxo snapshot was loaded and not yet validated, it will need to be reloaded for its chain state to be rebuilt.”
re: #28562 (comment)
Can you point back to / comment on #27596 in regards to what is fixed / followed up on here, and what still needs fixing? Given there are now a number of post-merge review comments.
This would be great, but there are a lot of open review comments on #27596 right now and this PR only addresses a small number of them, so it would be hard to compile a list of what is left and decide what is worth spending time to fix.
It would be good to comment on #27596 about what things were addressed here, though, so if someone wants to make another PR that addresses other issues that seem important, they can do that after this one.
🐙 This pull request conflicts with the target branch and needs rebase.
461@@ -462,8 +462,8 @@ void SetupServerArgs(ArgsManager& argsman)
462 argsman.AddArg("-prune=<n>", strprintf("Reduce storage requirements by enabling pruning (deleting) of old blocks. This allows the pruneblockchain RPC to be called to delete specific blocks and enables automatic pruning of old blocks if a target size in MiB is provided. This mode is incompatible with -txindex. "
463 "Warning: Reverting this setting requires re-downloading the entire blockchain. "
464 "(default: 0 = disable pruning blocks, 1 = allow manual pruning via RPC, >=%u = automatically prune block files to stay under the specified target size in MiB)", MIN_DISK_SPACE_FOR_BLOCK_FILES / 1024 / 1024), ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
465- argsman.AddArg("-reindex", "Rebuild chain state and block index from the blk*.dat files on disk. This will also rebuild active optional indexes.", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
466- argsman.AddArg("-reindex-chainstate", "Rebuild chain state from the currently indexed blocks. When in pruning mode or if blocks on disk might be corrupted, use full -reindex instead.", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
467+ argsman.AddArg("-reindex", "Rebuild chain state and block index from the blk*.dat files on disk. This will also rebuild active optional indexes. If an assumeutxo snapshot was loaded and not yet validated, it will need to be reloaded for its chain state to be rebuilt.", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS);
-reindex
and then call the RPC to load the snapshot (in time)?
re: #28562 (review)
bdee0fd: what does this look like in practice? You start with
-reindex
and then call the RPC to load the snapshot (in time)?
Right, if you want a snapshot to be used after reindexing you have to reload the snapshot. Reindexing wipes and rebuilds chain states, and without a snapshot there is no snapshot chainstate to rebuild.
Maybe the following would be clearer: “If enabled, wipe chain state and block index, and rebuild them from blk*.dat files on disk. Also wipe and rebuild other optional indexes that are active. If an assumeutxo snapshot was loaded, its chainstate will be wiped as well. The snapshot can then be reloaded via RPC.”
reindex-chainste
now.
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
- Remove usage of the internal wait_until_helper function
- Use framework self.no_op instead of new no_sync function
co-authored-by: Andrew Chow <github@achow101.com>
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Co-authored-by: MarcoFalke <falke.marco@gmail.com>
Co-authored-by: MarcoFalke <falke.marco@gmail.com>
4849- assert(pindex->pprev->nChainTx <= pindex->nChainTx);
4850- }
4851+ // Make sure nChainTx sum is correctly computed.
4852+ unsigned int prev_chain_tx = pindex->pprev ? pindex->pprev->nChainTx : 0;
4853+ assert((pindex->nChainTx == pindex->nTx + prev_chain_tx)
4854+ // For testing, allow transaction counts to be completely unset.
-chain=main -checkblockindex
on mainnet?
I removed the “testing” check and ran the main binary (not the tests) and it crashed here.
Generally I don’t think a good place to put test-only code is the main production code. Especially when it comes to consensus critical code. This makes it impossible to properly test and review the code in a production environment outside of unit tests.
I removed the “testing” check and ran the main binary (not the tests) and it crashed here.
Hm, can you say what exactly you removed and what you can to see the crash? I couldn’t reproduce that so far.
Generally I don’t think a good place to put test-only code is the main production code. Especially when it comes to consensus critical code. This makes it impossible to properly test and review the code in a production environment outside of unit tests.
I guess we should improve our test setup to include realistic nTx
and nChainTx
values so we can remove those checks again.
I used this diff IIRC, which also fails the functional tests:
0diff --git a/src/validation.cpp b/src/validation.cpp
1index 9108f911f0..3bfa5091ad 100644
2--- a/src/validation.cpp
3+++ b/src/validation.cpp
4@@ -4847,10 +4847,7 @@ void ChainstateManager::CheckBlockIndex()
5 // Make sure nChainTx sum is correctly computed.
6 unsigned int prev_chain_tx = pindex->pprev ? pindex->pprev->nChainTx : 0;
7 assert((pindex->nChainTx == pindex->nTx + prev_chain_tx)
8- // For testing, allow transaction counts to be completely unset.
9- || (pindex->nChainTx == 0 && pindex->nTx == 0)
10- // For testing, allow this nChainTx to be unset if previous is also unset.
11- || (pindex->nChainTx == 0 && prev_chain_tx == 0 && pindex->pprev));
12+ );
13
14 if (pindexFirstAssumeValid == nullptr && pindex->nStatus & BLOCK_ASSUMED_VALID) pindexFirstAssumeValid = pindex;
15 if (pindexFirstInvalid == nullptr && pindex->nStatus & BLOCK_FAILED_VALID) pindexFirstInvalid = pindex;