assumeutxo: background validation completion

jamesob commented at 2:52 pm on July 29, 2022: member

This is part of the assumeutxo project (parent PR: #15606)

Part two of replacing #24232.

When a user activates a snapshot, the serialized UTXO set data is used to create an “assumed-valid” chainstate, which becomes active in an attempt to get the node to network tip as quickly as possible. Simultaneously in the background, the already-existing chainstate continues “conventional” IBD to both accumulate full block data and serve as a belt-and-suspenders to validate the assumed-valid chainstate.

Once the background chainstate’s tip reaches the base block of the snapshot used, we set m_stop_use on that chainstate and immediately take the hash of its UTXO set; we verify that this matches the assumeutxo value in the source code. Note that while we ultimately want to remove this background chainstate, we don’t do so until the following initialization process, when we again check the UTXO set hash of the background chainstate, and if it continues to match, we remove the (now unnecessary) background chainstate, and move the (previously) assumed-valid chainstate into its place. We then reinitialize the chainstate in the normal way.

As noted in previous comments, we could do the filesystem operations “inline” immediately when the background validation completes, but that’s basically just an optimization that saves disk space until the next restart. It didn’t strike me as worth the risk of moving chainstate data around on disk during runtime of the node, though maybe my concerns are overblown.

The final result of this completion process is a fully-validated chain, where the only evidence that the user synced using assumeutxo is the existence of a base_blockhash file in the chainstate directory.

DrahtBot commented at 9:47 pm on July 29, 2022: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	achow101
Stale ACK	ryanofsky, ariard, brunoerg

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#26857 (OP_VAULT draft by jamesob)
#25977 (refactor: Replace std::optional<bilingual_str> with util::Result by ryanofsky)
#25722 (refactor: Use util::Result class for wallet loading by ryanofsky)
#25665 (refactor: Add util::Result failure values, multiple error and warning messages by ryanofsky)
#24008 (assumeutxo: net_processing changes by jamesob)
#15606 (assumeutxo by jamesob)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

jonatack commented at 8:57 am on July 30, 2022: contributor

0test/validation_chainstatemanager_tests.cpp:418:35: error: calling function 'DisconnectTip' requires holding mutex 'bg_chainstate.m_mempool->cs' exclusively [-Werror,-Wthread-safety-precise]
1        BOOST_CHECK(bg_chainstate.DisconnectTip(unused_state, &unused_pool));
2                                  ^
3test/validation_chainstatemanager_tests.cpp:418:35: note: found near match 'mempool->cs'
41 error generated.

in src/test/validation_chainstatemanager_tests.cpp:404 in 5ff0d2f7e7 outdated

395-//! - Then mark a region of the chain BLOCK_ASSUMED_VALID and introduce a second chainstate
396-//!   that will tolerate assumed-valid blocks. Run LoadBlockIndex() and ensure that the first
397-//!   chainstate only contains fully validated blocks and the other chainstate contains all blocks,
398-//!   even those assumed-valid.
399-//!
400-BOOST_FIXTURE_TEST_CASE(chainstatemanager_loadblockindex, TestChain100Setup)

ryanofsky commented at 4:26 pm on August 1, 2022:

In commit “test: add testcases for snapshot initialization” (5ff0d2f7e78a40225a1eb9aa7bbf5c358d94bf48)

Something wonky is going on in this commit. It seems like some rebase error. This is dropping the chainstatemanager_loadblockindex test, and replacing it with an exact copy of the chainstatemanager_snapshot_init below, which now appears twice (and is edited in later commit 65ad0860ffd9441b005ac4bc240f596d9a81836f to become a different chainstatemanager_snapshot_completion test)

jamesob commented at 10:47 pm on September 6, 2022:

@ryanofsky yeah, thanks - not sure what happened there. Should be fixed.

ryanofsky commented at 4:47 pm on August 1, 2022: contributor

Code review ACK d4fc4ae18c6ac4e932beed20d6a61a018a5577d0 if missing chainstatemanager_loadblockindex test is added back. First 12 commits were previously reviewed as part of #25667. Later commits were cherry-picked from #24232 without code changes

DrahtBot added the label Needs rebase on Aug 3, 2022

jamesob force-pushed on Sep 6, 2022

DrahtBot removed the label Needs rebase on Sep 6, 2022

jamesob force-pushed on Sep 6, 2022

jamesob commented at 10:47 pm on September 6, 2022: member

Clang failure on some platforms looks spurious:

0test/validation_chainstatemanager_tests.cpp:501:35: error: calling function 'DisconnectTip' requires holding mutex 'bg_chainstate.m_mempool->cs' exclusively [-Werror,-Wthread-safety-precise]
1        BOOST_CHECK(bg_chainstate.DisconnectTip(unused_state, &unused_pool));
2                                  ^
3test/validation_chainstatemanager_tests.cpp:501:35: note: found near match 'mempool->cs'
41 error generated.

but lock is taken here.

Anyone have ideas for workarounds?

ryanofsky commented at 10:52 pm on September 6, 2022: contributor

If the lock is for bg_chainstate.GetMempool()->cs but the annotation requires bg_chainstate.m_mempool->cs, the compiler probably isn’t going to know that GetMempool() and m_mempool are the same. Probably should stick with m_mempool

jamesob commented at 11:02 pm on September 6, 2022: member

Probably should stick with m_mempool

It’s protected :grimacing:. But looks like I can use the RETURNS_LOCK annotation on MempoolMutex().

Edit: that’s private too, hah.

jamesob force-pushed on Sep 6, 2022

DrahtBot added the label Needs rebase on Sep 13, 2022

jamesob force-pushed on Sep 13, 2022

DrahtBot removed the label Needs rebase on Sep 13, 2022

jamesob marked this as ready for review on Oct 13, 2022

jamesob commented at 2:40 pm on October 13, 2022: member

Reopening and closing to get Github to deduplicate commits now included on master.

jamesob closed this on Oct 13, 2022

jamesob reopened this on Oct 13, 2022

aureleoules commented at 3:00 pm on October 13, 2022: member

@jamesob this PR should probably be added to #15606 now that it’s out of draft.

jamesob force-pushed on Oct 13, 2022

jamesob commented at 5:10 pm on October 13, 2022: member

Thanks for the ping @aureleoules; the parent PR description has been updated.

in src/validation.cpp:5214 in a65af66bd5 outdated

5209+    if (index_new.GetBlockHash() != snapshot_blockhash) {
5210+      LogPrintf("[snapshot] supposed base block %s does not match the " /* Continued */
5211+          "snapshot base block %s (height %d). Snapshot is not valid.",
5212+          index_new.ToString(), snapshot_blockhash.ToString(), snapshot_base_height);
5213+       return SnapshotCompletionResult::BASE_BLOCKHASH_MISMATCH;
5214+    }

achow101 commented at 8:25 pm on October 17, 2022:

In a65af66bd5a292d0bd0030f0c6b49a3dfab0afcc “validation: add ChainMan logic for completing UTXO snapshot validation”

Since this is also caused by an invalid snapshot, should this be below and also call handle_invalid_snapshot? Otherwise it seems like this would just continue with the IBD chainstate without invalidating the snapshot chainstate which I think would be incorrect?

jamesob commented at 2:26 pm on October 19, 2022:

Good catch! Fixed.

in src/validation.cpp:5585 in a65af66bd5 outdated

5421+    std::string dbpath = fs::PathToString(snapshot_datadir);
5422+    std::string target = fs::PathToString(invalid_path);
5423+    LogPrintf("[snapshot] renaming snapshot datadir %s to %s\n", dbpath, target);
5424+
5425+    try {
5426+        fs::rename(snapshot_datadir, invalid_path);

achow101 commented at 8:27 pm on October 17, 2022:

In a65af66bd5a292d0bd0030f0c6b49a3dfab0afcc “validation: add ChainMan logic for completing UTXO snapshot validation”

This only renames the chainstate dir, it’s not clear to me if/when the invalid chainstate is deleted. Considering that moving the chainstate dir and deleting it are effectively the same thing here, I think it could just do fs::remove_all.

jamesob commented at 2:29 pm on October 19, 2022:

I think the idea is that we may want to preserve the bad snapshot datadir for later analysis or issue filing. I’ll make a note in the abort node log message that the user can manually remove the directory if desired or save it for later forensics when filing an issue.

in src/validation.cpp:5101 in a65af66bd5 outdated

5099+    std::optional<CCoinsStats> maybe_stats;
5100+
5101+    try {
5102+        maybe_stats = ComputeUTXOStats(
5103+            CoinStatsHashType::HASH_SERIALIZED, snapshot_coinsdb, m_blockman, SnapshotUTXOHashBreakpoint);
5104+    } catch (StopHashingException) {

achow101 commented at 8:33 pm on October 17, 2022:

In a65af66bd5a292d0bd0030f0c6b49a3dfab0afcc “validation: add ChainMan logic for completing UTXO snapshot validation”

0validation.cpp: In member function ‘bool ChainstateManager::PopulateAndValidateSnapshot(Chainstate&, AutoFile&, const node::SnapshotMetadata&)’:
1validation.cpp:5103:14: warning: catching polymorphic type ‘struct StopHashingException’ by value [-Wcatch-value=]
2 5103 |     } catch (StopHashingException) {
3      |

jamesob commented at 2:23 pm on October 19, 2022:

Fixed, thanks.

in src/validation.cpp:5612 in f7179525b5 outdated

5221+
5222+std::optional<int> ChainstateManager::GetSnapshotBaseHeight()
5223+{
5224+    CBlockIndex* base = this->GetSnapshotBaseBlock();
5225+    return base ? std::make_optional(base->nHeight) : std::nullopt;
5226+}

aureleoules commented at 10:33 am on October 18, 2022:

f7179525b5cb37f010d5e42f41650414e648bf45: can be made const

 0diff --git a/src/validation.cpp b/src/validation.cpp
 1index a6bc06201..eb0799b15 100644
 2--- a/src/validation.cpp
 3+++ b/src/validation.cpp
 4@@ -5439,16 +5439,16 @@ void Chainstate::InvalidateCoinsDBOnDisk()
 5     }
 6 }
 7 
 8-CBlockIndex* ChainstateManager::GetSnapshotBaseBlock()
 9+const CBlockIndex* ChainstateManager::GetSnapshotBaseBlock() const
10 {
11     auto blockhash_op = this->SnapshotBlockhash();
12     if (!blockhash_op) return nullptr;
13     return Assert(m_blockman.LookupBlockIndex(*blockhash_op));
14 }
15 
16-std::optional<int> ChainstateManager::GetSnapshotBaseHeight()
17+std::optional<int> ChainstateManager::GetSnapshotBaseHeight() const
18 {
19-    CBlockIndex* base = this->GetSnapshotBaseBlock();
20+    const CBlockIndex* base = this->GetSnapshotBaseBlock();
21     return base ? std::make_optional(base->nHeight) : std::nullopt;
22 }
23 
24diff --git a/src/validation.h b/src/validation.h
25index d435d76b4..7b3498d55 100644
26--- a/src/validation.h
27+++ b/src/validation.h
28@@ -908,11 +908,11 @@ private:
29     std::chrono::time_point<std::chrono::steady_clock> m_last_presync_update GUARDED_BY(::cs_main) {};
30 
31     //! Returns nullptr if no snapshot has been loaded.
32-    CBlockIndex* GetSnapshotBaseBlock() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
33+    const CBlockIndex* GetSnapshotBaseBlock() const EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
34 
35     //! Return the height of the base block of the snapshot in use, if one exists, else
36     //! nullopt.
37-    std::optional<int> GetSnapshotBaseHeight() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
38+    std::optional<int> GetSnapshotBaseHeight() const EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
39 
40     //! Return true if a chainstate is considered usable.
41     //!

in src/validation.h:478 in 062c8883da outdated

469@@ -470,6 +470,13 @@ class Chainstate
470     //! Manages the UTXO set, which is a reflection of the contents of `m_chain`.
471     std::unique_ptr<CoinsViews> m_coins_views;
472 
473+    //! This toggle exists for use when doing background validation for UTXO
474+    //! snapshots. It is set once the background validation chain reaches the
475+    //! same height as the base of the snapshot, and signals that we should no
476+    //! longer connect blocks to the background chainstate. It may also be set
477+    //! for snapshot chainstates found to be invalid.
478+    bool m_stop_use GUARDED_BY(::cs_main) {false};

aureleoules commented at 10:58 am on October 18, 2022:

062c8883da2f32e9b58bb846d37b911f4ac365db I think it’s clearer.

0    bool m_disabled GUARDED_BY(::cs_main) {false};

in src/validation.h:580 in 236ae38274 outdated

575     }
576 
577     //! Destructs all objects related to accessing the UTXO set.
578     void ResetCoinsViews() { m_coins_views.reset(); }
579 
580+    bool HasCoinsViews() { return (bool)m_coins_views; }

aureleoules commented at 11:03 am on October 18, 2022:

236ae38274669ce9fd4dcd2a85ab2c373226457f

0    bool HasCoinsViews() const { return m_coins_views != nullptr; }

jamesob commented at 1:28 pm on October 19, 2022:

Since unique_ptr has well-defined semantics for the bool operator, I’m going to leave this as is.

aureleoules commented at 2:59 pm on October 19, 2022:

alright, the function can be made const though

jamesob commented at 3:02 pm on October 19, 2022:

Yep, done.

aureleoules commented at 3:12 pm on October 19, 2022:

It hasn’t changed :grimacing:

jamesob commented at 4:57 pm on October 19, 2022:

Ugh oops, my mistake. Fixed.

in src/node/chainstate.cpp:28 in a65af66bd5 outdated

24@@ -25,16 +25,52 @@
25 #include <vector>
26 
27 namespace node {
28+

aureleoules commented at 11:34 am on October 18, 2022:

a65af66bd5a292d0bd0030f0c6b49a3dfab0afcc

in src/node/chainstate.cpp:49 in a65af66bd5 outdated

44+    }
45+
46+    // Check for changed -prune state.  What we are concerned about is a user who has pruned blocks
47+    // in the past, but is now trying to run unpruned.
48+    if (chainman.m_blockman.m_have_pruned && !options.prune) {
49+        return {ChainstateLoadStatus::FAILURE, _("You need to rebuild the database using -reindex to go back to unpruned mode.  This will redownload the entire blockchain")};

aureleoules commented at 11:42 am on October 18, 2022:

nit a65af66bd5a292d0bd0030f0c6b49a3dfab0afcc

0        return {ChainstateLoadStatus::FAILURE, _("You need to rebuild the database using -reindex to go back to unpruned mode. This will redownload the entire blockchain")};

in src/node/chainstate.cpp:202 in a65af66bd5 outdated

217-        return {ChainstateLoadStatus::FAILURE, _("You need to rebuild the database using -reindex to go back to unpruned mode.  This will redownload the entire blockchain")};
218-    }
219+        // Because ValidatedSnapshotCleanup() has torn down chainstates with
220+        // ChainstateManager::ResetChainstates(), reinitialize them here without
221+        // duplicating the blockindex work above.
222+        assert(chainman.GetAll().size() == 0);

aureleoules commented at 11:50 am on October 18, 2022:

a65af66bd5a292d0bd0030f0c6b49a3dfab0afcc

0        assert(chainman.GetAll().empty());

in src/validation.cpp:5198 in a65af66bd5 outdated

5193+       // Nothing to do - this function only applies to the background
5194+       // validation chainstate.
5195+       return SnapshotCompletionResult::SKIPPED;
5196+    }
5197+    int snapshot_tip_height = this->ActiveHeight();
5198+    int snapshot_base_height = *this->GetSnapshotBaseHeight();

aureleoules commented at 12:12 pm on October 18, 2022:

a65af66bd5a292d0bd0030f0c6b49a3dfab0afcc: maybe Assert first?

in src/validation.cpp:5199 in a65af66bd5 outdated

5194+       // validation chainstate.
5195+       return SnapshotCompletionResult::SKIPPED;
5196+    }
5197+    int snapshot_tip_height = this->ActiveHeight();
5198+    int snapshot_base_height = *this->GetSnapshotBaseHeight();
5199+    CBlockIndex& index_new = *m_ibd_chainstate->m_chain.Tip();

aureleoules commented at 12:13 pm on October 18, 2022:

a65af66bd5a292d0bd0030f0c6b49a3dfab0afcc: Assert here as well and can be const

in src/validation.cpp:5207 in a65af66bd5 outdated

5202+        // Background IBD not complete yet.
5203+        return SnapshotCompletionResult::SKIPPED;
5204+    }
5205+
5206+    assert(SnapshotBlockhash());
5207+    uint256 snapshot_blockhash = *SnapshotBlockhash();

aureleoules commented at 12:14 pm on October 18, 2022:

nit a65af66bd5a292d0bd0030f0c6b49a3dfab0afcc: can be inlined

0	uint256 snapshot_blockhash{*Assert(SnapshotBlockhash())};

in src/validation.cpp:5288 in 50909f0d5d outdated

5283+        // prevents us from validating the snapshot, so we should shut down and let the
5284+        // user handle the issue manually.
5285+        handle_invalid_snapshot();
5286+        return SnapshotCompletionResult::STATS_FAILED;
5287+    }
5288+    const auto ibd_stats = *maybe_ibd_stats;

aureleoules commented at 12:52 pm on October 18, 2022:

a65af66bd5a292d0bd0030f0c6b49a3dfab0afcc

0    const auto& ibd_stats = *maybe_ibd_stats;

in src/validation.cpp:5481 in 50909f0d5d outdated

5476+                  "in-memory chainstates. You are testing, right?\n");
5477+        return false;
5478+    }
5479+
5480+    auto snapshot_chainstate_path = *snapshot_chainstate_path_maybe;
5481+    auto ibd_chainstate_path = *ibd_chainstate_path_maybe;

aureleoules commented at 12:53 pm on October 18, 2022:

nit: a65af66bd5a292d0bd0030f0c6b49a3dfab0afcc

0    const auto& snapshot_chainstate_path = *snapshot_chainstate_path_maybe;
1    const auto& ibd_chainstate_path = *ibd_chainstate_path_maybe;

aureleoules commented at 12:58 pm on October 18, 2022: member

I reviewed all the commits and verified they all compile. The code looks good and the unit tests seem to confirm that the behavior is correct. I left some minor/nits comments.

jamesob force-pushed on Oct 19, 2022

jamesob commented at 2:38 pm on October 19, 2022: member

Thanks very much for the reviews @achow101 @aureleoules. I’ve addressed all your feedback unless otherwise noted.

Just as a reminder, a previous (and very similar) version of this code was looked at by @ryanofsky @MarcoFalke back in #24232. Hopefully the code here is pretty familiar, and should make review a bit easier.

jamesob force-pushed on Oct 19, 2022

in src/node/chainstate.cpp:141 in c1767471c1 outdated

133@@ -167,6 +134,93 @@ ChainstateLoadResult LoadChainstate(ChainstateManager& chainman, const CacheSize
134     return {ChainstateLoadStatus::SUCCESS, {}};
135 }
136 
137+ChainstateLoadResult LoadChainstate(ChainstateManager& chainman, const CacheSizes& cache_sizes,
138+                                    const ChainstateLoadOptions& options)
139+{
140+    if (!hashAssumeValid.IsNull()) {
141+        LogPrintf("Assuming ancestors of block %s have valid signatures.\n", hashAssumeValid.GetHex());

ariard commented at 10:07 pm on October 19, 2022:

I think could do a LookupBlockIndex to display the block height, as it’s more speaking for the node operator.

jamesob commented at 8:00 pm on November 3, 2022:

Out of scope for this PR, since this is just code I moved.

in src/node/chainstate.cpp:187 in c1767471c1 outdated

177+    }
178+
179+    auto [init_status, init_error] = CompleteChainstateInitialization(chainman, options);
180+    if (init_status != ChainstateLoadStatus::SUCCESS) {
181+        return {init_status, init_error};
182+    }

ariard commented at 10:14 pm on October 19, 2022:

I think you could have a quick log of the chainstate initialization success, as this function has a lot of checks/ops and you can inform the node operator where he is in the initialization sequence.

jamesob commented at 8:01 pm on November 3, 2022:

Out of scope or possibly follow-up given this was existing code.

in src/validation.cpp:3044 in c1767471c1 outdated

2984@@ -2973,6 +2985,13 @@ bool Chainstate::ActivateBestChain(BlockValidationState& state, std::shared_ptr<
2985     // we use m_chainstate_mutex to enforce mutual exclusion so that only one caller may execute this function at a time
2986     LOCK(m_chainstate_mutex);
2987 
2988+    // Belt-and-suspenders check that we aren't attempting to advance the background
2989+    // chainstate past the snapshot base block.
2990+    if (WITH_LOCK(::cs_main, return m_disabled)) {
2991+        LogPrintf("m_disabled is set - this chainstate should not be in operation. Bug?.\n");

ariard commented at 10:24 pm on October 19, 2022:

I think you could go even further, and invite the node operator to fulfill an issue on the repository or at least indicate a communication process to feedback the bug to us ?

jamesob commented at 5:43 pm on February 22, 2023:

Fixed.

ariard commented at 10:31 pm on October 19, 2022: member

Will do a more in-depth review soon

DrahtBot added the label Needs rebase on Oct 26, 2022

in src/validation.cpp:5310 in 07b9dea9d4 outdated

5180+//
5181+//  (i) setting `m_disabled` immediately and ensuring all chainstate accesses go
5182+//      through IsUsable() checks, or
5183+//
5184+//  (ii) giving each chainstate its own lock instead of using cs_main for everything.
5185+SnapshotCompletionResult ChainstateManager::MaybeCompleteSnapshotValidation(

ryanofsky commented at 7:38 pm on October 31, 2022:

It looks like there has been a rebase error and two previous commits were combined into one commit that doesn’t have a valid commit description. The two previous commits were:

“move-only-ish: init: factor out chainstate initialization” (aead772a24b1f7759cd756bf6534a70dc41ca080)
“validation: add ChainMan logic for completing UTXO snapshot validation” (863e5f389e08cd2069e4834df82879ec39142aac)

and the new combined commit is

“move-only-ish: init: factor out chainstate initialization” (07b9dea9d4e8de2ce22e8ba50172a4af85801931)

It would help review to split these up and separate moveonly changes from other changes

jamesob commented at 1:46 pm on November 3, 2022:

Wow, no idea how that got so screwed up. Will fix, thanks.

in src/validation.h:1057 in 1dda5f3f59 outdated

996@@ -985,7 +997,10 @@ class ChainstateManager
997     std::optional<uint256> SnapshotBlockhash() const;
998 
999     //! Is there a snapshot in use and has it been fully validated?

ariard commented at 10:33 pm on October 31, 2022:

Comment could be updated to reflect snapshot validation implied the ibd chainstate is from now disabled. No ulterior blocks should be connected.

in src/validation.h:457 in 1dda5f3f59 outdated

469@@ -470,6 +470,13 @@ class Chainstate
470     //! Manages the UTXO set, which is a reflection of the contents of `m_chain`.
471     std::unique_ptr<CoinsViews> m_coins_views;
472 
473+    //! This toggle exists for use when doing background validation for UTXO
474+    //! snapshots. It is set once the background validation chain reaches the
475+    //! same height as the base of the snapshot, and signals that we should no

ariard commented at 10:50 pm on October 31, 2022:

Comment is unclear if toggle is setup once “same-height-of-snapshot’ or if it also includes UTXO set hash comparison (PopulateAndValidateSnapshot())

jamesob commented at 8:18 pm on November 3, 2022:

Added.

in src/validation.h:586 in 65d2ce6417 outdated

575     }
576 
577     //! Destructs all objects related to accessing the UTXO set.
578     void ResetCoinsViews() { m_coins_views.reset(); }
579 
580+    bool HasCoinsViews() const { return (bool)m_coins_views; }

ariard commented at 10:56 pm on October 31, 2022:

Can get a one line comment. That way reduces confusion for the next contributor on a quest to refactor validation.h/coins.h, without introducing odds of weird consensus bugs.

jamesob commented at 8:18 pm on November 3, 2022:

Added.

ariard commented at 11:02 pm on October 31, 2022: member

Reviewing commit by commit. So far until 65d2ce6.

I think this is pending on a rebase :) Few more PRs and we should be good to tag assumeutxo 0.1. Let’s keep going!

jamesob force-pushed on Nov 3, 2022

DrahtBot removed the label Needs rebase on Nov 3, 2022

in src/validation.h:1060 in 53f9a79c87 outdated

975@@ -962,7 +976,10 @@ class ChainstateManager
976     std::optional<uint256> SnapshotBlockhash() const;
977 
978     //! Is there a snapshot in use and has it been fully validated?
979-    bool IsSnapshotValidated() const EXCLUSIVE_LOCKS_REQUIRED(::cs_main) { return m_snapshot_validated; }
980+    bool IsSnapshotValidated() const EXCLUSIVE_LOCKS_REQUIRED(::cs_main)
981+    {
982+        return m_snapshot_chainstate && m_ibd_chainstate && m_ibd_chainstate->m_disabled;

ryanofsky commented at 0:53 am on November 8, 2022:

In commit “validation: add CChainState::m_disabled and ChainMan::isUsable” (53f9a79c87fff061d7f5232b9c4e829f9380fa01)

This definition of IsSnapshotValidated seems to contradict earlier comment that m_disabled flag “does not indicate that the background chainstate has been used to fully validate the snapshot”. If this is not a bug it might be useful to add a comment to this function to clarify, or maybe call something like IsBackgroundChainDownloaded.

jamesob commented at 2:50 pm on November 8, 2022:

Ah, yeah, good catch! I moved too fast when addressing Antoine’s feedback and forgot that setting m_disabled on the IBD chainstate does indicate full snapshot validation. It’s a little confusing because setting m_disabled on the snapshot chainstate indicates that validation failed, but this definition of IsSnapshotValidated() is actually still correct, I just screwed up the comment.

ryanofsky commented at 2:57 pm on November 8, 2022:

Oh, that is a helpful clarification that m_disabled is set in different conditions on the different chainstates. It’d be useful to say in doxygen comment for m_disabled that it is is only set on background chainstate after it is validated and validation was successful, and only set on snapshot chainstate if it’s invalid and needs to be deleted.

in src/node/chainstate.cpp:33 in 15a86a4940 outdated

27@@ -28,38 +28,13 @@
28 #include <vector>
29 
30 namespace node {
31-ChainstateLoadResult LoadChainstate(ChainstateManager& chainman, const CacheSizes& cache_sizes,
32-                                    const ChainstateLoadOptions& options)
33+// Complete initialization of chainstates after the initial call has been made
34+// to ChainstateManager::InitializeChainstate().
35+static ChainstateLoadResult CompleteChainstateInitialization(

ryanofsky commented at 1:07 am on November 8, 2022:

In commit “move-only-ish: init: factor out chainstate initialization” (15a86a4940777a427d2b1c8c7c1e533de440f951)

This is just an aesthetic suggestion, but instead of splitting this function in two and putting the second half below the first, half you could forward declare the second CompleteChainstateInitialization function, and keep the first part first and second part second. This would also make the diff a lot smaller because basically no code would have to move.

jamesob commented at 3:46 pm on November 8, 2022:

I would do this, but repeating the function signature for a static function for perpetuity just to save on a diff that’s pretty easy to read with --color-moved=dimmed-zebra --color-moved-ws=ignore-space-change anyway doesn’t seem like the right trade to me. Will leave this as-is.

in src/validation.h:999 in e17d51768b outdated

994+    //! is the base of the UTXO snapshot in use, compare its coins to ensure
995+    //! they match those expected by the snapshot.
996+    //!
997+    //! If the coins match (expected), then mark the validation chainstate for
998+    //! deletion and continue using the snapshot chainstate as active.
999+    //! Otherwise, revert to using the ibd chainstate and shutdown (TODO).

ryanofsky commented at 1:11 am on November 8, 2022:

In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (e17d51768bcf5f7791fdf7d0b2ee4b11cc64d922)

Which part of this is TODO? It seems like current code is calling InvalidateCoinsDBOnDisk and shutting down as described

jamesob commented at 3:47 pm on November 8, 2022:

Good catch!

in src/node/chainstate.cpp:192 in e17d51768b outdated

185@@ -183,6 +186,43 @@ ChainstateLoadResult LoadChainstate(ChainstateManager& chainman, const CacheSize
186         return {init_status, init_error};
187     }
188 
189+    // If a snapshot chainstate was fully validated by a background chainstate during
190+    // the last run, detect it here and clean up the now-unneeded background
191+    // chainstate. This is the expected case during snapshot completion, and
192+    // not just a belt-and-suspenders.

ryanofsky commented at 1:24 am on November 8, 2022:

In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (e17d51768bcf5f7791fdf7d0b2ee4b11cc64d922)

You probably explained this somewhere previously, but I can’t remember why this is expected. Could you replace “This is expected” sentence with something more specific like “The background chainstate is only deleted on startup, not while the node is running, because of [reason]”.

jamesob commented at 3:51 pm on November 8, 2022:

Good point, done.

in src/validation.cpp:5532 in e17d51768b outdated

5527+
5528+        LogPrintf("%s: error renaming file '%s' -> '%s': %s\n",
5529+                __func__, src_str, dest_str, e.what());
5530+        AbortNode(strprintf(
5531+            "Rename of '%s' -> '%s' failed. "
5532+            "Do not restart until manually moving or deleting invalid snapshot directory %s.",

ryanofsky commented at 1:37 am on November 8, 2022:

In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (e17d51768bcf5f7791fdf7d0b2ee4b11cc64d922)

What’s the reason not to restart? It seems like if you do restart the snapshot will just fail to validate again which should be safe because it will either succesfully rename, or show the same error message again. Maybe better to say what to do instead of what not to do like “Rename of invalid chainstate ‘%s’ -> ‘%s’ failed. Node may refuse to start until bad chainstate is manually renamed or deleted.”

jamesob commented at 3:57 pm on November 8, 2022:

Good point! Done.

in src/validation.cpp:2821 in e17d51768b outdated

2816+        // This call may set `m_disabled`, which is referenced immediately afterwards in
2817+        // ActivateBestChain.
2818+        //
2819+        // If this fails (i.e. if snapshot is invalid), we will have marked the
2820+        // snapshot chainstate as unusable, and will have removed its coinsdb from
2821+        // disk.

ryanofsky commented at 4:14 am on November 8, 2022:

In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (e17d51768bcf5f7791fdf7d0b2ee4b11cc64d922)

I think it’s just renamed _INVALID, not removed. In any case, this detail doesn’t really seem relevant here. It seems more relevant that the failure is handled internally so no need to check for errors from this function.

jamesob commented at 3:54 pm on November 8, 2022:

Good point, fixed.

in src/validation.cpp:2817 in e17d51768b outdated

2809@@ -2810,6 +2810,18 @@ bool Chainstate::ConnectTip(BlockValidationState& state, CBlockIndex* pindexNew,
2810              Ticks<SecondsDouble>(time_total),
2811              Ticks<MillisecondsDouble>(time_total) / num_blocks_total);
2812 
2813+    // If we are the background validation chainstate, check to see if we are done
2814+    // validating the snapshot (i.e. our tip has reached the snapshot's base block).
2815+    if (this != &m_chainman.ActiveChainstate()) {
2816+        // This call may set `m_disabled`, which is referenced immediately afterwards in
2817+        // ActivateBestChain.

ryanofsky commented at 4:15 am on November 8, 2022:

In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (https://github.com/bitcoin/bitcoin/commit/e17d51768bcf5f7791fdf7d0b2ee4b11cc64d922)

Would add “so it will stop connecting new blocks” to make it clearer what m_disabled is used for.

jamesob commented at 3:53 pm on November 8, 2022:

Done.

ryanofsky approved

ryanofsky commented at 4:58 am on November 8, 2022: contributor

Code review ACK e7dc5e6e52811e2b03c053ab07fc25720e6f1d10 assuming question about IsSnapshotValidated doesn’t point at a real problem. Changes since last review were rebasing, adding const keywords, renaming stop_use to disabled, improving invalid snapshot message, adding mempool mutex to tests.

jamesob force-pushed on Nov 8, 2022

jamesob commented at 4:01 pm on November 8, 2022: member

Thanks as always for your good review, @ryanofsky. I’ve rebased on top of latest master to fix CI.

in src/validation.h:484 in e07764318a outdated

456+    //!
457+    //! In the expected case, it is set once the background validation chain reaches the
458+    //! same height as the base of the snapshot and its UTXO set is found to hash to
459+    //! the expected assumeutxo value. It signals that we should no longer connect
460+    //! blocks to the background chainstate. When set on the background validation
461+    //! chainstate, it signifies that we have fully validated the snapshot chainstate.

ryanofsky commented at 8:50 pm on November 13, 2022:

In commit “validation: add CChainState::m_disabled and ChainMan::isUsable” (e07764318ad766ba27dfeafe1ca2345534f81dd5)

This last sentence is confusing because it is part of the “In the expected case” paragraph, but it is referring to an different unexpected case. It’s also not very clear that this sentence is referring to a different chainstate object than the rest of the paragraph.

Would move this sentence to a separate paragraph and say something like “In the error case, if the background UTXO hash does not match the expected assumeutxo valid, m_disabled will be set to true on the snapshot chainstate (not the background validation chainstate) indicating that the snapshot chainstate is invalid.”

ryanofsky approved

ryanofsky commented at 8:54 pm on November 13, 2022: contributor

Code review ACK ab41e8b4093925dc339096c15b33fe8b4726f55a. Just rebase since last review, and some nice comment and error message improvements.

in src/test/validation_chainstatemanager_tests.cpp:610 in ab41e8b409 outdated

604+
605+    {
606+        LOCK(chainman_restarted.GetMutex());
607+        BOOST_CHECK_EQUAL(chainman_restarted.GetAll().size(), 1);
608+        BOOST_CHECK(!chainman_restarted.IsSnapshotActive());
609+        BOOST_CHECK(!chainman_restarted.IsSnapshotValidated());

ariard commented at 5:24 pm on November 22, 2022:

As I understand this sequence of checks verifying the cleanness of our chain validation, I think you could also verify the coin cache rebalancing as done above, i.e BOOST_CHECK(active_cs.m_coinstip_cache_size_bytes > tip_cache_before_complete) and the other one.

jamesob commented at 5:42 pm on February 22, 2023:

Fixed.

in src/test/validation_chainstatemanager_tests.cpp:637 in ab41e8b409 outdated

629+    SnapshotCompletionResult res;
630+    auto mock_shutdown = [](bilingual_str msg) {};
631+
632+    // Test tampering with the IBD UTXO set with an extra coin to ensure it causes
633+    // snapshot completion to fail.
634+    CCoinsViewCache& ibd_coins = WITH_LOCK(::cs_main,

ariard commented at 5:33 pm on November 22, 2022:

To verify the coinstats logic (ComputeUTXOStats) is committing well to all the attributes of the chain tip coin set, I think other variations of tampering could be done, such as picking up a random coin and inflating its nValue or tweaking its scriptPubKey.

jamesob commented at 5:42 pm on February 22, 2023:

Good for a follow-up.

in src/test/validation_chainstatemanager_tests.cpp:651 in ab41e8b409 outdated

643+    fs::path snapshot_chainstate_dir = gArgs.GetDataDirNet() / "chainstate_snapshot";
644+    BOOST_CHECK(fs::exists(snapshot_chainstate_dir));
645+
646+    res = WITH_LOCK(::cs_main,
647+        return chainman.MaybeCompleteSnapshotValidation(mock_shutdown));
648+    BOOST_CHECK_EQUAL(res, SnapshotCompletionResult::HASH_MISMATCH);

ariard commented at 5:37 pm on November 22, 2022:

Note, among the validation_chainstatemanager_test.cpp unit tests, I think only SnapshotCompletionResult::HASH_MISMATCH is exercised, I don’t see checks for MISSING_CHAINPARAMS, STATS_FAILED, BASE_BLOCKHASH_MISMATCH. They can be added later.

in src/validation.cpp:5253 in ab41e8b409 outdated

5240@@ -5196,6 +5241,149 @@ bool ChainstateManager::PopulateAndValidateSnapshot(
5241     return true;
5242 }
5243 
5244+// Currently, this function holds cs_main for its duration, which could be for
5245+// multiple minutes due to the GetUTXOStats call. This hold is necessary

ariard commented at 5:42 pm on November 22, 2022:

I believe here you would like to say ComputeUTXOStats (used L5344), the core logic of GetUTXOStats.

jamesob commented at 5:43 pm on February 22, 2023:

Fixed.

in src/node/chainstate.cpp:200 in ab41e8b409 outdated

195+    // filesystem operations to move leveldb data directories around, and that seems
196+    // too risky to do in the middle of normal runtime.
197+    auto snapshot_completion = chainman.MaybeCompleteSnapshotValidation();
198+
199+    if (snapshot_completion == SnapshotCompletionResult::SKIPPED) {
200+        // do nothing; expected case

ariard commented at 6:01 pm on November 22, 2022:

One future improvement suggestion: I think here the boolean result of DetectSnapshotChainstate to completely skip the call of MaybeCompleteSnapshotValidation and remove SnapshotCompletionResult::SKIPPED. From my understanding, lack of UTXO snapshot should imply lack of background chainstate, whatever its sync state and the necessity of clean it..

in src/node/chainstate.cpp:212 in ab41e8b409 outdated

207+        // Because ValidatedSnapshotCleanup() has torn down chainstates with
208+        // ChainstateManager::ResetChainstates(), reinitialize them here without
209+        // duplicating the blockindex work above.
210+        assert(chainman.GetAll().empty());
211+        assert(!chainman.IsSnapshotActive());
212+        assert(!chainman.IsSnapshotValidated());

ariard commented at 6:06 pm on November 22, 2022:

I believe asserting cache sanity could be also done, there is a call to MaybeRebalanceCaches() at the end of MaybeCompleteSnapshotValidation().

in src/node/chainstate.cpp:220 in ab41e8b409 outdated

215+
216+        // A reload of the block index is required to recompute setBlockIndexCandidates
217+        // for the fully validated chainstate.
218+        chainman.ActiveChainstate().UnloadBlockIndex();
219+
220+        auto [init_status, init_error] = CompleteChainstateInitialization(chainman, cache_sizes, options);

ariard commented at 6:22 pm on November 22, 2022:

I think CompleteChainstateInitialization would deserve to be better documented, notably the functional difference when there is a single chainstate and when multiple are present (background/snapshot). From my understanding, apart of the coin caches balancing, all the other operations are affecting the state of ChainstateManager / BlockManager, shared between both chainstates.

in src/validation.cpp:5328 in ab41e8b409 outdated

5271+    const int snapshot_base_height = *Assert(this->GetSnapshotBaseHeight());
5272+    const CBlockIndex& index_new = *Assert(m_ibd_chainstate->m_chain.Tip());
5273+
5274+    if (index_new.nHeight < snapshot_base_height) {
5275+        // Background IBD not complete yet.
5276+        return SnapshotCompletionResult::SKIPPED;

ariard commented at 6:33 pm on November 22, 2022:

A small note, I think here you could have a SnapshotCompletionResult::NOT_MATURE_YET as this usage doesn’t really convey the same information of the previons one (i.e the lack of a background validation chainstate).

in src/validation.cpp:3079 in ab41e8b409 outdated

3039@@ -3032,6 +3040,13 @@ bool Chainstate::ActivateBestChain(BlockValidationState& state, std::shared_ptr<
3040     // we use m_chainstate_mutex to enforce mutual exclusion so that only one caller may execute this function at a time
3041     LOCK(m_chainstate_mutex);
3042 
3043+    // Belt-and-suspenders check that we aren't attempting to advance the background
3044+    // chainstate past the snapshot base block.

ariard commented at 7:29 pm on November 22, 2022:

From my understanding there is a marginal case where this is not only a belt-and-suspender. When background chainstate reaches height parity with the snapshot chainstate, the background chainstate is torn down only in LoadChainstate->ValidatedSnapshotCleanup by the call to ResetChainstates() L5584. Before, ABC could be called as the background chainstate is still ActiveChainstate(), at least for the block of the connection sequence, before the remaining one hit the m_disabled L3106. If correct, the comment could be more informative.

jamesob commented at 5:34 pm on February 22, 2023:

For what it’s worth, in future PRs I think this possibility is eliminated, since calls like ActiveChainstate().ActivateBestChain() in ProcessNewBlock are replaced with specific chainstate references.

ariard approved

ariard commented at 7:32 pm on November 22, 2022: member

ACK ab41e8b

The code could be better documented, with few improvements, and the unit test coverage enhanced, however this could be done in a follow-up. Reviewed the validation and cleanup of the background chainstate from hiting the block height parity with snapshot in ConnectTip, to flushing the data struct when the node is re-started in AppInitMain().

DrahtBot added the label Needs rebase on Dec 5, 2022

jamesob force-pushed on Dec 7, 2022

DrahtBot removed the label Needs rebase on Dec 7, 2022

DrahtBot added the label Needs rebase on Dec 8, 2022

jamesob force-pushed on Dec 14, 2022

DrahtBot removed the label Needs rebase on Dec 14, 2022

achow101 commented at 9:20 pm on January 19, 2023: member

ACK 30e113735ae3b1465d30b2125e7826a7a1527b99

The commit message of 287bb3f4ce41341aa0aea97b8176dc790889830d “validation: add CChainState::m_disabled and ChainMan::isUsable” still mentions m_stop_use rather than m_disabled

DrahtBot added the label Needs rebase on Jan 27, 2023

add ChainstateManager.GetSnapshot{BaseHeight,BaseBlock}()

For use in later commits.

5ee22cdafd

validation: add CChainState::m_disabled and ChainMan::isUsable

and remove m_snapshot_validated. This state can now be inferred by the
number of isUsable chainstates.

m_disabled is used to signal that a chainstate should no longer be used
by validation logic; it is used as a sentinel when background validation
completes or if the snapshot chainstate is found to be invalid.

isUsable is a convenience method that incorporates m_disabled.

c29f26b47b

add Chainstate::HasCoinsViews()

Used in subsequent commits. Also cleans up asserts in
coins_views-related convenience methods to be more exact.

637a90b973

move-only-ish: init: factor out chainstate initialization

Moves chainstate initialization into its own function. This is
necessary to later support a more readable way of handling
background-validation chainstate cleanup during init, since the
chainstate initialization functions may need to be repeated after
moving leveldb filesystem content around.

This commit isn't strictly necessary, but the alternative is to (ab)use
the `while` loop in init.cpp with a `continue` on the basis of a
specific ChainstateLoadingError return value from LoadChainstate. Not
only is this harder to read, but it can't be unittested.

The approach here lets us consolidate background-validation cleanup to
LoadChainstate, and therefore exercise it within tests.

This commit is most easily reviewed with

  git diff --color-moved=dimmed-zebra
  --color-moved-ws=ignore-space-change

f2a4f3376f

DrahtBot requested review from ariard on Feb 22, 2023

DrahtBot requested review from ryanofsky on Feb 22, 2023

jamesob force-pushed on Feb 22, 2023

jamesob commented at 5:50 pm on February 22, 2023: member

Rebased and reworded the commit message that @achow101 pointed out was out of date. Also fixed two minor items that @ariard noted.

In attempt to encourage re-ACKers, here is the diff (including rebase items):

 0diff <(git diff --no-color au.complete.10~9..au.complete.10) <(git diff --no-color au.complete.11~9..au.complete.11)
 1109c109
 2< index 60acb614b4..d36184fed7 100644
 3---
 4> index ba1024d22e..cd82d8743c 100644
 5138c138
 6< -    if (nPruneTarget == std::numeric_limits<uint64_t>::max()) {
 7---
 8> -    if (chainman.m_blockman.GetPruneTarget() == std::numeric_limits<uint64_t>::max()) {
 9140,141c140,141
10< -    } else if (nPruneTarget) {
11< -        LogPrintf("Prune configured to target %u MiB on disk for block and undo files.\n", nPruneTarget / 1024 / 1024);
12---
13> -    } else if (chainman.m_blockman.GetPruneTarget()) {
14> -        LogPrintf("Prune configured to target %u MiB on disk for block and undo files.\n", chainman.m_blockman.GetPruneTarget() / 1024 / 1024);
15187c187
16< +    if (nPruneTarget == std::numeric_limits<uint64_t>::max()) {
17---
18> +    if (chainman.m_blockman.GetPruneTarget() == std::numeric_limits<uint64_t>::max()) {
19189,190c189,190
20< +    } else if (nPruneTarget) {
21< +        LogPrintf("Prune configured to target %u MiB on disk for block and undo files.\n", nPruneTarget / 1024 / 1024);
22---
23> +    } else if (chainman.m_blockman.GetPruneTarget()) {
24> +        LogPrintf("Prune configured to target %u MiB on disk for block and undo files.\n", chainman.m_blockman.GetPruneTarget() / 1024 / 1024);
25257c257
26< index 22b9af1201..fd6a9b4c2e 100644
27---
28> index 56867a584b..45d59b685c 100644
29294c294
30< @@ -518,10 +533,156 @@ BOOST_FIXTURE_TEST_CASE(chainstatemanager_snapshot_init, SnapshotTestSetup)
31---
32> @@ -518,10 +533,160 @@ BOOST_FIXTURE_TEST_CASE(chainstatemanager_snapshot_init, SnapshotTestSetup)
33366a367,368
34> +    const Chainstate& active_cs2 = chainman_restarted.ActiveChainstate();
35> +
36371a374,375
37> +        BOOST_CHECK(active_cs2.m_coinstip_cache_size_bytes > tip_cache_before_complete);
38> +        BOOST_CHECK(active_cs2.m_coinsdb_cache_size_bytes > db_cache_before_complete);
39453c457
40< index 76bea97341..0916827710 100644
41---
42> @@ -3067,6 +3075,14 @@ bool Chainstate::ActivateBestChain(BlockValidationState& state, std::shared_ptr<
43478c482,483
44< +        LogPrintf("m_disabled is set - this chainstate should not be in operation. Bug?.\n");
45---
46> +        LogPrintf("m_disabled is set - this chainstate should not be in operation. " /* Continued */
47> +            "Please report this as a bug. %s\n", PACKAGE_BUGREPORT);
48485c490
49< @@ -3080,6 +3095,15 @@ bool Chainstate::ActivateBestChain(BlockValidationState& state, std::shared_ptr<
50---

achow101 commented at 6:41 pm on February 22, 2023: member

re-ACK a00d0e78510d79bc9fc57ec622aec98a65efa8c0

DrahtBot removed the label Needs rebase on Feb 22, 2023

jonatack commented at 7:14 pm on February 22, 2023: contributor

In the pull description, could drop or update “Since it’s dependent on the commits in #25667, I’m opening this as a draft.”

ariard approved

ariard commented at 9:37 pm on February 22, 2023: member

ACK a00d0e78.

Since last ACK ab41e8b4:

move nPruneTarget usage to BlockManager::GetPruneTarget() in LoadChainstate()
bug report call message (i.e PACKAGE_BUGREPORT) in LogPrintf in ActivateBestChain()
move GetUTXOStats mention to ComputeUTXOStats in MaybeCompleteSnapshotValidation comment
new BOOST_CHECKs in chainstatemanager_snapshot_completion about cache size bytes

Sjors commented at 6:44 pm on February 23, 2023: member

I’d like to test this with loadtxoutset. I wonder if I can just cherry-pick a few things from #15606 to do that myself, or if a more rigorous rebase is needed…

jamesob commented at 4:58 pm on February 28, 2023: member

Maintainers: what else would you like to see for a merge here?

achow101 commented at 5:43 pm on February 28, 2023: member

Maintainers: what else would you like to see for a merge here?

Waiting for a couple more reviews.

jamesob commented at 7:17 pm on March 2, 2023: member

In the pull description, could drop or update “Since it’s dependent on the commits in #25667, I’m opening this as a draft.”

Done, thanks.

brunoerg commented at 6:29 pm on March 3, 2023: contributor

The commit message of https://github.com/bitcoin/bitcoin/commit/287bb3f4ce41341aa0aea97b8176dc790889830d “validation: add CChainState::m_disabled and ChainMan::isUsable” still mentions m_stop_use rather than m_disabled

In the pull description, it still mention m_stop_use.

brunoerg approved

brunoerg commented at 9:11 pm on March 3, 2023: contributor

crACK a00d0e78510d79bc9fc57ec622aec98a65efa8c0

In 5ee22cdafd2562bcb8bf0ae6025e4b53c826382d, it creates ChainstateManager::GetSnapshotBaseBlock and ChainstateManager::GetSnapshotBaseHeight. GetSnapshotBaseHeight is used (not in this commit) in MaybeCompleteSnapshotValidation to check if IBD (in background) has been completed.

In c29f26b47b8ef978d8689dc0222aa663361ee6cb, it creates CChainState::m_disabled and I could check it’s been used to point that a chainstate should no longer be used by validation logic. (perhaps PR description should be updated).

Some logic I could understand: assumeutxo2

In e91b2165103784f13316e513d03dda0d63570675, it first added in CompleteChainstateInitialization

0assert(chainman.m_total_coinstip_cache > 0);
1assert(chainman.m_total_coinsdb_cache > 0);

Good to check we have enough bytes available for in-memory coins caches and leveldb stuff.

In LoadChainstate I could check (what is mentioned):

Loads the fully validated chainstate
Loads a chain created from a UTXO snapshot, if any exist.

With both ones, it calls MaybeCompleteSnapshotValidation to compare the background chainstate’s UTXO hash with the hard-coded (and expected) one. If so, it sets m_disabled=true. And then, we can _re-_initialize the chainstate (InitializeChainstate).

If we could have loadtxoutset (as mentioned by @Sjors), it would be good to review it and see things happen in practice.

achow101 commented at 0:45 am on March 7, 2023: member

Missed this on last review (sorry):

0../../../src/validation.cpp: In member function ‘SnapshotCompletionResult ChainstateManager::MaybeCompleteSnapshotValidation(std::function<void(bilingual_str)>)’:
1../../../src/validation.cpp:5450:14: error: catching polymorphic type ‘struct StopHashingException’ by value [-Werror=catch-value=]
2 5450 |     } catch (StopHashingException) {
3      |              ^~~~~~~~~~~~~~~~~~~~

validation: add ChainMan logic for completing UTXO snapshot validation

Trigger completion when a background validation chainstate reaches the
same height as a UTXO snapshot, and handle cleaning up the chainstate
on subsequent startup.

d96c59cc5c

log: add LoadBlockIndex() message for assumedvalid blocks

I found this useful during unittest debugging.

7300ced9de

refactor: make MempoolMutex() public

for use in the following unittests.

d70919a88f

test: add snapshot completion unittests

Also adjusts the previous snapshot chainstate init tests
to account for the fact that the init process is now attempting to
validate and complete background chainstates whose tip is at the
snapshot base block. We use a DisconnectTip() hack to preserve the
nature of the test.

87a1108c81

docs: update assumeutxo.md

Include notes about the `chainstate_snapshot` rename as well as
updates for the included code.

2b373fe49d

jamesob force-pushed on Mar 7, 2023

jamesob commented at 9:08 pm on March 7, 2023: member

Missed this on last review (sorry):

Fixed!

achow101 commented at 11:36 pm on March 7, 2023: member

ACK 2b373fe49d64f04ceab2309d3f40da7bac6b37d6

achow101 merged this on Mar 7, 2023

achow101 closed this on Mar 7, 2023

sidhujag referenced this in commit 3a3d85f88f on Mar 8, 2023

in src/validation.cpp:5403 in d96c59cc5c outdated

5398+            SnapshotUTXOHashBreakpoint);
5399+    } catch (StopHashingException const&) {
5400+        return SnapshotCompletionResult::STATS_FAILED;
5401+    }
5402+
5403+    // XXX note that this function is slow and will hold cs_main for potentially minutes.

fjahr commented at 3:55 pm on March 12, 2023:

Just catching up… Not sure if intentional or not but maybe the XXX here was something you still wanted to address but missed @jamesob ?

in src/validation.cpp:5661 in d96c59cc5c outdated

5656+    auto rename_failed_abort = [](
5657+                                   fs::path p_old,
5658+                                   fs::path p_new,
5659+                                   const fs::filesystem_error& err) {
5660+        LogPrintf("%s: error renaming file (%s): %s\n",
5661+                __func__, fs::PathToString(p_old), err.what());

maflcko commented at 2:59 pm on March 13, 2023:

nit in d96c59cc5cd2f73f1f55c133c52208671fe75ef3: No need to pass func

This is already done by log internally
This will print a useless operator(), twice: [validation.cpp:5712] [operator()] operator(): error renaming file

in src/validation.cpp:5672 in d96c59cc5c outdated

5667+
5668+    try {
5669+        fs::rename(ibd_chainstate_path, tmp_old);
5670+    } catch (const fs::filesystem_error& e) {
5671+        rename_failed_abort(ibd_chainstate_path, tmp_old, e);
5672+        throw;

maflcko commented at 3:13 pm on March 13, 2023:

nit in d96c59cc5cd2f73f1f55c133c52208671fe75ef3:

Looks like the LOC can be reduced by moving all of this into the lambda instead of repeating the passed paths twice and the throw logic twice.

in src/validation.cpp:5589 in d96c59cc5c outdated

5584+    } catch (const fs::filesystem_error& e) {
5585+        auto src_str = fs::PathToString(snapshot_datadir);
5586+        auto dest_str = fs::PathToString(invalid_path);
5587+
5588+        LogPrintf("%s: error renaming file '%s' -> '%s': %s\n",
5589+                __func__, src_str, dest_str, e.what());

maflcko commented at 3:16 pm on March 13, 2023:

nit: Same about __func__

in src/validation.cpp:5345 in d96c59cc5c outdated

5340+            "restart, the node will resume syncing from %d "
5341+            "without using any snapshot data. "
5342+            "Please report this incident to %s, including how you obtained the snapshot. "
5343+            "The invalid snapshot chainstate has been left on disk in case it is "
5344+            "helpful in diagnosing the issue that caused this error."),
5345+            PACKAGE_NAME, snapshot_tip_height, snapshot_base_height, snapshot_base_height, PACKAGE_BUGREPORT

maflcko commented at 3:19 pm on March 13, 2023:

Same commit: At the risk of people running into this, I wonder if this should use STR_INTERNAL_BUG() so that more info is printed about the version/commit hash they were using.

in src/validation.cpp:3082 in d96c59cc5c outdated

3074@@ -3067,6 +3075,14 @@ bool Chainstate::ActivateBestChain(BlockValidationState& state, std::shared_ptr<
3075     // we use m_chainstate_mutex to enforce mutual exclusion so that only one caller may execute this function at a time
3076     LOCK(m_chainstate_mutex);
3077 
3078+    // Belt-and-suspenders check that we aren't attempting to advance the background
3079+    // chainstate past the snapshot base block.
3080+    if (WITH_LOCK(::cs_main, return m_disabled)) {
3081+        LogPrintf("m_disabled is set - this chainstate should not be in operation. " /* Continued */
3082+            "Please report this as a bug. %s\n", PACKAGE_BUGREPORT);

maflcko commented at 3:20 pm on March 13, 2023:

Same. (STR_INTERNAL_BUG(msg))

maflcko commented at 3:20 pm on March 13, 2023: member

left some nits. Feel free to ignore

Fabcien referenced this in commit f58299dfff on Oct 26, 2023

Fabcien referenced this in commit b9af6d5b90 on Oct 26, 2023

Fabcien referenced this in commit 004abc8d7e on Oct 26, 2023

Fabcien referenced this in commit 587d2b747f on Oct 26, 2023

bitcoin locked this on Mar 12, 2024

assumeutxo: background validation completion #25740

Reviews

Conflicts