assumeutxo: add init and completion logic #24232

pull jamesob wants to merge 18 commits into bitcoin:master from jamesob:2022-02-au-init-and-completion changing 19 files +1278 −262
  1. jamesob commented at 10:13 pm on February 1, 2022: member

    This is part of the assumeutxo project (parent PR: #15606)


    This changeset adds logic for detecting and activating snapshot chainstates on start, and for completing the snapshot validation process once the background chainstate reaches the base block of the assumed-valid chain. It also handles removing the snapshot chainstate data on-disk in the event that the snapshot fails to validate.

    As detailed in assumeutxo.md:

    Once the tip of the background chainstate hits the base block of the snapshot chainstate, we stop use of the background chainstate by setting m_stop_use, in CompleteSnapshotValidation(), which is checked in ActivateBestChain()). We hash the background chainstate’s UTXO set contents and ensure it matches the compiled value in CMainParams::m_assumeutxo_data.

    The background chainstate data lingers on disk until shutdown, when in ChainstateManager::Reset(), the background chainstate is cleaned up with ValidatedSnapshotShutdownCleanup(), which renames the chainstate_[hash] datadir as chainstate.

    Failure consideration: if bitcoind unexpectedly halts after m_stop_use is set on the background chainstate but before CompleteSnapshotValidation() can finish, the need to complete snapshot validation will be detected on subsequent init by ChainstateManager::CheckForUncleanShutdown().


    Most of this change is unittested, though some logic (anything assuming on-disk leveldb data) can’t be tested without some rework of unittest setup, given in-memory leveldb instances are used. I can look into how difficult this would be.

    This change adds the ability to unittest using on-disk leveldb coins dbs.

    Possible follow-ups

  2. DrahtBot added the label UTXO Db and Indexes on Feb 2, 2022
  3. DrahtBot added the label Validation on Feb 2, 2022
  4. DrahtBot commented at 3:54 am on February 2, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #25598 (doc: assumeutxo: format tables to be aligned in plain-text by theStack)
    • #25564 (Move DEFAULT_MAX_ORPHAN_TRANSACTIONS to node/txorphanage.h by MarcoFalke)
    • #25527 ([kernel 3c/n] Decouple validation cache initialization from ArgsManager by dongcarl)
    • #25077 (Fix chain tip data race and corrupt rest response by MarcoFalke)
    • #25038 (policy: nVersion=3 and Package RBF by glozow)
    • #22663 (dbwrapper: properly destroy objects in case CDBWrapper ctor throws by Crypt-iQ)
    • #19888 (rpc, test: Improve getblockstats for unspendables by fjahr)
    • #17786 (refactor: Nuke policy/fees->mempool circular dependencies by hebasto)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. in src/validation.h:1025 in 08f4bf18b1 outdated
    1021     //! ResizeCoinsCaches() as needed.
    1022     void MaybeRebalanceCaches() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    1023 
    1024+    //! When starting up, search the datadir for a chainstate based on a UTXO
    1025+    //! snapshot that is in the process of being validated.
    1026+    bool DetectSnapshotChainstate(CTxMemPool& mempool) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    


    MarcoFalke commented at 6:09 am on February 2, 2022:
    any reason this is a reference. memepool at the call site is already a pointer to allow setting it to nullptr
  6. in src/validation.h:1036 in 08f4bf18b1 outdated
    1032+
    1033+    //! Complete validation of the active snapshot if `chainstate` is in the process of
    1034+    //! background validating and has reached the base block of the snapshot.
    1035+    bool MaybeCompleteSnapshotValidation(
    1036+        CChainState& chainstate,
    1037+        CBlockIndex* pindexNew) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    


    MarcoFalke commented at 6:10 am on February 2, 2022:
    Any reason this is a pointer? This can’t be null, neither at the call site nor inside the function.
  7. in src/node/chainstate.cpp:41 in 08f4bf18b1 outdated
    37+
    38+    // If we're not using a snapshot or we haven't fully validated it yet,
    39+    // create a validation chainstate.
    40+    if (!chainman.IsSnapshotValidated()) {
    41+        LogPrintf("Loading validation chainstate\n");
    42+        chainman.InitializeChainstate(Assert(mempool));
    


    MarcoFalke commented at 6:11 am on February 2, 2022:
    any reason to add the assert?
  8. MarcoFalke commented at 6:12 am on February 2, 2022: member
    Looks like there are test failures
  9. jamesob force-pushed on Feb 2, 2022
  10. jamesob force-pushed on Feb 2, 2022
  11. jamesob commented at 3:15 pm on February 2, 2022: member

    Thanks for the quick feedback Mco.

    The test failure was actually due to removing a commit that I’d forgotten the use for. Basically, because we’re now calling MaybeRebalanceCaches in LoadChainstate, that prompts a FlushStateToDisk call before the block index is loaded, which trips some UB.

    The bounds check in https://github.com/bitcoin/bitcoin/pull/24232/commits/d70b8de51e54061e9ef045366f4a7651678dd82a fixes this.

  12. jamesob force-pushed on Feb 2, 2022
  13. jamesob force-pushed on Feb 2, 2022
  14. jamesob force-pushed on Feb 2, 2022
  15. in src/validation.cpp:5007 in 7e2b212f94 outdated
    5002+    CCoinsStats ibd_stats{CoinStatsHashType::HASH_SERIALIZED};
    5003+    auto breakpoint_fnc = [] { /* TODO insert breakpoint here? */ };
    5004+
    5005+    if (!GetUTXOStats(&ibd_coins_db, WITH_LOCK(::cs_main, return std::ref(m_blockman)), ibd_stats, breakpoint_fnc)) {
    5006+        LogPrintf("[snapshot] failed to generate stats for validation coins db\n");
    5007+        return false;
    


    jamesob commented at 3:04 pm on February 3, 2022:
    Should probably de-activate the existing snapshot chainstate as in the if (snapshot_invalid) conditional.
  16. in src/validation.cpp:5025 in 7e2b212f94 outdated
    5020+    auto maybe_au_data = ExpectedAssumeutxo(curr_height, ::Params());
    5021+
    5022+    if (!maybe_au_data) {
    5023+        LogPrintf("[snapshot] assumeutxo data not found for height " /* Continued */
    5024+            "(%d) - refusing to validate snapshot\n", curr_height);
    5025+        return false;
    


    jamesob commented at 3:04 pm on February 3, 2022:
    Same as above. These failure modes also need tests.
  17. in src/validation.cpp:5306 in 7e2b212f94 outdated
    5301+    // time we compare the UTXO set hash to the base of our active snapshot.
    5302+    if (&chainstate != &this->ActiveChainstate()) {
    5303+        if (pindexNew.nHeight > getSnapshotHeight()) {
    5304+            // TODO jamesob: better handling?
    5305+            LogPrintf("[snapshot] something is wrong! validation chain " /* Continued */
    5306+                "should not have continued past the snapshot origin\n");
    


    jamesob commented at 3:11 pm on February 3, 2022:
    We should either deactivate the snapshot chainstate or shutdown.
  18. jamesob force-pushed on Feb 8, 2022
  19. jamesob commented at 1:01 am on February 8, 2022: member

    Okay, I’ve pushed a revision that addresses my previous comments and more comprehensively (that is to say: at all) tests the major failure modes. I had to parameterize the shutdown behavior during snapshot completion to get this to work within the unittests.

    Pending CI pass, this should be ready for continued review.

  20. jamesob force-pushed on Feb 8, 2022
  21. jamesob force-pushed on Feb 8, 2022
  22. jamesob commented at 11:36 pm on February 8, 2022: member
    Alright, CI is finally passing. I had to add https://github.com/bitcoin/bitcoin/pull/24232/commits/5d5dfaa7a647307f58acdd1909fc8f34c1787f82 because an existing unittest that advanced an IBD chain past the snapshot it generated started failing. This isn’t something that should happen in practice, but was done to test an unrelated feature (UpdateTip behavior on a background chainstate).
  23. laanwj added this to the "Blockers" column in a project

  24. DrahtBot added the label Needs rebase on Feb 17, 2022
  25. in src/test/util/chainstate.h:71 in 9f89221913 outdated
    66+            // disturbing the existing BlockManager instance, which is needed to
    67+            // recognize the headers chain previously generated by the chainstate we're
    68+            // removing. Without those headers, we can't activate the snapshot below.
    69+            LOCK(::cs_main);
    70+            uint256 gen_hash = node.chainman->ActiveChainstate().m_chain[0]->GetBlockHash();
    71+            node.chainman->Reset();
    


    MarcoFalke commented at 12:20 pm on March 7, 2022:
    Looks like #24299 was merged, so you might have to inline the needed resets here (or create a helper function for it).
  26. jamesob force-pushed on Mar 7, 2022
  27. jamesob commented at 5:00 pm on March 7, 2022: member
    Alright, have rebased; basically I just took the old ChainstateManager::Reset() method and renamed it to cleanup(), and applied @jonatack -style changes (lock assertion/annotation vs. LOCK).
  28. DrahtBot removed the label Needs rebase on Mar 7, 2022
  29. jonatack commented at 7:02 pm on March 10, 2022: contributor
    Planning to review soon.
  30. in src/validation.cpp:5419 in becdbd6da4 outdated
    5108+            if (found.empty()) {
    5109+                found = it->path();
    5110+                LogPrintf("[snapshot] found snapshot datadir %s\n", fs::PathToString(found));
    5111+            } else {
    5112+                LogPrintf("[snapshot] WARNING - detected multiple snapshot " /* Continued */
    5113+                    "datadirs (%s). This is not expected.\n", fs::PathToString(it->path()));
    


    ryanofsky commented at 1:37 am on March 23, 2022:

    In commit “init: add utxo snapshot detection” (becdbd6da4c6d7c9c658ffc38b1c9da1c89571a1)

    Log message seems vague about what the result of this warning is. I think it would be clearer if log message said the directory would be ignored like “[snapshot] WARNING: Multiple snapshot directories were not expected. Ignoring unexpected snapshot directory {it->path}.”

    Also, as long as the code is not capable of handling multiple snapshots it seems like a better design for this might just be to call the directory <datadir>/chainstate_snapshot instead of <datadir>/chainstate_{blockhash} so there would be no ambiguity. Blockhash could be dumped to a <datadir>/chainstate_snapshot/block text file, if the idea is that it should be human readable (or maybe a chainstate_snapshot/snapshot.json file to be more fancy.)

    In any case, I don’t know what happens to the ignored snapshot directories after the background download completes? Will it it finish validating one snapshot and then go onto validate the next snapshot next time bitcoind is restarted?


    ryanofsky commented at 1:38 pm on March 25, 2022:

    In commit “init: add utxo snapshot detection” (becdbd6da4c6d7c9c658ffc38b1c9da1c89571a1)

    re: https://github.com/bitcoin/bitcoin/pull/24232/#discussion_r832762880

    Also, as long as the code is not capable of handling multiple snapshots it seems like a better design for this might just be to call the directory <datadir>/chainstate_snapshot instead of <datadir>/chainstate_{blockhash} so there would be no ambiguity.

    I take back this part of my suggestion. I think current design of iterating all the chainstate directories is good, but the code should be improved to be more robust (in the future, not necessarily here). I think the best thing for bitcoin to do at startup would be to list all the chainstate* directories and use whichever one has the most work at its tip as the active chainstate. Then, if that active chainstate comes from a snapshot that needs to be validated, additionally load up the “ibd” chainstate and sync it up in the background to validate the starting snapshot.

    This makes the right thing happen regardless of what directories are present. Snapshots will be used if they are useful, ignored if they are not, and if they are manually added or removed it causes no problems.


    jamesob commented at 4:08 pm on April 11, 2022:

    I think the best thing for bitcoin to do at startup would be to list all the chainstate* directories and use whichever one has the most work at its tip as the active chainstate.

    I agree this would be a nice feature, but the process of unpacking snapshots and loading them to the point of being able to evaluate “most work” entails some big changes, so I’ll defer this one for later.

    I do like your suggestion of calling it <datadir>/chainstate_snapshot and will incorporate that.


    jamesob commented at 2:30 am on April 21, 2022:
    I’ve changed the code to write the snapshot chainstate leveldb to chainstate_snapshot. A separate file within that directory called base_blockhash contains a serialized uint256 of what used to be in the directory name.
  31. in src/validation.cpp:5155 in fe09582ec3 outdated
    5150+            datadir_hash.ToString(), m_from_snapshot_blockhash->ToString());
    5151+        return false;
    5152+    }
    5153+
    5154+    LogPrintf("[snapshot] tearing down snapshot datadir %s\n", fs::PathToString(*snapshot_datadir));
    5155+    assert(fs::remove_all(*snapshot_datadir) > 0);
    


    ryanofsky commented at 2:02 am on March 23, 2022:

    In commit “validation: tear down snapshot datadirs upon validation failure” (fe09582ec34ff4d19bb2aa490ffd6ccca95e8480)

    Two issues this line:

    • Should not put code intended to produce side effects in an assert because it will be skipped if asserts are disabled. I know the bitcoin project does not “support” building with asserts disabled, but I don’t think it should go out of its way to be broken either. Also, people who used to reading conventional c++ code may skim past asserts not expecting to see real behavior hidden inside them. Would suggest if (remove_all() <= 0) { LogPrint("Warning failed to remove {snapshot}"); } since this is handling an error condition inside an error condition, and asserts should be used to check code correctness, not handle i/o errors anyway.

    • IMO, it is usually not appropriate to call removal_all in production code. One stray mv or mount bind command combined with this could result in deleting data that doesn’t belong to us, and potentially very large data losses. Would be safer to use leveldb::DestroyDB on the database and fs::remove calls individually on any of our own files or directories.


    jamesob commented at 10:27 pm on April 13, 2022:
    Both good points, both fixed. I’m now using leveldb::DestroyDB() to remove chainstate data.
  32. in src/validation.cpp:5154 in fe09582ec3 outdated
    5149+        LogPrintf("[snapshot] WARNING - unexpected blockhash for snapshot datadir (%s); expected %s\n",
    5150+            datadir_hash.ToString(), m_from_snapshot_blockhash->ToString());
    5151+        return false;
    5152+    }
    5153+
    5154+    LogPrintf("[snapshot] tearing down snapshot datadir %s\n", fs::PathToString(*snapshot_datadir));
    


    ryanofsky commented at 2:08 am on March 23, 2022:

    In commit “validation: tear down snapshot datadirs upon validation failure” (fe09582ec34ff4d19bb2aa490ffd6ccca95e8480)

    Here and other places, may be better to replace “tear down” with “remove”. I’m not exactly sure what tearing down implies, but it sounds different than removing.

  33. in src/validation.cpp:4803 in fe09582ec3 outdated
    4797@@ -4798,7 +4798,9 @@ bool ChainstateManager::ActivateSnapshot(
    4798         *snapshot_chainstate, coins_file, metadata);
    4799 
    4800     if (!snapshot_ok) {
    4801-        WITH_LOCK(::cs_main, this->MaybeRebalanceCaches());
    4802+        LOCK(::cs_main);
    4803+        this->MaybeRebalanceCaches();
    4804+        snapshot_chainstate->TeardownSnapshotDatadir();
    


    ryanofsky commented at 2:34 am on March 23, 2022:

    In commit “validation: tear down snapshot datadirs upon validation failure” (fe09582ec34ff4d19bb2aa490ffd6ccca95e8480)

    Return value is not checked here. It seems like right now this could say:

    0bool removed = snapshot_chainstate->TeardownSnapshotDatadir();
    1assert(removed);
    

    since logically it should be impossible for any of the conditions where TeardownSnapshotDatadir returns false to trigger currently. However, this is pretty fragile, and if you’re not planning to call TeardownSnapshotDatadir from other places in the future, I think it would be better to make TeardownSnapshotDatadir return void instead of bool, and turn the checks inside into asserts to make them function preconditions.

    I don’t mean to nitpick but I think anyplace errors are not checked it increases chances that bugs will slip in and go undetected.

  34. in src/node/blockstorage.cpp:536 in 1aeca9ae4d outdated
    557+        // chainstate init, when we call ChainstateManager::MaybeRebalanceCaches() (which
    558+        // then calls FlushStateToDisk()), resulting in a call to this function before we
    559+        // have populated `m_blockfile_info` via LoadBlockIndexDB().
    560+        return;
    561+    }
    562     FlatFilePos block_pos_old(m_last_blockfile, m_blockfile_info[m_last_blockfile].nSize);
    


    ryanofsky commented at 2:51 am on March 23, 2022:

    In commit “blockmanager: avoid undefined behavior during FlushBlockFile” (1aeca9ae4d9938a9ae5a06dbe16d8b1cd40b9960)

    Would add assert(m_blockfile_info.size() > m_last_blockfile) to catch the related UB condition.


    jamesob commented at 8:03 pm on April 13, 2022:
    Added, thanks.
  35. in src/validation.h:498 in c703836160 outdated
    491@@ -492,6 +492,12 @@ class CChainState
    492     //! Manages the UTXO set, which is a reflection of the contents of `m_chain`.
    493     std::unique_ptr<CoinsViews> m_coins_views;
    494 
    495+    //! This toggle exists for use when doing background validation for UTXO
    496+    //! snapshots. It is set once the background validation chain reaches the
    497+    //! same height as the base of the snapshot, and signals that we should no
    498+    //! longer connect blocks to this chainstate.
    


    ryanofsky commented at 3:01 am on March 23, 2022:

    In commit “validation: add CChainState::m_stop_use and ChainMan::isUsable” (c703836160f3eca8073c22a1531bb98ad9666999)

    Would s/this chainstate/the background chainstate/. Which chainstate is “this chainstate” was not immediately obvious.


    jamesob commented at 10:27 pm on April 13, 2022:
    Fixed
  36. in src/validation.h:877 in c703836160 outdated
    871@@ -866,6 +872,15 @@ class ChainstateManager
    872     CBlockIndex* getSnapshotBaseBlock() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    873     std::optional<int> getSnapshotHeight() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    874 
    875+    //! Return true if a chainstate is considered usable.
    876+    //!
    877+    //! This might be false for, say, a background validation chainstate which has
    


    ryanofsky commented at 3:15 am on March 23, 2022:

    In commit “validation: add CChainState::m_stop_use and ChainMan::isUsable” (c703836160f3eca8073c22a1531bb98ad9666999)

    Would change “This might be false, for, say” to “This will be false for”. IMO wishy washy comments are hard to understand and don’t provide any real future proofing (A veritably false comment is still less confusing than a technically true but mostly misleading comment)


    jamesob commented at 10:31 pm on April 13, 2022:
    Wishywashy comment no more.
  37. ryanofsky commented at 3:22 am on March 23, 2022: contributor

    Started review (will update list below with progress).

    • becdbd6da4c6d7c9c658ffc38b1c9da1c89571a1 init: add utxo snapshot detection (1/11)
    • fe09582ec34ff4d19bb2aa490ffd6ccca95e8480 validation: tear down snapshot datadirs upon validation failure (2/11)
    • d8f32839cfbc8cbfe194d190bdc19fb3c4d6e32a add ChainstateManager.getSnapshot{Height,BaseBlock}() (3/11)
    • b28ed6628850f2f4eceb33c059c0d9997972bbc0 db: add StoragePath() to CDBWrapper and CCoinsViewDB (4/11)
    • 1aeca9ae4d9938a9ae5a06dbe16d8b1cd40b9960 blockmanager: avoid undefined behavior during FlushBlockFile (5/11)
    • c703836160f3eca8073c22a1531bb98ad9666999 validation: add CChainState::m_stop_use and ChainMan::isUsable (6/11)
    • bf36616fc62f668c45d30449fb5b7c2c84a73a99 add CChainState::HasCoinsViews() (7/11)
    • 328aa05c28fe9e5fff5d90e9b23bbbb5f6da1846 validation: add ChainMan logic for completing UTXO snapshot validation (8/11)
    • b1c55101c2883bf03777c320cec3b05b58454a53 move-only: test: make snapshot chainstate setup reusable (9/11)
    • 9916bb825f6ef69f6c906dd4154309df07aa9d03 test: add reset_chainstate parameter for snapshot unittests (10/11)
    • 51d0dd34ecfb42d80707ec6cf788f0e63e30439f test: add testcases for snapshot completion (11/11)
  38. in src/node/chainstate.cpp:140 in 328aa05c28 outdated
    136@@ -137,6 +137,8 @@ std::optional<ChainstateLoadingError> LoadChainstate(bool fReset,
    137         }
    138     }
    139 
    140+    chainman.checkForUncleanShutdown();
    


    ryanofsky commented at 5:36 pm on March 24, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (328aa05c28fe9e5fff5d90e9b23bbbb5f6da1846)

    I think it would simplify things and be less confusing to drop this checkForUncleanShutdown() method and just call maybeCompleteSnapshotValidation() here unconditionally.

    I think it overcomplicates things to think of this as special “unclean” case. I think if shutdown is requested, a node should stop what it is doing and shut down. If it was in the middle of doing work, and it has to pick up and redo some of that work on the restart, there is nothing “unclean” about that, it’s just what you do when you are trying to write a robust and responsive application.


    jamesob commented at 8:14 pm on April 13, 2022:
    Yep, good idea. Done. I’m now just calling maybeCompleteSnapshotValidation() from within LoadChainstate() (during init). This dovetails with the broader change of doing background validation cleanup on initialization (instead of shutdown) which I’ll describe in other comments.
  39. in src/validation.cpp:5055 in 328aa05c28 outdated
    5050+       return {false, {}};
    5051+    }
    5052+
    5053+    auto handleInvalidSnapshot = [&]() EXCLUSIVE_LOCKS_REQUIRED(::cs_main) {
    5054+        LogPrintf("[snapshot] !!! the snapshot you've been working off of is invalid\n");
    5055+        LogPrintf("[snapshot] deleting snapshot and reverting to validated chain\n");
    


    ryanofsky commented at 6:04 pm on March 24, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (328aa05c28fe9e5fff5d90e9b23bbbb5f6da1846)

    I think this error should trigger more than logprints and a silent shutdown. Especially since after this a GUI user or systemd service script is likely to just bring the node up again and continue syncing like nothing happened. Probably would be best to call AbortNode() here to trigger a more noticeable error, and not just close the GUI with no indication of why it is closing.

    Also text of this error doesn’t seem to give an indication of what is causing it or how it should be dealt with. IIUC there are 3 possible causes for this error:

    1. Bitcoin sources were modified to include the hash of an invalid snapshot
    2. Bitcoin validation code is buggy
    3. Local data or memory corruption

    I’m assuming (3) is the most likely cause and the error message could reflect this. Maybe would suggest:

    {PACKAGE_NAME} failed to validate the assumeutxo snapshot state. This indicates a hardware problem, or a bug in the software, or an bad software modification that allowed an invalid snapshot to be loaded. As a result of this, the node will shut down and delete any state that was built on the snapshot, resetting the chain height from {snapshot_tip_height} to {snapshot_base_height}. On the next restart, the node will resume syncing from {snapshot_base_height} not using any snapshot data.

    Also to minimize potential damage, maybe help to debug or recover, and just avoid wiping traces of a bad thing that happened, i think it would probably be better to just rename the snapshot directory with an -invalid suffix instead of trying to delete it.


    ariard commented at 4:07 pm on April 8, 2022:
    I think the log print can be even more proactive, if the causes of this error is 1., the log print could invite the node operator to report the snapshot source to the project, e.g opening an issue. That would help to identify and correct wrongdoing in the snapshot distribution circuit.

    jamesob commented at 10:20 pm on April 13, 2022:
    Now using AbortNode() as well as the nice log message you specify; I’m also including PACKAGE_BUGREPORT which lists the github issues URL. Thanks for the great suggestion.
  40. in src/test/util/chainstate.h:20 in 9916bb825f outdated
    15 
    16 #include <univalue.h>
    17 
    18 #include <boost/test/unit_test.hpp>
    19 
    20+using node::NodeContext;
    


    ryanofsky commented at 8:35 pm on March 24, 2022:

    In commit “test: add reset_chainstate parameter for snapshot unittests” (9916bb825f6ef69f6c906dd4154309df07aa9d03)

    Would avoid using statement in header file because it’s not limited to this file, but applies to any cpp file including it.


    jamesob commented at 8:16 pm on April 13, 2022:
    Done, thanks.
  41. in src/test/util/chainstate.h:65 in 9916bb825f outdated
    59@@ -48,6 +60,33 @@ CreateAndActivateUTXOSnapshot(node::NodeContext& node, const fs::path root, F ma
    60 
    61     malleation(auto_infile, metadata);
    62 
    63+    if (reset_chainstate) {
    64+        {
    65+            // What follows is a hack to selectively reset chainstate data without
    


    ryanofsky commented at 8:52 pm on March 24, 2022:

    In commit “test: add reset_chainstate parameter for snapshot unittests” (9916bb825f6ef69f6c906dd4154309df07aa9d03)

    Can the comment say what in particular is hacky and what would make it less hacky? This just seems like it is unitiializing and reinitializing the chainstate, not doing anything too egregrious. Is the hacky aspect of this just that its is duplicating a lot of code. Would it be easy to deduplicate in a followup?


    jamesob commented at 8:18 pm on April 13, 2022:
    Removed the mention of hackiness. Perhaps this could live as a separate function if it was found to be useful elsewhere, but I think for now it’s fine here. The code itself is sort of single-purpose; I don’t think there’s another case where we want to do such an unceremonious unload of all the chainstate data.
  42. in src/validation.cpp:5140 in fe09582ec3 outdated
    5135+{
    5136+    if (!m_from_snapshot_blockhash) {
    5137+        // Chainstate isn't based on a snapshot.
    5138+        return false;
    5139+    }
    5140+    std::optional<fs::path> snapshot_datadir = FindSnapshotChainstateDatadir();
    


    ryanofsky commented at 1:08 am on March 25, 2022:

    In commit “validation: tear down snapshot datadirs upon validation failure” (fe09582ec34ff4d19bb2aa490ffd6ccca95e8480)

    It seems fragile that in order to remove a particular snapshot chainstate’s data files, we have to call a function that iterates all snapshot chainstate directories, and hope that it returns the same directory, and then do extra checks below to ensure that it did actually return the same one.

    It would be nice to introduce a fs::path ChainstatePath(std::optional<uint256> from_snapshot_blockhash) function and just say fs::path snapshot_datadir = ChainstatePath(m_from_snapshot_blockhash) here. (ChainstatePath would just do the inverse of PathToSnapshotHash and probably would be useful to call other places).

    EDIT: Or maybe this could just call the StoragePath() method added later to get the chainstate directory path from *this?


    jamesob commented at 2:32 am on April 21, 2022:
    I’ve simplified all this with the rename to chainstate_snapshot.
  43. in src/validation.cpp:5089 in 328aa05c28 outdated
    5084+    assert(this->GetAll().size() == 2);
    5085+
    5086+    CCoinsViewDB& ibd_coins_db = chainstate.CoinsDB();
    5087+    chainstate.ForceFlushStateToDisk();
    5088+
    5089+    auto snapshot_blockhash = SnapshotBlockhash().value_or(uint256());
    


    ryanofsky commented at 1:55 am on March 25, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (328aa05c28fe9e5fff5d90e9b23bbbb5f6da1846)

    SnapshotBlockhash() not being nullopt seems like another thing that could be asserted above so this impossible value_or() case can go away


    jamesob commented at 10:20 pm on April 13, 2022:
    Done.
  44. in src/validation.cpp:5078 in 328aa05c28 outdated
    5073+     } else if (index_new.nHeight < snapshot_height) {
    5074+         // Background IBD not complete yet.
    5075+         return {false, {}};
    5076+     }
    5077+
    5078+    int curr_height = chainstate.m_chain.Height();
    


    ryanofsky commented at 2:02 am on March 25, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (328aa05c28fe9e5fff5d90e9b23bbbb5f6da1846)

    Here and below I think it would be make code more readable to stop using the generic chainstate variable and instead use the unambiguous m_ibd_chainstate variable, since at this point point to the same chainstate.


    jamesob commented at 10:20 pm on April 13, 2022:
    Done, thanks.
  45. in src/validation.cpp:5095 in 328aa05c28 outdated
    5090+
    5091+    LogPrintf("[snapshot] tip: actual=%s expected=%s\n",
    5092+        chainstate.m_chain.Tip()->ToString(),
    5093+        snapshot_blockhash.ToString());
    5094+
    5095+    assert(index_new.GetBlockHash() == snapshot_blockhash);
    


    ryanofsky commented at 2:08 am on March 25, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (328aa05c28fe9e5fff5d90e9b23bbbb5f6da1846)

    This seems like an error that actually needs to be checked for, not something that can asserted. If the snapshot blockhash comes from the directory name, an outside process renaming the directory could trigger this?


    jamesob commented at 10:20 pm on April 13, 2022:
    Good point, done.
  46. in src/validation.cpp:5121 in 328aa05c28 outdated
    5116+    }
    5117+
    5118+    // Compare the background validation chainstate's UTXO set hash against the hard-coded
    5119+    // assumeutxo hash we expect.
    5120+    //
    5121+    // TODO: For belt-and-suspenders, we could cache an obfuscated version of the UTXO set
    


    ryanofsky commented at 2:11 am on March 25, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (328aa05c28fe9e5fff5d90e9b23bbbb5f6da1846)

    Why obfuscated?


    jamesob commented at 10:21 pm on April 13, 2022:
    Removed.
  47. in src/validation.cpp:5139 in 328aa05c28 outdated
    5134+
    5135+    m_snapshot_validated = true;
    5136+
    5137+    // m_ibd_chainstate is the same as chainstate, per the assertion above.
    5138+    // Modification happens to the argument (this symbol) to make it explicit that
    5139+    // the caller might act specifically based on this value being different.
    


    ryanofsky commented at 2:15 am on March 25, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (328aa05c28fe9e5fff5d90e9b23bbbb5f6da1846)

    I don’t think I understand what this comment is saying. Is using the chainstate variable future-proofing for some upcoming change? What could it mean for the caller to “act specifically based on this value”?


    jamesob commented at 10:21 pm on April 13, 2022:
    I must’ve been in a haze when I typed this - I have no idea what it means either. Removed.
  48. in src/validation.cpp:5338 in 328aa05c28 outdated
    5333+    LogPrintf("[snapshot] deleting background chainstate directory (now unnecessary) (%s)\n",
    5334+        fs::PathToString(old_chainstate));
    5335+
    5336+    // Instead of deleting the background chainstate directly, rename the old
    5337+    // chainstate to chainstate_bak, rename the new chainstate into place, and then
    5338+    // delete to avoid likelihood of corruption from mid-delete shutdown.
    


    ryanofsky commented at 2:19 am on March 25, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (328aa05c28fe9e5fff5d90e9b23bbbb5f6da1846)

    This isn’t really specific to shutdown. Renaming a directory before deleting it is just good practice because renaming is atomic and deleting a directory isn’t.


    jamesob commented at 10:27 pm on April 13, 2022:
    Removed comment
  49. in src/validation.h:867 in d8f32839cf outdated
    861@@ -862,6 +862,10 @@ class ChainstateManager
    862         CBlockIndex** ppindex) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
    863     friend CChainState;
    864 
    865+    // Returns nullptr if no snapshot ahs been loaded.
    866+    CBlockIndex* getSnapshotBaseBlock() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    867+    std::optional<int> getSnapshotHeight() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    


    ryanofsky commented at 3:02 pm on March 25, 2022:

    In commit “add ChainstateManager.getSnapshot{Height,BaseBlock}()” (d8f32839cfbc8cbfe194d190bdc19fb3c4d6e32a)

    Would rename getSnapshotHeight to getSnapshotBaseHeight for two reasons:

    • To be consistent with getSnapshotBaseBlock
    • To disambiguate. “Snapshot” could refer to snapshot chainstate rather than snapshot itself, and chainstate may have a height higher than the initial snapshot

    This is a tangent, but I’m increasingly disliking the terms “ibd chainstate” and “snapshot chainstate.” If had a magic wand I would abolish them everywhere. I think ideally a chainstate would consist of a UTXO set, a tip pointer, and “validated starting from” block pointer and “validated starting from” UTXO hash. ChainstateManager would have a list of chainstates, and one of them would be active with rest inactive. Network code would download blocks to validate chainstates, prioritizing blocks for the active chainstate. ChainstateManager would connect the incoming blocks, and if the tip of any chainstate crossed the “validated starting from” block of a later chainstate, and UTXO hash matched, the later “validated_starting_from” pointer would moved back to the earlier one, and earlier chainstate (optionally) discarded. Probably having explicit tip and “validated starting from” pointers would also allow CBlockIndex validation flags to be simplified so more code doesn’t need to care about assumeutxo. This could all be cleanup though. More important to get the existing code that’s already merged functional and not change things at this point.


    jamesob commented at 10:31 pm on April 13, 2022:
    Renamed. Agree with your thoughts on the better design, and also agree that the path from here to there is not so long. I’d just prefer to wait until we actually need to revamp the design (e.g. multiple parallel background chainstates) to do all the code changes.
  50. in src/validation.cpp:5175 in 328aa05c28 outdated
    5171@@ -5051,6 +5172,34 @@ void ChainstateManager::Unload()
    5172     m_best_invalid = nullptr;
    5173 }
    5174 
    5175+void ChainstateManager::cleanup()
    


    ryanofsky commented at 3:24 pm on March 25, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (328aa05c28fe9e5fff5d90e9b23bbbb5f6da1846)

    I think I’d NACK the approach for trying to rename and delete things on shutdown. I don’t see why it’s not possible to rename and delete the snapshot when it is validated. I understand that maybe you want the pointers to be stable so can’t delete the CChainstate object, but that doesn’t seem like a reason why you can’t close/destroy databases and rename directories.

    If I’m missing something and it really is not possible to rename the chainstate directories while the node is running, the best time to rename them would be startup time, not shutdown time. Shutdown is the time to stop work, not begin it, and it’s especially precarious if bitcoin is running on a constantly rebooting windows machine or as a systemd service, where the system may ask nicely for a shutdown but quickly run out of patience and kill the process.


    jamesob commented at 8:12 pm on April 11, 2022:
    Good points, willfix.

    jamesob commented at 10:23 pm on April 13, 2022:
    I’m now doing the “cleanup” (i.e. deletion of background chainstate data) on startup, and not shutdown. I’d like to experiment with doing deletion inline in a follow-up PR.
  51. ryanofsky commented at 3:45 pm on March 25, 2022: contributor

    Now reviewed the full PR, but will wait for answers to some questions before adding ACK.

    I do think this PR could be merged in its current form, but some things, particularly detecting chainstate directories on startup, and renaming snapshot chainstate directory when its validated could be handled in a more robust way. So would want to clean these things up or find out what future plans may be.

    Overall this looks very good. Could be a candidate for review club, I think.

  52. in src/validation.cpp:5418 in becdbd6da4 outdated
    5107+        {
    5108+            if (found.empty()) {
    5109+                found = it->path();
    5110+                LogPrintf("[snapshot] found snapshot datadir %s\n", fs::PathToString(found));
    5111+            } else {
    5112+                LogPrintf("[snapshot] WARNING - detected multiple snapshot " /* Continued */
    


    ariard commented at 5:00 pm on April 5, 2022:
    At least one improvement of the log message could be to advice the node operator on what they should do, e.g deleting the oldest snapshot one or whatever could be appropriate.
  53. in src/validation.cpp:5149 in fe09582ec3 outdated
    5144+    }
    5145+
    5146+    uint256 datadir_hash = PathToSnapshotHash(*snapshot_datadir);
    5147+
    5148+    if (datadir_hash != *m_from_snapshot_blockhash) {
    5149+        LogPrintf("[snapshot] WARNING - unexpected blockhash for snapshot datadir (%s); expected %s\n",
    


    ariard commented at 5:13 pm on April 5, 2022:
    Same comment, unclear what a node operator should do with an invalid, name-different snapshot datadir.
  54. in src/validation.h:865 in d8f32839cf outdated
    861@@ -862,6 +862,10 @@ class ChainstateManager
    862         CBlockIndex** ppindex) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
    863     friend CChainState;
    864 
    865+    // Returns nullptr if no snapshot ahs been loaded.
    


    ariard commented at 5:14 pm on April 5, 2022:
    nit: s/ahs/has/g

    jamesob commented at 10:28 pm on April 13, 2022:
    Fixed, thanks
  55. ariard commented at 5:48 pm on April 5, 2022: member
    Reviewed until bf36616
  56. DrahtBot added the label Needs rebase on Apr 6, 2022
  57. in src/test/validation_chainstate_tests.cpp:94 in 51d0dd34ec outdated
    89@@ -90,7 +90,8 @@ BOOST_FIXTURE_TEST_CASE(chainstate_update_tip, TestChain100Setup)
    90     // After adding some blocks to the tip, best block should have changed.
    91     BOOST_CHECK(::g_best_block != curr_tip);
    92 
    93-    BOOST_REQUIRE(CreateAndActivateUTXOSnapshot(m_node, m_path_root));
    94+    BOOST_REQUIRE(CreateAndActivateUTXOSnapshot(
    95+        this, NoMalleation, /* reset_chainstate */ true));
    


    fanquake commented at 10:34 am on April 7, 2022:
    0        this, NoMalleation, /*reset_chainstate=*/true));
    

    Anywhere you are adding / touching named args, please use the style from the developer docs.


    jamesob commented at 10:14 pm on April 13, 2022:
    Done, thanks.
  58. in src/validation.cpp:2967 in 51d0dd34ec outdated
    2963@@ -2959,6 +2964,12 @@ bool CChainState::ActivateBestChain(BlockValidationState& state, std::shared_ptr
    2964 
    2965         if (nStopAtHeight && pindexNewTip && pindexNewTip->nHeight >= nStopAtHeight) StartShutdown();
    2966 
    2967+        // This will have been toggled in ABCStep -> ConnectTip -> completeSnapshotValidation,
    


    ariard commented at 3:41 pm on April 8, 2022:
    nit: maybeCompleteSnapshotValidation

    jamesob commented at 10:19 pm on April 13, 2022:
    Fixed, thanks.
  59. in src/validation.h:781 in 328aa05c28 outdated
    783+enum class SnapshotCompletionError {
    784+    IBD_TOO_FAR,
    785+    MISSING_CHAINPARAMS,
    786+    STATS_FAILED,
    787+    HASH_MISMATCH,
    788+};
    


    ariard commented at 4:18 pm on April 8, 2022:
    Nice to add comments about error purposes.

    jamesob commented at 10:27 pm on April 13, 2022:
    Done
  60. in src/validation.cpp:5113 in 328aa05c28 outdated
    5108+
    5109+    if (!GetUTXOStats(&ibd_coins_db, WITH_LOCK(::cs_main, return std::ref(m_blockman)), ibd_stats, breakpoint_fnc)) {
    5110+        LogPrintf("[snapshot] failed to generate stats for validation coins db\n");
    5111+        // While this isn't a proablem with the snapshot per se, this condition
    5112+        // prevents us from validating the snapshot, so we should shut down and let the
    5113+        // use handle the issue manually.
    


    ariard commented at 4:38 pm on April 8, 2022:
    nit: s/proablem/problem/g and s/use/user/g

    jamesob commented at 10:20 pm on April 13, 2022:
    Done, thanks.
  61. in src/validation.cpp:5178 in 328aa05c28 outdated
    5052+
    5053+    auto handleInvalidSnapshot = [&]() EXCLUSIVE_LOCKS_REQUIRED(::cs_main) {
    5054+        LogPrintf("[snapshot] !!! the snapshot you've been working off of is invalid\n");
    5055+        LogPrintf("[snapshot] deleting snapshot and reverting to validated chain\n");
    5056+
    5057+        m_active_chainstate = m_ibd_chainstate.get();
    


    ariard commented at 4:58 pm on April 8, 2022:

    From reading ChainstateManager docs, there are two notions of IBD chainstates, the ibd chainstate and the background ibd chainstate. However, the ibd chainstate is only used here, after it has been fully validated and won’t be used further as node operation is shutting down.

    So what’s the purpose of using the notion of an ibd_chainstate, couldn’t m_ibd_chainstate be called m_background_ibd_chainstate ?


    jamesob commented at 8:40 pm on April 11, 2022:
    m_ibd_chainstate is the active if a snapshot is not being used, so calling it m_background_... wouldn’t be appropriate.
  62. ariard commented at 5:05 pm on April 8, 2022: member
    Reviewed 328aa05c, I think a commit can be added to update bitcoin/assumeutxo.md in consequence, the subsection “Background chainstate hits snapshots base block”. Especially, the chainstate directories/caches dance done in function of completion success or failure.
  63. jamesob commented at 10:41 pm on April 13, 2022: member

    Thanks to everyone (esp. @ryanofsky) for the feedback so far.

    I’m about to push the latest iteration of this branch. The one bit of feedback I’m not incorporating (yet) is the rename of the leveldb dir chainstate_[blockhash] to chainstate_snapshot, mostly because introducing a non-leveldb file (blockhash or whatever we eventually call it) complicates the DestroyDB/removeSnapshotDatadir process, since it’s a non-leveldb file that leaves the directory dangling after the leveldb::DestroyDB call. But that said, I do want to go that route at some point.

    Some notable changes resulting from the feedback so far:

    • I’ve added utilities to create the coinsdb on disk (instead of in memory) within unittests; previously all chainstate data has been in memory for unittests, but this made testing snapshot cleanup basically impossible. I think having the default case be for coinsdb to be persisted to disk makes the tests more realistic anyway, which is good.
    • Based upon Russ’ good feedback, the unnecessary background validation chainstate data is now deleted at subsequent initialization (instead of at shutdown). It’s plausible to me that we may be able to do this “inline” with snapshot validation, but I’ll have to look into that and spend a few more days testing it.
  64. jamesob force-pushed on Apr 13, 2022
  65. DrahtBot removed the label Needs rebase on Apr 14, 2022
  66. jamesob force-pushed on Apr 21, 2022
  67. jamesob force-pushed on Apr 21, 2022
  68. jamesob requested review from ryanofsky on Apr 21, 2022
  69. jamesob requested review from MarcoFalke on Apr 21, 2022
  70. jamesob requested review from ariard on Apr 21, 2022
  71. in src/validation.cpp:5119 in 2efe7c7ec8 outdated
    5255@@ -5230,6 +5256,136 @@ bool ChainstateManager::PopulateAndValidateSnapshot(
    5256     return true;
    5257 }
    5258 
    5259+// Currently, this function holds cs_main for its duration, which could be for
    5260+// multiple minutes due to the GetUTXOStats call. This hold is necessary
    5261+// because we need to avoid advancing the background validation chainstate
    5262+// farther than the snapshot base block - and this function is also invoked
    5263+// from within ConnectTip, i.e. from within ActivateBestChain, so cs_main is
    5264+// held anyway.
    


    jamesob commented at 1:36 pm on April 21, 2022:

    Perhaps it isn’t actually necessary to hold cs_main throughout this if I set m_stop_use immediately, and then ensure all chainstate accesses are going through isUsable() checks…

    Edit: anyway, that’s an optional optimization that can be looked at in a future PR.


    ryanofsky commented at 7:33 pm on July 6, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (00b319a35cfb691a2fb831e84f63512412cae068)

    Perhaps it isn’t actually necessary to hold cs_main throughout this if I set m_stop_use immediately, and then ensure all chainstate accesses are going through isUsable() checks…

    Edit: anyway, that’s an optional optimization that can be looked at in a future PR.

    Maybe change the TODO comment below to mention this idea. It seems like a more direct solution than using different locks.


    jamesob commented at 1:06 pm on July 20, 2022:
    Done
  72. in src/dbwrapper.h:230 in 98fc81fbaa outdated
    216@@ -212,6 +217,12 @@ class CDBWrapper
    217 
    218     std::vector<unsigned char> CreateObfuscateKey() const;
    219 
    220+    //! path to filesystem storage
    221+    const fs::path m_path;
    222+
    223+    //! whether or not the database resides in memory
    224+    bool m_is_memory;
    


    ryanofsky commented at 4:04 pm on April 21, 2022:

    In commit “db: add StoragePath, Options to CDBWrapper/CCoinsViewDB” (98fc81fbaaa02792b94264680cc0062ccfde9698)

    Minor: Might be a little nicer to combine two members these into a std::optional<std::path> member to avoid work when StoragePath is called, and to prevent possibility of class being in a confusing m_is_memory && !m_path.empty() state


    jamesob commented at 4:49 pm on April 27, 2022:
    While I agree, there are a few users of CDBWrapper outside the scope of this PR (BaseIndex::DB, CBlockTreeDB) - they seem to construct with non-empty paths even when being used in memory, so making this change will involve a few out-of-scope modifications. I agree with you that this is the right way to go, and would be a nice follow-on.
  73. in src/validation.cpp:5217 in f427976e6a outdated
    5212+    // `DestroyDB` will fail. See `leveldb::~DBImpl()`.
    5213+    //
    5214+    // And this chainstate should no longer be considered usable anyway, so
    5215+    // teardown all coinsviews.
    5216+    m_coins_views.reset();
    5217+    return dbwrapper::DestroyDB(db_path, options).ok();
    


    ryanofsky commented at 4:16 pm on April 21, 2022:

    In commit “validation: add CChainState::destroyCoinsDB” (f427976e6a3e024564fa824cc0578ee9b1285198)

    Is it necessary for this function to be passed a database path instead of getting the path from this->CoinsDB->StoragePath()? If so, would be helpful to add a comment saying when db_path might not the coinsdb path. If not, would be nice to eliminate the db_path argument to simplify and remove the possibility of bugs from passing the wrong path.


    jamesob commented at 3:31 pm on April 29, 2022:
    I think the new changes should take care of this, let me know if not.
  74. in src/validation.cpp:4907 in da4aaba007 outdated
    4860@@ -4856,6 +4861,62 @@ const AssumeutxoData* ExpectedAssumeutxo(
    4861     return nullptr;
    4862 }
    4863 
    4864+//! The file in the snapshot chainstate dir which stores the base blockhash. This
    4865+//! is needed to reconstruct snapshot chainstates on init.
    4866+constexpr std::string_view SNAPSHOT_BLOCKHASH_FILENAME = "base_blockhash";
    


    ryanofsky commented at 4:35 pm on April 21, 2022:

    In commit “validation: rename snapshot chainstate dir” (da4aaba0073108092f0dd0e02753eea253395c4e)

    Just IMO, but snapshot_block_hash might be a more descriptive name than base_blockhash for someone just looking at directory contents.


    jamesob commented at 5:57 pm on April 27, 2022:
    I’m going to stick with base_blockhash just because it makes grepping for {m_,}base_blockhash easier, which is the equivalent data in SnapshotMetadata.
  75. in src/validation.cpp:4967 in 5e8879c402 outdated
    4916@@ -4917,6 +4917,17 @@ static std::optional<uint256> ReadSnapshotBaseBlockhash(fs::path datadir)
    4917     return base_blockhash;
    4918 }
    4919 
    4920+static std::optional<fs::path> FindSnapshotChainstateDatadir()
    


    ryanofsky commented at 4:44 pm on April 21, 2022:

    In commit “validation: remove snapshot datadirs upon validation failure” (5e8879c402b23f8c3b71287a1f4393caa73dfcbd)

    This function was added in previous commit and is moving this commit. Would be good to just add the function to the final location in previous commit (“init: add utxo snapshot detection”, 94a484340be246e38ba40644f4e2ba1913d22240) and avoid the move.


    jamesob commented at 2:42 pm on April 29, 2022:
    Done, thanks.
  76. in src/validation.cpp:5016 in 5e8879c402 outdated
    5003+
    5004+        // PopulateAndValidateSnapshot can return (in error) before the leveldb datadir
    5005+        // has been created, so only attempt removal if we got that far.
    5006+        if (auto snapshot_datadir = FindSnapshotChainstateDatadir()) {
    5007+            bool removed = snapshot_chainstate->removeSnapshotDatadir();
    5008+            assert(removed);
    


    ryanofsky commented at 5:00 pm on April 21, 2022:

    In commit “validation: remove snapshot datadirs upon validation failure” (5e8879c402b23f8c3b71287a1f4393caa73dfcbd)

    Maybe this doesn’t apply because our project uses assets in a nonstandard way, but I would only ever use C asserts to catch internal logic bugs which are supposed to be impossible. Would not use asserts to handle operating system errors, problems with I/O, or external libraries.

    So I’d replace this assert and other assert(removed) in the function below to call AbortNode or throw runtime_error instead. Not sure if this required, though.


    jamesob commented at 7:27 pm on April 27, 2022:

    I would only ever use C asserts to catch internal logic bugs which are supposed to be impossible. Would not use asserts to handle operating system errors, problems with I/O, or external libraries.

    This is a really good point, thanks for putting it in those terms. I’ve fixed up all asserts that fit this description, and I’m not wrapped each filesystem operation in a try/catch to be sure I’m catching any fs::filesystem_errors that are thrown.

  77. in src/validation.cpp:4676 in 70455311b6 outdated
    4813@@ -4814,12 +4814,8 @@ std::vector<CChainState*> ChainstateManager::GetAll()
    4814     LOCK(::cs_main);
    4815     std::vector<CChainState*> out;
    4816 
    4817-    if (!IsSnapshotValidated() && m_ibd_chainstate) {
    


    ryanofsky commented at 5:15 pm on April 21, 2022:

    In commit “validation: add CChainState::m_stop_use and ChainMan::isUsable” (70455311b695e65b847031a90fa870ab50519b48)

    It seems like a good thing to drop the IsSnapshotValidated call in this commit so the GetAll() method implementation is simpler and not tied to details of snapshot validation. But just to make sure new code is equivalent to previous code, is it true that isUsable(m_ibd_chainstate) implies IsSnapshotValidated()? Like if you added a assert(!isUsable(m_ibd_chainstate) || IsSnapshotValidated()) here it would pass? Not suggesting to add this, just wondering.


    jamesob commented at 2:41 pm on April 29, 2022:
    I’ve rephrased the definition of IsSnapshotValidated() to be based on m_stop_use (i.e. IsUsable()), and so the two falling out of sync is no longer a concern here. Turns out that the m_snapshot_validated state is unnecessary.
  78. in src/validation.cpp:5269 in 70455311b6 outdated
    5265+    assert(ibd_usable || snapshot_usable);
    5266+
    5267+    if (!ibd_usable && snapshot_usable) {
    5268+        // Snapshot must have been validated if snapshot chain is in use but ibd isn't.
    5269+        assert(m_snapshot_validated);
    5270+    }
    


    ryanofsky commented at 5:18 pm on April 21, 2022:

    In commit “validation: add CChainState::m_stop_use and ChainMan::isUsable” (70455311b695e65b847031a90fa870ab50519b48)

    IMO, this would be a little clearer as:

    0    if (!ibd_usable) {
    1        // Snapshot must have been validated if ibd chain is out of use.
    2        assert(snapshot_usable);
    3        assert(m_snapshot_validated);
    4    }
    
  79. in src/validation.cpp:5298 in 2efe7c7ec8 outdated
    5293+    auto handleInvalidSnapshot = [&]() EXCLUSIVE_LOCKS_REQUIRED(::cs_main) {
    5294+        bilingual_str user_error = _(strprintf(
    5295+            "%s failed to validate the -assumeutxo snapshot state. "
    5296+            "This indicates a hardware problem, or a bug in the software, or a "
    5297+            "bad software modification that allowed an invalid snapshot to be "
    5298+            "loaded. As a result of this, the node will shut down and move any "
    


    ryanofsky commented at 5:27 pm on April 21, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (2efe7c7ec82e7885e07140e1deb5b5e9d6869f1d)

    Would s/move/stop using/. The word “move” seems less clear here, and the actual useful details about the rename are logged later.


    jamesob commented at 3:30 pm on April 29, 2022:
    Fixed.
  80. in src/validation.cpp:5307 in 2efe7c7ec8 outdated
    5302+            "without using any snapshot data. "
    5303+            "Please report this incident to %s, including how you obtained the snapshot.",
    5304+            PACKAGE_NAME, snapshot_tip_height, snapshot_base_height, snapshot_base_height, PACKAGE_BUGREPORT
    5305+        ).c_str());
    5306+
    5307+        LogPrintf("[snapshot] !!! %s\n", user_error.translated);
    


    ryanofsky commented at 5:29 pm on April 21, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (2efe7c7ec82e7885e07140e1deb5b5e9d6869f1d)

    I believe should log .original, not .translated


    jamesob commented at 3:30 pm on April 29, 2022:
    Fixed.
  81. in src/validation.cpp:5323 in 2efe7c7ec8 outdated
    5318+
    5319+   if (index_new.nHeight > snapshot_base_height) {
    5320+      LogPrintf("[snapshot] something is wrong! validation chain " /* Continued */
    5321+             "should not have continued past the snapshot origin\n");
    5322+         handleInvalidSnapshot();
    5323+         return {false, SnapshotCompletionError::IBD_TOO_FAR};
    


    ryanofsky commented at 5:35 pm on April 21, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (2efe7c7ec82e7885e07140e1deb5b5e9d6869f1d)

    Curious why the code doesn’t just assert index_new.nHeight <= snapshot_base_height. Is it possible for chainstates that would trigger this to even be loaded? If so, might make be better to check chainstate heights early when they are loaded than to handle it later as part of validation.


    jamesob commented at 3:30 pm on April 29, 2022:
    Fixed this to be clearer.
  82. in src/validation.cpp:5541 in 2efe7c7ec8 outdated
    5536+
    5537+    auto invalid_path = snapshot_datadir + "_INVALID";
    5538+    std::string dbpath = fs::PathToString(snapshot_datadir);
    5539+    std::string target = fs::PathToString(invalid_path);
    5540+    LogPrintf("[snapshot] renaming snapshot datadir %s to %s\n", dbpath, target);
    5541+    fs::rename(snapshot_datadir, invalid_path);
    


    ryanofsky commented at 5:52 pm on April 21, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (2efe7c7ec82e7885e07140e1deb5b5e9d6869f1d)

    Would be good to log that the rename fails, if it fails. Not critical, but could avoid confusion figuring out what is happening from the logs if two snapshots fail to validate.

  83. in src/validation.cpp:5607 in 2efe7c7ec8 outdated
    5602+    fs::rename(ibd_chainstate_path, tmp_old);
    5603+    fs::rename(snapshot_chainstate_path, ibd_chainstate_path);
    5604+
    5605+    LogPrintf("[snapshot] moving snapshot chainstate (%s) to " /* Continued */
    5606+        "default chainstate directory (%s)\n",
    5607+        fs::PathToString(snapshot_chainstate_path), fs::PathToString(ibd_chainstate_path));
    


    ryanofsky commented at 6:01 pm on April 21, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (2efe7c7ec82e7885e07140e1deb5b5e9d6869f1d)

    I’m suprised you can just rename directories while leveldb is still using them. If this works, I guess it’s ok, but I think more ideally CChainstate objects would have Destroy() and Rename(new_name) methods and would do these operations in a safe way without ChainStateManager being involved at such a low level.

  84. ryanofsky commented at 6:07 pm on April 21, 2022: contributor

    This looks great! It is much simpler than I remember it previously.

    I left some suggestions below (all minor, feel free to ignore them), and so far I reviewed all the non-test changes. I want to spend a little more time looking at the last commit, then review the tests, too.

    • 98fc81fbaaa02792b94264680cc0062ccfde9698 db: add StoragePath, Options to CDBWrapper/CCoinsViewDB (1/15)
    • f427976e6a3e024564fa824cc0578ee9b1285198 validation: add CChainState::destroyCoinsDB (2/15)
    • da4aaba0073108092f0dd0e02753eea253395c4e validation: rename snapshot chainstate dir (3/15)
    • 94a484340be246e38ba40644f4e2ba1913d22240 init: add utxo snapshot detection (4/15)
    • 5e8879c402b23f8c3b71287a1f4393caa73dfcbd validation: remove snapshot datadirs upon validation failure (5/15)
    • 89f3fc8c9205c34a28c5b892a979a9513cce0905 add ChainstateManager.getSnapshot{BaseHeight,BaseBlock}() (6/15)
    • f2abfde9f425c90bd6580ada35df94b35e130462 blockmanager: avoid undefined behavior during FlushBlockFile (7/15)
    • 70455311b695e65b847031a90fa870ab50519b48 validation: add CChainState::m_stop_use and ChainMan::isUsable (8/15)
    • c4774650050118c8659a636c2d826c3ee8a9b5bd add CChainState::HasCoinsViews() (9/15)
    • 2efe7c7ec82e7885e07140e1deb5b5e9d6869f1d validation: add ChainMan logic for completing UTXO snapshot validation (10/15)
    • 223561af0854347c6e10ee81650cbeec7b0c0284 move-only: test: make snapshot chainstate setup reusable (11/15)
    • c11d0ac32c5f150bfecd023fd455fbd916b93622 test: add reset_chainstate parameter for snapshot unittests (12/15)
    • 26f6e02888f2cf3ea164c2780368d721736ec8c5 test: allow coinsdb to reside on-disk in tests (13/15)
    • e02233a2b109ad4ac59a5a0cfbd83189cd2f4c61 test: add testcases for snapshot completion (14/15)
    • 559b1ec7242f4636b03aa39b1ec1b629ff1414d7 docs: update assumeutxo.md (15/15)
  85. in src/node/chainstate.cpp:122 in 2efe7c7ec8 outdated
    39-    if (!chainman.IsSnapshotValidated()) {
    40-        LogPrintf("Loading validation chainstate\n");
    41-        chainman.InitializeChainstate(mempool);
    42-    }
    43+    // Load the fully-validated chainstate.
    44+    chainman.InitializeChainstate(mempool);
    


    ryanofsky commented at 3:26 pm on April 26, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (2efe7c7ec82e7885e07140e1deb5b5e9d6869f1d)

    Would it make sense to assert(!chainman.IsSnapshotValidated()) here?


    jamesob commented at 7:24 pm on April 27, 2022:

    I think it may make sense to have that assertion at the end of the function, after the call to maybeCompleteSnapshotValidation() (which I will do), but at this point if we are init’ing after having shut down after the background validation chainstate has hit the snapshot base block, we are initializing a snapshot chainstate that has been validated but needs to be removed.

    That said, I’m thinking it may be possible to get rid of m_snapshot_validated/IsSnapshotValidated() altogether (in favor of the m_stop_use/isUsable() pair)… going to see if this is doable.


    ryanofsky commented at 4:13 pm on June 2, 2022:

    That said, I’m thinking it may be possible to get rid of m_snapshot_validated/IsSnapshotValidated() altogether (in favor of the m_stop_use/isUsable() pair)… going to see if this is doable.

    (This was implemented in cf5a6d3cfb1c77b5689ceb7a4bfa468e79774fb0)

  86. in src/validation.h:899 in 2efe7c7ec8 outdated
    892@@ -858,7 +893,10 @@ class ChainstateManager
    893     CChainState* m_active_chainstate GUARDED_BY(::cs_main) {nullptr};
    894 
    895     //! If true, the assumed-valid chainstate has been fully validated
    896-    //! by the background validation chainstate.
    897+    //! by the background validation chainstate. This will trigger shutdown
    898+    //! logic.
    899+    //!
    900+    //! @sa validatedSnapshotCleanup()
    


    ryanofsky commented at 3:27 pm on April 26, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (2efe7c7ec82e7885e07140e1deb5b5e9d6869f1d)

    Stray comment?


    jamesob commented at 2:47 pm on April 29, 2022:
    Fixed, thanks.
  87. in src/node/chainstate.cpp:39 in 2efe7c7ec8 outdated
    33@@ -34,12 +34,8 @@ std::optional<ChainstateLoadingError> LoadChainstate(bool fReset,
    34     // Load a chain created from a UTXO snapshot, if any exist.
    35     chainman.DetectSnapshotChainstate(mempool);
    36 
    37-    // If we're not using a snapshot or we haven't fully validated it yet,
    38-    // create a validation chainstate.
    39-    if (!chainman.IsSnapshotValidated()) {
    


    ryanofsky commented at 5:06 pm on April 26, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (2efe7c7ec82e7885e07140e1deb5b5e9d6869f1d)

    Was this check ever necessary to begin with when it was added in commit 94a484340be246e38ba40644f4e2ba1913d22240? It seems like IsSnapshotValidated would have always been false this whole time. Would probably be good to backport this change to the earlier commit.


    jamesob commented at 2:47 pm on April 29, 2022:
    Good point, backported.
  88. in src/init.cpp:1480 in 2efe7c7ec8 outdated
    1473@@ -1474,6 +1474,13 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
    1474                 break;
    1475             case ChainstateLoadingError::SHUTDOWN_PROBED:
    1476                 break;
    1477+            case ChainstateLoadingError::SNAPSHOT_VALIDATION_COMPLETE:
    1478+                // Retry LoadChainstate now that the fully-validated snapshot
    1479+                // chainstate datadir has been moved to the `chainstate` directory.
    1480+                continue;
    


    ryanofsky commented at 5:16 pm on April 26, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (2efe7c7ec82e7885e07140e1deb5b5e9d6869f1d)

    This is a little hard to follow, and maybe it’s the simplest approach, but I wonder if LoadChainstate could just handle snapshot validated case internally. Treating SNAPSHOT_VALIDATION_COMPLETE like an error code when it doesn’t really indicate a problem, and using continue to break out of a switch statement inside of a while loop seem like messy complications that could be nice to avoid, if there’s a more straightforward way.


    jamesob commented at 4:27 pm on April 27, 2022:
    Yeah, definitely agree. I’ll take a look.
  89. in src/node/chainstate.cpp:145 in 2efe7c7ec8 outdated
    140+
    141+    if (snapshot_completion.completed) {
    142+        LogPrintf("[snapshot] cleaning up unneeded background chainstate, then reinitializing\n");
    143+        chainman.validatedSnapshotCleanup();
    144+        chainman.reset();
    145+        return ChainstateLoadingError::SNAPSHOT_VALIDATION_COMPLETE;
    


    ryanofsky commented at 5:41 pm on April 26, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (2efe7c7ec82e7885e07140e1deb5b5e9d6869f1d)

    Maybe this is the simplest approach for now, but could you add a comment about what will cause this case to happen and what needs to be done to handle it. It seems like it would be an unusual for background validation to complete on startup. Would it basically only happen if the node was shut down after the background chainstate was validated but before it was deleted?

    Also wondering why it is it even necessary to call maybeCompleteSnapshotValidation here when there is already a call in connectTip()

    Also, is this approach as efficient as it could be, or is it doing unnecessary work like calling UnloadBlockIndex / LoadBlockIndex. SNAPSHOT_VALIDATION_COMPLETE causes LoadChainstate to be called again, but it’s unclear which parts of LoadChainstate actually need to re-run

    I think it would be good to add a code comment to explain this case more and answer some of these questions.


    jamesob commented at 2:45 pm on April 29, 2022:

    Would it basically only happen if the node was shut down after the background chainstate was validated but before it was deleted?

    Yes, and this is now the default mode. I’m not removing any leveldb contents during runtime because re-initializing chainstates mid-run is just too delicate a change to intermingle with all this other stuff. Maybe in a future PR we can look at actually doing the “snapshot cleanup” (i.e. leveldb deletion and rename) inline when the bg validation chainstate connects up with the snapshot base block, but until then I’m just going to handle that messy stuff during init (in LoadChainstate()).

    Also wondering why it is it even necessary to call maybeCompleteSnapshotValidation here when there is already a call in connectTip()

    We have to rehash the contents of the bg validation’s UTXO set to ensure that they match the expected value, since that hash isn’t persisted anywhere after the first snapshot validation.

    Also, is this approach as efficient as it could be, or is it doing unnecessary work like calling UnloadBlockIndex / LoadBlockIndex. SNAPSHOT_VALIDATION_COMPLETE causes LoadChainstate to be called again, but it’s unclear which parts of LoadChainstate actually need to re-run

    Good call, I’ve cleaned this up to avoid the confusing continue flow-control; now all chainstate re-initialization is confined to LoadChainstate(). As an added bonus, I’m now able to unittest (or, more accurately, “cpp test”) this whole process!

  90. in src/validation.cpp:3015 in 2efe7c7ec8 outdated
    3009@@ -2998,6 +3010,15 @@ bool CChainState::ActivateBestChain(BlockValidationState& state, std::shared_ptr
    3010                     assert(trace.pblock && trace.pindex);
    3011                     GetMainSignals().BlockConnected(trace.pblock, trace.pindex);
    3012                 }
    3013+
    3014+                // This will have been toggled in
    3015+                // ABCStep -> ConnectTip -> maybeCompleteSnapshotValidation,
    


    ryanofsky commented at 5:44 pm on April 26, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (2efe7c7ec82e7885e07140e1deb5b5e9d6869f1d)

    Just a suggestion, but maybe s/ABCStep/ActivateBestChainStep/ to help with grepping.


    jamesob commented at 2:43 pm on April 29, 2022:
    Done, thank you.
  91. ryanofsky commented at 6:04 pm on April 26, 2022: contributor

    Few more review comments for the last code commit “validation: add ChainMan logic for completing UTXO snapshot validation” (2efe7c7ec82e7885e07140e1deb5b5e9d6869f1d)

    Suggestion for PR description: Maybe add 2-3 sentences saying what assumeutxo functionality was already merged before this PR, and what changes are left to implement after this. It’s easy to forget where these changes fit into the bigger picture, and helpful to be reminded!

  92. DrahtBot added the label Needs rebase on Apr 28, 2022
  93. jamesob force-pushed on Apr 29, 2022
  94. jamesob force-pushed on Apr 29, 2022
  95. jamesob commented at 3:28 pm on April 29, 2022: member

    au.init.4 -> au.init.6

    View range diff on GitHub

    Thanks so much for the continued and good review @ryanofsky. I’ve pushed (another) big update, but thankfully I think this PR is in better shape than ever.

    Namely, we’re now able to test the entire snapshot cleanup process from within the cpp tests, including filesystem operations and even a simulated node restart. This is a big improvement IMO, since we are now testing the full assumeutxo lifecycle from within cpp tests.

    Other changes include

    • removing unnecessary m_snapshot_validated state,
    • avoiding the confusing continue-based init.cpp control flow for reinitializing the new chainstate after snapshot cleanup - all that is contained within LoadSnapshot() now,
    • catching all fs::filesystem_errors where applicable (and moving these calls out of asserts),
    • adding the ability to store block tree db on-disk from cpp tests,
    • removing the controversial lowerCamelCase method naming (see #24846),
    • better assurance that leveldb objects are destructed before any fs operations (delete, rename)

    I think this PR is in a good place, but I see that it needs another rebase so I’ll start on that now…

  96. jamesob force-pushed on Apr 29, 2022
  97. DrahtBot removed the label Needs rebase on Apr 29, 2022
  98. jamesob force-pushed on Apr 29, 2022
  99. in src/node/chainstate.cpp:16 in c0d496fb88 outdated
     8@@ -9,6 +9,81 @@
     9 #include <validation.h>
    10 
    11 namespace node {
    12+
    13+// Complete initialization of chainstates after the inital call has been made
    14+// to ChainstateManager::InitializeChainstate().
    15+static std::optional<ChainstateLoadingError>
    16+CompleteChainstateInitialization(
    


    ryanofsky commented at 2:44 pm on May 4, 2022:

    In commit “move-only-ish: init: factor out chainstate initialization” (1eb127e55e1fabe1c5addccd9be0feb421149645)

    Name CompleteChainstateInitialization is a bit vague. Maybe something like OpenChainstateDBs could give a better idea what the function is actually doing


    jamesob commented at 2:19 pm on June 23, 2022:
    Based on your other feedback of consolidating more logic into this function (which I’ve taken), I think the suggested name here is probably too narrow.
  100. DrahtBot added the label Needs rebase on May 13, 2022
  101. jamesob force-pushed on May 16, 2022
  102. DrahtBot removed the label Needs rebase on May 16, 2022
  103. DrahtBot added the label Needs rebase on May 20, 2022
  104. jamesob force-pushed on May 23, 2022
  105. jamesob force-pushed on May 23, 2022
  106. DrahtBot removed the label Needs rebase on May 23, 2022
  107. jamesob force-pushed on Jun 2, 2022
  108. in src/validation.cpp:4903 in 60d8fabe1b outdated
    4898+    AssertLockHeld(::cs_main);
    4899+    assert(snapshot_chainstate.m_from_snapshot_blockhash);
    4900+
    4901+    const std::optional<fs::path> datadir_path = snapshot_chainstate.CoinsDB().StoragePath();
    4902+    assert(datadir_path); // Sanity check that chainstate isn't in-memory.
    4903+    const fs::path write_to = *datadir_path / fs::u8path(SNAPSHOT_BLOCKHASH_FILENAME);
    


    ryanofsky commented at 2:57 pm on June 2, 2022:

    In commit “validation: rename snapshot chainstate dir” (60d8fabe1bd7b24257193884bb542cef55fb4252)

    Could change SNAPSHOT_BLOCKHASH_FILENAME type from std::string to fs::path so you don’t need any u8path call. (Also applies to read function below)


    jamesob commented at 2:18 pm on June 23, 2022:
    Fixed.
  109. in src/validation.cpp:4913 in 60d8fabe1b outdated
    4908+        LogPrintf("[snapshot] failed to open base blockhash file for writing: %s\n",
    4909+                  fs::PathToString(write_to));
    4910+        return false;
    4911+    }
    4912+    afile << *snapshot_chainstate.m_from_snapshot_blockhash;
    4913+    return true;
    


    ryanofsky commented at 3:06 pm on June 2, 2022:

    In commit “validation: rename snapshot chainstate dir” (60d8fabe1bd7b24257193884bb542cef55fb4252)

    Preexisting problem should probably be dealt with separately, but it doesn’t seem great CAutoFile ignores errors calling fclose. Might be a place to improve error reporting in the future.


    jamesob commented at 2:18 pm on June 23, 2022:
    Fixed.
  110. in src/validation.cpp:4929 in 60d8fabe1b outdated
    4924+
    4925+    if (!fs::exists(p)) {
    4926+        LogPrintf("[snapshot] snapshot chainstate dir is malformed! no base blockhash file " /* Continued */
    4927+            "exists at path %s. Try reinitializing snapshot?\n", fs::PathToString(p));
    4928+        return std::nullopt;
    4929+    }
    


    ryanofsky commented at 3:12 pm on June 2, 2022:

    In commit “validation: rename snapshot chainstate dir” (60d8fabe1bd7b24257193884bb542cef55fb4252)

    You could consider dropping both of these fs::exists checks because they should already be covered by “failed to open base blockhash file for reading” check below. Would make code shorter and I think that error message provides more useful context anyway.


    jamesob commented at 5:48 pm on June 15, 2022:
    I’m going to push back on this because I do think the specificity could be helpful, even if the possibility of those errors is pretty slim. Why not be specific about errors when possible at the cost of a few lines?
  111. in src/validation.cpp:4939 in 60d8fabe1b outdated
    4934+    if (afile.IsNull()) {
    4935+        LogPrintf("[snapshot] failed to open base blockhash file for reading: %s\n",
    4936+            fs::PathToString(p));
    4937+        return std::nullopt;
    4938+    }
    4939+    afile >> base_blockhash;
    


    ryanofsky commented at 3:29 pm on June 2, 2022:

    In commit “validation: rename snapshot chainstate dir” (60d8fabe1bd7b24257193884bb542cef55fb4252)

    Could log warnings on unexpected conditions.

     0--- a/src/validation.cpp
     1+++ b/src/validation.cpp
     2@@ -4937,6 +4937,11 @@ static std::optional<uint256> ReadSnapshotBaseBlockhash(fs::path datadir)
     3         return std::nullopt;
     4     }
     5     afile >> base_blockhash;
     6+    if (std::fgetc(afile.Get()) != EOF) {
     7+        LogPrintf("[snapshot] warning: unexpected trailing data in %s\n", fs::PathToString(p));
     8+    } else if (std::ferror(afile.Get())) {
     9+        LogPrintf("[snapshot] warning: i/o error reading %s\n", fs::PathToString(p));
    10+    }
    11     return base_blockhash;
    12 }
    

    I’m assuming failure would be pretty quick if blockhash was not valid (block would not be found) so it wouldn’t improve anything practically to add a checksum or more validation here.


    jamesob commented at 2:19 pm on June 23, 2022:
    Fixed, thanks.
  112. in src/validation.cpp:4901 in 60d8fabe1b outdated
    4896+    EXCLUSIVE_LOCKS_REQUIRED(::cs_main)
    4897+{
    4898+    AssertLockHeld(::cs_main);
    4899+    assert(snapshot_chainstate.m_from_snapshot_blockhash);
    4900+
    4901+    const std::optional<fs::path> datadir_path = snapshot_chainstate.CoinsDB().StoragePath();
    


    ryanofsky commented at 3:53 pm on June 2, 2022:

    In commit “validation: rename snapshot chainstate dir” (60d8fabe1bd7b24257193884bb542cef55fb4252)

    Throughout this commit would be good to s/datadir/chainstate_dir/ or similar. Datadir suggests top-level -datadir, I think


    jamesob commented at 2:19 pm on June 23, 2022:
    Fixed.
  113. in src/validation.cpp:5321 in 084c8e3749 outdated
    5316+        fs::PathToString(*path));
    5317+    std::optional<uint256> base_blockhash = ReadSnapshotBaseBlockhash(*path);
    5318+    if (!base_blockhash) {
    5319+        return false;
    5320+    }
    5321+    this->InitializeChainstate(mempool, /*snapshot_blockhash=*/ *base_blockhash);
    


    ryanofsky commented at 4:37 pm on June 2, 2022:

    In commit “init: add utxo snapshot detection” (084c8e37494c7423ea53d165cc04354117b4938c)

    I think it would clarify things to only call InitializeChainstate one place and eliminate chainstate juggling inside like:

     0diff --git a/src/validation.cpp b/src/validation.cpp
     1index f417f82f975..38df4e94d44 100644
     2--- a/src/validation.cpp
     3+++ b/src/validation.cpp
     4@@ -4848,28 +4848,13 @@ std::vector<CChainState*> ChainstateManager::GetAll()
     5     return out;
     6 }
     7 
     8-CChainState& ChainstateManager::InitializeChainstate(
     9-    CTxMemPool* mempool, const std::optional<uint256>& snapshot_blockhash)
    10+void ChainstateManager::InitializeChainstate(CTxMemPool* mempool)
    11 {
    12     AssertLockHeld(::cs_main);
    13-    bool is_snapshot = snapshot_blockhash.has_value();
    14-    std::unique_ptr<CChainState>& to_modify =
    15-        is_snapshot ? m_snapshot_chainstate : m_ibd_chainstate;
    16-
    17-    if (to_modify) {
    18-        throw std::logic_error("should not be overwriting a chainstate");
    19-    }
    20-    to_modify.reset(new CChainState(mempool, m_blockman, *this, snapshot_blockhash));
    21-
    22-    // Snapshot chainstates and initial IBD chaintates always become active.
    23-    if (is_snapshot || (!is_snapshot && !m_active_chainstate)) {
    24-        LogPrintf("Switching active chainstate to %s\n", to_modify->ToString());
    25-        m_active_chainstate = to_modify.get();
    26-    } else {
    27-        throw std::logic_error("unexpected chainstate activation");
    28-    }
    29-
    30-    return *to_modify;
    31+    assert(!m_ibd_chainstate);
    32+    assert(!m_active_chainstate);
    33+    m_ibd_chainstate = std::make_unique<CChainState>(mempool, m_blockman, *this);
    34+    m_active_chainstate = m_ibd_chainstate.get();
    35 }
    36 
    37 const AssumeutxoData* ExpectedAssumeutxo(
    38@@ -5308,6 +5293,7 @@ ChainstateManager::~ChainstateManager()
    39 
    40 bool ChainstateManager::DetectSnapshotChainstate(CTxMemPool* mempool)
    41 {
    42+    assert(!m_snapshot_chainstate);
    43     std::optional<fs::path> path = FindSnapshotChainstateDatadir();
    44     if (!path) {
    45         return false;
    46@@ -5318,6 +5304,8 @@ bool ChainstateManager::DetectSnapshotChainstate(CTxMemPool* mempool)
    47     if (!base_blockhash) {
    48         return false;
    49     }
    50-    this->InitializeChainstate(mempool, /*snapshot_blockhash=*/ *base_blockhash);
    51+    m_snapshot_chainstate = std::make_unique<CChainState>(mempool, m_blockman, *this, base_blockhash);
    52+    LogPrintf("Switching active chainstate to %s\n", m_snapshot_chainstate->ToString());
    53+    m_active_chainstate = m_snapshot_chainstate.get();
    54     return true;
    55 }
    56diff --git a/src/validation.h b/src/validation.h
    57index f8bdda160e5..70fb9c6d324 100644
    58--- a/src/validation.h
    59+++ b/src/validation.h
    60@@ -903,17 +903,10 @@ public:
    61     //! coins databases. This will be split somehow across chainstates.
    62     int64_t m_total_coinsdb_cache{0};
    63 
    64-    //! Instantiate a new chainstate and assign it based upon whether it is
    65-    //! from a snapshot.
    66+    //! Instantiate a new chainstate.
    67     //!
    68-    //! [@param](/bitcoin-bitcoin/contributor/param/)[in] mempool              The mempool to pass to the chainstate
    69-    //                                  constructor
    70-    //! [@param](/bitcoin-bitcoin/contributor/param/)[in] snapshot_blockhash   If given, signify that this chainstate
    71-    //!                                 is based on a snapshot.
    72-    CChainState& InitializeChainstate(
    73-        CTxMemPool* mempool,
    74-        const std::optional<uint256>& snapshot_blockhash = std::nullopt)
    75-        LIFETIMEBOUND EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    76+    //! [@param](/bitcoin-bitcoin/contributor/param/)[in] mempool The mempool to pass to the chainstate constructor
    77+    void InitializeChainstate(CTxMemPool* mempool) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    78 
    79     //! Get all chainstates currently being used.
    80     std::vector<CChainState*> GetAll();
    

    jamesob commented at 2:20 pm on June 23, 2022:
    Fixed, thanks.
  114. in src/validation.cpp:5016 in 6e88c2cbea outdated
    5087+            bool removed = DeleteCoinsDBFromDisk(*snapshot_datadir, db_options, /*is_snapshot=*/true);
    5088+            if (!removed) {
    5089+                AbortNode(strprintf("Failed to remove snapshot chainstate dir (%s). "
    5090+                    "Manually remove it before restarting.\n", fs::PathToString(*snapshot_datadir)));
    5091+            }
    5092+            assert(removed);
    


    ryanofsky commented at 4:57 pm on June 2, 2022:

    In commit “validation: remove snapshot datadirs upon validation failure” (6e88c2cbea05b42d7ef387b83ae4147c79c26383)

    AbortNode can’t shut down immediately, so this assert could still trigger. It seems like it should be fine to just drop the assert since only caller to this function appears to be loadtxoutset and it is handling the false return value. Or if there is a concern about the node doing something bad with the snapshot directory still in the way, could throw an exception or call std::exit or std::abort instead of asserting


    jamesob commented at 2:23 pm on June 23, 2022:
    Fixed.
  115. in src/validation.h:497 in cf5a6d3cfb outdated
    489@@ -490,6 +490,12 @@ class CChainState
    490     //! Manages the UTXO set, which is a reflection of the contents of `m_chain`.
    491     std::unique_ptr<CoinsViews> m_coins_views;
    492 
    493+    //! This toggle exists for use when doing background validation for UTXO
    494+    //! snapshots. It is set once the background validation chain reaches the
    495+    //! same height as the base of the snapshot, and signals that we should no
    496+    //! longer connect blocks to the background chainstate.
    497+    bool m_stop_use{false};
    


    ryanofsky commented at 5:09 pm on June 2, 2022:

    In commit “validation: add CChainState::m_stop_use and ChainMan::isUsable” (cf5a6d3cfb1c77b5689ceb7a4bfa468e79774fb0)

    Should this be guarded by cs_main like m_snapshot_validated was? I’m guessing it can be accessed by more than one thread. If not, maybe add a comment about how it’s only accessed synchronously.


    jamesob commented at 2:23 pm on June 23, 2022:
    Fixed.
  116. in src/node/chainstate.cpp:107 in 1eb127e55e outdated
    54+        if (!chainstate->ReplayBlocks()) {
    55+            return ChainstateLoadingError::ERROR_REPLAYBLOCKS_FAILED;
    56+        }
    57+
    58+        // The on-disk coinsdb is now in a good state, create the cache
    59+        chainstate->InitCoinsCache(chainman.m_total_coinstip_cache * init_cache_fraction);
    


    ryanofsky commented at 6:07 pm on June 2, 2022:

    In commit “move-only-ish: init: factor out chainstate initialization” (1eb127e55e1fabe1c5addccd9be0feb421149645)

    This used to be nCoinCacheUsage instead of chainman.m_total_coinstip_cache * init_cache_fraction. Is it an intentional change, and could it be explained in the commit message? Should it be moved to the earlier commit introducing init_cache_fraction?


    jamesob commented at 2:24 pm on June 23, 2022:
    Fixed.
  117. in src/node/chainstate.cpp:169 in fa00494726 outdated
    156@@ -154,6 +157,51 @@ std::optional<ChainstateLoadingError> LoadChainstate(bool fReset,
    157                 chainman, coins_db_in_memory, fReset, fReindexChainState, coins_error_cb)) {
    158         return err;
    159     }
    160+
    161+    // If a snapshot chainstate was fully validated by a background chainstate during
    162+    // the last run, detect it here and clean up the now-unneeded background
    


    ryanofsky commented at 6:38 pm on June 2, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (fa004947262b4180ac41b14ede7ba19ee24eaad0)

    Would maybe mention that this is what normally happens, is not some unusual case that is just being cleaned up after, like I originally thought reviewing this PR. Could s/run, detect/run, the background chainstate won’t be deleted until the next restart, so detect/


    jamesob commented at 2:24 pm on June 23, 2022:
    Fixed.
  118. DrahtBot added the label Needs rebase on Jun 2, 2022
  119. in src/node/chainstate.cpp:193 in fa00494726 outdated
    178+
    179+        chainman.InitializeChainstate(mempool);
    180+
    181+        // A reload of the block index is required to recompute setBlockIndexCandidates
    182+        // for the fully validated chainstate.
    183+        chainman.ActiveChainstate().UnloadBlockIndex();
    


    ryanofsky commented at 7:00 pm on June 2, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (fa004947262b4180ac41b14ede7ba19ee24eaad0)

    Perhaps make a method that recomputes setBlockIndexCandidates without reloading? Seems like that would cleanup BlockIndex code and simplify this code as well.


    jamesob commented at 7:23 pm on June 16, 2022:
    That’d be a substantial portion of LoadBlockIndex yanked out into its own function; a change like that would also require some thought as to whether the two operations can truly be independent, which I’m not sure about offhand. In the interest of keeping this PR within its already-pretty-broad scope, I’m going to save this as a follow-up for someone else.
  120. in src/node/chainstate.cpp:195 in fa00494726 outdated
    190+                !chainman.m_blockman.LookupBlockIndex(chainman.GetParams().GetConsensus().hashGenesisBlock)) {
    191+            return ChainstateLoadingError::ERROR_BAD_GENESIS_BLOCK;
    192+        }
    193+        if (!fReindex && !chainman.ActiveChainstate().LoadGenesisBlock()) {
    194+            return ChainstateLoadingError::ERROR_LOAD_GENESIS_BLOCK_FAILED;
    195+        }
    


    ryanofsky commented at 7:03 pm on June 2, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (fa004947262b4180ac41b14ede7ba19ee24eaad0)

    Seems like if these two checks dealing with the genesis block were moved into the CompleteChainstateInitialization function, the code wouldn’t need to be repeated here and above


    jamesob commented at 2:24 pm on June 23, 2022:
    Fixed.
  121. in src/validation.cpp:5386 in fa00494726 outdated
    5381+    }
    5382+
    5383+    assert(index_new.nHeight == snapshot_base_height);
    5384+
    5385+    auto handle_invalid_snapshot = [&]() EXCLUSIVE_LOCKS_REQUIRED(::cs_main) {
    5386+        bilingual_str user_error = _(strprintf(
    


    ryanofsky commented at 7:09 pm on June 2, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (fa004947262b4180ac41b14ede7ba19ee24eaad0)

    Should translate the format string not formatted string

     0--- a/src/validation.cpp
     1+++ b/src/validation.cpp
     2@@ -5383,7 +5383,7 @@ SnapshotCompletionResult ChainstateManager::MaybeCompleteSnapshotValidation(
     3     assert(index_new.nHeight == snapshot_base_height);
     4 
     5     auto handle_invalid_snapshot = [&]() EXCLUSIVE_LOCKS_REQUIRED(::cs_main) {
     6-        bilingual_str user_error = _(strprintf(
     7+        bilingual_str user_error = strprintf(_(
     8             "%s failed to validate the -assumeutxo snapshot state. "
     9             "This indicates a hardware problem, or a bug in the software, or a "
    10             "bad software modification that allowed an invalid snapshot to be "
    11@@ -5392,9 +5392,9 @@ SnapshotCompletionResult ChainstateManager::MaybeCompleteSnapshotValidation(
    12             "from %d to %d. On the next "
    13             "restart, the node will resume syncing from %d "
    14             "without using any snapshot data. "
    15-            "Please report this incident to %s, including how you obtained the snapshot.",
    16+            "Please report this incident to %s, including how you obtained the snapshot."),
    17             PACKAGE_NAME, snapshot_tip_height, snapshot_base_height, snapshot_base_height, PACKAGE_BUGREPORT
    18-        ).c_str());
    19+        );
    20 
    21         LogPrintf("[snapshot] !!! %s\n", user_error.original);
    22         LogPrintf("[snapshot] deleting snapshot, reverting to validated chain, and stopping node\n");
    

    jamesob commented at 2:24 pm on June 23, 2022:
    Fixed.
  122. in src/validation.cpp:5557 in fa00494726 outdated
    5548@@ -5392,6 +5549,37 @@ bool ChainstateManager::DetectSnapshotChainstate(CTxMemPool* mempool)
    5549     return true;
    5550 }
    5551 
    5552+void CChainState::InvalidateCoinsDBOnDisk()
    5553+{
    5554+    AssertLockHeld(::cs_main);
    5555+    // Should never be called on a non-snapshot chainstate.
    5556+    assert(m_from_snapshot_blockhash);
    5557+    fs::path snapshot_datadir = *FindSnapshotChainstateDatadir();
    


    ryanofsky commented at 7:18 pm on June 2, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (fa004947262b4180ac41b14ede7ba19ee24eaad0)

    I think it would be better to use the datadir path that is was already loaded this->CoinsDB().StoragePath() instead of going outside the CChainState object and looking for a chainstate directory in the filesystem.


    jamesob commented at 2:24 pm on June 23, 2022:
    Fixed.
  123. in src/validation.cpp:5558 in fa00494726 outdated
    5553+{
    5554+    AssertLockHeld(::cs_main);
    5555+    // Should never be called on a non-snapshot chainstate.
    5556+    assert(m_from_snapshot_blockhash);
    5557+    fs::path snapshot_datadir = *FindSnapshotChainstateDatadir();
    5558+    assert(fs::exists(snapshot_datadir));
    


    ryanofsky commented at 7:22 pm on June 2, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (fa004947262b4180ac41b14ede7ba19ee24eaad0)

    I don’t think this should be an assert. It could just as easily be a filesystem error or outside change as an internal bug in the code.

    Assert isn’t necessary anyway, since rename will fail if directory doesn’t exist. Just handling filesystem errors and not trying to anticipate them is usually simpler and more reliable unless the existence check provides additional information.

    EDIT: Same comment could apply to other assert(fs::exists()) calls below


    jamesob commented at 2:25 pm on June 23, 2022:
    Fixed.
  124. in src/validation.cpp:4954 in 3ad9487ad9 outdated
    4950@@ -4951,6 +4951,62 @@ static std::optional<fs::path> FindSnapshotChainstateDatadir()
    4951     return std::nullopt;
    4952 }
    4953 
    4954+dbwrapper::Options CChainState::PrepareForCoinsDBDeletion()
    


    ryanofsky commented at 7:51 pm on June 2, 2022:

    In commit “add utilities for deleting on-disk leveldb data” (3ad9487ad9474a3912c65832daa56a9723495407)

    I don’t think it is actually valid to use the previous options structs after destroying m_coins_view. For example the structs have info_log and block_cache pointers that will no longer be valid.

    I’d guess it should be sufficient to drop this method, have callers just delete the whole CChainState object then call DestroyDB(path, /*options=*/{}) with a default options parameter. If not could add a dbwrapper::DestroyDB(path) wrapper that doesn’t need options and fills them out itself.


    jamesob commented at 2:25 pm on June 23, 2022:
    Fixed.
  125. in src/validation.cpp:5434 in fa00494726 outdated
    5429+        handle_invalid_snapshot();
    5430+        return {false, SnapshotCompletionError::MISSING_CHAINPARAMS};
    5431+    }
    5432+
    5433+    const AssumeutxoData& au_data = *maybe_au_data;
    5434+    auto breakpoint_fnc = [] { /* TODO insert breakpoint here? */ };
    


    ryanofsky commented at 8:08 pm on June 2, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (fa004947262b4180ac41b14ede7ba19ee24eaad0)

    You may want something like:

    0auto breakpoint_fnc = [] { if (ShutdownRequested()) throw StopHashing{}; };
    1
    2try {
    3    ComputeUTXOStats(breakpoint_fnc);
    4} except (const StopHashing&) {
    5    return false;
    6}
    

    jamesob commented at 2:25 pm on June 23, 2022:
    Fixed.
  126. in src/validation.cpp:5421 in fa00494726 outdated
    5416+    assert(this->GetAll().size() == 2);
    5417+
    5418+    CCoinsViewDB& ibd_coins_db = m_ibd_chainstate->CoinsDB();
    5419+    m_ibd_chainstate->ForceFlushStateToDisk();
    5420+
    5421+    LogPrintf("[snapshot] tip: actual=%s expected=%s\n",
    


    ryanofsky commented at 8:19 pm on June 2, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (fa004947262b4180ac41b14ede7ba19ee24eaad0)

    I think log message is a little confusing. It will always print the same hash because the two hashes were were already compared on line 5378. Maybe move it log message above line 5378 or just print the hash once.


    jamesob commented at 2:25 pm on June 23, 2022:
    Fixed.
  127. in src/validation.h:1002 in fa00494726 outdated
     997+    //! If the coins match (expected), then mark the validation chainstate for
     998+    //! deletion and continue using the snapshot chainstate as active.
     999+    //! Otherwise, revert to using the ibd chainstate and shutdown (TODO).
    1000+    SnapshotCompletionResult MaybeCompleteSnapshotValidation(
    1001+        std::function<void(bilingual_str)> shutdown_fnc =
    1002+            [](bilingual_str msg) { AbortNode(msg.translated, msg); })
    


    ryanofsky commented at 8:25 pm on June 2, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (fa004947262b4180ac41b14ede7ba19ee24eaad0)

    s/msg.translated/msg.original/ since that’s what’s logged


    jamesob commented at 2:25 pm on June 23, 2022:
    Fixed.
  128. in src/test/util/setup_common.h:119 in 1918b546a5 outdated
    113     explicit TestingSetup(
    114         const std::string& chainName = CBaseChainParams::MAIN,
    115         const std::vector<const char*>& extra_args = {},
    116         const bool coins_db_in_memory = true,
    117-        const bool m_block_tree_db_in_memory = true);
    118+        const bool block_tree_db_in_memory = true);
    


    ryanofsky commented at 8:49 pm on June 2, 2022:

    In commit “test: move-only-ish: factor out LoadVerifyActivateChainstate()” (1918b546a59d75abcdac80ede563e3eb552d3968)

    Fixup belongs in previous commit, I think


    jamesob commented at 2:25 pm on June 23, 2022:
    Fixed.
  129. in doc/assumeutxo.md:117 in d988631b50 outdated
    125 
    126-**Failure consideration:** if bitcoind unexpectedly halts after `m_stop_use` is set on
    127-the background chainstate but before `CompleteSnapshotValidation()` can finish, the
    128-need to complete snapshot validation will be detected on subsequent init by
    129-`ChainstateManager::CheckForUncleanShutdown()`.
    130+The background chainstate data lingers on disk until subsequent initialization.
    


    ryanofsky commented at 1:09 am on June 3, 2022:

    In commit “docs: update assumeutxo.md” (d988631b50b8b10a8dadae00fe61ebd32058a9d1)

    s/until subsequent initialization/the software is restarted/ would be clearer I think.

    could also s/on subsequent inits/on software startup/ above


    jamesob commented at 2:25 pm on June 23, 2022:
    Fixed.
  130. in src/node/blockstorage.cpp:532 in ae7c791038 outdated
    527+    if (m_blockfile_info.size() < 1) {
    528+        // Return if we haven't loaded any blockfiles yet. This happens during
    529+        // chainstate init, when we call ChainstateManager::MaybeRebalanceCaches() (which
    530+        // then calls FlushStateToDisk()), resulting in a call to this function before we
    531+        // have populated `m_blockfile_info` via LoadBlockIndexDB().
    532+        return;
    


    ryanofsky commented at 1:11 am on June 3, 2022:

    In commit “blockmanager: avoid undefined behavior during FlushBlockFile” (ae7c79103851b9157c382fab9cb8433b172e7eed)

    Could consider adding a unit test to hit this code path.


    jamesob commented at 7:08 pm on June 16, 2022:
    I believe this is hit on every initialization, per the comment above (which is also how I realized it was a necessary change :).

    ryanofsky commented at 6:40 pm on July 6, 2022:

    In commit “blockmanager: avoid undefined behavior during FlushBlockFile” (5254307117d7bacf275a85707f29dcbdd3d24dc5)

    Is commit message out of date? It says “This case becomes more than academic in the next commit, when we call MaybeRebalanceCaches() during chainstate init before the block index has been loaded.” But I think this is actually happening in an earlier commit b1dbc76037c1f2ff893113057afd3f2eac581dbd, not the next commit? In any case, it would be helpful to know specifically what “during chainstate init” means in the code comment here and maybe be more specific about related commit in the commit message.


    jamesob commented at 7:54 pm on July 19, 2022:
    Fixed.
  131. in src/test/util/chainstate.h:65 in 9fb0eab4fa outdated
    57@@ -48,6 +58,33 @@ CreateAndActivateUTXOSnapshot(node::NodeContext& node, const fs::path root, F ma
    58 
    59     malleation(auto_infile, metadata);
    60 
    61+    if (reset_chainstate) {
    


    ryanofsky commented at 1:21 am on June 3, 2022:

    In commit “test: add reset_chainstate parameter for snapshot unittests” (9fb0eab4faf4b2aaabe29504ebb2a93a1c36a0e5)

    This test setup code seems fragile duplicative. Maybe comment can say that it is a stripped down copy of the LoadChainstate function. Would be nice if a future followup could call LoadChainstate directly or split up the block part and the chain part of LoadChainstate and just call the chain part.


    jamesob commented at 8:43 pm on June 16, 2022:

    Yeah, agree - though reusing LoadChainstate in this case would require building in the option to preserve the existing block index, which (to date anyway) would be a test-specific feature. Not sure which is worse: test-specific parameter or duplicating a few lines of code.

    I’m adding a note to the comment as you suggest though.

  132. in src/test/validation_chainstatemanager_tests.cpp:403 in 2e30177903 outdated
    378+
    379+    SnapshotCompletionResult res;
    380+    auto mock_shutdown = [](bilingual_str msg) {};
    381+
    382+    fs::path snapshot_chainstate_dir = gArgs.GetDataDirNet() / "chainstate_snapshot";
    383+    BOOST_CHECK(fs::exists(snapshot_chainstate_dir));
    


    ryanofsky commented at 1:34 am on June 3, 2022:

    In commit “test: add testcases for snapshot completion” (2e30177903c3df456ed288a61fa05efb117d2e1c)

    Maybe expose FindSnapshotChainstateDatadir function and call that, to reduce repetition here and add coverage there.

    This would also be a good place to call ReadSnapshotBaseBlockhash function and test that.


    jamesob commented at 2:26 pm on June 23, 2022:
    Fixed; moved both of those functions to the new src/node/utxo_snapshot.cpp unit.
  133. in src/validation.cpp:4895 in 60d8fabe1b outdated
    4890+
    4891+
    4892+//! Write out the blockhash of the snapshot base block that was used to construct
    4893+//! this chainstate. This value is read in during subsequent initializations and
    4894+//! used to reconstruct snapshot-based chainstates.
    4895+static bool WriteSnapshotBaseBlockhash(CChainState& snapshot_chainstate)
    


    ryanofsky commented at 1:39 am on June 3, 2022:

    In commit “validation: rename snapshot chainstate dir” (60d8fabe1bd7b24257193884bb542cef55fb4252)

    I get =Werror compiler warning/errors this commit because these functions are static and not called anywhere. IMO, these functions seem useful to call in unit tests and I would just make them normal non-hidden functions.

    You could also move them to src/node/snapshot.cpp or something like that so they are not expanding validation.cpp which is already bursting at the seams.


    jamesob commented at 2:26 pm on June 23, 2022:
    Fixed.
  134. in src/node/chainstate.cpp:93 in 084c8e3749 outdated
    89     for (CChainState* chainstate : chainman.GetAll()) {
    90+        LogPrintf("Initializing chainstate %s\n", chainstate->ToString());
    91+
    92         chainstate->InitCoinsDB(
    93-            /*cache_size_bytes=*/nCoinDBCache,
    94+            /*cache_size_bytes=*/nCoinDBCache * init_cache_fraction,
    


    ryanofsky commented at 1:44 am on June 3, 2022:

    In commit “init: add utxo snapshot detection” (084c8e37494c7423ea53d165cc04354117b4938c)

    Should init_cache_fraction also be used in chainstate->InitCoinsCache call below? (It is changing later in a move-only commit, and maybe that change was supposed to happen this commit)


    jamesob commented at 2:26 pm on June 23, 2022:
    Fixed.
  135. in src/validation.cpp:5086 in 6e88c2cbea outdated
    5082+
    5083+        // PopulateAndValidateSnapshot can return (in error) before the leveldb datadir
    5084+        // has been created, so only attempt removal if we got that far.
    5085+        if (auto snapshot_datadir = FindSnapshotChainstateDatadir()) {
    5086+            dbwrapper::Options db_options = snapshot_chainstate->PrepareForCoinsDBDeletion();
    5087+            bool removed = DeleteCoinsDBFromDisk(*snapshot_datadir, db_options, /*is_snapshot=*/true);
    


    ryanofsky commented at 1:51 am on June 3, 2022:

    In commit “validation: remove snapshot datadirs upon validation failure” (6e88c2cbea05b42d7ef387b83ae4147c79c26383)

    Seems like there is no unit test for this code path? I think it would be important to add because this seems like code that would run very rarely and could easily become broken without anybody knowing about it.


    jamesob commented at 8:06 pm on June 21, 2022:

    This is covered in the unittests; you can verify by observing the logline

    02022-06-21T19:59:23.617500Z (mocktime: 2020-08-31T15:34:12Z) [test] [validation.cpp:4919] [DeleteCoinsDBFromDisk] Removing leveldb dir at /tmp/test_common_Bitcoin Core/c0e34f63b527f04735c4b2ae434e9b81110d7f5d39eedf5c919e4c3ccd6fabdb/regtest/chainstate_snapshot
    

    after running

    0% ./src/test/test_bitcoin --log_level=message '--run_test=*/chainstatemanager_snapshot_completion' -- -printtoconsole=1 -debug=0
    
  136. in src/validation.h:879 in dde02d3d6a outdated
    863@@ -864,6 +864,10 @@ class ChainstateManager
    864         CBlockIndex** ppindex) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
    865     friend CChainState;
    866 
    867+    // Returns nullptr if no snapshot has been loaded.
    868+    CBlockIndex* GetSnapshotBaseBlock() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    869+    std::optional<int> GetSnapshotBaseHeight() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    


    ryanofsky commented at 1:52 am on June 3, 2022:

    In commit “add ChainstateManager.GetSnapshot{BaseHeight,BaseBlock}()” (dde02d3d6af14ab94c6910b5695e9bbd6262fe55)

    Seems like these would be good to call in the new unit tests


    jamesob commented at 8:14 pm on June 21, 2022:
    These are private methods so I think that’s out. But it isn’t so bad - they’re already incidentally tested by ActivateSnapshot and MaybeCompleteSnapshotValidation.
  137. ryanofsky approved
  138. ryanofsky commented at 2:18 am on June 3, 2022: contributor

    Code review ACK d988631b50b8b10a8dadae00fe61ebd32058a9d1. This is looking very good. The changes seem even more straightforward than before and the new tests are pretty clean and seem to cover the basic functionality well.

    I think I’d still question the design decision (https://github.com/bitcoin/bitcoin/pull/24232#discussion_r861875528) not to delete the background chainstate after it’s been validated until the next restart. IMO this makes things awkward in both the code and the tests because it adds an extra initialization side-quest that has to rehash the utxo set after it was already hashed, repeat initialization that was already done, and add hacky test code simulating a restart in order to reach this code path. I think if you just deleted the background chainstate and renamed the snapshot chainstate over it right after verifying the background chainstate, everything would be more straightforward. I don’t see why it should be a problem because higher level code can already deal with switching to a new chainstate when the snapshot is loaded, so just unloading and reloading the chainstates without even changing which one is active seems like an easier problem. This does seems like something could be addressed in a followup, though.

  139. jamesob commented at 8:53 pm on June 14, 2022: member

    Thanks for the comprehensive feedback, @ryanofsky. Many good suggestions here. I just wanted to leave a quick comment to note that I’m in process of incorporating them - there are quite a few commits to touch up.

    I want to evaluate your suggestion that we do the chainstate swap/data deletion “inline” with the background chainstate connecting up with the snapshot base block, but I’m going to get your other (simpler) suggestions integrated first. As you note, I do think this is something that could be changed in future PRs if desired. But hopefully by the end of the week I should have a better idea about its feasibility.

    My only conceptual hangup at the moment is that we will need to do some kind of hacky DisconnectTippish thing of the snapshot base block to ensure that if the cleanup process isn’t started or doesn’t complete before shutdown, that we are able to catch it on subsequent boot. Maybe that’s cleaner than the cost of some of the almost-duplicated code in this approach. But in either case, we need to handle the possibility that snapshot validation wasn’t completed properly before shutdown and must be on subsequent init.

  140. ryanofsky commented at 2:19 am on June 15, 2022: contributor

    My only conceptual hangup at the moment is that we will need to do some kind of hacky DisconnectTippish thing of the snapshot base block to ensure that if the cleanup process isn’t started or doesn’t complete before shutdown, that we are able to catch it on subsequent boot.

    I guess the last block on the background chain being connected is what triggers validating the background utxo set and deleting it, so you would want to leave the last block disconnected in order to trigger it again in case this is interrupted. This seems reasonable. Also seems reasonable that there could be some different trigger. I think the main improvement would be that if the backround validation and cleanup code is more integrated into normal block processing, init code can be simpler, and users wouldn’t have a reason to stop and restart their nodes.

    This seems reasonable. Also seems just as reasonable that there could be another that says if starting up and the background chain is fully downloaded, validate and delete it.

  141. jamesob force-pushed on Jun 21, 2022
  142. DrahtBot removed the label Needs rebase on Jun 21, 2022
  143. DrahtBot added the label Needs rebase on Jun 22, 2022
  144. jamesob force-pushed on Jun 22, 2022
  145. DrahtBot removed the label Needs rebase on Jun 22, 2022
  146. jamesob force-pushed on Jun 22, 2022
  147. jamesob force-pushed on Jun 22, 2022
  148. jamesob commented at 2:35 pm on June 23, 2022: member

    Thanks for all the feedback there @ryanofsky. It took me much longer than I expected to address everything.

    I am not going to address your recommendation here to swap out the chainstate data “inline” immediately after snapshot validation (vs. current approach of waiting until subsequent startup) because of the combination of two things:

    • I’m running out of steam on assumeutxo. I have been working on this feature for 3 years and it isn’t even existentially important to bitcoin. I have other stuff that is probably more valuable for me to work on and I’m pretty close to burnt out on this project.
    • I don’t see that approach as being substantially safer or more correct. In fact, making such a dramatic change as swapping out directories and reinitializing chainstates “in flight” seems fairly daring to me, vs. just halting use of one of the chainstates and doing the janitorial work at some point when the user is stopping/starting the bitcoind process.

    I want to heartily thank you for your continued review of this PR, and I want to apologize for being so demoralized by the length of this project.

  149. jamesob force-pushed on Jun 23, 2022
  150. in test/functional/feature_init.py:58 in 2abc1b1a4c outdated
    54@@ -55,7 +55,6 @@ def check_clean_start():
    55             b'Loading P2P addresses',
    56             b'Loading banlist',
    57             b'Loading block index',
    58-            b'Switching active chainstate',
    


    jamesob commented at 2:44 pm on June 23, 2022:
    Note to reviewers: this is necessary because of the simplification of InitializeChainstate(); we now only log when switching the active to a snapshot chainstate.
  151. in src/node/utxo_snapshot.cpp:25 in 35cba210f2 outdated
    19+{
    20+    AssertLockHeld(::cs_main);
    21+    assert(snapshot_chainstate.m_from_snapshot_blockhash);
    22+
    23+    const std::optional<fs::path> chaindir = snapshot_chainstate.CoinsDB().StoragePath();
    24+    assert(chaindir); // Sanity check that chainstate isn't in-memory.
    


    ariard commented at 1:05 am on June 24, 2022:
    Minor comment, using the “in-memory” wording (and in the whole cascade of callers) to qualify the state of chainstate sounds confusing. It’s could be interpreted as if the chainstate is not loaded in your DRAM, or not already persisted on your disk, or “fresh” from a validation standpoint…

    jamesob commented at 8:24 pm on June 28, 2022:
    While I agree that “in-memory” is a little confusing in a vacuum, that term has had a very specific meaning for awhile in the context of chainstates and coinsdb - see e.g. the arguments in_memory on CChainState::InitCoinsDB, CoinsViews::CoinsViews, and CCoinsViewDB::m_is_memory.
  152. in src/node/utxo_snapshot.cpp:38 in 35cba210f2 outdated
    32+        return false;
    33+    }
    34+    afile << *snapshot_chainstate.m_from_snapshot_blockhash;
    35+
    36+    if (afile.fclose() != 0) {
    37+        LogPrintf("[snapshot] failed to close base blockhash file %s after writing\n",
    


    ariard commented at 1:18 am on June 24, 2022:

    I think it could ease debugging and feature correctness monitoring if the assumeutxo logic gets its own LogFlags. It would be nice if one can observe the chainstate dance operating nicely, or spotting any weirdness.

    It’s more a global note on the assumeutxo patchet, it can be addressed latter on.


    jamesob commented at 7:36 pm on June 28, 2022:
    Agree, maybe will do in a later change.
  153. in src/validation.cpp:5241 in e0f892fa15 outdated
    5236+CChainState& ChainstateManager::ActivateSnapshotChainstate(CTxMemPool* mempool, uint256 base_blockhash)
    5237+{
    5238+    assert(!m_snapshot_chainstate);
    5239+    m_snapshot_chainstate = 
    5240+        std::make_unique<CChainState>(mempool, m_blockman, *this, base_blockhash);
    5241+    LogPrintf("Switching active chainstate to %s\n", m_snapshot_chainstate->ToString());
    


    ariard commented at 1:40 am on June 24, 2022:
    Maybe log deserves its “[snapshot]” prefix.

    jamesob commented at 7:37 pm on June 28, 2022:
    Done.
  154. in src/validation.cpp:4865 in 75841142b9 outdated
    4859@@ -4860,6 +4860,48 @@ const AssumeutxoData* ExpectedAssumeutxo(
    4860     return nullptr;
    4861 }
    4862 
    4863+static bool DeleteCoinsDBFromDisk(
    4864+    const fs::path db_path,
    4865+    bool is_snapshot = false)
    


    ariard commented at 1:54 am on June 24, 2022:
    At this PR, DeleteCoinsDBFromDisk is called only twice so it sounds there is no clear default argument so it could be better to give them explicitly. Default arguments have already been source of bugs in the past..
  155. ariard commented at 2:31 am on June 24, 2022: member

    Reviewed until 2abc1b1, should review other commits soon.

    I’m running out of steam on assumeutxo. I have been working on this feature for 3 years and it isn’t even existentially important to bitcoin. I have other stuff that is probably more valuable for me to work on and I’m pretty close to burnt out on this project

    I still think assumeutxo is an interesting project to work on, especially to guarantee a smooth initial syncing process. If we aim to keep onboarding new full-nodes users in the coming decade to strengthen network decentralization it’s more than nice to get assumeutxo in our feature set. It’s also works well with a potential utreexo integration in core.

    I would be interested to see a clear project status to know where to focus review time, neither the board or the meta-PR sounds to provide this information. I think it would also help to cut PRs in smaller chunks, even if there are not conceptually hard like this one. Hopefully assumeutxo should be close from a v0.1 release.

  156. DrahtBot added the label Needs rebase on Jun 24, 2022
  157. jamesob force-pushed on Jun 28, 2022
  158. jamesob commented at 7:41 pm on June 28, 2022: member

    Thanks for your review and encouragement, @ariard.

    I think it would also help to cut PRs in smaller chunks, even if there are not conceptually hard like this one.

    In principle, I agree, and that’s how I started the project. But I have found that even small PRs take months to merge (e.g. #23997 has been open for 6 months), and I have found that sadly the process encourages bundling large changesets together because otherwise progress feels impossibly slow.

    I have considered at various points doing the remainder of assumeutxo in a single PR because then at least review/test time could be consolidated on one changeset instead of a painfully lengthy series of smaller changes, each of which seems to take nearly half a year to get in.

  159. jamesob commented at 8:13 pm on June 28, 2022: member

    I would be interested to see a clear project status to know where to focus review time, neither the board or the meta-PR sounds to provide this information.

    I have added an updated project status list on the parent PR (https://github.com/bitcoin/bitcoin/pull/15606#issue-421531882, cc @ariard). Please let me know if there’s anything I can add that would be helpful. Only about half the subtasks are complete so far, but (I hope) we are farther along than halfway, as after this PR and #24008 are merged, most of the tricky logic and refactoring will be done.

  160. DrahtBot removed the label Needs rebase on Jun 28, 2022
  161. in src/txdb.h:76 in 93296fefe1 outdated
    71@@ -71,6 +72,9 @@ class CCoinsViewDB final : public CCoinsView
    72 
    73     //! Dynamically alter the underlying leveldb cache size.
    74     void ResizeCache(size_t new_cache_size) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
    75+
    76+    //! @returns filesystem path to on-disk storage.
    


    ariard commented at 7:05 pm on July 4, 2022:
    nit: could document when an optional not containing value is returned.

    jamesob commented at 8:12 pm on July 18, 2022:
    Done.
  162. in src/validation.cpp:4839 in 0f9d3f19fe outdated
    4932@@ -4928,9 +4933,17 @@ bool ChainstateManager::ActivateSnapshot(
    4933             static_cast<size_t>(current_coinstip_cache_size * SNAPSHOT_CACHE_PERC));
    4934     }
    4935 
    4936-    const bool snapshot_ok = this->PopulateAndValidateSnapshot(
    4937+    bool snapshot_ok = this->PopulateAndValidateSnapshot(
    


    ariard commented at 7:16 pm on July 4, 2022:
    Side-note, i think it can be interesting to document PopulateAndValidateSnapshot() with what exactly this function is achieving, the sequence of checks and operations and the potential failures. It’s a ~200 LoC function in the middle of the validation code.

    jamesob commented at 8:29 pm on July 18, 2022:
    Yeah, good point. I’ll do a follow-up on this since it’s out of scope for the already large change.
  163. in src/node/utxo_snapshot.h:55 in 0f9d3f19fe outdated
    48+//! this chainstate. This value is read in during subsequent initializations and
    49+//! used to reconstruct snapshot-based chainstates.
    50+bool WriteSnapshotBaseBlockhash(CChainState& snapshot_chainstate)
    51+    EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    52+
    53+std::optional<uint256> ReadSnapshotBaseBlockhash(fs::path chaindir)
    


    ariard commented at 7:19 pm on July 4, 2022:
    nit: could be documented like the writer.

    jamesob commented at 8:20 pm on July 18, 2022:
    Fixed.
  164. in src/node/utxo_snapshot.cpp:21 in 0f9d3f19fe outdated
    15+
    16+namespace node {
    17+
    18+bool WriteSnapshotBaseBlockhash(CChainState& snapshot_chainstate)
    19+{
    20+    AssertLockHeld(::cs_main);
    


    ariard commented at 7:28 pm on July 4, 2022:
    I think this call could be modified a bit to avoid holding cs_main during the function lifetime. In ActivateSnapshot() you might have a WITH_LOCK to get the StoragePath() and give it as a argument here.

    jamesob commented at 8:14 pm on July 18, 2022:
    I think conceptually this should hold cs_main, since we’re writing to a chainstate directory. Splitting cs_main out from per-chainstate locks should be a larger follow-on effort, and I think we can delay reworking this call until then.

    ariard commented at 2:56 pm on August 1, 2022:
    All good to me.
  165. in src/node/utxo_snapshot.cpp:51 in 0f9d3f19fe outdated
    46+    if (!fs::exists(chaindir)) {
    47+        LogPrintf("[snapshot] cannot read base blockhash: no chainstate dir " /* Continued */
    48+            "exists at path %s\n", fs::PathToString(chaindir));
    49+        return std::nullopt;
    50+    }
    51+    const fs::path p = chaindir / node::SNAPSHOT_BLOCKHASH_FILENAME;
    


    ariard commented at 7:33 pm on July 4, 2022:
    to improve readability of this function, could be read_from.

    jamesob commented at 8:20 pm on July 18, 2022:
    Fixed.
  166. in src/node/utxo_snapshot.cpp:55 in 0f9d3f19fe outdated
    50+    }
    51+    const fs::path p = chaindir / node::SNAPSHOT_BLOCKHASH_FILENAME;
    52+
    53+    if (!fs::exists(p)) {
    54+        LogPrintf("[snapshot] snapshot chainstate dir is malformed! no base blockhash file " /* Continued */
    55+            "exists at path %s. Try reinitializing snapshot?\n", fs::PathToString(p));
    


    ariard commented at 7:35 pm on July 4, 2022:
    By reinitializing the snapshot, you mean removing the snapshot chainstate dir and calling back loadtxoutset?

    jamesob commented at 8:20 pm on July 18, 2022:
    Fixed, thanks.
  167. in src/validation.h:995 in b1dbc76037 outdated
    987@@ -994,6 +988,13 @@ class ChainstateManager
    988     /** Produce the necessary coinbase commitment for a block (modifies the hash, don't call for mined blocks). */
    989     std::vector<unsigned char> GenerateCoinbaseCommitment(CBlock& block, const CBlockIndex* pindexPrev) const;
    990 
    991+    //! When starting up, search the datadir for a chainstate based on a UTXO
    992+    //! snapshot that is in the process of being validated.
    993+    bool DetectSnapshotChainstate(CTxMemPool* mempool) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    994+
    995+    CChainState& ActivateSnapshotChainstate(CTxMemPool* mempool, uint256 base_blockhash) 
    


    ariard commented at 7:45 pm on July 4, 2022:
    whitespace here

    jamesob commented at 3:36 pm on July 19, 2022:
    Not sure what you mean - trailing whitespace?

    ariard commented at 2:55 pm on August 1, 2022:
    Well I don’t have it anymore on the latest push of #25667 so all good.
  168. in src/node/chainstate.cpp:76 in b1dbc76037 outdated
    78@@ -74,12 +79,18 @@ std::optional<ChainstateLoadingError> LoadChainstate(bool fReset,
    79         return ChainstateLoadingError::ERROR_LOAD_GENESIS_BLOCK_FAILED;
    80     }
    81 
    82+    // Conservative value that will ultimately be changed by
    83+    // a call to `chainman.MaybeRebalanceCaches()`.
    84+    double init_cache_fraction = 0.2;
    


    ariard commented at 7:56 pm on July 4, 2022:
    I think it would be good to document the reasoning behind picking up 0.2 and to which proportion of the coin cache is allocated to each chainstate iirc the purpose of cache rebalancing.

    jamesob commented at 1:49 pm on July 19, 2022:
    Fixed.
  169. in src/validation.cpp:5225 in b1dbc76037 outdated
    5220+    assert(!m_snapshot_chainstate);
    5221+    std::optional<fs::path> path = node::FindSnapshotChainstateDir();
    5222+    if (!path) {
    5223+        return false;
    5224+    }
    5225+    LogPrintf("[snapshot] detected active snapshot chainstate (%s) - loading\n",
    


    ariard commented at 8:03 pm on July 4, 2022:
    I would say, as the return value of DetectSnapshotChainstate is not considered in LoadChainstate the log print could be only located only when the detection sequence is fully valid, i.e after the read of the base blockhash.

    jamesob commented at 1:59 pm on July 19, 2022:
    Good point, fixed.
  170. in src/validation.cpp:5239 in b1dbc76037 outdated
    5234+}
    5235+
    5236+CChainState& ChainstateManager::ActivateSnapshotChainstate(CTxMemPool* mempool, uint256 base_blockhash)
    5237+{
    5238+    assert(!m_snapshot_chainstate);
    5239+    m_snapshot_chainstate = 
    


    ariard commented at 8:03 pm on July 4, 2022:
    whitespace

    jamesob commented at 3:36 pm on July 19, 2022:
    I think this is an okay linebreak to prevent long lines.
  171. in src/validation.cpp:4745 in 167b9af34a outdated
    4859@@ -4860,6 +4860,46 @@ const AssumeutxoData* ExpectedAssumeutxo(
    4860     return nullptr;
    4861 }
    4862 
    4863+static bool DeleteCoinsDBFromDisk(const fs::path db_path, bool is_snapshot)
    4864+    EXCLUSIVE_LOCKS_REQUIRED(::cs_main)
    4865+{
    4866+    AssertLockHeld(::cs_main);
    


    ariard commented at 8:11 pm on July 4, 2022:
    Is that an undocumented design assumption of our writers that they should always take cs_main lock even if there is no modification of a protected value like here afaict ? Wonder if the consistency guarantees of the files are not guaranteed at the kernel level anyway.

    jamesob commented at 3:50 pm on July 19, 2022:
    I assumed that it was commonly accepted that any destructive modification to a chainstate directory would require cs_main, but you make a good point that I don’t think that’s documented anywhere…
  172. in src/validation.h:879 in b707032db9 outdated
    834@@ -835,6 +835,10 @@ class ChainstateManager
    835         CBlockIndex** ppindex) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
    836     friend CChainState;
    837 
    838+    // Returns nullptr if no snapshot has been loaded.
    839+    CBlockIndex* GetSnapshotBaseBlock() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    840+    std::optional<int> GetSnapshotBaseHeight() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
    


    ariard commented at 8:15 pm on July 4, 2022:
    nit: could be documented

    jamesob commented at 7:51 pm on July 19, 2022:
    Fixed.
  173. in src/validation.h:475 in 430761a35d outdated
    468@@ -469,6 +469,12 @@ class CChainState
    469     //! Manages the UTXO set, which is a reflection of the contents of `m_chain`.
    470     std::unique_ptr<CoinsViews> m_coins_views;
    471 
    472+    //! This toggle exists for use when doing background validation for UTXO
    473+    //! snapshots. It is set once the background validation chain reaches the
    474+    //! same height as the base of the snapshot, and signals that we should no
    475+    //! longer connect blocks to the background chainstate.
    


    ariard commented at 8:26 pm on July 4, 2022:
    The snapshot invalid case mentioned by IsUsable could be also reflected here.

    jamesob commented at 7:57 pm on July 19, 2022:
    Done.
  174. in src/validation.cpp:4713 in 430761a35d outdated
    4831-    }
    4832-
    4833-    if (m_snapshot_chainstate) {
    4834-        out.push_back(m_snapshot_chainstate.get());
    4835+    for (CChainState* cs : {m_ibd_chainstate.get(), m_snapshot_chainstate.get()}) {
    4836+        if (this->IsUsable(cs)) out.push_back(cs);
    


    ariard commented at 8:32 pm on July 4, 2022:
    Maybe you could update GetAll documentation to “Get all chainstates currently IsUsable() to make it clear some chainstates might be filtered out.

    jamesob commented at 7:57 pm on July 19, 2022:
    I think the “currently being used” in the doc covers this; talking about IsUsable() would be leaking details since that’s a private method.
  175. in src/node/chainstate.h:30 in 00b319a35c outdated
    26@@ -27,6 +27,7 @@ enum class ChainstateLoadingError {
    27     ERROR_GENERIC_BLOCKDB_OPEN_FAILED,
    28     ERROR_BLOCKS_WITNESS_INSUFFICIENTLY_VALIDATED,
    29     SHUTDOWN_PROBED,
    30+    SNAPSHOT_VALIDATION_FAILED,
    


    ariard commented at 8:45 pm on July 4, 2022:
    could be good to document the ChainstateLoadingError, that would ease the review of the code paths where errors are returned and the adequateness of them.

    jamesob commented at 1:05 pm on July 20, 2022:
    These are apparently going away in #25308 anyway, so I’ll leave this be.
  176. in src/node/chainstate.cpp:57 in 00b319a35c outdated
    19@@ -20,10 +20,26 @@ static std::optional<ChainstateLoadingError> CompleteChainstateInitialization(
    20     std::function<void()> coins_error_cb) EXCLUSIVE_LOCKS_REQUIRED(::cs_main)
    21 
    22 {
    23+    if (!chainman.BlockIndex().empty() &&
    24+            !chainman.m_blockman.LookupBlockIndex(chainman.GetParams().GetConsensus().hashGenesisBlock)) {
    25+        return ChainstateLoadingError::ERROR_BAD_GENESIS_BLOCK;
    26+    }
    27+
    28+    // At this point blocktree args are consistent with what's on disk.
    


    ariard commented at 8:57 pm on July 4, 2022:
    What’s the purpose of moving those checks from LoadChainstate() to CompleteChainstateInitialization() ? I’m not sure the comment about ThreadImport is still accurate.

    jamesob commented at 8:06 pm on July 19, 2022:

    It’s to consolidate the calls so that we don’t have to repeat them in the if (snapshot_completion.completed) case.

    I believe the ThreadImport comment is still accurate: https://github.com/jamesob/bitcoin/blob/master/src/node/blockstorage.cpp#L894


    ariard commented at 3:11 pm on August 1, 2022:
    Right, that’s accurate.
  177. in src/validation.h:778 in 00b319a35c outdated
    773+    // not match the one expected by the snapshot chainstate.
    774+    BASE_BLOCKHASH_MISMATCH,
    775+};
    776+
    777+struct SnapshotCompletionResult {
    778+    bool completed;
    


    ariard commented at 8:58 pm on July 4, 2022:
    could be called success for less redundancy with completion

    jamesob commented at 1:29 pm on July 20, 2022:
    Fixed by removing
  178. in src/validation.cpp:2759 in 00b319a35c outdated
    2749@@ -2750,6 +2750,14 @@ bool CChainState::ConnectTip(BlockValidationState& state, CBlockIndex* pindexNew
    2750     LogPrint(BCLog::BENCH, "  - Connect postprocess: %.2fms [%.2fs (%.2fms/blk)]\n", (nTime6 - nTime5) * MILLI, nTimePostConnect * MICRO, nTimePostConnect * MILLI / nBlocksTotal);
    2751     LogPrint(BCLog::BENCH, "- Connect block: %.2fms [%.2fs (%.2fms/blk)]\n", (nTime6 - nTime1) * MILLI, nTimeTotal * MICRO, nTimeTotal * MILLI / nBlocksTotal);
    2752 
    2753+    // If we are the background validation chainstate, check to see if we are done
    2754+    // validating the snapshot (i.e. our tip has reached the snapshot's base block).
    2755+    if (this != &m_chainman.ActiveChainstate()) {
    2756+        // This call may set `m_stop_use`, which is referenced immediately afterwards in
    2757+        // ActivateBestChain.
    2758+        m_chainman.MaybeCompleteSnapshotValidation();
    


    ariard commented at 9:05 pm on July 4, 2022:
    I think assumeutxod.md might be inaccurate on that, it says the UTXO set is checked in ActivateBestChain(), should say ConnectTip() ?

    ryanofsky commented at 7:56 pm on July 6, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (00b319a35cfb691a2fb831e84f63512412cae068)

    Since return value is not checked, can comment above be updated to say what is supposed to happen here if the snapshot fails validation? Or if the hash fails to compute? I’m a little confused because about what error handling is supposed to be because it seems like m_stop_use is set in the success case, but the shutdown function and InvalidateCoinsDBOnDisk are not always called in the failure case. So it is seems possible to end up with the background chainstate continueing to sync past the snapshot base, meanwhile the snapshot is still active


    jamesob commented at 8:13 pm on July 19, 2022:
    Technically docs are right because the UTXO set is checked in ABC -> ABCStep -> ConnectTip, but I can update.

    jamesob commented at 8:19 pm on July 19, 2022:
    Based upon my (re)reading of MaybeCompleteSnapshotValidation(), the snapshot cannot be found invalid (without some kind of code error) without calling handle_invalid_snapshot(), which will mark the snapshot chainstate as unusable and remove the associated coinsdb.

    ryanofsky commented at 1:43 pm on July 22, 2022:

    re: #24232 (review)

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (1eb88f888cc04bd9da5405d4d5edc7b6e2603a8e)

    Based upon my (re)reading of MaybeCompleteSnapshotValidation(), the snapshot cannot be found invalid (without some kind of code error) without calling handle_invalid_snapshot(), which will mark the snapshot chainstate as unusable and remove the associated coinsdb.

    It seems like it would be possible to add some assertions about the return value or chain state here to verify that. But if you think that is overkill it would be good at least update the comment to be explicit that you are intentionally ignoring the return value. Could prefix “If this fails…” with “Intentionally ignoring return value. If this fails…”

    If MaybeCompleteSnapshotValidation always cleans up after itself I guess I also don’t understand why it is not an error if it fails here, but it is an error if it fails the other place it is called on startup.

    If might be clearer if MaybeCompleteSnapshotValidation just returned an error code and callers would handle cleanup to make code clearer and easier to reason about.


    ariard commented at 3:12 pm on August 1, 2022:
    Good to me, it’s a bit byzantine but that’s allright.
  179. in src/validation.cpp:3047 in 00b319a35c outdated
    3041+                // This will have been toggled in
    3042+                // ActivateBestChainStep -> ConnectTip -> MaybeCompleteSnapshotValidation,
    3043+                // if at all, so we should catch it here.
    3044+                //
    3045+                // Break this do-while to ensure we don't advance past the base snapshot.
    3046+                if (m_stop_use) {
    


    ariard commented at 9:15 pm on July 4, 2022:
    I think ActivateBestChain could be documented on those new block processing stoppers if a background state is fully validated.

    jamesob commented at 8:25 pm on July 19, 2022:
    Added a comment to ABC’s docstring.
  180. in src/validation.cpp:5279 in 00b319a35c outdated
    5274+            !this->IsUsable(m_snapshot_chainstate.get()) ||
    5275+            !this->IsUsable(m_ibd_chainstate.get()) ||
    5276+            !m_ibd_chainstate->m_chain.Tip()) {
    5277+       // Nothing to do - this function only applies to the background
    5278+       // validation chainstate.
    5279+       return {false, {}};
    


    ariard commented at 9:20 pm on July 4, 2022:
    Isn’t that check a belt-and-suspender as “backgroundness” is already checked in ConnectTip() ?

    jamesob commented at 1:07 pm on July 20, 2022:
    The first clause of the if is belt-and-suspenders, but the remaining calls aren’t.
  181. in src/validation.cpp:5294 in 00b319a35c outdated
    5289+
    5290+    assert(SnapshotBlockhash());
    5291+    uint256 snapshot_blockhash = *SnapshotBlockhash();
    5292+
    5293+    if (index_new.GetBlockHash() != snapshot_blockhash) {
    5294+      LogPrintf("[snapshot] snapshot base blockhash does not match background chainstate tip - code error?\n");
    


    ariard commented at 9:21 pm on July 4, 2022:
    or more theoretical a deep reorg down to the snapshot blockhash ? or what else it could be a hint of ?

    ryanofsky commented at 7:46 pm on July 6, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (00b319a35cfb691a2fb831e84f63512412cae068)

    or more theoretical a deep reorg down to the snapshot blockhash ? or what else it could be a hint of ?

    A deep reorg does seem like the thing that would trigger this, assuming there is not a bug. I would just write message to straightforwardly report what happened. Something like "Validated block %s height %d does not match the snapshot base block %s height %d. Snapshot is not valid.", index_new.BlockHash().ToString(), index_new.nHeight, snapshot_blockhash.ToString(), snapshot_height.


    jamesob commented at 1:10 pm on July 20, 2022:
    Fixed, thanks
  182. in src/validation.cpp:5210 in 00b319a35c outdated
    5341+        return {false, SnapshotCompletionError::MISSING_CHAINPARAMS};
    5342+    }
    5343+
    5344+    const AssumeutxoData& au_data = *maybe_au_data;
    5345+    std::optional<CCoinsStats> maybe_ibd_stats;
    5346+    try {
    


    ariard commented at 9:24 pm on July 4, 2022:
    If that function takes few minutes to yield a result could be nice to add a log print in the happy path informing a node operator what’s going if the rest of the nodes operations are staling due to cs_main lock held.

    jamesob commented at 1:12 pm on July 20, 2022:
    Added
  183. in src/validation.cpp:5182 in 00b319a35c outdated
    5315+        LogPrintf("[snapshot] deleting snapshot, reverting to validated chain, and stopping node\n");
    5316+
    5317+        m_active_chainstate = m_ibd_chainstate.get();
    5318+        m_snapshot_chainstate->m_stop_use = true;
    5319+        assert(!this->IsUsable(m_snapshot_chainstate.get()));
    5320+
    


    ariard commented at 9:25 pm on July 4, 2022:
    assert usability of ibd chainstate as it’s supposed to take the validation relay at next restart?

    jamesob commented at 1:11 pm on July 20, 2022:
    Added
  184. in src/validation.cpp:5433 in 00b319a35c outdated
    5564+    // The caller of this method will be responsible for reinitializing chainstates
    5565+    // if they want to continue operation.
    5566+    this->ResetChainstates();
    5567+
    5568+    // No chainstates should be considered usable.
    5569+    assert(this->GetAll().size() == 0);
    


    ariard commented at 9:36 pm on July 4, 2022:
    Reading until then, it was my understanding that the snapshot state stays usable, can’t find when the snapshot chainstate is turn to m_stop_use = true, apart in the handle_invalid_snapshot case?

    jamesob commented at 1:14 pm on July 20, 2022:
    This is just a redundant check that no chainstates are considered usable while we’re moving the leveldb directories around on disk.
  185. ariard commented at 10:06 pm on July 4, 2022: member

    Reviewed all the code changes until cbcc8bc. Still have to check the correctness between the operations model laid out in assumeutxo.md and the code and the test coverage/changes.


    In principle, I agree, and that’s how I started the project. But I have found that even small PRs take months to merge (e.g. #23997 has been open for 6 months), and I have found that sadly the process encourages bundling large changesets together because otherwise progress feels impossibly slow.

    I understand that core review process might sounds slow and that large changesets are a way to keep the ball rolling forward from a PR author perspective. However, I think large changesets might also produce the downside that they’re are likely to discourage reviewers to get engaged, as the context for them to consider and evaluate is bigger.

    I would advise to split this PR in two chunks, one the logic for detecting and activating snapshot chainstates on start, the other one the logic to complete the snapshot validation process. Especially, the latter one, as it’s modifying ActivateBestChain(), even superficially, it’s one of the most critical path of the codebase and I think it’s worthy of more scrutiny.

    Though more generally, about the assumeutxo project itself, I think a majority of the contributors who partaked in the design of assumeutxo are not or less active anymore. Minding that the project is context-rich with changes in the validation engine, the caches, the disk I/O , the init sequence and the indexes, I think it’s a pretty high bar to attract review. I would say assumeutxo would benefit from a renewed evangelism about why it matters and the upsides for the Bitcoin ecosystem as whole. I think it could be reasonable to give regular updates about the state of the project at meetings, the updated project status list sounds a good start.

    Last, I believe a lot of opened PRs about validation and indexes could benefit from your review bandwidth. Speaking for myself, I always seek to have a positive review balance sheet between how much bandwidth I’m providing to the project and how much I’m requesting for my own PRs.

  186. in src/validation.cpp:1513 in 0f9d3f19fe outdated
    1508@@ -1508,14 +1509,18 @@ CChainState::CChainState(
    1509       m_chainman(chainman),
    1510       m_from_snapshot_blockhash(from_snapshot_blockhash) {}
    1511 
    1512+
    1513+constexpr std::string_view SNAPSHOT_CHAINSTATE_SUFFIX = "_snapshot";
    


    ryanofsky commented at 5:52 pm on July 6, 2022:

    In commit “validation: rename snapshot chainstate dir” (0f9d3f19fe535c2b457818a46bb8e536461581dc)

    Might be better to define this in node/utxo_snapshot.h now instead of validation.cpp so it doesn’t have to move later.


    jamesob commented at 8:28 pm on July 18, 2022:
    Fixed.
  187. in src/validation.cpp:5236 in b1dbc76037 outdated
    5231+    this->ActivateSnapshotChainstate(mempool, *base_blockhash);
    5232+
    5233+    return true;
    5234+}
    5235+
    5236+CChainState& ChainstateManager::ActivateSnapshotChainstate(CTxMemPool* mempool, uint256 base_blockhash)
    


    ryanofsky commented at 6:20 pm on July 6, 2022:

    In commit “init: add utxo snapshot detection” (b1dbc76037c1f2ff893113057afd3f2eac581dbd)

    This ChainstateManager::ActivateSnapshotChainstate() method name seem easy to confuse with the other ChainstateManager::ActivateSnapshot() method.

    I think ideally the methods would have more distinct names, like this could be called ActivateExistingSnapshot and the other could be called ActivateNewSnapshot.


    jamesob commented at 3:46 pm on July 19, 2022:
    Yeah, good point. Changing this (at least) to ActivateExistingSnapshot().
  188. in src/validation.cpp:4861 in 55519a32c9 outdated
    4978+        if (auto snapshot_datadir = node::FindSnapshotChainstateDir()) {
    4979+            // We have to destruct leveldb::DB in order to release the db lock, otherwise
    4980+            // DestroyDB() (in DeleteCoinsDBFromDisk()) will fail. See `leveldb::~DBImpl()`.
    4981+            // Destructing the chainstate (and so resetting the coinsviews object) does this.
    4982+            snapshot_chainstate.reset();
    4983+            bool removed = DeleteCoinsDBFromDisk(*snapshot_datadir, /*is_snapshot=*/true);
    


    ryanofsky commented at 6:27 pm on July 6, 2022:

    In commit “validation: remove snapshot datadirs upon validation failure” (55519a32c9fee6df413fdc654dc8fca0d9ed8fc0)

    This does not seem to have test coverage (tests pass with the DeleteCoinsDBFromDisk call commented out)


    jamesob commented at 2:33 pm on July 20, 2022:
    Good find! I’ve added test coverage, and the tests now fail when this line is deleted.
  189. in src/validation.cpp:5305 in b707032db9 outdated
    5296@@ -5297,3 +5297,16 @@ CChainState& ChainstateManager::ActivateSnapshotChainstate(CTxMemPool* mempool,
    5297     m_active_chainstate = m_snapshot_chainstate.get();
    5298     return *m_snapshot_chainstate;
    5299 }
    5300+
    5301+CBlockIndex* ChainstateManager::GetSnapshotBaseBlock()
    5302+{
    5303+    auto blockhash_op = this->SnapshotBlockhash();
    5304+    if (!blockhash_op) return nullptr;
    5305+    return m_blockman.LookupBlockIndex(*blockhash_op);
    


    ryanofsky commented at 6:31 pm on July 6, 2022:

    In commit “add ChainstateManager.GetSnapshot{BaseHeight,BaseBlock}()” (b707032db9e71d98ce5427b37e7d13011f725ca9)

    Would be good to return Assert(m_blockman.LookupBlockIndex(*blockhash_op)) to be clear it is not supposed to return null in this case.


    jamesob commented at 7:52 pm on July 19, 2022:
    Fixed.
  190. in src/validation.h:777 in 00b319a35c outdated
    772+    // The blockhash of the current tip of the background validation chainstate does
    773+    // not match the one expected by the snapshot chainstate.
    774+    BASE_BLOCKHASH_MISMATCH,
    775+};
    776+
    777+struct SnapshotCompletionResult {
    


    ryanofsky commented at 7:02 pm on July 6, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (00b319a35cfb691a2fb831e84f63512412cae068)

    This result type seems overly complicated to me. Instead of having a struct with bool and optional members, why not return a plain enum:

    0enum class SnapshotValidationResult { SUCCESS, SKIPPED, FAILED_MISSING_CHAINPARAMS, FAILED_STATS, FAILED_HASH_MISMATCH, FAILED_BASE_BLOCKHASH_MISMATCH };
    

    jamesob commented at 1:28 pm on July 20, 2022:
    Fixed, thanks
  191. in src/init.cpp:1503 in 00b319a35c outdated
    1498@@ -1499,6 +1499,9 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
    1499                 break;
    1500             case ChainstateLoadingError::SHUTDOWN_PROBED:
    1501                 break;
    1502+            case ChainstateLoadingError::SNAPSHOT_VALIDATION_FAILED:
    1503+                strLoadError = _("UTXO snapshot failed to validate");
    


    ryanofsky commented at 7:22 pm on July 6, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (00b319a35cfb691a2fb831e84f63512412cae068)

    If this case is hit, it looks like the GUI will prompt to reindex. I guess that seems reasonable if it is intended behavior? Could change this return InitError(_("UTXO snapshot failed to validate. Please do XXX")); if this is supposed to just exit instead.


    jamesob commented at 2:01 pm on July 20, 2022:
    Fixed
  192. in src/validation.cpp:5053 in 00b319a35c outdated
    5048+    {
    5049+        return "ComputeUTXOStats interrupted by shutdown.";
    5050+    }
    5051+};
    5052+
    5053+static auto SnapshotUTXOHashBreakpoint = [] {
    


    ryanofsky commented at 7:29 pm on July 6, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (00b319a35cfb691a2fb831e84f63512412cae068)

    Can it just be a normal function static void SnapshotUTXOHashBreakpoint() {?


    jamesob commented at 1:05 pm on July 20, 2022:
    Yep, good point. Fixed.
  193. in src/validation.cpp:5614 in 00b319a35c outdated
    5609+        LogPrintf("Deletion of %s failed. Please remove it manually, as the " /* Continued */
    5610+                  "directory is now unnecessary.\n",
    5611+                  fs::PathToString(tmp_old));
    5612+    }
    5613+
    5614+    LogPrintf("[snapshot] deleted background chainstate directory (%s)\n",
    


    ryanofsky commented at 8:04 pm on July 6, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (00b319a35cfb691a2fb831e84f63512412cae068)

    Seems like this should be in an } else { block to avoid confusing deletion of %s failed message followed by %s deleted message


    jamesob commented at 1:15 pm on July 20, 2022:
    Good point, fixed
  194. ryanofsky approved
  195. ryanofsky commented at 8:15 pm on July 6, 2022: contributor
    Code review ACK d003388b07ed567a99f8c23f8a273145de6aa012. A lot of changes and cleanups since last review like simplifying the InitializeChainstate method. This looks very good and puts all the code needed to validate and clean up snapshots into place. I left more review comments, but they aren’t important and can be ignored
  196. DrahtBot added the label Needs rebase on Jul 19, 2022
  197. jamesob force-pushed on Jul 20, 2022
  198. db: add StoragePath to CDBWrapper/CCoinsViewDB
    This is used in subsequent commits. It allows us to clean up UTXO
    snapshot chainstate after background validation completes.
    d65678d383
  199. validation: rename snapshot chainstate dir
    This changes the snapshot's leveldb chainstate dir name from
    `chainstate_[blockhash]` to `chainstate_snapshot`. This simplifies
    later logic that loads snapshot data, and enforces the limitation
    of a single snapshot at any given time.
    
    Since we still need to persis the blockhash of the base block, we
    write that out to a file (`chainstate_snapshot/base_blockhash`) for
    later use during initialization, so that we can reinitialize the
    snapshot chainstate.
    
    Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
    05db9ccb2f
  200. init: add utxo snapshot detection
    Add functionality for activating a snapshot-based chainstate if one is
    detected on-disk.
    
    Also cautiously initialize chainstate cache usages so that we don't
    somehow blow past our cache allowances during initialization, then
    rebalance at the end of init.
    
    Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
    2fdd29956d
  201. add utilities for deleting on-disk leveldb data
    Used in later commits to remove leveldb directories for
    - invalid snapshot chainstates, and
    - background-vaildation chainstates that have finished serving their
      purpose.
    4761573de6
  202. validation: remove snapshot datadirs upon validation failure
    If a UTXO snapshot fails to validate, don't leave the resulting datadir
    on disk as this will confuse initialization on next startup and we'll
    get an assertion error.
    95fd2532d6
  203. add ChainstateManager.GetSnapshot{BaseHeight,BaseBlock}()
    For use in later commits.
    68fcd35506
  204. blockmanager: avoid undefined behavior during FlushBlockFile
    If we call FlushBlockFile() without having intitialized the block index
    with LoadBlockIndexDB(), we may be indexing into an empty vector.
    
    Specifically this is an issue when we call MaybeRebalanceCaches() during
    chainstate init before the block index has been loaded, which calls
    FlushBlockFile().
    
    Also add an assert to avoid undefined behavior.
    
    Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
    e68675693b
  205. validation: add CChainState::m_stop_use and ChainMan::isUsable
    and remove m_snapshot_validated. This state can now be inferred by the
    number of isUsable chainstates.
    
    m_stop_use is used to signal that a chainstate should no longer be used
    by validation logic; it is used as a sentinel when background validation
    completes or if the snapshot chainstate is found to be invalid.
    
    isUsable is a convenience method that incorporates m_stop_use.
    d9fbea81e2
  206. add CChainState::HasCoinsViews()
    Used in subsequent commits. Also cleans up asserts in
    coins_views-related convenience methods to be more exact.
    6cf2bf4c4b
  207. move-only-ish: init: factor out chainstate initialization
    Moves chainstate initialization into its own function. This is necessary
    to later support a more readable way of handling background-validation
    chainstate cleanup during init, since the chainstate initialization
    functions may need to be repeated after moving leveldb filesystem
    content around.
    
    This commit isn't strictly necessary, but the alternative is to (ab)use
    the `while` loop in init.cpp with a `continue` on the basis of a
    specific ChainstateLoadingError return value from LoadChainstate. Not
    only is this harder to read, but it can't be unittested.
    
    The approach here lets us consolidate background-validation cleanup to
    LoadChainstate, and therefore exercise it within tests.
    
    This commit is most easily reviewed with
    
      git diff --color-moved=dimmed-zebra --color-moved-ws=ignore-space-change
    79ffaa712c
  208. validation: add ChainMan logic for completing UTXO snapshot validation
    Trigger completion when a background validation chainstate reaches the
    same height as a UTXO snapshot, and handle cleaning up the chainstate on
    subsequent startup.
    1eb88f888c
  209. log: add LoadBlockIndex() message for assumedvalid blocks
    I found this useful during unittest debugging.
    2d996c8f79
  210. move-only: test: make snapshot chainstate setup reusable
    For use in next commit.
    
    Most easily reviewed with
    `--color-moved=dimmed_zebra --color-moved-ws=allow-indentation-change`.
    7371c36a7e
  211. test: add reset_chainstate parameter for snapshot unittests
    This CreateAndActivateUTXOSnapshot parameter is necessary once we
    perform snapshot completion within ABC, since the existing UpdateTip
    test will fail because the IBD chain that has generated the snapshot
    will exceed the base of the snapshot.
    
    Being able to test snapshots being loaded into a mostly-uninitialized
    datadir allows for more realistic unittest scenarios.
    72e3f9cacf
  212. test: allow on-disk coins and block tree dbs in tests
    Used when testing cleanup of on-disk chainstate data for snapshot
    testcases. Also necessary for simulating node restart in .cpp tests.
    e60cdc2f9d
  213. test: move-only-ish: factor out LoadVerifyActivateChainstate()
    in TestingSetup(). This is used in the following commit to test
    reinitializing chainstates after snapshot validation and cleanup.
    
    Best reviewed with `git diff --color-moved=dimmed-zebra`.
    6c873364e2
  214. test: add testcases for snapshot completion 106fc0dd1c
  215. docs: update assumeutxo.md
    Include notes about the `chainstate_snapshot` rename as well as
    updates for the included code.
    ac8c9d8092
  216. jamesob force-pushed on Jul 20, 2022
  217. jamesob commented at 4:14 pm on July 20, 2022: member

    Rebased and addressed feedback. Thanks Russ and Antoine.

    I’m approaching “thermonuclear levels of burnout” on this one, as the phrase goes. I’m getting bludgeoned by rebases, so any help reviewing this is appreciated.

    I’ve looked at splitting this out into two PRs and while it is possible it will not be trivial for me to do. If any potential reviewers feel strongly about doing that, I can.

  218. DrahtBot removed the label Needs rebase on Jul 20, 2022
  219. jamesob commented at 1:41 pm on July 21, 2022: member
    Okay, I think it’s pretty clear that the right move is to split this PR up into two separate ones (initialization vs. completion). I’ll be opening those shortly.
  220. jamesob closed this on Jul 21, 2022

  221. in src/validation.cpp:5151 in 1eb88f888c outdated
    5146+
    5147+    assert(SnapshotBlockhash());
    5148+    uint256 snapshot_blockhash = *SnapshotBlockhash();
    5149+
    5150+    if (index_new.GetBlockHash() != snapshot_blockhash) {
    5151+      LogPrintf("[snapshot] supposed base block %s does not match the " /* Continued */
    


    ryanofsky commented at 1:54 pm on July 22, 2022:

    In commit “validation: add ChainMan logic for completing UTXO snapshot validation” (1eb88f888cc04bd9da5405d4d5edc7b6e2603a8e)

    This seems unclear what is supposed or why. Would seem clearer to say “Snapshot base block {snapshot_blockhash} height {snapshot_base_height} doesn’t match validated block {index_new.hash} height {index_new.height}”

  222. ryanofsky commented at 2:01 pm on July 22, 2022: contributor

    Code review ACK ac8c9d809214bd8dcdacdb66127cabf49788e097. Rebase and a lot of suggested cleanups since the last review, nothing major

    I’m still a little confused about MaybeCompleteSnapshotValidation error handling, but in the worst case problems might just make debugging & error recovery less convenient so this is not a blocker

  223. achow101 referenced this in commit 6912a28f08 on Oct 13, 2022
  224. achow101 referenced this in commit d5e4f9a439 on Mar 7, 2023
  225. fanquake added this to the "To do" column in a project

  226. bitcoin locked this on Aug 1, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me