assumeutxo: Check snapshot base block is not in invalid chain #30267

pull fjahr wants to merge 3 commits into bitcoin:master from fjahr:2024-06-invalid-snapshot-block changing 6 files +66 −54
  1. fjahr commented at 8:49 am on June 11, 2024: contributor

    This was discovered in a discussion in #29996

    If the base block of the snapshot is marked invalid or part of an invalid chain, we currently still load the snapshot and get stuck in a weird state where we have the snapshot chainstate but it will never connect to our valid chain.

    While this scenario is highly unlikely to occur on mainnet, it still seems good to prevent this inconsistent state.

    The behavior change described above is in the second commit.

    The first commit refactors the early checks in the loadtxoutset RPC by moving them into ActivateSnapshot() in order to have the chance to cover them by unit tests in the future and have a more consistent interface. Previously checks were spread out between rpc/blockchain.cpp and validation.cpp. In order to be able to return the error message to users of the RPC, the return type of ActivateSnapshot() is changed from bool to util::Result.

    The third commit removes an unnecessary restart introduced in #29428.

  2. DrahtBot commented at 8:49 am on June 11, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK mzumsande, alfonsoromanz, achow101
    Stale ACK tdb3, BrandonOdiwuor

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30342 (kernel, logging: Pass Logger instances to kernel objects by ryanofsky)
    • #30329 (fuzz: improve utxo_snapshot target by mzumsande)
    • #30320 (assumeutxo: Don’t load a snapshot if it’s not in the best header chain by mzumsande)
    • #29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. in test/functional/feature_assumeutxo.py:210 in cbc604f6a7 outdated
    206+
    207+    def test_snapshot_block_invalidated(self, dump_output_path):
    208+        self.log.info("Test snapshot is not loaded when base block is invalid.")
    209+        node = self.nodes[0]
    210+        block_hash = node.getblockhash(SNAPSHOT_BASE_HEIGHT - 1)
    211+        node.invalidateblock(block_hash)
    


    tdb3 commented at 4:05 pm on June 11, 2024:

    If the base block of the snapshot is marked invalid or part of an invalid chain

    This existing test seems to focus on the base block being part of an invalid chain. Is it also worth adding an explicit test for an invalid base block itself (or would it be overkill)?


    fjahr commented at 12:59 pm on June 18, 2024:
    I think it’s a bit overkill but it was easy to add and also shouldn’t make the test considerably slower, so I added coverage for this as well.
  4. tdb3 approved
  5. tdb3 commented at 4:05 pm on June 11, 2024: contributor

    ACK cbc604f6a7739bf22a5d811b6366453860d59f4c Seems like an improvement to avoid expending resources loading a snapshot that won’t be part of a valid chain.

    Built and ran unit and functional tests (including feature_assumeutxo). All passed.

    Tested that feature_assumeutxo would fail with:

    0bool start_block_invalid = false;
    

    Test failed as expected.

  6. alfonsoromanz approved
  7. alfonsoromanz commented at 7:58 am on June 12, 2024: contributor

    tACK cbc604f6a7739bf22a5d811b6366453860d59f4c

    This PR successfully builds and passes the test when running test/functional/feature_assumeutxo.py.

    Additionally, I tested this in the context of my PR (#29996) back when the code looked something like this:

     0    def test_snapshot_in_a_divergent_chain(self, dump_output_path):
     1        node = self.nodes[2]
     2        # Rollback the chain down to START_HEIGHT
     3        block_hash = node.getblockhash(START_HEIGHT + 1)
     4        node.invalidateblock(block_hash)
     5        assert_equal(node.getblockcount(), START_HEIGHT)
     6
     7        self.log.info(f"Check importing a snapshot where current chain-tip is not an ancestor of the snapshot block but has less work")
     8        # Generate a divergent chain up to 298
     9        self.generate(node, nblocks=99, sync_fun=self.no_op)
    10        assert_equal(node.getblockcount(), SNAPSHOT_BASE_HEIGHT - 1)
    11
    12        # Try importing the snapshot and assert its success
    13        self.log.info('Importing the snapshot')
    14        loaded = node.loadtxoutset(dump_output_path)
    

    The snapshot load failed with the expected message: test_framework.authproxy.JSONRPCException: The base block header is part of an invalid chain. (-32603)

  8. BrandonOdiwuor commented at 2:36 pm on June 12, 2024: contributor
    Code Review ACK cbc604f6a7739bf22a5d811b6366453860d59f4c
  9. in src/rpc/blockchain.cpp:2845 in cbc604f6a7 outdated
    2840+    bool start_block_invalid = WITH_LOCK(::cs_main,
    2841+            return snapshot_start_block->nStatus & BLOCK_FAILED_MASK);
    2842+    if (start_block_invalid) {
    2843+        throw JSONRPCError(
    2844+            RPC_INTERNAL_ERROR,
    2845+            "The base block header is part of an invalid chain.");
    


    BrandonOdiwuor commented at 2:39 pm on June 12, 2024:

    Nit: Would it be better to include the base block hash on the error message to help in debugging

    0throw JSONRPCError(
    1            RPC_INTERNAL_ERROR,
    2            "The base block header (%s) is part of an invalid chain.", base_blockhash.ToString());
    

    fjahr commented at 12:59 pm on June 18, 2024:
    taken with minor edits
  10. in src/rpc/blockchain.cpp:2842 in f302956062 outdated
    2835@@ -2836,6 +2836,15 @@ static RPCHelpMan loadtxoutset()
    2836             strprintf("The base block header (%s) must appear in the headers chain. Make sure all headers are syncing, and call this RPC again.",
    2837                       base_blockhash.ToString()));
    2838     }
    2839+
    2840+    bool start_block_invalid = WITH_LOCK(::cs_main,
    2841+            return snapshot_start_block->nStatus & BLOCK_FAILED_MASK);
    2842+    if (start_block_invalid) {
    


    mzumsande commented at 6:25 pm on June 13, 2024:
    Some checks are within ActivateSnapshot(), others, like the added one, are in the rpc code (which means that they can’t be covered in unit tests for that function). Do you know if there is a reason for that?

    ryanofsky commented at 2:55 pm on June 17, 2024:

    re: #30267 (review)

    Some checks are within ActivateSnapshot(), others, like the added one, are in the rpc code (which means that they can’t be covered in unit tests for that function). Do you know if there is a reason for that?

    I think this is just a mistake, and it makes the checks less reliable because they are done without keeping cs_main locked. I suggested fixing this before in #27596 (review).


    fjahr commented at 12:59 pm on June 18, 2024:
    I think these were always separated like that. But sounds good to me to move them. I have added a refactoring commit which moves the other two early checks into ActivateSnapshot() and then I am adding the new check there as well.

    mzumsande commented at 6:35 pm on June 18, 2024:
    Looks good (except for the compiler issue with Join()) - I think it would be helpful to not only log the reason, but also return them with the RPC - for that, ActivateSnapshot() would need to return it, possibly as util::Result as suggested by ryanofsky, but that could be a follow-up. Though for now, it might be helpful to expand the error (“Unable to load UTXO snapshot”) to point the user to the debug log for the reason.

    fjahr commented at 7:02 pm on June 19, 2024:
    Ah, yeah, I gave it a shot and it pretty simple so I am changing the return type as well in the refactoring commit now.
  11. fjahr force-pushed on Jun 18, 2024
  12. DrahtBot commented at 4:02 pm on June 18, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/26368413924

  13. DrahtBot added the label CI failed on Jun 18, 2024
  14. fjahr force-pushed on Jun 19, 2024
  15. fjahr force-pushed on Jun 19, 2024
  16. fjahr commented at 7:42 pm on June 19, 2024: contributor
    Needed to rebase because of a silent merge conflict.
  17. refactor: Move early loadtxoutset checks into ActiveSnapshot
    Also changes the return type of ActiveSnapshot to allow returning the
    error message to the user of the loadtxoutset RPC.
    80315c0118
  18. fjahr force-pushed on Jun 19, 2024
  19. DrahtBot removed the label CI failed on Jun 19, 2024
  20. in test/functional/feature_assumeutxo.py:212 in b99304a6fc outdated
    204@@ -205,6 +205,19 @@ def test_snapshot_with_less_work(self, dump_output_path):
    205             assert_raises_rpc_error(-32603, "Unable to load UTXO snapshot", node.loadtxoutset, dump_output_path)
    206         self.restart_node(0, extra_args=self.extra_args[0])
    207 
    208+    def test_snapshot_block_invalidated(self, dump_output_path):
    209+        self.log.info("Test snapshot is not loaded when base block is invalid.")
    210+        node = self.nodes[0]
    211+        # We are testing the case where the base block is invalidated itself
    212+        # and also the case where one of its partents is invalidated.
    


    mzumsande commented at 5:50 pm on June 20, 2024:
    typo: parents

    fjahr commented at 8:39 am on June 21, 2024:
    fixed
  21. in src/rpc/blockchain.cpp:2824 in 80315c0118 outdated
    2841-    }
    2842-    if (!chainman.ActivateSnapshot(afile, metadata, false)) {
    2843-        throw JSONRPCError(RPC_INTERNAL_ERROR, "Unable to load UTXO snapshot " + fs::PathToString(path));
    2844+    auto activation_result{chainman.ActivateSnapshot(afile, metadata, false)};
    2845+    if (!activation_result) {
    2846+        throw JSONRPCError(RPC_INTERNAL_ERROR, strprintf(_("Unable to load UTXO snapshot: %s\n"), util::ErrorString(activation_result)).original);
    


    mzumsande commented at 6:50 pm on June 20, 2024:
    This changes it such that the path is no longer included in the RPC error. I think that it wasn’t really helpful anyway (the user already provided the path when calling the RPC, so why the need to report it back to the user?), but just wanted to mention it in case someone would miss that.

    maflcko commented at 6:52 am on July 3, 2024:
    What is the point of translating this message when the translation is discarded? Seems wasteful of translator’s time, no?

    maflcko commented at 7:16 am on July 3, 2024:
    Is the trailing newline needed? This would be the first JSONRPCError with a trailing newline.

    maflcko commented at 7:28 am on July 3, 2024:
    I think the path is transformed to an absolute one, so the string representation may be different from the one passed in by the user.

    ryanofsky commented at 4:52 pm on July 9, 2024:

    re: #30267 (review)

    What is the point of translating this message when the translation is discarded? Seems wasteful of translator’s time, no?

    Removed translation in #30417


    ryanofsky commented at 4:52 pm on July 9, 2024:

    re: #30267 (review)

    Is the trailing newline needed? This would be the first JSONRPCError with a trailing newline.

    Removed in #30417

  22. mzumsande commented at 6:51 pm on June 20, 2024: contributor
    Code Review ACK 750f8b50a749e577fc11f2f9f79e06cdd29e66f5
  23. DrahtBot requested review from tdb3 on Jun 20, 2024
  24. DrahtBot requested review from BrandonOdiwuor on Jun 20, 2024
  25. DrahtBot requested review from alfonsoromanz on Jun 20, 2024
  26. assumeutxo: Check snapshot base block is not marked invalid
    Co-authored-by: Alfonso Roman Zubeldia <alfonsoromanz24@gmail.com>
    19ce3d407e
  27. test: Remove unnecessary restart in assumeutxo test 2f9bde69f4
  28. fjahr force-pushed on Jun 21, 2024
  29. mzumsande commented at 3:04 pm on June 21, 2024: contributor
    re-ACK 2f9bde6
  30. alfonsoromanz commented at 9:21 am on June 27, 2024: contributor
    Re-ACK 2f9bde69f45c7a9fdcf0c65f9e1305391a6f1f28. The RPC code looks much cleaner after the refactor. Also, it seems very useful to get the error message in the RPC response rather than having to rely on the logs in some scenarios if you are an RPC user.
  31. achow101 commented at 8:48 pm on July 2, 2024: member
    ACK 2f9bde69f45c7a9fdcf0c65f9e1305391a6f1f28
  32. achow101 merged this on Jul 2, 2024
  33. achow101 closed this on Jul 2, 2024

  34. in src/validation.cpp:5746 in 80315c0118 outdated
    5742@@ -5729,7 +5743,7 @@ bool ChainstateManager::ActivateSnapshot(
    5743                     "Manually remove it before restarting.\n"), fs::PathToString(*snapshot_datadir)));
    5744             }
    5745         }
    5746-        return false;
    5747+        return util::Error{_(reason)};
    


    maflcko commented at 7:07 am on July 3, 2024:

    It is not possible to pass a raw c-string pointer to transifex for translation. Only string literals can be translated.

    I guess it could make sense to mark _(const char*) as consteval to catch those issues at compile time?


    maflcko commented at 7:10 am on July 3, 2024:
    Also, this commit is not a “refactor”, because it changes logging and RPC error strings slightly.

    maflcko commented at 11:07 am on July 3, 2024:

    I guess it could make sense to mark _(const char*) as consteval to catch those issues at compile time?

    Done in https://github.com/bitcoin/bitcoin/pull/30383


    ryanofsky commented at 11:17 am on July 3, 2024:

    re: #30267 (review)

    It is not possible to pass a raw c-string pointer to transifex for translation. Only string literals can be translated.

    In case it’s not clear, the fix for this problem and the previous unnecessary translation problem is probably to replace _() with Untranslated() everywhere where in this PR when converting strings to bilingual_str. If we added a snapshot loading GUI in the future, it might make sense to add translations at that point.


    ryanofsky commented at 4:53 pm on July 9, 2024:

    re: #30267 (review)

    It is not possible to pass a raw c-string pointer to transifex for translation. Only string literals can be translated.

    Fixed in #30417

  35. maflcko commented at 8:37 am on July 3, 2024: member
    post-merge lgtm. Left some nits
  36. alfonsoromanz commented at 3:58 pm on July 6, 2024: contributor

    Post-merge question: Shouldn’t this TODO be deleted because of the new test introduced in this PR?

    • TODO: Valid snapshot file, but referencing a snapshot block that turns out to be invalid, or has an invalid parent

    The test_snapshot_block_invalidated function seems to address this scenario by invalidating the snapshot base block and one of its parents, ensuring the snapshot cannot be loaded if the base block or its parent is invalid.

  37. fjahr commented at 4:11 pm on July 6, 2024: contributor

    Post-merge question: Shouldn’t this TODO be deleted because of the new test introduced in this PR?

    Duh, you are right, fixed with #30403. Thanks!

  38. fjahr referenced this in commit 51c1441126 on Jul 8, 2024
  39. ryanofsky referenced this in commit 740b4424a5 on Jul 9, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-24 03:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me