validation: Stricter assumeutxo error handling when renaming chainstates #27862

pull ryanofsky wants to merge 2 commits into bitcoin:master from ryanofsky:pr/assumeabort changing 5 files +16 −10
  1. ryanofsky commented at 4:27 pm on June 12, 2023: contributor

    There are two places in assumeutxo code where it is calling AbortNode to trigger asynchronous shutdowns without returning errors to calling functions.

    One case, in LoadChainstate, happens when snapshot validation succeeds, and there is an error trying to replace the background chainstate with the snapshot chainstate.

    The other case, in InvalidateCoinsDBOnDisk, happens when snapshot validatiion fails, and there is an error trying to remove the snapshot chainstate.

    In both cases the node is being forced to shut down, so it makes sense for these functions to raise errors so callers can know that an error happened without having to infer it from the shutdown state.

    Noticed these cases while reviewing #27861, which replaces the AbortNode function with a FatalError function.

  2. DrahtBot commented at 4:27 pm on June 12, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK TheCharlatan, achow101, jamesob

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #27861 (kernel: Add interrupt class by TheCharlatan)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Validation on Jun 12, 2023
  4. in src/node/chainstate.cpp:212 in 7564b87eba outdated
    206@@ -207,7 +207,9 @@ ChainstateLoadResult LoadChainstate(ChainstateManager& chainman, const CacheSize
    207     } else if (snapshot_completion == SnapshotCompletionResult::SUCCESS) {
    208         LogPrintf("[snapshot] cleaning up unneeded background chainstate, then reinitializing\n");
    209         if (!chainman.ValidatedSnapshotCleanup()) {
    210-            AbortNode("Background chainstate cleanup failed unexpectedly.");
    211+            const auto error = Untranslated("Background chainstate cleanup failed unexpectedly.");
    212+            AbortNode(error.original);
    213+            return {ChainstateLoadStatus::FAILURE, error};
    


    TheCharlatan commented at 8:11 pm on June 12, 2023:
    Just an observation on the behavior change here. In the current behavior, the function continues execution until it reaches CompleteChainstateInitialization further below. There it calls options.check_interrupt(), which will lead it to return ChainstateLoadStatus::INTERRUPTED. I think making it return a ChainstateLoadStatus::FAILURE now makes more sense, since it is not stopped due to an external interrupt.
  5. in src/validation.cpp:5664 in 7564b87eba outdated
    5660@@ -5661,6 +5661,7 @@ void Chainstate::InvalidateCoinsDBOnDisk()
    5661             "snapshot directory %s, otherwise you will encounter the same error again "
    5662             "on the next startup.",
    5663             src_str, dest_str, src_str));
    5664+        throw;
    


    TheCharlatan commented at 8:20 pm on June 12, 2023:
    This seems like a significant behavior change. The way I read this, InvalidateCoinsDBOnDisk is only called by MaybeCompleteSnapshotValidation. Its return type is ignored by its call site in ConnectTip, which I think would now lead us to skip over some code afterwards.

    ryanofsky commented at 7:11 pm on June 13, 2023:

    re: #27862 (review)

    This seems like a significant behavior change.

    It does only happen when snapshot validation fails (so the node would be shutting down anyway), and renaming the snapshot also fails. Just reraising the exception seemed like this simplest possible approach to take (copied from code in ValidatedSnapshotCleanup). But it is simple enough to clean up the code more. I pushed a new version which gets rid of the AbortNode call and uses util::Result instead.


    TheCharlatan commented at 7:41 pm on June 13, 2023:
    Nice, this is exactly what I had in mind! Can be resolved.
  6. ryanofsky force-pushed on Jun 13, 2023
  7. ryanofsky commented at 7:15 pm on June 13, 2023: contributor
    Updated 7564b87ebab27bea76efe5dd54d3f6f41cc78a1b -> 1ac09b93cdb41eb7dbc1a62364363e59507da1af (pr/assumeabort.1 -> pr/assumeabort.2, compare) splitting this into two commits and making more changes to InvalidateCoinsDBOnDisk to normalize its error handling
  8. DrahtBot added the label CI failed on Jun 13, 2023
  9. TheCharlatan approved
  10. TheCharlatan commented at 8:16 pm on June 13, 2023: contributor

    ACK 1ac09b93cdb41eb7dbc1a62364363e59507da1af

    That windows build failure is weird though. How can there be a conflict between std::format and tfm::format, if we are only in the latter’s namespace? Why is this only popping up now?

  11. maflcko commented at 7:33 am on June 15, 2023: member

    For ref, the msvc error is:

     0  validation.cpp
     1C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\util\translation.h(66,26): error C2666: 'tinyformat::format': 2 overloads have similar conversions [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
     2C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\tinyformat.h(1151,13): message : could be 'std::string tinyformat::format<std::string,std::string>(const std::string &,const std::string &,const std::string &)' [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
     3C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.32.31326\include\format(3103,20): message : or       'std::wstring std::format<std::string,std::string>(const std::_Basic_format_string<wchar_t,std::basic_string<char,std::char_traits<char>,std::allocator<char>>,std::basic_string<char,std::char_traits<char>,std::allocator<char>>>,std::string &&,std::string &&)' [found using argument-dependent lookup] [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
     4C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.32.31326\include\format(3098,19): message : or       'std::string std::format<std::string,std::string>(const std::_Basic_format_string<char,std::basic_string<char,std::char_traits<char>,std::allocator<char>>,std::basic_string<char,std::char_traits<char>,std::allocator<char>>>,std::string &&,std::string &&)' [found using argument-dependent lookup] [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
     5C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\util\translation.h(65,1): message : while trying to match the argument list '(const std::string, std::string, std::string)' [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
     6C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\util\translation.h(65,1): message : note: qualification adjustment (const/volatile) may be causing the ambiguity [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
     7C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\validation.cpp(5433): message : see reference to function template instantiation 'bilingual_str tinyformat::format<bilingual_str,bilingual_str>(const bilingual_str &,const bilingual_str &,const bilingual_str &)' being compiled [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
     8C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\util\translation.h(67,26): error C2666: 'tinyformat::format': 2 overloads have similar conversions [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
     9C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\tinyformat.h(1151,13): message : could be 'std::string tinyformat::format<std::string,std::string>(const std::string &,const std::string &,const std::string &)' [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
    10C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.32.31326\include\format(3103,20): message : or       'std::wstring std::format<std::string,std::string>(const std::_Basic_format_string<wchar_t,std::basic_string<char,std::char_traits<char>,std::allocator<char>>,std::basic_string<char,std::char_traits<char>,std::allocator<char>>>,std::string &&,std::string &&)' [found using argument-dependent lookup] [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
    11C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.32.31326\include\format(3098,19): message : or       'std::string std::format<std::string,std::string>(const std::_Basic_format_string<char,std::basic_string<char,std::char_traits<char>,std::allocator<char>>,std::basic_string<char,std::char_traits<char>,std::allocator<char>>>,std::string &&,std::string &&)' [found using argument-dependent lookup] [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
    12C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\util\translation.h(65,1): message : while trying to match the argument list '(const std::string, std::string, std::string)' [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
    13C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\src\util\translation.h(65,1): message : note: qualification adjustment (const/volatile) may be causing the ambiguity [C:\Users\ContainerAdministrator\AppData\Local\Temp\cirrus-ci-build\build_msvc\libbitcoin_node\libbitcoin_node.vcxproj]
    14  rawtransaction.cpp
    
  12. maflcko commented at 7:36 am on June 15, 2023: member
    No idea if #27892 fixes the compile error, but the best way to find out would be to rebase this pull on top of that?
  13. hebasto commented at 11:09 am on June 15, 2023: member

    No idea if #27892 fixes the compile error, but the best way to find out would be to rebase this pull on top of that?

    It works.

  14. hebasto commented at 12:33 pm on June 15, 2023: member

    For ref, the msvc error is:

    The other possible solution could be as follows:

     0--- a/src/util/translation.h
     1+++ b/src/util/translation.h
     2@@ -63,8 +63,8 @@ inline T const& TranslateArg(const T& arg, bool translated)
     3 template <typename... Args>
     4 bilingual_str format(const bilingual_str& fmt, const Args&... args)
     5 {
     6-    return bilingual_str{format(fmt.original, TranslateArg(args, false)...),
     7-                         format(fmt.translated, TranslateArg(args, true)...)};
     8+    return bilingual_str{tinyformat::format(fmt.original, TranslateArg(args, false)...),
     9+                         tinyformat::format(fmt.translated, TranslateArg(args, true)...)};
    10 }
    11 } // namespace tinyformat
    12 
    
  15. ryanofsky force-pushed on Jun 15, 2023
  16. ryanofsky commented at 3:23 pm on June 15, 2023: contributor
    Updated 1ac09b93cdb41eb7dbc1a62364363e59507da1af -> 3984b7c14d5d1ade485c4b30c74641c9361f4c1d (pr/assumeabort.2 -> pr/assumeabort.3, compare) pulling in #27892 to try to fix the MSVC error, and dropping the AbortNode call in the first commit to make the shutdown sequence more straightforward
  17. in src/node/chainstate.cpp:210 in 02654c938e outdated
    206@@ -207,7 +207,7 @@ ChainstateLoadResult LoadChainstate(ChainstateManager& chainman, const CacheSize
    207     } else if (snapshot_completion == SnapshotCompletionResult::SUCCESS) {
    208         LogPrintf("[snapshot] cleaning up unneeded background chainstate, then reinitializing\n");
    209         if (!chainman.ValidatedSnapshotCleanup()) {
    210-            AbortNode("Background chainstate cleanup failed unexpectedly.");
    211+            return {ChainstateLoadStatus::FAILURE, Untranslated("Background chainstate cleanup failed unexpectedly.")};
    


    TheCharlatan commented at 4:06 pm on June 15, 2023:
    Should this be FAILURE_FATAL now?
  18. DrahtBot removed the label CI failed on Jun 15, 2023
  19. achow101 referenced this in commit 5b8e07725d on Jun 15, 2023
  20. validation: Stricter assumeutxo error handling in LoadChainstate
    Make LoadChainstate return an explicit error when snapshot validation succeeds,
    but there is an error trying to replace the background chainstate with the
    snapshot chainstate. Previously in this case LoadChainstate would trigger a
    shutdown and return INTERRUPTED, now it will return an actual error code.
    
    There's no real change to behavior other than error message being formatted a
    little differently.
    
    Motivation for this change is to replace error handling via callbacks with
    error handling via return value ahead of
    https://github.com/bitcoin/bitcoin/pull/27861
    9047337d36
  21. validation: Stricter assumeutxo error handling in InvalidateCoinsDBOnDisk
    Currently InvalidateCoinsDBOnDisk is calling AbortNode without an error to the
    caller if it fails. Change it to return just return util::Result, and update
    the caller to handle the error itself.
    
    This causes the secondary error to be shown below the main error instead of the
    other way around.
    1c7d08b9ac
  22. ryanofsky force-pushed on Jun 15, 2023
  23. ryanofsky commented at 9:10 pm on June 15, 2023: contributor
    Rebased 3984b7c14d5d1ade485c4b30c74641c9361f4c1d -> 1c7d08b9acd33aff343228ada7e058e606cb1062 (pr/assumeabort.3 -> pr/assumeabort.4, compare) on top of #27892 and switched to ChainstateLoadStatus::FAILURE_FATAL in the first commit to avoid triggering reindex suggestion
  24. TheCharlatan approved
  25. TheCharlatan commented at 9:37 pm on June 15, 2023: contributor
    ACK 1c7d08b9acd33aff343228ada7e058e606cb1062
  26. achow101 commented at 8:15 pm on June 16, 2023: member

    ACK 1c7d08b9acd33aff343228ada7e058e606cb1062

    As someone slightly unfamiliar with init and shutdown, AbortNode not actually killing the node when it is called is surprising to me. The changes in this PR make these snapshot failure cases line up better with what I had expected.

  27. sidhujag referenced this in commit be1fb20010 on Jun 19, 2023
  28. jamesob approved
  29. jamesob commented at 4:36 pm on June 22, 2023: member

    ACK 1c7d08b9acd33aff343228ada7e058e606cb1062 (jamesob/ackr/27862.1.ryanofsky.validation_stricter_assu)

    Good, common sense changes. Thanks for this cleanup. Reviewed and built each commit locally.

  30. achow101 merged this on Jun 22, 2023
  31. achow101 closed this on Jun 22, 2023

  32. sidhujag referenced this in commit 770b363c5c on Jun 22, 2023
  33. ryanofsky referenced this in commit f562856d02 on Sep 29, 2023
  34. Frank-GER referenced this in commit 53099fdff4 on Oct 5, 2023
  35. Fabcien referenced this in commit 49ba25cf70 on Apr 19, 2024
  36. bitcoin locked this on Jun 21, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me