Return EXIT_FAILURE on post-init fatal errors #27708

pull furszy wants to merge 5 commits into bitcoin:master from furszy:2023_main_exit_failure changing 14 files +85 −56
  1. furszy commented at 3:04 pm on May 20, 2023: member

    It seems odd to return EXIT_SUCCESS when the node aborted execution due a fatal internal error or any post-init problem that triggers an unrequested shutdown.

    e.g. blocks or coins db I/O errors, disconnect block failure, failure during thread import (external blocks loading process error), among others.

  2. DrahtBot commented at 3:04 pm on May 20, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK TheCharlatan, ryanofsky, pinheadmz, theStack

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #27607 (index: improve initialization and pruning violation check by furszy)
    • #26966 (index: blockfilter initial sync speedup, parallelize process by furszy)
    • #26697 (logging: use bitset for categories by LarryRuane)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. TheCharlatan commented at 1:31 pm on May 25, 2023: contributor
    Concept ACK
  4. in src/shutdown.h:23 in c4e32eb72c outdated
    18@@ -19,6 +19,9 @@ bool InitShutdownState();
    19 /** Request shutdown of the application. */
    20 void StartShutdown();
    21 
    22+/** Request application shutdown due an internal error. */
    23+void StartErrorShutdown();
    


    TheCharlatan commented at 1:56 pm on May 25, 2023:
    Could this just be a local static function in shutdown.cpp and not exposed in the header?

    furszy commented at 2:26 pm on May 26, 2023:
    The reason behind that is #27607 (review). But.. could not expose it and move the caller to use AbortNode() instead. Same as we do with the external block import failures.

    furszy commented at 2:34 pm on May 26, 2023:
    Still, in order to do that in the cleanest possible way, will need to refactor shutdown.cpp, so AbortNode is placed after the static StartErrorShutdown function. Which I’m not so sure that worth it due the conflicts that could cause with your PRs. Thoughts?

    TheCharlatan commented at 2:58 pm on May 26, 2023:
    So IIUC, this call here would call StartErrorShutdown?

    furszy commented at 3:05 pm on May 26, 2023:

    Yeah here, I just rebased #27607 on top of this one.

    No problem on using AbortNode there. Just would need to bubble up the indexes error so it can be provided to AbortNode. Which.. isn’t bad as it will remove another ui_interface dependency.


    TheCharlatan commented at 9:03 pm on May 26, 2023:

    Which.. isn’t bad as it will remove another ui_interface dependency.

    Can you elaborate on this point? AbortNode still relies on ui_interface from what I can tell.

    From my current understanding both the StartShutdown in ThreadImport and in the check are call shutdown and node AbortNode, because BlockImport might run before the ui is available (this is me guessing out of historical reasons). I am not sold on this point though, since this runs in another thread and we also provide the noui consumer of the ui interface. So I think these should both become calls to AbortNode.


    furszy commented at 9:52 pm on May 26, 2023:

    Which.. isn’t bad as it will remove another ui_interface dependency.

    Can you elaborate on this point? AbortNode still relies on ui_interface from what I can tell.

    Ok, sorry for the confusion.

    The ui_interface dependency that can be removed is the base.cpp one. Thus why said to “bubble up” the error string up to init.cpp instead of calling InitError from the index base class Start method.

    From my current understanding both the StartShutdown in ThreadImport and in the check are call shutdown and node AbortNode, because BlockImport might run before the ui is available (this is me guessing out of historical reasons).

    AbortNode isn’t doing anything special. We already use InitError in init.cpp prior spawning the import thread. What the node will do when there is no gui is print the error to the command line, and the debug file, prior shutting down (using the noui consumer as you said).

    I am not sold on this point though, since this runs in another thread and we also provide the noui consumer of the ui interface. So I think these should both become calls to AbortNode.

    Yeah, that is my idea as well. The ThreadImport “fatal” failures should use AbortNode.

  5. in src/shutdown.h:35 in c4e32eb72c outdated
    31@@ -29,7 +32,9 @@ bool ShutdownRequested();
    32 
    33 /** Wait for StartShutdown to be called in any thread. This can only be used
    34  * from a single thread.
    35+ * Returns true if shutdown was requested by the user. Otherwise, it's an
    36+ * internally requested shutdown which only happens when an error arises.
    37  */
    38-void WaitForShutdown();
    39+bool WaitForShutdown();
    


    TheCharlatan commented at 1:56 pm on May 25, 2023:
    Does it make sense to add [[nodiscard]] here?

    furszy commented at 3:35 pm on May 26, 2023:
    sure

    furszy commented at 3:38 pm on June 1, 2023:
    Ended up not adding this change to not modify AbortShutdown().
  6. in src/shutdown.h:33 in c4e32eb72c outdated
    31@@ -29,7 +32,9 @@ bool ShutdownRequested();
    32 
    33 /** Wait for StartShutdown to be called in any thread. This can only be used
    34  * from a single thread.
    35+ * Returns true if shutdown was requested by the user. Otherwise, it's an
    36+ * internally requested shutdown which only happens when an error arises.
    


    TheCharlatan commented at 2:33 pm on May 25, 2023:
    This comment seems a bit inaccurate, e.g. we may have stop_at_height set, but shutdown is still triggered internally. There is also one error case in https://github.com/bitcoin/bitcoin/blob/master/src/node/blockstorage.cpp#L931 that triggers a normal shutdown.

    furszy commented at 3:33 pm on May 26, 2023:

    This comment seems a bit inaccurate, e.g. we may have stop_at_height set, but shutdown is still triggered internally. There is also one error case in https://github.com/bitcoin/bitcoin/blob/master/src/node/blockstorage.cpp#L931 that triggers a normal shutdown.

    I would label stop_at_height as an action requested by the user. The user had to provide -stopatheight at startup so the shutdown is expected.

    On the other hand, https://github.com/bitcoin/bitcoin/blob/master/src/node/blockstorage.cpp#L931 seems to be a good candidate for AbortNode. It’s a error in the ThreadImport, so either the db data got modified in some wrong way or the user provided invalid external blocks.

  7. TheCharlatan changes_requested
  8. TheCharlatan commented at 3:13 pm on May 25, 2023: contributor
    Was wondering if this was attempted before and skimmed through the pull request history, but I did not find prior attempts.
  9. furszy force-pushed on May 29, 2023
  10. furszy force-pushed on May 29, 2023
  11. DrahtBot added the label CI failed on May 29, 2023
  12. DrahtBot removed the label CI failed on May 29, 2023
  13. furszy commented at 3:41 pm on June 1, 2023: member

    Feedback tackled. Thanks TheCharlatan!. Changes:

    • Moved ThreadImport ABC error to use AbortNode. Per comment.
    • And moved AbortNode code below StartShutdown so it can use the introduced static function instead of having to add a public StartErrorShutdown function that is only used by the .cpp file and not externally.
  14. theStack commented at 4:43 pm on June 6, 2023: contributor

    Concept ACK

    Regarding the conclusion in the first commit ("‘StartShutdown’ should only be used for user requested shutdowns. Internal errors that cause a shutdown should use ‘AbortNode’."), the following is probably another candidate where AbortNode is preferred over StartShutdown?

    https://github.com/bitcoin/bitcoin/blob/8cc65f093c0cf7a27b3bfc6da90fd7a6ac8f5e48/src/index/base.cpp#L33-L41

    (not saying it necessarily has to happen in this PR though, if it makes sense).

  15. furszy commented at 7:55 pm on June 6, 2023: member

    Regarding the conclusion in the first commit ("‘StartShutdown’ should only be used for user requested shutdowns. Internal errors that cause a shutdown should use ‘AbortNode’."), the following is probably another candidate where AbortNode is preferred over StartShutdown?

    Yeah absolutely. Good eye. If you compare it in-detail, that function is a plain copy of AbortNode: it (1) calls SetMiscWarning, (2) logs the error, (3) notifies the UI by calling InitError (noui listeners as well) and (4) calls to the shutdown function.. So, we can just get rid of it.

  16. theStack commented at 10:20 pm on June 6, 2023: contributor

    Regarding the conclusion in the first commit ("‘StartShutdown’ should only be used for user requested shutdowns. Internal errors that cause a shutdown should use ‘AbortNode’."), the following is probably another candidate where AbortNode is preferred over StartShutdown?

    Yeah absolutely. Good eye. If you compare it in-detail, that function is a plain copy of AbortNode: it (1) calls SetMiscWarning, (2) logs the error, (3) notifies the UI by calling InitError (noui listeners as well) and (4) calls to the shutdown function.. So, we can just get rid of it.

    Oh you’re right, didn’t look at AbortNode before, it’s indeed almost the same. Feel free to pull in the dedup commit then: https://github.com/theStack/bitcoin/commit/3eca21000eb33b96063fdcdf30e8ad8514e41126 Note that I still kept the FatalError helper for ease-of-use for the callers, as it still is useful for not having to call tfm::format(...) each time (thanks to simple template magic).

  17. furszy force-pushed on Jun 7, 2023
  18. furszy commented at 2:05 pm on June 7, 2023: member
    done, cherry-picked 🚜.
  19. DrahtBot added the label CI failed on Jun 7, 2023
  20. ryanofsky commented at 4:57 pm on June 7, 2023: contributor
    Approach ACK d44dd73bcabc694880dfc966fad21980cd3b4c64. Need to review more closely, but these changes look very good and should help with #27711
  21. ryanofsky approved
  22. ryanofsky commented at 10:15 pm on June 7, 2023: contributor

    Code review ACK d44dd73bcabc694880dfc966fad21980cd3b4c64. But it would be nice to extend this change to work with bitcoin-qt as well. I also think it would be good to apply the following changes, which replace the bool shutdown_due_error global with an int exit_status{EXIT_SUCCESS} NodeContext member.

      0diff --git a/src/bitcoind.cpp b/src/bitcoind.cpp
      1index c305d5e6cdc1..0a2a6ed4ebc2 100644
      2--- a/src/bitcoind.cpp
      3+++ b/src/bitcoind.cpp
      4@@ -169,6 +169,9 @@ static bool AppInit(NodeContext& node, int argc, char* argv[])
      5         // Set this early so that parameter interactions go to console
      6         InitLogging(args);
      7         InitParameterInteraction(args);
      8+        if (!InitShutdownState(node.exit_status)) {
      9+            return InitError(Untranslated("Initializing wait-for-shutdown state failed."));
     10+        }
     11         if (!AppInitBasicSetup(args)) {
     12             // InitError will have been called with detailed error, which ends up on console
     13             return false;
     14@@ -236,11 +239,6 @@ static bool AppInit(NodeContext& node, int argc, char* argv[])
     15     }
     16 #endif
     17     SetSyscallSandboxPolicy(SyscallSandboxPolicy::SHUTOFF);
     18-    if (fRet) {
     19-        fRet = WaitForShutdown();
     20-    }
     21-    Interrupt(node);
     22-    Shutdown(node);
     23 
     24     return fRet;
     25 }
     26@@ -264,5 +262,12 @@ MAIN_FUNCTION
     27     // Connect bitcoind signal handlers
     28     noui_connect();
     29 
     30-    return (AppInit(node, argc, argv) ? EXIT_SUCCESS : EXIT_FAILURE);
     31+    if (AppInit(node, argc, argv)) {
     32+        WaitForShutdown();
     33+    } else {
     34+        node.exit_status = EXIT_FAILURE;
     35+    }
     36+    Interrupt(node);
     37+    Shutdown(node);
     38+    return node.exit_status;
     39 }
     40diff --git a/src/init.cpp b/src/init.cpp
     41index 1ae21ec8d2dd..c4b09eb2373c 100644
     42--- a/src/init.cpp
     43+++ b/src/init.cpp
     44@@ -812,9 +812,6 @@ bool AppInitBasicSetup(const ArgsManager& args)
     45     // Enable heap terminate-on-corruption
     46     HeapSetInformation(nullptr, HeapEnableTerminationOnCorruption, nullptr, 0);
     47 #endif
     48-    if (!InitShutdownState()) {
     49-        return InitError(Untranslated("Initializing wait-for-shutdown state failed."));
     50-    }
     51 
     52     if (!SetupNetworking()) {
     53         return InitError(Untranslated("Initializing networking failed."));
     54diff --git a/src/node/context.h b/src/node/context.h
     55index 84f4053c8407..b24f778f4f64 100644
     56--- a/src/node/context.h
     57+++ b/src/node/context.h
     58@@ -7,7 +7,9 @@
     59 
     60 #include <kernel/context.h>
     61 
     62+#include <atomic>
     63 #include <cassert>
     64+#include <cstdlib>
     65 #include <functional>
     66 #include <memory>
     67 #include <vector>
     68@@ -62,6 +64,7 @@ struct NodeContext {
     69     interfaces::WalletLoader* wallet_loader{nullptr};
     70     std::unique_ptr<CScheduler> scheduler;
     71     std::function<void()> rpc_interruption_point = [] {};
     72+    std::atomic<int> exit_status{EXIT_SUCCESS};
     73 
     74     //! Declare default constructor and destructor that are not inline, so code
     75     //! instantiating the NodeContext struct doesn't need to #include class
     76diff --git a/src/shutdown.cpp b/src/shutdown.cpp
     77index 72f23be21ca0..7641ee3b066a 100644
     78--- a/src/shutdown.cpp
     79+++ b/src/shutdown.cpp
     80@@ -11,6 +11,7 @@
     81 
     82 #include <logging.h>
     83 #include <node/interface_ui.h>
     84+#include <util/check.h>
     85 #include <util/tokenpipe.h>
     86 #include <warnings.h>
     87 
     88@@ -21,8 +22,7 @@
     89 #endif
     90 
     91 static std::atomic<bool> fRequestShutdown(false);
     92-// Whether shutdown was requested by the user or because of an internal error
     93-static std::atomic<bool> shutdown_due_error{false};
     94+static std::atomic<int>* g_exit_status{nullptr};
     95 
     96 #ifdef WIN32
     97 /** On windows it is possible to simply use a condition variable. */
     98@@ -35,8 +35,9 @@ static TokenPipeEnd g_shutdown_r;
     99 static TokenPipeEnd g_shutdown_w;
    100 #endif
    101 
    102-bool InitShutdownState()
    103+bool InitShutdownState(std::atomic<int>& exit_status)
    104 {
    105+    g_exit_status = &exit_status;
    106 #ifndef WIN32
    107     std::optional<TokenPipe> pipe = TokenPipe::Make();
    108     if (!pipe) return false;
    109@@ -46,9 +47,8 @@ bool InitShutdownState()
    110     return true;
    111 }
    112 
    113-static void StartShutdown(bool error)
    114+void StartShutdown()
    115 {
    116-    if (error) shutdown_due_error.exchange(error);
    117 #ifdef WIN32
    118     std::unique_lock<std::mutex> lk(g_shutdown_mutex);
    119     fRequestShutdown = true;
    120@@ -68,8 +68,6 @@ static void StartShutdown(bool error)
    121 #endif
    122 }
    123 
    124-void StartShutdown() { return StartShutdown(/*error=*/false); }
    125-
    126 bool AbortNode(const std::string& strMessage, bilingual_str user_message)
    127 {
    128     SetMiscWarning(Untranslated(strMessage));
    129@@ -78,7 +76,8 @@ bool AbortNode(const std::string& strMessage, bilingual_str user_message)
    130         user_message = _("A fatal internal error occurred, see debug.log for details");
    131     }
    132     InitError(user_message);
    133-    StartShutdown(/*error=*/true);
    134+    Assert(g_exit_status)->store(EXIT_FAILURE);
    135+    StartShutdown();
    136     return false;
    137 }
    138 
    139@@ -97,7 +96,7 @@ bool ShutdownRequested()
    140     return fRequestShutdown;
    141 }
    142 
    143-bool WaitForShutdown()
    144+void WaitForShutdown()
    145 {
    146 #ifdef WIN32
    147     std::unique_lock<std::mutex> lk(g_shutdown_mutex);
    148@@ -109,5 +108,4 @@ bool WaitForShutdown()
    149         assert(0);
    150     }
    151 #endif
    152-    return !shutdown_due_error;
    153 }
    154diff --git a/src/shutdown.h b/src/shutdown.h
    155index 34d5d30bc074..2b8fdb779ac6 100644
    156--- a/src/shutdown.h
    157+++ b/src/shutdown.h
    158@@ -8,10 +8,12 @@
    159 
    160 #include <util/translation.h> // For bilingual_str
    161 
    162+#include <atomic>
    163+
    164 /** Initialize shutdown state. This must be called before using either StartShutdown(),
    165  * AbortShutdown() or WaitForShutdown(). Calling ShutdownRequested() is always safe.
    166  */
    167-bool InitShutdownState();
    168+bool InitShutdownState(std::atomic<int>& exit_status);
    169 
    170 /** Request shutdown of the application. */
    171 void StartShutdown();
    172@@ -29,9 +31,7 @@ bool ShutdownRequested();
    173 
    174 /** Wait for StartShutdown to be called in any thread. This can only be used
    175  * from a single thread.
    176- * Returns true if shutdown was requested by the user. Otherwise, it's an
    177- * internally requested shutdown which only happens when an error arises.
    178  */
    179-bool WaitForShutdown();
    180+void WaitForShutdown();
    181 
    182 #endif // BITCOIN_SHUTDOWN_H
    

    I think this approach has a few advantages:

    • Avoids adding a new shutdown global right before #27711 removes all the shutdown globals, so there is less cleanup #27711 needs to do later.
    • I think semantics of an exit_status field are more obvious than a shutdown_due_error field. Easier to think concretely about what the exit code will be than abstractly about “whether shutdown was requested by the user”.
    • Having a NodeContext::exit_status field should make it easier to port this change to the GUI, since the GUI doesn’t use the WaitForShutdown function.

    (EDIT: Updated code suggestion above to fix early return bug in initial post)

  23. furszy commented at 3:27 pm on June 8, 2023: member

    Thanks @ryanofsky! I see most of the changes great, except one.

    If we change AppInit return value to be an int, we need to change all the return lines to return an int, not only the ones at the end of the function.

    Otherwise, the return false (or return error(msg)) statements will be treated as return 0 which means EXIT_SUCCESS, which is the opposite that we want.

    Other than that nuance, pulling them :).

  24. ryanofsky commented at 3:57 pm on June 8, 2023: contributor
    Oh, you are right. I thought I searched for early returns, but somehow I missed seeing them. I updated the suggestion above to fix this. I think AppInit should keep returning bool not int to keep the changes minimal, and because returning bool is better for following existing coding conventions.
  25. furszy force-pushed on Jun 8, 2023
  26. move ThreadImport ABC error to use AbortNode
    'StartShutdown' should only be used for user requested
    shutdowns. Internal errors that cause a shutdown should
    use 'AbortNode'.
    9ddf7e03a3
  27. refactor: index: use `AbortNode` in fatal error helper
    Deduplicates code in the `FatalError` template function by using
    `AbortNode` which does the exact same thing if called without any user
    message (i.e. without second parameter specified). The template is still
    kept for ease-of-use w.r.t. not having to call `tfm::format(...)` at the
    call-side each time, and also to keep the diff minimal.
    3c06926cf2
  28. furszy force-pushed on Jun 8, 2023
  29. furszy commented at 7:54 pm on June 8, 2023: member

    Thanks ryanofsky, updated per feedback with a tiny diff:

    Didn’t move InitShutdownState from AppInitBasicSetup to AppInit and instead provided the NodeContext ref to AppInitBasicSetup. Not sure if you have an strong opinion about this (shoot if you have it) but the rationale was to keep the same workflow as we have now so the bitcoin-qt related changes b0267f2 doesn’t introduce another InitShutdownState call.

    I think AppInit should keep returning bool not int to keep the changes minimal, and because returning bool is better for following existing coding conventions.

    Funny enough, GuiMain() function returns an int. I think that we just don’t have a convention here. Still, agree to follow the simpler path.

    Another topic would be whether to drop 99d3212 or not. It is not needed anymore. I was going to drop it but.. it feels good to place the global booleans first in the .cpp file and not in the middle of it. Thoughts?

  30. furszy force-pushed on Jun 8, 2023
  31. furszy commented at 8:54 pm on June 8, 2023: member

    Reworked the diff a bit more check.

    It’s because AppInit can return true without initializing the shutdown pipe. e.g. when the user starts bitcoind with a -h or -version. Which leads to an assertion error at the WaitForShutdown call.

    Probably we should refactor AppInit in the future, I liked where you were going.

  32. DrahtBot removed the label CI failed on Jun 8, 2023
  33. ryanofsky approved
  34. ryanofsky commented at 12:12 pm on June 9, 2023: contributor

    Code review ACK f13880e83200ec824d7528061f96708d96bf9bd4

    This looks good as is, but would suggest two changes if you feel like making them:

    1. Maybe drop the “move ‘AbortNode’ after ‘StartShutdown’” commit if it is not needed anymore. Dropping it would help git blame work better and make history more legible. Also #27711 is going to remove AbortNode so moving it now will cause an uglier conflict.

    2. I think it would be better if the main() function always returned node.exit_status instead conditionally returning it. Better to make it clear that setting node.exit_status always determines the exit code and not create doubt about whether the value will be ignored. Would suggest the following (despite slightly longer diff)

     0diff --git a/src/bitcoind.cpp b/src/bitcoind.cpp
     1index 71b2eef814c6..4133269b3433 100644
     2--- a/src/bitcoind.cpp
     3+++ b/src/bitcoind.cpp
     4@@ -236,13 +236,6 @@ static bool AppInit(NodeContext& node, int argc, char* argv[])
     5     }
     6 #endif
     7     SetSyscallSandboxPolicy(SyscallSandboxPolicy::SHUTOFF);
     8-    if (fRet) {
     9-        WaitForShutdown();
    10-    } else {
    11-        node.exit_status = EXIT_FAILURE;
    12-    }
    13-    Interrupt(node);
    14-    Shutdown(node);
    15 
    16     return fRet;
    17 }
    18@@ -266,5 +259,13 @@ MAIN_FUNCTION
    19     // Connect bitcoind signal handlers
    20     noui_connect();
    21 
    22-    return AppInit(node, argc, argv) ? node.exit_status.load() : EXIT_FAILURE;
    23+    if (AppInit(node, argc, argv)) {
    24+        WaitForShutdown();
    25+    } else {
    26+        node.exit_status = EXIT_FAILURE;
    27+    }
    28+    Interrupt(node);
    29+    Shutdown(node);
    30+
    31+    return node.exit_status;
    32 }
    

    I think this would be a general improvement anyway because it reduces complexity of AppInit and make it a pure initialization function, not a function that initializes, interrupts, and waits.

  35. furszy commented at 1:43 pm on June 9, 2023: member
    1. I think it would be better if the main() function always returned node.exit_status instead conditionally returning it. Better to make it clear that setting node.exit_status always determines the exit code and not create doubt about whether the value will be ignored. I think this would be a general improvement anyway because it reduces complexity of AppInit and make it a pure initialization function, not a function that initializes, interrupts, and waits.

    Hmm, isn’t the diff exactly what I changed in my last push? check #27708 (comment).

    The bug there is that AppInit can return true without initializing the shutdown pipe, which makes WaitForShutdown assertion fail (cannot call WaitForShutdown without initializing the pipe). The user could have provided the -help or -version arguments, see, which prints information and return early without initializing the node.

    Overall, I like where those changes are going, but they require to refactor AppInit further so the args parsing step, and the execution of commands that return early without initializing the node, are performed in a different function. If you feel strong on adding it here, I could add it. Not really a thing to introduce more changes for me.

    Maybe drop the “move ‘AbortNode’ after ‘StartShutdown’” commit if it is not needed anymore. Dropping it would help git blame work better and make history more legible. Also #27711 is going to remove AbortNode so moving it now will cause an uglier conflict.

    k good, didn’t know about 27711 removing it. Will drop it.

  36. furszy force-pushed on Jun 9, 2023
  37. furszy commented at 2:13 pm on June 9, 2023: member
    Have quickly drafted the needed changes to make your diff work so we are in the same page, check 61fe1af.
  38. in src/bitcoind.cpp:115 in 61fe1af014 outdated
    111@@ -112,12 +112,8 @@ int fork_daemon(bool nochdir, bool noclose, TokenPipeEnd& endpoint)
    112 
    113 #endif
    114 
    115-static bool AppInit(NodeContext& node, int argc, char* argv[])
    116+static bool ParseArgs(NodeContext& node, int argc, char* argv[])
    


    ryanofsky commented at 2:39 pm on June 9, 2023:

    In commit “refactor: decouple early return commands from AppInit” (61fe1af01418d8b93d38aba8aeb65aecfa6a95cd)

    Would be nice to just pass ArgsManager& to this function instead of NodeContext& node, since accessing other parts of the context is not needed.


    furszy commented at 8:56 pm on June 9, 2023:
    done
  39. in src/init.cpp:803 in aa415e885e outdated
    799@@ -800,7 +800,7 @@ std::set<BlockFilterType> g_enabled_filter_types;
    800     std::terminate();
    801 };
    802 
    803-bool AppInitBasicSetup(const ArgsManager& args)
    804+bool AppInitBasicSetup(const ArgsManager& args, NodeContext& node)
    


    ryanofsky commented at 3:05 pm on June 9, 2023:

    In commit “refactor: index: use AbortNode in fatal error helper” (3c06926cf21dcca3074ef51506f556b2286c299b)

    It would be slightly better to just pass a reference to node.exit_status instead of passing the whole NodeContext, because passing NodeContext exposes a lot of other state this function doesn’t need to have access to. But it’s not important because #27711 gets rid of the InitShutdownState function and will drop the parameter anyway.


    furszy commented at 8:55 pm on June 9, 2023:
    done. Refactored to pass exit_status ref.
  40. in src/qt/bitcoin.cpp:401 in c8806e972f outdated
    398@@ -398,8 +399,7 @@ void BitcoinApplication::initializeResult(bool success, interfaces::BlockAndHead
    399     qDebug() << __func__ << ": Initialization result: " << success;
    400 
    401     // Set exit result.
    


    ryanofsky commented at 3:08 pm on June 9, 2023:

    In commit “gui: return EXIT_FAILURE on post-init fatal errors” (c8806e972f2be9f8a05bf11b1f6a8a2e70e01962)

    Should delete the “Set exit result” comment here or move it down below. Probably fine to delete because exit_status.store(EXIT_FAILURE) should pretty clear by itself.


    furszy commented at 8:55 pm on June 9, 2023:
    done, deleted
  41. in src/qt/bitcoin.cpp:469 in c8806e972f outdated
    464@@ -464,6 +465,10 @@ void BitcoinApplication::handleNonFatalException(const QString& message)
    465         QLatin1String("<br><br>") + GUIUtil::MakeHtmlLink(message, PACKAGE_BUGREPORT));
    466 }
    467 
    468+int BitcoinApplication::getExitStatus() const {
    469+    return node().context()->exit_status.load();
    


    ryanofsky commented at 3:13 pm on June 9, 2023:

    In commit “gui: return EXIT_FAILURE on post-init fatal errors” (c8806e972f2be9f8a05bf11b1f6a8a2e70e01962)

    This is ok for now, but if you want to future-proof the code and make it compatible with #10102, it would be better to add an interfaces::Node::getExitStatus() method and have GUI call node().getExitStatus(). The problem is that with #10102, node().context() will return null when the node is running in a different process than the GUI.

    I also think it might be clearer just to delete the BitcoinApplication::getExitStatus method entirely and just return node().getExitStatus() at the end of GuiMain()

    EDIT: I guess you would also need to add a interfaces::Node::setExitStatus() method as well to replace ’m_node->context()->exit_status.store` above. Would be nice, but this is also fine for now.


    furszy commented at 8:53 pm on June 9, 2023:

    I also think it might be clearer just to delete the BitcoinApplication::getExitStatus method entirely and just return node().getExitStatus() at the end of GuiMain()

    That was my first idea too but GuiMain() is a static function with no direct access to the node interface. The interface is created inside the BitcoinApplication::createNode class method.

    This is ok for now, but if you want to future-proof the code and make it compatible with #10102, it would be better to add an interfaces::Node::getExitStatus() method and have GUI call node().getExitStatus().

    Yeah, good. Added getExitStatus() to the node interface.


    ryanofsky commented at 11:23 pm on June 9, 2023:

    re: #27708 (review)

    That was my first idea too but GuiMain() is a static function with no direct access to the node interface.

    I think it can just do return app.node().getExitStatus();. The app.node() method is public and already used in GuiMain()


    furszy commented at 2:36 pm on June 10, 2023:
    Done, added
  42. in src/bitcoind.cpp:156 in 61fe1af014 outdated
    171@@ -153,17 +172,6 @@ static bool AppInit(NodeContext& node, int argc, char* argv[])
    172     std::any context{&node};
    173     try
    174     {
    175-        if (auto error = common::InitConfig(args)) {
    


    ryanofsky commented at 3:22 pm on June 9, 2023:

    In commit “refactor: decouple early return commands from AppInit” (61fe1af01418d8b93d38aba8aeb65aecfa6a95cd)

    Note for reviewers: Moving the common::InitConfig call out of the try block means if InitConfig throws an exception it won’t be caught. But this is good and should not change behavior because InitConfig is already catching std::exception internally and should just be doing basic parsing and not triggering other types of exceptions.


    furszy commented at 8:59 pm on June 9, 2023:
    yeah, same note applies for the other loop that was moved out of the try catch. The one that calls IsSwitchChar().
  43. ryanofsky approved
  44. ryanofsky commented at 3:26 pm on June 9, 2023: contributor

    Code review ACK 61fe1af01418d8b93d38aba8aeb65aecfa6a95cd. This looks very good.

    Appreciate you cleaning up the code to make it more obvious when the new exit_status value will and won’t be returned. (And sorry I posted another buggy diff that forgot about the early returns!) Left a few more minor suggestions that you could take or leave.

  45. DrahtBot added the label CI failed on Jun 9, 2023
  46. Return EXIT_FAILURE on post-init fatal errors
    It seems odd to return `EXIT_SUCCESS` when the node aborted
    execution due a fatal internal error or any post-init problem
    that triggers an unrequested shutdown.
    
    e.g. blocks or coins db I/O errors, disconnect block failure,
    failure during thread import (external blocks loading process
    error), among others.
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    3b2c61e819
  47. furszy force-pushed on Jun 9, 2023
  48. furszy commented at 9:01 pm on June 9, 2023: member
    Thanks @ryanofsky! Feedback tackled. Small diff.
  49. DrahtBot removed the label CI failed on Jun 9, 2023
  50. in src/qt/bitcoin.cpp:443 in 624340f9ac outdated
    439@@ -441,6 +440,7 @@ void BitcoinApplication::initializeResult(bool success, interfaces::BlockAndHead
    440 #endif
    441         pollShutdownTimer->start(SHUTDOWN_POLLING_DELAY);
    442     } else {
    443+        m_node->context()->exit_status.store(EXIT_FAILURE);
    


    ryanofsky commented at 11:16 pm on June 9, 2023:

    In commit “gui: return EXIT_FAILURE on post-init fatal errors” (624340f9acbdd3c2d54c3824248e2c0cff4cf487)

    Not necessary to address here, but with #10102 when the node and GUI are running in separate processes, m_node->context() will be null, so this will crash.

    You could fix and future-proof this by adding a interfaces::Node::setExitStatus() function, or just update the existing appInitMain() function to set the error code:

    0    bool appInitMain(interfaces::BlockAndHeaderTipInfo* tip_info) override
    1    {
    2        if (!AppInitMain(*m_context, tip_info)) {
    3            m_context->exit_status.store(EXIT_FAILURE);
    4            return false;
    5        }
    6        return true;
    7    }
    

    If you let appInitMain set the error code, you could also simplify GUI code even more, skipping Q_EMIT initializeResult(rv, tip_info) when appInitMain fails and getting rid of the rv parameter.


    furszy commented at 2:35 pm on June 10, 2023:

    You could fix and future-proof this by adding a interfaces::Node::setExitStatus() function, or just update the existing appInitMain() function to set the error code.

    Great, added it. Thanks!

    If you let appInitMain set the error code, you could also simplify GUI code even more, skipping Q_EMIT initializeResult(rv, tip_info) when appInitMain fails and getting rid of the rv parameter.

    Hmm, that would be nice but, in a quick glance, I don’t think that we can remove the rv parameter without introducing another signal to the InitExecutor class (which wouldn’t be bad). Because we still need to call to the shutdown function at the GUI level: the BitcoinApplication::requestShutdown that is called inside BitcoinApplication::initializeResult when the rv parameter is false.

    Still, even when I would love to keep cleaning stuff, maybe better to leave it for a follow-up so the gui related changes are minimal here.

  51. ryanofsky approved
  52. ryanofsky commented at 11:25 pm on June 9, 2023: contributor

    Code review ACK da60d86bfbe7fe49ce7181b5b394c0e3d8c53854

    Thanks for following up on earlier suggestions. Left some new comments below, but this also looks good in it’s current form.

  53. gui: return EXIT_FAILURE on post-init fatal errors 4927167f85
  54. refactor: decouple early return commands from AppInit
    Cleaned up the init flow to make it more obvious when
    the 'exit_status' value will and won't be returned.
    
    This is because it was confusing that `AppInit` was
    returning true under two different circumstances:
    
    1) When bitcoind was launched only to retrieve the "-help"
    or "-version" information. In this case, the app was
    not initialized.
    
    2) When the user triggers a shutdown. In this case,
    the app was fully initialized.
    61c569ab60
  55. furszy force-pushed on Jun 10, 2023
  56. furszy commented at 2:40 pm on June 10, 2023: member

    Updated per feedback, thanks ryanofsky. Tiny diff:

    • Removed BitcoinApplication::getExitStatus.
    • Moved exit_status.store(EXIT_FAILURE) to the interface appInitMain() method to be compatible with #10102.
  57. DrahtBot added the label CI failed on Jun 10, 2023
  58. pinheadmz commented at 6:24 pm on June 10, 2023: member

    concept ACK

    I was looking for this writing #27850 was surprised that a fatal internal error didn’t satisfy the python test for assert init error on startup

  59. DrahtBot removed the label CI failed on Jun 11, 2023
  60. TheCharlatan approved
  61. TheCharlatan commented at 7:44 am on June 12, 2023: contributor
    ACK 61c569ab6069d04079a0831468eb713983919636
  62. DrahtBot requested review from ryanofsky on Jun 12, 2023
  63. ryanofsky approved
  64. ryanofsky commented at 1:40 pm on June 12, 2023: contributor

    Code review ACK 61c569ab6069d04079a0831468eb713983919636

    This should be a really nice change because it actually makes bitcoind return an error code when there is an error. But it also clears out StartShutdown clutter and makes the main() function comprehensible, so is a nice code improvment.

    I think this would be ready for merge if another reviewer added an ACK.

  65. DrahtBot requested review from ryanofsky on Jun 12, 2023
  66. DrahtBot removed review request from ryanofsky on Jun 12, 2023
  67. in src/shutdown.cpp:34 in 3b2c61e819 outdated
    30@@ -28,6 +31,7 @@ bool AbortNode(const std::string& strMessage, bilingual_str user_message)
    31         user_message = _("A fatal internal error occurred, see debug.log for details");
    32     }
    33     InitError(user_message);
    34+    Assert(g_exit_status)->store(EXIT_FAILURE);
    


    pinheadmz commented at 3:18 pm on June 12, 2023:

    Does this make InitShutdownState() required before calling AbortNode() ? I suppose that could be added to the comment here. I’m just making sure I understand, I don’t think you need to retouch:

    https://github.com/bitcoin/bitcoin/blob/6f5f37eefd425faabd9a97a3c3028395c34f08c4/src/shutdown.h#L14-L17


    ryanofsky commented at 4:21 pm on June 12, 2023:

    Does this make InitShutdownState() required before calling AbortNode() ? I suppose that could be added to the comment here.

    It could be mentioned but AbortNode is about to be removed in #27861, so it isn’t too important

  68. in src/shutdown.cpp:24 in 3b2c61e819 outdated
    20@@ -20,6 +21,8 @@
    21 #include <condition_variable>
    22 #endif
    23 
    24+static std::atomic<int>* g_exit_status{nullptr};
    


    pinheadmz commented at 3:22 pm on June 12, 2023:
    I am guessing we need a new global here because we can’t otherwise access NodeContext::exit_status?

    TheCharlatan commented at 3:55 pm on June 12, 2023:
    FWIW, this could be de-globalized again by the interrupt class introduced in #27861.

    ryanofsky commented at 4:20 pm on June 12, 2023:

    FWIW, this could be de-globalized again by the interrupt class introduced in #27861.

    Yes #27861 replaces the AbortNode function with a KernelNotifications::fatalError method that can access whatever state it needs through the KernelNotifications class without needing any globals. AbortNode is the only function using g_exit_status so it should just disappear then.

  69. pinheadmz approved
  70. pinheadmz commented at 3:24 pm on June 12, 2023: member

    ACK 61c569ab6069d04079a0831468eb713983919636

    A few non-breaking questions below. Reviewed each commit and ran all tests locally. Confirmed that #2039 behavior now exits with code 1 instead of 0 like it did on master. Also confirmed that remaining StartShutdown() calls are all user-triggered.

     0-----BEGIN PGP SIGNED MESSAGE-----
     1Hash: SHA256
     2
     3ACK 61c569ab6069d04079a0831468eb713983919636
     4-----BEGIN PGP SIGNATURE-----
     5
     6iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmSHODIACgkQ5+KYS2KJ
     7yTpd4g/+I56MeCxiLqHXVtFe2yUw9JY1wYNeo6kuLU2A5Yx+kWeTBKZl9dA/lTIu
     85/TYbFaTai5px1HZzynnfD+XrFLtjaxynGqHhbfRmX8cGUyo5WkRRY6IcYUuqiXs
     9FBUsTqlXrEq00j012RL63xpiZhswUZTEpw3BzHD5c1IlK7FkdCWCVT0xl3o68ivs
    10W0pfXRXnAcAtseWgMlrZtPHGaEsQS1CMHAaz2KXvRPbOCu0p0vjAuGkzqpBZuu3x
    112kzVcPnuUunb6F+t62Q2Mrk4bCuKYlT9I5CozGVV8W9rWJODvzKyTCmTLCXbIxC0
    12MrGN6DHn0AzBbCBm6E65o5B80czNzM0Wci/7ZElmjBkDJxWP0KPDtaNp9QsazYbO
    13AVWhKfMMBDEqmwRyXvMK1iv/zJrZ2Y2QQf/LPt1vbLjVwHT2wF7HyEOUJHKKmrI2
    14fLWf62JKvBvrrNdeb12s5ZaMwMgcM5UA/8H2YuDw/zOvgJYZZiXAiRI3ETDvsm5B
    15dFIgn0M5njMq3UFgRl2hHW/qBwKiufs8QvrKw8zxuXxF5H+4sQcwCVtiwe//dsDY
    16179WwJQcQ3t7iwg1TNgscv9nVY5nRJvPzHs4iopLbPhks60akDic1g6ruMCdbAEj
    17/Md+oVk7d59HJLvhDlR6xpS8iKycyadFcsiMlOuK/Xn56ZPLKt0=
    18=N/HN
    19-----END PGP SIGNATURE-----
    

    pinheadmz’s public key is on keybase

  71. in test/functional/test_framework/test_node.py:383 in 3b2c61e819 outdated
    376@@ -377,17 +377,23 @@ def is_node_stopped(self):
    377             return False
    378 
    379         # process has stopped. Assert that it didn't return an error code.
    380-        assert return_code == 0, self._node_msg(
    381-            "Node returned non-zero exit code (%d) when stopping" % return_code)
    382+        # unless 'expected_ret_code' is provided.
    383+        if expected_ret_code is not None:
    384+            assert return_code == expected_ret_code, self._node_msg(
    385+                "Node returned unexpected exit code (%d) vs (%d) when stopping" % (return_code, expected_ret_code))
    


    theStack commented at 4:50 pm on June 12, 2023:

    nit, could take the opportunity to change to f-strings here (and below), as suggested in the functional test style guidelines:

    0                 f"Node returned unexpected exit code ({return_code}) vs ({expected_ret_code}) when stopping")
    
  72. theStack approved
  73. theStack commented at 4:53 pm on June 12, 2023: contributor
    Code-review ACK 61c569ab6069d04079a0831468eb713983919636
  74. ryanofsky merged this on Jun 12, 2023
  75. ryanofsky closed this on Jun 12, 2023

  76. furszy deleted the branch on Jun 12, 2023
  77. in test/functional/test_framework/test_node.py:384 in 3b2c61e819 outdated
    381-            "Node returned non-zero exit code (%d) when stopping" % return_code)
    382+        # unless 'expected_ret_code' is provided.
    383+        if expected_ret_code is not None:
    384+            assert return_code == expected_ret_code, self._node_msg(
    385+                "Node returned unexpected exit code (%d) vs (%d) when stopping" % (return_code, expected_ret_code))
    386+        else:
    


    maflcko commented at 9:08 am on June 13, 2023:

    In 3b2c61e8198bcefb1c2343caff1d705951026cc4:

    Any reason to make expected_ret_code optional? Seems easier to just make the default value 0 and drop this whole else block.

  78. maflcko commented at 9:08 am on June 13, 2023: member
    left a question / test nit for follow-up
  79. sidhujag referenced this in commit 160ad972a4 on Jun 15, 2023
  80. luke-jr referenced this in commit b2be79f213 on Aug 16, 2023
  81. luke-jr referenced this in commit 3be6b482c4 on Aug 16, 2023
  82. luke-jr referenced this in commit 1edcbb9ae5 on Aug 16, 2023
  83. luke-jr referenced this in commit a6fbe3e6b5 on Aug 16, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 16:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me