kernel: Rm ShutdownRequested and AbortNode from validation code. #27861

pull TheCharlatan wants to merge 5 commits into bitcoin:master from TheCharlatan:kernelInterrupt changing 31 files +394 −199
  1. TheCharlatan commented at 1:16 pm on June 12, 2023: contributor

    Get rid of all ShutdownRequested calls in validation code by introducing an interrupt object that applications can use to cancel long-running kernel operations.

    Replace all AbortNode calls in validation code with new fatal error and flush error notifications so kernel applications can be notified about failures and choose how to handle them.


    This pull request is part of the libbitcoinkernel project #27587 https://github.com/orgs/bitcoin/projects/3 and more specifically its “Step 2: Decouple most non-consensus code from libbitcoinkernel”.

    The pull request mostly allows dropping the kernel dependency on shutdown.cpp. The only dependency left after this is a StartShutdown call which will be removed in followup PR #27711. This PR also drops the last reference to the uiInterface global in kernel code. The process of moving the uiInterface out of the kernel was started in #27636.

    This pull request contains a subset of patches originally proposed in #27711. It will be part of a series of changes required to make handling of interrupts (or in other words the current shutdown procedure) in the kernel library more transparent and less reliable on global mutable state. The set of patches contained here was originally proposed by @ryanofsky here.

  2. DrahtBot commented at 1:16 pm on June 12, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK hebasto, ryanofsky, achow101

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #27866 (blockstorage: Return on fatal flush errors by TheCharlatan)
    • #27850 (test: Add unit & functional test coverage for blockstore by pinheadmz)
    • #27823 (init: return error when block index is non-contiguous, fix feature_init.py file perturbation by mzumsande)
    • #27746 (Rework validation logic for assumeutxo by sdaftuar)
    • #27460 (rpc: Add importmempool RPC by MarcoFalke)
    • #26966 (index: blockfilter initial sync speedup, parallelize process by furszy)
    • #26762 (Make CCheckQueue RAII-styled by hebasto)
    • #26045 (rpc: Optimize serialization disk space of dumptxoutset by aureleoules)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Validation on Jun 12, 2023
  4. in src/shutdown.cpp:20 in 3ce961f38d outdated
    20 #include <assert.h>
    21-#include <atomic>
    22-#ifdef WIN32
    23-#include <condition_variable>
    24-#endif
    25+#include <optional>
    


    ryanofsky commented at 1:53 pm on June 12, 2023:

    In commit “util: Add SignalInterrupt class and use in shutdown.cpp” (3ce961f38df80843d01190589146088906819ea3)

    Doesn’t look like any optional objects are used here so probably not necessary to add

  5. in src/util/signalinterrupt.h:34 in 3ce961f38d outdated
    29+public:
    30+    SignalInterrupt();
    31+    explicit operator bool() const;
    32+    void operator()();
    33+    void reset();
    34+    void sleep();
    


    ryanofsky commented at 2:13 pm on June 12, 2023:

    In commit “util: Add SignalInterrupt class and use in shutdown.cpp” (3ce961f38df80843d01190589146088906819ea3)

    Would suggest replacing “sleep” with “wait” everywhere in this commit. The original SignalInterrupt class from #27711 (comment) had a sleep method to be consistent with CThreadInterrupt::sleep_for. But now that the class no longer depends on CThreadInterrupt, better to go with post-c++11 terminology and call it wait.

    Also would suggest just calling m_sleep_mutex and m_sleep_cv variables m_mutex and m_cv. Adding “wait” or “sleep” to the name doesn’t add real information because mutex and condition variables are generally used to wait for things.

  6. in src/util/signalinterrupt.cpp:45 in 3ce961f38d outdated
    40+
    41+void SignalInterrupt::operator()()
    42+{
    43+#ifdef WIN32
    44+    std::unique_lock<std::mutex> lk(m_sleep_mutex);
    45+    m_flag.store(true, std::memory_order_release);
    


    ryanofsky commented at 2:23 pm on June 12, 2023:

    In commit “util: Add SignalInterrupt class and use in shutdown.cpp” (3ce961f38df80843d01190589146088906819ea3)

    I think it would be safer to do m_flag = true here and not use memory_order_release. The previous code was doing fRequestShutdown = true; and it would be nice to just keep behavior unchanged in this commit and avoid causing potential bugs.

  7. in src/util/signalinterrupt.cpp:30 in 3ce961f38d outdated
    25+#endif
    26+}
    27+
    28+SignalInterrupt::operator bool() const
    29+{
    30+    return m_flag.load(std::memory_order_acquire);
    


    ryanofsky commented at 2:31 pm on June 12, 2023:

    In commit “util: Add SignalInterrupt class and use in shutdown.cpp” (3ce961f38df80843d01190589146088906819ea3)

    I think it would be safer to return m_flag and avoid using memory_order_acquire. The previous code was just doing return fRequestShutdown which uses a stricter memory ordering.

  8. in src/util/signalinterrupt.cpp:38 in 3ce961f38d outdated
    33+void SignalInterrupt::reset()
    34+{
    35+    // Cancel existing interrupt by waiting for it, this will reset condition flags and remove
    36+    // the token from the pipe.
    37+    if (*this) sleep();
    38+    m_flag.store(false, std::memory_order_release);
    


    ryanofsky commented at 2:34 pm on June 12, 2023:

    In commit “util: Add SignalInterrupt class and use in shutdown.cpp” (3ce961f38df80843d01190589146088906819ea3)

    Previous code was fRequestShutdown = false, would use m_flag = false to avoid changing behavior instead of adding memory_order_release

  9. in src/kernel/blockmanager_opts.h:30 in d60ab3b509 outdated
    25@@ -25,6 +26,8 @@ struct BlockManagerOpts {
    26     bool fast_prune{false};
    27     bool stop_after_block_import{DEFAULT_STOPAFTERBLOCKIMPORT};
    28     const fs::path blocks_dir;
    29+    Notifications& notifications;
    30+    mutable bool interrupt_on_fatal_error{true};
    


    ryanofsky commented at 2:50 pm on June 12, 2023:

    In commit “kernel: Add fatalError method to notifications” (d60ab3b5099d3a348f3122b5d3207ac12f4a751c)

    This option is unused so would be good to drop

  10. in src/node/blockstorage.cpp:532 in d60ab3b509 outdated
    528@@ -529,7 +529,7 @@ void BlockManager::FlushUndoFile(int block_file, bool finalize)
    529 {
    530     FlatFilePos undo_pos_old(block_file, m_blockfile_info[block_file].nUndoSize);
    531     if (!UndoFileSeq().Flush(undo_pos_old, finalize)) {
    532-        AbortNode("Flushing undo file to disk failed. This is likely the result of an I/O error.");
    533+        m_opts.notifications.fatalError("Flushing undo file to disk failed. This is likely the result of an I/O error.");
    


    ryanofsky commented at 3:24 pm on June 12, 2023:

    In commit “kernel: Add fatalError method to notifications” (d60ab3b5099d3a348f3122b5d3207ac12f4a751c)

    This is calling fatalError and then continuing without returning an error or raising an exception, so I think the new code here is more confusing than the current code (even if strictly speaking it is a refactoring).

    We could potentially change this code to treat failing flush as a normal error that gets returned to the caller. But I think if we are going to do that, it would be better to do in a separate PR which could receive targeted review without needing reviewers to look at all the other changes here.

    For this PR, I would suggest doing one of two things:

    1. Adding a kernel::Notifications::flushError method and calling flushError here instead of afatalError. The current code is making this AbortNode call in a different way than other AbortNode calls, and it would be confusing for the kernel notification interface to hide differences between errors that are not handled the same. A flushError method could have a comment like “The flush error notification is sent when there is an I/O error saving state in memory to disk. It is recommended that applications treat this a as a fatal error, even though it is technically less severe, and the kernel may do some work after it happens.”

    2. Reverting the change to this line and keeping the AbortNode function for now. This would get most of the work in the PR done, just leaving behind leaving 2 AbortNode calls that can be cleaned up with more care later.

  11. in src/node/blockstorage.cpp:551 in d60ab3b509 outdated
    547@@ -548,7 +548,7 @@ void BlockManager::FlushBlockFile(bool fFinalize, bool finalize_undo)
    548 
    549     FlatFilePos block_pos_old(m_last_blockfile, m_blockfile_info[m_last_blockfile].nSize);
    550     if (!BlockFileSeq().Flush(block_pos_old, fFinalize)) {
    551-        AbortNode("Flushing block file to disk failed. This is likely the result of an I/O error.");
    552+        m_opts.notifications.fatalError("Flushing block file to disk failed. This is likely the result of an I/O error.");
    


    ryanofsky commented at 3:25 pm on June 12, 2023:

    In commit “kernel: Add fatalError method to notifications” (d60ab3b5099d3a348f3122b5d3207ac12f4a751c)

    This is the other misleading fatalError call (see comment above)

  12. in src/node/chainstate.cpp:210 in d60ab3b509 outdated
    206@@ -207,7 +207,7 @@ ChainstateLoadResult LoadChainstate(ChainstateManager& chainman, const CacheSize
    207     } else if (snapshot_completion == SnapshotCompletionResult::SUCCESS) {
    208         LogPrintf("[snapshot] cleaning up unneeded background chainstate, then reinitializing\n");
    209         if (!chainman.ValidatedSnapshotCleanup()) {
    210-            AbortNode("Background chainstate cleanup failed unexpectedly.");
    211+            chainman.GetNotifications().fatalError("Background chainstate cleanup failed unexpectedly.");
    


    ryanofsky commented at 3:36 pm on June 12, 2023:

    In commit “kernel: Add fatalError method to notifications” (d60ab3b5099d3a348f3122b5d3207ac12f4a751c)

    No error is returned after this fatalError call, but this looks like a bug not directly related to this PR. Opened #27862 to address it.

  13. in src/validation.cpp:5659 in d60ab3b509 outdated
    5662@@ -5664,7 +5663,7 @@ void Chainstate::InvalidateCoinsDBOnDisk()
    5663 
    5664         LogPrintf("%s: error renaming file '%s' -> '%s': %s\n",
    5665                 __func__, src_str, dest_str, e.what());
    5666-        AbortNode(strprintf(
    5667+        m_chainman.GetNotifications().fatalError(strprintf(
    


    ryanofsky commented at 3:41 pm on June 12, 2023:

    In commit “kernel: Add fatalError method to notifications” (d60ab3b5099d3a348f3122b5d3207ac12f4a751c)

    No error is returned after this fatalError call, but this looks like a bug not directly related to this PR. Opened #27862 to address it.

  14. TheCharlatan marked this as ready for review on Jun 12, 2023
  15. DrahtBot added the label Needs rebase on Jun 12, 2023
  16. ryanofsky approved
  17. ryanofsky commented at 5:13 pm on June 12, 2023: contributor
    Code review ACK d60ab3b5099d3a348f3122b5d3207ac12f4a751c. Overall looks great! Feel free to ignore most of comments I left. The only thing I think really should be addressed is the two BlockManager::Flush calls aborting without returning errors. I think this is confusing and potentially could lead to bugs, but since the changes here don’t actually affect the way the code run in practice, it doesn’t need to block the PR.
  18. TheCharlatan commented at 5:24 pm on June 12, 2023: contributor

    The only thing I think really should be addressed is the two BlockManager::Flush calls aborting without returning errors.

    I was about to open a separate PR from this branch here: https://github.com/TheCharlatan/bitcoin/tree/kernelReturnOnAbort . I think this should go in a separate PR for ease of review. I’ll also provide a list of all the other non-immediate returns that should be fine to do.

  19. ryanofsky commented at 6:35 pm on June 12, 2023: contributor

    The only thing I think really should be addressed is the two BlockManager::Flush calls aborting without returning errors.

    I was about to open a separate PR from this branch here: https://github.com/TheCharlatan/bitcoin/tree/kernelReturnOnAbort

    Great! I also opened #27862 earlier which overlaps with the branch a little. I think between the branch and PR all the cases where errors are not currently returned are covered.

  20. TheCharlatan force-pushed on Jun 13, 2023
  21. TheCharlatan commented at 11:03 am on June 13, 2023: contributor

    Rebased d60ab3b5099d3a348f3122b5d3207ac12f4a751c -> 3e60c02ef3759e16a762bd193e0b9291bf46b73c (kernelInterrupt_0 -> kernelInterrupt_1, compare).

    Updated 3e60c02ef3759e16a762bd193e0b9291bf46b73c -> 39d9b5144410c15385ee48bc886e0dea36f34b6a (kernelInterrupt_1 -> kernelInterrupt_2, compare).

    • Addressed @ryanofsky’s comment, cleaning up includes.
    • Addressed @ryanofsky’s comment, renaming sleep to wait.
    • Addressed @ryanofsky’s comments (1, 2, 3), avoiding behavior changes around atomic memory acquisition and release.
    • Addressed @ryanofsky’s comment, removing a field that was accidentally carried over from the parent PR.
  22. DrahtBot removed the label Needs rebase on Jun 13, 2023
  23. in src/index/base.cpp:204 in 39d9b51444 outdated
    200@@ -199,7 +201,7 @@ void BaseIndex::ThreadSync()
    201                     break;
    202                 }
    203                 if (pindex_next->pprev != pindex && !Rewind(pindex, pindex_next->pprev)) {
    204-                    FatalErrorf("%s: Failed to rewind index %s to a previous chain tip",
    205+                    FatalErrorf(m_chain->context()->exit_status, "%s: Failed to rewind index %s to a previous chain tip",
    


    ryanofsky commented at 4:57 pm on June 13, 2023:

    In commit “kernel: Add fatalError method to notifications” (39d9b5144410c15385ee48bc886e0dea36f34b6a)

    Would suggest reverting all the calls to FatalErrorf this commit and just making FatalErrorf a member function to minimize the diff and avoid adding lots of new context accesses. Then the only src/index/ changes that would be needed would be:

     0--- a/src/index/base.cpp
     1+++ b/src/index/base.cpp
     2@@ -8,6 +8,7 @@
     3 #include <interfaces/chain.h>
     4 #include <kernel/chain.h>
     5 #include <logging.h>
     6+#include <node/abort.h>
     7 #include <node/blockstorage.h>
     8 #include <node/context.h>
     9 #include <node/database_args.h>
    10@@ -31,9 +32,10 @@ constexpr auto SYNC_LOG_INTERVAL{30s};
    11 constexpr auto SYNC_LOCATOR_WRITE_INTERVAL{30s};
    12 
    13 template <typename... Args>
    14-static void FatalErrorf(const char* fmt, const Args&... args)
    15+void BaseIndex::FatalErrorf(const char* fmt, const Args&... args)
    16 {
    17-    AbortNode(tfm::format(fmt, args...));
    18+    auto message = tfm::format(fmt, args...);
    19+    node::AbortNode(m_chain->context()->exit_status, message);
    20 }
    21 
    22 CBlockLocator GetLocator(interfaces::Chain& chain, const uint256& block_hash)
    23diff --git a/src/index/base.h b/src/index/base.h
    24index 231f36b60538..8affee90f862 100644
    25--- a/src/index/base.h
    26+++ b/src/index/base.h
    27@@ -94,6 +94,9 @@ private:
    28 
    29     virtual bool AllowPrune() const = 0;
    30 
    31+    template <typename... Args>
    32+    void FatalErrorf(const char* fmt, const Args&... args);
    33+
    34 protected:
    35     std::unique_ptr<interfaces::Chain> m_chain;
    36     Chainstate* m_chainstate{nullptr};
    
  24. ryanofsky approved
  25. ryanofsky commented at 5:17 pm on June 13, 2023: contributor
    Code review ACK 39d9b5144410c15385ee48bc886e0dea36f34b6a. I left a suggestion to reduce size of changes to index code, and I still think it is inappropriate to treat flush errors as fatal errors, and would prefer to add a separate flush error method (https://github.com/bitcoin/bitcoin/pull/27861#discussion_r1226847296), but I won’t get hung up on this.
  26. ryanofsky commented at 5:27 pm on June 13, 2023: contributor

    re: #27861 (comment)

    • De-globalized the exit_status in shutdown.*. My current approach touches quite a few lines though and seems a bit unclean, suggestions for alternative approaches are appreciated.

    I’m actually not sure what this is referring to. exit_status is not referenced in shutdown.cpp at all after this PR: https://github.com/TheCharlatan/bitcoin/blob/39d9b5144410c15385ee48bc886e0dea36f34b6a/src/shutdown.cpp.

    Also, after this PR shutdown.cpp is just a thin wrapper around the kernel::Context::interrupt object, so we could pretty easily delete shutdown.cpp entirely and update callers to use the object directly.

  27. TheCharlatan commented at 6:34 pm on June 13, 2023: contributor

    Re #27861#pullrequestreview-1477669211

    and I still think it is inappropriate to treat flush errors as fatal errors, and would prefer to add a separate flush error method (https://github.com/bitcoin/bitcoin/pull/27861#discussion_r1226847296), but I won’t get hung up on this.

    ACK, I’ll revert to using AbortNode then for now, that should keep any potential churn minimal. Once #27866 has received sufficient review, we can always decide to add special cases (e.g. flush error notifications) afterwards.

  28. TheCharlatan commented at 7:01 pm on June 13, 2023: contributor

    Re #27861 (comment)

    ACK, I’ll revert to using AbortNode then for now, that should keep any potential churn minimal. Once #27866 has received sufficient review, we can always decide to add special cases (e.g. flush error notifications) afterwards.

    Ugh, just noticed this gets annoying with the exit status - we probably don’t want to introduce that in the blockstorage scope. Leaving as is then :/.

  29. TheCharlatan force-pushed on Jun 13, 2023
  30. TheCharlatan commented at 8:30 pm on June 13, 2023: contributor

    Updated 39d9b5144410c15385ee48bc886e0dea36f34b6a -> 315915b43082a4c43bec3ba8e826abc7280b6cbc (kernelInterrupt_2 -> kernelInterrupt_3, compare).

  31. DrahtBot added the label CI failed on Jun 13, 2023
  32. in src/Makefile.am:986 in 315915b430 outdated
    981   util/strencodings.cpp \
    982   util/string.cpp \
    983   util/syscall_sandbox.cpp \
    984   util/syserror.cpp \
    985   util/thread.cpp \
    986+  util/threadinterrupt.cpp \
    


    hebasto commented at 2:05 pm on June 15, 2023:

    8af687bc442c8b7ba9360fb9a961b374f41d0c59:

    Is it required?

  33. in src/util/threadinterrupt.h:23 in 315915b430 outdated
    15@@ -16,6 +16,11 @@
    16     A helper class for interruptible sleeps. Calling operator() will interrupt
    17     any current sleep, and after that point operator bool() will return true
    18     until reset.
    19+
    20+    This class should not be used in a signal handler. If sending an interrupt
    21+    from a signal handler is neccessary, the \ref SignalInterrupt class can be
    22+    used instead.
    23+
    


    hebasto commented at 2:17 pm on June 15, 2023:
    Maybe make the whole comment Doxygen compatible?
  34. hebasto commented at 2:40 pm on June 15, 2023: member
    Concept ACK.
  35. ryanofsky referenced this in commit 02654c938e on Jun 15, 2023
  36. TheCharlatan commented at 7:36 pm on June 15, 2023: contributor

    Re #27861#pullrequestreview-1477669211

    and I still think it is inappropriate to treat flush errors as fatal errors, and would prefer to add a separate flush error method (https://github.com/bitcoin/bitcoin/pull/27861#discussion_r1226847296)

    Just to ensure I am understanding you here, would the node::KernelNotifications for this extra notification match the implementation of the fatalError method for now (and later be adapted as needed), or do you want it to just log?

  37. ryanofsky commented at 8:38 pm on June 15, 2023: contributor

    would the node::KernelNotifications for this extra notification match the implementation of the fatalError method for now (and later be adapted as needed), or do you want it to just log?

    It should definitely call AbortNode and not change current behavior. The point of the suggestion is send fatal error notiifications in a consistent way and not introduce misleading code that says an error will be returned separately when it won’t be.

  38. ryanofsky approved
  39. ryanofsky commented at 8:51 pm on June 15, 2023: contributor

    Code review ACK 315915b43082a4c43bec3ba8e826abc7280b6cbc. Only change since last review is simplifying changes to index code as suggested earlier.

    I still think it would be nicer if FlushUndoFile and FlushBlockFile sent flush error notifications instead of fatal error notifications to avoid lumping together errors that mean different things and are not handled the same way (https://github.com/bitcoin/bitcoin/pull/27861#discussion_r1226847296). But this would just be for clarity. Behavior would be the same either way so it’s not that important.

  40. ryanofsky referenced this in commit 26b695b367 on Jun 15, 2023
  41. ryanofsky referenced this in commit 9047337d36 on Jun 15, 2023
  42. TheCharlatan force-pushed on Jun 15, 2023
  43. TheCharlatan commented at 9:36 pm on June 15, 2023: contributor

    Thank you for bearing with me @ryanofsky and the review @hebasto,

    Updated 315915b43082a4c43bec3ba8e826abc7280b6cbc -> 8654e835dd12a96b98eb554da48e342270e7279b (kernelInterrupt_3 -> kernelInterrupt_4, compare).

    • Addressed @hebasto’s comment, removing source file included by accident in makefile sources list.
    • Addressed @hebasto’s comment, improving documentation formatting.
    • Adopted @ryanofsky’s suggestions, added a commit for separate notification handling of fatal flush errors.

    Rebased 8654e835dd12a96b98eb554da48e342270e7279b -> 5155ce6114e199545a6ebdc77425dedc20233610 (kernelInterrupt_4 -> kernelInterrupt_5, compare).

    • Hopefully this fixes the CI failure.
  44. DrahtBot removed the label CI failed on Jun 15, 2023
  45. in src/util/signalinterrupt.h:20 in 5155ce6114 outdated
    15+#include <atomic>
    16+#include <cstdlib>
    17+
    18+namespace util {
    19+/**
    20+ * Helper class that allows manages an interrupt flag, and allows a thread or
    


    evansmj commented at 2:56 am on June 19, 2023:
    Helper class that [allows] [manages] an interrupt flag Was this meant to be Helper class that manages an interrupt flag?
  46. in src/util/signalinterrupt.h:24 in 5155ce6114 outdated
    19+/**
    20+ * Helper class that allows manages an interrupt flag, and allows a thread or
    21+ * signal to interrupt another thread.
    22+ *
    23+ * This class is safe to be used in a signal handler. If sending an interrupt
    24+ * from a signal handler is not neccessary, the more lightweight \ref
    


    evansmj commented at 2:57 am on June 19, 2023:
    neccessary can be updated to necessary
  47. in src/util/threadinterrupt.h:21 in 5155ce6114 outdated
    21+ * A helper class for interruptible sleeps. Calling operator() will interrupt
    22+ * any current sleep, and after that point operator bool() will return true
    23+ * until reset.
    24+ *
    25+ * This class should not be used in a signal handler. If sending an interrupt
    26+ * from a signal handler is neccessary, the \ref SignalInterrupt class can be
    


    evansmj commented at 2:58 am on June 19, 2023:
    necessary
  48. evansmj commented at 2:59 am on June 19, 2023: none
    just a few typos i found
  49. TheCharlatan force-pushed on Jun 19, 2023
  50. TheCharlatan commented at 9:40 am on June 19, 2023: contributor

    Updated 5155ce6114e199545a6ebdc77425dedc20233610 -> e0653c2223f8d6d74645c9b5742de1c069d8c3a7 (kernelInterrupt_5 -> kernelInterrupt_6, compare).

  51. in src/kernel/context.h:21 in f6cbafb31f outdated
    17@@ -16,12 +18,24 @@ namespace kernel {
    18 //! State stored directly in this struct should be simple. More complex state
    19 //! should be stored to std::unique_ptr members pointing to opaque types.
    20 struct Context {
    21+    //! Interrupt functionality that can be used to stop long-running kernel operations.
    


    ryanofsky commented at 7:45 pm on June 20, 2023:

    In commit “util: Add SignalInterrupt class and use in shutdown.cpp” (f6cbafb31f36da2e003fa75a31207cd11b8c23ef)

    Maybe say “Interrupt object” instead of “Interrupt functionality” because the comment is supposed to be documenting a struct member.

  52. in src/util/threadinterrupt.h:23 in f6cbafb31f outdated
    23+ * until reset.
    24+ *
    25+ * This class should not be used in a signal handler. If sending an interrupt
    26+ * from a signal handler is necessary, the \ref SignalInterrupt class can be
    27+ * used instead. See the commit message of
    28+ * https://github.com/bitcoin/bitcoin/commit/cd03513dc2fcccaa142e9632a28b38efd0056436
    


    ryanofsky commented at 7:55 pm on June 20, 2023:

    In commit “util: Add SignalInterrupt class and use in shutdown.cpp” (f6cbafb31f36da2e003fa75a31207cd11b8c23ef)

    I don’t know how helpful that git commit message is, and probably be better not to link to a github url if there can be a more direct explanation.

    I’m actually not sure what question the link is trying to answer, but if more explanation is needed here, I think this comment can just say that thread interrupt class shouldn’t be called from a signal handler because it uses thread synchronization primitives that aren’t safe to use with signals (see https://man7.org/linux/man-pages/man7/signal-safety.7.html)

  53. in src/kernel/notifications_interface.h:33 in 6318b09ab1 outdated
    26@@ -27,6 +27,11 @@ class Notifications
    27     virtual void headerTip(SynchronizationState state, int64_t height, int64_t timestamp, bool presync) {}
    28     virtual void progress(const bilingual_str& title, int progress_percent, bool resume_possible) {}
    29     virtual void warning(const bilingual_str& warning) {}
    30+
    31+    //! The flush error notification is sent to notify the user that an error
    32+    //! occurred while flushing block data to disk. Functions notifying about
    33+    //! this error may not return an error code or raise an exception.
    


    ryanofsky commented at 8:09 pm on June 20, 2023:

    In commit “kernel: Add flushError method to notifications” (6318b09ab123fdcdec8ee46ce28f01a2d7c2dce3)

    Would replace “Functions notifying about this error may not return an error code or raise an exception.” because its unclear what functions that is refers to and it doesn’t suggest how a kernel application should handle a flush error. Would replace it with “Kernel code ignores flush errors that don’t affect the immediate operation it is trying to perform. Applications can choose to handle the flush error notification by logging the error, or notifying the user, or triggering an early shutdown as a precaution against causing more errors.”

  54. in src/kernel/notifications_interface.h:40 in e0653c2223 outdated
    32@@ -32,6 +33,12 @@ class Notifications
    33     //! occurred while flushing block data to disk. Functions notifying about
    34     //! this error may not return an error code or raise an exception.
    35     virtual void flushError(const std::string& debug_message) {}
    36+
    37+    //! The fatal error notification is sent to notify the user and start
    38+    //! shutting down if an error happens in kernel code that can't be recovered
    39+    //! from. After this notification is sent, whatever function triggered the
    40+    //! error should also return an error code or raise an exception.
    


    ryanofsky commented at 4:00 pm on June 22, 2023:

    In commit “kernel: Add fatalError method to notifications” (e0653c2223f8d6d74645c9b5742de1c069d8c3a7)

    Maybe add “This notification can be used to log the error, or notify the user, trigger an application shutdown to prevent causing more errors.”

  55. ryanofsky approved
  56. ryanofsky commented at 4:03 pm on June 22, 2023: contributor
    Code review ACK e0653c2223f8d6d74645c9b5742de1c069d8c3a7. Thanks for the updates! Just suggested a few small changes to comments
  57. achow101 referenced this in commit 6a473373d4 on Jun 22, 2023
  58. DrahtBot added the label Needs rebase on Jun 22, 2023
  59. sidhujag referenced this in commit 770b363c5c on Jun 22, 2023
  60. TheCharlatan force-pushed on Jun 27, 2023
  61. TheCharlatan commented at 9:43 pm on June 27, 2023: contributor

    Updated e0653c2223f8d6d74645c9b5742de1c069d8c3a7 -> 0c2fd16275ae078e33d71b0c9da8034075998553 (kernelInterrupt_6 -> kernelInterrupt_7, compare).

    • Adjusted and simplified documentation as suggested by @ryanofsky

    Rebased 0c2fd16275ae078e33d71b0c9da8034075998553 -> 001acf37dff40688d82c3b1a7ff9ed3901addb33 (kernelInterrupt_7 -> kernelInterrupt_8, compare).

  62. ryanofsky approved
  63. ryanofsky commented at 0:10 am on June 28, 2023: contributor

    Code review ACK 001acf37dff40688d82c3b1a7ff9ed3901addb33. This looks very good, and hopefully can be merged soon with more review.

    To encourage review, I would maybe rewrite the PR title and description to be a little less abstract, and describe the benefits concretely. For title, would suggest “kernel: Remove all ShutdownRequested and AbortNode calls from validation code.” For description would suggest: “Replace all AbortNode calls in validation code with new fatal error and flush error notifications so kernel applications can be notified about failures and choose how to handle them. Also, get rid of all ShutdownRequested calls in validation code by introducing an interrupt object that applications can use to cancel long-running kernel operations. This PR mostly allows dropping the kernel dependency on shutdown.cpp. The only dependency left after this is a StartShutdown call which will be removed in followup PR #27711. This PR also drops the last reference to the uiInterface global in kernel code.”

    Only changes since last review are rebase and some documentation updates. It does need another rebase, though, because there is a new minor conflict with #27896

  64. util: Add SignalInterrupt class and use in shutdown.cpp
    This change helps generalize shutdown code so an interrupt can be
    provided to libbitcoinkernel callers. This may also be useful to
    eventually de-globalize all of the shutdown code.
    
    Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
    Co-authored-by: TheCharlatan <seb.kung@gmail.com>
    e2d680a32d
  65. kernel: Pass interrupt reference to chainman
    This and the following commit seek to decouple the libbitcoinkernel
    library from the shutdown code. As a library, it should it should have
    its own flexible interrupt infrastructure without relying on node-wide
    globals.
    
    The commit takes the first step towards this goal by de-globalising
    `ShutdownRequested` calls in kernel code.
    
    Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
    Co-authored-by: TheCharlatan <seb.kung@gmail.com>
    edb55e2777
  66. scripted-diff: Rename FatalError to FatalErrorf
    This is done in preparation for the next commit where a new FatalError
    function is introduced. FatalErrorf follows common convention to append
    'f' for functions accepting format arguments.
    
    -BEGIN VERIFY SCRIPT-
    sed -i 's/FatalError/FatalErrorf/g' $( git grep -l 'FatalError')
    -END VERIFY SCRIPT-
    3fa9094b92
  67. kernel: Add flushError method to notifications
    This is done in addition with the following commit. Both have the goal
    of getting rid of direct calls to AbortNode from kernel code. This extra
    flushError method is added to notify specifically about errors that
    arrise when flushing (syncing) block data to disk. Unlike other
    instances, the current calls to AbortNode in the blockstorage flush
    functions do not report an error to their callers.
    
    This commit is part of the libbitcoinkernel project and further removes
    the shutdown's and, more generally, the kernel library's dependency on
    interface_ui with a kernel notification method. By removing interface_ui
    from the kernel library, its dependency on boost is reduced to just
    boost::multi_index. At the same time it also takes a step towards
    de-globalising the interrupt infrastructure.
    7320db96f8
  68. kernel: Add fatalError method to notifications
    FatalError replaces what previously was the AbortNode function in
    shutdown.cpp.
    
    This commit is part of the libbitcoinkernel project and further removes
    the shutdown's and, more generally, the kernel library's dependency on
    interface_ui with a kernel notification method. By removing interface_ui
    from the kernel library, its dependency on boost is reduced to just
    boost::multi_index. At the same time it also takes a step towards
    de-globalising the interrupt infrastructure.
    
    Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
    Co-authored-by: TheCharlatan <seb.kung@gmail.com>
    6eb33bd0c2
  69. TheCharlatan force-pushed on Jun 28, 2023
  70. TheCharlatan commented at 9:53 am on June 28, 2023: contributor

    Rebased 001acf37dff40688d82c3b1a7ff9ed3901addb33 -> 6eb33bd0c21b3e075fbab596351cacafdc947472 (kernelInterrupt_8 -> kernelInterrupt_9, compare).

  71. TheCharlatan renamed this:
    kernel: Add interrupt class
    kernel: Rm ShutdownRequested and AbortNode from validation code.
    on Jun 28, 2023
  72. DrahtBot removed the label Needs rebase on Jun 28, 2023
  73. in src/kernel/context.cpp:32 in 6eb33bd0c2
    28@@ -26,6 +29,8 @@ Context::Context()
    29 Context::~Context()
    30 {
    31     ECC_Stop();
    32+    assert(g_context);
    


    hebasto commented at 1:32 pm on June 28, 2023:
    Would it be safer to move the assertion to the top of the desctuctor’s code?
  74. hebasto commented at 1:32 pm on June 28, 2023: member
    ACK 6eb33bd0c21b3e075fbab596351cacafdc947472, I have reviewed the code and it looks OK.
  75. DrahtBot requested review from ryanofsky on Jun 28, 2023
  76. ryanofsky approved
  77. ryanofsky commented at 2:17 pm on June 28, 2023: contributor

    Code review ACK 6eb33bd0c21b3e075fbab596351cacafdc947472. No changes since last review other than rebase.

    If I can make one more suggestion about PR the description, I would suggest to make “Replace all AbortNode calls…” and “Get rid of all ShutdownRequested calls..” the first and main paragraph, and put everything else that’s there below it separated by a horizontal line. I think the other text which is abstract and relies on explanations in linked PRs is going to be intimidating and not to make much sense to people who aren’t following the libbitcoinkernel project closely, even if they are familiar with this code.

    I think the PR makes sense by itself and the “Replace all AbortNode calls…” and “Get rid of all ShutdownRequested calls..” text describes the changes and immediate benefits. If reviewers want to go down the rabbit hole and read about libbitcoinkernel goals and phases, that’s great, but it shouldn’t be a prerequisite for understanding and reviewing the actual code.


    EDIT: When first posted this comment I didn’t realize you broke apart AbortNode and ShutdownRequested explanations into separate paragraphs and added the “As a standalone library…” preamble. Would also drop the preamble or move it to the end, because I think the point about code dependencies is incidental, and then main benefit this PR provides are interfaces for applications using the kernel library to:

    1. Cancel long running operations
    2. Receive notifications about errors
  78. ryanofsky commented at 5:52 pm on July 6, 2023: contributor
    If anyone else wants to review: this PR has two acks and is a pure refactoring and I think could be merged if there is another reviewer.
  79. achow101 commented at 6:33 pm on July 6, 2023: member

    light ACK 6eb33bd0c21b3e075fbab596351cacafdc947472

    I’m not particularly experienced with this area of the code, so I can’t really comment on approach taken here. However mechanically, the code looks find and correct to me and seems to achieve the stated objectives.

  80. ryanofsky merged this on Jul 6, 2023
  81. ryanofsky closed this on Jul 6, 2023

  82. sidhujag referenced this in commit 61285e8985 on Jul 7, 2023
  83. in src/util/signalinterrupt.h:35 in e2d680a32d outdated
    30+public:
    31+    SignalInterrupt();
    32+    explicit operator bool() const;
    33+    void operator()();
    34+    void reset();
    35+    void wait();
    


    MarcoFalke commented at 8:34 am on July 7, 2023:
    Would it make sense to document that all of these may throw, except for bool()?

    MarcoFalke commented at 8:57 am on July 7, 2023:
    Also, it would be good to explain why in the following commits it is fine to then ignore the exceptions and no longer catch them.

    ryanofsky commented at 11:43 pm on July 10, 2023:

    Also, it would be good to explain why in the following commits it is fine to then ignore the exceptions and no longer catch them.

    I probably just missed something, but there shouldn’t be places in the following commits where exceptions are no longer caught. This should just be a refactoring.

    I agree though that throwing undocumented exceptions is not good. And probably getting rid of exceptions and using nodiscard return values would be better than documenting them. The reason these are using exceptions is I was trying to make the SignalInterrupt interface consistent with the CThreadInterrupt interface, and there wasn’t a good way to return an error from the constructor.

    I will follow up with this in #28051

  84. ryanofsky referenced this in commit d9453bddd6 on Jul 7, 2023
  85. in src/index/base.h:98 in 6eb33bd0c2
    93@@ -94,6 +94,9 @@ class BaseIndex : public CValidationInterface
    94 
    95     virtual bool AllowPrune() const = 0;
    96 
    97+    template <typename... Args>
    98+    void FatalErrorf(const char* fmt, const Args&... args);
    


    MarcoFalke commented at 7:19 am on July 8, 2023:

    in 6eb33bd0c21b3e075fbab596351cacafdc947472: What is this?

    It would be good to split this up in it’s own commit, so that it is possible to add a description as to why the changes are needed and beneficial.

    Currently it looks like a mistake, because my understanding is that it will just lead to a linker error to call this function somewhere outside of base.cpp


    MarcoFalke commented at 7:32 am on July 8, 2023:
    Ah nvm. I see it is private and probably needed to access the m_... member.
  86. in src/shutdown.cpp:13 in 6eb33bd0c2
    36-
    37-void InitShutdownState(std::atomic<int>& exit_status)
    38-{
    39-    g_exit_status = &exit_status;
    40-}
    41+#include <assert.h>
    


    MarcoFalke commented at 7:23 am on July 8, 2023:
    nit in 6eb33bd0c21b3e075fbab596351cacafdc947472: Why are you replacing cassert with assert.h?
  87. MarcoFalke commented at 7:29 am on July 8, 2023: member

    Concept ACK, left some questions.

    Concept ACK 6eb33bd0c21b3e075fbab596351cacafdc947472 🗼

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: Concept ACK 6eb33bd0c21b3e075fbab596351cacafdc947472 🗼
    3NeAejYNkmlMXartwqcjqc2A5dHtRJ8FfNWSSKXnDRNWaaFh+q2iMhmXxlc2pMsdyhs4X2otttjtOoOTRLqrhAQ==
    
  88. ryanofsky referenced this in commit 24169b10d8 on Jul 10, 2023
  89. ryanofsky referenced this in commit a362f30ac6 on Jul 13, 2023
  90. ryanofsky referenced this in commit 0dce004268 on Jul 13, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-01 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me