event loop: tolerate unexpected exceptions in `post()` callbacks #260

pull hulxv wants to merge 1 commits into bitcoin-core:master from hulxv:fix/deadlock-on-exception changing 1 files +7 −1
  1. hulxv commented at 7:55 PM on March 19, 2026: contributor

    If a function posted via EventLoop::post() threw an exception, the event loop would exit without resetting m_post_fn or notifying the condition variable, permanently deadlocking the calling thread in post().

    This change catches the exception instead, logs it, and keeps the event loop running so the caller is unblocked and other I/O events continue to be processed.

    Fix #259

  2. DrahtBot commented at 7:55 PM on March 19, 2026: none

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK ryanofsky

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  3. in src/mp/proxy.cpp:250 in c6ad6057f0
     244 | @@ -239,13 +245,26 @@ void EventLoop::loop()
     245 |      kj::Own<kj::AsyncIoStream> wait_stream{
     246 |          m_io_context.lowLevelProvider->wrapSocketFd(m_wait_fd, kj::LowLevelAsyncIoProvider::TAKE_OWNERSHIP)};
     247 |      int post_fd{m_post_fd};
     248 | +    KJ_DEFER({
     249 | +        Lock lock(m_mutex);
     250 | +        m_cv.wait(lock.m_lock, [this]() MP_REQUIRES(m_mutex) { return m_num_clients == 0; });
    


    ryanofsky commented at 6:53 PM on March 23, 2026:

    In commit "fix: EventLoop::post() deadlocks when posted function throws" (c6ad6057f0ce488f483f2b00a8af7a15e3a73787)

    It seems like waiting for clients to be released here might just replace one deadlock with another deadlock in the general case m_post_fn throws and there is not just a single client.

    I tend to think if m_post_fn is going to throws, reasonable things to do might be:

    • Abort so this is detected as a bug
    • Or keep executing the event loop, but log an error
    • Or let the exception immediately propagate to the caller like happens currently.

    Trying to propagate the exception, but waiting for resources to be freed and possibly hanging seems like it adds more complexity without providing a reliable benefit


    hulxv commented at 6:41 AM on March 27, 2026:

    oh ok, I didn't notice that. thanks for catching it.

    I tend to think if m_post_fn is going to throws, reasonable things to do might be:

    * Abort so this is detected as a bug
    
    * Or keep executing the event loop, but log an error
    
    * Or let the exception immediately propagate to the caller like happens currently.

    I think the best and simplest solution here is keeping the executing of the event loop and just log an error. what do you think?


    ryanofsky commented at 9:14 AM on March 27, 2026:

    re: #260 (review)

    I think the best and simplest solution here is keeping the executing of the event loop and just log an error. what do you think?

    Yes I think logging the exception here would be a simple improvement that should help debugging. Would recommend kj::runCatchingExceptions for that and kj::str() to turn the exception into a string.

    Using catch (...) { eptr = std::current_exception(); } to catch the exception in the event loop thread and std::rethrow_exception(eptr); to rethrow it EventLoop::post() could be a different improvement that would help debugging, but I don't know if it would add too much complexity.

    Keeping the event loop running after the exception is thrown should also be an improvement.


    hulxv commented at 7:54 PM on March 27, 2026:

    done

  4. hulxv force-pushed on Mar 27, 2026
  5. hulxv requested review from ryanofsky on Mar 30, 2026
  6. ryanofsky approved
  7. ryanofsky commented at 12:29 PM on March 30, 2026: collaborator

    Code review ACK 5cf327871f9f6ff187bf0904005dab89d8136991. Thanks for the logging improvement!

    IMO it would be good to update the commit title, pr title, and PR description to clarify what this change does now, but the change itself looks good.

  8. in src/mp/proxy.cpp:249 in 5cf327871f outdated
     244 | @@ -245,7 +245,9 @@ void EventLoop::loop()
     245 |          if (read_bytes != 1) throw std::logic_error("EventLoop wait_stream closed unexpectedly");
     246 |          Lock lock(m_mutex);
     247 |          if (m_post_fn) {
     248 | -            Unlock(lock, *m_post_fn);
     249 | +            KJ_IF_MAYBE(exception, kj::runCatchingExceptions([&]() { Unlock(lock, *m_post_fn); })) {
     250 | +                MP_LOG(*this, Log::Error) << "EventLoop: m_post_fn threw: " << kj::str(*exception).cStr();
    


    ryanofsky commented at 12:38 PM on March 30, 2026:

    In commit "fix: EventLoop::post() deadlocks when posted function throws" (5cf327871f9f6ff187bf0904005dab89d8136991)

    Might be good to add a comment here that m_post_fn throwing is never expected to happen, and if it does happen there will likely be other bugs because EventLoop::post will return without an indication of failure.


    hulxv commented at 6:44 AM on March 31, 2026:

    done!

  9. ryanofsky approved
  10. ryanofsky commented at 12:47 PM on March 30, 2026: collaborator

    Thanks for the logging improvement!

    Forgot when I wrote this that this change isn't only a logging improvement, but it also changes execution because EventLoop::post will now return on failure instead of hanging, and other I/O events will now continue to be processed. Would be good to mention these things in the pr/commit description and maybe change title to something like "event loop: tolerate unexpected exceptions in post() callbacks"

  11. hulxv renamed this:
    fix: `EventLoop::post()` deadlocks when posted function throws
    event loop: tolerate unexpected exceptions in `post()` callbacks
    on Mar 31, 2026
  12. hulxv force-pushed on Mar 31, 2026
  13. hulxv commented at 6:45 AM on March 31, 2026: contributor

    is there anything else that should be added?

  14. hulxv requested review from ryanofsky on Mar 31, 2026
  15. ryanofsky approved
  16. ryanofsky commented at 2:18 PM on April 2, 2026: collaborator

    Code review ACK 7b26641aa3eb6cfd63f075d029cd4318b3f35a45. Thanks for the updates!

    I am increasingly thinking the best thing would be for EventLoop::post to throw an exception when the callback throws an exception, instead of not returning like happens currently, or returning without failing like happens with this PR. (std::current_exception std::rethrow_exception could be useful for this.)

    But It is probably not worth adding much complexity for this case since it is never expected to happen. And the logging added by this PR should at least be helpful for debugging. Just commenting to suggest possible followups.

  17. ryanofsky commented at 9:14 PM on April 3, 2026: collaborator

    I was about to merge this but I see IWYU failures when building locally (I didn't notice that CI was skipped for this PR):

    mp/src/mp/proxy.cpp should add these lines:
    #include <kj/exception.h>       // for Exception, runCatchingExceptions
    

    I think these need to be fixed before this can be merged.

  18. hulxv commented at 11:42 PM on April 3, 2026: contributor

    Done! Can you rerun the ci again?

  19. ryanofsky commented at 11:30 AM on April 4, 2026: collaborator

    Done! Can you rerun the ci again?

    Sure. Also would recommend squashing commits https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#squashing-commits

  20. ryanofsky commented at 6:17 PM on April 6, 2026: collaborator

    Code review 0dfe095015d200d0f1e0dd74bdc478df784bdb09. Looks like there is another CI failure https://github.com/bitcoin-core/libmultiprocess/actions/runs/23966308399/job/69937803736?pr=260. Can be fixed with:

    --- a/src/mp/proxy.cpp
    +++ b/src/mp/proxy.cpp
    @@ -249,7 +249,7 @@ void EventLoop::loop()
                 // m_post_fn throwing is never expected. If it does happen, the caller
                 // of EventLoop::post() will return without any indication of failure,
                 // which will likely cause other bugs. Log the error and continue.
    -            KJ_IF_MAYBE(exception, kj::runCatchingExceptions([&]() { Unlock(lock, *m_post_fn); })) {
    +            KJ_IF_MAYBE(exception, kj::runCatchingExceptions([&]() MP_REQUIRES(m_mutex) { Unlock(lock, *m_post_fn); })) {
                     MP_LOG(*this, Log::Error) << "EventLoop: m_post_fn threw: " << kj::str(*exception).cStr();
                 }
                 m_post_fn = nullptr;
    
  21. event loop: tolerate unexpected exceptions in `post()` callbacks b8a48c65e6
  22. hulxv force-pushed on Apr 13, 2026
  23. hulxv commented at 4:23 PM on April 13, 2026: contributor

    Can you rerun the ci again please?

  24. hulxv commented at 11:59 AM on April 14, 2026: contributor

    I think the ci failure is not related

    sh: rsync: not found
    rsync: connection unexpectedly closed (0 bytes received so far) [sender]
    rsync error: error in rsync protocol data stream (code 12) at io.c(232) [sender=3.2.7]
    The process '/usr/bin/rsync' failed with exit code 12
    
  25. ryanofsky approved
  26. ryanofsky commented at 12:22 PM on April 14, 2026: collaborator

    Code review ACK b8a48c65e600681e51c658b10b3b6f9a8c890201. Since last review just squashed commits and added missing thread safety annotation. This change is a probably an improvement from a debugging perspective since it will now always log an error when an unexpected exception happens. It is not a clear improvement in behavior since continuing after an unexpected failure could potentially be worse than hanging, but the difference is unlikely to be significant. Probably the best thing would be to catch and rethrow the exception.

    Planning to merge this since the logging could be helpful

  27. ryanofsky merged this on Apr 14, 2026
  28. ryanofsky closed this on Apr 14, 2026


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/libmultiprocess. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-20 17:30 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me