bug: fix mptest hang, ProxyClient<Thread> deadlock in disconnect handler #201

ryanofsky wants to merge 6 commits into bitcoin-core:master from ryanofsky:pr/context, changing 7 files +152 −73
  1. ryanofsky commented at 8:16 pm on September 2, 2025: collaborator

    This bug is currently causing the mptest “disconnecting and blocking” test to occasionally hang, as reported by maflcko in https://github.com/bitcoin/bitcoin/issues/33244.

    The bug was actually first reported by Sjors in https://github.com/Sjors/bitcoin/pull/90#issuecomment-3024942087 and there are more details about it in #189.

    The bug is caused by the “disconnecting and blocking” test triggering a disconnect right before a server IPC call returns. This results in a race between the IPC server thread and the onDisconnect handler in the event loop thread, with both trying to destroy the server’s request_threads ProxyClient<Thread> object when the IPC call is done.

    The code lacked synchronization in this case, which is fixed here by adding loop->sync() calls in various places. There were also lock-order issues where Waiter::m_mutex could be incorrectly locked before EventLoop::m_mutex, resulting in a deadlock.
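    A minimal sketch of the hazard and the loop->sync() fix (illustrative only, assuming the types from mp/proxy-io.h referenced elsewhere in this thread; the cleanup_request_thread_* helpers are hypothetical names, not functions in the PR):

    ```cpp
    // Assumes the mp/proxy-io.h declarations referenced elsewhere in this thread.
    #include <mp/proxy-io.h>
    #include <mutex>

    // Buggy pattern: destroying a ProxyClient<Thread> ends up calling
    // EventLoop::post(), which blocks until the event loop thread services it,
    // so it must not happen while Waiter::m_mutex is held on another thread.
    void cleanup_request_thread_buggy(mp::EventLoop& loop, mp::ThreadContext& thread_context, mp::Connection* connection)
    {
        std::unique_lock<std::mutex> lock(thread_context.waiter->m_mutex);
        thread_context.request_threads.erase(connection); // ~ProxyClient<Thread> posts to the loop under the lock
    }

    // Fixed pattern: hop onto the event loop thread first. There post() runs
    // callables inline, and the disconnect handler (which also runs on the event
    // loop thread) cannot run concurrently, so the erase is safe.
    void cleanup_request_thread_fixed(mp::EventLoop& loop, mp::ThreadContext& thread_context, mp::Connection* connection)
    {
        loop.sync([&] {
            std::unique_lock<std::mutex> lock(thread_context.waiter->m_mutex);
            thread_context.request_threads.erase(connection);
        });
    }
    ```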

  2. DrahtBot commented at 8:16 pm on September 2, 2025: none

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type | Reviewers
    ACK | Sjors, Eunovo

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #210 (Replace KJ_DEFER with kj::defer by Sjors)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. ryanofsky renamed this:
    bug: fix ProxyClient<Thread> deadlock if disconnected as IPC call is returning
    bug: fix mptest hang, ProxyClient<Thread> deadlock in disconnect handler
    on Sep 3, 2025
  4. ryanofsky commented at 4:19 pm on September 3, 2025: collaborator

    I captured two stack traces of the mptest hang on master (1b8d4a6f1e54b92708bd2ad627ec6d440a1daf3d) to be more confident this PR fixes the bug. I might also be able to update the test to make it trigger this bug more reliably, but would need to think more about how to do that. Both stack traces were identical and showed a deadlock between Waiter::m_mutex and EventLoop::post. The stack trace, annotated with source code, is below.

    • Thread 1 is the main test thread. It is stuck at the end of the “disconnecting and blocking” test trying to destroy a ProxyClient<FooInterface>, which is blocked waiting for EventLoop::post and EventLoop::m_mutex in order to free the client IPC handle it owns.
    • Thread 2 is the server thread created by ProxyServer<ThreadMap>::makeThread which executes async IPC calls. It is blocked in a similar place as thread 1, trying to destroy a ProxyClient<Thread> while waiting for EventLoop::post. But additionally, further up the stack in frame 22, in the type-context.h PassField function, it has locked Waiter::m_mutex to delete the ThreadContext::request_threads map entry containing the object being destroyed. This lock is the bug, because Waiter::m_mutex can’t be held while trying to call EventLoop::post, since this violates lock ordering.
    • Thread 3 is the EventLoop thread, which is processing the disconnected event. It is running the ProxyClient<Thread>::m_disconnect_cb callback, which is stuck trying to acquire Waiter::m_mutex in order to delete its ThreadContext::request_threads entry. It can’t acquire the mutex because thread 2 holds it. It is also trying to delete the same map entry as thread 2, which is an additional bug fixed in this PR by always deleting these entries on the event loop thread instead of letting thread 2 access the map directly. A self-contained reduction of this deadlock is sketched right after this list, before the raw traces.
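    The cycle between thread 2 and thread 3 can be reduced to a small standalone program (hypothetical names, not the mp:: implementation; a timeout is used so the demo terminates instead of hanging like the real test):

    ```cpp
    #include <chrono>
    #include <condition_variable>
    #include <functional>
    #include <iostream>
    #include <mutex>
    #include <thread>

    std::mutex waiter_mutex;             // stands in for Waiter::m_mutex
    std::mutex loop_mutex;               // stands in for EventLoop::m_mutex
    std::condition_variable loop_cv;
    std::function<void()> posted_fn;

    // Simplified EventLoop::post(): hand fn to the event loop thread and wait
    // until it has run. Returns false if the event loop never serviced it in time.
    bool post(std::function<void()> fn, std::chrono::seconds timeout)
    {
        std::unique_lock<std::mutex> lock(loop_mutex);
        posted_fn = std::move(fn);
        loop_cv.notify_all();
        return loop_cv.wait_for(lock, timeout, [] { return !posted_fn; });
    }

    int main()
    {
        // "Thread 2" (IPC server thread, played by main): take waiter_mutex first...
        std::unique_lock<std::mutex> waiter_lock(waiter_mutex);

        // "Thread 3" (event loop): its disconnect handler needs waiter_mutex before
        // it can get around to servicing posted work.
        std::thread event_loop([] {
            std::lock_guard<std::mutex> lk(waiter_mutex); // blocked while main holds it
            std::unique_lock<std::mutex> lock(loop_mutex);
            loop_cv.wait(lock, [] { return bool(posted_fn); });
            posted_fn();
            posted_fn = nullptr;
            loop_cv.notify_all();
        });

        // ...then call post() while still holding waiter_mutex: the lock-order
        // violation described above. The event loop can never run the posted fn.
        if (!post([] { std::cout << "cleanup ran\n"; }, std::chrono::seconds(1))) {
            std::cout << "deadlock: event loop never serviced post()\n";
        }
        waiter_lock.unlock(); // releasing waiter_mutex lets the event loop finish
        event_loop.join();
    }
    ```

    With the timeout removed, the program hangs exactly like the traces that follow: the server thread waits for the event loop to service post(), and the event loop waits for the waiter mutex.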
     0[#0](/bitcoin-core-multiprocess/0/)  0x00007f480509438e in __futex_abstimed_wait_common () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
     1[#1](/bitcoin-core-multiprocess/1/)  0x00007f4805097010 in pthread_cond_wait@@GLIBC_2.3.2 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
     2[#2](/bitcoin-core-multiprocess/2/)  0x000055df0ff8bdbb in std::condition_variable::wait<mp::EventLoop::post(kj::Function<void ()>)::$_2>(std::unique_lock<std::mutex>&, mp::EventLoop::post(kj::Function<void ()>)::$_2) (this=0x7f4804ffed08, __lock=..., __p=...)
     3    at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/condition_variable:105
     4[#3](/bitcoin-core-multiprocess/3/)  mp::EventLoop::post (this=0x7f4804ffec90, fn=...) at mp/src/mp/proxy.cpp:273
     5
     6   269      Unlock(lock, [&] {
     7   270          char buffer = 0;
     8   271          KJ_SYSCALL(write(post_fd, &buffer, 1));
     9   272      });
    10>  273      m_cv.wait(lock.m_lock, [this, &fn]() MP_REQUIRES(m_mutex) { return m_post_fn != &fn; });
    11   274  }
    12
    13[#4](/bitcoin-core-multiprocess/4/)  0x000055df0feefd7d in mp::EventLoop::sync<mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const::{lambda()#1}>(mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const::{lambda()#1}&&) (this=0x7f4804ffec90, callable=...) at mp/include/mp/proxy-io.h:182
    14
    15   179      template <typename Callable>
    16   180      void sync(Callable&& callable)
    17   181      {
    18 > 182          post(std::forward<Callable>(callable));
    19   183      }
    20
    21[#5](/bitcoin-core-multiprocess/5/)  0x000055df0feefc52 in mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const (this=<optimized out>)
    22    at mp/include/mp/proxy-io.h:434
    23
    24   431          Sub::destroy(*this);
    25   432
    26   433          // FIXME: Could just invoke removed addCleanup fn here instead of duplicating code
    27 > 434          m_context.loop->sync([&]() {
    28   435              // Remove disconnect callback on cleanup so it doesn't run and try
    29   436              // to access this object after it's destroyed. This call needs to
    30   437              // run inside loop->sync() on the event loop thread because
    31   438              // otherwise, if there were an ill-timed disconnect, the
    32   439              // onDisconnect handler could fire and delete the Connection object
    33   440              // before the removeSyncCleanup call.
    34   441              if (m_context.connection) m_context.connection->removeSyncCleanup(disconnect_cb);
    35
    36[#6](/bitcoin-core-multiprocess/6/)  std::__invoke_impl<void, mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}&>(std::__invoke_other, mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}&) (__f=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:61
    37[#7](/bitcoin-core-multiprocess/7/)  std::__invoke_r<void, mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}&>(mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}&) (__fn=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:111
    38[#8](/bitcoin-core-multiprocess/8/)  std::_Function_handler<void (), mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}>::_M_invoke(std::_Any_data const&) (__functor=...)
    39    at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:290
    40[#9](/bitcoin-core-multiprocess/9/)  0x000055df0ff52d26 in std::function<void()>::operator() (this=0x7fffb7c1b700) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:591
    41[#10](/bitcoin-core-multiprocess/10/) mp::CleanupRun (fns=empty std::__cxx11::list) at mp/include/mp/proxy.h:43
    42
    43    39  inline void CleanupRun(CleanupList& fns) {
    44    40      while (!fns.empty()) {
    45    41          auto fn = std::move(fns.front());
    46    42          fns.pop_front();
    47>   43          fn();
    48    44      }
    49    45  }
    50
    51[#11](/bitcoin-core-multiprocess/11/) 0x000055df0ff872e7 in mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::~ProxyClientBase (this=0x7f4800027cf0) at mp/include/mp/proxy-io.h:460
    52
    53   457  template <typename Interface, typename Impl>
    54   458  ProxyClientBase<Interface, Impl>::~ProxyClientBase() noexcept
    55   459  {
    56>  460      CleanupRun(m_context.cleanup_fns);
    57   461  }
    58
    59[#12](/bitcoin-core-multiprocess/12/) 0x000055df0feebfc4 in std::default_delete<mp::ProxyClient<mp::test::messages::FooInterface> >::operator() (this=0x7fffb7c1b830, __ptr=0x7f4800027cf0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:93
    60[#13](/bitcoin-core-multiprocess/13/) std::__uniq_ptr_impl<mp::ProxyClient<mp::test::messages::FooInterface>, std::default_delete<mp::ProxyClient<mp::test::messages::FooInterface> > >::reset (this=0x7fffb7c1b830, __p=0x0)
    61    at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:205
    62[#14](/bitcoin-core-multiprocess/14/) std::unique_ptr<mp::ProxyClient<mp::test::messages::FooInterface>, std::default_delete<mp::ProxyClient<mp::test::messages::FooInterface> > >::reset (this=0x7fffb7c1b830, __p=0x0)
    63    at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:504
    64[#15](/bitcoin-core-multiprocess/15/) mp::test::TestSetup::~TestSetup (this=this@entry=0x7fffb7c1b7d8) at mp/test/mp/test/test.cpp:103
    65
    66    98      ~TestSetup()
    67    99      {
    68   100          // Test that client cleanup_fns are executed.
    69   101          bool destroyed = false;
    70   102          client->m_context.cleanup_fns.emplace_front([&destroyed] { destroyed = true; });
    71>  103          client.reset();
    72   104          KJ_EXPECT(destroyed);
    73   105
    74   106          thread.join();
    75   107      }
    76
    77[#16](/bitcoin-core-multiprocess/16/) 0x000055df0fee997f in mp::test::TestCase251::run (this=<optimized out>) at mp/test/mp/test/test.cpp:298
    78
    79   292      // Now that the disconnect has been detected, set signal allowing the
    80   293      // callFnAsync() IPC call to return. Since signalling may not wake up the
    81   294      // thread right away, it is important for the signal variable to be declared
    82   295      // *before* the TestSetup variable so is not destroyed while
    83   296      // signal.get_future().get() is called.
    84   297      signal.set_value();
    85>  298  }
    86
    87[#17](/bitcoin-core-multiprocess/17/) 0x00007f4805a34291 in kj::Maybe<kj::Exception> kj::runCatchingExceptions<kj::TestRunner::run()::{lambda()#1}>(kj::TestRunner::run()::{lambda()#1}&&) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-test.so.1.1.0
    88[#18](/bitcoin-core-multiprocess/18/) 0x00007f4805a33a08 in kj::TestRunner::run() () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-test.so.1.1.0
    89[#19](/bitcoin-core-multiprocess/19/) 0x00007f4805a337bd in kj::Function<kj::MainBuilder::Validity ()>::Impl<kj::_::BoundMethod<kj::TestRunner&, kj::TestRunner::getMain()::{lambda(auto:1&, (auto:2&&)...)#7}, kj::TestRunner::getMain()::{lambda(auto:1&, (auto:2&&)...)#8}> >::operator()() ()
    90   from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-test.so.1.1.0
    91[#20](/bitcoin-core-multiprocess/20/) 0x00007f480572f93d in kj::MainBuilder::MainImpl::operator()(kj::StringPtr, kj::ArrayPtr<kj::StringPtr const>) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj.so.1.1.0
    92[#21](/bitcoin-core-multiprocess/21/) 0x00007f480572df56 in kj::Maybe<kj::Exception> kj::runCatchingExceptions<kj::runMainAndExit(kj::ProcessContext&, kj::Function<void (kj::StringPtr, kj::ArrayPtr<kj::StringPtr const>)>&&, int, char**)::$_0>(kj::runMainAndExit(kj::ProcessContext&, kj::Function<void (kj::StringPtr, kj::ArrayPtr<kj::StringPtr const>)>&&, int, char**)::$_0&&) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj.so.1.1.0
    93[#22](/bitcoin-core-multiprocess/22/) 0x00007f480572dc7f in kj::runMainAndExit(kj::ProcessContext&, kj::Function<void (kj::StringPtr, kj::ArrayPtr<kj::StringPtr const>)>&&, int, char**) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj.so.1.1.0
    94[#23](/bitcoin-core-multiprocess/23/) 0x00007f4805a329b9 in main () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-test.so.1.1.0
    95[#24](/bitcoin-core-multiprocess/24/) 0x00007f480502a47e in __libc_start_call_main () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    96[#25](/bitcoin-core-multiprocess/25/) 0x00007f480502a539 in __libc_start_main_impl () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    97[#26](/bitcoin-core-multiprocess/26/) 0x000055df0fee6eb5 in _start ()
    
      0[#0](/bitcoin-core-multiprocess/0/)  0x00007f480509438e in __futex_abstimed_wait_common () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
      1[#1](/bitcoin-core-multiprocess/1/)  0x00007f4805097010 in pthread_cond_wait@@GLIBC_2.3.2 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
      2[#2](/bitcoin-core-multiprocess/2/)  0x000055df0ff8bd0b in std::condition_variable::wait<mp::EventLoop::post(kj::Function<void ()>)::$_0>(std::unique_lock<std::mutex>&, mp::EventLoop::post(kj::Function<void ()>)::$_0) (this=0x7f4804ffed08, __lock=..., __p=...)
      3    at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/condition_variable:105
      4[#3](/bitcoin-core-multiprocess/3/)  mp::EventLoop::post (this=this@entry=0x7f4804ffec90, fn=...) at mp/src/mp/proxy.cpp:266
      5
      6   258  void EventLoop::post(kj::Function<void()> fn)
      7   259  {
      8   260      if (std::this_thread::get_id() == m_thread_id) {
      9   261          fn();
     10   262          return;
     11   263      }
     12   264      Lock lock(m_mutex);
     13   265      EventLoopRef ref(*this, &lock);
     14>  266      m_cv.wait(lock.m_lock, [this]() MP_REQUIRES(m_mutex) { return m_post_fn == nullptr; });
     15   267      m_post_fn = &fn;
     16   268      int post_fd{m_post_fd};
     17   269      Unlock(lock, [&] {
     18   270          char buffer = 0;
     19   271          KJ_SYSCALL(write(post_fd, &buffer, 1));
     20   272      });
     21   273      m_cv.wait(lock.m_lock, [this, &fn]() MP_REQUIRES(m_mutex) { return m_post_fn != &fn; });
     22   274  }
     23
     24[#4](/bitcoin-core-multiprocess/4/)  0x000055df0ff8f22d in mp::EventLoop::sync<mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const::{lambda()#1}>(mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const::{lambda()#1}&&) (this=0x7f4804ffec90, callable=...) at mp/include/mp/proxy-io.h:182
     25
     26   179      template <typename Callable>
     27   180      void sync(Callable&& callable)
     28   181      {
     29>  182          post(std::forward<Callable>(callable));
     30   183      }
     31
     32[#5](/bitcoin-core-multiprocess/5/)  0x000055df0ff8f102 in mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const (this=<optimized out>) at mp/include/mp/proxy-io.h:434
     33
     34   431          Sub::destroy(*this);
     35   432
     36   433          // FIXME: Could just invoke removed addCleanup fn here instead of duplicating code
     37>  434          m_context.loop->sync([&]() {
     38   435              // Remove disconnect callback on cleanup so it doesn't run and try
     39   436              // to access this object after it's destroyed. This call needs to
     40   437              // run inside loop->sync() on the event loop thread because
     41   438              // otherwise, if there were an ill-timed disconnect, the
     42   439              // onDisconnect handler could fire and delete the Connection object
     43   440              // before the removeSyncCleanup call.
     44   441              if (m_context.connection) m_context.connection->removeSyncCleanup(disconnect_cb);
     45
     46[#6](/bitcoin-core-multiprocess/6/)  std::__invoke_impl<void, mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}&>(std::__invoke_other, mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}&) (__f=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:61
     47[#7](/bitcoin-core-multiprocess/7/)  std::__invoke_r<void, mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}&>(mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}&) (__fn=...)
     48    at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:111
     49[#8](/bitcoin-core-multiprocess/8/)  std::_Function_handler<void (), mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}>::_M_invoke(std::_Any_data const&) (__functor=...)
     50    at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:290
     51[#9](/bitcoin-core-multiprocess/9/)  0x000055df0ff52d26 in std::function<void()>::operator() (this=0x7f47ffffe870) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:591
     52[#10](/bitcoin-core-multiprocess/10/) mp::CleanupRun (fns=empty std::__cxx11::list) at mp/include/mp/proxy.h:43
     53
     54    39  inline void CleanupRun(CleanupList& fns) {
     55    40      while (!fns.empty()) {
     56    41          auto fn = std::move(fns.front());
     57    42          fns.pop_front();
     58>   43          fn();
     59    44      }
     60    45  }
     61
     62[#11](/bitcoin-core-multiprocess/11/) 0x000055df0ff8e627 in mp::ProxyClientBase<mp::Thread, capnp::Void>::~ProxyClientBase (this=0x7f47f8000e58) at mp/include/mp/proxy-io.h:460
     63
     64   457  template <typename Interface, typename Impl>
     65   458  ProxyClientBase<Interface, Impl>::~ProxyClientBase() noexcept
     66   459  {
     67>  460      CleanupRun(m_context.cleanup_fns);
     68   461  }
     69
     70[#12](/bitcoin-core-multiprocess/12/) 0x000055df0ff53129 in std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >::~pair (this=0x7f47f8000e50) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_iterator.h:3013
     71[#13](/bitcoin-core-multiprocess/13/) std::destroy_at<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > (__location=0x7f47f8000e50) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_construct.h:88
     72[#14](/bitcoin-core-multiprocess/14/) std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > > >::destroy<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > (__p=0x7f47f8000e50, __a=...)
     73    at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/alloc_traits.h:599
     74[#15](/bitcoin-core-multiprocess/15/) std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::_M_destroy_node (this=0x7f47fffff680, __p=0x7f47f8000e30) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:621
     75[#16](/bitcoin-core-multiprocess/16/) std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::_M_drop_node (this=0x7f47fffff680, __p=0x7f47f8000e30) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:629
     76[#17](/bitcoin-core-multiprocess/17/) std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::_M_erase (this=this@entry=0x7f47fffff680, __x=0x7f47f8000e30) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:1934
     77[#18](/bitcoin-core-multiprocess/18/) 0x000055df0ff530c9 in std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::clear (this=0x7f47fffff680) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:1251
     78[#19](/bitcoin-core-multiprocess/19/) std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::_M_erase_aux (this=0x7f47fffff680, __first={...}, __last={...}) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:2505
     79[#20](/bitcoin-core-multiprocess/20/) 0x000055df0ff70fa6 in std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::erase (this=0x7f4804ffed30, __x=<optimized out>) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:2519
     80[#21](/bitcoin-core-multiprocess/21/) std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::erase (this=0x7f4804ffed30, __x=<optimized out>)
     81    at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_map.h:1118
     82[#22](/bitcoin-core-multiprocess/22/) operator() (this=this@entry=0x7f47ffffeaf8) at mp/include/mp/type-context.h:103
     83
     84   102                      const bool erase_thread{inserted};
     85   103                      KJ_DEFER(if (erase_thread) {
     86   104                          std::unique_lock<std::mutex> lock(thread_context.waiter->m_mutex);
     87   105                          // Call erase here with a Connection* argument instead
     88   106                          // of an iterator argument, because the `request_thread`
     89   107                          // iterator may be invalid if the connection is closed
     90   108                          // during this function call. More specifically, the
     91   109                          // iterator may be invalid because SetThread adds a
     92   110                          // cleanup callback to the Connection destructor that
     93   111                          // erases the thread from the map, and also because the
     94   112                          // ProxyServer<Thread> destructor calls
     95   113                          // request_threads.clear().
     96>  114                          request_threads.erase(server.m_context.connection);
     97   115                      });
     98   116                      fn.invoke(server_context, args...);
     99   117                  }
    100
    101[#23](/bitcoin-core-multiprocess/23/) 0x000055df0ff709c7 in run (this=<optimized out>) at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/common.h:2007
    102[#24](/bitcoin-core-multiprocess/24/) ~Deferred (this=<optimized out>) at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/common.h:1996
    103[#25](/bitcoin-core-multiprocess/25/) operator() (this=0x7f4800034628) at mp/include/mp/type-context.h:117
    104
    105>  117                  }
    106
    107[#26](/bitcoin-core-multiprocess/26/) 0x000055df0ff04920 in kj::Function<void()>::operator() (this=0x7f47ffffeda0) at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/function.h:119
    108[#27](/bitcoin-core-multiprocess/27/) mp::Unlock<std::unique_lock<std::mutex>, kj::Function<void()>&> (lock=..., callback=...) at mp/include/mp/util.h:198
    109
    110   194  template <typename Lock, typename Callback>
    111   195  void Unlock(Lock& lock, Callback&& callback)
    112   196  {
    113   197      const UnlockGuard<Lock> unlock(lock);
    114>  198      callback();
    115   199  }
    116
    117[#28](/bitcoin-core-multiprocess/28/) 0x000055df0ff8d979 in mp::Waiter::wait<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1})::{lambda()#1}::operator()() const (this=<optimized out>) at mp/include/mp/proxy-io.h:296
    118
    119   284      template <class Predicate>
    120   285      void wait(std::unique_lock<std::mutex>& lock, Predicate pred)
    121   286      {
    122   287          m_cv.wait(lock, [&] {
    123   288              // Important for this to be "while (m_fn)", not "if (m_fn)" to avoid
    124   289              // a lost-wakeup bug. A new m_fn and m_cv notification might be sent
    125   290              // after the fn() call and before the lock.lock() call in this loop
    126   291              // in the case where a capnp response is sent and a brand new
    127   292              // request is immediately received.
    128   293              while (m_fn) {
    129   294                  auto fn = std::move(*m_fn);
    130   295                  m_fn.reset();
    131>  296                  Unlock(lock, fn);
    132   297              }
    133   298              const bool done = pred();
    134   299              return done;
    135   300          });
    136   301      }
    137
    138[#29](/bitcoin-core-multiprocess/29/) std::condition_variable::wait<mp::Waiter::wait<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1})::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::Waiter::wait<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1})::{lambda()#1}) (this=0x7f47f8000dd8, __lock=..., __p=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/condition_variable:104
    139[#30](/bitcoin-core-multiprocess/30/) mp::Waiter::wait<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}) (this=0x7f47f8000db0, lock=..., pred=...) at mp/include/mp/proxy-io.h:287
    140
    141>  287          m_cv.wait(lock, [&] {
    142
    143[#31](/bitcoin-core-multiprocess/31/) mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const (this=<optimized out>) at mp/src/mp/proxy.cpp:404
    144
    145   393  kj::Promise<void> ProxyServer<ThreadMap>::makeThread(MakeThreadContext context)
    146   394  {
    147   395      const std::string from = context.getParams().getName();
    148   396      std::promise<ThreadContext*> thread_context;
    149   397      std::thread thread([&thread_context, from, this]() {
    150   398          g_thread_context.thread_name = ThreadName(m_connection.m_loop->m_exe_name) + " (from " + from + ")";
    151   399          g_thread_context.waiter = std::make_unique<Waiter>();
    152   400          thread_context.set_value(&g_thread_context);
    153   401          std::unique_lock<std::mutex> lock(g_thread_context.waiter->m_mutex);
    154   402          // Wait for shutdown signal from ProxyServer<Thread> destructor (signal
    155   403          // is just waiter getting set to null.)
    156>  404          g_thread_context.waiter->wait(lock, [] { return !g_thread_context.waiter; });
    157   405      });
    158   406      auto thread_server = kj::heap<ProxyServer<Thread>>(*thread_context.get_future().get(), std::move(thread));
    159   407      auto thread_client = m_connection.m_threads.add(kj::mv(thread_server));
    160   408      context.getResults().setResult(kj::mv(thread_client));
    161   409      return kj::READY_NOW;
    162   410  }
    163
    164[#32](/bitcoin-core-multiprocess/32/) std::__invoke_impl<void, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0>(std::__invoke_other, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0&&) (__f=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:61
    165[#33](/bitcoin-core-multiprocess/33/) std::__invoke<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0>(mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0&&) (__fn=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:96
    166[#34](/bitcoin-core-multiprocess/34/) std::thread::_Invoker<std::tuple<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0> >::_M_invoke<0ul> (this=<optimized out>)
    167    at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_thread.h:301
    168[#35](/bitcoin-core-multiprocess/35/) std::thread::_Invoker<std::tuple<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0> >::operator() (this=<optimized out>)
    169    at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_thread.h:308
    170[#36](/bitcoin-core-multiprocess/36/) std::thread::_State_impl<std::thread::_Invoker<std::tuple<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0> > >::_M_run (this=<optimized out>)
    171    at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_thread.h:253
    172[#37](/bitcoin-core-multiprocess/37/) 0x00007f48054ed064 in execute_native_thread_routine () from /nix/store/7c0v0kbrrdc2cqgisi78jdqxn73n3401-gcc-14.2.1.20250322-lib/lib/libstdc++.so.6
    173[#38](/bitcoin-core-multiprocess/38/) 0x00007f4805097e63 in start_thread () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    174[#39](/bitcoin-core-multiprocess/39/) 0x00007f480511bdbc in __clone3 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    
     0[#0](/bitcoin-core-multiprocess/0/)  0x00007f480509463f in __lll_lock_wait () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
     1[#1](/bitcoin-core-multiprocess/1/)  0x00007f480509b392 in pthread_mutex_lock@@GLIBC_2.2.5 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
     2[#2](/bitcoin-core-multiprocess/2/)  0x000055df0ff8d324 in __gthread_mutex_lock (__mutex=0x7f47f8000db0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/x86_64-unknown-linux-gnu/bits/gthr-default.h:762
     3[#3](/bitcoin-core-multiprocess/3/)  std::mutex::lock (this=0x7f47f8000db0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_mutex.h:113
     4[#4](/bitcoin-core-multiprocess/4/)  std::unique_lock<std::mutex>::lock (this=<optimized out>) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_lock.h:147
     5[#5](/bitcoin-core-multiprocess/5/)  std::unique_lock<std::mutex>::unique_lock (__m=..., this=<optimized out>) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_lock.h:73
     6[#6](/bitcoin-core-multiprocess/6/)  mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0::operator()() const (
     7    this=0x7f47f8001170) at mp/src/mp/proxy.cpp:326
     8
     9   308  std::tuple<ConnThread, bool> SetThread(ConnThreads& threads, std::mutex& mutex, Connection* connection, const std::function<Thread::Client()>& make_thread)
    10   309  {
    11   310      const std::unique_lock<std::mutex> lock(mutex);
    12   311      auto thread = threads.find(connection);
    13   312      if (thread != threads.end()) return {thread, false};
    14   313      thread = threads.emplace(
    15   314          std::piecewise_construct, std::forward_as_tuple(connection),
    16   315          std::forward_as_tuple(make_thread(), connection, /* destroy_connection= */ false)).first;
    17   316      thread->second.setDisconnectCallback([&threads, &mutex, thread] {
    18   317          // Note: it is safe to use the `thread` iterator in this cleanup
    19   318          // function, because the iterator would only be invalid if the map entry
    20   319          // was removed, and if the map entry is removed the ProxyClient<Thread>
    21   320          // destructor unregisters the cleanup.
    22   321
    23   322          // Connection is being destroyed before thread client is, so reset
    24   323          // thread client m_disconnect_cb member so thread client destructor does not
    25   324          // try to unregister this callback after connection is destroyed.
    26   325          // Remove connection pointer about to be destroyed from the map
    27>  326          const std::unique_lock<std::mutex> lock(mutex);
    28   327          thread->second.m_disconnect_cb.reset();
    29   328          threads.erase(thread);
    30   329      });
    31   330      return {thread, true};
    32   331  }
    33
    34[#7](/bitcoin-core-multiprocess/7/)  std::__invoke_impl<void, mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0&>(std::__invoke_other, mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0&)
    35    (__f=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:61
    36[#8](/bitcoin-core-multiprocess/8/)  std::__invoke_r<void, mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0&>(mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0&) (__fn=...)
    37    at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:111
    38[#9](/bitcoin-core-multiprocess/9/)  std::_Function_handler<void(), mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client()> const&)::$_0>::_M_invoke (__functor=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:290
    39[#10](/bitcoin-core-multiprocess/10/) 0x000055df0ff8de15 in std::function<void()>::operator() (this=0x7f47f8000fb0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:591
    40[#11](/bitcoin-core-multiprocess/11/) mp::Unlock<mp::Lock, std::function<void()>&> (lock=..., callback=...) at mp/include/mp/util.h:198
    41
    42   194  template <typename Lock, typename Callback>
    43   195  void Unlock(Lock& lock, Callback&& callback)
    44   196  {
    45   197      const UnlockGuard<Lock> unlock(lock);
    46>  198      callback();
    47   199  }
    48
    49[#12](/bitcoin-core-multiprocess/12/) 0x000055df0ff8a70f in mp::Connection::~Connection (this=0x7f480002a810) at mp/src/mp/proxy.cpp:139
    50
    51   135      Lock lock{m_loop->m_mutex};
    52   136      while (!m_sync_cleanup_fns.empty()) {
    53   137          CleanupList fn;
    54   138          fn.splice(fn.begin(), m_sync_cleanup_fns, m_sync_cleanup_fns.begin());
    55>  139          Unlock(lock, fn.front());
    56   140      }
    57   141  }
    58
    59[#13](/bitcoin-core-multiprocess/13/) 0x000055df0feeedd3 in std::default_delete<mp::Connection>::operator() (__ptr=0x7f480002a810, this=<optimized out>) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:93
    60[#14](/bitcoin-core-multiprocess/14/) std::__uniq_ptr_impl<mp::Connection, std::default_delete<mp::Connection> >::reset (this=0xfffffffffffffe00, __p=0x0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:205
    61[#15](/bitcoin-core-multiprocess/15/) std::unique_ptr<mp::Connection, std::default_delete<mp::Connection> >::reset (this=0xfffffffffffffe00, __p=0x0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:504
    62[#16](/bitcoin-core-multiprocess/16/) mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const::{lambda()#2}::operator()() const (this=<optimized out>) at mp/test/mp/test/test.cpp:79
    63
    64    76                server_disconnect = [&] { loop.sync([&] { server_connection.reset(); }); };
    65    77                // Set handler to destroy the server when the client disconnects. This
    66    78                // is ignored if server_disconnect() is called instead.
    67>   79                server_connection->onDisconnect([&] { server_connection.reset(); });
    68    80
    69    81                auto client_connection = std::make_unique<Connection>(loop, kj::mv(pipe.ends[1]));
    70
    71[#17](/bitcoin-core-multiprocess/17/) kj::_::MaybeVoidCaller<kj::_::Void, kj::_::Void>::apply<mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const::{lambda()#2}>(mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const::{lambda()#2}&, kj::_::Void&&) (func=..., in=...)
    72    at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/async-prelude.h:195
    73[#18](/bitcoin-core-multiprocess/18/) kj::_::TransformPromiseNode<kj::_::Void, kj::_::Void, mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const::{lambda()#2}, kj::_::PropagateException>::getImpl(kj::_::ExceptionOrValue&) (this=<optimized out>, output=...)
    74    at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/async-inl.h:739
    75[#19](/bitcoin-core-multiprocess/19/) 0x00007f48057c3a3e in kj::_::TransformPromiseNodeBase::get(kj::_::ExceptionOrValue&) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
    76[#20](/bitcoin-core-multiprocess/20/) 0x00007f48057d102a in kj::TaskSet::Task::fire() () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
    77[#21](/bitcoin-core-multiprocess/21/) 0x00007f48057d143d in non-virtual thunk to kj::TaskSet::Task::fire() () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
    78[#22](/bitcoin-core-multiprocess/22/) 0x00007f48057c8cb2 in kj::_::waitImpl(kj::Own<kj::_::PromiseNode, kj::_::PromiseDisposer>&&, kj::_::ExceptionOrValue&, kj::WaitScope&, kj::SourceLocation)::$_2::operator()() const () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
    79[#23](/bitcoin-core-multiprocess/23/) 0x00007f48057c1d68 in kj::_::waitImpl(kj::Own<kj::_::PromiseNode, kj::_::PromiseDisposer>&&, kj::_::ExceptionOrValue&, kj::WaitScope&, kj::SourceLocation) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
    80[#24](/bitcoin-core-multiprocess/24/) 0x000055df0ff8e44c in kj::Promise<unsigned long>::wait (this=<optimized out>, waitScope=..., location=...) at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/async-inl.h:1357
    81[#25](/bitcoin-core-multiprocess/25/) 0x000055df0ff8b722 in mp::EventLoop::loop (this=0x7f4804ffec90) at mp/src/mp/proxy.cpp:231
    82
    83>  231          const size_t read_bytes = wait_stream->read(&buffer, 0, 1).wait(m_io_context.waitScope);
    84
    85
    86[#26](/bitcoin-core-multiprocess/26/) 0x000055df0feed376 in mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const (this=0x7f480002d1e8) at mp/test/mp/test/test.cpp:92
    87
    88    91                client_promise.set_value(std::move(client_proxy));
    89>   92                loop.loop();
    90
    91[#27](/bitcoin-core-multiprocess/27/) 0x00007f48054ed064 in execute_native_thread_routine () from /nix/store/7c0v0kbrrdc2cqgisi78jdqxn73n3401-gcc-14.2.1.20250322-lib/lib/libstdc++.so.6
    92[#28](/bitcoin-core-multiprocess/28/) 0x00007f4805097e63 in start_thread () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    93[#29](/bitcoin-core-multiprocess/29/) 0x00007f480511bdbc in __clone3 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    
  5. ryanofsky force-pushed on Sep 3, 2025
  6. ryanofsky commented at 5:02 pm on September 3, 2025: collaborator
    Updated fe1cbedc78c8eadcd668fb6cac244d63747f1503 -> 1d6d854047ab2c6c06f36e5a21fa381969abb483 (pr/context.1 -> pr/context.2, compare), fixing an iwyu error and making minor cleanups.
  7. in include/mp/proxy-io.h:539 in 1d6d854047 outdated
    531@@ -534,29 +532,73 @@ void ProxyServerBase<Interface, Impl>::invokeDestroy()
    532     CleanupRun(m_context.cleanup_fns);
    533 }
    534 
    535-using ConnThreads = std::map<Connection*, ProxyClient<Thread>>;
    536+//! Map from Connection to local or remote thread handle which will be used over
    537+//! that connection. This map will typically only contain one entry, but can
    538+//! contain multiple if a single thread makes IPC calls over multiple
    539+//! connections. A std::optional value type is used to avoid the map needing to
    540+//! be locked while ProxyClient<Thread> objects are constructred, see
    


    maflcko commented at 7:16 am on September 4, 2025:
    constructred -> constructed [spelling error]
    

    ryanofsky commented at 7:55 pm on September 5, 2025:

    constructred

    Thanks, fixed

  8. in include/mp/proxy-io.h:571 in dd86363f34 outdated
    569-    //! server thread.
    570+    //! Waiter object used to allow remote clients to execute code on this
    571+    //! thread. If this is a server thread created by
    572+    //! ProxyServer<ThreadMap>::makeThread(), it is initialized by that
    573+    //! function. Otherwise if this is a thread that was created externally,
    574+    //! this is just initialized the first time the thread tries to make an IPC
    


    Eunovo commented at 8:36 am on September 5, 2025:
    nit: then it is initialized...

    ryanofsky commented at 7:54 pm on September 5, 2025:

    re: #201 (review)

    Thanks, rephrased this to be clearer now, hopefully

  9. in src/mp/proxy.cpp:358 in cd6d95aef8 outdated
    349+        // loop->sync(), so if the connection is broken there is not a race
    350+        // between this thread trying to remove the callback and the disconnect
    351+        // handler attempting to call it.
    352+        m_context.loop->sync([&]() {
    353+            if (m_disconnect_cb) {
    354+                m_context.connection->removeSyncCleanup(*m_disconnect_cb);
    


    Eunovo commented at 9:31 am on September 5, 2025:

    https://github.com/bitcoin-core/libmultiprocess/pull/201/commits/cd6d95aef81149a5ecae2c05aad4f93661c5b713: While attempting to reproduce the bug, the disconnect_cb would hang or cause a segmentation fault while executing. I believe it was caused by the callback being deleted by the ProxyClient<Thread> destructor while it was being executed as part of the cleanups in the Connection destructor.

    0[#8](/bitcoin-core-multiprocess/8/)  operator() (__closure=0x7fffe80012e0) at /libmultiprocess/src/mp/proxy.cpp:326
    1326	        const std::unique_lock<std::mutex> lock(mutex);
    2(gdb) print mutex
    3$12 = (std::mutex &) <error reading variable: Cannot access memory at address 0x2f916f0c9da42c7c>
    4(gdb) 
    

    This change seems to stop the undefined behaviour caused by the previously described scenario, even though the test still hangs.

    0[#8](/bitcoin-core-multiprocess/8/)  operator() (__closure=0x7fffe8000f60) at /libmultiprocess/src/mp/proxy.cpp:326
    1326	        const std::unique_lock<std::mutex> lock(mutex);
    2(gdb) print mutex
    3$1 = (std::mutex &) [@0x7fffe8000db0](/bitcoin-core-multiprocess/contributor/0x7fffe8000db0/): {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 2, __count = 0, __owner = 1676762, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, 
    4        __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000ڕ\031\000\001", '\000' <repeats 26 times>, __align = 2}}, <No data fields>}
    5(gdb) 
    

    The test hangs because the ProxyClient<Thread> destructor waits for the EventLoop to clear the disconnect_cb, but the EventLoop is blocked trying to acquire the waiter mutex. This mutex is still held by PassField in type_context.h during its thread erasure operation, which is higher up in the call stack of the same thread that invoked the destructor, creating a deadlock.


    ryanofsky commented at 8:25 pm on September 5, 2025:

    re: #201 (review)

    While attempting to reproduce the bug, the disconnect_cb would hang or cause a segmentation fault while executing. I believe it was caused by the callback being deleted by the ProxyClient<Thread> destructor while it was being executed as part of the cleanups in the Connection destructor.

    Yes exactly, that’s the race condition bug being fixed.

    This change seems to stop the undefined behaviour caused by the previously described scenario, even though the test still hangs. […] The test hangs because the ProxyClient<Thread> destructor waits for the EventLoop to clear the disconnect_cb, but the EventLoop is blocked trying to acquire the waiter mutex. This mutex is still held by PassField in type_context.h during its thread erasure operation, which is higher up in the call stack of the same thread that invoked the destructor, creating a deadlock.

    Yes, the change in this file is not sufficient by itself to stop the deadlock, because there was also code in type_context.h locking Waiter::m_mutex while calling this destructor via request_threads.erase(). So this commit also replaces request_threads.erase() with request_threads.extract() to delay calling this destructor until after Waiter::m_mutex is released. That code has also been fixed with an additional sync() call. (EDIT: The sync() call is really the key to the fix, not the extract() call. While the extract() would fix the deadlock, it would not fix the race condition.)

    With both of these changes there shouldn’t be a race or deadlock anymore.
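    To illustrate the erase()/extract() distinction mentioned above, here is a standalone sketch with a stand-in value type (not the mp:: classes) whose destructor prints when it runs:

    ```cpp
    #include <cstdio>
    #include <map>
    #include <mutex>

    // Stand-in for ProxyClient<Thread>: a type whose destructor must not run while
    // the waiter mutex is held, because (in the real code) it posts to the event loop.
    struct ThreadHandle { ~ThreadHandle() { std::puts("~ThreadHandle"); } };

    int main()
    {
        std::map<int, ThreadHandle> request_threads;
        std::mutex waiter_mutex;

        // erase(): the element's destructor runs inside the locked region.
        request_threads.try_emplace(1);
        {
            std::unique_lock<std::mutex> lock(waiter_mutex);
            request_threads.erase(1);       // "~ThreadHandle" printed while locked
        }

        // extract(): the entry is unlinked under the lock, but the element is only
        // destroyed when the node handle dies, after the lock has been released.
        request_threads.try_emplace(2);
        std::map<int, ThreadHandle>::node_type node;
        {
            std::unique_lock<std::mutex> lock(waiter_mutex);
            node = request_threads.extract(2);
        }
        std::puts("waiter_mutex released");  // printed before the second ~ThreadHandle
    }
    ```

    As the edit above says, the sync() call is the real synchronization fix; extract() only keeps the destructor outside the locked region.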

  10. in include/mp/type-context.h:112 in cd6d95aef8 outdated
    118+                        // Erase the request_threads entry on the event loop
    119+                        // thread with loop->sync(), so if the connection is
    120+                        // broken there is not a race between this thread and
    121+                        // the disconnect handler trying to destroy the thread
    122+                        // client object.
    123+                        server.m_context.loop->sync([&] {
    


    Eunovo commented at 10:40 am on September 5, 2025:
    https://github.com/bitcoin-core/libmultiprocess/pull/201/commits/cd6d95aef81149a5ecae2c05aad4f93661c5b713: The test stops hanging after this change. This thread used to compete with the disconnect handler for the waiter mutex. After this change, the disconnect handler and this callback must execute one after the other.

    ryanofsky commented at 8:17 pm on September 5, 2025:

    re: #201 (review)

    cd6d95a: The test stops hanging after this change. This thread used to compete with the disconnect handler for the waiter mutex. After this change, the disconnect handler and this callback must execute one after the other.

    Yes, the disconnect handler is called from the ~Connection destructor, which always runs on the event loop thread, and that would race against this thread, which is an IPC server thread. Now everything just runs sequentially on the event loop thread.
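    The sequencing follows from EventLoop::post() (abridged from the source quoted in the stack traces above): a call made on the event loop thread runs inline, while a call from any other thread blocks until the single event loop thread has executed it, so the disconnect handler and the type-context.h cleanup can no longer interleave.

    ```cpp
    // Abridged from src/mp/proxy.cpp as shown in the traces above.
    void EventLoop::post(kj::Function<void()> fn)
    {
        if (std::this_thread::get_id() == m_thread_id) {
            fn();              // already on the event loop thread: run inline
            return;
        }
        Lock lock(m_mutex);    // otherwise hand fn to the loop over m_post_fd and
        // ...                 // wait on m_cv until the loop has finished running it
    }
    ```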

  11. Eunovo approved
  12. Eunovo commented at 11:09 am on September 5, 2025: contributor

    Tested ACK https://github.com/bitcoin-core/libmultiprocess/pull/201/commits/1d6d854047ab2c6c06f36e5a21fa381969abb483

    As I understand it, there are two problems that are solved by this PR:

    • a kind of use-after-free bug where the disconnect_cb is deleted in one thread while it is being executed in another
    • a deadlock caused by two threads competing for the same waiter mutex

    I left comments with more details under this review.

  13. ryanofsky force-pushed on Sep 5, 2025
  14. ryanofsky force-pushed on Sep 5, 2025
  15. ryanofsky commented at 8:43 pm on September 5, 2025: collaborator

    Thanks for the review! Updated with suggested changes

    re: #201#pullrequestreview-3188652191

    As I understand it, there are two problems that are solved by this PR:

    Yes, there was basically the deadlock bug, which happened most commonly, and various race bugs that could happen because previously there wasn’t really any synchronization between the disconnect handler and the type-context.h cleanup code that runs after asynchronous IPC calls.


    Updated 1d6d854047ab2c6c06f36e5a21fa381969abb483 -> 564e353c01b020dfef87adb24c32c56f7b002157 (pr/context.2 -> pr/context.3, compare), updating comments and adding an assert.

    Updated 564e353c01b020dfef87adb24c32c56f7b002157 -> f219f1b51c85e5c4a7157c78d278f832eb0bd975 (pr/context.3 -> pr/context.4, compare), fixing LLM linter issues.

  16. TheCharlatan commented at 2:24 pm on September 8, 2025: collaborator
    Can confirm that this fixes the observed test hang.
  17. in include/mp/proxy-io.h:601 in 90d4a42a39 outdated
    600+    //! associated with one event loop and guarded by EventLoop::m_mutex. So
    601+    //! Waiter::m_mutex does not need to be held while accessing individual
    602+    //! ProxyClient<Thread> instances, and may even need to be released to
    603+    //! respect lock order and avoid locking Waiter::m_mutex before
    604+    //! EventLoop::m_mutex.
    605+    ConnThreads callback_threads MP_GUARDED_BY(waiter->m_mutex);
    


    Sjors commented at 7:47 am on September 10, 2025:
    In 90d4a42a39fe0e6062e0a06ce784c582ac99cb3f doc: describe ThreadContext struct and synchronization requirements: did you mean to introduce this guard and (some of) the synchronization note in the next commit?

    ryanofsky commented at 10:38 am on September 10, 2025:

    re: #201 (review)

    In 90d4a42 doc: describe ThreadContext struct and synchronization requirements: did you mean to introduce this guard and (some of) the synchronization note in the next commit?

    I could move it to the next commit if preferred. I actually did want to add the MP_GUARDED_BY annotations in this commit to match the comment added here “these maps are guarded by Waiter::m_mutex”. I just couldn’t add both of them due to compiler errors, so I added one here and one later.


    Sjors commented at 11:53 am on September 10, 2025:
    I think the MP_GUARDED_BY should be in the same commit that explains it (as you do now). It seems slightly better to add them both in the later commit, but no strong preference.

    ryanofsky commented at 8:50 pm on September 10, 2025:

    re: #201 (review)

    Wound up going in a different direction, just adding the thread annotations in the second commit. Hodlinator found a bug where annotations were not being enforced as strictly as they could be, because the build was using -Wthread-safety-analysis, not -Wthread-safety like the bitcoin build. Fixing this required more changes, so I just made all the thread safety annotation changes together in one commit.

  18. in src/mp/proxy.cpp:315 in f219f1b51c outdated
    311@@ -305,29 +312,34 @@ bool EventLoop::done() const
    312     return m_num_clients == 0 && m_async_fns->empty();
    313 }
    314 
    315-std::tuple<ConnThread, bool> SetThread(ConnThreads& threads, std::mutex& mutex, Connection* connection, const std::function<Thread::Client()>& make_thread)
    316+std::tuple<ConnThread, bool> SetThread(ConnThreads& threads, Mutex& mutex, Connection* connection, const std::function<Thread::Client()>& make_thread)
    


    hodlinator commented at 8:00 am on September 10, 2025:
    nit: Accepting an independent mutex here feels very unsteady when not knowing there are only 2-3 callers. Either taking the full Waiter type or at least naming it waiter_mutex would feel more reassuring.

    hodlinator commented at 12:40 pm on September 10, 2025:

    Having these decoupled variables also results in warnings (at least when building as part of Bitcoin Core; I haven’t been able to reproduce it inside freestanding libmultiprocess yet).

    0../src/ipc/libmultiprocess/include/mp/type-context.h:28:24: warning: passing variable 'callback_threads' by reference requires holding mutex 'thread_context.waiter->m_mutex' [-Wthread-safety-reference]
    1   28 |         thread_context.callback_threads, thread_context.waiter->m_mutex, &connection,
    

    ryanofsky commented at 8:53 pm on September 10, 2025:

    re: #201 (review)

    Nice catch! This was caused by the bitcoin build using -Wthread-safety and the builds here only using -Wthread-safety-analysis. This is fixed now by grouping the first two parameters into a struct, which I think should resolve your initial concern too.
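
    Roughly, the shape of the grouped parameter (a simplified sketch of the idea; the member names follow the threads.mutex / threads.ref usage mentioned in the commits below, but the exact code in the branch may differ):

        // The mutex travels together with the map it guards, so SetThread's
        // callers pass one object instead of two decoupled arguments
        // ("Mutex" here is the library's annotated mutex wrapper).
        template <typename T>
        struct GuardedRef
        {
            Mutex& mutex;
            T& ref;
        };

        // Hypothetical updated signature, for illustration only:
        std::tuple<ConnThread, bool> SetThread(GuardedRef<ConnThreads> threads, Connection* connection,
                                               const std::function<Thread::Client()>& make_thread);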

  19. in include/mp/proxy-io.h:541 in 7a8a99c50f outdated
    534@@ -537,8 +535,10 @@ void ProxyServerBase<Interface, Impl>::invokeDestroy()
    535 //! Map from Connection to local or remote thread handle which will be used over
    536 //! that connection. This map will typically only contain one entry, but can
    537 //! contain multiple if a single thread makes IPC calls over multiple
    538-//! connections.
    539-using ConnThreads = std::map<Connection*, ProxyClient<Thread>>;
    540+//! connections. A std::optional value type is used to avoid the map needing to
    541+//! be locked while ProxyClient<Thread> objects are constructed, see
    542+//! ThreadContext "Synchronization note" below.
    543+using ConnThreads = std::map<Connection*, std::optional<ProxyClient<Thread>>>;
    


    Sjors commented at 10:19 am on September 10, 2025:

    In 7a8a99c50feca10be4e42a1cdb135864ce376f71 bug: fix ProxyClient deadlock if disconnected as IPC call is returning: this commit might be easier to follow conceptually if you split the std::optional bit into a separate commit.

    I made an attempt in: https://github.com/bitcoin-core/libmultiprocess/commit/af383138e0323d24892c19e7307ef650def195fd

    It probably needs a comment in SetThread that separating the insert doesn’t do anything until the next commit. And then it’s still a big diff.


    ryanofsky commented at 11:16 am on September 10, 2025:

    re: #201 (review)

    I made an attempt in: af38313

    It probably needs a comment in SetThread that separating the insert doesn’t do anything until the next commit. And then it’s still a big diff.

    Thanks will push an update with that change


    Sjors commented at 11:49 am on September 10, 2025:
    I split off an additional commit 94fe2251f6bc629f98d7ab71b09a8c0b5459d150 that switches the early return to try_emplace. Now the remaining changes are nice and short: 5b65dae936a53a2c6e43ed49f1f382f919979514

    ryanofsky commented at 8:50 pm on September 10, 2025:

    re: #201 (review)

    I split off an additional commit 94fe225 that switches the early return to try_emplace. Now the remaining changes are nice and short: 5b65dae

    Nice! Took both of your commits
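
    For context, the pattern those commits move SetThread toward looks roughly like this (a self-contained sketch with stand-in types, not the actual code): insert the empty optional under the lock, then construct the value after releasing it.

        #include <map>
        #include <mutex>
        #include <optional>
        #include <tuple>
        #include <utility>

        struct Remote {}; // stands in for ProxyClient<Thread>
        struct Conn;      // stands in for Connection

        std::mutex threads_mutex;
        std::map<Conn*, std::optional<Remote>> threads;

        // Insert (or find) the entry under the lock, but construct the value
        // outside the lock. std::map iterators stay valid across other
        // insertions, and the new entry is only touched by this thread until it
        // is filled in.
        std::pair<std::map<Conn*, std::optional<Remote>>::iterator, bool> set_thread(Conn* connection)
        {
            std::map<Conn*, std::optional<Remote>>::iterator it;
            bool inserted;
            {
                std::lock_guard<std::mutex> lock{threads_mutex};
                std::tie(it, inserted) = threads.try_emplace(connection); // optional starts empty
            }
            if (inserted) it->second.emplace(); // construct without holding the lock
            return {it, inserted};
        }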

  20. in src/mp/proxy.cpp:351 in 7a8a99c50f outdated
    348@@ -336,17 +349,18 @@ ProxyClient<Thread>::~ProxyClient()
    349     // cleanup callback that was registered to handle the connection being
    350     // destroyed before the thread being destroyed.
    351     if (m_disconnect_cb) {
    


    Sjors commented at 10:32 am on September 10, 2025:
    In 7a8a99c50feca10be4e42a1cdb135864ce376f71 bug: fix ProxyClient deadlock if disconnected as IPC call is returning: the reason for keeping this if(m_disconnect_cb) is just to avoid a needless sync call? I.e. a race condition could cause m_disconnect_cb to disappear in the meantime, but it can never appear.

    ryanofsky commented at 11:21 am on September 10, 2025:

    In 7a8a99c bug: fix ProxyClient deadlock if disconnected as IPC call is returning: the reason for keeping this if(m_disconnect_cb) is just to avoid a needless sync call? I.e. a race condition could cause m_disconnect_cb to disappear in the meantime, but it can never appear.

    Yes, it should be fine to call sync() unconditionally, but if m_disconnect_cb is null there’s no reason to do that. Maybe it would be simpler to just call it unconditionally though; I might have just been trying to reduce the diff size here.

  21. in include/mp/proxy-io.h:69 in 7a8a99c50f outdated
    65@@ -66,8 +66,6 @@ struct ProxyClient<Thread> : public ProxyClientBase<Thread, ::capnp::Void>
    66     ProxyClient(const ProxyClient&) = delete;
    67     ~ProxyClient();
    68 
    69-    void setDisconnectCallback(const std::function<void()>& fn);
    


    Sjors commented at 10:37 am on September 10, 2025:
    In 7a8a99c50feca10be4e42a1cdb135864ce376f71 bug: fix ProxyClient deadlock if disconnected as IPC call is returning: can you dedicate a sentence in the commit message as to why this function was dropped?

    ryanofsky commented at 10:40 am on September 10, 2025:

    In 7a8a99c bug: fix ProxyClient deadlock if disconnected as IPC call is returning: can you dedicate a sentence in the commit message as to why this function was dropped?

    Yes, will push an update with that change. There’s no particular reason for dropping this method and it could be kept, but it was a short method only called in one place, so it seemed better to inline it.


    Sjors commented at 11:52 am on September 10, 2025:
    I agree it’s nicer to drop it.

    ryanofsky commented at 8:51 pm on September 10, 2025:

    re: #201 (review)

    can you dedicate a sentence in the commit message as to why this function was dropped?

    Added this now

  22. in include/mp/proxy-io.h:310 in f219f1b51c outdated
    304@@ -307,7 +305,7 @@ struct Waiter
    305     //! mutexes than necessary. This mutex can be held at the same time as
    306     //! EventLoop::m_mutex as long as Waiter::mutex is locked first and
    307     //! EventLoop::m_mutex is locked second.
    308-    std::mutex m_mutex;
    309+    Mutex m_mutex;
    310     std::condition_variable m_cv;
    311     std::optional<kj::Function<void()>> m_fn;
    


    hodlinator commented at 11:34 am on September 10, 2025:

    Might as well explicitly guard?

    0    std::optional<kj::Function<void()>> m_fn MP_GUARDED_BY(m_mutex);
    

    hodlinator commented at 12:03 pm on September 10, 2025:
    (I guess that doesn’t work because of std::condition_variable <-> MP_GUARDED_BY interaction. Please resolve).

    ryanofsky commented at 8:55 pm on September 10, 2025:

    re: #201 (review)

    Thanks, added the guarded by annotation in the second commit! Condition variable interaction was not hard to solve with an additional “requires” annotation.


    hodlinator commented at 9:15 am on September 11, 2025:
    Ah, didn’t think to add the MP_REQUIRES(m_mutex)-annotation. I like the GuardedRef!
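
    For reference, the combination being described might look roughly like the following sketch, assuming MP_GUARDED_BY and MP_REQUIRES wrap Clang's thread safety attributes; the real Waiter code differs in the details:

        struct Waiter
        {
            Mutex m_mutex;
            std::condition_variable m_cv;
            std::optional<kj::Function<void()>> m_fn MP_GUARDED_BY(m_mutex);

            //! Wait until a callback is posted. Callers must already hold
            //! m_mutex, which is what satisfies the analysis for the m_fn reads
            //! below; the analysis does not model the condition variable
            //! releasing and reacquiring the lock internally.
            void wait(Lock& lock) MP_REQUIRES(m_mutex)
            {
                while (!m_fn) m_cv.wait(lock.m_lock);
            }
        };
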
  23. Sjors commented at 11:51 am on September 10, 2025: member

    utACK f219f1b51c85e5c4a7157c78d278f832eb0bd975

    I left some suggestions to make the intermediate commits more clear, which might benefit other reviewers.

  24. DrahtBot requested review from Eunovo on Sep 10, 2025
  25. ci: Use -Wthread-safety not -Wthread-safety-analysis
    Use -Wthread-safety not -Wthread-safety-analysis in llvm and sanitize jobs.
    -Wthread-safety is a more general flag which adds additional checks outside the
    core thread safety analysis.
    
    Credit to hodlinator for noticing the bitcoin core build being stricter about
    thread safety checks than the libmultiprocess build here:
    https://github.com/bitcoin-core/libmultiprocess/pull/201#discussion_r2336632302
    4e365b019a
  26. proxy-io.h: add Waiter::m_mutex thread safety annotations
    This is just a refactoring, no behavior is changing. This is only adding
    annotations and switching from unannotated to annotated types.
    
    In SetThread, the first two parameters are combined into a single GuardedRef
    parameter to avoid -Wthread-safety-reference warnings at call sites like
    "warning: passing variable 'callback_threads' by reference requires holding
    mutex 'thread_context.waiter->m_mutex'"
    d60db601ed
  27. doc: describe ThreadContext struct and synchronization requirements 9b07991135
  28. Use std::optional in ConnThreads to allow shortening locks
    This is just a refactoring changing the ConnThreads data type. The optional
    value is not actually left in an unset state until the next commit.
    ca9b380ea9
  29. Use try_emplace in SetThread instead of threads.find
    This commit is easiest to review ignoring whitespace (git diff -w). This is a
    minor change in behavior, but the only change is shortening the duration that
    threads.mutex is locked while inserting a new entry in the threads.ref map. The
    lock is now only held while the entry is created and is released while the
    ProxyClient<Thread> object is initialized.
    
    This change doesn't really fix any problems but it simplifies the next commit
    which deals with race conditions and deadlocks in this code, so it has been
    split out.
    85df96482c
  30. bug: fix ProxyClient<Thread> deadlock if disconnected as IPC call is returning
    This bug is currently causing mptest "disconnecting and blocking" test to
    occasionally hang as reported by maflcko in
    https://github.com/bitcoin/bitcoin/issues/33244.
    
    The bug was actually first reported by Sjors in
    https://github.com/Sjors/bitcoin/pull/90#issuecomment-3024942087 and there are
    more details about it in
    https://github.com/bitcoin-core/libmultiprocess/issues/189.
    
    The bug is caused by the "disconnecting and blocking" test triggering a
    disconnect right before a server IPC call returns. This results in a race
    between the IPC server thread and the onDisconnect handler in the event loop
    thread both trying to destroy the server's request_threads ProxyClient<Thread>
    object when the IPC call is done.
    
    There was a lack of synchronization in this case, fixed here by adding
    loop->sync() in a few places. Specifically, the fixes were to:
    
    - Always call SetThread on the event loop thread using the loop->sync() method,
      to prevent a race between the ProxyClient<Thread> creation code and the
      connection shutdown code if there was an ill-timed disconnect.
    
    - Similarly in ~ProxyClient<Thread> and type-context.h PassField(), use
      loop->sync() when destroying the thread object, in case a disconnect happens
      at that time.
    
    A few other changes were made in this commit to make the resulting code safer
    and simpler, even though they are not technically necessary for the fix:
    
    - In type-context.h PassField(), Waiter::m_mutex is now unlocked while
      destroying ProxyClient<Thread> just to respect EventLoop::m_mutex and
      Waiter::m_mutex lock order and never lock the Waiter first. This is just for
      consistency. There is no actual possibility of a deadlock here due to the
      new sync() call.
    
    - This adds asserts to make sure functions expected to run on the event
      loop thread are only called on that thread.
    
    - This inlines the ProxyClient<Thread>::setDisconnectCallback function, just
      because it was a short function only called in a single place.
    4a269b21b8
  31. ryanofsky force-pushed on Sep 10, 2025
  32. ryanofsky commented at 10:23 pm on September 10, 2025: collaborator

    Thanks for the reviews! I took Sjors’ commits to break the change up into smaller parts and addressed hodlinator’s suggestions to add more thread safety annotations and make them stricter. I think both changes should make this PR easier to review.

    Updated f219f1b51c85e5c4a7157c78d278f832eb0bd975 -> b6321bb99c8a79400460899eaf3b5e9b15e41be7 (pr/context.4 -> pr/context.5, compare) with suggested changes to split commits and improve thread safety annotations. The actual fix is not changed.

    Updated b6321bb99c8a79400460899eaf3b5e9b15e41be7 -> d86823eed705056e430a6c1c3cd676c360f6a0ae (pr/context.5 -> pr/context.6, compare) fixing typos

  33. ryanofsky force-pushed on Sep 10, 2025
  34. Sjors commented at 6:53 am on September 11, 2025: member

    d86823eed705056e430a6c1c3cd676c360f6a0ae looks good once CI is happy…

    Thanks for splitting the commits. The new struct GuardedRef introduced in 6d6096ae93232fb9dd4b582197fd2f5196efa8f4 is nice.

    When I ran make locally before, it didn’t emit any warnings about locks. Given that you only changed the CI configuration, I assume vanilla make still won’t. It would be good to have a CMake developer preset that catches as much as possible (different PR of course).

    Additionally, maybe one of @DrahtBot’s LLMs can look through CI logs (when all jobs pass) for anything unusual compared to master.

    OpenBSD fail seems real:

    0In file included from /home/runner/work/libmultiprocess/libmultiprocess/test/mp/test/foo-types.h:15:
    1/home/runner/work/libmultiprocess/libmultiprocess/include/mp/type-context.h:96:29: error: no viable constructor or deduction guide for deduction of template arguments of 'GuardedRef'
    2                            GuardedRef{thread_context.waiter->m_mutex, request_threads}, server.m_context.connection,
    3                            ^
    4/home/runner/work/libmultiprocess/libmultiprocess/include/mp/util.h:186:8: note: candidate function template not viable: requires 1 argument, but 2 were provided
    5struct GuardedRef
    

    https://github.com/bitcoin-core/libmultiprocess/actions/runs/17628430545/job/50090729622?pr=201

    Copilot seems to think this is due to incomplete C++20 support, and we could pre-empt that by adding target_compile_features(multiprocess PUBLIC cxx_std_20) after the three add_library( statements. Though @purpleKarrot recently pointed out LLMs tend to be wrong when it comes to CMake.

  35. in include/mp/util.h:185 in d86823eed7 outdated
    181@@ -182,6 +182,13 @@ class MP_SCOPED_CAPABILITY Lock {
    182     std::unique_lock<std::mutex> m_lock;
    183 };
    184 
    185+template<typename T>
    


    hodlinator commented at 9:06 am on September 11, 2025:

    meganit: Pretty sure Clang-format in main project prefers:

    0template <typename T>
    
  36. ryanofsky force-pushed on Sep 11, 2025
  37. ryanofsky commented at 10:26 am on September 11, 2025: collaborator

    Thanks, the OpenBSD failure is probably just happening because OpenBSD is using an old version of Clang. It should be possible to reproduce this locally using the llvm job and the following change in shell.nix:

    0-  llvm = crossPkgs.llvmPackages_20;
    1+  llvm = crossPkgs.llvmPackages_16;
    

    But I’m having some kind of network issue and can’t download the llvm 16 packages.

    You are right that it would be nice to have a preset or something that makes it easier to see thread safety errors locally. FWIW I just have a configuration script that does:

    0MP_CXX_FLAGS="-Werror -ftemplate-backtrace-limit=0 -Wthread-safety"
    1CC=clang CXX=clang++ cmake -DCMAKE_INSTALL_PREFIX=$HOME/work/mp/build/prefix -DCapnProto_DEBUG=1 -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS="$MP_CXX_FLAGS" -DMULTIPROCESS_RUN_CLANG_TIDY=1 ..
    

    And sometimes I’ll add other things like -fsanitize=address to MP_CXX_FLAGS as well.

    As far as DrahtBot/CI goes, the llvm job in ci/configs/llvm.bash uses -Werror and -Wthread-safety together, so it will fail if there are any compile-time thread safety issues. So I think there is no risk of compile-time errors slipping through anymore (now that it is using the broader -Wthread-safety instead of the narrower -Wthread-safety-analysis).


    Updated d86823eed705056e430a6c1c3cd676c360f6a0ae -> 4a269b21b8c8c2c7138df82c8dc4103a388e339b (pr/context.6 -> pr/context.7, compare) with attempted fix for openbsd failure
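
    For anyone hitting the same error elsewhere: class template argument deduction for aggregates is a C++20 feature that Clang only gained in version 17, which is why GuardedRef{mutex, ref} fails to deduce its template argument on Clang 16. The usual workaround, shown here only as an illustration and not necessarily the exact change pushed, is an explicit deduction guide:

        // Lets GuardedRef{mutex, ref} deduce T even on compilers without
        // C++20 aggregate CTAD (e.g. Clang 16 on the OpenBSD runner).
        template <typename T>
        GuardedRef(Mutex&, T&) -> GuardedRef<T>;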

  38. ryanofsky commented at 10:35 am on September 11, 2025: collaborator
    OpenBSD / clang-16 fix seems to have worked!
  39. Sjors commented at 11:42 am on September 11, 2025: member
    ACK 4a269b21b8c8c2c7138df82c8dc4103a388e339b
  40. ryanofsky commented at 9:15 pm on September 12, 2025: collaborator

    I’ve made progress on implementing a more comprehensive disconnect test that can detect bugs like https://github.com/bitcoin/bitcoin/issues/33244 and #189 more reliably instead of only intermittently when the test is run thousands of times. If I can finish it soon, I’ll add it to this PR, otherwise it can be a followup.

    The test is implemented in commit ea4976f099cb83ef59f58f09b62576358cdbf976 (tag) and I think I’m maybe 70% finished with the implementation. The test passes on master because not all of the checks are enabled yet and many aren’t implemented (these are noted in FIXME comments in the code). The commit combines the 4 existing disconnect test cases into a more general test and follows up on earlier ideas and efforts from:

  41. Sjors commented at 8:09 am on September 13, 2025: member
    @ryanofsky that commit also touches non-test code, so I think it would be better as a standalone PR.
  42. ryanofsky commented at 10:21 am on September 13, 2025: collaborator

    @ryanofsky that commit also touches non-test code, so I think it would be better as a standalone PR.

    That’s reasonable. But to be fair the only changes to non-test code are adding 4 lines like:

    0if (connection->m_loop->m_signal) connection->m_loop->m_signal(SignalCallSetup{});
    

    so the test is able to disconnect at specific points. m_signal is always null unless the test sets it, so it should be pretty clear that these lines only affect tests. In practice I’d also split the change into non-test and test-only commits. And I’d add hasSignal()/sendSignal() helpers so the signal mechanism wouldn’t be exposed, and this would look more like:

    0if (connection->m_loop->hasSignal<CallSetup>()) connection->m_loop->sendSignal<CallSetup>();
    
  43. in ci/configs/sanitize.bash:4 in 4e365b019a outdated
    0@@ -1,7 +1,7 @@
    1 CI_DESC="CI job running ThreadSanitizer"
    2 CI_DIR=build-sanitize
    3 export CXX=clang++
    4-export CXXFLAGS="-ggdb -Werror -Wall -Wextra -Wpedantic -Wthread-safety-analysis -Wno-unused-parameter -fsanitize=thread"
    5+export CXXFLAGS="-ggdb -Werror -Wall -Wextra -Wpedantic -Wthread-safety -Wno-unused-parameter -fsanitize=thread"
    


    Eunovo commented at 11:55 am on September 15, 2025:
    https://github.com/bitcoin-core/libmultiprocess/pull/201/commits/4e365b019a9feb3150ac8de089014e719f08c9c4: This is unrelated to the PR, but why generate debugging information here? I’m referring to the use of the -ggdb flag

    ryanofsky commented at 12:21 pm on September 15, 2025:

    re: #201 (review)

    I haven’t tried without -ggdb, but sanitizers like ThreadSanitizer print stack traces that I think depend on having this debug information. Cap’n Proto can also use addr2line to print stack traces internally when there are errors (though I don’t think we have that option enabled).

  44. DrahtBot requested review from Eunovo on Sep 15, 2025
  45. in src/mp/proxy.cpp:366 in d60db601ed outdated
    362@@ -364,7 +363,7 @@ ProxyServer<Thread>::~ProxyServer()
    363     assert(m_thread_context.waiter.get());
    364     std::unique_ptr<Waiter> waiter;
    365     {
    366-        const std::unique_lock<std::mutex> lock(m_thread_context.waiter->m_mutex);
    367+        const Lock lock(m_thread_context.waiter->m_mutex);
    


    Eunovo commented at 1:33 pm on September 15, 2025:
  46. ryanofsky referenced this in commit 0b62c777d9 on Sep 17, 2025
  47. ryanofsky commented at 9:30 am on September 17, 2025: collaborator
    I think I’ll go ahead and merge this PR since it’s had ACKs and testing to confirm the fix works. I would have liked to add my test changes here to test it in a more reliable way, but it’s taking longer than expected to implement, so it’s better to make it a followup.
  48. ryanofsky merged this on Sep 17, 2025
  49. ryanofsky closed this on Sep 17, 2025

  50. ryanofsky referenced this in commit 535fa0ad0d on Sep 17, 2025
  51. in include/mp/proxy-io.h:1 in 4a269b21b8


    hodlinator commented at 12:43 pm on September 18, 2025:
    Tangent: Why does EventLoop use Unix sockets within the same process to signal when there’s a new m_post_fn to execute? Seems like there should be thread synchronization primitives with marginally less overhead?

    hodlinator commented at 12:48 pm on September 18, 2025:
    Tangent in an unrelated file, regarding how easy it is to understand libmultiprocess: TestSetup (in test/mp/test/test.cpp) is hard to read; it would maybe benefit from adding comments with the argument names of anonymous lambdas and explicit captures of variables. Also feeling resistance when having to learn that kj::heap returns something std::unique_ptr-like which implicitly casts to a raw pointer. Are the kj types only used when necessary to interface with Cap’n Proto, and std everywhere else?

    ryanofsky commented at 2:14 pm on September 18, 2025:

    re: https://github.com/bitcoin-core/libmultiprocess/pull/201/files#r2359060452

    Tangent: Why does EventLoop use Unix sockets within the same process to signal when there’s a new m_post_fn to execute? Seems like there should be thread synchronization primitives with marginally less overhead?

    EventLoop documentation is meant to provide some background on this and specifically the link at the bottom https://groups.google.com/d/msg/capnproto/TuQFF1eH2-M/g81sHaTAAQAJ addresses this question.

    Cap’n Proto doesn’t use threads and assumes all code calling it runs on a single thread. There are benefits and costs that come from this, but one of the costs is that external threads need to use I/O to communicate with Cap’n Proto’s event loop. There isn’t another way for threads to send signals. (Or at least there wasn’t at the time I wrote this; maybe newer versions of Cap’n Proto provide something.)
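
    A stripped-down illustration of what that I/O signalling looks like (plain POSIX calls and made-up function names, not the actual EventLoop code): an external thread writes a byte to a socketpair, and the event loop, which is waiting for the other end to become readable, wakes up and runs the posted function.

        #include <sys/socket.h>
        #include <unistd.h>

        int fds[2]; // fds[0]: read end watched by the event loop, fds[1]: write end for other threads

        void setup() { socketpair(AF_UNIX, SOCK_STREAM, 0, fds); }

        // Called from any thread after storing the function it wants the loop to run.
        void wake_event_loop()
        {
            char c = 0;
            (void)write(fds[1], &c, 1); // makes fds[0] readable
        }

        // Event loop side. In the real code this is a kj async read rather than
        // a blocking one; after draining the byte it runs the posted function.
        void on_wakeup()
        {
            char c;
            (void)read(fds[0], &c, 1);
        }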


    ryanofsky commented at 2:34 pm on September 18, 2025:

    re: #201 (review)

    Tangent in unrelated file - regarding how easy it is to understand libmultiprocess: TestSetup (in test/mp/test/test.cpp) is hard to read, maybe would benefit from adding comments with argument names of anonymous lambdas and explicit captures of variables.

    This is probably true and I’m currently working on test improvements where I’m adding more members to the TestSetup class and could add more comments.

    I’m not sure I would always prefer explicit variable captures though. I think [&] is not just a way of capturing state and cutting down noise, but also an important way of indicating that the lambda will only be called in the current scope, and is not some event handler that can be called at a later time. So I tend to like [&] for synchronous inline callbacks, and [x, y, z] for asynchronous event handlers.
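
    A small self-contained example of the distinction (illustrative only, not code from this repository):

        #include <cstdio>
        #include <functional>
        #include <string>
        #include <vector>

        std::vector<std::function<void()>> handlers; // stored and invoked later

        void example(const std::string& name)
        {
            int result = 0;

            // Synchronous inline callback: it runs before run_inline returns, so
            // [&] is safe and also signals that nothing outlives the current scope.
            auto run_inline = [](const std::function<void()>& f) { f(); };
            run_inline([&] { result = static_cast<int>(name.size()); });

            // Deferred handler: it may run after 'name' and 'result' are gone, so
            // the explicit by-value captures both document and enforce that.
            handlers.push_back([name, result] { std::printf("%s=%d\n", name.c_str(), result); });
        }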

    Also feeling resistance when having to learn that kj::heap returns something std::unique_ptr-like which implicit casts to raw pointer. Are the kj types only used when necessary to interface with Cap’n’Proto, and std everywhere else?

    I think basically yes, but the word “only” is too strong. The code does tend to use the kj library when Cap’n Proto types are involved and the std library otherwise. But there are probably exceptions, and I haven’t found kj types and macros to cause problems in practice. In general kj types tend to have nice debugging and safety features and a real rationale behind them. They are not just NIH. I think kj::Own in particular has capabilities std::unique_ptr doesn’t have, because it doesn’t bake the deleter into the type definition, making it easier to support things like memory pools and implement the zero-copy techniques Cap’n Proto uses internally.
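
    On the deleter point, a short std-only illustration of what baking the deleter into the type means; this shows std::unique_ptr behavior only, not kj::Own internals:

        #include <functional>
        #include <memory>

        struct Foo {};

        // std::unique_ptr bakes the deleter into the type: these are different
        // types, so generic code must commit to one ownership strategy up front.
        using DefaultOwned = std::unique_ptr<Foo>;
        using PoolOwned = std::unique_ptr<Foo, void (*)(Foo*)>;

        // Type-erasing the deleter recovers a single type at some cost; kj::Own
        // keeps the disposer out of the template parameters, which is the
        // property described above.
        using ErasedOwned = std::unique_ptr<Foo, std::function<void(Foo*)>>;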


    hodlinator commented at 7:15 pm on September 23, 2025:

    I think the main thing I was missing is the need for pumping the async IO. I assume waiting on an async IO-wrapped socket serves that purpose?

    Was curious why you didn’t use m_io_context->provider->newOneWayPipe(), but writing to the AsyncOutputStream returns a promise marked KJ_WARN_UNUSED_RESULT (https://github.com/capnproto/capnproto/blob/7db701e94ad00b8db06ede2bea5f527e2a6fa3de/c%2B%2B/src/kj/async-io.h#L121), which doesn’t match the example in the Google Groups thread.

    Q: I can see there’s a ::close() call for [m_]post_fd, but none for m_wait_fd? Maybe it’s garbage collected once the process dies, or does wrapSocketFd() assume responsibility for closing?


    hodlinator commented at 8:40 pm on September 23, 2025:

    Your policy of only using [&] to indicate that the lambda will be called in the current scope is nice when it comes to function scope, as when creating server_connection, but less so in the case of TestSetup::server_disconnect where this is captured, which can perhaps be forgiven for a relatively simple lambda.


    I agree that kj has some clever quirks that are missing from std. But it is not a one-way street (the implicit cast from Own to a raw pointer, for example). I would still advise using std when possible to make review more approachable. This passed mptest, for example:

     0diff --git a/include/mp/proxy-io.h b/include/mp/proxy-io.h
     1index a0bdf13..6358c86 100644
     2--- a/include/mp/proxy-io.h
     3+++ b/include/mp/proxy-io.h
     4@@ -179,7 +179,7 @@ public:
     5 
     6     //! Run function on event loop thread. Does not return until function completes.
     7     //! Must be called while the loop() function is active.
     8-    void post(kj::Function<void()> fn);
     9+    void post(std::function<void()> fn);
    10 
    11     //! Wrapper around EventLoop::post that takes advantage of the
    12     //! fact that callable will not go out of scope to avoid requirement that it
    13@@ -231,7 +231,7 @@ public:
    14     std::thread m_async_thread;
    15 
    16     //! Callback function to run on event loop thread during post() or sync() call.
    17-    kj::Function<void()>* m_post_fn MP_GUARDED_BY(m_mutex) = nullptr;
    18+    std::function<void()>* m_post_fn MP_GUARDED_BY(m_mutex) = nullptr;
    19 
    20     //! Callback functions to run on async thread.
    21     std::optional<CleanupList> m_async_fns MP_GUARDED_BY(m_mutex);
    22diff --git a/src/mp/proxy.cpp b/src/mp/proxy.cpp
    23index 06825c9..6611cfb 100644
    24--- a/src/mp/proxy.cpp
    25+++ b/src/mp/proxy.cpp
    26@@ -263,7 +263,7 @@ void EventLoop::loop()
    27     m_cv.notify_all();
    28 }
    29 
    30-void EventLoop::post(kj::Function<void()> fn)
    31+void EventLoop::post(std::function<void()> fn)
    32 {
    33     if (std::this_thread::get_id() == m_thread_id) {
    34         fn();
    

    ryanofsky commented at 12:08 pm on September 30, 2025:

    re: #201 (review)

    Yes, IIRC wrapSocketFd does take ownership. A lot of this code is also updated and made more generic in the Windows PR https://github.com/bitcoin/bitcoin/pull/32387, in case you are curious.


    ryanofsky commented at 12:22 pm on September 30, 2025:

    re: #201 (review)

    Nice find on lambda inconsistencies, that would make sense to clean up. Would be happy to review a PR with any cleanups and improvements like this.

    On kj::Function vs std::function, this was actually changed recently in 52256e730f516af3d4ec5068c0157a183f7aee6e because std::function, unlike kj::Function, requires function objects to be copyable, which prevents writing lambdas that capture move-only objects. So I’m actually surprised that diff compiles. Maybe it compiles because it only changes EventLoop::post, not Waiter::post? Happy to change to whatever types work though.
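
    Concretely, the copyability difference looks like this (a standalone sketch assuming the kj/function.h header; independent of the diff above):

        #include <kj/function.h>

        #include <memory>

        int main()
        {
            auto ptr = std::make_unique<int>(42);

            // Move-capturing a unique_ptr makes the lambda move-only. kj::Function
            // accepts it because it only requires the callable to be movable:
            kj::Function<int()> fn = [p = std::move(ptr)] { return *p; };

            // The equivalent std::function declaration would not compile, since
            // std::function requires a copy-constructible callable:
            //   std::function<int()> bad = [q = std::make_unique<int>(1)] { return *q; };

            return fn();
        }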

  52. hodlinator commented at 12:51 pm on September 18, 2025: contributor

    Didn’t fully get over the hump of knowing what I was looking at in the last commit before the merge this time. At least it made me publish #213 (which originally contained a parallel discovery of #202).

    Left some pending inline thoughts and the diagram below.

     0---
     1  config:
     2    class:
     3      hideEmptyMembersBox: true
     4---
     5 classDiagram
     6    class EventLoop{
     7        +loop()
     8        +sync()
     9        +post()
    10    }
    11    class Waiter{
    12        +post()
    13        +wait()
    14    }
    15    
    16    class Connection {
    17        +m_loop
    18        +m_stream
    19        +m_network
    20        +m_rpc_system
    21        +CleanupList m_sync_cleanup_fns
    22    }
    23    note for Connection "Server *or* Client"
    24
    25    class ProxyServer~Interface~
    26    class ProxyClientBase~Interface, Impl~
    27    note for ProxyClientBase "Inherits Impl"
    28    class ProxyServerBase~Interface, Impl~ {
    29        +shared_ptr~Impl~ m_impl
    30    }
    31    note for ProxyServerBase "Inherits Interface::Server,<br/>Owns Impl"
    32
    33    class Calculator {
    34        +solveEquation(const std::string&) = 0
    35    }
    36    note for Calculator "Fairly typical pure virtual C++ interface"
    37
    38    class ProxyClient~::CalculatorInterface~ {
    39    }
    40    
    41    class ProxyClientCustom~::CalculatorInterface, Calculator~ {
    42    }
    43
    44    class ProxyClientCustom~Interface, Impl~ {
    45    }
    46
    47    ProxyClientCustom~::CalculatorInterface,Calculator~ <-- ProxyClient~::CalculatorInterface~
    48    ProxyClientBase~Interface, Impl~ <-- ProxyClientCustom~Interface, Impl~
    49
    50    class TestSetup {
    51        +thread
    52    }
    53
    54    class TestSetup.thread {
    55        +EventLoop
    56        +pipe
    57        +server_connection
    58        +client_connection
    59        +unique_ptr~ProxyClient~FooInterface~~ client_proxy
    60    }
    61
    62    Connection "1" o-- "1" EventLoop
    63    TestSetup "1" o-- "1" TestSetup.thread
    64    TestSetup.thread "2" o-- "1" Connection : server & client
    
  53. ryanofsky commented at 3:09 pm on September 18, 2025: collaborator
    Thanks for the review and followups!
  54. fanquake referenced this in commit edb871cba2 on Sep 19, 2025
  55. Sjors referenced this in commit 132621fc01 on Sep 23, 2025
  56. theuni referenced this in commit 419d891b0a on Sep 26, 2025
  57. ryanofsky commented at 12:25 pm on September 30, 2025: collaborator
    Thanks for the followups, and I’d be happy to review a PR which cleans up code and makes it more readable or approachable!
  58. ryanofsky referenced this in commit afcc40b0f1 on Oct 2, 2025
  59. ryanofsky referenced this in commit 5bb3cbdfba on Oct 2, 2025
  60. ryanofsky referenced this in commit abcd4c4ff9 on Oct 7, 2025
  61. fanquake referenced this in commit a14e7b9dee on Oct 16, 2025
