bug: fix mptest hang, ProxyClient<Thread> deadlock in disconnect handler #201

pull ryanofsky wants to merge 6 commits into bitcoin-core:master from ryanofsky:pr/context (7 files changed, +152 −73)
  1. ryanofsky commented at 8:16 PM on September 2, 2025: collaborator

    This bug is currently causing the mptest "disconnecting and blocking" test to occasionally hang, as reported by maflcko in https://github.com/bitcoin/bitcoin/issues/33244.

    The bug was actually first reported by Sjors in https://github.com/Sjors/bitcoin/pull/90#issuecomment-3024942087 and there are more details about it in #189.

    The bug is caused by the "disconnecting and blocking" test triggering a disconnect right before a server IPC call returns. This results in a race between the IPC server thread and the onDisconnect handler in the event loop thread, which both try to destroy the server's request_threads ProxyClient<Thread> object when the IPC call completes.

    This case lacked synchronization, which is fixed here by adding loop->sync() calls in various places. There were also lock-order issues where Waiter::m_mutex could be incorrectly locked before EventLoop::m_mutex, resulting in a deadlock.
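    The lock-ordering rule behind the second fix can be sketched with a toy example (the mutex names below are hypothetical stand-ins for EventLoop::m_mutex and Waiter::m_mutex, not the actual mp classes): whenever two mutexes can be held at the same time, every thread must acquire them in the same global order. The deadlock came from one code path taking the waiter mutex first and then blocking on work that needed the loop mutex.

    ```cpp
    #include <cassert>
    #include <iostream>
    #include <mutex>
    #include <thread>

    // Hypothetical stand-ins for EventLoop::m_mutex and Waiter::m_mutex.
    std::mutex loop_mutex;   // by convention, always taken first
    std::mutex waiter_mutex; // by convention, always taken second

    int shared_state = 0;

    // Every thread honors the same global order: loop_mutex, then
    // waiter_mutex. A path that locked waiter_mutex first and then blocked
    // on loop_mutex (as in the bug) could deadlock against this one.
    void safe_path()
    {
        std::lock_guard<std::mutex> l1(loop_mutex);   // first
        std::lock_guard<std::mutex> l2(waiter_mutex); // second
        ++shared_state;
    }

    int main()
    {
        std::thread a(safe_path);
        std::thread b(safe_path);
        a.join();
        b.join();
        assert(shared_state == 2);
        std::cout << "ok\n";
    }
    ```

    This is only the ordering principle, not the PR's mechanism; the actual fix routes the conflicting work through loop->sync() so it runs on the event loop thread.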

  2. DrahtBot commented at 8:16 PM on September 2, 2025: none


    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.


    Reviews

    See the guideline for information on the review process.

    Type | Reviewers
    ACK  | Sjors, Eunovo

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.


    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #210 (Replace KJ_DEFER with kj::defer by Sjors)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.


  3. ryanofsky renamed this from:
    bug: fix ProxyClient<Thread> deadlock if disconnected as IPC call is returning
    to:
    bug: fix mptest hang, ProxyClient<Thread> deadlock in disconnect handler
    on Sep 3, 2025
  4. ryanofsky commented at 4:19 PM on September 3, 2025: collaborator

    I captured two stack traces of the mptest hang on master (1b8d4a6f1e54b92708bd2ad627ec6d440a1daf3d) to be more confident that this PR fixes the bug. I might also be able to update the test to make it trigger this bug more reliably, but I would need to think more about how to do that. Both stack traces were identical and showed a deadlock between Waiter::m_mutex and EventLoop::post. The stack trace annotated with source code is below.

    • Thread 1 is the main test thread. It is stuck at the end of the "disconnecting and blocking" test trying to destroy ProxyClient<FooInterface>, blocked waiting on EventLoop::post and EventLoop::m_mutex in order to free the client IPC handle it owns.
    • Thread 2 is the server thread created by ProxyServer<ThreadMap>::makeThread which executes async IPC calls. It is blocked in a similar place as thread 1, trying to destroy ProxyClient<Thread> while waiting for EventLoop::post. But additionally, further up the stack in frame 22, in the type-context.h PassField function, it has locked Waiter::m_mutex to delete the ThreadContext::request_threads map entry containing the object being destroyed. This lock is the bug: Waiter::m_mutex can't be held while calling EventLoop::post, since that violates the lock ordering.
    • Thread 3 is the EventLoop thread, which is processing the disconnected event. It is running the ProxyClient<Thread>::m_disconnected_cb callback, which is stuck trying to acquire Waiter::m_mutex in order to delete its ThreadContext::request_threads entry. It can't acquire the mutex because thread 2 holds it. It is also trying to delete the same map entry as thread 2, an additional bug fixed in this PR by always deleting these entries on the event loop thread instead of letting thread 2 access the map directly.
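    The fix in the last bullet — funneling all request_threads mutations through a single event loop thread — can be sketched with a toy loop (MiniLoop, its post/run/stop methods, and the map below are hypothetical stand-ins, not the actual mp classes). Because every erase runs as a task on the loop thread, two racing deleters can never touch the map concurrently, and erasing an already-erased key is a harmless no-op.

    ```cpp
    #include <cassert>
    #include <condition_variable>
    #include <functional>
    #include <iostream>
    #include <map>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>

    // Minimal single-threaded "event loop": tasks posted from other threads
    // run only on the thread calling run(), so data touched exclusively from
    // tasks needs no extra locking.
    class MiniLoop {
    public:
        void post(std::function<void()> fn)
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push(std::move(fn));
            m_cv.notify_one();
        }
        void run()
        {
            for (;;) {
                std::function<void()> fn;
                {
                    std::unique_lock<std::mutex> lock(m_mutex);
                    m_cv.wait(lock, [this] { return !m_tasks.empty() || m_done; });
                    if (m_tasks.empty()) return; // done and drained
                    fn = std::move(m_tasks.front());
                    m_tasks.pop();
                }
                fn(); // run task without holding m_mutex
            }
        }
        void stop()
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_done = true;
            m_cv.notify_one();
        }
    private:
        std::mutex m_mutex;
        std::condition_variable m_cv;
        std::queue<std::function<void()>> m_tasks;
        bool m_done = false;
    };

    int main()
    {
        MiniLoop loop;
        std::map<int, std::string> request_threads{{1, "thread-entry"}};

        // A server thread and a disconnect handler both want entry 1 gone.
        // Instead of erasing directly (racing on the map), each posts the
        // erase to the loop; map::erase(key) on a missing key is a no-op,
        // so running both tasks is safe in either order.
        std::thread server([&] { loop.post([&] { request_threads.erase(1); }); });
        std::thread disconnect([&] { loop.post([&] { request_threads.erase(1); }); });

        server.join();
        disconnect.join();
        loop.stop();
        loop.run(); // drain the posted tasks on this (the "loop") thread

        assert(request_threads.empty());
        std::cout << "map empty: ok\n";
    }
    ```

    The real code uses loop->sync() rather than a fire-and-forget post, but the serialization idea is the same.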

    <details><summary>thread 1</summary> <p>

    [#0](/bitcoin-core-multiprocess/0/)  0x00007f480509438e in __futex_abstimed_wait_common () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    [#1](/bitcoin-core-multiprocess/1/)  0x00007f4805097010 in pthread_cond_wait@@GLIBC_2.3.2 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    [#2](/bitcoin-core-multiprocess/2/)  0x000055df0ff8bdbb in std::condition_variable::wait<mp::EventLoop::post(kj::Function<void ()>)::$_2>(std::unique_lock<std::mutex>&, mp::EventLoop::post(kj::Function<void ()>)::$_2) (this=0x7f4804ffed08, __lock=..., __p=...)
        at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/condition_variable:105
    [#3](/bitcoin-core-multiprocess/3/)  mp::EventLoop::post (this=0x7f4804ffec90, fn=...) at mp/src/mp/proxy.cpp:273
    
       269      Unlock(lock, [&] {
       270          char buffer = 0;
       271          KJ_SYSCALL(write(post_fd, &buffer, 1));
       272      });
    >  273      m_cv.wait(lock.m_lock, [this, &fn]() MP_REQUIRES(m_mutex) { return m_post_fn != &fn; });
       274  }
    
    [#4](/bitcoin-core-multiprocess/4/)  0x000055df0feefd7d in mp::EventLoop::sync<mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const::{lambda()#1}>(mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const::{lambda()#1}&&) (this=0x7f4804ffec90, callable=...) at mp/include/mp/proxy-io.h:182
    
       179      template <typename Callable>
       180      void sync(Callable&& callable)
       181      {
     > 182          post(std::forward<Callable>(callable));
       183      }
    
    [#5](/bitcoin-core-multiprocess/5/)  0x000055df0feefc52 in mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const (this=<optimized out>)
        at mp/include/mp/proxy-io.h:434
    
       431          Sub::destroy(*this);
       432
       433          // FIXME: Could just invoke removed addCleanup fn here instead of duplicating code
     > 434          m_context.loop->sync([&]() {
       435              // Remove disconnect callback on cleanup so it doesn't run and try
       436              // to access this object after it's destroyed. This call needs to
       437              // run inside loop->sync() on the event loop thread because
       438              // otherwise, if there were an ill-timed disconnect, the
       439              // onDisconnect handler could fire and delete the Connection object
       440              // before the removeSyncCleanup call.
       441              if (m_context.connection) m_context.connection->removeSyncCleanup(disconnect_cb);
    
    [#6](/bitcoin-core-multiprocess/6/)  std::__invoke_impl<void, mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}&>(std::__invoke_other, mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}&) (__f=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:61
    [#7](/bitcoin-core-multiprocess/7/)  std::__invoke_r<void, mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}&>(mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}&) (__fn=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:111
    [#8](/bitcoin-core-multiprocess/8/)  std::_Function_handler<void (), mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::ProxyClientBase(mp::test::messages::FooInterface::Client, mp::Connection*, bool)::{lambda()#2}>::_M_invoke(std::_Any_data const&) (__functor=...)
        at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:290
    [#9](/bitcoin-core-multiprocess/9/)  0x000055df0ff52d26 in std::function<void()>::operator() (this=0x7fffb7c1b700) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:591
    [#10](/bitcoin-core-multiprocess/10/) mp::CleanupRun (fns=empty std::__cxx11::list) at mp/include/mp/proxy.h:43
    
        39  inline void CleanupRun(CleanupList& fns) {
        40      while (!fns.empty()) {
        41          auto fn = std::move(fns.front());
        42          fns.pop_front();
    >   43          fn();
        44      }
        45  }
    
    [#11](/bitcoin-core-multiprocess/11/) 0x000055df0ff872e7 in mp::ProxyClientBase<mp::test::messages::FooInterface, mp::test::FooImplementation>::~ProxyClientBase (this=0x7f4800027cf0) at mp/include/mp/proxy-io.h:460
    
       457  template <typename Interface, typename Impl>
       458  ProxyClientBase<Interface, Impl>::~ProxyClientBase() noexcept
       459  {
    >  460      CleanupRun(m_context.cleanup_fns);
       461  }
    
    [#12](/bitcoin-core-multiprocess/12/) 0x000055df0feebfc4 in std::default_delete<mp::ProxyClient<mp::test::messages::FooInterface> >::operator() (this=0x7fffb7c1b830, __ptr=0x7f4800027cf0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:93
    [#13](/bitcoin-core-multiprocess/13/) std::__uniq_ptr_impl<mp::ProxyClient<mp::test::messages::FooInterface>, std::default_delete<mp::ProxyClient<mp::test::messages::FooInterface> > >::reset (this=0x7fffb7c1b830, __p=0x0)
        at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:205
    [#14](/bitcoin-core-multiprocess/14/) std::unique_ptr<mp::ProxyClient<mp::test::messages::FooInterface>, std::default_delete<mp::ProxyClient<mp::test::messages::FooInterface> > >::reset (this=0x7fffb7c1b830, __p=0x0)
        at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:504
    [#15](/bitcoin-core-multiprocess/15/) mp::test::TestSetup::~TestSetup (this=this@entry=0x7fffb7c1b7d8) at mp/test/mp/test/test.cpp:103
    
        98      ~TestSetup()
        99      {
       100          // Test that client cleanup_fns are executed.
       101          bool destroyed = false;
       102          client->m_context.cleanup_fns.emplace_front([&destroyed] { destroyed = true; });
    >  103          client.reset();
       104          KJ_EXPECT(destroyed);
       105
       106          thread.join();
       107      }
    
    [#16](/bitcoin-core-multiprocess/16/) 0x000055df0fee997f in mp::test::TestCase251::run (this=<optimized out>) at mp/test/mp/test/test.cpp:298
    
       292      // Now that the disconnect has been detected, set signal allowing the
       293      // callFnAsync() IPC call to return. Since signalling may not wake up the
       294      // thread right away, it is important for the signal variable to be declared
       295      // *before* the TestSetup variable so is not destroyed while
       296      // signal.get_future().get() is called.
       297      signal.set_value();
    >  298  }
    
    [#17](/bitcoin-core-multiprocess/17/) 0x00007f4805a34291 in kj::Maybe<kj::Exception> kj::runCatchingExceptions<kj::TestRunner::run()::{lambda()#1}>(kj::TestRunner::run()::{lambda()#1}&&) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-test.so.1.1.0
    [#18](/bitcoin-core-multiprocess/18/) 0x00007f4805a33a08 in kj::TestRunner::run() () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-test.so.1.1.0
    [#19](/bitcoin-core-multiprocess/19/) 0x00007f4805a337bd in kj::Function<kj::MainBuilder::Validity ()>::Impl<kj::_::BoundMethod<kj::TestRunner&, kj::TestRunner::getMain()::{lambda(auto:1&, (auto:2&&)...)#7}, kj::TestRunner::getMain()::{lambda(auto:1&, (auto:2&&)...)#8}> >::operator()() ()
       from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-test.so.1.1.0
    [#20](/bitcoin-core-multiprocess/20/) 0x00007f480572f93d in kj::MainBuilder::MainImpl::operator()(kj::StringPtr, kj::ArrayPtr<kj::StringPtr const>) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj.so.1.1.0
    [#21](/bitcoin-core-multiprocess/21/) 0x00007f480572df56 in kj::Maybe<kj::Exception> kj::runCatchingExceptions<kj::runMainAndExit(kj::ProcessContext&, kj::Function<void (kj::StringPtr, kj::ArrayPtr<kj::StringPtr const>)>&&, int, char**)::$_0>(kj::runMainAndExit(kj::ProcessContext&, kj::Function<void (kj::StringPtr, kj::ArrayPtr<kj::StringPtr const>)>&&, int, char**)::$_0&&) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj.so.1.1.0
    [#22](/bitcoin-core-multiprocess/22/) 0x00007f480572dc7f in kj::runMainAndExit(kj::ProcessContext&, kj::Function<void (kj::StringPtr, kj::ArrayPtr<kj::StringPtr const>)>&&, int, char**) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj.so.1.1.0
    [#23](/bitcoin-core-multiprocess/23/) 0x00007f4805a329b9 in main () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-test.so.1.1.0
    [#24](/bitcoin-core-multiprocess/24/) 0x00007f480502a47e in __libc_start_call_main () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    [#25](/bitcoin-core-multiprocess/25/) 0x00007f480502a539 in __libc_start_main_impl () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    [#26](/bitcoin-core-multiprocess/26/) 0x000055df0fee6eb5 in _start ()
    
    

    </p> </details>

    <details><summary>thread 2</summary> <p>

    [#0](/bitcoin-core-multiprocess/0/)  0x00007f480509438e in __futex_abstimed_wait_common () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    [#1](/bitcoin-core-multiprocess/1/)  0x00007f4805097010 in pthread_cond_wait@@GLIBC_2.3.2 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    [#2](/bitcoin-core-multiprocess/2/)  0x000055df0ff8bd0b in std::condition_variable::wait<mp::EventLoop::post(kj::Function<void ()>)::$_0>(std::unique_lock<std::mutex>&, mp::EventLoop::post(kj::Function<void ()>)::$_0) (this=0x7f4804ffed08, __lock=..., __p=...)
        at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/condition_variable:105
    [#3](/bitcoin-core-multiprocess/3/)  mp::EventLoop::post (this=this@entry=0x7f4804ffec90, fn=...) at mp/src/mp/proxy.cpp:266
    
       258  void EventLoop::post(kj::Function<void()> fn)
       259  {
       260      if (std::this_thread::get_id() == m_thread_id) {
       261          fn();
       262          return;
       263      }
       264      Lock lock(m_mutex);
       265      EventLoopRef ref(*this, &lock);
    >  266      m_cv.wait(lock.m_lock, [this]() MP_REQUIRES(m_mutex) { return m_post_fn == nullptr; });
       267      m_post_fn = &fn;
       268      int post_fd{m_post_fd};
       269      Unlock(lock, [&] {
       270          char buffer = 0;
       271          KJ_SYSCALL(write(post_fd, &buffer, 1));
       272      });
       273      m_cv.wait(lock.m_lock, [this, &fn]() MP_REQUIRES(m_mutex) { return m_post_fn != &fn; });
       274  }
    
    [#4](/bitcoin-core-multiprocess/4/)  0x000055df0ff8f22d in mp::EventLoop::sync<mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const::{lambda()#1}>(mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const::{lambda()#1}&&) (this=0x7f4804ffec90, callable=...) at mp/include/mp/proxy-io.h:182
    
       179      template <typename Callable>
       180      void sync(Callable&& callable)
       181      {
    >  182          post(std::forward<Callable>(callable));
       183      }
    
    [#5](/bitcoin-core-multiprocess/5/)  0x000055df0ff8f102 in mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}::operator()() const (this=<optimized out>) at mp/include/mp/proxy-io.h:434
    
       431          Sub::destroy(*this);
       432
       433          // FIXME: Could just invoke removed addCleanup fn here instead of duplicating code
    >  434          m_context.loop->sync([&]() {
       435              // Remove disconnect callback on cleanup so it doesn't run and try
       436              // to access this object after it's destroyed. This call needs to
       437              // run inside loop->sync() on the event loop thread because
       438              // otherwise, if there were an ill-timed disconnect, the
       439              // onDisconnect handler could fire and delete the Connection object
       440              // before the removeSyncCleanup call.
       441              if (m_context.connection) m_context.connection->removeSyncCleanup(disconnect_cb);
    
    [#6](/bitcoin-core-multiprocess/6/)  std::__invoke_impl<void, mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}&>(std::__invoke_other, mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}&) (__f=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:61
    [#7](/bitcoin-core-multiprocess/7/)  std::__invoke_r<void, mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}&>(mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}&) (__fn=...)
        at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:111
    [#8](/bitcoin-core-multiprocess/8/)  std::_Function_handler<void (), mp::ProxyClientBase<mp::Thread, capnp::Void>::ProxyClientBase(mp::Thread::Client, mp::Connection*, bool)::{lambda()#2}>::_M_invoke(std::_Any_data const&) (__functor=...)
        at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:290
    [#9](/bitcoin-core-multiprocess/9/)  0x000055df0ff52d26 in std::function<void()>::operator() (this=0x7f47ffffe870) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:591
    [#10](/bitcoin-core-multiprocess/10/) mp::CleanupRun (fns=empty std::__cxx11::list) at mp/include/mp/proxy.h:43
    
        39  inline void CleanupRun(CleanupList& fns) {
        40      while (!fns.empty()) {
        41          auto fn = std::move(fns.front());
        42          fns.pop_front();
    >   43          fn();
        44      }
        45  }
    
    [#11](/bitcoin-core-multiprocess/11/) 0x000055df0ff8e627 in mp::ProxyClientBase<mp::Thread, capnp::Void>::~ProxyClientBase (this=0x7f47f8000e58) at mp/include/mp/proxy-io.h:460
    
       457  template <typename Interface, typename Impl>
       458  ProxyClientBase<Interface, Impl>::~ProxyClientBase() noexcept
       459  {
    >  460      CleanupRun(m_context.cleanup_fns);
       461  }
    
    [#12](/bitcoin-core-multiprocess/12/) 0x000055df0ff53129 in std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >::~pair (this=0x7f47f8000e50) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_iterator.h:3013
    [#13](/bitcoin-core-multiprocess/13/) std::destroy_at<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > (__location=0x7f47f8000e50) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_construct.h:88
    [#14](/bitcoin-core-multiprocess/14/) std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > > >::destroy<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > (__p=0x7f47f8000e50, __a=...)
        at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/alloc_traits.h:599
    [#15](/bitcoin-core-multiprocess/15/) std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::_M_destroy_node (this=0x7f47fffff680, __p=0x7f47f8000e30) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:621
    [#16](/bitcoin-core-multiprocess/16/) std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::_M_drop_node (this=0x7f47fffff680, __p=0x7f47f8000e30) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:629
    [#17](/bitcoin-core-multiprocess/17/) std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::_M_erase (this=this@entry=0x7f47fffff680, __x=0x7f47f8000e30) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:1934
    [#18](/bitcoin-core-multiprocess/18/) 0x000055df0ff530c9 in std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::clear (this=0x7f47fffff680) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:1251
    [#19](/bitcoin-core-multiprocess/19/) std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::_M_erase_aux (this=0x7f47fffff680, __first={...}, __last={...}) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:2505
    [#20](/bitcoin-core-multiprocess/20/) 0x000055df0ff70fa6 in std::_Rb_tree<mp::Connection*, std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> >, std::_Select1st<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > >, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::erase (this=0x7f4804ffed30, __x=<optimized out>) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_tree.h:2519
    [#21](/bitcoin-core-multiprocess/21/) std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >::erase (this=0x7f4804ffed30, __x=<optimized out>)
        at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/stl_map.h:1118
    [#22](/bitcoin-core-multiprocess/22/) operator() (this=this@entry=0x7f47ffffeaf8) at mp/include/mp/type-context.h:103
    
       102                      const bool erase_thread{inserted};
       103                      KJ_DEFER(if (erase_thread) {
       104                          std::unique_lock<std::mutex> lock(thread_context.waiter->m_mutex);
       105                          // Call erase here with a Connection* argument instead
       106                          // of an iterator argument, because the `request_thread`
       107                          // iterator may be invalid if the connection is closed
       108                          // during this function call. More specifically, the
       109                          // iterator may be invalid because SetThread adds a
       110                          // cleanup callback to the Connection destructor that
       111                          // erases the thread from the map, and also because the
       112                          // ProxyServer<Thread> destructor calls
       113                          // request_threads.clear().
    >  114                          request_threads.erase(server.m_context.connection);
       115                      });
       116                      fn.invoke(server_context, args...);
       117                  }
    
    [#23](/bitcoin-core-multiprocess/23/) 0x000055df0ff709c7 in run (this=<optimized out>) at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/common.h:2007
    [#24](/bitcoin-core-multiprocess/24/) ~Deferred (this=<optimized out>) at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/common.h:1996
    [#25](/bitcoin-core-multiprocess/25/) operator() (this=0x7f4800034628) at mp/include/mp/type-context.h:117
    
    >  117                  }
    
    [#26](/bitcoin-core-multiprocess/26/) 0x000055df0ff04920 in kj::Function<void()>::operator() (this=0x7f47ffffeda0) at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/function.h:119
    [#27](/bitcoin-core-multiprocess/27/) mp::Unlock<std::unique_lock<std::mutex>, kj::Function<void()>&> (lock=..., callback=...) at mp/include/mp/util.h:198
    
       194  template <typename Lock, typename Callback>
       195  void Unlock(Lock& lock, Callback&& callback)
       196  {
       197      const UnlockGuard<Lock> unlock(lock);
    >  198      callback();
       199  }
    
    [#28](/bitcoin-core-multiprocess/28/) 0x000055df0ff8d979 in mp::Waiter::wait<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1})::{lambda()#1}::operator()() const (this=<optimized out>) at mp/include/mp/proxy-io.h:296
    
       284      template <class Predicate>
       285      void wait(std::unique_lock<std::mutex>& lock, Predicate pred)
       286      {
       287          m_cv.wait(lock, [&] {
       288              // Important for this to be "while (m_fn)", not "if (m_fn)" to avoid
       289              // a lost-wakeup bug. A new m_fn and m_cv notification might be sent
       290              // after the fn() call and before the lock.lock() call in this loop
       291              // in the case where a capnp response is sent and a brand new
       292              // request is immediately received.
       293              while (m_fn) {
       294                  auto fn = std::move(*m_fn);
       295                  m_fn.reset();
    >  296                  Unlock(lock, fn);
       297              }
       298              const bool done = pred();
       299              return done;
       300          });
       301      }
    
    [#29](/bitcoin-core-multiprocess/29/) std::condition_variable::wait<mp::Waiter::wait<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1})::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::Waiter::wait<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1})::{lambda()#1}) (this=0x7f47f8000dd8, __lock=..., __p=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/condition_variable:104
    [#30](/bitcoin-core-multiprocess/30/) mp::Waiter::wait<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}>(std::unique_lock<std::mutex>&, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const::{lambda()#1}) (this=0x7f47f8000db0, lock=..., pred=...) at mp/include/mp/proxy-io.h:287
    
    >  287          m_cv.wait(lock, [&] {
    
    [#31](/bitcoin-core-multiprocess/31/) mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0::operator()() const (this=<optimized out>) at mp/src/mp/proxy.cpp:404
    
       393  kj::Promise<void> ProxyServer<ThreadMap>::makeThread(MakeThreadContext context)
       394  {
       395      const std::string from = context.getParams().getName();
       396      std::promise<ThreadContext*> thread_context;
       397      std::thread thread([&thread_context, from, this]() {
       398          g_thread_context.thread_name = ThreadName(m_connection.m_loop->m_exe_name) + " (from " + from + ")";
       399          g_thread_context.waiter = std::make_unique<Waiter>();
       400          thread_context.set_value(&g_thread_context);
       401          std::unique_lock<std::mutex> lock(g_thread_context.waiter->m_mutex);
       402          // Wait for shutdown signal from ProxyServer<Thread> destructor (signal
       403          // is just waiter getting set to null.)
    >  404          g_thread_context.waiter->wait(lock, [] { return !g_thread_context.waiter; });
       405      });
       406      auto thread_server = kj::heap<ProxyServer<Thread>>(*thread_context.get_future().get(), std::move(thread));
       407      auto thread_client = m_connection.m_threads.add(kj::mv(thread_server));
       408      context.getResults().setResult(kj::mv(thread_client));
       409      return kj::READY_NOW;
       410  }
    
    [#32](/bitcoin-core-multiprocess/32/) std::__invoke_impl<void, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0>(std::__invoke_other, mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0&&) (__f=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:61
    [#33](/bitcoin-core-multiprocess/33/) std::__invoke<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0>(mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0&&) (__fn=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:96
    [#34](/bitcoin-core-multiprocess/34/) std::thread::_Invoker<std::tuple<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0> >::_M_invoke<0ul> (this=<optimized out>)
        at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_thread.h:301
    [#35](/bitcoin-core-multiprocess/35/) std::thread::_Invoker<std::tuple<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0> >::operator() (this=<optimized out>)
        at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_thread.h:308
    [#36](/bitcoin-core-multiprocess/36/) std::thread::_State_impl<std::thread::_Invoker<std::tuple<mp::ProxyServer<mp::ThreadMap>::makeThread(capnp::CallContext<mp::ThreadMap::MakeThreadParams, mp::ThreadMap::MakeThreadResults>)::$_0> > >::_M_run (this=<optimized out>)
        at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_thread.h:253
    [#37](/bitcoin-core-multiprocess/37/) 0x00007f48054ed064 in execute_native_thread_routine () from /nix/store/7c0v0kbrrdc2cqgisi78jdqxn73n3401-gcc-14.2.1.20250322-lib/lib/libstdc++.so.6
    [#38](/bitcoin-core-multiprocess/38/) 0x00007f4805097e63 in start_thread () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    [#39](/bitcoin-core-multiprocess/39/) 0x00007f480511bdbc in __clone3 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    

    </p> </details>

    <details><summary>thread 3</summary> <p>

    [#0](/bitcoin-core-multiprocess/0/)  0x00007f480509463f in __lll_lock_wait () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    [#1](/bitcoin-core-multiprocess/1/)  0x00007f480509b392 in pthread_mutex_lock@@GLIBC_2.2.5 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    [#2](/bitcoin-core-multiprocess/2/)  0x000055df0ff8d324 in __gthread_mutex_lock (__mutex=0x7f47f8000db0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/x86_64-unknown-linux-gnu/bits/gthr-default.h:762
    [#3](/bitcoin-core-multiprocess/3/)  std::mutex::lock (this=0x7f47f8000db0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_mutex.h:113
    [#4](/bitcoin-core-multiprocess/4/)  std::unique_lock<std::mutex>::lock (this=<optimized out>) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_lock.h:147
    [#5](/bitcoin-core-multiprocess/5/)  std::unique_lock<std::mutex>::unique_lock (__m=..., this=<optimized out>) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_lock.h:73
    [#6](/bitcoin-core-multiprocess/6/)  mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0::operator()() const (
        this=0x7f47f8001170) at mp/src/mp/proxy.cpp:326
    
       308  std::tuple<ConnThread, bool> SetThread(ConnThreads& threads, std::mutex& mutex, Connection* connection, const std::function<Thread::Client()>& make_thread)
       309  {
       310      const std::unique_lock<std::mutex> lock(mutex);
       311      auto thread = threads.find(connection);
       312      if (thread != threads.end()) return {thread, false};
       313      thread = threads.emplace(
       314          std::piecewise_construct, std::forward_as_tuple(connection),
       315          std::forward_as_tuple(make_thread(), connection, /* destroy_connection= */ false)).first;
       316      thread->second.setDisconnectCallback([&threads, &mutex, thread] {
       317          // Note: it is safe to use the `thread` iterator in this cleanup
       318          // function, because the iterator would only be invalid if the map entry
       319          // was removed, and if the map entry is removed the ProxyClient<Thread>
       320          // destructor unregisters the cleanup.
       321
       322          // Connection is being destroyed before thread client is, so reset
       323          // thread client m_disconnect_cb member so thread client destructor does not
       324          // try to unregister this callback after connection is destroyed.
       325          // Remove connection pointer about to be destroyed from the map
    >  326          const std::unique_lock<std::mutex> lock(mutex);
       327          thread->second.m_disconnect_cb.reset();
       328          threads.erase(thread);
       329      });
       330      return {thread, true};
       331  }
    
    [#7](/bitcoin-core-multiprocess/7/)  std::__invoke_impl<void, mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0&>(std::__invoke_other, mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0&)
        (__f=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:61
    [#8](/bitcoin-core-multiprocess/8/)  std::__invoke_r<void, mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0&>(mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client ()> const&)::$_0&) (__fn=...)
        at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/invoke.h:111
    [#9](/bitcoin-core-multiprocess/9/)  std::_Function_handler<void(), mp::SetThread(std::map<mp::Connection*, mp::ProxyClient<mp::Thread>, std::less<mp::Connection*>, std::allocator<std::pair<mp::Connection* const, mp::ProxyClient<mp::Thread> > > >&, std::mutex&, mp::Connection*, std::function<mp::Thread::Client()> const&)::$_0>::_M_invoke (__functor=...) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:290
    [#10](/bitcoin-core-multiprocess/10/) 0x000055df0ff8de15 in std::function<void()>::operator() (this=0x7f47f8000fb0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/std_function.h:591
    [#11](/bitcoin-core-multiprocess/11/) mp::Unlock<mp::Lock, std::function<void()>&> (lock=..., callback=...) at mp/include/mp/util.h:198
    
       194  template <typename Lock, typename Callback>
       195  void Unlock(Lock& lock, Callback&& callback)
       196  {
       197      const UnlockGuard<Lock> unlock(lock);
    >  198      callback();
       199  }
    
    [#12](/bitcoin-core-multiprocess/12/) 0x000055df0ff8a70f in mp::Connection::~Connection (this=0x7f480002a810) at mp/src/mp/proxy.cpp:139
    
       135      Lock lock{m_loop->m_mutex};
       136      while (!m_sync_cleanup_fns.empty()) {
       137          CleanupList fn;
       138          fn.splice(fn.begin(), m_sync_cleanup_fns, m_sync_cleanup_fns.begin());
    >  139          Unlock(lock, fn.front());
       140      }
       141  }
    
    [#13](/bitcoin-core-multiprocess/13/) 0x000055df0feeedd3 in std::default_delete<mp::Connection>::operator() (__ptr=0x7f480002a810, this=<optimized out>) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:93
    [#14](/bitcoin-core-multiprocess/14/) std::__uniq_ptr_impl<mp::Connection, std::default_delete<mp::Connection> >::reset (this=0xfffffffffffffe00, __p=0x0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:205
    [#15](/bitcoin-core-multiprocess/15/) std::unique_ptr<mp::Connection, std::default_delete<mp::Connection> >::reset (this=0xfffffffffffffe00, __p=0x0) at /nix/store/9ds850ifd4jwcccpp3v14818kk74ldf2-gcc-14.2.1.20250322/include/c++/14.2.1.20250322/bits/unique_ptr.h:504
    [#16](/bitcoin-core-multiprocess/16/) mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const::{lambda()#2}::operator()() const (this=<optimized out>) at mp/test/mp/test/test.cpp:79
    
        76                server_disconnect = [&] { loop.sync([&] { server_connection.reset(); }); };
        77                // Set handler to destroy the server when the client disconnects. This
        78                // is ignored if server_disconnect() is called instead.
    >   79                server_connection->onDisconnect([&] { server_connection.reset(); });
        80
        81                auto client_connection = std::make_unique<Connection>(loop, kj::mv(pipe.ends[1]));
    
    [#17](/bitcoin-core-multiprocess/17/) kj::_::MaybeVoidCaller<kj::_::Void, kj::_::Void>::apply<mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const::{lambda()#2}>(mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const::{lambda()#2}&, kj::_::Void&&) (func=..., in=...)
        at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/async-prelude.h:195
    [#18](/bitcoin-core-multiprocess/18/) kj::_::TransformPromiseNode<kj::_::Void, kj::_::Void, mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const::{lambda()#2}, kj::_::PropagateException>::getImpl(kj::_::ExceptionOrValue&) (this=<optimized out>, output=...)
        at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/async-inl.h:739
    [#19](/bitcoin-core-multiprocess/19/) 0x00007f48057c3a3e in kj::_::TransformPromiseNodeBase::get(kj::_::ExceptionOrValue&) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
    [#20](/bitcoin-core-multiprocess/20/) 0x00007f48057d102a in kj::TaskSet::Task::fire() () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
    [#21](/bitcoin-core-multiprocess/21/) 0x00007f48057d143d in non-virtual thunk to kj::TaskSet::Task::fire() () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
    [#22](/bitcoin-core-multiprocess/22/) 0x00007f48057c8cb2 in kj::_::waitImpl(kj::Own<kj::_::PromiseNode, kj::_::PromiseDisposer>&&, kj::_::ExceptionOrValue&, kj::WaitScope&, kj::SourceLocation)::$_2::operator()() const () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
    [#23](/bitcoin-core-multiprocess/23/) 0x00007f48057c1d68 in kj::_::waitImpl(kj::Own<kj::_::PromiseNode, kj::_::PromiseDisposer>&&, kj::_::ExceptionOrValue&, kj::WaitScope&, kj::SourceLocation) () from /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/lib/libkj-async.so.1.1.0
    [#24](/bitcoin-core-multiprocess/24/) 0x000055df0ff8e44c in kj::Promise<unsigned long>::wait (this=<optimized out>, waitScope=..., location=...) at /nix/store/46kiq9naswgbqfc14kc9nxcbgd0rv0m2-capnproto-1.1.0/include/kj/async-inl.h:1357
    [#25](/bitcoin-core-multiprocess/25/) 0x000055df0ff8b722 in mp::EventLoop::loop (this=0x7f4804ffec90) at mp/src/mp/proxy.cpp:231
    
    >  231          const size_t read_bytes = wait_stream->read(&buffer, 0, 1).wait(m_io_context.waitScope);
    
    
    [#26](/bitcoin-core-multiprocess/26/) 0x000055df0feed376 in mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const (this=0x7f480002d1e8) at mp/test/mp/test/test.cpp:92
    
        91                client_promise.set_value(std::move(client_proxy));
    >   92                loop.loop();
    
    [#27](/bitcoin-core-multiprocess/27/) 0x00007f48054ed064 in execute_native_thread_routine () from /nix/store/7c0v0kbrrdc2cqgisi78jdqxn73n3401-gcc-14.2.1.20250322-lib/lib/libstdc++.so.6
    [#28](/bitcoin-core-multiprocess/28/) 0x00007f4805097e63 in start_thread () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    [#29](/bitcoin-core-multiprocess/29/) 0x00007f480511bdbc in __clone3 () from /nix/store/cg9s562sa33k78m63njfn1rw47dp9z0i-glibc-2.40-66/lib/libc.so.6
    

    </p> </details>

  5. ryanofsky force-pushed on Sep 3, 2025
  6. ryanofsky commented at 5:02 PM on September 3, 2025: collaborator

    <!-- begin push-2 -->

    Updated fe1cbedc78c8eadcd668fb6cac244d63747f1503 -> 1d6d854047ab2c6c06f36e5a21fa381969abb483 (pr/context.1 -> pr/context.2, compare)<!-- end --> fixing iwyu error and making minor cleanups

  7. in include/mp/proxy-io.h:539 in 1d6d854047 outdated
     531 | @@ -534,29 +532,73 @@ void ProxyServerBase<Interface, Impl>::invokeDestroy()
     532 |      CleanupRun(m_context.cleanup_fns);
     533 |  }
     534 |  
     535 | -using ConnThreads = std::map<Connection*, ProxyClient<Thread>>;
     536 | +//! Map from Connection to local or remote thread handle which will be used over
     537 | +//! that connection. This map will typically only contain one entry, but can
     538 | +//! contain multiple if a single thread makes IPC calls over multiple
     539 | +//! connections. A std::optional value type is used to avoid the map needing to
     540 | +//! be locked while ProxyClient<Thread> objects are constructred, see
    


    maflcko commented at 7:16 AM on September 4, 2025:
    constructred -> constructed [spelling error]

    ryanofsky commented at 7:55 PM on September 5, 2025:

    constructred

    Thanks, fixed

  8. in include/mp/proxy-io.h:571 in dd86363f34 outdated
     569 | -    //! server thread.
     570 | +    //! Waiter object used to allow remote clients to execute code on this
     571 | +    //! thread. If this is a server thread created by
     572 | +    //! ProxyServer<ThreadMap>::makeThread(), it is initialized by that
     573 | +    //! function. Otherwise if this is a thread that was created externally,
     574 | +    //! this is just initialized the first time the thread tries to make an IPC
    


    Eunovo commented at 8:36 AM on September 5, 2025:

    nit: then it is initialized...


    ryanofsky commented at 7:54 PM on September 5, 2025:

    re: #201 (review)

    Thanks, rephrased this to be clearer now, hopefully

  9. in src/mp/proxy.cpp:358 in cd6d95aef8 outdated
     349 | +        // loop->sync(), so if the connection is broken there is not a race
     350 | +        // between this thread trying to remove the callback and the disconnect
     351 | +        // handler attempting to call it.
     352 | +        m_context.loop->sync([&]() {
     353 | +            if (m_disconnect_cb) {
     354 | +                m_context.connection->removeSyncCleanup(*m_disconnect_cb);
    


    Eunovo commented at 9:31 AM on September 5, 2025:

    https://github.com/bitcoin-core/libmultiprocess/pull/201/commits/cd6d95aef81149a5ecae2c05aad4f93661c5b713: While attempting to reproduce the bug, the disconnect_cb would hang or cause a segmentation fault while executing. I believe it was caused by the callback being deleted by the ProxyClient<Thread> destructor while it was being executed as part of the cleanups in the Connection destructor.

    [#8](/bitcoin-core-multiprocess/8/)  operator() (__closure=0x7fffe80012e0) at /libmultiprocess/src/mp/proxy.cpp:326
    326	        const std::unique_lock<std::mutex> lock(mutex);
    (gdb) print mutex
    $12 = (std::mutex &) <error reading variable: Cannot access memory at address 0x2f916f0c9da42c7c>
    (gdb) 
    

    This change seems to stop the undefined behaviour caused by the previously described scenario, even though the test still hangs.

    [#8](/bitcoin-core-multiprocess/8/)  operator() (__closure=0x7fffe8000f60) at /libmultiprocess/src/mp/proxy.cpp:326
    326	        const std::unique_lock<std::mutex> lock(mutex);
    (gdb) print mutex
    $1 = (std::mutex &) [@0x7fffe8000db0](/bitcoin-core-multiprocess/contributor/0x7fffe8000db0/): {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 2, __count = 0, __owner = 1676762, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, 
            __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000Ú•\031\000\001", '\000' <repeats 26 times>, __align = 2}}, <No data fields>}
    (gdb) 
    

    The test hangs because the ProxyClient<Thread> destructor waits for the EventLoop to clear the disconnect_cb, but the EventLoop is blocked trying to acquire the waiter mutex. This mutex is still held by PassField in type_context.h during its thread erasure operation, which is higher up in the call stack of the same thread that invoked the destructor, creating a deadlock.


    ryanofsky commented at 8:25 PM on September 5, 2025:

    re: #201 (review)

    While attempting to reproduce the bug, the disconnect_cb would hang or cause a segmentation fault while executing. I believe it was caused by the callback being deleted by the ProxyClient<Thread> destructor while it was being executed as part of the cleanups in the Connection destructor.

    Yes exactly, that's the race condition bug being fixed.

    This change seems to stop the undefined behaviour caused by the previously described scenario, even though the test still hangs. [...] The test hangs because the ProxyClient<Thread> destructor waits for the EventLoop to clear the disconnect_cb, but the EventLoop is blocked trying to acquire the waiter mutex. This mutex is still held by PassField in type_context.h during its thread erasure operation, which is higher up in the call stack of the same thread that invoked the destructor, creating a deadlock.

    Yes, this change in this file is not sufficient to stop the deadlock because there was also code in type_context.h locking Waiter::m_mutex while calling this destructor via request_threads.erase(). ~So this commit also replaces request_threads.erase() with request_threads.extract() to delay calling this destructor until after it releases Waiter::m_mutex~ That code has been fixed with an additional sync() call. (EDIT: The sync() call is really the key to the fix, not the extract() call. While the extract() would fix the deadlock, it would not fix the race condition.)

    With both of these changes there shouldn't be a race or deadlock anymore.

  10. in include/mp/type-context.h:112 in cd6d95aef8 outdated
     118 | +                        // Erase the request_threads entry on the event loop
     119 | +                        // thread with loop->sync(), so if the connection is
     120 | +                        // broken there is not a race between this thread and
     121 | +                        // the disconnect handler trying to destroy the thread
     122 | +                        // client object.
     123 | +                        server.m_context.loop->sync([&] {
    


    Eunovo commented at 10:40 AM on September 5, 2025:

    https://github.com/bitcoin-core/libmultiprocess/pull/201/commits/cd6d95aef81149a5ecae2c05aad4f93661c5b713: The test stops hanging after this change. This thread used to compete with the disconnect handler for the waiter mutex. After this change, the disconnect handler and this callback must execute one after the other.


    ryanofsky commented at 8:17 PM on September 5, 2025:

    re: #201 (review)

    cd6d95a: The test stops hanging after this change. This thread used to compete with the disconnect handler for the waiter mutex. After this change, the disconnect handler and this callback must execute one after the other.

    Yes, the disconnect handler is called from the ~Connection destructor which always runs on the event loop thread, and that would race against this thread which is an IPC server thread. Now everything just runs sequentially on the event loop thread.
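
    The loop->sync() approach can be illustrated with a toy event loop (a hand-rolled sketch with made-up names like `MiniLoop`, not the real `mp::EventLoop` API): work posted from any thread runs serialized on a single loop thread, which is what removes the race between the IPC server thread and the disconnect handler.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

class MiniLoop {
public:
    MiniLoop() : m_thread([this] { run(); }) {}
    ~MiniLoop()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_done = true;
        }
        m_cv.notify_one();
        m_thread.join();
    }
    // Run fn on the loop thread and block until it completes, similar in
    // spirit to EventLoop::sync(). Calls from any thread run one at a time.
    void sync(std::function<void()> fn)
    {
        std::packaged_task<void()> task(std::move(fn));
        std::future<void> done = task.get_future();
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(std::move(task));
        }
        m_cv.notify_one();
        done.get();
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (true) {
            m_cv.wait(lock, [this] { return m_done || !m_queue.empty(); });
            if (m_done && m_queue.empty()) break;
            auto task = std::move(m_queue.front());
            m_queue.pop();
            lock.unlock();
            task(); // Run without holding m_mutex.
            lock.lock();
        }
    }
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<std::packaged_task<void()>> m_queue;
    bool m_done{false};
    std::thread m_thread;
};
```

    Note the real EventLoop::sync() additionally has to handle being called from the event loop thread itself; this sketch would deadlock in that case.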

  11. Eunovo approved
  12. Eunovo commented at 11:09 AM on September 5, 2025: contributor

    Tested ACK https://github.com/bitcoin-core/libmultiprocess/pull/201/commits/1d6d854047ab2c6c06f36e5a21fa381969abb483

    As I understand it, there are two problems that are solved by this PR:

    • a kind of use-after-free bug where the disconnect_cb is deleted in one thread while it is being executed in another
    • a deadlock caused by two threads competing for the same waiter mutex

    I left comments with more details under this review.

  13. ryanofsky force-pushed on Sep 5, 2025
  14. ryanofsky force-pushed on Sep 5, 2025
  15. ryanofsky commented at 8:43 PM on September 5, 2025: collaborator

    Thanks for the review! Updated with suggested changes

    re: #201#pullrequestreview-3188652191

    As I understand it, there are two problems that are solved by this PR:

    Yes, there was basically the deadlock bug, which happened most commonly, and various race bugs that could happen because previously there wasn't really any synchronization between the disconnect handler and the type-context.h cleanup code that runs after asynchronous IPC calls.


    <!-- begin push-3 -->

    Updated 1d6d854047ab2c6c06f36e5a21fa381969abb483 -> 564e353c01b020dfef87adb24c32c56f7b002157 (pr/context.2 -> pr/context.3, compare)<!-- end --> updating comments and adding assert.

    <!-- begin push-4 -->

    Updated 564e353c01b020dfef87adb24c32c56f7b002157 -> f219f1b51c85e5c4a7157c78d278f832eb0bd975 (pr/context.3 -> pr/context.4, compare)<!-- end --> fixing LLM linter issues

  16. TheCharlatan commented at 2:24 PM on September 8, 2025: collaborator

    Can confirm that this fixes the observed test hang.

  17. in include/mp/proxy-io.h:601 in 90d4a42a39 outdated
     600 | +    //! associated with one event loop and guarded by EventLoop::m_mutex. So
     601 | +    //! Waiter::m_mutex does not need to be held while accessing individual
     602 | +    //! ProxyClient<Thread> instances, and may even need to be released to
     603 | +    //! respect lock order and avoid locking Waiter::m_mutex before
     604 | +    //! EventLoop::m_mutex.
     605 | +    ConnThreads callback_threads MP_GUARDED_BY(waiter->m_mutex);
    


    Sjors commented at 7:47 AM on September 10, 2025:

    In 90d4a42a39fe0e6062e0a06ce784c582ac99cb3f doc: describe ThreadContext struct and synchronization requirements: did you mean to introduce this guard and (some of) the synchronization note in the next commit?


    ryanofsky commented at 10:38 AM on September 10, 2025:

    re: #201 (review)

    In 90d4a42 doc: describe ThreadContext struct and synchronization requirements: did you mean to introduce this guard and (some of) the synchronization note in the next commit?

    I could move it to the next commit if preferred. I actually did want to add the MP_GUARDED_BY annotations in this commit to match the comment added here "these maps are guarded by Waiter::m_mutex". I just couldn't add both of them due to compiler errors, so added one here and one later.


    Sjors commented at 11:53 AM on September 10, 2025:

    I think the MP_GUARDED_BY should be in the same commit that explains it (as you do now). It seems slightly better to add them both in the later commit, but no strong preference.


    ryanofsky commented at 8:50 PM on September 10, 2025:

    re: #201 (review)

    Wound up going in a different direction, just adding thread annotations in the second commit. Hodlinator found a bug where annotations were not being enforced as strictly as they could be because the build was using -Wthread-safety-analysis, not -Wthread-safety like the bitcoin build. Fixing this required more changes, so I just made all the thread safety annotations together in one commit.

  18. in src/mp/proxy.cpp:315 in f219f1b51c outdated
     311 | @@ -305,29 +312,34 @@ bool EventLoop::done() const
     312 |      return m_num_clients == 0 && m_async_fns->empty();
     313 |  }
     314 |  
     315 | -std::tuple<ConnThread, bool> SetThread(ConnThreads& threads, std::mutex& mutex, Connection* connection, const std::function<Thread::Client()>& make_thread)
     316 | +std::tuple<ConnThread, bool> SetThread(ConnThreads& threads, Mutex& mutex, Connection* connection, const std::function<Thread::Client()>& make_thread)
    


    hodlinator commented at 8:00 AM on September 10, 2025:

    nit: Accepting an independent mutex here feels very unsteady when not knowing there are only ~2~3 callers. Either taking the full Waiter type or at least naming it waiter_mutex would feel more reassuring.


    hodlinator commented at 12:40 PM on September 10, 2025:

    Having these decoupled variables also results in warnings (at least when building as part of Bitcoin Core, haven't been able to make it inside freestanding libmultiprocess yet).

    ../src/ipc/libmultiprocess/include/mp/type-context.h:28:24: warning: passing variable 'callback_threads' by reference requires holding mutex 'thread_context.waiter->m_mutex' [-Wthread-safety-reference]
       28 |         thread_context.callback_threads, thread_context.waiter->m_mutex, &connection,
    

    ryanofsky commented at 8:53 PM on September 10, 2025:

    re: #201 (review)

    Nice catch! This was caused by the bitcoin build using -Wthread-safety and the builds here only using -Wthread-safety-analysis. This is fixed now by grouping the first two parameters into a struct, which I think should resolve your initial concern too.
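
    The parameter-grouping idea can be sketched roughly like this (illustrative names and simplified types, not the exact PR code): bundling the mutex with a reference to the data it guards means every call site receives both together, which is what lets clang's analysis connect them and avoids the -Wthread-safety-reference warning about passing guarded data by reference.

```cpp
#include <map>
#include <mutex>

// Annotations are active only under clang's thread safety analysis;
// elsewhere they expand to nothing.
#if defined(__clang__)
#define MP_GUARDED_BY(x) __attribute__((guarded_by(x)))
#else
#define MP_GUARDED_BY(x)
#endif

// Pair a mutex with a reference to the data it protects.
template <typename T>
struct GuardedRef {
    std::mutex& mutex;
    T& ref MP_GUARDED_BY(mutex);
};

using ConnThreads = std::map<int, int>; // Stand-in for the real map type.

// Callers pass the mutex and map as one unit instead of two loose references.
int SetEntry(GuardedRef<ConnThreads> threads, int key, int value)
{
    // Lock the mutex that travels with the reference, then touch the data.
    const std::lock_guard<std::mutex> lock(threads.mutex);
    return threads.ref.try_emplace(key, value).first->second;
}
```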

  19. in include/mp/proxy-io.h:541 in 7a8a99c50f outdated
     534 | @@ -537,8 +535,10 @@ void ProxyServerBase<Interface, Impl>::invokeDestroy()
     535 |  //! Map from Connection to local or remote thread handle which will be used over
     536 |  //! that connection. This map will typically only contain one entry, but can
     537 |  //! contain multiple if a single thread makes IPC calls over multiple
     538 | -//! connections.
     539 | -using ConnThreads = std::map<Connection*, ProxyClient<Thread>>;
     540 | +//! connections. A std::optional value type is used to avoid the map needing to
     541 | +//! be locked while ProxyClient<Thread> objects are constructed, see
     542 | +//! ThreadContext "Synchronization note" below.
     543 | +using ConnThreads = std::map<Connection*, std::optional<ProxyClient<Thread>>>;
    


    Sjors commented at 10:19 AM on September 10, 2025:

    In 7a8a99c50feca10be4e42a1cdb135864ce376f71 bug: fix ProxyClient<Thread> deadlock if disconnected as IPC call is returning: this commit might be easier to follow conceptually if you split the std::optional bit into a separate commit.

    I made an attempt in: https://github.com/bitcoin-core/libmultiprocess/commit/af383138e0323d24892c19e7307ef650def195fd

    It probably needs a comment in SetThread that separating the insert doesn't do anything until the next commit. And then it's still a big diff.


    ryanofsky commented at 11:16 AM on September 10, 2025:

    re: #201 (review)

    I made an attempt in: af38313

    It probably needs a comment in SetThread that separating the insert doesn't do anything until the next commit. And then it's still a big diff.

    Thanks will push an update with that change


    Sjors commented at 11:49 AM on September 10, 2025:

    I split off an additional commit 94fe2251f6bc629f98d7ab71b09a8c0b5459d150 that switches the early return to try_emplace. Now the remaining changes are nice and short: 5b65dae936a53a2c6e43ed49f1f382f919979514


    ryanofsky commented at 8:50 PM on September 10, 2025:

    re: #201 (review)

    I split off an additional commit 94fe225 that switches the early return to try_emplace. Now the remaining changes are nice and short: 5b65dae

    Nice! Took both of your commits
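
    The two-phase insert that the std::optional value type enables might look roughly like this (toy types standing in for Connection and ProxyClient<Thread>; the real code also relies on event-loop serialization for concurrent callers, which this sketch does not model):

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <string>

// Empty optional reserves a map slot before the value is constructed.
using ConnThreads = std::map<int, std::optional<std::string>>;

const std::string& SetThread(ConnThreads& threads, std::mutex& mutex, int connection)
{
    std::unique_lock<std::mutex> lock(mutex);
    // try_emplace returns the existing entry if one is already present,
    // otherwise inserts a default-constructed (empty) optional.
    auto [it, inserted] = threads.try_emplace(connection);
    if (inserted) {
        // Construct the value with the lock released, so construction cannot
        // deadlock with other code that needs the mutex.
        lock.unlock();
        std::string value = "thread for connection " + std::to_string(connection);
        lock.lock();
        it->second.emplace(std::move(value));
    }
    return *it->second;
}
```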

  20. in src/mp/proxy.cpp:351 in 7a8a99c50f outdated
     348 | @@ -336,17 +349,18 @@ ProxyClient<Thread>::~ProxyClient()
     349 |      // cleanup callback that was registered to handle the connection being
     350 |      // destroyed before the thread being destroyed.
     351 |      if (m_disconnect_cb) {
    


    Sjors commented at 10:32 AM on September 10, 2025:

    In 7a8a99c50feca10be4e42a1cdb135864ce376f71 bug: fix ProxyClient<Thread> deadlock if disconnected as IPC call is returning: the reason for keeping this if(m_disconnect_cb) is just to avoid a needless sync call? I.e. a race condition could lead m_disconnect_cb to disappear in the meantime, but it can never appear.


    ryanofsky commented at 11:21 AM on September 10, 2025:

    In 7a8a99c bug: fix ProxyClient deadlock if disconnected as IPC call is returning: the reason for keeping this if(m_disconnect_cb) is just to avoid a needless sync call? I.e. a race condition could lead m_disconnect_cb to disappear in the meantime, but it can never appear.

    Yes it should be fine to just call sync() unconditionally, but if m_disconnect_cb is null there's no reason to do that. Maybe it would be simpler to just call it unconditionally though. I might have just been trying to reduce the diff size here.

  21. in include/mp/proxy-io.h:69 in 7a8a99c50f outdated
      65 | @@ -66,8 +66,6 @@ struct ProxyClient<Thread> : public ProxyClientBase<Thread, ::capnp::Void>
      66 |      ProxyClient(const ProxyClient&) = delete;
      67 |      ~ProxyClient();
      68 |  
      69 | -    void setDisconnectCallback(const std::function<void()>& fn);
    


    Sjors commented at 10:37 AM on September 10, 2025:

    In 7a8a99c50feca10be4e42a1cdb135864ce376f71 bug: fix ProxyClient<Thread> deadlock if disconnected as IPC call is returning: can you dedicate a sentence in the commit message as to why this function was dropped?


    ryanofsky commented at 10:40 AM on September 10, 2025:

    In 7a8a99c bug: fix ProxyClient deadlock if disconnected as IPC call is returning: can you dedicate a sentence in the commit message as to why this function was dropped?

    Yes, will push an update with that change. There's no particular reason for dropping this method and it could be kept, but it was a short method only called in one place, so it seemed better to inline it.


    Sjors commented at 11:52 AM on September 10, 2025:

    I agree it's nicer to drop it.


    ryanofsky commented at 8:51 PM on September 10, 2025:

    re: #201 (review)

    can you dedicate a sentence in the commit message as to why this function was dropped?

    Added this now

  22. in include/mp/proxy-io.h:310 in f219f1b51c outdated
     304 | @@ -307,7 +305,7 @@ struct Waiter
     305 |      //! mutexes than necessary. This mutex can be held at the same time as
     306 |      //! EventLoop::m_mutex as long as Waiter::mutex is locked first and
     307 |      //! EventLoop::m_mutex is locked second.
     308 | -    std::mutex m_mutex;
     309 | +    Mutex m_mutex;
     310 |      std::condition_variable m_cv;
     311 |      std::optional<kj::Function<void()>> m_fn;
    


    hodlinator commented at 11:34 AM on September 10, 2025:

    Might as well explicitly guard?

        std::optional<kj::Function<void()>> m_fn MP_GUARDED_BY(m_mutex);
    

    hodlinator commented at 12:03 PM on September 10, 2025:

    (I guess that doesn't work because of std::condition_variable <-> MP_GUARDED_BY interaction. Please resolve).


    ryanofsky commented at 8:55 PM on September 10, 2025:

    re: #201 (review)

    Thanks, added the guarded by annotation in the second commit! Condition variable interaction was not hard to solve with an additional "requires" annotation.


    hodlinator commented at 9:15 AM on September 11, 2025:

    Ah, didn't think to add the MP_REQUIRES(m_mutex)-annotation. I like the GuardedRef!

  23. Sjors commented at 11:51 AM on September 10, 2025: member

    utACK f219f1b51c85e5c4a7157c78d278f832eb0bd975

    I left some suggestions to make the intermediate commits more clear, which might benefit other reviewers.

  24. DrahtBot requested review from Eunovo on Sep 10, 2025
  25. ci: Use -Wthread-safety not -Wthread-safety-analysis
    Use -Wthread-safety not -Wthread-safety-analysis in llvm and sanitize jobs.
    -Wthread-safety is a more general flag which adds additional checks outside the
    core thread safety analysis.
    
    Credit to hodlinator for noticing the bitcoin core build being stricter about
    thread safety checks than the libmultiprocess build here:
    https://github.com/bitcoin-core/libmultiprocess/pull/201#discussion_r2336632302
    4e365b019a
  26. proxy-io.h: add Waiter::m_mutex thread safety annotations
    This is just a refactoring, no behavior is changing. This is only adding
    annotations and switching from unannotated to annotated types.
    
    In SetThread, the first two parameters are combined into a single GuardedRef
    parameter to avoid -Wthread-safety-reference warnings at call sites like
    "warning: passing variable 'callback_threads' by reference requires holding
    mutex 'thread_context.waiter->m_mutex'"
    d60db601ed
  27. doc: describe ThreadContext struct and synchronization requirements 9b07991135
  28. Use std::optional in ConnThreads to allow shortening locks
    This is just a refactoring changing the ConnThreads data type. The optional
    value is not actually left in an unset state until the next commit.
    ca9b380ea9
  29. Use try_emplace in SetThread instead of threads.find
    This commit is easiest to review ignoring whitespace (git diff -w). This is a
    minor change in behavior, but the only change is shortening the duration that
    threads.mutex is locked while inserting a new entry in the threads.ref map. The
    lock is now only held while the entry is created and is released while the
    ProxyClient<Thread> object is initialized.
    
    This change doesn't really fix any problems but it simplifies the next commit
    which deals with race conditions and deadlocks in this code, so it has been
    split out.
    85df96482c
  30. bug: fix ProxyClient<Thread> deadlock if disconnected as IPC call is returning
    This bug is currently causing mptest "disconnecting and blocking" test to
    occasionally hang as reported by maflcko in
    https://github.com/bitcoin/bitcoin/issues/33244.
    
    The bug was actually first reported by Sjors in
    https://github.com/Sjors/bitcoin/pull/90#issuecomment-3024942087 and there are
    more details about it in
    https://github.com/bitcoin-core/libmultiprocess/issues/189.
    
    The bug is caused by the "disconnecting and blocking" test triggering a
    disconnect right before a server IPC call returns. This results in a race
    between the IPC server thread and the onDisconnect handler in the event loop
    thread both trying to destroy the server's request_threads ProxyClient<Thread>
    object when the IPC call is done.
    
    There was a lack of synchronization in this case, fixed here by adding
    loop->sync() in a few places. Specifically, the fixes were to:
    
    - Always call SetThread on the event loop thread using the loop->sync() method,
      to prevent a race between the ProxyClient<Thread> creation code and the
      connection shutdown code if there was an ill-timed disconnect.
    
    - Similarly in ~ProxyClient<Thread> and thread-context.h PassField(), use
      loop->sync() when destroying the thread object, in case a disconnect happens
      at that time.
    
    A few other changes were made in this commit to make the resulting code safer
    and simpler, even though they are not technically necessary for the fix:
    
    - In thread-context.h PassField(), Waiter::m_mutex is now unlocked while
      destroying ProxyClient<Thread> just to respect EventLoop::m_mutex and
      Waiter::m_mutex lock order and never lock the Waiter first. This is just for
      consistency. There is no actual possibility of a deadlock here due to the
      new sync() call.
    
    - This adds asserts to make sure functions expected to run on the event
      loop thread are only called on that thread.
    
    - This inlines the ProxyClient<Thread>::setDisconnectCallback function, just
      because it was a short function only called in a single place.
    4a269b21b8
  31. ryanofsky force-pushed on Sep 10, 2025
  32. ryanofsky commented at 10:23 PM on September 10, 2025: collaborator

    Thanks for the reviews! I took Sjors's suggestions to break the change up into smaller commits and addressed hodlinator's suggestions to add more thread safety annotations and make them stricter. I think both changes should make this PR easier to review.

    <!-- begin push-5 -->

    Updated f219f1b51c85e5c4a7157c78d278f832eb0bd975 -> b6321bb99c8a79400460899eaf3b5e9b15e41be7 (pr/context.4 -> pr/context.5, compare)<!-- end --> with suggested changes to split commits and improve thread safety annotations. The actual fix is not changed.

    <!-- begin push-6 -->

    Updated b6321bb99c8a79400460899eaf3b5e9b15e41be7 -> d86823eed705056e430a6c1c3cd676c360f6a0ae (pr/context.5 -> pr/context.6, compare)<!-- end --> fixing typos

  33. ryanofsky force-pushed on Sep 10, 2025
  34. Sjors commented at 6:53 AM on September 11, 2025: member

    d86823eed705056e430a6c1c3cd676c360f6a0ae looks good once CI is happy...

    Thanks for splitting the commits. The new struct GuardedRef introduced in 6d6096ae93232fb9dd4b582197fd2f5196efa8f4 is nice.

    When I ran make locally before, it didn't emit any warnings about locks. Given that you only changed the CI configuration, I assume vanilla make still won't. It would be good to have a cmake developer preset that catches as much as possible (different PR of course).

    Additionally, maybe one of @DrahtBot's LLMs can look through CI logs (when all jobs pass) for anything unusual compared to master.

    OpenBSD fail seems real:

    In file included from /home/runner/work/libmultiprocess/libmultiprocess/test/mp/test/foo-types.h:15:
    /home/runner/work/libmultiprocess/libmultiprocess/include/mp/type-context.h:96:29: error: no viable constructor or deduction guide for deduction of template arguments of 'GuardedRef'
                                GuardedRef{thread_context.waiter->m_mutex, request_threads}, server.m_context.connection,
                                ^
    /home/runner/work/libmultiprocess/libmultiprocess/include/mp/util.h:186:8: note: candidate function template not viable: requires 1 argument, but 2 were provided
    struct GuardedRef
    

    https://github.com/bitcoin-core/libmultiprocess/actions/runs/17628430545/job/50090729622?pr=201

    Copilot seems to think this is due to incomplete C++20 support, and we could pre-empt that by adding target_compile_features(multiprocess PUBLIC cxx_std_20) after the three add_library( statements. Though @purpleKarrot recently pointed out that LLMs tend to be wrong when it comes to CMake.

  35. in include/mp/util.h:185 in d86823eed7 outdated
     181 | @@ -182,6 +182,13 @@ class MP_SCOPED_CAPABILITY Lock {
     182 |      std::unique_lock<std::mutex> m_lock;
     183 |  };
     184 |  
     185 | +template<typename T>
    


    hodlinator commented at 9:06 AM on September 11, 2025:

    meganit: Pretty sure Clang-format in main project prefers:

    template <typename T>
    
  36. ryanofsky force-pushed on Sep 11, 2025
  37. ryanofsky commented at 10:26 AM on September 11, 2025: collaborator

    Thanks, the OpenBSD failure is probably just happening because OpenBSD uses an old version of clang. It should be possible to reproduce this locally using the llvm job and the following change in shell.nix:

    -  llvm = crossPkgs.llvmPackages_20;
    +  llvm = crossPkgs.llvmPackages_16;
    

    But I'm having some kind of network issue and can't download the llvm 16 packages.

    You are right that it would be nice to have a preset or something that makes it easier to see thread safety errors locally. FWIW I just have a configuration script that does:

    MP_CXX_FLAGS="-Werror -ftemplate-backtrace-limit=0 -Wthread-safety"
    CC=clang CXX=clang++ cmake -DCMAKE_INSTALL_PREFIX=$HOME/work/mp/build/prefix -DCapnProto_DEBUG=1 -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_FLAGS="$MP_CXX_FLAGS" -DMULTIPROCESS_RUN_CLANG_TIDY=1 ..
    

    And sometimes I'll add other things like -fsanitize=address to MP_CXX_FLAGS as well.
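For reference, a configuration like that could also be captured in a hypothetical, untested CMakePresets.json entry, along the lines of the developer preset suggested above (names and paths here are illustrative):

```json
{
  "version": 3,
  "configurePresets": [
    {
      "name": "dev",
      "displayName": "Developer (strict thread safety warnings)",
      "binaryDir": "${sourceDir}/build-dev",
      "cacheVariables": {
        "CMAKE_CXX_COMPILER": "clang++",
        "CMAKE_BUILD_TYPE": "Debug",
        "CMAKE_CXX_FLAGS": "-Werror -Wall -Wextra -Wthread-safety",
        "MULTIPROCESS_RUN_CLANG_TIDY": "1"
      }
    }
  ]
}
```

With this in place, `cmake --preset dev` would configure a build that surfaces thread safety errors locally.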

    As far as DrahtBot/CI, the llvm job in ci/configs/llvm.bash uses -Werror and -Wthread-safety together, so it will fail if there are any compile-time thread safety issues. So I think there is no risk of compile-time errors slipping through anymore (now that it is using the broader -Wthread-safety instead of the narrower -Wthread-safety-analysis).


    <!-- begin push-7 -->

    Updated d86823eed705056e430a6c1c3cd676c360f6a0ae -> 4a269b21b8c8c2c7138df82c8dc4103a388e339b (pr/context.6 -> pr/context.7, compare)<!-- end --> with attempted fix for openbsd failure

  38. ryanofsky commented at 10:35 AM on September 11, 2025: collaborator

    Openbsd / clang-16 fix seems to have worked!

  39. Sjors commented at 11:42 AM on September 11, 2025: member

    ACK 4a269b21b8c8c2c7138df82c8dc4103a388e339b

  40. ryanofsky commented at 9:15 PM on September 12, 2025: collaborator

    I've made progress on implementing a more comprehensive disconnect test that can detect bugs like https://github.com/bitcoin/bitcoin/issues/33244 and #189 more reliably instead of only intermittently when the test is run thousands of times. If I can finish it soon, I'll add it to this PR, otherwise it can be a followup.

    The test is implemented in commit ea4976f099cb83ef59f58f09b62576358cdbf976 (tag) and I think I'm maybe 70% finished with the implementation. The test passes on master because not all of the checks are enabled yet and many aren't implemented (these are noted in FIXME comments in the code). The commit combines the 4 existing disconnect test cases into a more general test and follows up on earlier ideas and efforts from:

  41. Sjors commented at 8:09 AM on September 13, 2025: member

    @ryanofsky that commit also touches non-test code, so I think it would be better as a standalone PR.

  42. ryanofsky commented at 10:21 AM on September 13, 2025: collaborator

    @ryanofsky that commit also touches non-test code, so I think it would be better as a standalone PR.

    That's reasonable. But to be fair the only changes to non-test code are adding 4 lines like:

    if (connection->m_loop->m_signal) connection->m_loop->m_signal(SignalCallSetup{});
    

    so the test is able to disconnect at specific points. m_signal is always null unless the test sets it, so it should be pretty clear that these lines only affect tests. In practice I'd also split up the change into non-test and test-only commits. And I'd add hasSignal()/sendSignal() helpers so the signal mechanism wouldn't be exposed, and this would look more like:

    if (connection->m_loop->hasSignal<CallSetup>()) connection->m_loop->sendSignal<CallSetup>();
    
  43. in ci/configs/sanitize.bash:4 in 4e365b019a outdated
       0 | @@ -1,7 +1,7 @@
       1 |  CI_DESC="CI job running ThreadSanitizer"
       2 |  CI_DIR=build-sanitize
       3 |  export CXX=clang++
       4 | -export CXXFLAGS="-ggdb -Werror -Wall -Wextra -Wpedantic -Wthread-safety-analysis -Wno-unused-parameter -fsanitize=thread"
       5 | +export CXXFLAGS="-ggdb -Werror -Wall -Wextra -Wpedantic -Wthread-safety -Wno-unused-parameter -fsanitize=thread"
    


    Eunovo commented at 11:55 AM on September 15, 2025:

    https://github.com/bitcoin-core/libmultiprocess/pull/201/commits/4e365b019a9feb3150ac8de089014e719f08c9c4: This is unrelated to the PR, but why generate debugging information here? I'm referring to the use of the -ggdb flag


    ryanofsky commented at 12:21 PM on September 15, 2025:

    re: #201 (review)

    I haven't tried without -ggdb, but sanitizers like ThreadSanitizer print stack traces that I think depend on having this debug information. Cap'n Proto also can use addr2line to print stack traces internally when there are errors (though I don't think we have this option enabled)

  44. DrahtBot requested review from Eunovo on Sep 15, 2025
  45. in src/mp/proxy.cpp:366 in d60db601ed outdated
     362 | @@ -364,7 +363,7 @@ ProxyServer<Thread>::~ProxyServer()
     363 |      assert(m_thread_context.waiter.get());
     364 |      std::unique_ptr<Waiter> waiter;
     365 |      {
     366 | -        const std::unique_lock<std::mutex> lock(m_thread_context.waiter->m_mutex);
     367 | +        const Lock lock(m_thread_context.waiter->m_mutex);
    


    Eunovo commented at 1:33 PM on September 15, 2025:
  46. ryanofsky referenced this in commit 0b62c777d9 on Sep 17, 2025
  47. ryanofsky commented at 9:30 AM on September 17, 2025: collaborator

    I think I'll go ahead and merge this PR since it's had acks and testing to confirm the fix works. I would have liked to add my test changes here to test it in a more reliable way, but it's taking longer than expected to implement, so it's better to make it a followup.

  48. ryanofsky merged this on Sep 17, 2025
  49. ryanofsky closed this on Sep 17, 2025

  50. ryanofsky referenced this in commit 535fa0ad0d on Sep 17, 2025
  51. in include/mp/proxy-io.h:1 in 4a269b21b8


    hodlinator commented at 12:43 PM on September 18, 2025:

    Tangent: Why does EventLoop use Unix sockets within the same process to signal when there's a new m_post_fn to execute? Seems like there should be thread synchronization primitives with marginally less overhead?


    hodlinator commented at 12:48 PM on September 18, 2025:

    Tangent in an unrelated file - regarding how easy it is to understand libmultiprocess: TestSetup (in test/mp/test/test.cpp) is hard to read; it might benefit from adding comments with argument names of anonymous lambdas and explicit captures of variables. Also feeling resistance when having to learn that kj::heap returns something std::unique_ptr-like which implicitly casts to a raw pointer. Are the kj types only used when necessary to interface with Cap'n Proto, and std everywhere else?


    ryanofsky commented at 2:14 PM on September 18, 2025:

    re: https://github.com/bitcoin-core/libmultiprocess/pull/201/files#r2359060452

    Tangent: Why does EventLoop use Unix sockets within the same process to signal when there's a new m_post_fn to execute? Seems like there should be thread synchronization primitives with marginally less overhead?

    EventLoop documentation is meant to provide some background on this and specifically the link at the bottom https://groups.google.com/d/msg/capnproto/TuQFF1eH2-M/g81sHaTAAQAJ addresses this question.

    Cap'n Proto doesn't use threads and assumes all code calling it runs on a single thread. There are benefits and costs that come from this, but one of the costs is that external threads need to use I/O to communicate with Cap'n Proto's event loop. There isn't another way for threads to send signals. (Or at least there wasn't at the time I wrote this; maybe newer versions of Cap'n Proto provide something.)


    ryanofsky commented at 2:34 PM on September 18, 2025:

    re: #201 (review)

    Tangent in unrelated file - regarding how easy it is to understand libmultiprocess: TestSetup (in test/mp/test/test.cpp) is hard to read, maybe would benefit from adding comments with argument names of anonymous lambdas and explicit captures of variables.

    This is probably true and I'm currently working on test improvements where I'm adding more members to the TestSetup class and could add more comments.

    I'm not sure I would always prefer explicit variable captures though. I think [&] is not just a way of capturing state and cutting down noise, but also an important way of indicating that the lambda will only be called in the current scope, and is not some event handler that can be called at a later time. So I tend to like [&] for synchronous inline callbacks, and [x, y, z] for asynchronous event handlers.

    Also feeling resistance when having to learn that kj::heap returns something std::unique_ptr-like which implicit casts to raw pointer. Are the kj types only used when necessary to interface with Cap'n'Proto, and std everywhere else?

    I think basically yes, but the word "only" is too strong. The code does tend to use the kj library when Cap'n Proto types are involved and the std library otherwise. But there are probably exceptions, and I haven't found kj types and macros to cause problems in practice. In general kj types tend to have nice debugging and safety features and have a real rationale behind them. They are not just NIH. I think kj::Own in particular has capabilities std::unique_ptr doesn't have because it doesn't bake the deleter into the type definition, making it easier to support things like memory pools and implement zero-copy techniques Cap'n Proto uses internally.


    hodlinator commented at 7:15 PM on September 23, 2025:

    I think the main thing I was missing is the need for pumping the async IO. I assume waiting on an async IO-wrapped socket serves that purpose?

    Was curious why you didn't use m_io_context->provider->newOneWayPipe() but when writing to the AsyncOutputStream a promise is returned with KJ_WARN_UNUSED_RESULT (https://github.com/capnproto/capnproto/blob/7db701e94ad00b8db06ede2bea5f527e2a6fa3de/c%2B%2B/src/kj/async-io.h#L121), which doesn't mirror what was in the example in Google Groups.

    Q: I can see there's a ::close() call for [m_]post_fd, but none for m_wait_fd? Maybe it's garbage collected once the process dies, or does wrapSocketFd() assume responsibility for closing?


    hodlinator commented at 8:40 PM on September 23, 2025:

    Your policy of only using [&] to indicate that the lambda will be called in the current scope is nice when it comes to function scope - as when creating server_connection - but less so in the case of TestSetup::server_disconnect where this is captured, which can perhaps be somewhat forgiven for a relatively simple lambda.


    I agree that kj has some clever quirks that are missing from std. But it is not a one-way street (the implicit cast from Own to a raw pointer, for example). I would still advise using std when possible to make review more approachable. This passed mptest, for example:

    diff --git a/include/mp/proxy-io.h b/include/mp/proxy-io.h
    index a0bdf13..6358c86 100644
    --- a/include/mp/proxy-io.h
    +++ b/include/mp/proxy-io.h
    @@ -179,7 +179,7 @@ public:
     
         //! Run function on event loop thread. Does not return until function completes.
         //! Must be called while the loop() function is active.
    -    void post(kj::Function<void()> fn);
    +    void post(std::function<void()> fn);
     
         //! Wrapper around EventLoop::post that takes advantage of the
         //! fact that callable will not go out of scope to avoid requirement that it
    @@ -231,7 +231,7 @@ public:
         std::thread m_async_thread;
     
         //! Callback function to run on event loop thread during post() or sync() call.
    -    kj::Function<void()>* m_post_fn MP_GUARDED_BY(m_mutex) = nullptr;
    +    std::function<void()>* m_post_fn MP_GUARDED_BY(m_mutex) = nullptr;
     
         //! Callback functions to run on async thread.
         std::optional<CleanupList> m_async_fns MP_GUARDED_BY(m_mutex);
    diff --git a/src/mp/proxy.cpp b/src/mp/proxy.cpp
    index 06825c9..6611cfb 100644
    --- a/src/mp/proxy.cpp
    +++ b/src/mp/proxy.cpp
    @@ -263,7 +263,7 @@ void EventLoop::loop()
         m_cv.notify_all();
     }
     
    -void EventLoop::post(kj::Function<void()> fn)
    +void EventLoop::post(std::function<void()> fn)
     {
         if (std::this_thread::get_id() == m_thread_id) {
             fn();
    

    ryanofsky commented at 12:08 PM on September 30, 2025:

    re: #201 (review)

    Yes IIRC wrapSocketFd does take ownership. A lot of this code is also updated & made more generic in windows PR https://github.com/bitcoin/bitcoin/pull/32387 in case you are curious.


    ryanofsky commented at 12:22 PM on September 30, 2025:

    re: #201 (review)

    Nice find on lambda inconsistencies, that would make sense to clean up. Would be happy to review a PR with any cleanups and improvements like this.

    On kj::Function vs std::function, this was actually changed recently in 52256e730f516af3d4ec5068c0157a183f7aee6e because std::function, unlike kj::Function, requires function objects to be copyable, which prevents writing lambdas that capture move-only objects. So I'm actually surprised that diff compiles. Maybe it compiles because it is only changing EventLoop::post and not Waiter::post? Happy to change to whatever types work though.

  52. hodlinator commented at 12:51 PM on September 18, 2025: contributor

    Didn't fully get over the hump of knowing what I was looking at in the last commit before the merge this time. At least made me publish #213 (which originally contained a parallel discovery of #202).

    Left some pending inline thoughts and the diagram below.

    <details><summary>Tried diagramming out the example code to get a better grip on the library, but I'm not entirely friends with Mermaid.</summary>

    ---
      config:
        class:
          hideEmptyMembersBox: true
    ---
     classDiagram
        class EventLoop{
            +loop()
            +sync()
            +post()
        }
        class Waiter{
            +post()
            +wait()
        }
        
        class Connection {
            +m_loop
            +m_stream
            +m_network
            +m_rpc_system
            +CleanupList m_sync_cleanup_fns
        }
        note for Connection "Server *or* Client"
    
        class ProxyServer~Interface~
        class ProxyClientBase~Interface, Impl~
        note for ProxyClientBase "Inherits Impl"
        class ProxyServerBase~Interface, Impl~ {
            +shared_ptr~Impl~ m_impl
        }
        note for ProxyServerBase "Inherits Interface::Server,<br/>Owns Impl"
    
        class Calculator {
            +solveEquation(const std::string&) = 0
        }
        note for Calculator "Fairly typical pure virtual C++ interface"
    
        class ProxyClient~::CalculatorInterface~ {
        }
        
        class ProxyClientCustom~::CalculatorInterface, Calculator~ {
        }
    
        class ProxyClientCustom~Interface, Impl~ {
        }
    
        ProxyClientCustom~::CalculatorInterface,Calculator~ <-- ProxyClient~::CalculatorInterface~
        ProxyClientBase~Interface, Impl~ <-- ProxyClientCustom~Interface, Impl~
    
        class TestSetup {
            +thread
        }
    
        class TestSetup.thread {
            +EventLoop
            +pipe
            +server_connection
            +client_connection
            +unique_ptr~ProxyClient~FooInterface~~ client_proxy
        }
    
        Connection "1" o-- "1" EventLoop
        TestSetup "1" o-- "1" TestSetup.thread
        TestSetup.thread "2" o-- "1" Connection : server & client
    

    </details>

  53. ryanofsky commented at 3:09 PM on September 18, 2025: collaborator

    Thanks for the review and followups!

  54. fanquake referenced this in commit edb871cba2 on Sep 19, 2025
  55. Sjors referenced this in commit 132621fc01 on Sep 23, 2025
  56. theuni referenced this in commit 419d891b0a on Sep 26, 2025
  57. ryanofsky commented at 12:25 PM on September 30, 2025: collaborator

    Thanks for the followups, and I'd be happy to review a PR which cleans up code and makes it more readable or approachable!

  58. ryanofsky referenced this in commit afcc40b0f1 on Oct 2, 2025
  59. ryanofsky referenced this in commit 5bb3cbdfba on Oct 2, 2025
  60. ryanofsky referenced this in commit abcd4c4ff9 on Oct 7, 2025
  61. fanquake referenced this in commit a14e7b9dee on Oct 16, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/libmultiprocess. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-20 17:30 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me