mptest times out intermittently #35491

issue maflcko opened this issue on June 9, 2026
  1. maflcko commented at 5:39 AM on June 9, 2026: member

    🚧 At least one of the CI tasks failed. <sub>Task Alpine (musl): https://github.com/bitcoin/bitcoin/actions/runs/27157390674/job/80163685993</sub> <sub>LLM reason (✨ experimental): CI failed because CTest timed out running the mptest test (mptest — Timeout after ~2400s).</sub>

    Originally posted by @DrahtBot in #35487 (comment)

    Normally, the test finishes in less than a second, so this looks like a clear bug in the test itself. The logs were:

    351/351 Test   [#3](/bitcoin-bitcoin/3/): mptest ............................................................***Timeout 2400.10 sec
    [ TEST ] spawn_tests.cpp:46: SpawnProcess does not run callback in child
    [ PASS ] spawn_tests.cpp:46: SpawnProcess does not run callback in child (67572 μs)
    [ TEST ] test.cpp:130: Call FooInterface methods
    [ PASS ] test.cpp:130: Call FooInterface methods (10731 μs)
    [ TEST ] test.cpp:231: Call IPC method after client connection is closed
    [ PASS ] test.cpp:231: Call IPC method after client connection is closed (4165 μs)
    [ TEST ] test.cpp:248: Calling IPC method after server connection is closed
    [ PASS ] test.cpp:248: Calling IPC method after server connection is closed (535 μs)
    [ TEST ] test.cpp:265: Calling IPC method and disconnecting during the call
    [ PASS ] test.cpp:265: Calling IPC method and disconnecting during the call (531 μs)
    [ TEST ] test.cpp:285: Calling IPC method, disconnecting and blocking during the call
    [ PASS ] test.cpp:285: Calling IPC method, disconnecting and blocking during the call (915 μs)
    [ TEST ] test.cpp:334: Worker thread destroyed before it is initialized
    [ PASS ] test.cpp:334: Worker thread destroyed before it is initialized (10824 μs)
    [ TEST ] test.cpp:371: Calling async IPC method, with server disconnect racing the call
    [ PASS ] test.cpp:371: Calling async IPC method, with server disconnect racing the call (11922 μs)
    [ TEST ] test.cpp:401: Calling async IPC method, with server disconnect after cleanup
    [ PASS ] test.cpp:401: Calling async IPC method, with server disconnect after cleanup (826 μs)
    [ TEST ] test.cpp:430: Make simultaneous IPC calls on single remote thread
    
    Errors while running CTest
    
    99% tests passed, 1 tests failed out of 351
    
                
    
  2. maflcko added the label CI failed on Jun 9, 2026
  3. maflcko commented at 5:50 AM on June 9, 2026: member
  4. fanquake commented at 9:02 AM on June 9, 2026: member
  5. fanquake commented at 10:17 AM on June 9, 2026: member

    @dergoegge maybe you can reproduce in antithesis? (See prior discussion in #34014)

  6. maflcko commented at 10:40 AM on June 9, 2026: member

    Or maybe in https://github.com/dergoegge/bedrock?

    In the meantime I'll try to reproduce on a fresh podman run -it --rm --platform=linux alpine:3.23:

    apk update && apk add rsync screen python3 git bash vim && git clone --depth=1 https://github.com/bitcoin/bitcoin ./b-c-ci && cd ./b-c-ci

    RUN_UNIT_TESTS=false RUN_FUNCTIONAL_TESTS=false CCACHE_DIR=/ccache_dir CCACHE_MAXSIZE=5500M USER=dummy_user DANGER_RUN_CI_ON_HOST="1" MAKEJOBS="-j$(nproc)" FILE_ENV="./ci/test/00_setup_env_native_alpine_musl.sh" ./ci/test_run_all.sh

    while /ci_container_base/ci/scratch*/build-*/src/ipc/libmultiprocess/test/mptest ; do true ; done

  7. maflcko commented at 11:11 AM on June 9, 2026: member

    bt is:

    (gdb) thread apply all bt
    
    Thread 3 (LWP 254272 "mptest"):
    [#0](/bitcoin-bitcoin/0/)  __cp_end () at src/thread/x86_64/syscall_cp.s:29
    [#1](/bitcoin-bitcoin/1/)  0x00007f4d95cf872f in __syscall_cp_c (nr=281, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>, y=0, z=8) at src/thread/pthread_cancel.c:33
    [#2](/bitcoin-bitcoin/2/)  0x00007f4d95cd580c in epoll_pwait (fd=3, ev=0x7f4d95886a60, cnt=16, to=-1, sigs=0x0) at src/linux/epoll.c:28
    [#3](/bitcoin-bitcoin/3/)  0x000055e48dc153e6 in kj::UnixEventPort::wait (this=0x55e48dd719d8) at /usr/src/kj/io.h:284
    [#4](/bitcoin-bitcoin/4/)  0x000055e48db7c968 in kj::EventLoop::wait (this=this@entry=0x55e48dd71ab0) at /usr/src/kj/async.c++:1851
    [#5](/bitcoin-bitcoin/5/)  0x000055e48db820ac in kj::_::waitImpl (node=..., result=..., waitScope=..., location=...) at /usr/src/kj/async.c++:1985
    [#6](/bitcoin-bitcoin/6/)  0x000055e48da8165e in kj::Promise<unsigned long>::wait (this=0x7f4d95886ff0, waitScope=..., location=...) at /ci_container_base/depends/x86_64-pc-linux-musl/include/kj/async-inl.h:1359
    [#7](/bitcoin-bitcoin/7/)  mp::EventLoop::loop (this=this@entry=0x7f4d95887730) at /ci_container_base/src/ipc/libmultiprocess/src/mp/proxy.cpp:244
    [#8](/bitcoin-bitcoin/8/)  0x000055e48d90fd3b in mp::test::TestSetup::TestSetup(bool)::{lambda()#1}::operator()() const (__closure=<optimized out>) at /ci_container_base/src/ipc/libmultiprocess/test/mp/test/test.cpp:112
    [#9](/bitcoin-bitcoin/9/)  0x000055e48d90ff89 in std::__invoke_impl<void, mp::test::TestSetup::TestSetup(bool)::{lambda()#1}>(std::__invoke_other, mp::test::TestSetup::TestSetup(bool)::{lambda()#1}&&) (__f=...) at /usr/include/c++/15.2.0/bits/invoke.h:63
    [#10](/bitcoin-bitcoin/10/) std::__invoke<mp::test::TestSetup::TestSetup(bool)::{lambda()#1}>(mp::test::TestSetup::TestSetup(bool)::{lambda()#1}&&) (__fn=...) at /usr/include/c++/15.2.0/bits/invoke.h:98
    [#11](/bitcoin-bitcoin/11/) std::thread::_Invoker<std::tuple<mp::test::TestSetup::TestSetup(bool)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=<optimized out>) at /usr/include/c++/15.2.0/bits/std_thread.h:303
    [#12](/bitcoin-bitcoin/12/) std::thread::_Invoker<std::tuple<mp::test::TestSetup::TestSetup(bool)::{lambda()#1}> >::operator()() (this=<optimized out>) at /usr/include/c++/15.2.0/bits/std_thread.h:310
    [#13](/bitcoin-bitcoin/13/) std::thread::_State_impl<std::thread::_Invoker<std::tuple<mp::test::TestSetup::TestSetup(bool)::{lambda()#1}> > >::_M_run() (this=<optimized out>) at /usr/include/c++/15.2.0/bits/std_thread.h:255
    [#14](/bitcoin-bitcoin/14/) 0x00007f4d95ad2680 in ?? () from /usr/lib/libstdc++.so.6
    [#15](/bitcoin-bitcoin/15/) 0x00007f4d95cf943a in start (p=<optimized out>) at src/thread/pthread_create.c:207
    [#16](/bitcoin-bitcoin/16/) 0x00007f4d95cfad88 in __clone () at src/thread/x86_64/clone.s:22
    Backtrace stopped: frame did not save the PC
    
    Thread 2 (LWP 254273 "mptest"):
    [#0](/bitcoin-bitcoin/0/)  __cp_end () at src/thread/x86_64/syscall_cp.s:29
    [#1](/bitcoin-bitcoin/1/)  0x00007f4d95cf872f in __syscall_cp_c (nr=202, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>, y=0, z=0) at src/thread/pthread_cancel.c:33
    [#2](/bitcoin-bitcoin/2/)  0x00007f4d95cf7beb in __futex4_cp (addr=0x7f4d958147a4, op=0, val=2, to=<optimized out>) at src/thread/__timedwait.c:24
    [#3](/bitcoin-bitcoin/3/)  __timedwait_cp (addr=addr@entry=0x7f4d958147a4, val=val@entry=2, clk=clk@entry=0, at=at@entry=0x0, priv=128, priv@entry=1) at src/thread/__timedwait.c:52
    [#4](/bitcoin-bitcoin/4/)  0x00007f4d95cf8acc in __pthread_cond_timedwait (c=0x7f4d959d7708, m=0x7f4d959d76e0, ts=0x0) at src/thread/pthread_cond_timedwait.c:100
    [#5](/bitcoin-bitcoin/5/)  0x00007f4d95ac839d in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/libstdc++.so.6
    [#6](/bitcoin-bitcoin/6/)  0x000055e48da80d4c in std::condition_variable::wait<mp::Waiter::wait<mp::ProxyServer<mp::ThreadMap>::makeThread(mp::ThreadMap::Server::MakeThreadContext)::<lambda()>::<lambda()> >(mp::Lock&, mp::ProxyServer<mp::ThreadMap>::makeThread(mp::ThreadMap::Server::MakeThreadContext)::<lambda()>::<lambda()>)::<lambda()> > (this=0x7f4d959d7708, __lock=..., __p=...) at /usr/include/c++/15.2.0/condition_variable:107
    [#7](/bitcoin-bitcoin/7/)  mp::Waiter::wait<mp::ProxyServer<mp::ThreadMap>::makeThread(mp::ThreadMap::Server::MakeThreadContext)::<lambda()>::<lambda()> > (this=0x7f4d959d76e0, lock=..., pred=...) at /ci_container_base/src/ipc/libmultiprocess/include/mp/proxy-io.h:389
    [#8](/bitcoin-bitcoin/8/)  operator() (__closure=<optimized out>) at /ci_container_base/src/ipc/libmultiprocess/src/mp/proxy.cpp:426
    [#9](/bitcoin-bitcoin/9/)  0x000055e48da80edb in std::__invoke_impl<void, mp::ProxyServer<mp::ThreadMap>::makeThread(mp::ThreadMap::Server::MakeThreadContext)::<lambda()> > (__f=...) at /usr/include/c++/15.2.0/bits/invoke.h:63
    [#10](/bitcoin-bitcoin/10/) std::__invoke<mp::ProxyServer<mp::ThreadMap>::makeThread(mp::ThreadMap::Server::MakeThreadContext)::<lambda()> > (__fn=...) at /usr/include/c++/15.2.0/bits/invoke.h:98
    [#11](/bitcoin-bitcoin/11/) std::thread::_Invoker<std::tuple<mp::ProxyServer<mp::ThreadMap>::makeThread(mp::ThreadMap::Server::MakeThreadContext)::<lambda()> > >::_M_invoke<0> (this=<optimized out>) at /usr/include/c++/15.2.0/bits/std_thread.h:303
    [#12](/bitcoin-bitcoin/12/) std::thread::_Invoker<std::tuple<mp::ProxyServer<mp::ThreadMap>::makeThread(mp::ThreadMap::Server::MakeThreadContext)::<lambda()> > >::operator() (this=<optimized out>) at /usr/include/c++/15.2.0/bits/std_thread.h:310
    [#13](/bitcoin-bitcoin/13/) std::thread::_State_impl<std::thread::_Invoker<std::tuple<mp::ProxyServer<mp::ThreadMap>::makeThread(mp::ThreadMap::Server::MakeThreadContext)::<lambda()> > > >::_M_run(void) (this=<optimized out>) at /usr/include/c++/15.2.0/bits/std_thread.h:255
    [#14](/bitcoin-bitcoin/14/) 0x00007f4d95ad2680 in ?? () from /usr/lib/libstdc++.so.6
    [#15](/bitcoin-bitcoin/15/) 0x00007f4d95cf943a in start (p=<optimized out>) at src/thread/pthread_create.c:207
    [#16](/bitcoin-bitcoin/16/) 0x00007f4d95cfad88 in __clone () at src/thread/x86_64/clone.s:22
    Backtrace stopped: frame did not save the PC
    
    Thread 1 (LWP 254247 "mptest"):
    [#0](/bitcoin-bitcoin/0/)  __cp_end () at src/thread/x86_64/syscall_cp.s:29
    [#1](/bitcoin-bitcoin/1/)  0x00007f4d95cf872f in __syscall_cp_c (nr=202, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>, y=0, z=0) at src/thread/pthread_cancel.c:33
    [#2](/bitcoin-bitcoin/2/)  0x00007f4d95cf7beb in __futex4_cp (addr=0x7ffdb39e59d4, op=0, val=2, to=<optimized out>) at src/thread/__timedwait.c:24
    [#3](/bitcoin-bitcoin/3/)  __timedwait_cp (addr=addr@entry=0x7ffdb39e59d4, val=val@entry=2, clk=clk@entry=0, at=at@entry=0x0, priv=128, priv@entry=1) at src/thread/__timedwait.c:52
    [#4](/bitcoin-bitcoin/4/)  0x00007f4d95cf8acc in __pthread_cond_timedwait (c=0x7f4d959d7608, m=0x7f4d959d75e0, ts=0x0) at src/thread/pthread_cond_timedwait.c:100
    [#5](/bitcoin-bitcoin/5/)  0x00007f4d95ac839d in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/libstdc++.so.6
    [#6](/bitcoin-bitcoin/6/)  0x000055e48d8f86d6 in std::condition_variable::wait<mp::Waiter::wait<mp::test::TestCase430::run()::<lambda()> >(mp::Lock&, mp::test::TestCase430::run()::<lambda()>)::<lambda()> > (this=0x7f4d959d7608, __lock=..., __p=...) at /usr/include/c++/15.2.0/condition_variable:107
    [#7](/bitcoin-bitcoin/7/)  mp::Waiter::wait<mp::test::TestCase430::run()::<lambda()> > (this=0x7f4d959d75e0, lock=..., pred=...) at /ci_container_base/src/ipc/libmultiprocess/include/mp/proxy-io.h:389
    [#8](/bitcoin-bitcoin/8/)  mp::test::TestCase430::run (this=<optimized out>) at /ci_container_base/src/ipc/libmultiprocess/test/mp/test/test.cpp:479
    [#9](/bitcoin-bitcoin/9/)  0x000055e48da8ba9b in kj::TestRunner::run()::{lambda()#1}::operator()() const (__closure=<optimized out>) at /usr/src/kj/test.c++:318
    [#10](/bitcoin-bitcoin/10/) kj::runCatchingExceptions<kj::TestRunner::run()::{lambda()#1}>(kj::TestRunner::run()::{lambda()#1}&&) (func=...) at /usr/src/kj/exception.h:371
    [#11](/bitcoin-bitcoin/11/) kj::TestRunner::run (this=0x7ffdb39e6950) at /usr/src/kj/test.c++:318
    [#12](/bitcoin-bitcoin/12/) 0x000055e48da8cb43 in kj::TestRunner::getMain()::{lambda(auto:1&, (auto:2&&)...)#7}::operator()<kj::TestRunner>(kj::TestRunner&) (__closure=<optimized out>, s=...) at /usr/src/kj/test.c++:217
    [#13](/bitcoin-bitcoin/13/) kj::_::BoundMethod<kj::TestRunner&, kj::TestRunner::getMain()::{lambda(auto:1&, (auto:2&&)...)#7}, kj::TestRunner::getMain()::{lambda(auto:1&, (auto:2&&)...)#8}>::operator()<>() (this=<optimized out>) at /usr/src/kj/function.h:263
    [#14](/bitcoin-bitcoin/14/) kj::Function<kj::MainBuilder::Validity ()>::Impl<kj::_::BoundMethod<kj::TestRunner&, kj::TestRunner::getMain()::{lambda(auto:1&, (auto:2&&)...)#7}, kj::TestRunner::getMain()::{lambda(auto:1&, (auto:2&&)...)#8}> >::operator()() (this=<optimized out>) at /usr/src/kj/function.h:142
    [#15](/bitcoin-bitcoin/15/) 0x000055e48dc39488 in kj::Function<kj::MainBuilder::Validity()>::operator() (this=<optimized out>) at /usr/src/kj/function.h:119
    [#16](/bitcoin-bitcoin/16/) kj::MainBuilder::MainImpl::operator() (this=<optimized out>, programName=..., params=...) at /usr/src/kj/main.c++:623
    [#17](/bitcoin-bitcoin/17/) 0x000055e48dc3e2d5 in kj::Function<void(kj::StringPtr, kj::ArrayPtr<kj::StringPtr const>)>::Impl<kj::MainBuilder::MainImpl>::operator() (this=<optimized out>, params#0=..., params#1=...) at /usr/src/kj/function.h:142
    [#18](/bitcoin-bitcoin/18/) 0x000055e48dc3990b in kj::Function<void(kj::StringPtr, kj::ArrayPtr<kj::StringPtr const>)>::operator() (this=0x7ffdb39e6960, params#0=..., params#1=...) at /usr/src/kj/function.h:119
    [#19](/bitcoin-bitcoin/19/) operator() (__closure=<optimized out>) at /usr/src/kj/main.c++:228
    [#20](/bitcoin-bitcoin/20/) kj::runCatchingExceptions<kj::runMainAndExit(ProcessContext&, MainFunc&&, int, char**)::<lambda()> > (func=...) at /usr/src/kj/exception.h:371
    [#21](/bitcoin-bitcoin/21/) kj::runMainAndExit (context=..., func=..., argc=<optimized out>, argc@entry=1, argv=argv@entry=0x7ffdb39e6a08) at /usr/src/kj/main.c++:228
    [#22](/bitcoin-bitcoin/22/) 0x000055e48da89d29 in main (argc=1, argv=0x7ffdb39e6a08) at /usr/src/kj/test.c++:381
    
  8. maflcko added the label Tests on Jun 9, 2026
  9. ryanofsky commented at 1:18 PM on June 9, 2026: contributor

    re: #35491 (comment)

    bt is:

    Nice, thanks for reproducing and providing the stack trace. The hang on line 479 suggests this is probably a test bug:

    https://github.com/bitcoin/bitcoin/blob/bcbf5bae1645605939f0700c438f0cafb47056dc/src/ipc/libmultiprocess/test/mp/test/test.cpp#L479

    Reviewing test test more closely, I suspect following change would fix it:

    --- a/src/ipc/libmultiprocess/test/mp/test/test.cpp
    +++ b/src/ipc/libmultiprocess/test/mp/test/test.cpp
    @@ -470,6 +470,7 @@ KJ_TEST("Make simultaneous IPC calls on single remote thread")
                     [&running, &tc, i](auto&& results) {
                         assert(results.getResult() == static_cast<int32_t>(100 * (i+1)));
                         running -= 1;
    +                    Lock lock(tc.waiter->m_mutex);
                         tc.waiter->m_cv.notify_all();
                     }));
             }
    

    This looks like a lost wakeup bug where notify_all has no effect because the notify happens when the thread calling wait is not currently waiting (either because it hasn't started to wait yet, or because it stops waiting to check running == 0).

    It looks like this bug has been present since this test was added in https://github.com/bitcoin-core/libmultiprocess/commit/1238170f68e8fa7ae41c79465df5cdae34d568e9, and probably happened because c++'s documentation around what locking is required for notify_all is pretty confusing, as we've seen before ("I didn't know that c++ still requires the lock to be held before notifying, even though it now recommends against holding the lock while notifying").

  10. maflcko commented at 4:18 PM on June 9, 2026: member

    Reviewing test test more closely, I suspect following change would fix it:

    Seems fine, but I feel like clang thread safety annotations should be used to avoid this (and other problems in the future). At least this was the conclusion in https://mirror.b10c.me/bitcoin-bitcoin/30967#discussion_r1780728018. Also this is what https://github.com/isocpp/CppCoreGuidelines/issues/1272#issuecomment-2622750085 recommends?

  11. ryanofsky commented at 4:53 PM on June 9, 2026: contributor

    Thanks, the lock and mutex class used here do use clang's thread safety annotations, and I don't see an obvious way more annotations could make a difference here, but it might be possible. I do agree with the conclusion in that thread that just taking the basic lock/update/notify approach and not trying to be clever with std::atomics and other optimizations is good.

    The C++ guidelines also seem good. I think the notify_all documentation is just misleading when it says:

    The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s). Doing so may be a pessimization

    Because this implies that you don't need to lock a mutex to send a notification, when you actually do. C++ does make it ok (unlike posix threads) to unlock the mutex before sending the notification, but you still need to lock the mutex after the state transition to be sure the waiting thread is waiting.

  12. maflcko commented at 1:28 PM on June 10, 2026: member

    Thanks, the lock and mutex class used here do use clang's thread safety annotations, and I don't see an obvious way more annotations could make a difference here, but it might be possible.

    Just like fad8e7fba7, the cv should be guarded:

    diff --git a/include/mp/proxy-io.h b/include/mp/proxy-io.h
    index 9260593..545645b 100644
    --- a/include/mp/proxy-io.h
    +++ b/include/mp/proxy-io.h
    @@ -412,3 +412,3 @@ struct Waiter
         Mutex m_mutex;
    -    std::condition_variable m_cv;
    +    std::condition_variable m_cv MP_GUARDED_BY(m_mutex);
         std::optional<kj::Function<void()>> m_fn MP_GUARDED_BY(m_mutex);
    

    This will catch the issue at compile-time:

    /tmp/tmp.CdXP7X55gV/git_dir/test/mp/test/test.cpp:513:32: error: reading variable 'm_cv' requires holding mutex 'tc.waiter->m_mutex' [-Werror,-Wthread-safety-analysis]
      513 |                     tc.waiter->m_cv.notify_all();
          |                                ^
    

    Sent a pull to https://github.com/bitcoin-core/libmultiprocess/pull/295

  13. ryanofsky commented at 12:46 AM on June 11, 2026: contributor

    Just like fad8e7f, the cv should be guarded:

    I see. It's an interesting idea to guard the condition variable with the mutex, and I didn't know you could do that. It's not a clear win because it leads to less than optimal code by preventing releasing the lock before notifying as notify_one recommends, potentially triggering the "hurry up and wait" scenario described there depending on the thread implementation.

    It also looks like guarding the condition variable could have gone against CppCoreGuidelines according to the issue you linkedm but they decided not to add a guideline addressing this because performance is "not the main focus of the guidelines".

    I'd agree though having this annotation is better than not having it in cases like these where performance is not important.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-06-11 10:51 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me