Happened once (?) while testing https://github.com/bitcoin/bitcoin/pull/30975.
https://github.com/bitcoin/bitcoin/actions/runs/12826202676/job/35765674846?pr=30975#step:7:3078
I haven’t been able to reproduce it locally yet.
So far I tried on (M4) macOS 15.2 with commit 9a90f964a3c4fe75477926d34b7f1813b144449d:
cmake -B build -DWITH_MULTIPROCESS=ON
cmake --build build
BITCOIND=$(pwd)/build/src/bitcoin-node build/test/functional/rpc_misc.py
Using multiprocess master @ 3b2617b3e59f55ae3501c82a62c87f393077779f
From the CI link above the error seems to be:
node0 stderr libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
which according to https://stackoverflow.com/questions/66773247/libcabi-dylib-terminating-with-uncaught-exception-of-type-std-1system-er “typically happens when .lock() is called on a mutex that is not yet constructed, or has already been destructed.”
In the test, the error happens when the node is shutting down after an echoipc call has been made.
Another CI failure on the same test, this time on CentOS. I’m unsure if it’s the same issue:
https://cirrus-ci.com/task/6027377219731456?logs=ci#L4105
raise AssertionError("Unexpected stderr {} != {}".format(stderr, expected_stderr))
AssertionError: Unexpected stderr terminate called after throwing an instance of 'kj::ExceptionImpl'
what(): mp/proxy.cpp:242: disconnected: write(m_post_fd, &buffer, 1): Broken pipe
The failure on CentOS (https://cirrus-ci.com/task/6027377219731456?logs=ci#L4111), also reported in https://github.com/bitcoin/bitcoin/pull/30975#issuecomment-2603827142, happens in nearly the same place as the originally reported failure on macOS: when the test tries to stop the node on line https://github.com/bitcoin/bitcoin/blob/11b293a9e7e89f25acc74d0bda5cc68cd164b551/test/functional/rpc_misc.py#L83 after an echoipc call is made.
However, there are some differences between the two cases:
On macOS, the error is libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument. The crash happens very late though, after Shutdown: done is logged and the IPC event loop exits, when the Init object is being destroyed.
On CentOS, the error is terminate called after throwing an instance of 'kj::ExceptionImpl… and it probably comes from the child process, not the parent process.
In both cases there seems to be a race condition, and the bug could have the same root cause. I think my next step will be to try running the test in a loop to see if I can reproduce it locally.
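Running the test in a loop could be scripted roughly as follows. This is only a sketch: run_until_failure is a hypothetical helper name, not part of the functional test framework, and it assumes a failing run is signalled by a non-zero exit code.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=100, env=None):
    """Run cmd repeatedly and stop at the first run that exits non-zero.

    Hypothetical helper for chasing an intermittent failure.
    Returns (run_number, returncode, stderr) for the first failing run,
    or None if all max_runs runs exited with code 0.
    """
    for run_number in range(1, max_runs + 1):
        # Capture output so the stderr of the failing run can be inspected.
        result = subprocess.run(cmd, capture_output=True, text=True, env=env)
        if result.returncode != 0:
            return run_number, result.returncode, result.stderr
    return None
```

For the reproduction attempt described here, cmd would be the path to build/test/functional/rpc_misc.py, and env would be a copy of os.environ with BITCOIND pointing at the bitcoin-node binary, matching the command line shown earlier.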
Looking into this more, it’s not clear whether this “mutex lock failed: Invalid argument” issue is the same as the other “disconnected: write(m_post_fd, &buffer, 1): Broken pipe” issue reported in https://github.com/bitcoin/bitcoin/issues/31151, or whether the fix in #129 will affect it.
The other “Broken pipe” issue happens when the child process in the echoipc test does not shut down cleanly due to a race condition in the EventLoop shutdown sequence that happens there. But the “mutex lock failed” error here is happening in the parent process, not the child process. Even though the last log messages are about the event loop shutting down, they could just indicate that the event loop shut down successfully and that the error happens at some point after that.
The echoipc parent process starts its event loop with the CapnpProtocol::connect function instead of the CapnpProtocol::serve method that runs the event loop in the child process. And instead of shutting down when the last client disconnects, it shuts down when main() exits and the MakeNodeInit() destructor is called to destroy the interfaces::Init object. That object owns the interfaces::Ipc object, which owns the CapnpProtocol object, which calls removeClient in its destructor to stop the event loop.
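The ownership chain above can be pictured with a toy model. This is a sketch only, not the real implementation: the classes below merely borrow the names from the description, explicit close() methods stand in for C++ destructors, and the client counter on the event loop is an assumption made for illustration.

```python
class EventLoop:
    """Toy event loop: treated as running while it has at least one client."""

    def __init__(self, log):
        self.log = log
        self.clients = 0
        self.running = False

    def add_client(self):
        self.clients += 1
        self.running = True

    def remove_client(self):
        self.clients -= 1
        if self.clients == 0:
            # Assumption: the loop stops once its last client is removed.
            self.running = False
            self.log.append("EventLoop stopped")


class CapnpProtocol:
    def __init__(self, loop, log):
        self.loop = loop
        self.log = log
        loop.add_client()  # models connect() starting the loop

    def close(self):
        # Stands in for the destructor calling removeClient.
        self.log.append("CapnpProtocol teardown")
        self.loop.remove_client()


class Ipc:
    def __init__(self, protocol, log):
        self.protocol = protocol
        self.log = log

    def close(self):
        self.log.append("Ipc teardown")
        self.protocol.close()


class Init:
    def __init__(self, ipc, log):
        self.ipc = ipc
        self.log = log

    def close(self):
        self.log.append("Init teardown")
        self.ipc.close()


def simulate_parent_shutdown():
    """Build the chain, then tear it down the way main() exiting would."""
    log = []
    loop = EventLoop(log)
    init = Init(Ipc(CapnpProtocol(loop, log), log), log)
    init.close()
    return loop, log
```

In this model the event loop is the last thing to stop, so anything that still touches shared state after that point would be doing so during the final stage of teardown, which is where the crash is observed.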
But even though the event loop shutdown sequence is different, it could have the race conditions described in https://github.com/bitcoin/bitcoin/issues/31151 as long as there are any ProxyServerBase::m_impl objects that need to be destroyed asynchronously and could cause EventLoop::startAsyncThread to spawn a separate thread.
The echoipc method in the parent process isn’t passing any interface pointers to the child process, so it isn’t actually creating any ProxyServer objects of its own. However, the IPC framework is creating ProxyServer<Thread> and ProxyServer<ThreadMap> objects, and while the former is a custom class that doesn’t inherit from ProxyServerBase, the latter does, so it’s possible a similar race condition could exist that causes this issue and would be fixed by #129.
But it’s also possible #129 does not fix this, so we will probably just need to see if it happens again.
This issue did turn out to be different from #31151 and was not fixed by #129. A similar macOS “mutex lock failed” error was reported in #154. Both this issue and #154 should be fixed by https://github.com/bitcoin/bitcoin/pull/31815, but this has not been confirmed yet.