Socket disconnect followed by segfault on macos/arm64 #191

ryanofsky opened this issue on July 24, 2025
  1. ryanofsky commented at 4:34 pm on July 24, 2025: collaborator

    Originally posted by @pinheadmz in https://github.com/bitcoin/bitcoin/issues/32297#issuecomment-3074628613

    I got some crashes testing https://github.com/bitcoin/bitcoin/commit/37cd2c076434e7acbdbb20996cf87afb2cb5bc84 on macos/arm64. This is on a pruned mainnet node catching up from about 30 days behind the blockchain tip. After the first two crashes I wasn’t able to reproduce any more for a little while; after a few restarts I got the third crash.

    server: build/bin/bitcoin-node -server=1 -printtoconsole=1 -ipcbind=unix -debug=ipc -debug=rpc -debug=http

    client: build/bin/bitcoin-cli -getinfo

    server:

    2025-07-15T17:16:16Z [ipc] {bitcoin-node-65637/b-capnp-loop-15499815} IPC server: socket disconnected.
    2025-07-15T17:16:16Z [ipc] {bitcoin-node-65637/b-capnp-loop-15499815} IPC server destroy N2mp11ProxyServerIN3ipc5capnp8messages4InitEEE
    2025-07-15T17:16:16Z [rpc] ThreadRPCServer method=getbalances user=
    Segmentation fault: 11
    

    client: (1st occurrence)

    error: timeout on transient error: kj::Exception: capnp/rpc.c++:2779: disconnected: Peer disconnected.
    stack: 102cca3bc 102cc028c 1007be520 1007c0f90

    Probably bitcoin-node is not running or not listening on a unix socket. Can be started with:

        bitcoin-node -chain=main -ipcbind=unix
    

    client: (2nd occurrence)

    libc++abi: terminating due to uncaught exception of type kj::ExceptionImpl: /opt/homebrew/include/kj/memory.h:258: failed: expected ptr != nullptr; null Own<> dereference
    stack: 10563c7cb 102e3e057 102ef043b 102ef0367 102ef7b6b 102ef7a0b 102ef79af 102ef794f 102ef788f 102ef7833 102ef51b7 102e528e7 102e525f3 10339cf9b 10339c8eb 102e307b3 102e3062b 102e30503 102e2f593 1895a2f93 18959dd33
    Abort trap: 6
    

    client: (3rd occurrence)

    error: kj::Exception: capnp/rpc.c++:2779: disconnected: Peer disconnected.
    stack: 1070523bc 10704828c 104f737c4 104f7621c
    error: timeout on transient error: Could not connect to the server 127.0.0.1:8332
    
  2. ryanofsky commented at 5:22 pm on July 24, 2025: collaborator

    From these logs it seems like there are abrupt socket disconnects followed by segfaults and other errors here.

    It seems like in the first example, the client is connecting and calling getbalances and then disconnecting before waiting for the getbalances response, and this causes the server to crash because it is not handling the disconnect properly. The problem with abrupt disconnects causing crashes is known and should be fixed with https://github.com/bitcoin/bitcoin/pull/32345. That PR should cause the server not to crash in this case.

    It is not clear what is causing the client to disconnect, though. And the client errors don’t seem to make that much sense.

    The first client error “timeout on transient error […] Peer disconnected” only happens when the client is trying to connect to the server before it sends any RPC method request. So it does not seem consistent with the server log. Maybe it is actually from a different client than the one that sent the getbalances request?

    The second client error “null Own<> dereference” seems more promising. Maybe that is the client which sent the getbalances request, and suddenly crashed. If so, getting a stack trace from that client could reveal what the problem is.

    The third client error “Could not connect to the server 127.0.0.1:8332” is actually an RPC error, not an IPC error. But there is an IPC “Peer disconnected” error immediately before it, which I think would have to come from a different bitcoin-cli invocation? I don’t see how a single bitcoin-cli invocation could result in both of these errors, since the first error should cause it to exit.

    I guess overall I’m confused about some of this behavior, and also don’t understand why there are multiple different client errors and only a single server error. It is unclear whether the server crashed several times with the same error, or whether all of these clients were just connecting to the same server around the time it crashed. [EDIT: rereading the OP it sounds like the server just crashed multiple times and the clients showed varying error messages each time it crashed]

    I do think https://github.com/bitcoin/bitcoin/pull/32345 should remove a lot of instability and noise here and make any remaining issues easier to resolve.

  3. ryanofsky commented at 5:40 pm on July 24, 2025: collaborator
    Actually I think I’m misreading the server logs and there might not even be a sudden disconnect causing the server to crash. The two lines beginning the log could be a clean disconnect. The getbalances call after could be coming from another client if -getinfo calls are being made in succession. It could help to have more logs, or a stack trace from the server, to clarify.
  4. pinheadmz commented at 6:46 pm on July 28, 2025: none
    I was making many client calls to get the crashes, so that might explain why there seem to be multiple clients. I’ll try to capture a stack.
  5. pinheadmz commented at 7:19 pm on July 28, 2025: none

    Got something after pummeling with RPC requests:

    Process 50097 stopped
    * thread #3, name = 'b-capnp-loop', stop reason = EXC_BAD_ACCESS (code=1, address=0x795848873540be60)
        frame #0: 0x0000000100c6c934 bitcoin-node`std::__1::__function::__value_func<void (bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>)>::operator bool[abi:ne180100](this=0x795848873540be48) const at function.h:463:75
       460        std::swap(__f_, __f.__f_);
       461    }
       462
    -> 463    _LIBCPP_HIDE_FROM_ABI explicit operator bool() const _NOEXCEPT { return __f_ != nullptr; }
       464
       465  #  ifndef _LIBCPP_HAS_NO_RTTI
       466    _LIBCPP_HIDE_FROM_ABI const std::type_info& target_type() const _NOEXCEPT {
    Target 0: (bitcoin-node) stopped.
    (lldb) bt
    * thread #3, name = 'b-capnp-loop', stop reason = EXC_BAD_ACCESS (code=1, address=0x795848873540be60)
      * frame #0: 0x0000000100c6c934 bitcoin-node`std::__1::__function::__value_func<void (bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>)>::operator bool[abi:ne180100](this=0x795848873540be48) const at function.h:463:75
        frame #1: 0x0000000100c6c8d0 bitcoin-node`std::__1::function<void (bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>)>::operator bool[abi:ne180100](this=0x795848873540be48) const at function.h:886:93
        frame #2: 0x0000000100c6cb48 bitcoin-node`mp::Logger& mp::operator<<<char const (&) [2]>(logger=0x000000016ff12130, value={\x83N\xc6\x01\x01\0\0\0}) at proxy-io.h:113:13
        frame #3: 0x0000000100c6c68c bitcoin-node`mp::EventLoop::log(this=0x795848873540bd30) at proxy-io.h:180:16
        frame #4: 0x0000000101141b50 bitcoin-node`void mp::serverDestroy<mp::ProxyServer<ipc::capnp::messages::Rpc>>(server=0x0000600007535220) at proxy-types.h:580:41
        frame #5: 0x0000000101141aa4 bitcoin-node`mp::ProxyServer<ipc::capnp::messages::Rpc>::~ProxyServer(this=0x0000600007535220, vtt=0x0000000101ec3d50) at rpc.capnp.proxy-types.c++:8:58
        frame #6: 0x0000000101141ce0 bitcoin-node`mp::ProxyServer<ipc::capnp::messages::Rpc>::~ProxyServer(this=0x0000600007535220) at rpc.capnp.proxy-types.c++:8:56
        frame #7: 0x0000000101141d44 bitcoin-node`mp::ProxyServer<ipc::capnp::messages::Rpc>::~ProxyServer(this=0x0000600007535220) at rpc.capnp.proxy-types.c++:8:56
        frame #8: 0x0000000100dd48ac bitcoin-node`kj::_::HeapDisposer<mp::ProxyServer<ipc::capnp::messages::Rpc>>::disposeImpl(this=0x0000000101eb7400, pointer=0x0000600007535220) const at memory.h:557:60
        frame #9: 0x00000001063812e0 libcapnp-rpc.1.1.0.dylib`capnp::LocalClient::~LocalClient() + 188
        frame #10: 0x000000010637c3a8 libcapnp-rpc.1.1.0.dylib`capnp::LocalClient::~LocalClient() + 20
        frame #11: 0x000000010637e924 libcapnp-rpc.1.1.0.dylib`kj::_::AttachmentPromiseNode<kj::_::Tuple<kj::Own<capnp::LocalClient, std::nullptr_t>, kj::Own<capnp::CallContextHook, std::nullptr_t>>>::~AttachmentPromiseNode() + 100
        frame #12: 0x0000000106512384 libkj-async.1.1.0.dylib`kj::_::ForkHubBase::fire() + 80
        frame #13: 0x000000010650fef4 libkj-async.1.1.0.dylib`kj::EventLoop::turn() + 128
        frame #14: 0x000000010651071c libkj-async.1.1.0.dylib`kj::_::waitImpl(kj::Own<kj::_::PromiseNode, kj::_::PromiseDisposer>&&, kj::_::ExceptionOrValue&, kj::WaitScope&, kj::SourceLocation) + 464
        frame #15: 0x000000010172fe8c bitcoin-node`kj::Promise<unsigned long>::wait(this=0x000000016ff12c20, waitScope=0x0000600002300458, location=(fileName = "ipc/libmultiprocess/src/mp/proxy.cpp", function = "loop", lineNumber = 196, columnNumber = 35)) at async-inl.h:1357:3
        frame #16: 0x000000010172f810 bitcoin-node`mp::EventLoop::loop(this=0x0000000148804098) at proxy.cpp:196:68
        frame #17: 0x0000000100c5c190 bitcoin-node`ipc::capnp::(anonymous namespace)::CapnpProtocol::startLoop(this=0x0000600001609408)::'lambda'()::operator()() const at protocol.cpp:91:21
        frame #18: 0x0000000100c5c008 bitcoin-node`decltype(std::declval<ipc::capnp::(anonymous namespace)::CapnpProtocol::startLoop(char const*)::'lambda'()>()()) std::__1::__invoke[abi:ne180100]<ipc::capnp::(anonymous namespace)::CapnpProtocol::startLoop(char const*)::'lambda'()>(__f=0x0000600001609408) at invoke.h:344:25
        frame #19: 0x0000000100c5bfac bitcoin-node`void std::__1::__thread_execute[abi:ne180100]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, ipc::capnp::(anonymous namespace)::CapnpProtocol::startLoop(char const*)::'lambda'()>(__t=size=2, (null)=__tuple_indices<> @ 0x000000016ff12f77) at thread.h:199:3
        frame #20: 0x0000000100c5b92c bitcoin-node`void* std::__1::__thread_proxy[abi:ne180100]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, ipc::capnp::(anonymous namespace)::CapnpProtocol::startLoop(char const*)::'lambda'()>>(__vp=0x0000600001609400) at thread.h:208:3
        frame #21: 0x00000001895a2f94 libsystem_pthread.dylib`_pthread_start + 136
    
  6. pinheadmz commented at 7:31 pm on July 28, 2025: none
    To be totally clear, I pummeled using watch ... -getinfo from 4 separate tmux panes, while sync was around 99%. I let that run for several minutes, then cancelled all the watch processes, took a breath, and executed one single -getinfo request – for some reason that produced the same crash twice in lldb.
  7. ryanofsky commented at 8:02 pm on July 28, 2025: collaborator

    Thanks! The bitcoin-node stack trace is consistent with bugs fixed in #160 and https://github.com/bitcoin/bitcoin/pull/32345. The annotated version of the stack trace below shows this is probably happening because the server.m_context.connection pointer is null, which would happen if the connection is closed suddenly before #160. If you merge https://github.com/bitcoin/bitcoin/pull/32345 into your branch, the server crash should stop happening.

    It’s still not clear what clients may be doing to trigger this crash though. If one of them is segfaulting before the server crash, that might explain this, and it could help to see the client stack trace. Otherwise it’s not really clear. https://github.com/bitcoin/bitcoin/pull/32345 might fix the client problems or might not, would have to test to find out.

    Annotated stack trace from #191 (comment):

    frame #2: 0x0000000100c6cb48 bitcoin-node`mp::Logger& mp::operator<<<char const (&) [2]>(logger=0x000000016ff12130, value={\x83N\xc6\x01\x01\0\0\0}) at proxy-io.h:113:13

       110      template <typename T>
       111      friend Logger& operator<<(Logger& logger, T&& value)
       112      {
    >  113          if (logger.m_fn) logger.m_buffer << std::forward<T>(value);
       114          return logger;
       115      }

    frame #3: 0x0000000100c6c68c bitcoin-node`mp::EventLoop::log(this=0x795848873540bd30) at proxy-io.h:180:16

       177      Logger log()
       178      {
       179          Logger logger(false, m_log_fn);
    >  180          logger << "{" << LongThreadName(m_exe_name) << "} ";
       181          return logger;
       182      }

    frame #4: 0x0000000101141b50 bitcoin-node`void mp::serverDestroy<mp::ProxyServer<ipc::capnp::messages::Rpc>>(server=0x0000600007535220) at proxy-types.h:580:41

       577  template <typename Server>
       578  void serverDestroy(Server& server)
       579  {
    >  580      server.m_context.connection->m_loop.log() << "IPC server destroy " << typeid(server).name();
       581  }
    
  8. pinheadmz commented at 2:13 pm on August 7, 2025: none
    Couldn’t reproduce the crash with https://github.com/bitcoin/bitcoin/pull/32345, so I think you can close this, since the fixes seem to be merged already in libmultiprocess.
  9. fanquake closed this on Aug 21, 2025


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/libmultiprocess. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-12-04 19:30 UTC
