intermittent issue in rpc_signer.py (enumeratesigners timeout) under GLIBCXX debug mode #32524

issue maflcko opened this issue on May 16, 2025
  1. maflcko commented at 6:36 AM on May 16, 2025: member

    https://cirrus-ci.com/task/5094568045051904?logs=ci#L4419

    [18:26:32.573] test_framework.authproxy.JSONRPCException: 'enumeratesigners' RPC took longer than 1200.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)
    

    Looking at the full log, the test is just idle and then times out. However, there is no indication why the RPC does not proceed.

  2. maflcko added the label CI failed on May 16, 2025
  3. maflcko commented at 10:04 AM on May 16, 2025: member
  4. maflcko commented at 10:46 AM on May 16, 2025: member

    Same https://cirrus-ci.com/task/4561849141690368?logs=ci#L6059

    So I guess this was recently introduced

  5. fanquake commented at 10:49 AM on May 16, 2025: member
  6. laanwj commented at 11:12 AM on May 16, 2025: member

    If it's #32343 i don't understand why. It was changed so that the brute force fd-closing path isn't even compiled on Linux.

    No idea what could even take so long, unless it's an obscure bug in the C++ library like fs::directory_iterator looping forever.

  7. laanwj commented at 11:37 AM on May 16, 2025: member

    It would be reasonaly useful to if it logged a full backtrace of bitcoind (gdb thread apply all bt) in these case of timeouts.

    Though, i'm not sure that's enough to debug this. If this is what we think that it is, this is already happening in the forked process, after all. This similarly makes it difficult to add debug logging.

    Edit: i'll make an implementation using bare readdir instead of the fs_iterator, and open a PR. So we can see if that fixes it.

  8. maflcko commented at 8:46 AM on May 17, 2025: member

    I've created 50 runs and they all passed: https://cirrus-ci.com/task/5325312260571136 So I guess this is related. I'll try to get the gdb bt next.

  9. maflcko commented at 8:54 AM on May 17, 2025: member

    In the meantime, I place my bet that this is a gcc-11 upstream bug 🤞

  10. romanz commented at 12:49 PM on May 17, 2025: contributor
  11. maflcko commented at 12:39 PM on May 18, 2025: member

    The bt is:

    Thread 9 (Thread 0x7f0973400640 (LWP 21341) "b-httpworker.5"):
    [#0](/bitcoin-bitcoin/0/)  __GI___libc_read (nbytes=1024, buf=0x7f09733fb7f0, fd=30) at ../sysdeps/unix/sysv/linux/read.c:26
    [#1](/bitcoin-bitcoin/1/)  __GI___libc_read (fd=30, buf=0x7f09733fb7f0, nbytes=1024) at ../sysdeps/unix/sysv/linux/read.c:24
    [#2](/bitcoin-bitcoin/2/)  0x0000564d74731ba3 in subprocess::Popen::execute_process() ()
    [#3](/bitcoin-bitcoin/3/)  0x0000564d74732b44 in subprocess::Popen::Popen<subprocess::input, subprocess::output, subprocess::error, subprocess::close_fds>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, subprocess::input&&, subprocess::output&&, subprocess::error&&, subprocess::close_fds&&) ()
    [#4](/bitcoin-bitcoin/4/)  0x0000564d747299f5 in RunCommandParseJSON(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
    [#5](/bitcoin-bitcoin/5/)  0x0000564d7457642c in ExternalSigner::Enumerate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__debug::vector<ExternalSigner, std::allocator<ExternalSigner> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
    [#6](/bitcoin-bitcoin/6/)  0x0000564d73e7f8e5 in enumeratesigners()::{lambda(RPCHelpMan const&, JSONRPCRequest const&)#1}::operator()(RPCHelpMan const&, JSONRPCRequest const&) const [clone .constprop.0] ()
    [#7](/bitcoin-bitcoin/7/)  0x0000564d73e815d2 in std::_Function_handler<UniValue (RPCHelpMan const&, JSONRPCRequest const&), enumeratesigners()::{lambda(RPCHelpMan const&, JSONRPCRequest const&)#1}>::_M_invoke(std::_Any_data const&, RPCHelpMan const&, JSONRPCRequest const&) ()
    [#8](/bitcoin-bitcoin/8/)  0x0000564d74630660 in RPCHelpMan::HandleRequest(JSONRPCRequest const&) const ()
    [#9](/bitcoin-bitcoin/9/)  0x0000564d73e6334e in CRPCCommand::CRPCCommand(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, RPCHelpMan (*)())::{lambda(JSONRPCRequest const&, UniValue&, bool)#1}::operator()(JSONRPCRequest const&, UniValue&, bool) const [clone .isra.0] ()
    [#10](/bitcoin-bitcoin/10/) 0x0000564d73f80826 in ExecuteCommands(std::__debug::vector<CRPCCommand const*, std::allocator<CRPCCommand const*> > const&, JSONRPCRequest const&, UniValue&) ()
    [#11](/bitcoin-bitcoin/11/) 0x0000564d73f80e4d in CRPCTable::execute(JSONRPCRequest const&) const ()
    [#12](/bitcoin-bitcoin/12/) 0x0000564d73f81031 in JSONRPCExec(JSONRPCRequest const&, bool) ()
    [#13](/bitcoin-bitcoin/13/) 0x0000564d740fbb02 in HTTPReq_JSONRPC(std::any const&, HTTPRequest*) ()
    [#14](/bitcoin-bitcoin/14/) 0x0000564d7411405d in WorkQueue<HTTPClosure>::Run() ()
    [#15](/bitcoin-bitcoin/15/) 0x0000564d7410ca38 in HTTPWorkQueueRun(WorkQueue<HTTPClosure>*, int) ()
    [#16](/bitcoin-bitcoin/16/) 0x00007f097c4cf253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
    [#17](/bitcoin-bitcoin/17/) 0x00007f097c157ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
    [#18](/bitcoin-bitcoin/18/) 0x00007f097c1e9850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
    
  12. maflcko commented at 6:25 AM on May 19, 2025: member

    Whatever the issue is, it may be fixed in gcc-14. At least https://cirrus-ci.com/task/5477938151292928 passed 25 times. (The prior commit with gcc-12 failed at least once: https://cirrus-ci.com/task/5645419964792832?logs=ci#L4122). Gcc-13 also failed at least once: https://cirrus-ci.com/task/4577002323181568?logs=ci#L690

  13. maflcko commented at 7:21 AM on May 19, 2025: member

    Backtrace with file names and lines:

    Thread 33 (Thread 0x7f9d93e006c0 (LWP 18294) "b-httpworker.2"):
    [#0](/bitcoin-bitcoin/0/)  0x00007f9da1dfca9a in read () from /lib/x86_64-linux-gnu/libc.so.6
    [#1](/bitcoin-bitcoin/1/)  0x000055f3d4633e22 in read (__nbytes=1024, __buf=0x7f9d93dfb030, __fd=30) at /usr/include/x86_64-linux-gnu/bits/unistd.h:28
    [#2](/bitcoin-bitcoin/2/)  subprocess::util::read_atmost_n (read_upto=1024, buf=0x7f9d93dfb030 "", fp=0x7f9d8c0036c0) at ./util/subprocess.h:437
    [#3](/bitcoin-bitcoin/3/)  subprocess::Popen::execute_process (this=this@entry=0x7f9d93dfb8b0) at ./util/subprocess.h:1242
    [#4](/bitcoin-bitcoin/4/)  0x000055f3d4634bfc in subprocess::Popen::Popen<subprocess::input, subprocess::output, subprocess::error, subprocess::close_fds> (this=0x7f9d93dfb8b0, cmd_args="/usr/bin/python3 /ci_container_base/test/functional/mocks/signer.py enumerate") at ./util/subprocess.h:964
    [#5](/bitcoin-bitcoin/5/)  0x000055f3d462a9dc in RunCommandParseJSON (str_command="/usr/bin/python3 /ci_container_base/test/functional/mocks/signer.py enumerate", str_std_in="") at ./common/run_command.cpp:27
    [#6](/bitcoin-bitcoin/6/)  0x000055f3d44717bc in ExternalSigner::Enumerate (command="/usr/bin/python3 /ci_container_base/test/functional/mocks/signer.py", signers=std::__debug::vector of length 0, capacity 0, chain="regtest") at /usr/include/c++/13/bits/allocator.h:184
    [#7](/bitcoin-bitcoin/7/)  0x000055f3d3d78a55 in operator() (request=..., self=..., __closure=<optimized out>) at /usr/include/c++/13/bits/basic_string.tcc:242
    [#8](/bitcoin-bitcoin/8/)  0x000055f3d3d7a0c1 in std::__invoke_impl<UniValue, enumeratesigners()::<lambda(const RPCHelpMan&, const JSONRPCRequest&)>&, const RPCHelpMan&, const JSONRPCRequest&> (__f=...) at /usr/include/c++/13/bits/invoke.h:60
    [#9](/bitcoin-bitcoin/9/)  std::__invoke_r<UniValue, enumeratesigners()::<lambda(const RPCHelpMan&, const JSONRPCRequest&)>&, const RPCHelpMan&, const JSONRPCRequest&> (__fn=...) at /usr/include/c++/13/bits/invoke.h:116
    [#10](/bitcoin-bitcoin/10/) std::_Function_handler<UniValue(const RPCHelpMan&, const JSONRPCRequest&), enumeratesigners()::<lambda(const RPCHelpMan&, const JSONRPCRequest&)> >::_M_invoke(const std::_Any_data &, const RPCHelpMan &, const JSONRPCRequest &) (__functor=..., __args#0=..., __args#1=...) at /usr/include/c++/13/bits/std_function.h:291
    [#11](/bitcoin-bitcoin/11/) 0x000055f3d4530306 in std::function<UniValue (RPCHelpMan const&, JSONRPCRequest const&)>::operator()(RPCHelpMan const&, JSONRPCRequest const&) const (__args#1=..., __args#0=..., this=0x7f9d93dfd110) at /usr/include/c++/13/bits/std_function.h:591
    [#12](/bitcoin-bitcoin/12/) RPCHelpMan::HandleRequest (this=this@entry=0x7f9d93dfd0f0, request=...) at ./rpc/util.cpp:663
    [#13](/bitcoin-bitcoin/13/) 0x000055f3d3d5d2fb in CRPCCommand::CRPCCommand(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, RPCHelpMan (*)())::{lambda(JSONRPCRequest const&, UniValue&, bool)#1}::operator()(JSONRPCRequest const&, UniValue&, bool) const (request=..., result=..., __closure=<optimized out>) at ./rpc/server.h:101
    [#14](/bitcoin-bitcoin/14/) 0x000055f3d3e6ed87 in std::function<bool (JSONRPCRequest const&, UniValue&, bool)>::operator()(JSONRPCRequest const&, UniValue&, bool) const (__args#2=<optimized out>, __args#1=..., __args#0=..., this=0x55f3d50aee20 <RegisterSignerRPCCommands(CRPCTable&)::commands+64>) at /usr/include/c++/13/bits/std_function.h:591
    [#15](/bitcoin-bitcoin/15/) ExecuteCommand (last_handler=true, result=..., request=..., command=...) at ./rpc/server.cpp:514
    [#16](/bitcoin-bitcoin/16/) ExecuteCommands (commands=std::__debug::vector of length 1, capacity 1 = {...}, request=..., result=...) at ./rpc/server.cpp:481
    [#17](/bitcoin-bitcoin/17/) 0x000055f3d3e6fc17 in CRPCTable::execute (this=this@entry=0x55f3d50b1f80 <tableRPC>, request=...) at ./rpc/server.cpp:501
    [#18](/bitcoin-bitcoin/18/) 0x000055f3d3e6ffdb in JSONRPCExec (jreq=..., catch_errors=<optimized out>) at ./rpc/server.cpp:357
    [#19](/bitcoin-bitcoin/19/) 0x000055f3d3ffddd2 in HTTPReq_JSONRPC (context=..., req=0x7f9d30001ca0) at ./httprpc.cpp:217
    [#20](/bitcoin-bitcoin/20/) 0x000055f3d4014451 in std::function<bool (HTTPRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>::operator()(HTTPRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const (__args#1="", __args#0=<optimized out>, this=0x7f9d300053c0) at /usr/include/c++/13/bits/std_function.h:591
    [#21](/bitcoin-bitcoin/21/) HTTPWorkItem::operator() (this=0x7f9d30005390) at ./httpserver.cpp:60
    [#22](/bitcoin-bitcoin/22/) WorkQueue<HTTPClosure>::Run (this=this@entry=0x55f40ede4b10) at ./httpserver.cpp:115
    [#23](/bitcoin-bitcoin/23/) 0x000055f3d400bb37 in HTTPWorkQueueRun (queue=0x55f40ede4b10, worker_num=<optimized out>) at ./httpserver.cpp:417
    [#24](/bitcoin-bitcoin/24/) 0x00007f9da20f6db4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
    [#25](/bitcoin-bitcoin/25/) 0x00007f9da1d7daa4 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    [#26](/bitcoin-bitcoin/26/) 0x00007f9da1e0ac3c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    
  14. laanwj commented at 8:33 AM on May 19, 2025: member

    That backtrace is exactly what i'd expected. It's the parent process waiting for the forked child there.

  15. maflcko commented at 10:00 AM on May 19, 2025: member

    So far it looks like this can only happen with gcc-13 (or older) and GLIBCXX debug mode. I don't think any real user is running in production with the std lib debug mode, so an alternative would be to just fixup the CI?

    For reference, the end-user setting -D_GLIBCXX_ASSERTIONS passed 25 times: https://cirrus-ci.com/task/5739615073599488

  16. laanwj commented at 10:22 AM on May 19, 2025: member

    so an alternative would be to just fixup the CI?

    Sure. If this is unlikely to cause issues for real users, that makes more sense, the point of the CI is not to keep us patching the code to make the CI happy 😄

  17. maflcko commented at 10:58 AM on May 19, 2025: member

    I'd say finding compiler bugs is a valid use-case of the CI, but this one seems to be fixed already, so there is probably not much value in bisecting the exact gcc commit, especially given that it would be extremely tedious without a minimal non-intermittent reproducer.

    (The time would be better spent on reducing other compiler bugs, such as #32276 (comment))

  18. laanwj commented at 12:02 PM on May 19, 2025: member

    Yes, bisecting it would be neigh-impossible, given that it seems impossible to reliably reproduce (one thing that's strange is that it never happens in the unit tests, which exercise RunCommandParseJSON). i already looked into the git history of directory_iterator in the gcc/libstdc++-v3 and wasn't able to find any bug fix that would explain the behavior here, or how it was fixed. So it's likely something indirect.

  19. maflcko commented at 12:15 PM on May 19, 2025: member

    The debug mode keeps state in additional structs/fields, so my bet would be that one of the states is multi-thread unsafe, but that is just a wild guess.

  20. fanquake closed this on May 20, 2025

  21. fanquake referenced this in commit af65fd1a33 on May 20, 2025
  22. maflcko commented at 11:17 AM on May 20, 2025: member

    It even happens on gcc-14, it seems:

    https://cirrus-ci.com/task/5476449961902080?logs=ci#L5506

  23. maflcko reopened this on May 20, 2025

  24. maflcko renamed this:
    intermittent issue in rpc_signer.py (enumeratesigners timeout)
    intermittent issue in rpc_signer.py (enumeratesigners timeout) under GLIBCXX debug mode
    on May 20, 2025
  25. fanquake added this to the milestone 30.0 on May 20, 2025
  26. maflcko commented at 12:54 PM on May 20, 2025: member

    What I don't understand is why the CentOS task seemingly passed before, but started failing after https://github.com/bitcoin/bitcoin/commit/af65fd1a333011137dafd3df9a51704fd319feb4, even though the task wasn't modified apart from a comment.

  27. fanquake commented at 2:15 PM on May 21, 2025: member
  28. l0rinc commented at 9:03 AM on May 22, 2025: contributor

    Another one in https://github.com/bitcoin/bitcoin/pull/32404/checks?check_run_id=42637298331 - retriggering the build makes it pass.

    <details> <summary>CI logs</summary>

    [13:30:22.121] node0 2025-05-21T17:30:07.606048Z [msghand] [logging/timer.h:58] [Log] [lock] Enter: lock contention connman.m_nodes_mutex, ./net.h:1648 started
    [13:30:22.121] node0 2025-05-21T17:30:07.606205Z [msghand] [logging/timer.h:58] [Log] [lock] Enter: lock contention connman.m_nodes_mutex, ./net.h:1648 completed (2μs)
    [13:30:22.121] node0 2025-05-21T17:30:21.228925Z [msghand] [net_processing.cpp:3449] [ProcessMessage] [net] received: ping (8 bytes) peer=0
    [13:30:22.121] node0 2025-05-21T17:30:21.229000Z [msghand] [net.cpp:3870] [PushMessage] [net] sending pong (8 bytes) peer=0
    [13:30:22.121] node0 2025-05-21T17:30:21.229120Z [msghand] [net.cpp:3870] [PushMessage] [net] sending ping (8 bytes) peer=0
    [13:30:22.121] node0 2025-05-21T17:30:21.229620Z [msghand] [net_processing.cpp:3449] [ProcessMessage] [net] received: pong (8 bytes) peer=0
    [13:30:22.121] node0 2025-05-21T17:30:21.329837Z [net] [logging/timer.h:58] [Log] [lock] Enter: lock contention m_nodes_mutex, ./net.cpp:1906 started
    [13:30:22.121] node0 2025-05-21T17:30:21.329888Z [net] [logging/timer.h:58] [Log] [lock] Enter: lock contention m_nodes_mutex, ./net.cpp:1906 completed (1μs)
    [13:30:22.121] test  2025-05-21T17:30:21.429000Z TestFramework (ERROR): Assertion failed
    [13:30:22.121] Traceback (most recent call last):
    [13:30:22.121] File "/ci_container_base/test/functional/test_framework/authproxy.py", line 169, in _get_response
    [13:30:22.121] http_response = self.__conn.getresponse()
    [13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^
    [13:30:22.121] File "/usr/lib64/python3.12/http/client.py", line 1430, in getresponse
    [13:30:22.121] response.begin()
    [13:30:22.121] File "/usr/lib64/python3.12/http/client.py", line 331, in begin
    [13:30:22.121] version, status, reason = self._read_status()
    [13:30:22.121] ^^^^^^^^^^^^^^^^^^^
    [13:30:22.121] File "/usr/lib64/python3.12/http/client.py", line 292, in _read_status
    [13:30:22.121] line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    [13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    [13:30:22.121] File "/usr/lib64/python3.12/socket.py", line 720, in readinto
    [13:30:22.121] return self._sock.recv_into(b)
    [13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^
    [13:30:22.121] TimeoutError: timed out
    [13:30:22.121] During handling of the above exception, another exception occurred:
    [13:30:22.121] Traceback (most recent call last):
    [13:30:22.121] File "/ci_container_base/test/functional/test_framework/util.py", line 164, in try_rpc
    [13:30:22.121] fun(*args, **kwds)
    [13:30:22.121] File "/ci_container_base/test/functional/test_framework/coverage.py", line 50, in __call__
    [13:30:22.121] return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
    [13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    [13:30:22.121] File "/ci_container_base/test/functional/test_framework/authproxy.py", line 132, in __call__
    [13:30:22.121] response, status = self._request('POST', self.__url.path, postdata.encode('utf-8'))
    [13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    [13:30:22.121] File "/ci_container_base/test/functional/test_framework/authproxy.py", line 106, in _request
    [13:30:22.121] return self._get_response()
    [13:30:22.121] ^^^^^^^^^^^^^^^^^^^^
    [13:30:22.121] File "/ci_container_base/test/functional/test_framework/authproxy.py", line 171, in _get_response
    [13:30:22.121] raise JSONRPCException({
    [13:30:22.121] test_framework.authproxy.JSONRPCException: 'enumeratesigners' RPC took longer than 1200.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)
    [13:30:22.121] During handling of the above exception, another exception occurred:
    [13:30:22.121] Traceback (most recent call last):
    [13:30:22.121] File "/ci_container_base/test/functional/test_framework/test_framework.py", line 184, in main
    [13:30:22.121] self.run_test()
    [13:30:22.121] File "/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/test/functional/rpc_signer.py", line 54, in run_test
    [13:30:22.121] assert_raises_rpc_error(
    [13:30:22.121] File "/ci_container_base/test/functional/test_framework/util.py", line 155, in assert_raises_rpc_error
    [13:30:22.121] assert try_rpc(code, message, fun, *args, **kwds), "No exception raised"
    [13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    [13:30:22.121] File "/ci_container_base/test/functional/test_framework/util.py", line 168, in try_rpc
    [13:30:22.121] raise AssertionError("Unexpected JSONRPC error code %i" % e.error["code"])
    [13:30:22.121] AssertionError: Unexpected JSONRPC error code -344
    [13:30:22.121] test  2025-05-21T17:30:21.437000Z TestFramework (DEBUG): Closing down network thread
    [13:30:22.121] test  2025-05-21T17:30:21.488000Z TestFramework (INFO): Not stopping nodes as test failed. The dangling processes will be cleaned up later.
    [13:30:22.121] test  2025-05-21T17:30:21.488000Z TestFramework (WARNING): Not cleaning up dir /ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250521_130522/rpc_signer_216
    [13:30:22.121] test  2025-05-21T17:30:21.488000Z TestFramework (ERROR): Test failed. Test logging available at /ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250521_130522/rpc_signer_216/test_framework.log
    [13:30:22.121] test  2025-05-21T17:30:21.489000Z TestFramework (ERROR):
    [13:30:22.121] test  2025-05-21T17:30:21.489000Z TestFramework (ERROR): Hint: Call /ci_container_base/test/functional/combine_logs.py '/ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250521_130522/rpc_signer_216' to consolidate all logs
    [13:30:22.121] test  2025-05-21T17:30:21.489000Z TestFramework (ERROR):
    [13:30:22.121] test  2025-05-21T17:30:21.489000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
    [13:30:22.121] test  2025-05-21T17:30:21.489000Z TestFramework (ERROR): https://github.com/bitcoin/bitcoin/issues
    [13:30:22.121] test  2025-05-21T17:30:21.489000Z TestFramework (ERROR):
    
    

    </details>

  29. maflcko commented at 11:39 AM on May 22, 2025: member

    I am out of ideas on how to proceed here. I tried this again on Ubuntu and the gcc-13 failed frequently (https://cirrus-ci.com/task/4846423977492480), whereas the previous commit with gcc-14 passed 100 times (https://cirrus-ci.com/task/5508081775280128)

  30. maflcko referenced this in commit a476e94689 on May 22, 2025
  31. maflcko referenced this in commit fa079538e3 on May 22, 2025
  32. fanquake closed this on May 22, 2025

  33. fanquake referenced this in commit 2df824f4e6 on May 22, 2025
  34. maflcko commented at 6:10 AM on May 23, 2025: member

    This is closed, but the failure remains, when triggered, so it would be good to investigate this long-term at some point. Discussion can continue here, or in a new issue.

  35. fanquake reopened this on May 23, 2025

  36. maflcko added the label Brainstorming on May 23, 2025
  37. maflcko commented at 8:18 AM on May 23, 2025: member

    Another data point: I haven't see a failure so far in my G++ snapshot DEBUG=1 nightly run: https://github.com/maflcko/b-c-nightly/actions/runs/15153825944/job/42604725571

  38. maflcko commented at 3:56 PM on May 23, 2025: member

    The 32-bit GLIBCXX debug run (gcc-13) is also passing fine: https://cirrus-ci.com/task/6401825881980928?logs=ci#L3769

  39. tnndbtc commented at 12:13 AM on July 15, 2025: none

    I assume the general code logic should be correct, but likely some exception/run-time error occurred then caused the intermittent issue.

    So, I studied Popen::execute_process() in src/util/subprocess.h, one interesting statement caught my attention:

        subprocess_close(err_wr_pipe);// close child side of pipe, else get stuck in read below
        stream_.close_child_fds();
    

    I notice that we didn't check the return code of the calling of subprocess_close()/close(), in many cases. Indeed, close() is a very safe call in general, but what if close() returns -1, but we didn't check it?

    So, I hacked the subprocess.h by not calling subprocess_close(err_wr_pipe); on purpose in my macOS. Then I kind of reproduced the issue, at least it has very similar call stack on bitcoind, and on signer.py, it likely hangs on buffer = sys.stdin.read()

    Not sure if this helps on the debugging process, but just want to share a thought. Maybe we should handle the return code of subprocess_close() to provide more hints when next it reproduces?

    # the bitcoind process hangs on subprocess::util::read_atmost_n
    (lldb) bt
    * thread [#7](/bitcoin-bitcoin/7/), name = 'b-httpworker.3'
      * frame [#0](/bitcoin-bitcoin/0/): 0x000000018854e9b0 libsystem_kernel.dylib`read + 8
        frame [#1](/bitcoin-bitcoin/1/): 0x0000000105dfb2b0 bitcoind`subprocess::util::read_atmost_n(fp=0x00000001ed6ac330, buf="", read_upto=1024) at subprocess.h:437:24
        frame [#2](/bitcoin-bitcoin/2/): 0x0000000105e06444 bitcoind`subprocess::Popen::execute_process(this=0x000000016b5944e0) at subprocess.h:1250:24
        frame [#3](/bitcoin-bitcoin/3/): 0x0000000105e05b68 bitcoind`subprocess::Popen::Popen<subprocess::input, subprocess::output, subprocess::error, subprocess::close_fds>(this=0x000000016b5944e0, cmd_args="/opt/homebrew/opt/python@3.13/bin/python3.13 /Users/<user>/data/code/bitcoin/test/functional/mocks/signer.py enumerate", args=0x000000016b5944d8, args=0x000000016b5944d0, args=0x000000016b5944c4, args=0x000000016b5944c3) at subprocess.h:964:5
        frame [#4](/bitcoin-bitcoin/4/): 0x0000000105df8ac8 bitcoind`subprocess::Popen::Popen<subprocess::input, subprocess::output, subprocess::error, subprocess::close_fds>(this=0x000000016b5944e0, cmd_args="/opt/homebrew/opt/python@3.13/bin/python3.13 /Users/<user>/data/code/bitcoin/test/functional/mocks/signer.py enumerate", args=0x000000016b5944d8, args=0x000000016b5944d0, args=0x000000016b5944c4, args=0x000000016b5944c3) at subprocess.h:957:3
        frame [#5](/bitcoin-bitcoin/5/): 0x0000000105df8360 bitcoind`RunCommandParseJSON(str_command="/opt/homebrew/opt/python@3.13/bin/python3.13 /Users/<user>/data/code/bitcoin/test/functional/mocks/signer.py enumerate", str_std_in="") at run_command.cpp:27:14
        frame [#6](/bitcoin-bitcoin/6/): 0x0000000105e1e030 bitcoind`ExternalSigner::Enumerate(command="/opt/homebrew/opt/python@3.13/bin/python3.13 /Users/<user>/data/code/bitcoin/test/functional/mocks/signer.py", signers=size=0, chain="regtest") at external_signer.cpp:28:29
        frame [#7](/bitcoin-bitcoin/7/): 0x0000000105410f44 bitcoind`enumeratesigners()::$_0::operator()(this=0x000000016b5955b8, self=0x000000016b595598, request=0x000000016b5958b0) const at external_signer.cpp:51:17
        frame [#8](/bitcoin-bitcoin/8/): 0x0000000105410d20 bitcoind`decltype(std::declval<enumeratesigners()::$_0&>()(std::declval<RPCHelpMan const&>(), std::declval<JSONRPCRequest const&>())) std::__1::__invoke[abi:v160006]<enumeratesigners()::$_0&, RPCHelpMan const&, JSONRPCRequest const&>(__f=0x000000016b5955b8, __args=0x000000016b595598, __args=0x000000016b5958b0) at invoke.h:394:23
        frame [#9](/bitcoin-bitcoin/9/): 0x0000000105410c48 bitcoind`UniValue std::__1::__invoke_void_return_wrapper<UniValue, false>::__call<enumeratesigners()::$_0&, RPCHelpMan const&, JSONRPCRequest const&>(__args=0x000000016b5955b8, __args=0x000000016b595598, __args=0x000000016b5958b0) at invoke.h:478:16
        frame [#10](/bitcoin-bitcoin/10/): 0x0000000105410bd4 bitcoind`std::__1::__function::__alloc_func<enumeratesigners()::$_0, std::__1::allocator<enumeratesigners()::$_0>, UniValue (RPCHelpMan const&, JSONRPCRequest const&)>::operator()[abi:v160006](this=0x000000016b5955b8, __arg=0x000000016b595598, __arg=0x000000016b5958b0) at function.h:185:16
        frame [#11](/bitcoin-bitcoin/11/): 0x000000010540e58c bitcoind`std::__1::__function::__func<enumeratesigners()::$_0, std::__1::allocator<enumeratesigners()::$_0>, UniValue (RPCHelpMan const&, JSONRPCRequest const&)>::operator()(this=0x000000016b5955b0, __arg=0x000000016b595598, __arg=0x000000016b5958b0) at function.h:356:12
        frame [#12](/bitcoin-bitcoin/12/): 0x0000000105ece8dc bitcoind`std::__1::__function::__value_func<UniValue (RPCHelpMan const&, JSONRPCRequest const&)>::operator()[abi:v160006](this=0x000000016b5955b0, __args=0x000000016b595598, __args=0x000000016b5958b0) const at function.h:510:16
        frame [#13](/bitcoin-bitcoin/13/): 0x0000000105eb83dc bitcoind`std::__1::function<UniValue (RPCHelpMan const&, JSONRPCRequest const&)>::operator()(this= Lambda in File external_signer.cpp at Line 44, __arg=0x000000016b595598, __arg=0x000000016b5958b0) const at function.h:1156:12
        frame [#14](/bitcoin-bitcoin/14/): 0x0000000105eb6ed4 bitcoind`RPCHelpMan::HandleRequest(this=0x000000016b595598, request=0x000000016b5958b0) const at util.cpp:663:20
        frame [#15](/bitcoin-bitcoin/15/): 0x0000000105368b04 bitcoind`CRPCCommand::CRPCCommand(this=0x000000010642b4c8, request=0x000000016b5958b0, result=0x000000016b595e00, (null)=true)())::'lambda'(JSONRPCRequest const&, UniValue&, bool)::operator()(JSONRPCRequest const&, UniValue&, bool) const at server.h:101:91
        frame [#16](/bitcoin-bitcoin/16/): 0x0000000105368a6c bitcoind`decltype(std::declval<CRPCCommand::CRPCCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, RPCHelpMan (*)())::'lambda'(JSONRPCRequest const&, UniValue&, bool)&>()(std::declval<JSONRPCRequest const&>(), std::declval<UniValue&>(), std::declval<bool>())) std::__1::__invoke[abi:v160006]<CRPCCommand::CRPCCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, RPCHelpMan (*)())::'lambda'(JSONRPCRequest const&, UniValue&, bool)&, JSONRPCRequest const&, UniValue&, bool>(__f=0x000000010642b4c8, __args=0x000000016b5958b0, __args=0x000000016b595e00, __args=0x000000016b5957d7) at invoke.h:394:23
        frame [#17](/bitcoin-bitcoin/17/): 0x0000000105368980 bitcoind`bool std::__1::__invoke_void_return_wrapper<bool, false>::__call<CRPCCommand::CRPCCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, RPCHelpMan (*)())::'lambda'(JSONRPCRequest const&, UniValue&, bool)&, JSONRPCRequest const&, UniValue&, bool>(__args=0x000000010642b4c8, __args=0x000000016b5958b0, __args=0x000000016b595e00, __args=0x000000016b5957d7) at invoke.h:478:16
        frame [#18](/bitcoin-bitcoin/18/): 0x0000000105368900 bitcoind`std::__1::__function::__alloc_func<CRPCCommand::CRPCCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, RPCHelpMan (*)())::'lambda'(JSONRPCRequest const&, UniValue&, bool), std::__1::allocator<CRPCCommand::CRPCCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, RPCHelpMan (*)())::'lambda'(JSONRPCRequest const&, UniValue&, bool)>, bool (JSONRPCRequest const&, UniValue&, bool)>::operator()[abi:v160006](this=0x000000010642b4c8, __arg=0x000000016b5958b0, __arg=0x000000016b595e00, __arg=0x000000016b5957d7) at function.h:185:16
        frame [#19](/bitcoin-bitcoin/19/): 0x0000000105366298 bitcoind`std::__1::__function::__func<CRPCCommand::CRPCCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, RPCHelpMan (*)())::'lambda'(JSONRPCRequest const&, UniValue&, bool), std::__1::allocator<CRPCCommand::CRPCCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, RPCHelpMan (*)())::'lambda'(JSONRPCRequest const&, UniValue&, bool)>, bool (JSONRPCRequest const&, UniValue&, bool)>::operator()(this=0x000000010642b4c0, __arg=0x000000016b5958b0, __arg=0x000000016b595e00, __arg=0x000000016b5957d7) at function.h:356:12
        frame [#20](/bitcoin-bitcoin/20/): 0x00000001051a3f0c bitcoind`std::__1::__function::__value_func<bool (JSONRPCRequest const&, UniValue&, bool)>::operator()[abi:v160006](this=0x000000010642b4c0, __args=0x000000016b5958b0, __args=0x000000016b595e00, __args=0x000000016b5957d7) const at function.h:510:16
        frame [#21](/bitcoin-bitcoin/21/): 0x00000001051a3df0 bitcoind`std::__1::function<bool (JSONRPCRequest const&, UniValue&, bool)>::operator()(this= Function = enumeratesigners() , __arg=0x000000016b5958b0, __arg=0x000000016b595e00, __arg=true) const at function.h:1156:12
        frame [#22](/bitcoin-bitcoin/22/): 0x00000001055f51cc bitcoind`ExecuteCommand(command=0x000000010642b490, request=0x000000016b596b10, result=0x000000016b595e00, last_handler=true) at server.cpp:514:20
        frame [#23](/bitcoin-bitcoin/23/): 0x00000001055da63c bitcoind`ExecuteCommands(commands=size=1, request=0x000000016b596b10, result=0x000000016b595e00) at server.cpp:481:13
        frame [#24](/bitcoin-bitcoin/24/): 0x00000001055da378 bitcoind`CRPCTable::execute(this=0x0000000106431b88, request=0x000000016b596b10) const at server.cpp:501:13
        frame [#25](/bitcoin-bitcoin/25/): 0x00000001055d9e38 bitcoind`JSONRPCExec(jreq=0x000000016b596b10, catch_errors=true) at server.cpp:357:31
        frame [#26](/bitcoin-bitcoin/26/): 0x0000000104ce4b10 bitcoind`HTTPReq_JSONRPC(context=0x0000600001be2ac8, req=0x00006000015e2e40) at httprpc.cpp:204:21
        frame [#27](/bitcoin-bitcoin/27/): 0x0000000104ce42c4 bitcoind`StartHTTPRPC(std::__1::any const&)::$_0::operator()(this=0x0000600001be2ac8, req=0x00006000015e2e40, (null)="") const at httprpc.cpp:376:80
        frame [#28](/bitcoin-bitcoin/28/): 0x0000000104ce4250 bitcoind`decltype(std::declval<StartHTTPRPC(std::__1::any const&)::$_0&>()(std::declval<HTTPRequest*>(), std::declval<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&>())) std::__1::__invoke[abi:v160006]<StartHTTPRPC(std::__1::any const&)::$_0&, HTTPRequest*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&>(__f=0x0000600001be2ac8, __args=0x000000016b596e10, __args="") at invoke.h:394:23
        frame [#29](/bitcoin-bitcoin/29/): 0x0000000104ce4170 bitcoind`bool std::__1::__invoke_void_return_wrapper<bool, false>::__call<StartHTTPRPC(std::__1::any const&)::$_0&, HTTPRequest*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&>(__args=0x0000600001be2ac8, __args=0x000000016b596e10, __args="") at invoke.h:478:16
        frame [#30](/bitcoin-bitcoin/30/): 0x0000000104ce40f8 bitcoind`std::__1::__function::__alloc_func<StartHTTPRPC(std::__1::any const&)::$_0, std::__1::allocator<StartHTTPRPC(std::__1::any const&)::$_0>, bool (HTTPRequest*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)>::operator()[abi:v160006](this=0x0000600001be2ac8, __arg=0x000000016b596e10, __arg="") at function.h:185:16
        frame [#31](/bitcoin-bitcoin/31/): 0x0000000104ce1adc bitcoind`std::__1::__function::__func<StartHTTPRPC(std::__1::any const&)::$_0, std::__1::allocator<StartHTTPRPC(std::__1::any const&)::$_0>, bool (HTTPRequest*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)>::operator()(this=0x0000600001be2ac0, __arg=0x000000016b596e10, __arg="") at function.h:356:12
        frame [#32](/bitcoin-bitcoin/32/): 0x0000000104d04198 bitcoind`std::__1::__function::__value_func<bool (HTTPRequest*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)>::operator()[abi:v160006](this=0x00006000036e6878, __args=0x000000016b596e10, __args="") const at function.h:510:16
        frame [#33](/bitcoin-bitcoin/33/): 0x0000000104d0408c bitcoind`std::__1::function<bool (HTTPRequest*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)>::operator()(this= Lambda in File httprpc.cpp at Line 376, __arg=0x00006000015e2e40, __arg="") const at function.h:1156:12
        frame [#34](/bitcoin-bitcoin/34/): 0x0000000104d03cf0 bitcoind`HTTPWorkItem::operator()(this=0x00006000036e6850) at httpserver.cpp:62:9
        frame [#35](/bitcoin-bitcoin/35/): 0x0000000104d17560 bitcoind`WorkQueue<HTTPClosure>::Run(this=0x00006000022e4000) at httpserver.cpp:117:13
        frame [#36](/bitcoin-bitcoin/36/): 0x0000000104cf099c bitcoind`HTTPWorkQueueRun(queue=0x00006000022e4000, worker_num=3) at httpserver.cpp:419:12
        frame [#37](/bitcoin-bitcoin/37/): 0x0000000104d14498 bitcoind`decltype(std::declval<void (*)(WorkQueue<HTTPClosure>*, int)>()(std::declval<WorkQueue<HTTPClosure>*>(), std::declval<int>())) std::__1::__invoke[abi:v160006]<void (*)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*, int>(__f=0x00006000015e81c8, __args=0x00006000015e81d0, __args=0x00006000015e81d8) at invoke.h:394:23
        frame [#38](/bitcoin-bitcoin/38/): 0x0000000104d14354 bitcoind`void std::__1::__thread_execute[abi:v160006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*, int, 2ul, 3ul>(__t=size=4, (null)=__tuple_indices<2UL, 3UL> @ 0x000000016b596f77) at thread:282:5
        frame [#39](/bitcoin-bitcoin/39/): 0x0000000104d13310 bitcoind`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*, int>>(__vp=0x00006000015e81c0) at thread:293:5
        frame [#40](/bitcoin-bitcoin/40/): 0x000000018858f2e4 libsystem_pthread.dylib`_pthread_start + 136
    (lldb) 
    
    # the signer.py process likely hangs on buffer = sys.stdin.read()
    (lldb) process attach --pid 31835
    Process 31835 stopped
    * thread [#1](/bitcoin-bitcoin/1/), queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
        frame [#0](/bitcoin-bitcoin/0/): 0x000000018854e9b0 libsystem_kernel.dylib`read + 8
    libsystem_kernel.dylib`read:
    ->  0x18854e9b0 <+8>:  b.lo   0x18854e9d0               ; <+40>
        0x18854e9b4 <+12>: pacibsp 
        0x18854e9b8 <+16>: stp    x29, x30, [sp, #-0x10]!
        0x18854e9bc <+20>: mov    x29, sp
    Executable module set to "/opt/homebrew/Cellar/python@3.13/3.13.3_1/Frameworks/Python.framework/Versions/3.13/Resources/Python.app/Contents/MacOS/Python".
    Architecture set to: arm64-apple-macosx-.
    (lldb) bt
    * thread [#1](/bitcoin-bitcoin/1/), queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
      * frame [#0](/bitcoin-bitcoin/0/): 0x000000018854e9b0 libsystem_kernel.dylib`read + 8
        frame [#1](/bitcoin-bitcoin/1/): 0x0000000102bc9034 Python`_Py_read + 108
        frame [#2](/bitcoin-bitcoin/2/): 0x0000000102bf6440 Python`_io_FileIO_readall_impl + 304
        frame [#3](/bitcoin-bitcoin/3/): 0x0000000102a6cac8 Python`cfunction_vectorcall_NOARGS + 100
        frame [#4](/bitcoin-bitcoin/4/): 0x0000000102bfc290 Python`_io__Buffered_read + 736
        frame [#5](/bitcoin-bitcoin/5/): 0x0000000102a1c514 Python`method_vectorcall_FASTCALL + 108
        frame [#6](/bitcoin-bitcoin/6/): 0x0000000102a119b8 Python`PyObject_VectorcallMethod + 148
        frame [#7](/bitcoin-bitcoin/7/): 0x0000000102c058d0 Python`_io_TextIOWrapper_read + 192
        frame [#8](/bitcoin-bitcoin/8/): 0x0000000102a1c514 Python`method_vectorcall_FASTCALL + 108
        frame [#9](/bitcoin-bitcoin/9/): 0x0000000102a103a0 Python`PyObject_Vectorcall + 92
        frame [#10](/bitcoin-bitcoin/10/): 0x0000000102b387d0 Python`_PyEval_EvalFrameDefault + 6876
        frame [#11](/bitcoin-bitcoin/11/): 0x0000000102b36a78 Python`PyEval_EvalCode + 200
        frame [#12](/bitcoin-bitcoin/12/): 0x0000000102ba7078 Python`run_eval_code_obj + 104
        frame [#13](/bitcoin-bitcoin/13/): 0x0000000102ba6ac0 Python`run_mod + 168
        frame [#14](/bitcoin-bitcoin/14/): 0x0000000102ba51a0 Python`pyrun_file + 164
        frame [#15](/bitcoin-bitcoin/15/): 0x0000000102ba44dc Python`_PyRun_SimpleFileObject + 256
        frame [#16](/bitcoin-bitcoin/16/): 0x0000000102ba4114 Python`_PyRun_AnyFileObject + 80
        frame [#17](/bitcoin-bitcoin/17/): 0x0000000102bcba2c Python`pymain_run_file_obj + 164
        frame [#18](/bitcoin-bitcoin/18/): 0x0000000102bcb6d8 Python`pymain_run_file + 76
        frame [#19](/bitcoin-bitcoin/19/): 0x0000000102bca948 Python`Py_RunMain + 968
        frame [#20](/bitcoin-bitcoin/20/): 0x0000000102bcaf44 Python`pymain_main + 304
        frame [#21](/bitcoin-bitcoin/21/): 0x0000000102bcafe4 Python`Py_BytesMain + 40
        frame [#22](/bitcoin-bitcoin/22/): 0x000000018820c274 dyld`start + 2840
    (lldb) 
    
    
    
  40. pablomartin4btc commented at 12:45 PM on July 23, 2025: member
  41. tnndbtc commented at 11:46 PM on July 23, 2025: none

    Since it happened again, should we consider adding more error/exception handlings on subprocess_close? Currently we assume it's always successful, but it may return -1 as error code.

  42. Sjors referenced this in commit d25de7e249 on Jul 25, 2025
  43. Sjors commented at 8:20 AM on July 25, 2025: member

    @tnndbtc I opened #33061 to do this, though this may be problematic.

  44. Sjors referenced this in commit 4911de32e5 on Jul 25, 2025
  45. fanquake closed this on Jul 26, 2025

  46. fanquake referenced this in commit 75a5c8258e on Jul 26, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-22 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me