intermittent issue in rpc_signer.py (enumeratesigners timeout) under GLIBCXX debug mode #32524

issue maflcko openend this issue on May 16, 2025
  1. maflcko commented at 6:36 am on May 16, 2025: member

    https://cirrus-ci.com/task/5094568045051904?logs=ci#L4419

    0[18:26:32.573] test_framework.authproxy.JSONRPCException: 'enumeratesigners' RPC took longer than 1200.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)
    

    Looking at the full log, the test is just idle and then times out. However, there is no indication why the RPC does not proceed.

  2. maflcko added the label CI failed on May 16, 2025
  3. maflcko commented at 10:04 am on May 16, 2025: member
  4. maflcko commented at 10:46 am on May 16, 2025: member

    Same https://cirrus-ci.com/task/4561849141690368?logs=ci#L6059

    So I guess this was recently introduced

  5. fanquake commented at 10:49 am on May 16, 2025: member
  6. laanwj commented at 11:12 am on May 16, 2025: member

    If it’s #32343 i don’t understand why. It was changed so that the brute force fd-closing path isn’t even compiled on Linux.

    No idea what could even take so long, unless it’s an obscure bug in the C++ library like fs::directory_iterator looping forever.

  7. laanwj commented at 11:37 am on May 16, 2025: member

    It would be reasonaly useful to if it logged a full backtrace of bitcoind (gdb thread apply all bt) in these case of timeouts.

    Though, i’m not sure that’s enough to debug this. If this is what we think that it is, this is already happening in the forked process, after all. This similarly makes it difficult to add debug logging.

    Edit: i’ll make an implementation using bare readdir instead of the fs_iterator, and open a PR. So we can see if that fixes it.

  8. maflcko commented at 8:46 am on May 17, 2025: member
    I’ve created 50 runs and they all passed: https://cirrus-ci.com/task/5325312260571136 So I guess this is related. I’ll try to get the gdb bt next.
  9. maflcko commented at 8:54 am on May 17, 2025: member
    In the meantime, I place my bet that this is a gcc-11 upstream bug 🤞
  10. romanz commented at 12:49 pm on May 17, 2025: contributor
  11. maflcko commented at 12:39 pm on May 18, 2025: member

    The bt is:

     0Thread 9 (Thread 0x7f0973400640 (LWP 21341) "b-httpworker.5"):
     1[#0](/bitcoin-bitcoin/0/)  __GI___libc_read (nbytes=1024, buf=0x7f09733fb7f0, fd=30) at ../sysdeps/unix/sysv/linux/read.c:26
     2[#1](/bitcoin-bitcoin/1/)  __GI___libc_read (fd=30, buf=0x7f09733fb7f0, nbytes=1024) at ../sysdeps/unix/sysv/linux/read.c:24
     3[#2](/bitcoin-bitcoin/2/)  0x0000564d74731ba3 in subprocess::Popen::execute_process() ()
     4[#3](/bitcoin-bitcoin/3/)  0x0000564d74732b44 in subprocess::Popen::Popen<subprocess::input, subprocess::output, subprocess::error, subprocess::close_fds>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, subprocess::input&&, subprocess::output&&, subprocess::error&&, subprocess::close_fds&&) ()
     5[#4](/bitcoin-bitcoin/4/)  0x0000564d747299f5 in RunCommandParseJSON(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
     6[#5](/bitcoin-bitcoin/5/)  0x0000564d7457642c in ExternalSigner::Enumerate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__debug::vector<ExternalSigner, std::allocator<ExternalSigner> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
     7[#6](/bitcoin-bitcoin/6/)  0x0000564d73e7f8e5 in enumeratesigners()::{lambda(RPCHelpMan const&, JSONRPCRequest const&)#1}::operator()(RPCHelpMan const&, JSONRPCRequest const&) const [clone .constprop.0] ()
     8[#7](/bitcoin-bitcoin/7/)  0x0000564d73e815d2 in std::_Function_handler<UniValue (RPCHelpMan const&, JSONRPCRequest const&), enumeratesigners()::{lambda(RPCHelpMan const&, JSONRPCRequest const&)#1}>::_M_invoke(std::_Any_data const&, RPCHelpMan const&, JSONRPCRequest const&) ()
     9[#8](/bitcoin-bitcoin/8/)  0x0000564d74630660 in RPCHelpMan::HandleRequest(JSONRPCRequest const&) const ()
    10[#9](/bitcoin-bitcoin/9/)  0x0000564d73e6334e in CRPCCommand::CRPCCommand(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, RPCHelpMan (*)())::{lambda(JSONRPCRequest const&, UniValue&, bool)#1}::operator()(JSONRPCRequest const&, UniValue&, bool) const [clone .isra.0] ()
    11[#10](/bitcoin-bitcoin/10/) 0x0000564d73f80826 in ExecuteCommands(std::__debug::vector<CRPCCommand const*, std::allocator<CRPCCommand const*> > const&, JSONRPCRequest const&, UniValue&) ()
    12[#11](/bitcoin-bitcoin/11/) 0x0000564d73f80e4d in CRPCTable::execute(JSONRPCRequest const&) const ()
    13[#12](/bitcoin-bitcoin/12/) 0x0000564d73f81031 in JSONRPCExec(JSONRPCRequest const&, bool) ()
    14[#13](/bitcoin-bitcoin/13/) 0x0000564d740fbb02 in HTTPReq_JSONRPC(std::any const&, HTTPRequest*) ()
    15[#14](/bitcoin-bitcoin/14/) 0x0000564d7411405d in WorkQueue<HTTPClosure>::Run() ()
    16[#15](/bitcoin-bitcoin/15/) 0x0000564d7410ca38 in HTTPWorkQueueRun(WorkQueue<HTTPClosure>*, int) ()
    17[#16](/bitcoin-bitcoin/16/) 0x00007f097c4cf253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
    18[#17](/bitcoin-bitcoin/17/) 0x00007f097c157ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
    19[#18](/bitcoin-bitcoin/18/) 0x00007f097c1e9850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
    
  12. maflcko commented at 6:25 am on May 19, 2025: member
    Whatever the issue is, it may be fixed in gcc-14. At least https://cirrus-ci.com/task/5477938151292928 passed 25 times. (The prior commit with gcc-12 failed at least once: https://cirrus-ci.com/task/5645419964792832?logs=ci#L4122). Gcc-13 also failed at least once: https://cirrus-ci.com/task/4577002323181568?logs=ci#L690
  13. maflcko commented at 7:21 am on May 19, 2025: member

    Backtrace with file names and lines:

     0Thread 33 (Thread 0x7f9d93e006c0 (LWP 18294) "b-httpworker.2"):
     1[#0](/bitcoin-bitcoin/0/)  0x00007f9da1dfca9a in read () from /lib/x86_64-linux-gnu/libc.so.6
     2[#1](/bitcoin-bitcoin/1/)  0x000055f3d4633e22 in read (__nbytes=1024, __buf=0x7f9d93dfb030, __fd=30) at /usr/include/x86_64-linux-gnu/bits/unistd.h:28
     3[#2](/bitcoin-bitcoin/2/)  subprocess::util::read_atmost_n (read_upto=1024, buf=0x7f9d93dfb030 "", fp=0x7f9d8c0036c0) at ./util/subprocess.h:437
     4[#3](/bitcoin-bitcoin/3/)  subprocess::Popen::execute_process (this=this@entry=0x7f9d93dfb8b0) at ./util/subprocess.h:1242
     5[#4](/bitcoin-bitcoin/4/)  0x000055f3d4634bfc in subprocess::Popen::Popen<subprocess::input, subprocess::output, subprocess::error, subprocess::close_fds> (this=0x7f9d93dfb8b0, cmd_args="/usr/bin/python3 /ci_container_base/test/functional/mocks/signer.py enumerate") at ./util/subprocess.h:964
     6[#5](/bitcoin-bitcoin/5/)  0x000055f3d462a9dc in RunCommandParseJSON (str_command="/usr/bin/python3 /ci_container_base/test/functional/mocks/signer.py enumerate", str_std_in="") at ./common/run_command.cpp:27
     7[#6](/bitcoin-bitcoin/6/)  0x000055f3d44717bc in ExternalSigner::Enumerate (command="/usr/bin/python3 /ci_container_base/test/functional/mocks/signer.py", signers=std::__debug::vector of length 0, capacity 0, chain="regtest") at /usr/include/c++/13/bits/allocator.h:184
     8[#7](/bitcoin-bitcoin/7/)  0x000055f3d3d78a55 in operator() (request=..., self=..., __closure=<optimized out>) at /usr/include/c++/13/bits/basic_string.tcc:242
     9[#8](/bitcoin-bitcoin/8/)  0x000055f3d3d7a0c1 in std::__invoke_impl<UniValue, enumeratesigners()::<lambda(const RPCHelpMan&, const JSONRPCRequest&)>&, const RPCHelpMan&, const JSONRPCRequest&> (__f=...) at /usr/include/c++/13/bits/invoke.h:60
    10[#9](/bitcoin-bitcoin/9/)  std::__invoke_r<UniValue, enumeratesigners()::<lambda(const RPCHelpMan&, const JSONRPCRequest&)>&, const RPCHelpMan&, const JSONRPCRequest&> (__fn=...) at /usr/include/c++/13/bits/invoke.h:116
    11[#10](/bitcoin-bitcoin/10/) std::_Function_handler<UniValue(const RPCHelpMan&, const JSONRPCRequest&), enumeratesigners()::<lambda(const RPCHelpMan&, const JSONRPCRequest&)> >::_M_invoke(const std::_Any_data &, const RPCHelpMan &, const JSONRPCRequest &) (__functor=..., __args#0=..., __args#1=...) at /usr/include/c++/13/bits/std_function.h:291
    12[#11](/bitcoin-bitcoin/11/) 0x000055f3d4530306 in std::function<UniValue (RPCHelpMan const&, JSONRPCRequest const&)>::operator()(RPCHelpMan const&, JSONRPCRequest const&) const (__args#1=..., __args#0=..., this=0x7f9d93dfd110) at /usr/include/c++/13/bits/std_function.h:591
    13[#12](/bitcoin-bitcoin/12/) RPCHelpMan::HandleRequest (this=this@entry=0x7f9d93dfd0f0, request=...) at ./rpc/util.cpp:663
    14[#13](/bitcoin-bitcoin/13/) 0x000055f3d3d5d2fb in CRPCCommand::CRPCCommand(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, RPCHelpMan (*)())::{lambda(JSONRPCRequest const&, UniValue&, bool)#1}::operator()(JSONRPCRequest const&, UniValue&, bool) const (request=..., result=..., __closure=<optimized out>) at ./rpc/server.h:101
    15[#14](/bitcoin-bitcoin/14/) 0x000055f3d3e6ed87 in std::function<bool (JSONRPCRequest const&, UniValue&, bool)>::operator()(JSONRPCRequest const&, UniValue&, bool) const (__args#2=<optimized out>, __args#1=..., __args#0=..., this=0x55f3d50aee20 <RegisterSignerRPCCommands(CRPCTable&)::commands+64>) at /usr/include/c++/13/bits/std_function.h:591
    16[#15](/bitcoin-bitcoin/15/) ExecuteCommand (last_handler=true, result=..., request=..., command=...) at ./rpc/server.cpp:514
    17[#16](/bitcoin-bitcoin/16/) ExecuteCommands (commands=std::__debug::vector of length 1, capacity 1 = {...}, request=..., result=...) at ./rpc/server.cpp:481
    18[#17](/bitcoin-bitcoin/17/) 0x000055f3d3e6fc17 in CRPCTable::execute (this=this@entry=0x55f3d50b1f80 <tableRPC>, request=...) at ./rpc/server.cpp:501
    19[#18](/bitcoin-bitcoin/18/) 0x000055f3d3e6ffdb in JSONRPCExec (jreq=..., catch_errors=<optimized out>) at ./rpc/server.cpp:357
    20[#19](/bitcoin-bitcoin/19/) 0x000055f3d3ffddd2 in HTTPReq_JSONRPC (context=..., req=0x7f9d30001ca0) at ./httprpc.cpp:217
    21[#20](/bitcoin-bitcoin/20/) 0x000055f3d4014451 in std::function<bool (HTTPRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>::operator()(HTTPRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const (__args#1="", __args#0=<optimized out>, this=0x7f9d300053c0) at /usr/include/c++/13/bits/std_function.h:591
    22[#21](/bitcoin-bitcoin/21/) HTTPWorkItem::operator() (this=0x7f9d30005390) at ./httpserver.cpp:60
    23[#22](/bitcoin-bitcoin/22/) WorkQueue<HTTPClosure>::Run (this=this@entry=0x55f40ede4b10) at ./httpserver.cpp:115
    24[#23](/bitcoin-bitcoin/23/) 0x000055f3d400bb37 in HTTPWorkQueueRun (queue=0x55f40ede4b10, worker_num=<optimized out>) at ./httpserver.cpp:417
    25[#24](/bitcoin-bitcoin/24/) 0x00007f9da20f6db4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
    26[#25](/bitcoin-bitcoin/25/) 0x00007f9da1d7daa4 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    27[#26](/bitcoin-bitcoin/26/) 0x00007f9da1e0ac3c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    
  14. laanwj commented at 8:33 am on May 19, 2025: member
    That backtrace is exactly what i’d expected. It’s the parent process waiting for the forked child there.
  15. maflcko commented at 10:00 am on May 19, 2025: member

    So far it looks like this can only happen with gcc-13 (or older) and GLIBCXX debug mode. I don’t think any real user is running in production with the std lib debug mode, so an alternative would be to just fixup the CI?

    For reference, the end-user setting -D_GLIBCXX_ASSERTIONS passed 25 times: https://cirrus-ci.com/task/5739615073599488

  16. laanwj commented at 10:22 am on May 19, 2025: member

    so an alternative would be to just fixup the CI?

    Sure. If this is unlikely to cause issues for real users, that makes more sense, the point of the CI is not to keep us patching the code to make the CI happy 😄

  17. maflcko commented at 10:58 am on May 19, 2025: member

    I’d say finding compiler bugs is a valid use-case of the CI, but this one seems to be fixed already, so there is probably not much value in bisecting the exact gcc commit, especially given that it would be extremely tedious without a minimal non-intermittent reproducer.

    (The time would be better spent on reducing other compiler bugs, such as #32276 (comment))

  18. laanwj commented at 12:02 pm on May 19, 2025: member
    Yes, bisecting it would be neigh-impossible, given that it seems impossible to reliably reproduce (one thing that’s strange is that it never happens in the unit tests, which exercise RunCommandParseJSON). i already looked into the git history of directory_iterator in the gcc/libstdc++-v3 and wasn’t able to find any bug fix that would explain the behavior here, or how it was fixed. So it’s likely something indirect.
  19. maflcko commented at 12:15 pm on May 19, 2025: member
    The debug mode keeps state in additional structs/fields, so my bet would be that one of the states is multi-thread unsafe, but that is just a wild guess.
  20. fanquake closed this on May 20, 2025

  21. fanquake referenced this in commit af65fd1a33 on May 20, 2025
  22. maflcko commented at 11:17 am on May 20, 2025: member

    It even happens on gcc-14, it seems:

    https://cirrus-ci.com/task/5476449961902080?logs=ci#L5506

  23. maflcko reopened this on May 20, 2025

  24. maflcko renamed this:
    intermittent issue in rpc_signer.py (enumeratesigners timeout)
    intermittent issue in rpc_signer.py (enumeratesigners timeout) under GLIBCXX debug mode
    on May 20, 2025
  25. fanquake added this to the milestone 30.0 on May 20, 2025
  26. maflcko commented at 12:54 pm on May 20, 2025: member
    What I don’t understand is why the CentOS task seemingly passed before, but started failing after https://github.com/bitcoin/bitcoin/commit/af65fd1a333011137dafd3df9a51704fd319feb4, even though the task wasn’t modified apart from a comment.
  27. fanquake commented at 2:15 pm on May 21, 2025: member
  28. l0rinc commented at 9:03 am on May 22, 2025: contributor

    Another one in https://github.com/bitcoin/bitcoin/pull/32404/checks?check_run_id=42637298331 - retriggering the build makes it pass.

     0[13:30:22.121] node0 2025-05-21T17:30:07.606048Z [msghand] [logging/timer.h:58] [Log] [lock] Enter: lock contention connman.m_nodes_mutex, ./net.h:1648 started
     1[13:30:22.121] node0 2025-05-21T17:30:07.606205Z [msghand] [logging/timer.h:58] [Log] [lock] Enter: lock contention connman.m_nodes_mutex, ./net.h:1648 completed (2μs)
     2[13:30:22.121] node0 2025-05-21T17:30:21.228925Z [msghand] [net_processing.cpp:3449] [ProcessMessage] [net] received: ping (8 bytes) peer=0
     3[13:30:22.121] node0 2025-05-21T17:30:21.229000Z [msghand] [net.cpp:3870] [PushMessage] [net] sending pong (8 bytes) peer=0
     4[13:30:22.121] node0 2025-05-21T17:30:21.229120Z [msghand] [net.cpp:3870] [PushMessage] [net] sending ping (8 bytes) peer=0
     5[13:30:22.121] node0 2025-05-21T17:30:21.229620Z [msghand] [net_processing.cpp:3449] [ProcessMessage] [net] received: pong (8 bytes) peer=0
     6[13:30:22.121] node0 2025-05-21T17:30:21.329837Z [net] [logging/timer.h:58] [Log] [lock] Enter: lock contention m_nodes_mutex, ./net.cpp:1906 started
     7[13:30:22.121] node0 2025-05-21T17:30:21.329888Z [net] [logging/timer.h:58] [Log] [lock] Enter: lock contention m_nodes_mutex, ./net.cpp:1906 completed (1μs)
     8[13:30:22.121] test  2025-05-21T17:30:21.429000Z TestFramework (ERROR): Assertion failed
     9[13:30:22.121] Traceback (most recent call last):
    10[13:30:22.121] File "/ci_container_base/test/functional/test_framework/authproxy.py", line 169, in _get_response
    11[13:30:22.121] http_response = self.__conn.getresponse()
    12[13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^
    13[13:30:22.121] File "/usr/lib64/python3.12/http/client.py", line 1430, in getresponse
    14[13:30:22.121] response.begin()
    15[13:30:22.121] File "/usr/lib64/python3.12/http/client.py", line 331, in begin
    16[13:30:22.121] version, status, reason = self._read_status()
    17[13:30:22.121] ^^^^^^^^^^^^^^^^^^^
    18[13:30:22.121] File "/usr/lib64/python3.12/http/client.py", line 292, in _read_status
    19[13:30:22.121] line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    20[13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    21[13:30:22.121] File "/usr/lib64/python3.12/socket.py", line 720, in readinto
    22[13:30:22.121] return self._sock.recv_into(b)
    23[13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^
    24[13:30:22.121] TimeoutError: timed out
    25[13:30:22.121] During handling of the above exception, another exception occurred:
    26[13:30:22.121] Traceback (most recent call last):
    27[13:30:22.121] File "/ci_container_base/test/functional/test_framework/util.py", line 164, in try_rpc
    28[13:30:22.121] fun(*args, **kwds)
    29[13:30:22.121] File "/ci_container_base/test/functional/test_framework/coverage.py", line 50, in __call__
    30[13:30:22.121] return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
    31[13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    32[13:30:22.121] File "/ci_container_base/test/functional/test_framework/authproxy.py", line 132, in __call__
    33[13:30:22.121] response, status = self._request('POST', self.__url.path, postdata.encode('utf-8'))
    34[13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    35[13:30:22.121] File "/ci_container_base/test/functional/test_framework/authproxy.py", line 106, in _request
    36[13:30:22.121] return self._get_response()
    37[13:30:22.121] ^^^^^^^^^^^^^^^^^^^^
    38[13:30:22.121] File "/ci_container_base/test/functional/test_framework/authproxy.py", line 171, in _get_response
    39[13:30:22.121] raise JSONRPCException({
    40[13:30:22.121] test_framework.authproxy.JSONRPCException: 'enumeratesigners' RPC took longer than 1200.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)
    41[13:30:22.121] During handling of the above exception, another exception occurred:
    42[13:30:22.121] Traceback (most recent call last):
    43[13:30:22.121] File "/ci_container_base/test/functional/test_framework/test_framework.py", line 184, in main
    44[13:30:22.121] self.run_test()
    45[13:30:22.121] File "/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/test/functional/rpc_signer.py", line 54, in run_test
    46[13:30:22.121] assert_raises_rpc_error(
    47[13:30:22.121] File "/ci_container_base/test/functional/test_framework/util.py", line 155, in assert_raises_rpc_error
    48[13:30:22.121] assert try_rpc(code, message, fun, *args, **kwds), "No exception raised"
    49[13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    50[13:30:22.121] File "/ci_container_base/test/functional/test_framework/util.py", line 168, in try_rpc
    51[13:30:22.121] raise AssertionError("Unexpected JSONRPC error code %i" % e.error["code"])
    52[13:30:22.121] AssertionError: Unexpected JSONRPC error code -344
    53[13:30:22.121] test  2025-05-21T17:30:21.437000Z TestFramework (DEBUG): Closing down network thread
    54[13:30:22.121] test  2025-05-21T17:30:21.488000Z TestFramework (INFO): Not stopping nodes as test failed. The dangling processes will be cleaned up later.
    55[13:30:22.121] test  2025-05-21T17:30:21.488000Z TestFramework (WARNING): Not cleaning up dir /ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250521_130522/rpc_signer_216
    56[13:30:22.121] test  2025-05-21T17:30:21.488000Z TestFramework (ERROR): Test failed. Test logging available at /ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250521_130522/rpc_signer_216/test_framework.log
    57[13:30:22.121] test  2025-05-21T17:30:21.489000Z TestFramework (ERROR):
    58[13:30:22.121] test  2025-05-21T17:30:21.489000Z TestFramework (ERROR): Hint: Call /ci_container_base/test/functional/combine_logs.py '/ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250521_130522/rpc_signer_216' to consolidate all logs
    59[13:30:22.121] test  2025-05-21T17:30:21.489000Z TestFramework (ERROR):
    60[13:30:22.121] test  2025-05-21T17:30:21.489000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
    61[13:30:22.121] test  2025-05-21T17:30:21.489000Z TestFramework (ERROR): https://github.com/bitcoin/bitcoin/issues
    62[13:30:22.121] test  2025-05-21T17:30:21.489000Z TestFramework (ERROR):
    
  29. maflcko commented at 11:39 am on May 22, 2025: member

    I am out of ideas on how to proceed here. I tried this again on Ubuntu and the gcc-13 failed frequently (https://cirrus-ci.com/task/4846423977492480), whereas the previous commit with gcc-14 passed 100 times (https://cirrus-ci.com/task/5508081775280128)

  30. maflcko referenced this in commit a476e94689 on May 22, 2025
  31. maflcko referenced this in commit fa079538e3 on May 22, 2025
  32. fanquake closed this on May 22, 2025

  33. fanquake referenced this in commit 2df824f4e6 on May 22, 2025
  34. maflcko commented at 6:10 am on May 23, 2025: member
    This is closed, but the failure remains, when triggered, so it would be good to investigate this long-term at some point. Discussion can continue here, or in a new issue.
  35. fanquake reopened this on May 23, 2025

  36. maflcko added the label Brainstorming on May 23, 2025
  37. maflcko commented at 8:18 am on May 23, 2025: member
    Another data point: I haven’t see a failure so far in my G++ snapshot DEBUG=1 nightly run: https://github.com/maflcko/b-c-nightly/actions/runs/15153825944/job/42604725571
  38. maflcko commented at 3:56 pm on May 23, 2025: member
    The 32-bit GLIBCXX debug run (gcc-13) is also passing fine: https://cirrus-ci.com/task/6401825881980928?logs=ci#L3769
  39. tnndbtc commented at 0:13 am on July 15, 2025: none

    I assume the general code logic should be correct, but likely some exception/run-time error occurred then caused the intermittent issue.

    So, I studied Popen::execute_process() in src/util/subprocess.h, one interesting statement caught my attention:

    0    subprocess_close(err_wr_pipe);// close child side of pipe, else get stuck in read below
    1    stream_.close_child_fds();
    

    I notice that we didn’t check the return code of the calling of subprocess_close()/close(), in many cases. Indeed, close() is a very safe call in general, but what if close() returns -1, but we didn’t check it?

    So, I hacked the subprocess.h by not calling subprocess_close(err_wr_pipe); on purpose in my macOS. Then I kind of reproduced the issue, at least it has very similar call stack on bitcoind, and on signer.py, it likely hangs on buffer = sys.stdin.read()

    Not sure if this helps on the debugging process, but just want to share a thought. Maybe we should handle the return code of subprocess_close() to provide more hints when next it reproduces?

     0# the bitcoind process hangs on subprocess::util::read_atmost_n
     1(lldb) bt
     2* thread [#7](/bitcoin-bitcoin/7/), name = 'b-httpworker.3'
     3  * frame [#0](/bitcoin-bitcoin/0/): 0x000000018854e9b0 libsystem_kernel.dylib`read + 8
     4    frame [#1](/bitcoin-bitcoin/1/): 0x0000000105dfb2b0 bitcoind`subprocess::util::read_atmost_n(fp=0x00000001ed6ac330, buf="", read_upto=1024) at subprocess.h:437:24
     5    frame [#2](/bitcoin-bitcoin/2/): 0x0000000105e06444 bitcoind`subprocess::Popen::execute_process(this=0x000000016b5944e0) at subprocess.h:1250:24
     6    frame [#3](/bitcoin-bitcoin/3/): 0x0000000105e05b68 bitcoind`subprocess::Popen::Popen<subprocess::input, subprocess::output, subprocess::error, subprocess::close_fds>(this=0x000000016b5944e0, cmd_args="/opt/homebrew/opt/python@3.13/bin/python3.13 /Users/<user>/data/code/bitcoin/test/functional/mocks/signer.py enumerate", args=0x000000016b5944d8, args=0x000000016b5944d0, args=0x000000016b5944c4, args=0x000000016b5944c3) at subprocess.h:964:5
     7    frame [#4](/bitcoin-bitcoin/4/): 0x0000000105df8ac8 bitcoind`subprocess::Popen::Popen<subprocess::input, subprocess::output, subprocess::error, subprocess::close_fds>(this=0x000000016b5944e0, cmd_args="/opt/homebrew/opt/python@3.13/bin/python3.13 /Users/<user>/data/code/bitcoin/test/functional/mocks/signer.py enumerate", args=0x000000016b5944d8, args=0x000000016b5944d0, args=0x000000016b5944c4, args=0x000000016b5944c3) at subprocess.h:957:3
     8    frame [#5](/bitcoin-bitcoin/5/): 0x0000000105df8360 bitcoind`RunCommandParseJSON(str_command="/opt/homebrew/opt/python@3.13/bin/python3.13 /Users/<user>/data/code/bitcoin/test/functional/mocks/signer.py enumerate", str_std_in="") at run_command.cpp:27:14
     9    frame [#6](/bitcoin-bitcoin/6/): 0x0000000105e1e030 bitcoind`ExternalSigner::Enumerate(command="/opt/homebrew/opt/python@3.13/bin/python3.13 /Users/<user>/data/code/bitcoin/test/functional/mocks/signer.py", signers=size=0, chain="regtest") at external_signer.cpp:28:29
    10    frame [#7](/bitcoin-bitcoin/7/): 0x0000000105410f44 bitcoind`enumeratesigners()::$_0::operator()(this=0x000000016b5955b8, self=0x000000016b595598, request=0x000000016b5958b0) const at external_signer.cpp:51:17
    11    frame [#8](/bitcoin-bitcoin/8/): 0x0000000105410d20 bitcoind`decltype(std::declval<enumeratesigners()::$_0&>()(std::declval<RPCHelpMan const&>(), std::declval<JSONRPCRequest const&>())) std::__1::__invoke[abi:v160006]<enumeratesigners()::$_0&, RPCHelpMan const&, JSONRPCRequest const&>(__f=0x000000016b5955b8, __args=0x000000016b595598, __args=0x000000016b5958b0) at invoke.h:394:23
    12    frame [#9](/bitcoin-bitcoin/9/): 0x0000000105410c48 bitcoind`UniValue std::__1::__invoke_void_return_wrapper<UniValue, false>::__call<enumeratesigners()::$_0&, RPCHelpMan const&, JSONRPCRequest const&>(__args=0x000000016b5955b8, __args=0x000000016b595598, __args=0x000000016b5958b0) at invoke.h:478:16
    13    frame [#10](/bitcoin-bitcoin/10/): 0x0000000105410bd4 bitcoind`std::__1::__function::__alloc_func<enumeratesigners()::$_0, std::__1::allocator<enumeratesigners()::$_0>, UniValue (RPCHelpMan const&, JSONRPCRequest const&)>::operator()[abi:v160006](this=0x000000016b5955b8, __arg=0x000000016b595598, __arg=0x000000016b5958b0) at function.h:185:16
    14    frame [#11](/bitcoin-bitcoin/11/): 0x000000010540e58c bitcoind`std::__1::__function::__func<enumeratesigners()::$_0, std::__1::allocator<enumeratesigners()::$_0>, UniValue (RPCHelpMan const&, JSONRPCRequest const&)>::operator()(this=0x000000016b5955b0, __arg=0x000000016b595598, __arg=0x000000016b5958b0) at function.h:356:12
    15    frame [#12](/bitcoin-bitcoin/12/): 0x0000000105ece8dc bitcoind`std::__1::__function::__value_func<UniValue (RPCHelpMan const&, JSONRPCRequest const&)>::operator()[abi:v160006](this=0x000000016b5955b0, __args=0x000000016b595598, __args=0x000000016b5958b0) const at function.h:510:16
    16    frame [#13](/bitcoin-bitcoin/13/): 0x0000000105eb83dc bitcoind`std::__1::function<UniValue (RPCHelpMan const&, JSONRPCRequest const&)>::operator()(this= Lambda in File external_signer.cpp at Line 44, __arg=0x000000016b595598, __arg=0x000000016b5958b0) const at function.h:1156:12
    17    frame [#14](/bitcoin-bitcoin/14/): 0x0000000105eb6ed4 bitcoind`RPCHelpMan::HandleRequest(this=0x000000016b595598, request=0x000000016b5958b0) const at util.cpp:663:20
    18    frame [#15](/bitcoin-bitcoin/15/): 0x0000000105368b04 bitcoind`CRPCCommand::CRPCCommand(this=0x000000010642b4c8, request=0x000000016b5958b0, result=0x000000016b595e00, (null)=true)())::'lambda'(JSONRPCRequest const&, UniValue&, bool)::operator()(JSONRPCRequest const&, UniValue&, bool) const at server.h:101:91
    19    frame [#16](/bitcoin-bitcoin/16/): 0x0000000105368a6c bitcoind`decltype(std::declval<CRPCCommand::CRPCCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, RPCHelpMan (*)())::'lambda'(JSONRPCRequest const&, UniValue&, bool)&>()(std::declval<JSONRPCRequest const&>(), std::declval<UniValue&>(), std::declval<bool>())) std::__1::__invoke[abi:v160006]<CRPCCommand::CRPCCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, RPCHelpMan (*)())::'lambda'(JSONRPCRequest const&, UniValue&, bool)&, JSONRPCRequest const&, UniValue&, bool>(__f=0x000000010642b4c8, __args=0x000000016b5958b0, __args=0x000000016b595e00, __args=0x000000016b5957d7) at invoke.h:394:23
    20    frame [#17](/bitcoin-bitcoin/17/): 0x0000000105368980 bitcoind`bool std::__1::__invoke_void_return_wrapper<bool, false>::__call<CRPCCommand::CRPCCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, RPCHelpMan (*)())::'lambda'(JSONRPCRequest const&, UniValue&, bool)&, JSONRPCRequest const&, UniValue&, bool>(__args=0x000000010642b4c8, __args=0x000000016b5958b0, __args=0x000000016b595e00, __args=0x000000016b5957d7) at invoke.h:478:16
    21    frame [#18](/bitcoin-bitcoin/18/): 0x0000000105368900 bitcoind`std::__1::__function::__alloc_func<CRPCCommand::CRPCCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, RPCHelpMan (*)())::'lambda'(JSONRPCRequest const&, UniValue&, bool), std::__1::allocator<CRPCCommand::CRPCCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, RPCHelpMan (*)())::'lambda'(JSONRPCRequest const&, UniValue&, bool)>, bool (JSONRPCRequest const&, UniValue&, bool)>::operator()[abi:v160006](this=0x000000010642b4c8, __arg=0x000000016b5958b0, __arg=0x000000016b595e00, __arg=0x000000016b5957d7) at function.h:185:16
    22    frame [#19](/bitcoin-bitcoin/19/): 0x0000000105366298 bitcoind`std::__1::__function::__func<CRPCCommand::CRPCCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, RPCHelpMan (*)())::'lambda'(JSONRPCRequest const&, UniValue&, bool), std::__1::allocator<CRPCCommand::CRPCCommand(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, RPCHelpMan (*)())::'lambda'(JSONRPCRequest const&, UniValue&, bool)>, bool (JSONRPCRequest const&, UniValue&, bool)>::operator()(this=0x000000010642b4c0, __arg=0x000000016b5958b0, __arg=0x000000016b595e00, __arg=0x000000016b5957d7) at function.h:356:12
    23    frame [#20](/bitcoin-bitcoin/20/): 0x00000001051a3f0c bitcoind`std::__1::__function::__value_func<bool (JSONRPCRequest const&, UniValue&, bool)>::operator()[abi:v160006](this=0x000000010642b4c0, __args=0x000000016b5958b0, __args=0x000000016b595e00, __args=0x000000016b5957d7) const at function.h:510:16
    24    frame [#21](/bitcoin-bitcoin/21/): 0x00000001051a3df0 bitcoind`std::__1::function<bool (JSONRPCRequest const&, UniValue&, bool)>::operator()(this= Function = enumeratesigners() , __arg=0x000000016b5958b0, __arg=0x000000016b595e00, __arg=true) const at function.h:1156:12
    25    frame [#22](/bitcoin-bitcoin/22/): 0x00000001055f51cc bitcoind`ExecuteCommand(command=0x000000010642b490, request=0x000000016b596b10, result=0x000000016b595e00, last_handler=true) at server.cpp:514:20
    26    frame [#23](/bitcoin-bitcoin/23/): 0x00000001055da63c bitcoind`ExecuteCommands(commands=size=1, request=0x000000016b596b10, result=0x000000016b595e00) at server.cpp:481:13
    27    frame [#24](/bitcoin-bitcoin/24/): 0x00000001055da378 bitcoind`CRPCTable::execute(this=0x0000000106431b88, request=0x000000016b596b10) const at server.cpp:501:13
    28    frame [#25](/bitcoin-bitcoin/25/): 0x00000001055d9e38 bitcoind`JSONRPCExec(jreq=0x000000016b596b10, catch_errors=true) at server.cpp:357:31
    29    frame [#26](/bitcoin-bitcoin/26/): 0x0000000104ce4b10 bitcoind`HTTPReq_JSONRPC(context=0x0000600001be2ac8, req=0x00006000015e2e40) at httprpc.cpp:204:21
    30    frame [#27](/bitcoin-bitcoin/27/): 0x0000000104ce42c4 bitcoind`StartHTTPRPC(std::__1::any const&)::$_0::operator()(this=0x0000600001be2ac8, req=0x00006000015e2e40, (null)="") const at httprpc.cpp:376:80
    31    frame [#28](/bitcoin-bitcoin/28/): 0x0000000104ce4250 bitcoind`decltype(std::declval<StartHTTPRPC(std::__1::any const&)::$_0&>()(std::declval<HTTPRequest*>(), std::declval<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&>())) std::__1::__invoke[abi:v160006]<StartHTTPRPC(std::__1::any const&)::$_0&, HTTPRequest*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&>(__f=0x0000600001be2ac8, __args=0x000000016b596e10, __args="") at invoke.h:394:23
    32    frame [#29](/bitcoin-bitcoin/29/): 0x0000000104ce4170 bitcoind`bool std::__1::__invoke_void_return_wrapper<bool, false>::__call<StartHTTPRPC(std::__1::any const&)::$_0&, HTTPRequest*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&>(__args=0x0000600001be2ac8, __args=0x000000016b596e10, __args="") at invoke.h:478:16
    33    frame [#30](/bitcoin-bitcoin/30/): 0x0000000104ce40f8 bitcoind`std::__1::__function::__alloc_func<StartHTTPRPC(std::__1::any const&)::$_0, std::__1::allocator<StartHTTPRPC(std::__1::any const&)::$_0>, bool (HTTPRequest*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)>::operator()[abi:v160006](this=0x0000600001be2ac8, __arg=0x000000016b596e10, __arg="") at function.h:185:16
    34    frame [#31](/bitcoin-bitcoin/31/): 0x0000000104ce1adc bitcoind`std::__1::__function::__func<StartHTTPRPC(std::__1::any const&)::$_0, std::__1::allocator<StartHTTPRPC(std::__1::any const&)::$_0>, bool (HTTPRequest*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)>::operator()(this=0x0000600001be2ac0, __arg=0x000000016b596e10, __arg="") at function.h:356:12
    35    frame [#32](/bitcoin-bitcoin/32/): 0x0000000104d04198 bitcoind`std::__1::__function::__value_func<bool (HTTPRequest*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)>::operator()[abi:v160006](this=0x00006000036e6878, __args=0x000000016b596e10, __args="") const at function.h:510:16
    36    frame [#33](/bitcoin-bitcoin/33/): 0x0000000104d0408c bitcoind`std::__1::function<bool (HTTPRequest*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&)>::operator()(this= Lambda in File httprpc.cpp at Line 376, __arg=0x00006000015e2e40, __arg="") const at function.h:1156:12
    37    frame [#34](/bitcoin-bitcoin/34/): 0x0000000104d03cf0 bitcoind`HTTPWorkItem::operator()(this=0x00006000036e6850) at httpserver.cpp:62:9
    38    frame [#35](/bitcoin-bitcoin/35/): 0x0000000104d17560 bitcoind`WorkQueue<HTTPClosure>::Run(this=0x00006000022e4000) at httpserver.cpp:117:13
    39    frame [#36](/bitcoin-bitcoin/36/): 0x0000000104cf099c bitcoind`HTTPWorkQueueRun(queue=0x00006000022e4000, worker_num=3) at httpserver.cpp:419:12
    40    frame [#37](/bitcoin-bitcoin/37/): 0x0000000104d14498 bitcoind`decltype(std::declval<void (*)(WorkQueue<HTTPClosure>*, int)>()(std::declval<WorkQueue<HTTPClosure>*>(), std::declval<int>())) std::__1::__invoke[abi:v160006]<void (*)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*, int>(__f=0x00006000015e81c8, __args=0x00006000015e81d0, __args=0x00006000015e81d8) at invoke.h:394:23
    41    frame [#38](/bitcoin-bitcoin/38/): 0x0000000104d14354 bitcoind`void std::__1::__thread_execute[abi:v160006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*, int, 2ul, 3ul>(__t=size=4, (null)=__tuple_indices<2UL, 3UL> @ 0x000000016b596f77) at thread:282:5
    42    frame [#39](/bitcoin-bitcoin/39/): 0x0000000104d13310 bitcoind`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(WorkQueue<HTTPClosure>*, int), WorkQueue<HTTPClosure>*, int>>(__vp=0x00006000015e81c0) at thread:293:5
    43    frame [#40](/bitcoin-bitcoin/40/): 0x000000018858f2e4 libsystem_pthread.dylib`_pthread_start + 136
    44(lldb) 
    45
    46# the signer.py process likely hangs on buffer = sys.stdin.read()
    47(lldb) process attach --pid 31835
    48Process 31835 stopped
    49* thread [#1](/bitcoin-bitcoin/1/), queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    50    frame [#0](/bitcoin-bitcoin/0/): 0x000000018854e9b0 libsystem_kernel.dylib`read + 8
    51libsystem_kernel.dylib`read:
    52->  0x18854e9b0 <+8>:  b.lo   0x18854e9d0               ; <+40>
    53    0x18854e9b4 <+12>: pacibsp 
    54    0x18854e9b8 <+16>: stp    x29, x30, [sp, #-0x10]!
    55    0x18854e9bc <+20>: mov    x29, sp
    56Executable module set to "/opt/homebrew/Cellar/python@3.13/3.13.3_1/Frameworks/Python.framework/Versions/3.13/Resources/Python.app/Contents/MacOS/Python".
    57Architecture set to: arm64-apple-macosx-.
    58(lldb) bt
    59* thread [#1](/bitcoin-bitcoin/1/), queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    60  * frame [#0](/bitcoin-bitcoin/0/): 0x000000018854e9b0 libsystem_kernel.dylib`read + 8
    61    frame [#1](/bitcoin-bitcoin/1/): 0x0000000102bc9034 Python`_Py_read + 108
    62    frame [#2](/bitcoin-bitcoin/2/): 0x0000000102bf6440 Python`_io_FileIO_readall_impl + 304
    63    frame [#3](/bitcoin-bitcoin/3/): 0x0000000102a6cac8 Python`cfunction_vectorcall_NOARGS + 100
    64    frame [#4](/bitcoin-bitcoin/4/): 0x0000000102bfc290 Python`_io__Buffered_read + 736
    65    frame [#5](/bitcoin-bitcoin/5/): 0x0000000102a1c514 Python`method_vectorcall_FASTCALL + 108
    66    frame [#6](/bitcoin-bitcoin/6/): 0x0000000102a119b8 Python`PyObject_VectorcallMethod + 148
    67    frame [#7](/bitcoin-bitcoin/7/): 0x0000000102c058d0 Python`_io_TextIOWrapper_read + 192
    68    frame [#8](/bitcoin-bitcoin/8/): 0x0000000102a1c514 Python`method_vectorcall_FASTCALL + 108
    69    frame [#9](/bitcoin-bitcoin/9/): 0x0000000102a103a0 Python`PyObject_Vectorcall + 92
    70    frame [#10](/bitcoin-bitcoin/10/): 0x0000000102b387d0 Python`_PyEval_EvalFrameDefault + 6876
    71    frame [#11](/bitcoin-bitcoin/11/): 0x0000000102b36a78 Python`PyEval_EvalCode + 200
    72    frame [#12](/bitcoin-bitcoin/12/): 0x0000000102ba7078 Python`run_eval_code_obj + 104
    73    frame [#13](/bitcoin-bitcoin/13/): 0x0000000102ba6ac0 Python`run_mod + 168
    74    frame [#14](/bitcoin-bitcoin/14/): 0x0000000102ba51a0 Python`pyrun_file + 164
    75    frame [#15](/bitcoin-bitcoin/15/): 0x0000000102ba44dc Python`_PyRun_SimpleFileObject + 256
    76    frame [#16](/bitcoin-bitcoin/16/): 0x0000000102ba4114 Python`_PyRun_AnyFileObject + 80
    77    frame [#17](/bitcoin-bitcoin/17/): 0x0000000102bcba2c Python`pymain_run_file_obj + 164
    78    frame [#18](/bitcoin-bitcoin/18/): 0x0000000102bcb6d8 Python`pymain_run_file + 76
    79    frame [#19](/bitcoin-bitcoin/19/): 0x0000000102bca948 Python`Py_RunMain + 968
    80    frame [#20](/bitcoin-bitcoin/20/): 0x0000000102bcaf44 Python`pymain_main + 304
    81    frame [#21](/bitcoin-bitcoin/21/): 0x0000000102bcafe4 Python`Py_BytesMain + 40
    82    frame [#22](/bitcoin-bitcoin/22/): 0x000000018820c274 dyld`start + 2840
    83(lldb) 
    
  40. pablomartin4btc commented at 12:45 pm on July 23, 2025: member
  41. tnndbtc commented at 11:46 pm on July 23, 2025: none
    Since it happened again, should we consider adding more error/exception handlings on subprocess_close? Currently we assume it’s always successful, but it may return -1 as error code.
  42. Sjors referenced this in commit d25de7e249 on Jul 25, 2025
  43. Sjors commented at 8:20 am on July 25, 2025: member
    @tnndbtc I opened #33061 to do this, though this may be problematic.
  44. Sjors referenced this in commit 4911de32e5 on Jul 25, 2025
  45. fanquake closed this on Jul 26, 2025

  46. fanquake referenced this in commit 75a5c8258e on Jul 26, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-08-12 09:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me