0[18:26:32.573] test_framework.authproxy.JSONRPCException: 'enumeratesigners' RPC took longer than 1200.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)
Looking at the full log, the test is just idle and then times out. However, there is no indication why the RPC does not proceed.
maflcko added the label
CI failed
on May 16, 2025
maflcko
commented at 10:04 am on May 16, 2025:
member
laanwj
commented at 11:12 am on May 16, 2025:
member
If it’s #32343 i don’t understand why. It was changed so that the brute force fd-closing path isn’t even compiled on Linux.
No idea what could even take so long, unless it’s an obscure bug in the C++ library like fs::directory_iterator looping forever.
laanwj
commented at 11:37 am on May 16, 2025:
member
It would be reasonaly useful to if it logged a full backtrace of bitcoind (gdb thread apply all bt) in these case of timeouts.
Though, i’m not sure that’s enough to debug this. If this is what we think that it is, this is already happening in the forked process, after all. This similarly makes it difficult to add debug logging.
Edit: i’ll make an implementation using bare readdir instead of the fs_iterator, and open a PR. So we can see if that fixes it.
maflcko
commented at 8:46 am on May 17, 2025:
member
maflcko
commented at 7:21 am on May 19, 2025:
member
Backtrace with file names and lines:
0Thread33 (Thread0x7f9d93e006c0 (LWP 18294) "b-httpworker.2"):
1[#0](/bitcoin-bitcoin/0/) 0x00007f9da1dfca9a in read () from /lib/x86_64-linux-gnu/libc.so.6 2[#1](/bitcoin-bitcoin/1/) 0x000055f3d4633e22 in read (__nbytes=1024, __buf=0x7f9d93dfb030, __fd=30) at /usr/include/x86_64-linux-gnu/bits/unistd.h:28 3[#2](/bitcoin-bitcoin/2/) subprocess::util::read_atmost_n (read_upto=1024, buf=0x7f9d93dfb030 "", fp=0x7f9d8c0036c0) at ./util/subprocess.h:437 4[#3](/bitcoin-bitcoin/3/) subprocess::Popen::execute_process (this=this@entry=0x7f9d93dfb8b0) at ./util/subprocess.h:1242 5[#4](/bitcoin-bitcoin/4/) 0x000055f3d4634bfc in subprocess::Popen::Popen<subprocess::input, subprocess::output, subprocess::error, subprocess::close_fds> (this=0x7f9d93dfb8b0, cmd_args="/usr/bin/python3 /ci_container_base/test/functional/mocks/signer.py enumerate") at ./util/subprocess.h:964 6[#5](/bitcoin-bitcoin/5/) 0x000055f3d462a9dc in RunCommandParseJSON (str_command="/usr/bin/python3 /ci_container_base/test/functional/mocks/signer.py enumerate", str_std_in="") at ./common/run_command.cpp:27 7[#6](/bitcoin-bitcoin/6/) 0x000055f3d44717bc in ExternalSigner::Enumerate (command="/usr/bin/python3 /ci_container_base/test/functional/mocks/signer.py", signers=std::__debug::vector of length 0, capacity 0, chain="regtest") at /usr/include/c++/13/bits/allocator.h:184 8[#7](/bitcoin-bitcoin/7/) 0x000055f3d3d78a55 in operator() (request=..., self=..., __closure=<optimized out>) at /usr/include/c++/13/bits/basic_string.tcc:242 9[#8](/bitcoin-bitcoin/8/) 0x000055f3d3d7a0c1 in std::__invoke_impl<UniValue, enumeratesigners()::<lambda(const RPCHelpMan&, const JSONRPCRequest&)>&, const RPCHelpMan&, const JSONRPCRequest&> (__f=...) at /usr/include/c++/13/bits/invoke.h:6010[#9](/bitcoin-bitcoin/9/) std::__invoke_r<UniValue, enumeratesigners()::<lambda(const RPCHelpMan&, const JSONRPCRequest&)>&, const RPCHelpMan&, const JSONRPCRequest&> (__fn=...) at /usr/include/c++/13/bits/invoke.h:11611[#10](/bitcoin-bitcoin/10/) std::_Function_handler<UniValue(const RPCHelpMan&, const JSONRPCRequest&), enumeratesigners()::<lambda(const RPCHelpMan&, const JSONRPCRequest&)> >::_M_invoke(const std::_Any_data &, const RPCHelpMan &, const JSONRPCRequest &) (__functor=..., __args#0=..., __args#1=...) at /usr/include/c++/13/bits/std_function.h:29112[#11](/bitcoin-bitcoin/11/) 0x000055f3d4530306 in std::function<UniValue (RPCHelpMan const&, JSONRPCRequest const&)>::operator()(RPCHelpMan const&, JSONRPCRequest const&) const (__args#1=..., __args#0=..., this=0x7f9d93dfd110) at /usr/include/c++/13/bits/std_function.h:59113[#12](/bitcoin-bitcoin/12/) RPCHelpMan::HandleRequest (this=this@entry=0x7f9d93dfd0f0, request=...) at ./rpc/util.cpp:66314[#13](/bitcoin-bitcoin/13/) 0x000055f3d3d5d2fb in CRPCCommand::CRPCCommand(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, RPCHelpMan (*)())::{lambda(JSONRPCRequest const&, UniValue&, bool)#1}::operator()(JSONRPCRequest const&, UniValue&, bool) const (request=..., result=..., __closure=<optimized out>) at ./rpc/server.h:10115[#14](/bitcoin-bitcoin/14/) 0x000055f3d3e6ed87 in std::function<bool (JSONRPCRequest const&, UniValue&, bool)>::operator()(JSONRPCRequest const&, UniValue&, bool) const (__args#2=<optimized out>, __args#1=..., __args#0=..., this=0x55f3d50aee20 <RegisterSignerRPCCommands(CRPCTable&)::commands+64>) at /usr/include/c++/13/bits/std_function.h:59116[#15](/bitcoin-bitcoin/15/) ExecuteCommand (last_handler=true, result=..., request=..., command=...) at ./rpc/server.cpp:51417[#16](/bitcoin-bitcoin/16/) ExecuteCommands (commands=std::__debug::vector of length 1, capacity 1 = {...}, request=..., result=...) at ./rpc/server.cpp:48118[#17](/bitcoin-bitcoin/17/) 0x000055f3d3e6fc17 in CRPCTable::execute (this=this@entry=0x55f3d50b1f80 <tableRPC>, request=...) at ./rpc/server.cpp:50119[#18](/bitcoin-bitcoin/18/) 0x000055f3d3e6ffdb in JSONRPCExec (jreq=..., catch_errors=<optimized out>) at ./rpc/server.cpp:35720[#19](/bitcoin-bitcoin/19/) 0x000055f3d3ffddd2 in HTTPReq_JSONRPC (context=..., req=0x7f9d30001ca0) at ./httprpc.cpp:21721[#20](/bitcoin-bitcoin/20/) 0x000055f3d4014451 in std::function<bool (HTTPRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>::operator()(HTTPRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const (__args#1="", __args#0=<optimized out>, this=0x7f9d300053c0) at /usr/include/c++/13/bits/std_function.h:59122[#21](/bitcoin-bitcoin/21/) HTTPWorkItem::operator() (this=0x7f9d30005390) at ./httpserver.cpp:6023[#22](/bitcoin-bitcoin/22/) WorkQueue<HTTPClosure>::Run (this=this@entry=0x55f40ede4b10) at ./httpserver.cpp:11524[#23](/bitcoin-bitcoin/23/) 0x000055f3d400bb37 in HTTPWorkQueueRun (queue=0x55f40ede4b10, worker_num=<optimized out>) at ./httpserver.cpp:41725[#24](/bitcoin-bitcoin/24/) 0x00007f9da20f6db4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.626[#25](/bitcoin-bitcoin/25/) 0x00007f9da1d7daa4 in ?? () from /lib/x86_64-linux-gnu/libc.so.627[#26](/bitcoin-bitcoin/26/) 0x00007f9da1e0ac3c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
laanwj
commented at 8:33 am on May 19, 2025:
member
That backtrace is exactly what i’d expected. It’s the parent process waiting for the forked child there.
maflcko
commented at 10:00 am on May 19, 2025:
member
So far it looks like this can only happen with gcc-13 (or older) and GLIBCXX debug mode. I don’t think any real user is running in production with the std lib debug mode, so an alternative would be to just fixup the CI?
laanwj
commented at 10:22 am on May 19, 2025:
member
so an alternative would be to just fixup the CI?
Sure. If this is unlikely to cause issues for real users, that makes more sense, the point of the CI is not to keep us patching the code to make the CI happy 😄
maflcko
commented at 10:58 am on May 19, 2025:
member
I’d say finding compiler bugs is a valid use-case of the CI, but this one seems to be fixed already, so there is probably not much value in bisecting the exact gcc commit, especially given that it would be extremely tedious without a minimal non-intermittent reproducer.
(The time would be better spent on reducing other compiler bugs, such as #32276 (comment))
laanwj
commented at 12:02 pm on May 19, 2025:
member
Yes, bisecting it would be neigh-impossible, given that it seems impossible to reliably reproduce (one thing that’s strange is that it never happens in the unit tests, which exercise RunCommandParseJSON). i already looked into the git history of directory_iterator in the gcc/libstdc++-v3 and wasn’t able to find any bug fix that would explain the behavior here, or how it was fixed. So it’s likely something indirect.
maflcko
commented at 12:15 pm on May 19, 2025:
member
The debug mode keeps state in additional structs/fields, so my bet would be that one of the states is multi-thread unsafe, but that is just a wild guess.
fanquake closed this
on May 20, 2025
fanquake referenced this in commit
af65fd1a33
on May 20, 2025
maflcko
commented at 11:17 am on May 20, 2025:
member
maflcko renamed this:
intermittent issue in rpc_signer.py (enumeratesigners timeout)
intermittent issue in rpc_signer.py (enumeratesigners timeout) under GLIBCXX debug mode
on May 20, 2025
fanquake added this to the milestone 30.0
on May 20, 2025
maflcko
commented at 12:54 pm on May 20, 2025:
member
0[13:30:22.121] node0 2025-05-21T17:30:07.606048Z [msghand][logging/timer.h:58][Log][lock] Enter: lock contention connman.m_nodes_mutex, ./net.h:1648 started
1[13:30:22.121] node0 2025-05-21T17:30:07.606205Z [msghand][logging/timer.h:58][Log][lock] Enter: lock contention connman.m_nodes_mutex, ./net.h:1648 completed (2μs) 2[13:30:22.121] node0 2025-05-21T17:30:21.228925Z [msghand][net_processing.cpp:3449][ProcessMessage][net] received: ping (8 bytes)peer=0 3[13:30:22.121] node0 2025-05-21T17:30:21.229000Z [msghand][net.cpp:3870][PushMessage][net] sending pong (8 bytes)peer=0 4[13:30:22.121] node0 2025-05-21T17:30:21.229120Z [msghand][net.cpp:3870][PushMessage][net] sending ping (8 bytes)peer=0 5[13:30:22.121] node0 2025-05-21T17:30:21.229620Z [msghand][net_processing.cpp:3449][ProcessMessage][net] received: pong (8 bytes)peer=0 6[13:30:22.121] node0 2025-05-21T17:30:21.329837Z [net][logging/timer.h:58][Log][lock] Enter: lock contention m_nodes_mutex, ./net.cpp:1906 started
7[13:30:22.121] node0 2025-05-21T17:30:21.329888Z [net][logging/timer.h:58][Log][lock] Enter: lock contention m_nodes_mutex, ./net.cpp:1906 completed (1μs) 8[13:30:22.121] test 2025-05-21T17:30:21.429000Z TestFramework (ERROR): Assertion failed
9[13:30:22.121] Traceback (most recent call last):
10[13:30:22.121] File "/ci_container_base/test/functional/test_framework/authproxy.py", line 169, in _get_response
11[13:30:22.121]http_response= self.__conn.getresponse()12[13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^
13[13:30:22.121] File "/usr/lib64/python3.12/http/client.py", line 1430, in getresponse
14[13:30:22.121] response.begin()15[13:30:22.121] File "/usr/lib64/python3.12/http/client.py", line 331, in begin
16[13:30:22.121] version, status, reason= self._read_status()17[13:30:22.121] ^^^^^^^^^^^^^^^^^^^
18[13:30:22.121] File "/usr/lib64/python3.12/http/client.py", line 292, in _read_status
19[13:30:22.121]line= str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")20[13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21[13:30:22.121] File "/usr/lib64/python3.12/socket.py", line 720, in readinto
22[13:30:22.121]return self._sock.recv_into(b)23[13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^
24[13:30:22.121] TimeoutError: timed out
25[13:30:22.121] During handling of the above exception, another exception occurred:
26[13:30:22.121] Traceback (most recent call last):
27[13:30:22.121] File "/ci_container_base/test/functional/test_framework/util.py", line 164, in try_rpc
28[13:30:22.121] fun(*args, **kwds)29[13:30:22.121] File "/ci_container_base/test/functional/test_framework/coverage.py", line 50, in __call__
30[13:30:22.121]return_val= self.auth_service_proxy_instance.__call__(*args, **kwargs)31[13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
32[13:30:22.121] File "/ci_container_base/test/functional/test_framework/authproxy.py", line 132, in __call__
33[13:30:22.121] response, status= self._request('POST', self.__url.path, postdata.encode('utf-8'))34[13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
35[13:30:22.121] File "/ci_container_base/test/functional/test_framework/authproxy.py", line 106, in _request
36[13:30:22.121]return self._get_response()37[13:30:22.121] ^^^^^^^^^^^^^^^^^^^^
38[13:30:22.121] File "/ci_container_base/test/functional/test_framework/authproxy.py", line 171, in _get_response
39[13:30:22.121] raise JSONRPCException({40[13:30:22.121] test_framework.authproxy.JSONRPCException: 'enumeratesigners' RPC took longer than 1200.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)41[13:30:22.121] During handling of the above exception, another exception occurred:
42[13:30:22.121] Traceback (most recent call last):
43[13:30:22.121] File "/ci_container_base/test/functional/test_framework/test_framework.py", line 184, in main
44[13:30:22.121] self.run_test()45[13:30:22.121] File "/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/test/functional/rpc_signer.py", line 54, in run_test
46[13:30:22.121] assert_raises_rpc_error(47[13:30:22.121] File "/ci_container_base/test/functional/test_framework/util.py", line 155, in assert_raises_rpc_error
48[13:30:22.121] assert try_rpc(code, message, fun, *args, **kwds), "No exception raised"49[13:30:22.121] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
50[13:30:22.121] File "/ci_container_base/test/functional/test_framework/util.py", line 168, in try_rpc
51[13:30:22.121] raise AssertionError("Unexpected JSONRPC error code %i" % e.error["code"])52[13:30:22.121] AssertionError: Unexpected JSONRPC error code -344
53[13:30:22.121] test 2025-05-21T17:30:21.437000Z TestFramework (DEBUG): Closing down network thread
54[13:30:22.121] test 2025-05-21T17:30:21.488000Z TestFramework (INFO): Not stopping nodes as test failed. The dangling processes will be cleaned up later.
55[13:30:22.121] test 2025-05-21T17:30:21.488000Z TestFramework (WARNING): Not cleaning up dir /ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250521_130522/rpc_signer_216
56[13:30:22.121] test 2025-05-21T17:30:21.488000Z TestFramework (ERROR): Test failed. Test logging available at /ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250521_130522/rpc_signer_216/test_framework.log
57[13:30:22.121] test 2025-05-21T17:30:21.489000Z TestFramework (ERROR):
58[13:30:22.121] test 2025-05-21T17:30:21.489000Z TestFramework (ERROR): Hint: Call /ci_container_base/test/functional/combine_logs.py '/ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250521_130522/rpc_signer_216' to consolidate all logs
59[13:30:22.121] test 2025-05-21T17:30:21.489000Z TestFramework (ERROR):
60[13:30:22.121] test 2025-05-21T17:30:21.489000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
61[13:30:22.121] test 2025-05-21T17:30:21.489000Z TestFramework (ERROR): https://github.com/bitcoin/bitcoin/issues
62[13:30:22.121] test 2025-05-21T17:30:21.489000Z TestFramework (ERROR):
maflcko
commented at 11:39 am on May 22, 2025:
member
maflcko referenced this in commit
a476e94689
on May 22, 2025
maflcko referenced this in commit
fa079538e3
on May 22, 2025
fanquake closed this
on May 22, 2025
fanquake referenced this in commit
2df824f4e6
on May 22, 2025
maflcko
commented at 6:10 am on May 23, 2025:
member
This is closed, but the failure remains, when triggered, so it would be good to investigate this long-term at some point. Discussion can continue here, or in a new issue.
fanquake reopened this
on May 23, 2025
maflcko added the label
Brainstorming
on May 23, 2025
maflcko
commented at 8:18 am on May 23, 2025:
member
tnndbtc
commented at 0:13 am on July 15, 2025:
none
I assume the general code logic should be correct, but likely some exception/run-time error occurred then caused the intermittent issue.
So, I studied Popen::execute_process() in src/util/subprocess.h, one interesting statement caught my attention:
0 subprocess_close(err_wr_pipe);// close child side of pipe, else get stuck in read below
1 stream_.close_child_fds();
I notice that we didn’t check the return code of the calling of subprocess_close()/close(), in many cases. Indeed, close() is a very safe call in general, but what if close() returns -1, but we didn’t check it?
So, I hacked the subprocess.h by not calling subprocess_close(err_wr_pipe); on purpose in my macOS. Then I kind of reproduced the issue, at least it has very similar call stack on bitcoind, and on signer.py, it likely hangs on buffer = sys.stdin.read()
Not sure if this helps on the debugging process, but just want to share a thought. Maybe we should handle the return code of subprocess_close() to provide more hints when next it reproduces?
tnndbtc
commented at 11:46 pm on July 23, 2025:
none
Since it happened again, should we consider adding more error/exception handlings on subprocess_close? Currently we assume it’s always successful, but it may return -1 as error code.
Sjors referenced this in commit
d25de7e249
on Jul 25, 2025
Sjors
commented at 8:20 am on July 25, 2025:
member
@tnndbtc I opened #33061 to do this, though this may be problematic.
Sjors referenced this in commit
4911de32e5
on Jul 25, 2025
fanquake closed this
on Jul 26, 2025
fanquake referenced this in commit
75a5c8258e
on Jul 26, 2025
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-08-12 09:13 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me