intermittent issue in rpc_signer.py (enumeratesigners timeout) under GLIBCXX debug mode #32524

issue maflcko openend this issue on May 16, 2025
  1. maflcko commented at 6:36 am on May 16, 2025: member

    https://cirrus-ci.com/task/5094568045051904?logs=ci#L4419

    0[18:26:32.573] test_framework.authproxy.JSONRPCException: 'enumeratesigners' RPC took longer than 1200.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)
    

    Looking at the full log, the test is just idle and then times out. However, there is no indication why the RPC does not proceed.

  2. maflcko added the label CI failed on May 16, 2025
  3. maflcko commented at 10:04 am on May 16, 2025: member
  4. maflcko commented at 10:46 am on May 16, 2025: member

    Same https://cirrus-ci.com/task/4561849141690368?logs=ci#L6059

    So I guess this was recently introduced

  5. fanquake commented at 10:49 am on May 16, 2025: member
  6. laanwj commented at 11:12 am on May 16, 2025: member

    If it’s #32343 i don’t understand why. It was changed so that the brute force fd-closing path isn’t even compiled on Linux.

    No idea what could even take so long, unless it’s an obscure bug in the C++ library like fs::directory_iterator looping forever.

  7. laanwj commented at 11:37 am on May 16, 2025: member

    It would be reasonaly useful to if it logged a full backtrace of bitcoind (gdb thread apply all bt) in these case of timeouts.

    Though, i’m not sure that’s enough to debug this. If this is what we think that it is, this is already happening in the forked process, after all. This similarly makes it difficult to add debug logging.

    Edit: i’ll make an implementation using bare readdir instead of the fs_iterator, and open a PR. So we can see if that fixes it.

  8. maflcko commented at 8:46 am on May 17, 2025: member
    I’ve created 50 runs and they all passed: https://cirrus-ci.com/task/5325312260571136 So I guess this is related. I’ll try to get the gdb bt next.
  9. maflcko commented at 8:54 am on May 17, 2025: member
    In the meantime, I place my bet that this is a gcc-11 upstream bug 🤞
  10. romanz commented at 12:49 pm on May 17, 2025: contributor
  11. maflcko commented at 12:39 pm on May 18, 2025: member

    The bt is:

     0Thread 9 (Thread 0x7f0973400640 (LWP 21341) "b-httpworker.5"):
     1[#0](/bitcoin-bitcoin/0/)  __GI___libc_read (nbytes=1024, buf=0x7f09733fb7f0, fd=30) at ../sysdeps/unix/sysv/linux/read.c:26
     2[#1](/bitcoin-bitcoin/1/)  __GI___libc_read (fd=30, buf=0x7f09733fb7f0, nbytes=1024) at ../sysdeps/unix/sysv/linux/read.c:24
     3[#2](/bitcoin-bitcoin/2/)  0x0000564d74731ba3 in subprocess::Popen::execute_process() ()
     4[#3](/bitcoin-bitcoin/3/)  0x0000564d74732b44 in subprocess::Popen::Popen<subprocess::input, subprocess::output, subprocess::error, subprocess::close_fds>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, subprocess::input&&, subprocess::output&&, subprocess::error&&, subprocess::close_fds&&) ()
     5[#4](/bitcoin-bitcoin/4/)  0x0000564d747299f5 in RunCommandParseJSON(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
     6[#5](/bitcoin-bitcoin/5/)  0x0000564d7457642c in ExternalSigner::Enumerate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__debug::vector<ExternalSigner, std::allocator<ExternalSigner> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
     7[#6](/bitcoin-bitcoin/6/)  0x0000564d73e7f8e5 in enumeratesigners()::{lambda(RPCHelpMan const&, JSONRPCRequest const&)#1}::operator()(RPCHelpMan const&, JSONRPCRequest const&) const [clone .constprop.0] ()
     8[#7](/bitcoin-bitcoin/7/)  0x0000564d73e815d2 in std::_Function_handler<UniValue (RPCHelpMan const&, JSONRPCRequest const&), enumeratesigners()::{lambda(RPCHelpMan const&, JSONRPCRequest const&)#1}>::_M_invoke(std::_Any_data const&, RPCHelpMan const&, JSONRPCRequest const&) ()
     9[#8](/bitcoin-bitcoin/8/)  0x0000564d74630660 in RPCHelpMan::HandleRequest(JSONRPCRequest const&) const ()
    10[#9](/bitcoin-bitcoin/9/)  0x0000564d73e6334e in CRPCCommand::CRPCCommand(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, RPCHelpMan (*)())::{lambda(JSONRPCRequest const&, UniValue&, bool)#1}::operator()(JSONRPCRequest const&, UniValue&, bool) const [clone .isra.0] ()
    11[#10](/bitcoin-bitcoin/10/) 0x0000564d73f80826 in ExecuteCommands(std::__debug::vector<CRPCCommand const*, std::allocator<CRPCCommand const*> > const&, JSONRPCRequest const&, UniValue&) ()
    12[#11](/bitcoin-bitcoin/11/) 0x0000564d73f80e4d in CRPCTable::execute(JSONRPCRequest const&) const ()
    13[#12](/bitcoin-bitcoin/12/) 0x0000564d73f81031 in JSONRPCExec(JSONRPCRequest const&, bool) ()
    14[#13](/bitcoin-bitcoin/13/) 0x0000564d740fbb02 in HTTPReq_JSONRPC(std::any const&, HTTPRequest*) ()
    15[#14](/bitcoin-bitcoin/14/) 0x0000564d7411405d in WorkQueue<HTTPClosure>::Run() ()
    16[#15](/bitcoin-bitcoin/15/) 0x0000564d7410ca38 in HTTPWorkQueueRun(WorkQueue<HTTPClosure>*, int) ()
    17[#16](/bitcoin-bitcoin/16/) 0x00007f097c4cf253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
    18[#17](/bitcoin-bitcoin/17/) 0x00007f097c157ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
    19[#18](/bitcoin-bitcoin/18/) 0x00007f097c1e9850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
    
  12. maflcko commented at 6:25 am on May 19, 2025: member
    Whatever the issue is, it may be fixed in gcc-14. At least https://cirrus-ci.com/task/5477938151292928 passed 25 times. (The prior commit with gcc-12 failed at least once: https://cirrus-ci.com/task/5645419964792832?logs=ci#L4122). Gcc-13 also failed at least once: https://cirrus-ci.com/task/4577002323181568?logs=ci#L690
  13. maflcko commented at 7:21 am on May 19, 2025: member

    Backtrace with file names and lines:

     0Thread 33 (Thread 0x7f9d93e006c0 (LWP 18294) "b-httpworker.2"):
     1[#0](/bitcoin-bitcoin/0/)  0x00007f9da1dfca9a in read () from /lib/x86_64-linux-gnu/libc.so.6
     2[#1](/bitcoin-bitcoin/1/)  0x000055f3d4633e22 in read (__nbytes=1024, __buf=0x7f9d93dfb030, __fd=30) at /usr/include/x86_64-linux-gnu/bits/unistd.h:28
     3[#2](/bitcoin-bitcoin/2/)  subprocess::util::read_atmost_n (read_upto=1024, buf=0x7f9d93dfb030 "", fp=0x7f9d8c0036c0) at ./util/subprocess.h:437
     4[#3](/bitcoin-bitcoin/3/)  subprocess::Popen::execute_process (this=this@entry=0x7f9d93dfb8b0) at ./util/subprocess.h:1242
     5[#4](/bitcoin-bitcoin/4/)  0x000055f3d4634bfc in subprocess::Popen::Popen<subprocess::input, subprocess::output, subprocess::error, subprocess::close_fds> (this=0x7f9d93dfb8b0, cmd_args="/usr/bin/python3 /ci_container_base/test/functional/mocks/signer.py enumerate") at ./util/subprocess.h:964
     6[#5](/bitcoin-bitcoin/5/)  0x000055f3d462a9dc in RunCommandParseJSON (str_command="/usr/bin/python3 /ci_container_base/test/functional/mocks/signer.py enumerate", str_std_in="") at ./common/run_command.cpp:27
     7[#6](/bitcoin-bitcoin/6/)  0x000055f3d44717bc in ExternalSigner::Enumerate (command="/usr/bin/python3 /ci_container_base/test/functional/mocks/signer.py", signers=std::__debug::vector of length 0, capacity 0, chain="regtest") at /usr/include/c++/13/bits/allocator.h:184
     8[#7](/bitcoin-bitcoin/7/)  0x000055f3d3d78a55 in operator() (request=..., self=..., __closure=<optimized out>) at /usr/include/c++/13/bits/basic_string.tcc:242
     9[#8](/bitcoin-bitcoin/8/)  0x000055f3d3d7a0c1 in std::__invoke_impl<UniValue, enumeratesigners()::<lambda(const RPCHelpMan&, const JSONRPCRequest&)>&, const RPCHelpMan&, const JSONRPCRequest&> (__f=...) at /usr/include/c++/13/bits/invoke.h:60
    10[#9](/bitcoin-bitcoin/9/)  std::__invoke_r<UniValue, enumeratesigners()::<lambda(const RPCHelpMan&, const JSONRPCRequest&)>&, const RPCHelpMan&, const JSONRPCRequest&> (__fn=...) at /usr/include/c++/13/bits/invoke.h:116
    11[#10](/bitcoin-bitcoin/10/) std::_Function_handler<UniValue(const RPCHelpMan&, const JSONRPCRequest&), enumeratesigners()::<lambda(const RPCHelpMan&, const JSONRPCRequest&)> >::_M_invoke(const std::_Any_data &, const RPCHelpMan &, const JSONRPCRequest &) (__functor=..., __args#0=..., __args#1=...) at /usr/include/c++/13/bits/std_function.h:291
    12[#11](/bitcoin-bitcoin/11/) 0x000055f3d4530306 in std::function<UniValue (RPCHelpMan const&, JSONRPCRequest const&)>::operator()(RPCHelpMan const&, JSONRPCRequest const&) const (__args#1=..., __args#0=..., this=0x7f9d93dfd110) at /usr/include/c++/13/bits/std_function.h:591
    13[#12](/bitcoin-bitcoin/12/) RPCHelpMan::HandleRequest (this=this@entry=0x7f9d93dfd0f0, request=...) at ./rpc/util.cpp:663
    14[#13](/bitcoin-bitcoin/13/) 0x000055f3d3d5d2fb in CRPCCommand::CRPCCommand(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, RPCHelpMan (*)())::{lambda(JSONRPCRequest const&, UniValue&, bool)#1}::operator()(JSONRPCRequest const&, UniValue&, bool) const (request=..., result=..., __closure=<optimized out>) at ./rpc/server.h:101
    15[#14](/bitcoin-bitcoin/14/) 0x000055f3d3e6ed87 in std::function<bool (JSONRPCRequest const&, UniValue&, bool)>::operator()(JSONRPCRequest const&, UniValue&, bool) const (__args#2=<optimized out>, __args#1=..., __args#0=..., this=0x55f3d50aee20 <RegisterSignerRPCCommands(CRPCTable&)::commands+64>) at /usr/include/c++/13/bits/std_function.h:591
    16[#15](/bitcoin-bitcoin/15/) ExecuteCommand (last_handler=true, result=..., request=..., command=...) at ./rpc/server.cpp:514
    17[#16](/bitcoin-bitcoin/16/) ExecuteCommands (commands=std::__debug::vector of length 1, capacity 1 = {...}, request=..., result=...) at ./rpc/server.cpp:481
    18[#17](/bitcoin-bitcoin/17/) 0x000055f3d3e6fc17 in CRPCTable::execute (this=this@entry=0x55f3d50b1f80 <tableRPC>, request=...) at ./rpc/server.cpp:501
    19[#18](/bitcoin-bitcoin/18/) 0x000055f3d3e6ffdb in JSONRPCExec (jreq=..., catch_errors=<optimized out>) at ./rpc/server.cpp:357
    20[#19](/bitcoin-bitcoin/19/) 0x000055f3d3ffddd2 in HTTPReq_JSONRPC (context=..., req=0x7f9d30001ca0) at ./httprpc.cpp:217
    21[#20](/bitcoin-bitcoin/20/) 0x000055f3d4014451 in std::function<bool (HTTPRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>::operator()(HTTPRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const (__args#1="", __args#0=<optimized out>, this=0x7f9d300053c0) at /usr/include/c++/13/bits/std_function.h:591
    22[#21](/bitcoin-bitcoin/21/) HTTPWorkItem::operator() (this=0x7f9d30005390) at ./httpserver.cpp:60
    23[#22](/bitcoin-bitcoin/22/) WorkQueue<HTTPClosure>::Run (this=this@entry=0x55f40ede4b10) at ./httpserver.cpp:115
    24[#23](/bitcoin-bitcoin/23/) 0x000055f3d400bb37 in HTTPWorkQueueRun (queue=0x55f40ede4b10, worker_num=<optimized out>) at ./httpserver.cpp:417
    25[#24](/bitcoin-bitcoin/24/) 0x00007f9da20f6db4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
    26[#25](/bitcoin-bitcoin/25/) 0x00007f9da1d7daa4 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    27[#26](/bitcoin-bitcoin/26/) 0x00007f9da1e0ac3c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    
  14. laanwj commented at 8:33 am on May 19, 2025: member
    That backtrace is exactly what i’d expected. It’s the parent process waiting for the forked child there.
  15. maflcko commented at 10:00 am on May 19, 2025: member

    So far it looks like this can only happen with gcc-13 (or older) and GLIBCXX debug mode. I don’t think any real user is running in production with the std lib debug mode, so an alternative would be to just fixup the CI?

    For reference, the end-user setting -D_GLIBCXX_ASSERTIONS passed 25 times: https://cirrus-ci.com/task/5739615073599488

  16. laanwj commented at 10:22 am on May 19, 2025: member

    so an alternative would be to just fixup the CI?

    Sure. If this is unlikely to cause issues for real users, that makes more sense, the point of the CI is not to keep us patching the code to make the CI happy 😄

  17. maflcko commented at 10:58 am on May 19, 2025: member

    I’d say finding compiler bugs is a valid use-case of the CI, but this one seems to be fixed already, so there is probably not much value in bisecting the exact gcc commit, especially given that it would be extremely tedious without a minimal non-intermittent reproducer.

    (The time would be better spent on reducing other compiler bugs, such as #32276 (comment))

  18. laanwj commented at 12:02 pm on May 19, 2025: member
    Yes, bisecting it would be neigh-impossible, given that it seems impossible to reliably reproduce (one thing that’s strange is that it never happens in the unit tests, which exercise RunCommandParseJSON). i already looked into the git history of directory_iterator in the gcc/libstdc++-v3 and wasn’t able to find any bug fix that would explain the behavior here, or how it was fixed. So it’s likely something indirect.
  19. maflcko commented at 12:15 pm on May 19, 2025: member
    The debug mode keeps state in additional structs/fields, so my bet would be that one of the states is multi-thread unsafe, but that is just a wild guess.
  20. fanquake closed this on May 20, 2025

  21. fanquake referenced this in commit af65fd1a33 on May 20, 2025
  22. maflcko commented at 11:17 am on May 20, 2025: member

    It even happens on gcc-14, it seems:

    https://cirrus-ci.com/task/5476449961902080?logs=ci#L5506

  23. maflcko reopened this on May 20, 2025

  24. maflcko renamed this:
    intermittent issue in rpc_signer.py (enumeratesigners timeout)
    intermittent issue in rpc_signer.py (enumeratesigners timeout) under GLIBCXX debug mode
    on May 20, 2025
  25. fanquake added this to the milestone 30.0 on May 20, 2025
  26. maflcko commented at 12:54 pm on May 20, 2025: member
    What I don’t understand is why the CentOS task seemingly passed before, but started failing after https://github.com/bitcoin/bitcoin/commit/af65fd1a333011137dafd3df9a51704fd319feb4, even though the task wasn’t modified apart from a comment.


maflcko fanquake laanwj romanz

Labels
CI failed

Milestone
30.0


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-05-20 21:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me