intermittent timeout in wallet_signer.py : ‘createwallet’ RPC took longer than 1200.000000 seconds #32855

issue maflcko openend this issue on July 2, 2025
  1. maflcko commented at 7:52 am on July 2, 2025: member

    in the multiprocess, i686, DEBUG task in self.nodes[1].createwallet(wallet_name='hww_disconnect', disable_private_keys=True, external_signer=True),

    https://cirrus-ci.com/task/5704849158832128?logs=ci#L2967:

     0 node1 2025-06-26T14:05:40.072184Z [msghand] [net_processing.cpp:3452] [ProcessMessage] [net] received: pong (8 bytes) peer=0 
     1 test  2025-06-26T14:05:43.222938Z TestFramework (ERROR): JSONRPC error 
     2                                   Traceback (most recent call last):
     3                                     File "/ci_container_base/test/functional/test_framework/authproxy.py", line 174, in _get_response
     4                                       http_response = self.__conn.getresponse()
     5                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^
     6                                     File "/usr/lib/python3.12/http/client.py", line 1428, in getresponse
     7                                       response.begin()
     8                                     File "/usr/lib/python3.12/http/client.py", line 331, in begin
     9                                       version, status, reason = self._read_status()
    10                                                                 ^^^^^^^^^^^^^^^^^^^
    11                                     File "/usr/lib/python3.12/http/client.py", line 292, in _read_status
    12                                       line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
    13                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    14                                     File "/usr/lib/python3.12/socket.py", line 707, in readinto
    15                                       return self._sock.recv_into(b)
    16                                              ^^^^^^^^^^^^^^^^^^^^^^^
    17                                   TimeoutError: timed out
    18                                   During handling of the above exception, another exception occurred:
    19                                   Traceback (most recent call last):
    20                                     File "/ci_container_base/test/functional/test_framework/test_framework.py", line 190, in main
    21                                       self.run_test()
    22                                     File "/ci_container_base/ci/scratch/build-i686-pc-linux-gnu/test/functional/wallet_signer.py", line 59, in run_test
    23                                       self.test_disconnected_signer()
    24                                     File "/ci_container_base/ci/scratch/build-i686-pc-linux-gnu/test/functional/wallet_signer.py", line 246, in test_disconnected_signer
    25                                       self.nodes[1].createwallet(wallet_name='hww_disconnect', disable_private_keys=True, external_signer=True)
    26                                     File "/ci_container_base/test/functional/test_framework/coverage.py", line 50, in __call__
    27                                       return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
    28                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    29                                     File "/ci_container_base/test/functional/test_framework/authproxy.py", line 137, in __call__
    30                                       response, status = self._request('POST', self.__url.path, postdata.encode('utf-8'))
    31                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    32                                     File "/ci_container_base/test/functional/test_framework/authproxy.py", line 111, in _request
    33                                       return self._get_response()
    34                                              ^^^^^^^^^^^^^^^^^^^^
    35                                     File "/ci_container_base/test/functional/test_framework/authproxy.py", line 176, in _get_response
    36                                       raise JSONRPCException({
    37                                   test_framework.authproxy.JSONRPCException: 'createwallet' RPC took longer than 1200.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)
    38 test  2025-06-26T14:05:43.230396Z TestFramework (DEBUG): Closing down network thread 
    
  2. maflcko added the label CI failed on Jul 2, 2025
  3. maflcko added the label Tests on Jul 2, 2025
  4. tnndbtc commented at 3:51 pm on July 3, 2025: none
    @maflcko Is there a way to retrieve debug.log (in /ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250626_133646/) while the test fails so it may provide more clues on why RPC took longer?
  5. maflcko commented at 4:22 pm on July 3, 2025: member
    You can just follow the links to “Open full logs” to get to https://api.cirrus-ci.com/v1/task/5704849158832128/logs/ci.log, but I don’t think there are any details in the log that give more hints, on a quick skim.
  6. bitcoin deleted a comment on Jul 3, 2025
  7. Sjors commented at 7:51 am on July 7, 2025: member

    Assuming this is specific to external wallets, and not an unrelated bug in the test itself, the most interesting code is:

    0void CWallet::SetupDescriptorScriptPubKeyMans()
    1{
    2...
    3        UniValue signer_res = signer->GetDescriptors(account);
    

    Beyond that wallet creation is more or less the same between regular and external signer wallets.

    0UniValue ExternalSigner::GetDescriptors(const int account)
    1{
    2    return RunCommandParseJSON(m_command + " --fingerprint " + m_fingerprint + NetworkArg() + " getdescriptors --account " + strprintf("%d", account));
    3}
    

    In the functional test this results in a call to mocks/signer.py that’s handled by def getdescriptors(args).

    Note that test_disconnected_signer is not the first call to mocks/signer.py in the functional test. It would be useful to know if it always times out at this specific line or also elsewhere in the test.

    This test was recently added in #32682, but the createwallet part of the new test is not unique.

  8. maflcko commented at 8:30 am on July 7, 2025: member
    I’d say it would be useful if the mocks scripts had more debug logging enabled; maybe some command tracing when entering the script and when exiting it?
  9. Sjors commented at 2:44 pm on July 9, 2025: member
    @maflcko that’s easier said than done, but see #32928!
  10. tnndbtc commented at 4:45 pm on July 10, 2025: none

    Thanks for @Sjors ’ quick clue on the potential issue. I searched the CI output and agree that it’s likely due to the calling of

    return RunCommandParseJSON(m_command + " --fingerprint " + m_fingerprint + NetworkArg() + " getdescriptors --account " + strprintf("%d", account)); which requires a read from stdin in test/functional/mocks/signer.py

    0if not sys.stdin.isatty():  # on my local mac test, this is not from terminal so it will try to run read()
    1    buffer = sys.stdin.read() # this one read an empty string from stdin
    

    Likely the stdin is missing in this particular run and caused test/functional/mocks/signer.py to hang. The execution of the python code is implemented in src/util/subprocess.h (Popen(std::initializer_list<const char*> cmd_args, Args&& ...args)), which I haven’t researched further on possible bugs that can cause the pipe not flushed, or closed too early before child process can consume it.

    Since this mock/signer.py is very isolated from regular python test framework so it’s hard to add debug traces that can be caught by CI process. I would propose adding a log in C++ code, like this:

    0        WalletLogPrintf("Begin of GetDescriptors"); // add debug log
    1        UniValue signer_res = signer->GetDescriptors(account);
    2        WalletLogPrintf("End of GetDescriptors"); // add debug log
    

    This is not ideal, but it will give a good idea on what goes wrong next time it happens.

    More details on how I identified the problematic code, I searched the full CI output:

    0[10:05:43.853]  node1 2025-06-26T13:45:43.128865Z [httpworker.8] [wallet/wallet.h:931] [WalletLogPrintf] [hww_disconnect] Setting minversion to 169900
    1[10:05:43.853]  node1 2025-06-26T13:45:43.128982Z [httpworker.8] [wallet/sqlite.cpp:55] [TraceSqlCallback] [walletdb:trace] [/ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250626_133646/wallet_signer_216/node1/regtest/hww_disconnect/wallet.dat] SQLite Statement: INSERT or REPLACE into main values(?, ?)
    2[10:05:43.853]  node1 2025-06-26T13:45:43.129258Z [httpworker.8] [wallet/sqlite.cpp:55] [TraceSqlCallback] [walletdb:trace] [/ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250626_133646/wallet_signer_216/node1/regtest/hww_disconnect/wallet.dat] SQLite Statement: INSERT or REPLACE into main values(?, ?)
    

    Then there is no more logs for thread: httpworker.8, which indicates a hanging issue. However, from a good test, the traces would show.

     0walletInstance->SetMinVersion(FEATURE_LATEST); // in CWallet::Create , this will generate following log:
     12025-07-10T16:13:48.468246Z [httpworker.7] [wallet/wallet.h:942] [WalletLogPrintf] [hww_disconnect] Setting minversion to 169900
     22025-07-10T16:13:48.468268Z [httpworker.7] [wallet/sqlite.cpp:55] [TraceSqlCallback] [walletdb:trace] [/tmp/test_runner_₿_🏃_20250710_121343/wallet_signer_0/node1/regtest/hww_disconnect/wallet.dat] SQLite Statement: INSERT or REPLACE into main values(?, ?)
     3
     4walletInstance->InitWalletFlags(wallet_creation_flags | WALLET_FLAG_LAST_HARDENED_XPUB_CACHED); // in CWallet::Create, this one will generate following logs
     52025-07-10T16:13:48.468307Z [httpworker.7] [wallet/sqlite.cpp:55] [TraceSqlCallback] [walletdb:trace] [/tmp/test_runner_₿_🏃_20250710_121343/wallet_signer_0/node1/regtest/hww_disconnect/wallet.dat] SQLite Statement: INSERT or REPLACE into main values(?, ?)
     6
     7UniValue signer_res = signer->GetDescriptors(account); // in CWallet::SetupDescriptorScriptPubKeyMans(), this one requires a read from stdin
     8
     9if (!batch.TxnBegin()) throw std::runtime_error("Error: cannot create db transaction for descriptors import"); // in CWallet::SetupDescriptorScriptPubKeyMans(), this one will generate following log
    102025-07-10T16:13:48.534299Z [httpworker.7] [wallet/sqlite.cpp:55] [TraceSqlCallback] [walletdb:trace] [/tmp/test_runner_₿_🏃_20250710_121343/wallet_signer_0/node1/regtest/hww_disconnect/wallet.dat] SQLite Statement: BEGIN TRANSACTION
    

    Based on the log traces, the python statement that causes wallet creation failure is: self.nodes[1].createwallet(wallet_name='hww_disconnect', disable_private_keys=True, external_signer=True)

  11. maflcko commented at 5:51 pm on July 10, 2025: member

    Likely the stdin is missing in this particular run and caused test/functional/mocks/signer.py to hang.

    What does “stdin is missing” mean? It should always be present and point to some file descriptor.

    I think the only way this could hang is when the process spawning the signer is also waiting on it, without first closing the pipes. However this seems to happen correctly, looking at RunCommandParseJSON:

    0    auto c = sp::Popen(str_command, sp::input{sp::PIPE}, sp::output{sp::PIPE}, sp::error{sp::PIPE}, sp::close_fds{true});
    1    if (!str_std_in.empty()) {
    2        c.send(str_std_in);
    3    }
    4    auto [out_res, err_res] = c.communicate();
    
  12. maflcko commented at 5:53 pm on July 10, 2025: member
    Maybe this is #32524, but with N=1 and no way to reproduce, this seems difficult to debug.
  13. bitcoin deleted a comment on Jul 11, 2025
  14. tnndbtc commented at 11:46 pm on July 12, 2025: none

    @maflcko For “stdin is missing”, I was referring to a suspicion that bitcoind failed to write an empty string to the pipe and caused signer.py waiting on buffer = sys.stdin.read(). However, I was wrong about that.

    After I checked the call stack posted in #32524 , it indicated that bitcoind got stuck on a read() call from err_rd_pipe in util::read_atmost_n, which is to read from the child process, i.e., rpc_signer.py.

    Also, the bitcoind process indeed has closed the write pipe, by calling subprocess_close(err_wr_pipe); which will trigger the signer.py to get a empty string from buffer = sys.stdin.read(), so this part is good.

    I’d spend more time on this subprocess::process_execute() logic and hopefully find out the potential bug.

  15. maflcko commented at 6:58 am on July 14, 2025: member

    Possibly similar root cause: https://cirrus-ci.com/task/5280224666976256?logs=ci#L1710

    At least on CI, back when I tested #32529 by Mara did seem to improve the situation.

    I wonder if we should re-open and use that pull, even if there isn’t a reproducer or further details available.

  16. naiyoma commented at 8:44 pm on July 18, 2025: contributor

    I was testing #32928 and made an attempt to reproduce this issue by hanging a getdescriptors call

     0index aa9d286417..edbd5359ee 100755
     1--- a/test/functional/mock_external_signer.py
     2+++ b/test/functional/mock_external_signer.py
     3@@ -97,6 +97,15 @@ parser_signtx.add_argument('psbt', metavar='psbt')
     4 parser_signtx.set_defaults(func=signtx)
     5 
     6 if not sys.stdin.isatty():
     7+    log.debug("About to read from stdin")
     8+    log.debug(f"Current args: {sys.argv}")
     9+    # Hang the getdescriptors call
    10+    if "getdescriptors" in sys.argv:
    11+        log.debug("Naiyoma HANGING: stdin hang for getdescriptors")
    12+        # Create an infinite read that will timeout
    13+        import time
    14+        while True:
    15+            time.sleep(1)
    16     buffer = sys.stdin.read()
    17     if buffer and buffer.rstrip() != "":
    18         sys.argv.extend(buffer.rstrip().split(" "))
    

    TimeoutError: timed out

     0
     1During handling of the above exception, another exception occurred:
     2
     3Traceback (most recent call last):
     4  File "/home/ubuntu/Projects/bitcoin/test/functional/test_framework/test_framework.py", line 195, in main
     5    self.run_test()
     6  File "/home/ubuntu/Projects/bitcoin/./build/test/functional/wallet_signer.py", line 45, in run_test
     7    self.test_valid_signer()
     8  File "/home/ubuntu/Projects/bitcoin/./build/test/functional/wallet_signer.py", line 58, in test_valid_signer
     9    self.nodes[1].createwallet(wallet_name='hww', disable_private_keys=True, external_signer=True)
    10  File "/home/ubuntu/Projects/bitcoin/test/functional/test_framework/coverage.py", line 50, in __call__
    11    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
    12                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    13  File "/home/ubuntu/Projects/bitcoin/test/functional/test_framework/authproxy.py", line 137, in __call__
    14    response, status = self._request('POST', self.__url.path, postdata.encode('utf-8'))
    15                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    16  File "/home/ubuntu/Projects/bitcoin/test/functional/test_framework/authproxy.py", line 111, in _request
    17    return self._get_response()
    18           ^^^^^^^^^^^^^^^^^^^^
    19  File "/home/ubuntu/Projects/bitcoin/test/functional/test_framework/authproxy.py", line 176, in _get_response
    20    raise JSONRPCException({
    21test_framework.authproxy.JSONRPCException: 'createwallet' RPC took longer than 30.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)
    22```
    23`
    
  17. tnndbtc commented at 9:09 pm on July 18, 2025: none

    @naiyoma I also suspect this might be the cause and put comment in #32524 (comment) . As stated in that comment, one possible reason is that the close of err_wr_pipe, which should send the empty string to stdin for the python process (signer.py) to confirm, could return -1 but bitcoind didn’t check the return result. subprocess_close(err_wr_pipe);// close child side of pipe, else get stuck in read below

    Typically, close of file descriptor or pipe shouldn’t cause error, but it’s still good to add a result check in bitcoind code to ensure, or help rule out when next time this issue happens.

  18. naiyoma commented at 8:43 pm on July 19, 2025: contributor

    As stated in that comment, one possible reason is that the close of err_wr_pipe, which should send the empty string to stdin for the python process (signer.py) to confirm, could return -1 but bitcoind didn’t check the return result. subprocess_close(err_wr_pipe);// close child side of pipe, else get stuck in read below

    I am not sure about this because on my end subprocess works okay

    0subprocess_close(err_wr_pipe);// close child side of pipe, else get stuck in read below
    1fprintf(stderr, "PARENT: Closed err_wr_pipe (fd=%d)\n", err_wr_pipe);
    
    0PARENT: Closed err_wr_pipe (fd=34)
    1PARENT: Closed err_wr_pipe (fd=34)
    
  19. naiyoma commented at 9:05 pm on July 19, 2025: contributor

    Likely the stdin is missing in this particular run and caused test/functional/mocks/signer.py to hang.

    I agree with this

    in RunCommandParseJSON str_std_in is empty, I logged

    0 if(str_std_in.empty()) {
    1        LogPrintf("RunCommandParseJSON: No stdin data to send\n");
    2    } 
    

    logs

    0 grep "RunCommandParseJSON" /tmp/bitcoin_func_test_09pw1wgg/node1/regtest/debug.log
    12025-07-19T20:52:09.711192Z [httpworker.1] [common/run_command.cpp:29] [RunCommandParseJSON] RunCommandParseJSON: command='/usr/bin/python3 /home/ubuntu/Projects/bitcoin/test/functional/test_framework/../mock_external_signer.py enumerate', stdin_length=0, stdin_empty=true
    22025-07-19T20:52:09.720216Z [httpworker.1] [common/run_command.cpp:32] [RunCommandParseJSON] RunCommandParseJSON: No stdin data to send
    32025-07-19T20:52:09.837336Z [httpworker.1] [common/run_command.cpp:29] [RunCommandParseJSON] RunCommandParseJSON: command='/usr/bin/python3 /home/ubuntu/Projects/bitcoin/test/functional/test_framework/../mock_external_signer.py --fingerprint 00000001 --chain regtest getdescriptors --account 0', stdin_length=0, stdin_empty=true
    42025-07-19T20:52:09.840036Z [httpworker.1] [common/run_command.cpp:32] [RunCommandParseJSON] RunCommandParseJSON: No stdin data to send
    
  20. tnndbtc commented at 6:11 pm on July 21, 2025: none
    @naiyoma For if(str_std_in.empty()), it will always return true in this case. However, the empty string is sent by successful calling of subprocess_close(err_wr_pipe);// close child side of pipe, else get stuck in read below. If you comment out this line, you will see that read from stdin in signer.py will 100% get stuck.
  21. fanquake closed this on Jul 26, 2025

  22. fanquake referenced this in commit 75a5c8258e on Jul 26, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-08-03 06:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me