intermittent timeout in wallet_signer.py : 'createwallet' RPC took longer than 1200.000000 seconds #32855

issue maflcko opened this issue on July 2, 2025
  1. maflcko commented at 7:52 AM on July 2, 2025: member

    in the multiprocess, i686, DEBUG task in self.nodes[1].createwallet(wallet_name='hww_disconnect', disable_private_keys=True, external_signer=True),

    https://cirrus-ci.com/task/5704849158832128?logs=ci#L2967:

     node1 2025-06-26T14:05:40.072184Z [msghand] [net_processing.cpp:3452] [ProcessMessage] [net] received: pong (8 bytes) peer=0 
     test  2025-06-26T14:05:43.222938Z TestFramework (ERROR): JSONRPC error 
                                       Traceback (most recent call last):
                                         File "/ci_container_base/test/functional/test_framework/authproxy.py", line 174, in _get_response
                                           http_response = self.__conn.getresponse()
                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^
                                         File "/usr/lib/python3.12/http/client.py", line 1428, in getresponse
                                           response.begin()
                                         File "/usr/lib/python3.12/http/client.py", line 331, in begin
                                           version, status, reason = self._read_status()
                                                                     ^^^^^^^^^^^^^^^^^^^
                                         File "/usr/lib/python3.12/http/client.py", line 292, in _read_status
                                           line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                         File "/usr/lib/python3.12/socket.py", line 707, in readinto
                                           return self._sock.recv_into(b)
                                                  ^^^^^^^^^^^^^^^^^^^^^^^
                                       TimeoutError: timed out
                                       During handling of the above exception, another exception occurred:
                                       Traceback (most recent call last):
                                         File "/ci_container_base/test/functional/test_framework/test_framework.py", line 190, in main
                                           self.run_test()
                                         File "/ci_container_base/ci/scratch/build-i686-pc-linux-gnu/test/functional/wallet_signer.py", line 59, in run_test
                                           self.test_disconnected_signer()
                                         File "/ci_container_base/ci/scratch/build-i686-pc-linux-gnu/test/functional/wallet_signer.py", line 246, in test_disconnected_signer
                                           self.nodes[1].createwallet(wallet_name='hww_disconnect', disable_private_keys=True, external_signer=True)
                                         File "/ci_container_base/test/functional/test_framework/coverage.py", line 50, in __call__
                                           return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                         File "/ci_container_base/test/functional/test_framework/authproxy.py", line 137, in __call__
                                           response, status = self._request('POST', self.__url.path, postdata.encode('utf-8'))
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                         File "/ci_container_base/test/functional/test_framework/authproxy.py", line 111, in _request
                                           return self._get_response()
                                                  ^^^^^^^^^^^^^^^^^^^^
                                         File "/ci_container_base/test/functional/test_framework/authproxy.py", line 176, in _get_response
                                           raise JSONRPCException({
                                       test_framework.authproxy.JSONRPCException: 'createwallet' RPC took longer than 1200.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)
     test  2025-06-26T14:05:43.230396Z TestFramework (DEBUG): Closing down network thread 
    
  2. maflcko added the label CI failed on Jul 2, 2025
  3. maflcko added the label Tests on Jul 2, 2025
  4. tnndbtc commented at 3:51 PM on July 3, 2025: none

    @maflcko Is there a way to retrieve debug.log (in /ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250626_133646/) while the test fails so it may provide more clues on why RPC took longer?

  5. maflcko commented at 4:22 PM on July 3, 2025: member

    You can just follow the links to "Open full logs" to get to https://api.cirrus-ci.com/v1/task/5704849158832128/logs/ci.log, but I don't think there are any details in the log that give more hints, on a quick skim.

  6. bitcoin deleted a comment on Jul 3, 2025
  7. Sjors commented at 7:51 AM on July 7, 2025: member

    Assuming this is specific to external wallets, and not an unrelated bug in the test itself, the most interesting code is:

    void CWallet::SetupDescriptorScriptPubKeyMans()
    {
    ...
            UniValue signer_res = signer->GetDescriptors(account);
    

    Beyond that wallet creation is more or less the same between regular and external signer wallets.

    UniValue ExternalSigner::GetDescriptors(const int account)
    {
        return RunCommandParseJSON(m_command + " --fingerprint " + m_fingerprint + NetworkArg() + " getdescriptors --account " + strprintf("%d", account));
    }
    

    In the functional test this results in a call to mocks/signer.py that's handled by def getdescriptors(args).

    Note that test_disconnected_signer is not the first call to mocks/signer.py in the functional test. It would be useful to know if it always times out at this specific line or also elsewhere in the test.

    This test was recently added in #32682, but the createwallet part of the new test is not unique.

  8. maflcko commented at 8:30 AM on July 7, 2025: member

    I'd say it would be useful if the mocks scripts had more debug logging enabled; maybe some command tracing when entering the script and when exiting it?

  9. Sjors commented at 2:44 PM on July 9, 2025: member

    @maflcko that's easier said than done, but see #32928!

  10. tnndbtc commented at 4:45 PM on July 10, 2025: none

    Thanks for @Sjors ' quick clue on the potential issue. I searched the CI output and agree that it's likely due to the calling of

    return RunCommandParseJSON(m_command + " --fingerprint " + m_fingerprint + NetworkArg() + " getdescriptors --account " + strprintf("%d", account)); which requires a read from stdin in test/functional/mocks/signer.py

    if not sys.stdin.isatty():  # on my local mac test, this is not from terminal so it will try to run read()
        buffer = sys.stdin.read() # this one read an empty string from stdin
    

    Likely the stdin is missing in this particular run and caused test/functional/mocks/signer.py to hang. The execution of the python code is implemented in src/util/subprocess.h (Popen(std::initializer_list<const char*> cmd_args, Args&& ...args)), which I haven't researched further on possible bugs that can cause the pipe not flushed, or closed too early before child process can consume it.

    Since this mock/signer.py is very isolated from regular python test framework so it's hard to add debug traces that can be caught by CI process. I would propose adding a log in C++ code, like this:

            WalletLogPrintf("Begin of GetDescriptors"); // add debug log
            UniValue signer_res = signer->GetDescriptors(account);
            WalletLogPrintf("End of GetDescriptors"); // add debug log
    

    This is not ideal, but it will give a good idea on what goes wrong next time it happens.

    More details on how I identified the problematic code, I searched the full CI output:

    [10:05:43.853]  node1 2025-06-26T13:45:43.128865Z [httpworker.8] [wallet/wallet.h:931] [WalletLogPrintf] [hww_disconnect] Setting minversion to 169900
    [10:05:43.853]  node1 2025-06-26T13:45:43.128982Z [httpworker.8] [wallet/sqlite.cpp:55] [TraceSqlCallback] [walletdb:trace] [/ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250626_133646/wallet_signer_216/node1/regtest/hww_disconnect/wallet.dat] SQLite Statement: INSERT or REPLACE into main values(?, ?)
    [10:05:43.853]  node1 2025-06-26T13:45:43.129258Z [httpworker.8] [wallet/sqlite.cpp:55] [TraceSqlCallback] [walletdb:trace] [/ci_container_base/ci/scratch/test_runner/test_runner_₿_🏃_20250626_133646/wallet_signer_216/node1/regtest/hww_disconnect/wallet.dat] SQLite Statement: INSERT or REPLACE into main values(?, ?)
    

    Then there is no more logs for thread: httpworker.8, which indicates a hanging issue. However, from a good test, the traces would show.

    walletInstance->SetMinVersion(FEATURE_LATEST); // in CWallet::Create , this will generate following log:
    2025-07-10T16:13:48.468246Z [httpworker.7] [wallet/wallet.h:942] [WalletLogPrintf] [hww_disconnect] Setting minversion to 169900
    2025-07-10T16:13:48.468268Z [httpworker.7] [wallet/sqlite.cpp:55] [TraceSqlCallback] [walletdb:trace] [/tmp/test_runner_₿_🏃_20250710_121343/wallet_signer_0/node1/regtest/hww_disconnect/wallet.dat] SQLite Statement: INSERT or REPLACE into main values(?, ?)
    
    walletInstance->InitWalletFlags(wallet_creation_flags | WALLET_FLAG_LAST_HARDENED_XPUB_CACHED); // in CWallet::Create, this one will generate following logs
    2025-07-10T16:13:48.468307Z [httpworker.7] [wallet/sqlite.cpp:55] [TraceSqlCallback] [walletdb:trace] [/tmp/test_runner_₿_🏃_20250710_121343/wallet_signer_0/node1/regtest/hww_disconnect/wallet.dat] SQLite Statement: INSERT or REPLACE into main values(?, ?)
    
    UniValue signer_res = signer->GetDescriptors(account); // in CWallet::SetupDescriptorScriptPubKeyMans(), this one requires a read from stdin
    
    if (!batch.TxnBegin()) throw std::runtime_error("Error: cannot create db transaction for descriptors import"); // in CWallet::SetupDescriptorScriptPubKeyMans(), this one will generate following log
    2025-07-10T16:13:48.534299Z [httpworker.7] [wallet/sqlite.cpp:55] [TraceSqlCallback] [walletdb:trace] [/tmp/test_runner_₿_🏃_20250710_121343/wallet_signer_0/node1/regtest/hww_disconnect/wallet.dat] SQLite Statement: BEGIN TRANSACTION
    

    Based on the log traces, the python statement that causes wallet creation failure is: self.nodes[1].createwallet(wallet_name='hww_disconnect', disable_private_keys=True, external_signer=True)

  11. maflcko commented at 5:51 PM on July 10, 2025: member

    Likely the stdin is missing in this particular run and caused test/functional/mocks/signer.py to hang.

    What does "stdin is missing" mean? It should always be present and point to some file descriptor.

    I think the only way this could hang is when the process spawning the signer is also waiting on it, without first closing the pipes. However this seems to happen correctly, looking at RunCommandParseJSON:

        auto c = sp::Popen(str_command, sp::input{sp::PIPE}, sp::output{sp::PIPE}, sp::error{sp::PIPE}, sp::close_fds{true});
        if (!str_std_in.empty()) {
            c.send(str_std_in);
        }
        auto [out_res, err_res] = c.communicate();
    
  12. maflcko commented at 5:53 PM on July 10, 2025: member

    Maybe this is #32524, but with N=1 and no way to reproduce, this seems difficult to debug.

  13. bitcoin deleted a comment on Jul 11, 2025
  14. tnndbtc commented at 11:46 PM on July 12, 2025: none

    @maflcko For "stdin is missing", I was referring to a suspicion that bitcoind failed to write an empty string to the pipe and caused signer.py waiting on buffer = sys.stdin.read(). However, I was wrong about that.

    After I checked the call stack posted in #32524 , it indicated that bitcoind got stuck on a read() call from err_rd_pipe in util::read_atmost_n, which is to read from the child process, i.e., rpc_signer.py.

    Also, the bitcoind process indeed has closed the write pipe, by calling subprocess_close(err_wr_pipe); which will trigger the signer.py to get a empty string from buffer = sys.stdin.read(), so this part is good.

    I'd spend more time on this subprocess::process_execute() logic and hopefully find out the potential bug.

  15. maflcko commented at 6:58 AM on July 14, 2025: member

    Possibly similar root cause: https://cirrus-ci.com/task/5280224666976256?logs=ci#L1710

    At least on CI, back when I tested #32529 by Mara did seem to improve the situation.

    I wonder if we should re-open and use that pull, even if there isn't a reproducer or further details available.

  16. naiyoma commented at 8:44 PM on July 18, 2025: contributor

    I was testing #32928 and made an attempt to reproduce this issue by hanging a getdescriptors call

    index aa9d286417..edbd5359ee 100755
    --- a/test/functional/mock_external_signer.py
    +++ b/test/functional/mock_external_signer.py
    @@ -97,6 +97,15 @@ parser_signtx.add_argument('psbt', metavar='psbt')
     parser_signtx.set_defaults(func=signtx)
     
     if not sys.stdin.isatty():
    +    log.debug("About to read from stdin")
    +    log.debug(f"Current args: {sys.argv}")
    +    # Hang the getdescriptors call
    +    if "getdescriptors" in sys.argv:
    +        log.debug("Naiyoma HANGING: stdin hang for getdescriptors")
    +        # Create an infinite read that will timeout
    +        import time
    +        while True:
    +            time.sleep(1)
         buffer = sys.stdin.read()
         if buffer and buffer.rstrip() != "":
             sys.argv.extend(buffer.rstrip().split(" "))
    

    TimeoutError: timed out

    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/home/ubuntu/Projects/bitcoin/test/functional/test_framework/test_framework.py", line 195, in main
        self.run_test()
      File "/home/ubuntu/Projects/bitcoin/./build/test/functional/wallet_signer.py", line 45, in run_test
        self.test_valid_signer()
      File "/home/ubuntu/Projects/bitcoin/./build/test/functional/wallet_signer.py", line 58, in test_valid_signer
        self.nodes[1].createwallet(wallet_name='hww', disable_private_keys=True, external_signer=True)
      File "/home/ubuntu/Projects/bitcoin/test/functional/test_framework/coverage.py", line 50, in __call__
        return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ubuntu/Projects/bitcoin/test/functional/test_framework/authproxy.py", line 137, in __call__
        response, status = self._request('POST', self.__url.path, postdata.encode('utf-8'))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/ubuntu/Projects/bitcoin/test/functional/test_framework/authproxy.py", line 111, in _request
        return self._get_response()
               ^^^^^^^^^^^^^^^^^^^^
      File "/home/ubuntu/Projects/bitcoin/test/functional/test_framework/authproxy.py", line 176, in _get_response
        raise JSONRPCException({
    test_framework.authproxy.JSONRPCException: 'createwallet' RPC took longer than 30.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)
    ```
    `
    
  17. tnndbtc commented at 9:09 PM on July 18, 2025: none

    @naiyoma I also suspect this might be the cause and put comment in #32524 (comment) . As stated in that comment, one possible reason is that the close of err_wr_pipe, which should send the empty string to stdin for the python process (signer.py) to confirm, could return -1 but bitcoind didn't check the return result. subprocess_close(err_wr_pipe);// close child side of pipe, else get stuck in read below

    Typically, close of file descriptor or pipe shouldn't cause error, but it's still good to add a result check in bitcoind code to ensure, or help rule out when next time this issue happens.

  18. naiyoma commented at 8:43 PM on July 19, 2025: contributor

    As stated in that comment, one possible reason is that the close of err_wr_pipe, which should send the empty string to stdin for the python process (signer.py) to confirm, could return -1 but bitcoind didn't check the return result. subprocess_close(err_wr_pipe);// close child side of pipe, else get stuck in read below

    I am not sure about this because on my end subprocess works okay

    subprocess_close(err_wr_pipe);// close child side of pipe, else get stuck in read below
    fprintf(stderr, "PARENT: Closed err_wr_pipe (fd=%d)\n", err_wr_pipe);
    
    PARENT: Closed err_wr_pipe (fd=34)
    PARENT: Closed err_wr_pipe (fd=34)
    
  19. naiyoma commented at 9:05 PM on July 19, 2025: contributor

    Likely the stdin is missing in this particular run and caused test/functional/mocks/signer.py to hang.

    I agree with this

    in RunCommandParseJSON str_std_in is empty, I logged

     if(str_std_in.empty()) {
            LogPrintf("RunCommandParseJSON: No stdin data to send\n");
        } 
    

    logs

     grep "RunCommandParseJSON" /tmp/bitcoin_func_test_09pw1wgg/node1/regtest/debug.log
    2025-07-19T20:52:09.711192Z [httpworker.1] [common/run_command.cpp:29] [RunCommandParseJSON] RunCommandParseJSON: command='/usr/bin/python3 /home/ubuntu/Projects/bitcoin/test/functional/test_framework/../mock_external_signer.py enumerate', stdin_length=0, stdin_empty=true
    2025-07-19T20:52:09.720216Z [httpworker.1] [common/run_command.cpp:32] [RunCommandParseJSON] RunCommandParseJSON: No stdin data to send
    2025-07-19T20:52:09.837336Z [httpworker.1] [common/run_command.cpp:29] [RunCommandParseJSON] RunCommandParseJSON: command='/usr/bin/python3 /home/ubuntu/Projects/bitcoin/test/functional/test_framework/../mock_external_signer.py --fingerprint 00000001 --chain regtest getdescriptors --account 0', stdin_length=0, stdin_empty=true
    2025-07-19T20:52:09.840036Z [httpworker.1] [common/run_command.cpp:32] [RunCommandParseJSON] RunCommandParseJSON: No stdin data to send
    
  20. tnndbtc commented at 6:11 PM on July 21, 2025: none

    @naiyoma For if(str_std_in.empty()), it will always return true in this case. However, the empty string is sent by successful calling of subprocess_close(err_wr_pipe);// close child side of pipe, else get stuck in read below. If you comment out this line, you will see that read from stdin in signer.py will 100% get stuck.

  21. fanquake closed this on Jul 26, 2025

  22. fanquake referenced this in commit 75a5c8258e on Jul 26, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-22 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me