test: display abrupt shutdown errors in console output #28253

pull furszy wants to merge 1 commits into bitcoin:master from furszy:2023_test_framework_print_stderr changing 1 files +6 −1
  1. furszy commented at 9:57 PM on August 10, 2023: member

    Making it easier to debug errors in the CI environment, particularly in scenarios where it's not immediately clear what happened nor which node crashed (or shutdown abruptly).

    A bit of context: Currently, the test framework redirects each node's stderr output stream to a different temporary file inside each node's data directory. While this is sufficient for storing the error, it isn't very helpful for understanding what happened just by reading the CI console output.

    Most of the time, reading the stderr file in the CI environment is not possible, because people don't have access to it.

    Testing Note: The displayed error difference can be observed by cherry-picking this commit https://github.com/furszy/bitcoin-core/commit/9cc5393c0f24c774b4a59706fc7c7873a59573e7 on top of this branch and running any functional test.

  2. DrahtBot commented at 9:57 PM on August 10, 2023: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK MarcoFalke, theStack
    Concept ACK TheCharlatan
    Stale ACK hebasto

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. DrahtBot added the label Tests on Aug 10, 2023
  4. TheCharlatan commented at 9:34 AM on August 11, 2023: contributor

    Concept ACK

  5. maflcko commented at 3:55 PM on August 14, 2023: member

    The CI uses the combine logs helper. Can you link to a CI output before and after?

  6. furszy commented at 8:39 PM on August 14, 2023: member

    The CI uses the combine logs helper. Can you link to a CI output before and after?

    Sure.

    Before https://github.com/furszy/bitcoin-core/runs/15887889073. No information provided. After https://github.com/furszy/bitcoin-core/runs/15886179713. The abrupt shutdown reason is printed.

    The commit causing the abrupt shutdown is the one shared in the PR description.

  7. in test/functional/test_framework/test_node.py:245 in 2f19be6910 outdated
     240 | +                        print(f"'{self.datadir_path.name}' abruptly aborted with error:\n")
     241 | +                        print(''.join(line for line in err_file if line.strip() and line[0] != "*"))
     242 | +                        print("************************")
     243 | +
     244 |                  raise FailedToStartError(self._node_msg(
     245 |                      'bitcoind exited with status {} during initialization'.format(self.process.returncode)))
    


    maflcko commented at 7:01 AM on August 15, 2023:

    Why not append your message to this message?


    furszy commented at 1:41 PM on August 15, 2023:

    The idea was to separate the stderr output from the exception being thrown here. Mainly to get a more readable print when a crash happens. The FailedToStartError exception message appears in-between two long stack traces.

    But, either way is ok for me.


    maflcko commented at 1:45 PM on August 15, 2023:

    If you want to keep it, it would also be good to show an example output (CI and non-CI) when this is printed in a "normal" test (not create_cache)


    furszy commented at 2:02 PM on August 15, 2023:

    Why might the output differ between CI and non-CI runs? The test framework redirect stderr to a file, and this file seems to remain untouched regardless of the execution environment.

    Also, the "normal" test (not create_cache) tense isn't clear for me. Does it refer to running an individual test locally?


    maflcko commented at 2:15 PM on August 15, 2023:

    Why might the output differ between CI and non-CI runs?

    I thought that the test_runner.py may be doing something with stderr, but it looks like it will get printed:

                    print(BOLD[1] + 'stdout:\n' + BOLD[0] + stdout + '\n')
                    print(BOLD[1] + 'stderr:\n' + BOLD[0] + stderr + '\n')
                    if combined_logs_len and os.path.isdir(testdir):
                        # Print the final `combinedlogslen` lines of the combined logs
    

    Also, the "normal" test (not create_cache) tense isn't clear for me. Does it refer to running an individual test locally?

    Yes, I mean any test that is not create_cache.py.

    Mainly to get a more readable print when a crash happens. The FailedToStartError exception message appears in-between two long stack traces.

    Looking at my code block above, it looks like the error message would be easier to spot in the output if it was part of the combined output, as opposed to the python stack trace, no?

    You can use newlines to aid reading and break long blocks.


    furszy commented at 2:32 PM on August 15, 2023:

    Why might the output differ between CI and non-CI runs?

    I thought that the test_runner.py may be doing something with stderr, but it looks like it will get printed:

                    print(BOLD[1] + 'stdout:\n' + BOLD[0] + stdout + '\n')
                    print(BOLD[1] + 'stderr:\n' + BOLD[0] + stderr + '\n')
                    if combined_logs_len and os.path.isdir(testdir):
                        # Print the final `combinedlogslen` lines of the combined logs
    

    In a first glance, that doesn't seems to print the nodes' stderr. That seems to be related to the job stderr (the test run in a subprocess). And those aren't connected.

    Also, the "normal" test (not create_cache) tense isn't clear for me. Does it refer to running an individual test locally?

    Yes, I mean any test that is not create_cache.py.

    Ok. The result is pretty much the same. Just the stack trace is a bit longer.

    Mainly to get a more readable print when a crash happens. The FailedToStartError exception message appears in-between two long stack traces.

    Looking at my code block above, it looks like the error message would be easier to spot in the output if it was part of the combined output, as opposed to the python stack trace, no?

    You can use newlines to aid reading and break long blocks.

    As far as I can see, we don't support combined output on single test cases. So, that would make us lose the crash cause on individual test runs?


    maflcko commented at 2:40 PM on August 15, 2023:

    In a first glance, that doesn't seems to print the nodes' stderr. That seems to be related to the job stderr (the test run in a subprocess). And those aren't connected.

    Currently, you are reading the node's stderr and the printing it to stdout, so it should appear as print(BOLD[1] + 'stdout:\n' + BOLD[0] + stdout + '\n').

    Ok. The result is pretty much the same. Just the stack trace is a bit longer.

    The difference should be that the output won't be shown in the CI tail if the test has produced enough output.

    As far as I can see, we don't support combined output on single test cases. So, that would make us lose the crash cause on individual test runs?

    combined output is only supported for single test files.


    furszy commented at 2:52 PM on August 16, 2023:

    In a first glance, that doesn't seems to print the nodes' stderr. That seems to be related to the job stderr (the test run in a subprocess). And those aren't connected.

    Currently, you are reading the node's stderr and the printing it to stdout, so it should appear as print(BOLD[1] + 'stdout:\n' + BOLD[0] + stdout + '\n').

    Ok. The result is pretty much the same. Just the stack trace is a bit longer.

    The difference should be that the output won't be shown in the CI tail if the test has produced enough output.

    As far as I can see, we don't support combined output on single test cases. So, that would make us lose the crash cause on individual test runs?

    combined output is only supported for single test files.

    Hmm, this made me struggle. it seems that we are mixing test_runner.py functionality and the per test case functionality.

    The print(BOLD[1] + 'stdout:\n' + BOLD[0] + stdout + '\n') occurs only at the test_runner level. When the '-combinedlogslen' flag is provided. Then, for each test case, if we want to combine logs, we need to manually call to the combine_logs.py script.

    Guessing that this confusion arose because I was referring to running single test cases without the test runner, while you are thinking on running them with it. For instance, calling ./wallet_migration.py vs calling ./test_runner wallet_migration.py.

    Yet, I'm a bit confused about where we currently stand in the discussion. Could you please clarify the proposed changes?


    maflcko commented at 7:17 AM on August 17, 2023:

    Yet, I'm a bit confused about where we currently stand in the discussion. Could you please clarify the proposed changes?

    Not sure myself (I haven't tested this yet, just looked at the code). However, I think overall the best UX would be to append the stderr to this message, see https://github.com/bitcoin/bitcoin/pull/28253/files#r1294258580 (using * and \n to format it nicely)


    furszy commented at 2:49 PM on August 23, 2023:

    Ok. Pushed. This is how the CI error will look now https://github.com/furszy/bitcoin-core/runs/16145060771.

  8. maflcko commented at 7:03 AM on August 15, 2023: member

    Ok, so this will only be hit when Bitcoin Core is completely non-functional and completely refuses to start (in create_cache.py)? In all other cases, this should already be printed.

  9. furszy force-pushed on Aug 23, 2023
  10. furszy force-pushed on Aug 23, 2023
  11. furszy commented at 2:51 PM on August 23, 2023: member

    Updated per feedback. Thanks Marco.

    Placed stderr error inside the thrown exception instead of printing it directly. This is how the CI error looks now https://github.com/furszy/bitcoin-core/runs/16145060771.

  12. in test/functional/test_framework/test_node.py:236 in 6ac03a511c outdated
     231 | @@ -232,8 +232,18 @@ def wait_for_rpc_connection(self):
     232 |          poll_per_s = 4
     233 |          for _ in range(poll_per_s * self.rpc_timeout):
     234 |              if self.process.poll() is not None:
     235 | +                str_error = ""
     236 | +                if self.process.returncode != 0:
    


    maflcko commented at 2:54 PM on August 23, 2023:

    nit: Any reason for this check? Reading an empty stderr should be fine either way?


    furszy commented at 3:20 PM on August 23, 2023:

    Merely to not print the "********\n" string when there is no error. But just pushed a slightly different version only checking the stderr string instead of the return code.

  13. in test/functional/test_framework/test_node.py:239 in 6ac03a511c outdated
     231 | @@ -232,8 +232,18 @@ def wait_for_rpc_connection(self):
     232 |          poll_per_s = 4
     233 |          for _ in range(poll_per_s * self.rpc_timeout):
     234 |              if self.process.poll() is not None:
     235 | +                str_error = ""
     236 | +                if self.process.returncode != 0:
     237 | +                    # Print any abrupt shutdown error, so it is visible from the console and the
     238 | +                    # user doesn't have to go node by node checking which one failed.
     239 | +                    with open(self.stderr.name, 'r', encoding='utf-8') as err_file:
    


    maflcko commented at 2:54 PM on August 23, 2023:

    Are you sure .name is valid on all types? Wouldn't it be better to use seek(0) and read().decode(), like below?


    furszy commented at 3:20 PM on August 23, 2023:

    Sure. Pushed 👌🏼.

  14. maflcko approved
  15. maflcko commented at 2:55 PM on August 23, 2023: member

    lgtm ACK

  16. furszy force-pushed on Aug 23, 2023
  17. furszy commented at 4:06 PM on August 23, 2023: member

    Updated per feedback. Thanks.

    Same CI result as before: https://github.com/furszy/bitcoin-core/runs/16146994349.

  18. DrahtBot added the label CI failed on Sep 4, 2023
  19. DrahtBot removed the label CI failed on Sep 5, 2023
  20. hebasto commented at 1:54 PM on October 4, 2023: member

    Concept ACK.

  21. theStack commented at 10:39 PM on October 4, 2023: contributor

    Concept ACK

  22. in test/functional/test_framework/test_node.py:238 in 36b6f308e4 outdated
     231 | @@ -232,8 +232,18 @@ def wait_for_rpc_connection(self):
     232 |          poll_per_s = 4
     233 |          for _ in range(poll_per_s * self.rpc_timeout):
     234 |              if self.process.poll() is not None:
     235 | +                # Attach abrupt shutdown error/s to the exception message
     236 | +                str_error = ''
     237 | +                self.stderr.seek(0)
     238 | +                err = ''.join(line for line in self.stderr.read().decode('utf-8').strip() if line[0] != "*")
    


    theStack commented at 11:37 PM on October 4, 2023:

    This list comprehension results in a list of characters rather than a list of lines, and filters out all * characters, rather than only lines starting with *. I assume what you intended would look more like this:

                    err = ''.join(line.decode('utf-8') for line in self.stderr if not line.startswith(b"*"))
    

    (On the other hand, do we need this * filter at all?)

    I tested with the following patch:

    diff --git a/src/init.cpp b/src/init.cpp
    index 33389e695..e6509cc85 100644
    --- a/src/init.cpp
    +++ b/src/init.cpp
    @@ -1661,6 +1661,7 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
         }
     
         chainman.m_thread_load = std::thread(&util::TraceThread, "initload", [=, &chainman, &args, &node] {
    +        throw std::logic_error("I'm an ugly error.\n*** This line should be skipped.\nThis line should be visible. ***");
             // Import blocks
             ImportBlocks(chainman, vImportFiles);
             if (args.GetBoolArg("-stopafterblockimport", DEFAULT_STOPAFTERBLOCKIMPORT)) {
    

    furszy commented at 12:31 AM on October 5, 2023:

    On the other hand, do we need this * filter at all?

    We need it to present the crashed/aborted node's identification within the error message. Because the stderr message contain the * separator (link).

    In other words, right now the format is:

    ****************
    <node_datadir> abruptly aborted with error:
    
    <error message>
    ****************
    

    And what you propose would looks like:

    <node_datadir> abruptly aborted with error:
    ****************
    <error message>
    ****************
    

    Which I think that is harder to read.


    theStack commented at 11:05 AM on October 5, 2023:

    Okay I see, that makes sense.

  23. furszy force-pushed on Oct 5, 2023
  24. furszy force-pushed on Oct 5, 2023
  25. furszy commented at 12:40 AM on October 5, 2023: member

    Updated per feedback, thanks theStack!

    Diff is minimal.

  26. DrahtBot added the label CI failed on Oct 5, 2023
  27. hebasto approved
  28. hebasto commented at 9:20 AM on October 5, 2023: member

    ACK 2633152c1a53a5c961c83cbb97a91dab6c59fc43, tested on GHA:

    • native macOS:
    test_framework.test_node.FailedToStartError: [node 0] bitcoind exited with status -6 during initialization. 
    
    ************************
    'node0' abruptly aborted with error:
    
    EXCEPTION: St11logic_error       
    I'm an ugly error. The blockchain is completely messed up.       
    bitcoin in initload       
    
    libc++abi: terminating due to uncaught exception of type std::logic_error: I'm an ugly error. The blockchain is completely messed up.
    ************************
    
    2023-10-05T09:09:57.765000Z TestFramework (INFO): Stopping nodes
    
    • native Windows:
    test_framework.test_node.FailedToStartError: [node 0] bitcoind exited with status 3 during initialization. 
    
    ************************
    'node0' abruptly aborted with error:
    
    
    
    EXCEPTION: class std::logic_error       
    
    I'm an ugly error. The blockchain is completely messed up.       
    
    D:\a\bitcoin\bitcoin\src\bitcoind.exe in initload       
    
    
    
    ************************
    
    2023-10-05T09:10:35.169000Z TestFramework (INFO): Stopping nodes
    
  29. DrahtBot removed the label CI failed on Oct 5, 2023
  30. in test/functional/test_framework/test_node.py:241 in 2633152c1a outdated
     236 | +                str_error = ''
     237 | +                self.stderr.seek(0)
     238 | +                err = ''.join(line.decode('utf-8') for line in self.stderr if not line.startswith(b"*"))
     239 | +                if err != '':
     240 | +                    str_error += "\n\n************************\n"
     241 | +                    str_error += f"'{self.datadir_path.name}' abruptly aborted with error:"
    


    maflcko commented at 10:22 AM on October 5, 2023:

    nit: Is this needed? _node_msg already prints the node index. Also you could write f"'node{self.index}' ... (shorter)


    furszy commented at 12:46 PM on October 5, 2023:

    hmm ok. I run this dozens of times and never noticed it.. pushed it. Now the code is much shorter.

  31. maflcko approved
  32. maflcko commented at 10:23 AM on October 5, 2023: member

    lgtm

  33. theStack approved
  34. theStack commented at 11:04 AM on October 5, 2023: contributor

    ACK 2633152c1a53a5c961c83cbb97a91dab6c59fc43

  35. test: display abrupt shutdown errors in console output
    Making it easier to debug errors in the CI environment,
    particularly in scenarios where it's not immediately clear
    what happened nor which node crashed (or shutdown abruptly).
    0f83ab407e
  36. furszy force-pushed on Oct 5, 2023
  37. furszy commented at 12:48 PM on October 5, 2023: member

    Updated per feedback, small diff. Now the code is much shorter.

  38. maflcko commented at 12:50 PM on October 5, 2023: member

    lgtm ACK 0f83ab407ec5aa0591c54c03bcf408c7f2f0a192

  39. DrahtBot requested review from theStack on Oct 5, 2023
  40. DrahtBot requested review from hebasto on Oct 5, 2023
  41. maflcko commented at 1:45 PM on October 5, 2023: member

    Nice.

    Just tested this to debug a crash and found it useful.

  42. theStack approved
  43. theStack commented at 8:40 PM on October 5, 2023: contributor

    ACK 0f83ab407ec5aa0591c54c03bcf408c7f2f0a192

  44. maflcko added this to the milestone 26.0 on Oct 6, 2023
  45. fanquake merged this on Oct 6, 2023
  46. fanquake closed this on Oct 6, 2023

  47. furszy deleted the branch on Oct 6, 2023
  48. Frank-GER referenced this in commit 417519231c on Oct 13, 2023
  49. bitcoin locked this on Oct 5, 2024


hebasto

Labels

Milestone
26.0


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-16 00:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me