test: Do not pass tests on unhandled exceptions #33001

pull maflcko wants to merge 2 commits into bitcoin:master from maflcko:2507-test-actually-fail-on-failure changing 1 files +3 −15
  1. maflcko commented at 10:43 am on July 17, 2025: member

    Currently the functional tests are problematic, because they pass, even if they encounter an unhanded exception.

    Fix this by handling all exceptions: Catch BaseException as fallback and mark it as failure.

    Can be tested via:

     0diff --git a/test/functional/wallet_disable.py b/test/functional/wallet_disable.py
     1index da6e5d408f..ecc41fb041 100755
     2--- a/test/functional/wallet_disable.py
     3+++ b/test/functional/wallet_disable.py
     4@@ -19,6 +19,7 @@ class DisableWalletTest (BitcoinTestFramework):
     5         self.wallet_names = []
     6 
     7     def run_test (self):
     8+        import sys;sys.exit("fatal error")
     9         # Make sure wallet is really disabled
    10         assert_raises_rpc_error(-32601, 'Method not found', self.nodes[0].getwalletinfo)
    11         x = self.nodes[0].validateaddress('3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy')
    

    Previously, the test would pass. With this patch, it would fail.

  2. in test/functional/test_framework/test_framework.py:208 in fad0cd510c outdated
    214@@ -215,6 +215,9 @@ def main(self):
    215         except KeyboardInterrupt:
    216             self.log.warning("Exiting after keyboard interrupt")
    217             self.success = TestStatus.FAILED
    218+        except BaseException as e:
    219+            self.log.warning(f"Unhandled base exception {repr(e)}")
    220+            self.success = TestStatus.FAILED
    


    stickies-v commented at 1:36 pm on July 17, 2025:

    All these exception handlers (except SkipTest) have the same behaviour, except for making the logs less useful by hiding the actual error. Can’t we just remove all (except SkipTest and BaseException) and let the exception speak for itself? + updates log to exception instead of warning.

     0diff --git a/test/functional/test_framework/test_framework.py b/test/functional/test_framework/test_framework.py
     1index 21007d9156..17e5a6b397 100755
     2--- a/test/functional/test_framework/test_framework.py
     3+++ b/test/functional/test_framework/test_framework.py
     4@@ -194,29 +194,11 @@ class BitcoinTestFramework(metaclass=BitcoinTestMetaClass):
     5             else:
     6                 self.run_test()
     7 
     8-        except JSONRPCException:
     9-            self.log.exception("JSONRPC error")
    10-            self.success = TestStatus.FAILED
    11         except SkipTest as e:
    12             self.log.warning("Test Skipped: %s" % e.message)
    13             self.success = TestStatus.SKIPPED
    14-        except AssertionError:
    15-            self.log.exception("Assertion failed")
    16-            self.success = TestStatus.FAILED
    17-        except KeyError:
    18-            self.log.exception("Key error")
    19-            self.success = TestStatus.FAILED
    20-        except subprocess.CalledProcessError as e:
    21-            self.log.exception("Called Process failed with '{}'".format(e.output))
    22-            self.success = TestStatus.FAILED
    23-        except Exception:
    24-            self.log.exception("Unexpected exception caught during testing")
    25-            self.success = TestStatus.FAILED
    26-        except KeyboardInterrupt:
    27-            self.log.warning("Exiting after keyboard interrupt")
    28-            self.success = TestStatus.FAILED
    29         except BaseException as e:
    30-            self.log.warning(f"Unhandled base exception {repr(e)}")
    31+            self.log.exception(repr(e))
    32             self.success = TestStatus.FAILED
    33         finally:
    34             exit_code = self.shutdown()
    

    maflcko commented at 2:57 pm on July 17, 2025:
    Agree; thx, done

    stickies-v commented at 3:15 pm on July 17, 2025:
    Logging the exception e seems more useful than “Unexpected exception”, and imo completely obviates the need for having the CalledProcessError and KeyboardInterrupt special case handling?

    stickies-v commented at 3:21 pm on July 17, 2025:

    Logs with special case handling:

    02025-07-17T15:07:39.774663Z TestFramework (ERROR): Called Process failed with 'None'
    12025-07-17T15:18:22.039274Z TestFramework (WARNING): Exiting after keyboard interrupt
    

    Logs without special case handling:

    02025-07-17T15:19:09.169925Z TestFramework (ERROR): CalledProcessError(1, ['ls', '--invalid-flag'])
    12025-07-17T15:19:26.639162Z TestFramework (ERROR): KeyboardInterrupt()
    

    Imo, they’re equally informative for KeyboardInterrupt, and more informative without special case handling for CalledProcessError, with less code.

     0diff --git a/test/functional/feature_abortnode.py b/test/functional/feature_abortnode.py
     1index a5c8aa163a..748875b25d 100755
     2--- a/test/functional/feature_abortnode.py
     3+++ b/test/functional/feature_abortnode.py
     4@@ -22,6 +22,8 @@ class AbortNodeTest(BitcoinTestFramework):
     5         # We'll connect the nodes later
     6 
     7     def run_test(self):
     8+        import subprocess
     9+        subprocess.run(["ls", "--invalid-flag"], check=True)
    10         self.generate(self.nodes[0], 3, sync_fun=self.no_op)
    11 
    12         # Deleting the undo file will result in reorg failure
    13diff --git a/test/functional/test_framework/test_framework.py b/test/functional/test_framework/test_framework.py
    14index 823d946cfe..17e5a6b397 100755
    15--- a/test/functional/test_framework/test_framework.py
    16+++ b/test/functional/test_framework/test_framework.py
    17@@ -197,14 +197,8 @@ class BitcoinTestFramework(metaclass=BitcoinTestMetaClass):
    18         except SkipTest as e:
    19             self.log.warning("Test Skipped: %s" % e.message)
    20             self.success = TestStatus.SKIPPED
    21-        except subprocess.CalledProcessError as e:
    22-            self.log.exception("Called Process failed with '{}'".format(e.output))
    23-            self.success = TestStatus.FAILED
    24-        except KeyboardInterrupt:
    25-            self.log.warning("Exiting after keyboard interrupt")
    26-            self.success = TestStatus.FAILED
    27         except BaseException as e:
    28-            self.log.exception(f"Unexpected exception")
    29+            self.log.exception(repr(e))
    30             self.success = TestStatus.FAILED
    31         finally:
    32             exit_code = self.shutdown()
    

    maflcko commented at 3:59 pm on July 17, 2025:

    more informative without special case handling for CalledProcessError, with less code.

    I don’t think this is true. It is missing the output. You can try this for any command that captures the output:

    subprocess.run(['bash', '-c', 'echo aaaaa && false'], check=True, capture_output=True)

    There could be an argument about stderr, but this seems better as a separate commit or pull.

    KeyboardInterrupt

    It currently uses self.log.warning, which is different in verbosity. I don’t think it makes sense to show the traceback when the user pressed CTRL+C?


    stickies-v commented at 5:47 pm on July 17, 2025:

    You can try this for any command that captures the output:

    Ah cool, thanks for the example. Agreed that improving is out of scope for this PR then.

    I don’t think it makes sense to show the traceback when the user pressed CTRL+C?

    No strong preference either way, I kinda like seeing where it got interrupted (and I’m already used to it because the KeyboardInterrupt exception handling doesn’t work for test_runner.py which is my usual interface), but makes sense to not change this behaviour here and leave as-is.

    Can be marked as resolved.


    maflcko commented at 5:35 am on July 18, 2025:

    I kinda like seeing where it got interrupted

    Yeah, I realize it is useful to debug timeouts. Pushed a fresh commit for KeyboardInterrupt.

  3. maflcko added the label Tests on Jul 17, 2025
  4. maflcko force-pushed on Jul 17, 2025
  5. DrahtBot commented at 2:56 pm on July 17, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33001.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK stickies-v, pablomartin4btc

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  6. test: Do not pass tests on unhandled exceptions
    This adds a missing catch for BaseException (e.g. SystemExit), which
    would otherwise be silently ignored.
    
    Also, remove the redundant other catches, which are just calling
    log.exception with a redundant log message.
    fa30b34026
  7. maflcko force-pushed on Jul 17, 2025
  8. DrahtBot added the label CI failed on Jul 17, 2025
  9. DrahtBot commented at 3:05 pm on July 17, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/runs/46188973278 LLM reason (✨ experimental): The test_framework code contains two lint errors related to unused variables and f-string formatting, which are detected by the linter and cause the CI failure.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  10. DrahtBot removed the label CI failed on Jul 17, 2025
  11. stickies-v approved
  12. stickies-v commented at 5:48 pm on July 17, 2025: contributor
    ACK fa30b34026f76a5b8af997152fced2d281782e0d
  13. test: Log KeyboardInterrupt as exception
    log.exception is more verbose and useful to debug timeouts.
    
    Also, log stderr for CalledProcessError to make debugging easier.
    faa3e68411
  14. stickies-v approved
  15. stickies-v commented at 10:10 am on July 18, 2025: contributor
    re-ACK faa3e684118bffa7a98cf76eeeb59243219df900
  16. OrangeDoro commented at 10:38 am on July 20, 2025: none

    Hi! I’m a grad student working on a research project about using large language models to automate code review. Based on your commit faa3e684118bffa7a98cf76eeeb59243219df900 and the changes in test/functional/test_framework/test_framework.py, my tool generated this comment:

    1. Ensure that e.stdout and e.stderr are checked for existence before logging to prevent potential AttributeError. Consider using getattr(e, 'stdout', 'N/A') and getattr(e, 'stderr', 'N/A').
    2. Change except BaseException to except Exception to avoid catching system-exiting exceptions unintentionally.
    3. Ensure self.success is set to TestStatus.FAILED in the BaseException block to accurately reflect the test’s outcome.
    4. Ensure that the shutdown method in the finally block is well-defined and handles potential exceptions to enhance robustness.
    5. Reintroduce the KeyboardInterrupt exception handling to ensure user interruptions are logged and handled properly.
    6. Consider sanitizing the output before logging or avoid logging the entire output to prevent sensitive information exposure.
    7. Ensure that the exit code handling is appropriate for the application context. Document the intended behavior if sys.exit() is called.
    8. The assertion assert hasattr(self, "num_nodes") could be replaced with an explicit check and raise a ValueError if self.num_nodes is not set.
    9. Consider logging the command that was executed in the subprocess.CalledProcessError block for better debugging context.
    10. Add a test case that simulates a subprocess.CalledProcessError and verifies that the log message contains both stdout and stderr.
    11. Capture the log output during tests and assert that it matches the expected format, including the presence of stdout and stderr.
    12. The self.success assignments in the exception handling blocks could use comments to clarify what each status means in the context of the test execution.

    As part of my research, I’m trying to understand how useful these comments are in real-world development. If you have a moment, I’d be super grateful if you could quickly reply to these two yes/no questions:

    1. Does this comment provide suggestions from a dimension you hadn’t considered?

    2. Do you find this comment helpful?

    Thanks a lot for your time and feedback! And sorry again if this message is a bother.

  17. maflcko commented at 6:56 am on July 21, 2025: member
    @OrangeDoro Every LLM generated point in your comment is wrong and completely misses the point. This is trivial to see, if you took a look at the previous comments and commit messages. (Please stop sharing LLM generated content that you didn’t review or even understand yourself)
  18. pablomartin4btc approved
  19. pablomartin4btc commented at 9:24 pm on July 22, 2025: member

    tACK faa3e684118bffa7a98cf76eeeb59243219df900

    Managed to reproduce the issue with the patch provided in the PR description, this branch fixes it. Nice detail handling KeyboardInterrupt for debugging purpose. Code-wise refactoring: better reading removing redundant catches.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-07-23 00:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me