test: write functional test results to csv #30291

pull tdb3 wants to merge 1 commit into bitcoin:master from tdb3:test_results_as_csv changing 1 file +28 −3
  1. tdb3 commented at 0:24 am on June 15, 2024: contributor

    Adds argument --resultsfile to test_runner.py. Enables functional test results to be written to a (csv) file for processing by other applications (or for historical archiving). Test name, status, and duration are written to the file provided with the argument.

    Since test_runner.py is being touched, also fixes a misspelling (linter warning). Can split into its own commit if desired.

    Notes

    • Total runtime of the functional tests seems to have increased on my development machines over the past few months (more tests added, individual test runtimes increasing, etc.). Was interested in recording test runtime data over time to detect trends. Initially searched doc/benchmarking.md, existing PRs, and Issues, but didn’t immediately see this type of capability or alternate solutions (please chime in if you know of one!). Thought it would be beneficial to add this capability to test_runner to facilitate this type of data analysis (and potentially other use cases)
    • Saw https://github.com/bitcoin/bitcoin/blob/master/test/functional/README.md#benchmarking-with-perf, and this PR’s higher-level data seems complementary.
    • Was on the fence as to whether to expand print_results() (i.e. take advantage of the same loop over test_results) or implement in a separate write_results() function. Decided on the latter for now, but interested in reviewers’ thoughts.

    Example 1: all tests pass

    $ test/functional/test_runner.py --resultsfile functional_test_results.csv --cachedir=/mnt/tmp/cache --tmpdir=/mnt/tmp feature_blocksdir wallet_startup feature_config_args mempool_accept
    Temporary test directory at /mnt/tmp/test_runner_₿_🏃_20240614_201625
    Test results will be written to functional_test_results.csv
    ...

    $ cat functional_test_results.csv
    test,status,duration(seconds)
    feature_blocksdir.py,Passed,1
    feature_config_args.py,Passed,29
    mempool_accept.py,Passed,9
    wallet_startup.py,Passed,2
    ALL,Passed,29
    

    Example 2: one test failure

    $ cat functional_test_results.csv
    test,status,duration(seconds)
    feature_blocksdir.py,Passed,1
    feature_config_args.py,Passed,28
    wallet_startup.py,Passed,2
    mempool_accept.py,Failed,1
    ALL,Failed,28
    
  2. DrahtBot commented at 0:24 am on June 15, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30084 (lint/contrib/doc updates by jonatack)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Tests on Jun 15, 2024
  4. tdb3 force-pushed on Jun 15, 2024
  5. DrahtBot added the label CI failed on Jun 15, 2024
  6. DrahtBot commented at 0:33 am on June 15, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/26256129112

  7. tdb3 force-pushed on Jun 15, 2024
  8. DrahtBot removed the label CI failed on Jun 15, 2024
  9. maflcko commented at 7:54 am on June 15, 2024: member
    lgtm ACK ee3b5c7fb4e81f833bb8a7db6c14b4734a2d3133
  10. brunoerg commented at 3:43 pm on June 15, 2024: contributor
    Concept ACK
  11. tdb3 marked this as ready for review on Jun 15, 2024
  12. kevkevinpal commented at 6:08 pm on June 16, 2024: contributor

    Concept ACK

    When I initially read this line, ALL,Failed,28, it made me assume that 28 tests failed, when in reality 28 passed and one failed.

    Also I think the duration(seconds) here for ALL is equal to the number of tests, which seems to be inaccurate too.

    I would expect

    test,status,duration(seconds)
    feature_blocksdir.py,Passed,1
    feature_config_args.py,Passed,28
    wallet_startup.py,Passed,2
    mempool_accept.py,Failed,1
    ALL,Failed,32
    
  13. test: write functional test results to csv
    Adds argument --resultsfile to test_runner.py.
    Writes comma-separated functional test name, status,
    and duration to the file provided with the argument.
    Also fixes minor typo in test_runner.py
    ad06e68399
  14. in test/functional/test_runner.py:481 in ee3b5c7fb4 outdated
    473@@ -471,6 +474,13 @@ def main():
    474 
    475     logging.debug("Temporary test directory at %s" % tmpdir)
    476 
    477+    results_filepath = None
    478+    if args.resultsfile:
    479+        results_filepath = pathlib.Path(args.resultsfile)
    480+        # Stop early if the parent directory doesn't exist
    481+        assert results_filepath.parent.exists()
    


    kevkevinpal commented at 6:25 pm on June 16, 2024:

    Would be useful to have a log come from this assert

        assert results_filepath.parent.exists(), "Results file path parent directory does not exist"
    

    Before:

    test/functional/test_runner.py --resultsfile ./fakedir/functional_test_results.csv feature_blocksdir
    Temporary test directory at /tmp/test_runner_₿_🏃_20240616_141849
    Traceback (most recent call last):
      File "/home/kevkevin/DEVDIR/bitcoin/test/functional/test_runner.py", line 939, in <module>
        main()
      File "/home/kevkevin/DEVDIR/bitcoin/test/functional/test_runner.py", line 481, in main
        assert results_filepath.parent.exists()
    AssertionError
    

    After:

    test/functional/test_runner.py --resultsfile ./fakedir/functional_test_results.csv feature_blocksdir
    Temporary test directory at /tmp/test_runner_₿_🏃_20240616_142329
    Traceback (most recent call last):
      File "/home/kevkevin/DEVDIR/bitcoin/test/functional/test_runner.py", line 939, in <module>
        main()
      File "/home/kevkevin/DEVDIR/bitcoin/test/functional/test_runner.py", line 481, in main
        assert results_filepath.parent.exists(), "Results file path parent directory does not exist"
    AssertionError: Results file path parent directory does not exist
    

    tdb3 commented at 2:54 am on June 17, 2024:
    Thanks. This is an improvement. Included.
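
    For illustration, the check discussed above can be written as a small standalone function. This is a hypothetical sketch (in the PR the check lives inline in main(), not in a helper named like this):

```python
# Hypothetical standalone sketch of the --resultsfile path check discussed
# above; in the PR itself this logic sits inline in main().
import pathlib


def parse_results_filepath(resultsfile):
    """Resolve the results file path, failing early if its parent directory is missing."""
    if not resultsfile:
        return None
    results_filepath = pathlib.Path(resultsfile)
    # Stop early if the parent directory doesn't exist, with a message that
    # surfaces in the traceback instead of a bare AssertionError
    assert results_filepath.parent.exists(), "Results file path parent directory does not exist"
    return results_filepath
```

    The assertion message is what turns the bare `AssertionError` in the Before traceback into the self-explanatory one in the After traceback.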
  15. tdb3 force-pushed on Jun 17, 2024
  16. tdb3 commented at 3:28 am on June 17, 2024: contributor

    Concept ACK

    When I initially read this line, ALL,Failed,28, it made me assume that 28 tests failed, when in reality 28 passed and one failed.

    Also I think the duration(seconds) here for ALL is equal to the number of tests, which seems to be inaccurate too.

    I would expect

    test,status,duration(seconds)
    feature_blocksdir.py,Passed,1
    feature_config_args.py,Passed,28
    wallet_startup.py,Passed,2
    mempool_accept.py,Failed,1
    ALL,Failed,32

    Thank you for the review!

    The approach is:

    • Similarly to the stdout printing, the status of each test and its duration are included in the CSV file
    • The summary row ALL reflects the overall Passed/Failed status (i.e. if one test Failed, then overall is Failed)
    • The duration for ALL reflects the actual runtime rather than the accumulated test duration (and since tests are run in parallel, this is shorter than the accumulated test duration). This approach was chosen to cleanly fit the data into CSV form, and avoid extraneous partial rows.
    • While the duration for ALL in the CSV file does differ from stdout, the accumulated test duration can be obtained easily by summing up the row durations.

    stdout:

    $ test/functional/test_runner.py --resultsfile functional_test_results.csv --cachedir=/mnt/tmp/cache --tmpdir=/mnt/tmp feature_blocksdir wallet_startup feature_config_args mempool_accept
    ...
    TEST                   | STATUS    | DURATION

    feature_blocksdir.py   | ✓ Passed  | 1 s
    feature_config_args.py | ✓ Passed  | 29 s
    mempool_accept.py      | ✓ Passed  | 9 s
    wallet_startup.py      | ✓ Passed  | 3 s

    ALL                    | ✓ Passed  | 42 s (accumulated)
    Runtime: 29 s
    

    results file:

    $ cat functional_test_results.csv
    test,status,duration(seconds)
    feature_blocksdir.py,Passed,1
    feature_config_args.py,Passed,29
    mempool_accept.py,Passed,9
    wallet_startup.py,Passed,3
    ALL,Passed,29
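
    As a sketch (not part of the PR), the accumulated duration described above can be recovered from the results file by summing the per-test rows; the sample data below is inlined from the example output:

```python
# Sketch (not part of the PR): recover the accumulated test duration from
# a --resultsfile CSV by summing the per-test rows. The ALL row instead
# records the actual (parallel) runtime.
import csv
import io

sample = """test,status,duration(seconds)
feature_blocksdir.py,Passed,1
feature_config_args.py,Passed,29
mempool_accept.py,Passed,9
wallet_startup.py,Passed,3
ALL,Passed,29
"""

rows = list(csv.DictReader(io.StringIO(sample)))
per_test = [r for r in rows if r["test"] != "ALL"]
accumulated = sum(int(r["duration(seconds)"]) for r in per_test)
actual_runtime = int(rows[-1]["duration(seconds)"])

print(accumulated)     # 42, the accumulated duration printed to stdout
print(actual_runtime)  # 29, the parallel runtime recorded in the ALL row
```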
    
  17. in test/functional/test_runner.py:728 in ad06e68399
    723+        results_writer.writerow(['test', 'status', 'duration(seconds)'])
    724+        all_passed = True
    725+        for test_result in test_results:
    726+            all_passed = all_passed and test_result.was_successful
    727+            results_writer.writerow([test_result.name, test_result.status, str(test_result.time)])
    728+        results_writer.writerow(['ALL', ("Passed" if all_passed else "Failed"), str(total_runtime)])
    


    rkrux commented at 7:27 am on June 17, 2024:
    Nit: Having only a few tests fail also displays ALL,Failed, which at first glance gives the impression that all of them failed. In cases like this, displaying ALL,Few failed would be more explicit.

    tdb3 commented at 1:05 pm on June 17, 2024:

    Yeah, at first glance showing ALL,Failed is not ideal, but in this case the CSV output is matching the approach taken by stdout printing, so it seems appropriate for now.

    Since the scope of this PR is to focus on adding the CSV output capability, I’ll leave it as-is for now. Perhaps a future discussion for how ALL is treated in both the stdout printing and the CSV could be had?

    TEST                   | STATUS    | DURATION

    feature_blocksdir.py   | ✓ Passed  | 1 s
    feature_config_args.py | ✓ Passed  | 29 s
    wallet_startup.py      | ✓ Passed  | 3 s
    mempool_accept.py      | ✖ Failed  | 1 s

    ALL                    | ✖ Failed  | 34 s (accumulated)
    Runtime: 29 s
    

    rkrux commented at 1:18 pm on June 17, 2024:
    I see, makes sense to have that discussion separate from this.
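
    For context, the diff hunk quoted above can be fleshed out into a self-contained sketch. TestResult here is a stand-in dataclass for illustration, not the runner's real result class:

```python
# Self-contained sketch of the write_results function from the diff above.
# TestResult is a stand-in for the test runner's result class.
import csv
from dataclasses import dataclass


@dataclass
class TestResult:
    name: str
    status: str  # "Passed" or "Failed"
    time: int    # duration in seconds

    @property
    def was_successful(self):
        return self.status == "Passed"


def write_results(test_results, filepath, total_runtime):
    """Write one CSV row per test, plus a summary ALL row with the overall runtime."""
    with open(filepath, mode="w", encoding="utf8") as results_file:
        results_writer = csv.writer(results_file)
        results_writer.writerow(['test', 'status', 'duration(seconds)'])
        all_passed = True
        for test_result in test_results:
            all_passed = all_passed and test_result.was_successful
            results_writer.writerow([test_result.name, test_result.status, str(test_result.time)])
        results_writer.writerow(['ALL', ("Passed" if all_passed else "Failed"), str(total_runtime)])
```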
  18. in test/functional/test_runner.py:479 in ad06e68399
    473@@ -471,6 +474,13 @@ def main():
    474 
    475     logging.debug("Temporary test directory at %s" % tmpdir)
    476 
    477+    results_filepath = None
    478+    if args.resultsfile:
    479+        results_filepath = pathlib.Path(args.resultsfile)
    


    rkrux commented at 7:32 am on June 17, 2024:
    The write_results function below intentionally creates a CSV file; should we add a check here for the .csv file extension?

    rkrux commented at 7:35 am on June 17, 2024:
    As soon as the functional_test_results.csv file was generated on my system, I thought of adding it to .gitignore. Since this feature is conditional on the param, we can’t add this hardcoded name to .gitignore, but how about adding *.csv to .gitignore?

    tdb3 commented at 1:07 pm on June 17, 2024:
    Good thought. Since the approach is to allow the user to specify the filename/extension as they see fit (rather than force a particular extension), this seems ok as-is. If others feel that forcing the extension .csv would be more ideal, then will keep this in mind.
  19. rkrux approved
  20. rkrux commented at 7:39 am on June 17, 2024: none

    tACK ad06e68

    Make succeeds, and so do all the functional tests. Passed the resultsfile param and generated the results CSV a few times.

    Thanks @tdb3 for this, it’s a good addition to the testing suite. I can imagine using this feature and also sorting the tests in descending order of time taken. I mentioned a couple of points, but none are blockers.
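
    The sorting mentioned above takes only a few lines of post-processing. A sketch, assuming a results file in the format this PR produces (sample data inlined for illustration):

```python
# Sketch of post-processing a --resultsfile CSV: list tests slowest-first.
# Sample data is inlined for illustration.
import csv
import io

sample = """test,status,duration(seconds)
feature_blocksdir.py,Passed,1
feature_config_args.py,Passed,29
wallet_startup.py,Passed,3
ALL,Passed,29
"""

# Skip the summary ALL row, then sort by duration, descending
per_test = [r for r in csv.DictReader(io.StringIO(sample)) if r["test"] != "ALL"]
slowest_first = sorted(per_test, key=lambda r: int(r["duration(seconds)"]), reverse=True)
for r in slowest_first:
    print(f"{r['test']}: {r['duration(seconds)']} s")
```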

  21. DrahtBot requested review from brunoerg on Jun 17, 2024
  22. DrahtBot requested review from kevkevinpal on Jun 17, 2024
  23. DrahtBot requested review from maflcko on Jun 17, 2024
  24. maflcko commented at 10:57 am on June 17, 2024: member
    re-ACK ad06e68399da71c615db0dbf5304d0cd46bc1f40
  25. in test/functional/test_runner.py:721 in ad06e68399
    715@@ -702,6 +716,17 @@ def print_results(test_results, max_len_name, runtime):
    716     results += "Runtime: %s s\n" % (runtime)
    717     print(results)
    718 
    719+
    720+def write_results(test_results, filepath, total_runtime):
    721+    with open(filepath, mode="w", encoding="utf8") as results_file:
    


    brunoerg commented at 12:48 pm on June 17, 2024:
    It creates the file if it doesn’t exist, right?

    tdb3 commented at 12:50 pm on June 17, 2024:
    Yes, and overwrites the existing file if it does

    maflcko commented at 12:51 pm on June 17, 2024:

    brunoerg commented at 12:53 pm on June 17, 2024:
    Cool. Thanks.
  26. DrahtBot requested review from brunoerg on Jun 17, 2024
  27. brunoerg commented at 1:06 pm on June 17, 2024: contributor

    ACK ad06e68399da71c615db0dbf5304d0cd46bc1f40

  28. tdb3 commented at 1:08 pm on June 17, 2024: contributor

    tACK ad06e68

    Thanks for the review @rkrux!

  29. marcofleon commented at 1:08 pm on June 17, 2024: contributor
    Good idea, tested ACK ad06e68399da71c615db0dbf5304d0cd46bc1f40
  30. tdb3 commented at 1:10 pm on June 17, 2024: contributor

    Good idea, tested ACK ad06e68

    Thanks for reviewing/testing @marcofleon!

  31. kevkevinpal commented at 6:31 pm on June 17, 2024: contributor

    tACK ad06e68

    Looks good to me!

  32. achow101 commented at 6:43 pm on June 17, 2024: member
    ACK ad06e68399da71c615db0dbf5304d0cd46bc1f40
  33. achow101 merged this on Jun 17, 2024
  34. achow101 closed this on Jun 17, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC
