test: write functional test results to csv #30291

pull tdb3 wants to merge 1 commit into bitcoin:master from tdb3:test_results_as_csv changing 1 file +28 −3
  1. tdb3 commented at 0:24 am on June 15, 2024: contributor

    Adds argument --resultsfile to test_runner.py. Enables functional test results to be written to a (csv) file for processing by other applications (or for historical archiving). Test name, status, and duration are written to the file provided with the argument.

    Since test_runner.py is being touched, also fixes a misspelling (linter warning). Can split into its own commit if desired.

    Notes

    • Total runtime of the functional tests seems to have increased on my development machines over the past few months (more tests added, individual test runtimes increasing, etc.). Was interested in recording test runtime data over time to detect trends. Initially searched doc/benchmarking.md, existing PRs, and Issues, but didn’t immediately see this type of capability or alternate solutions (please chime in if you know of one!). Thought it would be beneficial to add this capability to test_runner to facilitate this type of data analysis (and potentially other use cases)
    • Saw https://github.com/bitcoin/bitcoin/blob/master/test/functional/README.md#benchmarking-with-perf, and this PR’s higher-level data seems complementary.
    • Was on the fence as to whether to expand print_results() (i.e. take advantage of the same loop over test_results) or implement in a separate write_results() function. Decided on the latter for now, but interested in reviewers’ thoughts.

    Example 1: all tests pass

    $ test/functional/test_runner.py --resultsfile functional_test_results.csv --cachedir=/mnt/tmp/cache --tmpdir=/mnt/tmp feature_blocksdir wallet_startup feature_config_args mempool_accept
    Temporary test directory at /mnt/tmp/test_runner_₿_🏃_20240614_201625
    Test results will be written to functional_test_results.csv
    ...

    $ cat functional_test_results.csv
    test,status,duration(seconds)
    feature_blocksdir.py,Passed,1
    feature_config_args.py,Passed,29
    mempool_accept.py,Passed,9
    wallet_startup.py,Passed,2
    ALL,Passed,29
    

    Example 2: one test failure

    $ cat functional_test_results.csv
    test,status,duration(seconds)
    feature_blocksdir.py,Passed,1
    feature_config_args.py,Passed,28
    wallet_startup.py,Passed,2
    mempool_accept.py,Failed,1
    ALL,Failed,28
    
  2. DrahtBot commented at 0:24 am on June 15, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30084 (lint/contrib/doc updates by jonatack)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Tests on Jun 15, 2024
  4. tdb3 force-pushed on Jun 15, 2024
  5. DrahtBot added the label CI failed on Jun 15, 2024
  6. DrahtBot commented at 0:33 am on June 15, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/26256129112

  7. tdb3 force-pushed on Jun 15, 2024
  8. DrahtBot removed the label CI failed on Jun 15, 2024
  9. maflcko commented at 7:54 am on June 15, 2024: member
    lgtm ACK ee3b5c7fb4e81f833bb8a7db6c14b4734a2d3133
  10. brunoerg commented at 3:43 pm on June 15, 2024: contributor
    Concept ACK
  11. tdb3 marked this as ready for review on Jun 15, 2024
  12. kevkevinpal commented at 6:08 pm on June 16, 2024: contributor

    Concept ACK

    When I initially read this line, ALL,Failed,28, it made me assume that 28 tests failed, when in reality 28 passed and one failed.

    Also I think the duration(seconds) here for ALL is equal to the number of tests, which seems to be inaccurate too.

    I would expect

    test,status,duration(seconds)
    feature_blocksdir.py,Passed,1
    feature_config_args.py,Passed,28
    wallet_startup.py,Passed,2
    mempool_accept.py,Failed,1
    ALL,Failed,32
    
  13. test: write functional test results to csv
    Adds argument --resultsfile to test_runner.py.
    Writes comma-separated functional test name, status,
    and duration to the file provided with the argument.
    Also fixes minor typo in test_runner.py
    ad06e68399
  14. in test/functional/test_runner.py:481 in ee3b5c7fb4 outdated
    473@@ -471,6 +474,13 @@ def main():
    474 
    475     logging.debug("Temporary test directory at %s" % tmpdir)
    476 
    477+    results_filepath = None
    478+    if args.resultsfile:
    479+        results_filepath = pathlib.Path(args.resultsfile)
    480+        # Stop early if the parent directory doesn't exist
    481+        assert results_filepath.parent.exists()
    


    kevkevinpal commented at 6:25 pm on June 16, 2024:

    Would be useful to have a log come from this assert

        assert results_filepath.parent.exists(), "Results file path parent directory does not exist"
    

    Before:

    test/functional/test_runner.py --resultsfile ./fakedir/functional_test_results.csv feature_blocksdir
    Temporary test directory at /tmp/test_runner_₿_🏃_20240616_141849
    Traceback (most recent call last):
      File "/home/kevkevin/DEVDIR/bitcoin/test/functional/test_runner.py", line 939, in <module>
        main()
      File "/home/kevkevin/DEVDIR/bitcoin/test/functional/test_runner.py", line 481, in main
        assert results_filepath.parent.exists()
    AssertionError
    

    After:

    test/functional/test_runner.py --resultsfile ./fakedir/functional_test_results.csv feature_blocksdir
    Temporary test directory at /tmp/test_runner_₿_🏃_20240616_142329
    Traceback (most recent call last):
      File "/home/kevkevin/DEVDIR/bitcoin/test/functional/test_runner.py", line 939, in <module>
        main()
      File "/home/kevkevin/DEVDIR/bitcoin/test/functional/test_runner.py", line 481, in main
        assert results_filepath.parent.exists(), "Results file path parent directory does not exist"
    AssertionError: Results file path parent directory does not exist
    

    tdb3 commented at 2:54 am on June 17, 2024:
    Thanks. This is an improvement. Included.
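
    For illustration, the check discussed above can be written as a small standalone function. This is a hypothetical sketch (in the PR the check lives inline in main(), not in a helper named like this):

```python
# Hypothetical standalone sketch of the --resultsfile path check discussed
# above; in the PR itself this logic sits inline in main().
import pathlib


def parse_results_filepath(resultsfile):
    """Resolve the results file path, failing early if its parent directory is missing."""
    if not resultsfile:
        return None
    results_filepath = pathlib.Path(resultsfile)
    # Stop early if the parent directory doesn't exist, with a message that
    # surfaces in the traceback instead of a bare AssertionError
    assert results_filepath.parent.exists(), "Results file path parent directory does not exist"
    return results_filepath
```

    The assertion message is what turns the bare `AssertionError` in the Before traceback into the self-explanatory one in the After traceback.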
  15. tdb3 force-pushed on Jun 17, 2024
  16. tdb3 commented at 3:28 am on June 17, 2024: contributor

    Concept ACK

    When I initially read this line, ALL,Failed,28, it made me assume that 28 tests failed, when in reality 28 passed and one failed.

    Also I think the duration(seconds) here for ALL is equal to the number of tests, which seems to be inaccurate too.

    I would expect

    test,status,duration(seconds)
    feature_blocksdir.py,Passed,1
    feature_config_args.py,Passed,28
    wallet_startup.py,Passed,2
    mempool_accept.py,Failed,1
    ALL,Failed,32

    Thank you for the review!

    The approach is:

    • Similarly to the stdout printing, the status of each test and its duration are included in the CSV file
    • The summary row ALL reflects the overall Passed/Failed status (i.e. if one test Failed, then overall is Failed)
    • The duration for ALL reflects the actual runtime rather than the accumulated test duration (and since tests are run in parallel, this is shorter than the accumulated test duration). This approach was chosen to cleanly fit the data into CSV form, and avoid extraneous partial rows.
    • While the duration for ALL in the CSV file does differ from stdout, the accumulated test duration can be obtained easily by summing up the row durations.

    stdout:

    $ test/functional/test_runner.py --resultsfile functional_test_results.csv --cachedir=/mnt/tmp/cache --tmpdir=/mnt/tmp feature_blocksdir wallet_startup feature_config_args mempool_accept
    ...
    TEST                   | STATUS    | DURATION

    feature_blocksdir.py   | ✓ Passed  | 1 s
    feature_config_args.py | ✓ Passed  | 29 s
    mempool_accept.py      | ✓ Passed  | 9 s
    wallet_startup.py      | ✓ Passed  | 3 s

    ALL                    | ✓ Passed  | 42 s (accumulated)
    Runtime: 29 s
    

    results file:

    $ cat functional_test_results.csv
    test,status,duration(seconds)
    feature_blocksdir.py,Passed,1
    feature_config_args.py,Passed,29
    mempool_accept.py,Passed,9
    wallet_startup.py,Passed,3
    ALL,Passed,29
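
    As a sketch (not part of the PR), the accumulated duration described above can be recovered from the results file by summing the per-test rows; the sample data below is inlined from the example output:

```python
# Sketch (not part of the PR): recover the accumulated test duration from
# a --resultsfile CSV by summing the per-test rows. The ALL row instead
# records the actual (parallel) runtime.
import csv
import io

sample = """test,status,duration(seconds)
feature_blocksdir.py,Passed,1
feature_config_args.py,Passed,29
mempool_accept.py,Passed,9
wallet_startup.py,Passed,3
ALL,Passed,29
"""

rows = list(csv.DictReader(io.StringIO(sample)))
per_test = [r for r in rows if r["test"] != "ALL"]
accumulated = sum(int(r["duration(seconds)"]) for r in per_test)
actual_runtime = int(rows[-1]["duration(seconds)"])

print(accumulated)     # 42, the accumulated duration printed to stdout
print(actual_runtime)  # 29, the parallel runtime recorded in the ALL row
```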
    
  17. in test/functional/test_runner.py:728 in ad06e68399
    723+        results_writer.writerow(['test', 'status', 'duration(seconds)'])
    724+        all_passed = True
    725+        for test_result in test_results:
    726+            all_passed = all_passed and test_result.was_successful
    727+            results_writer.writerow([test_result.name, test_result.status, str(test_result.time)])
    728+        results_writer.writerow(['ALL', ("Passed" if all_passed else "Failed"), str(total_runtime)])
    


    rkrux commented at 7:27 am on June 17, 2024:
    Nit: Having only a few tests fail also displays ALL,Failed, which at first glance gives the impression that all of them failed. In cases like this, displaying ALL,Few failed would be more explicit.

    tdb3 commented at 1:05 pm on June 17, 2024:

    Yeah, at first glance showing ALL,Failed is not ideal, but in this case the CSV output is matching the approach taken by stdout printing, so it seems appropriate for now.

    Since the scope of this PR is to focus on adding the CSV output capability, I’ll leave it as-is for now. Perhaps a future discussion for how ALL is treated in both the stdout printing and the CSV could be had?

    TEST                   | STATUS    | DURATION

    feature_blocksdir.py   | ✓ Passed  | 1 s
    feature_config_args.py | ✓ Passed  | 29 s
    wallet_startup.py      | ✓ Passed  | 3 s
    mempool_accept.py      | ✖ Failed  | 1 s

    ALL                    | ✖ Failed  | 34 s (accumulated)
    Runtime: 29 s
    

    rkrux commented at 1:18 pm on June 17, 2024:
    I see, makes sense to have that discussion separate from this.
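
    For context, the diff hunk quoted above can be fleshed out into a self-contained sketch. TestResult here is a stand-in dataclass for illustration, not the runner's real result class:

```python
# Self-contained sketch of the write_results function from the diff above.
# TestResult is a stand-in for the test runner's result class.
import csv
from dataclasses import dataclass


@dataclass
class TestResult:
    name: str
    status: str  # "Passed" or "Failed"
    time: int    # duration in seconds

    @property
    def was_successful(self):
        return self.status == "Passed"


def write_results(test_results, filepath, total_runtime):
    """Write one CSV row per test, plus a summary ALL row with the overall runtime."""
    with open(filepath, mode="w", encoding="utf8") as results_file:
        results_writer = csv.writer(results_file)
        results_writer.writerow(['test', 'status', 'duration(seconds)'])
        all_passed = True
        for test_result in test_results:
            all_passed = all_passed and test_result.was_successful
            results_writer.writerow([test_result.name, test_result.status, str(test_result.time)])
        results_writer.writerow(['ALL', ("Passed" if all_passed else "Failed"), str(total_runtime)])
```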
  18. in test/functional/test_runner.py:479 in ad06e68399
    473@@ -471,6 +474,13 @@ def main():
    474 
    475     logging.debug("Temporary test directory at %s" % tmpdir)
    476 
    477+    results_filepath = None
    478+    if args.resultsfile:
    479+        results_filepath = pathlib.Path(args.resultsfile)
    


    rkrux commented at 7:32 am on June 17, 2024:
    The write_results function below intentionally creates a CSV file; should we add a check here for the .csv file extension?

    rkrux commented at 7:35 am on June 17, 2024:
    As soon as the functional_test_results.csv file was generated on my system, I thought of adding it to .gitignore. Since this feature is conditional on the param, we can’t add this hardcoded name to .gitignore, but how about adding *.csv to .gitignore?

    tdb3 commented at 1:07 pm on June 17, 2024:
    Good thought. Since the approach is to allow the user to specify the filename/extension as they see fit (rather than force a particular extension), this seems ok as-is. If others feel that forcing the extension .csv would be more ideal, then will keep this in mind.
  19. rkrux approved
  20. rkrux commented at 7:39 am on June 17, 2024: none

    tACK ad06e68

    Make succeeds, and so do all the functional tests. Passed the resultsfile param and generated the results CSV a few times.

    Thanks @tdb3 for this, it’s a good addition to the testing suite. I can imagine using this feature and also sorting the tests in descending order of time taken. I mentioned a couple of points, but none are blockers.
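
    The sorting mentioned above takes only a few lines of post-processing. A sketch, assuming a results file in the format this PR produces (sample data inlined for illustration):

```python
# Sketch of post-processing a --resultsfile CSV: list tests slowest-first.
# Sample data is inlined for illustration.
import csv
import io

sample = """test,status,duration(seconds)
feature_blocksdir.py,Passed,1
feature_config_args.py,Passed,29
wallet_startup.py,Passed,3
ALL,Passed,29
"""

# Skip the summary ALL row, then sort by duration, descending
per_test = [r for r in csv.DictReader(io.StringIO(sample)) if r["test"] != "ALL"]
slowest_first = sorted(per_test, key=lambda r: int(r["duration(seconds)"]), reverse=True)
for r in slowest_first:
    print(f"{r['test']}: {r['duration(seconds)']} s")
```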

  21. DrahtBot requested review from brunoerg on Jun 17, 2024
  22. DrahtBot requested review from kevkevinpal on Jun 17, 2024
  23. DrahtBot requested review from maflcko on Jun 17, 2024
  24. maflcko commented at 10:57 am on June 17, 2024: member
    re-ACK ad06e68399da71c615db0dbf5304d0cd46bc1f40
  25. in test/functional/test_runner.py:721 in ad06e68399
    715@@ -702,6 +716,17 @@ def print_results(test_results, max_len_name, runtime):
    716     results += "Runtime: %s s\n" % (runtime)
    717     print(results)
    718 
    719+
    720+def write_results(test_results, filepath, total_runtime):
    721+    with open(filepath, mode="w", encoding="utf8") as results_file:
    


    brunoerg commented at 12:48 pm on June 17, 2024:
    It creates the file if it doesn’t exist, right?

    tdb3 commented at 12:50 pm on June 17, 2024:
    Yes, and overwrites the existing file if it does

    maflcko commented at 12:51 pm on June 17, 2024:

    brunoerg commented at 12:53 pm on June 17, 2024:
    Cool. Thanks.
  26. DrahtBot requested review from brunoerg on Jun 17, 2024
  27. brunoerg commented at 1:06 pm on June 17, 2024: contributor

    ACK ad06e68399da71c615db0dbf5304d0cd46bc1f40

  28. tdb3 commented at 1:08 pm on June 17, 2024: contributor

    tACK ad06e68

    Thanks for the review @rkrux!

  29. marcofleon commented at 1:08 pm on June 17, 2024: contributor
    Good idea, tested ACK ad06e68399da71c615db0dbf5304d0cd46bc1f40
  30. tdb3 commented at 1:10 pm on June 17, 2024: contributor

    Good idea, tested ACK ad06e68

    Thanks for reviewing/testing @marcofleon!

  31. kevkevinpal commented at 6:31 pm on June 17, 2024: contributor

    tACK ad06e68

    Looks good to me!

  32. achow101 commented at 6:43 pm on June 17, 2024: member
    ACK ad06e68399da71c615db0dbf5304d0cd46bc1f40
  33. achow101 merged this on Jun 17, 2024
  34. achow101 closed this on Jun 17, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC
