test: Run framework unit tests in parallel #29771

pull tdb3 wants to merge 1 commit into bitcoin:master from tdb3:parallel_unit_tests changing 2 files +53 −30
  1. tdb3 commented at 3:42 AM on March 31, 2024: contributor

    Functional test framework unit tests are currently run prior to all other functional tests.

    This PR enables execution of the test framework unit tests in parallel with the functional tests, rather than before the functional tests, saving runtime and more efficiently using available cores.

    This is a follow up to #29470 (comment)

    New behavior:

    1. When running all tests, the framework unit tests are run in parallel with the other tests (unless explicitly skipped with --exclude). This parallelization introduces marginal time savings when running all tests, depending on the machine used. As an example, a 2-3% time savings (9 seconds) was observed on a machine using --jobs=18 (with 18 available cores).
    2. When running specific functional tests, framework unit tests are now skipped by default. Framework unit tests can be added by including feature_framework_unit_tests.py in the list of specific tests being executed. The rationale for skipping by default is that if the tester is running specific functional tests, there is a conscious decision to focus testing, and choosing to run all tests (where unit tests are run by default) would be a next step.
    3. The --skipunit option is now removed, since the unit tests are parallelized and no longer delay other tests. Unit tests are now treated the same as functional tests.

    Implementation notes:

    Since TextTestRunner can be noisy even with verbosity=0 (and any non-failure stderr output would trigger job failure in the test runner), the approach taken was to send all output to stdout and to determine pass/fail from the returned TestResult. This aligns with the previous check for unit test failure (if not result.wasSuccessful():).
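    The idea can be sketched with plain unittest. This is an illustrative stand-in, not the PR's actual feature_framework_unit_tests.py; the test case and function names here are hypothetical:

```python
import sys
import unittest

class ExampleTest(unittest.TestCase):
    """Stand-in for the framework's real test cases."""
    def test_arithmetic(self):
        self.assertEqual(1 + 1, 2)

def run_suite(test_cases):
    """Run TestCase classes with all runner output directed to stdout."""
    suite = unittest.TestSuite()
    for case in test_cases:
        suite.addTests(unittest.defaultTestLoader.loadTestsFromTestCase(case))
    # TextTestRunner writes to stderr by default; redirect to stdout so a
    # harness that treats any stderr output as a failure stays quiet on pass.
    result = unittest.TextTestRunner(stream=sys.stdout, verbosity=1).run(suite)
    # Pass/fail is taken from the returned TestResult, not from the stream.
    return result.wasSuccessful()
```

    On failure, the runner's report (dots, tracebacks, summary) is still available on stdout for the test runner to display, matching the failure output shown above.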

    This approach was tested by inserting self.assertEqual(True, False) into test_framework/address.py and verifying that the specifics of the failure were reported.

    135/302 - feature_framework_unit_tests.py failed, Duration: 0 s
    
    stdout:
    .F
    ======================================================================
    FAIL: test_bech32_decode (test_framework.address.TestFrameworkScript.test_bech32_decode)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/dev/myrepos/bitcoin/test/functional/test_framework/address.py", line 228, in test_bech32_decode
        self.assertEqual(True, False)
    AssertionError: True != False
    
    ----------------------------------------------------------------------
    Ran 2 tests in 0.003s
    
    FAILED (failures=1)
    
    
    stderr:
    

    There was an initial thought to parallelize execution of the unit tests themselves (i.e. run the 12 unit test files in parallel); however, this is not expected to meaningfully reduce runtime further and would add unnecessary complexity.

  2. DrahtBot commented at 3:42 AM on March 31, 2024: contributor


    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.


    Code Coverage

    For detailed information about the code coverage, see the test coverage report.


    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK maflcko, achow101
    Approach ACK kevkevinpal

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.


    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29387 (test: fix RPC coverage check by BrandonOdiwuor)
    • #29335 (test: Handle functional test disk-full error by BrandonOdiwuor)
    • #28312 (test: fix keys_to_multisig_script (P2MS) helper for n/k > 16 by theStack)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Tests on Mar 31, 2024
  4. tdb3 force-pushed on Mar 31, 2024
  5. DrahtBot added the label CI failed on Mar 31, 2024
  6. DrahtBot commented at 4:11 AM on March 31, 2024: contributor


    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    <sub>Debug: https://github.com/bitcoin/bitcoin/runs/23271888984</sub>

  7. tdb3 force-pushed on Mar 31, 2024
  8. DrahtBot removed the label CI failed on Mar 31, 2024
  9. Sjors commented at 2:25 PM on April 2, 2024: member

    I suppose the original idea was to run these first as a sanity check. To somewhat retain that property, perhaps the entire test suite can be aborted if this fails.

    If you run these in parallel, I think you might as well drop the --skipunit since it's faster than many of the other tests.

    When running specific functional tests, framework unit tests are now skipped by default

    This is useful, but wasn't that already the case? Or do you mean when using the --filter argument, rather than executing the individual test directly?

    Tested the average of three runs for time test/functional/test_runner.py --jobs=100 on a machine with 32 threads and a RAM disk for --cachedir and --tmpdir

    Before: 1 minute 12 seconds

    After: 1 minute 13 seconds

    That last number is dominated by a single run that took unusually long. Typically it's 4 seconds faster.

  10. maflcko commented at 2:31 PM on April 2, 2024: member

    I suppose the original idea was to run these first as a sanity check. To somewhat retain that property, perhaps the entire test suite can be aborted if this fails.

    Not sure. For example, what point is it to abort all tests when the muhash unit test fails, when most tests are not using muhash? If someone wanted to abort early, there is the failfast option.

    If you run these in parallel, I think you might as well drop the --skipunit since it's faster than many of the other tests.

    Agree.

  11. Sjors commented at 2:51 PM on April 2, 2024: member

    I was assuming that any failure of the framework tests means your system is broken, but maybe that's too strong an assumption.

  12. tdb3 commented at 10:03 PM on April 2, 2024: contributor

    Thank you for taking a look @Sjors and @maflcko!

    If you run these in parallel, I think you might as well drop the --skipunit since it's faster than many of the other tests.

    To ensure I understand, is the desire here to have the unit tests run by default when running individual tests (and also by default when running all tests)?

    If so, that seems reasonable to me. As you mentioned, the unit tests are decently fast, and the default job value is 4 anyway (so there is now some inherent parallelization, rather than the previous behavior of serializing unit test execution). For example, on my system, running feature_signet.py with or without feature_framework_unit_tests.py (default jobs=4) now has the same runtime. The tests that would be impacted are relatively short ones anyway, and the user still has the option to specify --skipunit to save a few seconds (the difference between the short test execution and unit test runtime).

    When running specific functional tests, framework unit tests are now skipped by default

    This is useful, but wasn't that already the case? Or do you mean when using the --filter argument, rather than executing the individual test directly?

    The previous behavior was to run the unit tests first, blocking all other tests until the unit tests passed. If the unit tests failed, execution would stop before running any other tests (i.e. the default behavior for unit tests was to fail fast).

    The new behavior is for unit tests to be run in parallel, not in a fail fast manner. The user can specify --failfast to end testing early if any test fails (unit tests are no exception).
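    As an illustration of those semantics only (this is not the actual test_runner.py implementation; names and structure here are hypothetical), a parallel scheduler with an optional fail-fast switch might look like:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_jobs(jobs, workers=4, failfast=False):
    """Run (name, callable) jobs in parallel; each callable returns True on pass.

    With failfast=True, jobs not yet started are cancelled after the first
    failure; jobs already running are allowed to finish.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fn): name for name, fn in jobs}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
            if failfast and not results[futures[fut]]:
                # Cancel whatever has not started yet, then stop collecting.
                for pending in futures:
                    pending.cancel()
                break
    return results
```

    Under this model the unit test job is just one more entry in the job list, which is why it no longer needs special fail-fast treatment.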

  13. Sjors commented at 7:39 AM on April 3, 2024: member

    is the desire here to have the unit tests run by default when running individual tests

    No

    (and also by default when running all tests)?

    Yes

    They are just another test now, so they don't need special treatment via --skipunit. Conversely, they don't need to be run more often than other tests.

    The user can specify --failfast to end testing early if any test fails (unit tests are no exception)

    Yes, that seems good enough.

  14. tdb3 commented at 9:10 AM on April 3, 2024: contributor

    is the desire here to have the unit tests run by default when running individual tests

    No

    (and also by default when running all tests)?

    Yes

    They are just another test now, so they don't need special treatment via --skipunit. Conversely, they don't need to be run more often than other tests.

    The user can specify --failfast to end testing early if any test fails (unit tests are no exception)

    Yes, that seems good enough.

    Ah, thank you for clarifying. @mzumsande, now that unit tests are being run in parallel, we could treat them like any other test (i.e. no special treatment with --skipunit, so this option could now be removed). Do you see this aligning with the original goals of skipunit or is there something else to consider? New behavior is described in the opening post, but briefly:

    • unit tests are run in parallel with other tests, so no longer delay other tests
    • unit tests are no longer run implicitly when running specific tests (so do not block those tests). Unit tests can be explicitly included in the list of tests.
  15. mzumsande commented at 9:24 AM on April 3, 2024: contributor

    @mzumsande, now that unit tests are being run in parallel, we could treat them like any other test (i.e. no special treatment with --skipunit, so this option could now be removed). Do you see this aligning with the original goals of skipunit or is there something else to consider?

    Yes, if they are treated like other tests, -skipunit is no longer necessary and should be removed. That's even better than the status quo because it removes the need to remember that -skipunit exists when running a subset of tests.

  16. tdb3 force-pushed on Apr 3, 2024
  17. tdb3 commented at 10:42 AM on April 3, 2024: contributor

    Thank you for the input all. skipunit now removed. Rebased and pushed.

  18. in test/functional/feature_framework_unit_tests.py:47 in 61c32b87bd outdated
      42 | +            test_framework_tests
      43 | +        )
      44 | +        if not result.wasSuccessful():
      45 | +            print(f"{buf.getvalue()}", file=sys.stderr)
      46 | +            sys.exit(TEST_EXIT_FAILED)
      47 | +        sys.exit(TEST_EXIT_PASSED)
    


    maflcko commented at 11:11 AM on April 3, 2024:

    Could also print the result on success? IIUC, stdout will be hidden by the test runner, and when running this test independently, having some feedback is better than no feedback?


    tdb3 commented at 1:31 AM on April 4, 2024:

    Thank you for reviewing. Would you be ok with a simplified approach of writing all unit test output to stdout, with this output being shown when unit tests fail? This seems a bit cleaner and more consistent than selectively placing all unit test output to stdout or stderr depending on test pass/fail.

    More info

    TextTestRunner lets us print unit test output to one stream. Previously (before this PR), TextTestRunner wrote all output to stderr regardless of pass/fail, but doing this now would cause a false failure (the test runner interprets any stderr output to mean the test failed).

    As was the case before this PR, unit test pass/fail would continue to be based on TestResult returned.

    As you mentioned, stdout is hidden by the test runner in non-failure situations, so writing to stdout on test pass doesn't do much currently (other than providing the output to test runner for future use). The user is provided with a reasonable test pass/fail status (matching that of the other tests).

    Example output

    Pre-PR (pass)

    $ test/functional/test_runner.py --jobs=18 --cachedir=/mnt/tmp/cache --tmpdir=/mnt/tmp test/functional/feature_signet.py
    Temporary test directory at /mnt/tmp/test_runner_₿_🏃_20240403_192403
    Running Unit Tests for Test Framework Modules
    .....................
    ----------------------------------------------------------------------
    Ran 21 tests in 8.262s
    
    OK
    Remaining jobs: [feature_signet.py]
    1/1 - feature_signet.py passed, Duration: 8 s
    
    TEST              | STATUS    | DURATION
    
    feature_signet.py | ✓ Passed  | 8 s
    
    ALL               | ✓ Passed  | 8 s (accumulated) 
    Runtime: 8 s
    

    Pre-PR (failure in unit test)

    $ test/functional/test_runner.py --jobs=18 --cachedir=/mnt/tmp/cache --tmpdir=/mnt/tmp test/functional/feature_signet.py
    Temporary test directory at /mnt/tmp/test_runner_₿_🏃_20240403_192509
    Running Unit Tests for Test Framework Modules
    .F
    ======================================================================
    FAIL: test_bech32_decode (test_framework.address.TestFrameworkScript.test_bech32_decode)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/dev/bitcoin/test/functional/test_framework/address.py", line 228, in test_bech32_decode
        self.assertEqual(False, True)
    AssertionError: False != True
    
    ----------------------------------------------------------------------
    Ran 2 tests in 0.003s
    
    FAILED (failures=1)
    Early exiting after failure in TestFramework unit tests
    

    PR (pass, running unit and functional tests):

    $ test/functional/test_runner.py --jobs=18 --cachedir=/mnt/tmp/cache --tmpdir=/mnt/tmp test/functional/feature_signet.py test/functional/feature_framework_unit_tests.py 
    Temporary test directory at /mnt/tmp/test_runner_₿_🏃_20240403_193148
    Remaining jobs: [feature_signet.py, feature_framework_unit_tests.py]
    1/2 - feature_signet.py passed, Duration: 8 s
    Remaining jobs: [feature_framework_unit_tests.py]
    2/2 - feature_framework_unit_tests.py passed, Duration: 9 s
    
    TEST                            | STATUS    | DURATION
    
    feature_framework_unit_tests.py | ✓ Passed  | 9 s
    feature_signet.py               | ✓ Passed  | 8 s
    
    ALL                             | ✓ Passed  | 17 s (accumulated) 
    Runtime: 9 s
    

    PR (pass, running only unit tests):

    $ test/functional/test_runner.py --jobs=18 --cachedir=/mnt/tmp/cache --tmpdir=/mnt/tmp test/functional/feature_framework_unit_tests.py
    Temporary test directory at /mnt/tmp/test_runner_₿_🏃_20240403_193240
    Remaining jobs: [feature_framework_unit_tests.py]
    1/1 - feature_framework_unit_tests.py passed, Duration: 9 s
    
    TEST                            | STATUS    | DURATION
    
    feature_framework_unit_tests.py | ✓ Passed  | 9 s
    
    ALL                             | ✓ Passed  | 9 s (accumulated) 
    Runtime: 9 s
    

    PR (failure in unit tests):

    $ test/functional/test_runner.py --jobs=18 --cachedir=/mnt/tmp/cache --tmpdir=/mnt/tmp test/functional/feature_framework_unit_tests.py
    Temporary test directory at /mnt/tmp/test_runner_₿_🏃_20240403_193325
    Remaining jobs: [feature_framework_unit_tests.py]
    1/1 - feature_framework_unit_tests.py failed, Duration: 0 s
    
    stdout:
    .F
    ======================================================================
    FAIL: test_bech32_decode (test_framework.address.TestFrameworkScript.test_bech32_decode)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/dev/myrepos/bitcoin/test/functional/test_framework/address.py", line 228, in test_bech32_decode
        self.assertEqual(False, True)
    AssertionError: False != True
    
    ----------------------------------------------------------------------
    Ran 2 tests in 0.003s
    
    FAILED (failures=1)
    
    stderr:
    
    
    TEST                            | STATUS    | DURATION
    
    feature_framework_unit_tests.py | ✖ Failed  | 0 s
    
    ALL                             | ✖ Failed  | 0 s (accumulated) 
    Runtime: 0 s
    

    maflcko commented at 6:16 AM on April 4, 2024:

    Would you be ok with a simplified approach of writing all unit test output to stdout, with this output being shown when unit tests fail?

    Sure. That'd be fine as well


    maflcko commented at 6:17 AM on April 4, 2024:

    I was thinking about calling the script (passing) directly:

    python3 ./test/functional/feature_framework_unit_tests.py 
    

    tdb3 commented at 12:24 PM on April 4, 2024:

    Ah, makes sense. Pushed an update to address this.
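    Supporting direct invocation like that typically just needs a standard __main__ entry point that prints the runner's report and maps the result to an exit status. A minimal sketch, with a hypothetical stand-in test case rather than the real framework modules:

```python
import sys
import unittest

class SmokeTest(unittest.TestCase):
    """Hypothetical stand-in for the framework's test cases."""
    def test_truth(self):
        self.assertTrue(True)

def main():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SmokeTest)
    # Report goes to stdout; the exit status carries pass/fail for callers,
    # whether that is the test runner or a user invoking the file directly.
    result = unittest.TextTestRunner(stream=sys.stdout, verbosity=1).run(suite)
    return 0 if result.wasSuccessful() else 1

if __name__ == "__main__":
    sys.exit(main())
```

    Run standalone, the script then gives the same dot-and-summary feedback a user would expect from unittest.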

  19. DrahtBot added the label CI failed on Apr 3, 2024
  20. tdb3 commented at 12:25 PM on April 4, 2024: contributor

    Pushed update to address comment and simplify feature_framework_unit_tests.py; providing output when running with or without test runner. Rebased as well.

  21. tdb3 force-pushed on Apr 4, 2024
  22. DrahtBot removed the label CI failed on Apr 4, 2024
  23. DrahtBot added the label Needs rebase on Apr 22, 2024
  24. tdb3 force-pushed on Apr 23, 2024
  25. tdb3 commented at 1:27 AM on April 23, 2024: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

    Rebased to address conflict

  26. DrahtBot removed the label Needs rebase on Apr 23, 2024
  27. maflcko commented at 7:00 AM on April 23, 2024: member

    Rebased to address conflict

    The rebase is wrong. Please pay attention to correctly addressing any merge conflicts as they arise. One way to avoid mistakes is to review your own code changes and to check rebases, as other reviewers will do later. For example, using the range-diff, as explained in the docs.

  28. tdb3 commented at 10:49 AM on April 23, 2024: contributor

    The rebase is wrong.

    Thank you. Looks like I had overlooked the addition of wallet_util (TEST_FRAMEWORK_MODULES being moved to feature_framework_unit_tests.py made it less obvious). Will rebase/push soon.

  29. tdb3 force-pushed on Apr 24, 2024
  30. test: Run framework unit tests in parallel
    Reorganize functional test framework unit tests to run in parallel
    with other functional tests.
    
    The option `skipunit` is removed, since unit tests no longer delay
    functional test execution.
    
    Unit tests are run by default when running all tests, and can be
    run explicitly with `feature_framework_unit_tests.py` when running
    a subset of tests.
    f19f0a2e5a
  31. tdb3 force-pushed on Apr 24, 2024
  32. DrahtBot added the label CI failed on Apr 24, 2024
  33. DrahtBot commented at 12:27 AM on April 24, 2024: contributor


    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    <sub>Debug: https://github.com/bitcoin/bitcoin/runs/24178995046</sub>

  34. tdb3 commented at 1:16 AM on April 24, 2024: contributor

    Rebased and pushed.

  35. DrahtBot removed the label CI failed on Apr 24, 2024
  36. maflcko commented at 8:43 AM on April 24, 2024: member

    ACK f19f0a2e5af6c2a64900f1f229e21b6f1668bd3d 🌽

    <details><summary>Show signature</summary>

    Signature:

    untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    trusted comment: ACK f19f0a2e5af6c2a64900f1f229e21b6f1668bd3d 🌽
    5iGa35C3WtPyuzsus+SXWTAwe3NCXsFoFkgT0RqTyiSknQ3FPoy0VkQEbNmm9S4yj1hIlYnvVtDo7Ny/YVS7BA==
    

    </details>

  37. in test/functional/test_runner.py:241 in f19f0a2e5a
     237 | @@ -255,6 +238,7 @@
     238 |      'wallet_keypool.py --descriptors',
     239 |      'wallet_descriptor.py --descriptors',
     240 |      'p2p_nobloomfilter_messages.py',
     241 | +    TEST_FRAMEWORK_UNIT_TESTS,
    


    kevkevinpal commented at 1:32 PM on April 24, 2024:

    Is there any reason we picked this location in BASE_SCRIPTS? I would think we would want to run TEST_FRAMEWORK_UNIT_TESTS earlier on, even if it runs in parallel,

    since if the unit tests fail other tests will probably fail as well.

    I think making it the first script in BASE_SCRIPTS would make sense, with an exception made with regard to run times, since other tests depend on this.


    tdb3 commented at 4:12 PM on April 24, 2024:

    Thank you for the review!

    Great question. The location chosen was within the # vv Tests less than 30s vv portion of BASE_SCRIPTS. I'm seeing run-times for feature_framework_unit_tests.py (on two machines I'm using) of 9s and 6s, and TEST_FRAMEWORK_UNIT_TESTS is around mid-way in the "30s" portion of the list, which seemed reasonable. https://github.com/bitcoin/bitcoin/blob/f19f0a2e5af6c2a64900f1f229e21b6f1668bd3d/test/functional/test_runner.py#L392-L393

    The existing comment to insert at random seems to provide general high-level guidance and doesn't dig into individual system performance (for example, 30s worth of work on an Ivy Bridge CPU and a Zen 4 CPU would be drastically different due to significant IPC differences). Maybe I'm overlooking other guidance that defines relative performance (e.g. 30s on a particular cloud service instance type, CPU type, etc.). There's probably an opportunity for rethinking and updating this guidance, but that would be adjacent to this PR rather than included in it.

  38. kevkevinpal commented at 1:34 PM on April 24, 2024: contributor

    Approach ACK f19f0a2e5af6c2a64900f1f229e21b6f1668bd3d

    I still need to run this on my local machine

  39. achow101 commented at 8:00 PM on April 26, 2024: member

    ACK f19f0a2e5af6c2a64900f1f229e21b6f1668bd3d

  40. DrahtBot requested review from kevkevinpal on Apr 26, 2024
  41. achow101 merged this on Apr 26, 2024
  42. achow101 closed this on Apr 26, 2024

  43. bitcoin locked this on Apr 26, 2025
