Slow unit tests delay functional tests and leave CPU unused

maflcko commented at 1:07 pm on June 18, 2025: member

When running the unit and functional tests under a sanitizer (as done in some CI tasks), the CPU usage could be better. Currently, one or two slow running unit tests idle along on a single CPU thread.

Can be seen in this graph, which shows an initial spike during compilation, then the slow unit tests, and finally the functional test CPU usage:

It would be nice to somehow fix this. Options are:

Speed up the unit tests (not sure how easy this is, or if this is possible at all)
Run the functional tests along with the ctest-tests in parallel somehow
Something else?

maflcko added the label Tests on Jun 18, 2025

willcl-ark commented at 1:14 pm on June 18, 2025: member

I have noticed similar when investigating tangential Ci changes. It is possible to run more unit tests in parallel, but we still always wait for the slowest test.

If I recall it’s a secp256 test, and there was perhaps a suggestion to disable it (mostly), or have it run weekly or something?

maflcko commented at 1:38 pm on June 18, 2025: member

Looking at https://github.com/bitcoin/bitcoin/actions/runs/15714959524/job/44282297642#step:6:4481, it seems there are several affected tests:

0...
1135/142 Test  [#66](/bitcoin-bitcoin/66/): net_tests ............................   Passed  243.14 sec
2136/142 Test [#139](/bitcoin-bitcoin/139/): wallet_tests .........................   Passed  108.31 sec
3137/142 Test [#138](/bitcoin-bitcoin/138/): wallet_crypto_tests ..................   Passed  116.03 sec
4138/142 Test   [#5](/bitcoin-bitcoin/5/): secp256k1_noverify_tests .............   Passed  426.46 sec
5139/142 Test [#129](/bitcoin-bitcoin/129/): coinselector_tests ...................   Passed  268.56 sec
6140/142 Test   [#9](/bitcoin-bitcoin/9/): bench_sanity_check ...................   Passed  487.70 sec
7141/142 Test  [#32](/bitcoin-bitcoin/32/): coins_tests ..........................   Passed  559.64 sec
8142/142 Test   [#6](/bitcoin-bitcoin/6/): secp256k1_tests ......................   Passed  590.82 sec

edit:

Tsan:

0[20:37:36.776] 136/142 Test  [#32](/bitcoin-bitcoin/32/): coins_tests ..........................   Passed  122.51 sec
1[20:37:38.388] 137/142 Test  [#30](/bitcoin-bitcoin/30/): checkqueue_tests .....................   Passed  124.34 sec
2[20:37:46.297] 138/142 Test [#139](/bitcoin-bitcoin/139/): wallet_tests .........................   Passed   29.87 sec
3[20:38:40.235] 139/142 Test   [#9](/bitcoin-bitcoin/9/): bench_sanity_check ...................   Passed  193.33 sec
4[20:38:42.459] 140/142 Test [#129](/bitcoin-bitcoin/129/): coinselector_tests ...................   Passed   98.51 sec
5[20:42:20.591] 141/142 Test   [#5](/bitcoin-bitcoin/5/): secp256k1_noverify_tests .............   Passed  413.70 sec
6[20:48:11.703] 142/142 Test   [#6](/bitcoin-bitcoin/6/): secp256k1_tests ......................   Passed  764.81 sec

msan:

0[13:02:32.115] 134/141 Test  [#31](/bitcoin-bitcoin/31/): coins_tests ..........................   Passed   98.23 sec
1[13:02:40.482] 135/141 Test [#138](/bitcoin-bitcoin/138/): wallet_tests .........................   Passed   22.38 sec
2[13:02:53.347] 136/141 Test [#114](/bitcoin-bitcoin/114/): txvalidationcache_tests ..............   Passed   67.72 sec
3[13:03:29.497] 137/141 Test [#128](/bitcoin-bitcoin/128/): coinselector_tests ...................   Passed   78.19 sec
4[13:03:32.923] 138/141 Test  [#29](/bitcoin-bitcoin/29/): checkqueue_tests .....................   Passed  163.40 sec
5[13:04:12.828] 139/141 Test   [#8](/bitcoin-bitcoin/8/): bench_sanity_check ...................   Passed  207.55 sec
6[13:05:08.881] 140/141 Test   [#5](/bitcoin-bitcoin/5/): secp256k1_noverify_tests .............   Passed  263.61 sec
7[13:08:27.015] 141/141 Test   [#6](/bitcoin-bitcoin/6/): secp256k1_tests ......................   Passed  461.74 sec

willcl-ark commented at 3:08 pm on June 19, 2025: member

It would be helpful to know when we should run these secp256k1 tests in this repo. I think they are run in the libsecp repo as part of CI based on make "$BUILD" defaulting to make check, but I’ve not verified too closely.

I am not sure I see any point to re-running them here on every change. What sounds more sensible to me is to build and run them on subtree updates only. I have a demo branch adding such a CI job which I can tidy and open as a PR, if something like this is wanted?

This perhaps won’t “fix” this issue entirely, but should make a difference of a few minutes per job on a few of our jobs.

maflcko commented at 3:43 pm on June 19, 2025: member

I am not sure I see any point to re-running them here on every change. What sounds more sensible to me is to build and run them on subtree updates only. I have a demo branch adding such a CI job which I can tidy and open as a PR, if something like this is wanted?

No objection from my side.

This perhaps won’t “fix” this issue entirely

I guess the easiest fix would be to call ctest --test-dir ./bld-cmake --show-only=human from the functional test runner and then run those listed in separate processes (when an option like --run-ctest-tests) is given?

willcl-ark commented at 9:33 am on June 20, 2025: member

I opened a draft for disabling the secp256 tests by default, running them only on subtree updates.

Regarding running unit tests from the functional tests, the simplest would probably be to run ctest --test-dir build -j$(nproc) using a new process from multiprocessing, then dump stdout/stderr in the results section and forward on the return code (if non-zero) to the runner.

But it sounds like you are proposing something more like enumerating all tests and running them singly from the runner using the existing infrastructure?

maflcko commented at 9:37 am on June 20, 2025: member

yeah, I am not sure if it is fine to double prarallelize (nproc unit tests at the same time with nproc functional tests), but I guess it just means the early launced functional tests will be a bit slower, which should be fine

willcl-ark commented at 9:55 am on June 20, 2025: member

It appears to work on my machine OK running them both with nproc in parallel. In fact on my 16 core machine I can run ctest at -j16 and test_runner.py --jobs=20 together without issue.

Just to be clear, would we want the ctest output included and integrated in the test_runner results section? If so this will I think involve parsing the ctest output (and therefore more code). If we are happy just to run it using multiprocess and dump stdout/err at the end, and record the exit code, the change will be a fair bit smaller (but the output possibly messier!)

ajtowns commented at 0:23 am on July 17, 2025: contributor

cf #32945 maybe?

Changing test_runner.py from python to cmake logic might make running the functional and unit tests in parallel more plausible?

maflcko commented at 11:38 am on July 17, 2025: member

cf #32945 maybe?

Can’t hurt, but depending on the sanitizer, it is a different test that is the slowest: #32770 (comment) so the approach would have to be done for other tests as well.

Submitted #33000 for the ci in the meantime.

Slow unit tests delay functional tests and leave CPU unused #32770