test: Let test_runner.py start multiple jobs per timeslot #23799

sipa commented at 8:25 PM on December 16, 2021: member

test_runner.py currently only checks every 0.5s whether any job has finished, and if so, starts at most one new job. At higher parallellism it becomes increasingly likely that multiple jobs have finished at the same time. Fix this by always noticing all finished jobs every timeslot, and starting as many new ones.

Let test_runner.py start multiple jobs per timeslot 975097f424

sipa commented at 8:27 PM on December 16, 2021: member

At -j32 it speeds things up by a few seconds wall clock time for me. Though the runtime is mostly dominated by a few very long-running jobs. Splitting those up could significantly increase the gain.

MarcoFalke commented at 8:29 PM on December 16, 2021: member

"Duplicate" of #13384?

sipa commented at 8:31 PM on December 16, 2021: member

Eh, not really - this still polls every 0.5s. It's just not limited to detecting at most one completed task per 0.5s.

DrahtBot added the label Tests on Dec 16, 2021

laanwj commented at 3:44 PM on December 17, 2021: member

The approach in #13384 would be the most efficient, to start a new test as soon as another stops. But this is an improvement. Concept ACK.

sipa commented at 4:08 PM on December 17, 2021: member

I'm perfectly fine with the approach in #13384 as well. This PR is just an obvious incremental improvement to the current code with no impact apart from making things a bit faster, so I'd expect it to be perhaps less controversial.

in test/functional/test_runner.py:536 in 975097f424

 558 | -                break
 559 | +    i = 0
 560 | +    while i < test_count:
 561 | +        for test_result, testdir, stdout, stderr in job_queue.get_next():
 562 | +            test_results.append(test_result)
 563 | +            i += 1

unknown commented at 4:32 PM on December 17, 2021:

Looked at this code in which for i in range(test_count) is replaced by while i< test_count):

Read the answers here https://stackoverflow.com/questions/869229/why-is-looping-over-range-in-python-faster-than-using-a-while-loop and I am not sure why this would be faster.

Ignore the review if it does not make sense or I am missing something important.

sipa commented at 4:35 PM on December 17, 2021:

That's not what makes it faster.

The speedup is due to get_next() now returning all finished jobs, instead of one finished job (and then sleeping 0.5s between calls). The change here is just to deal with the fact that the returned value is now a list of jobs instead of a single one.

unknown approved

unknown commented at 5:09 PM on December 17, 2021: none

ACK https://github.com/bitcoin/bitcoin/pull/23799/commits/975097f424541a149c5b4e03816208aa76aad6b9

laanwj commented at 6:39 PM on December 17, 2021: member

perhaps less controversial.

Sure. Though I don't think it was controversial. I definitely didn't mean my comment like that at the time, it was far from a NACK. I just had a question about the dot-printing, and not being able to exit with Ctrl-C was lightly annoying. I'm sure those could be solved.

MarcoFalke commented at 1:16 PM on December 18, 2021: member

Yeah, I am wondering if the CTRL+C needs to be captured and translated into a kill?

laanwj commented at 4:18 PM on January 5, 2022: member

Code review and lightly tested ACK 975097f424541a149c5b4e03816208aa76aad6b9

Yeah, I am wondering if the CTRL+C needs to be captured and translated into a kill?

Maybe. Usually it's a result of accidentally ignoring KeyboardInterrupt exceptions (for example in threads). But haven't checked it. I'm going ahead and merge this for now. I guess someone interested in the other PR can pick it up.

laanwj renamed this:
~~Let test_runner.py start multiple jobs per timeslot~~
test: Let test_runner.py start multiple jobs per timeslot
on Jan 5, 2022

laanwj merged this on Jan 5, 2022

laanwj closed this on Jan 5, 2022

sidhujag referenced this in commit 4ef7f3b5d8 on Jan 6, 2022

in test/functional/test_runner.py:534 in 975097f424

 556 | -            if failfast:
 557 | -                logging.debug("Early exiting after test failure")
 558 | -                break
 559 | +    i = 0
 560 | +    while i < test_count:
 561 | +        for test_result, testdir, stdout, stderr in job_queue.get_next():

MarcoFalke commented at 2:05 PM on January 6, 2022:

...............
               
Remaining jobs: [wallet_import_rescan.py --legacy-wallet, p2p_node_network_limited.py]
..............................................................................................................................................................................................................................
                                                                                                                                                                                                                              
Remaining jobs: [wallet_import_rescan.py --legacy-wallet]
......................................................................................................................................................................................
                                                                                                                                                                                      
----------------------------------------------------------------------
Ran 10 tests in 0.753s
OK
Traceback (most recent call last):
  File "test/functional/test_runner.py", line 816, in <module>
    main()
  File "test/functional/test_runner.py", line 460, in main
    run_tests(
  File "test/functional/test_runner.py", line 535, in run_tests
    for test_result, testdir, stdout, stderr in job_queue.get_next():
  File "test/functional/test_runner.py", line 647, in get_next
    raise IndexError('pop from empty list')
IndexError: pop from empty list

https://cirrus-ci.com/task/6633293054476288?logs=ci#L6037

sipa commented at 6:10 PM on January 6, 2022:

Hmm, I don't understand how this is possible. It requires get_next() to be called when self.jobs and self.test_list are empty, while i < test_count in run_tests(). The latter should imply that self.test_list is not empty.

sipa commented at 6:57 PM on January 6, 2022:

See https://github.com/bitcoin/bitcoin/pull/23995

mzumsande commented at 2:13 PM on January 7, 2022:

see #23995 (comment) for a possible explanation.

0967622354toon commented at 5:29 PM on January 6, 2022: none

Duplicate of #

0967622354toon commented at 5:29 PM on January 6, 2022: none

H

sipa cross-referenced this on Jan 6, 2022 from issue Simplify test_runner.py a bit by sipa

softminus cross-referenced this on Jan 26, 2022 from issue Add "test groups" to `rpc-tests.py` by str4d

mzumsande cross-referenced this on Jan 28, 2022 from issue test: Fix failfast option for functional test runner by mzumsande

MarcoFalke referenced this in commit 9392e1350c on Feb 7, 2022

sidhujag referenced this in commit 443645d1a3 on Feb 7, 2022

bitcoin locked this on Jan 7, 2023