tests: Automatically order regtests by runtime #13285

issue domob1812 opened this issue on May 20, 2018
  1. domob1812 commented at 11:11 AM on May 20, 2018: contributor

    Bitcoin's regtests are ordered roughly by runtime (longest running first) to aid in parallelism in test/functional/test_runner.py. This makes sense, but having the manually-ordered list in the script has two issues:

    1. It is brittle and error-prone, as whoever adds a new test has to judge where to put it. Also, when a test is changed in a way that affects its runtime, reordering the list is easily forgotten.

    2. Having the list in a weird order makes it much less readable. In code, it should be ordered in a way that is logical for developers to write and read, with related tests next to each other.

    I would like to make the ordering automatic. For this, we can just have a new data file (e.g., JSON or generated Python code) that has the runtime for each test, and then use this file to sort the list automatically before running any tests. This file should probably be checked into version control, but test_runner.py can have a new execution mode that not only prints the runtimes after the fact (which it already does) but also saves them to this file. That way, it will be easy to keep the file updated, and no manual judging is required.
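
    A minimal sketch of what that sorting step could look like (the file name, script names, and JSON shape here are illustrative assumptions, not actual test_runner.py code):

    ```python
    import json

    # Hypothetical runtimes file mapping test script name -> last measured
    # runtime in seconds, e.g. {"wallet_example.py": 30, ...}.
    RUNTIMES_FILE = "test_runtimes.json"

    # Scripts listed in whatever order is logical for developers to read.
    BASE_SCRIPTS = [
        "feature_example.py",
        "wallet_example.py",
        "p2p_example.py",
    ]

    def sort_by_runtime(scripts, runtimes_path=RUNTIMES_FILE):
        """Return scripts sorted longest-running first; unknown tests go last."""
        try:
            with open(runtimes_path) as f:
                runtimes = json.load(f)
        except FileNotFoundError:
            # No recorded runtimes yet: keep the logical order as-is.
            return list(scripts)
        return sorted(scripts, key=lambda s: runtimes.get(s, 0), reverse=True)
    ```

    The list in the script stays in a readable, logical order, while the runtime data lives in one generated file.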

    Please let me know what you think about this proposal - I would like to get feedback before I implement it, so that I can take that feedback into account and not waste my time.

  2. domob1812 commented at 11:12 AM on May 20, 2018: contributor

    If having a checked-in but generated data file is not good, we could also specify the approximate runtime manually - either in the list in test_runner.py or in each test file itself. That would at least solve issue 2.

  3. fanquake added the label Tests on May 20, 2018
  4. MarcoFalke commented at 4:27 PM on May 21, 2018: member

    See also #11964

  5. domob1812 commented at 5:59 PM on May 22, 2018: contributor

    Thanks for the reference, that sounds interesting. Let's see how that goes and whether or not it makes sense to still work on this smaller-scope proposal in the meantime.

  6. jeffrade commented at 6:12 PM on May 22, 2018: contributor

    Instead of a separate file or in test_runner.py, could we add any metadata about the test in the test files themselves? This would remove the dependency on test_runner.py and avoid the complexity of adding another file. See this comment from #11964 for requirements that I think will work best here.

  7. domob1812 commented at 6:27 PM on May 22, 2018: contributor

    Yes, that's certainly possible. The disadvantage I see is that it still means we have to manually specify the runtime, and cannot generate it automatically. But I think that's a minor drawback, and not having another list of tests (even if autogenerated) may well outweigh it. This certainly solves the main issue I'm interested in, which is the manual ordering in the list.

    I currently don't have time (nor do I want to step on your toes here) to implement the full set of metadata mentioned in #11964, though. So I would only include the rough runtime in the tests to make the order automatic (i.e., the order in test_runner.py itself won't matter and can be based on the logical relation of tests rather than runtime). I would not implement the full set of metadata to get rid of the list completely. Do you still think that's worthwhile for now, or do you just want to implement the full solution yourself instead?

  8. jeffrade commented at 7:32 PM on May 22, 2018: contributor

    @domob1812 Sorry, I had to re-read your original idea and now see what it solves. I'd say go ahead and open a PR for what you want to tackle. I think they are independent of each other and your solution would get rid of the manual work involved.

  9. jeffrade commented at 7:57 PM on May 22, 2018: contributor

    For this, we can just have a new data file (e. g., JSON or generated Python code) that has the runtime for each test, and then use this file to sort the list automatically before running any tests.

    Also, IMO, this file should just store the descending order of the tests and not the actual time (e.g. execution time in seconds). That way, commits to this new JSON file are only necessary when a new or existing test starts taking longer (or shorter) than another test.

    In other words, we don't care if test_a takes 64 seconds and test_b takes 53 seconds. Just that test_a takes longer than test_b.
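
    A sketch of that order-only idea (the helper and file name are hypothetical): the checked-in file stores just the ordered list, so it only needs rewriting when two tests swap places:

    ```python
    import json

    def order_changed(measured_seconds, order_path):
        """Return True if the freshly measured descending order differs
        from the order stored in the checked-in file.

        measured_seconds: dict mapping test name -> runtime in seconds.
        """
        new_order = sorted(measured_seconds, key=measured_seconds.get, reverse=True)
        try:
            with open(order_path) as f:
                old_order = json.load(f)
        except FileNotFoundError:
            old_order = []
        return new_order != old_order
    ```

    So a run where test_a drops from 64s to 60s changes nothing, but a run where it falls below test_b triggers a one-line diff.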

  10. jnewbery commented at 1:34 PM on May 23, 2018: member

    To me, this seems like over-engineering something that isn't a huge issue. The reason we order the tests in descending execution time order is to save a bit of time when running all tests through the test_runner. As long as the most time-consuming tests are at the top, reordering the rest of the tests shaves off a few seconds at best.

    Why not just run the tests once, order the results by time and then submit a PR that reorders the tests if they're in the wrong order? I don't think we need to worry about reordering them more than once a year.

  11. domob1812 commented at 4:03 PM on May 23, 2018: contributor

    Thanks everyone for the input, that's exactly why I opened this issue before working on the actual code! @jnewbery: I fully agree that just the "automatic ordering" aspect is likely not too important in itself, and that having an auto-generated order might be overengineering. However, as far as I can see your comment relates only to point 1) of my original report. I believe that point 2) is actually the important part (perhaps that was not made very clear): I want to get rid of the requirement to have tests in order in the code, at least roughly. Instead, the code should be mainly written for us developers, and should be in an order that makes sense from a logical point of view.

    So if you (and others) believe that solving point 1) in addition is not worth the complication of the auto-generated list, my proposal is to solve just point 2) by manually including (approximate) test runtimes somewhere - for instance, by a limited implementation of metadata in tests as suggested by #11964 (where the metadata is for now just the test runtime, so that test_runner.py can sort on the fly before running the tests). What do you think about this proposal?

  12. jnewbery commented at 4:58 PM on May 23, 2018: member

    a limited implementation of metadata in tests .. for test runtime.

    This seems to me to be more maintenance overhead than the list that currently exists in one place in test_runner.py

  13. domob1812 commented at 5:18 PM on May 23, 2018: contributor

    @jnewbery: Why? This would be a single number in the test itself, which is a place where it is logical to put it (not in a separate list). Also, having metadata in the tests themselves seems to be what we are going to anyway with #11964 at some point in the future.

    Also, again, my goal is not to reduce the maintenance overhead of keeping the current list sorted. I want to get rid of the constraint that it has to be sorted, and thus allow reordering it in a more readable and logical way. That's what I want to solve here - if you think this is a non-goal (I disagree, for readability reasons of the code), then of course this proposal is not useful.

  14. jeffrade commented at 5:57 PM on May 23, 2018: contributor

    Why not just run the tests once, order the results by time and then submit a PR that reorders the tests if they're in the wrong order? I don't think we need to worry about reordering them more than once a year.

    If I understand this correctly, sounds like the minimum, IMO, that can be done is:

    • Pull the BASE_SCRIPTS and EXTENDED_SCRIPTS into another file (simple JSON file(s)) keeping their current order.
    • Then when running test_runner.py with a new flag (e.g. track_time), we have logic that will keep track of the time it takes to run each test and keep the order in a list in memory.
    • Then take that list in memory and write the list order to these new JSON file(s).
    • Then if there are changes to that JSON, changes will show in git and the developer can choose whether to commit the new ordering or not.
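
    The steps above could be sketched roughly as follows (the track_time flag name comes from the comment; neither it nor these helpers exist in the real test_runner.py):

    ```python
    import json
    import subprocess
    import time

    def track_and_write_order(scripts, order_path, runner=None):
        """Run each test once, time it, and write the descending order to JSON."""
        # Default runner invokes each script as a subprocess; a fake runner
        # can be injected for testing.
        runner = runner or (lambda s: subprocess.run(["python3", s], check=False))
        timings = {}
        for script in scripts:
            start = time.monotonic()
            runner(script)
            timings[script] = time.monotonic() - start
        order = sorted(timings, key=timings.get, reverse=True)
        with open(order_path, "w") as f:
            json.dump(order, f, indent=2)
        return order
    ```

    If the resulting JSON differs from what is checked in, git shows the diff and the developer can decide whether to commit the new ordering.
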
  15. jnewbery commented at 7:34 PM on May 23, 2018: member

    I'm afraid I just don't agree with the premise of this issue. Having the test names not sorted lexicographically in a list in test_runner.py has never caused issues for me.

    I'm not saying you shouldn't do this - other contributors may feel the same way as you - but for me this is a concept NACK. Review/maintenance overhead seems higher than just having a list in test_runner.py and occasionally reordering it.

  16. jeffrade commented at 7:52 PM on May 23, 2018: contributor

    I'm not saying you shouldn't do this - other contributors may feel the same way as you - but for me this is a concept NACK. Review/maintenance overhead seems higher than just having a list in test_runner.py and occasionally reordering it.

    Fair.

    If @domob1812 decides to open a PR, I'd be happy to review.

  17. domob1812 commented at 4:31 PM on May 24, 2018: contributor

    Let me perhaps explain how I came to this idea: I'm the core maintainer of Namecoin, which is based on Bitcoin Core and tries to follow the upstream development as closely as possible. We do have our own set of regression tests, in addition to the ones Bitcoin has. I most certainly want to have all of them in a single block in test_runner.py, currently at the end (definitely not mixed into the Bitcoin tests, for various reasons from merge conflicts to clarity in reading the code). This means that the tests take longer than they need to, because our longer running tests are not parallelised well.

    Of course, that's by itself not a valid reason to change anything in Bitcoin. First, that's Namecoin's problem and not Bitcoin's. And second, we can solve the issue quite easily by moving the block of tests to the beginning rather than end of the test list.

    However, since in my opinion it clearly makes sense to have the "name-related" tests in Namecoin in their own block rather than mixed into all other tests, I assume that the same argument can be made also for Bitcoin's tests. I'm not strictly thinking about a lexicographical order; perhaps having all "segwit-related" tests next to each other would make sense, or all P2P tests (the latter is close to lexicographic ordering now with the naming convention).

    But if you disagree and don't see value in doing this for Bitcoin, I can understand that we should not introduce more complexity "for nothing" - then I'm perfectly happy to close this issue as "won't fix", and just put Namecoin's tests at the beginning of the list. But ideally I would like to know before coding it up whether or not such a change would be accepted (not considering the actual code quality for now, of course).

  18. jnewbery commented at 4:44 PM on May 24, 2018: member

    Thanks for the full disclosure @domob1812! As you note, Namecoin's concerns aren't a valid reason to change anything in Bitcoin :)

    would like to know before coding it up whether or not such a change would be accepted

    I can't tell you that. I'm a concept NACK, but other contributors might think this is useful.

  19. MarcoFalke commented at 5:06 PM on May 24, 2018: member

    You could sort both lists by time and then do a zipper merge instead of a concatenation to get a result that is probably good enough. :p
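
    For illustration, such a zipper merge of two lists that are each already sorted longest-first might look like this (hypothetical helper, not actual test_runner.py code):

    ```python
    from itertools import chain, zip_longest

    def zipper_merge(upstream, downstream):
        """Interleave two lists, preserving each list's internal order."""
        skip = object()  # sentinel marking the shorter list's missing slots
        pairs = zip_longest(upstream, downstream, fillvalue=skip)
        return [s for s in chain.from_iterable(pairs) if s is not skip]
    ```

    For example, zipper_merge(["a", "b", "c"], ["x"]) gives ["a", "x", "b", "c"], so the downstream tests get spread among the upstream ones instead of appended at the end.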

  20. MarcoFalke closed this on May 24, 2018

  21. MarcoFalke commented at 5:07 PM on May 24, 2018: member

    Closing for now, since there seems to be some disagreement and we already have the other issue open. That seems enough. Also, everyone is free to propose pull request changes, but I don't think we should keep this issue open.

  22. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-17 09:15 UTC
