Suggestion on how to mitigate non-determinism in tests #17731

issue davereikher opened this issue on December 12, 2019

davereikher commented at 6:17 AM on December 12, 2019: contributor
Following issue #14343, in (the still open) PR #16878 I used a specific approach to address the non-deterministic coverage issue and I'd like to present it here in general form to a wider audience to get additional opinions on that approach because I think it can be used on other such tests to mitigate non-determinism.

The approach can be broadly described by a 'recipe' for any given test:
- Run the coverage measuring script contrib/devtools/test_deterministic_coverage.sh many times on the test and write down all branches that sometimes run and sometimes don't.
- Figure out how to cause all of those branches to run from the test body.
- Add code at the end of the test (after all the assertions) that would cause all of those branches to run.
This approach worked effectively on the test DoS_mapOrphans in PR #16878 and I believe it can be used on some or maybe all of the tests listed in contrib/devtools/test_deterministic_coverage.sh. As far as I can see, the only disadvantage for this approach is the seemingly unrelated code at the end of such tests required to ensure coverage but I believe any confusion can be eliminated by a comment explaining what it's there for.

What do people think?
davereikher added the label Feature on Dec 12, 2019
practicalswift commented at 8:54 AM on December 12, 2019: contributor

@davereikher

Very glad that you are interested in tackling the remaining instances of non-determinism in our unit tests. That's great!

One of the problems with line coverage non-determinism is that it makes it impossible to reason about line coverage with precision. More specifically having line coverage non-determinism makes it impossible to say if a PR will reduce or increase line coverage since the delta in line coverage due to the suggested change will be added to a random variable (the existing line coverage which varies between runs due to non-determinism in a few tests).

In cases where the ideal solution of making the tests deterministic by adjusting the existing test logic is not possible or hard to achieve I think your suggestion makes sense as a second-best alternative. That will give us a stable line coverage number and thus the ability to reason about the line coverage impact of suggested changes :)

Since this only is relevant for a small set of tests I don't see any real downside of what you suggest: if/when "proper test determinism" is introduced for the test in question the temporary fix suggested can simply be removed, and a comment on why the extra testing code is needed will eliminate any confusion.

Again: thanks a lot for working on making our unit tests deterministic in terms of line coverage! :)

Don't hesitate to ping me if you need a test determinism PR reviewed or if I can help in any other way :)
fanquake removed the label Feature on Dec 12, 2019
fanquake added the label Brainstorming on Dec 12, 2019
fanquake added the label Tests on Dec 12, 2019
MarcoFalke commented at 4:57 PM on December 13, 2019: member

determinism shouldn't be achieved by "painting lines green" in the coverage report. This merely hides the non-determinism. What we want are tests that run exactly the same branches over and over again (assuming the same seed).
davereikher commented at 8:17 PM on December 21, 2019: contributor
So, if I understand correctly, there currently exists a problem where it is unclear whether a PR increases or decreases coverage. It would be beneficial to make all tests deterministic but it's not clear how to do it since it was observed (see here) that the non-determinism is not due to a changing seed but most likely due to concurrency issues. I see the following options:
1. Leave everything as is to avoid "painting the lines green", ignoring the non-deterministic coverage issue,
2. Use the proposed method, at least temporarily, until a better permanent solution is found,
3. Re-write the tests with the non-deterministic coverage to make it deterministic, using a different approach,
4. Remove the non-deterministic tests,
I guess the fourth option is most likely not realistic, as it would reduce coverage significantly. The third option is the best in it's outcome, but I think it would take a very large amount of effort. Perhaps it's not even possible without some serious refactoring for some of the tests. This is due to the fact that the non-determinism is not just a seed issue, it's most likely concurrency-related and I personally don't see another way to relatively easily achieve coverage determinism without deep changes to those tests and some of the fixtures. The second option can be seen as a temporary patch which addresses the issue that a small amount of tests were written without consideration of coverage determinism. They therefore introduced an inability to track how a new PR affects coverage. Thus a patch in the form of "covering the lines green" is requied to resolve this 'bug'. I agree that from an aesthetic point of view the patch may be a ugly, but given the current situation I think it contributes overall and makes work more efficient by allowing to see how a new PR affects the coverage.

Of course, it's also a question of how important knowing if a new PR affects coverage is. If it's very important, maybe even option 4 can be considered. If it's not at all, we can close the issue. So I think eventually it boils down to whether the effect of the new PR question is more important than the aesthetics of the tests?
adamjonas commented at 3:20 PM on August 2, 2022: member

Marco and co have continued to be vigilant about non-determinism in tests. Closing due to lack of momentum. Can re-open if future interest in this specific approach.
adamjonas closed this on Aug 2, 2022
bitcoin locked this on Aug 2, 2023

Contributors

Labels

Linked (view graph)

#14343 coverage reports non-deterministic #16878 Fix non-deterministic coverage of test DoS_mapOrphans