Suggestion on how to mitigate non-determinism in tests #17731

issue davereikher opened this issue on December 12, 2019
  1. davereikher commented at 6:17 AM on December 12, 2019: contributor

    Following issue #14343, in (the still open) PR #16878 I used a specific approach to address the non-deterministic coverage issue and I'd like to present it here in general form to a wider audience to get additional opinions on that approach because I think it can be used on other such tests to mitigate non-determinism.

    The approach can be broadly described by a 'recipe' for any given test:

    • Run the coverage measuring script contrib/devtools/test_deterministic_coverage.sh many times on the test and write down all branches that sometimes run and sometimes don't.
    • Figure out how to cause all of those branches to run from the test body.
    • Add code at the end of the test (after all the assertions) that would cause all of those branches to run.

    This approach worked effectively on the test DoS_mapOrphans in PR #16878 and I believe it can be used on some or maybe all of the tests listed in contrib/devtools/test_deterministic_coverage.sh. As far as I can see, the only disadvantage for this approach is the seemingly unrelated code at the end of such tests required to ensure coverage but I believe any confusion can be eliminated by a comment explaining what it's there for.

    What do people think?

  2. davereikher added the label Feature on Dec 12, 2019
  3. practicalswift commented at 8:54 AM on December 12, 2019: contributor

    @davereikher

    Very glad that you are interested in tackling the remaining instances of non-determinism in our unit tests. That's great!

    One of the problems with line coverage non-determinism is that it makes it impossible to reason about line coverage with precision. More specifically having line coverage non-determinism makes it impossible to say if a PR will reduce or increase line coverage since the delta in line coverage due to the suggested change will be added to a random variable (the existing line coverage which varies between runs due to non-determinism in a few tests).

    In cases where the ideal solution of making the tests deterministic by adjusting the existing test logic is not possible or hard to achieve I think your suggestion makes sense as a second-best alternative. That will give us a stable line coverage number and thus the ability to reason about the line coverage impact of suggested changes :)

    Since this only is relevant for a small set of tests I don't see any real downside of what you suggest: if/when "proper test determinism" is introduced for the test in question the temporary fix suggested can simply be removed, and a comment on why the extra testing code is needed will eliminate any confusion.

    Again: thanks a lot for working on making our unit tests deterministic in terms of line coverage! :)

    Don't hesitate to ping me if you need a test determinism PR reviewed or if I can help in any other way :)

  4. fanquake removed the label Feature on Dec 12, 2019
  5. fanquake added the label Brainstorming on Dec 12, 2019
  6. fanquake added the label Tests on Dec 12, 2019
  7. MarcoFalke commented at 4:57 PM on December 13, 2019: member

    determinism shouldn't be achieved by "painting lines green" in the coverage report. This merely hides the non-determinism. What we want are tests that run exactly the same branches over and over again (assuming the same seed).

  8. davereikher commented at 8:17 PM on December 21, 2019: contributor

    So, if I understand correctly, there currently exists a problem where it is unclear whether a PR increases or decreases coverage. It would be beneficial to make all tests deterministic but it's not clear how to do it since it was observed (see here) that the non-determinism is not due to a changing seed but most likely due to concurrency issues. I see the following options:

    1. Leave everything as is to avoid "painting the lines green", ignoring the non-deterministic coverage issue,
    2. Use the proposed method, at least temporarily, until a better permanent solution is found,
    3. Re-write the tests with the non-deterministic coverage to make it deterministic, using a different approach,
    4. Remove the non-deterministic tests,

    I guess the fourth option is most likely not realistic, as it would reduce coverage significantly. The third option is the best in it's outcome, but I think it would take a very large amount of effort. Perhaps it's not even possible without some serious refactoring for some of the tests. This is due to the fact that the non-determinism is not just a seed issue, it's most likely concurrency-related and I personally don't see another way to relatively easily achieve coverage determinism without deep changes to those tests and some of the fixtures. The second option can be seen as a temporary patch which addresses the issue that a small amount of tests were written without consideration of coverage determinism. They therefore introduced an inability to track how a new PR affects coverage. Thus a patch in the form of "covering the lines green" is requied to resolve this 'bug'. I agree that from an aesthetic point of view the patch may be a ugly, but given the current situation I think it contributes overall and makes work more efficient by allowing to see how a new PR affects the coverage.

    Of course, it's also a question of how important knowing if a new PR affects coverage is. If it's very important, maybe even option 4 can be considered. If it's not at all, we can close the issue. So I think eventually it boils down to whether the effect of the new PR question is more important than the aesthetics of the tests?

  9. adamjonas commented at 3:20 PM on August 2, 2022: member

    Marco and co have continued to be vigilant about non-determinism in tests. Closing due to lack of momentum. Can re-open if future interest in this specific approach.

  10. adamjonas closed this on Aug 2, 2022

  11. bitcoin locked this on Aug 2, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-29 03:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me