coverage reports non-deterministic #14343

issue MarcoFalke openend this issue on September 28, 2018
  1. MarcoFalke commented at 1:49 am on September 28, 2018: member

    Our unit tests and functional tests are non-deterministic in the overall execution, but the coverage should not be affected by that. I.e. some functions might be executed in a different order or sometimes skipped, but every line, function and branch should be executed at least once.

    This is currently not true, even for serialization errors that should be hit exactly once.

    Beside the obvious issue of missing test coverage on some runs, this also makes it impossible to see how test coverage changes between two commits.

  2. MarcoFalke added the label Tests on Sep 28, 2018
  3. MarcoFalke added this to the milestone 0.18.0 on Sep 28, 2018
  4. practicalswift commented at 8:46 am on October 1, 2018: contributor

    Non-determinism is bad :-(

    Is there a specific subset of the tests that exhibit this behaviour, or is it across the board?

  5. ryanofsky commented at 4:36 am on October 4, 2018: member
    Marco, how would you suggest fixing this? I imagine adding options to use fixed seeds when choosing random test behaviors would fix some problems, and disabling tests like practicalswift suggested could be a fallback in other cases.
  6. MarcoFalke commented at 7:38 am on October 4, 2018: member

    I have no idea on how to solve this, since some (unit) tests are inherently racy with regard to coverage.

    See for example:

    https://github.com/bitcoin/bitcoin/blob/2796c6e5ec9055545e987023b04eb5ebe64d5320/src/validation.cpp#L2677-L2679

    which is sometimes hit and other times not.

  7. ryanofsky commented at 8:59 am on October 4, 2018: member

    Assuming cases like this aren’t very common, we could add a -test-determinism debug option and change the condition to:

    0if (GetMainSignals().CallbacksPending() > 10 && !g_test_determinism)
    

    This would be similar to the g_debug_lockorder_abort flag, which also modifies behavior for the sake of testing.

  8. practicalswift commented at 9:07 am on October 4, 2018: contributor
    @ryanofsky That is a very good idea. That would also serve as a documentation on where these issues are expected to be encountered. And documenting the issues is the first step towards fixing them :-)
  9. MarcoFalke commented at 9:12 am on October 4, 2018: member

    I think a first step would be to see which tests are the problematic ones (as suggested by @practicalswift earlier), then see where and why they are causing indeterminism and then see how to fix each instance best.

    So maybe run coverage reports for each unit test case, the util tests, and each functional test script thrice or so and see if they differ?

  10. MarcoFalke commented at 11:12 pm on December 4, 2018: member
    Alternatively, the coverage reports should just run in a (virtual) environment that always executes deterministically
  11. practicalswift commented at 5:44 pm on January 30, 2019: contributor

    I’ve worked on this and this is the current status:

    I’m now able to run src/test/test_bitcoin, src/qt/test/test_bitcoin-qt and src/bench/bench_bitcoin -evals=1 -scaling=0 with 100% determinism with regards to line coverage.

    More specifically I’m getting exactly the same set of covered lines across hundreds of test runs. (A line is covered if it is executed at least once during a test run.)

    I had to patch the following files to achieve determinism:

    • src/qt/rpcconsole.cpp (RPCConsole::setClientModel)
    • src/qt/trafficgraphwidget.cpp (TrafficGraphWidget::updateRates)
    • src/test/test_bitcoin.{cpp,h} (BasicTestingSetup::BasicTestingSetup)
    • src/validation.cpp (CChainState::ActivateBestChain)
    • src/wallet/wallet.cpp (GetLocktimeForNewTransaction)

    Additionally I had to disable the following ten unit tests which were non-deterministic also in the presence of said patches:

    • bloom_tests/rolling_bloom
    • coins_tests/updatecoins_simulation_test
    • coinselector_tests/knapsack_solver_test
    • denialofservice_tests/DoS_mapOrphans
    • scheduler_tests/manythreads
    • scheduler_tests/singlethreadedscheduler_ordered
    • tx_validationcache_tests/tx_mempool_block_doublespend
    • txindex_tests/txindex_initial_sync
    • validation_block_tests/processnewblock_signals_ordering
    • wallet_tests/ListCoins

    I’m currently using #ifdef DETERMINISTIC_COVERAGE_MODE_UNSAFE_FOR_PRODUCTION to keep my changes opt-in.

    For example:

     0diff --git a/src/validation.cpp b/src/validation.cpp
     1index 6a26bf9ba..61a5558ff 100644
     2--- a/src/validation.cpp
     3+++ b/src/validation.cpp
     4@@ -2677,6 +2677,9 @@ bool CChainState::ActivateBestChain(CValidationState &state, const CChainParams&
     5     do {
     6         boost::this_thread::interruption_point();
     7
     8+#ifdef DETERMINISTIC_COVERAGE_MODE_UNSAFE_FOR_PRODUCTION
     9+        // Skip validation queue drain to achieve line coverage determinism in tests.
    10+#else
    11         if (GetMainSignals().CallbacksPending() > 10) {
    12             // Block until the validation queue drains. This should largely
    13             // never happen in normal operation, however may happen during
    14@@ -2686,6 +2689,7 @@ bool CChainState::ActivateBestChain(CValidationState &state, const CChainParams&
    15             // probably have a DEBUG_LOCKORDER test for this in the future.
    16             SyncWithValidationInterfaceQueue();
    17         }
    18+#endif
    19
    20         {
    21             LOCK(cs_main);
    

    Is that a sensible approach?

  12. practicalswift commented at 5:50 pm on January 30, 2019: contributor
    I’ve also written a shell script that runs the test suite up to N times and errors if line coverage non-determinism is detected in any of the runs.
  13. MarcoFalke commented at 5:58 pm on January 30, 2019: member
    Wow, thanks for working on this. I am definitely interested in seeing the helper script and patch.
  14. MarcoFalke referenced this in commit 9e7f8f6c82 on Feb 5, 2019
  15. MarcoFalke removed this from the milestone 0.18.0 on Feb 13, 2019
  16. MarcoFalke referenced this in commit 80112b17e7 on Mar 2, 2019
  17. practicalswift commented at 3:51 pm on July 4, 2019: contributor
    Partial solution offered in #16320 :-)
  18. fanquake referenced this in commit 12f7147c89 on Aug 23, 2019
  19. PastaPastaPasta referenced this in commit 7fe7b445f2 on Jun 27, 2021
  20. PastaPastaPasta referenced this in commit ba7dc914eb on Jun 27, 2021
  21. PastaPastaPasta referenced this in commit 3e4dd8d9fc on Jun 28, 2021
  22. PastaPastaPasta referenced this in commit 3b989cd794 on Jun 28, 2021
  23. PastaPastaPasta referenced this in commit 28b30d5464 on Jun 28, 2021
  24. PastaPastaPasta referenced this in commit ebf56727ab on Jun 29, 2021
  25. PastaPastaPasta referenced this in commit 4eafb4dea6 on Jun 29, 2021
  26. PastaPastaPasta referenced this in commit fad61df3f2 on Jul 1, 2021
  27. PastaPastaPasta referenced this in commit 2624540a97 on Jul 1, 2021
  28. PastaPastaPasta referenced this in commit 963e4c3d9f on Jul 8, 2021
  29. PastaPastaPasta referenced this in commit dd2f85c084 on Jul 10, 2021
  30. MarcoFalke closed this on May 6, 2022

  31. DrahtBot locked this on May 6, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-04 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me