coverage reports non-deterministic

MarcoFalke commented at 1:49 am on September 28, 2018: member

Our unit tests and functional tests are non-deterministic in the overall execution, but the coverage should not be affected by that. I.e. some functions might be executed in a different order or sometimes skipped, but every line, function and branch should be executed at least once.

This is currently not true, even for serialization errors that should be hit exactly once.

Beside the obvious issue of missing test coverage on some runs, this also makes it impossible to see how test coverage changes between two commits.

MarcoFalke added the label Tests on Sep 28, 2018

MarcoFalke added this to the milestone 0.18.0 on Sep 28, 2018

practicalswift commented at 8:46 am on October 1, 2018: contributor

Non-determinism is bad :-(

Is there a specific subset of the tests that exhibit this behaviour, or is it across the board?

ryanofsky commented at 4:36 am on October 4, 2018: member

Marco, how would you suggest fixing this? I imagine adding options to use fixed seeds when choosing random test behaviors would fix some problems, and disabling tests like practicalswift suggested could be a fallback in other cases.

MarcoFalke commented at 7:38 am on October 4, 2018: member

I have no idea on how to solve this, since some (unit) tests are inherently racy with regard to coverage.

See for example:

https://github.com/bitcoin/bitcoin/blob/2796c6e5ec9055545e987023b04eb5ebe64d5320/src/validation.cpp#L2677-L2679

which is sometimes hit and other times not.

ryanofsky commented at 8:59 am on October 4, 2018: member

Assuming cases like this aren’t very common, we could add a -test-determinism debug option and change the condition to:

0if (GetMainSignals().CallbacksPending() > 10 && !g_test_determinism)

This would be similar to the g_debug_lockorder_abort flag, which also modifies behavior for the sake of testing.

practicalswift commented at 9:07 am on October 4, 2018: contributor

@ryanofsky That is a very good idea. That would also serve as a documentation on where these issues are expected to be encountered. And documenting the issues is the first step towards fixing them :-)

MarcoFalke commented at 9:12 am on October 4, 2018: member

I think a first step would be to see which tests are the problematic ones (as suggested by @practicalswift earlier), then see where and why they are causing indeterminism and then see how to fix each instance best.

So maybe run coverage reports for each unit test case, the util tests, and each functional test script thrice or so and see if they differ?

MarcoFalke commented at 11:12 pm on December 4, 2018: member

Alternatively, the coverage reports should just run in a (virtual) environment that always executes deterministically

practicalswift commented at 5:44 pm on January 30, 2019: contributor

I’ve worked on this and this is the current status:

I’m now able to run src/test/test_bitcoin, src/qt/test/test_bitcoin-qt and src/bench/bench_bitcoin -evals=1 -scaling=0 with 100% determinism with regards to line coverage.

More specifically I’m getting exactly the same set of covered lines across hundreds of test runs. (A line is covered if it is executed at least once during a test run.)

I had to patch the following files to achieve determinism:

src/qt/rpcconsole.cpp (RPCConsole::setClientModel)
src/qt/trafficgraphwidget.cpp (TrafficGraphWidget::updateRates)
src/test/test_bitcoin.{cpp,h} (BasicTestingSetup::BasicTestingSetup)
src/validation.cpp (CChainState::ActivateBestChain)
src/wallet/wallet.cpp (GetLocktimeForNewTransaction)

Additionally I had to disable the following ten unit tests which were non-deterministic also in the presence of said patches:

bloom_tests/rolling_bloom
coins_tests/updatecoins_simulation_test
coinselector_tests/knapsack_solver_test
denialofservice_tests/DoS_mapOrphans
scheduler_tests/manythreads
scheduler_tests/singlethreadedscheduler_ordered
tx_validationcache_tests/tx_mempool_block_doublespend
txindex_tests/txindex_initial_sync
validation_block_tests/processnewblock_signals_ordering
wallet_tests/ListCoins

I’m currently using #ifdef DETERMINISTIC_COVERAGE_MODE_UNSAFE_FOR_PRODUCTION to keep my changes opt-in.

For example:

 0diff --git a/src/validation.cpp b/src/validation.cpp
 1index 6a26bf9ba..61a5558ff 100644
 2--- a/src/validation.cpp
 3+++ b/src/validation.cpp
 4@@ -2677,6 +2677,9 @@ bool CChainState::ActivateBestChain(CValidationState &state, const CChainParams&
 5     do {
 6         boost::this_thread::interruption_point();
 7
 8+#ifdef DETERMINISTIC_COVERAGE_MODE_UNSAFE_FOR_PRODUCTION
 9+        // Skip validation queue drain to achieve line coverage determinism in tests.
10+#else
11         if (GetMainSignals().CallbacksPending() > 10) {
12             // Block until the validation queue drains. This should largely
13             // never happen in normal operation, however may happen during
14@@ -2686,6 +2689,7 @@ bool CChainState::ActivateBestChain(CValidationState &state, const CChainParams&
15             // probably have a DEBUG_LOCKORDER test for this in the future.
16             SyncWithValidationInterfaceQueue();
17         }
18+#endif
19
20         {
21             LOCK(cs_main);

Is that a sensible approach?

practicalswift commented at 5:50 pm on January 30, 2019: contributor

I’ve also written a shell script that runs the test suite up to N times and errors if line coverage non-determinism is detected in any of the runs.

MarcoFalke commented at 5:58 pm on January 30, 2019: member

Wow, thanks for working on this. I am definitely interested in seeing the helper script and patch.

MarcoFalke referenced this in commit 9e7f8f6c82 on Feb 5, 2019

MarcoFalke removed this from the milestone 0.18.0 on Feb 13, 2019

MarcoFalke referenced this in commit 80112b17e7 on Mar 2, 2019

practicalswift commented at 3:51 pm on July 4, 2019: contributor

Partial solution offered in #16320 :-)

fanquake referenced this in commit 12f7147c89 on Aug 23, 2019

PastaPastaPasta referenced this in commit 7fe7b445f2 on Jun 27, 2021

PastaPastaPasta referenced this in commit ba7dc914eb on Jun 27, 2021

PastaPastaPasta referenced this in commit 3e4dd8d9fc on Jun 28, 2021

PastaPastaPasta referenced this in commit 3b989cd794 on Jun 28, 2021

PastaPastaPasta referenced this in commit 28b30d5464 on Jun 28, 2021

PastaPastaPasta referenced this in commit ebf56727ab on Jun 29, 2021

PastaPastaPasta referenced this in commit 4eafb4dea6 on Jun 29, 2021

PastaPastaPasta referenced this in commit fad61df3f2 on Jul 1, 2021

PastaPastaPasta referenced this in commit 2624540a97 on Jul 1, 2021

PastaPastaPasta referenced this in commit 963e4c3d9f on Jul 8, 2021

PastaPastaPasta referenced this in commit dd2f85c084 on Jul 10, 2021

MarcoFalke closed this on May 6, 2022

DrahtBot locked this on May 6, 2023

coverage reports non-deterministic #14343