Debugging failing unit tests is hard if the failure is non-deterministic and the seed is not known.
Fix that by printing the seed and making it possible to set the seed from outside.
Debugging failing unit tests is hard if the failure is non-deterministic and the seed is not known.
Fix that by printing the seed and making it possible to set the seed from outside.
37@@ -38,9 +38,21 @@ extern FastRandomContext g_insecure_rand_ctx;
38 */
39 extern bool g_mock_deterministic_tests;
40
41-static inline void SeedInsecureRand(bool deterministic = false)
42+enum class SeedRand {
43+ ZEROS, //!< Seed with a compile time constant of zeros
44+ SEEDR, //!< Call the Seed() helper
SEEDR
, or should it be SEED
/SEEDER
?
SEED
? :)
47+void Seed(FastRandomContext& ctx)
48+{
49+ // Should be enough to get the seed once for the process
50+ static uint256 seed{};
51+ static const std::string RANDOM_CTX_SEED{"RANDOM_CTX_SEED"};
52+ if (seed.IsNull()) seed = GetUintFromEnv(RANDOM_CTX_SEED);
if (seed.IsNull())
necessary in this line? Isn’t seed
always null here?
11@@ -12,6 +12,9 @@ export LC_ALL=C
12 # Use GCOV_EXECUTABLE="llvm-cov gcov" if compiling with clang.
13 GCOV_EXECUTABLE="gcov"
14
15+# Keep random seed constant over multiple runs.
16+export RANDOM_CTX_SEED=${RANDOM_CTX_SEED:-8c9678d7cde616175258686f6aaf7c2446657f40a9bc5196ba97ca21b5e3c5c4}
contrib/devtools/test_deterministic_coverage.sh
script will fail to detect non-deterministic coverage if the seed is fixed between runs. Isn’t it the whole point of that script?
To make sure I ran a single test, wallet_tests/coin_mark_dirty_immature_credit
30 times using the script and I always got the same coverage.
If I’m not mistaken, the main purpose here is to log the seed, so maybe this should be changed to a random number read from /dev/random, something like
export RANDOM_CTX_SEED=$(xxd -l 32 -p /dev/urandom | tr -d " \n")
Also, for this purpose, I believe this should be moved to within the while [[ ${TEST_RUN_ID} -lt ${N_TEST_RUNS} ]]; do
loop so that the seed will be different for each iteration.
xxd -l 32 -p /dev/urandom | tr -d " \n"
11@@ -12,6 +12,9 @@ export LC_ALL=C
12 # Use GCOV_EXECUTABLE="llvm-cov gcov" if compiling with clang.
13 GCOV_EXECUTABLE="gcov"
14
15+# Keep random seed constant over multiple runs.
16+export RANDOM_CTX_SEED=${RANDOM_CTX_SEED:-$(xxd -l 32 -p /dev/urandom | tr -d " \n")}
And a random seed is already picked per process in the while loop you referenced below. So moving the export RANDOM_CTX_SEED
wouldn’t make any observable difference.
So either I leave it as is or remove the export RANDOM_CTX_SEED
from this file again. What do you think?
Sorry for the delay,
Huh? If you change the seed between runs and then compare the results of those runs, you are pretty much guaranteed to get non-deterministic coverage. I think this test should check for non-determinism across runs, assuming everything kept constant.
You’re right, many tests fail with a fixed seed (I tried 5 different tests from the list in the script and all of them exhibited non-deterministic coverage after a few tens of runs), thanks for pointing this out! On the other hand, I’ve been running the script for all tests which are not in the list inside the script while having the seed changed every time (I moved the line inside the loop just for this test) and no non-determinism in coverage has been detected so far (I’m at iteration ~70 now). I’ll update when it gets to a 1000.
…it should act as a proxy to be able to answer the question “How reliable can test failures be reproduced”
But then I think it doesn’t perfectly serve it’s purpose - the fact is that the script’s actual mechanism of operation is to compare the coverage between runs and look for discrepancies. If the coverage is found to be the same between runs, it still does not guarantee that a failure can always be reproduced. That’s because for a test that might fail, at some point in the code there is a branching into two outcomes - test failing and test succeeding. Since all tests probably always succeed when running this script, the succeeding branch is always taken and we never know what’s going on in the failing branch. There might be non-determinism there which could still make the failure irreproducible.
And a random seed is already picked per process in the while loop you referenced below. So moving the
export RANDOM_CTX_SEED
wouldn’t make any observable difference.So either I leave it as is or remove the
export RANDOM_CTX_SEED
from this file again. What do you think?
Isn’t the seed set to RANDOM_CTX_SEED if this environment variable is found and thus the seed is the same for all runs in the loop if this line is located outside the loop?
Isn’t the seed set to RANDOM_CTX_SEED if this environment variable is found and thus the seed is the same for all runs in the loop if this line is located outside the loop?
Yes, but putting it inside the loop will make it pick a new seed every time. This is already the default behaviour when no seed is provided at all. So might as well leave it out completely.
BTW,
… I’ll update when it gets to a 1000.
The script found non-deterministic behaviour at iteration 399, meaning it ran the entire test suite 398 times with a different seed each run and did not detect non-deterministic coverage in any of them. The non-determinism in run 399 might be caused by a test that should be included in the list of tests in the script. I’m not sure how to isolate the test that causes this non-determinism though. It requires some detective work. I’ll try working on it.