Debugging failing unit tests is hard if the failure is non-deterministic and the seed is not known.
Fix that by printing the seed and making it possible to set the seed from outside.
Debugging failing unit tests is hard if the failure is non-deterministic and the seed is not known.
Fix that by printing the seed and making it possible to set the seed from outside.
Concept ACK
37 | @@ -38,9 +38,21 @@ extern FastRandomContext g_insecure_rand_ctx; 38 | */ 39 | extern bool g_mock_deterministic_tests; 40 | 41 | -static inline void SeedInsecureRand(bool deterministic = false) 42 | +enum class SeedRand { 43 | + ZEROS, //!< Seed with a compile time constant of zeros 44 | + SEEDR, //!< Call the Seed() helper
Is this intentionally abbreviated as SEEDR, or should it be SEED/SEEDER?
Happy to change it to something else if you can find a better name
I was thinking simply SEED? :)
Changed. Let's hope this doesn't conflict with any windows symbols
47 | +void Seed(FastRandomContext& ctx) 48 | +{ 49 | + // Should be enough to get the seed once for the process 50 | + static uint256 seed{}; 51 | + static const std::string RANDOM_CTX_SEED{"RANDOM_CTX_SEED"}; 52 | + if (seed.IsNull()) seed = GetUintFromEnv(RANDOM_CTX_SEED);
Is if (seed.IsNull()) necessary in this line? Isn't seed always null here?
It will be set once for the process, so (ideally) should only be null once
11 | @@ -12,6 +12,9 @@ export LC_ALL=C 12 | # Use GCOV_EXECUTABLE="llvm-cov gcov" if compiling with clang. 13 | GCOV_EXECUTABLE="gcov" 14 | 15 | +# Keep random seed constant over multiple runs. 16 | +export RANDOM_CTX_SEED=${RANDOM_CTX_SEED:-8c9678d7cde616175258686f6aaf7c2446657f40a9bc5196ba97ca21b5e3c5c4}
I think the contrib/devtools/test_deterministic_coverage.sh script will fail to detect non-deterministic coverage if the seed is fixed between runs. Isn't it the whole point of that script?
To make sure I ran a single test, wallet_tests/coin_mark_dirty_immature_credit 30 times using the script and I always got the same coverage.
If I'm not mistaken, the main purpose here is to log the seed, so maybe this should be changed to a random number read from /dev/random, something like
export RANDOM_CTX_SEED=$(xxd -l 32 -p /dev/urandom | tr -d " \n")
Also, for this purpose, I believe this should be moved to within the while [[ ${TEST_RUN_ID} -lt ${N_TEST_RUNS} ]]; do loop so that the seed will be different for each iteration.
Thanks, switched to xxd -l 32 -p /dev/urandom | tr -d " \n"
Addressed feedback by @davereikher
11 | @@ -12,6 +12,9 @@ export LC_ALL=C 12 | # Use GCOV_EXECUTABLE="llvm-cov gcov" if compiling with clang. 13 | GCOV_EXECUTABLE="gcov" 14 | 15 | +# Keep random seed constant over multiple runs. 16 | +export RANDOM_CTX_SEED=${RANDOM_CTX_SEED:-$(xxd -l 32 -p /dev/urandom | tr -d " \n")}
This makes the seed different between subsequent runs of the script, but the script will still show deterministic coverage each time it runs, since the seed doesn't change between iterations within the script, so this line should be moved down into this loop: https://github.com/bitcoin/bitcoin/blob/faad0e8e4d61dfb115331c18e68adffb2b3ee471/contrib/devtools/test_deterministic_coverage.sh#L106
Huh? If you change the seed between runs and then compare the results of those runs, you are pretty much guaranteed to get non-deterministic coverage. I think this test should check for non-determinism across runs, assuming everything kept constant. That is, it should act as a proxy to be able to answer the question "How reliable can test failures be reproduced".
I think you are looking for the test to answer the question "How deterministic is coverage across runs", which is also a useful question to ask. Not sure how to accommodate both.
And a random seed is already picked per process in the while loop you referenced below. So moving the export RANDOM_CTX_SEED wouldn't make any observable difference.
So either I leave it as is or remove the export RANDOM_CTX_SEED from this file again. What do you think?
Sorry for the delay,
Huh? If you change the seed between runs and then compare the results of those runs, you are pretty much guaranteed to get non-deterministic coverage. I think this test should check for non-determinism across runs, assuming everything kept constant.
You're right, many tests fail with a fixed seed (I tried 5 different tests from the list in the script and all of them exhibited non-deterministic coverage after a few tens of runs), thanks for pointing this out! On the other hand, I've been running the script for all tests which are not in the list inside the script while having the seed changed every time (I moved the line inside the loop just for this test) and no non-determinism in coverage has been detected so far (I'm at iteration ~70 now). I'll update when it gets to a 1000.
...it should act as a proxy to be able to answer the question "How reliable can test failures be reproduced"
But then I think it doesn't perfectly serve it's purpose - the fact is that the script's actual mechanism of operation is to compare the coverage between runs and look for discrepancies. If the coverage is found to be the same between runs, it still does not guarantee that a failure can always be reproduced. That's because for a test that might fail, at some point in the code there is a branching into two outcomes - test failing and test succeeding. Since all tests probably always succeed when running this script, the succeeding branch is always taken and we never know what's going on in the failing branch. There might be non-determinism there which could still make the failure irreproducible.
And a random seed is already picked per process in the while loop you referenced below. So moving the
export RANDOM_CTX_SEEDwouldn't make any observable difference.So either I leave it as is or remove the
export RANDOM_CTX_SEEDfrom this file again. What do you think?
Isn't the seed set to RANDOM_CTX_SEED if this environment variable is found and thus the seed is the same for all runs in the loop if this line is located outside the loop?
Isn't the seed set to RANDOM_CTX_SEED if this environment variable is found and thus the seed is the same for all runs in the loop if this line is located outside the loop?
Yes, but putting it inside the loop will make it pick a new seed every time. This is already the default behaviour when no seed is provided at all. So might as well leave it out completely.
Oh ok, so yes, I think it's best to remove it then.
Ok, removed. Thanks for the discussion!
Likewise!
BTW,
... I'll update when it gets to a 1000.
The script found non-deterministic behaviour at iteration 399, meaning it ran the entire test suite 398 times with a different seed each run and did not detect non-deterministic coverage in any of them. The non-determinism in run 399 might be caused by a test that should be included in the list of tests in the script. I'm not sure how to isolate the test that causes this non-determinism though. It requires some detective work. I'll try working on it.
Tested ACK fae43a97ca947cd0802392e9bb86d9d0572c0fba
<!--cf906140f33d8803c4a75a2196329ecb-->Needs rebase