test: Seed test RNG context for each test case, print seed

MarcoFalke commented at 7:39 pm on September 27, 2019: member

Debugging failing unit tests is hard if the failure is non-deterministic and the seed is not known.

Fix that by printing the seed and making it possible to set the seed from outside.

fanquake added the label Tests on Sep 27, 2019

MarcoFalke force-pushed on Sep 27, 2019

practicalswift commented at 8:09 am on September 28, 2019: contributor

Concept ACK

MarcoFalke force-pushed on Sep 30, 2019

MarcoFalke force-pushed on Oct 1, 2019

in src/test/setup_common.h:43 in fa74667c47 outdated

37@@ -38,9 +38,21 @@ extern FastRandomContext g_insecure_rand_ctx;
38  */
39 extern bool g_mock_deterministic_tests;
40 
41-static inline void SeedInsecureRand(bool deterministic = false)
42+enum class SeedRand {
43+    ZEROS, //!< Seed with a compile time constant of zeros
44+    SEEDR, //!< Call the Seed() helper

practicalswift commented at 8:24 pm on October 3, 2019:

Is this intentionally abbreviated as SEEDR, or should it be SEED/SEEDER?

MarcoFalke commented at 8:30 pm on October 3, 2019:

Happy to change it to something else if you can find a better name

practicalswift commented at 8:37 pm on October 3, 2019:

I was thinking simply SEED? :)

MarcoFalke commented at 8:46 pm on October 3, 2019:

Changed. Let’s hope this doesn’t conflict with any windows symbols

MarcoFalke force-pushed on Oct 3, 2019

in src/test/setup_common.cpp:52 in fab503d6df outdated

47+void Seed(FastRandomContext& ctx)
48+{
49+    // Should be enough to get the seed once for the process
50+    static uint256 seed{};
51+    static const std::string RANDOM_CTX_SEED{"RANDOM_CTX_SEED"};
52+    if (seed.IsNull()) seed = GetUintFromEnv(RANDOM_CTX_SEED);

davereikher commented at 11:32 am on October 4, 2019:

Is if (seed.IsNull()) necessary in this line? Isn’t seed always null here?

MarcoFalke commented at 1:05 pm on October 4, 2019:

It will be set once for the process, so (ideally) should only be null once

in contrib/devtools/test_deterministic_coverage.sh:16 in fab503d6df outdated

11@@ -12,6 +12,9 @@ export LC_ALL=C
12 # Use GCOV_EXECUTABLE="llvm-cov gcov" if compiling with clang.
13 GCOV_EXECUTABLE="gcov"
14 
15+# Keep random seed constant over multiple runs.
16+export RANDOM_CTX_SEED=${RANDOM_CTX_SEED:-8c9678d7cde616175258686f6aaf7c2446657f40a9bc5196ba97ca21b5e3c5c4}

davereikher commented at 12:09 pm on October 4, 2019:

I think the contrib/devtools/test_deterministic_coverage.sh script will fail to detect non-deterministic coverage if the seed is fixed between runs. Isn’t it the whole point of that script? To make sure I ran a single test, wallet_tests/coin_mark_dirty_immature_credit 30 times using the script and I always got the same coverage. If I’m not mistaken, the main purpose here is to log the seed, so maybe this should be changed to a random number read from /dev/random, something like export RANDOM_CTX_SEED=$(xxd -l 32 -p /dev/urandom | tr -d " \n") Also, for this purpose, I believe this should be moved to within the while [[ ${TEST_RUN_ID} -lt ${N_TEST_RUNS} ]]; do loop so that the seed will be different for each iteration.

MarcoFalke commented at 1:16 pm on October 4, 2019:

Thanks, switched to xxd -l 32 -p /dev/urandom | tr -d " \n"

davereikher changes_requested

MarcoFalke force-pushed on Oct 4, 2019

MarcoFalke commented at 1:16 pm on October 4, 2019: member

Addressed feedback by @davereikher

in contrib/devtools/test_deterministic_coverage.sh:16 in faad0e8e4d outdated

11@@ -12,6 +12,9 @@ export LC_ALL=C
12 # Use GCOV_EXECUTABLE="llvm-cov gcov" if compiling with clang.
13 GCOV_EXECUTABLE="gcov"
14 
15+# Keep random seed constant over multiple runs.
16+export RANDOM_CTX_SEED=${RANDOM_CTX_SEED:-$(xxd -l 32 -p /dev/urandom | tr -d " \n")}

davereikher commented at 2:30 pm on October 4, 2019:

This makes the seed different between subsequent runs of the script, but the script will still show deterministic coverage each time it runs, since the seed doesn’t change between iterations within the script, so this line should be moved down into this loop: https://github.com/bitcoin/bitcoin/blob/faad0e8e4d61dfb115331c18e68adffb2b3ee471/contrib/devtools/test_deterministic_coverage.sh#L106

MarcoFalke commented at 5:45 pm on October 7, 2019:

Huh? If you change the seed between runs and then compare the results of those runs, you are pretty much guaranteed to get non-deterministic coverage. I think this test should check for non-determinism across runs, assuming everything kept constant. That is, it should act as a proxy to be able to answer the question “How reliable can test failures be reproduced”.

MarcoFalke commented at 5:49 pm on October 7, 2019:

I think you are looking for the test to answer the question “How deterministic is coverage across runs”, which is also a useful question to ask. Not sure how to accommodate both.

MarcoFalke commented at 2:30 pm on October 8, 2019:

And a random seed is already picked per process in the while loop you referenced below. So moving the export RANDOM_CTX_SEED wouldn’t make any observable difference.

So either I leave it as is or remove the export RANDOM_CTX_SEED from this file again. What do you think?

davereikher commented at 11:10 am on October 9, 2019:

Sorry for the delay,

Huh? If you change the seed between runs and then compare the results of those runs, you are pretty much guaranteed to get non-deterministic coverage. I think this test should check for non-determinism across runs, assuming everything kept constant.

You’re right, many tests fail with a fixed seed (I tried 5 different tests from the list in the script and all of them exhibited non-deterministic coverage after a few tens of runs), thanks for pointing this out! On the other hand, I’ve been running the script for all tests which are not in the list inside the script while having the seed changed every time (I moved the line inside the loop just for this test) and no non-determinism in coverage has been detected so far (I’m at iteration ~70 now). I’ll update when it gets to a 1000.

…it should act as a proxy to be able to answer the question “How reliable can test failures be reproduced”

But then I think it doesn’t perfectly serve it’s purpose - the fact is that the script’s actual mechanism of operation is to compare the coverage between runs and look for discrepancies. If the coverage is found to be the same between runs, it still does not guarantee that a failure can always be reproduced. That’s because for a test that might fail, at some point in the code there is a branching into two outcomes - test failing and test succeeding. Since all tests probably always succeed when running this script, the succeeding branch is always taken and we never know what’s going on in the failing branch. There might be non-determinism there which could still make the failure irreproducible.

And a random seed is already picked per process in the while loop you referenced below. So moving the export RANDOM_CTX_SEED wouldn’t make any observable difference.

So either I leave it as is or remove the export RANDOM_CTX_SEED from this file again. What do you think?

Isn’t the seed set to RANDOM_CTX_SEED if this environment variable is found and thus the seed is the same for all runs in the loop if this line is located outside the loop?

MarcoFalke commented at 2:28 pm on October 9, 2019:

Isn’t the seed set to RANDOM_CTX_SEED if this environment variable is found and thus the seed is the same for all runs in the loop if this line is located outside the loop?

Yes, but putting it inside the loop will make it pick a new seed every time. This is already the default behaviour when no seed is provided at all. So might as well leave it out completely.

davereikher commented at 3:02 pm on October 9, 2019:

Oh ok, so yes, I think it’s best to remove it then.

MarcoFalke commented at 3:20 pm on October 9, 2019:

Ok, removed. Thanks for the discussion!

davereikher commented at 5:04 am on October 10, 2019:

Likewise!

davereikher commented at 5:12 am on October 10, 2019:

BTW,

… I’ll update when it gets to a 1000.

The script found non-deterministic behaviour at iteration 399, meaning it ran the entire test suite 398 times with a different seed each run and did not detect non-deterministic coverage in any of them. The non-determinism in run 399 might be caused by a test that should be included in the list of tests in the script. I’m not sure how to isolate the test that causes this non-determinism though. It requires some detective work. I’ll try working on it.

test: Seed test RNG context for each test case, print seed fae43a97ca

MarcoFalke force-pushed on Oct 9, 2019

davereikher commented at 5:03 am on October 10, 2019: contributor

Tested ACK fae43a97ca947cd0802392e9bb86d9d0572c0fba

DrahtBot added the label Needs rebase on Nov 7, 2019

DrahtBot commented at 2:27 pm on November 7, 2019: member

MarcoFalke referenced this in commit 772673dfbe on Nov 7, 2019

MarcoFalke merged this on Nov 7, 2019

MarcoFalke closed this on Nov 7, 2019

MarcoFalke deleted the branch on Nov 7, 2019

sidhujag referenced this in commit 9bc18304f1 on Nov 9, 2019

fanquake removed the label Needs rebase on Jul 26, 2020

jasonbcox referenced this in commit ee2ce03e6d on Oct 19, 2020

sidhujag referenced this in commit 7e8c01256e on Nov 10, 2020

PastaPastaPasta referenced this in commit ee3f303a07 on Dec 22, 2021

PastaPastaPasta referenced this in commit 81e86e6954 on Dec 22, 2021

PastaPastaPasta referenced this in commit 601f39505b on Dec 22, 2021

furszy referenced this in commit 73ac5f699e on Feb 7, 2022

DrahtBot locked this on Feb 15, 2022

test: Seed test RNG context for each test case, print seed #16978