Actually, the exponential distribution is unbounded that causes intermittent timeout errors in functional tests.
test: Add random interval upper bound in regtest #19588
pull hebasto wants to merge 1 commits into bitcoin:master from hebasto:200725-uni-test changing 1 files +13 −2-
hebasto commented at 7:24 am on July 25, 2020: member
-
in src/init.cpp:570 in 237bc3edbb outdated
566@@ -567,6 +567,7 @@ void SetupServerArgs(NodeContext& node) 567 gArgs.AddArg("-rpcwhitelistdefault", "Sets default behavior for rpc whitelisting. Unless rpcwhitelistdefault is set to 0, if any -rpcwhitelist is set, the rpc server acts as if all rpc users are subject to empty-unless-otherwise-specified whitelists. If rpcwhitelistdefault is set to 1 and no -rpcwhitelist is set, rpc server acts as if all rpc users are subject to empty whitelists.", ArgsManager::ALLOW_BOOL, OptionsCategory::RPC); 568 gArgs.AddArg("-rpcworkqueue=<n>", strprintf("Set the depth of the work queue to service RPC calls (default: %d)", DEFAULT_HTTP_WORKQUEUE), ArgsManager::ALLOW_ANY | ArgsManager::DEBUG_ONLY, OptionsCategory::RPC); 569 gArgs.AddArg("-server", "Accept command line and JSON-RPC commands", ArgsManager::ALLOW_ANY, OptionsCategory::RPC); 570+ gArgs.AddArg("-uniformintervaltest", "Use uniform distribution for random intervals instead of exponential one (default: 0)", ArgsManager::ALLOW_BOOL | ArgsManager::DEBUG_ONLY, OptionsCategory::DEBUG_TEST);
MarcoFalke commented at 7:40 am on July 25, 2020:not sure if adding another mock-setting is the right approach here
- This is not guarded and can accidentally be set on mainnet
- This will run a different code path when set (partially code that has been added for this test-setting only), thus reducing the effectiveness of tests and increasing the maintenance burden
- There is already a way to mock the poisson dist with the noban permission (This could be extracted to a hidden test-only permission if there are concerns about noban being too broad)
- A timeout of 60s with the expected value of the poisson dist (5s or 7s or whatever) yields a failure rate that is far lower than the failure rate we see due to infrastructure problems or other test problems and races. The issue here is that the nodes are daisy chained and thus the delays add up, which could bring the delay over 60s more likely.
hebasto commented at 8:38 am on July 25, 2020:- This is not guarded and can accidentally be set on mainnet
Yes. It must be guarded. Going to fix it.
- This will run a different code path when set (partially code that has been added for this test-setting only), thus reducing the effectiveness of tests and increasing the maintenance burden
The current code path, which is skipped in tests with this PR, is not covered by tests in master.
- There is already a way to mock the poisson dist with the noban permission (This could be extracted to a hidden test-only permission if there are concerns about noban being too broad)
This approach does not work universally, e.g., in
wallet_backup.py
: https://github.com/bitcoin/bitcoin/blob/40a04814d130dfc9131af3f568eb44533e2bcbfc/test/functional/wallet_backup.py#L52-L57- A timeout of 60s with the expected value of the poisson dist (5s or 7s or whatever) yields a failure rate that is far lower than the failure rate we see due to infrastructure problems or other test problems and races. The issue here is that the nodes are daisy chained and thus the delays add up, which could bring the delay over 60s more likely.
The assumption about test failures that would be fixed by this PR could be biased as in the most cases we have only CI logs that lack data about infrastructure problems.
MarcoFalke commented at 7:55 pm on August 12, 2020:The current code path, which is skipped in tests with this PR, is not covered by tests in master.
The same might be true for this change. Capping the time might never execute code paths (in other functions) that only run after a long timeout.
hebasto force-pushed on Jul 25, 2020hebasto renamed this:
test: Use uniform distribution for random intervals in functional tests
test: Add random interval upper bound in regtest
on Jul 25, 2020hebasto commented at 9:55 am on July 25, 2020: memberReworked. @MarcoFalke Mind taking another look?
The changes are pretty local, and if they will not justify themselves, they could be easily reverted back :smiley:
DrahtBot commented at 6:33 pm on July 26, 2020: memberThe following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #18077 (net: Add NAT-PMP port forwarding support by hebasto)
- #17428 (p2p: Try to preserve outbound block-relay-only connections during restart by hebasto)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
DrahtBot added the label P2P on Jul 26, 2020DrahtBot added the label Tests on Jul 26, 2020hebasto force-pushed on Jul 29, 2020hebasto commented at 11:41 am on July 29, 2020: memberRebased c6d1cbdb4b7d21ebb3254781f8f6a6c8a679837d -> 8a945c0a4834b60c5453485a78cbae45ce65b3d9 (pr19588.02 -> pr19588.03) to include the recent CI changes.test: Add random interval upper bound in regtest
These changes fix intermittent timeout errors in the functional tests.
in src/net.cpp:2858 in 8a945c0a48 outdated
2854@@ -2853,7 +2855,16 @@ int64_t CConnman::PoissonNextSendInbound(int64_t now, int average_interval_secon 2855 2856 int64_t PoissonNextSend(int64_t now, int average_interval_seconds) 2857 { 2858- return now + (int64_t)(log1p(GetRand(1ULL << 48) * -0.0000000000000035527136788 /* -1/2^48 */) * average_interval_seconds * -1000000.0 + 0.5); 2859+ const int64_t average_interval_ms = average_interval_seconds * 1000000ULL;
instagibbs commented at 3:10 pm on August 5, 2020:math isn’t my strongest suit: Why are we multiplying seconds by 100k to get ms? 🤔
hebasto commented at 3:14 pm on August 5, 2020:math isn’t my strongest suit: Why are we multiplying seconds by 100k to get ms? thinking
Why
1000000ULL
is 100k? :)
hebasto commented at 3:20 pm on August 5, 2020:Going to update names.
instagibbs commented at 3:32 pm on August 5, 2020:too many zeroes clearly :eyes: , thanks for update of namehebasto force-pushed on Aug 5, 2020hebasto commented at 3:29 pm on August 5, 2020: memberUpdated 8a945c0a4834b60c5453485a78cbae45ce65b3d9 -> b34b8edfb24a7660376466bb136affcc6d688b75 (pr19588.03 -> pr19588.04, diff).
Addressed @instagibbs’ comment:
math isn’t my strongest suit: Why are we multiplying seconds by 100k to get ms?
in src/net.cpp:2862 in b34b8edfb2
2854@@ -2853,7 +2855,16 @@ int64_t CConnman::PoissonNextSendInbound(int64_t now, int average_interval_secon 2855 2856 int64_t PoissonNextSend(int64_t now, int average_interval_seconds) 2857 { 2858- return now + (int64_t)(log1p(GetRand(1ULL << 48) * -0.0000000000000035527136788 /* -1/2^48 */) * average_interval_seconds * -1000000.0 + 0.5); 2859+ const int64_t average_interval_microseconds = average_interval_seconds * 1000000ULL; 2860+ int64_t delta_microseconds = -std::log1p(GetRand(1ULL << 48) * -0.0000000000000035527136788 /* -1/2^48 */) * average_interval_microseconds + 0.5; 2861+ 2862+ // Upper bound to avoid timeouts in the functional tests. 2863+ static const bool regtest = gArgs.GetChainName() == CBaseChainParams::REGTEST;
laanwj commented at 6:02 pm on August 12, 2020:I think this is really ugly, both doing a string compare here, making the function depend on a global, and doing this at first call (which is not guaranteed to be after initializing the chain name). Though I don’t really have a proposal how to do this differently.hebasto closed this on Aug 15, 2020
DrahtBot locked this on Feb 15, 2022
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-04 18:12 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me