test: Implicitly sync after generate*, unless opted out #23300

pull MarcoFalke wants to merge 1 commits into bitcoin:master from MarcoFalke:2110-testSyncImp changing 46 files +144 −175
  1. MarcoFalke commented at 11:27 am on October 18, 2021: member

    The most frequent failure in functional tests are intermittent races. Fixing such bugs is cumbersome because it involves:

    • Noticing the failure
    • Fetching and reading the log to determine the test case that failed
    • Adding a self.sync_all() where it was forgotten
    • Spamming out a pr and waiting for review, which is already sparse

    Also, writing a linter to catch those is not possible, nor is review effective in finding these bugs prior to merge.

    Fix all future intermittent races caused by a missing sync_block call by calling sync_all implicitly after each generate*, unless opted out. This ensures that the code is race-free (with regards to blocks) when the tests pass once, instead of our current approach where the code can never be guaranteed to be race-free.

    There are some scripted-diff cleanups (see #22567), but they will be submitted in a follow-up to reduce the conflicts in this pull.

  2. practicalswift commented at 12:28 pm on October 18, 2021: contributor
    Concept ACK
  3. DrahtBot added the label Tests on Oct 18, 2021
  4. DrahtBot commented at 8:48 pm on October 18, 2021: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #23371 (test: MiniWallet: add P2TR support and use it per default by theStack)
    • #23075 (refactoring: Fee estimation functional test cleanups by darosior)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. MarcoFalke force-pushed on Oct 19, 2021
  6. test: Implicitly sync after generate*, unless opted out facc352648
  7. MarcoFalke force-pushed on Oct 29, 2021
  8. in test/functional/test_framework/test_framework.py:633 in facc352648
    626@@ -627,20 +627,27 @@ def join_network(self):
    627         self.connect_nodes(1, 2)
    628         self.sync_all()
    629 
    630-    def generate(self, generator, *args, **kwargs):
    631+    def no_op(self):
    632+        pass
    633+
    634+    def generate(self, generator, *args, sync_fun=None, **kwargs):
    


    luke-jr commented at 3:46 pm on October 29, 2021:
    Suggest making it a required parameter, otherwise there’s a high risk of backports breaking.

    MarcoFalke commented at 3:59 pm on October 29, 2021:

    I don’t think we’ll backport any major test changes to 22.x (or earlier) at this point, and hopefully this pull is merged before the next major release, so backports to 23.x aren’t an issue.

    If reviewers insist that this should be done, I suggest doing it in a follow up as a scripted-diff to make it easy to review and keep the conflicts in this pull low.

  9. hebasto commented at 11:24 am on November 8, 2021: member
    Concept ACK.
  10. lsilva01 approved
  11. lsilva01 commented at 3:27 pm on November 8, 2021: contributor
    tACK facc352 on Ubuntu 20.04
  12. brunoerg approved
  13. brunoerg commented at 7:48 pm on November 8, 2021: member

    tACK facc352648e4fe1ed9b406400b6e4a9d51f30349 on MacOS 11.6

    Nice. It solves the failure I was getting when running test_runner. I think there are some other tests you could apply it like rpc_packages.py.

  14. MarcoFalke merged this on Nov 9, 2021
  15. MarcoFalke closed this on Nov 9, 2021

  16. MarcoFalke deleted the branch on Nov 9, 2021
  17. MarcoFalke commented at 9:33 am on November 9, 2021: member

    I think there are some other tests you could apply it like rpc_packages.py.

    Can you explain this a bit more? Or maybe create a follow-up yourself?

  18. hebasto commented at 9:38 am on November 9, 2021: member

    FWIW, was testing this PR, and with the following patch:

     0diff --git a/.cirrus.yml b/.cirrus.yml
     1index f7962e951..93866de06 100644
     2--- a/.cirrus.yml
     3+++ b/.cirrus.yml
     4@@ -164,7 +164,7 @@ task:
     5     - netsh int ipv4 set dynamicport tcp start=1025 num=64511
     6     - netsh int ipv6 set dynamicport tcp start=1025 num=64511
     7     # Exclude feature_dbcrash for now due to timeout
     8-    - python test\functional\test_runner.py --nocleanup --ci --quiet --combinedlogslen=4000 --jobs=4 --timeout-factor=8 --failfast --extended --exclude feature_dbcrash
     9+    - python test\functional\test_runner.py --nocleanup --ci --quiet --combinedlogslen=4000 --jobs=4 --failfast --extended --exclude feature_dbcrash
    10 
    11 task:
    12   name: 'ARM [unit tests, no functional tests] [bullseye]'
    13diff --git a/ci/test/00_setup_env.sh b/ci/test/00_setup_env.sh
    14index 8a9d808f5..0685d1f0c 100755
    15--- a/ci/test/00_setup_env.sh
    16+++ b/ci/test/00_setup_env.sh
    17@@ -41,7 +41,7 @@ export RUN_SECURITY_TESTS=${RUN_SECURITY_TESTS:-false}
    18 # By how much to scale the test_runner timeouts (option --timeout-factor).
    19 # This is needed because some ci machines have slow CPU or disk, so sanitizers
    20 # might be slow or a reindex might be waiting on disk IO.
    21-export TEST_RUNNER_TIMEOUT_FACTOR=${TEST_RUNNER_TIMEOUT_FACTOR:-40}
    22+export TEST_RUNNER_TIMEOUT_FACTOR=${TEST_RUNNER_TIMEOUT_FACTOR:-1}
    23 export TEST_RUNNER_ENV=${TEST_RUNNER_ENV:-}
    24 export RUN_FUZZ_TESTS=${RUN_FUZZ_TESTS:-false}
    25 export EXPECTED_TESTS_DURATION_IN_SECONDS=${EXPECTED_TESTS_DURATION_IN_SECONDS:-1000}
    

    all tasks, except for the native Windows one, pass.

    In the Windows task two tests fail due timeouts:

    0feature_pruning.py                                 | ✖ Failed  | 1381 s
    1rpc_misc.py                                        | ✖ Failed  | 63 s
    
  19. MarcoFalke commented at 9:49 am on November 9, 2021: member
    Thanks for testing. Do you have logs for the two timeouts?
  20. hebasto commented at 10:09 am on November 9, 2021: member

    Thanks for testing. Do you have logs for the two timeouts?

    https://api.cirrus-ci.com/v1/task/4512194625798144/logs/functional_tests.log

  21. MarcoFalke commented at 10:23 am on November 9, 2021: member

    The pruning one has a reorg and a call to verifychain, which may take a long time, so this is expected:

    0                                   test_framework.authproxy.JSONRPCException: 'verifychain' RPC took longer than 60.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)
    

    The rpc_misc one looks like an unrelated issue.

  22. hebasto commented at 10:25 am on November 9, 2021: member

    The rpc_misc one looks like an unrelated issue.

    Could be easy tuned:

    0@@ -84,7 +84,7 @@ class RpcMiscTest(BitcoinTestFramework):
    1 
    2         # Restart the node with indices and wait for them to sync
    3         self.restart_node(0, ["-txindex", "-blockfilterindex", "-coinstatsindex"])
    4-        self.wait_until(lambda: all(i["synced"] for i in node.getindexinfo().values()))
    5+        self.wait_until(lambda: all(i["synced"] for i in node.getindexinfo().values()), timeout=120)
    6 
    7         # Returns a list of all running indices by default
    8         values = {"synced": True, "best_block_height": 200}
    
  23. MarcoFalke commented at 10:27 am on November 9, 2021: member
    The sync threads all exit within a second, so I don’t understand why raising the timeout would fix this issue.
  24. hebasto commented at 11:31 am on November 9, 2021: member

    @MarcoFalke

    The sync threads all exit within a second…

    That is not the case in CI environment. See my demo branch, and its CI log:

    02021-11-09T11:28:10.405000Z TestFramework (INFO): Total time taken by `wait_until` call: 52.64929151535034 seconds.
    
  25. MarcoFalke commented at 11:39 am on November 9, 2021: member

    Oh you are right.

    Any idea why coinstatsindex takes more than a minute on windows, but finishes in less than a second on linux?

  26. hebasto commented at 12:25 pm on November 9, 2021: member

    Any idea why coinstatsindex takes more than a minute on windows, but finishes in less than a second on linux?

    Have no idea.

    Locally, on Windows 10 Pro 21H1, it takes about 30 seconds.

  27. sidhujag referenced this in commit 26504dee4b on Nov 9, 2021
  28. sidhujag referenced this in commit 6cca5e7f0b on Nov 9, 2021
  29. MarcoFalke referenced this in commit 15d109802a on Nov 15, 2021
  30. DrahtBot locked this on Nov 9, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-23 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me