qa: Fix intermittent “Unable to connect to bitcoind” errors on Windows #28509

hebasto wants to merge 1 commit into bitcoin:master from hebasto:230919-subprocess, changing 2 files (+63 −6)
  1. hebasto commented at 8:56 pm on September 19, 2023: member

    During my investigation of #28411 and other similar functional test failures on Windows in CI, I found that https://github.com/bitcoin/bitcoin/blob/abe4fedab735c145881e85dc2b02cf819a241635/test/functional/test_framework/test_node.py#L223 sometimes fails for reasons unknown to me. By “fails”, I mean that the child process does not make any progress.

    This PR verifies that the child process makes progress by checking, shortly after launch, that it has created its PID file. If the check fails, up to two further attempts are made.
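
    For readers who want a concrete picture, the check-and-retry idea could look roughly like the sketch below. It is not the code from this PR: the helper name `launch_with_pid_check`, its parameters, and the default timings are all assumptions made for illustration, and the only behaviour taken from the description is "start the process, wait briefly for a PID file, and try up to three times in total".

```python
import logging
import subprocess
import time
from pathlib import Path

log = logging.getLogger(__name__)


def launch_with_pid_check(cmd, pid_file, attempts=3, timeout=5.0, poll=0.05):
    """Start `cmd` and return the Popen object once `pid_file` exists.

    Hypothetical helper: the name, parameters, and defaults are
    illustrative assumptions, not part of the test framework.
    """
    pid_file = Path(pid_file)
    for attempt in range(1, attempts + 1):
        # Remove a stale file so a leftover cannot pass the check.
        pid_file.unlink(missing_ok=True)
        proc = subprocess.Popen(cmd)
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            # Sample the exit status first, so a child that wrote the
            # file and then exited is still counted as a success.
            exited = proc.poll() is not None
            if pid_file.exists():
                return proc
            if exited:
                break
            time.sleep(poll)
        # No PID file appeared: stop the child (if still running) and retry.
        if proc.poll() is None:
            proc.kill()
        proc.wait()
        log.warning("Attempt %d/%d: no PID file at %s", attempt, attempts, pid_file)
    raise RuntimeError(f"{cmd[0]} did not create {pid_file} after {attempts} attempts")
```

    Logging each failed attempt and raising once the attempts are exhausted keeps a stalled launch visible in the test output instead of letting the retry hide it.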

    Although this PR fixes tests on Windows, the new logic is platform-agnostic and increases test robustness.

    Across several dozen runs in my personal repo’s GHA, the only intermittent failure that still happens is #28491.

    Closes #28411.

  2. DrahtBot commented at 8:56 pm on September 19, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28392 (test: Use pathlib over os path by ns-xvrn)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Tests on Sep 19, 2023
  4. hebasto force-pushed on Sep 19, 2023
  5. DrahtBot added the label CI failed on Sep 19, 2023
  6. maflcko commented at 9:16 pm on September 19, 2023: member
    May be easier to just bump the python version from 3.9 to 3.12 to fix the bug?
  7. hebasto force-pushed on Sep 20, 2023
  8. hebasto marked this as a draft on Sep 20, 2023
  9. hebasto force-pushed on Sep 20, 2023
  10. hebasto marked this as ready for review on Sep 20, 2023
  11. hebasto commented at 9:50 am on September 20, 2023: member
    The CI failure is #28491 and unrelated to this PR.
  12. qa: Ensure `subprocess.Popen(bitcoind)` succeeds f6c3419604
  13. hebasto force-pushed on Sep 20, 2023
  14. DrahtBot removed the label CI failed on Sep 20, 2023
  15. hebasto commented at 1:58 pm on September 20, 2023: member

    … just bump the python version from 3.9 to 3.12…

    From Python 3.12 Release Schedule:

    Expected:

    • 3.12.0 final: Monday, 2023-10-02

    The currently available Python versions in the Windows 2022 image:

    • 3.7.9
    • 3.8.10
    • 3.9.13
    • 3.10.11
    • 3.11.5
  16. maflcko commented at 2:18 pm on September 20, 2023: member
    Which one are we using right now?
  17. hebasto commented at 2:19 pm on September 20, 2023: member

    Which one are we using right now?

    On Windows, it is 3.11.5.

  18. maflcko commented at 2:49 pm on September 20, 2023: member
    Ok, so the issue is probably not due to a too-old Python version.
  19. fanquake commented at 4:22 pm on September 24, 2023: member

    Concept ~0. A bunch of extra code in the test-framework, to fix a not-yet-identified, Windows only issue.

    the new logic is platform-agnostic and increases test robustness.

    Can you elaborate on how this increases robustness for non-Windows platforms, if they are already working?

  20. hebasto commented at 4:48 pm on September 24, 2023: member

    Concept ~0. A bunch of extra code in the test-framework, to fix a not-yet-identified, Windows only issue.

    1. We already have an entire directory with code that serves similar purposes in our CI.

    2. We already have a bunch of platform-specific code in the test-framework.

    3. The issue has been identified (please refer to the PR description), but its cause has not yet been determined.
      Of course, it would be great if someone identifies it. And then this workaround can be dropped.

    Can you elaborate on how this increases robustness for non-Windows platforms, if they are already working?

    If similar issues occur on non-Windows platforms in the future, they won’t break the tests.

  21. fanquake commented at 4:51 pm on September 24, 2023: member

    If similar issues occur on non-Windows platforms in the future, they won’t break the tests.

    You mean the issues will just be hidden / less-likely to be identified & debugged?

  22. hebasto commented at 4:55 pm on September 24, 2023: member

    @fanquake

    If similar issues occur on non-Windows platforms in the future, they won’t break the tests.

    You mean the issues will just be hidden / less-likely to be identified & debugged?

    This PR adds additional logging and exceptions.

    What do you suggest?

  23. fanquake commented at 5:02 pm on September 24, 2023: member

    What do you suggest?

    I would suggest we figure out why Python doesn’t work on Windows, or at least, doesn’t work when run in the GitHub CI, and fix it in a targeted way (while reporting the issue upstream), with the intention to drop the workaround as soon as a newer version of Python is available, rather than inject all this new code, into the test framework, where it affects all platforms.

  24. hebasto commented at 6:10 pm on September 25, 2023: member

    I would suggest we figure out why Python doesn’t work on Windows, or at least, doesn’t work when run in the GitHub CI…

    I started to think that the issue is specific to GHA CI as I cannot reproduce it locally.

  25. maflcko commented at 12:45 pm on September 29, 2023: member

    Could it make sense to disable the functional tests on Windows for pull requests and only run them on master?

    This means that issues will be caught at a later stage only, but I’d suspect they are easy to fixup post-merge.

    Overall this may be less work than having someone re-run the CI on all affected pull requests or having people ignore the Windows CI anyway.

  26. fanquake commented at 9:14 am on October 2, 2023: member
    Yea, I think this might be the right thing to do (for now). Persistent random red CI is pointless, and confusing for contributors. It’s a shame that Windows Python doesn’t seem to work on GitHub, but we also aren’t going to make all the changes here to work around that.
  27. hebasto closed this on Oct 2, 2023

  28. fanquake referenced this in commit 3cd02806ec on Oct 4, 2023
  29. Frank-GER referenced this in commit 07f975f0b5 on Oct 13, 2023
  30. bitcoin locked this on Oct 1, 2024
