ci, test: kill all dangling descendant processes #34963

pull rkrux wants to merge 1 commits into bitcoin:master from rkrux:ci-ancestor-commits changing 3 files +21 −6
  1. rkrux commented at 7:05 AM on March 31, 2026: contributor

    Relates to #34731

    This patch kills any and all dangling descendant processes started by the test runner in the test ancestor commits CI job only because this job uses the --failfast option to run tests that might leave dangling subprocesses that were not killed intentionally in the CI because the original implementation killed the test runner as well, which was not desirable.

    <!-- *** Please remove the following help text before submitting: *** Pull requests without a rationale and clear improvement may be closed immediately. GUI-related pull requests should be opened against https://github.com/bitcoin-core/gui first. See CONTRIBUTING.md -->

    <!-- Please provide clear motivation for your patch and explain how it improves Bitcoin Core user experience or Bitcoin Core developer experience significantly: * Any test improvements or new tests that improve coverage are always welcome. * All other changes should have accompanying unit tests (see `src/test/`) or functional tests (see `test/`). Contributors should note which tests cover modified code. If no tests exist for a region of modified code, new tests should accompany the change. * Bug fixes are most welcome when they come with steps to reproduce or an explanation of the potential issue as well as reasoning for the way the bug was fixed. * Features are welcome, but might be rejected due to design or scope issues. If a feature is based on a lot of dependencies, contributors should first consider building the system outside of Bitcoin Core, if possible. * Refactoring changes are only accepted if they are required for a feature or bug fix or otherwise improve developer experience significantly. For example, most "code style" refactoring changes require a thorough explanation why they are useful, what downsides they have and why they *significantly* improve developer experience or avoid serious programming bugs. Note that code style is often a subjective matter. Unless they are explicitly mentioned to be preferred in the [developer notes](/doc/developer-notes.md), stylistic code changes are usually rejected. -->

    <!-- Bitcoin Core has a thorough review process and even the most trivial change needs to pass a lot of eyes and requires non-zero or even substantial time effort to review. There is a huge lack of active reviewers on the project, so patches often sit for a long time. -->

  2. DrahtBot commented at 7:05 AM on March 31, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34547 (lint: modernise lint tooling by willcl-ark)
    • #31723 (qa: Facilitate debugging bitcoind inside functional tests by hodlinator)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  3. DrahtBot added the label CI failed on Mar 31, 2026
  4. rkrux commented at 12:31 PM on March 31, 2026: contributor

    Huh! Failed at the 113th commit: https://github.com/bitcoin/bitcoin/actions/runs/23785411040/job/69307375322?pr=34963 in the opportunistic test.

    So the issue was only delayed (not removed) by reusing the sockets that are under time-wait state.

  5. rkrux force-pushed on Apr 1, 2026
  6. DrahtBot removed the label CI failed on Apr 1, 2026
  7. rkrux commented at 2:37 PM on April 1, 2026: contributor

    Adding a 1s delay between commits and setting tcp_tw_reuse passes the ancestor commits job: https://github.com/bitcoin/bitcoin/actions/runs/23841150308/job/69496837047?pr=34963

  8. rkrux force-pushed on Apr 1, 2026
  9. DrahtBot added the label CI failed on Apr 1, 2026
  10. rkrux force-pushed on Apr 1, 2026
  11. rkrux force-pushed on Apr 2, 2026
  12. rkrux force-pushed on Apr 2, 2026
  13. rkrux force-pushed on Apr 2, 2026
  14. DrahtBot removed the label CI failed on Apr 2, 2026
  15. rkrux force-pushed on Apr 3, 2026
  16. rkrux closed this on Apr 3, 2026

  17. rkrux reopened this on Apr 3, 2026

  18. DrahtBot added the label Needs rebase on Apr 3, 2026
  19. rkrux force-pushed on Apr 3, 2026
  20. DrahtBot removed the label Needs rebase on Apr 3, 2026
  21. rkrux commented at 6:20 PM on April 3, 2026: contributor

    It passed just by removing failfast without adding any delay: https://github.com/bitcoin/bitcoin/actions/runs/23947029708/job/69845405978?pr=34963

  22. DrahtBot added the label CI failed on Apr 3, 2026
  23. rkrux commented at 4:46 AM on April 4, 2026: contributor

    The test ancestor commits job passed again: https://github.com/bitcoin/bitcoin/actions/runs/23957112786/job/69878067205?pr=34963

    But the windows cross built failed with a timeout while shutting down node in wallet_tx_doublespend.py: https://github.com/bitcoin/bitcoin/actions/runs/23957112786/job/69878469247?pr=34963

  24. rkrux commented at 9:27 AM on April 4, 2026: contributor
  25. rkrux force-pushed on Apr 4, 2026
  26. rkrux force-pushed on Apr 4, 2026
  27. rkrux commented at 10:05 AM on April 4, 2026: contributor

    So the latest patch that kills all dangling subprocesses in this CI job has passed twice for 2-3 commits.

    I will try once with ~125 commits but even if it doesn't pass, I believe it's still a good addition to kill any dangling processes in the CI.

  28. rkrux force-pushed on Apr 4, 2026
  29. rkrux renamed this:
    [draft, wip, ignore] CI
    ci, test: kill all dangling descendant processes
    on Apr 4, 2026
  30. DrahtBot renamed this:
    ci, test: kill all dangling descendant processes
    ci, test: kill all dangling descendant processes
    on Apr 4, 2026
  31. rkrux marked this as ready for review on Apr 4, 2026
  32. rkrux commented at 10:13 AM on April 4, 2026: contributor

    I guess this can be done for all CI jobs as well, later? That way we can get rid of the conditional import as well in the test runner.

  33. rkrux commented at 10:48 AM on April 4, 2026: contributor
  34. ci, test: kill all dangling descendant processes
    This patch kills any and all dangling descendant processes started
    by the test runner in the test ancestor commits CI job only because
    this job uses the --failfast option to run tests that might leave
    dangling subprocesses that were not killed intentionally
    in the CI because the original implementation killed the test
    runner as well, which was not desirable.
    eba18422ff
  35. rkrux force-pushed on Apr 4, 2026
  36. DrahtBot removed the label CI failed on Apr 4, 2026
  37. in .github/workflows/ci.yml:64 in eba18422ff
      60 | @@ -61,6 +61,7 @@ jobs:
      61 |      runs-on: ${{ needs.runners.outputs.provider == 'cirrus' && 'ghcr.io/cirruslabs/ubuntu-runner-amd64:24.04-md' || 'ubuntu-24.04' }}
      62 |      env:
      63 |        TEST_RUNNER_PORT_MIN: "14000"  # Use a larger port, to avoid colliding with CIRRUS_CACHE_HOST port 12321.
      64 | +      CI_FAILFAST_TEST_LEAVE_DANGLING: 0
    


    maflcko commented at 8:29 AM on April 7, 2026:

    i don't understand this change. Why is this set to 0?

    It should either be left as-is, or set to 1, see also:

    .github/workflows/ci.yml:  CI_FAILFAST_TEST_LEAVE_DANGLING: 1  # GHA does not care about dangling processes and setting this variable avoids killing the CI script itself on error
    

    rkrux commented at 8:42 AM on April 7, 2026:

    I set it to 0 because otherwise 1 would be inherited in this job, which I didn't want because I want to kill all dangling sub-processes except the CI script itself.

    setting this variable avoids killing the CI script itself on error

    The implementation in this PR should not kill the CI script unlike the older implementation.


    rkrux commented at 8:49 AM on April 7, 2026:

    The idea for this PR came from this comment: #34726 (comment)

    2026-03-18T02:28:12.5942751Z WARNING! There is already a bitcoind process running on this system. Tests may fail unexpectedly due to resource contention!

    I don't know why this happened but it seemed reasonable to kill any dangling process.


    maflcko commented at 8:51 AM on April 7, 2026:

    Sure, I understand what the code change does, but I don't understand why the change is needed and what this will even change. IIUC it will only stylistically change the CI log on GHA, but I am not even sure about that.

    I don't think the code should be changed, when there is no reason, nor a description of the observed behavior change.


    maflcko commented at 8:53 AM on April 7, 2026:

    I don't know why this happened but it seemed reasonable to kill any dangling process.

    I disagree, because I fail to see how killing dangling processes after a failure will affect anything that happens before a passing run. There is no connection, and if there was one, it is the burden of the pull request author to explain it.


    rkrux commented at 9:08 AM on April 7, 2026:

    Fair point.

    I will close this PR in this case.

  38. rkrux closed this on Apr 7, 2026


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-22 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me