test: fix dangling bitcoind in functional tests #19281

pull S3RK wants to merge 1 commits into bitcoin:master from S3RK:test_terminate_gracefully changing 2 files +30 −5
  1. S3RK commented at 5:30 AM on June 15, 2020: member

    This PR fixes a problem when after failure of a test with --failfast option there are dangling nodes. The nodes will continue to occupy rpc/p2p ports on the machine and will cause further test failures.

    The fix: Introduce new test status — cancelled and register a signal handler that will raise TestCancelled exception to properly handle termination. Test runner now sends SIGTERM instead of SIGKILL.

  2. fanquake added the label Tests on Jun 15, 2020
  3. in test/functional/test_framework/test_framework.py:12 in 2709209516 outdated
       8 | @@ -9,6 +9,7 @@
       9 |  import argparse
      10 |  import logging
      11 |  import os
      12 | +import signal
    


    jonatack commented at 11:23 AM on June 15, 2020:

    nit: sort

  4. in test/functional/test_runner.py:607 in 2709209516 outdated
     605 | -            proc.kill()
     606 | +            logging.debug("Terminating dangling process: %s" % str(proc.pid))
     607 | +            proc.terminate()
     608 |  
     609 |          for proc in procs:
     610 | +            logging.debug("Waiting for %s to terminate." % str(proc.pid))
    


    jonatack commented at 11:34 AM on June 15, 2020:

    nit: Here and in the other file touched, I think the project prefers using format for interpolation.

    -            logging.debug("Terminating dangling process: %s" % str(proc.pid))
    +            logging.debug("Terminating dangling process: {}".format(proc.pid))
                 proc.terminate()
     
             for proc in procs:
    -            logging.debug("Waiting for %s to terminate." % str(proc.pid))
    +            logging.debug("Waiting for {} to terminate.".format(proc.pid))
    
  5. jonatack commented at 12:31 PM on June 15, 2020: member

    Concept ACK. A couple nits below, but no need to change anything until other reviewers agree with this.

  6. jonatack commented at 12:32 PM on June 15, 2020: member

    Concept ACK. A couple nits below, but no need to change anything until other reviewers agree with this.

  7. S3RK renamed this:
    test: gracefully terminate parallel tests
    test: fix dangling bitcoind in functional tests
    on Jun 23, 2020
  8. laanwj referenced this in commit f6072e601a on Jul 1, 2020
  9. test: gracefully terminate parallel tests
    Test runner sends SIGTERM instead of SIGKILL to a running test
    to allow proper shutdown of nodes.
    bb44fd944d
  10. S3RK force-pushed on Jul 6, 2020
  11. S3RK commented at 11:34 AM on July 6, 2020: member

    Fixed nits

  12. laanwj commented at 12:40 PM on July 15, 2020: member

    Concept ACK

  13. S3RK requested review from jonatack on Jul 23, 2020
  14. NelsonGaldeman commented at 11:15 PM on February 25, 2021: contributor

    ACK https://github.com/bitcoin/bitcoin/pull/19281/commits/bb44fd944d4d7715aa1247d53da9ae21c59c543a

    Tested on OSX 11.1

  15. dunxen commented at 1:21 PM on June 4, 2021: contributor

    ACK bb44fd9 (rebased on 3ac520966).

    • Forced fail in one test and ran with --failfast. SIGTERM triggered and also saw graceful stopping of processes. No dangling bitcoind processes seen.
    • Can also confirm that sending a SIGTERM to one of the running tests also triggers a fast fail with graceful process termination.

    <details> <summary>Screenshot with details of external <code>SIGTERM</code></summary> <br>

    Screenshot 2021-06-04 at 15 14 20

    </details>

  16. MarcoFalke commented at 1:27 PM on June 4, 2021: member

    I think --failfast is only supported by CI, which will throw away the whole VM, so dangling process is not an issue. However, timeout instead of bug might be more of an issue

  17. S3RK commented at 6:52 AM on June 8, 2021: member

    @MarcoFalke would you concept ack this if I add a timeout to proc.wait()? In this case we don't need to restrict --failfast to CI only, and in case a process is stuck CI will not timeout

  18. MarcoFalke commented at 6:08 AM on June 14, 2021: member

    There shouldn't be any timeout. The whole process tree should be killed immediately. If there was a simple way in python to do that, I'd prefer that.

  19. DrahtBot commented at 4:39 PM on June 15, 2021: member

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #22249 (test: kill process group to avoid dangling processes when using --failfast by S3RK)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  20. MarcoFalke referenced this in commit 0844084c13 on Jun 18, 2021
  21. DrahtBot added the label Needs rebase on Jun 18, 2021
  22. DrahtBot commented at 12:30 PM on June 18, 2021: member

    <!--cf906140f33d8803c4a75a2196329ecb-->

    🐙 This pull request conflicts with the target branch and needs rebase.

    <sub>Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".</sub>

  23. MarcoFalke closed this on Jun 18, 2021

  24. sidhujag referenced this in commit 0f79e33528 on Jun 18, 2021
  25. DrahtBot locked this on Aug 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-25 00:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me