test: kill process group to avoid dangling processes when using `--failfast` #22249

pull S3RK wants to merge 1 commits into bitcoin:master from S3RK:test_kill_process_group changing 1 files +6 −13
  1. S3RK commented at 7:09 AM on June 15, 2021: member

    This is an alternative to #19281

    This PR fixes a problem when after test failure with --failfast option there could be dangling nodes. The nodes will continue to occupy rpc/p2p ports on the machine and will cause further test failures.

    If there are any dangling nodes left at the end of the test run we kill the whole process group. Pros: the operations is immediate and won't lead to CI timeout Cons: the test_runner process is also killed and exit code is 137

    Example output:

    ...
    Early exiting after test failure
    
    TEST                           | STATUS    | DURATION
    
    rpc_decodescript.py            | ✓ Passed  | 2 s
    rpc_deprecated.py              | ✓ Passed  | 2 s
    rpc_deriveaddresses.py         | ✓ Passed  | 2 s
    rpc_dumptxoutset.py            | ✖ Failed  | 2 s
    
    ALL                            | ✖ Failed  | 8 s (accumulated)
    Runtime: 4 s
    
    Killed: 9
    > echo $?
    137
    
  2. fanquake added the label Tests on Jun 15, 2021
  3. in test/functional/test_runner.py:556 in 5012bf02a0 outdated
     554 | -    job_queue.kill_and_join()
     555 | +    # Clean up dangling processes if any. This may only happen with --failfast option.
     556 | +    # Killing the process group will also terminate the current process but that is
     557 | +    # not an issue
     558 | +    if len(job_queue.jobs):
     559 | +        os.killpg(os.getpid(), signal.SIGKILL)
    


    MarcoFalke commented at 7:18 AM on June 15, 2021:

    S3RK commented at 7:39 AM on June 15, 2021:

    Yes, thanks

  4. MarcoFalke approved
  5. MarcoFalke commented at 7:20 AM on June 15, 2021: member

    This means --failfast no longer works on Windows, but who uses that anyway :man_shrugging:

  6. test: kill process group to avoid dangling processes 451b96f7d2
  7. S3RK force-pushed on Jun 15, 2021
  8. S3RK commented at 7:39 AM on June 15, 2021: member

    I can add different code paths for Win and Unix if desirable

  9. MarcoFalke commented at 9:56 AM on June 15, 2021: member

    review ACK 451b96f7d2796d00eabaec56d831f9e9b1a569cc

    mind pushing a temporary commit with a logic error in one of the tests, so that we can observe the CI fails properly?

  10. practicalswift commented at 10:02 AM on June 15, 2021: contributor

    Concept ACK on addressing the annoying dangling bitcoind issue :)

  11. DrahtBot commented at 11:49 AM on June 15, 2021: member

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #19281 (test: fix dangling bitcoind in functional tests by S3RK)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  12. promag commented at 1:22 PM on June 15, 2021: member

    It's not clear to me what is wrong with the kill+wait approach, why does it result in dangling processes.

  13. S3RK commented at 7:17 PM on June 15, 2021: member

    It's not clear to me what is wrong with the kill+wait approach, why does it result in dangling processes.

    So, we have three layers of processes, something like

    test_runner.py[100] -> test_1.py[101] -> bitcoind[102]
                        -> test_2.py[103] -> bitcoind[104] 
    

    Currently test_runner.py[100] sends SIGKILL to all test_x.py which leads to its immediate termination and orphaned bitcoind processes being linked to init. E.g. test_1.py[101] got killed and bitcoind[102] is dangling. Actually, I'm not sure why do we use wait there, I don't think it changes anything.

    To avoid that we can do SIGTERM and wait, but that could lead to CI timeouts. I've tried that first in #19281

    Another alternative is to setup signal handler in child process (bitcoind) to listen when parent process exits, but that's feels backward.

    mind pushing a temporary commit with a logic error in one of the tests, so that we can observe the CI fails properly?

    Sure, will do

  14. aitorjs commented at 11:24 PM on June 15, 2021: contributor

    ACK 451b96f7d2796d00eabaec56d831f9e9b1a569cc. Manual testing with and without --failfast.

    Failing an assert with --failfast, passes through reviewed code. When fails, exits gracefully stopping of processes.

    <details> <summary>Log</summary> <br> <p> $ test/functional/test_runner.py --failfast rpc_decodescript.py rpc_deprecated.py rpc_deriveaddresses.py rpc_dumptxoutset.py <br><br> Temporary test directory at /tmp/test_runner_₿_🏃_20210616_004909 Running Unit Tests for Test Framework Modules .......... ---------------------------------------------------------------------- Ran 10 tests in 0.715s

    OK Remaining jobs: [rpc_decodescript.py, rpc_deprecated.py, rpc_deriveaddresses.py, rpc_dumptxoutset.py] 1/4 - rpc_decodescript.py failed, Duration: 1 s

    stdout: 2021-06-15T22:49:12.613000Z TestFramework (INFO): Initializing test directory /tmp/test_runner_₿🏃_20210616_004909/rpc_decodescript_3 2021-06-15T22:49:13.132000Z TestFramework (ERROR): Assertion failed Traceback (most recent call last): File "/home/bitcoin/blockchain/bitcoinPR/22249/test/functional/test_framework/test_framework.py", line 128, in main self.run_test() File "/home/bitcoin/blockchain/bitcoinPR/22249/test/functional/rpc_decodescript.py", line 230, in run_test self.decodescript_script_pub_key() File "/home/bitcoin/blockchain/bitcoinPR/22249/test/functional/rpc_decodescript.py", line 67, in decodescript_script_pub_key assert_equal('1 ' + public_key_hash, rpc_result['segwit']['asm']) File "/home/bitcoin/blockchain/bitcoinPR/22249/test/functional/test_framework/util.py", line 51, in assert_equal raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args)) AssertionError: not(1 5dd1d3a048119c27b28293056724d9522f26d945 == 0 5dd1d3a048119c27b28293056724d9522f26d945) 2021-06-15T22:49:13.183000Z TestFramework (INFO): Stopping nodes 2021-06-15T22:49:13.285000Z TestFramework (WARNING): Not cleaning up dir /tmp/test_runner🏃_20210616_004909/rpc_decodescript_3 2021-06-15T22:49:13.285000Z TestFramework (ERROR): Test failed. Test logging available at /tmp/test_runner🏃_20210616_004909/rpc_decodescript_3/test_framework.log 2021-06-15T22:49:13.285000Z TestFramework (ERROR): 2021-06-15T22:49:13.286000Z TestFramework (ERROR): Hint: Call /home/bitcoin/blockchain/bitcoinPR/22249/test/functional/combine_logs.py '/tmp/test_runner₿_🏃_20210616_004909/rpc_decodescript_3' to consolidate all logs 2021-06-15T22:49:13.286000Z TestFramework (ERROR): 2021-06-15T22:49:13.286000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log. 2021-06-15T22:49:13.286000Z TestFramework (ERROR): https://github.com/bitcoin/bitcoin/issues 2021-06-15T22:49:13.286000Z TestFramework (ERROR):

    stderr:

    Early exiting after test failure

    TEST | STATUS | DURATION

    rpc_decodescript.py | ✖ Failed | 1 s

    ALL | ✖ Failed | 1 s (accumulated) Runtime: 1 s

    dangling processes <!-- Print I write to know if it goes through the added part in test_runner.py ---> Terminado (killed)

    </p> </details>

    Failing an assert without --failfast, dont passes through reviewed code and all tests are run.

    <details> <summary>Log</summary> <br> $ test/functional/test_runner.py rpc_decodescript.py rpc_deprecated.py rpc_deriveaddresses.py rpc_dumptxoutset.py <br><br> <p> Temporary test directory at /tmp/test_runner_₿_🏃_20210616_005950 Running Unit Tests for Test Framework Modules .......... ---------------------------------------------------------------------- Ran 10 tests in 0.680s

    OK Remaining jobs: [rpc_decodescript.py, rpc_deprecated.py, rpc_deriveaddresses.py, rpc_dumptxoutset.py] 1/4 - rpc_decodescript.py failed, Duration: 1 s

    stdout: 2021-06-15T22:59:53.470000Z TestFramework (INFO): Initializing test directory /tmp/test_runner_₿🏃_20210616_005950/rpc_decodescript_3 2021-06-15T22:59:53.735000Z TestFramework (ERROR): Assertion failed Traceback (most recent call last): File "/home/bitcoin/blockchain/bitcoinPR/22249/test/functional/test_framework/test_framework.py", line 128, in main self.run_test() File "/home/bitcoin/blockchain/bitcoinPR/22249/test/functional/rpc_decodescript.py", line 230, in run_test self.decodescript_script_pub_key() File "/home/bitcoin/blockchain/bitcoinPR/22249/test/functional/rpc_decodescript.py", line 67, in decodescript_script_pub_key assert_equal('1 ' + public_key_hash, rpc_result['segwit']['asm']) File "/home/bitcoin/blockchain/bitcoinPR/22249/test/functional/test_framework/util.py", line 51, in assert_equal raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args)) AssertionError: not(1 5dd1d3a048119c27b28293056724d9522f26d945 == 0 5dd1d3a048119c27b28293056724d9522f26d945) 2021-06-15T22:59:53.786000Z TestFramework (INFO): Stopping nodes 2021-06-15T22:59:53.888000Z TestFramework (WARNING): Not cleaning up dir /tmp/test_runner🏃_20210616_005950/rpc_decodescript_3 2021-06-15T22:59:53.888000Z TestFramework (ERROR): Test failed. Test logging available at /tmp/test_runner🏃_20210616_005950/rpc_decodescript_3/test_framework.log 2021-06-15T22:59:53.888000Z TestFramework (ERROR): 2021-06-15T22:59:53.889000Z TestFramework (ERROR): Hint: Call /home/bitcoin/blockchain/bitcoinPR/22249/test/functional/combine_logs.py '/tmp/test_runner₿_🏃_20210616_005950/rpc_decodescript_3' to consolidate all logs 2021-06-15T22:59:53.889000Z TestFramework (ERROR): 2021-06-15T22:59:53.889000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log. 2021-06-15T22:59:53.889000Z TestFramework (ERROR): https://github.com/bitcoin/bitcoin/issues 2021-06-15T22:59:53.889000Z TestFramework (ERROR):

    stderr:

    Remaining jobs: [rpc_deprecated.py, rpc_deriveaddresses.py, rpc_dumptxoutset.py] 2/4 - rpc_deprecated.py passed, Duration: 1 s Remaining jobs: [rpc_deriveaddresses.py, rpc_dumptxoutset.py] 3/4 - rpc_deriveaddresses.py passed, Duration: 2 s Remaining jobs: [rpc_dumptxoutset.py] 4/4 - rpc_dumptxoutset.py passed, Duration: 2 s

    TEST | STATUS | DURATION

    rpc_deprecated.py | ✓ Passed | 1 s rpc_deriveaddresses.py | ✓ Passed | 2 s rpc_dumptxoutset.py | ✓ Passed | 2 s rpc_decodescript.py | ✖ Failed | 1 s

    ALL | ✖ Failed | 6 s (accumulated) Runtime: 2 s

    </p> </details>

  15. MarcoFalke commented at 7:02 AM on June 17, 2021: member

    Example failure is here and looks good: https://cirrus-ci.com/build/5678106947092480

    Commit can be removed again.

  16. S3RK force-pushed on Jun 17, 2021
  17. S3RK commented at 7:45 PM on June 17, 2021: member

    Done

  18. MarcoFalke merged this on Jun 18, 2021
  19. MarcoFalke closed this on Jun 18, 2021

  20. sidhujag referenced this in commit 0f79e33528 on Jun 18, 2021
  21. gwillen referenced this in commit 16c93fc360 on Jun 1, 2022
  22. DrahtBot locked this on Aug 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-24 21:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me