ipc, test: Add tests for unclean disconnect and thread busy behavior #34284

pull ryanofsky wants to merge 1 commits into bitcoin:master from ryanofsky:pr/ipc-testasync changing 2 files +118 −2
  1. ryanofsky commented at 1:46 pm on January 14, 2026: contributor
    Upcoming libmultiprocess changes are expected to alter this behavior (https://github.com/bitcoin/bitcoin/issues/34250#issuecomment-3749243782), making test coverage useful for documenting current behavior and validating the intended changes.
  2. DrahtBot commented at 1:46 pm on January 14, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34284.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK Sjors
    Stale ACK ismaelsadeeq

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34568 (mining: Break compatibility with existing IPC mining clients by ryanofsky)
    • #34422 (Update libmultiprocess subtree to be more stable with rust IPC client by ryanofsky)
    • #34184 (mining: add cooldown to createNewBlock() immediately after IBD by Sjors)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • “If the process has exited, asserts that the exit code matches” -> “If the process has exited, the method asserts that the exit code matches” [Adds missing subject to avoid a dangling clause and clarify what performs the assertion]

    2026-02-12 01:12:11

  3. DrahtBot added the label CI failed on Jan 14, 2026
  4. DrahtBot commented at 3:23 pm on January 14, 2026: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/actions/runs/20996362200/job/60354097964 LLM reason (✨ experimental): Lint check failed (ruff) due to unused variable errors in Python code, causing the CI to fail.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  5. ismaelsadeeq commented at 4:00 pm on January 14, 2026: member
    Concept ACK
  6. ryanofsky force-pushed on Jan 14, 2026
  7. ryanofsky commented at 11:54 pm on January 14, 2026: contributor
    Updated 7666f1cbafdff3fc8511ea540d9d7c7421902142 -> 1835ee877af30800c361e2075baae44030dc33de (pr/ipc-testasync.1 -> pr/ipc-testasync.2, compare) to fix various ci errors
  8. ryanofsky force-pushed on Jan 15, 2026
  9. ryanofsky commented at 1:39 pm on January 15, 2026: contributor

    Updated 1835ee877af30800c361e2075baae44030dc33de -> 568a0d8311a73094e6510699cd0d65ef6c4a252d (pr/ipc-testasync.2 -> pr/ipc-testasync.3, compare) to fix CI errors https://github.com/bitcoin/bitcoin/actions/runs/21014168771

    Updated 568a0d8311a73094e6510699cd0d65ef6c4a252d -> 96fd77bbcc42f1ac76be025e006bb59866bb7cc3 (pr/ipc-testasync.3 -> pr/ipc-testasync.4, compare) to fix CI errors https://github.com/bitcoin/bitcoin/actions/runs/21033187005

    Updated 96fd77bbcc42f1ac76be025e006bb59866bb7cc3 -> f532e9b2ff1a8b6359a79f238426a912abbf7428 (pr/ipc-testasync.4 -> pr/ipc-testasync.5, compare) making LLM suggested typo fixes and trying to bypass apparently stuck and incomplete CI job https://github.com/bitcoin/bitcoin/actions/runs/21036706108

  10. ryanofsky force-pushed on Jan 15, 2026
  11. ryanofsky force-pushed on Jan 15, 2026
  12. DrahtBot removed the label CI failed on Jan 15, 2026
  13. in test/functional/interface_ipc.py:455 in f532e9b2ff outdated
    450+        https://github.com/bitcoin/bitcoin/issues/33923 and currently causes a
    451+        thread busy error. A future change will make this just queue the calls
    452+        for execution and not trigger any error"""
    453+        node = self.nodes[0]
    454+        self.log.info("Running thread busy test")
    455+        timeout = self.rpc_timeout * 1000.0
    


    ismaelsadeeq commented at 12:47 pm on January 19, 2026:

    In “ipc, test: Add tests for unclean disconnect and thread busy behavior” f532e9b2ff1a8b6359a79f238426a912abbf7428

    nitty-nit: 60s is fine, but rpc_timeout seems unrelated to block wait timeout. I’d rather we define a timeout relative to the code in scope that would make the template wait till the required precondition is met; in case where changing the value of timeout should not necessitate editing this code (I assume that unlikely, so this suggestion is just a cosmetic).


    ryanofsky commented at 12:52 pm on January 20, 2026:

    re: #34284 (review)

    I think the idea behind timeout variable is to choose a cutoff time, where if the node does not respond to the request within that time, the request is considered hung and the test should fail.

    The timeout should not be so short that the test fails spuriously. And the timeout should not be so long that hangs take a very long time to detect. Since test environments vary a lot, the timeout should also be controllable by the --timeout-factor command line option.

    Passing rpc_timeout as the waitNext timeout accomplishes these things, which I think are what we want, but we could potentially document the rpc_timeout variable to describe it better, or rename it to something more general like request_timeout. The current name also seems ok since IPC is a type of RPC. I do think we want to avoid having multiple timeout variables that are controlled separately if we can avoid that.

  14. ismaelsadeeq approved
  15. ismaelsadeeq commented at 12:57 pm on January 19, 2026: member

    ACK f532e9b2ff1a8b6359a79f238426a912abbf7428

    I have tested locally and verified that the issue reported exists on master.

    I also cherry-picked https://github.com/bitcoin/bitcoin/compare/master...ryanofsky:bitcoin:pr/ipc-cancel compile and ran the test run_unclean_disconnect_test. It seems to fix the issue, and the subtest is not passing after that patch.

    Post that commit, it will be easy to update this test to reflect the new behaviour; we just have to assert the opposite of what’s assumed here, which is that an unclean disconnect does not trigger a crash and that more concurrency is allowed.

  16. in test/functional/interface_ipc.py:492 in f532e9b2ff
    487+                assert_equal(e.type, "FAILED")
    488+            else:
    489+                raise AssertionError("Expected thread busy exception")
    490+
    491+            # Generate a new block to make the active waitNext calls return, then clean up.
    492+            self.generate(node, 1, sync_fun=lambda: None)
    


    maflcko commented at 3:10 pm on January 19, 2026:
    nit: I think you can use self.no_op instead of the lambda

    ryanofsky commented at 8:26 pm on February 11, 2026:

    re: #34284 (review)

    nit: I think you can use self.no_op instead of the lambda

    Thanks, switched to self.no_op

  17. ryanofsky commented at 4:15 am on January 21, 2026: contributor

    I have a PR https://github.com/bitcoin-core/libmultiprocess/pull/240 which addresses both of the of the issues tested here (#33923 and #34250). The PR seems to work well but needs more testing and cleanup. I still think it would be good to merge this PR first to get more test coverage in place. If the other PR goes ahead the tests here can be updated as follows:

     0--- a/test/functional/interface_ipc.py
     1+++ b/test/functional/interface_ipc.py
     2@@ -401,9 +401,9 @@ class IPCInterfaceTest(BitcoinTestFramework):
     3 
     4     def run_unclean_disconnect_test(self):
     5         """Test behavior when disconnecting during an IPC call that later
     6-        returns a non-null interface pointer. Currently this behavior causes a
     7-        crash as reported [#34250](/bitcoin-bitcoin/34250/), but a
     8-        followup will change this behavior."""
     9+        returns a non-null interface pointer. This used to cause a crash as
    10+        reported [#34250](/bitcoin-bitcoin/34250/), but now just
    11+        results in a cancellation log message"""
    12         node = self.nodes[0]
    13         self.log.info("Running disconnect during BlockTemplate.waitNext")
    14         timeout = self.rpc_timeout * 1000.0
    15@@ -428,28 +428,22 @@ class IPCInterfaceTest(BitcoinTestFramework):
    16             with node.assert_debug_log(expected_msgs=["BlockTemplate.waitNext", "IPC server post request"]):
    17                 promise = template.waitNext(ctx, waitoptions)
    18                 await asyncio.sleep(0.1)
    19-            disconnected_log_check.enter_context(node.assert_debug_log(expected_msgs=["IPC server: socket disconnected"]))
    20+            disconnected_log_check.enter_context(node.assert_debug_log(expected_msgs=["IPC server: socket disconnected", "cancelled while executing"]))
    21             del promise
    22 
    23         asyncio.run(capnp.run(async_routine()))
    24 
    25         # Wait for socket disconnected log message, then generate a block to
    26-        # cause the waitNext() call to return a new template. This will cause a
    27-        # crash and disconnect with error output.
    28-        disconnected_log_check.close()
    29-        try:
    30+        # cause the waitNext() call to return a new template. Look for a
    31+        # cancelled IPC log message after waitNext returns.
    32+        with node.assert_debug_log(expected_msgs=["interrupted (cancelled)"]):
    33+            disconnected_log_check.close()
    34             self.generate(node, 1)
    35-        except (http.client.RemoteDisconnected, BrokenPipeError, ConnectionResetError):
    36-            pass
    37-        node.wait_until_stopped(expected_ret_code=(-11, -6, 1, 66), expected_stderr=re.compile(r"\S"))
    38-        self.start_node(0)
    39 
    40     def run_thread_busy_test(self):
    41         """Test behavior when sending multiple calls to the same server thread
    42         which used to cause a crash as reported
    43-        [#33923](/bitcoin-bitcoin/33923/) and currently causes a
    44-        thread busy error. A future change will make this just queue the calls
    45-        for execution and not trigger any error"""
    46+        [#33923](/bitcoin-bitcoin/33923/)."""
    47         node = self.nodes[0]
    48         self.log.info("Running thread busy test")
    49         timeout = self.rpc_timeout * 1000.0
    50@@ -480,18 +474,15 @@ class IPCInterfaceTest(BitcoinTestFramework):
    51             with node.assert_debug_log(expected_msgs=["BlockTemplate.waitNext", "IPC server post request"]):
    52                 promise2 = template.waitNext(ctx, waitoptions)
    53                 await asyncio.sleep(0.1)
    54-            try:
    55-                await template.waitNext(ctx, waitoptions)
    56-            except capnp.lib.capnp.KjException as e:
    57-                assert_equal(e.description, "remote exception: std::exception: thread busy")
    58-                assert_equal(e.type, "FAILED")
    59-            else:
    60-                raise AssertionError("Expected thread busy exception")
    61+            with node.assert_debug_log(expected_msgs=["BlockTemplate.waitNext", "IPC server post request"]):
    62+                promise3 = template.waitNext(ctx, waitoptions)
    63+                await asyncio.sleep(0.1)
    64 
    65             # Generate a new block to make the active waitNext calls return, then clean up.
    66             self.generate(node, 1, sync_fun=lambda: None)
    67             await ((await promise1).result).destroy(ctx)
    68             await ((await promise2).result).destroy(ctx)
    69+            await ((await promise3).result).destroy(ctx)
    70             await template.destroy(ctx)
    71 
    72         asyncio.run(capnp.run(async_routine()))
    
  18. DrahtBot added the label Needs rebase on Feb 7, 2026
  19. Sjors commented at 8:57 am on February 11, 2026: member

    Concept ACK

    Needs rebase after the test split in #34452 (suggest keeping them in interface_ipc.py), otherwise f532e9b2ff1a8b6359a79f238426a912abbf7428 looks good.

  20. ipc, test: Add tests for unclean disconnect and thread busy behavior
    Upcoming libmultiprocess changes are expected to alter this behavior
    (https://github.com/bitcoin/bitcoin/issues/34250#issuecomment-3749243782),
    making test coverage useful for documenting current behavior and validating the
    intended changes.
    61c9db4872
  21. in test/functional/test_framework/test_node.py:481 in f532e9b2ff outdated
    477@@ -477,12 +478,17 @@ def is_node_stopped(self, *, expected_stderr="", expected_ret_code=0):
    478             return False
    479 
    480         # process has stopped. Assert that it didn't return an error code.
    481-        assert return_code == expected_ret_code, self._node_msg(
    482+        if not isinstance(expected_ret_code, Iterable):
    


    Sjors commented at 9:03 am on February 11, 2026:
    Let’s add a comment to is_node_stopped that this takes either one ret code or a list, then the line here is less a surprise.

    ryanofsky commented at 8:27 pm on February 11, 2026:

    re: #34284 (review)

    Let’s add a comment to is_node_stopped that this takes either one ret code or a list, then the line here is less a surprise.

    Thanks, added comment

  22. ryanofsky referenced this in commit 4f42fc4ddc on Feb 11, 2026
  23. ryanofsky force-pushed on Feb 11, 2026
  24. ryanofsky commented at 8:50 pm on February 11, 2026: contributor

    Thanks for the review!

    Rebased f532e9b2ff1a8b6359a79f238426a912abbf7428 -> 66d0258efa9edcb64dc09b0e2b00e1b82b1dd97f (pr/ipc-testasync.5 -> pr/ipc-testasync.6, compare) due to conflicts with #34452 and implemented suggestions

    Updated 66d0258efa9edcb64dc09b0e2b00e1b82b1dd97f -> 97d2b7375ae3b7bcb479b4ea9b50fd83f7600209 (pr/ipc-testasync.6 -> pr/ipc-testasync.7, compare) to fix race condition in thread busy test https://github.com/bitcoin/bitcoin/actions/runs/21922475933/job/63306151362?pr=34284

    Updated 97d2b7375ae3b7bcb479b4ea9b50fd83f7600209 -> 61c9db4872bc069658f87743e07688d05cab01ae (pr/ipc-testasync.7 -> pr/ipc-testasync.8, compare) to deal with no stderr output from musl in https://github.com/bitcoin/bitcoin/actions/runs/21924150204/job/63312180100?pr=34284

  25. ryanofsky force-pushed on Feb 11, 2026
  26. DrahtBot added the label CI failed on Feb 11, 2026
  27. DrahtBot removed the label Needs rebase on Feb 11, 2026
  28. ryanofsky force-pushed on Feb 12, 2026
  29. DrahtBot removed the label CI failed on Feb 12, 2026
  30. Sjors commented at 10:05 am on February 13, 2026: member

    ACK 61c9db4872bc069658f87743e07688d05cab01ae

    If you have to retouch, you could add cd2aa85867a64003c62a2f3d7d6ea56f7e1b916c from #34284 here which contains (and explains) the test_node.py changes from this PR (modulo one word change).

  31. DrahtBot requested review from ismaelsadeeq on Feb 13, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-02-17 09:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me