ipc: Handle unclean shutdowns better #32345

pull ryanofsky wants to merge 8 commits into bitcoin:master from ryanofsky:pr/ipc-stop changing 22 files +485 −234
  1. Squashed 'src/ipc/libmultiprocess/' changes from 35944ffd23fa..f15ef6cdeec5
    f15ef6cdeec5 Improve IPC client disconnected exceptions
    4a9a387e68b4 test: Add test coverage for client & server disconnections
    a848ec60d490 refactor: Add clang thread safety annotations to EventLoop
    4b97111ada84 refactor: Remove DestructorCatcher and AsyncCallable
    276eb8f99d05 refactor: Drop addClient/removeClient methods
    025a77ec2e46 refactor: Use EventLoopRef instead of addClient/removeClient
    394f966e93f8 refactor: Add ProxyContext EventLoop* member
    c1aa2d7dd546 proxy-io.h: Add EventLoopRef RAII class handle addClient/removeClient refcounting
    2e02532f4720 proxy-io.h: Add more detailed EventLoop comment
    
    git-subtree-dir: src/ipc/libmultiprocess
    git-subtree-split: f15ef6cdeec54c83f52e00de47d57b09f9a5f03b
    c021835739
  2. Merge commit 'c021835739272d1daa8b4a2afc8e8c7d093d534c' into pr/ipc-stop-base b9e16ff790
  3. ipc: Use EventLoopRef instead of addClient/removeClient
    Use EventLoopRef to avoid reference counting bugs and be more exception safe
    and deal with removal of addClient/removeClient methods in
    https://github.com/bitcoin-core/libmultiprocess/pull/160
    
    A test update is also required due to
    https://github.com/bitcoin-core/libmultiprocess/pull/160 to deal with changed
    reference count semantics. In IpcPipeTest(), it is is now necessary to destroy
    the client Proxy object instead of just the client Connection object to
    decrease the event loop reference count and allow the loop to exit so the test
    does not hang on shutdown.
    197b2aaaaa
  4. test: Add unit test coverage for Init and Shutdown code
    Currently this code is not called in unit tests. Calling should make it
    possible to write tests for things like IPC exceptions being thrown during
    shutdown.
    cf1c26a3a2
  5. ipc: Avoid waiting for clients to disconnect when shutting down
    This fixes behavior reported by Antoine Poinsot <darosior@protonmail.com>
    https://github.com/bitcoin/bitcoin/pull/29409#issuecomment-2546088852 where if
    an IPC client is connected, the node will wait forever for it to disconnect
    before exiting.
    8ca7049cea
  6. doc: Improve IPC interface comments
    Fix some comments that were referring to previous versions of these methods and
    did not make sense.
    6427d6f175
  7. ipc: Add Ctrl-C handler for spawned subprocesses
    This fixes an error reported by Antoine Poinsot <darosior@protonmail.com> in
    https://github.com/bitcoin-core/libmultiprocess/issues/123 that does not happen
    in master, but does happen with https://github.com/bitcoin/bitcoin/pull/10102
    applied, where if Ctrl-C is pressed when `bitcoin-node` is started, it is
    handled by both `bitcoin-node` and `bitcoin-wallet` processes, causing the
    wallet to shutdown abruptly instead of waiting for the node and shutting down
    cleanly.
    
    This change fixes the problem by having the wallet process print to stdout when
    it receives a Ctrl-C signal but not otherwise react, letting the node shut
    everything down cleanly.
    db845b915f
  8. ipc: Handle bitcoin-wallet disconnections
    This fixes an error reported by Antoine Poinsot <darosior@protonmail.com> in
    https://github.com/bitcoin-core/libmultiprocess/issues/123 that does not happen
    in master, but does happen with https://github.com/bitcoin/bitcoin/pull/10102
    applied, where if the child bitcoin-wallet process is killed (either by an
    external signal or by Ctrl-C as reported in the issue) the bitcoin-node process
    will not shutdown cleanly after that because chain client flush()
    calls will fail.
    
    This change fixes the problem by handling ipc::Exception errors thrown during
    the flush() calls, and it relies on the fixes to disconnect detection
    implemented in https://github.com/bitcoin-core/libmultiprocess/pull/160 to work
    effectively.
    5b38e62ccb
  9. DrahtBot commented at 8:58 pm on April 24, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32345.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK darosior

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32438 (refactor: Removals after bdb removal by maflcko)
    • #32387 ([DRAFT] ipc: add windows support by ryanofsky)
    • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    FooImplemention → FooImplementation
    explictly → explicitly

    No other typos were found.

  10. DrahtBot added the label CI failed on Apr 25, 2025
  11. DrahtBot commented at 0:33 am on April 25, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/41118314270

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  12. fanquake commented at 10:20 am on April 25, 2025: member

    https://github.com/bitcoin/bitcoin/actions/runs/14651550530/job/41119318297?pr=32345#step:5:966:

    0Assertion failed: fRPCInWarmup, file ./rpc/server.cpp, line 329
    1Error: Process completed with exit code 3.
    

    https://cirrus-ci.com/task/5477140512112640?logs=ci#L5037:

    0[17:33:37.941] ./test/node_init_tests.cpp(41): error: in "node_init_tests/init_test": check AppInitMain(m_node) has failed
    1[17:33:37.941] Error: Cannot resolve -bind address: ''
    2[17:33:37.954] ./test/node_init_tests.cpp(32): Leaving test case "init_test"; testing time: 32703us
    

    https://cirrus-ci.com/task/5336403023757312?logs=ci#L2886:

    0[18:45:09.651] /ci_container_base/src/ipc/interfaces.cpp:35:5: error: ignoring return value of function declared with 'warn_unused_result' attribute [-Werror,-Wunused-result]
    1[18:45:09.651]    35 |     write(STDOUT_FILENO, g_ignore_ctrl_c.data(), g_ignore_ctrl_c.size());
    2[18:45:09.651]       |     ^~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    3[18:45:09.651] 1 error generated.
    
  13. ryanofsky force-pushed on Apr 25, 2025
  14. ryanofsky commented at 5:38 pm on April 25, 2025: contributor
    Updated e8372fa3a606d35146c72fc61663e17c67a4621a -> a56468ab0bf2154159919a2c3feba9974486d10e (pr/ipc-stop.1 -> pr/ipc-stop.2, compare) to fix draftbot typos and CI failures, though I could not reproduce the previous releases AppInitMain failure in local CI so it could still be broken
  15. darosior commented at 7:53 pm on April 25, 2025: member
    Concept ACK. Will test using my Rust client for #29409 rebased on top of this.
  16. ryanofsky force-pushed on Apr 28, 2025
  17. ryanofsky commented at 4:22 pm on April 28, 2025: contributor
    Updated a56468ab0bf2154159919a2c3feba9974486d10e -> 0b72656ecf363d9fc4820a9e3e22afb2c98d4e75 (pr/ipc-stop.2 -> pr/ipc-stop.3, compare) to try to fix init test CI failures https://github.com/bitcoin/bitcoin/actions/runs/14670251419/job/41176416988?pr=32345 and https://cirrus-ci.com/task/6094268256747520 Updated 0b72656ecf363d9fc4820a9e3e22afb2c98d4e75 -> 5b38e62ccbfcdb7f20a6a74fcc4ef2358e3da56c (pr/ipc-stop.3 -> pr/ipc-stop.4, compare) to fix test_bitcoin-qt failures caused by previous fix https://github.com/bitcoin/bitcoin/pull/32345/checks?check_run_id=41288589561
  18. ryanofsky force-pushed on Apr 28, 2025
  19. DrahtBot removed the label CI failed on Apr 28, 2025
  20. maflcko commented at 6:18 am on April 29, 2025: member
    There are also some typos, according to #32345 (comment), if you want to fix them if you have to re-touch for other reasons anyway.
  21. DrahtBot added the label CI failed on May 5, 2025
  22. DrahtBot added the label Needs rebase on May 9, 2025
  23. DrahtBot commented at 4:39 pm on May 9, 2025: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  24. darosior commented at 6:49 pm on May 20, 2025: member

    I tested this PR by rebasing #29409 on top of it and running my (updated) core_bdk_wallet against it.

    I found that bitcoin-node will (still) segfault if i try to stop it while the client is running. The steps i followed were to 1) start bitcoin-node 2) start my PoC which will connect to it and stall with the open connection for 6 seconds 3) before the 6 seconds have elapsed, stop the bitcoin-node process (in this specific case by sending it a SIGINT). Here are the logs of the bitcoin-node process:

     0[...]
     12025-05-20T18:44:44Z init message: Starting network threads
     22025-05-20T18:44:44Z DNS seeding disabled
     32025-05-20T18:44:44Z init message: Done loading
     42025-05-20T18:44:44Z initload thread start
     52025-05-20T18:44:44Z msghand thread start
     62025-05-20T18:44:44Z Loading 0 mempool transactions from file...
     72025-05-20T18:44:44Z Imported mempool transactions from file: 0 succeeded, 0 failed, 0 expired, 0 already there, 0 waiting for initial broadcast
     82025-05-20T18:44:44Z opencon thread start
     92025-05-20T18:44:44Z addcon thread start
    102025-05-20T18:44:44Z net thread start
    112025-05-20T18:44:44Z initload thread exit
    122025-05-20T18:44:44Z New manual v2 peer connected: version: 70016, blocks=113, peer=0
    132025-05-20T18:44:44Z Synchronizing blockheaders, height: 113 (~0.47%)
    142025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server recv request  [#1](/bitcoin-bitcoin/1/) Init.construct$Params ()
    152025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server send response [#1](/bitcoin-bitcoin/1/) Init.construct$Results (threadMap = <external capability>)
    162025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server recv request  [#2](/bitcoin-bitcoin/2/) Init.makeChain$Params (context = (thread = <external capability>))
    172025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server post request  [#2](/bitcoin-bitcoin/2/) {bitcoin-node-948798/b-capnp-loop-948823 (from )}
    182025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server send response [#2](/bitcoin-bitcoin/2/) Init.makeChain$Results (result = <external capability>)
    192025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server recv request  [#3](/bitcoin-bitcoin/3/) Chain.initMessage$Params (context = (thread = <external capability>), message = "Oxydation of the Bitcoin Core wallet in progress..")
    202025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server post request  [#3](/bitcoin-bitcoin/3/) {bitcoin-node-948798/b-capnp-loop-948823 (from )}
    212025-05-20T18:44:45Z init message: Oxydation of the Bitcoin Core wallet in progress..
    222025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server send response [#3](/bitcoin-bitcoin/3/) Chain.initMessage$Results ()
    232025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server recv request  [#4](/bitcoin-bitcoin/4/) Chain.showProgress$Params (context = (thread = <external capability>), title = "BDK Core startup", progress = 1, resumePossible = false)
    242025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server post request  [#4](/bitcoin-bitcoin/4/) {bitcoin-node-948798/b-capnp-loop-948823 (from )}
    252025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server send response [#4](/bitcoin-bitcoin/4/) Chain.showProgress$Results ()
    262025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server recv request  [#5](/bitcoin-bitcoin/5/) Chain.getHeight$Params (context = (thread = <external capability>))
    272025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server post request  [#5](/bitcoin-bitcoin/5/) {bitcoin-node-948798/b-capnp-loop-948823 (from )}
    282025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server send response [#5](/bitcoin-bitcoin/5/) Chain.getHeight$Results (result = 113, hasResult = true)
    292025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server recv request  [#6](/bitcoin-bitcoin/6/) Chain.getBlockHash$Params (context = (thread = <external capability>), height = 113)
    302025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server post request  [#6](/bitcoin-bitcoin/6/) {bitcoin-node-948798/b-capnp-loop-948823 (from )}
    312025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server send response [#6](/bitcoin-bitcoin/6/) Chain.getBlockHash$Results (result = "H\\3678\\307#\\020|\\020\\017\\340\\2469\\330\\356\\335\\277\\341dOB{\\220\\327\\354[\\361\\036\\233\\301\\035DD")
    322025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server recv request  [#7](/bitcoin-bitcoin/7/) Chain.showProgress$Params (context = (thread = <external capability>), title = "BDK Core startup", progress = 100, resumePossible = true)
    332025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server post request  [#7](/bitcoin-bitcoin/7/) {bitcoin-node-948798/b-capnp-loop-948823 (from )}
    342025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server send response [#7](/bitcoin-bitcoin/7/) Chain.showProgress$Results ()
    352025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server recv request  [#8](/bitcoin-bitcoin/8/) Chain.handleNotifications$Params (context = (thread = <external capability>), notifications = <external capability>)
    362025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server post request  [#8](/bitcoin-bitcoin/8/) {bitcoin-node-948798/b-capnp-loop-948823 (from )}
    372025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server send response [#8](/bitcoin-bitcoin/8/) Chain.handleNotifications$Results (result = <external capability>)
    382025-05-20T18:44:45Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server destroy N2mp11ProxyServerIN3ipc5capnp8messages7HandlerEEE
    39^C2025-05-20T18:44:47Z Shutdown: In progress...
    402025-05-20T18:44:47Z addcon thread exit
    412025-05-20T18:44:47Z opencon thread exit
    422025-05-20T18:44:47Z net thread exit
    432025-05-20T18:44:47Z msghand thread exit
    442025-05-20T18:44:47Z scheduler thread exit
    452025-05-20T18:44:47Z Writing 0 mempool transactions to file...
    462025-05-20T18:44:47Z Writing 0 unbroadcast transactions to file.
    472025-05-20T18:44:47Z Dumped mempool: 0.000s to copy, 0.003s to dump, 27 bytes dumped to file
    482025-05-20T18:44:47Z Flushed fee estimates to fee_estimates.dat.
    492025-05-20T18:44:47Z [ipc] {bitcoin-node-948798/b-shutoff-948798} IPC client first request from current thread, constructing waiter
    502025-05-20T18:44:47Z [ipc] {bitcoin-node-948798/b-shutoff-948798} IPC client send ChainNotifications.chainStateFlushed$Params (context = (thread = <external capability>, callbackThread = <external capability>), role = 0, locator = "\\200\\021\\001\\000\\022H\\3678\\307#\\020|\\020\\017\\340\\2469\\330\\356\\335\\277\\341dOB{\\220\\327\\354[\\361\\036\\233\\301\\035DD\\tGm\\237]YbZDV9\\254\\252f\\205\\001\\323K\\375!\\340\\t\\367\\375\\346\\264W\\207\\000\\2101\\035:\\314\\233\\246\\275\\207\\v\\\\\\\\\\343\\202\\026\\274\\037\\213\\314\\303V\\2116<*+\\352\\255\\026\\204M\\332\\340\\nT\\306\\360Pa\\304\\002\\024\\356\\247\\220\\263\\257l\\230\\350\\362\\027Yl\\300\\316lk\\232\\f\\333\\235,\\234\\336\\204+JE\\263>\\237\\177\\305z\\017\\252\\311@\\341e\\026\\300dk\\305\\212V\\334\\377\\004cb\\217p\\n\\307w\\0046\\2261|\\024\\\\N\\336\\223G\\263~n\\271\\256\\264\\346\\227\\301\\032\\301\\030\\bw\\241Z.\\302\\261\\027 \\005\\374B\\357R\\231\\237Ct\\324\\232I\\322\\370\\303T\\253\\205\\354\\233\\337\\031C\\3215O\\3746\\016\\360b &\\300\\201\\244U\\221\\306 Pql+c\\212\\2524\\314a\\t\\216\\275=\\230b\\207ouv[\\365(\\3...
    512025-05-20T18:44:47Z [ipc] {bitcoin-node-948798/b-shutoff-948798} IPC client recv ChainNotifications.chainStateFlushed$Results ()
    522025-05-20T18:44:47Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server destroy N2mp11ProxyServerIN3ipc5capnp8messages5ChainEEE
    532025-05-20T18:44:47Z [ipc] {bitcoin-node-948798/b-capnp-loop-948800} IPC server destroy N2mp11ProxyServerIN3ipc5capnp8messages4InitEEE
    54Segmentation fault
    
  25. ryanofsky commented at 2:33 am on May 21, 2025: contributor

    re: #32345 (comment)

    I found that bitcoin-node will (still) segfault if i try to stop it while the client is running.

    Thanks for testing this! I created a new issue in https://github.com/bitcoin-core/libmultiprocess/issues/176 to help debug. So far I couldn’t reproduce the problem running a very basic client there, but I can try to extend it to do more of the operations your client is doing and see if I can make the problem happen. It would also help to have a stack trace from the segfault, and I provided instructions for getting a stack trace there.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-05-29 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me