type-context.h should return error instead of asserting on simultaneous IPC calls #206

issue ryanofsky opened this issue on September 5, 2025
  1. ryanofsky commented at 12:29 PM on September 5, 2025: collaborator

    Currently, when a client makes simultaneous IPC calls on a single server thread (assigning the same Context.thread value in the request), assert(!m_fn) triggers in Waiter::post.

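    The assert guards a single pending-call slot per server thread. A minimal Python model of that logic (the real Waiter is C++ in proxy-io.h; the name try_post and this simplified structure are illustrative, not the actual API) shows how the server could refuse a second simultaneous call instead of aborting:

    ```python
    import threading

    class Waiter:
        """Simplified model of mp::Waiter: holds at most one pending
        function for its server thread (m_fn in the C++ code)."""

        def __init__(self):
            self._lock = threading.Lock()
            self._fn = None  # corresponds to m_fn

        def post(self, fn):
            # Current behavior: abort the process if a call is already
            # pending on this thread, mirroring assert(!m_fn).
            with self._lock:
                assert self._fn is None, "m_fn already set"
                self._fn = fn

        def try_post(self, fn):
            # Proposed behavior (hypothetical name): report failure so
            # the caller in type-context.h could send a clear error
            # back to the client instead of crashing the server.
            with self._lock:
                if self._fn is not None:
                    return False
                self._fn = fn
                return True

        def run_pending(self):
            # Executed by the server thread when it becomes idle.
            with self._lock:
                fn, self._fn = self._fn, None
            if fn:
                fn()
    ```

    With try_post, a second simultaneous request on the same thread is rejected cleanly and can be retried once the thread is idle again.
    
    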
    Since this error is pretty easy to trigger from Python and Rust clients not using libmultiprocess, it would be better for the server to refuse these requests and return a clear error message instead of crashing. This should only require a small change to the Waiter class and the code calling it in type-context.h. Context: https://github.com/bitcoin/bitcoin/pull/33201#issuecomment-3258111873 and https://gnusha.org/bitcoin-core-dev/2025-09-05.log:

    04:50 < sipa> ryanofsky: any idea about this error in https://github.com/bitcoin/bitcoin/actions/runs/17491932432/job/49683489249?pr=33201 :
    04:51 < sipa>  node0 stderr bitcoin-node: ipc/libmultiprocess/include/mp/proxy-io.h:279: void mp::Waiter::post(Fn &&) [Fn = (lambda at ./ipc/libmultiprocess/include/mp/type-context.h:67:19)]: Assertion `!m_fn' failed. 
    04:55 < kevkevin> looks like the test got hung up on this line in the ipc test: template4 = File "/home/admin/actions-runner/_work/_temp/build/test/functional/interface_ipc.py", line 156, in async_routine: await template2.result.waitNext(ctx, waitoptions)
    04:56 < sipa> yeah, bitcoin-node crashed during that invocation
    05:01 < sipa> i can reproduce it locally, though it's pretty rare (once in 60 runs, so far)
    05:04 < kevkevin> hmm interesting, I'll try to reproduce it locally myself
    05:05 < ryanofsky> that is interesting, i've never seen that assert hit before. i think it could happen if two ipc clients try to execute different IPC calls on the same thread at the same time
    05:06 < sipa> huh!
    05:06 < ryanofsky> libmultiprocess assumes a 1:1 threading model where there is one server thread for every client thread, emulating a traditional c++ call stack
    05:06 < ryanofsky> allowing recursive mutexes, synchronous callbacks, etc
    05:06 < sipa> should the python side do anything specific to arrange that?
    05:07 < sipa> i would assume not, because there is only one thread on the python side that does all IPC calls
    05:07 < ryanofsky> it should just make two IPC calls at the same time, and they should be calls that do not return instantly like waitNext
    05:08 < ryanofsky> yeah, it is unexpected that current python test would cause that
    05:10 < ryanofsky> maybe there is a race condition where c++ code sends a response but the server thread doesn't return to being idle, and python code sends the next request quickly enough to hit the assert
    05:11 < sipa> oh, i may have messed something up
    05:11 < sipa> waitnext = template2.result.waitNext(ctx, waitoptions)
    05:11 < sipa> template4 = await template2.result.waitNext(ctx, waitoptions)
    05:11 < sipa> might create two asynchronous calls simultaneously?
    05:12 < ryanofsky> it seems like that would just chain the calls so the server runs them back to back, and make the bug more likely to happen
    05:13 < ryanofsky> but if you added an await on the first line it might prevent this
    05:13 < sipa> the first one is just unnecessary - i inlined it into the second one, but forgot to remove the original
    05:14 < sipa> you say "make the bug more likely", so you think that the bug isn't in this code itself, but elsewhere?
    05:14 < ryanofsky> actually i misread. yeah i assumed you were using waitnext variable second line and the calls would be chained. but actually this does look like simultaneous calls
    05:14 < sipa> alright, let me run it 1000 times to see if it reappears, but if not, i'm going to assume that was the culprit
    05:14 < ryanofsky> no i think this code does explain the bug
    05:15 < sipa> thanks!
    05:15 < ryanofsky> i just first misread it as chained calls that would be pipelined when actually those are parallel calls
    05:15 < sipa> right
    05:16 < ryanofsky> thanks for the test, and finding this. server could definitely do better here, it could just refuse the request instead of asserting
    05:18 < sipa> or at least a more helpful error message
    05:18 < sipa> i could imagine third parties trying to use the interface might hit something like this too
    05:20 < ryanofsky> yeah the error message is the most important thing in practice
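    The buggy client pattern sipa identifies above (an un-awaited waitNext left in flight alongside a second, awaited one) can be mimicked with a toy asyncio model. FakeServerThread and its methods are invented for illustration; no pycapnp or real RPC is involved:

    ```python
    import asyncio

    class FakeServerThread:
        """Mock of a single libmultiprocess server thread that can
        hold only one call at a time, like the Waiter's m_fn slot."""

        def __init__(self):
            self.in_flight = 0

        async def wait_next(self):
            # Simulate a long-running call like waitNext.
            self.in_flight += 1
            if self.in_flight > 1:
                # Roughly where assert(!m_fn) fires in the real server.
                self.in_flight -= 1
                raise RuntimeError("thread busy")
            await asyncio.sleep(0.01)
            self.in_flight -= 1
            return "block template"

    async def main():
        srv = FakeServerThread()
        # Sequential (awaited) calls emulate a C++ call stack: fine.
        await srv.wait_next()
        await srv.wait_next()
        # Two calls in flight on the same thread, like the un-awaited
        # waitNext in the test script: the second one is refused.
        return await asyncio.gather(srv.wait_next(), srv.wait_next(),
                                    return_exceptions=True)

    results = asyncio.run(main())
    ```

    The sequential calls succeed because each one returns before the next is posted; only the simultaneous pair trips the busy condition.
    
    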
    
  2. Eunovo commented at 10:14 AM on September 23, 2025: contributor

    I added a Python script to reproduce the error https://github.com/bitcoin-core/libmultiprocess/compare/master...Eunovo:libmultiprocess:test-ipc-with-py?expand=1

    Run it with:

    > mkdir build
    > cd build
    > cmake ..
    > make mptests
    > test/mp/test/test.py
    

    It fails with the following error because of the assertion:

    Traceback (most recent call last):
      File "/home/novo/code/libmultiprocess/build/test/mp/test/test.py", line 80, in <module>
        asyncio.run(capnp.run(main()))
      File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
        return runner.run(main)
               ^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
        return self._loop.run_until_complete(task)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
        return future.result()
               ^^^^^^^^^^^^^^^
      File "capnp/lib/capnp.pyx", line 2027, in run
      File "capnp/lib/capnp.pyx", line 2028, in capnp.lib.capnp.run
      File "/home/novo/code/libmultiprocess/build/test/mp/test/test.py", line 75, in main
        print(await foo.callback(ctx, callback, i+1))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    capnp.lib.capnp.KjException: capnp/rpc.c++:2778: disconnected: Peer disconnected.
    stack: 7a405f497a04 7a405f48e250 7a405f47dd00 7a405f31afa0 7a405f31d2a0
    
  3. ryanofsky referenced this in commit db03a663f5 on Oct 2, 2025
  4. ryanofsky commented at 2:52 PM on February 27, 2026: collaborator

    This crash was replaced with a "thread busy" exception by @Eunovo in #214, and the exception was replaced with request queueing in #240, so this can be closed.
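    The eventual queueing approach can be sketched in the same simplified model as above (an illustration of the idea only; the actual #240 implementation in C++ differs): instead of refusing a second simultaneous call, the server holds it until the thread is free.

    ```python
    import threading
    from collections import deque

    class QueueingWaiter:
        """Sketch of the queueing idea: the single m_fn slot becomes a
        FIFO queue, so simultaneous requests wait their turn instead of
        triggering assert(!m_fn) or a "thread busy" error."""

        def __init__(self):
            self._lock = threading.Lock()
            self._queue = deque()

        def post(self, fn):
            # Always accepted; ordering between the queued calls is
            # preserved per server thread.
            with self._lock:
                self._queue.append(fn)

        def run_one(self):
            # Server thread drains one queued call when idle; returns
            # False when there is nothing left to run.
            with self._lock:
                fn = self._queue.popleft() if self._queue else None
            if fn:
                fn()
            return fn is not None
    ```

    This keeps the 1:1 threading model intact from the client's point of view while tolerating clients that pipeline calls onto one Context.thread.
    
    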

  5. ryanofsky closed this on Feb 27, 2026


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/libmultiprocess. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-18 12:30 UTC
