type-context.h should return error instead of asserting on simultaneous IPC calls #206

issue ryanofsky openend this issue on September 5, 2025
  1. ryanofsky commented at 12:29 pm on September 5, 2025: collaborator

    Currently, when a client makes simultaneous IPC calls on a single server thread (assigning the same Context.thread value in the request), assert(!m_fn) triggers in Waiter::post.

    Since this error is pretty easy to trigger in python and rust cilents not using libmultiprocess, it would be better for the server to refuse these requests and provide a clear error message instead of crashing. This should just require a small change to the Waiter class and the code calling it in type-context.h. Context: https://github.com/bitcoin/bitcoin/pull/33201#issuecomment-3258111873 and https://gnusha.org/bitcoin-core-dev/2025-09-05.log:

     004:50 < sipa> ryanofsky: any idea about this error in https://github.com/bitcoin/bitcoin/actions/runs/17491932432/job/49683489249?pr=33201 :
     104:51 < sipa>  node0 stderr bitcoin-node: ipc/libmultiprocess/include/mp/proxy-io.h:279: void mp::Waiter::post(Fn &&) [Fn = (lambda at ./ipc/libmultiprocess/include/mp/type-context.h:67:19)]: Assertion `!m_fn' failed. 
     204:55 < kevkevin> looks like the test got hung up on this line in the ipc test: template4 = File "/home/admin/actions-runner/_work/_temp/build/test/functional/interface_ipc.py", line 156, in async_routine: await template2.result.waitNext(ctx, waitoptions)
     304:56 < sipa> yeah, bitcoin-node crashed during that invocation
     405:01 < sipa> i can reproduce it locally, though it's pretty rare (once in 60 runs, so far)
     505:04 < kevkevin> hmm interesting, I'll try to reproduce it locally myself
     605:05 < ryanofsky> that is interesting, i've never seen that assert hit before. i think it could happen if two ipc clients try to execute a different IPC calls on the same thread at the same time
     705:06 < sipa> huh!
     805:06 < ryanofsky> libmultiprocess assumes a 1:1 threading model where a there is one server thread for every client thread, emulating a traditional c++ call stack
     905:06 < ryanofsky> allowing recursive mutexes, synchronous callbacks, etc
    1005:06 < sipa> should the python side do anything specific to arrange that?
    1105:07 < sipa> i would assume not, because there is only one thread on the python side that does all IPC calls
    1205:07 < ryanofsky> it should just make two IPC calls at the same time, and they should be calls that do not return instantly like waitNext
    1305:08 < ryanofsky> yeah, it is unexpected that current python test would cause that
    1405:08 < dodo> c
    1505:09 -!- hacker4web3bitco [~hacker4we@user/hacker4web3bitco] has joined #bitcoin-core-dev
    1605:10 < ryanofsky> maybe there is a race condition where c++ code sends a response but the server thread doesn't return to being idle, and python code sends the next request quickly enough to hit the assert
    1705:10 -!- bitcoinlover [~hacker4we@user/hacker4web3bitco] has quit [Ping timeout: 248 seconds]
    1805:11 < sipa> oh, i may have messed something up
    1905:11 < sipa> waitnext = template2.result.waitNext(ctx, waitoptions)
    2005:11 < sipa> template4 = await template2.result.waitNext(ctx, waitoptions)
    2105:11 < sipa> might create two asynchronous calls simultaneously?
    2205:12 < ryanofsky> it seems like that would just chain the calls so the server runs them back to back, and make the bug more likely to happen
    2305:13 < ryanofsky> but if you added an await on the first line it might prevent this
    2405:13 < sipa> the first one is just unnecessary - i inlined it into the second one, but forgot to remove the original
    2505:14 < sipa> you say "make the bug more likely", so you think that the bug isn't in this code itself, but elsewhere?
    2605:14 < ryanofsky> actually i misread. yeah i assumed you were using waitnext variable second line and the calls would be chained. but actually this does look like simultaneous calls
    2705:14 < sipa> alright, let me run it 1000 times to see if it reappears, but if not, i'm going to assume that was the culprit
    2805:14 < ryanofsky> no i think this code does explain the bug
    2905:15 < sipa> thanks!
    3005:15 < ryanofsky> i just first misread it as chained calls that would be pipelined when actually those are parallel calls
    3105:15 < sipa> right
    3205:16 < ryanofsky> thanks for the test, and finding this. server could definitely do better here, it could just refuse the request instead of asserting
    3305:18 < sipa> or at least a more helpful error message
    3405:18 < sipa> i could imagine third parties trying to use the interface might hit something like this too
    3505:20 < ryanofsky> yeah the error message is the most important thing in practice
    
  2. Eunovo commented at 10:14 am on September 23, 2025: contributor

    I added a python script to reproduce the error https://github.com/bitcoin-core/libmultiprocess/compare/master...Eunovo:libmultiprocess:test-ipc-with-py?expand=1

    Run it with:

    0> mkdir build
    1> cd build
    2> cmake ..
    3> make mptests
    4> test/mp/test/test.py
    

    It fails with the following error because of the assertion:

     0Traceback (most recent call last):
     1  File "/home/novo/code/libmultiprocess/build/test/mp/test/test.py", line 80, in <module>
     2    asyncio.run(capnp.run(main()))
     3  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
     4    return runner.run(main)
     5           ^^^^^^^^^^^^^^^^
     6  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
     7    return self._loop.run_until_complete(task)
     8           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     9  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    10    return future.result()
    11           ^^^^^^^^^^^^^^^
    12  File "capnp/lib/capnp.pyx", line 2027, in run
    13  File "capnp/lib/capnp.pyx", line 2028, in capnp.lib.capnp.run
    14  File "/home/novo/code/libmultiprocess/build/test/mp/test/test.py", line 75, in main
    15    print(await foo.callback(ctx, callback, i+1))
    16          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    17capnp.lib.capnp.KjException: capnp/rpc.c++:2778: disconnected: Peer disconnected.
    18stack: 7a405f497a04 7a405f48e250 7a405f47dd00 7a405f31afa0 7a405f31d2a0
    
  3. ryanofsky referenced this in commit db03a663f5 on Oct 2, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/libmultiprocess. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-12-04 19:30 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me