Made SpawnProcess() behavior safe post fork() #237

pull Sjors wants to merge 4 commits into bitcoin-core:master from Sjors:2026/01/fork-safu changing 4 files +163 −15
  1. Sjors commented at 11:58 am on January 8, 2026: member

    Bitcoin Core functional tests for the echoipc RPC method were occasionally timing out, see https://github.com/bitcoin/bitcoin/issues/34187. A helpful LLM quickly figured out it was due to doing unsafe things after fork().

    https://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html

    Fix it by having SpawnProcess evaluate fd_to_args and build argv in the parent before fork().

    SpawnProcess now avoids ExecProcess. I intentionally did not fix the latter, not just to reduce churn, but also because it’s now only used by mpgen and it doesn’t seem worth losing the debug errors it (unsafely) throws.

    The last commit adds a test which reproduces the issue.

  2. refactor: extract MakeArgv helper
    The next commit will use it to create an alternative to ExecProcess that's safe to call post-fork.
    
    Added a note that ExecProcess is not safe to call post-fork. Since it's only used by mpgen at build time, this is fine, and lets of print useful errors.
    14e926a3ff
  3. DrahtBot commented at 11:58 am on January 8, 2026: none

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK ryanofsky

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #231 (Add windows support by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter (✨ experimental)

    Possible places where named args for integral literals may be used (e.g. func(x, /*named_arg=*/0) in C++, and func(x, named_arg=0) in Python):

    • [::waitpid(pid, &status, /options=/0)] in src/mp/util.cpp
    • [::waitpid(pid, &status, /options=/0)] in test/mp/test/spawn_tests.cpp

    2026-01-10

  4. Sjors force-pushed on Jan 8, 2026
  5. Sjors force-pushed on Jan 8, 2026
  6. Sjors commented at 12:13 pm on January 8, 2026: member

    Fixed and pre-empted a few other style nits.

    Hopefully fixed both CI build failures too (include-what-you-use).

  7. Sjors force-pushed on Jan 8, 2026
  8. Sjors commented at 5:43 am on January 9, 2026: member
    @DrahtBot code suggestion is stale, was fixed in the last push.
  9. Sjors force-pushed on Jan 9, 2026
  10. Sjors commented at 7:12 am on January 9, 2026: member
    Added code comments to explain how the test works, and to point out it could trigger false negatives (missing a regression), but won’t have spurious failures.
  11. Sjors force-pushed on Jan 9, 2026
  12. maflcko commented at 9:02 am on January 9, 2026: contributor

    @DrahtBot code suggestion is stale, was fixed in the last push.

    It wasn’t stale. The LLM with this input had a 10% chance of producing this slop response, and you hit it. :tada:

  13. Sjors commented at 11:58 am on January 9, 2026: member
  14. in src/mp/util.cpp:143 in 9d3c2b69c3
    140+    // descriptor for socket 1. On failure, the parent throws, but the child
    141+    // must _exit(1) (post-fork child must not throw).
    142     if (close(fds[pid ? 0 : 1]) != 0) {
    143-        throw std::system_error(errno, std::system_category(), "close");
    144+        if (pid) throw std::system_error(errno, std::system_category(), "close");
    145+        _exit(1);
    


    ryanofsky commented at 1:28 pm on January 9, 2026:

    In commit “Precompute argv before fork in SpawnProcess” (9d3c2b69c30841f4c686ecdf085db9ecca50f814)

    Getting rid of the throw here makes sense but it’d also be nice to have diagnostic and not just return 1. Would suggest something like:

    0    static constexpr char msg[] = "SpawnProcess(child): close(fds[1]) failed\n";
    1    (void)::write(STDERR_FILENO, msg, sizeof(msg) - 1);
    2    _exit(126);
    

    Sjors commented at 3:21 am on January 10, 2026:
    Done

    Sjors commented at 3:29 am on January 10, 2026:

    The default CI running doesn’t like the void trick.

    0/home/runner/work/libmultiprocess/libmultiprocess/src/mp/util.cpp:147:22: error: ignoring return value of ssize_t write(int, const void*, size_t) declared with attribute warn_unused_result [-Werror=unused-result]
    1  147 |         (void)::write(STDERR_FILENO, msg, sizeof(msg) - 1);
    

    Trying a workaround.


    ryanofsky commented at 12:17 pm on January 13, 2026:

    re: #237 (review)

    The default CI running doesn’t like the void trick.

    For future reference, it’s possible work around this by using (void)! instead of (void). The same issue happened previously in https://github.com/bitcoin/bitcoin/blob/3c8d389a84d29c7e5532548320228f3e8107969d/src/ipc/interfaces.cpp#L35-L36

  15. in src/mp/util.cpp:168 in 9d3c2b69c3 outdated
    156             }
    157         }
    158-        ExecProcess(fd_to_args(fds[0]));
    159+
    160+        execvp(argv[0], argv.data());
    161+        perror("execvp failed");
    


    ryanofsky commented at 1:52 pm on January 9, 2026:

    In commit “Precompute argv before fork in SpawnProcess” (9d3c2b69c30841f4c686ecdf085db9ecca50f814)

    Keeping perror is probaby the best choice for now even though strictly speaking it might not be safe and could hang. Would suggest adding a comment like “// NOTE: perror() is not async-signal-safe; calling it here in a post-fork child may still deadlock in multithreaded parents.”

    Note for future todo: It would be nice to improve this later by adding an error-reporting pipe, writing errno to it so parent process can provide a better diagnostic to code calling it. This idea came up previously in #213 (review) / 286fe469c9c96d99bd3340b22f1f2e7314ef9993


    Sjors commented at 3:21 am on January 10, 2026:
    Added a note and TODO
  16. in src/mp/util.cpp:142 in 9d3c2b69c3
    139+    // Parent process closes the descriptor for socket 0, child closes the
    140+    // descriptor for socket 1. On failure, the parent throws, but the child
    141+    // must _exit(1) (post-fork child must not throw).
    142     if (close(fds[pid ? 0 : 1]) != 0) {
    143-        throw std::system_error(errno, std::system_category(), "close");
    144+        if (pid) throw std::system_error(errno, std::system_category(), "close");
    


    ryanofsky commented at 2:01 pm on January 9, 2026:

    In commit “Precompute argv before fork in SpawnProcess” (9d3c2b69c30841f4c686ecdf085db9ecca50f814)

    Note: It looks like there is a pre-existing bug here where if close(fds[0]) fails in the parent fd[1] will not be closed either and could be leaked. Maybe doesn’t matter since if one close fails, not sure what chances of other one succeeding could be, but this might be good to address in followup.


    Sjors commented at 3:21 am on January 10, 2026:
    Fixed in a separate commit.

    ryanofsky commented at 12:19 pm on January 13, 2026:

    re: #237 (review)

    Fixed in a separate commit.

    Thanks for the fix!

  17. in src/mp/util.cpp:158 in 9d3c2b69c3
    157         }
    158-        ExecProcess(fd_to_args(fds[0]));
    159+
    160+        execvp(argv[0], argv.data());
    161+        perror("execvp failed");
    162+        _exit(1);
    


    ryanofsky commented at 2:05 pm on January 9, 2026:

    In commit “Precompute argv before fork in SpawnProcess” (9d3c2b69c30841f4c686ecdf085db9ecca50f814)

    Would suggest replacing 1 with 127, since this is more distinctive and is also what bash returns in $? when a command fails to execute


    Sjors commented at 3:21 am on January 10, 2026:
    Done
  18. ryanofsky approved
  19. ryanofsky commented at 2:20 pm on January 9, 2026: collaborator

    Code review ACK 0baf75c861e8be081143a162634c30ecbfe794c3. I wouldn’t have guessed that this could cause https://github.com/bitcoin/bitcoin/issues/34187, but it makes sense and could easily explain the echoipc child process randomly hanging.

    I’m a little doubtful about how valuable the test in third commit is since it is long and only catches fd_to_args being called incorrectly in the child process, not MakeArgv, or future bad code that could be added there. But the test is pretty clean, and could be dropped if it ever becomes a maintenance burden.

    Left some suggestions below, but happy to merge this as-is if preferred

  20. SpawnProcess: avoid fd leak on close failure
    Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
    30a8681de6
  21. Sjors force-pushed on Jan 10, 2026
  22. Sjors commented at 3:21 am on January 10, 2026: member

    I’m a little doubtful about how valuable the test in third commit is since it is long and only catches fd_to_args being called incorrectly in the child process

    The test is mainly useful to confirm the fix. I agree it wouldn’t catch all possible regressions, but it could be adjusted if one sneaks in. Or indeed just dropped if it causes problems.

    I did find it useful to wrap my head around post-fork issues, which we’ve run into before e.g. in https://github.com/bitcoin/bitcoin/pull/33063.


    Investigating CI failures…

  23. Sjors force-pushed on Jan 10, 2026
  24. Precompute argv before fork in SpawnProcess
    Evaluate fd_to_args and build argv in the parent before fork().
    
    In the child between fork() and execvp(), avoid throwing/allocating
    and use _exit(1) on error.
    
    This likely fixes the occasional deadlock in Bitcoin Core functional
    test, noticed in bitcoin/bitcoin#34187.
    
    Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
    69652f0edf
  25. test: check SpawnProcess post-fork safety
    This test fails without the previous commit.
    5205a87cd9
  26. Sjors force-pushed on Jan 10, 2026
  27. ryanofsky approved
  28. ryanofsky commented at 12:21 pm on January 13, 2026: collaborator
    Code review ACK 5205a87cd90e23f5aa2ae8503213dc90bd557bfd, just tweaking error codes and output since last review and fixing potential (pre-existing) descriptor leak on failure bug.
  29. ryanofsky merged this on Jan 13, 2026
  30. ryanofsky closed this on Jan 13, 2026

  31. Sjors deleted the branch on Jan 13, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/libmultiprocess. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-01-27 08:30 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me