subprocess: Let shell parse command on non-Windows systems #32577

pull hebasto wants to merge 2 commits into bitcoin:master from hebasto:250521-subprocess-split changing 5 files +57 −11
  1. hebasto commented at 11:45 am on May 21, 2025: member

    Fixes #32574.

    The subprocess::Popen constructor has two overloads: https://github.com/bitcoin/bitcoin/blob/7763e86afa045910a14ac9b2cd644927a447370b/src/util/subprocess.h#L938-L941

    During the migration from Boost.Process in #28981, the second was chosen for two reasons: (1) it minimized changes at the call sites, and (2) it addressed quoting issues on Windows. This approach internally uses the subprocess::util::split() function, which currently fails to handle quoted tokens, such as in subshell invocations like sh -c 'ls; echo'.

    This issue was not caught by our tests as they are broken. The last commit fixes the system_tests/run_command test. This commit can be applied on top of the master branch to reproduce the existing failure in subprocess.

    The second commit is intended to be upstreamed. However, I’m seeking code review feedback first (please note that the src/util/subprocess.h code is C++11 compatible).

  2. DrahtBot commented at 11:45 am on May 21, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32577.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32601 (test: Fix system_tests/run_command test by hebasto)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • Vector of strings which needs to be joined -> Vector of strings which need to be joined [Subject-verb agreement error: “strings” is plural, so “need” is the correct verb]

    drahtbot_id_0520

  3. laanwj commented at 12:00 pm on May 21, 2025: member

    It’s ambiguous how to split arguments. The UNIX shell does it in one way (which is pretty nice and supports quoting), Windows shell does it another way. In #32566 we want the OS shell’s behavior, because the notifications have historically invoked the shell.

    However, for RunCommandParseJSON (as changed here) we don’t use a shell, so have to implement our own splitting if we want that. And i guess emulating a UNIX shell makes more sense, as it’s more versatile. But there is no single “proper” way, just whatever we decide to implement (and should document).

    Alternatively, can’t we just pass a vector of arguments to RunCommandParseJSON and avoid splitting/merging? Hmm maybe not as it comes in through one argument string -externalsigner anyhow, there’s just some additional arguments that we add. OK this is annoying.

  4. in src/util/subprocess.h:325 in 7a871febe6 outdated
    328     while (true) {
    329-      auto pos = str.find_first_of(delims, init);
    330-      if (pos == std::string::npos) {
    331-        res.emplace_back(str.substr(init, str.length()));
    332-        break;
    333+      if (str[init] == quote) {
    


    laanwj commented at 12:08 pm on May 21, 2025:
    How would you represent a (loose) quote in this case?

    hebasto commented at 2:54 pm on May 21, 2025:
    What would you suggest as a test case?

    hebasto commented at 3:09 pm on May 22, 2025:
    No longer relevant.
  5. hebasto commented at 12:12 pm on May 21, 2025: member

    Alternatively, can’t we just pass a vector of arguments to RunCommandParseJSON and avoid splitting/merging?

    This approach caused quoting-related issues on Windows, though I haven’t tested it with the current master branch.

  6. laanwj commented at 12:17 pm on May 21, 2025: member

    This approach #28981 (comment) quoting-related issues on Windows, though I haven’t tested it with the current master branch.

    The thing on Windows is that CreateProcessW takes a single command line, instead of a list of arguments. So a “join” is unavoidable. It does do quoting on windows while joining this command line though-i don’t know enough to know if it’s correct: https://github.com/arun11299/cpp-subprocess/blob/master/cpp-subprocess/subprocess.hpp#L186. It does this independently of whether the command line was passed as a single string of a vector, the former just goes through split too, so i don’t see how it causes that issue.

  7. laanwj added the label Utils/log/libs on May 21, 2025
  8. hebasto commented at 2:20 pm on May 21, 2025: member

    Alternatively, can’t we just pass a vector of arguments to RunCommandParseJSON and avoid splitting/merging? Hmm maybe not as it comes in through one argument string -externalsigner anyhow, there’s just some additional arguments that we add. OK this is annoying.

    Exactly. We have to parse and tokenize the -signer argument in cases like bitcoind -signer="/usr/bin/python3 /some_path/signer.py".

  9. ryanofsky commented at 3:01 pm on May 21, 2025: contributor

    I’d think for both windows and unix, a good approach could be not parse -signer strings, just leave it up to the shell on unix, and shell+appilcation itself on windows, because on windows the application runtime is responsible for splitting arguments, usually using CommandLineToArgv. This was also discussed pretty extensively in #13339.

    If we can avoid to have a split function at all that would seem great.

  10. hebasto commented at 3:14 pm on May 21, 2025: member

    If we can avoid to have a split function at all that would seem great.

    We can’t avoid splitting when calling execvp:https://github.com/bitcoin/bitcoin/blob/87ec923d3a7af7b30613174b41c6fb11671df466/src/util/subprocess.h#L1382

  11. ryanofsky commented at 4:27 pm on May 21, 2025: contributor

    We can’t avoid splitting when calling execvp:

    The suggestion is to let the shell do the splitting so we do not need to do it. We would call execvp with sh as the first command line argument -c as the second one, and the -signer value as the third one. This is also what the python subprocess module does when shell=True is passed.

  12. hebasto force-pushed on May 21, 2025
  13. hebasto commented at 9:29 pm on May 21, 2025: member

    @laanwj @ryanofsky

    Thank you for your feedback!

    I’ve restored the shell option in the subprocess API, and employed it on non-Windows systems.

  14. ryanofsky commented at 10:56 pm on May 21, 2025: contributor

    I’ve restored the shell option in the subprocess API, and employed it on non-Windows systems.

    Hmm the shell support implemented in eb160602a50bebcca3f53cdae741e1732738d51a actually seems somewhat broken as it is still splitting and joining instead of preserving the original string passed to Popen() instead of just passing it unaltered to execvp as I was suggesting.

    It actually might not work too badly in that commit as long as the original string doesn’t contain any tab characters, because otherwise the split and join functions are inverses of each other. But the later commit c82420ba84b754e4fa8373aee08ec45b9e524512 seems like a mistake and I don’t see how adding double quotes there could be good.

    It could help to know more about the bug this is supposed to fix as I don’t think I understand it.

  15. fanquake commented at 8:36 am on May 22, 2025: member

    (please note that the src/util/subprocess.h code is C++11 compatible).

    It seems a bit odd that we’ve imported a dependency that is pinned to an older C++ version than what we want to use, especially when we are trying to upstream changes to it. This being a nuisance has come up multiple times already (also here: #32343 (comment)). What is the longer term plan here?

  16. hebasto commented at 10:35 am on May 22, 2025: member

    What is the longer term plan here?

    The initial plan was to maintain our own code independently of the upstream repository. However, there have been a few requests to upstream our changes. At this point, I still believe we shouldn’t concern ourselves with upstream and should instead take full advantage of C++20.

  17. hebasto force-pushed on May 22, 2025
  18. hebasto commented at 2:14 pm on May 22, 2025: member

    I’ve restored the shell option in the subprocess API, and employed it on non-Windows systems.

    Hmm the shell support implemented in eb16060 actually seems somewhat broken as it is still splitting and joining instead of preserving the original string passed to Popen() instead of just passing it unaltered to execvp as I was suggesting.

    Thanks! Implemented as you suggested.

    (no further cleanups in subprocess.h for now)

  19. in src/test/system_tests.cpp:38 in 783db05c45 outdated
    34         BOOST_CHECK(result.isObject());
    35         const UniValue& success = result.find_value("success");
    36         BOOST_CHECK(!success.isNull());
    37         BOOST_CHECK_EQUAL(success.get_bool(), true);
    38     }
    39+#ifdef WIN32
    


    fanquake commented at 3:13 pm on May 22, 2025:
    Why is this test being scoped to only Windows?

    hebasto commented at 3:17 pm on May 22, 2025:
    On other platforms, execl() might fail only if there is problem with /bin/sh. I can’t see any trivial way to adjust the test to trigger a failure. Any suggestions?
  20. ryanofsky commented at 7:47 pm on May 22, 2025: contributor

    Looking at 4837fdc1569d58541fb310251b969ae160907e6e:

    • The change to subprocess.h seems like an improvement. Seems to make more sense to construct command line strings and pass them to the shell to parse, than construct command line strings and parse the strings we constructed ourselves.
    • It would probably be good if the new behavior were controlled by a shell boolean option like python & the upstream library
    • Windows seems to be doing joining, splitting, and the joining again, but that is unchanged in this PR.
    • It’s not clear to me what the problem in #32574 on illumos or actually is or how this change fixes it.
    • I don’t think the added quotes in external_signer.cpp are doing anything, or it isn’t clear what they are trying to do. They at least do not seem to be related to the illumos bug so maybe could be dropped here?

    On the c++11 issue, I think maybe just compiling the subprocess unit test in c++11 would be a good way to ensure compatibility. It seems cmake has a CXX_STANDARD per-target property. I would be surprised if there there were benefits to the subprocess library actually using newer c++ features.

  21. hebasto renamed this:
    subprocess: Handle quoted tokens properly
    subprocess: Let shell parse command on non-Windows systems
    on May 23, 2025
  22. hebasto commented at 6:13 am on May 23, 2025: member
    • It’s not clear to me what the problem in #32574 on illumos or actually is or how this change fixes it.

    I’ve updated #32574 with more details.

    On the c++11 issue, I think maybe just compiling the subprocess unit test in c++11 would be a good way to ensure compatibility. It seems cmake has a CXX_STANDARD per-target property. I would be surprised if there there were benefits to the subprocess library actually using newer c++ features.

    Please see #32343 (comment).

  23. hebasto commented at 12:51 pm on May 23, 2025: member
    Closing in favour of #32601.
  24. hebasto closed this on May 23, 2025

  25. in src/external_signer.cpp:65 in 783db05c45 outdated
    61@@ -62,12 +62,12 @@ bool ExternalSigner::Enumerate(const std::string& command, std::vector<ExternalS
    62 
    63 UniValue ExternalSigner::DisplayAddress(const std::string& descriptor) const
    64 {
    65-    return RunCommandParseJSON(m_command + " --fingerprint " + m_fingerprint + NetworkArg() + " displayaddress --desc " + descriptor);
    66+    return RunCommandParseJSON(m_command + " --fingerprint \"" + m_fingerprint + "\"" + NetworkArg() + " displayaddress --desc \"" + descriptor + "\"");
    


    hebasto commented at 12:55 pm on May 23, 2025:

    #32577 (comment):

    • I don’t think the added quotes in external_signer.cpp are doing anything, or it isn’t clear what they are trying to do.

    Descriptors contain the ( and ) characters, which may cause an issue with the shell:

    0$ echo pk(0279be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798)
    1bash: syntax error near unexpected token `('
    

    ryanofsky commented at 7:45 pm on May 23, 2025:

    #32577 (comment):

    • I don’t think the added quotes in external_signer.cpp are doing anything, or it isn’t clear what they are trying to do.

    Descriptors contain the ( and ) characters, which may cause an issue with the shell:

    0$ echo pk(0279be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798)
    1bash: syntax error near unexpected token `('
    

    I see. I think this will probably work pretty well on unix systems and will even work on windows systems as long as descriptor strings do not contain any whitespace because subprocess quote_argument function will currently leave arguments not containing whitespace alone.

    So this is probably fine if we are looking for a good-enough solution. But there still may be problems though if the descriptor strings contained any $ characters on unix (because these will still be interpreted by the shell) or whitespace characters on windows (because these would cause quote_argument to double the quotes), or if the strings contained backslashes on either platform.

    Also it looks like the subprocess quote_argument function is buggy, because any argument that begins and ends with quotes and doesn’t have spaces will incorrectly have its quotes stripped by the windows program that is executed. If that bug is ever fixed, we would have a bug here on windows.

    This is all very messy. So despite my earlier suggestion to use the shell to avoid needing to split ourselves #32577 (comment), I think the most robust solution would actually be to avoid needing to split these command line strings at all, and it would be best to change signature of RunCommandParseJSON to take a vector<string> argument instead of a std::string to avoid the need for any splitting.


    ryanofsky commented at 8:10 pm on May 23, 2025:

    it would be best to change signature of RunCommandParseJSON to take a vector argument instead of a std::string to avoid the need for any splitting

    Another advantage of this approach is that we should be able to avoid needing to change subprocess library at all, all the changes could be internal

  26. hebasto commented at 3:33 pm on May 23, 2025: member
    • It would probably be good if the new behavior were controlled by a shell boolean option like python & the upstream library

    Reverting 633e45b2e2728efcb0637afa94fcbd5756dfbe76 is still an option on the table. #32566 may also benefit from it.

    The issues mentioned in #32577 (comment) might be addressed by switching to an alternative subprocess::Popen constructor, as done in #32566.

  27. Revert "remove unneeded shell option from cpp-subprocess"
    This reverts commit 633e45b2e2728efcb0637afa94fcbd5756dfbe76.
    1df30b917a
  28. Run subprocess in shell on non-Windows systems 48a05564d5
  29. hebasto reopened this on May 23, 2025

  30. hebasto force-pushed on May 23, 2025
  31. hebasto commented at 5:21 pm on May 23, 2025: member
    Reopened per request.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-05-25 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me