common: Close non-std fds before exec in RunCommandJSON #32343

pull laanwj wants to merge 3 commits into bitcoin:master from laanwj:2025-04-closefds changing 2 files +59 −1
  1. laanwj commented at 7:01 pm on April 24, 2025: member

    Picks up stale #30756, while addressing my fallback comment (https://github.com/bitcoin/bitcoin/pull/30756#discussion_r2030844440).

    Currently, RunCommandParseJSON runs its target with whatever fds happen to be open inherited on POSIX platforms. I don’t think there’s any practical scenario where this is a problem right now, but there’s a lot of potential for weird problems (eg, if a process manages to outlive bitcoind - perhaps it’s hanging - the listening port(s) won’t get released and starting bitcoind again will fail). It’s also a potential security issue if a child process is intended to be sandboxed at some point. Not to mention plain ugly :)

    cpp-subprocess has a feature to address this called close_fds. Not sure why it was removed in #29961 rather than fixing this during the migration, but this PR restores it, enables it for RunCommandParseJSON, and optimises it by iterating over /proc/self/fd/ like most other libraries do these days ([eg, glib]> (https://gitlab.gnome.org/GNOME/glib/blob/487b1fd20c5e494366a82ddc0fa6b53b8bd779ad/glib/gspawn.c#L1094)) since iterating all possible fd numbers has been found to be problematic.

    (Equivalent to #22417 was for boost::process)

  2. Revert "remove unneeded close_fds option from cpp-subprocess"
    This reverts commit 79c30363733503a1fb7d4c98aa0d56ced0be6e32.
    5f8bfa09e7
  3. laanwj added the label Utils/log/libs on Apr 24, 2025
  4. DrahtBot commented at 7:01 pm on April 24, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32343.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK hebasto

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  5. laanwj force-pushed on Apr 24, 2025
  6. DrahtBot added the label CI failed on Apr 24, 2025
  7. DrahtBot commented at 7:07 pm on April 24, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/41111890297

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  8. laanwj force-pushed on Apr 24, 2025
  9. laanwj commented at 7:54 pm on April 24, 2025: member

    This causes the signer test to fail (timeout) in the CentOS run, interesting, cannot reproduce it locally

    0[15:47:25.389] 
    1wallet_signer.py --descriptors                                       | ✖ Failed  | 1205 s
    
    0[15:47:25.389] 
    1                                   test_framework.authproxy.JSONRPCException: 'walletdisplayaddress' RPC took longer than 1200.000000 seconds. Consider using larger timeout for calls that take longer to return. (-344)
    

    i suspect this is actually a case where /proc/self/fd isn’t accessible so it gets to the fallback, which is so slow on this configuration that it times out. Ugh. Could go back to silently swallowing the exception, but also i’m not sure having this behavior conditionally is that useful at all (a security/robustness feature shouldn’t be so easy to accidentally disable).

    Apparently, doing this in a cross-platform is extremely difficult, mind that /proc/sys/fd is specific to Linux, and some platforms (like MacOS) don’t have a way at all: https://stackoverflow.com/a/56650580 . That leaves the brute-force option which can, as we see, be super slow.

  10. laanwj commented at 12:10 pm on April 25, 2025: member

    TIL Linux has close_range(first, last, flags) (borrowed from FreeBSD) which would be ideal for this, as it closes an entire range of file descriptors in one syscall. But unfortunately it was introduced in glibc 2.34, and we require only 2.31.

    Edit: i checked what actually leaks into a subprocess (using notify, not a signer, because it’s easier, but it’s going to be the same):

    0rm -f /tmp/open-files && ./launch-bitcoin-mainnet.sh -startupnotify="ls -al /proc/self/fd >> /tmp/open-files"
    
     0$ cat /tmp/open-files
     1total 0
     2dr-x------ 2 user user  0 Apr 25 15:15 .
     3dr-xr-xr-x 9 user user  0 Apr 25 15:15 ..
     4lrwx------ 1 user user 64 Apr 25 15:15 0 -> /dev/null  [EXPECTED]
     5l-wx------ 1 user user 64 Apr 25 15:15 1 -> /tmp/open-files [EXPECTED]
     6lrwx------ 1 user user 64 Apr 25 15:15 2 -> /dev/null  [EXPECTED]
     7lrwx------ 1 user user 64 Apr 25 15:15 25 -> socket:[8892399]
     8lrwx------ 1 user user 64 Apr 25 15:15 26 -> socket:[8892400]
     9lr-x------ 1 user user 64 Apr 25 15:15 27 -> /data/bitcoin/mempool.dat
    10lrwx------ 1 user user 64 Apr 25 15:15 28 -> socket:[8899630]
    11lr-x------ 1 user user 64 Apr 25 15:15 29 -> /data/bitcoin/onion_v3_private_key
    12lr-x------ 1 user user 64 Apr 25 15:15 3 -> /proc/1261821/fd [EXPECTED]
    13lrwx------ 1 user user 64 Apr 25 15:15 5 -> /data/bitcoin/.lock
    14lrwx------ 1 user user 64 Apr 25 15:15 6 -> /data/bitcoin/blocks/.lock
    15l-wx------ 1 user user 64 Apr 25 15:15 7 -> /data/bitcoin/debug.log
    

    Only the ones with [EXPECTED] should be there. Leaking the lock files is bad, because a child process that outlives bitcoind will hold on to them. Leaking a file descriptor to the onion key (but this is a race with -startupnotify, it isn’t persistently open) and debug log is potentially an info leak.

    The three sockets are from the bitcoin P2P code:

    0bitcoind  1261781                               user   25u     IPv4            8892399       0t0        TCP vm07.lan:18888 (LISTEN)
    1bitcoind  1261781                               user   26u     IPv4            8892400       0t0        TCP localhost:8334 (LISTEN)
    2bitcoind  1261781                               user   28u     IPv4            8899630       0t0        TCP localhost:58468->localhost:9051 (ESTABLISHED)
    
  11. vasild commented at 1:37 pm on April 25, 2025: contributor

    /proc/self/fd

    I would try something like this to check if /proc/self/fd is available:

     0--- i/ci/test/03_test_script.sh
     1+++ w/ci/test/03_test_script.sh
     2@@ -145,12 +145,17 @@ if [ "$RUN_UNIT_TESTS" = "true" ]; then
     3 fi
     4 
     5 if [ "$RUN_UNIT_TESTS_SEQUENTIAL" = "true" ]; then
     6   DIR_UNIT_TEST_DATA="${DIR_UNIT_TEST_DATA}" LD_LIBRARY_PATH="${DEPENDS_DIR}/${HOST}/lib" "${BASE_OUTDIR}"/bin/test_bi
     7 fi
     8
     9+ls -l /proc || :
    10+ls -l /proc/self || :
    11+ls -l /proc/self/fd || :
    12+mount || :
    13+
    14 if [ "$RUN_FUNCTIONAL_TESTS" = "true" ]; then
    15   # parses TEST_RUNNER_EXTRA as an array which allows for multiple arguments such as TEST_RUNNER_EXTRA='--exclude "rpc
    16   eval "TEST_RUNNER_EXTRA=($TEST_RUNNER_EXTRA)"
    17   LD_LIBRARY_PATH="${DEPENDS_DIR}/${HOST}/lib" test/functional/test_runner.py --ci "${MAKEJOBS}" --tmpdirprefix "${BAS
    18 fi
    
  12. laanwj force-pushed on Apr 25, 2025
  13. laanwj force-pushed on Apr 25, 2025
  14. laanwj commented at 2:34 pm on April 25, 2025: member

    I would try something like this to check if /proc/self/fd is available:

    Thanks! Tried that, looks like it is:

     0[09:58:29.841] === BEGIN mount info ===
     1...
     2 + ls -al /proc/self/fd
     3[09:58:29.860] total 0
     4[09:58:29.860] dr-x------ 2 root root  4 Apr 25 09:58 .
     5[09:58:29.860] dr-xr-xr-x 9 root root  0 Apr 25 09:58 ..
     6[09:58:29.860] lr-x------ 1 root root 64 Apr 25 09:58 0 -> /dev/null
     7[09:58:29.860] l-wx------ 1 root root 64 Apr 25 09:58 1 -> pipe:[887125611]
     8[09:58:29.860] l-wx------ 1 root root 64 Apr 25 09:58 2 -> pipe:[887125612]
     9[09:58:29.860] lr-x------ 1 root root 64 Apr 25 09:58 3 -> /proc/69/fd
    10...
    11[09:58:29.872] === END mount info ===
    

    Not just that, but confirmed with some other additional logging in the unit tests that it’s actually closing fd’s through the directory iterator loop:

    0[10:02:47.377] 2025-04-25T14:02:47.353738Z [test] [common/run_command.cpp:33] [RunCommandParseJSON] Error output: pre fs::directory_iterator
    1[10:02:47.377] closing: 3
    2[10:02:47.377] closing: 4
    3[10:02:47.377] closing: 5
    4[10:02:47.377] closing: 6
    5[10:02:47.377] closing: 7
    6[10:02:47.377] post fs::directory_iterator
    

    So it doesn’t look like that is the problem.

  15. DrahtBot removed the label CI failed on Apr 25, 2025
  16. hebasto commented at 10:54 pm on April 26, 2025: member
    Concept ACK.
  17. maflcko commented at 8:53 am on April 27, 2025: member
    The error seems to be intermittent, because now it looks like the centos task passed. (For reference, the failure three days ago was https://github.com/bitcoin/bitcoin/runs/41112451543)
  18. laanwj force-pushed on Apr 27, 2025
  19. DrahtBot added the label CI failed on Apr 27, 2025
  20. maflcko commented at 1:56 pm on April 27, 2025: member
    Re-running the CentOS task shows that 1 in 5 runs fail
  21. laanwj force-pushed on Apr 28, 2025
  22. laanwj commented at 6:37 am on April 28, 2025: member

    i replaced the fallback by an explicit OSError (this should be propagated to the parent through the error pipe), maybe that will help diagnose the issue.

    (failure on MacOS is expected – there is no /proc)

    Edit: MacOS should pass now. The Linux code makes the hard assumption /proc/<pid>/fd is available, for other operating systems always use the fallback.

  23. DrahtBot removed the label CI failed on Apr 28, 2025
  24. laanwj force-pushed on Apr 28, 2025
  25. laanwj force-pushed on Apr 28, 2025
  26. maflcko commented at 11:24 am on April 28, 2025: member
    Scheduled 10 runs in https://cirrus-ci.com/task/5397894338969600, but it seems they are all passing. Huh
  27. laanwj commented at 11:27 am on April 28, 2025: member

    Scheduled 10 runs in https://cirrus-ci.com/task/5397894338969600, but it seems they are all passing. Huh

    It could be 4a6b99b3172fe007328bd260968acdb0dd78c569, though i have no idea why that would make a difference. In any case, nice that it passes now.

    Cleaning up and squashing the relevant commits.

  28. laanwj force-pushed on Apr 28, 2025
  29. in src/util/subprocess.h:1316 in 3acd934311 outdated
    1311+        // For Linux, enumerate /proc/<pid>/fd.
    1312+        try {
    1313+          std::vector<int> fds_to_close;
    1314+          for (const auto& it : fs::directory_iterator(strprintf("/proc/%d/fd", getpid()))) {
    1315+            int64_t fd;
    1316+            if (!ParseInt64(it.path().filename().native(), &fd)) continue;
    


    maflcko commented at 12:22 pm on April 28, 2025:
    style nit: fd’s probably can’t be negative nor include + as prefix? So ToIntegral<uint64_t> may be a better fit.

    laanwj commented at 4:57 pm on April 28, 2025:
    Thanks, yes, that would be betterr.
  30. laanwj force-pushed on Apr 28, 2025
  31. cpp-subprocess: Iterate through /proc/self/fd for close_fds option on Linux
    As an optimization, iterate through /proc/<pid>/fd on Linux.
    
    Co-authored-by: laanwj <126646+laanwj@users.noreply.github.com>
    269c2a3dc5
  32. run_command: Enable close_fds option to avoid lingering fds bdf634918c
  33. laanwj force-pushed on Apr 28, 2025
  34. laanwj commented at 5:33 pm on April 28, 2025: member

    3acd934311477ac88e1c3176aaeaec2b3ad35425 → bdf634918c32d3c0da2c2262252470d1755bb085

    • Use ToIntegral instead of ParseInt

    Edit: had to restart “previous releases, depends DEBUG” run, failure was some docker network problem not related to this PR


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-04-29 18:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me