init: Fixes for file descriptor accounting #16003

pull tryphe wants to merge 4 commits into bitcoin:master from tryphe:fd-limits-3 changing 2 files +15 −14
  1. tryphe commented at 8:28 am on May 10, 2019: contributor

    Bug 1: If the daemon does not have enough file descriptors (we’ll call them FDs), it is not asserted. The daemon only aborts if FD count < MIN_CORE_FILEDESCRIPTORS(150).

    Bug 2: (Unsure how to reproduce undefined behavior) If the system ulimit setting is a low number, the daemon does not allocate the right amount of FDs during init via RaiseFileDescriptorLimit. But it’s close.

    Steps to produce Bug 1: Run ulimit -n 150, then start bitcoind in the same shell.

    Result: bitcoind runs normally and displays an erroneous warning:

    0Warning: Reducing -maxconnections from 125 to -8, because of system limitations.
    1...
    2Using at most -8 automatic connections (150 file descriptors available)
    

    Expected: bitcoind should not start, as it needs roughly 163~172~(?) FDs.

    -MIN_CORE_FILEDESCRIPTORS = 150 -MAX_ADDNODE_CONNECTIONS = 8 (new in assertion) ~-MAX_OUTBOUND_CONNECTIONS = 8~ ~-CConnman::Options:nMaxFeeler = 1~(Edit: these exist within the bounds of -maxconnections, if it’s high enough) -Number of -bind interfaces, default = 1 (new in allocation) -Number of -rpcthreads, default = 4 (new in allocation and assertion)

    150 + 8 + 1 + 4 = 163 minimum +125 maxconnections = 288 speculative maximum

    Coverage with this patch + ulimit -n 150:

    0Bitcoin Core version v0.18.99.0-50ccaa56f -dirty (release build)
    1Error: Not enough file descriptors available. 150 available, 163 required.
    

    daemon closes

    Coverage with this patch + ulimit -n 163:

    0Bitcoin Core version v0.18.99.0-50ccaa56f (release build)
    1There are 163 file descriptors available, 163 required, 163 reserved, and 288 requested.
    2Warning: Reducing -maxconnections from 125 to 0, because of file descriptor limitations.
    3...
    4Using at most 0 automatic connections (163 file descriptors available)
    

    Coverage with this patch + ulimit -n 203:

    0Bitcoin Core version v0.18.99.0-50ccaa56f (release build)
    1There are 203 file descriptors available, 163 required, 203 reserved, and 288 requested.
    2Warning: Reducing -maxconnections from 125 to 40, because of file descriptor limitations.
    3...
    4Using at most 40 automatic connections (203 file descriptors available)
    

    Coverage with this patch + 1024 ulimit (matches most Linux systems)

    0Bitcoin Core version v0.18.99.0-50ccaa56f (release build)
    1There are 1024 file descriptors available, 163 required, 288 reserved, and 288 requested.
    2...
    3Using at most 125 automatic connections (1024 file descriptors available)
    

    The replaced (old) arithmetic smells strange if you consider the subtraction can subtract negative values to increase their value, and any unintended behavior will then be silenced by std::max(x,0). Now there should be less likelyhood of suppressed failure.

    Possible fix for #14870

    Edit: thanks to IRC chat for the tips!

  2. tryphe renamed this:
    [init] an incorrect amount of file descriptors is requested, and a different amount is also asserted
    init: an incorrect amount of file descriptors is requested, and a different amount is also asserted
    on May 10, 2019
  3. DrahtBot added the label P2P on May 10, 2019
  4. DrahtBot commented at 10:00 am on May 10, 2019: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #16362 (Add bilingual_str type by hebasto)
    • #15759 ([p2p] Add 2 outbound blocks-only connections by sdaftuar)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. tryphe force-pushed on May 10, 2019
  6. tryphe force-pushed on May 10, 2019
  7. [init] reserve file descriptors for bind() and accept(), and fix the
    amount of
    descriptors requested
    
    [init] initialize more file descriptors based on number of bind
    interfaces, MAX_OUTBOUND_CONNECTIONS, and
    CConnman::Options:nMaxFeeler
    
    [init] add verbose output for FD limits, cleanup lines.
    
    [init] cleanup LogPrintF to print FDs (oops), use better variable
    semantics, include MAX_ADDNODE_CONNECTIONS in nUserMinConnections
    
    [init] addnode connections don't count towards maxconnections
    
    [init] correct soft limit for MAX_OUTBOUND_CONNECTIONS
    
    [init] connections are softcapped at 0
    
    [init] fix up descriptor commits, remove unused variables, add verbosity
    
    [init] fix up strings slightly
    
    [init] small refactor for better verbosity
    20e6e0f898
  8. tryphe force-pushed on May 10, 2019
  9. jonatack commented at 10:45 am on May 15, 2019: member

    Reference IRC discussion, IIUC: http://www.erisian.com.au/bitcoin-core-dev/log-2019-05-10.html#l-127

    Tests that show the expected behavior and verify the issue/regression would be good here if possible.

  10. in src/net.h:63 in 148e43a1aa outdated
    58@@ -59,6 +59,8 @@ static const unsigned int MAX_SUBVERSION_LENGTH = 256;
    59 static const int MAX_OUTBOUND_CONNECTIONS = 8;
    60 /** Maximum number of addnode outgoing nodes */
    61 static const int MAX_ADDNODE_CONNECTIONS = 8;
    62+/** Number of feeler connections */
    63+static const int NUM_FEELER_CONNECTIONS = 1;
    


    Empact commented at 7:33 pm on May 15, 2019:
    nit: MAX_FEELER_CONNECTIONS for consistency, and coherence with nMaxFeeler

    tryphe commented at 5:16 am on May 17, 2019:
    Right, that makes sense. This essentially becomes 0 when -maxconnections <=MAX_OUTBOUND_CONNECTIONS, which I wasn’t sure of initially.
  11. Empact commented at 7:35 pm on May 15, 2019: member

    Two requests to simplify review:

    • squash travis fixups into the appropriate commit
    • split fixes into 2 PRs, one addressing each
  12. tryphe commented at 5:05 am on May 17, 2019: contributor
    @Empact Thanks. I thought it would be crazy to open two PRs, because bug 1 fixes something for sure, and bug 2 isn’t really reproducible. Any bad behavior from bug 2 could be confused for bug 1, and it would be hard to tell which one it is. Eg, things could go south if not enough FDs were allocated (maybe allocated is the wrong word) or not enough were asserted. And since file descriptor problems via threads are hard to reproduce, I thought it made sense to mash it all together.
  13. in src/init.cpp:1013 in 148e43a1aa outdated
    1021+    nFD = RaiseFileDescriptorLimit(nFDMin + nUserMaxConnections);
    1022+    if (nFD < nFDMin)
    1023+        return InitError(strprintf(_("Not enough file descriptors available. %d available, %d required."), nFD, nFDMin));
    1024+
    1025+    // Calculate new -maxconnection count. Note: std::min<int>(...) to work around FreeBSD compilation issue described in #2695
    1026+    nMaxConnections = std::min<int>(nMaxConnections, nFD - nFDMin);
    


    tryphe commented at 5:48 am on May 17, 2019:
    I wonder if we need <int> here? Both arguments are int now. But I didn’t want to cover up this old fix.
  14. tryphe commented at 7:36 pm on July 18, 2019: contributor

    Looking for more suggestions here. I’m not quite sure how to logically split this into multiple PRs, so I’ll leave that for now.

    This commit certainly isn’t perfect, but is essentially an open improvement to enforce/allocate a slightly more accurate number of file descriptors. If the user has an absurdly low system setting, we lower the number of maximum inbound connections until it’s no longer possible, instead of possibly passing with an unsafe amount.

    One note: I’ve included mmaped things as well, because they require a temporary handle to obtain the mapping. Not sure if this was overkill but it seemed like a good idea.

    I also have a feeling that the number of bind arguments isn’t always equal to the number of interfaces bitcoind will listen on, so that’s probably a bad assumption.

    Any additions or ideas on how to fix this up is appreciated. Thanks :)

  15. MarcoFalke added the label Resource usage on Jul 18, 2019
  16. MarcoFalke added the label Brainstorming on Jul 18, 2019
  17. MarcoFalke added the label Bug on Jul 18, 2019
  18. init: avoid string conflicts with #15329 (reversed)
    init: nMaxConnections and (nFD-nFDMin) cannot be negative, don't check against 0
    init: try to fix Travis note about deducing types
    init: try to fix travis again "deduced conflicting types"
    init: try a better way through the Travis problem
    init: revert old init warning for drahtbot
    net: name nit "NUM"->"MAX"_FEELER_CONNCTIONS
    init: name nit "NUM"->"MAX"_FEELER_CONNCTIONS
    init: fixup typo
    c45779cfef
  19. tryphe force-pushed on Jul 18, 2019
  20. init: nit: revert InitWarning string change for log compat dcb63c9077
  21. init: revert log correction for log compat (oops) 7cb1ad5f83
  22. in src/init.cpp:1018 in c45779cfef outdated
    1015     LogPrintf("There are %d file descriptors available, %d required, %d reserved, and %d requested.\n", nFD, nFDMin, nFDMin + nMaxConnections, nFDMin + nUserMaxConnections);
    1016 
    1017     // Display a warning if the number of total maximum connections was lowered
    1018     if (nMaxConnections < nUserMaxConnections)
    1019-        InitWarning(strprintf(_("Reducing -maxconnections from %d to %d, because of file descriptor limitations."), nUserMaxConnections, nMaxConnections));
    1020+        InitWarning(strprintf(_("Reducing -maxconnections from %d to %d, because of system limitations."), nUserMaxConnections, nMaxConnections));
    


    tryphe commented at 8:35 pm on July 18, 2019:
    nit: maybe I should remove this change for log compatability
  23. tryphe commented at 9:41 pm on July 18, 2019: contributor
    Maybe we can just implement -extrafds=n (or maybe just fds for the total and define the minimum somewhere) instead of modifying this code every time we realize something new or existing needs a file handle. It doesn’t really make sense to modify this code if we can just modify a default config value, right (except of course for dynamic values that can’t be known)?
  24. tryphe commented at 6:55 pm on July 19, 2019: contributor

    I figured I should supply some pseudocode to show what happens differently in the code and how it’s related to both Bug 1 and 2. There’s a bit of redundant code, and lots of constants from other places, so I tried to reduce the amount of references needed to actually understand what’s going on.

    Suppose you are running Mac OS X and listening on the network interface. The ulimit is 256. Suppose you also have 1 bind interface nBind(1) and 1 RPC thread nRpc(1).

    Current code

    0nFD = RaiseFileDescriptorLimit(nMaxConn(125) + nCore(150) + nAddnode(8)) = gets reduced to 256
    1nMaxConnections = min(125, nFD(256) - nBind(1) - nCore(150) - nAddnode(8) = min(125, 97) = 97
    2if ( nFD < nCore(150) ) error("not enough fds")
    3nMaxConnections = min(nFD(256) - nCore(150) - nAddnode(8), nMaxConnections(97)) = min(98, 97) = 97
    

    Usable FDs: nCore(150) + nMaxConnections(97) = 247 + nAddnode(8) + nRpc(1) + nBind(1) = 257 There’s too many, even if we only have 1 bind interface and 1 RPC thread.

    This PR

    0nFDMin = nCore(150) + nAddnode(8) + nBind(1) + nRpc(1) = 160
    1nFD = RaiseFileDescriptorLimit(nMaxConn(125) + nFDMin(160)) = gets reduced to 256
    2if ( nFD < nFDMin(160) ) error("not enough fds")
    3nMaxConnections = min(nMaxConn(125), 256-nFDMin(160)) = 96
    

    Usable FDs: nCore(150) + nMaxConnections(96) = 246 + nAddnode(8) + nRpc(1) + nBind(1) = 256

  25. DrahtBot added the label Needs rebase on Jul 24, 2019
  26. DrahtBot commented at 6:15 pm on July 24, 2019: member
  27. laanwj renamed this:
    init: an incorrect amount of file descriptors is requested, and a different amount is also asserted
    init: Fixes for file descriptor accounting
    on Sep 30, 2019
  28. adamjonas commented at 1:51 am on May 6, 2020: member
    Hi @tryphe - Wondering if you are going to continue with this PR or whether it should be closed. It looks like this has needed a rebase for some time. I’d also suggest you squash your commits in accordance with the CONTRIBUTING guidelines and add tests as suggested above to verify the issue.
  29. fanquake commented at 11:21 pm on May 7, 2020: member
    Will close this for now. However I think the changes are worth following up on, so will open a new good first issue. Will also extract one change that can just be merged now.
  30. fanquake closed this on May 7, 2020

  31. fanquake referenced this in commit 8da1e43b63 on May 12, 2020
  32. sidhujag referenced this in commit f4d69f72b5 on May 12, 2020
  33. tryphe commented at 6:15 pm on May 18, 2020: contributor

    Thanks for following up @adamjonas @fanquake!

    I certainly don’t think this is the most correct way to proceed, especially with the large scope of changes with new sockets and threads. But it’s closer than what we had before, in terms of knowing the [required, requested, allocated] counts of FDs. I think it makes much more sense to modularize the summing of FDs and make it easier to obtain the counts with future changes. Maybe something like a DescriptorMan that does all of the dirty work. Doing this will make adding more sockets or modifying connection min/max bounds much less conflicted. It’s apparent that we don’t want to keep editing these general files (init.cpp and net.h) every time we need to change the descriptor count.

    Will ping #18911 so they can see this message.

  34. fanquake removed the label Needs rebase on May 19, 2020
  35. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-06 04:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me