init: fixes file descriptor accounting #30065

pull sr-gi wants to merge 3 commits into bitcoin:master from sr-gi:2024-05-fdcount changing 1 files +28 −22
  1. sr-gi commented at 7:47 pm on May 8, 2024: member

    The current logic for file descriptor accounting is pretty convoluted and hard to follow. This is partially caused by the lack of documentation plus non-intuitive variable naming (which was more intuitive when fewer things were accounted for, but hasn’t aged well). This has led to this accounting being error-prone and hard to maintain (as shown in the first commit of this PR).

    Redefine some of the constants, plus document what are we accounting for so this can be extended more easily

    Fixes #18911

  2. DrahtBot commented at 7:47 pm on May 8, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK vasild, TheCharlatan, naumenkogs
    Concept ACK BrandonOdiwuor

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. in src/init.cpp:1007 in a44da681f5 outdated
    985-    // Trim requested connection counts, to fit into system limitations
    986-    // <int> in std::min<int>(...) to work around FreeBSD compilation issue described in #2695
    987-    nMaxConnections = std::max(std::min<int>(nMaxConnections, fd_max - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS - NUM_FDS_MESSAGE_CAPTURE), 0);
    988     if (nFD < MIN_CORE_FILEDESCRIPTORS)
    989         return InitError(_("Not enough file descriptors available."));
    990-    nMaxConnections = std::min(nFD - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS - NUM_FDS_MESSAGE_CAPTURE, nMaxConnections);
    


    sr-gi commented at 7:57 pm on May 8, 2024:

    Notice how the latter nMaxConnections trim is redundant once we limit nFD = std::min(FD_SETSIZE, nFD);. Moving things around, plus accounting for nBind (which is currently not being accounted for in master), we get:

    0nMaxConnections = std::max(std::min<int>(nMaxConnections, nFD - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS - NUM_FDS_MESSAGE_CAPTURE), 0);
    1nMaxConnections = std::min(nMaxConnections, nFD - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS - NUM_FDS_MESSAGE_CAPTURE,);
    
  4. TheCharlatan commented at 7:57 pm on May 8, 2024: contributor
    Concept ACK
  5. sr-gi commented at 8:03 pm on May 8, 2024: member

    I think it is worth mentioning that the current approach is defining the minimum amount of file descriptions required without accounting for a single connection*, this means that if we are at the bare minimum, we will run but won’t be able to connect to any nodes. This is consistent with the current logic in master, but I think it’s not the way it should be.

    I’m happy to add another commit addressing this, but I’ve rathered start approaching this sticking to the same assumptions as master.

    * This actually accounts for manual connections, which may never happen, but not to any of our outbound. If we happen to not create any manuals we’d have 8 slots (MAX_ADDNODE_CONNECTIONS) for outbounds

  6. sr-gi commented at 8:07 pm on May 8, 2024: member
    @vasild you may be interested in this. I decided to fix it when seeing you extending the current logic in #29415
  7. BrandonOdiwuor commented at 3:57 pm on May 11, 2024: contributor
    Concept ACK
  8. in src/init.cpp:992 in 927f153845 outdated
     995 #endif
     996+    if (nFD < MIN_CORE_FDS)
     997+        return InitError(strprintf(_("Not enough file descriptors available. %d available, %d required."), nFD, MIN_CORE_FDS));
     998+
     999     // Trim requested connection counts, to fit into system limitations
    1000     // <int> in std::min<int>(...) to work around FreeBSD compilation issue described in #2695
    


    fanquake commented at 8:28 am on May 15, 2024:
    I think now could also be the time to remove this FreeBSD workaround. It was needed because older versions of FreeBSD, used to ship with an old Clang (3.x). However we now require Clang 15+, and the effected version of FreeBSD 10.x, is long EOL.

    sr-gi commented at 1:32 pm on May 15, 2024:
    Covered in 55f16f5
  9. laanwj added the label UTXO Db and Indexes on May 16, 2024
  10. laanwj added the label P2P on May 16, 2024
  11. vasild commented at 5:17 am on May 22, 2024: contributor

    The current logic for file descriptor accounting is pretty convoluted and hard to follow.

    Concept ACK, I stumbled on this recently in https://github.com/bitcoin/bitcoin/pull/29415

  12. in src/init.cpp:997 in 55f16f56f4 outdated
    978     nUserMaxConnections = args.GetIntArg("-maxconnections", DEFAULT_MAX_PEER_CONNECTIONS);
    979     nMaxConnections = std::max(nUserMaxConnections, 0);
    980-
    981-    nFD = RaiseFileDescriptorLimit(nMaxConnections + MIN_CORE_FILEDESCRIPTORS + MAX_ADDNODE_CONNECTIONS + nBind + NUM_FDS_MESSAGE_CAPTURE);
    982-
    983-#ifdef USE_POLL
    


    vasild commented at 10:42 am on May 24, 2024:

    In the commit message of a44da681f5 init: fixes fd accounting regarding poll/select:

    … file descriptions limits …

    s/descriptions/descriptors/

  13. in src/init.cpp:984 in 55f16f56f4 outdated
    987+    // Reserve enough FDs to account for the bare minimum, plus any manual connections, plus the bound interfaces
    988+    int nMinRequiredFds = MIN_CORE_FDS + MAX_ADDNODE_CONNECTIONS + nBind;
    989+
    990+    // Try raising the FD limit to what we need (nFD may be smaller than the requested amount if this fails)
    991+    nFD = RaiseFileDescriptorLimit(nMaxConnections + nMinRequiredFds);
    992+    // If we are using select instead of poll, our actuall limit may be even smaller
    


    vasild commented at 10:51 am on May 24, 2024:
    0    // If we are using select instead of poll, our actual limit may be even smaller
    
  14. in src/init.cpp:992 in 55f16f56f4 outdated
    1000-    // <int> in std::min<int>(...) to work around FreeBSD compilation issue described in #2695
    1001-    nMaxConnections = std::max(std::min<int>(nMaxConnections, fd_max - nBind - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS - NUM_FDS_MESSAGE_CAPTURE), 0);
    1002-    if (nFD < MIN_CORE_FILEDESCRIPTORS)
    1003-        return InitError(_("Not enough file descriptors available."));
    1004-    nMaxConnections = std::min(nFD - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS - NUM_FDS_MESSAGE_CAPTURE, nMaxConnections);
    1005+    nMaxConnections = std::max(std::min(nMaxConnections, nFD - nMinRequiredFds), 0);
    


    vasild commented at 12:07 pm on May 24, 2024:

    An alternative to max(min()) is clamp():

    0    nMaxConnections = std::clamp(nFD - nMinRequiredFds, 0, nMaxConnections);
    

    (feel free to ignore if you don’t find it more readable)


    sr-gi commented at 2:02 pm on May 31, 2024:

    I’ve rolled this back. After some of the refactoring, this looked like:

    0std::clamp(available_fds - min_required_fds, 0, nUserMaxConnections);
    

    However, available_fds cannot be smaller than min_required_fds, and nUserMaxConnections cannot be negative. So calling std::min should be enough.


    vasild commented at 4:36 pm on June 1, 2024:
    Excellent! Some simplifications leading to further simplifications. :heart:
  15. in src/init.cpp:976 in 55f16f56f4 outdated
    972+    // Make sure enough file descriptors are available. We need to reserve enough FDs to account for the bare minimum,
    973+    // plus all manual connections and all bound interfaces. Any remainder will be available for connection sockets
    974+
    975+    // Number of bound interfaces (we have at least one)
    976     int nBind = std::max(nUserBind, size_t(1));
    977+    // Maximum number of connetions with other nodes, this accounts for all types of outbound and inbounds expect for manual
    


    vasild commented at 12:20 pm on May 24, 2024:
    0    // Maximum number of connections with other nodes, this accounts for all types of outbounds and inbounds except for manual
    
  16. in src/init.cpp:978 in 55f16f56f4 outdated
    974+
    975+    // Number of bound interfaces (we have at least one)
    976     int nBind = std::max(nUserBind, size_t(1));
    977+    // Maximum number of connetions with other nodes, this accounts for all types of outbound and inbounds expect for manual
    978     nUserMaxConnections = args.GetIntArg("-maxconnections", DEFAULT_MAX_PEER_CONNECTIONS);
    979     nMaxConnections = std::max(nUserMaxConnections, 0);
    


    vasild commented at 12:38 pm on May 24, 2024:
    What about giving an InitError if the provided value is negative instead of silently clamping it to 0? Surely -maxconnections=-50 looks like a typo and probably the intention is not to end up with -maxconnections=0.

    sr-gi commented at 3:10 pm on May 24, 2024:
    I think that makes more sense
  17. in src/init.cpp:980 in 55f16f56f4 outdated
    983-#ifdef USE_POLL
    984-    int fd_max = nFD;
    985-#else
    986-    int fd_max = FD_SETSIZE;
    987+    // Reserve enough FDs to account for the bare minimum, plus any manual connections, plus the bound interfaces
    988+    int nMinRequiredFds = MIN_CORE_FDS + MAX_ADDNODE_CONNECTIONS + nBind;
    


    vasild commented at 12:45 pm on May 24, 2024:

    according to doc/developer-notes.md variables should use snake_case and no need to denote the type in the variable name (n for integer): min_required_fds.

    But more importantly, the name nMinRequiredFds is misleading because we accept a smaller number - a bit below we check if (nFD < MIN_CORE_FDS) return error..., so it is not “minimum required”. I think this variable here should be: min_required_fds = MIN_CORE_FDS + nBind, that check should be if (nFD < min_required_fds) return error and then:

    0-    nFD = RaiseFileDescriptorLimit(nMaxConnections + nMinRequiredFds);
    1+    nFD = RaiseFileDescriptorLimit(min_required_fds + MAX_ADDNODE_CONNECTIONS + nMaxConnections);
    

    and

    0-    nMaxConnections = std::max(std::min<int>(nMaxConnections, nFD - nMinRequiredFds), 0);
    1+    nMaxConnections = std::max(std::min<int>(nMaxConnections, available_fds - min_required_fds - MAX_ADDNODE_CONNECTIONS), 0);
    

    sr-gi commented at 3:09 pm on May 24, 2024:

    I agree the if check should be updated, but I don’t necessarily agree with splitting MAX_ADDNODE_CONNECTIONS from min_required_fds.

    There is the outstanding question of whether the minimum required fds to run Core should account for some of the connections. Currently, we are not accounting for any (given the check is performed against MIN_CORE_FDS), but I think we should be accounting for at least, outbound + manuals


    vasild commented at 2:08 pm on May 28, 2024:

    Ok, no strong opinion whether to include or exclude MAX_ADDNODE_CONNECTIONS in min_required_fds.

    I guess better to update the check. It is funny to have code like

    0min_required = ...;
    1if (available < 10) error
    

    it should really be

    0min_required = ...;
    1if (available < min_required) error
    

    sr-gi commented at 3:27 pm on May 28, 2024:
    I’ve fixed that, plus increased the min_required_fds to also account for automatic outbounds in the last commit. I’m open to discuss about the latter, but I think it’s something we should also account for

    vasild commented at 9:11 am on May 29, 2024:

    The newly added int auto_outbounds = std::min(MAX_OUTBOUND_FULL_RELAY_CONNECTIONS + MAX_BLOCK_RELAY_ONLY_CONNECTIONS + MAX_FEELER_CONNECTIONS, nUserMaxConnections); seems to be duplicating this logic, but not exactly:

    https://github.com/bitcoin/bitcoin/blob/be100cf4c77a3e45750773689e0a396fda39d8a7/src/net.h#L1069-L1073

    and I think this is getting complicated again. I would suggest to keep it simple - drop the last commit f569b06a53851f844219cd80cd33676470e3b776 init: fix min required fds check and account for outbounds

    and only apply this change to 3982543007 init: improves file descriptors accounting and docs:

    0-    if (available_fds < MIN_CORE_FDS)
    1+    if (available_fds < min_required_fds)
    

    sr-gi commented at 5:33 pm on May 29, 2024:

    It is duplicating part of that, given CConnman::m_max_automatic_outbound is private.

    I agree it is adding more complexity, but not accounting for it in the minimum means that could be in a situation where no single outbound connection is accounted for, so the user is basically only connected to manuals.

    I acknowledge that’s fairly rare, but the two reasons why we are currently not facing this issue is, first, because the minimum is fairly low, so it would need a system with really limited resources to become an issue, and second, because If we were in that situation it would also need to be that case that the node is defining manual connections, otherwise the 8 reserved slots for manual would be available for automatic.

    Despite that, this feels like bad design, so I’d argue accounting for automatic is something we should do (though not necessarily as done in the current approach)


    vasild commented at 9:15 am on May 30, 2024:
    If the node is connected to 8 manually defined outbound peers and 0 inbounds and 0 automatic outbounds, then it is still operational? No need to prohibit startup ~in this case~ if there is a chance that this might happen?

    sr-gi commented at 1:55 pm on May 30, 2024:

    Is it though? If the user has not specified that he wants no automatic connections (-maxconnections != 0) I’d argue he’d expect them to happen.

    In any case, I think this is such an edge case that it is not worth blocking the PR on it. If you have a strong opinion on this, I’m happy to drop it and squash the rest of the commit to one of the previous ones


    vasild commented at 5:01 am on May 31, 2024:
    I am somewhat ~0 on erroring in the case of 8manual/0in/0out, but my bigger anxiety with the latest commit is that it makes it confusing again (or too complicated in my judgement). I tried to code a simple way to account for the automatic outbounds but gave up. Also, most naturally, as the “minimum required” name suggests, the check should be if (... < min_required). I find if (... < min_required + something more) confusing.

    sr-gi commented at 1:33 pm on May 31, 2024:
    I agree with the “making it more confusing”. Given this is an edge case, I’ll remove the outbound count and we can account for it if someone has a better way of doing so, or if someone opens an issue about it.
  18. in src/init.cpp:983 in 55f16f56f4 outdated
    986-    int fd_max = FD_SETSIZE;
    987+    // Reserve enough FDs to account for the bare minimum, plus any manual connections, plus the bound interfaces
    988+    int nMinRequiredFds = MIN_CORE_FDS + MAX_ADDNODE_CONNECTIONS + nBind;
    989+
    990+    // Try raising the FD limit to what we need (nFD may be smaller than the requested amount if this fails)
    991+    nFD = RaiseFileDescriptorLimit(nMaxConnections + nMinRequiredFds);
    


    vasild commented at 1:08 pm on May 24, 2024:

    snake_case for variables. I think available or available_fds is a good name:

    0    available_fds = RaiseFileDescriptorLimit(nMaxConnections + nMinRequiredFds);
    

    sr-gi commented at 2:56 pm on May 24, 2024:
    I kept nFD because it is not locally defined, but defined in the namespace, so at least one other method is using it. If we don’t mind a bigger diff I’m happy to change it

    vasild commented at 2:11 pm on May 28, 2024:
    ACK to rename nFD (maybe as part of 2f49fa0e761ee67d6eeb424b157689f479da8b2c init: improves file descriptors accounting and docs)

    sr-gi commented at 3:25 pm on May 28, 2024:
    Updated
  19. vasild commented at 1:14 pm on May 24, 2024: contributor

    Approach ACK 55f16f56f4

    Might as well squash the last commit 55f16f56f4 init: Remove FreeBSD workaround to [#2695](/bitcoin-bitcoin/2695/) into some of the previous ones that touch that line.

  20. sr-gi force-pushed on May 24, 2024
  21. sr-gi commented at 3:29 pm on May 24, 2024: member
    Thanks for the suggestions @vasild, I’ve partially covered them and commented on some of the pending ones
  22. sr-gi force-pushed on May 24, 2024
  23. DrahtBot added the label CI failed on May 24, 2024
  24. DrahtBot commented at 3:36 pm on May 24, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/25385987025

  25. sr-gi force-pushed on May 24, 2024
  26. sr-gi force-pushed on May 24, 2024
  27. DrahtBot removed the label CI failed on May 24, 2024
  28. vasild approved
  29. vasild commented at 2:13 pm on May 28, 2024: contributor

    ACK fa42bf7ee89999b33d4af0600b0d8da1275e5d5e

    This is already an improvement over master. Would be nice to resolve #30065 (review) as well.

  30. DrahtBot requested review from TheCharlatan on May 28, 2024
  31. DrahtBot requested review from BrandonOdiwuor on May 28, 2024
  32. sr-gi force-pushed on May 28, 2024
  33. sr-gi force-pushed on May 28, 2024
  34. sr-gi force-pushed on May 28, 2024
  35. sr-gi force-pushed on May 29, 2024
  36. sr-gi force-pushed on May 31, 2024
  37. sr-gi commented at 1:59 pm on May 31, 2024: member

    ACK fa42bf7

    This is already an improvement over master. Would be nice to resolve #30065 (comment) as well.

    Addressed all comments now

  38. vasild approved
  39. vasild commented at 4:34 pm on June 1, 2024: contributor

    ACK a9e211a5302630635b825655cd984683794aa42c

    Note: this is the last but one commit. The last commit 23f843d816 init: fix min required fds check and account for outbounds now contains a single dummy change of removing an empty line. Can/should be dropped altogether. You can push -f a9e211a5302630635b825655cd984683794aa42c into this branch.

    Thanks!

  40. sr-gi force-pushed on Jun 3, 2024
  41. sr-gi commented at 2:16 pm on June 3, 2024: member
    Nice catch @vasild, I’ve squashed that into one of the previous commits. It should be good now.
  42. vasild approved
  43. vasild commented at 10:27 am on June 20, 2024: contributor
    ACK fe76929d4d68a4ca6d27c58c1e243a64fdc70b2a
  44. in src/init.cpp:978 in fe76929d4d outdated
    974@@ -975,12 +975,14 @@ bool AppInitParameterInteraction(const ArgsManager& args)
    975     int nBind = std::max(nUserBind, size_t(1));
    976     // Maximum number of connections with other nodes, this accounts for all types of outbounds and inbounds except for manual
    977     nUserMaxConnections = args.GetIntArg("-maxconnections", DEFAULT_MAX_PEER_CONNECTIONS);
    978-    nMaxConnections = std::max(nUserMaxConnections, 0);
    979+    if (nUserMaxConnections < 0){
    


    TheCharlatan commented at 2:21 pm on August 24, 2024:
    Nit (clang-format): Space after closing parenthesis.
  45. in src/init.cpp:150 in 0a7030d312 outdated
    147 #else
    148-#define MIN_CORE_FILEDESCRIPTORS 150
    149+#define MIN_LEVELDB_FDS 150
    150 #endif
    151 
    152+static const int MIN_CORE_FDS = MIN_LEVELDB_FDS + NUM_FDS_MESSAGE_CAPTURE;
    


    TheCharlatan commented at 5:46 pm on August 25, 2024:
    nit: static constexpr

    TheCharlatan commented at 5:50 pm on August 25, 2024:
    Why did you not put MAX_ADDNODE_CONNECTIONS in here?

    sr-gi commented at 2:41 pm on September 5, 2024:
    It was a matter of definition. The bare minimum does not really account for connections, even though running the node without any connection won’t be too useful.

    TheCharlatan commented at 3:04 pm on September 5, 2024:
    Oh, that makes sense!
  46. in src/init.cpp:977 in fe76929d4d outdated
    973+    // plus all manual connections and all bound interfaces. Any remainder will be available for connection sockets
    974+
    975+    // Number of bound interfaces (we have at least one)
    976     int nBind = std::max(nUserBind, size_t(1));
    977+    // Maximum number of connections with other nodes, this accounts for all types of outbounds and inbounds except for manual
    978     nUserMaxConnections = args.GetIntArg("-maxconnections", DEFAULT_MAX_PEER_CONNECTIONS);
    


    TheCharlatan commented at 6:26 pm on August 25, 2024:

    I think this should be de-globalized and renamed like I’ve done here: https://github.com/TheCharlatan/bitcoin/commit/a2c048322c67436e02a3a9607400f65f10dd6564

    The avaialble_fds global variable there are also kind of silly, since it is only there to be logged at a later point in time, but I don’t feel strongly if it should go or not either way.


    sr-gi commented at 2:46 pm on September 5, 2024:
    Yeah, I agree, I was going to de-globalize most of them but saw that they were used for logging later, so I decided not to overcomplicate the change by addressing things that are not directly related. I’ll do so with nUserMaxConnections given it’s not used for anything else
  47. TheCharlatan commented at 6:31 pm on August 25, 2024: contributor
    This looks good, but I think these few, simple things should be addressed.
  48. DrahtBot requested review from TheCharlatan on Aug 25, 2024
  49. DrahtBot added the label CI failed on Aug 31, 2024
  50. DrahtBot removed the label CI failed on Sep 4, 2024
  51. sr-gi force-pushed on Sep 5, 2024
  52. sr-gi commented at 3:02 pm on September 5, 2024: member
    Took (most of?) your suggestions @TheCharlatan
  53. init: fixes fd accounting regarding poll/select
    We are computing our file descriptors limits based on whether we use
    poll or select. However, we are taking that into account only partially
    (subtracting from fd_max in one case, but from nFD later on). Moreover,
    nBind is also only accounted for partially.
    
    Simplify and fix this logic
    29008a7ff4
  54. init: improves file descriptors accounting and docs
    The current logic for file descriptor accounting is pretty convoluted and hard
    to follow. This is partially caused by the lack of documentation plus non-intuitive
    variable naming (which was more intuitive when fewer things were accounted for, but
    hasn't aged well). This has led to this accounting being error-prone and hard to maintain
    (as shown in he previous commit).
    
    Redefine some of the constants, plus document what are we accounting for so this can be
    extended more easily
    
    Remove FreeBSD workaround to #2695
    c773649481
  55. init: error out if -maxconnections is negative d4c7c4009d
  56. sr-gi force-pushed on Sep 5, 2024
  57. sr-gi commented at 3:58 pm on September 5, 2024: member
    Rebased to make CI green. This was based on a pre-CMake commit
  58. vasild approved
  59. vasild commented at 8:17 am on September 9, 2024: contributor

    ACK d4c7c4009da14c84576d43ab4a1241967cfa5ffc

    This code gives me a headache :exploding_head: (the old one, the new one - less)

  60. TheCharlatan approved
  61. TheCharlatan commented at 8:30 am on September 9, 2024: contributor
    ACK d4c7c4009da14c84576d43ab4a1241967cfa5ffc
  62. sr-gi commented at 2:49 pm on September 9, 2024: member

    ACK d4c7c40

    This code gives me a headache 🤯 (the old one, the new one - less)

    This is literally the reason why I decided to fix this 🫠

    https://www.notion.so/srgi/PR-29415-110fd8ede41446fba41e81859bbdf985?pvs=4#521d59ebadd2475bb5630fbad147cbe4

  63. naumenkogs commented at 12:57 pm on September 10, 2024: member

    ACK d4c7c4009da14c84576d43ab4a1241967cfa5ffc

    The change makes the code much easier to understand, and I couldn’t find any bugs or flaws.

  64. ryanofsky merged this on Sep 10, 2024
  65. ryanofsky closed this on Sep 10, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me