p2p: Increase inbound capacity for block-relay only connections #28463

pull mzumsande wants to merge 6 commits into bitcoin:master from mzumsande:202308_increase_block_relay changing 9 files +169 −11
  1. mzumsande commented at 8:36 pm on September 12, 2023: contributor

    This is joint work with amitiuttarwar.

    See issue #28462 for a broader discussion on increasing the number of block-relay-only connections independent of this particular implementation proposal.

    We suggest to increase the number of inbound slots allocated to block-relay-only peers by increasing the default maximum connections from 125 to 200, with 50% of inbound slots accessible for tx-relaying peers. This is a prerequisite for being able to increase the default number of outgoing block-relay-only peers later, because the current inbound capacity of the network is not sufficient. In order to account for incoming tx-relaying peers separately from incoming block-relay peers, changes to the inbound eviction logic are necessary.

    See the next post in this thread for a more detailed explanation and motivation of the changes.

  2. DrahtBot commented at 8:36 pm on September 12, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK naumenkogs, naiyoma

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31260 (scripted-diff: Type-safe settings retrieval by ryanofsky)
    • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label P2P on Sep 12, 2023
  4. mzumsande marked this as a draft on Sep 12, 2023
  5. mzumsande commented at 8:37 pm on September 12, 2023: contributor

    Status Quo

    We have a default maximum of 125 connections (not counting manual ones). Out of these, 11 (8 full outbounds, 2 block-relay-only, 1 feeler) are reserved for outbounds, leaving us with 114 inbound slots. There are currently no limits on how many of the inbounds can support transaction relay.

    Problem

    We want to increase the number of inbound slots specifically offered to block-relay-only connections, but don’t want to significantly change the number of tx-relaying inbound peers, because we don’t want to radically change the memory and traffic requirements that come with running a full node with the default configuration. This is complicated by a timing issue: When a new peer connects to us, we don’t know yet whether it is block-relay-only before we have processed their version message. Inbound peer eviction, however, happens right after a peer connects to us and before processing its version msg. This means that we don’t yet know during inbound eviction, if the new peer wants tx relay.

    Approach

    We propose to introduce a separate maximum of tx-relaying inbound connections, in addition to the existing global maximum. The new maximum for tx-relaying inbounds gets chosen such that 50% of the total inbound slots can be utilized by tx-relaying peers. We choose a ratio here instead of a fixed number here so that it scales with users changing their -maxconnections value from the default. The ratio of 50% is also made adjustable in the form of a new startup parameter.

    Changes to the eviction logic

    We don’t change the eviction logic for total connections, which happens in CreateNodeFromAcceptedSocket(). We add a tx-relay-specific eviction check to the version message processing which runs if we don’t have capacity for another tx-relaying peer: This check uses the same logic from AttemptToEvictConnection, but specifically evicts only tx-relaying peers (by protecting the rest of the peers). Only if we cannot find another peer to evict (e.g. all peers are protected, which should only happen if a low -maxconnections is used) we disconnect the new peer. This approach achieves that the number of tx-relaying peers can never exceed the limit.

    Numbers

    Specifically, we propose to increase the total number of connections from 125 to 200, with a suggested ratio of 50% tx-relay inbounds. With 11 connection slots reserved for outbound peers, this results in (200 - 11) * 0.5 = 94.5 slots for tx-relaying inbound peers, which closely aligns with the typical tx-relaying inbounds seen in today’s world, where typically ~20% of inbound slots are already taken by block-relay-only connections. These numbers were chosen to be sufficient to raise the number of block-relay-only connections from 2 to 8 once this patch is widely deployed (see Issue #28462 for more details on this). [Note: In a previous iteration, a ratio of 60% was suggested, see this post for more discussion.]

    Handling bloom filter peers

    The logic gets a bit complicated if a node supports BIP37 (bloom filters), because then an inbound peer can be switched to tx-relay after sending a FILTERLOAD or FILTERCLEAR message (but never back!), which could result in an excess of tx-relaying inbounds. We propose to solve this by triggering the eviction logic for a tx-relaying peer from net_processing whenever we switch a peer to tx-relay due to a BIP37-related message.

  6. naumenkogs commented at 8:28 am on September 13, 2023: member

    Concept ACK, specifically on changes to the eviction logic, and handling bloom filters. Numbers also seem fine I guess.

    I was curious whether there is (or will be) any other good reason to delay eviction after the version (apart from making this logic cleaner), but I haven’t identified any.

  7. mzumsande commented at 4:16 pm on September 28, 2023: contributor

    I was curious whether there is (or will be) any other good reason to delay eviction after the version (apart from making this logic cleaner), but I haven’t identified any.

    Discussed this with @amitiuttarwar and we think that it would be better to move the tx-related eviction into the version processing logic. Pros:

    • It’s conceptually simpler to only evict a tx-relaying peer when we are sure that it is one
    • We wouldn’t have to evict anyone in the situation where all tx inbound slots are full and a new block-relay inbound connects
    • We need the logic for this (EvictTxPeerIfFull()) anyway when dealing with bloom-filter peers.

    Cons:

    • The eviction now happens in two different places, after accepting a connection and during version processing.

    I will update the PR with this soon!

  8. DrahtBot added the label Needs rebase on Oct 3, 2023
  9. mzumsande force-pushed on Oct 3, 2023
  10. DrahtBot removed the label Needs rebase on Oct 3, 2023
  11. mzumsande force-pushed on Oct 13, 2023
  12. mzumsande commented at 6:38 pm on October 13, 2023: contributor

    Changed the approach to do the eviction of tx-relaying peers within the version message processing, when we can make use of the fRelay flag they sent us - this has the advantage that we don’t have to evict new peers before being sure if they want tx relay.

    As a result of the later eviction, it became necessary to protect the new peer from being evicted right by AttemptToEvictConnection() right after connecting (so that the new peer only gets evicted if no other peer can be found). This isn’t done for the existing unconditional inobound eviction in net (CreateNodeFromAcceptedSocket), because there we would evict a peer before adding the new peer to m_nodes.

  13. mzumsande force-pushed on Oct 13, 2023
  14. DrahtBot added the label CI failed on Oct 13, 2023
  15. DrahtBot removed the label CI failed on Oct 13, 2023
  16. in src/net.h:1201 in ab5ad61f67 outdated
    1195@@ -1193,6 +1196,16 @@ class CConnman
    1196     int GetExtraFullOutboundCount() const;
    1197     // Count the number of block-relay-only peers we have over our limit.
    1198     int GetExtraBlockRelayCount() const;
    1199+    /**
    1200+     * If we are at capacity for inbound tx-relay peers, attempt to evict one.
    1201+     * @param[in]   protect_peer    NodeId of a peer we want to protect
    


    amitiuttarwar commented at 11:26 pm on October 16, 2023:

    spacing nit to line up with the return description

    0     * [@param](/bitcoin-bitcoin/contributor/param/)[in]   protect_peer      NodeId of a peer we want to protect
    

    mzumsande commented at 9:42 pm on October 19, 2023:
    done
  17. in src/net.cpp:1658 in ab5ad61f67 outdated
    1699@@ -1700,7 +1700,7 @@ std::pair<size_t, bool> CConnman::SocketSendData(CNode& node) const
    1700  *   to forge.  In order to partition a node the attacker must be
    1701  *   simultaneously better at all of them than honest peers.
    1702  */
    1703-bool CConnman::AttemptToEvictConnection(bool evict_tx_relay_peer, std::optional<NodeId> protected_peer)
    1704+bool CConnman::AttemptToEvictConnection(bool evict_tx_relay_peer, std::optional<NodeId> protect_peer)
    


    amitiuttarwar commented at 11:34 pm on October 16, 2023:
    I’m guessing you renamed to protect_peer to match the function signature of EvictTxPeerIfFull, which sounds good to me. but this change should be squashed into ce8cd83a81ac5da882e8c6862a8d53efafc3ae86 which introduced the param.

    mzumsande commented at 9:42 pm on October 19, 2023:
    Yes, some changes made it into the wrong commit. Fixed now.
  18. in src/net_processing.cpp:3492 in ab5ad61f67 outdated
    3488@@ -3489,6 +3489,16 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
    3489             }
    3490         }
    3491 
    3492+        // If we have too many tx-relaying inbound peers, attempt to evict one (not necessarily this one).
    


    amitiuttarwar commented at 0:07 am on October 17, 2023:

    current comment feels misleading since we specifically protect this peer

    0        // If we have too many tx-relaying inbound peers, attempt to evict an existing one.
    

    mzumsande commented at 9:43 pm on October 19, 2023:
    Thanks, took your suggestion.
  19. in src/net.h:1364 in 8247eaa09d outdated
    1359@@ -1351,7 +1360,13 @@ class CConnman
    1360      */
    1361     bool AlreadyConnectedToAddress(const CAddress& addr);
    1362 
    1363-    bool AttemptToEvictConnection();
    1364+    /**
    1365+     * Try to find an inbound connection to evict when the node is full.
    


    amitiuttarwar commented at 0:58 am on October 17, 2023:
    0     * Attempt to evict an inbound connection. 
    

    unlike EvictTxPeerIfFull, this function doesn’t explicitly check “full”, right? there’s some implicit logic around min number of connections required for there to be unprotected peers, but that’s slightly different than max capacity?

    this comment is also on the function definition in net.cpp on master & I think we should remove it from there as well.


    mzumsande commented at 9:43 pm on October 19, 2023:
    True, that is something the caller has to check. I removed the notion of “full” from both net.h and net.cpp.
  20. in src/net_processing.cpp:3494 in 8247eaa09d outdated
    3488@@ -3489,6 +3489,16 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
    3489             }
    3490         }
    3491 
    3492+        // If we have too many tx-relaying inbound peers, attempt to evict one (not necessarily this one).
    3493+        // Only if this fails, disconnect this peer.
    3494+        if (pfrom.IsInboundConn()) {
    


    amitiuttarwar commented at 8:02 pm on October 17, 2023:

    I think we can strengthen the check here to prevent a iterating through all the nodes if the incoming peer doesn’t want tx relay enabled (and we are not offering NODE_BLOOM service)

    0        if (pfrom.IsInboundConn() && pfrom.m_relay_txs) {
    

    mzumsande commented at 9:43 pm on October 19, 2023:
    Good point - done.
  21. in src/net.h:1369 in 8247eaa09d outdated
    1365+     * Try to find an inbound connection to evict when the node is full.
    1366+     * @param[in] evict_tx_relay_peer Whether to only select full relay peers for eviction
    1367+     * @param[in] protect_peer        Protect peer with node id
    1368+     * @return                        True if a node was marked for disconnect
    1369+     */
    1370+    bool AttemptToEvictConnection(bool evict_tx_relay_peer, std::optional<NodeId> protect_peer);
    


    amitiuttarwar commented at 8:09 pm on October 17, 2023:
    0    bool AttemptToEvictConnection(bool evict_tx_relay_peer = false, std::optional<NodeId> protect_peer = std::nullopt);
    

    adding default values here would allow us not to have to touch the call site from CreateNodeFromAcceptedSocket & make more clear that the existing functionality shouldn’t change unless additional info is passed in.


    mzumsande commented at 9:44 pm on October 19, 2023:
    changed to use default values.
  22. in src/net.h:1209 in 8247eaa09d outdated
    1204+     *                                no need for eviction, or a peer was evicted).
    1205+     *                                Returns false, if we are full but couldn't find
    1206+     *                                a peer to evict (all eligible peers are protected)
    1207+     *                                so that the caller can deal with this.
    1208+     */
    1209+    bool EvictTxPeerIfFull(std::optional<NodeId> protect_peer);
    


    amitiuttarwar commented at 8:47 pm on October 17, 2023:
    I wonder if it could be clearer to call this function something like TxInboundCapacityAvailable. imo that seems more useful from the POV of the net processing caller, but either is ok with me.

    amitiuttarwar commented at 8:51 pm on October 17, 2023:

    default param here makes the FILTERLOAD/FILTERCLEAR callers slightly cleaner

    0    bool EvictTxPeerIfFull(std::optional<NodeId> protect_peer) = std::nullopt;
    

    mzumsande commented at 9:49 pm on October 19, 2023:

    changed to use default values.

    Not sure about the rename suggestion, I think I prefer EvictTxPeerIfFull, because I think for naming describing what the function does is more important than describing the return value. Otherwise other spots might call this function in the future with the intention to just perform a check, and then might be surprised about the side effects.


    amitiuttarwar commented at 10:19 pm on October 19, 2023:

    and then might be surprised about the side effects.

    yeah totally. I was trying to make the return values more self-evident, but on further thought, I agree with you that emphasizing eviction is more important - people can always read the doc string to understand the return value. can’t think of anything to concisely express HasOrMakeInboundTxCapacity, so leaving as is sounds good :)

  23. amitiuttarwar commented at 0:27 am on October 18, 2023: contributor

    overall, I like this approach significantly more. although we are introducing another touchpoint for inbound eviction logic, it is more straightforward to reason about. but the biggest improvement is not unnecessarily disconnecting a new or existing connection when tx relay isn’t being requested.

    reviewed this in-depth. the fundamental structure & logic looks good to me. left a lot of comments along the way 😛

    some additional thoughts:

    • commit message for ab5ad61f67f3bcab503ac3eb012c4f35d733b946 says “When we receive an inbound connection and don’t have space for another full-relay peer, we now attempt to evict specifically a full-relay inbound.” I think this should mention timing of processing the version message for clarity.
    • the order of events when processing the VERSION message makes sense, because we want to instantiate tx_relay for the before before triggering the new eviction logic (which uses that in the count).
    • this code strengthens the distinction between inbound-tx-relay connections from inbound-block-relay connections. ideally, this would be captured in separate connection types, but that would require more invasive changes since we don’t know the distinction at the time of accepting the connection. I don’t think the benefit would be worth the cost, but is something worth keeping in mind if we continue to build stronger distinctions between the two types of inbounds.
  24. mzumsande force-pushed on Oct 19, 2023
  25. mzumsande commented at 9:52 pm on October 19, 2023: contributor
    Thanks for the review @amitiuttarwar! i pushed an update, addressing all feedback.
  26. amitiuttarwar commented at 10:29 pm on October 19, 2023: contributor

    coauthor ack 9555a33b3d5ad9da32402b89b993729895165655

    the code looks good to me, based on the assumption that our research finds the increase in resource usage reasonable & safe

  27. achow101 referenced this in commit 82ea4e787c on Nov 7, 2023
  28. DrahtBot added the label Needs rebase on Nov 8, 2023
  29. mzumsande force-pushed on Nov 9, 2023
  30. mzumsande commented at 4:24 pm on November 9, 2023: contributor
    Rebased after #28464 was merged.
  31. DrahtBot removed the label Needs rebase on Nov 9, 2023
  32. 0xB10C commented at 4:17 pm on November 14, 2023: contributor

    I’ve been running this for about a week now. I currently have 161 inbound connections: 113 (maximum) full-relay connections and 48 block-relay-only connections.

    image

  33. 0xB10C commented at 4:18 pm on November 14, 2023: contributor

    In #28463 (comment) you mention that:

    [we] don’t want to significantly change the number of tx-relaying inbound peers, because we don’t want to radically change the memory and traffic requirements that come with running a full node with the default configuration.

    On multiple full-inbound master nodes I’m getting roughly an 80%/20% full-relay/block-relay-only inbound split. This makes sense as we currently open 8 full-relay outbound and 2 block-relay outbound connections (ignoring feelers here). That’s roughly 91 full-relay and 23 block-relay inbound connections. With the new maximum of 113 that an increase of 22 full-relay inbound slots being used and nearly an increase of 25%. I’d argue that this is a significant change.

  34. mzumsande commented at 4:02 pm on November 15, 2023: contributor

    I’d argue that this is a significant change.

    Hmm yes, in that comment I was thinking about the theoretical limit of all inbound slots being filled with tx-relaying peers. You are right, in practice that amount will be lower because some block-relay-only connections are made to us. In that sense, the introduction of block-relay-only peers likely came with a reduction in bandwidth for popular nodes that were already at capacity before.

    One way to accommodate this would be to reduce the percentage, so that there would only be ~90-100 slots for tx-relaying inbound peers instead of 113.

  35. DrahtBot added the label Needs rebase on Nov 28, 2023
  36. mzumsande force-pushed on Nov 28, 2023
  37. DrahtBot removed the label Needs rebase on Nov 28, 2023
  38. DrahtBot added the label CI failed on Nov 29, 2023
  39. mzumsande force-pushed on Dec 1, 2023
  40. mzumsande force-pushed on Dec 1, 2023
  41. DrahtBot removed the label CI failed on Dec 1, 2023
  42. mzumsande commented at 5:13 pm on December 29, 2023: contributor
    After discussing wit @amitiuttarwar I’ve now lowered the inbound percentage from 60% to 50% (leading to 95 instead of 113 slots for tx-relaying inbounds). This way, we’ll be more aligned with the typical number of tx-relaying inbounds today, because nodes today usually have a significant number of block-relay-only inbounds as @0xB10C noted. The previous number of 113 was aligned with the maximum number of tx-relaying inbounds.
  43. mzumsande force-pushed on Dec 29, 2023
  44. mzumsande marked this as ready for review on Dec 29, 2023
  45. DrahtBot added the label CI failed on Jan 17, 2024
  46. DrahtBot added the label Needs rebase on Jan 29, 2024
  47. mzumsande force-pushed on Feb 5, 2024
  48. mzumsande commented at 8:14 pm on February 5, 2024: contributor
    a9a64f9 to 77a612d: rebased
  49. DrahtBot removed the label Needs rebase on Feb 5, 2024
  50. DrahtBot removed the label CI failed on Feb 5, 2024
  51. DrahtBot added the label Needs rebase on Feb 27, 2024
  52. mzumsande force-pushed on Feb 27, 2024
  53. DrahtBot removed the label Needs rebase on Feb 27, 2024
  54. DrahtBot added the label CI failed on Jul 25, 2024
  55. DrahtBot commented at 1:40 am on July 25, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/22038352373

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  56. mzumsande force-pushed on Jul 25, 2024
  57. mzumsande commented at 3:00 pm on July 25, 2024: contributor
    6490355 to 4963e31: Rebased and added __file__ to functional test (see #30463).
  58. DrahtBot removed the label CI failed on Jul 25, 2024
  59. mzumsande force-pushed on Aug 5, 2024
  60. mzumsande force-pushed on Aug 5, 2024
  61. mzumsande commented at 6:19 pm on August 5, 2024: contributor

    related doc should also be updated?

    Good point, thanks! I added a commit that updates these spots. This would obviously also need release notes, but those could be added at a later point.

  62. mzumsande force-pushed on Aug 5, 2024
  63. DrahtBot commented at 9:11 am on September 6, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/28368043818

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  64. DrahtBot added the label CI failed on Sep 6, 2024
  65. in doc/reduce-traffic.md:8 in 13a51962e8 outdated
     2@@ -3,8 +3,9 @@ Reduce Traffic
     3 
     4 Some node operators need to deal with bandwidth caps imposed by their ISPs.
     5 
     6-By default, Bitcoin Core allows up to 125 connections to different peers, 11 of
     7-which are outbound. You can therefore, have at most 114 inbound connections.
     8+By default, Bitcoin Core allows up to 200 connections to different peers, 11 of
     9+which are outbound. You can therefore, have at most 189 inbound connections, half of
    10+which can only be taken up by low-traffic block-relay-only peers.
    


    naiyoma commented at 2:13 pm on September 14, 2024:
    nit: I’m trying to understand something. According to the PR description, 50% of inbound slots are allocated for tx-relaying peers, but the commit message seems to imply that 50% are for block-relay-only peers. Does this mean that the inbound slots are equally split between tx-relaying and block-relay-only peers? If so, could the description be expanded for clarity?

    mzumsande commented at 10:33 pm on September 17, 2024:

    What is meant is that half of the slots can only be accessed by block-relay-only peers, and the other half is accessible to both: “with 50% of inbound slots accessible for tx-relaying peers” and “half of which can only be taken up by low-traffic block-relay-only peers” are therefore both saying the same thing:

    If we’d have 0 tx-relay inobund peers, in theory all inbound slots could be filled up with block-relay-only peers, whereas if we had 0 block-relay-only inbound peers, still only half of the slots could be filled with tx-relay peers.

  66. naiyoma commented at 5:01 pm on September 15, 2024: contributor
    Concept ACK to increase the number of maximum connections to 200, enabling an increase in outgoing block relay connections later on.
  67. net: add options to AttemptToEvictConnection
    If the added bool evict_tx_relay_peers set, non-tx-relay
    peers will be exempt from disconnection.
    Also allow to specify a peer that exempt from eviction.
    Both options will be used in later commits.
    
    Co-authored-by: Amiti Uttarwar <amiti@uttarwar.org>
    d794e5157b
  68. net: increase inbound capacity for block-relay-only connections
    ..and adjust the eviction logic.
    The new default max connection number is 200, the default maximum of tx-relaying
    inbounds is limited to 50% of all inbound connections.
    With 11 outbound connections, that is (200 - 11) * 0.5 = 94.5.
    As a result, the tx-related maximum traffic should not change
    drastically.
    
    When we receive an inbound connection and don't have space for another
    full-relay peer, we now attempt to evict specifically a full-relay inbound
    after receiving the version message of the new peer.
    
    Once this commit is widely deployed, the added inbound capacity will
    allow us to increase the number of outgoing block-relay-only connections.
    
    Co-authored-by: Amiti Uttarwar <amiti@uttarwar.org>
    7d53b0672c
  69. test: add functional test for inbound maxconnection limits
    Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
    d489f6d701
  70. init: make inbound tx relay percentage configurable
    Permit users to change the amount of inbounds that are permitted to relay
    transactions. This is particularly relevant to ensure that superusers that are
    not concerned with resource usage are not artificially restricted from offering
    many transaction relay slots to the network.
    
    Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
    7e84285532
  71. p2p: trigger possible eviction if we support bloom filters and change a peer to tx relay
    Co-authored-by: Amiti Uttarwar <amiti@uttarwar.org>
    d5fd2af5b7
  72. doc: Update docs that refer to -maxconnections dd2721f7ad
  73. mzumsande force-pushed on Sep 17, 2024
  74. mzumsande commented at 10:36 pm on September 17, 2024: contributor
    Rebased for cmake, and fixed silent conflicts with #30750.
  75. DrahtBot removed the label CI failed on Sep 17, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-11 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me