p2p: attempt to fill full outbound connection slots with peers that support tx relay #28538

pull mzumsande wants to merge 3 commits into bitcoin:master from mzumsande:202309_fullob_to_blocksonly changing 6 files +73 −20
  1. mzumsande commented at 7:13 pm on September 26, 2023: contributor

    As described in the issues #16418 and #28371, it’s a possibility that we could end up having an insufficient number of outbound peers relaying our transactions. Having fewer outbound peers support tx-relay could also have implications on privacy and fee estimation.

    While #28488 is suggesting meaures based on comparing fee filters / mempool sizes, there is also the simpler issue of peers that tell us they don’t want transactions, which is determined by the fRelay field in the version msg. If such a peer runs bitcoin core, that usually means that it’s running in -blocksonly mode and accepting inbound connections. The status quo is that we will not avoid these peers and just not send them any transactions.

    This PR proposes not to waste any of our 8 full-relay outbound slots on these peers (unless we are in -blocksonly mode ourselves). Using them as outbound peers for block-relay-only connections is fine though and not impacted.

    As was suggested by ajtowns below, we don’t disconnect during version processing, but later during the regularly scheduled EvictExtraOutboundPeers() task, and only if we are at the maximum of full-outbound peers.

    If reviewers think that this proposal is too aggressive, an alternative solution (with a little more complexity) would be to restrict the maximum number of outbound connections to -blocksonly peers to 1 or 2. Although I currently think that it’s ok to be selective with outbound connections, so I didn’t implement that right away.

  2. DrahtBot commented at 7:13 pm on September 26, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK stratospher
    Concept ACK glozow, sr-gi
    Stale ACK naumenkogs, amitiuttarwar

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. DrahtBot added the label P2P on Sep 26, 2023
  4. mzumsande marked this as a draft on Sep 26, 2023
  5. DrahtBot added the label CI failed on Sep 26, 2023
  6. brunoerg referenced this in commit 2f500da12b on Sep 26, 2023
  7. mzumsande force-pushed on Sep 26, 2023
  8. mzumsande marked this as ready for review on Sep 26, 2023
  9. DrahtBot removed the label CI failed on Sep 26, 2023
  10. in src/net_processing.cpp:3334 in 6fa1a219db outdated
    3329@@ -3330,6 +3330,15 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
    3330         }
    3331         if (!vRecv.empty())
    3332             vRecv >> fRelay;
    3333+
    3334+        // Don't waste any of our full inbound slots on peers that don't want tx relay (because they might be in -blocksonly mode)
    


    naumenkogs commented at 7:28 am on September 27, 2023:
    inbound -> outbound

    mzumsande commented at 1:29 pm on September 27, 2023:
    oops - fixed!
  11. naumenkogs approved
  12. naumenkogs commented at 7:33 am on September 27, 2023: member

    Concept ACK.

    I think this measure makes perfect sense, it’s easy and immediately approaches the mentioned issues.

    I would be fine with restricting this (as suggested by the OP message) too.

  13. mzumsande force-pushed on Sep 27, 2023
  14. DrahtBot added the label CI failed on Sep 27, 2023
  15. DrahtBot removed the label CI failed on Sep 27, 2023
  16. in src/net_processing.cpp:3334 in 0034a8fa9b outdated
    3329@@ -3330,6 +3330,15 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
    3330         }
    3331         if (!vRecv.empty())
    3332             vRecv >> fRelay;
    3333+
    3334+        // Don't waste any of our full outbound slots on peers that don't want tx relay (because they might be in -blocksonly mode)
    


    brunoerg commented at 2:15 pm on September 28, 2023:

    In 82e7a7dc3b4447e4affe3a98e1af8b478aa4dc04: Perhaps we could mention on this comment that it doesn’t apply in case we’re running in -blocksonly mode.

    0        // Don't waste any of our full outbound slots on peers that don't want tx relay (because they might be in -blocksonly mode) unless we are in -blocksonly mode.
    

    mzumsande commented at 10:14 pm on October 4, 2023:
    I have included this info in the doc for the new spot!
  17. in src/net_processing.cpp:3337 in 0034a8fa9b outdated
    3329@@ -3330,6 +3330,15 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
    3330         }
    3331         if (!vRecv.empty())
    3332             vRecv >> fRelay;
    3333+
    3334+        // Don't waste any of our full outbound slots on peers that don't want tx relay (because they might be in -blocksonly mode)
    3335+        if (!fRelay && !m_opts.ignore_incoming_txs && pfrom.IsFullOutboundConn())
    3336+        {
    3337+            LogPrint(BCLog::NET, "peer=%d (full-relay outbound) does not participate in transaction relay; disconnecting\n", pfrom.GetId());
    


    ajtowns commented at 6:13 am on September 29, 2023:

    Rather than doing this immediaely upon connection (ie, after receiving their VERSION message), would it be better to do it only when we have all our outbound slots filled up (ie, as part of EvictExtraOutboundPeers) ?

    Also, doing the check repeatedy would mean that we could keep blocks-only peers while we’re doing IBD, only dropping them when we exit IBD and start caring about tx relay.


    mzumsande commented at 8:59 pm on September 29, 2023:

    Rather than doing this immediaely upon connection (ie, after receiving their VERSION message), would it be better to do it only when we have all our outbound slots filled up (ie, as part of EvictExtraOutboundPeers) ?

    I like that idea. If for some strange reason we wouldn’t be able to find any other peers we wouldn’t end up connecting the only ones we have. But I think we would want to start evicting if we are at the limit (so in CheckForStaleTipAndEvictPeers), not just if we have exceeded it (EvictExtraOutboundPeers).

    Also, doing the check repeatedy would mean that we could keep blocks-only peers while we’re doing IBD, only dropping them when we exit IBD and start caring about tx relay.

    Do we really want that? The main reason to run in -blocksonly mode is to save traffic. I would assume that most nodes wouldn’t enable listening anyway, but those that do (maybe not even on purpose), maybe it would be nicer not to use them for IBD?


    naumenkogs commented at 10:17 am on October 2, 2023:

    start evicting if we are at the limit (so in CheckForStaleTipAndEvictPeers), not just if we have exceeded it (EvictExtraOutboundPeers).

    I agree.

    Do we really want that? The main reason to run in -blocksonly mode is to save traffic. I would assume that most nodes wouldn’t enable listening anyway, but those that do (maybe not even on purpose), maybe it would be nicer not to use them for IBD?

    Not sure about this. You seem to care specifically about those users which do -blocksonly and forget to disable inbounds. Perhaps this is better communicated through documentation and logging. I think we should leave them a chance to strengthen the block relaying network.


    amitiuttarwar commented at 0:24 am on October 3, 2023:

    +1 to having the logic in eviction instead of connection

    RE IBD: it seems odd / logically inconsistent to run a node in -blocksonly and enable inbounds, so I don’t think it’s worth overly optimizing for the use case. If we want to improve the user experience, we could have a soft param interaction between the two (independent of these changes)


    mzumsande commented at 10:13 pm on October 4, 2023:

    it seems odd / logically inconsistent to run a node in -blocksonly and enable inbounds, so I don’t think it’s worth overly optimizing for the use case

    Maybe there is a use case for wanting to help the network by providing inbound capacity, but still minimize traffic at the same time? Although then it often doesn’t make a lot of sense to run a non-pruned node - unless you also need the entire chain for selfish purposes (e.g. txindex). In any case, I have included the suggestion to not evict while in ibd in the latest push!


    ajtowns commented at 0:38 am on October 5, 2023:

    Do we really want that? The main reason to run in -blocksonly mode is to save traffic.

    In that case you’d be better off doing -maxuploadtarget either instead of or in addition to -blocksonly?

    it seems odd / logically inconsistent to run a node in -blocksonly and enable inbounds,

    I think it would make sense to expect that if 100% (or 99.9%) of nodes were -blocksonly that they would still manage to propagate blocks sensibly, bring up new nodes efficiently, etc. In theory with this PR, we would also be able to propagate txs successfully in the 99.9% -blocksonly case.

  18. amitiuttarwar commented at 0:25 am on October 3, 2023: contributor
    concept ACK
  19. mzumsande force-pushed on Oct 4, 2023
  20. mzumsande commented at 10:07 pm on October 4, 2023: contributor
    I’ve now changed the approach to @ajtowns suggestion to disconnecting during eviction.
  21. mzumsande renamed this:
    p2p: don't make full outbound connections to peers that don't support tx relay
    p2p: attempt to fill full outbound connection slots with peers that support tx relay
    on Oct 4, 2023
  22. in test/functional/test_framework/p2p.py:381 in 831ed4960b outdated
    377         return create_conn
    378 
    379     def peer_accept_connection(self, *args, services=P2P_SERVICES, **kwargs):
    380+        txrelay = True
    381+        if 'txrelay' in kwargs:
    382+            txrelay = kwargs.pop('txrelay')
    


    sr-gi commented at 9:09 pm on October 10, 2023:

    nit: Dict::pop can be provided a default. This could be simplified to:

    0        txrelay = kwargs.pop('txrelay', True)
    

    mzumsande commented at 10:12 pm on October 16, 2023:
    done, thanks
  23. in test/functional/test_framework/p2p.py:362 in 831ed4960b outdated
    359         vt.strSubVer = P2P_SUBVERSION
    360-        vt.relay = P2P_VERSION_RELAY
    361+        if txrelay:
    362+            vt.relay = P2P_VERSION_RELAY
    363+        else:
    364+            vt.relay = 0
    


    sr-gi commented at 9:13 pm on October 10, 2023:

    nit: This can be simplified using the ternary operator

    0        vt.relay = P2P_VERSION_RELAY if txrelay else 0
    

    But also, it may be just simpler to give txrelay a default value in the function signature:

    0peer_connect_send_version(self, services, txrelay=0)
    

    mzumsande commented at 10:12 pm on October 16, 2023:
    used the ternary, I think having a default arg is not ideal here because I’m not sure what should be the default and why.
  24. in test/functional/p2p_blocksonly.py:90 in 831ed4960b outdated
    85+        blocksonly_peer.sync_with_ping()
    86+        blocksonly_peer.peer_disconnect()
    87+
    88+        self.log.info("Check that we evict full-relay outbound connections to blocksonly peers at the full outbound limit")
    89+        # Simulate the situation of being at the max limit of full-outbound connections by setting -maxconnections=1
    90+        self.restart_node(0, ["-maxconnections=1"])
    


    sr-gi commented at 9:33 pm on October 10, 2023:
    I think we should make at least two connections here, otherwise we are not making sure that we are actually not disconnecting all our outbound instead of just evicting enough to go bellow the limit

    mzumsande commented at 10:15 pm on October 16, 2023:
    done, added another connection that is not being disconnected.
  25. in src/net.cpp:2367 in 831ed4960b outdated
    2410@@ -2410,7 +2411,7 @@ int CConnman::GetExtraFullOutboundCount() const
    2411             }
    2412         }
    2413     }
    2414-    return std::max(full_outbound_peers - m_max_outbound_full_relay, 0);
    2415+    return full_outbound_peers - m_max_outbound_full_relay;
    


    sr-gi commented at 9:39 pm on October 10, 2023:
    Do we need to make this change though? It doesn’t look like negative numbers are being used anyway, only whether we are above, or at, zero.

    ajtowns commented at 6:05 am on October 12, 2023:
    Negative numbers mean neither of those code blocks are executed; if it was max{...,0} then the “at zero” code block would be executed when it shouldn’t be.

    sr-gi commented at 2:15 pm on October 12, 2023:

    Right, because we want to execute code for the case in which we are exactly at the limit, and being under the limit will be mapped to the same value. My bad.

    I have to say though that, after this change, the function name becomes a bit confusing.


    ajtowns commented at 7:40 pm on October 12, 2023:

    I have to say though that, after this change, the function name becomes a bit confusing.

    I had a quick look at changing it to return full_outbound_peers and doing the comparison with m_max_outbound_full_relay in the caller. Would require exposing the latter value to net_processing, but otherwise seems like it works. Wasn’t convinced it was enough of an improvement to bother though.


    mzumsande commented at 10:19 pm on October 16, 2023:
    I could add this if reviewers prefer it. I also thought about renaming, but couldn’t really think of a better name than “GetExtraFullOutboundCount” that would imply negative return values.

    amitiuttarwar commented at 0:02 am on December 12, 2023:
    maybe GetFullOutboundDelta ?

    mzumsande commented at 9:50 pm on December 12, 2023:
    I like that name! I missed your comment in the most recent push, but will change it tomorrow!

    mzumsande commented at 7:08 pm on December 14, 2023:
    done now!
  26. in src/net_processing.cpp:5244 in f3b74d4382 outdated
    5246@@ -5246,6 +5247,28 @@ void PeerManagerImpl::EvictExtraOutboundPeers(std::chrono::seconds now)
    5247             }
    5248         }
    5249     }
    5250+
    5251+    // If we are at the max full outbound limit but not above, check if any peers
    


    glozow commented at 1:12 pm on October 11, 2023:

    nit f3b74d43823a038fb0aa4e0bcb7d4f99a7f90688:

    The 2 logical paths depending on the value of extra_outbound_count could be more clear if this were an if/else if block.


    mzumsande commented at 10:11 pm on October 16, 2023:
    changed to else if
  27. in test/functional/test_framework/test_node.py:676 in 831ed4960b outdated
    672@@ -673,6 +673,7 @@ def add_p2p_connection(self, p2p_conn, *, wait_for_verack=True, **kwargs):
    673     def add_outbound_p2p_connection(self, p2p_conn, *, wait_for_verack=True, p2p_idx, connection_type="outbound-full-relay", **kwargs):
    674         """Add an outbound p2p connection from node. Must be an
    675         "outbound-full-relay", "block-relay-only", "addr-fetch" or "feeler" connection.
    676+        The txrelay arg determines whether our peer will signal support for txrelay in their version message.
    


    glozow commented at 1:15 pm on October 11, 2023:
    831ed4960bfb1eafb763cb2a52201a97a17f8168: seems like this should be in the peer_connect_send_version function docstring?

    mzumsande commented at 10:12 pm on October 16, 2023:
    Removed the doc, since it’s hidden in the kwargs anyway (It wasn’t when I added it in an earlier revision). Within peer_connect_send_version it’s kind of self-explanatory, I had added it here to avoid confusion to which side of the connection txrelay refers to.
  28. glozow commented at 1:17 pm on October 11, 2023: member
    concept/approach ACK
  29. sr-gi commented at 3:10 pm on October 11, 2023: member
    Concept ACK, just some nits here and there
  30. in src/net_processing.cpp:5256 in 831ed4960b outdated
    5258+        m_connman.ForEachNode([&node_to_evict](CNode* node) {
    5259+            if (!node->IsFullOutboundConn() || node->m_relays_txs) return;
    5260+            if (!node_to_evict.has_value() || *node_to_evict < node->GetId()) {
    5261+                node_to_evict = node->GetId();
    5262+            }
    5263+        });
    


    sr-gi commented at 5:44 pm on October 11, 2023:
    What if there is, at least, a node already flagged to be disconnected? Shouldn’t we skip this altogether? Otherwise, it could be the case that we end up evicting more nodes that we really need to

    sr-gi commented at 5:57 pm on October 11, 2023:
    Nvm, ForEachNode already checks NodeFullyConnected

    mzumsande commented at 10:13 pm on October 16, 2023:
    Yes, that’s why I didn’t include the check here (the similar checks a few lines above aren’t necessary either)

    naumenkogs commented at 12:03 pm on December 6, 2023:
    Assume fDisconnect could be useful here too. Just in case ForEachNode gets updated.

    amitiuttarwar commented at 0:01 am on December 12, 2023:
    updating ForEachNode to no longer check NodeFullyConnected seems pretty intrusive & like all callers would need to be carefully checked, so imo it doesn’t seem super valuable to repeat the check here. but cost of redundant check is also pretty low so doesn’t seem like a big deal either way.

    mzumsande commented at 9:01 pm on December 12, 2023:
    I also don’t see that as a real risk, but I do see the problem that whoever reads this code (or decides to use ForEachNode somewhere else) could easily get confused, especially since some callers check again for fDisconnect. I added a refactor commit renaming ForEachNode() to ForEachFullyConnectedNode() and removing the duplicate checks / comments. What do you think about that approach?

    naumenkogs commented at 8:20 am on December 13, 2023:

    To be clear, i suggested literally having Assume(fDisconnect), not an extra if check (that would actually make code worse).

    Even if you refactor it into ForEachFullyConnectedNode, I think this Assume(fDisconnect) is kinda useful — because who knows what exactly ForEachFullyConnectedNode means :)

    But I’m fine either way, either you rename or add Assume (both are ideal i think :))


    amitiuttarwar commented at 8:59 pm on December 13, 2023:

    I added a refactor commit renaming ForEachNode() to ForEachFullyConnectedNode() and removing the duplicate checks / comments.

    I like it, I find it clear

  31. mzumsande force-pushed on Oct 16, 2023
  32. mzumsande commented at 10:18 pm on October 16, 2023: contributor

    Pushed to address feedback.

    I also removed a sub-test with a block-relay-only peer, there isn’t really a reason to think that that peer would be disconnected anyway, and it didn’t make much sense in that form because we weren’t at any connection limit.

  33. mzumsande force-pushed on Nov 7, 2023
  34. in src/net_processing.cpp:5252 in 7ebf72b655 outdated
    5246@@ -5246,6 +5247,27 @@ void PeerManagerImpl::EvictExtraOutboundPeers(std::chrono::seconds now)
    5247             }
    5248         }
    5249     }
    5250+    // If we are at the max full outbound limit but not above, check if any peers
    5251+    // don't support tx relay (they might be in -blocksonly mode) - if so,
    5252+    // evict the newest of them so we can fill the slot with a tx-relaying outbound peer
    


    naumenkogs commented at 12:02 pm on December 6, 2023:
    is the newest enforced by ForEachNode iterating? Perhaps make this requirement more clear? Perhaps even add an assert for m_connected before node_to_evict = node->GetId();

    mzumsande commented at 6:28 pm on December 7, 2023:
    It is enforced by the line if (!node_to_evict.has_value() || *node_to_evict < node->GetId()) { (second condition), not through m_connected but by picking the eligible node with the highest NodeId. I’ll add to the comment that “newest” means “highest Node Id”, do you think that would be sufficient?

    naumenkogs commented at 10:04 am on December 8, 2023:
    sounds good
  35. in test/functional/p2p_blocksonly.py:104 in cd7c42e64b outdated
     95+        blocksonly_peer2.wait_for_disconnect()
     96+        blocksonly_peer1.sync_with_ping()
     97+        blocksonly_peer1.peer_disconnect()
     98+
     99+        self.log.info("Check that we don't evict full-relay outbound connections to blocksonly peers if we are blocksonly ourselves")
    100+        self.restart_node(0, ["-blocksonly", "-maxconnections=1"])
    


    naumenkogs commented at 12:06 pm on December 6, 2023:

    Not sure why anyone would do maxconnections=1 in practice, but then if their firewall allows to find only one blocksonly node (and they are not in blocksonly), the behavior might be confusing: they will be frequently disconencted from this peer.

    Perhaps, this could be logged if maxconnections=1


    mzumsande commented at 6:52 pm on December 7, 2023:
    I agree that it makes no sense, if you want just one connection you should pick a node you trust and do -connect. Using automatic connections for that means eclipsing yourself - maybe we should warn about that in general? That being said, this PR does log disconnection to -blocksonly peers, although only in BCLog::NET - do you suggest to make it unconditional based on -maxconnections?

    naumenkogs commented at 10:05 am on December 8, 2023:
    the idea was to log if maxconnect is set to 1, so at the startup (setting this argument) time. I don’t insist though.

    amitiuttarwar commented at 11:48 pm on December 11, 2023:
    imo users should get an error if they set -maxconnections to less than 8, or at least a “this is unsafe. are you SURE?!”. but that’s probably getting off topic here :)

    mzumsande commented at 9:08 pm on December 12, 2023:
    I’d agree with adding this warning, but yeah, this PR is not the best place for it.
  36. mzumsande force-pushed on Dec 7, 2023
  37. naumenkogs commented at 10:07 am on December 8, 2023: member

    ACK 840a022

    Not sure if you noticed this idea, might be a worthy precaution.

  38. DrahtBot requested review from glozow on Dec 8, 2023
  39. DrahtBot requested review from amitiuttarwar on Dec 8, 2023
  40. DrahtBot requested review from sr-gi on Dec 8, 2023
  41. in test/functional/p2p_blocksonly.py:81 in 840a022700 outdated
    77@@ -78,6 +78,31 @@ def blocksonly_mode_tests(self):
    78         self.nodes[0].disconnect_p2ps()
    79         self.generate(self.nodes[0], 1)
    80 
    81+        self.log.info("Check that we sustain full-relay outbound connections to blocksonly peers when not at the full oubound limit")
    


    amitiuttarwar commented at 10:50 pm on December 11, 2023:
    0        self.log.info("Check that we sustain full-relay outbound connections to blocksonly peers when not at the full outbound limit")
    

    amitiuttarwar commented at 11:39 pm on December 11, 2023:

    I don’t think this is testing the expected behavior. The test node is currently started up in blocksonly mode, so won’t enter the new conditional in EvictExtraOutboundPeers regardless of the extra_outbound_count check. To get the test to fail, I added a restart. So right now, this 1st new test and the 3rd new test are the same thing.

    My suggestion is to break these 3 new tests into a separate subtest that restarts with -noblocksonly to help make it clear what the preconditions are.

    diff that edits the condition to get the existing test to fail, and then adds the restart so it actually fails:

     0diff --git a/src/net_processing.cpp b/src/net_processing.cpp
     1index b4f3dbc0ba..a29e296a16 100644
     2--- a/src/net_processing.cpp
     3+++ b/src/net_processing.cpp
     4@@ -5252,7 +5252,7 @@ void PeerManagerImpl::EvictExtraOutboundPeers(std::chrono::seconds now)
     5     // evict the newest of them (highest Node Id) so we can fill the slot with a tx-relaying outbound peer
     6     // This is not done if we are in -blocksonly mode ourselves or still in IBD,
     7     // because then we don't care if our peer supports tx relay.
     8-    else if (extra_outbound_count == 0 && !m_opts.ignore_incoming_txs && !m_chainman.IsInitialBlockDownload()) {
     9+    else if (extra_outbound_count <= 0 && !m_opts.ignore_incoming_txs && !m_chainman.IsInitialBlockDownload()) {
    10         std::optional<NodeId> node_to_evict;
    11         m_connman.ForEachNode([&node_to_evict](CNode* node) {
    12             if (!node->IsFullOutboundConn() || node->m_relays_txs) return;
    13diff --git a/test/functional/p2p_blocksonly.py b/test/functional/p2p_blocksonly.py
    14index 5b455a7e3a..c2fd395d68 100755
    15--- a/test/functional/p2p_blocksonly.py
    16+++ b/test/functional/p2p_blocksonly.py
    17@@ -79,6 +79,7 @@ class P2PBlocksOnly(BitcoinTestFramework):
    18         self.generate(self.nodes[0], 1)
    19
    20         self.log.info("Check that we sustain full-relay outbound connections to blocksonly peers when not at the full oubound limit")
    21+        self.restart_node(0, ["-noblocksonly"])
    22         blocksonly_peer = self.nodes[0].add_outbound_p2p_connection(P2PInterface(), p2p_idx=1, txrelay=False)
    23         # Trigger CheckForStaleTipAndEvictPeers, which is scheduled every 45s (EXTRA_PEER_CHECK_INTERVAL)
    24         self.nodes[0].mockscheduler(46)
    

    mzumsande commented at 9:02 pm on December 12, 2023:
    Good catch, I missed that we are already in blocksonly mode! I moved the added tests to a new blocksonly_peer_tests subtest as suggested.
  42. amitiuttarwar commented at 11:55 pm on December 11, 2023: contributor

    code looks good to me. left a comment about the test that seems worth fixing, then I’m ready to ACK the patch.

    trying to think about any potential negative network effects that could occur, but so far it seems fine to disconnect OB full relay peers that signal they don’t want to receive transactions.

  43. mzumsande force-pushed on Dec 12, 2023
  44. mzumsande commented at 9:10 pm on December 12, 2023: contributor

    840a022 to d636e38: Addressed feedback, and also added a new commit to rename ForEachNode and remove duplicate checks, see #28538 (review).

    ToDo: address #28538 (review)

  45. mzumsande force-pushed on Dec 12, 2023
  46. in test/functional/p2p_blocksonly.py:82 in 70782ab70d outdated
    78@@ -78,6 +79,34 @@ def blocksonly_mode_tests(self):
    79         self.nodes[0].disconnect_p2ps()
    80         self.generate(self.nodes[0], 1)
    81 
    82+    # Tests in which our peer is blocksonly
    


    amitiuttarwar commented at 8:44 pm on December 13, 2023:
    hm, I think this comment can be a bit misleading for people who don’t readily have the full context. the setup tests peers who set the version relay bool to false, indicating they don’t want tx messages. if they are running bitcoind, that means they are running in blocksonly mode, but from the POV of the tests we don’t actually know that. I think it could be helpful to clarify that in the comment here, and then can continue to use the var naming of “blocksonly” in the test.

    mzumsande commented at 7:06 pm on December 14, 2023:
    done
  47. amitiuttarwar commented at 9:00 pm on December 13, 2023: contributor
    ACK d636e38d79a4c3950da91090b1f787163f11e24d
  48. DrahtBot requested review from naumenkogs on Dec 13, 2023
  49. mzumsande force-pushed on Dec 14, 2023
  50. mzumsande commented at 7:07 pm on December 14, 2023: contributor
    d636e38 to 2a17ac3: Renamed GetExtraFullOutboundCount to GetFullOutboundDelta since it can now return negative values, plust small doc change in the test. 2a17ac3 to ad4866: Rebased due to conflict.
  51. p2p: Evict outbound connections to -blocksonly peers when at full outbound maximum
    ...unless we are in -blocksonly mode ourselve or in ibd.
    If we had too many outbound peers not participating in tx relay, we would run
    into problems with getting own transaction relayed. This would also be
    bad for privacy and fee estimation.
    
    We evict non-txrelay peers only when all full outbound slots
    are filled to have peers in hypothetical situations where
    -blocksonly peers are the only ones available.
    
    Idea for this approach by ajtowns.
    1393bf67b2
  52. test: add test for interaction with -blocksonly peers 68769ff7c3
  53. mzumsande force-pushed on Dec 14, 2023
  54. amitiuttarwar commented at 10:34 pm on December 14, 2023: contributor
    ACK ad48667ec8e8d563550df768d0b45abf800662d9
  55. in src/net_processing.cpp:5250 in 1393bf67b2 outdated
    5241@@ -5241,6 +5242,27 @@ void PeerManagerImpl::EvictExtraOutboundPeers(std::chrono::seconds now)
    5242             }
    5243         }
    5244     }
    5245+    // If we are at the max full outbound limit but not above, check if any peers
    5246+    // don't support tx relay (they might be in -blocksonly mode) - if so,
    5247+    // evict the newest of them (highest Node Id) so we can fill the slot with a tx-relaying outbound peer
    5248+    // This is not done if we are in -blocksonly mode ourselves or still in IBD,
    5249+    // because then we don't care if our peer supports tx relay.
    5250+    else if (full_outbound_delta == 0 && !m_opts.ignore_incoming_txs && !m_chainman.IsInitialBlockDownload()) {
    


    naumenkogs commented at 9:27 am on December 18, 2023:

    1393bf67b26c000483c750258ca9d1daeab8a2cb

    not super clear why we won’t do this if full_outbound_delta > 0? Maybe a comment explaining why this could be possible in the first place could help (probably above the first if block). Even now it’s hard to tell if it’s possible to be over the limit most-of-the-time, let alone if we add some new complexity for being over the limit more frequently (for some security reason or whatever).


    mzumsande commented at 8:19 pm on January 3, 2024:
    Hmm, I think the main reason was to avoid interaction with the existing extra-full-outbound eviction. Which raises the question whether there should be any interaction? Should nodes that are protected from that (either because m_protect is true or because they are the only ones for their network) also be made exempt from this new disconnection due to not supporting tx relay? I’m not really sure about that: It would be unfortunate if the non-tx-relaying outbound peer would be the only one with the best chain, but how realistic would such a scenario be?

    sr-gi commented at 7:48 pm on January 5, 2024:

    not super clear why we won’t do this if full_outbound_delta > 0?

    Which raises the question whether there should be any interaction?

    I think this would create a weird interaction. The current approach of this PR is trying to make the 8 full-outbound connections relay transactions, given we already have blocks-only connections, so we should not waste full-outbound slots for this purpose. However, our full-outbound connection protection criteria is based on whether they have provided valid blocks with enough accumulated work and it has nothing to do with how good they are at providing transactions (there is also the recently added network diversity criteria, more about that later).

    This raises the question of what we value the most, having 8 full-outbound peers relaying transactions or our protected peers (and under what criteria). If the former is valued the most, we could merge this under full_outbound_delta > 0 but pick a node not taking the protection criteria into account (which raises the question of how useful the protection is to start with). If the latter is prioritized, then we exclude the protected peers first and then run the eviction over the remaining. This slightly changes the goal of the PR, given we may not achieve 8 transaction-relaying outbound peers. Finally, there are the additional protected peers for network diversity. Depending on how this is treated we may end up with an even smaller number of full-outbounds relaying transactions (8 - 4 - n where n represents the number of networks we support).


    mzumsande commented at 8:19 pm on January 5, 2024:

    Yes, I agree - I think that there are colliding goals, my thinking is as follows:

    1. The m_protect protection attempts to prevent pathological situations where we wouldn’t have enough peers with the honest best chain (that shouldn’t happen normally even without the protection, but would be very serious if they did)
    2. The network-specific protection aims to be connected to all networks we support, to help the interconnection, and also to reduce the risk of eclipse attacks. Ideally, that means relaying both blocks and transactions from one network to another.
    3. The added code attempts to make sure we have as many full-outbound peers actually participating in tx relay as possible, mostly for selfish reasons (e.g. privacy, fee estimation)

    These goals conflict. I’m pretty confident that 3) should take precedence over 2), because the network-specific protection isn’t really about this particular peer, we just want any connection to that network, and would be happy to exchange a current peer that doesn’t support tx-relay with a tx-relaying peer from that same network.

    I’m kind of torn between 1) and 2). It would be easy to exempt non-tx-relaying m_protect peers from disconnection, but that would mean that we could end up with up to 4 of these peers permanently.


    stratospher commented at 4:35 pm on January 9, 2024:

    It would be easy to exempt non-tx-relaying m_protect peers from disconnection, but that would mean that we could end up with up to 4 of these peers permanently.

    commit message where m_protect was introduced says “we pick 4 of our outbound peers and do not subject them to this logic, to be more conservative. We don’t wish to permit temporary network issues (or an attacker) to excessively disrupt network topology.”

    wondering how someone can disrupt network topology with this and if it’s still relevant.

  56. in test/functional/p2p_blocksonly.py:92 in 68769ff7c3 outdated
    87+        # Trigger CheckForStaleTipAndEvictPeers, which is scheduled every 45s (EXTRA_PEER_CHECK_INTERVAL)
    88+        self.nodes[0].mockscheduler(46)
    89+        blocksonly_peer.sync_with_ping()
    90+        blocksonly_peer.peer_disconnect()
    91+
    92+        self.log.info("Check that we evict full-relay outbound connections to blocksonly peers at the full outbound limit")
    


    naumenkogs commented at 9:32 am on December 18, 2023:

    68769ff7c3252c7cfb8758459218979eacba7acb

    You could expand this test by adding some irrelevant peers (e.g. ADDRFETCH), but no big deal.


    mzumsande commented at 8:21 pm on January 5, 2024:
    I tried it, but it turned out to be annoying because in the test we simulate being at the full outbound limit by artificially running with a lower -maxconnections. If we now also add an unrelated peer like ADDRFETCH , it would make the test fail because we don’t have enough slots in semOutbound.
  57. in src/net_processing.cpp:2026 in ad48667ec8 outdated
    2019@@ -2020,7 +2020,7 @@ void PeerManagerImpl::NewPoWValidBlock(const CBlockIndex *pindex, const std::sha
    2020         m_most_recent_block_txs = std::move(most_recent_block_txs);
    2021     }
    2022 
    2023-    m_connman.ForEachNode([this, pindex, &lazy_ser, &hashBlock](CNode* pnode) EXCLUSIVE_LOCKS_REQUIRED(::cs_main) {
    2024+    m_connman.ForEachFullyConnectedNode([this, pindex, &lazy_ser, &hashBlock](CNode* pnode) EXCLUSIVE_LOCKS_REQUIRED(::cs_main) {
    2025         AssertLockHeld(::cs_main);
    2026 
    2027         if (pnode->GetCommonVersion() < INVALID_CB_NO_BAN_VERSION || pnode->fDisconnect)
    


    naumenkogs commented at 9:35 am on December 18, 2023:

    ad48667ec8e8d563550df768d0b45abf800662d9

    gotta drop the second condition here?


    mzumsande commented at 8:21 pm on January 5, 2024:
    done, thanks - I had missed this!
  58. in src/net_processing.cpp:5203 in ad48667ec8 outdated
    5202-            // marked for disconnection
    5203-            if (!pnode->IsFullOutboundConn() || pnode->fDisconnect) return;
    5204+            // Only consider outbound-full-relay peers
    5205+            if (!pnode->IsFullOutboundConn()) return;
    5206             CNodeState *state = State(pnode->GetId());
    5207             if (state == nullptr) return; // shouldn't be possible, but just in case
    


    naumenkogs commented at 9:36 am on December 18, 2023:

    ad48667ec8e8d563550df768d0b45abf800662d9

    might update this too if (!Assume(state)) return and drop the comment if you feel like it :)


    mzumsande commented at 8:22 pm on January 5, 2024:
    I think I’d prefer to not do it here, for my taste it’s not related enough to the core of this PR.
  59. DrahtBot requested review from naumenkogs on Dec 18, 2023
  60. refactor: rename ForEachNode and remove duplicate checks
    Rename ForEachNode to ForEachFullyConnectedNode because it
    has an inbuilt check that only executes the passed function
    for fully connected nodes (i.e. if fSuccessfullyConnected is set
    and fDisconnect isn't).
    
    This means callers don't need to need to worry about that, so remove
    those duplicate checks and related comments.
    8f074589f5
  61. mzumsande force-pushed on Jan 5, 2024
  62. mzumsande commented at 8:24 pm on January 5, 2024: contributor
    Updated to address feedback, I would love to hear more opinions on the protection issue discussed in the threat following #28538 (review)
  63. in test/functional/p2p_blocksonly.py:98 in 68769ff7c3 outdated
    93+        # Simulate the situation of being at the max limit of full-outbound connections by setting -maxconnections to a low value
    94+        self.restart_node(0, ["-noblocksonly", "-maxconnections=2"])
    95+        # Have one blocksonly peer that is not being disconnected
    96+        blocksonly_peer1 = self.nodes[0].add_outbound_p2p_connection(P2PInterface(), p2p_idx=1, txrelay=False)
    97+        blocksonly_peer2 = self.nodes[0].add_outbound_p2p_connection(P2PInterface(), p2p_idx=2, txrelay=False)
    98+        self.nodes[0].mockscheduler(46)
    


    stratospher commented at 4:28 pm on January 9, 2024:

    68769ff: nit if you retouch.

    0with self.nodes[0].assert_debug_log(expected_msgs=["disconnecting full outbound peer not participating in tx relay: peer=1"]):
    1            self.nodes[0].mockscheduler(46)
    2            blocksonly_peer2.wait_for_disconnect()
    
  64. stratospher commented at 4:41 pm on January 9, 2024: contributor
    ACK 8f07458. useful to fill outbound slots with peers which we can relay transactions with and i liked the approach in this PR to swap away blocks-only peers in full relay outbound slots only after the threshold number of max full relay outbound connections is reached.
  65. DrahtBot requested review from amitiuttarwar on Jan 9, 2024
  66. mzumsande commented at 5:25 pm on January 9, 2024: contributor

    Thanks for the review!

    We had an offline conversation about this today (FYI @sdaftuar @sipa @sr-gi). Some takeaways:

    • It would probaly be better to first have a replacement peer ready before disconnecting a blocksonly peer. That would mean a similar approach as for the extra network-specific peers, it would also mean that we would need to add a way to make extra connections when we have a -blocksonly peer we want to get rid of.
    • We should avoid situations where there could be perpetual churn. For example, imagine the handful of peers on the cjdns network suddenly all went blocksonly. Then our network-specific logic would try to make new connections to them, while this logic would disconnect them, resulting in a lot of churn.
    • To avoid interactions with protected peers, it might be best to not protect fRelay=false peers in the first place via m_protect.

    I think I’ll look into reworking the PR and will put it in draft until that has happened.

  67. mzumsande marked this as a draft on Jan 9, 2024
  68. DrahtBot added the label CI failed on Jan 17, 2024
  69. DrahtBot commented at 0:01 am on April 19, 2024: contributor

    🤔 There hasn’t been much activity lately and the CI seems to be failing.

    If no one reviewed the current pull request by commit hash, a rebase can be considered. While the CI failure may be a false positive, the CI hasn’t been running for some time, so there may be a real issue hiding as well. A rebase triggers the latest CI and makes sure that no silent merge conflicts have snuck in.

  70. DrahtBot commented at 2:07 am on July 18, 2024: contributor

    🤔 There hasn’t been much activity lately and the CI seems to be failing.

    If no one reviewed the current pull request by commit hash, a rebase can be considered. While the CI failure may be a false positive, the CI hasn’t been running for some time, so there may be a real issue hiding as well. A rebase triggers the latest CI and makes sure that no silent merge conflicts have snuck in.

  71. mzumsande commented at 6:33 pm on September 18, 2024: contributor
    Not working on this currently, I might open a new PR at a later time, or up for grabs if someone else wants to.
  72. mzumsande closed this on Sep 18, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-11 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me