p2p: do not make automatic outbound connections to addnode peers #28895

pull jonatack wants to merge 2 commits into bitcoin:master from jonatack:2023-11-do-not-make-automatic-outbound-connection-to-addnode-peers changing 3 files +30 −0
  1. jonatack commented at 4:44 pm on November 16, 2023: member

    to allocate our limited outbound slots correctly, and to ensure addnode connections benefit from their intended protections.

    Our addnode logic usually connects the addnode peers before the automatic outbound logic does, but not always, as a connection race can occur. If an addnode peer disconnects us and if it was the only one from its network, there can be a race between reconnecting to it with the addnode thread, and it being picked as automatic network-specific outbound peer. Or our internet connection or router or the addnode peer could be temporarily offline, and then return online during the automatic outbound thread. Or we could add a new manual peer using the addnode RPC at that time.

    The race can be more apparent when our node doesn’t know many peers, or with networks like cjdns that currently have few bitcoin peers.

    When an addnode peer is connected as an automatic outbound peer and is the only connection we have to a network, it can be protected by our new outbound eviction logic and persist in the “wrong role”.

    Finally, there does not seem to be a reason to make block-relay or short-lived feeler connections to addnode peers, as the addnode logic will ensure we connect to them if they are up, within the addnode connection limit.

    Fix these issues by checking if the address is an addnode peer in our automatic outbound connection logic.

  2. p2p: do not make automatic outbound connections to addnode peers
    to allocate our limited outbound slots correctly, and to ensure addnode
    connections benefit from their intended protections.
    
    Our addnode logic usually connects the addnode peers before the automatic
    outbound logic does, but not always, as a connection race can occur.  If an
    addnode peer disconnects us and if it was the only one from its network, there
    can be a race between reconnecting to it with the addnode thread, and it being
    picked as automatic network-specific outbound peer.  Or our internet connection
    or router, or the addnode peer, could be temporarily offline, and then return
    online during the automatic outbound thread.  Or we could add a new manual peer
    using the addnode RPC at that time.
    
    The race can be more apparent when our node doesn't know many peers, or with
    networks like cjdns that currently have few bitcoin peers.
    
    When an addnode peer is connected as an automatic outbound peer and is the only
    connection we have to a network, it can be protected by our new outbound
    eviction logic and persist in the "wrong role".
    
    Examples on mainnet using logging added in the same pull request:
    
    2023-08-12T14:51:05.681743Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
    [net:debug] Not making automatic network-specific outbound-full-relay connection
    to i2p peer selected for manual (addnode) connection: [geh...odq.b32.i2p]:0
    
    2023-08-13T03:59:28.050853Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
    [net:debug] Not making automatic block-relay-only connection to onion peer
    selected for manual (addnode) connection: kpg...aid.onion:8333
    
    2023-08-13T16:21:26.979052Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
    [net:debug] Not making automatic network-specific outbound-full-relay connection
    to cjdns peer selected for manual (addnode) connection: [fcc...8ce]:8333
    
    2023-08-14T20:43:53.401271Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
    [net:debug] Not making automatic network-specific outbound-full-relay connection
    to cjdns peer selected for manual (addnode) connection: [fc7...59e]:8333
    
    2023-08-15T00:10:01.894147Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
    [net:debug] Not making automatic feeler connection to i2p peer selected for
    manual (addnode) connection: geh...odq.b32.i2p:8333
    
    Finally, there does not seem to be a reason to make block-relay or short-lived
    feeler connections to addnode peers, as the addnode logic will ensure we connect
    to them if they are up, within the addnode connection limit.
    
    Fix these issues by checking if the address is an addnode peer in our automatic
    outbound connection logic.
    cc62716920
  3. test: add unit test for CConnman::AddedNodesContain() 5e7cc4144b
  4. DrahtBot commented at 4:44 pm on November 16, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK mzumsande, vasild, guggero, brunoerg

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  5. DrahtBot added the label P2P on Nov 16, 2023
  6. jonatack commented at 4:49 pm on November 16, 2023: member

    Reported in https://github.com/bitcoin/bitcoin/pull/27213/files#r1291926369 and extracted from an earlier version of #28248.

    The first commit ought to be feasible to backport, if desired. The second test commit would require #28155.

  7. dergoegge commented at 5:17 pm on November 16, 2023: member

    Thank you for isolating this change and making it more clear what you were going for.

    From what I understand the “regression” is not that we might make automatic connections to addnode peers but rather that if that happens these connections might stick around as automatic connections for longer (due to the new eviction logic), while not being labeled as manual.

    So in the end a user might end up with less outbound connections than expected for longer than is currently possible, with some of their manual connections incorrectly labeled as automatic.

    If the above is correct, then I am not sure if this is worthy of a backport at this stage in the release cycle because: 1) This should be rare and 2) the manual connections are still being made and persisted (just labeled incorrectly).

  8. jonatack commented at 5:23 pm on November 16, 2023: member
    Empirically, the regression is that we’re actively adding addnode peers as outbound ones, and keeping them there. Since a couple of years I run a node with all three of our privacy networks and addnode entries for each, along with clearnet, and I didn’t see my addnode peers being connected to as non-manual outbounds. This is new behavior for my node, it’s frequent, it removes the intended privileges that addnode entries benefit from, and it impacts precisely the nodes running multiple networks including ones with fewer peers that used addnode entries to ensure connection to them.
  9. dergoegge commented at 6:05 pm on November 16, 2023: member

    Empirically, the regression is that we’re actively adding addnode peers as outbound ones, and keeping them there.

    Sure but logically, the regression is that the duration of these connections increased not that they are being made in the first place.

    I’m wondering why you only now noticed this, since the duration of these mislabeled connections prior to the new eviction logic (i assume) depended on regular outbound eviction behavior, which could also result in longer living connections.

  10. jonatack commented at 6:30 pm on November 16, 2023: member

    From what I understand the “regression” is not that we might make automatic connections to addnode peers but rather that if that happens these connections might stick around as automatic connections for longer (due to the new eviction logic), while not being labeled as manual.

    This stems from a change in this release where it was observed and reported. We did not previously automatically add outbound connections to addnode entries in sparse networks to the degree and regularity that we do now.

    So in the end a user might end up with less outbound connections than expected for longer than is currently possible, with some of their manual connections incorrectly labeled as automatic.

    Sure but logically, the regression is that the duration of these connections increased not that they are being made in the first place.

    1. This should be rare and 2) the manual connections are still being made and persisted (just labeled incorrectly).

    No longer rare, and not only about mislabeled connections. The new change is also misallocating rare outbound connection slots, and attributing incorrect peer privileges as addnode peers benefit from non-eviction, are not disconnected or discouraged for bad behavior, etc.

    I’m wondering why you only now noticed this, since the duration of these mislabeled connections prior to the new eviction logic (i assume) depended on regular outbound eviction behavior, which could also result in longer living connections.

    Observed and reported in August on the pull that made the change (there was no reaction). While it looks like it could occur somewhat rarely before, with the change it is now frequent, and more noticeable as the connections are protected.

  11. mzumsande commented at 9:01 pm on November 16, 2023: contributor

    Tested ACK 5e7cc4144bb09cfde48ba5bc055a2da3ca4252d1

    I tested this by running on cjdns and ipv4, and then manually disconnecting an addnode cjdns peer and observing the log - both on v26.0rc2 and this branch.

    Sure but logically, the regression is that the duration of these connections increased not that they are being made in the first place.

    I think I disagree with that. They might now be protected from eviction due to network-specific extra peers, but that eviction is also new: Before #27213, we only had the stale-tip eviction, and I think that it would only extremely rarely evict such a peer because of the way it works (see this post for why I think so), so once an automatic connections was made, it was probably never evicted unless there were other causes like we got offline or they disconnected us.

    I think that the mechanism mentioned in the OP (race after getting disconnected) is probably what makes this most noticeable, because ThreadOpenAddedConnections() has a long sleep (60s) which it wouldn’t interrupt, and even though EXTRA_NETWORK_PEER_INTERVAL = 5min is larger, it would immediately kick in after the only peer from a network disconnected us (and then the next time ~5 minutes later), so ThreadOpenConnection would usually be quicker than ThreadOpenAddedConnections().

    A completely unrelated problem is that this problem exists because cjdns has so few peers <10? on mainnet, which is a shame considering how much engineering effort went into it, but I’m not sure what to do about that.

  12. jonatack commented at 9:16 pm on November 16, 2023: member

    cjdns has so few peers <10? on mainnet, which is a shame considering how much engineering effort went into it, but I’m not sure what to do about that.

    I also saw the network-specific issue with i2p peers, and when adding a new network, though cjdns is indeed a good way to test it. As to what we can do to further cjdns adoption, when one-click I2P support was added to Raspiblitz and Umbrel in December 2022 after this workshop and blog post, the number of recent peers returned by -addrinfo for me went from a couple hundred to quickly more than a thousand, and now a couple thousand. Cjdns being a friend-of-a-friend topology requires the additional step of finding a friend to connect to the network with, but we have a couple of stable ones in the hardcoded DNS seeds, and maybe I can encourage and work with the node-in-a-box packages to get support for it in (and do more related workshops).

  13. in src/test/net_peer_connection_tests.cpp:126 in 5e7cc4144b
    121@@ -122,6 +122,13 @@ BOOST_AUTO_TEST_CASE(test_addnode_getaddednodeinfo_and_connection_detection)
    122     BOOST_CHECK_EQUAL(connman->GetAddedNodeInfo(/*include_connected=*/true).size(), nodes.size());
    123     BOOST_CHECK(connman->GetAddedNodeInfo(/*include_connected=*/false).empty());
    124 
    125+    // Test AddedNodesContain()
    126+    for (auto node : connman->TestNodes()) {
    


    jonatack commented at 9:30 pm on November 16, 2023:

    Note to self when retouching here or in follow-ups (i.e. #28248)

    0    for (const auto& node : connman->TestNodes()) {
    
  14. vasild approved
  15. vasild commented at 1:35 pm on November 17, 2023: contributor
    ACK 5e7cc4144bb09cfde48ba5bc055a2da3ca4252d1
  16. in src/net.cpp:3476 in 5e7cc4144b
    3471+    const std::string addr_str{addr.ToStringAddr()};
    3472+    const std::string addr_port_str{addr.ToStringAddrPort()};
    3473+    LOCK(m_added_nodes_mutex);
    3474+    return (m_added_node_params.size() < 24 // bound the query to a reasonable limit
    3475+            && std::any_of(m_added_node_params.cbegin(), m_added_node_params.cend(),
    3476+                           [&](const auto& p) { return p.m_added_node == addr_str || p.m_added_node == addr_port_str; }));
    


    vasild commented at 1:40 pm on November 17, 2023:

    (hit “submit” too early)

    nit, non-blocker: I do not think that this < 24 cap is needed. What is the max number of added nodes expected?

    I think it is not worth doing that, but in theory, if this is to be optimized - then we can observe that addr_port_str begins with addr_str. For example aaa:8333 and aaa and if we are checking if string aaaa is equal to either one, the way this is done now is that it would traverse the entire aaaa twice whereas it can be done with a single traverse. But please ignore this :)


    jonatack commented at 1:47 pm on November 17, 2023:
    I think the number of added nodes is up to the user without a limit, though 8 at a time is the addnode connection limit. We can change m_added_node_params to an unordered set later on as you suggested in #28248 to make the query constant-time on average and drop the size check. Kept it simple here.
  17. guggero approved
  18. guggero commented at 2:54 pm on November 17, 2023: contributor

    utACK 5e7cc4144bb09cfde48ba5bc055a2da3ca4252d1

    I think it’s important that nodes added by addnode always get the elevated protection level.

    A completely unrelated problem is that this problem exists because cjdns has so few peers <10? on mainnet, which is a shame considering how much engineering effort went into it, but I’m not sure what to do about that.

    I’m still trying to wrap my head around CJDNS and how it can be set up in a docker based environment. If successful, I’ll try creating an Umbrel app for it, which should hopefully help with growing the network.

  19. brunoerg commented at 4:36 pm on November 17, 2023: contributor
    utACK 5e7cc4144bb09cfde48ba5bc055a2da3ca4252d1
  20. DrahtBot requested review from brunoerg on Nov 17, 2023
  21. DrahtBot removed review request from brunoerg on Nov 17, 2023
  22. fanquake merged this on Nov 22, 2023
  23. fanquake closed this on Nov 22, 2023

  24. fanquake commented at 11:57 am on November 22, 2023: member
    Added the fix to #28872 for backporting.
  25. fanquake referenced this in commit 55af112565 on Nov 22, 2023
  26. fanquake referenced this in commit e4fef4ae65 on Nov 22, 2023
  27. jonatack deleted the branch on Nov 22, 2023
  28. kwvg referenced this in commit 909e293e1e on Oct 26, 2024
  29. kwvg referenced this in commit ffe8a36e74 on Oct 26, 2024
  30. kwvg referenced this in commit 098a56747a on Oct 26, 2024
  31. kwvg referenced this in commit 09504bdd1f on Oct 26, 2024
  32. PastaPastaPasta referenced this in commit 25c3355053 on Oct 27, 2024
  33. bitcoin locked this on Nov 21, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me