p2p: peer connection bug fixes #28248

pull jonatack wants to merge 15 commits into bitcoin:master from jonatack:2023-08-network-diversity changing 8 files +200 −88
  1. jonatack commented at 8:46 pm on August 9, 2023: contributor

    This pull fixes several peer connection bugs in our p2p code, along with the logging that uncovered them:

    1. Fix detection of inbound peer connections in GetAddedNodeInfo.

    2. Fix addnode CJDNS peers not detected in GetAddedNodeInfo, causing ThreadOpenAddedConnections to continually retry to connect to them and RPC getaddednodeinfo incorrectly showing them as not connected.

    3. Fix ThreadOpenConnections not detecting inbound CJDNS connections and making automatic outbound connections to them.

    4. Fix detection of already connected peers in AlreadyConnectedToAddress().

    5. Fix detection of already connected peers when making outbound connections in ConnectNode.

    6. Do not accept inbound connections in CreateNodeFromAcceptedSocket from I2P peers we’re already connected to, as building I2P tunnels is expensive.

    7. Fix making automatic outbound connections in ThreadOpenConnections to addnode peers, in order not to allocate our limited outbound slots to them and to ensure addnode connections benefit from their intended protections. Our addnode logic usually connects the addnode peers before the automatic outbound logic does, but not always, as a connection race can occur (see the commit message for further details and mainnet examples). When an addnode peer is connected as an automatic outbound peer and is the only connection we have to a network, it can be protected by our new outbound network-specific eviction logic and persist in the “wrong role”. Fix these issues by checking if the selected address is an addnode peer in our automatic outbound connection logic.

    Update the p2p logging with the improvements that allowed seeing/understanding/debugging the current behavior. Please see the commit messages for details.

    Simplify MaybePickPreferredNetwork to return std::optional, make it a const class method, and add Clang thread-safety analysis annotation and related assertions.

  2. DrahtBot commented at 8:46 pm on August 9, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK brunoerg
    Stale ACK vasild

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28155 (net: improves addnode / m_added_nodes logic by sr-gi)
    • #27114 (p2p: Allow whitelisting outgoing connections by brunoerg)
    • #27052 (test: rpc: add last block announcement time to getpeerinfo result by LarryRuane)
    • #26859 (fuzz: extend ConsumeNetAddr() to return I2P and CJDNS addresses by vasild)
    • #25832 (tracing: network connection tracepoints by 0xB10C)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label P2P on Aug 9, 2023
  4. jonatack marked this as ready for review on Aug 10, 2023
  5. dergoegge commented at 10:03 am on August 10, 2023: member

    The title “p2p: outbound network diversity improvements” makes it sound like you are improving on the logic for making outbound connections but really you are refactoring and adding additional logging. Would you mind making the title more accurate?

    Please see the commit messages for details.

    The PR description is added to the merge commit, so in my opinion it makes sense to have at least a summary of what the PR does in the description.

    From our docs:

    0The body of the pull request should contain sufficient description of what the patch does, and even more importantly, why, with 
    1justification and reasoning. You should include references to any discussions (for example, other issues or mailing list
    2discussions).
    
  6. jonatack renamed this:
    p2p: outbound network diversity improvements
    p2p: outbound network diversity refactoring and logging improvements
    on Aug 10, 2023
  7. jonatack commented at 9:52 pm on August 10, 2023: contributor
    @dergoegge updated
  8. ajtowns commented at 5:51 am on August 11, 2023: contributor
    This mostly seems like pointless refactoring (changing a switch to an || of two functions that themselves have switches). Doesn’t seem worth the review effort to me.
  9. jonatack renamed this:
    p2p: outbound network diversity refactoring and logging improvements
    p2p: outbound logic, logging and refactoring improvements
    on Aug 12, 2023
  10. jonatack force-pushed on Aug 12, 2023
  11. jonatack renamed this:
    p2p: outbound logic, logging and refactoring improvements
    p2p: bugfixes, logic and logging improvements
    on Aug 13, 2023
  12. jonatack force-pushed on Aug 13, 2023
  13. jonatack force-pushed on Aug 14, 2023
  14. jonatack commented at 4:12 am on August 14, 2023: contributor
    Updated to propose fixes for the issues observed with the improved logging.
  15. jonatack force-pushed on Aug 14, 2023
  16. in src/net.cpp:1933 in 4b9774dd49 outdated
    1928+                const auto str_addr{addr.ToStringAddr()};
    1929+                const auto str_addr_port{addr.ToStringAddrPort()};
    1930+
    1931+                LOCK(m_added_nodes_mutex);
    1932+                if (m_added_nodes.size() < 16 // bound the query to a reasonable limit
    1933+                    && std::any_of(m_added_nodes.cbegin(), m_added_nodes.cend(), [&](const auto& a) { return a == str_addr || a == str_addr_port; })) {
    


    vasild commented at 1:46 pm on August 14, 2023:

    Good catch with this deficiency!

    If m_added_nodes is changed to std::unordered_set, then this can be a simple m_added_nodes.count() > 0 and CConnman::AddNode() and CConnman::RemoveAddedNode() could be simplified (and made faster since they are now O(number of added nodes) and will become O(1), I guess performance is irrelevant in practice because of the small number of elements).

    Maybe this is worth a separate PR since it is distinct from other commits (one commit to change it to std::unordered_set and one to add this check).


    jonatack commented at 5:13 pm on August 15, 2023:
    Good idea, simpler and faster. Done (thanks!)
  17. jonatack force-pushed on Aug 15, 2023
  18. jonatack commented at 8:16 pm on August 15, 2023: contributor
    Updated for @vasild’s review feedback to change m_added_nodes from a vector to an unordered set (https://github.com/bitcoin/bitcoin/pull/28248/commits/3f4a1f0bc98159f86d910227fd57d05d10bb8889), which simplifies and speeds up the AddNode() and RemoveAddedNode() methods along with the commit that follows it, p2p: do not make automatic outbound connections to addnode peers (048e1af).
  19. DrahtBot added the label CI failed on Aug 15, 2023
  20. in src/net.cpp:401 in a411cb5a17 outdated
    396@@ -408,7 +397,8 @@ CNode* CConnman::FindNode(const CService& addr)
    397 
    398 bool CConnman::AlreadyConnectedToAddress(const CAddress& addr)
    399 {
    400-    return FindNode(static_cast<CNetAddr>(addr)) || FindNode(addr.ToStringAddrPort());
    401+    const CService service{MaybeFlipIPv6toCJDNS(addr)};
    402+    return FindNode(service) || FindNode(service.ToStringAddrPort());
    


    vasild commented at 3:15 pm on August 16, 2023:
    Before this it would compare just the address and after this change it would compare the address and the port. I think that is undesirable because for inbound connections we see the peer as e.g. 1.2.3.4:somerandomport and when we try to connect to 1.2.3.4:8333 we should assume we are already connected to that peer.

    jonatack commented at 9:57 pm on September 1, 2023:
    Changed FindNode(service) to FindNode(static_cast<CNetAddr>(service)).
  21. in src/net.cpp:2794 in a411cb5a17 outdated
    2003@@ -1995,7 +2004,7 @@ std::vector<AddedNodeInfo> CConnman::GetAddedNodeInfo() const
    2004     }
    2005 
    2006     for (const std::string& strAddNode : lAddresses) {
    2007-        CService service(LookupNumeric(strAddNode, Params().GetDefaultPort(strAddNode)));
    2008+        CService service{MaybeFlipIPv6toCJDNS(LookupNumeric(strAddNode, Params().GetDefaultPort(strAddNode)))};
    


    vasild commented at 3:18 pm on August 16, 2023:
    The existent code in master seems to be comparing the ports. Is there another bug that an inbound connection from 1.2.3.4 would not be considered as “we are connected to 1.2.3.4:8333”?

    jonatack commented at 9:03 pm on August 16, 2023:
    Yes, I think that is currently an issue as well.

    vasild commented at 7:43 am on August 17, 2023:
    I mean, if we already have an inbound connection from 1.2.3.4 (ie we are connected to 1.2.3.4:somerandomport) and this code checks whether we are connected to 1.2.3.4:8333 it would wrongly assume that we are not and would open an outbound connection to that address.

    jonatack commented at 9:57 pm on September 1, 2023:
    Good call. I’ve reproduced that bug and am looking at a fix.

    jonatack commented at 3:56 am on September 6, 2023:
    Updated with a fix, along with several additional issues found and possibly fixed now while looking into this (thanks!)
  22. in src/net.cpp:2635 in a411cb5a17 outdated
    2635-    }
    2636-
    2637-    m_added_nodes.push_back(strNode);
    2638+    if (m_added_nodes.count(strNode)) return false;
    2639+    m_added_nodes.insert(strNode);
    2640     return true;
    


    vasild commented at 3:43 pm on August 16, 2023:

    Or this:

    0return m_added_nodes.insert(strNode).second;
    

    jonatack commented at 8:28 pm on August 16, 2023:
    Thank you, done.
  23. in src/net.cpp:432 in a411cb5a17 outdated
    426@@ -437,6 +427,14 @@ static CAddress GetBindAddress(const Sock& sock)
    427     return addr_bind;
    428 }
    429 
    430+static void LogAlreadyConnected(CNode* pnode, ConnectionType conn_type)
    431+{
    432+    LogPrintfCategory(BCLog::NET, "Failed to open new %s connection, already connected to peer=%d, net=%s%s (%s)\n",
    


    vasild commented at 3:55 pm on August 16, 2023:
    Why not use LogPrintLevel(BCLog::NET, BCLog::Level::Debug?

    jonatack commented at 8:41 pm on August 16, 2023:
    I didn’t change the level from the current LogPrintf since 236618061a445d2cb11e722cfac5fdae5be26abb in 2017. Since this shouldn’t normally occur, unconditionally logging seems helpful (the net category is currently much too busy to leave on and see this happen unless you proactively grep for it – finishing the logging severity level upgrade should solve that. I personally test this branch with the added logging changed to unconditional).

    jonatack commented at 9:57 pm on September 1, 2023:
    On second thought, taking your suggestion as the goal is to migrate to severity level logging. Thanks!
  24. jonatack force-pushed on Sep 6, 2023
  25. jonatack force-pushed on Sep 6, 2023
  26. jonatack force-pushed on Sep 6, 2023
  27. jonatack force-pushed on Sep 6, 2023
  28. jonatack commented at 6:51 pm on September 6, 2023: contributor
    Rebased, updated with improved logging, and fixes for additional issues (71828e0c, f126d041, a1b1a24) observed with the logging or suggested by @vasild (thanks!)
  29. DrahtBot removed the label CI failed on Sep 7, 2023
  30. jonatack renamed this:
    p2p: bugfixes, logic and logging improvements
    p2p: peer connection bug fixes, logic and logging improvements
    on Sep 9, 2023
  31. jonatack force-pushed on Sep 11, 2023
  32. jonatack commented at 8:43 pm on September 12, 2023: contributor
    Updated with further logging improvements. The p2p peer connection bugfixes here are unchanged from the previous push. It may be a good idea to tag this for v26?
  33. in src/net.cpp:2858 in 54804cba85 outdated
    2733@@ -2734,13 +2734,13 @@ std::vector<AddedNodeInfo> CConnman::GetAddedNodeInfo() const
    2734 
    2735 
    2736     // Build a map of all already connected addresses (by IP:port and by name) to inbound/outbound and resolved CService
    2737-    std::map<CService, bool> mapConnected;
    2738+    std::map<CNetAddr, bool> mapConnected;
    


    mzumsande commented at 9:28 pm on September 12, 2023:
    Could this lead to disambiguities? Say if you want to addnode different localhost peers with different ports (but the same IP 127.0.0.1), would getaddednode() now say you are connected to all of them, even when you are only connected to 1 in reality?

    vasild commented at 7:39 am on October 12, 2023:

    Right. Comparing ports for incoming connections is problematic also. I think the right approach for determining if we are connected to addr:port, for the purposes of making an outbound connection:

    • For incoming connections, compare just addr and ignore port
    • For outgoing connections, compare both addr and port.

    This can’t be done with a single map of either CService or CNetAddr. I guess two containers are needed, as this:

      0diff --git i/src/net.cpp w/src/net.cpp
      1index 91282d927d..0eb0023112 100644
      2--- i/src/net.cpp
      3+++ w/src/net.cpp
      4@@ -2809,41 +2809,49 @@ std::vector<AddedNodeInfo> CConnman::GetAddedNodeInfo() const
      5         LOCK(m_added_nodes_mutex);
      6         ret.reserve(m_added_node_params.size());
      7         std::copy(m_added_node_params.cbegin(), m_added_node_params.cend(), std::back_inserter(lAddresses));
      8     }
      9 
     10 
     11-    // Build a map of all already connected addresses (by IP:port and by name) to inbound/outbound and resolved CService
     12-    std::map<CNetAddr, bool> mapConnected;
     13+    // Collect all addresses to which we are already connected.
     14+    std::unordered_set<CNetAddr, CNetAddrHash> connected_to_inbound;
     15+    std::unordered_set<CService, CServiceHash> connected_to_outbound;
     16     std::map<std::string, std::pair<bool, CService>> mapConnectedByName;
     17     {
     18         LOCK(m_nodes_mutex);
     19         for (const CNode* pnode : m_nodes) {
     20             if (pnode->addr.IsValid()) {
     21-                mapConnected[static_cast<CNetAddr>(pnode->addr)] = pnode->IsInboundConn();
     22+                if (pnode->IsInboundConn()) {
     23+                    connected_to_inbound.insert(pnode->addr);
     24+                } else {
     25+                    connected_to_outbound.insert(pnode->addr);
     26+                }
     27             }
     28             std::string addrName{pnode->m_addr_name};
     29             if (!addrName.empty()) {
     30                 mapConnectedByName[std::move(addrName)] = std::make_pair(pnode->IsInboundConn(), static_cast<const CService&>(pnode->addr));
     31             }
     32         }
     33     }
     34 
     35     for (const auto& addr : lAddresses) {
     36         CService service(LookupNumeric(addr.m_added_node, GetDefaultPort(addr.m_added_node)));
     37         AddedNodeInfo addedNode{addr, CService(), false, false};
     38         if (service.IsValid()) {
     39-            // strAddNode is an IP:port
     40-            auto it = mapConnected.find(static_cast<CNetAddr>(service));
     41-            if (it != mapConnected.end()) {
     42+            // addr.m_added_node is an addr:port
     43+            if (connected_to_outbound.count(service) > 0) {
     44+                addedNode.resolvedAddress = service;
     45+                addedNode.fConnected = true;
     46+                addedNode.fInbound = false;
     47+            } else if (connected_to_inbound.count(service) > 0) {
     48                 addedNode.resolvedAddress = service;
     49                 addedNode.fConnected = true;
     50-                addedNode.fInbound = it->second;
     51+                addedNode.fInbound = true;
     52             }
     53         } else {
     54-            // strAddNode is a name
     55+            // addr.m_added_node is a name
     56             auto it = mapConnectedByName.find(addr.m_added_node);
     57             if (it != mapConnectedByName.end()) {
     58                 addedNode.resolvedAddress = it->second.second;
     59                 addedNode.fConnected = true;
     60                 addedNode.fInbound = it->second.first;
     61             }
     62diff --git i/src/netaddress.h w/src/netaddress.h
     63index ad09f16799..99a3b0fc22 100644
     64--- i/src/netaddress.h
     65+++ w/src/netaddress.h
     66@@ -254,12 +254,13 @@ public:
     67             UnserializeV2Stream(s);
     68         } else {
     69             UnserializeV1Stream(s);
     70         }
     71     }
     72 
     73+    friend class CNetAddrHash;
     74     friend class CSubNet;
     75 
     76 private:
     77     /**
     78      * Parse a Tor address and set this object to it.
     79      * [@param](/bitcoin-bitcoin/contributor/param/)[in] addr Address to parse, must be a valid C string, for example
     80@@ -473,12 +474,36 @@ private:
     81         // will not be gossiped, but continue reading next addresses from the stream.
     82         m_net = NET_IPV6;
     83         m_addr.assign(ADDR_IPV6_SIZE, 0x0);
     84     }
     85 };
     86 
     87+class CNetAddrHash
     88+{
     89+public:
     90+    CNetAddrHash()
     91+        : m_salt_k0{GetRand<uint64_t>()},
     92+          m_salt_k1{GetRand<uint64_t>()}
     93+    {
     94+    }
     95+
     96+    CNetAddrHash(uint64_t salt_k0, uint64_t salt_k1) : m_salt_k0{salt_k0}, m_salt_k1{salt_k1} {}
     97+
     98+    size_t operator()(const CNetAddr& a) const noexcept
     99+    {
    100+        CSipHasher hasher(m_salt_k0, m_salt_k1);
    101+        hasher.Write(a.m_net);
    102+        hasher.Write(a.m_addr);
    103+        return static_cast<size_t>(hasher.Finalize());
    104+    }
    105+
    106+private:
    107+    const uint64_t m_salt_k0;
    108+    const uint64_t m_salt_k1;
    109+};
    110+
    111 class CSubNet
    112 {
    113 protected:
    114     /// Network (base) address
    115     CNetAddr network;
    116     /// Netmask, in network byte order
    

    jonatack commented at 9:52 pm on October 16, 2023:
    @vasild Thanks! I agree with your bimodal approach and replaced 71528aeaba2fbb3141693b000f2ec7799074bfdf with your suggestion above credited to you.
  34. in src/net.cpp:1868 in b97cfd18da outdated
    1813@@ -1814,6 +1814,16 @@ void CConnman::CreateNodeFromAcceptedSocket(std::unique_ptr<Sock>&& sock,
    1814     }
    1815 
    1816     const bool inbound_onion = std::find(m_onion_binds.begin(), m_onion_binds.end(), addr_bind) != m_onion_binds.end();
    1817+
    1818+    // Don't accept inbound connections from peers already connected to us.
    1819+    // Skip this check for Tor, as their inbound/outbound addresses differ.
    1820+    if (!inbound_onion) {
    1821+        if (auto pnode = AlreadyConnectedToAddress(addr)) {
    


    mzumsande commented at 10:05 pm on September 12, 2023:
    Similar question here: Could this make some local network setups no longer viable? Probably no one should mainnet for those if it’s just for testing, but possibly some setups with multiple nodes sharing the same IP exist?

    jonatack commented at 5:09 am on October 17, 2023:
    Good question. I’ve rewritten AlreadyConnectedToAddress() with the suggestions in #28248 (review). Do you think that resolves the possible issue you describe?

    jonatack commented at 7:22 pm on October 24, 2023:
    Updated to only apply this check to inbound I2P peers we’re already connected to, due to the resource cost of creating new tunnels.
  35. mzumsande commented at 10:13 pm on September 12, 2023: contributor
    Just had a short look and some questions, will review properly in the next weeks.
  36. DrahtBot added the label Needs rebase on Sep 14, 2023
  37. dergoegge commented at 10:50 am on September 14, 2023: member
    If you’re fixing bugs, it would be very beneficial to also add tests at the same time.
  38. jonatack force-pushed on Oct 11, 2023
  39. DrahtBot removed the label Needs rebase on Oct 11, 2023
  40. in src/net.cpp:411 in 2a9db3a16a outdated
    407@@ -408,9 +408,13 @@ CNode* CConnman::FindNode(const CService& addr)
    408     return nullptr;
    409 }
    410 
    411-bool CConnman::AlreadyConnectedToAddress(const CAddress& addr)
    412+std::optional<CNode*> CConnman::AlreadyConnectedToAddress(const CAddress& addr)
    


    vasild commented at 8:37 am on October 12, 2023:

    Two concerns about this interface change:

    1. The pointer could become stale as soon as this method returns. It is good that this method is private which somewhat limits the danger that somebody will dereference a stale pointer, but does not eliminate it. FindNode() already has such an unsafe interface where it locks m_nodes_mutex, fetches a pointer to an element inside m_nodes, releases the mutex and returns the (possibly stale) pointer. In master, all places where FindNode() is called either only check if the returned pointer is null without dereferencing it (ok), or lock the mutex, call FindNode() (requiring the mutex to be recursive), dereference the pointer, unlock the mutex. But this PR in commit cb3f5a805caa2d42fdab57b42e6aee820ccd347d p2p, bugfix: correctly detect already connected peers in ConnectNode() adds calls where the pointer is dereferenced after releasing the mutex. I would rather have less such unsafe interfaces than more. Maybe log the message from inside AlreadyConnectedToAddress() or have it return the message as a string, to be optionally logged by the caller?

    2. (minor) optional from a pointer is redundant - if the pointer will never be null, then it can be reference. If the pointer can be null, then the optional can be dropped and null pointer to mean “not connected”.


    jonatack commented at 3:29 am on October 17, 2023:
    Good points. Changed AlreadyConnectedToAddress() back to return a bool and log the message from inside it.
  41. in src/net.cpp:2947 in 2a9db3a16a outdated
    2943@@ -2902,7 +2944,7 @@ void CConnman::OpenNetworkConnection(const CAddress& addrConnect, bool fCountFai
    2944     }
    2945     if (!pszDest) {
    2946         bool banned_or_discouraged = m_banman && (m_banman->IsDiscouraged(addrConnect) || m_banman->IsBanned(addrConnect));
    2947-        if (IsLocal(addrConnect) || banned_or_discouraged || AlreadyConnectedToAddress(addrConnect)) {
    2948+        if (IsLocal(addrConnect) || banned_or_discouraged) {
    


    vasild commented at 9:34 am on October 12, 2023:

    cb3f5a805caa2d42fdab57b42e6aee820ccd347d p2p, bugfix: correctly detect already connected peers in ConnectNode()

    • Remove an AlreadyConnectedToAddress() check in CConnman::OpenNetworkConnection, as it is now redundant to the checks added in this commit, and it is preferable to log if it occurs.

    It might happen that CConnman::ConnectNode() skips the call to AlreadyConnectedToAddress() and connects - if pszDest is null and Lookup(pszDest returns empty result. Then it will go here:

    https://github.com/bitcoin/bitcoin/blob/4a5aae9330780c3740e27cc511f7cba1fab745b9/src/net.cpp#L549-L558

    So maybe keep the call to AlreadyConnectedToAddress() from CConnman::OpenNetworkConnection().


    jonatack commented at 3:41 am on October 17, 2023:
    Keeping the call (thanks).
  42. in src/net.cpp:1866 in 2a9db3a16a outdated
    1860@@ -1847,6 +1861,16 @@ void CConnman::CreateNodeFromAcceptedSocket(std::unique_ptr<Sock>&& sock,
    1861     }
    1862 
    1863     const bool inbound_onion = std::find(m_onion_binds.begin(), m_onion_binds.end(), addr_bind) != m_onion_binds.end();
    1864+
    1865+    // Don't accept inbound connections from peers already connected to us.
    1866+    // Skip this check for inbound Tor connections, as each IP will differ.
    


    vasild commented at 9:37 am on October 12, 2023:
    0    // Skip this check for inbound Tor connections because all of them seem to come from the IP address of the Tor router (usually 127.0.0.1).
    

    jonatack commented at 8:35 pm on October 16, 2023:
    Done.
  43. in src/net.cpp:397 in 2a9db3a16a outdated
    415+    if (Params().IsTestChain()) return std::nullopt; // allow bilateral connections on test chains
    416+    const CService service{MaybeFlipIPv6toCJDNS(addr)};
    417+    if (auto pnode = FindNode(static_cast<CNetAddr>(service))) return pnode;
    418+    if (auto pnode = FindNode(service.ToStringAddrPort())) return pnode;
    419+    return std::nullopt;
    420 }
    


    vasild commented at 10:02 am on October 12, 2023:

    This needs to know whether to ignore the port or not in order to do the check properly:

    • if we are about to make an outbound connection and wonder whether we are already connected to the peer we intend to connect to
      • for all existent outbound connections, compare the prospect’s address and port and if both match, then we are already connected
      • for all existent inbound connections, compare only the address and ignore the port, if a match is found, then we are already connected
      • otherwise we are not connected
    • if we are accepting in inbound connection, for all existent inbound and outbound connections, compare only the address and if a match is found then we are already connected, otherwise we are not connected.

    jonatack commented at 0:20 am on October 17, 2023:
    I think that makes sense. Done – thanks!
  44. in src/net.h:111 in 2a9db3a16a outdated
    106+        return m_added_node == p.m_added_node && m_use_v2transport == p.m_use_v2transport;
    107+    }
    108+};
    109+
    110+struct AddedNodeHasher {
    111+    size_t operator()(const AddedNodeParams& p) const noexcept
    


    vasild commented at 10:10 am on October 12, 2023:
    nit, feel free to ignore: for consistency with CServiceHash this better end in Hash. Given that it operates on AddedNodeParams better use that name verbatim instead of shortening it to AddedNode. I.e. consider renaming this to AddedNodeParamsHash.

    jonatack commented at 8:34 pm on October 16, 2023:
    Done
  45. in src/net.h:113 in 2a9db3a16a outdated
    108+};
    109+
    110+struct AddedNodeHasher {
    111+    size_t operator()(const AddedNodeParams& p) const noexcept
    112+    {
    113+        return std::hash<std::string>()(p.m_added_node) ^ (std::hash<bool>()(p.m_use_v2transport) << 1);
    


    vasild commented at 10:22 am on October 12, 2023:
    Why << 1?

    jonatack commented at 4:19 am on October 17, 2023:

    It’s adapted from the struct MyHash code example in https://en.cppreference.com/w/cpp/utility/hash, and I’ve updated it in the latest push to be more like that example. I’d be very happy for advice on how to improve that hashing function or use our time-tested existing ones.

    0struct MyHash
    1{
    2    std::size_t operator()(const S& s) const noexcept
    3    {
    4        std::size_t h1 = std::hash<std::string>{}(s.first_name);
    5        std::size_t h2 = std::hash<std::string>{}(s.last_name);
    6        return h1 ^ (h2 << 1);
    7    }
    8};
    

    vasild commented at 12:01 pm on October 17, 2023:

    I am not sure why they used << 1. It seems to be useless and also it drops one bit from h2. Maybe to avoid returning 0 if first name and last name happen to be equal?

    I think for the purposes of hashing m_added_node_params this suffices:

    0    size_t operator()(const AddedNodeParams& p) const noexcept
    1    {
    2        return std::hash<std::string>()(p.m_added_node) ^ p.m_use_v2transport;
    3    }
    

    However std::hash is not used anywhere in the code (modulo in some tests and benchmarks). CServiceHash uses CSipHasher, but there are security concerns with an outsider being able to fiddle with the hashing, maybe this is an overkill for AddedNodeParams. If that is to be used here, then it would look like:

    0        CSipHasher hasher(m_salt_k0, m_salt_k1);
    1        hasher.Write(MakeUCharSpan(p.m_added_node));
    2        hasher.Write(p.m_use_v2transport);
    3        return static_cast<size_t>(hasher.Finalize());
    

    Or use some of the stateless Hash* functions from src/hash.h.

    I guess @sipa can give us an advice?


    jonatack commented at 1:50 pm on October 17, 2023:

    Maybe to avoid returning 0 if first name and last name happen to be equal?

    Indeed, makes sense.

    I think for the purposes of hashing m_added_node_params this suffices:

    0        return std::hash<std::string>()(p.m_added_node) ^ p.m_use_v2transport;
    

    👍

    However std::hash is not used anywhere in the code (modulo in some tests and benchmarks). CServiceHash uses CSipHasher. If that is to be used here, then it would look like:

    0        CSipHasher hasher(m_salt_k0, m_salt_k1);
    1        hasher.Write(MakeUCharSpan(p.m_added_node));
    2        hasher.Write(p.m_use_v2transport);
    3        return static_cast<size_t>(hasher.Finalize());
    

    Or use some of the stateless Hash* functions from src/hash.h.

    👍


    jonatack commented at 2:53 pm on October 17, 2023:

    Updated with the CSipHasher version (thanks!)

    Edit: the addnode test in test/functional/rpc_net.py failed with the CSipHasher version suggestion (implemented like CServiceHash or CNetAddrHash).

    Using return std::hash<std::string>()(p.m_added_node) ^ p.m_use_v2transport; in the latest push.

  46. in src/net.cpp:2795 in 2a9db3a16a outdated
    2786@@ -2760,6 +2787,19 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
    2787                 continue;
    2788             }
    2789 
    2790+            // Do not make automatic outbound connections to addnode peers,
    2791+            // to allocate our limited outbound slots correctly and to ensure
    2792+            // addnode connections benefit from their intended protections.
    2793+            // Expect BIP324 transport when both us and them have NODE_V2_P2P set.
    2794+            const bool v2_p2p(addr.nServices & GetLocalServices() & NODE_P2P_V2);
    2795+            if (WITH_LOCK(m_added_nodes_mutex, return m_added_node_params.count({addr.ToStringAddr(), v2_p2p}) || m_added_node_params.count({addr.ToStringAddrPort(), v2_p2p}))) {
    


    vasild commented at 10:39 am on October 12, 2023:

    If we have the address for manual connections (given to addnode) here we should ignore whether v1 or v2 is going to be used. It is the same peer after all, regardless of p2p encryption. I guess this better be something like:

     0bool in_added_nodes;
     1{
     2    const std::string addr_str{addr.ToStringAddr()};
     3    const std::string addr_port_str{addr.ToStringAddrPort()};
     4    LOCK(m_added_nodes_mutex);
     5    in_added_nodes =
     6        m_added_node_params.count({addr_str, /*m_use_v2transport=*/true}) > 0 ||
     7        m_added_node_params.count({addr_str, /*m_use_v2transport=*/false}) > 0 ||
     8        m_added_node_params.count({addr_port_str, /*m_use_v2transport=*/true}) > 0 ||
     9        m_added_node_params.count({addr_port_str, /*m_use_v2transport=*/false}) > 0;
    10}
    11if (in_added_nodes) {
    

    jonatack commented at 8:45 pm on October 16, 2023:
    Thanks. I hesitated on this point. Done.
  47. in src/net.cpp:493 in 2a9db3a16a outdated
    492                 return nullptr;
    493             }
    494         }
    495     }
    496 
    497+    LogPrintLevel(BCLog::NET, BCLog::Level::Debug, "Trying v%i %s connection to net=%s%s, lastseen=%.1fhrs\n",
    


    vasild commented at 11:05 am on October 12, 2023:

    nit:

    0    LogPrintLevel(BCLog::NET, BCLog::Level::Debug, "Trying v%i %s connection to net=%s%s, lastseen=%.1fh ago\n",
    

    jonatack commented at 8:34 pm on October 16, 2023:
    Done
  48. in src/net.cpp:1896 in 2a9db3a16a outdated
    1892@@ -1869,7 +1893,9 @@ void CConnman::CreateNodeFromAcceptedSocket(std::unique_ptr<Sock>&& sock,
    1893     pnode->AddRef();
    1894     m_msgproc->InitializeNode(*pnode, nodeServices);
    1895 
    1896-    LogPrint(BCLog::NET, "connection from %s accepted\n", addr.ToStringAddrPort());
    1897+    LogPrintLevel(BCLog::NET, BCLog::Level::Debug, "New %s connection accepted from peer=%d, net=%s%s\n",
    


    vasild commented at 11:09 am on October 12, 2023:

    nit: I find “…connection accepted from peer=5” a bit confusing. Maybe:

    0    LogPrintLevel(BCLog::NET, BCLog::Level::Debug, "New %s connection accepted, peer=%d, net=%s%s\n",
    

    jonatack commented at 8:34 pm on October 16, 2023:
    Done
  49. vasild commented at 11:16 am on October 12, 2023: contributor
    Reviewed as of 2a9db3a16a93d497bfe496b7c35edac975ee81d2. Good fixes! LGTM modulo the comments below.
  50. jonatack force-pushed on Oct 17, 2023
  51. jonatack force-pushed on Oct 17, 2023
  52. DrahtBot added the label CI failed on Oct 17, 2023
  53. jonatack force-pushed on Oct 17, 2023
  54. jonatack commented at 5:07 am on October 17, 2023: contributor
    Thank you, @vasild! I’ve pushed a WIP checkpoint update covering all your suggestions. Will return to it tomorrow and happy to take any further suggestions you may have.
  55. DrahtBot removed the label CI failed on Oct 17, 2023
  56. in src/net.cpp:424 in 44ceac2ef9 outdated
    421+        // For accepting an inbound connection, check all existing connections by addr only.
    422+        pnode = FindNode(static_cast<CNetAddr>(service));
    423+    } else {
    424+        // For making an outbound connection, check the existing outbound connections
    425+        // by addr and port, and if none, then inbound connections by addr only.
    426+        pnode = FindNode(service.ToStringAddrPort(), /*inbound=*/false);
    


    vasild commented at 1:08 pm on October 17, 2023:

    Can use CConnman::FindNode(const CService& addr) here?

    0        pnode = FindNode(service, /*inbound=*/false);
    

    but then we will not be comparing against CNode::m_addr_name. I wonder if we should compare against both CNode::addr and CNode::m_addr_name?


    jonatack commented at 3:15 am on October 18, 2023:

    Proposed an implementation:

    0pnode = FindNode(service, /*inbound=*/false);
    1if (!pnode || !pnode->addr.IsValid()) pnode = FindNode(service.ToStringAddrPort(), /*inbound=*/false);
    2if (!pnode || !pnode->addr.IsValid()) pnode = FindNode(static_cast<CNetAddr>(service), /*inbound=*/true);
    

    vasild commented at 8:41 am on October 18, 2023:
    Looks good. Can CNode::addr ever be invalid?

    jonatack commented at 3:45 am on October 24, 2023:
    Not sure, so keeping as belt and suspenders.
  57. in src/net.cpp:392 in 44ceac2ef9 outdated
    430+        LogPrintLevel(BCLog::NET, inbound ? BCLog::Level::Debug : BCLog::Level::Info,
    431+                      "Not %s new %s connection%s, already connected to %s %s %s peer=%d%s, version=%d, subver=%s\n",
    432+                      inbound ? "accepting" : "opening", ConnectionTypeAsString(conn_type),
    433+                      fLogIPs ? strprintf(" %s %s", inbound ? "from" : "to", addr.ToStringAddrPort()) : "",
    434+                      TransportTypeAsString(pnode->m_transport->GetInfo().transport_type),
    435+                      pnode->ConnectionTypeAsString(), GetNetworkName(pnode->ConnectedThroughNetwork()),
    


    vasild commented at 1:15 pm on October 17, 2023:

    Dereferencing pnode after FindNode() has released m_nodes_mutex is not safe. I don’t like adding recursive mutex calls, but without altering FindNode() we have to do that - this code has to acquire the mutex before any call to FindNode(), dereference the returned pointer and then release the mutex:

     0 bool CConnman::AlreadyConnectedToAddress(const CAddress& addr, ConnectionType conn_type, bool log)
     1 {
     2     if (Params().IsTestChain()) return false; // allow bilateral connections on test chains
     3     const CService service{MaybeFlipIPv6toCJDNS(addr)};
     4     const bool inbound{conn_type == ConnectionType::INBOUND};
     5     CNode* pnode{nullptr};
     6+    LOCK(m_nodes_mutex);
     7     if (inbound) {
     8         // For accepting an inbound connection, check all existing connections by addr only.
     9         pnode = FindNode(static_cast<CNetAddr>(service));
    10     } else {
    11         // For making an outbound connection, check the existing outbound connections
    12         // by addr and port, and if none, then inbound connections by addr only.
    13         pnode = FindNode(service.ToStringAddrPort(), /*inbound=*/false);
    14         if (!pnode) pnode = FindNode(static_cast<CNetAddr>(service), /*inbound=*/true);
    15     }
    16     if (pnode && log) {
    17         LogPrintLevel(BCLog::NET, inbound ? BCLog::Level::Debug : BCLog::Level::Info,
    18                       "Not %s new %s connection%s, already connected to %s %s %s peer=%d%s, version=%d, subver=%s\n",
    19                       inbound ? "accepting" : "opening", ConnectionTypeAsString(conn_type),
    20                       fLogIPs ? strprintf(" %s %s", inbound ? "from" : "to", addr.ToStringAddrPort()) : "",
    21                       TransportTypeAsString(pnode->m_transport->GetInfo().transport_type),
    22                       pnode->ConnectionTypeAsString(), GetNetworkName(pnode->ConnectedThroughNetwork()),
    23                       pnode->GetId(), fLogIPs ? strprintf(", peeraddr=%s", pnode->addr.ToStringAddrPort()) : "",
    24                       pnode->nVersion.load(), WITH_LOCK(pnode->m_subver_mutex, return pnode->cleanSubVer));
    25     }
    26-    return pnode;
    27+    return pnode != nullptr;
    28 }
    

    jonatack commented at 1:46 pm on October 17, 2023:
    Good point. (Several times I’ve been tempted to move the locking inside the FindNode methods to their callers.)

    jonatack commented at 3:46 am on October 24, 2023:

    Dereferencing pnode after FindNode() has released m_nodes_mutex is not safe. I don’t like adding recursive mutex calls, but without altering FindNode() we have to do that.

    Done, and moved the locking from the FindNode* methods to their callers.

  58. in src/net.cpp:372 in 44ceac2ef9 outdated
    369+CNode* CConnman::FindNode(const CNetAddr& ip, std::optional<bool> inbound)
    370 {
    371     LOCK(m_nodes_mutex);
    372     for (CNode* pnode : m_nodes) {
    373-      if (static_cast<CNetAddr>(pnode->addr) == ip) {
    374+        if (static_cast<CNetAddr>(pnode->addr) == ip && (!inbound || pnode->IsInboundConn() == inbound)) {
    


    vasild commented at 1:35 pm on October 17, 2023:

    Here and in the other FindNode() using std::optional<bool> as a boolean is a bit unclear whether we are checking if the optional has a value or whether the value is true. I find this more clear:

    0        if (static_cast<CNetAddr>(pnode->addr) == ip && (!inbound.has_value() || pnode->IsInboundConn() == inbound.value())) {
    

    jonatack commented at 1:44 pm on October 17, 2023:

    Added new FindNode methods with 2 params rather than modifying the existing ones.

    Done.

  59. vasild commented at 1:40 pm on October 17, 2023: contributor
    Looks good as of 44ceac2ef9 (the last commit needs to be dispersed).
  60. DrahtBot added the label Needs rebase on Oct 19, 2023
  61. jonatack force-pushed on Oct 24, 2023
  62. jonatack commented at 4:01 am on October 24, 2023: contributor
    Rebased and updated to take the latest feedback from @vasild.
  63. DrahtBot added the label CI failed on Oct 24, 2023
  64. DrahtBot removed the label Needs rebase on Oct 24, 2023
  65. p2p, bugfix: detect all inbound connections in GetAddedNodeInfo()
    Due to their potentially different port numbers, inbound peers can be overlooked
    in CConnman::GetAddedNodeInfo() when comparing CServices, i.e. both addr and port.
    
    Fix this by comparing CNetAddr instances, i.e. addr only, for inbound peers.
    
    Co-authored-by: Jon Atack <jon@atack.com>
    d746231809
  66. p2p, bugfix: detect addnode cjdns peers in GetAddedNodeInfo()
    Addnode (manual) peers connected to us via the cjdns network are currently not
    detected by CConnman::GetAddedNodeInfo(), i.e. fConnected is always false.
    
    This causes the following issues:
    
    - RPC `getaddednodeinfo` incorrectly shows them as not connected
    
    - CConnman::ThreadOpenAddedConnections() continually retries to connect them
    5f53e6c073
  67. p2p, bugfix: detect inbound cjdns peers in AlreadyConnectedToAddress()
    Currently ThreadOpenConnections() does not detect inbound cjdns peers that we
    are already connected to when it calls AlreadyConnectedToAddress().
    
    Fix this by calling MaybeFlipIPv6toCJDNS before the search.
    12ff8077ce
  68. refactor: move locks from CConnman::FindNode methods to their callers
    Dereferencing the pointer result after FindNode* has released m_nodes_mutex is unsafe,
    so we currently have to add unneeded recursive locks before calling FindNode*.
    
    Fix this by acquiring the mutex before calling FindNode* rather than inside them.
    
    Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
    95d8ad3b9a
  69. refactor: allow CConnman::FindNode methods to filter by inbound status 91796ad272
  70. jonatack force-pushed on Oct 24, 2023
  71. jonatack renamed this:
    p2p: peer connection bug fixes, logic and logging improvements
    p2p: peer connection bug fixes
    on Oct 24, 2023
  72. vasild approved
  73. vasild commented at 2:23 pm on October 24, 2023: contributor
    ACK f196b096f55f909e2bb63d5ff43bc3418c0ebbab
  74. in src/net.cpp:1823 in 2ec971326d outdated
    1818@@ -1819,6 +1819,14 @@ void CConnman::CreateNodeFromAcceptedSocket(std::unique_ptr<Sock>&& sock,
    1819     }
    1820 
    1821     const bool inbound_onion = std::find(m_onion_binds.begin(), m_onion_binds.end(), addr_bind) != m_onion_binds.end();
    1822+
    1823+    // Do not accept inbound connections from peers already connected to us.
    


    dergoegge commented at 2:25 pm on October 24, 2023:

    Just because the connections come from the same IP does no mean it is the same peer, e.g. multiple peers could be connecting to you through the same Tor exit relay.

    This does not seem like a bug fix and more like a behavior change. Our inbound eviction logic should handle stuff like this and it probably already does.


    jonatack commented at 2:32 pm on October 24, 2023:
    Inbound tor peers are exempted from this check. I do see many such redundant connection attempts logged when running this branch. In the case of I2P peers, needlessly accepting to build these tunnels can be expensive. As mentioned in the commit message, the other fixes in this PR may mitigate this behavior by our peers, but not for nodes running earlier versions of Bitcoin Core.

    dergoegge commented at 2:38 pm on October 24, 2023:

    Inbound tor peers are exempted from this check.

    We don’t have logic for detecting if someone is using Tor as a proxy to connect to the IPv4 network. Our “inbound Tor” classification simply refers to someone connecting to our hidden service.

    Another example would be multiple peers connecting to us through the same VPN provider.


    jonatack commented at 2:47 pm on October 24, 2023:
    Ok, this could be dropped and deferred until later, to see if the other fixes resolve the behavior seen from peers (I think it’s likely).

    jonatack commented at 7:21 pm on October 24, 2023:
    Updated to only apply this check to inbound I2P peers we’re already connected to, due to the resource cost of creating new tunnels.

    vasild commented at 2:33 pm on October 30, 2023:
    Yes, different peers can be connecting to us through the same NAT box, Tor exit node, or VPN server. Or it might be the same peer indeed. I think that the pros of assuming it is the same peer out-weight the pros of assuming it is not. However, I can see how this can be arguable and without having some solid numbers to back either argument maybe best to keep this unchanged.

    jonatack commented at 9:29 pm on November 1, 2023:
    Yes, I’ve been running this branch with its logging on for several months, and grepping the log turns up dozens, sometimes hundreds, of connection attempts per day from one or two inbound I2P peers on a given day that are already connected. I’m not sure at the moment of writing whether the behavior is malicious or due to buggy outbound logic (which would only be fixed in future releases and not existing releases connecting to us, though).
  75. in src/net.cpp:367 in f196b096f5 outdated
    373 
    374-bool CConnman::AlreadyConnectedToAddress(const CAddress& addr)
    375+bool CConnman::AlreadyConnectedToAddress(const CAddress& addr, ConnectionType conn_type, bool log)
    376 {
    377-    return FindNode(static_cast<CNetAddr>(addr)) || FindNode(addr.ToStringAddrPort());
    378+    if (Params().IsTestChain()) return false; // allow bilateral connections on test chains
    


    fanquake commented at 2:30 pm on October 24, 2023:
    0    if (Params().IsTestChain()) return false; // allow bilateral connections on test chains
    

    This seems very out of place to me, and would also prevent writing certain types of tests?


    jonatack commented at 2:33 pm on October 24, 2023:
    This is required currently for several of our p2p functional tests. They would need to be updated not to make bilateral connections.

    fanquake commented at 2:35 pm on October 24, 2023:
    Which tests?

    fanquake commented at 2:39 pm on October 24, 2023:

    They would need to be updated not to make bilateral connections.

    I would suggest updating the tests to account for the behaviour you’ve changed (it may also help to make the bug fixes clearer for reviewers); rather than introducing per-chain, networking functionality. This seems generally dangerous, and mostly unintuitive.


    jonatack commented at 7:19 pm on October 24, 2023:

    Good points. Tests like rpc_net.py, p2p_blockfilters.py, wallet_groups and others, are making bilateral connections using connect_nodes(). For instance, from rpc_net.py:

    0        # By default, the test framework sets up an addnode connection from
    1        # node 1 --> node0. By connecting node0 --> node 1, we're left with
    2        # the two nodes being connected both ways.
    3        # Topology will look like: node0 <--> node1
    

    Behind the scenes, the test framework is calling RPC addnode "onetry" for the connect_nodes calls. Our peer detection looked like it was intended to find already connected inbound peers, but in practice it did not in some cases, and one is when we make addnode connections in our testing. Perhaps it is an expected use case.

    For now, I updated AlreadyConnectedToAddress() to explicitly allow bilateral connections for manual peers and dropped the IsTestChain() check.

    Edit: the AlreadyConnectedToAddress() commit is too strict, which accounts for some of the test issues. Am updating and adding test coverage.

  76. brunoerg commented at 2:31 pm on October 24, 2023: contributor

    Concept ACK

    tests for the bug fixes would be helpful

  77. in src/net.cpp:2839 in f196b096f5 outdated
    2832@@ -2783,18 +2833,21 @@ std::vector<AddedNodeInfo> CConnman::GetAddedNodeInfo() const
    2833     }
    2834 
    2835     for (const auto& addr : lAddresses) {
    2836-        CService service(LookupNumeric(addr.m_added_node, GetDefaultPort(addr.m_added_node)));
    2837+        CService service{MaybeFlipIPv6toCJDNS(LookupNumeric(addr.m_added_node, GetDefaultPort(addr.m_added_node)))};
    


    fanquake commented at 3:04 pm on October 24, 2023:

    MaybeFlipIPv6toCJDNS

    Is there any way to better encapsulate this MaybeFlipIPv6toCJDNS() usage? This seems to be a frequent pattern (a number of additional calls were added in #27071), and it seems like something that would be better handled at a lower level, rather than having to continually remember to wrap various networking-related calls.

    I guess this is also able to remain broken is various ways because we don’t have any specific tests for this functionality?


    jonatack commented at 4:11 pm on October 25, 2023:
    Yes, there are more than a dozen of these. Perhaps it can be part of a CNetAddr ctor after adding test coverage.
  78. p2p, bugfix: correctly detect connected peers in AlreadyConnectedToAddress()
    using the following logic:
    
    If we are about to open an outbound connection
      - for all outbound connections, compare the peer's address and port
      - for all inbound connections and provided we're not making a manual connection,
        compare only the address and ignore the port
    
    If we are about to accept an inbound connection
      - for all connections, both inbound and outbound, compare only the address and
        ignore the port
    
    Fix the logging to be actually useful by stating which peer is already connected,
    and further improve it to provide full details for debugging purposes.
    
    Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
    cb82a9285c
  79. p2p, bugfix: correctly detect already connected peers in ConnectNode()
    by upgrading the checks from FindNode() to AlreadyConnectedToAddress().
    c2d3226653
  80. p2p: do not accept inbound connections from already connected I2P peers
    as creating I2P tunnels is expensive.
    
    These duplicate connection requests come most frequently from inbound peers.
    The other fixes in this PR may mitigate this behavior by our peers, but not for
    nodes running earlier versions of Bitcoin Core.
    
    Extend a related functional test.
    225b6c868d
  81. refactor: pass AddedNodeParams to CConnman::RemoveAddedNode()
    instead of a std::string (and rename its argument and that of
    CConnman::AddNode() to "params").
    
    Passing AddedNodeParams makes the two member functions AddNode and
    RemoveAddedNode consistent, and is needed in order to refactor
    `m_added_node_params` from a vector to an unordered set in the next commit.
    691e2ef49d
  82. refactor: change m_added_node_params from vector to unordered_set
    which simplifies and speeds up the AddNode() and RemoveAddedNode() methods,
    as well as the next commit, "p2p: do not make automatic outbound connections
    to addnode peers", which checks if an address is already in m_added_node_params.
    
    This refactoring is covered by functional tests in rpc_net.py.
    
    Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
    e5e75f6bdf
  83. p2p, bugfix: do not make automatic outbound connections to addnode peers
    to allocate our limited outbound slots correctly, and to ensure addnode
    connections benefit from their intended protections.
    
    Our addnode logic usually connects the addnode peers before the automatic
    outbound logic does, but not always, as a connection race can occur.
    
    For instance, our internet connection or router (or the addnode peer) could be
    temporarily offline, and then return online during the automatic outbound
    thread.  Or we could add a new manual peer using the `addnode` RPC at that time.
    The race can be more apparent when our node doesn't know many peers, or with
    networks like cjdns that currently have few bitcoin peers.
    
    Examples on mainnet using logging added in the same pull request:
    
    2023-08-12T14:51:05.681743Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
    [net:debug] Not making automatic network-specific outbound-full-relay connection
    to i2p peer selected for manual (addnode) connection: [geh...odq.b32.i2p]:0
    
    2023-08-13T03:59:28.050853Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
    [net:debug] Not making automatic block-relay-only connection to onion peer
    selected for manual (addnode) connection: kpg...aid.onion:8333
    
    2023-08-13T16:21:26.979052Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
    [net:debug] Not making automatic network-specific outbound-full-relay connection
    to cjdns peer selected for manual (addnode) connection: [fcc...8ce]:8333
    
    2023-08-14T20:43:53.401271Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
    [net:debug] Not making automatic network-specific outbound-full-relay connection
    to cjdns peer selected for manual (addnode) connection: [fc7...59e]:8333
    
    2023-08-15T00:10:01.894147Z [opencon] [net.cpp:1949] [ThreadOpenConnections]
    [net:debug] Not making automatic feeler connection to i2p peer selected for
    manual (addnode) connection: geh...odq.b32.i2p:8333
    
    When an addnode peer is connected as an automatic outbound peer and is the only
    connection we have to a network, it can be protected by our new outbound
    eviction logic and persist in the "wrong role".
    
    Finally, there does not seem to be a reason to make block-relay or short-lived
    feeler connections to addnode peers, as the addnode logic will ensure we connect
    to them if they are up, within the addnode connection limit.
    
    Fix these issues by checking if the address is an addnode peer in our automatic
    outbound connection logic.
    
    Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
    21f8426c82
  84. p2p: improve ConnectNode log, rm redundant ThreadOpenConnections log
    By moving the "Trying connection" log entry in ConnectNode to after the address
    resolution, we can:
    
    - not needlessly log the attempt until after checking if we are already connected to it
    
    - know the network the peer connected through, i.e. for cjdns peers
    
    - add useful peer info to the log entry for debugging purposes
    
    This change makes the feeler log entry in CConnman::ThreadOpenConnections()
    redundant...
    
    2023-09-06T01:11:42.614464Z Making feeler connection to dft...dyd.onion:8333
    2023-09-06T01:11:42.614722Z Trying feeler connection to net=onion, peeraddr=dft...dyd.onion:8333, lastseen=4.2hrs
    
    ...so remove it.
    b8b1d5992a
  85. p2p: log when protecting last remaining network peer from eviction
    This logging has been helpful for observing and debugging our peer connection
    and eviction behavior.
    
    Also improve related log entries for
    - eviction of an inbound peer
    - protecting a peer from eviction for a bad or slow chain
    09030c6ca2
  86. p2p: improve inbound/outbound/anchor/network-specific log messages 931df5ae15
  87. refactor: simplify MaybePickPreferredNetwork, drop out-param, return optional
    and make it a const class method, and add Clang thread-safety analysis annotation and
    related assertions.
    4238b6f0a7
  88. jonatack force-pushed on Oct 24, 2023
  89. in src/net.cpp:2769 in d746231809 outdated
    2764@@ -2765,15 +2765,19 @@ std::vector<AddedNodeInfo> CConnman::GetAddedNodeInfo() const
    2765         std::copy(m_added_node_params.cbegin(), m_added_node_params.cend(), std::back_inserter(lAddresses));
    2766     }
    2767 
    2768-
    2769-    // Build a map of all already connected addresses (by IP:port and by name) to inbound/outbound and resolved CService
    2770-    std::map<CService, bool> mapConnected;
    2771+    // Collect all addresses to which we are already connected.
    2772+    std::unordered_set<CNetAddr, CNetAddrHash> connected_to_inbound;
    


    mzumsande commented at 7:56 pm on October 24, 2023:

    commit d746231809814f52695a376bd77f4d06622289a4 (p2p, bugfix: detect all inbound connections in GetAddedNodeInfo()):

    I’m not sure if this fixes a bug - comparing on the level of CNetAddr seems rather strict to me. For example, I think that in the current form, we will not attempt to make a manual connection to an addnode localhost peer 127.0.0.1:XXXX if we have any inbound tor peer. Or we wouldn’t be able to be part of a larger local network built from manual connections that has both incoming and outgoing connections to different local peers.

    More generally speaking, I think that it is very hard (impossible?) to ensure that there are no bidirectional connections without restricting legitimate use cases - and maybe not really necessary after all?


    jonatack commented at 3:58 pm on October 25, 2023:

    Thanks for your feedback. Yes, it’s too strict – am updating and working today on writing tests to demonstrate and specify our needs.

    https://github.com/bitcoin/bitcoin/commit/d746231809814f52695a376bd77f4d06622289a4 aims to resolve issues like RPC getaddednodeinfo not detecting inbound peers that have different port numbers from the same peer connected as an outbound.

    For instance, a cjdns addnode peer already connected in our inbounds as

    0[fc70:704d:b268:cdbf:d622:ea9f:db12:859e]:45594
    

    isn’t detected as connected and returned as such by the getaddednodeinfo RPC

    0  {
    1    "addednode": "fc70:704d:b268:cdbf:d622:ea9f:db12:859e",
    2    "connected": false,
    3    "addresses": [
    4    ]
    5  },
    

    when we would expect it to return

    0  {
    1    "addednode": "fc70:704d:b268:cdbf:d622:ea9f:db12:859e",
    2    "connected": true,
    3    "addresses": [
    4      {
    5        "address": "[fc70:704d:b268:cdbf:d622:ea9f:db12:859e]:8333",
    6        "connected": "inbound"
    7      }
    8    ]
    9  },
    

    A case like that could also lead to our addednodes thread continually attempting to connect to the inbound peer without detecting that we’re already connected to it.


    jonatack commented at 7:22 pm on November 15, 2023:

    After adding unit tests, this CNetAddr logic seems needed not in GetAddedNodeInfo() but only in AddNode(), where it is simpler and may be sufficient to compare by CNetAddr for cjdns entries only, in order to handle them correctly.

     0 bool CConnman::AddNode(const AddedNodeParams& add)
     1 {
     2-    const CService resolved(LookupNumeric(add.m_added_node, GetDefaultPort(add.m_added_node)));
     3+    const CService resolved{MaybeFlipIPv6toCJDNS(LookupNumeric(add.m_added_node, GetDefaultPort(add.m_added_node)))};
     4     const bool resolved_is_valid{resolved.IsValid()};
     5 
     6     LOCK(m_added_nodes_mutex);
     7     for (const auto& it : m_added_node_params) {
     8-        if (add.m_added_node == it.m_added_node || (resolved_is_valid && resolved == LookupNumeric(it.m_added_node, GetDefaultPort(it.m_added_node)))) return false;
     9+        if (add.m_added_node == it.m_added_node) return false;
    10+        if (!resolved_is_valid) continue;
    11+        const CService service{MaybeFlipIPv6toCJDNS(LookupNumeric(it.m_added_node, GetDefaultPort(it.m_added_node)))};
    12+        if (resolved == service) return false;
    13+        if (resolved.IsCJDNS() && static_cast<CNetAddr>(resolved) == static_cast<CNetAddr>(service)) return false;
    14     }
    
  90. in src/net.cpp:364 in cb82a9285c outdated
    360@@ -361,11 +361,38 @@ CNode* CConnman::FindNode(const CService& addr, std::optional<bool> inbound)
    361     return nullptr;
    362 }
    363 
    364-bool CConnman::AlreadyConnectedToAddress(const CAddress& addr)
    365+bool CConnman::AlreadyConnectedToAddress(const CAddress& addr, ConnectionType conn_type, bool log)
    


    mzumsande commented at 8:41 pm on October 24, 2023:
    commit: p2p, bugfix: correctly detect connected peers in AlreadyConnectedToAddress() Could you list some examples of scenarios that are not working correctly currently and are improved by this? I’m trying to understand the impact of this commit better.

    jonatack commented at 4:01 pm on October 25, 2023:
    Agree, am updating this commit and writing tests to demonstrate and specify our needs.
  91. maflcko commented at 7:56 pm on November 1, 2023: member
    Maybe mark as draft while CI is red?
  92. jonatack commented at 8:07 pm on November 1, 2023: contributor
    Done. Note that only the latest push is red. I can push a version that is green, but have been rebasing on #28155 that I reckon will be merged soon, and adding missing test coverage. Concept ACKs on fixing the bugs would be encouraging 😃
  93. jonatack marked this as a draft on Nov 1, 2023
  94. jonatack commented at 8:15 pm on November 1, 2023: contributor
    Note also, that item 7 in the PR description involves a regression in v26.x.
  95. fanquake commented at 8:48 pm on November 1, 2023: member

    Note also, that item 7 in the PR description involves a regression in v26.x.

    What is the regression? Which change caused it?

  96. mzumsande commented at 9:18 pm on November 1, 2023: contributor

    Concept ACKs on fixing the bugs would be encouraging 😃

    I think that some of the commits are straightforward (e.g. the missing MaybeFlipIPv6toCJDNS calls or the logging improvements), whereas others are fixing some non-ideal behavior in some cases, but have the risk of introducing new non-ideal behavior in other cases because the fix comes with downsides of its own.

    For example, if it isn’t possible to distinguish situations in which two connections involving the same IP are controlled by the same entity or different entities, any solution would be imperfect, either allowing duplicate connections or preventing legitimate behavior (that is possibly widely used in local setups). So it’s unclear to me if the status quo on these is actually a bug or an intended best-effort solution / the result of a compromise.

    Maybe some of the straightforward bugfixes could be split out into another PR?

  97. jonatack commented at 9:20 pm on November 1, 2023: contributor

    Reported in https://github.com/bitcoin/bitcoin/pull/27213/files#r1291926369 and described in https://github.com/bitcoin/bitcoin/pull/28248/commits/21f8426c823fd83c40fd81f08ff2bf39ec6a4575, there is an interaction between the new network-specific automatic outbound logic and the addnode list: the former doesn’t account for the latter.

     0Notable changes
     1===============
     2
     3P2P and network changes
     4-----------------------
     5
     6- Nodes with multiple reachable networks will actively try to have at least one
     7  outbound connection to each network. This improves individual resistance to
     8  eclipse attacks and network level resistance to partition attacks. Users no
     9  longer need to perform active measures to ensure being connected to multiple
    10  enabled networks. (#27213)
    

    Node operators running over multiple networks may be likely to use -addnode to ensure connection to each of them (I do). They would be the ones most likely to be affected by the change with their addnode peers being allotted to rare outbound slots instead. It makes sense to exclude addnode peers from the automatic outbound logic, either in general or for the new network-specific one.

  98. jonatack commented at 9:25 pm on November 1, 2023: contributor

    Maybe some of the straightforward bugfixes could be split out into another PR?

    Yes – looking at doing that, or dropping here for now any that aren’t relatively clear.

  99. in src/net.cpp:1831 in 225b6c868d outdated
    1822@@ -1823,6 +1823,13 @@ void CConnman::CreateNodeFromAcceptedSocket(std::unique_ptr<Sock>&& sock,
    1823     }
    1824 
    1825     const bool inbound_onion = std::find(m_onion_binds.begin(), m_onion_binds.end(), addr_bind) != m_onion_binds.end();
    1826+
    1827+    // Do not accept inbound connections from I2P peers already connected to us,
    1828+    // as creating new tunnels is expensive.
    1829+    if (addr.IsI2P() && AlreadyConnectedToAddress(addr, ConnectionType::INBOUND)) {
    1830+        return;
    1831+    }
    


    dergoegge commented at 5:12 pm on November 5, 2023:

    Isn’t the tunnel already created at this point? Afaik the tunnel creation is handled by the i2p gateway (e.g. i2pd) not on our side.

    I’m also not sure what we are protecting against here. Assuming tunnel creation is sufficiently DoSy (which it shouldn’t be), this isn’t enough to prevent attacks as an attacker could just not connect from the same address.


    jonatack commented at 3:54 pm on November 15, 2023:
    Yes, I’m going to punt on this commit for now. Seeing dozens to hundreds of connection attempts per day from, say, 1-2 inbound I2P peers that are already connected. However, I’m not sure these peers are malicious, and the behavior might be due to other issues in our already connected detection or outbound logic addressed in this PR.
  100. DrahtBot added the label Needs rebase on Nov 8, 2023
  101. DrahtBot commented at 1:21 pm on November 8, 2023: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  102. mzumsande commented at 3:02 pm on November 15, 2023: contributor

    Note also, that item 7 in the PR description involves a regression in v26.x.

    I don’t think that this can be called a regression:

    • There is nothing new about the root problem (making automatic connections to addnode peers in rare cases) in v26.
    • It can happen that such a peer could now be protected as the only one of its network, but to determine if it’s a regression this needs to be compared with the status before.
    • Before we did protection-by-network, we would only ever get into the situation with an extra-outbound peer if we had a stale tip (30 minutes). Unless we are eclipsed or something weird is going on, this happens naturally a few times per day on average.
    • If we are in that situation, it is not very probable that we’d evict the outbound peer that we’d rather have as an addnode peer. First of all, we’d usually just evict the new extra-peer (unless it provides a new block, which should happen very rarely, especially when we have other high-bandwidth OB peers at the same time). But even in the rare cases, we often would not evict the specific peer to solve the addnode problem, that peer might be one of the 4 protected peers, or we might just pick another one.

    I’d guess you could probably have waited for months to actually evict a mis-placed peer that way. So I’d say that the stale-tip eviction was never a functioning mechanism to resolve these kind of addnode problems (and never meant to be), and as a result, #27213 was not a regression - in practice, it might arguably be even an improvement instead of a regression, because in those cases where the peer in question is not the only one from its network, we’ll get into the situation where we have an extra peer to evict more frequently than before.

  103. jonatack commented at 3:15 pm on November 15, 2023: contributor
    @mzumsande The improved logging allows distinguishing when this is due to the new protection logic, and with occasional exceptions, it is.
  104. mzumsande commented at 3:31 pm on November 15, 2023: contributor

    @mzumsande The improved logging allows distinguishing when this is due to the new protection logic, and with occasional exceptions, it is.

    That’s not my point. My point (see above) is that in earlier versions, the extra-outbound eviction (which would only happen in the stale-tip case then) was not a working or reliable mechanism to evict these peers, so there is no regression. Or did I miss something in my above explanation / do you have logs that pre-26 the stale-tip outbound eviction could somehow resolve this problem reliably?

  105. jonatack commented at 3:42 pm on November 15, 2023: contributor

    Empirically, the regression is that we’re actively adding addnode peers as outbound ones, and keeping them there. Since a couple of years I run a node with all three of our privacy networks and addnode entries for each, along with clearnet, and I didn’t see my addnode peers being connected to as non-manual outbounds. This is new behavior for my node.

    Aside, I’m adding unit tests for each change right now, and these are finding additional issues.

  106. mzumsande commented at 4:26 pm on November 15, 2023: contributor

    Empirically, the regression is that we’re actively adding addnode peers as outbound ones, and keeping them there.

    Oh, I thought you were talking about a race at startup - since the extra outbound should only kick in after 5 minutes, I think that should be enough to make connections to the addnode peers.

    I think what might be happening later is that an addnode peer disconnects us, and if it was the only one from its network then there could be a race between picking another network-related outbound peer (which could be the peer that just disconnected us if addrman would select it) and reconnecting to the peer that evicted us with addnode thread. Could that be what you are seeing?

  107. jonatack commented at 5:30 pm on November 15, 2023: contributor
    @mzumsande I think so, yes, along with the other races mentioned in the commit message https://github.com/bitcoin/bitcoin/pull/28248/commits/21f8426c823fd83c40fd81f08ff2bf39ec6a4575. I’ll add the case you describe to it (thanks!)
  108. jonatack commented at 5:33 pm on November 15, 2023: contributor

    I’d be interested to hear anyone’s thoughts on this, from that commit message:

    Finally, there does not seem to be a reason to make block-relay or short-lived feeler connections to addnode peers, as the addnode logic will ensure we connect to them if they are up, within the addnode connection limit.

    This may not be optimal if one were to add many addnode entries, i.e. more than 8 that are online. I don’t know if people do that. Maybe it would be prudent to allow feelers to addnodes.

  109. mzumsande commented at 7:25 pm on November 15, 2023: contributor

    Maybe it would be prudent to allow feelers to addnodes.

    I think it’s fine to skip feelers for addnode peers. When we would skip these addresses for automatic connection, I don’t see what making a feeler connection to them would achieve, considering that the point of feelers is to have better options for future automatic connections.

  110. DrahtBot commented at 1:16 am on February 13, 2024: contributor

    ⌛ There hasn’t been much activity lately and the patch still needs rebase. What is the status here?

    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  111. jonatack commented at 4:19 pm on April 8, 2024: contributor
    Will continue work on these bugfixes. Closing to be able to re-open it.
  112. jonatack closed this on Apr 8, 2024

  113. fanquake referenced this in commit dd42a5ddea on May 16, 2024
  114. bitcoin deleted a comment on May 16, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me