net: attempts to connect to all resolved addresses when connecting to a node #28834

pull sr-gi wants to merge 1 commits into bitcoin:master from sr-gi:addnode-tryall changing 1 files +91 −75
  1. sr-gi commented at 5:31 pm on November 9, 2023: member

    This is a follow-up of #28155 motivated by #28155 (review)

    Rationale

    Prior to this, when establishing a network connection via CConnman::ConnectNode, if the connection needed address resolution, a single address would be picked at random from the resolved addresses and our node would try to connect to it. However, this would lead to the behavior of ConnectNode being unpredictable when the address was resolved to various ips (e.g. the address resolving to IPv4 and IPv6, but we only support one of them).

    This patches the aforementioned behavior by going over all resolved IPs until a valid one is found or until we exhaust them.

  2. DrahtBot commented at 5:31 pm on November 9, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK vasild, mzumsande, achow101
    Concept ACK naumenkogs

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)
    • #29231 (logging: Update to new logging API by ajtowns)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label P2P on Nov 9, 2023
  4. DrahtBot added the label CI failed on Nov 9, 2023
  5. maflcko commented at 1:25 pm on November 13, 2023: member
    CI failed
  6. sr-gi force-pushed on Nov 15, 2023
  7. sr-gi commented at 5:29 pm on November 15, 2023: member
    There was an issue with the interaction of the change and using a proxy. It should be fixed now
  8. in test/functional/feature_proxy.py:194 in 3bb2cd9640 outdated
    187@@ -187,7 +188,10 @@ def node_test(self, node, *, proxies, auth, test_onion, test_cjdns):
    188         addr = "node.noumenon:8333"
    189         self.log.debug(f"Test: outgoing DNS name connection through node for address {addr}")
    190         node.addnode(addr, "onetry")
    191+        # FIXME: DEADLOCK IS HERE
    192+        self.log.debug(f"Test: DEADLOCK")
    193         cmd = proxies[3].queue.get()
    194+        
    


    jonatack commented at 5:47 pm on November 15, 2023:

    CI linter needs appeasing here by removing extra spaces

    (you can check locally before pushing by running ./test/lint/lint-whitespace.py)


    sr-gi commented at 5:50 pm on November 15, 2023:
    I even left some debug comments there. My bad
  9. sr-gi force-pushed on Nov 15, 2023
  10. sr-gi force-pushed on Nov 15, 2023
  11. DrahtBot removed the label CI failed on Nov 15, 2023
  12. mzumsande commented at 8:53 pm on November 15, 2023: contributor
    Concept ACK, will review soon.
  13. in src/net.cpp:460 in 8efc4c3ec8 outdated
    456@@ -448,102 +457,108 @@ CNode* CConnman::ConnectNode(CAddress addrConnect, const char *pszDest, bool fCo
    457     assert(!addr_bind.IsValid());
    458     std::unique_ptr<i2p::sam::Session> i2p_transient_session;
    459 
    460-    if (addrConnect.IsValid()) {
    461-        const bool use_proxy{GetProxy(addrConnect.GetNetwork(), proxy)};
    462-        bool proxyConnectionFailed = false;
    463+    for (auto& addrConnect: resolvedAddresses) {
    


    mzumsande commented at 7:00 pm on December 19, 2023:
    There should be a different name than addrConnect (either here or above), shadowing should be avoided.

    sr-gi commented at 4:41 pm on December 20, 2023:
    Addressed in 9c1ab79
  14. sr-gi force-pushed on Dec 20, 2023
  15. mzumsande commented at 6:11 pm on December 20, 2023: contributor
    Would be good to adjust the title / commit message to be more general: The solution doesn’t specifically address addnode but changes OpenNetworkConnection / ConnectNode and therefore all callers that use it with pszDest set (-connect, -seednode, test-only -addconnection) are affected by the change.
  16. sr-gi force-pushed on Dec 20, 2023
  17. sr-gi renamed this:
    net: Attempts to connect to all resolved addresses on `addnode`
    net: attempts to connect to all resolved addresses when connecting to a node
    on Dec 20, 2023
  18. sr-gi commented at 8:56 pm on December 20, 2023: member

    Would be good to adjust the title / commit message to be more general: The solution doesn’t specifically address addnode but changes OpenNetworkConnection / ConnectNode and therefore all callers that use it with pszDest set (-connect, -seednode, test-only -addconnection) are affected by the change.

    Updated both commit message/title and PR title/desciption

  19. DrahtBot added the label CI failed on Jan 16, 2024
  20. sr-gi force-pushed on Jan 29, 2024
  21. sr-gi commented at 10:07 pm on January 29, 2024: member
    Rebased to fix CI
  22. naumenkogs commented at 8:59 am on January 30, 2024: member
    Concept ACK
  23. DrahtBot removed the label CI failed on Jan 30, 2024
  24. in src/net.cpp:420 in 494c732fab outdated
    416@@ -417,6 +417,7 @@ CNode* CConnman::ConnectNode(CAddress addrConnect, const char *pszDest, bool fCo
    417     // Resolve
    418     const uint16_t default_port{pszDest != nullptr ? GetDefaultPort(pszDest) :
    419                                                      m_params.GetDefaultPort()};
    420+    std::vector<CAddress> resolvedAddresses{};
    


    vasild commented at 9:34 am on February 13, 2024:

    naming: variable names should use snake_case according to doc/developer-notes.md.

    Also, this contains the “resolved” addresses in the first branch, but in the other branches it has just a copy of addrConnect. In that case there is no “resolving” involved. Maybe a better name would be connect_to?


    sr-gi commented at 3:08 pm on February 14, 2024:

    I’ve changed resolvedAddresses -> connect_to, and resolvedAddr -> target_addr for consistency.

    Also, I added a comment describing what connect_to is and how it gets populated

  25. in src/net.cpp:461 in 494c732fab outdated
    457@@ -449,102 +458,108 @@ CNode* CConnman::ConnectNode(CAddress addrConnect, const char *pszDest, bool fCo
    458     assert(!addr_bind.IsValid());
    459     std::unique_ptr<i2p::sam::Session> i2p_transient_session;
    460 
    461-    if (addrConnect.IsValid()) {
    462-        const bool use_proxy{GetProxy(addrConnect.GetNetwork(), proxy)};
    463-        bool proxyConnectionFailed = false;
    464+    for (auto& resolvedAddr: resolvedAddresses) {
    


    vasild commented at 9:39 am on February 13, 2024:

    Here we iterate over all resolved addresses and I think this is the right thing to do as usually there are just a few, maybe IPv4 and IPv6 and some fallback addresses for redundancy.

    What is the worst thing that can happen? A broken/malicious hostname could resolve to 100s addresses all of which timeout slowly, causing some delay here. I think that is ok, given that pzsDest never comes unchecked from random/evil peers over the P2P network and is always provided manually by the operator of the node, right?


    sr-gi commented at 4:12 pm on February 14, 2024:

    What is the worst thing that can happen? A broken/malicious hostname could resolve to 100s addresses all of which timeout slowly, causing some delay here.

    We are limiting the number of resolutions to 256

    I think that is ok, given that pzsDest never comes unchecked from random/evil peers over the P2P network and is always provided manually by the operator of the node, right?

    That’s my understanding too. This is the call tree for ConnectNode:

    0RPC::AddConnection -->      AddConnection               --|
    1ThreadOpenConnections -->   ProcessAddrFetch            --|  
    2Connman::Start -->          ThreadOpenConnections (x2)  ------->  OpenNetworkConnection --> ConnectNode 
    3CConnman::Start -->         ThreadOpenAddedConnections  --|
    4                            RPC::AddNode ("onetry")     __|
    

    Out of all these, the only method that is not purely manual is ThreadOpenConnections, but that only calls OpenNetworkConnection with pszDest set if connect parameters have been passed to startup, which are also manual.


    mzumsande commented at 8:29 pm on February 15, 2024:

    I think that is ok, given that pzsDest never comes unchecked from random/evil peers over the P2P network and is always provided manually by the operator of the node, right?

    With the slight exception that DNS seeds can, in certain circumstances, be added as AddAddrFetch peers and then queried via pszDest. These are not provided by the operator manually but hardcoded.


    sr-gi commented at 10:11 pm on February 16, 2024:
    I’m a bit curious about this. I think I’ll try to dig into how bad could it be if a malicious DNS seed provides us with 256+ addresses that try to delay the TPC handshake as much as possible

    sr-gi commented at 7:07 pm on February 20, 2024:

    I did some digging on the topic. Looks like the way TCP handshake timeout works depends on the OS, but is managed by the application who’s trying to create/accept the connection. In our case, this is governed by nConnectTimeout, which ranges from 0-5000ms (being the later the default: DEFAULT_CONNECT_TIMEOUT).

    So, the longest this could take to resolve, assuming 256 addresses that all timeout, is around ~21 min (5*256/60). As mentioned before, this is done by the user under normal conditions, and the only case in were it is not (DNS Seed + AddAddrFetch), it would require one of the DNS Seeds to go rogue for this to trigger, which seems to be a bigger problem by itself

  26. vasild commented at 9:41 am on February 13, 2024: contributor
    Approach ACK 494c732fabf656764114bfbf824e5aa60c80e2cf, just a naming nit below.
  27. sr-gi force-pushed on Feb 14, 2024
  28. vasild approved
  29. vasild commented at 4:52 pm on February 14, 2024: contributor
    ACK ba021087a2a8d1b5866460a386ea42e0b643c184
  30. DrahtBot requested review from naumenkogs on Feb 14, 2024
  31. DrahtBot requested review from mzumsande on Feb 14, 2024
  32. mzumsande commented at 8:31 pm on February 15, 2024: contributor

    ACK ba021087a2a8d1b5866460a386ea42e0b643c184

    I reviewed the code and tested this by manually connecting to a node that resolves to both an IPv4 and an IPv6 address (the latter of which I can’t connect to in my network environmen)

  33. DrahtBot removed review request from mzumsande on Feb 15, 2024
  34. in src/net.cpp:417 in ba021087a2 outdated
    416@@ -417,6 +417,9 @@ CNode* CConnman::ConnectNode(CAddress addrConnect, const char *pszDest, bool fCo
    417     // Resolve
    418     const uint16_t default_port{pszDest != nullptr ? GetDefaultPort(pszDest) :
    419                                                      m_params.GetDefaultPort()};
    420+
    421+    // Collection of addresses to try to connect to: either all dns resolved addresses (if pszDest is provided) or addrConnect otherwise.
    422+    std::vector<CAddress> connect_to{};
    


    naumenkogs commented at 9:31 am on February 16, 2024:
    nit: do we still need Shuffle(resolved.begin(), resolved.end(), FastRandomContext());? It certainly doesn’t achieve the same goal as before… I’d suggest either dropping it altogether, or document this new purpose (either inline or in the commit message).

    vasild commented at 3:38 pm on February 16, 2024:

    What was the goal before? If it was for some load balancing, e.g. if it returns 3 addresses to which we can connect to not always connect to the first result? (or have two different nodes connect to the first result)

    If “yes” then that still holds with this PR?


    sr-gi commented at 10:08 pm on February 16, 2024:

    This was added by me as a suggestion from @vasild in #28155 (review) to maintain the older logic, but it comes all the way from https://github.com/bitcoin/bitcoin/commit/6387f397b323b0fb4ca303fe418550f5465147c6#diff-00021eed586a482abdb09d6cdada1d90115abe988a91421851960e26658bed02R430.

    There is no comment nor commit message motivating it though.

    I personally think randomizing the order in which the resolved addresses are tried wouldn’t hurt. But I’m open to drop it if you think it’s better.


    naumenkogs commented at 8:57 am on February 19, 2024:

    Yeah, after looking more into the current PR, it achieves nearly the same goal… I’m fine with not doing anything here.

    But it would be cool to understand better what it achieves :)

    Here’s what i think is going on. A DNS implementation may (or may not) return results in the same order, and then all the nodes will end up making connections with the same result. Then it’s useful for load balancing, and to avoid DNS influencing the connectivity in general. Might be worth adding a comment along these lines.


    sr-gi commented at 4:38 pm on February 20, 2024:

    I did a bit more digging and found the _actual_ origin of the DNS result randomization, a code refactor before the commit I posted earlier got me confused: https://github.com/bitcoin/bitcoin/commit/1a5a4e648873c4cd88b936648ebf2858393e5510

    This was introduced in #8113 by @sipa, so he may know better


    naumenkogs commented at 7:30 am on February 22, 2024:
    I think we can go ahead with this PR without waiting for clarification.
  35. in src/net.cpp:421 in ba021087a2 outdated
    416@@ -417,6 +417,9 @@ CNode* CConnman::ConnectNode(CAddress addrConnect, const char *pszDest, bool fCo
    417     // Resolve
    418     const uint16_t default_port{pszDest != nullptr ? GetDefaultPort(pszDest) :
    419                                                      m_params.GetDefaultPort()};
    420+
    421+    // Collection of addresses to try to connect to: either all dns resolved addresses (if pszDest is provided) or addrConnect otherwise.
    


    naumenkogs commented at 9:33 am on February 16, 2024:
    nit: would be nice to say XYZ (pszDest) is provided instead of pszDest is provided
  36. in src/net.cpp:528 in ba021087a2 outdated
    568-            // If a connection to the node was attempted, and failure (if any) is not caused by a problem connecting to
    569-            // the proxy, mark this as an attempt.
    570-            addrman.Attempt(addrConnect, fCountFailure);
    571+
    572+        // Check any other resolved address (if any) if we fail to connect
    573+        if (!connected) {
    


    naumenkogs commented at 9:39 am on February 16, 2024:

    nit:

    I’d suggest to clarify the commit message saying that “a valid one is found” refers to what exactly (getting up to here and connected = true).


    sr-gi commented at 9:46 pm on February 16, 2024:

    I changed that bit of the commit message to:

    0This patches the aforementioned behavior by going over all resolved IPs until we find one
    1we can connect to or until we exhaust them.
    

    I think that should be clear enough given the variable name, but let me know otherwise

  37. in src/net.cpp:517 in ba021087a2 outdated
    551-            // no proxy needed (none set for target network)
    552-            sock = CreateSock(addrConnect);
    553+        } else if (pszDest && GetNameProxy(proxy)) {
    554+            sock = CreateSock(proxy.proxy);
    555             if (!sock) {
    556                 return nullptr;
    


    naumenkogs commented at 9:42 am on February 16, 2024:
    nit: can you add a comment to justify early termination here (instead of trying going through other addrs)
  38. naumenkogs changes_requested
  39. naumenkogs commented at 9:49 am on February 16, 2024: member
    Mostly looks good, let me know what you think about the suggestions.
  40. DrahtBot requested review from naumenkogs on Feb 16, 2024
  41. sr-gi commented at 10:12 pm on February 16, 2024: member

    Thanks for the reviews @mzumsande @vasild and @naumenkogs!

    I have some updates pending on the nits provided by @naumenkogs, but I’ll wait until we resolve #28834 (review) to push them all. I’d also look into #28834 (review) out of curiosity

  42. sr-gi force-pushed on Feb 22, 2024
  43. sr-gi force-pushed on Feb 22, 2024
  44. sr-gi commented at 2:02 pm on February 22, 2024: member
    Addressesed suggestions by @naumenkogs
  45. vasild approved
  46. vasild commented at 10:59 am on February 28, 2024: contributor
    ACK 0494068c7e0f5fda90aac874b9b3b2623d855cba
  47. DrahtBot requested review from mzumsande on Feb 28, 2024
  48. mzumsande commented at 5:24 pm on March 11, 2024: contributor
    re-ACK 0494068c7e0f5fda90aac874b9b3b2623d855cba Only doc changes since my previous ack.
  49. DrahtBot added the label Needs rebase on Mar 12, 2024
  50. sr-gi force-pushed on Mar 12, 2024
  51. sr-gi commented at 6:15 pm on March 12, 2024: member
    Rebased to address merge conflicts
  52. DrahtBot removed the label Needs rebase on Mar 12, 2024
  53. vasild approved
  54. vasild commented at 9:15 am on March 13, 2024: contributor
    ACK a75975e1de8062a571afa0a026fc0aee9a6aef56
  55. DrahtBot added the label Needs rebase on Mar 13, 2024
  56. sr-gi force-pushed on Mar 15, 2024
  57. sr-gi force-pushed on Mar 15, 2024
  58. DrahtBot commented at 9:03 pm on March 15, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/22724110270

  59. DrahtBot added the label CI failed on Mar 15, 2024
  60. sr-gi commented at 9:11 pm on March 15, 2024: member
    Rebased to address merge conflicts
  61. DrahtBot removed the label Needs rebase on Mar 15, 2024
  62. DrahtBot removed the label CI failed on Mar 15, 2024
  63. sr-gi commented at 2:16 pm on March 21, 2024: member
    @vasild @mzumsande @naumenkogs would you mind re-checking this? It feels like it’s been ready for a while but it keeps being 1 ack short
  64. mzumsande commented at 5:17 pm on March 22, 2024: contributor
    Looks like something went wrong with the rebase, there are two commits with rust code that don’t seem related.
  65. sr-gi force-pushed on Mar 22, 2024
  66. sr-gi commented at 7:50 pm on March 22, 2024: member

    Looks like something went wrong with the rebase, there are two commits with rust code that don’t seem related.

    Not really sure what happened here 🤔 but should be fixed now

  67. mzumsande commented at 4:49 pm on March 26, 2024: contributor
    re-ACK ae7afa5547889ccf04c1522555e28e025d14f335
  68. DrahtBot requested review from vasild on Mar 26, 2024
  69. in src/net.cpp:493 in ae7afa5547 outdated
    515+                if (connected) {
    516+                    sock = std::move(conn.sock);
    517+                    addr_bind = CAddress{conn.me, NODE_NONE};
    518+                }
    519+            } else if (use_proxy) {
    520+                LogPrintLevel(BCLog::PROXY, BCLog::Level::Debug, "Using proxy: %s to connect to %s:%s\n", proxy.ToString(), target_addr.ToStringAddr(), target_addr.GetPort());
    


    vasild commented at 1:37 pm on April 18, 2024:

    nit: since this line is being changed anyway, consider this:

    0                LogPrintLevel(BCLog::PROXY, BCLog::Level::Debug, "Using proxy: %s to connect to %s\n", proxy.ToString(), target_addr.ToStringAddrPort());
    

    This would take care to put IPv6 addresses in [ ] instead of printing something like 2001:470:88ff:2e::1:8333


    sr-gi commented at 2:06 pm on April 18, 2024:

    This is just being moved but agreed is simple enough to include it here.

    Covered in fd81a37

  70. vasild approved
  71. vasild commented at 1:39 pm on April 18, 2024: contributor
    ACK ae7afa5547889ccf04c1522555e28e025d14f335
  72. net: attempts to connect to all resolved addresses when connecting to a node
    Prior to this, when establishing a network connection via CConnman::ConnectNode,
    if the connection needed address resolution, a single address would be picked
    at random from the resolved addresses and our node will try to connect to it. However,
    this would lead to the behavior of ConnectNode being unpredictable when the address
    was resolved to various ips (e.g. the address resolving to IPv4 and IPv6, but we only
    support one of them).
    
    This patches the aforementioned behavior by going over all resolved IPs until we find one
    we can connect to or until we exhaust them.
    fd81a37239
  73. sr-gi force-pushed on Apr 18, 2024
  74. sr-gi requested review from mzumsande on Apr 18, 2024
  75. sr-gi requested review from vasild on Apr 18, 2024
  76. DrahtBot added the label CI failed on Apr 18, 2024
  77. DrahtBot removed the label CI failed on Apr 18, 2024
  78. vasild approved
  79. vasild commented at 1:18 pm on April 19, 2024: contributor
    ACK fd81a37239541d0d508402cd4eeb28f38128c1f2
  80. DrahtBot added the label CI failed on Apr 21, 2024
  81. mzumsande commented at 5:32 pm on April 22, 2024: contributor
    re-ACK fd81a37239541d0d508402cd4eeb28f38128c1f2 (just looked at diff, only small logging change)
  82. DrahtBot removed the label CI failed on Apr 22, 2024
  83. achow101 commented at 7:52 pm on April 25, 2024: member
    ACK fd81a37239541d0d508402cd4eeb28f38128c1f2
  84. achow101 merged this on Apr 25, 2024
  85. achow101 closed this on Apr 25, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-11 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me