p2p: Improve diversification of new connections #19860

pull naumenkogs wants to merge 3 commits into bitcoin:master from naumenkogs:2020-09-diverse-new-conn-fixes changing 1 files +16 āˆ’8
  1. naumenkogs commented at 2:06 pm on September 2, 2020: member

    This PR improves the logic around netgroup diversity of new connections. More specifically, it improves privacy and fixes a couple things I consider buggy. It also adds a bunch of documentation.

    This PR changes the following:

    1. prevents short-lived ADDR_FETCH and FEELER connections from affecting our further peer selection, because they are short-lived and should not matter
    2. ADDR_FETCH and FEELER conns are now not included in count_failure consideration (see new documentation). Even though it makes sense to consider them if they are present at the moment of the call, we should not rely on that luck. Instead, we should be explicit.
    3. ~Prevents BLOCK_RELAY connections from affecting our further full-relay peer selection, because otherwise, an attacker with sufficient ADDR-related capabilities may infer to which netgroups those (supposedly more private) connections belong.~ Abandoning this feature for now because the trade-offs are unclear, see P2P meeting at #bitcoin-core-dev from Sept. 8.
    4. Look at the existing MANUAL peers when enforcing diversity of new outbound conns.

    ~Iā€™m also not sure whether new BLOCK_RELAY connections should be distinct w.r.t existent full outbound connections. Seems like another influence vector for an attacker with AddrMan poisoning capabilities, but this threat is probably too advanced. And having a good diversity of BLOCK_RELAY conns is very beneficial.~

  2. naumenkogs marked this as a draft on Sep 2, 2020
  3. naumenkogs commented at 2:07 pm on September 2, 2020: member
    Draft until #19724 is merged.
  4. naumenkogs commented at 2:12 pm on September 2, 2020: member

    There is an interesting overlap with #19858 (periodic BLOCK_RELAY conns to sync tips).

    I think those temporarily BLOCK_RELAY conns probably should be selected similarly to the existent new BLOCK_RELAY conns, meaning diversified by both set_connected_full_relay and set_connected_block_relay. If many of our peers are Sybils, it also would improve the chance of not connecting to the same netgroup.

    Similarly to the work in this PR, I think we better select them as diverse as we can. If an attacker is capable to influence this selection, making it less diverse (by dropping one of the checks) won’t really help neither with the robustness of these selected connections nor with their privacy.

  5. DrahtBot added the label P2P on Sep 2, 2020
  6. sdaftuar commented at 3:27 pm on September 2, 2020: member

    Prevents BLOCK_RELAY connections from affecting our further full-relay peer selection, because otherwise, an attacker with sufficient ADDR-related capabilities may infer to which netgroups those (supposedly more private) connections belong.

    Do you have a way to model what an attacker’s capability would need to be in order to deduce something like this? On the face of it, it seems strange to me make this kind of peering behavior dependent of the order in which we connect peers (full-relay first or block-relay first), and in particular, my presumption would be that it’s more important to aim for peer netgroup diversity (as a first-order solution to being eclipsed) than it is to worry about inference based on that diversity (which is a second-order concern with regards to eclipse attacks) – but if the inference capability is strong then perhaps I’m mistaken in thinking that?

  7. in src/net.h:907 in f26b0ddac5 outdated
    860@@ -861,6 +861,12 @@ class CNode
    861         return m_conn_type == ConnectionType::INBOUND;
    862     }
    863 
    864+    /* Whether we send addr messages over this connection */
    865+    bool RelayAddrsWithConn() const
    


    Empact commented at 4:51 pm on September 2, 2020:
    nit: Would be a bit clearer IMO to use the Is prefix or similar to distinguish between the interrogative and the imperative.

    naumenkogs commented at 7:03 am on September 3, 2020:
    Not my work: see #19724 this is based on.
  8. in src/net.cpp:1834 in c63c87507a outdated
    1829@@ -1830,7 +1830,8 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
    1830         // Only connect out to one peer per network group (/16 for IPv4).
    1831         int nOutboundFullRelay = 0;
    1832         int nOutboundBlockRelay = 0;
    1833-        std::set<std::vector<unsigned char> > setConnected;
    1834+        std::set<std::vector<unsigned char> > set_connected_full_relay;
    1835+        std::set<std::vector<unsigned char> > set_connected_block_relay;
    


    Empact commented at 4:58 pm on September 2, 2020:
    nit: I don’t believe the > > construction is necessary anymore here. See other uses<*>> in the codebase.
  9. DrahtBot commented at 5:27 pm on September 2, 2020: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #24170 (p2p, rpc: Manual block-relay-only connections with addnode by mzumsande)
    • #22910 (net: Encapsulate asmap in NetGroupManager by jnewbery)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  10. naumenkogs commented at 7:20 am on September 3, 2020: member

    @sdaftuar I agree that immediate diversity is more important than theoretical anti-inference. But since it’s easy to get confused here, I want to point out that after the second commit:

    • new block relay conns are made with full diversity, same as before
    • new full relay conns are made distinct w.r.t. existing full relay conns

    The difference is only that existing block relay conns don’t affect new full relay conns.

    So, like, just 2 of the existing peers, for now, won’t participate in diversity enforcement. If we have more (say 8 block relay conns), we might introduce some randomness: like 4 conns are used to diversify new full conns, and 4 are hidden.

    Do you have a way to model what an attacker’s capability would need to be in order to deduce something like this?

    No, not right now. I just assume an attacker can influence what our AddrMan has. The influence shouldn’t even be that strong, they just have to force us to connect to a set of (given by them) peers on-demand with some probability.

    For example, they exploited eviction and made us connect to them (via regular conns) exclusively. Now to eclipse us they have to find out about our block-relay-only conns and potentially disrupt those. May be feasible with these capabilities and enough protocol knowledge :) Since block-relay-only conns are created (sort of) specifically to handle scenarios like this, I wanna make them more robust.


    If this is not convincing, I can drop the commit until when/if I can actually demostrate a feasible attack.

  11. naumenkogs force-pushed on Sep 3, 2020
  12. naumenkogs commented at 10:53 am on September 3, 2020: member

    Added one more improvement: diversify by MANUAL connections (consider them regular full outbound). The privacy considerations I have for BLOCK_RELAY don’t apply here because MANUAL are not particularly private… (since they support full relay)

    Not sure whether they should be excluded when considering count_failure, similarly to “Having one persistent outbound peer probably means it’s from our local network […]” reasoning in the comment.

  13. naumenkogs marked this as ready for review on Sep 3, 2020
  14. naumenkogs force-pushed on Sep 3, 2020
  15. in src/net.cpp:1834 in 229aba769e outdated
    1829@@ -1830,7 +1830,8 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
    1830         // Only connect out to one peer per network group (/16 for IPv4).
    1831         int nOutboundFullRelay = 0;
    1832         int nOutboundBlockRelay = 0;
    1833-        std::set<std::vector<unsigned char> > setConnected;
    1834+        std::set<std::vector<unsigned char>> set_connected_full_relay;
    1835+        std::set<std::vector<unsigned char>> set_connected_block_relay;
    


    ariard commented at 2:42 pm on September 4, 2020:
    On the threading side, it sounds to me a bit inefficient to recompute those sets for each test membership of a selected addr as we have to lock and iterate vNodes each time. A future improvement could be to cache them and only update as we add/drop nodes ?

    naumenkogs commented at 10:57 am on September 7, 2020:
    I would leave this for future improvements.
  16. in src/net.cpp:1846 in a7bc2eb070 outdated
    1846+                // Our goal here is to not use multiple of our limited outbound slots on a single netgroup.
    1847                 switch (pnode->m_conn_type) {
    1848                     case ConnectionType::INBOUND:
    1849+                        // Not an outbound slot.
    1850+                        // Additionally, they could be attacker controlled and used to prevent
    1851+                        // us from connecting to particular hosts if we used them here.
    


    ariard commented at 2:54 pm on September 4, 2020:

    You can precise further exact caution we should have wrt to inbound netgroup diversity.

    “If we enforce some netgroup diversity on our inbound slots in the future we should avoid to intersect this set of pinned netgroups with the outbound one. That way to avoid an attacker restraining diversity of potential outbound connections by already occupying a netgroup as an inbound”


    naumenkogs commented at 9:18 am on September 8, 2020:
    I’d rather not keep comments w.r.t potential future improvements in the codebase.
  17. in src/net.cpp:1848 in b2e36fff58 outdated
    1844@@ -1845,7 +1845,6 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
    1845                         // Not an outbound slot.
    1846                         // Additionally, they could be attacker controlled and used to prevent
    1847                         // us from connecting to particular hosts if we used them here.
    1848-                    case ConnectionType::MANUAL:
    1849                         // Manually selected, so should not affect our peer selection.
    


    ariard commented at 2:55 pm on September 4, 2020:
    You should remove this comment.
  18. ariard commented at 3:55 pm on September 4, 2020: member

    I think you have don’t consider not only AddrMan poisoning capabilities but also more powerful ones like connections-interference capabilities. E.g an adversary observing an outbound connection from Alice to Bob and repeatedly opening inbound towards Bob until Alice is dropped. More concretely, if we assume that an attacker can map a victim outbound full-relay peers through tx-relay and manipulate those connections liveliness, a serie of which netgroups have been visited can be established. Observing the missing ones among the publicly available peerspace should hint attacker towards which netgroups are occupied by victim’s block-relay slots.

    I understand this PR as aiming to provoke netgroup collisions among the set of full-relay and block-relay peers to avoid making observable the absence of it. And thus improving the non-observability of block-relay peers. Of course it has as a side-effect to decrease the global outbound peers diversity as full-relay netgroups may overlap with block-relay ones. I think this is acceptable as the diversity reduction come at the win of making it far harder to disrupt block-relay connections. Previously an attacker should have committed a higher number Sybils spread in more diverse netgroups and could have interfered with victim’s honest block-relay to increase odds of its deployed Sybils being selected as block-relay. Now, an attacker may commit a lower number of Sybils spread in a less diverse netgroups but is more restrained to interfere, and thus likely decrease its odds of being selected.

    Overall, I think diversity and network topology non-observability are improving eclipse-attack resistance on different axis. As of today, the current trade-off we have between them is more due to the absence of clean separation between block-relay/tx-relay and a low number of block-relay-only.

    Concept ACK

  19. naumenkogs force-pushed on Sep 8, 2020
  20. naumenkogs force-pushed on Sep 9, 2020
  21. naumenkogs commented at 7:16 am on September 9, 2020: member

    This PR is now updated to be a non-controversial improvement:

    • don’t diversify by ADDR_FETCH and FEELER
    • do diversify by MANUAL

    I dropped the part about BLOCK_RELAY diversity.

  22. in src/net.cpp:1841 in 9c2babb8a4 outdated
    1837@@ -1838,19 +1838,22 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
    1838                 if (pnode->IsFullOutboundConn()) nOutboundFullRelay++;
    1839                 if (pnode->IsBlockOnlyConn()) nOutboundBlockRelay++;
    1840 
    1841-                // Netgroups for inbound and manual peers are not excluded because our goal here
    1842-                // is to not use multiple of our limited outbound slots on a single netgroup
    1843-                // but inbound and manual peers do not use our outbound slots. Inbound peers
    1844-                // also have the added issue that they could be attacker controlled and used
    1845-                // to prevent us from connecting to particular hosts if we used them here.
    1846+                // Our goal here is to not use multiple of our limited outbound slots on a single netgroup.
    


    Zero-1729 commented at 1:47 pm on September 16, 2020:

    nit: maybe consider rephrasing this to

    0// Our goal here is to not use multiple limited outbound slots on a single netgroup.
    
  23. naumenkogs force-pushed on Sep 24, 2020
  24. vasild approved
  25. vasild commented at 11:29 am on August 25, 2021: contributor

    ACK c7699eeaff5f4045cbe2aa64c615d3800d152ff9

    Makes perfect sense to me.

  26. Zero-1729 approved
  27. Zero-1729 commented at 1:11 pm on August 28, 2021: contributor

    crACK c7699ee

    LGTM, makes sense to exclude feeler and addr_fetch, and consider manual conns.

  28. naumenkogs commented at 3:18 pm on August 31, 2021: member
    Sorry for being absent, I’ll try to rebase it sometime soon.
  29. bitcoin deleted a comment on Aug 31, 2021
  30. amitiuttarwar commented at 10:48 pm on September 2, 2021: contributor
    • concept ACK to removing netgroup influence of ADDR_FETCH & FEELER
    • OP says they “are now not included in count_failure consideration (see new documentation)” « what is this in reference to? is this a relic from a previous version of the PR?
    • unsure about including MANUAL peers for outbound diversity & don’t see anything on this PR that explains why. can you share your reasoning?
  31. DrahtBot added the label Needs rebase on Sep 6, 2021
  32. p2p: Diversify connections only w.r.t *persistent* outbound peers
    ADDR_FETCH and FEELER are short-lived connections,
    and they should not affect our choice of peers.
    
    Also, improve comments.
    916d8f03c7
  33. naumenkogs renamed this:
    Improve diversification of new connections: privacy and stability
    p2p: Improve diversification of new connections
    on Sep 14, 2021
  34. naumenkogs commented at 1:07 pm on September 14, 2021: member

    @amitiuttarwar

    OP says they “are now not included in count_failure consideration (see new documentation)” « what is this in reference to? is this a relic from a previous version of the PR?

    I don’t even remember. Dropped this from the description. A lot of discussion in this PR, maybe worth opening a new one…

    unsure about including MANUAL peers for outbound diversity & don’t see anything on this PR that explains why. can you share your reasoning?

    I just think MANUAL is just no different from a regular outbound peer. If you have a MANUAL to netgroup XYZ, you probably don’t want OUTBOUND_FULL_RELAY or whatever to XYZ either (previously we would allow this). Note, this does not limit MANUAL conns in any way.

  35. naumenkogs force-pushed on Sep 14, 2021
  36. DrahtBot removed the label Needs rebase on Sep 14, 2021
  37. in src/net.cpp:2042 in d2c7296327 outdated
    2036@@ -2034,6 +2037,10 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
    2037                 addr = addrman.Select();
    2038             }
    2039 
    2040+            // The following checks are never applied to MANUAL and ADDR_FETCH,
    2041+            // because those connections are made elsewhere.
    2042+            assert(conn_type != ConnectionType::MANUAL && conn_type != ConnectionType::ADDR_FETCH);
    


    vasild commented at 8:28 am on September 15, 2021:

    The following two variants are equivalent:

    0assert(A && B);
    

    vs

    0assert(A);
    1assert(B);
    

    However if the first variant fails it would be impossible to say which one of A or B was false just by looking at the message (something like assertion "A && B" failed). With the second variant it is clear whether A or B was false.

  38. vasild approved
  39. vasild commented at 8:38 am on September 15, 2021: contributor

    ACK d2c729632720f6cee67bedc661d49b967dc20506

    The commit message of 513d52b5e92e9b84ee3e3864fbeeb164c31e7c9d p2p: Account for MANUAL conns when diversifying persistent outbound conns reads: “this doesn’t affect how we choose MANUAL connections” which I find a bit confusing because we (== the Bitcoin Core software) never chooses manual connections - they are configured by the user.

  40. p2p: Account for MANUAL conns when diversifying persistent outbound conns
    Previously, we would make connections to peer from the netgroups to which
    our MANUAL outbound connections belong.
    However, they should be seen as regular connections from Addrman when it comes to netgroup diversity check, since the same rationale can be applied.
    
    Note, this has nothing to do with how we connect to MANUAL connections:
    we connect to them unconditionally.
    a87f04e795
  41. doc: peer selection rules do not apply to MANUAL and ADDR_FETCH aeec31d7b3
  42. naumenkogs force-pushed on Sep 15, 2021
  43. vasild approved
  44. vasild commented at 11:14 am on September 15, 2021: contributor
    ACK aeec31d7b32b744fecb4115176a5d4e8586cdec2
  45. mzumsande commented at 6:30 pm on January 27, 2022: contributor

    Code Review ACK aeec31d7b32b744fecb4115176a5d4e8586cdec2

    I just think MANUAL is just no different from a regular outbound peer. If you have a MANUAL to netgroup XYZ, you probably don’t want OUTBOUND_FULL_RELAY or whatever to XYZ either (previously we would allow this).

    I guess it depends on why you connected to the manual peer: Sometimes you may have a higher trust in that node, maybe because you know the operator. In that case, you wouldn’t necessarily see a greater risk that another node from the same group would be colluding with them.

    On the other hand, even if you do trust them, there could still be attacks on an ISP level that would apply to multiple nodes from one group, so I think it makes sense to include them in the diversification as well.

  46. jonatack commented at 7:02 pm on January 27, 2022: contributor

    Review ACK aeec31d7b32b744fecb4115176a5d4e8586cdec2

    Verified rebase to master and debug build are ok.

  47. fanquake requested review from jnewbery on Jan 28, 2022
  48. fanquake requested review from sipa on Jan 28, 2022
  49. jonatack commented at 4:55 pm on March 14, 2022: contributor
    This pull has ACKs by @vasild since 6 months, @mzumsande and myself since six weeks, and a concept ACK by @ariard, and has seen review by a number of contributors, might be RFM.
  50. jonatack commented at 5:00 pm on March 14, 2022: contributor
    Verified rebase to current master is still clean.
  51. in src/net.cpp:2044 in aeec31d7b3
    2036@@ -2034,6 +2037,11 @@ void CConnman::ThreadOpenConnections(const std::vector<std::string> connect)
    2037                 addr = addrman.Select();
    2038             }
    2039 
    2040+            // The following checks are never applied to MANUAL and ADDR_FETCH,
    2041+            // because those connections are made elsewhere.
    2042+            assert(conn_type != ConnectionType::MANUAL);
    2043+            assert(conn_type != ConnectionType::ADDR_FETCH);
    2044+
    


    jnewbery commented at 10:34 am on March 18, 2022:
    I’m not sure how this is an improvement. conn_type is a local variable that is set in lines 1932-1988 above, and can never be set to ConnectionType::MANUAL or ConnectionType::ADDR_FETCH. What are these assertions preventing or protecting against?
  52. in src/net.cpp:1921 in aeec31d7b3
    1922-                    case ConnectionType::MANUAL:
    1923+                    // Short-lived outbound connections should not affect how we select outbound
    1924+                    // peers from addrman.
    1925+                    case ConnectionType::ADDR_FETCH:
    1926+                    // Short-lived outbound connections should not affect how we select outbound
    1927+                    // peers from addrman.
    


    jnewbery commented at 10:35 am on March 18, 2022:
    No need to duplicate this comment.
  53. jnewbery commented at 10:38 am on March 18, 2022: contributor

    Concept ACK not taking INBOUND, ADDR_FETCH or FEELER connections into account, and taking MANUAL connections into account.

    I think the while (!interruptNet) loop in ThreadOpenConnections() is a bit of a mess, and should be refactored out into its own function. Adding extra assertions halfway through that loop that are testing a variable set further up in the function is probably making it slightly worse, not better.

  54. DrahtBot added the label Needs rebase on Apr 22, 2022
  55. DrahtBot commented at 2:00 pm on April 22, 2022: contributor

    šŸ™ This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  56. vasild commented at 10:42 am on October 24, 2022: contributor
    This is nice to have. @naumenkogs, do you plan to rebase it?
  57. naumenkogs commented at 12:56 pm on October 25, 2022: member
    @vasild not in the next 2 weeks, but possibly afterwards. Feel free to mark for grabs i guess.
  58. fanquake commented at 11:43 am on December 5, 2022: member

    Feel free to mark for grabs i guess.

    Will close, and mark up for grabs for now.

  59. fanquake closed this on Dec 5, 2022

  60. fanquake added the label Up for grabs on Dec 5, 2022
  61. stratospher commented at 4:09 pm on March 15, 2023: contributor
    revived this in #27264.
  62. maflcko removed the label Up for grabs on Mar 15, 2023
  63. maflcko removed the label Needs rebase on Mar 15, 2023
  64. fanquake referenced this in commit 053b2d3377 on Mar 19, 2023
  65. bitcoin locked this on Mar 14, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 13:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me