p2p: update inbound eviction protection for multiple networks, add I2P peers #21261

jonatack wants to merge 17 commits into bitcoin:master from jonatack:protect-i2p-peers-from-eviction, changing 5 files (+472 −194)
  1. jonatack commented at 0:29 am on February 22, 2021: member

    Continuing the work in #20197 and #20685, this pull updates and abstracts our inbound eviction protection to make it fully ratio-based and easily extensible to peers connected via high-latency privacy networks that we newly support, like I2P and perhaps others soon, as these peers are disadvantaged by the latency criteria of our eviction logic.

    It then adds eviction protection for peers connected over I2P. As described in #20685 (comment), we’ve observed over the past few months that I2P peers have a min ping latency similar to or greater than that of onion peers.

    The algorithm is basically a multi-pass knapsack:

    • Count the number of eviction candidates in each of the disadvantaged privacy networks.

    • Sort the networks from lower to higher candidate counts, so that a network with fewer candidates will have the first opportunity for any unused slots remaining from the previous iteration. In the case of a tie in candidate counts, priority is given by array member order from first to last, guesstimated to favor more unusual networks.

    • Iterate through the networks in this order. On each iteration, allocate each network an equal number of protected slots targeting a total number of candidates to protect, provided any slots remain in the knapsack.

    • Protect the candidates in that network having the longest uptime, if any in that network are present.

    • Continue iterating as long as we have non-allocated slots remaining and candidates available to protect.

    The goal of this logic is to favor the diversity of our peer connections.
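
    For illustration, here is a minimal standalone sketch of that loop. The types and names below are simplified stand-ins, not the actual src/net.cpp implementation:

     #include <algorithm>
     #include <cstddef>
     #include <vector>

     // Simplified stand-ins for the real eviction-candidate and network types.
     struct Candidate {
         int network;          // disadvantaged network this peer belongs to
         long time_connected;  // smaller value = connected earlier = longer uptime
         bool is_protected{false};
     };
     struct Net {
         int id;
         std::size_t count;  // number of eviction candidates on this network
     };

     void ProtectByNetwork(std::vector<Candidate>& candidates, std::vector<Net> nets,
                           std::size_t max_protect)
     {
         // Sort networks from fewer to more candidates; a stable sort keeps the
         // original array order in case of a tie.
         std::stable_sort(nets.begin(), nets.end(),
                          [](const Net& a, const Net& b) { return a.count < b.count; });

         std::size_t num_protected{0};
         while (num_protected < max_protect) {
             // Allocate each network an equal share of the remaining slots, at least 1.
             const std::size_t per_network{
                 std::max((max_protect - num_protected) / nets.size(), std::size_t{1})};
             bool protected_any{false};
             for (const Net& net : nets) {
                 // Gather this network's unprotected candidates, longest uptime first.
                 std::vector<Candidate*> on_net;
                 for (Candidate& c : candidates) {
                     if (!c.is_protected && c.network == net.id) on_net.push_back(&c);
                 }
                 std::sort(on_net.begin(), on_net.end(),
                           [](const Candidate* a, const Candidate* b) {
                               return a->time_connected < b->time_connected;
                           });
                 for (std::size_t i = 0;
                      i < on_net.size() && i < per_network && num_protected < max_protect; ++i) {
                     on_net[i]->is_protected = true;
                     ++num_protected;
                     protected_any = true;
                 }
                 if (num_protected >= max_protect) break;
             }
             if (!protected_any) break;  // no candidates left to protect on any network
         }
     }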

    The individual commit messages describe each change in more detail.

    Special thank you to Vasil Dimov for the excellent review feedback and the algorithm improvement that made this change much better than it would have been otherwise. Thanks also to Antoine Riard, whose review feedback nudged this change to protect disadvantaged networks having fewer, rather than more, eviction candidates.

  2. fanquake added the label P2P on Feb 22, 2021
  3. jonatack force-pushed on Feb 22, 2021
  4. DrahtBot commented at 3:17 am on February 22, 2021: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #21160 (net/net processing: Move tx inventory into net_processing by jnewbery)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. vasild commented at 8:18 am on February 25, 2021: member
    Concept ACK
  6. jonatack force-pushed on Apr 19, 2021
  7. jonatack renamed this:
    p2p, test: add inbound eviction protection for I2P peers
    p2p: update inbound eviction protection for multiple networks, add I2P peers
    on Apr 20, 2021
  8. jonatack force-pushed on Apr 20, 2021
  9. jonatack force-pushed on Apr 22, 2021
  10. jonatack marked this as ready for review on Apr 22, 2021
  11. laanwj added this to the "Blockers" column in a project

  12. mzumsande commented at 11:26 pm on April 29, 2021: member
    Concept ACK, will review more closely soon.
  13. in src/net.cpp:931 in 831b3294d9 outdated
    947+    const auto networks = std::count_if(network_counts.begin(), network_counts.end(), [](auto n) { return n; });
    948+
    949+    if (networks) {
    950+        // Obtain the minimum number of peers per network to potentially protect.
    951+        // The base ratio is 1/2 (for the case of peers from only 1 network).
    952+        const size_t protect_size_per_network{total_protect_size / (2 * networks)};
    


    vasild commented at 4:33 pm on May 3, 2021:

    Before this PR we would have protected 25% onions. With this PR up to commit 517976187 (inclusive) we would protect 12.5% onions and with the full PR - just 8.3%. What about:

             const size_t protect_size_per_network{total_protect_size / networks};
    

    jonatack commented at 11:32 am on May 5, 2021:
    Those are initial (not necessarily final) percentages iff onion, i2p and localhost peers are all present in the eviction candidates. For instance, if no i2p or localhost peers are among the candidates, then the full 25% is made available to onions. And if localhost or i2p peers are present, any initially reserved slots that they don’t use are made available for onions. (See the commit messages for a more detailed description.) I’m not against adjusting the ratios though.

    jonatack commented at 12:26 pm on May 17, 2021:
    I played around with a more generous version that would protect more than 1/4 in the case of eviction candidates from multiple disadvantaged networks. Didn’t keep it for now, but noting the commit for reference: e5c27dd.
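
    For concreteness, a standalone check of the percentages discussed above (assuming candidates from all counted networks are present):

     #include <cstddef>
     #include <cstdio>

     int main()
     {
         const std::size_t candidates{200};
         const std::size_t total_protect_size{candidates / 2};  // 100 protected slots overall
         for (std::size_t networks = 1; networks <= 3; ++networks) {
             // Initial per-network share as computed at this point in the PR.
             const std::size_t per_network{total_protect_size / (2 * networks)};
             std::printf("%zu network(s): %zu initial slots each = %.1f%% of candidates\n",
                         networks, per_network, 100.0 * per_network / candidates);
         }
         // 1 network: 50 slots (25.0%); 2 networks: 25 (12.5%); 3 networks: 16
         // (8.0% after integer division; the exact ratio 1/12 is ~8.3%).
     }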
  14. in src/net.cpp:914 in 831b3294d9 outdated
    919-        // Pick out up to 1/4 peers connected via our onion service, sorted by longest uptime.
    920-        EraseLastKElements(vEvictionCandidates, CompareOnionTimeConnected, onion_protect_size,
    921-                           [](const NodeEvictionCandidate& n) { return n.m_is_onion; });
    922+    // Count number of nodes connected via localhost, I2P, and onion.
    923+    // Higher positions in the array recover unused slots left by lower ones.
    924+    std::array<size_t, 3> network_counts{{}};
    


    vasild commented at 4:37 pm on May 3, 2021:

    These are used as booleans. I would suggest something like:

     struct {
         bool local{false};
         bool i2p{false};
         bool onion{false};
     } networks;
     ...
     if (node.m_is_local) {
         networks.local = true;
     }

     // if (networks) becomes:
     if (networks.local || networks.i2p || networks.onion)
    

    jonatack commented at 11:49 am on May 5, 2021:
    It’s used to determine the number of networks from which we have disadvantaged candidates (indeed, that could be done with bools too, replacing count_if with a sum of the booleans), but I anticipate possibly also using the actual counts to determine the distribution size or order of the protected slots.
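
    As a standalone toy of that counting (not the in-tree code):

     #include <algorithm>
     #include <array>
     #include <cstddef>
     #include <cstdio>

     int main()
     {
         // Hypothetical per-network candidate counts: localhost, onion, I2P.
         const std::array<std::size_t, 3> network_counts{3, 7, 0};
         // Number of disadvantaged networks with at least one candidate.
         const auto networks = std::count_if(network_counts.begin(), network_counts.end(),
                                             [](std::size_t n) { return n > 0; });
         std::printf("networks with candidates: %td\n", networks);  // prints 2
     }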
  15. in src/net.cpp:1058 in 831b3294d9 outdated
    1054@@ -1022,7 +1055,7 @@ bool CConnman::AttemptToEvictConnection()
    1055                                                HasAllDesirableServiceFlags(node->nServices),
    1056                                                peer_relay_txes, peer_filter_not_null, node->nKeyedNetGroup,
    1057                                                node->m_prefer_evict, node->addr.IsLocal(),
    1058-                                               node->m_inbound_onion};
    1059+                                               node->m_inbound_onion, node->addr.IsI2P()};
    


    vasild commented at 4:55 pm on May 3, 2021:

    Passing multiple booleans like is_onion, is_i2p, etc. is a bit inflexible and leaves the possibility for contradictions like having both flags set to true. Fuzzing could do that, and I expect that there may be (or may be added) some legit assert somewhere in the code to ensure that nonsensical values like is_onion==true and is_i2p==true are not passed (resulting in an assertion failure during fuzzing).

    What about passing enum Network instead?


    jonatack commented at 9:05 pm on May 6, 2021:
    Passing enum Network, instead of multiple booleans, indeed allowed rewriting this to be better abstracted and generalizable. WIP, for now the initial rewrite is in an additional commit.
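
    Roughly, the shape this moves toward (an abbreviated sketch: Network is Bitcoin Core’s existing enum, shortened here, and only the relevant candidate fields are shown):

     #include <cstdint>

     enum Network { NET_IPV4, NET_IPV6, NET_ONION, NET_I2P, NET_MAX };

     struct NodeEvictionCandidate {
         int64_t nTimeConnected;  // used by the uptime comparators
         bool m_is_local;         // localhost stays a separate flag
         Network m_network;       // exactly one network per candidate, so
                                  // contradictory flag combinations can't arise
     };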
  16. vasild commented at 5:02 pm on May 3, 2021: member
    Approach ACK 831b3294d92b23f85617e77ea3f90082de73ce3a
  17. jonatack force-pushed on May 6, 2021
  18. jonatack force-pushed on May 7, 2021
  19. jonatack force-pushed on May 7, 2021
  20. jonatack force-pushed on May 7, 2021
  21. jonatack force-pushed on May 8, 2021
  22. laanwj commented at 3:04 pm on May 10, 2021: member
    Concept ACK
  23. jonatack force-pushed on May 16, 2021
  24. vasild commented at 12:17 pm on May 17, 2021: member

    Approach ACK 4890e2f2fdefaa18b0e96bcaf3691fb3f0213572

    Some squashing is warranted.

  25. jonatack commented at 12:22 pm on May 17, 2021: member

    Approach ACK 4890e2f

    Some squashing is warranted.

    Thanks! Indeed. Re-arranging.

  26. DrahtBot added the label Needs rebase on May 24, 2021
  27. in src/net.cpp:938 in 161aed5142 outdated
    949+            }
    950+
    951+            if (network_counts[1]) {
    952+                // Protect onion peers (connected via our onion service), sorted by longest uptime.
    953+                const size_t onion_peers_to_protect{total_to_protect - (initial_size - vEvictionCandidates.size())};
    954+                EraseLastKElements(vEvictionCandidates, CompareOnionTimeConnected, onion_peers_to_protect,
    


    ariard commented at 7:39 pm on May 26, 2021:

    Why don’t you pass k=protect_size_per_network to EraseLastKElements?

    Let’s say initial_size=200, total_protect_size=100. If you count two networks, protect_size_per_network=25.

    If the first network does exist, vEvictionCandidates is reduced to 175 and total_to_protect is increased to 50. Then, if the second network also exists, k = 50 − (200 − 175) = 25.

    If the first network doesn’t exist, vEvictionCandidates isn’t reduced and total_to_protect is equal to 25. Then, if the second network exists, k = 25 − (200 − 200) = 25.

    Let me know if I get the math wrong, or the logic is altered in following commits (after 5179761)

    Also, the names total_to_protect and protect_size_per_network are confusing. Maybe rename total_to_protect to already_selected_for_protect_size?


    jonatack commented at 9:36 pm on June 2, 2021:
    Since the initial number of protected slots per network goes down quickly as the number of networks increases, it seems good to not waste any unused slots in this basic greedy knapsack. So we calculate peers_to_protect for each iteration having potential peers to protect, in order to recover any unused slots from the previous iteration.
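
    For reference, here is a sketch of the helper involved (simplified from src/net.cpp; the in-tree version fixes the predicate type to NodeEvictionCandidate). k acts as a cap: at most k matching candidates are protected, which is why recomputing it each iteration lets a later network absorb slots an earlier one didn’t use.

     #include <algorithm>
     #include <cstddef>
     #include <functional>
     #include <vector>

     // Sort so the elements to protect sit at the end, then erase (protect) up
     // to k of those matching the predicate; fewer are erased if fewer match.
     template <typename T, typename Comparator>
     static void EraseLastKElements(
         std::vector<T>& elements, Comparator comparator, std::size_t k,
         std::function<bool(const T&)> predicate = [](const T&) { return true; })
     {
         std::sort(elements.begin(), elements.end(), comparator);
         const std::size_t erase_size = std::min(k, elements.size());
         elements.erase(std::remove_if(elements.end() - erase_size, elements.end(), predicate),
                        elements.end());
     }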

    jonatack commented at 9:39 pm on June 2, 2021:
    Hm, s/total_to_protect/available_to_protect/? Edit: done.
  28. in src/net.cpp:859 in 2aaeeff3e8 outdated
    850@@ -851,6 +851,12 @@ static bool CompareOnionTimeConnected(const NodeEvictionCandidate& a, const Node
    851     return a.nTimeConnected > b.nTimeConnected;
    852 }
    853 
    854+static bool CompareI2PTimeConnected(const NodeEvictionCandidate& a, const NodeEvictionCandidate& b)
    855+{
    856+    if (a.m_is_i2p != b.m_is_i2p) return b.m_is_i2p;
    857+    return a.nTimeConnected > b.nTimeConnected;
    858+}
    859+
    


    ariard commented at 8:26 pm on May 26, 2021:

    What would be the purpose of the follow-up you’re envisioning ?

    AFAIU in this current patchset, any unused protected slots from a high-latency network are reallocated to the next one by summing up protect_size_per_network and passing it through total_to_protect?


    jonatack commented at 9:28 pm on June 2, 2021:
    You’re right. The commit message was out of date. Updated.
  29. ariard commented at 8:28 pm on May 26, 2021: member
    Approach ACK 831b329
  30. jonatack force-pushed on Jun 2, 2021
  31. DrahtBot removed the label Needs rebase on Jun 2, 2021
  32. dunxen commented at 6:40 am on June 3, 2021: contributor
    Concept ACK. Will take a closer look later today :)
  33. laanwj added this to the milestone 22.0 on Jun 3, 2021
  34. jonatack commented at 7:05 am on June 7, 2021: member

    Thanks for the concept ACKs @mzumsande, @duncandean and @laanwj and the approach ACKs @vasild and @ariard. Now ready for final review.

    This screenshot illustrates the higher latency of I2P peers, which disadvantages them under our inbound eviction criteria even relative to onion peers.

    Screenshot from 2021-06-07 08-56-56

  35. in src/net.cpp:961 in dd6f8c4291 outdated
    967+                                   [&n](const NodeEvictionCandidate& c) {
    968+                                       return n.id == localhost ? c.m_is_local : c.m_network == n.id;
    969+                                   });
    970+                available_to_protect += protect_size_per_network;
    971+            }
    972+        }
    


    vasild commented at 10:37 am on June 8, 2021:

    (I needed a pen and paper for this)

    Let’s assume initial_size = 100.

    Case 1: candidates contains 1 localhost peer, 20 I2P peers and 20 Tor peers. From those, the protected will be: 1 localhost, 15 I2P, 8 Tor.

    Case 2: candidates contains 0 localhost peers, 20 I2P peers and 20 Tor peers. From those, the protected will be: 12 I2P peers and 12 Tor peers.

    Case 2 looks strange compared to Case 1 because fewer I2P peers will be protected when there are no localhost ones. I would expect fewer peers from some earlier networks to result in more protections from subsequent networks.


    jonatack commented at 7:20 am on June 9, 2021:
    I think the outcome of Case 2 is ideal. Case 1 protects many I2P peers because that network gets the unused slots from the localhost allocation. What change would you suggest? Keep in mind that this is a subcomponent seen in isolation; another 1/4 of the remaining peers will be protected based on their uptime.

    jonatack commented at 8:26 am on June 9, 2021:
    Hm, maybe we can tally the number of networks that have more than protect_size_per_network peers and try to share the leftover slots between them so that Case 1 protects 1 / 11 / 12 instead of 1 / 15 / 8.

    vasild commented at 10:43 am on June 9, 2021:

    What change would you suggest?

     void ProtectEvictionCandidatesByRatio(std::vector<NodeEvictionCandidate>& candidates)
     {
         // Treat localhost connections as a separate network.
         static constexpr Network NET_LOCALHOST{NET_MAX};

         const std::array<Network, 3> networks{NET_LOCALHOST, NET_I2P, NET_ONION};

         // Protect the half of the remaining nodes which have been connected the longest.
         // This replicates the non-eviction implicit behavior, and precludes attacks that start later.
         // To favorise the diversity of our peer connections, reserve up to half of
         // these protected spots for onion, localhost and I2P peers, even if they're
         // not longest uptime overall. This helps protect these higher-latency peers
         // that tend to be otherwise disadvantaged under our eviction criteria.
         const size_t total_to_protect{candidates.size() / 2};

         // Protect up to 25% of `candidates` due to network. We may protect less if
         // there are no candidates that belong to `networks`.

         const size_t max_protect_due_to_network{total_to_protect / 2};
         size_t num_protected{0};

         while (num_protected < max_protect_due_to_network) {
             const size_t remaining_to_protect{max_protect_due_to_network - num_protected};
             const size_t protect_per_network{std::max(remaining_to_protect / networks.size(), 1UL)};
             bool protected_at_least_one{false};

             for (const Network net : networks) {
                 const size_t before = candidates.size();
                 EraseLastKElements(candidates,
                                    CompareNodeNetworkTime(net, net == NET_LOCALHOST),
                                    protect_per_network,
                                    [&net](const NodeEvictionCandidate& c) {
                                        return net == NET_LOCALHOST ? c.m_is_local : c.m_network == net;
                                    });
                 const size_t after = candidates.size();
                 if (after < before) {
                     protected_at_least_one = true;
                     num_protected += before - after;
                     if (num_protected >= max_protect_due_to_network) {
                         break;
                     }
                 }
             }

             if (!protected_at_least_one) {
                 break;
             }
         }

         // Protect another 25% to 50% of `candidates` due to uptime/longevity. For example, if
         // 25% were protected due to network then we will protect 25% due to uptime, if
         // 20% were protected due to network then we will protect 30% due to uptime.
         const size_t remaining{total_to_protect - num_protected};
         EraseLastKElements(candidates, ReverseCompareNodeTimeConnected, remaining);
     }
    

    will result in:

     (number of peers that belong to localhost, i2p, tor => protected peers from localhost, i2p, tor)
     1, 20, 20 => 1, 12, 12
     0, 20, 20 => 0, 13, 12
    

    Feel free to ignore.


    jonatack commented at 7:28 pm on June 9, 2021:
    This seems like a nice approach, multi-pass with simpler data structures. Will adapt the tests to it to check all the test cases. Edit: only 4 “combined” test cases needed to be updated, and this version does better on those.

    jonatack commented at 1:35 pm on June 10, 2021:
    It looks like we should keep the Net struct and < sort operator so the result can be independent of the order of members in the networks array (a hybrid approach), also adding an is_local member to the struct so that net == localhost is no longer needed throughout the code.

    vasild commented at 1:54 pm on June 10, 2021:

    Yes, with the above snippet, the result may depend on the order of elements in the networks array, but only by up to 1, which I think is ok; after all, if we want to split evenly e.g. 10 slots among 3 networks, some network will get 4 slots and the rest will get 3 slots.

    The current code also depends on the order in networks because it does “stable” sort, right?


    jonatack commented at 2:19 pm on June 10, 2021:
    The results make more sense when they depend on the number of peers per network, e.g. if you have 10 I2P peers and 3 onion ones, best to give first chance at unused slots to the I2P peers so we protect, say, 3 I2P and 2 onions rather than the opposite. Stable sort only makes a difference if the peer counts are equal. I’ll propose this separately in the last commit for feedback.

    jonatack commented at 2:24 pm on June 10, 2021:
    What I like is that we focused on different aspects and the combined result seems to provide the best eviction protection, so this is great.

    vasild commented at 2:35 pm on June 10, 2021:

    So, if we have 10 I2P peers and 3 onion ones and we want to protect 6 in total, we will protect 3 I2P and 3 onion, this is desired, right?

    If we have again 10 I2P and 3 onion and want to protect 5, do we want to protect 3 I2P and 2 onion or 2 I2P and 3 onion? I am not sure. One may argue that protecting one more (but we are talking about just one here) from the rare network (onion) is better because it is easier to find peers from the more popular network (I2P), so the ones from the rare network are more scarce and precious. Or a counter argument could be to keep the proportion - if we have more I2P, then keep more I2P.

    Personally, I think it is not so important because it is just 1 peer we are talking about here and I would favor code simplicity over enforcing either approach with more complex code.


    jonatack commented at 3:11 pm on June 10, 2021:

    Yes, this is about tradeoffs. The simplest code was when only a Network was passed around all the code without the is_local bool everywhere, but that simplicity had a logic encapsulation drawback (perhaps some others), so I rewrote everything and brought back is_local.

    I’ve redone this pull several times because I’d like it to do the right thing, whether networks with more peers get first priority at unused slots or networks with fewer. When running a node with a reduced number of connections (-maxconnections), an eviction candidate can be one of a small number. If people prefer to have the array order decide independently of the number of peers, it seems we should clearly document that dependency. Edit: documented.

    I’ve separated the changes out to separate commits. Am going through everything before pushing the update.


    jonatack commented at 3:29 pm on June 10, 2021:
    My thought is that having separate protection buckets for each of the disadvantaged networks is the essential step here to improving the diversity of our inbound peer connections…and that unused protection slots should be distributed among disadvantaged networks having remaining eviction candidates by descending order of their count, and in the case of a tie, fall back to the array order.
  36. in src/net.cpp:927 in dd6f8c4291 outdated
    947+    const auto networks_size = std::count_if(networks.begin(), networks.end(), [](const Net& n) { return n.count; });
    948+
    949+    if (networks_size) {
    950+        // Obtain the minimum number of peers per network to potentially protect.
    951+        // The base ratio is 1/2 (for the case of peers from only 1 network).
    952+        const size_t protect_size_per_network{total_protect_size / (2 * networks_size)};
    


    vasild commented at 10:44 am on June 8, 2021:
    Why “minimum”? I think that either protect_size_per_network or fewer peers will be protected, so that is a “maximum”, no?

    jonatack commented at 2:00 pm on June 8, 2021:
    Unused slots can be recovered, so the number protected can be higher, and yes, if there are fewer peers of that network, the number protected can be lower. Perhaps s/minimum/initial/.

    vasild commented at 10:21 am on June 9, 2021:
    Right, it can be higher or lower. Neither one of “minimum” or “maximum” is suitable. s/minimum/initial/ looks good.

    jonatack commented at 2:39 pm on June 10, 2021:
    Thanks. This line is now no longer used.
  37. in src/net.cpp:883 in dd6f8c4291 outdated
    870@@ -883,6 +871,17 @@ static bool CompareNodeBlockRelayOnlyTime(const NodeEvictionCandidate &a, const
    871     return a.nTimeConnected > b.nTimeConnected;
    872 }
    873 
    874+struct CompareNodeNetworkTime {
    


    vasild commented at 12:57 pm on June 8, 2021:

    I find this hard to grasp. Some comments would be nice. For example:

     /**
      * Define an order between NodeEvictionCandidate objects based on network and connection time.
      * Nodes near the beginning are more likely to be disconnected, nodes near the end will be
      * protected (less likely to be disconnected).
      * - First all nodes that do not belong to `network`, among them:
      *   - first nodes that have been connected shorter
      *   - later nodes that have been connected longer
      * - Later all nodes that belong to `network`, among them, sorted by connection time as above.
      */
     struct CompareNodeNetworkTime {
    

    jonatack commented at 7:09 pm on June 10, 2021:
    Done, added documentation.
  38. in src/net.cpp:880 in dd6f8c4291 outdated
    875+    const Network network, localhost;
    876+    CompareNodeNetworkTime(Network n, Network l) : network(n), localhost(l) {}
    877+    bool operator()(const NodeEvictionCandidate& a, const NodeEvictionCandidate& b) const
    878+    {
    879+        if (network == localhost && a.m_is_local != b.m_is_local) return b.m_is_local;
    880+        if ((a.m_network == network) != (b.m_network == network)) return b.m_network == network;
    


    vasild commented at 1:02 pm on June 8, 2021:

    If network == localhost and a.m_is_local == b.m_is_local then the control flow will continue to the second if. I think this is wrong or at least confusing because if this is the localhost comparator (network == localhost) then it should not sort based on m_network. Here is an example that does not involve passing the constant localhost if that makes it easier to understand, feel free to ignore:

     struct CompareNodeNetworkTime {
         const Network m_network;
         const bool m_is_local;
         CompareNodeNetworkTime(Network network, bool is_local)
             : m_network(network), m_is_local(is_local)
         {
         }
         bool operator()(const NodeEvictionCandidate& a, const NodeEvictionCandidate& b) const
         {
             if (m_is_local) {
                 if (a.m_is_local != b.m_is_local) {
                     return b.m_is_local;
                 }
             } else if ((a.m_network == m_network) != (b.m_network == m_network)) {
                 return b.m_network == m_network;
             }
             return a.nTimeConnected > b.nTimeConnected;
         };
     };

     ...

     CompareNodeNetworkTime(n.id, n.id == localhost)
    

    jonatack commented at 6:59 am on June 9, 2021:

    The suggestion is equivalent/does the same thing, assuming the following change is also made to the caller.

     -                EraseLastKElements(candidates, CompareNodeNetworkTime(n.id, localhost), peers_to_protect,
     +                EraseLastKElements(candidates, CompareNodeNetworkTime(n.id, n.id == localhost), peers_to_protect,
    

    jonatack commented at 2:37 pm on June 10, 2021:
    Updated to pass an is_local bool.
  39. jonatack force-pushed on Jun 11, 2021
  40. p2p, refactor: rm redundant erase_size calculation in SelectNodeToEvict()
    as EraseLastKElements() called in the next line performs the same operation.
    Thanks to Martin Zumsande (lightlike) for seeing this while reviewing.
    
    Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
    1cde800523
  41. jonatack force-pushed on Jun 11, 2021
  42. jonatack commented at 1:23 pm on June 11, 2021: member

    Updated per the discussions above and rebased. The CI failure seems unrelated. I’m running a node on this branch with -maxconnections=40.

    Edit: repushed to give the unrelated CI issue another chance.

  43. in src/net.cpp:946 in 22b1dae1e0 outdated
    970+                               });
    971+            const size_t after = candidates.size();
    972+            if (before > after) {
    973+                protected_at_least_one = true;
    974+                num_protected += before - after;
    975+                if (num_protected >= max_protect_by_network) break;
    


    vasild commented at 2:38 pm on June 11, 2021:

    off-topic, please ignore

     if (cond) statement;
    

    vs

     if (cond) {
         statement;
     }
    

    Putting everything on one line has 3 disadvantages:

    1. It is not gdb friendly: it is not possible to set a breakpoint on statement; (i.e. to stop after cond and before statement, if cond was true).
    2. It is not friendly to line-based code coverage: it will show that the line was executed even if statement was not.
    3. It is not diff friendly:
    3.1.
     -if (cond) statement;
     +if (cond) {
     +    statement;
     +    another;
     +}
    

    VS

      if (cond) {
          statement;
     +    another;
      }
    

    3.2.

     -if (some || involved(123) || STUFF + here / 9 || you_have_to_check_if_this_is_the_same) woho(x * 19 - y);
     +if (some || involved(123) || STUFF + here / 9 || you_have_to_check_if_this_is_the_same) woho(x * 29 - y);
    

    VS

      if (some || involved(123) || STUFF + here / 9 || no_need_to_check_in_this_case) {
     -    woho(x * 19 - y);
     +    woho(x * 29 - y);
      }
    

    jonatack commented at 5:22 pm on June 12, 2021:
    Brackets and newlines added
  44. jonatack force-pushed on Jun 11, 2021
  45. in src/test/net_peer_eviction_tests.cpp:35 in 42530190d8 outdated
    31@@ -36,7 +32,7 @@ std::vector<NodeEvictionCandidate> GetRandomNodeEvictionCandidates(const int n_c
    32             /* nKeyedNetGroup */ random_context.randrange(100),
    33             /* prefer_evict */ random_context.randbool(),
    34             /* m_is_local */ random_context.randbool(),
    35-            /* m_is_onion */ random_context.randbool(),
    36+            /* m_network */ static_cast<Network>(random_context.randrange(static_cast<int>(Network::NET_INTERNAL))),
    


    vasild commented at 2:51 pm on June 11, 2021:

    This implies that NET_INTERNAL has the highest value which may change in the future.

                 /* m_network */ ALL_NETWORKS[random_context.randrange(ALL_NETWORKS.size())],
    

    jonatack commented at 5:17 pm on June 11, 2021:

    Good point, looks like your suggestion with this might do it:

     +++ b/src/test/net_peer_eviction_tests.cpp
     @@ -4,6 +4,7 @@
      #include <net.h>
     +#include <test/util/net.h>
      #include <test/util/setup_common.h>
      
     @@ -32,7 +33,7 @@ std::vector<NodeEvictionCandidate> GetRandomNodeEvictionCandidates(const int n_c
     -            /* m_network */ static_cast<Network>(random_context.randrange(static_cast<int>(Network::NET_INTERNAL))),
     +            /* m_network */ ALL_NETWORKS[random_context.randrange(ALL_NETWORKS.size())],

     +++ b/src/test/util/net.h
     @@ -68,7 +68,7 @@ constexpr ConnectionType ALL_CONNECTION_TYPES[]{
     -constexpr Network ALL_NETWORKS[]{
     +constexpr std::array ALL_NETWORKS{
    

    jonatack commented at 5:22 pm on June 12, 2021:
    Done.

    jonatack commented at 5:18 pm on June 13, 2021:
    Huh, the CI is barfing on this change. Strange.

    MarcoFalke commented at 5:43 pm on June 13, 2021:
    This might be a compiler bug. I used static constexpr auto ALL_FEE_ESTIMATE_HORIZONS = std::array{ to work around that.

    jonatack commented at 6:16 pm on June 13, 2021:
    Thanks! Will try that.

    jonatack commented at 9:17 pm on June 13, 2021:
    It worked 👍
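
    For reference, the resulting pattern is roughly this (a sketch; the enumerator list here is assumed from the Network enum of the time):

     #include <array>
     #include <netaddress.h>  // for the Network enum

     // `auto` with CTAD on the right-hand side sidesteps the compiler bug hit
     // by `constexpr std::array ALL_NETWORKS{...}` above.
     static constexpr auto ALL_NETWORKS = std::array{
         NET_UNROUTABLE, NET_IPV4, NET_IPV6, NET_ONION, NET_I2P, NET_CJDNS, NET_INTERNAL,
     };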
  46. vasild approved
  47. vasild commented at 3:09 pm on June 11, 2021: member

    ACK 42530190d82ee1692a546a8daae89e3ba0f5a3a2

    I think the following commits should be squashed into one:

     e157ed1478 p2p: make ProtectEvictionCandidatesByRatio() fully ratio-based
     029256f7ed p2p: give networks with more peers priority for unused protection slots
     42530190d8 p2p, refactor: replace NET_LOCALHOST with a bool struct member
    

    It could be tedious and time-consuming for reviewers to review newly added code that is overhauled in a subsequent commit and then changed again in yet another commit.

  48. in src/net.cpp:905 in 42530190d8 outdated
    901@@ -894,40 +902,58 @@ static void EraseLastKElements(
    902     elements.erase(std::remove_if(elements.end() - eraseSize, elements.end(), predicate), elements.end());
    903 }
    904 
    905-void ProtectEvictionCandidatesByRatio(std::vector<NodeEvictionCandidate>& vEvictionCandidates)
    906+void ProtectEvictionCandidatesByRatio(std::vector<NodeEvictionCandidate>& candidates)
    


    ariard commented at 4:49 pm on June 11, 2021:
    micro-nit: I would prefer eviction_candidates

    jonatack commented at 5:22 pm on June 12, 2021:
    done
  49. in src/net.cpp:910 in 42530190d8 outdated
    906+void ProtectEvictionCandidatesByRatio(std::vector<NodeEvictionCandidate>& candidates)
    907 {
    908+    // Networks to protect: localhost, onion, I2P. In case of equal counts, earlier array members
    909+    // have the first opportunity to recover unused protection slots from the previous iteration.
    910+    struct Net { Network id; size_t count; bool is_local; };
    911+    std::array<Net, 3> networks{{{/* localhost */ NET_MAX, 0, 1}, {NET_ONION, 0, 0}, {NET_I2P, 0, 0}}};
    


    ariard commented at 5:03 pm on June 11, 2021:

    Maybe you can be more precise than “onion”, e.g. “torv3” or “tor”?

    I think even if Tor is the most well-known deployed onion routing, there are multiple other onion routing constructions. In fact, I2P’s garlic routing is broadly similar to Tor’s onion routing. Or Sphinx, as deployed on the LN communication layer, is also a form of onion routing. Pedantic point :)


    jonatack commented at 5:22 pm on June 12, 2021:
    done
  50. in src/net.cpp:929 in 42530190d8 outdated
    949+    // not longest uptime overall. This helps protect these higher-latency peers
    950+    // that tend to be otherwise disadvantaged under our eviction criteria.
    951+    const size_t initial_size = candidates.size();
    952+    const size_t total_protect_size{initial_size / 2};
    953+
    954+    // Protect up to 25% of the candidates by disadvantaged network.
    


    ariard commented at 5:43 pm on June 11, 2021:

    IIUC, in the worst-case scenario, at least 1 peer is protected among all the disadvantaged networks: the one with the most candidates.

    I think that’s an interesting property.


    jonatack commented at 5:23 pm on June 12, 2021:
    Yes, if there are at least 4 eviction candidates. (Changed per your review feedback from the disadvantaged network with the most candidates to the one with the fewest.)
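
    The arithmetic behind that threshold, as a standalone check:

     #include <cstddef>
     #include <cstdio>

     int main()
     {
         for (std::size_t initial_size = 1; initial_size <= 5; ++initial_size) {
             const std::size_t total_protect_size{initial_size / 2};             // half protected overall
             const std::size_t max_protect_by_network{total_protect_size / 2};   // half of those by network
             std::printf("%zu candidate(s) -> %zu network-protected slot(s)\n",
                         initial_size, max_protect_by_network);
         }
         // 1-3 candidates yield 0 slots, 4-5 yield 1: hence "at least 4".
     }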
  51. in src/net.cpp:934 in 42530190d8 outdated
    954+    // Protect up to 25% of the candidates by disadvantaged network.
    955+    const size_t max_protect_by_network{total_protect_size / 2};
    956+    size_t num_protected{0};
    957+
    958+    while (num_protected < max_protect_by_network) {
    959+        const size_t remaining_to_protect{max_protect_by_network - num_protected};
    


    ariard commented at 5:50 pm on June 11, 2021:

    nit: to reduce confusion among sets of eviction candidates, maybe call this variable `disadvantaged_to_protect`

    remaining_to_protect is already used as another variable name at the end of this function.


    jonatack commented at 5:23 pm on June 12, 2021:
    done
  52. in src/net.cpp:918 in 42530190d8 outdated
    914+        n.count = std::count_if(candidates.cbegin(), candidates.cend(),
    915+                                [&n](const NodeEvictionCandidate& c) { return n.is_local ? c.m_is_local : c.m_network == n.id; });
    916+    }
    917+    // Sort `networks` by descending candidate count, to give networks having more candidates
    918+    // the first opportunity to recover unused protection slots from the previous iteration.
    919+    std::stable_sort(networks.begin(), networks.end(), [](Net a, Net b) { return a.count > b.count; });
    


    ariard commented at 6:01 pm on June 11, 2021:

    I think I would favor sorting networks by ascending candidate count; otherwise I believe it would bias towards anonymity networks with abundant peers, decreasing our local diversity of anonymity networks?

    I think this point has already been discussed in this thread. I agree it’s minor.


    jonatack commented at 5:23 pm on June 12, 2021:
    Done. As the unit tests show, one peer can make a difference when the numbers are low, e.g. when -maxconnections has been reduced or only one candidate will be protected.
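
    A standalone toy (not the PR’s code) showing the ascending sort with the stable tie-break by array order:

     #include <algorithm>
     #include <array>
     #include <cstddef>
     #include <cstdio>

     int main()
     {
         struct Net { const char* name; std::size_t count; };
         // Array order: I2P before localhost before onion.
         std::array<Net, 3> networks{{{"i2p", 4}, {"localhost", 4}, {"onion", 9}}};
         // Ascending by candidate count; stable_sort keeps i2p ahead of
         // localhost because their counts tie.
         std::stable_sort(networks.begin(), networks.end(),
                          [](const Net& a, const Net& b) { return a.count < b.count; });
         for (const Net& n : networks) std::printf("%s: %zu\n", n.name, n.count);
         // Prints: i2p: 4, localhost: 4, onion: 9
     }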
  53. in src/net.cpp:946 in 42530190d8 outdated
    966+                               [&n](const NodeEvictionCandidate& c) { return n.is_local ? c.m_is_local : c.m_network == n.id; });
    967+            const size_t after = candidates.size();
    968+            if (before > after) {
    969+                protected_at_least_one = true;
    970+                num_protected += before - after;
    971+                if (num_protected >= max_protect_by_network) break;
    


    ariard commented at 6:29 pm on June 11, 2021:

    The following diff doesn’t break net_peer_eviction_tests; is this expected behavior?

     @@ -943,7 +943,7 @@ void ProtectEvictionCandidatesByRatio(std::vector<NodeEvictionCandidate>& candid
                  if (before > after) {
                      protected_at_least_one = true;
                      num_protected += before - after;
     -                if (num_protected >= max_protect_by_network) break;
     +                if (num_protected == max_protect_by_network) break;
                  }
              }
              if (!protected_at_least_one) break;
    

    jonatack commented at 5:37 pm on June 12, 2021:

    Thanks for checking this! Agree, the tests pass with this change:

     -                if (num_protected >= max_protect_by_network) {
     +                assert(num_protected <= max_protect_by_network);
     +                if (num_protected == max_protect_by_network) {
    

    I think it’s expected behavior based on these lines:

         while (num_protected < max_protect_by_network) {
             const size_t disadvantaged_to_protect{max_protect_by_network - num_protected};
             const size_t protect_per_network{std::max(disadvantaged_to_protect / networks.size(), static_cast<size_t>(1))};
    

    Edit: This pull has seen several rewrites, so tomorrow I plan to re-verify the pertinence of each test and check for missing cases.


    vasild commented at 11:36 am on June 14, 2021:

    Given that num_protected is incremented by a non-constant number that could be larger than 1, I assume it could become larger than max_protect_by_network at some point. Thus >= for comparison instead of ==.

    How could replacing >= with == and an assert change anything? I think if it changes anything then it must be a trigger of the assert - if it was equal before, then behavior is unchanged (>= is the same as ==) and if it was strictly larger, before we would have executed break; and after the change - triggered the assert.

    To me that assert looks scary, even if some (non-obvious) arithmetic a few lines above mandates it will never be triggered. The code may be altered in the future which may change this. Also, even if it happens that num_protected > max_protect_by_network, then nothing really bad has happened - we protected some peers (more than we intended), the program can continue executing just fine.

    (the code at the time I write this does not contain an assert (a457f34e5039b75ee015b273028c3ee153656d5c))


    jonatack commented at 11:50 am on June 14, 2021:
    Right, the assert was just for sanity testing (I wrote more tests but didn’t find a way to hit the assert), no plan to add it.
  54. in src/net.cpp:944 in 42530190d8 outdated
    956+    size_t num_protected{0};
    957+
    958+    while (num_protected < max_protect_by_network) {
    959+        const size_t remaining_to_protect{max_protect_by_network - num_protected};
    960+        const size_t protect_per_network{std::max(remaining_to_protect / networks.size(), static_cast<size_t>(1))};
    961+        bool protected_at_least_one{false};
    


    ariard commented at 6:32 pm on June 11, 2021:

    What do you think about commenting the purpose of this variable ?

    “If no more eviction candidates remain to protect among disadvantaged networks, ensure we exit cleanly”


    jonatack commented at 5:24 pm on June 12, 2021:
    Done.
  55. ariard commented at 6:35 pm on June 11, 2021: member
    I think this is pretty mature, if you can just have a look around this comment ? Other comments are less substantial. Once done, I want to check again correctness of the multi-pass algorithm.
  56. jonatack commented at 5:27 pm on June 12, 2021: member
    Thanks for the feedback @vasild and @ariard. All the feedback taken, will re-push after reworking the tests.
  57. jonatack force-pushed on Jun 13, 2021
  58. jonatack commented at 4:45 pm on June 13, 2021: member

    Updated, taking all review feedback by @vasild and @ariard, and also improving readability and updating the tests.

    git diff 4253019 a457f34 --color-words

  59. test: speed up and simplify peer_eviction_test
    This speeds up the test significantly, which helps when
    running it repeatedly.
    
    Suggest reviewing the diff with:
    
    colorMoved = dimmed-zebra
    colorMovedWs = allow-indentation-change
    519e76bb64
  60. test: add ALL_NETWORKS to test utilities 4a19f501ab
  61. p2p, refactor: improve constness in ProtectEvictionCandidatesByRatio() ec590f1d91
  62. p2p, refactor: rename vEvictionCandidates to eviction_candidates
    in ProtectEvictionCandidatesByRatio()
    per current style guide in doc/developer-notes.md
    7321e6f2fe
  63. p2p: add m_network to NodeEvictionCandidate struct 4ee7aec47e
  64. p2p: add CompareNodeNetworkTime() comparator struct
    to compare and sort peer eviction candidates by the
    passed-in is_local (localhost status) and network
    arguments, and by longest uptime.
    38a81a8e20
  65. test: remove combined onion/localhost eviction protection tests
    as we are about to change the behavior sufficiently that when we
    have multiple disadvantaged networks and a small number of peers
    under test, the number of protected peers per network can be different.
    3f8105c4d2
  66. jonatack force-pushed on Jun 13, 2021
  67. in src/net.cpp:920 in a457f34e50 outdated
    939+    const size_t total_protect_size{initial_size / 2};
    940+
    941+    // Disadvantaged networks to protect: I2P, localhost, Tor/onion. In case of equal counts, earlier
    942+    // array members have first opportunity to recover unused slots from the previous iteration.
    943+    struct Net { bool is_local; Network id; size_t count; };
    944+    std::array<Net, 3> networks{{{0, NET_I2P, 0}, {/* localhost */ 1, NET_MAX, 0}, {0, NET_ONION, 0}}};
    


    vasild commented at 10:11 am on June 14, 2021:

    nit: use true / false for bool:

         std::array<Net, 3> networks{{{false, NET_I2P, 0}, {/* localhost */ true, NET_MAX, 0}, {false, NET_ONION, 0}}};
    

    jonatack commented at 11:29 am on June 14, 2021:
    Yeah, I did this to keep the line length down as it’s a standard conversion, but sure.

    jonatack commented at 12:04 pm on June 14, 2021:
    Done.
  68. vasild approved
  69. vasild commented at 11:24 am on June 14, 2021: member
    ACK a457f34e5039b75ee015b273028c3ee153656d5c
  70. laanwj commented at 11:53 am on June 14, 2021: member

    I’ve been running this PR (various versions) on a busy node for the last weeks. Unfortunately this node has a really high maxconnections (500, while the average number of connected peers is 200), so I just realized I haven’t actually been testing the eviction behavior :blush:

    In any case Code review (and lightly tested) ACK a457f34e5039b75ee015b273028c3ee153656d5c Code review re-ACK 1b1088d52fbff8b1c9438d6aa8c6edcbdd471457

  71. p2p: make ProtectEvictionCandidatesByRatio() fully ratio-based
    with a more abstract framework to allow easily extending inbound
    eviction protection to peers connected through new higher-latency
    networks that are disadvantaged by our inbound eviction criteria,
    such as I2P and perhaps other BIP155 networks in the future like
    CJDNS.  This is a change in behavior.
    
    The algorithm is basically a multi-pass knapsack:
    
    - Count the number of eviction candidates in each of the disadvantaged
      privacy networks.
    
    - Sort the networks from lower to higher candidate counts, so that
      a network with fewer candidates will have the first opportunity
      for any unused slots remaining from the previous iteration.  In
      the case of a tie in candidate counts, priority is given by array
      member order from first to last, guesstimated to favor more unusual
      networks.
    
    - Iterate through the networks in this order.  On each iteration,
      allocate each network an equal number of protected slots targeting
      a total number of candidates to protect, provided any slots remain
      in the knapsack.
    
    - Protect the candidates in that network having the longest uptime,
      if any in that network are present.
    
    - Continue iterating as long as we have non-allocated slots
      remaining and candidates available to protect.
    
    Localhost peers are treated as a network like Tor or I2P by aliasing
    them to an unused Network enumerator: Network::NET_MAX.
    
    The goal is to favor the diversity of our inbound connections.
    
    Credit to Vasil Dimov for improving the algorithm from single-pass
    to multi-pass to better allocate unused protection slots.
    
    Co-authored-by: Vasil Dimov <vd@FreeBSD.org>
    1e15acf478
  72. p2p: update ProtectEvictionCandidatesByRatio() doxygen docs 787d46bb2a
  73. p2p: remove unused CompareOnionTimeConnected() 9e889e8a5c
  74. p2p: remove unused CompareLocalHostTimeConnected() 310fab4928
  75. p2p: remove unused m_is_onion member from NodeEvictionCandidate struct 045cb40192
  76. test: add combined onion/localhost eviction protection coverage 70bbc62711
  77. p2p: extend inbound eviction protection by network to I2P peers
    This commit extends our inbound eviction protection to I2P peers to
    favor the diversity of peer connections, as peers connected
    through the I2P network are otherwise disadvantaged by our eviction
    criteria for their higher latency (higher min ping times) relative
    to IPv4 and IPv6 peers, as well as relative to Tor onion peers.
    
    The `networks` array is order-dependent in the case of a tie in
    candidate counts between networks (earlier array members receive
    priority in the case of a tie).
    
    Therefore, we place I2P candidates before localhost and onion ones
    in terms of opportunity to recover unused remaining protected slots
    from the previous iteration, guesstimating that most nodes allowing
    both onion and I2P inbounds will have more onion peers, followed by
    localhost, then I2P, as I2P support is only being added in the
    upcoming v22.0 release.
    ce02dd1ef1
  78. test: add tests for inbound eviction protection of I2P peers 7c2284eda2
  79. test: add combined I2P/onion/localhost eviction protection tests 1b1088d52f
  80. jonatack force-pushed on Jun 14, 2021
  81. jonatack commented at 12:33 pm on June 14, 2021: member
    Thanks @vasild and @laanwj. Updated per git diff a457f34 1b1088d --color-words for the review suggestion and to add 4 test cases, and updated the PR description to thank/credit @vasild and @ariard.
  82. vasild approved
  83. vasild commented at 1:02 pm on June 14, 2021: member
    ACK 1b1088d52fbff8b1c9438d6aa8c6edcbdd471457
  84. laanwj merged this on Jun 14, 2021
  85. laanwj closed this on Jun 14, 2021

  86. laanwj removed this from the "Blockers" column in a project

  87. jonatack deleted the branch on Jun 14, 2021
  88. laanwj commented at 1:23 pm on June 14, 2021: member

    Wow! How long does it take to get up to 200 peers? Mine is restarted a lot but never gets above 35-40.

    Last restart was last week, it gets connections quite quickly. Then again it is connected to all possible networks and has had a >99% uptime for a few years. Either that helps or maybe it’s just all spy nodes :slightly_smiling_face:

  89. sidhujag referenced this in commit b9e1e110ed on Jun 14, 2021
  90. laanwj referenced this in commit 21998bc028 on Jul 15, 2021
  91. Fabcien referenced this in commit 990ce4d9f4 on Feb 17, 2022
  92. laanwj referenced this in commit 8b6cd42c62 on Mar 2, 2022
  93. gwillen referenced this in commit dc6f568584 on Jun 1, 2022
  94. DrahtBot locked this on Aug 16, 2022
