The current mechanism for choosing outbound peers picks one at random among known-reachable addresses, with the caveat that we do not connect twice to the same netgroup (by default /16’s and, if an ASmap is configured, AS’s). A more robust mechanism for preventing an attacker from controlling all of a node’s outbound connections would first randomize over netgroups, and then pick a known-reachable address within that netgroup.
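To make the difference concrete, here is a minimal Python sketch of the two approaches (the `netgroup()` helper and the flat list of known addresses are simplifications for illustration, not Bitcoin Core’s actual data structures):

```python
import random

def netgroup(addr: str) -> str:
    """Simplified netgroup: the /16 prefix of an IPv4 address."""
    return ".".join(addr.split(".")[:2])

def pick_current(known: list[str], used_groups: set[str]) -> str:
    """Current mechanism: uniform over all known-reachable addresses,
    never reusing a netgroup we are already connected to."""
    candidates = [a for a in known if netgroup(a) not in used_groups]
    return random.choice(candidates)

def pick_alternative(known: list[str], used_groups: set[str]) -> str:
    """Alternative mechanism: uniform over netgroups first, then
    uniform over known-reachable addresses within the chosen netgroup."""
    by_group: dict[str, list[str]] = {}
    for addr in known:
        group = netgroup(addr)
        if group not in used_groups:
            by_group.setdefault(group, []).append(addr)
    return random.choice(by_group[random.choice(list(by_group))])
```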
This alternative mechanism would make the probability for an attacker to control all of a node’s outbound connections exponentially decreasing in the number of connections, roughly $(\frac{k}{n})^c$ with $k$ the number of netgroups controlled by the attacker, $n$ the total number of netgroups available to be chosen from, and $c$ the number of outbound connections made. There are today on the order of 5k different /16’s to choose from in the network[^1]. If we were to use this method, even an adversary that introduced 1k new ones populated exclusively with their own nodes (which would be absurdly expensive) would not be able to control all of a node’s outbound peers with any relevant probability.
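For instance, taking $n \approx 6000$ netgroups once the attacker has added their $k = 1000$, and $c = 10$ outbound connections (Bitcoin Core’s default of 8 full-relay plus 2 block-relay-only):

$$\left(\frac{1000}{6000}\right)^{10} = \left(\frac{1}{6}\right)^{10} \approx 1.7 \times 10^{-8}$$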
By contrast, the current mechanism allows an adversary with enough node IPs to control all outbound connections of a node with a realistic probability, as long as those IPs are spread across at least as many netgroups as we make outbound connections by default. This is not merely a theoretical concern. This summer I investigated[^2] an entity that spun up hundreds of reachable nodes on the network. They have since scaled up to 3000 nodes, spread across 8 /16’s (and 3 or 4 AS’s). As a result, a freshly started clearnet node nowadays will make 3 to 5 of its outbound connections on average to this single entity, which is not even actively attacking (for instance by more aggressively sharing its own node addresses and/or not relaying other reachable nodes’). More discussions regarding this entity are available here and here.
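To make that figure concrete, here is a rough Monte Carlo sketch of the current mechanism. The 3000 attacker nodes over 8 netgroups are the figures above; the honest population (6000 reachable nodes over 5000 netgroups) is an assumption for illustration, not a measurement:

```python
import random

def simulate(trials: int = 1000, outbound: int = 10) -> float:
    """Estimate how many outbound connections a freshly started node
    makes to the attacker under the current selection mechanism."""
    # Assumed population: 3000 attacker nodes spread over 8 netgroups
    # (figures from this post), plus 6000 honest nodes spread over
    # 5000 netgroups (a rough guess).
    addrs = [(f"atk{i % 8}", True) for i in range(3000)]
    addrs += [(f"hon{i % 5000}", False) for i in range(6000)]

    total = 0
    for _ in range(trials):
        used_groups: set[str] = set()
        for _ in range(outbound):
            candidates = [a for a in addrs if a[0] not in used_groups]
            group, is_attacker = random.choice(candidates)
            used_groups.add(group)
            total += is_attacker
    return total / trials

print(simulate())  # ~3 connections to the attacker on average
```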
Of course, switching to this mechanism for choosing outbound peers would have consequences on the network graph. Because we currently sample over all known node addresses, we are biased towards netgroups that contain a lot of nodes (such as hosting providers). First sampling by netgroup would remove this bias, and make it significantly more likely to connect to more “obscure” netgroups. This could cause a resource allocation issue on the network, with the inbound connection slots of netgroups containing few nodes getting overused (and maxed out) while many inbound connection slots in netgroups containing a lot of nodes sit unused. Interestingly, this is a similar concern to that of switching to ASmap by default, shared by @virtu here.
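To quantify the bias: for a single connection with no netgroup exclusions, the probability that a given node $v$ in netgroup $g(v)$ is picked changes from address-uniform to netgroup-first sampling as

$$\frac{1}{N} \quad\longrightarrow\quad \frac{1}{n \cdot |g(v)|}$$

with $N$ the total number of known addresses and $n$ the number of netgroups. A node alone in its /16 would thus attract as much expected inbound demand as an entire hosting-provider /16 with hundreds of nodes.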
Naturally, a middle-of-the-road solution could be to use the alternative mechanism for only half of our connections ($c = 5$ in the formula above is more than enough) to get the local eclipse-resistance benefits while minimizing the risk of global network disruption. An alternative would be to fully move to sampling by netgroups, but not uniformly: the draw could be biased toward netgroups with more available resources.
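One way to express such a biased draw, as a sketch only (the square-root weighting is purely illustrative; choosing the right proxy for “available resources” is exactly the open question):

```python
import math
import random

def pick_weighted(by_group: dict[str, list[str]]) -> str:
    """Netgroup-first sampling, but biased towards netgroups with more
    nodes. The square-root weight is a made-up middle ground between
    per-address sampling (linear in group size) and per-netgroup
    sampling (uniform)."""
    groups = list(by_group)
    weights = [math.sqrt(len(by_group[g])) for g in groups]
    group = random.choices(groups, weights=weights, k=1)[0]
    return random.choice(by_group[group])
```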
A related discussion is how we want a node to behave when its inbound connection slots are full (see #16599 (comment)).
A related question is whether we want to keep the “never connect twice to the same netgroup” rule if we adopt the alternative mechanism. Without the rule but with the new mechanism, could it be the case that, if all connection slots in “small” netgroups are taken, a large fraction of the network eventually converges towards the larger netgroups, possibly making several connections to the same netgroup? That seems unlikely. On the other hand, if it happens it would organically spread resource usage (though not with a distribution we would be happy with).
This topic was discussed during yesterday’s IRC meeting (which this issue is following up on). Logs available here.
[^1]: A conservative estimate from querying the /16’s present in the tried table of a number of long-running nodes, and comparing with what a number of sources (1, 2, 3) claim is the number of reachable ipv4 nodes. The command run to gather the number of /16’s in a node’s addrman is the following: `bitcoin-cli getrawaddrman | jq -r '.tried[].address | select(test("^[0-9]{1,3}(\\.[0-9]{1,3}){3}$")) | (split(".")[0:2] | join("."))' | sort -u | wc -l`.
[^2]: See this blog post. The investigation started because the nodes were misconfigured, and I ended up being in contact with the person running them. It appears the person is purposefully trying to optimize for the highest possible number of node addresses announced, in particular by advertising several IPs per node.