RFC: randomize over netgroups in outbound peer selection #34019

issue darosior opened this issue on December 5, 2025
  1. darosior commented at 4:19 PM on December 5, 2025: member

    The current mechanism for choosing outbound peers picks one at random among known-reachable addresses, with the caveat that we do not connect twice to the same netgroup (by default /16's and, if an ASmap is configured, AS's). A more robust mechanism for preventing an attacker from controlling all of a node's outbound connections would first randomize over netgroups and then pick a known-reachable address within that netgroup.

    This alternative mechanism would make the probability for an attacker to control all of a node's outbound connections exponentially decreasing in the number of connections, roughly $(\frac{k}{n})^c$ with $k$ the number of netgroups controlled by the attacker, $n$ the total number of netgroups available to be chosen from, and $c$ the number of outbound connections made. There are today on the order of 5k different /16's to choose from in the network[^0]. If we were to use this method, even an adversary that introduced 1k new ones populated exclusively with their own nodes (which would be absurdly expensive) would not be able to control all of a node's outbound peers with any relevant probability.
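    A minimal sketch of the formula above, assuming the scenario from the text (5,000 existing /16's plus 1,000 attacker-only ones, so $k = 1000$ and $n = 6000$). The function names are hypothetical. The second function drops the "roughly" by drawing netgroups without replacement, which only makes the attacker's odds slightly worse.

```python
from math import prod

def p_all_outbound_attacker(k: int, n: int, c: int) -> float:
    """Approximation from the text: c independent draws, each hitting an attacker netgroup."""
    return (k / n) ** c

def p_all_outbound_attacker_exact(k: int, n: int, c: int) -> float:
    """Same event when the c netgroups are drawn without replacement."""
    return prod((k - i) / (n - i) for i in range(c))

# Assumed scenario: 5,000 honest /16s plus 1,000 attacker-only /16s.
k, n = 1_000, 6_000
for c in (5, 8, 10):
    print(c, p_all_outbound_attacker(k, n, c), p_all_outbound_attacker_exact(k, n, c))
```

    Under these assumptions the probability is on the order of 1e-4 for $c = 5$ and 1e-8 for $c = 10$.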

    By contrast, the current mechanism will allow an adversary with enough node IPs to control all outbound connections of a node with a realistic probability, as long as those IPs are spread across at least as many netgroups as we make outbound connections by default. This is not merely a theoretical concern. This summer I investigated[^1] an entity that spun up hundreds of reachable nodes on the network. They have since scaled up to 3000 nodes, spread across 8 /16's (and 3 or 4 AS's). As a result, a freshly started clearnet node nowadays will make 3 to 5 of its outbound connections on average to this single entity, which is not even actively attacking (for instance by more aggressively sharing their own node addresses and/or not relaying other reachable nodes'). More discussions regarding this entity are available here and here.
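    The difference between the two mechanisms can be sketched with a toy Monte Carlo. Everything here is an assumption for illustration: the address table is synthetic (3,000 attacker addresses in 8 netgroups as described above, plus an assumed 7,000 honest addresses in 3,500 netgroups), every connection attempt succeeds, and the function names are hypothetical. It is not the selection code in Bitcoin Core.

```python
import random

def build_addresses(honest_groups, honest_per_group, attacker_groups, attacker_per_group):
    """Synthetic table: one (netgroup, is_attacker) tuple per known-reachable address."""
    addrs = [(f"h{g}", False) for g in range(honest_groups) for _ in range(honest_per_group)]
    addrs += [(f"a{g}", True) for g in range(attacker_groups) for _ in range(attacker_per_group)]
    return addrs

def group_index(addrs):
    """Map each netgroup to the indices of its addresses."""
    by_group = {}
    for i, (group, _) in enumerate(addrs):
        by_group.setdefault(group, []).append(i)
    return by_group

def pick_by_address(addrs, by_group, rng, c):
    """Address-first: uniform over addresses, retry if the netgroup is already used."""
    used, picked = set(), []
    while len(picked) < c:
        group, is_attacker = rng.choice(addrs)
        if group not in used:
            used.add(group)
            picked.append(is_attacker)
    return picked

def pick_by_netgroup(addrs, by_group, rng, c):
    """Netgroup-first: c distinct netgroups uniformly, then a uniform address in each."""
    groups = rng.sample(list(by_group), c)
    return [addrs[rng.choice(by_group[g])][1] for g in groups]

def mean_attacker_connections(picker, addrs, by_group, c=10, trials=1000, seed=1):
    """Average number of attacker peers among c outbound connections."""
    rng = random.Random(seed)
    return sum(sum(picker(addrs, by_group, rng, c)) for _ in range(trials)) / trials

addrs = build_addresses(3500, 2, 8, 375)
by_group = group_index(addrs)
print("address-first :", mean_attacker_connections(pick_by_address, addrs, by_group))
print("netgroup-first:", mean_attacker_connections(pick_by_netgroup, addrs, by_group))
```

    In this toy model the attacker's share under address-first selection tracks its share of addresses, while under netgroup-first selection it tracks its share of netgroups.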

    Of course, switching to this mechanism for choosing outbound peers would have consequences on the network graph. Because we currently sample over all known node addresses, we are biased towards netgroups that contain a lot of nodes (such as hosting providers). Sampling by netgroup first would remove this bias, and make it significantly more likely to connect to more "obscure" netgroups. This could cause a resource allocation issue on the network, with the inbound connection slots of netgroups with fewer nodes getting overused (and maxed out) while tons of inbound connection slots in netgroups with more nodes sit unused. Interestingly, this is a similar concern to that of switching to ASmap by default shared by @virtu here.

    Naturally a middle of the road solution could be to use the alternative mechanism for half of our connections ($c = 5$ in the formula above is more than enough) to get the local eclipse resistance benefits while minimizing the risk of global network disruption. An alternative would be to fully move to sampling by netgroups, but not uniformly. The draw could be biased toward those with more available resources.
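    One way to picture a non-uniform draw over netgroups, as a sketch only: weight each netgroup by its node count raised to an exponent `alpha`. Both the exponent and the use of node count as a proxy for "available resources" are assumptions made here for illustration, not something proposed in this issue. With `alpha = 0` this is the uniform netgroup draw; with `alpha = 1` it is equivalent to sampling uniformly over addresses.

```python
import random

def netgroup_weights(sizes: dict[str, int], alpha: float) -> dict[str, float]:
    """Weight each netgroup by size**alpha (0 = uniform over netgroups, 1 = proportional to size)."""
    return {group: count ** alpha for group, count in sizes.items()}

def selection_probability(sizes: dict[str, int], alpha: float) -> dict[str, float]:
    """Normalised probability of drawing each netgroup."""
    weights = netgroup_weights(sizes, alpha)
    total = sum(weights.values())
    return {group: w / total for group, w in weights.items()}

def draw_netgroup(sizes: dict[str, int], alpha: float, rng: random.Random) -> str:
    """Draw a single netgroup according to the weights."""
    weights = netgroup_weights(sizes, alpha)
    groups = list(weights)
    return rng.choices(groups, weights=[weights[g] for g in groups], k=1)[0]

# Hypothetical netgroups with 1, 10 and 150 known nodes.
sizes = {"small": 1, "medium": 10, "large": 150}
for alpha in (0.0, 0.5, 1.0):
    print(alpha, selection_probability(sizes, alpha))
```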

    A related discussion is how we want a node to behave when its inbound connection slots are full (see #16599 (comment)).

    A related question is whether we want to keep the "never connect more than once to the same netgroup" rule if we adopt the alternative mechanism. Without the rule but with the new mechanism, could it be the case that, if all connection slots in "small" netgroups are full, a large fraction of the network eventually converges towards the larger netgroups, possibly making several connections to the same netgroup? That seems unlikely. On the other hand, if it happens it would organically spread resource usage (though not with a distribution we are happy with).

    This topic was discussed during yesterday's IRC meeting (which this issue is following up on). Logs available here.

    [^0]: A conservative estimate from querying the /16's present in the tried table of a number of long-running nodes, and comparing with what a number of sources (1, 2, 3) claim is the number of reachable IPv4 nodes. The command run to gather the number of /16's in a node's addrman is the following: `bitcoin-cli getrawaddrman |jq -r '.tried[].address | select(test("^[0-9]{1,3}(\\.[0-9]{1,3}){3}$")) | (split(".")[0:2] | join("."))' |uniq |wc -l`.

    [^1]: See this blog post. The investigation started because the nodes were misconfigured, and I ended up being in contact with the person running those. It appears the person is purposefully trying to optimize for the highest possible number of node addresses announced, in particular by advertising several IPs per node.
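    The /16 count from footnote 0 can also be sketched in Python. This assumes the `getrawaddrman` output maps each table name to an object of entries carrying an `address` field, as the jq filter implies; the function name and sample data are hypothetical. Note that `uniq` only collapses adjacent duplicate lines, so the sketch uses a set to count distinct prefixes regardless of input order.

```python
import ipaddress

def count_slash16s(raw_addrman: dict, table: str = "tried") -> int:
    """Count distinct IPv4 /16 prefixes among the addresses of one addrman table."""
    prefixes = set()
    for entry in raw_addrman.get(table, {}).values():
        try:
            ip = ipaddress.ip_address(entry["address"])
        except ValueError:
            continue  # not an IP literal (e.g. onion or i2p address)
        if ip.version == 4:
            prefixes.add(ip.packed[:2])  # first two octets identify the /16
    return len(prefixes)

# Made-up sample in the assumed shape of `bitcoin-cli getrawaddrman` output.
sample = {
    "tried": {
        "0/1": {"address": "203.0.113.7"},
        "0/2": {"address": "198.51.100.9"},
        "3/5": {"address": "203.0.42.1"},   # same /16 as the first entry, not adjacent
        "4/8": {"address": "2001:db8::1"},  # IPv6, ignored
        "9/9": {"address": "exampleexampleexample.onion"},
    },
    "new": {},
}
print(count_slash16s(sample))
```

    With real data, the dict would come from `json.load()` on the saved output of `bitcoin-cli getrawaddrman`.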

  2. fanquake added the label P2P on Dec 5, 2025
  3. ajtowns commented at 6:14 PM on December 5, 2025: contributor

    I seem to have 7204 ipv4 nodes in my tried table with a timestamp more recent than 90 days ago, split across 3509 /16s. There are 6 /16s with between 100 and 200 tried entries, and another 23 /16s with more than 20 tried entries. At the other end of the scale, there are 2578 /16s with only one node in my tried table, 561 with two nodes, 172 with three, 58 with four, 28 with 5 and 26 with 6.

    The network is able to accept 115 inbound connections per node by default (max connections = 125, minus 10 outbounds), so if I take my tried table as comprehensive (not really a sensible assumption), cumulatively that's:

    • 2578 /16s with 1 node: 29,647 nodes worth of inbound connections
    • +561 /16s with 2 nodes: 42,550 nodes worth of inbound connections
    • +172 /16s with 3 nodes: 48,484 nodes worth of inbound connections
    • +58 /16s with 4 nodes: 51,152 nodes worth of inbound connections
    • +28 /16s with 5 nodes: 52,762 nodes worth of inbound connections
    • +26 /16s with 6 nodes: 54,556 nodes worth of inbound connections
    • +13 /16s with 7 nodes: 55,602 nodes worth of inbound connections
    • +12 /16s with 8 nodes: 56,706 nodes worth of inbound connections
    • +8 /16s with 9 nodes: 57,534 nodes worth of inbound connections
    • +4 /16s with 10 nodes: 57,994 nodes worth of inbound connections
    • +5 /16s with 11 nodes: 58,627 nodes worth of inbound connections
    • +2 /16s with 12 nodes: 58,903 nodes worth of inbound connections
    • +3 /16s with 13 nodes: 59,351 nodes worth of inbound connections
    • +5 /16s with 14 nodes: 60,156 nodes worth of inbound connections
    • +2 /16s with 15 nodes: 60,501 nodes worth of inbound connections
    • everything: 82,846 nodes worth of inbound connections

    (the calculation here is just G /16s with N nodes each gives G*N*115 inbound slots, and thus copes with G*N*11.5 nodes' worth of inbound connections, since each node makes 10 outbound connections)
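    The cumulative figures above can be reproduced with a short script. It uses only the counts listed in this comment and the 115-inbound / 10-outbound defaults stated there; the function names are hypothetical.

```python
INBOUND_SLOTS_PER_NODE = 115  # 125 max connections minus 10 outbound, as stated above
OUTBOUND_PER_NODE = 10

# (nodes per /16, number of such /16s), copied from the list above
HISTOGRAM = [
    (1, 2578), (2, 561), (3, 172), (4, 58), (5, 28),
    (6, 26), (7, 13), (8, 12), (9, 8), (10, 4),
    (11, 5), (12, 2), (13, 3), (14, 5), (15, 2),
]

def supported_nodes(listening_nodes: int) -> float:
    """Nodes' worth of outbound connections that the given listening nodes can absorb."""
    return listening_nodes * INBOUND_SLOTS_PER_NODE / OUTBOUND_PER_NODE

def cumulative_capacity(histogram):
    """Yield (nodes per /16, cumulative listening nodes, cumulative supported nodes)."""
    listening = 0
    for nodes_per_group, group_count in histogram:
        listening += nodes_per_group * group_count
        yield nodes_per_group, listening, supported_nodes(listening)

for nodes_per_group, listening, capacity in cumulative_capacity(HISTOGRAM):
    print(f"up to {nodes_per_group:2d} nodes per /16: {listening:5d} listening, {capacity:9,.1f} supported")
print(f"everything: {supported_nodes(7204):,.1f} supported")
```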

    Trying a sim with a 50/50 split and 60k total nodes seems to suggest the single-node /16s would all fill up their slots; but a 30/70 split looks like the single-node /16s would average out to 108.3/115 inbounds, which might be manageable.
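    A back-of-the-envelope expectation points the same way. This is not the simulation mentioned above: it ignores slot saturation and the one-connection-per-netgroup rule, and it treats the tried table (7204 nodes in 3509 /16s) as the whole listening network, so it will not reproduce the 108.3 figure exactly. The function name is hypothetical.

```python
def expected_inbound(total_nodes, netgroup_share, nodes_in_group,
                     n_groups=3509, n_listening=7204, outbound=10):
    """Expected inbound connections at one listening node whose /16 holds nodes_in_group nodes.

    netgroup_share is the fraction of outbound connections chosen netgroup-first;
    the remainder is chosen uniformly over listening addresses.
    """
    total_outbound = total_nodes * outbound
    via_netgroup = total_outbound * netgroup_share / n_groups / nodes_in_group
    via_address = total_outbound * (1 - netgroup_share) / n_listening
    return via_netgroup + via_address

for share in (0.5, 0.3):
    load = expected_inbound(60_000, share, nodes_in_group=1)
    print(f"netgroup-first share {share:.0%}: {load:.1f} expected inbounds on a single-node /16")
```

    With 60k total nodes this gives roughly 127 expected inbounds for a 50/50 split (above the 115 available slots) and roughly 110 for a 30/70 split (just below).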

  4. 0xB10C commented at 3:17 PM on May 11, 2026: contributor

    As commented in https://bnoc.xyz/t/many-connections-to-bitproject-io-nodes/40/19, I saw some of my monitoring nodes having more than 4 connections to Bitprojects for an extended amount of time. Concept ACK on fixing this.

    I think the 5/5 approach is a good idea, but want to look a bit more into simulating this with a unit test. Once I do, I'll follow up with it here.

  5. 0xB10C referenced this in commit 06fd42da52 on Jun 24, 2026
  6. 0xB10C referenced this in commit f07d7f00f8 on Jun 24, 2026
  7. 0xB10C referenced this in commit eec93ab32c on Jun 25, 2026
  8. 0xB10C referenced this in commit 7cdae83fc0 on Jun 25, 2026
  9. 0xB10C referenced this in commit a473102192 on Jun 25, 2026
  10. 0xB10C referenced this in commit fab54fd4aa on Jun 25, 2026
  11. 0xB10C commented at 3:30 PM on June 25, 2026: contributor

    I have a (handcrafted) simulation that can reproduce the specific Bitprojects issue and provides some numbers on how many connections we'd have made to them. It can also be useful in the future to evaluate and test potential fixes:

    Run 2026-06-simulations-34019 with `ctest --test-dir build -j9 -R simulation_34019`. Show the results with `cat simulation_data/*.txt`.

    Using peers.dat files provided by a handful of contributors who happened to have an older file lying around, I've been looking into simulating the number of connections we'd make to Bitprojects nodes. Note that many of these addrman snapshots come from development and test setups. The data quality might not be representative of an always-on node that has been running for a long time.

    The simulation starts by loading a peers.dat file and then, for one million simulation iterations, opens 10 outbound connections and records how many of these are to Bitprojects IPs. The simulation uses the address selection logic from ThreadOpenConnections(), and assumes a static connection success rate to non-Bitprojects IPs (e.g. only 25% of connection attempts succeed), while connections to Bitprojects always succeed. There's an addrman from a node with ASmap enabled, which allows us to see the behavior with ASmap enabled. Looking at older addrmans is also interesting, as it shows the effect of Bitprojects spinning up more IP ranges over time.
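    The loop described above can be sketched in Python. This is a heavily simplified stand-in, not the actual test: it replaces `ThreadOpenConnections()` and addrman bucket selection with a 50/50 choice between the tried and new tables followed by a uniform pick, uses made-up tables loosely shaped like one snapshot (167 tried with 41 Bitprojects entries, 26,808 new with 884), assumes 3,500 honest and 8 Bitprojects netgroups, and runs far fewer than one million iterations. All names are hypothetical.

```python
import random
from collections import Counter

def synthetic_table(total, bitprojects, honest_groups=3500, bp_groups=8):
    """Made-up table of (netgroup, is_bitprojects) entries."""
    honest = [(f"h{i % honest_groups}", False) for i in range(total - bitprojects)]
    bp = [(f"bp{i % bp_groups}", True) for i in range(bitprojects)]
    return honest + bp

def simulate(tried, new, iterations, success_rate, rng, outbound=10):
    """Return a histogram: Bitprojects connections per iteration -> number of iterations."""
    histogram = Counter()
    for _ in range(iterations):
        used_groups, to_bitprojects = set(), 0
        while len(used_groups) < outbound:
            # Assumed selection: 50/50 between tried and new, then uniform within the table.
            table = tried if (tried and (not new or rng.random() < 0.5)) else new
            group, is_bp = rng.choice(table)
            if group in used_groups:
                continue  # never connect twice to the same netgroup
            # Bitprojects always accepts; other peers accept with a static probability.
            if is_bp or rng.random() < success_rate:
                used_groups.add(group)
                to_bitprojects += is_bp
        histogram[to_bitprojects] += 1
    return histogram

def mean_connections(histogram):
    """Mean number of Bitprojects connections per iteration."""
    return sum(k * v for k, v in histogram.items()) / sum(histogram.values())

tried = synthetic_table(167, 41)
new = synthetic_table(26_808, 884)
for rate in (0.25, 0.10):
    hist = simulate(tried, new, iterations=2_000, success_rate=rate, rng=random.Random(34019))
    print(rate, round(mean_connections(hist), 2), sorted(hist.items()))
```

    Because each connection must use a distinct netgroup, the number of Bitprojects connections in this sketch is capped by the number of Bitprojects netgroups.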

    The simulation does not factor in anchors, which do provide eclipse attack resistance, as I did not have access to them. With anchors, the simulation results might have been slightly better (fewer connections to Bitprojects). The simulation also does not factor in feeler connections. No addresses in the addrman are updated. Since block-relay/full-relay doesn't influence who we connect to, we ignore this here too. We assume the node can reach all networks it has addresses for in its addrman.

    I committed the peers.dat files in simulation_data/. The date in the filename indicates the last time an address was added to the file.

    | snapshot | asmap | new | tried | ipv4 | ipv6 | onion | i2p | bitprojects new | bitprojects tried |
    |---|---|---:|---:|---:|---:|---:|---:|---:|---:|
    | 2025-04-24-dan | false | 49,334 | 199 | 35,684 | 6,718 | 7,055 | 76 | 282 (0.57%) | 3 (1.51%) |
    | 2025-08-15-dea | false | 64,612 | 3,915 | 57,572 | 10,955 | 0 | 0 | 610 (0.94%) | 424 (10.83%) |
    | 2026-01-14-dar | false | 26,808 | 167 | 23,906 | 3,069 | 0 | 0 | 884 (3.30%) | 41 (24.55%) |
    | 2026-02-10-dan | false | 46,304 | 83 | 40,953 | 5,434 | 0 | 0 | 1,717 (3.71%) | 19 (22.89%) |
    | 2026-03-12-hal | true | 62,183 | 9,413 | 43,901 | 5,754 | 18,576 | 3,365 | 396 (0.64%) | 371 (3.94%) |
    | 2026-03-15-dea | false | 63,778 | 8,361 | 61,926 | 10,213 | 0 | 0 | 1,291 (2.02%) | 1,124 (13.44%) |
    | 2026-03-26-wil | false | 63,267 | 7,902 | 49,088 | 7,390 | 14,492 | 199 | 1,419 (2.24%) | 288 (3.65%) |
    | 2026-04-03-dar | false | 26,176 | 35 | 22,830 | 3,381 | 0 | 0 | 803 (3.07%) | 1 (2.86%) |
    | 2026-06-24-cha | false | 65,532 | 9,991 | 63,520 | 12,003 | 0 | 0 | 2 (0.00%) | 1,295 (12.96%) |

    The simulation parameters are:

    • iteration count / sample size: 1M
    • connection success rate to non-Bitprojects IPs: 25% (see this, but we can also try with 10% and 40%)
    • connection success rate to Bitprojects IPs: 100% (see this)

    This results in the following distribution of connections to Bitprojects IPs:

    <img width="1286" height="790" alt="Connections to Bitprojects IPs simulation results" src="https://github.com/user-attachments/assets/5925b6d2-f35a-4440-8d62-6fc683aa8d1a" />

    • Bitprojects initially started with one /24 in 2024-07 and added three more /24's in late 2024. The address manager snapshots 2025-04-24-dan-peers.dat and 2025-08-15-dea-peers.dat both show a maximum of 4 connections to Bitprojects. The April 2025 snapshot has a mean Bitprojects connection count of 0.41 while the August one has a mean of 1.62.
    • In October 2025, Bitprojects added 8 more /24 IPv4 ranges for a total of 12. The next two snapshots, 2026-01-14-dar-peers.dat and 2026-02-10-dan-peers.dat, from early 2026 show a maximum of 9 connections to Bitprojects. The means for these are 3.82 and 3.33 connections to Bitprojects IPs, respectively. The 01-14 snapshot has a very high rate of 41/167 = ~25% of IPs in the tried table belonging to Bitprojects.
    • The 2026-03-12-hal-peers.dat snapshot is special as it uses an ASmap. Since the 12 /24's were split across three ASNs, we only make up to three connections to Bitprojects, with a mean of 0.77. By spreading the 12 /24's across more AS's, which could be more costly, Bitprojects could have worked around ASmap. Additionally, this snapshot has Onion and I2P addresses enabled, which allows it to make connections to more networks.
    • The 2026-03-15-dea-peers.dat snapshot is similar to the 01-14 and 02-10 snapshots but with a lower mean of 2.8 connections to Bitprojects.
    • The 2026-03-26-wil-peers.dat snapshot has Onion and I2P addresses enabled, which makes it a lot more resistant against the IPv4-only Bitprojects IPs. On average it makes 1.48 connections to Bitprojects.
    • The 2026-04-03-dar-peers.dat snapshot has only one Bitprojects IP in its tried table, which consists of only 35 addresses. So it made a lot fewer connections to Bitprojects, with a mean of 1.18.
    • The Bitprojects nodes were shut down at the end of March 2026. However, if they came back online today, 2026-06-24-cha-peers.dat would still make 1.9 connections to Bitprojects on average. It still has 1295 Bitprojects IPs (out of 10k) in its tried table.

    With a hypothetical non-Bitprojects connection success rate of 10% (as opposed to the assumed 25%), this looks worse.

    <img width="1286" height="790" alt="Image" src="https://github.com/user-attachments/assets/d6657d22-e84b-477c-8793-025b22867a0e" />

    With success=10%, we're seeing more than half of the connections in some simulations being made to Bitprojects. We're also seeing that it's possible for three addrman snapshots to be completely (outbound) eclipsed with 10 connections to Bitprojects (but only in >0.1% of the cases).


    This reproduces and confirms the issue from a simulation standpoint and allows us to test potential fixes against this. As a next step, looking into a 5-netgroup-randomized and a 5-addrman-sampled (or 3 / 7) connection split could be interesting.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-07-02 03:51 UTC
