hebasto commented at 9:09 pm on October 30, 2019: member

Refs:

A victim node restart is still the most likely way of an eclipse attack occurring (https://github.com/bitcoin/bitcoin/issues/17326#issuecomment-550360907).

Suppose that a node periodically dumps its current outbound connection list to disk (a very small file), retrieves it after a shutdown or crash and restart, and tries to reconnect to the listed peers. This could mitigate eclipse attacks.
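A minimal Python sketch of this idea, not Bitcoin Core code. The JSON file format and the helper names `dump_anchors`, `load_anchors` and `peers_to_try_first` are assumptions made only for illustration.

```python
import json

def dump_anchors(path, outbound_peers):
    """Write the current outbound peer addresses to a small file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(outbound_peers), f)

def load_anchors(path):
    """Return the peers saved before the restart, or [] if none are usable."""
    try:
        with open(path, encoding="utf-8") as f:
            peers = json.load(f)
    except (OSError, ValueError):
        # Missing or corrupt file: fall back to normal peer selection.
        return []
    if not isinstance(peers, list):
        return []
    return [p for p in peers if isinstance(p, str)]

def peers_to_try_first(path, fallback_candidates, max_outbound=8):
    """Saved peers come first; other candidates only fill the remaining slots."""
    ordered = load_anchors(path)
    for addr in fallback_candidates:
        if addr not in ordered:
            ordered.append(addr)
    return ordered[:max_outbound]
```

Reconnection can still fail for any saved peer, so the fallback candidates remain necessary.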

UPDATE 2019-11-07 13:30 UTC: There is an eclipse attack scenario in which an attacker exploits a victim node restart to force it to connect to new, probably adversarial, peers.

Trying to reconnect to the dedicated block-relay-only (#15759) outbound peers used before the restart mitigates this type of attack.

This proposition does not:

prevent all types of eclipse attack
completely eliminate the type of eclipse attack described above, as re-connection could fail
make block-relay-only connections persistent as any of them could be dropped by a peer

hebasto commented at 9:09 pm on October 30, 2019: member

ping @EthanHeilman @sdaftuar @naumenkogs

fanquake added the label P2P on Oct 30, 2019

MarcoFalke added the label Brainstorming on Oct 31, 2019

EthanHeilman commented at 2:46 pm on October 31, 2019: contributor

Implementation

This would be trivial to implement. Addrman already serializes its data to disk in the form of peers.dat. You would just need to maintain an outgoing connection vector, add it to the addrman serialization logic and then read that vector when calling select.

Thoughts

In our paper we called these countermeasure 5: anchor connections. We suggested limiting them to two of the outgoing connections. We wrote:

Anchor connections. Inspired by Tor entry guard rotation rates [33], we add two connections that persist between restarts. Thus, we add an anchor table, recording addresses of current outgoing connections and the time of first connection to each address. Upon restart, the node dedicates two extra outgoing connections to the oldest anchor addresses that accept incoming connections. Now, in addition to defeating our other countermeasures, a successful attacker must also disrupt anchor connections; eclipse attacks fail if the victim connects to an anchor address not controlled by the attacker […]

[33] “One Fast Guard for Life (or 9 months)”, https://www-users.cs.umn.edu/~hoppernj/single_guard.pdf

Source: Eclipse Attacks on Bitcoin’s Peer-to-Peer Network
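The anchor table described in the quote could be sketched roughly as follows. This is an illustration in Python, not code from the paper or from Bitcoin Core; `AnchorEntry`, `select_anchors` and the `is_reachable` callback are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnchorEntry:
    address: str
    first_connected: int  # Unix time of the first connection to this address

def select_anchors(table, is_reachable, count=2):
    """Pick the `count` oldest anchor addresses that still accept connections."""
    chosen = []
    for entry in sorted(table, key=lambda e: e.first_connected):
        if is_reachable(entry.address):
            chosen.append(entry.address)
            if len(chosen) == count:
                break
    return chosen
```

Preferring the oldest entries means an attacker has to have been present for a long time to occupy an anchor slot.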

Eclipse attack requires the victim node to restart so it can connect to adversarial addresses.

Some things have changed from when that paper was written and that is no longer true.

Bitcoin 2015:

Bob has 116 incoming connections.
Alice makes an outgoing connection to Bob.
Bob has 117 incoming connections
Carol attempts to make an outgoing connection to Bob
Carol’s connection is rejected.
Alice still has an outgoing connection to Bob

Bitcoin 2019:

Bob has 116 incoming connections.
Alice makes an outgoing connection to Bob.
Bob has 117 incoming connections
Carol attempts to make an outgoing connection to Bob
There is a chance Alice’s connection is evicted and Carol’s connection is established
Alice loses her outgoing connection to Bob (sometimes)
Carol now has an outgoing connection (sometimes)

See the incoming connection eviction logic here: https://github.com/bitcoin/bitcoin/blob/master/src/net.cpp#L857
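The difference between the two scenarios can be sketched as a toy model in Python. The real eviction logic in net.cpp is considerably more involved; the uniform random choice among unprotected peers, the `protected` set, and the function names here are assumptions for illustration only.

```python
import random

MAX_INBOUND = 117  # slot count used in the example above

def accept_inbound_2015(inbound, newcomer):
    """A full node rejects the newcomer; existing peers are never displaced."""
    if len(inbound) >= MAX_INBOUND:
        return False
    inbound.append(newcomer)
    return True

def accept_inbound_2019(inbound, newcomer, protected, rng=random):
    """A full node may evict one unprotected peer to make room for the newcomer."""
    if len(inbound) >= MAX_INBOUND:
        candidates = [p for p in inbound if p not in protected]
        if not candidates:
            return False
        inbound.remove(rng.choice(candidates))
    inbound.append(newcomer)
    return True
```

In the 2015 model Alice keeps her slot once she has it; in the 2019 model she keeps it only if she is protected or another peer is chosen for eviction.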

If we were in Bitcoin 2015 anchor connections could be bypassed using a connection starvation attack:

Attacker makes up to 117 out connections to each full node on the network. This is actually pretty cheap to do. A laptop connected over Wi-Fi could do this to Bitcoin 2015.
Attacker performs eclipse attack
Victim reboots, attacker fills up the freed connection slots in the network
Victim can’t connect to nodes they were connected to before the reboot since all their connections are monopolized by attacker
Eclipse attack succeeds.

Now in Bitcoin 2019 the victim could successfully evict the attacker’s connections and reconnect to the node. However, in Bitcoin 2019 an attacker might be able to eclipse a node without reboots via connection eviction logic. There is a trade-off here. Someone should research this!!!

Unsatisfying conclusion:

Always reconnecting outgoing connections to the same nodes would probably make Eclipse attacks more complex but it would also mean that outgoing connections would be more static. Is making them more static good or bad?

naumenkogs commented at 5:47 pm on October 31, 2019: member

Edit: this comment is not accurate, see Ethan’s clarification below. @EthanHeilman Thanks for the thorough explanation of the difference between 2015 and 2019. Just wanted to highlight for the other readers that although it’s very useful to understand the background, it does not directly affect reasoning about this PR. Maybe it does but in a positive way: anchors are stronger in 2019, so increasing their number today is even better.

However, in Bitcoin 2019 an attacker might be able to eclipse a node without reboots via connection eviction logic. There is a trade-off here. Someone should research this!!!

I agree, this is an important one, but again, not really related to whether we need anchors. I hope to find time to eventually look into this particular issue and measure the trade-offs.

Please correct me if I’m wrong.

hebasto commented at 1:15 pm on November 1, 2019: member

@EthanHeilman

Thank you for your review.

This would be trivial to implement. Addrman already serializes its data to disk in the form of peers.dat. You would just need to maintain an outgoing connection vector, add it to the addrman serialization logic and then read that vector when calling select.

CConnman::DumpAddresses() is called from CConnman::Stop(). Therefore, peers.dat is not suitable in the case of an unexpected shutdown (e.g., a power failure). IMO, the outgoing connection vector should be dumped to a dedicated file, say anchors.dat, either periodically, as is done for banlist.dat, or after each change to it.
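One way to make such a dump survive a power failure is to write a temporary file and rename it over the old one. The Python sketch below illustrates the "after each change" variant; the JSON encoding, `atomic_dump` and `AnchorStore` are hypothetical and do not reflect how Bitcoin Core serializes its files.

```python
import json
import os
import tempfile

def atomic_dump(path, peers):
    """Write to a temp file in the same directory, then rename over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".anchors-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(list(peers), f)
            f.flush()
            os.fsync(f.fileno())  # make sure the data reaches the disk
        os.replace(tmp_path, path)  # readers see the old or the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

class AnchorStore:
    """Rewrites the file whenever the outbound set changes."""

    def __init__(self, path):
        self.path = path
        self.peers = []

    def on_connected(self, addr):
        if addr not in self.peers:
            self.peers.append(addr)
            atomic_dump(self.path, self.peers)

    def on_disconnected(self, addr):
        if addr in self.peers:
            self.peers.remove(addr)
            atomic_dump(self.path, self.peers)
```

Because the list is tiny, rewriting the whole file on every change is cheap, and a crash at any point leaves either the previous or the new list on disk.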

naumenkogs commented at 3:28 pm on November 1, 2019: member

While thinking about this idea I came up with a formula: a node is at most as secure as all the connections it has ever made. What we really want is to be able to check with our former connections that we’re on the same tip (at least to prevent full eclipsing with double-spends, etc.; other sybil problems still apply).

So perhaps every N (say 2) minutes we can exchange recent block hashes with one of our former nodes? This is a pretty big change, but we can start with doing it for our last 8+2 outbound disconnects. So, logging in anchors.dat all our outgoing conns all the time, and exchanging last block hash with top-8 (except currently connected) over N*(8+2) minutes.
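A rough Python sketch of how such a periodic tip check could be scheduled. Everything here is hypothetical: the class `FormerPeerTipChecker`, the round-robin order, and the rule in `tips_agree` are assumptions, and no such message exchange exists in the current protocol.

```python
from collections import deque

class FormerPeerTipChecker:
    def __init__(self, max_former=10):
        # Last 8+2 outbound disconnects, most recent at the right end.
        self.former = deque(maxlen=max_former)

    def on_outbound_disconnect(self, addr):
        if addr in self.former:
            self.former.remove(addr)
        self.former.append(addr)

    def next_peer_to_query(self, currently_connected):
        """Round-robin over former peers, skipping ones we are connected to now."""
        for _ in range(len(self.former)):
            addr = self.former[0]
            self.former.rotate(-1)
            if addr not in currently_connected:
                return addr
        return None

def tips_agree(our_recent_hashes, their_tip_hash):
    """True if the former peer's tip is one of the blocks we recently saw."""
    return their_tip_hash in our_recent_hashes
```

Calling `next_peer_to_query` once every N minutes would cycle through all former peers in roughly N*(8+2) minutes, matching the schedule suggested above.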

I would suggest to not mix this logic with feelers. Connecting to a feeler currently doesn’t even check we’re on the same tip (that’s a shame, I want to fix it soon). But even if we do sync tips, this would be a cheap way for an attacker to fill our anchors.dat, so I want anchors to represent persistent conns.

hebasto commented at 2:36 pm on November 6, 2019: member

I had some discussions with @naumenkogs, and I understand that there are some concerns about long-time consequences for the network graph topology.

Let me make my proposition more clear.

There is an attack vector which requires victim node restart. It exploits existing logic to make new outbound connections at node startup.
Currently, this kind of outbound peer rotation (OPR for short) is a side effect of an (un)expected node restart. My proposition changes this behavior: if implemented, it significantly reduces the probability of OPR on node restart.
The discussion about OPR and its effect on the network has been going on for a very long time, e.g., #4723, #15759. The latest state of the discussion could be expressed as:

OPR is good for tx-relayed peers as it improves privacy and makes topology inference more difficult
OPR is bad for block-relayed peers as it increases risk for a node to be eclipsed

The only goal of my proposition is mitigation of a well-known eclipse attack; it is not about OPR directly. As a side effect, it changes node behavior wrt OPR. Please note that some other processes and events, besides node restart, affect OPR node behavior, e.g., detected “stale tip” event.
Having dedicated block-relayed outbound connections, I believe it is good, without trade-offs, to preserve them when a node restarts.

IMO, @naumenkogs’s #17326 (comment) is orthogonal to my proposition, and definitely deserves its own discussion.

EthanHeilman commented at 3:27 pm on November 6, 2019: contributor

@naumenkogs

Just wanted to highlight for the other readers that although it’s very useful to understand the background, it does not directly affect reasoning about this PR.

I explained my point poorly. Let me try again. This PR makes the following argument: “Eclipse attack requires the victim node to restart so it can connect to adversarial addresses.” This is no longer the case. That being said in my opinion a restart is still the most likely way of an eclipse attack occurring.

I agree, this is an important one, but again, not really related to whether we need anchors. I hope to find time to eventually look into this particular issue and measure the trade-offs.

Because restarts are no longer necessary to perform eclipse attacks, anchors provide less of a security improvement against eclipse attacks. However, against a restart-based eclipse attack, the security they do provide is harder for a connection starvation attack to bypass. In 2019, anchors are a less useful but more robust countermeasure.

I still think they are useful enough to justify adding them. I just want to make sure that this issue documents the actual security they provide.

naumenkogs commented at 8:23 pm on November 6, 2019: member

OPR is good for tx-relayed peers as it improves privacy and makes topology inference more difficult
OPR is bad for block-relayed peers as it increases risk for a node to be eclipsed

Just wanted to mention that this is my (rough) current intuition, not something we have consensus on :)

As for the proposal itself, I currently have 2 problems with it:

users who expect new connections on restart. It’s purely a UX question. (Perhaps explaining this and providing new instructions would be enough.)
disabling this side-effect OPR we have from restarts, which is one of the very few ways we currently rotate peers. After removing it, the network will be more static. (An answer to it would be a well-thought explicit rotation, but we all agree it’s something non-trivial.)

With the latest suggestion of @hebasto to anchor only block-relay-only peers, I think this is strictly beneficial, because we still rotate 8 (tx+block) relay peers, so we will meet the expectations and we will keep side-effect OPR.

If we ever conclude that block-relay-only links should be rotated, we can ADD 2 more rotatable links. But I think my orthogonal idea I explained above and checking tips with feelers should be sufficient here.

Concept ACK.

TheBlueMatt commented at 9:22 pm on November 6, 2019: member

Right, this seems reasonable, but maybe only for a subset. Having more “categories” of connections (including “rotatable” ones, as @naumenkogs notes) is likely also important. Ultimately, different types of eclipse attacks demand different responses.

gmaxwell commented at 10:03 pm on November 6, 2019: contributor

Making it do this with all connections would probably be bad, because it would guarantee capture persistence. It potentially makes topology inference more powerful. Strong persistence can also contribute to network self-partitioning (e.g. where longer distance links are less reliable, so they get culled, and eventually you end up with disconnected subgraphs that connect only to their own continent).

In the past in bitcoin we’ve tried to exploit diversity in connections– consider the inbound peer eviction logic: We exclude peers from eviction if they are among the best in a half dozen different metrics, with the belief that it is much harder for an attacker to dominate in every category than it is to just dominate in a single metric.
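The "best in several metrics" idea can be illustrated with a simplified Python sketch. The metric names, the number of protected peers per metric, and the final rule (evict the most recently connected unprotected peer) are assumptions; the actual logic in net.cpp differs in its details.

```python
def select_eviction_candidate(peers, metrics, protect_per_metric=4):
    """Return the id of the peer to evict, or None if every peer is protected.

    peers:   list of dicts with an "id", a "connected_at" time, and metric keys
    metrics: list of (key, higher_is_better) pairs, applied one after another
    """
    remaining = list(peers)
    for key, higher_is_better in metrics:
        ranked = sorted(remaining, key=lambda p: p[key], reverse=higher_is_better)
        protected_ids = {p["id"] for p in ranked[:protect_per_metric]}
        remaining = [p for p in remaining if p["id"] not in protected_ids]
    if not remaining:
        return None
    # Assumed tie-break: evict the unprotected peer that connected most recently.
    return max(remaining, key=lambda p: p["connected_at"])["id"]
```

An attacker who wants to push out honest peers has to beat them in every protected metric at once, not just in a single one.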

The revised approach of only applying it to blocks-only peers essentially addresses the topology inference question. But I don’t think the best outcome comes from making all blocks only peers persistent– because that would be needlessly weak, e.g. to an attacker that contacts major VPS providers and acquires control of many IPs that have a long history of running bitcoin nodes, and disadvantages connections to honest users on dynamic IPs. Doing it with half of them or even just two of them would probably be a bigger win.

This behaviour should probably earn a complementary behaviour on the inbound side: Right now about half the inbound slots are preserved for longest-connected peers. Half of those could be redirected to be preserved for network|limited peers with longest-historically-connected time. Without some measure like this, persistent connection logic could be somewhat undermined by an attacker that fills up the connection slots on long-running nodes with static IPs in order to cause the eviction of (or prevent connections from) the other hosts they hope to eclipse.

gmaxwell commented at 10:10 pm on November 6, 2019: contributor

Now in Bitcoin 2019 the victim could successfully evict the attacker’s connections and reconnect to the node. However, in Bitcoin 2019 an attacker might be able to eclipse a node without reboots via connection eviction logic. There is a trade-off here. Someone should research this!!!

Half of the inbound connections are reserved for the longest running connections, which is the ‘2015’ logic (pre PR6374). Few nodes were more than half full in 2015. So essentially all peers that would have been protected in 2015 are protected today. So, I don’t think your characterization of changing from one weakness to another is correct– instead the current behaviour is fairly strong against both attacks.

Providing the same kind of diverse protection is why I argue above against all peers (or all blocks only peers) being made persistent.

naumenkogs commented at 2:37 am on November 7, 2019: member

@gmaxwell I might be wrong, but it seems you’re confusing light clients’ (network|limited) blocks-only connections with block-relay-only connections we recently added?

After this PR, every node creates 8 regular connections and 2 connections which relay only blocks (no transactions and no addrs).

Our latest discussion here was about keeping only those 2 new block-relay-only connections persistent.

mzumsande commented at 12:24 pm on November 7, 2019: member

Making it do this with all connections would probably be bad, because it would guarantee capture persistence.

To extend on this, it seems to me that a patient attacker who controls just a relatively small number of nodes but can provide 100% uptime and a large capacity for inbound connections could slowly but surely eclipse arbitrary nodes or even take over large parts of a network with a large number of anchors:

Every time some node restarts (or evicts an inbound peer), the affected inbound peers will search for new outbound connections and connect with a certain probability to one of the attacker’s nodes - once this has happened, this outbound slot is “locked in” to the attacker forever, protected by anchor logic. So the attacker could slowly capture connections over time and eventually take over large parts of the network.

To a lesser degree this could also be a problem with the idea of keeping all of the 2 blocks-only connections persistent: An attacker could capture the subset of persistent blocks-only connections over time with the strategy outlined above and neutralize the added protection that block-only connections were meant to provide.
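The slow-capture argument can be made concrete with a toy model in Python. The assumptions are strong and entirely hypothetical: in each round an honest anchor peer disappears with probability `churn`, the replacement is an attacker node with probability `attacker_share`, and an attacker-held slot is never lost because the attacker has 100% uptime.

```python
def captured_fraction(steps, churn, attacker_share):
    """Probability that a persistent slot points at the attacker after `steps` rounds."""
    captured = 0.0
    for _ in range(steps):
        # Only slots still held by honest peers can churn and be recaptured.
        captured += (1.0 - captured) * churn * attacker_share
    return captured
```

Under this model the captured fraction equals 1 - (1 - churn * attacker_share) ** steps, which approaches 1 over time for any nonzero churn and attacker share. With full rotation on every restart, the same model would keep a slot's capture probability at `attacker_share`.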

hebasto commented at 1:33 pm on November 7, 2019: member

Thanks to all reviewers. OP has been updated.

naumenkogs commented at 4:50 pm on November 7, 2019: member

Every time some node restarts (or evicts an inbound peer), the affected inbound peers will search for new outbound connections and connect with a certain probability to one of the attacker’s nodes - once this has happened, this outbound slot is “locked in” to the attacker forever, protected by anchor logic. So the attacker could slowly capture connections over time and eventually take over large parts of the network.

The whole thing is ultimately dependent on a combination of two factors: I) how much more reliable the malicious nodes are by comparison (in terms of node lifetime without outages); II) which fraction of reachable nodes is malicious.

It seems to me that the difference is: persistency allows an attacker to benefit from I, and restart rotation allows an attacker to benefit from II. For example, if an attacker deploys more sybils over time, persistency wouldn’t benefit the attacker, and restart rotation would. At the same time, if an attacker has much higher reliability, persistency would benefit the attacker, and restart rotation wouldn’t.

We don’t know for sure which one is easier to achieve, and also middle ground is not necessarily optimal, but I think this is measurable under various conditions.

There are obviously other variables. Would we prefer a 0.1% chance of being eclipsed and a 50% chance of being connected to 4 spy nodes over a 0.3% chance of being eclipsed and a 20% chance of being connected to 4 spy nodes under some realistic conditions? Being connected to 4 spies would result in a total tx deanonymization under current protocols, so the answer to me is unclear. As an example, I imagine the first scenario is something we get with restart rotation and the second is something we get with restart persistency, under my guess of the attacker’s capabilities w.r.t. number of sybils and persistency.

hebasto commented at 10:48 pm on November 9, 2019: member

An implementation is presented in #17428.

ariard commented at 8:07 pm on January 15, 2020: member

I do think anchors on the whole are a good idea. Let’s say you have an attacker willing to eclipse some victim by spoofing enough malicious nodes for a given period T. With anchoring, and reusing some nodes from period T-1, the attacker would have had to start at T-1, so it increases the deployment costs, which must be sustained across both periods. We can extend this logic to randomly pick among our anchor peers sorted by time range, e.g., pick 1 node from the January-to-March range, 1 node from April-to-June, 1 node from July-to-September, etc. That would imply we refresh our anchors.dat with outbound peers at shutdown and move from a strong persistence to a weaker one.
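The time-range selection could look roughly like the Python sketch below. Bucketing by calendar quarter (in UTC) follows the example ranges above; the helper names `quarter_of` and `pick_one_anchor_per_quarter` and the input format are assumptions.

```python
import random
from collections import defaultdict
from datetime import datetime, timezone

def quarter_of(unix_time):
    """Map a Unix timestamp to a (year, quarter) pair in UTC."""
    d = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    return (d.year, (d.month - 1) // 3 + 1)

def pick_one_anchor_per_quarter(anchors, rng=random):
    """anchors: iterable of (address, first_connected_unix_time) pairs."""
    buckets = defaultdict(list)
    for address, first_connected in anchors:
        buckets[quarter_of(first_connected)].append(address)
    # One random pick per time range, so an attacker must be present in many ranges.
    return {q: rng.choice(sorted(addrs)) for q, addrs in sorted(buckets.items())}
```

An attacker who only appeared recently can win at most the most recent bucket, which is the cost increase described above.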

That said, this implementation is far more scoped, which is a good start, but IMO it may suffer from a weakness. Let’s consider the following attack scenario, anchor selection logic is gameable and anchor peers can be occupied by an attacker. If anchors are block-only-relay, using txn/addrs relay leaks via inbound, he could discover topologies of remaining full-relay. If anchors are full-relay, attacker can’t leverage leaks to discover remaining block-only-relay ones. So I think if we restrain anchors to full-relay it’s better.

OPR is bad for block-relayed peers as it increases risk for a node to be eclipsed

But in this case, full-relay are also block-relay ones, so avoiding their restart rotation would also prevent the attack described. Though it gives an attacker a persistent advantage on tx-spying, so, without further block/tx relay separation, anchoring has to balance between a tx-spying advantage and losing the hidden bonus of block-relay-only.

hebasto renamed this:
~~rfc, p2p: Eclipse attack mitigation~~
rfc, p2p: Restart-based eclipse attack mitigation
on Jan 25, 2020

hebasto commented at 10:16 am on January 25, 2020: member

@ariard

Let’s consider the following attack scenario, anchor selection logic is gameable and anchor peers can be occupied by an attacker.

In this case, if an attack is already successful, there is no reason to prevent it, and this suggestion is not applied.

If anchors are block-only-relay, using txn/addrs relay leaks via inbound, he could discover topologies of remaining full-relay. If anchors are full-relay, attacker can’t leverage leaks to discover remaining block-only-relay ones. So I think if we restrain anchors to full-relay it’s better.

I believe that your suggestion makes the tx-relay node graph more static, which in turn makes tx spying easier, no?

gmaxwell commented at 2:43 pm on January 25, 2020: contributor

@gmaxwell I might be wrong, but it seems you’re confusing light clients’ (network|limited) blocks-only connections with block-relay-only connections we recently added?

After this PR, every node creates 8 regular connections and 2 connections which relay only blocks (no transactions and no addrs).

Our latest discussion here was about keeping only those 2 new block-relay-only connections persistent.

I can’t figure out what I said to make you think that. I was aware of the behaviour and the context.

ariard commented at 0:49 am on January 30, 2020: member

@hebasto

In this case, if an attack is already successful, there is no reason to prevent it, and this suggestion is not applied.

But succeeding in occupying anchor spots doesn’t mean you’re able to take over the remaining outbound full-relay ones. A vulnerability that lets an attacker control some outbound connections may be exploited to escalate and gain a total eclipse of the victim; we should prevent this. I see our misunderstanding: I wasn’t assuming anchors=block-only, as it is right now in your implementation, but was thinking more generally about opening connections based on peers being in our anchors repository.

I believe that your suggestion makes the tx-relay node graph more static, which in turn makes tx spying easier, no?

Yes, I agree on this point; see the end of my previous comment. But I would favor eclipse safety over tx-spying concerns, because if the first fails you may lose money, while the second you can circumvent by announcing your txn over Tor or similar. Though that is part of a wider debate.

hebasto commented at 1:35 pm on February 8, 2020: member

The initial idea to mitigate eclipse attacks based on node restarts, including ones initiated by an adversary, introduces a new risk.

hebasto closed this on Feb 8, 2020

instagibbs commented at 2:47 pm on February 24, 2020: member

why is this issue closed?

hebasto commented at 7:35 pm on February 27, 2020: member

why is this issue closed?

To keep all discussion in #17428.

laanwj referenced this in commit 9855422e65 on Oct 15, 2020

sidhujag referenced this in commit f5fa561d87 on Oct 16, 2020

DrahtBot locked this on Feb 15, 2022

rfc, p2p: Restart-based eclipse attack mitigation #17326
