This PR makes two particular attacks (and potentially others) harder to execute.
InvBlock
This behavior was described well here (page 11).
Per the current implementation, if node A receives an INV(tx) from node B, node A sends a GETDATA to B and waits for the TX message back.
Node A is likely to receive more INVs (for the same tx) from other peers, but it will not send another GETDATA unless it hears nothing back from node B for the next 2 minutes (to save bandwidth).
Thus, if B is a malicious node, it can prevent node A from getting the transaction (even if all of A’s peers have it) for 2 minutes.
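For illustration, here is a minimal, hypothetical sketch of the behavior described above (the names `TxId`, `NodeId`, `SendGetData`, `OnInv`, `CheckTimeouts`, and `GETDATA_TX_TIMEOUT` are made up for the sketch and are not the actual net_processing code): only the first announcer gets the GETDATA, and every other announcer has to wait out the 2-minute window.

```cpp
#include <chrono>
#include <cstdint>
#include <deque>
#include <map>

using TxId = uint64_t;   // stand-in for a transaction hash
using NodeId = int64_t;  // stand-in for a peer id

// The window during which no other peer is asked for the same transaction.
static constexpr auto GETDATA_TX_TIMEOUT = std::chrono::minutes{2};

struct TxRequestState {
    NodeId requested_from;                               // peer we sent GETDATA to
    std::chrono::steady_clock::time_point request_time;  // when we sent it
    std::deque<NodeId> other_announcers;                 // peers whose INVs we deferred
};

std::map<TxId, TxRequestState> g_in_flight;

void SendGetData(NodeId /*peer*/, TxId /*tx*/) { /* actual network send omitted */ }

// Called when INV(tx) arrives from `peer`.
void OnInv(NodeId peer, TxId tx, std::chrono::steady_clock::time_point now)
{
    auto it = g_in_flight.find(tx);
    if (it == g_in_flight.end()) {
        // First announcement wins: request immediately from this peer.
        SendGetData(peer, tx);
        g_in_flight.emplace(tx, TxRequestState{peer, now, {}});
        return;
    }
    // Already requested elsewhere: remember this peer, but send no further
    // GETDATA until the first request times out.
    it->second.other_announcers.push_back(peer);
}

// Called periodically: only after GETDATA_TX_TIMEOUT has elapsed do we fall
// back to the next peer that announced the transaction.
void CheckTimeouts(std::chrono::steady_clock::time_point now)
{
    for (auto& [tx, state] : g_in_flight) {
        if (now - state.request_time >= GETDATA_TX_TIMEOUT && !state.other_announcers.empty()) {
            state.requested_from = state.other_announcers.front();
            state.other_announcers.pop_front();
            SendGetData(state.requested_from, tx);
            state.request_time = now;
        }
    }
}
```

In this model, a malicious B just announces early and then stays silent, which is exactly the InvBlock described above.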
This behavior seems to be an inherent limitation of the current P2P relay protocol, and I don’t see how it can be fundamentally changed (I can see workarounds, though they involve rewriting a lot of P2P code).
What does this PR fix?
The attacks I’m looking at involve preventing A from learning the transaction for 2*N minutes. To do that, an attacker has to spin up N nodes and send N INVs simultaneously to node A (the InvBlocks will then be queued back-to-back at 2-minute intervals under the current implementation).
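For example, to keep node A in the dark for 10 minutes, an attacker needs roughly 5 such nodes whose INVs reach A before any honest announcement: each malicious peer silently burns one 2-minute request window before A falls back to the next queued announcer.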
More precisely, the two scenarios I’m looking at are:
- An attacker censors a particular transaction. By performing InvBlock from different nodes, an attacker can carry out network-wide censorship of a particular transaction (or of all transactions). The earlier an attacker finds the transaction he wants to censor, the easier the attack is to perform. As @gwillen pointed out, this is even more dangerous in the case of lightning, where transactions are known in advance.
- The topology inference described in papers 1 and 2 involves network-wide InvBlock. This fix would not mitigate this type of inference, but I believe it will make it more expensive to perform (an attacker would have to create more transactions and perform more rounds to learn the topology; the second paper itself notes that InvBlock isolation is important for the attack).
How does it work?
This PR introduces a bias toward outbound connections (they get higher priority when a node chooses which peer to request a transaction from) and randomizes the order. As per @gmaxwell’s suggestion, the GETDATA request queue is built after processing all incoming messages from all nodes.
After this fix, if the incoming messages were [I1, I2, I3, O1, O2, O3, O4], the GETDATA queue may look like [O2, O1, O3, O4, I1, I3, I2].
If {I1, I2, I3} arrived significantly earlier than the others (but by less than TX_TIMEOUT = 60 s), the GETDATA queue may look like [I2, O2, O1, O3, O4, I1, I3].
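A minimal sketch of the queue construction under a simplified model (the `Announcement` struct, `BuildGetDataQueue`, and the per-group shuffle are illustrative only, and the sketch ignores the arrival-time exception from the second example; this is not the PR’s actual code):

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

using TxId = uint64_t;   // stand-in for a transaction hash
using NodeId = int64_t;  // stand-in for a peer id

struct Announcement {
    NodeId peer;
    TxId tx;
    bool outbound;  // true if the announcing peer is an outbound connection
};

// Build the GETDATA queue only after all incoming messages have been processed:
// outbound announcers go first, and the order inside each group is randomized.
std::vector<Announcement> BuildGetDataQueue(std::vector<Announcement> pending, std::mt19937& rng)
{
    // Move outbound announcements ahead of inbound ones; `split` points at the
    // first inbound entry.
    auto split = std::stable_partition(pending.begin(), pending.end(),
                                       [](const Announcement& a) { return a.outbound; });
    // Shuffle within each group so a single (possibly malicious) peer cannot
    // reliably claim the first request slot for a given transaction.
    std::shuffle(pending.begin(), split, rng);
    std::shuffle(split, pending.end(), rng);
    return pending;
}
```

With this ordering, an inbound attacker that announces a transaction first is usually no longer the peer that receives the GETDATA, so it can no longer buy itself a guaranteed 2-minute delay.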
Other comments:
- This mitigation works better when connectivity is higher (especially outbound connectivity, because it becomes less likely that 2 GETDATAs to inbound malicious nodes are queued next to each other).