As compact block relay currently works, a node that sends a GETBLOCKTXN request for the transactions it is missing reveals precisely which transactions in the published block were already in its mempool. The greatest danger here is that a node will never request its own transactions. Given a “sufficient number” of GETBLOCKTXN requests from a single peer, it becomes possible to identify that peer’s wallet addresses with some degree of confidence.
Assuming that every transaction except a node’s own has a nonzero probability of being absent from the node’s mempool when a block is discovered, an attacker with an unlimited supply of GETBLOCKTXN requests from a single peer that reuses a finite number of pubkeys can determine which addresses belong to that peer with confidence approaching 100%.
I am not a statistician, but I am actively trying to work out how large that “sufficient number” is, and whether collecting enough requests to reach a reasonable degree of confidence in a peer-pubkey correlation is a realistic scenario.
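To give a rough feel for the size of that “sufficient number”, here is a back-of-the-envelope sketch, not a rigorous analysis. It assumes that a peer which does not own an address misses each transaction involving that address independently, with the same probability `p_missing`. Under that assumption the chance that such a peer never requests any of `k` observed transactions is `(1 - p_missing) ** k`. The function names, the independence assumption and the sample probabilities are all hypothetical, and the model ignores prior odds and the number of candidate addresses.

```python
import math


def false_match_probability(p_missing: float, k: int) -> float:
    """Chance that a peer NOT owning an address still never requests
    any of k transactions involving it (assumed independent misses)."""
    return (1.0 - p_missing) ** k


def observations_needed(p_missing: float, alpha: float) -> int:
    """Smallest k for which the false-match probability is <= alpha."""
    if not (0.0 < p_missing < 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("p_missing and alpha must lie strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(1.0 - p_missing))


if __name__ == "__main__":
    # Hypothetical miss rates; real values depend on the peer's connectivity.
    for p in (0.5, 0.05, 0.005):
        k = observations_needed(p, alpha=0.01)
        print(f"p_missing={p}: {k} never-requested transactions needed")
```

Under this toy model, a peer that misses half of all transactions needs only seven never-requested transactions per address to push the false-match probability below 1%, while a well-connected peer with a small `p_missing` needs far more.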
This PR prevents mempool fingerprinting by randomly adding roughly 1 in 200 (0.5%) of the block’s transactions that are already in our mempool to our GETBLOCKTXN request. Nodes that have less complete mempools (worse connections) will have fewer excess transactions requested. (A node missing 50% of a 2000-transaction block from its mempool will tend to request about 5 excess transactions.) 0.5% is a number I mostly pulled out of thin air, but a maximum impact of 0.5% seems like a reasonable price to pay if the fingerprinting attack described is realistic.
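The selection rule described above can be sketched roughly as follows. This is an illustration in Python rather than the PR's actual code: the function name, the `decoy_rate` parameter and the use of string txids are assumptions, and the wire encoding of the request is omitted.

```python
import random


def select_indexes_to_request(block_txids, mempool_txids, decoy_rate=0.005, rng=None):
    """Return the block indexes to put in a GETBLOCKTXN request.

    Every transaction missing from the mempool is requested. Each
    transaction we already have is also requested with probability
    decoy_rate, so the request no longer reveals our exact mempool.
    """
    rng = rng or random.Random()
    requested = []
    for index, txid in enumerate(block_txids):
        if txid not in mempool_txids:
            requested.append(index)  # genuinely missing
        elif rng.random() < decoy_rate:
            requested.append(index)  # decoy: already in our mempool
    return requested


if __name__ == "__main__":
    # Hypothetical 2000-transaction block with half of it already in the mempool.
    block = [f"tx{i}" for i in range(2000)]
    mempool = set(block[:1000])
    request = select_indexes_to_request(block, mempool)
    decoys = [i for i in request if block[i] in mempool]
    print(f"{len(request)} requested, {len(decoys)} of them decoys")
```

With 1000 of 2000 transactions already in the mempool, the expected number of decoys is 1000 × 0.005 = 5, matching the estimate above. The count varies from run to run because each decoy is drawn independently.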