[WIP] p2p: Add random txn's from mempool to GETBLOCKTXN

davidgumberg commented at 9:13 pm on February 11, 2023: contributor

As compact block reconstruction currently works, a node that sends a GETBLOCKTXN request for the transactions it is missing reveals exactly which transactions from the published block were already in its mempool. The greatest danger here is that a node will never request its own transactions. Given a “sufficient number” of GETBLOCKTXN requests from a single peer, it becomes possible to identify that peer’s wallet addresses with some degree of confidence.
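To make the leak concrete, here is a minimal Python sketch. It is not code from this PR: the function names are hypothetical, transactions are plain string IDs, and short IDs and prefilled transactions from compact block relay are ignored.

```python
def getblocktxn_indexes(block_txids, mempool):
    """Indexes of the block transactions a node has to request,
    i.e. those that are not already in its mempool."""
    return [i for i, txid in enumerate(block_txids) if txid not in mempool]


def inferred_mempool(block_txids, requested_indexes):
    """What an observer of the request can infer: every block transaction
    that was NOT requested must already have been known to the node."""
    requested = set(requested_indexes)
    return {txid for i, txid in enumerate(block_txids) if i not in requested}


# Toy data (assumed, for illustration only).
block = ["a", "b", "c", "d"]
mempool = {"a", "c", "x"}  # "x" is not in the block, so nothing is learned about it

request = getblocktxn_indexes(block, mempool)
leaked = inferred_mempool(block, request)
```

The observer learns nothing about mempool transactions that are not in the block, but for transactions in the block the request is an exact membership oracle.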

Assuming that every transaction except a node’s own has a nonzero probability of being absent from the node’s mempool when a block is discovered, an attacker with an unbounded number of GETBLOCKTXN requests from a single peer that reuses a finite number of pubkeys can determine, with confidence approaching 100%, which addresses belong to that peer.

I am not a statistician, but I am trying to work out how large the “sufficient number” needed for a reasonable degree of confidence in a peer-pubkey correlation is, and whether collecting that many requests is a realistic scenario.
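One rough way to frame the question, as a sketch under strong assumptions rather than a result: suppose each transaction that is not the peer's own is missing from its mempool independently with the same probability `p` (a hypothetical parameter, not a measured value). If an attacker links `k` confirmed transactions to a candidate address and the peer requested none of them, the chance of that happening for an unrelated address is `(1 - p) ** k`. The helper below, with hypothetical names, computes the smallest `k` that pushes this chance under a chosen threshold `alpha`.

```python
import math


def false_match_probability(p, k):
    """Chance that k transactions unrelated to the peer were all already
    in its mempool, assuming independent misses with probability p."""
    return (1.0 - p) ** k


def observations_needed(p, alpha):
    """Smallest k with (1 - p) ** k <= alpha, for 0 < p < 1 and 0 < alpha < 1."""
    return math.ceil(math.log(alpha) / math.log(1.0 - p))


# Example with assumed values: a 1% miss rate and a 5% false-match threshold.
k = observations_needed(0.01, 0.05)
```

The smaller `p` is, that is, the more complete the peer's mempool usually is, the more observations the attacker needs. The independence assumption is unlikely to hold exactly in practice, so this should be read as an order-of-magnitude guide at best.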

This PR aims to prevent mempool fingerprinting by randomly adding roughly 1 in 200 (0.5%) of the block’s transactions that are already in our mempool to our GETBLOCKTXN request. Nodes with less complete mempools (worse connections) will tend to have fewer excess transactions in their requests. (A node that is missing 50% of a 2000-transaction block from its mempool will tend to request about 5 excess transactions.) 0.5% is a number I mostly pulled out of thin air, but a maximum overhead of 0.5% seems like a reasonable price to pay if the fingerprinting attack described is realistic.
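The padding idea and the arithmetic above can be sketched as follows. This is an illustrative Python model, not the C++ change in this PR; `padded_request` and `expected_excess` are hypothetical names, and transactions are plain string IDs.

```python
import random


def padded_request(block_txids, mempool, rate=0.005, rng=None):
    """Indexes to request: every missing transaction, plus each transaction
    already in the mempool with probability `rate` (a decoy)."""
    rng = rng or random.Random()
    indexes = []
    for i, txid in enumerate(block_txids):
        if txid not in mempool:
            indexes.append(i)      # genuinely missing, must be requested
        elif rng.random() < rate:
            indexes.append(i)      # already known, requested anyway as padding
    return indexes


def expected_excess(block_size, fraction_in_mempool, rate=0.005):
    """Expected number of decoy transactions in one request."""
    return block_size * fraction_in_mempool * rate


# The example from the description: 2000 transactions, half already known.
excess = expected_excess(2000, 0.5)
```

With a 2000-transaction block of which half is already in the mempool, 1000 transactions are candidates for padding, and 0.5% of 1000 gives the 5 excess transactions mentioned above. A node that already has the entire block would expect 10.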

DrahtBot commented at 9:13 pm on February 11, 2023: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept NACK	naumenkogs
Approach NACK	sipa

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

davidgumberg force-pushed on Feb 11, 2023

Add rand txn's from mempool to GETBLOCKTXN request

In order to prevent fingerprinting, especially of our own txn's,
this adds a ~0.5% chance that transactions already in our mempool
get added to our GETBLOCKTXN request. Nodes that have less
complete mempools are likely to have fewer excess txn's to relay.

db84b1f470

davidgumberg force-pushed on Feb 11, 2023

sipa commented at 4:07 am on February 12, 2023: member

It’s an interesting observation that our responses to compact block announcements reveal something about our mempool, but I’m not sure it’s worth the cost of addressing that:

Blocks are rare, and very expensive to produce, meaning that per block only a few of our peers even get the chance to query us about it (and it’s unaffordable to produce more close-to-tip blocks to trigger that).
Increasing the size of compact block responses may actually add to propagation latency, especially when it results in a response that now needs more TCP packets (the bandwidth isn’t the concern here).
Just empirically, compact block relay works very well (on my well-connected node without wallets, 91% of blocks are reconstructed without asking for any transactions; 3.6% need 1 transaction; 3.2% need 2 transactions; 0.5% need 3 transactions; 1.1% need several). So even when our peers get a chance to learn something, there generally is very little to learn.

If we wanted to do something about this information leak nonetheless, I believe the right approach would be using the m_recently_announced_invs filter which we maintain for all our peers, and just add all transactions to the compact block response that we haven’t told our peer about yet (and if there are too many, perhaps just immediately fall back to standard block relay).
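One possible reading of this suggestion, sketched in Python under stated assumptions: a plain set stands in for the per-peer m_recently_announced_invs rolling bloom filter, `build_request` and `max_indexes` are hypothetical names, and the threshold value is arbitrary. The node requests every block transaction that it either lacks or has never announced to this peer, so the request only confirms knowledge the peer already had.

```python
def build_request(block_txids, mempool, announced_to_peer, max_indexes=50):
    """Return the indexes to request from this peer, or None to signal a
    fallback to fetching the full block.

    A transaction is requested if we do not have it, or if we have it but
    never announced it to this peer (so its absence from the request would
    reveal something new about our mempool)."""
    indexes = [
        i for i, txid in enumerate(block_txids)
        if txid not in mempool or txid not in announced_to_peer
    ]
    if len(indexes) > max_indexes:
        return None  # too much padding: fall back to standard block relay
    return indexes


# Toy data (assumed): we hold a, b, c but only ever announced a to this peer.
block = ["a", "b", "c", "d"]
mempool = {"a", "b", "c"}
announced = {"a"}

request = build_request(block, mempool, announced)
```

Here only "a" is left out of the request, and the peer already knew we had it. A real rolling bloom filter has false positives and forgets old entries, which this sketch does not model.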

naumenkogs commented at 8:58 am on February 13, 2023: member

I agree with @sipa, with a stronger emphasis that I would probably NACK this change because the cost of this fix is too high, and the privacy gain is too low.

You may be interested in contributing to some SPV client implementation instead :) I’m curious how well they preserve privacy when they request transactions/blocks (that subset which is of interest to them specifically). E.g. whether they ask the same node to provide everything — then the node can correlate.

maflcko commented at 10:48 am on February 13, 2023: member

Wouldn’t it be better to not add wallet transactions to the mempool if we don’t want peers to query our mempool for wallet transactions?

See also #11887 (comment) (and all in- and out- links in this issue)

glozow added the label P2P on Feb 13, 2023

petertodd commented at 1:07 pm on February 13, 2023: contributor

Just empirically, compact block relay works very well

Note that it’s very easy for an adversary to change that by simply broadcasting simultaneous double-spends with the same fee. Indeed, n-way double spends broadcast to n different nodes is easy to do. So I don’t think the observation that it works well right now is relevant to the adversarial case.

sipa commented at 2:41 pm on February 13, 2023: member

I agree with @sipa, with a stronger emphasis that I would probably NACK this change because the cost of this fix is too high, and the privacy gain is too low.

Yeah, Approach NACK. I may be convinced that doing something to avoid mempool fingerprinting through GETBLOCKTXN is worth it, but if we want that, there are better ways than this.

Note that it’s very easy for an adversary to change that by simply broadcasting simultaneous double-spends with the same fee. Indeed, n-way double spends broadcast to n different nodes is easy to do. So I don’t think the observation that it works well right now is relevant to the adversarial case.

That’s fair; the other arguments are stronger.

Wouldn’t it be better to not add wallet transactions to the mempool if we don’t want peers to query our mempool for wallet transactions?

I don’t think that’s a good idea. The point is that we shouldn’t treat wallet transactions any differently from transactions received from other peers. If we don’t add wallet transactions to the mempool but still relay them (because otherwise nobody will ever know about them), we’re adding a giant fingerprint to identify our transactions (relayed but not in mempool…).

I think the focus of this PR on wallet transactions in general is distracting. The issue, if any, is mempool fingerprinting. That might be used by attackers to learn about our wallet transactions, but also about many other things. But the solution isn’t specific to wallet things; it should just be to prevent attackers from learning anything about our mempool transactions that haven’t been announced to them.

maflcko commented at 2:45 pm on February 13, 2023: member

If we don’t add wallet transactions to the mempool but still relay them

Yeah, I didn’t mention this, but obviously we wouldn’t relay them with the mempool. Doing a one-shot (tor-only) outbound connection to fan-out the tx (one-hop dandelion) without adding it to the mempool shouldn’t leave a fingerprint, other than the one left by the tor-only connection, no?

sipa commented at 2:48 pm on February 13, 2023: member

@MarcoFalke Oh, fair enough, that’s a good idea (though it’d probably still need a fallback to normal relay after some delay if we don’t observe the transaction being rumoured back to us). I also think it’s orthogonal to the idea here, because even absent “first mile” wallet broadcast leakage, we still want the P2P network to obscure transaction relay beyond that.

maflcko commented at 3:03 pm on February 13, 2023: member

we still want the P2P network to obscure transaction relay beyond that

I wonder if that is worth it. Given this issue here (and past ones), it just seems hard to think about and any guarantees are at best brittle in an evolving P2P network. So, long term, assuming the private “first mile” privacy-preserving fan out stuff is available, users and wallets caring about it will probably use that. Attempts to optimize the normal relay to be equally privacy-preserving will always have a taste of a false promise and it might be more honest to just tell people to not rely on that.

sipa commented at 3:15 pm on February 13, 2023: member

We can’t rely on Tor for all wallet privacy, especially given that it’s a centralized service that might just fail completely one day (and before that, it’s hard to bound how much sufficiently powerful attackers can learn from traffic analysis in Tor).

Privacy on a public network is always multi-faceted, and it’s fair we can’t make strong guarantees. But on the other hand, we go through pretty substantial efforts to hide lots of things on a best-effort basis, especially involving transaction relay. And they’re not all reducible to protecting wallet privacy (there is eclipse attack protection, fingerprinting for connection graph information, …).

achow101 commented at 4:15 pm on April 25, 2023: member

This PR does not seem to have conceptual support. Please leave a comment if you would like this to be reopened.

achow101 closed this on Apr 25, 2023

bitcoin locked this on Apr 24, 2024

[WIP] p2p: Add random txn’s from mempool to GETBLOCKTXN #27086
