compact block fingerprinting

Crypt-iQ commented at 1:27 pm on August 15, 2023: contributor

I haven’t written a test for this, but it seems that a peer can fingerprint a chunk of our mempool by announcing a new compact block with a valid header, but inserting many shorttxids that don’t belong to the block but are suspected to be in our mempool. Assuming no short id collisions, we’ll then request each tx that we didn’t find in either vExtraTxnForCompact or the mempool.

I’m not sure how to fix this since compact block relay relies on this behavior. One possible way of fixing this was brought up here #27086 (comment).
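To make the leak concrete, here is a minimal sketch of the lookup described above. It is not Bitcoin Core's code: `ShortIdSet`, `MissingIndexes` and `InferredPresent` are hypothetical names, and the short IDs are plain integers rather than real SipHash-derived values.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the short IDs the receiver can compute for
// transactions in its mempool and in vExtraTxnForCompact.
using ShortIdSet = std::unordered_set<uint64_t>;

// Indexes of announced short IDs the receiver cannot match locally.
// In the real protocol these are what GETBLOCKTXN would ask for.
std::vector<size_t> MissingIndexes(const std::vector<uint64_t>& announced,
                                   const ShortIdSet& known)
{
    std::vector<size_t> missing;
    for (size_t i = 0; i < announced.size(); ++i) {
        if (known.count(announced[i]) == 0) missing.push_back(i);
    }
    return missing;
}

// What the announcing peer can infer: every index that was NOT requested
// corresponds to a transaction the receiver already holds.
std::vector<size_t> InferredPresent(size_t announced_count,
                                    const std::vector<size_t>& requested)
{
    std::vector<bool> was_requested(announced_count, false);
    for (size_t idx : requested) {
        if (idx < announced_count) was_requested[idx] = true;
    }
    std::vector<size_t> present;
    for (size_t i = 0; i < announced_count; ++i) {
        if (!was_requested[i]) present.push_back(i);
    }
    return present;
}
```

The two functions are complements of each other, which is the whole problem: the request reveals exactly which of the probed transactions we hold.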

maflcko added the label P2P on Aug 15, 2023

maflcko added the label Privacy on Aug 15, 2023

maflcko commented at 1:42 pm on August 15, 2023: member

I guess this relates to the TODO as well: https://github.com/bitcoin/bitcoin/blob/80d70cb6b04b5a3c13e661c0718a4b5108d55869/src/blockencodings.cpp#L24

Crypt-iQ commented at 1:53 pm on August 15, 2023: contributor

I think implementing the TODO might add round-trips and maybe leak what we had in our own mempool prior to block acceptance?

Crypt-iQ commented at 3:18 pm on August 25, 2023: contributor

There are two cases to think about here that currently leak some info:

we have the tx, so we won’t request it
we don’t have the tx, so we will request it

To solve both cases, we could choose some transactions to request even if we have them. For example, if we need X transactions to reconstruct the block, we could request Y transactions in GETBLOCKTXN where we have the other Y-X transactions. Adding these decoy transactions increases the BLOCKTXN response and could cause more round trips. We would also have to ignore the known transactions in BLOCKTXN.

If there are too many txn’s that we’re missing, we could fall back to block relay instead of trying to add many decoy transactions. According to this data point, compact block reconstruction works pretty well. Most of the time no reconstruction happens and sometimes only a few txn’s are needed. So falling back to block relay should be unlikely.
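A rough sketch of the decoy idea, under assumptions: `BuildRequestWithDecoys`, `decoy_count` and `max_missing` are hypothetical names and thresholds, not anything in Bitcoin Core, and a real implementation would need a better random source than a seeded `std::mt19937`.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <random>
#include <vector>

// Returns the indexes to put in GETBLOCKTXN (the ones we truly lack plus
// randomly chosen decoys), sorted ascending. Returns std::nullopt when too
// many are missing, which signals "fall back to requesting the full block".
std::optional<std::vector<size_t>> BuildRequestWithDecoys(
    const std::vector<size_t>& missing,  // indexes we truly lack (X of them)
    const std::vector<size_t>& have,     // indexes we already hold
    size_t decoy_count,                  // Y - X, hypothetical parameter
    size_t max_missing,                  // hypothetical fallback threshold
    std::mt19937& rng)
{
    if (missing.size() > max_missing) return std::nullopt;

    // Pick up to decoy_count indexes from the transactions we already have.
    std::vector<size_t> pool = have;
    std::shuffle(pool.begin(), pool.end(), rng);
    pool.resize(std::min(decoy_count, pool.size()));

    std::vector<size_t> request = missing;
    request.insert(request.end(), pool.begin(), pool.end());
    std::sort(request.begin(), request.end());
    return request;
}
```

The caller would still have to skip the decoy entries when the BLOCKTXN response arrives, as noted above.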

Since quick compact block relay is pretty important to avoid stale blocks, I’m not really sure how to weigh the privacy-bandwidth tradeoff.

mzumsande commented at 3:49 pm on August 25, 2023: contributor

After #27675 (but also before, with a few more restrictions), we relay any tx from our mempool (except for those that arrived after the last INV sent to the peer), so someone wanting to fingerprint our mempool could just ask for those transactions directly instead I think and then check whether we send it to them or answer with NOTFOUND.

So I wonder why would someone trouble themselves with doing compact blocks for this, which would involve either creating a new valid block header (expensive) or trying to be the first to announce a compact block found by someone else to the victim (which has timing issues), if they could just ask for this info directly? (this doesn’t apply to block-relay-only peers though).

Crypt-iQ commented at 8:33 pm on February 17, 2025: contributor

I meant to reply but guess I never did. The point about blocks-only nodes is a good one. In the happy path, they aren’t sent compact blocks since they never send SENDCMPCT to the peer again to set the high-bandwidth mode. But they can still be sent unsolicited compact blocks and because they can’t reconstruct the blocks, it lets them be fingerprinted if anything is in their mempool. In most cases I don’t think we can send GETDATA to fetch txns from blocksonly nodes since they never INV us anything.

I think the only times a -blocksonly node will have a transaction is if they have a peer with the relay permission, in cases of reorg, or if they accidentally decide to broadcast a tx?

gmaxwell commented at 3:09 pm on May 23, 2025: contributor

FWIW, this is solvable by changing to more fibre like methods for block transmission. Because FEC coded missing data doesn’t identify what was missing.

So I had thought the code would check if there were too many missing transactions and just request a full block if all were missing. That doesn’t address the general attack (since the attacker could admit some real txn that are in the block) but it would mostly address the blocksonly case (and could easily get an if-blocksonly check so that it always did) – but I can’t find that. It should be added.

It would also be prudent to just drop any unrequested compact block messages that show up outside of the correct protocol behavior (or treat them as header messages). That would also eliminate the attack on blocks only peers and also discourage protocol violations that waste bandwidth (e.g. going around spamming everyone with compact blocks to try to speed up your own propagation).

The latter of these two should be much easier to implement and be a more comprehensive attack protection.
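The two mitigations can be sketched as one decision function. This is an illustration only: `PeerView`, `CmpctAction` and `OnCompactBlock` are hypothetical and do not mirror Bitcoin Core's types, and "solicited" is assumed here to mean that we either selected the peer as high-bandwidth or explicitly asked for the compact block.

```cpp
#include <cstddef>

enum class CmpctAction { Process, TreatAsHeader, RequestFullBlock };

// Hypothetical view of the announcing peer, not Bitcoin Core's state.
struct PeerView {
    bool requested_hb;          // we asked this peer for high-bandwidth mode
    bool requested_via_getdata; // we explicitly asked for this compact block
};

CmpctAction OnCompactBlock(const PeerView& peer, bool blocksonly,
                           size_t total_shortids, size_t missing_shortids)
{
    // Unsolicited compact block: never reveal which transactions we lack.
    if (!peer.requested_hb && !peer.requested_via_getdata) {
        return CmpctAction::TreatAsHeader;
    }
    // A blocksonly node is not expected to reconstruct from its mempool.
    if (blocksonly) return CmpctAction::RequestFullBlock;
    // Everything is missing: a transaction request gains us nothing.
    if (total_shortids > 0 && missing_shortids == total_shortids) {
        return CmpctAction::RequestFullBlock;
    }
    return CmpctAction::Process;
}
```

The first check is the "drop unrequested compact blocks" option; the other two are the full-block fallback.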

instagibbs commented at 3:33 pm on May 23, 2025: member

Dropping seemingly-unsolicited compact blocks seems fine, we already do that for the parallel portion, this should be a change just to the first one? blocksonly nodes should never set cb peers.

mzumsande commented at 5:19 pm on May 23, 2025: contributor

Some discussion about the privacy implications for blocksonly peers in IRC: https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2025-05-23#1123715

Dropping unsolicited cmpctblock messages makes sense to me as well (and doesn’t appear to be at odds with anything in BIP152).

Crypt-iQ commented at 6:05 pm on May 23, 2025: contributor

So I had thought the code would check if there were too many missing transactions and just request a full block if all were missing. That doesn’t address the general attack (since the attacker could admit some real txn that are in the block) but it would mostly address the blocksonly case (and could easily get an if-blocksonly check so that it always did) – but I can’t find that. It should be added.

Is this still useful if we add a patch to drop unsolicited cmpctblock messages? An attacker would still be able to do this to non-blocksonly peers, but as pointed out above by @mzumsande it doesn’t seem particularly useful as the attacker could simply request the transactions directly from these peers.

If it is useful to fallback to the legacy BLOCK message in the non-blocksonly case, then should it only be done when all the transactions are missing or could it instead be if some % are missing?

gmaxwell commented at 7:13 pm on May 23, 2025: contributor

I don’t recall if the vulnerability related to requesting the transaction was closed (relay pool limited its scope at least to recently announced txn) but there were complications related to orphan txn (e.g. that relaying a transaction ought to imply permission to fetch its parents if they are still in your memory pool). But if that vulnerability hasn’t been closed it ought to be closed. Edit: Sipa noted to me on IRC that the behavior now is that when you INV you grant only permission to fetch txn in your mempool up to that point in time (which is very close to equivalent to only granting permission to things that you’ve INVed (or would have inved if the peer had been connected)). So I think the CB probing, in fact, is an additional information leak.
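The time-based rule described in the edit above can be sketched as a single predicate. This is a simplification under assumptions: `MayServeTx` is a hypothetical name, timestamps are plain `std::chrono::seconds`, and the actual Bitcoin Core logic may differ in detail.

```cpp
#include <chrono>
#include <optional>

// tx_entry_time: when the transaction entered our mempool, or std::nullopt
//                if we do not have it at all.
// last_inv_time: when we last sent an INV to the requesting peer.
// Returns true if we would serve the transaction; false means NOTFOUND.
bool MayServeTx(std::optional<std::chrono::seconds> tx_entry_time,
                std::chrono::seconds last_inv_time)
{
    if (!tx_entry_time.has_value()) return false;  // not in the mempool
    // Only transactions present up to the last INV are fair game.
    return *tx_entry_time <= last_inv_time;
}
```

Under this rule a direct GETDATA cannot confirm a transaction that arrived after the last INV, whereas the compact block probe has no such cutoff.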

Ignoring the unrequested CB is desirable for non-privacy reasons, specifically because if nodes don’t ignore it someone might decide it’s in their best interest to spam the entire network with an unrequested CB to try to improve propagation. This has been done in the past with unsolicited BLOCKS (which is obviously much worse). I haven’t asked Matt, but if this oversight weren’t completely accidental I bet it was due to fibre sending them unsolicited, since if you were connecting to a fibre node at all you obviously did want blocks even if the HB logic hadn’t yet selected you. I say that because by the time CB were developed we’d already learned the lesson that being tolerant in what you accept turns out to cause trouble.

Just in principle, if there is some unsolicited activity that shouldn’t exist it probably ought to at least be ignored, if not outright trigger a disconnection. There have been a lot of vulnerabilities in the past that could have been avoided or greatly mitigated by doing so, and sure, sometimes it’s discovered that it might be useful to have the behavior, but in that case the version that introduces the behavior can introduce a new message type or a handshake that confirms it’s okay.

TheBlueMatt commented at 8:48 pm on May 23, 2025: contributor

Yea, I’m not really sure if it was entirely deliberate or not, but they’d presumably have to be the first node to relay you a block (so at least it’s not entirely trivial, as IIRC it has to build on the tip, or if it doesn’t it should!), and, indeed, I believe FIBRE took advantage of it (tho not very effectively due to lack of parallel compact block reconstruction in Bitcoin Core at the time). Now that we do have parallel compact block reconstruction, I wonder if we shouldn’t disable this while at the same time bumping the parallelization limit (dunno what it is now but probably should be 3?) and increasing the HB peer set (they’re pretty cheap, and it probably should have been higher to begin with).

davidgumberg referenced this in commit 13ca017017 on May 23, 2025

davidgumberg commented at 10:01 pm on May 23, 2025: contributor

Opened #32606 to drop unsolicited CMPCTBLOCK messages.

I wonder if we shouldn’t disable this while at the same time bumping the parallelization limit (dunno what it is now but probably should be 3?) and increasing the HB peer set (they’re pretty cheap, and it probably should have been higher to begin with).

I did not do this, but happy to close in favor of another PR that addresses this, or to add this to #32606 later, just not confident that I understand yet how to reason about the implications of this.

gmaxwell commented at 10:46 pm on May 23, 2025: contributor

I wouldn’t suggest bumping the HB limit at this time, as roundtripping to get transactions seems to be the long pole in the tent right now.

TheBlueMatt commented at 1:29 pm on May 24, 2025: contributor

The get transaction RT is the long pole mostly because Bitcoin Core won’t respond to the request until it finishes validating the block (but will send the initial compact block prior to doing so). This makes it pretty slow depending on your peer, but also means that requesting it from many peers may be advantageous (because some peers will validate the block faster). Responding to the message should be ~free for the peers.

It might also be worth pointing out that dropping unsolicited messages probably won’t accomplish all that much, though - you already have to be the first peer to provide a block in order to exploit this, and if you can do it once you can almost certainly do it a few times, at which point you will be an HB peer and can then exploit this normally. If we’re worried about this, we probably need to tweak the protocol to include full txids so that the merkle root can be checked.

Crypt-iQ commented at 1:07 pm on May 27, 2025: contributor

Dropping seemingly-unsolicited compact blocks seems fine, we already do that for the parallel portion, this should be a change just to the first one?

Yup. You wrote the code so I’m sure you know, but for viewers at home, just the first “slot” would need to be accounted for as the other two slots are guaranteed to be HB.

So I think the CB probing, in fact, is an additional information leak.

I realized over the weekend that it also leaks transactions in vExtraTxnForCompact since InitData also checks this vector as well.

I believe FIBRE took advantage of it

I’m not super familiar with FIBRE – does it have a chance of being used again? Would dropping unsolicited compact blocks prevent FIBRE from working again?

but will send the initial compact block prior to doing so

I don’t think this is true anymore. From my reading of the code, this only happens when the block has been completely validated here. I found this a bit confusing since I had originally thought after reading BIP152 that bitcoind would relay the CMPCTBLOCK immediately prior to fully validating the block.

It might also be worth pointing out that dropping unsolicited messages probably won’t accomplish all that much, though - you already have to be the first peer to provide a block in order to exploit this, and if you can do it once you can almost certainly do it a few times, at which point you will be an HB peer and can then exploit this normally.

This is a good point, you would need to be the fastest peer to exploit it in the non-blocksonly case. I do think that the hole should be plugged for -blocksonly nodes though. While it doesn’t seem to result in any serious privacy leak now for these nodes, I do agree with @gmaxwell that in principle we shouldn’t tolerate incorrect protocol flows.

instagibbs commented at 1:10 pm on May 27, 2025: member

I don’t think this is true anymore.

It’s ostensibly forwarded once PoW/merkle checks pass, but not promising utxo/script/etc checks.

davidgumberg referenced this in commit 6e1a49b66e on May 28, 2025

TheBlueMatt commented at 4:51 pm on May 30, 2025: contributor

I’m not super familiar with FIBRE – does it have a chance of being used again? Would dropping unsolicited compact blocks prevent FIBRE from working again?

I believe as of a week or two ago there’s work to revive it!

This is a good point, you would need to be the fastest peer to exploit it in the non-blocksonly case.

This was only half my point - my larger point was that we can’t really fix this issue because if you’re the fastest peer once you’re gonna be the fastest peer again. I’m not sure it’s worth trying to fix it broadly given the impact it might have (even if small) and the fact that it isn’t really a great fix.

If we care deeply about this, we’ll need to use transactions-relayed to request transactions not the mempool.

I do think that the hole should be plugged for -blocksonly nodes though.

Indeed.

Crypt-iQ commented at 5:37 pm on May 30, 2025: contributor

This was only half my point - my larger point was that we can’t really fix this issue because if you’re the fastest peer once you’re gonna be the fastest peer again. I’m not sure it’s worth trying to fix it broadly given the impact it might have (even if small) and the fact that it isn’t really a great fix.

Hmm, I agree with this. I guess the solution is a better protocol to avoid fingerprinting? Adding more complexity here sounds a bit scary.

I looked at the WIP fibre patchset here: https://github.com/w0xlt/bitcoinfibre/blob/45897a826b37eb417cba93ac17ff7bebb272893c/src/net_processing.cpp#L873 and it seems like fibre does rely on sending cmpctblock even if the peer hasn’t requested hb mode. I think this is how fibre bootstraps until it is marked as high bandwidth by the peer at which point the protocol is “normal”? In any case, my opinion was formed by reading the current code which does not do this and I don’t think we should break fibre.

EDIT: I think I’ve linked the wrong code location and the relevant code is in UDPRelayBlock.

davidgumberg commented at 9:32 pm on June 24, 2025: contributor

The relevant code is in UDPRelayBlock

UDPRelayBlock is relevant only for FIBRE to FIBRE connections, and these also have bespoke logic for receiving FIBRE block announcements. The way that FIBRE nodes announce to “default” nodes is the same as the way that “default” nodes announce to each other. Below the first snippet you linked, is this if statement:

https://github.com/bitcoin/bitcoin/blob/a45d53cab556505048c387429fd07188e4c40c3d/src/net_processing.cpp#L1659-L1660

which is essentially identical to the logic used today, except that fPreferHeaderAndIDs is now m_requested_hb_compactblocks (https://github.com/bitcoin/bitcoin/commit/3b6bfbce386f61dcbb366f08cfff55c3882f429c)

Crypt-iQ commented at 11:03 pm on June 24, 2025: contributor

The relevant code is in UDPRelayBlock

UDPRelayBlock is relevant only for FIBRE to FIBRE connections, and these also have bespoke logic for receiving FIBRE block announcements. The way that FIBRE nodes announce to “default” nodes is the same as the way that “default” nodes announce to each other. Below the first snippet you linked, is this if statement:

bitcoin/src/net_processing.cpp, lines 1659 to 1660 in a45d53c:

    if (state.fPreferHeaderAndIDs && (!fWitnessEnabled || state.fWantsCmpctWitness) && !PeerHasHeader(&state, pindex) && PeerHasHeader(&state, pindex->pprev)) {

which is essentially identical to the logic used today, except that fPreferHeaderAndIDs is now m_requested_hb_compactblocks (3b6bfbc)

Gotcha, my bad for the misunderstanding. That means my comment about FIBRE is wrong in your open PR.

compact block fingerprinting #28272