`FillBlock` would make the header empty via `SetNull`, and the call right before the second `FillBlock` would crash here since `LookupBlockIndex` won't find anything. Fix that by checking for an empty header before the `Assume`.
| Type | Reviewers |
|---|---|
| ACK | instagibbs, mzumsande, fjahr, achow101 |
```diff
@@ -3329,6 +3329,11 @@ void PeerManagerImpl::ProcessCompactBlockTxns(CNode& pfrom, Peer& peer, const Bl

     PartiallyDownloadedBlock& partialBlock = *range_flight.first->second.second->partialBlock;

+    if (partialBlock.header.IsNull()) {
```
> the `partialBlock` (i.e. downgrade the request to a regular block request) when `FillBlock` fails. Then it would no longer be possible to process two `blocktxn` messages for the same block.
@instagibbs suggested this as well offline. I looked into it and the main issue is that it allows a peer to stall block download. I think there is probably a solution here where the malicious peer couldn't keep doing this, but it might require some additional state? I think it would be better to not keep the `partialBlock` around.
> allows a peer to stall block download

I guess this would only be a serious problem in the case where all of our other peers don't support compact block (because otherwise parallel compact blocks would allow us to download the block). In any case, this was non-obvious to me. Maybe add a comment in the spot where we fall back to `GETDATA` explaining that we don't wipe the `partialBlock` there on purpose?
> where all of our other peers don't support compact block

This, or a malicious peer somehow controls all three slots (harder since one slot is outbound), or we are in `-blocksonly` mode.
> In any case, this was non-obvious to me. Maybe add a comment in the spot where we fall back to `GETDATA` explaining that we don't wipe the `partialBlock` there on purpose?
Good idea, I will add a comment. I am not sure if the existing logic is on purpose, but the way it is written in master makes it impossible to stall block download indefinitely, since the `first_in_flight` peer needs to make forward progress within 10 minutes of the `getdata` fallback or get disconnected.
> `reset()` on `partialBlock` to wipe it

I was assuming that wiping would mean calling `RemoveBlockRequest` (resets `m_downloading_since`, which allows block download stalling) followed by `BlockRequested`. I will have to think more on this because I am not sure if it introduces a footgun.
Concept ACK, as per offline conversations. It would be really nice to make the state machine better; for now we can at least avoid the `Assume` in debug builds when a peer does this. We could also disconnect the peer for this behavior, I think, but that's not a priority.
```diff
@@ -3329,6 +3329,14 @@ void PeerManagerImpl::ProcessCompactBlockTxns(CNode& pfrom, Peer& peer, const Bl

     PartiallyDownloadedBlock& partialBlock = *range_flight.first->second.second->partialBlock;

+    if (partialBlock.header.IsNull()) {
+        // It is possible for the header to be empty if a previous call to FillBlock wiped the header, but left
+        // the PartiallyDownloadedBlock pointer around (i.e. did not call RemoveBlockRequest). In this case, we
+        // should not call LookupBlockIndex below.
+        LogDebug(BCLog::NET, "Peer %d sent compact block transactions multiple times\n", pfrom.GetId());
+        return;
```
These `blocktxn` messages will now be ignored (whereas they would result in `READ_STATUS_INVALID` / disconnection before). Shouldn't we rather keep on disconnecting the peer?
```diff
@@ -3329,6 +3329,16 @@ void PeerManagerImpl::ProcessCompactBlockTxns(CNode& pfrom, Peer& peer, const Bl

     PartiallyDownloadedBlock& partialBlock = *range_flight.first->second.second->partialBlock;

+    if (partialBlock.header.IsNull()) {
+        // It is possible for the header to be empty if a previous call to FillBlock wiped the header, but left
+        // the PartiallyDownloadedBlock pointer around (i.e. did not call RemoveBlockRequest). In this case, we
+        // should not call LookupBlockIndex below.
+        RemoveBlockRequest(block_transactions.blockhash, pfrom.GetId());
```
Yup, like @mzumsande pointed out, this shouldn't be a change in behavior since they would previously hit the `READ_STATUS_INVALID` case.
I didn't consider noban/manual connections for stalling block download, as it seems like this is already possible by triggering either of the `READ_STATUS_INVALID` cases (whether here or here) and then having `SendMessages` request it again here. I think it shouldn't be allowed for a noban/manual connection to do this, but since this is pre-existing I didn't really consider it.
```diff
@@ -3340,6 +3350,11 @@ void PeerManagerImpl::ProcessCompactBlockTxns(CNode& pfrom, Peer& peer, const Bl
     } else if (status == READ_STATUS_FAILED) {
         if (first_in_flight) {
             // Might have collided, fall back to getdata now :(
+            // If we wipe the partialBlock here via RemoveBlockRequest followed by BlockRequested instead of leaving
```
How about something like:

```cpp
// We keep the failed partialBlock to disallow processing another compact block announcement from the same
// peer for the same block. We let the full block download below continue under the same m_downloading_since timer.
```
```diff
@@ -3329,6 +3329,16 @@ void PeerManagerImpl::ProcessCompactBlockTxns(CNode& pfrom, Peer& peer, const Bl

     PartiallyDownloadedBlock& partialBlock = *range_flight.first->second.second->partialBlock;

+    if (partialBlock.header.IsNull()) {
+        // It is possible for the header to be empty if a previous call to FillBlock wiped the header, but left
+        // the PartiallyDownloadedBlock pointer around (i.e. did not call RemoveBlockRequest). In this case, we
+        // should not call LookupBlockIndex below.
+        RemoveBlockRequest(block_transactions.blockhash, pfrom.GetId());
+        Misbehaving(peer, "invalid compact block/non-matching block transactions");
+        LogDebug(BCLog::NET, "Peer %d sent compact block transactions multiple times\n", pfrom.GetId());
```
Concept ACK
It's great that there is now documentation for this, but it's still not ideal that some magic state has this special meaning that is very implicit. I would have liked it more if it was explicit, but I haven't spent enough time on this part of the code base to know what would be ideal, and I don't want this to block v30 more than necessary. Maybe a simple bool flag `failed_reconstruction` on `PartiallyDownloadedBlock` would work? Feel free to ignore.
In debug builds, this would previously cause an Assume crash if
FillBlock had already been called. This could happen when multiple
blocktxn messages were received.
Co-Authored-By: Greg Sanders <gsanders87@gmail.com>
Add test_multiple_blocktxn_response that checks that the peer is
disconnected.
> Maybe a simple bool flag failed_reconstruction on PartiallyDownloadedBlock would work? Feel free to ignore.

This would work, but I think we should wait until there is an in-repo fuzz harness with good coverage before trying to refactor the net_processing state or the `PartiallyDownloadedBlock` state. I think a bool `attempted_reconstruction` could replace the calls to `SetNull()` / `IsNull()`. An ideal solution would be to get rid of the unusable `PartiallyDownloadedBlock` while making sure a malicious peer that is `first_in_flight` can't stall block download by resetting the `m_downloading_since` timer described here (there is some nuance here related to #22144 that I need to remember; the second and third peers can safely and freely reset their timer in this call to `RemoveBlockRequest`).
tACK 8b6264768030db1840041abeeaeefd6c227a2644
Reproduced the issue on debug build master using the test and ensured the changes here make the test pass.