When we were in a stalling situation during IBD, we may kick a peer and download the block needed to advance our tip from another peer. When we then receive the block, that will often result in many (hundreds) of blocks being connected in one go, resolving the stalling situation: A 1024-blocks window beginning at the new tip now has several open slots.
There is a bug in what happens next, which results in a less efficient IBD and possibly an unnecessary disconnection of a peer:
In the next FindNextBlocksToDownload
call for each peer, pindexLastCommonBlock
(the last full block both we and that peer have) will not be updated immediately. Instead, we calculate the window of blocks we might request based on the old pindexLastCommonBlock
, call FindNextBlocks
with that - pindexLastCommonBlock
will only then be moved forward within that call.
This is bad for two reasons:
1.) We waste one FindNextBlocksToDownload
call iterating over blocks we won’t need to download anyway (just improving pindexLastCommonBlock
along the way)
2.) If we, as a consequence, don’t request any block from that peer, we will start a new stalling procedure and mark a peer as a possible staller, possibly disconnecting it after a short timeout. This is incorrect, because the stalling situation was already resolved, and can result in unnecessary disconnects.
I think it would be better to adjust pindexLastCommonBlock
based on the new tip in FindNextBlocksToDownload
(because we certainly don’t need to download any blocks below that in case of a linear chain with no forks), and calculate nWindowEnd
based on that, so that the following FindNextBlocks
will download blocks based on a 1024-blocks window based on the new, current tip instead of the old one.
I opened #32180 to fix this.