p2p: Inefficiency in block download / stalling algorithm #32179

issue mzumsande openend this issue on March 31, 2025

mzumsande commented at 8:17 pm on March 31, 2025: contributor

When we were in a stalling situation during IBD, we may kick a peer and download the block needed to advance our tip from another peer. When we then receive the block, that will often result in many (hundreds) of blocks being connected in one go, resolving the stalling situation: A 1024-blocks window beginning at the new tip now has several open slots.

There is a bug in what happens next, which results in a less efficient IBD and possibly an unnecessary disconnection of a peer:

In the next FindNextBlocksToDownload call for each peer, pindexLastCommonBlock (the last full block both we and that peer have) will not be updated immediately. Instead, we calculate the window of blocks we might request based on the old pindexLastCommonBlock, call FindNextBlocks with that - pindexLastCommonBlock will only then be moved forward within that call.

This is bad for two reasons: 1.) We waste one FindNextBlocksToDownload call iterating over blocks we won’t need to download anyway (just improving pindexLastCommonBlock along the way) 2.) If we, as a consequence, don’t request any block from that peer, we will start a new stalling procedure and mark a peer as a possible staller, possibly disconnecting it after a short timeout. This is incorrect, because the stalling situation was already resolved, and can result in unnecessary disconnects.

I think it would be better to adjust pindexLastCommonBlock based on the new tip in FindNextBlocksToDownload (because we certainly don’t need to download any blocks below that in case of a linear chain with no forks), and calculate nWindowEnd based on that, so that the following FindNextBlocks will download blocks based on a 1024-blocks window based on the new, current tip instead of the old one.

I opened #32180 to fix this.
glozow added the label P2P on Apr 2, 2025

Contributors
mzumsande

Labels
P2P