After reading #1271 and #1382, which don’t fix the core problem in my opinion, here is an own proposal.
The problem is that the block download mechanism may be active multiple times, in particular when during a normal initial block download a new block is announced by a different peer. We don’t keep track of which block has already been downloaded from which peer, so a new (dulicate) getblocks/inv/getdata/block/getblocks cycle may be started for the newly announced block.
This is what I would do:
- There is one designated “sync node”, from which the synchronization is supposed to happen.
- getblocks messages are never sent to any but the sync node
- A map is kept that remembers which getdata is sent to which node
- block invs (for unknown blocks) coming from any node cause a getdata to that node, unless a getdata was already sent for that block.
- When a node is disconnected, its outstanding entries are removed from the getdata map, and if it was the sync node, a new sync node is chosen.
- When a (potentially valid) block arrives that has the highest timestamp ever seen, its sender becomes the sync node.
This can later be extended to include a property that measures performance when sending blocks, and chooses a new sync node if deemed to low.
I think this is about the best we can do without heavily reorganising the code, and if we do that, headers-first should get priority, as it allows for much better block download management (it means the best chain is known in advance, blocks can be requested from multiple peers, …)
Any comments/suggestions?