I could really use some more eyes on this. I discussed it briefly with @sipa on IRC a few days ago.
I noticed this while looking into #5668. This is one possible explanation I've come up with for the overlapping block data. Whether it has anything to do with that issue or not, I think this still needs to be addressed.
When re-indexing, there are a few cases where garbage data may be skipped in the block files. In these cases, the indices are correctly written to the index db; however, the pointer to the next write position in the current block file is calculated by summing the sizes of the valid blocks found.
As a result, when the re-index is finished, the index db is correct for all existing blocks, but the next block will be written to an incorrect offset, likely overwriting existing blocks.
Rather than using the sum of the valid blocks' sizes to determine the next write position, use the end of the last block written to the file. Don't assume that the current block is the last one in the file, since blocks may be read out of order.
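To illustrate the difference, here's a simplified, self-contained sketch of the two accounting strategies. The names (`BlockFileInfo`, `UpdateSize*`) are made up for illustration and this isn't the literal diff, but it shows why summing block sizes goes wrong once garbage is skipped:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the per-file bookkeeping: nSize is the
// offset where the next new block will be written.
struct BlockFileInfo {
    uint64_t nSize = 0;
};

// Current behavior: accumulate the sizes of the valid blocks found.
// Any garbage bytes skipped between blocks are not counted, so nSize
// falls short of the real end of the file.
void UpdateSizeBuggy(BlockFileInfo& file, uint64_t nBlockSize)
{
    file.nSize += nBlockSize;
}

// Proposed behavior: track the end offset of each block read and keep
// the maximum. Taking the max (rather than the end of whichever block
// was read most recently) matters because blocks may be read out of
// order during reindex.
void UpdateSizeFixed(BlockFileInfo& file, uint64_t nBlockPos, uint64_t nBlockSize)
{
    file.nSize = std::max(file.nSize, nBlockPos + nBlockSize);
}

int main()
{
    BlockFileInfo buggy, fixed;
    // Block A: 100 bytes at offset 0. Then 16 bytes of garbage.
    // Block B: 50 bytes at offset 116.
    UpdateSizeBuggy(buggy, 100);
    UpdateSizeBuggy(buggy, 50);
    UpdateSizeFixed(fixed, 0, 100);
    UpdateSizeFixed(fixed, 116, 50);
    // Buggy: 150, which lands *inside* block B, so the next new block
    // would overwrite it. Fixed: 166, the true end of the file.
    std::printf("buggy: %llu, fixed: %llu\n",
                (unsigned long long)buggy.nSize,
                (unsigned long long)fixed.nSize);
    return 0;
}
```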
I was able to trigger this problem by inserting some garbage data between two valid blocks on disk in the last .dat file, then reindexing. After that, I ran normally for a few minutes in order to write a few new blocks to disk, then ran with -checkblocks=0.
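For reference, this is roughly how I spliced the garbage in. The sketch below is a hypothetical standalone helper, not code from the repo; the offset has to be chosen by hand so it lands exactly on the boundary between two blocks in the blk*.dat file:

```cpp
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <vector>

// Usage: corrupt <blkNNNNN.dat> <offset> <garbage-bytes>
// Inserts <garbage-bytes> filler bytes at <offset>, shifting the rest
// of the file, so the blocks after the offset stay intact but move.
int main(int argc, char** argv)
{
    if (argc != 4) {
        std::fprintf(stderr, "usage: %s <blk.dat> <offset> <garbage-bytes>\n", argv[0]);
        return 1;
    }
    const char* path = argv[1];
    const std::size_t offset = std::strtoull(argv[2], nullptr, 10);
    const std::size_t nGarbage = std::strtoull(argv[3], nullptr, 10);

    // Read the whole block file into memory.
    std::ifstream in(path, std::ios::binary);
    std::vector<char> data((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    in.close();

    // Splice the garbage in at the chosen block boundary.
    std::vector<char> garbage(nGarbage, '\xAA');
    data.insert(data.begin() + static_cast<std::ptrdiff_t>(offset),
                garbage.begin(), garbage.end());

    // Write the corrupted file back out in place.
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    return 0;
}
```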
Before this change, I would get different errors (deserialization, EOF, etc.) depending on what garbage I added and where. After the change, it appears to survive the re-index without issue regardless of the garbage.