Summary

When block downloads restart after a reindex of the block files, spurious bytes are inserted into the blk.dat file. A subsequent reindex then shows a deserialization error in the log file.
Test system

CentOS 7, Bitcoin Core v0.21.0 pre-built binaries; also reproduced with versions going back to 0.10.3!
Problem Description

Performing a reindex of the block files reliably introduces spurious bytes, resulting in subsequent deserialization errors.
At the point where block downloads resume and are added to the blk.dat file, 8 extra bytes are introduced in front of the first new data block. Depending on the scenario, these 8 bytes are either all 0’s, or a valid message start string (0xf9beb4d9 “magic” sequence) followed by 4 more bytes. Normal blocks of <magic> <length> <raw block> resume after these 8 bytes.
Investigation revealed that the 4 bytes after the magic were actually the length of a previously existing block in the blk.dat file that had been overwritten.
It appears what is happening is that after reindexing, a file position is calculated for adding newly downloaded blocks, and this position is incorrectly computed to be 8 bytes beyond the proper place. If the insert position is at the end of the file, the intervening 8 bytes are zero filled. If the insert position overwrites pre-existing blocks, the first 8 bytes of the overwritten block(s) are left behind.
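The mechanism described above can be illustrated with a small self-contained sketch (plain Python file I/O, not Bitcoin Core code; all names here are my own). The idea: a "record position" points at the magic, while a "data position" points 8 bytes later at the raw block. If the next write offset is derived from a data position but a full <magic> <length> <raw block> record is written there, the record lands 8 bytes past the true end of the previous record:

```python
import io
import struct

MAGIC = bytes.fromhex("f9beb4d9")  # mainnet message start ("magic")

def write_record(buf, record_pos, block):
    """Write a full <magic><length><raw block> record at record_pos."""
    buf.seek(record_pos)
    buf.write(MAGIC + struct.pack("<I", len(block)) + block)

# Case 1: the buggy position is past end of file -- the 8-byte gap is
# zero filled, matching the trailing zeros seen in Scenario 1.
buf = io.BytesIO()
write_record(buf, 0, b"A" * 16)   # first record occupies bytes 0..23
end_of_file = 24                  # correct position to append the next record
buggy_pos = end_of_file + 8       # "data position": 8 bytes past the record start
write_record(buf, buggy_pos, b"B" * 16)
raw = buf.getvalue()
assert raw[24:32] == b"\x00" * 8  # spurious zero bytes
assert raw[32:36] == MAGIC

# Case 2: the buggy position lands on an existing record -- its first
# 8 bytes (magic + length) survive in front of the new record,
# matching the spurious magic+length seen in Scenario 2.
buf2 = io.BytesIO()
write_record(buf2, 0, b"A" * 16)  # pre-existing record at bytes 0..23
write_record(buf2, 8, b"B" * 16)  # overwrite starting at its data position
raw2 = buf2.getvalue()
assert raw2[0:8] == MAGIC + struct.pack("<I", 16)  # leftover magic + length
assert raw2[8:12] == MAGIC                         # magic of the new record
```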
It appears the problem results from an inconsistency in the operation of FindBlockPos during reindexing, when it returns file positions at the beginning of the raw block, as opposed to the end of the previous block (i.e., the beginning of the magic).
The block reading algorithm is apparently able to recover and continue, so this problem is not fatal. Inspection of the debug.log shows “LoadExternalBlockFile: Deserialize or I/O error” when it occurs. To be clear, the first reindex doesn’t show this error, but causes the problem such that a subsequent reindex will generate it.
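Because the first reindex silently plants the corruption and only a later reindex trips over it, it can be handy to check a blk*.dat file directly. Below is a rough scanner sketch of my own (not Bitcoin Core's reader) that walks the <magic> <length> <raw block> records and flags the two signatures described above: runs of zero bytes between records, and an orphaned magic+length pair whose payload was overwritten by another record.

```python
import struct

MAGIC = bytes.fromhex("f9beb4d9")  # mainnet message start

def scan_blk(path):
    """Walk <magic><length><raw block> records and report anomalies.

    Returns a list of (description, offset) tuples for spurious runs
    of zero bytes, and for a magic+length pair immediately followed by
    another magic (i.e. a record whose payload was overwritten).
    """
    with open(path, "rb") as f:
        data = f.read()
    anomalies = []
    pos = 0
    while pos + 8 <= len(data):
        if data[pos:pos + 4] == MAGIC:
            (length,) = struct.unpack_from("<I", data, pos + 4)
            if data[pos + 8:pos + 12] == MAGIC:
                # magic+length left behind by an overwritten record
                anomalies.append(("orphan magic+length", pos))
                pos += 8
            else:
                pos += 8 + length
        elif data[pos] == 0:
            start = pos
            while pos < len(data) and data[pos] == 0:
                pos += 1
            anomalies.append((f"{pos - start} zero bytes", start))
        else:
            anomalies.append(("unparseable data", pos))
            break
    return anomalies
```

A clean file yields an empty list; a file corrupted as described above yields an anomaly at the point where downloads resumed.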
Researching old issues, I suspect this problem contributed to the symptoms observed in issues #6492 and #14933. It also looks like #5864 might have been an attempt to address this or a similar problem.
Reproduction Scenarios

These scenarios are rather contrived, but are the simplest ones I could come up with to quickly and reliably demonstrate the issue.
Scenario 1

To create a deterministic scenario, this example uses what I’ll refer to as “canonical” block files. That is, block files where the chain blocks are stored in-order up to the file size limit of 128 MiB. Thus, the canonical blk00000.dat file has the following properties:
- size: 134214363 bytes
- blocks: 119965 (up to height 119964)
- SHA-256: fe4b764e0ce523cfce6d9e4f326e0d0977ee317a51b4a129cda707cfa4699dd7
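To confirm you are starting from the same canonical file, the digest can be checked with a few lines of Python (`sha256sum blk00000.dat` works just as well):

```python
import hashlib

def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected for the canonical blk00000.dat described above:
# fe4b764e0ce523cfce6d9e4f326e0d0977ee317a51b4a129cda707cfa4699dd7
```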
- Start with a “canonical” blk00000.dat as the only file in a new data dir
- Start up bitcoin-qt with -reindex.
- Once the reindex completes and block downloads resume over the network, terminate the process.
In the debug.log, you should see:

```
2021-03-06T21:41:52Z Leaving block file 0: CBlockFileInfo(blocks=119965, size=134214371, heights=0…119964, time=2009-01-03…2011-04-24)
```
Note that the size has grown by 8 bytes. Inspection of a hexdump shows 8 extra zero bytes at the end of the file.
Before reindex:

```
07fff2c0 14 9f ce 4e af a6 17 17 d9 24 b1 d0 9b 30 28 3d |...N.....$...0(=|
07fff2d0 ee a3 f1 e3 fd 88 ac 00 00 00 00 |...........|
```
After reindex:

```
07fff2c0 14 9f ce 4e af a6 17 17 d9 24 b1 d0 9b 30 28 3d |...N.....$...0(=|
07fff2d0 ee a3 f1 e3 fd 88 ac 00 00 00 00 00 00 00 00 00 |................|
07fff2e0 00 00 00 |...|
```
In this scenario, blk00000.dat is “full”, so the new block will go in blk00001.dat. But, in the process of testing blk00000.dat to see if the new block can be appended, it appears an incorrect file pointer offset winds up adding 8 bytes to the file.
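If you keep a copy of the file from before the reindex, the change point is easy to locate programmatically. A throwaway helper of my own (not part of Bitcoin Core):

```python
def first_difference(path_a, path_b):
    """Compare two files; return (offset, size_delta).

    offset is the first differing byte position, or the shorter length
    if one file is a prefix of the other, or None if the files are
    identical. size_delta is len(b) - len(a) in bytes.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    for i in range(min(len(a), len(b))):
        if a[i] != b[i]:
            return i, len(b) - len(a)
    offset = min(len(a), len(b)) if len(a) != len(b) else None
    return offset, len(b) - len(a)
```

In Scenario 1 this reports an offset equal to the old file size and a delta of 8; in Scenario 2 the offset falls inside blk00001.dat where the spurious magic+length appears.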
Scenario 2 part 1

The results of this scenario are more variable due to the random ordering of blocks resulting from parallel block downloads.
- Start with a blk00000.dat and blk00001.dat containing out-of-order blocks from a parallel block download.
- Perform a reindex on these block files.
- Wait past header synchronization until new block downloads resume over the network (blk00002.dat is created), then shut the process down.
What you should discover upon inspection of blk00001.dat is that the file has been changed, and at the point where the content changes, the spurious magic and length bytes will appear. This will be somewhere near but not at the end of the file. (It appears that the insert point is after the last block that could be verified to be part of the main chain; because of parallel block downloads, some remaining blocks that couldn’t be attached are overwritten.) An example from my run:
Before reindex:

```
07f788d0 88 ac 00 00 00 00 f9 be b4 d9 ad bf 00 00 01 00 |................|
07f788e0 00 00 ea 8d fd 5c 0e 23 08 c8 b3 a6 73 61 5d fa |.....\.#....sa].|
```
Note the magic at offset 0x07f788d6, a block length of 0x0000bfad, and the 0x01 version byte beginning the raw block.
After reindex:

```
07f788d0 88 ac 00 00 00 00 f9 be b4 d9 ad bf 00 00 f9 be |................|
07f788e0 b4 d9 61 bf 00 00 01 00 00 00 c3 70 ca 79 eb ed |..a........p.y..|
```
Note the magic at offset 0x07f788d6 and the length 0x0000bfad as before, but now followed by another magic and a length of 0x0000bf61 before reaching the 0x01 version byte.
Scenario 2 part 2
- Save a copy of the block files resulting from the end of the previous scenario.
- Start another reindex using those files. Again, wait until new block downloads resume, then shut down.
In the debug.log, you should find the following where the corrupted block file was processed:

```
2021-03-06T22:19:55Z LoadExternalBlockFile: Deserialize or I/O error - ReadCompactSize(): size too large: iostream error
```
Depending on exactly how things go off the rails, the error could appear as:

```
2021-03-06T23:07:48Z LoadExternalBlockFile: Deserialize or I/O error - Read attempted past buffer limit: iostream error
```
If you inspect your last block file from the first reindex and compare it to its state after the second reindex, you should find a second instance of the spurious magic+length bytes.
If you perform another reindex with the block files as they stand now, you’ll get two deserialize errors in the debug log. Each additional reindex will add another instance.
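One way to track the accumulation (another throwaway helper of my own, not part of Bitcoin Core) is to count the error lines in debug.log after each reindex:

```python
def count_deserialize_errors(debug_log_path):
    """Count 'LoadExternalBlockFile: Deserialize or I/O error' lines."""
    needle = "LoadExternalBlockFile: Deserialize or I/O error"
    with open(debug_log_path, "r", errors="replace") as f:
        return sum(needle in line for line in f)
```

In Scenario 2, the reindex in part 2 should report 1; the next reindex, 2; and so on, one more per reindex.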