possible corruption: missing undo file

Crypt-iQ commented at 11:36 am on July 4, 2022: contributor

An undo file is written to disk in WriteUndoDataForBlock. If the related block file has nFile = 2, then the undo file will also have nFile = 2. It can be flushed to disk in 4 ways:

1. FlushStateToDisk calls FlushBlockFile(), which flushes the last block and undo files to disk.
2. When the current block file is full and fKnown=false, FindBlockPos flushes the last block file. It also flushes the corresponding undo file if the last height in the file is equal to the current tip height.
3. It seems possible for the last block and undo files to be flushed if fKnown=true and nFile < last block file, but that’s irrelevant here.
4. In WriteUndoDataForBlock, the undo file is flushed if it is not the last file and the height whose undo data is being written is the last height in the file.
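
The WriteUndoDataForBlock condition is the one that matters for the rest of this report, so here is a minimal C++ sketch of it. It is a paraphrase of the description above, not the actual Bitcoin Core source; the function name UndoFlushedOnWrite and its parameters are hypothetical.

```cpp
#include <cassert>

// Hypothetical paraphrase of the flush condition described for
// WriteUndoDataForBlock; not the actual Bitcoin Core source.
//   undo_file        : number of the undo file being written (same as its block file)
//   last_file        : number of the last (current) block file
//   height           : height of the block whose undo data is being written
//   file_last_height : nHeightLast of the corresponding block file
bool UndoFlushedOnWrite(int undo_file, int last_file, int height, int file_last_height)
{
    // An undo file shares its number with a block file, so it cannot be newer than the last one.
    assert(undo_file <= last_file);
    return undo_file != last_file && height == file_last_height;
}
```

With last file 3 and a block file 2 whose nHeightLast is 8, UndoFlushedOnWrite(2, 3, 5, 8) is false: writing undo data for height 5 does not flush undo file 2, and only the write for height 8 would.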

The UndoWriteToDisk function opens a CAutoFile and writes the data. When it returns, the CAutoFile destructor is called, which calls fclose, which flushes (via fflush) the buffered data to the OS. At this point the data is in dirty pages: fclose does not fsync, so nothing guarantees it has reached the disk yet.
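
The difference between "flushed to the OS" and "flushed to disk" can be shown with plain stdio and POSIX calls. This is a generic sketch, not Bitcoin Core code: WriteFile and its durable parameter are hypothetical, and it assumes a POSIX system where fileno and fsync are available.

```cpp
#include <cassert>
#include <stdio.h>   // fopen, fwrite, fflush, fclose, fileno
#include <string>
#include <unistd.h>  // fsync (POSIX)

// Hypothetical helper: writes `data` to `path` and returns true only if every step succeeded.
// durable=false: the data is only handed to the OS (dirty pages in the page cache).
// durable=true : fsync() additionally asks the kernel to commit the file to the storage device.
bool WriteFile(const std::string& path, const std::string& data, bool durable)
{
    assert(!path.empty());
    FILE* f = fopen(path.c_str(), "wb");
    if (!f) return false;
    bool ok = fwrite(data.data(), 1, data.size(), f) == data.size();
    // fflush moves the stdio buffer into the kernel page cache.
    ok = ok && fflush(f) == 0;
    // fsync blocks until the kernel reports the file's dirty pages as written out.
    if (ok && durable) ok = fsync(fileno(f)) == 0;
    // fclose releases the handle; on its own it implies fflush but never fsync.
    return (fclose(f) == 0) && ok;
}
```

A call with durable=false corresponds to what happens when the CAutoFile is simply closed; only the durable=true path survives a power loss that occurs before the kernel writes back its dirty pages.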

A simplified version of what can happen:

Time 1:

tip=3
blockfile 2 has blocks: [5, 7, 8], so nHeightLast=8
blockfile 2 is unflushed
undo 2 does not exist since tip=3
blocks can end up stored this way if the headers are received first and the blocks then arrive out of order (OOO)

Time 2:

tip=3
block 4 arrives, but cannot fit in blockfile 2
- FindBlockPos is called, blockfile 2 is flushed. Note that nHeightLast=8 for this file and tip=3 since the tip is updated later. Therefore undo file 2 isn’t flushed here (it also doesn’t exist).
blockfile 3 has blocks: [4]
blockfile 3 is unflushed
ActivateBestChain will update to tip=4
WriteUndoDataForBlock is called for 4
- undo file 3 is created for block 4 (in dirty pages)
- the BLOCK_HAVE_UNDO status flag is set for the related CBlockIndex* for block 4
ActivateBestChain will update to tip=5
WriteUndoDataForBlock is called for 5
- undo file 2 is created for block 5 (in dirty pages)
- the BLOCK_HAVE_UNDO status flag is set for CBlockIndex* 5

Time 3:

tip=5
undo files 2 and 3 are still in dirty pages
FlushStateToDisk is called:
- FlushBlockFile is called, which flushes the last block and undo files (file 3) to disk
- WriteBlockIndexDB is called and flushes the block index state to disk.

Note: At this point, the BLOCK_HAVE_UNDO status flag for block 5 has been persisted. Undo file 2 may still be in dirty pages, and a power loss would mean that block 5 doesn’t actually have rev (undo) data on disk. The log would look like this on restart:

2022-07-04T04:15:07.917838Z [init] [validation.cpp:3931] [VerifyDB] Verifying last 4 blocks at level 3
2022-07-04T04:15:07.917847Z [init] [validation.cpp:3938] [VerifyDB] [0%]...ERROR: UndoReadFromDisk: Deserialize or I/O error - CAutoFile::read: end of file: unspecified iostream_category error
2022-07-04T04:15:07.918012Z [init] [util/system.h:50] [error] ERROR: VerifyDB(): *** found bad undo data at 4, hash=221d40281e7719cd139503506649647ca50cc8bf9babaa1d1f91b9f80c4f48dd
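
The Time 1 to Time 3 sequence can be replayed in a small toy model. This is only a sketch with hypothetical names (ToyStorage, ReplayTimeline); it tracks which undo file numbers and status flags are durable, not real file contents, and it assumes the flush step syncs only the last undo file, as described above.

```cpp
#include <cassert>
#include <map>
#include <set>

// Toy model, not Bitcoin Core code: every name below is hypothetical.
struct ToyStorage {
    int last_file = 0;                  // number of the last block/undo file
    std::set<int> undo_in_page_cache;   // undo files written but not yet synced
    std::set<int> undo_on_disk;         // undo files known to be durable
    std::map<int, int> undo_file_of;    // block height -> undo file number
    std::set<int> have_undo_in_memory;  // heights flagged BLOCK_HAVE_UNDO in memory
    std::set<int> have_undo_on_disk;    // heights whose flag has been persisted

    // Models a WriteUndoDataForBlock call that does not hit its conditional flush.
    void WriteUndo(int height, int file) {
        assert(file <= last_file);
        undo_in_page_cache.insert(file);
        undo_file_of[height] = file;
        have_undo_in_memory.insert(height);
    }

    // Models the flush described above: only the last undo file is synced,
    // but the status flags of all blocks are persisted.
    void FlushState() {
        if (undo_in_page_cache.erase(last_file)) undo_on_disk.insert(last_file);
        have_undo_on_disk = have_undo_in_memory;
    }

    // A power loss discards everything that only lived in the page cache.
    void PowerLoss() { undo_in_page_cache.clear(); }

    // Heights whose persisted flag points at undo data that never reached disk.
    std::set<int> CorruptHeights() const {
        std::set<int> bad;
        for (int h : have_undo_on_disk) {
            if (!undo_on_disk.count(undo_file_of.at(h))) bad.insert(h);
        }
        return bad;
    }
};

// Replays Time 2 and Time 3, followed by a power loss.
std::set<int> ReplayTimeline() {
    ToyStorage s;
    s.last_file = 3;
    s.WriteUndo(4, 3);  // block 4 lives in file 3
    s.WriteUndo(5, 2);  // block 5 lives in file 2
    s.FlushState();
    s.PowerLoss();
    return s.CorruptHeights();
}
```

In this model ReplayTimeline() reports height 5 as the only block whose persisted flag has no undo data behind it, which is the inconsistency described in the note.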

To aid in testing, I set the Linux vm.dirty_writeback_centisecs tunable to a high value so that the kernel would wait a while before flushing dirty pages. This made sure that files with fsync called on them would get flushed well before the non-fsync’d dirty pages.

If I’m right, then it is corruption but a lot of things have to go wrong for it to happen.

MarcoFalke added the label Block storage on Jul 4, 2022

ryanofsky commented at 3:04 pm on July 5, 2022: member

This description is a little complicated to understand. Could you maybe summarize the issue and suggest possible ways it could be fixed?

IIUC, it seems like the problem is that FlushStateToDisk only flushes the last undo file, even though earlier undo files may need to be flushed as well. Both block files and undo files are numbered based on the order blocks are downloaded, but block files are written in download order while undo files are written in validation order, and the two orders are not the same. So it is sufficient for FlushStateToDisk to flush only the highest-numbered block file, but not sufficient for it to flush only the highest-numbered undo file, since lower-numbered undo files may have been written to as blocks were validated.

Crypt-iQ commented at 3:32 pm on July 5, 2022: contributor

That’s a good summary.

If a block is being connected and the undo file’s number is not m_last_blockfile, WriteUndoDataForBlock may not flush the undo file if the block in question isn’t the last height in the corresponding block file. FlushStateToDisk won’t flush this undo file either, but will flush the BLOCK_HAVE_UNDO status flag.

I think the fix would be for FlushStateToDisk to flush any undo files that were written to that resulted in the BLOCK_HAVE_UNDO status flag being set.
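
One way to picture that fix is a set of "dirty" undo file numbers that the flush path drains before the block index is written. The sketch below is only an illustration under that assumption, not a proposed patch: DirtyUndoTracker, MarkWritten, FlushAll and FlushedFilesInScenario are hypothetical names, and the commit callback stands in for whatever fsync wrapper the real code would use.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Hypothetical helper, not part of Bitcoin Core: remembers which undo files
// have been written to since they were last committed to disk.
class DirtyUndoTracker {
public:
    // Call after undo data has been written to undo file number `file`.
    void MarkWritten(int file) { m_dirty.insert(file); }

    // Call from the flush path before the block index is persisted.
    // `commit` must sync the given undo file and return true on success.
    // Files that fail to commit stay dirty so that a later flush retries them.
    bool FlushAll(const std::function<bool(int)>& commit) {
        bool all_ok = true;
        for (auto it = m_dirty.begin(); it != m_dirty.end();) {
            if (commit(*it)) {
                it = m_dirty.erase(it);
            } else {
                all_ok = false;
                ++it;
            }
        }
        return all_ok;
    }

    bool HasDirty() const { return !m_dirty.empty(); }

private:
    std::set<int> m_dirty;
};

// Replays the scenario from this issue: undo files 3 and 2 are written, then flushed.
std::vector<int> FlushedFilesInScenario() {
    DirtyUndoTracker tracker;
    tracker.MarkWritten(3);
    tracker.MarkWritten(2);
    std::vector<int> flushed;
    bool ok = tracker.FlushAll([&](int file) { flushed.push_back(file); return true; });
    assert(ok && !tracker.HasDirty());
    return flushed;
}
```

The block index would only be written if FlushAll returns true, so a BLOCK_HAVE_UNDO flag is never persisted ahead of the undo data it refers to. In the scenario both undo files 2 and 3 are committed, not just the last one.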

Note that since this lower-numbered undo file is given to the OS, it should eventually be flushed to disk in the happy path. So the issue would arise when FlushStateToDisk completes followed by a power loss/disruptive event while the lower-numbered undo file hasn’t been flushed by the OS.

possible corruption: missing undo file #25539