An undo file is written to disk in WriteUndoDataForBlock. Undo files share their number with the related block file: if the block file has nFile = 2, the undo file also has nFile = 2. An undo file can be flushed to disk in 4 ways:
- FlushStateToDisk calls FlushBlockFile(), which flushes the last block and undo files to disk.
- When the current block file is full and fKnown=false, FindBlockPos will flush the last block file. It will also flush the corresponding undo file if the last height in that file is equal to the current tip height.
- It seems that it is possible for the last block and undo files to be flushed if fKnown=true and nFile < last block file, but that’s irrelevant here.
- In WriteUndoDataForBlock, the undo file is flushed if it is not the last file and the height whose undo data is being written is the last height in the file.
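The two undo-flush conditions that matter here (FindBlockPos and WriteUndoDataForBlock) can be written as small predicates. This is a hedged sketch: FileInfo and both function names are hypothetical simplifications, not Bitcoin Core's actual types or API, and only model the conditions as described in the list above.

```cpp
#include <cassert>

// Hypothetical, simplified view of one blk/rev file pair. Not Bitcoin Core's
// real types; only the fields needed for the flush conditions.
struct FileInfo {
    int nFile;        // file number shared by the block file and its undo file
    int nHeightLast;  // highest block height stored in the block file
};

// FindBlockPos (block file full): the undo file for the old block file is
// flushed only if that file's highest block is the current tip.
bool FindBlockPosFlushesUndo(const FileInfo& oldFile, int tipHeight) {
    return oldFile.nHeightLast == tipHeight;
}

// WriteUndoDataForBlock: an undo file that is not the last file is flushed
// only when writing undo data for the file's highest block.
bool WriteUndoFlushesUndo(const FileInfo& undoFile, int lastBlockFile,
                          int undoHeight) {
    return undoFile.nFile < lastBlockFile && undoHeight == undoFile.nHeightLast;
}
```

Under these conditions, an undo file 2 with nHeightLast=8 is not flushed by FindBlockPos at tip=3, nor when undo data for block 5 is written while block file 3 is the last file; it would only be flushed once block 8's undo data is written.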
The UndoWriteToDisk function opens a CAutoFile and writes the data. When it returns, the CAutoFile destructor calls fclose, which flushes (via fflush) the buffered data to the OS. At this point the data is only in the kernel’s dirty pages, not yet on disk.
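To make the distinction concrete: closing a file, as the CAutoFile destructor does, only moves data as far as the page cache. Below is a minimal POSIX sketch of what a durable write needs in addition; the helper name WriteDurably is hypothetical and is not Bitcoin Core's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <unistd.h>  // fsync() is POSIX, not standard C++

// fflush()/fclose() only hand stdio's user-space buffer to the kernel, where
// it sits in dirty pages; fsync() asks the kernel to write those pages to the
// storage device.
bool WriteDurably(const char* path, const void* data, std::size_t len) {
    std::FILE* f = std::fopen(path, "wb");
    if (f == nullptr) return false;
    bool ok = std::fwrite(data, 1, len, f) == len;
    ok = ok && std::fflush(f) == 0;    // user-space buffer -> page cache
    ok = ok && fsync(fileno(f)) == 0;  // page cache -> storage device
    // fclose() runs even on failure so the FILE* is not leaked.
    ok = (std::fclose(f) == 0) && ok;
    return ok;
}
```

Dropping the fsync() line leaves exactly the situation described above: the write succeeds and the file can be read back, but a power loss before the kernel's own writeback can still lose the data.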
A simplified version of what can happen:
Time 1:
- tip=3
- blockfile 2 has blocks: [5, 7, 8], so nHeightLast=8
- blockfile 2 is unflushed
- undo 2 does not exist since tip=3
- storing the blocks this way is possible if the headers are received first and the blocks then arrive out of order (OOO)
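The reason nHeightLast=8 while tip=3 is that a block file only tracks the minimum and maximum heights it contains. The struct below is a hypothetical simplified stand-in (not the real CBlockFileInfo) that assumes min/max bookkeeping of this kind.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-block-file bookkeeping. The height range only records the
// min and max heights stored, not which heights in between are connected.
struct BlockFileStats {
    unsigned int nBlocks = 0;
    unsigned int nHeightFirst = 0;
    unsigned int nHeightLast = 0;

    void AddBlock(unsigned int height) {
        if (nBlocks == 0 || height < nHeightFirst) nHeightFirst = height;
        if (height > nHeightLast) nHeightLast = height;
        ++nBlocks;
    }
};

// Build the stats for blocks stored in arrival order.
BlockFileStats StatsFor(const std::vector<unsigned int>& heights) {
    BlockFileStats s;
    for (unsigned int h : heights) s.AddBlock(h);
    return s;
}
```

StatsFor({5, 7, 8}) yields nHeightLast=8 even though none of those blocks is connected yet.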
Time 2:
- tip=3
- block 4 arrives, but cannot fit in blockfile 2
- FindBlockPos is called and blockfile 2 is flushed. Note that nHeightLast=8 for this file and tip=3, since the tip is updated later. Therefore undo file 2 isn’t flushed here (it also doesn’t exist yet).
 
- blockfile 3 has blocks: [4]
- blockfile 3 is unflushed
- ActivateBestChain will update to tip=4
- WriteUndoDataForBlock is called for block 4: undo file 3 is created for block 4 (in dirty pages)
- the BLOCK_HAVE_UNDO status flag is set for the related CBlockIndex* for block 4
 
- ActivateBestChain will update to tip=5
- WriteUndoDataForBlock is called for block 5: undo file 2 is created for block 5 (in dirty pages)
- the BLOCK_HAVE_UNDO status flag is set for CBlockIndex* 5
 
Time 3:
- tip=5
- undo file 2, 3 are still in dirty pages
- FlushStateToDisk is called: FlushBlockFile is called, which flushes the last block and undo files (3) to disk
- WriteBlockIndexDBis called and flushes the block index state to disk.
 
Note: At this point, the BLOCK_HAVE_UNDO status flag for block 5 has been persisted, but undo file 2 may still be in dirty pages. A power loss would then mean that block 5 doesn’t actually have rev data. The log would look like this on restart:
2022-07-04T04:15:07.917838Z [init] [validation.cpp:3931] [VerifyDB] Verifying last 4 blocks at level 3
2022-07-04T04:15:07.917847Z [init] [validation.cpp:3938] [VerifyDB] [0%]...ERROR: UndoReadFromDisk: Deserialize or I/O error - CAutoFile::read: end of file: unspecified iostream_category error
2022-07-04T04:15:07.918012Z [init] [util/system.h:50] [error] ERROR: VerifyDB(): *** found bad undo data at 4, hash=221d40281e7719cd139503506649647ca50cc8bf9babaa1d1f91b9f80c4f48dd
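The Time 1 to Time 3 sequence can be checked with a toy model. This is hypothetical C++ (DiskModel and its methods are illustrative names, not Bitcoin Core code). It assumes a power loss discards every undo record that was never fsync'd, and that the block index write done by WriteBlockIndexDB is durable.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy model of which writes survive a power loss.
struct DiskModel {
    std::map<int, int> undoFile;  // block height -> undo file number
    std::set<int> undoDurable;    // heights whose undo record was fsync'd
    std::set<int> flagInMemory;   // BLOCK_HAVE_UNDO set in memory
    std::set<int> flagPersisted;  // BLOCK_HAVE_UNDO written to the block index DB

    // WriteUndoDataForBlock with no flush: data only reaches dirty pages.
    void WriteUndo(int height, int file) {
        undoFile[height] = file;
        flagInMemory.insert(height);
    }
    // Flushing an undo file makes every record in it durable.
    void FlushUndoFile(int file) {
        for (const auto& [height, f] : undoFile)
            if (f == file) undoDurable.insert(height);
    }
    // WriteBlockIndexDB: in-memory status flags are persisted.
    void WriteBlockIndexDB() {
        flagPersisted.insert(flagInMemory.begin(), flagInMemory.end());
    }
    // After a power loss: blocks whose persisted flag claims undo data that
    // never reached the disk.
    std::vector<int> CorruptAfterPowerLoss() const {
        std::vector<int> bad;
        for (int h : flagPersisted)
            if (!undoDurable.count(h)) bad.push_back(h);
        return bad;
    }
};

std::vector<int> RunScenario() {
    DiskModel m;
    // Time 2: block 4's undo goes to file 3, block 5's to file 2. Writing
    // block 5's undo does not flush file 2 because 5 != nHeightLast (8).
    m.WriteUndo(4, 3);
    m.WriteUndo(5, 2);
    // Time 3: FlushBlockFile flushes only the last file pair (3), then
    // WriteBlockIndexDB persists both flags.
    m.FlushUndoFile(3);
    m.WriteBlockIndexDB();
    return m.CorruptAfterPowerLoss();
}
```

In this model RunScenario() reports block 5: its flag survives in the block index while its undo record does not.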
To aid in testing, I set the Linux vm.dirty_writeback_centisecs tunable to a high value so that the kernel would wait a while before flushing dirty pages. This ensured that files with fsync called on them would be flushed well before the non-fsync’d dirty pages.
If I’m right, then this is a corruption bug, but a lot of things have to go wrong for it to happen.