Data corruption on MacOS when using exFAT datadir or blocksdir #31454

issue willcl-ark openend this issue on December 10, 2024
  1. willcl-ark commented at 10:48 am on December 10, 2024: member

    As reported in #28552, #28613 and various other places online there are intermittent issues when using an exFAT-formatted drive on MacOS.

    I was easily able to reproduce the issue, using an extrenal Samsung T7 SSD formatted to exFAT and connected via USB-C to an M4 macbook. Failures were seen at seemingly random points: blocks ~80_000, 170_000, 270_000.

    The failures report in multiple ways, depending on where things have gone wrong. For example:

    02024-12-09T10:35:17Z [error] ReadBlockFromDisk: Deserialize or I/O error - ReadCompactSize(): size too large: unspecified iostream_category error at FlatFilePos(nFile=102, nPos=84929524)
    12024-12-09T10:35:17Z [error] A fatal internal error occurred, see debug.log for details: Failed to read block.
    2Error: A fatal internal error occurred, see debug.log for details: Failed to read block.
    

    or:

    02024-12-09T21:58:44Z [error] A fatal internal error occurred, see debug.log for details: Corrupt block found indicating potential hardware failure.
    12024-12-09T21:58:44Z [error] ConnectTip: ConnectBlock 0000000000000010bca0ce4ce37c165cd4e820356194bf5991a7b9af792a83d5 failed, bad-txnmrklroot, hashMerkleRoot mismatch
    22024-12-09T21:58:44Z tor: Thread interrupt
    32024-12-09T21:58:44Z [error] ProcessNewBlock: ActivateBestChain failed (bad-txnmrklroot, hashMerkleRoot mismatch)
    

    In debugging both, I found that the first was simply garbage at that offset in the block file. No magic bytes were present nearby. The second was able to deserialise as the header and first bytes were OK, but the block later contained corrupt bytes, causing the merkle check to fail.

    I added debugging to ensure that it wasn’t an issue with our read/write code, which showed it was not:

    coloured link: https://paste.256k1.dev/erjellines.log

     0/Volumes/data/bitcoin-macos-debug
     1$ rg 0000000000022043f9ea2c298f3381c6ddd86b968271117dd55cb48ac297954c debug.log -C5
     21647168-2024-12-09T16:40:38Z ReadBlockFromDisk: Starting read for block at FlatFilePos(nFile=0, nPos=69397708) (header at FlatFilePos(nFile=0, nPos=69397700))
     31647169-2024-12-09T16:40:38Z ReadBlockFromDisk: Read header magic=f9beb4d9 size=2538 at FlatFilePos(nFile=0, nPos=69397700)
     41647170-2024-12-09T16:40:38Z ReadBlockFromDisk: Reading block data at file offset 69397708 size=2538
     51647171-2024-12-09T16:40:38Z ReadBlockFromDisk: Read complete - block hash=000000000001612689260e1db03677c70bb96e5707d55fe90a1e1f0c5dcac318 size=2538 nTx=9
     61647172-2024-12-09T16:40:38Z UpdateTip: new best=000000000001612689260e1db03677c70bb96e5707d55fe90a1e1f0c5dcac318 height=104762 version=0x00000001 log2_work=59.466998 tx=245649 date='2011-01-26T23:23:07Z' progress=0.000218 cache=22.9MiB(134951txo)
     71647173:2024-12-09T16:40:38Z AcceptBlock: Beginning write for block 0000000000022043f9ea2c298f3381c6ddd86b968271117dd55cb48ac297954c height=104770 size=33103
     81647174:2024-12-09T16:40:38Z AcceptBlock: Finding new position for block 0000000000022043f9ea2c298f3381c6ddd86b968271117dd55cb48ac297954c
     91647175:2024-12-09T16:40:38Z SaveBlockToDisk: Finding position for block 0000000000022043f9ea2c298f3381c6ddd86b968271117dd55cb48ac297954c size=33111 height=104770
    101647176:2024-12-09T16:40:38Z SaveBlockToDisk: Got position file=0 pos=69408910 for block 0000000000022043f9ea2c298f3381c6ddd86b968271117dd55cb48ac297954c
    111647177:2024-12-09T16:40:38Z WriteBlockToDisk: Opening file for block 0000000000022043f9ea2c298f3381c6ddd86b968271117dd55cb48ac297954c at FlatFilePos(nFile=0, nPos=69408910)
    121647178:2024-12-09T16:40:38Z WriteBlockToDisk: Writing header for block 0000000000022043f9ea2c298f3381c6ddd86b968271117dd55cb48ac297954c: magic=f9beb4d9 size=33103
    131647179:2024-12-09T16:40:38Z WriteBlockToDisk: Writing block 0000000000022043f9ea2c298f3381c6ddd86b968271117dd55cb48ac297954c data at file offset 69408918
    141647180:2024-12-09T16:40:38Z WriteBlockToDisk: Successfully wrote block 0000000000022043f9ea2c298f3381c6ddd86b968271117dd55cb48ac297954c to disk
    151647181:2024-12-09T16:40:38Z SaveBlockToDisk: Successfully wrote block 0000000000022043f9ea2c298f3381c6ddd86b968271117dd55cb48ac297954c to FlatFilePos(nFile=0, nPos=69408918)
    161647182:2024-12-09T16:40:38Z AcceptBlock: Wrote block 0000000000022043f9ea2c298f3381c6ddd86b968271117dd55cb48ac297954c to file=0 pos=69408918
    171647183:2024-12-09T16:40:38Z AcceptBlock: Successfully recorded block 0000000000022043f9ea2c298f3381c6ddd86b968271117dd55cb48ac297954c transactions
    181647184-2024-12-09T16:40:38Z AcceptBlock: Beginning write for block 0000000000000fe919539878073d169245f55a86cab68e382b8a33502c656ac3 height=104772 size=215
    191647185-2024-12-09T16:40:38Z AcceptBlock: Finding new position for block 0000000000000fe919539878073d169245f55a86cab68e382b8a33502c656ac3
    201647186-2024-12-09T16:40:38Z SaveBlockToDisk: Finding position for block 0000000000000fe919539878073d169245f55a86cab68e382b8a33502c656ac3 size=223 height=104772
    211647187-2024-12-09T16:40:38Z SaveBlockToDisk: Got position file=0 pos=69442021 for block 0000000000000fe919539878073d169245f55a86cab68e382b8a33502c656ac3
    221647188-2024-12-09T16:40:38Z WriteBlockToDisk: Opening file for block 0000000000000fe919539878073d169245f55a86cab68e382b8a33502c656ac3 at FlatFilePos(nFile=0, nPos=69442021)
    23--
    241647316-2024-12-09T16:40:38Z ReadBlockFromDisk: Read complete - block hash=00000000000053dd7549b03afd260f695e5bfcecca55055d4f3e980cccdfb389 size=215 nTx=1
    251647317-2024-12-09T16:40:38Z UpdateTip: new best=00000000000053dd7549b03afd260f695e5bfcecca55055d4f3e980cccdfb389 height=104769 version=0x00000001 log2_work=59.468001 tx=245698 date='2011-01-27T00:27:29Z' progress=0.000218 cache=22.9MiB(134960txo)
    261647318-2024-12-09T16:40:38Z ReadBlockFromDisk: Starting read for block at FlatFilePos(nFile=0, nPos=69408918) (header at FlatFilePos(nFile=0, nPos=69408910))
    271647319-2024-12-09T16:40:38Z ReadBlockFromDisk: Read header magic=f9beb4d9 size=33103 at FlatFilePos(nFile=0, nPos=69408910)
    281647320-2024-12-09T16:40:38Z ReadBlockFromDisk: Reading block data at file offset 69408918 size=33103
    291647321:2024-12-09T16:40:38Z ReadBlockFromDisk: Read complete - block hash=0000000000022043f9ea2c298f3381c6ddd86b968271117dd55cb48ac297954c size=33103 nTx=3
    301647322-2024-12-09T16:40:38Z [error] A fatal internal error occurred, see debug.log for details: Corrupt block found indicating potential hardware failure.
    311647323:2024-12-09T16:40:38Z [error] ConnectTip: ConnectBlock 0000000000022043f9ea2c298f3381c6ddd86b968271117dd55cb48ac297954c failed, bad-txnmrklroot, hashMerkleRoot mismatch
    321647324-2024-12-09T16:40:38Z [error] ProcessNewBlock: ActivateBestChain failed (bad-txnmrklroot, hashMerkleRoot mismatch)
    331647325-2024-12-09T16:40:38Z tor: Thread interrupt
    341647326-2024-12-09T16:40:38Z Shutdown: In progress...
    351647327-2024-12-09T16:40:38Z opencon thread exit
    361647328-2024-12-09T16:40:38Z torcontrol thread exit
    

    We attempt to read the correct section, but find incorrect data.

    I did try adding even more logging than this, but after a point the corruption became less frequent; indicating to me that slowing down IBD perhaps gives these syncs enough time to process properly.

    As mentioned in #31453, I suspect the root cause here is the F_FULLFSYNC fcntl is improperly implemented for exFAT on MacOS.

    I am unable to think of any other ways we could fix this in our codebase, without resorting to extreme measures like opening the block files without caching (and therefore forcing direct I/O).

  2. willcl-ark commented at 10:51 am on December 10, 2024: member
    See also #31430
  3. willcl-ark added the label macOS on Dec 10, 2024
  4. willcl-ark added the label Block storage on Dec 10, 2024
  5. willcl-ark added the label Upstream on Dec 10, 2024
  6. willcl-ark added the label Data corruption on Dec 10, 2024
  7. willcl-ark referenced this in commit 7325cf4e3f on Dec 10, 2024
  8. willcl-ark referenced this in commit ee05e73046 on Dec 10, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me