Running out of disk space can leave bitcoin in a desynced state #26112

issue jb55 opened this issue on September 16, 2022
  1. jb55 commented at 5:27 pm on September 16, 2022: contributor

    I noticed my node was not syncing. After some debugging on #bitcoin-core-dev, it seems the cause was running out of disk space, which left the node in a desynced state with the valid chain marked as invalid.

    Freeing up some space and running reconsiderblock fixed it.

    Comments from @sipa:

    Ugh. That is bad. Out of disk space should not result in database corruption.

    Database errors propagating up and being interpreted as (permanent) block invalidity was one of the contributing factors to the BDB/LevelDB fork in the 0.7/0.8 transition.
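
The distinction drawn in the comment above, between a block that is provably invalid and a block that could not be checked because of a local failure, can be sketched as follows. This is a minimal illustration and not Bitcoin Core's code: `Outcome`, `BlockRecord`, `DbError`, `CheckInputs`, and `ApplyOutcome` are hypothetical names, and the two boolean parameters only simulate a database failure and the presence of an input.

```cpp
#include <stdexcept>

// Hypothetical three-way result: the block is valid, provably invalid,
// or could not be checked because of a local system failure.
enum class Outcome { Valid, Invalid, SystemError };

// Hypothetical stand-in for the persisted per-block state.
struct BlockRecord {
    bool marked_invalid = false;
};

// Hypothetical exception type for a failed database read or write.
struct DbError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Simulated input check. `db_fails` models a full disk,
// `input_exists` models whether the spent output is in the UTXO set.
Outcome CheckInputs(bool db_fails, bool input_exists)
{
    try {
        if (db_fails) throw DbError("IO error: No space left on device");
        return input_exists ? Outcome::Valid : Outcome::Invalid;
    } catch (const DbError&) {
        // A failed lookup says nothing about the block itself.
        return Outcome::SystemError;
    }
}

// Only a proven consensus failure may be recorded as permanent invalidity.
void ApplyOutcome(Outcome outcome, BlockRecord& block)
{
    if (outcome == Outcome::Invalid) block.marked_invalid = true;
}
```

With this shape, a simulated I/O failure leaves `marked_invalid` untouched, while a genuinely missing input sets it.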

    logs:

    2022-09-05T00:28:55Z UpdateTip: new best=000000000000000000087e337539b369b19570a88c493725020266387d22dac0 height=752167 version=0x2ce8e004 log2_work=93.707707 tx=761139817 date='2022-09-01T15:00:06Z' progress=0.998887 cache=146.8MiB(1101860txo)
    2022-09-05T00:28:55Z Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
    2022-09-05T00:28:55Z You can use -debug=leveldb to get more complete diagnostic messages
    2022-09-05T00:28:55Z *** System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
    2022-09-05T00:28:55Z Error: A fatal internal error occurred, see debug.log for details
    2022-09-05T00:28:56Z ERROR: ProcessNewBlock: ActivateBestChain failed (System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device)
    2022-09-05T00:28:56Z ERROR: ConnectBlock: Consensus::CheckTxInputs: 221fa678c5c9953d6cd17e584f05c12ab10ba0f2fc8e8131e266f3f0e9819848, bad-txns-inputs-missingorspent, CheckTxInputs: inputs missing/spent
    2022-09-05T00:28:56Z InvalidChainFound: invalid block=000000000000000000079ba062298aa7cef1888f870c23e303a361c19b097ff8  height=752168  log2_work=93.707719  date=2022-09-01T15:13:42Z
    2022-09-05T00:28:56Z InvalidChainFound:  current best=000000000000000000087e337539b369b19570a88c493725020266387d22dac0  height=752167  log2_work=93.707707  date=2022-09-01T15:00:06Z
    2022-09-05T00:28:56Z ERROR: ConnectTip: ConnectBlock 000000000000000000079ba062298aa7cef1888f870c23e303a361c19b097ff8 failed, bad-txns-inputs-missingorspent, CheckTxInputs: inputs missing/spent
    2022-09-05T00:28:56Z InvalidChainFound: invalid block=000000000000000000079ba062298aa7cef1888f870c23e303a361c19b097ff8  height=752168  log2_work=93.707719  date=2022-09-01T15:13:42Z
    2022-09-05T00:28:56Z InvalidChainFound:  current best=000000000000000000087e337539b369b19570a88c493725020266387d22dac0  height=752167  log2_work=93.707707  date=2022-09-01T15:00:06Z
    
  2. jb55 added the label Bug on Sep 16, 2022
  3. fanquake added this to the milestone 24.0 on Sep 16, 2022
  4. sipa commented at 5:50 pm on September 16, 2022: member

    So it appears that this was triggered by LevelDB failing to write to disk.

    Some questions:

    • Why didn’t our own disk space check detect this long before it happened? @jb55 was anything else quickly filling your disk at the same time, which could cause our own check to not be frequent enough?
    • Did Bitcoin Core shut down after this happened?

    Based on your comments on IRC, it seems that normal restarting didn’t fix the problem. So that suggests that while there was some LevelDB error… Bitcoin Core still managed to (incorrectly) write to disk that the block was invalid. It shouldn’t conclude that in the first place, but it’s somewhat strange that even after a system error it still managed to actually commit that to disk.
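
The kind of disk space check referred to in the first question can be sketched with the C++17 standard library. This is only an illustration under stated assumptions, not Bitcoin Core's actual check: `kMinFreeBytes`, `SpaceSuffices`, and `HasEnoughDiskSpace` are hypothetical names, and the 50 MiB reserve is an illustrative value.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

// Illustrative reserve that should stay free after the planned write.
constexpr std::uintmax_t kMinFreeBytes = 50ULL * 1024 * 1024;

// Pure decision logic: is there room for `additional_bytes` plus the reserve?
// Written as a subtraction so the comparison cannot overflow.
bool SpaceSuffices(std::uintmax_t available, std::uintmax_t additional_bytes)
{
    if (available < kMinFreeBytes) return false;
    return available - kMinFreeBytes >= additional_bytes;
}

// Queries the filesystem that holds `dir` and applies the rule above.
bool HasEnoughDiskSpace(const std::filesystem::path& dir, std::uintmax_t additional_bytes)
{
    std::error_code ec;
    const std::filesystem::space_info info = std::filesystem::space(dir, ec);
    if (ec) return false; // an unreadable filesystem is treated as "not enough space"
    return SpaceSuffices(info.available, additional_bytes);
}
```

A check like this is a snapshot: it reports the free space at the moment of the call, so another process can still consume that space before the write happens. Running it more often narrows the window but does not close it, which is why the write path also has to handle a failed write correctly.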

  5. jb55 commented at 5:56 pm on September 16, 2022: contributor

    Why didn’t our own disk space check detect this long before it happened? @jb55 was anything else quickly filling your disk at the same time, which could cause our own check to not be frequent enough?

    yes this is very possible, I run nixos and frequently use nix-shell, etc which downloads things and fills up my disk pretty quickly.

    Did Bitcoin Core shut down after this happened?

    yes, here’s the full log:

    2022-09-05T00:28:55Z UpdateTip: new best=000000000000000000087e337539b369b19570a88c493725020266387d22dac0 height=752167 version=0x2ce8e004 log2_work=93.707707 tx=761139817 date='2022-09-01T15:00:06Z' progress=0.998887 cache=146.8MiB(1101860txo)
    2022-09-05T00:28:55Z Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
    2022-09-05T00:28:55Z You can use -debug=leveldb to get more complete diagnostic messages
    2022-09-05T00:28:55Z *** System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
    2022-09-05T00:28:55Z Error: A fatal internal error occurred, see debug.log for details
    2022-09-05T00:28:56Z ERROR: ProcessNewBlock: ActivateBestChain failed (System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device)
    2022-09-05T00:28:56Z ERROR: ConnectBlock: Consensus::CheckTxInputs: 221fa678c5c9953d6cd17e584f05c12ab10ba0f2fc8e8131e266f3f0e9819848, bad-txns-inputs-missingorspent, CheckTxInputs: inputs missing/spent
    2022-09-05T00:28:56Z InvalidChainFound: invalid block=000000000000000000079ba062298aa7cef1888f870c23e303a361c19b097ff8  height=752168  log2_work=93.707719  date=2022-09-01T15:13:42Z
    2022-09-05T00:28:56Z InvalidChainFound:  current best=000000000000000000087e337539b369b19570a88c493725020266387d22dac0  height=752167  log2_work=93.707707  date=2022-09-01T15:00:06Z
    2022-09-05T00:28:56Z ERROR: ConnectTip: ConnectBlock 000000000000000000079ba062298aa7cef1888f870c23e303a361c19b097ff8 failed, bad-txns-inputs-missingorspent, CheckTxInputs: inputs missing/spent
    2022-09-05T00:28:56Z InvalidChainFound: invalid block=000000000000000000079ba062298aa7cef1888f870c23e303a361c19b097ff8  height=752168  log2_work=93.707719  date=2022-09-01T15:13:42Z
    orspent, CheckTxInputs: inputs missing/spent
    2022-09-05T00:28:57Z ERROR: ProcessNewBlock: ActivateBestChain failed (System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device)
    2022-09-05T00:28:57Z msghand thread exit
    2022-09-05T00:28:57Z DumpAnchors: Flush 0 outbound block-relay-only peer addresses to anchors.dat started
    CheckTxInputs: inputs missing/spent
    2022-09-05T00:28:56Z InvalidChainFound: invalid block=000000000000000000079ba062298aa7cef1888f870c23e303a361c19b097ff8  height=752168  log2_work=93.707719  date=2022-09-01T15:13:42Z
    2022-09-05T00:28:56Z InvalidChainFound:  current best=000000000000000000087e337539b369b19570a88c493725020266387d22dac0  height=752167  log2_work=93.707707  date=2022-09-01T15:00:06Z
    2022-09-05T00:28:57Z tor: Thread interrupt
    2022-09-05T00:28:57Z torcontrol thread exit
    2022-09-05T00:28:57Z opencon thread exit
    2022-09-05T00:28:57Z addcon thread exit
    2022-09-05T00:28:57Z Shutdown: In progress...
    2022-09-05T00:28:57Z net thread exit
    2022-09-05T00:28:57Z Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
    2022-09-05T00:28:57Z You can use -debug=leveldb to get more complete diagnostic messages
    2022-09-05T00:28:57Z *** System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
    2022-09-05T00:28:57Z Error: A fatal internal error occurred, see debug.log for details
    2022-09-05T00:28:57Z ERROR: ProcessNewBlock: ActivateBestChain failed (System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device)
    2022-09-05T00:28:57Z msghand thread exit
    2022-09-05T00:28:57Z DumpAnchors: Flush 0 outbound block-relay-only peer addresses to anchors.dat started
    2022-09-05T00:28:57Z DumpAnchors: Flush 0 outbound block-relay-only peer addresses to anchors.dat completed (0.00s)
    2022-09-05T00:28:57Z scheduler thread exit
    2022-09-05T00:28:57Z Writing 0 unbroadcast transactions to disk.
    2022-09-05T00:28:57Z Dumped mempool: 0.001682s to copy, 0.015307s to dump
    2022-09-05T00:28:57Z Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
    2022-09-05T00:28:57Z You can use -debug=leveldb to get more complete diagnostic messages
    2022-09-05T00:28:57Z *** System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
    2022-09-05T00:28:57Z Error: A fatal internal error occurred, see debug.log for details
    2022-09-05T00:28:57Z ForceFlushStateToDisk: failed to flush state (System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device)
    2022-09-05T00:28:57Z Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
    2022-09-05T00:28:57Z You can use -debug=leveldb to get more complete diagnostic messages
    2022-09-05T00:28:57Z *** System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device
    2022-09-05T00:28:57Z Error: A fatal internal error occurred, see debug.log for details
    2022-09-05T00:28:57Z ForceFlushStateToDisk: failed to flush state (System error while flushing: Fatal LevelDB error: IO error: /home/jb55/.bitcoin/chainstate/2963623.ldb: No space left on device)
    2022-09-05T00:28:57Z [personal] Releasing wallet
    2022-09-05T00:28:57Z [old-wallet] Releasing wallet
    2022-09-05T00:28:58Z Shutdown: done
    
  6. sipa commented at 6:34 pm on September 16, 2022: member

    A guess about what might be happening:

    CCoinsViewErrorCatcher, the wrapper class used around CCoinsViewDB that’s supposed to detect these problems and forcefully exit the application, has an override for GetCoins. But in CheckTxInputs, HaveInputs is invoked first, which in turn calls HaveCoin. HaveCoin is implemented in CCoinsViewDB, but not in CCoinsViewErrorCatcher, and thus the disk read exception escapes.

    A solution may be to just add an override for HaveCoin in CCoinsViewErrorCatcher.
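
The guess above can be modeled with a small, self-contained sketch. It is not the real Bitcoin Core code: `View`, `FailingDb`, `Backed`, `PartialCatcher`, `FullCatcher`, `FatalDbError`, and `ProbeHaveCoin` are hypothetical stand-ins, outpoints are reduced to an `int`, and the catcher throws a dedicated exception instead of exiting the process so that the behavior can be observed.

```cpp
#include <stdexcept>

// Marks a database failure that the wrapper has recognized as fatal.
struct FatalDbError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct View {
    virtual ~View() = default;
    virtual bool GetCoin(int outpoint) const = 0;
    virtual bool HaveCoin(int outpoint) const = 0;
};

// Simulates a database whose reads fail because the disk is full.
struct FailingDb : View {
    bool GetCoin(int) const override { throw std::runtime_error("IO error: No space left on device"); }
    bool HaveCoin(int) const override { throw std::runtime_error("IO error: No space left on device"); }
};

// Forwards every call to the wrapped view without any error handling.
struct Backed : View {
    explicit Backed(const View& base) : m_base(base) {}
    bool GetCoin(int o) const override { return m_base.GetCoin(o); }
    bool HaveCoin(int o) const override { return m_base.HaveCoin(o); }
protected:
    const View& m_base;
};

// Guards GetCoin only, so HaveCoin still forwards the raw exception.
struct PartialCatcher : Backed {
    explicit PartialCatcher(const View& base) : Backed(base) {}
    bool GetCoin(int o) const override
    {
        try { return Backed::GetCoin(o); }
        catch (const std::runtime_error& e) { throw FatalDbError(e.what()); }
    }
};

// Adds the missing override so both read paths are guarded.
struct FullCatcher : PartialCatcher {
    explicit FullCatcher(const View& base) : PartialCatcher(base) {}
    bool HaveCoin(int o) const override
    {
        try { return Backed::HaveCoin(o); }
        catch (const std::runtime_error& e) { throw FatalDbError(e.what()); }
    }
};

enum class Escaped { None, Raw, Fatal };

// Reports which kind of exception, if any, leaves HaveCoin.
Escaped ProbeHaveCoin(const View& view)
{
    try { view.HaveCoin(0); return Escaped::None; }
    catch (const FatalDbError&) { return Escaped::Fatal; }
    catch (const std::runtime_error&) { return Escaped::Raw; }
}
```

In this model, `ProbeHaveCoin` reports a raw exception for `PartialCatcher` and a recognized fatal error for `FullCatcher`, which is the difference the proposed override is meant to make.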

  7. bitcoin deleted a comment on Oct 7, 2022
  8. maflcko removed this from the milestone 24.0 on Oct 17, 2022
  9. maflcko added the label UTXO Db and Indexes on Oct 17, 2022
  10. maflcko commented at 5:09 pm on October 17, 2022: member
    Removed from the milestone, as this is not a regression, nor is a fix available right now.
  11. achow101 referenced this in commit 04265ba937 on Oct 9, 2023
  12. achow101 closed this on Oct 9, 2023

  13. jb55 commented at 4:32 am on October 11, 2023: contributor
    awesome, thanks @aureleoules !
  14. Frank-GER referenced this in commit 80f8443569 on Oct 13, 2023
  15. Mikey4010 commented at 11:18 am on November 12, 2023: none
    61a6c3b0e9a8dab5c5f845af4becde817539133c
  16. bitcoin locked this on Nov 11, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC