Bitcoin Core is unstable in the presence of sudden OS crashes #7233

issue GSPP opened this issue on December 19, 2015
  1. GSPP commented at 2:20 pm on December 19, 2015: none

    Bitcoin Core is unstable in the presence of sudden OS crashes. There are two scenarios that cause problems which I experienced:

    1. Sudden power loss
    2. Stopping a Virtual Machine suddenly

    The symptoms are:

    1. A lengthy “verification” operation on the next startup
    2. Sometimes, reindexing of blocks is necessary because some “corruption” is reported (this is so slow that deleting all blocks and syncing from the network is faster)

    Are Bitcoin Core databases not crash consistent? I’m used to the situation that databases tolerate sudden power loss without loss of data or consistency. This is a core feature of “Enterprise” RDBMSes for example, and it does work as advertised in practice. Could we have crash consistency for Bitcoin Core, too?

    I imagine this can be a problem for automated operations because during the “reprocessing” phase the software is unavailable.

    In case this is considered a bug, maybe there should be automated tests for crash consistency (e.g. powering off a VM suddenly).

    This was version v0.10.2, run both on real hardware as well as virtualized on VMWare.

  2. luke-jr commented at 5:14 pm on December 19, 2015: member
    The first step to reporting any bug is to see if it’s already fixed. Do you still have this problem with 0.10.4 or 0.11.2?
  3. GSPP commented at 5:17 pm on December 19, 2015: none
    I’ll test that and report back.
  4. sipa commented at 5:21 pm on December 19, 2015: member

    0.11.something fixed some LevelDB consistency errors for Windows systems. If you’re on Windows, you should certainly try that.

    The long catch-up after a crash is due to very aggressive caching, and delayed writing.

    Lastly, -reindex should always be faster than downloading from scratch, as it is performing the exact same operations, only from disk instead of from network.

  5. GSPP commented at 5:40 pm on December 19, 2015: none
    Yeah, I do not understand why reindex was slower. This is on magnetic disks. Maybe I hit fragmentation or otherwise unfortunate IO patterns.
  6. MarcoFalke commented at 7:23 pm on December 19, 2015: member
    @jonasschnelli did some tests in a Windows VM: #6917 (comment)
  7. GSPP commented at 7:16 pm on December 22, 2015: none

    I just performed the simplest possible test: Kill the VMWare VM while syncing from the network:

    [image not preserved]

    So I would propose that automated tests for crash scenarios are added. Note that killing the bitcoin process might not be enough to detect such problems, because killing a process does not discard OS-level write-back data. Killing a VM should be a much better test.
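The point about process kills can be illustrated with a minimal sketch. It assumes nothing about Bitcoin Core itself: a hypothetical child process writes to a file without ever calling fsync, the parent kills it abruptly, and the data is still readable afterwards because it already sits in the OS page cache. The file name and payload are made up for illustration.

```python
import os
import subprocess
import sys
import tempfile

# Child program: hand data to the OS with os.write (no fsync), report, then hang.
CHILD_SOURCE = """
import os, sys, time
fd = os.open(sys.argv[1], os.O_WRONLY | os.O_CREAT, 0o600)
os.write(fd, b"block-data")   # reaches the OS page cache, never fsync'ed
print("written", flush=True)  # tell the parent the write has happened
time.sleep(60)                # wait to be killed
"""


def data_visible_after_kill() -> bytes:
    """Kill a writer that never called fsync and return what the file contains."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        child = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE, path],
            stdout=subprocess.PIPE,
        )
        child.stdout.readline()  # block until the child has written
        child.kill()             # abrupt termination, no cleanup in the child
        child.wait()
        child.stdout.close()
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(data_visible_after_kill())
```

Because the kernel keeps the written pages after the process dies, a test that only kills the process cannot observe data that would have been lost in a power failure. Exercising that case needs something that discards the page cache, such as powering off a VM.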

  8. hamiltino commented at 7:57 am on January 2, 2016: none
    Hey, I get the same problems on Linux, it’s so annoying! It doesn’t even take an OS crash; it happens when I kill the bitcoin-qt process abruptly.
  9. NicolasDorier commented at 6:43 am on January 3, 2016: contributor
    I’m surprised; on my side I have not had this problem for a while. @GSPP, can you wait for the sync to complete, then shut down the VM as you have done, and see if the block database fails?
  10. EthanBianchi commented at 5:09 pm on January 21, 2016: none

    I have tried this on Windows 8/10 and have experienced the same bug. When my block database is completely synchronized and caught up it doesn’t seem to occur.

    However, when my QT client is “catching up”, or synchronizing (Whether it’s from scratch, or just updating since a reboot) it gives me the message that it is corrupted and needs to be re-built.

    So it seems to only occur during synchronization, but not when completely sync’d.

  11. GSPP commented at 5:47 pm on January 22, 2016: none

    I just caught Bitcoin Core in the act:

    [image not preserved]

    Data is not flushed and exposed to OS crashes and power losses.

    Repro steps:

    1. Put the software into a syncing state
    2. Exit cleanly
    3. Confirm program is terminated using task manager
    4. Use RAMMap to see the unflushed data

    I would expect Bitcoin to always flush files to disk to obtain crash consistency.

    Also, I wonder what happens in case of power loss during flushing. The software must tolerate partial writes as well.

    This was done using the latest version as of today. @sipa
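One common way for a storage format to tolerate partial writes is to frame each record with its length and a checksum, so a torn or zero-filled tail can be detected and ignored on the next startup. The sketch below is a generic illustration of that idea, not Bitcoin Core's or LevelDB's actual on-disk format; `encode_record` and `read_valid_records` are hypothetical names.

```python
import struct
import zlib

# Record header: payload length and CRC32 of the payload, both little-endian uint32.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Frame a non-empty payload as header + payload."""
    if not payload:
        # Empty payloads are disallowed so an all-zero header is never valid.
        raise ValueError("empty payloads are not supported in this sketch")
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_valid_records(log: bytes) -> list[bytes]:
    """Return the records up to the first torn, zeroed, or corrupted one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        if length == 0:
            break  # zero-filled region, e.g. allocated but never written
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # partial write or corruption: stop at the last good record
        records.append(payload)
        offset = start + length
    return records
```

With this framing, a crash in the middle of an append costs at most the record being written; everything before it can still be read back.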

  12. GSPP commented at 6:22 pm on January 22, 2016: none

    Turns out after cleanly shutting down the VM Bitcoin Core wants to verify the blocks on the next startup. I see no reason this is necessary. I suspect there’s at least one more issue here besides the missing flushing because this time I did ensure flushing.

    To summarize the scenario:

    • Latest Core version, upgraded from 0.10
    • VMWare Workstation 11.1.3
    • Windows 7 x64 SP1
    • 6GB RAM, 8 of 8 CPU cores

    Also during syncing, without shutting down I found this particular state:

    [image not preserved]

    Doesn’t this mean that in case of power loss the blocks and the chainstate index might become out-of-sync? Maybe the block makes it to disk but the index does not. Or, the other way around. Or, only some of the disk blocks of each make it and the other ones are zeroes.

    Maybe, the wallet is also not properly flushed: #7249 (comment)
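The concern about block data and an index drifting apart is usually addressed by ordering the writes and validating on startup. A minimal sketch, assuming a hypothetical append-only data file plus a text index (this is not how Bitcoin Core lays out its block files or chainstate): data is made durable first, the index entry second, and on load any index entry that points past the end of the data file is dropped.

```python
import os


def append_durably(path: str, data: bytes) -> int:
    """Append data, force it to disk, and return the offset where it starts."""
    with open(path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)
        f.write(data)
        f.flush()             # Python buffer -> OS
        os.fsync(f.fileno())  # OS page cache -> storage device
    return offset


def store_block(data_path: str, index_path: str, key: str, block: bytes) -> None:
    offset = append_durably(data_path, block)  # 1. data first
    entry = f"{key} {offset} {len(block)}\n".encode()
    append_durably(index_path, entry)          # 2. index only after data is durable


def load_index(data_path: str, index_path: str) -> dict[str, tuple[int, int]]:
    """Load index entries, stopping at the first torn or dangling one."""
    data_size = os.path.getsize(data_path) if os.path.exists(data_path) else 0
    index: dict[str, tuple[int, int]] = {}
    if not os.path.exists(index_path):
        return index
    with open(index_path, "rb") as f:
        for line in f:
            parts = line.split()
            if not line.endswith(b"\n") or len(parts) != 3:
                break  # torn last line
            try:
                offset, length = int(parts[1]), int(parts[2])
            except ValueError:
                break  # garbage in the entry
            if offset + length > data_size:
                break  # index points at data that never reached the disk
            index[parts[0].decode()] = (offset, length)
    return index
```

The sketch leaves out details a real implementation would need, such as syncing the containing directory after creating a file and checksumming the data itself.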

  13. GSPP commented at 6:33 pm on January 22, 2016: none

    https://news.ycombinator.com/item?id=2526311

    This sounds like, without sync == true, random corruption and inconsistency can occur. But even with syncing, different leveldb databases can become inconsistent with respect to each other. Not sure if this is the case here. Just mentioning the possibility.

    https://github.com/bitcoin/bitcoin/search?utf8=%E2%9C%93&q=WriteOptions Looks like sync is not set in cases.
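LevelDB's `WriteOptions` has a boolean `sync` field that defaults to false. What such a flag typically controls can be sketched with a toy append-only log. `TinyLog` is a hypothetical class for illustration only and does not use or reproduce LevelDB.

```python
import os


class TinyLog:
    """Toy append-only log with a per-write sync flag."""

    def __init__(self, path: str) -> None:
        self._f = open(path, "ab")

    def put(self, record: bytes, sync: bool = False) -> None:
        self._f.write(record + b"\n")
        self._f.flush()  # Python buffer -> OS page cache (survives a process kill)
        if sync:
            # Page cache -> storage device (needed to survive power loss).
            os.fsync(self._f.fileno())

    def close(self) -> None:
        self._f.close()
```

With `sync=False` a write returns as soon as the OS has the data, which is fast but leaves it exposed to an OS crash or power loss. With `sync=True` each write waits for the device, which is the trade-off between throughput and durability discussed in this thread.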

  14. laanwj commented at 4:21 pm on January 27, 2016: member
    See also #5610
  15. laanwj added the label Data corruption on Feb 9, 2016
  16. GSPP commented at 9:59 am on August 14, 2016: none

    I just experienced this on Bitcoin Core version v0.12.1 (64-bit). This problem is significant because when it strikes Bitcoin Core becomes unusable for most purposes for a few days while the database rebuilds.

    Can anyone else reproduce this? Steps are above (https://github.com/bitcoin/bitcoin/issues/7233#issuecomment-166707144).

    I imagine fixing this is not too hard (?) because most database systems are built with crash consistency in mind. Maybe all that is needed is changing some LevelDB settings?

    Also note that quite a few people are affected by this: https://www.google.com/webhp?complete=1&hl=en#q=bitcoin+core+database+corrupted&complete=1&hl=en&start=0

  17. GSPP commented at 5:19 pm on September 7, 2016: none
    I repeatedly killed Bitcoin Core 0.13 which did not cause corruption. The issue might have gone away. But I’m not quite certain of that yet.
  18. NicolasDorier commented at 1:27 am on September 8, 2016: contributor
    On my side I got a corruption when my USB drive got unplugged because of a bad cable when I moved it. It was in the block verification phase when it happened. Anyway, it is not as frequent as before; this is the first time it has happened to me in several months.
  19. GSPP commented at 8:25 pm on September 8, 2016: none

    I can reproduce this quite reliably:

    1. Delete the chainstate.
    2. Let the software sync for a while (e.g. 10 minutes or until “4-3 years behind”). Sync from the blocks on disk, not from the network. Possibly, this is required to reproduce (if only because of increased load).
    3. Power off (not shut down) the VMware Workstation VM
    4. In 1/2 cases the database becomes corrupted and must be rebuilt

    This is Bitcoin Core version v0.13.0 (64-bit) on Win7 x64.

    Syncing just a few seconds does not cause this. My previous test was for killing the processes during catching up from a pre-existing chainstate. Maybe one of these two deviations makes the bug disappear.

  20. GSPP commented at 10:41 am on September 10, 2016: none

    Did more tests:

    Now I had a new phenomenon: I let it catch up to 33 weeks behind, then exited Bitcoin Core normally. When the process had naturally exited, the chainstate folder was just a few KB in size and indeed after starting the software it is now re-syncing from disk. This is new.

    Trying it now on a completely fresh Win7 SP1 x64 VM with just Bitcoin Core installed: Killing the process 6 times never caused corruption. Powering off the VM frequently causes corruption. This is on a magnetic disk now and the corruption comes about 1 in 3 times. So this test shows that the Windows instance is not somehow messed up.

    I unplugged my SSD once. Also caused corruption.

    Just checked the NTFS file systems in all test VMs: No corruption. This is evidence against the disks violating their durability guarantees.

    Tonight I experienced a blue screen. The old VM became corrupt. The new VM did not report corruption but reindexing started at block zero.

    I can only encourage you to experiment with different ways of interrupting Bitcoin Core. You will find “unexplained phenomena”.

    I’ll let this rest now and will stop testing. I recommend regularly backing up the chainstate folder. It’s not too big and it can save a tremendous amount of time in case something happens.
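The backup suggestion could be scripted roughly as follows. This is a minimal sketch that assumes Bitcoin Core is fully shut down while the copy runs, since copying a live database folder may capture an inconsistent state. `backup_folder` and the timestamped naming are hypothetical, and whether a chainstate copy alone is enough to restore from has not been verified here.

```python
import shutil
import time
from pathlib import Path


def backup_folder(source: Path, backup_root: Path) -> Path:
    """Copy `source` into a new timestamped folder under `backup_root`."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{source.name}-{stamp}"
    shutil.copytree(source, target)  # creates backup_root if it is missing
    return target
```

A call might look like `backup_folder(Path(datadir) / "chainstate", Path("/backups"))`, where `datadir` stands for wherever the node keeps its data.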

  21. gmaxwell commented at 7:19 pm on January 1, 2017: contributor
    I believe this issue should be closed. We purposefully do not sync during initial block download for performance reasons; this is known and intentional, and I believe also documented.
  22. MarcoFalke closed this on Jan 1, 2017

  23. ghost commented at 0:12 am on January 8, 2017: none
    This has nothing to do with initial block download… I too get this problem; the client is too sensitive to power outages or NAS timeouts.
  24. GSPP commented at 5:21 pm on May 6, 2017: none

    @gmaxwell would there be a way to at least not create corruption during initial sync? Some kind of periodic checkpointing maybe.

    Syncing can take a very long time (such as days depending on hardware).
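The periodic checkpointing idea can be sketched as a small policy object that triggers a flush after a number of blocks or a time interval, whichever comes first, so that a crash loses at most that much progress. This is an illustration of the proposal only; `PeriodicFlusher`, its thresholds, and the `flush` callback are hypothetical and say nothing about what Bitcoin Core actually does.

```python
import time
from typing import Callable


class PeriodicFlusher:
    """Call `flush` every `max_blocks` blocks or `max_seconds` seconds."""

    def __init__(
        self,
        flush: Callable[[], None],
        max_blocks: int = 1000,
        max_seconds: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_blocks = max_blocks
        self._max_seconds = max_seconds
        self._clock = clock
        self._pending = 0
        self._last_flush = clock()

    def block_connected(self) -> bool:
        """Record one processed block; return True if a flush was triggered."""
        self._pending += 1
        now = self._clock()
        if (self._pending >= self._max_blocks
                or now - self._last_flush >= self._max_seconds):
            self._flush()
            self._pending = 0
            self._last_flush = now
            return True
        return False
```

The thresholds set the trade-off: smaller values bound the work lost in a crash more tightly but cost more synchronous writes during sync.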

  25. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC
