Crash during synchronization after reindexing with txindex

GSPP commented at 12:29 pm on August 16, 2016: none

I just experienced a crash:

It is possibly related to running out of memory. Hopefully, the assertion condition can help you find the problem.

I was reindexing my blocks because I needed txindex. The reindex was complete at the time of the crash. The GUI showed “synchronizing with network…” at 17 weeks behind. The blocks that I was reindexing were copied from v0.11 into v0.12.1 (64-bit). The command line is bitcoin-qt.exe -server -rpcuser=xxx -rpcpassword=xxx -rpcbind=xxx -rpcallowip=xxx -txindex -reindex. Before running that command I deleted chainstate, database and db.log.

The log shows:

************************
EXCEPTION: St9bad_alloc
std::bad_alloc
C:\Program Files\Bitcoin\bitcoin-qt.exe in ProcessMessages()

So it looks like an OOM condition triggers this bug. It would be helpful if Bitcoin Core supported low memory conditions gracefully.

And now:

Looks like I will be rebuilding for 2 days again :( I have been doing nothing else for a week now because the database gets corrupted so easily. I really wish Bitcoin Core was crash consistent. This is running on a clean VM so I don’t think the machine is somehow messed up.

jonasschnelli added the label Block storage on Aug 17, 2016

jonasschnelli added the label Data corruption on Aug 17, 2016

jonasschnelli commented at 7:50 am on August 17, 2016: contributor

I have also encountered issues when running on a Windows VM and abruptly closing the VM. IMO we should address these corruption issues with something like #8037.

GSPP commented at 11:24 am on August 17, 2016: none

@jonasschnelli why is there any inconsistency caused by crashes at all? Crash consistency is a 101 feature of most databases. Is it misconfigured? Or should Bitcoin Core use transactions to make related actions atomic? Crashes normally are totally harmless (and that’s true in the real world, not just on paper).

sipa commented at 11:46 am on August 17, 2016: member

@GSPP I can’t tell for sure, as I have never seen a database inconsistency on my own hardware at all, but you are not taking into account that Bitcoin Core does not use a single database.

Storing all block, transaction and UTXO data in a single database has abysmal performance, so instead data with different access patterns are stored separately. The block data (~80 GB) goes into blk* files (append-only raw disk files), the block index (63 MB) goes into a write-only leveldb database (which is loaded into memory entirely at startup), and the UTXO set (1.6 GB) goes into a leveldb database with an application caching layer on top. LevelDB internally is designed to be crash-consistent, and we enable the features that are needed for this (batch writes, which are atomic, and rollback of incomplete transactions on crash recovery).
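To illustrate what "atomic batch writes" and "rollback of incomplete transactions on crash recovery" mean in practice, here is a minimal sketch of a checksummed append-only log. It is not LevelDB's actual on-disk format or API; the record layout and the names `append_batch` and `recover` are assumptions made for illustration only. Each batch is written as one record, and recovery discards a trailing record that is torn or fails its checksum, so a batch is either fully applied or not applied at all.

```python
import json
import os
import struct
import zlib


def append_batch(path, batch):
    """Append one batch (a dict of key -> value) as a single checksummed record."""
    payload = json.dumps(batch, sort_keys=True).encode("utf-8")
    # Hypothetical record layout: 4-byte length, 4-byte CRC32, then the payload.
    header = struct.pack("<II", len(payload), zlib.crc32(payload))
    with open(path, "ab") as f:
        f.write(header + payload)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the record to disk


def recover(path):
    """Replay complete records; stop at the first torn or corrupt one."""
    state = {}
    if not os.path.exists(path):
        return state
    with open(path, "rb") as f:
        data = f.read()
    pos = 0
    while pos + 8 <= len(data):
        length, crc = struct.unpack_from("<II", data, pos)
        payload = data[pos + 8 : pos + 8 + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # incomplete batch from a crash: ignore it entirely
        state.update(json.loads(payload.decode("utf-8")))
        pos += 8 + length
    return state
```

With this scheme a crash in the middle of `append_batch` leaves at most one damaged record at the end of the file, which `recover` skips, so earlier batches remain intact.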

The problem occurs across the three datasets and caches. Bitcoin Core is designed to keep them perfectly consistent with each other: we never write to the block index without flushing writes to the blk* files, and we never write UTXO entries to the chainstate database before the block index entries are flushed to their database.
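The ordering rule described above can be sketched as follows. This is a toy model, not Bitcoin Core's code: the three stores are plain files, the store names in `ORDER` and the functions `connect_block` and `is_consistent` are hypothetical, and a crash is simulated by returning early. The point is only that if each store is made durable before the next one is touched, a crash at any step leaves the later stores no further ahead than the earlier ones.

```python
import os

# Hypothetical store names, listed in the order they must be flushed.
ORDER = ("blocks", "index", "chainstate")


def durable_append(path, data):
    """Append data and ask the OS to flush it to disk before returning."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def connect_block(paths, height, crash_after=None):
    """Record a block in each store in order; optionally 'crash' after N steps."""
    for step, name in enumerate(ORDER):
        if crash_after is not None and step >= crash_after:
            return  # simulated crash: later stores are never written
        durable_append(paths[name], f"{height}\n".encode("ascii"))


def count_records(path):
    """Number of records (lines) currently stored in a file."""
    if not os.path.exists(path):
        return 0
    with open(path, "rb") as f:
        return len(f.read().splitlines())


def is_consistent(paths):
    """The chainstate never knows more than the index, nor the index more than the block files."""
    blocks, index, chainstate = (count_records(paths[n]) for n in ORDER)
    return chainstate <= index <= blocks
```

Calling `connect_block(paths, height, crash_after=k)` for any `k` and then `is_consistent(paths)` should always return `True` in this model. An inconsistency of the kind reported in this issue would correspond to the invariant being violated, for example by a later store being written before an earlier one was actually durable.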

Presumably, there is a bug in the handling of fatal errors (low disk space is one of those), which causes an inconsistency to occur. That’s bad, and should be fixed, but it’s not nearly as easy as enabling a feature on a database.

As @jonasschnelli mentions, some of the code dealing with this was changed significantly in the 0.13 codebase (which is likely to be released within a few days). It would be helpful if you can see if the problem persists.

sipa commented at 11:52 am on August 17, 2016: member

@GSPP I just realized you are experiencing an OOM, not a low-disk condition, and an OOM aborts the application immediately. In that case, I have no answer, as that’s purely an issue within LevelDB. Perhaps the Windows backend for LevelDB we’re using has bugs, or there is a bug in how the VM deals with ordering of disk writes and flushing them.

GSPP commented at 12:36 pm on August 17, 2016: none

OK. It sounds like each database itself should be crash consistent and by always writing in the correct order there never should be any inconsistency even in the face of power loss. Is that understanding correct (modulo bugs)?

Did you have a chance to try to reproduce #7233? For me the repro is really easy and it might surface the same underlying bug.

MarcoFalke added the label Windows on Aug 20, 2016

MarcoFalke commented at 0:16 am on April 27, 2020: member

Is this still an issue with a recent version of Bitcoin Core? If yes, what are the steps to reproduce?

MarcoFalke closed this on Apr 27, 2020

DrahtBot locked this on Feb 15, 2022

Crash during synchronization after reindexing with txindex #8522