Assert inside leveldbwrapper to avoid continuing after failure. #5621

pull gmaxwell wants to merge 1 commits into bitcoin:master from gmaxwell:leveldb_failure changing 1 files +2 −0

gmaxwell commented at 3:52 PM on January 8, 2015: contributor

Database failure due to faulty hardware could leave users on a fork where they are vulnerable to opportunistic attack because they are rejecting the longest chain.

If there is a software fault in the OS or LevelDB that causes many nodes to experience database failure at once its also important for overall consensus stability that the failing nodes shut down and not continue operating.

In theory, the exceptions in leveldbwrapper should cause shutdowns but in practice the software calls into that database from many places, some of which throw away all exceptions.

A more organized approach to errors should be used in the long-term, but this works for the moment.

Assert inside leveldbwrapper to avoid continuing after failure.

Database failure due to faulty hardware could leave users on a fork where
 they are vulnerable to opportunistic attack because they are rejecting
 the longest chain.

If there is a software fault in the OS or LevelDB that causes many nodes
 to experience database failure at once its also important for overall
 consensus stability that the failing nodes shut down and not continue
 operating.

In theory, the exceptions in leveldbwrapper should cause shutdowns but
 in practice the software calls into that database from many places, some
 of which throw away all exceptions.

A more organized approach to errors should be used in the long-term,
 but this works for the moment.

513fc67cd6

gmaxwell commented at 3:53 PM on January 8, 2015: contributor

Tested by running on a faulty host that was reliably getting LevelDB checksum errors and continuing operation afterwards.
gmaxwell added this to the milestone 0.10.0 on Jan 8, 2015
jgarzik commented at 3:55 PM on January 8, 2015: contributor

ut ACK
laanwj commented at 8:47 AM on January 9, 2015: member

So, #5619 or this one? Or both?
laanwj added the label Bug on Jan 9, 2015
sipa commented at 2:23 PM on January 9, 2015: member

Not both. This one will cause an assert before #5619 has a chance to catch anything anyway.
gmaxwell commented at 2:55 PM on January 9, 2015: contributor

I'm testing 5619 on my faulty host. So far it successfully has failed one of one attempts, another one is ongoing. (5621 was at 3 of 3 shortly after opening the PR) Sorry, the host isn't fast.

I would accept 5619 if, after a more testing we're not surprised by any additional places where an exception here could escape. Though, ... I'm not sure we should be using exceptions for critical failures while we are still using them for normal conditions. Eliminating them from normal conditions would require a substantial rewrite of the network deserialization code, so asserting on the critical failures is the thing to do if you buy into my argument that we shouldn't be mixing them.
laanwj removed this from the milestone 0.10.0 on Jan 12, 2015
gmaxwell commented at 4:14 PM on January 12, 2015: contributor

#5619 was merged.
gmaxwell closed this on Jan 12, 2015
bitcoin locked this on Sep 8, 2021

Contributors

Labels

Linked (view graph)

#5619 Catch UTXO set read errors and shutdown