Log explicit error message when coindb is found in inconsistent state

fjahr commented at 11:29 am on August 27, 2023: contributor

While doing manual testing on assumeutxo this week I managed to put the coindb into an inconsistent state twice. For a normal user, this can also happen if their computer crashes during a flush or if they try to stop their node during a flush and then get tired of waiting and just shut their computer down or kill the process. It’s an edge case but I wouldn’t be surprised if this does happen more often when assumeutxo gets used more widely because there might be multiple flushes happening during loading of the UTXO set in the beginning and users may think something is going wrong because of the unexpected wait or they forgot some configs and want to start over quickly.

The problem is, when this happens at first the node starts up normally until it’s time to flush again and then it hits an assert that the user can not understand.

02023-08-25T16:31:09Z [httpworker.0] [snapshot] 52000000 coins loaded (43.30%, 6768 MB)
12023-08-25T16:31:16Z [httpworker.0] Cache size (7272532192) exceeds total space (7256510300)
22023-08-25T16:31:16Z [httpworker.0] FlushSnapshotToDisk: flushing coins cache (7272 MB) started
3Assertion failed: (old_heads[0] == hashBlock), function BatchWrite, file txdb.cpp, line 126.
4Abort trap: 6

We should at least log an error message that gives users a hint of what the problem is and what they can do to resolve it. I am keeping this separate from the assumeutxo project since this issue can also happen during any regular flush.

DrahtBot commented at 11:29 am on August 27, 2023: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	jonatack, ryanofsky, jamesob, achow101

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

DrahtBot added the label CI failed on Aug 27, 2023

DrahtBot removed the label CI failed on Aug 27, 2023

in src/txdb.cpp:127 in be1b6d5122 outdated

122@@ -123,6 +123,9 @@ bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock, boo
123         // We may be in the middle of replaying.
124         std::vector<uint256> old_heads = GetHeadBlocks();
125         if (old_heads.size() == 2) {
126+            if (old_heads[0] != hashBlock) {
127+                LogPrintLevel(BCLog::COINDB, BCLog::Level::Error, "The coindb detected an inconsistent state, likely due to a previous crash or shutdown. You will need to restart with -reindex-chainstate or -reindex.\n");

jonatack commented at 1:39 pm on August 27, 2023:

The log output will be prefixed by [coindb:error], so for the log message perhaps go with the more user-friendly s/coindb/coins database/.

0                LogPrintLevel(BCLog::COINDB, BCLog::Level::Error, "The coins database detected an inconsistent state, likely due to a previous crash or shutdown. You will need to restart bitcoind with the -reindex-chainstate or -reindex configuration option.\n");

Wo

fjahr commented at 6:17 pm on August 27, 2023:

Done, thanks!

jonatack commented at 1:45 pm on August 27, 2023: member

utACK be1b6d5122355e2eff6bde4efd88a51c2740761e

log: Print error message when coindb is in inconsistent state df60de770d

fjahr force-pushed on Aug 27, 2023

jonatack commented at 6:40 pm on August 27, 2023: member

ACK df60de770d8e27c00eced29d8a4a4bce7c6e4b30

ryanofsky approved

ryanofsky commented at 4:20 pm on August 29, 2023: contributor

Code review ACK df60de770d8e27c00eced29d8a4a4bce7c6e4b30

This is an improvement, but I would think just killing the process should not put the coindb in an inconsistent state that would require a reindex. Am I wrong about that, or is there more work that could be done here to debug the issue and update the database atomically?

jamesob approved

jamesob commented at 4:37 pm on August 29, 2023: member

Code review ACK df60de770d8e27c00eced29d8a4a4bce7c6e4b30

glozow added the label Utils/log/libs on Sep 1, 2023

achow101 commented at 5:17 pm on September 1, 2023: member

ACK df60de770d8e27c00eced29d8a4a4bce7c6e4b30

achow101 merged this on Sep 1, 2023

achow101 closed this on Sep 1, 2023

luke-jr commented at 8:33 pm on September 5, 2023: member

For a normal user, this can also happen if their computer crashes during a flush or if they try to stop their node during a flush and then get tired of waiting and just shut their computer down or kill the process.

It shouldn’t… That would be a bug.

Frank-GER referenced this in commit 6378a67304 on Sep 8, 2023

bitcoin locked this on Sep 4, 2024

Log explicit error message when coindb is found in inconsistent state #28350

Reviews