I’ve been running bitcoind -reindex-chainstate
very often for the LevelDB work I’ve been doing, and twice now I have gotten corrupted LevelDB databases. This is on two different computers (my desktop and my laptop). In at least once case I know there was a clean shut down, and the case with a clean shutdown happened on a build I produced from the official bitcoin-0.16.0.tar.gz release tarball with no patches applied.
Both cases exhibit something very unusual: the CRC32 for the index block is correct, but LevelDB thinks the block is snappy compressed (which it definitely is not). As of #12659 we are now logging better LevelDB error messages, so if other users encounter this issue they’ll see something like the following in their logs:
02018-03-14 01:36:47 LevelDB read failure: Corruption: corrupted compressed block contents
12018-03-14 01:36:47 Corruption: corrupted compressed block contents
22018-03-14 01:36:47 Database corrupted
(If the error is a bad CRC, instead the error message will say “Corruption: block checksum mismatch”.)
What’s interesting about this is that as you can see from the “corrupted compressed block contents” message, LevelDB thought the block was compressed. But that’s definitely not the case: we always have snappy compression disabled (see GetOptions()
in dbwrapper.cpp
). I’m looking at the LevelDB code, and the compression flag is encoded within blocks (I guess so you can change the snappy compression option and not have to rebuild your database).
The conclusion I have to draw from this is that this could be a LevelDB bug, as LevelDB wrote out data with a valid checksum, but wasn’t able to read the data it wrote. If there are other users who have LevelDB corruption problems, I would like them to build bitcoin from master (or at least with 3d16f581538b0974853e820508e8b3093269d2fd applied) and let me know if the error string contains the message “corrupted compressed block contents”.
I am using Linux 4.15.6 on an ext4 filesystem with data=ordered
in both cases.
Clean Shutdown
Here are the logs for the corrupted database on my desktop that had a clean shutdown: https://gist.github.com/eklitzke/166665da7eebfaedfa9ab8d0947ee5c8
As I mentioned earlier, this is from a bitcoind
I compiled from the official source release on bitcoincore.org without any patches applied. This isn’t exactly the same as the Gitian release because I have a newer version of GCC, but I would be surprised if that was related to the corruption issue (although it’s not impossible).
0$ sha1sum bitcoin-0.16.0.tar.gz
1b59539e55782f7f697ec4702bfda118274fd0138 bitcoin-0.16.0.tar.gz
In GDB I can see that the bad block was the one that’s supposed to contain DB_BEST_BLOCK
:
0(gdb) bt
1[#0](/bitcoin-bitcoin/0/) leveldb::ReadBlock (file=file@entry=0x55555ead3a20, options=..., handle=..., result=result@entry=0x7fffffffc090) at leveldb/table/format.cc:122
2[#1](/bitcoin-bitcoin/1/) 0x000055555590a575 in leveldb::Table::Open (options=..., file=0x55555ead3a20, size=size@entry=142335, table=table@entry=0x7fffffffc190) at leveldb/table/table.cc:65
3[#2](/bitcoin-bitcoin/2/) 0x00005555558f7d94 in leveldb::TableCache::FindTable (this=this@entry=0x5555574918f0, file_number=<optimized out>, file_size=142335, handle=handle@entry=0x7fffffffc248)
4 at leveldb/db/table_cache.cc:64
5[#3](/bitcoin-bitcoin/3/) 0x00005555558f805d in leveldb::TableCache::Get (this=0x5555574918f0, options=..., file_number=<optimized out>, file_size=<optimized out>, k=..., arg=0x7fffffffc390,
6 saver=0x5555558f9d40 <leveldb::SaveValue(void*, leveldb::Slice const&, leveldb::Slice const&)>) at leveldb/db/table_cache.cc:112
7[#4](/bitcoin-bitcoin/4/) 0x0000555555901859 in leveldb::Version::Get (this=this@entry=0x5555574b50d0, options=..., k=..., value=value@entry=0x7fffffffc630, stats=stats@entry=0x7fffffffc440)
8 at leveldb/db/version_set.cc:408
9[#5](/bitcoin-bitcoin/5/) 0x00005555558e60b9 in leveldb::DBImpl::Get (this=0x55555eac6630, options=..., key=..., value=0x7fffffffc630) at leveldb/db/db_impl.cc:1141
10[#6](/bitcoin-bitcoin/6/) 0x00005555556daca4 in CDBWrapper::Read<char, uint256> (this=0x55555ead4898, key=@0x55555599a454: 66 'B', value=...) at ./dbwrapper.h:236
11[#7](/bitcoin-bitcoin/7/) 0x00005555556d315f in CCoinsViewDB::GetBestBlock (this=<optimized out>) at txdb.cpp:71
12[#8](/bitcoin-bitcoin/8/) 0x0000555555852dc2 in CCoinsViewBacked::GetBestBlock (this=<optimized out>) at coins.cpp:25
13[#9](/bitcoin-bitcoin/9/) 0x0000555555852f5d in CCoinsViewCache::GetBestBlock (this=0x555557473520) at coins.cpp:137
14[#10](/bitcoin-bitcoin/10/) 0x00005555555b8025 in AppInitMain () at init.cpp:1510
15[#11](/bitcoin-bitcoin/11/) 0x000055555559695c in AppInit (argc=1, argv=<optimized out>) at bitcoind.cpp:160
16[#12](/bitcoin-bitcoin/12/) 0x0000555555589d9f in main (argc=1, argv=0x7fffffffdb78) at bitcoind.cpp:186
Here’s another thing that’s interesting about this. The bad file was named 600070.ldb
. When I look at the directory contents of the chainstate directory, there’s something kind of unexpected:
0-rw-rw-r--. 1 evan evan 86225 Mar 12 01:03 600060.ldb
1-rw-rw-r--. 1 evan evan 106438 Mar 12 01:03 600061.ldb
2-rw-rw-r--. 1 evan evan 81855 Mar 12 01:03 600062.ldb
3-rw-rw-r--. 1 evan evan 118547 Mar 12 01:03 600063.ldb
4-rw-rw-r--. 1 evan evan 89644 Mar 12 01:03 600064.ldb
5-rw-rw-r--. 1 evan evan 112220 Mar 12 01:03 600065.ldb
6-rw-rw-r--. 1 evan evan 86420 Mar 12 01:03 600066.ldb
7-rw-rw-r--. 1 evan evan 97277 Mar 12 01:03 600067.ldb
8-rw-rw-r--. 1 evan evan 106997 Mar 12 01:03 600068.ldb
9-rw-rw-r--. 1 evan evan 142335 Mar 12 01:03 600070.ldb <--- the corrupted file
10-rw-rw-r--. 1 evan evan 144482 Mar 12 01:03 600071.ldb
11-rw-rw-r--. 1 evan evan 150084 Mar 12 01:03 600072.ldb
12-rw-rw-r--. 1 evan evan 154317 Mar 12 01:03 600073.ldb
13-rw-rw-r--. 1 evan evan 157988 Mar 12 01:03 600074.ldb
14-rw-rw-r--. 1 evan evan 151167 Mar 12 01:03 600075.ldb
15-rw-rw-r--. 1 evan evan 224717 Mar 12 01:03 600076.ldb
16-rw-rw-r--. 1 evan evan 149947 Mar 12 01:03 600077.ldb
17-rw-rw-r--. 1 evan evan 162078 Mar 12 01:03 600078.ldb
18-rw-rw-r--. 1 evan evan 178024 Mar 12 01:03 600079.ldb
19-rw-rw-r--. 1 evan evan 140579 Mar 12 01:03 600080.ldb
20-rw-rw-r--. 1 evan evan 145243 Mar 12 01:03 600081.ldb
21-rw-rw-r--. 1 evan evan 143759 Mar 12 01:03 600082.ldb
22-rw-rw-r--. 1 evan evan 72766 Mar 12 01:03 600083.ldb
23-rw-rw-r--. 1 evan evan 206309 Mar 12 01:04 600084.ldb
As you can see, all of these files go up sequentially. Except there’s no 600069.ldb
file, which just so happens to be the file that is numerically just before the corrupted file. I don’t know if this is actually significant or not. It might just be an implementation detail in LevelDB (the “best block” should be lexicographically first in the database, so maybe the first block is handled slightly different wrt file numbering).
I’ve saved a copy of this database locally for further investigation.
Unclean Shutdown
Same error about “corrupted compressed block contents”. This wasn’t a clean shutdown it was just from me killing the process, but this is still not supposed to cause database corruption as LevelDB is supposed to maintain a consistent writeahead log.
The stack trace is different: now the corrupted block is the one containing the head block H
instead of the best block:
0(gdb) bt
1[#0](/bitcoin-bitcoin/0/) leveldb::ReadBlock (file=file@entry=0x15e07d90, options=..., handle=..., result=result@entry=0x7fffffffa270) at leveldb/table/format.cc:122
2[#1](/bitcoin-bitcoin/1/) 0x00000000007ec1f5 in leveldb::Table::Open (options=..., file=0x15e07d90, size=size@entry=1199067, table=table@entry=0x7fffffffa370) at leveldb/table/table.cc:65
3[#2](/bitcoin-bitcoin/2/) 0x00000000007d9a14 in leveldb::TableCache::FindTable (this=this@entry=0x83d1640, file_number=<optimized out>, file_size=1199067, handle=handle@entry=0x7fffffffa428)
4 at leveldb/db/table_cache.cc:64
5[#3](/bitcoin-bitcoin/3/) 0x00000000007d9cdd in leveldb::TableCache::Get (this=0x83d1640, options=..., file_number=<optimized out>, file_size=<optimized out>, k=..., arg=0x7fffffffa570,
6 saver=0x7db9c0 <leveldb::SaveValue(void*, leveldb::Slice const&, leveldb::Slice const&)>) at leveldb/db/table_cache.cc:112
7[#4](/bitcoin-bitcoin/4/) 0x00000000007e34d9 in leveldb::Version::Get (this=this@entry=0xfe1b90, options=..., k=..., value=value@entry=0x7fffffffa7b0, stats=stats@entry=0x7fffffffa620)
8 at leveldb/db/version_set.cc:408
9[#5](/bitcoin-bitcoin/5/) 0x00000000007c7d39 in leveldb::DBImpl::Get (this=0x15df55c0, options=..., key=..., value=0x7fffffffa7b0) at leveldb/db/db_impl.cc:1141
10[#6](/bitcoin-bitcoin/6/) 0x0000000000602050 in CDBWrapper::Read<char, std::vector<uint256, std::allocator<uint256> > > (this=0x15df59e8, key=@0x884eea: 72 'H',
11 value=std::vector of length 0, capacity 0) at ./dbwrapper.h:236
12[#7](/bitcoin-bitcoin/7/) 0x00000000005fc8fb in CCoinsViewDB::GetHeadBlocks (this=0x15df59e0) at txdb.cpp:78
13[#8](/bitcoin-bitcoin/8/) 0x000000000065c965 in CChainState::ReplayBlocks (this=0xcd7f00 <g_chainstate>, params=..., view=0x15df59e0) at validation.cpp:4000
14[#9](/bitcoin-bitcoin/9/) 0x000000000065d761 in ReplayBlocks (params=..., view=0x15df59e0) at validation.cpp:4060
15[#10](/bitcoin-bitcoin/10/) 0x000000000044172b in AppInitMain () at init.cpp:1523
16[#11](/bitcoin-bitcoin/11/) 0x00000000004194e7 in AppInit (argc=1, argv=0x7fffffffe278) at bitcoind.cpp:160
17[#12](/bitcoin-bitcoin/12/) 0x0000000000419afc in main (argc=1, argv=0x7fffffffe278) at bitcoind.cpp:186
It’s interesting that this should be the last block in the database, and the other stack trace was for the first block in the database. This one doesn’t have anything unusual in the directory listing, other than that I’ll note that the corrupted file was the last one numerically:
0-rw-r--r--. 1 evan evan 1413847 Mar 11 23:27 006615.ldb
1-rw-r--r--. 1 evan evan 1484317 Mar 11 23:27 006616.ldb
2-rw-r--r--. 1 evan evan 1177535 Mar 11 23:27 006617.ldb
3-rw-r--r--. 1 evan evan 1209685 Mar 11 23:27 006618.ldb
4-rw-r--r--. 1 evan evan 1497862 Mar 11 23:27 006619.ldb
5-rw-r--r--. 1 evan evan 1318973 Mar 11 23:27 006620.ldb
6-rw-r--r--. 1 evan evan 1269698 Mar 11 23:27 006621.ldb
7-rw-r--r--. 1 evan evan 1358783 Mar 11 23:27 006622.ldb
8-rw-r--r--. 1 evan evan 1199067 Mar 11 23:27 006623.ldb <--- the corrupted file
9-rw-------. 1 evan evan 0 Mar 14 17:34 006725.log
10-rw-------. 1 evan evan 16 Mar 14 17:34 CURRENT
11-rw-------. 1 evan evan 0 Mar 11 15:53 LOCK
12-rw-r--r--. 1 evan evan 181 Mar 12 00:52 LOG
13-rw-r--r--. 1 evan evan 181 Mar 12 00:38 LOG.old
14-rw-------. 1 evan evan 27472 Mar 14 17:34 MANIFEST-006724
I also saved a copy of this database locally for further investigation.