LevelDB “corrupted compressed block contents” errors #12690

issue eklitzke openend this issue on March 15, 2018
  1. eklitzke commented at 0:46 am on March 15, 2018: contributor

    I’ve been running bitcoind -reindex-chainstate very often for the LevelDB work I’ve been doing, and twice now I have gotten corrupted LevelDB databases. This is on two different computers (my desktop and my laptop). In at least once case I know there was a clean shut down, and the case with a clean shutdown happened on a build I produced from the official bitcoin-0.16.0.tar.gz release tarball with no patches applied.

    Both cases exhibit something very unusual: the CRC32 for the index block is correct, but LevelDB thinks the block is snappy compressed (which it definitely is not). As of #12659 we are now logging better LevelDB error messages, so if other users encounter this issue they’ll see something like the following in their logs:

    02018-03-14 01:36:47 LevelDB read failure: Corruption: corrupted compressed block contents
    12018-03-14 01:36:47 Corruption: corrupted compressed block contents
    22018-03-14 01:36:47 Database corrupted
    

    (If the error is a bad CRC, instead the error message will say “Corruption: block checksum mismatch”.)

    What’s interesting about this is that as you can see from the “corrupted compressed block contents” message, LevelDB thought the block was compressed. But that’s definitely not the case: we always have snappy compression disabled (see GetOptions() in dbwrapper.cpp). I’m looking at the LevelDB code, and the compression flag is encoded within blocks (I guess so you can change the snappy compression option and not have to rebuild your database).

    The conclusion I have to draw from this is that this could be a LevelDB bug, as LevelDB wrote out data with a valid checksum, but wasn’t able to read the data it wrote. If there are other users who have LevelDB corruption problems, I would like them to build bitcoin from master (or at least with 3d16f581538b0974853e820508e8b3093269d2fd applied) and let me know if the error string contains the message “corrupted compressed block contents”.

    I am using Linux 4.15.6 on an ext4 filesystem with data=ordered in both cases.

    Clean Shutdown

    Here are the logs for the corrupted database on my desktop that had a clean shutdown: https://gist.github.com/eklitzke/166665da7eebfaedfa9ab8d0947ee5c8

    As I mentioned earlier, this is from a bitcoind I compiled from the official source release on bitcoincore.org without any patches applied. This isn’t exactly the same as the Gitian release because I have a newer version of GCC, but I would be surprised if that was related to the corruption issue (although it’s not impossible).

    0$ sha1sum bitcoin-0.16.0.tar.gz 
    1b59539e55782f7f697ec4702bfda118274fd0138  bitcoin-0.16.0.tar.gz
    

    In GDB I can see that the bad block was the one that’s supposed to contain DB_BEST_BLOCK:

     0(gdb) bt
     1[#0](/bitcoin-bitcoin/0/)  leveldb::ReadBlock (file=file@entry=0x55555ead3a20, options=..., handle=..., result=result@entry=0x7fffffffc090) at leveldb/table/format.cc:122
     2[#1](/bitcoin-bitcoin/1/)  0x000055555590a575 in leveldb::Table::Open (options=..., file=0x55555ead3a20, size=size@entry=142335, table=table@entry=0x7fffffffc190) at leveldb/table/table.cc:65
     3[#2](/bitcoin-bitcoin/2/)  0x00005555558f7d94 in leveldb::TableCache::FindTable (this=this@entry=0x5555574918f0, file_number=<optimized out>, file_size=142335, handle=handle@entry=0x7fffffffc248)
     4    at leveldb/db/table_cache.cc:64
     5[#3](/bitcoin-bitcoin/3/)  0x00005555558f805d in leveldb::TableCache::Get (this=0x5555574918f0, options=..., file_number=<optimized out>, file_size=<optimized out>, k=..., arg=0x7fffffffc390, 
     6    saver=0x5555558f9d40 <leveldb::SaveValue(void*, leveldb::Slice const&, leveldb::Slice const&)>) at leveldb/db/table_cache.cc:112
     7[#4](/bitcoin-bitcoin/4/)  0x0000555555901859 in leveldb::Version::Get (this=this@entry=0x5555574b50d0, options=..., k=..., value=value@entry=0x7fffffffc630, stats=stats@entry=0x7fffffffc440)
     8    at leveldb/db/version_set.cc:408
     9[#5](/bitcoin-bitcoin/5/)  0x00005555558e60b9 in leveldb::DBImpl::Get (this=0x55555eac6630, options=..., key=..., value=0x7fffffffc630) at leveldb/db/db_impl.cc:1141
    10[#6](/bitcoin-bitcoin/6/)  0x00005555556daca4 in CDBWrapper::Read<char, uint256> (this=0x55555ead4898, key=@0x55555599a454: 66 'B', value=...) at ./dbwrapper.h:236
    11[#7](/bitcoin-bitcoin/7/)  0x00005555556d315f in CCoinsViewDB::GetBestBlock (this=<optimized out>) at txdb.cpp:71
    12[#8](/bitcoin-bitcoin/8/)  0x0000555555852dc2 in CCoinsViewBacked::GetBestBlock (this=<optimized out>) at coins.cpp:25
    13[#9](/bitcoin-bitcoin/9/)  0x0000555555852f5d in CCoinsViewCache::GetBestBlock (this=0x555557473520) at coins.cpp:137
    14[#10](/bitcoin-bitcoin/10/) 0x00005555555b8025 in AppInitMain () at init.cpp:1510
    15[#11](/bitcoin-bitcoin/11/) 0x000055555559695c in AppInit (argc=1, argv=<optimized out>) at bitcoind.cpp:160
    16[#12](/bitcoin-bitcoin/12/) 0x0000555555589d9f in main (argc=1, argv=0x7fffffffdb78) at bitcoind.cpp:186
    

    Here’s another thing that’s interesting about this. The bad file was named 600070.ldb. When I look at the directory contents of the chainstate directory, there’s something kind of unexpected:

     0-rw-rw-r--. 1 evan evan   86225 Mar 12 01:03 600060.ldb
     1-rw-rw-r--. 1 evan evan  106438 Mar 12 01:03 600061.ldb
     2-rw-rw-r--. 1 evan evan   81855 Mar 12 01:03 600062.ldb
     3-rw-rw-r--. 1 evan evan  118547 Mar 12 01:03 600063.ldb
     4-rw-rw-r--. 1 evan evan   89644 Mar 12 01:03 600064.ldb
     5-rw-rw-r--. 1 evan evan  112220 Mar 12 01:03 600065.ldb
     6-rw-rw-r--. 1 evan evan   86420 Mar 12 01:03 600066.ldb
     7-rw-rw-r--. 1 evan evan   97277 Mar 12 01:03 600067.ldb
     8-rw-rw-r--. 1 evan evan  106997 Mar 12 01:03 600068.ldb
     9-rw-rw-r--. 1 evan evan  142335 Mar 12 01:03 600070.ldb    <--- the corrupted file
    10-rw-rw-r--. 1 evan evan  144482 Mar 12 01:03 600071.ldb
    11-rw-rw-r--. 1 evan evan  150084 Mar 12 01:03 600072.ldb
    12-rw-rw-r--. 1 evan evan  154317 Mar 12 01:03 600073.ldb
    13-rw-rw-r--. 1 evan evan  157988 Mar 12 01:03 600074.ldb
    14-rw-rw-r--. 1 evan evan  151167 Mar 12 01:03 600075.ldb
    15-rw-rw-r--. 1 evan evan  224717 Mar 12 01:03 600076.ldb
    16-rw-rw-r--. 1 evan evan  149947 Mar 12 01:03 600077.ldb
    17-rw-rw-r--. 1 evan evan  162078 Mar 12 01:03 600078.ldb
    18-rw-rw-r--. 1 evan evan  178024 Mar 12 01:03 600079.ldb
    19-rw-rw-r--. 1 evan evan  140579 Mar 12 01:03 600080.ldb
    20-rw-rw-r--. 1 evan evan  145243 Mar 12 01:03 600081.ldb
    21-rw-rw-r--. 1 evan evan  143759 Mar 12 01:03 600082.ldb
    22-rw-rw-r--. 1 evan evan   72766 Mar 12 01:03 600083.ldb
    23-rw-rw-r--. 1 evan evan  206309 Mar 12 01:04 600084.ldb
    

    As you can see, all of these files go up sequentially. Except there’s no 600069.ldb file, which just so happens to be the file that is numerically just before the corrupted file. I don’t know if this is actually significant or not. It might just be an implementation detail in LevelDB (the “best block” should be lexicographically first in the database, so maybe the first block is handled slightly different wrt file numbering).

    I’ve saved a copy of this database locally for further investigation.

    Unclean Shutdown

    Same error about “corrupted compressed block contents”. This wasn’t a clean shutdown it was just from me killing the process, but this is still not supposed to cause database corruption as LevelDB is supposed to maintain a consistent writeahead log.

    The stack trace is different: now the corrupted block is the one containing the head block H instead of the best block:

     0(gdb) bt
     1[#0](/bitcoin-bitcoin/0/)  leveldb::ReadBlock (file=file@entry=0x15e07d90, options=..., handle=..., result=result@entry=0x7fffffffa270) at leveldb/table/format.cc:122
     2[#1](/bitcoin-bitcoin/1/)  0x00000000007ec1f5 in leveldb::Table::Open (options=..., file=0x15e07d90, size=size@entry=1199067, table=table@entry=0x7fffffffa370) at leveldb/table/table.cc:65
     3[#2](/bitcoin-bitcoin/2/)  0x00000000007d9a14 in leveldb::TableCache::FindTable (this=this@entry=0x83d1640, file_number=<optimized out>, file_size=1199067, handle=handle@entry=0x7fffffffa428)
     4    at leveldb/db/table_cache.cc:64
     5[#3](/bitcoin-bitcoin/3/)  0x00000000007d9cdd in leveldb::TableCache::Get (this=0x83d1640, options=..., file_number=<optimized out>, file_size=<optimized out>, k=..., arg=0x7fffffffa570, 
     6    saver=0x7db9c0 <leveldb::SaveValue(void*, leveldb::Slice const&, leveldb::Slice const&)>) at leveldb/db/table_cache.cc:112
     7[#4](/bitcoin-bitcoin/4/)  0x00000000007e34d9 in leveldb::Version::Get (this=this@entry=0xfe1b90, options=..., k=..., value=value@entry=0x7fffffffa7b0, stats=stats@entry=0x7fffffffa620)
     8    at leveldb/db/version_set.cc:408
     9[#5](/bitcoin-bitcoin/5/)  0x00000000007c7d39 in leveldb::DBImpl::Get (this=0x15df55c0, options=..., key=..., value=0x7fffffffa7b0) at leveldb/db/db_impl.cc:1141
    10[#6](/bitcoin-bitcoin/6/)  0x0000000000602050 in CDBWrapper::Read<char, std::vector<uint256, std::allocator<uint256> > > (this=0x15df59e8, key=@0x884eea: 72 'H', 
    11    value=std::vector of length 0, capacity 0) at ./dbwrapper.h:236
    12[#7](/bitcoin-bitcoin/7/)  0x00000000005fc8fb in CCoinsViewDB::GetHeadBlocks (this=0x15df59e0) at txdb.cpp:78
    13[#8](/bitcoin-bitcoin/8/)  0x000000000065c965 in CChainState::ReplayBlocks (this=0xcd7f00 <g_chainstate>, params=..., view=0x15df59e0) at validation.cpp:4000
    14[#9](/bitcoin-bitcoin/9/)  0x000000000065d761 in ReplayBlocks (params=..., view=0x15df59e0) at validation.cpp:4060
    15[#10](/bitcoin-bitcoin/10/) 0x000000000044172b in AppInitMain () at init.cpp:1523
    16[#11](/bitcoin-bitcoin/11/) 0x00000000004194e7 in AppInit (argc=1, argv=0x7fffffffe278) at bitcoind.cpp:160
    17[#12](/bitcoin-bitcoin/12/) 0x0000000000419afc in main (argc=1, argv=0x7fffffffe278) at bitcoind.cpp:186
    

    It’s interesting that this should be the last block in the database, and the other stack trace was for the first block in the database. This one doesn’t have anything unusual in the directory listing, other than that I’ll note that the corrupted file was the last one numerically:

     0-rw-r--r--. 1 evan evan 1413847 Mar 11 23:27 006615.ldb
     1-rw-r--r--. 1 evan evan 1484317 Mar 11 23:27 006616.ldb
     2-rw-r--r--. 1 evan evan 1177535 Mar 11 23:27 006617.ldb
     3-rw-r--r--. 1 evan evan 1209685 Mar 11 23:27 006618.ldb
     4-rw-r--r--. 1 evan evan 1497862 Mar 11 23:27 006619.ldb
     5-rw-r--r--. 1 evan evan 1318973 Mar 11 23:27 006620.ldb
     6-rw-r--r--. 1 evan evan 1269698 Mar 11 23:27 006621.ldb
     7-rw-r--r--. 1 evan evan 1358783 Mar 11 23:27 006622.ldb
     8-rw-r--r--. 1 evan evan 1199067 Mar 11 23:27 006623.ldb     <--- the corrupted file
     9-rw-------. 1 evan evan       0 Mar 14 17:34 006725.log
    10-rw-------. 1 evan evan      16 Mar 14 17:34 CURRENT
    11-rw-------. 1 evan evan       0 Mar 11 15:53 LOCK
    12-rw-r--r--. 1 evan evan     181 Mar 12 00:52 LOG
    13-rw-r--r--. 1 evan evan     181 Mar 12 00:38 LOG.old
    14-rw-------. 1 evan evan   27472 Mar 14 17:34 MANIFEST-006724
    

    I also saved a copy of this database locally for further investigation.

  2. fanquake added the label Block storage on Mar 15, 2018
  3. laanwj added the label Data corruption on Mar 15, 2018
  4. laanwj commented at 3:42 pm on March 15, 2018: member
    FWIW when you open bitcoind’s database even once with a leveldb that was built with snappy compression enabled, bitcoind will be unable to open it from then on due to compression. But that doesn’t seem to be the issue here.
  5. MarcoFalke commented at 1:31 am on April 26, 2020: member

    Is this still an issue with a recent version of Bitcoin Core? Also, I’d suggest to check your hardware for defects.

    Bitcoin Core makes heavy use of CPU, RAM and disk IO. Hardware defects might only become visible when running Bitcoin Core. You might want to check your hardware for defects.

    • memtest86 to check your RAM
    • to check the CPU behaviour under load, use linpack or Prime95
    • to test your storage device use smartctl or CrystalDiskInfo

    Source: https://bitcoin.stackexchange.com/a/12206

  6. MarcoFalke closed this on Apr 26, 2020

  7. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me