LevelDB corruption with 0.8.x Mac client #2770

toffoo opened this issue on June 15, 2013
  1. toffoo commented at 0:27 am on June 15, 2013: none

    I’ve run bitcoin-qt Mac client on the latest OSX version since 0.3.x thru to 0.7.2 with literally zero problems ever, on essentially this same hardware setup.

    Since the 0.8 update and switch to LevelDB none of the three Mac client releases have worked stably for me and I’ve had to downgrade to 0.7.2 (with the May 15 workaround) to maintain a stable bitcoin wallet.

    Current setup is: MacBook Pro Retina, OSX 10.8.4, bitcoin-qt 0.8.2

    I saw that some of the known corruption issues/bugs were fixed/closed with the 0.8.2 release, so I decided to try the upgrade again. After re-indexing and working fine for hours at a time (much better than 0.8.1 at least), upon restart I get “Error opening block database. Do you want to rebuild the block database?” which of course I don’t want to do because it takes forever, even on this world’s fastest Mac with SSD drive. This has happened 6 times now, and the interesting lines in debug.log are:

    Verifying last 288 blocks at level 3
    LevelDB read failure: Corruption: block checksum mismatch

    More details here: https://bitcointalk.org/index.php?topic=155140.0

    Downgrading to 0.7.2 again, please help permanently fix this in the next 0.8.x release.

  2. orb commented at 7:15 pm on June 16, 2013: none
    I am also still having this problem again with 0.8.2. Previously I had the problem with prior 0.8 builds and had to downgrade. 0.8.2 looked good for a while, but I’ve once again gotten this.
  3. gmaxwell commented at 1:43 am on June 17, 2013: contributor
    @rebroad you’re using a mac?
  4. gmaxwell commented at 2:01 am on June 17, 2013: contributor
    I don’t believe you’ve ever previously disclosed that you were running it on truecrypt. Can you reproduce your failures with it not on truecrypt?
  5. gdvine commented at 5:54 pm on July 9, 2013: none
    Bump. I am also getting this issue. OS X 0.8.3
  6. Diapolo commented at 10:54 am on July 12, 2013: none
    I still think it’s time to upgrade to the latest LevelDB code and see if that helps.
  7. gmaxwell commented at 10:56 am on July 12, 2013: contributor
    @Diapolo If a new version of LevelDB has changed in ways we don’t understand, then those changes could result in chain forking. If we understand how it’s changed, we could reason about how the changes may or may not have helped OSX/Windows, and there is no need to just guess.
  8. Diapolo commented at 11:09 am on July 12, 2013: none
    @gmaxwell You are most likely correct in what you say, but doing nothing about the crashes seems to just bring people away from bitcoind or Bitcoin-Qt in the end :-/.
  9. gavinandresen commented at 5:40 am on August 12, 2013: contributor

    I got a chainstate corruption this morning on my Mac, and have spent most of the day debugging. @sipa @gmaxwell @jgarzik : here is what I’ve found so far:

    My MANIFEST- file is corrupted (corrupted file: http://skypaint.com/bitcoin/MANIFEST-076191 ). I wrote a little python tool to dump the log records (http://skypaint.com/bitcoin/dumplogfile.py ), and something odd happened at bytes 65,506-65,536: there are 30 zero bytes.

    So the records look like (output from my dumplogfile.py ):

    FULL length 1012 (position: 64490)
    0 length 0 (position: 65509)
    0 length 0 (position: 65516)
    0 length 0 (position: 65523)
    BLOCK 2
    LAST length 1121 (position: 65536)

    … and leveldb is very upset that there is no FIRST record before the LAST record at the beginning of block 2 (block as in “leveldb 32,768-byte block of records”, not bitcoin block).

    WHY those 30 zero bytes were written…. I dunno.
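
For readers who want to inspect a MANIFEST or log file themselves, here is a minimal sketch of a record dumper. It is not the dumplogfile.py linked above; it assumes the record layout described in LevelDB’s log format documentation (32,768-byte blocks, each record starting with a 4-byte checksum, a 2-byte little-endian length and a 1-byte type). The function names `dump_log_records` and `find_orphan_fragments` are hypothetical, and checksums are read but not verified.

```python
import struct

BLOCK_SIZE = 32768   # LevelDB log block size in bytes
HEADER_SIZE = 7      # 4-byte checksum + 2-byte length + 1-byte type
RECORD_TYPES = {0: "ZERO", 1: "FULL", 2: "FIRST", 3: "MIDDLE", 4: "LAST"}


def dump_log_records(data):
    """Return (position, type name, payload length) for each record header."""
    records = []
    pos = 0
    while pos < len(data):
        block_left = BLOCK_SIZE - (pos % BLOCK_SIZE)
        if block_left < HEADER_SIZE:
            # Too little room for a header: the rest of the block is a trailer.
            pos += block_left
            continue
        if pos + HEADER_SIZE > len(data):
            break  # truncated header at end of file
        _checksum, length, rtype = struct.unpack_from("<IHB", data, pos)
        records.append((pos, RECORD_TYPES.get(rtype, "UNKNOWN"), length))
        pos += HEADER_SIZE + length
    return records


def find_orphan_fragments(records):
    """Return positions of MIDDLE/LAST records with no preceding FIRST."""
    orphans = []
    in_fragment = False
    for pos, name, _length in records:
        if name == "FIRST":
            in_fragment = True
        elif name in ("MIDDLE", "LAST"):
            if not in_fragment:
                orphans.append(pos)
            if name == "LAST":
                in_fragment = False
        elif name == "FULL":
            in_fragment = False
    return orphans
```

Reading the file with `open(path, "rb").read()` and passing the bytes to `dump_log_records` gives a listing comparable to the one above; a run of all-zero headers shows up as `ZERO` records of length 0.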

  10. jgarzik commented at 5:44 am on August 12, 2013: contributor

    …or not written, as mmap’d pages often default to all-zero.

    Gah.

  11. gavinandresen commented at 6:12 am on August 12, 2013: contributor
  12. Diapolo commented at 7:20 am on August 12, 2013: none
    @gavinandresen Did this happen with a build, which includes the recent LevelDB update?
  13. gavinandresen commented at 11:58 pm on August 12, 2013: contributor

    @Diapolo: yes, I got corruption running git HEAD which includes latest LevelDB.

    I’ve reached the edge of my LevelDB/filesystem knowledge, so I’m done debugging this for now and moving on to other things.

  14. jgarzik commented at 0:03 am on August 13, 2013: contributor
    @gavinandresen Were you able to reliably reproduce the corruption?
  15. toffoo commented at 0:18 am on August 13, 2013: none

    Guys, thanks again for finally looking into this and acknowledging that something is seriously amiss.

    Given that it seems we have a fairly serious and profound bug to squash here, and we’ve apparently (momentarily?) reached the edge of your time/knowledge to debug this, would it make sense as a stopgap measure to release a backport v0.7.3 Mac client?

    https://bitcointalk.org/index.php?topic=199699.msg2128991#msg2128991

    Windows and Linux versions were released back in May, even though their v0.8.x releases seem to work pretty good. From what I’ve seen, none of the v0.8.x Mac releases with LevelDB work reliably.

    I’ve made some noise after each of the v0.8.x Mac releases that they are not working reliably, but this is the first time I’ve seen any of the devs semi-publicly acknowledge it.

    I’m sure there are plenty of Mac users downloading and running these buggy releases every day, with no idea there may be a problem, and the number of posts to the bitcointalk Tech Support forum about this confirm it.

    I am well-informed and technical enough to know to use v0.7.2 and install the “May 15 workaround”, but I’m sure there are PLENTY of other Mac users who are not.

    A public announcement that the existing v0.8.x Mac releases are bad and an official release of a v0.7.3 Mac backport I think is probably the right next step.

  16. luke-jr commented at 0:29 am on August 13, 2013: member
    @toffoo I am not comfortable releasing a 0.7.3 final until the May15 hardfork has actually split from the older clients and the backport has proven reliable. I also don’t have a Mac, so I cannot even build v0.7.3rc3 for it.
  17. gavinandresen commented at 1:47 am on August 13, 2013: contributor

    @toffoo : I’d rather drop Bitcoin-Qt on OSX than recommend an old release; we don’t have the resources to support multiple releases.

    Whether this bug is serious enough to drop the OSX release until somebody figures out what is wrong and fixes it is debatable. I’ve had two corruptions in the last six months; the seriousness of this problem seems to differ depending either on particular hardware or luck (I have no idea which).

  18. dpkp commented at 5:49 am on August 15, 2013: none

    Could this be a result of relying on fsync on Mac OSX for write guarantees?

    https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/fsync.2.html

    Note that while fsync() will flush all data from the host to the drive (i.e. the “permanent storage device”), the drive itself may not physically write the data to the platters for quite some time and it may be written in an out-of-order sequence.

    Specifically, if the drive loses power or the OS crashes, the application may find that only some or none of their data was written. The disk drive may also re-order the data so that later writes may be present, while earlier writes are not.

    If so, the fix would be to use fcntl F_FULLFSYNC instead, although that can cause significant slowdown as it will flush all buffered data, not just that of the particular file.

    https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/fcntl.2.html

    F_FULLFSYNC Does the same thing as fsync(2) then asks the drive to flush all buffered data to the permanent storage device (arg is ignored). This is currently implemented on HFS, MS-DOS (FAT), and Universal Disk Format (UDF) file systems. The operation may take quite a while to complete. Certain FireWire drives have also been known to ignore the request to flush their buffered data.

    other references: http://lists.apple.com/archives/Darwin-dev/2005/Feb/msg00087.html http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/

    If this is the cause, I believe the fix is a simple patch to leveldb’s port/port_posix.h along the lines of:

    #if defined(OS_MACOSX)
    #define fdatasync(fd) fcntl(fd, F_FULLFSYNC, 0)
    #endif
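
The same idea can be tried outside LevelDB. The sketch below is an illustration in Python, not the proposed patch: it asks for `F_FULLFSYNC` where the platform’s `fcntl` module defines it and falls back to a plain `fsync` otherwise. The helper name `full_sync` is hypothetical.

```python
import fcntl
import os


def full_sync(fd):
    """Flush fd as durably as the platform allows; return the method used."""
    # F_FULLFSYNC is only defined on platforms such as OS X.
    full_fsync = getattr(fcntl, "F_FULLFSYNC", None)
    if full_fsync is not None:
        try:
            fcntl.fcntl(fd, full_fsync)
            return "F_FULLFSYNC"
        except OSError:
            # Some filesystems or drives reject the request; fall through.
            pass
    os.fsync(fd)
    return "fsync"
```

A caller would write, flush any userspace buffer, and then call `full_sync(f.fileno())`. As noted above, the full flush can be much slower than `fsync` because it covers all buffered data on the drive.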

  19. mikehearn commented at 8:05 am on August 15, 2013: contributor
    I think we should try that patch. The fact that fsync doesn’t fsync is … unfortunate. I guess if writes can arrive on disk out of order and at some arbitrary later time, that would line up with what I’ve seen, which is that the corruptions only happen when something goes wrong at the OS level, like a kernel panic during resume from suspend. I will ask Sanjay to weigh in on the LevelDB bug as well. He is not a Mac expert but could probably provide advice.
  20. gavinandresen commented at 10:47 pm on August 21, 2013: contributor
    Please try: https://sourceforge.net/projects/bitcoin/files/Bitcoin/bitcoin-0.8.4/test/ … and let us know if the corruption issues disappear.
  21. medicinebottle commented at 2:42 pm on August 25, 2013: none
    I gave 0.8.4 a try but I am still getting the same error. I am just one user though, and not a programmer, so better wait for another response to confirm.
  22. sipa commented at 4:17 pm on August 25, 2013: member
    @medicinebottle Did you restart from scratch? (start with -reindex option, or wiped your blocks + chainstate directories).
  23. toffoo commented at 6:28 pm on August 25, 2013: none

    Hi, yes I would also like to try out this new test release, but I was curious which would be the most scientific method of indexing the block data:

    1. reindex my existing v0.7.2 blockchain
    2. download jgarzik’s blockchain bittorrent bootstrap.dat https://bitcointalk.org/index.php?topic=145386.0
    3. re-download from scratch the entire blockchain with the fresh install of v0.8.4test

    In all my previous tests of v0.8.x releases I did method #1.

    For this one I was thinking of trying #2. Would that in any way ruin the scientific integrity of this test?

    How long should a complete sync of the blockchain thru bitcoin-qt take these days?

  24. toffoo commented at 7:45 am on August 30, 2013: none

    As per sipa’s suggestion, I’ve gone with #1 above and reindexed my existing v0.7.2 blockchain.

    Almost 3 days now running v0.8.4rc2 here and: zero corruption

    Thanks again guys for all this Mac client attention. Whatever you did seems to be working.

  25. sipa commented at 9:21 am on August 30, 2013: member
    @toffoo If you’re up for another experiment to make sure the 0.8.4 changes are the cause here: can you try doing a reindex with 0.8.3 again, and see whether it fails?
  26. toffoo commented at 6:44 am on September 2, 2013: none

    Okay, done. This afternoon I reindexed with 0.8.3 no problem and it has continued to run (on/off several times) tonight with no crashes or corruption.

    When I started this Issue it was 0.8.2 that had corrupted 6 times on me. I don’t think I ever tried 0.8.3 when it came out, because I understood that nothing had changed that might address this issue, and I was fed up after all my problems with 0.8.0, 0.8.1, and 0.8.2.

    I will continue running both this 0.8.3 and 0.8.4rc2 on and off and will report back here if I can get either to corrupt or crash.

  27. gmaxwell commented at 7:19 am on September 2, 2013: contributor
    ::sigh:: we fixed a file descriptor exhaustion bug that caused corruption on OSX previously.
  28. toffoo commented at 3:31 pm on September 4, 2013: none

    Ok, experiment complete: 0.8.3 just corrupted on me. Reindexing worked fine and ran perfect for about 2 days, now upon restart today it opened with “Database corrupted”, closed “unexpectedly”, and didn’t offer to reindex.

    Like before, this was the only interesting line in debug.log:

    LevelDB read failure: Corruption: block checksum mismatch
    *** System error: Database corrupted

    I will go install the newly released 0.8.4 and report again here if this problem resurfaces again.

  29. toffoo commented at 3:59 pm on September 4, 2013: none

    Ok, now things are getting interesting. With the newly installed 0.8.4, first it came up saying my wallet.dat is corrupted (that’s a first) but fortunately I have a backup copy.

    With restored wallet.dat in place, now it offers to re-index, but now actually crashes/corrupts during re-index (at least during the beginning!):

    LevelDB read failure: Corruption: block checksum mismatch
    *** System error: Database corrupted
    Reindexing block file blk00000.dat...
    LevelDB read failure: Corruption: block checksum mismatch
    *** System error: Database corrupted
    ERROR: AcceptBlock() : AddToBlockIndex failed
    ERROR: ProcessBlock() : AcceptBlock FAILED
    Reindexing block file blk00001.dat...

    So I’ve tried this a few times now and I cannot get it to re-index the crashed 0.8.3 blockchain.

    So now I will try again from scratch with a fresh copy of my good 0.7.2 blockchain.

  30. toffoo commented at 5:17 am on September 14, 2013: none

    I just tried out the new 0.8.5 Mac release by reindexing my good 0.7.2 blockchain.

    It reindexed fine in about 3 hours and I left it running with no problems for several hours.

    Upon the first restart I got the “Corrupted block database detected. Do you want to rebuild the block database now?” message and this (new!) error in debug.log:

    Opened LevelDB successfully
    LoadBlockIndexDB(): last block file = 6
    LoadBlockIndexDB(): last block file info: CBlockFileInfo(blocks=42, size=4201183, heights=257815...257856, time=2013-09-13...2013-09-14)
    LoadBlockIndexDB(): transaction index disabled
    LoadBlockIndexDB(): hashBestChain=0000000000000027ff96c78ed8363a5b300cbaf4cd04d2188ea2a696372f5561 height=257856 date=2013-09-14 03:01:03
    init message: Verifying blocks...
    Verifying last 288 blocks at level 3
    ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
    ERROR: VerifyDB() : *** block.ReadFromDisk failed at 257814, hash=0000000000000002fad60b4a9a338aa7e736d4305e15d631d5fb38986f58cc41
    Flush(false)
    DBFlush(false) ended 0ms
    StopNode()
    Flushed 0 addresses to peers.dat 1ms
    Committing 8731 changed transactions to coin database...
    Flush(true)
    DBFlush(true) ended 0ms

  31. gavinandresen commented at 6:07 am on September 14, 2013: contributor

    “block.ReadFromDisk failed” is what it sounds like: it tried to read block data from disk and it couldn’t.

    That sounds very much like bad hardware to me, the block-writing code is incredibly straightforward (blocks are written as received in append-only fashion to the blk*.dat files).

  32. toffoo commented at 6:35 am on September 14, 2013: none

    Yes, in my decidedly amateur capacity, that’s what I figured ReadFromDisk might mean.

    But do you really believe this could be a hardware issue? I remain skeptical. Let’s recap the facts here:

    1.) running 0.7.2 client since release (and all previous Mac client releases) on this same machine has NEVER crashed or corrupted on me.

    2.) ALL 0.8.x client releases have corrupted for me (eventually or quickly)

    3.) NO other software or files have ever turned up corrupted for me on this machine

    4.) It has a relatively new SSD drive,

    5.) which is running with FileVault2 full-disk encryption … so shouldn’t any hardware errors (do SSD drives get errors?) be transparent to the operating system?

    Is there anything you could suggest I do that might help diagnose this further?

    Can you tell from that debug.log exactly which file it’s saying it can’t read?

    I took a Time Machine backup between the first successful run of 0.8.5 and the second when it failed to open. In theory, if we can determine which file bitcoin-qt is saying is bad, could a restore from backup of a “good?” copy of that file tell us whether the “bad?” copy was really due to hardware corruption on the disk?

  33. toffoo commented at 11:09 pm on September 14, 2013: none

    Ok, I managed to just eye-ball this and see that my new 0.8.5 blk00005.dat file is a few megabytes smaller than the corresponding blk0006.dat file in my original 0.7.2 directory from before the reindex/upgrade. All the other block files are identical pre/post upgrade.

    Also, doing a ‘diff’ (or any other typical OS command) on this file does not report any kind of read error or disk corruption.

  34. gmaxwell commented at 11:36 pm on September 14, 2013: contributor

    @toffoo, I believe what those messages are effectively saying to us is that the block file, which is an append-only file, was truncated.

    In bitcoin, blocks are first appended to the last block file and then the database is updated to reflect those writes (referencing the blocks that were just written). Each of these operations has fsyncs between them (except during the initial syncup), so a correct operating system will never let them reach disk out of order.

    At startup your node is attempting to read a block which is mentioned in the database and it’s finding that the data is just not there: it’s reading past the end of the file. That means the file was truncated somehow, or the writes happened out of order and the block append just didn’t happen but the database update did.
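
The ordering described here can be illustrated with a small sketch. This is not Bitcoin’s code: a plain dict stands in for the LevelDB block index, and `append_block` and `dangling_entries` are hypothetical names. The first function appends and syncs the data before recording its position; the second reports index entries that point past the end of the file, which is the symptom seen in the debug.log above.

```python
import os


def append_block(block_path, index, block_id, payload):
    """Append payload to the block file, sync it, then record its position."""
    with open(block_path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)  # current end of the append-only file
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())             # data should be durable before indexing
    # Only after the sync is the block referenced from the index.
    index[block_id] = (offset, len(payload))


def dangling_entries(block_path, index):
    """Return ids whose recorded extent reaches past the end of the file."""
    size = os.path.getsize(block_path)
    return sorted(
        block_id
        for block_id, (offset, length) in index.items()
        if offset + length > size
    )
```

If the sync does not actually force data to disk, an index entry can survive a crash while the bytes it refers to do not, and `dangling_entries` would then return a non-empty list.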

  35. gmaxwell commented at 3:39 am on September 16, 2013: contributor
    @toffoo A side effect of that long blah blah I wrote above— I went and looked at how we were handling the blockfiles and found an OSX specific issue. See pull #3000.
  36. gmaxwell commented at 3:48 pm on September 19, 2013: contributor

    @toffoo Any chance to test with my fix from pull 3000 in git? I’d like to have some confirmation that it makes a visible improvement for people who were previously having issues.

    If it does then perhaps we should do a 0.8.5.1 for OSX only. (wouldn’t be the first time we respun OSX for a small fix)

  37. toffoo commented at 5:16 pm on September 19, 2013: none

    Hey @gmaxwell sure I’d be happy to test it here if someone can build me a binary?

    I’ve never actually tried to compile a bitcoin-qt binary, nor do I have a build environment set up here, and I understand that it isn’t exactly straightforward for a novice to get the Mac client built.

  38. face commented at 8:36 am on September 22, 2013: none

    Aloha @toffoo. I’m the Mac developer for Litecoin. Here is a build of Bitcoin 0.8.5 + pull 3000: http://myutil.com/test/

    I think it is built the same way as official builds (32bit, 10.5 support, built with Xcode 3.2.6 as that was the last Xcode with official support from Apple for OSX 10.5).

  39. wtogami commented at 9:36 am on September 22, 2013: contributor
    Please verify the integrity of his .dmg.asc with GPG. His GPG key is signed by my GPG key, which is available here among many other places: https://github.com/bitcoin/bitcoin/blob/master/contrib/gitian-downloader/wtogami-key.pgp
  40. gavinandresen commented at 9:11 pm on September 22, 2013: contributor
    I use OSX 10.6.8 and Xcode 3.2.4 on a 32-bit machine, with everything compiled -arch i386 -mmacosx-version-min=10.5 to create release builds.
  41. face commented at 8:18 am on September 23, 2013: none
    Thanks @gavinandresen, I’ll downgrade Xcode to 3.2.4 for all future builds (and then I think I have everything else the same, except perhaps for macports minor versions of dependencies). We had a support issue with OSX 10.5 and the Xcode version could definitely be the culprit.
  42. wtogami commented at 9:35 am on September 23, 2013: contributor
    Well, we don’t know for sure if our builds have problems on 10.5. We have only one report from a user who said the client would mysteriously just stop sometime after sync is complete and it is idle.
  43. gmaxwell commented at 2:52 pm on September 27, 2013: contributor
    @toffoo Any results?
  44. sheldonth commented at 1:06 am on September 28, 2013: none

    When my Bitcoin-Qt 0.8.5-beta gets to “Loading Wallet”, the program quits unexpectedly. The violating thread is:

    Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
    0 libsystem_kernel.dylib 0x93a11a6a __pthread_kill + 10
    1 libsystem_c.dylib 0x943ecb2f pthread_kill + 101
    2 libsystem_c.dylib 0x94423631 abort + 168
    3 libsystem_c.dylib 0x944314f6 __assert_rtn + 326
    4 org.bitcoinfoundation.Bitcoin-Qt 0x000d1efc SetBestChain(CValidationState&, CBlockIndex*) + 2268
    5 org.bitcoinfoundation.Bitcoin-Qt 0x000d3c28 ConnectBestBlock(CValidationState&) + 824
    6 org.bitcoinfoundation.Bitcoin-Qt 0x001259e8 AppInit2(boost::thread_group&) + 22744
    7 org.bitcoinfoundation.Bitcoin-Qt 0x00023440 main + 7504
    8 org.bitcoinfoundation.Bitcoin-Qt 0x00020711 start + 53

    Running OS X 10.8.5. Could it be related to issues above? @gavinandresen ?

  45. toffoo commented at 8:30 am on September 28, 2013: none

    @face thank you for the binary.

    @gmaxwell using face’s Bitcoin 0.8.5 + pull 3000 Mac binary, I’ve just gone and re-indexed my “still working like a charm” v0.7.2 blockchain. It chugged away for about 2.5 hours with no problems and seemed to finish and run with no issue. I then let it run for a while, again with no problems, and then shut down cleanly. Upon first re-start I got this again:

    [screenshot, 2013-09-28 at 5:11:47 am]

    Here’s a debug.log for you:

    Bitcoin version v0.8.5-1-g9051cd9-beta (2013-09-21 21:46:04 -1000)
    Using OpenSSL version OpenSSL 1.0.1e 11 Feb 2013
    Startup time: 2013-09-28 08:10:32
    Default data directory /Users/----/Library/Application Support/Bitcoin
    Using data directory /Users/----/Library/Application Support/Bitcoin
    Using at most 13 connections (2560 file descriptors available)
    Using 8 threads for script verification
    init message: Verifying wallet...
    dbenv.open LogDir=/Users/----/Library/Application Support/Bitcoin/database ErrorFile=/Users/----/Library/Application Support/Bitcoin/db.log
    Bound to [::]:8333
    Bound to 0.0.0.0:8333
    init message: Loading block index...
    Opening LevelDB in /Users/----/Library/Application Support/Bitcoin/blocks/index
    Opened LevelDB successfully
    Opening LevelDB in /Users/----/Library/Application Support/Bitcoin/chainstate
    Opened LevelDB successfully
    LoadBlockIndexDB(): last block file = 6
    LoadBlockIndexDB(): last block file info: CBlockFileInfo(blocks=26, size=2413677, heights=260508...260533, time=2013-09-28...2013-09-28)
    LoadBlockIndexDB(): transaction index disabled
    LoadBlockIndexDB(): hashBestChain=0000000000000008b6bb368f7fc6e3d48c7cbc78063aed542a405f2438d4bcc7 height=260533 date=2013-09-28 07:59:13
    init message: Verifying blocks...
    Verifying last 288 blocks at level 3
    ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
    ERROR: VerifyDB() : *** block.ReadFromDisk failed at 260507, hash=0000000000000018d43c8f01bd6f999991558eeb09588e04ab5d6ef0eefa6f1b

  46. wtogami commented at 11:54 am on October 29, 2013: contributor
    @toffoo https://bitcointalk.org/index.php?topic=320695 Please test the Bitcoin OMG build here that includes both Mac OS X fsync patches and an upgrade to leveldb 1.13. We are very curious to learn if this solves the corruption that some Mac users experience.
  47. toffoo commented at 8:19 am on November 4, 2013: none

    @wtogami Thank you for your efforts, I’m so glad to see this issue receive the attention it deserves.

    This weekend I got a chance to try your latest binary. I downloaded and installed your Bitcoin-Qt-0.8.5-OMG3.dmg on my MacbookPro Retina, now running OSX Mavericks v10.9.

    Once again I reindexed my up-to-date v0.7.2 blockchain, this time it took about 4 hours. It finished reindexing with no problem, got synced with the network, and ran fine with no issue for several minutes until I did a clean shutdown.

    Another positive I see is finally a return of Retina fonts to bitcoin-qt! As far as I know, v0.7.2 is the only bitcoin-qt release that had “Retina-style” HiDPI font support. All previous releases, and then again all the v0.8.x releases, look all jaggy on the Retina screen. Of course I’m probably part of a very small audience here, and the other non-font GUI elements are of course still low-res, but it definitely looks a lot sharper for me like this, so thanks!

    Unfortunately the bad news is that upon first restart I got the exact same issue as last time, my dreaded: “Corrupted block database detected. Do you want to rebuild the block database now?”

    Here’s the relevant part of debug.log for you:

    Opened LevelDB successfully
    LoadBlockIndexDB(): last block file = 6
    LoadBlockIndexDB(): last block file info: CBlockFileInfo(blocks=37, size=3188636, heights=267817...267853, time=2013-11-04...2013-11-04)
    LoadBlockIndexDB(): transaction index disabled
    LoadBlockIndexDB(): hashBestChain=000000000000000450718beafbece467989022fe2265c5dd69b07f0ca5b40e8f height=267853 date=2013-11-04 07:21:53
    init message: Verifying blocks...
    Verifying last 288 blocks at level 3
    ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
    ERROR: VerifyDB() : *** block.ReadFromDisk failed at 267816, hash=0000000000000006fd15a014c76b0bd0200b74cbcbe8f3a873cedd2798e65a59
    Shutdown : In progress...

    Since this failed in exactly the same way as last time, and differently than how it used to die, it got me thinking a little deeper about the situation, so allow me to speculate on what I think may be going on here.

    Let me preface this by saying I am NOT A PROGRAMMER and have never even tried to read the bitcoin source code, so this is nothing more than an educated guess as to how things may actually be working internally.

    I noticed that pre-0.8.x bitcoin-qt stores the blockchain in files that fill up to 2.1 GB before spilling over to a new one. Before this update/reindex I had blk0001.dat - blk0005.dat each 2.1 GB and then blk0006.dat 1.7 GB and I was synced up to (I think about) 267816.

    When I start the update and it does the reindex, it seems like it hardlinks the old block files with a slightly different name, but now in the /blocks/ directory.

    Now, with the reindex complete, as the new client runs, it creates a new block file rather than continuing to fill up the last old 1.7 GB one to 2.1 GB. So any new blocks that came in during the 4 hours I was reindexing, or the 20-30 minutes I let it run afterwards, look like they’ve gone into a new 16.8 MB /blocks/blk00006.dat, leaving a 1.7 GB blk00005.dat.

    So then when I do the first cold restart, guessing from the behavior and the debug.log messages, it seems like it gets up to the last block 267816 I had previously synced and tries to keep reading from that last < 2.1 GB block file rather than looking for the new small one, and therefore quits with error “blockchain corrupted”.

    Again, maybe a shot in the dark here, but I think some version of this theory makes sense in light of some of the other clues. It could be that the “general Mac corruption issue” has been solved, but perhaps some new bug was introduced that only affects people (like me) updating from older Mac client blockchains. Recall, these recent releases are “corrupting” for me immediately upon first restart in exactly the same way, and in a completely different way than how it used to get corrupted at some random, inconsistent moment with the earlier v0.8.x releases.

    What do you think?

  48. wtogami commented at 8:42 am on November 4, 2013: contributor

    Have you tried 0.8.5-OMG3 with a clean sync, not an upgrade?

    Do you have access to a very fast upload pipe somewhere? If your 0.7.2 blocks are 100% reproducible in causing this error then one of the Mac devs likely will want to see your actual files. Use tar and xz to compress them to a smaller size and upload to a URL. Come on IRC and ask for help with a temporary server to upload to if you don’t have one yourself.

    (Your other questions the Mac devs need to think about.)

  49. toffoo commented at 4:28 am on November 11, 2013: none

    Some updates here: I’ve been in contact with @wtogami on IRC and he concluded that my bitcoin-0.7.2 blockchain data is bad in a unique way that breaks upon update to bitcoin-0.8. I added that my blockchain has been continually synced since v0.3ish and agree with his conclusion.

    He suggested: 1.) attempt a complete fresh blockchain resync with his -OMG3 build instead of an update, that is now 2 days in progress, so far so good, I’ll post how it turns out, and 2.) I’ve sent him an archive of my 0.7.2 blockchain for further analysis.

    On a related note, I have also recently updated my Litecoin from v0.6ish to his 0.8.5.2-rc4 build (which is a similar update from an old blockchain) with no problems and it has been running good with no corruption.

  50. wKovacs64 commented at 4:25 pm on November 11, 2013: none
    I tried running 0.8.5-OMG3 for a week with no corruption problems. When I first opened it, it did complain about my block database so I let it rebuild. Now it’s complaining about the block database again. I guess I will try a completely fresh blockchain like @toffoo did.
  51. toffoo commented at 11:22 pm on November 13, 2013: none

    Another update: I had forgotten that I’d read Gavin’s advice regarding App Nap on Mavericks #3182 until almost 4 days into the fresh full sync. Even then, when I ticked it off, it isn’t clear whether the OS respects that tick when the executable is run directly from the command line. I was running:

    /Applications/Bitcoin-Qt.app/Contents/MacOS/Bitcoin-Qt -dbcache=1000

    from the command line on advice the initial sync would go faster.

    So whether it was the App Nap, or this other issue mentioned #3243 which I also experienced, or just general slowness, this full sync took a LONG time for me.

    But it did eventually finish last night, successfully, and ran well for hours thereafter and after a few restarts, so I was all ready to type up congratulatory comments that perhaps our Mac corruption bugs had finally been licked.

    But then when I restarted Bitcoin-Qt again this morning, I got the dreaded error again:

    Bitcoin version v0.8.5-OMG3-beta (2013-10-30 17:50:47 -1000)
    Using OpenSSL version OpenSSL 1.0.1e 11 Feb 2013
    Startup time: 2013-11-13 16:31:51
    Default data directory /Users/----/Library/Application Support/Bitcoin
    Using data directory /Users/-----/Library/Application Support/Bitcoin
    Using at most 125 connections (2560 file descriptors available)
    Using 8 threads for script verification
    init message: Verifying wallet...
    dbenv.open LogDir=/Users/-----/Library/Application Support/Bitcoin/database ErrorFile=/Users/-----/Library/Application Support/Bitcoin/db.log
    Bound to [::]:8333
    Bound to 0.0.0.0:8333
    init message: Loading block index...
    Opening LevelDB in /Users/-----/Library/Application Support/Bitcoin/blocks/index
    Opened LevelDB successfully
    Opening LevelDB in /Users/-----/Library/Application Support/Bitcoin/chainstate
    Opened LevelDB successfully
    LoadBlockIndexDB(): last block file = 92
    LoadBlockIndexDB(): last block file info: CBlockFileInfo(blocks=453, size=65057378, heights=268923...269375, time=2013-11-11...2013-11-13)
    LoadBlockIndexDB(): transaction index disabled
    LoadBlockIndexDB(): hashBestChain=00000000000000062c7d79d3d301f71320f986585ae385d5c356e444fed23f7f height=269375 date=2013-11-13 08:17:03
    init message: Verifying blocks...
    Verifying last 288 blocks at level 3
    LevelDB read failure: Corruption: block checksum mismatch
    Shutdown : In progress...
    StopNode()
    Flushed 0 addresses to peers.dat 15ms
    Committing 10456 changed transactions to coin database...
    Shutdown : done

    So again, this is @wtogami ’s v0.8.5-OMG3-beta binary, about 12 hours after a complete full sync from the network.

    On a related note, your v0.8.5.2-rc4-beta Litecoin binary is still running great with no problems.

  52. wKovacs64 commented at 11:38 pm on November 13, 2013: none
    I’m trying the same thing, a fresh blockchain using @wtogami ’s 0.8.5-OMG3 beta. So far, so good, but I definitely need a few more days of testing. Although, your results don’t give me much hope of this working now. @toffoo Any chance you’re hopping between networks/locations, such as home vs. work? I mentioned this on IRC the other day - I’m starting to suspect changing networks somehow causes this dreaded corruption error. Don’t ask me to explain why that would be the case, but that’s what I’ve been noticing. I can fire up the client without issue in one location, go to the other location, and get the corruption error. Take it back to the previous location, database validates just fine. And it isn’t always the same result for each location. If I “fix” it in one location that was broken before (either by answering Yes to the rebuild question or seeding my Bitcoin folder with validated files), it very well may work there but now break in the other location. I can’t imagine the client IP address or other network configuration would play into validating the block database, but… :suspect:
  53. toffoo commented at 11:52 pm on November 13, 2013: none
    @KaosMcRage interesting theory, I can certainly appreciate the temptation to search for patterns in the corruption. But in my case, I have only ever run bitcoin-qt on the same home network … BUT my external IP does change on occasion. So I’m not sure if that data point supports your theory or not.
  54. wtogami commented at 0:27 am on November 14, 2013: contributor

    Changing the network doesn’t make sense as the cause of corruption.

    More testing of OMG3+ would be good on Mac as it matches the leveldb currently in Bitcoin 0.9. It is possible it fixed other corruption issues and something else remains here. Quite frustrating that we can’t figure out a reliable trigger.

  55. wKovacs64 commented at 1:53 am on November 14, 2013: none
    @toffoo If your external IP changes match up with some/all of your re-corruption instances… maybe it supports or at least contributes to the theory. I know it’s a stretch, though, and probably nonsense. @wtogami I completely agree, it’s a bit absurd. I’ve resisted saying anything for quite a while but it’s just been too common lately to not at least mention it. Let us know if there is any specific testing you’d like done using OMG3+.
  56. wKovacs64 commented at 2:08 am on November 18, 2013: none

    @wtogami I started with a 9 GB bootstrap.dat, synced the rest using OMG3-beta, and used it for about 5 days. When I opened it up today, it made it past the initial “Verifying blocks…” where B-Qt normally detects corruption and began syncing with the network, but then this popped up:

    [image: screen shot 2013-11-17 at 6 42 47 pm scaled]

    So, slightly different result but still getting corruption after a while, I guess. Potentially interesting things from debug.log:

    init message: Verifying blocks...
    Verifying last 288 blocks at level 3
    No coin database inconsistencies in last 132 blocks (35868 transactions)
     block index           22444ms
    init message: Loading wallet...
    

    That’s where B-Qt would’ve failed. Then later…

    ProcessBlock: ACCEPTED
    getblocks 270098 to 0000000000000000000000000000000000000000000000000000000000000000 limit 500
    getblocks 270098 to 0000000000000000000000000000000000000000000000000000000000000000 limit 500
    received block 0000000000000005e283ba322761cde5f606238a8a032b34f053e9500552d57f
    LevelDB read failure: Corruption: block checksum mismatch
    *** System error: Database corrupted
    

    Followed by a bunch of connection timeouts and

    ERROR: AcceptBlock() : AddToBlockIndex failed
    ERROR: ProcessBlock() : AcceptBlock FAILED
    

    Then some more getblocks and a shutdown. Attempts to open the client after this point result in an immediate prompt to rebuild the block database.

  57. fanquake commented at 3:35 am on November 18, 2013: member
    @KaosMcRage If you choose no to rebuilding the client, does it crash immediately? If so, could you report the details of the crash report?
  58. wKovacs64 commented at 4:00 am on November 18, 2013: none

    @fanquake Yes, it does. Log looks like this:

    init message: Verifying blocks...
    Verifying last 288 blocks at level 3
    LevelDB read failure: Corruption: block checksum mismatch
    Shutdown : In progress...
    StopNode()
    Flushed 0 addresses to peers.dat  18ms
    Committing 6113 changed transactions to coin database...
    Shutdown : done
    
  59. fanquake commented at 4:12 am on November 18, 2013: member
    @KaosMcRage Sorry I should have specified, I’m talking about the Apple error report.
  60. wKovacs64 commented at 4:35 am on November 18, 2013: none
    @fanquake Oh, it doesn’t actually crash in that regard. It exits gracefully as far as OS X is concerned.
  61. wtogami commented at 4:57 am on November 18, 2013: contributor
    https://bitcointalk.org/index.php?topic=337294 Bounty to explain and fix this MacOS X corruption issue now stands at 5 BTC + 200 LTC. Please donate to the thread if you want to increase incentive for someone to fix this.
  62. jrmithdobbs commented at 1:04 am on November 21, 2013: contributor

    Can someone please spin a 64bit dmg linked to nothing older than 10.7 sdk and see if this keeps happening? I can’t even currently get tests building with 64bit os x builds and that is not a good sign seeing as 32bit binaries are getting closer and closer to deprecation.

    I think a very large amount of time is being wasted here trying to debug a problem that is being made worse by arbitrary legacy support that isn’t actually being used by anyone.

  63. wKovacs64 commented at 6:48 pm on November 22, 2013: none
    FYI, I waited a few days and opened the OMG3-beta client again and it healed itself. Verified the database just fine.
  64. hyc commented at 7:26 pm on November 25, 2013: none
    fwiw, I have a fork ported to use LMDB. Dunno if that counts as solving the problem re: the bounty, but LMDB has no corruption issues.
  65. wtogami commented at 9:54 am on November 26, 2013: contributor
    https://bitcointalk.org/index.php?topic=337294.msg3718821#msg3718821 If you previously experienced corruption with MacOS X Bitcoin or Litecoin, please test these MacOS X binaries and report back!
  66. sigkill commented at 6:00 am on November 27, 2013: none

    Can anyone please confirm whether they are using HFS+ compression, whether the file system is case sensitive, and whether they are using an SSD with TRIM enabled or a standard hard disk?

    If you have this issue please attempt to create a fat32 mount point and move your wallet and other supporting files there, pointing your configs at this (backing up ahead of time)

    I believe this is a forking file system / resource fork issue or an issue with ssd support in OS X and HFS+ compression.

    (Related note - most people have this problem come up when putting the system to sleep or hibernate, this process causes the system to write out memory contents to the drive - this means that due to the resource forking of the system a race condition can and will occur, especially on SSD based machines, there is a feature that can be turned on and off for access time tracking of HFS, but a likely easier fix would be to disallow sleeping, or making sure to shut down your wallet before sleeping your machine)

  67. gavinandresen commented at 6:17 am on November 27, 2013: contributor

    I’m uploading a couple of corrupted chainstates from two of the corruptions I’ve experienced to: http://www.skypaint.com/leveldb_corruption/

    (tar.gz format; the second will be done uploading in 15 minutes or so).

  68. throughnothing commented at 6:31 am on November 27, 2013: none

    @toffoo do you have time machine enabled on your mac, and is it enabled over the directories that your blockchain(s) are being stored?

    I have been running 0.8.5-beta (stock) for a while now on my mac with no issues. I did an initial full sync with this version of the client, and run it about once a day for a few hours to keep it up to date.

  69. sigkill commented at 6:31 am on November 27, 2013: none

    gavinandresen - can you try to create a FAT partition on both of the drives where you have had the issues, then copy the data there and attempt to recreate the issue once those partitions are mounted while running bitcoin-qt from there? (running the binary and having the supporting wallet and db files within this new FAT fs based mount point)

    I totally think this is an HFS file syncing race condition at this point.

  70. gavinandresen commented at 6:36 am on November 27, 2013: contributor
    @sigkill : no, I am busy with other things.
  71. toffoo commented at 6:41 am on November 27, 2013: none

    @throughnothing yes, I’ve always had time machine enabled on this mac and over the directories the blockchain is stored. It’s set to run only manually and I’m quite sure I only ever ran a backup when bitcoin-qt was not running.

    I’ve of course tried to correlate the episodes of corruption with time machine backups, reboots, IP address changes, or anything else that might explain it, but I’ve yet to find a definitive causation. I’m quite sure I’ve made it thru some time machine backups in my testing of 0.8.x builds without the corruption happening, but who knows.

  72. throughnothing commented at 6:46 am on November 27, 2013: none

    @toffoo it might be helpful if you could:

    • a) turn time machine off (I understand probably not desirable)
    • b) move your blockchain (and probably Bitcoin-QT client also, just in case) out of a Time Machine-watched area

    That could at least eliminate a variable. I don’t run time machine (and never have) and have yet to reproduce it on 2 macs, one running 10.8, and a newer mac running 10.9.

  73. wtogami commented at 7:23 am on November 27, 2013: contributor
    https://bitcointalk.org/index.php?topic=337294.msg3718821#msg3718821 @toffoo Did you try the test builds of Bitcoin-Qt and Litecoin-Qt? Need test results.
  74. toffoo commented at 9:08 am on November 27, 2013: none
    @wtogami yes I have the new binaries and testing in progress… Thanks for your fast builds. Out of curiosity, is there anything else different between -OMG4 and -OMG3 apart from the ‘cfields patch’?
  75. wtogami commented at 9:13 am on November 27, 2013: contributor
    @toffoo Very minor, mainly debug.log print improvements.
  76. victoredwardocallaghan commented at 11:42 am on November 27, 2013: none

    Maybe it is possible to use DTrace on OSX to track this down? I know FreeBSD provides this and I have seen DTrace being used on OSX, however I don’t know the quality of the port. If FreeBSD and OSX are both suffering this issue then perhaps the problem sits down in libc?

    http://dtrace.org/blogs/brendan/2011/10/10/top-10-dtrace-scripts-for-mac-os-x/ http://snltd.co.uk/snippets/index.php?c=v&sn=dtrace_notes_io.php

  77. rescrv commented at 10:13 pm on November 27, 2013: none
  78. madthanu commented at 2:40 am on November 28, 2013: none

    I guess this bug has already been solved (Robert Escriva), but I’m interested in doing a post-mortem analysis. Background: I’m researching “filesystems-applications-interfacing” bugs, and I’ve delved into some detail on LevelDB bugs.

    There’s a lot of confusing information here, and I’m having trouble reproducing the bug, so it’ll be really helpful if someone can provide me with the following information:

    1. Did the bug happen with process crashes, or clean shutdowns, or system suspends, or system crashes? I hear many people saying clean shutdowns, just making sure. Did a system crash happen at some recent point in the past for people who got corruption with just clean shutdowns?
    2. A snapshot of the “blocks/index/” directory and the “chainstate/” directory in which the LevelDB corruptions were reported. I see gavinandresen’s MANIFEST, but a full snapshot would be appreciated.

    Thanks for the trouble!

  79. toffoo commented at 6:19 pm on November 28, 2013: none

    Thank you @wtogami and @rescrv. I have installed the new -OMG5 binary and so far so good. I re-indexed from my up-to-date v0.7.2 blockchain (with -dbcache=1000) and it took about 2.5 hours. Left it running for a while to get synced, all smooth. Opened and closed it half a dozen times, reboots, suspends, etc. all good.

    I think the jury is still out on whether this devious bug is actually gone for good, but things are looking good. Scrolling up, I recall that the longest any of the v0.8.x releases have lasted for me without corrupting is about 2-3 days. So I’ll continue playing around with this in the coming days and will report back with any anomalies.

    Other Mac users who have suffered corruption issues, please post your experiences with this new binary as well.

  80. wKovacs64 commented at 7:40 pm on November 28, 2013: none
    @wtogami @rescrv Is it necessary to rebuild using OMG5? I had a clean/working blockchain (OMG4) that I hadn’t synced in a couple days. Saw the OMG5 binaries were available so I grabbed those. Fired it up, synced for about a minute and then it popped up the corruption error.
  81. rescrv commented at 10:58 pm on November 28, 2013: none
    It’ll be necessary to start from a non-corrupted DB. The fix only prevents corruption. If you start fresh, you’ll be all set. If you start from a DB without corruption you’ll be all set.
  82. wtogami commented at 11:54 pm on November 29, 2013: contributor

    https://bitcointalk.org/index.php?topic=337294.msg3718821#msg3718821 Here are the latest builds with the patch from @rescrv. We have a report from Bitcoin 0.8.5-OMG6 of a different crash.

    http://pastebin.com/hg2QTwTB crash traceback https://www.dropbox.com/s/m5xtbv5tniltwtq/toffoochainstate.tar.xz chainstate/ after the above crash, 189MB download http://pastebin.com/nyvZ7xEZ crash on attempt to run it again

  83. madthanu commented at 0:11 am on November 30, 2013: none
    The first time the crash happened, did the machine have a power failure just before? or maybe was it shutdown clean and rebooted? or wasn’t shutdown in any manner at all?
  84. rescrv commented at 1:06 am on November 30, 2013: none

    Can you provide more information about the above crash? Every file, every key is iterable and gettable from the LevelDB interface. Each file individually validates as well, iterating and “get”ing just fine.

    Were there any other changes between what happened before my patch, and OMG6?

  85. toffoo commented at 1:13 am on November 30, 2013: none

    @madthanu I reindexed my 0.7.2 blockchain with -OMG5 and it ran fine thru all my tests. I then hotswapped in the -OMG6 binary, which again ran fine for many hours and thru all tests. I was lighting up BitcoinBlackFriday and successfully completing all kinds of transactions. During this time I was opening/closing/suspending/rebooting frequently and nothing could faze it.

    But then upon first boot up several hours later I got the crash.

    As discussed on IRC, I’m currently bittorrenting the fresh bootstrap.dat and will index directly with -OMG6 next to see if I can recreate the issue and discount other hypotheses.

  86. madthanu commented at 2:24 am on November 30, 2013: none
    @toffoo Any chance you can upload all your files except wallet.dat and bitcoin.conf? AFAIK, they are the only two files that can affect your privacy, but maybe someone else here can provide more information about that. Will really help with the debugging.
  87. toffoo commented at 3:44 am on December 1, 2013: none

    Some updates….

    After so many versions, tests, crashes, and discussions I’m bound to get dizzy and lose track of what’s been going on, so here’s a log of the latest:

    I torrented a fresh copy of @jgarzik’s excellent bootstrap.dat and was successfully able to build a fresh reindex with the -OMG6b binary. Indexing above block height 250,000 still seemed to take a very long time, with all sorts of errors in debug.log, but did eventually complete and run successfully.

    All tests with -OMG6b have so far worked fine (with regards to the levelDB corruption issue) but a different issue was encountered: directly after sending a transaction (and using Coin Control for my first time) bitcoin-qt crashed with an OSX traceback and here it is:

    http://pastebin.com/g8QqheGc

    This same issue was also reported by LitecoinDev coblee.

    I was able to restart bitcoin-qt with no DB corruption reported and the tx that caused the crash did go thru.

    Apart from this probably unrelated issue, -OMG6b (from the bootstrap.dat reindex) continues to work fine and has not produced or reported corruption. @rescrv has requested a core dump from a DB corruption crash (if possible).

    We determined that OSX does not default to save core dumps, but it can be turned on (before the crash happens of course) with the command: ulimit -c unlimited

    The Bitcoin-Qt binary must then be run from the same Terminal using the command: /Applications/Bitcoin-Qt.app/Contents/MacOS/Bitcoin-Qt

    This setting is not robust to reboots or new Terminal windows and must therefore always be run beforehand each time.
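The core-dump setup described above can also be scripted so it is not forgotten between Terminal sessions. This is a minimal sketch in Python, assuming a Unix-like system where the standard `resource` module is available. The helper names `enable_core_dumps` and `launch_with_core_dumps` are hypothetical; the binary path is the one quoted above.

```python
import resource
import subprocess

# Path quoted in the comment above; adjust if the app lives elsewhere.
BITCOIN_QT = "/Applications/Bitcoin-Qt.app/Contents/MacOS/Bitcoin-Qt"


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    This mirrors `ulimit -c unlimited` when the hard limit is unlimited.
    Returns the resulting (soft, hard) pair.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def launch_with_core_dumps(binary=BITCOIN_QT):
    """Start the client as a child process that inherits the raised limit."""
    enable_core_dumps()
    return subprocess.Popen([binary])
```

The limit applies only to the calling process and its children, which is why the client has to be started from the same process (or Terminal) that raised it.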

    I successfully generated a core dump by reproducing the ‘send tx’ bug. And we also determined that it would probably be a security risk to share a core dump from bitcoin-qt running a “live wallet”.

    Currently under discussion whether I continue testing with -OMG6b, revert to -OMG5, or await a new binary?

  88. wtogami commented at 8:32 am on December 1, 2013: contributor

    https://bitcointalk.org/index.php?topic=337294.msg3718821#msg3718821 Bitcoin build now following https://github.com/bitcoin/bitcoin/commits/0.8.6

    http://pastebin.com/hg2QTwTB As noted above, @rescrv’s patch alone resulted in this crash in OMG6. OMG6b added cfield’s 2nd leveldb patch which he later retracted.

    The crash during Send is unrelated to the leveldb issue. It seems to be in Coin Control.

  89. toffoo commented at 4:47 am on December 2, 2013: none

    Update:

    Running v0.8.6-mactest1 worked fine for me last night and this morning, opened/closed many times and transactions sent with no problem … until upon a restart today I got the blockchain database corruption error.

    debug.log: http://pastebin.com/VQbvZDDr

    chainstate/: https://www.dropbox.com/s/gf9cahbqkewt6rx/toffoo0.8.6-mactest1-chainstateCORRUPTED.tar.xz

    Also noticed this version no longer has Retina font support, all the previous OMG builds did.

  90. toffoo commented at 6:27 am on December 2, 2013: none

    As per @rescrv ’s suggestion, I ran a memtest to discount a bad RAM hypothesis.

    Thanks edubai on the forum for Mac memtest link: https://bitcointalk.org/index.php?topic=337294.msg3764745#msg3764745

    I passed:

    http://pastebin.com/Xu52eJr7

  91. toffoo commented at 9:43 am on December 2, 2013: none

    Hi @wtogami, Bitcoin-Qt-0.8.5-OMG7-no-mmap OSX crashed for me upon Sending a transaction using Coin Control, here’s the pastebin:

    http://pastebin.com/N063nqrn

    …but running good so far with no LevelDB corruption.

  92. wtogami commented at 11:53 am on December 2, 2013: contributor
    @toffoo http://download1.rpmfusion.org/~warren/bitcoin-0.8.5-OMG7/macosx/ Please test Bitcoin-Qt-0.8.5-OMG7-no-mmap2.dmg. I included yet another “remove setFocus()” at the recommendation of @laanwj. And yes the version is internally Bitcoin-Qt-0.8.5-OMG7-no-mmap. Ignore that.
  93. rescrv commented at 11:04 pm on December 2, 2013: none

    Can we separate the GUI bugs from the OS X LevelDB corruption bug? The issue that was reported resulted in a very clear pattern of corruption, where contiguous segments of files written by LevelDB would be erroneously replaced by NULL bytes. Since applying the msync patch I provided, the corruption issue originally reported has not been confirmed by anyone else, including toffoo (the original reporter). What’s been reported since has been a series of hard crashes, mostly within Qt code, that do not follow the pattern of corruption originally reported in any of the linked forum posts.

    These bugs reside outside LevelDB, and are not part of the (now infamous) LevelDB corruption bug.
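The corruption pattern described above (contiguous segments replaced by NULL bytes) can be checked for mechanically. The following is a rough sketch in Python, not a tool used in this thread. The function names and the 4096-byte threshold are assumptions chosen for illustration; legitimate data can contain short zero runs, so a hit is a hint rather than proof.

```python
import re


def find_nul_runs(data, min_len=4096):
    """Return (offset, length) for every run of NUL bytes of at least min_len."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


def scan_file(path, min_len=4096):
    """Read one file (for example a LevelDB .sst file) and report long NUL runs."""
    with open(path, "rb") as handle:
        return find_nul_runs(handle.read(), min_len)
```

Running `scan_file` over each file in a copy of `chainstate/` would list candidate regions to inspect by hand.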

  94. gmaxwell commented at 11:17 pm on December 2, 2013: contributor

    @rescrv Toffoo’s crashes at startup appear to be database corruption related, though perhaps not the same corruption.

    (The GUI crashes are related to some other unrelated code that was backported and put in warren’s “OMG” builds, and indeed don’t belong here, though it’s good to have learned about them.)

  95. rescrv commented at 11:55 pm on December 2, 2013: none

    Toffoo’s crashes at startup are most likely because the state is invalid, yes. But that does not imply that it is LevelDB corruption. Every time he has had corruption, it has been preceded by a crash in code that is not LevelDB.

    In the first chainstate he posted since the crash, there was absolutely zero LevelDB corruption. Every file is parseable in its entirety. Every entry is retrievable using any access pattern (iteration or “get”). Every checksum matches the blocks it should. If the state is corrupt, it was corrupted before being put into LevelDB, or (and this is unlikely), the corruption affected the data and the checksums just right so that everything is consistent. The problem he reports here is at a higher level, and was preceded by the crash in Qt.

    In the second chainstate he posted (mactest1), there is indeed corruption in 010255.sst. There’s a handful of keys in this file that are corrupted, but no other files. The key observation here is that this file was generated as a result of compaction, and the corruption is localized to a small run of keys. Because the file was generated by compaction, the input keys were verified and likely came from several different SST files, implying that the corruption occurred during compaction and was not present before the compaction started. The corruption itself indicates that something changed the contents of either the checksum or the written data strictly after a valid checksum was computed. Given the brief, direct nature of that code, and the fact that it’s run millions of times per day in all the different apps that use LevelDB, it’s a pretty sure bet that the corruption is generated by something else in the same process. To follow this train of thought further, LevelDB is heavily used within the Chrome web browser (for which it was designed), the HyperDex NoSQL store, and as a core component of the MVP of many startups. These users are not reporting corruption during compaction, and no reports of such corruption exist outside bitcoin-qt.

    The underlying cause of the GUI crashes is likely also responsible for whatever bug within Bitcoin-Qt is now corrupting toffoo’s state. But that bug is not the LevelDB bug that was first reported where the data was corrupted on a clean shutdown. The recent reports have been linked to the GUI, and the nature of the crash would also directly explain the subsequent corruption as well. According to @wtogami on IRC, the GUI code was “a simple race condition in the GUI code, probably a double free”, which is exactly the kind of bug that would lead to errant corruption that indiscriminately affects data both inside and outside of LevelDB.
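The reasoning above rests on the checksum being computed over valid data and the bytes changing afterwards. A small Python sketch of that check follows. It assumes, from my reading of LevelDB's table format, that each block is followed by a one-byte type and a 32-bit little-endian masked CRC32C covering the contents plus the type byte; verify the layout against the LevelDB source before relying on it. The helper names `seal_block` and `verify_block` are hypothetical, and the bitwise CRC is slow and meant only for illustration.

```python
import struct

MASK_DELTA = 0xA282EAD8  # masking constant, as I understand LevelDB's crc32c helper


def crc32c(data):
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def mask(crc):
    """Rotate right by 15 bits and add a constant, modulo 2**32."""
    return (((crc >> 15) | (crc << 17)) + MASK_DELTA) & 0xFFFFFFFF


def unmask(masked):
    """Inverse of mask()."""
    rot = (masked - MASK_DELTA) & 0xFFFFFFFF
    return ((rot >> 17) | (rot << 15)) & 0xFFFFFFFF


def seal_block(contents, block_type=0):
    """Append the type byte and the masked checksum to the block contents."""
    body = contents + bytes([block_type])
    return body + struct.pack("<I", mask(crc32c(body)))


def verify_block(sealed):
    """Return True when the stored checksum matches the bytes read back."""
    body, trailer = sealed[:-4], sealed[-4:]
    (stored,) = struct.unpack("<I", trailer)
    return unmask(stored) == crc32c(body)
```

Any single changed bit in the body or the trailer after `seal_block` makes `verify_block` return False, which corresponds to the “block checksum mismatch” condition discussed in this thread.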

  96. wtogami commented at 1:41 am on December 3, 2013: contributor

    I am sorry the unrelated GUI bug confused the issue here, but I think we are past that. The corruption during compaction occurred with the mactest1 build, which lacks the buggy GUI code in question.

    https://github.com/bitcoin/bitcoin/commits/0.8.6 mactest1 was built from 6003954be08586092d652ca2828e86e92d96c660.

  97. wtogami commented at 6:23 am on December 3, 2013: contributor

    https://bitcointalk.org/index.php?topic=337294.msg3718821#msg3718821 New builds posted of Bitcoin and Litecoin which utilize Patrick Strateman’s #3340, modified to switch away from mmap writes on MacOS X only. The unrelated Mac-specific crash during send we believe is now fixed.

    The “mactest1” build is also there which instead uses @rescrv’s #3327 mmap fix.

  98. pstratem commented at 7:17 am on December 3, 2013: contributor

    A careful reading of the OS X manpage for msync indicates that it may not provide the same guarantees as linux.

    “The msync() system call writes modified whole pages back to the filesystem and updates the file modification time.” https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/msync.2.html

    This would seem to indicate that the pages are flushed to the filesystem layer as dirty pages rather than flushed to disk as the linux manpage indicates (and the actual implementation guarantees).

    Importantly PosixMmapFile::Sync calls fdatasync before calling msync.

    This could result in corruption at any point in the file assuming Sync is only called once when the file is closed for writing (as I believe nearly all of the files created with the WriteableFile class are).

    It would be realllly helpful if we could find an apple engineer to verify the actual functioning of msync().
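To make the ordering concern above concrete, here is a hedged Python sketch of the opposite order: push the mapped pages to the filesystem first, then ask the filesystem to flush the file. It is not the patch applied in any of the builds discussed here. `write_via_mmap_and_sync` is a hypothetical helper, the payload is assumed to be non-empty, and the use of `F_FULLFSYNC` where the platform exposes it is an assumption about what a cautious caller might want on OS X.

```python
import fcntl
import mmap
import os


def write_via_mmap_and_sync(path, payload):
    """Write payload through a shared mapping, then flush in msync -> fsync order."""
    # Size the file first so the mapping covers exactly the payload.
    with open(path, "wb") as handle:
        handle.truncate(len(payload))

    fd = os.open(path, os.O_RDWR)
    try:
        with mmap.mmap(fd, len(payload)) as view:
            view[:] = payload
            view.flush()  # msync(): hand the dirty pages to the filesystem
        # Only after msync, ask for the file data to be flushed.
        full_sync = getattr(fcntl, "F_FULLFSYNC", None)
        try:
            if full_sync is not None:
                fcntl.fcntl(fd, full_sync)
            else:
                os.fsync(fd)
        except OSError:
            # Some filesystems reject F_FULLFSYNC; fall back to fsync().
            os.fsync(fd)
    finally:
        os.close(fd)
```

If the file-level sync ran before `flush()`, pages dirtied through the mapping could still be waiting in the filesystem layer when the sync returns, which is the gap the comment above describes.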

  99. wtogami commented at 11:26 am on December 3, 2013: contributor
    @toffoo Please test 0.8.5-OMG8. @coblee experienced zero crashes or corruption issues with vigorous testing. We are very curious to learn how it fares with your Mac.
  100. toffoo commented at 5:12 pm on December 3, 2013: none
    Okay, Bitcoin-Qt-0.8.5-OMG7-no-mmap2 also worked good for me with no corruption but today I switched to 0.8.5-OMG8.
  101. wtogami commented at 7:07 pm on December 3, 2013: contributor
    Glad to hear. Bitcoin-Qt-0.8.5-OMG7-no-mmap2 and Bitcoin-Qt-0.8.5-OMG8 were effectively the same thing, the latter was just properly tagged in git. =)
  102. mrgitbit commented at 3:38 am on December 4, 2013: none
    I downloaded Litecoin QT for the very first time yesterday to a Macbook Pro Retina, received my first litecoins just now and immediately afterwards had this error. Now Litecoin QT won’t open without the error closing it and I have no idea what to do. I had just received LTC and had no time to backup the wallet. Is there a laymen/noob course of action to take?
  103. wtogami commented at 6:18 am on December 4, 2013: contributor
    @mrgitbit Likely your wallet.dat is fine. Use https://litecoin.info/Data_directory page to find and make a backup copy. http://blog.litecoin.org/ Download the latest MacOS build of Litecoin-Qt and use -reindex. Please go to the Litecoin Forum for technical support, it is inappropriate to get Litecoin help here.
  104. h0jeZvgoxFepBQ2C commented at 6:36 pm on December 9, 2013: none
    Is this ticket still open, since 0.8.6 has been released?
  105. laanwj closed this on Jan 6, 2014

  106. norn commented at 5:55 pm on October 16, 2014: none
    Still have this issue, Bitcoin 0.9.3 on Ubuntu 12.04.5 LTS 64bit, FS is ext4

    —cut—
    2014-10-16 16:20:22 ProcessBlock: ACCEPTED
    2014-10-16 16:20:22 UpdateTip: new best=000000000000001cf1cb5c7e4f5e62b069ff7eaa9c889b8d50794b49e48d3f3d height=253248 log2_work=71.333226 tx=22434776 date=2013-08-20 19:48:05 progress=0.240005
    2014-10-16 16:20:22 ProcessBlock: ACCEPTED
    2014-10-16 16:20:22 LevelDB read failure: Corruption: block checksum mismatch
    2014-10-16 16:20:22 Corruption: block checksum mismatch
    2014-10-16 16:20:22 *** System error: Database corrupted
    2014-10-16 16:20:26 ERROR: ProcessBlock() : AcceptBlock FAILED
    2014-10-16 16:20:26 Loaded 253248 blocks from external file in 2769282ms
    2014-10-16 16:20:27 Requesting shutdown
    2014-10-16 16:20:27 Running Shutdown in thread
    2014-10-16 16:20:27 opencon thread interrupt
    2014-10-16 16:20:27 dumpaddr thread stop
    2014-10-16 16:20:27 addcon thread interrupt
    2014-10-16 16:20:27 msghand thread interrupt
    2014-10-16 16:20:27 net thread interrupt
    2014-10-16 16:20:27 Shutdown : In progress...
    2014-10-16 16:20:27 StopNode()
    2014-10-16 16:20:28 Shutdown : done
    2014-10-16 16:20:28 Shutdown finished
    2014-10-16 16:20:28 Shutdown result: 1
    2014-10-16 16:20:28 Stopping thread
    2014-10-16 16:20:28 Stopped thread
    —cut—
  107. norn commented at 9:31 am on October 18, 2014: none
    The issue can be resolved by updating leveldb to the latest one. At least It works for me.
  108. norn commented at 8:29 pm on November 28, 2014: none
    Recently I’ve found it was actually bad RAM.
  109. laanwj commented at 5:21 am on November 29, 2014: member

    Thanks for letting us know what the problem was.

    Heh yes, Bitcoin Core is good in finding any problems with your RAM and CPU, usually one of the first programs that starts failing because it requires all results to be correct and deterministic. One bit flip can be enough. Crypto is fragile like that. In a way the early warning is good - stop using the machine for bitcoin until you are sure it is fixed, corruption in wallet operations could result in coin loss.

  110. Bushstar referenced this in commit c3602372cc on Apr 5, 2019
  111. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-03 00:12 UTC
