[0.11.2] Random crashes during blockchain sync - Windows

ghost commented at 0:29 am on December 23, 2015: none

Hi everybody.

I searched for duplicates of this issue but couldn’t find any, either on this repo or on Google. It happens with Bitcoin Core v0.11.2 on Windows (current environment: Windows 10 x64) and seems to occur at a random point some time after sync starts, never immediately. The problem is a total crash of the program. I can relaunch it to keep syncing, but it slows things down quite a lot, since I can’t leave it running overnight. I’ll get through it, but I thought it worth sharing this report with you.

What I do: I launch Bitcoin Core.

What happens: It resumes syncing to get the full blockchain, and nothing bad happens at first. Then, several hours later, a seemingly random error occurs: an error window appears, describing a fatal error that will lead to program shutdown. Closing this window terminates the program.

What happens next: I can restart the program and it resumes syncing. Then you can go back to “What happens”, because the same error occurs again after a few hours.

Context data: This happens on Windows 10 x64. The program (v0.11.2, x64) was downloaded from bitcoin.org. This doesn’t seem to be a network-related issue, as it happens in very different network contexts.

Relevant log

2015-12-22 18:58:41 IO error: Win32WritableFile.Sync::FlushFileBuffers C:\Users\Username\AppData\Roaming\Bitcoin\chainstate\223994.ldb: Média protégé en écriture. [it means the media is not writeable]
2015-12-22 18:58:41 *** System error while flushing: Database I/O error
2015-12-22 19:16:21 ping timeout: 1200.023628s
2015-12-22 19:17:59 socket sending timeout: 1201s
2015-12-22 19:18:39 socket sending timeout: 1201s
2015-12-22 19:18:39 socket sending timeout: 1201s
2015-12-22 19:18:40 socket sending timeout: 1201s
2015-12-22 19:18:40 socket sending timeout: 1201s
2015-12-22 19:18:40 socket sending timeout: 1201s
2015-12-22 19:18:40 socket sending timeout: 1201s

[I think that at this moment I came back to my computer, noticed the error window, closed it and restarted the program]

2015-12-22 22:56:19 ERROR: ProcessNewBlock: ActivateBestChain failed
2015-12-22 22:56:20 opencon thread interrupt
2015-12-22 22:56:20 scheduler thread interrupt
2015-12-22 22:56:20 addcon thread interrupt
2015-12-22 22:56:20 msghand thread interrupt
2015-12-22 22:56:20 net thread interrupt
2015-12-22 22:56:20 Shutdown: In progress...
2015-12-22 22:56:20 StopNode()
2015-12-22 22:56:20 IO error: Win32WritableFile.Sync::FlushFileBuffers C:\Users\Username\AppData\Roaming\Bitcoin\chainstate\223994.ldb: Média protégé en écriture.
2015-12-22 22:56:20 *** System error while flushing: Database I/O error
2015-12-22 22:56:21 Shutdown: done
2015-12-22 22:56:23 GUI: "registerShutdownBlockReason: Successfully registered: Bitcoin Core didn't yet exit safely..."
2015-12-22 22:56:23
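For anyone wanting to pull the same kind of entries out of their own debug.log, here is a minimal Python sketch. It assumes the "YYYY-MM-DD HH:MM:SS message" layout seen in this log; the function name `find_io_errors` and the two substrings it matches are illustrative choices, not anything Bitcoin Core provides.

```python
import re
from datetime import datetime

# Assumes each log line starts with "YYYY-MM-DD HH:MM:SS " as in the excerpt above.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$")


def find_io_errors(lines):
    """Return (timestamp, message) pairs for lines that report a database I/O error."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # blank lines, notes, or lines without a full timestamp
        stamp, message = match.groups()
        # Matching on these two substrings is an assumption based on this report only.
        if message.startswith("IO error:") or "System error while flushing" in message:
            hits.append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), message))
    return hits


# Three lines copied from the log in this report.
sample = [
    r"2015-12-22 18:58:41 IO error: Win32WritableFile.Sync::FlushFileBuffers "
    r"C:\Users\Username\AppData\Roaming\Bitcoin\chainstate\223994.ldb: Média protégé en écriture.",
    "2015-12-22 18:58:41 *** System error while flushing: Database I/O error",
    "2015-12-22 19:16:21 ping timeout: 1200.023628s",
]

for when, message in find_io_errors(sample):
    print(when.isoformat(), message)
```

To scan a real file, the same function can be fed the lines of an opened debug.log instead of `sample`.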

Thanks in advance for any answer! :)

EDIT: Of course, the path shown in the debug log is writable, and that isn’t supposed to change over time.

tulip0 commented at 0:39 am on December 23, 2015: none

First port of call should be checking the health of your disk, there’s a good chance you’ve a hardware failure. Bitcoin Core is generally very good at finding memory, overheating and IO problems.

ghost commented at 1:21 am on December 23, 2015: none

Well, I didn’t think about it when submitting this issue, but SMART tests are fine and I’ve had no other particular disk problems. However, the drive is a bit old (approx. 3 years old), so I guess it’d be better to upgrade it. Thanks for the advice!

EDIT: I initially thought it was the HDD while, in fact, it was the SSD. SMART results aren’t perfect, but those imperfections are old and don’t pose a threat. It does give more weight to the hypothesis of a (slightly) failing drive, though.

jonasschnelli added the label Windows on Dec 23, 2015

jonasschnelli added the label UTXO Db and Indexes on Dec 23, 2015

pstratem commented at 9:23 pm on December 25, 2015: contributor

@Ano59 Assuming I’ve translated this correctly, the message says “media write protected”, which for an SSD usually means it has entered a failure mode and is refusing additional writes.

ghost commented at 4:18 am on December 26, 2015: none

It’s hard to provide an accurate translation; think of it as the error message that would pop up if you tried to write to a USB drive you had just unplugged.

Btw, I re-checked the SMART status of my SSD and it’s still fine. There are 4 damaged sectors, but they have been there for as long as I can remember. Reads and writes on this drive are still fine, except for this random Bitcoin Core error.

I finally sync-ed with the full blockchain, so even if this error is a true bug rather than being due to my hardware, it would be a minor one as it doesn’t prevent people from sync-ing.

unknown closed this on Dec 26, 2015

OldGamer59 commented at 8:44 pm on February 25, 2016: none

I have a very similar problem. I get a fatal error after a couple of hours of synchronizing, sometimes less. I keep the database on an external USB 3.0 HD. I deleted the whole data folder and loaded it from scratch. That was OK: I was able to load the whole chain non-stop over some days. Now I decided to synchronize again (2 weeks behind), and the fatal error happened again when it was 9 days behind. Then it crashed at 6 and 3 days behind. It is really annoying, because it takes a long time to rescan and verify. I think it has something to do with writing to the HD. I have more than 50GB free on the HD, though.

laanwj added the label Data corruption on Mar 3, 2016

ghost commented at 5:43 pm on December 4, 2017: none

This is occurring with Bitcoin Core 0.14.1 on an iSCSI mount backed by a 2-node all-flash StarWind VSAN cluster as well.

TheBlueMatt commented at 8:50 pm on December 6, 2017: member

Is it possible in those cases that the OS hiccuped and the device was disconnected and then reconnected? eg a bad cable on the USB drive or the connection to the SAN drops and then is reestablished?
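One way to look for this kind of hiccup outside of Bitcoin Core is a small probe that repeatedly writes to the same volume and forces the data to disk. The Python sketch below is only an illustration: `probe_sync` is a made-up helper, and the scratch-file prefix is arbitrary. It relies on `os.fsync`, which asks the operating system to flush the file's buffers, so a volume that briefly refuses writes should surface as an `OSError` here too.

```python
import os
import tempfile


def probe_sync(directory, rounds=3, chunk=b"x" * 4096):
    """Write and fsync a scratch file in `directory`; return the error messages seen."""
    errors = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="syncprobe-")
    try:
        for _ in range(rounds):
            try:
                os.write(fd, chunk)
                os.fsync(fd)  # force buffered data out to the storage device
            except OSError as exc:
                errors.append(str(exc))
    finally:
        os.close(fd)
        os.remove(path)  # clean up the scratch file
    return errors


if __name__ == "__main__":
    # Probes the current directory; point it at the drive holding the data folder.
    print(probe_sync(os.getcwd()) or "no sync errors")
```

Running it in a loop with a pause between calls, over the same hours during which the crashes occur, would show whether the failures are specific to Bitcoin Core or affect any writer on that volume.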

ghost commented at 2:29 am on December 7, 2017: none

@TheBlueMatt The iSCSI mount had MPIO configured to connect to both nodes in the VSAN cluster. One of the VSAN nodes was on the same host and VLAN as the Bitcoin Core client, so I would think it would be rather difficult for the connection to just drop. Replication between VSAN nodes was also configured as synchronous.

To be fair, I also realized that I had enabled NTFS deduplication, which might be interfering with it. I know Exchange databases are excluded from dedup by default; I’m not sure if Berkeley DB databases have issues as well.

TheBlueMatt commented at 0:09 am on December 8, 2017: member

The issues there are leveldb interactions (“chainstate\223994.ldb” referenced in the error). How leveldb interacts with iSCSI targets on Windows with NTFS dedup is anyone’s guess, honestly. I wouldn’t be surprised if there were some race condition where Windows might report the iSCSI mount as not-writeable during some operation or if one of the two nodes’ connections drop (which would imply lack of quorum, if I understand your config correctly).

ghost commented at 6:18 am on December 8, 2017: none

@TheBlueMatt There was never an issue of lack of quorum. When that occurs, you have to manually select which node is considered good in order to resolve the split brain, which I would have noticed.

Anyways it would be nice if bitcoin core could move to more modern database technologies which support things such as networked storage. But that doesn’t seem to be a priority at this time.

TheBlueMatt commented at 3:22 pm on December 8, 2017: member

LevelDB happens to be much, much faster for our (somewhat strange) application - large batches that delete some large percentage of recently-written data - than almost any other non-log-structured database, so, indeed, there isn’t much desire to let users use other databases which will invariably be slower. As for the issues with iSCSI on Windows, I’m not a Windows person so really have no idea where to begin looking, but trying to correlate the LevelDB failures with other activity (dedup, iSCSI disconnection, etc) would be a first step.
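A rough Python sketch of that correlation step follows. It assumes the failure times have already been taken from debug.log and that the other activity has been collected by hand as (timestamp, label) pairs; the helper `events_near`, the five-minute window, and the sample events are all hypothetical.

```python
from datetime import datetime, timedelta


def events_near(failures, events, window=timedelta(minutes=5)):
    """Map each failure time to the labels of events within `window` of it."""
    return {
        failure: [label for when, label in events if abs(when - failure) <= window]
        for failure in failures
    }


# The failure time is the first I/O error from the log earlier in this thread.
failures = [datetime(2015, 12, 22, 18, 58, 41)]

# Hypothetical entries, e.g. copied manually from the OS event log.
events = [
    (datetime(2015, 12, 22, 18, 57, 0), "dedup job started (hypothetical)"),
    (datetime(2015, 12, 22, 12, 0, 0), "iSCSI reconnect (hypothetical)"),
]

for failure, nearby in events_near(failures, events).items():
    print(failure.isoformat(), nearby)
```

If the same kind of event keeps showing up next to the LevelDB failures, that is a hint worth following; an empty list every time points away from that activity.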

DrahtBot locked this on Sep 8, 2021

[0.11.2] Random crashes during blockchain sync - Windows - I/O error #7248