Improve chainstate/blockindex disk writing policy

sipa commented at 4:48 pm on November 7, 2014: member

There are 3 pieces of data that are maintained on disk. The actual block and undo data, the block index (which can refer to positions on disk), and the chainstate (which refers to the best block hash).

Earlier, there was no guarantee that blocks were written to disk before block index entries referring to them were written. This commit introduces dirty flags for block index data, and delays writing it until the actual block data is flushed.

With this stricter ordering in writes, it is now safe to not always flush after every block, so there is no need for the IsInitialBlockDownload() check there - instead we just write whenever enough time has passed or the cache size grows too large.

In addition, only do a write inside the block processing loop if necessary (because of cache size exceeded). Otherwise, move the writing to a point after processing is done, after relaying.

This should improve block relay speed, by not writing chainstate and index entries before announcing the new hash.

Future work: move block and undo data writing into FlushStateToDisk, and move it to a background thread.

sipa force-pushed on Nov 12, 2014

sipa commented at 2:37 pm on November 12, 2014: member

Added wallet tip updating and block/undo file flushing to the FlushStateToDisk() method. Very preliminary testing on my VPS (with very slow I/O) shows a ~100ms decrease in block processing time.

sipa force-pushed on Nov 12, 2014

in src/main.cpp: in 56e830897c outdated

1765+ * fast is not set and it's been a while since the last write.
1766+ */
1767+bool static FlushStateToDisk(CValidationState &state, bool fast = false, bool forceWrite = false) {
1768     static int64_t nLastWrite = 0;
1769-    if (forceWrite || pcoinsTip->GetCacheSize() > nCoinCacheSize || (!IsInitialBlockDownload() && GetTimeMicros() > nLastWrite + 600*1000000)) {
1770+    if (forceWrite || pcoinsTip->GetCacheSize() > nCoinCacheSize ||

laanwj commented at 9:35 am on November 13, 2014:

Parametrization on multiple bool parameters leads to hard-to-read code; I’d prefer to pass an enum, or bit field

sipa commented at 5:12 pm on November 14, 2014:

Agree, I’ll add a commit for changing it.

laanwj commented at 9:36 am on November 13, 2014: member

Concept ACK, will test

laanwj commented at 2:50 pm on November 16, 2014: member

Found a small issue. When AppInit2 exits prematurely, bitcoind will crash in shutdown with the following:

0[#0](/bitcoin-bitcoin/0/)  CCoinsViewCache::GetCacheSize (this=0x0) at coins.cpp:196
1[#1](/bitcoin-bitcoin/1/)  0x54afe528 in FlushStateToDisk (state=..., fast=fast@entry=false, forceWrite=forceWrite@entry=true) at main.cpp:1774
2[#2](/bitcoin-bitcoin/2/)  0x54afe906 in FlushStateToDisk () at main.cpp:1811
3[#3](/bitcoin-bitcoin/3/)  0x54ad5c16 in Shutdown () at init.cpp:153
4[#4](/bitcoin-bitcoin/4/)  0x54aca9d6 in AppInit (argc=6, argv=<optimized out>) at bitcoind.cpp:170
5[#5](/bitcoin-bitcoin/5/)  0x54acad20 in main (argc=6, argv=0x7efff714) at bitcoind.cpp:182

You need a check pcoinsTip!=0 in Shutdown() before calling FlushStateToDisk.

sipa force-pushed on Nov 16, 2014

sipa commented at 3:04 pm on November 16, 2014: member

Fixed.

laanwj commented at 2:52 pm on November 17, 2014: member

What about the pcoinstip->Flush() in gettxoutsetinfo https://github.com/bitcoin/bitcoin/blob/master/src/rpcblockchain.cpp#L322 - does this need to be some form of FlushStateToDisk?

sipa commented at 2:56 pm on November 17, 2014: member

@laanwj Ah right, it does. That’s sort of inconvienient but inevitable until there’s a way to iterate the UTXO entries in a CCoinsView. Writing pcoinsTip to disk means a reference to the tip block hash in the index, so that needs to be written, which on itself refers to disk positions, so blocks need to be written too…

sipa force-pushed on Nov 17, 2014

sipa commented at 2:58 pm on November 17, 2014: member

Fixed.

laanwj added the label Improvement on Nov 17, 2014

sipa commented at 2:02 pm on November 20, 2014: member

Tested by kill -KILL’ing a node while it was flushing. No problems.

sipa force-pushed on Nov 20, 2014

gmaxwell commented at 7:19 am on November 23, 2014: contributor

This seems to greatly improves our handling of unclean shutdowns during the IBD. Previously we’d reliably corrupt the database, requring the user to manually delete things. I’ve tested this with a bunch of killing while syncing and not been able to break it.

sipa force-pushed on Nov 23, 2014

rdponticelli commented at 3:40 pm on November 23, 2014: contributor

I performed a thorough synchronization of a testnet node from scratch using this patch, sending term or kill signals randomly every 180-640 seconds. It worked like a charm. Such test easily triggers #5156 without it.

gmaxwell commented at 7:51 pm on November 23, 2014: contributor

I think we should seriously consider this for 0.10. It’s a big reliablity improvement during the initial sync, beyond its performance implications.

laanwj commented at 9:25 am on November 24, 2014: member

I’ve subjected this to some horrible crashes, and a full sync w/ DEBUG_LOCKORDER. ACK commithash 9dd92a12fe92b4a81e612acf1c6245b85fa12a73 for 0.10. https://dev.visucore.com/bitcoin/acks/5241

Improve chainstate/blockindex disk writing policy

There are 3 pieces of data that are maintained on disk. The actual block
and undo data, the block index (which can refer to positions on disk),
and the chainstate (which refers to the best block hash).

Earlier, there was no guarantee that blocks were written to disk before
block index entries referring to them were written. This commit introduces
dirty flags for block index data, and delays writing entries until the actual
block data is flushed.

With this stricter ordering in writes, it is now safe to not always flush
after every block, so there is no need for the IsInitialBlockDownload()
check there - instead we just write whenever enough time has passed or
the cache size grows too large. Also updating the wallet's best known block
is delayed until this is done, otherwise the wallet may end up referring to an
unknown block.

In addition, only do a write inside the block processing loop if necessary
(because of cache size exceeded). Otherwise, move the writing to a point
after processing is done, after relaying.

51ce901aa3

Introduce separate flushing modes a206950016

sipa force-pushed on Nov 24, 2014

sipa commented at 2:17 pm on November 24, 2014: member

Sorry, needed to rebase after BIP23 block proposals.

laanwj added this to the milestone 0.10.0 on Nov 24, 2014

gmaxwell commented at 10:42 pm on November 24, 2014: contributor

ACK.

laanwj merged this on Nov 25, 2014

laanwj closed this on Nov 25, 2014

laanwj referenced this in commit 397b9011c9 on Nov 25, 2014

MarcoFalke locked this on Sep 8, 2021

Improve chainstate/blockindex disk writing policy #5241