Optimize reindex

sipa commented at 2:13 pm on April 20, 2016: member

Several changes:

Make reindex/import use AcceptBlock rather than ProcessNewBlock, so activation of the resulting best chain is delayed.
That delayed activation is handled by the existing “Activating best chain” step in the startup process, which is moved to ThreadImport.
Optimize ActivateBestChain to not walk the entire chain to find the most work chain to connect for each block (O(n^2) -> O(n)).

This has several advantages:

As the actual activation is done after all blocks have been added back to the index, it gets to use the checkpoints information (better progress indicator, skipping of signature checks, …).
Having a block index with many unactivated blocks (for example, after deleting the chainstate directory) becomes much faster
That case also runs in the background now (as it’s simply treated as part of the importing process).

Downsides:

All blocks are read twice from disk during reindex (once to rebuild the index, once to activate).

Todo:

Better progress indicator during the index rebuild phase.

laanwj added the label Validation on Apr 20, 2016

laanwj commented at 2:20 pm on April 20, 2016: member

Good to see this!

All blocks are read twice from disk during reindex (once to rebuild the index, once to activate)

Could this (theoretically) be avoided by just reading the header during the initial scan? after all, the extra header that we tack on includes the size of the block, so we could skip the rest the first time.

sipa commented at 2:22 pm on April 20, 2016: member

@laanwj Yes, I’ve considered that. That could lead to an inconsistent block index though (as in: I’m not sure whether there are any internal assumptions that all entries in the block index are valid).

jonasschnelli commented at 2:23 pm on April 20, 2016: contributor

Nice. Concept ACK. Testing this now on a mainnet node -reindex.

sipa commented at 2:25 pm on April 20, 2016: member

@laanwj Another problem with that approach: say you get a block corrupted near the chain tip on disk, you find it and do a reindex. You expect it to reindex up to that corrupted block and then start back from network. Instead, if you don’t do transaction validation during the rebuild loop, it may happily add the block to the index, and have activation fail (making it mark the chain as invalid).

laanwj commented at 2:27 pm on April 20, 2016: member

Yes, it would be quite more complex to handle those issues well. In any case if the speed-up is more than the extra disk i/o that this generates takes, it doesn’t matter, no need to worry about it in this pull.

sipa commented at 2:30 pm on April 20, 2016: member

@jonasschnelli If you feel like benchmarking, an interesting thing to test is (reindex on master) vs (deleting chainstate on this PR), both with signature checks hackishly disabled. Deleting of chainstate caused a very slow, foreground, and (before #7821) uninterruptible process earlier. With this PR is should be (nearly) the same speed as an old reindex.

sipa force-pushed on Apr 20, 2016

sipa commented at 2:39 pm on April 20, 2016: member

Would it be useful to have a -weakreindex or -rebuild now, which does not throw out the block index, and only the chainstate? That would be faster, as it wouldn’t need the rebuild phase.

laanwj commented at 3:37 pm on April 20, 2016: member

Would it be useful to have a -weakreindex or -rebuild now, which does not throw out the block index, and only the chainstate? That would be faster, as it wouldn’t need the rebuild phase.

Would need better reporting about what database is corrupted for that to be useful to end-users. But even without that I can see that being very useful for benchmarking the UTXO database.

I like the idea of being able to rebuild databases separately. What about the other way around? If you have a UTXO set, but want to reindex the block chain? (e.g. when switching a pruned node to a non-pruned one by copying block data from another node)

If it was up to me I wouldn’t use the name ‘weakindex’ or ‘rebuid’ (that’s another one for the rescan, reindex, re…. category of vaguenes) but call it what it is: -rebuild-blockindex -rebuild-chainstate etc (using the same name as the db directory names).

sipa commented at 3:41 pm on April 20, 2016: member

I like the idea of being able to rebuild databases separately. What about the other way around? If you have a UTXO set, but want to reindex the block chain? (e.g. when switching a pruned node to a non-pruned one by copying block data from another node)

You can just overwrite your blocks/ database with another version (including its index), as long as it has your currently active chain in it.

sipa commented at 4:02 pm on April 20, 2016: member

If it was up to me I wouldn’t use the name ‘weakindex’ or ‘rebuid’ (that’s another one for the rescan, reindex, re…. category of vaguenes) but call it what it is: -rebuild-blockindex -rebuild-chainstate etc (using the same name as the db directory names).

-rebuild-chainstate sounds like an excellent idea. I don’t know what -rebuild-blockindex would mean though…

sipa force-pushed on Apr 21, 2016

sipa commented at 12:17 pm on April 21, 2016: member

Added a commit to implement -reindex-chainstate.

I guess the semantics for -reindex-blockindex could be:

Rebuild the block index without rebuilding the chainstate
If afterwards the chainstate’s tip is in the resulting block index, mark every block on the best chain as validated and pretend nothing happened
If not, wipe the chainstate and rebuild that as well.

laanwj commented at 12:23 pm on April 21, 2016: member

I don’t know what -rebuild-blockindex would mean though…

That hypothetical option would rebuild the blockindex database, say after some new-fangled corruption check detects that that database was corrupted.

Throw away the block index (so the database in blocks/index)
Reindex blocks

But don’t touch the chain state in any way, it is assumed to be still good. If the best chain is not included after reindexing the blocks, throw another error that a ~~full reindex~~ -reindex-chainstate is necessary.

I guess the semantics for -reindex-blockindex could be:

Yes, something like that, sounds good to me.

sipa force-pushed on Apr 21, 2016

dcousens commented at 0:03 am on April 22, 2016: contributor

concept ACK -(rebuild|reindex)-*

sipa commented at 10:24 am on April 22, 2016: member

I tried implementing -reindex-blockindex, but it is significantly harder: it would mean first initializing with a dummy chainstate, then reindexing the blockindex while leaving the original chainstate untouched, then stopping the node, loading the real chainstate, marking the blocks along the best chain as valid (or fail if not found anymore), and starting the node again.

laanwj commented at 1:09 pm on April 26, 2016: member

Going to test this

laanwj commented at 1:29 pm on April 26, 2016: member

Not sure if this is expected behavior, it’s probably harmless; but during the “Reindexing block file” initial scan, rev*.dat files are created with size 0:

0-rw------- 1 user user         0 Apr 26 15:26 rev00415.dat
1-rw------- 1 user user         0 Apr 26 15:26 rev00416.dat
2-rw------- 1 user user         0 Apr 26 15:26 rev00417.dat
3-rw------- 1 user user         0 Apr 26 15:26 rev00418.dat

Edit: the block counts are off? possibly due to out-of-order blocks not being counted. It looks a bit strange but probably harmless:

02016-04-26 13:30:48 Reindexing block file blk00471.dat...
12016-04-26 13:30:51 Loaded 3 blocks from external file in 2585ms

laanwj commented at 5:58 pm on April 26, 2016: member

Did a reindex of my test chain using this patch:

02016-04-26 13:07:36 Reindexing block file blk00000.dat...
12016-04-26 13:32:16 UpdateTip: new best=00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048 height=1 version=0x00000001 log2_work=33.000022 tx=2 date='2009-01-09 02:54:25' progress=0.000000 cache=0.0MiB(1tx)
22016-04-26 17:44:18 UpdateTip: new best=00000000000000000360e6d03330560950285d1062c6f6e6a4558ba1720d6e73 height=407838 version=0x00000004 log2_work=84.507935 tx=123156514 date='2016-04-18 11:34:09' progress=0.994753 cache=43.8MiB(23470tx)

Times:

0phase1   00:24:40 (1480s)
1phase2   04:12:02 (15122s)
2         -----------------
3total    04:36:42 (16602s)

For comparison, a sync from a local node on the same machine, under the same conditions, takes 05:32:29 (19949s). That’s very good. I have not tried doing a ’traditional’ reindex,

Edit: also verified that this fixes #7038. Also tested that interrupting the reindex then restarting works, both in the first phase and second phase.

Edit.2: also did a “traditional” reindex - it took 04:45:49 (17149s). Only a bit longer. Note that this is a fast CPU with 8 cores, so possibly the difference with not validating is less than it would otherwise be.

Edit.3: verified that utxo set hash is the same after old reindex versus new reindex (just to be sure).

in src/init.cpp: in 691cbc972a outdated

607     }
608 
609+    // scan for better chains in the block chain database, that are not yet connected in the active best chain
610+    CValidationState state;
611+    if (!ActivateBestChain(state, chainparams))
612+        LogPrintf("Failed to connect best block");

laanwj commented at 9:22 am on April 27, 2016:

Shouldn’t this error be fatal?

jonasschnelli commented at 9:05 am on April 28, 2016: contributor

Tested on a mainnet node:

0jonasschnelli@bitcoinsrv:~/.bitcoin$ cat debug.log | grep "Bitcoin version\|Reindexing finished\|height=409149"
12016-04-27 14:11:51 Bitcoin version v0.12.99.0-fb682d7 (2016-04-27 15:34:26 +0200)
22016-04-27 14:34:17 Reindexing finished
32016-04-27 18:52:50 UpdateTip: new best=000000000000000000c2963b062bebef4beff8ef38ae5d81e7ccfa8e05abcb1a height=409149 version=0x00000004 log2_work=84.559763 tx=125119626 date='2016-04-27 14:05:06' progress=0.999875 cache=4.0MiB(0tx)

= Times:

0phase1    1'346s (00:22:26)
1phase2   15'513s (04:18:33)
2===========================
3total    16'859s (04:40:59)

Full debug log with -debug=reindex http://bitcoinsrv.jonasschnelli.ch/debug.reindex.log (>240MB!)

sipa commented at 2:19 pm on April 28, 2016: member

Added GUI progress reporting

sipa force-pushed on Apr 28, 2016

sipa commented at 9:22 pm on May 1, 2016: member

The two phases in the GUI:

reindexing processing

If the chainstate is ahead of the block index at startup, you get the same “Processing blocks on disk” now, while the main GUI is running (and not during loading).

sipa force-pushed on May 1, 2016

laanwj commented at 11:22 am on May 2, 2016: member

@sipa Very nice!

jonasschnelli commented at 1:50 pm on May 2, 2016: contributor

Just started to sync with master+thisPR a new node over the GUI on Ubuntu 16.04 with dbcache=3500 and got this:

The GUI used chainActive.tip(). Is this no longer updated continuously?

sipa force-pushed on May 5, 2016

sipa commented at 4:56 pm on May 5, 2016: member

@jonasschnelli I added another commit that hopefully fixes that (untested).

sipa force-pushed on May 9, 2016

sipa commented at 6:38 pm on May 9, 2016: member

Rebased.

jonasschnelli commented at 7:54 am on May 10, 2016: contributor

Just started a Qt reindex with https://bitcoin.jonasschnelli.ch/pulls/7917/ (7ac4114). The debug window shows 0 as current number of blocks (is this intentional?). The progress bar in the main window is updating.

jonasschnelli commented at 9:41 am on May 10, 2016: contributor

After a while it started updating the debug window (maybe after first reindex [chain index] phase).

laanwj commented at 9:50 am on May 10, 2016: member

During the first phase the number of blocks will be 0, that makes sense. I guess it could show the longest header chain as well in the debug window, but right now it only shows what is fully validated.

laanwj commented at 9:55 am on May 10, 2016: member

Hm but I see the same behavior, reindexing a regtest chain. Log shows:

02016-05-10 09:53:32 UpdateTip: new best=57d8789c33ee61db98c8b87076cae5c637056bff4a4c340b3ef28e9bc853a4a3 height=250 version=0x00000003 log2_work=8.9715436 tx=251 date='2015-06-23 12:20:30' progress=1.000000 cache=0.1MiB(250tx)

While the current number of blocks in the debug window is still at 1. Same for the onmouseover “Processed 1 block of transaction history”

laanwj commented at 10:05 am on May 10, 2016: member

In my case it is a problem is in the rate limiting in BlockTipChanged. If the updating goes too fast, it will only send a signal for the initial value.

0if (!initialSync || now - nLastBlockTipUpdateNotification > MODEL_UPDATE_DELAY) {

This is not new in this patch, just exposed by the new syncing behavior. I think we should move ahead with this and solve the UI issues separately.

sipa commented at 7:53 pm on May 15, 2016: member

@laanwj @jonasschnelli Added another commit to make header updates not compete for rate limiting with block updates.

Make ProcessNewBlock dbp const and update comment d253ec4baa

Switch reindexing to AcceptBlock in-loop and ActivateBestChain afterwards 316623f2c1

Optimize ActivateBestChain for long chains fb8fad1586

Add -reindex-chainstate that does not rebuild block index d3d7547911

Report reindexing progress in GUI b4d24e142e

in src/main.cpp: in 7bb8649816 outdated

4101@@ -4060,17 +4102,19 @@ bool LoadExternalBlockFile(const CChainParams& chainparams, FILE* fileIn, CDiskB
4102                         std::multimap<uint256, CDiskBlockPos>::iterator it = range.first;
4103                         if (ReadBlockFromDisk(block, it->second, chainparams.GetConsensus()))
4104                         {
4105-                            LogPrintf("%s: Processing out of order child %s of %s\n", __func__, block.GetHash().ToString(),
4106+                            LogPrint("reindex", "%s: Processing out of order child %s of %s\n", __func__, block.GetHash().ToString(),

MarcoFalke commented at 10:45 am on May 16, 2016:

Does this require mention in the -debug=<category> help text?

sipa commented at 10:38 pm on May 16, 2016:

Agree!

sipa force-pushed on May 16, 2016

sipa commented at 10:46 pm on May 16, 2016: member

Squashed and rebased.

pstratem commented at 1:58 am on May 17, 2016: contributor

utACK fb8fad1586ced69fa37c665a11916ae4c4d0df05

pstratem commented at 3:00 am on May 17, 2016: contributor

utACK d3d75479115bc3480f163df774ee9dd2f8bd9f54

pstratem commented at 3:08 am on May 17, 2016: contributor

very untested ACK b4d24e142e25a21c78ab74146c3520f2259fd7c2 (sanity not guaranteed)

pstratem commented at 3:29 am on May 17, 2016: contributor

On a related note access to chainActive immediately after threadGroup.create_thread(boost::bind(&ThreadImport, vImportFiles)); is not protected by cs_main

laanwj merged this on May 18, 2016

laanwj closed this on May 18, 2016

laanwj referenced this in commit 239d419864 on May 18, 2016

str4d referenced this in commit f42cde3c30 on Feb 23, 2017

sickpig referenced this in commit 2142ea0cd4 on Feb 25, 2017

sickpig referenced this in commit ae828d508d on Feb 25, 2017

sickpig referenced this in commit f1773ac268 on Feb 27, 2017

sickpig referenced this in commit 9ba0220027 on Mar 4, 2017

random-zebra referenced this in commit b993c56afe on Apr 4, 2021

LarryRuane referenced this in commit d68a5581ee on May 27, 2021

zkbot referenced this in commit ab8ed50c4e on Aug 9, 2021

zkbot referenced this in commit f721e3b067 on Aug 9, 2021

in src/qt/clientmodel.cpp:27 in b4d24e142e

23@@ -24,6 +24,7 @@
24 class CBlockIndex;
25 
26 static const int64_t nClientStartupTime = GetTime();
27+static int64_t nLastHeaderTipUpdateNotification = 0;

rebroad commented at 11:48 am on September 7, 2021:

This variable seems to be initialized here, but there’s nowhere in the code where it’s ever updated, and yet it is checked as if the value might change.

MarcoFalke commented at 11:54 am on September 7, 2021:

Please don’t use more than 5 year old pull requests to ask beginner programming questions.

You can open a new issue, if you are absolutely unable to find the answer yourself.

In this case it is updated via a reference (a reference is like a pointer).

MarcoFalke locked this on Sep 7, 2021

Optimize reindex #7917