index: Fix backwards search for bestblock

mzumsande commented at 10:05 pm on October 26, 2021: contributor

This PR attempts to fix an intermittent Init issue encountered during the stress testing of #23289, which relates to the pruning-compatible filter reconstruction logic introduced in #15946.

The problem would occur when the node starts with -txindex=1 but ThreadSync is interrupted after it sets m_best_block_index to Genesis, and before it gets do any further work. In that case, during the next restart of the node, an Init error would be thrown because BaseIndex::Init() tries to backtrack from the tip to the last block which has been successfully indexed (here: Genesis), but the backtracking logic didn’t work properly in this case: The loop while (block_to_test && block->pprev && (block->pprev->nStatus & BLOCK_HAVE_DATA)) checks if a predecessor exists before performing the check block_to_test == block and then possbily setting prune_violation = false If block_to_test and block are the Genesis block this check will not be reached because block->pprev does not exist.

To reproduce this bug on regtest:

start a node with a fresh datadir using -txindex=1 (or any other index)
stop and restart without any index
mine a block
stop and restart again with the index enabled ->InitError Error: txindex best block of the index goes beyond pruned data. (...)

Fix this by requiring that we have the data for the block of the current iteration block (instead of requiring it for the predecessor block->pprev) That way, the check for block_to_test == block is also reached when block_to_test is the Genesis block. No longer requiring the data of block->pprev also means that we can now prune up to m_best_block_index height without requiring a reindex (one block more than before). I added this edge case to feature_blockfilterindex_prune.py, the new version should fail on master.

DrahtBot added the label UTXO Db and Indexes on Oct 26, 2021

jamesob commented at 2:30 pm on October 27, 2021: member

Concept ACK

mzumsande marked this as a draft on Oct 27, 2021

mzumsande commented at 11:17 pm on October 27, 2021: contributor

The proposed change would also mean that it would now be possible to prune up to the best block of an index without needing to reindex (+1 block compared to master). I hope I’m not missing something, but I don’t see why we still need the full data of block->pprev in the case where block is the best block of the index and block->pprev was indexed in the past. I added a test for this edge case to feature_blockfilterindex_prune.py.

DrahtBot commented at 1:35 am on October 28, 2021: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#21726 (Improve Indices on pruned nodes via prune blockers by fjahr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

mzumsande marked this as ready for review on Nov 9, 2021

mzumsande commented at 0:20 am on November 9, 2021: contributor

Moved out of draft and added steps to reproduce to OP. Not sure if the second commit (test) should be merged, maybe it’s too detailed, plus the test gets reworked in #21726 anyway.

Note that the last commit of #21726 would partly “fix” the issue too by just not running the affected code unless -prune=1 is specified but doesn’t address the root cause.

in test/functional/feature_blockfilterindex_prune.py:27 in a36e23b183 outdated

23@@ -24,11 +24,7 @@ def run_test(self):
24         self.log.info("check if we can access a blockfilter when pruning is enabled but no blocks are actually pruned")
25         self.sync_index(height=200)
26         assert_greater_than(len(self.nodes[0].getblockfilter(self.nodes[0].getbestblockhash())['filter']), 0)
27-        # Mine two batches of blocks to avoid hitting NODE_NETWORK_LIMITED_MIN_BLOCKS disconnection
28-        self.generate(self.nodes[0], 250)
29-        self.sync_all()
30-        self.generate(self.nodes[0], 250)
31-        self.sync_all()
32+        self.generate(self.nodes[0], 500)

maflcko commented at 9:54 am on November 9, 2021:

Why is this change correct?

mzumsande commented at 10:09 am on November 9, 2021:

Because preventing disconnections due to NODE_NETWORK_LIMITED_MIN_BLOCKS makes little sense when there is just one node involved in the test.

laanwj commented at 12:45 pm on November 10, 2021: member

Concept ACK

DrahtBot added the label Needs rebase on Nov 16, 2021

jamesob approved

jamesob commented at 4:35 pm on November 22, 2021: member

ACK a36e23b183a6b8d1c60598618f2a5c8f31da02e0 (jamesob/ackr/23365.1.mzumsande.index_fix_backwards_sear)

Thanks for looking into this and figuring out a good fix. Once this is merged, I can update #23289 to cover interrupting init after index initialization.

I built this locally and confirmed that the attached test fails if the index/base.cpp changes are reverted. Logically the change makes sense - if the index’s best block is the genesis, we should be able to continue indexation from that point (which is all the rephrased while condition does).

The block = block->pprev; does make me a little nervous (although as-written the code would never cause any problems given how we initialize block_to_test), so if you wanted to you could include an assertion like

 0diff --git a/src/index/base.cpp b/src/index/base.cpp
 1index 88c74ec543..8525dcbfa0 100644
 2--- a/src/index/base.cpp
 3+++ b/src/index/base.cpp
 4@@ -96,6 +96,7 @@ bool BaseIndex::Init()
 5                     prune_violation = false;
 6                     break;
 7                 }
 8+                assert(block->pprev);
 9                 block = block->pprev;
10             }
11         }

 0-----BEGIN PGP SIGNED MESSAGE-----
 1Hash: SHA512
 2
 3ACK a36e23b183a6b8d1c60598618f2a5c8f31da02e0 ([`jamesob/ackr/23365.1.mzumsande.index_fix_backwards_sear`](https://github.com/jamesob/bitcoin/tree/ackr/23365.1.mzumsande.index_fix_backwards_sear))
 4
 5-----BEGIN PGP SIGNATURE-----
 6
 7iQIzBAEBCgAdFiEEGNRVI1NPYuZCSIrGepNdrbLETwUFAmGbxWwACgkQepNdrbLE
 8TwVIdg/8CKYBVyw9DvvSx/G2E31/z7avW4uwj5cq10HMmjtsT/cCHhFtofDtVvIG
 99binTF/dlYhyzQmv7FduzQkgpZtehokrrBugoktyh6zeJpRV+mPUxZvFJ0VUCK1w
10+dVJ2zdbDmDuogJR7+ZZbGgBGxPKjS9t/Q+/LQIP221lgAVi1MPJ8oYuyg4+R6hn
11ueS3piEtXmHnQt0F05PUhJDZZidB/3hRp1bu9Z0H79TrL65wnA1bnDf3+IKbGeQL
12CPsYxZR4VZRiIqkSESsKuIbIVE2ELNJcQHP5pKoA389dBDbDlMfw1Lo/N+opB1tm
13kbkrCt0ZB4HEAwsbyi3KWOue5rYkx5zuz6/7QW6sChkfGLwE+F+UfyvzCzy2P8FI
14Fzd8lx/74O/+0NtcIpY+kSlPxztdQCuqCZHmx8cuYvnfPDIWkg2TrgX77JPoGJHO
15x8mJBGWrTkxYoEM2brLWBKK24pPbafrx860NWEzJEoJ5xreZOfAzsBZPLtEZtymq
16hggWvPjgGljGUFnKakjhP3ahMgDBNyQxoXM/n/XIE3NSepvRE7WOc5yxj2gGScM8
17h6oIYZVmlgbbImA1asNDgXRk4acrA/N+3dZdUCAyjPz0lhx3/g2XznCZsfRyZdjA
18cftewu0RF0l/XnyRZ37K5xvTUsjDYqb2omufwnmgy2giD9QnHdA=
19=5ydW
20-----END PGP SIGNATURE-----

0Tested on Linux-5.10.0-9-amd64-x86_64-with-glibc2.28
1
2Configured with ./configure LDFLAGS=-L/home/james/src/bitcoin/db4/lib/ CPPFLAGS=-I/home/james/src/bitcoin/db4/include/ CXXFLAGS=-fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis --enable-wallet --enable-debug --with-daemon --enable-natpmp-default CXX=/usr/bin/clang++ CC=/usr/bin/clang PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --enable-module-schnorrsig --enable-experimental --disable-openssl-tests --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --enable-module-schnorrsig --enable-experimental --disable-openssl-tests --no-create --no-recursion
3
4Compiled with /usr/bin/ccache /usr/bin/clang++ -std=c++17 -mavx -mavx2 -fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis -O0 -g3 -ftrapv -fdebug-prefix-map=$(abs_top_srcdir)=.  -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -msse4 -msha -msse4.1 -msse4.2  i
5
6Compiler version: Debian clang version 13.0.1-++20211019122913+8a93745a7121-1~exp1~20211019003538.10

index: Fix backwards search for bestblock

This allows filters to be reconstructed when the best known block is
the Genesis block without needing to reindex.
It fixes Init errors seen in #23289.

698c524698

test: Add edge case of pruning up to index height 9600ea0145

mzumsande force-pushed on Nov 26, 2021

DrahtBot removed the label Needs rebase on Nov 26, 2021

mzumsande commented at 3:43 pm on November 26, 2021: contributor

The block = block->pprev; does make me a little nervous (although as-written the code would never cause any problems given how we initialize block_to_test), so if you wanted to you could include an assertion like

Thanks - I added the assertion as suggested, and rebased. Note that adding the assert only changes the kind of error: Before, if block->pprev had not existed (which I agree shouldn’t be possible), the InitError would have been thrown (because the while condition includes a check for block).

in src/index/base.cpp:99 in 9600ea0145

 95+            while (block_to_test && block && (block->nStatus & BLOCK_HAVE_DATA)) {
 96                 if (block_to_test == block) {
 97                     prune_violation = false;
 98                     break;
 99                 }
100+                assert(block->pprev);

jnewbery commented at 1:16 pm on December 7, 2021:

It may be worth adding a comment here on why block->pprev must exist (because we’re walking backwards from the active chain tip to a block index in the active chain. If a block index that we encounter doesn’t have a backwards pointer, then the chain is malformed).

jamesob commented at 2:37 pm on December 7, 2021:

This isn’t actually a correct explanation (the genesis block is in the active chain and doesn’t have pprev set). The reason this assertion will hold is because of how block and block_to_test are chosen; the conditional above this line is guaranteed to break at-or-higher-than genesis because of the few lines of logic above (which, if summarized in a comment, would just end up repeating the fairly-straightforward code above).

jnewbery commented at 3:46 pm on December 7, 2021:

I think you’re saying the same thing as I am. Because of the way block and block_to_test are chosen, if we walk backwards from block, we’re guaranteed to encounter block_to_test (either at or before the genesis block).

mzumsande commented at 9:50 pm on December 13, 2021:

Sorry I didn’t get to this earlier, would have added a comment, do people think that this warrants a follow-up?

fanquake commented at 1:30 am on December 14, 2021:

@mzumsande sounds like it could be.

mzumsande commented at 10:28 pm on December 14, 2021:

done in #23777

in test/functional/feature_blockfilterindex_prune.py:53 in 9600ea0145

49-        self.generate(self.nodes[0], 1000)
50+        self.generate(self.nodes[0], 502)
51+
52+        self.log.info("prune exactly up to the blockfilterindexes best block while blockfilters are disabled")
53+        pruneheight_2 = self.nodes[0].pruneblockchain(1000)
54+        assert_equal(pruneheight_2, 998)

jnewbery commented at 1:18 pm on December 7, 2021:

Is there a specific reason why calling pruneblockchain(1000) prunes up to exactly block 998? Is it because that’s where the block file happens to wrap (i.e. blocks 999 and 1000 are in the same file as later blocks and so can’t be pruned). If so, that seems a bit brittle (e.g. if the default block file size of disk serialization changes, then this test will fail), and definitely seems worthy of a comment.

mzumsande commented at 9:45 pm on December 13, 2021:

Is it because that’s where the block file happens to wrap

Yes, I believe that is the reason, and I agree it’s brittle (although the test relied before on this, e.g. at assert_equal(pruneheight, 248) - I’m not sure though if there is a better way to test pruning in a detailed way.

mzumsande commented at 10:28 pm on December 14, 2021:

comment added in #23777

jnewbery commented at 1:18 pm on December 7, 2021: contributor

Concept ACK.

ryanofsky commented at 8:18 pm on December 7, 2021: contributor

Partial code review ACK 9600ea01450b0d39be90eb2971c1ac5c9b69a66e for the code change, not the test changes. (Test changes are indirect and little over my head.) It seems obvious that previous code prune_violation = true, while (block->pprev) would incorrectly detect a prune violation at the genesis block, and the fix here make sense and looks correct.

maflcko merged this on Dec 13, 2021

maflcko closed this on Dec 13, 2021

jamesob referenced this in commit 0fc6b72d1d on Dec 13, 2021

sidhujag referenced this in commit 1fd54088ea on Dec 13, 2021

mzumsande referenced this in commit 59ba4cbb0f on Dec 14, 2021

mzumsande referenced this in commit 43ccaf6fdf on Dec 14, 2021

bitcoin deleted a comment on Dec 14, 2021

mzumsande referenced this in commit e4a8d561ed on Dec 15, 2021

maflcko referenced this in commit df6e961c41 on Dec 16, 2021

sidhujag referenced this in commit 8eeb7d1aaf on Dec 16, 2021

0xB10C referenced this in commit 6c5ecf97f9 on Dec 16, 2021

jamesob referenced this in commit 8904f17ea7 on Dec 29, 2021

mzumsande referenced this in commit 82ec24709b on Jan 17, 2022

rebroad referenced this in commit d1a11217eb on Feb 3, 2022

rebroad referenced this in commit f4db8ed45e on Feb 3, 2022

kwvg referenced this in commit 27bcd1b42b on Feb 26, 2022

Fabcien referenced this in commit eacbda7591 on Apr 5, 2022

Fabcien referenced this in commit 7a31619610 on Nov 25, 2022

bitcoin locked this on Dec 14, 2022

mzumsande deleted the branch on Oct 13, 2023

index: Fix backwards search for bestblock #23365

Conflicts