index: make startup more efficient

furszy commented at 2:55 pm on May 9, 2023: member

Simplifies index startup code, eliminating the g_indexes_ready_to_sync variable, deduplicating code and moving the prune violation check out of the BaseIndex class.

Also makes startup more efficient by running the prune violation check once for all indexes instead of once for each index, and by delaying the prune violation check and moving it off of the main thread so the node can start up faster and perform the block data availability verification even when the ‘-reindex" or the “-reindex-chainstate” flags are enabled (which hasn’t being possible so far).

DrahtBot commented at 2:55 pm on May 9, 2023: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	ryanofsky, TheCharlatan
Stale ACK	mzumsande

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#28053 (refactor: Move stopafterblockimport option out of blockstorage by TheCharlatan)
#28051 (Get rid of shutdown.cpp/shutdown.h, use SignalInterrupt directly by ryanofsky)
#27850 (test: Add unit & functional test coverage for blockstore by pinheadmz)
#26762 (bugfix: Make CCheckQueue RAII-styled by hebasto)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label CI failed on May 9, 2023

in src/index/base.cpp:389 in 86752e0cc5 outdated

384@@ -390,7 +385,11 @@ IndexSummary BaseIndex::GetSummary() const
385     IndexSummary summary{};
386     summary.name = GetName();
387     summary.synced = m_synced;
388-    summary.best_block_height = m_best_block_index ? m_best_block_index.load()->nHeight : 0;
389+    if (m_best_block_index) {
390+        const CBlockIndex* pindex = m_best_block_index.load();

ryanofsky commented at 3:09 pm on May 11, 2023:

In commit “index: verify blocks data existence only once” (86752e0cc5bc48f3d4ac1cd07835c37daf078d6a)

Not important in practice, but since m_best_block_index is an atomic variable, it would look a little better to retrieve it atomically with:

0    if (const CBlockIndex* pindex = m_best_block_index.load()) {

in src/init.cpp:1575 in 86752e0cc5 outdated

1571+    const auto& update_indexes_start_block = [&](const BaseIndex* index){
1572+        const IndexSummary& summary = index->GetSummary();
1573+        if (summary.synced) return;
1574+        const CBlockIndex* pindex = WITH_LOCK(::cs_main, return chainman.ActiveChain()[summary.best_block_height]);
1575+        // indexes are always initialized to the active chain last common block
1576+        assert(pindex->GetBlockHash() == summary.best_block_hash);

ryanofsky commented at 3:49 pm on May 11, 2023:

In commit “index: verify blocks data existence only once” (86752e0cc5bc48f3d4ac1cd07835c37daf078d6a)

In the case where the bitcoind datadir is being newly created and the LoadChainState call above doesn’t set a chain tip, pindex will be null here and this line will segfault. It should be possible to handle this with if (!pindex) return. This is causing CI errors (https://cirrus-ci.com/task/4539645451567104) currently

in src/node/interfaces.cpp:728 in 1341793b92 outdated

724@@ -725,6 +725,39 @@ class ChainImpl : public Chain
725     {
726         ::uiInterface.ShowProgress(title, progress, resume_possible);
727     }
728+    bool hasDataFromTipDown(const CBlockIndex* start_block) override {

ryanofsky commented at 3:54 pm on May 11, 2023:

In commit “indexes, refactor: Remove index prune_violation code” (1341793b928ba81205b1cea11fdbd52fe6c3e869)

Note for reviewers: code in this function is moved from the BaseIndex::Init function with minor changes. An easy way to review it is to look at the before and after versions in a side by side diff.

ryanofsky commented at 3:59 pm on May 11, 2023:

In commit “indexes, refactor: Remove index prune_violation code” (1341793b928ba81205b1cea11fdbd52fe6c3e869)

It would probably make more sense to make this method into a standalone function bool ChainDataFromTipDown(ChainstateManager& chainman, const CBlockIndex& start_block) function in someplace like src/node/chainstate.cpp than to add it to interfaces::Chain if it won’t be called by chain clients like wallets or indexes.

ryanofsky changes_requested

ryanofsky commented at 4:10 pm on May 11, 2023: contributor

Reviewed 86752e0cc5bc48f3d4ac1cd07835c37daf078d6a and it mostly looks good, but does have a null pointer deference currently (see comments and suggested fix). I like this PR because it will simplify #24230 and could potentially improve performance. I’m not actually how sure how much it actually would improve performance in practice though, so I’m curious if that’s was the original motivation here or if this is related to one of your other changes. I’m also curious about the todo:

Pending todo: Fix remaining test by using the block index map instead of the active chain. The active chain is not available at the point where the indexers are started.

Unsure if this is just referring to the null pointer problem, or to something else

furszy commented at 8:40 pm on May 11, 2023: member

Thanks for the deep review @ryanofsky!

Reviewed 86752e0 and it mostly looks good, but does have a null pointer deference currently (see comments and suggested fix). I like this PR because it will simplify #24230 and could potentially improve performance. I’m not actually how sure how much it actually would improve performance in practice though, so I’m curious if that’s was the original motivation here or if this is related to one of your other changes.

Yeah, the motivation came from a mix of thoughts actually.

I was reviewing #24230 and wasn’t completely sold by the hasDataFromTipDown() call inside the attachChain function. It seems odd to call to a chain data verification function in an event registering method. Then went deeper over the function, realized about the work duplication, and took that argument to simplify #24230 a bit.

We could also move the entire hasDataFromTipDown from the interface to somewhere else in this way. So there is no CBlockIndex dependency in the interface neither here nor in the #24230 intermediate commits.

Plus, It plays well with #25193.

And.. I’m also thinking that after #24230 and the parallel sync work (#26966), we could have one initial sync thread with a workers pool for all the indexers instead of the current per indexer sync thread. Which should speedup the sync process quite a lot. We are currently reading the entire chain block by block from disk on every index thread. So instead, we could read blocks just once, then dispatch events to the indexers. Making indexers purely listeners with no associated thread.

I’m also curious about the todo:

Pending todo: Fix remaining test by using the block index map instead of the active chain. The active chain is not available at the point where the indexers are started.

Unsure if this is just referring to the null pointer problem, or to something else

Yeah ok. That comment is inaccurate and I forgot to update it.

I initially thought that the issue was due an initialization ordering, that we weren’t having the active chain activated at that point (thought that was done only post load-blk thread completion). So I implemented 4738a1d7d46a9dd1f12dfeb047b142ab33c0fa16 and 837acfdbaca430cc5e29b7cdbf61380c33ed1aa2 instead of the current version. Which is better than what we have here currently, but it’s not the solution for the null ptr deference.

The issue is that empty indexers don’t set the best block to the genesis CBlockIndex, they just leave it null. So, the index summary returns a valid height=0 with an empty block hash.. which crashes on the block hash assertion.

So, the fix is easy. But.. I haven’t done it because have found another possible index sync “fatal error” in master and wanted to fix it prior continue moving forward here.

Essentially, we are not checking whether the node has the post fork point blocks in disk prior starting the index:

The pruning violation checks whether we have blocks from the fork point up to the active chain tip. But it does not check if the node has blocks from the fork point up to the fork tip.. which are required by coinstatsindex to reverse its state during the reorg..

E.g.

Active chain A -> B -> C -> D -> E

Index chain A -> B -> C -> G -> F -> H

The “failure error” will happen when G or F are not in disk or were pruned.

So, if this happens, it causes a “fatal error“ during the coin stats index reorg process due the impossibility to read the block from disk.

ryanofsky commented at 5:17 pm on May 15, 2023: contributor

re: #27607 (comment)

We could also move the entire hasDataFromTipDown from the interface to somewhere else in this way. So there is no CBlockIndex dependency in the interface neither here nor in the #24230 intermediate commits.

Yes I don’t think it makes sense to expose that method as part of the Chain interface, and even if it did make sense, it wouldn’t make sense to put the implementation in the ChainImpl class, because the class is mostly meant to hold glue code, not complicated functions. That’s the reason for the suggestion to make it a standalone function bool ChainDataFromTipDown(ChainstateManager& chainman, const CBlockIndex& start_block) in #27607 (review)

And.. I’m also thinking that after #24230 and the parallel sync work (#26966), we could have one initial sync thread with a workers pool for all the indexers instead of the current per indexer sync thread. Which should speedup the sync process quite a lot. We are currently reading the entire chain block by block from disk on every index thread. So instead, we could read blocks just once, then dispatch events to the indexers. Making indexers purely listeners with no associated thread.

Yes letting indexes just receive notifications to get in sync and not have to implement sync logic is the goal of #24230. And the threading issue should be orthogonal after that PR. Indexes (and wallets) could read and process blocks in single thread, or read blocks in a single thread and process them in parallel, or read and process blocks in parallel like happens currently, or use some other form of scheduling. But regardless of what ordering is used, the interface an individual index uses to get in sync should not have to change, and indexes shouldn’t have to create sync threads or deal with race conditions between the notification threads and sync threads.

I initially thought that the issue was due an initialization ordering, that we weren’t having the active chain activated at that point (thought that was done only post load-blk thread completion).

Oh, I see. That’s also what I concluded from seeing the failure, but I guess it is not the complete picture.

The issue is that empty indexers don’t set the best block to the genesis CBlockIndex, they just leave it null. So, the index summary returns a valid height=0 with an empty block hash.. which crashes on the block hash assertion.

So, the fix is easy.

I didn’t know about this. I guess the fix would be to treat the genesis as the fork point in this case. And to consider the index already synced if there is nochainstate and the genesis pointer is null.

But.. I haven’t done it because have found another possible index sync “fatal error” in master and wanted to fix it prior continue moving forward here.

It seems like it would be a good thing to fix this. But this sounds like something that was already broken, so I’m not sure if the fix has to be bundled here, necessarily. I think the “has data from tip down” check is a useful check that can run early and warn if any block data is missing. But if the check isn’t perfect and doesn’t catch missing block, it shouldn’t be the worst thing because the missing blocks will just be reported later rather than earlier.

mzumsande commented at 9:37 pm on May 16, 2023: contributor

I can’t see how the current approach could work, even if the problems discussed above were solved:

With this PR, the newly introduced function update_indexes_start_block is executed right after creating the index, but before executing Start(). But the index is basically just an empty object at this point, because its constructor doesn’t interact with the database stored on disk at all. So calling GetSummary() can’t possibly give any meaningful data about the best block at this stage. The load from disk is being done in Init()/CustomInit(), but that happens only later within Start().

I think the necessary order would be to

create all indexes and read their best block / other data from disk
determine the oldest block for all indexes
Do the pruning check once

furszy force-pushed on May 16, 2023

ryanofsky commented at 10:09 pm on May 16, 2023: contributor

I can’t see how the current approach could work, even if the problems discussed above were solved:

Good catch noticing Init() was not called! It doesn’t seem like it should be that hard to fix, though. The PR was already moving most of the code out of Init(), anyway, so now a little more code needs to move. I didn’t look very deeply but I would probably make Init() a public method and call it after constructing the index. Also stop calling Init() from Start() and move the RegisterValidationInterface() from Start() to Init().

furszy commented at 10:10 pm on May 16, 2023: member

Thanks for the review @mzumsande.

Funny that I pushed a small update at the same time that you were commenting.

Have few more changes on the pipeline that will be pushing soon. e.g. the hasDataFromTipDown entire function can be written in two lines.. just need to re-purpose the GetFirstStoredBlock function a bit :).

I think the necessary order would be to

create all indexes and read their best block / other data from disk determine the oldest block for all indexes Do the pruning check once

Yeah.. I was thinking on the other issue and forgot that Init() isn’t being called in the constructor.. :face_palm:. Will re-order it, Thanks!

furszy force-pushed on May 17, 2023

ryanofsky commented at 1:31 pm on May 17, 2023: contributor

Looking at current version of the PR 6dfec1b5fbb0f8e95a6134ea8b27e9d6c285d171 it seems to have changed a lot, and the other thing it is doing now is delaying startup of indexes until blocks are loaded, so there is no longer a conflict with reindex-chainstate. So it is basically reimplementing #25193 in a different way that doesn’t require a sleep_for(std::chrono::milliseconds(500)) waitloop. I think this approach is cleaner, but also that #25193 is a smaller more targeted change with a test, so I would probably prefer to see #25193 merged first and with this cleanup and rewrite merged later.

~~#25193 also has two ACKs, so I can re-ack and merge it soon.~~ EDIT: Never mind, just noticed it needs rebase currently

On this PR, I like the approach and thinks the code looks pretty good overall. The only thing I don’t like is all the complexity added to AppInitMain(). I think that complexity could go away if you got rid of the std::set<BaseIndex*> indexers local variable there and added a NodeContext std::vector<BaseIndex*> indexes member instead. Then there would be no need for the func_start_indexes lambda. The lambda could be replaced with a regular StartIndexes function instead.

furszy commented at 1:54 pm on May 17, 2023: member

Was about to send the updates comment hehe. I was too tired last night to write it. Thanks for the review ryanofsky!

Updates list:

Decoupled index Init() from Start(). So indexers can be initialized without spawning the sync thread.
Simplified the pruning violation code by re-purposing the GetFirstStoredBlock function. Now called IsBlockDataAvailable.
Fixed a small race, where we set the index m_synced flag (which enables BlockConnected events) before calling to the child class init function. So, for example, the block filter index could theoretically process a block before initializing the next filter position field and end up overwriting the first stored filter.

Feedback:

Looking at current version of the PR 6dfec1b it seems to have changed a lot, and the other thing it is doing now is delaying startup of indexes until blocks are loaded, so there is no longer a conflict with reindex-chainstate. So it is basically reimplementing #25193 in a different way that doesn’t require a sleep_for(std::chrono::milliseconds(500)) waitloop. I think this approach is cleaner, but also that #25193 is a smaller more targeted change with a test, so I would probably prefer to see #25193 merged first and with this cleanup and rewrite merged later.

Yeah, I updated the PR description last night stating that this is now built on top of #25193 (but my slightly modified version of it #25193#pullrequestreview-1375021974). @mzumsande said that he was going to give them a look and probably take them. So, all good if them get squashed there or here. Either way is fine for me. Happy to re-review #25193 whenever is ready to go again.

On this PR, I like the approach and thinks the code looks pretty good overall. The only thing I don’t like is all the complexity added to AppInitMain(). I think that complexity could go away if you got rid of the std::set<BaseIndex*> indexers local variable there and added a NodeContext std::vector<BaseIndex*> indexes member instead. Then there would be no need for the func_start_indexes lambda. The lambda could be replaced with a regular StartIndexes function instead.

Sounds great. Was also thinking about moving the lambda to a standalone function but wasn’t finding the right place and didn’t want to create a new file only for this.

furszy force-pushed on May 17, 2023

DrahtBot added the label Needs rebase on May 17, 2023

furszy force-pushed on May 17, 2023

furszy commented at 7:23 pm on May 17, 2023: member

Rebased post #25193 merge. Conflicts solved.

Only change from the last push is on the first commit ca30419, where the index threads active wait and the global flag are replaced by a post-poned indexers start call.

furszy renamed this:
~~init: verify blocks data existence only once for all the indexers~~
index: improve initialization and pruning violation checks
on May 17, 2023

furszy renamed this:
~~index: improve initialization and pruning violation checks~~
index: improve initialization and pruning violation check
on May 17, 2023

DrahtBot removed the label Needs rebase on May 17, 2023

furszy force-pushed on May 18, 2023

furszy commented at 5:17 pm on May 18, 2023: member

CI failure is not related.

maflcko commented at 5:22 pm on May 18, 2023: member

See #27492 (comment) about the CI failure

ryanofsky commented at 5:25 pm on May 18, 2023: contributor

re: #27607#issue-1702212340

Fixed a small race, where we set the index m_synced flag (which enables BlockConnected events) before calling to the child class init function. So, for example, the block filter index could theoretically process a block before initializing the next filter position field and end up overwriting the first stored filter.

Can you clarify this? I don’t see m_synced getting set to true before the index is synced, and don’t understand the block filter index example.

furszy commented at 5:42 pm on May 18, 2023: member

re: #27607 (comment)

Fixed a small race, where we set the index m_synced flag (which enables BlockConnected events) before calling to the child class init function. So, for example, the block filter index could theoretically process a block before initializing the next filter position field and end up overwriting the first stored filter.

Can you clarify this? I don’t see m_synced getting set to true before the index is synced, and don’t understand the block filter index example.

Sure. The BaseIndex::Start() flow in master is the following one:

Calls RegisterValidationInterface() –> which registers to the validation interface.
Calls BaseIndex::Init() –> which, prior the pruning check, sets m_synced: m_synced = m_best_block_index.load() == active_chain.Tip()
Calls CustomInit(). –> which, for the block index filter, initializes the m_next_filter_pos field with the db information.

So, if the block filter index is synced, and the index receives a BlockConnected event after finish point (2) and before point (3) then m_next_filter_pos will not be initialized, so BlockFIlterIndex::CustomAppend() will use a null m_next_filter_pos.

ryanofsky commented at 5:54 pm on May 18, 2023: contributor

Thanks! So the problem is that m_synced is set to true before CustomInit code runs, instead of after. It looks like this bug was introduced in bef4e405f3de2718dfee279a9abff4daf016da26 from #25494

EDIT: Added a note in the other PR #25494 (review)

furszy commented at 6:05 pm on May 18, 2023: member

Thanks! So the problem is that m_synced is set to true before CustomInit code runs, instead of after. It looks like this bug was introduced in bef4e40 from #25494

Yeah!, that is more or less what I wrote in the description “Fixed a small race, where we set the index m_synced flag (which enables BlockConnected events) before calling to the child class init function”.

Happy to write it differently if it is not good enough. Maybe should change “child class init function” for “CustomInit”?. (in the commit description it’s explained with more details anyway)

ryanofsky commented at 6:23 pm on May 18, 2023: contributor

Happy to write it differently if it is not good enough. Maybe should change “child class init function” for “CustomInit”?. (in the commit description it’s explained with more details anyway)

Yes if CustomInit was mentioned I probably would have understood better. I would say this PR fixes a race condition where an index’s CustomAppend method might be called before its CustomInit method, causing the index to try to update itself before it is initialized. And that the PR fixes the problem by always setting m_synced to true until after CustomInit is called, instead of before. And that the race was introduced in bef4e405f3de2718dfee279a9abff4daf016da26 from #25494

in src/init.cpp:1666 in ca3041984c outdated

1666+        }
1667 
1668-    chainman.m_load_block = std::thread(&util::TraceThread, "loadblk", [=, &chainman, &args] {
1669         ThreadImport(chainman, vImportFiles, ShouldPersistMempool(args) ? MempoolPath(args) : fs::path{});
1670+
1671+        // Start indexers

ryanofsky commented at 6:35 pm on May 18, 2023:

In commit “init: start indexes after the loadblk thread finishes” (ca3041984cf3665e27b6783c923ab1c32682500a)

Could we change “indexers” to “indexes” in all new code? I don’t think distinguishing between an “index” and “indexer” really clarifies anything, and the word “indexers” seems a little odd and unexpected.

furszy commented at 2:14 pm on May 19, 2023:

Sure, will change it. I used indexers to try to differentiate them from block indexes but guess that its an odd term in English.

furszy commented at 6:43 pm on May 18, 2023: member

sounds good. Description updated, thanks.

in src/init.cpp:1683 in ca3041984c outdated

1667 
1668-    chainman.m_load_block = std::thread(&util::TraceThread, "loadblk", [=, &chainman, &args] {
1669         ThreadImport(chainman, vImportFiles, ShouldPersistMempool(args) ? MempoolPath(args) : fs::path{});
1670+
1671+        // Start indexers
1672+        if (!StartIndexes(node)) {

ryanofsky commented at 6:48 pm on May 18, 2023:

In commit “init: start indexes after the loadblk thread finishes” (ca3041984cf3665e27b6783c923ab1c32682500a)

One side-effect of this change is that now if there is a “best block of the index goes beyond pruned data” error, bitcoind successfully starts and returns 0 then exits, where previously it could fail early and actually return an error code.

That aspect of the previous behavior seems better to me, though I’m not sure what the tradeoffs are.

Do you think the the main benefit of this commit is cleaner code, and the worse error detection a side effect? Or are there other benefits to this commit?

I’m thinking the PR is doing a lot of things and maybe it would be good to split into smaller PRs where possible. Especially if the smaller PRs can be unambiguous improvements.

furszy commented at 3:13 pm on May 19, 2023:

Do you think the the main benefit of this commit is cleaner code, and the worse error detection a side effect? Or are there other benefits to this commit?

Aside from the cleaner code (which I think that it’s salable on its own), I see maintainability and test coverage benefits with the global flag removal. Also, the indexes threads active-wait isn’t that good in terms of processors context switching (the index threads are waking up every 0.5s and the reindex process could take a while). Plus, since #25193, we skip the pruning violation checks prior reindex, while here we are recovering those checks too.

Furthermore, it’s more accurate to conduct the pruning violations checks after the “loadblk” thread execution. This is because that thread imports data from external sources and activates the best chain (only there we know which is the node’s best chain). Meaning that, with the current code, we could have performed the pruning violations checks on a fork chain if the import thread encounters another best chain (and not perform any pruning check on the real active chain). Which could lead to a similar situation as there would be no init error if something isn’t correct after thread import execution.

Said that, talking about the init returned error, maybe we could cache the node shutdown reason and return it even if the error was post-init?. Will give it a look.

I’m thinking the PR is doing a lot of things and maybe it would be good to split into smaller PRs where possible. Especially if the smaller PRs can be unambiguous improvements.

Yeah sure. I usually try to not split a PR too much (when it’s scoped under a certain area) to not loose review power and coordinate better between ourselves but.. I agree that this started targeting something and now grew up a bit longer.

furszy commented at 3:18 pm on May 20, 2023:

Said that, talking about the init returned error, maybe we could cache the node shutdown reason and return it even if the error was post-init?. Will give it a look.

Following-up this, pushed #27708. Which will let us keep the same behavior as we have now:

The user, same as the functional tests, shouldn’t notice any difference between running the pruning violation checks in the load-blk thread vs init thread with it. We will continue erroring out after a pruning violation with an EXIT_FAILURE code. Making the feature_index_prune.py changes in ca3041984cf3665e27b6783c923ab1c32682500a no longer needed.

mzumsande commented at 8:42 pm on May 21, 2023:

Do you think the the main benefit of this commit is cleaner code, and the worse error detection a side effect? Or are there other benefits to this commit?

I think one minor downside could be that the indexes will be available a few minutes later because loadblk doesn’t only do reindexing etc., it also loads the mempool - this could take several minutes depending on mempool size and there isn’t really a reason why the indexes shouldn’t be available before that has finished. This is already the case now in case of -reindex-chainstate after #25193 (I didn’t think of that), but it could be easily changed in master by moving g_indexes_ready_to_sync = true up one line in ThreadImport.

furszy commented at 4:21 am on May 22, 2023:

This is already the case now in case of -reindex-chainstate after #25193 (I didn’t think of that), but it could be easily changed in master by moving g_indexes_ready_to_sync = true up one line in ThreadImport.

yeah, I don’t see why that would be a downside for this changes neither. Same as with g_indexes_ready_to_sync flag, the StartIndexes() call could be executed before the mempool load too.

It wouldn’t be bad to rename ThreadImport to ImportBlocks, and move the single line mempool load function call to the end of the loadblk lambda.

in src/node/blockstorage.h:225 in 704c9e4cf6 outdated

221+    //! Returns nullptr if 'lower_block' is not an 'upper_block' ancestry.
222+    const CBlockIndex* IsBlockDataAvailable(const CBlockIndex& upper_block LIFETIMEBOUND, const CBlockIndex* lower_block=nullptr) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
223+
224     //! Find the first block that is not pruned
225-    const CBlockIndex* GetFirstStoredBlock(const CBlockIndex& start_block LIFETIMEBOUND) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
226+    const CBlockIndex* GetFirstStoredBlock(const CBlockIndex& upper_block LIFETIMEBOUND) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);

ryanofsky commented at 6:53 pm on May 18, 2023:

In commit “index: simplify pruning violation check” (704c9e4cf67468708db655226a78489b92ef0523)

Would be good to make this declaration use the same argument name as the definition which is BlockManager::GetFirstStoredBlock(const CBlockIndex& start_block)

furszy commented at 3:42 pm on May 21, 2023:

yeah thx, this is a remanent from a previous version.

in src/index/base.cpp:118 in 704c9e4cf6 outdated

142-            }
143-        }
144-        if (prune_violation) {
145+
146+        // make sure we have all block data back to the start block
147+        if (m_chainstate->m_blockman.IsBlockDataAvailable(*active_chain.Tip(), start_block) != start_block) {

ryanofsky commented at 6:55 pm on May 18, 2023:

In commit “index: simplify pruning violation check” (704c9e4cf67468708db655226a78489b92ef0523)

Could the commit message clarify if any behavior is changing here? Would be good to label it a refactoring and say it does not change behavior if nothing is changing, and say what is changing otherwise.

Also, it seems like new code that is added here just gets moved / deleted later in the PR in the last commit “index: verify blocks data existence only once” (dd9bc393ebd4ba915ca94991390b6e98d29dcfaf).

I could be missing something, but it seems like the PR would be easier to review if this commit only added the IsBlockDataAvailable function, the other commit started calling it, so reviewers would only need to review one new version of the pruned data check, instead of reviewing a temporary version that gets thrown away and then another final version.

furszy commented at 4:48 pm on May 21, 2023:

Could the commit message clarify if any behavior is changing here? Would be good to label it a refactoring and say it does not change behavior if nothing is changing, and say what is changing otherwise.

yeah sure. https://github.com/bitcoin/bitcoin/commit/704c9e4cf67468708db655226a78489b92ef0523 has no behavior change.

Also, it seems like new code that is added here just gets moved / deleted later in the PR in the last commit “index: verify blocks data existence only once” (https://github.com/bitcoin/bitcoin/commit/dd9bc393ebd4ba915ca94991390b6e98d29dcfaf).

I could be missing something, but it seems like the PR would be easier to review if this commit only added the IsBlockDataAvailable function, the other commit started calling it, so reviewers would only need to review one new version of the pruned data check, instead of reviewing a temporary version that gets thrown away and then another final version.

Hmm, I actually made it in this way to try to make review easier.

The idea was that 704c9e4 makes current violation code much more concise with no behavior change (introducing this new function which re-purposes another existent function), then dd9bc393 moves it to init with a slightly difference: instead of calling IsBlockDataAvailable once per index, it’s called once for all the indexes (so the diff is on init fetching the oldest block index from the indexers and then calling IsBlockDataAvailable with it).

I thought that was going to be easier to review in this way than moving + changing the entire pruning violation check at once in a single commit. I mean, at least for me, it seemed good to let us verify that the conciser pruning violation code doesn’t introduces any behavior changes first (at this point reviewers should be ok with the “new” process), then move that to init and call it with the the oldest indexer block index (which is the behavior change).

But let me know if this doesn’t makes sense for you, and you still prefer the all-in-one way. If it doesn’t help review or code consistency, could go ahead with your suggestion.

in src/index/base.cpp:110 in 21812970cc outdated

115@@ -119,6 +116,16 @@ bool BaseIndex::Init()
116             return InitError(strprintf(Untranslated("%s best block of the index goes beyond pruned data. Please disable the index or reindex (which will download the whole blockchain again)"), GetName()));
117         }
118     }
119+
120+    // Child init
121+    if (!CustomInit(start_block ? std::make_optional(interfaces::BlockKey{start_block->GetBlockHash(), start_block->nHeight}) : std::nullopt)) {

ryanofsky commented at 6:59 pm on May 18, 2023:

In commit “index: prevent race by calling ‘CustomInit’ prior setting ’m_synced’ flag” (21812970cc56191e18f96692e47f00ed881bd596)

I think it would be nice to split this commit off into a separate PR is possible, so we have a minimal bugfix that could backported (and also be easier to understand)

mzumsande commented at 8:20 pm on May 19, 2023:

Good catch! Is my understanding correct that this race cannot actually happen in practice on master right now, because the indexes are set up (Init()) by the init thread, which starts the networking and loadblk threads only later - so I can’t see from which other thread BlockConnected signals could come from at this stage. However, once index initialization is deferred to the loadblk thread in ca3041984cf3665e27b6783c923ab1c32682500a, I think this race could easily happen. (this might affect the decision whether backport is necessary).

furszy commented at 1:56 pm on May 22, 2023:

Is my understanding correct that this race cannot actually happen in practice on master right now, because the indexes are set up (Init()) by the init thread, which starts the networking and loadblk threads only later - so I can’t see from which other thread BlockConnected signals could come from at this stage.

Yeah, the race cannot realistically happen in master. The only process that occurs before the indexes initialization is the chain state loading process, which doesn’t signal any block connection event.

So back porting seems to not be needed. But, if it helps review, could still decouple 21812970cc56191e18f96692e47f00ed881bd596 into a standalone PR.

in src/init.cpp:1581 in 26bd60eafc outdated

1576@@ -1577,6 +1577,9 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
1577         node.indexers.insert(g_coin_stats_index.get());
1578     }
1579 
1580+    // Init all indexers
1581+    for (auto indexer : node.indexers) indexer->Init();

ryanofsky commented at 7:02 pm on May 18, 2023:

In commit “refactor: index, decouple ‘Init’ from ‘Start’” (26bd60eafc42484d80d5dfef0fa0abb77e7817ce)

Init return value is ignored. I think this needs to return false if it fails. Would also be good to mark Init declaration with [[nodiscard]]

furszy commented at 1:57 pm on May 22, 2023:

good catch 👌🏼 .

ryanofsky commented at 7:16 pm on May 18, 2023: contributor

Reviewed dd9bc393ebd4ba915ca94991390b6e98d29dcfaf, which looks good overall, but I had questions about few things

ryanofsky dismissed

maflcko added the label Needs rebase on May 19, 2023

DrahtBot removed the label Needs rebase on May 19, 2023

in src/node/blockstorage.h:222 in 704c9e4cf6 outdated

215@@ -216,8 +216,13 @@ class BlockManager
216     //! Returns last CBlockIndex* that is a checkpoint
217     const CBlockIndex* GetLastCheckpoint(const CCheckpointData& data) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
218 
219+    //! Returns 'lower_block' if all blocks up to that point are available in disk, otherwise returns the first block that is not pruned in the upper-lower range.
220+    //! If 'lower_block=nullptr': verify blocks availability up to the genesis block.
221+    //! Returns nullptr if 'lower_block' is not an 'upper_block' ancestry.
222+    const CBlockIndex* IsBlockDataAvailable(const CBlockIndex& upper_block LIFETIMEBOUND, const CBlockIndex* lower_block=nullptr) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);

mzumsande commented at 6:42 pm on May 19, 2023:

I don’t think this is possible to trigger with the way the function is used, but if upper_block already has NO_DATA, it would return that block, maybe nullptr would be more natural? (also applies to GetFirstStoredBlock).

Also, the name sounds like a boolean function, maybe something like CheckBlockDataAvailability() would be better?

furszy commented at 2:07 pm on May 22, 2023:

I don’t think this is possible to trigger with the way the function is used, but if upper_block already has NO_DATA, it would return that block, maybe nullptr would be more natural? (also applies to GetFirstStoredBlock).

yeah, that is something that already exists. Good eye. It shouldn’t be possible to trigger as upper_block is always equal to the chain tip and, as far as I know, we never prune the tip. But.. a bug is a bug, better to fix it now.

furszy force-pushed on May 22, 2023

furszy commented at 4:41 pm on May 22, 2023: member

Feedback tackled, thanks ryanofsky and mzumzande for your deep review!

This PR now depends on #27720 and #27708.

List of changes since the last review:

Created #27708 so the post-init shutdown doesn’t introduce any tangible difference for user and tests. (Per #27607 (review) discussion).
Moved the LoadMempool() call from the blockstorage import function to init.cpp (per #27607 (review) discussion). So the indexes can be started prior starting the mempool (which could take a while) and we don’t keep a mempool related function inside blockstorage.cpp.
Decoupled the uninitialized index child class race condition into an standalone PR #27720.
Fixed a missing init return if index.Init() fails.
Renamed IsBlockDataAvailable to CheckBlockDataAvailability and added the missing data availability check for the upper block (https://github.com/bitcoin/bitcoin/pull/27607#discussion_r1199268910).

DrahtBot removed the label CI failed on May 22, 2023

DrahtBot added the label UTXO Db and Indexes on May 22, 2023

furszy force-pushed on May 26, 2023

DrahtBot added the label Needs rebase on May 30, 2023

achow101 referenced this in commit 3a83d4417b on May 31, 2023

sidhujag referenced this in commit 9ec40ffad2 on May 31, 2023

furszy force-pushed on May 31, 2023

DrahtBot removed the label Needs rebase on May 31, 2023

DrahtBot added the label Needs rebase on Jun 12, 2023

ryanofsky commented at 3:52 pm on June 13, 2023: contributor

This can be rebased now that #27708 is merged

furszy force-pushed on Jun 14, 2023

DrahtBot added the label CI failed on Jun 14, 2023

DrahtBot removed the label Needs rebase on Jun 14, 2023

furszy force-pushed on Jun 14, 2023

furszy commented at 2:42 pm on June 14, 2023: member

This can be rebased now that #27708 is merged

Done thanks, rebased on master. Had some troubles with the line-ending encoding in the Windows CI job but all good now.

The current CI failure is unrelated to this PR.

in src/init.cpp:1663 in 17fd33248c outdated

1682@@ -1683,14 +1683,17 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
1683     }
1684 
1685     chainman.m_load_block = std::thread(&util::TraceThread, "loadblk", [=, &chainman, &args] {
1686-        ThreadImport(chainman, vImportFiles, ShouldPersistMempool(args) ? MempoolPath(args) : fs::path{});
1687+        // Import blocks
1688+        ImportBlocks(chainman, vImportFiles);
1689+        // Load mempool from disk
1690+        chainman.ActiveChainstate().LoadMempool(ShouldPersistMempool(args) ? MempoolPath(args) : fs::path{});

ryanofsky commented at 3:54 pm on June 14, 2023:

In commit “refactor: move ‘LoadMempool’ call out of blockstorage.cpp” (17fd33248cf6d1daa995d4c29bc98e7cd3dd722d)

I’m confused about this commit because the commit message says it only “will allow us to start indexes prior loading the mempool” in future commits, but the actual code change in commit sets g_indexes_ready_to_sync before calling LoadMempool, so it seems like that change is happening here, not in future commits.

Also renaming ImportBlocks to ThreadImport here makes it hard to see where the real changes are in this commit. I agree ImportBlocks would be a slightly better name than ThreadImport, but I also think ThreadImport is fine. What’s more misleading than the function name is the thread name “loadblk”, so if you do want to do a renaming, I’d suggest renaming both loadblk and ThreadImport in a separate scripted diff commit not mixed with more substantive changes.

furszy commented at 12:22 pm on June 16, 2023:

I’m confused about this commit because the commit message says it only “will allow us to start indexes prior loading the mempool” in future commits, but the actual code change in commit sets g_indexes_ready_to_sync before calling LoadMempool, so it seems like that change is happening here, not in future commits.

Upps, yeah. Sorry for that. I updated that commit description in a rush..

Also renaming ImportBlocks to ThreadImport here makes it hard to see where the real changes are in this commit. I agree ImportBlocks would be a slightly better name than ThreadImport, but I also think ThreadImport is fine. What’s more misleading than the function name is the thread name “loadblk”, so if you do want to do a renaming, I’d suggest renaming both loadblk and ThreadImport in a separate scripted diff commit not mixed with more substantive changes.

Yeah ok. Agree. Have renamed “loadblk” to “post_init_load” but I’m all ears if you have a better term for a thread that imports/load blocks, the mempool and starts indexes. No better naming has come to my head yet.

ryanofsky commented at 4:13 pm on June 22, 2023:

Yeah ok. Agree. Have renamed “loadblk” to “post_init_load” but I’m all ears if you have a better term for a thread that imports/load blocks, the mempool and starts indexes. No better naming has come to my head yet.

I think current name loadblk is misleading because the thread is doing more than loading block data, so most other names would be better. I’d probably call it initload just to suggest it is loading data and is related to initialization, and will go away when the node is started. I think it’s also good if the thread has a short name so it is readable in tools like top

furszy commented at 1:43 pm on June 24, 2023:

Ok done, renamed the thread to initload.

in src/init.cpp:1557 in a9e45f7926 outdated

1578         if (!result) {
1579             return InitError(util::ErrorString(result));
1580         }
1581 
1582         g_txindex = std::make_unique<TxIndex>(interfaces::MakeChain(node), cache_sizes.tx_index, false, fReindex);
1583-        if (!g_txindex->Start()) {

ryanofsky commented at 4:24 pm on June 14, 2023:

In commit “init: start indexes after ’loadblk’ thread finishes” (a9e45f7926fab62ef0c5cea540386266235b2990)

Can the commit message be written to say how this affects behavior of bitcoind?

I understand that it gets rid of the g_indexes_ready_to_sync variable internally, which is nice.

I can also see that behavior of the BaseIndex::ThreadSync() is basically unchanged, because it now calls BaseIndex::Start in basically same place it previously set g_indexes_ready_to_sync to true, after importing blocks and before loading the mempool.

But now RegisterValidationInterface and BaseIndex::Init calls appear to be delayed. Previously they ran on the main thread before loadblk thread started, now they are running on the loadblk thread, after importing blocks and before loading the mempool. Is it a good thing to delay the index Register and Init calls? Is it good to move them from the main thread to the loadblk thread? It seems like it would be pretty easy to keep those calls where they are and still get rid of the g_indexes_ready_to_sync variable, so what is the motivation for changing the behavior here?

The change seems reasonable to me, but the description is vague and it’s not clear how exactly behavior is changing and why the behavior change is good, or neccessary.

EDIT: Looking at later commit “refactor: index, decouple ‘Init’ from ‘Start’” (9f38c2ffba87ea07c0eeb9975ad4d84ae716f9aa), I can see that the behavior change here actually gets reverted later and is unnecessary. I would suggest moving that commit before this one so this one only removes the g_indexes_ready_to_sync variable without affecting other behavior.

in src/node/blockstorage.h:225 in 2163b736f2 outdated

215@@ -216,6 +216,11 @@ class BlockManager
216     //! Returns last CBlockIndex* that is a checkpoint
217     const CBlockIndex* GetLastCheckpoint(const CCheckpointData& data) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
218 
219+    //! Returns 'lower_block' if all blocks up to that point are available in disk, otherwise returns the first block that is not pruned in the upper-lower range.
220+    //! If 'lower_block=nullptr': verify blocks availability up to the genesis block.
221+    //! Returns nullptr if 'lower_block' is not an 'upper_block' ancestry.
222+    const CBlockIndex* CheckBlockDataAvailability(const CBlockIndex& upper_block LIFETIMEBOUND, const CBlockIndex* lower_block=nullptr) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);

ryanofsky commented at 4:35 pm on June 14, 2023:

In commit “refactor: simplify pruning violation check” (2163b736f255dd5381a6f9ba4ad7dd32192c74e6)

This commit seems like a nice code simplification to the BaseIndex::Init function. I wonder if simplifying Init is the real motivation for the change at this point, or if this is also used for something else later. Would be helpful if commit message had a sentence about the benefit / motivation.

Also commit message should be updated to mention CheckBlockDataAvailability instead of IsBlockDataAvailable.

Also, it would be good to have a simple unit test for the CheckBlockDataAvailability function in blockmanager_tests.cpp. Even if the unit test doesn’t cover all the edge cases, having it would be useful to show CheckBlockDataAvailability makes sense as an API and is testable

EDIT: Now that I’ve reached the last commit of the PR “index: verify blocks data existence only once” (ce5f798b089dc8051f6b414ce1a606e60c85d7f3) I can see that the motivation for commit is to be able to help make index startup more asynchronous, avoid repeating work when multiple indexes are enabled in the followup commit. Could mention something like that in the commit message here.

in src/init.cpp:1568 in 9f38c2ffba outdated

1588@@ -1589,6 +1589,9 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
1589         node.indexes.insert(g_coin_stats_index.get());
1590     }
1591 
1592+    // Init all indexes
1593+    for (auto index : node.indexes) if (!index->Init()) return false;

ryanofsky commented at 4:47 pm on June 14, 2023:

In commit “refactor: index, decouple ‘Init’ from ‘Start’” (9f38c2ffba87ea07c0eeb9975ad4d84ae716f9aa)

Oh wow, it seems like this commit is basically reverting all the behavior changes of earlier commit a9e45f7926fab62ef0c5cea540386266235b2990 “init: start indexes after ’loadblk’ thread finishes” that I was confused by and asking about in my comment on that commit (https://github.com/bitcoin/bitcoin/pull/27607#discussion_r1229884217).

I would definitely recommend moving this commit before the earlier a9e45f7926fab62ef0c5cea540386266235b2990 commit, and keeping RegisterValidationInterface and BaseIndex::Init code running on the main thread throughout the PR, instead of moving it from the main thread to the loadblk thread in the earlier commit, and then moving it back to the main thread again in this commit. That would be a lot less confusing.

furszy commented at 8:15 pm on June 16, 2023:

yeah, totally agree. Commits order ended up quite ugly after all the findings. Thanks.

ryanofsky approved

ryanofsky commented at 4:57 pm on June 14, 2023: contributor

Light code review ACK ce5f798b089dc8051f6b414ce1a606e60c85d7f3. The final changes here look pretty good, but I saw some confusing things in the individual commits, and left some suggestions I think would make this easier to review and understand.

furszy force-pushed on Jun 17, 2023

DrahtBot removed the label CI failed on Jun 17, 2023

furszy commented at 2:29 pm on June 18, 2023: member

Feedback tackled, thanks ryanofsky!

Reordered commits: so we first introduce the indexes vector inside the NodeContext struct, then Init() is decoupled from Start(), and finally the Start() call is moved to the ’loadblk’ thread (where indexes can perform their work directly without having to be actively waiting for another thread signal).
Extracted mempool load changes into a standalone commit.
Added scripted-diff for the rename changes: ’loadblk’ thread name, the ThreadImport function and the ’m_load_block’ class member.
Improved commits description per feedback.

Note: I’m not super strong over the selected names in the scripted-diff commit 32c646f. Could drop it, or change them, if they are not good enough. Thoughts?

in src/index/base.cpp:414 in fc4d09af52 outdated

416-    m_chainstate = &m_chain->context()->chainman->ActiveChainstate();
417-    // Need to register this ValidationInterface before running Init(), so that
418-    // callbacks are not missed if Init sets m_synced to true.
419-    RegisterValidationInterface(this);
420-    if (!Init()) return false;
421+    if (!m_init) throw std::runtime_error("Error: Cannot start a non-initialized index");

ryanofsky commented at 7:23 pm on June 22, 2023:

In commit “refactor: index, decouple ‘Init’ from ‘Start’” (fc4d09af5218e4df83cb8d94b8abdfdfc38bcbb0)

Should probably throw std::logic_error not std::runtime_error because there’s no way this should happen at runtime unless there is a bug. Could also assert(m_init)

furszy commented at 1:41 pm on June 24, 2023:

Sure, done.

in src/node/blockstorage.cpp:409 in 5bf99759d6 outdated

405@@ -406,6 +406,7 @@ const CBlockIndex* BlockManager::GetFirstStoredBlock(const CBlockIndex& start_bl
406 {
407     AssertLockHeld(::cs_main);
408     const CBlockIndex* last_block = &start_block;
409+    if (!(last_block->nStatus & BLOCK_HAVE_DATA)) return nullptr;

ryanofsky commented at 7:29 pm on June 22, 2023:

In commit “bugfix: Make GetFirstStoredBlock return nullptr if ‘start_block’ has no data” (5bf99759d677efbdca1c3903b5128a870963a429)

Would be nice to have a unit test showing the effect of the change in this commit fixing the bug.

Test coverage is added in the next commit 3df6fbb44f60ebb739395212170f0093e0fe3e1e, so maybe the small part of that test that’s relevant could be added here. Or the commit order could be just switched and the test updated in this commit.

furszy commented at 1:41 pm on June 24, 2023:

Done. Reordered code so the test coverage for the bug fix is introduced with the bug fix.

in src/index/base.cpp:106 in 480f5aad06 outdated

103@@ -106,10 +104,9 @@ bool BaseIndex::Init()
104         SetBestBlockIndex(locator_index);
105     }
106 
107-    // Skip pruning check if indexes are not ready to sync (because reindex-chainstate has wiped the chain).
108     const CBlockIndex* start_block = m_best_block_index.load();
109     bool synced = start_block == active_chain.Tip();
110-    if (!synced && g_indexes_ready_to_sync) {

ryanofsky commented at 3:20 pm on June 23, 2023:

In commit “init: don’t start indexes sync thread prematurely” (480f5aad060f4f60b5d0617cfb96276891ac692a)

It’s not really clear why this change is safe. It seems like in case of reindexing maybe active_chain.Tip() could be null and the prune_violation error could happen below? Or possibly there’s a reason that wouldn’t happen that could be clarified here. Maybe && g_indexes_ready_to_sync needs to be replaced with && !fReindexChainState? Either way would at least be good to keep a code comment here about how the prune_violation check is supposed to work with reindexing.

furszy commented at 1:11 pm on June 24, 2023:

Hmm ok, this should have been part of the commits ordering changes.

The change is not safe in 480f5aad, it becomes safe at the last commit (in b9fe9ab).

Basically, this last commit moves the pruning violation check from Init() to the end of the ’loadblk’ thread, which is where the reindex, block loading and chain activation processes happen.

So we can run this verification step even when the reindex or reindex-chainstate flags are enabled (which has being skipped so far).

This means executing the pruning violation checks only prior spawning the indexes threads, which is safe because these checks’ purpose is to verify that the indexes threads will be able to sync up to the tip block.

Will add a check inside the intermediate commit so the history stay consistent and also expand the last commit description to include this explanation. Thanks!

ryanofsky commented at 9:21 pm on June 26, 2023:

In commit “init: don’t start indexes sync thread prematurely” (d7a13cb990826d0422e2135a6fd4f2b8813b6eb8)

I don’t think replacing the && g_indexes_ready_to_sync condition with && !LoadingBlocks here in this commit is correct or preserves behavior. The goal of having the condition is to skip checking code and avoid the prune_violation error below if -reindex-chainstate is used, but !LoadingBlocks will still be true if -reindex-chainstate is used. Definition of LoadingBlocks is m_importing || fReindex but m_importing is guaranteed to be false here because the ThreadImport thread is not created at this point. And the fReindex global is only true if -reindex option is used, not if -reindex-chainstate option is used.

I think the most straightforward thing to do in this commit is just to get rid of the while (!g_indexes_ready_to_sync) loop in BaseIndex::ThreadSync but leave the g_indexes_ready_to_sync variable alone everywhere else. Then get rid of g_indexes_ready_to_sync later when it is safe to eliminate, probably in the last commit “index: verify blocks data existence only once” (3514f768b181d359d0218845ace0266d37e9dfbd

furszy commented at 2:56 pm on June 27, 2023:

Ah yeah. The other possibility, which adds a bit more overhead, could be to move the pruning violation check from Init() to Start() in the intermediate commit, then move it from Start() to StartIndexes() in the last one.

Will go with your suggestion. Thanks.

ryanofsky approved

furszy force-pushed on Jun 24, 2023

furszy commented at 1:49 pm on June 24, 2023: member

Updated per feedback, thanks ryanofsky.

Changes:

Renamed post_load_init thread to initload.
Changed BaseIndex::Start() to throw std::logic_error when the index is not initialized.
Added coverage for the bugfix commit within the bugfix commit and not later (2e1bb3b).
Expanded the last commit description explaining the pruning violation check movement from the init thread to the worker ’loadinit’ thread (per comment).

in src/node/blockstorage.cpp:404 in 1f88e942d3 outdated

401@@ -402,17 +402,28 @@ bool BlockManager::IsBlockPruned(const CBlockIndex* pblockindex)
402     return (m_have_pruned && !(pblockindex->nStatus & BLOCK_HAVE_DATA) && pblockindex->nTx > 0);
403 }
404 
405-const CBlockIndex* BlockManager::GetFirstStoredBlock(const CBlockIndex& start_block)
406+const CBlockIndex* BlockManager::CheckBlockDataAvailability(const CBlockIndex& upper_block, const CBlockIndex* lower_block)

ryanofsky commented at 9:46 pm on June 26, 2023:

In commit “refactor: simplify pruning violation check” (1f88e942d34bf8c51374567e9086f5535f0bde21)

Looking at the CheckBlockDataAvailability and GetFirstStoredBlock functions more closely, I don’t think is ever a legitmate reason to call them with an upper block that missing data, or to call them with a lower block that is not an ancestor of ancestor of upper block.

I think if either of these things happen, something has gone badly wrong, and it would be better to treat these things as failed preconditions, instead of normal cases that CheckBlockDataAvailability ignores, and might lead to crashes or bug later that are harder to debug.

Would suggest replacing:

0if (!(last_block->nStatus & BLOCK_HAVE_DATA)) return nullptr;

with:

0assert(last_block->nStatus & BLOCK_HAVE_DATA));

and:

0if (last_block->nHeight < lower_block->nHeight) return nullptr;

with:

0assert(last_block->nHeight < lower_block->nHeight);

and adding:

0assert(last_block != nullptr);

before the return at the end.

furszy commented at 8:52 pm on June 27, 2023:

Sure. Added the changes and documented the preconditions.

ryanofsky approved

ryanofsky commented at 10:15 pm on June 26, 2023: contributor

Code review ACK 3514f768b181d359d0218845ace0266d37e9dfbd

This looks good, and the PR seems pretty easy to review now. It’s a series of simple, clean improvements that make index startup more straightforward and efficient.

I left two review comments, one is about a potential bug in an intermediate commit, but it’s fixed by the final commit, and also not hard to address if you want to take the suggestion there.

I’d consider maybe renaming the PR from “index: improve initialization and pruning violation check” to “index: Make startup more efficient” and describing it as “Simplify index startup code, eliminating the g_indexes_ready_to_sync variable and moving the prune violation check out of the BaseIndex class. Also try to make startup more efficient by running the prune violation check once for all indexes instead of once for each index, and by delaying the prune violation check and moving it off of the main thread so the node can start up faster.”

in src/node/context.h:62 in 68b13a2b6c outdated

58@@ -58,6 +59,7 @@ struct NodeContext {
59     std::unique_ptr<ChainstateManager> chainman;
60     std::unique_ptr<BanMan> banman;
61     ArgsManager* args{nullptr}; // Currently a raw pointer because the memory is not managed by this struct
62+    std::vector<BaseIndex*> indexes; // raw pointers because memory is not managed by this struct

mzumsande commented at 3:55 pm on June 27, 2023:

commit 68b13a2b6c96fd061a31add58b8626c5bf1cdf3d nit: this is not a struct

furszy commented at 10:07 pm on June 27, 2023:

It refers to the NodeContext struct. Same as the line that is right above it.

mzumsande commented at 2:18 pm on June 28, 2023:

oh, right, I misunderstood this, can be resolved.

furszy force-pushed on Jun 27, 2023

furszy renamed this:
~~index: improve initialization and pruning violation check~~
index: make startup more efficient
on Jun 27, 2023

in doc/developer-notes.md:624 in 11fb8fa81a outdated

620@@ -621,7 +621,7 @@ Threads
621   : Started from `main()` in `bitcoind.cpp`. Responsible for starting up and
622   shutting down the application.
623 
624-- [ThreadImport (`b-loadblk`)](https://doxygen.bitcoincore.org/namespacenode.html#ab4305679079866f0f420f7dbf278381d)

mzumsande commented at 8:54 pm on June 27, 2023:

commit 11fb8fa81ab8e2436e4ae6de79b52581db399265: This spot doesn’t mean the ThreadImport function, but the thread. Various other spots below also mean the thread, so I think it’d be better to change this manually, and to also expand the description one line below? I think essentially this thread performs various loading tasks that are part of init but shouldn’t block the node from being started, so are done in a separate thread (external block import, reindex, reindex-chainstate, index background sync, mempool load)

furszy commented at 1:20 pm on June 28, 2023:

Yeah. Added eb47368 improving this docs in a separated commit.

in src/init.h:77 in 68b13a2b6c outdated

72@@ -73,4 +73,7 @@ bool AppInitMain(node::NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip
73  */
74 void SetupServerArgs(ArgsManager& argsman);
75 
76+/** Validates requirements to run the indexes and spawns each index initial sync thread */
77+bool StartIndexes(node::NodeContext& node);

mzumsande commented at 9:06 pm on June 27, 2023:

commit 68b13a2b6c96fd061a31add58b8626c5bf1cdf3d: I think that once it’s decoupled from BaseIndex::Init(), the names BaseIndex::Start() and StartIndexes() are misleading: It doesn’t actually start the indexes, if Init() determines that an index is synced, the index is started right then (it begins to process validationinterface signals), and the later StartIndexes() does nothing, and could just as well be skipped. I’d prefer something like IndexBackgroundSync().

furszy commented at 1:21 pm on June 28, 2023:

Done

in src/init.cpp:1905 in 68b13a2b6c outdated

1900@@ -1904,3 +1901,9 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
1901 
1902     return true;
1903 }
1904+
1905+bool StartIndexes(NodeContext& node)

mzumsande commented at 9:09 pm on June 27, 2023:

commit 68b13a2b6c96fd061a31add58b8626c5bf1cdf3d: Considering how large this function becomes and how low-level the code is in the final state, I don’t really love having it in init.cpp instead of somewhere in index/. Could it be better in index/base.cpp as a free function, even if it’s not part of BaseIndex?

furszy commented at 10:44 pm on June 27, 2023:

I thought about that initially too, but we have #24230 going into the opposite direction (removing the CBlockIndex usages from the index/ directory).

I think that the best would be to place it inside the chain interface. But let me check how far that goes. If it requires more changes, might be good to leave it for a follow-up to not expand this PR more.

ryanofsky commented at 1:56 pm on June 28, 2023:

I’d definitely avoid moving StartIndexes into the interfaces::Chain class, because because that class is supposed to expose limited node functionality to outside callers like wallets and indexes without giving them access to node internals. It is supposed to be a thin wrapper around node functions and data, not to contain any real functionality itself. Its methods are meant to be called by wallets and indexes, not by node code because it would add an unnecessary layer of indirection, bring in a monolithic dependency, and could lead to interfaces::Chain class growing larger than it needs to be.

I think StartIndexes function is actually pretty at home in src/init.cpp and would be fine to leave there. But if you want to move it, you could move it to a src/index/init.cpp file. If you do move it to src/index/init.cpp, it would make index code mirror structure of wallet code. In wallet code, libbitcoin_wallet.a library contains all the .cpp files in src/wallet/ except src/wallet/init.cpp, which is part of libbitcoin_node.a. I’m not sure I Iove that structure, but it works and there is precedent for it.

Another option would be to move it somewhere like src/node/indexes.cpp. Conceivably the src/wallet/init.cpp file could move to src/node/wallets.cpp to be consistent with that.

I don’t think it matters too much what source file StartIndexes function lives in though, as long as it is not part of the Chain interface.

furszy commented at 2:43 pm on June 28, 2023:

ok gotcha about the chain interface.

I thought about it because of the goal of removing the node internal types from the subdirectories. If we stick to that path, the code move into the src/index/init.cpp could be seen as a regression.

The good point about moving this code somewhere else is that it could be unit tested without having to add init.h and NodeContext dependencies.

At least for now, I’m a bit inclined for the src/node/ option but not feeling strong over it neither. I would also be fine if we keep it at init.cpp until we add more unit tests. What do you think @mzumsande?

mzumsande commented at 5:02 pm on June 30, 2023:

I think I also like the src/node option most, but it’s ok for me to keep in init for now and move that later.

furszy commented at 9:14 pm on June 27, 2023: member

Updated per feedback, thanks ryanofsky!.

Reordered code so g_indexes_ready_to_sync removal occurs at the last commit.
Documented the CheckBlockDataAvailability preconditions.
And fixed a little bug that could have arose when the user erases the blocks directory without erasing the indexes directory. In this scenario, the process need to verify that CheckBlockDataAvailability return value is the genesis block.

mzumsande commented at 9:37 pm on June 27, 2023: contributor

Concept ACK, didn’t review the latest push yet. Only have some issues with the naming.

DrahtBot added the label Needs rebase on Jun 27, 2023

DrahtBot added the label CI failed on Jun 27, 2023

furszy force-pushed on Jun 28, 2023

furszy commented at 1:31 pm on June 28, 2023: member

Tackled naming and docs suggestions, thanks mzumsande.

furszy force-pushed on Jun 28, 2023

DrahtBot removed the label Needs rebase on Jun 28, 2023

DrahtBot removed the label CI failed on Jun 28, 2023

in src/index/base.cpp:113 in 2412ffdce6 outdated

113-        if (!start_block) {
114-            // index is not built yet
115-            // make sure we have all block data back to the genesis
116-            prune_violation = m_chainstate->m_blockman.GetFirstStoredBlock(*active_chain.Tip()) != active_chain.Genesis();
117+        const CBlockIndex* block_to_test = start_block;
118+        if (!block_to_test) block_to_test = active_chain.Genesis();

ryanofsky commented at 4:27 pm on June 28, 2023:

In commit “refactor: simplify pruning violation check” (2412ffdce69ce2b164e5303ae4b610ab0d00fd5c)

Looking at this code, it’s not really clear that block_to_test is going to be non-null, and that the CheckBlockDataAvailability check is going to pass. If it were null, the pruning check would fail even if there were no pruned data.

So I think it would be good add an assert with an explanation of why block_to_test can’t be null here, and I implemented this below.

While implementing this I also noticed that current CheckBlockDataAvailability interface seems less safe to use because it accepts a second null parameter when there’s no reason to pass it one, and it’s also awkward because it returns a pointer instead of just a boolean that reflects whether data is available. Would suggest the following changes to address all of this:

  0--- a/src/index/base.cpp
  1+++ b/src/index/base.cpp
  2@@ -111,6 +111,14 @@ bool BaseIndex::Init()
  3     if (!synced && g_indexes_ready_to_sync) {
  4         const CBlockIndex* block_to_test = start_block;
  5         if (!block_to_test) block_to_test = active_chain.Genesis();
  6+
  7+        // Assert block_to_test is not null here. It can't be null because the
  8+        // genesis block can't be null here. The genesis block will be null
  9+        // during this BaseIndex::Init() call if the node is being started for
 10+        // the first time, or if -reindex is used. But in both of these cases
 11+        // m_best_block_index is also null so this branch is not reached.
 12+        assert(block_to_test);
 13+
 14         if (!active_chain.Contains(block_to_test)) {
 15             // if the bestblock is not part of the mainchain, find the fork
 16             // so we can make sure we have all data down to the fork
 17@@ -118,7 +126,7 @@ bool BaseIndex::Init()
 18         }
 19 
 20         // make sure we have all block data back to the start block
 21-        if (m_chainstate->m_blockman.CheckBlockDataAvailability(*active_chain.Tip(), block_to_test) != block_to_test) {
 22+        if (!m_chainstate->m_blockman.CheckBlockDataAvailability(*active_chain.Tip(), *Assert(block_to_test))) {
 23             return InitError(strprintf(Untranslated("%s best block of the index goes beyond pruned data. Please disable the index or reindex (which will download the whole blockchain again)"), GetName()));
 24         }
 25     }
 26--- a/src/node/blockstorage.cpp
 27+++ b/src/node/blockstorage.cpp
 28@@ -402,7 +402,7 @@ bool BlockManager::IsBlockPruned(const CBlockIndex* pblockindex)
 29     return (m_have_pruned && !(pblockindex->nStatus & BLOCK_HAVE_DATA) && pblockindex->nTx > 0);
 30 }
 31 
 32-const CBlockIndex* BlockManager::CheckBlockDataAvailability(const CBlockIndex& upper_block, const CBlockIndex* lower_block)
 33+const CBlockIndex& FindFirstStored(const CBlockIndex& upper_block, const CBlockIndex* lower_block) EXCLUSIVE_LOCKS_REQUIRED(::cs_main)
 34 {
 35     AssertLockHeld(::cs_main);
 36     const CBlockIndex* last_block = &upper_block;
 37@@ -410,7 +410,7 @@ const CBlockIndex* BlockManager::CheckBlockDataAvailability(const CBlockIndex& u
 38     while (last_block->pprev && (last_block->pprev->nStatus & BLOCK_HAVE_DATA)) {
 39         if (lower_block) {
 40             // Return if we reached the lower_block
 41-            if (last_block == lower_block) return lower_block;
 42+            if (last_block == lower_block) return *lower_block;
 43             // if range was surpassed, means that 'lower_block' is not part of the 'upper_block' chain
 44             // and so far this is not allowed.
 45             assert(last_block->nHeight >= lower_block->nHeight);
 46@@ -418,12 +418,17 @@ const CBlockIndex* BlockManager::CheckBlockDataAvailability(const CBlockIndex& u
 47         last_block = last_block->pprev;
 48     }
 49     assert(last_block != nullptr);
 50-    return last_block;
 51+    return *last_block;
 52+}
 53+
 54+bool BlockManager::CheckBlockDataAvailability(const CBlockIndex& upper_block, const CBlockIndex& lower_block)
 55+{
 56+    return &FindFirstStored(upper_block, &lower_block) == &lower_block;
 57 }
 58 
 59 const CBlockIndex* BlockManager::GetFirstStoredBlock(const CBlockIndex& start_block)
 60 {
 61-    return CheckBlockDataAvailability(start_block);
 62+    return &FindFirstStored(start_block, nullptr);
 63 }
 64 
 65 // If we're using -prune with -reindex, then delete block files that will be ignored by the
 66--- a/src/node/blockstorage.h
 67+++ b/src/node/blockstorage.h
 68@@ -217,15 +217,15 @@ public:
 69     //! Returns last CBlockIndex* that is a checkpoint
 70     const CBlockIndex* GetLastCheckpoint(const CCheckpointData& data) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
 71 
 72-    //! Find the oldest block that is not pruned in the [upper_block, lower_block] range.
 73-    //! If all blocks down to lower_block are available, returns lower_block.
 74-    //! If 'lower_block=nullptr': the function verifies all blocks down to the genesis block.
 75+    //! Check if all blocks in the [upper_block, lower_block] range have data available.
 76     //! Preconditions:
 77     //!  1) The caller must ensure that upper_block has block data available.
 78     //!  2) The caller must ensure that lower_block is an ancestor of upper_block (part of the same chain).
 79-    const CBlockIndex* CheckBlockDataAvailability(const CBlockIndex& upper_block LIFETIMEBOUND, const CBlockIndex* lower_block=nullptr) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
 80+    bool CheckBlockDataAvailability(const CBlockIndex& upper_block LIFETIMEBOUND, const CBlockIndex& lower_block LIFETIMEBOUND) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
 81 
 82-    //! Find the first block that is not pruned
 83+    //! Find the first stored ancestor of start_block immediately after the last
 84+    //! pruned ancestor. Return value will never be null. Caller is responsible
 85+    //! for ensuring that start_block has data is not pruned.
 86     const CBlockIndex* GetFirstStoredBlock(const CBlockIndex& start_block LIFETIMEBOUND) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
 87 
 88     /** True if any block files have ever been pruned. */
 89diff --git a/src/test/blockmanager_tests.cpp b/src/test/blockmanager_tests.cpp
 90index fadf1c2026c0..a7248935f394 100644
 91--- a/src/test/blockmanager_tests.cpp
 92+++ b/src/test/blockmanager_tests.cpp
 93@@ -107,11 +107,11 @@ BOOST_FIXTURE_TEST_CASE(blockmanager_block_data_availability, TestChain100Setup)
 94 
 95     // 1) Return genesis block when all blocks are available
 96     BOOST_CHECK_EQUAL(blockman.GetFirstStoredBlock(tip), chainman->ActiveChain()[0]);
 97-    BOOST_CHECK_EQUAL(blockman.CheckBlockDataAvailability(tip), chainman->ActiveChain()[0]);
 98+    BOOST_CHECK(blockman.CheckBlockDataAvailability(tip, *chainman->ActiveChain()[0]));
 99 
100-    // 2) Return lower_block when all blocks are available
101+    // 2) Check lower_block when all blocks are available
102     CBlockIndex* lower_block = chainman->ActiveChain()[tip.nHeight / 2];
103-    BOOST_CHECK_EQUAL(blockman.CheckBlockDataAvailability(tip, lower_block), lower_block);
104+    BOOST_CHECK(blockman.CheckBlockDataAvailability(tip, *lower_block));
105 
106     // Prune half of the blocks
107     int height_to_prune = tip.nHeight / 2;
108@@ -121,10 +121,8 @@ BOOST_FIXTURE_TEST_CASE(blockmanager_block_data_availability, TestChain100Setup)
109 
110     // 3) The last block not pruned is in-between upper-block and the genesis block
111     BOOST_CHECK_EQUAL(blockman.GetFirstStoredBlock(tip), first_available_block);
112-    BOOST_CHECK_EQUAL(blockman.CheckBlockDataAvailability(tip), first_available_block);
113-
114-    // 4) The last block not pruned in the [tip, last_pruned_block] range is the lower_block + 1
115-    BOOST_CHECK_EQUAL(blockman.CheckBlockDataAvailability(tip, last_pruned_block), first_available_block);
116+    BOOST_CHECK(blockman.CheckBlockDataAvailability(tip, *first_available_block));
117+    BOOST_CHECK(!blockman.CheckBlockDataAvailability(tip, *last_pruned_block));
118 }
119 
120 BOOST_AUTO_TEST_SUITE_END()

furszy commented at 9:43 pm on June 28, 2023:

Cool thanks!. Pushed it with few tiny differences because I think that, at least for now, we don’t need the indirections introduced by making GetFirstStoredBlock a FindFirstStored wrapper.

ryanofsky approved

ryanofsky commented at 4:42 pm on June 28, 2023: contributor

Code review ACK 2f830d1de8f0b57f46f1d7da3cb7b9ab4aa1e2ff. I left a long comment below about one of the intermediate commits after I got hung up on whether block_to_test could be null before it was passed to the CheckBlockData function. I think it is not too important since the code is deleted in the end, but I did suggest some improvements to CheckBlockData that I think would make it safer and easier to use.

furszy force-pushed on Jun 28, 2023

DrahtBot added the label CI failed on Jun 28, 2023

DrahtBot removed the label CI failed on Jun 29, 2023

in src/index/base.cpp:109 in 5ec5a48887 outdated

128-            return InitError(strprintf(Untranslated("%s best block of the index goes beyond pruned data. Please disable the index or reindex (which will download the whole blockchain again)"), GetName()));
129-        }
130-    }
131-
132     // Child init
133+    const CBlockIndex* start_block = m_best_block_index.load();

ryanofsky commented at 1:51 am on June 30, 2023:

In commit “index: verify blocks data existence only once” (5ec5a4888743ce7261e0e9cdc077014cd47bfdd6)

Noticed this while I was rebasing #24230, but one side effect of this commit is now CustomInit will be called unnecessarily when the index can’t start up because there is a pruning violation. This doesn’t seem ideal, but it probably not worth worrying about. It would be good to mention the change in the commit description, though.

I think #24230 will restore previous behavior, and not call CustomInit when the index won’t start. But it hard to avoid calling CustomInit here because it can’t be moved to the Start function without also moving the m_synced assignment (because CustomInit has to be called before m_synced is true to make sure it is called before CustomAppend, as explained 3126454dcfa1dd29bb66500d5f2b5261684d6c58). Moving m_synced assignment to Start here would potentially be bad for performance, because it would mean indexes would only sync after the blocks were connected instead at the same time during LoadBlocks. #24230 should avoid calling CustomInit unnecessarily, and avoid needing to wait for LoadBlocks when the index is in sync by registering for notifications in a different way that doesn’t rely on m_synced to avoid spurious notifications.

furszy commented at 12:43 pm on June 30, 2023:

Noticed this while I was rebasing #24230, but one side effect of this commit is now CustomInit will be called unnecessarily when the index can’t start up because there is a pruning violation. This doesn’t seem ideal, but it probably not worth worrying about. It would be good to mention the change in the commit description, though.

Hmm, I hold the opposite view on this matter. I believe it is beneficial to call CustomInit within the base class’s Init method, regardless of any external chain state. It ensures consistency in how an object is initialized, as it is isolated from its context initially.

By performing specific initialization checks on the index subclasses before anything else, we can guarantee class data consistency. Moreover, the verification process in the child classes for CustomInit is faster, and any errors detected are more severe than the pruning violation (currently, all failures within CustomInit are related to database corruption errors).

in src/node/blockstorage.cpp:409 in b0932ec260 outdated

405@@ -406,6 +406,7 @@ const CBlockIndex* BlockManager::GetFirstStoredBlock(const CBlockIndex& start_bl
406 {
407     AssertLockHeld(::cs_main);
408     const CBlockIndex* last_block = &start_block;
409+    assert(last_block->nStatus & BLOCK_HAVE_DATA);

mzumsande commented at 3:26 pm on June 30, 2023:

Although GetFirstStoredBlock() is currently used only for rpc and indexes, this added assert makes me a bit uneasy. While we can check that GetFirstStoredBlock() is only called with the tip, how sure can we that the tip has BLOCK_HAVE_DATA? I could think of several edge cases where we don’t have data for the tip, and managed to trigger the assert in some of these situations:

(tested): With AssumeUTXO, right after importing a chainstate from disk / before getting any additional blocks from peers
(tested): After calling invalidateblock for a pruned block (maybe we should disallow this / add a warning?)
(untested): possibly also when pruning and a large reorg happens?

So I think if we now require that “The caller must ensure that upper_block has block data available.” we should adjust the existing callers to do that.

ryanofsky commented at 6:20 pm on June 30, 2023:

So I think if we now require that “The caller must ensure that upper_block has block data available.” we should adjust the existing callers to do that.

Is there something you would like to see specifically, like adding asserts at the call site? I thought most of the call sites were just passing the chain tip, so it’s hard to see how this could cause a problem unless the tip was pruned, in which case the current assert failure would be seem pretty appropriate

mzumsande commented at 6:29 pm on June 30, 2023:

I don’t think an assert failure is appropiate if having a pruned tip is a valid state that bitcoin core can be in, which seems to be the case, at least in the examples above. For example, it should be possible to call getblockchaininfo directly after loading a snapshot from disk, without triggering an assert.

furszy commented at 6:31 pm on June 30, 2023:

Based on the AssumeUTXO line. It seems to be expected to not have block data available right after importing the chainstate. And that shouldn’t crash the software if the user calls pruneblockchain or getblockchaininfo.

– ok well.. we wrote the same thing at the same time –

ryanofsky commented at 6:47 pm on June 30, 2023:

I see, so maybe current code is buggy and returns bad rpc results then? I guess simplest thing to do is just document GetFirstStoredBlock’s odd behavior in this case and keep it unchanged, rather than returning nullptr which could segfault, or adding an assertion.

The new GetFirstStoredBlock function should be able to straightforwardly return false in this case, though.

furszy commented at 10:09 pm on June 30, 2023:

I see, so maybe current code is buggy and returns bad rpc results then?

yeah.

I guess simplest thing to do is just document GetFirstStoredBlock’s odd behavior in this case and keep it unchanged, rather than returning nullptr which could segfault, or adding an assertion.

Have struggled a bit with this. I tried to maintain the previous GetFirstStoredBlock behavior but it conflicts so much with the function’s description: “Find the first stored ancestor of ‘start_block’ immediately after the last pruned ancestor” that I couldn’t come up with a suitable explanation for this “oddity”.

Have preferred to transfer the responsibility of checking the ‘start_block’ data availability to the few callers who use it (only two places), without altering their behavior in any way.

I think that the changes are minimal and help to reduce confusion. But let me know what you guys think.

ryanofsky commented at 11:18 pm on July 5, 2023:

In commit “make GetFirstStoredBlock assert that ‘start_block’ always has data” (aad83e489a9f515a1da36acd3e99f35b1a0da53c)

re: #27607 (review)

I think that the changes are minimal and help to reduce confusion. But let me know what you guys think.

Thanks for that. I think it would have been fine to just keep the GetFirstStoredBlock behavior unchanged, and just document it, but fixing the call sites is probably better in the long run

furszy force-pushed on Jun 30, 2023

mzumsande commented at 8:32 pm on July 5, 2023: contributor

ACK 94c9b1f37e335c43c739b853bb9457737b67d73a

I reviewed the code again and did some additional tests with pruning on regtest.

DrahtBot requested review from ryanofsky on Jul 5, 2023

in src/index/base.cpp:117 in aad83e489a outdated

112@@ -113,7 +113,8 @@ bool BaseIndex::Init()
113         if (!start_block) {
114             // index is not built yet
115             // make sure we have all block data back to the genesis
116-            prune_violation = m_chainstate->m_blockman.GetFirstStoredBlock(*active_chain.Tip()) != active_chain.Genesis();
117+            bool has_tip_data = active_chain.Tip()->nStatus & BLOCK_HAVE_DATA;
118+            prune_violation = !has_tip_data || m_chainstate->m_blockman.GetFirstStoredBlock(*active_chain.Tip()) != active_chain.Genesis();

ryanofsky commented at 11:18 pm on July 5, 2023:

In commit “make GetFirstStoredBlock assert that ‘start_block’ always has data” (aad83e489a9f515a1da36acd3e99f35b1a0da53c)

It seems like there is a slight change in behavior here. Previously if the tip did not have data, but all blocks before it did have data there would not be a prune violation. Now there will be prune violation.

This is probably a good thing. I don’t think this case actually needs to be an error if the missing blocks are <= than the assumeutxo snapshot height, and the blocks will be just be downloaded later as part of the background sync. But as long as it is an error currently, it seems best if the error happens consistently.

In any case, commit message could mention this is changing behavior in a corner case.

furszy commented at 1:42 pm on July 6, 2023:

done, extended the commit description.

in src/rpc/blockchain.cpp:814 in aad83e489a outdated

809@@ -810,8 +810,10 @@ static RPCHelpMan pruneblockchain()
810         height = chainHeight - MIN_BLOCKS_TO_KEEP;
811     }
812 
813-    PruneBlockFilesManual(active_chainstate, height);
814     const CBlockIndex& block{*CHECK_NONFATAL(active_chain.Tip())};
815+    if (!(block.nStatus & BLOCK_HAVE_DATA)) throw JSONRPCError(RPC_MISC_ERROR, "Nothing to prune. No block data available for the chain tip.");

ryanofsky commented at 4:03 am on July 6, 2023:

In commit “make GetFirstStoredBlock assert that ‘start_block’ always has data” (aad83e489a9f515a1da36acd3e99f35b1a0da53c)

The pruneblockchain RPC doesn’t consider it an error to prune a block that’s already been pruned. It just makes a best effort to prune what it can, and then it returns the height of the last block without transaction data.

So I think it would be more consistent to not make this an error, and just do:

0PruneBlockFilesManual(active_chainstate, height);
1return block.nStatus & BLOCK_HAVE_DATA ? GetFirstStoredBlock(block)->nHeight -1 : block->nHeight;

This would be a smaller change in behavior. It would make the RPC method consistently return the height of the highest block without data, instead of being off-by-one in this case.

furszy commented at 1:44 pm on July 6, 2023:

done 👍🏼.

in src/rpc/blockchain.cpp:1273 in aad83e489a outdated

1268@@ -1267,7 +1269,8 @@ RPCHelpMan getblockchaininfo()
1269     obj.pushKV("size_on_disk", chainman.m_blockman.CalculateCurrentUsage());
1270     obj.pushKV("pruned", chainman.m_blockman.IsPruneMode());
1271     if (chainman.m_blockman.IsPruneMode()) {
1272-        obj.pushKV("pruneheight", chainman.m_blockman.GetFirstStoredBlock(tip)->nHeight);
1273+        bool has_tip_data = tip.nStatus & BLOCK_HAVE_DATA;
1274+        obj.pushKV("pruneheight", has_tip_data ? chainman.m_blockman.GetFirstStoredBlock(tip)->nHeight : tip.nHeight);

ryanofsky commented at 4:21 am on July 6, 2023:

In commit “make GetFirstStoredBlock assert that ‘start_block’ always has data” (aad83e489a9f515a1da36acd3e99f35b1a0da53c)

This preserves current behavior, but it is inconsistent with the documentation which says the field is be set to “height of the last block pruned, plus one”. To make the code actually consistent with the documentation, this should be:

0obj.pushKV("pruneheight", has_tip_data ? chainman.m_blockman.GetFirstStoredBlock(tip)->nHeight : tip.nHeight + 1);

Alternately, to make the documentation consistent with the code, the documentation could be changed to “height of first stored block after the last pruned or assumed-valid block. Or, the height of the last pruned or assumed valid block if no valid blocks after it have been downloaded yet.”

I would probably update the code rather than the documentation, since that seems simpler, and this case should only happen when there’s an assumeutxo snapshot so there shouldn’t be a backwards compatibility concern.

furszy commented at 1:41 pm on July 6, 2023:

I would probably update the code rather than the documentation, since that seems simpler, and this case should only happen when there’s an assumeutxo snapshot so there shouldn’t be a backwards compatibility concern.

Wouldn’t be odd for the user to receive the height of a block that they do not have? (tip + 1).

Isn’t really a big deal anyway, this is an edge case and we can stick to the current RPC docs. But maybe would be good to discuss further what “pruneheight” result should be in a follow-up. it sounds better if we directly return the last pruned height.

furszy commented at 1:53 pm on July 6, 2023:

Pushed the suggested code changes.

ryanofsky approved

DrahtBot requested review from ryanofsky on Jul 6, 2023

furszy force-pushed on Jul 6, 2023

furszy commented at 1:52 pm on July 6, 2023: member

Updated per feedback, small diff. Thanks ryanofsky!

Changes:

Documented behavior change when tip has no data in 79449741fa commit description.
Removed the introduced pruneblockchain “nothing to prune” error.
Made getblockchaininfo “pruneheight” result consistent with the RPC documentation.

in src/node/blockstorage.h:222 in 9ef5099dac outdated

216@@ -217,8 +217,16 @@ class BlockManager
217     //! Returns last CBlockIndex* that is a checkpoint
218     const CBlockIndex* GetLastCheckpoint(const CCheckpointData& data) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
219 
220-    //! Find the first block that is not pruned
221-    const CBlockIndex* GetFirstStoredBlock(const CBlockIndex& start_block LIFETIMEBOUND) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
222+    //! Check if all blocks in the [upper_block, lower_block] range have data available.
223+    //! Preconditions:
224+    //!  1) The caller must ensure that upper_block has block data available.

ryanofsky commented at 3:38 pm on July 6, 2023:

In commit “refactor: simplify pruning violation check” (9ef5099dac06f0cc4b904fefe9620c9fc09c0ddc)

This comment is no longer true. The CheckBlockDataAvailability function will just return false in this case, the caller doesn’t need to check if the block has data itself.

furszy commented at 9:15 pm on July 6, 2023:

comment removed. Thanks.

in src/node/blockstorage.h:231 in 9ef5099dac outdated

225+    //!  2) The caller must ensure that lower_block is an ancestor of upper_block (part of the same chain).
226+    bool CheckBlockDataAvailability(const CBlockIndex& upper_block LIFETIMEBOUND, const CBlockIndex& lower_block LIFETIMEBOUND) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
227+
228+    //! Find the first stored ancestor of start_block immediately after the last
229+    //! pruned ancestor. Return value will never be null. Caller is responsible
230+    //! for ensuring that start_block has data is not pruned.

ryanofsky commented at 3:40 pm on July 6, 2023:

In commit “refactor: simplify pruning violation check” (9ef5099dac06f0cc4b904fefe9620c9fc09c0ddc)

Would be probably make sense to move new this comment to the earlier commit “make GetFirstStoredBlock assert that ‘start_block’ always has data” (79449741fa7b2fc3e59dbc92e408cff547a885fa), since everything described is true there and applies more to that commit.

in src/init.cpp:1910 in 30b2511f39 outdated

1905+        if (!active_chain.Contains(pindex)) {
1906+            pindex = active_chain.FindFork(pindex);
1907+        }
1908+
1909+        // Update start block
1910+        bool need_start_from_scratch = !indexes_start_block || !pindex;

ryanofsky commented at 4:07 pm on July 6, 2023:

In commit “index: verify blocks data existence only once” (30b2511f39d32e29f9f05859aa8a97b84c22376b)

The variable name need_start_from_scratch seems misleading because it will be true even when not syncing from genesis. Would suggest simplifying and clarifying with:

 0--- a/src/init.cpp
 1+++ b/src/init.cpp
 2@@ -1892,9 +1892,6 @@ bool StartIndexBackgroundSync(NodeContext& node)
 3 
 4     ChainstateManager& chainman = *Assert(node.chainman);
 5     for (auto index : node.indexes) {
 6-        // If the index best block is nullptr, means that we will start from the genesis block
 7-        if (indexes_start_block && !indexes_start_block.value()) continue; // So, nothing to do.
 8-
 9         const IndexSummary& summary = index->GetSummary();
10         if (summary.synced) continue;
11 
12@@ -1906,11 +1903,10 @@ bool StartIndexBackgroundSync(NodeContext& node)
13             pindex = active_chain.FindFork(pindex);
14         }
15 
16-        // Update start block
17-        bool need_start_from_scratch = !indexes_start_block || !pindex;
18-        if (need_start_from_scratch || pindex->nHeight < indexes_start_block.value()->nHeight) {
19+        if (!indexes_start_block || !pindex || pindex->nHeight < indexes_start_block.value()->nHeight) {
20             indexes_start_block = pindex;
21             older_index_name = summary.name;
22+            if (!pindex) break; // Starting from genesis so no need to look for earlier block.
23         }
24     };
25

furszy commented at 9:14 pm on July 6, 2023:

Sure, applied. Thanks!

ryanofsky approved

ryanofsky commented at 4:50 pm on July 6, 2023: contributor

Code review ACK 30b2511f39d32e29f9f05859aa8a97b84c22376b. Just GetFirstStoredBlock and CheckBlockDataAvailability related improvements since last review.

It would be nice to have a third reviewer for PR this if anyone else wants to take a look.

re: #27607 (review)

maybe would be good to discuss further what “pruneheight” result should be in a follow-up. it sounds better if we directly return the last pruned height.

I agree that probably would be better. I think the initial history of this is that as of #15991 pruneblockchain and getblockchaininfo both tried to return the “first stored” block height, rather than the “last pruned” block height.

But then @luke-jr noticed that “first stored” concept was misleading because blockfiles contain multiple blocks, and blocks are downloaded out of order, so it is possible for there to be complete blocks that are stored before the last pruned block. To address this, he changed pruneblockchain to return the last block pruned rather than the first block stored in #24629, and he changed the getblockchaininfo documentation in #24640 to clarify that it returns the last block pruned + 1 rather than first block stored. I’m not sure why he didn’t try to make the two functions more consistent with each other. Probably there were backwards compatibility concerns, and it was more important to make the pruneblockchain return value consistent with its argument value than to change the behavior of getblockchaininfo.

DrahtBot requested review from mzumsande on Jul 6, 2023

furszy force-pushed on Jul 6, 2023

furszy commented at 9:17 pm on July 6, 2023: member

Updated per feedback, thanks. Small diff.

DrahtBot added the label CI failed on Jul 6, 2023

furszy commented at 10:31 pm on July 7, 2023: member

Note: the branch tests are all passing locally. What is failing is the CI compiling it on top of master. Fixing it..

init: start indexes sync earlier

The mempool load can take a while, and it is not
needed for the indexes' synchronization.

Also, having the mempool load function call
inside 'blockstorage.cpp' wasn't structurally
correct.

ed4462cc78

scripted-diff: rename 'loadblk' thread name to 'initload'

The thread does not only load blocks, it loads the mempool and,
in a future commit, will start the indexes as well.

Also, renamed the 'ThreadImport' function to 'ImportBlocks'
And the 'm_load_block' class member to 'm_thread_load'.

-BEGIN VERIFY SCRIPT-

sed -i "s/ThreadImport/ImportBlocks/g" $(git grep -l ThreadImport -- ':!/doc/')
sed -i "s/loadblk/initload/g" $(git grep -l loadblk -- ':!/doc/release-notes/')
sed -i "s/m_load_block/m_thread_load/g" $(git grep -l m_load_block)

-END VERIFY SCRIPT-

04575106b2

doc: describe 'init load' thread actions 2ebc7e68cc

refactor: init indexes, decouple 'Start()' from the creation step

No behavior change.

The goal here is to group indexes, so we can perform the same
initialization and verification process equally for all of them.

The checks performed inside `StartIndexes` will be expanded
in the subsequent commits.

225e213110

furszy force-pushed on Jul 7, 2023

ryanofsky approved

ryanofsky commented at 10:44 pm on July 7, 2023: contributor

Code review ACK 988f3f692acabf0eedfed38e4704fe355f995e72, just suggested StartIndexes simplification and comment changes since last review

furszy commented at 10:56 pm on July 7, 2023: member

Rebased on master due a silent conflict. Only had to adapt an AbortNode() line (which now is under the fatalError() alias).

Should be easy to re-check with a git range-diff 988f3f6...4223d80.

DrahtBot removed the label CI failed on Jul 8, 2023

in src/index/base.cpp:115 in b5f8bbff46 outdated

116-            // index is not built yet
117-            // make sure we have all block data back to the genesis
118-            bool has_tip_data = active_chain.Tip()->nStatus & BLOCK_HAVE_DATA;
119-            prune_violation = !has_tip_data || m_chainstate->m_blockman.GetFirstStoredBlock(*active_chain.Tip()) != active_chain.Genesis();
120+        const CBlockIndex* block_to_test = start_block;
121+        if (!block_to_test) block_to_test = active_chain.Genesis();

TheCharlatan commented at 10:45 am on July 9, 2023:

Nit: const CBlockIndex* block_to_test = start_block ? start_block : active_chain.Genesis();

furszy commented at 1:53 pm on July 10, 2023:

done

TheCharlatan approved

TheCharlatan commented at 11:06 am on July 9, 2023: contributor

Nice, ACK 4223d80bdae0d265e67280ea8853b184ff0ced13

Nit can be ignored.

DrahtBot requested review from ryanofsky on Jul 9, 2023

in src/index/base.h:59 in 18bf3abc2b outdated

53@@ -54,6 +54,8 @@ class BaseIndex : public CValidationInterface
54     };
55 
56 private:
57+    /// Whether the index has been initialized or not.
58+    std::atomic<bool> m_init{false};

maflcko commented at 12:39 pm on July 10, 2023:

18bf3abc2babbe066e5d4b5798209826316c8295: Why is this atomic, when only a single thread has access? Or is the other code multi thread safe?

furszy commented at 1:41 pm on July 10, 2023:

At the end of the PR, two threads end up accessing the variable. The BaseIndex::Init call stays in the main thread, while the BaseIndex::StartBackgroundSync call is moved to the ’loadblk’ thread (now called ‘initload’)

in src/init.cpp:1571 in 225e213110 outdated

1575-        }
1576+        node.indexes.emplace_back(g_coin_stats_index.get());
1577     }
1578 
1579+    // Now that all indexes are loaded, start them
1580+    StartIndexes(node);

maflcko commented at 12:42 pm on July 10, 2023:

225e213110602b4fd1d345167f5f92d26557f6c1: Why ignore the return value? This makes the tests fail

furszy commented at 1:53 pm on July 10, 2023:

upps, fixed the intermediate commit. Thanks.

maflcko commented at 1:25 pm on July 10, 2023: member

Looks like the tests fail

refactor: index, decouple 'Init' from 'Start'

So indexes can be initialized without spawning
the sync thread.

This makes asynchronous indexes startup
possible in the following commits.

430e7027a1

make GetFirstStoredBlock assert that 'start_block' always has data

And transfer the responsibility of verifying whether 'start_block'
has data or not to the caller.

This is because the 'GetFirstStoredBlock' function responsibility
is to return the first block containing data. And the current
implementation can return 'start_block' when it has no data!. Which
is misleading at least.

Edge case behavior change:
Previously, if the block tip lacked data but all preceding blocks
contained data, there was no prune violation. And now, such
scenario will result in a prune violation.

c82ef91eae

refactor: simplify pruning violation check

By generalizing 'GetFirstStoredBlock' and implementing
'CheckBlockDataAvailability' we can dedup code and
avoid repeating work when multiple indexes are enabled.
E.g. get the oldest block across all indexes and
perform the pruning violation check from that point
up to the tip only once (this feature is being introduced
in a follow-up commit).

This commit shouldn't change behavior in any way.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>

2ec89f1970

init: don't start indexes sync thread prematurely

By moving the 'StartIndexes()' call into the 'initload'
thread, we can remove the threads active wait. Optimizing
the available resources.

The only difference with the current state is that now the
indexes threads will only be started when they can process
work and not before it.

fcbdaeef4d

index: verify blocks data existence only once

At present, during init, we traverse the chain (once per index)
to confirm that all necessary blocks to sync each index up to
the current tip are present.

To make the process more efficient, we can fetch the oldest block
from the indexers and perform the chain data existence check from
that point only once.

This also moves the pruning violation check to the end of the
'loadinit' thread, which is where the reindex, block loading and
chain activation processes happen.

Making the node's startup process faster, allowing us to remove
the global g_indexes_ready_to_sync flag, and enabling the
execution of the pruning violation verification even when the
reindex or reindex-chainstate flags are enabled (which has being
skipped so far).

ca91c244ef

furszy force-pushed on Jul 10, 2023

furszy commented at 1:57 pm on July 10, 2023: member

Updated per feedback, thanks Marco and TheCharlatan.

Changes:

Fixed bug in one of the intermediate commits (per comment).
Inlined an if/else in one of the intermediate commits (per comment).

ryanofsky approved

ryanofsky commented at 2:28 pm on July 10, 2023: contributor

Code review ACK ca91c244ef1ba7eac6643d66a5fc56d3a2a8b550. Just rebase and suggested changes since last review (Start return check, and code simplification)

DrahtBot requested review from TheCharlatan on Jul 10, 2023

TheCharlatan approved

TheCharlatan commented at 3:44 pm on July 10, 2023: contributor

re-ACK ca91c244ef1ba7eac6643d66a5fc56d3a2a8b550

ryanofsky merged this on Jul 10, 2023

ryanofsky closed this on Jul 10, 2023

ryanofsky referenced this in commit e253568da8 on Jul 11, 2023

sidhujag referenced this in commit efd4404000 on Jul 12, 2023

furszy deleted the branch on Jul 20, 2023

bitcoin locked this on Jul 19, 2024

index: make startup more efficient #27607

Reviews

Conflicts