index: Check all necessary block data is available before starting to sync

fjahr commented at 10:55 pm on March 30, 2024: contributor

Currently, we check that BLOCK_HAVE_DATA is available for all blocks an index needs to sync during startup. However, for coinstatsindex and blockfilterindex we also need the undo data for these blocks. If that data is missing in the blocks, we are currently still starting to sync each of these indices and then crash later when we encounter the missing data.

This PR adds explicit knowledge of which block data is needed for each index and then checks its availability during startup before initializing the sync process on them.

This also addresses a few open comments from #29668 in the last commit.

DrahtBot commented at 10:55 pm on March 30, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29770.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	stickies-v
Approach ACK	TheCharlatan

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#26966 (index: initial sync speedup, parallelize process by furszy)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label UTXO Db and Indexes on Mar 30, 2024

TheCharlatan commented at 11:16 pm on March 30, 2024: contributor

Concept ACK

DrahtBot added the label CI failed on Mar 31, 2024

in src/index/base.h:166 in 809fffdd6e outdated

160@@ -160,6 +161,9 @@ class BaseIndex : public CValidationInterface
161 
162     /// Get a summary of the index and its state.
163     IndexSummary GetSummary() const;
164+
165+    /// Data needed in blocks in order for the index to be able to sync
166+    virtual uint32_t NeededBlockData() const = 0;

stickies-v commented at 10:23 am on April 1, 2024:

nit: NeededBlockStatusMask (I prefer Required instead of Needed but that’s personal and doesn’t really matter) better highlights that this returns a mask instead of data

fjahr commented at 4:02 pm on July 12, 2024:

Renamed

in test/functional/feature_index_prune.py:148 in 809fffdd6e outdated

139@@ -132,6 +140,39 @@ def run_test(self):
140         for i, msg in enumerate([filter_msg, stats_msg, filter_msg]):
141             self.nodes[i].assert_start_raises_init_error(extra_args=self.extra_args[i], expected_msg=msg+end_msg)
142 
143+        def check_for_block(node, hash):
144+            try:
145+                self.nodes[node].getblock(hash)
146+                return True
147+            except JSONRPCException:
148+                return False

stickies-v commented at 10:23 am on April 1, 2024:

Already implement on L57, I think this one can be removed?

fjahr commented at 4:01 pm on July 12, 2024:

done

in test/functional/feature_index_prune.py:167 in 809fffdd6e outdated

162+            # 1500 is the height to where the indices were able to sync
163+            # previously
164+            for b in range(1500, prune_height):
165+                bh = node.getblockhash(b)
166+                node.getblockfrompeer(bh, peer_id)
167+                self.wait_until(lambda: check_for_block(node=i, hash=bh), timeout=10)

stickies-v commented at 11:21 am on April 1, 2024:

Could use batch queries to speed things up. On my machine, fetching individually takes ~12s (2 x ~6s), fetching in batch just ~3s (2 x ~1.5s). Is there a reason we need to wait_until check_for_block? Batch approach seems to work fine without it?

 0diff --git a/test/functional/feature_index_prune.py b/test/functional/feature_index_prune.py
 1index af6ef4c14c..e15ca7ac76 100755
 2--- a/test/functional/feature_index_prune.py
 3+++ b/test/functional/feature_index_prune.py
 4@@ -5,13 +5,25 @@
 5 """Test indices in conjunction with prune."""
 6 import os
 7 from test_framework.authproxy import JSONRPCException
 8-from test_framework.test_framework import BitcoinTestFramework
 9+from test_framework.test_framework import BitcoinTestFramework, TestNode
10 from test_framework.util import (
11     assert_equal,
12     assert_greater_than,
13     assert_raises_rpc_error,
14 )
15 
16+from typing import Dict, List, Any
17+
18+def send_batch_request(node: TestNode, method: str, params: List[Any]) -> List[Any]:
19+    """Send batch request and parse all results"""
20+    data = [{"method": method, "params": p} for p in params]
21+    response = node.batch(data)
22+    result = []
23+    for item in response:
24+        assert item["error"] is None, item["error"]
25+        result.append(item["result"])
26+
27+    return result
28 
29 class FeatureIndexPruneTest(BitcoinTestFramework):
30     def set_test_params(self):
31@@ -159,12 +171,9 @@ class FeatureIndexPruneTest(BitcoinTestFramework):
32             assert_equal(len(peers), 1)
33             peer_id = peers[0]["id"]
34 
35-            # 1500 is the height to where the indices were able to sync
36-            # previously
37-            for b in range(1500, prune_height):
38-                bh = node.getblockhash(b)
39-                node.getblockfrompeer(bh, peer_id)
40-                self.wait_until(lambda: check_for_block(node=i, hash=bh), timeout=10)
41+            # 1500 is the height to where the indices were able to sync previously
42+            hashes = send_batch_request(node, "getblockhash", [[a] for a in range(1500, prune_height)])
43+            send_batch_request(node, "getblockfrompeer", [[bh, peer_id] for bh in hashes])
44 
45             # Upon restart we expect the same errors as previously although all
46             # necessary blocks have been fetched. Both indices need the undo

fjahr commented at 4:01 pm on July 12, 2024:

Taken with minor edit, thanks!

stickies-v commented at 11:23 am on April 1, 2024: contributor

Concept ACK

fjahr commented at 4:05 pm on April 28, 2024: contributor

@stickies-v Thanks for the feedback, I will leave this unaddressed for now until #29668 has been merged. Then I will get back to it when I take this out of draft mode.

DrahtBot added the label Needs rebase on Jul 10, 2024

fjahr force-pushed on Jul 12, 2024

fjahr marked this as ready for review on Jul 12, 2024

DrahtBot removed the label Needs rebase on Jul 12, 2024

fjahr force-pushed on Jul 12, 2024

DrahtBot removed the label CI failed on Jul 13, 2024

fjahr commented at 4:46 pm on July 13, 2024: contributor

Rebased with updates that resulted from the changes in #29668 before merge plus an additional commit that addresses left-over comments in #29668.

in src/index/base.h:168 in 245c09bc85 outdated

160@@ -160,6 +161,9 @@ class BaseIndex : public CValidationInterface
161 
162     /// Get a summary of the index and its state.
163     IndexSummary GetSummary() const;
164+
165+    /// Data needed in blocks in order for the index to be able to sync
166+    virtual uint32_t RequiredBlockStatusMask() const = 0;

maflcko commented at 8:08 am on July 15, 2024:

unrelated: I wonder if at some point it could make sense to use a named type for the block status mask.

Something like using BlockStatusMask = std::underlying_type_t<BlockStatus>;

in src/node/blockstorage.cpp:620 in 05fb9aef47 outdated

618 {
619-    if (!(upper_block.nStatus & BLOCK_HAVE_DATA)) return false;
620-    return GetFirstBlock(upper_block, BLOCK_HAVE_DATA, &lower_block) == &lower_block;
621+    if (!(upper_block.nStatus & status_mask)) return false;
622+    LogPrintf("This is result: %s\n", GetFirstBlock(upper_block, status_mask, &lower_block)->nHeight);
623+    LogPrintf("This is goal: %s\n", lower_block.nHeight);

maflcko commented at 8:08 am on July 15, 2024:

Looks like a leftover debug print?

fjahr commented at 11:23 am on July 15, 2024:

yepp, fixed, thanks!

fjahr force-pushed on Jul 15, 2024

furszy commented at 1:43 am on July 16, 2024: member

It would be nice to implement 81638f5d42b841 differently, without adding the <chain.h> dependency to all index headers. This dependency provides access to node internal types that indexes running on other processes (in the future) will not know about.

It seems we all implemented the same “index X requires block data” and “index Y requires block undo data” in slightly different ways. I did it in #26966 some time ago, and ryanofsky also did it in #24230.

My preference would be to follow #24230 and use the custom options class. It is more flexible than introducing a new method to override on all/most classes every time a new index customization is added.

fjahr commented at 9:03 am on July 16, 2024: contributor

It would be nice to implement 81638f5 differently, without adding the <chain.h> dependency to all index headers. This dependency provides access to node internal types that indexes running on other processes (in the future) will not know about.

Should not be too complicated to get rid of the dependency. I will draft something, maybe I’ll just give @maflcko s suggestion a try.

It seems we all implemented the same “index X requires block data” and “index Y requires block undo data” in slightly different ways. I did it in #26966 some time ago, and ryanofsky also did it in #24230.

My preference would be to follow #24230 and use the custom options class. It is more flexible than introducing a new method to override on all/most classes every time a new index customization is added.

I’m wouldn’t call having helper functions returning bools for each case more flexible than what we have here. At least this is what I saw in #24230 and I think you mean:

0virtual bool RequiresBlockUndoData() const { return false; }```
1
2It may be more readable in the short term but if we build more complicated stuff beyond just checking undo data I think this will be complicated. And we already use the mask everywhere so why not build on this? I like being able to grep for the blockstatus enum values too in order to see where block data is checked in various ways, that would be also lost.

furszy commented at 2:05 pm on July 16, 2024: member

It would be nice to implement 81638f5 differently, without adding the <chain.h> dependency to all index headers. This dependency provides access to node internal types that indexes running on other processes (in the future) will not know about.

Should not be too complicated to get rid of the dependency. I will draft something, maybe I’ll just give @maflcko s suggestion a try.

maflcko’s suggestion wouldn’t remove the dependency. Unless you place the new enum somewhere else?. Which I don’t think it worth it.

It seems we all implemented the same “index X requires block data” and “index Y requires block undo data” in slightly different ways. I did it in #26966 some time ago, and ryanofsky also did it in #24230. My preference would be to follow #24230 and use the custom options class. It is more flexible than introducing a new method to override on all/most classes every time a new index customization is added.

I’m wouldn’t call having helper functions returning bools for each case more flexible than what we have here. At least this is what I saw in #24230 and I think you mean:

virtual bool RequiresBlockUndoData() const { return false; }

It may be more readable in the short term but if we build more complicated stuff beyond just checking undo data I think this will be complicated. And we already use the mask everywhere so why not build on this? I like being able to grep for the blockstatus enum values too in order to see where block data is checked in various ways, that would be also lost.

If we agree that the final goal of the indexes is to run in isolation, in a standalone process, and be the first consumers of the kernel library, then linking them to chain internal fields like the CBlockIndex status mask isn’t the way to go. These objects will live only within the kernel process.

Also, I don’t think the grepping argument is valid in most circumstances. It can be used anywhere to break layer distinctions. For example, what if the GUI needs to know if a certain block is available on disk in the future. Would you add the status masks enum dependency to a widget class? Doing so would break the current structure: views -> model/interfaces -> kernel (this is also done this way to allow the GUI to run in a standalone process as well).

The RequiresBlockUndoData() implementation is from my PR, and I’m not fan of it anymore (I don’t dislike it, just prefer a different approach). I think #24230 approach is better as it introduces a new base class method to override called CustomOptions() that returns a struct containing index’s specific information. I think that building upon this would be cleaner and more flexible, as we would only need to add fields to a struct instead of changing the base class interface with each new option. - would probably change the struct name, which is currently called NotifyOptions -.

fjahr commented at 3:57 pm on July 16, 2024: contributor

maflcko’s suggestion wouldn’t remove the dependency. Unless you place the new enum somewhere else?. Which I don’t think it worth it.

Yes, I was thinking of putting it in a file for types which we have been doing in the code base in several places. And @ryanofsky suggests introducing a kernel/types.h here already. Moving code isn’t hard to review and chain.h is big enough to be broken up a bit more IMO.

If we agree that the final goal of the indexes is to run in isolation, in a standalone process, and be the first consumers of the kernel library, then linking them to chain internal fields like the CBlockIndex status mask isn’t the way to go. These objects will live only within the kernel process.

Yes, I hope they can run in isolation in the future. But the index still needs to have an understanding of what block data is and what it needs in order to decide if it can handle what the the kernel gives it. So I don’t think it can be avoided that the index has knowledge of the block status so this would need to be shared. That doesn’t mean that they can’t run isolation, but the kernel needs to share this knowledge with them. We can only avoid this if we forget about the concept suggested here and let the index ask for data from the kernel until it hits something unexpected and fails or we basically reimplement the block status as a list of bools in the index which will be much harder to reason about and maintain. I don’t see how an options object prevents that.

DrahtBot added the label Needs rebase on Jul 16, 2024

fjahr force-pushed on Jul 17, 2024

fjahr commented at 10:55 am on July 17, 2024: contributor

just rebased

DrahtBot removed the label Needs rebase on Jul 17, 2024

furszy commented at 10:31 pm on July 19, 2024: member

But the index still needs to have an understanding of what block data is and what it needs in order to decide if it can handle what the the kernel gives it. So I don’t think it can be avoided that the index has knowledge of the block status so this would need to be shared. That doesn’t mean that they can’t run isolation, but the kernel needs to share this knowledge with them. We can only avoid this if we forget about the concept suggested here and let the index ask for data from the kernel until it hits something unexpected and fails or we basically reimplement the block status as a list of bools in the index which will be much harder to reason about and maintain. I don’t see how an options object prevents that.

We both agree that indexes need to share their sync data requirements in some way. Perhaps we are not understanding each other because your rationale is based on the current sync mechanism, while I am considering a future version that is no longer at the index base class.

An index running in a standalone process would only sync through kernel signals. It will register with the kernel, providing its last known block hash and the data and events it wants to receive and listen to. Then, it will only react to them. It will no longer request data directly from the kernel as it does currently.

The kernel, running in a different process, will emit the signals required to sync the registered index, just as it does for other listeners like the wallet (no difference between them). It will provide the BlockInfo struct, which may or may not contain specific data, such as block undo data during connection/disconnection and other information if requested during registration.

This is why using an options struct instead of creating a method for each index sync requirement is more flexible to me. The index will provide this struct during registration to the kernel running in a different process and then forget about it. Then, if any event arrives without the requested data, the index will abort its execution.

Moreover, even if you prefer the multi-overridden-methods approach instead of the options struct (which is how I implemented this in #26966), I don’t think accessing the uint32_t block index status bit flags field helps much in terms of reasoning about the code or maintenance. People working on upper layers like the indexes, the wallet, or the GUI should focus on receiving the data they expect and should not have to worry/learn about how the block verification status is mapped in memory.

in test/functional/feature_index_prune.py:161 in adabfbc237 outdated

153@@ -132,6 +154,29 @@ def run_test(self):
154         for i, msg in enumerate([filter_msg, stats_msg, filter_msg]):
155             self.nodes[i].assert_start_raises_init_error(extra_args=self.extra_args[i], expected_msg=msg+end_msg)
156 
157+        self.log.info("fetching the missing blocks with getblockfrompeer doesn't work for block filter index and coinstatsindex")

Sjors commented at 3:43 pm on July 23, 2024:

adabfbc23700056e5bd22b673c3078aa49ad83ea: it would be useful to have an example where getblockfrompeer does work, i.e. an index that doesn’t need undo data.

fjahr commented at 2:00 pm on March 30, 2025:

That would be nice but I’m not sure how to do it. The one candidate index we have for that is txindex but we wholly disable that in the context of pruning because what it is used for is incompatible with pruning.

Sjors commented at 8:55 pm on July 23, 2024: member

As a quick sanity check, I rebuilt -coinstatsindex and checked gettxoutsetinfo still gives me the same stats at block 840,000.

DrahtBot added the label Needs rebase on Dec 6, 2024

fjahr force-pushed on Mar 29, 2025

DrahtBot removed the label Needs rebase on Mar 29, 2025

fjahr force-pushed on Mar 30, 2025

fjahr commented at 2:08 pm on March 30, 2025: contributor

I am making one more effort to pick this up see if it can get the necessary support. Upon re-reading our conversation @furszy I have removed the chain.h dependency from the indices so that it’s a bit closer to what #24230 is planning to do. I looked at it as well to see if there are commits available to pick to add a options object but that didn’t seem to be the case and #24230 is also still in draft mode so unsure how valuable it would be if I pick some small piece of code out of it. The code added to the indices here is minimal, so I hope that we can agree now that is worth adding while #24230 is unclear when it will land. The tests added here will be valuable for it either way it seems.

in src/init.cpp:2136 in 9f99000655 outdated

2136+        const CBlockIndex* start_block = indexes_start_block.value();
2137+        // We only check down to the first block instead of Genesis because
2138+        // Genesis can not have undo data so it is ok that it doesn't have
2139+        // all the data that the index might need.
2140+        if (start_block->nHeight == 0) start_block = chainman.ActiveChain()[1];
2141+        if (!chainman.m_blockman.CheckBlockDataAvailability(*index_chain.Tip(), *Assert(start_block), older_needed_block_data)) {

mzumsande commented at 6:47 pm on April 1, 2025:

We may not have a block at height one: The assert in this line can be triggered by starting a fresh regtest node (empty datadir, no index), stopping it and restarting it with an index.

Interesting that no test catches this, maybe we should add one.

fjahr commented at 6:31 pm on April 3, 2025:

Good catch! I fixed this and refactored the function a little more while I was at it. I think the new structure is a bit easier to understand and reason about. I also added a test that would have caught this. Open to add more startup options there too.

mzumsande commented at 7:54 pm on April 1, 2025: contributor

Tbh the getblockfrompeer scenario seem a little bit far-fetched to me.

Also, why aren’t we simply always requiring BLOCK_HAVE_DATA and BLOCK_HAVE_UNDO for all indexes instead of making a distinction per index? For hypothetical future indexes? While txindex doesn’t strictly need undo data, it also doesn’t work with pruning, so when we run with txindex I think we’ll always have the undo data anyway for all blocks in the chain.

fjahr commented at 8:51 pm on April 1, 2025: contributor

Tbh the getblockfrompeer scenario seem a little bit far-fetched to me.

I ran into this personally when I accidentally pruned a bit too much data and tried to backfill with getblockfrompeer, so to me its not far-fetched and I think getblockfrompeer is a feature that has a lot of potential to get (over)used in ways we don’t anticipate. At the same time I don’t think people really understand for what it works and for what it doesn’t work. I myself didn’t initially have a full grasp of it and improved docs, tests etc. because of it (#23813, #23927, #29668).

Also, why aren’t we simply always requiring BLOCK_HAVE_DATA and BLOCK_HAVE_UNDO for all indexes instead of making a distinction per index?

I could be convinced to do it that way if other reviewers prefer this. But I don’t think that saves us much. I would save few lines of code and instead need to write a few lines of docs to make clear that we do this even though it’s an oversimplification. Curious what @furszy thinks about this.

For hypothetical future indexes? While txindex doesn’t strictly need undo data, it also doesn’t work with pruning, so when we run with txindex I think we’ll always have the undo data anyway for all blocks in the chain.

Yes, for future indices definitely. Then I think it’s also good documentation, it makes explicit which data the index uses at very minimal cost in complexity. There have also been multiple discussions in the past about pruning undo data independent of block data (for example #19040) and also making txindex compatible with pruning has been discussed (for example #12651), for each of these this would be necessary, even if those may not be very hot topics at the moment.

in src/init.cpp:2123 in 9f99000655 outdated

2115@@ -2113,16 +2116,24 @@ bool StartIndexBackgroundSync(NodeContext& node)
2116         if (!indexes_start_block || !pindex || pindex->nHeight < indexes_start_block.value()->nHeight) {
2117             indexes_start_block = pindex;
2118             older_index_name = summary.name;
2119-            if (!pindex) break; // Starting from genesis so no need to look for earlier block.
2120+            // Starting from genesis so no need to look for earlier block.
2121+            if (!pindex) break;
2122+            // Check if we need to have undo data available to sync as well.
2123+            if (index->RequiresUndoData()) {
2124+                older_needed_block_data |= BLOCK_HAVE_UNDO;

furszy commented at 8:06 pm on April 2, 2025:

The RequiresUndoData() check needs to be placed before the breaking point. The first index could require sync from genesis but not require undo data, which would cause all other indexes to return early and never set the BLOCK_HAVE_UNDO flag.

fjahr commented at 6:29 pm on April 3, 2025:

Good catch! Fixed.

in src/init.cpp:2135 in 9f99000655 outdated

2135-        if (!chainman.m_blockman.CheckBlockDataAvailability(*index_chain.Tip(), *Assert(start_block))) {
2136+        const CBlockIndex* start_block = indexes_start_block.value();
2137+        // We only check down to the first block instead of Genesis because
2138+        // Genesis can not have undo data so it is ok that it doesn't have
2139+        // all the data that the index might need.
2140+        if (start_block->nHeight == 0) start_block = chainman.ActiveChain()[1];

furszy commented at 8:27 pm on April 2, 2025:

There is an edge case here that can lead to a crash when the node does not have the genesis block data.

Imagine a pruned node that re-downloaded its chain via getblockfrompeer(). Since the genesis block cannot be re-downloaded from the wire (because it is discarded after being received at the p2p layer — material for another discussion), the node could be having the entire chain except for the genesis block data. Then, as this code skips the genesis block, the index will crash internally when it tries to read it during sync.

Note: I’m not referring to the undo data here, just the regular block data. Writing a test case for this should be straightforward.

fjahr commented at 11:30 am on April 3, 2025:

Do we really receive the genesis block from the p2p layer? I thought we just load it via LoadGenesisBlock(). I just started a new node with no internet connectivity and used all the typical rpc on the genesis block (for example getblock 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f 2) and they all worked, so it doesn’t seem like the data availability of the genesis block relies on p2p.

furszy commented at 4:36 pm on April 4, 2025:

Do we really receive the genesis block from the p2p layer? I thought we just load it via LoadGenesisBlock(). I just started a new node with no internet connectivity and used all the typical rpc on the genesis block (for example getblock 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f 2) and they all worked, so it doesn’t seem like the data availability of the genesis block relies on p2p.

The scenario I described above is during pruning + manual block re-download. LoadGenesisBlock() writes the genesis block only once to disk. It does not write it if the block index is in the chain state.

During pruning, we erase all block up to a certain point by deleting entire block files but we don’t delete the associated block index in the chain state. This means we also delete the genesis block data during the initial pruning but we keep the block index so LoadGenesisBlock() will not re-write the genesis block to disk during next startup (you can verify this by calling getblock(genesis_hash) in a prune node).

Now, the scenario I described: The user might decide to re-download all the prune blocks manually, by calling getblockfrompeer() over all the chain state (including the genesis block, which as I described above, it will not be automatically re-written to disk after the initial pruning). This will make the node recover the entire chain except for the genesis block. This is because our P2P layer ignores the genesis block when it receives it from the wire (due to the anti-DoS check, more precisely it retrieves too-little-chainwork).

So, during any future startup, the node will have all the blocks content except for the genesis block content. Which will make the index crash internally because you are skipping the genesis block data availability check during init.

fjahr commented at 9:51 pm on May 31, 2025:

Ok, I have addressed this by pulling the handing of this special case up and doing a check for block data of genesis if that is what’s needed by an index that requires undo data.

in src/index/blockfilterindex.h:87 in 9f99000655 outdated

81@@ -82,6 +82,9 @@ class BlockFilterIndex final : public BaseIndex
82     /** Get a range of filter hashes between two heights on a chain. */
83     bool LookupFilterHashRange(int start_height, const CBlockIndex* stop_index,
84                                std::vector<uint256>& hashes_out) const;
85+
86+    /** Undo data is needed in order for the index to be able to sync */
87+    bool RequiresUndoData() const override {return true;};

furszy commented at 8:29 pm on April 2, 2025:

extra semicolon at the end (on all the RequiresUndoData())

fjahr commented at 6:29 pm on April 3, 2025:

Fixed

furszy commented at 8:39 pm on April 2, 2025: member

Also, why aren’t we simply always requiring BLOCK_HAVE_DATA and BLOCK_HAVE_UNDO for all indexes instead of making a distinction per index?

I could be convinced to do it that way if other reviewers prefer this. But I don’t think that saves us much. I would save few lines of code and instead need to write a few lines of docs to make clear that we do this even though it’s an oversimplification. Curious what @furszy thinks about this.

The synchronization process currently runs in isolation per index on separate threads (one per index). In the future, we might want to run it only once for all indexes to avoid reading the same blocks from disk multiple times. At that point, it would be useful to know which index requires which data so that, during dispatching — especially when multiple dispatching workers are involved — we don’t queue data that won’t be consumed by the index.

fjahr commented at 9:31 am on April 3, 2025: contributor

Also, why aren’t we simply always requiring BLOCK_HAVE_DATA and BLOCK_HAVE_UNDO for all indexes instead of making a distinction per index?

The IBD Booster idea that was just published is another concept where we would not have any undo data available but txindex should still be possible: https://delvingbitcoin.org/t/ibd-booster-speeding-up-ibd-with-pre-generated-hints-poc/1562/2

fjahr force-pushed on Apr 3, 2025

DrahtBot added the label CI failed on Apr 3, 2025

DrahtBot removed the label CI failed on Apr 4, 2025

maflcko closed this on Apr 7, 2025

maflcko reopened this on Apr 7, 2025

DrahtBot added the label Needs rebase on Apr 25, 2025

TheCharlatan commented at 12:39 pm on May 22, 2025: contributor

@fjahr can you rebase this, I’m still interested in this change.

fjahr force-pushed on May 31, 2025

fjahr commented at 9:52 pm on May 31, 2025: contributor

Rebased and addressed @furszy ’s comment here, sorry for the long delay!

DrahtBot removed the label Needs rebase on May 31, 2025

in src/init.cpp:2080 in acfbed6982 outdated

2081+        for (auto& index : node.indexes) {
2082+            const IndexSummary& summary = index->GetSummary();
2083+            if (summary.synced) continue;
2084+
2085+            // Get the last common block between the index best block and the active chain
2086+            const CBlockIndex* pindex = WITH_LOCK(::cs_main, return chainman.m_blockman.LookupBlockIndex(summary.best_block_hash));

TheCharlatan commented at 12:54 pm on June 2, 2025:

I think it should be ok to scope the lock to just the necessary call to LookupBlockIndex, but I am also having a hard time saying for certain, that there might not be some edge case race condition here with the later call to Contains and FindFork.

fjahr commented at 9:49 pm on June 4, 2025:

Makes sense, I am keeping the lock for the later calls as well now.

in src/init.cpp:2115 in acfbed6982 outdated

2128-        if (!indexes_start_block || !pindex || pindex->nHeight < indexes_start_block.value()->nHeight) {
2129-            indexes_start_block = pindex;
2130-            older_index_name = summary.name;
2131-            if (!pindex) break; // Starting from genesis so no need to look for earlier block.
2132+        // Verify all blocks needed to sync to current tip are present unless we already checked all of them above.
2133+        if (block_start && !(undo_start && undo_start.value() <= block_start.value())) {

TheCharlatan commented at 1:18 pm on June 2, 2025:

I think you are just comparing pointers here, but actually want to compare their height. This block of code is also kind of hard to test against from a functional test. Maybe we can add a unit test for StartIndexBackgroundSync that manipulates some of the internal block index state?

fjahr commented at 9:48 pm on June 4, 2025:

Right, good catch on the pointers! Fixed.

I can try to add such a unit test, do you have specific scenarios in mind that you would like to see covered? If we just want to make sure this block is covered I think I could also add a functional test that basically executes the edge case that @furszy described above: #29770 (review) and I think that should be fairly easy to do.

in src/rpc/blockchain.h:12 in 60786db022 outdated

 8@@ -9,15 +9,18 @@
 9 #include <core_io.h>
10 #include <streams.h>
11 #include <sync.h>
12+#include <threadsafety.h>

TheCharlatan commented at 1:32 pm on June 2, 2025:

Nit: These include fixes seem a bit random. Why not apply all suggested fixes from the CI?

fjahr commented at 9:46 pm on June 4, 2025:

This is just addressing a specific comment from #29668: #29668 (review) I’m not sure which CI you mean? But anyway I think I don’t want to extend the scope by adding more include fixes, I can drop these if you want.

TheCharlatan commented at 9:50 pm on June 4, 2025:

The tidy job includes a complete run of include what you use suggestions for all source files.

TheCharlatan commented at 1:39 pm on June 2, 2025: contributor

Approach ACK

Can you run clang-format-diff on the hunks where it makes sense? I think requiring the indexes to declare if they will use undo data is sensible, but I’d also not be opposed to just making all of them require undo data.

fjahr force-pushed on Jun 4, 2025

DrahtBot added the label Needs rebase on Jun 11, 2025

in src/init.cpp:2092 in ccf21cf1b2 outdated

2103-        LOCK(::cs_main);
2104-        const CBlockIndex* pindex = chainman.m_blockman.LookupBlockIndex(summary.best_block_hash);
2105-        if (!index_chain.Contains(pindex)) {
2106-            pindex = index_chain.FindFork(pindex);
2107-        }
2108+            if (index->RequiresUndoData()) {

furszy commented at 0:16 am on June 13, 2025:

In ccf21cf1b273f27eebbaad5dcf070377d675592c:

Now can simplify this commit by replacing index->RequiresUndoData() for index->CustomOptions().connect_undo_data

fjahr commented at 9:13 pm on June 15, 2025:

done

fjahr force-pushed on Jun 15, 2025

fjahr commented at 9:14 pm on June 15, 2025: contributor

Rebased and adapted the changes that #32694 introduced.

DrahtBot removed the label Needs rebase on Jun 15, 2025

furszy commented at 2:44 pm on June 16, 2025: member

The test doesn’t appear to exercise the issue. I cherry-picked 155803b onto master, and it passes without failure. Was expecting some crash (or a not-common failure) due to the lack of undo data and the missing check for it.

support multiple block status checks in CheckBlockDataAvailability 927c13de0a

furszy commented at 3:31 pm on June 16, 2025: member

I think you could implement this with a single chain traversal and without too many changes if you make CheckBlockDataAvailability aware of the lack of undo data in the genesis block. See https://github.com/furszy/bitcoin-core/commit/927c13de0a7bb22d9ff280f3b122860cbdeda383 and https://github.com/furszy/bitcoin-core/commit/be23f17fbdcc896e25258f67bfa1840ca9825266

fjahr force-pushed on Jun 17, 2025

fjahr commented at 8:47 pm on June 17, 2025: contributor

I think you could implement this with a single chain traversal and without too many changes if you make CheckBlockDataAvailability aware of the lack of undo data in the genesis block. See furszy@927c13d and furszy@be23f17

It was a bit hard for me to relate the changes to the changes here but it seems to me that the major reason this is simpler is because there is no real distinction between indexes that need undo data and indexes that don’t need it. This is similar to the suggestion by @mzumsande here previously: #29770#pullrequestreview-2733750397. I think the handling of it isn’t ideal because it’s not consistent. Image two indices are present, one (A) is caught up close to the tip and requires undo data, the other (B) was just started and hasn’t synced at all but doesn’t need undo data. If A is before B in the list of indexes, then the full chain will be checked for undo data because A latched the undo check to true and then B required the full chain to be checked. If B is before A in the list of indexes no blocks are checked for undo data, not even the few needed for A, because the undo check never latches to true.

Does that seem accurate to you? I didn’t test this, just looked at your code. Please also note that I try to do a single traversal here in the code too, it is just broken into two different steps. To stay with the example above, my code here should check the few needed undo blocks for A in the first traversal, and then the second picks up where undo traversal ended and checks all the rest of the blocks for B without undo data. It’s still a single traversal as in no block should be checked twice.

EDIT: No, that last part isn’t true anymore. I had that implemented but it was making this even more complicated so I simplified it to that we don’t check for block data if we have already checked all the necessary for block data + undo data. I can add this back though.

It seems fine to me to move the genesis handling from StartIndexBackgroundSync to CheckBlockDataAvailability, I just don’t think that in itself simplifies the code. I can apply this but first let me know if we are alinged about the other changes first. Maybe after all this is too much premature optimization and we should just do something simpler like just check all blocks for undo data.

The test doesn’t appear to exercise the issue. I cherry-picked 155803b onto master, and it passes without failure. Was expecting some crash (or a not-common failure) due to the lack of undo data and the missing check for it.

Good catch! There was a race condition in the test which I fixed in the latest push. With that the test fails consistently on current master for me.

TheCharlatan commented at 2:33 pm on June 18, 2025: contributor

It seems fine to me to move the genesis handling from StartIndexBackgroundSync to CheckBlockDataAvailability, I just don’t think that in itself simplifies the code

I think I’d prefer this move too. StartIndexBackgroundSync has too many responsibilities as is. Furszy’s commit looks good to me.

think the handling of it isn’t ideal because it’s not consistent.

I think I agree, but maybe I am still missing something from furszy’s patch. In any case the current approach seems fine to me.

index: Check availability of undo data for indices 1279029947

test: Indices can not start based on block data without undo data ef33b2b90b

rpc, test: Address feedback from #29668 5ce8336aec

test: Add coverage for restarted node without any block sync 041214d7e7

in src/init.cpp:2124 in ff7bf7c3b4 outdated

2125     const Chainstate& chainstate = WITH_LOCK(::cs_main, return chainman.GetChainstateForIndexing());
2126     const CChain& index_chain = chainstate.m_chain;
2127+    const int current_height = WITH_LOCK(::cs_main, return index_chain.Height());
2128+
2129+    // Skip checking data availability if we have not synced any blocks yet
2130+    if (current_height > 0) {

furszy commented at 5:59 pm on June 18, 2025:

A simple way of misusing this would be to invalidate the chain starting from block 1 with the index off. So when the node starts with the index enabled, it would have an active chain height of 0 and the index at the previous tip; so this code would skip the block availability checks. I think we should just drop this line and perform the availability checks all time.

fjahr commented at 1:59 pm on June 20, 2025:

If the whole chain is invalidated there are no blocks to check so I don’t see harm in skipping the checks. But since the checks aren’t doing much either I don’t see much harm in this and can remove it in the next push.

furszy commented at 2:30 pm on June 20, 2025:

If the whole chain is invalidated there are no blocks to check so I don’t see harm in skipping the checks.

We would still need to reorg the index if this happens. From the invalidated tip to the new one. And for that we need to disconnect all invalid blocks. So checking the blocks/undo availability seems correct to me.

fjahr force-pushed on Jun 18, 2025

fjahr commented at 8:43 pm on June 18, 2025: contributor

Ok, I have picked the commit from @furszy moving the genesis handling from StartIndexBackgroundSync to CheckBlockDataAvailability.

furszy commented at 2:51 pm on June 19, 2025: member

Image two indices are present, one (A) is caught up close to the tip and requires undo data, the other (B) was just started and hasn’t synced at all but doesn’t need undo data. If A is before B in the list of indexes, then the full chain will be checked for undo data because A latched the undo check to true and then B required the full chain to be checked. If B is before A in the list of indexes no blocks are checked for undo data, not even the few needed for A, because the undo check never latches to true.

Does that seem accurate to you?

I think I understand the rationale behind your code, but the example wasn’t very helpful. Mainly because it seems easily solvable by simply checking whether undo data is required before skipping the index. Some simple code so we are sync in that regards:

0start_block = tip;
1requires_undo
2for index in indexes:
3    requires_undo |= index.RequiresUndo()
4    // set lowest starting height.
5
6status_flags = check_undo ? undo_flag | data_flag : data_flag
7if (!checkBlockAvailability(tip, start_block, status_flags))
8    // fail

That being said, as far as I understand it now, the rationale behind splitting the undo data check from the block data check is that their existence on disk might not be synchronized in the future (even though that’s not currently the case). If we are sync now, I think adding a comment in the code explaining this rationale would be really helpful.

fjahr commented at 2:17 pm on June 20, 2025: contributor

I think I understand the rationale behind your code, but the example wasn’t very helpful. Mainly because it seems easily solvable by simply checking whether undo data is required before skipping the index. Some simple code so we are sync in that regards:

Thanks! But sorry, I don’t see how the pseudo-code fixes it. I stared at it for a while but it seems to have the exact same behavior as the code in https://github.com/furszy/bitcoin-core/commit/be23f17fbdcc896e25258f67bfa1840ca9825266 afaict. Can you point me to the difference and ideally maybe show the patch to the code in the commit. Doesn’t have to compile, just so I see where the difference is in the code because I just don’t see it.

That being said, as far as I understand it now, the rationale behind splitting the undo data check from the block data check is that their existence on disk might not be synchronized in the future (even though that’s not currently the case). If we are sync now, I think adding a comment in the code explaining this rationale would be really helpful.

Ok, I can add more documentation, you probably mean within StartIndexBackgroundSync, right?

Adding a bit more context as I reflected on this: for me, I think, when I what playing around with different approaches in StartIndexBackgroundSync I eventually found it easier to reason about the problem in a very general way, i.e. what would we do if we had any number of indexes and they could all be synced to various heights and some of them need undo and some of them don’t (like this . As I mentioned before, it’s definitely fair to say that that’s premature optimization but for me this generalized way was easier to reason about that keeping the current indexes and their constraints in the back of my head while thinking about it. Not sure if that makes sense but that’s how I ended up there and I think it’s a very rare case where the code does more than we need to right now but that makes it easier to reason about, not harder. But I wouldn’t be surprised if others see it differently.

furszy commented at 3:01 pm on June 20, 2025: member

I stared at it for a while but it seems to have the exact same behavior as the code in https://github.com/furszy/bitcoin-core/commit/be23f17fbdcc896e25258f67bfa1840ca9825266 afaict. Can you point me to the difference and ideally maybe show the patch to the code in the commit. Doesn’t have to compile, just so I see where the difference is in the code because I just don’t see it.

Hmm, maybe easier to do it offline, but I could try. I’m not even suggesting implementing my code at this point. I ended up understanding why you split the check. I’m ok with it.

But essentially, what my pseudo-code does is extend the current checks to cover undo data if any active index requires it, regardless of its sync height. The idea was to take advantage of the fact that block and undo data are synchronized on disk, and that the bit flag check has no extra cost (checking block_status & data is the same as checking block_status & (data | undo)).

In other words, going back to your example — you mentioned something like:

Tip at 500 indexA at 400 – needs undo indexB at 10 – no undo

And you said the problem arises when indexB appears first in the loop, right? Because no undo data is checked. But that’s not the case if the flag is just updated at the beginning of the loop.

0start_block = tip;
1requires_undo
2for index in indexes:
3    requires_undo |= index.RequiresUndo() --> set to true whenever indexA appears
4    // set lowest index height.  --> this will be set to the indexB block
5
6status_flags = check_undo ? undo_flag | data_flag : data_flag
7if (!checkBlockAvailability(tip, start_block, status_flags)). --> this will be executed from tip to indexB block, checks block + undo data in this scenario
8    // fail

At least this is what I understood from your example.

Ok, I can add more documentation, you probably mean within StartIndexBackgroundSync, right?

Yes. Probably a small: “verify block + undo data existence from tip down to the lowest undo-requiring height, and if no index requires undo data or any block-only index goes lower than the undo index, verify block data existence down to the lowest block-only height”.

in src/test/blockmanager_tests.cpp:147 in 927c13de0a outdated

139@@ -135,6 +140,11 @@ BOOST_FIXTURE_TEST_CASE(blockmanager_block_data_availability, TestChain100Setup)
140     BOOST_CHECK_EQUAL(blockman.GetFirstBlock(tip, BLOCK_HAVE_DATA), first_available_block);
141     BOOST_CHECK(blockman.CheckBlockDataAvailability(tip, *first_available_block));
142     BOOST_CHECK(!blockman.CheckBlockDataAvailability(tip, *last_pruned_block));
143+
144+    // Simulate that the first available block is missing undo data and
145+    // detect this by using a status mask.
146+    first_available_block->nStatus &= ~BLOCK_HAVE_UNDO;
147+    BOOST_CHECK(!blockman.CheckBlockDataAvailability(tip, *first_available_block, BlockStatus{BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO}));

mzumsande commented at 7:56 pm on June 20, 2025:

I think that the change to CheckBlockDataAvailability has a problem: If I add a check here:

0BOOST_CHECK(!blockman.CheckBlockDataAvailability(tip, *genesis, BlockStatus{BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO}));

it fails because of the genesis special case code. To fix this, we should make sure that first_block’s predecessor is genesis (e.g. first_block->nHeight == 1 in the genesis special case condition), so that CheckBlockDataAvailability returns false if genesis is the lower block of the range but a different block of the range is missing undo data.

furszy commented at 8:11 pm on June 20, 2025:

nice catch!

furszy commented at 3:23 pm on June 21, 2025:

Fixed here https://github.com/furszy/bitcoin-core/commit/14d3db22ece4d6b4a652728db154fd98c1d5ae30. @fjahr all yours (can directly exchange the first commit for this one).

index: Check all necessary block data is available before starting to sync #29770

Code Coverage & Benchmarks

Reviews

Conflicts