index: Check all necessary block data is available before starting to sync #29770

pull fjahr wants to merge 3 commits into bitcoin:master from fjahr:2024-03-check-undo-index changing 12 files +98 −18
  1. fjahr commented at 10:55 pm on March 30, 2024: contributor

    Currently, during startup we check that BLOCK_HAVE_DATA is set for all blocks an index needs to sync. However, coinstatsindex and blockfilterindex also need the undo data for these blocks. If the undo data is missing, we currently still start syncing each of these indices and then crash later when we reach the missing data.

    This PR adds explicit knowledge of which block data is needed for each index and then checks its availability during startup before initializing the sync process on them.
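A minimal sketch of the startup check described above, written as a toy Python model rather than the PR's C++ code. The flag values match `BLOCK_HAVE_DATA` and `BLOCK_HAVE_UNDO` in Bitcoin Core's `chain.h`; the `REQUIRED_STATUS_MASK` table, `BlockEntry`, `build_chain` and `can_start_sync` are hypothetical names used only for illustration, and the `txindex` entry assumes that index needs no undo data.

```python
from dataclasses import dataclass
from typing import Optional

# Flag values as defined for BlockStatus in Bitcoin Core's chain.h.
BLOCK_HAVE_DATA = 8    # full block is stored on disk
BLOCK_HAVE_UNDO = 16   # undo data is stored on disk

# Hypothetical lookup table; the PR expresses this as a per-index method.
REQUIRED_STATUS_MASK = {
    "txindex": BLOCK_HAVE_DATA,  # assumption: no undo data needed
    "blockfilterindex": BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO,
    "coinstatsindex": BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO,
}


@dataclass
class BlockEntry:
    """Toy stand-in for a block index entry."""
    height: int
    status: int
    prev: Optional["BlockEntry"] = None


def build_chain(statuses):
    """Link one BlockEntry per status value (heights from 0) and return the tip."""
    tip = None
    for height, status in enumerate(statuses):
        tip = BlockEntry(height, status, tip)
    return tip


def can_start_sync(index_name, tip, start_height):
    """Return True if every block from start_height up to tip has all required data."""
    mask = REQUIRED_STATUS_MASK[index_name]
    block = tip
    while block is not None and block.height >= start_height:
        if (block.status & mask) != mask:
            return False  # refuse at startup instead of failing mid-sync
        block = block.prev
    return True


# Blocks 0-1 have block and undo data; blocks 2-3 were re-fetched without undo data.
full = BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO
tip = build_chain([full, full, BLOCK_HAVE_DATA, BLOCK_HAVE_DATA])
print(can_start_sync("txindex", tip, 0))         # True
print(can_start_sync("coinstatsindex", tip, 0))  # False
```

In this model the index that needs only block data can start, while the one that also needs undo data is rejected before any syncing begins.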

    This also addresses a few open comments from #29668 in the last commit.

  2. DrahtBot commented at 10:55 pm on March 30, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK TheCharlatan, stickies-v

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31072 (refactor: Clean up messy strformat and bilingual_str usages by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label UTXO Db and Indexes on Mar 30, 2024
  4. TheCharlatan commented at 11:16 pm on March 30, 2024: contributor
    Concept ACK
  5. DrahtBot added the label CI failed on Mar 31, 2024
  6. in src/index/base.h:166 in 809fffdd6e outdated
    160@@ -160,6 +161,9 @@ class BaseIndex : public CValidationInterface
    161 
    162     /// Get a summary of the index and its state.
    163     IndexSummary GetSummary() const;
    164+
    165+    /// Data needed in blocks in order for the index to be able to sync
    166+    virtual uint32_t NeededBlockData() const = 0;
    


    stickies-v commented at 10:23 am on April 1, 2024:
    nit: NeededBlockStatusMask (I prefer Required instead of Needed but that’s personal and doesn’t really matter) better highlights that this returns a mask instead of data

    fjahr commented at 4:02 pm on July 12, 2024:
    Renamed
  7. in test/functional/feature_index_prune.py:148 in 809fffdd6e outdated
    139@@ -132,6 +140,39 @@ def run_test(self):
    140         for i, msg in enumerate([filter_msg, stats_msg, filter_msg]):
    141             self.nodes[i].assert_start_raises_init_error(extra_args=self.extra_args[i], expected_msg=msg+end_msg)
    142 
    143+        def check_for_block(node, hash):
    144+            try:
    145+                self.nodes[node].getblock(hash)
    146+                return True
    147+            except JSONRPCException:
    148+                return False
    


    stickies-v commented at 10:23 am on April 1, 2024:
    Already implemented on L57, I think this one can be removed?

    fjahr commented at 4:01 pm on July 12, 2024:
    done
  8. in test/functional/feature_index_prune.py:167 in 809fffdd6e outdated
    162+            # 1500 is the height to where the indices were able to sync
    163+            # previously
    164+            for b in range(1500, prune_height):
    165+                bh = node.getblockhash(b)
    166+                node.getblockfrompeer(bh, peer_id)
    167+                self.wait_until(lambda: check_for_block(node=i, hash=bh), timeout=10)
    


    stickies-v commented at 11:21 am on April 1, 2024:

    Could use batch queries to speed things up. On my machine, fetching individually takes ~12s (2 x ~6s), fetching in batch just ~3s (2 x ~1.5s). Is there a reason we need to wait_until check_for_block? Batch approach seems to work fine without it?

    diff --git a/test/functional/feature_index_prune.py b/test/functional/feature_index_prune.py
    index af6ef4c14c..e15ca7ac76 100755
    --- a/test/functional/feature_index_prune.py
    +++ b/test/functional/feature_index_prune.py
    @@ -5,13 +5,25 @@
     """Test indices in conjunction with prune."""
     import os
     from test_framework.authproxy import JSONRPCException
    -from test_framework.test_framework import BitcoinTestFramework
    +from test_framework.test_framework import BitcoinTestFramework, TestNode
     from test_framework.util import (
         assert_equal,
         assert_greater_than,
         assert_raises_rpc_error,
     )
     
    +from typing import Dict, List, Any
    +
    +def send_batch_request(node: TestNode, method: str, params: List[Any]) -> List[Any]:
    +    """Send batch request and parse all results"""
    +    data = [{"method": method, "params": p} for p in params]
    +    response = node.batch(data)
    +    result = []
    +    for item in response:
    +        assert item["error"] is None, item["error"]
    +        result.append(item["result"])
    +
    +    return result
     
     class FeatureIndexPruneTest(BitcoinTestFramework):
         def set_test_params(self):
    @@ -159,12 +171,9 @@ class FeatureIndexPruneTest(BitcoinTestFramework):
                 assert_equal(len(peers), 1)
                 peer_id = peers[0]["id"]
     
    -            # 1500 is the height to where the indices were able to sync
    -            # previously
    -            for b in range(1500, prune_height):
    -                bh = node.getblockhash(b)
    -                node.getblockfrompeer(bh, peer_id)
    -                self.wait_until(lambda: check_for_block(node=i, hash=bh), timeout=10)
    +            # 1500 is the height to where the indices were able to sync previously
    +            hashes = send_batch_request(node, "getblockhash", [[a] for a in range(1500, prune_height)])
    +            send_batch_request(node, "getblockfrompeer", [[bh, peer_id] for bh in hashes])
     
                 # Upon restart we expect the same errors as previously although all
                 # necessary blocks have been fetched. Both indices need the undo
    

    fjahr commented at 4:01 pm on July 12, 2024:
    Taken with minor edit, thanks!
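To try the batching helper from this thread outside the functional test framework, here is a self-contained sketch. `FakeNode` is a hypothetical stand-in that models only a `batch()` method returning items with `"result"` and `"error"` keys, the shape the suggested diff relies on. It is not the framework's `TestNode`, and the timings quoted above cannot be reproduced with it.

```python
from typing import Any, Callable, Dict, List


class FakeNode:
    """Hypothetical stand-in for a test node; only models batch()."""

    def __init__(self, handlers: Dict[str, Callable[..., Any]]):
        self.handlers = handlers  # RPC method name -> Python callable
        self.round_trips = 0      # number of batch() calls made

    def batch(self, requests: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Answer all requests of one batch in a single round trip."""
        self.round_trips += 1
        responses = []
        for req in requests:
            try:
                result = self.handlers[req["method"]](*req["params"])
                responses.append({"result": result, "error": None})
            except Exception as err:
                responses.append({"result": None, "error": str(err)})
        return responses


def send_batch_request(node, method: str, params: List[List[Any]]) -> List[Any]:
    """Send one batch with a request per parameter list and return all results."""
    response = node.batch([{"method": method, "params": p} for p in params])
    results = []
    for item in response:
        assert item["error"] is None, item["error"]
        results.append(item["result"])
    return results


node = FakeNode({"getblockhash": lambda height: f"hash{height}"})
hashes = send_batch_request(node, "getblockhash", [[h] for h in range(1500, 1503)])
print(hashes)            # ['hash1500', 'hash1501', 'hash1502']
print(node.round_trips)  # 1
```

All three lookups travel in one `batch()` call, which is the property the suggestion uses to cut the number of round trips.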
  9. stickies-v commented at 11:23 am on April 1, 2024: contributor
    Concept ACK
  10. fjahr commented at 4:05 pm on April 28, 2024: contributor
    @stickies-v Thanks for the feedback, I will leave this unaddressed for now until #29668 has been merged. Then I will get back to it when I take this out of draft mode.
  11. DrahtBot added the label Needs rebase on Jul 10, 2024
  12. fjahr force-pushed on Jul 12, 2024
  13. fjahr marked this as ready for review on Jul 12, 2024
  14. DrahtBot removed the label Needs rebase on Jul 12, 2024
  15. fjahr force-pushed on Jul 12, 2024
  16. DrahtBot removed the label CI failed on Jul 13, 2024
  17. fjahr commented at 4:46 pm on July 13, 2024: contributor
    Rebased with updates that resulted from the changes in #29668 before merge plus an additional commit that addresses left-over comments in #29668.
  18. in src/index/base.h:166 in 245c09bc85 outdated
    160@@ -160,6 +161,9 @@ class BaseIndex : public CValidationInterface
    161 
    162     /// Get a summary of the index and its state.
    163     IndexSummary GetSummary() const;
    164+
    165+    /// Data needed in blocks in order for the index to be able to sync
    166+    virtual uint32_t RequiredBlockStatusMask() const = 0;
    


    maflcko commented at 8:08 am on July 15, 2024:

    unrelated: I wonder if at some point it could make sense to use a named type for the block status mask.

    Something like using BlockStatusMask = std::underlying_type_t<BlockStatus>;

  19. in src/node/blockstorage.cpp:620 in 05fb9aef47 outdated
    618 {
    619-    if (!(upper_block.nStatus & BLOCK_HAVE_DATA)) return false;
    620-    return GetFirstBlock(upper_block, BLOCK_HAVE_DATA, &lower_block) == &lower_block;
    621+    if (!(upper_block.nStatus & status_mask)) return false;
    622+    LogPrintf("This is result: %s\n", GetFirstBlock(upper_block, status_mask, &lower_block)->nHeight);
    623+    LogPrintf("This is goal: %s\n", lower_block.nHeight);
    


    maflcko commented at 8:08 am on July 15, 2024:
    Looks like a leftover debug print?

    fjahr commented at 11:23 am on July 15, 2024:
    yepp, fixed, thanks!
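A side note on the guard in this hunk, offered as a sketch and not as a statement about the final code: testing `nStatus & status_mask` for truthiness is the same as requiring the flag when the mask holds a single bit such as `BLOCK_HAVE_DATA`. With a combined mask it passes as soon as any one bit is set. The helper names `has_any` and `has_all` below are hypothetical.

```python
BLOCK_HAVE_DATA = 8    # value from Bitcoin Core's chain.h
BLOCK_HAVE_UNDO = 16   # value from Bitcoin Core's chain.h


def has_any(status: int, mask: int) -> bool:
    """True if at least one bit of mask is set in status."""
    return (status & mask) != 0


def has_all(status: int, mask: int) -> bool:
    """True only if every bit of mask is set in status."""
    return (status & mask) == mask


mask = BLOCK_HAVE_DATA | BLOCK_HAVE_UNDO
status = BLOCK_HAVE_DATA  # block data present, undo data missing

print(has_any(status, mask))  # True
print(has_all(status, mask))  # False
```

For an index that needs both block and undo data, the `has_all` form is the one that matches the stated requirement.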
  20. fjahr force-pushed on Jul 15, 2024
  21. furszy commented at 1:43 am on July 16, 2024: member

    It would be nice to implement 81638f5d42b841 differently, without adding the <chain.h> dependency to all index headers. This dependency provides access to node internal types that indexes running on other processes (in the future) will not know about.

    It seems we all implemented the same “index X requires block data” and “index Y requires block undo data” in slightly different ways. I did it in #26966 some time ago, and ryanofsky also did it in #24230.

    My preference would be to follow #24230 and use the custom options class. It is more flexible than introducing a new method to override on all/most classes every time a new index customization is added.

  22. fjahr commented at 9:03 am on July 16, 2024: contributor

    It would be nice to implement 81638f5 differently, without adding the <chain.h> dependency to all index headers. This dependency provides access to node internal types that indexes running on other processes (in the future) will not know about.

    Should not be too complicated to get rid of the dependency. I will draft something, maybe I’ll just give @maflcko’s suggestion a try.

    It seems we all implemented the same “index X requires block data” and “index Y requires block undo data” in slightly different ways. I did it in #26966 some time ago, and ryanofsky also did it in #24230.

    My preference would be to follow #24230 and use the custom options class. It is more flexible than introducing a new method to override on all/most classes every time a new index customization is added.

    I wouldn’t call having helper functions returning bools for each case more flexible than what we have here. At least this is what I saw in #24230 and I think you mean:

    virtual bool RequiresBlockUndoData() const { return false; }

    It may be more readable in the short term, but if we build more complicated stuff beyond just checking undo data I think this will get complicated. And we already use the mask everywhere, so why not build on this? I like being able to grep for the blockstatus enum values too in order to see where block data is checked in various ways; that would also be lost.
    
  23. furszy commented at 2:05 pm on July 16, 2024: member

    It would be nice to implement 81638f5 differently, without adding the <chain.h> dependency to all index headers. This dependency provides access to node internal types that indexes running on other processes (in the future) will not know about.

    Should not be too complicated to get rid of the dependency. I will draft something, maybe I’ll just give @maflcko’s suggestion a try.

    maflcko’s suggestion wouldn’t remove the dependency, unless you place the new enum somewhere else, which I don’t think is worth it.

    It seems we all implemented the same “index X requires block data” and “index Y requires block undo data” in slightly different ways. I did it in #26966 some time ago, and ryanofsky also did it in #24230. My preference would be to follow #24230 and use the custom options class. It is more flexible than introducing a new method to override on all/most classes every time a new index customization is added.

    I wouldn’t call having helper functions returning bools for each case more flexible than what we have here. At least this is what I saw in #24230 and I think you mean:

    virtual bool RequiresBlockUndoData() const { return false; }

    It may be more readable in the short term, but if we build more complicated stuff beyond just checking undo data I think this will get complicated. And we already use the mask everywhere, so why not build on this? I like being able to grep for the blockstatus enum values too in order to see where block data is checked in various ways; that would also be lost.

    If we agree that the final goal of the indexes is to run in isolation, in a standalone process, and be the first consumers of the kernel library, then linking them to chain internal fields like the CBlockIndex status mask isn’t the way to go. These objects will live only within the kernel process.

    Also, I don’t think the grepping argument is valid in most circumstances. It can be used anywhere to break layer distinctions. For example, what if the GUI needs to know if a certain block is available on disk in the future. Would you add the status masks enum dependency to a widget class? Doing so would break the current structure: views -> model/interfaces -> kernel (this is also done this way to allow the GUI to run in a standalone process as well).

    The RequiresBlockUndoData() implementation is from my PR, and I’m not a fan of it anymore (I don’t dislike it, I just prefer a different approach). I think the approach in #24230 is better, as it introduces a new base class method to override, called CustomOptions(), that returns a struct containing the index’s specific information. I think that building upon this would be cleaner and more flexible, as we would only need to add fields to a struct instead of changing the base class interface with each new option. (I would probably change the struct name, which is currently NotifyOptions.)
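The options-struct idea can be sketched as follows. This is a Python model, not code from #24230: `custom_options()` mirrors the `CustomOptions()` name mentioned above, while `IndexOptions`, its field names and `needs_undo_data` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexOptions:
    """Hypothetical options struct; the field names are illustrative only."""
    connect_undo_data: bool = False     # wants undo data when a block connects
    disconnect_undo_data: bool = False  # wants undo data when a block disconnects


class BaseIndex:
    def custom_options(self) -> IndexOptions:
        """Default: the index needs nothing beyond the block itself."""
        return IndexOptions()


class TxIndex(BaseIndex):
    pass  # inherits the defaults


class CoinStatsIndex(BaseIndex):
    def custom_options(self) -> IndexOptions:
        return IndexOptions(connect_undo_data=True, disconnect_undo_data=True)


def needs_undo_data(index: BaseIndex) -> bool:
    """Caller-side helper that reads the struct instead of a dedicated virtual method."""
    opts = index.custom_options()
    return opts.connect_undo_data or opts.disconnect_undo_data


print(needs_undo_data(TxIndex()))         # False
print(needs_undo_data(CoinStatsIndex()))  # True
```

Adding a new requirement in this model means adding a field with a default to `IndexOptions`; existing subclasses keep working without changes to the base class interface.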

  24. fjahr commented at 3:57 pm on July 16, 2024: contributor

    maflcko’s suggestion wouldn’t remove the dependency, unless you place the new enum somewhere else, which I don’t think is worth it.

    Yes, I was thinking of putting it in a file for types which we have been doing in the code base in several places. And @ryanofsky suggests introducing a kernel/types.h here already. Moving code isn’t hard to review and chain.h is big enough to be broken up a bit more IMO.

    If we agree that the final goal of the indexes is to run in isolation, in a standalone process, and be the first consumers of the kernel library, then linking them to chain internal fields like the CBlockIndex status mask isn’t the way to go. These objects will live only within the kernel process.

    Yes, I hope they can run in isolation in the future. But the index still needs to have an understanding of what block data is and what it needs in order to decide if it can handle what the kernel gives it. So I don’t think it can be avoided that the index has knowledge of the block status, so this would need to be shared. That doesn’t mean that they can’t run in isolation, but the kernel needs to share this knowledge with them. We can only avoid this if we forget about the concept suggested here and let the index ask for data from the kernel until it hits something unexpected and fails, or if we basically reimplement the block status as a list of bools in the index, which will be much harder to reason about and maintain. I don’t see how an options object prevents that.

  25. DrahtBot added the label Needs rebase on Jul 16, 2024
  26. index: Check availability of all necessary data for indices 9b28eae85a
  27. test: Indices can not start based on block data without undo data adabfbc237
  28. rpc, test: Address feedback from #29668 d855c594c3
  29. fjahr force-pushed on Jul 17, 2024
  30. fjahr commented at 10:55 am on July 17, 2024: contributor
    just rebased
  31. DrahtBot removed the label Needs rebase on Jul 17, 2024
  32. furszy commented at 10:31 pm on July 19, 2024: member

    But the index still needs to have an understanding of what block data is and what it needs in order to decide if it can handle what the kernel gives it. So I don’t think it can be avoided that the index has knowledge of the block status, so this would need to be shared. That doesn’t mean that they can’t run in isolation, but the kernel needs to share this knowledge with them. We can only avoid this if we forget about the concept suggested here and let the index ask for data from the kernel until it hits something unexpected and fails, or if we basically reimplement the block status as a list of bools in the index, which will be much harder to reason about and maintain. I don’t see how an options object prevents that.

    We both agree that indexes need to share their sync data requirements in some way. Perhaps we are not understanding each other because your rationale is based on the current sync mechanism, while I am considering a future version in which the sync logic no longer lives in the index base class.

    An index running in a standalone process would only sync through kernel signals. It will register with the kernel, providing its last known block hash and the data and events it wants to receive and listen to. Then, it will only react to them. It will no longer request data directly from the kernel as it does currently.

    The kernel, running in a different process, will emit the signals required to sync the registered index, just as it does for other listeners like the wallet (no difference between them). It will provide the BlockInfo struct, which may or may not contain specific data, such as block undo data during connection/disconnection and other information if requested during registration.

    This is why using an options struct instead of creating a method for each index sync requirement is more flexible to me. The index will provide this struct during registration to the kernel running in a different process and then forget about it. Then, if any event arrives without the requested data, the index will abort its execution.

    Moreover, even if you prefer the multi-overridden-methods approach instead of the options struct (which is how I implemented this in #26966), I don’t think accessing the uint32_t block index status bit flags field helps much in terms of reasoning about the code or maintenance. People working on upper layers like the indexes, the wallet, or the GUI should focus on receiving the data they expect and should not have to worry/learn about how the block verification status is mapped in memory.
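The registration flow described in this comment can be modelled roughly as below. Everything here is a hypothetical sketch of a design that does not exist yet: `Kernel`, `SyncOptions`, `UndoDataIndex` and `MissingDataError` are invented names, `BlockInfo` borrows only its name from the comment, and the last-known-block-hash part of registration is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class SyncOptions:
    """Hypothetical registration options an index hands to the kernel."""
    want_undo_data: bool = False


@dataclass
class BlockInfo:
    """Event payload; undo_data is filled in only when requested and available."""
    height: int
    data: bytes
    undo_data: Optional[bytes] = None


class MissingDataError(RuntimeError):
    """Raised by an index when an event lacks data it asked for."""


class Kernel:
    """Toy kernel that replays stored blocks to registered listeners."""

    def __init__(self, blocks: List[Tuple[bytes, Optional[bytes]]]):
        self.blocks = blocks  # (block data, undo data or None), indexed by height
        self.listeners: List[Tuple[SyncOptions, Callable[[BlockInfo], None]]] = []

    def register(self, options: SyncOptions, on_block_connected) -> None:
        self.listeners.append((options, on_block_connected))

    def emit_all(self) -> None:
        for height, (data, undo) in enumerate(self.blocks):
            for options, callback in self.listeners:
                payload_undo = undo if options.want_undo_data else None
                callback(BlockInfo(height, data, payload_undo))


class UndoDataIndex:
    """Index that only reacts to events and aborts when requested data is absent."""
    options = SyncOptions(want_undo_data=True)

    def __init__(self):
        self.synced_height = -1

    def on_block_connected(self, info: BlockInfo) -> None:
        if info.undo_data is None:
            raise MissingDataError(f"undo data missing at height {info.height}")
        self.synced_height = info.height


kernel = Kernel([(b"block0", b"undo0"), (b"block1", None)])
index = UndoDataIndex()
kernel.register(index.options, index.on_block_connected)
try:
    kernel.emit_all()
except MissingDataError as err:
    print(err)              # undo data missing at height 1
print(index.synced_height)  # 0
```

In this model the index never inspects a block status mask: it states its needs once at registration and stops when an event arrives without the requested data.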

  33. in test/functional/feature_index_prune.py:157 in adabfbc237 outdated
    153@@ -132,6 +154,29 @@ def run_test(self):
    154         for i, msg in enumerate([filter_msg, stats_msg, filter_msg]):
    155             self.nodes[i].assert_start_raises_init_error(extra_args=self.extra_args[i], expected_msg=msg+end_msg)
    156 
    157+        self.log.info("fetching the missing blocks with getblockfrompeer doesn't work for block filter index and coinstatsindex")
    


    Sjors commented at 3:43 pm on July 23, 2024:
    adabfbc23700056e5bd22b673c3078aa49ad83ea: it would be useful to have an example where getblockfrompeer does work, i.e. an index that doesn’t need undo data.
  34. Sjors commented at 8:55 pm on July 23, 2024: member
    As a quick sanity check, I rebuilt -coinstatsindex and checked gettxoutsetinfo still gives me the same stats at block 840,000.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC
