assumeutxo: indexing changes #24006

pull jamesob wants to merge 4 commits into bitcoin:master from jamesob:2022-01-au-indexing changing 11 files +193 −62
  1. jamesob commented at 4:56 pm on January 7, 2022: member

    This is part of the assumeutxo project (parent PR: #15606)


    This PR includes the necessary changes for indexing to work with the operation of a background validation chainstate. In short, it

    • only emits existing ValidationInterface events for the active chain,
    • introduces a BackgroundBlockConnected event for the background chain to support indexation, and
    • removes assumptions in the indexing framework that indexation will happen sequentially.

    More details below.


    Adds BackgroundBlockConnected to the validationinterface and uses it in index maintenance. Ensures that index building will work when part of the chain is validated asynchronously in the background by a second chainstate.

    This changeset removes the guarantee that indexes will be built sequentially, as none of the current indexes require this and it is no longer easy to offer this guarantee when multiple chainstates are in use.

    Within BaseIndex, we only update m_best_block_index (which is essentially the progress marker for how far along the indexing process is) for chains which we can be certain that all blocks under a given block on that chain have had indexing performed on them. In other words, we will not update m_best_block_index during a BlockConnected event that comes from an active snapshot chain, i.e. a chain for which some of the blocks underneath Tip() are not indexed.

    Once background validation is completed and no background chainstates are in use, the indexer will happily use BlockConnected events from the active chain to update m_best_block_index as usual. No blocks from the active chainstate will have been missed for indexing (despite not previously updating m_best_block_index), so it is safe to perform this transition without any extra behavior.

    Some unnecessary logic from BaseIndex::ChainStateFlushed() is removed, since the locator can no longer be assumed to be an ancestor of the best_block_index (since the “best block” may be on a background chainstate and the locator may have come from an assumed-valid tip well ahead of it).

  2. validation: only send ValidationInterface callbacks for active chain 9ce8364c68
  3. add ChainstateManager.getSnapshot{Height,BaseBlock}()
    For use in later commits.
    e50aa36a78
  4. validation: index: add methods for snapshot-aware indexing
    To be used in subsequent commits.
    0975c1ff29
  5. validation: indexing changes for assumeutxo
    Adds BackgroundBlockConnected to the validationinterface and uses it in
    index maintenance. Ensures that index building will work when part of
    the chain is validated asynchronously in the background by a second
    chainstate.
    
    This changeset removes the guarantee that indexes will be built
    sequentially, as none of the current indexes require this and it is no
    longer easy to offer this guarantee when multiple chainstates are in
    use.
    
    Within BaseIndex, we only update `m_best_block_index` (which is
    essentially the progress marker for how far along the indexing process
    is) for chains which we can be certain that all blocks under a given
    block on that chain have had indexing performed on them.  In other
    words, we will not update `m_best_block_index` during a BlockConnected
    event that comes from an active snapshot chain, i.e. a chain for which
    some of the blocks underneath Tip() are not indexed.
    
    Once background validation is completed and no background chainstates
    are in use, the indexer will happily use BlockConnected events from the
    active chain to update `m_best_block_index` as usual. No blocks from the
    active chainstate will have been missed for indexing (despite not
    previously updating m_best_block_index), so it is safe to perform this
    transition without any extra behavior.
    
    Some unnecessary logic from BaseIndex::ChainStateFlushed() is removed,
    since the locator can no longer be assumed to be an ancestor of the
    best_block_index (since the "best block" may be on a background chainstate
    and the locator may have come from an assumed-valid tip well ahead of
    it).
    82399df1fd
  6. DrahtBot added the label UTXO Db and Indexes on Jan 7, 2022
  7. DrahtBot added the label Validation on Jan 7, 2022
  8. DrahtBot commented at 8:29 pm on January 7, 2022: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #24008 (assumeutxo: net_processing changes by jamesob)
    • #22485 (index: doc/log BaseIndex sync behavior with empty datadir by jamesob)
    • #19888 (rpc, test: Improve getblockstats for unspendables by fjahr)
    • #15606 (assumeutxo by jamesob)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  9. in src/validation.h:825 in e50aa36a78 outdated
    821@@ -822,6 +822,10 @@ class ChainstateManager
    822         CBlockIndex** ppindex) EXCLUSIVE_LOCKS_REQUIRED(cs_main);
    823     friend CChainState;
    824 
    825+    // Returns nullptr if no snapshot ahs been loaded.
    


    Sjors commented at 6:16 pm on January 10, 2022:
    nit: ahs -> has
  10. in src/validation.h:978 in 0975c1ff29 outdated
    973+    //!   BLOCK_HAVE_DATA beneath the tip.
    974+    //!
    975+    //!   In other words, give us the chainstate for which we can reasonably expect
    976+    //!   that all blocks beneath the tip have been indexed. In practice this means
    977+    //!   when using an assumed-valid chainstate based upon a snapshot, return only the
    978+    //!   fully validated chain.
    


    Sjors commented at 6:22 pm on January 10, 2022:
    This is a bit cryptically worded, because “fully validated chain” could be read as “the background chain, but only once it’s fully validated”.
  11. Sjors commented at 6:35 pm on January 10, 2022: member

    This changeset removes the guarantee that indexes will be built sequentially, as none of the current indexes require this and it is no longer easy to offer this guarantee when multiple chainstates are in use.

    I’m confused. IIUC when a new index starts building while IBD is in progress it will:

    1. While only synching from assume valid to the tip: do nothing
    2. While syncing the background chain state: index, from genesis up
    3. When background chain state is finish: index assumed valid blocks

    If so, then it’s still building sequentially. Or is it indexing assumed valid blocks already in (1) and (2)?

    Also, doesn’t MuHash in coinstatsindex require the index to be in sequence, because the hash for each block is an increment of that of the previous block? Or do we store the increment and infer the value for each block on the fly? cc @fjahr

    Regarding ValidationInterface callbacks for active chain, do you have any hint for how to reason about what the consumers of these callbacks expect?

  12. fjahr commented at 8:57 pm on January 10, 2022: member

    Also, doesn’t MuHash in coinstatsindex require the index to be in sequence, because the hash for each block is an increment of that of the previous block? Or do we store the increment and infer the value for each block on the fly?

    It works like this in the index but since the MuHash is just a representation of a UTXO set and does not commit to a previous state you can “jump in” at any point in the chain and calculate the MuHash at any height by looking only at the UTXO set (inefficient because the full set has to be processed) and then from this point on work with the deltas (=blocks) as usual (efficient). Because of this it is also possible to calculate the MuHash at the tip without the index, it just takes a while. I haven’t looked at the code in detail yet but just skimming I think the code for this starting of the coinstatsindex at any height is not included yet, right? But since assumeutxo supplies the UTXO set and its corresponding height this should be possible to build this in a follow-up.

    But I agree with Sjors’ question on the conceptual level: do we really need the indices to sync already right away on the blocks between the au height and tip or is it not maybe simpler to have the indices only sync with the background chain? Syncing the blocks between au and tip first we could serve blockfilters at the tip right away, which is nice, but if there is ever an issue with an incorrect snapshot then we make it other people’s problem too because we are serving incorrect blockfilters.

  13. jamesob commented at 9:05 pm on January 10, 2022: member

    I’m confused.

    The way this works as-written is that if the user starts with an empty datadir and then proceeds to load a snapshot, indexes will be built on the basis of incoming (Background)BlockConnected events that will come from both chainstates (due to the behavior documented in the unmerged #22485) - hence the lack of ordering.

    But now that we’re discussing it, maybe it’s preferable to just defer index building until we’re done with the assumed-valid chain…

    Also, doesn’t MuHash in coinstatsindex require the index to be in sequence, because the hash for each block is an increment of that of the previous block? Or do we store the increment and infer the value for each block on the fly? cc @fjahr

    Good catch! Thought I’d checked that but apparently not: https://github.com/jamesob/bitcoin/blob/e84248814296b82d926f5018dd62d917ddefe626/src/index/coinstatsindex.cpp#L118-L121

    A few options that come to mind:

    • we can defer indexation until there is are no assumed-valid chainstates in use (as @fjahr suggested as I was typing this), or
    • we can introduce an m_requires_sequential member variable to BaseIndex that does the above, but only for indexes that require it. This is kind of an optimized version of the first option.
  14. Sjors commented at 9:33 am on January 11, 2022: member
    I think we can introduce m_requires_sequential in the future if we really want to build an index earlier. I can’t see an obvious use case for that atm.
  15. jamesob commented at 4:24 pm on January 12, 2022: member
    Closing this while I work on a simpler implementation that preserves sequential indexing at the cost of deferring indexing on the assumed-valid chainstate.
  16. jamesob closed this on Jan 12, 2022

  17. DrahtBot locked this on Jan 12, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-01 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me