index: move disk read lookups to base class #32694

pull furszy wants to merge 5 commits into bitcoin:master from furszy:2025_indexes_remove_CBlockIndex_access changing 15 files +167 −196
  1. furszy commented at 8:38 pm on June 6, 2025: member

    Combining common refactors from #24230 and #26966, aiming to move both efforts forward while reducing their size and review burden.

    Broadly, #24230 focuses on enabling indexes to run in a separate process, and #26966 aims to parallelize the indexes initial synchronization process. A shared prerequisite for both is ensuring that only the base index class interacts with the node’s chain internals - child index classes should instead operate solely through chain events.

    This PR moves disk read lookups from child index classes to the base index class. It also includes a few documentation improvements and a test-only code cleanup.

  2. test: indexes, avoid creating threads when sync runs synchronously
    The indexes test call StartBackgroundSync(), which spawns a thread to run Sync(),
    only for the test thread to wait for it to complete by calling IndexWaitSynced().
    
    So, since the sync is performed synchronously, we can skip the extra thread creation
    entirely and call Sync() directly.
    331a25cb16
  3. DrahtBot commented at 8:38 pm on June 6, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32694.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK TheCharlatan, maflcko, davidgumberg, mzumsande, achow101
    Stale ACK ryanofsky

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30469 (index: Fix coinstats overflow and introduce index versioning by fjahr)
    • #29307 (util: explicitly close all AutoFiles that have been written by vasild)
    • #26966 (index: initial sync speedup, parallelize process by furszy)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  4. DrahtBot added the label UTXO Db and Indexes on Jun 6, 2025
  5. in src/test/blockfilter_index_tests.cpp:146 in 331a25cb16 outdated
    141@@ -143,10 +142,7 @@ BOOST_FIXTURE_TEST_CASE(blockfilter_index_initial_sync, BuildChainTestingSetup)
    142     // BlockUntilSyncedToCurrentChain should return false before index is started.
    143     BOOST_CHECK(!filter_index.BlockUntilSyncedToCurrentChain());
    144 
    145-    BOOST_REQUIRE(filter_index.StartBackgroundSync());
    


    ryanofsky commented at 11:53 am on June 10, 2025:

    In commit “test: indexes, avoid creating threads when sync runs synchronously” (331a25cb16632042dd6782a9b62fcc5c8aa6da3b)

    Motivation for this commit seems unclear. It seems to initialize indexes in unit tests differently than they are initialized in the node and reduce test coverage a little. Maybe commit message can be updated to mention reasons for wanting to do this?


    furszy commented at 3:33 pm on June 10, 2025:

    In commit “test: indexes, avoid creating threads when sync runs synchronously” (331a25c)

    Motivation for this commit seems unclear. It seems to initialize indexes in unit tests differently than they are initialized in the node and reduce test coverage a little. Maybe commit message can be updated to mention reasons for wanting to do this?

    Having unit tests waiting for secondary thread creation seemed odd to me; especially since these tests are meant to only run and verify the initial index synchronization output once (once the initial sync is completed, they sync new blocks only via block connection signals). Besides that, the functionality should already be covered by the functional tests.

    That being said, could drop it too if preferred. I think I just dislike having the thread inside the index class.

  6. in src/index/base.cpp:138 in c0bd2c5a8a outdated
    129@@ -130,6 +130,12 @@ bool BaseIndex::Init()
    130     return true;
    131 }
    132 
    133+// Returns the next block to process during initial sync.
    134+// Possible return values:
    135+// 1) The genesis block if pindex_prev=nullptr
    136+// 2) The next block after pindex_prev, if pindex_prev is on the active chain (null if it reached the tip).
    137+// 3) The block immediately following the fork point between pindex_prev and the active chain,
    138+//    if pindex_prev is not part of the active chain -> the caller must perform a rewind if this happens.
    


    ryanofsky commented at 12:04 pm on June 10, 2025:

    In commit “test: indexes, avoid creating threads when sync runs synchronously” (331a25cb16632042dd6782a9b62fcc5c8aa6da3b)

    This seems ok but the description is now longer than the function is and it seems to repeat a number of uninteresting details about what the function is doing. Maybe consider dropping the function level comment and just adding a short comment above the findfork call on line 151 since that is the part that seems to be surprising. Something like “Since block is not in the chain, return the next block in the chain **after** the last common ancestor. Caller will be responsible for rewinding back to the common ancestor.”


    furszy commented at 4:52 pm on June 10, 2025:
    Done as suggested. Ended up squashing the commit inside the last one.
  7. in src/index/base.cpp:368 in 3f390efa31 outdated
    362@@ -343,17 +363,14 @@ void BaseIndex::BlockConnected(ChainstateRole role, const std::shared_ptr<const
    363             return;
    364         }
    365     }
    366-    interfaces::BlockInfo block_info = kernel::MakeBlockInfo(pindex, block.get());
    367-    if (CustomAppend(block_info)) {
    368+
    369+    // Dispatch block to child class; errors are logged internally and abort the node.
    370+    if (ProcessBlock(pindex)) {
    


    ryanofsky commented at 12:26 pm on June 10, 2025:

    In commit “index: remove CBlockIndex access from CustomAppend()” (3f390efa313f4ac43e9134d5299d7faec74cca60)

    Should be ProcessBlock(pindex, block.get()) to avoid rereading the block from disk?


    furszy commented at 3:02 pm on June 10, 2025:

    Should be ProcessBlock(pindex, block.get()) to avoid rereading the block from disk?

    upps, yes! That was the spirit of function’s signature.

  8. in src/index/coinstatsindex.cpp:123 in 3f390efa31 outdated
    124-        // will be removed in upcoming commit
    125-        const CBlockIndex* pindex = WITH_LOCK(cs_main, return m_chainstate->m_blockman.LookupBlockIndex(block.hash));
    126-        if (!m_chainstate->m_blockman.ReadBlockUndo(block_undo, *pindex)) {
    127-            return false;
    128-        }
    129+        block_undo = *Assert(block.undo_data);
    


    ryanofsky commented at 12:32 pm on June 10, 2025:

    In commit “index: remove CBlockIndex access from CustomAppend()” (3f390efa313f4ac43e9134d5299d7faec74cca60)

    It’s a bit inefficient to copy the undo data here, and the code seems overcomplicated because the block_undo variable is only referenced once, 60 lines below. Would suggest dropping the block_undo variable and just accessing block.undo_data directly below.

    The block_undo variable in BlockFilterIndex::CustomAppend could also go away to simplify that function.


    furszy commented at 3:13 pm on June 10, 2025:

    In commit “index: remove CBlockIndex access from CustomAppend()” (3f390ef)

    It’s a bit inefficient to copy the undo data here, and the code seems overcomplicated because the block_undo variable is only referenced once, 60 lines below. Would suggest dropping the block_undo variable and just accessing block.undo_data directly below.

    The block_undo variable in BlockFilterIndex::CustomAppend could also go away to simplify that function.

    Done as suggested. Thanks!

  9. in src/index/base.cpp:156 in 3f390efa31 outdated
    152@@ -153,6 +153,39 @@ static const CBlockIndex* NextSyncBlock(const CBlockIndex* pindex_prev, CChain&
    153     return chain.Next(chain.FindFork(pindex_prev));
    154 }
    155 
    156+bool BaseIndex::ProcessBlock(const CBlockIndex* pindex, const CBlock* block_data)
    157+{
    158+    interfaces::BlockInfo block_info = kernel::MakeBlockInfo(pindex, block_data);
    159+
    160+    CBlock block;
    


    TheCharlatan commented at 12:40 pm on June 10, 2025:
    Nit: There is no reason to declare this (and CBlockUndo) outside the if statement, right?

    furszy commented at 3:12 pm on June 10, 2025:

    Nit: There is no reason to declare this (and CBlockUndo) outside the if statement, right?

    There’s actually a slightly important reason for that. If the object is declared inside the if-block, it will be destructed when going out of scope. This would leave null pointers in the BlockInfo struct that we provide to child classes.


    TheCharlatan commented at 5:54 pm on June 10, 2025:
    Of course, not sure what tripped me up here :(
  10. ryanofsky approved
  11. ryanofsky commented at 12:55 pm on June 10, 2025: contributor
    Code review ACK 3f390efa313f4ac43e9134d5299d7faec74cca60. Nice changes! This gets rid of blockstorage and CBlockIndex* accesses from index subclasses during sync to decouple the indexes from node internals and make new indexes more straightforward to write in the future. It should also simplify the parallelization PR and avoid the need for refactoring there.
  12. TheCharlatan commented at 1:39 pm on June 10, 2025: contributor

    Approach ACK

    This all felt very familiar :)

    Can you double-check the commit message of 50696f07784b5dcf04a398f7592dead4dde6d2e9. I think there is a word too much in the first and second sentence.

  13. furszy force-pushed on Jun 10, 2025
  14. indexes, refactor: Stop requiring CBlockIndex type to call IsBIP30Unspendable
    This commit does not change behavior in any way.
    0a248708dc
  15. indexes, refactor: Remove remaining CBlockIndex* uses in index Rewind methods
    Move ReadBlock code from CoinStatsIndex::CustomRewind to BaseIndex::Rewind
    
    Move ReadUndo code from CoinStatsIndex::ReverseBlock to BaseIndex::Rewind
    
    This commit does change behavior slightly. Since the new CustomRemove
    methods only take a single block at a time instead of a range of
    disconnected blocks, when they call CopyHeightIndexToHashIndex they will
    now do an index seek for each removed block instead of only seeking once
    to the height of the earliest removed block. Seeking instead of scanning
    is a little worse for performance if there is a >1 block reorg, but
    probably not noticeable unless the reorg is very deep.
    6f1392cc42
  16. furszy force-pushed on Jun 10, 2025
  17. furszy commented at 5:02 pm on June 10, 2025: member

    Thanks for the review ryanofsky and TheCharlatan! Updated per feedback.

    Can you double-check the commit message of https://github.com/bitcoin/bitcoin/commit/50696f07784b5dcf04a398f7592dead4dde6d2e9. I think there is a word too much in the first and second sentence.

    Done.

  18. TheCharlatan approved
  19. TheCharlatan commented at 6:43 pm on June 10, 2025: contributor
    ACK 4c5ec3a70a47798d8451da65c9c0059955cde937
  20. DrahtBot requested review from ryanofsky on Jun 10, 2025
  21. in src/index/blockfilterindex.cpp:327 in 6f1392cc42 outdated
    324 
    325     // During a reorg, we need to copy all filters for blocks that are getting disconnected from the
    326     // height index to the hash index so we can still find them when the height index entries are
    327     // overwritten.
    328-    if (!CopyHeightIndexToHashIndex(*db_it, batch, m_name, new_tip.height, current_tip.height)) {
    329+    if (!CopyHeightIndexToHashIndex(*db_it, batch, m_name, block.height - 1, block.height)) {
    


    mzumsande commented at 8:15 pm on June 10, 2025:

    6f1392cc42cde638773f2b697d7d2c58abcdc860: As a result of this commit, CopyHeightIndexToHashIndex (which doesn’t have any other callers) could be simpler: No nee to pass two heights, no need for a loop if it only copies one element.

    Besides, I wonder if the code duplication (it’s identical code in coinstatsindex and blockfilterindex) could be resolved.


    furszy commented at 10:17 pm on June 10, 2025:
    This reminds me to my own comment hehe. Will do it here.

    furszy commented at 2:00 am on June 11, 2025:
    Pushed. Added it in a separate commit to avoid extending ryanofsky’s commit further.
  22. mzumsande commented at 9:21 pm on June 10, 2025: contributor
    Concept ACK
  23. refactor: index, simplify CopyHeightIndexToHashIndex to process single block
    The function previously handled a range of heights but now copies only a
    single entry. This change simplifies the code to match its current usage.
    91b7ab6c69
  24. index: remove CBlockIndex access from CustomAppend()
    Moved CBlockUndo disk read lookups from child index classes to
    the base index class.
    
    The goal is for child index classes to synchronize only through
    events, without directly accessing the chain database.
    
    This change will enable future parallel synchronization mechanisms,
    reduce database access (when batched), and contribute toward the
    goal of running indexes in a separate process (with no chain
    database access).
    
    Besides that, this commit also documents how NextSyncBlock() behaves.
    It is not immediately clear this function could return the first
    block after the fork point during a reorg.
    029ba1a21d
  25. furszy force-pushed on Jun 11, 2025
  26. TheCharlatan approved
  27. TheCharlatan commented at 8:12 am on June 11, 2025: contributor
    Re-ACK 029ba1a21d570f7db6c4366ec9a30a381b56d6fb
  28. DrahtBot requested review from mzumsande on Jun 11, 2025
  29. maflcko commented at 9:23 am on June 11, 2025: member

    review ACK 029ba1a21d570f7db6c4366ec9a30a381b56d6fb 👡

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: review ACK 029ba1a21d570f7db6c4366ec9a30a381b56d6fb 👡
    3OQ13TwEEQORasoU+qCZ+jMGAPXKnsCJhTt8atkoWt5nRWhCn9ueXFilV6W5B9iknOUBhP6B2nDQJXduuKa0nAA==
    
  30. in src/test/util/index.cpp:21 in 331a25cb16 outdated
    15-        // there was an unexpected error in the index that caused it to stop syncing and request a shutdown.
    16-        Assert(!interrupt);
    17-
    18-        UninterruptibleSleep(100ms);
    19-    }
    20-    assert(index.GetSummary().synced);
    


    davidgumberg commented at 6:23 pm on June 11, 2025:

    https://github.com/bitcoin/bitcoin/pull/32694/commits/331a25cb16632042dd6782a9b62fcc5c8aa6da3b (test: indexes, avoid creating threads when sync runs synchronously)

    nit, only if retouching: it would probably be nice to retain the m_synced check from here.


    maflcko commented at 6:11 am on June 12, 2025:

    nit, only if retouching: it would probably be nice to retain the m_synced check from here.

    the m_init check wasn’t retained either, but I didn’t mention it, as this is test-only code


    furszy commented at 6:19 pm on June 12, 2025:
    Sure, will do it if it needs a re-touch. I don’t see it as that important because if the flag wouldn’t being set, the test would be failing due to the index ignoring the BlockConnected signals anyway.
  31. in src/index/blockfilterindex.cpp:295 in 91b7ab6c69 outdated
    293-                                       int start_height, int stop_height)
    294+                                                     const std::string& index_name, int height)
    295 {
    296-    DBHeightKey key(start_height);
    297+    DBHeightKey key(height);
    298     db_it.Seek(key);
    


    davidgumberg commented at 9:24 pm on June 11, 2025:

    Out of scope for this PR since it is incorrect on master, but db_it needs a Valid() check before GetKey(), since leveldb assert()’s on trying to read the key of an invalid iterator:

    https://github.com/bitcoin-core/leveldb-subtree/tree/bitcoin-fork/db/db_iter.cc#L64-L67


    davidgumberg commented at 10:55 pm on June 11, 2025:
    Alternatively, this should probably do m_db.ReadKey(), but that would require this function to become a member of the index or to be passed a db handle, so maybe better suited for a refactor that also addresses @mzumsande’s ask to de-deduplicate this (and maybe LookupOne?) in blockfilterindex and coinstatsindex

    furszy commented at 6:45 pm on June 12, 2025:

    Out of scope for this PR since it is incorrect on master, but db_it needs a Valid() check before GetKey(), since leveldb assert()’s on trying to read the key of an invalid iterator:

    I assume that the invalid iterator case could only happen when it cannot access db? Which is not common unless the user deletes/moves the index db directory after startup?


    davidgumberg commented at 6:51 pm on June 12, 2025:
    The iterator is invalid if no entry is found matching the key I believe.

    davidgumberg commented at 6:59 pm on June 12, 2025:

    Correction, the iterator is invalid if no entry is found which has a key greater than or equal to the “Seek”’ed key:

    https://github.com/bitcoin-core/leveldb-subtree/blob/bitcoin-fork/db/db_test.cc#L861-L870


    davidgumberg commented at 7:15 pm on June 12, 2025:

    The GetKey() call and key.height check below protects against the situation where an entry with a height greater than the key we’re seeking exists, but the height we’re seeking doesn’t exist. In the instance where the height we’re seeking is greater than any entry which exists, which should never happen,db_it.GetKey() will assert and crash.

    Just to be clear though, I think this is out-of-scope and the PR is ready to merge as-is.

  32. in src/index/base.cpp:282 in 6f1392cc42 outdated
    279+        if (!CustomRemove(block_info)) {
    280+            return false;
    281+        }
    282     }
    283 
    284     // In the case of a reorg, ensure persisted block locator is not stale.
    


    davidgumberg commented at 10:23 pm on June 11, 2025:

    Just an observation:

    While reviewing this PR’s changes to CopyHeightIndexToHashIndex(), I wanted to understand LookUpOne from coinstatsindex.cpp and from blockfilterindex.cpp since this is where the lookups by block hashes are relevant:

    https://github.com/bitcoin/bitcoin/blob/0a248708dc9d465db09168c39b3f12cb4c9465b7/src/index/coinstatsindex.cpp#L307-L325

    I was a little confused about whether or not LookupOne() is correct, since it makes the assumption that if no entry exists for a given block height then there is no entry for that block hash, this is because CustomRewind()/CustomRemove() does not actually remove the rewound block from the DB, it just writes a hash-keyed copy of it, and then when the block is replaced by its sibling from a heavier chain, the height-keyed entry is overwritten.

    But, I do struggle a little bit to understand why we even care about preserving the rewound block at all, since if we ever reconnect the block we don’t even take advantage of already having written it to the DB, is it because users should be able to do gettxoutsetinfo and getblockfilter() on stale blocks that have already been indexed?

  33. in src/interfaces/chain.h:330 in 6f1392cc42 outdated
    326@@ -327,6 +327,15 @@ class Chain
    327         virtual void chainStateFlushed(ChainstateRole role, const CBlockLocator& locator) {}
    328     };
    329 
    330+    //! Options specifying which chain notifications are required.
    


    davidgumberg commented at 11:02 pm on June 11, 2025:

    https://github.com/bitcoin/bitcoin/pull/32694/commits/6f1392cc42cde638773f2b697d7d2c58abcdc860 (indexes, refactor: Remove remaining CBlockIndex* uses in index Rewind methods)


    nit:

    0    //! Options specifying data required for chain notifications.
    
  34. in src/interfaces/chain.h:334 in 6f1392cc42 outdated
    326@@ -327,6 +327,15 @@ class Chain
    327         virtual void chainStateFlushed(ChainstateRole role, const CBlockLocator& locator) {}
    328     };
    329 
    330+    //! Options specifying which chain notifications are required.
    331+    struct NotifyOptions
    332+    {
    333+        //! Include block data with block disconnected notifications.
    334+        bool disconnect_data = false;
    


    davidgumberg commented at 11:04 pm on June 11, 2025:

    https://github.com/bitcoin/bitcoin/pull/32694/commits/6f1392cc42cde638773f2b697d7d2c58abcdc860 (indexes, refactor: Remove remaining CBlockIndex* uses in index Rewind methods)


    feel-free-to-disregard-style-nit:

    I feel this would be more consistent as:

    0        bool disconnect_block_data = false;
    
  35. davidgumberg approved
  36. davidgumberg commented at 11:07 pm on June 11, 2025: contributor

    ACK 029ba1a21d570f7db6c

    Left some nits, and general observations that don’t need to be addressed in this PR or at all.

  37. DrahtBot requested review from davidgumberg on Jun 11, 2025
  38. furszy commented at 6:21 pm on June 12, 2025: member
    Will try to not touch the code while we are all happy with the current state. Moving #26966 and #24230 forward after ~3 years sounds good enough to me :)
  39. mzumsande commented at 8:39 pm on June 12, 2025: contributor
    Code Review ACK 029ba1a21d570f7db6c4366ec9a30a381b56d6fb
  40. achow101 commented at 10:51 pm on June 12, 2025: member
    ACK 029ba1a21d570f7db6c4366ec9a30a381b56d6fb
  41. achow101 merged this on Jun 12, 2025
  42. achow101 closed this on Jun 12, 2025

  43. furszy deleted the branch on Jun 13, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-06-28 21:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me