index: Deduplicate HashKey / HeightKey handling #32997

pull mzumsande wants to merge 2 commits into bitcoin:master from mzumsande:202507_dedup_hashheight changing 3 files +105 −171
  1. mzumsande commented at 10:09 pm on July 16, 2025: contributor
    The logic for DBHashKey / DBHeightKey handling and lookup of entries is shared by coinstatsindex and blockfilterindex, leading to many lines of duplicated code. De-duplicate this by moving the logic to index/key.h (using templates for the index-specific DBVal).
  2. DrahtBot added the label UTXO Db and Indexes on Jul 16, 2025
  3. DrahtBot commented at 10:09 pm on July 16, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32997.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK fjahr

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31308 (ci, iwyu: Treat warnings as errors for specific directories by hebasto)
    • #26966 (index: initial sync speedup, parallelize process by furszy)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • “an different block” -> “a different block” [‘an’ before a consonant sound is incorrect]

    drahtbot_id_4_m

  4. in src/index/coinstatsindex.cpp:123 in a73af86fa2 outdated
    118@@ -119,23 +119,6 @@ bool CoinStatsIndex::CustomAppend(const interfaces::BlockInfo& block)
    119 
    120     // Ignore genesis block
    121     if (block.height > 0) {
    122-        std::pair<uint256, DBVal> read_out;
    123-        if (!m_db->Read(DBHeightKey(block.height - 1), read_out)) {
    


    fjahr commented at 10:50 pm on July 16, 2025:
    It’s not true that they aren’t used for anything. The point of reading the block from disk is to check if it’s the prev block that is expected because the values that are written for the current block depend on the prev block values. They are used not from the prev block but from the members instead though. I am not sure right now if this check can just be removed, but if they are still needed the expected prev block hash could probably be written into a member too and the DB read could be avoided this way.

    mzumsande commented at 11:15 pm on July 16, 2025:

    ah, ok. I thought that it was used in the past for something but no longer is, because we load data into a local variable and let it go out of scope without using it. So the point is the return false / the logged warning? Is this a legitimitate situation we can get into outside of index corruption?

    In any case, I’ll drop that commit with the next push, it’s not the main focus of this PR.


    fjahr commented at 9:30 am on July 17, 2025:
    Yes, the return false would prevent potentially writing false stats because the member variables are not up-to-date with what the prev block should be. I don’t remember what the scenarios were but I think there was also fear of some edge cases with reorgs but from today’s perspective I am not sure anymore how that would happen. Maybe the refactorings in the index have made this scenario impossible but I’ll have to think about it some more.

    mzumsande commented at 3:35 pm on July 17, 2025:
    I’m skeptical if this is a possible scenario with today’s index base code (which is much better with reorgs / unexpected restart scenarios than it was back then), but it’s definitely more complicated than I thought, so this PR is not appropriate for this - I dropped the commit and will resolve this discussion.

    fjahr commented at 9:55 pm on July 18, 2025:
    I agree, sounds good to me!

    furszy commented at 4:34 pm on July 20, 2025:

    The base index class is responsible for detecting reorgs and calling the appropriate methods from the child class: CustomRemove() up to the forking point, then CustomAppend() to connect the new chain blocks. This happens during both initial sync and the validation event. I’m pretty confident we can safely remove the extra check.

    Furthermore, since the index child class receives block connection and disconnection events in order, I don’t see why it should care about reorgs at all (speaking generally here, for all index child classes). That’s one of the base class responsibilities.

    Also, if we do want to keep the previous block check for some reason, I think we could cache it during CustomInit(), then update it in both CustomAppend() and CustomRemove(). Or.. we could keep the base class’s m_best_block_index updated with every processed block too. Either way, I think we can avoid accessing disk for this.

  5. fjahr commented at 9:43 am on July 17, 2025: contributor
    Concept ACK on the deduplication parts
  6. index: deduplicate Hash / Height handling
    The code was largely duplicated between coinstatsindex
    and blockfilterindex.
    Deduplicate it by moving it to a shared file.
    
    slight change in behavior: the index name is no longer
    part of the error msg in case of (un)serialization errors.
    05b402ccaa
  7. index, refactor: deduplicate LookUpOne
    LookUpOne is used by both coinstatsindex and blockfilterindex,
    the two implementations had already started to deviate slightly
    for no apparent reason.
    03b67a27d0
  8. mzumsande force-pushed on Jul 17, 2025
  9. in src/index/key.h:80 in 03b67a27d0
    75@@ -76,4 +76,24 @@ template <typename DBVal>
    76     return true;
    77 }
    78 
    79+template <typename DBVal>
    80+static bool LookUpOne(const CDBWrapper& db, const interfaces::BlockRef& block, DBVal& result)
    


    fjahr commented at 10:07 pm on July 18, 2025:

    nit: could do IWYU I guess, even though this wasn’t followed that well in the index cpp files.

    0#include <dbwrapper.h>
    1#include <interfaces/types.h>
    
  10. in src/index/blockfilterindex.cpp:33 in 05b402ccaa outdated
    30@@ -30,8 +31,6 @@
    31  * as big-endian so that sequential reads of filters by height are fast.
    32  * Keys for the hash index have the type [DB_BLOCK_HASH, uint256].
    33  */
    34-constexpr uint8_t DB_BLOCK_HASH{'s'};
    


    fjahr commented at 10:18 pm on July 18, 2025:

    The comment above here still talks about three items stored for each block but after the change only one of the values is here. This would need to be adopted somehow and probably some of the docs moved over to index/key.h.

    EDIT: Maybe the current comment can be left almost untouched but there needs to be a reference to where the keys are now.

  11. in src/index/key.h:13 in 05b402ccaa outdated
     8+#include <cstdint>
     9+#include <string>
    10+#include <stdexcept>
    11+#include <utility>
    12+
    13+static constexpr uint8_t DB_BLOCK_HASH{'s'};
    


    fjahr commented at 10:20 pm on July 18, 2025:
    Aside from the documentation moving necessary mentioned above maybe there could be 1-2 general sentences here that this deals with the hash/height keys and that it’s mostly relevant to indexes that use these.

    fjahr commented at 10:21 pm on July 18, 2025:
    Also, very minor naming nit: File could be called db_key.h since we have other kinds of keys but I don’t feel too strongly about it.
  12. fjahr commented at 10:34 pm on July 18, 2025: contributor
    Looks good, aside from the documentation cleanup in blockfilterindex.cpp you can consider my comments optional.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-07-23 21:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me