blocks: force hash validations on disk read #32638

pull l0rinc wants to merge 3 commits into bitcoin:master from l0rinc:l0rinc/read-block-hash-check changing 6 files +18 −16
  1. l0rinc commented at 9:48 am on May 29, 2025: contributor

    A follow-up to #32487 (review), after which validating the hash of a read block from disk doesn’t incur the cost of calculating its hash anymore.

    Summary

    This PR adds explicit checks that the read block header’s hash matches the one we were expecting.

    Context

    After the previous PR, validating a block’s hash during read operations became essentially free. This PR leverages that by requiring callers to provide a block’s expected hash (or std::nullopt), preventing silent failures caused by corrupted or mismatched data. Most ReadBlock usages were updated with expected hashes and now fail on mismatch.

    Changes

    • added hash assertions in ProcessGetBlockData and ProcessMessage to validate that the block read from disk matches the expected hash;
    • updated tests and benchmark to pass the correct block hash to ReadBlock(), ensuring the hash validation is tested - or none if we already expect PoW failure;
    • removed the default value for expected_hash, requiring an explicit hash for all block reads.

    Why is the hash still optional (but no longer has a default value)

    • for header-error tests, where the goal is to trigger failures early in the parsing process;
    • for out-of-order orphan blocks, where the child hash isn’t available before the initial disk read.
  2. DrahtBot commented at 9:48 am on May 29, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32638.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK hodlinator, janb84, maflcko

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. l0rinc renamed this:
    blocks: force hash validations of blocks read from disk explicit
    blocks: force hash validations of blocks read from disk
    on May 29, 2025
  4. l0rinc marked this as ready for review on May 29, 2025
  5. l0rinc renamed this:
    blocks: force hash validations of blocks read from disk
    blocks: force hash validations on disk read
    on May 29, 2025
  6. janb84 commented at 1:13 pm on June 2, 2025: contributor

    ACK efbe0e86810ccbe828472935eb221c2ddf295bf3

    Looks like a good followup PR to use the option to validate the hash as introduced in #32487

    • code reviewed ✅
    • compiled & run tests ✅
  7. willcl-ark added the label Block storage on Jun 2, 2025
  8. in src/net_processing.cpp:2268 in efbe0e8681 outdated
    2264@@ -2265,6 +2265,7 @@ void PeerManagerImpl::ProcessGetBlockData(CNode& pfrom, Peer& peer, const CInv&
    2265         if (!pindex) {
    2266             return;
    2267         }
    2268+        assert(pindex->GetBlockHash() == inv.hash);
    


    hodlinator commented at 8:28 am on June 5, 2025:

    From PR description:

    added hash assertions in ProcessGetBlockData and ProcessMessage to validate that the block read from disk matches the expected hash;

    Stared at BlockManager::AddToBlockIndex() but I fail to see how the hash in the keys of the BlockManager::m_block_index map would be different from the hash in the CBlockIndex::phashBlock (stored in the values of BlockManager::m_block_index). Would require cosmic rays hitting RAM (or faulty RAM) rather than disk from what I can see - so adding these asserts right after the lookup seems unnecessary.


    l0rinc commented at 11:41 am on June 12, 2025:

    I’m not sure I fully understand the comment here. Are you saying that it’s obvious that this is always true? if so, isn’t that kinda’ the role of an assert though :)?

    If you think it’s ridiculously trivial (though you had to look inside the methods to understand the relation, this assertion was meant to avoid doing that, since I’m replacing one with the other in the code, this should help reviewers understand why that’s safe).

    So here it’s meant to document why subsequent code now relies on inv.hash instead of pindex->GetBlockHash() in several places. Let me know if you think it’s not helpful.


    hodlinator commented at 12:45 pm on June 12, 2025:

    The pseudo-code here is roughly:

    0  value = lookup(key);
    1  if (!value) {
    2      return;
    3  }
    4+ assert(value belongs to key);
    

    Grepped the codebase for calls to LookupBlockIndex() and couldn’t find any where we assert that the returned index’s hash matches what we looked it up by. So this a departure from the current level of defensiveness we use in the project. Heck, /src/index/coinstatsindex.cpp and /src/kernel/coinstats.cpp don’t even check against a null return value.

    It makes sense to communicate to reviewers that the change is safe, but I prefer it be done via commit message or some other means in this case. I think it is wrong to bloat the code to avoid reviewers having to double-check 1-2 levels deeper for such basic assumptions. Would be somewhat more okay if you changed LookupBlockIndex internals to assert the field on the return value as a post-condition.

    If we agree on switching to a super-defensive but also cluttered and less performant coding style like https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/TIGER_STYLE.md#safety, I prefer we do it coupled with some kind of dev announcement/agreement. We should probably be re-evaluating our standards, taking into account that we have redundancy in the amount of nodes and versions being run.


    l0rinc commented at 1:41 pm on June 12, 2025:
    Regardless of the implementation, what I was documenting here was the safety of changing one value to the other, without having to understand the details of why. If I touch again, I don’t mind changing this, if other reviewer agree it’s overly pedantic. In other cases, I have explicitly removed asserts like this in the next commit, the signal that this value was explicitly checked before.

    hodlinator commented at 7:19 pm on June 12, 2025:

    Would rather see post-conditions documented internally:

     0diff --git a/src/node/blockstorage.cpp b/src/node/blockstorage.cpp
     1index 04936ec99d..409097361f 100644
     2--- a/src/node/blockstorage.cpp
     3+++ b/src/node/blockstorage.cpp
     4@@ -192,14 +192,18 @@ CBlockIndex* BlockManager::LookupBlockIndex(const uint256& hash)
     5 {
     6     AssertLockHeld(cs_main);
     7     BlockMap::iterator it = m_block_index.find(hash);
     8-    return it == m_block_index.end() ? nullptr : &it->second;
     9+    if (it == m_block_index.end()) return nullptr;
    10+    Assume(*it->second.phashBlock == hash);
    11+    return &it->second;
    12 }
    13 
    14 const CBlockIndex* BlockManager::LookupBlockIndex(const uint256& hash) const
    15 {
    16     AssertLockHeld(cs_main);
    17     BlockMap::const_iterator it = m_block_index.find(hash);
    18-    return it == m_block_index.end() ? nullptr : &it->second;
    19+    if (it == m_block_index.end()) return nullptr;
    20+    Assume(*it->second.phashBlock == hash);
    21+    return &it->second;
    22 }
    23 
    24 CBlockIndex* BlockManager::AddToBlockIndex(const CBlockHeader& block, CBlockIndex*& best_header)
    
    • Less clutter/contagion of asserts (O(1) vs O(n) proliferation).
    • Documentation purpose mostly. Should only have impact in Assume()-enabled builds without degrading performance in releases.
    • Catches cosmic rays (and buffer-overruns) in Assume()-enabled builds.
    • Still clearly communicates safety in the context of this PR - I think we should be able to ask reviewers to go 1 level deep. (And if we add the checking in this PR it is even more clear).

    l0rinc commented at 10:36 am on June 13, 2025:
    Since I meant the assert as a cheap and reliable documentation (and have no interest in modifying LookupBlockIndex), I’ve moved the the reason for changing pindex->GetBlockHash() to inv.hash and pindex->GetBlockHash() to inv.hash to the commit message instead of the assert.

    hodlinator commented at 1:01 pm on June 16, 2025:
    Thanks!
  9. in src/node/blockstorage.h:414 in efbe0e8681 outdated
    410@@ -411,7 +411,7 @@ class BlockManager
    411     void UnlinkPrunedFiles(const std::set<int>& setFilesToPrune) const;
    412 
    413     /** Functions for disk access for blocks */
    414-    bool ReadBlock(CBlock& block, const FlatFilePos& pos, const std::optional<uint256>& expected_hash = {}) const;
    415+    bool ReadBlock(CBlock& block, const FlatFilePos& pos, const std::optional<uint256>& expected_hash) const;
    


    hodlinator commented at 8:33 am on June 5, 2025:

    nit: Would be clearer if the PR description changed to something like

    0- Why is the hash still optional
    1+ Why is the hash still optional (but no longer has a default value)
    

    Happy this PR removes the default value!


    l0rinc commented at 11:30 am on June 12, 2025:
    Done, thanks
  10. hodlinator commented at 9:03 am on June 5, 2025: contributor
    Concept ACK efbe0e86810ccbe828472935eb221c2ddf295bf3
  11. net: assert block hash in `ProcessGetBlockData` and `ProcessMessage`
    The non-recent-block code path in `ProcessGetBlockData` already has `inv.hash` available (equaling `pindex->GetBlockHash()`).
    Pass it to `ReadBlock()` and assert that the on-disk header matches the requested hash.
    
    The `GETBLOCKTXN` message handler in `ProcessMessage` receives `req.blockhash` from the peer (equaling `pindex->GetBlockHash()`).
    Pass this hash to `ReadBlock()` for verification and assert that the index lookup matches.
    
    Co-authored-by: TheCharlatan <seb.kung@gmail.com>
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    5d235d50d6
  12. test/bench: verify hash in `ComputeFilter` reads
    Switch to the index-aware `ReadBlock()` overload in `ComputeFilter` so that filter creation will abort if the stored block header hash doesn't match the expected one.
    
    In the `readwriteblock` benchmark, pass the expected hash to `ReadBlock()` to match the new signature without affecting benchmark performance.
    2371b9f4ee
  13. blockstorage: make block read hash checks explicit
    Dropped the default expected_hash parameter from `ReadBlock()`.
    
    In `blockmanager_flush_block_file` tests, we pass {} since the tests would already fail at PoW validation for corrupted blocks.
    
    In `ChainstateManager::LoadExternalBlockFile`, we pass {} when processing child blocks because their hashes aren't known beforehand.
    9341b5333a
  14. l0rinc force-pushed on Jun 13, 2025
  15. hodlinator approved
  16. hodlinator commented at 1:02 pm on June 16, 2025: contributor

    ACK 9341b5333ad54ccdb7c16802ff06c51b956948e7

    Good to make net_processing.cpp send in the expected hash and force other callers to consider whether they have the hash available through removing the default value for the function parameter to BlockManager::ReadBlock. Makes the change that #32487 started feel more complete.

  17. DrahtBot requested review from janb84 on Jun 16, 2025
  18. janb84 commented at 1:58 pm on June 16, 2025: contributor
    re ACK 9341b5333ad54ccdb7c16802ff06c51b956948e7
  19. maflcko commented at 2:17 pm on June 17, 2025: member

    review ACK 9341b5333ad54ccdb7c16802ff06c51b956948e7 🕙

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: review ACK 9341b5333ad54ccdb7c16802ff06c51b956948e7 🕙
    3LXb3r6QjUSFApA8kapoYrNrMTCeEbtRY4O6KU+r7bn6eM4wXU12P7bBLGrZK2Uvk5cfpkuhkrvm+Y2/zkp4SBw==
    

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-06-26 12:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me