validation: abort on DB unreadable coins instead of treating them as missing #34931

pull furszy wants to merge 4 commits into bitcoin:master from furszy:2026_utxo_deser_error_divergence changing 6 files +322 −12
  1. furszy commented at 8:22 pm on March 26, 2026: member

    Early note: the majority of this PR consists of test coverage. The changes per se are small.

    If a UTXO entry on disk can’t be deserialized, the node currently treats it as if the coin wouldn’t exist instead of aborting with an error. A non-existing coin has a very specific meaning for consensus: any block that spends it would be permanently rejected as invalid
    (BLOCK_FAILED_VALID), silently forking the node from the rest of the network. This can’t currently be triggered in practice (details below), but it’s still the wrong behavior.

    The root cause is that CDBWrapper::Read() returns false for both missing entries and
    deserialization failures, so CCoinsViewDB::GetCoin() has no way to tell them apart. CCoinsViewErrorCatcher was built to catch database read errors and abort, but it never
    fires during deserialization errors because CDBWrapper::Read() swallows the exception before it can propagate. This comment in ExecuteBackedWrapper() spells out the code intent very clearly.

    As mentioned initially, this can’t happen in practice today. It would require either a bug in the coin serialization path, or a memory corruption before the data reaches LevelDB (at which point we have bigger problems). Random disk-level bit flips are caught earlier by LevelDB’s verification (verify_checksums=true, enabled by default), which already propagates correctly as DB_INTERNAL_ERROR. Regardless, a db read issue should never be silently misinterpreted as a consensus violation.

    This PR adds CDBWrapper::TryRead(), which returns a ReadStatus that lets callers
    discriminate between all possible outcomes. CCoinsViewDB::GetCoin() switches on the result and throws on any error, letting ExecuteBackedWrapper() do what it was designed
    to do. CDBWrapper::Read() becomes a thin wrapper over TryRead(), preserving backward
    compatibility for all other callers (so we don’t have to change non-consensus code here). PeekCoin() is also covered, as it delegates to CCoinsViewDB::GetCoin() at the database
    level.

    The idea of the PR is to go slowly over the code changes, first commit locks-in the current CDBWrapper::Read() behavior . The second adds TryRead() with tests for all four status codes. The third is the CCoinsViewDB::GetCoin() fix. The fourth is a functional that ensures the node aborts correctly instead of silently diverging.

    Testing Notes: Cherry-picking the functional test commit on master demonstrates the consensus split
    when the coin entry fails to deserialize.

    Extra Note: CDBIterator::GetValue() has the same silent-swallow pattern. Not consensus-critical. Should be addressed in a follow-up.

  2. test: add missing coverage for CDBWrapper::Read() errors
    CDBWrapper::Read() errors have no test coverage or documentation.
    
    Currently, the function can return false when deserialization fails
    (indistinguishable from a missing key) and throw dbwrapper_error
    on an internal database error.
    
    This commit introduces tests that pin both behaviors so we can work
    on improvements in the next commits without worrying about introducing
    a behavior change.
    0536e3ca61
  3. dbwrapper: add TryRead() returning explicit ReadStatus
    Read() returns false for both a missing key and a deserialization
    failure, making it impossible for callers to distinguish between
    them.
    
    This commits adds TryRead() returning a ReadStatus struct that
    discriminates between:
    
    - OK:                    key found, value deserialized
    - NOT_FOUND:             key absent
    - DB_INTERNAL_ERROR:     LevelDB threw during record read
    - DESERIALIZATION_ERROR: key present, value incompatible with
                             expected format
    
    An op_error field preserves the original exception message for
    diagnostic purposes.
    
    This also makes Read() a thin wrapper over TryRead() to keep
    existing call sites unchanged.
    
    Note:
    Key serialization is the only operation that may throw in
    TryRead(), as callers are expected to provide well-formed keys.
    This is why this function is not noexcept.
    4b89e272dd
  4. txdb: detect UTXO deserialization errors via CDBWrapper::TryRead()
    If a UTXO entry on disk can't be deserialized, the node treats it
    as if the coin doesn't exist. Any block that spends that coin is
    permanently rejected as invalid (BLOCK_FAILED_VALID), silently
    forking the node from the rest of the network. This can hardly be
    triggered in practice (details below), but it's still the wrong
    behavior that could affect us in the future.
    
    The root cause is that CDBWrapper::Read() returns false for both
    missing keys and deserialization failures, so the consensus
    class CCoinsViewDB::GetCoin() has no way to tell them apart.
    CCoinsViewErrorCatcher was built to catch database read errors
    and abort, but it never fires because CDBWrapper::Read() swallows
    the exception before it can propagate.
    
    In practice, this scenario isn't a latent risk at the moment. It
    requires either a bug in the coin serialization path, or memory
    corruption before the data reaches LevelDB (at which point we have
    bigger problems). Any random disk-level bit flips are caught earlier
    by LevelDB's verification (the verify_checksums=true option enabled
    by default), which surfaces as a DB_INTERNAL_ERROR rather than a
    deserialization failure.
    
    This commit switches CCoinsViewDB::GetCoin() to use
    CDBWrapper::TryRead(), which lets the caller discriminate between
    all possible outcomes. On deserialization error, the exception now
    propagates through CCoinsViewErrorCatcher to ExecuteBackedWrapper(),
    which invokes the shutdown callbacks and aborts the node accordantly.
    
    This also fixes PeekCoin(), which delegates to GetCoin() at the
    CCoinsViewDB level.
    2fb50c557c
  5. test: exercise node abort on UTXO deserialization failure
    This ensures that UTXO unserialization errors abort the node, and does
    not cause a consensus divergence.
    
    A valid UTXO is created and shared between two nodes. The raw database
    entry is then deliberately modified on one node so it can no longer be
    deserialized. When the other node spends that UTXO and mines a block,
    the node with the unserializable entry must abort during block connection
    rather than silently treating the coin as absent and marking the block
    BLOCK_FAILED_VALID, which would cause it to permanently diverge from the
    network's best chain.
    bf295eb0af
  6. DrahtBot added the label Validation on Mar 26, 2026
  7. DrahtBot commented at 8:23 pm on March 26, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK stickies-v, w0xlt, fjahr

    If your review is incorrectly listed, please copy-paste <!–meta-tag:bot-skip–> into the comment that the bot should ignore.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #bitcoin-core/gui/933 ([DRAFT] Introduce Qt test automation bridge and gui functional tests by johnny9)
    • #34872 (wallet: fix mixed-input transaction accounting in history RPCs by w0xlt)
    • #34320 (coins: remove redundant and confusing CCoinsViewDB::HaveCoin by l0rinc)
    • #34132 (coins: drop error catcher, centralize fatal read handling by l0rinc)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    LLM Linter (✨ experimental)

    Possible places where comparison-specific test macros should replace generic comparisons:

    • [test/functional/feature_utxo_abort_on_error.py] assert node0.gettxout(coinbase_txid, 0) is not None, f"Expected UTXO {coinbase_txid}:0 to exist in UTXO set" -> replace with assert_not_equal(node0.gettxout(coinbase_txid, 0), None) (or assert_equal(node0.gettxout(... ) is not None, True) if preferred)
    • [test/functional/feature_utxo_abort_on_error.py] assert existing_value is not None, f"UTXO {coinbase_txid}:0 not found in db" -> replace with assert_not_equal(existing_value, None)
    • [test/functional/feature_utxo_abort_on_error.py] assert len(permanently_invalid) == 0, f"Spending block {spending_block_hash} must not be marked BLOCK_FAILED_VALID" -> replace with assert_equal(len(permanently_invalid), 0)

    2026-03-26 20:23:26

  8. stickies-v commented at 9:20 pm on March 26, 2026: contributor

    If a UTXO entry on disk can’t be deserialized, the node currently treats it as if the coin wouldn’t exist instead of aborting with an error.

    Concept ACK

  9. w0xlt commented at 10:33 pm on March 26, 2026: contributor
    Concept ACK
  10. fjahr commented at 8:03 am on March 27, 2026: contributor
    Concept ACK
  11. l0rinc commented at 12:48 pm on March 27, 2026: contributor
    This was already done in #34132, can you explain why you think we need two competing implementations?
  12. furszy commented at 3:19 pm on March 27, 2026: member

    This was already done in #34132, can you explain why you think we need two competing implementations?

    We are obviously touching the same code from different angles. Yours from a code-level perspective, and mine from a higher-level consensus divergence scenario. I think it’s fairly evident from reading both PR descriptions. I wasn’t aware of #34132, but I am now. I find both angles valuable.

    From what I can see, your PR is a much broader change focused on removing the CCoinsViewErrorCatcher indirection, and the deserialization fix comes within that larger structural change, so the consensus fix doesn’t get the attention/relevance it should have. Mine comes at it from a different direction, solely motivated by the consensus divergence standpoint. A UTXO deserialization failure silently causes BLOCK_FAILED_VALID, permanently forking the node from the network.

    The approaches also differ technically. Your #34132 makes Read() itself fatal, changing behavior for all callers at once, which forced you to end up mixing consensus and non-consensus changes together in the same PR. This PR is minimal and much narrower in that sense, it focuses on consensus and nothing else. It adds TryRead() returning an explicit ReadStatus, so GetCoin() can discriminate between a missing key and a deserialization failure without touching Read() for non-consensus callers. All this work started from the functional test bf295eb0af8b8768ce2e659eb8e5afdf1c3a8a5f that exercises the actual silent-fork scenario on master.

    I still need to review your PR in detail, but it would be good to focus on what we can do to get the best of both worlds. I think there are valuable changes on both sides. The first point that comes to mind is that the consensus changes and its test coverage should live in its own isolated PR, and then we keep building with broader cleanups and improvements on top. We should avoid mixing them within the same PR. Will dive into your PR to see what else we can combine and improve.

    The goal is not to compete, it’s to get the best outcome for the codebase.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-03-31 12:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me