blockencodings: recover from cmpctblock short-id collisions #34932

pull w0xlt wants to merge 2 commits into bitcoin:master from w0xlt:cmpctblock-shortid-collision-recovery changing 3 files +111 −12
  1. w0xlt commented at 8:31 pm on March 26, 2026: contributor

    When two transactions in a compact block share the same 6-byte short ID, InitData() currently returns READ_STATUS_FAILED, abandoning reconstruction entirely and falling back to a full block download.

    This is unnecessarily broad. A collision only means those specific positions are ambiguous - the rest of the compact block is still usable.

    This PR marks collided short IDs with a sentinel value instead of failing. Ambiguous entries are skipped during mempool/extra_txn matching and left as unavailable in txn_available, so they flow naturally into the existing getblocktxn request path. Only the collided transactions are fetched; the compact block fast path is preserved for everything else.

    Safety is unchanged: ambiguous positions are never guessed, and reconstructed blocks are still validated for merkle mutation before acceptance.

  2. blockencodings: recover from cmpctblock short-id collisions
    PartiallyDownloadedBlock::InitData() currently treats duplicate short IDs in a cmpctblock as a generic READ_STATUS_FAILED condition. That causes us to abandon compact block reconstruction for an otherwise well-formed announcement, either by falling back to a full block request or by giving up on that peer entirely.
    
    This is broader than necessary. A short-id collision only means that one or more transaction positions are ambiguous. We already have a targeted recovery path for missing transactions via getblocktxn/blocktxn, so the compact block can still be used as long as the ambiguous positions are left unresolved.
    
    Handle this by marking collided short IDs as ambiguous during InitData() instead of failing immediately. Ambiguous entries are skipped during mempool and extra_txn matching, remain unavailable in txn_available, and naturally flow into the existing missing-transaction request path. This preserves the compact-block fast path and limits fallback to only the collided transactions.
    
    Safety is unchanged: we still do not guess which transaction a collided short ID refers to, and reconstructed blocks are still checked for mutation before acceptance.
    
    Add a unit test that forces a short-id collision and verifies that InitData() succeeds, the collided entries remain unavailable, and FillBlock() reconstructs the original block once those transactions are provided.
    4af636b1b9
  3. test: cover cmpctblock short-id collision recovery
    Add a compact-block functional test that forces two transaction positions to share a short ID in an otherwise valid cmpctblock.
    
    The test preloads every non-colliding transaction into the mempool, sends the mutated compact block, verifies the node requests only the ambiguous positions via getblocktxn instead of falling back to a full block request, and then completes reconstruction with blocktxn.
    1d5737397c
  4. DrahtBot commented at 8:31 pm on March 26, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

  5. ajtowns commented at 0:20 am on March 27, 2026: contributor

    When two transactions in a compact block share the same 6-byte short ID,

    For an honest peer (ie one constructing the shortids via a random nonce and hash against a valid block that doesn’t contain the same transaction twice) the odds of this occurring for a block with under 8k transactions (ie any reasonable block) is about 1-in-8.8million, ie about once in 167 years. I think it probably makes more sense to treat a duplicate shortid as “this compact block is probably broken/dishonest, so don’t even waste CPU scanning the mempool”.

  6. w0xlt commented at 1:07 am on March 28, 2026: contributor

    it probably makes more sense to treat a duplicate shortid as “this compact block is probably broken/dishonest, so don’t even waste CPU scanning the mempool”.

    Fair enough. That’s already the behavior in master. So the only remaining change worth making would be to remove or update the stale TODO in blockencodings.cpp, since it points in a direction we don’t want.

  7. ajtowns commented at 3:36 am on March 28, 2026: contributor

    it probably makes more sense to treat a duplicate shortid as “this compact block is probably broken/dishonest, so don’t even waste CPU scanning the mempool”.

    Fair enough. That’s already the behavior in master. So the only remaining change worth making would be to remove or update the stale TODO in blockencodings.cpp, since it points in a direction we don’t want.

    That’s my feeling, at least.

    Duplicate shortids in the mempool (rather than the block) are more likely to occur in practice, and there’s been discussion about potentially using a minisketch (over the wtxids) as a way to detect and recover a small number of issues there there and possibly avoid either a round-trip (if there was a collision) or disregarding the compact block (if there was a single hit against the wrong tx), fwiw.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-03-31 12:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me