BIP draft: BIPs for Utreexo #1923

pull kcalvinalvin wants to merge 3 commits into bitcoin:master from kcalvinalvin:2025-08-10-utreexo-bips changing 13 files +1446 −0
  1. kcalvinalvin commented at 6:56 am on August 10, 2025: contributor

    These are the 3 BIPs that describe Utreexo, a consensus-compatible (non-soft fork) way to send and verify transactions without storing the full UTXO set.

    The 3 BIPs are for:

    1. The specification of the Utreexo accumulator.
    2. The specification of Bitcoin block and tx validation using the Utreexo accumulator.
    3. The peer to peer networking changes required to enable Utreexo nodes.

    Mailing list post: https://groups.google.com/g/bitcoindev/c/W1lxBraKG_E

  2. kcalvinalvin force-pushed on Aug 10, 2025
  3. kcalvinalvin force-pushed on Aug 10, 2025
  4. jonatack added the label New BIP on Aug 10, 2025
  5. in utreexo-p2p-bip.md:26 in a94f6434c8 outdated
    21+This document **does not** describe how to validate blocks and transactions using the provided data, see [Utreexo - Validation Layer](./utreexo-validation-bip.md) for more details.
    22+
    23+## Motivation
    24+
    25+Utreexo nodes require the inclusion proof to fully validate blocks and transactions.
    26+Each block has an corresponding inclusion proof with it and this inclusion proof for blocks up to height 906,937 requires an additional 631.85GB, which is roughly 40GB less than the size of the block data.
    


    jmoik commented at 2:08 pm on August 11, 2025:
    an -> a

    jonatack commented at 3:39 pm on August 11, 2025:

    there are two, would be this one

    Each block has a corresponding inclusion proof with it and this inclusion proof for blocks up to height 906,937 requires an additional 631.85GB, which is roughly 40GB less than the size of the block data.
    

    kcalvinalvin commented at 6:56 am on August 12, 2025:
    Addressed in the latest push
  6. in utreexo-p2p-bip.md:27 in a94f6434c8 outdated
    22+
    23+## Motivation
    24+
    25+Utreexo nodes require the inclusion proof to fully validate blocks and transactions.
    26+Each block has an corresponding inclusion proof with it and this inclusion proof for blocks up to height 906,937 requires an additional 631.85GB, which is roughly 40GB less than the size of the block data.
    27+Each transaction also has an corresponding inclusion proof with it and for normal transaction relay, the proof is roughly 3 times the size of the transaction.
    


    jmoik commented at 2:08 pm on August 11, 2025:
    an -> a

    kcalvinalvin commented at 6:56 am on August 12, 2025:
    Addressed in the latest push
  7. in utreexo-p2p-bip.md:50 in a94f6434c8 outdated
    45+3. Archive nodes
    46+
    47+CSNs have the goal of minimizing data storage and download while performing block validation.
    48+Archive and bridge nodes store more data and provide this data to CSNs.
    49+
    50+Bridge nodes are nodes that can add inclusion proofs to mempool transactions, support the same set of messages as CSNs, and are in fact should be indistinguishable from CSNs on the network.
    


    jmoik commented at 2:10 pm on August 11, 2025:
    are in fact should -> should in fact

    kcalvinalvin commented at 6:57 am on August 12, 2025:
    Addressed in the latest push
  8. in utreexo-p2p-bip.md:98 in a94f6434c8 outdated
    93+### Transaction relay
    94+
    95+![Current TX relay](bip-utreexo-p2p/current-tx-relay.png)
    96+
    97+Current transaction relay is done by sending an inv message with the hash of the transaction and a type field that denotes that this hash represents a transaction.
    98+If the node receiving the inv is does not have a tx matching that hash, it then requests for it using a getdata message.
    


    jmoik commented at 2:15 pm on August 11, 2025:
    is does not -> does not

    kcalvinalvin commented at 6:57 am on August 12, 2025:
    Removed in the latest push
  9. in utreexo-p2p-bip.md:105 in a94f6434c8 outdated
    100+![Utreexo TX relay](bip-utreexo-p2p/utreexo-tx-relay.png)
    101+
    102+The transaction relay for Utreexo nodes doesn't add any extra round trips.
    103+However, it does include extra inventory vectors in the inv message.
    104+
    105+We introduce a new inventory vector type called `utreexoproofhash` which make up the extra information that a Utreexo node will receive.
    


    jmoik commented at 2:17 pm on August 11, 2025:
    make -> includes

    jonatack commented at 3:42 pm on August 11, 2025:
    s/ which make/, which makes/

    kcalvinalvin commented at 3:49 am on August 12, 2025:

    I’ll go with “, which makes”, since “includes” sounds as though the utreexoproofhash invvect carries other information as well.

    EDIT: Replaced with “, which makes” in the latest push.

  10. in utreexo-p2p-bip.md:302 in a94f6434c8 outdated
    297+| Field                      | Type                                | Description                                   |
    298+|----------------------------|-------------------------------------|-----------------------------------------------|
    299+| length of the Utreexo TTLs | varint                              | The length of the Utreexo summaries           |
    300+| Utreexo TTLs               | vector of Utreexo summaries         | The vector of the requested Utreexo summaries |
    301+| length of the proof hashes | varint                              | The length of the proof hashes                |
    302+| proof hashes               | vector of 32 byte hashes            | The vector of the requested Utreexo summaries |
    


    jmoik commented at 2:19 pm on August 11, 2025:
    requested proof hashes*?
  11. in utreexo-p2p-bip.md:242 in a94f6434c8 outdated
    237+
    238+| Field        | Type                | Description                                                                                                                                                          |
    239+|--------------|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
    240+| block height | uint32              | The time-to-live value of a leaf in the Utreexo merkle forest. The value is determined by the amount of leaves that were added to the accumulator since its creation |
    241+| length       | varint              | The length of the TTLs                                                                                                                                               |
    242+| TTLs         | vector of TTL infos | position in the Utreexo merkle forest when the leaf was removed                                                                                                      |
    


    jmoik commented at 2:21 pm on August 11, 2025:
    description?
  12. in utreexo-p2p-bip.md:194 in a94f6434c8 outdated
    189+A compact leaf data is defined as:
    190+
    191+| Field        | type                         | Description     |
    192+|--------------|------------------------------|-----------------|
    193+| header code  | uint32                       | This is a value obtained by left shifting the block height that confirmed this transaction, and then OR-ing it with 1, only if this transaction is a coinbase. |
    194+| amount       | int64                        | The amount in sats locked on this output |
    


    jmoik commented at 2:23 pm on August 11, 2025:
    should probably be unsigned

    kcalvinalvin commented at 4:38 am on August 12, 2025:

    It makes sense to have it as int64 as CAmount is represented as int64 in code https://github.com/bitcoin/bitcoin/blob/273e600e65c2e31a6e9a0bd72b40672aaa503b08/src/consensus/amount.h#L12

    Other implementations follow this as well: https://github.com/btcsuite/btcd/blob/baebb836c2d4692da3de3b0d437f4da6ce915546/wire/msgtx.go#L337
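
The `header code` field quoted above can be sketched as follows. This is a minimal illustration, assuming a shift of exactly one bit (the quoted table does not state the shift amount), so that the lowest bit carries the coinbase flag. The function names are hypothetical and are not taken from the BIP.

```python
from typing import Tuple


def encode_header_code(height: int, is_coinbase: bool) -> int:
    """Pack a block height and a coinbase flag into one uint32.

    Assumption: the height is shifted left by one bit and the flag
    occupies bit 0. The shift amount is not given in the quoted table.
    """
    if not 0 <= height < (1 << 31):
        raise ValueError("height must fit in 31 bits to leave room for the flag")
    return (height << 1) | (1 if is_coinbase else 0)


def decode_header_code(code: int) -> Tuple[int, bool]:
    """Recover (height, is_coinbase) from a header code built as above."""
    if not 0 <= code < (1 << 32):
        raise ValueError("header code must fit in a uint32")
    return code >> 1, bool(code & 1)


if __name__ == "__main__":
    code = encode_header_code(906937, True)
    print(code, decode_header_code(code))
```

Under this assumed layout a uint32 leaves 31 bits for the block height, and decoding is a shift plus a mask.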

  13. jmoik commented at 2:34 pm on August 11, 2025: none
    some typos
  14. in utreexo-p2p-bip.md:13 in a94f6434c8 outdated
     8+Comments-URI: TBD
     9+Status: Draft
    10+Type: Specification
    11+Created: 2024-08-08
    12+License: BSD-3-Clause
    13+Depends: BIP-???? (Utreexo - Peer Services)
    


    jonatack commented at 7:41 pm on August 11, 2025:

    Per BIPs 2 and 3, this would be “Requires” (and currently refers to the same BIP)

    Requires: BIP-???? (Utreexo - Peer Services)
    

    kcalvinalvin commented at 6:53 am on August 12, 2025:
    Addressed in the latest push
  15. in utreexo-validation-bip.md:13 in a94f6434c8 outdated
     8+Comments-URI: TBD
     9+Status: Draft
    10+Type: Specification
    11+Created: 2023-10-01
    12+License: BSD-3-Clause
    13+Depends: BIP-???? (Utreexo Accumulator Specification)
    


    jonatack commented at 7:41 pm on August 11, 2025:

    Per BIPs 2 and 3, this would be “Requires”

    Requires: BIP-???? (Utreexo Accumulator Specification)
    

    kcalvinalvin commented at 6:53 am on August 12, 2025:
    Addressed in the latest push
  16. in utreexo-accumulator-bip.md:13 in a94f6434c8 outdated
     8+Comments-URI: TBD
     9+Status: Draft
    10+Type: Specification
    11+Created: 2025-06-18
    12+License: BSD-3-Clause
    13+Depends: BIP-???? (Utreexo Accumulator Specification)
    


    jonatack commented at 7:42 pm on August 11, 2025:
    Refers to the same document. If correct, this line should be dropped.

    kcalvinalvin commented at 6:54 am on August 12, 2025:
    Dropped in the latest push
  17. in utreexo-accumulator-bip.md:56 in a94f6434c8 outdated
    51+the accumulator tracks the current set of unspent transaction outputs (UTXOs).
    52+
    53+The Utreexo accumulator is based on an append-only Merkle tree design introduced in [^1],
    54+which provides logarithmic-sized inclusion proofs. Utreexo extends this design to support dynamic updates,
    55+specifically enabling deletions from the set—a requirement for tracking UTXO spends in Bitcoin.
    56+To accommodate this, Utreexo increases the storage requirement for the accumulator state to O(log₂(N)),
    


    jonatack commented at 7:47 pm on August 11, 2025:
    “increases the requirement” – perhaps mention here “compared to the UTXO set”

    luisschwab commented at 10:05 pm on August 11, 2025:
    To accommodate this, Utreexo increases the storage requirement for the accumulator state to $O(log_2(N))$,
    

    LaTeX renderers don’t play nice with this unicode symbol.


    kcalvinalvin commented at 5:03 am on August 12, 2025:

    Ah the paragraph could be worded better.

    It’s referring to how the merkle forest is expanded to support more leaves. Like sparse merkle trees, you pre-allocate the Utreexo accumulator to hold 2^n leaves. If you want to hold (2^n)+1 leaves, you need to resize the accumulator to hold 2^(n+1) leaves.


    kcalvinalvin commented at 6:05 am on August 12, 2025:

    Oh I read it wrong too. It increases the requirements vs the paper referenced in [^1].

    Fixing this…

    Changed the sentence to improve legibility


    kcalvinalvin commented at 6:52 am on August 12, 2025:
    Addressed in the latest push
  18. in utreexo-validation-bip.md:39 in a94f6434c8 outdated
    34+long-term scalability concern.
    35+
    36+Utreexo is a dynamic accumulator that enables the UTXO set to be represented in just a few kilobytes,
    37+by requiring peers to provide additional proof data to verify the inclusion of a UTXO in the
    38+accumulator. This allows for the construction of extremely lightweight nodes capable of performing
    39+the same validation as a full node, without the need to store the entire UTXO set.
    


    jonatack commented at 9:38 pm on August 11, 2025:
    The preceding 3 paragraphs seem to be duplicates of the accumulator BIP that this BIP requires. Can perhaps remove them or refer to the accumulator BIP motivation.

    kcalvinalvin commented at 6:56 am on August 12, 2025:
    Removed the preceding 3 paragraphs in the latest push
  19. in utreexo-accumulator-bip.md:554 in a94f6434c8 outdated
    550+While RSA accumulators and similar constructions offer significant advantages in proof size—often allowing a
    551+single proof to cover an entire block's worth of UTXOs—the trade-offs in proof generation cost and latency are
    552+substantial. In RSA-based designs, creating a proof for any given UTXO at arbitrary times can be computationally
    553+intensive, especially as the number of UTXOs grows.
    554+
    555+Utreexo's design is driven by the need for Bridge Nodes: nodes that maintain backward compatibility with existing
    


    jonatack commented at 9:48 pm on August 11, 2025:
    This BIP appears to be missing a required backwards compatibility section.

    kcalvinalvin commented at 6:56 am on August 12, 2025:
    Added a backwards compatibility section
  20. jonatack commented at 9:52 pm on August 11, 2025: member
    Thank you for proposing these drafts. They already look quite complete with respect to the editorial requirements (BIPs 2 and 3). I’ve done a cursory first pass. No immediate conceptual feedback. A few editorial comments follow; feel free to ignore them during conceptual review until they are applicable.
  21. in utreexo-accumulator-bip.md:66 in a94f6434c8 outdated
    61+The Utreexo accumulator consists of a set of Merkle trees: specifically, perfect binary trees with $2^n$ elements,
    62+where each node in the tree contains a 32-byte hash. The elements being stored appear at the leaves—the bottom layer of the tree.
    63+The topmost node is referred to as the "root," while nodes located between the leaves and the root are called "intermediate nodes."
    64+
    65+Any integer number of elements ($N$) can be represented as a forest of such trees. On average, a set of N elements will require
    66+approximately $\frac{log₂(N)}{2}$ trees. The number and sizes of trees are determined by the binary representation of $N$:
    


    luisschwab commented at 10:05 pm on August 11, 2025:
    approximately $\frac{log_2(N)}{2}$ trees. The number and sizes of trees are determined by the binary representation of $N$:
    

    LaTeX renderers don’t play nice with this unicode symbol.


    kcalvinalvin commented at 6:52 am on August 12, 2025:
    Addressed in the latest push
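
The statement in the quoted diff that the number and sizes of trees follow from the binary representation of $N$ can be illustrated with a short sketch. It assumes only what the snippet says (perfect binary trees with $2^n$ leaves); the function name `tree_sizes` is hypothetical.

```python
from typing import List


def tree_sizes(num_leaves: int) -> List[int]:
    """Return the leaf counts of the perfect trees in a forest of num_leaves.

    Each set bit k in the binary representation of num_leaves corresponds
    to one perfect tree holding 2**k leaves, listed largest first.
    """
    if num_leaves < 0:
        raise ValueError("num_leaves must be non-negative")
    sizes = []
    bit = num_leaves.bit_length() - 1
    while bit >= 0:
        if (num_leaves >> bit) & 1:
            sizes.append(1 << bit)
        bit -= 1
    return sizes


if __name__ == "__main__":
    # 13 is 0b1101, so the forest has trees of 8, 4 and 1 leaves.
    print(tree_sizes(13))
    # Averaged over all N below 2**10, the tree count is 10 / 2.
    k = 10
    total = sum(len(tree_sizes(n)) for n in range(1 << k))
    print(total / (1 << k))
```

The average printed at the end matches the $\frac{log_2(N)}{2}$ estimate because, over all values below $2^k$, each of the $k$ bits is set half of the time.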
  22. kcalvinalvin force-pushed on Aug 12, 2025
  23. Add the Utreexo accumulator BIP 8444a28331
  24. Add Utreexo validation BIP ca511ff1de
  25. Add Utreexo P2P BIP d1d03420ac
  26. kcalvinalvin force-pushed on Aug 12, 2025
  27. petertodd commented at 3:52 pm on August 12, 2025: contributor
    You need to justify why you’re using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol. Right now you just link to a paper from 2011. But that paper is out of date now that hardware support for SHA-256 has become common.
  28. 1BitcoinBoWP1FZ4xwTNkq6XksKidmgYYw commented at 6:29 pm on August 12, 2025: none

    I strongly recommend replacing SHA-256 with SHAKE256 (from the SHA-3 standard) for the following reasons:

    1. Security Advantages

    • 🔒 Provides built-in protection against length-extension attacks
    • 📏 Offers flexible output lengths (supports 128-bit and 256-bit security levels)
    • ⚙️ Based on Keccak sponge construction (NIST FIPS 202 standard)
    • 🌐 Aligns with post-quantum cryptography standards

    2. Comparative Analysis: SHA-256 vs SHAKE256

    | Characteristic      | SHA-256                        | SHAKE256                      |
    |---------------------|--------------------------------|-------------------------------|
    | Algorithm Family    | SHA-2                          | SHA-3 (Keccak)                |
    | Output Flexibility  | Fixed 256-bit                  | Arbitrary length              |
    | Security Properties | Vulnerable to length-extension | Resistant to length-extension |
    | Internal Structure  | Merkle-Damgård                 | Sponge function               |
    | Standardization     | NIST FIPS 180-4                | NIST FIPS 202                 |

    3. Functional Example

    Input: Bitcoin

    SHAKE256 (512-bit output):
    6beb0661ba1fa7289bf359fbb81550bd9641cf5abc62a14d466c421c8a86e528e027632ec0e7ceb994650566f3c8258af2240333b6d0e9186766fd2c1ebb763a

    SHAKE256 (256-bit output):
    6beb0661ba1fa7289bf359fbb81550bd9641cf5abc62a14d466c421c8a86e528

    4. Implementation Benefits

    • ✅ Maintains 256-bit output compatibility where needed
    • ✅ Future-proofs against emerging cryptographic vulnerabilities
    • ✅ Reduces potential attack vectors through improved design
    • ✅ Supports Bitcoin’s security evolution while maintaining performance

    5. Technical Reference

    For detailed cryptographic differences:
    Cryptographic Comparison: SHA-2 vs SHA-3

  29. kcalvinalvin commented at 11:06 am on August 18, 2025: contributor

    You need to justify why you’re using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol. Right now you just link to a paper from 2011. But that paper is out of date now that hardware support for SHA-256 has become common.

    Sure we can update the accumulator BIP with benchmarks for SHA512/256 vs SHA256.

    But could you link to the aforementioned justifications for the other parts of the Bitcoin protocol that use SHA512?

  30. kcalvinalvin commented at 11:10 am on August 18, 2025: contributor

    I strongly recommend replacing SHA-256 with SHAKE256 (from the SHA-3 standard) for the following reasons:

    SHAKE256 is not used in Bitcoin, and introducing a new hash function adds a trust assumption. We do not want to do this.

  31. bitcoin deleted a comment on Aug 18, 2025
  32. bitcoin deleted a comment on Aug 18, 2025
  33. 1BitcoinBoWP1FZ4xwTNkq6XksKidmgYYw commented at 2:32 pm on August 18, 2025: none

    The reliance of Bitcoin on SHA-2—a legacy hash function designed by the National Security Agency (NSA)—introduces non-trivial security risks, particularly when considering the often-dismissed threat posed by quantum adversaries.

    Migrating to SHAKE256 (a variant of SHA-3) would represent a meaningful improvement, though such a change merely delays the inevitable: Bitcoin must eventually transition to a quantum-resistant cryptographic framework. When this occurs—and it will, regardless of opposition—SHA-2, along with ECDSA private keys, public keys, and signatures, will become obsolete.

    See: Length extension attack (Bitcoin is vulnerable because it’s using SHA-256)

  34. bitcoin deleted a comment on Aug 18, 2025
  35. bitcoin deleted a comment on Aug 18, 2025
  36. jonatack commented at 2:35 pm on August 18, 2025: member
    Some friendly moderation to keep the discussion focused on technical review – thanks.
  37. kcalvinalvin commented at 2:46 pm on August 18, 2025: contributor

    The reliance of Bitcoin on SHA-2—a legacy hash function designed by the National Security Agency (NSA)—introduces non-trivial security risks, particularly when considering the often-dismissed threat posed by quantum adversaries.

    SHA256 and SHA512 are quantum resistant.

    Migrating to SHAKE256 (a variant of SHA-3) would represent a meaningful improvement, though such a change merely delays the inevitable: Bitcoin must eventually transition to a quantum-resistant cryptographic framework. When this occurs—and it will, regardless of opposition—SHA-2, along with ECDSA private keys, public keys, and signatures, will become obsolete.

    See: Length extension attack (Bitcoin is vulnerable because it’s using SHA-256)

    Ok but this has nothing to do with this BIP.

  38. murchandamus commented at 10:15 pm on August 18, 2025: contributor
    @1BitcoinBoWP1FZ4xwTNkq6XksKidmgYYw, please cut out the LLM generated comments. If any of us were interested in seeing an LLM’s prediction of what might be said about a topic, we could prompt one ourselves.
  39. petertodd commented at 10:18 pm on August 18, 2025: contributor

    On Mon, Aug 18, 2025 at 04:06:51AM -0700, Calvin Kim wrote:

    kcalvinalvin left a comment (bitcoin/bips#1923)

    You need to justify why you’re using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol. Right now you just link to a paper from 2011. But that paper is out of date now that hardware support for SHA-256 has become common.

    Sure we can update the accumulator BIP with benchmarks for SHA512/256 vs SHA256.

    But could you link to the aforementioned justifications for the other parts of the Bitcoin protocol that use SHA512?

    No part of the Bitcoin consensus protocol uses SHA512.

  40. kcalvinalvin commented at 6:17 am on August 19, 2025: contributor

    On Mon, Aug 18, 2025 at 04:06:51AM -0700, Calvin Kim wrote:

    kcalvinalvin left a comment (bitcoin/bips#1923)

    > You need to justify why you’re using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol. Right now you just link to a paper from 2011. But that paper is out of date now that hardware support for SHA-256 has become common.

    Sure we can update the accumulator BIP with benchmarks for SHA512/256 vs SHA256.

    But could you link to the aforementioned justifications for the other parts of the Bitcoin protocol that use SHA512?

    No part of the Bitcoin consensus protocol uses SHA512.

    Ok but you’ve stated in your previous comment “You need to justify why you’re using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol”. Would be very helpful to see what type of justifications the other protocols have made.

    Second, I don’t think it matters if SHA512 wasn’t used in the Bitcoin consensus protocol. SHA512 is used in BIP32 and the argument that SHA512 is safe for generating private keys but not safe for Bitcoin consensus isn’t sound.

    I think our original justification (better performance with SHA512/256) mentioned in the BIP is sound. Happy to provide the benchmarks, they’re being worked on at the moment.
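
One commonly cited reason a SHA-512-family hash can do well on Merkle-style hashing is the number of compression-function calls per parent hash. The sketch below only counts those calls from the padding rules of the SHA-2 family. It assumes a parent hash is computed over the 64-byte concatenation of two 32-byte child hashes, and the thread does not confirm that this is the argument the BIP relies on. It says nothing about per-call cost or SHA-256 hardware extensions, which is the concern raised above, so it is no substitute for the promised benchmarks. The function name is hypothetical.

```python
def compression_blocks(msg_len: int, block_size: int, length_field: int) -> int:
    """Count compression-function calls for one SHA-2 family hash.

    Padding appends a single 0x80 byte, zero bytes, and a message-length
    field, then rounds up to a whole number of blocks.
    """
    if msg_len < 0:
        raise ValueError("msg_len must be non-negative")
    padded = msg_len + 1 + length_field
    return -(-padded // block_size)  # ceiling division


# SHA-256: 64-byte blocks, 8-byte length field.
SHA256_PARAMS = (64, 8)
# SHA-512 and SHA-512/256: 128-byte blocks, 16-byte length field.
SHA512_PARAMS = (128, 16)

if __name__ == "__main__":
    parent_input = 32 + 32  # assumed: two concatenated 32-byte child hashes
    print("SHA-256 blocks:    ", compression_blocks(parent_input, *SHA256_PARAMS))
    print("SHA-512/256 blocks:", compression_blocks(parent_input, *SHA512_PARAMS))
```

Under these assumptions a 64-byte input spills into a second block for SHA-256 but fits in one block for SHA-512/256. Whether that translates into a real speedup depends on the per-block cost on the target hardware.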


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bips. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-08-19 23:10 UTC
