BIP 181, 182, 183: BIPs for Utreexo #1923

pull kcalvinalvin wants to merge 4 commits into bitcoin:master from kcalvinalvin:2025-08-10-utreexo-bips changing 14 files +1607 −0
  1. kcalvinalvin commented at 6:56 am on August 10, 2025: contributor

    These are the 3 BIPs that describe Utreexo, a consensus-compatible (non-soft fork) way to send and verify transactions without storing the full UTXO set.

    The 3 BIPs are for:

    1. The specification of the Utreexo accumulator.
    2. The specification of Bitcoin block and tx validation using the Utreexo accumulator.
    3. The peer to peer networking changes required to enable Utreexo nodes.

    Mailing list post: https://groups.google.com/g/bitcoindev/c/W1lxBraKG_E

  2. kcalvinalvin force-pushed on Aug 10, 2025
  3. kcalvinalvin force-pushed on Aug 10, 2025
  4. jonatack added the label New BIP on Aug 10, 2025
  5. in utreexo-p2p-bip.md:26 in a94f6434c8 outdated
    21+This document **does not** describe how to validate blocks and transactions using the provided data, see [Utreexo - Validation Layer](./utreexo-validation-bip.md) for more details.
    22+
    23+## Motivation
    24+
    25+Utreexo nodes require the inclusion proof to fully validate blocks and transactions.
    26+Each block has an corresponding inclusion proof with it and this inclusion proof for blocks up to height 906,937 requires an additional 631.85GB, which is roughly 40GB less than the size of the block data.
    


    jmoik commented at 2:08 pm on August 11, 2025:
    an -> a

    jonatack commented at 3:39 pm on August 11, 2025:

    there are two, would be this one

    0Each block has a corresponding inclusion proof with it and this inclusion proof for blocks up to height 906,937 requires an additional 631.85GB, which is roughly 40GB less than the size of the block data.
    

    kcalvinalvin commented at 6:56 am on August 12, 2025:
    Addressed in the latest push
  6. in utreexo-p2p-bip.md:27 in a94f6434c8 outdated
    22+
    23+## Motivation
    24+
    25+Utreexo nodes require the inclusion proof to fully validate blocks and transactions.
    26+Each block has an corresponding inclusion proof with it and this inclusion proof for blocks up to height 906,937 requires an additional 631.85GB, which is roughly 40GB less than the size of the block data.
    27+Each transaction also has an corresponding inclusion proof with it and for normal transaction relay, the proof is roughly 3 times the size of the transaction.
    


    jmoik commented at 2:08 pm on August 11, 2025:
    an -> a

    kcalvinalvin commented at 6:56 am on August 12, 2025:
    Addressed in the latest push
  7. in utreexo-p2p-bip.md:50 in a94f6434c8 outdated
    45+3. Archive nodes
    46+
    47+CSNs have the goal of minimizing data storage and download while performing block validation.
    48+Archive and bridge nodes store more data and provide this data to CSNs.
    49+
    50+Bridge nodes are nodes that can add inclusion proofs to mempool transactions, support the same set of messages as CSNs, and are in fact should be indistinguishable from CSNs on the network.
    


    jmoik commented at 2:10 pm on August 11, 2025:
    are in fact should -> should in fact

    kcalvinalvin commented at 6:57 am on August 12, 2025:
    Addressed in the latest push
  8. in utreexo-p2p-bip.md:98 in a94f6434c8 outdated
    93+### Transaction relay
    94+
    95+![Current TX relay](bip-utreexo-p2p/current-tx-relay.png)
    96+
    97+Current transaction relay is done by sending an inv message with the hash of the transaction and a type field that denotes that this hash represents a transaction.
    98+If the node receiving the inv is does not have a tx matching that hash, it then requests for it using a getdata message.
    


    jmoik commented at 2:15 pm on August 11, 2025:
    • is

    kcalvinalvin commented at 6:57 am on August 12, 2025:
    Removed in the latest push
  9. in utreexo-p2p-bip.md:105 in a94f6434c8 outdated
    100+![Utreexo TX relay](bip-utreexo-p2p/utreexo-tx-relay.png)
    101+
    102+The transaction relay for Utreexo nodes doesn't add any extra round trips.
    103+However, it does include extra inventory vectors in the inv message.
    104+
    105+We introduce a new inventory vector type called `utreexoproofhash` which make up the extra information that a Utreexo node will receive.
    


    jmoik commented at 2:17 pm on August 11, 2025:
    make -> includes

    jonatack commented at 3:42 pm on August 11, 2025:
    s/ which make/, which makes/

    kcalvinalvin commented at 3:49 am on August 12, 2025:

    I’ll go with , which makes since includes sounds like the utreexoproofhash invvect has other information as well

    EDIT: Replaced with , which makes in the latest push

  10. in utreexo-p2p-bip.md:302 in a94f6434c8 outdated
    297+| Field                      | Type                                | Description                                   |
    298+|----------------------------|-------------------------------------|-----------------------------------------------|
    299+| length of the Utreexo TTLs | varint                              | The length of the Utreexo summaries           |
    300+| Utreexo TTLs               | vector of Utreexo summaries         | The vector of the requested Utreexo summaries |
    301+| length of the proof hashes | varint                              | The length of the proof hashes                |
    302+| proof hashes               | vector of 32 byte hashes            | The vector of the requested Utreexo summaries |
    


    jmoik commented at 2:19 pm on August 11, 2025:
    requested proof hashes*?

    kcalvinalvin commented at 7:09 am on August 12, 2025:
    Addressed in the latest push
  11. in utreexo-p2p-bip.md:242 in a94f6434c8 outdated
    237+
    238+| Field        | Type                | Description                                                                                                                                                          |
    239+|--------------|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|
    240+| block height | uint32              | The time-to-live value of a leaf in the Utreexo merkle forest. The value is determined by the amount of leaves that were added to the accumulator since its creation |
    241+| length       | varint              | The length of the TTLs                                                                                                                                               |
    242+| TTLs         | vector of TTL infos | position in the Utreexo merkle forest when the leaf was removed                                                                                                      |
    


    jmoik commented at 2:21 pm on August 11, 2025:
    description?

    kcalvinalvin commented at 7:09 am on August 12, 2025:
    Added the correct description in the latest push
  12. in utreexo-p2p-bip.md:194 in a94f6434c8 outdated
    189+A compact leaf data is defined as:
    190+
    191+| Field        | type                         | Description     |
    192+|--------------|------------------------------|-----------------|
    193+| header code  | uint32                       | This is a value obtained by left shifting the block height that confirmed this transaction, and then OR-ing it with 1, only if this transaction is a coinbase. |
    194+| amount       | int64                        | The amount in sats locked on this output |
    


    jmoik commented at 2:23 pm on August 11, 2025:
    should probably be unsigned

    kcalvinalvin commented at 4:38 am on August 12, 2025:

    It makes sense to have it as int64 as CAmount is represented as int64 in code https://github.com/bitcoin/bitcoin/blob/273e600e65c2e31a6e9a0bd72b40672aaa503b08/src/consensus/amount.h#L12

    Other implementations follow this as well:https://github.com/btcsuite/btcd/blob/baebb836c2d4692da3de3b0d437f4da6ce915546/wire/msgtx.go#L337

  13. jmoik commented at 2:34 pm on August 11, 2025: none
    some typos
  14. in utreexo-p2p-bip.md:13 in a94f6434c8 outdated
     8+Comments-URI: TBD
     9+Status: Draft
    10+Type: Specification
    11+Created: 2024-08-08
    12+License: BSD-3-Clause
    13+Depends: BIP-???? (Utreexo - Peer Services)
    


    jonatack commented at 7:41 pm on August 11, 2025:

    Per BIPs 2 and 3, this would be “Requires” (and currently refers to the same BIP)

    0Requires: BIP-???? (Utreexo - Peer Services)
    

    kcalvinalvin commented at 6:53 am on August 12, 2025:
    Addressed in the latest push
  15. in utreexo-validation-bip.md:13 in a94f6434c8 outdated
     8+Comments-URI: TBD
     9+Status: Draft
    10+Type: Specification
    11+Created: 2023-10-01
    12+License: BSD-3-Clause
    13+Depends: BIP-???? (Utreexo Accumulator Specification)
    


    jonatack commented at 7:41 pm on August 11, 2025:

    Per BIPs 2 and 3, this would be “Requires”

    0Requires: BIP-???? (Utreexo Accumulator Specification)
    

    kcalvinalvin commented at 6:53 am on August 12, 2025:
    Addressed in the latest push
  16. in utreexo-accumulator-bip.md:13 in a94f6434c8 outdated
     8+Comments-URI: TBD
     9+Status: Draft
    10+Type: Specification
    11+Created: 2025-06-18
    12+License: BSD-3-Clause
    13+Depends: BIP-???? (Utreexo Accumulator Specification)
    


    jonatack commented at 7:42 pm on August 11, 2025:
    Refers to the same document. If correct, this line should be dropped.

    kcalvinalvin commented at 6:54 am on August 12, 2025:
    Dropped in the latest push
  17. in utreexo-accumulator-bip.md:56 in a94f6434c8 outdated
    51+the accumulator tracks the current set of unspent transaction outputs (UTXOs).
    52+
    53+The Utreexo accumulator is based on an append-only Merkle tree design introduced in [^1],
    54+which provides logarithmic-sized inclusion proofs. Utreexo extends this design to support dynamic updates,
    55+specifically enabling deletions from the set—a requirement for tracking UTXO spends in Bitcoin.
    56+To accommodate this, Utreexo increases the storage requirement for the accumulator state to O(log₂(N)),
    


    jonatack commented at 7:47 pm on August 11, 2025:
    “increases the requirement” – perhaps mention here “compared to the UTXO set”

    luisschwab commented at 10:05 pm on August 11, 2025:
    0To accommodate this, Utreexo increases the storage requirement for the accumulator state to $O(log_2(N))$,
    

    LaTeX renderers don’t play nice with this unicode symbol.


    kcalvinalvin commented at 5:03 am on August 12, 2025:

    Ah the paragraph could be worded better.

    It’s referring to how the merkle forest is expanded to support more leaves. Like sparse merkle trees, you pre-allocate the Utreexo accumulator to hold 2^n leaves. If you want to hold (2^n)+1 leaves, you need to resize the accumulator to hold 2^n+1 leaves.


    kcalvinalvin commented at 6:05 am on August 12, 2025:

    Oh I read it wrong too. It increases the requirements vs the paper referenced in [^1].

    Fixing this…

    Changed the sentence to improve legibility


    kcalvinalvin commented at 6:52 am on August 12, 2025:
    Addressed in the latest push
  18. in utreexo-validation-bip.md:39 in a94f6434c8 outdated
    34+long-term scalability concern.
    35+
    36+Utreexo is a dynamic accumulator that enables the UTXO set to be represented in just a few kilobytes,
    37+by requiring peers to provide additional proof data to verify the inclusion of a UTXO in the
    38+accumulator. This allows for the construction of extremely lightweight nodes capable of performing
    39+the same validation as a full node, without the need to store the entire UTXO set.
    


    jonatack commented at 9:38 pm on August 11, 2025:
    The preceding 3 paragraphs seem to be duplicates of the accumulator BIP that this BIP requires. Can perhaps remove them or refer to the accumulator BIP motivation.

    kcalvinalvin commented at 6:56 am on August 12, 2025:
    Removed the preceding 3 paragraphs in the latest push
  19. in utreexo-accumulator-bip.md:554 in a94f6434c8 outdated
    550+While RSA accumulators and similar constructions offer significant advantages in proof size—often allowing a
    551+single proof to cover an entire block's worth of UTXOs—the trade-offs in proof generation cost and latency are
    552+substantial. In RSA-based designs, creating a proof for any given UTXO at arbitrary times can be computationally
    553+intensive, especially as the number of UTXOs grows.
    554+
    555+Utreexo's design is driven by the need for Bridge Nodes: nodes that maintain backward compatibility with existing
    


    jonatack commented at 9:48 pm on August 11, 2025:
    This BIP appears to be missing a required backwards compatibility section.

    kcalvinalvin commented at 6:56 am on August 12, 2025:
    Added a backwards compatibility section
  20. jonatack commented at 9:52 pm on August 11, 2025: member
    Thank you for proposing these drafts. They already look quite complete with respect to the editorial requirements (BIPs 2 and 3). I’ve done a cursory first pass. No immediate conceptual feedback. A few editorial comments follow; feel free to ignore them during conceptual review until they are applicable.
  21. in utreexo-accumulator-bip.md:66 in a94f6434c8 outdated
    61+The Utreexo accumulator consists of a set of Merkle trees: specifically, perfect binary trees with $2^n$ elements,
    62+where each node in the tree contains a 32-byte hash. The elements being stored appear at the leaves—the bottom layer of the tree.
    63+The topmost node is referred to as the "root," while nodes located between the leaves and the root are called "intermediate nodes."
    64+
    65+Any integer number of elements ($N$) can be represented as a forest of such trees. On average, a set of N elements will require
    66+approximately $\frac{log₂(N)}{2}$ trees. The number and sizes of trees are determined by the binary representation of $N$:
    


    luisschwab commented at 10:05 pm on August 11, 2025:
    0approximately $\frac{log_2(N)}{2}$ trees. The number and sizes of trees are determined by the binary representation of $N$:
    

    LaTeX renderers don’t play nice with this unicode symbol.


    kcalvinalvin commented at 6:52 am on August 12, 2025:
    Addressed in the latest push
  22. kcalvinalvin force-pushed on Aug 12, 2025
  23. kcalvinalvin force-pushed on Aug 12, 2025
  24. petertodd commented at 3:52 pm on August 12, 2025: contributor
    You need to justify why you’re using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol. Right now you just link to a paper from 2011. But that paper is out of date now that hardware support for SHA-256 has become common.
  25. 1BitcoinBoWP1FZ4xwTNkq6XksKidmgYYw commented at 6:29 pm on August 12, 2025: contributor

    I strongly recommend replacing SHA-256 with SHAKE256 (from the SHA-3 standard) for the following reasons:

    1. Security Advantages

    • 🔒 Provides built-in protection against length-extension attacks
    • 📏 Offers flexible output lengths (supports 128-bit and 256-bit security levels)
    • ⚙️ Based on Keccak sponge construction (NIST FIPS 202 standard)
    • 🌐 Aligns with post-quantum cryptography standards

    2. Comparative Analysis: SHA-256 vs SHAKE256

    Characteristic SHA-256 SHAKE256
    Algorithm Family SHA-2 SHA-3 (Keccak)
    Output Flexibility Fixed 256-bit Arbitrary length
    Security Properties Vulnerable to length-extension Resistant to length-extension
    Internal Structure Merkle-Damgård Sponge function
    Standardization NIST FIPS 180-4 NIST FIPS 202

    3. Functional Example

    Input: Bitcoin

    SHAKE256 (512-bit output):
    6beb0661ba1fa7289bf359fbb81550bd9641cf5abc62a14d466c421c8a86e528e027632ec0e7ceb994650566f3c8258af2240333b6d0e9186766fd2c1ebb763a

    SHAKE256 (256-bit output):
    6beb0661ba1fa7289bf359fbb81550bd9641cf5abc62a14d466c421c8a86e528

    4. Implementation Benefits

    • ✅ Maintains 256-bit output compatibility where needed
    • ✅ Future-proofs against emerging cryptographic vulnerabilities
    • ✅ Reduces potential attack vectors through improved design
    • ✅ Supports Bitcoin’s security evolution while maintaining performance

    5. Technical Reference

    For detailed cryptographic differences:
    Cryptographic Comparison: SHA-2 vs SHA-3

  26. kcalvinalvin commented at 11:06 am on August 18, 2025: contributor

    You need to justify why you’re using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol. Right now you just link to a paper from 2011. But that paper is out of date now that hardware support for SHA-256 has become common.

    Sure we can update the accumulator BIP with benchmarks for SHA512/256 vs SHA256.

    But could you link to the aforementioned justifications for the other parts of the Bitcoin protocol that use SHA512?

  27. kcalvinalvin commented at 11:10 am on August 18, 2025: contributor

    I strongly recommend replacing SHA-256 with SHAKE256 (from the SHA-3 standard) for the following reasons:

    SHAKE256 is not used in Bitcoin and introduces a new hash which increases the trust-assumption. We do not want to do this.

  28. bitcoin deleted a comment on Aug 18, 2025
  29. bitcoin deleted a comment on Aug 18, 2025
  30. 1BitcoinBoWP1FZ4xwTNkq6XksKidmgYYw commented at 2:32 pm on August 18, 2025: contributor

    The reliance of Bitcoin on SHA-2—a legacy hash function designed by the National Security Agency (NSA)—introduces non-trivial security risks, particularly when considering the often-dismissed threat posed by quantum adversaries.

    Migrating to SHAKE256 (a variant of SHA-3) would represent a meaningful improvement, though such a change merely delays the inevitable: Bitcoin must eventually transition to a quantum-resistant cryptographic framework. When this occurs—and it will, regardless of opposition—SHA-2, along with ECDSA private keys, public keys, and signatures, will become obsolete.

    See: Lenght extension attack (Bitcoin is vulnerable because it’s using SHA-256)

  31. bitcoin deleted a comment on Aug 18, 2025
  32. bitcoin deleted a comment on Aug 18, 2025
  33. jonatack commented at 2:35 pm on August 18, 2025: member
    Some friendly moderation to keep the discussion focused on technical review – thanks.
  34. kcalvinalvin commented at 2:46 pm on August 18, 2025: contributor

    The reliance of Bitcoin on SHA-2—a legacy hash function designed by the National Security Agency (NSA)—introduces non-trivial security risks, particularly when considering the often-dismissed threat posed by quantum adversaries.

    SHA256 and SHA512 are quantum resistent.

    Migrating to SHAKE256 (a variant of SHA-3) would represent a meaningful improvement, though such a change merely delays the inevitable: Bitcoin must eventually transition to a quantum-resistant cryptographic framework. When this occurs—and it will, regardless of opposition—SHA-2, along with ECDSA private keys, public keys, and signatures, will become obsolete. See: Lenght extension attack (Bitcoin is vulnerable because it’s using SHA-256)

    Ok but this has nothing to do with this BIP.

  35. murchandamus commented at 10:15 pm on August 18, 2025: contributor
    @1BitcoinBoWP1FZ4xwTNkq6XksKidmgYYw, please cut out the LLM generated comments. If any of us were interested in seeing an LLM’s prediction of what might be said about a topic, we could prompt one ourselves.
  36. petertodd commented at 10:18 pm on August 18, 2025: contributor

    On Mon, Aug 18, 2025 at 04:06:51AM -0700, Calvin Kim wrote:

    kcalvinalvin left a comment (bitcoin/bips#1923)

    You need to justify why you’re using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol. Right now you just link to a paper from 2011. But that paper is out of date now that hardware support for SHA-256 has become common.

    Sure we can update the accumulator BIP with benchmarks for SHA512/256 vs SHA256.

    But could you link to the aforementioned justifications for the other parts of the Bitcoin protocol that use SHA512?

    No part of the Bitcoin consensus protocol uses SHA512.

  37. kcalvinalvin commented at 6:17 am on August 19, 2025: contributor

    On Mon, Aug 18, 2025 at 04:06:51AM -0700, Calvin Kim wrote: kcalvinalvin left a comment (bitcoin/bips#1923) > You need to justify why you’re using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol. Right now you just link to a paper from 2011. But that paper is out of date now that hardware support for SHA-256 has become common. Sure we can update the accumulator BIP with benchmarks for SHA512/256 vs SHA256. But could you link to the aforementioned justifications for the other parts of the Bitcoin protocol that use SHA512? No part of the Bitcoin consensus protocol uses SHA512.

    Ok but you’ve stated in your previous comment “You need to justify why you’re using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol”. Would be very helpful to see what type of justifications the other protocols have made.

    Second, I don’t think it matters if SHA512 wasn’t used in the Bitcoin consensus protocol. SHA512 is used in BIP32 and the argument that SHA512 is safe for generating private keys but not safe for Bitcoin consensus isn’t sound.

    I think our original justification (better performance with SHA512/256) mentioned in the BIP is sound. Happy to provide the benchmarks, they’re being worked on at the moment.

  38. 1BitcoinBoWP1FZ4xwTNkq6XksKidmgYYw commented at 6:22 am on August 20, 2025: contributor

    SHA512 is safe for generating private keys

    Lol, what did you say?

  39. kcalvinalvin commented at 6:35 am on August 20, 2025: contributor

    SHA512 is safe for generating private keys

    Lol, what did you say?

    Dude, go look up on chatgpt how SHA512/256 works. Length extension attacks that you mentioned DOES NOT work on it because the outputs are truncated. BIP32 uses HMAC-SHA512 which is just a keyed SHA512.

    Why do I even have to deal with this guy. It’s clear he doesn’t know anything. His comments are worthless and this is wasting my time and energy.

  40. kcalvinalvin commented at 7:16 am on August 20, 2025: contributor

    IMG_0371

    This is the type of email he sends me after I block him. I’m sorry for posting unrelated comments here but imho he should be blocked from this repo.

  41. in utreexo-validation-bip.md:66 in d1d03420ac outdated
    61+hash, you must compute the SHA-512/256 hash of the following data:
    62+
    63+| Name              | Type                     | Description                               |
    64+| ----------------- | ------------------------ | ----------------------------------------- |
    65+| Utreexo_Tag_V1    | 64 byte array            | The version tag to be prepended to the leafhash. |
    66+| Utreexo_Tag_V1    | 64 byte array            | The version tag to be prepended to the leafhash. |
    


    lucad70 commented at 7:13 pm on August 21, 2025:
    For clarification, is the Utreexo_Tag_V1 really used twice in preimage to the hash?

    murchandamus commented at 4:38 pm on August 27, 2025:

    My guess would be that this duplication is unintended.

    0| Name              | Type                     | Description                               |
    1| ----------------- | ------------------------ | ----------------------------------------- |
    2| Utreexo_Tag_V1    | 64 byte array            | The version tag to be prepended to the leafhash. |
    

    kcalvinalvin commented at 8:58 am on August 29, 2025:

    Oh no the duplication is intended.

    Since we use SHA512/256 as the hash function, each chunk is 128 bytes. Since the version tag is only 64 bytes, we need two of them.

  42. petertodd commented at 1:48 pm on August 24, 2025: contributor

    On Mon, Aug 18, 2025 at 04:06:51AM -0700, Calvin Kim wrote: kcalvinalvin left a comment (bitcoin/bips#1923) > You need to justify why you’re using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol. Right now you just link to a paper from 2011. But that paper is out of date now that hardware support for SHA-256 has become common. Sure we can update the accumulator BIP with benchmarks for SHA512/256 vs SHA256. But could you link to the aforementioned justifications for the other parts of the Bitcoin protocol that use SHA512? No part of the Bitcoin consensus protocol uses SHA512.

    Ok but you’ve stated in your previous comment “You need to justify why you’re using SHA-512/256 rather than SHA-256, like the rest of the Bitcoin protocol”. Would be very helpful to see what type of justifications the other protocols have made.

    Second, I don’t think it matters if SHA512 wasn’t used in the Bitcoin consensus protocol. SHA512 is used in BIP32 and the argument that SHA512 is safe for generating private keys but not safe for Bitcoin consensus isn’t sound.

    I think our original justification (better performance with SHA512/256) mentioned in the BIP is sound. Happy to provide the benchmarks, they’re being worked on at the moment.

    The question is 1) why are we added one new dependency to consensus implementations, and 2) is this actually a performance increase, given that dedicated SHA256 hardware is becoming common?

    Length-extension attacks are not relevant for this use-case as we are only committing to public data.

  43. in utreexo-accumulator-bip.md:10 in 8444a28331 outdated
     5+Authors: Tadge Dryja <rx@awsomnet.org>
     6+         Calvin Kim <bip@calvinkim.info>
     7+         Davidson Souza <bip@dlsouza.dev>
     8+Comments-URI: TBD
     9+Status: Draft
    10+Type: Specification
    


    murchandamus commented at 7:02 pm on August 25, 2025:
    Nit: BIP 2 is still active, so this should be “Standard Track” for the time being.
  44. in utreexo-accumulator-bip.md:17 in d1d03420ac outdated
    12+License: BSD-3-Clause
    13+```
    14+
    15+## Abstract
    16+
    17+This BIP describes the Utreexo accumulator and it's operations. It lays down how to update the
    


    murchandamus commented at 7:04 pm on August 25, 2025:
    0This BIP describes the Utreexo accumulator and its operations. It lays down how to update the
    
  45. in utreexo-accumulator-bip.md:56 in d1d03420ac outdated
    51+
    52+The Utreexo accumulator is based on an append-only Merkle tree design introduced in [^1],
    53+which provides logarithmic-sized inclusion proofs. Utreexo extends this design to support dynamic updates,
    54+specifically enabling deletions from the set—a requirement for tracking UTXO spends in Bitcoin.
    55+To accommodate this, Utreexo changes the storage requirement from the accumulator design in [^1] to $O(log_2(N))$,
    56+where N is the number of elements ever added to the set, while still keeping proof sizes small and verification efficient.
    


    murchandamus commented at 8:36 pm on August 25, 2025:
    In case this doesn’t get discussed later, it might be interesting to compare how O(log2(N)) for all transaction outputs ever created compare to the current UTXO set size.

    kcalvinalvin commented at 8:53 am on August 29, 2025:
    Technically the current Utreexo design is O(log2(N)) of all txos since the forest doesn’t shrink on a deletion. We just move the leaf up so it has the same affect as shrinking the forest.
  46. in utreexo-accumulator-bip.md:176 in d1d03420ac outdated
    171+
    172+## Utility Functions
    173+
    174+The following utility functions are required for performing accumulator operations:
    175+
    176+**parent_hash(left, right):** Returns the hash of the concatenation of two child hashes (`left` and `right`).
    


    murchandamus commented at 8:44 pm on August 25, 2025:

    Does this ambiguity regarding the depth of the leaf in the tree not introduce similar weaknesses as the original Merkle tree construction? Why would we float up leaf-hashes rather than create a tagged hash at each level?

    Is this fully mitigated due to the number of leaves being known?


    kcalvinalvin commented at 9:07 am on August 29, 2025:

    Does this ambiguity regarding the depth of the leaf in the tree not introduce similar weaknesses as the original Merkle tree construction?

    Not quite sure which weakness you’re referring to here. Is it CVE-2012-2459 (one from calculating the Bitcoin block header commitment)? Since we don’t duplicate hashes, it’s not vulnerable to that particular attack.

    Why would we float up leaf-hashes rather than create a tagged hash at each level?

    Since we float up the leaf hashes, we can save on the proofs being sent over for the sibling later on.

    On a tree like so, proof for 01 is 00, 09, 13.

    014
    1|---------------\
    212              13
    3|-------\       |-------\
    408      09      10      11
    5|---\   |---\   |---\   |---\
    600  01  02  03  04  05  06  07
    

    If we delete 00, then 01 moves up to 08. The proof for 01 is now 09 and 13. The proof got shorter.

    014
    1|---------------\
    212              13
    3|-------\       |-------\
    401      09      10      11
    5|---\   |---\   |---\   |---\
    6        02  03  04  05  06  07
    

    adiabat commented at 2:52 am on September 2, 2025:

    That’s a good point, and we can add a bit about this potential issue in the BIP.

    Leaves can move up, and it’s no longer leaves getting hashed with leaves, but leaf/internal node pairs happen often. An attack would be to grind through transactions to get two leaf hashes that together could look like a leaf data preimage for a rogue UTXO.

    The reason this isn’t a problem is that in all cases when a node is verifying a UTXO proof, the full UTXO data is known and a keyed hash is used (see “UTXO Hash Preimages” section of the validation BIP) to get from the UTXO data to the leaf. The first 128 bytes input to the hash function are the tags (the hash of “UtreexoV1”). Since this tag is only used for UTXO data, and not in internal accumulator hashes, this should prevent any internal hashes from being interpreted as UTXO data.

  47. in utreexo-accumulator-bip.md:191 in d1d03420ac outdated
    186+    if right is None: return left
    187+
    188+    return sha512_256(left + right)
    189+```
    190+
    191+**treerows(numleaves):** Returns the minimum number of bits required to represent `numleaves - 1`. This corresponds to the height of the largest tree in the forest. Returns `0` if `numleaves` is `0`.
    


    murchandamus commented at 8:49 pm on August 25, 2025:
    The numleaves - 1 throws me off here. It’s not obvious to me, why the function would be defined that way rather than the “minimum number of bits required to represent numleaves”? Perhaps a bit more context would help?

    kcalvinalvin commented at 9:26 am on August 29, 2025:

    Ah it’s because we wanted treerows to return the index of the largest tree not the length. For the below tree, numleaves = 4 but we want treerows to return 2 not 3.

    0row 2: 06
    1       |-------\
    2row 1: 04      05
    3       |---\   |---\
    4row 0: 00  01  02  03
    

    If we just took the minimum number of bits to represent numleaves = 4, we’d get 3. So to account for this, we take the minimum number of bits needed to represent numleaves-1. This off-by-one happens when numleaves is a power of two. @adiabat did talk about wanting to make treerows return the length and not the index a while back so last chance to speak up? :)

    I’ve added the explanation in the bip as well.

  48. in utreexo-accumulator-bip.md:241 in d1d03420ac outdated
    236+Implementation:
    237+
    238+```python
    239+def parent(position: int, total_rows: int) -> int:
    240+    return (position >> 1) | (1 << total_rows)
    241+```
    


    murchandamus commented at 9:05 pm on August 25, 2025:

    I could have used a little more explanation why this returns the parent, but staring at it for a bit, it seems to me that a fully filled tree with 2n leaves would have 2n-1 inner nodes, meaning that all leaves start with a zero in the first position and all inner nodes starting with a one.

    E.g. for four leaves, the leaves are 000, 001, 010, and 011, and the inner nodes would be 100, 101, 110.

    For 000 and 001, shifting to the right gives 00 and setting the top bit makes the parent 100. For 010 and 011, it works out to be 101. For 100 and 101, it works out to 110.

    Gotcha, cool.

  49. in utreexo-accumulator-bip.md:554 in d1d03420ac outdated
    549+While RSA accumulators and similar constructions offer significant advantages in proof size—often allowing a
    550+single proof to cover an entire block's worth of UTXOs—the trade-offs in proof generation cost and latency are
    551+substantial. In RSA-based designs, creating a proof for any given UTXO at arbitrary times can be computationally
    552+intensive, especially as the number of UTXOs grows.
    553+
    554+Utreexo's design is driven by the need for Bridge Nodes: nodes that maintain backward compatibility with existing
    


    murchandamus commented at 9:16 pm on August 25, 2025:

    New jargon is usually italicized on introduction, perhaps consider:

    0Utreexo's design is driven by the need for *bridge nodes*: nodes that maintain backward compatibility with existing
    
  50. murchandamus commented at 9:22 pm on August 25, 2025: contributor
    I had a look at most of the Accumulator Specification for the first helping. Looks very good already. I only reviewed the function definitions up to root_position, then skimmed the rest, before reading on from Rationale.
  51. in utreexo-validation-bip.md:4 in ca511ff1de outdated
    0@@ -0,0 +1,328 @@
    1+```
    2+BIP: TBD
    3+Layer: Peer Services
    4+Title: Utreexo - Validation Layer
    


    murchandamus commented at 4:14 pm on August 27, 2025:
    The title feels a bit odd to me. It could be a bit more descriptive, I was thinking “Utreexo - Transaction and block validation” or smth?
  52. in utreexo-validation-bip.md:10 in ca511ff1de outdated
     5+Authors: Tadge Dryja <rx@awsomnet.org>
     6+         Calvin Kim <bip@calvinkim.info>
     7+         Davidson Souza <bip@dlsouza.dev>
     8+Comments-URI: TBD
     9+Status: Draft
    10+Type: Specification
    


    murchandamus commented at 4:16 pm on August 27, 2025:
    Nit: Until BIP 3 activates, this should be Standards Track.
  53. in utreexo-validation-bip.md:20 in d1d03420ac outdated
    15+
    16+## Abstract
    17+
    18+This BIP defines the rules for validating blocks and transactions using the
    19+Utreexo accumulator. It is important to note that this BIP does not define the
    20+Utreexo accumulator itself, for that see BIP-????. This document is only concerned with
    


    murchandamus commented at 4:32 pm on August 27, 2025:

    Maybe for the time being:

    0Utreexo accumulator itself, for that see [‎BIP Utreexo Accumulator](‎utreexo-accumulator-bip.md). This document is only concerned with
    
  54. in utreexo-validation-bip.md:52 in d1d03420ac outdated
    47+## Specification
    48+
    49+### Node Hashes
    50+
    51+During a node's normal operation, it will need to compute the leaf hash for UTXOs
    52+being added or removed from the accumulator. The leaf hash is a 32 byte hash that
    


    murchandamus commented at 4:35 pm on August 27, 2025:
    0being added or removed from the accumulator. The leaf hash is a 32-byte hash that
    
  55. in utreexo-validation-bip.md:60 in d1d03420ac outdated
    55+
    56+Unless otherwise specified, all fields are in little-endian format.
    57+
    58+#### UTXO Hash Preimages
    59+
    60+Individual UTXOs are represented as 32 byte hashes in the Utreexo accumulator. To obtain this
    


    murchandamus commented at 4:36 pm on August 27, 2025:
    0Individual UTXOs are represented as 32-byte hashes in the Utreexo accumulator. To obtain this
    
  56. in utreexo-validation-bip.md:79 in d1d03420ac outdated
    74+
    75+Each field being defined as follows:
    76+
    77+##### Version tag
    78+
    79+We use tagged hashes for the hashes committed in the accumulator for versioning
    


    murchandamus commented at 4:45 pm on August 27, 2025:
    Maybe link to the section introducing tagged hashes in BIP 340: https://github.com/bitcoin/bips/blob/master/bip-0340.mediawiki#user-content-Design
  57. in utreexo-validation-bip.md:84 in d1d03420ac outdated
    79+We use tagged hashes for the hashes committed in the accumulator for versioning
    80+purposes. This is added so that if there are changes in the preimage of the
    81+hash, the version tag helps to avoid misinterpretation.
    82+
    83+The Utreexo version tag is the SHA512 hash of the string `UtreexoV1`, which is represented as the vector
    84+`[85 116 114 101 101 120 111 86 49]` and hex `0x5574726565786f5631`.  (The resulting 64 byte output is
    


    murchandamus commented at 4:46 pm on August 27, 2025:
    0`[85 116 114 101 101 120 111 86 49]` and hex `0x5574726565786f5631`.  (The resulting 64-byte output is
    
  58. in utreexo-validation-bip.md:67 in d1d03420ac outdated
    62+
    63+| Name              | Type                     | Description                               |
    64+| ----------------- | ------------------------ | ----------------------------------------- |
    65+| Utreexo_Tag_V1    | 64 byte array            | The version tag to be prepended to the leafhash. |
    66+| Utreexo_Tag_V1    | 64 byte array            | The version tag to be prepended to the leafhash. |
    67+| BlockHash         | 32 byte array            | The hash of the block in which this tx was confirmed. |
    


    murchandamus commented at 4:49 pm on August 27, 2025:

    On all lines in this table, when the byte count is used as an adjective:

    0-32 byte array
    1+32-byte array
    

    and

    0-4 bytes unsigned integer
    1+4-byte unsigned integer
    
  59. in utreexo-validation-bip.md:70 in d1d03420ac outdated
    65+| Utreexo_Tag_V1    | 64 byte array            | The version tag to be prepended to the leafhash. |
    66+| Utreexo_Tag_V1    | 64 byte array            | The version tag to be prepended to the leafhash. |
    67+| BlockHash         | 32 byte array            | The hash of the block in which this tx was confirmed. |
    68+| TXID              | 32 byte array            | The transaction's TXID                    |
    69+| Vout              | 4 bytes unsigned integer | The output index of this UTXO             |
    70+| Header code       | 4 bytes unsigned integer | The block height and iscoinbase. This is a value obtained by left shifting the block height that confirmed this transaction, and then OR-ing it with 1, only if this transaction is a coinbase. |
    


    murchandamus commented at 4:57 pm on August 27, 2025:
    0| Header code       | 4 bytes unsigned integer | The block height and iscoinbase. This is a value obtained by left shifting the block height that confirmed this transaction by one bit, and then OR-ing it with 1, only if this transaction is a coinbase. |
    
  60. in utreexo-validation-bip.md:110 in d1d03420ac outdated
    105+The output index of the UTXO in the transaction.
    106+
    107+##### Header code
    108+
    109+This field stores the block height and a boolean for marking that the UTXO is
    110+part of a coinbase transaction. Mostly serves to save space as the coinbase
    


    murchandamus commented at 5:02 pm on August 27, 2025:
    0This field stores the block height and a boolean for marking that the UTXO was
    1created by a coinbase transaction. Mostly serves to save space as the coinbase
    
  61. in utreexo-validation-bip.md:114 in d1d03420ac outdated
    109+This field stores the block height and a boolean for marking that the UTXO is
    110+part of a coinbase transaction. Mostly serves to save space as the coinbase
    111+boolean can be stored in a single bit.
    112+
    113+This field is a value obtained by left shifting the block height that
    114+confirmed this transaction, and then setting the least significant bit to 1 only
    


    murchandamus commented at 5:03 pm on August 27, 2025:
    0confirmed this transaction by one bit, and then setting the least significant bit to 1 only
    
  62. in utreexo-validation-bip.md:130 in d1d03420ac outdated
    125+The block height is needed as during transaction validation, it is used during
    126+the check of BIP-0065 CLTV. In current nodes, the block height is stored locally
    127+as a part of the UTXO set. Since Utreexo nodes get this data from peers, we need
    128+to commit to the block height to avoid security vulnerabilities.
    129+
    130+The boolean for coinbase is needed as they may not be spent before having 100 confirmations.
    


    murchandamus commented at 5:03 pm on August 27, 2025:
    0The boolean for coinbase outputs is needed as they may not be spent before having 100 confirmations.
    
  63. in utreexo-validation-bip.md:150 in d1d03420ac outdated
    145+##### script pubkey
    146+
    147+This field is added to commit to the locking script of the UTXO. With current
    148+nodes, this is stored in the UTXO set but since we receive this in the proof
    149+from our peers, we need to commit to this value to avoid malicious peers that
    150+may send over the wrong locking script.
    


    murchandamus commented at 5:08 pm on August 27, 2025:

    Perhaps consider “output script” as “scriptPubKey” is just Bitcoin Core’s variable name for that field.

     0##### Output script size
     1
     2As the output script ("scriptPubKey" in Bitcoin Core) is a variable length byte array, we prepend it with the
     3length.
     4
     5##### Output Script
     6
     7This field is added to commit to the output script of the UTXO. With current
     8nodes, this is stored in the UTXO set but since we receive this in the proof
     9from our peers, we need to commit to this value to avoid malicious peers that
    10may send over the wrong output script.
    
  64. in utreexo-validation-bip.md:173 in d1d03420ac outdated
    168+##### Provably unspendable transaction outputs
    169+
    170+There are outputs in the Bitcoin network that we can guarantee that they cannot
    171+be spent without a hard-fork of the network. The following output types are not
    172+added to the accumulator:
    173+- Outputs that start with an OP_RETURN (0x6a)
    


    murchandamus commented at 5:13 pm on August 27, 2025:
    0- Outputs whose output script starts with an OP_RETURN (0x6a)
    
  65. in utreexo-validation-bip.md:174 in d1d03420ac outdated
    169+
    170+There are outputs in the Bitcoin network that we can guarantee that they cannot
    171+be spent without a hard-fork of the network. The following output types are not
    172+added to the accumulator:
    173+- Outputs that start with an OP_RETURN (0x6a)
    174+- Outputs with a scriptPubkey larger than 10,000 bytes
    


    murchandamus commented at 6:04 pm on August 27, 2025:
    0- Outputs with an output script larger than 10,000 bytes
    
  66. in utreexo-validation-bip.md:186 in d1d03420ac outdated
    181+In Utreexo, nodes inspect blocks and identify which outputs are being created
    182+and destroyed in the same block, and exclude them from the accumulator and proofs.
    183+
    184+There's no need to provide proofs for outputs which have been created in the same
    185+block. Adding and then immediately removing the output from the accumulator would be
    186+possible but doesn't serve any purpose - once outputs are spent their past existence
    


    murchandamus commented at 6:05 pm on August 27, 2025:
    0possible but doesn't serve any purpose - once outputs are spent, their past existence
    
  67. in utreexo-validation-bip.md:199 in d1d03420ac outdated
    194+The Utreexo accumulator lacks associative properties during addition and the
    195+ordering of which UTXO hash gets added first is consensus critical. For
    196+the modification of the accumulator the steps are as follows:
    197+
    198+1. Batch remove the UTXOs that were spent in the block based on the algorithm
    199+   defined in BIP-????. Deletions itself are order-independent.
    


    murchandamus commented at 6:08 pm on August 27, 2025:
    For the time being, maybe just use [‎BIP Utreexo Accumulator](‎utreexo-accumulator-bip.md) to clarify whether you are referring to one or the other.
  68. in utreexo-validation-bip.md:204 in d1d03420ac outdated
    199+   defined in BIP-????. Deletions itself are order-independent.
    200+2. Batch add all non-excluded outputs in the order they're included in the
    201+   Bitcoin block. Additions are order-dependent.
    202+
    203+The removal and the addition of the hashes follow the algorithms defined in
    204+BIP-????.
    


    murchandamus commented at 6:10 pm on August 27, 2025:
    Ditto
  69. in utreexo-validation-bip.md:213 in d1d03420ac outdated
    208+The UTXO proof has 2 elements: the accumulator proof and the leaf data. The
    209+leaf data provides the necessary UTXO data for block validation that would be
    210+stored locally for non-Utreexo nodes. The accumulator proof proves that the
    211+given UTXO hash preimages are committed in the accumulator.
    212+
    213+Accumulator proof is defined in BIP-????, and contains two elements:
    


    murchandamus commented at 6:10 pm on August 27, 2025:
    Ditto
  70. in utreexo-validation-bip.md:218 in d1d03420ac outdated
    213+Accumulator proof is defined in BIP-????, and contains two elements:
    214+
    215+1. A vector of positions of the UTXO hashes in the accumulator.
    216+2. A vector of hashes required to hash up to the roots.
    217+
    218+For (1), positions are in the order of the leaves that are being proved in
    


    murchandamus commented at 6:11 pm on August 27, 2025:
    0For (1), positions are in the order of the leaves that are being proven in
    
  71. in utreexo-validation-bip.md:224 in d1d03420ac outdated
    219+the accumulator. These are all the inputs in the natural blockchain order that
    220+excludes the same block spends.
    221+
    222+The UTXO hash preimages follow the same ordering as (1) in the accumulator
    223+proofs. Each of the positions in (1) refer to the UTXO hash preimage in the same
    224+index.
    


    murchandamus commented at 6:16 pm on August 27, 2025:
    For some reason I had thought that the accumulator proof was a Merkle branch, but now reading this, it makes me think that the proofs are built-up from the leaf preimages. Which of the two is correct, and could you perhaps check whether some more clarification should be added here to make it unambiguous? This might also just be me mixing up something as I’m trying to puzzle together everything that is going on.

    kcalvinalvin commented at 10:08 am on August 29, 2025:

    For some reason I had thought that the accumulator proof was a Merkle branch, but now reading this, it makes me think that the proofs are built-up from the leaf preimages. Which of the two is correct, and could you perhaps check whether some more clarification should be added here to make it unambiguous?

    You are right, there’s the merkle branches themselves and the leaf preimages are an entirely separate data apart from that.

    I’ll read it over again and make clarifications where needed.


    adiabat commented at 2:56 am on September 2, 2025:
    Yeah it’s both. First the merkle branch is needed to verify that the UTXO exists at all, then the UTXO data / leaf preimage is needed to feed in to the normal script & transaction validation. We can make it clear that there’s really 2 things stuck together here.
  72. in utreexo-validation-bip.md:237 in d1d03420ac outdated
    232+
    233+For each block, the UTXO proof must be provided with the bitcoin block for
    234+validation to be possible. Without the UTXO proof, it's not possible to
    235+validate that the inputs being referenced exists in the UTXO set.
    236+
    237+The end result of the UTXO proof validation results us in the vector of UTXO
    


    murchandamus commented at 6:18 pm on August 27, 2025:
    0The end result of the UTXO proof validation results in the vector of UTXO
    
  73. in utreexo-validation-bip.md:258 in d1d03420ac outdated
    253+of this check affect the consensus validation for Utreexo nodes.
    254+
    255+### BIP-0030 and BIP-0034 consensus check
    256+
    257+Before `BIP-0030`, the Bitcoin consensus rules allowed for duplicate TXIDs. If two
    258+transactions shared a same TXID, the transaction outputs of the preceding
    


    murchandamus commented at 6:20 pm on August 27, 2025:
    0transactions shared a same TXID, the transaction outputs of the succeeding
    
  74. in utreexo-validation-bip.md:261 in d1d03420ac outdated
    256+
    257+Before `BIP-0030`, the Bitcoin consensus rules allowed for duplicate TXIDs. If two
    258+transactions shared a same TXID, the transaction outputs of the preceding
    259+transaction would overwrite the previously created UTXOs. It was assumed that
    260+TXIDs were unique but it's trivially easy to create a transaction that share
    261+the same `TXID` for coinbase transactions by re-using the same bitcoin address.
    


    murchandamus commented at 6:30 pm on August 27, 2025:
    Nit: It’s not just that the TXID is the same, the entire transaction is the same.
  75. in utreexo-validation-bip.md:264 in d1d03420ac outdated
    259+transaction would overwrite the previously created UTXOs. It was assumed that
    260+TXIDs were unique but it's trivially easy to create a transaction that share
    261+the same `TXID` for coinbase transactions by re-using the same bitcoin address.
    262+
    263+`BIP-0030` check is a consensus check that enforces that newly created transactions
    264+do not have outputs that overwrites an existing UTXO.
    


    murchandamus commented at 6:30 pm on August 27, 2025:
    0do not have outputs that overwrite an existing UTXO.
    
  76. in utreexo-validation-bip.md:266 in d1d03420ac outdated
    261+the same `TXID` for coinbase transactions by re-using the same bitcoin address.
    262+
    263+`BIP-0030` check is a consensus check that enforces that newly created transactions
    264+do not have outputs that overwrites an existing UTXO.
    265+
    266+`BIP-0034` was a rule where the block height was included in the script signature
    


    murchandamus commented at 6:35 pm on August 27, 2025:

    What was called originally “generator transaction” is now more familiarly referred to as a “coinbase transaction” after the “scriptSig” equivalent being called “coinbase field” in that context.

    0`BIP-0034` introduces a rule that requires the block height to be included in the coinbase field
    
  77. in utreexo-validation-bip.md:267 in d1d03420ac outdated
    262+
    263+`BIP-0030` check is a consensus check that enforces that newly created transactions
    264+do not have outputs that overwrites an existing UTXO.
    265+
    266+`BIP-0034` was a rule where the block height was included in the script signature
    267+of the coinbase transaction. One of the reason for the change was to make
    


    murchandamus commented at 6:39 pm on August 27, 2025:

    As far as I can tell, the rest of BIP 34 explains the activation mechanism of BIP 34, so I would claim that this is the main reason.

    0of the coinbase transaction. The main reason for the change was to make
    
  78. in utreexo-validation-bip.md:271 in d1d03420ac outdated
    266+`BIP-0034` was a rule where the block height was included in the script signature
    267+of the coinbase transaction. One of the reason for the change was to make
    268+coinbase transactions unique so that the expensive check of going through the
    269+UTXO set wouldn't be needed. However, there were blocks in the past that had
    270+random bytes that could be interpreted as block heights. The lowest block
    271+heights are: 209,921, 490,897, and 1,983,702.
    


    murchandamus commented at 6:41 pm on August 27, 2025:
    0random bytes that could be interpreted as block heights. The lowest implicated block
    1heights are: 209,921, 490,897, and 1,983,702.
    
  79. in utreexo-validation-bip.md:285 in d1d03420ac outdated
    280+will remove the checkpoints[^1], as they are not needed anymore to prevent attacks
    281+against nodes during Initial Block Download. This is effectively a hard-fork,
    282+that will probably never actually happen, however.
    283+
    284+Block 1,983,702 is the first block that Utreexo nodes would be in danger of a
    285+consensus failure due to the inability to perform the BIP-0030 checks. However,
    


    murchandamus commented at 6:47 pm on August 27, 2025:
    0consensus failure due to the inability to perform the BIP-0030 checks, if someone were to reuse coinbase transaction from block 164,384 . However,
    
  80. in utreexo-validation-bip.md:291 in d1d03420ac outdated
    286+this block will happen in roughly 21 years from now, and some mitigations have been
    287+proposed [^2].
    288+
    289+### Historical BIP-0030 violations
    290+
    291+There were two UTXOs that were overwritten due to this consensus rule are:
    


    murchandamus commented at 6:48 pm on August 27, 2025:

    Not due to this rule, but rather before it was introduced:

    0There were two UTXOs that were overwritten by repeated transactions:
    
  81. in utreexo-validation-bip.md:300 in d1d03420ac outdated
    295+Since the leaf hashes that are committed to the Utreexo accumulator commit to
    296+the block hash as well, all the leaf hashes are unique and the two historical
    297+violations do not happen with how the UTXO set is represented with the Utreexo
    298+accumulator. To be consensus compatible with clients that do have the historical
    299+violations, the leaves representing these two UTXOs in the Utreexo accumulator
    300+are hardcoded as unspendable.
    


    murchandamus commented at 6:51 pm on August 27, 2025:

    If I’m understanding this right:

    0accumulator. To be consensus compatible with clients that retain only the second
    1occurrences of these outputs, the leaves representing the corresponding first UTXOs in the Utreexo accumulator
    2are hardcoded as unspendable.
    
  82. murchandamus commented at 6:56 pm on August 27, 2025: contributor
    This time I took a look at the “Validation Layer” BIP. Also looks very good already. I noticed that there is no Rationale section, and the title seemed a little less informative than it could be.
  83. in utreexo-p2p-bip.md:28 in d1d03420ac outdated
    23+## Motivation
    24+
    25+Utreexo nodes require the inclusion proof to fully validate blocks and transactions.
    26+Each block has a corresponding inclusion proof with it and this inclusion proof for blocks up to height 906,937 requires an additional 631.85GB, which is roughly 40GB less than the size of the block data.
    27+Each transaction also has a corresponding inclusion proof with it and for normal transaction relay, the proof is roughly 3 times the size of the transaction.
    28+It's still reasonable for a single node to download this extra data but little caching goes a long way in reducing the amount of data that one has to download.
    


    murchandamus commented at 9:10 pm on August 28, 2025:

    little caching ↦ almost no caching a little caching ↦ some caching

    I think you mean the latter:

    0It's still reasonable for a single node to download this extra data but a little caching goes a long way in reducing the amount of data that one has to download.
    
  84. in utreexo-p2p-bip.md:50 in d1d03420ac outdated
    45+3. Archive nodes
    46+
    47+CSNs have the goal of minimizing data storage and download while performing block validation.
    48+Archive and bridge nodes store more data and provide this data to CSNs.
    49+
    50+Bridge nodes are nodes that can add inclusion proofs to mempool transactions, support the same set of messages as CSNs, and should in fact be indistinguishable from CSNs on the network.
    


    murchandamus commented at 9:16 pm on August 28, 2025:
    It’s not clear to me how “bridge nodes should in fact be indistinguishable from CSNs on the network”. By whom are they indistinguishable. In what regard are they indistinguishable? Shouldn’t they, e.g., be frequently the first peer to notify about new transactions appearing in the mempool and blocks having been found as they act as the translation layer and therefore the initial source of data for the Utreexo-portion of the node network?

    kcalvinalvin commented at 11:31 am on August 29, 2025:

    It’s not clear to me how “bridge nodes should in fact be indistinguishable from CSNs on the network”. By whom are they indistinguishable. In what regard are they indistinguishable?

    They’re indistinguishable as we don’t explicitly specify which nodes are bridges. The sentence was an attempt at clarifying a common misconception that a CSN must connect to bridge nodes.

    Shouldn’t they, e.g., be frequently the first peer to notify about new transactions appearing in the mempool and blocks having been found as they act as the translation layer and therefore the initial source of data for the Utreexo-portion of the node network?

    Yes this is true. They usually should be the first to notify utreexo peers about new txs and blocks


    adiabat commented at 3:12 am on September 2, 2025:

    Maybe “indistinguishable” is too strong – it would be great if nobody could tell, but if there are a small number of bridge nodes and a large number of CSNs it might be traceable.

    The main thing is bridge nodes don’t announce themselves as such; they just pass proofs and transactions around just like CSNs. If you’re a CSN connected directly to a bridge node, you might see a lot of INVs and proofs originate from that node, and they might be a bridge, but they might just be a well connected CSN.

    It’s similar to trying to prevent people from tracing new transactions to originating nodes, though probably in one sense harder (bridge nodes keep being bridge nodes all the time vs only getting one shot with a wallet broadcasting) but also lower stakes (determining that a node is a bridge doesn’t hurt privacy or network strength that much).

  85. in utreexo-p2p-bip.md:51 in d1d03420ac outdated
    46+
    47+CSNs have the goal of minimizing data storage and download while performing block validation.
    48+Archive and bridge nodes store more data and provide this data to CSNs.
    49+
    50+Bridge nodes are nodes that can add inclusion proofs to mempool transactions, support the same set of messages as CSNs, and should in fact be indistinguishable from CSNs on the network.
    51+Archive nodes are able to serve the blocks and the inclusion proofs. However, they are not able to generate the inclusion proofs as they do not keep the full UTXO set.
    


    murchandamus commented at 9:19 pm on August 28, 2025:

    Does “Bridge node” refer to the aspect of whether the node has the UTXO set, and does “archive node” refer to having the full set of data? I.e., are these different dimensions? Would you run an “archive bridge node” if you want to offer all services?

    Edit: Oh, never mind, you answer that right below.

  86. in utreexo-p2p-bip.md:60 in d1d03420ac outdated
    55+The one exception to this flexibility is that archive nodes must provide both the blocks and the inclusion proofs.
    56+While theoretically possible to split these two resources, the blocks are quite small relative to the block proofs, and it simplifies clients to be able to rely on being able to request both over the same connection.
    57+
    58+### Pre-P2P: Bridge Building
    59+
    60+When introducing Utreexo into an existing network, there are 2 thing needed before CSNs can operate.
    


    murchandamus commented at 9:23 pm on August 28, 2025:
    0When introducing Utreexo into an existing network, there are two things needed before CSNs can operate.
    
  87. in utreexo-p2p-bip.md:61 in d1d03420ac outdated
    56+While theoretically possible to split these two resources, the blocks are quite small relative to the block proofs, and it simplifies clients to be able to rely on being able to request both over the same connection.
    57+
    58+### Pre-P2P: Bridge Building
    59+
    60+When introducing Utreexo into an existing network, there are 2 thing needed before CSNs can operate.
    61+First, archive nodes need to build proofs for old blocks to serve during the initial-block download (IBD).
    


    murchandamus commented at 9:23 pm on August 28, 2025:
    0First, archive nodes need to build proofs for old blocks to serve during the initial block download (IBD).
    
  88. in utreexo-p2p-bip.md:71 in d1d03420ac outdated
    66+
    67+### Initial Block Download
    68+
    69+![Current IBD](bip-utreexo-p2p/current-ibd.png)
    70+
    71+Current IBD is done by a headers-first block download, in which the node downloads all the Bitcoin block headers, verifies that they connect, and start downloading the actual block data for validation.
    


    murchandamus commented at 9:25 pm on August 28, 2025:

    “Verifies that they connect” feels a bit overly reductive.

    0Conventionally, IBD is done by a headers-first block download, in which the node downloads all the Bitcoin block headers, verifies that they connect, and follows up by by downloading the block data for validation.
    
  89. in utreexo-p2p-bip.md:78 in d1d03420ac outdated
    73+![Utreexo node IBD](bip-utreexo-p2p/utreexo-node-ibd.png)
    74+
    75+Utreexo nodes still perform the headers-first phase.
    76+However, in addition to blocks, they also require the inclusion proof for UTXOs spent in that block.
    77+Hence a Utreexo node will send a `getutreexoproof` message along with the `getdata` message for a given block.
    78+This flow is the simplest change and allows a Utreexo node to validate and perform IBD but this method does require downloading about 2 times compared to the current nodes as the inclusion proof for a block is roughly the same size as the block itself.
    


    murchandamus commented at 9:31 pm on August 28, 2025:
    0This flow is the simplest change and allows a Utreexo node to validate and perform IBD but this method does require downloading about two times as much data as a conventional node due to the inclusion proof for a block being roughly the same size as the block itself.
    
  90. in utreexo-p2p-bip.md:89 in d1d03420ac outdated
    84+With these TTL values, a node receiving the `TTL` message will be able to determine which output to cache with the Clairvoyant algorithm[^1] which allows the IBD-ing node to reduce the bandwidth required in syncing the node in the most efficient way possible.
    85+
    86+The node will have the block and the TTLs for the outputs of the given block which it can then use to cache parts of the inclusion proof and only request the needed parts of an inclusion proof for future blocks.
    87+
    88+We note that it is feasible for a node to receive incorrect TTL values from malicious nodes and this can negatively impact the bandwidth savings.
    89+Nodes can mitigate this by not downloading TTL values too far into the future or by checking if the `TTL` message received was included in the accumulator hard-coded into the binary.
    


    murchandamus commented at 9:35 pm on August 28, 2025:
    I don’t understand what you mean with “hard-coded in the binary”.

    kcalvinalvin commented at 11:40 am on August 29, 2025:

    Oh I should clarify this.

    Since nothing is being committed to the TTL messages, a node can just lie about the values in the message. To prevent this, the node should either:

    1: don’t download too far into the future since the damage done will be greater. 2: rely on the pre-committed (aka “hard coded into the binary”) ttl accumulator in the node software. The ttl accumulator has ttls for each of the blocks accumulated. With this accumulator, the node can check if the received ttl is valid or invalid by checking for its existence in the ttl accumulator.

  91. in utreexo-p2p-bip.md:91 in d1d03420ac outdated
    86+The node will have the block and the TTLs for the outputs of the given block which it can then use to cache parts of the inclusion proof and only request the needed parts of an inclusion proof for future blocks.
    87+
    88+We note that it is feasible for a node to receive incorrect TTL values from malicious nodes and this can negatively impact the bandwidth savings.
    89+Nodes can mitigate this by not downloading TTL values too far into the future or by checking if the `TTL` message received was included in the accumulator hard-coded into the binary.
    90+
    91+This TTL commitment scheme is described in detail [here](#Commitment scheme for TTL messages).
    


    murchandamus commented at 9:36 pm on August 28, 2025:
    0The TTL commitment scheme is described in detail in the section [Commitment scheme for TTL messages](#commitment-scheme-for-ttl-messages) below.
    
  92. in utreexo-p2p-bip.md:97 in d1d03420ac outdated
    92+
    93+### Transaction relay
    94+
    95+![Current TX relay](bip-utreexo-p2p/current-tx-relay.png)
    96+
    97+Current transaction relay is done by sending an inv message with the hash of the transaction and a type field that denotes that this hash represents a transaction.
    


    murchandamus commented at 9:41 pm on August 28, 2025:
    Throughout this BIP: I don’t think “current” is a future-proof term to distinguish the established behavior of nodes from Utreexo nodes. Perhaps “conventional” or “non-Utreexo nodes” would be a better fit?
  93. in utreexo-p2p-bip.md:98 in d1d03420ac outdated
    93+### Transaction relay
    94+
    95+![Current TX relay](bip-utreexo-p2p/current-tx-relay.png)
    96+
    97+Current transaction relay is done by sending an inv message with the hash of the transaction and a type field that denotes that this hash represents a transaction.
    98+If the node receiving the inv does not have a tx matching that hash, it then requests for it using a getdata message.
    


    murchandamus commented at 9:54 pm on August 28, 2025:
    0If the node receiving the inv does not have a transaction matching that hash, the node then requests the transaction using a getdata message.
    
  94. in utreexo-p2p-bip.md:107 in d1d03420ac outdated
    102+The transaction relay for Utreexo nodes doesn't add any extra round trips.
    103+However, it does include extra inventory vectors in the inv message.
    104+
    105+We introduce a new inventory vector type called `utreexoproofhash`, which makes up the extra information that a Utreexo node will receive.
    106+
    107+A hash with the type `utreexoproofhash` represents 4 Utreexo merkle tree positions, each of them little endian serialized and taking up 8 bytes in the 32 byte hash.
    


    murchandamus commented at 9:56 pm on August 28, 2025:
    0
    1A hash with the type `utreexoproofhash` represents four Utreexo merkle tree positions, each of them little-endian serialized and taking up 8 bytes in the 32-byte hash.
    
  95. in utreexo-p2p-bip.md:108 in d1d03420ac outdated
    103+However, it does include extra inventory vectors in the inv message.
    104+
    105+We introduce a new inventory vector type called `utreexoproofhash`, which makes up the extra information that a Utreexo node will receive.
    106+
    107+A hash with the type `utreexoproofhash` represents 4 Utreexo merkle tree positions, each of them little endian serialized and taking up 8 bytes in the 32 byte hash.
    108+When sending an inv message to a Utreexo node for a tx, we append `utreexoproofhash` inventory vectors to represent the merkle tree positions for each of the UTXOs being referenced in the inputs of the tx.
    


    murchandamus commented at 9:58 pm on August 28, 2025:

    Please don’t use abbreviations like “tx” in the running text.

    0When sending an inv message to a Utreexo node for a transaction, we append `utreexoproofhash` inventory vectors to represent the merkle tree positions for each of the UTXOs being referenced in the inputs of the transaction.
    

    murchandamus commented at 10:14 pm on August 28, 2025:
    When you write “send an inv message” do you actually mean “announce a new transaction” rather than literally a inv message being sent?

    murchandamus commented at 10:17 pm on August 28, 2025:

    The use of inv message here seems confusing. It’s not clear to me whether it is meant literal or as a stand in for “transaction announcement”. First I even thought you were describing the message that transfers the entire transaction because you already were sending the utreexoproofhash along. If this is actually describing how Utreexo nodes announce transactions to each other, instead of saying “to send an inv message”, it could better be introduces e.g., as

    “Where conventional nodes us a inv message to announce a new transaction, Utreexo nodes use the invvect message to announce new transactions to Utreexo peers.”

    It would perhaps also help if you explain why the utreexoproofhash would need to be sent with announcement of the transaction—I would have expected them to only be necessary when the transaction data is sent.

    Either way, this section is confusing to me.

  96. in utreexo-p2p-bip.md:109 in d1d03420ac outdated
    104+
    105+We introduce a new inventory vector type called `utreexoproofhash`, which makes up the extra information that a Utreexo node will receive.
    106+
    107+A hash with the type `utreexoproofhash` represents 4 Utreexo merkle tree positions, each of them little endian serialized and taking up 8 bytes in the 32 byte hash.
    108+When sending an inv message to a Utreexo node for a tx, we append `utreexoproofhash` inventory vectors to represent the merkle tree positions for each of the UTXOs being referenced in the inputs of the tx.
    109+The Utreexo merkle tree positions are explained in detail in the bip "Utreexo Accumulator Specification".
    


    murchandamus commented at 9:58 pm on August 28, 2025:
    0The Utreexo merkle tree positions are explained in detail in "BIP Utreexo Accumulator Specification".
    
  97. in utreexo-p2p-bip.md:105 in d1d03420ac outdated
    100+![Utreexo TX relay](bip-utreexo-p2p/utreexo-tx-relay.png)
    101+
    102+The transaction relay for Utreexo nodes doesn't add any extra round trips.
    103+However, it does include extra inventory vectors in the inv message.
    104+
    105+We introduce a new inventory vector type called `utreexoproofhash`, which makes up the extra information that a Utreexo node will receive.
    


    murchandamus commented at 10:12 pm on August 28, 2025:

    This section feels ambiguous to me.

    With the graphic labeling the first box that contains “type:transaction, hash: tx hash (a)” as invvect, I am now wondering whether Utreexo nodes use the inv message to announce messages to each other extended by invvect data, or whether they send invvect messages instead of inv messages.

  98. in utreexo-p2p-bip.md:115 in d1d03420ac outdated
    110+Since the hash in an inventory vector is always 32 bytes, any unused space will be padded with the max uint64 value of 18446744073709551615.
    111+
    112+With these merkle tree positions for the UTXOs referenced in the inputs, we can calculate the needed positions of the merkle hashes to them.
    113+These positions are then sent over in the `getdata` message as an another inventory vector.
    114+
    115+![Utreexo TX relay multiple Utreexo proof hash vectors](bip-utreexo-p2p/utreexo-tx-relay-multiple-proofhash-vectors.png)
    


    murchandamus commented at 10:22 pm on August 28, 2025:
    Scrolling up and down through this document, it’s sometimes difficult to tell whether a paragraph belongs to the image before or after the paragraph. Since Markdown does not allow captions on images, it could for example help if either the images included the caption, or if the text were structured in some way that makes it clearer.
  99. in utreexo-p2p-bip.md:131 in d1d03420ac outdated
    126+
    127+### Block Propagation
    128+
    129+![Legacy Block Propagation](bip-utreexo-p2p/legacy-block-propagation.png)
    130+
    131+Legacy block propagation without Compact Blocks comprises of three steps:
    


    murchandamus commented at 10:27 pm on August 28, 2025:
    Consistency: Previously you were referring to non-Utreexo nodes as “current nodes”, now it’s “legacy”. Please use one term to refer to the same concept across the entire document.
  100. in utreexo-p2p-bip.md:202 in d1d03420ac outdated
    197+#### Reconstructable Script
    198+
    199+For some script types (e.g. `ScriptHash`, `PubkeyHash`, `WitnessScriptHash`, `WitnessPubkeyHash`) the actual locking condition is not in the scriptPubkey, but a hash of it.
    200+The script which is evaluated is provided as an element of the scriptSig or witness data.
    201+
    202+Therefore, we can safely just omit the locking script hash from the UTXO data and reconstruct it from the witness or scriptSig.
    


    murchandamus commented at 10:37 pm on August 28, 2025:
    0Therefore, we can safely omit the locking script hash from the UTXO data and reconstruct it from the witness or scriptSig.
    
  101. in utreexo-p2p-bip.md:242 in d1d03420ac outdated
    237+
    238+| Field        | Type                | Description                                                                                                                                                                                                                                                    |
    239+|--------------|---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
    240+| block height | uint32              | The time-to-live value of a leaf in the Utreexo merkle forest. The value is determined by the amount of leaves that were added to the accumulator since its creation                                                                                           |
    241+| length       | varint              | The length of the TTLs                                                                                                                                                                                                                                         |
    242+| TTLs         | vector of TTL infos | The TTL Info for the UTXOs that are added to the Utreexo merkle forest in blockchain ordering. See [Utreexo - Validation Layer](./utreexo-validation-bip.md#Excluded UTXOs from the accumulator) for the UTXOs that are not added to the Utreexo merkle forest |
    


    murchandamus commented at 10:40 pm on August 28, 2025:
    0| TTLs         | vector of TTL infos | The TTL Info for the UTXOs that are added to the Utreexo merkle forest in blockchain ordering. See [Utreexo - Validation Layer](./utreexo-validation-bip.md#excluded-utxos-from-the-accumulator) for the UTXOs that are not added to the Utreexo merkle forest |
    
  102. in utreexo-p2p-bip.md:287 in d1d03420ac outdated
    282+
    283+The bitmaps here are formatted as big-endian and padded to the nearest byte, with 1 meaning a request for the proof hash or the leaf data, and 0 meaning omit the proof hash or the leaf data.
    284+
    285+Since there's one corresponding leaf data per target location, it's trivial to generate a bitmap for the leafdatas.
    286+
    287+Using the [proof_positions](./utreexo-accumulator-bip.md#Utility Functions) function, it's possible to generate the positions of the needed proof hashes for a given set of targets.
    


    murchandamus commented at 10:42 pm on August 28, 2025:
    0Using the [proof_positions](./utreexo-accumulator-bip.md#utility-functions) function, it's possible to generate the positions of the needed proof hashes for a given set of targets.
    
  103. murchandamus commented at 10:50 pm on August 28, 2025: contributor
    I read the whole P2P BIP, although I went over the new messages section a bit more quickly. There are some sections that felt a bit confusing to me, perhaps you could try to take a look at whether you can clarify those for the less initiated. Overall, this seems close to complete, although I noticed that it is missing a Rationale section.
  104. murchandamus added the label Needs number assignment on Aug 28, 2025
  105. jonatack commented at 8:00 pm on August 29, 2025: member
    After discussion amongst the editors, we’ve assigned 181-183 for these 3 BIP drafts. @murchandamus suggested 181 Accumulator / 182 Validation / 183 P2P (I agree) while leaving it up to you.
  106. jonatack removed the label Needs number assignment on Aug 29, 2025
  107. murchandamus commented at 11:33 pm on August 29, 2025: contributor
    Whenever you get around to it, please add the numbers to the Preambles, set the “Created” header to 2025-08-29 (it holds the date a BIP got numbered), and add the table entries to the README.mediawiki.
  108. kcalvinalvin commented at 3:53 am on August 30, 2025: contributor

    After discussion amongst the editors, we’ve assigned 181-183 for these 3 BIP drafts.

    @murchandamus suggested 181 Accumulator / 182 Validation / 183 P2P (I agree) while leaving it up to you.

    Whenever you get around to it, please add the numbers to the Preambles, set the “Created” header to 2025-08-29 (it holds the date a BIP got numbered), and add the table entries to the README.mediawiki.

    Currently going through all the reviews and writing up the rationale for validation and p2p. Will address these as well.

  109. in utreexo-p2p-bip.md:186 in d1d03420ac outdated
    181+### New data structures
    182+
    183+#### Compact leaf data
    184+
    185+For a CSN to learn the data associated with a UTXO, it must ask for a peer that has it.
    186+To authenticate this data, it is committed into the accumulator, and therefore cannot be changed by peer.
    


    luisschwab commented at 4:11 pm on September 2, 2025:
    0To authenticate this data, it is committed into the accumulator, and therefore cannot be changed by the peer.
    
  110. in utreexo-p2p-bip.md:18 in d1d03420ac outdated
    13+Requires: <BIP-???? (Utreexo Accumulator Specification), BIP-???? (Utreexo - Validation Layer)>
    14+```
    15+
    16+## Abstract
    17+
    18+Utreexo creates a compact representation of the UTXO set that only takes a couple of kilobytes.
    


    luisschwab commented at 4:14 pm on September 2, 2025:

    At one extreme of this gradient, nodes minimize storage and memory requirements, keeping only the roots of the hash trees, which never exceed a kilobyte.

    The Utreexo paper mentions that the upper limit of the accumulator size is a single KB. What changed?


    kcalvinalvin commented at 12:55 pm on September 7, 2025:
    It’s essentially still a kilobyte but since we can support leaves up to the maximum of uint64, we can have 64 roots which is 64*32 = 2048. So 2KB max.
  111. kcalvinalvin force-pushed on Sep 7, 2025
  112. kcalvinalvin force-pushed on Sep 7, 2025
  113. kcalvinalvin force-pushed on Sep 7, 2025
  114. BIP181: Add the Utreexo accumulator BIP d89952d09f
  115. kcalvinalvin force-pushed on Sep 7, 2025
  116. BIP182: Add the Utreexo validation BIP 4aa26f3ca7
  117. kcalvinalvin force-pushed on Sep 7, 2025
  118. BIP183: Add the Utreexo P2P BIP 68da366a98
  119. Update README table to include BIPs: 181, 182, 183 bd1e242587
  120. kcalvinalvin force-pushed on Sep 7, 2025
  121. kcalvinalvin renamed this:
    BIP draft: BIPs for Utreexo
    BIP 181, 182, 183: BIPs for Utreexo
    on Sep 7, 2025
  122. kcalvinalvin commented at 12:58 pm on September 7, 2025: contributor

    All of the review comments are addressed and the rationale for BIPs 182 and 183 were added.

    BIP-0183 was also edited in the following ways:

    1: Images updated with caption 2: Images now updated with transparent backgrounds and changed the colors so they can be read in dark mod 3: Changed the layout of the images and the paragraphs to be more legible.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bips. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-09-13 09:10 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me