Possible savings in dumptxoutset serialization format (~20%) #25675

issue RCasatta openend this issue on July 22, 2022
  1. RCasatta commented at 11:43 am on July 22, 2022: none

    Is your feature request related to a problem? Please describe.

    Size of the serialized UTXO set

    Describe the solution you’d like

    The serialization format of the serialized UTXO set from dumptxoutset is a list of (COutPoint, Coin).

    However, many out points refer to the same transaction so we can group by Txid with: list(Txid, list((vout,Coin))

    Since the cursor is already iterating through sorted Txid this doesn’t add complexity to the serialization code.

    Considering the UTXO at height 745995:

    0serialized_size: ~5.3Gb
    1total_elements: 83_082_178
    2uniques_txids: 49_517_483
    3bytes_lost: total_elements // due to the additional byte for the length of the inner list
    4bytes_savings: (total_elements-unique_txids)*32 - bytes_lost ~= 1Gb
    

    Describe alternatives you’ve considered

    Additional bytes could be saved by leveraging the duplications in scripts (address reuse). However, this is not considered worthy because the format would lose the streaming property and also because we don’t want to optimize on something which is not recommended.

    The byte lost for expressing the length of the inner list could be optimized with a special byte containing both the length of the list and the first vout, since usually both vout and this value are very small they should fit on a single byte most of the time having a fallback for edge cases.

    Additional context

    Tagging @jamesob author of #16899 assumeutxo summary #15606

  2. RCasatta added the label Feature on Jul 22, 2022
  3. luke-jr commented at 8:27 pm on March 14, 2024: member
    Outputs are also created sequentially, so it seems likely list(Txid, list(Coin,...) might improve things too?
  4. RCasatta commented at 8:56 pm on March 14, 2024: none
    Shouldn’t be list(Txid, list(Option<Coin>,...) to recompute the vout? In this case I don’t think so because it needs a byte for every None?
  5. achow101 closed this on May 23, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me