Possible savings in `dumptxoutset` serialization format (~20%) #25675

issue RCasatta openend this issue on July 22, 2022

RCasatta commented at 11:43 am on July 22, 2022: none
Is your feature request related to a problem? Please describe.

Size of the serialized UTXO set

Describe the solution you’d like

The serialization format of the serialized UTXO set from dumptxoutset is a list of (COutPoint, Coin).

However, many out points refer to the same transaction so we can group by Txid with: list(Txid, list((vout,Coin))

Since the cursor is already iterating through sorted Txid this doesn’t add complexity to the serialization code.

Considering the UTXO at height 745995:
0serialized_size: ~5.3Gb 1total_elements: 83_082_178 2uniques_txids: 49_517_483 3bytes_lost: total_elements // due to the additional byte for the length of the inner list 4bytes_savings: (total_elements-unique_txids)*32 - bytes_lost ~= 1Gb
Describe alternatives you’ve considered

Additional bytes could be saved by leveraging the duplications in scripts (address reuse). However, this is not considered worthy because the format would lose the streaming property and also because we don’t want to optimize on something which is not recommended.

The byte lost for expressing the length of the inner list could be optimized with a special byte containing both the length of the list and the first vout, since usually both vout and this value are very small they should fit on a single byte most of the time having a fallback for edge cases.

Additional context

Tagging @jamesob author of #16899 assumeutxo summary #15606
RCasatta added the label Feature on Jul 22, 2022
luke-jr commented at 8:27 pm on March 14, 2024: member

Outputs are also created sequentially, so it seems likely list(Txid, list(Coin,...) might improve things too?
RCasatta commented at 8:56 pm on March 14, 2024: none

Shouldn’t be list(Txid, list(Option<Coin>,...) to recompute the vout? In this case I don’t think so because it needs a byte for every None?
achow101 closed this on May 23, 2024
achow101 referenced this in commit 413844f1c2 on May 23, 2024
bitcoin locked this on May 23, 2025

Contributors
RCasatta luke-jr

Labels
Feature

Possible savings in dumptxoutset serialization format (~20%) #25675

Possible savings in `dumptxoutset` serialization format (~20%) #25675