This PR is a late follow-up to #27432, introducing an option for the utxo-to-sqlite script to store the txid/scriptPubKey columns as bytes (= BLOB storage class in sqlite, see e.g. https://www.sqlite.org/datatype3.html) rather than hex strings. This was proposed in earlier reviews (https://github.com/bitcoin/bitcoin/pull/27432#issuecomment-1516857024, #27432 (comment)) and has the obvious advantages of a significantly smaller resulting database (and with that, faster conversion) and the avoidance of hex-to-bytes conversion for further processing of the data [1]. The rationale for why hex strings were chosen back then (and still remain the default, if only for compatibility reasons) is laid out in #27432 (comment) [2].
The approach taken is to introduce the new parameters --spk and --txid, which accept the values “hex” or “raw” (for scriptPubKey) and “hex”, “raw” or “rawle” (for txid). Thanks to ajtowns for providing this suggestion. Happy to take further input on naming and thoughts on future extensibility etc.
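For illustration only (this is not the code in the PR), here is a minimal sketch of how two such options could be declared with argparse and applied to a single coin. The helper names `build_parser`, `encode_spk` and `encode_txid` are hypothetical. The meaning given to “raw” (bytes in the same order as the displayed txid string) and “rawle” (bytes in the serialized order found in the snapshot) is an assumption on my part and may not match the final semantics.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the option names and values come from the PR text.
    parser = argparse.ArgumentParser(description="sketch of the proposed options")
    parser.add_argument("--spk", choices=["hex", "raw"], default="hex",
                        help="storage format of the scriptPubKey column")
    parser.add_argument("--txid", choices=["hex", "raw", "rawle"], default="hex",
                        help="storage format of the txid column")
    return parser


def encode_spk(spk: bytes, fmt: str):
    # "hex" yields a str (TEXT in sqlite), "raw" yields bytes (BLOB in sqlite).
    return spk.hex() if fmt == "hex" else spk


def encode_txid(txid_ser: bytes, fmt: str):
    # txid_ser: the 32 bytes as serialized in the snapshot (assumed order).
    if fmt == "hex":
        return txid_ser[::-1].hex()   # the usual display string
    if fmt == "raw":
        return txid_ser[::-1]         # assumption: same byte order as the display string
    if fmt == "rawle":
        return txid_ser               # assumption: serialized order, unchanged
    raise ValueError(f"unknown txid format: {fmt}")


if __name__ == "__main__":
    args = build_parser().parse_args(["--spk", "raw", "--txid", "rawle"])
    print(args.spk, args.txid)
```

Keeping “hex” as the default for both options leaves the existing output unchanged unless a user explicitly opts in to byte storage.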
[1] For a concrete example, I found that having these columns as bytes would be nice while working on a SwiftSync hints generator tool (https://github.com/theStack/swiftsync-hints-gen), which takes the result of the utxo-to-sqlite tool as input.
[2] Note that, in contrast to what I wrote back then, I think there is no ambiguity in the byte-string serialization of txids; they are ultimately just hash results and hence should be stored as such, and adding a big/little endian knob wouldn’t make much sense. The drawback of not being able to immediately show txid strings (as one would need to do the bytes-reversal step first, which is not possible in sqlite, see e.g. #24952 (comment)) still remains though.
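To make the size and display trade-off concrete, here is a small sketch using Python's sqlite3 module with an in-memory database. The table layout is reduced to two columns and the helper `txid_display` is hypothetical; the example also assumes the BLOB holds the txid in serialized byte order, so the reversal is done in Python after fetching the row.

```python
import sqlite3


def txid_display(txid_blob: bytes) -> str:
    # Reverse the stored bytes and hex-encode them to get the familiar txid string.
    return txid_blob[::-1].hex()


con = sqlite3.connect(":memory:")
# Simplified, assumed schema: only the columns needed for this illustration.
con.execute("CREATE TABLE utxos(txid BLOB, vout INT)")

txid_ser = bytes(range(32))  # dummy 32-byte value standing in for a real txid
con.execute("INSERT INTO utxos VALUES (?, ?)", (txid_ser, 0))

# length() on a BLOB counts bytes; hex() renders the bytes in stored order only.
blob, stored_len, hex_len = con.execute(
    "SELECT txid, length(txid), length(hex(txid)) FROM utxos"
).fetchone()

print(stored_len, hex_len)   # bytes stored vs. characters needed as a hex string
print(txid_display(blob))    # display string, obtained by reversing outside sqlite
```

The BLOB takes 32 bytes per txid where the hex string takes 64 characters, and the display string is recovered with a single slice once the row is in Python.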