Problem description
There is demand from users to get the UTXO set in form of a SQLite database (#24628). Bitcoin Core currently only supports dumping the UTXO set in a binary compact-serialized format, which was crafted specifically for AssumeUTXO snapshots (see PR #16899), with the primary goal of being as compact as possible. Previous PRs tried to extend the dumptxoutset
RPC with new formats, either in human-readable form (e.g. #18689, #24202), or most recently, directly as SQLite database (#24952). Both are not optimal: due to the huge size of the ever-growing UTXO set with already more than 80 million entries on mainnet, human-readable formats are practically useless, and very likely one of the first steps would be to put them in some form of database anyway. Directly adding SQLite3 dumping support on the other hand introduces an additional dependency to the non-wallet part of bitcoind and the risk of increased maintenance burden (see e.g. #24952 (comment), #24628 (comment)).
Proposed solution
This PR follows the “external tooling” route by adding a simple Python script for achieving the same goal in a two-step process (first create compact-serialized UTXO set via dumptxoutset
, then convert it to SQLite via the new script). Executive summary:
- single file, no extra dependencies (sqlite3 is included in Python’s standard library [1])
- ~150 LOC, mostly deserialization/decompression routines ported from the Core codebase and (probably the most difficult part) a little elliptic curve / finite field math to decompress pubkeys (essentialy solving the secp256k1 curve equation y^2 = x^3 + 7 for y given x, respecting the proper polarity as indicated by the compression tag)
- creates a database with only one table
utxos
with the following schema:(txid TEXT, vout INT, value INT, coinbase INT, height INT, scriptpubkey TEXT)
- the resulting file has roughly 2x the size of the compact-serialized UTXO set (this is mostly due to encoding txids and scriptpubkeys as hex-strings rather than bytes)
[1] note that there are some rare cases of operating systems like FreeBSD though, where the sqlite3 module has to installed explicitly (see #26819)
A functional test is also added that creates UTXO set entries with various output script types (standard and also non-standard, for e.g. large scripts) and verifies that the UTXO sets of both formats match by comparing corresponding MuHashes. One MuHash is supplied by the bitcoind instance via gettxoutsetinfo muhash
, the other is calculated in the test by reading back the created SQLite database entries and hashing them with the test framework’s MuHash3072
module.
Manual test instructions
I’d suggest to do manual tests also by comparing MuHashes. For that, I’ve written a go tool some time ago which would calculate the MuHash of a sqlite database in the created format (I’ve tried to do a similar tool in Python, but it’s painfully slow).
0$ [run bitcoind instance with -coinstatsindex]
1$ ./src/bitcoin-cli dumptxoutset ~/utxos.dat
2$ ./src/bitcoin-cli gettxoutsetinfo muhash <block height returned in previous call>
3(outputs MuHash calculated from node)
4
5$ ./contrib/utxo-tools/utxo_to_sqlite.py ~/utxos.dat ~/utxos.sqlite
6$ git clone https://github.com/theStack/utxo_dump_tools
7$ cd utxo_dump_tools/calc_utxo_hash
8$ go run calc_utxo_hash.go ~/utxos.sqlite
9(outputs MuHash calculated from the SQLite UTXO set)
10
11=> verify that both MuHashes are equal
For a demonstration what can be done with the resulting database, see #24952#pullrequestreview-956290477 for some example queries. Thanks go to LarryRuane who gave me to the idea of rewriting this script in Python and adding it to contrib
.