Add CSV support to utxo_convert.py (formerly utxo_to_sqlite.py) #34324

pull sipa wants to merge 4 commits into bitcoin:master from sipa:202601_dumpcsv changing 5 files +668 −233
  1. sipa commented at 9:37 pm on January 16, 2026: member

    This renames the utxo_to_sqlite.py script to utxo_convert.py, and:

    • Adds support for multiple output formats selectable with --format (default depends on filename extension).
    • Adds a CSV format (which includes reporting the scriptPubKey as address/descriptor).
    • Makes type checkers and linters happier.

    The goal is to give users a way to construct the UTXO set in a textual, greppable format.
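A minimal sketch of how a `--format` default could be derived from the output filename extension, as the description above outlines. The extension table, the format names, and the helpers `infer_format` and `parse_args` are assumptions made for illustration; they are not taken from the PR.

```python
import argparse
import os

# Hypothetical mapping; the real script's extensions and format names may differ.
EXTENSION_FORMATS = {".sqlite": "sqlite", ".db": "sqlite", ".csv": "csv"}


def infer_format(outfile, explicit=None):
    """Return the explicit format if given, else guess it from the extension."""
    if explicit is not None:
        return explicit
    ext = os.path.splitext(outfile)[1].lower()
    if ext not in EXTENSION_FORMATS:
        raise ValueError(f"cannot infer output format from {outfile!r}; pass --format")
    return EXTENSION_FORMATS[ext]


def parse_args(argv):
    """Parse arguments and resolve the output format (illustrative only)."""
    parser = argparse.ArgumentParser(description="Convert a UTXO dump (sketch).")
    parser.add_argument("infile")
    parser.add_argument("outfile")
    parser.add_argument("--format", default=None,
                        choices=sorted(set(EXTENSION_FORMATS.values())))
    args = parser.parse_args(argv)
    args.format = infer_format(args.outfile, args.format)
    return args
```

With this shape, an explicit `--format` always wins, and an unknown extension fails loudly instead of silently picking a default.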

  2. DrahtBot commented at 9:37 pm on January 16, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34324.

    Reviews

    See the guideline for information on the review process.

    Type          Reviewers
    Concept ACK   nervana21

    If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

    Conflicts

    No conflicts as of last run.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • “refer to the module docstring on top of the script” -> “refer to the module docstring at the top of the script” [“on top of the script” is nonstandard; “at the top of the script” is clearer and correct English]

    2026-03-12 13:42:27

  3. in contrib/utxo-tools/utxo_to_csv.py:66 in 34b9bc172a
    61+
    62+def byte_to_base58(b, version):
    63+    "Compute the Base58Check encoding of an input byte array with given version."""
    64+    result = ''
    65+    b = bytes([version]) + b  # prepend version
    66+    b += hashlib.sha256(hashlib.sha256(b).digest()).digest() # append checksum
    



    sipa commented at 11:43 pm on January 16, 2026:
    Done!
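For reference, a self-contained sketch of Base58Check encoding, the scheme the quoted `byte_to_base58` hunk is implementing. It is not the PR's code: it assumes the standard definition, in which the checksum is the first four bytes of the double SHA-256 of the versioned payload, and the alphabet constant name is chosen here.

```python
import hashlib

# Bitcoin's Base58 alphabet (no 0, O, I, or l).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def byte_to_base58(b, version):
    """Compute the Base58Check encoding of a byte string with a version byte."""
    data = bytes([version]) + b  # prepend version
    # Append the 4-byte checksum: first four bytes of SHA256(SHA256(data)).
    data += hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    num = int.from_bytes(data, "big")
    result = ""
    while num > 0:
        num, rem = divmod(num, 58)
        result = B58_ALPHABET[rem] + result
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + result
```

Leading zero bytes need the explicit `'1'` padding because they vanish when the payload is interpreted as a big-endian integer.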
  4. in contrib/utxo-tools/utxo_to_csv.py:204 in 34b9bc172a
    199+def scriptpubkey_to_descriptor(spk, network_string):
    200+    """Infer a descriptor for the specified scriptpubkey."""
    201+    if len(spk) == 25 and spk[0] == 0x76 and spk[1] == 0xa9 and spk[2] == 20 and spk[23] == 0x88 and spk[24] == 0xac:
    202+        return "addr(" + byte_to_base58(spk[3:23], P2PKH_VERSIONS[network_string]) + ")"
    203+    if len(spk) == 23 and spk[0] == 0xa9 and spk[1] == 20 and spk[22] == 0x87:
    204+        return "addr(" + byte_to_base58(spk[1:21], P2SH_VERSIONS[network_string]) + ")"
    


    l0rinc commented at 10:38 pm on January 16, 2026:

    Based on https://github.com/bitcoin/bitcoin/blob/fa942332b40c97375af0722f32f7575bca3af819/src/script/solver.cpp#L149, we need to skip the second element (the push-length byte) as well:

            return "addr(" + byte_to_base58(spk[2:22], P2SH_VERSIONS[network_string]) + ")"
    

    sipa commented at 11:43 pm on January 16, 2026:
    Fixed, thanks.
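The off-by-one discussed above can be seen from the P2SH layout: one opcode byte (`OP_HASH160`), one push-length byte (20), the 20-byte script hash, and one trailing opcode byte (`OP_EQUAL`). A small sketch, with the helper name `p2sh_script_hash` invented for illustration:

```python
OP_HASH160 = 0xa9
OP_EQUAL = 0x87


def p2sh_script_hash(spk):
    """Return the 20-byte script hash of a P2SH scriptPubKey, or None."""
    if len(spk) == 23 and spk[0] == OP_HASH160 and spk[1] == 20 and spk[22] == OP_EQUAL:
        # Byte 0 is the opcode and byte 1 is the push length,
        # so the hash occupies bytes 2..21 inclusive.
        return spk[2:22]
    return None
```

Slicing `spk[1:21]` instead would include the push-length byte and drop the last byte of the hash, which yields a valid-looking but wrong address.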
  5. in contrib/utxo-tools/utxo_to_csv.py:217 in 34b9bc172a outdated
    212+    if multi is not None:
    213+        keys, m = multi
    214+        return f"multi({m}," + ",".join(key.hex() for key in keys) + ")"
    215+    return "raw(" + spk.hex() + ")"
    216+
    217+def read_varint(f):
    


    l0rinc commented at 10:41 pm on January 16, 2026:
    I understand if we don’t want to dedup across test and helpers, but can we do that inside the same “UTXO” tools? #32116 (comment)

    sipa commented at 11:43 pm on January 16, 2026:
    I think if we want to deduplicate, it would be better to merge the two tools into one.

    l0rinc commented at 10:32 am on January 17, 2026:

    > merge the two tools into one

    +1 for that
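For context on the helper being discussed, this is a sketch of a `read_varint` that decodes Bitcoin Core's base-128 VARINT serialization (distinct from CompactSize). It assumes a binary file-like object and is an illustration, not the exact code in the PR; the `EOFError` handling is an addition made here.

```python
import io


def read_varint(f):
    """Decode one VARINT (Bitcoin Core's MSB base-128 encoding) from f."""
    n = 0
    while True:
        chunk = f.read(1)
        if not chunk:
            raise EOFError("unexpected end of data while reading varint")
        byte = chunk[0]
        n = (n << 7) | (byte & 0x7f)
        if byte & 0x80:
            # Continuation bit set: the +1 makes every encoding unique.
            n += 1
        else:
            return n
```

Usage: `read_varint(io.BytesIO(b"\x80\x00"))` decodes the two-byte encoding of 128.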

  6. DrahtBot added the label CI failed on Jan 16, 2026
  7. DrahtBot commented at 10:55 pm on January 16, 2026: contributor

    🚧 At least one of the CI tasks failed.
    Task lint: https://github.com/bitcoin/bitcoin/actions/runs/21081641168/job/60636466740
    LLM reason (✨ experimental): Lint failure: duplicate dictionary key “Testnet3” in contrib/utxo-tools/utxo_to_csv.py.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.
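The lint failure reported above concerns a duplicate key in a dict literal, which Python accepts silently (the last value wins). A rough sketch of how such duplicates can be found with the standard `ast` module; the function name and the sample dict are illustrative, and this is not the linter the CI actually runs.

```python
import ast


def find_duplicate_dict_keys(source):
    """Return (line, key) pairs for constant keys repeated within a dict literal."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            seen = set()
            for key in node.keys:
                # Entries produced by ** unpacking have key None; skip them.
                if isinstance(key, ast.Constant):
                    if key.value in seen:
                        duplicates.append((key.lineno, key.value))
                    seen.add(key.value)
    return duplicates


# Illustrative input only; the values are not taken from the PR.
SAMPLE = 'VERSIONS = {"Mainnet": 0, "Testnet3": 111, "Testnet3": 196}\n'
```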

  8. sipa force-pushed on Jan 16, 2026
  9. DrahtBot removed the label CI failed on Jan 17, 2026
  10. in contrib/utxo-tools/utxo_to_csv.py:143 in 21fdbe06af
    138+    ret = bech32_encode(encoding, hrp, [witver] + convertbits(witprog, frombits=8, tobits=5))
    139+    return ret
    140+
    141+# Address/descriptor encoding
    142+
    143+def decode_bare_multisig(spk):
    


    l0rinc commented at 12:08 pm on January 17, 2026:
    I ran the conversion locally. The multisig detection looks a bit loose, and I think some nonstandard scripts can get misclassified as multisig; see: https://github.com/bitcoin/bitcoin/blob/5a0f49bd2661d82efe13740856764e4e17fc1d06/src/pubkey.h#L77-L79

    sipa commented at 9:09 pm on February 27, 2026:
    I have rewritten this to be more exact (and also more readable, I hope).
  11. nervana21 commented at 6:03 pm on February 4, 2026: contributor
    Concept ACK
  12. in contrib/utxo-tools/utxo_to_csv.py:142 in 21fdbe06af
    137+    encoding = Encoding.BECH32 if witver == 0 else Encoding.BECH32M
    138+    ret = bech32_encode(encoding, hrp, [witver] + convertbits(witprog, frombits=8, tobits=5))
    139+    return ret
    140+
    141+# Address/descriptor encoding
    142+
    


    nervana21 commented at 5:37 pm on February 9, 2026:

    This solution is based on l0rinc’s comment below. Please feel free to ignore if there is a better way to address the issue.

    def _pubkey_get_len(first_byte):
        """Computes the length of a pubkey with a given first byte."""
        if first_byte in (2, 3):
            return 33
        if first_byte in (4, 6, 7):
            return 65
        return 0

    def _pubkey_is_valid_size(key):
        """True iff key is non-empty and first byte implies length == len(key)."""
        return len(key) > 0 and _pubkey_get_len(key[0]) == len(key)
    

    sipa commented at 9:09 pm on February 27, 2026:
    I have rewritten this in a different way.
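The hunk quoted in this item calls `bech32_encode` and `convertbits`. As background, here is a compact sketch of segwit address encoding following the BIP173/BIP350 reference algorithms. The function `encode_segwit_address` and the constant names are chosen for this sketch and do not mirror the PR's exact API.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
BECH32_CONST = 1
BECH32M_CONST = 0x2bc830a3


def bech32_polymod(values):
    """Compute the Bech32 checksum polynomial over 5-bit values."""
    generator = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for value in values:
        top = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ value
        for i in range(5):
            chk ^= generator[i] if ((top >> i) & 1) else 0
    return chk


def convertbits(data, frombits, tobits):
    """Regroup a sequence of frombits-wide values into tobits-wide values, zero-padded."""
    acc, bits, ret = 0, 0, []
    maxv = (1 << tobits) - 1
    for value in data:
        acc = (acc << frombits) | value
        bits += frombits
        while bits >= tobits:
            bits -= tobits
            ret.append((acc >> bits) & maxv)
    if bits:
        ret.append((acc << (tobits - bits)) & maxv)
    return ret


def encode_segwit_address(hrp, witver, witprog):
    """Encode a witness version and program: Bech32 for v0, Bech32m otherwise."""
    const = BECH32_CONST if witver == 0 else BECH32M_CONST
    data = [witver] + convertbits(witprog, 8, 5)
    hrp_expanded = [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]
    polymod = bech32_polymod(hrp_expanded + data + [0] * 6) ^ const
    checksum = [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[d] for d in data + checksum)
```

The only difference between the two encodings is the constant XORed into the checksum, which is why the quoted code selects the encoding from `witver` alone.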
  13. in contrib/utxo-tools/utxo_to_csv.py:167 in 21fdbe06af
    162+        return None
    163+    # Decode public keys
    164+    while pos < len(spk) and (spk[pos] == 33 or spk[pos] == 65):
    165+        if pos + 1 + spk[pos] > len(spk):
    166+            return None
    167+        keys.append(spk[pos + 1:pos + 1 + spk[pos]])
    


    nervana21 commented at 5:37 pm on February 9, 2026:

    Use the new helper in the key loop.

            key_blob = spk[pos + 1:pos + 1 + spk[pos]]
            if not _pubkey_is_valid_size(key_blob):
                return None
            keys.append(key_blob)
    

    sipa commented at 9:09 pm on February 27, 2026:
    I have rewritten this in a different way; I had missed your suggestion, sorry.
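To make the concern in items 10 to 13 concrete, the following is one possible stricter bare-multisig matcher. It is explicitly not sipa's rewrite (which is not shown in this thread) and does not claim to match Bitcoin Core's solver exactly. It assumes the form `OP_m <key>... OP_n OP_CHECKMULTISIG` with small-integer opcodes, and it returns `(keys, m)` to match the `keys, m = multi` usage quoted in item 5.

```python
OP_1 = 0x51
OP_16 = 0x60
OP_CHECKMULTISIG = 0xae


def pubkey_len(first_byte):
    """Length implied by a pubkey's first byte (0 if the prefix is invalid)."""
    if first_byte in (2, 3):
        return 33
    if first_byte in (4, 6, 7):
        return 65
    return 0


def decode_bare_multisig(spk):
    """Return (keys, m) for a well-formed bare multisig script, else None."""
    if len(spk) < 3 or spk[-1] != OP_CHECKMULTISIG:
        return None
    if not (OP_1 <= spk[0] <= OP_16 and OP_1 <= spk[-2] <= OP_16):
        return None
    m = spk[0] - OP_1 + 1
    n = spk[-2] - OP_1 + 1
    keys = []
    pos, end = 1, len(spk) - 2  # key pushes live in spk[1:end]
    while pos < end:
        size = spk[pos]
        if size not in (33, 65) or pos + 1 + size > end:
            return None
        key = spk[pos + 1:pos + 1 + size]
        # Reject pushes whose prefix byte does not imply the pushed length.
        if pubkey_len(key[0]) != size:
            return None
        keys.append(key)
        pos += 1 + size
    if len(keys) != n or m > n:
        return None
    return keys, m
```

Compared with a check that only looks at the push size, this also rejects 33- or 65-byte pushes that do not start with a valid pubkey prefix, as well as scripts whose `n` does not equal the number of keys.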
  14. sipa force-pushed on Feb 27, 2026
  15. sipa renamed this from “Add utxo_to_csv.py tool” to “Add CSV support to utxo_convert.py (formerly utxo_to_sqlite.py)” on Feb 27, 2026
  16. sipa commented at 9:08 pm on February 27, 2026: member
    I have overhauled this to integrate into the existing utxo_to_sqlite.py script (renaming it to utxo_convert.py).
  17. sipa force-pushed on Feb 27, 2026
  18. DrahtBot added the label CI failed on Feb 27, 2026
  19. sipa force-pushed on Feb 27, 2026
  20. sipa force-pushed on Feb 27, 2026
  21. sipa force-pushed on Feb 27, 2026
  22. DrahtBot removed the label CI failed on Feb 28, 2026
  23. sedited commented at 8:51 am on March 8, 2026: contributor
    Can you elaborate a bit in the PR description why this change might be desirable? What is a use case that isn’t covered by the existing functionality? There are a large number of external libraries that can already do address conversion, and sqlite itself has dump to csv functionality.
  24. sipa commented at 1:47 pm on March 8, 2026: member

    @sedited That’s a fair question.

    I wrote this when I wanted to grep the addresses in the UTXO set for partial matches, and thought this may be a generally useful conversion. While there certainly are libraries that can convert scriptPubKeys to addresses, I think it’s valuable for technical-but-not-programmer people to be able to query the UTXO set with a provided tool.

    I didn’t consider sqlite-to-csv, or other direct sqlite tooling, because I’m not really familiar with it. A reasonable alternative here might be an option to add descriptor/address fields to the sqlite dump (or maybe generally control what fields go into it).
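As a point of comparison for the sqlite-to-CSV alternative raised in this exchange, here is a sketch using only Python's standard `sqlite3` and `csv` modules. The table name `utxos` and its columns are assumptions for illustration; the actual schema written by the script should be checked before relying on this.

```python
import csv
import io
import sqlite3


def dump_table_to_csv(conn, table, out):
    """Write all rows of `table` to `out` as CSV, with a header row."""
    # The table name is interpolated directly, so it must come from a trusted source.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur)


def demo():
    """Build a tiny in-memory table (hypothetical schema) and dump it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE utxos(txid TEXT, vout INT, value INT)")
    conn.execute("INSERT INTO utxos VALUES ('aa', 0, 5000)")
    out = io.StringIO()
    dump_table_to_csv(conn, "utxos", out)
    return out.getvalue().splitlines()
```

A dump produced this way contains only the stored columns, so it would not include the address/descriptor rendering that the CSV format in this PR adds.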

  25. sedited commented at 9:48 am on March 12, 2026: contributor

    > I think it’s valuable for technical-but-not-programmer people to be able to query the UTXO set with a provided tool.

    Mmh, that seems like a weak motivation given LLMs and that most can just use an RPC call for that. I think what I’m not sure about is adding a bunch of general purpose parsing logic specifically to this contrib script. If the problem is that we wouldn’t want to rely on an external parser, I remember we’ve discussed making the core functionality of the test framework a reusable library. Would that be a better approach to this problem in general? I think we should eventually work towards migrating these tools to C++ and reusing our existing code.

    All that said, I also don’t want to block this PR if it is actually useful to people.

  26. DrahtBot added the label Needs rebase on Mar 12, 2026
  27. contrib: clean up utxo_to_sqlite (types, linter) 5405dfe2f8
  28. contrib: rename utxo_to_sqlite.py -> utxo_convert.py 6b3f5a7efd
  29. contrib: abstract out UTXO reader functionality in utxo_convert.py 1775be3b57
  30. contrib: add support for CSV output to utxo_convert.py eeb8af59aa
  31. sipa force-pushed on Mar 12, 2026
  32. sipa commented at 1:45 pm on March 12, 2026: member

    > I think what I’m not sure about is adding a bunch of general purpose parsing logic specifically to this contrib script. If the problem is that we wouldn’t want to rely on an external parser, I remember we’ve discussed making the core functionality of the test framework a reusable library. Would that be a better approach to this problem in general?

    I don’t like that. I think our test framework should remain internal, so we are not beholden to interface promises for external users. The approach here indeed duplicates in a minimal way, but tests against the test framework, so it does at least inherit some of the assurances it provides.

    > I think we should eventually work towards migrating these tools to C++ and reusing our existing code.

    Yeah, that may make sense for some of these tools. If there is interest, I’m willing to look into that.

    > All that said, I also don’t want to block this PR if it is actually useful to people.

    Ok, let’s see what others think.

    > most can just use an RPC call for that.

    Not sure what you’re talking about here.

  33. DrahtBot removed the label Needs rebase on Mar 12, 2026
  34. DrahtBot added the label CI failed on Mar 12, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-03-15 03:13 UTC
