contrib: refactor: dedup deserialization routines in utxo-to-sqlite script #32116

pull theStack wants to merge 1 commits into bitcoin:master from theStack:202503-contrib-utxo_to_sqlite-dedup_deser_functions changing 1 files +10 −49
  1. theStack commented at 5:10 pm on March 21, 2025: contributor
    The compact-size/varint/amount deserialization routines already exist in the functional test framework (with the advantage of having proper unit tests executed in CI), so we can just import them from there instead of leaving duplicated code in the repository.
  2. contrib: dedup deserialization routines in utxo-to-sqlite script
    These routines already exist in the functional test framework
    (with the advantage of having proper unit tests executed in CI), so we
    can just import them from there instead of leaving duplicated code
    in the repository.
    f535b2fe63
  3. DrahtBot commented at 5:10 pm on March 21, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32116.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK rkrux
    Concept ACK l0rinc, yancyribbens

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  4. DrahtBot added the label Scripts and tools on Mar 21, 2025
  5. rkrux commented at 8:49 am on March 24, 2025: contributor

    utACK f535b2fe63255175f7a35882599f9b6b83ac25f1

    I did realise there’d be some duplication when I previously supported adding and using these deserialisation functions in #31907 (review), good that you found a way to get rid of the recently introduced duplication.

  6. in contrib/utxo-tools/utxo_to_sqlite.py:38 in f535b2fe63
    78-
    79-
    80 def decompress_script(f):
    81     """Equivalent of `DecompressScript()` (see compressor module)."""
    82-    size = read_varint(f)  # sizes 0-5 encode compressed script types
    83+    size = deser_varint(f)  # sizes 0-5 encode compressed script types
    


    l0rinc commented at 11:34 am on March 24, 2025:
    👍, read_varint and deser_varint implementations match exactly (except for description, which we may move over there)
  7. in contrib/utxo-tools/utxo_to_sqlite.py:118 in f535b2fe63
    114@@ -154,13 +115,13 @@ def main():
    115         # read key (COutPoint)
    116         if coins_per_hash_left == 0:  # read next prevout hash
    117             prevout_hash = f.read(32)[::-1].hex()
    118-            coins_per_hash_left = read_compactsize(f)
    119-        prevout_index = read_compactsize(f)
    120+            coins_per_hash_left = deser_compact_size(f)
    


    l0rinc commented at 11:36 am on March 24, 2025:
    Seems to me deser_compact_size does and extra little endian conversion - for a single byte, which seems redundant, we might as well remove it from there to match this implementation
  8. in contrib/utxo-tools/utxo_to_sqlite.py:26 in f535b2fe63
    21+from test_framework.compressor import decompress_amount  # noqa: E402
    22+from test_framework.messages import deser_compact_size, deser_varint  # noqa: E402
    23+
    24 
    25 UTXO_DUMP_MAGIC = b'utxo\xff'
    26 UTXO_DUMP_VERSION = 2
    


    l0rinc commented at 11:44 am on March 24, 2025:
    Could we import the MAGIC_BYTES from test_framework.messages as well (we might have to extend it with Testnet4)?

    theStack commented at 6:18 am on March 27, 2025:
    Good find, will do if we end up still following the current approach of the PR (now less likely in light of the discussion triggered by #32116 (comment)).
  9. in contrib/utxo-tools/utxo_to_sqlite.py:124 in f535b2fe63
    123-        code = read_varint(f)
    124+        code = deser_varint(f)
    125         height = code >> 1
    126         is_coinbase = code & 1
    127-        amount = decompress_amount(read_varint(f))
    128+        amount = decompress_amount(deser_varint(f))
    


    l0rinc commented at 11:46 am on March 24, 2025:
    👍, decompress_amount implementations match exactly
  10. in contrib/utxo-tools/utxo_to_sqlite.py:19 in f535b2fe63
    15@@ -16,6 +16,11 @@
    16 import sys
    17 import time
    18 
    19+sys.path.insert(0, os.path.join(os.path.dirname(__file__), '../../test/functional'))
    


    l0rinc commented at 11:52 am on March 24, 2025:
    this is a bit hacky, but I don’t mind if we can dedup

    rkrux commented at 11:54 am on March 24, 2025:
    +1, I felt the same but I feel it’s worth getting rid of duplication.

    l0rinc commented at 12:02 pm on March 24, 2025:
    Though this means this utility file isn’t as self-contained anymore (i.e. you can’t just copy it out of the project - not even sure that was possible before)
  11. l0rinc changes_requested
  12. l0rinc commented at 11:52 am on March 24, 2025: contributor
    Concept ACK - I think we can do a bit more cleanup, though
  13. yancyribbens commented at 6:05 pm on March 25, 2025: contributor
    Concept ACK
  14. laanwj commented at 1:27 pm on March 26, 2025: member

    Concept ~0. i understand the reasoning but not sure whether this is worth it. This isn’t a lot of code, and it’s very well-defined code that isn’t likely to change over time. Making the script depend on the test framework means that the test framework needs to maintain an interface for external use, something we don’t want to encourage in general as it limits flexibility to change. Also this use itself isn’t tested. If the functions move, the script breaks.

    with the advantage of having proper unit tests executed in CI

    My preferred solution there would be to add a CI test for this script instead.

  15. rkrux commented at 2:00 pm on March 26, 2025: contributor

    #32116 (comment) contains good points to consider.

    This isn’t a lot of code, and it’s very well-defined code that isn’t likely to change over time. Making the script depend on the test framework means that the test framework needs to maintain an interface for external use, something we don’t want to encourage in general as it limits flexibility to change.

    Agree, it’s not a lot of code and is unlikely to change over time. Also, agree that we don’t want this to be encouraged in general.

    If the functions move, the script breaks.

    Some brittleness is indeed introduced. Given other points, I’d ask how likely it is for these utility functions to move in the functional test framework?

    My preferred solution there would be to add a CI test for this script instead.

    This script is a part of the CI tests via the functional tests here: https://github.com/bitcoin/bitcoin/blob/c0b7159de47592c95c6db061de682b4af4058102/test/functional/tool_utxo_to_sqlite.py#L105-L108

  16. rkrux commented at 2:15 pm on March 26, 2025: contributor
    One reason I find deduplicating here appealing is that these (de)compress/deserialize utility functions have quite a custom implementation with lack of comments/explanation. My preference would be to have them present once only instead of seeing the implementation in multiple places in the repo.
  17. laanwj commented at 2:34 pm on March 26, 2025: member

    This script is a part of the CI tests via the functional tests here:

    OK, good!

    My preference would be to have them present once only instead of seeing the implementation in multiple places in the repo.

    Right. A cleaner approach to deduplication would be introduce a python utility module that’s used by both the test framework and the tools (but is in a third location), this makes it clear that there’s an interface at least.

  18. theStack commented at 6:08 am on March 27, 2025: contributor

    @laanwj: Fair points! My reasoning to consider this (somewhat hacky) “import code from test framework” approach as acceptable was the fact that we already do it in other contrib scripts as well (e.g. the signet miner), but yeah, that doesn’t necessarily mean it’s a very good idea.

    My preference would be to have them present once only instead of seeing the implementation in multiple places in the repo.

    Right. A cleaner approach to deduplication would be introduce a python utility module that’s used by both the test framework and the tools (but is in a third location), this makes it clear that there’s an interface at least.

    Agree. Will give this a few more thoughts on how this could look like. Related discussion: #27432 (comment) #27432 (comment)

  19. laanwj commented at 10:57 am on March 27, 2025: member

    Fair points! My reasoning to consider this (somewhat hacky) “import code from test framework” approach as acceptable was the fact that we already do it in other contrib scripts as well (e.g. the signet miner), but yeah, that doesn’t necessarily mean it’s a very good idea.

    Yes, to be fair i’ve used that hack plenty of times in my own code outside of bitcoin core, the test framework can be very useful to import to do low-level stuff with bitcoin from Python.

    It would be quite some work to separate “this is purely testing code and depending on it is dangerously brittle” parts from “this is useful for tooling”, but useful for clarity. i don’t insist it needs to be done in this PR, i’m not against merging this as-is.

    Agree. Will give this a few more thoughts on how this could look like.

    In the past we’ve also avoided it because calling it anything else than “functional testing framework” may make other projects decide to depend on it, limiting the flexibility for change required for our tests. But we can be clear that it’s meant for internal use only, like pretty much everything under contrib/, so placing it under there may be a good choice.

    Also coming up with a convenient name could be a challenge, everything in the combinatorial namespace of “bitcoin” “lib” “python” is very likely already used 😄


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-28 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me