rpc: allow writing UTXO set to a named pipe, introduce dump_to_sqlite.sh script #31560

pull theStack wants to merge 5 commits into bitcoin:master from theStack:202412-dumptxoutset-allow_write_to_named_pipe changing 7 files +380 −3
  1. theStack commented at 2:42 am on December 24, 2024: contributor

    This PR is based on #27432 and slightly modifies the dumptxoutset RPC to allow writing the UTXO set dump to a named pipe, so that the output can be consumed by another process (see #31373). Combined with the utxo-to-sqlite.py tool (introduced in #27432), a UTXO set in SQLite3 format can be created on the fly, and doing so becomes a one-liner with the newly introduced script dump_to_sqlite.sh. E.g. for signet:

    $ ./contrib/utxo-tools/dump_to_sqlite.sh "./build/src/bitcoin-cli -signet" ~/utxos.sqlite3
    UTXO Snapshot for Signet at block hash 000000ddc3b251483cf1ebb23e2750ba..., contains 5705634 coins
    1048576 coins converted [18.38%], 4.474s passed since start
    2097152 coins converted [36.76%], 8.793s passed since start
    3145728 coins converted [55.13%], 13.146s passed since start
    4194304 coins converted [73.51%], 17.478s passed since start
    5242880 coins converted [91.89%], 21.832s passed since start
    {
      "coins_written": 5705634,
      "base_hash": "000000ddc3b251483cf1ebb23e2750ba2490701d0c547241b247a9beb85498d0",
      "base_height": 227678,
      "path": "/tmp/tmp.MFHEVqetv0/utxos.fifo",
      "txoutset_hash": "f29e524c999487cbd0cfca5201dce67c2c5e5c5eb115c63ad48c2239f23eea4c",
      "nchaintx": 8272649
    }
    TOTAL: 5705634 coins written to /home/thestack/utxos.sqlite3, snapshot height is 227678.
    

    Note that the dumptxoutset RPC calculates the UTXO set hash as a first step, before any data is emitted, so especially on mainnet it takes quite a while until the conversion starts and any progress becomes visible.

    The new script is quite minimal and PoC-y at this point. Some potential improvement ideas:

    • better error handling (e.g. detect if bitcoin-cli exists, clean up tmpdir if bitcoin-cli execution fails etc.)
    • allow passing through the rollback option (currently we always dump at the current height, i.e. the “latest” parameter)
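The named-pipe flow described above can be illustrated with a minimal sketch. It is not the dump_to_sqlite.sh script itself: the writer below is only a stand-in for `dumptxoutset <fifo> latest`, the reader is a stand-in for the converter, and the function name `stream_through_fifo` is hypothetical. It assumes a POSIX system where `os.mkfifo` is available.

```python
import os
import tempfile
import threading


def stream_through_fifo(payload: bytes) -> bytes:
    """Send payload through a named pipe and return what the reader received."""
    received = bytearray()
    with tempfile.TemporaryDirectory() as tmpdir:
        fifo_path = os.path.join(tmpdir, "utxos.fifo")
        os.mkfifo(fifo_path)  # POSIX only

        def consume() -> None:
            # Stand-in for the converter. Opening a FIFO for reading blocks
            # until a writer opens the other end.
            with open(fifo_path, "rb") as reader:
                while chunk := reader.read(65536):
                    received.extend(chunk)

        consumer = threading.Thread(target=consume)
        consumer.start()
        # Stand-in for the dump. Opening a FIFO for writing blocks until a
        # reader is attached, so the consumer has to be started first.
        with open(fifo_path, "wb") as writer:
            writer.write(payload)
        consumer.join(timeout=10)
    # The temporary directory (and the FIFO in it) is removed at this point.
    return bytes(received)


fake_dump = b"\x00" * 200_000  # placeholder bytes, not a real UTXO dump
print(len(stream_through_fifo(fake_dump)), "bytes passed through the pipe")
```

The point of the design is that the data never has to be written to disk in full: the reader drains the pipe while the writer is still producing.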
  2. DrahtBot commented at 2:42 am on December 24, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31560.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Stale ACK tdb3

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31375 (multiprocess: Add bitcoin wrapper executable by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label RPC/REST/ZMQ on Dec 24, 2024
  4. theStack force-pushed on Dec 24, 2024
  5. DrahtBot added the label CI failed on Dec 24, 2024
  6. DrahtBot commented at 2:52 am on December 24, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/34820327020

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. theStack force-pushed on Dec 24, 2024
  8. theStack force-pushed on Dec 24, 2024
  9. DrahtBot removed the label CI failed on Dec 24, 2024
  10. in contrib/README.md:54 in 95c861f368 outdated
    49+
    50+### [UTXO-to-SQLite](/contrib/utxo-tools/utxo_to_sqlite.py) ###
    51+This script converts a compact-serialized UTXO set (as generated by Bitcoin Core with `dumptxoutset`)
    52+to a SQLite3 database. The coins are stored in a table with the following schema:
    53+```
    54+CREATE TABLE utxos(txid TEXT, vout INT, value INT, coinbase INT, height INT, scriptpubkey TEXT)
    


    tdb3 commented at 3:00 pm on December 24, 2024:
    nit: To prevent maintaining the same info (table structure) in multiple files, maybe we can create a link here to the opening comment in utxo_to_sqlite.py (describing the table)?

    theStack commented at 2:09 am on December 28, 2024:
    Good idea, done in #27432 (referring now to the module docstring, which is also visible as --help output of the script).
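For readers who want to try the schema quoted in the diff above, here is a small sketch that creates the table in an in-memory database and runs a query against it. The rows are made-up placeholders, not real coins, and the assumption that `value` holds an integer amount in satoshis is mine, not something stated in this thread. The helper names are hypothetical.

```python
import sqlite3

# Schema as quoted in the README diff above.
SCHEMA = (
    "CREATE TABLE utxos(txid TEXT, vout INT, value INT, "
    "coinbase INT, height INT, scriptpubkey TEXT)"
)


def build_example_db() -> sqlite3.Connection:
    """Create an in-memory database with a few placeholder rows."""
    con = sqlite3.connect(":memory:")
    con.execute(SCHEMA)
    rows = [
        # (txid, vout, value, coinbase, height, scriptpubkey): illustrative only
        ("aa" * 32, 0, 5_000_000_000, 1, 1, "51"),
        ("bb" * 32, 1, 12_345, 0, 150, "0014" + "cc" * 20),
        ("bb" * 32, 2, 1_000, 0, 150, "51"),
    ]
    con.executemany("INSERT INTO utxos VALUES (?, ?, ?, ?, ?, ?)", rows)
    con.commit()
    return con


def coinbase_summary(con: sqlite3.Connection) -> tuple:
    """Return (number of coinbase coins, sum of their values)."""
    cur = con.execute("SELECT COUNT(*), SUM(value) FROM utxos WHERE coinbase = 1")
    return cur.fetchone()


con = build_example_db()
print(coinbase_summary(con))
con.close()
```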
  11. in test/functional/tool_utxo_to_sqlite.py:124 in b5ca41cb06 outdated
    131+            os.mkfifo(fifo_filename)
    132+            output_direct_filename = os.path.join(self.options.tmpdir, "utxos_direct.sqlite")
    133+            p = subprocess.Popen([sys.executable, utxo_to_sqlite_path, fifo_filename, output_direct_filename],
    134+                                 stderr=subprocess.STDOUT)
    135+            node.dumptxoutset(fifo_filename, "latest")
    136+            p.wait()
    


    tdb3 commented at 4:28 pm on December 24, 2024:
    Rather than have this wait indefinitely, might be better to specify a timeout (e.g. CONVERSION_TIMEOUT = 60, p.wait(timeout=CONVERSION_TIMEOUT)). This could allow earlier failure detection (i.e. instead of relying on the longer CI timeout).

    theStack commented at 2:10 am on December 28, 2024:
    Added a fixed timeout of 10 seconds, which should be more than enough given the tiny regtest UTXO set. I felt that it’s not worth it to introduce a constant for that, but happy to add if others feel strongly.
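The timeout suggestion can be sketched as follows. This is a generic illustration of `Popen.wait(timeout=...)`, not the code in the functional test; the helper name `run_with_timeout` and the child commands are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_seconds):
    """Run a child process; return its exit code, or None if it timed out."""
    proc = subprocess.Popen(args, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)
    try:
        return proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Fail early instead of waiting for an outer (e.g. CI) timeout.
        proc.kill()
        proc.wait()  # reap the killed child
        return None


print(run_with_timeout([sys.executable, "-c", "pass"], 10))
```

In a test, a `None` result (or simply letting `TimeoutExpired` propagate) would be treated as a failure, which surfaces a hung converter much sooner than relying on the CI-level timeout.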
  12. in test/functional/tool_utxo_to_sqlite.py:112 in b5ca41cb06 outdated
    119-            utxo_ser += CTxOut(value, bytes.fromhex(spk_hex)).serialize()
    120-            muhash.insert(utxo_ser)
    121-        con.close()
    122-
    123-        muhash_sqlite = muhash.digest()[::-1].hex()
    124+        muhash_sqlite = calculate_muhash_from_sqlite_utxos(output_filename)
    


    tdb3 commented at 4:30 pm on December 24, 2024:
    Might be less churn to have commit d9a8a137b64f586422455318a9e757b1967a4f73 introduce this function instead of refactoring in this commit.

    theStack commented at 2:10 am on December 28, 2024:
    Great idea, done in #27432.
  13. in test/functional/tool_utxo_to_sqlite.py:2 in d9a8a137b6 outdated
    0@@ -0,0 +1,114 @@
    1+#!/usr/bin/env python3
    2+# Copyright (c) 2023 The Bitcoin Core developers
    


    tdb3 commented at 4:34 pm on December 24, 2024:
    2024-present
  14. in contrib/utxo-tools/dump_to_sqlite.sh:2 in 43fff2e9da outdated
    0@@ -0,0 +1,32 @@
    1+#!/usr/bin/env bash
    2+# Copyright (c) 2024 The Bitcoin Core developers
    


    tdb3 commented at 4:36 pm on December 24, 2024:
    2024-present
  15. tdb3 commented at 5:43 pm on December 24, 2024: contributor

    Approach ACK

    Great feature!

    Did some manual sanity testing on mainnet:

    • Used dumptxoutset to create a dump (with a node synced to block 876,186), utxo_to_sqlite.py to convert it to a sqlite file, and opened/parsed it in python. Conversion seemed successful; the correct number of coins was present in the table
    • Used dump_to_sqlite.sh to do the same with a fifo (but with a node synced to block 200,000), then opened/parsed the result in python. Conversion seemed successful; the correct number of coins was present in the table

    Left a few relatively small comments. May circle back and review utxo_to_sqlite.py in more detail as time allows.
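A sanity check of the kind described in the bullets above could look roughly like this. It compares the `coins_written` field of the dumptxoutset result (as shown in the PR description) with the row count of the `utxos` table. The helper name `coin_count_matches` and the tiny in-memory table are illustrative assumptions, not the reviewer's actual procedure.

```python
import json
import sqlite3


def coin_count_matches(con: sqlite3.Connection, rpc_result_json: str) -> bool:
    """Check that the table holds as many rows as the RPC reported coins."""
    expected = json.loads(rpc_result_json)["coins_written"]
    (actual,) = con.execute("SELECT COUNT(*) FROM utxos").fetchone()
    return actual == expected


# Placeholder data: two fake coins and a matching fake RPC result.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE utxos(txid TEXT, vout INT, value INT, "
    "coinbase INT, height INT, scriptpubkey TEXT)"
)
con.executemany(
    "INSERT INTO utxos VALUES (?, ?, ?, ?, ?, ?)",
    [("aa" * 32, 0, 1, 0, 10, "51"), ("bb" * 32, 0, 2, 0, 11, "51")],
)
print(coin_count_matches(con, '{"coins_written": 2}'))
```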

  16. contrib: add tool to convert compact-serialized UTXO set to SQLite database ec99ed7380
  17. test: add test for utxo-to-sqlite conversion script 4080b66cbe
  18. in test/functional/tool_utxo_to_sqlite.py:8 in 43fff2e9da outdated
    0@@ -0,0 +1,131 @@
    1+#!/usr/bin/env python3
    2+# Copyright (c) 2023 The Bitcoin Core developers
    3+# Distributed under the MIT software license, see the accompanying
    4+# file COPYING or http://www.opensource.org/licenses/mit-license.php.
    5+"""Test utxo-to-sqlite conversion tool"""
    6+import os
    7+try:
    8+    import sqlite3
    


    romanz commented at 12:08 pm on December 25, 2024:
    Is the import expected to fail on CI? If so, I am not sure that the test below can run successfully if sqlite3 is not available…

    theStack commented at 2:14 am on December 28, 2024:
    The idea is to skip a test rather than fail, if the sqlite3 module is not available (as far as I’m aware, the only supported distro where this could happen currently is FreeBSD). However, this indeed didn’t work as expected since I was using skip_if_no_sqlite rather than skip_if_no_py_sqlite3 (see #26882). Fixed in #27432.
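The "skip rather than fail" idea can be sketched generically. This is not the test framework's `skip_if_no_py_sqlite3` helper, whose implementation is not shown in this thread; `module_available`, `require_module` and the `SkipTest` exception below are hypothetical names used only for illustration.

```python
import importlib


class SkipTest(Exception):
    """Signals that a test should be reported as skipped, not failed."""


def module_available(name: str) -> bool:
    """Return True if the module can actually be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def require_module(name: str) -> None:
    """Raise SkipTest if an optional dependency is missing."""
    if not module_available(name):
        raise SkipTest(f"Python module {name!r} is not available")


print("sqlite3 importable:", module_available("sqlite3"))
```

Attempting the import itself (rather than only looking the module up) matters here, because a package can be present while its compiled extension is missing.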
  19. theStack force-pushed on Dec 28, 2024
  20. theStack commented at 2:24 am on December 28, 2024: contributor
    @tdb3 @romanz: Thanks for your reviews, much appreciated! Note that the first two commits which introduce the utxo-to-sqlite.py tool (+test) are part of the base PR #27432, so further comments on those changes would better fit there in the future. I took all of your suggestions and updated #27432 and rebased this PR on top of that again accordingly.
  21. tdb3 approved
  22. tdb3 commented at 4:53 pm on December 28, 2024: contributor

    ACK 59df8480be77a8e3618c9422536c4f8aed82467e

    Also re-ran tests in #31560#pullrequestreview-2522042088

  23. luke-jr commented at 5:59 am on January 7, 2025: member
    Can we make the exists/is_fifo/open atomic somehow? Seems liable to have a race here someday…
  24. theuni commented at 5:56 pm on January 8, 2025: member
    Not opposed to this, but it seems like a good use-case for a kernel util :) @thecharlatan @stickies-v @josibake
  25. luke-jr referenced this in commit 4c23b86973 on Jan 15, 2025
  26. theStack force-pushed on Jan 18, 2025
  27. rpc: support writing UTXO set dump (`dumptxoutset`) to a named pipe
    This allows external tooling (e.g. converters) to consume the output
    directly, rather than having to write the dump to disk first and then
    read it from there again.
    
    Co-authored-by: Luke Dashjr <luke-jr+git@utopios.org>
    61cef65ed1
  28. test: add test for utxo-to-sqlite conversion using named pipe 6341a92820
  29. contrib: add script dump_to_sqlite.sh for direct SQLite3 UTXO dump 53217bd33a
  30. theStack force-pushed on Jan 18, 2025
  31. DrahtBot added the label CI failed on Jan 18, 2025
  32. DrahtBot commented at 4:34 am on January 18, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35809942510

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  33. theStack commented at 4:40 am on January 18, 2025: contributor

    Can we make the exists/is_fifo/open atomic somehow? Seems liable to have a race here someday…

    eg luke-jr@56ee485

    @luke-jr: Thanks, applied this to https://github.com/bitcoin/bitcoin/pull/31560/commits/61cef65ed1bf43d7d48ed8257441e697d6c14171. Note that I had to introduce a fs::exists helper for file_status, as direct usage of std::filesystem::exists was prohibited by the linter (which failed with “Direct use of std::filesystem may be dangerous and buggy. Please include <util/fs.h> and use the fs:: namespace, which has unsafe filesystem functions marked as deleted.”).
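One common way to narrow the exists/is_fifo race discussed here is to query the file status once and derive both answers from that single result, instead of making two separate filesystem calls. The sketch below shows that idea in Python; it is an assumption about the general technique, not a transcription of the referenced commit, and `classify_path` is a hypothetical name.

```python
import os
import stat
import tempfile


def classify_path(path: str) -> str:
    """Return "missing", "fifo" or "other" based on a single stat call."""
    try:
        st = os.stat(path)  # one filesystem query
    except FileNotFoundError:
        return "missing"
    # Both "does it exist" and "is it a FIFO" come from the same snapshot,
    # so the two answers cannot disagree with each other.
    return "fifo" if stat.S_ISFIFO(st.st_mode) else "other"


with tempfile.TemporaryDirectory() as tmpdir:
    fifo = os.path.join(tmpdir, "utxos.fifo")
    os.mkfifo(fifo)  # POSIX only
    print(classify_path(fifo))
    print(classify_path(os.path.join(tmpdir, "missing.dat")))
```

Note that this only makes the two checks consistent with each other; a window between the status query and the subsequent open still remains.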

  34. DrahtBot removed the label CI failed on Jan 18, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 09:12 UTC
