RFC: support for writing UTXO set dump (dumptxoutset RPC) to a named pipe #31373

theStack opened this issue on November 26, 2024
  1. theStack commented at 2:01 pm on November 26, 2024: contributor

    Please describe the feature you’d like to see added.

    While the primary obvious use-case for the dumptxoutset RPC is to create AssumeUTXO snapshots (to be distributed and loaded on newly created nodes via the loadtxoutset RPC later), it can also be useful as input for external tooling like converters to other UTXO set formats, e.g. #27432. For those, the intermediate step of writing a >10GB file to disk and then reading it again is wasteful and annoying, as it consumes both more time and space than necessary. By supporting writing to a named pipe, the output data could be fed directly into another process instead. Thanks to the UNIX “everything is a file” philosophy, no logic changes in the tooling are even needed – the reader only sees an input stream and doesn’t notice or care if the input file represents an actual physical file on disk or if the data is generated on-the-fly from another process.

    Currently needed steps for external tools:

    1. call dumptxoutset to create utxo.dump (>10GB on mainnet)
    2. call external tool with utxo.dump as input (run only after step 1 is finished)
    3. delete utxo.dump

    Needed steps for external tools with named pipe support:

    1. create a named pipe utxo.pipe (e.g. via https://linux.die.net/man/3/mkfifo)
    2. call dumptxoutset to write to utxo.pipe
    3. call external tool with utxo.pipe as input (run in parallel to step 2)
    4. delete utxo.pipe
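
The four steps above can be sketched in Python on a POSIX system. This is only an illustration under stated assumptions: `produce` stands in for dumptxoutset and `consume` for the external tool, and both names (like `stream_through_fifo`) are hypothetical. The point it shows is that the reader opens the pipe like any regular file while the writer runs in parallel.

```python
import os
import tempfile
import threading


def produce(path: str, payload: bytes) -> None:
    # Stand-in for dumptxoutset (assumption): opening a FIFO for writing
    # blocks until some reader has opened the other end.
    with open(path, "wb") as f:
        f.write(payload)


def consume(path: str) -> bytes:
    # Stand-in for the external tool (assumption): it reads the path like a
    # regular file and sees end-of-file once the writer closes the pipe.
    with open(path, "rb") as f:
        return f.read()


def stream_through_fifo(payload: bytes) -> bytes:
    workdir = tempfile.mkdtemp()
    fifo = os.path.join(workdir, "utxo.pipe")
    os.mkfifo(fifo)  # step 1: create the named pipe
    writer = threading.Thread(target=produce, args=(fifo, payload))
    writer.start()  # step 2: the writer runs in parallel
    try:
        data = consume(fifo)  # step 3: the reader drains the pipe
        writer.join()
    finally:
        os.remove(fifo)  # step 4: delete the pipe
        os.rmdir(workdir)
    return data
```

Nothing is stored on disk in between: the data only passes through the kernel's pipe buffer, so the writer can never get far ahead of the reader.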

    I tried this yesterday and it works as expected with minimal changes (see proposed solution below). I will push the branch later with concrete instructions if people feel that this is worthwhile to support.

    No response

    Describe the solution you’d like

    The dumptxoutset call only needs two minor behaviour modifications. If the passed path is a named pipe (trivially detectable via the C++ standard library routine std::filesystem::is_fifo), then:

    • don’t error if the file already exists
    • don’t create a temporary file with .incomplete suffix, but write directly into the specified path
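
The two rules above amount to a small decision on the target path. The following Python sketch mirrors that decision using `pathlib.Path.is_fifo` in place of `std::filesystem::is_fifo`. The helper name `choose_write_path` is hypothetical, and the non-pipe branch only reflects the behaviour described in this issue (error on an existing file, temporary file with `.incomplete` suffix), not the actual Bitcoin Core code.

```python
from pathlib import Path


def choose_write_path(path: str) -> Path:
    """Return the path the dump would be written to (hypothetical helper)."""
    target = Path(path)
    if target.is_fifo():
        # Named pipe: it already exists by design, and the data is written
        # directly into it instead of into a temporary file.
        return target
    if target.exists():
        # Regular file: refuse to overwrite an existing dump.
        raise FileExistsError(f"{target} already exists")
    # Regular file: write to a temporary file first, to be renamed when done.
    return target.with_name(target.name + ".incomplete")
```

`Path.is_fifo` returns False for a path that does not exist, so a missing file falls through to the regular-file branch.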

    Describe any alternatives you’ve considered

    No response

    Please leave any additional context

    No response

  2. theStack added the label Feature on Nov 26, 2024
  3. willcl-ark commented at 12:21 pm on November 28, 2024: member
    Sounds like a reasonable idea to me.
  4. pythcoiner commented at 1:56 pm on December 21, 2024: none

    i’ll review/test your PR when ready.

    i think 1, 3 and 4 could/should be optional as it also can be handled by the process reading the pipe

  5. theStack commented at 9:10 pm on December 22, 2024: contributor

    > i’ll review/test your PR when ready.
    >
    > i think 1, 3 and 4 could/should be optional as it also can be handled by the process reading the pipe

    Yes, I agree that named pipe creation (1) / deletion (4) and the call of the tool (3) shouldn’t be done by the RPC itself. My plan was to base the change on #27432 and then provide an additional mode where the user has to provide the path to the bitcoin-cli binary rather than an input file, which would roughly do the following:

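    fifoname=/tmp/txoutset.fifo
    bitcoincli_bin=$1
    output_file=$2
    mkfifo "$fifoname"
    # run the dump in the background: it blocks until a reader opens the pipe
    "$bitcoincli_bin" dumptxoutset "$fifoname" &
    ./contrib/utxo-tools/utxo-to-sqlite.py "$fifoname" "$output_file"
    wait  # wait for the background dumptxoutset call to return
    rm "$fifoname"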
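
For a wrapper written in Python rather than shell, the same orchestration could look roughly like the sketch below. It is an assumption-laden illustration, not the planned implementation: `run_through_fifo` is a hypothetical helper, and it takes the writer and reader commands as argument lists so that nothing about bitcoin-cli itself is hard-coded.

```python
import os
import subprocess


def run_through_fifo(fifo, producer_cmd, consumer_cmd):
    """Run two commands in parallel, connected by a named pipe at `fifo`."""
    os.mkfifo(fifo)
    try:
        # Start the writer without waiting for it: it blocks until the pipe
        # has a reader, so it must not run in the foreground.
        producer = subprocess.Popen(producer_cmd)
        try:
            # The reader runs in the foreground until it sees end-of-file.
            consumer_rc = subprocess.run(consumer_cmd).returncode
        except BaseException:
            # If the reader could not be started, do not leave the writer
            # blocked on the pipe forever.
            producer.kill()
            raise
        finally:
            producer_rc = producer.wait()
    finally:
        os.remove(fifo)
    return producer_rc, consumer_rc
```

With named pipe support in place, the writer command would be the bitcoin-cli dumptxoutset call on the pipe and the reader command the utxo-to-sqlite.py call on the same pipe plus the output file, as in the shell outline above.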
    

    Will open a PR shortly.

  6. pythcoiner commented at 3:03 am on December 23, 2024: none
    i’ll take a look at 27432 then

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 06:12 UTC