rpc: allow dumptxoutset to dump human-readable data #24202

pull w0xlt wants to merge 2 commits into bitcoin:master from w0xlt:feature-utxo-ascii changing 5 files +153 −22
  1. w0xlt commented at 4:25 pm on January 29, 2022: contributor

    As #18689 is labeled “Up for grabs” , this PR proposes a modified version of it.

    The changes (from the original) are : . RPC has been simplified. It now has only one additional parameter (human_readable=true/false). . The CreateUTXOSnapshot signature has been simplified, with reduced number of parameters and coinascii_cb_t data type has been removed. . The functional test verifies the content of the human-readable file.

    The generated file, however, is the same.

    To test: ./src/bitcoin-cli dumptxoutset utxo.txt true

  2. w0xlt force-pushed on Jan 29, 2022
  3. DrahtBot added the label RPC/REST/ZMQ on Jan 29, 2022
  4. w0xlt force-pushed on Jan 29, 2022
  5. w0xlt force-pushed on Jan 30, 2022
  6. in src/rpc/blockchain.cpp:2730 in 4ad72ef992 outdated
    2726+            HelpExampleCli("dumptxoutset", "utxo.dat") +
    2727+            HelpExampleCli("dumptxoutset", "utxo.dat true")
    2728         },
    2729         [&](const RPCHelpMan& self, const JSONRPCRequest& request) -> UniValue
    2730 {
    2731+    const bool is_human_readable = !request.params[1].isNull() && request.params[1].get_bool();
    


    shaavan commented at 10:56 am on January 30, 2022:

    nit: I think it’s better to print the default value as literal. This will make it easier to change the value (in case it is needed to be changed) without changing the logic.

    0    const bool is_human_readable{request.params[1].isNull() ? false : request.params[1].get_bool()};
    
  7. in src/rpc/blockchain.cpp:2775 in 4ad72ef992 outdated
    2770@@ -2762,6 +2771,9 @@ UniValue CreateUTXOSnapshot(
    2771     CCoinsStats stats{CoinStatsHashType::HASH_SERIALIZED};
    2772     CBlockIndex* tip;
    2773 
    2774+    // used when human readable format is requested
    2775+    const std::string& separator = ",";
    


    shaavan commented at 11:01 am on January 30, 2022:

    nit:

    0    const std::string& separator{","};
    
  8. in src/rpc/blockchain.cpp:2817 in 4ad72ef992 outdated
    2814-    afile << metadata;
    2815+        for(const auto& text : { "txid", "vout", "value", "coinbase", "height", "scriptPubKey" }) {
    2816+            afile.write(separator);
    2817+            afile.write(text);
    2818+        }
    2819+
    


    shaavan commented at 11:03 am on January 30, 2022:

    nit: I think this extra line can be removed.

  9. shaavan commented at 11:11 am on January 30, 2022: contributor

    Concept ACK

    I like the idea of allowing the option to dumptxoutset in human-readable data. Also, at first glance, the code looks very clean. Great work! @w0xlt.

    I shall post my complete review after thoroughly reviewing this PR. In the meantime, I found some nit suggestions that might help make this PR even better.

  10. w0xlt force-pushed on Jan 30, 2022
  11. in src/rpc/blockchain.cpp:2711 in 4cef616000 outdated
    2704@@ -2705,9 +2705,10 @@ static RPCHelpMan dumptxoutset()
    2705 {
    2706     return RPCHelpMan{
    2707         "dumptxoutset",
    2708-        "Write the serialized UTXO set to disk.",
    2709+        "Write the UTXO set to disk.",
    2710         {
    2711             {"path", RPCArg::Type::STR, RPCArg::Optional::NO, "Path to the output file. If relative, will be prefixed by datadir."},
    2712+            {"human_readable", RPCArg::Type::BOOL, RPCArg::Default{false}, "Dump file is created in human-readable format"}
    


    luke-jr commented at 4:03 am on January 31, 2022:
    I think it would be better to have a non-Boolean param for this. Not only are Booleans terrible for positional arguments, but it’s quite plausible in the future the format might change, and specifying a format version Number or format String would help ease the transition from the deprecated format.

    shaavan commented at 8:22 am on January 31, 2022:

    @luke-jr

    What do you think about shifting this boolean param from positional to option argument?


    luke-jr commented at 9:06 am on January 31, 2022:
    I still don’t think a boolean is a good fit either way.
  12. luke-jr changes_requested
  13. in test/functional/rpc_dumptxoutset.py:36 in 4cef616000 outdated
    18@@ -18,16 +19,12 @@ def set_test_params(self):
    19         self.setup_clean_chain = True
    20         self.num_nodes = 1
    21 
    22-    def run_test(self):
    23-        """Test a trivial usage of the dumptxoutset RPC command."""
    24+    def test_dump_file(self, filename, is_human_readable, expected_digest):
    25+
    


    brunoerg commented at 4:29 pm on January 31, 2022:

    Feel free to ignore: I’d remove this blank line


    w0xlt commented at 2:33 am on February 9, 2022:
    Done. Suggestion accepted.
  14. in test/functional/rpc_dumptxoutset.py:48 in 4cef616000 outdated
    45 
    46         assert_equal(
    47             out['txoutset_hash'], 'd4b614f476b99a6e569973bf1c0120d88b1a168076f8ce25691fb41dd1cef149')
    48         assert_equal(out['nchaintx'], 101)
    49 
    50         # Specifying a path to an existing file will fail.
    


    brunoerg commented at 4:38 pm on January 31, 2022:
    0        self.log.info("Test that a path to an existing file will fail")
    

    I would put some ’logs’ on this test to follow each step when the test is running.


    w0xlt commented at 2:33 am on February 9, 2022:
    Done. Suggestion accepted.
  15. brunoerg approved
  16. brunoerg commented at 5:19 pm on January 31, 2022: contributor

    tACK 4cef616000bdd92b5fc99b4367fa6c0e88204096 (MacOS 12)

    ./src/bitcoin-cli dumptxoutset utxo.txt true

    0{
    1  "coins_written": 40495,
    2  "base_hash": "000000002cb3254c66a5510fe4b921d5e5d72e85b5111455365f519dee915ce3",
    3  "base_height": 48205,
    4  "path": "/Users/brunogarcia/Library/Application Support/Bitcoin/utxo.txt",
    5  "txoutset_hash": "a4d1ef75d84b1dbda8df305661ef8e8bbcc3fc86e12e4d425255eaf04030100c",
    6  "nchaintx": 48821
    7}
    

    I checked the file content and seems right.

    Ran again and got:

    0error code: -8
    1error message:
    2/Users/brunogarcia/Library/Application Support/Bitcoin/utxo.txt already exists. If you are sure this is what you want, move it out of the way first 
    
  17. in src/rpc/blockchain.cpp:2814 in 4cef616000 outdated
    2811+    } else {
    2812+        afile.write("#(blockhash " + tip->GetBlockHash().ToString() + ")");
    2813 
    2814-    afile << metadata;
    2815+        for(const auto& text : { "txid", "vout", "value", "coinbase", "height", "scriptPubKey" }) {
    2816+            afile.write(separator);
    


    luke-jr commented at 1:19 am on February 8, 2022:

    It doesn’t make sense to have the blockhash “comment” line part of the headers, separator and all.

    Looking at #18689, it wasn’t much better: had weird spacing after the block hash (" ) ") and concatenating the headers immediately after that with no separator.

    Instead, based on other CSVs I’ve seen, I suggest:

    0#(blockhash nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn)
    1txid,vout,value,coinbase,height,scriptPubKey
    

    w0xlt commented at 6:27 pm on February 10, 2022:
    Done.
  18. in src/streams.h:632 in 4cef616000 outdated
    625@@ -626,6 +626,11 @@ class CAutoFile
    626         }
    627     }
    628 
    629+    void write(const std::string& s)
    630+    {
    631+        write(MakeByteSpan(s));
    632+    }
    


    luke-jr commented at 7:35 am on February 8, 2022:
    Probably better to do this inside CreateUTXOSnapshot

    luke-jr commented at 8:51 pm on February 8, 2022:
    eg https://github.com/luke-jr/bitcoin/commit/5ddb75caa0ab2e3be306b96132e3b9af7ba62a40 (but since you don’t have all the params in this PR, it will take a bit of rebasing)

    w0xlt commented at 2:33 am on February 9, 2022:
    Done. Thanks.
  19. in src/rpc/blockchain.cpp:2775 in 4cef616000 outdated
    2770@@ -2762,6 +2771,9 @@ UniValue CreateUTXOSnapshot(
    2771     CCoinsStats stats{CoinStatsHashType::HASH_SERIALIZED};
    2772     CBlockIndex* tip;
    2773 
    2774+    // used when human readable format is requested
    2775+    const std::string& separator{","};
    


    luke-jr commented at 7:41 am on February 8, 2022:

    Can do a line_separator (`"\n") here too…

    0    const auto separator = MakeByteSpan(",").first(1);
    1    const auto line_separator = MakeByteSpan("\n").first(1);
    

    w0xlt commented at 2:32 am on February 9, 2022:
    Done.
  20. luke-jr changes_requested
  21. in test/functional/rpc_dumptxoutset.py:70 in 4cef616000 outdated
    68+        self.generate(node, COINBASE_MATURITY)
    69+
    70+        txt_file_digest = '5bc8a9c14d1f6d89833342dcd6014bdf9ddb5f19e3741760da6d6d666971df41'
    71+
    72+        if os.name == 'nt':
    73+            txt_file_digest = 'dff298e77a80bb6e292526c0729d78fdef68a537c372563b5f5dbfadddabc780'
    


    luke-jr commented at 7:47 am on February 8, 2022:
    This is kind of an ugly hack. Maybe we just need to open the file in text mode and hash it as if it were UNIX format? :/

    luke-jr commented at 8:50 pm on February 8, 2022:

    Python’s text mode reader is overly tolerant, so wouldn’t check the EOL used is correct.

    Instead, I ended up with https://github.com/luke-jr/bitcoin/commit/58e0d32de6a444bd73d30355ffe17dc7ed96e738 - but it may be better off just leaving this alone for this PR (unless we care about the added flexibility of supporting any arbitrary platform)


    w0xlt commented at 2:31 am on February 9, 2022:
    Done. Thanks for the detailed review.
  22. w0xlt marked this as a draft on Feb 9, 2022
  23. rpc: allow dumptxoutset to dump human-readable data
    Co-authored-by: Shashwat Vangani <shaavan.github@gmail.com>
    Co-authored-by: Luke Dashjr <luke-jr+git@utopios.org>
    25ac6b5a36
  24. test: add test for dump human-readable dumptxoutset
    Co-authored-by: Luke Dashjr <luke-jr+git@utopios.org>
    Co-authored-by: brunoerg <brunoely.gc@gmail.com>
    1053636ddd
  25. w0xlt force-pushed on Feb 9, 2022
  26. w0xlt commented at 2:31 am on February 9, 2022: contributor

    @luke-jr , I accepted your suggestions. While I personally prefer simpler RPC (something like dumptxoutset utxo.csv "csv", as I proposed earlier), I don’t have a strong opinion on this. If other reviewers want to discuss it, we can eventually change it.

    I tried merging your commits first, but when I dropped the previous commits, this caused a lot of conflicts. So I reset and re-commit. If you prefer another approach, let me know.

  27. w0xlt marked this as ready for review on Feb 9, 2022
  28. DrahtBot commented at 1:23 pm on February 24, 2022: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #24732 (Remove buggy and confusing IncrementExtraNonce by MarcoFalke)
    • #24456 (blockman: Properly guard blockfile members by dongcarl)
    • #22564 (refactor: Move mutable globals cleared in ::UnloadBlockIndex to BlockManager by dongcarl)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  29. theStack commented at 2:30 pm on March 10, 2022: contributor

    Concept ACK

    I think on the long-term writing a database (e.g. sqlite) would be even more helpful, as a text file in the size of multiple giga-bytes is relatively hard to handle without an index. But this is obviously out-of-scope for this PR and probably can also be solved by a script that creates the database with the human-readable data as input.

  30. DrahtBot added the label Needs rebase on Apr 6, 2022
  31. DrahtBot commented at 10:05 am on April 6, 2022: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  32. DrahtBot commented at 7:41 am on July 25, 2022: contributor
    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  33. achow101 commented at 7:01 pm on October 12, 2022: member
    Closing this as it has not had any activity in a while. If you are interested in continuing work on this, please leave a comment so that it can be reopened.
  34. achow101 closed this on Oct 12, 2022

  35. adamjonas added the label Up for grabs on Mar 9, 2023
  36. bitcoin locked this on Mar 8, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-05 19:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me