rpc: allow dumptxoutset to dump human-readable data #18689

pull pierreN wants to merge 1 commits into bitcoin:master from pierreN:feature-utxo-ascii changing 6 files +126 −30
  1. pierreN commented at 7:40 pm on April 17, 2020: contributor

    Adds additional optional arguments to dumptxoutset. If any are present, a human-readable file is written to disk instead of the compact binary serialized form currently in use. This does not change the current default behavior of dumptxoutset.


    Thanks to the future assumeutxo feature (#15605), we now have a dumptxoutset RPC (#16899) which can write the whole UTXO set to disk. However, the current format, although compact, is not easily readable by standard tools (e.g. for someone who would like to study the UTXO set). Plus this binary format might change in the future AFAIK.

    Giving power users an easy way to get a human-readable dump of the UTXO set would be a useful feature. It would also replace hackish third-party tools that can have side effects.

    On my machine (slow SSD):

    • dumping the whole original 4 GB binary UTXO set takes around 1 min 40 s
    • dumping the full set in ASCII form takes less than 9 GB and 3 min 30 s (file size and time depend on which ASCII fields you write to disk; you select them via the format argument).

    Thanks!

  2. hebasto commented at 8:03 pm on April 17, 2020: member
    What are possible/expected use cases?
  3. MarcoFalke commented at 8:09 pm on April 17, 2020: member
    Concept ACK
  4. DrahtBot added the label RPC/REST/ZMQ on Apr 17, 2020
  5. DrahtBot added the label Tests on Apr 17, 2020
  6. DrahtBot commented at 10:57 pm on April 17, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #21850 (Remove GetDataDir(net_specific) function by kiminuo)
    • #21526 (validation: UpdateTip/CheckBlockIndex assumeutxo support by jamesob)
    • #20664 (Add scanblocks RPC call by jonasschnelli)
    • #20295 (rpc: getblockfrompeer by Sjors)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  7. pierreN commented at 2:16 am on April 18, 2020: contributor

    @hebasto the most common use case will be for users to easily study the whole UTXO set in only a few minutes.

    With this PR it should be really easy to dump any format you want via the format parameter (type of data, number of occurrences and order are all respected). Also, just adding one element to the ascii_types vector allows you to dump any new type of ASCII data.
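A minimal sketch of how such a name-to-callback table could drive the output. `Outpoint`, `Utxo`, `kAsciiTypes` and `FormatLine` are hypothetical stand-ins for illustration, not the PR's code; only the overall shape (a vector of name/callback pairs, looked up once per requested item, with order and repeats preserved) follows the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the node's outpoint and coin types.
struct Outpoint { std::string txid; uint32_t vout; };
struct Utxo { int64_t value; int height; bool coinbase; };

using Callback = std::function<std::string(const Outpoint&, const Utxo&)>;

// One entry per dumpable item; supporting a new item means adding one line here.
static const std::vector<std::pair<std::string, Callback>> kAsciiTypes = {
    {"txid", [](const Outpoint& o, const Utxo&) { return o.txid; }},
    {"vout", [](const Outpoint& o, const Utxo&) { return std::to_string(o.vout); }},
    {"value", [](const Outpoint&, const Utxo& c) { return std::to_string(c.value); }},
    {"coinbase", [](const Outpoint&, const Utxo& c) { return std::to_string(c.coinbase ? 1 : 0); }},
    {"height", [](const Outpoint&, const Utxo& c) { return std::to_string(c.height); }},
};

// Builds one output line, keeping the caller's order and any repeated items.
std::string FormatLine(const std::vector<std::string>& requested, const std::string& separator,
                       const Outpoint& outpoint, const Utxo& coin)
{
    const std::unordered_map<std::string, Callback> lookup(kAsciiTypes.begin(), kAsciiTypes.end());
    std::string line;
    for (std::size_t i = 0; i < requested.size(); ++i) {
        const auto it = lookup.find(requested[i]);
        if (it == lookup.end()) throw std::invalid_argument("unknown item: " + requested[i]);
        if (i > 0) line += separator;
        line += it->second(outpoint, coin);
    }
    return line;
}
```

With these stand-ins, `FormatLine({"vout", "txid", "vout"}, ":", outpoint, coin)` emits the vout twice, once on each side of the txid, which is the "number of occurrences and order" behaviour described above.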

    For example, you can now trivially plot a graph of when current UTXOs were created (took 3 min 30 s on my machine):

    $ bitcoin-cli dumptxoutset utxos.dat '["height","value"]' false ' '
    $ awk '{if($1 in map) { map[$1] += $2; } else { map[$1] = $2 }} END { for(height in map) { print height, map[height]; }}' utxos.dat | sort -n > plot.dat
    $ gnuplot -e "plot 'plot.dat' w l; pause -1;"
    

    [image: gnuplot line plot of total unspent value by creation height]
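For readers who prefer not to use awk, the same per-height aggregation can be sketched in C++. This is an illustration only: `SumValueByHeight` is a hypothetical helper, and it assumes space-separated "height value" lines as produced by the command above.

```cpp
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Sums the second column (value in satoshis) per first column (height).
// Assumes space-separated "height value" lines; a line that does not parse
// as two numbers (for example a header) is skipped.
std::map<int, int64_t> SumValueByHeight(std::istream& in)
{
    std::map<int, int64_t> totals; // std::map iterates in height order, like `sort -n`
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int height;
        int64_t value;
        if (fields >> height >> value) totals[height] += value;
    }
    return totals;
}
```

Passing a `std::ifstream` opened on `utxos.dat` and printing each key/value pair would produce the same two-column data that the awk pipeline writes to `plot.dat`.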

    Another example would be to sum the amount of coinbase values in the UTXO set (took 2 min 30 s):

    $ bitcoin-cli dumptxoutset utxos.dat '["coinbase","value"]' false ' '
    $ fgrep "1 " utxos.dat | awk '{sum += $2} END {print sum/100000000, "btc unspent coinbase"}'
    1.76077e+06 btc unspent coinbase
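The total above is rounded because awk prints non-integer numbers with its default `%.6g` format. A minimal C++ sketch that keeps the sum in integer satoshis is shown here; `SumCoinbaseValue` and `FormatBtc` are hypothetical helpers, and the input is assumed to be space-separated "coinbase value" lines with no header.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Adds up the value column (satoshis) of every line whose coinbase column is 1.
int64_t SumCoinbaseValue(std::istream& in)
{
    int64_t total = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int coinbase;
        int64_t value;
        if (fields >> coinbase >> value && coinbase == 1) total += value;
    }
    return total;
}

// Renders a non-negative satoshi amount as BTC with all 8 decimal places.
std::string FormatBtc(int64_t satoshis)
{
    std::string frac = std::to_string(satoshis % 100000000);
    frac.insert(0, 8 - frac.size(), '0'); // left-pad the fractional part with zeros
    return std::to_string(satoshis / 100000000) + "." + frac;
}
```

Working in integers avoids both the rounding and the scientific notation, at the cost of a few more lines than the shell one-liner.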
    

    Or really, just anything the user wants. With some more work from the user, it could also make it easier to track some indicators (such as bootstrapping/syncing SOPR).

    Another example: doing bitcoin file archaeology. Since all methods that use scriptPubKey to etch data on the blockchain spam the UTXO set, you can retrieve a superset of all TXIDs of “files stored in the blockchain” via this RPC call.

  8. DrahtBot added the label Needs rebase on Apr 30, 2020
  9. pierreN force-pushed on May 1, 2020
  10. pierreN force-pushed on May 1, 2020
  11. pierreN commented at 5:09 am on May 1, 2020: contributor
    rebased cd20cb8
  12. DrahtBot removed the label Needs rebase on May 1, 2020
  13. brakmic commented at 1:50 pm on May 1, 2020: contributor

    ACK cd20cb886deb0ef91ab89c66bbfb511e89eb77ee

    Built, run and tested on macOS Catalina 10.15.4

    ./test/functional/rpc_dumptxoutset.py
    2020-05-01T13:40:08.556000Z TestFramework (INFO): Initializing test directory /var/folders/7q/4ffytzk562dd2ky4bfg9_w7h0000gn/T/bitcoin_func_test_oc2tzp0y
    2020-05-01T13:40:11.072000Z TestFramework (INFO): no_option
    2020-05-01T13:40:11.113000Z TestFramework (INFO): all_data
    2020-05-01T13:40:11.216000Z TestFramework (INFO): partial_data_1
    2020-05-01T13:40:11.304000Z TestFramework (INFO): partial_data_order
    2020-05-01T13:40:11.369000Z TestFramework (INFO): partial_data_double
    2020-05-01T13:40:11.447000Z TestFramework (INFO): no_header
    2020-05-01T13:40:11.537000Z TestFramework (INFO): separator
    2020-05-01T13:40:11.617000Z TestFramework (INFO): all_options
    2020-05-01T13:40:11.748000Z TestFramework (INFO): Stopping nodes
    2020-05-01T13:40:12.313000Z TestFramework (INFO): Cleaning up /var/folders/7q/4ffytzk562dd2ky4bfg9_w7h0000gn/T/bitcoin_func_test_oc2tzp0y on exit
    2020-05-01T13:40:12.313000Z TestFramework (INFO): Tests successful
    
    ./src/bitcoin-cli -regtest dumptxoutset dump.dat '["txid", "vout"]' false ':'
    {
      "coins_written": 407,
      "base_hash": "01ba165996f7a7899e56b37584398adb892a5df7566b95e8de457ab588784740",
      "base_height": 407,
      "path": "/Users/brakmic/Library/Application Support/Bitcoin/regtest/dump.dat"
    }
    
    cat "/Users/brakmic/Library/Application Support/Bitcoin/regtest/dump.dat"
    208c48f15ed2971709d81da915b72255e50b9251c558dc45981632ed6e4cd300:0
    338e1fde4b86e2daaba2bd7cb4f8d77e600f47e7814645aafb480f56f4f41103:0
    e73b0564bd56d359bd8df64fa3b9fd8586c3ff0430081aff1f97a9600c834403:0
    23ff11ec2801f1c4838fc19863f7fa8d9283ac29e644180a6eaba160fd2a9c03:0
    03ba026c466ab490a19b0aa8a39abeeccc6cff24d4a24a34d5f4304ae21e5304:0
    98e8ceec62fb6442acaa939461482a46d7c03968082430661854815571eca204:0
    4b84d555d8a19ab3cc38152e446fdbd059ec535ab67806a61628f238e495ff04:0
    [...snip...]
    
  14. luke-jr referenced this in commit 8cf4bf7651 on Jun 9, 2020
  15. in src/rpc/blockchain.cpp:2298 in cd20cb886d outdated
    2294+    const std::string separator = request.params[3].isNull() ? "," : request.params[3].get_str();
    2295+    std::vector<std::pair<std::string, cb_t>> requested;
    2296+    if (!is_compact) {
    2297+        const auto& arr = request.params[1].get_array();
    2298+        const std::unordered_map<std::string, cb_t> ascii_map(std::begin(ascii_types), std::end(ascii_types));
    2299+        for(auto i = 0; i < arr.size(); ++i) {
    


    luke-jr commented at 11:51 pm on June 9, 2020:

    auto doesn’t really work in this context…

    warning: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
    

    pierreN commented at 8:57 am on June 14, 2020:

    Ha, funny that the compiler can emit the warning but doesn’t pick a matching type for i. I guess the 0 must confuse it.

    Thanks for catching this, I’ve just updated the branch.
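The warning follows from how `auto` deduces types: it looks only at the initializer, and the literal `0` has type `int`, so `auto i = 0` declares an `int` regardless of what it is later compared against. A minimal sketch of two common fixes is shown here, using a plain `std::vector` as a stand-in for the UniValue array; the function names are illustrative and are not taken from the PR.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// `auto` deduces from the initializer only, and the literal 0 is an int.
static_assert(std::is_same<decltype(0), int>::value, "literal 0 is int");

// Fix 1: give the index the container's unsigned size type explicitly.
std::size_t CountNonEmptyIndexed(const std::vector<std::string>& arr)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < arr.size(); ++i) {
        if (!arr[i].empty()) ++count;
    }
    return count;
}

// Fix 2: avoid the index altogether with a range-based for loop.
std::size_t CountNonEmptyRanged(const std::vector<std::string>& arr)
{
    std::size_t count = 0;
    for (const auto& item : arr) {
        if (!item.empty()) ++count;
    }
    return count;
}
```

Either form compiles without the `-Wsign-compare` warning, because no signed value is compared with `size()`.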

  16. luke-jr referenced this in commit 6c0bf8881c on Jun 10, 2020
  17. pierreN force-pushed on Jun 14, 2020
  18. jamesob commented at 1:48 am on August 26, 2020: member
    Cool, I’ll take a look in the next few days.
  19. DrahtBot added the label Needs rebase on Sep 22, 2020
  20. jamesob commented at 3:02 pm on November 19, 2020: member
    Concept ACK - will review soon.
  21. MarcoFalke removed the label Tests on Nov 20, 2020
  22. luke-jr referenced this in commit c0f780cda5 on Nov 25, 2020
  23. in src/rpc/blockchain.cpp:2297 in 82046cf7fa outdated
    2293+    const bool show_header = request.params[2].isNull() || request.params[2].get_bool();
    2294+    const std::string separator = request.params[3].isNull() ? "," : request.params[3].get_str();
    2295+    std::vector<std::pair<std::string, cb_t>> requested;
    2296+    if (!is_compact) {
    2297+        const auto& arr = request.params[1].get_array();
    2298+        const std::unordered_map<std::string, cb_t> ascii_map(std::begin(ascii_types), std::end(ascii_types));
    


    luke-jr commented at 11:58 pm on November 26, 2020:

    I’m not sure if it’s a compiler bug or PR bug, but ascii_types is invalid here when compiled with GCC 9.3.0.

    ==48759== Thread 22 b-httpworker.3:
    ==48759== Invalid read of size 8
    ==48759==    at 0x4BC68C: __gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> > const*, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> > > > >::__normal_iterator(std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> > const* const&) (stl_iterator.h:807)
    ==48759==    by 0x4BC637: std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> > > >::begin() const (stl_vector.h:818)
    ==48759==    by 0x4BA12F: decltype (({parm#1}.begin)()) std::begin<std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> > > > >(std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (COutPoint const&, Coin const&)> > > > const&) (range_access.h:59)
    ==48759==    by 0x48EC97: dumptxoutset()::$_37::operator()(RPCHelpMan const&, JSONRPCRequest const&) const (blockchain.cpp:2692)
    
  24. luke-jr referenced this in commit 05d8ff8877 on Nov 30, 2020
  25. in src/rpc/blockchain.cpp:2257 in 82046cf7fa outdated
    2252+        // add any other desired items here
    2253+    };
    2254+
    2255+    std::vector<RPCArg> ascii_args;
    2256+    std::transform(std::begin(ascii_types), std::end(ascii_types), std::back_inserter(ascii_args),
    2257+            [](const std::pair<std::string, cb_t>& t) { return RPCArg{t.first, RPCArg::Type::STR, RPCArg::Optional::OMITTED, "Info to write for a given UTXO"}; });
    


    benthecarman commented at 10:09 pm on March 6, 2021:
    It’d be nice if these were more descriptive. It’s unclear exactly what the serialization is for these args.
  26. MarcoFalke added the label Up for grabs on Apr 13, 2021
  27. MarcoFalke commented at 6:42 pm on April 13, 2021: member
    Still needs rebase
  28. DrahtBot removed the label Needs rebase on May 6, 2021
  29. rpc: allow dumptxoutset to dump human-readable data 65d0697fe3
  30. pierreN force-pushed on May 6, 2021
  31. pierreN commented at 9:39 pm on May 6, 2021: contributor

    Sorry for the few months delay. I have a bit more time now and will try to follow through with this PR.

    I’ll update the branch in a few days (I was syncing when my old SSD died).

  32. Sjors commented at 1:41 pm on May 21, 2021: member
    Consider moving this functionality to the new bitcoin-util instead. You could add a command that converts the binary format to human readable.
  33. DrahtBot added the label Needs rebase on May 24, 2021
  34. DrahtBot commented at 9:40 am on May 24, 2021: member

    🐙 This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  35. in src/rpc/blockchain.cpp:2611 in 65d0697fe3
    2607@@ -2564,7 +2608,7 @@ static RPCHelpMan dumptxoutset()
    2608     };
    2609 }
    2610 
    2611-UniValue CreateUTXOSnapshot(NodeContext& node, CChainState& chainstate, CAutoFile& afile)
    2612+UniValue CreateUTXOSnapshot(const bool is_compact, const bool show_header, const std::string& separator, NodeContext& node, CChainState& chainstate, CAutoFile& afile, const std::vector<std::pair<std::string, coinascii_cb_t>>& requested)
    


    luke-jr commented at 6:17 am on October 11, 2021:
    IMO it’d be nicer to avoid the two mutually-exclusive bools. Maybe a good case for a class enum?
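One way the suggestion could look, as a sketch only: `SnapshotFormat`, `IsLoadable` and `WritesHeader` are hypothetical names, not part of the PR or of Bitcoin Core. A single scoped enum replaces the `is_compact`/`show_header` pair, so a combination such as "compact with header" cannot be expressed at all.

```cpp
// Hypothetical replacement for the is_compact/show_header pair of bools.
enum class SnapshotFormat {
    COMPACT,            // binary serialization, the form used for loading snapshots
    ASCII_WITH_HEADER,  // human-readable, first line names the columns
    ASCII_NO_HEADER,    // human-readable, data lines only
};

// True only for the binary form, which is the one a snapshot loader can read.
bool IsLoadable(SnapshotFormat format)
{
    return format == SnapshotFormat::COMPACT;
}

// True only when a header line with the column names should be written.
bool WritesHeader(SnapshotFormat format)
{
    return format == SnapshotFormat::ASCII_WITH_HEADER;
}
```

A call site then reads as `CreateUTXOSnapshot(SnapshotFormat::COMPACT, ...)` instead of a bare pair of `true`/`false` literals, which makes it harder to pass the wrong mode by accident.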
  36. in src/test/validation_chainstatemanager_tests.cpp:186 in 65d0697fe3
    182@@ -183,7 +183,7 @@ CreateAndActivateUTXOSnapshot(NodeContext& node, const fs::path root, F malleati
    183     FILE* outfile{fsbridge::fopen(snapshot_path, "wb")};
    184     CAutoFile auto_outfile{outfile, SER_DISK, CLIENT_VERSION};
    185 
    186-    UniValue result = CreateUTXOSnapshot(node, node.chainman->ActiveChainstate(), auto_outfile);
    187+    UniValue result = CreateUTXOSnapshot(false, false, "", node, node.chainman->ActiveChainstate(), auto_outfile, {});
    


    luke-jr commented at 6:19 am on October 11, 2021:
    The first false here should be true, as ActivateSnapshot can only handle the binary/compact format.
  37. luke-jr changes_requested
  38. josibake commented at 10:43 am on December 28, 2021: member
    Concept ACK. @pierreN, are you still working on this? I’m happy to try and take it over the finish line if you’re not.
  39. jamesob commented at 7:10 pm on January 3, 2022: member
    re-Concept ACK and at a high-level the code looks pretty good. Nice job on the tests. In need of a rebase though.
  40. luke-jr referenced this in commit 17d3ed773f on Feb 8, 2022
  41. luke-jr referenced this in commit 6511adb193 on Feb 8, 2022
  42. luke-jr referenced this in commit 648dc4dcd7 on Feb 8, 2022
  43. luke-jr referenced this in commit 61f3da3c04 on Feb 8, 2022
  44. luke-jr commented at 9:04 pm on February 8, 2022: member
    If you decide to revive this PR, I’ve done an extensive rebase at https://github.com/bitcoin/bitcoin/compare/master...luke-jr:rpc_dumptxoutset_hr (leave off the last commit), rebasing it on top of (but not compatible with) #24202, and splitting up the different functionality across multiple commits.
  45. MarcoFalke commented at 9:11 am on March 22, 2022: member
    I think this was picked up in #24202 (https://github.com/bitcoin/bitcoin/pull/24202), so can be closed?
  46. MarcoFalke removed the label Up for grabs on Mar 22, 2022
  47. fanquake closed this on May 12, 2022

  48. DrahtBot locked this on May 12, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-18 18:12 UTC
