Add hash_type MUHASH for gettxoutsetinfo #19145

pull fjahr wants to merge 5 commits into bitcoin:master from fjahr:csi-3-muhash-rpc changing 10 files +195 −59
  1. fjahr commented at 10:08 PM on June 2, 2020: member

    This is another Pr in the series PRs for Coinstatsindex (see overview in #18000). This PR adds the hash_type option muhash to gettxoutsetinfo through which the user can calculate the serialized muhash of the utxo set. This PR does not use the index yet.

  2. fjahr renamed this:
    [WIP] Allow hash_type options for gettxoutsetinfo
    [WIP] Add hash_type options for gettxoutsetinfo
    on Jun 2, 2020
  3. DrahtBot added the label Build system on Jun 2, 2020
  4. DrahtBot added the label RPC/REST/ZMQ on Jun 2, 2020
  5. DrahtBot added the label Tests on Jun 2, 2020
  6. DrahtBot added the label Utils/log/libs on Jun 2, 2020
  7. DrahtBot commented at 1:22 AM on June 3, 2020: member

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #19806 (validation: UTXO snapshot activation by jamesob)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  8. fjahr renamed this:
    [WIP] Add hash_type options for gettxoutsetinfo
    Add hash_type options for gettxoutsetinfo
    on Jun 4, 2020
  9. fjahr commented at 1:32 PM on June 4, 2020: member

    Added another more extensive test, now ready for review.

  10. Sjors commented at 12:55 PM on June 9, 2020: member

    Concept ACK. In the followup PR that adds a MuHash index, the RPC documentation should point out that this index can be used to dramatically speed up the RPC call when used with muhash. And that muhash and none will be more responsive than the default (or just change the default, and note that as a breaking change to anyone who uses the legacy hash).

  11. fjahr force-pushed on Jun 11, 2020
  12. fjahr commented at 11:08 PM on June 11, 2020: member

    @MarcoFalke I would like to hear your thoughts on 2109165e3b61d26a0496d5a1efd1c635a7f6a76a. I had it only in the test at first but put it into the framework because I think it could be useful for other tests as well and since it's just a method it does not affect the use of P2PDataStore in other tests. But the data it returns on each UTXO is kind of specific to my test.

  13. MarcoFalke removed the label Tests on Jun 11, 2020
  14. MarcoFalke removed the label Build system on Jun 11, 2020
  15. MarcoFalke commented at 11:22 PM on June 11, 2020: member

    Seems fine to put it wherever you want, as long as reviewers are happy. I guess the decision mostly depends on whether this will be used by other tests.

  16. fjahr force-pushed on Jun 12, 2020
  17. DrahtBot added the label Needs rebase on Jun 24, 2020
  18. fjahr force-pushed on Jun 29, 2020
  19. fjahr commented at 7:21 PM on June 29, 2020: member

    rebased

  20. DrahtBot removed the label Needs rebase on Jun 29, 2020
  21. fjahr force-pushed on Jun 30, 2020
  22. fjahr commented at 2:26 PM on June 30, 2020: member

    Fixed a merge issue and pulled in latest changes from dependency PRs.

  23. fjahr force-pushed on Jul 5, 2020
  24. fjahr renamed this:
    Add hash_type options for gettxoutsetinfo
    Add hash_type MUHASH for gettxoutsetinfo
    on Jul 5, 2020
  25. fjahr commented at 11:46 AM on July 5, 2020: member

    Removed some code dublication

  26. MarcoFalke referenced this in commit b52e25cc1b on Jul 6, 2020
  27. DrahtBot added the label Needs rebase on Jul 6, 2020
  28. fjahr force-pushed on Jul 6, 2020
  29. fjahr commented at 2:00 PM on July 6, 2020: member

    Rebased and made small style fix in rpc/util.cpp.

  30. DrahtBot removed the label Needs rebase on Jul 6, 2020
  31. sidhujag referenced this in commit e98a4e94d1 on Jul 8, 2020
  32. DrahtBot added the label Needs rebase on Jul 30, 2020
  33. fjahr force-pushed on Aug 10, 2020
  34. DrahtBot removed the label Needs rebase on Aug 10, 2020
  35. DrahtBot added the label Needs rebase on Aug 26, 2020
  36. fjahr force-pushed on Aug 27, 2020
  37. DrahtBot removed the label Needs rebase on Aug 27, 2020
  38. fjahr force-pushed on Aug 28, 2020
  39. DrahtBot added the label Needs rebase on Sep 1, 2020
  40. fjahr force-pushed on Sep 2, 2020
  41. fjahr commented at 11:39 AM on September 2, 2020: member

    Rebased after #19105 was merged.

  42. DrahtBot removed the label Needs rebase on Sep 2, 2020
  43. DrahtBot added the label Needs rebase on Sep 10, 2020
  44. fjahr force-pushed on Sep 11, 2020
  45. DrahtBot removed the label Needs rebase on Sep 11, 2020
  46. in src/Makefile.am:421 in 9a06f21e6b outdated
     396 | @@ -397,6 +397,8 @@ crypto_libbitcoin_crypto_base_a_SOURCES = \
     397 |    crypto/hmac_sha512.h \
     398 |    crypto/poly1305.h \
     399 |    crypto/poly1305.cpp \
     400 | +  crypto/muhash.h \
     401 | +  crypto/muhash.cpp \
    


    mjdietzx commented at 2:16 PM on November 24, 2020:

    Should be alphabetical order?


    fjahr commented at 12:30 AM on January 14, 2021:

    Sorry, this was part of a previous PR that this one is based on, so I don't want to touch this anymore.

  47. in test/functional/feature_utxo_set_hash.py:59 in 9a06f21e6b outdated
      54 | +            height += 1
      55 | +            block_time += 1
      56 | +
      57 | +            # Save first block because we want to spend its coinbase.
      58 | +            if i == 0:
      59 | +                block1 = block
    


    mjdietzx commented at 5:40 PM on November 24, 2020:

    Isn't this sort of a roundabout way to setup what seems to be the actual test, at lines 74-87? I'm not seeing why conn = node.add_p2p_connection(P2PDataStore()) is necessary for what's being tested here, and if the method added in this PR def estimate_utxo_set(self): is needed.

    Would something like this do what you need:

    coinbase_blocks = list(map(lambda block: node.getblock(block), node.generate(100)))
    spending = next(filter(lambda block: block["confirmations"] == 100, coinbase_blocks))
    coinbase_blocks.remove(spending)
    
    tx = node.gettransaction(spending["tx"][0])
    vtx = FromHex(CTransaction(), tx["hex"])
    vtx.rehash()
    tx1 = create_tx_with_script(vtx, 0, script_sig=b'\x51', amount=25 * COIN)
    tx2 = create_tx_with_script(tx1, 0, script_sig=b'\x51', amount=25 * COIN)
    
    tx1id = node.sendrawtransaction(hexstring=tx1.serialize().hex(), maxfeerate=0)['hex']
    tx2id = node.sendrawtransaction(hexstring=tx2.serialize().hex(), maxfeerate=0)['hex']
    node.generateblock(output=node.getnewaddress(), transactions=[tx1id, tx2id])
    

    and then you form utxo_set here from coinbase_blocks, and tx, tx1, tx2 to calculate the MuHash of the set


    fjahr commented at 12:33 AM on January 14, 2021:

    It was and I did it because I thought there was a good chance that this code could be used for other tests as well and it also made some things a bit easier. But overall of course it adds complexity and I am lacking a specific example where it would be used so I have removed the P2P based implementation and simplified it in a similar way as you imagined, I think. Please check it out.

  48. in test/functional/feature_utxo_set_hash.py:54 in 9a06f21e6b outdated
      70 | +        block2.solve()
      71 | +
      72 | +        conn.send_blocks_and_test([block2], node, success=True)
      73 | +
      74 | +        # Calculate the MuHash of the UTXO set estimate
      75 | +        muhash = MuHash3072()
    


    mjdietzx commented at 6:50 PM on November 24, 2020:

    Would this test be a little better if instead of forming the utxo set at the end, you maintained a running sum and added as blocks were mined, and then added/subtracted when UTXOs were spent and created in tx1, tx2?


    fjahr commented at 12:35 AM on January 14, 2021:

    I don't think so because we need to somehow keep track of what is really in the UTXO set and I think this would make test harder to follow if I did it this way. And I feel like this is more of the standard way of how we structure our tests.

  49. DrahtBot added the label Needs rebase on Jan 7, 2021
  50. fjahr force-pushed on Jan 14, 2021
  51. fjahr commented at 12:38 AM on January 14, 2021: member

    Rebased and addressed feedback by @mjdietzx . Thanks for reviewing and sorry for the long wait, I was focussing on #19055 which is now merged and which makes this one my high prio. I also added a small refactor commit to improve encapsulation between MuHash3072 and Num3072.

  52. DrahtBot removed the label Needs rebase on Jan 14, 2021
  53. in src/node/coinstats.cpp:28 in 62e0f39510 outdated
      24 | @@ -24,31 +25,47 @@ static uint64_t GetBogoSize(const CScript& scriptPubKey)
      25 |             scriptPubKey.size() /* scriptPubKey */;
      26 |  }
      27 |  
      28 | -static void ApplyStats(CCoinsStats& stats, CHashWriter& ss, const uint256& hash, const std::map<uint32_t, Coin>& outputs)
      29 | +static void ApplyHash(CCoinsStats &stats, CHashWriter& ss, const uint256& hash, const std::map<uint32_t, Coin>& outputs, std::map<uint32_t, Coin>::const_iterator it)
    


    FelixWeis commented at 6:48 AM on January 14, 2021:

    nit: CCoinsStats& stats


    fjahr commented at 11:27 PM on January 14, 2021:

    Done

  54. laanwj added this to the "Blockers" column in a project

  55. fjahr force-pushed on Jan 14, 2021
  56. fjahr commented at 11:29 PM on January 14, 2021: member

    There is a deterministic regtest blockchain for the test rpc_getblockstats.py which could (additionally) test the finalized hash against constants, especially in the future when one tests for correct rolling behaviour between blocks.

    Thanks, that's a good point but I think this will be even more interesting when we get to the actual index when the RPC can give hashes for specific heights so I think I will leave this for one of the follow-ups of this one.

  57. fjahr commented at 11:30 PM on January 14, 2021: member

    Addressed Feedback by @FelixWeis and fixed the functional test for the no-wallet case.

  58. fjahr force-pushed on Jan 17, 2021
  59. fjahr force-pushed on Jan 21, 2021
  60. in src/crypto/muhash.cpp:278 in 6995fbd687 outdated
     274 | @@ -276,18 +275,33 @@ void Num3072::Divide(const Num3072& a)
     275 |      if (this->IsOverflow()) this->FullReduce();
     276 |  }
     277 |  
     278 | -Num3072 MuHash3072::ToNum3072(Span<const unsigned char> in) {
     279 | -    Num3072 out{};
     280 | -    uint256 hashed_in = (CHashWriter(SER_DISK, 0) << in).GetSHA256();
     281 | -    unsigned char tmp[BYTE_SIZE];
     282 | -    ChaCha20(hashed_in.data(), hashed_in.size()).Keystream(tmp, BYTE_SIZE);
     283 | +Num3072::Num3072(const unsigned char data[BYTE_SIZE]) {
    


    sipa commented at 9:49 PM on January 21, 2021:

    In commit " refactor: Improve encapsulation between MuHash3072 and Num3072"

    I recommend against passing C-style arrays, as despite its notation, it's not passing an array but a pointer. For example, this works with no warning: https://godbolt.org/z/YrMvfE

    An alternative is using a C++ reference-to-array:

     Num3072::Num3072(const unsigned char (&data)[BYTE_SIZE]) { 
    

    fjahr commented at 12:06 AM on January 22, 2021:

    Done

  61. in src/node/coinstats.cpp:55 in d20438fd83 outdated
      50 | +
      51 | +    CDataStream ss(SER_DISK, PROTOCOL_VERSION);
      52 | +    ss << outpoint;
      53 | +    ss << (uint32_t)(coin.nHeight * 2 + coin.fCoinBase);
      54 | +    ss << coin.out;
      55 | +    muhash.Insert(MakeUCharSpan(ss.str()));
    


    sipa commented at 10:09 PM on January 21, 2021:

    In commit "rpc: Add hash_type MUHASH to gettxoutsetinfo"

    I think extracting an std::string here is unnecessary.

    muhash.Insert(MakeUCharSpan(ss));
    

    should work just as well, and avoid the copy.


    fjahr commented at 12:06 AM on January 22, 2021:

    Fixed

  62. in src/node/coinstats.cpp:27 in 9470df12ec outdated
      23 | @@ -24,31 +24,35 @@ static uint64_t GetBogoSize(const CScript& scriptPubKey)
      24 |             scriptPubKey.size() /* scriptPubKey */;
      25 |  }
      26 |  
      27 | -static void ApplyStats(CCoinsStats& stats, CHashWriter& ss, const uint256& hash, const std::map<uint32_t, Coin>& outputs)
      28 | +static void ApplyHash(CCoinsStats &stats, CHashWriter& ss, const uint256& hash, const std::map<uint32_t, Coin>& outputs, std::map<uint32_t, Coin>::const_iterator it)
    


    sipa commented at 10:11 PM on January 21, 2021:

    In commit "refactor: Seperate hash and stats calculation in coinstats"

    Nit: seperate -> separate in commit title


    fjahr commented at 12:06 AM on January 22, 2021:

    Fixed

  63. fjahr force-pushed on Jan 22, 2021
  64. in src/crypto/muhash.h:94 in 7a96f77c00 outdated
      93 | @@ -91,8 +94,6 @@ class Num3072
      94 |  class MuHash3072
    


    jonatack commented at 5:22 PM on January 23, 2021:

    f8e4e369 nit, this is the only occurrence in the codebase documentation where MuHash is written Muhash

      * efficiently combined later.
      *
    - * Muhash does not support checking if an element is already part of the
    + * MuHash does not support checking if an element is already part of the
      * set. That is why this class does not enforce the use of a set as the
      * data it represents because there is no efficient way to do so.
    

    (the other place is in the python code)

    test/functional/test_framework/muhash.py:95:class TestFrameworkMuhash(unittest.TestCase):
    

    fjahr commented at 5:39 PM on January 24, 2021:

    Fixed

  65. in test/functional/rpc_blockchain.py:270 in 7a96f77c00 outdated
     266 | @@ -267,6 +267,13 @@ def _test_gettxoutsetinfo(self):
     267 |          # hash_type none should not return a UTXO set hash.
     268 |          res5 = node.gettxoutsetinfo(hash_type='none')
     269 |          assert 'hash_serialized_2' not in res5
     270 | +        assert 'muhash' not in res5
    


    jonatack commented at 6:03 PM on January 23, 2021:

    bde3feba should assert 'muhash' not in be added to any of res-res4


    fjahr commented at 5:39 PM on January 24, 2021:

    Done.

  66. in src/rpc/util.cpp:131 in 7a96f77c00 outdated
     135 | +    } else if (hash_type_input == "none") {
     136 | +        return CoinStatsHashType::NONE;
     137 | +    } else if (hash_type_input == "muhash") {
     138 | +        return CoinStatsHashType::MUHASH;
     139 | +    } else {
     140 | +        throw JSONRPCError(RPC_INVALID_PARAMETER, strprintf("%d is not a valid hash_type", hash_type_input));
    


    jonatack commented at 6:23 PM on January 23, 2021:

    bde3febaf this could perhaps be a bit simpler

    <details><summary>diff</summary><p>

    diff --git a/src/rpc/blockchain.cpp b/src/rpc/blockchain.cpp
    index 576dabc349..3a07e5620d 100644
    --- a/src/rpc/blockchain.cpp
    +++ b/src/rpc/blockchain.cpp
    @@ -1079,7 +1079,7 @@ static RPCHelpMan gettxoutsetinfo()
         CCoinsStats stats;
         ::ChainstateActive().ForceFlushStateToDisk();
     
    -    const CoinStatsHashType hash_type = ParseHashType(request.params[0], CoinStatsHashType::HASH_SERIALIZED);
    +    const CoinStatsHashType hash_type{request.params[0].isNull() ? CoinStatsHashType::HASH_SERIALIZED : ParseHashType(request.params[0].get_str())};
     
         CCoinsView* coins_view = WITH_LOCK(cs_main, return &ChainstateActive().CoinsDB());
         NodeContext& node = EnsureNodeContext(request.context);
    diff --git a/src/rpc/util.cpp b/src/rpc/util.cpp
    index 30e09a9aac..d8dadf41f7 100644
    --- a/src/rpc/util.cpp
    +++ b/src/rpc/util.cpp
    @@ -113,14 +113,8 @@ std::vector<unsigned char> ParseHexO(const UniValue& o, std::string strKey)
         return ParseHexV(find_value(o, strKey), strKey);
     }
     
    -CoinStatsHashType ParseHashType(const UniValue& param, const CoinStatsHashType default_type)
    +CoinStatsHashType ParseHashType(const std::string& hash_type_input)
     {
    -    if (param.isNull()) {
    -        return default_type;
    -    }
    -
    -    std::string hash_type_input = param.get_str();
    -
         if (hash_type_input == "hash_serialized_2") {
             return CoinStatsHashType::HASH_SERIALIZED;
         } else if (hash_type_input == "none") {
    diff --git a/src/rpc/util.h b/src/rpc/util.h
    index 942c243718..611d281014 100644
    --- a/src/rpc/util.h
    +++ b/src/rpc/util.h
    @@ -77,7 +77,7 @@ extern uint256 ParseHashO(const UniValue& o, std::string strKey);
     extern std::vector<unsigned char> ParseHexV(const UniValue& v, std::string strName);
     extern std::vector<unsigned char> ParseHexO(const UniValue& o, std::string strKey);
     
    -CoinStatsHashType ParseHashType(const UniValue& param, const CoinStatsHashType default_type);
    +CoinStatsHashType ParseHashType(const std::string& hash_type_input);
     
     extern CAmount AmountFromValue(const UniValue& value);
     extern std::string HelpExampleCli(const std::string& methodname, const std::string& args);
    

    </p></details>


    jonatack commented at 6:53 AM on January 24, 2021:

    Could also drop the header file declaration and move the function into the same file as the caller, unless you expect it to have callers in other files.


    fjahr commented at 5:38 PM on January 24, 2021:

    You're right, I don't foresee any other callers at the moment so I moved the function into rpc/blockchain.cpp.

  67. in src/rpc/blockchain.cpp:1067 in 7a96f77c00 outdated
    1062 | @@ -1063,6 +1063,7 @@ static RPCHelpMan gettxoutsetinfo()
    1063 |                          {RPCResult::Type::NUM, "transactions", "The number of transactions with unspent outputs"},
    1064 |                          {RPCResult::Type::NUM, "txouts", "The number of unspent transaction outputs"},
    1065 |                          {RPCResult::Type::NUM, "bogosize", "A meaningless metric for UTXO set size"},
    1066 | +                        {RPCResult::Type::STR_HEX, "muhash", "The serialized hash (only present if 'muhash' hash_type is chosen)"},
    1067 |                          {RPCResult::Type::STR_HEX, "hash_serialized_2", "The serialized hash (only present if 'hash_serialized_2' hash_type is chosen)"},
    


    jonatack commented at 6:32 PM on January 23, 2021:

    bde3febaf pass as optional results

    -                        {RPCResult::Type::STR_HEX, "muhash", "The serialized hash (only present if 'muhash' hash_type is chosen)"},
    -                        {RPCResult::Type::STR_HEX, "hash_serialized_2", "The serialized hash (only present if 'hash_serialized_2' hash_type is chosen)"},
    +                        {RPCResult::Type::STR_HEX, "muhash", /* optional */ true, "The serialized hash (only present if 'muhash' hash_type is chosen)"},
    +                        {RPCResult::Type::STR_HEX, "hash_serialized_2", /* optional */ true, "The serialized hash (only present if 'hash_serialized_2' hash_type is chosen)"},
    

    fjahr commented at 5:39 PM on January 24, 2021:

    Done.

  68. in src/node/coinstats.cpp:53 in 7a96f77c00 outdated
      62 | +    COutPoint outpoint = COutPoint(hash, it->first);
      63 | +    Coin coin = it->second;
      64 | +
      65 | +    CDataStream ss(SER_DISK, PROTOCOL_VERSION);
      66 | +    ss << outpoint;
      67 | +    ss << (uint32_t)(coin.nHeight * 2 + coin.fCoinBase);
    


    jonatack commented at 6:36 PM on January 23, 2021:

    bde3feba named cast

        ss << static_cast<uint32_t>(coin.nHeight * 2 + coin.fCoinBase);
    

    fjahr commented at 5:39 PM on January 24, 2021:

    Done.

  69. in test/functional/feature_utxo_set_hash.py:14 in 7a96f77c00 outdated
       9 | +
      10 | +from test_framework.blocktools import create_transaction
      11 | +from test_framework.messages import (
      12 | +    COutPoint,
      13 | +    FromHex,
      14 | +    CBlock,
    


    jonatack commented at 6:41 PM on January 23, 2021:

    7a96f77c nit, sort


    fjahr commented at 5:39 PM on January 24, 2021:

    Done.

  70. in test/functional/feature_utxo_set_hash.py:20 in 7a96f77c00 outdated
      15 | +)
      16 | +from test_framework.muhash import MuHash3072
      17 | +from test_framework.test_framework import BitcoinTestFramework
      18 | +from test_framework.util import assert_equal
      19 | +
      20 | +import struct
    


    jonatack commented at 6:42 PM on January 23, 2021:

    7a96f77c nit, place standard library imports before application/library specific imports per https://www.python.org/dev/peps/pep-0008/#imports


    fjahr commented at 5:39 PM on January 24, 2021:

    Done.

  71. jonatack commented at 6:48 PM on January 23, 2021: member

    Built and tested at each commit. This seems close. A few suggestions below.

  72. refactor: Improve encapsulation between MuHash3072 and Num3072
    Also fixes a typo.
    a1fcceac69
  73. fjahr force-pushed on Jan 24, 2021
  74. fjahr commented at 5:39 PM on January 24, 2021: member

    Addressed feedback by @jonatack

  75. fjahr force-pushed on Jan 24, 2021
  76. in src/rpc/blockchain.cpp:1069 in 2e5617f7f4 outdated
    1065 |      return RPCHelpMan{"gettxoutsetinfo",
    1066 |                  "\nReturns statistics about the unspent transaction output set.\n"
    1067 |                  "Note this call may take some time.\n",
    1068 |                  {
    1069 | -                    {"hash_type", RPCArg::Type::STR, /* default */ "hash_serialized_2", "Which UTXO set hash should be calculated. Options: 'hash_serialized_2' (the legacy algorithm), 'none'."},
    1070 | +                    {"hash_type", RPCArg::Type::STR, /* default */ "hash_serialized_2", "Which UTXO set hash should be calculated. Options: 'hash_serialized_2' (the legacy algorithm), 'muhash', 'none'."},
    


    jonatack commented at 7:06 PM on January 24, 2021:

    good catch in the latest push

    - Options: 'hash_serialized_2' (the legacy algorithm), 'none'."},
    + Options: 'hash_serialized_2' (the legacy algorithm), 'muhash', 'none'."},
    

    fjahr commented at 10:22 PM on January 24, 2021:

    Yeah, I feel like I fixed this before but I guess I messed up a rebase somewhere ;)

  77. in src/rpc/blockchain.cpp:1059 in 2e5617f7f4 outdated
    1054 | +    } else if (hash_type_input == "none") {
    1055 | +        return CoinStatsHashType::NONE;
    1056 | +    } else if (hash_type_input == "muhash") {
    1057 | +        return CoinStatsHashType::MUHASH;
    1058 | +    } else {
    1059 | +        throw JSONRPCError(RPC_INVALID_PARAMETER, strprintf("%d is not a valid hash_type", hash_type_input));
    


    jonatack commented at 7:11 PM on January 24, 2021:

    726c9504 I overlooked this in my first review, perhaps add a functional test assertion on this output

            throw JSONRPCError(RPC_INVALID_PARAMETER, strprintf("%s is not a valid hash_type", hash_type_input));
    

    jonatack commented at 7:29 PM on January 24, 2021:

    Oh, maybe it works anyway

    ./src/bitcoin-cli -signet gettxoutsetinfo MuHash
    error code: -8
    error message:
    MuHash is not a valid hash_type
    

    fjahr commented at 10:22 PM on January 24, 2021:

    Done

  78. in src/node/coinstats.cpp:141 in 2e5617f7f4 outdated
     137 | @@ -117,9 +138,17 @@ static void PrepareHash(CHashWriter& ss, const CCoinsStats& stats)
     138 |      ss << stats.hashBlock;
     139 |  }
     140 |  static void PrepareHash(std::nullptr_t, CCoinsStats& stats) {}
     141 | +// Muhash does not need the prepare step
    


    jonatack commented at 7:18 PM on January 24, 2021:

    726c9504 pico-nit, feel free to ignore, s/Muhash/MuHash/


    fjahr commented at 10:22 PM on January 24, 2021:

    Done

  79. jonatack commented at 7:41 PM on January 24, 2021: member

    ACK 2e5617f7f4bcece4834852cc925068e47ceeffa3 code review, reviewed diff last review per git diff 7a96f77 2e5617f, tested rebased on current master

    Happy to re-ACK for the changes above.

    <details><summary>Some manual command line testing.</summary><p>

    # bsi is an alias for ./src/bitcoin-cli -signet
    
    ((HEAD detached at origin/pr/19145))$ bsi help gettxoutsetinfo
    gettxoutsetinfo ( "hash_type" )
    
    Returns statistics about the unspent transaction output set.
    Note this call may take some time.
    
    Arguments:
    1. hash_type    (string, optional, default=hash_serialized_2) Which UTXO set hash should be calculated. Options: 'hash_serialized_2' (the legacy algorithm), 'muhash', 'none'.
    
    Result:
    {                                 (json object)
      "height" : n,                   (numeric) The current block height (index)
      "bestblock" : "hex",            (string) The hash of the block at the tip of the chain
      "transactions" : n,             (numeric) The number of transactions with unspent outputs
      "txouts" : n,                   (numeric) The number of unspent transaction outputs
      "bogosize" : n,                 (numeric) A meaningless metric for UTXO set size
      "muhash" : "hex",               (string, optional) The serialized hash (only present if 'muhash' hash_type is chosen)
      "hash_serialized_2" : "hex",    (string, optional) The serialized hash (only present if 'hash_serialized_2' hash_type is chosen)
      "disk_size" : n,                (numeric) The estimated size of the chainstate on disk
      "total_amount" : n              (numeric) The total amount
    }
    
    Examples:
    > bitcoin-cli gettxoutsetinfo 
    > curl --user myusername --data-binary '{"jsonrpc": "1.0", "id": "curltest", "method": "gettxoutsetinfo", "params": []}' -H 'content-type: text/plain;' http://127.0.0.1:8332/
    
    ((HEAD detached at origin/pr/19145))$ bsi gettxoutsetinfo
    {
      "height": 21927,
      "bestblock": "0000000d69e0c1800fd7103a7f1aaf24ec7b8c1db92257bda1035dcfd8e71c30",
      "transactions": 17430,
      "txouts": 17550,
      "bogosize": 1263989,
      "hash_serialized_2": "c8dbf6ab5094c36935ca89fc8e347b65ab17320f0937c5ac942274a1a0027892",
      "disk_size": 1264123,
      "total_amount": 1096350.00000000
    }
    
    ((HEAD detached at origin/pr/19145))$ bsi gettxoutsetinfo hash_serialized_2
    {
      "height": 21928,
      "bestblock": "000000c3c7e614717b8cdf6c33de7a8a7d97a856eba6640b481a0029fc774249",
      "transactions": 17431,
      "txouts": 17551,
      "bogosize": 1264061,
      "hash_serialized_2": "3e31af57df9dac6bf8924560b647fc81f01bdac1c0fe4af2bc12b61c81d87d61",
      "disk_size": 1264123,
      "total_amount": 1096400.00000000
    }
    
    ((HEAD detached at origin/pr/19145))$ bsi gettxoutsetinfo muhash
    {
      "height": 21927,
      "bestblock": "0000000d69e0c1800fd7103a7f1aaf24ec7b8c1db92257bda1035dcfd8e71c30",
      "transactions": 17430,
      "txouts": 17550,
      "bogosize": 1263989,
      "muhash": "cd74b9cfe35dfee561ed730953ed15085e83c096db30d5c03a8ab1a6a137077a",
      "disk_size": 1264123,
      "total_amount": 1096350.00000000
    }
    
    ((HEAD detached at origin/pr/19145))$ bsi gettxoutsetinfo none
    {
      "height": 21927,
      "bestblock": "0000000d69e0c1800fd7103a7f1aaf24ec7b8c1db92257bda1035dcfd8e71c30",
      "transactions": 17430,
      "txouts": 17550,
      "bogosize": 1263989,
      "disk_size": 1264123,
      "total_amount": 1096350.00000000
    }
    
    ((HEAD detached at origin/pr/19145))$ bsi gettxoutsetinfo non
    error code: -8
    error message:
    non is not a valid hash_type
    
    ((HEAD detached at origin/pr/19145))$ bsi gettxoutsetinfo MuHash
    error code: -8
    error message:
    MuHash is not a valid hash_type
    

    </p></details>

  80. fjahr force-pushed on Jan 24, 2021
  81. fjahr commented at 10:23 PM on January 24, 2021: member

    Addressed further feedback by @jonatack . Thanks for reviewing!

  82. jonatack commented at 10:53 AM on January 25, 2021: member

    re-ACK 3506d9080166bc250e941d791e3e8a2b85d5bd6b per git diff 2e5617f 3506d90

    Thanks for updating.

  83. jonatack commented at 1:04 PM on January 25, 2021: member

    This might need a release note (even if the note will be updated in planned follow-ups to this).

  84. Sjors commented at 1:46 PM on January 28, 2021: member

    Muhash at height 668,047: c263d20036cc983510e9b347cb88e947168dd5665ed00b0e6118d07a7704f810

  85. in src/node/coinstats.cpp:28 in e6856a3809 outdated
      24 | @@ -24,7 +25,7 @@ static uint64_t GetBogoSize(const CScript& scriptPubKey)
      25 |             scriptPubKey.size() /* scriptPubKey */;
      26 |  }
      27 |  
      28 | -static void ApplyHash(CCoinsStats &stats, CHashWriter& ss, const uint256& hash, const std::map<uint32_t, Coin>& outputs, std::map<uint32_t, Coin>::const_iterator it)
      29 | +static void ApplyHash(CCoinsStats& stats, CHashWriter& ss, const uint256& hash, const std::map<uint32_t, Coin>& outputs, std::map<uint32_t, Coin>::const_iterator it)
    


    Sjors commented at 3:17 PM on January 29, 2021:

    e6856a3809e54dc4de03cb4cd4f9d78495cc0625: nit this undoes the move of & in the previous commit


    fjahr commented at 12:00 AM on January 31, 2021:

    Fixed.

  86. in src/node/coinstats.cpp:124 in e6856a3809 outdated
     123 | @@ -111,6 +124,10 @@ bool GetUTXOStats(CCoinsView* view, CCoinsStats& stats, CoinStatsHashType hash_t
     124 |      case(CoinStatsHashType::NONE): {
     125 |          return GetUTXOStats(view, stats, nullptr, interruption_point);
     126 |      }
     127 | +    case(CoinStatsHashType::MUHASH): {
    


    Sjors commented at 3:21 PM on January 29, 2021:

    e6856a3809e54dc4de03cb4cd4f9d78495cc0625 nit: handle MUHASH before NONE? (same with PrepareHash and ParseHashType)


    fjahr commented at 12:00 AM on January 31, 2021:

    Done.

  87. in src/rpc/blockchain.cpp:1079 in e6856a3809 outdated
    1075 | @@ -1063,7 +1076,8 @@ static RPCHelpMan gettxoutsetinfo()
    1076 |                          {RPCResult::Type::NUM, "transactions", "The number of transactions with unspent outputs"},
    1077 |                          {RPCResult::Type::NUM, "txouts", "The number of unspent transaction outputs"},
    1078 |                          {RPCResult::Type::NUM, "bogosize", "A meaningless metric for UTXO set size"},
    1079 | -                        {RPCResult::Type::STR_HEX, "hash_serialized_2", "The serialized hash (only present if 'hash_serialized_2' hash_type is chosen)"},
    1080 | +                        {RPCResult::Type::STR_HEX, "muhash", /* optional */ true, "The serialized hash (only present if 'muhash' hash_type is chosen)"},
    


    Sjors commented at 3:25 PM on January 29, 2021:

    e6856a3809e54dc4de03cb4cd4f9d78495cc0625 nit: move down?


    fjahr commented at 12:00 AM on January 31, 2021:

    Done.

  88. in src/node/coinstats.cpp:146 in 3506d90801 outdated
     142 | +static void PrepareHash(MuHash3072& muhash, CCoinsStats& stats) {}
     143 |  
     144 |  static void FinalizeHash(CHashWriter& ss, CCoinsStats& stats)
     145 |  {
     146 | -    stats.hashSerialized = ss.GetHash();
     147 | +    stats.hashSerialized = ss.GetSHA256();
    


    Sjors commented at 3:46 PM on January 29, 2021:

    This (?) changes the result of gettxoutsetinfo hash_serialized_2 (probably something else)


    fjahr commented at 12:00 AM on January 31, 2021:

    Huh, no idea how that happened, makes no sense at all, especially inside that commit. Fixed.

  89. Sjors changes_requested
  90. Sjors commented at 3:55 PM on January 29, 2021: member

    Reviewed 3506d9080166bc250e941d791e3e8a2b85d5bd6b, almost happy...

    Muhash at height 668,212: d0192e6604c3dc9a78f47bfc5ae84b8c0f6baf960d7ba9305664bbb9f65ef092.

    hash_serialized_2 is 63fa057a656168ed832d8abbec8008948fc17aecde9f346d69043871a746941f on master, but 15d3629db2fcbed27a1fd9d35c15b5ac6f0a70f5723ebf9cf87e3df0b0feb061 here (efe73b1f89e82d65100c73c06d9a01ec19cc7e029d93caaa8bfded4d1723d484 if I revert GetHash())

    You may want to hardcode a hash_serialized_2 example in the test suite.

  91. refactor: Separate hash and stats calculation in coinstats 2474645f3b
  92. rpc: Add hash_type MUHASH to gettxoutsetinfo
    Also small style fix in rpc/util.cpp
    0d3b2f643d
  93. test: Add test for gettxoutsetinfo RPC with MuHash 6ccc8fc067
  94. test: Add test for deterministic UTXO set hash results e987ae5a55
  95. fjahr force-pushed on Jan 31, 2021
  96. fjahr commented at 12:14 AM on January 31, 2021: member

    Reviewed 3506d90, almost happy...

    Muhash at height 668,212: d0192e6604c3dc9a78f47bfc5ae84b8c0f6baf960d7ba9305664bbb9f65ef092.

    hash_serialized_2 is 63fa057a656168ed832d8abbec8008948fc17aecde9f346d69043871a746941f on master, but 15d3629db2fcbed27a1fd9d35c15b5ac6f0a70f5723ebf9cf87e3df0b0feb061 here (efe73b1f89e82d65100c73c06d9a01ec19cc7e029d93caaa8bfded4d1723d484 if I revert GetHash())

    You may want to hardcode a hash_serialized_2 example in the test suite.

    Thanks a lot for testing @Sjors ! I should have addressed all your comments.

    The difference in that hash was caused by using outputs.end() where I should have used std::prev(outputs.end()) in ApplyHash(). I was actually sure we had some deterministic tests in place for that hash already but obviously, we didn't. So I added a new commit with a small test calculating the hash based on the cached chain to add such coverage.

  97. in src/rpc/blockchain.cpp:1080 in e987ae5a55
    1075 | @@ -1063,7 +1076,8 @@ static RPCHelpMan gettxoutsetinfo()
    1076 |                          {RPCResult::Type::NUM, "transactions", "The number of transactions with unspent outputs"},
    1077 |                          {RPCResult::Type::NUM, "txouts", "The number of unspent transaction outputs"},
    1078 |                          {RPCResult::Type::NUM, "bogosize", "A meaningless metric for UTXO set size"},
    1079 | -                        {RPCResult::Type::STR_HEX, "hash_serialized_2", "The serialized hash (only present if 'hash_serialized_2' hash_type is chosen)"},
    1080 | +                        {RPCResult::Type::STR_HEX, "hash_serialized_2", /* optional */ true, "The serialized hash (only present if 'hash_serialized_2' hash_type is chosen)"},
    1081 | +                        {RPCResult::Type::STR_HEX, "muhash", /* optional */ true, "The serialized hash (only present if 'muhash' hash_type is chosen)"},
    


    jonatack commented at 8:21 PM on February 1, 2021:

    nit, in these two lines, could replace the single quotes with double ones for consistency, e.g. s/'/\"/


    fjahr commented at 9:30 PM on February 1, 2021:

    Yes, will do if I have to retouch! Thanks for re-reviewing!

  98. jonatack commented at 8:53 PM on February 1, 2021: member

    Good catch and nice improvements.

    Tested re-ACK e987ae5a554c9952812746c29f2766bacea4b727 per git diff 3506d90 e987ae5, reviewed diff, debug built, ran gettxoutsetinfo -signet and help on this branch vs master, at height 23127 both returned hash_serialized_2 of 2b72d65f3b6efb2311f58374ea2b939abf49684d44f4bafda45faa3b5452a454 and this branch returned muhash of c9f1ff12d345ccf9939c6bbf087e6f7399b6115adee1569287e9c5c43dbb475c

    Only if you have to retouch and if you agree, one nit below (and maybe use single/double quotes consistently in the functional tests).

  99. Sjors commented at 10:19 AM on February 3, 2021: member

    tACK e987ae5

    I get the same MuHash as before for block 668,212 and hash_serialized_2 now matches what I found on master.

  100. achow101 commented at 10:18 PM on February 8, 2021: member

    ACK e987ae5a554c9952812746c29f2766bacea4b727

    It would be nice if the test used MiniWallet or otherwise didn't rely on the wallet, but it's fine for now.

  101. in src/crypto/muhash.cpp:278 in a1fcceac69 outdated
     274 | @@ -276,18 +275,33 @@ void Num3072::Divide(const Num3072& a)
     275 |      if (this->IsOverflow()) this->FullReduce();
     276 |  }
     277 |  
     278 | -Num3072 MuHash3072::ToNum3072(Span<const unsigned char> in) {
     279 | -    Num3072 out{};
     280 | -    uint256 hashed_in = (CHashWriter(SER_DISK, 0) << in).GetSHA256();
     281 | -    unsigned char tmp[BYTE_SIZE];
     282 | -    ChaCha20(hashed_in.data(), hashed_in.size()).Keystream(tmp, BYTE_SIZE);
     283 | +Num3072::Num3072(const unsigned char (&data)[BYTE_SIZE]) {
    


    ryanofsky commented at 5:59 PM on February 10, 2021:

    In commit "refactor: Improve encapsulation between MuHash3072 and Num3072" (a1fcceac69097a8e6540a6fd8121a5d53022528f)

    Note in case it helps other reviewers: I found this commit easier to review rearranging the method order to ToNum3072 Num3072 ToBytes instead of Num3072 ToBytes ToNum3072 so diff was smaller with code moving around less.

  102. in src/node/coinstats.cpp:27 in 2474645f3b outdated
      23 | @@ -24,31 +24,35 @@ static uint64_t GetBogoSize(const CScript& scriptPubKey)
      24 |             scriptPubKey.size() /* scriptPubKey */;
      25 |  }
      26 |  
      27 | -static void ApplyStats(CCoinsStats& stats, CHashWriter& ss, const uint256& hash, const std::map<uint32_t, Coin>& outputs)
      28 | +static void ApplyHash(CCoinsStats& stats, CHashWriter& ss, const uint256& hash, const std::map<uint32_t, Coin>& outputs, std::map<uint32_t, Coin>::const_iterator it)
    


    ryanofsky commented at 6:32 PM on February 10, 2021:

    In commit "refactor: Separate hash and stats calculation in coinstats" (2474645f3b15687e7f196b89eb935d6e6a98a9da)

    stats arguments in unused in all three overloads and could be dropped.

    Also, passing iterators around like this seems unnecessarily elaborate. It makes sense to separate stats and hash computations, but is there a reason both computations need to share the a single for loop over the output map? I would think you could make all the functions separate like:

    static void ApplyHash(CHashWriter& ss, const uint256& hash, const std::map<uint32_t, Coin>& outputs); static void ApplyHash(MuHash3072& muhash, const uint256& hash, const std::map<uint32_t, Coin>& outputs); static void ApplyHash(std::nullptr_t, const uint256& hash, const std::map<uint32_t, Coin>& outputs); static void ApplyStats(CCoinsStats& stats, const uint256& hash, const std::map<uint32_t, Coin>& outputs);

  103. ryanofsky approved
  104. ryanofsky commented at 6:46 PM on February 10, 2021: member

    Code review ACK e987ae5a554c9952812746c29f2766bacea4b727. Looks very good. I left one suggestion to simplify code, but feel free to ignore it here and maybe consider it for later since PR has already had a lot of review.

  105. fjahr commented at 9:43 PM on February 10, 2021: member

    Thanks a lot for the reviews @achow101 and @ryanofsky ! I will add your suggestions if I have to push again, otherwise I will put them into the next PR in this series.

  106. laanwj merged this on Feb 12, 2021
  107. laanwj closed this on Feb 12, 2021

  108. laanwj removed this from the "Blockers" column in a project

  109. sidhujag referenced this in commit 266fcc8b6d on Feb 12, 2021
  110. in test/functional/feature_utxo_set_hash.py:30 in e987ae5a55
      25 | +        self.skip_if_no_wallet()
      26 | +
      27 | +    def test_deterministic_hash_results(self):
      28 | +        self.log.info("Test deterministic UTXO set hash results")
      29 | +
      30 | +        # These depend on the setup_clean_chain option, the chain loaded from the cache
    


    MarcoFalke commented at 4:12 PM on February 12, 2021:

    there is no loaded chain and the utxo set is empty. Which is also useful to check, but doesn't check if the hash is deterministic?

    Did you forget to load the chain from ./test/functional/data/rpc_getblockstats.json?

  111. MarcoFalke referenced this in commit 9b48b3ac42 on Mar 26, 2021
  112. kittywhiskers referenced this in commit c46d0b951b on Feb 26, 2022
  113. kittywhiskers referenced this in commit 47e6598166 on Feb 26, 2022
  114. Fabcien referenced this in commit 32714970e1 on Mar 14, 2022
  115. Fabcien referenced this in commit e3aff75e2f on Mar 14, 2022
  116. Fabcien referenced this in commit ef48cb3edd on Mar 14, 2022
  117. kittywhiskers referenced this in commit 88381c4bef on Apr 3, 2022
  118. kittywhiskers referenced this in commit f559701b86 on Apr 20, 2022
  119. kittywhiskers referenced this in commit 6504cafa01 on Apr 24, 2022
  120. kittywhiskers referenced this in commit ef18d0858d on Apr 27, 2022
  121. DrahtBot locked this on Aug 16, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-02 18:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me