coinstats, assumeutxo: fix hash_serialized2 calculation #28685

pull fjahr wants to merge 6 commits into bitcoin:master from fjahr:2023-10-au-weird-fix changing 15 files +78 −74
  1. fjahr commented at 10:01 am on October 19, 2023: contributor

    Closes #28675

    The last commit demonstrates that theStack’s analysis here seems to be correct. There will be more changes needed for the rest of the test suite but the feature_assumeutxo.py with my additional tests pass.

  2. DrahtBot commented at 10:02 am on October 19, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK theStack, ryanofsky, achow101
    Concept ACK jamesob
    Stale ACK dergoegge

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #26426 (WIP: Fix coinstatsindex overflow issue by fjahr)
    • #26045 (rpc: Optimize serialization disk space of dumptxoutset by aureleoules)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. fjahr force-pushed on Oct 19, 2023
  4. DrahtBot added the label CI failed on Oct 19, 2023
  5. fjahr force-pushed on Oct 19, 2023
  6. fjahr renamed this:
    coinstats, asssumeutxo: fix hash_serialized2 calculation
    coinstats, assumeutxo: fix hash_serialized2 calculation
    on Oct 19, 2023
  7. dergoegge commented at 11:41 am on October 19, 2023: member

    I’ve written a fuzz test that catches the original bug mentioned in #28675 (comment): https://github.com/dergoegge/bitcoin/blob/2023-10-fuzz-coinstats-hash/src/test/fuzz/coinstats_hash.cpp.

    I think it also caught a second issue regarding the values of UTXOs, certain negative values seem to result in the same hash as their positive counter part. I have not investigated further but I assume it is realted to VarIntMode::NONNEGATIVE_SIGNED.

     0$ echo "CMoC9Q==" | base64 --decode > coinstats_hash.crash
     1$ FUZZ_PRINTF=1 FUZZ=coinstats_hash ./src/test/fuzz/fuzz coinstats_hash.crash
     2INFO: Running with entropic power schedule (0xFF, 100).
     3INFO: Seed: 3663010564
     4INFO: Loaded 1 modules   (379542 inline 8-bit counters): 379542 [0x560080cb5d88, 0x560080d1281e),
     5INFO: Loaded 1 PC tables (379542 PCs): 379542 [0x560080d12820,0x5600812dd180),
     6./src/test/fuzz/fuzz: Running 1 inputs 1 time(s) each.
     7Running: coinstats_hash_crashes/crash-d59c557ae7f9e837716b272a05d1397a94f589d7
     82023-10-19T11:34:50Z Opened LevelDB successfully
     92023-10-19T11:34:50Z Using obfuscation key for : 0000000000000000
    102023-10-19T11:34:50Z mutating a coin
    112023-10-19T11:34:50Z coinbase=1 height=1386179375 value=-5934962493229608344 scriptPubKey=[97951910994b0a6104092fbc9b385639]
    122023-10-19T11:34:50Z coinbase=1 height=1386179375 value=6856422276773357632 scriptPubKey=[97951910994b0a6104092fbc9b385639]
    13fuzz: test/fuzz/coinstats_hash.cpp:135: void coinstats_muhash_fuzz_target(FuzzBufferType): Assertion `mutated_result->hashSerialized != result->hashSerialized' failed.
    

    Using the varints in the serialization for the hashes seems like unnecessary complexity in the first place. Since we are changing the hash format already, maybe we just get rid of the varints entirely?

    (I’ve not polished the fuzz test, sorry if you run into bugs in the test itself)

  8. fjahr force-pushed on Oct 19, 2023
  9. fjahr commented at 12:18 pm on October 19, 2023: contributor

    @dergoegge Awesome to have a fuzz test already :)

    I think the issue with negative amounts is good to know but not as critical. Since negative output values are not consensus valid it seems ok for the serialization to assume that they are not negative. That’s at least how I understand the issue right now. So I think in practice this should at least not lead to a different hash.

    I am unsure about removing VARINT from this code entirely. I agree that for the hashing it is not needed but we also use it when we dump the utxo set and there we probably don’t want to get rid of it since it’s saving space. So we still have to reason about it anyway and I think VARINT isn’t really a problem in general. It’s widely used in our code and I think also pretty well understood. But I won’t fight it if people agree that this is a valuable change.

    Let’s discuss this in the IRC meeting today!

  10. fanquake commented at 12:22 pm on October 19, 2023: member

    I think this should also change hash_serialized_2 -> hash_serialized_3, and add a release note. Given that this was broken, and likely had very few users, I don’t see a problem with a breaking change, and making it clear to anyone that may have been using it, that things were broken (better than silently starting to return different results).

    I don’t see any value in keeping around a broken hash_serialized_2 either, as that will just lead to future confusion, in relation to AssumeUTXO.

  11. theStack commented at 12:49 pm on October 19, 2023: contributor

    Using the varints in the serialization for the hashes seems like unnecessary complexity in the first place. Since we are changing the hash format already, maybe we just get rid of the varints entirely?

    That’s something worth considering IMHO, had similar thoughts. I assume the primary (probably only?) reason for using VARINTs in the hashing routine is that it’s supposedly faster, as it leads to less total data that has to be processed by the underlying hash function (e.g. the vast majority of UTXO amounts can probably be encoded in 2-4 bytes if encoded as VARINTs, rather than needing the full 8 bytes). Would be interesting to benchmark how much difference in performance it really is, to judge if it’s worth it to keep the VARINTs in the hashing code.

  12. fjahr referenced this in commit e403dc98bf on Oct 19, 2023
  13. fjahr commented at 1:03 pm on October 19, 2023: contributor

    I think this should also change hash_serialized_2 -> hash_serialized_3

    Makes sense, added

  14. fjahr referenced this in commit b4713c58ec on Oct 19, 2023
  15. fjahr force-pushed on Oct 19, 2023
  16. dergoegge commented at 1:13 pm on October 19, 2023: member

    Since negative output values are not consensus valid it seems ok for the serialization to assume that they are not negative.

    Couldn’t an attacker distribute utxo sets with the negative values instead of the correct positive ones without it being detected at startup? This would then later lead to a consensus split when these outputs are spent.

  17. jamesob commented at 1:36 pm on October 19, 2023: member

    Couldn’t an attacker distribute utxo sets with the negative values instead of the correct positive ones without it being detected at startup? This would then later lead to a consensus split when these outputs are spent.

    On that note, may not be a bad idea to add a negative-amount sanity check to ActivateSnapshot() as a belt-and-suspenders, since we’re already checking the height for overflow anyway: https://github.com/jamesob/bitcoin/blob/447f50e4aed9a8b1d80e1891cda85801aeb80b4e/src/validation.cpp#L4926-L4928.

  18. fjahr commented at 1:38 pm on October 19, 2023: contributor

    Couldn’t an attacker distribute utxo sets with the negative values instead of the correct positive ones without it being detected at startup? This would then later lead to a consensus split when these outputs are spent.

    On that note, may not be a bad idea to add a negative-amount sanity check to ActivateSnapshot() as a belt-and-suspenders, since we’re already checking the height for overflow anyway: https://github.com/jamesob/bitcoin/blob/447f50e4aed9a8b1d80e1891cda85801aeb80b4e/src/validation.cpp#L4926-L4928.

    Yeah, I thought somehow that this was enforced by TxOut itself but it seems that is not the case.

  19. in src/validation.cpp:5402 in 6d0ce0b5c1 outdated
    5398@@ -5399,6 +5399,11 @@ bool ChainstateManager::PopulateAndValidateSnapshot(
    5399                       coins_count - coins_left);
    5400             return false;
    5401         }
    5402+        if (coin.out.nValue < 0) {
    


    jamesob commented at 1:56 pm on October 19, 2023:
    While we’re at it, might as well check that we’re bounded by MAX_MONEY too (as in tx_check: https://github.com/bitcoin/bitcoin/blob/master/src/consensus/tx_check.cpp#L26-L29)

    fjahr commented at 2:01 pm on October 19, 2023:
    Right, I should have just used MoneyRange right away…

    fjahr commented at 2:01 pm on October 19, 2023:
    done
  20. fjahr force-pushed on Oct 19, 2023
  21. fjahr force-pushed on Oct 19, 2023
  22. fanquake added this to the milestone 26.0 on Oct 19, 2023
  23. jamesob commented at 2:36 pm on October 19, 2023: member

    Concept ACK and approach ACK

    My only reservation here would be completely displacing hash_serialized_2. It’s bad manners to create a breaking change to the RPC API without deprecation warnings, but I suppose if there is an existing vulnerability, the break may be preferable.

    Not to create more work, but I wonder if it’s worth polling the mailinglist to see if there are existing consumers of gettxoutset.hash_serialized_2 that would be disrupted by its disappearance. Until then, it’s possible to parameterize ApplyHash to make the old, bad behavior optional, and include both keys in non-assumeutxo RPC output.

  24. DrahtBot removed the label CI failed on Oct 19, 2023
  25. in src/kernel/coinstats.cpp:77 in 9454f66e34 outdated
    73@@ -74,7 +74,7 @@ static void ApplyHash(HashWriter& ss, const uint256& hash, const std::map<uint32
    74     for (auto it = outputs.begin(); it != outputs.end(); ++it) {
    75         if (it == outputs.begin()) {
    76             ss << hash;
    77-            ss << VARINT(it->second.nHeight * 2 + it->second.fCoinBase ? 1u : 0u);
    78+            ss << VARINT(it->second.nHeight * 2 + (it->second.fCoinBase ? 1u : 0u));
    


    dergoegge commented at 4:47 pm on October 19, 2023:
    We need to commit to the height and coinbase status of all coins created by the same transaction, not just the values from the first coin.

    theStack commented at 5:03 pm on October 19, 2023:

    I had the same suspicion earlier, but: all coins handled in this loop belong to the same tx (same outpoint hash), so they consequently should also have the same height and coinbase values, no?

    // EDIT: nevermind, I wasn’t thinking this to the end.

  26. dergoegge commented at 4:55 pm on October 19, 2023: member

    I’ve changed the fuzzer and it found one more bug (see comment inline): https://github.com/dergoegge/bitcoin/tree/2023-10-fuzz-coinstats-hash

    The version of ApplyHash on that branch no longer uses VARINT and looks much simpler imo.

    Imo, we should fix all bugs that cause two different versions of a snapshot file to result in the same hash. Otherwise the assumeutxo check becomes “check that the hashes match and check for negative output values and check …” instead of “check that the hashes match”.

  27. theStack commented at 6:12 pm on October 19, 2023: contributor

    Here are some benchmark results of different branches that has been proposed so far. I’ve ran $ time ./src/bitcoin-cli gettxoutsetinfo on mainnet, around block height 812940 three times each:

    master (buggy) this PR (with VARINTs, simple fix, still buggy) dergoegge’s branch (without VARINTs, bugfixed) my branch (using TxOutSer serialization, as used in MuHash)
    1m8,664s 1m9,024s 1m9,861s 1m25,875s
    1m6,803s 1m8,815s 1m12,192s 1m28,743s
    1m8,808s 1m9,466s 1m12,227s 1m28,711s

    The TxOutSer approach is the simplest one (ApplyHash follows the exact same logic as for MuHash, only that the data is fed into HashWriter rather than MuHash3072 instance), but looking at the numbers, it seems that going with @dergoegge’s solution makes the most sense?

  28. fjahr commented at 7:07 pm on October 19, 2023: contributor

    I have similar results as @theStack . Involvement of VARINT doesn’t seem to have much impact on performance, but MuHash style serialization does. Though that would allow for nice simplifications in coinstats (see that commit).

    master VARINT removed (https://github.com/fjahr/bitcoin/commit/0431c115904b8df344efc864e1b2cc151cafa0ee) MuHash style (https://github.com/fjahr/bitcoin/commit/6b3628f2c8402695af970a6f2e54da53a30d3c65)
    1m23.825s 1m15.944s 1m37.636s
    1m15.484s 1m16.271s 1m39.447s
    1m16.042s 1m16.467s 1m36.454s
  29. sipa commented at 7:35 pm on October 19, 2023: member
    @theStack @fjahr Couldn’t the difference in performance be due to the fact that the muhash style version constructs a temporary DataStream object first, while the other feeds it to a hasher directly? It’s possible to use muhash-style serialization without the intermediary object I think.
  30. fjahr commented at 7:50 pm on October 19, 2023: contributor

    @theStack @fjahr Couldn’t the difference in performance be due to the fact that the muhash style version constructs a temporary DataStream object first, while the other feeds it to a hasher directly? It’s possible to use muhash-style serialization without the intermediary object I think.

    Seems that you are right, I have tested this with the code here and I get similar performance as with master (m17.516s, 1m20.996s, 1m22.714s).

  31. fjahr commented at 8:54 pm on October 19, 2023: contributor

    I have just pushed another commit that breaks all the tests and chainparams that were previously fixed but I would like someone else to confirm my findings in terms of performance before I finish everything up, fixing the tests, squashing etc.

    This latest change aligns the serialization for muhash and hash_serialized which eliminates the VARINTs and it doesn’t seem to come with a performance hit after I applied @sipa ’s comment. Other side-effects previously not mentioned: 1. We can get rid of PrepareHash completely. 2. Since TxOutSer is used in coinstatsindex and I made the design choice to change it, I needed to make some further changes and I am exposing now ApplyCoinHash to coinstatsindex and I have also added RemoveCoinHash to coinstats for consistency although it’s only used in coinstatsindex. Happy to discuss this though. Let the bikeshedding begin ;) @dergoegge could you also point your merciless fuzzer at this?

  32. fjahr commented at 9:23 pm on October 19, 2023: contributor

    Actually, we are still indirectly using a VARINT as part of the standard Coin serialization with this latest patch, it’s just not in coinstats anymore: https://github.com/bitcoin/bitcoin/blob/77f0ceb7175dbd00b51e27838a7167804d67646f/src/coins.h#L64

    However, at least this is now the default VARINT and not NONNEGATIVE_SIGNED.

  33. fjahr force-pushed on Oct 19, 2023
  34. DrahtBot added the label CI failed on Oct 19, 2023
  35. fjahr force-pushed on Oct 19, 2023
  36. dergoegge commented at 9:25 am on October 20, 2023: member

    Approach ACK - re-using the muhash serialization makes a lot of sense.

    could you also point your merciless fuzzer at this?

    Done! Didn’t find any more bugs.

  37. in src/kernel/coinstats.cpp:55 in 1c78f7357b outdated
    52+template <typename T>
    53+static void TxOutSer(T& ss, const COutPoint& outpoint, const Coin& coin)
    54 {
    55-    DataStream ss{};
    56     ss << outpoint;
    57     ss << static_cast<uint32_t>(coin.nHeight * 2 + coin.fCoinBase);
    


    dergoegge commented at 9:46 am on October 20, 2023:
    0    ss << static_cast<uint32_t>((coin.nHeight << 1) + coin.fCoinBase);
    

    This should avoid UBSan screaming at us in the future, e.g. kernel/coinstats.cpp:55:46: runtime error: signed integer overflow: 2147483636 * 2 cannot be represented in type 'int'.


    maflcko commented at 9:50 am on October 20, 2023:

    UBSan screaming

    Wouldn’t it scream equally about left shift of a signed integer?


    dergoegge commented at 10:17 am on October 20, 2023:
    Hm, it isn’t complaining. nHeight is also not signed.

    maflcko commented at 10:21 am on October 20, 2023:
    Ah, ok, I mixed it up with CBlockIndex::nHeight. In that case, the potentially-narrowing cast isn’t needed and a non-narrowing cast on the bool to unsigned should be enough?

    fjahr commented at 2:05 pm on October 20, 2023:
    I have adopted @dergoegge ’s suggestion for now, let me know if you would prefer a different change though @maflcko
  38. theStack commented at 11:10 am on October 20, 2023: contributor

    @theStack @fjahr Couldn’t the difference in performance be due to the fact that the muhash style version constructs a temporary DataStream object first, while the other feeds it to a hasher directly? It’s possible to use muhash-style serialization without the intermediary object I think.

    Seems that you are right, I have tested this with the code here and I get similar performance as with master (m17.516s, 1m20.996s, 1m22.714s).

    I have just pushed another commit that breaks all the tests and chainparams that were previously fixed but I would like someone else to confirm my findings in terms of performance before I finish everything up, fixing the tests, squashing etc.

    Re-did the benchmarks the current state of the PR (using MuHash serialization without the temporary DataStream object, commit 1c78f7357b1c1053da472c43575087e3d602a01f) around block height 813024 on mainnet (see earlier benchmark #28685 (comment) for comparison):

    latest state of this PR
    1m16,973s
    1m17,455s
    1m17,509s

    That’s still a bit slower than on master (and alternative implementations), but IMHO it’s not too bad to be a show-stopper for using the MuHash serialization.

  39. fjahr referenced this in commit f59dd58dea on Oct 20, 2023
  40. fjahr force-pushed on Oct 20, 2023
  41. fjahr commented at 2:20 pm on October 20, 2023: contributor

    Cleaned everything up while using the MuHash serialization as discussed. The remaining tests are fixed and chainparams for testnet and signet updated.

    This currently does not preserve the old hash_serialized_2 which is the concern of the (I think) only unaddressed comment coming from @jamesob :

    My only reservation here would be completely displacing hash_serialized_2. It’s bad manners to create a breaking change to the RPC API without deprecation warnings, but I suppose if there is an existing vulnerability, the break may be preferable.

    Not to create more work, but I wonder if it’s worth polling the mailinglist to see if there are existing consumers of gettxoutset.hash_serialized_2 that would be disrupted by its disappearance. Until then, it’s possible to parameterize ApplyHash to make the old, bad behavior optional, and include both keys in non-assumeutxo RPC output.

    I will write a post to the mailing list today and also look into how much pain it would be to preserve the hash_serialized_2 and run it in parallel/allow its usage with a parameter. If people have an opinion on this please comment here as well.

    Personally, I agree we shouldn’t break anything without warning but I am also pretty convinced that there is no serious usage of hash_serialized_2 happening so I think it would be okay to remove it.

    I think this should not block review of this change here though.

  42. fjahr commented at 8:24 pm on October 20, 2023: contributor

    Thanks, everyone for the review and feedback so far!

    A few more updates:

    • The mailing list post went out a few hours ago. I let you know if use cases for hash_serialized_2 arise.
    • Here is a commit that re-introduces hash_serialized_2 on top of the change set here: https://github.com/fjahr/bitcoin/commit/71d2cdcdce63c5c25bcb704badec521d32f2928d The user has the choice to request hash_serialized_2 explicitly as the hash or have it be used by default when activating -deprecatedrpc=gettxoutsetinfo. It’s a bit simpler than I had expected honestly but obviously leaving it out is still cleaner. Please indicate if you would like to have this included here even if nobody speaks up on the mailing list in the next couple of days.
    • I think the CI failure can be ignored

    EDIT: Here is another, slightly altered approach to the re-introduction of hash_serialized_2, using another hash_type instead of bools all over: https://github.com/fjahr/bitcoin/commit/47ae7d8c3c1a10b796d0b5e9cd08da61d7dfedf9

  43. coinstats: Fix hash_serialized2 calculation
    The legacy serialization was vulnerable to maleation and is fixed by
    adopting the same serialization procedure as was already in use for
    MuHash.
    
    This also includes necessary test fixes where the hash_serialized2 was
    hardcoded as well as correction of the regtest chainparams.
    
    Co-authored-by: Sebastian Falbesoner <sebastian.falbesoner@gmail.com>
    351370a1d2
  44. scripted-diff: Rename hash_serialized_2 to hash_serialized_3
    -BEGIN VERIFY SCRIPT-
    sed -i 's/hash_serialized_2/hash_serialized_3/g' $( git grep -l 'hash_serialized_2' ./src ./contrib ./test )
    -END VERIFY SCRIPT-
    cb0336817e
  45. docs: Add release notes for #28685 66865446a7
  46. assumeutxo: Check deserialized coins for out of range values f6213929c5
  47. chainparams, assumeutxo: Fix testnet txoutset hash
    Review hint: You can use devtools/utxo_snapshot.sh to validate this.
    
    ./contrib/devtools/utxo_snapshot.sh 2500000 testnet-utxo.dat ./src/bitcoin-cli
    a503cd0f0b
  48. chainparams, assumeutxo: Fix signet txoutset hash
    Review hint: You can use devtools/utxo_snapshot.sh to validate this.
    
    ./contrib/devtools/utxo_snapshot.sh 160000 signet-utxo.dat ./src/bitcoin-cli
    4bfaad4eca
  49. fjahr force-pushed on Oct 20, 2023
  50. fjahr commented at 8:55 pm on October 20, 2023: contributor
    Rebased since #28669 was merged 🚀
  51. dergoegge approved
  52. dergoegge commented at 10:51 am on October 23, 2023: member

    Code review ACK f6213929c519d0e615cacd3d6f479f1517be1662

    I did not verify the new chainparams hashes.

  53. in doc/release-notes-28685.md:4 in 66865446a7 outdated
    0@@ -0,0 +1,4 @@
    1+RPC
    2+---
    3+
    4+The `hash_serialized_2` value has been removed from `gettxoutsetinfo` since the value it calculated contained a bug and did not take all data into account. It is superseded by `hash_serialized_3` which provides the same functionality but serves the correctly calculated hash.
    


    theStack commented at 1:20 pm on October 23, 2023:

    doc-nit: could add PR reference

    0The `hash_serialized_2` value has been removed from `gettxoutsetinfo` since the value it calculated contained a bug and did not take all data into account. It is superseded by `hash_serialized_3` which provides the same functionality but serves the correctly calculated hash. (#28685)
    
  54. theStack approved
  55. theStack commented at 2:38 pm on October 23, 2023: contributor

    Code-review ACK 4bfaad4eca01674a9c84a447a17594dc2b9a4c39

    Verified the adapted testnet and signet txoutset hashes via

     0$ ./src/bitcoind -testnet &
     1$ ./contrib/devtools/utxo_snapshot.sh 2500000 testnet-utxo.dat ./src/bitcoin-cli -testnet
     2Rewinding chain back to height 2500000 (by invalidating 000000000000016ffe5c7617d0ac05a8bad83aed050241d91a1c2f67c3320bfd); this may take a while
     3Generating UTXO snapshot...                                                                              
     4{                                                   
     5  "coins_written": 29324631,                                                                             
     6  "base_hash": "0000000000000093bcb68c03a9a168ae252572d348a2eaeba2cdf9231d73206f",
     7  "base_height": 2500000,                                                                                
     8  "path": "/home/thestack/.bitcoin/testnet3/testnet-utxo.dat",
     9  "txoutset_hash": "f841584909f68e47897952345234e37fcd9128cd818f41ee6c3ca68db8071be7",                   
    10  "nchaintx": 66484552                                                                                                                                                                                             
    11}
    12Restoring chain to original height; this may take a while                                                                                                                                                                                               
    

    and

     0$ ./src/bitcoind -signet &
     1$ ./contrib/devtools/utxo_snapshot.sh 160000 signet-utxo.dat ./src/bitcoin-cli -signet
     2Rewinding chain back to height 160000 (by invalidating 000000a724dba71c14821ecf2670048cc99de44c206ea2c84411b97dd1ab14a4); this may take a while
     3Generating UTXO snapshot...
     4{
     5  "coins_written": 1278002,
     6  "base_hash": "0000003ca3c99aff040f2563c2ad8f8ec88bd0fd6b8f0895cfaf1ef90353a62c",
     7  "base_height": 160000,
     8  "path": "/home/thestack/.bitcoin/signet/signet-utxo.dat",
     9  "txoutset_hash": fe0a44309b74d6b5883d246cb419c6221bcccf0b308c9b59b7d70783dbdf928a",
    10  "nchaintx": 2289496
    11}
    12Restoring chain to original height; this may take a while
    
  56. DrahtBot requested review from jamesob on Oct 23, 2023
  57. DrahtBot requested review from dergoegge on Oct 23, 2023
  58. in src/kernel/coinstats.cpp:52 in 351370a1d2 outdated
    47@@ -48,15 +48,35 @@ uint64_t GetBogoSize(const CScript& script_pub_key)
    48            script_pub_key.size() /* scriptPubKey */;
    49 }
    50 
    51-DataStream TxOutSer(const COutPoint& outpoint, const Coin& coin)
    52+template <typename T>
    53+static void TxOutSer(T& ss, const COutPoint& outpoint, const Coin& coin)
    


    ryanofsky commented at 3:09 pm on October 23, 2023:

    In commit “coinstats: Fix hash_serialized2 calculation” (351370a1d211615e3d5b158ccb0400cd79c5c085)

    Name of this function is a little confusing since its signature is changing and it is no longer returning a serialization. Would suggest calling it SerializeTxOut or similar.

  59. in src/kernel/coinstats.cpp:85 in 351370a1d2 outdated
    100-        ss << VARINT(it->first + 1);
    101-        ss << it->second.out.scriptPubKey;
    102-        ss << VARINT_MODE(it->second.out.nValue, VarIntMode::NONNEGATIVE_SIGNED);
    103-
    104-        if (it == std::prev(outputs.end())) {
    105-            ss << VARINT(0u);
    


    ryanofsky commented at 3:47 pm on October 23, 2023:

    In commit “coinstats: Fix hash_serialized2 calculation” (351370a1d211615e3d5b158ccb0400cd79c5c085)

    It seems reasonable to to get rid of ss << VARINT(0) added to the hash between transactions. It seems to date back to the original hashing code added in #7756 and I guess along with the nonzero ss << VARINT(it->first + 1) above was intended to delimit the boundary between transactions. None of this is necessary with the new hash format that repeats the transaction id before each coin.

  60. ryanofsky approved
  61. ryanofsky commented at 3:52 pm on October 23, 2023: contributor

    Code review ACK 4bfaad4eca01674a9c84a447a17594dc2b9a4c39

    I only reviewed the code and did not confirm the hardcoded hash values, but code changes seem very good. At first I was confused why the first commit was making so many changes, instead of just doing a minimal bugfix:

    0-ss << VARINT(it->second.nHeight * 2 + it->second.fCoinBase ? 1u : 0u);
    1+ss << VARINT(it->second.nHeight * 2 + (it->second.fCoinBase ? 1u : 0u));
    

    and updating the hashes.

    But I guess the idea is that as long as the hash values need to change anyway, might as well make other changes to the hash format and simplify the code around it. Making these bigger changes to the way this hash is computed instead of just fixing the bug and restoring the original hash_serialized_2 introduced in #10195 and broken in #12737 does seem reasonable.

  62. achow101 commented at 7:04 pm on October 23, 2023: member

    ACK 4bfaad4eca01674a9c84a447a17594dc2b9a4c39

    Also verified that the updated chainparams are correct.

  63. achow101 merged this on Oct 23, 2023
  64. achow101 closed this on Oct 23, 2023

  65. fjahr commented at 9:45 pm on October 23, 2023: contributor
    Thanks a lot to the reviewers! FYI, nobody has reached out concerning the usage of hash_serialized_2. If that changes I will make it known immediately.
  66. bitcoin deleted a comment on Oct 23, 2023
  67. pablomartin4btc commented at 3:09 am on October 24, 2023: member

    post-merge tACK da8e397e4a1bb1cd6185016d6537e82ac3f277c7

     0./contrib/devtools/utxo_snapshot.sh 160000 signet-utxo.dat ./src/bitcoin-cli -signet
     1Rewinding chain back to height 160000 (by invalidating 000000a724dba71c14821ecf2670048cc99de44c206ea2c84411b97dd1ab14a4); this may take a while
     2Generating UTXO snapshot...
     3{
     4  "coins_written": 1278002,
     5  "base_hash": "0000003ca3c99aff040f2563c2ad8f8ec88bd0fd6b8f0895cfaf1ef90353a62c",
     6  "base_height": 160000,
     7  "path": "/home/pablo/.bitcoin/signet/signet-utxo.dat",
     8  "txoutset_hash": "fe0a44309b74d6b5883d246cb419c6221bcccf0b308c9b59b7d70783dbdf928a",
     9  "nchaintx": 2289496
    10}
    11Restoring chain to original height; this may take a while
    
     0./contrib/devtools/utxo_snapshot.sh 2500000 testnet-utxo.dat ./src/bitcoin-cli -testnet
     1Rewinding chain back to height 2500000 (by invalidating 000000000000016ffe5c7617d0ac05a8bad83aed050241d91a1c2f67c3320bfd); this may take a while
     2Generating UTXO snapshot...
     3{
     4  "coins_written": 29324631,
     5  "base_hash": "0000000000000093bcb68c03a9a168ae252572d348a2eaeba2cdf9231d73206f",
     6  "base_height": 2500000,
     7  "path": "/home/pablo/.bitcoin/testnet3/testnet-utxo.dat",
     8  "txoutset_hash": "f841584909f68e47897952345234e37fcd9128cd818f41ee6c3ca68db8071be7",
     9  "nchaintx": 66484552
    10}
    11Restoring chain to original height; this may take a while
    
  68. pablomartin4btc referenced this in commit 24deb2022b on Oct 25, 2023
  69. Sjors commented at 6:49 am on October 25, 2023: member
    I’ll look into updating the hash in #28553

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me