Closes #28675
The last commit demonstrates that theStack’s analysis here seems to be correct. There will be more changes needed for the rest of the test suite but the feature_assumeutxo.py
with my additional tests pass.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
For detailed information about the code coverage, see the test coverage report.
See the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
I’ve written a fuzz test that catches the original bug mentioned in #28675 (comment): https://github.com/dergoegge/bitcoin/blob/2023-10-fuzz-coinstats-hash/src/test/fuzz/coinstats_hash.cpp.
I think it also caught a second issue regarding the values of UTXOs, certain negative values seem to result in the same hash as their positive counter part. I have not investigated further but I assume it is realted to VarIntMode::NONNEGATIVE_SIGNED
.
0$ echo "CMoC9Q==" | base64 --decode > coinstats_hash.crash
1$ FUZZ_PRINTF=1 FUZZ=coinstats_hash ./src/test/fuzz/fuzz coinstats_hash.crash
2INFO: Running with entropic power schedule (0xFF, 100).
3INFO: Seed: 3663010564
4INFO: Loaded 1 modules (379542 inline 8-bit counters): 379542 [0x560080cb5d88, 0x560080d1281e),
5INFO: Loaded 1 PC tables (379542 PCs): 379542 [0x560080d12820,0x5600812dd180),
6./src/test/fuzz/fuzz: Running 1 inputs 1 time(s) each.
7Running: coinstats_hash_crashes/crash-d59c557ae7f9e837716b272a05d1397a94f589d7
82023-10-19T11:34:50Z Opened LevelDB successfully
92023-10-19T11:34:50Z Using obfuscation key for : 0000000000000000
102023-10-19T11:34:50Z mutating a coin
112023-10-19T11:34:50Z coinbase=1 height=1386179375 value=-5934962493229608344 scriptPubKey=[97951910994b0a6104092fbc9b385639]
122023-10-19T11:34:50Z coinbase=1 height=1386179375 value=6856422276773357632 scriptPubKey=[97951910994b0a6104092fbc9b385639]
13fuzz: test/fuzz/coinstats_hash.cpp:135: void coinstats_muhash_fuzz_target(FuzzBufferType): Assertion `mutated_result->hashSerialized != result->hashSerialized' failed.
Using the varints in the serialization for the hashes seems like unnecessary complexity in the first place. Since we are changing the hash format already, maybe we just get rid of the varints entirely?
(I’ve not polished the fuzz test, sorry if you run into bugs in the test itself)
@dergoegge Awesome to have a fuzz test already :)
I think the issue with negative amounts is good to know but not as critical. Since negative output values are not consensus valid it seems ok for the serialization to assume that they are not negative. That’s at least how I understand the issue right now. So I think in practice this should at least not lead to a different hash.
I am unsure about removing VARINT from this code entirely. I agree that for the hashing it is not needed but we also use it when we dump the utxo set and there we probably don’t want to get rid of it since it’s saving space. So we still have to reason about it anyway and I think VARINT isn’t really a problem in general. It’s widely used in our code and I think also pretty well understood. But I won’t fight it if people agree that this is a valuable change.
Let’s discuss this in the IRC meeting today!
I think this should also change hash_serialized_2
-> hash_serialized_3
, and add a release note. Given that this was broken, and likely had very few users, I don’t see a problem with a breaking change, and making it clear to anyone that may have been using it, that things were broken (better than silently starting to return different results).
I don’t see any value in keeping around a broken hash_serialized_2
either, as that will just lead to future confusion, in relation to AssumeUTXO.
Using the varints in the serialization for the hashes seems like unnecessary complexity in the first place. Since we are changing the hash format already, maybe we just get rid of the varints entirely?
That’s something worth considering IMHO, had similar thoughts. I assume the primary (probably only?) reason for using VARINTs in the hashing routine is that it’s supposedly faster, as it leads to less total data that has to be processed by the underlying hash function (e.g. the vast majority of UTXO amounts can probably be encoded in 2-4 bytes if encoded as VARINTs, rather than needing the full 8 bytes). Would be interesting to benchmark how much difference in performance it really is, to judge if it’s worth it to keep the VARINTs in the hashing code.
I think this should also change
hash_serialized_2
->hash_serialized_3
Makes sense, added
Since negative output values are not consensus valid it seems ok for the serialization to assume that they are not negative.
Couldn’t an attacker distribute utxo sets with the negative values instead of the correct positive ones without it being detected at startup? This would then later lead to a consensus split when these outputs are spent.
Couldn’t an attacker distribute utxo sets with the negative values instead of the correct positive ones without it being detected at startup? This would then later lead to a consensus split when these outputs are spent.
On that note, may not be a bad idea to add a negative-amount sanity check to ActivateSnapshot()
as a belt-and-suspenders, since we’re already checking the height for overflow anyway: https://github.com/jamesob/bitcoin/blob/447f50e4aed9a8b1d80e1891cda85801aeb80b4e/src/validation.cpp#L4926-L4928.
Couldn’t an attacker distribute utxo sets with the negative values instead of the correct positive ones without it being detected at startup? This would then later lead to a consensus split when these outputs are spent.
On that note, may not be a bad idea to add a negative-amount sanity check to
ActivateSnapshot()
as a belt-and-suspenders, since we’re already checking the height for overflow anyway: https://github.com/jamesob/bitcoin/blob/447f50e4aed9a8b1d80e1891cda85801aeb80b4e/src/validation.cpp#L4926-L4928.
Yeah, I thought somehow that this was enforced by TxOut
itself but it seems that is not the case.
5398@@ -5399,6 +5399,11 @@ bool ChainstateManager::PopulateAndValidateSnapshot(
5399 coins_count - coins_left);
5400 return false;
5401 }
5402+ if (coin.out.nValue < 0) {
MoneyRange
right away…
Concept ACK and approach ACK
My only reservation here would be completely displacing hash_serialized_2
. It’s bad manners to create a breaking change to the RPC API without deprecation warnings, but I suppose if there is an existing vulnerability, the break may be preferable.
Not to create more work, but I wonder if it’s worth polling the mailinglist to see if there are existing consumers of gettxoutset.hash_serialized_2
that would be disrupted by its disappearance. Until then, it’s possible to parameterize ApplyHash
to make the old, bad behavior optional, and include both keys in non-assumeutxo RPC output.
73@@ -74,7 +74,7 @@ static void ApplyHash(HashWriter& ss, const uint256& hash, const std::map<uint32
74 for (auto it = outputs.begin(); it != outputs.end(); ++it) {
75 if (it == outputs.begin()) {
76 ss << hash;
77- ss << VARINT(it->second.nHeight * 2 + it->second.fCoinBase ? 1u : 0u);
78+ ss << VARINT(it->second.nHeight * 2 + (it->second.fCoinBase ? 1u : 0u));
I had the same suspicion earlier, but: all coins handled in this loop belong to the same tx (same outpoint hash), so they consequently should also have the same height and coinbase values, no?
// EDIT: nevermind, I wasn’t thinking this to the end.
I’ve changed the fuzzer and it found one more bug (see comment inline): https://github.com/dergoegge/bitcoin/tree/2023-10-fuzz-coinstats-hash
The version of ApplyHash
on that branch no longer uses VARINT and looks much simpler imo.
Imo, we should fix all bugs that cause two different versions of a snapshot file to result in the same hash. Otherwise the assumeutxo check becomes “check that the hashes match and check for negative output values and check …” instead of “check that the hashes match”.
Here are some benchmark results of different branches that has been proposed so far.
I’ve ran $ time ./src/bitcoin-cli gettxoutsetinfo
on mainnet, around block height 812940 three times each:
master (buggy) | this PR (with VARINTs, simple fix, still buggy) | dergoegge’s branch (without VARINTs, bugfixed) | my branch (using TxOutSer serialization, as used in MuHash) |
---|---|---|---|
1m8,664s | 1m9,024s | 1m9,861s | 1m25,875s |
1m6,803s | 1m8,815s | 1m12,192s | 1m28,743s |
1m8,808s | 1m9,466s | 1m12,227s | 1m28,711s |
The TxOutSer
approach is the simplest one (ApplyHash
follows the exact same logic as for MuHash, only that the data is fed into HashWriter
rather than MuHash3072
instance), but looking at the numbers, it seems that going with @dergoegge’s solution makes the most sense?
I have similar results as @theStack . Involvement of VARINT
doesn’t seem to have much impact on performance, but MuHash style serialization does. Though that would allow for nice simplifications in coinstats (see that commit).
master | VARINT removed (https://github.com/fjahr/bitcoin/commit/0431c115904b8df344efc864e1b2cc151cafa0ee) |
MuHash style (https://github.com/fjahr/bitcoin/commit/6b3628f2c8402695af970a6f2e54da53a30d3c65) |
---|---|---|
1m23.825s | 1m15.944s | 1m37.636s |
1m15.484s | 1m16.271s | 1m39.447s |
1m16.042s | 1m16.467s | 1m36.454s |
@theStack @fjahr Couldn’t the difference in performance be due to the fact that the muhash style version constructs a temporary
DataStream
object first, while the other feeds it to a hasher directly? It’s possible to use muhash-style serialization without the intermediary object I think.
Seems that you are right, I have tested this with the code here and I get similar performance as with master (m17.516s, 1m20.996s, 1m22.714s).
I have just pushed another commit that breaks all the tests and chainparams that were previously fixed but I would like someone else to confirm my findings in terms of performance before I finish everything up, fixing the tests, squashing etc.
This latest change aligns the serialization for muhash and hash_serialized which eliminates the VARINT
s and it doesn’t seem to come with a performance hit after I applied @sipa ’s comment. Other side-effects previously not mentioned: 1. We can get rid of PrepareHash
completely. 2. Since TxOutSer
is used in coinstatsindex and I made the design choice to change it, I needed to make some further changes and I am exposing now ApplyCoinHash
to coinstatsindex and I have also added RemoveCoinHash
to coinstats for consistency although it’s only used in coinstatsindex. Happy to discuss this though. Let the bikeshedding begin ;)
@dergoegge could you also point your merciless fuzzer at this?
Actually, we are still indirectly using a VARINT
as part of the standard Coin
serialization with this latest patch, it’s just not in coinstats
anymore: https://github.com/bitcoin/bitcoin/blob/77f0ceb7175dbd00b51e27838a7167804d67646f/src/coins.h#L64
However, at least this is now the default VARINT
and not NONNEGATIVE_SIGNED
.
Approach ACK - re-using the muhash serialization makes a lot of sense.
could you also point your merciless fuzzer at this?
Done! Didn’t find any more bugs.
52+template <typename T>
53+static void TxOutSer(T& ss, const COutPoint& outpoint, const Coin& coin)
54 {
55- DataStream ss{};
56 ss << outpoint;
57 ss << static_cast<uint32_t>(coin.nHeight * 2 + coin.fCoinBase);
0 ss << static_cast<uint32_t>((coin.nHeight << 1) + coin.fCoinBase);
This should avoid UBSan screaming at us in the future, e.g. kernel/coinstats.cpp:55:46: runtime error: signed integer overflow: 2147483636 * 2 cannot be represented in type 'int'
.
UBSan screaming
Wouldn’t it scream equally about left shift of a signed integer?
nHeight
is also not signed.
CBlockIndex::nHeight
. In that case, the potentially-narrowing cast isn’t needed and a non-narrowing cast on the bool to unsigned should be enough?
@theStack @fjahr Couldn’t the difference in performance be due to the fact that the muhash style version constructs a temporary
DataStream
object first, while the other feeds it to a hasher directly? It’s possible to use muhash-style serialization without the intermediary object I think.Seems that you are right, I have tested this with the code here and I get similar performance as with master (m17.516s, 1m20.996s, 1m22.714s).
I have just pushed another commit that breaks all the tests and chainparams that were previously fixed but I would like someone else to confirm my findings in terms of performance before I finish everything up, fixing the tests, squashing etc.
Re-did the benchmarks the current state of the PR (using MuHash serialization without the temporary DataStream
object, commit 1c78f7357b1c1053da472c43575087e3d602a01f) around block height 813024 on mainnet (see earlier benchmark #28685 (comment) for comparison):
latest state of this PR |
---|
1m16,973s |
1m17,455s |
1m17,509s |
That’s still a bit slower than on master (and alternative implementations), but IMHO it’s not too bad to be a show-stopper for using the MuHash serialization.
Cleaned everything up while using the MuHash serialization as discussed. The remaining tests are fixed and chainparams for testnet and signet updated.
This currently does not preserve the old hash_serialized_2
which is the concern of the (I think) only unaddressed comment coming from @jamesob :
My only reservation here would be completely displacing hash_serialized_2. It’s bad manners to create a breaking change to the RPC API without deprecation warnings, but I suppose if there is an existing vulnerability, the break may be preferable.
Not to create more work, but I wonder if it’s worth polling the mailinglist to see if there are existing consumers of gettxoutset.hash_serialized_2 that would be disrupted by its disappearance. Until then, it’s possible to parameterize ApplyHash to make the old, bad behavior optional, and include both keys in non-assumeutxo RPC output.
I will write a post to the mailing list today and also look into how much pain it would be to preserve the hash_serialized_2
and run it in parallel/allow its usage with a parameter. If people have an opinion on this please comment here as well.
Personally, I agree we shouldn’t break anything without warning but I am also pretty convinced that there is no serious usage of hash_serialized_2
happening so I think it would be okay to remove it.
I think this should not block review of this change here though.
Thanks, everyone for the review and feedback so far!
A few more updates:
hash_serialized_2
arise.hash_serialized_2
on top of the change set here: https://github.com/fjahr/bitcoin/commit/71d2cdcdce63c5c25bcb704badec521d32f2928d The user has the choice to request hash_serialized_2
explicitly as the hash or have it be used by default when activating -deprecatedrpc=gettxoutsetinfo
. It’s a bit simpler than I had expected honestly but obviously leaving it out is still cleaner. Please indicate if you would like to have this included here even if nobody speaks up on the mailing list in the next couple of days.EDIT: Here is another, slightly altered approach to the re-introduction of hash_serialized_2
, using another hash_type instead of bools all over: https://github.com/fjahr/bitcoin/commit/47ae7d8c3c1a10b796d0b5e9cd08da61d7dfedf9
The legacy serialization was vulnerable to maleation and is fixed by
adopting the same serialization procedure as was already in use for
MuHash.
This also includes necessary test fixes where the hash_serialized2 was
hardcoded as well as correction of the regtest chainparams.
Co-authored-by: Sebastian Falbesoner <sebastian.falbesoner@gmail.com>
-BEGIN VERIFY SCRIPT-
sed -i 's/hash_serialized_2/hash_serialized_3/g' $( git grep -l 'hash_serialized_2' ./src ./contrib ./test )
-END VERIFY SCRIPT-
Review hint: You can use devtools/utxo_snapshot.sh to validate this.
./contrib/devtools/utxo_snapshot.sh 2500000 testnet-utxo.dat ./src/bitcoin-cli
Review hint: You can use devtools/utxo_snapshot.sh to validate this.
./contrib/devtools/utxo_snapshot.sh 160000 signet-utxo.dat ./src/bitcoin-cli
Code review ACK f6213929c519d0e615cacd3d6f479f1517be1662
I did not verify the new chainparams hashes.
0@@ -0,0 +1,4 @@
1+RPC
2+---
3+
4+The `hash_serialized_2` value has been removed from `gettxoutsetinfo` since the value it calculated contained a bug and did not take all data into account. It is superseded by `hash_serialized_3` which provides the same functionality but serves the correctly calculated hash.
doc-nit: could add PR reference
0The `hash_serialized_2` value has been removed from `gettxoutsetinfo` since the value it calculated contained a bug and did not take all data into account. It is superseded by `hash_serialized_3` which provides the same functionality but serves the correctly calculated hash. (#28685)
Code-review ACK 4bfaad4eca01674a9c84a447a17594dc2b9a4c39
Verified the adapted testnet and signet txoutset hashes via
0$ ./src/bitcoind -testnet &
1$ ./contrib/devtools/utxo_snapshot.sh 2500000 testnet-utxo.dat ./src/bitcoin-cli -testnet
2Rewinding chain back to height 2500000 (by invalidating 000000000000016ffe5c7617d0ac05a8bad83aed050241d91a1c2f67c3320bfd); this may take a while
3Generating UTXO snapshot...
4{
5 "coins_written": 29324631,
6 "base_hash": "0000000000000093bcb68c03a9a168ae252572d348a2eaeba2cdf9231d73206f",
7 "base_height": 2500000,
8 "path": "/home/thestack/.bitcoin/testnet3/testnet-utxo.dat",
9 "txoutset_hash": "f841584909f68e47897952345234e37fcd9128cd818f41ee6c3ca68db8071be7",
10 "nchaintx": 66484552
11}
12Restoring chain to original height; this may take a while
and
0$ ./src/bitcoind -signet &
1$ ./contrib/devtools/utxo_snapshot.sh 160000 signet-utxo.dat ./src/bitcoin-cli -signet
2Rewinding chain back to height 160000 (by invalidating 000000a724dba71c14821ecf2670048cc99de44c206ea2c84411b97dd1ab14a4); this may take a while
3Generating UTXO snapshot...
4{
5 "coins_written": 1278002,
6 "base_hash": "0000003ca3c99aff040f2563c2ad8f8ec88bd0fd6b8f0895cfaf1ef90353a62c",
7 "base_height": 160000,
8 "path": "/home/thestack/.bitcoin/signet/signet-utxo.dat",
9 "txoutset_hash": fe0a44309b74d6b5883d246cb419c6221bcccf0b308c9b59b7d70783dbdf928a",
10 "nchaintx": 2289496
11}
12Restoring chain to original height; this may take a while
47@@ -48,15 +48,35 @@ uint64_t GetBogoSize(const CScript& script_pub_key)
48 script_pub_key.size() /* scriptPubKey */;
49 }
50
51-DataStream TxOutSer(const COutPoint& outpoint, const Coin& coin)
52+template <typename T>
53+static void TxOutSer(T& ss, const COutPoint& outpoint, const Coin& coin)
In commit “coinstats: Fix hash_serialized2 calculation” (351370a1d211615e3d5b158ccb0400cd79c5c085)
Name of this function is a little confusing since its signature is changing and it is no longer returning a serialization. Would suggest calling it SerializeTxOut or similar.
100- ss << VARINT(it->first + 1);
101- ss << it->second.out.scriptPubKey;
102- ss << VARINT_MODE(it->second.out.nValue, VarIntMode::NONNEGATIVE_SIGNED);
103-
104- if (it == std::prev(outputs.end())) {
105- ss << VARINT(0u);
In commit “coinstats: Fix hash_serialized2 calculation” (351370a1d211615e3d5b158ccb0400cd79c5c085)
It seems reasonable to to get rid of ss << VARINT(0)
added to the hash between transactions. It seems to date back to the original hashing code added in #7756 and I guess along with the nonzero ss << VARINT(it->first + 1)
above was intended to delimit the boundary between transactions. None of this is necessary with the new hash format that repeats the transaction id before each coin.
Code review ACK 4bfaad4eca01674a9c84a447a17594dc2b9a4c39
I only reviewed the code and did not confirm the hardcoded hash values, but code changes seem very good. At first I was confused why the first commit was making so many changes, instead of just doing a minimal bugfix:
0-ss << VARINT(it->second.nHeight * 2 + it->second.fCoinBase ? 1u : 0u);
1+ss << VARINT(it->second.nHeight * 2 + (it->second.fCoinBase ? 1u : 0u));
and updating the hashes.
But I guess the idea is that as long as the hash values need to change anyway, might as well make other changes to the hash format and simplify the code around it. Making these bigger changes to the way this hash is computed instead of just fixing the bug and restoring the original hash_serialized_2
introduced in #10195 and broken in #12737 does seem reasonable.
ACK 4bfaad4eca01674a9c84a447a17594dc2b9a4c39
Also verified that the updated chainparams are correct.
hash_serialized_2
. If that changes I will make it known immediately.
post-merge tACK da8e397e4a1bb1cd6185016d6537e82ac3f277c7
0./contrib/devtools/utxo_snapshot.sh 160000 signet-utxo.dat ./src/bitcoin-cli -signet
1Rewinding chain back to height 160000 (by invalidating 000000a724dba71c14821ecf2670048cc99de44c206ea2c84411b97dd1ab14a4); this may take a while
2Generating UTXO snapshot...
3{
4 "coins_written": 1278002,
5 "base_hash": "0000003ca3c99aff040f2563c2ad8f8ec88bd0fd6b8f0895cfaf1ef90353a62c",
6 "base_height": 160000,
7 "path": "/home/pablo/.bitcoin/signet/signet-utxo.dat",
8 "txoutset_hash": "fe0a44309b74d6b5883d246cb419c6221bcccf0b308c9b59b7d70783dbdf928a",
9 "nchaintx": 2289496
10}
11Restoring chain to original height; this may take a while
0./contrib/devtools/utxo_snapshot.sh 2500000 testnet-utxo.dat ./src/bitcoin-cli -testnet
1Rewinding chain back to height 2500000 (by invalidating 000000000000016ffe5c7617d0ac05a8bad83aed050241d91a1c2f67c3320bfd); this may take a while
2Generating UTXO snapshot...
3{
4 "coins_written": 29324631,
5 "base_hash": "0000000000000093bcb68c03a9a168ae252572d348a2eaeba2cdf9231d73206f",
6 "base_height": 2500000,
7 "path": "/home/pablo/.bitcoin/testnet3/testnet-utxo.dat",
8 "txoutset_hash": "f841584909f68e47897952345234e37fcd9128cd818f41ee6c3ca68db8071be7",
9 "nchaintx": 66484552
10}
11Restoring chain to original height; this may take a while