The current hex tx decoding logic will refuse to decode valid extended-encoded transactions if the result fails the heuristic sanity check, even when the legacy-encoding fails. Fix this.
Fixes #20579
@instagibbs @ajtowns Ping, it seems this may be fixing a regression caused by #17775 (see #20579).
125 | @@ -126,31 +126,53 @@ static bool CheckTxScriptsSanity(const CMutableTransaction& tx) 126 | 127 | static bool DecodeTx(CMutableTransaction& tx, const std::vector<unsigned char>& tx_data, bool try_no_witness, bool try_witness) 128 | { 129 | + CMutableTransaction tx_wit, tx_nowit; 130 | + bool ok_wit = false, ok_nowit = false;
naming nit: not sure if this is better, but maybe wtx, tx, wtx_ok, tx_ok
I've renamed everything to {tx,ok}_{extended,legacy}, because extended serialization doesn't imply the result actually is a segwit transaction - it just means prefer the interpretation of extended serialization when there is ambiguity.
143 | } catch (const std::exception&) {
144 | // Fall through.
145 | }
146 | }
147 |
148 | + // Try decoding with legacy serialization, and remember if the result succesfully consumes the entire input.
nit, "successfully" (also line 132)
Fixed.
ACK daf98fec173d259afe3cd5640e73ac51a91e0a4
Could perhaps use some tests.
Can you grab the example in the issue(or generate one) as a regression test?
On Tue, Dec 8, 2020, 7:37 AM Pieter Wuille notifications@github.com wrote:
@sipa commented on this pull request.
In src/core_read.cpp https://github.com/bitcoin/bitcoin/pull/20595#discussion_r537917709:
} catch (const std::exception&) {// Fall through. } }
- // Try decoding with legacy serialization, and remember if the result succesfully consumes the entire input.
Fixed.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bitcoin/bitcoin/pull/20595#discussion_r537917709, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABMAFU63TLZS43YZI27WVZ3STVRM7ANCNFSM4URDWYYQ .
171 | + if (ok_legacy && CheckTxScriptsSanity(tx_legacy)) { 172 | + tx = std::move(tx_legacy); 173 | + return true; 174 | + } 175 | + 176 | + // If only one deserialization succeeded, use that one.
The following if statement also handles the case where both deserializations succeeded but neither passed sanity, which the comments skip.
Maybe
// If both deserializations succeeded, prefer the one for which CheckTxScriptsSanity returns true.
// If that's the case for both ** or neither **, prefer the extended one
// If only one deserialization succeeded, use that one.
if (ok_ext && ok_legacy)
if (TxSanity(tx_ext) || !TxSanity(tx_legacy)) {
return tx_ext;
} else {
return tx_legacy;
}
} else if (ok_ext) {
return tx_ext;
} else if (ok_legacy) {
return tx_legacy;
} else {
fail;
}
would match the comments better? Or could consolidate the conditions something like:
if (ok_ext && !(ok_legacy && !TxSanity(tx_ext) && TxSanity(tx_legacy)) {
// If extended decode succeeded, prefer it, except if legacy also succeeded,
// extended didn't pass sanity checks, and legacy did pass sanity checks
return tx_ext;
} else if (ok_legacy) {
// Otherwise, if legacy succeeded use it
return tx_legacy;
} else {
// Otherwise fail
fail;
}
Alternatively you could stay closer to the current code with:
if (try_witness) {
ok_ext = Decode(tx_ext);
if (ok_ext && (!try_no_witness || Sanity(tx_ext))) return tx_ext;
}
if (try_no_witness) {
ok_legacy = Decode(tx_legacy);
if (ok_legacy && (!ok_ext || Sanity(tx_legacy)) return tx_legacy;
}
if (ok_ext) return tx_ext; // neither pass sanity checks
fail;
The comment implies that only one deserialization has succeeded if this line is reached, which is not true
// If both deserializations succeeded, prefer the extended one. Otherwise use the one that succeeded deserialization.
For code-readability I prefer the version of how the pr is currently written. The first two options are too nested to be readable.
If the goal is to skip a useless decoding in the early-return case (Sanity(tx_ext)), I prefer your third option.
I've added a number of comments, and restructured a bit along the lines of the 3rd option.
Rewrote the comments.
125 | @@ -126,31 +126,55 @@ static bool CheckTxScriptsSanity(const CMutableTransaction& tx) 126 | 127 | static bool DecodeTx(CMutableTransaction& tx, const std::vector<unsigned char>& tx_data, bool try_no_witness, bool try_witness) 128 | { 129 | + CMutableTransaction tx_extended, tx_legacy;
This potentially doubles the memory usage of DecodeTx (previously, up to 4MB for each of tx, tx_data; now an additional 4MB for each of tx_extended and tx_legacy). Is that a potential problem anywhere? (I don't think it is, but...)
It is only called by the user. It would be a bug if the legacy encoding was tried for any "real" transactions.
Yeah, I don't think this is a concern. This function is only used when doing hex decoding.
For the example in the bug report, decoderawtransaction (without specifying witness-only decode) fails, but getrawtransaction c61adde6f1c09ef4c9864e5c5aa7d92e0c6e76e4413710cf989bae9145ef141e true 0000000000000000001051d0017518f9fb7a7d201751a34fa91621ed32c0d88f succeeds, so I think this is working as I expected. I didn't suggest a change along these lines because of the extra memory usage, but if we're okay with that, +1 from me.
Separately should we consider doing anything to remove the output that fails the sanity check from the utxo set, given it doesn't parse and hence is unspendable?
ACK 451556858a23d81fd23ab2f3f7a72f61856c62d3
ACK, it seems this comes with two changes. It prefers the extended encoding:
It might be possible to harvest seeds from the decode_tx fuzz test which has good coverage for this: https://drahtbot.space/host_reports/DrahtBot/reports/coverage_fuzz/sipa/bitcoin/451556858a23d81f/fuzz.coverage/src/core_read.cpp.gcov.html
Whenever both encodings are permitted, try both, and if only one succeeds,
return that one. Otherwise prefer the one for which the heuristic sanity
check passes. If that is the case for neither or for both, return the
extended-permitting deserialization.
Restructured the code a bit, and added comments. Also added a regression test based on the example from #20579. @MarcoFalke I wrote a fuzzer to find transactions that decode incorrectly (see https://github.com/sipa/bitcoin/tree/202012_fancy_tx_hex_decode_fuzz). Turns out it's not very hard if you feed it a few real transactions as seeds, but they all look very weird. Using this we could probably quickly find extra conditions to add to CheckTxScriptsSanity, but I think we should keep that for a future PR.
Code review ACK 0f949cde3dff15170db7930b0f7345ff995c267d
Tested ACK 0f949cde3dff15170db7930b0f7345ff995c267d
The new naming, comments, and code organisation are much improved.
<details><summary>Toyed with placing the <code>CheckTxScriptsSanity</code> calls right after setting each <code>ok_{extended,legacy}</code> but it's less readable.</summary><p>
diff --git a/src/core_read.cpp b/src/core_read.cpp
index 7687a86185..fe64b7b95e 100644
--- a/src/core_read.cpp
+++ b/src/core_read.cpp
@@ -148,37 +148,39 @@ static bool DecodeTx(CMutableTransaction& tx, const std::vector<unsigned char>&
CDataStream ssData(tx_data, SER_NETWORK, PROTOCOL_VERSION);
try {
ssData >> tx_extended;
- if (ssData.empty()) ok_extended = true;
+ if (ssData.empty()) {
+ ok_extended = true;
+ // Optimization: if extended decoding succeeded and the result passes CheckTxScriptsSanity,
+ // don't bother decoding the other way.
+ if (CheckTxScriptsSanity(tx_extended)) {
+ tx = std::move(tx_extended);
+ return true;
+ }
+ }
} catch (const std::exception&) {
// Fall through.
}
}
- // Optimization: if extended decoding succeeded and the result passes CheckTxScriptsSanity,
- // don't bother decoding the other way.
- if (ok_extended && CheckTxScriptsSanity(tx_extended)) {
- tx = std::move(tx_extended);
- return true;
- }
-
// Try decoding with legacy serialization, and remember if the result successfully consumes the entire input.
if (try_no_witness) {
CDataStream ssData(tx_data, SER_NETWORK, PROTOCOL_VERSION | SERIALIZE_TRANSACTION_NO_WITNESS);
try {
ssData >> tx_legacy;
- if (ssData.empty()) ok_legacy = true;
+ if (ssData.empty()) {
+ ok_legacy = true;
+ // If legacy decoding succeeded and passes CheckTxScriptsSanity, that's our answer, as we know
+ // at this point that extended decoding either failed or doesn't pass the sanity check.
+ if (CheckTxScriptsSanity(tx_legacy)) {
+ tx = std::move(tx_legacy);
+ return true;
+ }
+ }
} catch (const std::exception&) {
// Fall through.
}
}
- // If legacy decoding succeeded and passes CheckTxScriptsSanity, that's our answer, as we know
- // at this point that extended decoding either failed or doesn't pass the sanity check.
- if (ok_legacy && CheckTxScriptsSanity(tx_legacy)) {
- tx = std::move(tx_legacy);
- return true;
- }
-
// If extended decoding succeeded, and neither decoding passes sanity, return the extended one.
</p></details>
Code review ACK 0f949cde3dff15170db7930b0f7345ff995c267d
The new naming, comments, and code organisation are much improved.
Yes the comments definitely help clarify why the function is like this.
Backported in #20612
Milestone
0.21.0