Improve heuristic hex transaction decoding

sipa commented at 10:39 pm on December 7, 2020: member

The current hex tx decoding logic will refuse to decode valid extended-encoded transactions if the result fails the heuristic sanity check, even when the legacy-encoding fails. Fix this.

Fixes #20579

fanquake added the label Needs backport (0.21) on Dec 7, 2020

sipa commented at 10:55 pm on December 7, 2020: member

@instagibbs @ajtowns Ping, it seems this may be fixing a regression caused by #17775 (see #20579).

in src/core_read.cpp:130 in daf98fec17 outdated

125@@ -126,31 +126,53 @@ static bool CheckTxScriptsSanity(const CMutableTransaction& tx)
126 
127 static bool DecodeTx(CMutableTransaction& tx, const std::vector<unsigned char>& tx_data, bool try_no_witness, bool try_witness)
128 {
129+    CMutableTransaction tx_wit, tx_nowit;
130+    bool ok_wit = false, ok_nowit = false;

jonatack commented at 11:07 pm on December 7, 2020:

naming nit: not sure if this is better, but maybe wtx, tx, wtx_ok, tx_ok

sipa commented at 11:37 pm on December 7, 2020:

I’ve renamed everything to {tx,ok}_{extended,legacy}, because extended serialization doesn’t imply the result actually is a segwit transaction - it just means prefer the interpretation of extended serialization when there is ambiguity.

in src/core_read.cpp:145 in daf98fec17 outdated

143         } catch (const std::exception&) {
144             // Fall through.
145         }
146     }
147 
148+    // Try decoding with legacy serialization, and remember if the result succesfully consumes the entire input.

jonatack commented at 11:14 pm on December 7, 2020:

nit, “successfully” (also line 132)

sipa commented at 11:37 pm on December 7, 2020:

Fixed.

laanwj added the label RPC/REST/ZMQ on Dec 7, 2020

sipa renamed this:
~~Improve heuristic hex decoding~~
Improve heuristic hex transaction decoding
on Dec 7, 2020

jonatack commented at 11:21 pm on December 7, 2020: member

ACK daf98fec173d259afe3cd5640e73ac51a91e0a4

Could perhaps use some tests.

sipa force-pushed on Dec 7, 2020

instagibbs commented at 2:29 am on December 8, 2020: member

Can you grab the example in the issue(or generate one) as a regression test?

On Tue, Dec 8, 2020, 7:37 AM Pieter Wuille notifications@github.com wrote:

@sipa commented on this pull request.

In src/core_read.cpp https://github.com/bitcoin/bitcoin/pull/20595#discussion_r537917709:
     } catch (const std::exception&) {
         // Fall through.
     }
 }
// Try decoding with legacy serialization, and remember if the result succesfully consumes the entire input.

Fixed.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bitcoin/bitcoin/pull/20595#discussion_r537917709, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABMAFU63TLZS43YZI27WVZ3STVRM7ANCNFSM4URDWYYQ .

in src/core_read.cpp:167 in 451556858a outdated

171+    if (ok_legacy && CheckTxScriptsSanity(tx_legacy)) {
172+        tx = std::move(tx_legacy);
173+        return true;
174+    }
175+
176+    // If only one deserialization succeeded, use that one.

ajtowns commented at 6:25 am on December 8, 2020:

The following if statement also handles the case where both deserializations succeeded but neither passed sanity, which the comments skip.

Maybe

 0// If both deserializations succeeded, prefer the one for which CheckTxScriptsSanity returns true.
 1// If that's the case for both ** or neither **, prefer the extended one
 2// If only one deserialization succeeded, use that one.
 3if (ok_ext && ok_legacy)
 4    if (TxSanity(tx_ext) || !TxSanity(tx_legacy)) {
 5        return tx_ext;
 6    } else {
 7        return tx_legacy;
 8    }
 9} else if (ok_ext) {
10    return tx_ext;
11} else if (ok_legacy) {
12   return tx_legacy;
13} else {
14   fail;
15}

would match the comments better? Or could consolidate the conditions something like:

 0if (ok_ext && !(ok_legacy && !TxSanity(tx_ext) && TxSanity(tx_legacy)) {
 1    // If extended decode succeeded, prefer it, except if legacy also succeeded,
 2    // extended didn't pass sanity checks, and legacy did pass sanity checks
 3    return tx_ext;
 4} else if (ok_legacy) {
 5    // Otherwise, if legacy succeeded use it
 6    return tx_legacy;
 7} else {
 8   // Otherwise fail
 9   fail;
10}

Alternatively you could stay closer to the current code with:

0if (try_witness) {
1    ok_ext = Decode(tx_ext);
2    if (ok_ext && (!try_no_witness || Sanity(tx_ext))) return tx_ext;
3}
4if (try_no_witness) {
5    ok_legacy = Decode(tx_legacy);
6    if (ok_legacy && (!ok_ext || Sanity(tx_legacy)) return tx_legacy;
7}
8if (ok_ext) return tx_ext; // neither pass sanity checks
9fail;

MarcoFalke commented at 7:03 am on December 8, 2020:

The comment implies that only one deserialization has succeeded if this line is reached, which is not true

0    // If both deserializations succeeded, prefer the extended one. Otherwise use the one that succeeded deserialization.

MarcoFalke commented at 8:54 am on December 8, 2020:

For code-readability I prefer the version of how the pr is currently written. The first two options are too nested to be readable. If the goal is to skip a useless decoding in the early-return case (Sanity(tx_ext)), I prefer your third option.

sipa commented at 9:14 pm on December 8, 2020:

I’ve added a number of comments, and restructured a bit along the lines of the 3rd option.

sipa commented at 9:15 pm on December 8, 2020:

Rewrote the comments.

in src/core_read.cpp:142 in 451556858a outdated

125@@ -126,31 +126,55 @@ static bool CheckTxScriptsSanity(const CMutableTransaction& tx)
126 
127 static bool DecodeTx(CMutableTransaction& tx, const std::vector<unsigned char>& tx_data, bool try_no_witness, bool try_witness)
128 {
129+    CMutableTransaction tx_extended, tx_legacy;

ajtowns commented at 6:28 am on December 8, 2020:

This potentially doubles the memory usage of DecodeTx (previously, up to 4MB for each of tx, tx_data; now an additional 4MB for each of tx_extended and tx_legacy). Is that a potential problem anywhere? (I don’t think it is, but…)

MarcoFalke commented at 7:11 am on December 8, 2020:

It is only called by the user. It would be a bug if the legacy encoding was tried for any “real” transactions.

sipa commented at 9:14 pm on December 8, 2020:

Yeah, I don’t think this is a concern. This function is only used when doing hex decoding.

MarcoFalke added this to the milestone 0.21.0 on Dec 8, 2020

ajtowns commented at 7:07 am on December 8, 2020: member

For the example in the bug report, decoderawtransaction (without specifying witness-only decode) fails, but getrawtransaction c61adde6f1c09ef4c9864e5c5aa7d92e0c6e76e4413710cf989bae9145ef141e true 0000000000000000001051d0017518f9fb7a7d201751a34fa91621ed32c0d88f succeeds, so I think this is working as I expected. I didn’t suggest a change along these lines because of the extra memory usage, but if we’re okay with that, +1 from me.

Separately should we consider doing anything to remove the output that fails the sanity check from the utxo set, given it doesn’t parse and hence is unspendable?

ACK 451556858a23d81fd23ab2f3f7a72f61856c62d3

MarcoFalke commented at 7:10 am on December 8, 2020: member

ACK, it seems this comes with two changes. It prefers the extended encoding:

when neither passes the sanity check, but both deserialize. Previously it preferred the legacy one.
when the extended one doesn’t pass sanity checks and the legacy encoding doesn’t succeed. Previously it failed.

MarcoFalke commented at 2:31 pm on December 8, 2020: member

It might be possible to harvest seeds from the decode_tx fuzz test which has good coverage for this: https://drahtbot.space/host_reports/DrahtBot/reports/coverage_fuzz/sipa/bitcoin/451556858a23d81f/fuzz.coverage/src/core_read.cpp.gcov.html

Improve heuristic hex transaction decoding

Whenever both encodings are permitted, try both, and if only one succeeds,
return that one. Otherwise prefer the one for which the heuristic sanity
check passes. If that is the case for neither or for both, return the
extended-permitting deserialization.

39c42c4420

sipa force-pushed on Dec 8, 2020

Add regression test for incorrect decoding 0f949cde3d

sipa commented at 9:23 pm on December 8, 2020: member

Restructured the code a bit, and added comments. Also added a regression test based on the example from #20579. @MarcoFalke I wrote a fuzzer to find transactions that decode incorrectly (see https://github.com/sipa/bitcoin/tree/202012_fancy_tx_hex_decode_fuzz). Turns out it’s not very hard if you feed it a few real transactions as seeds, but they all look very weird. Using this we could probably quickly find extra conditions to add to CheckTxScriptsSanity, but I think we should keep that for a future PR.

achow101 commented at 4:40 pm on December 9, 2020: member

Code review ACK 0f949cde3dff15170db7930b0f7345ff995c267d

jonatack commented at 0:13 am on December 10, 2020: member

Tested ACK 0f949cde3dff15170db7930b0f7345ff995c267d

The new naming, comments, and code organisation are much improved.

 0diff --git a/src/core_read.cpp b/src/core_read.cpp
 1index 7687a86185..fe64b7b95e 100644
 2--- a/src/core_read.cpp
 3+++ b/src/core_read.cpp
 4@@ -148,37 +148,39 @@ static bool DecodeTx(CMutableTransaction& tx, const std::vector<unsigned char>&
 5         CDataStream ssData(tx_data, SER_NETWORK, PROTOCOL_VERSION);
 6         try {
 7             ssData >> tx_extended;
 8-            if (ssData.empty()) ok_extended = true;
 9+            if (ssData.empty()) {
10+                ok_extended = true;
11+                // Optimization: if extended decoding succeeded and the result passes CheckTxScriptsSanity,
12+                // don't bother decoding the other way.
13+                if (CheckTxScriptsSanity(tx_extended)) {
14+                    tx = std::move(tx_extended);
15+                    return true;
16+                }
17+            }
18         } catch (const std::exception&) {
19             // Fall through.
20         }
21     }
22 
23-    // Optimization: if extended decoding succeeded and the result passes CheckTxScriptsSanity,
24-    // don't bother decoding the other way.
25-    if (ok_extended && CheckTxScriptsSanity(tx_extended)) {
26-        tx = std::move(tx_extended);
27-        return true;
28-    }
29-
30     // Try decoding with legacy serialization, and remember if the result successfully consumes the entire input.
31     if (try_no_witness) {
32         CDataStream ssData(tx_data, SER_NETWORK, PROTOCOL_VERSION | SERIALIZE_TRANSACTION_NO_WITNESS);
33         try {
34             ssData >> tx_legacy;
35-            if (ssData.empty()) ok_legacy = true;
36+            if (ssData.empty()) {
37+                ok_legacy = true;
38+                // If legacy decoding succeeded and passes CheckTxScriptsSanity, that's our answer, as we know
39+                // at this point that extended decoding either failed or doesn't pass the sanity check.
40+                if (CheckTxScriptsSanity(tx_legacy)) {
41+                    tx = std::move(tx_legacy);
42+                    return true;
43+                }
44+            }
45         } catch (const std::exception&) {
46             // Fall through.
47         }
48     }
49 
50-    // If legacy decoding succeeded and passes CheckTxScriptsSanity, that's our answer, as we know
51-    // at this point that extended decoding either failed or doesn't pass the sanity check.
52-    if (ok_legacy && CheckTxScriptsSanity(tx_legacy)) {
53-        tx = std::move(tx_legacy);
54-        return true;
55-    }
56-
57     // If extended decoding succeeded, and neither decoding passes sanity, return the extended one.

laanwj commented at 10:18 am on December 10, 2020: member

Code review ACK 0f949cde3dff15170db7930b0f7345ff995c267d

The new naming, comments, and code organisation are much improved.

Yes the comments definitely help clarify why the function is like this.

laanwj merged this on Dec 10, 2020

laanwj closed this on Dec 10, 2020

MarcoFalke referenced this in commit 1caa32e3f2 on Dec 10, 2020

MarcoFalke referenced this in commit ce13b99020 on Dec 10, 2020

MarcoFalke commented at 10:50 am on December 10, 2020: member

Backported in #20612

MarcoFalke removed the label Needs backport (0.21) on Dec 10, 2020

sidhujag referenced this in commit 0fc0cf7160 on Dec 10, 2020

DrahtBot locked this on Feb 15, 2022

Improve heuristic hex transaction decoding #20595

@sipa commented on this pull request.