Improve heuristic hex transaction decoding #20595

pull sipa wants to merge 2 commits into bitcoin:master from sipa:202012_fancy_tx_hex_decode changing 2 files +57 −9
  1. sipa commented at 10:39 pm on December 7, 2020: member

    The current hex tx decoding logic will refuse to decode valid extended-encoded transactions if the result fails the heuristic sanity check, even when the legacy-encoding fails. Fix this.

    Fixes #20579

  2. fanquake added the label Needs backport (0.21) on Dec 7, 2020
  3. sipa commented at 10:55 pm on December 7, 2020: member
    @instagibbs @ajtowns Ping, it seems this may be fixing a regression caused by #17775 (see #20579).
  4. in src/core_read.cpp:130 in daf98fec17 outdated
    125@@ -126,31 +126,53 @@ static bool CheckTxScriptsSanity(const CMutableTransaction& tx)
    126 
    127 static bool DecodeTx(CMutableTransaction& tx, const std::vector<unsigned char>& tx_data, bool try_no_witness, bool try_witness)
    128 {
    129+    CMutableTransaction tx_wit, tx_nowit;
    130+    bool ok_wit = false, ok_nowit = false;
    


    jonatack commented at 11:07 pm on December 7, 2020:
    naming nit: not sure if this is better, but maybe wtx, tx, wtx_ok, tx_ok

    sipa commented at 11:37 pm on December 7, 2020:
    I’ve renamed everything to {tx,ok}_{extended,legacy}, because extended serialization doesn’t imply the result actually is a segwit transaction - it just means prefer the interpretation of extended serialization when there is ambiguity.
  5. in src/core_read.cpp:145 in daf98fec17 outdated
    143         } catch (const std::exception&) {
    144             // Fall through.
    145         }
    146     }
    147 
    148+    // Try decoding with legacy serialization, and remember if the result succesfully consumes the entire input.
    


    jonatack commented at 11:14 pm on December 7, 2020:
    nit, “successfully” (also line 132)

    sipa commented at 11:37 pm on December 7, 2020:
    Fixed.
  6. laanwj added the label RPC/REST/ZMQ on Dec 7, 2020
  7. sipa renamed this:
    Improve heuristic hex decoding
    Improve heuristic hex transaction decoding
    on Dec 7, 2020
  8. jonatack commented at 11:21 pm on December 7, 2020: member

    ACK daf98fec173d259afe3cd5640e73ac51a91e0a4

    Could perhaps use some tests.

  9. sipa force-pushed on Dec 7, 2020
  10. sipa force-pushed on Dec 7, 2020
  11. instagibbs commented at 2:29 am on December 8, 2020: member

    Can you grab the example in the issue(or generate one) as a regression test?

    On Tue, Dec 8, 2020, 7:37 AM Pieter Wuille notifications@github.com wrote:

    @sipa commented on this pull request.

    In src/core_read.cpp https://github.com/bitcoin/bitcoin/pull/20595#discussion_r537917709:

         } catch (const std::exception&) {
    
             // Fall through.
         }
     }
    
    • // Try decoding with legacy serialization, and remember if the result succesfully consumes the entire input.

    Fixed.

    — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bitcoin/bitcoin/pull/20595#discussion_r537917709, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABMAFU63TLZS43YZI27WVZ3STVRM7ANCNFSM4URDWYYQ .

  12. in src/core_read.cpp:167 in 451556858a outdated
    171+    if (ok_legacy && CheckTxScriptsSanity(tx_legacy)) {
    172+        tx = std::move(tx_legacy);
    173+        return true;
    174+    }
    175+
    176+    // If only one deserialization succeeded, use that one.
    


    ajtowns commented at 6:25 am on December 8, 2020:

    The following if statement also handles the case where both deserializations succeeded but neither passed sanity, which the comments skip.

    Maybe

     0// If both deserializations succeeded, prefer the one for which CheckTxScriptsSanity returns true.
     1// If that's the case for both ** or neither **, prefer the extended one
     2// If only one deserialization succeeded, use that one.
     3if (ok_ext && ok_legacy)
     4    if (TxSanity(tx_ext) || !TxSanity(tx_legacy)) {
     5        return tx_ext;
     6    } else {
     7        return tx_legacy;
     8    }
     9} else if (ok_ext) {
    10    return tx_ext;
    11} else if (ok_legacy) {
    12   return tx_legacy;
    13} else {
    14   fail;
    15}
    

    would match the comments better? Or could consolidate the conditions something like:

     0if (ok_ext && !(ok_legacy && !TxSanity(tx_ext) && TxSanity(tx_legacy)) {
     1    // If extended decode succeeded, prefer it, except if legacy also succeeded,
     2    // extended didn't pass sanity checks, and legacy did pass sanity checks
     3    return tx_ext;
     4} else if (ok_legacy) {
     5    // Otherwise, if legacy succeeded use it
     6    return tx_legacy;
     7} else {
     8   // Otherwise fail
     9   fail;
    10}
    

    Alternatively you could stay closer to the current code with:

    0if (try_witness) {
    1    ok_ext = Decode(tx_ext);
    2    if (ok_ext && (!try_no_witness || Sanity(tx_ext))) return tx_ext;
    3}
    4if (try_no_witness) {
    5    ok_legacy = Decode(tx_legacy);
    6    if (ok_legacy && (!ok_ext || Sanity(tx_legacy)) return tx_legacy;
    7}
    8if (ok_ext) return tx_ext; // neither pass sanity checks
    9fail;
    

    MarcoFalke commented at 7:03 am on December 8, 2020:

    The comment implies that only one deserialization has succeeded if this line is reached, which is not true

    0    // If both deserializations succeeded, prefer the extended one. Otherwise use the one that succeeded deserialization.
    

    MarcoFalke commented at 8:54 am on December 8, 2020:
    For code-readability I prefer the version of how the pr is currently written. The first two options are too nested to be readable. If the goal is to skip a useless decoding in the early-return case (Sanity(tx_ext)), I prefer your third option.

    sipa commented at 9:14 pm on December 8, 2020:
    I’ve added a number of comments, and restructured a bit along the lines of the 3rd option.

    sipa commented at 9:15 pm on December 8, 2020:
    Rewrote the comments.
  13. in src/core_read.cpp:142 in 451556858a outdated
    125@@ -126,31 +126,55 @@ static bool CheckTxScriptsSanity(const CMutableTransaction& tx)
    126 
    127 static bool DecodeTx(CMutableTransaction& tx, const std::vector<unsigned char>& tx_data, bool try_no_witness, bool try_witness)
    128 {
    129+    CMutableTransaction tx_extended, tx_legacy;
    


    ajtowns commented at 6:28 am on December 8, 2020:
    This potentially doubles the memory usage of DecodeTx (previously, up to 4MB for each of tx, tx_data; now an additional 4MB for each of tx_extended and tx_legacy). Is that a potential problem anywhere? (I don’t think it is, but…)

    MarcoFalke commented at 7:11 am on December 8, 2020:
    It is only called by the user. It would be a bug if the legacy encoding was tried for any “real” transactions.

    sipa commented at 9:14 pm on December 8, 2020:
    Yeah, I don’t think this is a concern. This function is only used when doing hex decoding.
  14. MarcoFalke added this to the milestone 0.21.0 on Dec 8, 2020
  15. ajtowns commented at 7:07 am on December 8, 2020: member

    For the example in the bug report, decoderawtransaction (without specifying witness-only decode) fails, but getrawtransaction c61adde6f1c09ef4c9864e5c5aa7d92e0c6e76e4413710cf989bae9145ef141e true 0000000000000000001051d0017518f9fb7a7d201751a34fa91621ed32c0d88f succeeds, so I think this is working as I expected. I didn’t suggest a change along these lines because of the extra memory usage, but if we’re okay with that, +1 from me.

    Separately should we consider doing anything to remove the output that fails the sanity check from the utxo set, given it doesn’t parse and hence is unspendable?

    ACK 451556858a23d81fd23ab2f3f7a72f61856c62d3

  16. MarcoFalke commented at 7:10 am on December 8, 2020: member

    ACK, it seems this comes with two changes. It prefers the extended encoding:

    • when neither passes the sanity check, but both deserialize. Previously it preferred the legacy one.
    • when the extended one doesn’t pass sanity checks and the legacy encoding doesn’t succeed. Previously it failed.
  17. MarcoFalke commented at 2:31 pm on December 8, 2020: member
    It might be possible to harvest seeds from the decode_tx fuzz test which has good coverage for this: https://drahtbot.space/host_reports/DrahtBot/reports/coverage_fuzz/sipa/bitcoin/451556858a23d81f/fuzz.coverage/src/core_read.cpp.gcov.html
  18. Improve heuristic hex transaction decoding
    Whenever both encodings are permitted, try both, and if only one succeeds,
    return that one. Otherwise prefer the one for which the heuristic sanity
    check passes. If that is the case for neither or for both, return the
    extended-permitting deserialization.
    39c42c4420
  19. sipa force-pushed on Dec 8, 2020
  20. Add regression test for incorrect decoding 0f949cde3d
  21. sipa commented at 9:23 pm on December 8, 2020: member
    Restructured the code a bit, and added comments. Also added a regression test based on the example from #20579. @MarcoFalke I wrote a fuzzer to find transactions that decode incorrectly (see https://github.com/sipa/bitcoin/tree/202012_fancy_tx_hex_decode_fuzz). Turns out it’s not very hard if you feed it a few real transactions as seeds, but they all look very weird. Using this we could probably quickly find extra conditions to add to CheckTxScriptsSanity, but I think we should keep that for a future PR.
  22. achow101 commented at 4:40 pm on December 9, 2020: member
    Code review ACK 0f949cde3dff15170db7930b0f7345ff995c267d
  23. jonatack commented at 0:13 am on December 10, 2020: member

    Tested ACK 0f949cde3dff15170db7930b0f7345ff995c267d

    The new naming, comments, and code organisation are much improved.

     0diff --git a/src/core_read.cpp b/src/core_read.cpp
     1index 7687a86185..fe64b7b95e 100644
     2--- a/src/core_read.cpp
     3+++ b/src/core_read.cpp
     4@@ -148,37 +148,39 @@ static bool DecodeTx(CMutableTransaction& tx, const std::vector<unsigned char>&
     5         CDataStream ssData(tx_data, SER_NETWORK, PROTOCOL_VERSION);
     6         try {
     7             ssData >> tx_extended;
     8-            if (ssData.empty()) ok_extended = true;
     9+            if (ssData.empty()) {
    10+                ok_extended = true;
    11+                // Optimization: if extended decoding succeeded and the result passes CheckTxScriptsSanity,
    12+                // don't bother decoding the other way.
    13+                if (CheckTxScriptsSanity(tx_extended)) {
    14+                    tx = std::move(tx_extended);
    15+                    return true;
    16+                }
    17+            }
    18         } catch (const std::exception&) {
    19             // Fall through.
    20         }
    21     }
    22 
    23-    // Optimization: if extended decoding succeeded and the result passes CheckTxScriptsSanity,
    24-    // don't bother decoding the other way.
    25-    if (ok_extended && CheckTxScriptsSanity(tx_extended)) {
    26-        tx = std::move(tx_extended);
    27-        return true;
    28-    }
    29-
    30     // Try decoding with legacy serialization, and remember if the result successfully consumes the entire input.
    31     if (try_no_witness) {
    32         CDataStream ssData(tx_data, SER_NETWORK, PROTOCOL_VERSION | SERIALIZE_TRANSACTION_NO_WITNESS);
    33         try {
    34             ssData >> tx_legacy;
    35-            if (ssData.empty()) ok_legacy = true;
    36+            if (ssData.empty()) {
    37+                ok_legacy = true;
    38+                // If legacy decoding succeeded and passes CheckTxScriptsSanity, that's our answer, as we know
    39+                // at this point that extended decoding either failed or doesn't pass the sanity check.
    40+                if (CheckTxScriptsSanity(tx_legacy)) {
    41+                    tx = std::move(tx_legacy);
    42+                    return true;
    43+                }
    44+            }
    45         } catch (const std::exception&) {
    46             // Fall through.
    47         }
    48     }
    49 
    50-    // If legacy decoding succeeded and passes CheckTxScriptsSanity, that's our answer, as we know
    51-    // at this point that extended decoding either failed or doesn't pass the sanity check.
    52-    if (ok_legacy && CheckTxScriptsSanity(tx_legacy)) {
    53-        tx = std::move(tx_legacy);
    54-        return true;
    55-    }
    56-
    57     // If extended decoding succeeded, and neither decoding passes sanity, return the extended one.
    
  24. laanwj commented at 10:18 am on December 10, 2020: member

    Code review ACK 0f949cde3dff15170db7930b0f7345ff995c267d

    The new naming, comments, and code organisation are much improved.

    Yes the comments definitely help clarify why the function is like this.

  25. laanwj merged this on Dec 10, 2020
  26. laanwj closed this on Dec 10, 2020

  27. MarcoFalke referenced this in commit 1caa32e3f2 on Dec 10, 2020
  28. MarcoFalke referenced this in commit ce13b99020 on Dec 10, 2020
  29. MarcoFalke commented at 10:50 am on December 10, 2020: member
    Backported in #20612
  30. MarcoFalke removed the label Needs backport (0.21) on Dec 10, 2020
  31. sidhujag referenced this in commit 0fc0cf7160 on Dec 10, 2020
  32. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-07-06 21:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me