util: Limit decimal range of numbers ParseScript accepts #18416

pull pierreN wants to merge 1 commits into bitcoin:master from pierreN:fix-parsescript-numop-overflow changing 8 files +63 −17
  1. pierreN commented at 7:36 am on March 24, 2020: contributor

    Following up on this suggestion : #18413 (comment), prevent the output of atoi64 in the core_read.cpp:ParseScript helper to send to CScriptNum::serialize values wider than 32-bit.

    Since the ParseScript helper is only used by the tool defined in bitcoin-tx.cpp, this only prevents users to provide too much unrealistic values.

  2. in src/core_read.cpp:68 in 88bd812672 outdated
    66+            //for details, see : https://github.com/bitcoin/bitcoin/pull/18413#issuecomment-602954469
    67+            if(n > std::numeric_limits<int32_t>::max()) {
    68+                n = std::numeric_limits<int32_t>::max();
    69+            } else if(n < std::numeric_limits<int32_t>::min()) {
    70+                n = std::numeric_limits<int32_t>::min();
    71+            }
    


    theStack commented at 8:09 am on March 24, 2020:

    FWIW, here’s an alternative shorter version for the same logic, without branches:

    0    n = std::min(n, static_cast<int64_t>(std::numeric_limits<int32_t>::max()));
    1    n = std::max(n, static_cast<int64_t>(std::numeric_limits<int32_t>::min()));
    

    Not sure if it’s really more readable though, personally I’d be okay with both versions.


    pierreN commented at 8:19 am on March 24, 2020:

    std::clamp in C++17 would be even better and less verbose, but unsure if we can use C++17 features ?

    Note that min/max in the standard also do exactly one comparison each.

    I can remove the if/else though if people think it fits better here.


    theStack commented at 8:31 am on March 24, 2020:
    Ah, std::clamp would indeed be a very nice option. Unfortunately, upgrading to C++17 has not happened yet (see this discussion: #16684), so it can’t be used. Sometimes substitutes for not-yet-available C++ features are implemented by contributors, like Span for the C++20 std::span by sipa (see #12886), don’t know though if that is a desired way to go here, and if yes, if it should be packed in this PR. This should probably include identifiying all clamps throughout the codebase and using our newly introduced clamp function (so it can be replaced with std::clamp once we move to C++17).

    pierreN commented at 8:35 am on March 24, 2020:

    Thanks, I’d be down to do that in a different PR. I’m looking for stuffs to do to improve my knowledge of the codebase anyway :)

    I’ll wait for some other people to confirm “yes this is a good idea, do a separate PR for std::clamp” before starting though

  3. pierreN force-pushed on Mar 24, 2020
  4. DrahtBot added the label Tests on Mar 24, 2020
  5. pierreN force-pushed on Mar 24, 2020
  6. pierreN commented at 3:35 pm on March 24, 2020: contributor

    It seems that the tests were already including values wider than 2^32 so I reckon there is 2 options :

    • remove/update tests to make them fit into a 2^32
    • just prevent the UB value instead

    The second option seemed the less intrusive, so I updated the PR to only forbid the value -2^63. I also added 2 tests reflecting that : before this PR, those tests were triggering the UB before failing.

  7. sipa commented at 4:14 pm on March 24, 2020: member

    I think we should just fix UB in CScriptNum::serialize; that should be easy, and avoid all issues.

    In addition, I think we may want to restrict what range of numbers ParseScript accepts in decimal, because numbers outside -0xFFFFFFFF…0xFFFFFFFF are simply illegal in scripts (they don’t work as numbers, so they should really be represented as hex strings instead if they’re used at all). That has nothing to do with avoiding UB; it’s just fixing weird behavior in ParseScript.

  8. pierreN commented at 11:57 pm on March 24, 2020: contributor

    OK, thanks.

    So. if I understand correctly, we do agree that the following tests in src/test/data/script_tests.json are wrong and should be removed (most of them originally from 90320d67779be5c97061380c035d3fe51b7ce74b ) ?

     0["549755813887", "SIZE 5 EQUAL", "P2SH,STRICTENC", "OK"],
     1["549755813888", "SIZE 6 EQUAL", "P2SH,STRICTENC", "OK"],
     2["9223372036854775807", "SIZE 8 EQUAL", "P2SH,STRICTENC", "OK"],
     3
     4["-549755813887", "SIZE 5 EQUAL", "P2SH,STRICTENC", "OK"],
     5["-549755813888", "SIZE 6 EQUAL", "P2SH,STRICTENC", "OK"],
     6["-9223372036854775807", "SIZE 8 EQUAL", "P2SH,STRICTENC", "OK"],
     7
     8["549755813887", "0x05 0xFFFFFFFF7F EQUAL", "P2SH,STRICTENC", "OK"],
     9["549755813888", "0x06 0x000000008000 EQUAL", "P2SH,STRICTENC", "OK"],
    10["9223372036854775807", "0x08 0xFFFFFFFFFFFFFF7F EQUAL", "P2SH,STRICTENC", "OK"],
    11
    12["-549755813887", "0x05 0xFFFFFFFFFF EQUAL", "P2SH,STRICTENC", "OK"],
    13["-549755813888", "0x06 0x000000008080 EQUAL", "P2SH,STRICTENC", "OK"],
    14["-9223372036854775807", "0x08 0xFFFFFFFFFFFFFFFF EQUAL", "P2SH,STRICTENC", "OK"],
    15
    16["4294967296", "CHECKSEQUENCEVERIFY", "CHECKSEQUENCEVERIFY", "UNSATISFIED_LOCKTIME", "CSV fails if stack top bit 1 << 31 is not set, and tx version < 2"],
    
  9. sipa commented at 1:36 am on March 25, 2020: member
    @pierreN They can be converted to hexadecimal.
  10. pierreN force-pushed on Mar 25, 2020
  11. pierreN renamed this:
    Prevent num op overflows in ParseScript() helper
    Limit decimal range of numbers ParseScript accepts
    on Mar 25, 2020
  12. pierreN commented at 4:06 am on March 25, 2020: contributor

    Ow my bad, I was too focused on the atoi64 branch to see it that way, thanks.

    I edited my branch accordingly.

  13. pierreN force-pushed on Mar 25, 2020
  14. in src/test/data/tx_invalid.json:177 in 08b392a205 outdated
    173@@ -174,7 +174,7 @@
    174 "0100000001000100000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000ff64cd1d", "P2SH,CHECKLOCKTIMEVERIFY"],
    175 
    176 ["Argument 2^32 with nLockTime=2^32-1"],
    177-[[["0000000000000000000000000000000000000000000000000000000000000100", 0, "4294967296 CHECKLOCKTIMEVERIFY 1"]],
    


    MarcoFalke commented at 11:45 am on March 25, 2020:
    Shouldn’t this version be kept as an example of an _invalid script?

    pierreN commented at 2:42 pm on March 25, 2020:

    OK, thanks. This is a good idea I think, I was hesitant to add more tests. If I understood you correctly, the tests are invalids here, so I guess you mean adding something like:

    0["Argument 2^32 in decimal (hence clamped to 0xffffffff) with nLockTime=2^32-2"],
    1[[["0000000000000000000000000000000000000000000000000000000000000100", 0, "4294967296 CHECKLOCKTIMEVERIFY 1"]],
    2"0100000001000100000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000fffffffe", "P2SH,CHECKLOCKTIMEVERIFY"],
    

    If this is correct, I guess I should add similar tests for other modified test cases too ?


    MarcoFalke commented at 2:46 pm on March 25, 2020:
    Feel free to add as many tests as you like, as long as all existing proper test cases also stay there

    pierreN commented at 3:30 pm on March 25, 2020:
    OK great, I added 5 more tests in the branch.

    MarcoFalke commented at 3:43 pm on March 25, 2020:
    What I mean is that this shouldn’t be silently clamped. The test here looks right. Why would it be safe to parse a script and silently change it while parsing?

    pierreN commented at 3:49 pm on March 25, 2020:
    OK, so you are suggesting in the core_read.cpp above to return an std::runtime_error when the result of the atoi64 is beyond the the range -0xffffffff..0xffffffff instead ?

    MarcoFalke commented at 3:51 pm on March 25, 2020:
    Yes, some kind of failure

    practicalswift commented at 4:00 pm on March 25, 2020:
    std::runtime_error is thrown on other script parse errors in ParseScript(…). Shouldn’t that be thrown also for this type of script parse error in ParseScript?

    pierreN commented at 4:20 pm on March 25, 2020:

    OK, thanks, this is clearly a possibility.

    Note that for now, no test trigger the already existing std::runtime_error. script_json_test doesn’t handle exceptions, so it should be modified to catch them (maybe in a different PR then ?). Testcases will have to be modified to handle the exception too I guess.

    (the fuzzer harness already handle exceptions well)

    I’m not sure which possibility (clamping or throwing an error) fits more the style of the codebase. Maybe getting an ACK from @sipa for the exception first ?


    pierreN commented at 5:02 pm on March 25, 2020:

    @practicalswift hm sorry but I guess it’s better to ask first: if you return an std::runtime_error, how to describe the error in the JSON test file ?

    I guess that adding a new value to the ScriptError_t enum is too intrusive. Adding a special if/else case in script_test.cpp:ParseScriptError seems too hackish. Reusing an existing ScriptError_t seems the less odd option - something like SCRIPT_ERR_BAD_OPCODE or SCRIPT_ERR_OP_COUNT ?


    sipa commented at 5:06 pm on March 25, 2020:

    I don’t think there is a need. (tx/script)_(valid/invalid) are testing the transaction and script validity logic - not the script parser.

    If a particular test now causes an exception to be thrown, just write that test differently. It’s still testing the same thing.


    pierreN commented at 6:51 pm on March 25, 2020:
    OK, so as stated below, if a test contains 4294967296 CHECKLOCKTIMEVERIFY 1 or 999...999, this is a script parser error which should be covered in bitcoin-util-test.json, not here. I believe this resolve this issue ?
  15. pierreN force-pushed on Mar 25, 2020
  16. laanwj commented at 4:18 pm on March 25, 2020: member
    What’s the rationale here for saturating the number instead of raising an error? I generally prefer explicit parse errors on invalid input, at least.
  17. practicalswift commented at 4:20 pm on March 25, 2020: contributor
    @laanwj Agreed. That would also be in line with how parsing errors are handled elsewhere in ParseScript(…): a std::runtime_error is thrown :)
  18. sipa commented at 4:34 pm on March 25, 2020: member
    Yes, that’s what I meant to suggest: fail if a number outside the valid range is present. Sorry if that was not clear.
  19. MarcoFalke commented at 5:09 pm on March 25, 2020: member
    Slightly related: The fuzz tests already have the error condition in this parser here covered: https://marcofalke.github.io/btc_cov/fuzz.coverage/src/core_read.cpp.gcov.html#84 . However our unit and functional tests don’t know how to handle this/have this uncovered: https://marcofalke.github.io/btc_cov/total.coverage/src/core_read.cpp.gcov.html#84
  20. MarcoFalke commented at 5:11 pm on March 25, 2020: member

    It looks like a test case can be added to:

    • src/test/script_tests.cpp and/or
    • test/util/data/bitcoin-util-test.json with binary bitcoin-tx
  21. pierreN commented at 5:30 pm on March 25, 2020: contributor
    OK so sorry now I’m a bit confused - as stated above, if we start handling exceptions in script_test.cpp, which ScriptError_t to have in the JSON ? Do you think this would be useful ?
  22. MarcoFalke commented at 5:34 pm on March 25, 2020: member

    The exception is simply passed in an error_text in test/util/data/bitcoin-util-test.json. E.g.

    0  { "exec": "./bitcoin-tx",
    1    "args": ["-create", "nversion=1foo"],
    2    "return_code": 1,
    3    "error_txt": "error: Invalid TX version requested",
    4    "description": "Tests the check for invalid nversion value"
    5  },
    
  23. pierreN commented at 6:07 pm on March 25, 2020: contributor

    Ow, OK, thanks, I see.

    So this way if we return an exception, we can just add a test ./src/bitcoin-tx -create outscript=0:9999999999 to check the parser (and have code coverage for the added exception) - hence the test in src/test/data/tx_invalid.json containing 4294967296 CHECKLOCKTIMEVERIFY 1 becomes useless since it would just trigger the same exception in the parser.

  24. pierreN force-pushed on Mar 25, 2020
  25. MarcoFalke renamed this:
    Limit decimal range of numbers ParseScript accepts
    util: Limit decimal range of numbers ParseScript accepts
    on Mar 25, 2020
  26. in src/test/data/tx_valid.json:299 in f3fb51d09b outdated
    297 "020000000100010000000000000000000000000000000000000000000000000000000000000000000000ffffbf7f0100000000000000000000000000", "P2SH,CHECKSEQUENCEVERIFY"],
    298-[[["0000000000000000000000000000000000000000000000000000000000000100", 0, "6442450944 CHECKSEQUENCEVERIFY 1"]],
    299+[[["0000000000000000000000000000000000000000000000000000000000000100", 0, "0x050000008001 CHECKSEQUENCEVERIFY 1"]],
    300 "020000000100010000000000000000000000000000000000000000000000000000000000000000000000ffffff7f0100000000000000000000000000", "P2SH,CHECKSEQUENCEVERIFY"],
    301-[[["0000000000000000000000000000000000000000000000000000000000000100", 0, "6442450944 CHECKSEQUENCEVERIFY 1"]],
    302+[[["0000000000000000000000000000000000000000000000000000000000000100", 0, "0x050000008001 CHECKSEQUENCEVERIFY 1"]],
    


    pierreN commented at 6:53 pm on March 25, 2020:
    Note that using an exception allowed to raise new errors in tx_valid - before the clamped value also valided the test…
  27. MarcoFalke added the label Utils/log/libs on Mar 25, 2020
  28. MarcoFalke removed the label Tests on Mar 25, 2020
  29. in src/core_read.cpp:66 in f3fb51d09b outdated
    59@@ -59,6 +60,14 @@ CScript ParseScript(const std::string& s)
    60         {
    61             // Number
    62             int64_t n = atoi64(*w);
    63+
    64+            //limit the range of numbers ParseScript accepts in decimal
    65+            //since numbers outside -0xFFFFFFFF...0xFFFFFFFF are illegal in scripts
    66+            if(n > static_cast<int64_t>(0xffffffff) || n < -1*static_cast<int64_t>(0xffffffff)) {
    


    MarcoFalke commented at 6:54 pm on March 25, 2020:

    Any reason this needs a static cast?

    0            if (n > 0xffffffff || n < -0xffffffff) {
    

    pierreN commented at 7:09 pm on March 25, 2020:

    The second static_cast seems strictly necessary, see this example : https://godbolt.org/z/Ko33NG

    The first one is superficial (implicit), I just thought that the symetry looked better. I can remove the first one if you prefer ?

    edit : at least fixed whitespace style

    edit 2 : also note that the LL suffix could be used but this would assume that long long = 64bits


    practicalswift commented at 9:29 pm on March 25, 2020:
    Explicit is better than implicit :)

    MarcoFalke commented at 1:41 pm on March 26, 2020:
    I think with C++11 you can also get a safer way to achieve the same: int64_t{bla}. If the value doesn’t fit in the integer type, it won’t compile. Whereas with static_cast it would compile and give the wrong result.

    pierreN commented at 8:56 pm on March 26, 2020:
    Thanks, updated the branch accordingly,
  30. pierreN commented at 6:55 pm on March 25, 2020: contributor

    OK, I think I updated the branch as we discussed above ?

    If this PR is OK, I’ll add a test case in bitcoin-util-test.json for the second exception in ParseScript in another PR.

  31. in test/util/data/bitcoin-util-test.json:283 in f3fb51d09b outdated
    267+  { "exec": "./bitcoin-tx",
    268+    "args": ["-create", "outscript=0:-9999999999"],
    269+    "return_code": 1,
    270+    "error_txt": "error: script parse error: decimal numeric value only allowed in the range -0xFFFFFFFF...0xFFFFFFFF",
    271+    "description": "Try to parse an output script with a decimal number below the allowed range"
    272+  },
    


    MarcoFalke commented at 6:55 pm on March 25, 2020:
    Could also check that +-0xffffffff is accepted and +-(0xffffffff+1) is not?

    pierreN commented at 7:10 pm on March 25, 2020:
    Yes, thanks, great idea.
  32. MarcoFalke approved
  33. MarcoFalke commented at 6:55 pm on March 25, 2020: member
    Concept ACK
  34. pierreN force-pushed on Mar 25, 2020
  35. pierreN force-pushed on Mar 25, 2020
  36. sipa commented at 8:30 pm on March 25, 2020: member
    ACK 4cefaace020f406d1143cf1351988958ef360861
  37. pierreN force-pushed on Mar 26, 2020
  38. pierreN commented at 1:45 am on March 26, 2020: contributor
    (absorbing an empty commit to re-launch failed CI)
  39. pierreN force-pushed on Mar 26, 2020
  40. sipa commented at 10:47 pm on March 26, 2020: member
    re-ACK 2ee8cc07292e62a1be47d09421f93d25ad24fbd4
  41. Limit decimal range of numbers ParseScript accepts 9ab14e4d21
  42. in src/core_read.cpp:22 in 2ee8cc0729 outdated
    18@@ -19,6 +19,7 @@
    19 #include <boost/algorithm/string/split.hpp>
    20 
    21 #include <algorithm>
    22+#include <limits>
    


    laanwj commented at 6:48 am on March 27, 2020:
    Is <limits> still necessary?

    pierreN commented at 6:51 am on March 27, 2020:
    Ow, thank you. No it isn’t, just removed it.
  43. pierreN force-pushed on Mar 27, 2020
  44. laanwj commented at 6:59 am on March 27, 2020: member
    ACK 9ab14e4d21c73d16d8d782f1576fe29e659e2a70 (only difference is removing the limits include as suggested: https://github.com/bitcoin/bitcoin/compare/2ee8cc07292e62a1be47d09421f93d25ad24fbd4..9ab14e4d21c73d16d8d782f1576fe29e659e2a70)
  45. laanwj merged this on Mar 27, 2020
  46. laanwj closed this on Mar 27, 2020

  47. pierreN commented at 7:27 am on March 27, 2020: contributor
    Thanks !
  48. laanwj commented at 8:19 am on March 27, 2020: member
    I’m very sorry to only realize this now but atoi64 itself doesn’t do bounds checking. So very large or small values outside the 64 bit range will still overflow the 64 bit or at least not raise an error conditions. I think the appropriate function to use here is our own ParseUInt32.
  49. pierreN commented at 8:31 am on March 27, 2020: contributor

    Sorry, I don’t follow, atoi64 is:

    0int64_t atoi64(const std::string& str)
    1{
    2#ifdef _MSC_VER
    3    return _atoi64(str.c_str());
    4#else
    5    return strtoll(str.c_str(), nullptr, 10);
    6#endif
    7}
    

    And both _atoi64 and strtoll both return LLONG_MAX or LLONG_MIN when receiving too big positive/negative values ?

    It will additionally set an errno but since we check beforehand that we are parsing a number, there should not be any issue ?

    In anycase it’s true that ParseInt32 (since the value can be signed) could be more adapted since it checks for errno anyway. I could open a new PR for that if people prefer ?

  50. laanwj commented at 10:22 am on March 27, 2020: member

    And both _atoi64 and strtoll both return LLONG_MAX or LLONG_MIN when receiving too big positive/negative values ?

    If that’s the case this is actually OK. I thought they didn’t have any guaranteed return value in that case.

    0       The strtol() function returns the result of the conversion, unless the value would underflow or overflow.  If an underflow occurs, strtol() returns LONG_MIN.  If an overflow occurs,  str‐
    1       tol() returns LONG_MAX. 
    

    both LONG_MIN and LONG_MAX are out of the range so it will do so correctly…

    In anycase it’s true that ParseInt32 (since the value can be signed) could be more adapted since it checks for errno anyway. I could open a new PR for that if people prefer ?

    Oh right! then you’d actually need ParseInt64: the value can be signed, and it’s not in the range of a 32 bit signed integer (-0x800000000x7fffffff). But I think it’s fine to keep it as-is then, phew.

  51. pierreN commented at 10:25 am on March 27, 2020: contributor
    OK, great, thanks for the code review !
  52. MarcoFalke referenced this in commit 210b533a11 on Mar 27, 2020
  53. in src/test/data/script_tests.json:2524 in 9ab14e4d21
    2520@@ -2521,7 +2521,7 @@
    2521 ["-1", "CHECKSEQUENCEVERIFY", "CHECKSEQUENCEVERIFY", "NEGATIVE_LOCKTIME", "CSV automatically fails if stack top is negative"],
    2522 ["0x0100", "CHECKSEQUENCEVERIFY", "CHECKSEQUENCEVERIFY,MINIMALDATA", "UNKNOWN_ERROR", "CSV fails if stack top is not minimally encoded"],
    2523 ["0", "CHECKSEQUENCEVERIFY", "CHECKSEQUENCEVERIFY", "UNSATISFIED_LOCKTIME", "CSV fails if stack top bit 1 << 31 is set and the tx version < 2"],
    2524-["4294967296", "CHECKSEQUENCEVERIFY", "CHECKSEQUENCEVERIFY", "UNSATISFIED_LOCKTIME",
    2525+["0x050000000001", "CHECKSEQUENCEVERIFY", "CHECKSEQUENCEVERIFY", "UNSATISFIED_LOCKTIME",
    


    MarcoFalke commented at 2:06 pm on March 27, 2020:
    Why are you changing all the values when translating from dec to hex? It might be fine, but at least for locktime values there are different code paths depending on the value.

    pierreN commented at 2:15 pm on March 27, 2020:
    Ow, good point. I double checked with the value returned by ParseScript in the tests before/after the PR - so that the behavior of the tests doesn’t change at all (i.e. before this PR the script built by ParseScript from 4294967296 contained 0x050000000001)

    MarcoFalke commented at 2:18 pm on March 27, 2020:
    Ah, oops. My bad
  54. azuchi referenced this in commit b73426b7c6 on Jun 16, 2020
  55. Fabcien referenced this in commit a8bad1d2bd on Jan 12, 2021
  56. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-30 03:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me