Restore atoi64/GetIntArg saturating behavior #23841

pull jamesob wants to merge 1 commits into bitcoin:master from jamesob:2021-12-atoi64-incompat changing 11 files +88 −13
  1. jamesob commented at 5:18 pm on December 22, 2021: member

    The new locale-independent atoi64 method introduced in #20452 behaves differently for values passed which are greater than the uint64_t max. This commit is proof of that (the attached test fails), and is meant to spur discussion on how to handle such an incompatibility.

    The change as committed in #20542 may break some end-usages of bitcoind, but more than that may have consensus implications as this deserialization mechanism is used in CScript::ParseScript. I think this means it’s possible that #20542 could have been an accidental soft fork?

    Edit: CScript::ParseScript isn’t consensus-critical, but is used in e.g. script descriptor parsing.

  2. sipa commented at 5:25 pm on December 22, 2021: member

    Nice catch.

    CScript::ParseScript shouldn’t be consensus-critical though.

  3. jamesob commented at 5:29 pm on December 22, 2021: member

    Some discussion from IRC:

    012:21 <MarcoFalke> jamesob: Integer overflow inside atoi is UB or at least unspecified behavior, last time I checked
    112:22 <MarcoFalke> That is: Parsing 999999999999 can give you any value (it doesn't have to be 0 or max or 999999999999 mod max)
    212:23 <MarcoFalke> jamesob: It is also not used in consensus code, so this is not a consensus change
    312:24 <jamesob> this is not consensus? https://github.com/bitcoin/bitcoin/pull/20452/files#diff-1db27ed1bfbf61ea0fe64447413ef9f24238be710e7ca4ae9c7bc7a5c994eca0L72-R72
    412:26 <jamesob> so here's the tricky thing: our atoi64 wasn't actually using atoi, but strtoll - for which oveflow behavior is well defined: "The strtol() function returns the result of the conversion, unless the value would underflow or overflow. If an underflow occurs, strtol() returns LONG_MIN. If an overflow occurs, strtol() returns LONG_MAX."
    512:26 <jamesob> ("The strtoll() function works just like the strtol() function but returns a long long integer value.")
    612:28 <MarcoFalke> Yeah, not consensus. Could be used by bitcoin-util or some RPC function.
    712:29 <jamesob> Phew
    
  4. MarcoFalke commented at 5:36 pm on December 22, 2021: member
    Indeed good catch, I wasn’t aware it is safe and saturating to produce an overflow. Still, an explicit error would be better than silent saturation. See also https://github.com/bitcoin/bitcoin/pull/17385/files#diff-19427b0dd1a791adc728c82e88f267751ba4f1c751e19262cac03cccd2822216R509
  5. MarcoFalke added this to the milestone 23.0 on Dec 22, 2021
  6. MarcoFalke added the label Brainstorming on Dec 22, 2021
  7. MarcoFalke added the label Utils/log/libs on Dec 22, 2021
  8. jamesob commented at 7:25 pm on December 22, 2021: member

    I’ve pushed a change that restores compatibility by introducing a new function, LocaleIndependentAtoi64. This is specifically not a template specialization of its more generic cousin because the under-/overflow behavior differs (as was the case previously with our atoi64 implementation). A compile error is rendered if someone attempts to use the generic version with int64_t types.

    I’ve also added tests for GetIntArg that assert the saturation behavior.

    When this is merged I’ll make a follow-up issue to surface InitErrors instead of saturating, which would be a good first issue I think.

  9. jamesob force-pushed on Dec 22, 2021
  10. DrahtBot commented at 7:03 am on December 23, 2021: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #24041 (Restore atoi compatibility with old versions of Bitcoin Core by ryanofsky)
    • #17783 (util: Fix -norpcwhitelist, -norpcallowip, and similar corner case behavior by ryanofsky)
    • #17581 (refactor: Remove settings merge reverse precedence code by ryanofsky)
    • #17580 (refactor: Add ALLOW_LIST flags and enforce usage in CheckArgFlags by ryanofsky)
    • #17493 (util: Forbid ambiguous multiple assignments in config file by ryanofsky)
    • #16545 (refactor: Implement missing error checking for ArgsManager flags by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  11. MarcoFalke commented at 2:59 pm on January 3, 2022: member
    If you want this merged, please make sure that each commit compiles and passes all tests. Otherwise it will break git bisect.
  12. jamesob force-pushed on Jan 3, 2022
  13. in src/test/getarg_tests.cpp:144 in c0be750b20 outdated
    140@@ -140,9 +141,12 @@ BOOST_AUTO_TEST_CASE(intarg)
    141     BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 11), 11);
    142     BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 0), 0);
    143 
    144-    ResetArgs("-foo -bar");
    145-    BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 11), 0);
    146-    BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-bar", 11), 0);
    147+    BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 0), 0);
    


    MarcoFalke commented at 7:59 am on January 4, 2022:
    How is this different from the line preceding it?

    jamesob commented at 6:03 pm on January 4, 2022:
    Oops, this looks like a mistaken edit.
  14. MarcoFalke commented at 8:02 am on January 4, 2022: member

    I think you can squash the remaining test commit into the feature commit. No need for a fresh commit.

    Also, left a question.

    (Note to myself: commit 5b78055d5965b6ed69c6160100b1e4e36e6e6ea3 was removed in the latest rebase)

  15. Restore atoi64 compatibility with old versions of Bitcoin Core
    The new locale-independent atoi64 method introduced in #20452 behaves
    differently for values passed which are greater than the uint64_t max.
    This commit is proof of that, meant to spur discussion on how to handle
    such an incompatibility.
    
    Introduce LocaleIndependentAtoi64 which behaves the same way that
    previous versions of Bitcoin Core has when faced with under- and
    overflow.
    
    This behavior was implicitly changed in #20452, but has not yet
    been included in a release.
    
    Attempts to use LocaleIndependentAtoi for int64_t return values
    will result in a compilation error.
    ac1a5b113c
  16. jamesob force-pushed on Jan 4, 2022
  17. laanwj commented at 3:15 pm on January 5, 2022: member
    Good catch. But concept NACK on reintroducing strtoll usage. I would prefer implementing the desired compatible behavior in an explicit way using new functions, instead of delegating it to the C library’s potentially locale-dependent function. (or maybe even better: explicit parse error feedback in case of out-of-range)
  18. in src/test/getarg_tests.cpp:149 in ac1a5b113c
    144@@ -144,6 +145,11 @@ BOOST_AUTO_TEST_CASE(intarg)
    145     BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 11), 0);
    146     BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-bar", 11), 0);
    147 
    148+    // Check under-/overflow behavior.
    149+    ResetArgs("-foo=-9223372036854775809 -bar=9223372036854775808");
    


    MarcoFalke commented at 3:22 pm on January 5, 2022:
    This is not overflowing. It is equal to the min/max?

    jamesob commented at 3:28 pm on January 5, 2022:
     010:29:37 james@slug james/tmp % cat int64max.cpp
     1#include <limits>
     2#include <iostream>
     3#include <cinttypes>
     4
     5int main() {
     6    std::cout << std::numeric_limits<int64_t>::max() << '\n';
     7    std::cout << std::numeric_limits<int64_t>::min() << '\n';
     8}
     910:29:42 james@slug james/tmp % ./a.out
    109223372036854775807
    11-9223372036854775808
    
  19. jamesob commented at 3:22 pm on January 5, 2022: member
    @laanwj I think there is some confusion here: nowhere in the live code do I actually use strtoll; it’s just used in tests to model compatibility with the old implementation in previous versions.
  20. MarcoFalke approved
  21. MarcoFalke commented at 3:34 pm on January 5, 2022: member
    ack
  22. in src/util/strencodings.cpp:581 in ac1a5b113c
    576+            return 0;
    577+        }
    578+        s = s.substr(1);
    579+    }
    580+
    581+    auto [_, error_condition] = std::from_chars(s.data(), s.data() + s.size(), result);
    


    JeremyRubin commented at 7:29 pm on January 5, 2022:
    std::from_chars uses strtol locale dependent behavior, no?

    JeremyRubin commented at 7:30 pm on January 5, 2022:

    https://en.cppreference.com/w/cpp/utility/from_chars says

    1. Integer parsers: Expects the pattern identical to the one used by std::strtol in the default (“C”) locale and the given non-zero numeric base, except that

    sipa commented at 7:43 pm on January 5, 2022:
    It always uses the “C” locale, so it is locale-independent.

    JeremyRubin commented at 7:49 pm on January 5, 2022:
    gotcha.
  23. MarcoFalke removed the label Brainstorming on Jan 6, 2022
  24. vincenzopalazzo approved
  25. MarcoFalke renamed this:
    New atoi64 behavior is not compatible with legacy behavior
    Restore atoi64/GetIntArg saturating behavior
    on Jan 10, 2022
  26. MarcoFalke commented at 7:55 am on January 10, 2022: member
    Edited title
  27. MarcoFalke commented at 7:55 am on January 10, 2022: member
    @laanwj Are you still NACK on this? Otherwise I think this is rfm
  28. shaavan approved
  29. shaavan commented at 12:36 pm on January 10, 2022: contributor

    Code Review ACK ac1a5b113ccdace4d7f04e2e8c4089e397b5c0b4

    According to Microsoft docs:

    In the case of overflow with large positive integral values, _atoi64 returns I64_MAX and I64_MIN in the case of overflow with large negative, integral values.

    However, this was not true for the LocaleIndependentAtoi<int64_t> function, which intended to replicate _atoi64 behavior. This function returns 0 both in case of underflow and overflow.

    This PR fixes it by introducing a slightly modified LocaleIndependentAtoi function specifically for int64_t which correctly replicates _atoi64 behavior of return INT64_MAX in case of overflow, and INT64_MIN in case of the underflow.

    The added tests make sense and work perfectly, which, as expected, failed on the Master branch. Thanks for catching this and preventing a potentially major future issue.

  30. in src/util/strencodings.h:133 in ac1a5b113c
    128+ *
    129+ * This aims to replicate the exact behaviour of atoi64 in previous versions of
    130+ * Bitcoin Core. The old atoi64 utility function actually made use of `strtoll`
    131+ * in its implementation (or Microsoft's `_atoi64`), which saturates over- and
    132+ * underflows. That behavior is reproduced here.
    133+ */
    


    ryanofsky commented at 5:07 pm on January 10, 2022:

    In commit “Restore atoi64 compatibility with old versions of Bitcoin Core” (ac1a5b113ccdace4d7f04e2e8c4089e397b5c0b4)

    I find this description hard to follow. What is the point of referencing all these other functions and vaguely handwaving about what this function does of simply saying “Parse a decimal integer string. On positive overflow, return the maximum integer value, on negative overflow return the minimum integer value, and on error, return 0.”?

  31. in src/util/strencodings.h:99 in ac1a5b113c
     93@@ -94,8 +94,11 @@ void SplitHostPort(std::string in, uint16_t& portOut, std::string& hostOut);
     94 // which provide parse error feedback.
     95 //
     96 // The goal of LocaleIndependentAtoi is to replicate the exact defined behaviour
     97-// of atoi and atoi64 as they behave under the "C" locale.
     98-template <typename T>
     99+// of atoi as it behaves under the "C" locale.
    100+//
    101+// Prevent use of LocaleIndependentAtoi for int64_t, since the function below
    


    ryanofsky commented at 5:15 pm on January 10, 2022:

    In commit “Restore atoi64 compatibility with old versions of Bitcoin Core” (ac1a5b113ccdace4d7f04e2e8c4089e397b5c0b4)

    I don’t understand why this function exists at all. What use could there be for a parsing function that returns 0 in the case of overflow? It just seems like a footgun. Returning the biggest possible value when a bigger value is given, and the smallest possible value when a smaller value is given seems like a more useful and less dangerous way to parse values.

    I guess it precedes this PR, but the naming also seems strange. Why call this one util function locale independent, when basically every util function we have should be locale independent? If we had use for a parsing function that did vary by locale it would make more sense to draw attention and mention locales in that function’s name, not this function’s name.


    MarcoFalke commented at 9:08 am on January 11, 2022:
    I like that the name is long and ugly. It makes developers less likely to pick it, when there are better named and safer alternatives available that come with error reporting instead of implicitly mapping errors to random (in-range) return values.

    ryanofsky commented at 3:06 pm on January 11, 2022:

    re: #23841 (review)

    I like that the name is long and ugly. It makes developers less likely to pick it, when there are better named and safer alternatives available that come with error reporting instead of implicitly mapping errors to random (in-range) return values.

    It’s just seems a weird way to make the function name long by adding a random fact about locale independence. You could call it FrogsAreGreenAtoi and that would be better name than LocaleIndependentAtoi because it wouldn’t falsely suggest that other functions depend on the current locale and this function doesn’t. I think an ideal name would be something more like ParseIntUnchecked or AtoiIgnoreErrors


    MarcoFalke commented at 3:39 pm on January 11, 2022:

    FrogsAreGreenAtoi

    Ryan, not all frogs are green. I’d suggest to use the name MyFrogIsGreenAtoi or ParseSaturatedInt.


    ryanofsky commented at 3:44 pm on January 11, 2022:
    +1 to MyFrogIsGreenAtoi
  32. in src/util/strencodings.h:123 in ac1a5b113c
    117@@ -115,6 +118,21 @@ T LocaleIndependentAtoi(const std::string& str)
    118     return result;
    119 }
    120 
    121+/**
    122+ * LocaleIndependentAtoi64 is provided for backwards compatibility reasons.
    123+ * Since its behavior differs from standard atoi functionality, it is not a
    


    ryanofsky commented at 5:29 pm on January 10, 2022:

    In commit “Restore atoi64 compatibility with old versions of Bitcoin Core” (ac1a5b113ccdace4d7f04e2e8c4089e397b5c0b4)

    I can’t figure how this function is not a valid implementation of the standard function https://en.cppreference.com/w/cpp/string/byte/atoi. If its behavior is actually nonstandard, it would be good to say how.

  33. ryanofsky approved
  34. ryanofsky commented at 5:52 pm on January 10, 2022: member

    Concept ~0. I don’t understand why this PR is forking the parse function and only removing the overflow footgun specifically in the int64_t case and letting it remain in other cases, instead of parsing sanely in all cases. I also think the commit descriptions and function descriptions are handwavy and confusing. They should just say what actual behaviors are instead of gesturing vaguely at other functions and PRs without saying what the differences are.

    Code review ACK ac1a5b113ccdace4d7f04e2e8c4089e397b5c0b4, though. This does seem like the smallest possible change that will fix a few int64 parses, and it has good test coverage.

  35. MarcoFalke commented at 9:06 am on January 11, 2022: member

    I don’t understand why this PR is forking the parse function and only removing the overflow footgun specifically in the int64_t case

    In the int case it previously used atoi, which runs into UB (not saturation). See #23841 (comment) . So I presume the goal was to make minimal changes?

    Though, given that it is UB I can also see the argument to prefer less code and turn the UB into saturation. (In master it was turned to return 0 IIRC).

  36. ryanofsky commented at 3:31 pm on January 11, 2022: member

    re: #23841 (comment)

    Though, given that it is UB I can also see the argument to prefer less code and turn the UB into saturation. (In master it was turned to return 0 IIRC).

    Yes, that’s what I’m trying to figure out. I don’t understand why you would return 0 when the provided value is bigger than the maximum value, instead of returning the maximum value. Also why you would want two error-ignoring parsing functions with different and not clearly documented behavior instead of one error-ignoring parsing function with clearly documented behavior? And is there actually something different about the int64_t type that it needs to be treated differently than all the other integer types?

    Anyway, I acked this PR since it seems OK to merge, but I think it would be nicer if the two functions were consolidated, given nicer behavior and documentation, and probably renamed.

  37. MarcoFalke commented at 3:37 pm on January 11, 2022: member

    is there actually something different about the int64_t type that it needs to be treated differently than all the other integer types?

    No, not that I am aware of.

  38. MarcoFalke changes_requested
  39. MarcoFalke commented at 4:06 pm on January 11, 2022: member
    Withdraw my ACK. Seems more minimalistic to have one method instead of two.
  40. jamesob commented at 4:21 pm on January 11, 2022: member

    I think there is some navel-gazing going on here. We are in a situation where a change has inadvertently broken compatibility with a prior release (albeit in a minor way). This change proposes to remedy that and is well-tested.

    If there are style concerns, I think they can be done in a follow-up. Otherwise I have other things to do and someone else can fix this in a way that reviewers find preferable.

    Edit:

    Seems more minimalistic to have one method instead of two.

    The prior behavior of Bitcoin was to saturate only for int64s - are you proposing to special case this single method on the basis of type? I think having a single method that either saturates or doesn’t on the basis of type is very misleading.

  41. MarcoFalke commented at 4:56 pm on January 11, 2022: member

    are you proposing to special case this single method on the basis of type?

    No, simply make the existing method saturate instead of creating yet another one. See #23841#pullrequestreview-849309409

    If not done here, there will be a follow-up doing it. And I think that we should avoid follow-ups that completely remove code that was added previously. (They are fine when only a subset is followed-up on).

    If you don’t have time or energy to fix it that is fine, just let us know so that it can be picked up.

  42. ryanofsky commented at 5:08 pm on January 11, 2022: member

    I think there is some navel-gazing going on here.

    It’s true, and to make the navel-gazing even worse, it was mostly about the already merged PR #20452, not even this PR!

    Anyway posted #24041 as an alternative

  43. MarcoFalke commented at 5:15 pm on January 11, 2022: member

    The code here is non-trivial to review, based on the bug that was accidentally introduced.

    So I don’t think it is navel-gazing to consolidate the tricky code to just one function instead of two.

    Also, I don’t think it is navel-gazing to prefer a +63-17 patch over a two patches (each of which is larger than the single one), first +88-13, then +42-71.

  44. jamesob commented at 5:58 pm on January 11, 2022: member
  45. jamesob closed this on Jan 11, 2022

  46. JeremyRubin commented at 6:35 pm on January 11, 2022: contributor

    if i understand it correctly, i do prefer the approach here as a stop-gap compared to #24041.

    What I think makes the most sense is to do both of these (this PR first) and then over the course of a few releases:

    1. Log Warn if the parsing will differ in a future release
    2. Error + Shutdown if the parsing will differ in a future release
    3. Change the parsing.

    Perhaps it’s more trouble than it’s worth to DTRT here, but I think it is, personally, since config files breaking is a big deal.

  47. ryanofsky commented at 8:05 pm on January 11, 2022: member

    Perhaps it’s more trouble than it’s worth to DTRT here, but I think it is, personally, since config files breaking is a big deal.

    I don’t think there are any differences between this PR and #24041 with respect to config files. Config behavior unintentionally changed in #20452, and both this PR and #24041 restore the previous behavior.

  48. JeremyRubin commented at 10:23 pm on January 11, 2022: contributor
    ok i was reading off of #24041 (comment) but it seems that is not the case, in which case i’ll concept ack #24041
  49. laanwj commented at 11:59 am on January 12, 2022: member

    @laanwj I think there is some confusion here: nowhere in the live code do I actually use strtoll; it’s just used in tests to model compatibility with the old implementation in previous versions.

    @laanwj Are you still NACK on this? Otherwise I think this is rfm

    No, I’m not, using it in the tests is fine sorry for misunderstanding it, my brain doesn’t work anymore.

  50. MarcoFalke referenced this in commit 290ff5ef6d on Jan 12, 2022
  51. DrahtBot locked this on Jan 12, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-30 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me