atoi(…)
with locale-independent std::from_chars(…)
(C++17).
atoi(…)
with locale-independent std::from_chars(…)
(C++17).
Note that we have a function ParseInt32
(as well as 64 and UInt variants) for this and I once tried to use it in more places, see #17385.
However these didn’t turn out to be actually locale-independent. Maybe this can replace them eventually.
85+ }
86+ s = s.substr(1);
87+ }
88+ auto [_, error_condition] = std::from_chars(s.data(), s.data() + s.size(), result);
89+ if (error_condition != std::errc{}) {
90+ return 0;
Option
.
(then handle it at the call site, if it wants to squash any errors to 0, fine)
last
as well, otherwise it ignores any remainder.
LocaleIndependentAtoi
) is meant as a quirk-by-quirk compatible to the not-entirely-sane atoi(…)
I’m afraid we need to return a T
(instead of std::optional<T>
) and ignore any remainder, no? :)
locale-independent std::from_chars(…) (C++17).
This is good to hear at least. From https://en.cppreference.com/w/cpp/utility/from_chars#Notes :
Unlike other parsing functions in C++ and C libraries, std::from_chars is locale-independent, non-allocating, and non-throwing.
Concept ACK.
76+template <typename T>
77+T ToIntegral(const std::string& str)
78+{
79+ T result;
80+ // Emulate atoi(...) handling of white space and leading +/-.
81+ std::string s = TrimString(str);
Making a copy of the string here, as well as below for the +/- check, is kind of efficient. Not sure it matters here, it’s not like this function is used in any performance critical places at the moment.
But as the function takes a range, we could do better.
As a general rule I try to stay away from pointer arithmetic (rationale here): that’s the reason for the technically extraneous copy which I think is insignificant in this context :)
Personally I find code that avoids pointer arithmetic much easier to reason about from a safety perspective.
That said: I’ll let other chime in :)
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
I think we need to make a choice here (either one or both):
atoi
behavior, but call it something else than ToIntegral
, something like LocaleIndependentAtoi
, make it clear it should not be used in new code, it’s only meant for preserving atoi
behavior for backward compatibility reasons.Parse[U]IntXX
or their contents in a locale-independent way. I’m fine with the name then.I think we need to make a choice here (either one or both):
- Make an emulation of all broken and surprising
atoi
behavior, but call it something else thanToIntegral
, something likeLocaleIndependentAtoi
, make it clear it should not be used in new code, it’s only meant for preservingatoi
behavior for backward compatibility reasons.- Make a sane integer parsing function that can eventually replace
Parse[U]IntXX
or their contents in a locale-independent way. I’m fine with the name then.
Good point. I thought about those options as well. I think we should do both. This PR is meant as a pure refactoring: it is not meant to change any behaviour that is defined by atoi
.
I’ve now made it more clear that this PR is meant as a quirk-by-quirk compatible replacement for atoi
by calling the function LocaleIndependentAtoi
:)
Good point about naming: we should keep the name ToIntegral
for a future sane version that doesn’t emulate atoi
or any other legacy function.
atoi
as well as atoi64
in that case (as it’s parametrized on type)?
69@@ -68,8 +70,25 @@ std::string EncodeBase32(Span<const unsigned char> input, bool pad = true);
70 std::string EncodeBase32(const std::string& str, bool pad = true);
71
72 void SplitHostPort(std::string in, int& portOut, std::string& hostOut);
73-int64_t atoi64(const std::string& str);
74-int atoi(const std::string& str);
75+
76+template <typename T>
atoi
for backwards compatibility) and that new code should use the Parse[U]IntXX
functions with parse error feedback.
I don’t get the CI error. Is charconv
non-standard in some way?
0In file included from primitives/transaction.cpp:10:0:
1./util/strencodings.h:16:10: fatal error: charconv: No such file or directory
2 #include <charconv>
3 ^~~~~~~~~~
4compilation terminated.
@laanwj I think the CI failures are from relatively older distros, and that C++17 library support lagged behind C++17 compiler support for a while in C++17 infancy.
According to cppreference the elementary string conversions (<charconv>
) were added in GCC libstdc++ 8, Clang libc++ 7 and MSVC Standard Library 19.14.
Rebased!
Could this PR get a 23.0 milestone like the related PR #20457 (“util: Make Parse{Int,UInt}{32,64} use locale independent std::from_chars(…) (C++17) instead of locale dependent strto{l,ll,ul,ull}”)? :)
test: Add test cases for LocaleIndependentAtoi
fuzz: Assert legacy atoi(s) == LocaleIndependentAtoi<int>(s)
fuzz: Assert legacy atoi64(s) == LocaleIndependentAtoi<int64_t>(s)
701@@ -702,7 +702,7 @@ static RPCHelpMan getblocktemplate()
702 std::string lpstr = lpval.get_str();
703
704 hashWatchedChain = ParseHashV(lpstr.substr(0, 64), "longpollid");
705- nTransactionsUpdatedLastLP = atoi64(lpstr.substr(64));
706+ nTransactionsUpdatedLastLP = LocaleIndependentAtoi<int64_t>(lpstr.substr(64));
nTransactionsUpdatedLastLP
is unsigned int
; should this be defined or cast to that?
nTimeSmart
in the wallet code. In the longer run the field should imo be converted to an explicitly sized type, and parsed as that, but keeping the behavior exactly the same makes sense here to keep it uncontroversial.
73-int atoi(const std::string& str);
74+
75+// LocaleIndependentAtoi is provided for backwards compatibility reasons.
76+//
77+// New code should use the ParseInt64/ParseUInt64/ParseInt32/ParseUInt32 functions
78+// which provide parse error feedback.
atoi
and atoi64
is pretty much always undesirable.
(so I would normally comment “these function names are long and clunky, please shorten them” but it’s good in this case, we want using them to be ugly :smile: )
ToIntegral
, this contradicts by saying to use ParseInt*
??
ToIntegral
sounds like a good suggestion now that it is part of master
:)
68@@ -69,7 +69,7 @@ CScript ParseScript(const std::string& s)
69 (w->front() == '-' && w->size() > 1 && std::all_of(w->begin()+1, w->end(), ::IsDigit)))
70 {
71 // Number
72- int64_t n = atoi64(*w);
73+ int64_t n = LocaleIndependentAtoi<int64_t>(*w);
ParseIntegral
is part of master
:)
249@@ -250,7 +250,7 @@ bool RPCConsole::RPCParseCommandLine(interfaces::Node* node, std::string &strRes
250 for(char argch: curarg)
251 if (!IsDigit(argch))
252 throw std::runtime_error("Invalid result query");
253- subelement = lastResult[atoi(curarg.c_str())];
254+ subelement = lastResult[LocaleIndependentAtoi<int>(curarg)];
76@@ -77,8 +77,7 @@ std::optional<CAmount> ParseMoney(const std::string& money_string)
77 return std::nullopt;
78 if (nUnits < 0 || nUnits > COIN)
79 return std::nullopt;
80- int64_t nWhole = atoi64(strWhole);
81-
82+ int64_t nWhole = LocaleIndependentAtoi<int64_t>(strWhole);