util: refactor upper/lowercase functions

kallewoof commented at 2:38 am on August 8, 2019: member

This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg. Also adds ToUpper() string version.

Additionally, it clarifies that the locale independency of the case functions is a feature and not a limitation. I interpreted it as the latter and rewrote code to be locale-aware before realizing this.

This is done in preparation for #11413 and as a general refactor. I don’t think the optimization that the pre-refactor state gave warrants the unwieldy usage.

This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg.
Also adds ToUpper() string version.

0481fa2584

fanquake added the label Refactoring on Aug 8, 2019

fanquake added the label Utils/log/libs on Aug 8, 2019

practicalswift commented at 7:26 am on August 8, 2019: contributor

Concept ACK

Thanks for cleaning this up.

Will review the code.

I don’t think the optimization that the pre-refactor state gave warrants the unwieldy usage.

Very much so. (Also unclear if the pre-refactor state is a meaningful optimisation in practice.)

Sjors commented at 8:33 am on August 8, 2019: member

Concept ACK for improved readability and usability.

laanwj commented at 9:43 am on August 8, 2019: member

Concept ACK having these as functions seems better than doing the transformation in-place

Additionally, it clarifies that the locale independency of the case functions is a feature and not a limitation

Yes, please do not ever add any locale dependent code to bitcoind. Deterministic behavior is the most important thing there (it’s okay for bitcoin-qt but only for UI purposes).

in src/util/strencodings.cpp:551 in 0481fa2584

545@@ -546,9 +546,18 @@ bool ParseFixedPoint(const std::string &val, int decimals, int64_t *amount_out)
546     return true;
547 }
548 
549-void Downcase(std::string& str)
550+std::string ToLower(const std::string& str)
551 {
552-    std::transform(str.begin(), str.end(), str.begin(), [](char c){return ToLower(c);});
553+    std::string r;

promag commented at 10:31 am on August 8, 2019:

Could reserve, same below.

kallewoof commented at 1:45 pm on August 8, 2019:

What does that mean?

promag commented at 1:49 pm on August 8, 2019:

See http://www.cplusplus.com/reference/string/string/reserve/.

practicalswift commented at 8:36 pm on August 8, 2019:

@promag I don’t think using r.reserve(str.size()); necessarily results in faster execution here. I think it depends on the SSO implementation and the length of str.

If str.size() <= 15 I think not using reserve is faster when using libstdc++ or MSVC. And str.size() <= 22 for libc++.

promag commented at 8:48 pm on August 8, 2019:

Right, but I’m not particularly worried about performance, i think it improves the code.

kallewoof commented at 0:47 am on August 9, 2019:

Reserving seems sensible. It would actually slow down the code, @practicalswift ? Crazy. I would have never expected that. I’m indifferent on this, though, and opinions are mixed so leaving as is for now.

practicalswift commented at 7:40 pm on August 14, 2019:

@kallewoof You can test this using the following Google Benchmark snippet:

 0#include <algorithm>
 1#include <string>
 2
 3constexpr char ToLower(char c) {
 4    return (c >= 'A' && c <= 'Z' ? (c - 'A') + 'a' : c);
 5}
 6
 7std::string ToLower_1(const std::string& str) {
 8    std::string r;
 9    for (auto ch : str) {
10        r += ToLower((unsigned char)ch);
11    }
12    return r;
13}
14
15std::string ToLower_2(const std::string& str) {
16    std::string r;
17    r.reserve(str.size());
18    for (auto ch : str) {
19        r += ToLower((unsigned char)ch);
20    }
21    return r;
22}
23
24static void BenchToLower_1(benchmark::State& state) {
25  for (auto _ : state) {
26    const std::string s1{"HELLO"};
27    benchmark::DoNotOptimize(s1);
28    const std::string s2 = ToLower_1(s1);
29    benchmark::DoNotOptimize(s2);
30  }
31}
32BENCHMARK(BenchToLower_1);
33
34static void BenchToLower_2(benchmark::State& state) {
35  for (auto _ : state) {
36    const std::string s1{"HELLO"};
37    benchmark::DoNotOptimize(s1);
38    const std::string s2 = ToLower_2(s1);
39    benchmark::DoNotOptimize(s2);
40  }
41}
42BENCHMARK(BenchToLower_2);

Notice that ToLower_1 which doesn’t reserve is slightly faster for strings up to 15 chars for libstdc++ and MSVC and up to 22 chars for libc++.

kallewoof commented at 6:36 am on August 15, 2019:

@practicalswift Thanks for that! Very educational. Memory allocations are tricky to optimize, I guess. :)

in src/netbase.cpp:41 in 0481fa2584

36@@ -37,8 +37,8 @@ bool fNameLookup = DEFAULT_NAME_LOOKUP;
37 static const int SOCKS5_RECV_TIMEOUT = 20 * 1000;
38 static std::atomic<bool> interruptSocks5Recv(false);
39 
40-enum Network ParseNetwork(std::string net) {
41-    Downcase(net);
42+enum Network ParseNetwork(const std::string& net_in) {
43+    std::string net = ToLower(net_in);

promag commented at 10:34 am on August 8, 2019:

Could be net = ToLower(net) and avoid changing the signature? Both are fine though.

laanwj commented at 11:04 am on August 8, 2019:

Changing the signature is an improvement here IMO

l2a5b1 commented at 5:16 pm on August 8, 2019:

I don’t think the optimization that the pre-refactor state gave warrants the unwieldy usage.

It seems to me that the usage of the refactored function is more unwieldy than the pre-refactor state.

Downcase(net) straightforwardly downcases the string. The refactored-state needs two variables (const std::string& net_in, std::string net) and an assignment to achieve the same result.

promag commented at 10:36 am on August 8, 2019: member

Concept ACK.

luke-jr commented at 4:00 pm on August 8, 2019: member

While there are cases where Downcase was unwieldy used, I don’t think it is unwieldy as a rule, and there’s no real reason to go out of the way to make things less efficient. The new functions can be added without making other code worse.

in src/util/strencodings.cpp:549 in 0481fa2584

545@@ -546,9 +546,18 @@ bool ParseFixedPoint(const std::string &val, int decimals, int64_t *amount_out)
546     return true;
547 }
548 
549-void Downcase(std::string& str)
550+std::string ToLower(const std::string& str)

l2a5b1 commented at 5:23 pm on August 8, 2019:

What is the significant developer benefit of this change and why is the required in preparation for #11413? This function is called at just two call sites and the pre-refactor state expects that the argument is directly transformed.

in src/util/system.cpp:391 in 0481fa2584

387@@ -388,7 +388,7 @@ bool ArgsManager::ParseParameters(int argc, const char* const argv[], std::strin
388             key.erase(is_index);
389         }
390 #ifdef WIN32
391-        std::transform(key.begin(), key.end(), key.begin(), ToLower);
392+        key = ToLower(key);

l2a5b1 commented at 6:37 pm on August 8, 2019:

Pre-refactor this would just be Downcase(key).

l2a5b1 commented at 6:52 pm on August 8, 2019: contributor

I am afraid that I don’t see the significant developer improvement of the refactored Downcase function. Usage of the refactored ToLower function at the two call sites also seems more unwieldy than the pre-refactor state, because both call sites expected the string argument to be transformed.

Refactoring Downcase is also not required for #11413.

I think we should create a ToLower that returns the result when we actually need it (and even then - as in boost for example - it could make sense to have both functions).

laanwj commented at 10:36 pm on August 8, 2019: member

I think we should create a ToLower that returns the result when we actually need it (and even then - as in boost for example - it could make sense to have both functions).

Having two sets of uppercase/lowercase functions would be overkill, especially given how little they are used. Efficiency isn’t a point, either, because where they are used only, like, a single time per invocation of the executable, for argument processing.

Returning their result is the normal, surprise-free way for functions to work. Changing things in-place is an optimization, unnecessary in this case. The new way is imo not more unwieldy, definitely not if you need to keep the old string.

Even with that, even if performance mattered

0+ enum Network ParseNetwork(std::string net) {
1+    Downcase(net);
2- enum Network ParseNetwork(const std::string& net_in) {
3-    std::string net = ToLower(net_in);

Is not an overall loss of efficiency. In the old case, net is copied as input argument then transformed in place. In the new case (=1 string allocation), in the new case the function takes the argument as reference but ToLower creates and returns a new string with the uppercase data (=1 string allocation).

l2a5b1 commented at 11:46 pm on August 8, 2019: contributor

@laanwj thanks, appreciate the feedback.

Having two sets of uppercase/lowercase functions would be overkill, especially given how little they are used.

There are no two sets of uppercase/lowercase functions and it is also not what I did propose.

The new way is imo not more unwieldy, definitely not if you need to keep the old string.

The code at the call sites does not need to keep the old string.

As far as performance or efficiency argument is concerned, I didn’t mention that in my feedback.

What I do argue is that there don’t seem to be any benefits to refactoring Downcase:

It is not required for #11413;
There are no significant developer benefits, because the call sites don’t need the lower cased string to be returned;
Use of the refactored function at the two call sites is more unwieldy than with the Downcase equivalent, because with ToLower you need to do more to achieve the same result.

Hence:

I think we should create a ToLower that returns the result when we actually need it

laanwj commented at 11:56 pm on August 8, 2019: member

Right, okay, I’m not going to argue this any further

ACK 0481fa25844dc6ec9f6c3fac8428d874d34b0ad0 from me

kallewoof commented at 0:56 am on August 9, 2019: member

@l2a5b1

Thanks for your feedback. The function (Downcase()) is used in one single place, and its behavior is not as anyone would expect, because someone at some point thought in-place replacement would be an optimization worthy of going for. I disagree, and as you can see, others here disagree as well.

Generally speaking, if you have a choice between straight-forward/expected behavior and unexpected-but-optimized behavior, you usually wanna go for the former, unless the code is in some way performance critical (e.g. it slows things down significantly).

Seeing as ParseNetwork, the only function using the Downcase function in the code, is called one single time in init.cpp, it makes no sense to keep a non-intuitive optimization around.

I am changing it in preparation for #11413 because I need the uppercase variant, and having one void Downcase(std::string& str) function which performs in-place replacement, and one std::string ToUpper(const std::string& str) function which does not, would be bloody confusing for everybody. @luke-jr Honestly, this function is called one single time. Let’s not keep things confusing on purpose.

practicalswift commented at 7:21 am on August 9, 2019: contributor

ACK 0481fa25844dc6ec9f6c3fac8428d874d34b0ad0 – diff looks correct

Agree fully with @kallewoof and @laanwj regarding the choice between “straight-forward/expected” vs “unexpected-but-optimized” for code paths that are not performance critical.

l2a5b1 commented at 4:20 pm on August 9, 2019: contributor

@kallewoof thanks! :)

ACK 0481fa2 - Although, I think @luke-jr’s feedback is spot on; Downcase is just an artifact of ParseNetwork, which has been happily downcasing net via a string argument for over 7 years; and I do recommend to add ToLower when somebody actually needs it in new work, there is no point in keeping a trivial utility function if it is not appreciated.

promag commented at 10:54 pm on August 12, 2019: member

ACK 0481fa25844dc6ec9f6c3fac8428d874d34b0ad0.

fanquake merged this on Aug 13, 2019

fanquake closed this on Aug 13, 2019

fanquake referenced this in commit b799ebcc17 on Aug 13, 2019

kallewoof deleted the branch on Aug 13, 2019

sidhujag referenced this in commit 206e3a5511 on Aug 13, 2019

deadalnix referenced this in commit feb9dc577c on Jun 4, 2020

ShengguangXiao referenced this in commit 1d008873a2 on Aug 28, 2020

kittywhiskers referenced this in commit 01e726d6b4 on Nov 1, 2021

kittywhiskers referenced this in commit db7e456213 on Nov 1, 2021

kittywhiskers referenced this in commit e13d93ec78 on Nov 1, 2021

UdjinM6 referenced this in commit c679fd9ab6 on Nov 2, 2021

pravblockc referenced this in commit cecf881462 on Nov 18, 2021

DrahtBot locked this on Dec 16, 2021

util: refactor upper/lowercase functions #16566