util: refactor upper/lowercase functions #16566

pull kallewoof wants to merge 1 commits into bitcoin:master from kallewoof:2019-08-casefuns-refactor changing 6 files +48 −24
  1. kallewoof commented at 2:38 am on August 8, 2019: member

    This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg. Also adds ToUpper() string version.

    Additionally, it clarifies that the locale independency of the case functions is a feature and not a limitation. I interpreted it as the latter and rewrote code to be locale-aware before realizing this.

    This is done in preparation for #11413 and as a general refactor. I don’t think the optimization that the pre-refactor state gave warrants the unwieldy usage.

  2. util: refactor upper/lowercase functions
    This includes renaming Downcase() to ToLower() and make it return a string rather than modify referenced arg.
    Also adds ToUpper() string version.
    0481fa2584
  3. fanquake added the label Refactoring on Aug 8, 2019
  4. fanquake added the label Utils/log/libs on Aug 8, 2019
  5. practicalswift commented at 7:26 am on August 8, 2019: contributor

    Concept ACK

    Thanks for cleaning this up.

    Will review the code.

    I don’t think the optimization that the pre-refactor state gave warrants the unwieldy usage.

    Very much so. (Also unclear if the pre-refactor state is a meaningful optimisation in practice.)

  6. Sjors commented at 8:33 am on August 8, 2019: member
    Concept ACK for improved readability and usability.
  7. laanwj commented at 9:43 am on August 8, 2019: member

    Concept ACK having these as functions seems better than doing the transformation in-place

    Additionally, it clarifies that the locale independency of the case functions is a feature and not a limitation

    Yes, please do not ever add any locale dependent code to bitcoind. Deterministic behavior is the most important thing there (it’s okay for bitcoin-qt but only for UI purposes).

  8. in src/util/strencodings.cpp:551 in 0481fa2584
    545@@ -546,9 +546,18 @@ bool ParseFixedPoint(const std::string &val, int decimals, int64_t *amount_out)
    546     return true;
    547 }
    548 
    549-void Downcase(std::string& str)
    550+std::string ToLower(const std::string& str)
    551 {
    552-    std::transform(str.begin(), str.end(), str.begin(), [](char c){return ToLower(c);});
    553+    std::string r;
    


    promag commented at 10:31 am on August 8, 2019:
    Could reserve, same below.

    kallewoof commented at 1:45 pm on August 8, 2019:
    What does that mean?

    promag commented at 1:49 pm on August 8, 2019:

    practicalswift commented at 8:36 pm on August 8, 2019:

    @promag I don’t think using r.reserve(str.size()); necessarily results in faster execution here. I think it depends on the SSO implementation and the length of str.

    If str.size() <= 15 I think not using reserve is faster when using libstdc++ or MSVC. And str.size() <= 22 for libc++.


    promag commented at 8:48 pm on August 8, 2019:
    Right, but I’m not particularly worried about performance, i think it improves the code.

    kallewoof commented at 0:47 am on August 9, 2019:
    Reserving seems sensible. It would actually slow down the code, @practicalswift ? Crazy. I would have never expected that. I’m indifferent on this, though, and opinions are mixed so leaving as is for now.

    practicalswift commented at 7:40 pm on August 14, 2019:

    @kallewoof You can test this using the following Google Benchmark snippet:

     0#include <algorithm>
     1#include <string>
     2
     3constexpr char ToLower(char c) {
     4    return (c >= 'A' && c <= 'Z' ? (c - 'A') + 'a' : c);
     5}
     6
     7std::string ToLower_1(const std::string& str) {
     8    std::string r;
     9    for (auto ch : str) {
    10        r += ToLower((unsigned char)ch);
    11    }
    12    return r;
    13}
    14
    15std::string ToLower_2(const std::string& str) {
    16    std::string r;
    17    r.reserve(str.size());
    18    for (auto ch : str) {
    19        r += ToLower((unsigned char)ch);
    20    }
    21    return r;
    22}
    23
    24static void BenchToLower_1(benchmark::State& state) {
    25  for (auto _ : state) {
    26    const std::string s1{"HELLO"};
    27    benchmark::DoNotOptimize(s1);
    28    const std::string s2 = ToLower_1(s1);
    29    benchmark::DoNotOptimize(s2);
    30  }
    31}
    32BENCHMARK(BenchToLower_1);
    33
    34static void BenchToLower_2(benchmark::State& state) {
    35  for (auto _ : state) {
    36    const std::string s1{"HELLO"};
    37    benchmark::DoNotOptimize(s1);
    38    const std::string s2 = ToLower_2(s1);
    39    benchmark::DoNotOptimize(s2);
    40  }
    41}
    42BENCHMARK(BenchToLower_2);
    

    Notice that ToLower_1 which doesn’t reserve is slightly faster for strings up to 15 chars for libstdc++ and MSVC and up to 22 chars for libc++.


    kallewoof commented at 6:36 am on August 15, 2019:
    @practicalswift Thanks for that! Very educational. Memory allocations are tricky to optimize, I guess. :)
  9. in src/netbase.cpp:41 in 0481fa2584
    36@@ -37,8 +37,8 @@ bool fNameLookup = DEFAULT_NAME_LOOKUP;
    37 static const int SOCKS5_RECV_TIMEOUT = 20 * 1000;
    38 static std::atomic<bool> interruptSocks5Recv(false);
    39 
    40-enum Network ParseNetwork(std::string net) {
    41-    Downcase(net);
    42+enum Network ParseNetwork(const std::string& net_in) {
    43+    std::string net = ToLower(net_in);
    


    promag commented at 10:34 am on August 8, 2019:
    Could be net = ToLower(net) and avoid changing the signature? Both are fine though.

    laanwj commented at 11:04 am on August 8, 2019:
    Changing the signature is an improvement here IMO

    l2a5b1 commented at 5:16 pm on August 8, 2019:

    I don’t think the optimization that the pre-refactor state gave warrants the unwieldy usage.

    It seems to me that the usage of the refactored function is more unwieldy than the pre-refactor state.

    Downcase(net) straightforwardly downcases the string. The refactored-state needs two variables (const std::string& net_in, std::string net) and an assignment to achieve the same result.

  10. promag commented at 10:36 am on August 8, 2019: member
    Concept ACK.
  11. luke-jr commented at 4:00 pm on August 8, 2019: member
    While there are cases where Downcase was unwieldy used, I don’t think it is unwieldy as a rule, and there’s no real reason to go out of the way to make things less efficient. The new functions can be added without making other code worse.
  12. in src/util/strencodings.cpp:549 in 0481fa2584
    545@@ -546,9 +546,18 @@ bool ParseFixedPoint(const std::string &val, int decimals, int64_t *amount_out)
    546     return true;
    547 }
    548 
    549-void Downcase(std::string& str)
    550+std::string ToLower(const std::string& str)
    


    l2a5b1 commented at 5:23 pm on August 8, 2019:
    What is the significant developer benefit of this change and why is the required in preparation for #11413? This function is called at just two call sites and the pre-refactor state expects that the argument is directly transformed.
  13. in src/util/system.cpp:391 in 0481fa2584
    387@@ -388,7 +388,7 @@ bool ArgsManager::ParseParameters(int argc, const char* const argv[], std::strin
    388             key.erase(is_index);
    389         }
    390 #ifdef WIN32
    391-        std::transform(key.begin(), key.end(), key.begin(), ToLower);
    392+        key = ToLower(key);
    


    l2a5b1 commented at 6:37 pm on August 8, 2019:
    Pre-refactor this would just be Downcase(key).
  14. l2a5b1 commented at 6:52 pm on August 8, 2019: contributor

    I am afraid that I don’t see the significant developer improvement of the refactored Downcase function. Usage of the refactored ToLower function at the two call sites also seems more unwieldy than the pre-refactor state, because both call sites expected the string argument to be transformed.

    Refactoring Downcase is also not required for #11413.

    I think we should create a ToLower that returns the result when we actually need it (and even then - as in boost for example - it could make sense to have both functions).

  15. laanwj commented at 10:36 pm on August 8, 2019: member

    I think we should create a ToLower that returns the result when we actually need it (and even then - as in boost for example - it could make sense to have both functions).

    Having two sets of uppercase/lowercase functions would be overkill, especially given how little they are used. Efficiency isn’t a point, either, because where they are used only, like, a single time per invocation of the executable, for argument processing.

    Returning their result is the normal, surprise-free way for functions to work. Changing things in-place is an optimization, unnecessary in this case. The new way is imo not more unwieldy, definitely not if you need to keep the old string.

    Even with that, even if performance mattered

    0+ enum Network ParseNetwork(std::string net) {
    1+    Downcase(net);
    2- enum Network ParseNetwork(const std::string& net_in) {
    3-    std::string net = ToLower(net_in);
    

    Is not an overall loss of efficiency. In the old case, net is copied as input argument then transformed in place. In the new case (=1 string allocation), in the new case the function takes the argument as reference but ToLower creates and returns a new string with the uppercase data (=1 string allocation).

  16. l2a5b1 commented at 11:46 pm on August 8, 2019: contributor

    @laanwj thanks, appreciate the feedback.

    Having two sets of uppercase/lowercase functions would be overkill, especially given how little they are used.

    There are no two sets of uppercase/lowercase functions and it is also not what I did propose.

    The new way is imo not more unwieldy, definitely not if you need to keep the old string.

    The code at the call sites does not need to keep the old string.

    As far as performance or efficiency argument is concerned, I didn’t mention that in my feedback.

    What I do argue is that there don’t seem to be any benefits to refactoring Downcase:

    • It is not required for #11413;
    • There are no significant developer benefits, because the call sites don’t need the lower cased string to be returned;
    • Use of the refactored function at the two call sites is more unwieldy than with the Downcase equivalent, because with ToLower you need to do more to achieve the same result.

    Hence:

    I think we should create a ToLower that returns the result when we actually need it

  17. laanwj commented at 11:56 pm on August 8, 2019: member

    Right, okay, I’m not going to argue this any further

    ACK 0481fa25844dc6ec9f6c3fac8428d874d34b0ad0 from me

  18. kallewoof commented at 0:56 am on August 9, 2019: member

    @l2a5b1

    Thanks for your feedback. The function (Downcase()) is used in one single place, and its behavior is not as anyone would expect, because someone at some point thought in-place replacement would be an optimization worthy of going for. I disagree, and as you can see, others here disagree as well.

    Generally speaking, if you have a choice between straight-forward/expected behavior and unexpected-but-optimized behavior, you usually wanna go for the former, unless the code is in some way performance critical (e.g. it slows things down significantly).

    Seeing as ParseNetwork, the only function using the Downcase function in the code, is called one single time in init.cpp, it makes no sense to keep a non-intuitive optimization around.

    I am changing it in preparation for #11413 because I need the uppercase variant, and having one void Downcase(std::string& str) function which performs in-place replacement, and one std::string ToUpper(const std::string& str) function which does not, would be bloody confusing for everybody. @luke-jr Honestly, this function is called one single time. Let’s not keep things confusing on purpose.

  19. practicalswift commented at 7:21 am on August 9, 2019: contributor

    ACK 0481fa25844dc6ec9f6c3fac8428d874d34b0ad0 – diff looks correct

    Agree fully with @kallewoof and @laanwj regarding the choice between “straight-forward/expected” vs “unexpected-but-optimized” for code paths that are not performance critical.

  20. l2a5b1 commented at 4:20 pm on August 9, 2019: contributor

    @kallewoof thanks! :)

    ACK 0481fa2 - Although, I think @luke-jr’s feedback is spot on; Downcase is just an artifact of ParseNetwork, which has been happily downcasing net via a string argument for over 7 years; and I do recommend to add ToLower when somebody actually needs it in new work, there is no point in keeping a trivial utility function if it is not appreciated.

  21. promag commented at 10:54 pm on August 12, 2019: member
    ACK 0481fa25844dc6ec9f6c3fac8428d874d34b0ad0.
  22. fanquake merged this on Aug 13, 2019
  23. fanquake closed this on Aug 13, 2019

  24. fanquake referenced this in commit b799ebcc17 on Aug 13, 2019
  25. kallewoof deleted the branch on Aug 13, 2019
  26. sidhujag referenced this in commit 206e3a5511 on Aug 13, 2019
  27. deadalnix referenced this in commit feb9dc577c on Jun 4, 2020
  28. ShengguangXiao referenced this in commit 1d008873a2 on Aug 28, 2020
  29. kittywhiskers referenced this in commit 01e726d6b4 on Nov 1, 2021
  30. kittywhiskers referenced this in commit db7e456213 on Nov 1, 2021
  31. kittywhiskers referenced this in commit e13d93ec78 on Nov 1, 2021
  32. UdjinM6 referenced this in commit c679fd9ab6 on Nov 2, 2021
  33. pravblockc referenced this in commit cecf881462 on Nov 18, 2021
  34. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me