Just a few minor optimizations to bech32::Decode():
- optimize the order and logic of the conditionals
- get rid of subsequent '(c < 33 || c > 126)' check which is redundant (already performed above)
- add a couple more bech32 tests (mixed-case)
Just a few minor optimizations to bech32::Decode():
154 | @@ -161,26 +155,28 @@ std::pair<std::string, data> Decode(const std::string& str) { 155 | for (size_t i = 0; i < str.size(); ++i) { 156 | unsigned char c = str[i]; 157 | if (c < 33 || c > 126) return {}; 158 | - if (c >= 'a' && c <= 'z') lower = true; 159 | - if (c >= 'A' && c <= 'Z') upper = true; 160 | + // Mixing upper and lowercase is not OK. 161 | + if (!lower && c >= 'a' && c <= 'z') { if(upper) return {}; lower = true; }
Nit: if (upper) instead of if(upper)
Not even a nit. Should be a compiler error IMO. I must be tired. :-)
169 | }
170 | data values(str.size() - 1 - pos);
171 | for (size_t i = 0; i < str.size() - 1 - pos; ++i) {
172 | unsigned char c = str[i + pos + 1];
173 | - int8_t rev = (c < 33 || c > 126) ? -1 : CHARSET_REV[c];
174 | + int8_t rev = CHARSET_REV[c];
Add assert(c >= 33 && c <= 126); on the line before int8_t rev = CHARSET_REV[c]; to make assumption explicit?
@practicalswift If I were writing Decode() from scratch, I don't think it would occur to me to add an assert() there. Not convinced it adds any value.
Remember that we have asserts enabled on release notes, better not add them in inner loops especially not if the goal is to 'tighten up' anything.
LowerCase() and std::tolower() are not equivalent. std::tolower() takes the currently installed C locale into account.
Had a feeling there was a reason for LowerCase().
160 | @@ -161,18 +161,20 @@ std::pair<std::string, data> Decode(const std::string& str) { 161 | for (size_t i = 0; i < str.size(); ++i) { 162 | unsigned char c = str[i]; 163 | if (c < 33 || c > 126) return {}; 164 | - if (c >= 'a' && c <= 'z') lower = true; 165 | - if (c >= 'A' && c <= 'Z') upper = true; 166 | + // Mixing upper and lowercase is not OK. 167 | + if (!lower && c >= 'a' && c <= 'z') { if (upper) return {}; lower = true; }
Please follow the coding style: any if that has anything but just a single-statement then body must use braces and indentation.
@murrayn After fixing any nits, please also squash your commits.
160 | @@ -161,18 +161,25 @@ std::pair<std::string, data> Decode(const std::string& str) { 161 | for (size_t i = 0; i < str.size(); ++i) { 162 | unsigned char c = str[i]; 163 | if (c < 33 || c > 126) return {}; 164 | - if (c >= 'a' && c <= 'z') lower = true; 165 | - if (c >= 'A' && c <= 'Z') upper = true;
else here would improve a bit.
Maybe better:
if (c >= 'a') { if (c <= 'z') lower = true; }
else if (c >= 'A') { if (c <= 'Z') upper = true; }
The success case?
OK, in that case (let's assume a string consisting only of lowercase or uppercase letters) the existing code will do three comparisons and one assignment, per character. The proposed code will do three comparisons per character, with the added benefit of returning earlier in the case of a malformed string. Not sure why you would "suspect this makes the most common case worst".
Further to this, if we're going to assume the most common case is the success case (which is reasonable), it would probably be good to move the initial (c < 33 || c > 126) check to an ultimate "else if" check to the original code.
Please provide benchmark results. I suspect this makes the most common case worst. Maybe the suggestion below would improve performance for the most common case. An edge case doesn't have to perform better if the implementation makes the remaining cases worst.
LowerCase() and std::tolower() are not equivalent. std::tolower() takes the currently installed C locale into account.
Thanks for catching this. We should be extremely careful to not introduce locale dependencies in the low-level string parsing functions. We've had serious problems with those in the past. This can result in country-specific bugs...
Please don't overthink this. Decoding addresses is hardly relevant (I don't think anyone would notice if they were 100x slower). My goal when writing this was more clarity and simplicity than speed, though I'm obviously not opposed to performance improvements if they don't conflict with those goals.
I am interested in whether the additional branches don't make performance worse though. My gut feeling is that they impact performance more than comparisons.
To get rid of the merge commit, please squash your commits according to https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#squashing-commits
Not sure if this PR is stalled due to the earlier comment about benchmarks...I didn't think benchmarks would be as interesting after my most recent commit, in which the code was more straightforward. Just in case, I have benchmarked:
if (c >= 'a' && c <= 'z') lower = 1;
else if (c >= 'A' && c <= 'Z') upper = 1;
else if (c < 33 || c > 126) return 0;
versus
if (c < 33 || c > 126) return 0;
if (c >= 'a' && c <= 'z') lower = 1;
if (c >= 'A' && c <= 'Z') upper = 1;
and my results show the former is significantly faster; however, with -O2 compiler optimization enabled they benchmark identically, which isn't surprising.
utACK 60f61f9.
Change is slightly better because success checks come first, which is probably the most common case.
Please also adjust the OP
utACK 60f61f99529f54f85c847d61122c70c0358ebecc