Minor optimizations to bech32::Decode(); add tests. #12881

murrayn commented at 10:39 AM on April 4, 2018: contributor

Just a few minor optimizations to bech32::Decode():

optimize the order and logic of the conditionals
get rid of subsequent '(c < 33 || c > 126)' check which is redundant (already performed above)
add a couple more bech32 tests (mixed-case)

fanquake added the label Refactoring on Apr 4, 2018

in src/bech32.cpp:159 in b65a5a3bfd outdated

 154 | @@ -161,26 +155,28 @@ std::pair<std::string, data> Decode(const std::string& str) {
 155 |      for (size_t i = 0; i < str.size(); ++i) {
 156 |          unsigned char c = str[i];
 157 |          if (c < 33 || c > 126) return {};
 158 | -        if (c >= 'a' && c <= 'z') lower = true;
 159 | -        if (c >= 'A' && c <= 'Z') upper = true;
 160 | +        // Mixing upper and lowercase is not OK.
 161 | +        if (!lower && c >= 'a' && c <= 'z') { if(upper) return {}; lower = true; }

practicalswift commented at 11:05 AM on April 4, 2018:

Nit: if (upper) instead of if(upper)

murrayn commented at 11:32 AM on April 4, 2018:

Not even a nit. Should be a compiler error IMO. I must be tired. :-)

in src/bech32.cpp:175 in b65a5a3bfd outdated

 169 |      }
 170 |      data values(str.size() - 1 - pos);
 171 |      for (size_t i = 0; i < str.size() - 1 - pos; ++i) {
 172 |          unsigned char c = str[i + pos + 1];
 173 | -        int8_t rev = (c < 33 || c > 126) ? -1 : CHARSET_REV[c];
 174 | +        int8_t rev = CHARSET_REV[c];

practicalswift commented at 11:10 AM on April 4, 2018:

Add assert(c >= 33 && c <= 126); on the line before int8_t rev = CHARSET_REV[c]; to make assumption explicit?

murrayn commented at 3:23 AM on April 5, 2018:

@practicalswift If I were writing Decode() from scratch, I don't think it would occur to me to add an assert() there. Not convinced it adds any value.

laanwj commented at 11:05 AM on April 26, 2018:

Remember that we have asserts enabled on release notes, better not add them in inner loops especially not if the goal is to 'tighten up' anything.

practicalswift commented at 11:13 AM on April 4, 2018: contributor

LowerCase() and std::tolower() are not equivalent. std::tolower() takes the currently installed C locale into account.

murrayn commented at 11:46 AM on April 4, 2018: contributor

Had a feeling there was a reason for LowerCase().

in src/bech32.cpp:165 in 1bc88b4a66 outdated

 160 | @@ -161,18 +161,20 @@ std::pair<std::string, data> Decode(const std::string& str) {
 161 |      for (size_t i = 0; i < str.size(); ++i) {
 162 |          unsigned char c = str[i];
 163 |          if (c < 33 || c > 126) return {};
 164 | -        if (c >= 'a' && c <= 'z') lower = true;
 165 | -        if (c >= 'A' && c <= 'Z') upper = true;
 166 | +        // Mixing upper and lowercase is not OK.
 167 | +        if (!lower && c >= 'a' && c <= 'z') { if (upper) return {}; lower = true; }

sipa commented at 12:44 AM on April 5, 2018:

Please follow the coding style: any if that has anything but just a single-statement then body must use braces and indentation.

fanquake commented at 1:06 AM on April 5, 2018: member

@murrayn After fixing any nits, please also squash your commits.

in src/bech32.cpp:165 in 163d794add outdated

 160 | @@ -161,18 +161,25 @@ std::pair<std::string, data> Decode(const std::string& str) {
 161 |      for (size_t i = 0; i < str.size(); ++i) {
 162 |          unsigned char c = str[i];
 163 |          if (c < 33 || c > 126) return {};
 164 | -        if (c >= 'a' && c <= 'z') lower = true;
 165 | -        if (c >= 'A' && c <= 'Z') upper = true;

promag commented at 1:32 AM on April 5, 2018:

else here would improve a bit.

promag commented at 1:43 AM on April 5, 2018:

Maybe better:

if (c >= 'a') { if (c <= 'z') lower = true; }
else if (c >= 'A') { if (c <= 'Z') upper = true; }

murrayn commented at 3:29 AM on April 5, 2018:

@promag Do you have an example in mind of a most common case?

promag commented at 8:35 AM on April 5, 2018:

The success case?

murrayn commented at 10:25 AM on April 5, 2018:

OK, in that case (let's assume a string consisting only of lowercase or uppercase letters) the existing code will do three comparisons and one assignment, per character. The proposed code will do three comparisons per character, with the added benefit of returning earlier in the case of a malformed string. Not sure why you would "suspect this makes the most common case worst".

murrayn commented at 11:28 AM on April 5, 2018:

Further to this, if we're going to assume the most common case is the success case (which is reasonable), it would probably be good to move the initial (c < 33 || c > 126) check to an ultimate "else if" check to the original code.

promag commented at 8:41 PM on April 5, 2018:

@murrayn probably, but can you post benchmark results?

promag commented at 1:47 AM on April 5, 2018: member

Please provide benchmark results. I suspect this makes the most common case worst. Maybe the suggestion below would improve performance for the most common case. An edge case doesn't have to perform better if the implementation makes the remaining cases worst.

murrayn force-pushed on Apr 5, 2018

laanwj commented at 7:25 AM on April 5, 2018: member

LowerCase() and std::tolower() are not equivalent. std::tolower() takes the currently installed C locale into account.

Thanks for catching this. We should be extremely careful to not introduce locale dependencies in the low-level string parsing functions. We've had serious problems with those in the past. This can result in country-specific bugs...

sipa commented at 11:50 AM on April 5, 2018: member

Please don't overthink this. Decoding addresses is hardly relevant (I don't think anyone would notice if they were 100x slower). My goal when writing this was more clarity and simplicity than speed, though I'm obviously not opposed to performance improvements if they don't conflict with those goals.

I am interested in whether the additional branches don't make performance worse though. My gut feeling is that they impact performance more than comparisons.

murrayn commented at 2:10 AM on April 6, 2018: contributor

@sipa Thanks for the feedback. I've reworked the code again to reflect your input.

MarcoFalke commented at 7:38 PM on April 11, 2018: member

To get rid of the merge commit, please squash your commits according to https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#squashing-commits

Tighten up bech32::Decode(); add tests. 60f61f9952

murrayn force-pushed on Apr 13, 2018

murrayn commented at 9:22 AM on April 20, 2018: contributor

Not sure if this PR is stalled due to the earlier comment about benchmarks...I didn't think benchmarks would be as interesting after my most recent commit, in which the code was more straightforward. Just in case, I have benchmarked:

if (c >= 'a' && c <= 'z') lower = 1;
else if (c >= 'A' && c <= 'Z') upper = 1;
else if (c < 33 || c > 126) return 0;

versus

if (c < 33 || c > 126) return 0;
if (c >= 'a' && c <= 'z') lower = 1;
if (c >= 'A' && c <= 'Z') upper = 1;

and my results show the former is significantly faster; however, with -O2 compiler optimization enabled they benchmark identically, which isn't surprising.

practicalswift cross-referenced this on Apr 20, 2018 from issue build: Add linter checking for accidental introduction of locale dependence by practicalswift

promag commented at 10:26 AM on April 26, 2018: member

utACK 60f61f9.

Change is slightly better because success checks come first, which is probably the most common case.

MarcoFalke commented at 11:47 AM on April 26, 2018: member

Please also adjust the OP

murrayn renamed this:
~~Tighten up bech32::Decode(); add tests.~~
Minor optimizations to bech32::Decode(); add tests.
on Apr 28, 2018

laanwj commented at 10:08 AM on May 15, 2018: member

utACK 60f61f99529f54f85c847d61122c70c0358ebecc

laanwj merged this on May 15, 2018

laanwj closed this on May 15, 2018

laanwj referenced this in commit 1d4662f5dc on May 15, 2018

Bushstar cross-referenced this on May 16, 2018 from issue commits from bitcoin/master by Bushstar

laanwj referenced this in commit 5779dc4f76 on Jun 7, 2018

PastaPastaPasta referenced this in commit 41e71ce854 on Jun 17, 2020

PastaPastaPasta referenced this in commit 28b02f42e1 on Jul 2, 2020

UdjinM6 referenced this in commit ee1aa96286 on May 21, 2021

UdjinM6 referenced this in commit 86ae5fcb64 on May 25, 2021

bitcoin locked this on Sep 8, 2021