This PR adds benchmarks to Encode()/Decode().
The benchmark commit is duplicated in #13632.
147 | @@ -147,11 +148,15 @@ namespace bech32 148 | std::string Encode(const std::string& hrp, const data& values) { 149 | data checksum = CreateChecksum(hrp, values); 150 | data combined = Cat(values, checksum); 151 | - std::string ret = hrp + '1'; 152 | - ret.reserve(ret.size() + combined.size()); 153 | + size_t hrp_size = hrp.size(); 154 | + char ret[hrp_size + combined.size() + 2]; 155 | + memcpy(ret, hrp.c_str(), hrp_size);
Is there slow-down associated with inlining hrp_size? Seems the compiler would optimize that.
I'll check. I didn't like repeatedly calling hrp.size() but it may be optimized away, as you say.
I don't think the difference warrants the extra line, so I removed it. Thanks for the nudge. :)
155 | + memcpy(ret, hrp.c_str(), hrp_size); 156 | + char* ptr = ret + hrp_size; 157 | + *(ptr++) = '1'; 158 | for (auto c : combined) { 159 | - ret += CHARSET[c]; 160 | + *(ptr++) = CHARSET[c];
How about switching to ++ptr?
Why? (And that would change the behavior to setting the next value rather than the current and iterating.)
++i is preferred over i++.
https://github.com/bitcoin/bitcoin/blob/master/doc/developer-notes.md
You can make the behavior fit by shifting the increment to the later operation in each case.
In this case, I think $(ptr++) is a better alternative, as it is more intuitive. Update the value and then move to the next position. Doing ++i here semantically means the pointer points to the last byte that was set, rather than the next available byte.
The nice thing about ++ptr, and the reason it is preferred AFAIK, is that it does not require a copy of the value in order to operate - you always operate using the value directly because it is holding the current value rather than the next one.
In theory, ++x is faster than x++ for the reason you mention. However, in practice, for simple types the optimizer will rearrange things to avoid the copy. Even for more complex types this is often the case.
Furthermore, just because a construct is preferred doesn't mean you need to go out of your way to use over an alternative. If there are good reasons for x++, like here, you can use it.
157 | for (auto c : combined) {
158 | - ret += CHARSET[c];
159 | + *(ptr++) = CHARSET[c];
160 | }
161 | + *ptr = 0;
162 | return ret;
Even faster: return std::string(ret, ptr); (no need for the string constructor to find the string length again, you already know).
Nice!
Damn.
Bech32Encode, 5, 800000, 3.79493, 9.32891e-07, 9.75863e-07, 9.47069e-07
Actually, the vanity generator did not see an increase in speed so this may have been a fluke. It didn't make it worse though so keeping.
Too bad it didn't help, but I prefer the avoidance of zero-terminated strings in any case.
<!--e57a25ab6845829454e8d69fc972939a-->Note to reviewers: This pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
147 | @@ -147,12 +148,14 @@ namespace bech32 148 | std::string Encode(const std::string& hrp, const data& values) { 149 | data checksum = CreateChecksum(hrp, values); 150 | data combined = Cat(values, checksum); 151 | - std::string ret = hrp + '1'; 152 | - ret.reserve(ret.size() + combined.size()); 153 | + char ret[91]; 154 | + memcpy(ret, hrp.c_str(), hrp.size());
please use .data() here (it goes more naturally with .size())
This really needs a length check. Fixing the length at 91 but still copying the entire input (and writing after it) is a buffer overflow waiting to happen, otherwise.
Came here to say the same regarding a length check. It's too brittle otherwise.
I knew you'd say that. Added length check and updated benchmark results in OP. Still improvement.
I'm a bit scared of this due to the potential security issue introduced. Is it really worth it?
Sure, yes. Though an assertion is not a replacement for proper error handing, IMO. This still means the whole process crashes if a string that is too long is passed in, and makes the security dependent on assertions being enabled (this might be true in other place too though :( ).
I am not convinced this is necessary myself anymore, so I am closing. If someone feels like it's a worthy endeavour go nuts.
It still makes sense to add the benchmark, I think!
Dropped the last commit and keeping the bench part only.
utACK 189cf35f3e6d2cc9ed08eb23dd0ea36be28b6c11
tACK 189cf35