This implements BIP 350:
- For segwit v1+ addresses, a new checksum algorithm called Bech32m is used.
- Segwit v0 address keep using Bech32 as specified in BIP 173.
<!--e57a25ab6845829454e8d69fc972939a-->
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
<!--174a7506f384e20aa4161008e828411d-->
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
100 | + if ((encoding == bech32::Encoding::BECH32 || encoding == bech32::Encoding::BECH32M) && 101 | + encdata.size() > 0 && hrp == params.Bech32HRP()) { 102 | // Bech32 decoding 103 | - int version = bech.second[0]; // The first 5 bit symbol is the witness version (0-16) 104 | + int version = encdata[0]; // The first 5 bit symbol is the witness version (0-16) 105 | + if ((version == 0 && encoding != bech32::Encoding::BECH32) || (version != 0 && encoding != bech32::Encoding::BECH32M)) {
Why forbid Bech32m for v0? Seems like it might be nice to (very slowly) migrate to Bech32m exclusively, and drop Bech32 someday?
I think it's a bad idea to have multiple addresses valid for the same scriptPubKey. It only results in confusion when decoding/recoding doesn't roundtrip. v0 outputs use bech32, and I don't think this should ever change, for better or for worse.
It also worsens error detection to 29 bits (as every v0 output now has two valid checksums).
Roundtrips (by uninvolved third parties) seem like strictly a bad thing IMO, but loss of error detection precision is a good reason. :/
By implementing Bech32m, I found out that it is not obvious how to locate errors, because the residue was polymod^1, but now is either polimod^1 or polimod^0x2bc830a3... I am tempted to try both and take the residue with the least errors. I think this should be specified in the bip.
That said by doing this, the error patterns with failure probability in the bip are probably different. As I think you assume in your table that the decoder knows it is decoding bech32m. Unsure if this is really changing things by a lot.
This is not the right place to comment for it, but did not found elsewhere.
@NicolasDorier That isn't solved by sipa only allowing one polimod based on the witness version?
@luke-jr It's not, because the error may be one that affects the witness version symbol. @NicolasDorier Good point, I will elaborate on that.
Concept ACK
Concept ACK, Approach ACK. Skimmed code but will hopefully go through more thoroughly at a later date.
26 | + INVALID, 27 | 28 | -/** Decode a Bech32 string. Returns (hrp, data). Empty hrp means failure. */ 29 | -std::pair<std::string, std::vector<uint8_t>> Decode(const std::string& str); 30 | + BECH32, //! Bech32 encoding as defined in BIP173 31 | + BECH32M, //! Bech32m encoding as defined in bip-bech32m
Commit 6b28812881375311891e80c1d4e9df6aaa0b85d5
Perhaps too unorthodox, but it would theoretically be possible to do
BECH32 = 1, //...
BECH32M = 0x2bc830a3, // ...
and get rid of EncodingConstant. (But the amount of casting required is probably not worth it.)
I prefer not to do that just for encapsulation reason. Nothing outside of bech32.cpp should care or see the constants.
Slightly tested ACK 863a5c2eba77ca571813b33b9fa6237a9f3efbd7
I honestly don't really like the std::tuple stuff. It's pretty ugly and a simple struct containing the 3 elements passed as a copy ~by reference~ (edit: I meant like the tuples are passed now) in the same way would be more straightforward.
@kallewoof Agree, a struct is definitely warranted here. Updated.
13 | @@ -14,16 +14,35 @@ 14 | 15 | #include <stdint.h> 16 | #include <string> 17 | +#include <tuple>
Don't think this is needed anymore.
Fixed.
utACK 936322d351ad8ad2b0497bdb0651d5d969944f9f
re-slightly-tested-ACK 2827cf86559a9310ca3fb8b19e03422586a21a2b
Code Review ACK 2827cf8
utACK 2827cf86559a9310ca3fb8b19e03422586a21a2b
Rebased now #20832 is merged.
I've also addressed some comments on the BIP draft itself, see https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2021-January/018362.html.
re-ACK 3fee858d4732c57355ec9d18c66e84ed5d2a5961
re-ACK 3fee858
29 | @@ -27,6 +30,12 @@ const int8_t CHARSET_REV[128] = { 30 | 1, 0, 3, 16, 11, 28, 12, 14, 6, 4, 2, -1, -1, -1, -1, -1 31 | }; 32 | 33 | +/* Determine the final constant to use for the specified encoding. */ 34 | +uint32_t EncodingConstant(Encoding encoding) { 35 | + assert(encoding == Encoding::BECH32 || encoding == Encoding::BECH32M); 36 | + return encoding == Encoding::BECH32 ? 1 : 0x2bc830a3;
Shouldn't 0x2bc830a3 be made a global constant value?
IMO, no. There is no reason why it should be exported from this module.
135 | + return Encoding::INVALID; 136 | } 137 | 138 | /** Create a checksum. */ 139 | -data CreateChecksum(const std::string& hrp, const data& values) 140 | +data CreateChecksum(const std::string& hrp, const data& values, Encoding encoding)
Why here Encoding is added as the last parameter why in https://github.com/bitcoin/bitcoin/pull/20861/files#diff-f146300624c06d2e08aadf500952294148a1785edd6ff2e8b50f13b2c08255edR152 it became first? Is there any reason behind API inconsistency?
Fixed.
A small API inconsistency caught and a global constant proposed
45 | @@ -46,3 +46,4 @@ BIPs that are implemented by Bitcoin Core (up-to-date up to **v0.21.0**): 46 | * [`BIP 325`](https://github.com/bitcoin/bips/blob/master/bip-0325.mediawiki): Signet test network is supported as of **v0.21.0** ([PR 18267](https://github.com/bitcoin/bitcoin/pull/18267)). 47 | * [`BIP 339`](https://github.com/bitcoin/bips/blob/master/bip-0339.mediawiki): Relay of transactions by wtxid is supported as of **v0.21.0** ([PR 18044](https://github.com/bitcoin/bitcoin/pull/18044)). 48 | * [`BIP 340`](https://github.com/bitcoin/bips/blob/master/bip-0340.mediawiki) [`341`](https://github.com/bitcoin/bips/blob/master/bip-0341.mediawiki) [`342`](https://github.com/bitcoin/bips/blob/master/bip-0342.mediawiki): Validation rules for Taproot (including Schnorr signatures and Tapscript leaves) are implemented as of **v0.21.0** ([PR 19953](https://github.com/bitcoin/bitcoin/pull/19953)), without mainnet activation. 49 | +* [`BIP 350`](https://github.com/bitcoin/bips/blob/master/bip-0350.mediawiki): Addresses for native v1+ segregated Witness outputs use Bech32m instead of Bech32 as of **v22.0** ([PR 20861](https://github.com/bitcoin/bitcoin/pull/20861)).
Won't this need to be backported to 0.21.1 with taproot activation?
I have no opinion on that, but even if we do that, that's not part of this PR.
Also note that if 0.21.1 contains Taproot activation logic, it won't contain any wallet support. Being able to send to a Taproot address is probably worth back-porting, but that could also wait for 0.21.2.
485 | @@ -486,7 +486,25 @@ 486 | } 487 | ], 488 | [ 489 | - "bc1pw508d6qejxtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k7grplx", 490 | + "tb1qqqqqp399et2xygdj5xreqhjjvcmzhxw4aywxecjdzew6hylgvsesrxh6hy", 491 | + "0020000000c4a5cad46221b2a187905e5266362b99d5e91c6ce24d165dab93e86433", 492 | + { 493 | + "isPrivkey": false, 494 | + "chain": "test",
Couldn't this be signet as well?
Added a signet copy of this.
535 | - "0020000000c4a5cad46221b2a187905e5266362b99d5e91c6ce24d165dab93e86433", 536 | + "tb1pqqqqp399et2xygdj5xreqhjjvcmzhxw4aywxecjdzew6hylgvsesf3hn0c", 537 | + "5120000000c4a5cad46221b2a187905e5266362b99d5e91c6ce24d165dab93e86433", 538 | { 539 | "isPrivkey": false, 540 | "chain": "test",
ditto signet comment
And this.
Rebased and addressed review comments.
ACK f15513d4184e637eda76eb23e0d6d47d7fb25075
ACK f15513d4184e637eda76eb23e0d6d47d7fb25075
I'm confused how key_io_valid.json and key_io_invalid.json are used. It doesn't seem to be referenced in any of the (functional) tests or CI scripts. If I use the suggested incantation in gen_key_io_test_vectors.py it fails at line 131 with bech32_encode() missing 1 required positional argument: 'spec'.
94 | @@ -95,20 +95,26 @@ CTxDestination DecodeDestination(const std::string& str, const CChainParams& par 95 | error_str = "Invalid prefix for Base58-encoded address"; 96 | } 97 | data.clear(); 98 | - auto bech = bech32::Decode(str); 99 | - if (bech.second.size() > 0) { 100 | + const auto dec = bech32::Decode(str); 101 | + if ((dec.encoding == bech32::Encoding::BECH32 || dec.encoding == bech32::Encoding::BECH32M) && dec.data.size() > 0) {
There might be a Base58 string which is a valid Bech32m with future Witness version, and since Base58 is the first decoder, a future valid Bech32m address may be incorrectly parsed as Base58. So I propose to change their order.
That seems like something for a separate PR.
I don't think there is any concern about that.
The decoder only accepts Base58Check inputs that encode 20 data bytes, which means addresses of at most 35 characters (ceil(log(256)/log(58)20) == 20). A Bech32(m) segwit address of 35 characters (assuming 2+-character HRP) would encode at most (35-6(checksum)-1(separator)-2(hrp)-1(version byte))5 = 125 bits, rounded down to a byte, 120 bits. I don't think anyone is planning on proposing 120-bit addresses for Bitcoin any time soon.
I honestly don't really like the std::tuple stuff. It's pretty ugly and a simple struct containing the 3 elements passed as a copy by reference (edit: I meant like the tuples are passed now) in the same way would be more straightforward. (@kallewoof)
@kallewoof Agree, a struct is definitely warranted here. Updated. (@sipa)
No need to change this again, but in future you may find that structured bindings (since c++17) offer a clean and readable way to avoid the boilerplate of declaring a struct return. Combining with an if init statement:
if (const auto [encoding, hrp, data] = bech32::Decode(str);
(encoding == bech32::Encoding::BECH32 || encoding == bech32::Encoding::BECH32M) && data.size() > 0) {
// [... do stuff ]
} // encoding, hrp and data fall out of scope at end of if block
Agree, though another reason why we may not want to do this: we'll want this PR backported as far as possible.
Makes sense. I'll mark this as resolved.
Also, TIL: you can use structured bindings to unpack a struct, so this works even when the return value is a DecodeResult: const auto [encoding, hrp, d] = bech32::Decode(str);
121 | + } 122 | // The rest of the symbols are converted witness program bytes. 123 | - data.reserve(((bech.second.size() - 1) * 5) / 8); 124 | - if (ConvertBits<5, 8, false>([&](unsigned char c) { data.push_back(c); }, bech.second.begin() + 1, bech.second.end())) { 125 | + data.reserve(((dec.data.size() - 1) * 5) / 8); 126 | + if (ConvertBits<5, 8, false>([&](unsigned char c) { data.push_back(c); }, dec.data.begin() + 1, dec.data.end())) {
in case of Bech32 address error_str will still be set to "Invalid prefix for Base58-encoded address" because it is not cleaned when we return https://github.com/bitcoin/bitcoin/pull/20861/files#diff-a83ac7cb4a9d60b647db6378ccfad815005724e594bf53f49745b0f40701292fR152
So I propose
if (ConvertBits<5, 8, false>([&](unsigned char c) { data.push_back(c); }, dec.data.begin() + 1, dec.data.end())) {
error_str = "";
It's already wiped in line 102, I think?
True, don't know how I missed that
Found few tricky places, which are from the previous implementation
re-ACK 835ff6b
We should make gen_key_io_test_vectors.py deterministic, because right now it's impractical to verify the JSON changes in ea3995aa3cd8f04d7328c95a43c79a5b20fa18a5 and 835ff6b8568291870652ca0d33d934039e7b84a8. All I could do is run it again and see if the tests still pass.
reACK 835ff6b8568291870652ca0d33d934039e7b84a8
30 | + BECH32M, //!< Bech32m encoding as defined in BIP350 31 | +}; 32 | + 33 | +/** Encode a Bech32 or Bech32m string. If hrp contains uppercase characters, this will cause an 34 | + * assertion error. Encoding must be one of BECH32 or BECH32M. */ 35 | +std::string Encode(Encoding, const std::string& hrp, const std::vector<uint8_t>& values);
Makes no difference to the compiled code, but the convention is to give the declaration's parameters names (which match the parameter names in the function definition).
std::string Encode(Encoding encoding, const std::string& hrp, const std::vector<uint8_t>& values);
Done.
0 | @@ -1,4 +1,4 @@ 1 | -// Copyright (c) 2017 Pieter Wuille 2 | +// Copyright (c) 2017, 2021 Pieter Wuille
Perhaps update line 10:
-// For more information, see BIP 173.
+// For more information, see BIPs 173 and 350.
Done.
45 | - assert(data == input); 46 | + for (auto encoding : {bech32::Encoding::BECH32, bech32::Encoding::BECH32M}) { 47 | + const std::string encoded = bech32::Encode(encoding, "bc", input); 48 | + assert(!encoded.empty()); 49 | + const auto r2 = bech32::Decode(encoded); 50 | + if (r2.hrp.empty()) {
Should a round trip encode/decode always result in the same hrp ("bc") here? Am I missing some way that this could legitimately fail?
This already existed in the code before this PR, but it's a good point. The only failure case is when the data is too big to fit in a valid Bech32(m) encoding (max 90 characters).
I've added a check for that, and then dropped the if (r2.hrp.empty()) { branch.
Just a few comments on the first commit so far. Nothing blocking or that needs to be changed.
92 | + "16plkw9", 93 | + "1p2gdwpf" 94 | + }; 95 | + for (const std::string& str : CASES) { 96 | + const auto dec = bech32::Decode(str); 97 | + BOOST_CHECK(dec.encoding != bech32::Encoding::BECH32);
In commit "Add Bech32m test vectors" (f0e15955421870c297d4ddb523431dae711e27d3)
I think this means to check != BECH32M, instead of != BECH32. Alternately could check == INVALID, if it is not a problem to be more strict.
Indeed. Replaced with == INVALID.
61 | @@ -62,7 +62,7 @@ class DestinationEncoder 62 | std::vector<unsigned char> data = {(unsigned char)id.version}; 63 | data.reserve(1 + (id.length * 8 + 4) / 5); 64 | ConvertBits<8, 5, true>([&](unsigned char c) { data.push_back(c); }, id.program, id.program + id.length); 65 | - return bech32::Encode(bech32::Encoding::BECH32, m_params.Bech32HRP(), data); 66 | + return bech32::Encode(bech32::Encoding::BECH32M, m_params.Bech32HRP(), data);
In commit "Use Bech32m encoding for v1+ segwit addresses" (ea3995aa3cd8f04d7328c95a43c79a5b20fa18a5)
Should AddressDescriptor::GetOutputType should now return BECH32M instead of BECH32 for WitnessUnknown addresses to be consistent with this change?
Also, this change seems like the first change that affects user-visible behavior, so it would be great if it added release notes or at least mentioned what the effects are in the commit message. I think the only thing current software can do with V1 addresses is validate them and do watchonly scans for them. So this only affects validate and import RPCs?
Very good point.
We can't return GetOutputType::BECH32M because no such constant exists. It could be added, but I'd rather not do that in this PR, because I don't think it actually matters at this stage. The purpose of output types is so the user can select what kind of address to request from a wallet. You're right that technically a addr(...) descriptor could be imported (but I think that in most cases a ranged descriptor is required, so addr(...) won't even work), and it might matter to classify that as bech32m - but this seems like a very unrealistic edge case.
However, I think this will matter once we want to add actual taproot support to the wallet (#21365, and follow-ups that at one point might want to construct such wallet descriptors by default). I suspect we may want an OutputType::BECH32M then, and permit wallets to have separate -addresstype=bech32 and -addresstype=bech32m active descriptors, as one may know that the sender they want to receive from supports bech32 but not bech32m yet.
I've also added release notes.
ping @achow101: thoughts on OutputType::BECH32M?
I think -addresstype=bech32m makes sense, e.g. getnewaddress bech32m as a way to request a taproot p2pk address. However it might be more future proof to use -addresstype=tap and getnewaddress tap.
Longer term I suspect it might be better to give wallet descriptors a name and then call getnewaddress NAME.
See also #15590 for some earlier discussions around this issue.
However it might be more future proof to use
-addresstype=tapandgetnewaddress tap.
I'm not sure. If Bech32m gets adopted as I hope it does, we won't need a new address type for post-taproot things anymore.
Since bech32m is (and bech32 was) intended to be used for different segwit versions, I think that we should actually move away from having bech32 or bech32m as the address type as with future segwit versions, it becomes ambiguous as to what type of address you want.
For example, suppose we introduce a segwit v2 that, like segwit v1, will encode a pubkey in the address. Now the wallet would have a descriptor for segwit v1, and another for segwit v2. If I wanted a segwit v1 address, what do I provide as the address type? If I provide bech32m, that's ambiguous because both segwit v1 and segwit v2 use bech32m addresses. We would have to come up with another name for segwit v2 addresses, and even then, it is still confusing because technically both are bech32m addresses.
With that in mind, I think we need to use a different and more unique naming scheme. Luckily this is pretty easy because segwit has version numbers, and proposals all have unique names. So for segwit v1 addresses, we could call the address type tap and just not expose bech32m in it.
I have done a similar thing in HWI where segwit v0 is wit rather than bech32, and segwit v1 will be tap.
Looking at it now, I do also think that having bech32 as the address type for segwit v0 was not the right thing to do. It really should have just be wit or something like that that makes it unique to just segwit v0. Since bech32 is now segwit v0 only, it's fine, but in the future, we should not do the same thing.
Now the wallet would have a descriptor for segwit v1, and another for segwit v2. If I wanted a segwit v1 address, what do I provide as the address type?
I think that's the crucial point: why would you want that? If every sender supports both equally, you'd always produce whatever you have a descriptor for that is compatible with that.
If there is a reason for being more flexible, no objection to just using names or something. But then also, would you support having multiple distinct active descriptors that produce the same address type within the same wallet? I think that's the same problem.
If anything, I think we should have done things in the other direction; OutputTypes should have been LEGACY, BIP16, BECH32 or so (and BIP16 should have been both usable for multisig and segwit v0). Because those are the classes senders care about.
I think that's the crucial point: why would you want that? If every sender supports both equally, you'd always produce whatever you have a descriptor for that is compatible with that.
In the ideal world, sure. But as we have seen with segwit, that's not guaranteed. Even though senders were supposed to allow any segwit version with bech32, they didn't. I'm not sure that everyone will understand to do that with bech32m.
The address type option is not just for senders. It's also for receivers and what they are comfortable with receiving. I could conceive of a situation where someone sets up a new wallet but wants to use v1 instead of v2 because, e.g., v2 introduces some new signature algo that has assumptions they aren't comfortable with using. What would they have to do to use v1?
In the ideal world, sure. But as we have seen with segwit, that's not guaranteed. Even though senders were supposed to allow any segwit version with bech32, they didn't. I'm not sure that everyone will understand to do that with bech32m.
Sure, that's why I suggest we may want to add an OutputType::BECH32M - precisely because Bech32 and Bech32m may differ for senders. But p2sh vs. p2sh-p2wpk shouldn't be distinguished, for example.
I could conceive of a situation where someone sets up a new wallet but wants to use v1 instead of v2 because, e.g., v2 introduces some new signature algo that has assumptions they aren't comfortable with using. What would they have to do to use v1?
That's fair, and was a good argument for segwit vs non-segwit in pre-descriptor wallets. But in descriptor wallets if you don't want witness v9 addresses because you're not comfortable with them, simply don't import/create such a descriptor.
Sure, that's why I suggest we may want to add an OutputType::BECH32M - precisely because Bech32 and Bech32m may differ for senders.
Yes, I agree with doing that.
But in descriptor wallets if you don't want witness v9 addresses because you're not comfortable with them, simply don't import/create such a descriptor.
That's reasonable, but I think it's too onerous on users to have to figure out which and how to create the descriptor that they want. I think it's reasonable that user would know about different address types and, at a high level, the consequences of using each type, but also not know (or be afraid of) the descriptor syntax, which derivation paths to use, safely handling xprvs for import, etc. I suppose we could add more options to the wallet creation stuff, but that still means the user has to deal with descriptors directly once they feel comfortable enough to upgrade, or they have to make a wholly new wallet.
If instead the wallet generated v1 and v2 descriptors, then the user could switch to the new address type by choosing the option and it's very simple and easy to do so.
I'm going to mark this as resolved; we've discussed this in today's wallet meeting. It's not exactly clear how to proceed, but in any case, I don't think this is something for this PR. Please respond if you don't agree.
66 | @@ -65,29 +67,39 @@ 67 | ] 68 | # templates for valid bech32 sequences 69 | bech32_templates = [ 70 | - # hrp, version, witprog_size, metadata, output_prefix 71 | - ('bc', 0, 20, (False, 'main', None, True), p2wpkh_prefix), 72 | - ('bc', 0, 32, (False, 'main', None, True), p2wsh_prefix), 73 | - ('bc', 1, 2, (False, 'main', None, True), (OP_1, 2)),
In commit "Use Bech32m encoding for v1+ segwit addresses" (ea3995aa3cd8f04d7328c95a43c79a5b20fa18a5)
I probably just need to stare at this more, but maybe you could summarize the changes here. Like where did the 1,2 and 2,16 cases go?
They're just gone and replaced by better cases. I got rid of (1,2) as (1,32) is a more relevant case (that's taproot) and added (2,2) too. To test a bit larger variety, some other (2, _) were replaced with (3, _) as well.
Partial code review ACK 835ff6b8568291870652ca0d33d934039e7b84a8 for c++/python implementations and unit and functional tests, but I didn't look at closely at test vector changes yet or look at fuzz tests at all
re-utACK 92ec3fcdd2f79949839ce269403396362bea9d54
5 | + being implemented, behavior for all RPCs that accept addresses is changed when 6 | + a native witness version 1 (or higher) is passed. These now require a Bech32m 7 | + encoding instead of a Bech32 one, and Bech32m encoding will be used for such 8 | + addresses in RPC output as well. No version 1 addresses should be created 9 | + for mainnet until consensus rules are adopted that give them meaning 10 | + (e.g. through [BIP 341](https://github.com/bitcoin/bips/blob/master/bip-0350.mediawiki)).
s/350/341/
d1054b98e5ec04cd3cdb487eaaf561926ef6c82d
In d1054b98e5ec04cd3cdb487eaaf561926ef6c82d "Use Bech32m encoding for v1+ segwit addresses"
I think you meant https://github.com/bitcoin/bips/blob/master/bip-0341.mediawiki (341, not 350)
Done.
Already fixed.
35 | - const std::string encoded = bech32::Encode("bc", input); 36 | - assert(!encoded.empty()); 37 | 38 | - const std::pair<std::string, std::vector<uint8_t>> r2 = bech32::Decode(encoded); 39 | - if (r2.first.empty()) { 40 | - assert(r2.second.empty());
How come this scenario is no longer relevant?
The added if condition around it (which checks that the input is within acceptable range).
Code Review ACK 92ec3fc
Code changes look good, I didn't go too deep into the test vector changes, but the changes to the tests themselves look good.
Approach ACK, reviewing.
30 | + BECH32M, //!< Bech32m encoding as defined in BIP350 31 | +}; 32 | + 33 | +/** Encode a Bech32 or Bech32m string. If hrp contains uppercase characters, this will cause an 34 | + * assertion error. Encoding must be one of BECH32 or BECH32M. */ 35 | +std::string Encode(Encoding encoding, const std::string& hrp, const std::vector<uint8_t>& values);
402cbb2 Perhaps move the data type alias to the header file to use it uniformly in bech32.{h, cpp}
<details><summary>example code diff</summary><p>
diff --git a/src/bech32.cpp b/src/bech32.cpp
index 80794ec4f1..60d4d72ca5 100644
--- a/src/bech32.cpp
+++ b/src/bech32.cpp
@@ -13,8 +13,6 @@ namespace bech32
-typedef std::vector<uint8_t> data;
-
/** The Bech32 character set for encoding. */
const char* CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l";
diff --git a/src/bech32.h b/src/bech32.h
index 3679ea8ccb..7a601a7b97 100644
--- a/src/bech32.h
+++ b/src/bech32.h
@@ -16,6 +16,8 @@
#include <string>
#include <vector>
+typedef std::vector<uint8_t> data;
+
namespace bech32
{
@@ -28,16 +30,16 @@ enum class Encoding {
/** Encode a Bech32 or Bech32m string. If hrp contains uppercase characters, this will cause an
* assertion error. Encoding must be one of BECH32 or BECH32M. */
-std::string Encode(Encoding encoding, const std::string& hrp, const std::vector<uint8_t>& values);
+std::string Encode(Encoding encoding, const std::string& hrp, const data& values);
struct DecodeResult
{
- Encoding encoding; //!< What encoding was detected in the result; Encoding::INVALID if failed.
- std::string hrp; //!< The human readable part
- std::vector<uint8_t> data; //!< The payload (excluding checksum)
+ Encoding encoding; //!< What encoding was detected in the result; Encoding::INVALID if failed.
+ std::string hrp; //!< The human readable part
+ data data; //!< The payload (excluding checksum)
DecodeResult() : encoding(Encoding::INVALID) {}
- DecodeResult(Encoding enc, std::string&& h, std::vector<uint8_t>&& d) : encoding(enc), hrp(std::move(h)), data(std::move(d)) {}
+ DecodeResult(Encoding enc, std::string&& h, ::data&& d) : encoding(enc), hrp(std::move(h)), data(std::move(d)) {}
};
</p></details>
I don't think we want to pollute the global namespace with a type called data.
Good point indeed: https://google.github.io/styleguide/cppguide.html#Aliases. "However, local convenience aliases are fine in function definitions, private sections of classes, explicitly marked internal namespaces, and in .cc (.cpp) files."
Perhaps it's worth having a global typedef for this (maybe with a more descriptive name than data), but if so, I think it's something for a different PR.
Closing this.
127 | // if we required that the checksum was 0, it would be the case that appending a 0 to a valid 128 | // list of values would result in a new valid list. For that reason, Bech32 requires the 129 | - // resulting checksum to be 1 instead. 130 | - return PolyMod(Cat(ExpandHRP(hrp), values)) == 1; 131 | + // resulting checksum to be 1 instead. In Bech32m, this constant was amended. 132 | + uint32_t check = PolyMod(Cat(ExpandHRP(hrp), values));
402cbb29516d5bf03a18822af9f1b4b574f319ac pico-nits, check can be const (and braced initialization for type safety)
const uint32_t check{PolyMod(Cat(ExpandHRP(hrp), values))};
Idem for mod on line 140, result on line 195, and version in key_io.cpp::L106
Perhaps this is more personal style. I don't think this is an improvement.
Added const.
7 | + encoding instead of a Bech32 one, and Bech32m encoding will be used for such 8 | + addresses in RPC output as well. No version 1 addresses should be created 9 | + for mainnet until consensus rules are adopted that give them meaning 10 | + (e.g. through [BIP 341](https://github.com/bitcoin/bips/blob/master/bip-0350.mediawiki)). 11 | + Once that happens, Bech32m is expected to be used for them, so this shouldn't 12 | + affect any production system, but may be observed on other networks where such
d1054b98e nit, s/system/systems/ and omit comma
affect any production systems but may be observed on other networks where such
Done.
29 | @@ -27,6 +30,12 @@ const int8_t CHARSET_REV[128] = { 30 | 1, 0, 3, 16, 11, 28, 12, 14, 6, 4, 2, -1, -1, -1, -1, -1 31 | }; 32 | 33 | +/* Determine the final constant to use for the specified encoding. */ 34 | +uint32_t EncodingConstant(Encoding encoding) { 35 | + assert(encoding == Encoding::BECH32 || encoding == Encoding::BECH32M);
Checking for encoding type may be done frequently enough to consider adding helper functions (not necessarily in this PR).
I'm not sure what you're suggesting concretely here.
I think explicitly asserting is purposefully done to give a sort of code-visual cue on where updates are required, after adding additional encodings in the future, so I don't think helper functions would be appropriate here.
Light first-pass ACK 92ec3fcdd2f79949839ce269403396362bea9d54
Mostly looked at the first commit. A few comments, feel free to ignore or address in a follow-up.
24 | +enum class Encoding { 25 | + INVALID, //!< Failed decoding 26 | 27 | -/** Decode a Bech32 string. Returns (hrp, data). Empty hrp means failure. */ 28 | -std::pair<std::string, std::vector<uint8_t>> Decode(const std::string& str); 29 | + BECH32, //!< Bech32 encoding as defined in BIP173
Not necessarily for this PR but we may want to think about renaming this to BECH32Legacy or BECH32v0 or so. In the code right now sometimes "bech32" refers to BIP173 specifically and sometimes to both BIP173 and BIP350. In this line it's the former case, in the namespace it's the latter case. I can see people getting confused by that in the future.
I don't think that's the right solution. The bech32 module is independent of the concept of segwit or bitcoin or addresses or witness versions. If there are other places where the name bech32 is used in a confusing way to include bech32m, feel free to point them out so they can be fixed (in this PR or elsewhere).
Ok, my naming suggestions were just what came to my head at that moment, I did not want to intentionally peg the encoding name to one of its use cases. I just think it's unfortunate that everything including the BIP173 encoding was called "Bech32" and now there is a new encoding called "Bech32m" that is also a Bech32 string. I think it might be helpful to have a naming distinction between Bech32 in general and the BIP173 encoding. But I also hope this code doesn't have to change much in the future and if no one else shares that view, I am happy to have it ignored :)
there is a new encoding called "Bech32m" that is also a Bech32 string.
That's wrong. We should fix places in the code where Bech32m is referred to as Bech32. I know of one, addresstype/OutputType::BECH32, which I'm intentionally ignoring in this to minimize code changes (see earlier discussion), but should be fixed still.
re: #20861 (review)
The name "bech32m" does maybe seem unfortunate. It might have been nice to chose a completely different name ("fatfinger32, now catching more stupid typos!") since from a user perspective you can't use a bech32 address when you need a bech32m address, or vice versa. It's just behind the scenes that they share an extremely similar implementation, and the "human readable" parts can be interpreted the same way.
You are hereby promoted to Official Namer Of Stuff.
Nice, so much easier than my other jobs!
That's wrong. We should fix places in the code where Bech32m is referred to as Bech32.
Ok, re-reading the module now I think this is mostly clear and I was probably bringing in my own preconceived ideas or looking at older code like the comment 2 lines above that was actually removed. I nit-picked through the module and here a few suggestions where this could be clarified further: https://github.com/fjahr/bitcoin/commit/2156afd69e6e5bd587a9d734b21c3e068a037156 but obviously feel free to ignore or keep for a follow-up together with the OutputType maybe.
tested ACK 92ec3fcdd2f79949839ce269403396362bea9d54
Reviewed code after re-reading BIP350 and the Review Club discussion. Then did some manual testing with address imports and some test mutations.
43 | + DecodeResult() : encoding(Encoding::INVALID) {} 44 | + DecodeResult(Encoding enc, std::string&& h, std::vector<uint8_t>&& d) : encoding(enc), hrp(std::move(h)), data(std::move(d)) {} 45 | +}; 46 | + 47 | +/** Decode a Bech32 string. */ 48 | +DecodeResult Decode(const std::string& str);
The caller can't do anything with a DecodeResult where encoding is Encoding::INVALID, so changing this return type to std::optional<DecodeResult> and removing the Encoding::INVALID enum type feels more natural to me.
<details> <summary>Diff</summary>
diff --git a/src/bech32.cpp b/src/bech32.cpp
index 80794ec4f1..2fd84b395b 100644
--- a/src/bech32.cpp
+++ b/src/bech32.cpp
@@ -6,6 +6,7 @@
#include <util/vector.h>
#include <assert.h>
+#include <optional>
namespace bech32
{
@@ -120,7 +121,7 @@ data ExpandHRP(const std::string& hrp)
}
/** Verify a checksum. */
-Encoding VerifyChecksum(const std::string& hrp, const data& values)
+std::optional<Encoding> VerifyChecksum(const std::string& hrp, const data& values)
{
// PolyMod computes what value to xor into the final values to make the checksum 0. However,
// if we required that the checksum was 0, it would be the case that appending a 0 to a valid
@@ -129,7 +130,7 @@ Encoding VerifyChecksum(const std::string& hrp, const data& values)
uint32_t check = PolyMod(Cat(ExpandHRP(hrp), values));
if (check == EncodingConstant(Encoding::BECH32)) return Encoding::BECH32;
if (check == EncodingConstant(Encoding::BECH32M)) return Encoding::BECH32M;
- return Encoding::INVALID;
+ return {};
}
/** Create a checksum. */
@@ -165,7 +166,7 @@ std::string Encode(Encoding encoding, const std::string& hrp, const data& values
}
/** Decode a Bech32 or Bech32m string. */
-DecodeResult Decode(const std::string& str) {
+std::optional<DecodeResult> Decode(const std::string& str) {
bool lower = false, upper = false;
for (size_t i = 0; i < str.size(); ++i) {
unsigned char c = str[i];
@@ -192,9 +193,9 @@ DecodeResult Decode(const std::string& str) {
for (size_t i = 0; i < pos; ++i) {
hrp += LowerCase(str[i]);
}
- Encoding result = VerifyChecksum(hrp, values);
- if (result == Encoding::INVALID) return {};
- return {result, std::move(hrp), data(values.begin(), values.end() - 6)};
+ auto result = VerifyChecksum(hrp, values);
+ if (!result) return {};
+ return {{*result, std::move(hrp), data(values.begin(), values.end() - 6)}};
}
} // namespace bech32
diff --git a/src/bech32.h b/src/bech32.h
index 3679ea8ccb..f2f294bc3f 100644
--- a/src/bech32.h
+++ b/src/bech32.h
@@ -12,6 +12,7 @@
#ifndef BITCOIN_BECH32_H
#define BITCOIN_BECH32_H
+#include <optional>
#include <stdint.h>
#include <string>
#include <vector>
@@ -20,8 +21,6 @@ namespace bech32
{
enum class Encoding {
- INVALID, //!< Failed decoding
-
BECH32, //!< Bech32 encoding as defined in BIP173
BECH32M, //!< Bech32m encoding as defined in BIP350
};
@@ -32,16 +31,15 @@ std::string Encode(Encoding encoding, const std::string& hrp, const std::vector<
struct DecodeResult
{
- Encoding encoding; //!< What encoding was detected in the result; Encoding::INVALID if failed.
+ Encoding encoding; //!< What encoding was detected in the result
std::string hrp; //!< The human readable part
std::vector<uint8_t> data; //!< The payload (excluding checksum)
- DecodeResult() : encoding(Encoding::INVALID) {}
DecodeResult(Encoding enc, std::string&& h, std::vector<uint8_t>&& d) : encoding(enc), hrp(std::move(h)), data(std::move(d)) {}
};
/** Decode a Bech32 string. */
-DecodeResult Decode(const std::string& str);
+std::optional<DecodeResult> Decode(const std::string& str);
} // namespace bech32
diff --git a/src/key_io.cpp b/src/key_io.cpp
index dbcbfa1f29..dbb4711d42 100644
--- a/src/key_io.cpp
+++ b/src/key_io.cpp
@@ -96,25 +96,25 @@ CTxDestination DecodeDestination(const std::string& str, const CChainParams& par
}
data.clear();
const auto dec = bech32::Decode(str);
- if ((dec.encoding == bech32::Encoding::BECH32 || dec.encoding == bech32::Encoding::BECH32M) && dec.data.size() > 0) {
+ if (dec && dec->data.size() > 0) {
// Bech32 decoding
error_str = "";
- if (dec.hrp != params.Bech32HRP()) {
+ if (dec->hrp != params.Bech32HRP()) {
error_str = "Invalid prefix for Bech32 address";
return CNoDestination();
}
- int version = dec.data[0]; // The first 5 bit symbol is the witness version (0-16)
- if (version == 0 && dec.encoding != bech32::Encoding::BECH32) {
+ int version = dec->data[0]; // The first 5 bit symbol is the witness version (0-16)
+ if (version == 0 && dec->encoding != bech32::Encoding::BECH32) {
error_str = "Version 0 witness address must use Bech32 checksum";
return CNoDestination();
}
- if (version != 0 && dec.encoding != bech32::Encoding::BECH32M) {
+ if (version != 0 && dec->encoding != bech32::Encoding::BECH32M) {
error_str = "Version 1+ witness address must use Bech32m checksum";
return CNoDestination();
}
// The rest of the symbols are converted witness program bytes.
- data.reserve(((dec.data.size() - 1) * 5) / 8);
- if (ConvertBits<5, 8, false>([&](unsigned char c) { data.push_back(c); }, dec.data.begin() + 1, dec.data.end())) {
+ data.reserve(((dec->data.size() - 1) * 5) / 8);
+ if (ConvertBits<5, 8, false>([&](unsigned char c) { data.push_back(c); }, dec->data.begin() + 1, dec->data.end())) {
if (version == 0) {
{
WitnessV0KeyHash keyid;
diff --git a/src/test/bech32_tests.cpp b/src/test/bech32_tests.cpp
index 2651e46430..835d3a1c79 100644
--- a/src/test/bech32_tests.cpp
+++ b/src/test/bech32_tests.cpp
@@ -23,8 +23,9 @@ BOOST_AUTO_TEST_CASE(bech32_testvectors_valid)
};
for (const std::string& str : CASES) {
const auto dec = bech32::Decode(str);
- BOOST_CHECK(dec.encoding == bech32::Encoding::BECH32);
- std::string recode = bech32::Encode(bech32::Encoding::BECH32, dec.hrp, dec.data);
+ BOOST_CHECK(dec);
+ BOOST_CHECK(dec->encoding == bech32::Encoding::BECH32);
+ std::string recode = bech32::Encode(bech32::Encoding::BECH32, dec->hrp, dec->data);
BOOST_CHECK(!recode.empty());
BOOST_CHECK(CaseInsensitiveEqual(str, recode));
}
@@ -43,8 +44,9 @@ BOOST_AUTO_TEST_CASE(bech32m_testvectors_valid)
};
for (const std::string& str : CASES) {
const auto dec = bech32::Decode(str);
- BOOST_CHECK(dec.encoding == bech32::Encoding::BECH32M);
- std::string recode = bech32::Encode(bech32::Encoding::BECH32M, dec.hrp, dec.data);
+ BOOST_CHECK(dec);
+ BOOST_CHECK(dec->encoding == bech32::Encoding::BECH32M);
+ std::string recode = bech32::Encode(bech32::Encoding::BECH32M, dec->hrp, dec->data);
BOOST_CHECK(!recode.empty());
BOOST_CHECK(CaseInsensitiveEqual(str, recode));
}
@@ -70,7 +72,7 @@ BOOST_AUTO_TEST_CASE(bech32_testvectors_invalid)
};
for (const std::string& str : CASES) {
const auto dec = bech32::Decode(str);
- BOOST_CHECK(dec.encoding == bech32::Encoding::INVALID);
+ BOOST_CHECK(!dec);
}
}
@@ -94,7 +96,7 @@ BOOST_AUTO_TEST_CASE(bech32m_testvectors_invalid)
};
for (const std::string& str : CASES) {
const auto dec = bech32::Decode(str);
- BOOST_CHECK(dec.encoding == bech32::Encoding::INVALID);
+ BOOST_CHECK(!dec);
}
}
diff --git a/src/test/fuzz/bech32.cpp b/src/test/fuzz/bech32.cpp
index 15a3875828..c794fdafc3 100644
--- a/src/test/fuzz/bech32.cpp
+++ b/src/test/fuzz/bech32.cpp
@@ -17,10 +17,8 @@ FUZZ_TARGET(bech32)
{
const std::string random_string(buffer.begin(), buffer.end());
const auto r1 = bech32::Decode(random_string);
- if (r1.hrp.empty()) {
- assert(r1.data.empty());
- } else {
- const std::string reencoded = bech32::Encode(r1.encoding, r1.hrp, r1.data);
+ if (r1) {
+ const std::string reencoded = bech32::Encode(r1->encoding, r1->hrp, r1->data);
assert(CaseInsensitiveEqual(random_string, reencoded));
}
@@ -33,9 +31,10 @@ FUZZ_TARGET(bech32)
const std::string encoded = bech32::Encode(encoding, "bc", input);
assert(!encoded.empty());
const auto r2 = bech32::Decode(encoded);
- assert(r2.encoding == encoding);
- assert(r2.hrp == "bc");
- assert(r2.data == input);
+ assert(r2);
+ assert(r2->encoding == encoding);
+ assert(r2->hrp == "bc");
+ assert(r2->data == input);
}
}
}
</details>
Note that we've removed the following check from the fuzz tester:
- if (r1.hrp.empty()) {
- assert(r1.data.empty());
That inconsistent state can't exist where DecodeResult only contains a successfully decoded bech32{m} string.
EDIT: I guess you're not using std::optional so it's easier to backport to versions that don't require c++17?
Thanks. Marking resolved.
29 | @@ -27,6 +30,12 @@ const int8_t CHARSET_REV[128] = { 30 | 1, 0, 3, 16, 11, 28, 12, 14, 6, 4, 2, -1, -1, -1, -1, -1 31 | }; 32 | 33 | +/* Determine the final constant to use for the specified encoding. */ 34 | +uint32_t EncodingConstant(Encoding encoding) {
Is there a reason that this isn't simply a std::map<Encoding, uint32_t>?
This can be constexpred to evaluate this in compile time for the VerifyChecksum() case:
constexpr uint32_t EncodingConstant(Encoding encoding) {
Hmm, a map feels like a lot more heavyweight/overkill to me.
Making it constexpr will prevent backporting to pre-C++17 code. It's also unnecessary (constexpr doesn't control whether the compiler can evaluate at compile time as an optimization, only whether you can use it in compile-time expressions).
43 | + return Encoding.BECH32M 44 | + else: 45 | + return None 46 | 47 | -def bech32_create_checksum(hrp, data): 48 | +def bech32_create_checksum(hrp, data, spec):
It's slightly inconsistent that in the c++ code, this is called encoding and is the first argument, and here it's spec and the last argument.
Changed the test_framework code to also use (encoding,hrp,data) everywhere.
33 | @@ -27,38 +34,45 @@ def bech32_hrp_expand(hrp): 34 | 35 | def bech32_verify_checksum(hrp, data): 36 | """Verify a checksum given HRP and converted data characters.""" 37 | - return bech32_polymod(bech32_hrp_expand(hrp) + data) == 1 38 | - 39 | + check = bech32_polymod(bech32_hrp_expand(hrp) + data) 40 | + if check == 1:
Perhaps add constant BECH32_CONST = 1
Done.
ACK 92ec3fcdd2f79949839ce269403396362bea9d54
A few style comments. Definitely don't change the branch to take these changes now, but some things to think about once we don't have to worry about backporting to pre-c++17.
15 | @@ -16,28 +16,26 @@ 16 | FUZZ_TARGET(bech32) 17 | { 18 | const std::string random_string(buffer.begin(), buffer.end()); 19 | - const std::pair<std::string, std::vector<uint8_t>> r1 = bech32::Decode(random_string); 20 | - if (r1.first.empty()) { 21 | - assert(r1.second.empty()); 22 | + const auto r1 = bech32::Decode(random_string); 23 | + if (r1.hrp.empty()) { 24 | + assert(r1.data.empty());
In 402cbb29516d5bf03a18822af9f1b4b574f319ac "Implement Bech32m encoding/decoding"
It would be nice to also check that the encoding is Encoding::INVALID
Done.
3 | @@ -4,5 +4,5 @@ Utilities to generate test vectors for the data-driven Bitcoin tests. 4 | 5 | Usage: 6 | 7 | - PYTHONPATH=../../test/functional/test_framework ./gen_key_io_test_vectors.py valid 50 > ../../src/test/data/key_io_valid.json 8 | - PYTHONPATH=../../test/functional/test_framework ./gen_key_io_test_vectors.py invalid 50 > ../../src/test/data/key_io_invalid.json 9 | + PYTHONPATH=../../test/functional/test_framework ./gen_key_io_test_vectors.py valid 70 > ../../src/test/data/key_io_valid.json 10 | + PYTHONPATH=../../test/functional/test_framework ./gen_key_io_test_vectors.py invalid 70 > ../../src/test/data/key_io_invalid.json
In 92ec3fcdd2f79949839ce269403396362bea9d54 "Add signet support to gen_key_io_test_vectors.py"
There is a similar comment at the top of contrib/testgen/gen_key_io_test_vectors.py which should either be updated or removed. Also that comment says the file is for Base58 addresses when clearly both Base58 and Bech32 things are generated in it.
Done.
Just a few additional comments for some things I saw in further review. Not invalidating my previous ACK and these shouldn't hold up this PR.
Addressed comments. Rebased + made the following changes:
diff --git a/contrib/testgen/gen_key_io_test_vectors.py b/contrib/testgen/gen_key_io_test_vectors.py
index 6b985b8ec2..2f3189eca9 100755
--- a/contrib/testgen/gen_key_io_test_vectors.py
+++ b/contrib/testgen/gen_key_io_test_vectors.py
@@ -3,11 +3,11 @@
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.
'''
-Generate valid and invalid base58 address and private key test vectors.
+Generate valid and invalid base58/bech32(m) address and private key test vectors.
Usage:
- PYTHONPATH=../../test/functional/test_framework ./gen_key_io_test_vectors.py valid 50 > ../../src/test/data/key_io_valid.json
- PYTHONPATH=../../test/functional/test_framework ./gen_key_io_test_vectors.py invalid 50 > ../../src/test/data/key_io_invalid.json
+ PYTHONPATH=../../test/functional/test_framework ./gen_key_io_test_vectors.py valid 70 > ../../src/test/data/key_io_valid.json
+ PYTHONPATH=../../test/functional/test_framework ./gen_key_io_test_vectors.py invalid 70 > ../../src/test/data/key_io_invalid.json
'''
# 2012 Wladimir J. van der Laan
# Released under MIT License
diff --git a/doc/release-notes-20861.md b/doc/release-notes-20861.md
index 2c270f0df5..5c68e4ab0c 100644
--- a/doc/release-notes-20861.md
+++ b/doc/release-notes-20861.md
@@ -7,7 +7,7 @@ Updated RPCs
encoding instead of a Bech32 one, and Bech32m encoding will be used for such
addresses in RPC output as well. No version 1 addresses should be created
for mainnet until consensus rules are adopted that give them meaning
- (e.g. through [BIP 341](https://github.com/bitcoin/bips/blob/master/bip-0350.mediawiki)).
+ (e.g. through [BIP 341](https://github.com/bitcoin/bips/blob/master/bip-0341.mediawiki)).
Once that happens, Bech32m is expected to be used for them, so this shouldn't
- affect any production system, but may be observed on other networks where such
+ affect any production systems, but may be observed on other networks where such
addresses already have meaning (like signet).
diff --git a/src/bech32.cpp b/src/bech32.cpp
index 80794ec4f1..289e0213e8 100644
--- a/src/bech32.cpp
+++ b/src/bech32.cpp
@@ -126,7 +126,7 @@ Encoding VerifyChecksum(const std::string& hrp, const data& values)
// if we required that the checksum was 0, it would be the case that appending a 0 to a valid
// list of values would result in a new valid list. For that reason, Bech32 requires the
// resulting checksum to be 1 instead. In Bech32m, this constant was amended.
- uint32_t check = PolyMod(Cat(ExpandHRP(hrp), values));
+ const uint32_t check = PolyMod(Cat(ExpandHRP(hrp), values));
if (check == EncodingConstant(Encoding::BECH32)) return Encoding::BECH32;
if (check == EncodingConstant(Encoding::BECH32M)) return Encoding::BECH32M;
return Encoding::INVALID;
diff --git a/src/test/fuzz/bech32.cpp b/src/test/fuzz/bech32.cpp
index 15a3875828..63f05c4474 100644
--- a/src/test/fuzz/bech32.cpp
+++ b/src/test/fuzz/bech32.cpp
@@ -19,9 +19,11 @@ FUZZ_TARGET(bech32)
const auto r1 = bech32::Decode(random_string);
if (r1.hrp.empty()) {
assert(r1.data.empty());
+ assert(r1.encoding == Encoding::INVALID);
} else {
const std::string reencoded = bech32::Encode(r1.encoding, r1.hrp, r1.data);
assert(CaseInsensitiveEqual(random_string, reencoded));
+ assert(r1.encoding != Encoding::INVALID);
}
std::vector<unsigned char> input;
diff --git a/test/functional/test_framework/segwit_addr.py b/test/functional/test_framework/segwit_addr.py
index 78b175a69e..3c3d242ad2 100644
--- a/test/functional/test_framework/segwit_addr.py
+++ b/test/functional/test_framework/segwit_addr.py
@@ -7,6 +7,7 @@ import unittest
from enum import Enum
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
+BECH32_CONST = 1
BECH32M_CONST = 0x2bc830a3
class Encoding(Enum):
@@ -35,24 +36,24 @@ def bech32_hrp_expand(hrp):
def bech32_verify_checksum(hrp, data):
"""Verify a checksum given HRP and converted data characters."""
check = bech32_polymod(bech32_hrp_expand(hrp) + data)
- if check == 1:
+ if check == BECH32_CONST:
return Encoding.BECH32
elif check == BECH32M_CONST:
return Encoding.BECH32M
else:
return None
-def bech32_create_checksum(hrp, data, spec):
+def bech32_create_checksum(encoding, hrp, data):
"""Compute the checksum values given HRP and data."""
values = bech32_hrp_expand(hrp) + data
- const = BECH32M_CONST if spec == Encoding.BECH32M else 1
+ const = BECH32M_CONST if encoding == Encoding.BECH32M else 1
polymod = bech32_polymod(values + [0, 0, 0, 0, 0, 0]) ^ const
return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]
-def bech32_encode(hrp, data, spec):
+def bech32_encode(encoding, hrp, data):
"""Compute a Bech32 or Bech32m string given HRP and data values."""
- combined = data + bech32_create_checksum(hrp, data, spec)
+ combined = data + bech32_create_checksum(encoding, hrp, data)
return hrp + '1' + ''.join([CHARSET[d] for d in combined])
@@ -69,10 +70,10 @@ def bech32_decode(bech):
return (None, None, None)
hrp = bech[:pos]
data = [CHARSET.find(x) for x in bech[pos+1:]]
- spec = bech32_verify_checksum(hrp, data)
- if spec is None:
+ encoding = bech32_verify_checksum(hrp, data)
+ if encoding is None:
return (None, None, None)
- return (hrp, data[:-6], spec)
+ return (encoding, hrp, data[:-6])
def convertbits(data, frombits, tobits, pad=True):
@@ -100,7 +101,7 @@ def convertbits(data, frombits, tobits, pad=True):
def decode_segwit_address(hrp, addr):
"""Decode a segwit address."""
- hrpgot, data, spec = bech32_decode(addr)
+ encoding, hrpgot, data = bech32_decode(addr)
if hrpgot != hrp:
return (None, None)
decoded = convertbits(data[1:], 5, 8, False)
@@ -110,15 +111,15 @@ def decode_segwit_address(hrp, addr):
return (None, None)
if data[0] == 0 and len(decoded) != 20 and len(decoded) != 32:
return (None, None)
- if (data[0] == 0 and spec != Encoding.BECH32) or (data[0] != 0 and spec != Encoding.BECH32M):
+ if (data[0] == 0 and encoding != Encoding.BECH32) or (data[0] != 0 and encoding != Encoding.BECH32M):
return (None, None)
return (data[0], decoded)
def encode_segwit_address(hrp, witver, witprog):
"""Encode a segwit address."""
- spec = Encoding.BECH32 if witver == 0 else Encoding.BECH32M
- ret = bech32_encode(hrp, [witver] + convertbits(witprog, 8, 5), spec)
+ encoding = Encoding.BECH32 if witver == 0 else Encoding.BECH32M
+ ret = bech32_encode(encoding, hrp, [witver] + convertbits(witprog, 8, 5))
if decode_segwit_address(hrp, ret) == (None, None):
return None
return ret
re-utACK c85b2fce57a49133ec652a3ec95d67511cde311c
48 | -def bech32_create_checksum(hrp, data): 49 | +def bech32_create_checksum(encoding, hrp, data): 50 | """Compute the checksum values given HRP and data.""" 51 | values = bech32_hrp_expand(hrp) + data 52 | - polymod = bech32_polymod(values + [0, 0, 0, 0, 0, 0]) ^ 1 53 | + const = BECH32M_CONST if encoding == Encoding.BECH32M else 1
const = BECH32M_CONST if encoding == Encoding.BECH32M else BECH32_CONST
Done.
utACK c85b2fce57a49133ec652a3ec95d67511cde311c
Verified range-diff from 92ec3fcdd2
This also includes updates to the Python test framework implementation,
test vectors, and release notes.
utACK 2e7c80fb5be82ad4a3f737cab65b31f70a772a23
re-utACK 2e7c80fb5be82ad4a3f737cab65b31f70a772a23
ACK 2e7c80fb5be82ad4a3f737cab65b31f70a772a23
re-ACK 2e7c80fb5be82ad4a3f737cab65b31f70a772a23
Check range-diff since last review.
utACK 0334602
Apologies for the meta naming nit, but let's call that commit bech32m: naming nits? :-)
utACK 03346022d6
This seems ready for merge (and had 4 ACKs prior to the last push). Can we settle on this and resolve any additional style comments in follow-up PRs?
ACK 0334602
re-ACK 0334602
ACK 03346022d611871f2cc185440b19d928b9264d9d
43 | - } else { 44 | - const std::string& hrp = r2.first; 45 | - const std::vector<uint8_t>& data = r2.second; 46 | - assert(hrp == "bc"); 47 | - assert(data == input); 48 | + if (input.size() + 3 + 6 <= 90) {
nit: Any reason to remove the branch that checks bech32 for larger input sizes? I know this isn't relevant for bitcoin addresses, but there are other applications that use bech32 for encoding and checking it here doesn't cost us anything, no?
Bech32 and Bech32m do not permit encoded strings over 90 characters (the checksum properties break down past that point). BOLT11 uses a relaxed version of the spec that drops this requirement.
147 | }
148 | BECH32_INVALID = {
149 | - '❌_VER15_PROG41': 'bcrt10qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqzc7xyq',
150 | - '❌_VER16_PROB01': 'bcrt1sqqpl9r5c',
151 | + '❌_VER15_PROG41': 'bcrt1sqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqajlxj8',
152 | + '❌_VER16_PROB01': 'bcrt1sqq5r4036',
nit: obviously unrelated to your changes, but the B should be a G. (blame me)
18 | @@ -19,7 +19,7 @@ static void Bech32Encode(benchmark::Bench& bench) 19 | tmp.reserve(1 + 32 * 8 / 5); 20 | ConvertBits<8, 5, true>([&](unsigned char c) { tmp.push_back(c); }, v.begin(), v.end()); 21 | bench.batch(v.size()).unit("byte").run([&] { 22 | - bech32::Encode("bc", tmp); 23 | + bech32::Encode(bech32::Encoding::BECH32, "bc", tmp);
nit: Could add a benchmark for bech32m?
Yeah, could do. The performance should be exactly identical to Bech32 though.
left some minor nits