refactor: Model the bech32 charlimit as an Enum #30047

pull josibake wants to merge 2 commits into bitcoin:master from josibake:model-bech32-limit-as-enum changing 2 files +41 −27
  1. josibake commented at 9:26 am on May 6, 2024: member

    Broken out from #28122


    Bech32(m) was defined with a 90 character limit so that certain guarantees for error detection could be made for segwit addresses (see https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#checksum-design).

    However, there is nothing about the encoding scheme itself that requires a limit of 90 and in practice bech32(m) is being used without the 90 char limit (e.g. lightning invoices, silent payments). Further, increasing the character limit doesn’t do away with error detection, it simply changes the guarantee.

    The primary motivation for this change is for being able to parse BIP352 v0 silent payment addresses (see https://github.com/bitcoin/bitcoin/pull/28122/commits/622c7a98b9f08177a3cfb601306daabb101af1fd), which require up to 118 characters. In addition to BIP352, modeling the character limit as an enum allows us to easily support new address types that use bech32m and specify their own character limit.

  2. DrahtBot commented at 9:27 am on May 6, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK paplorinc, theuni, achow101
    Concept ACK katesalazar

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #29607 (refactor: Reduce memory copying operations in bech32 encoding/decoding by paplorinc)
    • #27260 (Enhanced error messages for invalid network prefix during address parsing. by russeree)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Refactoring on May 6, 2024
  4. in src/bech32.cpp:377 in 1cfe953c26 outdated
    374+DecodeResult Decode(const std::string& str, CharLimit limit) {
    375     std::vector<int> errors;
    376     if (!CheckCharacters(str, errors)) return {};
    377     size_t pos = str.rfind('1');
    378-    if (str.size() > 90 || pos == str.npos || pos == 0 || pos + 7 > str.size()) {
    379+    if (str.size() > limit || pos == str.npos || pos == 0 || pos + 7 > str.size()) {
    


    theuni commented at 6:46 pm on May 6, 2024:
    Aren’t these checks broken if limit is NO_LIMIT (0) ?

    josibake commented at 8:59 am on May 7, 2024:
    :doh: yes! fixed and added unit tests for the NO_LIMIT case.
  5. katesalazar commented at 7:37 pm on May 6, 2024: contributor
    Concept ACK
  6. josibake force-pushed on May 7, 2024
  7. theuni commented at 2:23 pm on May 7, 2024: member
    I think the original bug here illustrates that the enum is not well suited for this. Definitely not with a value of 0 for NO_LIMIT anyway. Got any other ideas? :)
  8. josibake commented at 2:55 pm on May 7, 2024: member

    I think the original bug here illustrates that the enum is not well suited for this. Definitely not with a value of 0 for NO_LIMIT anyway. Got any other ideas? :)

    Yeah, I was getting an icky feeling while updating the PR for 2 reasons: NO_LIMIT of 0 is crappy and also the caller needs to make sure to use the same CharLimit for Decode and again with LocateErrors, which seems foot-gunny.

    My ideal solution is Decode/LocateErrors shouldn’t have a character limit at all (or use 1024, considering BIP173 mentions that bech32 error detection works reasonably well up to 1024 characters). Then, character limits for specific address types (e.g. bc1, sp1) should be enforced in the Destination Encoder/Decoder in src/key_io.cpp.

    IIRC, when I first wrote this code a year ago, there wasn’t much appetite for moving the charlimit out of Decode. If thats still the case, my next best solution would be keep the Enum and remove the NO_LIMIT case.

  9. josibake force-pushed on May 8, 2024
  10. josibake commented at 10:50 am on May 8, 2024: member

    @theuni updated to remove NO_LIMIT. I also looked at a few alternatives to using an Enum but I think given the relatively simple usecase, an Enum works here with the benefit of not requiring an expansive refactor.

    I’d be open to a more expansive refactor if there was a) conceptual buy in and b) additional use cases beyond supporting silent payments addresses (e.g. parsing bolt11/12 bech32 strings in Bitcoin Core, mercury layer bech32 addresses, etc).

  11. theuni commented at 5:06 pm on May 8, 2024: member
    I’m not sure I see the value in using an enum (and forcing it to be passed in) over simply defining a constant? If there’s more than 1 possible value, sure. But for now, why?
  12. josibake commented at 6:30 pm on May 8, 2024: member

    I’m not sure I see the value in using an enum (and forcing it to be passed in) over simply defining a constant? If there’s more than 1 possible value, sure. But for now, why?

    This commit is broken out from #28122 , where we add a new value for silent payment addresses (limit of 1024). Here is the commit that adds a new value for silent payments: https://github.com/bitcoin/bitcoin/commit/622c7a98b9f08177a3cfb601306daabb101af1fd

    My motivation for this PR is to prep for the silent payments PR, trying to break out smaller bits from that PR to make it more reviewable.

  13. theuni approved
  14. theuni commented at 5:21 pm on May 10, 2024: member

    Ok. I spent some time working on a correct-by-construction approach here with a new Bech32Encoded string type which knows its own size limit. Ultimately I don’t think it’s worth the complexity.

    What’s here doesn’t seem ideal to me, but I can’t come up with anything better and it’s simple enough that it’s obviously correct (without NO_LIMIT at least :).

    utACK 696e0a5314296ca064192f580de949d5d9d8f9d4.

  15. in src/bech32.cpp:403 in 696e0a5314 outdated
    397@@ -397,12 +398,12 @@ DecodeResult Decode(const std::string& str) {
    398 }
    399 
    400 /** Find index of an incorrect character in a Bech32 string. */
    401-std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) {
    402+std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str, CharLimit limit) {
    403     std::vector<int> error_locations{};
    404 
    405-    if (str.size() > 90) {
    


    paplorinc commented at 6:52 pm on May 10, 2024:
    I worked around this 90 in my pr in https://github.com/bitcoin/bitcoin/pull/29607/files#diff-f146300624c06d2e08aadf500952294148a1785edd6ff2e8b50f13b2c08255edL315 (though I’m not sure why it’s not replaced here with limit) - I will try to review this in more detail next week

    theuni commented at 7:00 pm on May 10, 2024:
    Ah, can’t believe I missed that. Nice catch!

    theuni commented at 8:33 pm on May 10, 2024:
    ctrl+f “90”, whew that’s the only one. This is exactly why named constants are so important to name. I think it’s fair to call this a nice cleanup (getting rid of the constant) in addition to the new feature.

    josibake commented at 9:33 am on May 11, 2024:
    Oof, also can’t believe I missed this. Thanks for taking a look @paplorinc ! Took a look at your PR and I think this is the most elegant way to solve getting rid of the 90 in ExpandHRP, so I cherry-picked your last commit in (added CHECKSUM_SIZE and modified the commit message to fit this PR a bit more). As a nice side benefit, just this last commit improves the benchmark.

    paplorinc commented at 8:16 pm on May 11, 2024:
    Awesome! I just noticed my email was wrong there, would you be so kind and change it to Lőrinc <pap.lorinc@gmail.com>?

    paplorinc commented at 8:38 pm on August 5, 2024:
    Is this a leftover limit https://github.com/bitcoin/bitcoin/blob/master/src/test/fuzz/bech32.cpp#L32, or it’s fine to hard-code in tests?

    josibake commented at 9:21 am on August 6, 2024:
    Good catch, this is just a leftover limit. Opened #30596 as a follow-up fix.
  16. theuni commented at 8:34 pm on May 10, 2024: member
    ACK retracted until resolving #30047 (review).
  17. josibake force-pushed on May 11, 2024
  18. josibake force-pushed on May 11, 2024
  19. josibake commented at 9:40 am on May 11, 2024: member

    Cherry picked https://github.com/bitcoin/bitcoin/pull/29607/commits/b3b84c3b85c72135ec28904bdb4778024b74bd0d (h/t @paplorinc) to resolve #30047 (review).

    Ran the benchmarks and saw an improvement from the last commit, which is a nice side benefit!

    Before

    0|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    2|                5.15 |      194,058,250.69 |    0.5% |           81.62 |           12.86 |  6.348 |          12.63 |    0.0% |      0.01 | `Bech32Decode`
    3|               11.08 |       90,289,310.89 |    2.3% |          153.69 |           27.64 |  5.559 |          20.00 |    0.1% |      0.01 | `Bech32Encode`
    

    After

    0|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    2|                4.73 |      211,584,025.79 |    0.3% |           67.67 |           11.78 |  5.743 |          10.85 |    0.0% |      0.01 | `Bech32Decode`
    3|               10.44 |       95,767,220.36 |    2.6% |          138.84 |           26.06 |  5.327 |          17.25 |    0.0% |      0.01 | `Bech32Encode`
    
  20. in src/bech32.cpp:382 in 0e431c5a24 outdated
    379     std::vector<int> errors;
    380     if (!CheckCharacters(str, errors)) return {};
    381     size_t pos = str.rfind('1');
    382-    if (str.size() > 90 || pos == str.npos || pos == 0 || pos + 7 > str.size()) {
    383+    if (str.size() > limit) return {};
    384+    if (pos == str.npos || pos == 0 || pos + 7 > str.size()) {
    


    paplorinc commented at 10:35 am on May 12, 2024:

    is this 7 actually a CHECKSUM_SIZE + 1, or completely unrelated?

    0    if (pos == str.npos || pos == 0 || pos + CHECKSUM_SIZE >= str.size()) {
    

    josibake commented at 8:30 am on May 13, 2024:
    Yes, this is checking that “The data part, which is at least 6 characters long” (from BIP173). Changed pos + 7 > to pos + CHECKSUM_SIZE >= throughout.
  21. in src/bech32.cpp:351 in 0e431c5a24 outdated
    348-    enc.resize(enc.size() + 6); // Append 6 zeroes
    349+    auto enc = PreparePolynomialCoefficients(hrp, values, CHECKSUM_SIZE);
    350+    enc.insert(enc.end(), CHECKSUM_SIZE, 0x00);
    351     uint32_t mod = PolyMod(enc) ^ EncodingConstant(encoding); // Determine what to XOR into those 6 zeroes.
    352     data ret(6);
    353     for (size_t i = 0; i < 6; ++i) {
    


    paplorinc commented at 12:02 pm on May 12, 2024:
    we could use CHECKSUM_SIZE constant here as well

    josibake commented at 8:29 am on May 13, 2024:
    Done.
  22. in src/bech32.cpp:350 in 0e431c5a24 outdated
    343@@ -340,8 +344,8 @@ Encoding VerifyChecksum(const std::string& hrp, const data& values)
    344 /** Create a checksum. */
    345 data CreateChecksum(Encoding encoding, const std::string& hrp, const data& values)
    346 {
    347-    data enc = Cat(ExpandHRP(hrp), values);
    348-    enc.resize(enc.size() + 6); // Append 6 zeroes
    349+    auto enc = PreparePolynomialCoefficients(hrp, values, CHECKSUM_SIZE);
    350+    enc.insert(enc.end(), CHECKSUM_SIZE, 0x00);
    351     uint32_t mod = PolyMod(enc) ^ EncodingConstant(encoding); // Determine what to XOR into those 6 zeroes.
    352     data ret(6);
    


    paplorinc commented at 12:02 pm on May 12, 2024:

    instead of preallocation we could do a reserve here:

    0    data ret;
    1    ret.reserve(CHECKSUM_SIZE);
    

    josibake commented at 8:29 am on May 13, 2024:
    Would prefer to keep this PR focused on the limit and CHECKSUM_SIZE change, but perhaps a good change for #29607
  23. in src/bech32.h:36 in db854d5a59 outdated
    27@@ -28,6 +28,14 @@ enum class Encoding {
    28     BECH32M, //!< Bech32m encoding as defined in BIP350
    29 };
    30 
    31+/** Character limits for bech32(m) encoded strings. Character limits are how we provide error location guarantees.
    32+ *  These values should never exceed 2^31 - 1 (max value for a 32-bit int), since there are places where we may need to
    33+ *  convert the CharLimit::VALUE to an int. In practice, this should never happen since this CharLimit applies to an address encoding
    34+ *  and we would never encode an address with such a massive value */
    35+enum CharLimit : size_t {
    36+    SEGWIT = 90,            //!< BIP173/350 imposed 90 character limit on Bech32(m) encoded addresses. This guarantees finding up to 4 errors
    


    sipa commented at 1:20 pm on May 12, 2024:
    I’d prefer to just call this BECH32, and come up with another name for variants that have other limits.

    josibake commented at 8:26 am on May 13, 2024:
    Done.
  24. josibake force-pushed on May 13, 2024
  25. josibake commented at 8:34 am on May 13, 2024: member
    Changed the limit name to BECH32 per @sipa ’s suggestion and updated to use CHECKSUM_SIZE throughout per @paplorinc ’s review.
  26. in src/bech32.h:31 in df20572c87 outdated
    27@@ -28,6 +28,14 @@ enum class Encoding {
    28     BECH32M, //!< Bech32m encoding as defined in BIP350
    29 };
    30 
    31+/** Character limits for bech32(m) encoded strings. Character limits are how we provide error location guarantees.
    


    paplorinc commented at 9:44 am on May 13, 2024:

    nit: to be consistent with the casing below

    0/** Character limits for Bech32(m) encoded strings. Character limits are how we provide error location guarantees.
    

    josibake commented at 10:08 am on May 13, 2024:
    Fixed.
  27. in src/bech32.h:36 in df20572c87 outdated
    27@@ -28,6 +28,14 @@ enum class Encoding {
    28     BECH32M, //!< Bech32m encoding as defined in BIP350
    29 };
    30 
    31+/** Character limits for bech32(m) encoded strings. Character limits are how we provide error location guarantees.
    32+ *  These values should never exceed 2^31 - 1 (max value for a 32-bit int), since there are places where we may need to
    33+ *  convert the CharLimit::VALUE to an int. In practice, this should never happen since this CharLimit applies to an address encoding
    34+ *  and we would never encode an address with such a massive value */
    35+enum CharLimit : size_t {
    36+    BECH32 = 90,            //!< BIP173/350 imposed 90 character limit on Bech32(m) encoded addresses. This guarantees finding up to 4 errors
    


    paplorinc commented at 9:45 am on May 13, 2024:

    nit:

    0    BECH32 = 90,            //!< BIP173/350 imposed a 90 character limit on Bech32(m) encoded addresses. This guarantees finding up to 4 errors.
    

    josibake commented at 9:52 am on May 13, 2024:

    Keeping it present tense feels more correct, perhaps:

    0    BECH32 = 90,            //!< BIP173/350 imposed character limit for Bech32(m) encoded addresses. This guarantees finding up to 4 errors.
    

    ?


    paplorinc commented at 9:55 am on May 13, 2024:
    Even better!

    josibake commented at 10:08 am on May 13, 2024:
    Updated.
  28. paplorinc approved
  29. paplorinc commented at 9:46 am on May 13, 2024: contributor
    nice!
  30. refactor: Model the bech32 charlimit as an Enum
    Bech32(m) was defined with a 90 character limit so that certain
    guarantees for error detection could be made for segwit addresses.
    However, there is nothing about the encoding scheme itself that requires
    a limit and in practice bech32(m) has been used without the 90 char
    limit (e.g. lightning invoices).
    
    Further, increasing the character limit doesn't do away with error
    detection, it simply lessons the guarantees.
    
    Model charlimit as an Enum, so that if a different address scheme is
    using bech32(m), the character limit for that address scheme can be
    used, rather than always using the 90 charlimit defined for segwit
    addresses.
    
    upate comment
    5676aec1e1
  31. josibake force-pushed on May 13, 2024
  32. josibake commented at 10:08 am on May 13, 2024: member
    Updated comment per @paplorinc ’s suggestions.
  33. paplorinc approved
  34. josibake commented at 10:21 am on May 13, 2024: member

    @paplorinc I noticed you approved the PR, but we don’t really use github’s approval/review flow for determining the review status of a PR.

    Can you instead leave an ACK <commit> message per https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#code-review ? This way, intent is clear to the maintainers and DrahtBot can track the status and update the summary comment.

  35. paplorinc commented at 10:32 am on May 13, 2024: contributor
    ACK d196442c015c4cbf490751a650ecb3aa29321442
  36. DrahtBot requested review from theuni on May 13, 2024
  37. josibake commented at 12:44 pm on May 20, 2024: member
  38. in src/bech32.cpp:314 in d196442c01 outdated
    310@@ -308,18 +311,18 @@ bool CheckCharacters(const std::string& str, std::vector<int>& errors)
    311     return errors.empty();
    312 }
    313 
    314-/** Expand a HRP for use in checksum computation. */
    315-data ExpandHRP(const std::string& hrp)
    316+std::vector<unsigned char> PreparePolynomialCoefficients(const std::string& hrp, const data& values, const size_t extra)
    


    theuni commented at 6:12 pm on May 20, 2024:

    To be clear, the only reason for passing extra here is so that it can be reserved?

    If so, that’s not very intuitive and seems kinda like overkill. Seems like always reserving an extra 6 bytes fine, even if they don’t end up getting used?


    paplorinc commented at 6:27 pm on May 20, 2024:
    Yes, it’s either 0 or 6. I agree that we can always add it at the end, amended my original PR: https://github.com/bitcoin/bitcoin/pull/29607/files#diff-f146300624c06d2e08aadf500952294148a1785edd6ff2e8b50f13b2c08255edR317

    josibake commented at 7:17 am on May 21, 2024:
    Updated to remove the extra argument.
  39. DrahtBot requested review from theuni on May 20, 2024
  40. refactor: replace hardcoded numbers
    Replace ExpandHRP with a PreparePolynomialCoefficients function. Instead
    of using a hardcoded value for the size of the array (90 in this case)
    and a hardcoded value for the checksum, use the actual values vector and
    define checksum size as a constexpr. Use the new CHECKSUM_SIZE
    throughout instead 6.
    
    Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
    7f3f6c6dc8
  41. josibake force-pushed on May 21, 2024
  42. paplorinc commented at 8:23 am on May 21, 2024: contributor
    re-ACK 7f3f6c6dc80247e6dfb0d406dc53bc8198f029fd
  43. theuni approved
  44. theuni commented at 7:09 pm on May 21, 2024: member
    utACK 7f3f6c6dc80247e6dfb0d406dc53bc8198f029fd
  45. achow101 commented at 0:21 am on June 5, 2024: member
    ACK 7f3f6c6dc80247e6dfb0d406dc53bc8198f029fd
  46. achow101 merged this on Jun 5, 2024
  47. achow101 closed this on Jun 5, 2024

  48. achow101 referenced this in commit fcc3b653dc on Jun 13, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-21 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me