Follow-ups to Bech32 error detection #23577

pull meshcollider wants to merge 9 commits into bitcoin:master from meshcollider:202111_bech32_followup changing 6 files +233 −323
  1. meshcollider commented at 4:07 AM on November 23, 2021: contributor

    A number of follow-ups and improvements to the bech32 error location code, introduced in #16807.

    Notably, this removes the hardcoded GF1024 tables in favour of constexpr table generation.

  2. Address review comments for Bech32 error validation bb4d3e9b97
  3. Improve Bech32 boost tests 92f0cafdca
  4. Report encoding type in bech32 error message c8b9a224e7
  5. meshcollider added the label RPC/REST/ZMQ on Nov 23, 2021
  6. meshcollider added the label Wallet on Nov 23, 2021
  7. meshcollider commented at 4:08 AM on November 23, 2021: contributor

    @ryanofsky and @sipa, this addresses most of your review comments in #16807 so you may like to take a look, thanks!

  8. laanwj commented at 10:15 AM on November 24, 2021: member

    Notably, this removes the hardcoded GF1024 tables in favour of constexpr table generation.

    Wow, this makes the compiler generate the table? Agree this is better for understandability, just wonder, does this have any noticeable impact on compilation time?

  9. meshcollider commented at 9:26 PM on November 24, 2021: contributor

    @laanwj Wow, this makes the compiler generate the table? Agree this is better for understandability, just wonder, does this have any noticeable impact on compilation time?

    It shouldn't do, each element of the table only takes a few bitwise operations on 16 bit integers and the table is only 1024 elements.

    EDIT: a quick test compiling that file using clang takes 0.24 on master and 0.28 with this PR so it is pretty negligible overall.

  10. MarcoFalke added this to the milestone 23.0 on Nov 25, 2021
  11. laanwj commented at 1:42 PM on November 29, 2021: member

    I checked the assembly output from clang 13.0.0 and gcc 9.3.0 that the tables are indeed a) generated constants in .rodata and b) different from before (as expected and mentioned in the commit description) c) match between compilers.

    	.type	_ZN6bech3212_GLOBAL__N_110GF1024_LOGE,@object # @_ZN6bech3212_GLOBAL__N_110GF1024_LOGE
    	.section	.rodata,"a",@progbits
    	.p2align	1
    _ZN6bech3212_GLOBAL__N_110GF1024_LOGE:
    	.short	65535                           # 0xffff
    	.short	0                               # 0x0
    	.short	99                              # 0x63
    	.short	363                             # 0x16b
    	.short	198                             # 0xc6
    ⋮
    	.type	_ZN6bech3212_GLOBAL__N_110GF1024_EXPE,@object # @_ZN6bech3212_GLOBAL__N_110GF1024_EXPE
    	.p2align	1
    _ZN6bech3212_GLOBAL__N_110GF1024_EXPE:
    	.short	1                               # 0x1
    	.short	32                              # 0x20
    	.short	311                             # 0x137
    	.short	139                             # 0x8b
    	.short	206                             # 0xce
    ⋮
    

    Full values for both tables can be found here: https://0bin.net/paste/Lp6hKmVD#EwTuc1KFIDzy57g0g3EvtQiLIj3HWD4e+OGPWUJTwQ+ (I did not have anything to check them against).

  12. MarcoFalke commented at 1:46 PM on November 29, 2021: member

    (I did not have anything to check them against).

    Looks like the GF1024_EXP you posted differs from the one that is removed in this pull?

  13. sipa commented at 1:55 PM on November 29, 2021: member

    I will review in detail shortly, but I suspect that @meshcollider perhaps included the table changes corresponding to https://github.com/sipa/bech32/pull/64 here? There is Sage code there that generates the tables too, if someone wants something to compare against.

  14. laanwj commented at 1:56 PM on November 29, 2021: member

    Looks like the GF1024_EXP you posted differs from the one that is removed in this pull?

    Both tables are. From the commit:

    Note that the tables generated by this code are different to the previous hardcoded tables, because we simplify to encoding (e) as 1 || 0 rather than 9 || 15 (as was done in PR 64 of the bech32 repo).

    I began this test with the intent to compare if they were the same, then noticed they were not. But it's expected (they use a different encoding).

    Edit: I'll check against @sipa's sage output. Edit2: All values match. The only difference is that the size of GF1024_EXP printed by the Sage script is 1024, while the structure here is 1023 long.

  15. meshcollider commented at 8:52 PM on November 29, 2021: contributor

    I'm happy to split that commit into two, the first deriving the existing table and then the second modifying to the new table, if that would help review?

    The length difference (1024 vs 1023) is just because the sage code also includes (a)^1023 = 1 = (a)^0.

  16. in src/bech32.cpp:38 in 9a505c1acf outdated
     209 | -    452, 710, 552, 128, 612, 600, 275, 322, 193
     210 | -};
     211 | -static_assert(std::size(GF1024_LOG) == 1024, "GF1024_EXP length should be 1024");
     212 | +/** We work with the finite field GF(1024) defined as a degree 2 extension of the base field GF(32)
     213 | + * The defining polynomial of the extension is x^2 + 9x + 23
     214 | + * Let (e) be a primitive element of GF(1024), that is, a generator of the field.
    


    sipa commented at 9:10 PM on November 29, 2021:

    This may be a bit of a philosophical point, touching on difference between specific field and field definition up to isomorphism.

    Still, I don't think it's sufficient (or at least confusing) to say it is a primitive element of GF(1024). It is specifically one of the roots of x^2 + 9x + 23. Things work out well because this is a primitive polynomial, and thus its root in the extension field is indeed a generator, but it's perfectly possible to pick a non-primitive but irreducible polynomial to define the extension field, and then pick a primitive element of that field (which then wouldn't be a root of its modulus).


    sipa commented at 9:16 PM on November 29, 2021:

    How about this?

    Define F=GF(32) as the set of polynomials over GF(2) modulo x^5 + x^3 + 1. We will represent elements of F as their integer encoding (e.g. 11 to mean x^3 + x + 1). Define E=GF(1024) as an extension of F, namely the set of polynomials over F modulo x^2+ 9x + 23. We will represent the element (ax + b) of F as the integer 32*a+b. Let e be the primitive element x of F (represented as 32). Every non-zero element of the field can then ...


    meshcollider commented at 9:20 PM on November 29, 2021:

    This is the definition of a primitive element. From Wikipedia:

    In field theory, a primitive element of a finite field GF(q) is a generator of the multiplicative group of the field. In other words, α ∈ GF(q) is called a primitive element if it is a primitive (q − 1)th root of unity in GF(q); this means that each non-zero element of GF(q) can be written as α^i for some integer i.

    But I agree this is more than just any primitive element, it is specifically because it is a root of the defining polynomial that we decide to use it. I will amend the comments.

  17. in src/bech32.cpp:38 in 98124a6394 outdated
     209 | -    575, 992, 463, 983, 243, 360, 970, 350, 267, 615, 766, 494, 31, 1009,
     210 | -    452, 710, 552, 128, 612, 600, 275, 322, 193
     211 | -};
     212 | -static_assert(std::size(GF1024_LOG) == 1024, "GF1024_EXP length should be 1024");
     213 | +/** We work with the finite field GF(1024) defined as a degree 2 extension of the base field GF(32)
     214 | + * The defining polynomial of the extension is x^2 + 9x + 23.
    


    sipa commented at 9:31 PM on November 29, 2021:

    Thanks, though I still have a nit: you haven't defined an encoding for GF(32) elements, so the number 9 and 23 have no meaning.

  18. in doc/release-notes-16807.md:6 in 98124a6394 outdated
       5 | -invalid characters in the address. For example, this will return the locations of up to two Bech32
       6 | -errors.
       7 | \ No newline at end of file
       8 | +- The `validateaddress` RPC now returns an `error_locations` array for invalid
       9 | +addresses, with the indices of invalid character locations in the address (if
      10 | +known). For example, this will return the locations of up to two Bech32 errors.
    


    sipa commented at 10:29 PM on November 29, 2021:

    Nit: there is really no way to know these are the actual error locations. They're a guess, except under the assumption that no more than 2 substitution errors have been made, in which case they're guaranteed to be correct.

  19. in src/bech32.cpp:46 in 98124a6394 outdated
     217 | + * as (e)^k for some power k.
     218 | + * The array GF1024_EXP contains all these powers of (e) - GF1024_EXP[k] = (e)^k in GF(1024).
     219 | + * Conversely, GF1024_LOG contains the discrete logarithms of these powers, so
     220 | + * GF1024_LOG[GF1024_EXP[k]] == k.
     221 | + * The following function generates the two tables GF1024_EXP and GF1024_LOG as constexprs. */
     222 | +constexpr std::pair<std::array<int16_t, 1023>, std::array<int16_t, 1024>> generate_gf_tables()
    


    sipa commented at 10:32 PM on November 29, 2021:

    Nit: the generate functions here don't really match our coding style.

  20. sipa commented at 10:48 PM on November 29, 2021: member

    ACK 98124a63946d6f7d665864adbef35a863423ca47, with or without nits.

  21. Update release note for bech32 error detection 63f7b69779
  22. in src/bech32.cpp:102 in 98124a6394 outdated
     273 | +
     274 | +    return std::make_pair(GF1024_EXP, GF1024_LOG);
     275 | +}
     276 | +
     277 | +constexpr auto tables = generate_gf_tables();
     278 | +constexpr std::array<int16_t, 1023> GF1024_EXP = tables.first;
    


    sipa commented at 11:06 PM on November 29, 2021:

    I assume this doesn't actually materialize the tables twice, as tables would be an unused symbol in an anonymous namespace. Still, if you'd want to avoid the copying at compile-time (or rule out the risk that it gets emitted twice in the binary), these GF1024_EXP and GF1024_LOG variables could be references (e.g. constexpr const std::array<int16_t, 1023>&).

  23. meshcollider commented at 12:41 AM on November 30, 2021: contributor

    Updated to address all of @sipa's nits, and split the GF1024 constexpr commit into two:

    • the first keeps the table the same, and generates it at compile time.
    • the second modifies the generation to use the simpler encoding from sipa/bech32#64
  24. in src/bech32.cpp:119 in 1e24b1d385 outdated
     290 | +    return std::make_pair(GF1024_EXP, GF1024_LOG);
     291 | +}
     292 | +
     293 | +constexpr auto tables = GenerateGFTables();
     294 | +constexpr const std::array<int16_t, 1023>& GF1024_EXP = tables.first;
     295 | +constexpr const std::array<int16_t, 1024>& GF1024_LOG = tables.second;
    


    laanwj commented at 3:19 PM on November 30, 2021:

    After the recent push I cannot find GF1024_EXP and GF1024_LOG as symbols anymore in the compiled output. I think that makes sense as they're references to tables now, which is fine, just need to adapt my comparison script.

  25. laanwj commented at 3:43 PM on November 30, 2021: member

    Thanks, re-checked:

    • gcc and clang output for tables of 1e24b1d385d64892389985afc058c83410c44311 matches the output of the sage script
    • gcc and clang output for tables of 83dc0b6392a99ea99e2d67d6893d715e48340023 matches GF1024_EXP + GF1024_LOG of the commit before it

    Code review ACK 1e24b1d385d64892389985afc058c83410c44311

  26. in src/bech32.cpp:255 in 1e24b1d385 outdated
     251 | +            SYNDROME_CONSTS[ind] = c;
     252 | +        }
     253 | +    }
     254 | +    return SYNDROME_CONSTS;
     255 | +}
     256 | +constexpr const std::array<uint32_t, 25>& SYNDROME_CONSTS = GenerateSyndromeConstants();
    


    sipa commented at 3:58 PM on November 30, 2021:

    Using a const reference here is counterproductive, it seems: the result appears to be that the reference is constexpr, but the value it is bound to isn't because it is a temporary, so the actual SYNDROME_CONSTS values aren't constexpr (I noticed when trying to add static_assertions on them).


    meshcollider commented at 5:14 PM on November 30, 2021:

    Ah okay, will revert this change then.

  27. sipa commented at 3:59 PM on November 30, 2021: member

    Verified the generated tables by adding the assertions generated by this Sage code:

    # Binary field
    B.<b> = GF(2)
    # Polynomials over the binary field
    BP.<bp> = B[]
    # Encoder field GF(32)
    F.<f> = GF(32, modulus=bp^5 + bp^3 + 1, repr='int')
    # Polynomials over the encoder field
    FP.<fp> = F[]
    # Decoder field GF(1024)
    E.<e> = F.extension(modulus=fp^2 + F.fetch_int(9)*fp + F.fetch_int(23))
    
    # Convert an E field element to an integer 0..1023.
    def gf_to_int(v):
        return v[0].integer_representation() + 32*v[1].integer_representation()
    
    # Convert an integer 0..1023 to an E field element.
    def int_to_gf(v):
        return F.fetch_int(v & 31) + e*F.fetch_int(v >> 5)
    
    GF1024_EXP = [gf_to_int(e**i) for i in range(1023)]
    GF1024_LOG = [-1] + [discrete_log(int_to_gf(i), e, ord=1023, operation="*") for i in range(1, 1024)]
    
    for i in range(1023):
        print("static_assert(GF1024_EXP[%i] == %i);" % (i, GF1024_EXP[i]))
    for i in range(1024):
        print("static_assert(GF1024_LOG[%i] == %i);" % (i, GF1024_LOG[i]))
    
    SYNDROME_CONSTS = [None for _ in range(25)]
    a1, a2, a3 = e**997, e**998, e**999
    for k in range(1, 6):
        for b in range(5):
            c = (gf_to_int(a1**k * int_to_gf(2)**b) |
                 gf_to_int(a2**k * int_to_gf(2)**b) << 10 |
                 gf_to_int(a3**k * int_to_gf(2)**b) << 20)
            SYNDROME_CONSTS[(k-1)*5 + b] = c
    
    for i in range(25):
        print("static_assert(SYNDROME_CONSTS[%i] == %i);" % (i, SYNDROME_CONSTS[i]))
    
  28. Replace GF1024 tables and syndrome constants with compile-time generated constexprs. 14358a029d
  29. Simplify encoding of e in GF(1024) tables to (1,0)
    This follows PR 64 of the sipa/bech32 repo.
    28d9c2857f
  30. Use bounds-checked array lookups in Bech32 error detection code 405c96fc9f
  31. Use std::iota instead of manually pushing range 2fa4fd1961
  32. meshcollider commented at 10:03 PM on November 30, 2021: contributor

    Only change:

    - constexpr const std::array<uint32_t, 25>& SYNDROME_CONSTS = GenerateSyndromeConstants();
    + constexpr std::array<uint32_t, 25> SYNDROME_CONSTS = GenerateSyndromeConstants();
    

    as per @sipa's comment.

  33. sipa commented at 11:12 PM on November 30, 2021: member

    ACK 2fa4fd196176160a5ad0a25da173ff93252b8103, verified the tables by inserting assertions generated by a Sage script.

  34. in src/bech32.cpp:400 in 2fa4fd1961 outdated
     397 | @@ -404,7 +398,8 @@ DecodeResult Decode(const std::string& str) {
     398 |  /** Find index of an incorrect character in a Bech32 string. */
     399 |  std::string LocateErrors(const std::string& str, std::vector<int>& error_locations) {
     400 |      if (str.size() > 90) {
    


    ryanofsky commented at 9:44 PM on December 3, 2021:

    In commit "Use std::iota instead of manually pushing range" (2fa4fd196176160a5ad0a25da173ff93252b8103)

    It would be good to add error_locations.clear() to the top of this function, if the intent is now to overwrite error_locations instead of appending to it. Before this commit, the function consistently appended without overwriting, and now it sometimes overwrites, sometimes appends, which is less consistent.


    meshcollider commented at 11:50 PM on December 3, 2021:

    Done in new commit to avoid invalidating the two ACKs.


    laanwj commented at 8:18 AM on December 5, 2021:

    Sorry to nit the nit, if the intent is to always return something new instead of modifying in place, why not return both values in the return value with e.g. std::pair or std::tuple instead? Personally at least I really prefer using the return value to return values instead of output arguments with references, it conveys the intent better.


    meshcollider commented at 1:27 AM on December 6, 2021:

    @laanwj no worries, I agree. Addressed your nit-nit :)

  35. ryanofsky approved
  36. ryanofsky commented at 9:49 PM on December 3, 2021: member

    Code review ACK 2fa4fd196176160a5ad0a25da173ff93252b8103 with caveat that I didn't try to understand or verify the math here, I just checked that all comments describing the math matched up with the code, and that all the changes seemed like improvements.

  37. Make Bech32 LocateErrors return error list rather than using out-arg a4fe70171b
  38. laanwj commented at 9:41 AM on December 6, 2021: member

    Re-ACK a4fe70171b6fa570eda71d86b59d0fb24c2f0614

  39. laanwj merged this on Dec 6, 2021
  40. laanwj closed this on Dec 6, 2021

  41. meshcollider deleted the branch on Dec 6, 2021
  42. in src/bech32.cpp:399 in a4fe70171b
     395 | @@ -396,23 +396,28 @@ DecodeResult Decode(const std::string& str) {
     396 |  }
     397 |  
     398 |  /** Find index of an incorrect character in a Bech32 string. */
     399 | -std::string LocateErrors(const std::string& str, std::vector<int>& error_locations) {
     400 | +std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) {
    


    MarcoFalke commented at 1:09 PM on December 6, 2021:

    The docstring reads odd in doxygen:

    "Find index of an incorrect character in a Bech32 string. Return the positions of errors in a Bech32 string. "

    See https://doxygen.bitcoincore.org/namespacebech32.html#ad3d7336317a6e6d359b54b2f60b53acc

    Also, std::make_pair is not needed with C++11/17

    diff --git a/src/bech32.cpp b/src/bech32.cpp
    index 3cda1dfff5..abc617467d 100644
    --- a/src/bech32.cpp
    +++ b/src/bech32.cpp
    @@ -111,7 +111,7 @@ constexpr std::pair<std::array<int16_t, 1023>, std::array<int16_t, 1024>> Genera
             GF1024_LOG[v] = i;
         }
     
    -    return std::make_pair(GF1024_EXP, GF1024_LOG);
    +    return {GF1024_EXP, GF1024_LOG};
     }
     
     constexpr auto tables = GenerateGFTables();
    @@ -395,27 +395,27 @@ DecodeResult Decode(const std::string& str) {
         return {result, std::move(hrp), data(values.begin(), values.end() - 6)};
     }
     
    -/** Find index of an incorrect character in a Bech32 string. */
    -std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) {
    +std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str)
    +{
         std::vector<int> error_locations{};
     
         if (str.size() > 90) {
             error_locations.resize(str.size() - 90);
             std::iota(error_locations.begin(), error_locations.end(), 90);
    -        return std::make_pair("Bech32 string too long", std::move(error_locations));
    +        return {"Bech32 string too long", std::move(error_locations)};
         }
     
         if (!CheckCharacters(str, error_locations)){
    -        return std::make_pair("Invalid character or mixed case", std::move(error_locations));
    +        return {"Invalid character or mixed case", std::move(error_locations)};
         }
     
         size_t pos = str.rfind('1');
         if (pos == str.npos) {
    -        return std::make_pair("Missing separator", std::vector<int>{});
    +        return {"Missing separator", std::vector<int>{}};
         }
         if (pos == 0 || pos + 7 > str.size()) {
             error_locations.push_back(pos);
    -        return std::make_pair("Invalid separator position", std::move(error_locations));
    +        return {"Invalid separator position", std::move(error_locations)};
         }
     
         std::string hrp;
    @@ -430,7 +430,7 @@ std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) {
             int8_t rev = CHARSET_REV[c];
             if (rev == -1) {
                 error_locations.push_back(i);
    -            return std::make_pair("Invalid Base 32 character", std::move(error_locations));
    +            return {"Invalid Base 32 character", std::move(error_locations)};
             }
             values[i - pos - 1] = rev;
         }
    @@ -550,7 +550,7 @@ std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) {
                 }
             } else {
                 // No errors
    -            return std::make_pair("", std::vector<int>{});
    +            return {"", std::vector<int>{}};
             }
     
             if (error_locations.empty() || (!possible_errors.empty() && possible_errors.size() < error_locations.size())) {
    @@ -562,7 +562,7 @@ std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str) {
                                   : error_encoding == Encoding::BECH32 ? "Invalid Bech32 checksum"
                                   : "Invalid checksum";
     
    -    return std::make_pair(error_message, std::move(error_locations));
    +    return {error_message, std::move(error_locations)};
     }
     
     } // namespace bech32
    diff --git a/src/bech32.h b/src/bech32.h
    index 5e89e6efda..707299f19e 100644
    --- a/src/bech32.h
    +++ b/src/bech32.h
    @@ -45,7 +45,7 @@ struct DecodeResult
     /** Decode a Bech32 or Bech32m string. */
     DecodeResult Decode(const std::string& str);
     
    -/** Return the positions of errors in a Bech32 string. */
    +/** Return the positions of errors in a Bech32 string. (Empty string if no errors can be found). */
     std::pair<std::string, std::vector<int>> LocateErrors(const std::string& str);
     
     } // namespace bech32
    
  43. MarcoFalke commented at 1:09 PM on December 6, 2021: member

    left a nit

  44. MarcoFalke commented at 1:56 PM on December 7, 2021: member

    More nits:

  45. sidhujag referenced this in commit 67b14190f6 on Dec 7, 2021
  46. RandyMcMillan referenced this in commit f1d0e97733 on Dec 23, 2021
  47. DrahtBot locked this on Dec 7, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me