BIP93: Generalize codex32 format for any hrp and fix typos

BenWestgate commented at 6:36 am on November 22, 2025: contributor

Summary of Changes: Describe codex32 format for arbitrary human-readable parts not just “ms”, specify master seed encoding standard, add new test vectors and enhance readability. This makes the document more like BIP-0173: proposing an encoding “codex32”, then defining a standard for something using it.

See discussion on #2023 (comment).

Spec:

fixed the threshold mistake in the abstract
replaced “master seed” with “secret”, prior to the “Master seed format” section and made descriptions hrp general
updated the checksum reference code to produce valid checksums for any hrp
change t to k to match the test vectors and book
defined “ms” codex32 secrets:
- using terms “secret seed” (as the book does) and “codex32-encoded master seed” to refer to “ms” codex32 secrets
- recommended using first 4 characters of the bech32-encoded fingerprint as the identifier
- recommended the padding bits be set with a CRC code for extra error detection. Provided reference code for this checksum.

Test Vectors:

Fixed the cornucopia of naming conventions in the Test vectors
- used mostly “secret seed”, “codex32 secret”, and “codex32-encoded X”.
Fixed test vector 5 which did not actually append a long checksum to “random” data as the text said it would.
Added vector 6 encoding a “cl” prefix codex32-encoded HSM secret, then relabels the identifier (producing a new checksum and codex32-encoded HSM secret)
Added vector 7 which parses a “cl” prefix codex32 secret and decodes the HSM secret
Clarified why invalid prefix test vectors were bad (their checksum is for “ms” but their prefix is not “ms”)
We might want to add one that uses “cl” with the old “ms” checksum code as that will now fail with the updated ms32_verify_checksum function

Generalize codex32 format for any hrp and fix typos

Clarify codex32 format for different hrp values, specify master seed encoding standard, add new test vectors and enhance readability.

c6f8bd07a6

Revert title for BIP93 document aedb912bd1

jonatack added the label Proposed BIP modification on Nov 22, 2025

jonatack added the label Pending acceptance on Nov 22, 2025

in bip-0093.mediawiki:140 in aedb912bd1

142+guarantees detection of '''any error affecting at most 8 characters'''
143+and has less than a 3 in 10<sup>19</sup> chance of failing to detect more
144+errors. The human-readable part is processed by first
145+feeding the higher bits of each character's US-ASCII value into the
146+checksum calculation followed by a zero and then the lower bits of each<ref>'''Why are the high bits of the human-readable part processed first?'''
147+This results in the actually checksummed data being ''[high hrp] 0 [low hrp] [data]''. This means that under the assumption that errors to the

roconnor commented at 5:18 pm on November 24, 2025:

The lengths limitations of the codex32 strings are working under the assumption that the HRP is not subject to error correction. We more or less cannot do that anyways as all sorts of various bech32 formats have appeared all with different checksums and characteristics. In order to run the checksum algorithm you have to know the prefix first in order to know which checksum algorithm to try.

This isn’t really a problem in practice since there are only a small finite number of prefixes, and from context only a few are going to be applicable anyways.

BenWestgate commented at 1:03 am on November 25, 2025:

This was copied over from BIP-0173. Delete it?

Bech32 attempts to decode two checksums; a universal bech32 decoder could try decoding the string with the bech32, bech32m and codex32 checksums to discover the format. Unless covering the HRP exceeds the max length at HD=9, 2 substitutions in the HRP will always be detected by every format.

If HRP is swapped between formats the chances of false verification is:

1 in 2^65 for a “codex32 checksum” validating when the encoding was Bech32/Bech32m
~1 in 2^30 for “Bech32 checksum” validating when the encoding was Codex32.
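The two odds quoted above follow from checksum size alone if a string checked under the wrong scheme is treated as uniformly random data, which is an assumption of this sketch: a 13-character codex32 checksum carries 65 bits and a 6-character Bech32 checksum carries 30. The helper name is illustrative.

```python
from fractions import Fraction

BITS_PER_CHAR = 5  # each bech32 character encodes 5 bits


def false_accept_probability(checksum_chars: int) -> Fraction:
    """Chance that uniformly random data satisfies a checksum of this many characters."""
    return Fraction(1, 2 ** (BITS_PER_CHAR * checksum_chars))


codex32_short = false_accept_probability(13)  # 13-character codex32 checksum
bech32 = false_accept_probability(6)          # 6-character Bech32/Bech32m checksum
print(codex32_short == Fraction(1, 2**65), bech32 == Fraction(1, 2**30))
```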

roconnor commented at 4:35 pm on November 25, 2025:

This text was certainly the design goal of BIP-173, but we are not using their checksum, and we haven’t realized this part of their design in codex32 in part because our 13 character checksum unfortunately works only on relatively short strings.

Instead we process this HRP in this way because that is what BIP-173 does, and we still want the HRP to change the residue to catch random errors, so we might as well do it in the standard way.

Unless covering the HRP exceeds the max length at HD=9, 2 substitutions in the HRP will always be detected by every format.

The problem is that our particular 13 character checksum’s max length for its error detection and correction properties is limited to 93 bech32 characters. That’s why our payload is limited to 74 characters add in 13 character checksum and 6 characters for the header and we get 93 bech32 characters, with nothing left over to detect or correct errors in the HRP. Yes, in cases where the payload is 72 characters or less, our error correction / detection properties extend to the low 5 bits of the ascii characters of a 2 character prefix, but that doesn’t apply to 73 or 74 character payloads.
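The length budget in this comment can be checked with a few lines of arithmetic. The constant and function names below are illustrative and not taken from BIP-93.

```python
SHORT_CHECKSUM_REACH = 93   # positions covered by the 13-character checksum
HEADER_CHARS = 6            # threshold + 4-character identifier + share index
SHORT_CHECKSUM_CHARS = 13


def spare_positions(payload_chars: int) -> int:
    """Positions left over for HRP values once header, payload and checksum are counted."""
    used = HEADER_CHARS + payload_chars + SHORT_CHECKSUM_CHARS
    return SHORT_CHECKSUM_REACH - used


for payload_chars in (72, 73, 74):
    print(payload_chars, spare_positions(payload_chars))
```

With a 72-character payload two positions remain, enough for the low 5 bits of a 2-character prefix; at 74 characters nothing remains.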

I don’t know if we really want to get into these subtleties. I’m not even sure correcting and detecting errors in the HRP is useful to begin with. If you are a hardware wallet expecting a master seed and someone gives you a “cl” codex32 string, you don’t need a fancy error correction algorithm to detect the “cl” prefix is wrong; if it is expecting a master seed then the “cl” prefix must be wrong.

BenWestgate commented at 5:18 pm on November 25, 2025:

I will delete the footmark and say: “The human-readable part is processed as per BIP-0173.”

I updated the rationale accordingly:

At this length, the human-readable part is not covered by the checksum. This is acceptable because the checksum scheme itself requires you to know that a valid human-readable part is being used in the first place. If the prefix is damaged and a user is guessing that the data might be using this scheme, then the user can enter the available data explicitly using the suspected prefix.

I’m not even sure correcting and detecting errors in the HRP is useful to begin with

wallets.md import guidance prefills the prefix in applications expecting only one to prevent mistakes.

A future application for extended keys (Is long codex32 HD=9 for 74 bytes?) has a situation where decoders need to accept both “xprv” and “xpub” HRPs in the same descriptor. So here it is absolutely useful to detect and correct errors in the HRP.

However we should not go into details about that until such an application needing to disambiguate different HRP actually exists.

roconnor commented at 5:59 pm on November 25, 2025:

What does HD=9 mean?

BenWestgate commented at 6:05 pm on November 25, 2025:

Hamming distance 9, able to detect 8 substitution errors. (and correct 4)
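For any code with minimum Hamming distance d, up to d - 1 substitutions are guaranteed to be detected and up to floor((d - 1) / 2) can be corrected. A small sketch with an illustrative helper name:

```python
def guarantees(hamming_distance: int) -> tuple[int, int]:
    """Return (detectable, correctable) substitution counts for a minimum distance."""
    detectable = hamming_distance - 1
    correctable = (hamming_distance - 1) // 2
    return detectable, correctable


detectable, correctable = guarantees(9)
print(detectable, correctable)
```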

roconnor commented at 6:32 pm on November 25, 2025:

Yeah the 15 character checksum has HD=9 up to a maximum length 1023, supports data up to 1008 = (1023 - 15) characters and a payload of 1002 (=1008 - 6 header characters) bech32 characters. If you want to cover the HRP you lose payload capacity.

BenWestgate commented at 6:54 pm on November 25, 2025:

Plenty of room to long codex32 encode extended keys with covered human-readable parts “xpub”, “xprv”, “tprv” and “tpub” someday. They’re horrible to handle and type in base58. Is this something we should think about in a separate PR?

BenWestgate commented at 9:42 am on November 26, 2025:

I ran the numbers here and we must further restrict HRP lengths to 72 or 74 (same maximum as the payload). At len(hrp) = 74 two header characters aren’t checked but what does a threshold or share index even mean for a 0 byte payload?

I also explicitly stated the maximum length for a codex32 string is 96. Although perhaps at length = 96 we assume the HRP is “ms”, while the max length for all other prefixes is 94?

roconnor commented at 4:24 pm on November 26, 2025:

Since we are excluding the HRP from the error-correction and detection routine, the HRP can be as long as folks want. The HRP just adds a “random” factor to the checksum as a function of the HRP text and the number of characters in the data. And by “random” I really mean deterministic.

The way the error correction algorithm works is that it finds locations of errors starting from the end of the string (i.e. the end of the checksum) going back as far as but no further than 93 characters. So when the hrp-expanded segment + the data segment is longer than 93 characters, the algorithm simply fails to locate errors at any location beyond the last 93 characters.

When the hrp-expanded segment + the data segment is longer than 93 characters the Hamming distance property still holds on the last 93 characters. That is, if you only change the last 93 characters you must change at least 9 characters in order to get another valid string.

BenWestgate commented at 7:03 pm on November 26, 2025:

The HRP is included in the error detection routine. That’s why I’ve restricted the length to 94 characters if hrp != “ms”. In the 96 length case we assume it’s “ms”

Would you like to not have any length restriction and not promise to detect errors in the HRP?

That sounds strictly worse, since we don’t have to support any length of HRP, we can restrict it to ones that do get covered by the 13 character checksum, and that is better if a lot of them proliferate and some future application needs to disambiguate two short checksums.

roconnor commented at 8:59 pm on November 26, 2025:

Would you like to not have any length restriction and not promise to detect errors in the HRP?

That’s what I was thinking. There would just be a 93 data character limit and no correction of the HRP. This is how the limits of BIP-93 were designed.

That sounds strictly worse

I’m struggling to see how error correction is useful for the HRP. You need to already know the HRP in order to know select the codex32 error correction algorithm to begin with. CL wallets only take codex32 strings with the “cl” prefix. Hardware wallets only take codex32 strings beginning with “ms”. If you cannot read the prefix you won’t even know if it is a bip-173 address or a codex32 string or someone else’s algorithm. If you find a piece of paper laying around with “xd10lueasd35kw6r5de5kueedxyesqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqanvrktzhlhus” what are you expecting folks to do with it? Is it a bitcoin address, is it a private key, is it some other coin’s something?

There is a simple error correction algorithm: just try replacing the prefix with each of the prefixes that you care about.

BenWestgate commented at 9:15 pm on November 26, 2025:

Let’s use a descriptor for an example. Say these prefixes get registered “xpub”, “xprv”, “privkey” or whatever.

Now we have an import field that should be able to correct hrp errors.

In case we typed or wrote “xpub” for “xprv” or vice versa.

It would be best to have error detection guarantees (and corrections) for the HRP instead of the 1/(2^65) sort. And to get that, the hrp needs to be covered.

In this case the app can try “xpub”, “xprv” or “privkey” and detect or repair transcription errors in these strings.

BenWestgate commented at 9:53 pm on November 26, 2025:

Why don’t we keep it as is: a maximum length of 96 and then say that HRP’s may further restrict length. For example how “ms” restricts to 16-64 bytes or “cl” 32.

Because the two current applications do not need to disambiguate HRP’s, but future ones may need to, in which case they should shorten the max length to 94 to cover HRP.

roconnor commented at 10:12 pm on November 26, 2025:

Let’s use a descriptor for an example. Say these prefixes get registered “xpub”, “xprv”, “privkey” or whatever.

Now we have an import field that should be able to correct hrp errors.

In case we typed or wrote “xpub” for “xprv” or vice versa.

I agree that for applications that can accept more than one different HRPs in the same context, it makes sense to make sure error detection can cover them and those applications will need their own BIPs to cover their own length restrictions. My point is that no one is mixing up master seeds with addresses.

It would be best to have error detection guarantees (and corrections) for the HRP instead of the 1/(2^65) sort. And to get that, the checksum needs to be covered.

The situation is actually much worse than the 1/(2^65) sort.

ms1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaady97ykdtray0m

has two corrections, either

ms1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaady97ykdtraypy

or

cl1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaady97ykdtray0m

(Edit: yes ‘a’ is an invalid threshold, but you know what I mean. The point is that for the 93 character data part, mistaking the ms prefix for a cl prefix is the same as making a two character error in the checksum.)

The math is that x^93 == 1 modulo our (short) codex32 generator, which means errors at location n are indistinguishable from errors at location n+93.
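The wrap-around can be seen in a toy cyclic code. The sketch below uses the binary polynomial x^3 + x + 1, for which x^7 == 1, purely as a stand-in; it is not the codex32 generator, which works over GF(32) with period 93. The function name is illustrative.

```python
def gf2_mod(value: int, generator: int) -> int:
    """Remainder of value modulo generator, both GF(2) polynomials stored as int bits."""
    gen_degree = generator.bit_length() - 1
    while value and value.bit_length() - 1 >= gen_degree:
        shift = value.bit_length() - 1 - gen_degree
        value ^= generator << shift  # subtract (XOR) a shifted copy of the generator
    return value


GENERATOR = 0b1011  # x^3 + x + 1, a toy stand-in for the real generator
PERIOD = 7          # x^7 == 1 modulo GENERATOR

assert gf2_mod(1 << PERIOD, GENERATOR) == 1

# A single-symbol error at position n leaves the same residue as one at n + PERIOD,
# so a decoder looking only at the residue cannot tell the two locations apart.
for n in range(10):
    assert gf2_mod(1 << n, GENERATOR) == gf2_mod(1 << (n + PERIOD), GENERATOR)
```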

Ugh, I’m starting to think you are right. The total length of strings must be 94 characters or less, with an exception that an ms string can be 96 characters. Even a length restriction of 94 characters isn’t going to be enough unless we also restrict the HRP to use only letters and disallow numbers (we could also accept underscores and perhaps a very small handful of carefully selected special ASCII characters).

And maybe (in a separate PR) we should outlaw unusual sized master seed strings too to get rid of even that “ms” exception if we can agree on a definition of unusual sized.

BenWestgate commented at 10:25 pm on November 26, 2025:

Speaking of those multi-hrp applications, my reviewer on the bip85 codex app is working on one such application (BIP32 encodings) #1958 (comment)

roconnor commented at 11:10 pm on November 26, 2025:

Okay we have a choice of making a complicated rule like “the string length plus the length of the hrp (i.e double counting the hrp) must be less than or equal to 93” or we can have simple rule like “the length of the string must be less than or equal to 94” but also restricting the hrp character set to at most 32 values where the lower 5 bits of the ASCII character are all distinct.

Given how length constrained our 13 character checksum is, I think we should restrict the HRP character set. I’d go for “a-z” and special characters @_[|]~, but the choice of special characters is debatable (but our options are limited). However, we will have to pick some.

We have a choice between @ and `, of which I think @ is preferable

We have a choice between ? and _ , of which I think _ is preferable. In particular I don’t think we want ? as a legal HRP character.

We have a choice between ;, [, and {. We have a choice between <, \, and |. We have a choice between =, ] and }. We have a choice between >, ^, and ~. I don’t have very strong preferences within these options.
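The choices listed above come from characters that share their low 5 bits, the part of each HRP character that bech32_hrp_expand places next to the data. A quick sketch that groups the candidates; the backslash is included on the assumption that it is the third member of the < and | group, since it shares their low bits.

```python
from collections import defaultdict


def low5_groups(chars: str) -> dict[int, list[str]]:
    """Group characters by the low 5 bits of their ASCII code."""
    groups = defaultdict(list)
    for ch in chars:
        groups[ord(ch) & 31].append(ch)
    return dict(groups)


candidates = "@`?_;[{<\\|=]}>^~"
for low_bits, members in sorted(low5_groups(candidates).items()):
    print(low_bits, members)
```

Only one character from each group could be allowed if every permitted HRP character had to have distinct low 5 bits.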

roconnor commented at 11:35 pm on November 26, 2025:

Actually I’m not sure even this restriction is enough. We might need to require hrps only consist of ascii characters between 96 and 126 (or their upper case variants) so that all their upper bits are always 011 in order to be safe.
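A quick check of the range mentioned here, using an illustrative helper: every ASCII code from 96 to 126 has upper bits 011, while digits and characters such as @ or _ do not.

```python
def high_bits(ch: str) -> int:
    """Upper bits of an ASCII character, as used by bech32_hrp_expand (ord >> 5)."""
    return ord(ch) >> 5


# All codes 96..126 (backtick, a-z, {, |, }, ~) share the upper bits 0b011.
assert all(high_bits(chr(code)) == 0b011 for code in range(96, 127))

for ch in "a5@_|":
    print(repr(ch), format(high_bits(ch), "03b"))
```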

roconnor commented at 11:41 pm on November 26, 2025:

@apoelstra already has a prefix, “BIP32_24W” which mixes up letter and numbers in violation of my proposal.

roconnor commented at 11:55 pm on November 26, 2025:

@apoelstra’s BIP32_24W has length-82 strings with a 9 character hrp. 82 + 9 = 91, which is just barely less than 93.

Okay I guess the total length of a string + the length of the hrp (i.e. double counting the hrp) must be less than or equal to 93 should be the rule for using the 13 character checksum.

roconnor commented at 0:16 am on November 27, 2025:

Perhaps a better way of phrasing this is that the maximum length of string (including the HRP) for a given codex scheme is 93 - the length of the HRP.
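One way to see where the double counting comes from, assuming the checksum input is bech32_hrp_expand(hrp) followed by the data characters as in the diff under review: the expansion contributes 2 * len(hrp) + 1 values, while the printed string contributes len(hrp) + 1 characters before the data. The function names below are illustrative.

```python
SHORT_CHECKSUM_REACH = 93  # maximum length covered by the 13-character checksum


def string_length(hrp: str, data_chars: int) -> int:
    """Printed length: hrp, the '1' separator, then the data part (checksum included)."""
    return len(hrp) + 1 + data_chars


def checksummed_length(hrp: str, data_chars: int) -> int:
    """Number of 5-bit values fed to the checksum: [high bits] 0 [low bits] [data]."""
    return 2 * len(hrp) + 1 + data_chars


def fits_short_checksum(hrp: str, data_chars: int) -> bool:
    return checksummed_length(hrp, data_chars) <= SHORT_CHECKSUM_REACH


# The checksummed length is always the printed length plus the hrp length again.
hrp, data_chars = "BIP32_24W", 72  # an 82-character string with a 9-character hrp
assert checksummed_length(hrp, data_chars) == string_length(hrp, data_chars) + len(hrp)
print(string_length(hrp, data_chars), checksummed_length(hrp, data_chars))
```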

BenWestgate commented at 0:24 am on November 27, 2025:

The situation is actually much worse than the 1/(2^65) sort.

Ah yes, I forgot how fragile ECC is when pushed beyond its capacity.

(…The point is that for the 93 character data part, mistaking the ms prefix for a cl prefix is the same as making a two character error in the checksum.)

Well “cl” fortunately has a 32-byte length requirement (and we should probably start talking about length limits as payload bytes unless we want to support pure u5 secrets that do not need conversion to bytes) but we do need to plan for the eventual case of detecting HRP errors at the maximum 13 character checksum length.

Ugh, I’m starting to think you are right. The total length of strings must be 94 characters or less, with an exception that an ms string can be 96 characters.

95 characters has an incomplete group decoding to bytes so I can remove the “95 or” 96.

Even a length restriction of 94 characters isn’t going to be enough unless we also restrict the HRP to use only letters and disallow numbers (we could also accept underscores and perhaps a very small handful of carefully selected special ASCII characters).

What do we need to restrict it to in order to allow every US-ASCII character? Keeping in mind it expands to two 5 bit values but only the upper 2 bits can change so we should have better detection ability than 10 bits per character.

ie 5 US-ASCII characters should be similar to 7 data part characters.

Does that mean 52 US-ASCII character limit on the hrp for short codex32 strings? That sounds fine. Although now we get a maximum length formula like strings can’t exceed (len(hrp) + 4) * 7 // 5 + len(payload) <= 74

We also have restrictions within our own codex32 header that can be exploited to lend slightly more correction power to the human-readable part: 9 out of 32 values for threshold. Threshold 0 only has 1/32 share indices.

That might get it to 53 US-ASCII, I didn’t run the numbers.

And maybe (in a separate PR) we should outlaw unusual sized master seed strings too to get rid of even that “ms” exception if we can agree on a definition of unusual sized.

I gave a very good argument to justify outlawing lengths besides 128, 192, 256, 512, or put another looser way, lengths not divisible by 64 bits. Any closer has input lengths within the insert/delete correctable distance of two valid lengths and causes ambiguity.

This can be an “ms” specific recommendation (or requirement) I don’t think every codex32 application needs to be as strict on lengths if they have a good reason to support variable length encodings. Although it would be worth recommending this 64 bit spacing to all applications in case their developers are unaware of the harm it causes insert/delete correction.

The final concern on this topic is that the maximum hrp length is really a derivative of the maximum checksummed payload length, which is 370 bits. If we spend 70 bits on a 10 US-ASCII HRP, there’s only 300 bits, 60 bech32 characters left for the secret. or a 50 character HRP supports a payload length of 4 bech32 characters (2 bytes).
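The bit accounting in this paragraph, sketched with illustrative names. It assumes each HRP character costs 7 bits of a 370-bit budget, which is the commenter's model; a later reply in this thread argues that counting bits this way does not carry over to BCH length limits.

```python
CHECKSUMMED_BITS = 370   # 74 bech32 characters * 5 bits
BITS_PER_BECH32_CHAR = 5


def payload_chars_left(hrp_chars: int, bits_per_hrp_char: int = 7) -> int:
    """Bech32 payload characters remaining after spending bits on the HRP."""
    remaining_bits = CHECKSUMMED_BITS - bits_per_hrp_char * hrp_chars
    return remaining_bits // BITS_PER_BECH32_CHAR


for hrp_chars in (10, 50):
    print(hrp_chars, payload_chars_left(hrp_chars))
```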

This is making me want to roll the HRP into the CRC, then we get an extra 1-4 bits of error detection on it, which in some cases will extend the permitted HRP length by 1. OTOH for pure secret data recovery purposes, it’s stronger if it just checksums the seed, although it barely matters since it can’t correct any bit errors at the 128 or 256 bitlengths

BenWestgate commented at 0:32 am on November 27, 2025:

Given how length constrained our 13 character checksum is, I think we should restrict the HRP character set. I’d go for “a-z” and special characters @_[|]~, but the choice of special characters is debatable (but our options are limited). However, we will have to pick some.

This is too breaking of BIP-0173 spec, just allow the entire range and accept a shorter string. If an application needs a super long HRP then either we don’t allow it (83 maximum in BIP-0173, although must be lower here for short codex32) or they use the long codex32 checksum which is good for 730 HRP characters.

Do we want to avoid a situation where strings of the same length and different HRP may be using different checksums, or if we don’t care (i don’t care) we can choose the checksum according to combined length:

We pack the bech32_hrp_expand(hrp) + data and then use that to determine checksum_len, that would eliminate our HRP length requirement being smaller than BIP-0173’s, do you agree with this solution?

I’m sure there’s some “hrp insert/delete” correction assumption nightmares for error correcting decoders but applications that need to disambiguate SHOULD at least make all the HRP’s they must consider the same length to avoid this mess.

I’m also fine with treating each HRP character as two bech32 characters for maximum length purposes, even though we can probably do better than that (even using bech32_hrp_expand()) due to each character only being a 7-bit value.

BenWestgate commented at 0:41 am on November 27, 2025:

Okay I guess the total length of a string + the length of the hrp (i.e. double counting the hrp) must be less than or equal to 93 should be the rule for using the 13 character checksum.

I like this. Can we do better than double counting the HRP since each US-ASCII is 7- not 10 bits? Or will our detection guarantees begin to break down (not able to correct any 2 HRP substitutions or detect any 4 HRP errors if its length exceed 37s, with an empty data part)?

I guess the pure 7-bit character translation of our guarantees is: correct any 2 hrp errors, detect any 5 errors, 9 if contiguous for short codex32 (10 if contiguous on long codex32)

Can we achieve this property simply? We should if we can.

roconnor commented at 0:41 am on November 27, 2025:

What do we need to restrict it to in order to allow every US-ASCII character? Keeping in mind it expands to two 5 bit values but only the upper 2 bits can change so we should have better detection ability than 10 bits per character.

The issue is that for BCH codes the data really needs to fit within their length restriction, so we cannot just count entropy. Even if we know some bits are fixed, if the polynomial we extract has degree more than 93, everything falls apart because it can no longer distinguish errors on one side of the polynomial from errors on the other side of the polynomial. The rule that the string length plus counting the hrp again must be less than 93 is the only thing that makes the HRP expanded polynomial concatenated with the data part fit in degree at most 93.

BenWestgate commented at 10:15 pm on November 27, 2025:

If we REQUIRE every registered HRP be unique in the lower 5 bits then:

we don’t have to ever distinguish errors in the high bits they’re a lookup table, VALID_HRP.
the expanded data covered by the checksum will be < 93 single counting hrp characters.
With the valid HRP table, Correcting errors in the low bits, corrects any errors in the high bits.
We know which checksum is being used by the length of the string, which is far simpler than a per hrp (impossible) design or one that double weights hrp characters towards max_length.

roconnor commented at 10:26 pm on November 27, 2025:

With the valid HRP table, Correcting errors in the low bits, corrects any errors in the high bits.

It’s not that simple. For maximum length codex32 string, errors in the high bits appear as errors in the checksum at the end of the string because for BCH codes, any polynomial longer than the maximum length (93 in our case) effectively wraps around.

Edit: And errors in the high bits are not going to be uncommon. Let me tell you the number of times I’ve mistaken a 5 for an S.

BenWestgate commented at 10:41 pm on November 27, 2025:

Then let’s guarantee to detect 8 errors, 13/15 if contiguous in the low bits, BIP-0173 style but not error correct the HRP. Trying different suspected HRPs will have to be the way to correct a damaged prefix, like our rationale suggests doing.

roconnor commented at 11:18 pm on November 27, 2025:

mistaking a 5 for an S in the HRP counts as two errors.
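A short sketch of why this counts double, reusing bech32_hrp_expand from the diff under review: the two characters differ in both their high bits and their low bits, so two expanded values change. The comparison helper is illustrative and assumes both prefixes have the same length.

```python
def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]


def expanded_differences(a: str, b: str) -> int:
    """Count positions where the expanded forms of two equal-length HRPs differ."""
    return sum(x != y for x, y in zip(bech32_hrp_expand(a), bech32_hrp_expand(b)))


print(bech32_hrp_expand("ms"), bech32_hrp_expand("m5"))
print(expanded_differences("ms", "m5"))  # one mistyped character, high and low bits both change
print(expanded_differences("ms", "cl"))  # two characters differ, but only in their low bits
```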

roconnor commented at 11:34 pm on November 27, 2025:

BIP-0173 error detection doesn’t tell you where the errors are. In particular it doesn’t tell whether the HRP is correct or not. You need to invoke the error correction to find locations.

in bip-0093.mediawiki:341 in aedb912bd1

337+** A conversion of the 16-to-64-byte BIP-0032 HD master seed to bech32:
338+*** Start with the bits of the master seed, most significant bit per byte first.
339+*** Re-arrange those bits into groups of 5, and pad with arbitrary bits at the end if needed.
340+*** Translate those bits to characters using the bech32 character table from BIP-0173.
341+
342+When padding bits are needed they should be generated using CRC polynomial <code>(1 << pad_len) | 3</code> with an initial value of <code>0</code> and appended to the master seed bits. Note that unlike the codex32 checksums, we do NOT include the header data.

roconnor commented at 5:25 pm on November 24, 2025:

I don’t really want this CRC stuff in the standard. At most it is a recommendation that folks MAY use to select padding bits, but I doubt it will be useful in practice since you cannot know if any given codex32 master seed was generated with this CRC or with random padding as the codex32 book does. If one’s seed is so corrupted that the codex32 error correction wasn’t able to fix it, I’m skeptical a few more bits will help.

If it is included, then “padding MAY be random” should also be stated. Perhaps it is better to move this to a separate PR if we want to further discuss it.

BenWestgate commented at 4:08 am on November 25, 2025:

I’ll replace it with one sentence for now:

Encoders MAY select padding using a CRC-w where: w = pad_bits_needed, poly = 1 << w | 3, init = 1, const = 1 << w - 1, refIn = false, and refOut = false.

CRC-4 helps 32-byte seeds detect 94% of 1 character errors/deletions and narrows 1 erasure down to 2 candidates. The most common substitutions in CHARSET are 1 bit apart so actual performance will exceed that.
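The padding width follows from regrouping 8-bit bytes into 5-bit characters, which is why a 32-byte seed has 4 padding bits available. The sketch below counts the padding bits and regroups a seed most significant bit first; it pads with zero bits purely for illustration (the spec text allows arbitrary padding) and does not implement the proposed CRC. Function names are illustrative.

```python
def pad_bits_needed(seed_bytes: int) -> int:
    """Bits required to round the seed up to a whole number of 5-bit characters."""
    return -(seed_bytes * 8) % 5


def to_5bit_groups(seed: bytes) -> list[int]:
    """Regroup seed bits, MSB first, into 5-bit values; zero padding is an assumption."""
    bits = "".join(f"{byte:08b}" for byte in seed)
    bits += "0" * (-len(bits) % 5)
    return [int(bits[i:i + 5], 2) for i in range(0, len(bits), 5)]


for seed_bytes in (16, 32, 64):
    groups = to_5bit_groups(bytes(seed_bytes))
    print(seed_bytes, pad_bits_needed(seed_bytes), len(groups))
```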

since you cannot know if any given codex32 master seed was generated with this CRC or with random padding

Finding the book is reasonable suspicion for random padding IF all electronically encoded seeds use CRC. The person who made the backup certainly knows if they encoded with the book or not.

as the codex32 book does

A book insert could compute CRCs by hand, but I have not divided a large enough number to benchmark time for 128-bits, Andrew and I determined the space requirement is two pieces of 11 x 8.5 graph paper.

If it is included, then “padding MAY be random” should also be stated.

Decoders MUST accept random padding, although they may someday warn on it.

For electronic encoders, trusting RNGs introduces risk: it can leak up to 32 bits to an attacker with 8 shares and breaks decode->encode round-trip. Using zero padding round-trips but leaks the final payload character to an attacker with (5-pad_len) / pad_len interpolated shares.

CRC pad minimizes RNG trust by using entropy already present in the payload bytes.

Perhaps it is better to move this to a separate PR if we want to further discuss it.

Let’s discuss on my deterministic codex32 BIP85 PR where it’s required since reviewers asked me to directly encode bytes not u5 ints even for share payloads.

roconnor commented at 4:42 pm on November 25, 2025:

I’d still rather this be in a separate PR and in a separate section on recommendations for deterministic generation.

BenWestgate commented at 4:53 pm on November 25, 2025:

Should it be a PR to this BIP93 document or to BlockstreamResearch/codex32 wallets.md?

roconnor commented at 6:35 pm on November 25, 2025:

Maybe hammering out the details at codex32_wallets.md makes the most sense. Then we can decide if we want to include it in BIP-93, or maybe just make a reference to the codex32_wallets.md.

in bip-0093.mediawiki:119 in aedb912bd1

116 
117-def ms32_verify_checksum(data):
118+def bech32_hrp_expand(s):
119+  return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]
120+
121+def ms32_verify_checksum(hrp, data):

roconnor commented at 5:26 pm on November 24, 2025:

If you want an hrp parameter, you have to rename this function to something like codex32_verify_checksum.

BenWestgate commented at 4:29 am on November 25, 2025:

Will do

in bip-0093.mediawiki:85 in aedb912bd1

84 ** A checksum which consists of 13 bech32 characters as described below.
85 
86 As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase.
87 For presentation, lowercase is usually preferable, but uppercase SHOULD be used for handwritten codex32 strings.
88 If a codex32 string is encoded in a QR code, it SHOULD use the uppercase form, as this is encoded more compactly.
89+The lowercase form is used when determining a character's value for checksum purposes.

roconnor commented at 5:27 pm on November 24, 2025:

This doesn’t make sense. The lowercase form and uppercase form of Bech32 characters have the same value.

BenWestgate commented at 4:13 am on November 25, 2025:

Not for HRP which needs to be lower cased during decoding or bech32_hrp_expand(hrp) would return a different result.
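A minimal illustration of the point, using bech32_hrp_expand exactly as it appears in the diff under review: the low 5 bits of a letter are the same in either case, but the high bits are not, so an uppercase HRP feeds different values into the checksum.

```python
def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]


lower = bech32_hrp_expand("ms")
upper = bech32_hrp_expand("MS")
print(lower)
print(upper)

# Only the high-bit half differs; the low-bit half after the 0 separator is identical.
assert lower[3:] == upper[3:]
assert lower[:2] != upper[:2]
```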

This line is repeated from the test vectors, why explain the rules about case in the vectors instead of up here?

roconnor commented at 4:38 pm on November 25, 2025:

I guess we should reword this to make it more clear that the relevance is for the HRP.

BenWestgate commented at 8:41 pm on November 25, 2025:

“When constructing or verifying a checksum, the human-readable part MUST be interpreted in lowercase, as specified in BIP-0173.”

roconnor commented at 11:19 pm on November 25, 2025:

I might say “MUST be converted to lowercase” instead.

BenWestgate commented at 5:06 am on November 26, 2025:

That seems to imply mutating the string when verifying a checksum.

Something BIP-93 omitted was that encoders should always emit lower-case strings. Did we relax that requirement?

Currently I have the sentence as: “Encoders MUST emit lowercase; decoders MUST reject mixed-case and MUST lowercase the human-readable part during checksum verification.”

And I am adding a section with a codex32_encode and codex32_decode definitions as I think it’s easier to see these rules in code than english.

Uppercase/lowercase

The lowercase form is used when determining a character’s value for checksum purposes.

Encoders MUST always output an all lowercase Bech32 string. If an uppercase version of the encoding result is desired, (e.g.- for presentation purposes, or QR code use), then an uppercasing procedure can be performed external to the encoding process.

Decoders MUST NOT accept strings where some characters are uppercase and some are lowercase (such strings are referred to as mixed case strings).

For presentation, lowercase is usually preferable, but inside QR codes uppercase SHOULD be used, as those permit the use of alphanumeric mode, which is 45% more compact than the normal byte mode.
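The 45% figure can be reproduced from the per-character cost of the two QR modes: alphanumeric mode packs two characters into 11 bits (6 bits for a trailing single character) and accepts uppercase letters and digits, while byte mode spends 8 bits per character. The sketch below ignores mode indicators and character-count fields, and the function name is illustrative.

```python
def qr_data_bits(length: int, mode: str) -> int:
    """Data bits needed for a string of this length, excluding QR header fields."""
    if mode == "alphanumeric":
        return 11 * (length // 2) + 6 * (length % 2)
    if mode == "byte":
        return 8 * length
    raise ValueError(f"unknown mode: {mode}")


length = 48  # example string length, chosen only for illustration
alnum = qr_data_bits(length, "alphanumeric")
byte = qr_data_bits(length, "byte")
print(alnum, byte, round(byte / alnum, 3))
```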

roconnor commented at 3:52 pm on November 26, 2025:

As far as I’m concerned both all lowercase and all uppercase strings are valid, so encoders can produce either format, with lowercase generally preferred. I’m not really sure what BIP-173 thinks it is achieving by talking about encoders being somewhat different from a post-processing step. Maybe they are just trying to say that when creating a checksum, of course, a lowercase HRP must be used.

That seems to imply mutating the string when verifying a checksum.

This is exactly what the BIP-173 reference python decoder does:

https://github.com/sipa/bech32/blob/master/ref/python/segwit_addr.py#L78

However, “The lowercase form is used …” is also fine wording.

BenWestgate commented at 7:08 pm on November 26, 2025:

“Decoders MUST use the lowercase form of the human-readable part during checksum verification.”
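As a minimal sketch of the case rules discussed in this thread (not the reference code from the PR), a decoder could reject mixed-case input first and then do all checksum work on the lowercase form. The function name is hypothetical.

```python
def lowercase_for_checksum(codex: str) -> str:
    """Reject mixed-case input, then return the lowercase form.

    The input string itself is left untouched; only the returned copy
    is used when computing or verifying the checksum.
    """
    if codex != codex.lower() and codex != codex.upper():
        raise ValueError("mixed-case strings are invalid")
    return codex.lower()


print(lowercase_for_checksum("MS12NAMES"))
```

Returning a lowercase copy avoids the concern about mutating the caller's string while still matching what the BIP-173 reference decoder does internally.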

in bip-0093.mediawiki:551 in aedb912bd1

547@@ -481,10 +548,49 @@ The payload contains 103 bech32 characters, which corresponds to 515 bits. The l
548 
549 This is an example of a '''Long codex32 String'''.
550 
551-* Secret share with index <code>S</code>: <code>MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK</code>
552-* Master secret (hex): <code>dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9</code>
553+unchecksummed string (bech32): <code>MS10C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F</code>

roconnor commented at 5:36 pm on November 24, 2025:

I’d be inclined to remove this unchecksummed string. I’m really nervous about displaying strings without a checksum anywhere. They are very problematic.

If you insist on going into this much detail in this test vector I’d say use the following bullets

Master seed (hex):
master node xprv
Payload
HRP
Identifier
Checksum
Secret seed

That’s the order I’d use, but maybe some other permutations are also good.

BenWestgate commented at 4:27 am on November 25, 2025:

How about since the text said:

This example shows generating a new 512-bit master seed using “random” codex32 characters and appending a checksum.

human-readable part: MS
k value: 0
identifier: 0C8V
share index: S
payload: M32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F

checksum: HPV80UNDVARHRAK
secret seed: MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK
Master seed (hex): dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9
master node xprv: xprv9s21ZrQH143K4UYT4rP3TZVKKbmRVmfRqTx9mG2xCy2JYipZbkLV8rwvBXsUbEv9KQiUD7oED1Wyi9evZzUn2rqK9skRgPkNaAzyw3YrpJN

No information is displayed that we did not already display in Vector 1.
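The field breakdown proposed above can be reproduced mechanically. The following is a sketch only: `split_codex32` and `Codex32Parts` are hypothetical names, the caller is assumed to know the checksum length (15 for the long format), and no checksum verification is performed.

```python
from typing import NamedTuple


class Codex32Parts(NamedTuple):
    hrp: str
    threshold: str
    identifier: str
    share_index: str
    payload: str
    checksum: str


def split_codex32(codex: str, checksum_len: int) -> Codex32Parts:
    """Split a codex32 string into its fields without verifying the checksum."""
    pos = codex.rfind("1")  # the last "1" is the separator
    if pos < 1:
        raise ValueError("missing separator or empty human-readable part")
    hrp, data = codex[:pos], codex[pos + 1:]
    if len(data) < 6 + checksum_len:
        raise ValueError("data part too short")
    return Codex32Parts(
        hrp, data[0], data[1:5], data[5],
        data[6:-checksum_len], data[-checksum_len:],
    )


# Test vector 5 as quoted in this thread (long format, 15-character checksum).
vector5 = (
    "MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074"
    "RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK"
)
for name, value in split_codex32(vector5, 15)._asdict().items():
    print(f"{name}: {value}")
```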

roconnor commented at 4:40 pm on November 25, 2025:

That would be better.

BenWestgate commented at 8:46 pm on November 25, 2025:

Would you rather just change V5 text to match master’s vectors?

From:

This example shows generating a new 512-bit master seed using “random” codex32 characters and appending a checksum.

To:

This example shows the long codex32 format, when used without splitting the secret into any shares.

roconnor commented at 9:21 pm on November 25, 2025:

I somewhat prefer the current text.

BenWestgate commented at 7:19 pm on November 26, 2025:

I reverted it

We start given

k value = 0
identifier = 0C8V
payload =

then compute

checksum
secret seed
Master seed
master node xprv

We are able to infer share index = “s” and hrp = “MS” from the text.

FWIW I have been using the term “secret seed” when a codex32 secret is arrived at from bech32 characters (by interpolation or random selection), and the term “codex32-encoded master seed” when it’s produced from bytes.

This is slightly more precise but we need not bother readers with the distinction since both are valid.
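To illustrate the "bytes" side of that distinction, here is a small sketch that turns a payload back into master seed bytes. It assumes the BIP-0173 character set and simply discards the trailing padding bits, since the padding is described elsewhere in this PR as arbitrary. `payload_to_bytes` is a hypothetical helper, not a function from the BIP.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 character set from BIP-0173


def payload_to_bytes(payload: str) -> bytes:
    """Convert a codex32 payload to bytes, dropping up to 4 padding bits."""
    bits = "".join(format(CHARSET.index(c), "05b") for c in payload.lower())
    padding = len(bits) % 8
    if padding > 4:
        raise ValueError("payload length leaves more than 4 padding bits")
    usable = len(bits) - padding
    if usable == 0:
        return b""
    return int(bits[:usable], 2).to_bytes(usable // 8, "big")


# Payload of the recovered secret MS12NAMES6XQGUZTTXKEQNJSJZV4JV3NZ5K3KWGSPHUH6EVW
seed = payload_to_bytes("6XQGUZTTXKEQNJSJZV4JV3NZ5K")
print(seed.hex())  # d1808e096b35b209ca12132b264662a5
```

In this vector the two padding bits are `10`, so a decoder that insisted on zero padding would reject a string that the thread treats as valid.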

in bip-0093.mediawiki:334 in aedb912bd1

330+
331+* The human-readable part "ms" for master seed.
332+* The data-part values:
333+** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
334+** An identifier consisting of 4 bech32 characters.
335+*** We recommend the first 4 characters of the bech32-encoded BIP-0032 key fingerprint.

roconnor commented at 5:53 pm on November 24, 2025:

When some shares of a master seed are compromised, a user may wish to simply dispose of the remaining shares and rederive a new set of secret shares without the cost of sweeping their wallet. In such a case a user very much should use a fresh identifier so that they do not mix up their obsolete share data with their fresh shares.

At best a hardware wallet may suggest such an identifier, but only when the hardware wallet is generating a fresh master seed and thus knows that there are no other secret shares for the same secret floating around.

Again, maybe put this recommendation into a separate PR, perhaps including more details and test vectors.

BenWestgate commented at 6:19 am on November 25, 2025:

This is a good case for BIP85 derive codex32 application to avoid trusting or generating randomness for this.

If we want a default identifier for “reshares” too:

identifier = fingerprint(master_seed)[:2] + fingerprint(false_seed)[2:4]

Where false_seed is recovered from fresh initial shares (reducing k if needed).

During the first generation with k fresh shares; the two slices together produce the full fingerprint. If the identifier is unspecified, recommend this default for master seeds. Or if that collides, the next higher.
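For illustration only, the slicing proposed above might look like the sketch below. It assumes that `fingerprint(...)` in the one-liner means the first four bech32 characters (the top 20 bits) of a 4-byte BIP-0032 fingerprint. Computing the fingerprint itself is out of scope here, so the helpers take raw fingerprint bytes, and the byte values used are arbitrary placeholders rather than real fingerprints. Both function names are hypothetical.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 character set from BIP-0173


def fingerprint_chars(fingerprint: bytes) -> str:
    """First 4 bech32 characters (top 20 bits) of a 4-byte fingerprint."""
    if len(fingerprint) != 4:
        raise ValueError("expected a 4-byte fingerprint")
    top20 = int.from_bytes(fingerprint, "big") >> 12
    return "".join(CHARSET[(top20 >> shift) & 31] for shift in (15, 10, 5, 0))


def default_identifier(seed_fp: bytes, false_seed_fp: bytes) -> str:
    """Combine two fingerprints as in the proposal quoted above."""
    return fingerprint_chars(seed_fp)[:2] + fingerprint_chars(false_seed_fp)[2:4]


# Placeholder bytes, not real fingerprints.
print(default_identifier(bytes.fromhex("d1808e09"), bytes(4)))
```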

Again, maybe put this recommendation into a separate PR, perhaps including more details and test vectors.

Already done here. Agreed wrt details and vectors.

https://github.com/BenWestgate/bips/blob/a8f8e98d05a2183aba395f8f8ff479b4fb764f95/bip-0085.mediawiki#unshared-secret

#1958

I can move this recommendation (and the padding rec) into a separate BIP93 PR pending the final BIP85 codex32 design (needs finishing touches feedback). I’ll remove it from this PR so it remains focused on typos and general HRP support.

in bip-0093.mediawiki:484 in aedb912bd1

482-* Master secret (hex): <code>d1808e096b35b209ca12132b264662a5</code>
483+* Recovered secret seed with index <code>S</code>: <code>MS12NAMES6XQGUZTTXKEQNJSJZV4JV3NZ5K3KWGSPHUH6EVW</code>
484+* Master seed (hex): <code>d1808e096b35b209ca12132b264662a5</code>
485 * master node xprv: <code>xprv9s21ZrQH143K2NkobdHxXeyFDqE44nJYvzLFtsriatJNWMNKznGoGgW5UMTL4fyWtajnMYb5gEc2CgaKhmsKeskoi9eTimpRv2N11THhPTU</code>
486 
487 Note that per BIP-0173, the lowercase form is used when determining a character's value for checksum purposes.

BenWestgate commented at 6:24 pm on November 25, 2025:

@roconnor why do we have spec notes in the test vectors?

This was what I was trying to make unnecessary with the earlier sentence about case and checksum.

roconnor commented at 6:44 pm on November 25, 2025:

I don’t know. Before we were using a fixed constant for the “ms” prefix, so this text wasn’t really necessary. Now that we want to support more general HRPs, I agree that we need some wording like this somewhere.

BenWestgate commented at 8:49 pm on November 25, 2025:

I removed it and added wording like this to the end of the codex32 Spec section.

Rename ms32 functions to codex32, remove recommendations, clarify HRP case in checksum a4f1e91ad9

in bip-0093.mediawiki:485 in aedb912bd1

483+* Recovered secret seed with index <code>S</code>: <code>MS12NAMES6XQGUZTTXKEQNJSJZV4JV3NZ5K3KWGSPHUH6EVW</code>
484+* Master seed (hex): <code>d1808e096b35b209ca12132b264662a5</code>
485 * master node xprv: <code>xprv9s21ZrQH143K2NkobdHxXeyFDqE44nJYvzLFtsriatJNWMNKznGoGgW5UMTL4fyWtajnMYb5gEc2CgaKhmsKeskoi9eTimpRv2N11THhPTU</code>
486 
487 Note that per BIP-0173, the lowercase form is used when determining a character's value for checksum purposes.
488 In particular, given an all uppercase codex32 string, we still use lowercase <code>ms</code> as the human-readable part during checksum construction.

BenWestgate commented at 6:42 pm on November 25, 2025:

If I add this to spec: “When constructing or verifying a checksum, the human-readable part MUST be interpreted in lowercase, as specified in BIP-0173.”

Then we can remove this.

in bip-0093.mediawiki:63 in a4f1e91ad9

57@@ -59,67 +58,84 @@ However, BIP-0039 has no error-correcting ability, cannot sensibly be extended t
58 
59 ==Specification==
60 
61+We first describe the general checksummed base32<ref>'''Why use base32 at all?''' The lack of mixed case makes it more
62+efficient to read out loud, write, type or to put into QR codes.</ref> format called
63+''codex32'' and then define the BIP-0032 master seed encoding using it.

BenWestgate commented at 10:10 pm on November 25, 2025:

Should it be:

and then define a BIP-0032 master seed encoding using it.

? Because this is only one encoding of master seeds, SLIP-39 and WIF are others, we should use “a” not “the”.

in bip-0093.mediawiki:70 in a4f1e91ad9

68 It reuses the base-32 character set from BIP-0173, and consists of:
69-
70-* A human-readable part, which is the string "ms" (or "MS").
71-* A separator, which is always "1".
72+* A human-readable part, which is intended to convey the type of data, or anything else that is relevant to the reader. This part MUST contain 1 to 83 US-ASCII characters, with each character having a value in the range [33-126]. HRP validity may be further restricted by specific applications.
73+* A separator, which is always "1". In case "1" is allowed inside the human-readable part, the last one in the string is the separator<ref>'''Why include a separator in codex32 strings?''' That way the human-readable

BenWestgate commented at 10:16 pm on November 25, 2025:

Do we need to be this wordy or should we say:

A human-readable part, as specified in BIP-0173, which is intended to convey the type of data, or anything else that is relevant to the reader.
A separator, as specified in BIP-0173, which is always “1”.
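Whichever wording is chosen, the rules in the bullets above are easy to state in code. This is a sketch under the constraints quoted in the diff (HRP of 1 to 83 characters in the US-ASCII range 33-126, last "1" is the separator); `split_hrp` is a hypothetical helper and does not validate the data part.

```python
def split_hrp(codex: str) -> tuple[str, str]:
    """Split a string into (human-readable part, data part) at the last "1"."""
    if any(ord(c) < 33 or ord(c) > 126 for c in codex):
        raise ValueError("character outside the US-ASCII range 33-126")
    pos = codex.rfind("1")  # a "1" inside the HRP is allowed; the last one separates
    if pos < 1 or pos > 83:
        raise ValueError("human-readable part must be 1 to 83 characters")
    return codex[:pos], codex[pos + 1:]


print(split_hrp("a1b1cde"))
```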

in bip-0093.mediawiki:225 in a4f1e91ad9

225+If we already have ''k'' valid codex32 strings such that:
226 
227-* All strings have the same threshold value ''t'', the same identifier, and the same length
228-* All of the share index values are distinct
229+* All strings have the same human-readable part, the same threshold value ''k'', the same identifier, and the same length.
230+* All of the share index values are distinct.

BenWestgate commented at 10:19 pm on November 25, 2025:

remove the periods I inadvertently added to these bullets?
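The two bullets in this hunk amount to a precondition check before recovery. A sketch of that check follows, assuming every string has already passed checksum verification. `shares_are_compatible` is a hypothetical name and the sample strings are placeholders with no valid checksum.

```python
def shares_are_compatible(shares: list[str]) -> bool:
    """Check the recovery preconditions on a list of codex32 strings.

    All strings must share the same HRP, threshold, identifier and length,
    all share indices must be distinct, and at least k strings are needed.
    """
    headers, indices = set(), set()
    for share in shares:
        s = share.lower()
        pos = s.rfind("1")
        if pos < 1 or len(s) < pos + 7:
            return False
        data = s[pos + 1:]
        headers.add((s[:pos], data[0], data[1:5], len(s)))
        indices.add(data[5])
    if len(headers) != 1 or len(indices) != len(shares):
        return False
    threshold = next(iter(headers))[1]
    return threshold.isdigit() and len(shares) >= int(threshold) >= 2


# Placeholder shares: correct layout, but the checksums are NOT valid.
share_a = "ms12namea" + "q" * 39
share_c = "ms12namec" + "q" * 39
print(shares_are_compatible([share_a, share_c]))
```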

in bip-0093.mediawiki:262 in a4f1e91ad9

278-# Choose a threshold value ''t'' between 2 and 9, inclusive
279+# Choose a human-readable part according to application (Use "ms" for BIP-0032 master seeds)
280+# Choose a threshold value ''k'' between 2 and 9, inclusive
281 # Choose a 4 bech32 character identifier
282-#* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every master seed the user may need to disambiguate.
283+#* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every secret the user may need to disambiguate

BenWestgate commented at 10:25 pm on November 25, 2025:

Should we say “secret or set of shares” because this is the reshare case you mentioned that SHOULD have a unique identifier? Here we make it sound like it’s OK to reuse an identifier if the secret is the same which is false.

roconnor commented at 10:47 pm on November 25, 2025:

Yes, set of shares.

BenWestgate commented at 7:26 pm on November 26, 2025:

I used “set of shares” in the “for an existing secret” case and “secret” in the “for a fresh secret” case.

This is technically correct: there is no need to say both “secret and set of shares” for an existing secret. If you follow that process you always get a fresh set of shares, and that is what needs to be uniquely identified, not the secret per se.

Fix Test vector 5, add encode/decode ref, add length limit, add clairity

Clarify codex32 specification and examples for encoding and decoding processes, including detailed explanations of parameters and checksum handling.

f74527ed4f

Revert deleted new line 3123cead1d

in bip-0093.mediawiki:264 in a4f1e91ad9

282-#* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every master seed the user may need to disambiguate.
283+#* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every secret the user may need to disambiguate
284 # Set the share index to <code>s</code>
285-# Set the payload to a bech32 encoding of the master seed, padded with arbitrary bits
286-# Generating a valid checksum in accordance with the Checksum section
287+# Set the payload to a bech32 encoding of the secret, padded with arbitrary bits

BenWestgate commented at 10:30 pm on November 25, 2025:

Change this to “secret data” or “secret bytes” to make clear that it is bytes being encoded, or leave it? There may be some confusion between “secret” meaning the string with share index “s” and the decoded payload bytes of that [codex32] secret.

roconnor commented at 10:50 pm on November 25, 2025:

“secret data” sounds good.

in bip-0093.mediawiki:401 in 3123cead1d

397@@ -323,10 +398,10 @@ While we could use the 15 character checksum for both cases, we prefer to keep t
398 We only guarantee to correct 4 characters no matter how long the string is.
399 Longer strings mean more chances for transcription errors, so shorter strings are better.
400 
401-The longest data part using the regular 13 character checksum is 93 characters and corresponds to a 400-bit secret.
402+The longest data part using the regular 13 character checksum is 93 characters and corresponds to a 368-bit secret.

BenWestgate commented at 7:44 pm on November 26, 2025:

The original forgot to subtract the 6 character header from the secret payload bits.

I used 368 instead of 370 because 2 of them are padding and not secret data.

Is this another one of those places we should say “secret data” to avoid ambiguity between the “s” string and bytes?

roconnor commented at 9:05 pm on November 26, 2025:

I think it is fine as is, since the “368-bit” makes it clear it is data, but if you want to change the wording, that would also be fine.
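The corrected figure can be derived as follows. This is a sketch of the arithmetic described in the comments above, with a hypothetical function name: subtract the 6-character header and the checksum from the data part, convert characters to bits, and round down to whole bytes.

```python
HEADER_CHARS = 6  # threshold (1) + identifier (4) + share index (1)


def max_secret_bits(data_chars: int, checksum_chars: int) -> int:
    """Largest whole-byte secret, in bits, that fits in a data part."""
    payload_bits = 5 * (data_chars - HEADER_CHARS - checksum_chars)
    return payload_bits - payload_bits % 8  # drop the padding bits


# 93 - 6 - 13 = 74 payload characters -> 370 bits -> 368 bits of secret data
print(max_secret_bits(93, 13))  # 368
```

Leaving out the header gives 5 * (93 - 13) = 400, which is where the original number came from.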

in bip-0093.mediawiki:402 in 3123cead1d

397@@ -323,10 +398,10 @@ While we could use the 15 character checksum for both cases, we prefer to keep t
398 We only guarantee to correct 4 characters no matter how long the string is.
399 Longer strings mean more chances for transcription errors, so shorter strings are better.
400 
401-The longest data part using the regular 13 character checksum is 93 characters and corresponds to a 400-bit secret.
402+The longest data part using the regular 13 character checksum is 93 characters and corresponds to a 368-bit secret.
403 At this length, the prefix <code>MS1</code> is not covered by the checksum.

BenWestgate commented at 7:48 pm on November 26, 2025:

This is implicit from the spec definition:

Strings of length 95 and 96 MUST use HRP “ms” (or “MS”)

If it needs to be explained here as well, or if we need a sentence on why the maximum length is 94 for other HRPs, let me know and I’ll try.

We could reduce the maximum length to 94 characters and remove the special HRP vs length rule, but that breaks existing “46-byte codex32-encoded master seeds”, and these are absolutely critical to support given the widespread deployment of both codex32 and 46-byte master seeds.

in bip-0093.mediawiki:403 in 3123cead1d outdated

397@@ -323,10 +398,10 @@ While we could use the 15 character checksum for both cases, we prefer to keep t
398 We only guarantee to correct 4 characters no matter how long the string is.
399 Longer strings mean more chances for transcription errors, so shorter strings are better.
400 
401-The longest data part using the regular 13 character checksum is 93 characters and corresponds to a 400-bit secret.
402+The longest data part using the regular 13 character checksum is 93 characters and corresponds to a 368-bit secret.
403 At this length, the prefix <code>MS1</code> is not covered by the checksum.
404-This is acceptable because the checksum scheme itself requires you to know that the <code>MS1</code> prefix is being used in the first place.
405-If the prefix is damaged and a user is guessing that the data might be using this scheme, then the user can enter the available data explicitly using the suspected <code>MS1</code> prefix.
406+This is acceptable because the checksum scheme itself requires you to know that a codex32 human-readable part is being used in the first place.

BenWestgate commented at 7:50 pm on November 26, 2025:

At this point we should link to the registry somewhere in our document so people know what a “codex32 human-readable part” might be:

Where’s the best place to put this hyperlink? https://github.com/satoshilabs/slips/blob/master/slip-0173.md#uses-of-codex32

roconnor commented at 9:06 pm on November 26, 2025:

I don’t really want to be seen as endorsing a particular registry. But I also see how a link could be useful, so I’m torn.

roconnor commented at 9:08 pm on November 26, 2025:

Maybe best to leave the registry out since they may or may not be Bitcoin related.

BenWestgate commented at 9:59 pm on November 26, 2025:

https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#user-content-Registered_Humanreadable_Prefixes

BIP-0173 which is a prerequisite for implementing this format links to that registry.

We don’t have to link to it, as they can find it in BIP-0173, but the concept of registering codex32 HRPs should be mentioned to avoid the chaos and disaster of using anything for everything.

roconnor commented at 10:57 pm on November 26, 2025:

Oh. Well if there is precedent then I guess it is okay.

roconnor commented at 0:52 am on November 27, 2025: none

Yes, let’s keep the silly exception for now for the sake of getting an agreeable PR. We should hammer out the master seed bit-size restrictions in a separate PR.

If you want to say that length 96 ms seeds are deprecated that’s okay too. But I still want to argue for the merits of 160 bit master seeds.

in bip-0093.mediawiki:124 in 3123cead1d outdated

137-        return ms32_verify_long_checksum(data)
138+        return codex32_verify_long_checksum(bech32_hrp_expand(hrp) + data)
139     if len(data) <= 93:
140-        return ms32_polymod(data) == MS32_CONST
141+        return codex32_polymod(bech32_hrp_expand(hrp) + data) == CODEX32_CONST
142     return False

BenWestgate commented at 7:18 am on November 27, 2025:

I dislike a situation where valid “long codex32” strings can be shorter overall (and in data part characters) than regular codex32.

“long” codex32 format: 10 hrp characters + 1 + 6 header characters + 54 payload characters + 15 checksum characters = 86
codex32 format: “ms” hrp characters + 1 + 6 header characters + 74 payload characters + 13 checksum characters = 96

So I’m now restricting the HRP length and leaving codex32_verify_checksum() alone. The checksum will remain selected as it currently is: based on the length of the data.

The maximum HRP length will be restricted (going forward) so that any HRP is always covered by our 4 error correction guarantees if its errors only affect low (or high) bits. 96-character short “ms” strings are deprecated: they decode properly, but the same hrp & data will now encode with the long checksum.

It has to be this way: if we needed the HRP to know which checksum to use, we couldn’t protect the HRP. If we change the verify rules, we break backwards compatibility.

def bech32_decode(bech):
    """Validate a Bech32/Bech32m string, and determine HRP and data."""

We must do the equivalent:

def codex32_decode(codex):
    """Validate a codex32/Long codex32 string, and determine HRP and data."""

The decoder must be ignorant of HRP, because that’s the point of it, to determine it.
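One way to keep the decoder ignorant of the HRP is to select the checksum purely from the length of the data part, as the `codex32_verify_checksum` hunk above does. A sketch of that selection, using the 93 and 96 thresholds from that hunk (the function name is hypothetical, and the upper limit for long strings is deliberately left out):

```python
from typing import Optional


def checksum_length(data_len: int) -> Optional[int]:
    """Pick the checksum length from the data-part length alone."""
    if data_len >= 96:
        return 15  # long codex32 checksum
    if data_len <= 93:
        return 13  # regular codex32 checksum
    return None  # data-part lengths 94 and 95 are invalid


for n in (45, 93, 94, 96):
    print(n, checksum_length(n))
```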

roconnor commented at 4:55 pm on November 27, 2025:

I can probably write this code out if you want, but my thoughts are we should have codex32_decode, an independent long_codex32_decode and an ms_decode that can call both of them.

BenWestgate commented at 5:29 pm on November 27, 2025:

That makes sense! Bech32 has an encode/decode for the format and then a separate encode/decode function for segwit addresses.

However they do encode/decode both Bech32/Bech32m checksums at once.

We need codex32_encode and codex32_decode function to handle both checksums. That has to be format level, not application level in order to detect/correct HRP errors.

roconnor commented at 5:55 pm on November 27, 2025:

Below is untested code that is approximately what I’m thinking

def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]

CODEX32_CONST = 0x10ce0795c2fd1e62a

def codex32_polymod(residue, values):
    if len(values) > 93:
        return False
    GEN = [
        0x19dc500ce73fde210,
        0x1bfae00def77fe529,
        0x1fbd920fffe7bee52,
        0x1739640bdeee3fdad,
        0x07729a039cfc75f5a,
    ]
    for v in values:
        b = (residue >> 60)
        residue = (residue & 0x0fffffffffffffff) << 5 ^ v
        for i in range(5):
            residue ^= GEN[i] if ((b >> i) & 1) else 0
    return residue

CODEX32_LONG_CONST = 0x43381e570bf4798ab26

def codex32_long_polymod(residue, values):
    if len(values) > 1023:
        return False
    GEN = [
        0x3d59d273535ea62d897,
        0x7a9becb6361c6c51507,
        0x543f9b7e6c38d8a2a0e,
        0x0c577eaeccf1990d13c,
        0x1887f74f8dc71b10651,
    ]
    for v in values:
        b = (residue >> 70)
        residue = (residue & 0x3fffffffffffffffff) << 5 ^ v
        for i in range(5):
            residue ^= GEN[i] if ((b >> i) & 1) else 0
    return residue

def codex32_verify_checksum(hrp, data):
    return codex32_polymod(1, bech32_hrp_expand(hrp) + data) == CODEX32_CONST

def codex32_verify_long_checksum(hrp, data):
    return codex32_long_polymod(1, bech32_hrp_expand(hrp) + data) == CODEX32_LONG_CONST

def codex32_create_checksum(hrp, data):
    polymod = codex32_polymod(1, bech32_hrp_expand(hrp) + data + [0] * 13)
    if polymod:
        polymod = polymod ^ CODEX32_CONST
        return [(polymod >> 5 * (12 - i)) & 31 for i in range(13)]
    return False

def codex32_create_long_checksum(hrp, data):
    polymod = codex32_long_polymod(1, bech32_hrp_expand(hrp) + data + [0] * 15)
    if polymod:
        polymod = polymod ^ CODEX32_LONG_CONST
        return [(polymod >> 5 * (14 - i)) & 31 for i in range(15)]
    return False

def ms32_verify_checksum(data):
    if len(data) >= 96:
        return codex32_verify_long_checksum("ms", data)
    return codex32_polymod(codex32_polymod(1, bech32_hrp_expand("ms")), data) == CODEX32_CONST

def ms32_create_checksum(data):
    if len(data) > 80:
        return codex32_create_long_checksum("ms", data)
    polymod = codex32_polymod(codex32_polymod(1, bech32_hrp_expand("ms")), data + [0] * 13)
    polymod = polymod ^ CODEX32_CONST
    return [(polymod >> 5 * (12 - i)) & 31 for i in range(13)]

As you can see, I think it is up to the particular application to handle switching between the long codex32 format and the regular codex32 format.

BenWestgate commented at 8:41 pm on November 27, 2025:

We can’t assume “ms” to know which checksum to verify.

If there’s an hrp substitution error, we need to know which checksum is being used to detect/correct it, but if which checksum depends on which hrp application, instead of just codex32 string length, then we’re stuck.

That made our length 96 exception a bug, as how can a decoder know this rule applies if it can’t detect the integrity of what determines its applicability?

We need a codex32_decode function such that, if a string validates, it either has the correct HRP or contains more than 8 errors. So applications can’t choose their checksum; the format defines which to use.

Like BIP-0173’s HRP detection assumption, our error correction guarantee only applies to the lower (or upper) 5 bits of HRP characters, as the swaps that produce an upper bit change are very unlikely. But we can guarantee to correct 2 “double” errors.

BenWestgate commented at 4:34 am on December 1, 2025:

The short/long format must be format level.

def codex32_encode(hrp, data, spec):
    """Compute a codex32 string given HRP and data values."""
    combined = data + codex32_create_checksum(hrp, data, spec)
    return hrp + "1" + "".join([CHARSET[d] for d in combined])

def codex32_decode(codex=""):
    """Validate a Codex32/Codex32 Long string, and determine HRP and data."""
    if (any(ord(x) < 33 or ord(x) > 126 for x in codex)) or (
        codex.lower() != codex and codex.upper() != codex
    ):
        return None, None, None
    codex = codex.lower()
    pos = codex.rfind("1")
    if pos < 1 or pos + 20 > len(codex) or pos + len(codex) > 1023:
        return None, None, None
    if not (codex[pos + 1].isdigit() and all(x in CHARSET for x in codex[pos + 1 :])):
        return None, None, None
    hrp = codex[:pos]
    data = [CHARSET.index(x) for x in codex[pos + 1 :]]
    spec = codex32_verify_checksum(hrp, data)
    if spec is None or codex[pos + 1] == "0" and codex[pos + 6] != "s":
        return None, None, None
    return hrp, data[: -13 if spec is Encoding.CODEX32 else -15], spec

BenWestgate commented at 8:36 am on December 16, 2025:

I can probably write this code out if you want, but my thoughts are we should have codex32_decode, an independent long_codex32_decode and an ms_decode that can call both of them.

Yes to ms_decode, but that one should go all the way to bytes. Do we add it to this PR in the master seed format section along with ms_encode? I’d prefer it if codex32_decode returned a spec; it’s a hassle to check the string length everywhere to know which checksum is being used. But the code in my latest commit isn’t ugly.

bosshaas13131313 commented at 7:23 am on November 27, 2025: none

No.

On Thu, Nov 27, 2025, 2:20 AM Ben Westgate @.***> wrote:

@.**** commented on this pull request.

In bip-0093.mediawiki https://github.com/bitcoin/bips/pull/2040#discussion_r2567413699:

+def codex32_verify_checksum(hrp, data):
     if len(data) >= 96:  # See Long codex32 Strings
-        return ms32_verify_long_checksum(data)
+        return codex32_verify_long_checksum(bech32_hrp_expand(hrp) + data)
     if len(data) <= 93:
-        return ms32_polymod(data) == MS32_CONST
+        return codex32_polymod(bech32_hrp_expand(hrp) + data) == CODEX32_CONST
     return False

This now needs to become:

def codex32_verify_checksum(hrp, data):
    combined = bech32_hrp_expand(hrp) + data
    if len(combined) >= 96:
        return codex32_verify_long_checksum(combined)
    if len(combined) <= 93:
        return codex32_polymod(combined) == CODEX32_CONST
    return False

Missing:

the thorny zero length “ms” rule.

the check in codex32_decode() for the upper long codex32 length limit.

Because of this new max length rule, we have the curious situation where valid “long codex32” strings can actually be shorter overall (and in data part characters) than regular codex32.

We may want to rename that format. Any thoughts?

Ex:
“long” codex32 format: 10 hrp characters + 1 + 6 header characters + 54 payload characters + 15 checksum characters = 86
codex32 format: “ms” hrp characters + 1 + 6 header characters + 74 payload characters + 13 checksum characters = 96


in bip-0093.mediawiki:347 in 3123cead1d

377     values = data
378-    polymod = ms32_long_polymod(values + [0] * 15) ^ MS32_LONG_CONST
379+    polymod = codex32_long_polymod(values + [0] * 15) ^ CODEX32_LONG_CONST
380     return [(polymod >> 5 * (14 - i)) & 31 for i in range(15)]
381 </source>
382

BenWestgate commented at 7:57 am on November 27, 2025:

Probably should mention its maximum length for the correction guarantees in a sentence here, similar to what is in the checksum section. Otherwise 1024 is a magic number in the reference snippets.

scgbckbone commented at 3:35 pm on November 27, 2025: contributor

This is what I have:

compatibility encoding for BIP-39, allowing to Shamir split mnemonic & extended private key wallets
- HRP: cc
- encodes chaincode + private key of BIP-32 master extended key (64 bytes)
- cc10zcvjs5klr60nyt8usd553sge7r5glcy2ztwfv2d2smmcs7m3mq6dduwavccnjzjchlkffjfx8p3cjjx64q9vkxdt8q9qzuu3s8jfgjysa5pc5nezf2qkfhqpfwf

I only have one HRP; I do not differentiate between testnet/mainnet. Even though I use extended key data, I’m also using it for mnemonics, where I first generate the master BIP-32 key and then use those values for the codex32 secret share. Do you consider the lack of testnet/mainnet separation an issue?

Your addition of HRP into checksum definitely broke my tests wrt checksum for cc hrp secrets (not an issue, I haven’t released yet - but I’m planning to in few weeks)

My HWW implementation is pretty much in accordance with https://github.com/BlockstreamResearch/codex32/blob/master/docs/wallets.md . My implementation is not ECW. I even provide generate support for secret share S. I only allow to generate 128 & 256 bit MS secrets (but allow to import also 512 bit). In short:

TRNG 256 entropy bits
r = sha256(sha256(entropy))
x = r[:byte_len]
x is new master secret, and default ID is 20 MSB from master XFP (but user can change if he wishes to)

What are the chances of this patch-set to be accepted? Is this spec stable enough to start releasing it ?

scgbckbone commented at 3:46 pm on November 27, 2025: contributor

How I generate non-secret shares:

From current loaded secret (whether it is mnemonic, xprv, or codex32)
- codex32: secret = master_seed (secret share with hrp MS)
- others: secret = chaincode + privkey (64bytes) (secret share with hrp CC)
BIP-85 derive from above secret –> master secret for share ‘a’
interpolate secret share with share ‘a’ while changing only index (c,d,d,e,f,g,h…) to generate new shares

BenWestgate commented at 5:02 pm on November 27, 2025: contributor

* HRP: `cc`

“bc” and “tb” for Bech32 addresses were an upgrade in human-readable prefix from the base58 encoding.

I consider it a regression if you use fewer characters to encode a human-readable prefix than the base58 extended key format did. “xpriv” is an option here.

* encodes chaincode + private key of BIP-32 master extended key (64 bytes)

Does your format need a 65th byte for the public key that is zero when encoding private keys?

There are many advantages to the strings needing disambiguation having the same byte length.

I only have one HRP, I do not differentiate between testnet/mainnet, … Do you consider the lack of testnet/mainnet separation an issue?

Yes, this is a huge regression from the current bip32 extended key format we want to upgrade. Mostly that I can’t tell by looking at the descriptor if it’s for real funds or not.

Your addition of HRP into checksum definitely broke my tests wrt checksum for cc hrp secrets (not an issue, I haven’t released yet - but I’m planning to in few weeks)

HRP was always in the checksum, it just was pre-computed for “ms” so the checksums for other HRP were wrong. I noticed when I tried to validate the CLN HSM secret examples in my python-codex32 package.
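A small sketch makes the "pre-computed for ms" point concrete. It folds the expanded HRP into the polymod starting from residue 1, using `bech32_hrp_expand` and the generator constants quoted earlier in this thread (the length check is dropped for brevity). For "ms" the result is 0x23181b3, which I believe is the initial residue hard-coded in the original `ms32_polymod`; any other HRP starts from a different residue and therefore needs a different checksum.

```python
def bech32_hrp_expand(hrp):
    """Expand the HRP into high bits, a zero, then low bits (as in BIP-0173)."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def codex32_polymod(residue, values):
    """Regular codex32 polymod step, starting from the given residue."""
    GEN = [
        0x19dc500ce73fde210,
        0x1bfae00def77fe529,
        0x1fbd920fffe7bee52,
        0x1739640bdeee3fdad,
        0x07729a039cfc75f5a,
    ]
    for v in values:
        b = residue >> 60
        residue = (residue & 0x0fffffffffffffff) << 5 ^ v
        for i in range(5):
            residue ^= GEN[i] if ((b >> i) & 1) else 0
    return residue


print(hex(codex32_polymod(1, bech32_hrp_expand("ms"))))  # 0x23181b3
print(hex(codex32_polymod(1, bech32_hrp_expand("cl"))))
```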

My implementation is not ECW.

@roconnor has a PR in codex32 that does ECC you could test.

I even provide generate support for secret share S. I only allow to generate 128 & 256 bit MS secrets (but allow to import also 512 bit).

I have a codex32 PR to update wallets.md guidance for generation, you may see something useful, especially in the HWW case.

In short:

TRNG 256 entropy bits

r = sha256(sha256(entropy))

x = r[:byte_len]

x is new master secret, and default ID is 20 MSB from master XFP (but user can change if he wishes to)

You can and probably should use the entropy bits directly. If they lack entropy, sha256d is an illusion of security.
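The two approaches being compared can be sketched side by side. This is illustrative only: Python's `secrets.token_bytes` stands in for the hardware TRNG, and both function names are hypothetical.

```python
import hashlib
import secrets


def seed_direct(byte_len: int = 16) -> bytes:
    """Use the random bytes directly as the master seed."""
    return secrets.token_bytes(byte_len)


def seed_sha256d(entropy: bytes, byte_len: int = 16) -> bytes:
    """Double-SHA256 the entropy and truncate, as in the quoted scheme."""
    r = hashlib.sha256(hashlib.sha256(entropy).digest()).digest()
    return r[:byte_len]


entropy = secrets.token_bytes(32)  # stand-in for 256 bits from a TRNG
print(len(seed_direct(16)), len(seed_sha256d(entropy, 16)))
```

Hashing is deterministic, so it cannot add entropy that the input lacks; it only changes how the same entropy is represented.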

What are the chances of this patch-set to be accepted? Is this spec stable enough to start releasing it ?

It will need wider community review than us. But there are comments by P. Wuille as far back as 2020 stating a 4 error correcting bech32 encoding of extended keys is needed. So high acceptance chances once it’s correct and shiny.

This spec PR will not change anything that affects your encoding of ~78 bytes or whatever an extended key has.

We’re mostly debating behavior at the limit between short and long checksums. Yours unambiguously use long codex32.

BenWestgate commented at 5:10 pm on November 27, 2025: contributor

How I generate non-secret shares:

BIP-85 derive from above secret –> master secret for share ‘a’

interpolate secret share with share ‘a’ while changing only index (c,d,e,f,g,h…) to generate new shares

It is unsafe to child derive shares from the secret they recover. They should be independently random.

When part of the secret is compromised and an attacker tries to brute force the rest: the dependent relation between the secret and share A allows an attacker with k-1 shares or share A to check his guesses against this. This is far faster than checking an address.

scgbckbone commented at 5:54 pm on November 27, 2025: contributor

HRP was always in the checksum, it just was pre-computed for “ms” so the checksums for other HRP were wrong. I noticed when I tried to validate the CLN HSM secret examples in my python-codex32 package.

I see now…

It is unsafe to child derive shares from the secret they recover. They should be independently random.

I do not want to use randomness here, as I want to split existing secret, and I require the “split” to be deterministic, so that if user is splitting the exact same secret, uses same hrp, same threshold, same id, and same number of shares - application always produces the exact same shares. I could add an option to choose between a random or deterministic split, but deterministic is a hard requirement.

…also it is 5 hardened derivation steps plus hmac_sha512

When part of the secret is compromised and an attacker tries to brute force the rest: the dependent relation between the secret and share A allows an attacker with k-1 shares or share A to check his guesses against this. This is far faster than checking an address.

there are plenty other brute-force options if attacker has part of secret, I do not consider this scenario of yours to be something I should optimize for

Yes, this is a huge regression from the current bip32 extended key format we want to upgrade. Mostly that I can’t tell by looking at the descriptor if it’s for real funds or not.

I do not encode extended key (or full extended key), I only encode chaincode + privkey, without any other data as I just want to be able to restore naked xpriv from it, without any more meta extended keys carry. As I use it for both mnemonics and extended keys.

That is why I dismissed the idea of doing testnet/mainnet differentiation as I consider my 64bytes to be the “secret”

BenWestgate commented at 7:45 pm on November 27, 2025: contributor

It is unsafe to child derive shares from the secret they recover. They should be independently random.

I do not want to use randomness here, as I want to split existing secret, and I require the “split” to be deterministic, so that if user is splitting the exact same secret, uses same hrp, same threshold, same id, and same number of shares - application always produces the exact same shares. I could add an option to choose between a random or deterministic split, but deterministic is a hard requirement.

The best you could do here if you insist, is perform a KDF on the secret data to harden it before deriving child shares from that derived key. But it still reduces security from information theoretic to computational.

…also it is 5 hardened derivation steps plus hmac_sha512

Still significantly faster than address checking. Andrew told me the EC multiplication is the bottleneck for address checking.

When part of the secret is compromised and an attacker tries to brute force the rest: the dependent relation between the secret and share A allows an attacker with k-1 shares or share A to check his guesses against this. This is far faster than checking an address.

there are plenty other brute-force options if attacker has part of secret, I do not consider this scenario of yours to be something I should optimize for

My point is your standard should be harder to exploit than all other options or we lose security for nothing. Simply deriving child shares from an argon2id or scrypt derived key is probably enough protection.
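One way the hardening could look, as a sketch only: stretch the secret with scrypt, then derive each share payload from the stretched key with HMAC-SHA512. The function name, the salt layout (hrp, threshold and identifier concatenated) and the scrypt parameters are all assumptions made for illustration, not anything specified in this thread. As noted above, this is still only computationally secure.

```python
import hashlib
import hmac


def hardened_share_payload(secret: bytes, header: str, index: str, n_bytes: int) -> bytes:
    """Deterministically derive a share payload from a scrypt-stretched key.

    header: public string such as hrp + threshold + identifier (assumed layout).
    index:  the share index character, e.g. "a".
    """
    if not 1 <= n_bytes <= 64:
        raise ValueError("n_bytes must be between 1 and 64")
    # Illustrative cost parameters; a real design would tune and fix these.
    key = hashlib.scrypt(secret, salt=header.encode(), n=2**14, r=8, p=1, dklen=64)
    return hmac.new(key, index.encode(), hashlib.sha512).digest()[:n_bytes]


share_a = hardened_share_payload(b"\x00" * 16, "ms2test", "a", 16)
share_c = hardened_share_payload(b"\x00" * 16, "ms2test", "c", 16)
print(share_a.hex(), share_c.hex())
```

The same secret, header and index always give the same payload, so the deterministic-split requirement is kept while each guess against a known share costs one scrypt evaluation.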

That is why I dismissed the idea of doing testnet/mainnet differentiation as I consider my 64bytes to be the “secret”

It seems better to encode the recovery words and wordlist with a bip39_12w or bip39_24w human-readable part encoding standard than encode the resulting private key and chaincode bytes. A full bip32 codex32 encoding standard would be more useful than a neutered master xprv only edition.

BenWestgate commented at 9:30 pm on November 27, 2025: contributor

This table shows the undetectable errors, each row has 2-3 characters which cannot be distinguished since they differ only in the upper bits.
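A minimal sketch of how rows of that kind can be computed, assuming HRP characters are US-ASCII 33 to 126 and that “upper bits” means everything above the low 5 bits. The function name is hypothetical.

```python
from collections import defaultdict


def low_bit_rows() -> dict[int, list[str]]:
    """Group printable ASCII characters (33-126) by their low 5 bits."""
    rows = defaultdict(list)
    for code in range(33, 127):
        rows[code & 31].append(chr(code))
    return dict(rows)


for low, chars in sorted(low_bit_rows().items()):
    print(f"{low:2d}: {' '.join(chars)}")
```

For example, the row for low bits 17 contains `1`, `Q` and `q`: a check that looks only at the low 5 bits cannot tell these apart.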

I found an 83 character Bech32 HRP with 3 substitutions that validates. In theory, some long HRP won’t detect even 1-2 errors affecting high bits. We inherit this problem if we copy Bech32 max length rules.

The worst case is: a secret is transcribed wrong or damaged, the application is forgotten by the user or heirs, the string validates or corrects to a different application, and it is then transmitted.

This is worse than a wrong HRP address validating.

We should guarantee to correct 2 HRP errors by covering the expanded characters. Now any wrong 2 character HRP for every seed length reveals it is “ms” secret data. For the more common errors affecting only the low (or high) bits two errors from the data can also be corrected.

So the correct 4 errors guarantee holds under the assumption the HRP errors affect only low (or high) bits. Same assumption as Bech32’s detection guarantee, and it’s a detection only standard. We store secrets so we need correction guarantees and this is how we get them.

roconnor commented at 10:08 pm on November 27, 2025: none

I’m this close to throwing in the towel. BIP-93’s design was never intended to be generalized to arbitrary HRP, and it shows. If people want to reuse our polynomial for their own schemes, then more power to them. They can make their own BIP.

apoelstra commented at 5:37 pm on December 5, 2025: contributor

Sorry for being late to the party. I have read through this whole discussion except for the digression about deterministic share derivation and except for Russell’s detailed code. As I understand it there are a few issues at play:

Ben wants the HRP to be covered by the checksum, which has multiple problems
- if you don’t know the HRP you arguably don’t know the checksum so how can you correct it?
- but conversely Ben points out that we likely want xpub/xprv HRPs which are easy to mess up and would share a checksum
- each HRP character contributes two characters to the checksum plus an extra “separator” character so length n takes away 2n+1 from your total length, which is surprising and weird
- (There was a long discussion about restricting the character set of HRPs. As you have observed, I’ve already violated this with my bip32_24w HRPs. I’m skeptical this matters. If there is anybody except me doing this, they would have needed to do a comparable amount of insane off-spec work to accomplish it and “protecting them” by extending the spec to include them should not be a priority.)
How do we determine the threshold at which to switch to “long codex32” which is a totally different checksum
- Ben would like there to only be a few allowable lengths of long codex32 strings, which I directionally agree with, but I also note that I have violated this (I have 264-bit strings which are converted directly from BIP39 seed words).
Some discussion on what the allowable HRPs should be. BIP-173 allows any ASCII string up to 83 characters.
- …but if you look at the registered list of BIP-173 prefixes, despite there being some pretty crazy crap in there, every single prefix is less than 12 characters, and except for one using : and one using @ every single one is alphanumeric
- Ben initially proposed restricting the set of characters to ones that all have distinct low bits so that we could “ignore the high bits”. But as seen in his above table, this is impossible if we allow both numbers and letters.

In the interest of moving forward I would kinda like Ben to make a new PR with the non-HRP changes, which it seems like everyone agrees with and would reduce the size of the diff of this one.

Then my opinions on the above:

I agree with Russell that in general we should not attempt to correct the HRP. This was outside of the design space for our codex32 SSSS application and among other things we (ab)used this fact to distinguish codex32 from long codex32 on length alone and not HRP. Having said this, if Ben wants to try to error-correct HRPs all the power to him and we should take some effort to avoid undermining that goal.

So for this BIP we should say:

Users can register their own HRPs at [link] but they are only allowed to use ASCII 96 to 126, and their length can be at most 8, say. (These are the tightest restrictions I would support, and I’d also accept any looser ones up to the “83 ascii characters free for all” of BIP 173.) This gives us { | } and ~ as well as letters. People who want a separator should be happy to use ~.
The HRP defines the checksum and SHOULD NOT be error-corrected, unless there is a separate specification describing how to do this. xpub/xprv I think needs to have its own BIP for this. Maybe there could be a general-purpose “bip93 with HRP correction” BIP that covers questions like “what if the user has a character outside of the allowable set” or “should we preferentially try to correct _ and - to ~ or just try random things” or “should we have a fixed set of supported HRPs and just try all of these”. It seems that different answers make sense in different contexts.
I’m happy with whatever length threshold we want for switching between codex32 and long codex32. I think “93 - length of HRP” is fine, along with an exception for ms. We should specify the maximum length in the table of registered HRPs so people don’t have to know the formula if they don’t want to.
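A sketch of the first and third rules, under my reading of them: an HRP is acceptable if it is 1 to 8 characters from ASCII 96 to 126, and a regular codex32 data part may be at most 93 minus the HRP length, with “ms” keeping the full 93. Both function names and the exact interpretation of the “ms” exception are assumptions.

```python
def hrp_allowed(hrp: str, max_len: int = 8) -> bool:
    """Tightest proposed rule: ASCII 96..126 only, length 1..max_len."""
    return 1 <= len(hrp) <= max_len and all(96 <= ord(c) <= 126 for c in hrp)


def max_short_data_length(hrp: str, ms_exception: bool = True) -> int:
    """Longest data part (checksum included) that still uses the 13-char checksum."""
    if ms_exception and hrp == "ms":
        return 93
    return 93 - len(hrp)


for hrp in ("ms", "cl", "xprv", "x~prv", "bip32_24w"):
    print(hrp, hrp_allowed(hrp), max_short_data_length(hrp))
```

Under this rule `bip32_24w` is rejected three times over: it has digits, an underscore (ASCII 95), and nine characters.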

I think this should make everyone happy, except that it leaves HRP correction underspecified and delegated to another future BIP. (I would also be open to bringing more text into BIP 93 itself, but let’s try to accomplish the above before we do that.)

jonatack commented at 5:37 pm on December 15, 2025: member

@BenWestgate Do you plan to update here following the merge of #2052?

BenWestgate commented at 7:24 pm on December 15, 2025: contributor

How do we determine the threshold at which to switch to “long codex32” which is a totally different checksum

It’s best for this to depend only on length. Consistent with BIP-0173.

The “ms” exception for 93 data characters we can either:

deprecate (let it decode, but future encodings with len(hrp) + len(data) > 80 characters will use long codex32)
count “ms” as zero characters so nothing changes

Some discussion on what the allowable HRPs should be.

Prefer keeping what BIP-0173 allows to avoid redefinition.

Ben initially proposed restricting the set of characters to ones that all have distinct low bits so that we could “ignore the high bits”. But as seen in his above table, this is impossible if we allow both numbers and letters.

Not quite, I propose to restrict the registry so that every hrp in it has unique low bits from every other registered hrp.

Allow every US-ASCII character but applications should not register an hrp that is only unique in the high bits as it might be mistaken for another.

I think this maintains 8 character error detection guarantees as the low bits are always covered and unique among valid hrp.
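The registry rule can be checked mechanically. A minimal sketch, with hypothetical function names and a contrived registry: two HRPs collide when the sequences of low 5 bits of their characters are equal.

```python
from itertools import combinations


def low_bits(hrp: str) -> tuple[int, ...]:
    """Sequence of the low 5 bits of each character of the lowercased HRP."""
    return tuple(ord(c) & 31 for c in hrp.lower())


def low_bit_collisions(registry: list[str]) -> list[tuple[str, str]]:
    """Return every pair of registered HRPs that differ only in their high bits."""
    return [(a, b) for a, b in combinations(registry, 2) if low_bits(a) == low_bits(b)]


# "-3" is a contrived entry: '-' and 'm' share low bits 13, '3' and 's' share 19.
print(low_bit_collisions(["ms", "cl", "-3"]))  # [('ms', '-3')]
```

A registry maintainer could run a check like this before accepting a new HRP.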

…I would like Ben to make a new PR… [to] reduce the size of the diff of this one.

Done and merged.

Then my opinions on the above:

I agree with Russell that in general we should not attempt to correct the HRP.

I agree. Error detection guarantees on it are enough to avoid disasters when honest software detects data it should not have been given.

So for this BIP we should say:

The HRP defines the checksum and SHOULD NOT be error-corrected, unless there is a separate specification describing how to do this.

Correction should try all registered hrp if rebroadcast is not an option. Assume the hrp requiring the fewest edits is the valid one, according to the edit distance formula in wallets.md.
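A sketch of “try all registered hrp and take the fewest edits”. Plain Levenshtein distance is used here as a stand-in; it is not necessarily the edit distance formula in wallets.md, and the registry contents are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def closest_hrps(candidate: str, registry: list[str]) -> list[str]:
    """Return the registered HRPs at minimal edit distance from candidate."""
    scored = sorted((edit_distance(candidate.lower(), h), h) for h in registry)
    best = scored[0][0]
    return [h for d, h in scored if d == best]


print(closest_hrps("mz", ["ms", "cl"]))        # ['ms']
print(closest_hrps("xpxx", ["xprv", "xpub"]))  # ['xprv', 'xpub']
```

Returning a list makes ties explicit: when more than one HRP is equally close, the application should not pick one silently.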

xpub/xprv I think needs to have its own BIP for this.

Agreed. Someone may have volunteered to do this.

I’m happy with whatever length threshold we want for switching between codex32 and long codex32. I think “93 - length of HRP” is fine, along with an exception for ms.

Agree. 93-len(hrp) is simplest. Is that “ms” rule a verify exception (deprecate) or create and verify exception?

specify the maximum length in the table of registered HRPs so people don’t have to know the formula if they don’t want to.

Unsure why string length for a given payload size and hrp belongs in the registry.

except that it leaves HRP correction underspecified and delegated to another future BIP.

I think that’s fine, correction is application specific. Or at very least rules for: public/private/secret and always/sometimes/never retransmittable data. That’s up to 9 hrp types and then correction guidance may include contexts expecting combinations of multiple of these.

I just don’t think it can or should all be said here. Just enough to avoid disasters, such as: IF hrp correction is attempted, the hrp with the fewest edits should be assumed correct. And in applications that transmit data, probably MUST.

(I would also be open to bringing more text into BIP 93 itself, but let’s try to accomplish the above before we do that.)

Agreed. Safety guidance only here

BenWestgate commented at 7:32 pm on December 15, 2025: contributor

@BenWestgate Do you plan to update here following the merge of #2052?

Yea. Enough consensus has formed to do another commit incorporating it this week.

roconnor commented at 7:40 pm on December 15, 2025: none

If it is helpful we could start with an intermediate amendment to BIP-93 seed sizes to only allow between 128 and 256 bit seeds (specifically 16 to 32 byte seeds) for short codex32 strings, and only allow exactly 512 bit secrets for long codex32 strings. This would eliminate the 95 and 96 character special exceptions we are worried about. I think we all agree we want to restrict the valid size values to some subset of these values anyways, and there is only a small debate on how far we ought to go.

E.g. I would also not oppose going as far as restricting seed sizes to be of 128, 160, 192, 224, and 256, which are the entropy sizes listed in BIP-39. (And also keeping 512 bit long codex32 for compatibility with BIP-39 generated master seeds).

My only hesitation is that I know some folks want to restrict this list even further, and it would be somewhat annoying to make “breaking changes” twice. I use the word “breaking changes” loosely since, in practice people are using 128, 256 and 512 bit entropy sizes.

BenWestgate commented at 5:56 am on December 16, 2025: contributor

I found this reply in my browser cache:

if you don’t know the HRP you arguably don’t know the checksum so how can you correct it?

Bech32/Bech32m checksum:

length 90 or less
1st value is 0-16
The seventh-from-last character has zero padding
implementations SHOULD NOT implement correction…

Codex32 checksum:

length 94 or less or 97 or more
Data starts with digit
- If “0”, 6th character is “s”
14/16th (short/long) from last character may have a 1 in its padding
MAY implement correction…

Only threshold 9, 8, 2, and 0 are valid segwit data[0] and 0 has “s” ruling 31:1 in codex32 favor. 61.25% of random 7th from last characters will have a 1 in their padding. 1 in 10^9 the Bech32 checksum validates

Any attempt at suggesting HRP corrections should assume codex32 after checking it’s invalid:

Bech32/Bech32m
Base58Check
Hex encoded public key

Then only suggest registered HRP.

If it is helpful we could start with an intermediate amendment to BIP-93 seed sizes to only allow between 128 and 256 bit seeds (specifically 16 to 32 byte seeds) for short codex32 strings, and only allow exactly 512 bit secrets for long codex32 strings.

It is very helpful to lose the “ms” exceptions. These rules fit perfectly in the new “Master seed format” section. 16..32 for regular and 64 for long looks great if you’re OK with no insert/delete correction outside 16, 24, 32, and 64. Do you want to write it, or should I?

Wallets.md says:

MAY attempt correction by deleting and/or inserting characters, as long as the resulting string has a valid length for a codex32 string. ECWs MAY assume the correct length is the closest of 48 or 74.

We can safely amend that to “48, 61, or 74.” as the correctable lengths of each do not overlap.
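For reference, the string lengths follow directly from the seed size. A small sketch, assuming the layout hrp + "1" + threshold (1) + identifier (4) + share index (1) + payload + checksum, and the rule that the long 15-character checksum is used once hrp plus data exceed 80 characters; the function name is hypothetical.

```python
def ms_string_length(seed_bits: int, hrp: str = "ms") -> int:
    """Total length of a codex32 string encoding a seed of seed_bits bits."""
    payload = -(-seed_bits // 5)            # ceil(bits / 5) bech32 characters
    data = 1 + 4 + 1 + payload              # threshold + identifier + index + payload
    checksum = 15 if len(hrp) + data > 80 else 13
    return len(hrp) + 1 + data + checksum   # +1 for the "1" separator


for bits in (128, 160, 192, 224, 256, 512):
    print(bits, ms_string_length(bits))
```

This gives 48, 61 and 74 characters for 128, 192 and 256 bits. Neighbouring lengths are 13 characters apart, so a string within a few insertions or deletions of one of them is never closer to another.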

I think we all agree we want to restrict the valid size values to some subset of these values anyways, and there is only a small debate on how far we ought to go.

If it’s ready to test again, https://github.com/BlockstreamResearch/codex32/pull/70 could give objective data how much performance or accuracy insert/delete correction loses when it checks every length or lengths other than 48, 61, or 74.

E.g. I would also not oppose going as far as restricting seed sizes to be of 128, 160, 192, 224, and 256, which are the entropy sizes listed in BIP-39. (And also keeping 512 bit long codex32 for compatibility with BIP-39 generated master seeds).

Having just two lengths in correction range is like losing 1-bit of checksum, 160 and 224 may not significantly harm accuracy.

My only hesitation is that I know some folks want to restrict this list even further, and it would be somewhat annoying to make “breaking changes” twice. I use the word “breaking changes” loosely since, in practice people are using 128, 256 and 512 bit entropy sizes.

If testing shows a significant loss of insert/delete correction accuracy would that sway your opinion? If not, only 128, 160, 192, 224, and 256 has consensus.

P.S.: I checked the characters for descriptor key origin data and if it is prepended inside the HRP of xpub/tpub, it maintains the “low bits always unique” hrp registry rule for any master fingerprint hex and derivation path symbols which is a nice omen.

Merge branch 'master' into bip93-fix-threshold 8e4256807d

in bip-0093.mediawiki:69 in 8e4256807d outdated

70-
71-* A human-readable part, which is the string "ms" (or "MS").
72-* A separator, which is always "1".
73-* A data part which is in turn subdivided into:
74-** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
75+It reuses the base-32 character set from BIP-0173, is at most 94 characters long, and consists of:

BenWestgate commented at 8:21 am on December 16, 2025:

Going forward with covering [low hrp] [data] because it’s simplest to know which checksum to use.

If the registry is curated so every HRP differs in [low hrp] our error correction properties remain applicable, if not from the code itself, from lookup against the registry if an application chooses to implement this capability where it can assume the data is codex32.

in bip-0093.mediawiki:78 in 8e4256807d outdated

87-Note that per BIP-0173, the lowercase form is used when determining a character's value for checksum purposes.
88-In particular, given an all uppercase codex32 string, we still use lowercase <code>ms</code> as the human-readable part during checksum construction.
89+** The '''identifier''', which consists of 4 bech32 characters.
90+** The '''share index''', which is any bech32 character.
91+***Note that a share index value of "s" (or "S") is special and denotes the unshared secret (see section "Unshared Secret").
92+** The '''payload''', which is a sequence of 0 to 73 bech32 characters. (However, see '''Long codex32''' below for an exception to this limit.)

BenWestgate commented at 8:26 am on December 16, 2025:

I dislike this sentence, as total length determines the switch now, not the payload, which is not even required (although it should be for shares, otherwise what was the point?). I don’t know how to fix it without making it ugly; maybe just “which is a sequence of bech32 characters.” and move the (However, see ‘‘‘Long codex32’’’…) to L69 after “at most 94 characters long”.

in bip-0093.mediawiki:328 in 8e4256807d outdated

325 
326 ===Long codex32===
327 
328 The 13 character checksum design only supports up to 80 data characters.
329-Excluding the threshold, identifier and index characters, this limits the payload to 74 characters or 46 bytes.
330+Excluding the human-readable part, threshold, identifier and index characters, this limits the payload to 74 characters or 46 bytes.

BenWestgate commented at 8:55 am on December 16, 2025:

72 characters or 45 bytes now. hrp can be length 1 but 73 has an incomplete group.
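The arithmetic behind these numbers, as a sketch. It assumes the 13-character checksum covers at most 80 characters of hrp plus data, that 6 of those are threshold, identifier and index, and that a trailing incomplete group of 5 or more padding bits is not allowed (my understanding of the existing padding rule). The function name is hypothetical.

```python
def max_payload(hrp_len: int) -> tuple[int, int]:
    """Return (payload characters, whole payload bytes) for regular codex32."""
    chars = 80 - hrp_len - 6
    # Drop characters that would leave 5 or more padding bits at the end.
    while (chars * 5) % 8 >= 5:
        chars -= 1
    return chars, chars * 5 // 8


print(max_payload(0))  # (74, 46): the old figure, HRP not counted
print(max_payload(1))  # (72, 45): 73 characters would leave 5 padding bits
print(max_payload(2))  # (72, 45)
```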

in bip-0093.mediawiki:327 in 8e4256807d outdated

323 
324 The codex32 secret and the ''k''-1 codex32 shares form a set of ''k'' valid initial codex32 strings from which additional shares can be derived as described above.
325 
326 ===Long codex32===
327 
328 The 13 character checksum design only supports up to 80 data characters.

BenWestgate commented at 8:58 am on December 16, 2025:

I kept the use of “data characters” but it really means [low hrp] [data] now. Do we need to change that everywhere?

in bip-0093.mediawiki:366 in 8e4256807d

BenWestgate commented at 9:00 am on December 16, 2025:

Needs update to “between 75 and 1001 bech32 characters.”

in bip-0093.mediawiki:369 in 8e4256807d outdated

367@@ -353,11 +368,28 @@ A long codex32 string follows the same specification as a regular codex32 string
368 
369 A codex32 string with a data part of 94 or 95 characters is never legal as a regular codex32 string is limited to 93 data characters and a long codex32 string is at least 96 data characters.

BenWestgate commented at 9:02 am on December 16, 2025:

Change “data part” to “length” and add 1 for the separator.

in bip-0093.mediawiki:392 in 8e4256807d

388+** The share index "s".
389+** A conversion of the 16-to-64-byte BIP-0032 HD master seed to bech32:
390+*** Start with the bits of the master seed, most significant bit per byte first.
391+*** Re-arrange those bits into groups of 5, and pad with arbitrary bits at the end if needed.
392+*** Translate those bits to characters using the bech32 character table from BIP-0173.
393+

BenWestgate commented at 9:05 am on December 16, 2025:

Delete this duplicate section.

in bip-0093.mediawiki:480 in 8e4256807d outdated

475+
476+SatoshiLabs maintains a full list of registered human-readable parts for other uses of codex32:
477+
478+[https://github.com/satoshilabs/slips/blob/master/slip-0173.md#uses-of-codex32 SLIP-0173 : Registered human-readable parts for BIP-0093]
479+
480+The sequence of lower 5 bits of each character's US-ASCII value in a registered codex32 human-readable part SHOULD be unique.

BenWestgate commented at 9:07 am on December 16, 2025:

This sentence is a mouthful, I’ll need to think about how to say it more clearly.

in bip-0093.mediawiki:627 in 8e4256807d outdated

622+* HSM secret (hex): <code>82f5805deee7834842444d455c8aaab40b2fae229e65c2f38408d576b7b6d2fe08</code>
623+
624+
625+
626+
627 ===Invalid test vectors===

BenWestgate commented at 9:10 am on December 16, 2025:

I’m working on a list of BIP-0173 style valid and invalid vectors and will add them when complete.

BenWestgate commented at 6:54 am on December 17, 2025:

I will be adding these as test vectors for hrp generalized codex32 decode.

 0VALID_CODEX32 = [
 1    "A12UEL5LLGCHJ4UJCQVHG",
 2    "a12uel5llgchj4ujcqvhg",
 3    "a74characterlonghumanreadablepartcontainingnumber1andexcludedcharactersbio15tttgsdupy3h58nvmja",
 4    "abcdef13qpzry9x8gf2tvdw0s3jn54khce6mua7lclc606q3t75r4",
 5    "1199999999999999999999999999999999999999999999999999999999999999999999999999999997f7ekwq8dq7tm",
 6    "split12checkupstagehandshakeupstreamerranterredcaperred75pe8uz2kh9ey",
 7    "?13zyfclf624rkvjcl35t",
 8]
 9
10VALID_CODEX32_LONG = [
11    "A12UEL5LQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQV3RR8ZLCK96GTC3",
12    "a12uel5lqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqv3rr8zlck96gtc3",
13    "a1002characterlonghumanreadablepartthatcontainsthenumber1,theexcludedcharactersbio,andeveryus-asciicharacterin[33-126]!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~15ttgtscr3gvktxamm8mzt",
14    "abcdef12l7aum6echk45nj3s0wdvt2fg8x9yrzpql7aum6echk45nj3s0wdvt2fg8x9yrzpql7aum6echk45nj3s0wdvt2fg8x9yrzpqp9evrmhc52umqew",
15    "1177777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777fn0jxg9gc35xwa8",
16    "split13checkupstagehandshakeupstreamerranterredcaperredscatteredsusurrantplunderedqsp5ws8r2klm66l",
17    "?17v59aaqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq2uwygewx4t4gps0",
18]
19INVALID_CODEX32 = {
20    " 12fauxxpgjxu9gyqhql4",  # HRP character out of range
21    "\x7f" + "12fauxxk7kd7xqlns9mj",  # HRP character out of range
22    "\x80" + "12fauxxgqp5ecwf5kzg3",  # HRP character out of range
23    # overall max length exceeded
24    "a75characterslonghumanreadablepartcontainingnumber1andexcludedcharactersbio12fauxxau7wnkdhzp90r",
25    "x12fauxbhf2k7v7ay7ua5",  # Invalid data character
26    "li12fauxxz4pdg55uwav3",  # Too short checksum
27    "de12fauxxrmt7mj886swl" + "\xff",  # Invalid character in checksum
28    "A12FAUXXMRQDLRATCD0WJ",  # Checksum calculated with uppercase form of HRP
29}
30
31INVALID_CODEX32_LONG = {
32    " 12fauxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx8qt67zg4n9sqylv",  # HRP character out of range
33    "\x7f"
34    + "12fauxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx836hdd09mhkhkhx",  # HRP character out of range
35    "\x80"
36    + "12fauxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxlhf5ywnkmk4r3tc",  # HRP character out of range
37    # overall max length exceeded
38    "a1003characterslonghumanreadablepartthatcontainsthenumber1,theexcludedcharactersbio,andeveryus-asciicharacterin[33-126]!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~!\"#$%&'()*+,-./0123456789:;<=>?@[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~12fauxxru38cppmlpu0t6l",
39    "y12bfauxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxt3y5fewy4gnw2hs",  # Invalid data character
40    "lt12ifauxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxjxd0ehq868vm3zl",  # Invalid data character
41    "in12fauxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxvgegljrsvs5w9q",  # Too short checksum
42    "mm12fauxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxbz9tqm7y53swfaw",  # Invalid character in checksum
43    "au12fauxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxoqz9gl44za2owxc",  # Invalid character in checksum
44    "M12FAUXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX4DZ47P062DVJUNM",  # Checksum calculated with uppercase form of HRP
45}

BenWestgate commented at 0:13 am on December 21, 2025:

My implementation is passing these test vectors and Bech32/Bech32m vectors:

"""Reference implementation for codex32/Long codex32 and codex32-encoded master seeds."""

from enum import Enum

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

# Bech32 generator from BIP-0173; referenced by the Encoding enum below.
BECH32_GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]
CODEX32_GEN = [
    0x19DC500CE73FDE210,
    0x1BFAE00DEF77FE529,
    0x1FBD920FFFE7BEE52,
    0x1739640BDEEE3FDAD,
    0x07729A039CFC75F5A,
]
CODEX32_LONG_GEN = [
    0x3D59D273535EA62D897,
    0x7A9BECB6361C6C51507,
    0x543F9B7E6C38D8A2A0E,
    0x0C577EAECCF1990D13C,
    0x1887F74F8DC71B10651,
]

# Minimal placeholder exception types (assumed; the posted snippet did not define them).
class InvalidLength(ValueError): pass
class InvalidChecksum(ValueError): pass
class InvalidChar(ValueError): pass
class InvalidCase(ValueError): pass
class InvalidThreshold(ValueError): pass
class InvalidShareIndex(ValueError): pass

class Encoding(Enum):
    """Enumeration type to list the various supported encodings."""

    # (generator, checksum length in characters, target residue)
    CODEX32 = (CODEX32_GEN, 13, 0x10CE0795C2FD1E62A)
    CODEX32_LONG = (CODEX32_LONG_GEN, 15, 0x43381E570BF4798AB26)
    BECH32 = (BECH32_GEN, 6, 1)
    BECH32M = (BECH32_GEN, 6, 0x2BC830A3)

    def __init__(self, gen, cs_len, const):
        self.gen = gen
        self.cs_len = cs_len
        self.shift = len(self.gen) * (self.cs_len - 1)
        self.const = const
        self.mask = (1 << self.shift) - 1

    def polymod(self, values: list[int], residue=1):
        """Internal function that computes the Codex32 checksums."""
        for value in values:
            top = residue >> self.shift
            residue = (residue & self.mask) << len(self.gen) ^ value
            for i, g in enumerate(self.gen):
                residue ^= g if ((top >> i) & 1) else 0
        return residue

def bech32_hrp_expand(hrp):
    """Expand the HRP into values for checksum computation."""
    return [ord(x) >> 5 for x in hrp] + [0] + [ord(x) & 31 for x in hrp]

def codex32_verify_checksum(hrp, data):
    """Verify a checksum given HRP and converted data characters."""
    if len(hrp) + len(data) >= 96:  # See Long codex32 Strings
        spec = Encoding.CODEX32_LONG
    elif len(hrp) + len(data) <= 93:
        spec = Encoding.CODEX32
    else:
        raise InvalidLength(f"{len(hrp) + len(data)} characters not valid for Codex32")
    if spec.polymod(bech32_hrp_expand(hrp) + data) == spec.const:
        return spec
    raise InvalidChecksum(spec.name)

def bech32_create_checksum(values, spec: Encoding):
    """Compute the checksum values given HRP and data."""
    polymod = spec.polymod(values + [0] * spec.cs_len) ^ spec.const
    return [(polymod >> 5 * (spec.cs_len - 1 - i)) & 31 for i in range(spec.cs_len)]

def bech32_encode(hrp, data, spec):
    """Compute a Bech32 string given HRP and data values."""
    combined = data + bech32_create_checksum(bech32_hrp_expand(hrp) + data, spec)
    return hrp + "1" + "".join(CHARSET[d] for d in combined)

def codex32_encode(hrp, data):
    """Compute a Codex32/Codex32 Long string given HRP and data values."""
    # 80 unchecked characters plus the 13-character checksum gives the 93 limit.
    spec = Encoding.CODEX32_LONG if len(hrp) + len(data) > 80 else Encoding.CODEX32
    return bech32_encode(hrp, data, spec)

def bech32_to_u5(bech: str):
    """Map bech32 data-part string -> list of 5-bit integers (0-31)."""
    for i, ch in enumerate(bech.lower()):
        if ch not in CHARSET:
            raise InvalidChar(f"{ch!r} at pos={i} in data part")
    return [CHARSET.find(x) for x in bech.lower()]

def _decode(bech: str):
    """Decode a Bech32/Bech32m string, and determine HRP and data."""
    for i, ch in enumerate(bech):
        if ord(ch) < 33 or ord(ch) > 126:
            raise InvalidChar(f"non-printable U+{ord(ch):04X} at pos={i}")
    if bech.lower() != bech and bech.upper() != bech:
        raise InvalidCase
    bech = bech.lower()
    pos = bech.rfind("1")
    hrp = bech[:pos]
    data = bech32_to_u5(bech[pos + 1 :])
    return pos, hrp, data

def codex32_decode(codex: str):
    """Validate a Codex32/Codex32 Long string, and determine HRP and data."""
    pos, hrp, data = _decode(codex)
    if pos < 1 or pos + 20 > len(codex) or len(codex) > 1024:
        raise InvalidLength(f"{len(codex)-1}")
    if not codex[pos + 1].isdigit():
        raise InvalidThreshold
    # Threshold "0" (value 15) is only valid with share index "s" (value 16).
    if data[0] == 15 and data[5] != 16:
        raise InvalidShareIndex
    spec = codex32_verify_checksum(hrp, data)
    return hrp, data[: -spec.cs_len], spec

BenWestgate commented at 9:03 am on January 22, 2026:

If the reviewer agrees that having no minimum length for Long codex32 is a good solution, then the “too short checksum” vector needs to be shortened to 20 data-part characters.

BenWestgate commented at 9:20 am on December 16, 2025: contributor

Covers some major decisions, areas where I need feedback, and flagged obvious mistakes to fix.

Remove duplicate master seed format section

Removed the duplicate section detailing the master seed format for codex32.

ca09f9bd0c

BenWestgate commented at 9:46 pm on January 5, 2026: contributor

It looks like the simplest and best direction for length switching is based on:

def codex32_verify_checksum(hrp, data):
    """Verify a checksum given HRP and converted data characters."""
    if 93 < len(hrp) + len(data) < 96:
        raise InvalidLength
    spec = Encoding.CODEX32 if len(hrp) + len(data) <= 93 else Encoding.CODEX32_LONG
    return verify_checksum(bech32_hrp_expand(hrp) + data, spec)

This “wrong checksum for their given data sizes” test vector now should pass: "ms10fauxsxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxv70wkzrjr4ntqet"

Its data part alone would have been 95 characters long, but with the HRP included in the count it is now valid. Mark it as an xfail?

Is it time to make that helper PR where we restrict master seed payload lengths (“ms”, other applications can stay 19-1023 as discussed)?

roconnor commented at 3:46 pm on January 6, 2026: none

if 93 < len(hrp) + len(data) < 96:

I don’t understand the logic behind this condition.

BenWestgate commented at 9:54 pm on January 7, 2026: contributor

if 93 < len(hrp) + len(data) < 96:
I don’t understand the logic behind this condition.

The condition enforces the gap between regular and long codex32. Total covered length is len(hrp) + len(data) per BIP-0173 (this PR aligns). Values ≤93 use CODEX32, values ≥96 use CODEX32_LONG. Lengths 94–95 are explicitly invalid, so this check rejects that range before selecting a spec.

It saves a line versus saying the equivalent:

def codex32_verify_checksum(hrp, data):
    """Verify a checksum given HRP and converted data characters."""
    if len(hrp) + len(data) >= 96:           # See Long codex32 Strings
        return Encoding.CODEX32_LONG.verify_checksum(bech32_hrp_expand(hrp) + data)
    if len(hrp) + len(data) <= 93:
        return Encoding.CODEX32.verify_checksum(bech32_hrp_expand(hrp) + data)
    raise InvalidLength
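
The 94–95 gap described above can be reproduced with a little arithmetic. This is only an illustrative sketch: `total_length` and `SWITCH` are hypothetical names, and it assumes the encoder switches to the long checksum once `len(hrp) + len(data)` (before the checksum is appended) exceeds 80, as in the reference implementation posted earlier in this thread.

```python
REGULAR_CS_LEN = 13  # regular codex32 checksum length, in characters
LONG_CS_LEN = 15     # long codex32 checksum length, in characters
SWITCH = 80          # assumed: largest unchecked length that keeps the regular checksum


def total_length(unchecked_len: int) -> int:
    """Return len(hrp) + len(data) once the checksum has been appended."""
    cs_len = REGULAR_CS_LEN if unchecked_len <= SWITCH else LONG_CS_LEN
    return unchecked_len + cs_len


# Collect every total length an encoder following this rule can produce.
reachable = {total_length(n) for n in range(1, 200)}
unreachable = [n for n in range(93, 97) if n not in reachable]
print(unreachable)  # [94, 95]
```

Because the long checksum is two characters longer, the first long string jumps straight from 93 to 96, so no valid string can land on 94 or 95.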

roconnor commented at 9:57 pm on January 7, 2026: none

Oh right, because the long checksum is longer. I remember now.

BenWestgate commented at 9:32 pm on January 8, 2026: contributor

@roconnor I have written the length restriction helper PR for this PR. I’ll see you there.

BenWestgate commented at 6:49 pm on January 21, 2026: contributor

Multiple reviewers suggested restricting “ms” lengths to avoid having a different switchover between regular and long codex32 checksums, since “ms” is currently uncovered at the longest length. #2077 addresses that by allowing only master seeds whose length is a multiple of 32 bits.

However, another way to avoid this is to make the choice of checksum application-specific rather than fixing it at the specification level. After all, without the HRP, we do not even know whether the data is regular codex32, long codex32, or Bech32; it could even validate for all three.

Here, we only need to define how to encode/decode master seeds, including the details of their checksum switchover and invalid lengths.

This follows the BIP173 trend where some applications, lightning invoices come to mind, defy the 90 character limit of the Bech32 checksum. We simply define two checksums and state their maximum string length for error correction guarantees.

This should result in a much smaller diff. Shall I proceed with this simpler approach?
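
To make the application-selected idea concrete, here is a minimal sketch in which the caller passes the checksum parameters explicitly instead of having them inferred from string length. `Spec`, `polymod`, `hrp_expand` and `verify` are hypothetical names, not the PR's API. The Bech32 and Bech32m constants are used only because they are short and have well-known vectors; codex32 parameters would slot in the same way.

```python
from typing import NamedTuple

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
BECH32_GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


class Spec(NamedTuple):
    """Hypothetical bundle of checksum parameters chosen by the application."""
    gen: list
    cs_len: int
    const: int


BECH32 = Spec(BECH32_GEN, 6, 1)
BECH32M = Spec(BECH32_GEN, 6, 0x2BC830A3)


def polymod(values, spec):
    """BCH residue over 5-bit values, parameterised by the selected spec."""
    shift = 5 * (spec.cs_len - 1)
    mask = (1 << shift) - 1
    residue = 1
    for value in values:
        top = residue >> shift
        residue = ((residue & mask) << 5) ^ value
        for i, g in enumerate(spec.gen):
            if (top >> i) & 1:
                residue ^= g
    return residue


def hrp_expand(hrp):
    """Expand the HRP into 5-bit values, as in BIP-0173."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def verify(string, spec):
    """Check `string` against the checksum the caller selected."""
    string = string.lower()
    pos = string.rfind("1")
    hrp = string[:pos]
    data = [CHARSET.index(c) for c in string[pos + 1:]]
    return polymod(hrp_expand(hrp) + data, spec) == spec.const


print(verify("A12UEL5L", BECH32), verify("A12UEL5L", BECH32M))
print(verify("A1LQFN3A", BECH32M), verify("A1LQFN3A", BECH32))
```

`A12UEL5L` and `A1LQFN3A` are valid vectors from BIP-0173 and BIP-0350 respectively; each passes only under the spec the caller asks for, which is the disambiguation burden this approach shifts onto the application.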

Make codex32 checksum selection length agnostic c14d242c64

roconnor commented at 0:24 am on January 22, 2026: none

Apologies, I probably won’t get to reviewing this until Feb.

in bip-0093.mediawiki:74 in c14d242c64

81+** A share index, which is any bech32 character.
82+*** Note that a share index value of "s" (or "S") is special and denotes the unshared secret (see section '''Unshared Secret''').
83+** A payload which is a sequence of up to 74 bech32 characters. (However, see '''Long codex32''' below for an exception to this limit.)
84 ** A checksum which consists of 13 bech32 characters as described below.
85 
86+As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase.

BenWestgate commented at 1:35 am on January 22, 2026:

in bip-0093.mediawiki:77 in c14d242c64

84 ** A checksum which consists of 13 bech32 characters as described below.
85 
86+As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase. 
87+The lowercase form of the human-readable part is used when determining a character's value for checksum purposes.
88+For presentation, lowercase is usually preferable, but uppercase SHOULD be used for handwritten codex32 strings.
89+If a codex32 string is encoded in a QR code, it SHOULD use the uppercase form, as this is encoded more compactly.

BenWestgate commented at 1:38 am on January 22, 2026:

in bip-0093.mediawiki:78 in c14d242c64

85 
86+As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase. 
87+The lowercase form of the human-readable part is used when determining a character's value for checksum purposes.
88+For presentation, lowercase is usually preferable, but uppercase SHOULD be used for handwritten codex32 strings.
89+If a codex32 string is encoded in a QR code, it SHOULD use the uppercase form, as this is encoded more compactly.
90+

BenWestgate commented at 1:38 am on January 22, 2026:

BenWestgate commented at 1:40 am on January 22, 2026: contributor

duplicate lines
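
For reference, the case rule quoted in the hunks above could be sketched as follows. `normalize_case` and `qr_form` are hypothetical helper names, and the error type is a placeholder.

```python
def normalize_case(codex: str) -> str:
    """Reject mixed case and return the lowercase form used for checksum purposes."""
    if codex != codex.lower() and codex != codex.upper():
        raise ValueError("mixed-case codex32 string")
    return codex.lower()


def qr_form(codex: str) -> str:
    """Return the uppercase form, which the quoted text recommends for QR codes."""
    return normalize_case(codex).upper()
```

Checksum computation would always run on the result of `normalize_case`, so an all-uppercase string and its lowercase twin verify identically.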

Removed duplicate lines

Clarify encoding requirements for codex32 strings.

0f0c58e5c9

Replace accidentally deleted section

Clarify codex32 specifications, including checksum details and error correction capabilities.

92d091c6c3

in bip-0093.mediawiki:63 in 0f0c58e5c9

67-
68 A codex32 string is similar to a bech32 string defined in [https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki BIP-0173].
69 It reuses the base-32 character set from BIP-0173, and consists of:
70 
71-* A human-readable part, which is the string "ms" (or "MS").
72+* The human-readable part, as specified in BIP-0173.

BenWestgate commented at 1:49 am on January 22, 2026:

* A human-readable part, as specified in BIP-0173.

in bip-0093.mediawiki:141 in 92d091c6c3

157     return [(polymod >> 5 * (12 - i)) & 31 for i in range(13)]
158 </source>
159+
160 This implements a [https://en.wikipedia.org/wiki/BCH_code BCH code] that
161-guarantees detection of '''any error affecting at most 8 characters'''
162+guarantees detection of '''any error affecting at most 8 characters''' in codex32 strings up to 94 characters long,

BenWestgate commented at 8:12 am on January 22, 2026:

guarantees detection of '''any error affecting at most 8 characters''' in strings up to 94 characters long,

BenWestgate commented at 9:17 am on January 22, 2026:

This is the key difference from choosing the checksum based on length limits, and applications that need to disambiguate different HRPs will need to bear it in mind. The master seed format does not, because our wallet guidance prefills MS1 on import.

For long codex I kept the maximum length at what covers the [low hrp] + data as per BIP-0173.

in bip-0093.mediawiki:191 in 92d091c6c3 outdated

203+        return None, None
204     codex = codex.lower()
205-    pos = codex.rfind("1")
206-    if pos < 2 or not (48 <= len(codex) <= 127):
207-        return None
208+    pos = codex.rfind('1')

BenWestgate commented at 8:14 am on January 22, 2026:

    pos = codex.rfind("1")

in bip-0093.mediawiki:184 in 92d091c6c3

191-def ms32_encode(data):
192-    combined = data + ms32_create_checksum(data)
193-    return "ms" + "1" + ''.join([CHARSET[d] for d in combined])
194+def codex32_encode(hrp, data, spec):
195+    combined = data + codex32_create_checksum(hrp, data, spec)
196+    return hrp + '1' + ''.join([CHARSET[d] for d in combined])

BenWestgate commented at 8:15 am on January 22, 2026:

    return hrp + "1" + "".join([CHARSET[d] for d in combined])

in bip-0093.mediawiki:257 in 92d091c6c3

253+        # codex32-encoded master seeds are never 97-99 characters long.
254+        return None, None
255+    hrpgot, data = codex32_decode(codex, spec)
256+    if hrpgot != "ms":
257+        return None, None
258+    header = u5_to_bech32(data[:6])

BenWestgate commented at 8:21 am on January 22, 2026:

    header = codex[3:9].lower()

in bip-0093.mediawiki:270 in 92d091c6c3

266+    # Success.
267+    return header, decoded
268+
269+def ms_encode(header, seed):
270+    spec = Encoding.CODEX32 if len(seed) < 47 else Encoding.LONG_CODEX32
271+    ret = codex32_encode("ms", bech32_to_u5(header) + convertbits(witprog, 8, 5), spec)

BenWestgate commented at 8:24 am on January 22, 2026:

    ret = codex32_encode("ms", [CHARSET.index(x) for x in header] + convertbits(seed, 8, 5), spec)
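
As a rough illustration of what the suggested expression builds, the sketch below converts a header and a seed into 5-bit values. `convertbits` follows the BIP-0173 reference code; `ms_data` is a hypothetical helper. It pads with zero bits, which is an assumption made for simplicity and not the CRC padding this PR recommends.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def convertbits(data, frombits, tobits, pad=True):
    """General power-of-2 base conversion, after the BIP-0173 reference code."""
    acc = 0
    bits = 0
    ret = []
    maxv = (1 << tobits) - 1
    max_acc = (1 << (frombits + tobits - 1)) - 1
    for value in data:
        if value < 0 or (value >> frombits):
            return None
        acc = ((acc << frombits) | value) & max_acc
        bits += frombits
        while bits >= tobits:
            bits -= tobits
            ret.append((acc >> bits) & maxv)
    if pad:
        if bits:
            # Zero-fill the final group (assumption; the PR suggests CRC padding).
            ret.append((acc << (tobits - bits)) & maxv)
    elif bits >= frombits or ((acc << (tobits - bits)) & maxv):
        return None
    return ret


def ms_data(header: str, seed: bytes):
    """Hypothetical helper: 5-bit values for the header followed by the seed payload."""
    return [CHARSET.index(x) for x in header.lower()] + convertbits(seed, 8, 5)


values = ms_data("0tests", bytes.fromhex("318c6318c6318c6318c6318c6318c631"))
print(len(values))  # 32: 6 header characters + 26 payload characters
```

A 128-bit seed fills 25 full 5-bit groups and leaves 3 bits over, so the 26th payload character carries 2 padding bits.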

in bip-0093.mediawiki:521 in 92d091c6c3

516+
517+SatoshiLabs maintains a full list of registered human-readable parts for other uses of codex32:
518+
519+[https://github.com/satoshilabs/slips/blob/master/slip-0173.md#uses-of-codex32 SLIP-0173 : Registered human-readable parts for BIP-0093]
520+
521+The sequence of lower 5 bits of each character's US-ASCII value in a registered codex32 human-readable part SHOULD be unique.

BenWestgate commented at 8:50 am on January 22, 2026:

A registered codex32 human-readable part SHOULD have a unique sequence of lower 5 bits across its characters' US-ASCII values.

in bip-0093.mediawiki:522 in 92d091c6c3

517+SatoshiLabs maintains a full list of registered human-readable parts for other uses of codex32:
518+
519+[https://github.com/satoshilabs/slips/blob/master/slip-0173.md#uses-of-codex32 SLIP-0173 : Registered human-readable parts for BIP-0093]
520+
521+The sequence of lower 5 bits of each character's US-ASCII value in a registered codex32 human-readable part SHOULD be unique.
522+This makes codex32 HRP error correction possible for applications choosing to implement it.

BenWestgate commented at 8:55 am on January 22, 2026:

This helps codex32 HRP error correction for applications choosing to specify how to do this.
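
A small sketch of the uniqueness property discussed in this hunk. `low5` and `has_unique_low5` are hypothetical names, and the HRP lists are examples rather than the SLIP-0173 registry.

```python
def low5(hrp: str):
    """Lower 5 bits of each character's US-ASCII value."""
    return [ord(c) & 31 for c in hrp]


def has_unique_low5(hrps):
    """True when no two HRPs share the same low-5-bit sequence."""
    seqs = [tuple(low5(h)) for h in hrps]
    return len(seqs) == len(set(seqs))


print(low5("ms"), low5("cl"))          # [13, 19] [3, 12]
print(has_unique_low5(["ms", "cl"]))   # True
print(has_unique_low5(["ms", "m3"]))   # False: "s" and "3" share their low 5 bits
```

Two HRPs that collide here differ only in the high bits of the expanded HRP, which is why keeping the sequences distinct helps any application that wants to correct errors inside the HRP itself.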

Apply suggestions from my code review 89ec67fe25

BenWestgate commented at 9:28 am on January 22, 2026: contributor

Apologies, I probably won’t get to reviewing this until Feb.

No worries. If you agree that the application—rather than the string length—should determine which checksum to use, I can close #2077 to save others time.

I slightly prefer leaving it up to the application, even though it requires adding a spec parameter to codex32_decode and codex32_encode.

This seems slightly more correct, as applications could also choose Bech32/Bech32m if they enforce the codex32 header specification, as interpolation would still function.

double quote "1" string 65dfc5fbbf

Enhance master seed decoding details

Added decoding instructions and example code for secret seeds.

afd70b7063

murchandamus commented at 11:15 pm on February 27, 2026: member

@roconnor: I think this may be in your court.

apoelstra commented at 1:35 am on February 28, 2026: contributor

It’s also on my queue to review.

BIP93: Generalize codex32 format for any hrp and fix typos #2040

@.**** commented on this pull request.