Update Abstract to align with specification: threshold can be 0 for unshared secret or 2-9 for shares.
The previous “between 1 and 9” range appears inconsistent with the detailed spec (threshold=1 isn’t valid in codex32).
@@ -20,7 +20,7 @@ This document describes a standard for backing up and restoring the master seed
 [https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki BIP-0032] hierarchical deterministic wallet, using Shamir's secret sharing.
 It includes an encoding format, a BCH error-correcting checksum, and algorithms for share generation and secret recovery.
 Secret data can be split into up to 31 shares.
-A minimum threshold of shares, which can be between 1 and 9, is needed to recover the secret, whereas without sufficient shares, no information about the secret is recoverable.
+A minimum threshold of shares, which can be 0 (for unshared secret) or between 2 and 9 (for shares), is needed to recover the secret, whereas without sufficient shares, no information about the secret is recoverable.
Not sure about this change.
The current abstract states “between 1 and 9” for what “is needed to recover the secret”.
And the Recovering Master Seed section stipulates: “The first character of the data part indicates the threshold of the share, and it is required to be a non-“0” digit.”
So these two excerpts seem to concur? Perhaps it could be clearer.
cc @apoelstra for feedback
It’s correct as it is.
Threshold is a value 1 through 9 and refers to the number of strings needed to recover the seed.
k is the literal first numeric character of the bech32 data so it cannot be “1” even if the threshold is 1. “0” is recommended for unshared secrets (threshold 1) although any numeric value is allowed as it is ignored when share_idx = "s".
Whether it is clear or not is another matter. You’d have to read the body to know these details, so I think the abstract is fine. There may be some conflation of “threshold” with “threshold digit” (called k in the codex book and many reference implementations), which is what led you to open this PR.
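To make the threshold vs. threshold-digit distinction concrete, here is a minimal Python sketch of how a decoder might read the header. `classify` is a hypothetical helper, not part of the BIP-93 reference code; it assumes the data part starts with k, a 4-character identifier and the share index, and it deliberately skips checksum verification and mixed-case rejection. The example strings are the test-vector strings quoted later in this thread.

```python
# Bech32 alphabet used by codex32 (BIP-0173); it contains no "1".
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def classify(codex32_str):
    """Read the header of a codex32 string and say what it is.

    Sketch only: it does NOT verify the checksum and does NOT reject
    mixed-case input, both of which a real decoder must do.
    """
    s = codex32_str.lower()
    hrp, sep, data = s.rpartition("1")  # the "1" separator never occurs in data
    if hrp != "ms" or not sep:
        raise ValueError("expected human-readable part 'ms'")
    if len(data) < 6 or any(c not in CHARSET for c in data):
        raise ValueError("data part too short or not bech32")
    k, ident, share_idx = data[0], data[1:5], data[5]
    if not k.isdigit():  # only "0" and "2".."9" can occur here
        raise ValueError("threshold digit k must be a digit")
    if k == "0" and share_idx != "s":
        raise ValueError("k = '0' requires share index 's'")
    if share_idx == "s":
        # codex32 secret: k is ignored, and this one string is the secret
        return {"kind": "secret", "k": k, "id": ident}
    # codex32 share: k ("2".."9") is the number of shares needed to recover
    return {"kind": "share", "threshold": int(k), "id": ident, "index": share_idx}


print(classify("ms13cashsllhdmn9m42vcsamx24zrxgs3qqjzqud4m0d6nln"))
print(classify("MS12NAMEDLL4F8JLH4E5VDVULDLFXU2JHDNLSM97XVENRXEG"))
```

Note that `ms13cash...` classifies as a codex32 secret even though k is "3": with share index "s" the digit carries no threshold meaning, which is the point made above.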
They call the threshold parameter k in the codex book but use t or threshold in this BIP.
I can see how this might be confusing.
Thanks for the review and giving a justified correction!
Is this closer to what you suggested?
@@ -67,8 +67,8 @@ It reuses the base-32 character set from BIP-0173, and consists of:
 * A human-readable part, which is the string "ms" (or "MS").
 * A separator, which is always "1".
 * A data part which is in turn subdivided into:
-** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
-*** If the threshold parameter is "0" then the share index, defined below, MUST have a value of "s" (or "S").
+** A threshold digit (also called ''k'' in the codex book), which MUST be a single digit between "2" and "9", or the digit "0". This digit encodes the threshold (the number of shares required for recovery), where threshold 1 is encoded as "0" for unshared secrets, and thresholds 2-9 are encoded as digits "2"-"9" for shared secrets.
I don’t know if we need to mention what the digit is called in the Codex32 book.
This digit does not always directly encode the threshold number of strings required for recovery.
Threshold 1 is denoted by share index “s” NOT the first data character being “0”. “0” is merely a recommendation, and if used, forces the share index to “s”.
@@ -67,8 +67,8 @@ It reuses the base-32 character set from BIP-0173, and consists of:
 * A human-readable part, which is the string "ms" (or "MS").
 * A separator, which is always "1".
 * A data part which is in turn subdivided into:
-** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
-*** If the threshold parameter is "0" then the share index, defined below, MUST have a value of "s" (or "S").
+** A threshold digit (also called ''k'' in the codex book), which MUST be a single digit between "2" and "9", or the digit "0". This digit encodes the threshold (the number of shares required for recovery), where threshold 1 is encoded as "0" for unshared secrets, and thresholds 2-9 are encoded as digits "2"-"9" for shared secrets.
+*** If the threshold digit is "0" then the share index, defined below, MUST have a value of "s" (or "S").
@@ -146,14 +146,14 @@ The master seed is decoded by converting the payload to bytes:

 Note that unlike the decoding process in BIP-0173, we do NOT require that the incomplete group be all zeros.

-For an unshared secret, the threshold parameter (the first character of the data part) is ignored (beyond the fact it must be a digit for the codex32 string to be valid).
-We recommend using the digit "0" for the threshold parameter in this case.
+For an unshared secret, the threshold digit (the first character of the data part, also called ''k'') is ignored (beyond the fact it must be a digit for the codex32 string to be valid).
k for the threshold parameter. We definitely should not keep repeating both nomenclatures.
@@ -146,14 +146,14 @@ The master seed is decoded by converting the payload to bytes:

 Note that unlike the decoding process in BIP-0173, we do NOT require that the incomplete group be all zeros.

-For an unshared secret, the threshold parameter (the first character of the data part) is ignored (beyond the fact it must be a digit for the codex32 string to be valid).
-We recommend using the digit "0" for the threshold parameter in this case.
+For an unshared secret, the threshold digit (the first character of the data part, also called ''k'') is ignored (beyond the fact it must be a digit for the codex32 string to be valid).
+We recommend using the digit "0" for the threshold digit in this case, which encodes a threshold of 1 (no sharing).
 The 4 character identifier also has no effect beyond aiding users in distinguishing between multiple different master seeds in cases where they have more than one.

 ===Recovering Master Seed===

 When the share index of a valid codex32 string (converted to lowercase) is not the letter "s", we call the string an codex32 share.
-The first character of the data part indicates the threshold of the share, and it is required to be a non-"0" digit.
digit vs parameter changes, or at least move them to their own commit so they are easy to ignore. I agree with your PR summary, that “between 1 and 9” is wrong, but I can’t find the text “between 1 and 9” anywhere in the document or in your diff. This feels to me like LLM slop. At the very least I cannot review this PR in this state.
@apoelstra: The last sentence of the Abstract of the currently published version of BIP 93 reads:
A minimum threshold of shares, which can be between 1 and 9, is needed to recover the secret, whereas without sufficient shares, no information about the secret is recoverable.
@Lil-Duckling-22: Please incorporate the requested changes.
Apparently previous “A minimum threshold of shares, which can be between 1 and 9, is needed to recover the secret” range appears inconsistent with the detailed spec (threshold=1 isn’t valid in codex32).
One string with share index “s” can recover the secret, but it is the secret. Further, the term “threshold” (not “threshold parameter”) always refers to shares:
The relevant definitions proving this text is wrong:
secrets:
Note that a share index value of “s” (or “S”) is special and denotes the unshared secret (see section “Unshared Secret”). When the share index of a valid codex32 string (converted to lowercase) is the letter “s”, we call the string a codex32 secret. For an unshared secret, the threshold parameter (the first character of the data part) is ignored (beyond the fact it must be a digit for the codex32 string to be valid).
shares:
When the share index of a valid codex32 string (converted to lowercase) is not the letter “s”, we call the string an codex32 share. The first character of the data part indicates the threshold of the share, and it is required to be a non-“0” digit.
recover:
In order to recover a master seed, one needs a set of valid codex32 shares such that:
- All shares have the same threshold value, the same identifier, and the same length.
@apoelstra: Based on these details, the correct abstract text would be:
Secret data can be directly encoded or split into up to 31 shares. A minimum threshold of shares, which can be between 2 and 9, is needed to recover the secret, whereas without sufficient shares, no information about the secret is recoverable.
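The recovery preconditions quoted above could be checked roughly as follows. `can_recover` is a hypothetical helper; it assumes each string has already passed checksum validation. Only the "same threshold value, identifier and length" rule is quoted in this thread; the distinct-index and "exactly k shares" checks are our reading of the BIP's Recovering Master Seed section, so verify them against the BIP before relying on this.

```python
def can_recover(shares):
    """Check whether a set of codex32 shares meets the recovery preconditions.

    Sketch only: assumes every string already passed checksum validation.
    The distinct-index and exact-count checks are our reading of the BIP.
    """
    if not shares:
        return False
    data = [s.lower().rpartition("1")[2] for s in shares]  # part after "ms1"
    if len({d[0] for d in data}) != 1:      # same threshold value
        return False
    if len({d[1:5] for d in data}) != 1:    # same identifier
        return False
    if len({len(s) for s in shares}) != 1:  # same length
        return False
    k = data[0][0]
    indices = [d[5] for d in data]
    if k == "0" or "s" in indices:
        return False  # only codex32 shares (index != "s", k in "2".."9")
    if len(set(indices)) != len(indices):
        return False  # share indices must be distinct
    return len(shares) == int(k)  # exactly k shares
```

A codex32 secret (share index "s") is rejected here on purpose: as argued above, it is the secret itself, not one of the shares being combined.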
Further, our test vectors use the oxymoron “secret share”; it should probably be replaced with “codex32 secret”:
- Derived share with index D: MS12NAMEDLL4F8JLH4E5VDVULDLFXU2JHDNLSM97XVENRXEG
- Secret share with index S: MS12NAMES6XQGUZTTXKEQNJSJZV4JV3NZ5K3KWGSPHUH6EVW
Secret share with index s: ms13cashsllhdmn9m42vcsamx24zrxgs3qqjzqud4m0d6nln
Suggested: “Secret with share index s:” → “codex32 secret:” (or “codex32 secret with k value 3:”)
Note that the choice to append two zero bits was arbitrary, and any of the following four secret shares would have been valid choices.
copy the choice above (minus any “with”)
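The “two zero bits” remark follows from the 5-bit to 8-bit regrouping: a 16-byte seed needs 26 payload symbols (130 bits), so the last 2 bits are padding and four strings decode to the same seed. Here is a minimal sketch, assuming BIP-0173-style regrouping in which incomplete trailing bits are dropped without checking that they are zero (as the BIP text quoted earlier says). `payload_to_bytes` is a hypothetical name, and the input is only the payload characters, not a full codex32 string.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def payload_to_bytes(payload):
    """Convert a codex32 payload (bech32 characters) to seed bytes.

    Trailing bits that do not fill a whole byte are dropped without
    checking that they are zero.
    """
    acc, bits, out = 0, 0, bytearray()
    for c in payload.lower():
        acc = (acc << 5) | CHARSET.index(c)  # append 5 bits
        bits += 5
        while bits >= 8:
            bits -= 8
            out.append((acc >> bits) & 0xFF)  # emit the oldest 8 bits
            acc &= (1 << bits) - 1            # keep only unconsumed bits
    return bytes(out)


# Four payloads differing only in the 2 padding bits of the last symbol
# ("y", "9", "x", "8" are 0b00100..0b00111) decode to the same 16 bytes.
variants = ["x" * 25 + c for c in "y9x8"]
assert len({payload_to_bytes(p) for p in variants}) == 1
```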
- Secret share with index S: MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK
Suggested: “Secret with share index S:” → “codex32 secret:”
Vector 5 even has the recommended secret threshold parameter "0". I’m not sure what was going on here: t becomes k in this part of the spec without ever introducing k, although it’s a standard SSS term and doesn’t need introduction. Why did it change for no reason? It seems the vector texts were written for an earlier draft of the spec, perhaps one where uppercase and non-zero threshold parameter codex32 strings were not called codex32 secrets even with share index “S” or “s”.
If you want my opinion, BIP93 should use k everywhere, as that’s what your book uses (and the Wikipedia article on SSS). It also looks less like other lowercase characters. But that’s a bigger diff than just changing the test vector k’s.
Yeah, these changes all sound great to me @BenWestgate. In particular avoiding the term “secret share” to refer to the secret.
cc @roconnor-blockstream what do you think about the terminology changes proposed in the above comment?
Yeah I think that’s a huge improvement.
I certainly was using the term “share” to mean “an interpolated codex32 string at some point”, and thus “secret share” is simply “the share” interpolated at the point ‘S’. However, I agree that is a very bad definition of “share”, as knowing a single share is not supposed to reveal the secret, so “secret share” shouldn’t even be a thing. Instead interpolating the point at ‘S’ yields the “codex32 encoded master seed”. Then, as Ben says, the number of shares you need to recover will indeed vary from 2 to 9.
While I do think “codex32 secret” is fine, I think we should consider “codex32 seed” (or perhaps “codex32 master seed”) as shorthand for “codex32 encoded master seed” and see if we can make it work. This would require more intensive editing to replace some uses of “secret” in the BIP with “seed” or “master seed”.
I certainly was using the term “share” to mean “an interpolated codex32 string at some point”, and thus “secret share” is simply “the share” interpolated at the point ‘S’.
We always use the verb “recover” to say “interpolate to ‘S’.”
We codex32-encode the threshold number of initial strings, interpolate/derive* additional shares, and recover secrets. Even the reference Python has this distinction.
Note: Does “derive” seem too ambiguous vs interpolate?
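The encode / derive / recover vocabulary maps onto Lagrange interpolation over GF(32), done independently for each symbol position. Below is a minimal sketch for k = 2, assuming the field is GF(2)[x]/(x^5 + x^3 + 1) with each bech32 character's index in the alphabet used as its field element; we believe this matches the BIP-93 reference code, but check before relying on it. `gf32_mul`, `gf32_inv` and `interpolate` are our own helper names, and the short symbol lists stand in for whole data parts.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def gf32_mul(a, b):
    """Multiply in GF(32), assuming the modulus x^5 + x^3 + 1 (0b101001)."""
    res = 0
    for i in range(5):
        if (b >> i) & 1:
            res ^= a
        a <<= 1
        if a & 32:          # degree reached 5: reduce by the modulus
            a ^= 0b101001
    return res


def gf32_inv(a):
    """Brute-force inverse; fine for a 32-element field."""
    for b in range(1, 32):
        if gf32_mul(a, b) == 1:
            return b
    raise ZeroDivisionError("0 has no inverse")


def interpolate(points, x):
    """Evaluate at x the lowest-degree polynomial through `points`.

    points: list of (x_j, [y_j symbols]) with distinct x_j and equal-length
    symbol lists. In characteristic 2, subtraction is XOR.
    """
    out = [0] * len(points[0][1])
    for j, (xj, yj) in enumerate(points):
        # Lagrange weight l_j(x) = prod_{m != j} (x - x_m) / (x_j - x_m)
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = gf32_mul(num, x ^ xm)
                den = gf32_mul(den, xj ^ xm)
        w = gf32_mul(num, gf32_inv(den))
        for i, y in enumerate(yj):
            out[i] ^= gf32_mul(w, y)
    return out


# k = 2: the secret at "s" plus one initial share at "a" fix a line.
xs, xa, xc = (CHARSET.index(c) for c in "sac")
secret = [CHARSET.index(c) for c in "namesecretdata"]
share_a = [CHARSET.index(c) for c in "sharexxxxxxxxx"]  # random in practice
share_c = interpolate([(xa, share_a), (xs, secret)], xc)  # derive a share at "c"
assert interpolate([(xa, share_a), (xc, share_c)], xs) == secret  # recover at "s"
```

Any two points on the line recover the secret, which is why "recover" can be read as "interpolate to 's'" while "derive" means interpolating to any other share index.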
Instead interpolating the point at ‘S’ yields the “codex32 encoded master seed”.
Only if the human-readable part is ‘MS’. For ‘CL’ it’s a “Core Lightning HSM secret” which is a private key.
It’s worth adding 3 test vectors for this application https://docs.corelightning.org/reference/exposesecret and updating the checksum Python reference to accept hrp as a parameter instead of rolling it into the initial value.
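The difference between taking the hrp as a parameter and rolling it into the initial value can be illustrated with BIP-0173's bech32 checksum. codex32 uses a different, longer BCH code, so the constants below are bech32's, used purely as a stand-in; the sketch only shows that a precomputed start residue is equivalent to first feeding `hrp_expand(hrp)` through the checksum. The helper names `residue_with_hrp` and `residue_ms_only` are hypothetical.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

# BIP-0173 bech32 generator: a stand-in only; codex32 uses a different code.
BECH32_GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def polymod(values, start=1):
    """bech32 checksum state machine with a configurable starting residue."""
    chk = start
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= BECH32_GEN[i]
    return chk


def hrp_expand(hrp):
    """BIP-0173 expansion of the human-readable part into 5-bit values."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


# Option 1: the hrp is a parameter and is fed through the checksum.
def residue_with_hrp(hrp, data):
    return polymod(hrp_expand(hrp) + data)


# Option 2: one hrp is "rolled into" a precomputed initial residue.
MS_INITIAL = polymod(hrp_expand("ms"))


def residue_ms_only(data):
    return polymod(data, start=MS_INITIAL)


data = [CHARSET.index(c) for c in "0testsxxxx"]
assert residue_with_hrp("ms", data) == residue_ms_only(data)
```

Both options give the same residue for "ms"; only option 1 can handle another prefix such as "cl" without computing a new constant.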
While I do think “codex32 secret” is fine, I think we should consider “codex32 seed” (or perhaps “codex32 master seed”) as shorthand for “codex32 encoded master seed” and see if we can make it work. This would require more intensive editing to replace some uses of “secret” in the BIP with “seed” or “master seed”.
The book uses “secret seed”, uses “secret” as shorthand in headers, and uses “seed” only when it is NOT codex32-encoded. The scheme doesn’t care what the secret is.
We should use “codex32-encoded master seed” or “secret seed” for the specific case of a codex32 secret with prefix ‘MS’.
To avoid oxymorons, let’s call non-“S” strings codex32 “shares” regardless of human readable prefix. This fits as their data is discarded by software after recovery of codex32 secrets.
The website uses the oxymoron “codex32-encoded seed shares” and incorrectly uses “seeds” to mean “shares”. The ‘MS’ specific wallets.md doc uses seed/shares with 1 mention of “codex32 secret” for codex32-encoded data with index ‘S’.
Let’s fix the clear mistakes first, change t to k everywhere, then follow up with generalizing hrp and replace mentions specific to ‘MS’ secrets with a specific term.
Instead interpolating the point at ‘S’ yields the “codex32 encoded master seed”.
Only if the human-readable part is ‘MS’. For ‘CL’ it’s a “Core Lightning HSM secret” which is a private key.
Thanks for noting this. I was going back and forth on whether “seed” should be used because this document is “MS”-specific, or “secret” should be used in case people want to use “codex32” in more generic contexts. Does “codex32” mean BIP-93, or is it more general? Should “secret” be used in the book with “seed” preferred in BIP-93? I don’t have good answers to these questions, and I could be convinced on the “secret” vs. “seed” wording either way.
Instead interpolating the point at ‘S’ yields the “codex32 encoded master seed”.
Only if the human-readable part is ‘MS’. For ‘CL’ it’s a “Core Lightning HSM secret” which is a private key.
Thanks for noting this. I was going back and forth on whether “seed” should be used because this document is “MS”-specific, or “secret” should be used in case people want to use “codex32” in more generic contexts. Does “codex32” mean BIP-93, or is it more general? Should “secret” be used in the book with “seed” preferred in BIP-93? I don’t have good answers to these questions, and I could be convinced on the “secret” vs. “seed” wording either way.
The term “codex32” is being used interchangeably with BIP-93:
codex32 (string): The full codex32-encoded (i.e. BIP-93 encoded) HSM secret.
Other applications (CLN, SLIP-0173) are citing BIP-93 even though it does not explain how to calculate checksums for non-“MS” codex32 strings. This is going to lead to problems (and already did, for me.)
So we should follow the example of BIP-0173:
We first describe the general checksummed base32[1] format called Bech32 and then define Segregated Witness addresses using it.
Proposed BIP-93 text under H1 “Specification”:
We first describe the general checksummed base32 format called codex32 and then define codex32-encoding of BIP-0032 Master Seeds using it.
This second “BIP-0032 Master Seeds” section is our current “Unshared Secret” section.
We should use “codex32” as an adjective only when needed. The book uses it zero times, wallets.md once, and the website only for the encoding and checksum, never for a secret/seed/share/string. The terms “codex32 shares”, “codex32 strings” and “codex32 secrets” IMO present excess jargon to users.
End users only need to know:
The book is printed, so it’s fine to keep “secret seed”; this term is better than “codex32 seed” or “codex32 secret” anyhow. No one will be confused so long as BIP-93 says something to the effect of:
When the human-readable part of a valid codex32 secret (converted to lowercase) is “ms”, we call the secret a secret seed or codex32 master seed. The payload in a secret seed is a direct encoding of a BIP-0032 HD master seed.
These are all great points @BenWestgate. Can you file an issue for the book at https://github.com/BlockstreamResearch/codex32/ (you can’t really PR to fix it because the book was printed off some non-master branch and we have a longstanding todo to merge it in, but it’s slow going because of how messy my old nix code was).
I’ll go through the website and update all uses of “seed” to “secret” and audit all uses of “share”. I don’t think I have a public git repo (or any git repo :grimacing:) for the site..
Edit: I went through all the HTML files and corrected “seed shares” to “shares (or an unsplit codex32-encoded secret)” and corrected “seeds” to “shares” in one place, both in the top-level index.html. Otherwise I believe all the uses of “share”, “seed” and “secret” were correct.
The book avoided using “secret share”; it uses “secret seed”, which to me is shorthand for “codex32-encoded master seed” or an “ms”-prefixed codex32 secret.
“Secret” implies the encoding at secret index “s” while “seed” is the literal payload bytes.
You want a book issue opened to rename “secret seed” to “codex32-encoded master seed” and after introducing the concept, just “secret”? That would reduce the text in both the book and BIP.
You want a book issue opened to rename “secret seed” to “codex32-encoded master seed” and after introducing the concept, just “secret”? That would reduce the text in both the book and BIP.
Yep, this sounds right to me.
@@ -67,8 +67,8 @@ It reuses the base-32 character set from BIP-0173, and consists of:
 * A human-readable part, which is the string "ms" (or "MS").
 * A separator, which is always "1".
 * A data part which is in turn subdivided into:
-** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
-*** If the threshold parameter is "0" then the share index, defined below, MUST have a value of "s" (or "S").
+** A threshold digit (also called ''k'' in the codex book), which MUST be a single digit between "2" and "9", or the digit "0". For shared secrets, this digit encodes the threshold (the number of shares required for recovery): thresholds 2-9 are encoded as digits "2"-"9" respectively. For unshared secrets, threshold 1 is denoted by the share index "s" (not by the threshold digit); the digit "0" is recommended for the threshold digit in this case, but any digit is allowed as it is ignored.
@@ -146,14 +146,14 @@ The master seed is decoded by converting the payload to bytes:

 Note that unlike the decoding process in BIP-0173, we do NOT require that the incomplete group be all zeros.

-For an unshared secret, the threshold parameter (the first character of the data part) is ignored (beyond the fact it must be a digit for the codex32 string to be valid).
-We recommend using the digit "0" for the threshold parameter in this case.
+For an unshared secret, the threshold digit (the first character of the data part) is ignored (beyond the fact it must be a digit for the codex32 string to be valid).
+We recommend using the digit "0" for the threshold digit in this case. Note that threshold 1 is denoted by the share index "s", not by the threshold digit.

 ===Recovering Master Seed===

 When the share index of a valid codex32 string (converted to lowercase) is not the letter "s", we call the string an codex32 share.
-The first character of the data part indicates the threshold of the share, and it is required to be a non-"0" digit.
+The first character of the data part is the threshold digit, which encodes the threshold (the number of shares required for recovery). For a codex32 share, the threshold digit is required to be a non-"0" digit (i.e., "2" through "9"), encoding thresholds 2 through 9 respectively.
approach nACK.
At this point the updates haven’t reflected reviewer feedback.
The best way to close the issue this intends to solve is to make a new PR with the non-HRP changes from #2040, which everyone seems to agree with.
If you can strip everything out of that PR related to generalizing BIP93 to any hrp (like renaming seed to secret, changing the Python reference, selecting the hrp), it would be ready to merge. You should keep the rename from ''t'' to ''k'' and the terminology fixes in the test vectors.
If you think you can handle that it would be helpful.
Otherwise I’ll do it later this week.
…My impression is … “commit harvester” … so if this matches your assessment and you want to just adopt any outstanding changes that make sense from this PR here into your own PR, we can just continue this work on your PR instead.
Yes, that’s what I meant when I said:
If you think you can handle that it would be helpful.
That is because deleting the right parts of the #2040 diff takes a few more minutes of human effort than went into the commits here.