Descriptor checksums

sipa commented at 11:16 pm on February 7, 2019: member

This adds support for a descriptor-specific 8-character checksum.

Descriptors may optionally be suffixed with a # plus these 8 checksum characters. Any descriptor that contains a # at the end must be followed by a valid checksum. If the # is missing entirely, it is valid without checksum.

All RPCs are updated to report descriptors that include the checksum. On input, checksums are optional except in deriveaddresses and importmulti, which require descriptors that include a checksum.

A new RPC is also added to analyse descriptors (getdescriptorinfo), which can be used to compute the checksum for a descriptor that lacks one.

sipa force-pushed on Feb 7, 2019

instagibbs commented at 2:21 am on February 8, 2019: member

Can a motivation for the placement be given? Accidentally eliding it for whatever reason neuters the protection while still maintaining a valid descriptor, but clearly it seems simpler from an implementation and compatibility perspective.

sipa commented at 2:30 am on February 8, 2019: member

@gsanders Well for critical RPCs the plan is that the checksum won’t be optional. I just haven’t included that in this PR as it means adapting a bunch of tests, which I only want to do once the checksum algorithm is final.

meshcollider added the label Wallet on Feb 8, 2019

meshcollider commented at 3:18 am on February 8, 2019: contributor

Concept ACK

meshcollider added this to the milestone 0.18.0 on Feb 8, 2019

promag commented at 2:44 pm on February 8, 2019: member

Concept ACK.

It’d be nice to read a draft update to doc/descriptors.md.

sipa commented at 8:04 pm on February 8, 2019: member

Added a section to doc/descriptors.md.

in doc/descriptors.md:182 in 975a4c41bd outdated

177+
178+These checksums consist of 8 alphanumeric characters. As long as errors are
179+restricted to substituting characters in `0123456789()[],'/*abcdefgh@:$%{}`
180+for others in that set and changes in letter case, up to 4 errors will always
181+be detected in descriptors up to 501 characters, and up to 3 errors in longer
182+ones. For larger numbers of errors, or other types of errors, the is a

promag commented at 8:07 pm on February 8, 2019:

s/the is/there is?

sipa commented at 8:08 pm on February 8, 2019:

Fixed.

sipa force-pushed on Feb 8, 2019

in src/test/descriptor_tests.cpp:40 in 141c965313 outdated

35+    bool b_check = (b.size() > 9 && b[b.size() - 9] == '#');
36+    if (a_check != b_check) {
37+        if (a_check) a = a.substr(0, a.size() - 9);
38+        if (b_check) b = b.substr(0, b.size() - 9);
39+    }
40+    if (a != b) fprintf(stderr, "%s != %s\n", a.c_str(), b.c_str());

practicalswift commented at 1:21 pm on February 9, 2019:

Should this debug print be left in here?

sipa commented at 3:08 am on February 13, 2019:

Fixed.
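
For illustration, the test helper quoted above compares two descriptor strings while tolerating a checksum on only one side. A minimal Python sketch of that idea follows. The names `has_checksum`, `strip_checksum` and `same_descriptor` are hypothetical, and the 9-character suffix assumes `#` plus the 8 checksum characters described in this PR.

```python
CHECKSUM_SUFFIX_LEN = 9  # '#' followed by 8 checksum characters


def has_checksum(desc: str) -> bool:
    """Return True if desc ends with '#' plus 8 further characters."""
    return len(desc) > CHECKSUM_SUFFIX_LEN and desc[-CHECKSUM_SUFFIX_LEN] == "#"


def strip_checksum(desc: str) -> str:
    """Drop the trailing '#xxxxxxxx' suffix if one is present."""
    return desc[:-CHECKSUM_SUFFIX_LEN] if has_checksum(desc) else desc


def same_descriptor(a: str, b: str) -> bool:
    """Compare two descriptors, ignoring the checksum when only one has it."""
    if has_checksum(a) != has_checksum(b):
        a, b = strip_checksum(a), strip_checksum(b)
    return a == b
```

When both strings carry a checksum, the suffixes are compared as well, so a mismatch in the checksum alone still makes the comparison fail.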

in src/script/descriptor.cpp:27 in 141c965313 outdated

18@@ -19,6 +19,97 @@
19 
20 namespace {
21 
22+////////////////////////////////////////////////////////////////////////////
23+// Checksum                                                               //
24+////////////////////////////////////////////////////////////////////////////
25+
26+// This section implements a checksum algorithm for descriptors with the following properties:
27+// * Every 1 character substitition error counts as 1 or 2 symbol errors, but:

practicalswift commented at 1:22 pm on February 9, 2019:

Substitute this substitition with substitution.

sipa commented at 3:05 am on February 13, 2019:

Fixed.

in src/rpc/misc.cpp:159 in 141c965313 outdated

154+            RPCResult{
155+            "{\n"
156+            "  \"descriptor\" : \"desc\",       (string) The descriptor in canonical form, without private keys\n"
157+            "  \"isrange\" : true|false,        (boolean) Whether the descriptor is ranged\n"
158+            "  \"issolvable\" : true|false,     (boolean) Whether the descriptor is solvable\n"
159+            "  \"isprivate\" : true|false,      (boolean) Whether the input descriptor contained at least one private key\n"

Sjors commented at 7:42 pm on February 10, 2019:

Nit: rename isprivate to hasprivatekey (or contains...).

sipa commented at 3:10 am on February 13, 2019:

Done.

Sjors commented at 7:45 pm on February 10, 2019: member

Concept ACK, will review shortly. Agree that deriveaddresses and importmulti should require a checksum. The introduction of getdescriptorinfo means there’s no need to make that opt-out.

Do I understand correctly that the checksum is either based on the canonical form, i.e. based on public keys, or skips keys altogether (but that seems suboptimal when these keys are not in checksummed xpub form)? Otherwise the result of getdescriptorinfo could be confusing if you feed it an xpriv.

Maybe return a warning if a user does provide an xpriv that they should clear their shell command history (and generally recommend either not doing that, or providing a safer method like #15346).

meshcollider commented at 8:09 pm on February 10, 2019: contributor

The surrounding code looks good other than the comments above, haven’t reviewed the actual checksum code itself yet

in src/rpc/misc.cpp:164 in 141c965313 outdated

158+            "  \"issolvable\" : true|false,     (boolean) Whether the descriptor is solvable\n"
159+            "  \"isprivate\" : true|false,      (boolean) Whether the input descriptor contained at least one private key\n"
160+            "}\n"
161+            },
162+            RPCExamples{
163+                "Analyse a descriptor\n" +

laanwj commented at 12:21 pm on February 12, 2019:

might want to mention that this example only works on mainnet

sipa commented at 3:13 am on February 13, 2019:

Fixed by changing to a pubkey-only example.

laanwj commented at 12:22 pm on February 12, 2019: member

lightly tested ACK, code changes look good to me, haven’t checked any of the magic numbers in PolyMod

sipa force-pushed on Feb 13, 2019

sipa commented at 3:17 am on February 13, 2019: member

Several changes:

- Addressed all comments
- Finalized the checksum design (and switched to a slightly better generator)
- Added explanation (incl. Sage code) of the checksum
- Made checksums mandatory in deriveaddresses and importmulti
- Expanded and updated tests, including a Python implementation of the checksum

in src/script/descriptor.cpp:62 in 5a04dafbd2 outdated

48+ *   3 errors in windows up to 19000 symbols.
49+ * - Taking all those generators, and for degree 7 ones, extend them to degree 8 by adding all degree-1 factors.
50+ * - Selecting just the set of generators that guarantee detecting 4 errors in a window of length 512.
51+ * - Selecting one of those with best worst-case behavior for 5 errors in windows of length up to 512.
52+ *
53+ * The generator and the constants to implement it can be verified using this Sage code:

sipa commented at 3:19 am on February 13, 2019:

@gmaxwell @sdaftuar What do you think about the inclusion of the Sage code here? I could do the same for the Bech32.

Sjors commented at 9:58 am on February 13, 2019:

I think that’s a great idea here. For bech32 it should probably just go in the BIP.

Does anyone want to over-engineer having Travis check against this? :-)

sipa commented at 6:13 pm on February 13, 2019:

I think that’s overkill.

sipa force-pushed on Feb 13, 2019

Sjors commented at 9:56 am on February 13, 2019: member

Breaks Travis due to #14918.

in src/script/descriptor.cpp:28 in 83949e8b17 outdated

23+// Checksum                                                               //
24+////////////////////////////////////////////////////////////////////////////
25+
26+// This section implements a checksum algorithm for descriptors with the following properties:
27+// * Every 1 character substitution error counts as 1 or 2 symbol errors, but:
28+//   * An error substituting a character from 0123456789()[],'/*abcdefgh@:$%{} for another in

Sjors commented at 10:14 am on February 13, 2019:

Maybe flip this around, and explain a bit more how symbol error count works:

// * Mistakes in a descriptor string are measured in symbol errors. A higher symbol
//   error count is more difficult to detect, because it becomes indistinguishable
//   from a different descriptor.
//   * An error substituting a character from 0123456789()[],'/*abcdefgh@:$%{} for
//     another in that set always counts as 1 symbol error.
//   * A case error always counts as 1 symbol error.
//   * Any other 1 character substitution error counts as 1 or 2 symbol errors.
//   * Note that hex encoded keys are covered by these special characters, whereas xprivs
//     and xpubs use different characters, but already have their own checksum mechanism.
//     Function names like "multi()" use different characters, but mistakes would generally
//     result in an unparseable descriptor.

Some of this is also explained in DescriptorChecksum in different wording.

sipa commented at 6:14 pm on February 13, 2019:

Cool, that’s more clear. I’ve included it with some copy-editing.
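
To make the symbol-error counting above concrete, here is a small Python sketch. It assumes that a character's position in the PR's input character set splits into a low part (`pos & 31`) and a group index (`pos >> 5`), and that each part that changes counts as one symbol error. The function name `symbol_errors` is hypothetical, and it looks at a single substituted character in isolation.

```python
# Character set as quoted from src/script/descriptor.cpp in this PR (3 groups).
INPUT_CHARSET = (
    "0123456789()[],'/*abcdefgh@:$%{}"
    "IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~"
    "ijklmnopqrstuvwxyzABCDEFGH`#\"\\ "
)


def symbol_errors(original: str, typo: str) -> int:
    """Count symbol errors caused by replacing one character with another."""
    p, q = INPUT_CHARSET.index(original), INPUT_CHARSET.index(typo)
    low_changed = (p & 31) != (q & 31)    # position within the group
    group_changed = (p >> 5) != (q >> 5)  # which group of 32
    return int(low_changed) + int(group_changed)


print(symbol_errors("a", "b"))  # 1: same group, different position
print(symbol_errors("a", "A"))  # 1: case error, same position, different group
print(symbol_errors("a", "Z"))  # 2: different position and different group
```

Under these assumptions every upper/lower case pair sits at the same offset within its group, which is why a case error costs exactly one symbol error.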

in src/script/descriptor.cpp:111 in 83949e8b17 outdated

 97+     * the position within the groups.
 98+     */
 99+    static std::string INPUT_CHARSET =
100+        "0123456789()[],'/*abcdefgh@:$%{}"
101+        "IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~"
102+        "ijklmnopqrstuvwxyzABCDEFGH`#\"\\ ";

Sjors commented at 10:27 am on February 13, 2019:

Nit: putting ABCDEFGH and abcdefgh all the way to the right would make it a bit more clear that they intentionally have the same offset (for some reason Github doesn’t believe in fixed-width font for code, but even in editors it would be more clear).

sipa commented at 5:57 pm on February 13, 2019:

What platform are you using? It’s fixed width here, and I prefer to keep them in alphabetical order.

sipa commented at 6:15 pm on February 13, 2019:

Github renders in a fixed-width font here. I don’t think this concern weighs up against keeping the characters in alphabetical ordering (to the extent possible).

practicalswift commented at 7:11 pm on February 13, 2019:

FWIW, abc... and ABC... are perfectly aligned here :-)

A non-fixed width GitHub code view is surely not intentional – I suggest reporting to GitHub! :-)

Sjors commented at 7:42 pm on February 13, 2019:

Nvm, it was just the escape characters in the bottom line that made it look misaligned. That or I was A/B-tested.

in src/rpc/misc.cpp:238 in d62fba0053 outdated

232@@ -193,7 +233,7 @@ UniValue deriveaddresses(const JSONRPCRequest& request)
233     }
234 
235     FlatSigningProvider provider;
236-    auto desc = Parse(desc_str, provider);
237+    auto desc = Parse(desc_str, provider, true);
238     if (!desc) {
239         throw JSONRPCError(RPC_INVALID_ADDRESS_OR_KEY, strprintf("Invalid descriptor"));

Sjors commented at 10:56 am on February 13, 2019:

It would be nice to have a distinct error for a missing checksum.

instagibbs commented at 6:01 pm on February 13, 2019:

In general the errors are non-existent. Like, a space causes it to be rejected. Maybe a followup PR to make some common error cases printed?

sipa commented at 6:15 pm on February 13, 2019:

Agree, but I’d prefer to do that in a different change.

Sjors commented at 7:37 pm on February 13, 2019:

I think that’s fine because both descriptor-enhanced importmulti and deriveaddresses are new anyway. We can make them more user friendly later.

Sjors commented at 10:56 am on February 13, 2019: member

tACK 4f95087 modulo RPC help syntax

I can’t vouch for the actual math, but the documentation, Sage code, tests and Python re-implementation are comforting. One way, perhaps overkill, to sanity check that the checksum works as intended is to generate a whole bunch of deterministic typos and see that they are indeed detected.
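
The sanity check suggested here could be sketched roughly as follows. This is only an outline: the checksum routine is passed in as a parameter because the real algorithm is not reproduced in this thread, and `single_substitutions` and `undetected_typos` are hypothetical names.

```python
from typing import Callable, Iterator, List


def single_substitutions(desc: str, alphabet: str) -> Iterator[str]:
    """Yield every string that differs from desc in exactly one character."""
    for i, original in enumerate(desc):
        for ch in alphabet:
            if ch != original:
                yield desc[:i] + ch + desc[i + 1:]


def undetected_typos(desc: str, alphabet: str,
                     checksum: Callable[[str], str]) -> List[str]:
    """Return the single-character typos whose checksum equals the original's."""
    expected = checksum(desc)
    return [t for t in single_substitutions(desc, alphabet)
            if checksum(t) == expected]
```

With the actual descriptor checksum plugged in as `checksum`, an empty result would be consistent with the claim that single substitutions are always detected. The enumeration is deterministic, so a failing case can be reproduced exactly.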

sipa force-pushed on Feb 13, 2019

sipa commented at 6:15 pm on February 13, 2019: member

Rebased.

gmaxwell commented at 7:08 pm on February 14, 2019: contributor

ACK

in src/rpc/misc.cpp:151 in f57d2cb74e outdated

142@@ -143,6 +143,46 @@ static UniValue createmultisig(const JSONRPCRequest& request)
143     return result;
144 }
145 
146+UniValue getdescriptorinfo(const JSONRPCRequest& request)
147+{
148+    if (request.fHelp || request.params.size() != 1) {
149+        throw std::runtime_error(
150+            RPCHelpMan{"getdescriptorinfo",
151+            {"\nAnalyses a descriptor.\n"},

instagibbs commented at 7:13 pm on February 14, 2019:

Analyze?

flack commented at 8:39 pm on February 14, 2019:

@instagibbs it’s funny, I stumbled across this, too, and did a little search on analyse/analyze in the code base, it’s almost evenly split. So it would probably be better to do a separate PR that standardizes (standardises?) on either the British or American spelling

gmaxwell commented at 9:04 pm on February 14, 2019:

I am not a fan of PRs to go around switching between different English styles, especially in comments. All of them are valid, all can be mutually read by English speakers. Trying to maintain consistency would just mean a never ending sequence of fixup PRs.

We have a finite amount of resources to handle changes, they should be conserved for efforts that improve the capability or reliability of the software.

flack commented at 9:09 pm on February 14, 2019:

I’m not saying it should necessarily be changed, but changing it in this PR only will not really add anything wrt consistency either (since analyze/analyse appear in more or less the same frequency in the code base). So… leave it as is?

instagibbs commented at 9:47 pm on February 14, 2019:

I had no idea it was a valid word, haha. I retract the comment!

instagibbs commented at 9:47 pm on February 14, 2019:

I thought it was a misspelling of “analysis” which made no grammatical sense.

instagibbs changes_requested

instagibbs commented at 7:24 pm on February 14, 2019: member

Please add a single functional test for deriveaddresses and importmulti lacking the checksum. I compiled out the mandatory flag for them and the tests still seem to pass.

sipa force-pushed on Feb 14, 2019

sipa commented at 11:33 pm on February 14, 2019: member

Rebased and added a test to deriveaddresses and importmulti to test for missing checksum.

in test/functional/rpc_deriveaddresses.py:22 in 34164546d8 outdated

21-
22         assert_equal(self.nodes[0].deriveaddresses(descriptor), [address])
23 
24-        descriptor_pubkey = "wpkh(tpubD6NzVbkrYhZ4WaWSyoBvQwbpLkojyoTZPRsgXELWz3Popb3qkjcJyJUGLnL4qHHoQvao8ESaAstxYSnhyswJ76uZPStJRJCTKvosUCJZL5B/1/1/0)"
25-        address = "bcrt1qjqmxmkpmxt80xz4y3746zgt0q3u3ferr34acd5"
26+        descriptor = "wpkh(tprv8ZgxMBicQKsPd7Uf69XL1XwhmjHopUGep8GuEiJDZmbQz6o58LninorQAfcKZWARbtRtfnLcJ5MQ2AtHcQJCCRUcMRvmDUjyEmNUWwx8UbK/1/1/0)"

instagibbs commented at 11:55 pm on February 14, 2019:

absolutely not blocking nit: just use descriptor = descriptor[:-9] to make it clear it’s just dropping checksum

sipa commented at 10:18 pm on February 15, 2019:

Done.

instagibbs approved

instagibbs commented at 0:28 am on February 15, 2019: member

tests now fail when I allow no checksum

tACK https://github.com/bitcoin/bitcoin/pull/15368/commits/34164546d8f9c2c1f0b05e2b2f8b83a3ac16eec3

fanquake commented at 6:51 am on February 15, 2019: member

Can the deriveaddresses RPC example be updated with a checksum? Otherwise it will no longer work.

Looks like the descriptor should be (from getdescriptorinfo):

"wpkh([d34db33f/84'/0'/0']xpub6DJ2dNUysrn5Vt36jH2KLBT2i1auw1tTSSomg8PhqNiUtx8QX2SvC9nrHu81fT41fvDUnhMjEzQgXnQjKEu3oaqMSzhSrHMxyyoEAmUHQbY/0/*)#trd0mf0l"

in src/script/descriptor.h:73 in a086e8455a outdated

62@@ -63,7 +63,7 @@ struct Descriptor {
63 };
64 
65 /** Parse a descriptor string. Included private keys are put in out. Returns nullptr if parsing fails. */
66-std::unique_ptr<Descriptor> Parse(const std::string& descriptor, FlatSigningProvider& out);
67+std::unique_ptr<Descriptor> Parse(const std::string& descriptor, FlatSigningProvider& out, bool require_checksum = false);

promag commented at 4:50 pm on February 15, 2019:

Instead of adding the new argument with a default value, why not create a new function ParseChecked? From the call site it is more clear, especially since it’s statically defined where checksum is needed.

promag commented at 7:09 pm on February 15, 2019:

Or you could do auto desc = Parse(desc_str, provider, /* require_checksum = */ true);.

sipa commented at 10:18 pm on February 15, 2019:

Did the latter.

promag commented at 4:56 pm on February 15, 2019: member

Needs release notes for the new RPC. The deriveaddresses release notes already point to descriptors.md.

sipa force-pushed on Feb 15, 2019

sipa added the label Needs release note on Feb 15, 2019

sipa commented at 10:19 pm on February 15, 2019: member

@promag I’ve just added a “needs release notes” label for now, as it intersects with the notes added for deriveaddresses and importmulti. @fanquake Done.

in src/script/descriptor.h:65 in a19b3f92e2 outdated

62@@ -63,7 +63,7 @@ struct Descriptor {
63 };
64 
65 /** Parse a descriptor string. Included private keys are put in out. Returns nullptr if parsing fails. */

promag commented at 0:34 am on February 16, 2019:

nit, could update function comment and note that checksum is always checked if present.

sipa commented at 6:31 am on February 16, 2019:

Done.

in src/script/descriptor.cpp:131 in a19b3f92e2 outdated

126+            c = PolyMod(c, cls);
127+            cls = 0;
128+            clscount = 0;
129+        }
130+    }
131+    if (clscount) c = PolyMod(c, cls);

promag commented at 0:38 am on February 16, 2019:

nit, > 0.

sipa commented at 6:32 am on February 16, 2019:

Done.

in src/script/descriptor.cpp:135 in a19b3f92e2 outdated

130+    }
131+    if (clscount) c = PolyMod(c, cls);
132+    for (int j = 0; j < 8; ++j) c = PolyMod(c, 0); // Shift further to determine the checksum.
133+    c ^= 1; // Prevent appending zeroes from not affecting the checksum.
134+
135+    std::string ret;

promag commented at 0:39 am on February 16, 2019:

nit, resize(8)? And then below ret[j] = ....

sipa commented at 6:32 am on February 16, 2019:

Done.
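
As an illustration of the final step under discussion, the sketch below renders a 40-bit checksum value as 8 characters by preallocating the result and assigning by index. It assumes 5 bits per character with the most significant bits first, and it assumes the output alphabet is the Bech32 character set, which this thread does not show. `checksum_to_string` is a hypothetical name.

```python
# Assumption: the 32-character output alphabet is the Bech32 one; this thread
# does not show it.
CHECKSUM_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def checksum_to_string(c: int) -> str:
    """Render a 40-bit value as 8 characters, 5 bits each, high bits first."""
    ret = [""] * 8  # preallocate, then assign by index
    for j in range(8):
        ret[j] = CHECKSUM_CHARSET[(c >> (5 * (7 - j))) & 31]
    return "".join(ret)
```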

in src/script/descriptor.cpp:120 in a19b3f92e2 outdated

115+
116+    uint64_t c = 1;
117+    int cls = 0;
118+    int clscount = 0;
119+    for (auto ch : span) {
120+        auto pos = INPUT_CHARSET.find(ch);

promag commented at 0:46 am on February 16, 2019:

nit, could avoid linear search by having an array to map to pos

static int CHAR_POS[] = { ... }; // -1 if invalid
...
int pos = CHAR_POS[ch];
if (pos == -1) return "";

sipa commented at 6:32 am on February 16, 2019:

I don’t think the extra code is worth the performance gain.
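
For readers weighing the trade-off, the table-lookup alternative proposed above might look like this in Python. It is a sketch of the suggestion only, not of what was merged, since the suggestion was declined. `CHAR_POS` follows the name used in the comment, and `char_position` is a hypothetical helper.

```python
# Character set as quoted from src/script/descriptor.cpp in this PR.
INPUT_CHARSET = (
    "0123456789()[],'/*abcdefgh@:$%{}"
    "IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~"
    "ijklmnopqrstuvwxyzABCDEFGH`#\"\\ "
)

# Built once; maps each valid character to its index in INPUT_CHARSET.
CHAR_POS = {ch: pos for pos, ch in enumerate(INPUT_CHARSET)}


def char_position(ch: str) -> int:
    """Return the charset index of ch, or -1 if ch is not allowed."""
    return CHAR_POS.get(ch, -1)
```

The lookup gives the same answer as a linear `find` over the character set; the only difference is constant-time access at the cost of an extra table.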

in src/rpc/misc.cpp:175 in a19b3f92e2 outdated

170+    RPCTypeCheck(request.params, {UniValue::VSTR});
171+
172+    FlatSigningProvider provider;
173+    auto desc = Parse(request.params[0].get_str(), provider);
174+    if (!desc) {
175+        throw JSONRPCError(RPC_INVALID_ADDRESS_OR_KEY, strprintf("Invalid descriptor"));

promag commented at 0:51 am on February 16, 2019:

nit, could have a test for this error.

sipa commented at 6:32 am on February 16, 2019:

There is (rpc_deriveaddresses functional test).

promag commented at 11:15 am on February 16, 2019:

Not for getdescriptorinfo.

in src/script/descriptor.cpp:892 in a19b3f92e2 outdated

888+    if (check_split.size() > 2) return nullptr; // Multiple '#' symbols
889+    if (check_split.size() == 1 && require_checksum) return nullptr; // Missing checksum
890+    if (check_split.size() == 2) {
891+        auto checksum = DescriptorChecksum(check_split[0]);
892+        if (checksum.size() == 0) return nullptr; // Invalid characters in payload
893+        if ((size_t)check_split[1].size() != checksum.size()) return nullptr; // Unexpected length for checksum

promag commented at 0:58 am on February 16, 2019:

Could be checked before calling DescriptorChecksum?

if (check_split[1].size() != 8) return nullptr;
auto checksum = DescriptorChecksum(check_split[0]);

sipa commented at 6:32 am on February 16, 2019:

Good idea, done.
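
The order of checks discussed in this thread can be sketched in Python as below. The checksum routine is passed in as `compute_checksum` because the algorithm itself is not reproduced here. The function name `check_descriptor` is hypothetical, and the final equality comparison is an assumption based on the PR description, since the quoted excerpt stops before it.

```python
from typing import Callable, Optional


def check_descriptor(desc: str, compute_checksum: Callable[[str], str],
                     *, require_checksum: bool = False) -> Optional[str]:
    """Return the descriptor payload if its checksum rules pass, else None."""
    parts = desc.split("#")
    if len(parts) > 2:
        return None  # multiple '#' symbols
    if len(parts) == 1:
        return None if require_checksum else parts[0]  # checksum absent
    payload, given = parts
    if len(given) != 8:
        return None  # cheap length check before computing anything
    expected = compute_checksum(payload)
    if not expected:
        return None  # invalid characters in payload
    return payload if given == expected else None
```

Making `require_checksum` keyword-only keeps the call site explicit, in the same spirit as the `/* require_checksum = */ true` comment adopted elsewhere in this PR. An empty checksum after `#` is rejected by the length check.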

in src/script/descriptor.cpp:892 in a19b3f92e2 outdated

887+    auto check_split = Split(sp, '#');
888+    if (check_split.size() > 2) return nullptr; // Multiple '#' symbols
889+    if (check_split.size() == 1 && require_checksum) return nullptr; // Missing checksum
890+    if (check_split.size() == 2) {
891+        auto checksum = DescriptorChecksum(check_split[0]);
892+        if (checksum.size() == 0) return nullptr; // Invalid characters in payload

promag commented at 0:59 am on February 16, 2019:

nit, .empty().

sipa commented at 6:37 am on February 16, 2019:

Done.

promag commented at 1:05 am on February 16, 2019: member

Could have an “empty checksum” test. Feel free to ignore my nits. LGTM code wise.

sipa force-pushed on Feb 16, 2019

Descriptor checksum 3b40bff988

Add getdescriptorinfo to compute checksum b52cb63688

Make descriptor checksums mandatory in deriveaddresses and importmulti be62903c41

Add checksums to descriptors.md fd637be8d2

sipa force-pushed on Feb 16, 2019

DrahtBot commented at 7:59 am on February 16, 2019: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#15414 ([wallet] allow adding pubkeys from imported private keys to keypool by Sjors)
#14912 ([WIP] External signer support (e.g. hardware wallet) by Sjors)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

in src/script/descriptor.cpp:101 in fd637be8d2

 96+     *  - The most common 'unprotected' descriptor characters (hex, keypaths) are in the first group of 32.
 97+     *  - Case errors cause an offset that's a multiple of 32.
 98+     *  - As many alphabetic characters are in the same group (while following the above restrictions).
 99+     *
100+     * If p(x) gives the position of a character c in this character set, every group of 3 characters
101+     * (a,b,c) is encoded as the 4 symbols (p(a) & 31, p(b) & 31, p(c) & 31, (p(a) / 32) + 3 * (p(b) / 32) + 9 * (p(c) / 32).

meshcollider commented at 11:12 am on February 16, 2019:

I’m not sure if I’m reading this wrong, but isn’t the fourth symbol in the wrong order? Shouldn’t this be (p(c) / 32) + 3 * (p(b) / 32) + 9 * (p(a) / 32)?
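
Which ordering is right depends on how the loop accumulates the group indices, and the excerpts in this thread do not show that line. As a sketch, assume a Horner-style accumulation of the form `cls = cls * 3 + (pos >> 5)`. In that case the first character of the group gets weight 9, which would match the ordering proposed in this comment. `encode_group` is a hypothetical name.

```python
# Character set as quoted from src/script/descriptor.cpp in this PR.
INPUT_CHARSET = (
    "0123456789()[],'/*abcdefgh@:$%{}"
    "IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~"
    "ijklmnopqrstuvwxyzABCDEFGH`#\"\\ "
)


def encode_group(a: str, b: str, c: str) -> tuple:
    """Encode three characters as three low symbols plus one packed group symbol."""
    symbols = []
    cls = 0
    for ch in (a, b, c):
        pos = INPUT_CHARSET.index(ch)
        symbols.append(pos & 31)    # position within the group of 32
        cls = cls * 3 + (pos >> 5)  # assumed Horner-style accumulation
    symbols.append(cls)             # equals 9*g(a) + 3*g(b) + g(c)
    return tuple(symbols)


print(encode_group("a", "I", "i"))  # (18, 0, 0, 5)
print(encode_group("i", "I", "a"))  # (0, 0, 18, 21)
```

With group indices 0, 1, 2 for `a`, `I`, `i`, the packed symbol is 9*0 + 3*1 + 2 = 5 under this assumption, whereas the formula in the code comment would give 0 + 3*1 + 9*2 = 21.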

meshcollider commented at 11:16 am on February 16, 2019: contributor

utACK https://github.com/bitcoin/bitcoin/pull/15368/commits/fd637be8d21a606e98c037b40b268c4a1fae2244

laanwj commented at 8:38 pm on February 16, 2019: member

utACK fd637be8d21a606e98c037b40b268c4a1fae2244

laanwj merged this on Feb 16, 2019

laanwj closed this on Feb 16, 2019

laanwj referenced this in commit f60d029a2a on Feb 16, 2019

fanquake removed the label Needs release note on May 17, 2019

MarcoFalke referenced this in commit 1a97c9a483 on Mar 10, 2020

deadalnix referenced this in commit b455b8d272 on Jun 16, 2020

deadalnix referenced this in commit 0cb1527a77 on Jun 17, 2020

deadalnix referenced this in commit 8c72f636c0 on Jun 17, 2020

deadalnix referenced this in commit f231fc4aa3 on Jun 17, 2020

kittywhiskers referenced this in commit c3d5d4a56c on Oct 12, 2021

kittywhiskers referenced this in commit 8ab5f86d42 on Oct 21, 2021

kittywhiskers referenced this in commit 5af083d46c on Oct 25, 2021

kittywhiskers referenced this in commit 6a1d1bfc65 on Oct 25, 2021

kittywhiskers referenced this in commit c93ac35b85 on Oct 25, 2021

kittywhiskers referenced this in commit b8f4bcc74c on Oct 26, 2021

kittywhiskers referenced this in commit d90a064dfb on Oct 26, 2021

kittywhiskers referenced this in commit 23de6c7247 on Oct 26, 2021

kittywhiskers referenced this in commit a0c3d325a2 on Oct 28, 2021

kittywhiskers referenced this in commit 40f38df4bf on Oct 28, 2021

kittywhiskers referenced this in commit 28066868c4 on Oct 28, 2021

kittywhiskers referenced this in commit 3c63ffa68c on Oct 28, 2021

UdjinM6 referenced this in commit bbe9b3d1e0 on Nov 1, 2021

pravblockc referenced this in commit 30a790a302 on Nov 18, 2021

DrahtBot locked this on Dec 16, 2021

Descriptor checksums #15368
