Change handling of “h” versus “’” marker for hardened derivation in descriptors #15740

issue gwillen openend this issue on April 3, 2019
  1. gwillen commented at 11:16 pm on April 3, 2019: contributor

    Per some discussion in #bitcoin-core-dev – getdescriptorinfo should probably not canonicalize the descriptor given, but return it exactly as provided. This is because it is the easiest way to get a checksum for a descriptor written by hand, and it’s profoundly annoying that there’s no way to get a checksum for a non-canonical descriptor right now (one containing the “h” marker for hardened, or containing a private key.)

    Also discussed was the possibility of switching the default marker from “’” to “h”. The apostrophe is very annoying to work with over RPC because of quoting problems, which is why “h” is accepted. Better to just use “h” everywhere for the same reason.

  2. fanquake added the label RPC/REST/ZMQ on Apr 3, 2019
  3. ghost commented at 5:15 pm on May 6, 2019: none

    HWI supplies descriptors that include the non-canonical descriptor with the h for hardened and are converted when ran through getdescriptorinfo

    It would be nice if getdescriptorinfo would not change what is provided to it and provides a checksum for what was the input

  4. harding commented at 5:57 pm on May 6, 2019: contributor

    I was working with @myveryown earlier today on debugging a difficulty using descriptors and I found it very unintuitive that this command returns not just an obvious checksum (which is appreciated) but also converts h to ' in origin info in a way that’s not immediately obvious:

    0$ bitcoin-cli getdescriptorinfo "wpkh([f6bb4c63/0h/0h/30h]028429a37c3f09c8c5cc1fab58df32d1a7da7616c748a40eeb1aae1d64acb9c5cc)" | jq -r .descriptor
    1wpkh([f6bb4c63/0'/0'/30']028429a37c3f09c8c5cc1fab58df32d1a7da7616c748a40eeb1aae1d64acb9c5cc)#5wdxpxcx
    

    It wasn’t until @gwillen pointed out this substitution of h with ' that I realized what was going wrong with my attempts to troubleshoot @myveryown’s problem. I’m in agreement with the OP on both points:

    1. getdescriptorinfo should provide a way to get the checksum for a non-canonical descriptor
    2. Paths using hardened derivation should preferentially be indicated by an h rather than a '

    I think the output should probably look something like this:

    0$ bitcoin-cli getdescriptorinfo "wpkh([f6bb4c63/0h/0h/30h]028429a37c3f09c8c5cc1fab58df32d1a7da7616c748a40eeb1aae1d64acb9c5cc)"
    1{
    2  "descriptor": "wpkh([f6bb4c63/0h/0h/30h]028429a37c3f09c8c5cc1fab58df32d1a7da7616c748a40eeb1aae1d64acb9c5cc)#vk9vfu0h"
    3  "normalized": "wpkh([f6bb4c63/0'/0'/30']028429a37c3f09c8c5cc1fab58df32d1a7da7616c748a40eeb1aae1d64acb9c5cc)#5wdxpxcx",
    4  "isrange": false,
    5  "issolvable": true,
    6  "hasprivatekeys": false
    7}
    

    (Instead of “canonical”, I use “normalized” because it has the same number of letters as “descriptor” and so aligns well in fixed-width displays. Also canonical seems more ambiguous than normalized in this context.)

  5. gwillen commented at 6:26 pm on May 6, 2019: contributor

    I’m going to take a crack at this today – I’m going to both switch the default marker to “h”, and have getaddressinfo return the same descriptor provided, rather than alter it. Then people can bikeshed the result. ;-)

    (EDIT: Oh, I like @harding ’s suggestion of providing both the as-given descriptor and the normalized descriptor, will do.)

  6. Sjors commented at 10:17 am on May 7, 2019: member

    Concept ACK on making descriptor checksum based on h or ' given by the user, rather than canonical form. But let’s add an error for mixed usage.

    I don’t think the normalized field is useful.

    I’m fine with switching the default to h. In addition to RPC usage, it’s probably also less susceptible to graphical operating systems messing with quote marks.

    I’m on the fence regarding private keys. At least our RPC should not echo private keys, so it would have to return the checksum without the rest of the descriptor. There could be a separate checksum_private return field for this case.

    Can you update the PR description to say getdescriptorinfo instead of getaddressinfo (assuming that’s what you meant)?

  7. gwillen commented at 8:35 pm on May 7, 2019: contributor
    Sorry yes, that is what I meant, fixed.
  8. gwillen commented at 1:45 am on May 8, 2019: contributor

    Ok, it seems like changing the default character from “’” to “h” is a one-line change, whereas preserving the user’s original usage BUT still stripping private keys is going to be a hugely-invasive change to descriptor parsing (since we parse and deparse completely in order to print right now, and do not track which spelling was used), and giving a sensible error message for mixed usage is a different hugely-invasive change (since we do not currently have any way of returning error codes from parsing, so they would have to be added.)

    So I’m going to PR the one-line change as “hopefully the best use of time right now”, since I’m not aware of anybody clamoring to use “’” instead of “h”, and everyone seems to prefer the opposite.

  9. sipa commented at 2:08 am on May 8, 2019: member

    The original reason to make ’ the default was because of consistency with decodepsbt (which uses the ’ in derivation paths).

    It’s not too hard to make the parsing/serializing just roundtrip what was provided, if that’s desirable.

  10. gwillen commented at 3:00 am on May 8, 2019: contributor
    Hmm, can we get away with changing decodepsbt also? Otherwise we end up with two inconsistent “canonical” forms, which isn’t great either.
  11. Sjors commented at 3:01 pm on May 8, 2019: member
    Paging PSBT-guy: @achow101
  12. achow101 commented at 5:46 pm on May 8, 2019: member

    Hmm, can we get away with changing decodepsbt also?

    Yes.

    IMO we should change this everywhere, including its use in wallet metadata (but that may not be possible/necessary). The only reason ' was being used was because that was the the derivation path that the wallet stored for each key. However since derivation paths are now stored as integer values, it should be fine and safe to change everywhere it is displayed to use h. Note that we still may need to have the wallet continue to write out ' in the derivation path string in CKeyMetaData for backwards compatibility reasons, but that string shouldn’t be displayed anymore so it doesn’t really matter.

  13. gwillen commented at 9:29 pm on May 8, 2019: contributor

    I would be in favor of changing the default to “h” everywhere (pieter’s PR, which I just linked, doesn’t change the default but only makes it possible to get the original descriptor echoed back unmodified with checksum.)

    Note that the easiest way to change the default in getdescriptorinfo requires changing it in bip32.cpp in FormatHDKeypath, which (if I’m not mistaken) also changes it in CKeyMetaData. But as long as ParseHDKeypath is changed to accept both, there’s no back-compat issue, so I don’t expect it should be a problem. I’m happy to continue with that PR if people are comfortable with that.

  14. achow101 commented at 10:41 pm on May 8, 2019: member

    which (if I’m not mistaken) also changes it in CKeyMetaData.

    Only for imported keys.

    I don’t think there’s actually a backwards compatibility issue since it’s just a string that’s only used to be displayed to users.

    I’m happy to continue with that PR if people are comfortable with that.

    ACK.

    Also, there’s a redundant WriteHDKeypath function in rpc/rawtransaction.cpp which can be removed and the thing that uses it can use the WriteHDKeypath in utils/bip32.cpp.

  15. maflcko referenced this in commit 0b058ba69d on May 23, 2019
  16. sidhujag referenced this in commit bbbc06a628 on May 24, 2019
  17. gwillen commented at 6:12 pm on September 9, 2020: contributor

    I guess I let this drop after we started echoing the checksum for the original version, without going all the way to switching the default to “h”?

    The user ghost43 on IRC just hit a shell-quoting issue related to this, reported in ##hwi on Freenode: https://gist.github.com/SomberNight/8d7b065e74449a8716da9a98cac3cae2

    If you wrap the descriptor in single-quotes in the shell, and it happens to have an even number of ’ in it (and the contents of the descriptor happens to not actually need quoting), the shell will just silently remove them as valid-but-useless quotes. This transforms the descriptor into one with public derivation everywhere, which is scary.

    So I guess this should be a vote in favor of reviving the idea to just force everything to be ‘h’ instead of ’ by default.

  18. harding commented at 4:10 pm on September 20, 2020: contributor
    I would very much prefer to have h everywhere by default. I don’t work with descriptors often, but dealing with the quoting issues annoys me every time.
  19. sipa commented at 4:55 pm on September 20, 2020: member
    No strong opinion here. Descriptors can also contain [ and ]. Can’t those pose problems in shells as well?
  20. harding commented at 5:48 pm on September 20, 2020: contributor

    @sipa certainly []()/*' all have special meaning in bash, but only one of them can’t be made literal by encapsulating it in single quotes and can’t even be escaped by simply prefixing it with a backslash within single quotes.

    The challenge I often have is that I need to use one set of quotes to escape a string from the shell and a different set of quotes to indicate a string to the JSON parser; this is most easily done using double quotes for one string and single quotes for the other string, e.g. I want to do something like this:

    bitcoin-cli scantxoutset start '[{"desc": "pkh([d34db33f/44'/0'/0']xpub6ERApfZwUNrhLCkDtcHTcxd75RbzS1ed54G1LkBUHQVHQKqhMkhgbmJbZRkrgZw4koxb5JaHWkY4ALHY2grBGRjaDMzQLcgJvLJuZZvRcEL/1/*)"}]'
    

    But instead I have to add three escape characters per single quote in the BIP32 path:

    bitcoin-cli scantxoutset start '[{"desc": "pkh([d34db33f/44'\''/0'\''/0'\'']xpub6ERApfZwUNrhLCkDtcHTcxd75RbzS1ed54G1LkBUHQVHQKqhMkhgbmJbZRkrgZw4koxb5JaHWkY4ALHY2grBGRjaDMzQLcgJvLJuZZvRcEL/1/*)"}]'
    

    The extra effort is annoying, but I think it’s also a challenging problem for users who are less experienced at the shell. If you run the first command above, the error you get is:

    bash: syntax error near unexpected token `)'
    

    That’s not very useful to an inexperienced user—they may think they need to escape the parens at the end of the string rather than noticing the problem with the single quotes near the beginning of the string. And, as @gwillen noted, it’s easy for a user to get the escaping wrong and not notice the shell submitted a different descriptor than they expected, e.g.:

    $ set -x
    $ bitcoin-cli scantxoutset start '[{"desc": "pkh([d34db33f/44'/0'/0]xpub6ERApfZwUNrhLCkDtcHTcxd75RbzS1ed54G1LkBUHQVHQKqhMkhgbmJbZRkrgZw4koxb5JaHWkY4ALHY2grBGRjaDMzQLcgJvLJuZZvRcEL/1/*)"}]'
    + bitcoin-cli scantxoutset start '[{"desc": "pkh([d34db33f/44/0/0]xpub6ERApfZwUNrhLCkDtcHTcxd75RbzS1ed54G1LkBUHQVHQKqhMkhgbmJbZRkrgZw4koxb5JaHWkY4ALHY2grBGRjaDMzQLcgJvLJuZZvRcEL/1/*)"}]'
    

    I think users are a lot less likely to encounter problems with the other special characters, since those characters will probably be escaped normally and so interpreted literally.

  21. sipa commented at 6:03 pm on September 20, 2020: member
    @harding Yeah, that sounds reasonable. Concept ACK
  22. 6102bitcoin commented at 8:03 pm on November 17, 2020: none

    As an inexperienced user I just came up against this issue.

    Is there a way to force the use of h?

    _Background

    My wallet provided me with a descriptor (including checksum) using h in indicate hardened address derivation (rather than ‘).

    When I compared this to the descriptor I built (using bitcoin-cli getdescriptorinfo) I noticed that the checksums did not match, and that regardless of whether I set the hardened path with ’ or h in my code, getdescriptorinfo is always sanitised this to ‘._

  23. gwillen commented at 8:13 pm on November 17, 2020: contributor
    @6102bitcoin I believe that, when you run getdescriptorinfo, it should now show you TWO checksums – one corresponding to the exact descriptor you provided, and one as part of the “sanitized” version with the ’ characters. Does that seem to be the case?
  24. 6102bitcoin commented at 9:32 pm on November 17, 2020: none

    Only 1 checksum is provided in the checksum field. (perhaps) confusingly the checksum provided in the checksum field does not match that at the end of the end of the descriptor field.

    bitcoin-cli getdescriptorinfo “wpkh([fingerprint/84h/0h/0h]xpub/0/)” { “descriptor”: “wpkh([fingerprint/84’/0’/0’]xpub/0/)#checksum1”, “checksum”: “checksum2”, “isrange”: true, “issolvable”: true, “hasprivatekeys”: false }

    Where checksum1 corresponds to the checksum using ’ & checksum2 corresponds to the checksum using h

  25. 6102bitcoin commented at 7:41 am on November 18, 2020: none
    On further reading of the pull linked above this appears to be the expected behaviour.
  26. gwillen commented at 11:19 pm on November 24, 2020: contributor

    I took a quick look at what it would take to change the default from ' to h. (Considering in particular whether it makes more sense to try to change it only at the interface, or also internally.) Here’s what I found:

    • 10 functional tests (plus psbt_wallet_tests.cpp) have ' hardcoded in various places that I could find. (Generally found by grepping for roughly 44'|49'|84'|/0'|/1'). The functional tests would need to be changed no matter what, insofar as they are parsing RPC output and looking for ' and not h.
    • In descriptor.cpp:
      • BIP32PubkeyProvider::ToString and ::ToPrivateString emit '.
      • ParseKeyPath accepts either ' or h, so that’s no problem.
    • In bip32.cpp:
      • FormatHDKeypath emits '.
      • ParseHDKeypath does a string search for '. I think this would be safe to change to h because it only operates on path components, which otherwise are purely numeric (and it checks this.)
    • In scriptpubkeyman.cpp:
      • LegacyScriptPubKeyMan::DeriveNewChildKey and DescriptorScriptPubKeyMan::SetupDescriptorGeneration both have a bunch of hardcoded keypath fragments with apostrophes in them.
    • In descriptor_tests.cpp:
      • There is a function UseHInsteadOfApostrophe which is used to turn descriptors with ' into descriptors with h to ensure that both formats are tested. Since that direction is easily done with a character substitution, but the reverse is not, it would seem necessary to keep these testcases formatted with an apostrophe to easily test parsing of both styles.

    I get the impression that internally, descriptors are stored at least sometimes as strings with ', and ParseHDKeyPath is used for parsing these (and only accepts ', never h). However, it seems that (outside of tests), this is only used:

    • When the key origin (with pre-parsed path) is missing (walletdb.cpp:463); or
    • When upgrading a wallet without the flag WALLET_FLAG_KEY_ORIGIN_METADATA (scriptpubkeyman.cpp:392).

    So I assume a newly-created wallet will store a non-string key origin, and this codepath will not be used. All other parsing of keypaths appears(?) to go through descriptor.cpp’s ParseKeyPath function, which accepts both.

    So with all that out of the way:

    • Which of these functions are used for stuff we get from / write to the walletdb, and which for RPCs and other UI uses? Is it strictly split between bip32.cpp for the former, and everything else is the latter?
    • Is it okay to just let ParseHDKeyPath accept both formats, or is this undesirable?
    • Is it okay for descriptors stored in the wallet to switch from ' to h going forward? Does this break compatibility in a way that requires some kind of flag or version bump? (I’m not super familiar with wallet versioning stuff, but keep in mind that the wallet – at least the CURRENT wallet – does not APPEAR to ever parse this unless the key origin metadata is missing. If it is the case that NO version of Core (which is willing to read a modern wallet file) will do so, then this may be compatible.)
    • Have I gone too far, and should I just be looking at doing this for descriptors that are presented to the user (e.g. via RPC)? This is tricky because ' and h give different checksums, so this change has to happen before the checksum is added. (Are descriptors in the wallet stored with checksums, or without?)
  27. vijaydasmp referenced this in commit b9db235966 on Dec 5, 2021
  28. vijaydasmp referenced this in commit 3095317726 on Dec 13, 2021
  29. vijaydasmp referenced this in commit 4a70afca62 on Dec 13, 2021
  30. vijaydasmp referenced this in commit 7e3690de5a on Dec 18, 2021
  31. vijaydasmp referenced this in commit 8a84a9b7c0 on Dec 25, 2021
  32. Sjors commented at 8:26 am on September 13, 2022: member

    It’s been a few years, and the use of ' instead h has been consistently driving me nuts, e.g. when trying to import something from listdescriptors with importdescriptors or test the new scanblocks RPC in #23549. The fact that swapping the character changes the checksum makes it even trickier.

    Are there any caveats to make such a change safe for existing wallets? Since the descriptor ID internally is based on the hash of the string.

  33. achow101 referenced this in commit fa53611cf1 on May 8, 2023
  34. pinheadmz commented at 1:58 pm on May 17, 2023: member
    Closing as completed by #26076
  35. pinheadmz closed this on May 17, 2023

  36. bitcoin locked this on May 16, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me