For any wallet account that is not single-signature, backing up the descriptor is crucial, as its loss is likely to be catastrophic and lead to lost funds - even if the seeds that are in theory sufficient for recovery are not lost. This is true even for simple multisig wallets, as losing the knowledge of just a single xpub might make recovery impossible.
In the absence of a standard, people have gotten creative with wallet backup schemes, with some even engraving the descriptor in metal.
I believe this is a bad idea and should be discouraged. Descriptors are not seeds, and should be treated radically differently both in theory and in practice.
In this post, I briefly introduce the context, and draft what I believe is the ideal structure for a wallet account backup standard that could be adopted by software wallets.
Motivation: secrecy vs privacy
The seed is secret, as it is what protects the key material that allows spending funds. Unauthorized access to the seed implies that attackers gain ownership of the funds (or at least the specific access controls that the keys are protecting). Hence, it is very valuable for an attacker to gain access to the seed, and they will be willing to increase the cost and the sophistication of their attacks because of the potential for high returns.
Therefore, for seeds:
digital copies are a high risk: hardware signing devices have been built to keep the seeds in a secure enclave, separate from the machine the software wallet is running on.
redundant copies of the seed are a high risk: the seed has to be physically protected, and multiple copies in multiple places inherently make their protection harder.
The descriptor (and its little brother, the xpub) is only private: unauthorized access allows an attacker to spy on your funds. That is bad, but not nearly as valuable as taking your funds. Attackers might use this to get information about you, and to inform further attacks, but will lose interest once attempting an attack becomes too costly or sophisticated.
For descriptors:
digital copies are unavoidable: each of the parties using the account will necessarily have a digital copy in their software wallet.
additional redundant copies are a very moderate risk.
Therefore, having multiple copies of the descriptor, whether physical, digital, on your disk or on the cloud, is a valid means to reduce the risk of loss of funds, unlike replicating the seed - which would incur a much higher risk.
I recommend timelocked recovery mechanisms like Liana to effectively address the risk of loss of funds caused by mismanaging the seed.
So how do we back up descriptors and wallet policies the right way?
Physical copies are easy - all you need is a printer. Paper is fine when you have redundancy.
Here I’m concerned with digital copies only.
Desirable properties of a digital backup
Encrypted: this allows outsourcing its storage to untrusted parties, for example, cloud providers, or even public forums.
Has access control: decrypting it should only be available to the desired parties (typically, a subset of the cosigners)
Easy to implement: it should not require any sophisticated tools
Vendor-independent: ideally, it should be easy to implement using any hardware signing device.
Deterministic: the result of the backup is the same for the same payload. Not crucial, but a nice-to-have.
A simple deterministic encrypted backup scheme (draft)
We could just encrypt the payload with each of the xpubs of the parties that we want to be able to decrypt it.
Idea 1:
We can do better: we generate a random 32-byte symmetric secret s, encrypt s with each of the public keys, and encrypt the payload with s. This reduces the backup size from O(n \cdot |data|) to O(n + |data|) for n keys.
Idea 2:
For any party that knows the descriptor, there is nothing to protect. Therefore, secrecy is reduced to secrecy for anyone who doesn’t already have the descriptor. We already have a key (the xpub) for each involved cosigner, so we can re-use the same key as the encryption key. However, using asymmetric encryption would require the private key for decryption. This is undesirable, as private keys might be kept in secure enclaves that might not be easily programmed with customized decryption logic. Instead, we reuse the entropy of the public key itself to generate a symmetric secret key, and use it to ‘encrypt’ the shared secret s. Therefore, for the party i with public key p_i, we derive its symmetric secret s_i = \operatorname{sha256}(``\textrm{BACKUP_INDIVIDUAL_SECRET}" \| p_i). This avoids asymmetric encryption, and only requires access to the public key from the secure enclave - a functionality that all the signing devices for bitcoin already provide.
Idea 3:
The only randomness in the process is the shared secret s. In order to make it fully deterministic, we can use the combined entropy of the descriptor to derive a deterministic, shared secret known to anyone who knows the descriptor. Assuming that the different xpubs involved in the descriptor/wallet policy are p_1, p_2, \dots, p_n (in lexicographical order), a simple choice is: s = \operatorname{sha256}(``\textrm{BACKUP_DECRYPTION_SECRET}" \| p_1 \| p_2 \| \dots \| p_n).
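As a rough illustration of Idea 3, here is a minimal Python sketch of the deterministic shared secret. It assumes each p_i is available as raw bytes (the exact key serialization is not specified in this post) and that the tag is ASCII-encoded; the placeholder keys are made up for the demo.

```python
import hashlib

# Tag string taken from the post; ASCII encoding is an assumption.
TAG_SHARED = b"BACKUP_DECRYPTION_SECRET"

def shared_secret(pubkeys: list[bytes]) -> bytes:
    """s = sha256(tag || p_1 || ... || p_n), keys in increasing lexicographic order."""
    h = hashlib.sha256()
    h.update(TAG_SHARED)
    for p in sorted(pubkeys):  # bytes objects compare lexicographically
        h.update(p)
    return h.digest()

# Hypothetical 33-byte placeholder keys, not real xpubs.
demo_keys = [bytes([2]) + bytes([i]) * 32 for i in (3, 1, 2)]

# Sorting makes the result independent of the order the keys are supplied in.
assert shared_secret(demo_keys) == shared_secret(list(reversed(demo_keys)))
```

Because the keys are sorted before hashing, anyone who knows the same set of keys derives the same s, regardless of how their wallet happens to order them.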
The next section puts these ideas together.
The scheme
In the following, the payload data that is being backed up is left unspecified, but it will include (at least) the descriptor or the BIP388 wallet policy. The operator \oplus refers to the bitwise XOR.
Let p_1, p_2, \dots, p_n be the public keys in the descriptor/wallet policy, in increasing lexicographical order
Let s = \operatorname{sha256}(``\textrm{BACKUP_DECRYPTION_SECRET}" \| p_1 \| p_2 \| \dots \| p_n)
Let s_i = \operatorname{sha256}(``\textrm{BACKUP_INDIVIDUAL_SECRET}" \| p_i)
Let c_i = s \oplus s_i
Encrypt the payload data with the symmetric key s using AES-GCM.
The backup is the list of c_i, followed by the encryption of data.
**Note**: this scheme should not be used with descriptors containing private keys (*xprv*). Most software wallets only include *xpubs*.
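The key-derivation part of the scheme can be sketched in a few lines of Python. This is only a sketch under assumptions: keys are treated as raw bytes, tags are ASCII-encoded, and the AES-GCM step is left out because the nonce and serialization choices are not specified here. The function names are made up for illustration.

```python
import hashlib

TAG_SHARED = b"BACKUP_DECRYPTION_SECRET"      # ASCII encoding assumed
TAG_INDIVIDUAL = b"BACKUP_INDIVIDUAL_SECRET"  # ASCII encoding assumed

def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def individual_secret(p: bytes) -> bytes:
    """s_i = sha256(tag || p_i)."""
    return hashlib.sha256(TAG_INDIVIDUAL + p).digest()

def backup_header(pubkeys: list[bytes]) -> tuple[bytes, list[bytes]]:
    """Return (s, [c_1, ..., c_n]) for the given keys."""
    ordered = sorted(pubkeys)  # increasing lexicographic order
    s = hashlib.sha256(TAG_SHARED + b"".join(ordered)).digest()
    cs = [xor(s, individual_secret(p)) for p in ordered]
    return s, cs

# Hypothetical 33-byte placeholder keys, not real xpubs.
demo_keys = [bytes([2]) + bytes([i]) * 32 for i in (1, 2, 3)]
s, cs = backup_header(demo_keys)

# Each c_i "unwraps" to s for the holder of the matching key.
assert xor(cs[0], individual_secret(sorted(demo_keys)[0])) == s
```

The backup would then consist of the list cs followed by the AES-GCM encryption of the payload under s; s itself is never stored.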
Decryption
In order to decrypt the payload of a backup, the owner of a certain public key p computes s = \operatorname{sha256}(``\textrm{BACKUP_INDIVIDUAL_SECRET}" \| p), and attempts the decryption of the payload with the key c_i \oplus s for each of the provided c_i.
Decryption will succeed if and only if p was one of the keys in the descriptor/wallet policy.
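A minimal Python sketch of the recovery side, under the same assumptions as before (keys as raw bytes, ASCII tags). It only produces the candidate payload keys; in a real implementation the AES-GCM authentication check would tell which candidate, if any, is correct. For the demo, the shared secret is a stand-in value and the comparison is done directly.

```python
import hashlib

TAG_INDIVIDUAL = b"BACKUP_INDIVIDUAL_SECRET"  # ASCII encoding assumed

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def candidate_keys(p: bytes, cs: list[bytes]) -> list[bytes]:
    """One candidate payload key per c_i, for the holder of public key p."""
    own = hashlib.sha256(TAG_INDIVIDUAL + p).digest()
    return [xor(c, own) for c in cs]

# Demo setup with hypothetical placeholder keys and a stand-in shared secret.
key_a = bytes([2]) + bytes([1]) * 32
key_b = bytes([2]) + bytes([2]) * 32
outsider = bytes([2]) + bytes([9]) * 32
demo_s = hashlib.sha256(b"stand-in for the shared secret").digest()
demo_cs = [xor(demo_s, hashlib.sha256(TAG_INDIVIDUAL + k).digest())
           for k in (key_a, key_b)]

# A participant finds the real key among the candidates; an outsider does not.
assert demo_s in candidate_keys(key_a, demo_cs)
assert demo_s not in candidate_keys(outsider, demo_cs)
```

The holder of p does not need to know which index is theirs: they simply try every c_i, at a cost of at most n decryption attempts.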
Security considerations
A deterministic encryption, by definition, cannot satisfy the standard semantic security property commonly used in cryptography; however, in our context, it is safe to assume that the adversary does not have access to plaintexts, and no other plaintext will be encrypted with the same secret s.
Further work
I hope this serves as an inspiration for a more formal specification and implementation that software wallets can adopt.
@josh this has a lot in common with the method you were describing to me, except for the inscription and location features of yours. Perhaps getting Salvatore’s scheme standardized and then layering those parts on would help everyone.
@salvatoshi I’d love to share a tool I built that does something similar and perhaps collaborate on getting a multisig backup scheme standardized. I agree that this is a problem that needs to be solved.
The scheme you propose is simple and would appear to work, assuming SHA256 can be used as a secure KDF where the key is derived from a large subset of the data it is encrypting. There is at least one drawback, though:
If the encrypted descriptor is stored publicly or on a compromised server, an attacker who gains access to one secret gains knowledge of the existence of the multisig. This is not ideal if a user wants to protect themselves with a decoy single-sig wallet.
The scheme I’m using makes one significant change. In a k-of-n multisig descriptor, the secret s is split into n shares using Shamir secret sharing, where k shares are needed to recover. Each share is then encrypted with one xpub, so that k xpubs are needed to decrypt.
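For readers unfamiliar with the k-of-n splitting step, here is a textbook Shamir secret sharing sketch in Python. It is a generic illustration, not the code of the tool mentioned here; the choice of field (the Mersenne prime 2^521 - 1) and the function names are assumptions made for the example.

```python
import secrets

# A prime larger than any 256-bit secret (2**521 - 1 is a Mersenne prime).
PRIME = 2**521 - 1

def split(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares such that any k of them recover it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, n + 1)]

def recover(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

# 2-of-3 example with a random 256-bit secret.
demo_secret = secrets.randbits(256)
demo_shares = split(demo_secret, k=2, n=3)
assert recover(demo_shares[:2]) == demo_secret
```

In a scheme like the one described, each share (rather than s itself) would then be encrypted for one cosigner, so that fewer than k cosigners learn nothing about s.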
The other minor difference is that I leave the derivation paths in plaintext, so that a user knows how to derive their xpubs. Only the sensitive data is encrypted (the xpubs and master fingerprints).
As of now, the scheme only supports standard (non-taproot) multisig descriptors. In the future, I hope to generalize it to support decaying and non-decaying P2TR multisigs.
Here’s the GitHub repo and the corresponding Delving post. The slides I presented at BitDevs ATL can be found here.
Hi @josh, thanks for the comments! Somehow I missed your previous post, my bad.
I would say in my scheme there is no distinction between an attacker and ‘someone who knows a secret’, as it’s designed to give knowledge of the descriptor precisely to the people who know at least one of the xpubs (or a subset of them, if desired). So if someone has the backup and knows an xpub, they are expected to be able to decrypt.
Shamir secret sharing, apart from adding at least some (arguably manageable) complexity, does not generalize well to wallet setups more complex than multisig. For example, in a setup where there is a time-locked recovery partner that can help retrieve the funds if the primary spending path became inaccessible, you want them to be able to decrypt the backup even with the single xpub.
If you don’t want to enable some party to decode the backup, I think what will work better in practice is to have redundant copies of the backup, but do not give access to the backup to this third party (therefore, not posting it in a public place). Only if the primary spending path becomes lost, then they will be sent the encrypted backup.
This is a great idea; even just a list of all the derivation paths that appear in the key-origin information (without attribution to specific keys) would reduce the search space to at most n xpubs when attempting decryption.
Assuming we want all keys to form the secret, one way to “prevent” someone from being able to access it would be to simply not generate their c_i. Might be useful for some use cases.
I’m also wondering whether the c_i should use different entropy, perhaps from a different (this time standard) derivation path on the same device. The major drawback is that all devices need to provide their second key for the backup to be performed, instead of just any person in the setup being able to create the encrypted backup.
The advantage is to not have to deal with unknown paths (let’s not create a descriptor of the descriptor backup?), possibly even allowing hardware manufacturers to later on add security features to this specific path (confirm on screen to share it?), without breaking compatibility now.
Lastly, I feel like these files would benefit strongly from an error correction mechanism.
I obviously don’t like the idea of sending it to the chain, so I assume most users won’t have a large number of replicas.
In the case of Liana, assuming it’s for disaster recovery or inheritance, it might be just one copy easily accessible. You want that one to be correct.
Indeed this is what I meant when I wrote above that the scheme has access control, but I could elaborate a bit more.
I agree with the advantage of reducing/eliminating the number of search paths for recovery. However, my main concern is that in practice, this adds a big dependency: the backup scheme now needs access to the necessary tech stack to access the hardware signers (notoriously, a non-trivial one), and the physical device needs to be available when the backup is created - so for example, a watch-only wallet that only receives the descriptor can’t create the encrypted backup.
Instead, the more trivial scheme above is a pure function f(descriptor) -> backup, which I think is a big practical advantage.
Can you elaborate on this? I can’t think of situations where error correction would save the day.
I’d rather suggest implementations to get creative in how to make sure that there are multiple replicas of the backup. Save to google drive? Send via e-mail or DM to someone else (that’s two copies)? Post on nostr/twitter/facebook?..
All of these options could be just a few clicks away with a good implementation in software wallets.
Many services doing any form of collaborative custody (or providing services for self-custody) could also consider storing the encrypted backup for their customer, so backup would be entirely transparent and add no UX cost at all.
Over time, data storage decays. Since we can’t assume it will always be stored in a self-healing manner, I would put such a self-healing mechanism in the backup itself.
That being said, it might be overkill in the sense that most data storage media already implement an error correction mechanism (SD, HDD, SSD, optical disks).
Still, each mechanism has its own tolerance for failure, and sometimes the error correction assumes regular use.
To answer your question: backups on offline flash, magnetic or optical storage, left to rot for 10 years.
Sure, extra ECC won’t ultimately prevent failure, but it could prolong the viability of the data at negligible cost.
While having more backups is the better option, it’s still not certain the user will easily be able to store and recover data in a large number of places.
Another point would be that a user might not want to leave copies everywhere (or in public, even if encrypted).
To conclude, I’d say that this scheme should still “strongly recommend” having multiple backups, in multiple places. But any one copy should also be pretty resilient against decay, more than your average data.
I like this approach. Conversely, if you lose one of the signer keys (and the software wallet that goes with it), having any of the other keys lets you recover this information. E.g. your house burns down along with one signing device, your Bitcoin Core node and its wallet, but you still have a signing device in a vault.
Calling this s had me confused. But IIUC what you’re actually doing here is to reconstruct your s_i, without knowing which i is you. You try it against each c_i to see if you found s. The way you know if that worked is if s decrypts something sane.
You still need to figure out which xpub to use for p.
The BIP87 account-level xpub seems like a good candidate to recommend for this, where you may have to try multiple accounts for decryption.
Or you just add a (plain text) derivation hint to the backup.
Another approach is to pick a standard derivation path for these backups, but then you lose the nice property of being able to derive the backup key from a descriptor:
One such practical advantage:
Backups can easily be verified against data corruption by any software that has the descriptor.
If you want to go one level fancier, you can even use this backup format for cloud sync.
I would suggest a simple JSON blob with a "descriptor" field. That can arbitrarily be expanded to include whatever else people want to (occasionally) backup, such as BIP329 labels.
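A small Python sketch of what such a payload could look like before encryption. Only the "descriptor" field comes from the suggestion above; the "labels" field, the descriptor string, and the helper name are hypothetical placeholders. Sorting keys and fixing separators keeps the serialization deterministic, which matters if the backup itself is meant to be deterministic.

```python
import json

def encode_payload(descriptor: str, **extra) -> bytes:
    """Serialize the backup payload as compact JSON with sorted keys."""
    payload = {"descriptor": descriptor, **extra}
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")

# Placeholder descriptor string, not a valid descriptor.
demo_descriptor = "wsh(sortedmulti(2,XPUB_A/<0;1>/*,XPUB_B/<0;1>/*))"

# "labels" is a hypothetical optional field, e.g. for BIP329-style labels.
blob = encode_payload(demo_descriptor, labels=[{"ref": "example", "label": "savings"}])

assert json.loads(blob)["descriptor"] == demo_descriptor
```

Software that only understands the "descriptor" field can ignore everything else, so extra fields can be added later without breaking recovery.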
But this is an essential property for recovery in my opinion, see above.
This adds complexity to recovery, since the (miniscript) descriptor might define a completely different policy than the information access control. You’d also have to remember the threshold value.
In any case this scheme still reveals the presence of a more sophisticated wallet even if not its contents. It makes it unlikely an attacker falls for the decoy.
Steganographic storage of the backup seems like a better way to deal with this issue.
Not much documentation on the link to understand what it is, but it seems about backing up secrets (like seeds and private keys), while the scheme I’m proposing is about backing up public keys and descriptors/wallet policies.
Hey @salvatoshi, I created a Rust library, descriptor-encrypt, that can encrypt any descriptor such that only authorized spenders can decrypt. I plan to make a separate post about it, but I wanted to share it here first as I thought you might find it interesting.
The basic idea is to make the access control policy match the spending policy of the descriptor. It supports all descriptor types and miniscript, and it includes a tag-based and variable-length encoding scheme to minimize the size of the encrypted data, among other features.
With “full secrecy” mode turned on, the encrypted data can be stored in public, and an attacker will learn nothing about the descriptor, or even its existence, unless they compromise enough seeds to spend the funds.
Let me know what you think! My goal was to address your earlier concern about not being able to handle complex wallet setups, like those with a hash-lock or time-lock.
Hi josh,
I didn’t look into the details of how the recursive secret splitting works, but it seems reasonable and it’s very cool that this can be done at all. Good work!
In a way, it could be considered a generalization of the scheme: in the form I proposed, the parties that can decode the backup are (a subset of) the parties providing the xpubs, while with your scheme you can also enable thresholds of them, and more complex subsets matching (some of) the spending conditions defined by the miniscript rules.
In practice, I still expect that the simple choice is the best for most users, and likely has a much lower adoption barrier because of the much simpler implementation complexity. In particular, if recovering from backup requires multiple parties, building a UX for it is substantially more complex in a wallet.
Thanks! I agree that the implementation is more complex, but my hope is that packaging it into a rust crate with WASM and other bindings might make it easier for wallets to adopt.
Regarding multi-party wallets, I agree that there is a tradeoff there. You get stronger privacy guarantees, in the event that one of the keys is compromised, but recovery then requires two rounds of collaboration, instead of one.
BIP 388 (Wallet Policies for Descriptor Wallets) could be expanded to add T, the wallet birthday timestamp. It has to be the same everywhere, e.g. tr(musig(@0,@1)/T/**,{and_v(v:pk(@0/T/**),older(12960))} is 2-of-2 where @0 can unilaterally sign after 3 months.
You could also use block height H, since it’s a smaller number. But not everyone has an intuitive feel for block heights. It’s also much more likely that you’ll recognise a date even in the distant future.
That would be a breaking change in the specs for all the hardware signers that implemented it.
I think adding the /T step (whether hardened or unhardened) for each of the involved keys (rather than modifying the descriptor template) achieves the same without breaking changes, so your descriptor above would still be just tr(musig(@0,@1)/**,{and_v(v:pk(@0/**),older(12960))}, but each key has the additional derivation step for the xpubs (so /T only appears in the key origins, rather than in the descriptor template).
In that form, the fact that T is the same for all keys is not a requirement - and I don’t think it’s practical to expect it in a multiparty setting: people will provide an xpub at a different time. Forcing it to match would require a round of communication prior to exporting the xpub. While I expect wallets to just store the other parties’ key origins if they have it, that is not always possible and wallets should probably avoid relying on its knowledge at all.
I hadn’t thought about the interactivity requirement for making T match. My implicit assumption was that one party collects the xpubs, picks T, generates the descriptors and sends them back.
But by making T part of the key, you lose the property of having a predictable xpub to recover from.
A related issue is that Bitcoin Core normalizes descriptors to the last hardened derivation step, so aside from musig() which is a bit special, [a]/T would just become [aT] and we lose the predictable xpub.
Another thing I realized is that BIP32 only allows for 31-bit numbers (one bit of the 32-bit index is reserved for the hardening flag), so this scheme would break in 2038. We could divide the timestamp by 3600 to work around that. Hourly precision should be enough to avoid duplicates.
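To make the arithmetic concrete: 2^31 seconds after the Unix epoch falls in January 2038, so a raw Unix timestamp stops fitting in a 31-bit derivation index at that point. A quick Python check, where birthday_index is a hypothetical helper name for the divide-by-3600 workaround:

```python
from datetime import datetime, timezone

MAX_INDEX = 2**31 - 1  # largest value that fits in 31 bits

# Interpreted as Unix seconds, the largest index is reached on 2038-01-19 (UTC).
last_second = datetime.fromtimestamp(MAX_INDEX, tz=timezone.utc)
assert (last_second.year, last_second.month, last_second.day) == (2038, 1, 19)

def birthday_index(unix_time: int) -> int:
    """Hourly-precision index: the Unix timestamp divided by 3600."""
    index = unix_time // 3600
    if index > MAX_INDEX:
        raise ValueError("timestamp does not fit in a 31-bit index")
    return index

# A timestamp just past the 31-bit limit still yields a small hourly index.
assert birthday_index(2**31) == 596523
```

With hourly precision, the 31-bit range covers more than 200,000 years, so overflow stops being a practical concern.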
It was suggested above (and I agree) to leave the derivation paths from the key origins in clear text in the wallet backup (tbd if with or without fingerprints). Wouldn’t that obviate the need for a predictable xpub?
It would, but it would also reveal the number of participants.
I think it’s better to have a (non-mandatory) predictable derivation, and just recommend using a fresh account number when reusing a device in more than one setup. During recovery, trying a few different account numbers shouldn’t be too bad.
I’m trying to wrap my head around the XOR set of individual secrets Ci (included in backup) as they relate to the shared secret S (to decrypt the ciphertext payload).
[updated] to explain my confusion yesterday about the above statement. They’re not related at all. The shared decryption key is a secret by itself, as are each of the individual Ci pre-images, which are hashed to hide the shared secret until one of the cosigners can remove theirs via XOR to reveal the shared decryption key.
If there are only 2 xpubs in a descriptor, then the XOR result of both Ci values IS the shared secret?
If there are 4 xpubs, or any “even” number of xpubs since each Ci is the whole shared secret minus that individual secret, then the combined XOR result of all Ci values (w/ each individual secret XORed an odd number of times, and revealed), IS the shared secret?
I think I recall a version of similar scheme where each Ci was ciphertext decrypted by its pubkey that revealed the shared secret, rather than XOR-“subtracted” (if that makes sense) from the shared-secret.
I’ll politely ask you to excuse me if I’ve wasted your time or brain cycles to consider my above doubts. I thought that it would be possible to single-out one of the individual secrets and eventually the shared secret by playing with a subset of the Ci secrets, but after trying, I see that the best we can do is an XOR result of at least 2 unknowns.
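The conclusion above can be checked directly. Since c_i = s ⊕ s_i, XORing an even number of the c_i cancels s and leaves only the XOR of the individual secrets, while an odd number leaves s still masked by them. A Python sketch under the same assumptions as the rest of the thread (keys as raw bytes, ASCII tags, made-up placeholder keys):

```python
import hashlib
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def xor_all(values: list[bytes]) -> bytes:
    return reduce(xor, values)

def make_backup_keys(pubkeys: list[bytes]):
    """Return (s, [s_i], [c_i]) following the scheme described in the first post."""
    ordered = sorted(pubkeys)
    s = hashlib.sha256(b"BACKUP_DECRYPTION_SECRET" + b"".join(ordered)).digest()
    s_i = [hashlib.sha256(b"BACKUP_INDIVIDUAL_SECRET" + p).digest() for p in ordered]
    c_i = [xor(s, x) for x in s_i]
    return s, s_i, c_i

# Hypothetical 33-byte placeholder keys, not real xpubs.
demo_keys = [bytes([2]) + bytes([i]) * 32 for i in range(1, 5)]

# Two keys: s cancels out, leaving s_1 xor s_2, which is not s.
s, s_i, c_i = make_backup_keys(demo_keys[:2])
assert xor_all(c_i) == xor_all(s_i)
assert xor_all(c_i) != s

# Three keys: s survives, but is still masked by all the individual secrets.
s, s_i, c_i = make_backup_keys(demo_keys[:3])
assert xor_all(c_i) == xor(s, xor_all(s_i))
```

Either way, the combination is only useful to someone who already knows at least one s_i, which means they know one of the public keys and could decrypt anyway.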
Thank you for putting thoughts into this important topic.
One idea for an improvement: I do not like that I need now to keep a secret to get an access to descriptor. So, what if I use master xpub chain code as a secret? With that, I would be able to gain access to all descriptors in which keys derived from that master xpub participate.
Also, it is not clear why we need to have a shared secret; instead, each multisig participant could create their own backup, symmetrically encrypted with just their master xpub chain code.
I do not completely agree with you on this specific point: for privacy reasons, you may not want any recovery key to be able to decrypt alone, for instance if the lowest recovery condition is a threshold.
Let’s say my Liana policy is or(A, and(thresh(2,B,C,D), older(timelock))) where I’m A and B,C,D are my heirs. I may want to limit the possibility to decrypt the backup to the case where 2 of my heirs cooperate (or even 2 heirs + a lawyer/third party).
In this (particular) case SSS (or any mechanism enforcing a threshold at decryption) may be useful.
But I personally see it more as an optional feature or a different format/version rather than the default.
In the proposed scheme, indeed you do not need any additional secret (other than your seed/mnemonic). The encryptions of the common secret are part of the backup, not extra secrets to store individually.
There are certainly possible extensions, as explored by @josh ([1][2]). While I think they are neat and interesting, I would be wary of adding complexity to an otherwise very simple scheme, as I think it is likely to hamper adoption. While a library can encapsulate the implementation complexity, it cannot always encapsulate the interface, which is often made more verbose/difficult by the presence of additional features. Interoperability might also be affected if there are optional features.
In your example, and for most inheritance use cases, the capability of individual heirs to decrypt the backup (even if cooperation is required to actually move the funds) is IMHO unlikely to be problematic in practice.
Related to the topic of this thread, the Bitcoin Core wallet is about to add an RPC to backup a wallet’s descriptors and related information. See https://github.com/bitcoin/bitcoin/pull/32489. I’m unsure how much overlap there is with this effort, but it’s about achieving the same goal: backing up all private-but-not-secret information related to the wallet of a user.
In the meantime, I’ve “played” with a C implementation of this BIP draft. I wanted to check how hard it would be to implement in Bitcoin Core, and I figured out something: there seems to be no dependency in Core for AES-GCM-256, while CHACHA20 is already used, so I’m wondering whether we should use CHACHA20 as the default encryption algorithm. (I’ll cross-post to Delving.)
I’d like to get feedback about changing the default encryption algorithm from AES-GCM to CHACHA20. I really think having this implemented in Core would be a strong plus, and I don’t think it’s worth adding new dependencies only for that purpose.
The secret vs private distinction is a great framing and I think it’s going to change how people think about descriptor backup burden.
However, I still think about two things:
The availability problem feels not addressed. Google deletes inactive accounts. Nostr relays drop data whenever they feel like it. Email providers get acquired. The person who stored 5 copies at wallet creation has no way to know any of them survived without actively checking, which most people won’t do. Confidentiality and durability are separate properties and only one of them is solved here.
The inheritance path also seems to assume the heir holds an xpub, which means they were a cosigner or the owner handed them key material before dying, which is kind of the hard part of inheritance anyway. The dead man switch email workaround is honest about this gap but reintroduces exactly the centralised dependencies the encryption scheme eliminates elsewhere.
Questions I don’t have an answer to are:
Is there a way to provide durability guarantees that are enforced by something other than user discipline?
What’s the right primitive for heirs who hold no key material at all?
You are correct on both points: this scheme makes no attempt to solve the availability problem, nor to guarantee the correct key discipline for inheritance and other use cases. I think those have to be addressed on other layers.
I would disagree that the availability problem is hard to solve, though. Computers are really good at storing and copying things. While it’s true they aren’t very good at holding on to things for a very long time, that problem is solved with enough redundancy.
People seem to disagree and have proposed solutions like OP_RETURNs or inscriptions as a guaranteed backup, but I don’t think that’s necessary, and ultimately it will likely become too expensive anyway.
Ensuring correct key management from the involved cosigners (especially heirs) is definitely the hardest nut to crack for adoption.
For non-technical people, I tend to think that ‘assisted self-custody with custodial fallback’ is the most reasonable and technologically approachable solution. That is, they are in self-custody (and during that time, a non-custodial service can provide reminders about coin refresh and key management); but if all else fails, after a long enough time-frame, some moderately-trusted third party gets access to the funds. At least trust is only required in case of complete failure, rather than in the normal scenario.
Fair point. Tools already exist to automate replication across multiple providers, and if that gets built into standard software it probably handles most of the availability problem without needing dedicated infrastructure.
Where I keep landing is one step further though. Even with perfect replication, the heir still has to find and access those copies under stress, possibly years after the owner’s accounts went inactive. I think that’s a coordination problem on the recovery side that feels separate from the storage side.
Is the custodial fallback your answer to that too, or do you see that as a different layer entirely?