A more detailed proposal can be found here: https://github.com/jamesob/assumeutxo-docs/tree/2019-04-proposal/proposal
I’d like to talk about the desirability of an assumeutxo
feature, which would allow much faster node bootstrapping in the spirit of assumevalid
. For those unfamiliar with assumeutxo, here’s an informal description from one of last year’s meetings:
“Assume UTXO” is an idea similar to assumevalid. In assumevalid, you have a hash that is hard-coded into the code, and you assume all the blocks in the chain that ends in that hash, that those transactions have valid scripts. This is an optimization for startup to not have to do script checks if you’re willing to believe that the developers were able to properly review and validate up to that point. You could do something similar, where you cache the validity of the particular UTXO set, and it’s ACKed by developers before updating the code. Anyone can independently recompute that value. There’s some nuanced considerations for this […]. The downside if you fuck up is bigger. In assumevalid, you can trick someone into thinking the block is valid if it wasn’t, and assumeutxo you might be able to create coins or something; there are definitely different failure modes. So there’re different security implications here. You could use assumeutxo, but you could go back and double check later. But there’s nuance about how this should be done or how. You could have nodes that export this value, [and so one attack might be] you could lie to your friend I guess, but that’s less scary than other things.
In other words, assumeutxo would be a way to initialize a node using a headers chain and a serialized version of the UTXO state which was generated from another node at some block height. A client making use of this UTXO “snapshot” would specify a hash and expect the content of the resulting UTXO set to yield this hash after deserialization.
This would allow users to bootstrap a usable pruned node & wallet far more quickly from a ~3GB file (at time of writing) rather than waiting for a full initial block download to complete, since we only have to sync blocks between the base of the snapshot and the current network tip. Needless to say this is at expense of operating under a different trust model (albeit temporarily), though how different this really ends up being from assumevalid
in effect is worth debate.
An implementation of assumeutxo could allow background validation of the loaded UTXO snapshot to happen concurrently with use of the assumed chain, so the trust model is only relaxed for a limited amount of time. Conceptually this is pretty straightforward: you’d have two chainstates maintained simultaneously, one of which is doing an IBD from genesis up to the base of the snapshot and the other (i.e. the assumed chain) just doing tip maintenance and servicing immediate requests for chain data. Once the validation chainstate reaches the height of the snapshot base, it computes a hash of the UTXO set it built and compares it to the snapshot’s hash. If it matches, we throw the validation chainstate away and continue to operate on the “assumed” (but now fully validated) chain - otherwise we pitch a fit in the logs and GUI, and either revert to the validation chainstate and continue traditional IBD or simply shutdown.
Specifying assumeutxo
The assumeutxo value/height pairings could be committed in much the same way that we currently update assumevalid. Eventually (and this is not a new idea) we could consider using a rolling UTXO set hash or some similar technology to commit to the UTXO set hash in each block header, though this of course ends up being a consensus change. This would obviate the need for any hardcoded assumeutxo value since bootstrapping nodes could obtain the headers chain, choose a UTXO set hash value at some height, and then obtain the correspondent UTXO snapshot from their peers on the basis of that hash.
Obtaining snapshots
Snapshots could be obtained from a variety of sources, whether over the peer network or centralized content distribution networks (CDNs). It doesn’t really matter since security is contingent on a content hash, though if assumeutxo ends up being something we actually want to support then I’d imagine we would build a means of distribution through the peer network.
Steps
I’ve already drafted up an implementation of this that excludes any P2P mechanism for distributing snapshots but makes the changes necessary for loading snapshots and doing concurrent background validation. It exposes two new RPC commands, dumptxoutset
and loadtxoutset
, to provide a test harness for creating and loading snapshots. Whether or not we’d actually want to commit that RPC interface is something I’d like input on.
Another option is to eschew a loadtxoutset
RPC and use a -utxosnapshot=<path>
startup parameter.
Afterwards, supposing we get that far, we’ll probably want to think about implementing P2P distribution of snapshots, and then maybe starting thinking about a rolling UTXO set hash for potential inclusion in block headers.
Questions
I’m curious for general opinions on this idea and the concrete implementation steps. If this ends up being desirable for the project, it’ll require a lot of refactoring and will probably result in a lengthy succession of incremental PRs (a la the process separation effort), though I am of course happy to propose large, cohesive diffs too. :)
Specific questions I have are:
- Does the assumeutxo trust model differ materially from assumevalid? If so, how? Is it too aggressive a departure from our existing trust model?
- If we agree this is a feature worth supporting, does the sequence of “RPC commands -> hardcoded assumeutxo value, optional use, P2P distribution -> UTXO rolling set hash block header commitment” make sense?