See the proposal for assumeutxo here.
Testing instructions can be found below the “Progress” section.
Progress
All items here have corresponding commits here, but are unchecked if they haven’t been merged yet.
- Chainstate interface
- Localize chainstate data
- Share block data
- Deglobalize chainstate
- UpdateTip/CheckBlockIndex modifications
- ChainstateManager
- Mempool
- LoadBlockIndex
- Init/teardown
- Wallet: includes avoiding rescans when assumed-valid block data is in use
- P2P: minor changes are made to init.cpp and net_processing.cpp to make simultaneous IBD across multiple chainstates work.
- Pruning: implement correct pruning behavior when using a background chainstate
- Blockfile separation: to prevent “fragmentation” in blockfile storage, have background chainstates use separate blockfiles from active snapshot chainstates to avoid interleaving heights and impairing pruning.
- Indexing: all existing CValidationInterface events are given an additional parameter, ChainstateRole, and all indexers ignore events from ChainstateRole::ASSUMEDVALID so that indexation only happens sequentially.
- Raise error when both -reindex and assumeutxo are in use.
- RPC: introduce RPC commands dumptxoutset, loadtxoutset, and (the probably temporary) monitorsnapshot.
- Release docs & first assumeutxo commitment: add notes and a particular assumeutxo hash value for first AU-enabled release.
- This will complete the project and allow use of UTXO snapshots for faster node bootstrap.
- (optional) Coinscache optimization: allow flushing chainstate data without emptying the coins cache; results in better performance after UTXO snapshot load.
Testing
For fun (~5min)
If you want to do a quick test, you can run ./contrib/devtools/test_utxo_snapshots.sh and follow the instructions. This is mostly obviated by the functional tests, though.
For real (longer)
If you’d like to experience a real usage of assumeutxo, you can do that too.
I’ve cut a new snapshot at height 788'000 (http://img.jameso.be/utxo-788000.dat - but you can do it yourself with ./contrib/devtools/utxo_snapshot.sh if you want). Download that, and then create a datadir for testing:
```sh
$ cd ~/src/bitcoin # or whatever

# get the snapshot
$ curl http://img.jameso.be/utxo-788000.dat > utxo-788000.dat

# you'll want to do this if you like copy/pasting
$ export AU_DATADIR=/home/${USER}/au-test # or wherever

$ mkdir ${AU_DATADIR}
$ vim ${AU_DATADIR}/bitcoin.conf

dbcache=8000 # or, you know, something high
blockfilterindex=1
coinstatsindex=1
prune=3000
logthreadnames=1
```
Obtain this branch, build it, and then start bitcoind:
```sh
$ git remote add jamesob https://github.com/jamesob/bitcoin
$ git fetch jamesob utxo-dumpload-compressed
$ git checkout jamesob/utxo-dumpload-compressed

$ ./configure $conf_args && make # (whatever you like to do here)

# start 'er up and watch the logs
$ ./src/bitcoind -datadir=${AU_DATADIR}
```
Then, in some other window, load the snapshot:
```sh
$ ./src/bitcoin-cli -datadir=${AU_DATADIR} loadtxoutset $(pwd)/utxo-788000.dat
```
You’ll see some log messages about headers retrieval and waiting to see the snapshot in the headers chain. Once you get the full headers chain, you’ll spend a decent amount of time (~10min) loading the snapshot, checking it, and flushing it to disk. After all that happens, you should be syncing to tip in pretty short order, and you’ll see the occasional [background validation] log message go by.
In yet another window, you can check out chainstate status with
```sh
$ ./src/bitcoin-cli -datadir=${AU_DATADIR} getchainstates
```
as well as usual favorites like getblockchaininfo.
Original change description
For those unfamiliar with assumeutxo, here’s a brief summary from the issue (where any conceptual discussion not specific to this implementation should happen):
assumeutxo would be a way to initialize a node using a headers chain and a serialized version of the UTXO state which was generated from another node at some block height. A client making use of this UTXO “snapshot” would specify a hash and expect the content of the resulting UTXO set to yield this hash after deserialization.
This would allow users to bootstrap a usable pruned node & wallet far more quickly (and with less disk usage) than waiting for a full initial block download to complete, since we only have to sync blocks between the base of the snapshot and the current network tip. Needless to say this is at expense of accepting a different trust model, though how different this really ends up being from assumevalid in effect is worth debate.
In short, this is an interesting change because it would allow nodes to get up and running within minutes given a ~3GB file (at time of writing) under an almost identical trust model to assumevalid.
In this implementation, I add a few RPC commands: dumptxoutset creates a UTXO snapshot and writes it to disk, and loadtxoutset reads a snapshot from disk, constructs and activates a chainstate based on it, and continues a from-scratch initial block download in the background for the sole purpose of validating the snapshot. Once the snapshot is validated, we throw away the chainstate used for background validation.
The assumeutxo procedure as implemented is as follows:
- A UTXO snapshot is loaded with the loadtxoutset <path> RPC command.
- A new chainstate (CChainState) is initialized using ChainstateManager::ActivateSnapshot():
  - The serialized UTXO data is read in and various sanity checks are performed, e.g. compare expected coin count, recompute the hash and compare it with assumeutxo hash in source code.
  - We “fast forward” new_chainstate->m_chain to have a tip at the base of the snapshot (with or without block data). Lacking block data, we fake the nTx counts of the constituent CBlockIndex entries.
  - LoadChainTip() is called on the new snapshot and it is installed as our active chainstate.
- The new assumed-valid chainstate is now our active chainstate, so it enters IBD until it is synced to the network’s tip. Presumably the snapshot would be taken relatively close to the current tip but far enough away to avoid meaningful reorgs, say 10,000 blocks deep.
- Once the active chainstate is out of IBD, our old validation chain continues IBD “in the background” while the active chainstate services requests from most of the system.
- Once the background validation chainstate reaches a height equal to the base of the snapshot, we take the hash of its UTXO set and ensure it equals the expected hash based on the snapshot. If the hashes match, we delete the validation chainstate and move on without event; if they don’t, we log loudly and fall back to the validation chainstate (we should probably just shut down).
The implicit assumption is that the background validation chain will always be a subset of the assumed-valid snapshot chain while the latter is active. We don’t properly handle reorgs that go deeper than the base of the snapshot.
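The procedure above can be sketched as a small toy model. This is only an illustration under loud assumptions: the names ToyChainstate, ToyChainstateManager, utxo_set_hash, and check_background are hypothetical, the UTXO set is a plain dict, and SHA-256 over sorted entries stands in for whatever hash the real serialization uses. It is not the C++ implementation in this branch.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


def utxo_set_hash(coins: Dict[str, int]) -> str:
    """Hash a toy UTXO set (outpoint -> amount) in a deterministic order."""
    h = hashlib.sha256()
    for outpoint in sorted(coins):
        h.update(f"{outpoint}={coins[outpoint]};".encode())
    return h.hexdigest()


@dataclass
class ToyChainstate:
    height: int = 0
    coins: Dict[str, int] = field(default_factory=dict)
    from_snapshot: bool = False


class ToyChainstateManager:
    def __init__(self, expected_hashes: Dict[int, str]) -> None:
        # height -> hash, standing in for the values committed in source code
        self.expected_hashes = expected_hashes
        # the original chainstate keeps validating from scratch
        self.background: Optional[ToyChainstate] = ToyChainstate()
        self.active: ToyChainstate = self.background
        self.snapshot_base: Optional[int] = None

    def activate_snapshot(self, base_height: int, coins: Dict[str, int],
                          claimed_count: int) -> None:
        """Sanity-check the snapshot, then install it as the active chainstate."""
        if len(coins) != claimed_count:
            raise ValueError("coin count mismatch")
        if utxo_set_hash(coins) != self.expected_hashes.get(base_height):
            raise ValueError("snapshot hash mismatch")
        self.active = ToyChainstate(base_height, dict(coins), from_snapshot=True)
        self.snapshot_base = base_height

    def check_background(self) -> Optional[bool]:
        """Return None while undecided, True if validated, False on mismatch."""
        if self.snapshot_base is None or self.background is None:
            return None
        if self.background.height < self.snapshot_base:
            return None
        expected = self.expected_hashes[self.snapshot_base]
        ok = utxo_set_hash(self.background.coins) == expected
        if ok:
            self.background = None          # validation chainstate is discarded
        else:
            self.active = self.background   # fall back to the validated chain
            self.snapshot_base = None
        return ok
```

Note that the comparison is made against the committed hash for the snapshot base, not against the active chainstate's current coins, since the active chainstate has moved past the base by the time background validation catches up.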
Changes (already merged/outdated)
The crux of this change is in removing any assumptions in the codebase that there is a single chainstate, i.e. any references to global variables chainActive, pcoinsTip, et al. need to be replaced with functions that return the relevant chainstate data at that moment in time. This change also takes CChainState to its logical conclusion by making it more self-contained - any references to globals like chainActive are replaced with class-local references (m_chain).
A few minor notes on the implementation:
- When we attempt to load a wallet with a BestBlock locator lower than the base of a snapshot and the snapshot has not yet been validated, we refuse to load the wallet.
- For additional notes, see the new assumeutxo docs.
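The wallet-loading rule in the notes above boils down to a simple predicate. The sketch below is a simplification that assumes the wallet's BestBlock locator can be reduced to a single height; the function name and parameters are hypothetical and do not correspond to actual wallet code.

```python
from typing import Optional


def wallet_load_allowed(wallet_best_block_height: int,
                        snapshot_base_height: Optional[int],
                        snapshot_validated: bool) -> bool:
    """Refuse only when the wallet is behind an unvalidated snapshot's base."""
    if snapshot_base_height is None:
        return True  # no snapshot in use, nothing to check
    if snapshot_validated:
        return True  # background validation has already finished
    return wallet_best_block_height >= snapshot_base_height
```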