Link to the original protocol writeup found here. See the Delving Bitcoin post for initial works and discussion.
Motivation
While many coins are available in cache when spent, every cache miss requires a trip to the disk. Memory can be scarce for single board machines, so cache misses are expected to be far more frequent. A trip to disk is all-the-more severe on low end hardware, so any minimization of disk interactions can speed up initial block download for limited resource devices. One such example may be a block template provider serving a home miner.
Intuition
The vast majority of coins created are later spent. The cost of this observation is many coins will be written to disk, only to be removed later. The SwiftSync protocol alleviates the problem by recording what coins have been added and removed with a small, in-memory aggregate. This allows for the removal of intermediate coin states until SwiftSync has completed. The state of the aggregate is verified at a predetermined chain height, and IBD continues as usual thereafter.
For any chain height, we know all outputs - all inputs = UTXO set, which is equivalently all outputs - UTXO set = all inputs. The goal is to verify the set all outputs - UTXO set is equal to all inputs. Addition and subtraction modulo a field offers a way to do so. For any set, if we add all elements of the set and subtract them, regardless of ordering, we will arrive at zero. In the current implementation, the set elements being compared are hashed outpoints.
The SwiftSync protocol is comprised of two structures. First, the aggregate that will record the state of what outpoints have been added (created) or subtracted (spent). Next, notice that the UTXO set must be reported in order to verify all outputs - UTXO set = all inputs. SwiftSync introduces a hints file, which may be provided to the client at startup. This hints file is nothing more than a pointer to each unspent output by recording the block height and the indexes that will remain unspent. By using this file, a client may omit outpoints that will be in the UTXO set when updating the aggregate. Note this file is not trusted. If a malicious actor provides a faulty UTXO set, the set comparison above will fail, and the client may begin a reindex.
Implementation
Aggregate
The hash aggregate takes outpoints, hashes them with a salted hasher, and updates 4, 64-bit limbs with wrapping modulo arithmetic. The choice of 4, 64-bit limbs with no carries between limbs was made to: 1. minimize code complexity within the aggregate 2. conform to existing APIs, namely uint256. After adding and subtracting some arbitrary list of outpoints, we may poll if the sets were equivalent with IsZero.
Hintsfile
The hints file is a map of block height to the indexes of the outputs in that block that will be in the UTXO set. To construct this file for a particular UTXO set state, each block must be read and queried to find the outputs in that block that remain unspent. Because this is a read-only operation, this may be done on parallel threads in the future. To allow for parallelization, the hints file contains a “contents” section, which denotes the height and corresponding file position to find the hints for that block.
To save space when representing the indexes that remain unspent in a block, an easy choice is to take the difference between the last coin that was unspent and the next one in the block. AFICT this is a version of run-length-encoding. These indexes are encoded using compactSize to further save space, the run-lengths should normally fit into at least 16 bits, often 8 bits.
Parameter Interaction
A side effect of omitting intermediate UTXO states while performing IBD is the “undo data” cannot be derived. There are already configurations that do not have full undo data history, namely pruned nodes. To avoid additional complexity in this PR, this version of accelerated IBD is only made available to pruned nodes, as the undo data would have been removed regardless of the this setting.
Other parameter interactions, not yet implemented:
assumeutxo: Startup should fail if both are configured. A hints file will have no effect, as the UTXO set is already present.assumevalid: A natural choice is to only allow SwiftSync when usingassumevalid, and to only allow a hints file that commits to theassumevalidblock. Similar toassumeutxo, a hash of the hints file may be committed to in the binary, which would be resistant to griefing attacks.
Testing [Linux example]
In your bitcoin directory, download the hints from the test server:
0curl -o signet.hints http://utxohints.store/hints/signet
Verify the hash of the file:
0echo "2b625392ecb2adba2b5f9d7b00c8f3a8adca0d3c signet.hints" | sha1sum --check -
Build the branch:
0cmake -B build && cmake --build build -j $(nproc)
Remove existing chain state:
0rm -rf ~/.bitcoin/signet/
Provide the file at startup:
0./build/bin/bitcoind --chain=signet --utxohints=./signet.hints --prune=550
Other networks:
Testnet4: endpointhttp://utxohints.store/hints/testnet4, expected checksum7acbb25b651ffc5ebc59c46fd065a510c149f00e testnet4.hints
Open Questions
- Given the outcome of SwiftSync is pass/fail, and there is no use of intermediate states, should blocks be persisted to disk at all during the protocol?
- To what degree should a hints file be committed to in the binary. No commitment allows for a server to provide hints for an arbitrary height, providing a convenience for the client, but opens up the possibility of a grief attack. Of course the server risks losing reputation for serving faulty files.
Future Iteration
This is the first iteration of the SwiftSync proposal, however more optimizations may be made. Namely, because set elements may be added or subtracted from the aggregate in any order, blocks may be downloaded in parallel from multiple peers. If you are interested in benchmarking the performance of a fully-parallel version, see the Rust implementation.