kernel: feedback on using kernel in alternative implementations #31878

Davidson-Souza opened this issue on February 15, 2025

Davidson-Souza commented at 1:23 pm on February 15, 2025: none

After talking about this with many core devs offline, and being told every time to open an issue with this, I’m opening this to spark some discussion about the viability and optimal implementation of a libbitcoinkernel-like lib to help consensus-compatible implementations. This could be useful for other experimental projects like [2][3].

Goals

Bitcoin doesn’t have a formal spec for its base protocol. It operates under the “reference implementation” model, where Bitcoin Core dictates what Bitcoin actually is. Bitcoin Core is mission-critical software, the backbone of a 2-trillion-dollar industry, and as such it holds the highest standards when it comes to adding new features and experimenting. However, Bitcoin is a completely novel experiment, with many prospective ideas out there that can’t be implemented in Core because they are still incipient, but also can’t be implemented in an alternative client (or at least not trivially) due to the constant risk of falling out of consensus with Bitcoin Core. This tradeoff has stopped or slowed down some ideas like Utreexo and Proof of Work fraud proofs because, to implement them, you need to write a bug-compatible reimplementation of Bitcoin Core (possibly in another language). This is also a problem for projects that don’t operate directly with the consensus layer, like wallets and layer two software, which might misinterpret some parameters and end up putting users’ funds at risk.

For new deployments, such as segwit and taproot, we have very well thought-out descriptions and standardization. We also have abundant test vectors and sometimes even self-contained reference implementations. That makes reimplementations much easier. However, unless we can hard-fork the network to remove all undefined behavior and get a formal specification, building complex software on top of Bitcoin without a way to re-use Bitcoin Core code will always be a non-trivial process.

Utreexo light clients

With utreexo[1] we can have extremely small clients, both in code footprint and actual resource usage. We currently have two implementations of these ideas: utreexod[2] and floresta[3]. The former is written in Golang, and the latter in Rust. While they are well-tested and held to a high quality standard, reimplementing Bitcoin’s consensus logic is a complex problem that can often fail you in the most unexpected ways.

The vast majority of consensus failures come from implementing the low-level checks of blocks and transactions, like script evaluation and locktimes. The implementation of those parts inside Bitcoin Core is already battle-tested and heavily optimized, making any meaningful improvement almost impossible. So it would make sense for other projects to reuse them, reducing the chances of a consensus failure and using the state-of-the-art validation logic. The goal isn’t, in any way, to replace Bitcoin Core. Rather, it is to allow for innovation and research in parts that wouldn’t be inside Core’s scope, but without re-inventing the wheel every single time. For this we have two possible approaches, libbitcoinconsensus and libbitcoinkernel.

libbitcoinconsensus

This was one of the earliest attempts to make the validation code for bitcoin reusable. This lib, however, only exposes script validation. All other checks must be implemented by consumers. Although it has such a limited scope, it turns out that most of the complexity inside Bitcoin’s validation comes from script validation. It is over a thousand lines, and has many known and unknown undefined behaviors that are extremely hard to get right in other languages, like Rust or Go. For instance, the last consensus failure found in btcd is related to FindAndDelete, a function used inside the script evaluation.
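To illustrate why a function like FindAndDelete is easy to get wrong, here is a minimal Python sketch of its general idea: occurrences of a byte pattern are removed from a script only when they start at an opcode boundary, which is not the same as a plain byte-level replace. The names `find_and_delete` and `_next_op_end` are hypothetical, the port is based on a reading of the behaviour rather than on Core’s test vectors, and the handling of truncated pushes in particular is an assumption.

```python
from typing import Optional, Tuple

OP_PUSHDATA1, OP_PUSHDATA2, OP_PUSHDATA4 = 0x4C, 0x4D, 0x4E


def _next_op_end(script: bytes, pc: int) -> Optional[int]:
    """Return the index just past the opcode starting at pc, or None if unparsable."""
    if pc >= len(script):
        return None
    opcode = script[pc]
    pc += 1
    if opcode > OP_PUSHDATA4:
        return pc  # not a push: the opcode is a single byte
    if opcode < OP_PUSHDATA1:
        size = opcode  # direct push of `opcode` bytes
    else:
        width = {OP_PUSHDATA1: 1, OP_PUSHDATA2: 2, OP_PUSHDATA4: 4}[opcode]
        if len(script) - pc < width:
            return None
        size = int.from_bytes(script[pc:pc + width], "little")
        pc += width
    if len(script) - pc < size:
        return None  # truncated push
    return pc + size


def find_and_delete(script: bytes, pattern: bytes) -> Tuple[bytes, int]:
    """Remove `pattern` wherever it starts at an opcode boundary (sketch only)."""
    if not pattern:
        return script, 0
    result = bytearray()
    found = 0
    pc = 0          # current opcode boundary
    copied_to = 0   # everything before this index is already handled
    while True:
        result += script[copied_to:pc]
        while script.startswith(pattern, pc):
            pc += len(pattern)
            found += 1
        copied_to = pc
        nxt = _next_op_end(script, pc)
        if nxt is None:
            break
        pc = nxt
    if found == 0:
        return script, 0
    result += script[copied_to:]
    return bytes(result), found


# The pattern "push 1 byte: 0xaa" appears inside the data of a 2-byte push.
script = bytes([0x02, 0x01, 0xAA])
pattern = bytes([0x01, 0xAA])
naive = script.replace(pattern, b"")           # byte-level replace removes it
aware, count = find_and_delete(script, pattern)  # boundary-aware version does not
```

In this sketch the naive replace and the boundary-aware version disagree on the same input, which is the kind of divergence that can turn into a consensus failure.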

Floresta has used libbitcoinconsensus since its very beginning, and has never had any failure of this kind. The API is simple and hard to misuse. However, this limited scope prevents it from being used in more advanced use-cases. For instance, all layer two applications need to re-implement Core’s mempool policies inside the implementation, sometimes needing a simplified script evaluation logic. They do this because they need to learn in advance whether a transaction will be confirmed or not, without broadcasting it. If Core’s internal policy were exposed as a lib, that would help reduce potential risks in layer two software and make it simpler and easier to maintain.

libbitcoinkernel

Kernel, on the other hand, is a more complete tool that can even run an embedded Bitcoin Core in your application. It exposes Core’s internal storage and the peer logic, and lets you replace some parts of a node. It is perfect for applications that need to consume blocks in a high-bandwidth way, since you can read blocks and transactions straight from disk, without going through the RPC. In the version implemented in [4], it even brings the old libbitcoinconsensus API, making it a drop-in replacement for the latter.

As currently designed, however, it can’t go beyond libbitcoinconsensus’s scope for alternative implementations or layer two software. The reason is that it depends on Core’s UTXO set and block store. For floresta and utreexod, having the UTXO set doesn’t make sense, since we are utreexo CSN nodes. L2 applications sometimes need to check chains of unconfirmed transactions, spending UTXOs that don’t exist in the UTXO set or in the mempool. The ideal workflow for such applications would be a method (or a series of them) that takes a serialized block, a list of UTXOs, the current MTP, and the block height (and hash?), then returns whether that block is valid or not (returning the error if possible). This would be close to the sans-IO model, where all the data needed is provided by the caller and the function only operates on it. A similar interface to validate transactions would also be very useful.
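The shape of such a caller-supplied-data interface could look roughly like the Python sketch below. Everything here is hypothetical (`validate_block`, `BlockContext`, `Utxo`, the error strings); it works on already-parsed transactions instead of a serialized block, and it covers only a toy subset of checks (locktime finality against height/MTP, input existence, coinbase maturity, value conservation). It does no script validation and is not a description of any existing libbitcoinkernel API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

LOCKTIME_THRESHOLD = 500_000_000  # below: block height; at or above: UNIX time
SEQUENCE_FINAL = 0xFFFFFFFF
COINBASE_MATURITY = 100

OutPoint = Tuple[bytes, int]  # (txid, output index)


@dataclass(frozen=True)
class Utxo:
    value: int         # amount in satoshis
    height: int        # height of the block that created the output
    is_coinbase: bool


@dataclass(frozen=True)
class TxIn:
    prevout: OutPoint
    sequence: int = SEQUENCE_FINAL


@dataclass(frozen=True)
class Tx:
    txid: bytes                # supplied by the caller in this sketch
    inputs: List[TxIn]
    output_values: List[int]
    locktime: int = 0


@dataclass(frozen=True)
class BlockContext:
    height: int
    median_time_past: int


def is_final(tx: Tx, ctx: BlockContext) -> bool:
    """Locktime finality, using MTP as the time reference (post-BIP113 assumption)."""
    if tx.locktime == 0:
        return True
    cutoff = ctx.height if tx.locktime < LOCKTIME_THRESHOLD else ctx.median_time_past
    if tx.locktime < cutoff:
        return True
    return all(txin.sequence == SEQUENCE_FINAL for txin in tx.inputs)


def validate_block(txs: List[Tx], utxos: Dict[OutPoint, Utxo],
                   ctx: BlockContext) -> Optional[str]:
    """Return None if the toy checks pass, otherwise a short error string.

    Only non-coinbase transactions are modelled. The function performs no I/O:
    every UTXO it may need must be present in `utxos`, which is left unmodified.
    """
    view = dict(utxos)  # working copy, so in-block spends do not leak out
    for tx in txs:
        if not is_final(tx, ctx):
            return "non-final transaction"
        total_in = 0
        for txin in tx.inputs:
            coin = view.pop(txin.prevout, None)
            if coin is None:
                return "missing or already spent input"
            if coin.is_coinbase and ctx.height - coin.height < COINBASE_MATURITY:
                return "premature coinbase spend"
            total_in += coin.value
        if total_in < sum(tx.output_values):
            return "outputs exceed inputs"
        for index, value in enumerate(tx.output_values):
            view[(tx.txid, index)] = Utxo(value, ctx.height, False)
    return None
```

A utreexo node would build `utxos` from the leaves proven by the block’s accumulator proof, while an L2 application could pass the outputs of its own unconfirmed transactions; in both cases the validation function never touches a database.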

The main con of this approach is that libbitcoinkernel would then technically not be fully consensus-compatible. This happens because whether a UTXO exists or not is completely tied to the database that Core uses, and in some cases a discrepancy might only surface when the affected UTXOs appear in a block. We already have a track record of a chain split caused by a database upgrade[5]. Even if we implemented utreexo validation inside Bitcoin Core, leveldb would still be required. The best we could do is require that a UTXO is recognized by both utreexo and leveldb (that would be a soft-fork, technically). Eliminating leveldb from consensus would only be possible if Bitcoin Core worked only as a utreexo node, using leveldb only as a way to keep UTXO data. Although this would be a huge improvement in defining bitcoin’s consensus and preventing future failures like [5], it is impractical both politically and probably technically, as implementing it risks a consensus failure.

A good compromise would be for libbitcoinkernel (or a subset thereof, with another name for clarity) to allow an “unsafe” mode, where not all consensus parts are provided (e.g. leveldb) and you could still cause a consensus failure. This project could live under a new namespace so it isn’t confused with libbitcoinkernel.

Sources
[1] https://eprint.iacr.org/2019/611
[2] https://github.com/utreexo/utreexod
[3] https://github.com/vinteumorg/floresta
[4] #30595
[5] https://github.com/bitcoin/bips/blob/master/bip-0050.mediawiki

maflcko added the label UTXO Db and Indexes on Feb 17, 2025
maflcko added the label Brainstorming on Feb 17, 2025
maflcko added the label Utils/log/libs on Feb 17, 2025

