Project Board for pull requests past and present: https://github.com/orgs/bitcoin/projects/3
This is the tracking issue for the bitcoin kernel library (`libbitcoinkernel`) project. The original tracking issue is #24303.
The bitcoin kernel project is a new attempt at extracting Bitcoin Core’s consensus engine. The “kernel” part of the name highlights one of the key functional differences from the deprecated libbitcoinconsensus and, in fact, from most libraries: it is a stateful library that can spawn threads, do caching, do I/O, and many other things that one may not normally expect from a library.
`libbitcoinkernel` could also be used as an internal library for `libbitcoin_node`. The desired library organization is shown in `doc/design/libraries.md`; #28690 works towards it.
The statefulness is necessary for the bitcoin kernel’s decidedly incremental approach to extracting our consensus engine. This approach favors:
- Reusing existing code …which allows us to be continually integrated with Bitcoin Core and benefit from our extensive test suite
- Incremental decoupling instead of building from scratch …which allows us to avoid prematurely optimizing for a “perfect” boundary or API (such a boundary tends to be highly subjective and non-obvious, and chasing it may lead to unproductive bike-shedding before we’ve even done anything meaningful)
The work of extracting the validation engine into a library and making the API ergonomic is likely to be a multi-release project involving multiple contributors. The incremental approach takes this into account and respects the sheer amount of work (both in writing code and in getting it through review) that needs to be undertaken.
Areas with open work
Defining an API for the library
Headers installed alongside the library should define a safe and stable interface for its users. The main thrust of work is currently towards a first version of a C header that provides rudimentary support for validating scripts and blocks, as well as for reading from the block store: #30595. See the pull request description for a complete list of binaries and projects already using it. Re-using the API for some of our internal binaries would guarantee dogfooding and provide incentives for improving and maintaining it.
The headers have bindings in:
- Python https://github.com/stickies-v/py-bitcoinkernel
- Rust https://github.com/theCharlatan/rust-bitcoinkernel
- Go https://github.com/stringintech/go-bitcoinkernel
- Java https://github.com/yuvicc/java-bitcoinkernel/tree/master
More bindings would be desirable for the following categories:
- Application developers and systems engineers (Swift, Kotlin, Dart, Zig, Haskell, Clojure, JavaScript, etc.)
- Data analysis and research (R, Julia, etc.)
- Data queries (e.g. PostgreSQL external data, etc.)
Some work has also been done for expanding this initial API, including:
- Script debugging: https://github.com/TheCharlatan/bitcoin/commits/kernelApi_Script/
- Header validation: https://github.com/TheCharlatan/bitcoin/tree/kernelApiNode
- Transaction validation: https://github.com/TheCharlatan/bitcoin/tree/kernelApiNode
An alternative to installing C headers would be installing C++ headers directly. This can either be done by installing all the existing Bitcoin Core headers wholesale, or by exposing a mix between existing headers and headers providing a cleanly wrapped API. The latter approach was also implemented in the following branch: https://github.com/TheCharlatan/bitcoin/tree/kernelApi_Cpp_Internal_Headers.
The current API defined through the C header only has very rudimentary error handling. While it attempts to follow the philosophy that errors arising from unsanitized user input are programming errors, it does surface more concrete information on validation failures, as well as on errors that arise from problems with the underlying system; the latter are referred to as fatal errors. The current Bitcoin Core code does not always surface these fatal errors to the caller. #29642 attempts to surface this error information directly.
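To illustrate the distinction between validation failures and fatal errors, here is a minimal C++ sketch that reports them as separate alternatives of one result type. All names (`ProcessResult`, `ValidationFailure`, `FatalError`, `ProcessBlockSketch`) are hypothetical and are not part of the libbitcoinkernel API or of #29642; a C header would have to express the same idea through status codes or callbacks.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical outcome types; NOT the libbitcoinkernel API.
// The block was checked and found valid.
struct Accepted {};

// The block is invalid; the node itself is healthy and can continue.
struct ValidationFailure {
    std::string reason;
};

// Something went wrong with the underlying system (e.g. a failed disk write);
// the node cannot safely continue.
struct FatalError {
    std::string message;
};

using ProcessResult = std::variant<Accepted, ValidationFailure, FatalError>;

// Hypothetical processing function. The two flags stand in for the outcome of
// real consensus checks and of persisting the block, so each path can be exercised.
ProcessResult ProcessBlockSketch(bool block_is_valid, bool disk_write_ok)
{
    if (!block_is_valid) return ValidationFailure{"bad-block-sketch"};
    if (!disk_write_ok) return FatalError{"failed to write block to disk"};
    return Accepted{};
}

// A caller can handle each class of outcome explicitly instead of
// collapsing everything into a single boolean.
std::string Describe(const ProcessResult& result)
{
    if (std::holds_alternative<Accepted>(result)) return "accepted";
    if (const auto* failure = std::get_if<ValidationFailure>(&result)) {
        return "invalid: " + failure->reason;
    }
    return "fatal: " + std::get<FatalError>(result).message;
}
```

Keeping fatal errors in their own alternative means a caller cannot mistake “the disk failed” for “the block was invalid”, which is the information that is currently not always surfaced.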
It is also currently not possible to instantiate multiple kernel validation objects and distinguish between their log messages, because the library only has a global logger. See #30342 for a potential solution to this.
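A minimal C++ sketch of the alternative to a single global logger, assuming a hypothetical per-instance context that receives its own logging callback at construction. `KernelContextSketch` and `LogCallback` are illustrative names only and do not describe the design proposed in #30342.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sink that receives fully formatted log lines.
using LogCallback = std::function<void(const std::string& message)>;

// Hypothetical context object. Instead of writing to one process-wide logger,
// each instance forwards its messages to the callback it was created with.
class KernelContextSketch
{
public:
    KernelContextSketch(std::string name, LogCallback log)
        : m_name{std::move(name)}, m_log{std::move(log)} {}

    void LogMessage(const std::string& message) const
    {
        // Prefix the instance name so messages from different instances stay distinguishable.
        if (m_log) m_log("[" + m_name + "] " + message);
    }

private:
    std::string m_name;
    LogCallback m_log;
};
```

Because each context carries its own sink, two instances in the same process could write to separate destinations, which a single global logger cannot offer.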
Some small build system tweaks to ensure the correct symbols get exported would also be required, see the discussion in https://github.com/bitcoin-core/secp256k1/pull/1677.
Read-only kernel clients
The kernel library provides the functionality to read Bitcoin Core’s data, like blocks, headers, and undo data, directly from disk without additional overhead. External applications like Electrs and Esplora parse `blk*.dat` files directly to build their indexes. This is problematic because Bitcoin Core’s block storage is internal and provides no guarantees for external consumers. Reading the data through the kernel library instead can provide these guarantees and may also improve performance, since the block store can contain invalid blocks or blocks from dead forks that external parsers would otherwise have to process.
The current drawback to this approach is that leveldb, which stores the block tree and the chainstate, can only be opened by one process at a time. Bitcoin Core therefore has to shut down before an external process using the kernel library can read its data. A pull request moving the block tree database from leveldb to a simple hand-rolled format, which would allow reading the data in parallel, was opened in #32427.
A summary of how the kernel library might be used for reading data is provided here. Applications building on #32427 could validate this architecture and provide benchmarks comparing this approach with reading block data through Bitcoin Core’s existing interfaces.
Context-minimal validation
The current design of the library includes all the Bitcoin Core specific data storage methods. Alternative clients wishing to validate blocks therefore have to build the chain block-by-block and have to re-use the UTXO model.
However, there are projects and proposed alternative data models, such as utreexo, SwiftSync, UHS, and libbitcoin, that do not require a UTXO set for validation but could still benefit from re-using Bitcoin Core’s block validation code. Floresta, a utreexo implementation, and a project working on SwiftSync have voiced a strong desire for these features.
There are a few possible directions to take here, including:
- Abstracting data readers and writers (e.g. an abstract block store: https://github.com/TheCharlatan/bitcoin/tree/blockstoreReaderWriter)
- Making validation functions require less context (e.g. stop passing the UTXO set to validation functions: #32317)
- Reducing the `cs_main` lock scope (e.g. reducing locking during `AcceptBlock`: https://github.com/TheCharlatan/bitcoin/tree/accept_block_reduce_lock)
- Exposing more validation functions, like `CheckBlock`, `AcceptBlock`, etc.
- Validation functions that only use the block tree, but don’t persist chainstate and block data
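The first direction, abstracting data readers and writers, can be sketched in C++ as follows: code on the validation side depends only on an abstract interface, and the user chooses the storage backend. The classes `BlockStore` and `InMemoryBlockStore`, and the 64-bit stand-in for a block hash, are hypothetical simplifications and are not taken from the linked branch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical simplified types; Bitcoin Core's actual block storage looks different.
using BlockHash = std::uint64_t;              // stand-in for a 256-bit hash
using RawBlock = std::vector<unsigned char>;  // serialized block bytes

// Abstract reader/writer: code written against this interface does not care
// whether blocks live in flat files, a database, or memory.
class BlockStore
{
public:
    virtual ~BlockStore() = default;
    virtual void WriteBlock(BlockHash hash, const RawBlock& block) = 0;
    virtual std::optional<RawBlock> ReadBlock(BlockHash hash) const = 0;
};

// One possible backend: keep everything in memory, e.g. for tests or for a
// client that persists blocks in its own format elsewhere.
class InMemoryBlockStore final : public BlockStore
{
public:
    void WriteBlock(BlockHash hash, const RawBlock& block) override { m_blocks[hash] = block; }

    std::optional<RawBlock> ReadBlock(BlockHash hash) const override
    {
        auto it = m_blocks.find(hash);
        if (it == m_blocks.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<BlockHash, RawBlock> m_blocks;
};

// Helper written only against the interface; it works with any backend.
bool HaveBlock(const BlockStore& store, BlockHash hash)
{
    return store.ReadBlock(hash).has_value();
}
```

A utreexo or SwiftSync client could then supply its own backend without inheriting Bitcoin Core’s on-disk layout.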
Context-minimal validation might also be interesting for SPV clients, which wish to re-use only a subset of validation.
`cs_main` split
Currently the `cs_main` recursive mutex protects not only the bulk of validation-specific mutable state, but also state in net processing. Ideally, validation would have its own mutexes that are not shared with other functionality. Function calls that require a lock to be held as a precondition to ensure consistent state transitions should be consolidated to allow internal locking. This could eventually allow users to process validation tasks and query for data in parallel with less lock contention. It could also allow parallel processing and better prioritization of network messages.
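A minimal C++ sketch of the internal-locking pattern, assuming a hypothetical `ChainTipTracker` component that owns its mutex instead of requiring callers to hold a shared lock. Bitcoin Core itself would use its own `Mutex` type together with Clang thread-safety annotations such as `EXCLUSIVE_LOCKS_REQUIRED`; plain `std::mutex` keeps the example self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Snapshot of the chain tip; both fields always belong to the same block.
struct TipSnapshot {
    int height;
    std::uint64_t hash; // stand-in for a 256-bit block hash
};

// Hypothetical component that owns its mutex instead of relying on a global
// lock such as cs_main. Callers never lock anything themselves: every public
// method takes the internal lock for the duration of one state transition.
class ChainTipTracker
{
public:
    void SetTip(int height, std::uint64_t hash)
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        // Both fields change under one lock, so no reader can observe the
        // height of one tip paired with the hash of another.
        m_height = height;
        m_hash = hash;
    }

    // Returns a consistent copy instead of separate getters, which would
    // force callers to hold a lock across two calls.
    TipSnapshot GetTip() const
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        return TipSnapshot{m_height, m_hash};
    }

private:
    mutable std::mutex m_mutex;
    int m_height{-1};        // -1 means no tip has been set yet
    std::uint64_t m_hash{0};
};
```

Since callers cannot take the lock themselves, they also cannot hold it across unrelated work; the trade-off is that every multi-step operation that must be atomic needs its own method.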
Some draft work towards this goal has been started:
- Splitting the net processing mutex: https://github.com/TheCharlatan/bitcoin/tree/net_processing_lock_split
- Better handling of the last blockfile mutex: https://github.com/TheCharlatan/bitcoin/tree/last_blockfile_mutex
- Separate cuckoocache locks: https://github.com/TheCharlatan/bitcoin/tree/lock_cuckoocache
Chainstate / ChainstateManager split
The Chainstate and ChainstateManager are currently tightly coupled, but need not be. In practice this means working with a single chainstate is often needlessly complicated. A draft split of the two classes has been attempted in https://github.com/TheCharlatan/bitcoin/tree/chainmansplit.
Compile targets for embedded, web assembly, and proof systems
One step further than context-minimal validation would be a kernel API that can compile to architectures with reduced capabilities, e.g. no support for threading, file systems, or atomics. Supporting such targets could allow the library to be used in embedded environments, e.g. as part of a validating Lightning signer or an SPV wallet. Targeting RISC-V bare metal or LLVM IR could allow re-use in ZK proof systems as well as in formal specification languages.
Eventually the goal here should be to provide a fully “sans IO” version of the API.
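To make “sans IO” concrete, here is a hypothetical C++ sketch of a header-chain core that performs no I/O at all. The caller fetches the data and supplies the current time, and the core only returns verdicts. The header struct is drastically simplified (no proof of work, 64-bit stand-in hashes, a single chain without forks); only the two-hour limit on future timestamps mirrors a rule Bitcoin Core applies to block headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified header: real headers carry a version,
// merkle root, nBits, nonce and a double-SHA256 hash.
struct HeaderSketch {
    std::uint64_t hash;
    std::uint64_t prev_hash;
    std::int64_t time;
};

enum class HeaderVerdict { Connected, UnknownParent, TooFarInFuture };

// Sans-IO core: no files, sockets, threads, or clocks. The caller reads data
// from wherever it lives and passes in the current time explicitly.
// Only extends a single linear chain; fork handling is omitted.
class HeaderChainSketch
{
public:
    explicit HeaderChainSketch(std::uint64_t genesis_hash) : m_hashes{genesis_hash} {}

    HeaderVerdict Submit(const HeaderSketch& header, std::int64_t now)
    {
        if (header.time > now + MAX_FUTURE_SECONDS) return HeaderVerdict::TooFarInFuture;
        if (header.prev_hash != m_hashes.back()) return HeaderVerdict::UnknownParent;
        m_hashes.push_back(header.hash);
        return HeaderVerdict::Connected;
    }

    std::size_t Height() const { return m_hashes.size() - 1; }

private:
    static constexpr std::int64_t MAX_FUTURE_SECONDS{2 * 60 * 60};
    std::vector<std::uint64_t> m_hashes; // hashes of the chain, genesis first
};
```

Because time and data arrive as arguments, the same core could run in a test harness, inside a bare-metal target, or in a proof system without any platform support.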
A PR adding a RISC-V bare metal CI job was opened in #31425.
Evicting the mempool / pluggable policy
The mempool and its policy rules are currently inside the kernel library, but won’t be exposed in the initial API version. Ideally, the library should not force its users to re-use Bitcoin Core’s opinionated implementation, but should allow them to either define their own mempool or at least define their own policy rules. A rough proof of concept for removing the mempool from the library is implemented in https://github.com/TheCharlatan/bitcoin/tree/mempoolout.
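A minimal C++ sketch of pluggable policy, assuming a hypothetical `MempoolPolicy` interface that the mempool consults for every admission decision. The transaction summary and the minimum-feerate rule are illustrative only and far simpler than Bitcoin Core’s actual policy checks; the linked proof of concept may take a different approach.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified transaction summary used only for this sketch.
struct TxSummary {
    std::int64_t fee_sat;
    std::int64_t vsize;
};

// Policy interface: the mempool calls this instead of hard-coding one
// implementation's standardness and fee rules.
class MempoolPolicy
{
public:
    virtual ~MempoolPolicy() = default;
    // Returns an empty string if the transaction is acceptable, else a reason.
    virtual std::string Check(const TxSummary& tx) const = 0;
};

// A user-defined policy: a minimum feerate in sat/vB.
class MinFeeratePolicy final : public MempoolPolicy
{
public:
    explicit MinFeeratePolicy(std::int64_t min_sat_per_vb) : m_min{min_sat_per_vb} {}

    std::string Check(const TxSummary& tx) const override
    {
        // Compares fee / vsize against the minimum without integer division.
        if (tx.fee_sat < m_min * tx.vsize) return "feerate too low";
        return {};
    }

private:
    std::int64_t m_min;
};

// Minimal mempool that delegates every admission decision to the injected policy.
class MempoolSketch
{
public:
    explicit MempoolSketch(std::unique_ptr<MempoolPolicy> policy) : m_policy{std::move(policy)} {}

    std::string Add(const TxSummary& tx)
    {
        std::string reason = m_policy->Check(tx);
        if (reason.empty()) m_txs.push_back(tx);
        return reason;
    }

    std::size_t Size() const { return m_txs.size(); }

private:
    std::unique_ptr<MempoolPolicy> m_policy;
    std::vector<TxSummary> m_txs;
};
```

Swapping in a different `MempoolPolicy` changes which transactions are admitted without touching the mempool itself, which is the kind of separation the paragraph above asks for.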
The mempool is currently also the only component within the library that depends on Boost. Having Boost as a dependency makes the library harder to build and makes using its C++ headers directly more cumbersome.
Evicting AssumeUTXO / AssumeValid
AssumeUTXO/AssumeValid are Bitcoin Core specific approaches to improving IBD speed. External users of the kernel will likely not want to be tightly coupled to Bitcoin Core specific processes, e.g. choosing the AssumeUTXO snapshots and assumevalid points. Philosophically, the kernel should be as agnostic as possible and not tied to a particular implementation’s decisions.
Isolated kernel repository
Eventually Bitcoin Core could itself depend on a standalone kernel repository. Bitcoin Core would likely end up pulling in this project as a subtree. Some open questions around this subject are:
- How would the CI for this repository be set up?
- What would its own fuzzing and unit test setup look like?
- What would the release process look like?
- What is the governance of the project? E.g., is it under Bitcoin Core?
In many ways, this could be similar to how libsecp256k1 is maintained in its own repository. Some steps towards this goal have been taken:
- Adding a dedicated internal kernel library that could be re-used directly by the project: #28690
- Removing the clientversion dependency from the kernel: #32543
- Pruning unwanted headers from the library: https://github.com/TheCharlatan/bitcoin/tree/kernelPruneHeaders
- Having a separate kernel repo: https://github.com/TheCharlatan/bitcoin-kernel
Action Items
- If you have any questions, please post them below!
- If you have any ideas for the future directions of the project, or want to work on one of the open areas, I’d love to talk about them and support you!
Project-wide TODOs
These are suggestions for further cleanup and improvements that came up during review:
- Various followups for [refactor, kernel: Decouple ArgsManager from blockstorage #27125](https://github.com/bitcoin/bitcoin/pull/27125)
- Martin’s request for a follow-up fixing the docstring: #27125 (review)
Other items that arose during review and should be tracked:
- Cory’s request for cleaning up the kernel interface functions of pointer and reference types: #27636 (review)
- Marco’s request for getting rid of exceptions in the `ArgsManager`: #27491 (review)
- Russell’s request for eliminating the BlockNotify signal: #27636 (review)