kernel: Introduce initial C header API

TheCharlatan commented at 9:18 am on August 6, 2024: contributor

This is a first attempt at introducing a C header for the libbitcoinkernel library that may be used by external applications for interfacing with Bitcoin Core’s validation logic. It is currently limited to operations on blocks. This is a conscious choice, since it already offers a lot of powerful functionality, but sits just on the cusp of still being reviewable scope-wise while giving some pointers on what the rest of the API could look like.

The current design was informed by the development of some tools using the C header:

A re-implementation (part of this pull request) of bitcoin-chainstate.
A re-implementation of the python block linearize scripts: https://github.com/TheCharlatan/bitcoin/tree/kernelLinearize
A silent payment scanner: https://github.com/josibake/silent-payments-scanner
An electrs index builder: https://github.com/josibake/electrs/commits/electrs-kernel-integration
A rust bitcoin node: https://github.com/TheCharlatan/kernel-node
A reindexer: https://github.com/TheCharlatan/bitcoin/tree/kernelApi_Reindexer

The library has also been used by other developers already:

A historical block analysis tool: https://github.com/ismaelsadeeq/mining-analysis
A swiftsync hints generator: https://github.com/theStack/swiftsync-hints-gen
Fast script validation in floresta: https://github.com/vinteumorg/Floresta/pull/456

Next to the C++ header also made available in this pull request, bindings for other languages are available here:

The rust bindings include unit and fuzz tests for the API.

The header currently exposes logic for enabling the following functionality:

Feature-parity with the now deprecated libbitcoin-consensus
Optimized sha256 implementations that were not available to previous users of libbitcoin-consensus thanks to a static kernel context
Full support for logging as well as control over categories and severity
Feature parity with the existing experimental bitcoin-chainstate
Traversing the block index as well as using block index entries for reading block and undo data.
Running the chainstate in memory
Reindexing (both full and chainstate-only)
Interrupting long-running functions

The pull request introduces a new kernel-only test binary that purely relies on the kernel C header and the C++ standard library. This is intentionally done to show its capabilities without relying on other code inside the project. This may be relaxed to include some of the existing utilities, or even be merged into the existing test suite.

The complete docs for the API as well as some usage examples are hosted on thecharlatan.ch/kernel-docs. The docs are generated from the following repository (which also holds the examples): github.com/TheCharlatan/kernel-docs.

How can I review this PR?

Scrutinize the commit messages, run the tests, write your own little applications using the library, let your favorite code sanitizer loose on it, hook it up to your fuzzing infrastructure, profile the difference between the existing bitcoin-chainstate and the bitcoin-chainstate introduced here, be nitty on the documentation, police the C interface, opine on your own API design philosophy.

To get a feeling for the API, read through the tests, or one of the examples.

To configure this PR for making the shared library and the bitcoin-chainstate and test_kernel utilities available:

cmake -B build -DBUILD_KERNEL_LIB=ON -DBUILD_UTIL_CHAINSTATE=ON

Once compiled the library is part of the build artifacts that can be installed with:

cmake --install build

Why a C header (and not a C++ header)

Shipping a shared library with a C++ header is hard, because of name mangling and an unstable ABI.
Mature and well-supported tooling for integrating C exists for nearly every popular language.
C offers a reasonably stable ABI

Also see #30595 (comment).
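To make the points above concrete, here is a minimal sketch of the opaque-handle pattern that a C header over C++ code typically relies on. All names (demo::BlockStore, demo_block_store_*) are hypothetical and are not part of this pull request; the sketch only illustrates that the exported symbols have unmangled C linkage and that callers never see a C++ type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical internal C++ class; stands in for any type hidden behind the C API.
namespace demo {
class BlockStore {
public:
    void Add(uint32_t height) { m_heights.push_back(height); }
    size_t Size() const { return m_heights.size(); }
private:
    std::vector<uint32_t> m_heights;
};
} // namespace demo

extern "C" {
// Opaque handle: the struct is never defined, so C callers can only hold a pointer.
typedef struct demo_BlockStore demo_BlockStore;

demo_BlockStore* demo_block_store_create(void)
{
    // nothrow new, so an allocation failure surfaces as a null handle.
    auto* store = new (std::nothrow) demo::BlockStore();
    return reinterpret_cast<demo_BlockStore*>(store);
}

void demo_block_store_add(demo_BlockStore* store, uint32_t height)
{
    reinterpret_cast<demo::BlockStore*>(store)->Add(height);
}

size_t demo_block_store_size(const demo_BlockStore* store)
{
    return reinterpret_cast<const demo::BlockStore*>(store)->Size();
}

void demo_block_store_destroy(demo_BlockStore* store)
{
    delete reinterpret_cast<demo::BlockStore*>(store);
}
} // extern "C"
```

Because only plain functions and an incomplete struct type cross the boundary, the same declarations can be consumed by any language with a C FFI.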

What about versioning?

The header and library are still experimental and I would expect this to remain so for some time, so best not to worry about versioning yet.

Potential future additions

In future, the C header could be expanded to support (some of these have been roughly implemented):

Handling transactions, block headers, coins cache, utxo set, meta data, and the mempool
Adapters for an abstract coins store
Adapters for an abstract block store
Adapters for an abstract block tree store
Allocators and buffers for more efficient memory usage
An “io-less” interface
Hooks for an external mempool, or external policy rules

Current drawbacks

For external applications to read the block index of an existing Bitcoin Core node, Bitcoin Core needs to shut down first, since leveldb does not support reading across multiple processes. Other than migrating away from leveldb, there does not seem to be a solution for this problem. Such a migration is implemented in #32427.
The fatal error handling through the notifications is awkward. This is partly improved through #29642.
Handling shared pointers in the interfaces is unfortunate. They make ownership and freeing of the resources fuzzy and poison the interfaces with additional types and complexity. However, they seem to be an artifact of the current code that interfaces with the validation engine. The validation engine itself does not seem to make extensive use of these shared pointers.
If multiple instances of the same type of objects are used, there is no mechanism for distinguishing the log messages produced by each of them. A potential solution is #30342.
The background leveldb compaction thread may not finish in time leading to a non-clean exit. There seems to be nothing we can do about this, outside of patching leveldb.

DrahtBot commented at 9:18 am on August 6, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30595.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	stickies-v, ismaelsadeeq, stringintech, yuvicc
Approach NACK	josibake, purpleKarrot

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#33078 (kernel: improve BlockChecked ownership semantics by stickies-v)
#32953 ([POC] ci: Skip compilation when running static code analysis by hebasto)
#32427 ((RFC) kernel: Replace leveldb-based BlockTreeDB with flat-file based store by TheCharlatan)
#31507 (build: Use clang-cl to build on Windows natively by hebasto)
#31382 (kernel: Flush in ChainstateManager destructor by TheCharlatan)
#29700 (kernel, refactor: return error status on all fatal errors by ryanofsky)
#28792 (Embed default ASMap as binary dump header file by fjahr)
#26022 (Add util::ResultPtr class by ryanofsky)
#25665 (refactor: Add util::Result failure values, multiple error and warning messages by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label Validation on Aug 6, 2024

TheCharlatan force-pushed on Aug 6, 2024

DrahtBot added the label CI failed on Aug 6, 2024

DrahtBot commented at 10:42 am on August 6, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/28396412371

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

TheCharlatan force-pushed on Aug 6, 2024

DrahtBot removed the label CI failed on Aug 6, 2024

theuni commented at 10:14 pm on August 7, 2024: member

Very cool. Can’t wait to dig in when I have some free time.

ryanofsky commented at 6:05 pm on August 12, 2024: contributor

This seems to offer a lot of nice features, but can you explain the tradeoffs of wrapping the C++ interface in C instead of using C++ from rust directly? It seems like having a C middle layer introduces a lot of boilerplate, and I’m wondering if it is really necessary. For example it seems like there is a rust cxx crate (https://docs.rs/cxx/latest/cxx/, https://chatgpt.com/share/dd4dde59-66d6-4486-88a6-2f42144be056) that lets you call C++ directly from Rust and avoid the need for C boilerplate. It looks like https://cppyy.readthedocs.io/en/latest/index.html is an even more full-featured way of calling c++ from python.

Another drawback of going through a C API seems like not just increased boilerplate, but reduced safety. For example, the implementation is using reinterpret_cast everywhere and it seems like the exposed C functions use a kernel_ErrorCode enum type with the union of every possible error type, so callers don’t have a way to know which functions can return which errors.

TheCharlatan commented at 8:54 am on August 13, 2024: contributor

Thank you for the questions and kicking this discussion off @ryanofsky! I’ll update the PR description with a better motivation re. C vs C++ header, but will also try to answer your questions here.

This seems to offer a lot of nice features, but can you explain the tradeoffs of wrapping the C++ interface in C instead of using C++ from rust directly? It seems like having a C middle layer introduces a lot of boilerplate, and I’m wondering if it is really necessary. For example it seems like there is a rust cxx crate (https://docs.rs/cxx/latest/cxx/, https://chatgpt.com/share/dd4dde59-66d6-4486-88a6-2f42144be056) that lets you call C++ directly from Rust and avoid the need for C boilerplate. It looks like https://cppyy.readthedocs.io/en/latest/index.html is an even more full-featured way of calling c++ from python.

It is true that the interoperability between C++ and Rust has become very good. In fact there is someone working on wrapping the entirety of Bitcoin Core in Rust: https://github.com/klebs6/bitcoin-rs.

During the last Core Dev meeting in Berlin I also asked if a C API were desirable in the first place (notes here) during the libbitcoinkernel session. I moved forward with this implementation, because the consensus at the time with many contributors in the room was that it was desirable. The reasons for this as discussed during the session at the meeting can be briefly summarised:

Shipping a shared library with a C++ header is hard
Mature and well-supported tooling for integrating C exists for nearly every popular language.
C offers a reasonably stable ABI

So if we want the broadest possible support, across as many languages as possible with both dynamic and statically compiled libraries, a C header is the go-to option. I’m speculating here, but a C++ header might also make future standard version bumps and adoption of new standard library features harder. If having some trade-offs with compatibility, library portability, and language support is acceptable, a C++ header might be acceptable though. It would be nice to hear more reviewers give their opinions here.

I’d also like to add that two libraries that we use and depend on in this project, minisketch and zeromq, use the same pattern. They are C++ codebases, that only expose a C API that in both instances can be used with a C++ RAII wrapper. So there is precedent in the free software ecosystem for doing things this way.
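As a rough illustration of that pattern (a C API consumed through a thin C++ RAII wrapper), consider the following sketch. The demo_context_* functions are hypothetical stand-ins defined inline to keep the example self-contained; they are not the minisketch, libzmq, or kernel API.

```cpp
#include <cassert>
#include <memory>

// Counter used only so the sketch can show that destroy is actually called.
static int g_live_contexts = 0;

extern "C" {
// Hypothetical C API: create returns an owning handle, destroy releases it.
typedef struct demo_Context { int id; } demo_Context;

demo_Context* demo_context_create(void)
{
    ++g_live_contexts;
    return new demo_Context{1};
}

void demo_context_destroy(demo_Context* ctx)
{
    --g_live_contexts;
    delete ctx;
}
} // extern "C"

// RAII wrapper: a unique_ptr whose deleter forwards to the C destroy function.
struct ContextDeleter {
    void operator()(demo_Context* ctx) const { demo_context_destroy(ctx); }
};
using ContextPtr = std::unique_ptr<demo_Context, ContextDeleter>;

ContextPtr MakeContext() { return ContextPtr{demo_context_create()}; }

int LiveContexts() { return g_live_contexts; }
```

With this wrapper a C++ caller gets scoped ownership back, while the library itself still only exports C symbols.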

The quality of C++ language interop seems to vary a lot between languages. Python and Rust seem to have decent support, ziglang on the other hand has no support for C++ bindings. JVM family languages are a bit hit and miss, and many of the common academic and industrial data analysis languages, like Julia, R, and Matlab have no support for direct C++ bindings. The latter category should not be disregarded as potential future users, since this library might be useful to access Bitcoin Core data for data analysis projects.

Another drawback of going through a C API seems like not just increased boilerplate, but reduced safety. For example, the implementation is using reinterpret_cast everywhere

I feel like the reduced type safety due to casting is a bit of a red herring. The type casting can be harder to abuse if you always use a dedicated helper function for interpreting passed in data types (as I believe is implemented here). Casting is also a pattern used in many other projects; both minisketch and libzmq use similar type casts extensively. It should definitely be possible to scrutinize the API in this PR to a point where it offers decent safety to its users as well as contributors to and maintainers of this code base.

The concerns around boilerplate are more serious in my view, but at least with the current internal code and headers I feel like exposing a safe C++ API is not trivial either. The current headers do not lend themselves to it well, for example through tricky locking mechanics, exposing boost types, or confusing lifetimes. There also comes a point where we should probably stop extensively refactoring internal code for the kernel. I’ve heard some voices during the last two Core Dev meetings with concerns that the kernel project might turn the validation code into an extensive forever building site. Having some boilerplate and glue to abstract some of the ugliness and make it safe seems like an acceptable solution for this dilemma. If this means boilerplate is required anyway, I would personally prefer a C API.

Some of the boilerplate-y duplicate definitions in the header could be dropped again eventually if some of the enums are moved to C-style enums instead of class enum. As long as they are properly namespaced, I don’t see a big drawback for this. Similarly, some of the structs could be defined in a way where they can be used on both sides using pimpl or similar idioms. All in all, most of these translations seem very straightforward.
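One possible reading of the enum idea, sketched with hypothetical names: if the C++ side defines its values in terms of a prefixed C-style enum, the translation collapses to a cast that can be checked at compile time instead of a hand-written switch. This is an assumption about how it could look, not what the PR does.

```cpp
#include <cassert>

extern "C" {
// Hypothetical C-style enum; the prefix provides the namespacing in C.
typedef enum {
    demo_WARNING_UNKNOWN_NEW_RULES = 0,
    demo_WARNING_LARGE_WORK_INVALID_CHAIN = 1,
} demo_Warning;
} // extern "C"

namespace demo {
// The scoped enum reuses the C values, so both sides share one definition.
enum class Warning {
    UNKNOWN_NEW_RULES = demo_WARNING_UNKNOWN_NEW_RULES,
    LARGE_WORK_INVALID_CHAIN = demo_WARNING_LARGE_WORK_INVALID_CHAIN,
};

constexpr demo_Warning ToC(Warning w) { return static_cast<demo_Warning>(w); }

// Compile-time check that the cast preserves the mapping.
static_assert(ToC(Warning::LARGE_WORK_INVALID_CHAIN) == demo_WARNING_LARGE_WORK_INVALID_CHAIN,
              "enum values must stay in sync");
} // namespace demo
```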

It might be interesting to see how some of the RPC methods could be re-implemented using the kernel header. There have been some RPC implementation bugs over the years that were due to unsafe usage of our internal code within the method implementations. Using the kernel header instead might make this safer and reduce boilerplate. To be clear, I am not suggesting replacing the implementations, but separately re-implementing some of them to show where the kernel header might shine.

it seems like the exposed C functions use a kernel_ErrorCode enum type with the union of every possible error type, so callers don’t have a way to know which functions can return which errors.

We have disagreed on the design of this before. If I understood you correctly, consolidating all error codes into a single enumeration was one of the reasons you opened your version for handling fatal errors in the kernel: #29700 as an alternative to my original: #29642. I am still a bit torn by the two approaches. I get that it may be useful to exactly see which errors may be encountered by invoking a certain routine, but at the same time I get the feeling this often ends up splintering the error handling to the point where you end up with a catch all approach after all. I also think that it is nice to have a single, central list for looking up all error codes and defining some routines for handling them in close proximity to their definition. It would be nice to finally hear some more voices besides the two of us discussing this. real-or-random has recently provided some good points on error handling in the libsecp silent payments pr (that I mostly did not adopt in this PR) and argues that most error codes are not useful to the user. As mentioned in the description, error handling is a weak spot of this pull request and I would like to improve it.

ryanofsky commented at 4:01 pm on August 13, 2024: contributor

I guess another thing I’d like to know is if this is the initial C API, and the implementation is around 3000 lines, and it doesn’t handle “transactions, block headers, coins cache, utxo set, meta data, and the mempool”, how much bigger do you think it will get if it does cover most of the things you would like it to cover? Like is this 20%, 30%, or 50% of the expected size?

I like the idea of reviewing and merging this PR, and establishing a way to interoperate with rust libraries and external projects. I just think going forward we should not lock ourselves into an approach that requires everything to go through a C interface. As we build on this and add features, we should experiment with other approaches that use C++ directly, especially when it can reduce boilerplate and avoid bugs.

Thanks for pointing me to the other error handling discussion. I very much agree with the post that says having a single error handling path is highly desirable. I especially agree with this in cases where detailed error messages are still provided (keeping in mind that error handling != error reporting, you can return simple error states with detailed messages or logging). Of course there are places where callers do need to handle separate error cases, especially when there are temporary failures, timeouts, and interruptions, and in these cases functions should return 2 or 3 error states instead of 1. But I don’t think there is a reason in modern application code for functions to be able to return 5, 10, 20, or 50 error states generally. In low-level or very general OS, networking or DBMS code it might make sense, but for application code it seems like a cargo cult programming practice that made IBM service manuals very impressive in the 1980s but does not have a present day rationale. There are special cases, but I don’t think it should be a normal thing for functions to be returning 15 error codes if we are trying to provide a safe and easy to use API.

Again though, if this approach is the easiest way to get cross-language interoperability working right now, I think we should try it. I just think we should be looking for ways to make things simpler and safer going forward.

TheCharlatan commented at 7:38 pm on August 13, 2024: contributor

I guess another thing I’d like to know is if this is the initial C API, and the implementation is around 3000 lines, and it doesn’t handle “transactions, block headers, coins cache, utxo set, meta data, and the mempool”, how much bigger do you think it will get if it does cover most of the things you would like it to cover? Like is this 20%, 30%, or 50% of the expected size?

I think a fair comparison would be comparing the amount of code “glue” required, e.g. the size of the bitcoinkernel.cpp file in this pull request. The size of the header is very dependent on the detail of documentation and I think judging it by the amount of test code is also hard. On my branch including iterators for the UTXO set, handling headers, and simple mempool processing, basically all the stuff required to drop-in replace the calls to validation code in net_processing with the C API, is about similar in size: https://github.com/bitcoin/bitcoin/pull/30595/files#diff-cc28221ef8d0c7294dda4e3df9f70bb6c062006b387468380c2c2cc02b6762c3 . The code on that branch is more hacky than the code here, so I would expect a bit less than a doubling in size to get all the features required to run a full node with transaction relay.

In low-level or very general OS, networking or DBMS code it might make sense, but for application code it seems like a cargo cult programming practice that made IBM service manuals very impressive in the 1980s but does not have a present day rationale.

Heh, well put. I think for most functions here it could be feasible to have more concise error codes without too much effort, but I feel like I have to detach from this a bit before being able to come up with an alternative.

ryanofsky commented at 10:09 pm on August 13, 2024: contributor

I think for most functions here it could be feasible to have more concise error codes without too much effort, but I feel like I have to detach from this a bit before being able to come up with an alternative.

Thanks, I think I’d need to look at this more to give concrete suggestions, but I’d hope most functions would just return a simple success or failure status, with a descriptive error message in the case of failure. When functions need to return more complicated information or can fail in different ways that callers will want to distinguish, it should be easy to return the relevant information in custom struct or enum types. I think it’s usually better for functions to return simpler custom types than more complicated shared types, because it lets callers know what values functions can return just by looking at their declarations.

hebasto added the label Needs CMake port on Aug 16, 2024

TheCharlatan force-pushed on Aug 28, 2024

TheCharlatan force-pushed on Aug 29, 2024

hebasto removed the label Needs CMake port on Aug 29, 2024

TheCharlatan force-pushed on Aug 29, 2024

DrahtBot added the label CI failed on Aug 30, 2024

DrahtBot removed the label CI failed on Aug 30, 2024

TheCharlatan force-pushed on Sep 1, 2024

TheCharlatan commented at 8:29 pm on September 1, 2024: contributor

I think for most functions here it could be feasible to have more concise error codes without too much effort, but I feel like I have to detach from this a bit before being able to come up with an alternative.

Completely got rid of the kernel_Error with the last push. Thanks for laying out your logic ryanofsky, I feel like this is cleaner now. When looking at the Rust wrapper, the code seems much clearer too. Errors are now communicated through nullptr or false values. Where required, so far only for the verification functions, a richer status code is communicated to the developer.
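A minimal sketch of that convention, with hypothetical names and a toy rule in place of real script verification: the function returns a plain bool, and only callers that care pass a status out-parameter to learn why a call failed.

```cpp
#include <cassert>
#include <cstddef>

extern "C" {
// Hypothetical status enum; only the verification function needs one.
typedef enum {
    demo_VERIFY_OK = 0,
    demo_VERIFY_ERROR_INPUT_INDEX,
    demo_VERIFY_ERROR_EMPTY_SCRIPT,
} demo_VerifyStatus;
} // extern "C"

// Writes the status only if the caller asked for it.
static void set_status(demo_VerifyStatus* status, demo_VerifyStatus value)
{
    if (status != nullptr) *status = value;
}

extern "C" bool demo_verify(const unsigned char* script, size_t script_len,
                            unsigned int input_index, unsigned int input_count,
                            demo_VerifyStatus* status)
{
    if (input_index >= input_count) {
        set_status(status, demo_VERIFY_ERROR_INPUT_INDEX);
        return false;
    }
    if (script == nullptr || script_len == 0) {
        set_status(status, demo_VERIFY_ERROR_EMPTY_SCRIPT);
        return false;
    }
    // Toy rule: any non-empty script passes. Real validation is not modelled here.
    set_status(status, demo_VERIFY_OK);
    return true;
}
```

Callers that only need success or failure can pass a null status pointer and branch on the return value alone.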

ryanofsky commented at 8:13 pm on September 2, 2024: contributor

Thanks for the update. It’s good to drop the error codes so the C API can correspond 1:1 with the C++ API and not be tied to a more old fashioned and cumbersome error handling paradigm (for callers that want to know which errors are possible and not have to code defensively or fall back to failing generically).

I am still -0 on the approach of introducing a C API to begin with, but happy to help review this and get it merged and maintain it if other developers think this is the right approach to take (short term or long term). It would be great to have more concept and approach ACKs for this PR, particularly from @theuni who commented earlier and @josibake who seems to have some projects built on this and linked in the PR description.

I think personally, if I wanted to use bitcoin core code from python or rust I would use tools like:

And interoperate with C++ directly, instead of wrapping the C++ interface in a C interface first. Tools like these do not support all C++ types and features, and can make it necessary to selectively wrap more complicated C++ interfaces with simpler C++ interfaces, or even C interfaces, but I don’t think this would be a justification for preemptively requiring every C++ type and function to be wrapped in C before it can be exposed. I just think the resulting boilerplate code:

kernel_Warning cast_kernel_warning(kernel::Warning warning)
{
    switch (warning) {
    case kernel::Warning::UNKNOWN_NEW_RULES_ACTIVATED:
        return kernel_Warning::kernel_LARGE_WORK_INVALID_CHAIN;
    case kernel::Warning::LARGE_WORK_INVALID_CHAIN:
        return kernel_Warning::kernel_LARGE_WORK_INVALID_CHAIN;
    } // no default case, so the compiler can warn about missing cases
    assert(false);
}

and duplicative type definitions and documentation:

/**
 * A struct for holding the kernel notification callbacks. The user data pointer
 * may be used to point to user-defined structures to make processing the
 * notifications easier.
 */
typedef struct {
    void* user_data;                         //!< Holds a user-defined opaque structure that is passed to the notification callbacks.
    kernel_NotifyBlockTip block_tip;         //!< The chain's tip was updated to the provided block index.
    kernel_NotifyHeaderTip header_tip;       //!< A new best block header was added.
    kernel_NotifyProgress progress;          //!< Reports on current block synchronization progress.
    kernel_NotifyWarningSet warning_set;     //!< A warning issued by the kernel library during validation.
    kernel_NotifyWarningUnset warning_unset; //!< A previous condition leading to the issuance of a warning is no longer given.
    kernel_NotifyFlushError flush_error;     //!< An error encountered when flushing data to disk.
    kernel_NotifyFatalError fatal_error;     //!< A un-recoverable system error encountered by the library.
} kernel_NotificationInterfaceCallbacks;

are fundamentally unnecessary and not worth the effort of writing and maintaining when C++ is not a new or unusual language and not meaningfully less accessible or interoperable than C is.

There are legitimate reasons to wrap C++ in C. One reason would be to provide ABI compatibility. Another would be to make code accessible with dlopen/dlsym. But I think even in these cases you would want to wrap C++ in C selectively, or just define an intermediate C interface to pass pointers but use C++ on either side of the interface. I don’t think you would want to drop down to C when not otherwise needed.

This is just to explain my point of view though. Overall I think this is very nice work, and I want to help with it, not hold it up.

ryanofsky commented at 8:24 pm on September 2, 2024: contributor

Another idea worth mentioning is that a bitcoin kernel C API could be implemented as a separate C library depending on the C++ library. The new code here does not necessarily need to be part of the main bitcoin core git repository, and it could be in a separate project. A benefit of this approach is it could relieve bitcoin core developers from the responsibility of updating the C API and API documentation when they change the C++ code. But a drawback is that the C API might not always be up to date with the latest version of bitcoin core code and could be broken between releases. Also it might not be as well reviewed or understood and might have more bugs.

josibake commented at 7:59 am on September 3, 2024: member

Concept ACK

Also an implicit approach ACK despite not heavily reviewing the code (yet). I have been focusing on using the kernel library in proof of concept applications to get a better sense of how well the library works for downstream users and to hopefully uncover any pain points preemptively. A few of these projects are linked in the PR description.

Regarding a C header vs C++ header, thanks @ryanofsky for taking the time to explain your thought process. I think you raise some excellent points. I’ll try to respond as best I can, despite being slightly out of my depth on this topic 😅

For me, the value of libbitcoinkernel is only fully realised with the broadest possible language support and ease of use for downstream projects. This is why I strongly prefer the C header approach for the following reasons:

Mature tooling for C language bindings
Stable ABI
Well established pattern in other open source projects

If we agree that broad language support is a goal of libbitcoinkernel, highlighting languages that do not support C++ bindings is a much more compelling argument for a C header than highlighting languages that do support C++ bindings as an argument for a C++ header.

Regarding some of the mentioned languages/tools which do have C++ language binding support:

Tools like these do not support all C++ types and features, and can make it necessary to selectively wrap more complicated C++ interfaces with simpler C++ interfaces, or even C interfaces

In this example, who is doing the wrapping to be able to use these tools? If it’s us, this seems much more complicated to ship and maintain a mixed wrapper and also feels over engineered to a specific set of tools and languages. It also does nothing for languages that do not support C++ bindings at all. As @TheCharlatan mentioned, languages favoured by academia lack C++ binding support and making libbitcoinkernel useful for academic research is a particularly important use case of libbitcoinkernel for me.

If we are exposing just a C++ header and expecting the downstream user to wrap selective parts in C interfaces to use libbitcoinkernel, we’ve eroded a fundamental value proposition of libbitcoinkernel, in my opinion. Namely, we want to provide a safe to use consensus library for users and minimise the risk of downstream projects introducing consensus bugs. Requiring downstream projects to write their own C++/C interfaces to be able to use kernel means that a) they just won’t use libbitcoinkernel or b) will introduce bugs when writing these wrappers. Said differently, if boilerplate will be needed for broad language support, I would prefer we focus our energy on writing and reviewing boilerplate code that ensures the usefulness of the library for the broadest possible user base, instead of requiring a subset of users to each write their own boilerplate without any review from us.

DrahtBot added the label Needs rebase on Sep 3, 2024

TheCharlatan force-pushed on Sep 4, 2024

DrahtBot removed the label Needs rebase on Sep 4, 2024

in src/kernel/bitcoinkernel.h:496 in 63a83b8dad outdated

+    size_t script_pubkey_len;
+} kernel_TransactionOutput;
+
+/**
+ * @brief Verify if the input at input_index of tx_to spends the script pubkey
+ * under the constraints specified by flags. If the witness flag is set the

stickies-v commented at 3:36 pm on September 11, 2024:

nit / meta discussion: even though it’ll make things more verbose, I think it might be worth referring to flags with their full name to make it easier for users to find them? I.e. “If the witness flag is set” would become “if kernel_SCRIPT_FLAGS_VERIFY_WITNESS is set in flags”.

TheCharlatan commented at 9:25 pm on November 19, 2024:

Good point, I added your suggestion.

in src/kernel/bitcoinkernel.h:34 in 33c71843e3 outdated

+#define BITCOINKERNEL_WARN_UNUSED_RESULT __attribute__((__warn_unused_result__))
+#else
+#define BITCOINKERNEL_WARN_UNUSED_RESULT
+#endif
+#if !defined(BITCOINKERNEL_BUILD) && defined(__GNUC__) && BITCOINKERNEL_GNUC_PREREQ(3, 4)
+#define BITCOINKERNEL_ARG_NONNULL(_x) __attribute__((__nonnull__(_x)))

stickies-v commented at 2:43 pm on September 12, 2024:

nit: I like that we’re using this guard. Do you see a downside to making it variadic?

(Should be a pretty trivial rebase with e.g. for i in {1..3}; do sed -i -E "s/BITCOINKERNEL_ARG_NONNULL\(([^)]+)\) BITCOINKERNEL_ARG_NONNULL\(([0-9]+)\)/BITCOINKERNEL_ARG_NONNULL(\1, \2)/" ./src/kernel/bitcoinkernel.h; done)

TheCharlatan commented at 9:20 pm on November 19, 2024:

When I apply this to the first commit I get:

In file included from /home/drgrid/bitcoin/src/test/kernel/test_kernel.cpp:5:
/home/drgrid/bitcoin/src/kernel/bitcoinkernel.h:201:32: error: too many arguments provided to function-like macro invocation
  201 | ) BITCOINKERNEL_ARG_NONNULL(1, 3);
      |                                ^
/home/drgrid/bitcoin/src/kernel/bitcoinkernel.h:34:9: note: macro 'BITCOINKERNEL_ARG_NONNULL' defined here
   34 | #define BITCOINKERNEL_ARG_NONNULL(_x) __attribute__((__nonnull__(_x)))
      |         ^
/home/drgrid/bitcoin/src/kernel/bitcoinkernel.h:201:3: error: expected function body after function declarator
  201 | ) BITCOINKERNEL_ARG_NONNULL(1, 3);
      |   ^

Which makes sense, because the macro only expects one argument. I’m not sure how safe it is to make it take a string or a list instead.

stickies-v commented at 10:25 pm on November 19, 2024:

Sorry, I forgot to include the diff which updates the macro (and 1 instance to show it compiles):

diff --git a/src/kernel/bitcoinkernel.h b/src/kernel/bitcoinkernel.h
index 9e6bf127db..67248349e2 100644
--- a/src/kernel/bitcoinkernel.h
+++ b/src/kernel/bitcoinkernel.h
@@ -31,9 +31,9 @@
 #define BITCOINKERNEL_WARN_UNUSED_RESULT
 #endif
 #if !defined(BITCOINKERNEL_BUILD) && defined(__GNUC__) && BITCOINKERNEL_GNUC_PREREQ(3, 4)
-#define BITCOINKERNEL_ARG_NONNULL(_x) __attribute__((__nonnull__(_x)))
+#define BITCOINKERNEL_ARG_NONNULL(...) __attribute__((__nonnull__(__VA_ARGS__)))
 #else
-#define BITCOINKERNEL_ARG_NONNULL(_x)
+#define BITCOINKERNEL_ARG_NONNULL(...)
 #endif
 
 #ifdef __cplusplus
@@ -522,7 +522,7 @@ bool BITCOINKERNEL_WARN_UNUSED_RESULT kernel_verify_script(
     unsigned int input_index,
     unsigned int flags,
     kernel_ScriptVerifyStatus* status
-) BITCOINKERNEL_ARG_NONNULL(1) BITCOINKERNEL_ARG_NONNULL(3);
+) BITCOINKERNEL_ARG_NONNULL(1, 3);
 
 /**
  * @brief This disables the global internal logger. No log messages will be

Based on https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html, variadic macros should be standard for C99, and GCC documents accepting multiple indexes: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html

TheCharlatan commented at 5:06 pm on November 20, 2024:

Thanks, decided to take this. I was a bit careful here, because I lifted the check from secp, which also does not use variadic args: https://github.com/bitcoin-core/secp256k1/blob/master/include/secp256k1.h#L174. But thinking a bit more about it, I could not come up with a good reason not to, so took your suggestion.
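For readers following along, here is a minimal sketch of the variadic form discussed above. The macro and function names (EXAMPLE_ARG_NONNULL, example_sum) are hypothetical stand-ins and not part of bitcoinkernel.h; the sketch only assumes a GCC-compatible compiler for the attribute and falls back to an empty macro elsewhere.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the header's macro: one invocation can list
// several 1-based argument positions that must not be null.
#if defined(__GNUC__)
#define EXAMPLE_ARG_NONNULL(...) __attribute__((__nonnull__(__VA_ARGS__)))
#else
#define EXAMPLE_ARG_NONNULL(...)
#endif

// Arguments 1 and 3 must be non-null; argument 2 is allowed to be null.
// The attribute is attached to the declaration, not the definition.
int example_sum(const int* a, const int* maybe_b, const int* c) EXAMPLE_ARG_NONNULL(1, 3);

int example_sum(const int* a, const int* maybe_b, const int* c)
{
    // Only the nullable argument is checked at runtime.
    return *a + (maybe_b ? *maybe_b : 0) + *c;
}
```

With the single-argument form, the same declaration would need two separate macro invocations, one per position.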

in src/kernel/bitcoinkernel.h:72 in 33c71843e3 outdated

48+ * sha256 implementation, initializes the random number generator and
49+ * self-checks the secp256k1 static context. It is used internally for otherwise
50+ * "context-free" operations.
51+ *
52+ * The user can create their own context for passing it to state-rich validation
53+ * functions and holding callbacks for kernel events.

stickies-v commented at 2:59 pm on September 12, 2024:

Is there any benefit to documenting the built-in static constant kernel context in the header documentation? If I understand correctly, that’s an implementation detail and not relevant to the user? If so, I think we should

only talk about the non-static context in bitcoinkernel.h, so that its meaning is unambiguous to the user
consistently refer to the static context as “static context” wherever it is documented, as to not make me question everything whenever I come across an unqualified context reference

If there is merit to documenting the static context in the header, I think it should be more of a footnote than the very first item in the documentation?

TheCharlatan commented at 9:32 pm on November 19, 2024:

See my comment.

in src/kernel/bitcoinkernel.h:44 in 33c71843e3 outdated

39+#ifdef __cplusplus
40+extern "C" {
41+#endif // __cplusplus
42+
43+/**
44+ * ------ Context ------

stickies-v commented at 3:04 pm on September 12, 2024:

Is there benefit to this stand-alone Context documentation, since we already have (and could expand on/merge with) the kernel_Context documentation? I think perhaps a more useful alternative would be to start the documentation with a minimal example on how to use the kernel (or a non-code “getting started” guide), which would inevitably include/reference the kernel_Context, providing users a good starting point on which documentation to read first?

TheCharlatan commented at 8:56 pm on November 19, 2024:

I did not think that much about order here, but I do think having this section on the context is a good idea. The key is that the user is not required to instantiate the context for using some parts of the library (and I think this is important enough to not just make it a footnote). The user-instantiated context is only really required when interacting with the “stateful” endpoints. Besides, it may be relevant to know what the library is instantiating internally in case there is some sort of conflict.

There is an exception here with the validation interface, and I’ve taken several attempts to come up with a nice way to tie it into the option pattern as well. I’ll take a stab at it again soon.

in src/kernel/bitcoinkernel.h:77 in 33c71843e3 outdated

72+ * The user is responsible for de-allocating the memory owned by pointers
73+ * returned by functions. Typically pointers returned by *_create(...) functions
74+ * can be de-allocated by corresponding *_destroy(...) functions.
75+ *
76+ * Pointer arguments make no assumptions on their lifetime. Once the function
77+ * returns the user can safely de-allocate the passed in arguments.

stickies-v commented at 3:25 pm on September 12, 2024:

I find this phrasing a bit confusing. Is this a correct replacement?

 * A function that takes pointer arguments makes no assumptions on their lifetime. Once the function
 * returns the user can safely de-allocate the memory owned by those pointers.

TheCharlatan commented at 9:22 pm on November 19, 2024:

Thanks, taken.
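As an illustration of the ownership rule in the docstring, a wrapper can pair each *_create(...) with its *_destroy(...) automatically. This is only a sketch: example_Thing, example_thing_create and example_thing_destroy are hypothetical names, not functions from the header, and the live-object counter exists purely to make the effect observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical opaque type with a create/destroy pair.
struct example_Thing { int value; };

static int g_live_things = 0; // counts live objects, for illustration only

example_Thing* example_thing_create(int value)
{
    ++g_live_things;
    return new example_Thing{value};
}

void example_thing_destroy(example_Thing* thing)
{
    if (!thing) return;
    --g_live_things;
    delete thing;
}

// Deleter that routes destruction through the matching *_destroy function.
struct ExampleThingDeleter {
    void operator()(example_Thing* thing) const { example_thing_destroy(thing); }
};
using ExampleThingPtr = std::unique_ptr<example_Thing, ExampleThingDeleter>;

// Returns the number of live objects after the owning pointer left scope.
int example_live_after_scope()
{
    {
        ExampleThingPtr thing{example_thing_create(42)};
        assert(g_live_things == 1);
    }
    return g_live_things;
}
```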

in src/kernel/bitcoinkernel.h:80 in 33c71843e3 outdated

75+ *
76+ * Pointer arguments make no assumptions on their lifetime. Once the function
77+ * returns the user can safely de-allocate the passed in arguments.
78+ *
79+ * Pointers passed by callbacks are not owned by the user and are only valid for
80+ * the duration of it. They should not be de-allocated by the user.

stickies-v commented at 3:27 pm on September 12, 2024:

What’s “it”?
I think adopting and sticking to a clear definition of MUST, MAY, SHOULD, … would be appropriate here? E.g. in this case, I think they “MUST” not be de-allocated by the user, rather than “SHOULD”?

TheCharlatan commented at 9:21 pm on November 19, 2024:

Improved this a bit and good point with the more precise language.
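To make the callback rule concrete, a sketch under assumed names (example_Block, example_BlockCallback and example_notify are hypothetical, not the header's types): the pointer handed to the callback refers to an object the library owns, so the user copies what they need and neither keeps nor frees the pointer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical object owned by the library.
struct example_Block { int height; };

// Hypothetical callback signature: the block pointer is const and only
// valid for the duration of the call.
typedef void (*example_BlockCallback)(void* user_data, const example_Block* block);

// Library side: the block lives on this stack frame and is gone on return.
void example_notify(example_BlockCallback callback, void* user_data, int height)
{
    example_Block block{height};
    callback(user_data, &block);
}

// User side: copy the needed data out; do not store or free the pointer.
void example_record_height(void* user_data, const example_Block* block)
{
    static_cast<std::vector<int>*>(user_data)->push_back(block->height);
}
```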

DrahtBot added the label Needs rebase on Sep 12, 2024

TheCharlatan force-pushed on Sep 12, 2024

TheCharlatan commented at 9:19 pm on September 12, 2024: contributor

Rebased.

DrahtBot removed the label Needs rebase on Sep 12, 2024

TheCharlatan force-pushed on Sep 13, 2024

TheCharlatan force-pushed on Sep 14, 2024

TheCharlatan force-pushed on Oct 8, 2024

TheCharlatan commented at 8:07 pm on October 8, 2024: contributor

Reworked after receiving a bunch of out-of-band feedback. In short:

Got rid of the void * option handling. Options are now set through dedicated functions instead of a single setter for all options.
Got rid of the kernel_TaskRunner. The context now holds an immediate task runner internally on which a user can register various validation interfaces. It is now the user’s responsibility to process the validation callbacks in a non-blocking fashion with their own infrastructure.
Got rid of raw data types in validation functions. Instead the raw data is now parsed and processed beforehand and the user always passes opaque data types.
Got rid of the explicit transaction output struct. The user can now retrieve the data with helper functions applied on opaque transaction output objects.

TheCharlatan force-pushed on Oct 11, 2024

TheCharlatan force-pushed on Oct 15, 2024

laanwj requested review from laanwj on Oct 22, 2024

bitcoin deleted a comment on Oct 22, 2024

DrahtBot added the label Needs rebase on Oct 24, 2024

TheCharlatan force-pushed on Oct 24, 2024

DrahtBot removed the label Needs rebase on Oct 24, 2024

TheCharlatan force-pushed on Oct 24, 2024

TheCharlatan force-pushed on Oct 25, 2024

TheCharlatan force-pushed on Nov 2, 2024

TheCharlatan force-pushed on Nov 8, 2024

TheCharlatan force-pushed on Nov 10, 2024

TheCharlatan force-pushed on Nov 14, 2024

TheCharlatan force-pushed on Nov 17, 2024

in src/kernel/bitcoinkernel.h:752 in f1b3ab751b outdated

741+) BITCOINKERNEL_ARG_NONNULL(1) BITCOINKERNEL_ARG_NONNULL(2) BITCOINKERNEL_ARG_NONNULL(3);
742+
743+/**
744+ * Destroy the chainstate manager.
745+ */
746+void kernel_chainstate_manager_destroy(kernel_ChainstateManager* chainstate_manager, const kernel_Context* context);

stickies-v commented at 6:01 pm on November 19, 2024:

What’s the point of the context parameter - it seems unused, and inconsistent with the other _destroy functions?

TheCharlatan commented at 8:48 pm on November 19, 2024:

Similarly to how the context is passed to the other chainman related functions, it is there to guarantee that it is still around when destroying the chainman. The reason for this is that there may be error notification callbacks issued during destruction.

in src/kernel/bitcoinkernel.h:740 in f1b3ab751b outdated

735+ * @return                               The allocated chainstate manager, or null on error.
736+ */
737+kernel_ChainstateManager* BITCOINKERNEL_WARN_UNUSED_RESULT kernel_chainstate_manager_create(
738+    kernel_ChainstateManagerOptions* chainstate_manager_options,
739+    kernel_BlockManagerOptions* block_manager_options,
740+    const kernel_Context* context

stickies-v commented at 6:02 pm on November 19, 2024:

nit: this is the only place where context is not the first option, would be nice for consistency?

TheCharlatan commented at 8:47 pm on November 19, 2024:

Thanks, I think it is good to get these little things right.

in src/kernel/bitcoinkernel.h:676 in f1b3ab751b outdated

617+ * with the options will be configured for these chain parameters.
618+ *
619+ * @param[in] context_options  Non-null, previously created with kernel_context_options_create.
620+ * @param[in] chain_parameters Is set to the context options.
621+ */
622+void kernel_context_options_set_chainparams(

stickies-v commented at 7:18 pm on November 19, 2024:

There are a few places, like here, where we expose modifier functions that are (quasi) required to be run before initializing another object. An alternative approach would be to extend the kernel_context_options_create to take a (nullable) kernel_ChainParameters*, and remove these ~unsafe modifiers altogether? I think that would have the benefit of:

removing a whole category of bugs where users set options at the wrong time (i.e. too late), silently leading to buggy behaviour
making it easier to see which options can (should) be set, without having to first have read the entire documentation

This concern also applies to e.g.:

kernel_context_options_set_notifications
kernel_validation_interface_register

TheCharlatan commented at 9:04 pm on November 19, 2024:

I think this is good the way it is now. The options get instantiated empty and may be populated by the user. The actual object only gets configured once by the options during its instantiation. It can’t be changed later on, so there is no concern that users could set something at the wrong time. Having to set options as arguments in their creation function is not a clear win in my eyes either. There are use-cases, for example using the kernel only as a data reader, where the notifications are useless. Likewise defaulting to mainnet seems sane to me too. It also does not integrate well with the “builder pattern” which is common in a bunch of other languages.
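A rough sketch of the options pattern described here, with hypothetical names throughout (example_ContextOptions, example_context_create and so on are not the header's API): the options start out with defaults, the user may populate them, and the object copies what it needs exactly once when it is created.

```cpp
#include <cassert>

enum example_Chain { EXAMPLE_MAINNET, EXAMPLE_TESTNET };

// Options are created with defaults (mainnet) and may be modified freely.
struct example_ContextOptions { example_Chain chain = EXAMPLE_MAINNET; };

// The context copies the options at creation; the value is fixed afterwards.
struct example_Context { const example_Chain chain; };

example_ContextOptions* example_context_options_create() { return new example_ContextOptions{}; }
void example_context_options_set_chain(example_ContextOptions* options, example_Chain chain) { options->chain = chain; }
void example_context_options_destroy(example_ContextOptions* options) { delete options; }

example_Context* example_context_create(const example_ContextOptions* options)
{
    return new example_Context{options->chain};
}
void example_context_destroy(example_Context* context) { delete context; }

// Shows that changing the options after creation does not affect the context.
example_Chain example_chain_after_late_set()
{
    example_ContextOptions* options = example_context_options_create();
    example_context_options_set_chain(options, EXAMPLE_TESTNET);
    example_Context* context = example_context_create(options);
    example_context_options_set_chain(options, EXAMPLE_MAINNET); // too late, no effect
    example_Chain chain = context->chain;
    example_context_destroy(context);
    example_context_options_destroy(options);
    return chain;
}
```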

stickies-v commented at 7:19 pm on November 19, 2024: contributor

Strong concept ACK.

I’ve started building a python wrapper library to get familiar with and actually use the interface, so most of my comments for now will be based on that experience and reading the documentation.

TheCharlatan commented at 9:31 pm on November 19, 2024: contributor

Thank you for the review @stickies-v!

Updated 6c9121f7907262b2bf065a7ceeb8bca620060a7f -> 6c9121f7907262b2bf065a7ceeb8bca620060a7f (kernelApi_0 -> kernelApi_1, compare)

Added, cleaned up, and clarified a bunch of documentation
Slightly changed the order of a function’s arguments, such that it takes the kernel context first.

TheCharlatan force-pushed on Nov 19, 2024

in src/kernel/bitcoinkernel.h:718 in 6c9121f790 outdated

712+/**
713+ * @brief Set the number of available worker threads used during validation.
714+ *
715+ * @param[in] chainstate_manager_options Non-null, options to be set.
716+ * @param[in] worker_threads The number of worker threads that should be spawned in the thread pool
717+ *                           used for validation. The number should be greater than 0.

stickies-v commented at 4:42 pm on November 20, 2024:

nit: according to the worker_threads_num, 0 is accepted too:

Zero means no parallel verification.

 *                           used for validation. The number must not be negative. When set to zero, no parallel verification is done.

TheCharlatan force-pushed on Nov 20, 2024

TheCharlatan commented at 5:04 pm on November 20, 2024: contributor

Updated 6c9121f7907262b2bf065a7ceeb8bca620060a7f -> 97fe2b25af31ca612c1f8d9f3de739fa3dee3902 (kernelApi_1 -> kernelApi_2, compare)

Added @stickies-v’s suggestion, implementing variadic args for nonnull attribute macro.

in src/kernel/bitcoinkernel.h:1049 in 97fe2b25af outdated

1043+ * @return                       The next block index in the currently active chain, or null on error.
1044+ */
1045+kernel_BlockIndex* BITCOINKERNEL_WARN_UNUSED_RESULT kernel_get_next_block_index(
1046+    const kernel_Context* context,
1047+    kernel_BlockIndex* block_index,
1048+    kernel_ChainstateManager* chainstate_manager

stickies-v commented at 4:51 pm on November 21, 2024:

nit: for the other block_index getters, chainstate_manager is the second argument - would keep that consistent

in src/kernel/bitcoinkernel.h:1015 in 97fe2b25af outdated

1009+ * @param[in] context            Non-null.
1010+ * @param[in] chainstate_manager Non-null.
1011+ * @param[in] block_hash         Non-null.
1012+ * @return                       The block index of the block with the passed in hash, or null on error.
1013+ */
1014+kernel_BlockIndex* BITCOINKERNEL_WARN_UNUSED_RESULT kernel_get_block_index_by_hash(

stickies-v commented at 4:52 pm on November 21, 2024:

nit: from/by naming inconsistency, I think my preference would lie with from (i.e. update to kernel_get_block_index_from_hash and kernel_get_block_index_from_height)

(technically, could update kernel_get_next_block_index -> kernel_get_block_index_from_previous and kernel_get_previous_block_index -> kernel_get_block_index_from_next, but… from_next sounds weird?)

TheCharlatan force-pushed on Nov 21, 2024

TheCharlatan commented at 10:11 pm on November 21, 2024: contributor

Updated 97fe2b25af31ca612c1f8d9f3de739fa3dee3902 -> a9b71eadb8eff5530500cdb7d7227b8575948df6 (kernelApi_2 -> kernelApi_3, compare)

As discussed with @stickies-v out of band, make callbacks only return const pointers, which further ensures that the user does not de-allocate or take ownership of them.

TheCharlatan force-pushed on Nov 21, 2024

DrahtBot added the label CI failed on Nov 21, 2024

DrahtBot commented at 10:18 pm on November 21, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/33351144688

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot removed the label CI failed on Nov 21, 2024

TheCharlatan force-pushed on Nov 22, 2024

TheCharlatan commented at 10:36 am on November 22, 2024: contributor

Updated a9b71eadb8eff5530500cdb7d7227b8575948df6 -> fc67047b7e1fb7031285f790ea3a7ea349474f31 (kernelApi_3 -> kernelApi_4, compare)

Made the user_data argument passed in with the callbacks const to better convey that the library doesn’t do anything with it besides passing it back to the user when the callback is triggered. This mimics the behaviour in libsecp: https://github.com/bitcoin-core/secp256k1/blob/master/include/secp256k1.h#L361

TheCharlatan force-pushed on Nov 22, 2024

TheCharlatan commented at 1:47 pm on November 22, 2024: contributor

Updated fc67047b7e1fb7031285f790ea3a7ea349474f31 -> 34a8429ff3a870c0caaf4c4790becd86c5acde38 (kernelApi_4 -> kernelApi_5, compare)

More consistent const usage

in src/kernel/bitcoinkernel.cpp:538 in 34a8429ff3 outdated

489+
490+    if (spent_outputs_ != nullptr && flags & kernel_SCRIPT_FLAGS_VERIFY_TAPROOT) {
491+        txdata.Init(tx, std::move(spent_outputs));
492+    }
493+
494+    return VerifyScript(tx.vin[input_index].scriptSig,

stickies-v commented at 5:38 pm on November 25, 2024:

I think it’s confusing that this function can return false and have status == kernel_SCRIPT_VERIFY_OK. How about adding a kernel_SCRIPT_VERIFY_ERROR catch-all member for unspecified errors? Or alternatively, requiring the user to provide a nullptr and only setting it to kernel_SCRIPT_VERIFY_OK if that’s actually so?

TheCharlatan commented at 9:58 pm on November 25, 2024:

We discussed during the last workshop that ideally we don’t have any status codes here at all. But the problem is annoying to tackle. You’d probably want to pass this function a script verify object that has already passed through the required pre-checks. But then you have to either copy the objects into this object, or give ownership up to that object, which I don’t think is desirable. The alternative to that is having a function with the same signature that you can call to check the arguments. But then you’re forced to check them here again. I’m coming around to the option of replacing the status codes with log messages, but then we’re sacrificing a bit of responsiveness to the developer.

Edit: I also think that because this is probably going to be the most low-level verification function we expose here, populating the ScriptError_t enum here and returning that instead might be much more interesting. I wanted to hold off on this a bit though, so did not do that yet.

TheCharlatan commented at 9:44 am on November 26, 2024:

What do you think of doing something like this instead: https://github.com/TheCharlatan/bitcoin/commit/6323d7b072de5b13ab25aaa29e02332c44808b62

stickies-v commented at 12:57 pm on November 26, 2024:

You’d probably want to pass this function a script verify object that has already passed through the required pre-checks.

I 100% agree with this approach. Adding extra types makes the API more cumbersome to use, but I think it does make it more safe, and the extra verbosity should be quite easy to hide in client libraries.

But then you’re forced to check them here again.

I think that can be avoided by having the prechecks function return a ScriptPreChecksPassed* (which references the object it verified, but doesn’t copy it) and then requiring that as an extra argument (extra as a way to address lifetime issues) to the kernel_verify_script function? This doesn’t prevent runtime issues (e.g. re-using PreChecksPassed pointers, which is easily verifiable at runtime) or segfaults, but at least it adds some compile-time checks to guide the user to using the API safely, and it can be done without any copies or changing ownership?

What do you think of doing something like this instead: https://github.com/TheCharlatan/bitcoin/commit/6323d7b072de5b13ab25aaa29e02332c44808b62

I’m not convinced. I think not requiring the user to deal with status codes (or strings) is a good philosophy, but if it’s optional anyway, then passing a bool to add a log entry feels like a much worse interface? And I think it’s inferior to this approach:

The alternative to that is having a function with the same signature that you can call to check the arguments.
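To illustrate the pre-check token idea, here is a hypothetical sketch; example_ScriptArgs, example_PreChecksPassed and the functions below are invented for this example and are not what the linked branches implement. The token references the checked arguments without copying them, and the verify function requires it as an extra parameter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical arguments that need validating before verification.
struct example_ScriptArgs { size_t input_index; size_t input_count; };

// Token proving that pre-checks passed. It references the arguments it
// checked and does not copy or take ownership of them.
struct example_PreChecksPassed { const example_ScriptArgs* args; };

// Returns a token on success, or nullptr when the arguments are invalid.
example_PreChecksPassed* example_precheck(const example_ScriptArgs* args)
{
    if (args->input_index >= args->input_count) return nullptr;
    return new example_PreChecksPassed{args};
}

void example_prechecks_destroy(example_PreChecksPassed* passed) { delete passed; }

// Verification can only be called with a token; re-using a token with other
// arguments is caught by a cheap runtime comparison.
bool example_verify(const example_ScriptArgs* args, const example_PreChecksPassed* passed)
{
    if (passed->args != args) return false;
    return true; // placeholder for the actual script verification
}
```

The caller still has to keep the arguments alive for as long as the token is in use, which is the lifetime issue mentioned above.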

TheCharlatan commented at 1:25 pm on November 26, 2024:

I’ll propose a pre-check function and object later today then :)

TheCharlatan commented at 1:57 pm on November 28, 2024:

Ok, this is what I have now: https://github.com/TheCharlatan/bitcoin/compare/kernelApi_7..kernelApi_ScriptVerifyArgs , what do you think? EDIT: This also introduces the concept of a ‘View’ where you get a resource, but it is dependent on the lifetime of the resources it was created from.

TheCharlatan force-pushed on Nov 25, 2024

TheCharlatan commented at 10:32 pm on November 25, 2024: contributor

Thanks for the review @stickies-v!

Updated 34a8429ff3a870c0caaf4c4790becd86c5acde38 -> 35f8503285c672e8ee7e98617e236b38d8ce7a7f (kernelApi_5 -> kernelApi_6, compare)

Addressed @stickies-v’s comment, fixed worker threads docstring.
Addressed @stickies-v’s comment, make argument ordering more consistent.
Addressed @stickies-v’s comment, name functions consistently that return a new object with the help of other objects with from instead of by.

TheCharlatan force-pushed on Nov 28, 2024

TheCharlatan commented at 11:43 am on November 28, 2024: contributor

Updated 35f8503285c672e8ee7e98617e236b38d8ce7a7f -> 403c20980ec118f6efdd21d7c25646e20574583b (kernelApi_6 -> kernelApi_7, compare)

Integrate the validation interface into the context. This avoids having to create/destroy and register/deregister a standalone validation interface object and should make it a bit easier to use. I was hesitant to do this, because it doesn’t allow the user to register multiple interfaces anymore. However, since the user can multiplex the notifications by themselves, I don’t think this is something worth keeping around.
Consistently apply the block hash deleter
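Multiplexing on the user side could look roughly like the following. The names are hypothetical (ExampleMultiplexer, example_on_block) and the callback signature is simplified to a block height; the point is only that a single registered callback can fan out to any number of handlers.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// User-owned fan-out object, passed to the library as user_data.
struct ExampleMultiplexer {
    std::vector<std::function<void(int)>> handlers;
};

// The one callback registered with the library forwards each notification
// to every handler the user has added.
void example_on_block(void* user_data, int height)
{
    auto* mux = static_cast<ExampleMultiplexer*>(user_data);
    for (const auto& handler : mux->handlers) {
        handler(height);
    }
}
```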

TheCharlatan force-pushed on Dec 2, 2024

TheCharlatan commented at 12:42 pm on December 2, 2024: contributor

Rebased 403c20980ec118f6efdd21d7c25646e20574583b -> 8598bc9e5d3fb7ebc08cf0c6422b3e44c56230d6 (kernelApi_7 -> kernelApi_8, compare)

Get build system fixes from #31395 and #31357

TheCharlatan force-pushed on Dec 4, 2024

TheCharlatan commented at 8:01 am on December 4, 2024: contributor

Rebased 8598bc9e5d3fb7ebc08cf0c6422b3e44c56230d6 -> 6090df267dfece6192b567fed6582445aa811e7f (kernelApi_8 -> kernelApi_9, compare)

Aligned process block pre-checks through #31175
Clamp the work threads number so we properly handle the value range through #31313

TheCharlatan force-pushed on Dec 15, 2024

TheCharlatan commented at 10:54 pm on December 15, 2024: contributor

Updated 6090df267dfece6192b567fed6582445aa811e7f -> 247a8a02c636250ee7e5c06f08cd18ddb1de6be5 (kernelApi_9 -> kernelApi_10, compare)

Tweaked docs to be more doxygen friendly.
Corrected transaction in taproot script validation tests. I must have committed a version where I was testing a different invariant by mistake. The test now correctly asserts that a taproot transaction can pass validation.

TheCharlatan force-pushed on Dec 16, 2024

TheCharlatan commented at 11:23 am on December 16, 2024: contributor

Updated 247a8a02c636250ee7e5c06f08cd18ddb1de6be5 -> 9e203b460d8ab1d92949ab8714a9265c343a5eee (kernelApi_10 -> kernelApi_11, compare)

Changed how the notification callbacks are set. There was no real need for a separate notifications object, so I removed it.

in src/kernel/bitcoinkernel.h:264 in 9e203b460d outdated

259+
260+/**
261+ * Function signature for the global logging callback. All bitcoin kernel
262+ * internal logs will pass through this callback.
263+ */
264+typedef void (*kernel_LogCallback)(void* user_data, const char* message);

laanwj commented at 1:23 pm on December 17, 2024:

A general comment on the API: i’d prefer to pass (and receive) explicit lengths for strings instead of bare char*s.

My experience with wrapping C APIs in rust is that it’s important to have a defined memory-range for strings and arrays. Relying on NUL-termination means that the memory size is effectively unrestricted, making it impossible to do some checks safely. This is (with lesser urgency) also true for other languages like Python that represent strings as pointer+length.

As we internally use C++ strings and not C string APIs this seems straightforward to offer.

TheCharlatan commented at 1:34 pm on December 17, 2024:

Mmh, thanks for this. It should be easy to add a length parameter here.
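A sketch of what passing an explicit length could look like, assuming a hypothetical three-argument callback type (example_LogCallback is not the header's kernel_LogCallback): the receiver gets a bounded memory range and never needs to scan for a NUL terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical callback type carrying an explicit message length.
typedef void (*example_LogCallback)(void* user_data, const char* message, size_t message_len);

// Library side: internally a string view, so pointer and length are both at hand.
void example_log(example_LogCallback callback, void* user_data, std::string_view message)
{
    callback(user_data, message.data(), message.size());
}

// User side: build a bounded view from pointer and length; no strlen needed.
void example_collect(void* user_data, const char* message, size_t message_len)
{
    std::string_view view{message, message_len};
    static_cast<std::string*>(user_data)->append(view);
}
```

Bindings in languages that represent strings as pointer plus length can map such a signature directly onto their native slice types.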

TheCharlatan force-pushed on Dec 17, 2024

TheCharlatan commented at 2:58 pm on December 17, 2024: contributor

Thank you for having a look @laanwj!

Updated 9e203b460d8ab1d92949ab8714a9265c343a5eee -> 73acb3ff8a04cddc4904c446c2521dd2b2abc84d (kernelApi_11 -> kernelApi_12, compare)

Addressed @laanwj’s comment, added a size parameter to all functions taking a null terminated const char* string parameter. This is arguably safer, since we don’t completely rely on the null terminator for safety.
Removed ‘\n’ newlines in the log calls in the kernel wrapper calls.
Added the progress callback back again. I disabled it some time ago, and forgot to add it back again.
Use string views instead of const char* in the c++ wrapper.

laanwj commented at 3:54 pm on December 17, 2024: member

To make the doxygen documentation nicer to read, i’ve added grouping to the list of functions, and reordered a bit to make sure create is first and destroy always last within the group (if applicable), feel free to take over this patch:

https://github.com/laanwj/bitcoin/commit/c222651aca4578857f5d432bd6ce221b5602ee38

TheCharlatan force-pushed on Dec 17, 2024

TheCharlatan commented at 5:40 pm on December 17, 2024: contributor

Thanks for the doc suggestions @laanwj, I just moved a few functions to different places compared to your patch. Feel free to send me another one if you think it is still not ideal. The groupings do look very nice in the docs now and also make it a bit more straightforward to decide where to put new functions.

Updated 73acb3ff8a04cddc4904c446c2521dd2b2abc84d -> 20eec64b5e417cac8c68100826c0adf2152a49eb (kernelApi_12 -> kernelApi_13, compare)

Applied @laanwj’s patch, introducing doc groupings for the header functions.

stickies-v referenced this in commit d7c4348efe on Dec 18, 2024

stickies-v referenced this in commit 576176ce79 on Dec 18, 2024

stickies-v referenced this in commit 198280656b on Dec 18, 2024

stickies-v referenced this in commit 1ed55b8071 on Dec 18, 2024

stickies-v referenced this in commit 7fac90b9cc on Dec 18, 2024

stickies-v referenced this in commit 9e5d92ac2f on Dec 18, 2024

stickies-v referenced this in commit d624a26513 on Dec 18, 2024

stickies-v referenced this in commit 7e4c76c9d8 on Dec 18, 2024

stickies-v referenced this in commit 4103e2cb36 on Dec 18, 2024

stickies-v referenced this in commit b7b739325c on Dec 18, 2024

stickies-v referenced this in commit 1db5599b6d on Dec 18, 2024

stickies-v referenced this in commit b42cf6ac98 on Dec 18, 2024

in src/kernel/bitcoinkernel.h:922 in 20eec64b5e outdated

912+ */
913+bool BITCOINKERNEL_WARN_UNUSED_RESULT kernel_chainstate_manager_load_chainstate(
914+    const kernel_Context* context,
915+    const kernel_ChainstateLoadOptions* chainstate_load_options,
916+    kernel_ChainstateManager* chainstate_manager
917+) BITCOINKERNEL_ARG_NONNULL(1, 2, 3);

stickies-v commented at 7:26 pm on December 18, 2024:

When this function is called more than once, kernel crashes with an assertion error:

Assertion failed: (!m_ibd_chainstate), function InitializeChainstate, file validation.cpp, line 5655.

The solutions I see atm:

1. document that this function may only be called once for each chainman
2. add a field to kernel_ChainstateManager* to keep track of it being loaded already, return false and log an error
3. rework LoadChainstate logic to handle multiple calls gracefully
4. remove kernel_chainstate_manager_load_chainstate altogether and load chainstate during kernel_chainstate_manager_create.

I’m not sure if we really need a separate *_load_chainstate function, so if that’s true, then option 4. would probably be preferable? I implemented it in https://github.com/TheCharlatan/bitcoin/compare/kernelApi...stickies-v:bitcoin:kernel/remove-load-chainstate, but in practice this probably should be a rebase instead of an extra commit. Options 1. and 2. seem easy enough to implement too, 3. is probably not the most sensible.

TheCharlatan commented at 10:10 pm on December 18, 2024:

remove kernel_chainstate_manager_load_chainstate altogether and load chainstate during kernel_chainstate_manager_create.

I would like this a lot, but I wanted to keep a separate chainstate load function in case we ever land a “blocks-only read-only” chainstate manager, where we don’t need to load any chainstates. I feel like making this a no-op could work, the simplest thing to do would probably be adding something along the lines of:

if (chainman.GetAll().size() > 0) return true;

to kernel_chainstate_manager_load_chainstate. But then again it would move us closer to a correct by construction setup if we’d do the constructing and loading all at once.

stickies-v commented at 1:05 pm on December 19, 2024:

Thanks for updating to option 4. Just to summarize what we talked about offline:

but I wanted to keep a separate chainstate load function in case we ever land a “blocks-only read-only” chainstate manager

That makes sense with the current code organization, but I think we should aim to shift towards a more intuitive API over time. Operations that don’t require any chainstate (such as blocks-only read-only) probably shouldn’t use the chainman in the first place.

But then again it would move us closer to a correct by construction setup

I think that is a worthwhile design goal for the API.
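As a sketch of the "correct by construction" idea, with hypothetical names (example_Chainman and friends are not the kernel types): when loading happens inside the create function, the user never holds an object that is only half initialized, and there is no separate load call that could be invoked twice.

```cpp
#include <cassert>

// Hypothetical manager object with a flag mirroring "chainstate loaded".
struct example_Chainman { bool loaded; };

// Stand-in for the internal load step. Like the real code path, it asserts
// that it runs at most once per object.
static bool example_load(example_Chainman* chainman, bool data_ok)
{
    assert(!chainman->loaded);
    if (!data_ok) return false;
    chainman->loaded = true;
    return true;
}

// Creation and loading happen together: the caller either receives a fully
// loaded object or a null pointer.
example_Chainman* example_chainman_create(bool data_ok)
{
    example_Chainman* chainman = new example_Chainman{false};
    if (!example_load(chainman, data_ok)) {
        delete chainman;
        return nullptr;
    }
    return chainman;
}

void example_chainman_destroy(example_Chainman* chainman) { delete chainman; }
```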

in src/kernel/bitcoinkernel.h:434 in 20eec64b5e outdated

428+} kernel_BlockHash;
429+
430+/**
431+ * Convenience struct for holding serialized data.
432+ */
433+typedef struct {

laanwj commented at 1:36 am on December 19, 2024:

i was wondering; these are trivially small structures, why don’t we pass and return them by value instead of by pointer? this would avoid needing a special kernel_byte_array_destroy call to deallocate them

edit: never mind, of course that’s still necessary to deallocate the contents

TheCharlatan commented at 9:11 am on December 19, 2024:

of course that’s still necessary to deallocate the contents

Yes, I try to match every call to new with a corresponding place to delete.

TheCharlatan force-pushed on Dec 19, 2024

TheCharlatan commented at 10:58 am on December 19, 2024: contributor

Thanks for the suggestion @stickies-v, I think it is the right call.

Updated 20eec64b5e417cac8c68100826c0adf2152a49eb -> f157b0cbc7d90075858a6522d13a7bc4f0b25a5f (kernelApi_13 -> kernelApi_14, compare)

Addressed @stickies-v’s comment, applying the suggestion for rolling chainstate loading into chainstate creation.

stickies-v referenced this in commit 523dee4273 on Dec 19, 2024

stickies-v referenced this in commit 514f22fc45 on Dec 19, 2024

stickies-v referenced this in commit dec7ebd469 on Dec 19, 2024

stickies-v referenced this in commit f9408aadad on Dec 19, 2024

stickies-v referenced this in commit befc5e7a09 on Dec 19, 2024

ismaelsadeeq commented at 9:03 pm on December 20, 2024: member

Concept ACK

in src/kernel/bitcoin-chainstate.cpp:17 in f157b0cbc7 outdated

12+{
13+    std::vector<unsigned char> bytes;
14+
15+    for (size_t i{0}; i < hex.length(); i += 2) {
16+        std::string byteString{hex.substr(i, 2)};
17+        unsigned char byte = (char)std::strtol(byteString.c_str(), nullptr, 16);

laanwj commented at 1:18 pm on January 15, 2025:

i would really prefer not to bring back use of strtol in C++ code; it has some known issues with locale-dependence (especially on Linux). what about:

#include <charconv>
...
std::vector<unsigned char> hex_string_to_char_vec(const std::string& hex)
{
    std::vector<unsigned char> bytes;

    for (size_t i{0}; i < hex.length(); i += 2) {
        unsigned int val{0};
        auto [p, ec] = std::from_chars(hex.data() + i, hex.data() + i + 2, val, 16);
        if (ec == std::errc{} && p == hex.data() + i + 2) {
            bytes.push_back(val);
        }
    }

    return bytes;
}

from_chars is guaranteed to be locale-independent so doesn’t need an exception in the linter either. Same for the other use.
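
The suggestion above silently skips any pair of characters that fails to parse. As a hedged variation (not part of the pull request), the sketch below rejects the whole input instead, which may be preferable when the hex string comes from a command line. The name `parse_hex_strict` and the `std::optional` return type are assumptions for illustration.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

// Converts a hex string to bytes, or returns std::nullopt if the input has an
// odd length or contains a pair of characters that is not valid hex.
std::optional<std::vector<unsigned char>> parse_hex_strict(std::string_view hex)
{
    if (hex.size() % 2 != 0) return std::nullopt;

    std::vector<unsigned char> bytes;
    bytes.reserve(hex.size() / 2);

    for (std::size_t i{0}; i < hex.size(); i += 2) {
        unsigned int val{0};
        const char* first{hex.data() + i};
        const char* last{first + 2};
        // std::from_chars is locale-independent and accepts no prefix or sign here.
        const auto [p, ec] = std::from_chars(first, last, val, 16);
        // Require that both characters were consumed.
        if (ec != std::errc{} || p != last) return std::nullopt;
        bytes.push_back(static_cast<unsigned char>(val));
    }

    return bytes;
}
```

The explicit `static_cast` also avoids the implicit narrowing from `unsigned int` to `unsigned char` when pushing the value.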

in src/kernel/bitcoin-chainstate.cpp:9 in f157b0cbc7 outdated

0@@ -0,0 +1,200 @@
1+#include <kernel/bitcoinkernel_wrapper.h>
2+
3+#include <cassert>
4+#include <filesystem>
5+#include <iostream>
6+#include <optional>
7+#include <string>
8+#include <string_view>
9+#include <sstream>

laanwj commented at 1:19 pm on January 15, 2025:

missing #include <vector>

DrahtBot added the label Needs rebase on Jan 15, 2025

TheCharlatan force-pushed on Jan 15, 2025

TheCharlatan commented at 9:00 pm on January 15, 2025: contributor

Thanks for the suggestions @laanwj,

Rebased f157b0cbc7d90075858a6522d13a7bc4f0b25a5f -> f25616bec485ee6a70e4b797758d4987d25a7c25 (kernelApi_14 -> kernelApi_15, compare)

Fixed conflict with #31061

Updated f25616bec485ee6a70e4b797758d4987d25a7c25 -> 4dde75858a3b08f84d71176c7be14bae62020b1f (kernelApi_15 -> kernelApi_16, compare)

Took @laanwj’s suggestion, avoiding strtol in hex string to byte vector conversion
Addressed @laanwj’s comment, added missing vector include.

DrahtBot removed the label Needs rebase on Jan 16, 2025

TheCharlatan force-pushed on Jan 17, 2025

TheCharlatan commented at 9:30 am on January 17, 2025: contributor

Rebased 4dde75858a3b08f84d71176c7be14bae62020b1f -> 538671edce5813a62405b9bd5c50c39263c58435 (kernelApi_16 -> kernelApi_17, compare)

Get new method for computing cache sizes from https://github.com/bitcoin/bitcoin/pull/31483

TheCharlatan force-pushed on Jan 17, 2025

TheCharlatan commented at 7:55 am on January 18, 2025: contributor

Rebased 538671edce5813a62405b9bd5c50c39263c58435 -> 01a43b24436e0aed7b8f79d3857630a4bf6a0545 (kernelApi_17 -> kernelApi_18, compare)

Get functional test fix from #31675

in src/kernel/bitcoin-chainstate.cpp:167 in 01a43b2443 outdated

162+    options.SetValidationInterface(validation_interface);
163+
164+    Context context{options};
165+    assert(context);
166+
167+    ChainstateManagerOptions chainman_opts{context, abs_datadir};

stickies-v commented at 3:39 pm on January 30, 2025:

The implicit std::filesystem::__cxx11::path to const std::string& conversion doesn’t seem to cross-compile for x86_64-w64-mingw32:

[100%] Building CXX object src/kernel/CMakeFiles/kernel-bitcoin-chainstate.dir/bitcoin-chainstate.cpp.obj
/home/runner/work/py-bitcoinkernel/py-bitcoinkernel/depend/bitcoin/src/kernel/bitcoin-chainstate.cpp: In function ‘int main(int, char**)’:
/home/runner/work/py-bitcoinkernel/py-bitcoinkernel/depend/bitcoin/src/kernel/bitcoin-chainstate.cpp:164:64: error: no matching function for call to ‘ChainstateManagerOptions::ChainstateManagerOptions(<brace-enclosed initializer list>)’
  164 |     ChainstateManagerOptions chainman_opts{context, abs_datadir};
      |                                                                ^
In file included from /home/runner/work/py-bitcoinkernel/py-bitcoinkernel/depend/bitcoin/src/kernel/bitcoin-chainstate.cpp:1:
/home/runner/work/py-bitcoinkernel/py-bitcoinkernel/depend/bitcoin/src/kernel/bitcoinkernel_wrapper.h:396:5: note: candidate: ‘ChainstateManagerOptions::ChainstateManagerOptions(const Context&, const std::string&)’
  396 |     ChainstateManagerOptions(const Context& context, const std::string& data_dir) noexcept
      |     ^~~~~~~~~~~~~~~~~~~~~~~~
/home/runner/work/py-bitcoinkernel/py-bitcoinkernel/depend/bitcoin/src/kernel/bitcoinkernel_wrapper.h:396:73: note:   no known conversion for argument 2 from ‘std::filesystem::__cxx11::path’ to ‘const std::string&’ {aka ‘const std::__cxx11::basic_string<char>&’}
  396 |     ChainstateManagerOptions(const Context& context, const std::string& data_dir) noexcept
      |                                                      ~~~~~~~~~~~~~~~~~~~^~~~~~~~
/home/runner/work/py-bitcoinkernel/py-bitcoinkernel/depend/bitcoin/src/kernel/bitcoinkernel_wrapper.h:383:7: note: candidate: ‘ChainstateManagerOptions::ChainstateManagerOptions(ChainstateManagerOptions&&)’
  383 | class ChainstateManagerOptions
      |       ^~~~~~~~~~~~~~~~~~~~~~~~
/home/runner/work/py-bitcoinkernel/py-bitcoinkernel/depend/bitcoin/src/kernel/bitcoinkernel_wrapper.h:383:7: note:   candidate expects 1 argument, 2 provided
/home/runner/work/py-bitcoinkernel/py-bitcoinkernel/depend/bitcoin/src/kernel/bitcoin-chainstate.cpp:167:70: error: no matching function for call to ‘BlockManagerOptions::BlockManagerOptions(<brace-enclosed initializer list>)’
  167 |     BlockManagerOptions blockman_opts{context, abs_datadir / "blocks"};
      |                                                                      ^
/home/runner/work/py-bitcoinkernel/py-bitcoinkernel/depend/bitcoin/src/kernel/bitcoinkernel_wrapper.h:425:5: note: candidate: ‘BlockManagerOptions::BlockManagerOptions(const Context&, const std::string&)’
  425 |     BlockManagerOptions(const Context& context, const std::string& data_dir) noexcept
      |     ^~~~~~~~~~~~~~~~~~~
/home/runner/work/py-bitcoinkernel/py-bitcoinkernel/depend/bitcoin/src/kernel/bitcoinkernel_wrapper.h:425:68: note:   no known conversion for argument 2 from ‘std::filesystem::__cxx11::path’ to ‘const std::string&’ {aka ‘const std::__cxx11::basic_string<char>&’}
  425 |     BlockManagerOptions(const Context& context, const std::string& data_dir) noexcept
      |                                                 ~~~~~~~~~~~~~~~~~~~^~~~~~~~
/home/runner/work/py-bitcoinkernel/py-bitcoinkernel/depend/bitcoin/src/kernel/bitcoinkernel_wrapper.h:412:7: note: candidate: ‘BlockManagerOptions::BlockManagerOptions(BlockManagerOptions&&)’
  412 | class BlockManagerOptions
      |       ^~~~~~~~~~~~~~~~~~~
/home/runner/work/py-bitcoinkernel/py-bitcoinkernel/depend/bitcoin/src/kernel/bitcoinkernel_wrapper.h:412:7: note:   candidate expects 1 argument, 2 provided
gmake[5]: *** [src/kernel/CMakeFiles/kernel-bitcoin-chainstate.dir/build.make:80: src/kernel/CMakeFiles/kernel-bitcoin-chainstate.dir/bitcoin-chainstate.cpp.obj] Error 1
gmake[4]: *** [CMakeFiles/Makefile2:1168: src/kernel/CMakeFiles/kernel-bitcoin-chainstate.dir/all] Error 2
gmake[3]: *** [Makefile:136: all] Error 2
gmake[2]: *** [CMakeFiles/bitcoin_core.dir/build.make:86: bitcoin_core-prefix/src/bitcoin_core-stamp/bitcoin_core-build] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:122: CMakeFiles/bitcoin_core.dir/all] Error 2
gmake: *** [Makefile:136: all] Error 2

Ran into this in one of my py-bitcoinkernel CI runs. Slightly older HEAD, but at first glance still relevant, just wanted to dump here already until I have time to investigate further - sorry if it’s irrelevant.
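
For context on the error above: `std::filesystem::path` converts implicitly only to its native string type, which is `std::wstring` on Windows and `std::string` on POSIX systems, so code relying on an implicit conversion to `std::string` compiles on Linux but not for Windows targets. A minimal sketch of an explicit conversion follows. The helper name `path_to_utf8` is hypothetical, and this is not necessarily how the pull request resolved the issue.

```cpp
#include <filesystem>
#include <string>

// Explicitly converts a path to a UTF-8 encoded std::string on any platform.
// u8string() returns std::u8string in C++20 (std::string in C++17), so the
// characters are copied through an iterator range to cover both cases.
std::string path_to_utf8(const std::filesystem::path& path)
{
    const auto u8{path.u8string()};
    return std::string{u8.begin(), u8.end()};
}
```

With such a helper, a call site taking `const std::string&` can be written as `path_to_utf8(abs_datadir / "blocks")` instead of passing the path object directly.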

TheCharlatan commented at 3:44 pm on January 30, 2025:

Good catch, I will try to add the tests and chainstate binary to the CI here too.

TheCharlatan commented at 12:22 pm on January 31, 2025:

I added it to the cross compiled windows job now, but it is going to take extra work (#31158) to add it to the native job too.

TheCharlatan commented at 1:13 pm on February 1, 2025:

Ok, added symbol exporting now, so we should have somewhat working windows support.

stickies-v commented at 2:39 pm on February 1, 2025:

Nice! Just to be clear: I only had issues with compiling the bitcoin-chainstate target, the bitcoinkernel library already was working fine (at least the functions covered with py-bitcoinkernel’s test suite, which is not yet 100%) with the mingw32 cross-compiled binary.

(But I suspect you’re talking about “somewhat working native windows support”, which I’m not using in my pipelines)

TheCharlatan force-pushed on Jan 31, 2025

TheCharlatan commented at 12:20 pm on January 31, 2025: contributor

Updated 01a43b24436e0aed7b8f79d3857630a4bf6a0545 -> 10e71b4c47e1b199622280d100155ed5d6ef6d66 (kernelApi_18 -> kernelApi_19, compare)

Hooked up kernel tests to cmake so the CI can run them through ctest
Added test_kernel, libbitcoinkernel.so, and kernel/bitcoin-chainstate to some more CI builds.
Patched some small build errors related to unused variables and type narrowing.
Addressed @stickies-v’s comment, fixing path to string conversion.

TheCharlatan force-pushed on Jan 31, 2025

TheCharlatan commented at 12:27 pm on January 31, 2025: contributor

Rebased 10e71b4c47e1b199622280d100155ed5d6ef6d66 -> fb6e0deee18dff38219c9646d2461a40de03ed66 (kernelApi_19 -> kernelApi_20, compare)

DrahtBot commented at 1:36 pm on January 31, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36477768799

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot added the label CI failed on Jan 31, 2025

TheCharlatan force-pushed on Jan 31, 2025

TheCharlatan force-pushed on Feb 1, 2025

TheCharlatan commented at 1:13 pm on February 1, 2025: contributor

Updated fb6e0deee18dff38219c9646d2461a40de03ed66 -> f926d7ef34773b7836e94491a16021288c125b11 (kernelApi_20 -> kernelApi_21, compare)

Added symbol exports where appropriate in the header to enable windows support
Completely replaced the existing bitcoin-chainstate with the new kernel-API-only bitcoin-chainstate.
Removed the cmake symbol visibility patch, instead relying on the header.
Removed flaky filesystem-related chainman and blockman opts tests.

TheCharlatan force-pushed on Feb 1, 2025

TheCharlatan commented at 3:33 pm on February 1, 2025: contributor

Rebased f926d7ef34773b7836e94491a16021288c125b11 -> 817865d57daa822370b0f67e1e079fdd25ab3130 (kernelApi_21 -> kernelApi_22, compare)

Fixed conflict with #30965, which necessitated some changes to the API: The block manager options are now responsible for the options affecting the block tree db.

DrahtBot removed the label CI failed on Feb 3, 2025

Armss9936 approved

in src/test/CMakeLists.txt:217 in 817865d57d outdated

210@@ -211,6 +211,10 @@ function(add_all_test_targets)
211   endforeach()
212 endfunction()
213 
214+if (BUILD_KERNEL_TEST)
215+  add_subdirectory(kernel)
216+endif()
217+

stickies-v commented at 8:56 pm on February 6, 2025:

I’m not sure if this is the best approach. -DBUILD_KERNEL_LIB=ON -DBUILD_KERNEL_TEST=ON should imo build the tests even if -DBUILD_TESTS=OFF. I think an approach where we update src/CMakeLists.txt with the below makes more sense (quick sketch)?

if(BUILD_KERNEL_LIB)
  add_subdirectory(kernel)
  if (BUILD_KERNEL_TEST)
    add_subdirectory(test/kernel)
  endif()
endif()

TheCharlatan force-pushed on Feb 12, 2025

TheCharlatan commented at 3:23 pm on February 12, 2025: contributor

Thank you for all the suggestions made over the past week @stickies-v!

Updated 817865d57daa822370b0f67e1e079fdd25ab3130 -> 5aeaa3f49d10562b8936ef36b1c25a6466dbe03e (kernelApi_22 -> kernelApi_23, compare)

Adapted @stickies-v’s suggestion made in https://github.com/TheCharlatan/bitcoin/pull/24 by merging the ChainstateManager::Options, BlockManager::Options, and ChainstateLoadOptions into a single options struct. This should simplify ChainstateManager initialization and slim down the API a bit.
Ran clang-format over the commits
Addressed @stickies-v’s comment, taking the suggestion for making building the kernel tests only conditional on the kernel lib.

TheCharlatan force-pushed on Feb 12, 2025

DrahtBot added the label CI failed on Feb 12, 2025

DrahtBot commented at 4:41 pm on February 12, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37104019999

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot removed the label CI failed on Feb 12, 2025

DrahtBot added the label Needs rebase on Feb 14, 2025

TheCharlatan force-pushed on Feb 14, 2025

TheCharlatan commented at 3:10 pm on February 14, 2025: contributor

Rebased 5aeaa3f49d10562b8936ef36b1c25a6466dbe03e -> a604321c3e4bd50b52fa28e8567f6b068b2d2fb3 (kernelApi_23 -> kernelApi_24, compare)

Fixed conflict with #31844

DrahtBot removed the label Needs rebase on Feb 14, 2025

in src/kernel/bitcoinkernel.cpp:152 in a604321c3e outdated

149+
150+kernel_Warning cast_kernel_warning(kernel::Warning warning)
151+{
152+    switch (warning) {
153+    case kernel::Warning::UNKNOWN_NEW_RULES_ACTIVATED:
154+        return kernel_Warning::kernel_LARGE_WORK_INVALID_CHAIN;

walterl commented at 0:59 am on February 17, 2025:

Isn’t this supposed to return kernel_UNKNOWN_NEW_RULES_ACTIVATED?

        return kernel_Warning::kernel_UNKNOWN_NEW_RULES_ACTIVATED;

TheCharlatan commented at 3:20 pm on February 17, 2025:

Indeed, I think I mixed this up at some point during a rebase. Thanks!
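
A mix-up like this compiles cleanly because both return values belong to the same enum. One way to guard against it, sketched below under stated assumptions, is to make the mapping `constexpr` and pin each pair with a `static_assert`. The enums here are stand-ins with a `sketch` prefix that only mirror the names mentioned in the comment above; they are not the definitions from the pull request.

```cpp
#include <cstdlib>

// Stand-in for the internal C++ enum (assumed shape).
namespace sketch {
enum class Warning {
    UNKNOWN_NEW_RULES_ACTIVATED,
    LARGE_WORK_INVALID_CHAIN,
};
} // namespace sketch

// Stand-in for the enum exposed through the C header (assumed shape).
typedef enum {
    sketch_UNKNOWN_NEW_RULES_ACTIVATED = 0,
    sketch_LARGE_WORK_INVALID_CHAIN,
} sketch_Warning;

constexpr sketch_Warning cast_sketch_warning(sketch::Warning warning)
{
    switch (warning) {
    case sketch::Warning::UNKNOWN_NEW_RULES_ACTIVATED:
        return sketch_UNKNOWN_NEW_RULES_ACTIVATED;
    case sketch::Warning::LARGE_WORK_INVALID_CHAIN:
        return sketch_LARGE_WORK_INVALID_CHAIN;
    } // no default case, so the compiler can warn about missing cases
    std::abort(); // only reachable with an out-of-range value
}

// Swapping two return values in the switch above would fail the build here.
static_assert(cast_sketch_warning(sketch::Warning::UNKNOWN_NEW_RULES_ACTIVATED) == sketch_UNKNOWN_NEW_RULES_ACTIVATED);
static_assert(cast_sketch_warning(sketch::Warning::LARGE_WORK_INVALID_CHAIN) == sketch_LARGE_WORK_INVALID_CHAIN);
```

The same pairs could equally be checked in a unit test; the compile-time form simply surfaces the mistake earlier.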

TheCharlatan force-pushed on Feb 17, 2025

DrahtBot added the label Needs rebase on Feb 19, 2025

TheCharlatan force-pushed on Feb 19, 2025

TheCharlatan commented at 11:41 am on February 19, 2025: contributor

Rebased a604321c3e4bd50b52fa28e8567f6b068b2d2fb3 -> 251a55f2f0cc3cdfb7fa0015b76772586134cde3 (kernelApi_24 -> kernelApi_25, compare)

DrahtBot removed the label Needs rebase on Feb 19, 2025

DrahtBot added the label Needs rebase on Feb 20, 2025

TheCharlatan force-pushed on Feb 20, 2025

TheCharlatan commented at 5:58 pm on February 20, 2025: contributor

Rebased 251a55f2f0cc3cdfb7fa0015b76772586134cde3 -> c72b2c2883d4c8791267133f326e3f9347d1520b (kernelApi_25 -> kernelApi_26, compare)

DrahtBot removed the label Needs rebase on Feb 20, 2025

TheCharlatan force-pushed on Feb 22, 2025

TheCharlatan commented at 11:54 am on February 22, 2025: contributor

Updated c72b2c2883d4c8791267133f326e3f9347d1520b -> 29513955891e40e78466f2c666dfa13e9c1b2914 (kernelApi_26 -> kernelApi_27, compare)

Cleaned up some dead code missed while removing the kernel_ValidationInterface.

in src/kernel/bitcoinkernel.h:161 in 2951395589 outdated

156+ *
157+ * The processing of validation events is done through an internal task
158+ * runner owned by the context. The task runner drives the execution of events
159+ * triggering validation interface callbacks. Multiple validation interfaces can
160+ * be registered with the context. The kernel will create an event for each of
161+ * the registered validation interfaces through the task runner.

stickies-v commented at 5:20 pm on March 11, 2025:

I think this whole block is from a previous version and should now be removed?

in src/kernel/bitcoinkernel.h:640 in 2951395589 outdated

636+    const kernel_ChainType chain_type);
637+
638+/**
639+ * Destroy the chain parameters.
640+ */
641+BITCOINKERNEL_API void kernel_chain_parameters_destroy(const kernel_ChainParameters* chain_parameters);

stickies-v commented at 11:06 am on March 12, 2025:

I think this shouldn’t be const?

BITCOINKERNEL_API void kernel_chain_parameters_destroy(kernel_ChainParameters* chain_parameters);

TheCharlatan commented at 10:51 am on March 15, 2025:

It shouldn’t be, but annoyingly this is not trivial to change. Since the functions constructing the params in our code only return const types, we have to carry that const into our API. I think the only alternative is copying the params, which I will push shortly.

in src/kernel/bitcoinkernel.cpp:734 in 2951395589 outdated

733+}
734+
735+void kernel_chainstate_manager_options_destroy(kernel_ChainstateManagerOptions* options)
736+{
737+    if (options) {
738+        delete cast_const_chainstate_manager_options(options);

stickies-v commented at 11:09 am on March 12, 2025:

nit: I don’t think it makes a functional difference, but it’s a bit weird using the const cast here (+ for ChainParameters, BlockUndo)?

TheCharlatan commented at 9:46 pm on March 15, 2025:

Done.

in src/kernel/bitcoinkernel.cpp:649 in 2951395589 outdated

624+void kernel_context_options_set_chainparams(kernel_ContextOptions* options_, const kernel_ChainParameters* chain_parameters)
625+{
626+    auto options{cast_context_options(options_)};
627+    auto chain_params{reinterpret_cast<const CChainParams*>(chain_parameters)};
628+    // Copy the chainparams, so the caller can free it again
629+    options->m_chainparams = std::make_unique<const CChainParams>(*chain_params);

stickies-v commented at 11:27 am on March 12, 2025:

This doesn’t seem thread-safe (+ for ~all other setters). Since it seems we can’t use std::atomic for most of these, adding a per-struct lock might be a good alternative?

I can’t think of a sane scenario where someone would want to call the same setter from multiple threads, but… it’s probably better to offer the guarantees anyway?

TheCharlatan commented at 5:35 pm on March 15, 2025:

Just noticed that the same is also true for the logger, but unlike the options, I can actually see multiple threads accessing it. I don’t think we should fix that now though, to me it feels more important to have non-global logging objects first.

stickies-v commented at 4:08 pm on March 17, 2025:

to me it feels more important to have non-global logging objects first.

Agreed. Updating the docs to reflect this might be good though, e.g.:

Not thread-safe. Logging is global. Multiple calls are allowed but

must be synchronized and will override previous settings for all

existing kernel_LoggingConnection instances.

in src/kernel/bitcoinkernel.h:164 in 2951395589 outdated

159+ * triggering validation interface callbacks. Multiple validation interfaces can
160+ * be registered with the context. The kernel will create an event for each of
161+ * the registered validation interfaces through the task runner.
162+ *
163+ * A constructed context can be safely used from multiple threads, but functions
164+ * taking it as a non-cost argument need exclusive access to it.

stickies-v commented at 11:39 am on March 12, 2025:

kernel_context_destroy() and kernel_context_interrupt() are the only places that take a non-const kernel_Context. I think kernel_context_destroy() is no different from all other *_destroy() functions, in that they should never be called twice, regardless of the thread. And it seems to me that kernel_context_interrupt() is actually thread-safe. So, I think “but functions taking…” can be removed?

(also nit: s/non-cost/non-const/)

stickies-v commented at 1:35 pm on March 13, 2025: contributor

I’ve been looking at thread-safety, and left some comments on it (as well as some unrelated ones).

I think the API is pretty close to being thread-safe. Would be nice if we can make some guarantees on it and document it as such?

DrahtBot added the label Needs rebase on Mar 14, 2025

in src/kernel/bitcoinkernel.cpp:89 in 2951395589 outdated

86+    }
87+    case kernel_LogLevel::kernel_LOG_TRACE: {
88+        return "trace";
89+    }
90+    } // no default case, so the compiler can warn about missing cases
91+    assert(false);

stickies-v commented at 11:31 am on March 14, 2025:

This (+ in log_category_to_string()) leads to runtime assertion errors for interpreted languages. It also means that add_log_level_category(), enable_log_category() and disable_log_category() are basically void instead of bool because they can only return true (or crash).

E.g. in python:

>>> pbk.add_log_level_category(99, 20)
Assertion failed: (false), function log_level_to_string, file bitcoinkernel.cpp, line 87.
zsh: abort      python

TheCharlatan commented at 12:31 pm on March 15, 2025:

This (+ in log_category_to_string()) leads to runtime assertion errors for interpreted languages

I think the problem here is that the C enums are weakly typed, and if you use them in weakly typed languages you run into these problems. I think the function signature should already give enough of a hint on what the range of allowed values is. That said, I’ll change this to instead return std::nullopt and then return false from there.

EDIT: Changed my mind, rather added a function to the logger so we can always use the enums and don’t need stringy “types”.
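
For reference, the first option mentioned above (returning std::nullopt and reporting false) could look roughly like the following sketch. It is not what was finally adopted, as the EDIT notes. The `sketch_*` names, the `int` parameter standing in for an unchecked value from a binding, and the "info" and "debug" strings are assumptions; only "trace" appears in the reviewed excerpt.

```cpp
#include <optional>
#include <string_view>

// Stand-in for the C enum under review (assumed to match the excerpt's order).
typedef enum {
    sketch_LOG_INFO = 0,
    sketch_LOG_DEBUG,
    sketch_LOG_TRACE,
} sketch_LogLevel;

// Takes the raw integer a weakly typed caller might pass and returns
// std::nullopt for anything outside the enum instead of asserting.
std::optional<std::string_view> sketch_log_level_to_string(int raw_level)
{
    switch (raw_level) {
    case sketch_LOG_INFO:
        return "info";
    case sketch_LOG_DEBUG:
        return "debug";
    case sketch_LOG_TRACE:
        return "trace";
    default:
        return std::nullopt;
    }
}

// Reports failure to the caller; current_level is left untouched on error.
bool sketch_set_log_level(int raw_level, std::string_view& current_level)
{
    const auto name{sketch_log_level_to_string(raw_level)};
    if (!name) return false;
    current_level = *name;
    return true;
}
```

With this shape, a call such as the Python example with level 99 would return false rather than abort the interpreter, at the cost of a `default` case that suppresses the compiler's missing-case warning.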

in src/kernel/bitcoinkernel.cpp:521 in 2951395589 outdated

518+    const auto level{log_level_to_string(level_)};
519+    if (category == kernel_LogCategory::kernel_LOG_ALL) {
520+        return LogInstance().SetLogLevel(level);
521+    }
522+
523+    return LogInstance().SetCategoryLogLevel(log_category_to_string(category), level);

stickies-v commented at 11:43 am on March 14, 2025:

This back-and-forth string conversion feels suboptimal. Perhaps an alternative approach would be to keep the integer values between kernel_LogCategory the same as BCLog::LogFlags and just define a kernel-specific bitfield that defines which BCLog flags are valid? I don’t think the kernel_LogCategory enum values being non-continuous is a problem, since this might happen in the future anyway e.g. if certain components are moved out of kernel scope?

TheCharlatan commented at 9:47 pm on March 15, 2025:

I’m not quite sure what you meant with “Perhaps an alternative approach would be to keep the integer values between kernel_LogCategory the same as BCLog::LogFlags and just define a kernel-specific bitfield that defines which BCLog flags are valid”. Does the current approach work for you?

stickies-v commented at 12:05 pm on March 17, 2025:

I meant have kernel_LogCategory be a subset of BCLog::LogFlags, instead of having to remap them. But I didn’t realize that that would either require including logging.h in bitcoinkernel.h (impossible), or manually ensuring the enums are synced (bad).

My main gripe was the string re-conversion, which is now gone - so yes, current approach resolves my concern, thanks!

in src/kernel/bitcoinkernel.h:351 in 2951395589 outdated

345+ */
346+typedef enum {
347+    kernel_LOG_INFO = 0,
348+    kernel_LOG_DEBUG,
349+    kernel_LOG_TRACE,
350+} kernel_LogLevel;

stickies-v commented at 4:27 pm on March 14, 2025:

Since log levels are ordered, would it be prudent to reserve space for intermediate levels? E.g. if we decide we do want to add WARNING/ERROR later, we’d have to change existing log levels.

TheCharlatan commented at 12:55 pm on March 15, 2025:

I don’t think we need to rely on the order inside the enumeration here. That said, it could also just mirror the values in the BCLog::Level. I left out Warning and Error because you can’t really control those right now.

stickies-v commented at 7:23 pm on March 20, 2025:

I don’t think we need to rely on the order inside the enumeration here

I don’t think it’s required, but it is slightly convenient when they are? E.g. when implementing py-bitcoinkernel’s logging, being able to rely on the order of kernel_LogLevel helps the implementation a bit (and it is also how these level enums are usually implemented in most logging libraries, I think). As one example, I think using the same values as the python logging library could be sensible: https://docs.python.org/3/library/logging.html#logging-levels (with e.g. 5 for TRACE)

That said, it could also just mirror the values in the BCLog::Level

There is benefit in exposing them all, yes. It’ll be essential if/when we update the logging callback to expose a kernel_Log struct instead of a string (as per my comment here), and a nice-to-have even for string-parsing as it helps inform which categories could appear in the log output (even if the enums don’t encode their string representation).

None of this is crucial, just sharing my thoughts.

yancyribbens commented at 6:16 pm on March 14, 2025: contributor

rust-bitcoin maintains a file of constants which are meant to mirror values in core. We’ve discussed trying to find an automated solution to keep these consts synchronized since right now, these are manually maintained. This is a point of annoyance since these values are constantly becoming stale (pun intended). Would it be possible to use these C headers to automatically build a rust crate of constants from a C header API? Or would that be overkill..

yancyribbens commented at 6:18 pm on March 14, 2025: contributor

Furthermore, besides keeping values synchronized, an automated solution which would generate all available consts would be ideal.

stickies-v commented at 6:41 pm on March 14, 2025: contributor

A few comments regarding logging. It’s a bit awkward to have LoggingConnection instances, but only global setters to update their granularity, so #30342 looks like a welcome improvement.

Besides that, hooking up a downstream log viewer in py-bitcoinkernel was fairly straightforward. Having a struct kernel_Log callback instead of having to parse a string for various fields (time, threadname, level, …) would be nice, and I think not even a huge lift (can be done without upstream changes, even if that would be more efficient).

TheCharlatan commented at 9:42 pm on March 14, 2025: contributor

Re #30595 (comment) and #30595 (comment)

Thank you for your suggestions!

Would it be possible to use these C headers to automatically build a rust crate of constants from a C header API? Or would that be overkill..

Looking at the constants in the linked file they all seem to be policy-related, which is out of scope for now. I don’t think we’ll add a header for that in the near future. Generally speaking I am open towards exposing details of Bitcoin Core’s policy to applications that use it already anyway. For example it might be useful to expose some parts of policy for protocols using pre-signed transactions.

Furthermore, besides keeping values synchronized, an automated solution which would generate all available consts would be ideal.

We’ve recently discussed auto-generating parts of the header and library code instead of writing it by hand as done here. I think exposing some of the consensus-related constants in that manner might be a good way forward eventually.

TheCharlatan force-pushed on Mar 14, 2025

TheCharlatan commented at 10:33 pm on March 14, 2025: contributor

Rebased 29513955891e40e78466f2c666dfa13e9c1b2914 -> 21f6a3de77a9eedcca5d47f694d540d42b3ddbcc (kernelApi_27 -> kernelApi_28, compare)

Fixed conflict with #31649

DrahtBot removed the label Needs rebase on Mar 14, 2025

yancyribbens commented at 7:00 pm on March 15, 2025: contributor

We’ve recently discussed auto-generating parts of the header and library code instead of writing it by hand as done here. I think exposing some of the consensus-related constants in that manner might be a good way forward eventually.

@TheCharlatan thanks for the reply. How would auto-generating parts work? That does sound potentially promising as a way to build rust crates as well if we can use the same input data for auto-generating.

TheCharlatan force-pushed on Mar 15, 2025

TheCharlatan commented at 9:46 pm on March 15, 2025: contributor

Updated 21f6a3de77a9eedcca5d47f694d540d42b3ddbcc -> 5991a69ee0000de551955846d7d21733c326a748 (kernelApi_28 -> kernelApi_29, compare)

Addressed @stickies-v’s comment, removed outdated comment about the validation interface in the kernel_Context.
Addressed @stickies-v’s comment, removed const qualifier from kernel_ChainParameters.
Addressed @stickies-v’s comment, removed unneeded const cast.
Addressed @stickies-v’s comment, removed unneeded mention of thread safety for the kernel_Context.
Addressed @stickies-v’s comment, introduce a helper function in the Logger to allow us to get rid of the string conversion functions.
Addressed @stickies-v’s comment, use the same order for the kernel log levels as done in the internal enum.
Addressed @stickies-v’s comment, make the logging setter functions return void instead of bool. With the new methods, there is no error case to report anymore.

TheCharlatan force-pushed on Mar 17, 2025

TheCharlatan commented at 9:46 pm on March 17, 2025: contributor

Updated 5991a69ee0000de551955846d7d21733c326a748 -> 2dc27e2860b97c2bffa5f18706917b21858e5594 (kernelApi_29 -> kernelApi_30, compare)

Addressed @stickies-v’s comment, add notice in the documentation about logging settings being thread unsafe and global.
Added a mutex to the options objects to make setting them thread safe.
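
The per-struct lock mentioned in the last item could be sketched as follows. The types and member names (`SketchParams`, `SketchContextOptions`, `SetParams`) are hypothetical and only illustrate the idea of guarding each setter with a mutex owned by the options object; the actual implementation in the pull request may differ.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the chain parameters copied into the options.
struct SketchParams {
    std::string name;
};

// Options object carrying its own lock, so concurrent setter calls from
// different threads cannot race on the stored pointer.
struct SketchContextOptions {
    mutable std::mutex m_mutex;
    std::unique_ptr<const SketchParams> m_params; // guarded by m_mutex

    void SetParams(const SketchParams& params)
    {
        // Copy outside the lock so the caller can free its object afterwards
        // and the critical section stays short.
        auto copy{std::make_unique<const SketchParams>(params)};
        std::lock_guard<std::mutex> lock{m_mutex};
        m_params = std::move(copy);
    }

    std::string ParamsName() const
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        return m_params ? m_params->name : std::string{};
    }
};
```

Two threads calling `SetParams` at the same time then leave the object holding one of the two values, with the last writer winning, rather than triggering a data race.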

in src/kernel/bitcoinkernel.h:574 in 2dc27e2860 outdated

569+ * @brief Set the log level of the global internal logger. This does not
570+ * enable the selected categories. Use `kernel_enable_log_category` to start
571+ * logging from a specific, or all categories. This function is not thread
572+ * safe. Mutiple calls from different threads are allowed but must be
573+ * synchronized. This changes a global setting and will override settings for
574+ * all existing `kernelLoggingConnection instances.

stickies-v commented at 11:47 am on March 18, 2025:

typo nit (+ in 2 other log functions)

 * all existing `kernel_LoggingConnection instances.

ryanofsky commented at 12:31 pm on March 19, 2025: contributor

I had an idea I wanted to suggest here. What if instead of adding C bindings to the bitcoin/bitcoin git repository we took inspiration from @darosior’s thoughts about project scope and developed the C, rust, and python bindings in a separate bitcoin-core/bindings repository, or even separate bitcoin-core/bindings-{c,rust,python} repositories?

Technically I think there are two ways we could implement this:

Add cmake install rules to bitcoin/bitcoin to install kernel and util headers to $prefix/include/, install the kernel library to $prefix/lib/, and install a cmake config package to $prefix/lib/cmake/Libbitcoinkernel/LibbitcoinkernelConfig.cmake that the bitcoin-core/bindings repo can import with find_package config mode. This approach was implemented by @hebasto for libmultiprocess in https://github.com/bitcoin-core/libmultiprocess/pull/96 and I’ve been impressed by how easily it lets different cmake projects share code while still providing a clear boundary between them.
Avoid needing to write cmake install rules and just include the bitcoin/bitcoin repository as a git subtree in the bitcoin-core/bindings repository that can be built with add_subdirectory in the same cmake project.

Having a separate repository for C bindings could have a number of advantages over merging this PR to bitcoin/bitcoin:

It could allow faster development of C/rust/python bindings since they could take place outside the main repository and potentially have faster release cycles.
It could reduce burden on other core bitcoin developers since they would not have to worry about maintaining the C bindings implementation and keeping the C and C++ interfaces in sync.
It could enable other approaches to building bindings and using kernel code in external projects. It is great if developers want to build on the C++>C>Python and C++>C>Rust approaches we are providing. But I also think it would be great if developers could try other approaches like going directly from C++ to Python with pybind or nanobind, or directly from C++ to Rust with cxx, autocxx, or zngur, or going directly from C++ to any number of other languages with SWIG. These approaches may not be preferred by us but they are proven and established (particularly pybind and SWIG, which has been around for decades) and can be much more convenient than dropping down to C.
It could open a way to expose other bitcoin core code besides the kernel code externally. For example, it’s possible to imagine exposing python or javascript bindings for bitcoin wallet code and having tools and UIs written in different languages that are able to use bitcoin wallet files.

There would be some disadvantages to having a separate repository for C bindings:

It would not be possible to use C++ wrappers that have been written around the C bindings to write internal tools in the bitcoin/bitcoin repository.
We would need to clearly communicate to outside developers that bitcoin/bitcoin C++ interfaces are not stable, and that if projects want a more stable interface they need to use the C interface or the C++ wrappers around the C interface.

This idea should not be incompatible with the current PR and I’d be happy to see this PR being merged whenever it is ready. But it could be something we think about going forward.

TheCharlatan commented at 10:40 pm on March 19, 2025: contributor

It could enable other approaches to building bindings and using kernel code in external projects.

I have done some experimentation with using c++ bindings directly.

SWIG

I tried creating python bindings through swig without looping through the C bindings and I could not get it to work within a reasonable amount of time. While likely a skill issue, it does seem to be struggling with some of our c++20 features and heavily templated code.

pybind

I was more successful here; this took me about a day to set up: https://github.com/TheCharlatan/bitcoinkernel-pybind. For now it just exposes a chainparams and one of its methods, but it does use our c++ methods directly. While an llm helped with the boilerplate, I did spend most of the time figuring out how to pass in some of our required pre-processor definitions and compiler options. Comparing it to my experience with the C headers the setup seems a bit more involved, but also not too horrible in comparison.

cxx

I tried this quite some time ago now (iirc one and a half years ago). It seemed to work as well, but was a bit rough to use, so I ended up wrapping it in more rust code. Having to wrap it again would take away some of the utility over using C bindings, so not too sure about this approach. I might revisit cxx soon though.

I think the key differences between using these frameworks on our existing code and curating our own API are safety, documentation, consistency, and discoverability. Developers wishing to create language bindings can take the C header here and understand how to use it in reasonable time. I don’t think this is true for our current c++ code. Whether this is worth the additional maintenance is another question though. As you say:

It could reduce burden on other core bitcoin developers since they would not have to worry about maintaining the C bindings implementation and keeping the C and C++ interfaces in sync.

Looking at the footprint introduced here, I am not too worried about creating significantly more maintenance burden. On the contrary, just like the current bitcoin-chainstate binary acted as a north star for the kernel library development leading up to this pull request, the code in bitcoinkernel.cpp can act as a way to directly inform us on useful future changes. This could include things like logging, locking and mutexes, removing more “Bitcoin Core”-isms, adding hooks for db and file readers, and validating against user-provided UTXOs. It might be a bit more cognitive load when introducing new code to also think about potential external usage, but that might just mean the code changes are a bit better thought through on an architectural level. However, future feature additions to the API and the bindings would obviously mean more open PRs.

It could open a way to expose other bitcoin core code besides the kernel code externally. For example, it’s possible to imagine exposing python or javascript bindings for bitcoin wallet code and having tools and UIs written in different languages that are able to use bitcoin wallet files.

I’m not sure how this would be related with moving the API introduced here into a separate repository, but given that we already have a hierarchy of internal libraries, isn’t that already possible?

This idea should not be incompatible with the current PR and I’d be happy to see this PR being merged whenever it is ready. But it could be something we think about going forward.

The work leading up to this pull request has focused on improving the existing c++ code. If there comes a time where the internal code is consistent and documented enough, I think splitting out foreign bindings could indeed be a final step. Next to @darosior, I would be keen on hearing other people’s thoughts, like @laanwj and @theuni, that have advocated for the C bindings as part of bitcoin/bitcoin in the past. Some people have expressed their long-term hope that this library and header could become one of the few things shipped directly from the bitcoin/bitcoin repository, while most of the other existing components are split out into feature repositories. I think for this PR to move forward it would have to get a bit more contributor buy-in anyway. The projects built on top of this PR already gained some users, but I am not sure if that is enough to get it merged.

ryanofsky commented at 5:09 pm on March 20, 2025: contributor

re: #30595 (comment)

Thanks, if you don’t think a separate repository for C bindings would be good, or worth the tradeoffs, that’s fine. I was just excited about the idea because I realized with cmake config modules it would be easy to implement technically, and it seemed like a natural starting point to experiment with splitting the project up into different repositories while being able to share utilities and infrastructure.

Just to explain my perspective:

My main motivation for suggesting this was that I thought it might help with development of the C bindings. This was speculation on my part because I can see they are about 4K lines now, but I don’t have an idea of how big they will be when they are complete. This was also speculation because I can see you and others hashing out issues and making progress here, but don’t know if there could be more progress if there were a github repository people could push to and open PRs and issues against, so work could happen in parallel and developments might be easier to track. I also didn’t know if bindings might benefit from having more frequent releases as they are developed, or if they could benefit from being built against bitcoin core stable releases instead of embedding custom or bleeding edge versions of bitcoin core code. If using a separate repository wouldn’t help with these things or solve real problems, then I wouldn’t favor having one.
I may also have a different view of this PR because I look at C and C++ as being different languages with some syntax in common but pretty different features and idioms. To me C++ has more in common with Python than it does with C in how code is written and what features are used. So the choice to expose C++ code with a C API does not seem very natural to me. I understand the value of treating the C++ code as unstable and having a translation layer to expose it to other projects and languages, but I’d think different approaches to translation should be possible and we wouldn’t need to choose a single one and maintain it in the main repository, even if it’d be fine to do that.

I have done some experimentation with using c++ bindings directly.

Thanks for the details and links and this is interesting to know about. I have a lot of experience with swig but mostly just think any of these approaches could work and be useful, and it should be fine to choose what seems convenient and expose a simple and stable API.

Looking at the footprint introduced here, I am not too worried about creating significantly more maintenance burden. On the contrary, just like the current bitcoin-chainstate binary acted as a north star for the kernel library development leading up to this pull request, the code in bitcoinkernel.cpp can act as a way to directly inform us on useful future changes. This could include things like logging, locking and mutexes, removing more “Bitcoin Core”-isms, adding hooks for db and file readers, and validating against user-provided UTXOs. It might be a bit more cognitive load when introducing new code to also think about potential external usage, but that might just mean the code changes are a bit better thought through on an architectural level. However, future feature additions to the API and the bindings would obviously mean more open PRs.

I can see the analogy, but not really how it applies. The chainstate program was useful because it was small and we could see it doing clumsy things because the C++ API was clumsy, so we would improve the C++ API, and the chainstate program would get simpler. Maybe I need to think about it more, but I don’t see how a similar process could play out between the C++ API and the C bindings, or how there could be other benefits to the C++ code from just maintaining the C bindings. I could see there being benefits from developing the bindings, but would expect those to be the same regardless of repository layout.

It could open a way to expose other bitcoin core code besides the kernel code externally. For example, it’s possible to imagine exposing python or javascript bindings for bitcoin wallet code and having tools and UIs written in different languages that are able to use bitcoin wallet files.

I’m not sure how this would be related with moving the API introduced here into a separate repository, but given that we already have a hierarchy of internal libraries, isn’t that already possible?

Yes, I was just thinking a bitcoin-core/bindings repository would be a natural place for an API like that to live, be discoverable, have documentation, and have issues reported against, rather than the main repository.

I think for this PR to move forward it would have to get a bit more contributor buy-in anyway. The projects built on top of this PR already gained some users, but I am not sure if that is enough to get it merged.

It seems fine to me to merge this PR and maintain this code as a separate library in the main repository. My reason for bringing this up was to question whether we actually needed to do that, and if there might be benefits to maintaining it in a separate repository. I don’t have a great sense of the tradeoffs and both approaches do seem reasonable to me.

in src/kernel/bitcoinkernel.cpp:256 in 2dc27e2860 outdated

257+            if (options->m_validation_interface) {
258+                m_validation_interface = std::make_unique<KernelValidationInterface>(*options->m_validation_interface);
259+                m_signals->RegisterValidationInterface(m_validation_interface.get());
260+            }
261+
262+        }

stickies-v commented at 6:57 pm on March 20, 2025:

The latest force-push broke this logic by leaving m_chainparams and m_notifications uninitialized if options is non-nullptr, but the respective options members are nullptr.

Suggested fix:

diff --git a/src/kernel/bitcoinkernel.cpp b/src/kernel/bitcoinkernel.cpp
index 0cb2d69cec..1e6c582357 100644
--- a/src/kernel/bitcoinkernel.cpp
+++ b/src/kernel/bitcoinkernel.cpp
@@ -240,11 +240,7 @@ public:
           m_interrupt{std::make_unique<util::SignalInterrupt>()},
           m_signals{std::make_unique<ValidationSignals>(std::make_unique<ImmediateTaskRunner>())}
     {
-        if (!options) {
-            m_notifications = std::make_unique<KernelNotifications>(kernel_NotificationInterfaceCallbacks{
-                nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr});
-            m_chainparams = CChainParams::Main();
-        } else {
+        if (options) {
             LOCK(options->m_mutex);
             if (options->m_chainparams) {
                 m_chainparams = std::make_unique<const CChainParams>(*options->m_chainparams);
@@ -256,7 +252,13 @@ public:
                 m_validation_interface = std::make_unique<KernelValidationInterface>(*options->m_validation_interface);
                 m_signals->RegisterValidationInterface(m_validation_interface.get());
             }
-
+        }
+        if (!m_notifications) {
+            m_notifications = std::make_unique<KernelNotifications>(kernel_NotificationInterfaceCallbacks{
+                nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr});
+        }
+        if (!m_chainparams) {
+            m_chainparams = CChainParams::Main();
         }
 
         if (!kernel::SanityChecks(*m_context)) {

This did not fail/break test_kernel.cpp because the test explicitly sets the options in create_context():

    options.SetChainParams(params);
    options.SetNotifications(notifications);

py-bitcoinkernel does not automatically do that, which is causing segfaults for kernel_chainstate_manager_create in the test suite there.
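The per-member fallback in the suggested fix can be illustrated in isolation. The following is only a minimal sketch with hypothetical stand-in types (`ChainParams`, `Notifications`, `ContextOptions` and `Context` are illustrative names, not the real kernel classes): whatever the caller provided is copied first, and each member that is still unset afterwards gets its default, so a non-null options object with null members no longer leaves anything uninitialized.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for the real kernel types; names are illustrative only.
struct ChainParams { std::string name; };
struct Notifications { bool is_default; };

struct ContextOptions {
    std::unique_ptr<ChainParams> chainparams;     // may be null
    std::unique_ptr<Notifications> notifications; // may be null
};

class Context
{
public:
    explicit Context(const ContextOptions* options)
    {
        // Copy only what the caller actually provided.
        if (options) {
            if (options->chainparams) {
                m_chainparams = std::make_unique<ChainParams>(*options->chainparams);
            }
            if (options->notifications) {
                m_notifications = std::make_unique<Notifications>(*options->notifications);
            }
        }
        // Fill in defaults per member, whether or not an options object was passed.
        if (!m_chainparams) {
            m_chainparams = std::make_unique<ChainParams>(ChainParams{"main"});
        }
        if (!m_notifications) {
            m_notifications = std::make_unique<Notifications>(Notifications{true});
        }
    }

    const ChainParams& chainparams() const { return *m_chainparams; }
    const Notifications& notifications() const { return *m_notifications; }

private:
    std::unique_ptr<ChainParams> m_chainparams;
    std::unique_ptr<Notifications> m_notifications;
};
```

With this shape, constructing a `Context` from a null pointer, from empty options, or from partially filled options always yields dereferenceable members.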

TheCharlatan commented at 9:20 pm on March 20, 2025:

Thanks! I also added a regression test. Sorry for not catching this earlier!

in src/kernel/bitcoinkernel.h:589 in 2dc27e2860 outdated

576+ * @param[in] category If kernel_LOG_ALL is chosen, all messages at the specified level
577+ *                     will be logged. Otherwise only messages from the specified category
578+ *                     will be logged at the specified level and above.
579+ * @param[in] level    Log level at which the log category is set.
580+ */
581+BITCOINKERNEL_API void kernel_add_log_level_category(const kernel_LogCategory category, kernel_LogLevel level);

stickies-v commented at 7:10 pm on March 20, 2025:

What is the rationale behind requiring callers to first add, and then {enable,disable} the category? An alternative would be a single kernel_set_log_level_category, which immediately “enables” (I think it’s a strange term anyway) the category at the given level. “Disabling” would be achieved by calling kernel_set_log_level_category again with a higher (i.e. less granular) level, again taking effect immediately.

I can’t think of any use cases that require separating this in 3 functions? I think it would simultaneously be more intuitive and ergonomic, and probably also less code?
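As a rough illustration of the single-function idea, here is a minimal sketch. All names (`LogCategory`, `LogLevel`, `LogFilter`, `SetLevel`, `WillLog`) are hypothetical and not part of the header under review; the sketch only assumes that levels are ordered from most to least granular and that a category without its own setting falls back to a global threshold.

```cpp
#include <map>

// Hypothetical categories and levels; lower level values are more granular.
enum class LogCategory { All, Validation, Mempool };
enum class LogLevel { Trace = 0, Debug = 1, Info = 2 };

class LogFilter
{
public:
    // Single entry point: sets the threshold for a category, effective immediately.
    // Passing LogCategory::All resets every category to the given threshold.
    void SetLevel(LogCategory category, LogLevel level)
    {
        if (category == LogCategory::All) {
            m_default = level;
            m_levels.clear();
        } else {
            m_levels[category] = level;
        }
    }

    // A message is logged if its level is at or above the category's threshold.
    bool WillLog(LogCategory category, LogLevel level) const
    {
        const auto it = m_levels.find(category);
        const LogLevel threshold = it != m_levels.end() ? it->second : m_default;
        return level >= threshold;
    }

private:
    LogLevel m_default{LogLevel::Info};
    std::map<LogCategory, LogLevel> m_levels;
};
```

In this sketch "disabling" debug output for a category is just another `SetLevel` call with a higher level, so no separate add, enable and disable steps are needed.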

TheCharlatan commented at 9:25 pm on March 20, 2025:

This should just mirror the internal code at the moment, but I agree that it is not really useful to split this up. Will see if I can consolidate this.

TheCharlatan force-pushed on Mar 20, 2025

TheCharlatan commented at 9:21 pm on March 20, 2025: contributor

Updated 2dc27e2860b97c2bffa5f18706917b21858e5594 -> 9fc6accf89ed001f70e107a8e9936f6dc3a35f41 (kernelApi_30 -> kernelApi_31, compare)

Addressed @stickies-v’s comment, fixed naming in docstring for kernel_LoggingConnection.
Addressed @stickies-v’s comment, fixed constructor for Context and added a test to catch the regression.

TheCharlatan force-pushed on Mar 20, 2025

DrahtBot commented at 10:14 pm on March 20, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/39141725876

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot added the label CI failed on Mar 20, 2025

TheCharlatan commented at 10:15 pm on March 20, 2025: contributor

Rebased 9fc6accf89ed001f70e107a8e9936f6dc3a35f41 -> 29f05b91cf8a479e403b0322afeb5ff1133da221 (kernelApi_31 -> kernelApi_32, compare)

Fixed silent merge conflict with #31519

DrahtBot removed the label CI failed on Mar 21, 2025

ajtowns commented at 6:00 am on March 21, 2025: contributor

I had an idea I wanted to suggest here. What if instead of adding C bindings to the bitcoin/bitcoin git repository we took inspiration from @darosior’s thoughts about project scope and developed the C, rust, and python bindings in a separate bitcoin-core/bindings repository, or even separate bitcoin-core/bindings-{c,rust,python} repositories?

I think this is an interesting idea, and may be worth exploring independently of this PR. Couple of comments:

* It would not be possible to use C++ wrappers that have been written around the C bindings to write internal tools in the bitcoin/bitcoin repository.

I think for things like that we’d just include the wrapper in core because it’s not an external maintenance burden but a natural part of core, and we’d already judged that that was less maintenance burden in total than writing the code in C++ in the first place.

* We would need to clearly communicate to outside developers that bitcoin/bitcoin C++ interfaces are not stable, and that if projects want a more stable interface they need to use the C interface or the C++ wrappers around the C interface.

I think you could just treat that as part of API versioning – bitcoin core updates to version 30.0, tweaking a bunch of internal structures that result in C/python/rust API changes, so that results in a semver bump to version 30 for the C/python/rust API. But if the API is an independent product, that doesn’t have to happen on any particular schedule – you can keep using API version 28 even if your node is running v31 (modulo security update policies perhaps).

I find having the test framework code handy for doing python bitcoin things and use jamesob’s verystable for that (eg powcoins, bllsh). It’s just an externally maintained copy of relevant bits of bitcoin core code, that is manually synced to new upstream releases every now and then. That model seems workable to me.

I think maybe exposing some of our complicated internal logic for direct manipulation/experimentation in python might be helpful for debugging – the new txgraph stuff in particular, but perhaps also the fee estimation code.

I think separate repos for python/rust APIs is probably more compelling than for C, since C APIs are naturally fairly basic and are also pretty well supported by C++ without introducing any extra dependencies. With good python/rust APIs available externally, I could imagine the C API not being very useful, but wouldn’t want to bet on it either way.

TheCharlatan commented at 6:38 am on March 21, 2025: contributor

Maybe I need to think about it more, but I don’t see how a similar process could play out between the C++ API and the C bindings, or how there could be other benefits to the C++ code from just maintaining the C bindings. I could see there being benefits from developing the bindings, but would expect those to be the same regardless of repository layout.

I think I was conflating the introduction of some consolidating code here, like a separate context, methods that map to multiple calls to our validation code, logging initialization, etc., and the code really only required to do the C translation. The former could probably be viewed as a safe “interface” of sorts into some of the existing code in the kernel library and could be useful for our internal code, while the latter is more useful for external callers. A long term goal if this gets merged would be evolving the internal code to absorb most of the code in this “interface”. Maybe the approach taken here is not ideal for actually surfacing that, but it is also not clear to me what a better approach might be, since such an “interface” is best designed based on the requirements of an actual user (i.e. the C API in the case of this PR).

TheCharlatan commented at 9:50 pm on March 25, 2025: contributor

@stickies-v found an interesting example of a library, SFML, that ships C++ headers from its “core” repository, but also hosts C bindings in another repository in their organisation. They have a list of all known bindings to their library on their website: https://www.sfml-dev.org/download/bindings. I did a tour of some of them and most seem to be using the C bindings, but interestingly there seems to be a trend among them to move from the C headers to the C++ headers. For example the first python bindings used the C header, while the current python bindings use a mix. The ocaml bindings even mention in their project readme how they migrated over time to the C++ headers. What I also find interesting is that SFML has a mix of internal and external headers, that live in the src and include directory respectively, but use both of them in their core codebase. EDIT: I looked through their git history a bit and saw that they originally had their bindings in the same codebase.

TheCharlatan commented at 11:14 am on March 26, 2025: contributor

Pushed 29f05b91cf8a479e403b0322afeb5ff1133da221 -> 97d1edcdafe074e910ed647dcb6beedd24744b17 (kernelApi_32 -> kernelApi_33, compare)

Added a commit introducing a small purpose section in the header documentation. It briefly mentions the features, that the header is unversioned, might just break with future updates, and won’t be released yet.

ryanofsky referenced this in commit 74c23f80ab on Mar 27, 2025

TheCharlatan force-pushed on Mar 28, 2025

TheCharlatan commented at 9:40 am on March 28, 2025: contributor

Rebased 97d1edcdafe074e910ed647dcb6beedd24744b17 -> a0d24ff9a9337770dae668d7b0ea0a6e62ed086a (kernelApi_33 -> kernelApi_34, compare)

Integrated the new bitcoin-chainstate functional tests from #32145 to demonstrate that it is still working.

DrahtBot added the label Needs rebase on Mar 31, 2025

TheCharlatan commented at 2:15 pm on April 3, 2025: contributor

I pushed a branch that re-writes this PR by inverting the relationship between the C and the C++ API: https://github.com/TheCharlatan/bitcoin/tree/kernelApi_Cpp.

Its new C++ header is here: https://github.com/TheCharlatan/bitcoin/blob/kernelApi_Cpp/src/kernel/bitcoinkernel.hpp

I think this could be a viable alternative to move this PR forward beyond the C/C++ API discussion. The C API could now easily live outside this repository.

Circling back to what I said in this comment #30595 (comment):

“I think I was conflating the introduction of some consolidating code here, like a separate context, methods that map to multiple calls to our validation code, logging initialization, etc., and the code really only required to do the C translation.”

The code required on the C side now hardly contains any business logic and only does very mechanical C++ to C translations: https://github.com/TheCharlatan/bitcoin/blob/kernelApi_Cpp/src/kernel/bitcoinkernel_c.cpp . @ryanofsky since you were active in these discussions here, would you support such an approach over the current PR?
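To make "very mechanical C++ to C translations" concrete, here is a minimal sketch of the general opaque-handle pattern. It is not taken from the linked branch: the `demo::Block` class and the `demo_block_*` functions are hypothetical names invented for illustration. The C side only declares an incomplete struct type, forwards each call to the C++ object, and keeps exceptions from crossing the C boundary.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical C++ side, standing in for a class from a C++ API header.
namespace demo {
class Block
{
public:
    explicit Block(std::vector<std::uint8_t> raw) : m_raw(std::move(raw)) {}
    std::size_t Size() const { return m_raw.size(); }

private:
    std::vector<std::uint8_t> m_raw;
};
} // namespace demo

// C side: an opaque handle plus thin forwarding functions, no business logic.
extern "C" {
typedef struct demo_Block demo_Block; // intentionally never defined

demo_Block* demo_block_create(const unsigned char* data, std::size_t len)
{
    if (!data && len != 0) return nullptr;
    try {
        auto block = std::make_unique<demo::Block>(std::vector<std::uint8_t>(data, data + len));
        return reinterpret_cast<demo_Block*>(block.release());
    } catch (...) {
        return nullptr; // never let a C++ exception escape into C callers
    }
}

std::size_t demo_block_size(const demo_Block* block)
{
    return reinterpret_cast<const demo::Block*>(block)->Size();
}

void demo_block_destroy(demo_Block* block)
{
    delete reinterpret_cast<demo::Block*>(block);
}
} // extern "C"
```

A caller written in C would pair every `demo_block_create` with a `demo_block_destroy`, which is the ownership convention bindings in other languages typically wrap in a destructor or drop handler.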

ryanofsky commented at 3:58 pm on April 3, 2025: contributor

@ryanofsky since you were active in these discussions here, would you support such an approach over the current PR?

I’d support it and I also support the current PR. It seems like if you are an external kernel user, the approach in #30595 (comment) shouldn’t change things very much for you, since either way you are provided with an alternate set of C++ classes and flags that mirror the internal ones. I feel like if I personally were writing python or rust bindings, I’d want to just use the original classes and flags so I wouldn’t need to go through an extra level of code to expose new functionality. But I understand reasons for wanting this code, and if this approach helps organize it better or make it easier to maintain that seems great.

The C API could now easily live outside this repository.

Am curious about this. Since https://github.com/TheCharlatan/bitcoin/blob/kernelApi_Cpp/src/kernel/bitcoinkernel.hpp includes kernel/bitcoinkernel.h it seems like the C++ API at least partially depends on the C API, but maybe it could be broken up. I do think even in the current master branch, we could add a simple install rule to install headers (as described in #30595 (comment)) and a C API could also be built outside that way. Maybe this new approach provides some more appealing alternatives though.

TheCharlatan commented at 4:36 pm on April 3, 2025: contributor

It seems like the C++ API at least partially depends on the C API, but maybe it could be broken up. I do think even in the current master branch, we could add a simple install rule to install headers (as described in #30595 (comment)) and a C API could also be built outside that way. Maybe this new approach provides some more appealing alternatives though.

Having separate installable headers for these basic enum and result types is very much what I had in mind. Maybe something like the kernel/types.h from one of your PRs, or a bunch of similarly scoped headers could be the way forward. I think it would be trivial to do that for all the extra types that currently are included through the C header on that branch. I did not do it, because I am a bit apprehensive when it comes to shuffling around internal declarations into smaller headers that can be used externally before we have agreed on an approach here. I think the scope here is already quite big, but would it help you evaluate if I would add these additional installable headers?

EDIT: Did the header split here and added some of our internal headers to the install list: https://github.com/TheCharlatan/bitcoin/tree/kernelApi_Cpp_Internal_Headers

I think wholesale exposing all kernel library headers vs. having a dedicated external API and a few “blessed” / “installable” headers is a different discussion though. If the conversation moves to that over the choice of language in the exposed headers, I’d be happy to discuss that too.

hebasto commented at 10:39 pm on April 7, 2025: member

The Windows CI jobs seem to require the following patch:

--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -242,6 +242,7 @@ jobs:
           BITCOINCLI: '${{ github.workspace }}\build\bin\Release\bitcoin-cli.exe'
           BITCOINUTIL: '${{ github.workspace }}\build\bin\Release\bitcoin-util.exe'
           BITCOINWALLET: '${{ github.workspace }}\build\bin\Release\bitcoin-wallet.exe'
+          BITCOINCHAINSTATE: '${{ github.workspace }}\build\bin\Release\bitcoin-chainstate.exe'
           TEST_RUNNER_EXTRA: ${{ github.event_name != 'pull_request' && '--extended' || '' }}
         shell: cmd
         run: py -3 test\functional\test_runner.py --jobs %NUMBER_OF_PROCESSORS% --ci --quiet --tmpdirprefix=%RUNNER_TEMP% --combinedlogslen=99999999 --timeout-factor=%TEST_RUNNER_TIMEOUT_FACTOR% %TEST_RUNNER_EXTRA%

TheCharlatan commented at 8:20 am on April 8, 2025: contributor

Re #30595#pullrequestreview-2748265226

The Windows CI jobs seem to require the following patch:

Yes, I already did that on https://github.com/TheCharlatan/bitcoin/tree/kernelApi_35, but it still failed.

hebasto commented at 9:44 am on April 8, 2025: member

Re #30595 (review)

The Windows CI jobs seem to require the following patch:

Yes, I already did that on https://github.com/TheCharlatan/bitcoin/tree/kernelApi_35, but it still failed.

Hmm… The test passes on my machine:

> py -3 build-static\test\functional\tool_bitcoin_chainstate.py
2025-04-08T09:43:10.604000Z TestFramework (INFO): PRNG seed is: 7627559442665807582
2025-04-08T09:43:10.622000Z TestFramework (INFO): Initializing test directory C:\Users\hebasto\AppData\Local\Temp\bitcoin_func_test_wlc9wod3
2025-04-08T09:43:11.026000Z TestFramework (INFO): Testing bitcoin-chainstate ['C:\\Users\\hebasto\\bitcoin\\build-static\\bin\\Release\\bitcoin-chainstate.exe'] with datadir: C:\Users\hebasto\AppData\Local\Temp\bitcoin_func_test_wlc9wod3\node0
2025-04-08T09:43:11.251000Z TestFramework (INFO): STDERR: Block has not yet been rejected
2025-04-08T09:43:11.449000Z TestFramework (INFO): STDERR: Block has not yet been rejected
Block is a duplicate
2025-04-08T09:43:11.701000Z TestFramework (INFO): STDERR: Block decode failed, try again:
2025-04-08T09:43:11.909000Z TestFramework (INFO): STDERR: Empty line found, try again:
2025-04-08T09:43:11.961000Z TestFramework (INFO): Stopping nodes
2025-04-08T09:43:11.961000Z TestFramework (INFO): Cleaning up C:\Users\hebasto\AppData\Local\Temp\bitcoin_func_test_wlc9wod3 on exit
2025-04-08T09:43:11.961000Z TestFramework (INFO): Tests successful

Will investigate it further.

hebasto commented at 2:45 pm on April 8, 2025: member

Re #30595 (review)

The Windows CI jobs seem to require the following patch:

Yes, I already did that on https://github.com/TheCharlatan/bitcoin/tree/kernelApi_35, but it still failed.

Hmm… The test passes on my machine:

> py -3 build-static\test\functional\tool_bitcoin_chainstate.py
2025-04-08T09:43:10.604000Z TestFramework (INFO): PRNG seed is: 7627559442665807582
2025-04-08T09:43:10.622000Z TestFramework (INFO): Initializing test directory C:\Users\hebasto\AppData\Local\Temp\bitcoin_func_test_wlc9wod3
2025-04-08T09:43:11.026000Z TestFramework (INFO): Testing bitcoin-chainstate ['C:\\Users\\hebasto\\bitcoin\\build-static\\bin\\Release\\bitcoin-chainstate.exe'] with datadir: C:\Users\hebasto\AppData\Local\Temp\bitcoin_func_test_wlc9wod3\node0
2025-04-08T09:43:11.251000Z TestFramework (INFO): STDERR: Block has not yet been rejected
2025-04-08T09:43:11.449000Z TestFramework (INFO): STDERR: Block has not yet been rejected
Block is a duplicate
2025-04-08T09:43:11.701000Z TestFramework (INFO): STDERR: Block decode failed, try again:
2025-04-08T09:43:11.909000Z TestFramework (INFO): STDERR: Empty line found, try again:
2025-04-08T09:43:11.961000Z TestFramework (INFO): Stopping nodes
2025-04-08T09:43:11.961000Z TestFramework (INFO): Cleaning up C:\Users\hebasto\AppData\Local\Temp\bitcoin_func_test_wlc9wod3 on exit
2025-04-08T09:43:11.961000Z TestFramework (INFO): Tests successful

Will investigate it further.

Passing DATADIR to bitcoin-chainstate.exe as a command-line argument fails to handle UTF-8 characters, which results in an unhandled exception in this code: https://github.com/bitcoin/bitcoin/blob/987ad25bd9ee520dcf1ca96702ff4ad51392f765/src/bitcoin-chainstate.cpp#L61-L62

TheCharlatan force-pushed on Apr 8, 2025

TheCharlatan commented at 9:07 pm on April 8, 2025: contributor

Updated a0d24ff9a9337770dae668d7b0ea0a6e62ed086a -> 9e8b7f8f47af566324df475853e9281937a0c5e2 (kernelApi_34 -> kernelApi_35, compare)

Fixed bitcoin-chainstate on windows. This was not working before, because it did not support UTF-8 strings passed as arguments.

Rebased 9e8b7f8f47af566324df475853e9281937a0c5e2 -> 720f253abbbeb56872b6c16deee26f3fab842254 (kernelApi_35 -> kernelApi_36, compare)

DrahtBot removed the label Needs rebase on Apr 8, 2025

TheCharlatan force-pushed on Apr 9, 2025

DrahtBot added the label Needs rebase on Apr 22, 2025

TheCharlatan force-pushed on Apr 22, 2025

TheCharlatan commented at 6:21 pm on April 22, 2025: contributor

Rebased 720f253abbbeb56872b6c16deee26f3fab842254 -> 4a4eeb94339bb9200012df6a57769dc28e35f553 (kernelApi_36 -> kernelApi_37, compare)

Fixed conflict with #32308

DrahtBot removed the label Needs rebase on Apr 22, 2025

Davidson-Souza commented at 4:26 pm on May 6, 2025: none

I’ve tried out some of the code, specifically the API for validating transactions. I’m reporting back some of the results I’ve got so far, hopefully this info is useful for reviewers.

As some of you might know, I have a project that uses the now deprecated (see #29189) libbitcoinconsensus for script validation. This is a nice feature, since script is usually the hardest part to re-implement when it comes to Bitcoin consensus. However, apart from being deprecated, libbitcoinconsensus had a huge performance bottleneck: it deserialized transactions every time we called it. And since the exposed verify_script function was called per input, a tx with several inputs would cause the same tx to be deserialized several times. To make things worse, Bitcoin Core appears to have an optimization for CTransaction, where it pre-computes the txid and wtxid when the tx is deserialized. I believe this is because those values are used all the time, so it wouldn’t make sense to keep recomputing them. But for this case, it meant that we would recompute the txid and wtxid of the same transaction for every input.

When profiling Floresta, I realized that after our assumevalid height (in this context assumevalid is the same concept as Core’s), we would spend about 40% of CPU time computing those hashes, as shown in this flamegraph.

The API introduced in this PR exposes an opaque type for CTransaction that is then passed as a parameter to the verify function. So there is no per-input deserialization: you parse the tx once and re-use it in all calls for the same tx. Here’s a flamegraph using the new API:

flamegraph

Some functions’ names haven’t been resolved, but most of the functions we see here are what we would expect (verify signatures, sighash calculation…). This flamegraph may not look like much, but it was recorded while validating the last 50k blocks on mainnet, and the perf.dat file is >100GB big.
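The difference between the two calling patterns can be sketched as follows. This is a minimal illustration and not the kernel API: `ParsedTx`, `Deserialize`, `VerifyInputFromBytes`, `VerifyInputFromHandle` and `CountDeserializations` are hypothetical stand-ins, and the "verification" is a trivial bounds check. It only counts how often the expensive deserialization step runs in each pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a deserialized transaction; not a kernel type.
struct ParsedTx {
    std::size_t num_inputs;
};

// Counts how often the (expensive) deserialization step runs.
static int g_deserialize_calls = 0;

// Stand-in for deserialization. In the scenario described above this is also
// where the txid and wtxid would be hashed.
ParsedTx Deserialize(const std::vector<std::uint8_t>& raw, std::size_t num_inputs)
{
    assert(!raw.empty());
    ++g_deserialize_calls;
    return ParsedTx{num_inputs};
}

// Per-input pattern: raw bytes are passed on every call, so the transaction
// is deserialized once per input.
bool VerifyInputFromBytes(const std::vector<std::uint8_t>& raw, std::size_t num_inputs, std::size_t input_index)
{
    const ParsedTx tx{Deserialize(raw, num_inputs)};
    return input_index < tx.num_inputs;
}

// Handle pattern: the caller parses once and passes the parsed object.
bool VerifyInputFromHandle(const ParsedTx& tx, std::size_t input_index)
{
    return input_index < tx.num_inputs;
}

// Returns the number of deserializations needed to check every input,
// or -1 if any of the placeholder checks failed.
int CountDeserializations(bool reuse_handle, std::size_t num_inputs)
{
    g_deserialize_calls = 0;
    const std::vector<std::uint8_t> raw(100, 0); // placeholder bytes
    bool all_ok{true};
    if (reuse_handle) {
        const ParsedTx tx{Deserialize(raw, num_inputs)};
        for (std::size_t i = 0; i < num_inputs; ++i) {
            all_ok = VerifyInputFromHandle(tx, i) && all_ok;
        }
    } else {
        for (std::size_t i = 0; i < num_inputs; ++i) {
            all_ok = VerifyInputFromBytes(raw, num_inputs, i) && all_ok;
        }
    }
    return all_ok ? g_deserialize_calls : -1;
}
```

In this sketch, the per-input pattern deserializes a transaction with N inputs N times, while the handle pattern deserializes it once regardless of N.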

We also have two benchmarks that are relevant in this case: one fully validating block 866342 (just a random block, the tip when this benchmark was written), and one fully validating block 367891 (a block with a 19k-input transaction). I’ve run both 4 times, although the tool we use already samples over multiple runs. Both show incredible improvements.

Bench: Block 866342

Run	Consensus	Kernel(this PR)
1	2.0391 s	613.81 ms
2	2.0351 s	604.42 ms
3	2.0821 s	622.37 ms
4	1.9660 s	619.45 ms

Benchmark: block 367891

Run	Consensus	Kernel(this PR)
1	92.469 s	6.1111 s
2	91.411 s	5.8172 s
3	93.894 s	6.3067 s
4	88.523 s	6.2245 s

For the second case, we can see a ~15x speedup using the new code.
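As a rough check of these figures, the speedup can be computed from the run means. The sketch below hard-codes the numbers from the two tables; it assumes the two kernel entries without a unit in the first table are milliseconds, and the helper names (`Mean`, `Speedup`, `SpeedupBlock866342`, `SpeedupBlock367891`) are made up for illustration.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a set of run times (all in the same unit).
double Mean(const std::vector<double>& runs)
{
    assert(!runs.empty());
    return std::accumulate(runs.begin(), runs.end(), 0.0) / static_cast<double>(runs.size());
}

// Speedup of the candidate over the baseline, based on mean run time.
double Speedup(const std::vector<double>& baseline, const std::vector<double>& candidate)
{
    return Mean(baseline) / Mean(candidate);
}

// Block 866342, all values in seconds. The kernel values are converted from
// milliseconds; runs 3 and 4 are assumed to be milliseconds as well.
double SpeedupBlock866342()
{
    return Speedup({2.0391, 2.0351, 2.0821, 1.9660},
                   {0.61381, 0.60442, 0.62237, 0.61945});
}

// Block 367891, all values in seconds.
double SpeedupBlock367891()
{
    return Speedup({92.469, 91.411, 93.894, 88.523},
                   {6.1111, 5.8172, 6.3067, 6.2245});
}
```

Under those assumptions the means give a speedup of roughly 3.3x for block 866342 and roughly 15x for block 367891.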

DrahtBot added the label Needs rebase on May 7, 2025

TheCharlatan force-pushed on May 7, 2025

TheCharlatan commented at 6:31 pm on May 7, 2025: contributor

Rebased 4a4eeb94339bb9200012df6a57769dc28e35f553 -> 65fe5d03e7a2d0d00d7d37bd426fd6532fff3c06 (kernelApi_37 -> kernelApi_38, compare)

Fixed conflict with #28710

DrahtBot removed the label Needs rebase on May 7, 2025

DrahtBot added the label Needs rebase on May 28, 2025

TheCharlatan force-pushed on May 28, 2025

TheCharlatan commented at 1:54 pm on May 28, 2025: contributor

Rebased 65fe5d03e7a2d0d00d7d37bd426fd6532fff3c06 -> 1417e0b3b1b03dd014a3459c10a5ae7ab0c3687f (kernelApi_38 -> kernelApi_39, compare)

Fixed conflict with #32528

DrahtBot removed the label Needs rebase on May 28, 2025

DrahtBot added the label Needs rebase on Jun 11, 2025

TheCharlatan force-pushed on Jun 11, 2025

TheCharlatan commented at 7:13 am on June 11, 2025: contributor

Rebased 1417e0b3b1b03dd014a3459c10a5ae7ab0c3687f -> 43535b545ca6dd7e0221b7c25abfc8409885f7c0 (kernelApi_39 -> kernelApi_40, compare)

Fixed conflict with #32680

DrahtBot removed the label Needs rebase on Jun 11, 2025

setavenger commented at 8:50 pm on June 12, 2025: none

I’m working on a v2 of my silent payment indexer in Go. I’ve been able to create a small library to get the basic functionality going: https://github.com/setavenger/go-bitcoinkernel. In the current version I’m using Bitcoin Core’s getblock RPC with verbosity 3. That gives me the necessary prevouts and also, for convenience, their height. Knowing when a prevout was created helps a lot with efficient cut-through. Building on libbitcoinkernel, I noticed that there does not seem to be a way yet to fetch the block of a transaction output (neither height nor hash).

TheCharlatan commented at 10:41 am on June 13, 2025: contributor

Thanks for testing this out @setavenger! I think exposing the height from the coin containing the CTxOut in the undo data would be a nice improvement and should be easy to do. Once you have the height, you can get the block through kernel_get_block_index_from_height and then kernel_read_block_from_disk, which I think would provide what you are looking for.

TheCharlatan force-pushed on Jun 13, 2025

TheCharlatan commented at 11:53 am on June 13, 2025: contributor

Updated 43535b545ca6dd7e0221b7c25abfc8409885f7c0 -> d9e030d56343bb452d86169f77ddfb64f7160235 (kernelApi_40 -> kernelApi_41, compare)

Addressed @setavenger’s comment, added a function to retrieve the height of the block in which an output contained in the undo data was created.

setavenger commented at 2:04 pm on June 14, 2025: none

Wow, quick, thanks a lot! This should work well. Will give feedback once I have had time to try it out properly.

stringintech commented at 6:01 pm on June 22, 2025: contributor

Concept ACK.

I have also been working on a Go wrapper; I mostly started it to learn and familiarize myself with what is possible with the API, but I intend to maintain it, so I appreciate any feedback:

github.com/stringintech/go-bitcoinkernel

I have already taken a brief look at the existing Python and Rust wrappers and borrowed ideas for tests, but I should take a closer look.

in src/kernel/CMakeLists.txt:153 in d9e030d563 outdated

@@ -149,3 +149,5 @@ install(TARGETS bitcoinkernel
     DESTINATION ${CMAKE_INSTALL_LIBDIR}
     COMPONENT libbitcoinkernel
 )
+
+install(FILES bitcoinkernel.h DESTINATION ${CMAKE_INSTALL_INCLUDEDIR} COMPONENT Kernel)

stringintech commented at 6:29 pm on June 29, 2025:

I think COMPONENT Kernel should become COMPONENT libbitcoinkernel. Noticed when running cmake --install build --component libbitcoinkernel the header file wasn’t being installed.

TheCharlatan force-pushed on Jun 30, 2025

TheCharlatan commented at 10:15 am on June 30, 2025: contributor

Updated d9e030d56343bb452d86169f77ddfb64f7160235 -> 690a5dac223ed18a65c9d9e6c535466cc3ad4511 (kernelApi_41 -> kernelApi_42, compare)

Addressed @stringintech’s comment, using the correct name for the kernel component. This was missed after rebasing on #31869.

yuvicc commented at 4:12 pm on July 7, 2025: contributor

Concept ACK

Made a Java wrapper library for the Java folks out there! https://github.com/yuvicc/java-bitcoinkernel

While playing with the API I also wrote a benchmarking test for the script validation using the API vs the internal code just to see the performance overhead:

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
34,138.87	29,292.12	0.2%	457,313.10	144,899.28	3.156	9,465.17	0.5%	0.01	`VerifyScriptBench`
34,229.45	29,214.61	0.1%	457,313.10	144,994.30	3.154	9,465.17	0.5%	0.01	`VerifyScriptBench`
34,001.07	29,410.84	0.2%	457,313.11	144,108.48	3.173	9,465.17	0.5%	0.01	`VerifyScriptBench`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
34,867.07	28,680.36	0.2%	465,484.22	148,107.37	3.143	12,150.19	0.4%	0.01	`VerifyScriptKernelApiBench`
34,999.80	28,571.59	0.2%	465,484.22	148,375.57	3.137	12,150.19	0.4%	0.01	`VerifyScriptKernelApiBench`
34,853.13	28,691.82	0.2%	465,484.22	147,583.89	3.154	12,150.19	0.4%	0.01	`VerifyScriptKernelApiBench`

TheCharlatan commented at 1:51 pm on July 9, 2025: contributor

Made a Java wrapper library for the Java folks out there! https://github.com/yuvicc/java-bitcoinkernel

Cool!

While playing with the API I also wrote a benchmarking test for the script validation using the API vs the internal code just to see the performance overhead

It looks like your benchmark includes serialization in its hot loop, which is not the case for our internal VerifyScriptBench. I’d be surprised if there was any overhead if the serialization is performed externally in your bench too.

yuvicc commented at 1:44 pm on July 10, 2025: contributor

It looks like your benchmark includes serialization in its hot loop, which is not the case for our internal VerifyScriptBench. I’d be surprised if there was any overhead if the serialization is performed externally in your bench too.

Agree, I think I shall keep the script serialization for the internal code inside the hot loop as well and also vice versa. Let me check!

in src/kernel/bitcoinkernel.cpp:1076 in 690a5dac22 outdated

+
+uint64_t kernel_get_transaction_undo_size(const kernel_BlockUndo* block_undo_, uint64_t transaction_undo_index)
+{
+    const auto block_undo{cast_const_block_undo(block_undo_)};
+    return block_undo->vtxundo[transaction_undo_index].vprevout.size();
+}

stringintech commented at 3:45 pm on July 13, 2025:

Could we check and return zero if transaction_undo_index is out of bounds here as well? (and then update the documentation in the header file that zero return value is indicative of out of bounds error)
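The suggested check could look roughly like this. It is a sketch with hypothetical stand-in types (`TxOutStub`, `TxUndoStub`, `BlockUndoStub`) that only mirror the `vtxundo` and `vprevout` member names from the snippet above; it is not the code that ended up in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the undo data types.
struct TxOutStub {};
struct TxUndoStub {
    std::vector<TxOutStub> vprevout; // outputs spent by one transaction
};
struct BlockUndoStub {
    std::vector<TxUndoStub> vtxundo; // one entry per transaction with undo data
};

// Returns the number of spent outputs recorded for the given transaction undo
// entry, or 0 if the index is out of bounds.
std::uint64_t TransactionUndoSize(const BlockUndoStub& block_undo, std::uint64_t transaction_undo_index)
{
    if (transaction_undo_index >= block_undo.vtxundo.size()) {
        return 0;
    }
    return block_undo.vtxundo[transaction_undo_index].vprevout.size();
}

// Builds a small example: entry 0 has three outputs, entry 1 has none.
BlockUndoStub MakeExampleBlockUndo()
{
    BlockUndoStub block_undo;
    block_undo.vtxundo.resize(2);
    block_undo.vtxundo[0].vprevout.resize(3);
    return block_undo;
}
```

With this convention, a caller of the sketch cannot distinguish an out-of-range index from an entry that holds no outputs by the return value alone, which is why documenting the meaning of zero matters.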

in src/kernel/bitcoinkernel.h:82 in 690a5dac22 outdated

+ * functions and holding callbacks for kernel events.
+ *
+ * @section error Error handling
+ *
+ * Functions communicate an error through their return types, usually returning
+ * a nullptr, or false if an error is encountered. Additionally, verification

stringintech commented at 3:48 pm on July 13, 2025:

Might be good to include that returning zero could also be indicative of an error.

in src/kernel/bitcoinkernel.h:1272 in 690a5dac22 outdated

+ */
+BITCOINKERNEL_API kernel_TransactionOutput* BITCOINKERNEL_WARN_UNUSED_RESULT kernel_get_undo_output_by_index(
+    const kernel_BlockUndo* block_undo,
+    uint64_t transaction_undo_index,
+    uint64_t output_index
+) BITCOINKERNEL_ARG_NONNULL(1);

stringintech commented at 4:01 pm on July 13, 2025:

In the corresponding commit description it is included that the returned kernel_TransactionOutput is entirely owned by the user and … . It would be nice to also include the explanation here.

in src/kernel/bitcoinkernel.cpp:664 in 690a5dac22 outdated

+{
+    auto options{cast_context_options(options_)};
+    // Copy the notifications, so the caller can free it again
+    LOCK(options->m_mutex);
+    options->m_notifications = std::make_unique<const KernelNotifications>(notifications);
+}

stringintech commented at 4:12 pm on July 13, 2025:

I think the comment does not apply here.

in src/kernel/bitcoinkernel.h:589 in 690a5dac22 outdated

+ * @param[in] category If kernel_LOG_ALL is chosen, all messages at the specified level
+ *                     will be logged. Otherwise only messages from the specified category
+ *                     will be logged at the specified level and above.
+ * @param[in] level    Log level at which the log category is set.
+ */
+BITCOINKERNEL_API void kernel_add_log_level_category(const kernel_LogCategory category, kernel_LogLevel level);

stringintech commented at 4:36 pm on July 13, 2025:

I was wondering why this is not named kernel_set_log_level_category, as it sets the log level for a category. But perhaps we cannot, since the name is already taken in logging.h for a different purpose.

in src/kernel/bitcoinkernel.cpp:614 in 690a5dac22 outdated

+
+    // We are not buffering if we have a connection, so check that it is not the
+    // last available connection.
+    if (!LogInstance().Enabled()) {
+        LogInstance().DisconnectTestLogger();
+    }

stringintech commented at 4:45 pm on July 13, 2025:

Might be worth making the doc a bit more explicit to say something like “switch back to buffering logs if no connections remain”, since DisconnectTestLogger() doesn’t clearly indicate this behavior from its name.

in src/kernel/bitcoinkernel.h:1019 in 690a5dac22 outdated

+ * @brief Get the block index entry of the current chain tip. Once returned,
+ * there is no guarantee that it remains in the active chain.
+ *
+ * @param[in] context            Non-null.
+ * @param[in] chainstate_manager Non-null.
+ * @return                       The block index of the current tip.

stringintech commented at 5:37 pm on July 13, 2025:

nit

 * @return                       The block index of the current tip, or null if no active chain exists.

in src/kernel/bitcoinkernel.h:1205 in 690a5dac22 outdated

+
+/**
+ * @brief Return the block hash associated with a block index.
+ *
+ * @param[in] block_index Non-null.
+ * @return    The block hash.

stringintech commented at 5:38 pm on July 13, 2025:

nit

 * @return    The block hash, or null if the block index has no associated hash.

in src/kernel/bitcoinkernel.h:1044 in 690a5dac22 outdated

+ * @brief Retrieve a block index by its block hash.
+ *
+ * @param[in] context            Non-null.
+ * @param[in] chainstate_manager Non-null.
+ * @param[in] block_hash         Non-null.
+ * @return                       The block index of the block with the passed in hash, or null on error.

stringintech commented at 5:55 pm on July 13, 2025:

Null reason could be more explicit:

 * @return                       The block index of the block with the passed in hash, or null if block hash not found.

in src/kernel/bitcoinkernel.h:1059 in 690a5dac22 outdated

+ * Once retrieved there is no guarantee that it remains in the active chain.
+ *
+ * @param[in] context            Non-null.
+ * @param[in] chainstate_manager Non-null.
+ * @param[in] block_height       Height in the chain of the to be retrieved block index.
+ * @return                       The block index at a certain height in the currently active chain, or null on error.

stringintech commented at 5:56 pm on July 13, 2025:

Null reason could be more explicit:

 * @return                       The block index at a certain height in the currently active chain, or null if height is out of bounds.

in src/kernel/bitcoinkernel.h:1075 in 690a5dac22 outdated

+ * chain.
+ *
+ * @param[in] context            Non-null.
+ * @param[in] block_index        Non-null.
+ * @param[in] chainstate_manager Non-null.
+ * @return                       The next block index in the currently active chain, or null on error.

stringintech commented at 5:57 pm on July 13, 2025:

Null reason could be more explicit:

 * @return                       The next block index in the currently active chain, or null if block is tip of chain.

in src/kernel/bitcoinkernel.h:461 in 690a5dac22 outdated

+
+/**
+ * @brief Create a script pubkey from serialized data.
+ * @param[in] script_pubkey     Non-null.
+ * @param[in] script_pubkey_len Length of the script pubkey data.
+ * @return                      The script pubkey, or null on error.

stringintech commented at 6:00 pm on July 13, 2025:

As far as I understand from the implementation, there is no case where we return null on error for this. The same goes for the kernel_copy_block_data and kernel_copy_block_pointer_data functions.

in src/kernel/bitcoinkernel.h:306 in 690a5dac22 outdated

+typedef struct {
+    const void* user_data;                                //!< Holds a user-defined opaque structure that is passed to the validation
+                                                          //!< interface callbacks.
+    kernel_ValidationInterfaceBlockChecked block_checked; //!< Called when a new block has been checked. Contains the
+                                                          //!< result of its validation.
+} kernel_ValidationInterfaceCallbacks;

stringintech commented at 6:10 pm on July 13, 2025:

In the corresponding commit description it is stated that the callbacks block any further validation execution when they are called. It is up to the user to … . Would be nice to also mention this blocking nature in the header file docs.

in src/kernel/bitcoinkernel.h:1189 in 690a5dac22 outdated

+ * @param[in] transaction_undo_index The index of the transaction undo data within the block undo data.
+ * @param[in] output_index           The index of the to be retrieved transaction output within the
+ *                                   transaction undo data.
+ * @return                           A transaction output pointer, or null if provided indices are out of bounds.
+ */
+BITCOINKERNEL_API kernel_TransactionOutput* BITCOINKERNEL_WARN_UNUSED_RESULT kernel_get_undo_output_by_index(

stringintech commented at 6:32 pm on July 13, 2025:

In the corresponding commit there is a mention of kernel_ERROR_OUT_OF_BOUNDS which should be removed I think.

in src/kernel/bitcoinkernel.h:920 in 690a5dac22 outdated

+ *
+ * @param[in] raw_block     Non-null, serialized block.
+ * @param[in] raw_block_len Length of the serialized block.
+ * @return                  The allocated block, or null on error.
+ */
+BITCOINKERNEL_API kernel_Block* BITCOINKERNEL_WARN_UNUSED_RESULT kernel_block_create(

stringintech commented at 6:50 pm on July 13, 2025:

In the corresponding commit description:

utility for serializing a CBlock (kernel_block_create()) -> utility for deserializing

in src/kernel/bitcoinkernel.h:938 in 690a5dac22 outdated

+ */
+BITCOINKERNEL_API kernel_BlockHash* BITCOINKERNEL_WARN_UNUSED_RESULT kernel_block_get_hash(
+    kernel_Block* block
+) BITCOINKERNEL_ARG_NONNULL(1);
+
+/** @name ByteArray

stringintech commented at 6:51 pm on July 13, 2025:

/**

stringintech commented at 6:56 pm on July 13, 2025: contributor

Had some time to go over the commits and code changes and left a number of documentation-related comments. The existing docs are already really solid and helpful - just suggestions that might add clarity.

purpleKarrot commented at 9:28 am on July 14, 2025: contributor

NACK

After working with the API for a few days and reviewing the various language bindings listed in the PR summary, I found that the API requires some fundamental changes in order to reduce the amount of glue code required in language bindings and client code. I wrote a rather detailed analysis here: https://njump.me/naddr1qvzqqqr4gupzqrcxrljwdpfz2qn5a57hse6ez6pkd34pe0wpeskmktt2p62yeketqqvxy6t5vdhkjmntv4exuetv94shq6fdwfjhv6t9wuxrjull

TheCharlatan force-pushed on Jul 14, 2025

TheCharlatan commented at 10:10 am on July 14, 2025: contributor

Thank you for the review @stringintech!

Rebased 690a5dac223ed18a65c9d9e6c535466cc3ad4511 -> 52bab146a5045899ea6800305fa6d9b4efdcc6bd (kernelApi_42 -> kernelApi_43, compare)

Updated 52bab146a5045899ea6800305fa6d9b4efdcc6bd -> 267a7b3f321304f75e8c47e380da49ba9c64bc84 (kernelApi_43 -> kernelApi_44, compare)

Addressed @stringintech’s comment, check bounds on kernel_get_transaction_undo_size.
Addressed @stringintech’s comment, add returning 0 as a potential mark for failure.
Addressed @stringintech’s comment, clarify ownership of kernel_TransactionOutput.
Addressed @stringintech’s comment, clarified the comment on copying the notifications callback struct.
Addressed @stringintech’s comment, renaming kernel_add_log_level_category to kernel_set_log_level_category.
Addressed @stringintech’s comment, describe what precisely happens when we call DisconnectTestLogger().
Addressed @stringintech’s comments 1, 2, 3, 4, 5, 6, 7, 8 improving documentation around returning failure values and types.
Addressed @stringintech’s comment, mention that validation interface callbacks block in the documentation.
Addressed @stringintech’s comment, updated commit description around deserializing blocks.
Addressed @stringintech’s comment, removed name that was left dangling after a prior update.

TheCharlatan commented at 10:44 am on July 14, 2025: contributor

Re #30595 (comment)

Thanks for writing all of that up and your detailed tour @purpleKarrot. I think you raise some excellent points on your blog, but I am not sure how I am to interpret your NACK here.

You mention that fundamental changes are required, but after reading some of your proposed changes in /btck I am not sure how materially different those are from what is proposed here. I think naming conventions are probably the easiest win here; I am currently working on re-writing the header with proper noun-verb-object separation. I think your suggestion to use similar names for the various manipulations we do, so we can better map them to ranges or other standard interfaces, is great too!

The BlockUndo is a data type used in Bitcoin Core to populate the rev*.dat data. It contains all the information required to “undo” spending a coin in case of a block reorg. It thus contains the transaction outputs consumed by each transaction in a block. It is really useful to have this information in order to build indexes and do rudimentary data analysis.

I think one problem here is that I may have been too conservative with the capabilities of the API, i.e. no reference counting and no richer data getters for e.g. blocks and transactions, in order to keep the scope of this PR manageable. The current PR does not have many of the richer things you seem to desire for fear of scope creep. There is endless bikeshedding potential here, and radically limiting the scope seemed like a solution for this to me. The current header implements the bare minimum required to a) get a full node running, b) read block and undo data, and c) do script validation.

I’d gladly roll your suggestions into the PR here, but if you think this is not salvageable, I’d like to know which conceptual part you disagree with, i.e. this repository shipping a C header, the relationship between the C and C++ headers, or all the deeper memory management and ref counting issues.

purpleKarrot commented at 11:04 am on July 14, 2025: contributor

am not sure how I am to interpret your NACK here.

Just as “I think this PR should not be merged in its current form.” I definitely do agree with the approach of adding a C API.

Regarding the other points, maybe we should have a private discussion.

josibake commented at 3:11 pm on July 14, 2025: member

am not sure how I am to interpret your NACK here.

Just as “I think this PR should not be merged in its current form.” I definitely do agree with the approach of adding a C API.

FWIW, I read this as “Concept ACK, Approach NACK” (per https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#conceptual-review), which I think is a helpful distinction.

purpleKarrot commented at 3:20 pm on July 14, 2025: contributor

Yes, Concept ACK, Approach NACK

Thanks, @josibake.

TheCharlatan force-pushed on Jul 14, 2025

TheCharlatan commented at 3:26 pm on July 14, 2025: contributor

Updated 267a7b3f321304f75e8c47e380da49ba9c64bc84 -> 1ffc1c9d94b16cdbfb92a26d0f0e75451efad4fe (kernelApi_44 -> kernelApi_45, compare)

Enforce better function names in the API, which should make future discussions on their desired end format a bit easier.
Dropped the macro check for gcc 4.

in src/kernel/bitcoinkernel.h:128 in 1ffc1c9d94 outdated

+ *
+ * The logging connection can be used to manually stop logging.
+ *
+ * Messages that were logged before a connection is created are buffered in a
+ * 1MB buffer. Logging can alternatively be permanently disabled by calling
+ * kernel_disable_logging(). Functions changing the logging settings are global

stickies-v commented at 4:00 pm on July 17, 2025:

nit: couple of naming mismatches after latest force pushes:

diff --git a/src/kernel/bitcoinkernel.h b/src/kernel/bitcoinkernel.h
index b72c001d1b..ec4db4e7c7 100644
--- a/src/kernel/bitcoinkernel.h
+++ b/src/kernel/bitcoinkernel.h
@@ -130,7 +130,7 @@ typedef struct kernel_TransactionOutput kernel_TransactionOutput;
  *
  * Messages that were logged before a connection is created are buffered in a
  * 1MB buffer. Logging can alternatively be permanently disabled by calling
- * kernel_disable_logging(). Functions changing the logging settings are global
+ * kernel_logging_disable(). Functions changing the logging settings are global
  * (and not thread safe) and change the settings for all existing
  * kernel_LoggingConnection instances.
  */
@@ -576,7 +576,7 @@ BITCOINKERNEL_API void kernel_logging_disable();
 
 /**
  * @brief Set the log level of the global internal logger. This does not
- * enable the selected categories. Use `kernel_enable_log_category` to start
+ * enable the selected categories. Use `kernel_logging_enable_category` to start
  * logging from a specific, or all categories. This function is not thread
  * safe. Mutiple calls from different threads are allowed but must be
  * synchronized. This changes a global setting and will override settings for
@@ -786,7 +786,7 @@ BITCOINKERNEL_API void kernel_chainstate_manager_options_set_worker_threads_num(
 
 /**
  * @brief Sets wipe db in the options. In combination with calling
- * @ref kernel_import_blocks this triggers either a full reindex,
+ * @ref kernel_chainstate_manager_import_blocks this triggers either a full reindex,
  * or a reindex of just the chainstate database.
  *
  * @param[in] chainstate_manager_options Non-null, created by @ref kernel_chainstate_manager_options_create.

in src/kernel/bitcoinkernel_wrapper.h:476 in 1ffc1c9d94 outdated

+        {
+            kernel_block_undo_destroy(ptr);
+        }
+    };
+
+    const std::unique_ptr<kernel_BlockUndo, Deleter> m_block_undo;

stickies-v commented at 6:02 pm on July 17, 2025:

Should this be const? It’s not moveable as-is. If intentional, brief docstring would be good?

    std::unique_ptr<kernel_BlockUndo, Deleter> m_block_undo;
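The effect of the `const` qualifier can be checked at compile time. In the sketch below, `Handle`, `Deleter`, `ConstMember` and `MutableMember` are hypothetical stand-ins for the wrapper class under review; only the standard library type traits are real.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical stand-in for an opaque kernel handle and its deleter.
struct Handle {};
struct Deleter {
    void operator()(Handle* ptr) const { delete ptr; }
};

// Wrapper with a const unique_ptr member, as in the reviewed snippet.
class ConstMember
{
public:
    ConstMember() : m_ptr{new Handle{}} {}

private:
    const std::unique_ptr<Handle, Deleter> m_ptr;
};

// Same wrapper without the const qualifier, as in the suggestion.
class MutableMember
{
public:
    MutableMember() : m_ptr{new Handle{}} {}

private:
    std::unique_ptr<Handle, Deleter> m_ptr;
};

// A const unique_ptr cannot be moved from, so the implicit move constructor
// of ConstMember is deleted. unique_ptr is never copyable, so neither class
// is copyable.
static_assert(!std::is_move_constructible<ConstMember>::value, "const member blocks moves");
static_assert(!std::is_move_assignable<ConstMember>::value, "const member blocks move assignment");
static_assert(std::is_move_constructible<MutableMember>::value, "non-const member allows moves");
static_assert(!std::is_copy_constructible<MutableMember>::value, "unique_ptr is never copyable");
```

If the wrapper is meant to be non-movable, keeping the `const` is a valid way to enforce that, which is presumably why a docstring was requested for the intentional case.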

in src/bitcoin-chainstate.cpp:14 in 1ffc1c9d94 outdated

+#include <string_view>
+#include <vector>
 
-int main(int argc, char* argv[])
+#ifdef WIN32
+#include <windows.h>

stickies-v commented at 7:00 pm on July 17, 2025:

clang-format sorts this alphabetically, but windows.h needs to be included before shellapi.h otherwise we get build errors like:

C:\Program Files (x86)\Windows Kits\10\Include\10.0.20348.0\um\shellapi.h(68,1): error C2146: syntax error: missing ';' before identifier 'DECLSPEC_IMPORT' [D:\a\bitcoin\bitcoin\build\src\bitcoin-chainstate.vcxproj]
  (compiling source file '../../src/bitcoin-chainstate.cpp')

C:\Program Files (x86)\Windows Kits\10\Include\10.0.20348.0\um\shellapi.h(79,16): error C2065: 'HDROP': undeclared identifier [D:\a\bitcoin\bitcoin\build\src\bitcoin-chainstate.vcxproj]
  (compiling source file '../../src/bitcoin-chainstate.cpp')

C:\Program Files (x86)\Windows Kits\10\Include\10.0.20348.0\um\shellapi.h(82,1): error C2086: 'int EXTERN_C': redefinition [D:\a\bitcoin\bitcoin\build\src\bitcoin-chainstate.vcxproj]
  (compiling source file '../../src/bitcoin-chainstate.cpp')
      C:\Program Files (x86)\Windows Kits\10\Include\10.0.20348.0\um\shellapi.h(68,1):
      see declaration of 'EXTERN_C'

Could be useful to exclude this from clang-format?

#ifdef WIN32
// clang-format off
#include <windows.h>
// clang-format on
#include <codecvt>
#include <locale>
#include <shellapi.h>
#endif

in src/kernel/bitcoinkernel.cpp:1033 in 1ffc1c9d94 outdated

+    auto chainman{cast_chainstate_manager(chainman_)};
+    const CBlockIndex* block_index{cast_const_block_index(block_index_)};
+
+    auto block{new std::shared_ptr<CBlock>(new CBlock{})};
+    if (!chainman->m_blockman.ReadBlock(**block, *block_index)) {
+        LogError("Failed to read block.");

stickies-v commented at 1:45 pm on July 18, 2025:

This looks like it leaks memory since we never deallocate block if ReadBlock fails. Perhaps using std::unique_ptr here is a better approach? This (+kernel_block_undo_read) seems like the most dangerous one, but perhaps good practice to do this in other places we allocate memory, e.g. here.

TheCharlatan commented at 4:42 pm on July 28, 2025:

This should get de-allocated when the user calls kernel_block_destroy, no?

stickies-v commented at 4:53 pm on July 28, 2025:

They can’t call kernel_block_destroy, since we return nullptr. The failure branch needs to include delete block (or alternatively, as suggested, just instantiate block as a std::unique_ptr and then promote it to shared_ptr if reading succeeds).
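A sketch of the `std::unique_ptr` variant follows. `BlockStub`, `ReadBlockStub`, `ReadBlockOwned` and `DestroyBlock` are hypothetical stand-ins for `CBlock`, the block manager read, the kernel read function and `kernel_block_destroy`; the live-object counter exists only so the absence of a leak can be observed.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for CBlock that tracks how many instances are alive.
struct BlockStub {
    static int live;
    BlockStub() { ++live; }
    ~BlockStub() { --live; }
};
int BlockStub::live = 0;

// Stand-in for reading a block from disk; `succeed` simulates the outcome.
bool ReadBlockStub(BlockStub& block, bool succeed)
{
    (void)block;
    return succeed;
}

// Returns a heap-allocated shared_ptr handle on success, nullptr on failure.
// The block is held by a unique_ptr until the read has succeeded, so the
// failure path frees it automatically.
std::shared_ptr<BlockStub>* ReadBlockOwned(bool succeed)
{
    auto block{std::make_unique<BlockStub>()};
    if (!ReadBlockStub(*block, succeed)) {
        return nullptr; // unique_ptr destroys the block here
    }
    // Promote to shared ownership only once the block is known to be valid.
    return new std::shared_ptr<BlockStub>(std::move(block));
}

// Counterpart the caller uses to release a successfully returned handle.
void DestroyBlock(std::shared_ptr<BlockStub>* block)
{
    delete block;
}
```

The raw `new` for the outer handle remains because the C interface hands out a pointer; only the inner allocation, which was the one leaking on the failure path, is guarded.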

TheCharlatan commented at 5:09 pm on July 28, 2025:

Right, there is also no test for this :(. Will fix and add a test.

in src/kernel/bitcoinkernel.h:1053 in 1ffc1c9d94 outdated

+BITCOINKERNEL_API kernel_ByteArray* BITCOINKERNEL_WARN_UNUSED_RESULT kernel_block_pointer_copy_data(
+    const kernel_BlockPointer* block
+) BITCOINKERNEL_ARG_NONNULL(1);
+
+/**
+ * Destroy the block.

stickies-v commented at 2:49 pm on July 18, 2025:

Since we use reference counting here, would it be useful to document that in the documentation?

 * Destroy the block. Handle is invalidated immediately, block is destroyed as soon as no references remain.

TheCharlatan commented at 4:43 pm on July 28, 2025:

This was not relevant so far, since nothing could increment the reference count permanently. Will add once we have something that would exercise that case.

in src/kernel/bitcoinkernel.cpp:508 in 1ffc1c9d94 outdated

+            if (status) *status = kernel_SCRIPT_VERIFY_ERROR_SPENT_OUTPUTS_MISMATCH;
+            return false;
+        }
+        spent_outputs.reserve(spent_outputs_len);
+        for (size_t i = 0; i < spent_outputs_len; i++) {
+            const CTxOut& tx_out{*reinterpret_cast<const CTxOut*>(spent_outputs_[i])};

stickies-v commented at 5:18 pm on July 21, 2025:

We should probably use cast_transaction_output here to make sure we’re not miscasting anything?

in src/kernel/bitcoinkernel.h:1184 in 1ffc1c9d94 outdated

+ * @param[in] transaction_undo_index The index of the transaction undo data within the block undo data.
+ * @param[in] output_index           The index of the to be retrieved transaction output within the
+ *                                   transaction undo data.
+ * @return                           A transaction output pointer, or null if provided indices are out of bounds.
+ */
+BITCOINKERNEL_API kernel_TransactionOutput* BITCOINKERNEL_WARN_UNUSED_RESULT kernel_block_undo_copy_transaction_output_by_index(

stickies-v commented at 4:08 pm on July 28, 2025:

I would prefer to expose kernel_TransactionUndo and kernel_Coin handles so we can generalize iterating over these nested containers.

Using shared_ptr and aliasing constructors, we can do so without incurring any copies (but at the cost of allocating shared_ptr and incrementing the reference counter). In my view this is both more ergonomic (by exposing dedicated types) and performant (by avoiding the need for any _copy operations, except for the kernel_ByteArray ones).

If necessary, and in addition, we can still expose _copy functions in places where we need even more performance and the user prefers handling lifetimes themselves, but I think this can be done at a later stage and on a case-by-case basis.

Example implementation: https://github.com/stickies-v/bitcoin/commits/kernel/add-txundo-coin/ (specifically the 4 “fix: ” commits).
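For readers unfamiliar with the aliasing constructor of `std::shared_ptr`, here is a minimal sketch of the idea. The stub types and the `GetTxUndo`, `GetCoin` and `MakeExampleBlockUndo` helpers are hypothetical and not taken from the linked branch; the nested handle points at a member of the parent object while sharing the parent's reference count, so nothing is copied.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins for the nested undo data.
struct CoinStub {
    int height;
};
struct TxUndoStub {
    std::vector<CoinStub> vprevout;
};
struct BlockUndoStub {
    std::vector<TxUndoStub> vtxundo;
};

// Returns a handle to one transaction undo entry. The aliasing constructor
// stores a pointer to the nested element but shares ownership of the whole
// block undo object, so the element stays valid while the handle lives.
std::shared_ptr<const TxUndoStub> GetTxUndo(const std::shared_ptr<const BlockUndoStub>& block_undo, std::size_t index)
{
    assert(block_undo != nullptr);
    if (index >= block_undo->vtxundo.size()) return nullptr;
    return std::shared_ptr<const TxUndoStub>(block_undo, &block_undo->vtxundo[index]);
}

// Same idea one level deeper: a coin handle that keeps the block undo alive.
std::shared_ptr<const CoinStub> GetCoin(const std::shared_ptr<const TxUndoStub>& tx_undo, std::size_t index)
{
    assert(tx_undo != nullptr);
    if (index >= tx_undo->vprevout.size()) return nullptr;
    return std::shared_ptr<const CoinStub>(tx_undo, &tx_undo->vprevout[index]);
}

// Builds example data: one transaction undo entry holding two coins.
std::shared_ptr<const BlockUndoStub> MakeExampleBlockUndo()
{
    auto block_undo{std::make_shared<BlockUndoStub>()};
    block_undo->vtxundo.push_back(TxUndoStub{{CoinStub{100}, CoinStub{200}}});
    return block_undo;
}
```

In this sketch a coin handle can outlive the block undo and transaction undo handles it was obtained from, because all three share one control block. The cost is one reference count increment per handle instead of a copy of the data.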

in src/kernel/bitcoinkernel.h:211 in 1ffc1c9d94 outdated

+
+/**
+ * Opaque data structure for holding a non-owned block. This is typically a
+ * block available to the user through one of the validation callbacks.
+ */
+typedef struct kernel_BlockPointer kernel_BlockPointer;

stickies-v commented at 4:12 pm on July 28, 2025:

We need kernel_BlockPointer because the validation interface gives us a non-owning reference. It would imo be a lot nicer if we could generalize this into kernel_Block, so I have opened #33078 to improve ownership semantics in the validation interface, after which kernel_BlockPointer can then be removed, e.g. as in https://github.com/stickies-v/bitcoin/commits/kernel/remove-blockpointer/

in src/kernel/bitcoinkernel.cpp:375 in 1ffc1c9d94 outdated

+{
+    assert(chainman);
+    return reinterpret_cast<ChainstateManager*>(chainman);
+}
+
+std::shared_ptr<CBlock>* cast_cblocksharedpointer(kernel_Block* block)

stickies-v commented at 4:16 pm on July 28, 2025:

I think the implementation would be a lot cleaner and safer if we didn’t use reinterpret_cast (or only in rare, targeted cases) but instead just implemented the kernel_ structs in the .cpp file (keeping them hidden from the user). It does add minimal overhead by allocating a (usually trivial) struct, but in most cases that is infrequent and the cost negligible. If necessary, we could still re-introduce reinterpret_cast in places where we observe that it does affect performance, but I would strongly prefer it not to be the default.

I think this change goes hand-in-hand with moving to using reference counting internally, where the kernel_ structs would then just have a single std::shared_ptr<> member.

See e.g. https://github.com/stickies-v/bitcoin/commit/daed81c192760bf2e21985c05b42c76c08501e11 for an example.
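A rough sketch of what this could look like, assuming a hypothetical `example_Block` handle and an `InternalBlock` type standing in for `CBlock` (neither name is from the PR). The public header would only forward-declare the struct, while the definition stays in the .cpp file:

```cpp
#include <cassert>
#include <memory>

// Hypothetical internal type standing in for CBlock.
struct InternalBlock { int height; };

// The public C header would only contain the forward declaration:
//     typedef struct example_Block example_Block;
// The definition lives in the .cpp file, so no reinterpret_cast is needed.
struct example_Block {
    std::shared_ptr<const InternalBlock> m_block;
};

extern "C" {

// Allocates the (small) wrapper struct; the user owns the returned handle.
example_Block* example_block_create(int height)
{
    return new example_Block{std::make_shared<const InternalBlock>(InternalBlock{height})};
}

// Accessors work on the typed member instead of casting the handle.
int example_block_get_height(const example_Block* block)
{
    assert(block && block->m_block);
    return block->m_block->height;
}

// Destroying the handle only drops one reference to the shared block.
void example_block_destroy(example_Block* block)
{
    delete block;
}

} // extern "C"
```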

theuni commented at 6:16 pm on July 28, 2025:

Concept ACK. I suggested the same thing :)

TheCharlatan force-pushed on Jul 28, 2025

TheCharlatan commented at 4:20 pm on July 28, 2025: contributor

Updated 1ffc1c9d94b16cdbfb92a26d0f0e75451efad4fe -> 938767d957b7669accfb554a7cbb25141f7e8632 (kernelApi_45 -> kernelApi_46, compare)

Fixed symbol visibility for windows static builds and simplified the macro checks in the header a bit.
Removed the kernel library symbol visibility hack, this allows the user to reduce exports for the library.

stickies-v commented at 4:28 pm on July 28, 2025: contributor

Did another review round while updating py-bitcoinkernel. The main themes are:

using reference counting internally to let us simplify the interface (e.g. expose kernel_TransactionUndo and kernel_Coin instead of letting the user manage indexes) as well as replace expensive _copy operations with _get ones
- note: in the WG we have discussed exposing reference counting through the public interface too, but personally I don’t see the benefits for that complexity yet, even if I’m happy to be convinced otherwise.
avoid/minimize using reinterpret_cast and all the cast_{const}_ functions - instead just implement the kernel_ structs. The overhead should be negligible in most cases and I think it’ll make the code safer and easier to read.
remove kernel_BlockPointer if/when #33078 gets merged.

TheCharlatan force-pushed on Jul 28, 2025

TheCharlatan commented at 5:23 pm on July 28, 2025: contributor

Thank you so much for the review @stickies-v! Picked off the smaller items here, while we are still experimenting with the rest:

Updated 938767d957b7669accfb554a7cbb25141f7e8632 -> 4eb1c66dbdaf35cbc480cb201f6019bc0f5fde95 (kernelApi_46 -> kernelApi_47, compare).

Addressed @stickies-v’s comment, fixed some docstrings.
Addressed @stickies-v’s comment, removed unnecessary const.
Addressed @stickies-v’s comment, added guards against clang-format moving order sensitive includes.
Addressed @stickies-v’s comment, fixed memory leak on read error. This was not caught before, because the error path is untested. Will add a test in a later push.
Addressed @stickies-v’s comment, use a cast for getting a CTxOut*.

purpleKarrot commented at 8:57 pm on July 28, 2025: contributor

I see in the C++ wrapper that a lot of functions are marked noexcept. I assume that the rationale is: “since this function just wraps a C function and C functions cannot throw, the function could just as well be marked noexcept”.

But this is not how the keyword is meant to be used. The noexcept keyword is intended to be used on functions that should not be allowed to throw. The compiler then generates a runtime check to terminate the program in the case that the function throws. You certainly don’t need those checks when wrapping a C function.

Also, per the C++ Core Guidelines,

Leaving an object without its invariant established is asking for trouble

You cannot make a constructor noexcept if it fails to establish the invariant of a class.

Make sure to study: http://www.exceptionsafecode.com/

TheCharlatan force-pushed on Jul 29, 2025

TheCharlatan commented at 7:54 am on July 29, 2025: contributor

Updated 4eb1c66dbdaf35cbc480cb201f6019bc0f5fde95 -> 3d6b3fd8f65e8d4ea2c26d61c3dee89f5da10fac (kernelApi_47 -> kernelApi_48, compare)

Added test for failed block and undo data reads.
Migrated unit tests to use our usual boost test framework. The tests originally avoided boost to demonstrate that nothing boost-related leaked out of the library, but with all the other users this is no longer a good reason.

DrahtBot added the label Needs rebase on Jul 29, 2025

TheCharlatan force-pushed on Jul 30, 2025

TheCharlatan commented at 8:58 am on July 30, 2025: contributor

Rebased 3d6b3fd8f65e8d4ea2c26d61c3dee89f5da10fac -> 6a9fdf7ae58a85ccc08c5f6917f64f28f5a330ad (kernelApi_48 -> kernelApi_49, compare)

Fixed conflict with #33079

DrahtBot removed the label Needs rebase on Jul 30, 2025

TheCharlatan commented at 8:04 am on August 1, 2025: contributor

Re #30595 (comment)

You cannot make a constructor noexcept if it fails to establish the invariant of a class.

The question of how to handle constructor errors has been annoying me in this code base for a long time. We do everything from throwing, to output error parameters, to factory functions returning optionals, to bool operator style validity checks. My feeling is that we have recently tended to avoid throwing exceptions. My personal preference would be using C++23’s std::expected, but that will take more time to land.

The current approach here mimics the approach in our existing minisketch c++ wrapper: https://github.com/bitcoin/bitcoin/blob/master/src/minisketch/include/minisketch.h#L232-L269. I think it would be nice to at least attempt consistency for our C API wrapper classes.
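For comparison, one of the options listed above, throwing from the constructor when the underlying create function fails, could look roughly like the sketch below. The `example_thing_*` functions and the `Thing` class are hypothetical and only mimic the shape of a C create/destroy pair; the deleter is `noexcept`, the constructor is not.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical C-style API standing in for a create/destroy pair.
struct example_Thing { int value; };
example_Thing* example_thing_create(int value) { return value >= 0 ? new example_Thing{value} : nullptr; }
void example_thing_destroy(example_Thing* thing) { delete thing; }

class Thing
{
    // Destroying a handle cannot fail, so the deleter is explicitly noexcept.
    struct Deleter {
        void operator()(example_Thing* ptr) const noexcept { example_thing_destroy(ptr); }
    };
    std::unique_ptr<example_Thing, Deleter> m_ptr;

public:
    // Throws if the C function returns nullptr, so a fully constructed Thing
    // always holds a valid handle (the class invariant).
    explicit Thing(int value) : m_ptr{example_thing_create(value)}
    {
        if (!m_ptr) throw std::runtime_error("failed to create Thing");
    }

    int value() const { return m_ptr->value; }
};
```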

DrahtBot added the label Needs rebase on Aug 6, 2025

TheCharlatan commented at 10:19 am on August 7, 2025: contributor

Rebased 6a9fdf7ae58a85ccc08c5f6917f64f28f5a330ad -> ce8003578e725cf3c64a0f3e1447459e26955a3d (kernelApi_49 -> kernelApi_50, compare)

Fixed conflict with #33077

@stickies-v want to give the mono lib a try in your python bindings?

TheCharlatan force-pushed on Aug 7, 2025

DrahtBot removed the label Needs rebase on Aug 7, 2025

TheCharlatan force-pushed on Aug 11, 2025

TheCharlatan commented at 9:00 am on August 11, 2025: contributor

Rebased ce8003578e725cf3c64a0f3e1447459e26955a3d -> 825f9032dd464cc5d2cdf6493d4a5ddbb2f2ab93 (kernelApi_50 -> kernelApi_51, compare)

kernel: Introduce initial kernel C header API

As a first step, implement the equivalent of what was implemented in the
now deprecated libbitcoinconsensus header. Also add a test binary to
exercise the header and library.

Unlike the deprecated libbitcoinconsensus the kernel library can now use
the hardware-accelerated sha256 implementations thanks to its
statically-initialized context. The functions kept around for
backwards-compatibility in the libbitcoinconsensus header are not ported
over. As a new header, it should not be burdened by previous
implementations. Also add a new error code for handling invalid flag
combinations, which would otherwise cause a crash.

The macros used in the new C header were adapted from the libsecp256k1
header.

To make use of the C header from C++ code, a C++ header is also
introduced for wrapping the C header. This makes it safer and easier to
use from C++ code.

040469a30f

kernel: Add logging to kernel library C header

Exposing logging in the kernel library allows users to follow what is
going on when using it. Users of the C header can use
`kernel_logging_connection_create(...)` to pass a callback function to
Bitcoin Core's internal logger. Additionally the level and severity can
be globally configured.

By default, the logger buffers messages until
`kernel_logging_connection_create(...)` is called. If the user does not
want any logging messages, it is recommended that
`kernel_disable_logging()` is called, which permanently disables the
logging and any buffering of messages.
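The buffering behaviour described above can be sketched as follows. This is an illustration only: `ExampleLogger`, `LogCallback` and `collect` are hypothetical names, and the class is not Bitcoin Core's logger.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback signature: receives the caller's user_data and one message.
using LogCallback = void (*)(void* user_data, const char* message);

// Illustrative logger: buffers messages until a callback is connected,
// or drops everything once logging has been disabled.
class ExampleLogger
{
    std::vector<std::string> m_buffer;
    LogCallback m_callback{nullptr};
    void* m_user_data{nullptr};
    bool m_disabled{false};

public:
    void Log(std::string message)
    {
        if (m_disabled) return;
        if (m_callback) {
            m_callback(m_user_data, message.c_str());
        } else {
            m_buffer.push_back(std::move(message));
        }
    }

    // Connecting a callback first flushes everything buffered so far.
    void Connect(LogCallback callback, void* user_data)
    {
        assert(callback);
        m_callback = callback;
        m_user_data = user_data;
        for (const auto& message : m_buffer) m_callback(m_user_data, message.c_str());
        m_buffer.clear();
    }

    // Disabling is permanent: the buffer is dropped and later messages are ignored.
    void Disable()
    {
        m_disabled = true;
        m_buffer.clear();
    }
};

// Example callback: appends each message to a std::vector<std::string>.
void collect(void* user_data, const char* message)
{
    static_cast<std::vector<std::string>*>(user_data)->emplace_back(message);
}
```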

3490457e80

kernel: Add kernel library context object

The context introduced here holds the objects that will be required for
running validation tasks, such as the chosen chain parameters, callbacks
for validation events, and an interrupt utility. These will be used in a
few commits, once the chainstate manager is introduced.

This commit also introduces conventions for defining option objects. A
common pattern throughout the C header will be:
```
options = object_option_create();
object = object_create(options);
```
This allows for more consistent usage of a "builder pattern" for
objects where options can be configured independently from
instantiation.
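A slightly fuller sketch of the same pattern, with hypothetical `example_*` names and a single made-up option (none of these are the functions added by this commit):

```cpp
#include <cassert>

// Hypothetical option and object types; only pointers would cross the C boundary.
struct example_Options { int worker_threads{1}; };
struct example_Object { int worker_threads; };

example_Options* example_options_create() { return new example_Options{}; }

// Options are configured independently from instantiating the object.
void example_options_set_worker_threads(example_Options* options, int worker_threads)
{
    assert(options && worker_threads > 0);
    options->worker_threads = worker_threads;
}

void example_options_destroy(example_Options* options) { delete options; }

// The object copies what it needs, so the options may be destroyed afterwards.
example_Object* example_object_create(const example_Options* options)
{
    assert(options);
    return new example_Object{options->worker_threads};
}

void example_object_destroy(example_Object* object) { delete object; }
```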

ad87097995

kernel: Add chain params context option to C header

As a first option, add the chainparams. For now these can only be
instantiated with default values. In future they may be expanded to take
their own options for regtest and signet configurations.

This commit also introduces a unique pattern for setting the option
values when calling the `*_set(...)` function.

5ca40d94ec

kernel: Add notifications context option to C header

The notifications are used for notifying on connected blocks and on
warning and fatal error conditions.

The user of the C header may define callbacks that get passed to the
internal notification object in the
`kernel_NotificationInterfaceCallbacks` struct. Each of the callbacks
takes a `user_data` argument that gets populated from the `user_data`
value in the struct. It can be used to recreate the structure containing
the callbacks on the user's side, or to give the callbacks additional
contextual information.
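The `user_data` mechanism can be sketched like this. The struct layout, the `block_tip` callback and the `example_notify_block_tip` function are assumptions made for illustration and do not reproduce the actual `kernel_NotificationInterfaceCallbacks` members.

```cpp
#include <cassert>

// Hypothetical callback struct: one opaque pointer plus plain function pointers.
struct example_NotificationCallbacks {
    void* user_data;
    void (*block_tip)(void* user_data, int height);
};

// User-side state that the callbacks should reach.
struct Listener {
    int last_height{-1};
    int calls{0};
};

// The callback recovers the user's object from the opaque pointer.
void on_block_tip(void* user_data, int height)
{
    auto* self = static_cast<Listener*>(user_data);
    self->last_height = height;
    ++self->calls;
}

// Library side (hypothetical): forwards the stored user_data to every call.
void example_notify_block_tip(const example_NotificationCallbacks& callbacks, int height)
{
    if (callbacks.block_tip) callbacks.block_tip(callbacks.user_data, height);
}
```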

a75a423455

kernel: Add chainstate manager object to C header

This is the main driver class for anything validation related, so expose
it here.

Creating the chainstate manager options will currently also trigger the
creation of their respectively configured directories.

The chainstate manager and block manager options are consolidated into a
single object. The kernel might eventually introduce a separate block
manager object for the purposes of being a light-weight block store
reader.

The chainstate manager will associate with the context with which it was
created for the duration of its lifetime. It is only valid if that
context remains in memory too.

The tests now also create dedicated temporary directories. This is
similar to the behaviour in the existing unit test framework.

Co-authored-by: stickies-v <stickies-v@protonmail.com>

f28e13fab3

kernel: Add chainstate manager option for setting worker threads

Re-use the same pattern used for the context options. This allows users
to set the number of threads used in the validation thread pool.

4390e0a3f8

kernel: Add chainstate loading when instantiating a ChainstateManager

The library will now internally load the chainstate when a new
ChainstateManager is instantiated.

Options for controlling details of loading the chainstate will be added
over the next few commits.

6d994d1d78

kernel: Add block validation to C header

The added function allows the user to process and validate a given block
with the chainstate manager. The *_process_block(...) function does some
preliminary checks on the block before passing it to
`ProcessNewBlock(...)`. These are similar to the checks in the
`submitblock()` rpc.

Richer processing of the block validation result will be made available
in the following commits through the validation interface.

The commit also adds a utility for deserializing a `CBlock`
(`kernel_block_create()`) that may then be passed to the library for
processing.

The tests exercise the function for both mainnet and regtest. The
commit also adds the data of 206 regtest blocks (some blocks also
contain transactions).

323189720f

kernel: Add options for reindexing in C header

Adds options for wiping the chainstate and block tree indexes to the
chainstate load options. In combination and once the
`*_import_blocks(...)` function is added in a later commit, this
triggers a reindex. For now, it just wipes the existing data.

d087bd8c46

kernel: Add chainstate load options for in-memory dbs in C header

This allows a user to run the kernel without creating on-disk files for
the block tree and chainstate indexes. This is potentially useful in
scenarios where the user needs to do some ephemeral validation
operations.

One specific use case is when linearizing the blocks on disk. The block
files store blocks out of order, so a program may utilize the library
and its header to read the blocks with one chainstate manager, and then
write them back in order, and without orphans, with another chainstate
manager. To save disk resources and if the indexes are not required once
done, it may be beneficial to keep the indexes in memory for the
chainstate manager that writes the blocks back again.

762bae19da

kernel: Add import blocks function to C header

The `kernel_import_blocks` function is used to both trigger a reindex,
if the indexes were previously wiped through the chainstate load
options, or import the block data of a single block file.

The behaviour of the import can be verified through the test logs.

9f889a7727

kernel: Add interrupt function to C header

Calling interrupt can halt long-running functions associated with
objects that were created through the passed-in context.

b0ffc6d3b7

kernel: Add validation interface to C header

This adds the infrastructure required to process validation events. For
now the external validation interface only has support for the
`BlockChecked` callback, but support for the other internal validation
interface methods can be added in the future.

The validation interface follows an architecture for defining its
callbacks and ownership that is similar to the notifications.

The task runner is created internally with a context, which itself
internally creates a unique ValidationSignals object. When the user
creates a new chainstate manager the validation signals are internally
passed to the chainstate manager through the context.

The callbacks block any further validation execution when they are
called. It is up to the user to either multiplex them, or use them
otherwise in a multithreaded mechanism to make processing the validation
events non-blocking.
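One way to keep such a callback short is to copy the event into a queue and return, leaving the processing to another thread. The sketch below assumes a hypothetical callback signature and event type; a worker thread (not shown) would call `Drain()` periodically.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event record. The data is copied because the objects passed
// to the callback are not owned by the user.
struct BlockCheckedEvent {
    std::string block_hash;
    bool valid;
};

class EventQueue
{
    std::mutex m_mutex;
    std::queue<BlockCheckedEvent> m_events;

public:
    // Called from the validation callback; only holds the lock briefly.
    void Push(BlockCheckedEvent event)
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        m_events.push(std::move(event));
    }

    // Called from the user's own thread; returns the events in arrival order.
    std::vector<BlockCheckedEvent> Drain()
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        std::vector<BlockCheckedEvent> drained;
        while (!m_events.empty()) {
            drained.push_back(std::move(m_events.front()));
            m_events.pop();
        }
        return drained;
    }
};

// Callback in the user_data style: returns right after enqueueing.
void on_block_checked(void* user_data, const char* block_hash, bool valid)
{
    static_cast<EventQueue*>(user_data)->Push({block_hash, valid});
}
```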

A validation interface can register for validation events with a
context. Internally the passed in validation interface is registered with
the validation signals of a context.

The BlockChecked callback introduces a separate type for a non-owned
block. Since a library-internal object owns this data, the user needs to
be explicitly prevented from deleting it. In a later commit a utility
will be added to copy its data.

77429db87f

kernel: Add functions for the block validation state to C header

These allow for the interpretation of the data in a `BlockChecked`
validation interface callback. This is useful to get richer information
in case a block failed to validate.

4d6e283e3d

kernel: Add function for copying block data to C header

This adds functions for copying serialized block data into a user-owned
variable-sized byte array.

Use it in the tests for verifying the implementation of the validation
interface's `BlockChecked` method.

f4fa216380

kernel: Add functions to read block from disk to C header

This adds functions for reading a block from disk with a retrieved block
index entry. External services that wish to build their own index, or
analyze blocks can use this to retrieve block data.

The block index can now be traversed from the tip backwards. This is
guaranteed to work, since the chainstate maintains an internal block
tree index in memory and every block (besides the genesis) has an
ancestor.

The user can use this function to iterate through all blocks in the
chain (starting from the tip). Once the block index entry for the
genesis block is reached a nullptr is returned if the user attempts to
get the previous entry.
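The traversal described above can be sketched as a simple loop that stops on nullptr. `BlockIndexEntry`, `get_previous` and `heights_from_tip` are hypothetical names used for illustration, not the functions added by this commit.

```cpp
#include <cassert>
#include <vector>

// Hypothetical block index entry: every entry except genesis has a predecessor.
struct BlockIndexEntry {
    int height;
    const BlockIndexEntry* previous;
};

// Returns the previous entry, or nullptr when called on the genesis entry.
const BlockIndexEntry* get_previous(const BlockIndexEntry* entry)
{
    assert(entry);
    return entry->previous;
}

// Walks from the tip back to genesis and collects the heights on the way.
std::vector<int> heights_from_tip(const BlockIndexEntry* tip)
{
    std::vector<int> heights;
    for (const BlockIndexEntry* entry = tip; entry != nullptr; entry = get_previous(entry)) {
        heights.push_back(entry->height);
    }
    return heights;
}
```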

67ab40ed6b

kernel: Add function to read block undo data from disk to C header

This adds functions for reading the undo data from disk with a retrieved
block index entry. The undo data of a block contains all the spent
script pubkeys of all the transactions in a block.

In normal operations undo data is used during re-orgs. This data might
also be useful for building external indexes, or to scan for silent
payment transactions.

Internally the block undo data contains a vector of transaction undo
data which contains a vector of the spent outputs. For this reason, the
`kernel_get_block_undo_size(...)` function is added to the header for
retrieving the size of the transaction undo data vector, as well as the
`kernel_get_transaction_undo_size(...)` function for retrieving the size
of each spent outputs vector contained within each transaction undo data
entry. With these two sizes the user can iterate through the undo data
by accessing the transaction outputs by their indices with
`kernel_get_undo_output_by_index`.

The returned `kernel_TransactionOutput` is entirely owned by the user
and may be destroyed with the `kernel_transaction_output_destroy(...)`
convenience function.
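The two-level iteration can be sketched as a nested loop over both sizes. The types and helper functions below are hypothetical stand-ins that only mirror the shape of the size and by-index accessors described in this commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the undo data layout.
struct Output { std::int64_t amount; };
struct TxUndo { std::vector<Output> spent; };
struct BlockUndo { std::vector<TxUndo> txs; };

// Number of transaction undo entries in the block undo data.
std::size_t block_undo_size(const BlockUndo& undo) { return undo.txs.size(); }

// Number of spent outputs in one transaction undo entry.
std::size_t transaction_undo_size(const BlockUndo& undo, std::size_t tx_index)
{
    assert(tx_index < undo.txs.size());
    return undo.txs[tx_index].spent.size();
}

// Returns a copy of one spent output; the copy is owned by the caller.
Output undo_output_by_index(const BlockUndo& undo, std::size_t tx_index, std::size_t output_index)
{
    assert(output_index < transaction_undo_size(undo, tx_index));
    return undo.txs[tx_index].spent[output_index];
}

// Iterates over every spent output of every transaction and sums the amounts.
std::int64_t total_spent(const BlockUndo& undo)
{
    std::int64_t total{0};
    for (std::size_t tx = 0; tx < block_undo_size(undo); ++tx) {
        for (std::size_t out = 0; out < transaction_undo_size(undo, tx); ++out) {
            total += undo_output_by_index(undo, tx, out).amount;
        }
    }
    return total;
}
```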

d3099c8896

kernel: Add block index utility functions to C header

Adds further functions useful for traversing the block index and
retrieving block information.

This includes getting the block height and hash.

681dba7385

kernel: Add functions to get the block hash from a block

This is useful for a host block processing feature where having an
identifier for the block is needed. Without this, external users need to
serialize the block and calculate the hash externally, which is less
efficient.

61653d71a4

kernel: Add pure kernel bitcoin-chainstate

This showcases a re-implementation of bitcoin-chainstate only using the
kernel C++ API header.

bd46b323c5

kernel: Allowing reducing exports 1209f44eba

kernel: Add Purpose section to header documentation b4c1e8b7b6

kernel: Fix bitcoin-chainstate for windows

And turn it on in the CI.

39c2c5afd7

TheCharlatan force-pushed on Aug 11, 2025

TheCharlatan commented at 9:26 am on August 11, 2025: contributor

Thank you @stickies-v and @purpleKarrot for the discussions, pushing some work now here:

Updated 825f9032dd464cc5d2cdf6493d4a5ddbb2f2ab93 -> 39c2c5afd75e5d455ac2699dcc1c65728e1a5bc5 (kernelApi_51 -> kernelApi_52, compare)

Rename C function prefix from kernel to btck
Introduce new btck namespace
Rename BlockUndo to BlockSpentOutputs.
Introduce clearer type hierarchy to go from BlockSpentOutputs to ScriptPubkeys: BlockSpentOutputs -> TransactionSpentOutputs -> Coin -> TransactionOutput -> ScriptPubkey.
Remove noexcept from most methods in the C++ wrapper, but explicitly add it to the invocations of the Deleter structs.
Throw if there is a constructor failure in the C++ wrapper.
Enforce some more errors that might occur when the programmer passes in bad input (e.g. through out-of-bounds indices) by asserts.
Make CBlock, CTransaction, and Context shared_ptrs internally. This avoids expensive copies of our larger data structures and avoids having to pass the Context to ChainstateManager related functions.
Avoid using reinterpret_cast by introducing wrapper structs in the c++ glue code for all opaque structs.
Add a new way to express ownership: Functions now document if they return pointers to non-owned data structures. Non-owned pointers are mostly useful while iterating, and are wrapped in a new RefWrapper type in the C++ wrapper. Internally their ownership is tracked by the wrapper struct to ensure that destroying them again remains safe.
Add functions to explicitly _copy an object. Depending on the object type this either increments a reference counter, or does a deep copy of the underlying data. This is also documented. This also allows copying data from RefWrapper-wrapped types.
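As a simplified illustration of the last two items, the sketch below separates a non-owning view from an explicit copy. `RefView`, `Coin`, `SpentOutputs` and `get_coin` are hypothetical; this is not the RefWrapper implementation from the PR, and it always deep-copies instead of choosing between reference counting and deep copies.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical data types for the illustration.
struct Coin { long amount; };
struct SpentOutputs { std::vector<Coin> coins; };

// Non-owning view: only valid while the parent object is alive.
template <typename T>
class RefView
{
    const T* m_ptr;

public:
    explicit RefView(const T& ref) : m_ptr{&ref} {}
    const T& Get() const { return *m_ptr; }
    // Explicit copy: the result is owned by the caller and may outlive the parent.
    T Copy() const { return *m_ptr; }
};

// Cheap accessor for iteration; returns a view instead of copying the coin.
RefView<Coin> get_coin(const SpentOutputs& outputs, std::size_t index)
{
    assert(index < outputs.coins.size());
    return RefView<Coin>{outputs.coins[index]};
}
```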

Will re-work the BlockIndex<->ChainstateManager relationship next and replace the ByteArray struct with a writer callback like purpleKarrot suggested here: https://purplekarrot.github.io/btck/design/memory_management.html

in src/kernel/bitcoinkernel.cpp:520 in 39c2c5afd7

517+    }
518+
519+    const CTransaction& tx{*tx_to->m_tx};
520+    std::vector<CTxOut> spent_outputs;
521+    if (spent_outputs_ != nullptr) {
522+        assert(spent_outputs_len == tx.vin.size());

alexanderwiederin commented at 9:23 am on August 12, 2025:

What was the motivation to remove kernel_SCRIPT_VERIFY_ERROR_SPENT_OUTPUTS_MISMATCH?

TheCharlatan commented at 11:47 am on August 12, 2025:

The idea is that bad inputs which obviously arise from insufficient input sanitization or programming errors are not handled by returning errors, but are instead enforced by contract. I think all potential size mismatches and out of bounds errors fall into this category.
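A minimal sketch of that distinction, using a hypothetical `spent_amount` helper that is not part of the PR: the size and bounds preconditions are asserted instead of being mapped to an error code. Note that the standard `assert` macro is compiled out when `NDEBUG` is defined, so this only enforces the contract in builds that keep assertions enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the amount spent by one transaction input.
// Preconditions (enforced by contract, not reported through error codes):
//   - there is exactly one spent output per transaction input
//   - input_index is within range
std::int64_t spent_amount(const std::vector<std::int64_t>& spent_outputs,
                          std::size_t input_count,
                          std::size_t input_index)
{
    assert(spent_outputs.size() == input_count);
    assert(input_index < input_count);
    return spent_outputs[input_index];
}
```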
