kernel: Introduce initial C header API #30595

pull TheCharlatan wants to merge 21 commits into bitcoin:master from TheCharlatan:kernelApi changing 13 files +4414 −2
  1. TheCharlatan commented at 9:18 am on August 6, 2024: contributor

    This is a first attempt at introducing a C header for the libbitcoinkernel library that may be used by external applications for interfacing with Bitcoin Core’s validation logic. It currently is limited to operations on blocks. This is a conscious choice, since it already offers a lot of powerful functionality, but sits just on the cusp of still being reviewable scope-wise while giving some pointers on how the rest of the API could look like.

    The current design was informed by the development of some tools using the C header:

    Next to the C++ header also made available in this pull request, bindings for other languages are available here:

    The rust bindings include unit and fuzz tests for the API.

    The header currently exposes logic for enabling the following functionality:

    • Feature-parity with the now deprecated libbitcoin-consensus
    • Optimized sha256 implementations that were not available to previous users of libbitcoin-consensus thanks to a static kernel context
    • Full support for logging as well as control over categories and severity
    • Feature parity with the existing experimental bitcoin-chainstate
    • Traversing the block index as well and using block index entries for reading block and undo data.
    • Running the chainstate in memory
    • Reindexing (both full and chainstate-only)
    • Interrupting long-running functions

    The pull request introduces a new kernel-only test binary that purely relies on the kernel C header and the C++ standard library. This is intentionally done to show its capabilities without relying on other code inside the project. This may be relaxed to include some of the existing utilities, or even be merged into the existing test suite.

    The complete docs for the API as well as some usage examples are hosted on thecharlatan.ch/kernel-docs. The docs are generated from the following repository (which also holds the examples): github.com/TheCharlatan/kernel-docs.

    How can I review this PR?

    Scrutinize the commit messages, run the tests, write your own little applications using the library, let your favorite code sanitizer loose on it, hook it up to your fuzzing infrastructure, profile the difference between the existing bitcoin-chainstate and the bitcoin-chainstate introduced here, be nitty on the documentation, police the C interface, opine on your own API design philosophy.

    To get a feeling for the API, read through the tests, or one of the examples.

    Please try to avoid nits for the tests, these can wait for later and easily be improved over time. Docs exhaustively explaining all the intricacies of the internal goings-on of the library can also be added later.

    To configure this PR for making the shared library and the bitcoin-chainstate and test_kernel utilities available:

    0cmake -B build -DBUILD_KERNEL_LIB=ON -DBUILD_UTIL_CHAINSTATE=ON
    

    Once compiled the library is part of the build artifacts that can be installed with:

    0cmake --install build
    

    Python headers might also be useful for testing. ctypeslib2’s clang2py can be used to auto-generate bindings:

    0clang2py src/kernel/bitcoinkernel.h -l /path/to/bitcoin/src/.libs/libbitcoinkernel.so > test_wrapper.py
    

    Or alternatively on macOS (after cmake --install build):

    0clang2py /usr/local/include/bitcoinkernel.h -l /usr/local/lib/libbitcoinkernel.dylib --nm $(PWD)/nm_patch.py > test_wrapper.py
    

    Why a C header (and not a C++ header)

    • Shipping a shared library with a C++ header is hard, because of name mangling.
    • Mature and well-supported tooling for integrating C exists for nearly every popular language.
    • C offers a reasonably stable ABI

    Also see #30595 (comment).

    What about versioning?

    The header and library are still experimental and I would expect this to remain so for some time, so best not to worry about versioning yet.

    Potential future additions

    In future, the C header could be expanded to support (some of these have been roughly implemented):

    • Handling transactions, block headers, coins cache, utxo set, meta data, and the mempool
    • Adapters for an abstract coins store
    • Adapters for an abstract block store
    • Allocators and buffers for more efficient memory usage
    • An “io-less” interface

    Current drawbacks

    • For external applications to read the block index of an existing Bitcoin Core node, Bitcoin Core needs to shut down first, since leveldb does not support reading across multiple processes. Other than migrating away from leveldb, there does not seem to be a solution for this problem.
    • The fatal error handling through the notifications is awkward.
    • Handling shared pointers in the interfaces is unfortunate. They make ownership and freeing of the resources fuzzy and poison the interfaces with additional types and complexity. However, they seem to be an artifact of the current code that interfaces with the validation engine. The validation engine itself does not seem to make extensive use of these shared pointers.
    • If multiple instances of the same type of objects are used, there is no mechanism for distinguishing the log messages produced by each of them.
    • The background leveldb compaction thread may not finish in time leading to a non-clean exit. There seems to be nothing we can do about this, outside of patching leveldb.
  2. DrahtBot commented at 9:18 am on August 6, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30595.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK josibake, stickies-v, ismaelsadeeq

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. DrahtBot added the label Validation on Aug 6, 2024
  4. TheCharlatan force-pushed on Aug 6, 2024
  5. DrahtBot added the label CI failed on Aug 6, 2024
  6. DrahtBot commented at 10:42 am on August 6, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/28396412371

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. TheCharlatan force-pushed on Aug 6, 2024
  8. DrahtBot removed the label CI failed on Aug 6, 2024
  9. theuni commented at 10:14 pm on August 7, 2024: member
    Very cool. Can’t wait to dig in when I have some free time.
  10. ryanofsky commented at 6:05 pm on August 12, 2024: contributor

    This seems to offer a lot of nice features, but can you explain the tradeoffs of wrapping the C++ interface in C instead of using C++ from rust directly? It seems like having a C middle layer introduces a lot of boilerplate, and I’m wondering if it is really necessary. For example it seems like there is a rust cxx crate (https://docs.rs/cxx/latest/cxx/, https://chatgpt.com/share/dd4dde59-66d6-4486-88a6-2f42144be056) that lets you call C++ directly from Rust and avoid the need for C boilerplate. It looks like https://cppyy.readthedocs.io/en/latest/index.html is an even more full-featured way of calling c++ from python.

    Another drawback of going through a C API seems like not just increased boilerplate, but reduced safety. For example, the implementation is using reinterpret_cast everywhere and it seems like the exposed C functions use a kernel_ErrorCode enum type with the union of every possible error type, so callers don’t have a way to know which functions can return which errors.

  11. TheCharlatan commented at 8:54 am on August 13, 2024: contributor

    Thank you for the questions and kicking this discussion off @ryanofsky! I’ll update the PR description with a better motiviation re. C vs C++ header, but will also try to answer your questions here.

    This seems to offer a lot of nice features, but can you explain the tradeoffs of wrapping the C++ interface in C instead of using C++ from rust directly? It seems like having a C middle layer introduces a lot of boilerplate, and I’m wondering if it is really necessary. For example it seems like there is a rust cxx crate (https://docs.rs/cxx/latest/cxx/, https://chatgpt.com/share/dd4dde59-66d6-4486-88a6-2f42144be056) that lets you call C++ directly from Rust and avoid the need for C boilerplate. It looks like https://cppyy.readthedocs.io/en/latest/index.html is an even more full-featured way of calling c++ from python.

    It is true that the interoperability between C++ and Rust has become very good. In fact there is someone working on wrapping the entirety of Bitcoin Core in Rust: https://github.com/klebs6/bitcoin-rs.

    During the last Core Dev meeting in Berlin I also asked if a C API were desirable in the first place (notes here) during the libbitcoinkernel session. I moved forward with this implementation, because the consensus at the time with many contributors in the room was that it was desirable. The reasons for this as discussed during the session at the meeting can be briefly summarised:

    • Shipping a shared library with a C++ header is hard
    • Mature and well-supported tooling for integrating C exists for nearly every popular language.
    • C offers a reasonably stable ABI

    So if we want the broadest possible support, across as many languages as possible with both dynamic and statically compiled libraries, a C header is the go-to option. I’m speculating here, but a C++ header might also make future standard version bumps and adoption of new standard library features harder. If having some trade-offs with compatibility, library portability, and language support is acceptable, a C++ header might be acceptaple though. It would be nice to hear more reviewers give their opinions here.

    I’d also like to add that two libraries that we use and depend on in this project, minisketch and zeromq, use the same pattern. They are C++ codebases, that only expose a C API that in both instances can be used with a C++ RAII wrapper. So there is precedent in the free software ecosystem for doing things this way.

    The quality of C++ language interop seems to vary a lot between languages. Python and Rust seem to have decent support, ziglang on the other hand has no support for C++ bindings. JVM family languages are a bit hit and miss, and many of the common academic and industrial data analysis languages, like Julia, R, and Matlab have no support for direct C++ bindings. The latter category should not be disregarded as potential future users, since this library might be useful to access Bitcoin Core data for data analysis projects.

    Another drawback of going through a C API seems like not just increased boilerplate, but reduced safety. For example, the implementation is using reinterpret_cast everywhere

    I feel like the reduced type safety due to casting is bit of a red herring. The type casting can be harder to abuse if you always use a dedicated helper function for interpreting passed in data types (as I believe is implemented here). Casting is also a pattern used in many other projects; both minisketch and libzmq use similar type casts extensively. It should definitely be possible to scrutinize the API in this PR to a point where it offers decent safety to its users as well as contributors to and maintainers of this code base.

    The concerns around boilerplate are more serious in my view, but at least with the current internal code and headers I feel like exposing a safe C++ API is not trivial either. The current headers do not lend themselves to it well, for example through tricky locking mechanics, exposing boost types, or confusing lifetimes. There also comes a point where we should probably stop extensively refactoring internal code for the kernel. I’ve heard some voices during the last two Core Dev meetings with concerns that the kernel project might turn the validation code into an extensive forever building site. Having some boilerplate and glue to abstract some the ugliness and make it safe seems like an acceptable solution for this dilemma. If this means boilerplate is required anyway, I would personally prefer a C API.

    Some of the boilerplate-y duplicate definitions in the header could be dropped again eventually if some of the enums are moved to C-style enums instead of class enum. As long as they are properly namespaced, I don’t see a big drawback for this. Similarly, some of the structs could be defined in a way where they can be used on both sides using pimpl or similar idioms. All in all, most of these translations seem very straightforward.

    It might be interesting to see how some of the RPC methods could be re-implemented using the kernel header. There have been some RPC implementation bugs over the years that were due to unsafe usage of our internal code within the method implementations. Using the kernel header instead might make this safer and reduce boilerplate. To be clear, I am not suggesting replacing the implementations, but separately re-implementing some of them to show where the kernel header might shine.

    it seems like the exposed C functions use a kernel_ErrorCode enum type with the union of every possible error type, so callers don’t have a way to know which functions can return which errors.

    We have disagreed on the design of this before. If I understood you correctly, consolidating all error codes into a single enumeration was one of the reasons you opened your version for handling fatal errors in the kernel: #29700 as an alternative to my original: #29642. I am still a bit torn by the two approaches. I get that it may be useful to exactly see which errors may be encountered by invoking a certain routine, but at the same time I get the feeling this often ends up splintering the error handling to the point where you end up with a catch all approach after all. I also think that it is nice to have a single, central list for looking up all error codes and defining some routines for handling them in close proximity to their definition. It would be nice to finally hear some more voices besides the two of us discussing this. real-or-random has recently provided some good points on error handling in the libsecp silent payments pr (that I mostly did not adopt in this PR) and argues that most error codes are not useful to the user. As mentioned in the description, error handling is a weak spot of this pull request and I would like to improve it.

  12. ryanofsky commented at 4:01 pm on August 13, 2024: contributor

    I guess another thing I’d like to know is if this is the initial C API, and the implementation is around 3000 lines, and it doesn’t handle “transactions, block headers, coins cache, utxo set, meta data, and the mempool”, how much bigger do you think it will get if it does cover most of the things you would like it to cover? Like is this 20%, 30%, or 50% of the expected size?

    I like the idea of reviewing and merging this PR, and establishing a way to interoperate with rust libraries and external projects. I just think going forward we should not lock ourselves into an approach that requires everything to go through a C interface. As we build on this and add features, we should experiment with other approaches that use C++ directly, especially when it can reduce boilerplate and avoid bugs.

    Thanks for pointing to me to the other error handling discussion. I very much agree with the post that says having a single error handling path is highly desirable. I especially agree with this in cases where detailed error messages are still provided (keeping in mind that error handling != error reporting, you can return simple error states with detailed messages or logging). Of course there are places where callers do need to handle separate error cases, especially when there are temporary failures, timeouts, and interruptions, and in these cases functions should return 2 or 3 error states instead of 1. But I don’t think there is a reason in modern application code for functions to be able to return 5, 10, 20, or 50 error states generally. In low-level or very general OS, networking or DBMS code it might make sense, but for application code it seems like a cargo cult programming practice that made IBM service manuals very impressive in the 1980s but does not have a present day rationale. There are special cases, but I don’t think it should be a normal thing for functions to be returning 15 error codes if we are trying to provide a safe and easy to use API.

    Again though, if this approach is the easiest way to get cross-language interoperability working right now, I think we should try it. I just think we should be looking for ways to make things simpler and safer going forward.

  13. TheCharlatan commented at 7:38 pm on August 13, 2024: contributor

    I guess another thing I’d like to know is if this is the initial C API, and the implementation is around 3000 lines, and it doesn’t handle “transactions, block headers, coins cache, utxo set, meta data, and the mempool”, how much bigger do you think it will get if it does cover most of the things you would like it to cover? Like is this 20%, 30%, or 50% of the expected size?

    I think a fair comparison would be comparing the amount of code “glue” required, e.g. the size of the bitcoinkernel.cpp file in this pull request. The size of the header is very dependent on the detail of documentation and I think judging it by the amount of test code is also hard. On my branch including iterators for the UTXO set, handling headers, and simple mempool processing, basically all the stuff required to drop-in replace the calls to validation code in net_processing with the C API, is about similar in size: https://github.com/bitcoin/bitcoin/pull/30595/files#diff-cc28221ef8d0c7294dda4e3df9f70bb6c062006b387468380c2c2cc02b6762c3 . The code on that branch is more hacky than the code here, so I would expect a bit less than a doubling in size to get all the features required to run a full node with transaction relay.

    In low-level or very general OS, networking or DBMS code it might make sense, but for application code it seems like a cargo cult programming practice that made IBM service manuals very impressive in the 1980s but does not have a present day rationale.

    Heh, well put. I think for most functions here it could be feasible to have more concise error codes without too much effort, but I feel like I have to detach from this a bit before being able to come up with an alternative.

  14. ryanofsky commented at 10:09 pm on August 13, 2024: contributor

    I think for most functions here it could be feasible to have more concise error codes without too much effort, but I feel like I have to detach from this a bit before being able to come up with an alternative.

    Thanks, I think I’d need to look at this more to give concrete suggestions, but I’d hope most functions would just return a simple success or failure status, with a descriptive error message in the case of failure. When functions need to return more complicated information or can fail in different ways that callers will want to distinguish, it should be easy to return the relevant information in custom struct or enum types. I think it’s usually better for functions to return simpler custom types than more complicated shared types, because it lets callers know what values functions can return just by looking at their declarations.

  15. hebasto added the label Needs CMake port on Aug 16, 2024
  16. TheCharlatan force-pushed on Aug 28, 2024
  17. TheCharlatan force-pushed on Aug 29, 2024
  18. hebasto removed the label Needs CMake port on Aug 29, 2024
  19. TheCharlatan force-pushed on Aug 29, 2024
  20. DrahtBot added the label CI failed on Aug 30, 2024
  21. DrahtBot removed the label CI failed on Aug 30, 2024
  22. TheCharlatan force-pushed on Sep 1, 2024
  23. TheCharlatan force-pushed on Sep 1, 2024
  24. TheCharlatan commented at 8:29 pm on September 1, 2024: contributor

    I think for most functions here it could be feasible to have more concise error codes without too much effort, but I feel like I have to detach from this a bit before being able to come up with an alternative.

    Completely got rid of the kernel_Error with the last push. Thanks for laying out your logic ryanofsky, I feel like this is cleaner now. When looking at the Rust wrapper, the code seems much clearer too. Errors are now communicated through nullptr or false values. Where required, so far only for the verification functions, a richer status code is communicated to the developer.

  25. ryanofsky commented at 8:13 pm on September 2, 2024: contributor

    Thanks for the update. It’s good to drop the error codes so the C API can correspond 1:1 with the C++ API and not be tied to a more old fashioned and cumbersome error handling paradigm (for callers that want to know which errors are possible and not have to code defensively or fall back to failing generically).

    I am still -0 on the approach of introducing a C API to begin with, but happy to help review this and get merged and maintain it if other developers think this is the right approach to take (short term or long term). It would be great to have more concept and approach ACKs for this PR particularly from the @theuni who commented earlier and @josibake who seems to have some projects built on this and linked in the PR description.

    I think personally, if I wanted to use bitcoin core code from python or rust I would use tools like:

    And interoperate with C++ directly, instead of wrapping the C++ interface in a C interface first. Tools like these do not support all C++ types and features, and can make it necessary to selectively wrap more complicated C++ interfaces with simpler C++ interfaces, or even C interfaces, but I don’t think this would be a justification for preemptively requiring every C++ type and function to be wrapped in C before it can be exposed. I just think the resulting boilerplate code:

    0kernel_Warning cast_kernel_warning(kernel::Warning warning)
    1{
    2    switch (warning) {
    3    case kernel::Warning::UNKNOWN_NEW_RULES_ACTIVATED:
    4        return kernel_Warning::kernel_LARGE_WORK_INVALID_CHAIN;
    5    case kernel::Warning::LARGE_WORK_INVALID_CHAIN:
    6        return kernel_Warning::kernel_LARGE_WORK_INVALID_CHAIN;
    7    } // no default case, so the compiler can warn about missing cases
    8    assert(false);
    9}
    

    and duplicative type definitions and documentation:

     0/**
     1 * A struct for holding the kernel notification callbacks. The user data pointer
     2 * may be used to point to user-defined structures to make processing the
     3 * notifications easier.
     4 */
     5typedef struct {
     6    void* user_data;                         //!< Holds a user-defined opaque structure that is passed to the notification callbacks.
     7    kernel_NotifyBlockTip block_tip;         //!< The chain's tip was updated to the provided block index.
     8    kernel_NotifyHeaderTip header_tip;       //!< A new best block header was added.
     9    kernel_NotifyProgress progress;          //!< Reports on current block synchronization progress.
    10    kernel_NotifyWarningSet warning_set;     //!< A warning issued by the kernel library during validation.
    11    kernel_NotifyWarningUnset warning_unset; //!< A previous condition leading to the issuance of a warning is no longer given.
    12    kernel_NotifyFlushError flush_error;     //!< An error encountered when flushing data to disk.
    13    kernel_NotifyFatalError fatal_error;     //!< A un-recoverable system error encountered by the library.
    14} kernel_NotificationInterfaceCallbacks;
    

    are fundamentally unnecessary and not worth effort of writing and maintaining when C++ is not a new or unusual language and not meaningfully less accessible or interoperable than C is.

    There are legitimate reasons to wrap C++ in C. One reason would be to provide ABI compatibility. Another would be to make code accessible with dlopen/dlsym. But I think even in these cases you would want to wrap C++ in C selectively, or just define an intermediate C interface to pass pointers but use C++ on either side of the interface. I don’t think you would want to drop down to C when not otherwise needed.

    This is just to explain my point of view though. Overall I think this is very nice work, and I want to help with it, not hold it up.

  26. ryanofsky commented at 8:24 pm on September 2, 2024: contributor
    Another idea worth mentioning is that a bitcoin kernel C API could be implemented as a separate C library depending on the C++ library. The new code here does not necessarily need to be part of the main bitcoin core git repository, and it could be in a separate project. A benefit of this approach is it could relieve bitcoin core developers from the responsibility of updating the C API and API documention when they change the C++ code. But a drawback is that C API might not always be up to date with latest version of bitcoin core code and could be broken between releases. Also it might not be as well reviewed or understood and might have more bugs.
  27. josibake commented at 7:59 am on September 3, 2024: member

    Concept ACK

    Also an implicit approach ACK despite not heavily reviewing the code (yet). I have been focusing on using the kernel library in proof of concept applications to get a better sense of how well the library works for downstream users and to hopefully uncover any pain points preemptively. A few of these projects are linked in the PR description.

    Regarding a C header vs C++ header, thanks @ryanofsky for taking the time to explain your thought process. I think you raise some excellent points. I’ll try to respond as best I can, despite being slightly out of my depth on this topic 😅


    For me, the value of libbitcoinkernel is only fully realised with the broadest possible language support and ease of use for downstream projects. This is why I strongly prefer the C header approach for the following reasons:

    1. Mature tooling for C language bindings
    2. Stable ABI
    3. Well established pattern in other open source projects

    If we agree that broad language support is a goal of libbitcoinkernel, highlighting languages that do not support C++ bindings is a much more compelling argument for a C header than highlighting languages that do support C++ bindings as an argument for a C++ header.

    Regarding some of the mentioned languages/tools which do have C++ language binding support:

    Tools like these do not support all C++ types and features, and can make it necessary to selectively wrap more complicated C++ interfaces with simpler C++ interfaces, or even C interfaces

    In this example, who is doing the wrapping to be able to use these tools? If it’s us, this seems much more complicated to ship and maintain a mixed wrapper and also feels over engineered to a specific set of tools and languages. It also does nothing for languages that do not support C++ bindings at all. As @TheCharlatan mentioned, languages favoured by academia lack C++ binding support and making libbitcoinkernel useful for academic research is a particularly important use case of libbitcoinkernel for me.

    If we are exposing just a C++ header and expecting the downstream user to wrap selective parts in C interfaces to use libbitcoinkernel, we’ve eroded a fundamental value proposition of libbitcoinkernel, in my opinion. Namely, we want to provide a safe to use consensus library for users and minimise the risk of downstream projects introducing consensus bugs. Requiring downstream projects to write their own C++/C interfaces to be able to use kernel means that a) they just won’t use libbitcoinkernel or b) will introduce bugs when writing these wrappers. Said differently, if boilerplate will be needed for broad language support, I would prefer we focus our energy on writing and reviewing boilerplate code that ensures the usefulness of the library for the broadest possible user base, instead of requiring a subset of users to each write their own boilerplate without any review from us.

  28. DrahtBot added the label Needs rebase on Sep 3, 2024
  29. TheCharlatan force-pushed on Sep 4, 2024
  30. DrahtBot removed the label Needs rebase on Sep 4, 2024
  31. in src/kernel/bitcoinkernel.h:496 in 63a83b8dad outdated
    103+    size_t script_pubkey_len;
    104+} kernel_TransactionOutput;
    105+
    106+/**
    107+ * @brief Verify if the input at input_index of tx_to spends the script pubkey
    108+ * under the constraints specified by flags. If the witness flag is set the
    


    stickies-v commented at 3:36 pm on September 11, 2024:
    nit / meta discussion: even though it’ll make things more verbose, I think it might be worth referring to flags with their full name to make it easier for users to find them? I.e. “If the witness flag is set” would become “if kernel_SCRIPT_FLAGS_VERIFY_WITNESS is set in flags”.

    TheCharlatan commented at 9:25 pm on November 19, 2024:
    Good point, I added your suggestion.
  32. in src/kernel/bitcoinkernel.h:34 in 33c71843e3 outdated
    29+#define BITCOINKERNEL_WARN_UNUSED_RESULT __attribute__((__warn_unused_result__))
    30+#else
    31+#define BITCOINKERNEL_WARN_UNUSED_RESULT
    32+#endif
    33+#if !defined(BITCOINKERNEL_BUILD) && defined(__GNUC__) && BITCOINKERNEL_GNUC_PREREQ(3, 4)
    34+#define BITCOINKERNEL_ARG_NONNULL(_x) __attribute__((__nonnull__(_x)))
    


    stickies-v commented at 2:43 pm on September 12, 2024:

    nit: I like that we’re using this guard. Do you see a downside to making it variadic?

    (Should be a pretty trivial rebase with e.g. for i in {1..3}; do sed -i -E "s/BITCOINKERNEL_ARG_NONNULL\(([^)]+)\) BITCOINKERNEL_ARG_NONNULL\(([0-9]+)\)/BITCOINKERNEL_ARG_NONNULL(\1, \2)/" ./src/kernel/bitcoinkernel.h; done)


    TheCharlatan commented at 9:20 pm on November 19, 2024:

    When I apply this to the first commit I get:

    0In file included from /home/drgrid/bitcoin/src/test/kernel/test_kernel.cpp:5:
    1/home/drgrid/bitcoin/src/kernel/bitcoinkernel.h:201:32: error: too many arguments provided to function-like macro invocation
    2  201 | ) BITCOINKERNEL_ARG_NONNULL(1, 3);
    3      |                                ^
    4/home/drgrid/bitcoin/src/kernel/bitcoinkernel.h:34:9: note: macro 'BITCOINKERNEL_ARG_NONNULL' defined here
    5   34 | #define BITCOINKERNEL_ARG_NONNULL(_x) __attribute__((__nonnull__(_x)))
    6      |         ^
    7/home/drgrid/bitcoin/src/kernel/bitcoinkernel.h:201:3: error: expected function body after function declarator
    8  201 | ) BITCOINKERNEL_ARG_NONNULL(1, 3);
    9      |   ^
    

    Which makes sense, because the macro only expects one argument. I’m not sure how safe it is to make it take a string or a list instead.


    stickies-v commented at 10:25 pm on November 19, 2024:

    Sorry, I forgot to include the diff which updates the macro (and 1 instance to show it compiles):

     0diff --git a/src/kernel/bitcoinkernel.h b/src/kernel/bitcoinkernel.h
     1index 9e6bf127db..67248349e2 100644
     2--- a/src/kernel/bitcoinkernel.h
     3+++ b/src/kernel/bitcoinkernel.h
     4@@ -31,9 +31,9 @@
     5 #define BITCOINKERNEL_WARN_UNUSED_RESULT
     6 #endif
     7 #if !defined(BITCOINKERNEL_BUILD) && defined(__GNUC__) && BITCOINKERNEL_GNUC_PREREQ(3, 4)
     8-#define BITCOINKERNEL_ARG_NONNULL(_x) __attribute__((__nonnull__(_x)))
     9+#define BITCOINKERNEL_ARG_NONNULL(...) __attribute__((__nonnull__(__VA_ARGS__)))
    10 #else
    11-#define BITCOINKERNEL_ARG_NONNULL(_x)
    12+#define BITCOINKERNEL_ARG_NONNULL(...)
    13 #endif
    14 
    15 #ifdef __cplusplus
    16@@ -522,7 +522,7 @@ bool BITCOINKERNEL_WARN_UNUSED_RESULT kernel_verify_script(
    17     unsigned int input_index,
    18     unsigned int flags,
    19     kernel_ScriptVerifyStatus* status
    20-) BITCOINKERNEL_ARG_NONNULL(1) BITCOINKERNEL_ARG_NONNULL(3);
    21+) BITCOINKERNEL_ARG_NONNULL(1, 3);
    22 
    23 /**
    24  * [@brief](/bitcoin-bitcoin/contributor/brief/) This disables the global internal logger. No log messages will be
    

    Based on https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html, variadic macros should be standard for C99, and GCC documents accepting multiple indexes: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html


    TheCharlatan commented at 5:06 pm on November 20, 2024:
    Thanks, decided to take this. I was a bit careful here, because I lifted the check from secp, which also does not use variadic args: https://github.com/bitcoin-core/secp256k1/blob/master/include/secp256k1.h#L174. But thinking a bit more about it, I could not come up with a good reason not to, so took your suggestion.
  33. in src/kernel/bitcoinkernel.h:56 in 33c71843e3 outdated
    48+ * sha256 implementation, initializes the random number generator and
    49+ * self-checks the secp256k1 static context. It is used internally for otherwise
    50+ * "context-free" operations.
    51+ *
    52+ * The user can create their own context for passing it to state-rich validation
    53+ * functions and holding callbacks for kernel events.
    


    stickies-v commented at 2:59 pm on September 12, 2024:

    Is there any benefit to documenting the built-in static constant kernel context in the header documentation? If I understand correctly, that’s an implementation detail and not relevant to the user? If so, I think we should

    • only talk about the non-static context in bitcoinkernel.h, so that its meaning is unambiguous to the user
    • consistently refer to the static context as “static context” wherever it is documented, as to not make me question everything whenever I come across an unqualified context reference

    If there is merit to documenting the static context in the header, I think it should be more of a footnote than the very first item in the documentation?


    TheCharlatan commented at 9:32 pm on November 19, 2024:
    See my comment.
  34. in src/kernel/bitcoinkernel.h:44 in 33c71843e3 outdated
    39+#ifdef __cplusplus
    40+extern "C" {
    41+#endif // __cplusplus
    42+
    43+/**
    44+ * ------ Context ------
    


    stickies-v commented at 3:04 pm on September 12, 2024:
    Is there benefit to this stand-alone Context documentation, since we already have (and could expand on/merge with) the kernel_Context documentation? I think perhaps a more useful alternative would be to start the documentation with a minimal example on how to use the kernel (or a non-code “getting started” guide), which would inevitably include/reference the kernel_Context, providing users a good starting point on which documentation to read first?

    TheCharlatan commented at 8:56 pm on November 19, 2024:

    I did not think that much about order here, but I do think having this section on the context is a good idea. The key is that the user is not required to instantiate the context for using some parts of the library (and I think this is important enough to not just make it a footnote). The user-instantiated context is only really required when interacting with the “stateful” endpoints. Besides, it may be relevant to know what the library is instantiating internally in case there is some sort of conflict.

    There is an exception here with the validation interface, and I’ve taken several attempts to come up with a nice way to tie it into the option pattern as well. I’ll take a stab at it again soon.

  35. in src/kernel/bitcoinkernel.h:77 in 33c71843e3 outdated
    72+ * The user is responsible for de-allocating the memory owned by pointers
    73+ * returned by functions. Typically pointers returned by *_create(...) functions
    74+ * can be de-allocated by corresponding *_destroy(...) functions.
    75+ *
    76+ * Pointer arguments make no assumptions on their lifetime. Once the function
    77+ * returns the user can safely de-allocate the passed in arguments.
    


    stickies-v commented at 3:25 pm on September 12, 2024:

    I find this phrasing a bit confusing. Is this a correct replacement?

    0 * A function that takes pointer arguments makes no assumptions on their lifetime. Once the function
    1 * returns the user can safely de-allocate the memory owned by those pointers.
    

    TheCharlatan commented at 9:22 pm on November 19, 2024:
    Thanks, taken.
  36. in src/kernel/bitcoinkernel.h:80 in 33c71843e3 outdated
    75+ *
    76+ * Pointer arguments make no assumptions on their lifetime. Once the function
    77+ * returns the user can safely de-allocate the passed in arguments.
    78+ *
    79+ * Pointers passed by callbacks are not owned by the user and are only valid for
    80+ * the duration of it. They should not be de-allocated by the user.
    


    stickies-v commented at 3:27 pm on September 12, 2024:
    1. What’s “it”?

    2. I think adopting and sticking to a clear definition of MUST, MAY, SHOULD, … would be appropriate here? E.g. in this case, I think they “MUST” not be de-allocated by the user, rather than “SHOULD”?


    TheCharlatan commented at 9:21 pm on November 19, 2024:
    Improved this a bit and good point with the more precise language.
  37. DrahtBot added the label Needs rebase on Sep 12, 2024
  38. TheCharlatan force-pushed on Sep 12, 2024
  39. TheCharlatan commented at 9:19 pm on September 12, 2024: contributor
    Rebased.
  40. DrahtBot removed the label Needs rebase on Sep 12, 2024
  41. TheCharlatan force-pushed on Sep 13, 2024
  42. TheCharlatan force-pushed on Sep 13, 2024
  43. TheCharlatan force-pushed on Sep 14, 2024
  44. TheCharlatan force-pushed on Oct 8, 2024
  45. TheCharlatan commented at 8:07 pm on October 8, 2024: contributor

    Reworked after receiving a bunch of out-of-band feedback. In short:

    • Got rid of the void * option handling. Options are now set through dedicated functions instead of a single setter for all options.
    • Got rid of the kernel_TaskRunner. The context now holds an immediate task runner internally on which a user can register various validation interfaces. It is now the user’s responsibility to process the validation callbacks in a non-blocking fashion with their own infrastructure.
    • Got rid of raw data types in validation functions. Instead the raw data is now parsed and processed beforehand and the user always passes opaque data types.
    • Got rid of the explicit transaction output struct. The user can now retrieve the data with helper functions applied on opaque transaction output objects.
  46. TheCharlatan force-pushed on Oct 11, 2024
  47. TheCharlatan force-pushed on Oct 15, 2024
  48. laanwj requested review from laanwj on Oct 22, 2024
  49. bitcoin deleted a comment on Oct 22, 2024
  50. DrahtBot added the label Needs rebase on Oct 24, 2024
  51. TheCharlatan force-pushed on Oct 24, 2024
  52. DrahtBot removed the label Needs rebase on Oct 24, 2024
  53. TheCharlatan force-pushed on Oct 24, 2024
  54. TheCharlatan force-pushed on Oct 25, 2024
  55. TheCharlatan force-pushed on Nov 2, 2024
  56. TheCharlatan force-pushed on Nov 8, 2024
  57. TheCharlatan force-pushed on Nov 10, 2024
  58. TheCharlatan force-pushed on Nov 14, 2024
  59. TheCharlatan force-pushed on Nov 17, 2024
  60. TheCharlatan force-pushed on Nov 17, 2024
  61. in src/kernel/bitcoinkernel.h:752 in f1b3ab751b outdated
    741+) BITCOINKERNEL_ARG_NONNULL(1) BITCOINKERNEL_ARG_NONNULL(2) BITCOINKERNEL_ARG_NONNULL(3);
    742+
    743+/**
    744+ * Destroy the chainstate manager.
    745+ */
    746+void kernel_chainstate_manager_destroy(kernel_ChainstateManager* chainstate_manager, const kernel_Context* context);
    


    stickies-v commented at 6:01 pm on November 19, 2024:
    What’s the point of the context parameter - it seems unused, and inconsistent with the other _destroy functions?

    TheCharlatan commented at 8:48 pm on November 19, 2024:
    Similarly to how the context is passed to the other chainman related functions, it is there to guarantee that it is still around when destroying the chainman. The reason for this is that there may be error notification callbacks issued during destruction.
  62. in src/kernel/bitcoinkernel.h:740 in f1b3ab751b outdated
    735+ * @return                               The allocated chainstate manager, or null on error.
    736+ */
    737+kernel_ChainstateManager* BITCOINKERNEL_WARN_UNUSED_RESULT kernel_chainstate_manager_create(
    738+    kernel_ChainstateManagerOptions* chainstate_manager_options,
    739+    kernel_BlockManagerOptions* block_manager_options,
    740+    const kernel_Context* context
    


    stickies-v commented at 6:02 pm on November 19, 2024:
    nit: this is the only place where context is not the first option, would be nice for consistency?

    TheCharlatan commented at 8:47 pm on November 19, 2024:
    Thanks, I think it is good to get these little things right.
  63. in src/kernel/bitcoinkernel.h:676 in f1b3ab751b outdated
    617+ * with the options will be configured for these chain parameters.
    618+ *
    619+ * @param[in] context_options  Non-null, previously created with kernel_context_options_create.
    620+ * @param[in] chain_parameters Is set to the context options.
    621+ */
    622+void kernel_context_options_set_chainparams(
    


    stickies-v commented at 7:18 pm on November 19, 2024:

    There are a few places, like here, where we expose modifier functions that are (quasi) required to be ran before initializing another object. An alternative approach would be to extend the kernel_context_options_create to take a (nullable) kernel_ChainParameters*, and remove these ~unsafe modifiers altogether? I think that would have the benefit of:

    • removing a whole category of bugs where users set options at the wrong time (i.e. too late), silently leading to buggy behaviour
    • making it easier to see which options can (should) be set, without having to first have read the entire documentation

    This concern also applies to e.g.:

    • kernel_context_options_set_notifications
    • kernel_validation_interface_register

    TheCharlatan commented at 9:04 pm on November 19, 2024:
    I think this is good the way it is now. The options get instantiated empty and may be populated by the user. The actual object only gets configured once by the options during its instantion. It can’t be changed later on, so there is no concern that users could set something at the wrong time. Having to set options as arguments in their creation function is not a clear win in my eyes either. There are use-cases, for example using the kernel only as a data reader, where the notifications are useless. Likewise defaulting to mainnet seems sane to me too. It also does not integrate well with the “builder pattern” which is common in a bunch of other languages.
  64. stickies-v commented at 7:19 pm on November 19, 2024: contributor

    Strong concept ACK.

    I’ve started building a python wrapper library to get familiar with and actually use the interface, so most of my comments for now will be based on that experience and reading the documentation.

  65. TheCharlatan commented at 9:31 pm on November 19, 2024: contributor

    Thank you for the review @stickies-v!

    Updated 6c9121f7907262b2bf065a7ceeb8bca620060a7f -> 6c9121f7907262b2bf065a7ceeb8bca620060a7f (kernelApi_0 -> kernelApi_1, compare)

    • Added, cleaned up, and precised a bunch of documentation
    • Slightly changed the order of a function’s arguments, such that it takes the kernel context first.
  66. TheCharlatan force-pushed on Nov 19, 2024
  67. in src/kernel/bitcoinkernel.h:718 in 6c9121f790 outdated
    712+/**
    713+ * @brief Set the number of available worker threads used during validation.
    714+ *
    715+ * @param[in] chainstate_manager_options Non-null, options to be set.
    716+ * @param[in] worker_threads The number of worker threads that should be spawned in the thread pool
    717+ *                           used for validation. The number should be greater than 0.
    


    stickies-v commented at 4:42 pm on November 20, 2024:

    nit: according to the worker_threads_num, 0 is accepted too:

    Zero means no parallel verification.

    0 *                           used for validation. The number must not be negative. When set to zero, no parallel verification is done.
    
  68. TheCharlatan force-pushed on Nov 20, 2024
  69. TheCharlatan commented at 5:04 pm on November 20, 2024: contributor

    Updated 6c9121f7907262b2bf065a7ceeb8bca620060a7f -> 97fe2b25af31ca612c1f8d9f3de739fa3dee3902 (kernelApi_1 -> kernelApi_2, compare)

  70. in src/kernel/bitcoinkernel.h:1049 in 97fe2b25af outdated
    1043+ * @return                       The next block index in the currently active chain, or null on error.
    1044+ */
    1045+kernel_BlockIndex* BITCOINKERNEL_WARN_UNUSED_RESULT kernel_get_next_block_index(
    1046+    const kernel_Context* context,
    1047+    kernel_BlockIndex* block_index,
    1048+    kernel_ChainstateManager* chainstate_manager
    


    stickies-v commented at 4:51 pm on November 21, 2024:
    nit: for the other block_index getters, chainstate_manager is the second argument - would keep that consistent
  71. in src/kernel/bitcoinkernel.h:1015 in 97fe2b25af outdated
    1009+ * @param[in] context            Non-null.
    1010+ * @param[in] chainstate_manager Non-null.
    1011+ * @param[in] block_hash         Non-null.
    1012+ * @return                       The block index of the block with the passed in hash, or null on error.
    1013+ */
    1014+kernel_BlockIndex* BITCOINKERNEL_WARN_UNUSED_RESULT kernel_get_block_index_by_hash(
    


    stickies-v commented at 4:52 pm on November 21, 2024:

    nit: from/by naming inconsistency, I think my preference would lie with from (i.e. update to kernel_get_block_index_from_hash and kernel_get_block_index_from_height)

    (technically, could update kernel_get_next_block_index -> kernel_get_block_index_from_previous and kernel_get_previous_block_index -> kernel_get_block_index_from_next, but… from_next sounds weird?)

  72. TheCharlatan force-pushed on Nov 21, 2024
  73. TheCharlatan commented at 10:11 pm on November 21, 2024: contributor

    Updated 97fe2b25af31ca612c1f8d9f3de739fa3dee3902 -> a9b71eadb8eff5530500cdb7d7227b8575948df6 (kernelApi_2 -> kernelApi_3, compare)

    • As discussed with @stickies-v out of band, make callbacks only return const pointers, which further ensures that the user does not de-allocate or take ownership of them.
  74. TheCharlatan force-pushed on Nov 21, 2024
  75. DrahtBot added the label CI failed on Nov 21, 2024
  76. DrahtBot commented at 10:18 pm on November 21, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/33351144688

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  77. DrahtBot removed the label CI failed on Nov 21, 2024
  78. TheCharlatan force-pushed on Nov 22, 2024
  79. TheCharlatan commented at 10:36 am on November 22, 2024: contributor

    Updated a9b71eadb8eff5530500cdb7d7227b8575948df6 -> fc67047b7e1fb7031285f790ea3a7ea349474f31 (kernelApi_3 -> kernelApi_4, compare)

  80. TheCharlatan force-pushed on Nov 22, 2024
  81. TheCharlatan commented at 1:47 pm on November 22, 2024: contributor

    Updated fc67047b7e1fb7031285f790ea3a7ea349474f31 -> 34a8429ff3a870c0caaf4c4790becd86c5acde38 (kernelApi_4 -> kernelApi_5, compare)

    • More consistent const usage
  82. in src/kernel/bitcoinkernel.cpp:496 in 34a8429ff3 outdated
    489+
    490+    if (spent_outputs_ != nullptr && flags & kernel_SCRIPT_FLAGS_VERIFY_TAPROOT) {
    491+        txdata.Init(tx, std::move(spent_outputs));
    492+    }
    493+
    494+    return VerifyScript(tx.vin[input_index].scriptSig,
    


    stickies-v commented at 5:38 pm on November 25, 2024:
    I think it’s confusing that this function can return False and have status == kernel_SCRIPT_VERIFY_OK. How about adding a kernel_SCRIPT_VERIFY_ERROR catch-all member for unspecified errors? Or alternatively, requiring the user to provide a nullptr and only setting it to kernel_SCRIPT_VERIFY_OK is that’s actually so?

    TheCharlatan commented at 9:58 pm on November 25, 2024:

    We discussed during the last workshop that ideally we don’t have any status codes here at all. But the problem is annoying to tackle. You’d probably want to pass this function a script verify object that has already passed through the required pre-checks. But then you have to either copy the objects into this object, or give ownership up to that object, which I don’t think is desirable. The alternative to that is having a function with the same signature that you can call to check the arguments. But then you’re forced to check them here again. I’m coming around to the option of replacing the status codes with log messages, but then we’re sacrificing a bit of responsiveness to the developer.

    Edit: I also think that because this is probably going to be the most low-level verification function we expose here, populating the ScriptError_t enum here and returning that instead might be much more interesting. I wanted to hold off on this a bit though, so did not do that yet.


    TheCharlatan commented at 9:44 am on November 26, 2024:
    What do you think of doing something like this instead: https://github.com/TheCharlatan/bitcoin/commit/6323d7b072de5b13ab25aaa29e02332c44808b62

    stickies-v commented at 12:57 pm on November 26, 2024:

    You’d probably want to pass this function a script verify object that has already passed through the required pre-checks.

    I 100% agree with this approach. Adding extra types makes the API more cumbersome to use, but I think it does make it more safe, and the extra verbosity should be quite easy to hide in client libraries.

    But then you’re forced to check them here again.

    I think that can be avoided by having the prechecks function return a ScriptPreChecksPassed* (which references the object it verified, but doesn’t copy it) and then requiring that as an extra argument (extra as a way to address lifetime issues) to the kernel_verify_script function? This doesn’t prevent runtime issues (e.g. re-using PreChecksPassed pointers, which is easily verifiable at runtime) or segfaults, but at least it adds some compile-time checks to guide the user to using the API safely, and it can be done without any copies or changing ownership?

    What do you think of doing something like this instead: https://github.com/TheCharlatan/bitcoin/commit/6323d7b072de5b13ab25aaa29e02332c44808b62

    I’m not convinced. I think not requiring the user to deal with status codes (or strings) is a good philosophy, but if it’s optional anyway, then passing a bool to add a log entry feels like a much worse interface? And I think it’s inferior to this approach:

    The alternative to that is having a function with the same signature that you can call to check the arguments.


    TheCharlatan commented at 1:25 pm on November 26, 2024:
    I’ll propose a pre-check function and object later today then :)

    TheCharlatan commented at 1:57 pm on November 28, 2024:
    Ok, this is what I have now: https://github.com/TheCharlatan/bitcoin/compare/kernelApi_7..kernelApi_ScriptVerifyArgs , what do you think? EDIT: This also introduces the concept of a ‘View’ where you get a resource, but it is dependent on the lifetime of the resources it was created from.
  83. TheCharlatan force-pushed on Nov 25, 2024
  84. TheCharlatan commented at 10:32 pm on November 25, 2024: contributor

    Thanks for the review @stickies-v!

    Updated 34a8429ff3a870c0caaf4c4790becd86c5acde38 -> 35f8503285c672e8ee7e98617e236b38d8ce7a7f (kernelApi_5 -> kernelApi_6, compare)

    • Addressed @stickies-v’s comment, fixed worker threads docstring.
    • Addressed @stickies-v’s comment, make argument ordering more consistent.
    • Addressed @stickies-v’s comment, name functions consistently that return a new object with the help of other objects with from instead of by.
  85. TheCharlatan force-pushed on Nov 28, 2024
  86. TheCharlatan commented at 11:43 am on November 28, 2024: contributor

    Updated 35f8503285c672e8ee7e98617e236b38d8ce7a7f -> 403c20980ec118f6efdd21d7c25646e20574583b (kernelApi_6 -> kernelApi_7, compare)

    • Integrate the validation interface into the context. This avoids having to create/destroy and register/deregister a standalone validation interface object and should make it a bit easier to use. I was hesitant to do this, because it doesn’t allow the user to register multiple interfaces anymore. However, since the user can multiplex the notifications by themselves, I don’t think this is something worth keeping around.
    • Consistently apply the block hash deleter
  87. TheCharlatan force-pushed on Dec 2, 2024
  88. TheCharlatan commented at 12:42 pm on December 2, 2024: contributor

    Rebased 403c20980ec118f6efdd21d7c25646e20574583b -> 8598bc9e5d3fb7ebc08cf0c6422b3e44c56230d6 (kernelApi_7 -> kernelApi_8, compare)

  89. TheCharlatan force-pushed on Dec 4, 2024
  90. TheCharlatan commented at 8:01 am on December 4, 2024: contributor

    Rebased 8598bc9e5d3fb7ebc08cf0c6422b3e44c56230d6 -> 6090df267dfece6192b567fed6582445aa811e7f (kernelApi_8 -> kernelApi_9, compare)

    • Alligned process block pre-checks through #31175
    • Clamp the work threads number so we properly handle the value range through #31313
  91. TheCharlatan force-pushed on Dec 15, 2024
  92. TheCharlatan commented at 10:54 pm on December 15, 2024: contributor

    Updated 6090df267dfece6192b567fed6582445aa811e7f -> 247a8a02c636250ee7e5c06f08cd18ddb1de6be5 (kernelApi_9 -> kernelApi_10, compare)

    • Tweaked docs to be more doxygen friendly.
    • Corrected transaction in taproot script validation tests. I must have committed a version where I was testing a different invariant by mistake. The test now correctly asserts that a taproot transaction can pass validation.
  93. TheCharlatan force-pushed on Dec 16, 2024
  94. TheCharlatan commented at 11:23 am on December 16, 2024: contributor

    Updated 247a8a02c636250ee7e5c06f08cd18ddb1de6be5 -> 9e203b460d8ab1d92949ab8714a9265c343a5eee (kernelApi_10 -> kernelApi_11, compare)

    • Changed how the notification callbacks are set. There was no real need for a separate notifications object, so I removed it.
  95. in src/kernel/bitcoinkernel.h:264 in 9e203b460d outdated
    259+
    260+/**
    261+ * Function signature for the global logging callback. All bitcoin kernel
    262+ * internal logs will pass through this callback.
    263+ */
    264+typedef void (*kernel_LogCallback)(void* user_data, const char* message);
    


    laanwj commented at 1:23 pm on December 17, 2024:

    A general comment on the API: i’d prefer to pass (and receive) explicit lengths for strings instead of bare char*s.

    My experience with wrapping C APIs in rust is that it’s important to have a defined memory-range for strings and arrays. Relying on NUL-termination means that the memory size is effectively unrestricted, making it impossible to do some checks safely. This is (with lesser urgency) also true for other languages like Python that represent strings as pointer+length.

    As we internally use C++ strings and not C string APIs this seems straightforward to offer.


    TheCharlatan commented at 1:34 pm on December 17, 2024:
    Mmh, thanks for this. It should be easy to add a length parameter here.
  96. TheCharlatan force-pushed on Dec 17, 2024
  97. TheCharlatan commented at 2:58 pm on December 17, 2024: contributor

    Thank you for having a look @laanwj!

    Updated 9e203b460d8ab1d92949ab8714a9265c343a5eee -> 73acb3ff8a04cddc4904c446c2521dd2b2abc84d (kernelApi_11 -> kernelApi_12, compare)

    • Addressed @laanwj’s comment, added a size parameter to all functions taking a null terminated const char* string parameter. This is arguably safer, since we don’t completely rely on the null terminator for safety.
    • Removed ‘\n’ newlines in the log calls in the kernel wrapper calls.
    • Added the progress callback back again. I disabled it some time ago, and forgot to add it back again.
    • Use string views instead of const char* in the c++ wrapper.
  98. laanwj commented at 3:54 pm on December 17, 2024: member

    To make the doxygen documentation nicer to read, i’ve added grouping to the list of functions, and reordered a bit to make sure create is first and destroy always last within the group (if applicable), feel free to take over this patch:

    https://github.com/laanwj/bitcoin/commit/c222651aca4578857f5d432bd6ce221b5602ee38

  99. TheCharlatan force-pushed on Dec 17, 2024
  100. TheCharlatan commented at 5:40 pm on December 17, 2024: contributor

    Thanks for the doc suggestions @laanwj, I just moved a few functions to different places compared to your patch. Feel free to send me another me, if you think it still is not ideal. The groupings do look very nice in the docs now and also makes it a bit more straight forward to decide where to put new functions.

    Updated 73acb3ff8a04cddc4904c446c2521dd2b2abc84d -> 20eec64b5e417cac8c68100826c0adf2152a49eb (kernelApi_12 -> kernelApi_13, compare)

    • Applied @laanwj’s patch, introducing doc groupings for the header functions.
  101. stickies-v referenced this in commit d7c4348efe on Dec 18, 2024
  102. stickies-v referenced this in commit 576176ce79 on Dec 18, 2024
  103. stickies-v referenced this in commit 198280656b on Dec 18, 2024
  104. stickies-v referenced this in commit 1ed55b8071 on Dec 18, 2024
  105. stickies-v referenced this in commit 7fac90b9cc on Dec 18, 2024
  106. stickies-v referenced this in commit 9e5d92ac2f on Dec 18, 2024
  107. stickies-v referenced this in commit d624a26513 on Dec 18, 2024
  108. stickies-v referenced this in commit 7e4c76c9d8 on Dec 18, 2024
  109. stickies-v referenced this in commit 4103e2cb36 on Dec 18, 2024
  110. stickies-v referenced this in commit b7b739325c on Dec 18, 2024
  111. stickies-v referenced this in commit 1db5599b6d on Dec 18, 2024
  112. stickies-v referenced this in commit b42cf6ac98 on Dec 18, 2024
  113. in src/kernel/bitcoinkernel.h:904 in 20eec64b5e outdated
    912+ */
    913+bool BITCOINKERNEL_WARN_UNUSED_RESULT kernel_chainstate_manager_load_chainstate(
    914+    const kernel_Context* context,
    915+    const kernel_ChainstateLoadOptions* chainstate_load_options,
    916+    kernel_ChainstateManager* chainstate_manager
    917+) BITCOINKERNEL_ARG_NONNULL(1, 2, 3);
    


    stickies-v commented at 7:26 pm on December 18, 2024:

    When this function is called more than once, kernel crashes with an assertion error:

    0Assertion failed: (!m_ibd_chainstate), function InitializeChainstate, file validation.cpp, line 5655.
    

    The solutions I see atm:

    1. document that this function may only be called once for each chainman
    2. add a field to kernel_ChainstateManager* to keep track of it being loaded already, return false and log an error
    3. rework LoadChainstate logic to handle multiple calls gracefully
    4. remove kernel_chainstate_manager_load_chainstate altogether and load chainstate during kernel_chainstate_manager_create.

    I’m not sure if we really need a separate *_load_chainstate function, so if that’s true, then option 4. would probably be preferable? I implemented it in https://github.com/TheCharlatan/bitcoin/compare/kernelApi...stickies-v:bitcoin:kernel/remove-load-chainstate, but in practice this probably should be a rebase instead of an extra commit. Options 1. and 2. seem easy enough to implement too, 3. is probably not the most sensible.


    TheCharlatan commented at 10:10 pm on December 18, 2024:

    remove kernel_chainstate_manager_load_chainstate altogether and load chainstate during kernel_chainstate_manager_create.

    I would like this a lot, but I wanted to keep a separate chainstate load function in case we ever land a “blocks-only read-only” chainstate manager, where we don’t need to load any chainstates. I feel like making this a no-op could work, the simplest thing to do would probably be adding something along the lines of:

    0if (chainman.GetAll().size() > 0) return true;
    

    to kernel_chainstate_manager_load_chainstate. But then again it would move us closer to a correct by construction setup if we’d do the constructing and loading all at once.


    stickies-v commented at 1:05 pm on December 19, 2024:

    Thanks for updating to option 4. Just to summarize what we talked about offline:

    but I wanted to keep a separate chainstate load function in case we ever land a “blocks-only read-only” chainstate manager

    That makes sense with the current code organization, but I think we should aim to shift towards a more intuitive API over time. Operations that don’t require any chainstate (such as blocks-only read-only) probably shouldn’t use the chainman in the first place.

    But then again it would move us closer to a correct by construction setup

    I think that is a worthwhile design goal for the API.

  114. in src/kernel/bitcoinkernel.h:433 in 20eec64b5e outdated
    428+} kernel_BlockHash;
    429+
    430+/**
    431+ * Convenience struct for holding serialized data.
    432+ */
    433+typedef struct {
    


    laanwj commented at 1:36 am on December 19, 2024:

    i was wondering; these are trivially small structures, why don’t we pass and return them by value instead of by pointer? this would avoid needing a special kernel_byte_array_destroy call to deallocate them

    edit: never mind, of course that’s still necessary to deallocate the conents


    TheCharlatan commented at 9:11 am on December 19, 2024:

    of course that’s still necessary to deallocate the conents

    Yes, I try to match every call to new with a corresponding place to delete.

  115. TheCharlatan force-pushed on Dec 19, 2024
  116. TheCharlatan commented at 10:58 am on December 19, 2024: contributor

    Thanks for the suggestion @stickies-v, I think it is the right call.

    Updated 20eec64b5e417cac8c68100826c0adf2152a49eb -> f157b0cbc7d90075858a6522d13a7bc4f0b25a5f (kernelApi_13 -> kernelApi_14, compare)

    • Addressed @stickies-v’s comment, applying the suggestion for rolling chainstate loading into chainstate creation.
  117. stickies-v referenced this in commit 523dee4273 on Dec 19, 2024
  118. stickies-v referenced this in commit 514f22fc45 on Dec 19, 2024
  119. stickies-v referenced this in commit dec7ebd469 on Dec 19, 2024
  120. stickies-v referenced this in commit f9408aadad on Dec 19, 2024
  121. stickies-v referenced this in commit befc5e7a09 on Dec 19, 2024
  122. ismaelsadeeq commented at 9:03 pm on December 20, 2024: member
    Concept ACK
  123. in src/kernel/bitcoin-chainstate.cpp:17 in f157b0cbc7 outdated
    12+{
    13+    std::vector<unsigned char> bytes;
    14+
    15+    for (size_t i{0}; i < hex.length(); i += 2) {
    16+        std::string byteString{hex.substr(i, 2)};
    17+        unsigned char byte = (char)std::strtol(byteString.c_str(), nullptr, 16);
    


    laanwj commented at 1:18 pm on January 15, 2025:

    i would really prefer not to bring back use of strtol in C++ code; it has some known issues with locale-dependence (especially on Linux). what about:

     0#include <charconv>
     1...
     2std::vector<unsigned char> hex_string_to_char_vec(const std::string& hex)
     3{
     4    std::vector<unsigned char> bytes;
     5
     6    for (size_t i{0}; i < hex.length(); i += 2) {
     7        unsigned int val{0};
     8        auto [p, ec] = std::from_chars(hex.data() + i, hex.data() + i + 2, val, 16);
     9        if (ec == std::errc{} && p == hex.data() + i + 2) {
    10            bytes.push_back(val);
    11        }
    12    }
    13
    14    return bytes;
    15}
    

    from_chars is guaranteed to be locale-independent so doesn’t need an exception in the linter either. Same for the other use.

  124. in src/kernel/bitcoin-chainstate.cpp:9 in f157b0cbc7 outdated
    0@@ -0,0 +1,200 @@
    1+#include <kernel/bitcoinkernel_wrapper.h>
    2+
    3+#include <cassert>
    4+#include <filesystem>
    5+#include <iostream>
    6+#include <optional>
    7+#include <string>
    8+#include <string_view>
    9+#include <sstream>
    


    laanwj commented at 1:19 pm on January 15, 2025:
    missing #include <vector>
  125. DrahtBot added the label Needs rebase on Jan 15, 2025
  126. TheCharlatan force-pushed on Jan 15, 2025
  127. TheCharlatan commented at 9:00 pm on January 15, 2025: contributor

    Thanks for the suggestions @laanwj,

    Rebased f157b0cbc7d90075858a6522d13a7bc4f0b25a5f -> f25616bec485ee6a70e4b797758d4987d25a7c25 (kernelApi_14 -> kernelApi_15, compare)

    Updated f25616bec485ee6a70e4b797758d4987d25a7c25 -> 4dde75858a3b08f84d71176c7be14bae62020b1f (kernelApi_15 -> kernelApi_16, compare)

  128. DrahtBot removed the label Needs rebase on Jan 16, 2025
  129. TheCharlatan force-pushed on Jan 17, 2025
  130. TheCharlatan commented at 9:30 am on January 17, 2025: contributor

    Rebased 4dde75858a3b08f84d71176c7be14bae62020b1f -> 538671edce5813a62405b9bd5c50c39263c58435 (kernelApi_16 -> kernelApi_17, compare)

  131. TheCharlatan force-pushed on Jan 17, 2025
  132. kernel: Introduce initial kernel C header API
    As a first step, implement the equivalent of what was implemented in the
    now deprecated libbitcoinconsensus header. Also add a test binary to
    exercise the header and library.
    
    Unlike the deprecated libbitcoinconsensus the kernel library can now use
    the hardware-accelerated sha256 implementations thanks for its
    statically-initialzed context. The functions kept around for
    backwards-compatibility in the libbitcoinconsensus header are not ported
    over. As a new header, it should not be burdened by previous
    implementations. Also add a new error code for handling invalid flag
    combinations, which would otherwise cause a crash.
    
    The macros used in the new C header were adapted from the libsecp256k1
    header.
    
    To make use of the C header from C++ code, a C++ header is also
    introduced for wrapping the C header. This makes it safer and easier to
    use from C++ code.
    b630ad0a30
  133. kernel: Add logging to kernel library C header
    Exposing logging in the kernel library allows users to follow what is
    going on when using it. Users of the C header can use
    `kernel_logging_connection_create(...)` to pass a callback function to
    Bitcoin Core's internal logger. Additionally the level and severity can
    be globally configured.
    
    By default, the logger buffers messages until
    `kernel_loggin_connection_create(...)` is called. If the user does not
    want any logging messages, it is recommended that
    `kernel_disable_logging()` is called, which permanently disables the
    logging and any buffering of messages.
    0a35e1c593
  134. kernel: Add kernel library context object
    The context introduced here holds the objects that will be required for
    running validation tasks, such as the chosen chain parameters, callbacks
    for validation events, and an interrupt utility. These will be used in a
    few commits, once the chainstate manager is introduced.
    
    This commit also introduces conventions for defining option objects. A
    common pattern throughout the C header will be:
    ```
    options = object_option_create();
    object = object_create(options);
    ```
    This allows for more consistent usage of a "builder pattern" for
    objects where options can be configured independently from
    instantiation.
    a7db332299
  135. kerenl: Add chain params context option to C header
    As a first option, add the chainparams. For now these can only be
    instantiated with default values. In future they may be expanded to take
    their own options for regtest and signet configurations.
    
    This commit also introduces a unique pattern for setting the option
    values when calling the `*_set(...)` function.
    508607098d
  136. kernel: Add notifications context option to C header
    The notifications are used for notifying on connected blocks and on
    warning and fatal error conditions.
    
    The user of the C header may define callbacks that gets passed to the
    internal notification object in the
    `kernel_NotificationInterfaceCallbacks` struct. Each of the callbacks
    take a `user_data` argument that gets populated from the `user_data`
    value in the struct. It can be used to recreate the structure containing
    the callbacks on the user's side, or to give the callbacks additional
    contextual information.
    1aac7094f2
  137. kernel: Add chainstate manager object to C header
    This is the main driver class for anything validation related, so expose
    it here.
    
    Creating the chainstate manager and block manager options will currently
    also trigger the creation of their respectively configured directories.
    
    The chainstate manager and block manager options were not consolidated
    into a single object, since the kernel might eventually introduce a
    block manager object for the purposes of being a light-weight block
    store reader.
    
    The chainstate manager will associate with the context with which it was
    created for the duration of its lifetime. It is only valid if that
    context remains in memory too.
    
    The tests now also create dedicated temporary directories. This is
    similar to the behaviour in the existing unit test framework.
    b409435b95
  138. kernel: Add chainstate manager option for setting worker threads
    Re-use the same pattern used for the context options. This allows users
    to set the number of threads used in the validation thread pool.
    aa6303f75c
  139. Kernel: Add chainstate loading to kernel C header
    The `kernel_chainstate_manager_load_chainstate(...)` function is the
    final step required to prepare the chainstate manager for future tasks.
    Its main responsibility is loading the coins and block tree indexes.
    
    Though its `context` argument is not strictly required this was added to
    ensure that the context remains in memory for this operation. This
    pattern of a "dummy" context will be re-used for functions introduced in
    later commits.
    
    The chainstate load options will be populated over the next few commits.
    e8ad76d273
  140. kernel: Add block validation to C header
    The added function allows the user process and validate a given block
    with the chainstate manager. The *_process_block(...) function does some
    preliminary checks on the block before passing it to
    `ProcessNewBlock(...)`. These are similar to the checks in the
    `submitblock()` rpc.
    
    Richer processing of the block validation result will be made available
    in the following commits through the validation interface.
    
    The commits also adds a utility for serializing a `CBlock`
    (`kernel_block_create()`) that may then be passed to the library for
    processing.
    
    The tests exercise the function for both mainnet and regtest. The
    commit also adds the data of 206 regtest blocks (some blocks also
    contain transactions).
    74b080676c
  141. kernel: Add options for reindexing in C header
    Adds options for wiping the chainstate and block tree indexes to the
    chainstate load options. In combination and once the
    `*_import_blocks(...)` function is added in a later commit, this
    triggers a reindex. For now, it just wipes the existing data.
    a93e639f84
  142. kernel: Add chainstate load options for in-memory dbs in C header
    This allows a user to run the kernel without creating on-disk files for
    the block tree and chainstate indexes. This is potentially useful in
    scenarios where the user needs to do some ephemeral validation
    operations.
    
    One specific use case is when linearizing the blocks on disk. The block
    files store blocks out of order, so a program may utilize the library
    and its header to read the blocks with one chainstate manager, and then
    write them back in order, and without orphans, with another chainstate
    maanger. To save disk resources and if the indexes are not required once
    done, it may be beneficial to keep the indexes in memory for the
    chainstate manager that writes the blocks back again.
    103dede03a
  143. kernel: Add import blocks function to C header
    The `kernel_import_blocks` function is used to both trigger a reindex,
    if the indexes were previously wiped through the chainstate load
    options, or import the block data of a single block file.
    
    The behaviour of the import can be verified through the test logs.
    9f52e78332
  144. kernel: Add interrupt function to C header
    Calling interrupt can halt long-running functions associated with
    objects that were created through the passed-in context.
    880ba7da99
  145. kernel: Add validation interface to C header
    This adds the infrastructure required to process validation events. For
    now the external validation interface only has support for the
    `BlockChecked` callback, but support for the other internal validation
    interface methods can be added in the future.
    
    The validation interface follows an architecture for defining its
    callbacks and ownership that is similar to the notifications.
    
    The task runner is created internally with a context, which itself
    internally creates a unique ValidationSignals object. When the user
    creates a new chainstate manager the validation signals are internally
    passed to the chainstate manager through the context.
    
    The callbacks block any further validation execution when they are
    called. It is up to the user to either multiplex them, or use them
    otherwise in a multithreaded mechanism to make processing the validation
    events non-blocking.
    
    A validation interface can register for validation events with a
    context. Internally the passed in validation interface is registerd with
    the validation signals of a context.
    
    The BlockChecked callback introduces a seperate type for a non-owned
    block. Since a library-internal object owns this data, the user needs to
    be explicitly prevented from deleting it. In a later commit a utility
    will be added to copy its data.
    8cf6a580bf
  146. kernel: Add functions for the block validation state to C header
    These allow for the interpretation of the data in a `BlockChecked`
    validation interface callback. This is useful to get richer information
    in case a block failed to validate.
    6c28923087
  147. kernel: Add function for copying block data to C header
    This adds functions for copying serialized block data into a user-owned
    variable-sized byte array.
    
    Use it in the tests for verifying the implementation of the validation
    interface's `BlockChecked` method.
    8ba2d7d2cf
  148. kernel: Add functions to read block from disk to C header
    This adds functions for reading a block from disk with a retrieved block
    index entry. External services that wish to build their own index, or
    analyze blocks can use this to retrieve block data.
    
    The block index can now be traversed from the tip backwards. This is
    guaranteed to work, since the chainstate maintains an internal block
    tree index in memory and every block (besides the genesis) has an
    ancestor.
    
    The user can use this function to iterate through all blocks in the
    chain (starting from the tip). Once the block index entry for the
    genesis block is reached a nullptr is returned if the user attempts to
    get the previous entry.
    d19a790c69
  149. kernel: Add function to read block undo data from disk to C header
    This adds functions for reading the undo data from disk with a retrieved
    block index entry. The undo data of a block contains all the spent
    script pubkeys of all the transactions in a block.
    
    In normal operations undo data is used during re-orgs. This data might
    also be useful for building external indexes, or to scan for silent
    payment transactions.
    
    Internally the block undo data contains a vector of transaction undo
    data which contains a vector of the spent outputs. For this reason, the
    `kernel_get_block_undo_size(...)` function is added to the header for
    retrieving the size of the transaction undo data vector, as well as the
    `kernel_get_transaction_undo_size(...) function for retrieving the size
    of each spent outputs vector contained within each transaction undo data
    entry. With these two sizes the user can iterate through the undo data
    by accessing the transaction outputs by their indeces with
    `kernel_get_undo_output_by_index`. If an invalid index is passed in, the
    `kernel_ERROR_OUT_OF_BOUNDS` error is returned again.
    
    The returned `kernel_TransactionOutput` is entirely owned by the user
    and may be destroyed with the `kernel_transaction_output_destroy(...)`
    convenience function.
    5375d3ada7
  150. kernel: Add block index utility functions to C header
    Adds further functions useful for traversing the block index and
    retrieving block information.
    
    This includes getting the block height and hash.
    5c5661b020
  151. kernel: Add functions to get the block hash from a block
    This is useful for a host block processing feature where having an
    identifier for the block is needed. Without this, external users need to
    serialize the block and calculate the hash externally, which is less
    efficient.
    e4af6d4113
  152. kernel: Add pure kernel bitcoin-chainstate
    This showcases a re-implementation of bitcoin-chainstate only using the
    kernel C++ API header.
    01a43b2443
  153. TheCharlatan force-pushed on Jan 17, 2025
  154. TheCharlatan commented at 7:55 am on January 18, 2025: contributor

    Rebased 538671edce5813a62405b9bd5c50c39263c58435 -> 01a43b24436e0aed7b8f79d3857630a4bf6a0545 (kernelApi_17 -> kernelApi_18, compare)

    • Get functional test fix from #31675

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-21 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me