kernel: handle null or empty directories in implementation #33867

pull stickies-v wants to merge 1 commits into bitcoin:master from stickies-v:2025-11/kernel-datadir-nullptr changing 4 files +39 −8
  1. stickies-v commented at 10:43 pm on November 12, 2025: contributor

    An empty path may be represented with a nullptr. For example, std::string_view{}.data() may return nullptr.

    Removes the BITCOINKERNEL_ARG_NONNULL attribute for btck_chainstate_manager_options_create ’s data_directory parameter, and instead handles such null arguments in the implementation. Because an empty path is meaningless, btck_chainstate_manager_options_create now treats both null and empty directories as invalid, tightening the interface.

    Also documents how BITCOINKERNEL_ARG_NONNULL should be used.

    Follow-up to #33853#pullrequestreview-3454620265

  2. DrahtBot added the label Validation on Nov 12, 2025
  3. DrahtBot commented at 10:43 pm on November 12, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33867.

    Reviews

    See the guideline for information on the review process.

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • associate -> be associated [The phrase “the created options and through it the chainstate manager will associate with this kernel context” is ungrammatical; “be associated” makes the passive relation clear.]

    drahtbot_id_5_m

  4. in src/kernel/bitcoinkernel.h:937 in 1dee29ef47 outdated
    943@@ -933,10 +944,10 @@ BITCOINKERNEL_API const btck_BlockHash* BITCOINKERNEL_WARN_UNUSED_RESULT btck_bl
    944  * @brief Create options for the chainstate manager.
    945  *
    946  * @param[in] context          Non-null, the created options and through it the chainstate manager will
    947-                               associate with this kernel context for the duration of their lifetimes.
    948- * @param[in] data_directory   Non-null, path string of the directory containing the chainstate data.
    


    maflcko commented at 7:58 am on November 13, 2025:

    I don’t think this change is right. The path can not be null or empty, because that will fail later on with:

    0unknown location(0): fatal error: in "fs_tests/fsbridge_stem": std::filesystem::__cxx11::filesystem_error: filesystem error: cannot make absolute path: Invalid argument []
    

    stickies-v commented at 11:48 am on November 13, 2025:

    I can’t seem to reproduce that, e.g. these tests all pass for me, indicating that an empty or null path resolves to fs::path(), which then expands to fs::current_path()? How did you get that fatal error?

    0    BOOST_CHECK_EQUAL(fs::PathFromString(""), fs::path());
    1    BOOST_CHECK_EQUAL(fs::PathFromString({nullptr, 0}), fs::path());
    2    BOOST_CHECK_EQUAL(fs::canonical(fs::path()), fs::current_path());
    

    maflcko commented at 12:10 pm on November 13, 2025:

    How did you get that fatal error?

    It is hidden in the error message, but it says “cannot make absolute path”.

    Did you actually call the full kernel function here, including:

    0        fs::path abs_data_dir{fs::absolute(fs::PathFromString({data_dir, data_dir_len}))};
    1        fs::create_directories(abs_data_dir);
    

    In any case, even if it doesn’t crash, I wonder what the valid meaning is here, of passing an empty string?


    stickies-v commented at 3:17 pm on November 13, 2025:

    Did you actually call the full kernel function here, including:

    Yeah, these pass for me too. And I can’t find any documentation that any of this would be UB or lead to errors. Hence my confusion about how you got your error message.

    0    fs::path path{fs::absolute(fs::PathFromString({nullptr, 0}))};
    1    fs::create_directories(path);
    2    path = fs::absolute(fs::PathFromString(""));
    3    fs::create_directories(path);
    

    I wonder what the valid meaning is here, of passing an empty string?

    I’m not sure there’s going to be a conclusive answer, here. I’m really not sure myself. I think intuitively, passing a nullptr or empty string should be equivalent. The least surprising behaviour is probably for that to mean “no directory passed, so use the default directory”, which then probably would be mean the cwd, which is the current behaviour of this PR. I have force-pushed a doc update to make this explicit.

    I would not be opposed to failing and returning nullptr when a null or empty string is passed. It’s just more opinionated, and it doesn’t seem necessary for kernel to work well, hence the current approach. If reviewers prefer this, please leave your thoughts and I’m happy to push an update.


    maflcko commented at 3:23 pm on November 13, 2025:

    Yeah, these pass for me too. And I can’t find any documentation that any of this would be UB or lead to errors.

    Ah, I see you are using libc++, which shows a different behavior:

    https://godbolt.org/z/97aer9jde


    maflcko commented at 3:34 pm on November 13, 2025:

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90299#c3:

    And for the avoidance of doubt, the problem is not that !exists(p) (although that is true) but that an empty path doesn’t refer to any file system location. absolute(“does not exist”) is not an error, but absolute("") is.

    There is no absolute representation of the empty path, it’s a category error.


    stickies-v commented at 4:58 pm on November 13, 2025:

    Ah, thanks for those 2 links, that’s helpful. I’ve been digging through cppreference and struggle to find a definitive explanation on what std::filesystem::absolute(std::filesystem::path()) represents or why it’s (il)legal, and so far haven’t been able to find anything, so I just went with the behaviour of my implementation. I can see the viewpoint that an empty path does not exist in the filesystem, and thus should not have an absolute path associated with it.

    Since it’s (not unreasonably) causing issues on some platform, imo that tips the scale in favour of more strict input validation. I’ll update the approach to not allow nullptr or empty paths, and force-push shortly.

    Thanks for your help and keen observation.

  5. in src/kernel/bitcoinkernel.cpp:899 in 1dee29ef47 outdated
    895@@ -896,6 +896,10 @@ btck_BlockValidationResult btck_block_validation_state_get_block_validation_resu
    896 
    897 btck_ChainstateManagerOptions* btck_chainstate_manager_options_create(const btck_Context* context, const char* data_dir, size_t data_dir_len, const char* blocks_dir, size_t blocks_dir_len)
    898 {
    899+    if ((data_dir == nullptr && data_dir_len > 0) || (blocks_dir == nullptr && blocks_dir_len > 0)) {
    


    yancyribbens commented at 12:13 pm on November 13, 2025:
    In what situation would data_dir == nullptr and yet data_dir_len does not equal zero? In other-words, if data_dir is null and data_dir_len is 1 then continue?

    stickies-v commented at 3:18 pm on November 13, 2025:

    In what situation would data_dir == nullptr and yet data_dir_len does not equal zero?

    This is an invalid state and would clearly be a logic/programmer error by the caller, hence the check to catch it and return early instead of continuing.


    yancyribbens commented at 3:57 pm on November 13, 2025:
    This case wouldn’t return early though, even though it’s an invalid state.

    stickies-v commented at 4:25 pm on November 13, 2025:
    Hmm? It seems to me like we capture that in if ((data_dir == nullptr && data_dir_len > 0) || ...) {, after which we return early?

    stickies-v commented at 5:35 pm on November 13, 2025:
    Marking as resolved as this code is now outdated anyway.

    yancyribbens commented at 7:51 pm on November 13, 2025:
    I was pointing out you where using && when it should have been || since data_dir == nullptr && data_dir_len > 0 will not return early if data_dir is null but data_dir_len is greater than zero. Looks like you made that change though :)

    stickies-v commented at 0:45 am on November 14, 2025:

    will not return early if data_dir is null but data_dir_len is greater than zero

    that’s literally English for data_dir == nullptr && data_dir_len > 0 evaluating to true, therefore satisfying the if condition, therefore triggering the early return.

    you where using && when it should have been ||

    Sorry, but… no. If the statement was if ((data_dir == nullptr || data_dir_len > 0) || (blocks_dir == nullptr || blocks_dir_len > 0)) as you suggest, we’d do an early nullptr return for all non-empty strings, which is wrong.


    yancyribbens commented at 11:31 am on November 14, 2025:

    that’s literally English for data_dir == nullptr && data_dir_len > 0 evaluating to true, therefore satisfying the if condition, therefore triggering the early return.

    The && requires both of conditions to be true to process the early return branch? The way you had it written before is confusing because if the ptr is null, it is a logical conclusion that data_dir_len should also be zero. So, either data_dir == nullptr alone is enough, or data_dir == nullptr || data_dir_len == 0 to be pedantic. I think that would be a interface error with the bindings that should never happen where data_dir is null but data_dir_len is greater than zero?

    Sorry, but… no. If the statement was if ((data_dir == nullptr || data_dir_len > 0) || (blocks_dir == nullptr || blocks_dir_len > 0)) as you suggest, we’d do an early nullptr return for all non-empty strings, which is wrong.

    I was suggesting separate all conditions with || as you’ve done ( a || b || c || d), or at least that’s what I had in mind. Apologies if that wasn’t clear by my earlier suggestion. Also, I did not mean data_dir_len > 0 but rather data_dir_len == 0 as you’ve done. I don’t think it is required, and you could also simply say data_dir == nullptr || blocks_dir == nullptr?

    I’’m commenting here to further my understanding and to verify the logical condition, not because I consider myself an expert on Rust/C bindings in any sense.


    stickies-v commented at 12:12 pm on November 14, 2025:

    The way you had it written before is confusing because if the ptr is null, it is a logical conclusion that data_dir_len should also be zero.

    should being the keyword here indeed. A non-zero length with a nullptr indicates the user clearly made an error, regardless of business logic. We need to return early in order to not cause any UB later on, and because the request is nonsensical. In the earlier approach on which you commented (0ae6a22a9f44b7e609431b9bac8bcf1b41d9a977), nullptr (but only with zero length) was allowed because it was not causing any errors on my machine, defaulting to the cwd. However, this was later proven a bad idea, hence the change in approach to now (6657bcbdb4d0359c1843ca31fb3670c7c0c260d5) disallow null and empty input in general.

    For context, in https://github.com/bitcoin/bitcoin/pull/33853/commits/5b89956eeb76cf8c9717152fbb0928e026fc0087 (to which this PR is a follow-up) we’re distinguishing between “this call just doesn’t make any sense, what are you doing?” and “unfortunately we can’t produce a result with null input” - even though they both lead to a nullptr result. The distinction is just to let the appropriate code deal with each situation, even if it could be simplified.

    I don’t think it is required, and you could also simply say data_dir == nullptr || blocks_dir == nullptr?

    In the current approach (6657bcbdb4d0359c1843ca31fb3670c7c0c260d5), we also want to disallow empty strings (""), so just checking for nullptr wouldn’t suffice, we need the length check too for that.

    I’’m commenting here to further my understanding and to verify the logical condition, not because I consider myself an expert on Rust/C bindings in any sense.

    I appreciate you taking the time to review this PR.


    yancyribbens commented at 1:14 pm on November 14, 2025:

    A non-zero length with a nullptr indicates the user clearly made an error, regardless of business logic

    This would occur not from a mistake made by the API consumer, but an error with the binding logic (in rust it would be an error with cc/bindgen) I think. That is to say, a rather improbable condition imo.

    In the earlier approach on which you commented (https://github.com/bitcoin/bitcoin/commit/0ae6a22a9f44b7e609431b9bac8bcf1b41d9a977), nullptr (but only with zero length) was allowed because it was not causing any errors on my machine, defaulting to the cwd. However, this was later #33867 (review), hence the change in approach to now (https://github.com/bitcoin/bitcoin/commit/6657bcbdb4d0359c1843ca31fb3670c7c0c260d5) disallow null and empty input in general.

    I see. so ptr could be null and yet empty string with length zero was allowed since it would resolve to . (sometimes). It might be possible get around the system ambiguity by using . for the path when empty but I agree with the current approach to just treat it as ambiguous and return.

    In the current approach (https://github.com/bitcoin/bitcoin/commit/6657bcbdb4d0359c1843ca31fb3670c7c0c260d5), we also want to disallow empty strings (""), so just checking for nullptr wouldn’t suffice, we need the length check too for that.

    For arguments sake, a check for both null and empty could be done without checking length. Or return unconditionally if the length is zero for either data_dir or nullptr.

    I appreciate you taking the time to review this PR.

    Happy to review. I appreciate the work done so far on these C bindings. I’ve also been experimenting with the Rust bindings and am impressed with the progress made.


    stickies-v commented at 1:43 pm on November 14, 2025:

    This would occur not from a mistake made by the API consumer, but an error with the binding logic

    Not all users will use language bindings, and regardless, I don’t think the distinction between “direct usage” and “wrapped usage” is meaningful here. Our implementation introduces UB when a non-zero length is used with a nullptr, so our implementation is responsible for preventing the UB.

    a check for both null and empty could be done without checking length.

    How? In the API, a string is defined as {ptr, ptr+len}. Being empty means a size of 0, and size depends on len. We could introduce the requirement for strings to be null-terminated and remove the _len parameters, but that is intentionally not the design of the API.


    yancyribbens commented at 4:49 pm on November 14, 2025:

    Not all users will use language bindings, and regardless, I don’t think the distinction between “direct usage” and “wrapped usage” is meaningful here. Our implementation introduces UB when a non-zero length is used with a nullptr, so our implementation is responsible for preventing the UB.

    Makes sense

    How? In the API, a string is defined as {ptr, ptr+len}. Being empty means a size of 0, and size depends on len. We could introduce the requirement for strings to be null-terminated and remove the _len parameters, but that is intentionally not the design of the API.

    I see what you mean here that null ptr and a ptr with a len of zero are two different things and it’s decided to return early now in both cases. My intuition is that null ptr would also have a length of zero however that would be a case of UB if ptr is null and len is not zero that should be guarded against. Thanks for entertaining my questions.


    yancyribbens commented at 6:08 pm on November 14, 2025:
    However, there really isn’t a way to truly guard against UB here since a len could be inaccurate (greater than string length). Nothing that can really be done in that case though I guess.

    stickies-v commented at 11:30 am on November 17, 2025:
    Yeah, I don’t think we can completely prevent UB here, just do the best and least surprising thing we can.
  6. stickies-v force-pushed on Nov 13, 2025
  7. ?
    added_to_project_v2 sedited
  8. ?
    project_v2_item_status_changed sedited
  9. stickies-v force-pushed on Nov 13, 2025
  10. stickies-v renamed this:
    kernel: allow null data_directory
    kernel: handle null or empty directories in implementation
    on Nov 13, 2025
  11. stickies-v commented at 5:38 pm on November 13, 2025: contributor
    Force-pushed to disallow empty and null directories, addressing feedback from @maflcko both wrt technical concerns and conceptual concerns. Also updated PR title and description.
  12. in src/kernel/bitcoinkernel.h:960 in 0ae6a22a9f outdated
    961     const btck_Context* context,
    962     const char* data_directory,
    963     size_t data_directory_len,
    964     const char* blocks_directory,
    965-    size_t blocks_directory_len) BITCOINKERNEL_ARG_NONNULL(1, 2);
    966+    size_t blocks_directory_len) BITCOINKERNEL_ARG_NONNULL(1);
    


    maflcko commented at 5:51 pm on November 13, 2025:
    nit: could add back the attributes for (1,2,4)?

    stickies-v commented at 6:07 pm on November 13, 2025:

    Wouldn’t that introduce UB for e.g. the tests that exercise the null path? According to the docs:

    The compiler may also perform optimizations based on the knowledge that nonnull parameters cannot be null.

    I thought we removed the attribute for the same reason in #33853 ?


    maflcko commented at 4:46 pm on November 19, 2025:
    Ah, makes sense. Passing null to nonnull can lead to UB, not to be confused with passing null to _Nonnull (https://clang.llvm.org/docs/AttributeReference.html#nonnull), which doesn’t lead to UB :sweat_smile:
  13. in src/test/kernel/test_kernel.cpp:633 in 0ae6a22a9f
    624@@ -625,6 +625,20 @@ BOOST_AUTO_TEST_CASE(btck_chainman_tests)
    625         ChainstateManagerOptions chainman_opts{context, test_directory.m_directory.string(), (test_directory.m_directory / "blocks").string()};
    626         ChainMan chainman{context, chainman_opts};
    627     }
    628+    { // null or empty data_directory or blocks_directory are not allowed
    629+        Context context{};
    630+        auto valid_dir{test_directory.m_directory.string()};
    631+        std::vector<std::pair<std::string, std::string>> illegal_cases{
    632+            {"", valid_dir},
    633+            {valid_dir, {nullptr, 0}},
    


    maflcko commented at 5:53 pm on November 13, 2025:
    nit: I think you are just testing the stdlib string constructor. Probably want to drop those, or use string_view?

    stickies-v commented at 6:19 pm on November 13, 2025:
    Good catch, thanks. I’ve also updated the ChainstateManagerOptions ctor to take a string_view so we can properly exercise both the null and empty test case.
  14. maflcko approved
  15. maflcko commented at 5:57 pm on November 13, 2025: member
    lgtm
  16. kernel: allow null data_directory
    An empty path may be represented with a nullptr. For example,
    std::string_view::data() may return nullptr.
    
    Removes the BITCOINKERNEL_ARG_NONNULL attribute for data_directory,
    and instead handles such null arguments in the implementation.
    
    Also documents how BITCOINKERNEL_ARG_NONNULL should be used.
    6657bcbdb4
  17. stickies-v force-pushed on Nov 13, 2025
  18. DrahtBot added the label CI failed on Nov 13, 2025
  19. DrahtBot removed the label CI failed on Nov 13, 2025
  20. sedited approved
  21. sedited commented at 10:00 pm on November 13, 2025: contributor
    ACK 6657bcbdb4d0359c1843ca31fb3670c7c0c260d5
  22. stringintech commented at 3:11 pm on November 16, 2025: contributor
    ACK 6657bcb
  23. in src/kernel/bitcoinkernel.h:947 in 6657bcbdb4
    943@@ -933,19 +944,20 @@ BITCOINKERNEL_API const btck_BlockHash* BITCOINKERNEL_WARN_UNUSED_RESULT btck_bl
    944  * @brief Create options for the chainstate manager.
    945  *
    946  * @param[in] context          Non-null, the created options and through it the chainstate manager will
    947-                               associate with this kernel context for the duration of their lifetimes.
    948- * @param[in] data_directory   Non-null, path string of the directory containing the chainstate data.
    949- *                             If the directory does not exist yet, it will be created.
    950- * @param[in] blocks_directory Non-null, path string of the directory containing the block data. If
    951- *                             the directory does not exist yet, it will be created.
    952+ *                             associate with this kernel context for the duration of their lifetimes.
    


    janb84 commented at 4:16 pm on November 19, 2025:

    DrahtBot NIT:

    0 *                             be associated with this kernel context for the duration of their lifetimes.
    
  24. janb84 commented at 4:17 pm on November 19, 2025: contributor

    ACK 6657bcbdb4d0359c1843ca31fb3670c7c0c260d5

    PR Adds logic to handle null pointers and/or empty directories in the chainstate manager. + extends tests and documentation

    “Also documents how BITCOINKERNEL_ARG_NONNULL should be used.” I would say this also explains how to interpret it. Without this additional info seeing a parameter being described as “non-null” but not included in the Macro. The extra documentation helps with the understanding.

    This change has no impact downstream (at least for my .Net impl.)

    Have highlighted the Drahbot suggested NIT so that we do not forget it

  25. maflcko commented at 4:47 pm on November 19, 2025: member

    review ACK 6657bcbdb4d0359c1843ca31fb3670c7c0c260d5 🐪

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: review ACK 6657bcbdb4d0359c1843ca31fb3670c7c0c260d5 🐪
    3o9fQ2TExnTfLCGFZ5F6eIuvT03Mhg8Fll/YRLXRW+rNQIyHj7VXPwc0LUxxJ6SVA2huNmyJqDzE/BN7Iez7yDA==
    
  26. achow101 commented at 0:15 am on November 20, 2025: member
    ACK 6657bcbdb4d0359c1843ca31fb3670c7c0c260d5
  27. achow101 merged this on Nov 20, 2025
  28. achow101 closed this on Nov 20, 2025

  29. ?
    project_v2_item_status_changed github-project-automation[bot]
  30. stickies-v deleted the branch on Nov 20, 2025
  31. stringintech referenced this in commit 52fffa1e5d on Nov 24, 2025
  32. alexanderwiederin referenced this in commit 7334556a3f on Dec 1, 2025
  33. yuvicc referenced this in commit 4a77a3b8eb on Dec 16, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-01-10 03:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me