util: Abort on failing CHECK_NONFATAL in debug builds #32588

pull maflcko wants to merge 3 commits into bitcoin:master from maflcko:2505-abort-debug-check-nonfatal changing 8 files +95 −29
  1. maflcko commented at 1:29 pm on May 22, 2025: member

    A failing CHECK_NONFATAL will throw an exception. This is fine and even desired in production builds, because the program may catch the exception and give the user a way to easily report the bug upstream.

    However, in debug development builds, exceptions for internal bugs are problematic:

    • The exception could accidentally be caught and silently ignored
    • The exception does not include a full stacktrace, possibly making debugging harder

    Fix all issues by turning the exception into an abort in debug builds.

    This can be tested by reverting the hunks to src/rpc/node.cpp and test/functional/rpc_misc.py and then running the functional or fuzz tests.

  2. DrahtBot commented at 1:29 pm on May 22, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32588.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK ryanofsky, achow101

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. DrahtBot renamed this:
    util: Abort on failing CHECK_NONFATAL in debug builds
    util: Abort on failing CHECK_NONFATAL in debug builds
    on May 22, 2025
  4. DrahtBot added the label Utils/log/libs on May 22, 2025
  5. maflcko force-pushed on May 22, 2025
  6. maflcko force-pushed on May 22, 2025
  7. DrahtBot added the label CI failed on May 22, 2025
  8. DrahtBot commented at 2:34 pm on May 22, 2025: contributor

    🚧 At least one of the CI tasks failed. Task multiprocess, i686, DEBUG: https://github.com/bitcoin/bitcoin/runs/42714583822 LLM reason (✨ experimental): The CI failure is due to the “rpc_tests” subprocess aborting during the test execution.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  9. DrahtBot removed the label CI failed on May 22, 2025
  10. ryanofsky commented at 7:16 pm on May 22, 2025: contributor

    Concept ACK. Nice idea, and it does seem useful to have a macro checking for unexpected but not very serious conditions by throwing an exception that gets reported in release builds but is a fatal error in debug builds. And current uses of the macro seem like good candidates for that behavior.

    The only possible issues I see are:

    (1) The name CHECK_NONFATAL doesn’t make a lot of sense anymore, now triggering fatal errors when it literally says “nonfatal” in the name. (2) It is now more cumbersome to write unit tests checking for these conditions since they require a release build to run.

    Both could be addressed in followups. Issue (2) could be addressed by having a g_abort_hook or similar hook allowing specific unit tests to write custom code to check for these errors if they want. (This could also be used to replace the g_debug_lockorder_abort variable which does something similar.)

    IMO, issue (1) would be nice to address by coming up with a better designed set of checking macros and starting to use them. I think it could be good to have a:

    • CHECK to check conditions and abort if false in all builds
    • DCHECK to do the same but be compiled out of release builds,
    • CHECK_LOG to log an “internal bug detected please report” type log message in release builds, and abort in debug builds
    • CHECK_THROW to throw an exception in release builds, and abort in log builds.

    Then, current assert uses could become CHECK, current Assume uses in hotspots could become DCHECK, majority of other Assume uses could become CHECK_LOG, and current CHECK_NONFATAL uses could become CHECK_THROW.

    Just a thought though. Maybe current names are not a real problem, and naming shouldn’t block this PR in any case.

  11. maflcko force-pushed on May 23, 2025
  12. maflcko commented at 6:34 am on May 23, 2025: member

    CHECK_THROW

    I don’t think this solves issue (1). Instead of NONFATAL being inaccurately named in debug builds, it will now be THROW, because the check is neither nonfatal nor throwing in debug builds.

    Then, current assert uses could become CHECK, current Assume uses in hotspots could become DCHECK, majority of other Assume uses could become CHECK_LOG, and current CHECK_NONFATAL uses could become CHECK_THROW.

    No objection, just mentioning that this would be a larger diff (including link-time changes, which are for some reason more involved in this area (https://github.com/bitcoin/bitcoin/pull/26688#issuecomment-1359622072, #32543 (review))), so a separate discussion/issue/pull seems better.

    cumbersome to write unit tests

    Thx, pushed a commit to fix this.

  13. in src/rpc/node.cpp:299 in fae8c02268 outdated
    293@@ -295,10 +294,6 @@ static RPCHelpMan echo(const std::string& name)
    294                 RPCExamples{""},
    295         [&](const RPCHelpMan& self, const JSONRPCRequest& request) -> UniValue
    296 {
    297-    if (request.params[9].isStr()) {
    298-        CHECK_NONFATAL(request.params[9].get_str() != "trigger_internal_bug");
    


    ryanofsky commented at 8:27 pm on June 2, 2025:

    In commit “util: Abort on failing CHECK_NONFATAL in debug builds” (fae8c02268c11f2ad6165b6437b3e4846babfaf5)

    IMO, it would be better not to drop this test coverage so that we can verify that the RPC code catching the NonFatalCheckError exception and turning into a JSON response works.

    The commit message mentions Assume and Assert don’t have test coverage either, but I don’t see why they shouldn’t have it.

    It seems like it would be pretty easy to keep the test coverage here either by declaring a g_detail_test_only_CheckFailuresAreExceptionsNotAborts variable or by throwing an explicit NonFatalCheckError exception.

    If we really do want to drop the test coverage, that also seems ok, but dropping it doesn’t seem required.


    maflcko commented at 5:38 am on June 3, 2025:

    If we really do want to drop the test coverage, that also seems ok, but dropping it doesn’t seem required.

    Yeah, I really want to drop this one. The other reason is that a corresponding non-type-safe catch needs to be maintained in src/test/fuzz/rpc.cpp, which was historically brittle and I expect will be in the future.

    If someone cares about test coverage, a self-contained minimal unit test may be best here. Happy to review such a follow-up.


    maflcko commented at 4:41 pm on July 22, 2025:
    restored this, and extended the workaround in the fuzz test, and wrote unit tests.

    ryanofsky commented at 6:56 pm on July 23, 2025:

    re: #32588 (review)

    restored this, and extended the workaround in the fuzz test, and wrote unit tests.

    Changes look nice, but to be sure there still isn’t a test that replaces the python test, explicitly passing a “trigger_internal_bug” argument to the echo RPC and ensuring that it returns an “Internal bug detected” message in response. The fuzz test handles this case, but doesn’t try to trigger it, so won’t fail if the CHECK_NONFATAL` line is removed from the echo RPC.

    Could declare a test_only_CheckFailuresAreExceptionsNotAborts instance inside the echo RPC to keep the python test intact (though would need need to make the global atomic). Or could add a simple unit test replacing the python test. Dropping the coverage also seems ok if you prefer.

  14. achow101 commented at 8:28 pm on June 2, 2025: member
    Concept ACK
  15. ryanofsky approved
  16. ryanofsky commented at 8:44 pm on June 2, 2025: contributor

    Code review ACK faae9a294798c6808ec82f827385d920f8d697cf. I think this is a good change. It makes sense conceptually to have check macros that always abort in debug builds, but do different things depending on cost of the check & severity of the error in release builds.

    re: #32588 (comment)

    I don’t think this solves issue (1). Instead of NONFATAL being inaccurately named in debug builds, it will now be THROW, because the check is neither nonfatal nor throwing in debug builds.

    IMO it does solve it, because the current issue is that the macro is literally doing the thing its name says it will not do (trigger a fatal error). By contrast, I don’t think it is a problem for function name to just describe its primary purpose and not everything else it may do. No need to solve everything here though. Current PR seems like a step forward.

  17. DrahtBot requested review from achow101 on Jun 2, 2025
  18. maflcko force-pushed on Jul 22, 2025
  19. in src/test/fuzz/rpc.cpp:388 in fa68bda5f8 outdated
    380@@ -381,6 +381,12 @@ FUZZ_TARGET(rpc, .init = initialize_rpc)
    381         arguments.push_back(ConsumeRPCArgument(fuzzed_data_provider, good_data));
    382     }
    383     try {
    384+        std::unique_ptr<test_only_CheckFailuresAreExceptionsNotAborts> maybe_mock{};
    385+        if (rpc_command == "echo") {
    386+            // Avoid aborting fuzzing for this specific test-only RPC with an
    387+            // intentional trigger_internal_bug
    388+            maybe_mock = std::make_unique<test_only_CheckFailuresAreExceptionsNotAborts>();
    


    ryanofsky commented at 6:40 pm on July 23, 2025:

    In commit “util: Abort on failing CHECK_NONFATAL in debug builds” (fa68bda5f82887f95cc0313385c8e3a97eaa9967)

    Could maybe make this less verbose by changing maybe_mock from std::unique_ptr to std::optional and using maybe_mock.emplace(). But maybe verbosity is the point here, seems fine regardless

  20. ryanofsky approved
  21. ryanofsky commented at 6:59 pm on July 23, 2025: contributor
    Code review ACK fa68bda5f82887f95cc0313385c8e3a97eaa9967. Looks good! Since last review added some more test coverage and reverted some unneeded changes.
  22. refactor: Set G_ABORT_ON_FAILED_ASSUME when G_FUZZING_BUILD
    This does not change behavior, but documents that
    G_ABORT_ON_FAILED_ASSUME is set when G_FUZZING_BUILD.
    faeb58fe66
  23. maflcko force-pushed on Jul 24, 2025
  24. maflcko force-pushed on Jul 24, 2025
  25. DrahtBot added the label CI failed on Jul 24, 2025
  26. DrahtBot commented at 11:34 am on July 24, 2025: contributor

    🚧 At least one of the CI tasks failed. Task lint: https://github.com/bitcoin/bitcoin/runs/46638610833 LLM reason (✨ experimental): The CI failure is caused by a lint error due to Python code issues identified by ruff.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  27. maflcko force-pushed on Jul 24, 2025
  28. maflcko force-pushed on Jul 24, 2025
  29. in src/test/fuzz/integer.cpp:262 in fa8251bb88 outdated
    254@@ -255,9 +255,4 @@ FUZZ_TARGET(integer, .init = initialize_integer)
    255         } catch (const std::ios_base::failure&) {
    256         }
    257     }
    258-
    259-    try {
    260-        CHECK_NONFATAL(b);
    261-    } catch (const NonFatalCheckError&) {
    262-    }
    


    ryanofsky commented at 3:19 pm on July 24, 2025:

    In commit “test: Allow testing of check failures” (fa8251bb883300bb84cad5486468d07b5b1b7322)

    Not important, but I think it makes more sense to drop this test in the next commit instead dropping it here, because the next commit is what actually breaks this test and the claim that this fuzz test is “redundant to the new unit test” is not 100% accurate, since the fuzz test and unit test execute with different CheckFailuresAreExceptions values.


    maflcko commented at 3:57 pm on July 24, 2025:

    the claim that this fuzz test is “redundant to the new unit test” is not 100% accurate, since the fuzz test and unit test execute with different CheckFailuresAreExceptions values.

    I think it is 100% accurate, because test_only_CheckFailuresAreExceptionsNotAborts does not affect CHECK_NONFATAL in commit https://github.com/bitcoin/bitcoin/commit/fa8251bb883300bb84cad5486468d07b5b1b7322, so the fuzz test and unit test are 100% identical and redundant in this commit.


    ryanofsky commented at 4:08 pm on July 24, 2025:

    I think it is 100% accurate, because test_only_CheckFailuresAreExceptionsNotAborts does not affect CHECK_NONFATAL in commit fa8251b, so the fuzz test and unit test are 100% identical and redundant in this commit.

    I wouldn’t say it’s 100% accurate to say that two tests have identical coverage just because the they have the same results in one commit, especially when the results immediately diverge in the next commit, where one of the tests starts to fail and the other succeeds. But if you think this statement is 100% accurate, I think you can make it 101% accurate if you mention the change in the next commit, or just remove the test in the commit which breaks it. Also fine with the current approach, but wanted to clarify!


    maflcko commented at 9:06 pm on July 24, 2025:

    make it 101% accurate

    thx, done

  30. ryanofsky approved
  31. ryanofsky commented at 3:37 pm on July 24, 2025: contributor

    Code review ACK fac50ee1f9e577527650397891d356bca6316321, just rebased, added back functional test, and tweaked fuzz test since last review.

    Overall this looks good and conceptually I like this change because it makes all the checking macros do the exact same thing in debug builds and abort, only having varying behavior in release builds.

  32. test: Allow testing of check failures
    This allows specific tests to mock the check behavior to consistently
    use exceptions instead of aborts for intentionally failing checks in all
    build configurations.
    fa0dc4bdff
  33. maflcko force-pushed on Jul 24, 2025
  34. maflcko force-pushed on Jul 25, 2025
  35. util: Abort on failing CHECK_NONFATAL in debug builds
    This requires adjusting some tests to force exceptions over aborts, or
    accept either exceptions or aborts.
    
    Also, remove a fuzz test in integer.cpp that is mostly redundant with
    the unit test added in the prior commit.
    fa37153288
  36. maflcko force-pushed on Jul 25, 2025
  37. DrahtBot removed the label CI failed on Jul 25, 2025
  38. ryanofsky approved
  39. ryanofsky commented at 3:10 pm on July 28, 2025: contributor
    Code review ACK fa37153288ca420420636046ef6b8c4ba7e5a478, just catching subprocess.CalledProcessError in test fixing up a comment since last review
  40. achow101 commented at 7:41 pm on July 29, 2025: member
    ACK fa37153288ca420420636046ef6b8c4ba7e5a478

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-08-23 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me