util: Abort on failing CHECK_NONFATAL in debug builds

maflcko commented at 1:29 pm on May 22, 2025: member

A failing CHECK_NONFATAL will throw an exception. This is fine and even desired in production builds, because the program may catch the exception and give the user a way to easily report the bug upstream.

However, in debug development builds, exceptions for internal bugs are problematic:

The exception could accidentally be caught and silently ignored
The exception does not include a full stacktrace, possibly making debugging harder

Fix all issues by turning the exception into an abort in debug builds.

This can be tested by reverting the hunks to src/rpc/node.cpp and test/functional/rpc_misc.py and then running the functional or fuzz tests.
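The behavior described above can be sketched in a few lines of C++. This is a minimal, hypothetical illustration, not the code from this PR or from Bitcoin Core's util/check.h. The names InternalBugError, OnCheckFailure and SKETCH_CHECK_NONFATAL, and the build flag SKETCH_ABORT_ON_CHECK_FAILURE, are all assumptions made for the example. Without the flag, a failed check throws. With it, the check prints a message and aborts.

```cpp
// Hypothetical sketch only; not Bitcoin Core's util/check.h.
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical exception type standing in for a "non-fatal" internal bug.
class InternalBugError : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Assumed build flag: a debug build would define SKETCH_ABORT_ON_CHECK_FAILURE.
constexpr bool kAbortOnCheckFailure{
#ifdef SKETCH_ABORT_ON_CHECK_FAILURE
    true
#else
    false
#endif
};

[[noreturn]] inline void OnCheckFailure(const char* expr, const char* file, int line)
{
    const std::string msg{std::string{"Internal bug detected: "} + expr + " (" + file + ":" +
                          std::to_string(line) + ")"};
    if constexpr (kAbortOnCheckFailure) {
        // Debug build: crash immediately so the failure cannot be swallowed
        // by a stray catch block, and a debugger or core dump sees the full stack.
        std::fprintf(stderr, "%s\n", msg.c_str());
        std::abort();
    } else {
        // Release build: throw so a caller can report the bug to the user.
        throw InternalBugError{msg};
    }
}

// Statement-style check; the stringized condition ends up in the message.
#define SKETCH_CHECK_NONFATAL(cond) \
    do { if (!(cond)) OnCheckFailure(#cond, __FILE__, __LINE__); } while (false)
```

Building with -DSKETCH_ABORT_ON_CHECK_FAILURE switches the same call sites to the abort path, which is what makes the failure impossible to silently ignore.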

refactor: Set G_ABORT_ON_FAILED_ASSUME when G_FUZZING_BUILD

This does not change behavior, but documents that
G_ABORT_ON_FAILED_ASSUME is set when G_FUZZING_BUILD.

fadd02220a
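One way to express "a fuzzing build always aborts on a failed Assume" at compile time is to derive one constexpr flag from the other and pin the relationship with a static_assert. The sketch below uses hypothetical names (kFuzzingBuild, kAbortOnFailedAssume, SKETCH_FUZZING_BUILD, SKETCH_ABORT_ON_FAILED_ASSUME) as stand-ins for G_FUZZING_BUILD and G_ABORT_ON_FAILED_ASSUME. It is not the actual diff.

```cpp
// Hypothetical sketch of deriving one compile-time flag from another.
#include <cassert>

constexpr bool kFuzzingBuild{
#ifdef SKETCH_FUZZING_BUILD
    true
#else
    false
#endif
};

// Abort on failed Assume if explicitly requested, and always in fuzzing builds.
constexpr bool kAbortOnFailedAssume{
    kFuzzingBuild ||
#ifdef SKETCH_ABORT_ON_FAILED_ASSUME
    true
#else
    false
#endif
};

// Documents the invariant: fuzzing implies abort-on-failed-Assume.
static_assert(!kFuzzingBuild || kAbortOnFailedAssume);
```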

DrahtBot commented at 1:29 pm on May 22, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32588.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	ryanofsky
Concept ACK	achow101

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

DrahtBot renamed this:
~~util: Abort on failing CHECK_NONFATAL in debug builds~~
util: Abort on failing CHECK_NONFATAL in debug builds
on May 22, 2025

DrahtBot added the label Utils/log/libs on May 22, 2025

maflcko force-pushed on May 22, 2025

DrahtBot added the label CI failed on May 22, 2025

DrahtBot commented at 2:34 pm on May 22, 2025: contributor

🚧 At least one of the CI tasks failed. Task multiprocess, i686, DEBUG: https://github.com/bitcoin/bitcoin/runs/42714583822 LLM reason (✨ experimental): The CI failure is due to the “rpc_tests” subprocess aborting during the test execution.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot removed the label CI failed on May 22, 2025

ryanofsky commented at 7:16 pm on May 22, 2025: contributor

Concept ACK. Nice idea, and it does seem useful to have a macro checking for unexpected but not very serious conditions by throwing an exception that gets reported in release builds but is a fatal error in debug builds. And current uses of the macro seem like good candidates for that behavior.

The only possible issues I see are:

(1) The name CHECK_NONFATAL doesn’t make a lot of sense anymore, now triggering fatal errors when it literally says “nonfatal” in the name.
(2) It is now more cumbersome to write unit tests checking for these conditions since they require a release build to run.

Both could be addressed in followups. Issue (2) could be addressed by having a g_abort_hook or similar hook allowing specific unit tests to write custom code to check for these errors if they want. (This could also be used to replace the g_debug_lockorder_abort variable which does something similar.)
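A minimal sketch of the suggested g_abort_hook follows. It assumes a process-wide std::function that, when set, is called before aborting, so a test can install a hook that throws instead. The RAII helper ScopedAbortHook and the function AbortOnCheckFailure are hypothetical, and the sketch is not thread-safe.

```cpp
// Hypothetical sketch of an abort hook for tests; not thread-safe.
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// When set, called with the failure message before the process aborts.
inline std::function<void(const std::string&)> g_abort_hook;

[[noreturn]] inline void AbortOnCheckFailure(const std::string& msg)
{
    if (g_abort_hook) {
        g_abort_hook(msg); // a test hook is expected to throw
    }
    // No hook, or the hook returned normally: still abort.
    std::fprintf(stderr, "%s\n", msg.c_str());
    std::abort();
}

// Installs a hook for the lifetime of the guard and restores the previous one.
class ScopedAbortHook {
public:
    explicit ScopedAbortHook(std::function<void(const std::string&)> hook)
        : m_prev{std::move(g_abort_hook)}
    {
        g_abort_hook = std::move(hook);
    }
    ~ScopedAbortHook() { g_abort_hook = std::move(m_prev); }
    ScopedAbortHook(const ScopedAbortHook&) = delete;
    ScopedAbortHook& operator=(const ScopedAbortHook&) = delete;

private:
    std::function<void(const std::string&)> m_prev;
};
```

If an installed hook returns instead of throwing, the sketch still aborts, so a misbehaving hook cannot turn a fatal check into a silent one.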

IMO, issue (1) would be nice to address by coming up with a better designed set of checking macros and starting to use them. I think it could be good to have a:

CHECK to check conditions and abort if false in all builds
DCHECK to do the same but be compiled out of release builds,
CHECK_LOG to log an “internal bug detected please report” type log message in release builds, and abort in debug builds
CHECK_THROW to throw an exception in release builds, and abort in debug builds.

Then, current assert uses could become CHECK, current Assume uses in hotspots could become DCHECK, majority of other Assume uses could become CHECK_LOG, and current CHECK_NONFATAL uses could become CHECK_THROW.
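The proposed macro family could look roughly like the following. This is a hypothetical sketch of the suggestion only: none of these macros exist in the codebase with these definitions, and kDebugBuild and SKETCH_DEBUG_BUILD are assumed stand-ins for whatever distinguishes a debug build.

```cpp
// Hypothetical sketch of the CHECK / DCHECK / CHECK_LOG / CHECK_THROW proposal.
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <string>

constexpr bool kDebugBuild{
#ifdef SKETCH_DEBUG_BUILD
    true
#else
    false
#endif
};

[[noreturn]] inline void CheckAbort(const char* expr)
{
    std::fprintf(stderr, "Check failed: %s\n", expr);
    std::abort();
}

inline int g_check_log_count{0}; // counts CHECK_LOG reports, for illustration

// Release: log and continue. Debug: abort.
inline void CheckLog(const char* expr)
{
    if (kDebugBuild) CheckAbort(expr);
    ++g_check_log_count;
    std::fprintf(stderr, "Internal bug detected, please report: %s\n", expr);
}

// Release: throw. Debug: abort.
[[noreturn]] inline void CheckThrow(const char* expr)
{
    if (kDebugBuild) CheckAbort(expr);
    throw std::runtime_error{std::string{"Internal bug detected: "} + expr};
}

// CHECK: abort in all builds.
#define CHECK(cond) do { if (!(cond)) CheckAbort(#cond); } while (false)
// DCHECK: abort in debug builds; in release the condition is type-checked but never evaluated.
#define DCHECK(cond) do { if (kDebugBuild && !(cond)) CheckAbort(#cond); } while (false)
#define CHECK_LOG(cond) do { if (!(cond)) CheckLog(#cond); } while (false)
#define CHECK_THROW(cond) do { if (!(cond)) CheckThrow(#cond); } while (false)
```

Writing DCHECK as `kDebugBuild && !(cond)` keeps the condition compiling in release builds while guaranteeing it is never evaluated there, so an expensive check costs nothing at runtime.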

Just a thought though. Maybe current names are not a real problem, and naming shouldn’t block this PR in any case.

util: Abort on failing CHECK_NONFATAL in debug builds

This requires removing the test coverage for the intentional internal
bug, which is fine, because Assume and Assert are not tested either.

fae8c02268

test: Allow testing of check failures

This allows specific tests to mock the check behavior to consistently
use exceptions instead of aborts for intentionally failing checks in all
build configurations.

faae9a2947
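The idea in this commit, letting a test force exceptions regardless of build configuration, can be sketched with a global flag and an RAII guard. The names below (g_test_only_check_failures_throw, TestOnlyCheckFailuresThrow, OnCheckFailure) and the abort_in_this_build parameter are hypothetical. The parameter stands in for the build-time decision so that both configurations can be exercised from one binary.

```cpp
// Hypothetical sketch of a test-only switch from abort to exception.
#include <atomic>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Only meant to be flipped by tests, via the guard below.
inline std::atomic<bool> g_test_only_check_failures_throw{false};

// While alive, failed checks throw instead of aborting, in every build configuration.
class TestOnlyCheckFailuresThrow {
public:
    TestOnlyCheckFailuresThrow() { g_test_only_check_failures_throw = true; }
    ~TestOnlyCheckFailuresThrow() { g_test_only_check_failures_throw = false; }
    TestOnlyCheckFailuresThrow(const TestOnlyCheckFailuresThrow&) = delete;
    TestOnlyCheckFailuresThrow& operator=(const TestOnlyCheckFailuresThrow&) = delete;
};

// abort_in_this_build models "this is a debug build" for the purpose of the sketch.
[[noreturn]] inline void OnCheckFailure(const std::string& msg, bool abort_in_this_build)
{
    if (abort_in_this_build && !g_test_only_check_failures_throw) {
        std::fprintf(stderr, "%s\n", msg.c_str());
        std::abort();
    }
    throw std::runtime_error{msg};
}
```

Because the destructor unconditionally resets the flag, this guard is not meant to be nested; a more robust version would save and restore the previous value.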

maflcko force-pushed on May 23, 2025

maflcko commented at 6:34 am on May 23, 2025: member

CHECK_THROW

I don’t think this solves issue (1). Instead of NONFATAL being inaccurately named in debug builds, it will now be THROW, because the check is neither nonfatal nor throwing in debug builds.

Then, current assert uses could become CHECK, current Assume uses in hotspots could become DCHECK, majority of other Assume uses could become CHECK_LOG, and current CHECK_NONFATAL uses could become CHECK_THROW.

No objection, just mentioning that this would be a larger diff (including link-time changes, which are for some reason more involved in this area (https://github.com/bitcoin/bitcoin/pull/26688#issuecomment-1359622072, #32543 (review))), so a separate discussion/issue/pull seems better.

cumbersome to write unit tests

Thx, pushed a commit to fix this.

in src/rpc/node.cpp:299 in fae8c02268 outdated

@@ -295,10 +294,6 @@ static RPCHelpMan echo(const std::string& name)
                 RPCExamples{""},
         [&](const RPCHelpMan& self, const JSONRPCRequest& request) -> UniValue
 {
-    if (request.params[9].isStr()) {
-        CHECK_NONFATAL(request.params[9].get_str() != "trigger_internal_bug");

ryanofsky commented at 8:27 pm on June 2, 2025:

In commit “util: Abort on failing CHECK_NONFATAL in debug builds” (fae8c02268c11f2ad6165b6437b3e4846babfaf5)

IMO, it would be better not to drop this test coverage so that we can verify that the RPC code catching the NonFatalCheckError exception and turning it into a JSON response works.

The commit message mentions Assume and Assert don’t have test coverage either, but I don’t see why they shouldn’t have it.

It seems like it would be pretty easy to keep the test coverage here either by declaring a g_detail_test_only_CheckFailuresAreExceptionsNotAborts variable or by throwing an explicit NonFatalCheckError exception.

If we really do want to drop the test coverage, that also seems ok, but dropping it doesn’t seem required.
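To illustrate the behavior this coverage would verify, here is a hypothetical sketch of a dispatcher that turns an internal-bug exception into an error response. It is paired with an echo-like handler that throws explicitly (the second option mentioned above), so the test works whether or not failed checks abort. RpcResponse, Dispatch, EchoSketch and NonFatalCheckErrorSketch are invented for the example and do not mirror UniValue or JSONRPCRequest; the error code -1 is an arbitrary placeholder.

```cpp
// Hypothetical sketch; not Bitcoin Core's RPC server code.
#include <cassert>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for NonFatalCheckError.
class NonFatalCheckErrorSketch : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Either a result or an error code plus message, like a JSON-RPC reply.
struct RpcResponse {
    std::optional<std::string> result;
    std::optional<int> error_code;
    std::string error_message;
};

using RpcHandler = std::function<std::string(const std::vector<std::string>&)>;

constexpr int kRpcInternalBugCode{-1}; // arbitrary placeholder

// Converts an internal-bug exception into an error response instead of crashing.
RpcResponse Dispatch(const RpcHandler& handler, const std::vector<std::string>& params)
{
    RpcResponse resp;
    try {
        resp.result = handler(params);
    } catch (const NonFatalCheckErrorSketch& e) {
        resp.error_code = kRpcInternalBugCode;
        resp.error_message = e.what();
    }
    return resp;
}

// Echo-like handler that throws explicitly rather than relying on a failed check.
std::string EchoSketch(const std::vector<std::string>& params)
{
    if (!params.empty() && params.front() == "trigger_internal_bug") {
        throw NonFatalCheckErrorSketch{"Internal bug detected: trigger_internal_bug"};
    }
    return params.empty() ? std::string{} : params.front();
}
```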

maflcko commented at 5:38 am on June 3, 2025:

If we really do want to drop the test coverage, that also seems ok, but dropping it doesn’t seem required.

Yeah, I really want to drop this one. The other reason is that a corresponding non-type-safe catch needs to be maintained in src/test/fuzz/rpc.cpp, which has historically been brittle and, I expect, will remain so.

If someone cares about test coverage, a self-contained minimal unit test may be best here. Happy to review such a follow-up.
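The review does not show the relevant code in src/test/fuzz/rpc.cpp, but the general brittleness of a catch that is not type-safe can be illustrated with a hypothetical sketch. Matching on the exception message breaks when the wording changes and can also match unrelated errors, while checking the dynamic type does neither. NonFatalCheckErrorSketch and both helper functions are invented for the example.

```cpp
// Hypothetical illustration of message-based vs type-based exception classification.
#include <cassert>
#include <stdexcept>
#include <string>

class NonFatalCheckErrorSketch : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Brittle: depends on the exact wording of the message.
bool IsInternalBugByMessage(const std::exception& e)
{
    return std::string{e.what()}.find("Internal bug detected") != std::string::npos;
}

// Type-safe: depends only on the exception's dynamic type.
bool IsInternalBugByType(const std::exception& e)
{
    return dynamic_cast<const NonFatalCheckErrorSketch*>(&e) != nullptr;
}
```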

achow101 commented at 8:28 pm on June 2, 2025: member

Concept ACK

ryanofsky approved

ryanofsky commented at 8:44 pm on June 2, 2025: contributor

Code review ACK faae9a294798c6808ec82f827385d920f8d697cf. I think this is a good change. It makes sense conceptually to have check macros that always abort in debug builds, but do different things depending on cost of the check & severity of the error in release builds.

re: #32588 (comment)

I don’t think this solves issue (1). Instead of NONFATAL being inaccurately named in debug builds, it will now be THROW, because the check is neither nonfatal nor throwing in debug builds.

IMO it does solve it, because the current issue is that the macro is literally doing the thing its name says it will not do (trigger a fatal error). By contrast, I don’t think it is a problem for a function name to just describe its primary purpose and not everything else it may do. No need to solve everything here though. Current PR seems like a step forward.

DrahtBot requested review from achow101 on Jun 2, 2025
