logging: Use LogFatal instead LogError for fatal errors

ryanofsky commented at 12:30 pm on June 27, 2024: contributor

Use LogFatal instead LogError for fatal errors.

Keep using LogError for nonfatal errors.

Also rename LogWarning to LogCritical to be clear it is only intended to be used for “severe problems that the node admin should address,” as described in the documentation.

This PR is a draft so other ideas and approaches can be discussed in issue #30348

logging: Use LogFatal instead LogError for fatal errors

Keep using LogError for nonfatal errors. Also rename LogWarning to LogCritical
to be clear it is only intended to be used for "severe problems that the
node admin should address," as described in the documentation.

b27dc672f6

DrahtBot commented at 12:30 pm on June 27, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept NACK	ajtowns
Concept ACK	jonatack

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#30361 (doc: Drop description of LogError messages as fatal by ryanofsky)
#30343 (wallet, logging: Replace WalletLogPrintf() with LogInfo() by ryanofsky)
#30342 (kernel, logging: Pass Logger instances to kernel objects by ryanofsky)
#30155 (validation: Make ReplayBlocks interruptible by mzumsande)
#29656 (chainparams: Change nChainTx type to uint64_t by fjahr)
#29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)
#29256 (Improve new LogDebug/Trace/Info/Warning/Error Macros by ryanofsky)
#27006 (reduce cs_main scope, guard block index ’nFile’ under a local mutex by furszy)
#26022 (Add util::ResultPtr class by ryanofsky)
#25722 (refactor: Use util::Result class for wallet loading by ryanofsky)
#25665 (refactor: Add util::Result failure values, multiple error and warning messages by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

willcl-ark commented at 1:23 pm on June 27, 2024: member

I’m largely in agreement with the sentiment here and think the introduction of the LogFatal is nice to distiguish fatal errors.

Does

0LogCritical("pruned datadir may not have more than %d blocks; only checking available blocks\n",

…actually require user intervention or other corrective action?

nit: LogWarning remains in L728 after this change:

0The macros `LogInfo`, `LogDebug`, `LogTrace`, `LogWarning` and `LogError` are available for

in doc/developer-notes.md:29 in 475b70da30 outdated

25@@ -26,6 +26,7 @@ Developer Notes
26     - [Ignoring IDE/editor files](#ignoring-ideeditor-files)
27 - [Development guidelines](#development-guidelines)
28     - [General Bitcoin Core](#general-bitcoin-core)
29+    - [Logging](#logging)

jonatack commented at 1:42 pm on June 27, 2024:

Suggest moving all this developer-facing documentation back to the Level enum class where the levels are defined.

User-facing documentation would be in the logging RPC help.

ryanofsky commented at 2:41 pm on June 28, 2024:

re: #30347 (review)

I agree with you that the logging header should have better documentation, but I also think this section of developer notes is useful and provides good background information.

I actually already made some pretty significant improvements to documentation and organization of the logging header in #29256. If you would be interested in reviewing that PR (even just the documentation changes) I would really appreciate it. The PR also addresses some other things you might be interested in like letting the log macros accept a consistent set of parameters. And it is fully backwards compatible, so it shouldn’t be taking away anything you may currently like.

in src/logging.h:245 in 475b70da30 outdated

240@@ -239,6 +241,8 @@ static inline void LogPrintf_(const std::string& logging_function, const std::st
241 #define LogInfo(...) LogPrintLevel_(BCLog::LogFlags::ALL, BCLog::Level::Info, __VA_ARGS__)
242 #define LogWarning(...) LogPrintLevel_(BCLog::LogFlags::ALL, BCLog::Level::Warning, __VA_ARGS__)
243 #define LogError(...) LogPrintLevel_(BCLog::LogFlags::ALL, BCLog::Level::Error, __VA_ARGS__)
244+#define LogCritical(...) LogPrintLevel_(BCLog::LogFlags::ALL, BCLog::Level::Critical, __VA_ARGS__)
245+#define LogFatal(...) LogPrintLevel_(BCLog::LogFlags::ALL, BCLog::Level::Fatal, __VA_ARGS__)

jonatack commented at 1:45 pm on June 27, 2024:

It would be good to shed this proliferation of macros with hardwired names in favor of one single, consistent Log one.

Simpler, easier, and fewer elements to maintain and keep in alignment.

ryanofsky commented at 2:41 pm on June 28, 2024:

re: #30347 (review)

Agree this does create a small maintenence burden. FWIW, #29256 consolidates and simplifies the macro definitions so the burden can get a littler smaller.

Overall though, I think the macros provide convenience and efficiency. I think we can support having both convenience and variable logging levels, and not have to sacrifice one to get the other.

jonatack commented at 1:48 pm on June 27, 2024: member

Tentative Concept ACK on adding Fatal if the distinction is considered essential, unsure about Critical replacing Warning.

in src/logging.h:81 in 475b70da30 outdated

76@@ -77,6 +77,8 @@ namespace BCLog {
77         Info,      // Default
78         Warning,
79         Error,
80+        Critical,
81+        Fatal,

ajtowns commented at 2:23 pm on June 27, 2024:

I don’t think it makes sense for us to have 7 levels of logging – even 5 is a lot. Renaming Warning to Critical and Error to Fatal for the reasons given seems fine, but the old names shouldn’t be kept around in that case.

jonatack commented at 7:29 pm on June 27, 2024:

I don’t think it makes sense for us to have 7 levels of logging – even 5 is a lot.

Between 6 and 8 levels seems to be standard? The most frequent levels might be our current ones, plus Fatal, i.e.

Fatal, Error, Warning, Info, Debug, Trace

https://www.crowdstrike.com/cybersecurity-101/observability/logging-levels/ https://sematext.com/blog/logging-levels/ https://betterstack.com/community/guides/logging/log-levels-explained/ https://stackoverflow.com/questions/2031163/when-to-use-the-different-log-levels https://www.sumologic.com/glossary/log-levels/ https://medium.com/@m.merkulov/logging-levels-explained-4dd61815601f https://techhub.hpe.com/eginfolib/networking/docs/switches/7500/5200-1946a_nmm_cg/content/495508377.htm

ajtowns commented at 10:33 pm on June 27, 2024:

The medium post describes five levels with the same names as ours currently: error, warn, info, debug, trace; the other six links all describe syslog’s behaviour, which was designed for unix systems in the 1980s. Logging systems we actually use include python’s (which has 5 default levels), leveldb (which does not have levels, so we treat it all as debug), and libevent (which has 4 levels). So, no, 6-8 levels is not standard, and for a custom subsystem we should be doing what’s best for us, not necessarily what’s standard in any case.

The flowchart from the stackoverflow link might be helpful, though: https://stackoverflow.com/a/64806781/84932 For us I think it would look more like:

 0flowchart TD
 1    start-->|No|dbg
 2    start{{Is this critical information?}}-->|Yes|crit
 3
 4    dbg-->|No|logtrace(Trace + Category)
 5    dbg{{Is this moderate volume?}}-->|Yes|logdebug(Debug + Category)
 6
 7    crit{{Is there a problem?}}-->|No|loginfo(Info)
 8    crit-->|Yes|warnerr
 9
10    warnerr{{Do we need to abort?}}-->|No|logwarn(Warning/Critical)
11    warnerr-->|Yes|logfatal(Error/Fatal)

ryanofsky commented at 2:41 pm on June 28, 2024:

re: #30347 (review)

I don’t think it makes sense for us to have 7 levels of logging – even 5 is a lot. Renaming Warning to Critical and Error to Fatal for the reasons given seems fine, but the old names shouldn’t be kept around in that case.

I think the reality is we already have 7 levels of logging, but we are using 5 names to describe them. This PR is a small fix to make the names used in the code match the intent of the code.

For example, the documentation says LogError should only be used to log fatal errors. But currently it is being used to log both fatal errors and nonfatal errors. If you would like to log errors less, or at lower priority, I would enthusiastically agree with that goal. But I don’t think those changes should be folded into this PR, which is minimal and not trying to fix problems that are broader and more complicated than simple bad naming.

Your flowchart is a good idealization, but it does not describe the realities of current code, which is using logging for ordinary error reporting, not just the most critical errors. A description of the actual log levels in our code base is:

Fatal messages indicate severe problems (e.g. data loss) that will trigger shutdowns
Critical messages indicate severe problems (e.g. incorrect system time, internal bug detected) that users should address in some way
Error messages indicate less severe problems (e.g. loss of connectivity) that the users should probably know about, but might be transient
Warning messages indicate unexpected conditions (e.g. user specified -checkblocks number higher than number of blocks that may be available to check) that the user should know about, and indicate potential problems

For completeness, Info messages provide information about normal activity, Debug messages can help users troubleshoot issues, and Trace is mostly unused but geared more towards developers.

Overall, I agree with the direction you want to take in reducing number of log levels, and I would applaud any effort to reduce amount of log spam and report errors by other means, but I don’t think those goals should get in the way of fixing names that are confusing and misleading to developers.

If you want to open a PR that takes a broader approach and also fixes the bad names, I will happily go along with it and close this PR. But I think reasonable to want to take one step at at time here and just fix the bad naming.

UPDATE: I think this comment describes a good way to use current log levels, but I don’t actually think the current set of log levels is ideal, and I described a better set of log levels I think we could switch to in a later post: #30364 (comment)

jonatack commented at 4:27 pm on June 28, 2024:

I don’t think it makes sense for us to have 7 levels of logging – even 5 is a lot.

Meta:

It’s not clear why people are 👍 on that comment. Are 5 “a lot”, despite what other projects do, or based on our needs? Or is the 👍 about 5 being ok, but not having more than 5?

I’ve seen drive-by 👍 on other comments recently where the comment made multiple points, and it was similarly not clear: which of the points the 👍 pertains to? All? Some?

If a reviewer has a comment, I think it would be more helpful in general to be explicit.

jonatack commented at 4:33 pm on June 28, 2024:

@ryanofsky

A description of the actual log levels in our code base is

I agree. When I was working on #25203, for instance, it was often unclear to me whether to use error, warning, or info, and your classification would have made the choice clear. I am also not sure that we can usefully eliminate all helpful logging in one or more of the categories you described.

ryanofsky commented at 4:52 pm on June 28, 2024:

re: #30347 (review)

Meta:

It’s not clear why people are 👍 on that comment.

I kind of like the mystery of the reactions. But I 👍 AJ’s chart because I think it is a good recommendation for how to choose log levels in new code, even if it doesn’t really tell you what log levels mean in existing code. I also broadly agree with having few log levels and using few log levels. I think by a large margin most log statements should be using LogDebug.

ajtowns commented at 5:23 pm on June 28, 2024:

Your flowchart is a good idealization, but it does not describe the realities of current code, which is using logging for ordinary error reporting, not just the most critical errors.

The current code mostly hasn’t migrated to the multiple severity levels from the original LogPrintf/LogPrint split. It would be nice to spend time working on that, but we’re instead constantly bikeshedding the API instead and treating that as a blocker for any other logging changes.

ajtowns commented at 5:33 pm on June 28, 2024:

It’s not clear to me that there’s any reason to separate the current LogWarning / LogError levels; those existed prior to #28318 and I documented them in that PR as best I understood them. However, there’s not much benefit in a log that says “hey this error is about to cause the node to stop” – you already get that information by seeing “Shutdown: in progress…” immediately following. So it could make sense to merge Warning/Error into:

LogAlert(fmt, params...) should be used in place of LogInfo for (potentially) severe problems that require attention from the node admin (e.g., system time appears to be wrong, unknown soft fork appears to have activated, insufficient storage space).

which would reduce the flowchart to:

0flowchart TD
1    start-->|No|dbg
2    start{{Is this critical information?}}-->|Yes|crit
3
4    dbg-->|No|logtrace(Trace + Category)
5    dbg{{Is this moderate volume?}}-->|Yes|logdebug(Debug + Category)
6
7    crit{{Is there a problem?}}-->|No|loginfo(Info)
8    crit-->|Yes|logfatal(Alert)

jonatack commented at 5:52 pm on June 28, 2024:

The current code mostly hasn’t migrated to the multiple severity levels from the original LogPrintf/LogPrint split. It would be nice to spend time working on that, but we’re instead constantly bikeshedding the API instead and treating that as a blocker for any other logging changes.

The logging was mostly migrated in the severity level parent PR #25203 before the whole thing was taken off the rails with the overwrought API and rules. (There have also been a number of pulls opened that were a copy or similar version of the commits there.)

ajtowns commented at 6:39 pm on June 28, 2024:

By overwrought, jonatack means LogError("CHECKSUM ERROR ..."), while his preferred non-overwrought api from #25203 looks like LogPrintLevel(BCLog::NET, BCLog::Level::Error, "CHECKSUM ERROR ...").

jonatack commented at 7:12 pm on June 28, 2024:

By overwrought, jonatack means LogError("CHECKSUM ERROR ..."), while his preferred non-overwrought api from #25203 looks like LogPrintLevel(BCLog::NET, BCLog::Level::Error, "CHECKSUM ERROR ...").

As you know, I’ve described what I mean several times, including in my comments in the pull you mention. Here is the most recent one: #29256 (comment)

ryanofsky commented at 10:24 pm on June 28, 2024:

re: #30347 (review)

I like your idea to collapse the log levels further and I am curious about ways you think we can make progress towards that goal.

However, there’s not much benefit in a log that says “hey this error is about to cause the node to stop” – you already get that information by seeing “Shutdown: in progress…” immediately following.

If we don’t need to have a dedicated level for fatal errors, it suggests a possible alternative to this PR, implemented in #30361. It would solve the issue that was reported and might get closer to the long term outcome you outlined.

ryanofsky force-pushed on Jun 28, 2024

ryanofsky commented at 3:55 pm on June 28, 2024: contributor

Updated 475b70da309bbfd889968ce87d251f539f1d3351 -> b27dc672f6e575d739e037b6e46202e619bc386b (pr/logfat.1 -> pr/logfat.2, compare) with suggested fixes and expanded it to cover Logging::Warning and Logging::Error.

re: #30347 (comment)

pruned datadir may not have more than %d blocks;

Nice catch! I updated this PR with your corrections and also updated the #30348 issue description to reflect this.

ajtowns commented at 5:15 pm on June 28, 2024: contributor

Concept NACK; introducing more different levels just makes it require more effort to use the logging system, and increases the chances they will be used incorrectly.

ryanofsky referenced this in commit 5175cbcbb0 on Jun 28, 2024

ryanofsky referenced this in commit b7aae361b2 on Jun 30, 2024

ryanofsky commented at 4:02 pm on June 30, 2024: contributor

Closing this PR as #30361 is a more direct fix for the issue, assuming we are ok with log levels not making a distinction between fatal errors and other types of errors. It seems like it never might have been anybody’s intention to even make this distinction in the first place and it just got written in when new documentation was added.

ryanofsky closed this on Jun 30, 2024

bitcoin locked this on Jun 30, 2025

logging: Use LogFatal instead LogError for fatal errors #30347

Code Coverage

Reviews

Conflicts