RFC: Misused LogError and LogWarning macros

ryanofsky commented at 12:33 pm on June 27, 2024: contributor

As a programmer, when I see LogError and LogWarning macros, I assume LogError should be used to log information about a failure that happened, and LogWarning should be used to log information about an unusual condition which happened that might indicate a real problem depending on other factors and intent.

But developer notes suggest very different meanings for these macros. They say LogError should be used for “severe problems that require the node (or a subsystem) to shut down entirely (e.g., insufficient storage space)” and that LogWarning should be used “for severe problems that the node admin should address, but are not severe enough to warrant shutting down the node (e.g., system time appears to be wrong, unknown soft fork appears to have activated).”

I think the distinction the developer notes makes and the levels it defines are useful, but the current names make it likely that the macros will be used incorrectly.

So while the macros are new, I think it would be good to give them clearer names and I would suggest:

LogFatal for severe problems that require the node or a subsystem to shut down entirely, and
LogCritical for severe problems that the node admin should address, but do not warrant shutting down

Looking at the code I found 58 cases where LogError and Level::Error appeared to be use correctly, in conditions that would lead to full or partial shutdowns, and 71 cases where it appeared to be used incorrectly to report errors that would not cause shutdowns and were potentially transient.

Current LogWarning and Level::Warn usages were better but more rare, with 8 correct uses and 4 incorrect ones.

I think before more code is written and ported to use LogWarning and LogError macros incorrectly, we should update documentation and/or code so macro names accurately reflect the way they are intended to be used.

If we don’t care about log levels distinguishing between fatal and nonfatal errors, #30361 is the most direct fix for this issue and is just a documentation change. If we think using levels distinguish fatal errors from other errors is valuable, #30347 fixes this issue by clarifying documentation and introducing LogFatal and LogCritical levels suggested above.

maflcko commented at 1:12 pm on June 27, 2024: member

61 cases where it appeared to be used incorrectly

No opinion about naming, but of the cases that are used incorrectly, I wonder if their surrounding functions are better suited to use Result instead of a void or bool return value. (I presume many of them were a result of the bool error() helper transformation/removal, which was the first step toward moving some of those functions toward Result)

jonatack commented at 1:34 pm on June 27, 2024: contributor

Level::Error now meaning that the node needs to be shut down does seem to be a deviation, and perhaps adding Level::Fatal is a good idea if this distinction is needed. Not sure about Level::Critical.

IMO the severity-level logging veered in an unfortunate direction with the unneeded proliferation of Log{Level} macros having hardwired names and an inconsistent API with varying numbers of args and documentation duplicated/moved from enum class Level to the developer notes.

All of that was unnecessary and cumbersome, rather than simply one Log macro with a consistent interface and documentation of levels in the Level enum class.

ryanofsky commented at 2:03 pm on June 27, 2024: contributor

re: #30348 (comment)

Thanks Marco, I forgot about the error() function, but yes looking at #29236 it looks like a lot of incorrect LogError uses did come from that replacement.

I also agree it is good to return errors through util::Result. But I think it can be useful in many cases to log the errors as well, or in some cases not try to return very granular error information, but just indicate that a failure happened and refer the user to check the debug log.

re: #30348 (comment)

I think I disagree somewhat with Jon. I like the new logging API overall and think new documentation is well written, giving good advice and making clear distinctions that should lead to less spammy logging. I just think two of the macros happen to have unfortunate names that don’t match how they are intended to be used.

I do agree with the concern about having a proliferation of log macros. But I think the intent behind the macros is that LogDebug should be the most commonly one used by far. Unconditional LogInfo LogCritical and LogFatal macros should be used more rarely, to avoid log spam. And LogTrace should only be used in complicated code where it’s helpful to have fine traces. My PR #30347 does keep LogWarning and LogError macros for now, but I think it might make more sense to get rid of them and shift more code to using LogDebug.

jonatack commented at 2:19 pm on June 27, 2024: contributor

@ryanofsky all those unnecessary macros add complexity and more ducks to keep in a row – for no benefit.

One primary benefit I hope to see from the trace level is to be able to separate out very verbose, low-level, high-frequency logging. With net category logging, for instance, to use debug for high-level, lower-frequency events at the peer level, and trace for low-level, very high frequency events at the p2p messages level.

ryanofsky commented at 2:47 pm on June 27, 2024: contributor

re: #30348 (comment)

@ryanofsky all those unnecessary macros add complexity and more ducks to keep in a row

I agree with that and your other points. I particularly agree the current implementation of the macros is complicated and inconsistent, and I wrote #29256 to make all the macros accept the same parameters and use the same implementation. I just don’t see a significant distinction between different syntaxes LogDebug(...) or LogPrint(BCLog::Debug, ...) or even log.Debug(...) so have not been trying to raise syntax as an issue.

(EDIT: To be clear, I personally prefer brevity and efficiency of the macro syntax over the other syntaxes. I am just trying to be sympathetic to Jon’s point of view while pointing out that his preference for level arguments is only a preference for a different syntax, and he doesn’t need to be so concerned about the existence of convenience macros. I think this one criticism is little shallow, but the other points he makes are more meaningful.)

ajtowns commented at 3:12 pm on June 27, 2024: contributor

I just don’t see a significant distinction between different syntaxes LogDebug(...) or LogPrint(BCLog::Debug, ...) or even log.Debug(...) so have not been trying to raise syntax as an issue.

The second one is an extra 13 characters of noise each time (20 if you don’t do something to avoid having to type BCLog::Level::Debug); for the third one, C++ doesn’t allow you to have it be a magic macro where the arguments aren’t even evaluated when debug logging isn’t enabled. Far better to have the code that actually does the complicated stuff be as easy to read and write as we can make it, especially if the cost is just repeating a one-line macro a few times in a header.

ajtowns commented at 3:27 pm on June 27, 2024: contributor

Looking at the code I found 58 cases where LogError appeared to be use correctly, in conditions that would lead to full or partial shutdowns, and 61 cases where it appeared to be used incorrectly to report errors that would not cause shutdowns and were potentially transient.

#30347 doesn’t seem to actually fix any of the 61 incorrect cases? Maybe have a PR to manually fix the incorrect ones, then a scripted diff to convert everything to the new names?

jonatack commented at 7:37 pm on June 27, 2024: contributor

an extra 13 characters of noise each time (20 if you don’t do something to avoid having to type BCLog::Level::Debug)

Extra complexity and code and an inconsistent API to avoid “extra characters” (that moreover many developers likely just cut and paste as needed) seems a poor (and borderline silly?) trade-off unless forced on us for technical reasons. Far better to have the code and API as simple and consistent as possible.

jonatack commented at 7:44 pm on June 27, 2024: contributor

Two years ago, it seemed we had a road map on severity-level logging based on a PR discussion. It turned out not to be the case, and not sufficiently specified, with subsequent proposals to swerve here or take another road there.

Not sure, but perhaps it would be best to gather a larger consensus on which log levels to adopt, and with which definitions, and an overall API, rather than everyone opening competing proposals. But I haven’t been following all of these proposals closely.

willcl-ark added the label Brainstorming on Jun 28, 2024

ryanofsky commented at 4:39 pm on June 28, 2024: contributor

re: #30348#issue-2378034511

Current LogWarning usages looked appropriate, but were rare, with only 8 usages currently.

This turned out to be incorrect, so I edited this part of the issue description. wilcl-ark found a case where LogWarning is currently being misused (https://github.com/bitcoin/bitcoin/pull/30347#issuecomment-2194685490) to warn about an unexpectedly large -checkblocks value. I also found more incorrect uses that I missed before because I was only searching for LogWarning not Level::Warning. #30347 has been updated to reflect these changes.

re: #30348 (comment)

Two years ago, it seemed we had a road map on severity-level logging based on a PR discussion. It turned out not to be the case, and not sufficiently specified, with subsequent proposals to swerve here or take another road there.

This is definitely true, but I think it is mostly ok and has been beneficial. The decision to only use Level::Error for fatal errors was definitely very surprising to me, I only found out about it yesterday! But I think most changes have been solid improvements.

I agree it would be good to discuss severity levels somewhere if people disagree about them, though I’m not sure they actually do. AJ posted a helpful chart that shows how the levels are supposed to be used, and I think it makes perfect sense as a recommendation for what levels to use in new code, even if it doesn’t describe how log levels are used in a lot of existing code.

This issue is just about making names used in the code match intent of the code, so it may not be best place to discuss bigger topics, but I hope people will open other issues and PRs if there are more improvements that can be made.

re: #30348 (comment)

#30347 doesn’t seem to actually fix any of the 61 incorrect cases?

It fixes them by no longer misusing the name “Error” to mean fatal error, and the name “Warning” to mean critical condition that requires action from the user. So the logging in those cases is correct, even if it goes against new best practices we are recommending for logging. I think in most or all of those cases, the log messages are older than the recommendations are, so I did not take it upon myself to update them.

Maybe have a PR to manually fix the incorrect ones, then a scripted diff to convert everything to the new names?

It’s not clear to me what you mean by manually fixing the incorrect ones, but I would be happy to go with a different approach if you want to implement it, or specify what you mean by “fix.” I provided a pretty detailed rationale for the approach I took in #30347 (review), replying to your other comment (which I appreciate by the way, I do think it would be good to reduce the number of severity levels and the amount of log spam).

ryanofsky referenced this in commit 5175cbcbb0 on Jun 28, 2024

ryanofsky referenced this in commit b7aae361b2 on Jun 30, 2024

RFC: Misused LogError and LogWarning macros #30348