doc: clarify the documentation of `Assume` assertion

ismaelsadeeq commented at 11:20 am on March 20, 2025: member

An Expression inside Assume may be optimized away in production builds when the compiler proves they are side-effect-free. This use case is demonstrated in #31363 and is suggested to be documented in #31363 (comment).

DrahtBot commented at 11:21 am on March 20, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32100.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	l0rinc, hodlinator, jonatack, rkrux
Stale ACK	sipa

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

DrahtBot added the label Docs on Mar 20, 2025

in doc/developer-notes.md:464 in a880d1bf87 outdated

459@@ -460,7 +460,14 @@ other input.
460   safely continue even if the assumption is violated. In debug builds it
461   behaves like `Assert`/`assert` to notify developers and testers about
462   nonfatal errors. In production it doesn't warn or log anything, though the
463-  expression is always evaluated.
464+  expression is always evaluated. However, if the compiler can prove that a
465+  statement inside `Assume` is side-effect-free, it may optimize the call away,

yancyribbens commented at 2:21 pm on March 20, 2025:

Interesting. If Assume can be optimized out during production possibly, is there any reason to prefer Assert?

There’s an assert here in a coin-selection algo that is meant to be performant. Maybe this should be changed to Asume.

maflcko commented at 2:30 pm on March 20, 2025:

There’s an assert here in a coin-selection algo that is meant to be performant. Maybe this should be changed to Asume.

I haven’t measured this, but I’d say an assert on a value of a type of size_t shouldn’t be noticeable in this context.

yancyribbens commented at 6:32 pm on March 20, 2025:

I tested however the benchmarks don’t seem to be sensitive enough for this to make a difference.

sipa commented at 2:36 pm on March 20, 2025: member

Concept ACK, this is largely how I’ve been using Assume().

My thinking is that due to the optimizing-out property of Assume(), the concern about potential performance impact is (even) lower than for Assert()/assert(), and thus one can be even more liberal when using these. Note that even Assert()/assert() tend to have very low cost for cheap conditions, as they’re generally well-predicted by the CPU branch predictor (so the cost tends to be in the nanosecond or less range), but for cheap provably-side-effect-free conditions, for Assume() the concern just disappears entirely.

So the benefit, I think, isn’t directly that Assume() is necessarily significantly faster than Assert()/assert() - in almost all cases in the codebase, the impact of an additional well-predicted conditional isn’t going to be measurable. The benefit is that it may, for reviewers, largely remove the consideration of whether or not there is any performance impact at all.

Of course, compared to the others it has the downside of not actually triggering in production builds, so it should be reserved for cases where (a) the code can safely continue when the assumption is violated (that’s what’s documented already) but also (b) when there is sufficient test coverage to give confidence that the assumption actually can’t trigger in production.

So I think the advice about this kind of usage should be:

There aren’t other reasons to have an Assert()/assert(); i.e., if it weren’t for the possibility of a production-optimized-out Assume(), you’d instead have no assertion at all.
The condition is clearly side-effect free.
It helps readability/reviewability by making assumptions explicitly verified in fuzz tests. If the code doesn’t have good fuzz test coverage, this doesn’t achieve much.

in doc/developer-notes.md:470 in a880d1bf87 outdated

466+  skipping its evaluation in production.
467+   - `Assume` can also act as a lightweight debugging assertion, ensuring
468+     statements are tested e.g., during fuzzing—to catch violations. With
469+     sufficient test coverage, the compiler may optimize away proven conditions
470+     in production, eliminating runtime overhead. This aids both testing and code
471+     review by making statements explicit and tested.

hodlinator commented at 9:57 am on March 22, 2025:

This seems sufficient to me:

0  expression is always evaluated. However, if the compiler can prove that a
1  statement inside `Assume` is side-effect-free, it may optimize the call away,
2  skipping its evaluation in production. This enables a lower-cost way of
3  making explicit statements about the code, aiding review.

Motivation:

Assume can also act as a lightweight debugging assertion, ensuring statements are tested e.g., during fuzzing—to catch violations.

Even if a lot of fuzzing is were to be performed with debug builds, this feels redundant. If we fuzz mostly in non-debug, it’s not very accurate.

With sufficient test coverage, the compiler may optimize away proven conditions in production, eliminating runtime overhead.

Seems to imply that the compiler optimizes away depending on test coverage.

This aids both testing and code review by making statements explicit and tested.

Rephrased and appended to preceding paragraph.

An additional reason for not inserting a extra bullet-point here is to maintain the existing pattern of using one bulletpoint for each assert/check type, starting with “For example, “. So even if insist on adding a mention of fuzzing, ideally it should not be in bullet-point form, but rather appended to the preceding paragraph.

ismaelsadeeq commented at 12:17 pm on March 24, 2025:

fixed!

hodlinator commented at 1:07 pm on March 24, 2025:

Thank you for taking my suggestion!

hodlinator commented at 10:04 am on March 22, 2025: contributor

Concept ACK

Good to document useful aspects of Assume.

rkrux commented at 10:33 am on March 24, 2025: contributor

Concept ACK a880d1bf87817b1e6606c971cbfe98382898a0be

I find myself agreeing with the points mentioned in this comment #32100 (review), there is some redundancy in the verbiage currently that can be removed.

ismaelsadeeq force-pushed on Mar 24, 2025

ismaelsadeeq commented at 12:19 pm on March 24, 2025: member

I find myself agreeing with the points mentioned in this comment #32100 (comment), there is some redundancy in the verbiage currently that can be removed.

This is fixed now in 2898a0be..a7c65edc

hodlinator approved

hodlinator commented at 1:11 pm on March 24, 2025: contributor

ACK a7c65edc884b0e22aaabd7e725f5f39e60b6e76b

Adds more information on the strengths of Assume(), being able to be used as a kind of low-cost documentation of constraints that in the best case can be optimized away in production.

Latest version of the change also keeps with the pattern of only having one - For example ...-subsection.

DrahtBot requested review from sipa on Mar 24, 2025

DrahtBot requested review from rkrux on Mar 24, 2025

sipa commented at 1:24 pm on March 24, 2025: member

ACK a7c65edc884b0e22aaabd7e725f5f39e60b6e76b

rkrux approved

rkrux commented at 1:29 pm on March 24, 2025: contributor

ACK a7c65edc884b0e22aaabd7e725f5f39e60b6e76b

in doc/developer-notes.md:464 in a7c65edc88 outdated

459@@ -460,7 +460,10 @@ other input.
460   safely continue even if the assumption is violated. In debug builds it
461   behaves like `Assert`/`assert` to notify developers and testers about
462   nonfatal errors. In production it doesn't warn or log anything, though the
463-  expression is always evaluated.
464+  expression is always evaluated. However, if the compiler can prove that a
465+  statement inside `Assume` is side-effect-free, it may optimize the call away,

l0rinc commented at 2:06 pm on March 24, 2025:

According to https://www.reddit.com/r/cpp_questions/comments/plg8ij/expression_vs_statement/

Statements do not have a return type

So by definition, unless they’re empty, they cannot be side-effect-free - if my interpretation is correct.

0  expression is always evaluated. However, if the compiler can prove that
1  an expression inside `Assume` is side-effect-free, it may optimize the call away,

ismaelsadeeq commented at 2:30 pm on March 24, 2025:

Fixed!

l0rinc changes_requested

doc: clarify the documentation of `Assume` 329a0dcdaf

ismaelsadeeq force-pushed on Mar 24, 2025

l0rinc commented at 2:37 pm on March 24, 2025: contributor

ACK 329a0dcdafe05002f662e8737a76bfdeaba9a3ed

DrahtBot requested review from rkrux on Mar 24, 2025

DrahtBot requested review from hodlinator on Mar 24, 2025

hodlinator approved

hodlinator commented at 3:17 pm on March 24, 2025: contributor

re-ACK 329a0dcdafe05002f662e8737a76bfdeaba9a3ed

jonatack commented at 5:04 pm on March 24, 2025: member

ACK 329a0dcdafe05002f662e8737a76bfdeaba9a3ed

l0rinc approved

rkrux approved

rkrux commented at 7:45 am on March 25, 2025: contributor

re-ACK 329a0dcdafe05002f662e8737a76bfdeaba9a3ed

yancyribbens commented at 7:02 pm on March 25, 2025: contributor

Of course, compared to the others it has the downside of not actually triggering in production builds, so it should be reserved for cases where (a) the code can safely continue when the assumption is violated (that’s what’s documented already)

This is a good call out, although I don’t see it already documented (here), maybe I’m blind

The other reason I think this is a good call it is this excerpt from http://fiona.dmcs.p.lodz.pl/oopc/reference/en/cpp/language/attributes/assume.html.

If the expression can ever evaluate to false, it will inject undefined behavior into your whole program (not just where the assumption appears). It should never be used in place of an assertion or precondition.

Other note is I don’t see any mention in the docs that this is optimized out in production. The best I can find is that the expression is not evaluated and converted to a bool, then when the program reaches that bool, if the bool evaluates to true it’s basically a noop (has no effect) otherwise boom.

see: https://eel.is/c++draft/dcl.attr.assume

in doc/developer-notes.md:464 in 329a0dcdaf

459@@ -460,7 +460,10 @@ other input.
460   safely continue even if the assumption is violated. In debug builds it
461   behaves like `Assert`/`assert` to notify developers and testers about
462   nonfatal errors. In production it doesn't warn or log anything, though the
463-  expression is always evaluated.
464+  expression is always evaluated. However, if the compiler can prove that
465+  an expression inside `Assume` is side-effect-free, it may optimize the call away,

yancyribbens commented at 7:03 pm on March 25, 2025:

0  an expression inside `Assume` is true, it _will_ have no affect,

yancyribbens commented at 7:04 pm on March 25, 2025:

s/affect/effect

ismaelsadeeq commented at 7:32 pm on March 25, 2025:

I will fix nit when there is need to retouch.

sipa commented at 7:38 pm on March 25, 2025:

This suggestion is not correct.

If the expression inside Assume has a side-effect, but still can be proven to evaluate to true, the side effect will still remain.

yancyribbens commented at 6:36 pm on March 26, 2025:

If the expression inside Assume has a side-effect, but still can be proven to evaluate to true, the side effect will still remain.

I am assuming you are talking about the Macro then and not the C++ 23 Assume since that’s not how the docs read. The docs read that if it’s true it has no effect: “If the expression would not evaluate to true at the place the assumption occurs, the behavior is undefined. Otherwise, the statement has no effect. "

Also, I think may should be replaced with will unless it’s ambigous wether or not this will be otimized away depending on it’s side-effect which it doesn’t sound like it is.

sipa commented at 6:44 pm on March 26, 2025:

Yes, nothing in this PR is about the C++23 [[assume()]] attribute. I don’t understand why you keep bringing it up.

yancyribbens commented at 6:46 pm on March 26, 2025:

I just noticed that Sipa said “There is no guarantee that the compiler will do that.” which I guess is why the may exists here.

yancyribbens commented at 6:56 pm on March 26, 2025:

Yes, nothing in this PR is about the C++23 [[assume()]] attribute. I don’t understand why you keep bringing it up.

The name collision had me confused

maflcko commented at 7:07 pm on March 25, 2025: member

see: https://eel.is/c++draft/dcl.attr.assume

The assume attribute is C++23. However, this codebase is C++20 and this pull request is about the Assume macro in the codebase itself.

yancyribbens commented at 7:11 pm on March 25, 2025: contributor

The assume attribute is C++23. However, this codebase is C++20 and this pull request is about the Assume macro in the codebase itself.

Good point. I did also link the C++23 link above. I’m just looking for confirmation that this doesn’t appear in production which doesn’t appear in any reference that I see.

sipa commented at 7:36 pm on March 25, 2025: member

I’m just looking for confirmation that this doesn’t appear in production which doesn’t appear in any reference that I see.

You cannot find any reference about it, because Assume() is a macro defined in the Bitcoin Core codebase, not a C++ language/library feature. It’s defined here. It is different from, and unrelated to, the [[assume(...)]] attribute added in C++23 which you link to.

And you can’t find confirmation that it doesn’t appear in production, because as far as the language is concerned, it does appear in production. It just happens to be the case that compilers may always remove code which they can prove has no effect.

yancyribbens commented at 5:10 pm on March 26, 2025: contributor

You cannot find any reference about it, because Assume() is a macro defined in the Bitcoin Core codebase, not a C++ language/library feature. It’s defined here. It is different from, and unrelated to, the [[assume(…)]] attribute added in C++23 which you link to.

I see, thanks for the explanation. Others reading this may also think this is talking about the C++ assume (maybe).

And you can’t find confirmation that it doesn’t appear in production, because as far as the language is concerned, it does appear in production. It just happens to be the case that compilers may always remove code which they can prove has no effect.

Ok. The wording might be more clear if it said that the compiler will optimize this out if it can prove there is no side-effect and drop the reference to production.

sipa commented at 5:12 pm on March 26, 2025: member

if it said that the compiler will optimize this out if it can prove there is no side-effect

There is no guarantee that the compiler will do that.

drop the reference to production

In debug builds, it cannot be optimized out, because the statement necessarily has a side-effect (aborting if it detects the condition is false).

fanquake merged this on Mar 27, 2025

fanquake closed this on Mar 27, 2025

ismaelsadeeq deleted the branch on Mar 27, 2025

ryanofsky commented at 2:57 pm on March 27, 2025: contributor

Post-merge code review ACK 329a0dcdafe05002f662e8737a76bfdeaba9a3ed. This should be a helpful change since it clarifies how Assume works.

IMO, the Assume macro is not very well designed because it is useful for two different things and not good at doing either of them. Assume is useful:

To check for states you never expect to occur, but are harmless. It can be used when you want to make sure unexpected states trigger failures during testing, but don’t want those states to crash the program in release builds. You may want to handle them instead by logging a warning, returning an error from the current RPC, safely aborting the current operation, or just doing nothing and moving on.
To check for states you never expect to occur, and probably indicate serious bugs, but the checks are slow, or are located in hotspots where they would be too expensive to perform in release builds, so are better to skip.

Assume is suboptimal for the first case because in most of the places where it is called the return value is never checked, so even when the check is performed in release builds, nothing is logged and there’s no indication of any problem. For the first case it would make more sense to have an Assume function that did the same thing but logged a warning like “Unexpected condition […] reached, consider reporting a bug.” Release builds are run in many more environments than debug builds so if unexpected states that we have explicitly written checks for are being reached, it would be probably better to know about them than not know about them.

Assume is suboptimal for the second case because instead of reliably not performing checks in release builds, checks will only be optimized out if the compiler is able to determine that the check has no side effects. This means the slowest checks may not be optimized out, which is confusing to say the least.

My inclination to improve this would be to keep using Assume for the first case and improve it, but introduce a new macro like DCHECK to be used for the second case that would only execute checks in debug builds. (I previously also suggested a SLOWCHECK macro to try to improve the second case)

hodlinator commented at 3:22 pm on March 27, 2025: contributor

What I’m missing is a SLOWCHECK/DCHECK macro that is active in debug and always active in fuzz builds (and fails fuzz-builds on failure even in release). For cases when something very unexpected is happening but it’s not worth crashing the node in production over it. Hopefully this need is not due to wanting to make up for sloppy comprehension of the code.

(It’s also unfortunate that Assume() uses the same term as the C++23 [[assume]] feature with very different semantics).

maflcko commented at 9:41 pm on March 27, 2025: member

What I’m missing is a SLOWCHECK/DCHECK macro that is active in debug and always active in fuzz builds (and fails fuzz-builds on failure even in release).

The Assume() macro does exactly this, albeit the fuzz part is only documented in the code itself.

hodlinator commented at 9:44 pm on March 27, 2025: contributor

The Assume() macro does exactly this, albeit the fuzz part is only documented in the code itself.

Ah, nice. Was getting thrown off by ABORT_ON_FAILED_ASSUME being debug-only, but G_FUZZING/FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION seems to achieve what I’m after.

doc: clarify the documentation of Assume assertion #32100

Code Coverage & Benchmarks

Reviews

doc: clarify the documentation of `Assume` assertion #32100