bench: `linearizeoptimallyexample11` benchmark now running 4x slow than previously

fanquake commented at 2:16 pm on October 29, 2024: member

According to https://corecheck.dev/benchmarks, the linearizeoptimallyexample11 benchmark 4x’d it’s execution time at some point in the last few days. Screenshot 2024-10-29 at 14 15 02

sipa commented at 2:20 pm on October 29, 2024: member

More like 2x (from 0.64 to 1.21)? Still bizarre, this code has not been touched the past month for as far as I can see?

fanquake commented at 2:22 pm on October 29, 2024: member

I was just looking at total execution time increasing from ~4.3s to ~16.3s.

sipa commented at 2:23 pm on October 29, 2024: member

Oh nevermind, I was looking at the “outliers” panel, which doesn’t give absolute time. Indeed, 4x.

m3dwards commented at 10:19 pm on October 29, 2024: contributor

I have been able to recreate a roughly 2x slowdown on both x86 Linux and AArch64 Mac. Not quite as much as the corcheck.dev slowdown but still significant. Just for reference, corecheck.dev runs on AArch64 Linux.

I compared f0130ab1a1 (October 17th) with 97b790e844abd2 (October 29th).

I’ll have a scan for the offending commit and worse case do a binary search.

m3dwards commented at 11:05 pm on October 30, 2024: contributor

The slowdown appears to be introduced with https://github.com/bitcoin/bitcoin/commit/1c7ca6e64de9b47b2295c81cb0fedd432ffaf001

Not sure why yet.

sipa commented at 2:03 am on October 31, 2024: member

@m3dwards Thanks for digging! I believe that explains it: since #31093, Assume() calls are no longer optimized out in production builds.

davidgumberg commented at 2:40 am on October 31, 2024: contributor

@m3dwards Thanks for digging! I believe that explains it: since #31093, Assume() calls are no longer optimized out in production builds.

+1 I’ve reproduced this performance issue locally, and when you remove g_fuzzing from the inline_assertion_check’s if clause, the regression goes away:

./build/src/bench/bench_bitcoin -filter=LinearizeOptimallyExample11 -min-time=30000

branch	ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
1c7ca6e	374,612,291.12	2.67	0.1%	7,324,884,820.00	1,304,215,224.38	5.616	898,388,711.38	0.2%	32.23	`LinearizeOptimallyExample11`
1c7ca6e~1	301,758,341.60	3.31	0.3%	4,990,499,172.00	1,050,585,438.00	4.750	352,054,044.90	0.3%	32.48	`LinearizeOptimallyExample11`
no check g_fuzzing in assume 88a2427a68	300,695,825.00	3.33	0.1%	4,990,499,171.80	1,046,167,482.50	4.770	352,054,044.70	0.3%	32.39	`LinearizeOptimallyExample11`

I recorded some flamegraphs that point to where the hot Assume call(s) are: 1c7ca6e (permalink) vs parent (permalink)

l0rinc commented at 7:59 am on October 31, 2024: contributor

Could we obviate in the DrahtBot’s response that the link doesn’t just provide a test coverage report - since the regression was already visible there: https://corecheck.dev/bitcoin/bitcoin/pulls/31093

maflcko commented at 9:42 am on October 31, 2024: member

Sure, it is open source and pull requests for re-wording are welcome: https://github.com/maflcko/DrahtBot/blob/main/webhook_features/src/features/summary_comment.rs#L194

maflcko commented at 9:48 am on October 31, 2024: member

I wonder if an [[unlikely]] annotation can take care of this, so that the compiler can treat the Assume as an (almost) no-op in the happy path. Before and after the g_fuzzing change, val would have to always be evaluated. However, it seems the !val evaluation is the overhead in this case?

sipa commented at 11:35 am on October 31, 2024: member

@maflcko Worth trying, but I have been using Assume in many places assuming it’ll be optimized out entirely in production code. If we leave this g_fuzzing in place, we will need to re-evaluate whether all those Assumes are still worth having if they have a runtime impact.

maflcko commented at 11:51 am on October 31, 2024: member

Then I think the only solution is to make it a build-time option again:

0constexpr bool G_FUZZING{
1#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
2  true
3#else
4  false
5#endif
6};

ryanofsky commented at 1:58 pm on October 31, 2024: contributor

I think it is worth looking for a better way to fix this because of the clumsiness of needing to compile two different versions of libraries when fuzzing is used.

It is not true that in general that Assume checks were “optimized out” before #31093 because the checks are performed before the inline_assertion_check function is called, so wouldn’t be skipped in general just because the function is implemented differently. The compiler would be able to skip the checks if it knows they don’t have side effects. But if it knows they don’t have side effects, it should also be able to skip the checks when g_fuzzing is false.

I think we should look at what compiler is doing here and see if there’s a way we can get it to generate better code to fix the benchmark. Also if we can change the implementation of Assume to check g_fuzzing before running the Assume checks it might increase performance of the production builds generally in cases where the compile doesn’t know if the checks have side effects.

maflcko commented at 2:19 pm on October 31, 2024: member

It is not true that in general that Assume checks were “optimized out” before #31093 because the checks are performed before the inline_assertion_check function is called, so wouldn’t be skipped in general just because the function is implemented differently.

I may be missing something, but I don’t think this is true. The condition !val is never evaluated (and the compiler doesn’t has to account for it being evaluated and doesn’t have to emit any statements for it) in the following code example:

0if (false) {
1  if (!val) {
2    assertion_fail(file, line, func, assertion);
3  }
4}
5return std::forward<T>(val);

However, if the condition was false || g_fuzzing, the compiler would have to account for it and emit the !val check and the assertion_fail branch, even if tagged with [[unlikely]].

ryanofsky commented at 2:44 pm on October 31, 2024: contributor

I could also be missing something, but doesn’t Assume(<check expression>) just expand to inline_assertion_check<false>(<check expression>, __FILE__, __LINE__, __func__, "<check expression>") so unless the compiler knows <check expression> does not have side effects, must execute it and can’t optimize it out? If Assume macro were implemented differently it could skip evaluating <check expression> at all.

sipa commented at 2:58 pm on October 31, 2024: member

@ryanofsky Yeah, the context here is exactly Assume() calls where the conditions is easily shown to be side-effect free by the compiler. I have been using those fairly liberally (inside production code), because it means free extra checks in fuzzing tests that have no runtime cost, but are simultaneously definitely not going to result in different behavior between fuzz tests and production code (because it’s the compiler deciding the condition is side-effect free).

I think these are very useful to have, but it inherently requires a separate compilation for fuzz tests (or any other testing environment where we’d want them enabled). I see that such a separate compilation has downsides too, but the alternative I think does mean just getting rid of some of the most tight-loop Assumes. I don’t think it’s practical to prevent those from running by putting a conditional higher up the stack (at least not without introducing a much more significant risk of distinct behavior between production and testing).

maflcko commented at 3:10 pm on October 31, 2024: member

so unless the compiler knows <check expression> does not have side effects, must execute it and can’t optimize it out?

Correct, but the that hasn’t been changed by g_fuzzing at all. It is not the execution of <check expression>, but the execution of the negation of it (wrong, see #31178 (comment) ), and the corresponding expensive branch in hot paths which has to be emitted for the unlikely case that the negation evaluates to true.

sipa commented at 3:16 pm on October 31, 2024: member

Correct, but the that hasn’t been changed by g_fuzzing at all.

Ah, right, it’s possible that the compiler can turn Assume(<expr>); into if (g_fuzzing) { bool x = <expr>; if (!x) { ... } }, where <expr> is not evaluated at all if g_fuzzing is false. I was operating under the assumption that with the emitted branch, <expr> would always be evaluated even if side-effect free, but that is indeed not the case.

Still, just the fact that the g_fuzzing condition is there means a conditional is likely emitted, which can slow things down even if correctly predicted.

maflcko commented at 3:26 pm on October 31, 2024: member

In any case #31191 should fix the recently introduced g_fuzzing regression. If there are suggestions for other (more fundamental) changes or alternative code patches, separate pull requests would probably be best going forward. I am happy to review any of them, if they exist.

ryanofsky commented at 3:46 pm on October 31, 2024: contributor

After testing this, it does seem like, in practice, the slowdown comes from checking the g_fuzzing variable, not from evaluating the assume condition. I don’t agree with (or don’t understand) Marco’s logic that this was neccesarily the case, since before #31093 if the compiler knew evaluating the assume condition didn’t have side effects and that result of the assume condition was not used, it would have no reason to evaluate it before #31093, but would need to evaluate it after #31093, so an optimization that would skip evaluating the assume condition when g_fuzzing is false could result in a speedup.

But in practice, the speedup that comes from doing this is relatively minor. When I run the LinearizeOptimallyExample11 benchmark on master, I get around 1.4 operations per second. With a change to skip evaluating the assume condition when g_fuzzing is false, this increases to 1.5 operations per second. But if I revert #31093, it increases to 1.8 operation to second.

Change I was tested with was:

 0--- a/src/util/check.h
 1+++ b/src/util/check.h
 2@@ -41,21 +41,24 @@ T&& inline_check_non_fatal(LIFETIMEBOUND T&& val, const char* file, int line, co
 3 void assertion_fail(std::string_view file, int line, std::string_view func, std::string_view assertion);
 4 
 5 /** Helper for Assert()/Assume() */
 6-template <bool IS_ASSERT, typename T>
 7+template <typename T>
 8 constexpr T&& inline_assertion_check(LIFETIMEBOUND T&& val, [[maybe_unused]] const char* file, [[maybe_unused]] int line, [[maybe_unused]] const char* func, [[maybe_unused]] const char* assertion)
 9 {
10-    if (IS_ASSERT || std::is_constant_evaluated() || g_fuzzing
11-#ifdef ABORT_ON_FAILED_ASSUME
12-        || true
13-#endif
14-    ) {
15         if (!val) {
16             assertion_fail(file, line, func, assertion);
17         }
18-    }
19     return std::forward<T>(val);
20 }
21 
22+constexpr bool AbortOnFailedAssume() {
23+#ifdef ABORT_ON_FAILED_ASSUME
24+   return true;
25+#endif
26+   if (std::is_constant_evaluated()) return true;
27+   if (g_fuzzing) [[unlikely]] return true;
28+   return false;
29+}
30+
31 // All macros may use __func__ inside a lambda, so put them under nolint.
32 // NOLINTBEGIN(bugprone-lambda-function-name)
33 
34@@ -76,7 +79,7 @@ constexpr T&& inline_assertion_check(LIFETIMEBOUND T&& val, [[maybe_unused]] con
35     inline_check_non_fatal(condition, __FILE__, __LINE__, __func__, #condition)
36 
37 /** Identity function. Abort if the value compares equal to zero */
38-#define Assert(val) inline_assertion_check<true>(val, __FILE__, __LINE__, __func__, #val)
39+#define Assert(val) inline_assertion_check(val, __FILE__, __LINE__, __func__, #val)
40 
41 /**
42  * Assume is the identity function.
43@@ -88,7 +91,7 @@ constexpr T&& inline_assertion_check(LIFETIMEBOUND T&& val, [[maybe_unused]] con
44  * - For non-fatal errors in interactive sessions (e.g. RPC or command line
45  *   interfaces), CHECK_NONFATAL() might be more appropriate.
46  */
47-#define Assume(val) inline_assertion_check<false>(val, __FILE__, __LINE__, __func__, #val)
48+#define Assume(val) inline_assertion_check(!AbortOnFailedAssume() || (val), __FILE__, __LINE__, __func__, #val)
49 
50 /**
51  * NONFATAL_UNREACHABLE() is a macro that is used to mark unreachable code. It throws a NonFatalCheckError.

ryanofsky referenced this in commit 03cff2c142 on Nov 5, 2024

ryanofsky closed this on Nov 5, 2024

bench: linearizeoptimallyexample11 benchmark now running 4x slow than previously #31178

bench: `linearizeoptimallyexample11` benchmark now running 4x slow than previously #31178