refactor: use original log string when no suspicious chars found

l0rinc commented at 10:55 am on February 21, 2025: contributor

We have to make sure that logged messages don’t contain non-printable characters. Until now we’ve sanitized these messages by recreating them every time, even though we expect most lines to be already normalized. This PR splits out the happy path by identifying the suspicious characters, and if none were found, use the original string, otherwise reconstruct the new string with escaped characters. This makes the most common path simpler to reason about.

For normal messages this results in a ~2x speedup, but the goal of the change is rather making sure that the case we encounter most often (printable values that don’t need sanitization) aren’t recreated every time (i.e. they’re simpler to reason about).

build/src/bench/bench_bitcoin -filter=‘LogEscapeMessage.+’ –min-time=10000

Before:

ns/op	op/s	err%	total	benchmark
216.10	4,627,429.03	0.2%	10.81	`LogEscapeMessageNormal`
328.77	3,041,669.54	0.7%	10.97	`LogEscapeMessageSuspicious`

After

ns/op	op/s	err%	total	benchmark
106.54	9,386,391.31	0.2%	11.01	`LogEscapeMessageNormal`
335.58	2,979,947.73	0.3%	10.96	`LogEscapeMessageSuspicious`

Note that this part of the code is called even with -printtoconsole=0.

DrahtBot commented at 10:55 am on February 21, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31923.

Reviews

See the guideline for information on the review process. A summary of reviews will appear here.

l0rinc force-pushed on Feb 21, 2025

dergoegge commented at 11:05 am on February 21, 2025: member

Does this result in an end-to-end performance improvement? (e.g. in IBD)

l0rinc force-pushed on Feb 21, 2025

fanquake commented at 5:46 pm on February 21, 2025: member

https://github.com/bitcoin/bitcoin/actions/runs/13458258095/job/37607048581?pr=31923:

 0  65/139 Test   [#9](/bitcoin-bitcoin/9/): bench_sanity_check_high_priority .....***Failed   78.91 sec
 1Running with -sanity-check option, output is being suppressed as benchmark results will be useless.
 2/home/runner/work/_temp/src/logging.cpp:337:30: runtime error: implicit conversion from type 'char' of value -20 (8-bit, signed) to type 'unsigned char' changed the value to 236 (8-bit, unsigned)
 3    [#0](/bitcoin-bitcoin/0/) 0x5596549396a8 in BCLog::LogEscapeMessage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&) /home/runner/work/_temp/build-asan/src/util/./logging.cpp:337:30
 4    [#1](/bitcoin-bitcoin/1/) 0x559653466a72 in LogEscapeMessageSuspicious(ankerl::nanobench::Bench&)::$_0::operator()() const /home/runner/work/_temp/build-asan/src/bench/./bench/logging.cpp:94:29
 5    [#2](/bitcoin-bitcoin/2/) 0x559653466a72 in ankerl::nanobench::Bench& ankerl::nanobench::Bench::run<LogEscapeMessageSuspicious(ankerl::nanobench::Bench&)::$_0>(LogEscapeMessageSuspicious(ankerl::nanobench::Bench&)::$_0&&) /home/runner/work/_temp/build-asan/src/bench/./bench/nanobench.h:1221:13
 6    [#3](/bitcoin-bitcoin/3/) 0x559653464ac5 in LogEscapeMessageSuspicious(ankerl::nanobench::Bench&) /home/runner/work/_temp/build-asan/src/bench/./bench/logging.cpp:92:40
 7    [#4](/bitcoin-bitcoin/4/) 0x5596532f85b6 in std::function<void (ankerl::nanobench::Bench&)>::operator()(ankerl::nanobench::Bench&) const /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
 8    [#5](/bitcoin-bitcoin/5/) 0x5596532f85b6 in benchmark::BenchRunner::RunAll(benchmark::Args const&) /home/runner/work/_temp/build-asan/src/bench/./bench/bench.cpp:153:13
 9    [#6](/bitcoin-bitcoin/6/) 0x5596532e4ca5 in main /home/runner/work/_temp/build-asan/src/bench/./bench/bench_bitcoin.cpp:150:9
10    [#7](/bitcoin-bitcoin/7/) 0x7f19fe68c1c9  (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 42c84c92e6f98126b3e2230ebfdead22c235b667)
11    [#8](/bitcoin-bitcoin/8/) 0x7f19fe68c28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 42c84c92e6f98126b3e2230ebfdead22c235b667)
12    [#9](/bitcoin-bitcoin/9/) 0x5596531f77d4 in _start (/home/runner/work/_temp/build-asan/src/bench/bench_bitcoin+0xbc97d4) (BuildId: 1f5859276d30b7f56abe5f57c08a08bc76f1aeb6)
13
14SUMMARY: UndefinedBehaviorSanitizer: implicit-integer-sign-change /home/runner/work/_temp/src/logging.cpp:337:30

l0rinc force-pushed on Feb 22, 2025

in src/logging.cpp:333 in f8af5d9aa3 outdated

329      */
330-    std::string LogEscapeMessage(std::string_view str) {
331+    std::string LogEscapeMessage(const std::string& str)
332+    {
333+        const auto suspicious_count{std::ranges::count_if(str, IsSuspicious)};
334+        if (!suspicious_count) return str;

maflcko commented at 8:58 am on February 24, 2025:

This doesn’t use the original string. It still creates a copy and changing std::string_view str to const std::string& str in the function signature doesn’t change that. In fact, (albeit not in this patch), the signature change is a pessimization, because the compiler may be forced to create a temporary string to be able to create a reference to it, when a plain string_view would be sufficient.

l0rinc commented at 2:29 pm on February 24, 2025:

Thanks for the review! The benchmarks indicate that this potential extra copy doesn’t matter, but we can of course move the short-circuiting to the call site, that should make this more intuitive (we’re still getting one copy now, but at least it’s not character-by-character). What do you think, is this better?

maflcko commented at 9:02 am on February 24, 2025: member

Looks like part of the changes are based on a misunderstanding, so I left a comment.

However, I am not sure if the changes should be done. If there is no end-user or developer-facing goal or benefits, then not changing the code seems preferable.

maflcko commented at 9:10 am on February 24, 2025: member

Trying to access https://corecheck.dev/bitcoin/bitcoin/pulls/31923 gives me

0This function has crashed
1
2An unhandled error in the function code triggered the following message:
3
4LambdaTimeout

cc @m3dwards

l0rinc force-pushed on Feb 24, 2025

maflcko commented at 7:43 am on February 25, 2025: member

Tend toward nack for now, as explained in the previous comments: #31923 (comment) and #31923#pullrequestreview-2636342094

Given the added benchmarks, optimization seems to be the goal of this pull request. However, without a visible end-to-end improvement in any scenario (except in the added micro-bench), I fail to see why two benchmarks should be added and why the code should be changed.

maflcko commented at 7:50 am on February 25, 2025: member

Looking at https://corecheck.dev/bitcoin/bitcoin/pulls/31923 it seems that the LogWith* benchmarks are not shown at all? Looks like a bug where the list is truncated?

l0rinc commented at 12:18 pm on February 25, 2025: contributor

The code is reducing useless work therefore is also closer to how we’d likely write the code from scratch now. It’s basically a refactor that simplifies the happy path, with the side-effect of also being faster.

as explained in the previous comments: #31923 (comment) and #31923#pullrequestreview-2636342094

One was a question (~~I’m still measuring it, but don’t expect to have any scenario where IBD is faster because of this, in the same way as most other refactors also don’t have that~~ finished measuring it, the log writings don’t show up as a measurable part of IBD, it’s more like a refactoring with minor speedup as a side-effect), the other is a comment about an old state.

m3dwards commented at 7:50 pm on February 25, 2025: contributor

Trying to access https://corecheck.dev/bitcoin/bitcoin/pulls/31923 gives me

Should be sorted

l0rinc renamed this:
~~log,optimization: use original log string when no suspicious chars found~~
refactor: use original log string when no suspicious chars found
on Mar 9, 2025

DrahtBot added the label Refactoring on Mar 9, 2025

l0rinc commented at 4:57 pm on March 9, 2025: contributor

@maflcko, I’ve reorganized the PR as a refactor where the happy path is now easier to reason about (i.e. no change to the incoming string most of the time) - where the speed gain is just a side-effect of the simpler path taken (it’s why the benchmarks are now PriorityLevel::LOW).

Please let me know if this fits better with the priorities.

maflcko commented at 11:51 am on March 27, 2025: member

I’ve reorganized the PR as a refactor

I’d say my nack still holds. I don’t see the need for adding redundant and questionable benchmarks for refactors. If this was a requirement or even desired for every single refactor of any function, the bench binary would tend to have a number of benchmarks equal to the number of functions in the whole codebase.

log: use original log string when no suspicious chars found

We have to make sure that logged messages don't contain non-printable characters.
Until now, we've sanitized these messages by recreating them every time, even though we expect most lines to be already normalized.
This PR splits out the happy path by identifying the suspicious characters, and if none were found, use the original string, otherwise reconstruct the new string with escaped characters.
This makes the most common path simpler to reason about.

d546d2319c

l0rinc force-pushed on Mar 27, 2025

l0rinc commented at 12:24 pm on March 27, 2025: contributor

I’ve removed the benchmark and update the description, let me know if this seems better.

maflcko commented at 8:04 pm on April 17, 2025: member

No strong opinion, but I don’t see the point. This splits code that was nicely contained over three places, makes it verbose and harder to understand with no visible end-to-end benefit.

l0rinc closed this on Apr 17, 2025

refactor: use original log string when no suspicious chars found #31923

Code Coverage & Benchmarks

Reviews