refactor: use original log string when no suspicious chars found #31923

pull l0rinc wants to merge 1 commits into bitcoin:master from l0rinc:lorinc/optimistic-log-escape-message changing 1 files +13 −7
  1. l0rinc commented at 10:55 am on February 21, 2025: contributor

    We have to make sure that logged messages don’t contain non-printable characters. Until now we’ve sanitized these messages by recreating them every time, even though we expect most lines to be already normalized. This PR splits out the happy path by identifying the suspicious characters, and if none were found, use the original string, otherwise reconstruct the new string with escaped characters. This makes the most common path simpler to reason about.

    For normal messages this results in a ~2x speedup, but the goal of the change is rather making sure that the case we encounter most often (printable values that don’t need sanitization) aren’t recreated every time (i.e. they’re simpler to reason about).

    build/src/bench/bench_bitcoin -filter=‘LogEscapeMessage.+’ –min-time=10000

    Before:

    ns/op op/s err% total benchmark
    216.10 4,627,429.03 0.2% 10.81 LogEscapeMessageNormal
    328.77 3,041,669.54 0.7% 10.97 LogEscapeMessageSuspicious

    After

    ns/op op/s err% total benchmark
    106.54 9,386,391.31 0.2% 11.01 LogEscapeMessageNormal
    335.58 2,979,947.73 0.3% 10.96 LogEscapeMessageSuspicious

    Note that this part of the code is called even with -printtoconsole=0.

  2. DrahtBot commented at 10:55 am on February 21, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31923.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept NACK maflcko

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  3. l0rinc force-pushed on Feb 21, 2025
  4. dergoegge commented at 11:05 am on February 21, 2025: member
    Does this result in an end-to-end performance improvement? (e.g. in IBD)
  5. l0rinc force-pushed on Feb 21, 2025
  6. fanquake commented at 5:46 pm on February 21, 2025: member

    https://github.com/bitcoin/bitcoin/actions/runs/13458258095/job/37607048581?pr=31923:

     0  65/139 Test   [#9](/bitcoin-bitcoin/9/): bench_sanity_check_high_priority .....***Failed   78.91 sec
     1Running with -sanity-check option, output is being suppressed as benchmark results will be useless.
     2/home/runner/work/_temp/src/logging.cpp:337:30: runtime error: implicit conversion from type 'char' of value -20 (8-bit, signed) to type 'unsigned char' changed the value to 236 (8-bit, unsigned)
     3    [#0](/bitcoin-bitcoin/0/) 0x5596549396a8 in BCLog::LogEscapeMessage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&) /home/runner/work/_temp/build-asan/src/util/./logging.cpp:337:30
     4    [#1](/bitcoin-bitcoin/1/) 0x559653466a72 in LogEscapeMessageSuspicious(ankerl::nanobench::Bench&)::$_0::operator()() const /home/runner/work/_temp/build-asan/src/bench/./bench/logging.cpp:94:29
     5    [#2](/bitcoin-bitcoin/2/) 0x559653466a72 in ankerl::nanobench::Bench& ankerl::nanobench::Bench::run<LogEscapeMessageSuspicious(ankerl::nanobench::Bench&)::$_0>(LogEscapeMessageSuspicious(ankerl::nanobench::Bench&)::$_0&&) /home/runner/work/_temp/build-asan/src/bench/./bench/nanobench.h:1221:13
     6    [#3](/bitcoin-bitcoin/3/) 0x559653464ac5 in LogEscapeMessageSuspicious(ankerl::nanobench::Bench&) /home/runner/work/_temp/build-asan/src/bench/./bench/logging.cpp:92:40
     7    [#4](/bitcoin-bitcoin/4/) 0x5596532f85b6 in std::function<void (ankerl::nanobench::Bench&)>::operator()(ankerl::nanobench::Bench&) const /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
     8    [#5](/bitcoin-bitcoin/5/) 0x5596532f85b6 in benchmark::BenchRunner::RunAll(benchmark::Args const&) /home/runner/work/_temp/build-asan/src/bench/./bench/bench.cpp:153:13
     9    [#6](/bitcoin-bitcoin/6/) 0x5596532e4ca5 in main /home/runner/work/_temp/build-asan/src/bench/./bench/bench_bitcoin.cpp:150:9
    10    [#7](/bitcoin-bitcoin/7/) 0x7f19fe68c1c9  (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 42c84c92e6f98126b3e2230ebfdead22c235b667)
    11    [#8](/bitcoin-bitcoin/8/) 0x7f19fe68c28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 42c84c92e6f98126b3e2230ebfdead22c235b667)
    12    [#9](/bitcoin-bitcoin/9/) 0x5596531f77d4 in _start (/home/runner/work/_temp/build-asan/src/bench/bench_bitcoin+0xbc97d4) (BuildId: 1f5859276d30b7f56abe5f57c08a08bc76f1aeb6)
    13
    14SUMMARY: UndefinedBehaviorSanitizer: implicit-integer-sign-change /home/runner/work/_temp/src/logging.cpp:337:30 
    
  7. l0rinc force-pushed on Feb 22, 2025
  8. in src/logging.cpp:333 in f8af5d9aa3 outdated
    329      */
    330-    std::string LogEscapeMessage(std::string_view str) {
    331+    std::string LogEscapeMessage(const std::string& str)
    332+    {
    333+        const auto suspicious_count{std::ranges::count_if(str, IsSuspicious)};
    334+        if (!suspicious_count) return str;
    


    maflcko commented at 8:58 am on February 24, 2025:
    This doesn’t use the original string. It still creates a copy and changing std::string_view str to const std::string& str in the function signature doesn’t change that. In fact, (albeit not in this patch), the signature change is a pessimization, because the compiler may be forced to create a temporary string to be able to create a reference to it, when a plain string_view would be sufficient.

    l0rinc commented at 2:29 pm on February 24, 2025:
    Thanks for the review! The benchmarks indicate that this potential extra copy doesn’t matter, but we can of course move the short-circuiting to the call site, that should make this more intuitive (we’re still getting one copy now, but at least it’s not character-by-character). What do you think, is this better?
  9. maflcko commented at 9:02 am on February 24, 2025: member

    Looks like part of the changes are based on a misunderstanding, so I left a comment.

    However, I am not sure if the changes should be done. If there is no end-user or developer-facing goal or benefits, then not changing the code seems preferable.

  10. maflcko commented at 9:10 am on February 24, 2025: member

    Trying to access https://corecheck.dev/bitcoin/bitcoin/pulls/31923 gives me

    0This function has crashed
    1
    2An unhandled error in the function code triggered the following message:
    3
    4LambdaTimeout
    

    cc @m3dwards

  11. l0rinc force-pushed on Feb 24, 2025
  12. maflcko commented at 7:43 am on February 25, 2025: member

    Tend toward NACK for now, as explained in the previous comments: #31923 (comment) and #31923#pullrequestreview-2636342094

    Given the added benchmarks, optimization seems to be the goal of this pull request. However, without a visible end-to-end improvement in any scenario (except in the added micro-bench), I fail to see why two benchmarks should be added and why the code should be changed.

  13. maflcko commented at 7:50 am on February 25, 2025: member
    Looking at https://corecheck.dev/bitcoin/bitcoin/pulls/31923 it seems that the LogWith* benchmarks are not shown at all? Looks like a bug where the list is truncated?
  14. l0rinc commented at 12:18 pm on February 25, 2025: contributor

    The code is reducing useless work therefore is also closer to how we’d likely write the code from scratch now. It’s basically a refactor that simplifies the happy path, with the side-effect of also being faster.

    as explained in the previous comments: #31923 (comment) and #31923#pullrequestreview-2636342094

    One was a question (I’m still measuring it, but don’t expect to have any scenario where IBD is faster because of this, in the same way as most other refactors also don’t have that finished measuring it, the log writings don’t show up as a measurable part of IBD, it’s more like a refactoring with minor speedup as a side-effect), the other is a comment about an old state.

  15. m3dwards commented at 7:50 pm on February 25, 2025: contributor

    Trying to access https://corecheck.dev/bitcoin/bitcoin/pulls/31923 gives me

    Should be sorted

  16. l0rinc renamed this:
    log,optimization: use original log string when no suspicious chars found
    refactor: use original log string when no suspicious chars found
    on Mar 9, 2025
  17. DrahtBot added the label Refactoring on Mar 9, 2025
  18. l0rinc commented at 4:57 pm on March 9, 2025: contributor

    @maflcko, I’ve reorganized the PR as a refactor where the happy path is now easier to reason about (i.e. no change to the incoming string most of the time) - where the speed gain is just a side-effect of the simpler path taken (it’s why the benchmarks are now PriorityLevel::LOW).

    Please let me know if this fits better with the priorities.

  19. maflcko commented at 11:51 am on March 27, 2025: member

    I’ve reorganized the PR as a refactor

    I’d say my nack still holds. I don’t see the need for adding redundant and questionable benchmarks for refactors. If this was a requirement or even desired for every single refactor of any function, the bench binary would tend to have a number of benchmarks equal to the number of functions in the whole codebase.

  20. log: use original log string when no suspicious chars found
    We have to make sure that logged messages don't contain non-printable characters.
    Until now, we've sanitized these messages by recreating them every time, even though we expect most lines to be already normalized.
    This PR splits out the happy path by identifying the suspicious characters, and if none were found, use the original string, otherwise reconstruct the new string with escaped characters.
    This makes the most common path simpler to reason about.
    d546d2319c
  21. l0rinc force-pushed on Mar 27, 2025
  22. l0rinc commented at 12:24 pm on March 27, 2025: contributor
    I’ve removed the benchmark and update the description, let me know if this seems better.

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-31 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me