tinyformat: enforce compile-time checks for format string literals #31149

pull stickies-v wants to merge 4 commits into bitcoin:master from stickies-v:2024-10/remove-string-format-overload changing 13 files +79 −71
  1. stickies-v commented at 2:32 pm on October 24, 2024: contributor

    Carved out of #30928 - this PR doesn’t include the controversial change of suppressing tinyformat::format_error throwing behaviour, and just adds enforcement of compile-time checks for format string literals.

    Reduce unexpected run-time crashes from string formatting by enforcing compile-time format string checks for most* tfm::format and strprintf usage.

    This PR should introduce no behaviour change. The main changes are:

    • remove the const std::string& tfm::format overload since it’s not necessary. Usage of this overload is removed by one of:
      • replacing string concatenation in the format string with just an extra parameter
      • using the bilingual_str overload
      • using the new tfm::format_raw functions (only for tinyformat implementation or tests)
    • rename the non-compile-time-validated tfm::format overloads to tfm::format_raw, so existing callsites by default use the safer util::ConstevalFormatString tfm::format overloads. Callsites that for some reason don’t pass the compile-time checks (such as in bitcoin-cli.cpp) can use tfm::format_raw.

    See #31149 (review) for an example on how this PR can prevent a run-time error for a small format string mistake.

    * See #31061 for bilingual_str format(const bilingual_str& fmt, const Args&... args) compile-time checks.

  2. DrahtBot commented at 2:32 pm on October 24, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK l0rinc
    Stale ACK maflcko, TheCharlatan, hodlinator

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31174 (tinyformat: Add compile-time checking for literal format strings by ryanofsky)
    • #30930 (netinfo: add peer services column and outbound-only option by jonatack)
    • #30221 (wallet: Ensure best block matches wallet scan state by achow101)
    • #28710 (Remove the legacy wallet and BDB dependency by achow101)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. laanwj added the label Utils/log/libs on Oct 24, 2024
  4. maflcko commented at 5:27 pm on October 24, 2024: member

    Only change since previous review (https://github.com/bitcoin/bitcoin/pull/30928#pullrequestreview-2355129479) is dropping one commit.

    re-ACK 4cf12216951e91550c81db807fe0ecfafe6834c9 🔡

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: re-ACK 4cf12216951e91550c81db807fe0ecfafe6834c9 🔡
    3+Wc7LAsLFEv1ml5wXBpRIhY/WbQ/TIxWPVXqF9OJkPsRA7bKmHlMBit5JhDzclgMU5hAfEPaV/gka44cL1CtBQ==
    
  5. hodlinator approved
  6. hodlinator commented at 8:11 pm on October 24, 2024: contributor

    ACK 4cf12216951e91550c81db807fe0ecfafe6834c9

    Only dropped commit I had concerns about. l0rinc too.

    Does not entirely prevent throwing of exceptions from incorrect format strings (notably translated strings), but is a step in the right direction (validating more format strings at compile time).

  7. in src/test/fuzz/strprintf.cpp:55 in 4cf1221695 outdated
    51@@ -52,27 +52,27 @@ FUZZ_TARGET(str_printf)
    52         CallOneOf(
    53             fuzzed_data_provider,
    54             [&] {
    55-                (void)strprintf(format_string, fuzzed_data_provider.ConsumeRandomLengthString(32));
    56+                (void)tinyformat::format_raw(format_string.c_str(), fuzzed_data_provider.ConsumeRandomLengthString(32));
    


    maflcko commented at 8:59 am on October 25, 2024:

    nit: Maybe to avoid the churn here (changing the same line of code twice), and the c_str bloat, a simple alias could be introduced in an early commit, so that only the alias has to be changed internally in later commits?

    For example

    0template<typename... Args>
    1void fuzz_fmt(const std::string &fmt, const Args&... args){(void)tfm::format_raw(fmt.c_str(), args...);}
    

    stickies-v commented at 11:30 am on October 25, 2024:
    Good idea, thanks - added you as co-author to 99b12b5cfcf438a59e9f54ad6d2228be86a9bfeb
  8. DrahtBot added the label Needs rebase on Oct 25, 2024
  9. stickies-v force-pushed on Oct 25, 2024
  10. stickies-v commented at 11:32 am on October 25, 2024: contributor
    Rebased to address merge-conflict from #29936. Also added commit 99b12b5cfcf438a59e9f54ad6d2228be86a9bfeb to minimize c_str() usage (an earlier review concern of @ryanofsky) and minimize code churn, as suggested by @maflcko.
  11. maflcko commented at 11:39 am on October 25, 2024: member

    re-ACK 23560b468c474bbc6a3abd6c0778da62766ac8f7 🎛

    Signature:

    0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
    1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
    2trusted comment: re-ACK 23560b468c474bbc6a3abd6c0778da62766ac8f7 🎛
    3Ao1JTeGWaZ/P4YeyDzYeC0BXiuu/VqnjIy0SbF98ltv2ZirS7plmO4GDfWjCbWSTfMYGkSpCvhNTReEu8OBaAg==
    
  12. DrahtBot requested review from hodlinator on Oct 25, 2024
  13. in src/bitcoin-cli.cpp:546 in 23560b468c outdated
    545-                                m_max_addr_rate_limited_length, "addrl",
    546-                                m_max_age_length, "age");
    547+            result += tfm::format_raw("<->   type   net  v  mping   ping send recv  txn  blk  hb %*s%*s%*s ",
    548+                                      m_max_addr_processed_length, "addrp",
    549+                                      m_max_addr_rate_limited_length, "addrl",
    550+                                      m_max_age_length, "age");
    


    jonatack commented at 12:33 pm on October 25, 2024:

    Is all the indentation churn really necessary?

    Does changing strprintf to tfm_format_raw here catch any potential issues that wouldn’t be seen in manual testing?

    Reduce unexpected run-time crashes from string formatting

    Can you provide an example?


    stickies-v commented at 12:43 pm on October 25, 2024:

    Is all the indentation churn really necessary?

    It’s clang-format output which I believe is what we’re trying to adhere to.

    Does changing strprintf to tfm_format_raw here catch any potential issues that wouldn’t be seen in manual testing?

    The code does not compile with strprintf because this PR enforces compile-time checks, which fail because of the width specifiers in the format string (there’s quite some discussion on these limitations in #30546). See also this excerpt from the PR description that addresses this:

    • rename the non-compile-time-validated tfm::format overloads to tfm::format_raw, so existing callsites by default use the safer util::ConstevalFormatString tfm::format overloads. Callsites that for some reason don’t pass the compile-time checks (such as in bitcoin-cli.cpp) can use tfm::format_raw.

    jonatack commented at 12:50 pm on October 25, 2024:
    Clang-format can be helpful for new code, but I don’t think it’s necessary to re-indent code for it that isn’t otherwise touched – that’s just noisy churn.

    jonatack commented at 12:52 pm on October 25, 2024:
    Please provide an example of how this code could crash currently without this change?

    stickies-v commented at 1:00 pm on October 25, 2024:

    Clang-format can be helpful for new code, but I don’t think it’s necessary to re-indent code for it that isn’t otherwise touched – that’s just noisy churn.

    I’m just going with our clang-format settings as defined in src/.clang-format - if you feel like those settings are suboptimal opening a separate pull/issue for that is probably more appropriate?

    Please provide an example of how this code could crash currently without this change?

    The PR states that there is no behaviour change, so (unless there are run-time bugs we haven’t seen yet) we can’t make this code crash without this change. The point of this PR is to make the code more robust against future changes, where a new/modified format string can cause a run-time error, e.g. if not all code-paths are manually tested by reviewers (which is not unusual).


    stickies-v commented at 1:05 pm on October 25, 2024:

    Please provide an example of how this code could crash currently without this change?

    If you’re asking about how such an error could sneak in without this PR, something like this would lead to a run-time exception (on current master b95adf057a4091941c003db4f854f8212b7e99af):

    (edit: updated example to show something that’s actually covered by our new compile-time tests, instead of the explicitly excluded tfm::format_raw callsites)

     0diff --git a/src/bitcoin-cli.cpp b/src/bitcoin-cli.cpp
     1index 38fd740b71..4bbef270c5 100644
     2--- a/src/bitcoin-cli.cpp
     3+++ b/src/bitcoin-cli.cpp
     4@@ -604,7 +604,7 @@ public:
     5         for (size_t i = 0; i < rows.size(); ++i) {
     6             result += strprintf("\n%-5s", rows[i]); // row header
     7             for (int8_t n : reachable_networks) {
     8-                result += strprintf("%8i", m_counts.at(i).at(n)); // network peers count
     9+                result += strprintf("%8$i", m_counts.at(i).at(n)); // network peers count
    10             }
    11             result += strprintf("   %5i", m_counts.at(i).at(NETWORKS.size())); // total peers count
    12             if (i == 1) { // the outbound row has two extra columns for block relay and manual peer counts
    
    0% ./bitcoin-cli -netinfo
    1error: tinyformat: Positional argument out of range
    

    With this PR, it gets caught at compile-time:

     0[ 51%] Built target bitcoin-wallet
     1../../src/bitcoin-cli.cpp:607:37: error: call to consteval function 'util::ConstevalFormatString<1>::ConstevalFormatString' is not a constant expression
     2                result += strprintf("%8$i", m_counts.at(i).at(n)); // network peers count
     3                                    ^
     4../../src/util/string.h:77:34: note: subexpression not valid in a constant expression
     5        if (num_params != count) throw "Format specifier count must match the argument count!";
     6                                 ^
     7../../src/util/string.h:37:67: note: in call to 'Detail_CheckNumFormatSpecifiers({&"%8$i"[0], 4})'
     8    consteval ConstevalFormatString(const char* str) : fmt{str} { Detail_CheckNumFormatSpecifiers(fmt); }
     9                                                                  ^
    10../../src/bitcoin-cli.cpp:607:37: note: in call to 'ConstevalFormatString(&"%8$i"[0])'
    11                result += strprintf("%8$i", m_counts.at(i).at(n)); // network peers count
    12                                    ^
    131 error generated.
    

    jonatack commented at 1:42 pm on October 25, 2024:

    Thanks for the example. As the primary author of this code, I think the run-time error would be more clear, though a compile-time error would catch it one CLI command earlier. I haven’t found this kind of error to be an issue for me over the past half decade on this code.

    Also, this code path is manually tested by myself, reviewers and users. Perhaps for that reason, and because it’s non-critical code, a maintainer explicitly asked that no test coverage be written for it.

    So either way is fine, I suppose, but this conflicts with #30930 so it’s more an annoyance than helpful for me. The rest of this PR seems fine.


    jonatack commented at 1:45 pm on October 25, 2024:

    I’m just going with our clang-format settings as defined in src/.clang-format

    It’s not a hard requirement to retouch existing code with that tool, and I’d prefer avoiding the churn.


    stickies-v commented at 1:59 pm on October 25, 2024:

    As the primary author of this code, I think the run-time error would be more clear

    I’m sorry, what? For what reason would we not use the logic we already have to catch (some/most) of these format errors at compile-time, and defer them to run-time instead, relying on manual review to make sure each code-path is ran, potentially crashing user software? That seems like the opposite of security engineering best practices to me.

    a maintainer explicitly asked that no test coverage be written for it.

    We’re not writing any test coverage for it here either, it comes for free. How is that a bad thing?

    It’s not a hard requirement to retouch existing code with that tool, and I’d prefer avoiding the churn.

    Noted, but I prefer keeping this as is, I’d rather not override clang-format for this.


    TheCharlatan commented at 2:05 pm on October 25, 2024:

    I’d prefer avoiding the churn.

    We usually call things churn that change things multiple times within the same PR. Indenting the lines below a changed line that belong to the same logical unit is a good thing in my eyes. If we would not do that anywhere, the code would look weird and probably make clang-format unusable after a while.


    stickies-v commented at 2:13 pm on October 25, 2024:

    The rest of this PR seems fine.

    We can’t have the rest of this PR without updating bitcoin-cli.cpp, unless someone implements the complete format string validation logic, for which as per my earlier comment, see #30546.


    jonatack commented at 2:19 pm on October 25, 2024:

    I’m sorry, what? For what reason would we not use the logic we already have to catch (some/most) of these format errors at compile-time, and defer them to run-time instead, relying on manual review to make sure each code-path is ran, potentially crashing user software?

    relying on manual review to make sure each code-path is ran, potentially crashing user software

    Because, as I wrote:

    • I think the run-time error is more clear to understand, as the person working on the code
    • This code path is manually tested anyway, otherwise the command with any non-zero level won’t run

    That seems like the opposite of security engineering best practices to me.

    One could argue the same for discouraging test coverage, but that’s been the stance.


    jonatack commented at 2:21 pm on October 25, 2024:

    I’d prefer avoiding the churn.

    We usually call things churn that change things multiple times within the same PR. Indenting the lines below a changed line that belong to the same logical unit is a good thing in my eyes. If we would not do that anywhere, the code would look weird and probably make clang-format unusable after a while.

    The lines are changed needlessly. Whether we call it noise or churn is unimportant.


    jonatack commented at 2:24 pm on October 25, 2024:
    @TheCharlatan mind being explicit what part of my feedback your emoji relates to?

    stickies-v commented at 2:26 pm on October 25, 2024:

    I think the run-time error is more clear to understand, as the person working on the code

    And I think prioritizing devs getting nicer error messages over increased software run-time stability is almost never a good trade-off. It seems we have fundamentally different views, so I’ll leave this thread unresolved but I strongly disagree with your approach and won’t change the PR because of it.

    This code path is manually tested anyway, otherwise the command with any non-zero level won’t run

    I understand it’s tested, and that’s good, because tfm::format_raw() doesn’t do any compile-time checks.


    TheCharlatan commented at 2:36 pm on October 25, 2024:

    mind being explicit what part of my feedback your emoji relates to?

    Sure

    I haven’t found this kind of error to be an issue for me over the past half decade on this code.

    I’ve seen a bunch of positional args errors in fairly recent times, and catching them at compile time is certainly worthwhile. People tried doing this with various linters, but the linters produced false-positives, or missed certain cases.


    jonatack commented at 2:48 pm on October 25, 2024:

    And I think prioritizing devs getting nicer error messages over increased software run-time stability is almost never a good trade-off.

    I think we agree there, given that you wrote “almost” ;)


    l0rinc commented at 7:48 pm on October 28, 2024:

    Does changing strprintf to tfm_format_raw here catch any potential issues that wouldn’t be seen in manual testing?

    By manual testing do you mean automated testing of print statements?

    Clang-format can be helpful for new code, but I don’t think it’s necessary to re-indent code for it that isn’t otherwise touched

    Please format affected code, don’t leave it unformatted, that’s why the current code is so confusing.

    0Coding Style (General)
    1----------------------
    2
    3Various coding styles have been used during the history of the codebase,
    4and the result is not very consistent. However, we're now trying to converge to
    5a single style, which is specified below. When writing patches, favor the new
    6style over attempting to mimic the surrounding style, except for move-only
    7commits.
    

    I haven’t found this kind of error to be an issue for me over the past half decade on this code.

    Others have found it to be an issue, better catch them as early as we can. Why wait until runtime or CI failure or release.


    jonatack commented at 8:16 pm on October 28, 2024:

    Please format affected code, don’t leave it unformatted, that’s why the current code is so confusing.

    The indentation wasn’t wrong, and in the past, long-term contributors generally or often didn’t wish to reformat code in their pulls that wasn’t wrong in some dangerous way when newer contributors asked for reformatting in reviews. There has been extensive bikeshedding over what the clang-format setup here should even do. For what it’s worth, leaving these arguments on the same line, read left-to-right, which was common in this codebase earlier, would avoid the discussion.


    l0rinc commented at 8:29 pm on October 28, 2024:
    There was no bikeshedding about this before on #31149, it just started.

    jonatack commented at 8:44 pm on October 28, 2024:

    By manual testing do you mean automated testing of print statements?

    No, manual testing by building and running the CLI commands.

    There is no automated test coverage of -netinfo (the touched code above), nor, for instance, of -addrinfo.


    l0rinc commented at 8:51 pm on October 28, 2024:
    By doing some of the validation at compile time, there is now some automated ’test coverage’ on it - i.e. it will bite back if it’s very wrong without any additional manual check.

    jonatack commented at 8:55 pm on October 28, 2024:

    Generally unneeded changes can be avoided, but that was just a suggestion. Further historical context to get started:


    l0rinc commented at 9:15 pm on October 28, 2024:
    Useful context, thanks! We must however keep our kitchen clean, we can’t just be cooking all the time. This change by @stickies-v was a small, low-risk change, making the code slightly more maintainable (or at least slightly less unmaintainable). We should encourage these.
  14. DrahtBot removed the label Needs rebase on Oct 25, 2024
  15. TheCharlatan approved
  16. TheCharlatan commented at 2:00 pm on October 25, 2024: contributor
    Nice, ACK 23560b468c474bbc6a3abd6c0778da62766ac8f7
  17. hodlinator approved
  18. hodlinator commented at 9:15 am on October 26, 2024: contributor

    re-ACK 23560b468c474bbc6a3abd6c0778da62766ac8f7

    Changes since previous ACK:

    • Added file-local fuzz_fmt function to reduce churn.
    • Rebased and applied conversions to new code from #29936.
  19. maflcko requested review from ryanofsky on Oct 28, 2024
  20. DrahtBot added the label Needs rebase on Oct 28, 2024
  21. move-only: ConstevalFormatString tfm::format to tinyformat.h
    In future commits, the non-ConstevalFormatString tfm::format
    overloads will be renamed or removed. To avoid all callsites
    having to include util/string.h (or to have a circular dependency
    between tinyformat.h and util/string.h), just move the
    ConstevalFormatString overloads to tinyformat.h
    771efc25a9
  22. fuzz: refactor: add fuzz_fmt string helper function
    In a future commit, the tfm::format std::string overload will
    be removed to enforce usage of a compile-time checked overload.
    
    To minimize code churn in future commits, add a test-only
    std::string tfm::format helper function.
    
    Co-authored-by: MarcoFalke <*~=`'#}+{/-|&$^_@721217.xyz>
    d33f41ab65
  23. tinyformat: remove std::string overload
    All (non-test) format strings are known at compile-time, so we
    don't need the std::string overload. Remove it so users use the
    safer util::ConstevalFormatString where possible, or the
    bilingual_str overload where applicable.
    b6a39c81e8
  24. tinyformat: force compile-time checks for format string literals
    Callsites such as bitcoin-cli that (incorrectly) fail compile-time
    validation can still use tfm::format_raw.
    b448d60826
  25. stickies-v force-pushed on Oct 28, 2024
  26. stickies-v commented at 7:00 pm on October 28, 2024: contributor
    Force-pushed to rebase to address merge conflict from #31042, no other changes.
  27. DrahtBot removed the label Needs rebase on Oct 28, 2024
  28. l0rinc approved
  29. l0rinc commented at 8:45 pm on October 28, 2024: contributor

    Compared to #30928 I see the following changes:

    • clientversion.cpp: Merge conflict in clientversion at CopyrightHolders -> seems to be resolved correctly
    • logging.h: old LogPrintFormatInternal is retained
    • run-lint-format-strings.py: FALSE_POSITIVES not removed
    • util_string_tests.cpp: FALSE_POSITIVES not moved to tests
    • strprintf.cpp: fuzz_fmt added to avoid renames, try-catch added around fuzz_fmt calls (any reason for not try-catching inside fuzz_fmt instead?)
    • tinyformat.h: try-catch removed from vformat
    • wallet.h: new strprintf migrated

    ACK b448d6082674e2c6d2a172145e33795b01e072ff

  30. DrahtBot requested review from hodlinator on Oct 28, 2024
  31. DrahtBot requested review from TheCharlatan on Oct 28, 2024
  32. DrahtBot requested review from maflcko on Oct 28, 2024
  33. ryanofsky commented at 0:05 am on October 29, 2024: contributor

    Sorry for causing unnecessary churn with my earlier comments. After reading Marco’s comment in #31072 (review) I finally understand what this PR is really trying to do, and why all the changes are being made here. Before reading that comment I thought this PR was trying not just to add compile-time checking for tinyformat format strings, but also to force calls that don’t use compile-time checking to use different formatting functions like format_raw and fuzz_fmt. But now I see goal of this PR really is to enable compile-time checking, and the other changes here are basically just a side-effect of the approach taken to implement it.

    That being said, I think the approach in this PR is not ideal. I think introduction of _raw() functions and uses of c_str() are ugly and unnecessary, and even though there are good changes here like code cleanup in various places, those changes should not be necessary just to have compile-time checking of literal format strings, and bundled in with less positive changes here.

    I implemented an different approach in #31174 which only adds compile-time checking for strprintf/tfm::format string literals, and doesn’t make other changes. If that PR looks ok, maybe we could use that approach, and then sort out which other changes here are actually improvements on their own terms? I would much prefer that to bundling good, bad, and ugly changes together in one PR.

  34. stickies-v commented at 3:20 pm on October 29, 2024: contributor

    Closing this PR in favour of #31174, which I think achieves mostly the same goal but does so in a much more elegant way. Thanks for your review and suggestion @ryanofsky, and everyone else here for the persistent re-review as this work is evolving.

    and the other changes here are basically just a side-effect of the approach taken to implement it.

    I think there is merit in making the less safe (i.e. std::string overload) less convenient to use so that developers who are less familiar with this code don’t accidentally use it when they don’t have to, but I’ll leave a comment on your PR so we can have the conversation there.

  35. stickies-v closed this on Oct 29, 2024

  36. ryanofsky commented at 4:54 pm on October 29, 2024: contributor

    I think there is merit in making the less safe (i.e. std::string overload) less convenient to use so that developers who are less familiar with this code don’t accidentally use it when they don’t have to, but I’ll leave a comment on your PR so we can have the conversation there.

    I agree with this, and commit b6a39c81e85338bc82f3db924157a599aa7e25fa in this PR shows places where code is doing things like strprintf("[" + comment + "]") that get around compile time checking. It’d be great to keep the changes in this PR which clean up that code, and add a more explicit syntax for passing unchecked strings. I think I would prefer a syntax more like strprintf(Unchecked(desc_fmt), ...) instead of tfm::format_raw(desc_fmt.c_str(), ...), but I mainly just think it is best to make these changes separately so PRs are easier to understand and don’t touch so much code.

  37. maflcko commented at 1:38 pm on November 8, 2024: member
    I agree that the changes here are still useful. It would be good to rebase on top of #31174, maybe after it is merged.
  38. fanquake referenced this in commit f44e39c9d0 on Nov 14, 2024
  39. fanquake referenced this in commit 22723c809a on Dec 6, 2024
  40. maflcko commented at 10:41 am on December 7, 2024: member

    I think I would prefer a syntax more like strprintf(Unchecked(desc_fmt), ...) instead of tfm::format_raw(desc_fmt.c_str(), ...)

    Due to inactivity here, I went ahead and implemented this syntax in #31061, with a different name.

    So I think everything here has been picked up.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me