util: Use consteval checked format string in FatalErrorf, LogConnectFailure #30546

pull maflcko wants to merge 4 commits into bitcoin:master from maflcko:2407-log changing 8 files +166 −20
  1. maflcko commented at 10:47 am on July 30, 2024: member

    The test/lint/lint-format-strings.py was designed to count the number of format specifiers and assert that they are equal to the number of parameters passed to the format function. The goal seems reasonable, but the implementation has many problems:

    • It is written in Python, meaning that C++ code can not be parsed correctly. Currently it relies on brittle regex and string parsing.
    • Apart from the parsing errors, there are also many logic errors. For example, count_format_specifiers allows a mix of positional specifiers and non-positional specifiers, which can lead to runtime format bugs. Also, count_format_specifiers silently skipped over “special” format specifiers, which are valid in tinyformat, which again can lead to runtime format bugs being undetected.
    • The brittle logic has a history of breaking in pull requests that are otherwise fine. This causes the CI to fail and the pull request being blocked from progress until the bug in the linter is fixed, or the code is rewritten to work around the bug.
    • It is only run in the CI, or when the developer invokes the script. It would be better if the developer got the error message at compile-time, directly when writing the code.

    Fix all issues by using a consteval checked format string in FatalErrorf and LogConnectFailure.

    This is the first step toward #30530 and a follow-up will apply the approach to the other places.

  2. DrahtBot commented at 10:47 am on July 30, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK stickies-v, hodlinator, l0rinc, ryanofsky
    Stale ACK TheCharlatan

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30635 (rpc: add optional blockhash to waitfornewblock by Sjors)
    • #30409 (Introduce waitTipChanged() mining interface, replace RPCNotifyBlockChange, drop CRPCSignals & g_best_block by Sjors)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot renamed this:
    util: Use consteval checked format string in FatalErrorf
    util: Use consteval checked format string in FatalErrorf
    on Jul 30, 2024
  4. DrahtBot added the label Utils/log/libs on Jul 30, 2024
  5. maflcko commented at 10:57 am on July 30, 2024: member

    Can be tested with the following diff:

     0diff --git a/src/index/base.cpp b/src/index/base.cpp
     1index 955d7b67c9..9811074a2c 100644
     2--- a/src/index/base.cpp
     3+++ b/src/index/base.cpp
     4@@ -179,7 +179,7 @@ void BaseIndex::Sync()
     5                 }
     6             }
     7             if (pindex_next->pprev != pindex && !Rewind(pindex, pindex_next->pprev)) {
     8-                FatalErrorf("%s: Failed to rewind index %s to a previous chain tip", __func__, GetName());
     9+                FatalErrorf("%s: /* Failed to rewind index %s to a previous chain tip */", __func__, GetName());
    10                 return;
    11             }
    12             pindex = pindex_next;
    

    On master (fails):

    0$ ./test/lint/lint-format-strings.py 
    1src/index/base.cpp: Expected 1 argument(s) after format string but found 2 argument(s): FatalErrorf("%s: ", __func__, GetName())
    

    On this pull (compilation check passes correctly).

  6. maflcko force-pushed on Jul 30, 2024
  7. DrahtBot added the label CI failed on Jul 30, 2024
  8. maflcko force-pushed on Jul 30, 2024
  9. DrahtBot removed the label CI failed on Jul 30, 2024
  10. maflcko force-pushed on Aug 1, 2024
  11. maflcko added the label CI failed on Aug 1, 2024
  12. maflcko force-pushed on Aug 1, 2024
  13. in src/util/string.h:38 in fac123acec outdated
    33+                begin_f = false;
    34+                continue;
    35+            }
    36+            if (perc) begin_f = true;
    37+        }
    38+        assert(num_params == count);
    


    TheCharlatan commented at 2:20 pm on August 1, 2024:

    I think it would be good practice to add a comment directly behind these asserts in consteval functions explaining what the error is, so we get better error messages e.g.

    0assert(num_params == count); // The number of format arguments has to match the number of format specifiers in the string.
    

    maflcko commented at 3:35 pm on August 1, 2024:
    Thanks, done!

    maflcko commented at 6:59 am on August 2, 2024:

    Changed to throw, to avoid having to add an include:

    0util/string.h should add these lines:
    1#include <assert.h>     // for assert
    

    It will look like:

    0./util/string.h:38:34: note: subexpression not valid in a constant expression
    1   38 |         if (num_params != count) throw "Number of format specifiers and arguments must match";
    2      |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    3util/string.cpp:15:5: note: in call to 'CheckNumFormatSpecifiers({0, &""[0]})'
    4   15 |     ConstevalFormatString<1>::CheckNumFormatSpecifiers("");
    5      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
  14. DrahtBot removed the label CI failed on Aug 1, 2024
  15. maflcko force-pushed on Aug 1, 2024
  16. maflcko force-pushed on Aug 2, 2024
  17. in src/index/base.cpp:31 in fa1f64d15e outdated
    26@@ -27,9 +27,9 @@ constexpr auto SYNC_LOG_INTERVAL{30s};
    27 constexpr auto SYNC_LOCATOR_WRITE_INTERVAL{30s};
    28 
    29 template <typename... Args>
    30-void BaseIndex::FatalErrorf(const char* fmt, const Args&... args)
    31+void BaseIndex::FatalErrorf(util::ConstevalFormatString<sizeof...(Args)> fmt, const Args&... args)
    


    TheCharlatan commented at 11:55 am on August 15, 2024:
    Nit: Add include for util/string.h

    maflcko commented at 12:27 pm on August 15, 2024:
    Sure, done
  18. TheCharlatan approved
  19. TheCharlatan commented at 11:56 am on August 15, 2024: contributor
    ACK fa1f64d15e75137a6d4b469e7de6e1be0fda762a
  20. maflcko force-pushed on Aug 15, 2024
  21. maflcko force-pushed on Aug 29, 2024
  22. maflcko commented at 4:06 am on August 30, 2024: member
    Rebased for fresh cmake CI
  23. in src/index/base.cpp:31 in faafbcd5e7 outdated
    27@@ -27,9 +28,9 @@ constexpr auto SYNC_LOG_INTERVAL{30s};
    28 constexpr auto SYNC_LOCATOR_WRITE_INTERVAL{30s};
    29 
    30 template <typename... Args>
    31-void BaseIndex::FatalErrorf(const char* fmt, const Args&... args)
    32+void BaseIndex::FatalErrorf(util::ConstevalFormatString<sizeof...(Args)> fmt, const Args&... args)
    


    l0rinc commented at 12:57 pm on August 30, 2024:

    Reproduced it locally by adding an invalid entry:

    0FatalErrorf("%s%s: Failed to rewind index %s to a previous chain tip", __func__, GetName());
    

    The error looks like:

    bitcoin/src/index/base.cpp:183:29: error: call to consteval function ‘util::ConstevalFormatString<2>::ConstevalFormatString’ is not a constant expression FatalErrorf("%s%s: Failed to rewind index %s to a previous chain tip", func, GetName());


    l0rinc commented at 1:06 pm on August 30, 2024:

    Also checked the valid values from the tinyformat docs:

    0FatalErrorf("%s%%: Failed to rewind index %s to a previous chain tip", __func__, GetName());
    1FatalErrorf("%s, %s %d, %.2d:%.2d\n", 0, 0, 0, 0, 0);
    2FatalErrorf("%1$s, %3$d. %2$s, %4$d:%5$.2d\n", 0, 0, 0, 0, 0);
    

    They’re not giving any errors 👍

  24. in src/util/string.cpp:13 in faafbcd5e7 outdated
     7@@ -8,6 +8,25 @@
     8 #include <string>
     9 
    10 namespace util {
    11+
    12+// Compile-time sanity checks
    13+static_assert([] {
    


    l0rinc commented at 2:18 pm on August 30, 2024:
    This would pass if we forgot to add an exception (the point of this PR), so it would be preferable to test the negative cases as well. This is the reason for my NACK

    maflcko commented at 12:50 pm on September 5, 2024:

    NACK

    Ok, moved the tests to a unit test file. I hope this is better.

  25. in src/util/string.h:25 in faafbcd5e7 outdated
    16@@ -17,6 +17,28 @@
    17 #include <vector>
    18 
    19 namespace util {
    20+/** Type to denote a format string that was checked at compile time */
    21+template <int num_params>
    22+struct ConstevalFormatString {
    23+    const char* const fmt;
    24+    consteval ConstevalFormatString(const char* str) : fmt{str} { CheckNumFormatSpecifiers(fmt); }
    25+    consteval static void CheckNumFormatSpecifiers(std::string_view str)
    


    l0rinc commented at 2:21 pm on August 30, 2024:
    making this consteval prohibits us from testing this properly - positives are awkward, and negatives are pretty much impossible. We could change this to a constexpr, leaving the constructor as consteval, keeping the usage failures during runtime and be able to properly test it via positive and negative cases.

    hodlinator commented at 10:10 pm on August 30, 2024:
    One could potentially keep it consteval and have CheckNumFormatSpecifiers return a bool to provide for negatives, but the error messaging would suffer.

    maflcko commented at 12:52 pm on September 5, 2024:

    We could change this to a constexpr, leaving the constructor as consteval, keeping the usage failures during runtime and be able to properly test it via positive and negative cases.

    Ok, done

  26. in src/util/string.h:38 in faafbcd5e7 outdated
    33+                begin_f = false;
    34+                continue;
    35+            }
    36+            if (perc) begin_f = true;
    37+        }
    38+        if (num_params != count) throw "Number of format specifiers and arguments must match";
    


    l0rinc commented at 2:22 pm on August 30, 2024:

    We should be able to simplify this.


    Suggestion 1: we swap the state machine when encountering a %, otherwise inc the param count and step out of the format:

     0constexpr static void CheckNumFormatSpecifiers(std::string_view str)
     1{
     2    int count{0};
     3    bool in_format{false};
     4    for (char c : str) {
     5        if (c == '%') in_format = !in_format;
     6        else if (in_format) {
     7            ++count;
     8            in_format = false;
     9        }
    10    }
    11    if (num_params != count) throw std::runtime_error("Format specifier count must match the argument count!");
    12 }
    

    Suggestion 2: count the excesses and throw when there is any:

     0constexpr static void CheckNumFormatSpecifiers(std::string_view str)
     1{
     2    int excess{num_params};
     3    bool in_format{false};
     4    for (char c : str) {
     5        if (c == '%') in_format = !in_format;
     6        else if (in_format) {
     7            --excess;
     8            in_format = false;
     9        }
    10    }
    11    if (excess) throw std::runtime_error("Format specifier count must match the argument count!");
    12 }
    

    l0rinc commented at 2:26 pm on August 30, 2024:

    if we’d throw a runtime_error, testing would become slightly more reasonable:

    0std::runtime_error("Format specifier count must match the argument count!")
    

    otherwise it’s:

    0BOOST_CHECK_THROW(ConstevalFormatString<1>::CheckNumFormatSpecifiers(""), const char*);
    

    nit: slight rewording


    maflcko commented at 12:52 pm on September 5, 2024:

    BOOST_CHECK_THROW

    I am not using BOOST_CHECK_THROW, so I’ll leave this as-is for now.


    l0rinc commented at 1:03 pm on September 5, 2024:

    How come? Wouldn’t that be a simpler way to check that we’re throwing in this case (instead of the constant err and err_types which cannot even change)

    0BOOST_CHECK_THROW(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers(""), const char*);
    

    maflcko commented at 2:19 pm on September 5, 2024:
    In the future more (constant) errors will be added, so I think checking what type of error was thrown is useful and doesn’t seem harmful either way in the tests?

    l0rinc commented at 2:45 pm on September 5, 2024:

    Seems a bit excessive to me, could we maybe specialize the tests when those errors are actually added?

    Edit: the build failure seems related:

    D:\a\bitcoin\bitcoin\src\test\util_string_tests.cpp(37,86): error C2326: ‘bool util_string_tests::ConstevalFormatString_NumSpec::test_method::CheckThrow::operator ()(util_string_tests::ConstevalFormatString_NumSpec::test_method::err_type &) const’: function cannot access ’err’ [D:\a\bitcoin\bitcoin\build\src\test\test_bitcoin.vcxproj]


    maflcko commented at 4:13 pm on September 5, 2024:

    I think one of them may have been my initial version, but I changed it because I didn’t like the else and assigning the bool from itself (inverted).

    So I’ll leave this as-is for now.


    maflcko commented at 4:28 pm on September 5, 2024:

    build failure

    Hmm. Pushed something to work around the msvc build failure.


    l0rinc commented at 6:26 pm on September 5, 2024:
    These alternatives work like a simple state machine, close to how we would look at the code: are we inside a format (i.e. started with %) or outside, switching states between the two. The second suggestion can even be used to determine is the number of params is more or if it’s less than the desired one.

    l0rinc commented at 7:46 pm on September 5, 2024:

    By counting the excess we could give more fine-grained errors, e.g:

     0constexpr static void Detail_CheckNumFormatSpecifiers(std::string_view str)
     1{
     2    int excess{num_params};
     3    bool in_format{false};
     4    for (char c: str) {
     5        if (c == '%') in_format = !in_format;
     6        else if (in_format) {
     7            --excess;
     8            in_format = false;
     9        }
    10    }
    11    if (in_format) throw "Format specifier incorrectly terminated by end of string!";
    12    if (excess > 0) throw "Too few format specifiers!";
    13    if (excess < 0) throw "Too many format specifiers!";
    14}
    

    and tests like:

    0auto check_too_many_specifiers = [](ErrType str) { return str == "Too many format specifiers!"s; };
    1BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%_"), ErrType, check_too_many_specifiers);
    2
    3auto check_too_few_specifiers = [](ErrType str) { return str == "Too few format specifiers!"s; };
    4BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers(""), ErrType, check_too_few_specifiers);
    5BOOST_CHECK_EXCEPTION(ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%_"), ErrType, check_too_few_specifiers);
    

    maflcko commented at 5:18 am on September 6, 2024:

    By counting the excess we could give more fine-grained errors, e.g:

    I don’t think this is useful, because the error message will say that there are “too few/many format specifiers”. However, the inverse might be true: “Too many/few args”.

    So I’ll leave this as-is for now. I’d say the important thing is the error, not the exact wording of the error message.


    l0rinc commented at 8:27 am on September 6, 2024:

    important thing is the error, not the exact wording of the error message

    You have separate error for trailing %, why not have dedicated errors for when the format specifiers are fewer/more than the args?


    maflcko commented at 8:31 am on September 6, 2024:

    why not

    The reason is in my previous reply.


    l0rinc commented at 8:40 am on September 6, 2024:
    I saw your previous response, I was quoting from it in mine. Anyway, you can resolve this comment.
  27. l0rinc changes_requested
  28. l0rinc commented at 2:35 pm on August 30, 2024: contributor

    Approach NACK, it seems to me there’s a way to properly test this (including failure cases):

      0Index: src/test/string_tests.cpp
      1<+>UTF-8
      2===================================================================
      3diff --git a/src/test/string_tests.cpp b/src/test/string_tests.cpp
      4new file mode 100644
      5--- /dev/null(date 1725028151058)
      6+++ b/src/test/string_tests.cpp(date 1725028151058)
      7@@ -0,0 +1,52 @@
      8+#include <boost/test/unit_test.hpp>
      9+#include <index/base.h>
     10+#include <test/util/setup_common.h>
     11+#include <univalue.h>
     12+#include <util/string.h>
     13+
     14+#include <string>
     15+
     16+using namespace util;
     17+
     18+BOOST_FIXTURE_TEST_SUITE(string_tests, BasicTestingSetup)
     19+
     20+BOOST_AUTO_TEST_CASE(CheckNumFormatSpecifiers_test)
     21+{
     22+    // No format specifiers
     23+    ConstevalFormatString<0>::CheckNumFormatSpecifiers("");
     24+    ConstevalFormatString<0>::CheckNumFormatSpecifiers("%");
     25+    ConstevalFormatString<0>::CheckNumFormatSpecifiers("%%");
     26+    ConstevalFormatString<0>::CheckNumFormatSpecifiers("%%%");
     27+    ConstevalFormatString<0>::CheckNumFormatSpecifiers("_%");
     28+    ConstevalFormatString<0>::CheckNumFormatSpecifiers("_%%");
     29+    ConstevalFormatString<0>::CheckNumFormatSpecifiers("%%_%");
     30+    ConstevalFormatString<0>::CheckNumFormatSpecifiers("_%%%");
     31+
     32+    // Single format specifier
     33+    ConstevalFormatString<1>::CheckNumFormatSpecifiers("%d");
     34+    ConstevalFormatString<1>::CheckNumFormatSpecifiers("%_");
     35+    ConstevalFormatString<1>::CheckNumFormatSpecifiers("%_%");
     36+    ConstevalFormatString<1>::CheckNumFormatSpecifiers("%%%_");
     37+    ConstevalFormatString<1>::CheckNumFormatSpecifiers("%_%%");
     38+    ConstevalFormatString<1>::CheckNumFormatSpecifiers("%d%%");
     39+
     40+    // Multiple format specifiers
     41+    ConstevalFormatString<2>::CheckNumFormatSpecifiers("%d%s");
     42+    ConstevalFormatString<3>::CheckNumFormatSpecifiers("%d%s%f");
     43+    ConstevalFormatString<2>::CheckNumFormatSpecifiers("%%d%d%i");
     44+    ConstevalFormatString<3>::CheckNumFormatSpecifiers("%d%s%%d%i");
     45+
     46+    // Positional arguments
     47+    ConstevalFormatString<2>::CheckNumFormatSpecifiers("%1$d%2$s");
     48+    ConstevalFormatString<3>::CheckNumFormatSpecifiers("%3$d%1$s%2$f");
     49+    ConstevalFormatString<3>::CheckNumFormatSpecifiers("%3$d%%d%1$s%2$f");
     50+    ConstevalFormatString<2>::CheckNumFormatSpecifiers("%1$d%s");
     51+
     52+    // Negative cases
     53+    BOOST_CHECK_THROW(ConstevalFormatString<1>::CheckNumFormatSpecifiers(""), std::runtime_error);
     54+    BOOST_CHECK_THROW(ConstevalFormatString<0>::CheckNumFormatSpecifiers("%d"), std::runtime_error);
     55+    BOOST_CHECK_THROW(ConstevalFormatString<2>::CheckNumFormatSpecifiers("%d"), std::runtime_error);
     56+    BOOST_CHECK_THROW(ConstevalFormatString<3>::CheckNumFormatSpecifiers("%1$d%2$s"), std::runtime_error);
     57+}
     58+
     59+BOOST_AUTO_TEST_SUITE_END()
     60\ No newline at end of file
     61Index: src/util/string.h
     62<+>UTF-8
     63===================================================================
     64diff --git a/src/util/string.h b/src/util/string.h
     65--- a/src/util/string.h(revision 60c75975b55adcce015a857c529baafd65b58394)
     66+++ b/src/util/string.h(date 1725027928051)
     67@@ -22,20 +22,18 @@
     68 struct ConstevalFormatString {
     69     const char* const fmt;
     70     consteval ConstevalFormatString(const char* str) : fmt{str} { CheckNumFormatSpecifiers(fmt); }
     71-    consteval static void CheckNumFormatSpecifiers(std::string_view str)
     72+    constexpr static void CheckNumFormatSpecifiers(std::string_view str)
     73     {
     74-        int count{0};
     75-        bool begin_f{false};
     76+        int excess{num_params};
     77+        bool in_format{false};
     78         for (char c : str) {
     79-            bool perc{c == '%'};
     80-            if (begin_f) {
     81-                count += !perc;
     82-                begin_f = false;
     83-                continue;
     84+            if (c == '%') in_format = !in_format;
     85+            else if (in_format) {
     86+                --excess;
     87+                in_format = false;
     88             }
     89-            if (perc) begin_f = true;
     90         }
     91-        if (num_params != count) throw "Number of format specifiers and arguments must match";
     92+        if (excess) throw std::runtime_error("Format specifier count must match the argument count!");
     93     }
     94 };
     95 
     96Index: src/test/CMakeLists.txt
     97<+>UTF-8
     98===================================================================
     99diff --git a/src/test/CMakeLists.txt b/src/test/CMakeLists.txt
    100--- a/src/test/CMakeLists.txt(revision 60c75975b55adcce015a857c529baafd65b58394)
    101+++ b/src/test/CMakeLists.txt(date 1725025275931)
    102@@ -119,6 +119,7 @@
    103   sock_tests.cpp
    104   span_tests.cpp
    105   streams_tests.cpp
    106+  string_tests.cpp
    107   sync_tests.cpp
    108   system_tests.cpp
    109   timeoffsets_tests.cpp
    110Index: src/util/string.cpp
    111<+>UTF-8
    112===================================================================
    113diff --git a/src/util/string.cpp b/src/util/string.cpp
    114--- a/src/util/string.cpp(revision 60c75975b55adcce015a857c529baafd65b58394)
    115+++ b/src/util/string.cpp(date 1725025405333)
    116@@ -9,24 +9,6 @@
    117 
    118 namespace util {
    119 
    120-// Compile-time sanity checks
    121-static_assert([] {
    122-    ConstevalFormatString<0>::CheckNumFormatSpecifiers("");
    123-    ConstevalFormatString<0>::CheckNumFormatSpecifiers("%");
    124-    ConstevalFormatString<0>::CheckNumFormatSpecifiers("%%");
    125-    ConstevalFormatString<0>::CheckNumFormatSpecifiers("%%%");
    126-    ConstevalFormatString<1>::CheckNumFormatSpecifiers("%_");
    127-    ConstevalFormatString<0>::CheckNumFormatSpecifiers("_%");
    128-    ConstevalFormatString<0>::CheckNumFormatSpecifiers("%%_");
    129-    ConstevalFormatString<1>::CheckNumFormatSpecifiers("%_%");
    130-    ConstevalFormatString<0>::CheckNumFormatSpecifiers("_%%");
    131-    ConstevalFormatString<1>::CheckNumFormatSpecifiers("%%%_");
    132-    ConstevalFormatString<0>::CheckNumFormatSpecifiers("%%_%");
    133-    ConstevalFormatString<1>::CheckNumFormatSpecifiers("%_%%");
    134-    ConstevalFormatString<0>::CheckNumFormatSpecifiers("_%%%");
    135-    return true; // All checks above compiled and passed
    136-}());
    137-
    138 void ReplaceAll(std::string& in_out, const std::string& search, const std::string& substitute)
    139 {
    140     if (search.empty()) return;
    
  29. in src/util/string.cpp:17 in faafbcd5e7 outdated
    12+// Compile-time sanity checks
    13+static_assert([] {
    14+    ConstevalFormatString<0>::CheckNumFormatSpecifiers("");
    15+    ConstevalFormatString<0>::CheckNumFormatSpecifiers("%");
    16+    ConstevalFormatString<0>::CheckNumFormatSpecifiers("%%");
    17+    ConstevalFormatString<0>::CheckNumFormatSpecifiers("%%%");
    


    l0rinc commented at 2:42 pm on August 30, 2024:
    shouldn’t we warn in this case as well, i.e. an unfinished %? The tinyformat tests use two trailing % when it’s a literal %: https://github.com/c42f/tinyformat/blob/master/tinyformat_test.cpp#L144

    maflcko commented at 12:49 pm on September 5, 2024:
    Maybe in a follow-up, given that the existing linter doesn’t warn either?

    maflcko commented at 4:28 pm on September 5, 2024:

    Actually, done here, because it is just one line of code :sweat_smile:

  30. hodlinator changes_requested
  31. hodlinator commented at 10:15 pm on August 30, 2024: contributor

    Concept ACK faafbcd5e7245586e3f002147b65cc6133bf9a6f

    Specifier validation

    run-lint-format-strings.py does handle more complex format specifications through the "%(.*?)[aAcdeEfFgGinopsuxX]" regexp. (https://en.cppreference.com/w/cpp/io/c/fprintf for details on what should be supported).

    Unsupported conversion specifiers are printed as-is by printf: printf("%1 %H %d %K %k", 10); -> "%1 %H 10 %K %k" Should easily be made into errors: std::string_view{"aAcdeEfFgGinopsuxX"}.find(c) == std::string_view::npos

    (Having something between the %-sign and a valid conversion specifier-letter should probably be an error if we don’t handle it).

    (Type safety)

    It would be really cool if we also checked for type safety à la attribute ((format (printf, …))) but run-lint-format-strings.py primarily attempts to verify that the count of args matched. Better done as a follow-up PR if someone’s up for it.

  32. maflcko force-pushed on Sep 5, 2024
  33. in src/test/util_string_tests.cpp:45 in fa09c161ca outdated
    26+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%%%_");
    27+        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%_%");
    28+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%_%%");
    29+        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("_%%%");
    30+        return true; // All checks above compiled and passed
    31+    }());
    


    l0rinc commented at 1:06 pm on September 5, 2024:

    the errors are really ugly this way:

    util_string_tests.cpp:16:19: error: static assertion expression is not an integral constant expression static_assert([] { ^~~~

    Would there be any disadvantage in inlining them?

    0BOOST_AUTO_TEST_CASE(ConstevalFormatString_NumSpec)
    1{
    2    ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("");
    3    ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%");
    4...
    5    ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%_%%");
    6    ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("_%%%");
    

    maflcko commented at 3:01 pm on September 5, 2024:

    ugly

    Both clang and gcc print the subexpression as well as the violating call, and the error will look similar to a “real” error (except for being more verbose).

    A failure in your example would be simply a core dump, which is even worse, no?

    Moving the checks from being done at compile-time to be done at runtime means that possible errors will be caught later.


    l0rinc commented at 6:38 pm on September 5, 2024:

    While the code is slightly simpler (-4 lines, indentation), the error message in case of failure is also simpler:

     0bitcoin/src/test/util_string_tests.cpp:16:19: error: static assertion expression is not an integral constant expression
     1    static_assert([] {
     2                  ^~~~
     3bitcoin/src/util/string.h:39:34: note: subexpression not valid in a constant expression
     4        if (num_params != count) throw "Format specifier count must match the argument count!";
     5                                 ^
     6bitcoin/src/test/util_string_tests.cpp:22:9: note: in call to 'Detail_CheckNumFormatSpecifiers({&"% % %_"[0], 6})'
     7        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("% % %_");
     8        ^
     9bitcoin/src/test/util_string_tests.cpp:16:19: note: in call to '&[] {
    10    ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("");
    11    ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%");
    12    ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%_");
    13    ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%_");
    14    ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("_%%");
    15    ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("% % %_");
    16    ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%_%%");
    17    return true;
    18}->operator()()'
    19    static_assert([] {
    20                  ^
    211 error generated.
    22make[2]: *** [src/test/CMakeFiles/test_bitcoin.dir/util_string_tests.cpp.o] Error 1
    23make[1]: *** [src/test/CMakeFiles/test_bitcoin.dir/all] Error 2
    24make: *** [all] Error 2
    
    0unknown location:0: fatal error: in "util_string_tests/ConstevalFormatString_NumSpec": C string: Format specifier count must match the argument count!
    1src/test/util_string_tests.cpp:13: last checkpoint: "ConstevalFormatString_NumSpec" test entry
    2
    3*** 1 failure is detected in the test module "Bitcoin Core Test Suite"
    

    Moving the checks from being done at compile-time to be done at runtime means that possible errors will be caught later

    In reality these lines will rarely be modified - but probably read more often, so I’d go for the simpler code. If you still insist that this is better, please resolve the comment.


    maflcko commented at 5:19 am on September 6, 2024:

    If you still insist that this is better, please resolve the comment.

    Yes, for now I’ll resolve this. Thanks for the review!

  34. in src/test/util_string_tests.cpp:41 in fa09c161ca outdated
    36+    struct CheckThrow {
    37+        bool operator()(const err_type& str) const { return std::string_view{str} == err; }
    38+    };
    39+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers(""), err_type, CheckThrow{});
    40+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%_"), err_type, CheckThrow{});
    41+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%_"), err_type, CheckThrow{});
    


    l0rinc commented at 1:09 pm on September 5, 2024:

    As mentioned in another comment, wouldn’t this be simpler:

    0    BOOST_CHECK_THROW(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers(""), const char*);
    1    BOOST_CHECK_THROW(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%_"), const char*);
    2    BOOST_CHECK_THROW(ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%_"), const char*);
    

    ?

  35. in src/util/string.cpp:1 in fa591a2529 outdated
    0@@ -1,4 +1,4 @@
    1-// Copyright (c) 2019-2022 The Bitcoin Core developers
    2+// Copyright (c) 2019-present The Bitcoin Core developers
    


    l0rinc commented at 1:11 pm on September 5, 2024:
    is this a leftover or just a header-symmetry?

    maflcko commented at 4:28 pm on September 5, 2024:
    For symmetry, but removed in the last push.
  36. maflcko force-pushed on Sep 5, 2024
  37. in src/test/util_string_tests.cpp:32 in fa72ce6642 outdated
    27+    // Negative checks at runtime
    28+    using ErrType = const char*;
    29+
    30+    struct CheckThrowNumSpec {
    31+        bool operator()(const ErrType& str) const { return std::string_view{str} == "Format specifier count must match the argument count!"; }
    32+    };
    


    l0rinc commented at 7:11 pm on September 5, 2024:

    Instead of a functor I think we can simplify this by using a lambda. And we can also simplify the comparison by using string_literals, i.e.:

    0#include <string>
    1using namespace std::string_literals;
    2...
    3    auto check_throw_num_spec = [](ErrType str) { return str == "Format specifier count must match the argument count!"s; };
    4    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers(""), ErrType, check_throw_num_spec);
    5...
    

    maflcko commented at 5:35 am on September 6, 2024:
    Thanks, switched to lambdas! (Kept the string_view for now)

    l0rinc commented at 8:18 am on September 6, 2024:
    What’s the advantage of the string_view version?

    maflcko commented at 8:30 am on September 6, 2024:
    Just being a bit more explicit. A single s character may otherwise be missed or removed in the future, leading to a raw pointer compare, which would be wrong.

    l0rinc commented at 8:37 am on September 6, 2024:
    ok, that’s fair, thanks.
  38. l0rinc commented at 7:12 pm on September 5, 2024: contributor
    Concept ACK, the remaining nits are non-blocking
  39. in src/util/string.h:27 in fa72ce6642 outdated
    22+struct ConstevalFormatString {
    23+    const char* const fmt;
    24+    consteval ConstevalFormatString(const char* str) : fmt{str} { Detail_CheckNumFormatSpecifiers(fmt); }
    25+    constexpr static void Detail_CheckNumFormatSpecifiers(std::string_view str)
    26+    {
    27+        int count{0};
    


    l0rinc commented at 7:15 pm on September 5, 2024:

    If you keep this version, num_params can likely be a size_t instead:

     0diff --git a/src/util/string.h b/src/util/string.h
     1--- a/src/util/string.h	(revision fa72ce66421d3f90a6794b3e54e56873ae81265f)
     2+++ b/src/util/string.h	(date 1725561949080)
     3@@ -18,13 +18,13 @@
     4 
     5 namespace util {
     6 /** Type to denote a format string that was checked at compile time */
     7-template <int num_params>
     8+template <size_t num_params>
     9 struct ConstevalFormatString {
    10     const char* const fmt;
    11     consteval ConstevalFormatString(const char* str) : fmt{str} { Detail_CheckNumFormatSpecifiers(fmt); }
    12     constexpr static void Detail_CheckNumFormatSpecifiers(std::string_view str)
    13     {
    14-        int count{0};
    15+        size_t count{0};
    16         bool begin_f{false};
    17         for (char c : str) {
    18             bool perc{c == '%'};
    

    maflcko commented at 5:23 am on September 6, 2024:

    Compilers can only walk so far in a consteval/constexpr context, so that it doesn’t matter if there are 31 bits to count, or 64.

    I’ll leave this as-is for now.


    l0rinc commented at 8:16 am on September 6, 2024:
    it’s not the counting, it’s to document that this version doesn’t have negatives (like the excess version does)

    maflcko commented at 8:35 am on September 6, 2024:
    Ok, in that case I’d prefer unsigned count{0};. I may switch to that, if I have to re-touch.
  40. hodlinator commented at 9:08 pm on September 5, 2024: contributor

    Invalid chars check

    Here follows an implementation of the low bar of what I stated in #30546#pullrequestreview-2273568357. It’s implemented on top of fa72ce66421d3f90a6794b3e54e56873ae81265f.

    Diff

    I’ve confirmed that “aAcdfeEfFgGinopsuxX” matches what I linked to in cppreference and also the Python version being replaced.

     0
     1diff --git a/src/test/util_string_tests.cpp b/src/test/util_string_tests.cpp
     2index ecc28f5b96..9254f57b64 100644
     3--- a/src/test/util_string_tests.cpp
     4+++ b/src/test/util_string_tests.cpp
     5@@ -16,11 +16,11 @@ BOOST_AUTO_TEST_CASE(ConstevalFormatString_NumSpec)
     6     static_assert([] {
     7         ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("");
     8         ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%");
     9-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%_");
    10-        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%_");
    11-        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("_%%");
    12-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%%%_");
    13-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%_%%");
    14+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s");
    15+        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%s");
    16+        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("s%%");
    17+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%%%s");
    18+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s%%");
    19         return true; // All checks above compiled and passed
    20     }());
    21 
    22@@ -31,18 +31,25 @@ BOOST_AUTO_TEST_CASE(ConstevalFormatString_NumSpec)
    23         bool operator()(const ErrType& str) const { return std::string_view{str} == "Format specifier count must match the argument count!"; }
    24     };
    25     BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers(""), ErrType, CheckThrowNumSpec{});
    26-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%_"), ErrType, CheckThrowNumSpec{});
    27-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%_"), ErrType, CheckThrowNumSpec{});
    28+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%s"), ErrType, CheckThrowNumSpec{});
    29+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%s"), ErrType, CheckThrowNumSpec{});
    30 
    31     struct CheckThrowTerm {
    32         bool operator()(const ErrType& str) const { return std::string_view{str} == "Format specifier incorrectly terminated by end of string!"; }
    33     };
    34     BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%"), ErrType, CheckThrowTerm{});
    35     BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%%"), ErrType, CheckThrowTerm{});
    36-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("_%"), ErrType, CheckThrowTerm{});
    37-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%_%"), ErrType, CheckThrowTerm{});
    38-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%_%"), ErrType, CheckThrowTerm{});
    39-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("_%%%"), ErrType, CheckThrowTerm{});
    40+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("s%"), ErrType, CheckThrowTerm{});
    41+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s%"), ErrType, CheckThrowTerm{});
    42+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%s%"), ErrType, CheckThrowTerm{});
    43+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("s%%%"), ErrType, CheckThrowTerm{});
    44+
    45+    struct CheckThrowInvalid {
    46+        bool operator()(const ErrType& str) const { return std::string_view{str} == "Invalid format specifier!"; }
    47+    };
    48+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%2"), ErrType, CheckThrowInvalid{});
    49+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%$"), ErrType, CheckThrowInvalid{});
    50+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%z"), ErrType, CheckThrowInvalid{});
    51 }
    52 
    53 BOOST_AUTO_TEST_SUITE_END()
    54diff --git a/src/util/string.h b/src/util/string.h
    55index e968de5575..5ae54e1217 100644
    56--- a/src/util/string.h
    57+++ b/src/util/string.h
    58@@ -29,6 +29,9 @@ struct ConstevalFormatString {
    59         for (char c : str) {
    60             bool perc{c == '%'};
    61             if (begin_f) {
    62+                if (!perc && std::string_view{"aAcdfeEfFgGinopsuxX"}.find(c) == std::string_view::npos) {
    63+                    throw "Invalid format specifier!";
    64+                }
    65                 count += !perc;
    66                 begin_f = false;
    67                 continue;
    
  41. maflcko commented at 5:29 am on September 6, 2024: member

    throw “Invalid format specifier!”;

    In tinyformat there is no such thing, so I’ll leave this as-is for now.

  42. maflcko force-pushed on Sep 6, 2024
  43. maflcko commented at 5:37 am on September 6, 2024: member
    Replied to all review comments and force pushed a small style-only change in the test.
  44. l0rinc approved
  45. l0rinc commented at 8:30 am on September 6, 2024: contributor

    While I would still prefer some of our comments to be reconsidered, I’m also fine with merging the change as is. Thanks for streamlining the developer experience with these small but useful checks!

    ACK fa092749094aa483e3ce0243885ce2eb8ed22cbb

  46. DrahtBot requested review from TheCharlatan on Sep 6, 2024
  47. DrahtBot requested review from hodlinator on Sep 6, 2024
  48. hodlinator commented at 8:47 am on September 6, 2024: contributor

    In tinyformat there is no such thing, so I’ll leave this as-is for now.

    My motivation is not to reach parity with tinyformat, it is to at least cover some of what the prior linter did:

    https://github.com/bitcoin/bitcoin/blob/c3af4b1ec3fdb308404199ddd0df5170793a2c39/test/lint/run-lint-format-strings.py#L230-L253

    An alternative to making it an error is to simply make invalid chars not count, which is closer to the old behavior. Would that be okay?

    If you don’t want to bring over at least some of that specific functionality, the PR summary should state why.

  49. maflcko commented at 9:25 am on September 6, 2024: member

    Thanks for clarifying! I think not counting “invalid” ones would be wrong, because tinyformat treats them as valid. Silently not counting them is another bug in the Python code, because it will lead to a tinyformat: Too many conversion specifiers in format string. See https://godbolt.org/z/4ncfqvnjY

    I didn’t really want to give a full list of all bugs in the Python code, but I’ve modified the PR summary to mention this bug as well. Let me know if this sounds acceptable.

  50. maflcko commented at 10:02 am on September 6, 2024: member
    Actually, according to git grep --extended-regexp '%[0-9]\$' -- '*.cpp' '*.h' positional args are used. Let me fix that up.
  51. maflcko force-pushed on Sep 6, 2024
  52. maflcko commented at 1:00 pm on September 6, 2024: member
    Force pushed and updated the PR description to list the two additional bugs in the python code.
  53. maflcko force-pushed on Sep 6, 2024
  54. DrahtBot commented at 1:24 pm on September 6, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/29783243214

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  55. DrahtBot added the label CI failed on Sep 6, 2024
  56. maflcko force-pushed on Sep 6, 2024
  57. DrahtBot removed the label CI failed on Sep 6, 2024
  58. in src/util/string.h:53 in fa18f7cc9a outdated
    47+                if (it_num == it) {
    48+                    // Non-positional specifier
    49+                    count_nor += !perc;
    50+                } else {
    51+                    // Positional specifier
    52+                    if (*it_num != '$' || std::string_view{"aAcdfeEfFgGinopsuxX"}.find(*(++it_num)) == std::string_view::npos) {
    


    l0rinc commented at 3:22 pm on September 6, 2024:
    can we reduce the duplication between line 39 and this set of values?

    maflcko commented at 3:46 pm on September 6, 2024:
    Yes, but it will be more code and complexity, so I’ll leave this as-is for now.

    l0rinc commented at 4:08 pm on September 6, 2024:
    Agree, in C++23 we can just use .contains which would simplify this slightly
  59. l0rinc approved
  60. l0rinc commented at 3:23 pm on September 6, 2024: contributor
    ACK fa18f7cc9ac36ebe0fa8c381b1db76085812e95f
  61. DrahtBot added the label CI failed on Sep 6, 2024
  62. in src/util/string.h:62 in fa18f7cc9a outdated
    57+                begin_f = false;
    58+                continue;
    59+            }
    60+            if (perc) begin_f = true;
    61+        }
    62+        if (count_nor && count_pos) throw "Format specifiers must be all postitional or all non-positional!";
    


    stickies-v commented at 2:24 pm on September 9, 2024:

    typo nit

    0        if (count_nor && count_pos) throw "Format specifiers must be all positional or all non-positional!";
    

    maflcko commented at 12:12 pm on September 10, 2024:
    Thanks, adjusted in the last push.
  63. in src/index/base.cpp:33 in fa18f7cc9a outdated
    27@@ -27,9 +28,9 @@ constexpr auto SYNC_LOG_INTERVAL{30s};
    28 constexpr auto SYNC_LOCATOR_WRITE_INTERVAL{30s};
    29 
    30 template <typename... Args>
    31-void BaseIndex::FatalErrorf(const char* fmt, const Args&... args)
    32+void BaseIndex::FatalErrorf(util::ConstevalFormatString<sizeof...(Args)> fmt, const Args&... args)
    33 {
    34-    auto message = tfm::format(fmt, args...);
    35+    auto message{tfm::format(fmt.fmt, args...)};
    


    stickies-v commented at 8:52 pm on September 9, 2024:
    Not necessary to be done in this PR because there’s only one callsite, but I suppose in the follow-up it’ll make sense to add a template <typename... Args> tfm::format(util::ConstevalFormatString<sizeof...(Args)> fmt, const Args&... args) overload?

    maflcko commented at 5:44 am on September 10, 2024:
    Yes. I am thinking that it could make sense to have a wrapper-header around tinyformat, so that each relevant tfm function can be called with ConstevalFormatString.

    maflcko commented at 12:12 pm on September 10, 2024:
    Thanks, adjusted in the last push.
  64. stickies-v commented at 8:53 pm on September 9, 2024: contributor
    Concept ACK. Doing this at compile time is much more helpful, besides the bugs being fixed.
  65. in src/util/string.h:78 in fa18f7cc9a outdated
    59+            }
    60+            if (perc) begin_f = true;
    61+        }
    62+        if (count_nor && count_pos) throw "Format specifiers must be all postitional or all non-positional!";
    63+        unsigned count{count_nor | count_pos};
    64+        if (num_params != count) throw "Format specifier count must match the argument count!";
    


    stickies-v commented at 10:10 am on September 10, 2024:

    I think having a separate CountNumberFormatSpecifiers would have made testing this function a bit easier and with better error feedback, while simultaneously also reducing template instantiation. Neither are overwhelmingly strong arguments, but I don’t think there are really any downsides either?

    E.g. sample, pretty rough (but passes tests) diff:

      0diff --git a/src/test/util_string_tests.cpp b/src/test/util_string_tests.cpp
      1index 1e041f37c6..43b63751d9 100644
      2--- a/src/test/util_string_tests.cpp
      3+++ b/src/test/util_string_tests.cpp
      4@@ -7,51 +7,58 @@
      5 #include <boost/test/unit_test.hpp>
      6 
      7 using namespace util;
      8+using namespace util::detail;
      9 
     10 BOOST_AUTO_TEST_SUITE(util_string_tests)
     11 
     12 BOOST_AUTO_TEST_CASE(ConstevalFormatString_NumSpec)
     13 {
     14-    // Compile-time sanity checks
     15-    static_assert([] {
     16-        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("");
     17-        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%");
     18-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s");
     19-        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%s");
     20-        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("s%%");
     21-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%%%s");
     22-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s%%");
     23-        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers(" 1$s");
     24-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$s");
     25-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$s%1$s");
     26-        ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%2$s");
     27-        ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%2$s 4$s %2$s");
     28-        ConstevalFormatString<129>::Detail_CheckNumFormatSpecifiers("%129$s 999$s %2$s");
     29-        return true; // All checks above compiled and passed
     30-    }());
     31+    BOOST_CHECK_EQUAL(0, CountNumFormatSpecifiers(""));
     32+    BOOST_CHECK_EQUAL(0, CountNumFormatSpecifiers("%%"));
     33+    BOOST_CHECK_EQUAL(1, CountNumFormatSpecifiers("%s"));
     34+    BOOST_CHECK_EQUAL(0, CountNumFormatSpecifiers("%%s"));
     35+    BOOST_CHECK_EQUAL(0, CountNumFormatSpecifiers("s%%"));
     36+    BOOST_CHECK_EQUAL(1, CountNumFormatSpecifiers("%%%s"));
     37+    BOOST_CHECK_EQUAL(1, CountNumFormatSpecifiers("%s%%"));
     38+    BOOST_CHECK_EQUAL(0, CountNumFormatSpecifiers(" 1$s"));
     39+    BOOST_CHECK_EQUAL(1, CountNumFormatSpecifiers("%1$s"));
     40+    BOOST_CHECK_EQUAL(1, CountNumFormatSpecifiers("%1$s%1$s"));
     41+    BOOST_CHECK_EQUAL(2, CountNumFormatSpecifiers("%2$s"));
     42+    BOOST_CHECK_EQUAL(2, CountNumFormatSpecifiers("%2$s 4$s %2$s"));
     43+    BOOST_CHECK_EQUAL(129, CountNumFormatSpecifiers("%129$s 999$s %2$s"));
     44+
     45+    // existing "tests" in run-lint-format-strings.py parse_string_content
     46+    BOOST_CHECK_EQUAL(0, CountNumFormatSpecifiers("foo bar foo"));
     47+    BOOST_CHECK_EQUAL(1, CountNumFormatSpecifiers("foo %d bar foo"));
     48+    BOOST_CHECK_EQUAL(2, CountNumFormatSpecifiers("foo %d bar %i foo"));
     49+    BOOST_CHECK_EQUAL(2, CountNumFormatSpecifiers("foo %d bar %i foo %% foo"));
     50+    BOOST_CHECK_EQUAL(3, CountNumFormatSpecifiers("foo %d bar %i foo %% foo %d foo"));
     51+    // BOOST_CHECK_EQUAL(4, CountNumFormatSpecifiers("foo %d bar %i foo %% foo %*d foo"));
     52+    BOOST_CHECK_EQUAL(5, CountNumFormatSpecifiers("foo %5$d"));
     53+    // BOOST_CHECK_EQUAL(7, CountNumFormatSpecifiers("foo %5$*7$d"));
     54 
     55     // Negative checks at runtime
     56     using ErrType = const char*;
     57 
     58     auto check_mix{[](const ErrType& str) { return std::string_view{str} == "Format specifiers must be all positional or all non-positional!"; }};
     59-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s%1$s"), ErrType, check_mix);
     60+    BOOST_CHECK_EXCEPTION(CountNumFormatSpecifiers("%s%1$s"), ErrType, check_mix);
     61 
     62     auto check_num_spec{[](const ErrType& str) { return std::string_view{str} == "Format specifier count must match the argument count!"; }};
     63-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers(""), ErrType, check_num_spec);
     64-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%s"), ErrType, check_num_spec);
     65-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%s"), ErrType, check_num_spec);
     66-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%1$s"), ErrType, check_num_spec);
     67-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%1$s"), ErrType, check_num_spec);
     68+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::CheckNumFormatSpecifiers(""), ErrType, check_num_spec);
     69+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::CheckNumFormatSpecifiers("%s"), ErrType, check_num_spec);
     70+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<2>::CheckNumFormatSpecifiers("%s"), ErrType, check_num_spec);
     71+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::CheckNumFormatSpecifiers("%1$s"), ErrType, check_num_spec);
     72+    BOOST_CHECK_EXCEPTION(ConstevalFormatString<2>::CheckNumFormatSpecifiers("%1$s"), ErrType, check_num_spec);
     73 
     74     auto check_special{[](const ErrType& str) { return std::string_view{str} == "Invalid format specifier!"; }};
     75-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%_"), ErrType, check_special);
     76-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%\n"), ErrType, check_special);
     77+    BOOST_CHECK_EXCEPTION(CountNumFormatSpecifiers("%_"), ErrType, check_special);
     78+    BOOST_CHECK_EXCEPTION(CountNumFormatSpecifiers("%\n"), ErrType, check_special);
     79     // The "special" check also excludes "0", which would be out-of-range in tinyformat
     80-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%0$s"), ErrType, check_special);
     81+    BOOST_CHECK_EXCEPTION(CountNumFormatSpecifiers("%0$s"), ErrType, check_special);
     82 
     83     auto check_special_pos{[](const ErrType& str) { return std::string_view{str} == "Invalid positional format specifier!"; }};
     84-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%1$"), ErrType, check_special_pos);
     85-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%1$_"), ErrType, check_special_pos);
     86+    BOOST_CHECK_EXCEPTION(CountNumFormatSpecifiers("%1$"), ErrType, check_special_pos);
     87+    BOOST_CHECK_EXCEPTION(CountNumFormatSpecifiers("%1$_"), ErrType, check_special_pos);
     88 }
     89 
     90 BOOST_AUTO_TEST_SUITE_END()
     91diff --git a/src/test/util_tests.cpp b/src/test/util_tests.cpp
     92index 1624fb8b5b..ceb2e9ad54 100644
     93--- a/src/test/util_tests.cpp
     94+++ b/src/test/util_tests.cpp
     95@@ -154,18 +154,18 @@ BOOST_AUTO_TEST_CASE(parse_hex)
     96 
     97     // Basic test vector
     98     std::vector<unsigned char> expected(std::begin(HEX_PARSE_OUTPUT), std::end(HEX_PARSE_OUTPUT));
     99-    constexpr std::array<std::byte, 65> hex_literal_array{operator""_hex<util::detail::Hex(HEX_PARSE_INPUT)>()};
    100+    constexpr std::array<std::byte, 65> hex_literal_array{operator""_hex<util::hex_literals::detail::Hex(HEX_PARSE_INPUT)>()};
    101     auto hex_literal_span{MakeUCharSpan(hex_literal_array)};
    102     BOOST_CHECK_EQUAL_COLLECTIONS(hex_literal_span.begin(), hex_literal_span.end(), expected.begin(), expected.end());
    103 
    104-    const std::vector<std::byte> hex_literal_vector{operator""_hex_v<util::detail::Hex(HEX_PARSE_INPUT)>()};
    105+    const std::vector<std::byte> hex_literal_vector{operator""_hex_v<util::hex_literals::detail::Hex(HEX_PARSE_INPUT)>()};
    106     hex_literal_span = MakeUCharSpan(hex_literal_vector);
    107     BOOST_CHECK_EQUAL_COLLECTIONS(hex_literal_span.begin(), hex_literal_span.end(), expected.begin(), expected.end());
    108 
    109-    constexpr std::array<uint8_t, 65> hex_literal_array_uint8{operator""_hex_u8<util::detail::Hex(HEX_PARSE_INPUT)>()};
    110+    constexpr std::array<uint8_t, 65> hex_literal_array_uint8{operator""_hex_u8<util::hex_literals::detail::Hex(HEX_PARSE_INPUT)>()};
    111     BOOST_CHECK_EQUAL_COLLECTIONS(hex_literal_array_uint8.begin(), hex_literal_array_uint8.end(), expected.begin(), expected.end());
    112 
    113-    result = operator""_hex_v_u8<util::detail::Hex(HEX_PARSE_INPUT)>();
    114+    result = operator""_hex_v_u8<util::hex_literals::detail::Hex(HEX_PARSE_INPUT)>();
    115     BOOST_CHECK_EQUAL_COLLECTIONS(result.begin(), result.end(), expected.begin(), expected.end());
    116 
    117     result = ParseHex(HEX_PARSE_INPUT);
    118diff --git a/src/util/strencodings.h b/src/util/strencodings.h
    119index 1543de03ab..0f145d91c5 100644
    120--- a/src/util/strencodings.h
    121+++ b/src/util/strencodings.h
    122@@ -406,7 +406,7 @@ consteval uint8_t ConstevalHexDigit(const char c)
    123  *   ParseHex() does is because heap-based containers cannot cross the compile-
    124  *   time/runtime barrier.
    125  */
    126-inline namespace hex_literals {
    127+namespace hex_literals {
    128 namespace detail {
    129 
    130 template <size_t N>
    131@@ -427,16 +427,16 @@ struct Hex {
    132 
    133 } // namespace detail
    134 
    135-template <util::detail::Hex str>
    136+template <detail::Hex str>
    137 constexpr auto operator""_hex() { return str.bytes; }
    138 
    139-template <util::detail::Hex str>
    140+template <detail::Hex str>
    141 constexpr auto operator""_hex_u8() { return std::bit_cast<std::array<uint8_t, str.bytes.size()>>(str.bytes); }
    142 
    143-template <util::detail::Hex str>
    144+template <detail::Hex str>
    145 constexpr auto operator""_hex_v() { return std::vector<std::byte>{str.bytes.begin(), str.bytes.end()}; }
    146 
    147-template <util::detail::Hex str>
    148+template <detail::Hex str>
    149 inline auto operator""_hex_v_u8() { return std::vector<uint8_t>{UCharCast(str.bytes.data()), UCharCast(str.bytes.data() + str.bytes.size())}; }
    150 
    151 } // inline namespace hex_literals
    152diff --git a/src/util/string.h b/src/util/string.h
    153index 754972b4ed..418dd92ca3 100644
    154--- a/src/util/string.h
    155+++ b/src/util/string.h
    156@@ -18,50 +18,57 @@
    157 #include <vector>
    158 
    159 namespace util {
    160+namespace detail {
    161+constexpr unsigned int CountNumFormatSpecifiers(std::string_view str) 
    162+{
    163+    unsigned count_nor{0}; // Number of "normal" specifiers, like %s
    164+    unsigned count_pos{0}; // Max number in positional specifier, like %8$s
    165+    bool begin_f{false};
    166+    for (auto it{str.begin()}; it < str.end(); ++it) {
    167+        bool perc{*it == '%'};
    168+        if (begin_f) {
    169+            // Almost any char is allowed by tinyformat to be used in a
    170+            // format specifier, but a stricter check avoids bugs. For
    171+            // instance, a strict check prevents consuming the newline
    172+            // terminator in the format string "%\n". Also, excluding '0'
    173+            // avoids an out-of-range position error in tinyformat for the
    174+            // format string "%0$s\n".
    175+            if (std::string_view{"123456789aAcdfeEfFgGinopsuxX%"}.find(*it) == std::string_view::npos) throw "Invalid format specifier!";
    176+            auto it_num{it};
    177+            unsigned maybe_num{0};
    178+            while ('0' <= *it_num && *it_num <= '9') {
    179+                maybe_num *= 10;
    180+                maybe_num += *it_num - '0';
    181+                ++it_num;
    182+            };
    183+            if (it_num == it) {
    184+                // Non-positional specifier
    185+                count_nor += !perc;
    186+            } else {
    187+                // Positional specifier
    188+                if (*it_num != '$' || std::string_view{"aAcdfeEfFgGinopsuxX"}.find(*(++it_num)) == std::string_view::npos) {
    189+                    throw "Invalid positional format specifier!";
    190+                }
    191+                count_pos = std::max(count_pos, maybe_num);
    192+            }
    193+            begin_f = false;
    194+            continue;
    195+        }
    196+        if (perc) begin_f = true;
    197+    }
    198+    if (count_nor && count_pos) throw "Format specifiers must be all positional or all non-positional!";
    199+    unsigned count{count_nor | count_pos};
    200+    return count;
    201+}
    202+} // namespace detail
    203 /** Type to denote a format string that was checked at compile time */
    204 template <unsigned num_params>
    205 struct ConstevalFormatString {
    206     const char* const fmt;
    207-    consteval ConstevalFormatString(const char* str) : fmt{str} { Detail_CheckNumFormatSpecifiers(fmt); }
    208-    constexpr static void Detail_CheckNumFormatSpecifiers(std::string_view str)
    209+    consteval ConstevalFormatString(const char* str) : fmt{str} { CheckNumFormatSpecifiers(str); }
    210+    constexpr static void CheckNumFormatSpecifiers(std::string_view str)
    211     {
    212-        unsigned count_nor{0}; // Number of "normal" specifiers, like %s
    213-        unsigned count_pos{0}; // Max number in positional specifier, like %8$s
    214-        bool begin_f{false};
    215-        for (auto it{str.begin()}; it < str.end(); ++it) {
    216-            bool perc{*it == '%'};
    217-            if (begin_f) {
    218-                // Almost any char is allowed by tinyformat to be used in a
    219-                // format specifier, but a stricter check avoids bugs. For
    220-                // instance, a strict check prevents consuming the newline
    221-                // terminator in the format string "%\n". Also, excluding '0'
    222-                // avoids an out-of-range position error in tinyformat for the
    223-                // format string "%0$s\n".
    224-                if (std::string_view{"123456789aAcdfeEfFgGinopsuxX%"}.find(*it) == std::string_view::npos) throw "Invalid format specifier!";
    225-                auto it_num{it};
    226-                unsigned maybe_num{0};
    227-                while ('0' <= *it_num && *it_num <= '9') {
    228-                    maybe_num *= 10;
    229-                    maybe_num += *it_num - '0';
    230-                    ++it_num;
    231-                };
    232-                if (it_num == it) {
    233-                    // Non-positional specifier
    234-                    count_nor += !perc;
    235-                } else {
    236-                    // Positional specifier
    237-                    if (*it_num != '$' || std::string_view{"aAcdfeEfFgGinopsuxX"}.find(*(++it_num)) == std::string_view::npos) {
    238-                        throw "Invalid positional format specifier!";
    239-                    }
    240-                    count_pos = std::max(count_pos, maybe_num);
    241-                }
    242-                begin_f = false;
    243-                continue;
    244-            }
    245-            if (perc) begin_f = true;
    246-        }
    247-        if (count_nor && count_pos) throw "Format specifiers must be all positional or all non-positional!";
    248-        unsigned count{count_nor | count_pos};
    249+        unsigned count{detail::CountNumFormatSpecifiers(str)};
    250         if (num_params != count) throw "Format specifier count must match the argument count!";
    251     }
    252 };
    

    maflcko commented at 6:23 pm on September 10, 2024:

    Thanks for a diff that compiles. However, I think that making implementation details verbose to use is a feature to avoid developers from accidentally using them in real code. The verbosity in tests is a bit annoying, but seems acceptable to me.

    I’ll leave this as-is for now.


    stickies-v commented at 8:43 pm on September 10, 2024:

    making implementation details verbose to use is a feature to avoid developers from accidentally using them in real code

    I can’t immediately see a use case for this function, but is there a reason we’d want to actively discourage this function from being used? It seems harmless to me, but if there is, perhaps it’d be good to add a brief docstring to the function to describe that?

    The verbosity in tests is a bit annoying, but seems acceptable to me.

    I agree.


    maflcko commented at 9:19 am on September 11, 2024:

    is there a reason we’d want to actively discourage this function from being used?

    Mostly just to avoid someone from using the function over sizeof...(Args), which would render everything pointless, because calculating the same value at compile time is obviously going to always pass (modulo parsing errors)


    maflcko commented at 8:40 am on September 12, 2024:

    The verbosity in tests is a bit annoying, but seems acceptable to me.

    I agree.

    I pushed a test-only change in the latest push to reduce the verbosity to a minimum.

  66. in src/util/string.h:32 in fa18f7cc9a outdated
    26+    {
    27+        unsigned count_nor{0}; // Number of "normal" specifiers, like %s
    28+        unsigned count_pos{0}; // Max number in positional specifier, like %8$s
    29+        bool begin_f{false};
    30+        for (auto it{str.begin()}; it < str.end(); ++it) {
    31+            bool perc{*it == '%'};
    


    stickies-v commented at 10:12 am on September 10, 2024:
    Note: it seems we are explicitly preventing * dynamic width/precision specifiers here, which was previously allowed (and checked) with the linter. That’s not necessarily bad, it seems we’re not currently using them (quick non-exhaustive search) and it can of course be added back in later, and we’re throwing instead of silently ignoring it.

    maflcko commented at 6:26 pm on September 10, 2024:

    Thanks! I think it was wrong for me to try to re-implement tinyformat parsing at compile time and drop the features that aren’t currently needed. I’ve reverted the pull to an earlier state, so that any tinyformat feature continues to work, just like before. (C.f. #30546 (review))

    Edit: * dynamic specifiers aren’t implemented in the latest push, so they are still unsupported. Given that they are not used, I think it is fine if future code using them just falls back to a non-consteval format string. Maybe std::format will be available then, so they could also just use https://en.cppreference.com/w/cpp/utility/format/spec#Width_and_precision (which is consteval checked)


    stickies-v commented at 8:44 pm on September 10, 2024:

    Given that they are not used,

    Yup, I agree with the simplified approach. I didn’t like the half/half of quite a of bit of complexity without being complete, this makes more sense. Can be marked as resolved, thanks!

  67. maflcko force-pushed on Sep 10, 2024
  68. in src/util/string.h:240 in faa32adbcf outdated
    221@@ -173,4 +222,12 @@ template <typename T1, size_t PREFIX_LEN>
    222 }
    223 } // namespace util
    224 
    225+namespace tinyformat {
    226+template <typename... Args>
    227+std::string format(util::ConstevalFormatString<sizeof...(Args)> fmt, const Args&... args)
    


    l0rinc commented at 11:51 am on September 10, 2024:
    can we still have errors like this after this change?

    maflcko commented at 12:10 pm on September 10, 2024:
    Yes, in theory. The goal of this pull request is to make them less likely. According to the pull request description, “This is the first step toward #30530 and a follow-up will apply the approach to the other places.”
  69. l0rinc commented at 11:57 am on September 10, 2024: contributor

    ACK faa32adbcf4c04f0a426eaba4a43b29a293de72b

    • brace initialization in base.cpp
    • postitional typo fix
    • added ConstevalFormatString overload
  70. DrahtBot requested review from stickies-v on Sep 10, 2024
  71. in src/util/string.h:28 in faa32adbcf outdated
    23+struct ConstevalFormatString {
    24+    const char* const fmt;
    25+    consteval ConstevalFormatString(const char* str) : fmt{str} { Detail_CheckNumFormatSpecifiers(fmt); }
    26+    constexpr static void Detail_CheckNumFormatSpecifiers(std::string_view str)
    27+    {
    28+        unsigned count_nor{0}; // Number of "normal" specifiers, like %s
    


    hodlinator commented at 12:53 pm on September 10, 2024:

    nit:

    0        unsigned count_normal{0}; // Number of "normal" specifiers, like %s
    

    “nor” collides with the conjunction word “nor” and with https://en.wikipedia.org/wiki/Logical_NOR


    maflcko commented at 6:13 pm on September 10, 2024:
    Thanks, fixed.
  72. in src/test/util_string_tests.cpp:36 in faa32adbcf outdated
    31+    }());
    32+
    33+    // Negative checks at runtime
    34+    using ErrType = const char*;
    35+
    36+    auto check_mix{[](const ErrType& str) { return std::string_view{str} == "Format specifiers must be all positional or all non-positional!"; }};
    


    hodlinator commented at 12:59 pm on September 10, 2024:

    nit: Possibly slightly more optimal to encourage compile time length calculation for the string literal on the right. But maybe compilers already do the const char[] -> string_view conversion for the literal on the right at compile time to later match string_view::operator== at runtime.

    0    auto check_mix{[](const ErrType& str) { return str == std::string_view{"Format specifiers must be all positional or all non-positional!"}; }};
    

    maflcko commented at 6:13 pm on September 10, 2024:
    Thanks, fixed.
  73. in src/test/util_string_tests.cpp:24 in faa32adbcf outdated
    19+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s");
    20+        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%s");
    21+        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("s%%");
    22+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%%%s");
    23+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s%%");
    24+        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers(" 1$s");
    


    hodlinator commented at 1:53 pm on September 10, 2024:

    Tried adding

    0        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%02d");
    

    here and it fails.

    Ran git grep -E '"[^"]*%[0-9]+[aAcdfeEfFgGinopsuxX*][^"]*' src/ ':(exclude)src/leveldb' ':(exclude)src/secp256k1' ':(exclude)*.ts' and it feels like a common enough case to want to have fixed width formatting (optionally 0-padded). Examples:

    0src/netbase.cpp:            LogError("Proxy requested wrong authentication method %02x\n", pchRet1[1]);
    1src/flatfile.cpp:                LogDebug(BCLog::VALIDATION, "Pre-allocating up to position 0x%x in %s%05u.dat\n", new_size, m_prefix, pos.nFile);
    2src/init.cpp:                LogPrintf(" block index %15dms\n", Ticks<std::chrono::milliseconds>(SteadyClock::now() - load_block_index_start_time));
    3src/node/blockstorage.cpp:            LogDebug(BCLog::BLOCKSTORAGE, "Prune: %s deleted blk/rev (%05u)\n", __func__, *it);
    4src/node/blockstorage.cpp:                          "Failed to flush previous block file %05i (finalize=1, finalize_undo=%i) before opening new block file %05i\n",
    5src/node/blockstorage.cpp:                LogPrintLevel(BCLog::BLOCKSTORAGE, BCLog::Level::Warning, "Failed to flush undo file %05i\n", _pos.nFile);
    6src/node/blockstorage.cpp:            LogPrintf("Reindexing block file blk%05u.dat...\n", (unsigned int)nFile);
    

    Sorry for not realizing and raising this in an earlier review.


    maflcko commented at 6:22 pm on September 10, 2024:

    Sorry for not realizing and raising this in an earlier review.

    No worries. It was my mistake for pushing the wrong diff.

    My initial patch from July only had very basic checking and accepted any format specifier (like tinyformat). When I added support for positional args a few days ago (https://github.com/bitcoin/bitcoin/pull/30546#issuecomment-2333713098), I also tried to validate format specifiers itself.

    However, given that tinyformat accepts any specifier, and given that incorrect specifiers weren’t an issue in the past, I think it is probably not worth it to spend time on fully re-implementing tinyformat parsing at compile time.

    The main issue to detect is the count mismatch.

    So I’ve reverted to the initial version from July, with only support added for positional args.

  74. hodlinator commented at 1:58 pm on September 10, 2024: contributor

    Reviewed faa32adbcf4c04f0a426eaba4a43b29a293de72b

    git range-diff master fa72ce6 faa32ad

    • Implemented positional args after me finally communicating clearly enough.
    • Added ConstevalFormatString overload directly into the tinyformat namespace to ease future adoption.
    • Uses more terse lambdas instead of explicit structs for checking exceptions. :+1:

    Does not implement support for ‘*’, but not currently used anyway: https://github.com/bitcoin/bitcoin/blob/c3af4b1ec3fdb308404199ddd0df5170793a2c39/test/lint/run-lint-format-strings.py#L247

    New concern: Does not support supplying an integer to specify width together with an optional 0 to change padding from spaces to zeroes. (See inline comment).

  75. DrahtBot requested review from hodlinator on Sep 10, 2024
  76. in src/util/string.h:40 in faa32adbcf outdated
    35+                // format specifier, but a stricter check avoids bugs. For
    36+                // instance, a strict check prevents consuming the newline
    37+                // terminator in the format string "%\n". Also, excluding '0'
    38+                // avoids an out-of-range position error in tinyformat for the
    39+                // format string "%0$s\n".
    40+                if (std::string_view{"123456789aAcdfeEfFgGinopsuxX%"}.find(*it) == std::string_view::npos) throw "Invalid format specifier!";
    


    stickies-v commented at 4:48 pm on September 10, 2024:
    We do have quite a few use cases of \.[\d+]f precision that would throw here. No issue in the current code, but would like have to be overhauled in the next PR?

    maflcko commented at 6:26 pm on September 10, 2024:
    Thanks, and good catch! I’ve reverted this, see #30546 (review)
  77. stickies-v commented at 4:59 pm on September 10, 2024: contributor

    I’m still a strong Concept ACK on this, but I wonder if the incremental appoach currently taken is the best approach. We’ll have to (I think quite likely non-trivially) update the logic based on flags, widths, precisions, conversion, … actually used in the codebase, as seen with e.g. the precision and zero-padding comments?

    So perhaps it’d be sensible to first create a draft PR/branch that forces ConstevalFormatString everywhere, informing our requirements and adding full unit test coverage, and then once we have a good overview, branch off the minimal subset of changes required to update FatalErrorf as to keep that first (i.e. this) PR merge-able?

  78. DrahtBot requested review from stickies-v on Sep 10, 2024
  79. maflcko force-pushed on Sep 10, 2024
  80. maflcko force-pushed on Sep 11, 2024
  81. in src/test/util_string_tests.cpp:36 in faca9a8219 outdated
    31+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%+2s");
    32+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%+2s");
    33+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%.6i");
    34+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%5.2f");
    35+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%#x");
    36+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%*c");
    


    stickies-v commented at 10:14 am on September 11, 2024:

    I think this test should have a comment that the current behaviour is because our counting implementation is incorrect, not because it’s desired behaviour. Otherwise this could be confusing to developers improving it in the future.

    Or even better, add a separate group (within the same static_assert) of tests with behaviour we know would be an improvement, but is not currently implemented:

    0        // The current implementation ignores `*` dynamic width/precision
    1        // arguments to minimize code complexity, but this behaviour does
    2        // not need to be preserved.
    3        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%*c");
    4        ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%2$*3$d");
    5        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%.*f");
    

    maflcko commented at 1:40 pm on September 11, 2024:

    Thanks, done!

    (Side note: count_format_specifiers also does not implement * parsing correctly, which I guess was never detected because it was never used)

  82. in src/util/string.h:56 in faca9a8219 outdated
    51+
    52+            if (*it_num == '$') {
    53+                // Positional specifier
    54+                if (maybe_num == 0) throw "Positional format specifier must indicate at least 1";
    55+                count_pos = std::max(count_pos, maybe_num);
    56+                it = it_num;
    


    stickies-v commented at 10:25 am on September 11, 2024:

    I don’t think we need the extra it_num here, just seems to add code and iterations?

     0            unsigned maybe_num{0};
     1            while ('0' <= *it && *it <= '9') {
     2                maybe_num *= 10;
     3                maybe_num += *it - '0';
     4                ++it;
     5            };
     6
     7            if (*it == '$') {
     8                // Positional specifier
     9                if (maybe_num == 0) throw "Positional format specifier must indicate at least 1";
    10                count_pos = std::max(count_pos, maybe_num);
    

    maflcko commented at 1:41 pm on September 11, 2024:
    Thanks, removed!
  83. in src/util/string.h:43 in faca9a8219 outdated
    38+                // %%
    39+                ++it;
    40+                continue;
    41+            }
    42+
    43+            // Test for positional specifier, like %8$
    


    stickies-v commented at 10:29 am on September 11, 2024:
    This comment is a bit misleading, numbers can also be used for flags and width, e.g. %02d.

    maflcko commented at 1:42 pm on September 11, 2024:
    Thanks, (re)moved.
  84. in src/util/string.h:54 in faca9a8219 outdated
    49+                ++it_num;
    50+            };
    51+
    52+            if (*it_num == '$') {
    53+                // Positional specifier
    54+                if (maybe_num == 0) throw "Positional format specifier must indicate at least 1";
    


    stickies-v commented at 10:35 am on September 11, 2024:

    phrasing nit

    0                if (maybe_num == 0) throw "Positional format specifier must have position of at least 1";
    

    l0rinc commented at 11:56 am on September 11, 2024:

    👍, “indicate” may not be the best choice in this context, how about:

    0                if (maybe_num == 0) throw "Positional format specifiers start at 1"};
    

    maflcko commented at 1:42 pm on September 11, 2024:
    Thanks, rephrased!
  85. in test/lint/lint-format-strings.py:36 in faca9a8219 outdated
    31-    'snprintf,2',
    32-    'sprintf,1',
    33     'strprintf,0',
    34-    'vfprintf,1',
    35-    'vprintf,1',
    36-    'vsnprintf,1',
    


    stickies-v commented at 10:43 am on September 11, 2024:
    I think we can then also remove the ("src/dbwrapper.cpp", "vsnprintf(p, limit - p, format, backup_ap)"), in FALSE_POSITIVES in run-lint-format-strings.py?

    maflcko commented at 1:41 pm on September 11, 2024:
    Thanks, removed!
  86. in src/util/string.h:42 in fa53bb4de8 outdated
    37+                // %%
    38+                ++it;
    39+                continue;
    40+            }
    41+
    42+            // Test for positional specifier, like %8$
    


    l0rinc commented at 11:09 am on September 11, 2024:
    nit: personally I’d prohibit positional args completely, they can be super confusing - and shouldn’t be needed, we can extract variables before the call if we need to repeat them, instead of the reader being forced to count parameters… but if we do, I have a few suggestions

    maflcko commented at 1:46 pm on September 11, 2024:

    They are needed, see #30546 (comment)

    If you want to prohibit them, a separate pull request may better suited to remove them.


    l0rinc commented at 1:53 pm on September 11, 2024:

    I know they’re used, but I’d argue they’re not needed, i.e. instead of

     0const std::string error = strprintf(
     1    "Data directory %1$s contains a %2$s file which is ignored, because a different configuration file "
     2    "%3$s from %4$s is being used instead. Possible ways to address this would be to:\n"
     3    "- Delete or rename the %2$s file in data directory %1$s.\n"
     4    "- Change datadir= or conf= options to specify one configuration file, not two, and use "
     5    "includeconf= to include any other configuration files.\n"
     6    "- Set allowignoredconf=1 option to treat this condition as a warning, not an error.",
     7    fs::quoted(fs::PathToString(base_path)),
     8    fs::quoted(BITCOIN_CONF_FILENAME),
     9    fs::quoted(fs::PathToString(orig_config_path)),
    10    config_source);
    

    we could simply repeat the args in the same way they’re repeated in the text (instead of requiring the reader to jump around)

     0const std::string base_path_str = fs::quoted(fs::PathToString(base_path));
     1const std::string config_file_name = fs::quoted(BITCOIN_CONF_FILENAME);
     2const std::string orig_config_path_str = fs::quoted(fs::PathToString(orig_config_path));
     3const std::string error = strprintf(
     4    "Data directory %s contains a %s file which is ignored, because a different configuration file "
     5    "%s from %s is being used instead. Possible ways to address this would be to:\n"
     6    "- Delete or rename the %s file in data directory %s.\n"
     7    "- Change datadir= or conf= options to specify one configuration file, not two, and use "
     8    "includeconf= to include any other configuration files.\n"
     9    "- Set allowignoredconf=1 option to treat this condition as a warning, not an error.",
    10    base_path_str,
    11    config_file_name,
    12    orig_config_path_str,
    13    config_source,
    14    config_file_name,
    15    base_path_str
    16);
    

    maflcko commented at 2:01 pm on September 11, 2024:
    If you want to prohibit them, a separate pull request may better suited to remove and prohibit them.

    l0rinc commented at 2:03 pm on September 11, 2024:
    Absolutely, but we may not want to spend time implementing positional parsing, if we agree that we don’t want to encourage their usage

    maflcko commented at 2:06 pm on September 11, 2024:
    Personally I think it can be useful (std::format also allows them), but I don’t really care. I am happy to put this pull into a draft while it is waiting on the other issue/pull. However, I don’t think this pull is the right place to make such a change.

    l0rinc commented at 2:13 pm on September 11, 2024:
    I don’t want to delay the PR, just to clarify if it makes sense to work on this feature

    ryanofsky commented at 2:50 pm on September 11, 2024:

    I think positional args are more readable than repeated args (even in this example which is atypically long) because:

    • People have limited working memories. it is a easier to see 4 parameters passed and remember what numbers 1-4 are than to see 6 parameters passed and remember what 1-6 are. Remembering a list is also harder if the list has duplicate values.
    • Positional arguments are more explicit. “Configuration file %3$s” it is clearer than “Configuration file %s” because it tells you that the 3rd argument is substituted. You don’t have to manually scan the string and count earlier %s.
    • Positional arguments make it easier understand messages and see whether they are internally consistent. “Data directory %1$s contains a %2$s file […] Delete or rename the %2$s file in data directory %1$s” is easier to understand and verify than the alternative with %s everywhere.

    l0rinc commented at 3:08 pm on September 11, 2024:
    Thanks @ryanofsky, it seems there is a need for that after all, please resolve this comment.
  87. in src/util/string.h:21 in faca9a8219 outdated
    17@@ -17,6 +18,55 @@
    18 #include <vector>
    19 
    20 namespace util {
    21+/** Type to denote a format string that was checked at compile time */
    


    stickies-v commented at 11:13 am on September 11, 2024:

    I think this docstring could be beefed up. To give it a start:

     0/**
     1 * [@brief](/bitcoin-bitcoin/contributor/brief/) A wrapper for a compile-time partially validated format string
     2 *
     3 * This struct can be used to enforce partial compile-time validation of format strings, to reduce
     4 * the likelihood of tinyformat throwing exceptions at run-time. Validation is partial to try and
     5 * prevent the most common errors while avoiding re-implementing the entire parsing logic.
     6 * 
     7 * [@note](/bitcoin-bitcoin/contributor/note/) Counting of `*` dynamic width and precision fields (such as `%*c`, `%2$*3$d`, `%.*f`) is
     8 * notably not implemented to minimize code complexity as long they are not used in the codebase.
     9 * Usage of these fields is not counted and can lead to run-time exceptions.
    10 */
    

    maflcko commented at 1:43 pm on September 11, 2024:
    Thanks, took it and extended the @note a bit.
  88. stickies-v commented at 11:49 am on September 11, 2024: contributor

    Approach ACK and code LGTM faca9a821963d629987f6c2a2f7d13b1bf01162c , left some suggestions that would be nice.

    PR description also needs a bit of updating for the 2 new commits re forbidden functions and covering LogConnectFailure. The 2 new commits don’t seem necessary for this PR but they were trivial to review and relevant to the goal so I’m okay keeping them.

  89. in src/util/string.h:77 in faca9a8219 outdated
    60+                ++count_normal;
    61+                ++it;
    62+            }
    63+        }
    64+        if (count_normal && count_pos) throw "Format specifiers must be all positional or all non-positional!";
    65+        unsigned count{count_normal | count_pos};
    


    l0rinc commented at 12:10 pm on September 11, 2024:
    since we can only have either count_normal or count_pos, but we also need to detect if both were called, could we separate the two concerns: have a single count which is used in both cases and two booleans, a “has_normal” and a “has_positional”?

    maflcko commented at 1:46 pm on September 11, 2024:
    A single count and two booleans are three symbols, whereas two counts are two symbols, which seems easier

    l0rinc commented at 1:49 pm on September 11, 2024:
    The error messages indicate two separate concerns, it would make sense logically to separate them as well
  90. maflcko renamed this:
    util: Use consteval checked format string in FatalErrorf
    util: Use consteval checked format string in FatalErrorf, LogConnectFailure
    on Sep 11, 2024
  91. in src/util/string.h:39 in faca9a8219 outdated
    17@@ -17,6 +18,55 @@
    18 #include <vector>
    19 
    20 namespace util {
    21+/** Type to denote a format string that was checked at compile time */
    22+template <unsigned num_params>
    23+struct ConstevalFormatString {
    24+    const char* const fmt;
    25+    consteval ConstevalFormatString(const char* str) : fmt{str} { Detail_CheckNumFormatSpecifiers(fmt); }
    26+    constexpr static void Detail_CheckNumFormatSpecifiers(std::string_view str)
    


    l0rinc commented at 1:22 pm on September 11, 2024:

    While it gets the job done, I think the parser is unnecessarily complicated for several reasons:

    • Redundant Counters: Separate (but mutually exclusive) count_normal and count_pos.
    • Repetition: The check for “Format specifier incorrectly terminated by end of string” is repeated in multiple places.
    • Inefficient Character Handling: Multiple character checks for % (and manual digit detection).
    • Scattered Iterations: Increment operations (++it), continue statements, and lookaheads (str.end()) make the control flow harder to follow.

    I have a hard time explaining the behavior of the parser in words, following the logic of the code. Can we use a state-machine based parser instead, without lookaround, iterating char by char without jumping around, changing the state as we go along, e.g. something like this:

     0constexpr static void Detail_CheckNumFormatSpecifiers(std::string_view str)
     1{
     2    unsigned count{0};
     3    unsigned maybe_num{0};
     4    bool inside_format_specifier = false;
     5    bool has_normal = false, has_positional = false;
     6    for (auto it = str.begin(); it < str.end(); ++it) {
     7        if (*it == '%') inside_format_specifier = !inside_format_specifier;
     8        else if (inside_format_specifier) {
     9            if (std::isdigit(*it)) maybe_num = maybe_num * 10 + (*it - '0');
    10            else if (*it == '$') {
    11                if (maybe_num == 0) throw "Positional format specifier must be at least 1";
    12                count = std::max(count, maybe_num);
    13                has_positional = true;
    14                inside_format_specifier = false;
    15            } else {
    16                ++count;
    17                maybe_num = 0;
    18                has_normal = true;
    19                inside_format_specifier = false;
    20            }
    21        } else maybe_num = 0;
    22    }
    23
    24    if (has_normal && has_positional) throw "Format specifiers must be all positional or all non-positional!";
    25    if (inside_format_specifier || maybe_num > 0) throw "Format specifier incorrectly terminated by end of string";
    26    if (num_params != count) throw "Format specifier count must match the argument count!";
    27}
    

    This is the reason I’m not ack-ig yet.


    maflcko commented at 1:49 pm on September 11, 2024:

    Your suggestion does not compile, so I’ll leave this as-is for now.

    0src/util/string.h:47:17: note: non-constexpr function 'isdigit' cannot be used in a constant expression
    1   47 |             if (std::isdigit(*it)) maybe_num = maybe_num * 10 + (*it - '0');
    2      |                 ^
    

    l0rinc commented at 1:57 pm on September 11, 2024:

    Must be a compiler difference, you can use if ('0' <= *it && *it <= '9') instead:

     0constexpr static void Detail_CheckNumFormatSpecifiers(std::string_view str)
     1{
     2    unsigned count{0};
     3    unsigned maybe_num{0};
     4    bool inside_format = false;
     5    bool has_normal = false, has_positional = false;
     6    for (char c : str) {
     7        if (c == '%') {
     8            inside_format = !inside_format;
     9        } else if (inside_format) {
    10            if ('0' <= c && c <= '9') {
    11                maybe_num = maybe_num * 10 + (c - '0');
    12            } else if (c == '$') {
    13                if (maybe_num == 0) throw "Positional format specifier must have position of at least 1";
    14                count = std::max(count, maybe_num);
    15                has_positional = true;
    16                inside_format = false;
    17            } else {
    18                ++count;
    19                maybe_num = 0;
    20                has_normal = true;
    21                inside_format = false;
    22            }
    23        } else
    24            maybe_num = 0;
    25    }
    26    if (has_normal && has_positional) throw "Format specifiers must be all positional or all non-positional!";
    27    if (inside_format || maybe_num > 0) throw "Format specifier incorrectly terminated by end of string";
    28    if (num_params != count) throw "Format specifier count must match the argument count!";
    29}
    

    hodlinator commented at 1:59 pm on September 11, 2024:

    Replying to #30546 (review):

    Could be simply for (char c : str).

    Also the coding style only permits omitting braces on if when it isn’t followed by else.


    l0rinc commented at 9:22 am on September 12, 2024:
    Thanks, updated the example
  92. in src/util/string.h:66 in faca9a8219 outdated
    50+            };
    51+
    52+            if (*it_num == '$') {
    53+                // Positional specifier
    54+                if (maybe_num == 0) throw "Positional format specifier must indicate at least 1";
    55+                count_pos = std::max(count_pos, maybe_num);
    


    l0rinc commented at 1:27 pm on September 11, 2024:
    Alternatively, we could cheaply collect the referenced indexes to a std::set<unsigned> referenced_params; and check that all of them were referenced, i.e. if (referenced_params.size() != num_params) throw "Not all parameters are referenced!";
  93. in src/util/string.h:49 in faca9a8219 outdated
    31+            if (*it != '%') {
    32+                ++it;
    33+                continue;
    34+            }
    35+
    36+            if (++it >= str.end()) throw "Format specifier incorrectly terminated by end of string";
    


    l0rinc commented at 1:29 pm on September 11, 2024:
    I don’t particularly like that we have lookaheads even though we should be able to detect this at the very end - which would enable us to remove the duplicated throw.
  94. in src/util/string.h:46 in faca9a8219 outdated
    28+        unsigned count_normal{0}; // Number of "normal" specifiers, like %s
    29+        unsigned count_pos{0};    // Max number in positional specifier, like %8$s
    30+        for (auto it{str.begin()}; it < str.end();) {
    31+            if (*it != '%') {
    32+                ++it;
    33+                continue;
    


    l0rinc commented at 1:29 pm on September 11, 2024:
    I find these jumps quite confusing, it doesn’t reflect how I would manually read and interpret such a format string

    hodlinator commented at 3:07 pm on September 11, 2024:
    I reacted in the opposite way when I saw this, happy to pop the %%-case off the mental stack early and focus on other cases.

    l0rinc commented at 3:13 pm on September 11, 2024:
    but that’s not what this does, that’s below this condition. This states: if the current char is not a %, ignore it - which wouldn’t be the first thing I’d say when explaining to someone how the formatter works (especially since we’re checking later for % again).

    maflcko commented at 4:04 pm on September 11, 2024:

    This states: if the current char is not a %, ignore it - which wouldn’t be the first thing I’d say …

    For counting the specifiers, anything before the specifier begins with % can safely be ignored. I’d explain it this way, but I can see how someone may want to explain it in another way.

    Personally, I am fine either way, because this is just a style nit for me. I’ll leave this as-is for now, but I may or may not change something if I have to push again.


    hodlinator commented at 5:23 pm on September 11, 2024:

    In response to #30546 (review):

    but that’s not what this does, that’s below this condition.

    Ah, yes, mixed them up. Still feels okay to ignore anything up to a ‘%’ that way.

    But I do also like l0rinc’s approach in this thread https://github.com/bitcoin/bitcoin/pull/30546/files#r1754649705, although extra state tends to create room for more bugs.

  95. l0rinc changes_requested
  96. l0rinc commented at 1:32 pm on September 11, 2024: contributor
    Thanks for your patience and for striving to eliminate developer frustration. I like the concept a lot, but have some preferences regarding parser intuitiveness, hope you’ll consider my suggestions.
  97. maflcko force-pushed on Sep 11, 2024
  98. in src/test/util_string_tests.cpp:43 in fa7819bfad outdated
    32+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%+2s");
    33+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%.6i");
    34+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%5.2f");
    35+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%#x");
    36+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%_");
    37+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%\n");
    


    hodlinator commented at 2:18 pm on September 11, 2024:
    These lines (except maybe %5.2f?) should be moved to their own static_assert with a comment admitting they are invalid, but currently not caught (unlike the static_assert below which is valid but not properly handled).

    maflcko commented at 2:48 pm on September 11, 2024:

    They are not invalid, because in tinyformat there isn’t an invalid specifier, see #30546 (comment)

    Also, stuff like %.2fs is pretty standard and valid floating point number formatting. See for example src/logging/timer.h


    ryanofsky commented at 3:44 pm on September 11, 2024:

    re: #30546 (review)

    They are not invalid, because in tinyformat there isn’t an invalid specifier, see #30546 (comment)

    When you say these are not invalid you are saying tinyformat accepts strprintf("hello %\n", "world")? Or saying something else?

    In any case, it would be good to clarify in unit test comments which of the test cases tinyformat supports and which trigger exceptions. Potentially we could even extend the test to compare tinyformat behavior with ConstevalFormat behavior to make things even clearer, but just having a comment to clarify how these cases are treated by tinyformat would be very helpful.


    maflcko commented at 4:27 pm on September 11, 2024:
    Thanks, added a comment!
  99. hodlinator commented at 2:22 pm on September 11, 2024: contributor

    Reviewed fa7819bfadd29e41a1c9283828f7e32934d4fbd9

    git range-diff master faa32ad fa7819b

    • Removed validation of aAcdfeEfFgGinopsuxX as tinyformat does not implement it either - acceptable (Edit: parity with tinyformat is more important than parity with a custom linter for tinyformat).
    • Implemented support for width/padding-specification.
    • Now applies to LogConnectFailure.
    • Adds commit fafe6313cce2b97d86bd6f907ffe93c98cf9cb59 which deduplicates lints.

    fafe6313cce2b97d86bd6f907ffe93c98cf9cb59 - lint: Remove forbidden functions from lint-format-strings.py

    Could move this commit first since it is not dependent on the prior commits?

  100. lint: Remove forbidden functions from lint-format-strings.py
    Given that all of them are forbidden by the
    test/lint/lint-locale-dependence.py check, they can be removed.
    fae7b83eb5
  101. maflcko force-pushed on Sep 11, 2024
  102. in src/util/string.h:69 in faa5a8e448 outdated
    64+                // Positional specifier, like %8$
    65+                if (maybe_num == 0) throw "Positional format specifier must have position of at least 1";
    66+                count_pos = std::max(count_pos, maybe_num);
    67+                if (++it >= str.end()) throw "Format specifier incorrectly terminated by end of string";
    68+            } else {
    69+                // Non-positional specifier, like %s
    


    ryanofsky commented at 3:22 pm on September 11, 2024:

    In commit “util: Add ConstevalFormatString” (fa512d7bd333c6bf5868fd131f9513bc51aab5aa)

    This comment is true, but potentially misleading because it isn’t mentioning that this case will also be reached in cases this function isn’t checking like when # + - or space flags are used, when any .precision value is specified, as well as in invalid unterminated format arguments like "%0%s" that may treated as valid.

    Would suggest adding something like “This case is also reached if any flags or position arguments were specified, which this implementation will not check”


    maflcko commented at 4:27 pm on September 11, 2024:
    Thanks, added a comment!
  103. in src/test/util_string_tests.cpp:27 in faa5a8e448 outdated
    22+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%%%s");
    23+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s%%");
    24+        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers(" 1$s");
    25+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$s");
    26+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$s%1$s");
    27+        ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%2$s");
    


    ryanofsky commented at 3:31 pm on September 11, 2024:

    In commit “util: Add ConstevalFormatString” (fa512d7bd333c6bf5868fd131f9513bc51aab5aa)

    Think it could be a good idea to some tests positional argument combined with width or precision options or flags like:

    0ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$5i");
    1ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$-5i");
    2ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$.5i");
    

    maflcko commented at 4:27 pm on September 11, 2024:
    Thanks, added the tests!
  104. in src/test/util_string_tests.cpp:53 in fa512d7bd3 outdated
    42+        // errors when used in a ConstevalFormatString. Please refer to the
    43+        // note in the ConstevalFormatString docs.
    44+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%*c");
    45+        ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%2$*3$d");
    46+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%.*f");
    47+        return true; // All checks above compiled and passed
    


    ryanofsky commented at 3:37 pm on September 11, 2024:

    In commit “util: Add ConstevalFormatString” (fa512d7bd333c6bf5868fd131f9513bc51aab5aa)

    I think other invalid but accepted case that would be good to check for here or above (depending on whether or not tiny format treats them as runtime errors) would be:

    0ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%123%");
    1ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%123%s");
    

    maflcko commented at 4:26 pm on September 11, 2024:

    For tinyformat, almost any format specifier type is valid. % as well: https://godbolt.org/z/f8zWsP6nT

    Thanks, added the tests and a comment!

  105. in src/util/string.h:32 in fa512d7bd3 outdated
    26+ * while avoiding re-implementing the entire parsing logic.
    27+ *
    28+ * @note Counting of `*` dynamic width and precision fields (such as `%*c`,
    29+ * `%2$*3$d`, `%.*f`) is not implemented to minimize code complexity as long as
    30+ * they are not used in the codebase. Usage of these fields is not counted and
    31+ * can lead to run-time exceptions. Code wanting to use the `*` specifier can
    


    ryanofsky commented at 3:51 pm on September 11, 2024:

    In commit “util: Add ConstevalFormatString” (fa512d7bd333c6bf5868fd131f9513bc51aab5aa)

    Maybe replace “can” with “will”? It seems like there is no usage of * that ConstevalFormat would accept and tinyformat would not treat as a runtime error. Or is that not the case?


    maflcko commented at 4:26 pm on September 11, 2024:
    I think using positional args may work for *, but I haven’t checked it.
  106. ryanofsky approved
  107. ryanofsky commented at 4:03 pm on September 11, 2024: contributor

    Code review ACK faa5a8e4480de69086796dba49b65ef73058f8d7

    I like this approach of implementing a loose compile time checker that avoids too much complexity by supporting a superset of tinyformat syntax. This approach is more flexible and should be easier to roll out than the opposite approach which would avoid complexity by only supporting a subset of tinyformat syntax.

    In longer run though, I think just supporting a subset of tinyformat syntax would be preferable to provide more safety, and make the checking code easier to understand. @l0rinc seems to have some good ideas about this and be pretty eager to rewrite the checking function, so there should be opportunities to make future improvements after this is used more widely.

  108. DrahtBot requested review from stickies-v on Sep 11, 2024
  109. DrahtBot requested review from l0rinc on Sep 11, 2024
  110. DrahtBot requested review from hodlinator on Sep 11, 2024
  111. maflcko force-pushed on Sep 11, 2024
  112. maflcko force-pushed on Sep 11, 2024
  113. ryanofsky approved
  114. ryanofsky commented at 5:11 pm on September 11, 2024: contributor
    Code review ACK fa26462e95291652b4021d91b014655f678149e8. Just new tests and comments since last review
  115. in src/test/util_string_tests.cpp:32 in fa26462e95 outdated
    27+        ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%2$s");
    28+        ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%2$s 4$s %2$s");
    29+        ConstevalFormatString<129>::Detail_CheckNumFormatSpecifiers("%129$s 999$s %2$s");
    30+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%02d");
    31+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%+2s");
    32+        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%+2s");
    


    hodlinator commented at 6:49 pm on September 11, 2024:
    Duplicates

    maflcko commented at 9:06 pm on September 11, 2024:
    Thanks, I’ll fix it up in the next follow-up, or if I have to re-push here, to avoid invalidating review over a harmless test style-nit.

    maflcko commented at 8:27 am on September 12, 2024:

    I pushed a test-only change to reduce the verbosity of the test and also drop the accidental duplicate line.

    The tests now look like PassFmt<0>(""); for the passing case and FailFmtWithError<0>("%s", check_num_spec); for the failing case.

    I hope this change is acceptable for reviewers to review. Otherwise, I’ll revert back to the previous commit hash.

  116. in src/test/util_string_tests.cpp:29 in fa26462e95 outdated
     8+
     9+using namespace util;
    10+
    11+BOOST_AUTO_TEST_SUITE(util_string_tests)
    12+
    13+BOOST_AUTO_TEST_CASE(ConstevalFormatString_NumSpec)
    


    hodlinator commented at 8:51 pm on September 11, 2024:

    A) Interlacing tfm::format

    Partially in response to @ryanofsky’s comment

    Potentially we could even extend the test to compare tinyformat behavior with ConstevalFormat behavior to make things even clearer, but just having a comment to clarify how these cases are treated by tinyformat would be very helpful.

    …and partially out of my own curiosity about how tfm::format really behaves - I added tfm::format testing interlaced with the functionality for counting format specifiers.

    It disproved my intuition about mismatching counts from: #30546 (review)

    B) Free function?

    As a consequence of wanting to avoid…

    0static_assert([]() { ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers(""); return true; }());
    1tfm::format("");
    

    …I broke out a free function similarly to @stickies-v suggestion #30546 (review), enabling:

    0using util::detail::CountFormatSpecifiers;
    1
    2static_assert(CountFormatSpecifiers("") == 0);
    3tfm::format("");
    

    While I agree with maflcko that keeping functionality hard to use outside of the intended context has value, keeping tests straightforward is more important. If you prefer…

    0ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("");
    1tfm::format("");
    

    …I guess that could work too.

    C) Dropped negatives

    Also dropped some of the negative tests since the free 1 literal in…

    0static_assert(CountFormatSpecifiers("%1$s") == 1);
    

    …more clearly and solely at compile-time covers both…

    0ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$s");
    

    …and…

    0BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%1$s"), ErrType, check_num_spec);
    1BOOST_CHECK_EXCEPTION(ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%1$s"), ErrType, check_num_spec);
    

    D) Split test function

    Broke ConstevalFormatString_NumSpec test function into:

    • ConstevalFormatString_CorrectCounts
    • ConstevalFormatString_IncorrectCounts
    • ConstevalFormatString_NegativeChecks

    E) Added parse_string_content tests

    Added existing “tests” from run-lint-format-strings.py parse_string_content like @stickies-v did. Even if parity with tinyformat is more important than parity with a custom linter for tinyformat, I think it’s still a nice touch.

      0diff --git a/src/test/util_string_tests.cpp b/src/test/util_string_tests.cpp
      1index c6ee1ffed9..2f6fad50b7 100644
      2--- a/src/test/util_string_tests.cpp
      3+++ b/src/test/util_string_tests.cpp
      4@@ -7,74 +7,172 @@
      5 #include <boost/test/unit_test.hpp>
      6 
      7 using namespace util;
      8+using util::detail::CountFormatSpecifiers;
      9+
     10+namespace {
     11+bool CheckTooMany(const tfm::format_error& s) { return s.what() == std::string_view{"tinyformat: Too many conversion specifiers in format string"}; }
     12+bool CheckOutOfRange(const tfm::format_error& s) { return s.what() == std::string_view{"tinyformat: Positional argument out of range"}; }
     13+}
     14 
     15 BOOST_AUTO_TEST_SUITE(util_string_tests)
     16 
     17-BOOST_AUTO_TEST_CASE(ConstevalFormatString_NumSpec)
     18+// These are counted correctly, since tinyformat will require the provided number of args.
     19+BOOST_AUTO_TEST_CASE(ConstevalFormatString_CorrectCounts)
     20 {
     21-    // Compile-time sanity checks
     22-    static_assert([] {
     23-        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("");
     24-        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%");
     25-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s");
     26-        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%s");
     27-        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("s%%");
     28-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%%%s");
     29-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s%%");
     30-        ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers(" 1$s");
     31-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$s");
     32-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$s%1$s");
     33-        ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%2$s");
     34-        ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%2$s 4$s %2$s");
     35-        ConstevalFormatString<129>::Detail_CheckNumFormatSpecifiers("%129$s 999$s %2$s");
     36-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%02d");
     37-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%+2s");
     38-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%+2s");
     39-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%.6i");
     40-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%5.2f");
     41-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%#x");
     42-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$5i");
     43-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$-5i");
     44-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$.5i");
     45-        // tinyformat accepts almost any "type" spec, even '%', or '_', or '\n'.
     46-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%123%");
     47-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%123%s");
     48-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%_");
     49-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%\n");
     50-        return true; // All checks above compiled and passed
     51-    }());
     52-    static_assert([] {
     53+    static_assert(CountFormatSpecifiers("") == 0);
     54+    tfm::format("");
     55+
     56+    static_assert(CountFormatSpecifiers("%%") == 0);
     57+    tfm::format("%%");
     58+
     59+    static_assert(CountFormatSpecifiers("%s") == 1);
     60+    BOOST_CHECK_EXCEPTION(tfm::format("%s"), tfm::format_error, CheckTooMany);
     61+    tfm::format("%s", "foo");
     62+
     63+    static_assert(CountFormatSpecifiers("%%s") == 0);
     64+    tfm::format("%%s");
     65+
     66+    static_assert(CountFormatSpecifiers("s%%") == 0);
     67+    tfm::format("s%%");
     68+
     69+    static_assert(CountFormatSpecifiers("%%%s") == 1);
     70+    BOOST_CHECK_EXCEPTION(tfm::format("%%%s"), tfm::format_error, CheckTooMany);
     71+    tfm::format("%%%s", "foo");
     72+
     73+    static_assert(CountFormatSpecifiers("%s%%") == 1);
     74+    BOOST_CHECK_EXCEPTION(tfm::format("%s%%"), tfm::format_error, CheckTooMany);
     75+    tfm::format("%s%%", "foo");
     76+
     77+    static_assert(CountFormatSpecifiers(" 1$s") == 0);
     78+    tfm::format(" 1$s");
     79+
     80+    static_assert(CountFormatSpecifiers("%1$s") == 1);
     81+    BOOST_CHECK_EXCEPTION(tfm::format("%1$s"), tfm::format_error, CheckOutOfRange);
     82+    tfm::format("%1$s", "foo");
     83+
     84+    static_assert(CountFormatSpecifiers("%1$s%1$s") == 1);
     85+    BOOST_CHECK_EXCEPTION(tfm::format("%1$s%1$s"), tfm::format_error, CheckOutOfRange);
     86+    tfm::format("%1$s%1$s", "foo");
     87+
     88+    static_assert(CountFormatSpecifiers("%2$s") == 2);
     89+    BOOST_CHECK_EXCEPTION(tfm::format("%2$s"), tfm::format_error, CheckOutOfRange);
     90+    BOOST_CHECK_EXCEPTION(tfm::format("%2$s", "foo"), tfm::format_error, CheckOutOfRange);
     91+    tfm::format("%2$s", "foo", "bar");
     92+
     93+    static_assert(CountFormatSpecifiers("%2$s 4$s %2$s") == 2);
     94+    BOOST_CHECK_EXCEPTION(tfm::format("%2$s 4$s %2$s"), tfm::format_error, CheckOutOfRange);
     95+    BOOST_CHECK_EXCEPTION(tfm::format("%2$s 4$s %2$s", "foo"), tfm::format_error, CheckOutOfRange);
     96+    tfm::format("%2$s 4$s %2$s", "foo", "bar");
     97+
     98+    static_assert(CountFormatSpecifiers("%12$s 99$s %2$s") == 12);
     99+    BOOST_CHECK_EXCEPTION(tfm::format("%12$s 99$s %2$s"), tfm::format_error, CheckOutOfRange);
    100+    BOOST_CHECK_EXCEPTION(tfm::format("%12$s 99$s %2$s", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"), tfm::format_error, CheckOutOfRange);
    101+    tfm::format("%12$s 99$s %2$s", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12");
    102+
    103+    static_assert(CountFormatSpecifiers("%02d") == 1);
    104+    BOOST_CHECK_EXCEPTION(tfm::format("%02d"), tfm::format_error, CheckTooMany);
    105+    tfm::format("%02d", 1);
    106+
    107+    static_assert(CountFormatSpecifiers("%+2s") == 1);
    108+    BOOST_CHECK_EXCEPTION(tfm::format("%+2s"), tfm::format_error, CheckTooMany);
    109+    tfm::format("%+2s", 1);
    110+
    111+    static_assert(CountFormatSpecifiers("%.6i") == 1);
    112+    BOOST_CHECK_EXCEPTION(tfm::format("%.6i"), tfm::format_error, CheckTooMany);
    113+    tfm::format("%.6i", 1);
    114+
    115+    static_assert(CountFormatSpecifiers("%5.2f") == 1);
    116+    BOOST_CHECK_EXCEPTION(tfm::format("%5.2f"), tfm::format_error, CheckTooMany);
    117+    tfm::format("%5.2f", 1);
    118+
    119+    static_assert(CountFormatSpecifiers("%1$5i") == 1);
    120+    BOOST_CHECK_EXCEPTION(tfm::format("%1$5i"), tfm::format_error, CheckOutOfRange);
    121+    tfm::format("%1$5i", 1);
    122+
    123+    static_assert(CountFormatSpecifiers("%1$-5i") == 1);
    124+    BOOST_CHECK_EXCEPTION(tfm::format("%1$-5i"), tfm::format_error, CheckOutOfRange);
    125+    tfm::format("%1$-5i", 1);
    126+
    127+    static_assert(CountFormatSpecifiers("%1$.5i") == 1);
    128+    BOOST_CHECK_EXCEPTION(tfm::format("%1$.5i"), tfm::format_error, CheckOutOfRange);
    129+    tfm::format("%1$.5i", 1);
    130+
    131+    static_assert(CountFormatSpecifiers("%#x") == 1);
    132+    BOOST_CHECK_EXCEPTION(tfm::format("%#x"), tfm::format_error, CheckTooMany);
    133+    tfm::format("%#x", 1);
    134+
    135+    static_assert(CountFormatSpecifiers("%123%") == 1);
    136+    BOOST_CHECK_EXCEPTION(tfm::format("%123%"), tfm::format_error, CheckTooMany);
    137+    tfm::format("%123%", 1);
    138+
    139+    static_assert(CountFormatSpecifiers("%123%s") == 1);
    140+    BOOST_CHECK_EXCEPTION(tfm::format("%123%s"), tfm::format_error, CheckTooMany);
    141+    tfm::format("%123%s", 1);
    142+
    143+    static_assert(CountFormatSpecifiers("%_") == 1);
    144+    BOOST_CHECK_EXCEPTION(tfm::format("%_"), tfm::format_error, CheckTooMany);
    145+    tfm::format("%_", 1);
    146+
    147+    static_assert(CountFormatSpecifiers("%\n") == 1);
    148+    BOOST_CHECK_EXCEPTION(tfm::format("%\n"), tfm::format_error, CheckTooMany);
    149+    tfm::format("%\n", 1);
    150+}
    151+
    152 // The `*` specifier behavior is unsupported and can lead to runtime
    153 // errors when used in a ConstevalFormatString. Please refer to the
    154 // note in the ConstevalFormatString docs.
    155-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%*c");
    156-        ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%2$*3$d");
    157-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%.*f");
    158-        return true; // All checks above compiled and passed
    159-    }());
    160+BOOST_AUTO_TEST_CASE(ConstevalFormatString_IncorrectCounts)
    161+{
    162+    int a{}, b{}, c{};
    163 
    164-    // Negative checks at runtime
    165+    auto check_not_enough{[](const tfm::format_error& s) { return s.what() == std::string_view{"tinyformat: Not enough arguments to read variable width or precision"}; }};
    166+    static_assert(CountFormatSpecifiers("%*c") == 1);
    167+    BOOST_CHECK_EXCEPTION(tfm::format("%*c"), tfm::format_error, check_not_enough);
    168+    BOOST_CHECK_EXCEPTION(tfm::format("%*c", a), tfm::format_error, CheckTooMany);
    169+    tfm::format("%*c", a, b);
    170+
    171+    static_assert(CountFormatSpecifiers("%2$*3$d") == 2);
    172+    BOOST_CHECK_EXCEPTION(tfm::format("%2$*3$d"), tfm::format_error, CheckOutOfRange);
    173+    BOOST_CHECK_EXCEPTION(tfm::format("%2$*3$d", a), tfm::format_error, CheckOutOfRange);
    174+    BOOST_CHECK_EXCEPTION(tfm::format("%2$*3$d", a, b), tfm::format_error, CheckOutOfRange);
    175+    tfm::format("%2$*3$d", a, b, c);
    176+
    177+    static_assert(CountFormatSpecifiers("%.*f") == 1);
    178+    BOOST_CHECK_EXCEPTION(tfm::format("%.*f"), tfm::format_error, check_not_enough);
    179+    BOOST_CHECK_EXCEPTION(tfm::format("%.*f", a), tfm::format_error, CheckTooMany);
    180+    tfm::format("%.*f", a, b);
    181+}
    182+
    183+// Negative checks. Executed at runtime as exceptions are not allowed at compile time.
    184+BOOST_AUTO_TEST_CASE(ConstevalFormatString_NegativeChecks)
    185+{
    186     using ErrType = const char*;
    187 
    188     auto check_mix{[](const ErrType& str) { return str == std::string_view{"Format specifiers must be all positional or all non-positional!"}; }};
    189-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s%1$s"), ErrType, check_mix);
    190-
    191-    auto check_num_spec{[](const ErrType& str) { return str == std::string_view{"Format specifier count must match the argument count!"}; }};
    192-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers(""), ErrType, check_num_spec);
    193-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%s"), ErrType, check_num_spec);
    194-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%s"), ErrType, check_num_spec);
    195-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%1$s"), ErrType, check_num_spec);
    196-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%1$s"), ErrType, check_num_spec);
    197+    BOOST_CHECK_EXCEPTION(CountFormatSpecifiers("%s%1$s"), ErrType, check_mix);
    198 
    199     auto check_0_pos{[](const ErrType& str) { return str == std::string_view{"Positional format specifier must have position of at least 1"}; }};
    200-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%$s"), ErrType, check_0_pos);
    201-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%$"), ErrType, check_0_pos);
    202-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%0$"), ErrType, check_0_pos);
    203-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%0$s"), ErrType, check_0_pos);
    204+    BOOST_CHECK_EXCEPTION(CountFormatSpecifiers("%$s"), ErrType, check_0_pos);
    205+    BOOST_CHECK_EXCEPTION(CountFormatSpecifiers("%$"), ErrType, check_0_pos);
    206+    BOOST_CHECK_EXCEPTION(CountFormatSpecifiers("%0$"), ErrType, check_0_pos);
    207+    BOOST_CHECK_EXCEPTION(CountFormatSpecifiers("%0$s"), ErrType, check_0_pos);
    208 
    209     auto check_term{[](const ErrType& str) { return str == std::string_view{"Format specifier incorrectly terminated by end of string"}; }};
    210-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%"), ErrType, check_term);
    211-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$"), ErrType, check_term);
    212+    BOOST_CHECK_EXCEPTION(CountFormatSpecifiers("%"), ErrType, check_term);
    213+    BOOST_CHECK_EXCEPTION(CountFormatSpecifiers("%1$"), ErrType, check_term);
    214+}
    215+
    216+// Existing "tests" in run-lint-format-strings.py parse_string_content
    217+BOOST_AUTO_TEST_CASE(ConstevalFormatString_run_lint_format_strings_parse_string_content)
    218+{
    219+    static_assert(CountFormatSpecifiers("foo bar foo") == 0);
    220+    static_assert(CountFormatSpecifiers("foo %d bar foo") == 1);
    221+    static_assert(CountFormatSpecifiers("foo %d bar %i foo") == 2);
    222+    static_assert(CountFormatSpecifiers("foo %d bar %i foo %% foo") == 2);
    223+    static_assert(CountFormatSpecifiers("foo %d bar %i foo %% foo %d foo") == 3);
    224+    //static_assert(CountFormatSpecifiers("foo %d bar %i foo %% foo %*d foo") == 4); // not implemented
    225+    static_assert(CountFormatSpecifiers("foo %5$d") == 5);
    226+    //static_assert(CountFormatSpecifiers("foo %5$*7$d") == 7); // not implemented
    227 }
    228 
    229 BOOST_AUTO_TEST_SUITE_END()
    230diff --git a/src/test/util_tests.cpp b/src/test/util_tests.cpp
    231index 1624fb8b5b..bd40d596ef 100644
    232--- a/src/test/util_tests.cpp
    233+++ b/src/test/util_tests.cpp
    234@@ -150,22 +150,24 @@ constexpr uint8_t HEX_PARSE_OUTPUT[] = {
    235 static_assert((sizeof(HEX_PARSE_INPUT) - 1) == 2 * sizeof(HEX_PARSE_OUTPUT));
    236 BOOST_AUTO_TEST_CASE(parse_hex)
    237 {
    238+    using util::hex_literals::detail::Hex;
    239+
    240     std::vector<unsigned char> result;
    241 
    242     // Basic test vector
    243     std::vector<unsigned char> expected(std::begin(HEX_PARSE_OUTPUT), std::end(HEX_PARSE_OUTPUT));
    244-    constexpr std::array<std::byte, 65> hex_literal_array{operator""_hex<util::detail::Hex(HEX_PARSE_INPUT)>()};
    245+    constexpr std::array<std::byte, 65> hex_literal_array{operator""_hex<Hex(HEX_PARSE_INPUT)>()};
    246     auto hex_literal_span{MakeUCharSpan(hex_literal_array)};
    247     BOOST_CHECK_EQUAL_COLLECTIONS(hex_literal_span.begin(), hex_literal_span.end(), expected.begin(), expected.end());
    248 
    249-    const std::vector<std::byte> hex_literal_vector{operator""_hex_v<util::detail::Hex(HEX_PARSE_INPUT)>()};
    250+    const std::vector<std::byte> hex_literal_vector{operator""_hex_v<Hex(HEX_PARSE_INPUT)>()};
    251     hex_literal_span = MakeUCharSpan(hex_literal_vector);
    252     BOOST_CHECK_EQUAL_COLLECTIONS(hex_literal_span.begin(), hex_literal_span.end(), expected.begin(), expected.end());
    253 
    254-    constexpr std::array<uint8_t, 65> hex_literal_array_uint8{operator""_hex_u8<util::detail::Hex(HEX_PARSE_INPUT)>()};
    255+    constexpr std::array<uint8_t, 65> hex_literal_array_uint8{operator""_hex_u8<Hex(HEX_PARSE_INPUT)>()};
    256     BOOST_CHECK_EQUAL_COLLECTIONS(hex_literal_array_uint8.begin(), hex_literal_array_uint8.end(), expected.begin(), expected.end());
    257 
    258-    result = operator""_hex_v_u8<util::detail::Hex(HEX_PARSE_INPUT)>();
    259+    result = operator""_hex_v_u8<Hex(HEX_PARSE_INPUT)>();
    260     BOOST_CHECK_EQUAL_COLLECTIONS(result.begin(), result.end(), expected.begin(), expected.end());
    261 
    262     result = ParseHex(HEX_PARSE_INPUT);
    263diff --git a/src/util/strencodings.h b/src/util/strencodings.h
    264index 1543de03ab..b79806c251 100644
    265--- a/src/util/strencodings.h
    266+++ b/src/util/strencodings.h
    267@@ -427,16 +427,16 @@ struct Hex {
    268 
    269 } // namespace detail
    270 
    271-template <util::detail::Hex str>
    272+template <util::hex_literals::detail::Hex str>
    273 constexpr auto operator""_hex() { return str.bytes; }
    274 
    275-template <util::detail::Hex str>
    276+template <util::hex_literals::detail::Hex str>
    277 constexpr auto operator""_hex_u8() { return std::bit_cast<std::array<uint8_t, str.bytes.size()>>(str.bytes); }
    278 
    279-template <util::detail::Hex str>
    280+template <util::hex_literals::detail::Hex str>
    281 constexpr auto operator""_hex_v() { return std::vector<std::byte>{str.bytes.begin(), str.bytes.end()}; }
    282 
    283-template <util::detail::Hex str>
    284+template <util::hex_literals::detail::Hex str>
    285 inline auto operator""_hex_v_u8() { return std::vector<uint8_t>{UCharCast(str.bytes.data()), UCharCast(str.bytes.data() + str.bytes.size())}; }
    286 
    287 } // inline namespace hex_literals
    288diff --git a/src/util/string.h b/src/util/string.h
    289index 4724d881ff..d99c1963c6 100644
    290--- a/src/util/string.h
    291+++ b/src/util/string.h
    292@@ -18,25 +18,9 @@
    293 #include <vector>
    294 
    295 namespace util {
    296-/**
    297- * [@brief](/bitcoin-bitcoin/contributor/brief/) A wrapper for a compile-time partially validated format string
    298- *
    299- * This struct can be used to enforce partial compile-time validation of format
    300- * strings, to reduce the likelihood of tinyformat throwing exceptions at
    301- * run-time. Validation is partial to try and prevent the most common errors
    302- * while avoiding re-implementing the entire parsing logic.
    303- *
    304- * [@note](/bitcoin-bitcoin/contributor/note/) Counting of `*` dynamic width and precision fields (such as `%*c`,
    305- * `%2$*3$d`, `%.*f`) is not implemented to minimize code complexity as long as
    306- * they are not used in the codebase. Usage of these fields is not counted and
    307- * can lead to run-time exceptions. Code wanting to use the `*` specifier can
    308- * side-step this struct and call tinyformat directly.
    309- */
    310-template <unsigned num_params>
    311-struct ConstevalFormatString {
    312-    const char* const fmt;
    313-    consteval ConstevalFormatString(const char* str) : fmt{str} { Detail_CheckNumFormatSpecifiers(fmt); }
    314-    constexpr static void Detail_CheckNumFormatSpecifiers(std::string_view str)
    315+
    316+namespace detail {
    317+constexpr static unsigned CountFormatSpecifiers(std::string_view str)
    318 {
    319     unsigned count_normal{0}; // Number of "normal" specifiers, like %s
    320     unsigned count_pos{0};    // Max number in positional specifier, like %8$s
    321@@ -74,8 +58,31 @@ struct ConstevalFormatString {
    322         // specifier is not checked. Parsing continues with the next '%'.
    323     }
    324     if (count_normal && count_pos) throw "Format specifiers must be all positional or all non-positional!";
    325-        unsigned count{count_normal | count_pos};
    326-        if (num_params != count) throw "Format specifier count must match the argument count!";
    327+    return count_normal | count_pos;
    328+}
    329+} // namespace detail
    330+
    331+/**
    332+ * [@brief](/bitcoin-bitcoin/contributor/brief/) A wrapper for a compile-time partially validated format string
    333+ *
    334+ * This struct can be used to enforce partial compile-time validation of format
    335+ * strings, to reduce the likelihood of tinyformat throwing exceptions at
    336+ * run-time. Validation is partial to try and prevent the most common errors
    337+ * while avoiding re-implementing the entire parsing logic.
    338+ *
    339+ * [@note](/bitcoin-bitcoin/contributor/note/) Counting of `*` dynamic width and precision fields (such as `%*c`,
    340+ * `%2$*3$d`, `%.*f`) is not implemented to minimize code complexity as long as
    341+ * they are not used in the codebase. Usage of these fields is not counted and
    342+ * can lead to run-time exceptions. Code wanting to use the `*` specifier can
    343+ * side-step this struct and call tinyformat directly.
    344+ */
    345+template <unsigned num_params>
    346+struct ConstevalFormatString {
    347+    const char* const fmt;
    348+    consteval ConstevalFormatString(const char* str) : fmt{str}
    349+    {
    350+        if (detail::CountFormatSpecifiers(fmt) != num_params)
    351+            throw "Format specifier count must match the argument count!";
    352     }
    353 };
    

    (The unrelated changes in strencodings.h/util_tests.cpp are a consequence of clashing util::detail namespaces).


    maflcko commented at 9:25 pm on September 11, 2024:

    While I agree with maflcko that keeping functionality hard to use outside of the intended context has value, keeping tests straightforward is more important. If you prefer…

    Thank you for the diff. However, I think that real code should be designed without taking tests into account (apart from being testable). Tests should always follow real code, not the other way round. I am sure the verbosity in the test can be reduced (maybe with a macro, or type alias, or …), but I am not sure if the style-nits in the tests are so important, as long as the test is working properly and as expected.

    The diff you shared also changes util::hex_literals::detail::Hex, which I don’t really understand.

    Even if parity with tinyformat is more important than parity with a custom linter for tinyformat, I think it’s still a nice touch.

    All relevant test coverage has been taken over, except for the * test coverage. But the * parsing is broken in the linter anyway, so I am not sure about the value of taking over test in a comment (dead code).


    hodlinator commented at 9:46 pm on September 11, 2024:

    I think that real code should be designed without taking tests into account (apart from being testable). Tests should always follow real code, not the other way round.

    Making the function return a value allowed for C) dropped negatives (and less runtime testing). But I can try to provide a new diff bringing them back and without changing string.h. My main point remains adding the interlaced tinyformat tests to prove parity.

    The diff you shared also changes util::hex_literals::detail::Hex, which I don’t really understand.

    It’s a consequence of clashes between util::detail namespaces. (Added note about that in a later edit).

    All relevant test coverage has been taken over, except for the * test coverage. But the * parsing is broken in the linter anyway, so I am not sure about the value of taking over test in a comment (dead code).

    Fair.


    maflcko commented at 10:07 pm on September 11, 2024:

    The diff you shared also changes util::hex_literals::detail::Hex, which I don’t really understand.

    It’s a consequence of clashes between util::detail namespaces. (Added note about that in a later edit).

    Thanks for explaining. I think this is another reason to keep string.h as-is for now.

    My main point remains adding the interlaced tinyformat tests to prove parity.

    I think the CheckTooMany may be a bit too verbose and mostly unit tests for tinyformat itself. Especially, if they are followed by the happy-path. I think the value in checking that a runtime error occurs for an invalid setting that is already rejected at compile-time is limited.

    Also, there are some tests that are “unspecified behavior” in tinyformat, albeit they are working “as expected” now. For example, (1) not consuming all positional args, (2) or mixing positional and non-positional. (The python linter also gets both wrong.) So I am not sure if unit tests should be added for this unspecified behavior. See also the tinyformat docs:

    0// The format can contain either numbered argument conversion specifications
    1// (that is, "%n$" and "*m$"), or unnumbered argument conversion specifications
    2// (that is, % and * ), but not both. The only exception to this is that %%
    3// can be mixed with the "%n$" form. The results of mixing numbered and
    4// unnumbered argument specifications in a format string are undefined.
    5// When numbered argument specifications are used, specifying the Nth argument
    6// requires that all the leading arguments, from the first to the (N-1)th,
    7// are specified in the format string.
    

    hodlinator commented at 8:01 am on September 12, 2024:
      0diff --git a/src/test/util_string_tests.cpp b/src/test/util_string_tests.cpp
      1index c6ee1ffed9..3fed361e7e 100644
      2--- a/src/test/util_string_tests.cpp
      3+++ b/src/test/util_string_tests.cpp
      4@@ -10,62 +10,140 @@ using namespace util;
      5 
      6 BOOST_AUTO_TEST_SUITE(util_string_tests)
      7 
      8-BOOST_AUTO_TEST_CASE(ConstevalFormatString_NumSpec)
      9+// These are counted correctly, since tinyformat will require the provided number of args.
     10+BOOST_AUTO_TEST_CASE(ConstevalFormatString_CorrectCounts)
     11 {
     12-    // Compile-time sanity checks
     13-    static_assert([] {
     14     ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("");
     15+    tfm::format("");
     16+
     17     ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%");
     18+    tfm::format("%%");
     19+
     20     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s");
     21+    BOOST_CHECK_THROW(tfm::format("%s"), tfm::format_error);
     22+    tfm::format("%s", "foo");
     23+
     24     ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%%s");
     25+    tfm::format("%%s");
     26+
     27     ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("s%%");
     28+    tfm::format("s%%");
     29+
     30     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%%%s");
     31+    BOOST_CHECK_THROW(tfm::format("%%%s"), tfm::format_error);
     32+    tfm::format("%%%s", "foo");
     33+
     34     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s%%");
     35+    BOOST_CHECK_THROW(tfm::format("%s%%"), tfm::format_error);
     36+    tfm::format("%s%%", "foo");
     37+
     38     ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers(" 1$s");
     39+    tfm::format(" 1$s");
     40+
     41     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$s");
     42+    BOOST_CHECK_THROW(tfm::format("%1$s"), tfm::format_error);
     43+    tfm::format("%1$s", "foo");
     44+
     45     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$s%1$s");
     46+    BOOST_CHECK_THROW(tfm::format("%1$s%1$s"), tfm::format_error);
     47+    tfm::format("%1$s%1$s", "foo");
     48+
     49     ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%2$s");
     50+    BOOST_CHECK_THROW(tfm::format("%2$s"), tfm::format_error);
     51+    BOOST_CHECK_THROW(tfm::format("%2$s", "foo"), tfm::format_error);
     52+    tfm::format("%2$s", "foo", "bar");
     53+
     54     ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%2$s 4$s %2$s");
     55-        ConstevalFormatString<129>::Detail_CheckNumFormatSpecifiers("%129$s 999$s %2$s");
     56+    BOOST_CHECK_THROW(tfm::format("%2$s 4$s %2$s"), tfm::format_error);
     57+    BOOST_CHECK_THROW(tfm::format("%2$s 4$s %2$s", "foo"), tfm::format_error);
     58+    tfm::format("%2$s 4$s %2$s", "foo", "bar");
     59+
     60+    ConstevalFormatString<12>::Detail_CheckNumFormatSpecifiers("%12$s 99$s %2$s");
     61+    BOOST_CHECK_THROW(tfm::format("%12$s 99$s %2$s"), tfm::format_error);
     62+    BOOST_CHECK_THROW(tfm::format("%12$s 99$s %2$s", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"), tfm::format_error);
     63+    tfm::format("%12$s 99$s %2$s", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12");
     64+
     65     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%02d");
     66+    BOOST_CHECK_THROW(tfm::format("%02d"), tfm::format_error);
     67+    tfm::format("%02d", 1);
     68+
     69     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%+2s");
     70-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%+2s");
     71+    BOOST_CHECK_THROW(tfm::format("%+2s"), tfm::format_error);
     72+    tfm::format("%+2s", 1);
     73+
     74     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%.6i");
     75+    BOOST_CHECK_THROW(tfm::format("%.6i"), tfm::format_error);
     76+    tfm::format("%.6i", 1);
     77+
     78     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%5.2f");
     79-        ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%#x");
     80+    BOOST_CHECK_THROW(tfm::format("%5.2f"), tfm::format_error);
     81+    tfm::format("%5.2f", 1);
     82+
     83     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$5i");
     84+    BOOST_CHECK_THROW(tfm::format("%1$5i"), tfm::format_error);
     85+    tfm::format("%1$5i", 1);
     86+
     87     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$-5i");
     88+    BOOST_CHECK_THROW(tfm::format("%1$-5i"), tfm::format_error);
     89+    tfm::format("%1$-5i", 1);
     90+
     91     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%1$.5i");
     92-        // tinyformat accepts almost any "type" spec, even '%', or '_', or '\n'.
     93+    BOOST_CHECK_THROW(tfm::format("%1$.5i"), tfm::format_error);
     94+    tfm::format("%1$.5i", 1);
     95+
     96+    ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%#x");
     97+    BOOST_CHECK_THROW(tfm::format("%#x"), tfm::format_error);
     98+    tfm::format("%#x", 1);
     99+
    100     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%123%");
    101+    BOOST_CHECK_THROW(tfm::format("%123%"), tfm::format_error);
    102+    tfm::format("%123%", 1);
    103+
    104     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%123%s");
    105+    BOOST_CHECK_THROW(tfm::format("%123%s"), tfm::format_error);
    106+    tfm::format("%123%s", 1);
    107+
    108     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%_");
    109+    BOOST_CHECK_THROW(tfm::format("%_"), tfm::format_error);
    110+    tfm::format("%_", 1);
    111+
    112     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%\n");
    113-        return true; // All checks above compiled and passed
    114-    }());
    115-    static_assert([] {
    116+    BOOST_CHECK_THROW(tfm::format("%\n"), tfm::format_error);
    117+    tfm::format("%\n", 1);
    118+}
    119+
    120 // The `*` specifier behavior is unsupported and can lead to runtime
    121 // errors when used in a ConstevalFormatString. Please refer to the
    122 // note in the ConstevalFormatString docs.
    123+BOOST_AUTO_TEST_CASE(ConstevalFormatString_IncorrectCounts)
    124+{
    125+    int a{}, b{}, c{};
    126+
    127     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%*c");
    128+    BOOST_CHECK_THROW(tfm::format("%*c"), tfm::format_error);
    129+    BOOST_CHECK_THROW(tfm::format("%*c", a), tfm::format_error);
    130+    tfm::format("%*c", a, b);
    131+
    132     ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%2$*3$d");
    133+    BOOST_CHECK_THROW(tfm::format("%2$*3$d"), tfm::format_error);
    134+    BOOST_CHECK_THROW(tfm::format("%2$*3$d", a), tfm::format_error);
    135+    BOOST_CHECK_THROW(tfm::format("%2$*3$d", a, b), tfm::format_error);
    136+    tfm::format("%2$*3$d", a, b, c);
    137+
    138     ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%.*f");
    139-        return true; // All checks above compiled and passed
    140-    }());
    141+    BOOST_CHECK_THROW(tfm::format("%.*f"), tfm::format_error);
    142+    BOOST_CHECK_THROW(tfm::format("%.*f", a), tfm::format_error);
    143+    tfm::format("%.*f", a, b);
    144+}
    145 
    146-    // Negative checks at runtime
    147+// Executed at runtime as exceptions are not allowed at compile time.
    148+BOOST_AUTO_TEST_CASE(ConstevalFormatString_NegativeChecks)
    149+{
    150     using ErrType = const char*;
    151 
    152     auto check_mix{[](const ErrType& str) { return str == std::string_view{"Format specifiers must be all positional or all non-positional!"}; }};
    153     BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%s%1$s"), ErrType, check_mix);
    154 
    155-    auto check_num_spec{[](const ErrType& str) { return str == std::string_view{"Format specifier count must match the argument count!"}; }};
    156-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers(""), ErrType, check_num_spec);
    157-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%s"), ErrType, check_num_spec);
    158-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%s"), ErrType, check_num_spec);
    159-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<0>::Detail_CheckNumFormatSpecifiers("%1$s"), ErrType, check_num_spec);
    160-    BOOST_CHECK_EXCEPTION(ConstevalFormatString<2>::Detail_CheckNumFormatSpecifiers("%1$s"), ErrType, check_num_spec);
    161-
    162     auto check_0_pos{[](const ErrType& str) { return str == std::string_view{"Positional format specifier must have position of at least 1"}; }};
    163     BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%$s"), ErrType, check_0_pos);
    164     BOOST_CHECK_EXCEPTION(ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%$"), ErrType, check_0_pos);
    

    Changes compared to fa26462e95291652b4021d91b014655f678149e8

    • Interlaced tinyformat calls to prove and document parity between compile time and runtime checks. (The purpose isn’t to to test tinyformat). Now using the less verbose BOOST_CHECK_THROW for tinyformat instead of BOOST_CHECK_EXCEPTION.
    • Breaking apart test function into multiple functions.
    • Still dropping a few negative tests, since they are proven by positive tests, even if less straightforward without the free function.

    Runtime performance comparison (+0.3ms/+1.8% longer)

    fa26462e95291652b4021d91b014655f678149e8 + diff:

    0hyperfine --warmup 100 "ctest -j 20 --test-dir build -R util_string_tests"
    1Benchmark 1: ctest -j 20 --test-dir build -R util_string_tests
    2  Time (mean ± σ):      16.8 ms ±   0.2 ms    [User: 11.4 ms, System: 6.1 ms]
    3  Range (min … max):    16.2 ms …  17.8 ms    156 runs
    

    fa26462e95291652b4021d91b014655f678149e8:

    0hyperfine --warmup 100 "ctest -j 20 --test-dir build -R util_string_tests"
    1Benchmark 1: ctest -j 20 --test-dir build -R util_string_tests
    2  Time (mean ± σ):      16.5 ms ±   0.2 ms    [User: 10.6 ms, System: 6.5 ms]
    3  Range (min … max):    16.0 ms …  17.4 ms    154 runs
    

    hodlinator commented at 8:58 am on September 12, 2024:

    Diff on top of fae8c25d07cf003df1470699df4ef475055bb885

     0diff --git a/src/test/util_string_tests.cpp b/src/test/util_string_tests.cpp
     1index 4c61632fce..ed383e9dad 100644
     2--- a/src/test/util_string_tests.cpp
     3+++ b/src/test/util_string_tests.cpp
     4@@ -17,7 +17,53 @@ template <unsigned NumArgs>
     5 inline void PassFmt(util::ConstevalFormatString<NumArgs> fmt)
     6 {
     7     decltype(fmt)::Detail_CheckNumFormatSpecifiers(fmt.fmt); // This was run at compile-time, but run it again at run-time to avoid -Wunused.
     8+
     9+    // Prove parity with tinyformat
    10+    switch (NumArgs) {
    11+    case 0:
    12+        tfm::format(fmt.fmt);
    13+        break;
    14+    case 1:
    15+        BOOST_CHECK_THROW(tfm::format(fmt.fmt), tfm::format_error);
    16+        tfm::format(fmt.fmt, "foo");
    17+        break;
    18+    case 2:
    19+        BOOST_CHECK_THROW(tfm::format(fmt.fmt), tfm::format_error);
    20+        BOOST_CHECK_THROW(tfm::format(fmt.fmt, "foo"), tfm::format_error);
    21+        tfm::format(fmt.fmt, "foo", "bar");
    22+        break;
    23+    case 12:
    24+        BOOST_CHECK_THROW(tfm::format(fmt.fmt), tfm::format_error);
    25+        BOOST_CHECK_THROW(tfm::format(fmt.fmt, "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"), tfm::format_error);
    26+        tfm::format(fmt.fmt, "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12");
    27+        break;
    28+    }
    29 }
    30+
    31+template <unsigned WrongNumArgs, unsigned CorrectArgs>
    32+inline void PassFmtIncorrect(util::ConstevalFormatString<WrongNumArgs> fmt)
    33+{
    34+    decltype(fmt)::Detail_CheckNumFormatSpecifiers(fmt.fmt); // This was run at compile-time, but run it again at run-time to avoid -Wunused.
    35+
    36+    // Disprove parity with tinyformat
    37+    static_assert(WrongNumArgs != CorrectArgs);
    38+    int a{}, b{}, c{};
    39+    switch (CorrectArgs) {
    40+    case 2:
    41+        BOOST_CHECK_THROW(tfm::format(fmt.fmt), tfm::format_error);
    42+        BOOST_CHECK_THROW(tfm::format(fmt.fmt, a), tfm::format_error);
    43+        tfm::format(fmt.fmt, a, b);
    44+        break;
    45+    case 3:
    46+        BOOST_CHECK_THROW(tfm::format(fmt.fmt), tfm::format_error);
    47+        BOOST_CHECK_THROW(tfm::format(fmt.fmt, a), tfm::format_error);
    48+        BOOST_CHECK_THROW(tfm::format(fmt.fmt, a, b), tfm::format_error);
    49+        tfm::format(fmt.fmt, a, b, c);
    50+        break;
    51+    }
    52+}
    53+
    54+// Executed at runtime as exceptions are not allowed at compile time.
    55 template <unsigned WrongNumArgs>
    56 inline void FailFmtWithError(std::string_view wrong_fmt, std::string_view error)
    57 {
    58@@ -40,7 +86,7 @@ BOOST_AUTO_TEST_CASE(ConstevalFormatString_NumSpec)
    59     PassFmt<1>("%1$s%1$s");
    60     PassFmt<2>("%2$s");
    61     PassFmt<2>("%2$s 4$s %2$s");
    62-    PassFmt<129>("%129$s 999$s %2$s");
    63+    PassFmt<12>("%12$s 999$s %2$s");
    64     PassFmt<1>("%02d");
    65     PassFmt<1>("%+2s");
    66     PassFmt<1>("%.6i");
    67@@ -58,9 +104,9 @@ BOOST_AUTO_TEST_CASE(ConstevalFormatString_NumSpec)
    68     // The `*` specifier behavior is unsupported and can lead to runtime
    69     // errors when used in a ConstevalFormatString. Please refer to the
    70     // note in the ConstevalFormatString docs.
    71-    PassFmt<1>("%*c");
    72-    PassFmt<2>("%2$*3$d");
    73-    PassFmt<1>("%.*f");
    74+    PassFmtIncorrect<1, 2>("%*c");
    75+    PassFmtIncorrect<2, 3>("%2$*3$d");
    76+    PassFmtIncorrect<1, 2>("%.*f");
    77 
    78     auto check_mix{"Format specifiers must be all positional or all non-positional!"};
    79     FailFmtWithError<1>("%s%1$s", check_mix);
    

    Changes compared to fae8c25d07cf003df1470699df4ef475055bb885

    tinyformat calls to prove and document parity/non-parity between compile time and runtime checks. (The purpose isn’t to to test tinyformat). (Using the less verbose BOOST_CHECK_THROW for tinyformat instead of BOOST_CHECK_EXCEPTION).

    Runtime performance comparison (+0.1ms/+0.6% longer)

    fae8c25d07cf003df1470699df4ef475055bb885 + diff:

    0hyperfine --warmup 100 "ctest -j 20 --test-dir build -R util_string_tests"
    1Benchmark 1: ctest -j 20 --test-dir build -R util_string_tests
    2  Time (mean ± σ):      16.8 ms ±   0.4 ms    [User: 10.9 ms, System: 6.6 ms]
    3  Range (min … max):    16.4 ms …  19.9 ms    154 runs
    

    fae8c25d07cf003df1470699df4ef475055bb885:

    0hyperfine --warmup 100 "ctest -j 20 --test-dir build -R util_string_tests"
    1Benchmark 1: ctest -j 20 --test-dir build -R util_string_tests
    2  Time (mean ± σ):      16.7 ms ±   0.3 ms    [User: 10.8 ms, System: 6.6 ms]
    3  Range (min … max):    16.2 ms …  17.8 ms    157 runs
    

    maflcko commented at 9:01 am on September 12, 2024:

    Still dropping a few negative tests, since they are proven by positive tests, even if less straightforward without the free function.

    I don’t think this is true. Without the free function, the tests are required to catch bugs, such as removing a check. E.g. see this diff that removes a check and the tests that you want to remove:

     0diff --git a/src/test/util_string_tests.cpp b/src/test/util_string_tests.cpp
     1index 4c61632fce..42facd4dc1 100644
     2--- a/src/test/util_string_tests.cpp
     3+++ b/src/test/util_string_tests.cpp
     4@@ -65,13 +65,6 @@ BOOST_AUTO_TEST_CASE(ConstevalFormatString_NumSpec)
     5     auto check_mix{"Format specifiers must be all positional or all non-positional!"};
     6     FailFmtWithError<1>("%s%1$s", check_mix);
     7 
     8-    auto check_num_spec{"Format specifier count must match the argument count!"};
     9-    FailFmtWithError<1>("", check_num_spec);
    10-    FailFmtWithError<0>("%s", check_num_spec);
    11-    FailFmtWithError<2>("%s", check_num_spec);
    12-    FailFmtWithError<0>("%1$s", check_num_spec);
    13-    FailFmtWithError<2>("%1$s", check_num_spec);
    14-
    15     auto check_0_pos{"Positional format specifier must have position of at least 1"};
    16     FailFmtWithError<1>("%$s", check_0_pos);
    17     FailFmtWithError<1>("%$", check_0_pos);
    18diff --git a/src/util/string.h b/src/util/string.h
    19index c650be601d..25f1dac4b0 100644
    20--- a/src/util/string.h
    21+++ b/src/util/string.h
    22@@ -73,8 +73,6 @@ struct ConstevalFormatString {
    23             // specifier is not checked. Parsing continues with the next '%'.
    24         }
    25         if (count_normal && count_pos) throw "Format specifiers must be all positional or all non-positional!";
    26-        unsigned count{count_normal | count_pos};
    27-        if (num_params != count) throw "Format specifier count must match the argument count!";
    28     }
    29 };
    30 
    

    (Locally, src/test/test_bitcoin -t util_string_tests passes)

    Sure, this diff looks absurd and unlikely. However, the whole point of unit tests is to catch stuff like this. Otherwise, one could remove each single unit test with the rationale that the other tests still compile and run.


    hodlinator commented at 9:06 am on September 12, 2024:

    Still dropping a few negative tests, since they are proven by positive tests, even if less straightforward without the free function.

    I don’t think this is true. Without the free function

    True, forgot that that last condition was back in the non-free function. Negative tests are not removed in latest diff.


    maflcko commented at 9:29 am on September 12, 2024:

    Thanks, the diff looks good, but would you mind if you submitted it in a follow-up? I presume you ran it locally, and it didn’t uncover any bugs in this pull request, so that is a useful signal already. However, at some point I want to wrap up with this pull request myself. There are already 150 comments, most of which are hidden by GitHub, and thus almost impossible to navigate.

    If I included your diff, someone may suggest to “reduce verbosity” by using the compiler to construct the args of a given number via std::tuple_cat(std::array<int,NumArgs>{}) and then pass it to tinyformat via TfmF("%s", std::tuple_cat(std::array<int,10>{})); where template<typename... Tt> void TfmF(const char* fmt, const std::tuple<Tt...>& t){ std::apply([fmt](const Tt&... ta){tfm::format(fmt, ta...);},t); }. (Or someone may come up with another improvement).

    So I think there can be value in your patch, but in the interest of wrapping up this pull request (which will be followed by follow-ups anyway), it may be better to leave it as-is for now. Otherwise, it will never be ready because there is always a way to come up with an idea to improve something.


    l0rinc commented at 9:32 am on September 12, 2024:

    Otherwise, it will never be ready because there is always a way to come up with an idea to improve something

    I think we’re getting closer, your patience is appreciated :)


    hodlinator commented at 9:37 am on September 12, 2024:

    Yeah, I can do it as a follow-up since by now it does not really modify your existing code that much.

    If you need to re-touch, I would appreciate if you just took this diff:

    0-    PassFmt<129>("%129$s 999$s %2$s");
    1+    PassFmt<12>("%12$s 999$s %2$s");
    

    to make the follow-up cleaner.


    maflcko commented at 10:11 am on September 12, 2024:
    I’ll leave this as-is for now. GitHub is already hiding the diff for me, so I’ll close this for now. Discussion can continue in the follow-up, which will also make it easier to follow.
  117. hodlinator commented at 8:54 pm on September 11, 2024: contributor

    Reviewed fa26462e95291652b4021d91b014655f678149e8

    git range-diff master fa7819b fa26462

    • Moved independent commit to the beginning, thanks!
    • Added some comments.

    I really like that this is being worked upon and want to help us find the best trade-offs in settling on a solution.

  118. DrahtBot requested review from hodlinator on Sep 11, 2024
  119. maflcko force-pushed on Sep 12, 2024
  120. in src/test/util_string_tests.cpp:19 in fae8c25d07 outdated
    14+// args directly. NormallyPassFmt<sizeof...(Args)> would be
    15+// used.
    16+template <unsigned NumArgs>
    17+inline void PassFmt(util::ConstevalFormatString<NumArgs> fmt)
    18+{
    19+    decltype(fmt)::Detail_CheckNumFormatSpecifiers(fmt.fmt); // This was run at compile-time, but run it again at run-time to avoid -Wunused.
    


    l0rinc commented at 8:46 am on September 12, 2024:

    Some tense changes could clarify the exact behavior:

    0    decltype(fmt)::Detail_CheckNumFormatSpecifiers(fmt.fmt); // This was already executed at compile-time, but is executed again at run-time to avoid -Wunused.
    

    maflcko commented at 12:15 pm on September 12, 2024:
    Thanks, fixed!
  121. in src/test/util_string_tests.cpp:15 in fae8c25d07 outdated
    10+
    11+BOOST_AUTO_TEST_SUITE(util_string_tests)
    12+
    13+// Helper to allow compile-time sanity checks while providing the number of
    14+// args directly. NormallyPassFmt<sizeof...(Args)> would be
    15+// used.
    


    l0rinc commented at 8:46 am on September 12, 2024:

    What does NormallyPassFmt<sizeof...(Args)> would be used. mean exactly?

    0// args directly. NormallyPassFmt<sizeof...(Args)> would be used.
    

    maflcko commented at 9:34 am on September 12, 2024:
    Ah, it should say “Normally PassFmt<sizeof…(Args)> would be used.” (in code)

    maflcko commented at 12:15 pm on September 12, 2024:
    Thanks, fixed typo
  122. in src/test/util_string_tests.cpp:54 in fae8c25d07 outdated
    49+    PassFmt<1>("%1$5i");
    50+    PassFmt<1>("%1$-5i");
    51+    PassFmt<1>("%1$.5i");
    52+    // tinyformat accepts almost any "type" spec, even '%', or '_', or '\n'.
    53+    PassFmt<1>("%123%");
    54+    PassFmt<1>("%123%s");
    


    l0rinc commented at 9:26 am on September 12, 2024:

    This seems like an error to me, shouldn’t trailing % be invalid? It fails for "%123 %", shouldn’t it fail for these as well? The reason for the misclassification is #30546 (review), i.e. that when detecting a % we skip the if (++it >= str.end()) check.

    The example in https://github.com/bitcoin/bitcoin/pull/30546/files#r1754643534 detects this as an error


    maflcko commented at 9:41 am on September 12, 2024:

    Please take a look at the line above your comment. It says tinyformat accepts almost any "type" spec, even '%', or '_', or '\n'.

    If you don’t believe the comment, and the compile time check, you can try for yourself: https://godbolt.org/z/n6ToecPKT


    l0rinc commented at 9:48 am on September 12, 2024:
    I have of course seen the comment and believe it, but I don’t think we should mandate "%123%" being correct (i.e. not just allowing it, but adding test for it), while disallowing "%123 %", "%" and "%1$".

    maflcko commented at 10:07 am on September 12, 2024:

    while disallowing "%123 %", "%" and "%1$".

    They are invalid and will cause a runtime error, which is the reason why they are disallowed and the whole point of this pull request.

    Apart from that, I don’t think it makes sense to spend a massive amount of time on format strings that are POSIX-invalid, but currently happen to be tinyformat-valid. No one should be using them, but if someone uses them it won’t cause a runtime error but a harmless formatting error (or no error at all). But anyone can already write a meaningless format string and no compile-time check is going to prevent them from doing so. (Code review does)


    l0rinc commented at 10:12 am on September 12, 2024:
    I understand that, but why are we making sure that the validation allows nonsensical values?

    maflcko commented at 10:17 am on September 12, 2024:

    The goal is not to create a full POSIX-validator here, so there will always be an example where something is “invalid”. The goal is to catch tinyformat-invalid and POSIX-invalid concerning the argument count.

    Apart from that, I don’t think it makes sense to spend a massive amount of time on format strings that are POSIX-invalid, but currently happen to be tinyformat-valid. No one should be using them, but if someone uses them it won’t cause a runtime error but a harmless formatting error (or no error at all). But anyone can already write a meaningless format string and no compile-time check is going to prevent them from doing so. (Code review does)


    maflcko commented at 10:17 am on September 12, 2024:
    Closing for now. However, no one is holding anyone back from creating a follow-up.

    stickies-v commented at 11:40 am on September 12, 2024:
    +1 for focusing on run-time errors, formatting errors aren’t worth spending much time on imo.
  123. in src/test/util_string_tests.cpp:26 in fae8c25d07 outdated
    21+template <unsigned WrongNumArgs>
    22+inline void FailFmtWithError(std::string_view wrong_fmt, std::string_view error)
    23+{
    24+    using ErrType = const char*;
    25+    auto check_throw{[error](const ErrType& str) { return str == error; }};
    26+    BOOST_CHECK_EXCEPTION(util::ConstevalFormatString<WrongNumArgs>::Detail_CheckNumFormatSpecifiers(wrong_fmt), ErrType, check_throw);
    


    l0rinc commented at 9:28 am on September 12, 2024:

    We could standardize this by extending HasReason in setup_common.h:

    0class HasReason
    1{
    2public:
    3    explicit HasReason(const std::string_view& reason) : m_reason(reason) {}
    4    bool operator()(std::string_view s) const { return s.find(m_reason) != std::string_view::npos; }
    5    bool operator()(const std::exception& e) const { return (*this)(e.what()); }
    6
    7private:
    8    const std::string_view m_reason;
    9};
    

    resulting in

    0BOOST_CHECK_EXCEPTION(util::ConstevalFormatString<WrongNumArgs>::Detail_CheckNumFormatSpecifiers(wrong_fmt), const char*, HasReason(error));
    

    maflcko commented at 9:44 am on September 12, 2024:
    I don’t want to modify HasReason here. There are already more than 150 comments and increasing the scope won’t help. Maybe in a follow-up, or separate pull?

    l0rinc commented at 9:49 am on September 12, 2024:
    Ok, though I doubt it will get merged if it’s in a separate PR…

    maflcko commented at 10:09 am on September 12, 2024:
    Well, I promise to review it. If you don’t believe yourself that it is going to be merged, then maybe it isn’t worth it in the first place. Closing for now. This can be a follow-up, or separate pull.

    l0rinc commented at 10:12 am on September 12, 2024:
    Ok, looking forward to the follow-ups

    l0rinc commented at 11:42 am on September 18, 2024:
  124. l0rinc changes_requested
  125. l0rinc commented at 9:31 am on September 12, 2024: contributor
    It seems to me we still have a remaining trailing formatter bug that we should fix before merging
  126. l0rinc changes_requested
  127. l0rinc commented at 9:31 am on September 12, 2024: contributor
    It seems to me we still have a remaining trailing formatter bug that we should fix before merging
  128. in src/util/string.h:64 in fae8c25d07 outdated
    59+                maybe_num += *it - '0';
    60+                ++it;
    61+            };
    62+
    63+            if (*it == '$') {
    64+                // Positional specifier, like %8$
    


    stickies-v commented at 11:27 am on September 12, 2024:

    nit: %8$ is not a valid specifier

    0                // Positional specifier, like %8$s
    
  129. in src/test/util_string_tests.cpp:65 in fae8c25d07 outdated
    60+    // note in the ConstevalFormatString docs.
    61+    PassFmt<1>("%*c");
    62+    PassFmt<2>("%2$*3$d");
    63+    PassFmt<1>("%.*f");
    64+
    65+    auto check_mix{"Format specifiers must be all positional or all non-positional!"};
    


    stickies-v commented at 11:37 am on September 12, 2024:

    nit: in this new new structure, err_mix makes more sense than check_mix

     0diff --git a/src/test/util_string_tests.cpp b/src/test/util_string_tests.cpp
     1index 4c61632fce..f398404d75 100644
     2--- a/src/test/util_string_tests.cpp
     3+++ b/src/test/util_string_tests.cpp
     4@@ -62,25 +62,25 @@ BOOST_AUTO_TEST_CASE(ConstevalFormatString_NumSpec)
     5     PassFmt<2>("%2$*3$d");
     6     PassFmt<1>("%.*f");
     7 
     8-    auto check_mix{"Format specifiers must be all positional or all non-positional!"};
     9-    FailFmtWithError<1>("%s%1$s", check_mix);
    10+    auto err_mix{"Format specifiers must be all positional or all non-positional!"};
    11+    FailFmtWithError<1>("%s%1$s", err_mix);
    12 
    13-    auto check_num_spec{"Format specifier count must match the argument count!"};
    14-    FailFmtWithError<1>("", check_num_spec);
    15-    FailFmtWithError<0>("%s", check_num_spec);
    16-    FailFmtWithError<2>("%s", check_num_spec);
    17-    FailFmtWithError<0>("%1$s", check_num_spec);
    18-    FailFmtWithError<2>("%1$s", check_num_spec);
    19+    auto err_num_spec{"Format specifier count must match the argument count!"};
    20+    FailFmtWithError<1>("", err_num_spec);
    21+    FailFmtWithError<0>("%s", err_num_spec);
    22+    FailFmtWithError<2>("%s", err_num_spec);
    23+    FailFmtWithError<0>("%1$s", err_num_spec);
    24+    FailFmtWithError<2>("%1$s", err_num_spec);
    25 
    26-    auto check_0_pos{"Positional format specifier must have position of at least 1"};
    27-    FailFmtWithError<1>("%$s", check_0_pos);
    28-    FailFmtWithError<1>("%$", check_0_pos);
    29-    FailFmtWithError<0>("%0$", check_0_pos);
    30-    FailFmtWithError<0>("%0$s", check_0_pos);
    31+    auto err_0_pos{"Positional format specifier must have position of at least 1"};
    32+    FailFmtWithError<1>("%$s", err_0_pos);
    33+    FailFmtWithError<1>("%$", err_0_pos);
    34+    FailFmtWithError<0>("%0$", err_0_pos);
    35+    FailFmtWithError<0>("%0$s", err_0_pos);
    36 
    37-    auto check_term{"Format specifier incorrectly terminated by end of string"};
    38-    FailFmtWithError<1>("%", check_term);
    39-    FailFmtWithError<1>("%1$", check_term);
    40+    auto err_term{"Format specifier incorrectly terminated by end of string"};
    41+    FailFmtWithError<1>("%", err_term);
    42+    FailFmtWithError<1>("%1$", err_term);
    43 }
    44 
    45 BOOST_AUTO_TEST_SUITE_END()
    
  130. maflcko force-pushed on Sep 12, 2024
  131. stickies-v approved
  132. stickies-v commented at 12:45 pm on September 12, 2024: contributor

    ACK fa37d9b0fd0b8f1b23163887618b2fa1a817d596

    I like the new test structure and added documentation, thanks! Left some non-blocking nits, only if you need to force-push.

  133. DrahtBot requested review from l0rinc on Sep 12, 2024
  134. DrahtBot requested review from ryanofsky on Sep 12, 2024
  135. util: Add ConstevalFormatString
    The type is used to wrap a format string once it has been compile-time
    checked to contain the right number of format specifiers.
    faa62c0112
  136. util: Use compile-time check for FatalErrorf fa7087b896
  137. util: Use compile-time check for LogConnectFailure fa5bc450d5
  138. maflcko force-pushed on Sep 12, 2024
  139. stickies-v commented at 1:36 pm on September 12, 2024: contributor
    re-ACK fa5bc450d5d4c1d2daf7b205f1678402c3c21678 No changes except variable naming and small doc update.
  140. hodlinator approved
  141. hodlinator commented at 1:59 pm on September 12, 2024: contributor

    ACK fa5bc450d5d4c1d2daf7b205f1678402c3c21678

    git range-diff master fae8c25 fa5bc45

    Only minor comment adjustments and clearer variable names since last reviewed commits.

    Since GitHub is troublesome, would prefer this change if you re-touch:

    0-    PassFmt<129>("%129$s 999$s %2$s");
    1+    PassFmt<12>("%12$s 999$s %2$s");
    

    Makes it more reasonable to add comparison to tinyformat in follow-up with 12 args rather than 129, but still more than 1 digit.

  142. l0rinc commented at 2:17 pm on September 12, 2024: contributor

    ACK fa5bc450d5d4c1d2daf7b205f1678402c3c21678

    The change progressed a lot from its first version and based on the enthusiasm among reviewers I’m looking forward to continuing these in follow-up PRs.

  143. ryanofsky approved
  144. ryanofsky commented at 3:23 pm on September 12, 2024: contributor

    Code review ACK fa5bc450d5d4c1d2daf7b205f1678402c3c21678

    Just test code cleanup since last review

  145. ryanofsky merged this on Sep 12, 2024
  146. ryanofsky closed this on Sep 12, 2024

  147. maflcko deleted the branch on Sep 12, 2024
  148. TheCharlatan referenced this in commit 69282950aa on Sep 16, 2024
  149. TheCharlatan referenced this in commit dfe0cd4ec5 on Sep 16, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me