util/strencodings: guard SAFE_CHARS index and pre-reserve result #33396

pull forkfury wants to merge 1 commits into bitcoin:master from forkfury:ff changing 1 files +9 −1
  1. forkfury commented at 9:00 pm on September 15, 2025: none
    • Add defensive bounds check for rule parameter in SanitizeString to prevent out-of-range access to SAFE_CHARS array
    • Pre-reserve result capacity and cache reference to allowed charset to avoid repeated lookups
    • No behavior change for valid inputs; improves robustness and minor performance
  2. Update strencodings.cpp abed6fb21e
  3. in src/util/strencodings.cpp:30 in abed6fb21e
    29@@ -30,8 +30,16 @@ static const std::string SAFE_CHARS[] =
    30 std::string SanitizeString(std::string_view str, int rule)
    


    l0rinc commented at 0:51 am on September 16, 2025:
    the rule is an enum in reality, what’s the reason for this addition?

    furszy commented at 0:57 am on September 16, 2025:
    what about changing the SafeChars enum to be an enum class instead? (enums auto-increment their value so you would just need to cast it to int here)
  4. DrahtBot commented at 0:58 am on September 16, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33396.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

  5. in src/util/strencodings.cpp:40 in abed6fb21e
    35+    if (rule < 0 || static_cast<size_t>(rule) >= safe_chars_count) {
    36+        return result;
    37+    }
    38+
    39+    result.reserve(str.size());
    40+    const std::string& allowed = SAFE_CHARS[rule];
    


    l0rinc commented at 1:09 am on September 16, 2025:

    There’s no point in extracting this, the compilers are already smart enough to optimize this, see: https://godbolt.org/z/9rej918W7 (compare the SanitizeString_Direct and SanitizeString_Reference assemblies, they’re the same).

    But the string reserve that still make sense and we can change the parameter to the enum instead, e.g.

     0diff --git a/src/util/strencodings.cpp b/src/util/strencodings.cpp
     1index 289f49d46a..d1b99be943 100644
     2--- a/src/util/strencodings.cpp
     3+++ b/src/util/strencodings.cpp
     4@@ -27,9 +27,10 @@ static const std::string SAFE_CHARS[] =
     5     CHARS_ALPHA_NUM + "!*'();:@&=+$,/?#[]-_.~%", // SAFE_CHARS_URI
     6 };
     7 
     8-std::string SanitizeString(std::string_view str, int rule)
     9+std::string SanitizeString(std::string_view str, SafeChars rule)
    10 {
    11     std::string result;
    12+    result.reserve(str.size());
    13     for (char c : str) {
    14         if (SAFE_CHARS[rule].find(c) != std::string::npos) {
    15             result.push_back(c);
    16diff --git a/src/util/strencodings.h b/src/util/strencodings.h
    17index 0106385804..b13ff22713 100644
    18--- a/src/util/strencodings.h
    19+++ b/src/util/strencodings.h
    20@@ -27,12 +27,12 @@
    21 #include <vector>
    22 
    23 /** Used by SanitizeString() */
    24-enum SafeChars
    25+enum SafeChars: uint8_t
    26 {
    27-    SAFE_CHARS_DEFAULT, //!< The full set of allowed chars
    28-    SAFE_CHARS_UA_COMMENT, //!< BIP-0014 subset
    29-    SAFE_CHARS_FILENAME, //!< Chars allowed in filenames
    30-    SAFE_CHARS_URI, //!< Chars allowed in URIs (RFC 3986)
    31+    SAFE_CHARS_DEFAULT = 0,    //!< The full set of allowed chars
    32+    SAFE_CHARS_UA_COMMENT = 1, //!< BIP-0014 subset
    33+    SAFE_CHARS_FILENAME = 2,   //!< Chars allowed in filenames
    34+    SAFE_CHARS_URI = 3,        //!< Chars allowed in URIs (RFC 3986)
    35 };
    36 
    37 /**
    38@@ -59,7 +59,7 @@ enum class ByteUnit : uint64_t {
    39 * [@param](/bitcoin-bitcoin/contributor/param/)[in] rule   The set of safe chars to choose (default: least restrictive)
    40 * [@return](/bitcoin-bitcoin/contributor/return/)           A new string without unsafe chars
    41 */
    42-std::string SanitizeString(std::string_view str, int rule = SAFE_CHARS_DEFAULT);
    43+std::string SanitizeString(std::string_view str, SafeChars rule = SAFE_CHARS_DEFAULT);
    44 /** Parse the hex string into bytes (uint8_t or std::byte). Ignores whitespace. Returns nullopt on invalid input. */
    45 template <typename Byte = std::byte>
    46 std::optional<std::vector<Byte>> TryParseHex(std::string_view str);
    
  6. l0rinc changes_requested
  7. l0rinc commented at 1:10 am on September 16, 2025: contributor
    Not sure if this is an LLM generated patch or not, it’s clear from the usages that the int rule doesn’t need bounds check - but the reserve does make sense and we can swap the int param to the enum which could make sense.
  8. fanquake commented at 11:02 am on September 16, 2025: member
    Thanks, however we can leave this code as-is.
  9. fanquake closed this on Sep 16, 2025


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-09-26 15:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me