util/strencodings: guard SAFE_CHARS index and pre-reserve result #33396

pull forkfury wants to merge 1 commits into bitcoin:master from forkfury:ff changing 1 files +9 −1
  1. forkfury commented at 9:00 PM on September 15, 2025: none
    • Add defensive bounds check for rule parameter in SanitizeString to prevent out-of-range access to SAFE_CHARS array
    • Pre-reserve result capacity and cache reference to allowed charset to avoid repeated lookups
    • No behavior change for valid inputs; improves robustness and minor performance
  2. Update strencodings.cpp abed6fb21e
  3. in src/util/strencodings.cpp:30 in abed6fb21e
      29 | @@ -30,8 +30,16 @@ static const std::string SAFE_CHARS[] =
      30 |  std::string SanitizeString(std::string_view str, int rule)
    


    l0rinc commented at 12:51 AM on September 16, 2025:

    the rule is an enum in reality, what's the reason for this addition?


    furszy commented at 12:57 AM on September 16, 2025:

    what about changing the SafeChars enum to be an enum class instead? (enums auto-increment their value so you would just need to cast it to int here)

  4. DrahtBot commented at 12:58 AM on September 16, 2025: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33396.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

  5. in src/util/strencodings.cpp:40 in abed6fb21e
      35 | +    if (rule < 0 || static_cast<size_t>(rule) >= safe_chars_count) {
      36 | +        return result;
      37 | +    }
      38 | +
      39 | +    result.reserve(str.size());
      40 | +    const std::string& allowed = SAFE_CHARS[rule];
    


    l0rinc commented at 1:09 AM on September 16, 2025:

    There's no point in extracting this, the compilers are already smart enough to optimize this, see: https://godbolt.org/z/9rej918W7 (compare the SanitizeString_Direct and SanitizeString_Reference assemblies, they're the same).

    But the string reserve that still make sense and we can change the parameter to the enum instead, e.g.

    diff --git a/src/util/strencodings.cpp b/src/util/strencodings.cpp
    index 289f49d46a..d1b99be943 100644
    --- a/src/util/strencodings.cpp
    +++ b/src/util/strencodings.cpp
    @@ -27,9 +27,10 @@ static const std::string SAFE_CHARS[] =
         CHARS_ALPHA_NUM + "!*'();:@&=+$,/?#[]-_.~%", // SAFE_CHARS_URI
     };
     
    -std::string SanitizeString(std::string_view str, int rule)
    +std::string SanitizeString(std::string_view str, SafeChars rule)
     {
         std::string result;
    +    result.reserve(str.size());
         for (char c : str) {
             if (SAFE_CHARS[rule].find(c) != std::string::npos) {
                 result.push_back(c);
    diff --git a/src/util/strencodings.h b/src/util/strencodings.h
    index 0106385804..b13ff22713 100644
    --- a/src/util/strencodings.h
    +++ b/src/util/strencodings.h
    @@ -27,12 +27,12 @@
     #include <vector>
     
     /** Used by SanitizeString() */
    -enum SafeChars
    +enum SafeChars: uint8_t
     {
    -    SAFE_CHARS_DEFAULT, //!< The full set of allowed chars
    -    SAFE_CHARS_UA_COMMENT, //!< BIP-0014 subset
    -    SAFE_CHARS_FILENAME, //!< Chars allowed in filenames
    -    SAFE_CHARS_URI, //!< Chars allowed in URIs (RFC 3986)
    +    SAFE_CHARS_DEFAULT = 0,    //!< The full set of allowed chars
    +    SAFE_CHARS_UA_COMMENT = 1, //!< BIP-0014 subset
    +    SAFE_CHARS_FILENAME = 2,   //!< Chars allowed in filenames
    +    SAFE_CHARS_URI = 3,        //!< Chars allowed in URIs (RFC 3986)
     };
     
     /**
    @@ -59,7 +59,7 @@ enum class ByteUnit : uint64_t {
     * [@param](/bitcoin-bitcoin/contributor/param/)[in] rule   The set of safe chars to choose (default: least restrictive)
     * [@return](/bitcoin-bitcoin/contributor/return/)           A new string without unsafe chars
     */
    -std::string SanitizeString(std::string_view str, int rule = SAFE_CHARS_DEFAULT);
    +std::string SanitizeString(std::string_view str, SafeChars rule = SAFE_CHARS_DEFAULT);
     /** Parse the hex string into bytes (uint8_t or std::byte). Ignores whitespace. Returns nullopt on invalid input. */
     template <typename Byte = std::byte>
     std::optional<std::vector<Byte>> TryParseHex(std::string_view str);
    
  6. l0rinc changes_requested
  7. l0rinc commented at 1:10 AM on September 16, 2025: contributor

    Not sure if this is an LLM generated patch or not, it's clear from the usages that the int rule doesn't need bounds check - but the reserve does make sense and we can swap the int param to the enum which could make sense.

  8. fanquake commented at 11:02 AM on September 16, 2025: member

    Thanks, however we can leave this code as-is.

  9. fanquake closed this on Sep 16, 2025


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-24 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me