refactor: Remove ParseHex(const char*) from public interface #24751

pull MarcoFalke wants to merge 2 commits into bitcoin:master from MarcoFalke:2204-no-pointer-💴 changing 3 files +19 −14
  1. MarcoFalke commented at 6:37 AM on April 4, 2022: member

    This is a refactor (with added test), needed for:

    • #23595
    • #18061
    • Some bugfixes I am working on, but are blocked on this
  2. test: Add test for embedded null in hex string
    Also, fix style in the corresponding function. The style change can be
    reviewed with "--word-diff-regex=."
    ffffa0d77a
  3. refactor: Remove ParseHex(const char*) from public interface fad1b614b5
  4. MarcoFalke added the label Refactoring on Apr 4, 2022
  5. shaavan approved
  6. shaavan commented at 7:49 AM on April 4, 2022: contributor

    Code Review ACK fad1b614b55e0373ba73949f30a3bb360f6db25a

    • This PR:
      1. Add test for ParseHex function. Specifically, it adds tests for an embedded null (\0).
      2. Removes a redundant version of the ParseHex function cleaning the code.

    These two commits are separated from #23595. While both the commits are refactoring changes, the changes done in the second commit are commonly shared between the PRs #23595 and #18061.

    I agree with separating these two commits in a separate PR because:

    1. It removes the need of repeated efforts on a similar change in two different PRs.
    2. It separates refactoring changes from other subsequent changes done in the PRs.
    3. It makes the PRs more focused on a specific change, which aligns with the philosophy of atomic transitions.

    The updated code is clean and easy to understand and reason with, and I think it can be merged.

  7. in src/test/util_tests.cpp:133 in fad1b614b5
     129 | @@ -130,6 +130,14 @@ BOOST_AUTO_TEST_CASE(util_ParseHex)
     130 |      result = ParseHex(" 89 34 56 78");
     131 |      BOOST_CHECK(result.size() == 4 && result[0] == 0x89 && result[1] == 0x34 && result[2] == 0x56 && result[3] == 0x78);
     132 |  
     133 | +    // Embedded null is treated as end
    


    laanwj commented at 9:28 AM on April 4, 2022:

    I hope this is temporary I think this is very bad behavior, in general. Embedded null should be regarded as invalid characters like any other.


    MarcoFalke commented at 9:42 AM on April 4, 2022:

    Yes, I was planning to fix this bug after #https://github.com/bitcoin/bitcoin/pull/23595, but for now I am not going to change behaviour in this refactoring pull.


    laanwj commented at 9:44 AM on April 4, 2022:

    Agree. I don't mean that, it's good that attention is brought to this in this way.

  8. in src/util/strencodings.h:58 in fad1b614b5
      53 | @@ -54,7 +54,7 @@ enum class ByteUnit : uint64_t {
      54 |  * @return           A new string without unsafe chars
      55 |  */
      56 |  std::string SanitizeString(const std::string& str, int rule = SAFE_CHARS_DEFAULT);
      57 | -std::vector<unsigned char> ParseHex(const char* psz);
      58 | +/** Parse the hex string into bytes. Ignores whitespace. */
      59 |  std::vector<unsigned char> ParseHex(const std::string& str);
    


    laanwj commented at 9:30 AM on April 4, 2022:

    This is a pessimization (causes a allocation and copy) if the passed-in parameter is actually a const char *. Maybe switch to a span or string_view? Or is this never the case?


    MarcoFalke commented at 9:47 AM on April 4, 2022:

    There are a few places that use the pointer interface. However, they are not performance critical:

    • Unit tests
    • Calls, that are only done once (genesis block, other chain params, ...)
    • In net processing to generate the alert message

    I think for none of these performance matter, because they are either (1) small strings, in which case a copy doesn't matter, (2) used in tests or otherwise places where performance isn't needed.

    The places that are affected (RPC mostly), already use std::string to pass around hex stuff. For them this change has no effect, but likely #https://github.com/bitcoin/bitcoin/pull/18061 helps here. For example, when hex-decoding a 1-MB block or so.


    MarcoFalke commented at 9:50 AM on April 4, 2022:

    I think switching to string_view could still make sense, but this requires a re-write of the parse-algorithm, as string view lacks the trailing null-char, which the current algorithm relies on. (There is no std::string_view::c_str() method)


    laanwj commented at 9:50 AM on April 4, 2022:

    OK. I still think being able to pass in a memory area instead of a full, allocated string is a conceptually useful interface. const char* is just the wrong way to do it.


    laanwj commented at 9:53 AM on April 4, 2022:

    I think switching to string_view could still make sense, but this requires a re-write of the parse-algorithm,

    Fine, yes that may be too much for a refactor like this, let's not do it in this PR then (might as well solve that at the same time as the trailing null issue).


    MarcoFalke commented at 2:53 PM on April 4, 2022:

    (unresolved conversation, to avoid hiding it)

  9. laanwj commented at 9:37 AM on April 4, 2022: member

    Concept ACK on getting rid of const char * string interfaces.

    Code review ACK fad1b614b55e0373ba73949f30a3bb360f6db25a

  10. vincenzopalazzo approved
  11. sipa commented at 2:37 PM on April 4, 2022: member

    Using std::string_view would be ideal here, as constructing one of those from either std::string or const char* is efficient, and without allocation overhead.

  12. MarcoFalke commented at 2:54 PM on April 4, 2022: member

    @sipa This is ideal, but non-trivial IIUC. See the previous code discussion.

  13. sipa commented at 3:04 PM on April 4, 2022: member
    diff --git a/src/util/strencodings.cpp b/src/util/strencodings.cpp
    index 940fa90da2..691607caba 100644
    --- a/src/util/strencodings.cpp
    +++ b/src/util/strencodings.cpp
    @@ -81,32 +81,24 @@ bool IsHexNumber(const std::string& str)
         return (str.size() > starting_location);
     }
     
    -std::vector<unsigned char> ParseHex(const char* psz)
    +std::vector<unsigned char> ParseHex(std::string_view str)
     {
         // convert hex dump to vector
         std::vector<unsigned char> vch;
    -    while (true)
    -    {
    -        while (IsSpace(*psz))
    -            psz++;
    -        signed char c = HexDigit(*psz++);
    -        if (c == (signed char)-1)
    -            break;
    -        auto n{uint8_t(c << 4)};
    -        c = HexDigit(*psz++);
    -        if (c == (signed char)-1)
    -            break;
    -        n |= c;
    -        vch.push_back(n);
    +    auto it = str.begin();
    +    while (it != str.end() && it + 1 != str.end()) {
    +        if (IsSpace(*it)) {
    +            ++it;
    +            continue;
    +        }
    +        auto c1 = HexDigit(*(it++));
    +        auto c2 = HexDigit(*(it++));
    +        if (c1 == (signed char)-1 || c2 == (signed char)-1) break;
    +        vch.push_back(uint8_t(c1 << 4) | c2);
         }
         return vch;
     }
     
    -std::vector<unsigned char> ParseHex(const std::string& str)
    -{
    -    return ParseHex(str.c_str());
    -}
    -
     void SplitHostPort(std::string in, uint16_t& portOut, std::string& hostOut)
     {
         size_t colon = in.find_last_of(':');
    diff --git a/src/util/strencodings.h b/src/util/strencodings.h
    index 1f83fa3ffa..0e72466fc3 100644
    --- a/src/util/strencodings.h
    +++ b/src/util/strencodings.h
    @@ -55,8 +55,7 @@ enum class ByteUnit : uint64_t {
     * [@return](/bitcoin-bitcoin/contributor/return/)           A new string without unsafe chars
     */
     std::string SanitizeString(const std::string& str, int rule = SAFE_CHARS_DEFAULT);
    -std::vector<unsigned char> ParseHex(const char* psz);
    -std::vector<unsigned char> ParseHex(const std::string& str);
    +std::vector<unsigned char> ParseHex(std::string_view str);
     signed char HexDigit(char c);
     /* Returns true if each character in str is a hex character, and has an even
      * number of hex digits.*/
    

    ?

  14. MarcoFalke commented at 3:07 PM on April 4, 2022: member

    Happy to review, if you create a pull. With "trivial" I meant that this is not a one-line fix, but requires reworking the algorithm.

  15. sipa commented at 3:08 PM on April 4, 2022: member

    On it.

  16. sipa commented at 7:21 PM on April 4, 2022: member

    See #24764.

  17. DrahtBot commented at 1:04 AM on April 5, 2022: member

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #24764 (Modernize util/strencodings and util/string: string_view and optional by sipa)
    • #22087 (Validate port-options by amadeuszpawlik)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  18. MarcoFalke commented at 6:47 AM on April 5, 2022: member

    Thanks, closing for now.

  19. MarcoFalke closed this on Apr 5, 2022

  20. MarcoFalke deleted the branch on Apr 5, 2022
  21. DrahtBot locked this on Apr 5, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-17 06:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me