This is a refactor (with added test), needed for:
refactor: Remove ParseHex(const char*) from public interface #24751
pull MarcoFalke wants to merge 2 commits into bitcoin:master from MarcoFalke:2204-no-pointer-💴 changing 3 files +19 −14-
MarcoFalke commented at 6:37 AM on April 4, 2022: member
-
ffffa0d77a
test: Add test for embedded null in hex string
Also, fix style in the corresponding function. The style change can be reviewed with "--word-diff-regex=."
-
refactor: Remove ParseHex(const char*) from public interface fad1b614b5
- MarcoFalke added the label Refactoring on Apr 4, 2022
- shaavan approved
-
shaavan commented at 7:49 AM on April 4, 2022: contributor
Code Review ACK fad1b614b55e0373ba73949f30a3bb360f6db25a
- This PR:
- Add test for ParseHex function. Specifically, it adds tests for an embedded null (
\0). - Removes a redundant version of the ParseHex function cleaning the code.
- Add test for ParseHex function. Specifically, it adds tests for an embedded null (
These two commits are separated from #23595. While both the commits are refactoring changes, the changes done in the second commit are commonly shared between the PRs #23595 and #18061.
I agree with separating these two commits in a separate PR because:
- It removes the need of repeated efforts on a similar change in two different PRs.
- It separates refactoring changes from other subsequent changes done in the PRs.
- It makes the PRs more focused on a specific change, which aligns with the philosophy of atomic transitions.
The updated code is clean and easy to understand and reason with, and I think it can be merged.
- This PR:
-
in src/test/util_tests.cpp:133 in fad1b614b5
129 | @@ -130,6 +130,14 @@ BOOST_AUTO_TEST_CASE(util_ParseHex) 130 | result = ParseHex(" 89 34 56 78"); 131 | BOOST_CHECK(result.size() == 4 && result[0] == 0x89 && result[1] == 0x34 && result[2] == 0x56 && result[3] == 0x78); 132 | 133 | + // Embedded null is treated as end
laanwj commented at 9:28 AM on April 4, 2022:I hope this is temporary I think this is very bad behavior, in general. Embedded null should be regarded as invalid characters like any other.
MarcoFalke commented at 9:42 AM on April 4, 2022:Yes, I was planning to fix this bug after #https://github.com/bitcoin/bitcoin/pull/23595, but for now I am not going to change behaviour in this refactoring pull.
laanwj commented at 9:44 AM on April 4, 2022:Agree. I don't mean that, it's good that attention is brought to this in this way.
in src/util/strencodings.h:58 in fad1b614b5
53 | @@ -54,7 +54,7 @@ enum class ByteUnit : uint64_t { 54 | * @return A new string without unsafe chars 55 | */ 56 | std::string SanitizeString(const std::string& str, int rule = SAFE_CHARS_DEFAULT); 57 | -std::vector<unsigned char> ParseHex(const char* psz); 58 | +/** Parse the hex string into bytes. Ignores whitespace. */ 59 | std::vector<unsigned char> ParseHex(const std::string& str);
laanwj commented at 9:30 AM on April 4, 2022:This is a pessimization (causes a allocation and copy) if the passed-in parameter is actually a
const char *. Maybe switch to aspanorstring_view? Or is this never the case?
MarcoFalke commented at 9:47 AM on April 4, 2022:There are a few places that use the pointer interface. However, they are not performance critical:
- Unit tests
- Calls, that are only done once (genesis block, other chain params, ...)
- In net processing to generate the alert message
I think for none of these performance matter, because they are either (1) small strings, in which case a copy doesn't matter, (2) used in tests or otherwise places where performance isn't needed.
The places that are affected (RPC mostly), already use
std::stringto pass around hex stuff. For them this change has no effect, but likely #https://github.com/bitcoin/bitcoin/pull/18061 helps here. For example, when hex-decoding a 1-MB block or so.
MarcoFalke commented at 9:50 AM on April 4, 2022:I think switching to
string_viewcould still make sense, but this requires a re-write of the parse-algorithm, as string view lacks the trailing null-char, which the current algorithm relies on. (There is nostd::string_view::c_str()method)
laanwj commented at 9:50 AM on April 4, 2022:OK. I still think being able to pass in a memory area instead of a full, allocated string is a conceptually useful interface.
const char*is just the wrong way to do it.
laanwj commented at 9:53 AM on April 4, 2022:I think switching to string_view could still make sense, but this requires a re-write of the parse-algorithm,
Fine, yes that may be too much for a refactor like this, let's not do it in this PR then (might as well solve that at the same time as the trailing null issue).
MarcoFalke commented at 2:53 PM on April 4, 2022:(unresolved conversation, to avoid hiding it)
laanwj commented at 9:37 AM on April 4, 2022: memberConcept ACK on getting rid of
const char *string interfaces.Code review ACK fad1b614b55e0373ba73949f30a3bb360f6db25a
vincenzopalazzo approvedvincenzopalazzo commented at 2:28 PM on April 4, 2022: nonesipa commented at 2:37 PM on April 4, 2022: memberUsing
std::string_viewwould be ideal here, as constructing one of those from eitherstd::stringorconst char*is efficient, and without allocation overhead.MarcoFalke commented at 2:54 PM on April 4, 2022: member@sipa This is ideal, but non-trivial IIUC. See the previous code discussion.
sipa commented at 3:04 PM on April 4, 2022: memberdiff --git a/src/util/strencodings.cpp b/src/util/strencodings.cpp index 940fa90da2..691607caba 100644 --- a/src/util/strencodings.cpp +++ b/src/util/strencodings.cpp @@ -81,32 +81,24 @@ bool IsHexNumber(const std::string& str) return (str.size() > starting_location); } -std::vector<unsigned char> ParseHex(const char* psz) +std::vector<unsigned char> ParseHex(std::string_view str) { // convert hex dump to vector std::vector<unsigned char> vch; - while (true) - { - while (IsSpace(*psz)) - psz++; - signed char c = HexDigit(*psz++); - if (c == (signed char)-1) - break; - auto n{uint8_t(c << 4)}; - c = HexDigit(*psz++); - if (c == (signed char)-1) - break; - n |= c; - vch.push_back(n); + auto it = str.begin(); + while (it != str.end() && it + 1 != str.end()) { + if (IsSpace(*it)) { + ++it; + continue; + } + auto c1 = HexDigit(*(it++)); + auto c2 = HexDigit(*(it++)); + if (c1 == (signed char)-1 || c2 == (signed char)-1) break; + vch.push_back(uint8_t(c1 << 4) | c2); } return vch; } -std::vector<unsigned char> ParseHex(const std::string& str) -{ - return ParseHex(str.c_str()); -} - void SplitHostPort(std::string in, uint16_t& portOut, std::string& hostOut) { size_t colon = in.find_last_of(':'); diff --git a/src/util/strencodings.h b/src/util/strencodings.h index 1f83fa3ffa..0e72466fc3 100644 --- a/src/util/strencodings.h +++ b/src/util/strencodings.h @@ -55,8 +55,7 @@ enum class ByteUnit : uint64_t { * [@return](/bitcoin-bitcoin/contributor/return/) A new string without unsafe chars */ std::string SanitizeString(const std::string& str, int rule = SAFE_CHARS_DEFAULT); -std::vector<unsigned char> ParseHex(const char* psz); -std::vector<unsigned char> ParseHex(const std::string& str); +std::vector<unsigned char> ParseHex(std::string_view str); signed char HexDigit(char c); /* Returns true if each character in str is a hex character, and has an even * number of hex digits.*/?
MarcoFalke commented at 3:07 PM on April 4, 2022: memberHappy to review, if you create a pull. With "trivial" I meant that this is not a one-line fix, but requires reworking the algorithm.
sipa commented at 3:08 PM on April 4, 2022: memberOn it.
DrahtBot commented at 1:04 AM on April 5, 2022: member<!--e57a25ab6845829454e8d69fc972939a-->
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
<!--174a7506f384e20aa4161008e828411d-->
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #24764 (Modernize util/strencodings and util/string:
string_viewandoptionalby sipa) - #22087 (Validate port-options by amadeuszpawlik)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
MarcoFalke commented at 6:47 AM on April 5, 2022: memberThanks, closing for now.
MarcoFalke closed this on Apr 5, 2022MarcoFalke deleted the branch on Apr 5, 2022DrahtBot locked this on Apr 5, 2023
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-17 06:14 UTC
More mirrored repositories can be found on mirror.b10c.me