fix: Make TxidFromString() respect string

hodlinator commented at 9:07 am on July 12, 2024: contributor

Problem

Prior to this, TxidFromString() was passing string_view::data() into uint256S() which meant it would only receive the a naked char* pointer and potentially scan past the string_view::length() until it found a null terminator (or some other non-hex character).

Appears to have been a fully dormant bug as callers were either passing a string literal or std::string directly to TxidFromFromString(), meaning a null terminator always existed at pointer[length()]. Bug existed since original merge of TxidFromString().

Solution

Make uint256S() (and base_blob::SetHex()) take and operate on std::string_view instead of const char* and have TxidFromString() pass that in.

(PR was prompted by comment in #30377 (comment) (referring to #28922 (review))).

DrahtBot commented at 9:07 am on July 12, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	paplorinc, maflcko, ryanofsky
Stale ACK	stickies-v

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#30482 (rest: Reject truncated hex txid early in getutxos parsing by maflcko)
#30377 (refactor: Add consteval uint256(“str”) and ParseHex(“str”) by hodlinator)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

in src/util/transaction_identifier.h:71 in a4ead4e51c outdated

67@@ -68,7 +68,7 @@ using Wtxid = transaction_identifier<true>;
68 
69 inline Txid TxidFromString(std::string_view str)
70 {
71-    return Txid::FromUint256(uint256S(str.data()));
72+    return Txid::FromUint256(uint256S(str));

maflcko commented at 10:26 am on July 12, 2024:

nit in a4ead4e51c04c75c7837fb53116062e79f502bbd (commit message): Could link directly to the discussion #28922 (review) ?

maflcko commented at 11:02 am on July 12, 2024:

(same commit): Also, you’d have to fix WtxidFromString( in the same commit?

in src/uint256.cpp:25 in 7d8dfe2faf outdated

28-        psz++;
29-
30-    // skip 0x
31-    if (psz[0] == '0' && ToLower(psz[1]) == 'x')
32-        psz += 2;
33+    str = util::LeftTrimStringView(str);

maflcko commented at 10:32 am on July 12, 2024:

Both sides are trimmed. My recommendation would be to drop the first commit and instead add a unit test:

 0diff --git a/src/test/uint256_tests.cpp b/src/test/uint256_tests.cpp
 1index b7892a2139..5602b33e17 100644
 2--- a/src/test/uint256_tests.cpp
 3+++ b/src/test/uint256_tests.cpp
 4@@ -102,6 +102,8 @@ BOOST_AUTO_TEST_CASE( basics ) // constructors, equality, inequality
 5     BOOST_CHECK(uint256S("0x"+MaxL.ToString()) == MaxL);
 6     BOOST_CHECK(uint256S(R1L.ToString()) == R1L);
 7     BOOST_CHECK(uint256S("   0x"+R1L.ToString()+"   ") == R1L);
 8+    BOOST_CHECK(uint256S("   0x" + R1L.ToString() + "-trash;%^&   ") == R1L);
 9+    BOOST_CHECK(uint256S("\t \n  \n \f\n\r\t\v\t   0x" + R1L.ToString() + "  \t \n  \n \f\n\r\t\v\t ") == R1L);
10     BOOST_CHECK(uint256S("") == ZeroL);
11     BOOST_CHECK(R1L == uint256S(R1ArrayHex));
12     BOOST_CHECK(uint256(R1L) == R1L);
13@@ -115,6 +117,8 @@ BOOST_AUTO_TEST_CASE( basics ) // constructors, equality, inequality
14     BOOST_CHECK(uint160S("0x"+MaxS.ToString()) == MaxS);
15     BOOST_CHECK(uint160S(R1S.ToString()) == R1S);
16     BOOST_CHECK(uint160S("   0x"+R1S.ToString()+"   ") == R1S);
17+    BOOST_CHECK(uint160S("   0x" + R1S.ToString() + "-trash;%^&   ") == R1S);
18+    BOOST_CHECK(uint160S(" \t \n  \n \f\n\r\t\v\t  0x" + R1S.ToString() + "   \t \n  \n \f\n\r\t\v\t") == R1S);
19     BOOST_CHECK(uint160S("") == ZeroS);
20     BOOST_CHECK(R1S == uint160S(R1ArrayHex));
21

Then, you can replace this call in the second commit with the existing string view Trim function.

(The unit test also checks that trailing trash is ignored. (Which is sad, but existing behavior, so probably good to keep for now))

maflcko commented at 10:42 am on July 12, 2024:

Also, could add a unit test that truncated input is parsed? (Sad as well, but again existing behavior, so probably good to keep for now)

hodlinator commented at 12:12 pm on July 12, 2024:

Thanks for the feedback, I really enjoyed removing the overloads based on the other comments! :)

Messed up the first force-push somehow, here is a comparison between what you reviewed and the current state: https://github.com/bitcoin/bitcoin/compare/be23937392195c773811fdfc0d723783b2dace67..fcec222ae9386342ff1342e096255ca1ff74964d

Also, could add a unit test that truncated input is parsed?

Do you mean that base_blob::SetHex() allows parsing “half bytes” like "0xF", while ParseHex() would fail on "F" and require "0F" (maybe "F0")?

maflcko commented at 12:25 pm on July 12, 2024:

Do you mean that base_blob::SetHex() allows parsing “half bytes” like "0xF", while ParseHex() would fail on "F" and require "0F" (maybe "F0")?

Yeah, it looks like this is already tested in BOOST_AUTO_TEST_CASE(parse):

0    {
1        std::string s_1{uint256::ONE.GetHex()};
2        BOOST_CHECK_EQUAL(uint256S("1\0").GetHex(), s_1);
3        BOOST_CHECK_EQUAL(uint256S(std::string{"1\0", 2}).GetHex(), s_1);
4        BOOST_CHECK_EQUAL(uint256S("0x1").GetHex(), s_1);
5        BOOST_CHECK_EQUAL(uint256S(" 0x1").GetHex(), s_1);
6        BOOST_CHECK_EQUAL(uint256S(" 1").GetHex(), s_1);
7    }

maflcko commented at 12:26 pm on July 12, 2024:

Messed up the first force-push somehow, here is a comparison between what you reviewed and the current state: https://github.com/bitcoin/bitcoin/compare/be23937392195c773811fdfc0d723783b2dace67..fcec222ae9386342ff1342e096255ca1ff74964d

No worries. I do proper reviews locally anyway, as I don’t trust GitHub, due to its history of data loss and data corruption.

in src/uint256.h:63 in 7d8dfe2faf outdated

57@@ -58,8 +58,9 @@ class base_blob
58     friend constexpr bool operator<(const base_blob& a, const base_blob& b) { return a.Compare(b) < 0; }
59 
60     std::string GetHex() const;
61-    void SetHex(const char* psz);
62-    void SetHex(const std::string& str);
63+    void SetHex(std::string_view str);
64+    inline void SetHex(const char* psz) { SetHex(std::string_view{psz}); }
65+    inline void SetHex(const std::string& str) { SetHex(std::string_view{str}); }

maflcko commented at 10:39 am on July 12, 2024:

The compiler will generate those, no? So seems fine to remove?

in src/uint256.h:135 in 7d8dfe2faf outdated

132@@ -132,5 +133,15 @@ inline uint256 uint256S(const std::string& str)
133     rv.SetHex(str);
134     return rv;
135 }

maflcko commented at 10:39 am on July 12, 2024:

Same (the compiler will generate those)?

maflcko commented at 10:39 am on July 12, 2024:

Also, the same for inline uint160 uint160S(.

maflcko commented at 10:41 am on July 12, 2024: member

left some comments

hodlinator force-pushed on Jul 12, 2024

in src/rest.cpp:794 in fcec222ae9 outdated

788@@ -789,8 +789,9 @@ static bool rest_getutxos(const std::any& context, HTTPRequest* req, const std::
789         for (size_t i = (fCheckMemPool) ? 1 : 0; i < uriParts.size(); i++)
790         {
791             int32_t nOutput;
792-            std::string strTxid = uriParts[i].substr(0, uriParts[i].find('-'));
793-            std::string strOutput = uriParts[i].substr(uriParts[i].find('-')+1);
794+            const std::string_view part{uriParts[i]};
795+            const std::string_view strTxid{part.substr(0, part.find('-'))};
796+            const std::string_view strOutput{part.substr(strTxid.size()+1)};

maflcko commented at 12:21 pm on July 12, 2024:

This is wrong and not a refactor. It crashes the node for me. Previously it did not crash.

0 node0 stderr terminate called after throwing an instance of 'std::out_of_range'
1  what():  basic_string_view::substr: __pos (which is 67) > __size (which is 66)

hodlinator commented at 1:00 pm on July 12, 2024:

Hm.. may have managed to figure out why it would not fail before. Old code:

0std::string strTxid = uriParts[i].substr(0, uriParts[i].find('-'));
1std::string strOutput = uriParts[i].substr(uriParts[i].find('-')+1);

If .find('-') returned npos and npos = size_type(-1), then +1 would make it 0, so it would not throw in substr(). This would happen for an empty uriParts[i], or non-empty not containing - (in this latter case it might have failed parsing strOutput if encountering A-F).

hodlinator commented at 7:31 pm on July 12, 2024:

Improved testing and error reporting in 01fa92928e6f9509935840a8c57d29774a0d14a1.

Made 3d0aec126f8103310b741a5c21e94cf537fcd191 use util::Split. It allocates a vector for each parsed UTXO but is more readable than my former approach.

in src/uint256.h:116 in fcec222ae9 outdated

110@@ -112,21 +111,11 @@ class uint256 : public base_blob<256> {
111     static const uint256 ONE;
112 };
113 
114-/* uint256 from const char *.
115- * This is a separate function because the constructor uint256(const char*) can result
116- * in dangerously catching uint256(0).
117+/* uint256 from std::string_view.
118+ * This is a separate function because the constructor uint256(std::string_view str) can result
119+ * in dangerously catching uint256(0) via std::string_view(const char*).

maflcko commented at 12:28 pm on July 12, 2024:

This is fixed in C++23. Also, most C++17 compilers would already issue a compile warning for this.

hodlinator commented at 7:34 pm on July 12, 2024:

e9a436b7796cb285730d179fb3317ad50234ab04 changed it to: This is not a uint256 constructor because of historical fears of uint256(0) resolving to a NULL string and crashing.

hodlinator force-pushed on Jul 12, 2024

in src/rest.cpp:799 in 3d0aec126f outdated

796+                return RESTERR(req, HTTP_BAD_REQUEST, "Parse error: expected one '-' separator for UTXO in \"" + uriParts[i] + "\"");
797+            const auto strTxid{utxo_pair[0]};
798+            const auto strOutput{utxo_pair[1]};
799 
800             if (!ParseInt32(strOutput, &nOutput) || !IsHex(strTxid))
801                 return RESTERR(req, HTTP_BAD_REQUEST, "Parse error: couldn't read \"<hexadecimal txid>-<decimal output index>\" from \"" + uriParts[i] + "\"");

maflcko commented at 8:06 am on July 16, 2024:

It is probably fine to rework the error strings in rest.cpp, but seems better to leave for a follow-up, given that there are other follow-ups as well: #30444 (comment) ?

This way, the changes here would be focussed on one thing (fixing the possible UB).

hodlinator commented at 8:49 am on July 16, 2024:

Updated to avoid changing existing error strings. Also reordered commits, let me know if you prefer I break off the last 2 into a separate PR.

(Also added some info to commit message for 735865af6308bd76063d48d145ca5761fc839405 and used curly-initialization for the Split() result).

maflcko commented at 9:02 am on July 16, 2024:

Yeah, I’d say a separate PR would be better.

hodlinator commented at 9:11 am on July 16, 2024:

Dropped 2 commits from this PR.

hodlinator force-pushed on Jul 16, 2024

in src/util/string.h:72 in d20fc9cdff outdated

67@@ -68,11 +68,11 @@ std::vector<T> Split(const Span<const char>& sp, char sep)
68 
69 [[nodiscard]] inline std::string_view TrimStringView(std::string_view str, std::string_view pattern = " \f\n\r\t\v")
70 {
71-    std::string::size_type front = str.find_first_not_of(pattern);
72+    std::string_view::size_type front = str.find_first_not_of(pattern);
73     if (front == std::string::npos) {

maflcko commented at 9:34 am on July 16, 2024:

not sure about this change. The types are the same either way and below is still using std::string::npos. Also, it has nothing to do with base_blob in the commit.

hodlinator commented at 9:42 am on July 16, 2024:

Seems very harmless to correct the type. But yeah, should have fixed the npos if we go with this change.

I’ll remove it from this PR for now if it goes outside of the “PR style” of the project.

in src/util/string.h:87 in d20fc9cdff outdated

82@@ -83,8 +83,8 @@ std::vector<T> Split(const Span<const char>& sp, char sep)
83 
84 [[nodiscard]] inline std::string_view RemovePrefixView(std::string_view str, std::string_view prefix)
85 {
86-    if (str.substr(0, prefix.size()) == prefix) {
87-        return str.substr(prefix.size());
88+    if (str.starts_with(prefix)) {
89+        str.remove_prefix(prefix.size());

maflcko commented at 9:35 am on July 16, 2024:

Same here. Has nothing to do with the base_blob commit.

hodlinator commented at 10:14 am on July 16, 2024:

base_blob now calls RemovePrefixView() which it did not previously do. I want to make it as efficient as possible compared to the prior base_blob implementation, hence this change.

l0rinc commented at 10:32 am on July 22, 2024:

I find it quite confusing that we’re sneaking in a mutation of the copy, while previously we just returned a copy directly (like in TrimStringView). We could const the parameters if we did the same as before:

0        return str.substr(prefix.size());

i.e.

0[[nodiscard]] inline std::string_view RemovePrefixView(const std::string_view str, const std::string_view prefix)

Also, could you please add a test case that checks the integrity of the original (would pass because of the string_view copy now as well):

0    auto original = "0x123";
1    BOOST_CHECK_EQUAL(RemovePrefixView(original, "0x"), "123");
2    BOOST_CHECK_EQUAL(original, "0x123"); // Ensure original string is intact

Same for TrimStringView, just to be sure. I’m fine with adding the test in a different PR, but I consider this mutation to a blocker.

See: https://github.com/bitcoin/bitcoin/pull/30482/commits/f77d961d926405637dfbdfb9a9baea1fab4f1b7b#r1685401233

hodlinator commented at 1:13 pm on July 22, 2024:

std::string_view::remove_prefix() itself invites mutation.
The names TrimStringView() and RemovePrefixView() imply mutation.
As I said in #30436 (review), I could only find one other case of const string_view parameters in the entire codebase. I’m sorry if you are used to another way. It may well be a better way, but I’d rather not change more pre-existing things like TrimStringView() and RemovePrefixView() taking non-const string_view copies.

I do not think introducing test cases to prove that the C++ compiler is properly copying a given parameter provides much value.

l0rinc commented at 1:22 pm on July 22, 2024:

std::string_view::remove_prefix() itself invites mutation.

no, we din’t mutate before, we returned a new view. We’re also doing that now, but the copy is done implicitly in the parameters. The original substr way made more sense - like we’re doing with TrimStringView as well.

hodlinator commented at 1:47 pm on July 22, 2024:

The reason for using the standard C++ remove_prefix() function inside of the pre-existing RemovePrefixView() over substr() was covered here: #30436 (review)

base_blob now calls RemovePrefixView() which it did not previously do. I want to make it as efficient as possible compared to the prior base_blob implementation, hence this change.

To be even more clear than I was that time - please compare the implementations of substr() and remove_prefix() in GCC.

ryanofsky commented at 2:24 pm on July 22, 2024:

re: https://github.com/bitcoin/bitcoin/pull/30436/files#r1686336262

I find it quite confusing that we’re sneaking in a mutation of the copy

paplorinc, maybe it would help if you could say more specifically what you find confusing about this code? At least to me the new implementation of this function seems more straightforward than the previous implementation, though I don’t think there is a big difference.

I don’t love the name of the RemovePrefixView function, because that name sounds like it actually does modify the string view like remove_prefix does, instead of just returning a new string view with the prefix stripped. I think a better name for this function would just be StripPrefix. But that’s also a subjective opinion, and renaming would be beyond the scope of this PR.

l0rinc commented at 2:43 pm on July 22, 2024:

I’m not sure what you mean exactly, both are the same, one creates a copy via the params (unless it’s inlined, I guess, that’s why I find it confusing) and modifies the copied instance, the other returns a new copied instance. I really love speed optimizations that also make the code more readable, but this does neither.

given:

0static void SetHexBenchmark(benchmark::Bench& bench)
1{
2    uint256 hash;
3    bench.run([&] {
4        hash.SetHex("1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef");
5        hash.SetHex("0x1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef");
6        ankerl::nanobench::doNotOptimizeAway(hash);
7    });
8}

with 788fe9cc9ab979c5e14f544ae05dc46436892b81 we’re getting 2940379.88 ops/s on average of 2 runs:

ns/op	op/s	err%	total	benchmark
340.85	2,933,826.25	0.6%	0.01	`SetHexBenchmark`

and

ns/op	op/s	err%	total	benchmark
339.34	2,946,933.51	0.1%	0.01	`SetHexBenchmark`

with return str.substr(prefix.size()); it’s 2936915.645 ops/s on average for 2 runs:

ns/op	op/s	err%	total	benchmark
340.99	2,932,626.59	0.6%	0.01	`SetHexBenchmark`

and

ns/op	op/s	err%	total	benchmark
340.00	2,941,204.70	0.0%	0.01	`SetHexBenchmark`

I.e. the difference of the averages is a lot smaller than the variation of individual runs.

which is basically the same speed as master:

ns/op	op/s	err%	total	benchmark
352.95	2,833,265.72	0.3%	0.01	`SetHexBenchmark`

ns/op	op/s	err%	total	benchmark
356.66	2,803,757.30	1.4%	0.01	`SetHexBenchmark`

let’s focus on making the code cleaner and more maintainable - if it’s also faster, great, but let’s not make it harder to reason about and maintain for an imaginary speed gain.

maflcko commented at 3:04 pm on July 22, 2024:

imaginary speed gain.

You are correct that this is not a speed gain. I’d be surprised if the compiler-produced ASM even differed at all. They should be exactly equivalent.

This pull request has more than a 100 comments for less lines of code changed, and several reviews that say it is an improvement. There will always be room for more style nits, but I think at some point it is best to leave them for follow-ups or ignore them.

l0rinc commented at 3:16 pm on July 22, 2024:

it would help if you could say more specifically what you find confusing about this code

I’ll try to rephrase. Firstly, as you’ve mentioned as well, the name of the method hints at mutation, but it actually returns a copy of the view. Second, it’s inline + pass by value mutation, which implies it’s not a simple search&replace type inline, since the parameters aren’t consts, but it also can’t just mutate the original values passed as parameters, so the copy of the parameter is implicit. Making the parameters const and returning a new view directly solves most of these, i.e. you can just copy the internals of the method and paste it to its usage (which helps in theory to understand how the method will behave). Third, this seems like a premature and untested optimization, which should likely be done in a separate PR, in case the code isn’t more readable.

l0rinc commented at 3:18 pm on July 22, 2024:

This pull request has more than a 100 comments for less lines of code changed

I did all the changes I requested during review (even provided diffs to speed it up), they’re not complicated.

maflcko commented at 7:35 am on July 23, 2024:

the name of the method hints at mutation, but it actually returns a copy of the view.

This isn’t changed. The function signature is identical in this branch and in current master. Both take a copy of the view and return a copy of the view.

so the copy of the parameter is implicit

No, from reading the function signature, one knows that a copy is being made when the argument is put in as a function parameter. Also, adding a const here does not avoid a copy or make a copy explicit in the function parameter.

Overall, I don’t think a style nit (which all other reviewers so far disagree with) qualifies as a Nack.

If the style-nit was important, it could trivially be fixed up in a follow-up. (There are already two and more will be needed anyway).

l0rinc commented at 7:41 am on July 23, 2024:

No, from reading the function signature, one knows that a copy is being made when the argument is put in as a function parameter

but it’s an inline function, there is no function parameter.

hodlinator commented at 8:03 am on July 23, 2024:

Dropped RemovePrefixView() modification.

hodlinator force-pushed on Jul 16, 2024

maflcko approved

maflcko commented at 3:41 pm on July 16, 2024: member

ACK c3a9c71c7077324bf0aa40f834f7537a14619340 🐞

Signature:

0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
2trusted comment: ACK c3a9c71c7077324bf0aa40f834f7537a14619340 🐞
30oe/WjDO2mSwVc5Blutf21rEvpYjnqCdQdO76x2UhqHObkUZghAnaWmbpcKtyptM6F8bw8dIl8IU3+TJAkcnBg==

fanquake requested review from stickies-v on Jul 17, 2024

in src/uint256.h:116 in c3a9c71c70 outdated

110@@ -112,21 +111,11 @@ class uint256 : public base_blob<256> {
111     static const uint256 ONE;
112 };
113 
114-/* uint256 from const char *.
115- * This is a separate function because the constructor uint256(const char*) can result
116- * in dangerously catching uint256(0).
117+/* uint256 from std::string_view.
118+ * This is not a uint256 constructor because of historical fears of uint256(0)

stickies-v commented at 6:35 pm on July 17, 2024:

nit: I interpret “historical fears” as “the fears are no longer applicable”, in which case I think it should just be a constructor and this helper function removed. But, it seems like until C++23 this is indeed an issue (even if the ambiguous overload with the uint8_t should prevent compilation), so the current wording is confusing to me.

Suggested alternative:

0/* uint256 from std::string_view.
1 * This is not a uint256 constructor to avoid the ambiguity of
2 * uint256(0) being interpreted as initialization from a nullptr.
3 *
4 * This helper can be replaced with a constructor when we require c++23,
5 * which prohibits constructing a string_view from a nullptr, see
6 * https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2166r1.html
7 */

hodlinator commented at 7:06 pm on July 17, 2024:

Thank you for the correction! Applied with some minor adjustments.

hodlinator commented at 7:41 am on July 18, 2024:

Reset back to prior version, see #30436 (review).

stickies-v commented at 6:36 pm on July 17, 2024: contributor

Approach ACK c3a9c71c7077324bf0aa40f834f7537a14619340

hodlinator force-pushed on Jul 17, 2024

in src/uint256.h:118 in 4923d90835 outdated

116- * in dangerously catching uint256(0).
117+/* uint256 from std::string_view.
118+ * This is not a uint256 constructor to avoid the ambiguity of uint256(0) being
119+ * interpreted as initialization from a nullptr.
120+ *
121+ * This helper can be replaced with a constructor when we require C++23, which

maflcko commented at 6:36 am on July 18, 2024:

Not sure the latest push is a good change. I don’t think this should be replaced with a constructor in the future, but it should be removed completely.

A new function (for example static std::optional<base_blob<N>> base_blob<N>::FromHex(std::string_view) can be added that parses and then returns an std::optional. Similar to TryParseHex.

In any case, the doxygen comment here is probably the wrong place to track brainstorming issues.

If we really wanted to turn this into a constructor, it could be done today. As mentioned earlier, C++17 compilers will already warn when a string_view is initialized with a nullptr. It should be easy to turn this into a compile error, or simply compile once with C++23 in CI. However, I don’t think we want a constructor here.

hodlinator commented at 7:16 am on July 18, 2024:

Tried:

 0diff --git a/src/test/uint256_tests.cpp b/src/test/uint256_tests.cpp
 1index 2053a42d24..e8dfb03d18 100644
 2--- a/src/test/uint256_tests.cpp
 3+++ b/src/test/uint256_tests.cpp
 4@@ -119,6 +119,10 @@ BOOST_AUTO_TEST_CASE( basics ) // constructors, equality, inequality
 5     BOOST_CHECK(uint160(R1S) == R1S);
 6     BOOST_CHECK(uint160(ZeroS) == ZeroS);
 7     BOOST_CHECK(uint160(OneS) == OneS);
 8+
 9+    uint256{0};
10+    uint256{NULL};
11+    uint256{nullptr};
12 }
13 
14 BOOST_AUTO_TEST_CASE( comparison ) // <= >= < >
15diff --git a/src/uint256.h b/src/uint256.h
16index 19faaca91f..39b776ed0f 100644
17--- a/src/uint256.h
18+++ b/src/uint256.h
19@@ -107,6 +107,7 @@ public:
20     constexpr uint256() = default;
21     constexpr explicit uint256(uint8_t v) : base_blob<256>(v) {}
22     constexpr explicit uint256(Span<const unsigned char> vch) : base_blob<256>(vch) {}
23+    explicit uint256(std::string_view str) { SetHex(str); }
24     static const uint256 ZERO;
25     static const uint256 ONE;
26 };

G++13.2.0 warns against both NULL and nullptr. 0 and NULL resolve to the uint256(uint8_t v) ctor:

 0test/uint256_tests.cpp: In member function ‘void uint256_tests::basics::test_method()’:
 1test/uint256_tests.cpp:124:17: warning: passing NULL to non-pointer argument 1 of ‘constexpr uint256::uint256(uint8_t)’ [-Wconversion-null]
 2  124 |     uint256{NULL};
 3      |                 ^
 4...
 5./uint256.h:108:40: note:   declared here
 6  108 |     constexpr explicit uint256(uint8_t v) : base_blob<256>(v) {}
 7      |                                ~~~~~~~~^
 8test/uint256_tests.cpp:125:20: warning: argument 1 null where non-null expected [-Wnonnull]
 9  125 |     uint256{nullptr};
10      |                    ^
11...
12/nix/store/14c6s4xzhy14i2b05s00rjns2j93gzz4-gcc-13.2.0/include/c++/13.2.0/string_view:140:7: note: in a call to function ‘constexpr std::basic_string_view<_CharT, _Traits>::basic_string_view(const _CharT*) [with _CharT = char; _Traits = std::char_traits<char>]’ declared ‘nonnull’
13  140 |       basic_string_view(const _CharT* __str) noexcept
14      |       ^~~~~~~~~~~~~~~~~
15...
16/nix/store/14c6s4xzhy14i2b05s00rjns2j93gzz4-gcc-13.2.0/include/c++/13.2.0/bits/char_traits.h:409:32: warning: argument 1 null where non-null expected [-Wnonnull]
17  409 |         return __builtin_strlen(__s);
18      |                ~~~~~~~~~~~~~~~~^~~~~
19/nix/store/14c6s4xzhy14i2b05s00rjns2j93gzz4-gcc-13.2.0/include/c++/13.2.0/bits/char_traits.h:409:32: note: in a call to built-in function ‘long unsigned int __builtin_strlen(const char*)’

Clang++(17.0.6) warns as well:

0test/uint256_tests.cpp:124:13: warning: implicit conversion of NULL constant to 'uint8_t' (aka 'unsigned char') [-Wnull-conversion]
1  124 |     uint256{NULL};
2      |     ~~~~~~~ ^~~~
3      |             '\0'
4test/uint256_tests.cpp:125:13: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
5  125 |     uint256{nullptr};
6      |             ^~~~~~~
72 warnings generated.

Both continue compilation and segfault when running the tests.

Does CI have a stricter warnings-as-errors policy?

maflcko commented at 7:29 am on July 18, 2024:

Does CI have a stricter warnings-as-errors policy?

Yeah, IIRC it should treat warnings as error.

Even if it didn’t, the compile warning and nullptr in code would still have to pass the eyes of the developer and reviewers. Also, the CI would fail later on with a segfault anyway.

As for the others:

0 is intentional and used in code today, see:

0const uint256 uint256::ZERO(0);
1const uint256 uint256::ONE(1);

NULL is forbidden by clang-tidy, see modernize-use-nullptr.

hodlinator commented at 7:39 am on July 18, 2024:

Reset back to prior version. As you say it seems we could convert uint256S() to a constructor already thanks to the pre-C++23 warnings by Clang & GCC, but I think that is outside the scope of this PR.

stickies-v commented at 10:39 am on July 18, 2024:

Thanks for your comprehensive input here @maflcko. You’re right that this is safe as long as one CI job fails, there’s no need to ensure this fails on every single compiler, which indeed makes the fear historical. My apologies for the back and forth.

Your suggestion for a FromHex() method also seems better than a std::string_view ctor.

hodlinator commented at 12:23 pm on July 18, 2024:

I agree uint256::FromHex() is more self-explanatory than uint256S(). Making it return std::optional might be good if we want stricter parsing and non-fatal failures. This PR feels large enough for now though.

maflcko commented at 12:31 pm on July 18, 2024:

Yes, all of this should happen later, because it is unrelated to fixing possible UB/crashes (the goal of this pull).

maflcko commented at 7:23 am on July 19, 2024:

A partial fix is in #30482. It will be continued in the future.

hodlinator force-pushed on Jul 18, 2024

in src/uint256.cpp:34 in c3a9c71c70 outdated

39-        digits++;
40+    for (const char c : str) {
41+        if (::HexDigit(c) == -1)
42+            break;
43+        ++digits;
44+    }

stickies-v commented at 1:56 pm on July 18, 2024:

~~review note: it seems the while loop was replaced with for loop purely for readability, I don’t see any functional change here.~~

edit: this is incorrect, as pointed out in the following comments.

ryanofsky commented at 3:51 pm on July 18, 2024:

In commit “refactor: Change base_blob::SetHex() to take std::string_view” (f77d961d926405637dfbdfb9a9baea1fab4f1b7b)

It seems not great, especially in the light of the unterminated string bug, that this will iterate over the entire string when only the beginning portion of the string may be needed. For example, if the string is 1MB and the blob is 32 bytes, this will iterate over 1MB. Would changing this to:

0while (digits < str.size() && digits < WIDTH*2 && ::HexDigit(str[digits]) != -1) {
1    ++digits;
2}

This would also allow dropping the pend variable below as a further simplification.

maflcko commented at 5:19 pm on July 18, 2024:

It seems not great

I don’t think this is a change in behavior (previously this was the behavior and afterward this remains the behavior), and the function will be removed completely in the future anyway, so I am not sure how much time should be spent on playing code-golf here.

Happy to re-ACK a trivial fixup, but if it takes more than a few seconds to review it is probably not worth it.

ryanofsky commented at 5:37 pm on July 18, 2024:

Thanks, I did not know this function will be removed. It would be helpful to mention in the PR description that it will be removed, so reviewers can know how worried to be about the quality of the code.

maflcko commented at 5:43 pm on July 18, 2024:

Thanks, I did not know this function will be removed. It would be helpful to mention in the PR description that it will be removed, so reviewers can know how worried to be about the quality of the code.

Well, it will be removed, given that someone writes the code and reviews it. A tracking issue can be created, if you want, but I think the pull request description is the wrong place for this tracking stuff. As for reviewers: I’d say reviewers should worry the most about any regression or worsening of the code. Follow-up improvements are great to mention, but shouldn’t hold up a possible UB/crash fix.

ryanofsky commented at 2:15 am on July 19, 2024:

re: #30436 (review)

review note: it seems the while loop was replaced with for loop purely for readability, I don’t see any functional change here.

To be sure, replacing the previous while loop is an important part of the fix. If the previous while loop was kept, it could iterate past the end of the of string_view and read uninitialized memory. This for loop (and the while loop I suggested in #30436 (review)) avoid the bug by stopping iteration at the end of the string_view.

maflcko commented at 7:24 am on July 19, 2024:

Well, it will be removed, given that someone writes the code and reviews it.

A partial fix is in #30482. It will be continued in the future.

stickies-v commented at 7:59 am on July 19, 2024:

it could iterate past the end of the of string_view and read uninitialized memory.

Right, because on master we rely on stopping when HexDigit() returns a -1, which is fine when the string is null-terminated but is not safe otherwise. Thanks for pointing this out, this also clarifies my misunderstanding about your earlier comment.

l0rinc commented at 10:26 am on July 22, 2024:

previous version was cleaner in my opinion. Since str is random access, we could simplify to:

0    while (digits < trimmed.size() && ::HexDigit(trimmed[digits]) != -1)
1        ++digits;

See: https://github.com/bitcoin/bitcoin/pull/30482/commits/f77d961d926405637dfbdfb9a9baea1fab4f1b7b#r1685498257

hodlinator commented at 12:54 pm on July 22, 2024:

It is random access, but it’s also contiguous, allowing for the safe range-for. If I had to do it again I might have lifted the break; up to the same line as the if. If I need to retouch and others support that change I might do that to make it slightly more terse like yours.

l0rinc commented at 12:57 pm on July 22, 2024:

You can of course redo it, I didn’t review your change for us to ponder on what could have been…

ryanofsky commented at 2:12 pm on July 22, 2024:

You can of course redo it, I didn’t review your change for us to ponder on what could have been…

I think the while loop is a slightly better but range-for is also ok.

I still do think as long as you are going to update the loop to respect the length of the input (instead of blowing past it) it would be better update the loop to respect the length of the output as well, and not pointlessly read more data than can actually be used, as suggested #30436 (review).

Not very important though, especially if this function will be deleted later.

hodlinator commented at 9:32 pm on July 22, 2024:

I still do think as long as you are going to update the loop to respect the length of the input (instead of blowing past it) it would be better update the loop to respect the length of the output as well, and not pointlessly read more data than can actually be used, as suggested #30436 (review).

I tried making the loop counting digits actually respect the output length (WIDTH*2 here) as well.

0    for (const char c : trimmed) {
1        if (digits == WIDTH*2 || ::HexDigit(c) == -1) break;
2        ++digits;
3    }

This uncovered some weird behavior. Suddenly this check started failing:

0    BOOST_CHECK(R1S == uint160S(R1ArrayHex));

Because of how the following loop in SetHex() parses digits from the end, it turns out that without the limit on digits, if we were given more hex digits than fit, we would read the rightmost ones rather than the leftmost ones. :exploding_head:

(The behavior of reading the rightmost hex digits predates this PR, and I’d rather not go down that rabbit hole, as the function will hopefully be removed - #30436 (review)).

in src/uint256.cpp:25 in f77d961d92 outdated

28-        psz++;
29-
30-    // skip 0x
31-    if (psz[0] == '0' && ToLower(psz[1]) == 'x')
32-        psz += 2;
33+    str = util::TrimStringView(str);

ryanofsky commented at 3:29 pm on July 18, 2024:

In commit “refactor: Change base_blob::SetHex() to take std::string_view” (f77d961d926405637dfbdfb9a9baea1fab4f1b7b)

Note: This line isn’t directly equivalent to previous code because it trims whitespace from the end of the string, not just the beginning, but the net effect is the same because any whitespace at the end of the string would be ignored anyway.

I’m surprised looking at the implementation of SetHex to see how unreliable it is in general. Since it can’t report errors it winds up just ignoring unexpected input:

Any input following the first non-hex character is ignored.
- So if there is a space in the string, or a 1 replaced with an I, it is treated like the end of input, and the resulting blob may have a combination of new and old values.
If the hex string provided is too long to fit in the blob, the end of the string will be ignored.

maflcko commented at 5:15 pm on July 18, 2024:

I’m surprised looking at the implementation of SetHex to see how unreliable it is in general. Since it can’t report errors it winds up just ignoring unexpected input:

Yes, those are known shortcomings. The function will be removed completely in the future, but this will be done in a follow-up.

maflcko commented at 5:56 am on July 19, 2024:

So if there is a space in the string, or a 1 replaced with an I, it is treated like the end of input, and the resulting blob may have a combination of new and old values.

I don’t think this is true. All “old” values will zeroed in the first line of SetHex.

The other shortcomings (no length check and no hex check) are true.

maflcko commented at 7:23 am on July 19, 2024:

Yes, those are known shortcomings. The function will be removed completely in the future, but this will be done in a follow-up.

A partial fix is in #30482. It will be continued in the future.

hodlinator commented at 5:36 pm on July 19, 2024:

Note: This line isn’t directly equivalent to previous code because it trims whitespace from the end of the string, not just the beginning, but the net effect is the same because any whitespace at the end of the string would be ignored anyway. @ryanofsky yes, that’s why my first pushed implementation added LeftTrimStringView in 8f4293d892358ce145eb9a099f2e5f256c4d9c73. But got pushback in #30436 (review) and trimming the end of the string means we save checking one character later when range-for iterating over the trimmed string_view to count digits.

ryanofsky approved

ryanofsky commented at 4:08 pm on July 18, 2024: contributor

Code review ACK c3a9c71c7077324bf0aa40f834f7537a14619340

I think the changes are all good, but the PR and the most important commit “fix: Make TxidFromString() respect string length” (c3a9c71c7077324bf0aa40f834f7537a14619340) are lacking a good description that actually says what the bug is.

I think they should say that currently TxidFromString takes a string_view parameter, but internally it is calling the uint256S function which expects a nul-terminated string. If TxidFromString is called with a string that not contain \0, the implementation will read the string and potentially uninitialized memory after it until it finds any character which is not a hex digit.

DrahtBot requested review from stickies-v on Jul 18, 2024

stickies-v approved

stickies-v commented at 8:56 pm on July 18, 2024: contributor

ACK c3a9c71c7077324bf0aa40f834f7537a14619340,

are lacking a good description that actually says what the bug is.

I agree, I spent quite a bit of time getting to the bottom of this and more documentation would have sped that up

but internally it is calling the uint256S function which expects a nul-terminated string

Maybe I’m being too pedantic (or wrong), but I’m not sure this is 100% correct? I don’t think the uint256S(std::string_view str) str needs to be null-terminated. Rather, I think the problem is that by removing the string length data in TxidFromstring (by passing std::string_view::data instead of std::string_view), uint256S has to construct a std::string_view from the const char* without length information, which is dangerous (as you describe) when the initial string_view was not null-terminated, e.g. because it is a substring?

For example, this test (with a non-null terminated string) fails before c3a9c71c7077324bf0aa40f834f7537a14619340 and succeeds after:

 0//txid "0x74d681e0e03bafa802c8aa084379aa98d9fcd632ddc2ed9782b586ec87451f20"
 1const char arr[66] = {
 2    '7', '4', 'd', '6', '8', '1', 'e', '0', 'e', '0', '3', 'b', 'a', 'f', 'a', '8',
 3    '0', '2', 'c', '8', 'a', 'a', '0', '8', '4', '3', '7', '9', 'a', 'a', '9', '8',
 4    'd', '9', 'f', 'c', 'd', '6', '3', '2', 'd', 'd', 'c', '2', 'e', 'd', '9', '7',
 5    '8', '2', 'b', '5', '8', '6', 'e', 'c', '8', '7', '4', '5', '1', 'f', '2', '0',
 6    'a', 'b' // extra 2 characters
 7};
 8const char* ptr = arr;  // not null-terminated
 9std::string_view view{ptr, 64};
10BOOST_CHECK_EQUAL(TxidFromString(view).ToString(), view);

ryanofsky commented at 2:17 am on July 19, 2024: contributor

re: #30436#pullrequestreview-2185910679

but internally it is calling the uint256S function which expects a nul-terminated string

Maybe I’m being too pedantic (or wrong), but I’m not sure this is 100% correct? I don’t think the uint256S(std::string_view str) str needs to be null-terminated.

Yes, both explanations are saying the same thing but mine is unnecessarily confusing. The uint256S function “expects” a nul-terminated string because it accepts a bare char* argument, and in C bare char* strings are almost always nul-terminated. But it does not “need” a nul-terminated string. The string can be terminated with any character other than 0123456789AaBbCcDdEeFf, as mentioned in the next sentence.

maflcko commented at 5:42 am on July 19, 2024: member

are lacking a good description that actually says what the bug is.

I agree, I spent quite a bit of time getting to the bottom of this and more documentation would have sped that up

I agree as well. Thanks for brining it up! This should be fixed before merge, other than that, I’d say it is ready.

stickies-v commented at 2:14 pm on July 19, 2024: contributor

Side note: we can use bugprone-suspicious-stringview-data-usage (from unreleased clang-19) to catch this going forward, when we have clang-19. I tried running clang-tidy-19 locally from master branch for longer than I probably should have, and eventually gave up.

As a poor man’s alternative, I looked for other stringview data misusage based on some rough grepping, and couldn’t find anything, but that was definitely not an exhaustive search either.

hodlinator force-pushed on Jul 19, 2024

hodlinator commented at 8:47 pm on July 19, 2024: contributor

Thanks for the feedback! Helped me realize I was being too terse. (I think it was a mistake alone just to not have string_view in the title for the sake of keeping under 51 chars).

(Edit: where/were typo)

Only modified commit message:

0--- c3a9c71c7077324bf0aa40f834f7537a14619340	2024-07-19 22:42:56.019823786 +0200
1+++ 788fe9cc9ab979c5e14f544ae05dc46436892b81	2024-07-19 22:48:26.785770275 +0200
2@@ -1,3 +1,5 @@
3-fix: Make TxidFromString() respect string length
4+fix: Make TxidFromString() respect string_view length
5 
6-Appears to have been a fully dormant bug. Introduced since inception of TxidFromString(), discussed in [#28922 (review)](/bitcoin-bitcoin/28922/#discussion_r1404437378).
7+Prior to this, passing in string_view::data() meant uint256S() would only receive the a naked char* pointer and scan past the string_view::length() until it found a null terminator.
8+
9+Appears to have been a fully dormant bug as callers were either passing a string literal or std::string directly to TxidFromFromString(), meaning null terminator always existed at pointer[length()]. Bug existed since original merge of TxidFromString(), discussed in [#28922 (review)](/bitcoin-bitcoin/28922/#discussion_r1404437378).

hodlinator force-pushed on Jul 19, 2024

maflcko commented at 9:09 am on July 22, 2024: member

Checked that the code didn’t change in the last push, only the commit message, which looked fine on a glance.

If you re-touch you can change “and scan past” to “and may scan past”, or “and possibly scan past”. Otherwise, it seems to imply this bug was actually hit.

Also, you can adjust the pull request description with a motivation and fix. Otherwise, it is just two random links, and reviewers will have a hard time seeing the motivation and gist of the changes.

maflcko commented at 9:09 am on July 22, 2024: member

re-ACK 788fe9cc9ab979c5e14f544ae05dc46436892b81 🔃

Signature:

0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
2trusted comment: re-ACK 788fe9cc9ab979c5e14f544ae05dc46436892b81  🔃
3Zi5Q+ybp7u54gWlNWvoEWUh7VWj3zjslDq1FeleNTzCp50J7LrQhAJBN27yFxvHPL327HAvBdJgZppjE7Nc2DA==

DrahtBot requested review from stickies-v on Jul 22, 2024

DrahtBot requested review from ryanofsky on Jul 22, 2024

in src/test/uint256_tests.cpp:106 in 801404719d outdated

101@@ -102,6 +102,8 @@ BOOST_AUTO_TEST_CASE( basics ) // constructors, equality, inequality
102     BOOST_CHECK(uint256S("0x"+MaxL.ToString()) == MaxL);
103     BOOST_CHECK(uint256S(R1L.ToString()) == R1L);
104     BOOST_CHECK(uint256S("   0x"+R1L.ToString()+"   ") == R1L);
105+    BOOST_CHECK(uint256S("   0x"+R1L.ToString()+"-trash;%^&   ") == R1L);
106+    BOOST_CHECK(uint256S("\t \n  \n \f\n\r\t\v\t   0x"+R1L.ToString()+"  \t \n  \n \f\n\r\t\v\t ") == R1L);

l0rinc commented at 10:18 am on July 22, 2024:

I see that BOOST_CHECK was use throughout, but we could use BOOST_CHECK_EQUAL in a few places for better error messages - at least for the new code:

0    BOOST_CHECK_EQUAL(uint256S("\t \n  \n \f\n\r\t\v\t   0x"+R1L.ToString()+"  \t \n  \n \f\n\r\t\v\t "), R1L);

We could do the old ones in a different PR, but let’s not introduce already deprecated code.

See: https://github.com/bitcoin/bitcoin/pull/30482/commits/801404719dbdbbc327b2f1bc24cb11b6d52e3f27#r1685455576

hodlinator commented at 12:34 pm on July 22, 2024:

Would fully agree if we were introducing a new test function, but as it is in the middle of existing BOOST_CHECKs, I think it’s fine. Won’t sweat it if I don’t show up in the git blame after the BOOST_CHECK_EQUAL-refactor. And I’m happy to review such a follow-up PR.

(Original change here was suggested in #30436 (review))

ryanofsky commented at 1:52 pm on July 22, 2024:

re: #30436 (review)

Mildly prefer paplorinc’s suggestion, but no strong opinion

in src/test/uint256_tests.cpp:120 in 801404719d outdated

116@@ -115,6 +117,8 @@ BOOST_AUTO_TEST_CASE( basics ) // constructors, equality, inequality
117     BOOST_CHECK(uint160S("0x"+MaxS.ToString()) == MaxS);
118     BOOST_CHECK(uint160S(R1S.ToString()) == R1S);
119     BOOST_CHECK(uint160S("   0x"+R1S.ToString()+"   ") == R1S);
120+    BOOST_CHECK(uint160S("   0x"+R1S.ToString()+"-trash;%^&   ") == R1S);

l0rinc commented at 10:19 am on July 22, 2024:

We have many different flavors of hex parsers, to document the differences better, let’s add a test where e.g. TryParseHash would behave differently, e.g. "0123456789 ABCDEF" where TryParseHex continues, SetHexFragile stops:

0TryParseHex: 0123456789abcdef
1SetHexFragile: efcdab8967452301000000000000000000000000000000000000000000000000

See: https://github.com/bitcoin/bitcoin/pull/30482/commits/801404719dbdbbc327b2f1bc24cb11b6d52e3f27#r1685500849

hodlinator commented at 12:48 pm on July 22, 2024:

See #30482 (review)

maflcko commented at 12:50 pm on July 22, 2024:

TryParseHash, TryParseHex, SetHexFragile

None of them are touched or changed in this pull, and as explained, the tests already exist in various places:

0src/test/util_tests.cpp:    result = ParseHex("04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f");
1src/test/util_tests.cpp:    result = ParseHex("12 34 56 78");
2src/test/util_tests.cpp:    result = ParseHex(" 89 34 56 78");
3src/test/util_tests.cpp:    result = ParseHex("     Ff        aA    ");
4src/test/util_tests.cpp:    result = ParseHex("");
5src/test/util_tests.cpp:    BOOST_CHECK_EQUAL(ParseHex("AAF F").size(), 0);
6src/test/util_tests.cpp:    BOOST_CHECK_EQUAL(ParseHex(with_embedded_null).size(), 0);
7src/test/util_tests.cpp:    BOOST_CHECK_EQUAL(ParseHex("1234 invalid 1234").size(), 0);
8src/test/util_tests.cpp:    BOOST_CHECK_EQUAL(ParseHex("12 3").size(), 0);

(etc)

l0rinc commented at 12:51 pm on July 22, 2024:

I did, of course, I’m more interested here in your opinion.

l0rinc commented at 12:55 pm on July 22, 2024:

Since we’re modifying the function I’m suggesting more coverage for the cases, based on what other hex parsers were testing for. It’s not a difficult or unreasonable request.

maflcko commented at 1:00 pm on July 22, 2024:

It’s not a difficult or unreasonable request.

I think full test coverage exists, but I am more than happy to review a test-only pull request adding more tests. (I’d be thrilled to review one that adds missing coverage)

But no need to hold up a bugfix based on a test-only minor nit.

l0rinc commented at 1:18 pm on July 22, 2024:

But no need to hold up a bugfix based on a test-only minor nit.

agree, this isn’t the reason for my NACK.

ryanofsky commented at 1:59 pm on July 22, 2024:

re: #30436 (review)

Tend to agree with marco, should avoid adding test cases if they are redundant with other ones. Better to put effort into organizing test cases and making them target specific functionality so it is more obvious if there are gaps in coverage and there is less code to maintain.

l0rinc commented at 2:58 pm on July 22, 2024:

If you think it’s redundant, please mark this as resolved.

stickies-v commented at 10:23 am on July 22, 2024: contributor

re-ACK 788fe9cc9ab979c5e14f544ae05dc46436892b81

No changes except for commit message update.

in src/uint256.cpp:21 in f77d961d92 outdated

17@@ -18,39 +18,31 @@ std::string base_blob<BITS>::GetHex() const
18 }
19 
20 template <unsigned int BITS>
21-void base_blob<BITS>::SetHex(const char* psz)
22+void base_blob<BITS>::SetHex(std::string_view str)

l0rinc commented at 10:25 am on July 22, 2024:

we could keep the const here as well (and simplify a few other structures, please see my other comments for details):

 0template <unsigned int BITS>
 1void base_blob<BITS>::SetHex(const std::string_view str)
 2{
 3    const auto trimmed = util::RemovePrefixView(util::TrimStringView(str), "0x");
 4    size_t digits = 0;
 5    while (digits < trimmed.size() && ::HexDigit(trimmed[digits]) != -1)
 6        ++digits;
 7    for (auto it = m_data.begin(); it < m_data.end(); ++it) {
 8        auto lo = (digits > 0) ? ::HexDigit(trimmed[--digits]) : 0;
 9        *it = (digits > 0) ? lo | (::HexDigit(trimmed[--digits]) << 4) : lo;
10    }
11}

See: https://github.com/bitcoin/bitcoin/pull/30482/files#r1685498559

hodlinator commented at 12:48 pm on July 22, 2024:

Not sure about consting string_view parameters, could only find one pre-existing case of that pattern (GetBuriedDeployment()). Can sympathize with forcing introduction of new names over mutation. I like how you named the first digit hi (maybe high is more precise).

Ultimately, my modification of the function is meant to be a straightforward transformation of the original code. Since it has received a few ACKs already, I am going to put off changing it for now. But might draw some inspiration from your version if I need to change it for other reasons.

l0rinc commented at 12:53 pm on July 22, 2024:

We can get a reack, let’s not focus on that. I’m NACK 788fe9cc9ab979c5e14f544ae05dc46436892b81 -ing because of #30436 (review) anyway.

ryanofsky commented at 2:06 pm on July 22, 2024:

re: #30436 (review)

I don’t really think it makes sense to make declare string views const, since the data they point to is already immutable, and the argument is passed by value so no changes to the view inside the function could affect the caller. I think the const just adds verbosity. Not 100% sure about this and haven’t thought about it much, but would want to know a specific reason for using const here, or see some broader recommendation somewhere that would explain why to do this.

l0rinc commented at 2:57 pm on July 22, 2024:

the data they point to is already immutable

The string part, but not the leading and trailing offsets.

The const signals the intention of the string_view. By declaring it as const std::string_view str I’m announcing that its value will be constant throughout the method, right? But if it’s just std::string_view str, I can call str.remove_prefix or str.remove_prefix, which modifies the internals of the view, so after the call its internal state is different.

hodlinator commented at 7:08 am on July 23, 2024:

@paplorinc I see hi/lo is consistent with secp256k1_uint128, so I retract my high thoughts.

It strikes me that the hi part of your suggestion is not the one being shifted << 4. Maybe it should be called lo?

l0rinc commented at 7:38 am on July 23, 2024:

you’re absolutely right, edited the example!

ryanofsky commented at 4:10 pm on July 23, 2024:

re: #30436 (review)

so after the call its internal state is different.

I may not be understanding the claim correctly, but just to be clear, the string_view argument is passed by value (not by reference) so it can’t be changed by the call.

From the callers point of view there is no difference between:

0void ProcessString(std::string_view str);
1void ProcessString(const std::string_view str);

Either way, whatever argument they pass will not be changed by the call. That is why my general opinion of const value parameters is that the const mostly adds verbosity and does not provide benefits to callers.

It is true that inside the function definition, the const could have an effect because it makes it not possible to modify or reassign the str variable, which could be good, but also bad. For example it would prevent changing

0-const auto trimmed = util::RemovePrefixView(util::TrimStringView(str), "0x");
1+str = util::RemovePrefixView(util::TrimStringView(str), "0x");

and reducing the number of slightly different variables pointing to the same hex string, which could easily prevent bugs. For example a bug was fixed last week, caused by having two slightly different variables refer to the same log string.

Your other claims about inlining are interesting but I don’t know enough about compilers to know if they are true. My intuition leads me to believe that because specifying const on a value parameter does not change semantics of the call, adding const does not provide the compiler with any information it didn’t already have, so wouldn’t affect inlining. But I could be wrong about that.

l0rinc commented at 7:36 pm on July 23, 2024:

Let me try a different angle: if a method is inline, I would expect to be able to copy-paste its content to the call site directly (at least conceptually, to understand its effect). If the method is pure in a functional sense (including that it doesn’t even mutate its parameters), it’s a lot easier to reason about its overall behavior. Does this resonate more?

ryanofsky commented at 4:04 pm on August 20, 2024:

re: #30436 (review)

Let me try a different angle: if a method is inline, I would expect to be able to copy-paste its content to the call site directly (at least conceptually, to understand its effect). If the method is pure in a functional sense (including that it doesn’t even mutate its parameters), it’s a lot easier to reason about its overall behavior. Does this resonate more?

I actually learned something new about this looking at bc3139e5a5f1bf81e4bb53417157f0f2cef86716 from #30377: apparently our CI tools do not complain if you declare parameters const in the function definition but omit the const in the function declarations. I previously thought this would cause warnings or lint errors if definitions and declarations weren’t exactly the same. So it seems like this provides a good compromise. You can avoid confusing callers by not including a const in the declaration that doesn’t mean or do anything, while keeping const in the definition if you want to make it obvious that a parameter is not mutated.

I still think declaring const for function parameters passed by value doesn’t make sense as a general rule, but it can make sense in certain cases for the reasons you explained.

I also learned there is a clang-tidy rule readability-avoid-const-params-in-decls that might be nice for us to enable.

l0rinc commented at 4:35 pm on August 20, 2024:

function parameters passed by value doesn’t make sense as a general rule

Google cpp guide mostly aligns with your view on this. However, I approach this from a more functional programming perspective, where I want to infer the possible state changes a method can perform based solely on its signature - or at least what it cannot do.

Thanks for following up on this, @ryanofsky!

ryanofsky commented at 5:28 pm on August 20, 2024:

I approach this from a more functional programming perspective, where I want to infer the possible state changes a method can perform based solely on its signature - or at least what it cannot do.

Yes, that’s definitely good and desirable, but my point is that when you add const to a c++ function parameter that is passed by value (not passed by reference) the const does not become part of the function signature. It is completely ignored by callers and has no effect on what inputs can be passed to the function or what changes the function can make to those inputs (because c++ functions already cannot modify inputs which are passed by value).

The only effect of adding top-level const on function parameters passed by value is to make local argument variables inside the function definition constant instead of mutable. This can have benefits as well as drawbacks, just like functional programming sometimes has benefits over imperative programming, and sometimes has drawbacks. An important drawback that can happen when variables are immutable is it can lead to proliferation of similarly named variables containing slightly different values because it is not possible to replace existing values, leading to confusion between variables and bugs like #30386 (review). I just think it is good idea to look at things on a case by case basis and not assume functional approaches are always or usually better.

l0rinc commented at 11:41 am on August 21, 2024:

We’re in 100% agreement here, thanks for detailing your view.

In this particular case my thinking was:

initial code, parameter (copy of the original) is mutated locally, we have to keep track of state changes in our head to understand the method’s behavior:

0std::string_view RemovePrefixView2(std::string_view str, std::string_view prefix)
1{
2    if (str.starts_with(prefix)) {
3        str.remove_prefix(prefix.size());
4    }
5    return str;
6}

adding const makes parameter mutations forbidden - we have to rethink the algo (i.e. code smell):

0std::string_view RemovePrefixView3(const std::string_view str, const std::string_view prefix)
1{
2    if (str.starts_with(prefix)) {
3        str.remove_prefix(prefix.size()); //  error: 'this' argument to member function 'remove_prefix' has type 'const std::string_view' (aka 'const basic_string_view<char>'), but function is not marked const
4    }
5    return str;
6}

we can return a copy of the view, knowing that str didn’t change state throughout the method

0std::string_view RemovePrefixView4(const std::string_view str, const std::string_view prefix)
1{
2    if (str.starts_with(prefix)) {
3        return str.substr(prefix.size());
4    }
5    return str;
6}

which enables a single return:

0std::string_view RemovePrefixView5(const std::string_view str, const std::string_view prefix)
1{
2    return str.starts_with(prefix)
3         ? str.substr(prefix.size())
4         : str;
5}

At which point we might as well remove the const from the params, since the new impl is now trivial:

0std::string_view RemovePrefixView6(std::string_view str, std::string_view prefix)
1{
2    return str.starts_with(prefix) ? str.substr(prefix.size()) : str;
3}

ryanofsky commented at 1:32 pm on August 21, 2024:

Makes sense that const can expose things even in simple cases and lead to insights and improvements like the one you are suggesting. In this case, I think “if starts with prefix call remove_prefix” might be more straightforward than “if starts with prefix call substr and return portion of the string after the prefix” but the difference is not great and I take your point in general const can have benefits in places like this.

in src/uint256.cpp:40 in f77d961d92 outdated

47     while (digits > 0 && p1 < pend) {
48-        *p1 = ::HexDigit(psz[--digits]);
49+        *p1 = ::HexDigit(str[--digits]);
50         if (digits > 0) {
51-            *p1 |= ((unsigned char)::HexDigit(psz[--digits]) << 4);
52+            *p1 |= ((unsigned char)::HexDigit(str[--digits]) << 4);

l0rinc commented at 10:29 am on July 22, 2024:

I see this uses little-endian encoding, while e.g. TryParseHex uses big-endian, right? Do you happen to know what the reason is? Should we maybe document this? Do we need to keep the endianness in every scenario (since that’s why we need the digit counting at the beginning) or could we use big-endian in any of the usages, e.g. if these hex strings aren’t persisted or someting (in a different PR, of course)?

See: https://github.com/bitcoin/bitcoin/pull/30482/commits/f77d961d926405637dfbdfb9a9baea1fab4f1b7b#r1685489845

hodlinator commented at 1:02 pm on July 22, 2024:

I know some hashes have long been represented in little-endian while other large integers are not. Do not know of a reason why. Changing it sounds dangerous. I will consider at least documenting which endian it is if I re-touch.

hodlinator commented at 11:32 am on July 26, 2024:

Endianness was documented in a re-touch - too bad I got it wrong the first time. :sweat_smile: Follow-up in #30526.

in src/test/uint256_tests.cpp:62 in f77d961d92 outdated

63-    uint160 rv;
64-    rv.SetHex(str);
65-    return rv;
66-}
67-inline uint160 uint160S(const std::string& str)
68+inline uint160 uint160S(std::string_view str)

l0rinc commented at 10:35 am on July 22, 2024:

we should be able to keep the const in most places:

0inline uint160 uint160S(const std::string_view str)

hodlinator commented at 1:14 pm on July 22, 2024:

See #30436 (review)

l0rinc commented at 1:25 pm on July 22, 2024:

I’m not sure why the type is relevant here, what I’m saying is that we had a const guarantee before which we can keep.

ryanofsky commented at 1:51 pm on July 22, 2024:

re: #30436 (review)

Again (see #30436 (review)) I agree with hodlinator and don’t get the desire to add const to a parameter that is passed by value and can’t be modified anyway. I’d be curious if there is more reasoning behind the suggestion, but it doesn’t seem like an improvement off hand.

l0rinc commented at 3:05 pm on July 22, 2024:

it’s not very important in this particular method, but inline + const could help with the actual inlining - it’s also clearer to the reader to understand what happens when mutating the parameter, since after inlining it should ideally avoid the copying (which is easy to do for a const). At least this was my thinking for recommending it, let me know if I’m mistaken.

ryanofsky commented at 3:39 pm on July 23, 2024:

re: #30436 (review)

it’s not very important in this particular method, but inline + const could help with the actual inlining - it’s also clearer to the reader to understand what happens when mutating the parameter, since after inlining it should ideally avoid the copying (which is easy to do for a const). At least this was my thinking for recommending it, let me know if I’m mistaken.

Responded in the other thread #30436 (review)

in src/uint256.cpp:23 in f77d961d92 outdated

17@@ -18,39 +18,31 @@ std::string base_blob<BITS>::GetHex() const
18 }
19 
20 template <unsigned int BITS>
21-void base_blob<BITS>::SetHex(const char* psz)
22+void base_blob<BITS>::SetHex(std::string_view str)
23 {
24     std::fill(m_data.begin(), m_data.end(), 0);

l0rinc commented at 10:41 am on July 22, 2024:

We shouldn’t need zeroing, if we’re processing the values exhaustively - but if we do, we could use

0    SetNull();

See: https://github.com/bitcoin/bitcoin/pull/30482/files#r1685396384

hodlinator commented at 1:16 pm on July 22, 2024:

Good find. Will do if I need to retouch that commit for other reasons.

l0rinc commented at 1:25 pm on July 22, 2024:

My other suggestions make this irrelevant, we can zero during iteration as well

ryanofsky commented at 2:30 pm on July 22, 2024:

re: #30436 (review)

Calling SetNull, or skipping zeroing as suggested seem like improvements, but no strong opinion. Also think the other suggestions to use begin / end and simplify the loop are good, so added :+1:

I can also understand other point of view of wanting to change code as little as possible while fixing a bug, and wanting to avoid asking people to rereview code that is changing and potentially introduce new bugs. But whatever you want to do here seems fine.

l0rinc commented at 3:07 pm on July 22, 2024:

if we’re afraid to modify code, it’s usually a code smell, which points to not enough test coverage. Instead of being afraid, we should improve our code coverage. We can’t take a tank to battle if we’re afraid we’ll scratch it.

in src/uint256.cpp:38 in 788fe9cc9a outdated

44+    }
45     unsigned char* p1 = m_data.data();
46     unsigned char* pend = p1 + WIDTH;
47     while (digits > 0 && p1 < pend) {
48-        *p1 = ::HexDigit(psz[--digits]);
49+        *p1 = ::HexDigit(str[--digits]);

l0rinc commented at 10:57 am on July 22, 2024:

we could avoid double assignment (and casting) by using a temp variable for the high nibble as such:

0    for (auto it = m_data.begin(); it < m_data.end(); ++it) {
1        auto lo = (digits > 0) ? ::HexDigit(trimmed[--digits]) : 0;
2        *it = (digits > 0) ? lo | (::HexDigit(trimmed[--digits]) << 4) : lo;
3    }

hodlinator commented at 1:39 pm on July 22, 2024:

I like your version slightly more, will do if I need to re-touch the commit and others agree.

Sorry for being difficult. Wish your review had come in sooner.

l0rinc commented at 1:41 pm on July 22, 2024:

Wish your review had come in sooner

nothing is lost yet, we’re all on the same side, people will glady reack :)

in src/uint256.cpp:36 in 788fe9cc9a outdated

41+        if (::HexDigit(c) == -1)
42+            break;
43+        ++digits;
44+    }
45     unsigned char* p1 = m_data.data();
46     unsigned char* pend = p1 + WIDTH;

l0rinc commented at 10:58 am on July 22, 2024:

we should be able to use m_data.begin() and m_data.end() instead

l0rinc changes_requested

l0rinc commented at 11:01 am on July 22, 2024: contributor

Thanks for tackling these, please my comments, most of them originally posted to https://github.com/bitcoin/bitcoin/pull/30482/files

hodlinator commented at 12:28 pm on July 22, 2024: contributor

Also, you can adjust the pull request description with a motivation and fix. Otherwise, it is just two random links, and reviewers will have a hard time seeing the motivation and gist of the changes.

Done. Thank you for you patience @maflcko.

l0rinc commented at 12:57 pm on July 22, 2024: contributor

NACK 788fe9cc9ab979c5e14f544ae05dc46436892b81, because of #30436 (review) mostly. updated since

ryanofsky approved

ryanofsky commented at 1:22 pm on July 22, 2024: contributor

Code review ACK 788fe9cc9ab979c5e14f544ae05dc46436892b81. No changes since my last review other than a commit message change.

This might be ready for merge since it has 3 acks and is definitely an improvement over the status quo. But I haven’t read through the comments yet or looked into the concerns behind the one nack. It also does seem like this PR might still be a little rough around the edges, so maybe it wouldn’t hurt to iterate a little more.

re: #30436#issue-2405108982

Prior to this, TxidFromString() was passing string_view::data() into uint256S() which meant it would only receive the a naked char* pointer and potentially scan past the string_view::length() until it found a null terminator.

This is a helpful summary, but not 100% accurate. uint256S() does currently require a terminator, but it doesn’t have to be a \0 terminator. Any character other than 01234567890ABCDEFabcdef will work as a terminator and stop reading the string.

ryanofsky commented at 1:49 pm on July 22, 2024: contributor

I think this would benefit from a unit test. Would suggest including following test in transaction_tests.cpp in the first commit:

0BOOST_AUTO_TEST_CASE(test_txid)
1{
2    // TxidFromString currently ignores string_view length and reads the whole
3    // string, not the specified substring.
4    BOOST_CHECK_EQUAL(TxidFromString(std::string_view("ABCD1234").substr(0, 4)).ToString(),
5        "00000000000000000000000000000000000000000000000000000000abcd1234");
6}

And then updating the test in the bugfix commit to make it clear how the bugfix changes behavior:

0BOOST_AUTO_TEST_CASE(test_txid)
1{
2    // Make sure TxidFromString respects string_view length and stops reading at
3    // end of the substring.
4    BOOST_CHECK_EQUAL(TxidFromString(std::string_view("ABCD1234").substr(0, 4)).ToString(),
5        "000000000000000000000000000000000000000000000000000000000000abcd");
6}

ryanofsky approved

ryanofsky commented at 2:37 pm on July 22, 2024: contributor

Responded to a bunch of comments, but after reading them I do think this PR looks ok in its current form, and is an improvement.

Would be happy to merge as is, or re-ack if there are more changes.

hodlinator commented at 10:13 pm on July 22, 2024: contributor

This is a helpful summary, but not 100% accurate. uint256S() does currently require a terminator, but it doesn’t have to be a \0 terminator. Any character other than 01234567890ABCDEFabcdef will work as a terminator and stop reading the string.

(Updated summary slightly).

hodlinator force-pushed on Jul 23, 2024

hodlinator commented at 7:58 am on July 23, 2024: contributor

1b7bfbab04f73ca6b9256b4e01f1eec3da5cb541 - Applied many of the suggestions from @paplorinc and @ryanofsky. Hopefully this will achieve even more ACKs.

Added commit replacing BOOST_CHECK() with BOOST_CHECK_*() versions in uint256_tests.cpp.

Dropped change to RemovePrefixView() as @paplorinc’s benchmark shows no improvement #30436 (review)

Added TxidFromString()-test, demonstrating fix as per @ryanofsky’s suggestion.

Took @paplorinc’s version of the final loop in SetHex(), but changed it to range-for:

0    for (auto& elem : m_data) {
1        auto hi = (digits > 0) ? ::HexDigit(trimmed[--digits]) : 0;
2        elem = (digits > 0) ? hi | (::HexDigit(trimmed[--digits]) << 4) : hi;
3    }

Replaced first loop in SetHex() with find_if():

0    size_t digits = std::ranges::find_if(trimmed, [](char c) { return ::HexDigit(c) == -1; }) - trimmed.begin();

Let me know if you prefer:

0    size_t digits = 0;
1    for (const char c : trimmed) {
2        if (::HexDigit(c) == -1) break;
3        ++digits;
4    }

Documented endianness of hex strings as per @paplorinc’s suggestion + added test.

Sorry about the extra rebase, I’m holding one of the git commands wrongly. Comparison between previously reviewed and current

hodlinator force-pushed on Jul 23, 2024

in src/uint256.cpp:27 in bfcc016825 outdated

45-        }
46+    const auto trimmed = util::RemovePrefixView(util::TrimStringView(str), "0x");
47+    // Note: if we are passed a greater number of digits than would fit as bytes
48+    // in m_data, we will be discarding the leftmost ones.
49+    // str="12bc" in a WIDTH=1 m_data => m_data[] == "\0xbc", not "0x12".
50+    size_t digits = std::ranges::find_if(trimmed, [](char c) { return ::HexDigit(c) == -1; }) - trimmed.begin();

maflcko commented at 8:06 am on July 23, 2024:

Not sure about changing code that will be obsolete in a few commits. It seems wasteful to waste review on code that will be changed or deleted down the line. It seems better to touch every line of code only once (or the least number of times possible), than to repeatedly refactor it, and then remove the “feature”.

Also, this “refactor” (d38bcd39cd98dd45c8b530f415339cda2422ec9f) is introducing new bugs and behavior changes, so it seems like a regression either way.

For example, by removing the std::fill(m_data.begin(), m_data.end(), 0);, two SetHex calls now may result in a different value, because the previous value is not zeroed, but leaked into the new value.

(I haven’t checked for other bugs, but I am sure one can find them by spending more than a few seconds on this)

hodlinator commented at 8:27 am on July 23, 2024:

Not sure about changing code that will be obsolete in a few commits. It seems wasteful to waste review on code that will be changed or deleted down the line. It seems better to touch every line of code only once (or the least number of times possible), than to repeatedly refactor it, and then remove the “feature”.

The function is not being removed as of #30482.

For example, by removing the std::fill(m_data.begin(), m_data.end(), 0);, two SetHex calls now may result in a different value, because the previous value is not zeroed, but leaked into the new value.

This is not true, the final loop iterates over the whole of m_data, filling zeroes on the remaining elements when running out of digits. It felt much more robust to me using a range-for, and cleaner then the previous p1/pend.

maflcko commented at 8:35 am on July 23, 2024:

I see, the code may be fine then, but I still find it odd to change a line of code repeatedly over time when it can be changed once.

The function is not being removed as of #30482.

Correct, but parsing less (or more) digits that the data structure can hold does not make sense, so seems best to remove before refactoring this “feature”. If there is a use-case, it would be good to mention it, so that it is clear that the refactor is the last time this function needs to be touched.

l0rinc commented at 9:27 am on July 23, 2024:

+1

maflcko changes_requested

maflcko commented at 8:15 am on July 23, 2024: member

weak Nack, based on #30436 (review) . Not sure why a reviewed pull request for a fix is changed to add in refactoring that isn’t needed for the fix and can be done later, if it isn’t redundant at all.

Please keep “refactor” changes to the absolute minimum in real code.

in src/test/transaction_tests.cpp:1031 in 6085aebd9f outdated

1027@@ -1028,4 +1028,12 @@ BOOST_AUTO_TEST_CASE(test_IsStandard)
1028     }
1029 }
1030 
1031+BOOST_AUTO_TEST_CASE(test_TxidFromString)

l0rinc commented at 8:44 am on July 23, 2024:

nit: if you edit again, consider adding @ryanofsky as co-author to the commit

in src/test/uint256_tests.cpp:170 in 32d788a94c outdated

204-    BOOST_CHECK(OneL.begin() + 32 == OneL.end());
205-    BOOST_CHECK(MaxL.begin() + 32 == MaxL.end());
206-    BOOST_CHECK(TmpL.begin() + 32 == TmpL.end());
207-    BOOST_CHECK(GetSerializeSize(R1L) == 32);
208-    BOOST_CHECK(GetSerializeSize(ZeroL) == 32);
209+    BOOST_CHECK_EQUAL(memcmp(R1L.begin(), R1Array, 32), 0);

l0rinc commented at 8:57 am on July 23, 2024:

nit (is a bit more verbose, but gives way better error during failure):

 0diff --git a/src/test/uint256_tests.cpp b/src/test/uint256_tests.cpp
 1--- a/src/test/uint256_tests.cpp	(revision bfcc016825d8febf366a6b5eec5ed193c2c313bc)
 2+++ b/src/test/uint256_tests.cpp	(date 1721726677844)
 3@@ -169,11 +169,11 @@
 4     TmpL.SetHex(ZeroL.ToString()); BOOST_CHECK_EQUAL(TmpL, uint256());
 5 
 6     TmpL.SetHex(R1L.ToString());
 7-    BOOST_CHECK_EQUAL(memcmp(R1L.begin(), R1Array, 32), 0);
 8-    BOOST_CHECK_EQUAL(memcmp(TmpL.begin(), R1Array, 32), 0);
 9-    BOOST_CHECK_EQUAL(memcmp(R2L.begin(), R2Array, 32), 0);
10-    BOOST_CHECK_EQUAL(memcmp(ZeroL.begin(), ZeroArray, 32), 0);
11-    BOOST_CHECK_EQUAL(memcmp(OneL.begin(), OneArray, 32), 0);
12+    BOOST_CHECK_EQUAL_COLLECTIONS(R1L.begin(), R1L.end(), R1Array, R1Array + R1L.size());
13+    BOOST_CHECK_EQUAL_COLLECTIONS(TmpL.begin(), TmpL.end(), R1Array, R1Array + TmpL.size());
14+    BOOST_CHECK_EQUAL_COLLECTIONS(R2L.begin(), R2L.end(), R2Array, R2Array + R2L.size());
15+    BOOST_CHECK_EQUAL_COLLECTIONS(ZeroL.begin(), ZeroL.end(), ZeroArray, ZeroArray + ZeroL.size());
16+    BOOST_CHECK_EQUAL_COLLECTIONS(OneL.begin(), OneL.end(), OneArray, OneArray + OneL.size());
17     BOOST_CHECK_EQUAL(R1L.size(), sizeof(R1L));
18     BOOST_CHECK_EQUAL(sizeof(R1L), 32);
19     BOOST_CHECK_EQUAL(R1L.size(), 32);
20@@ -215,11 +215,11 @@
21     TmpS.SetHex(ZeroS.ToString()); BOOST_CHECK_EQUAL(TmpS, uint160());
22 
23     TmpS.SetHex(R1S.ToString());
24-    BOOST_CHECK_EQUAL(memcmp(R1S.begin(), R1Array, 20), 0);
25-    BOOST_CHECK_EQUAL(memcmp(TmpS.begin(), R1Array, 20), 0);
26-    BOOST_CHECK_EQUAL(memcmp(R2S.begin(), R2Array, 20), 0);
27-    BOOST_CHECK_EQUAL(memcmp(ZeroS.begin(), ZeroArray, 20), 0);
28-    BOOST_CHECK_EQUAL(memcmp(OneS.begin(), OneArray, 20), 0);
29+    BOOST_CHECK_EQUAL_COLLECTIONS(R1S.begin(), R1S.end(), R1Array, R1Array + R1S.size());
30+    BOOST_CHECK_EQUAL_COLLECTIONS(TmpS.begin(), TmpS.end(), R1Array, R1Array + TmpS.size());
31+    BOOST_CHECK_EQUAL_COLLECTIONS(R2S.begin(), R2S.end(), R2Array, R2Array + R2S.size());
32+    BOOST_CHECK_EQUAL_COLLECTIONS(ZeroS.begin(), ZeroS.end(), ZeroArray, ZeroArray + ZeroS.size());
33+    BOOST_CHECK_EQUAL_COLLECTIONS(OneS.begin(), OneS.end(), OneArray, OneArray + OneS.size());
34     BOOST_CHECK_EQUAL(R1S.size(), sizeof(R1S));
35     BOOST_CHECK_EQUAL(sizeof(R1S), 20);
36     BOOST_CHECK_EQUAL(R1S.size(), 20);

in src/test/util/setup_common.cpp:82 in 32d788a94c outdated

78@@ -79,6 +79,18 @@ const std::function<std::string(const char*)> G_TRANSLATION_FUN = nullptr;
79 /** Random context to get unique temp data dirs. Separate from g_insecure_rand_ctx, which can be seeded from a const env var */
80 static FastRandomContext g_insecure_rand_ctx_temp_path;
81 
82+std::ostream& operator<<(std::ostream& os, const arith_uint256& num)

l0rinc commented at 8:57 am on July 23, 2024:

nice!

in src/test/uint256_tests.cpp:76 in 32d788a94c outdated

72@@ -73,54 +73,54 @@ inline uint160 uint160S(const std::string& str)
73 
74 BOOST_AUTO_TEST_CASE( basics ) // constructors, equality, inequality
75 {
76-    BOOST_CHECK(1 == 0+1);
77+    BOOST_CHECK_EQUAL(1, 0+1);

l0rinc commented at 9:01 am on July 23, 2024:

I wonder what this does exactly :)

hodlinator commented at 10:15 am on July 23, 2024:

Indeed.. saw it too. Seems like temporary code that somehow made it into the test. Guess I could remove if I retouch.

in src/test/transaction_tests.cpp:1036 in bfcc016825 outdated

1034-    // string, not the specified substring.
1035+    // Make sure TxidFromString respects string_view length and stops reading at
1036+    // end of the substring.
1037     BOOST_CHECK_EQUAL(TxidFromString(std::string_view("ABCD1234").substr(0, 4)).ToString(),
1038-        "00000000000000000000000000000000000000000000000000000000abcd1234");
1039+        "000000000000000000000000000000000000000000000000000000000000abcd");

l0rinc commented at 9:34 am on July 23, 2024:

+1, verified, fails as expected in previous commit nit, could probably be simplifed to:

0    BOOST_CHECK_EQUAL(TxidFromString(std::string_view("ABCD1234", 4)),
1                      uint256S("000000000000000000000000000000000000000000000000000000000000abcd"));

hodlinator commented at 1:08 pm on July 23, 2024:

Removed the .substr() now but kept the raw string instead of wrapping the second parameter in uint256S().

l0rinc approved

l0rinc commented at 9:36 am on July 23, 2024: contributor

Thanks for fixing this! ACK bfcc016825d8febf366a6b5eec5ed193c2c313bc

DrahtBot requested review from ryanofsky on Jul 23, 2024

test: Add test for TxidFromString() behavior f0eeee2dc1

refactor: Make uint256_tests no longer use deprecated BOOST_CHECK() f11f816800

test: uint256 - Garbage suffixes and zero padding 2f5577dc2e

refactor: Change base_blob::SetHex() to take std::string_view

Clarify that hex strings are parsed as little-endian.

01e314ce0a

fix: Make TxidFromString() respect string_view length

Prior to this, passing string_view::data() into uint256S() meant the latter would only receive the a naked char* pointer and potentially scan past the string_view::length() until it found a null terminator (or some other non-hex character).

Appears to have been a fully dormant bug as callers were either passing a string literal or std::string directly to TxidFromFromString(), meaning null terminator always existed at pointer[length()]. Bug existed since original merge of TxidFromString(), discussed in https://github.com/bitcoin/bitcoin/pull/28922#discussion_r1404437378.

09ce3501fa

hodlinator force-pushed on Jul 23, 2024

hodlinator commented at 1:06 pm on July 23, 2024: contributor

Credit @ryanofsky as author of TxidFromString() test with a minor transformation.

Use BOOST_CHECK_EQUAL_COLLECTIONS().

SetHex() - Largely reset back to previously reviewed implementation with 3 ACKs. Sacrificing both my find_if() and @paplorinc & mine’s very nice final loop. Keeping the trimmed string_view variable and consted the parameter as a hat-tip to @paplorinc.

As I was changing SetHex() I initially forgot to bring back std::fill(m_data.begin(), m_data.end(), 0);, but unit tests still succeeded, so I added a test to catch that.

Hope this strikes a good balance. Thank you again for your patience and hopefully my intuition in doing async review online has been refined.

in src/test/uint256_tests.cpp:168 in 09ce3501fa

184     uint256 TmpL(R1L);
185-    BOOST_CHECK(TmpL == R1L);
186-    TmpL.SetHex(R2L.ToString());   BOOST_CHECK(TmpL == R2L);
187-    TmpL.SetHex(ZeroL.ToString()); BOOST_CHECK(TmpL == uint256());
188+    BOOST_CHECK_EQUAL(TmpL, R1L);
189+    // Verify previous values don't persist when setting to truncated string.

l0rinc commented at 1:40 pm on July 23, 2024:

+1

in src/uint256.cpp:34 in 09ce3501fa

40-    while (::HexDigit(psz[digits]) != -1)
41-        digits++;
42+    for (const char c : trimmed) {
43+        if (::HexDigit(c) == -1) break;
44+        ++digits;
45+    }

l0rinc commented at 1:42 pm on July 23, 2024:

nit: was

0 while (digits < trimmed.size() && ::HexDigit(trimmed[digits]) != -1)
1        ++digits;

also controversial? It has a simpler diff

hodlinator commented at 7:30 pm on July 23, 2024:

Agree the diff would have been simpler, in this instance I was attempting to revert back to what already had been discussed and ACK:ed before you joined the review. Thanks for helping improve this PR!

l0rinc approved

l0rinc commented at 1:44 pm on July 23, 2024: contributor

ACK 09ce3501fa2ea2885a857e380eddb74605f7038c

Note: adding co-authorship for the commits that reviewers have contributed to is a good way to make sure all participants are motivated to continue doing it.

DrahtBot requested review from maflcko on Jul 23, 2024

in src/test/uint256_tests.cpp:173 in f11f816800 outdated

207-    BOOST_CHECK(GetSerializeSize(ZeroL) == 32);
208+    BOOST_CHECK_EQUAL_COLLECTIONS(R1L.begin(), R1L.end(), R1Array, R1Array + R1L.size());
209+    BOOST_CHECK_EQUAL_COLLECTIONS(TmpL.begin(), TmpL.end(), R1Array, R1Array + TmpL.size());
210+    BOOST_CHECK_EQUAL_COLLECTIONS(R2L.begin(), R2L.end(), R2Array, R2Array + R2L.size());
211+    BOOST_CHECK_EQUAL_COLLECTIONS(ZeroL.begin(), ZeroL.end(), ZeroArray, ZeroArray + ZeroL.size());
212+    BOOST_CHECK_EQUAL_COLLECTIONS(OneL.begin(), OneL.end(), OneArray, OneArray + OneL.size());

maflcko commented at 3:29 pm on July 23, 2024:

style nit in f11f816800ac520064a1e96871d0b4cc9601ced7: Seems confusing to use the left size to calculate the right end. Seems easier to just use std::end(right), if it compiles?

Edit: Or use std::array and the begin() and end() methods.

l0rinc commented at 6:56 pm on July 23, 2024:

I originally did that for the +32 case, but wasn’t working for the +20

hodlinator commented at 11:28 am on July 26, 2024:

Agree it is bad form to use variable from the left side of the comparison in the right side.

Tested std::end(R1Array) but it would be off by 1 thanks to the null-terminator in the string, resulting in a size/distance-mismatch between the collections/iterators being tested.

Follow-up fixing this issue: #30526

maflcko commented at 3:34 pm on July 23, 2024: member

re-ACK 09ce3501fa2ea2885a857e380eddb74605f7038c 🕓

Signature:

0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
2trusted comment: re-ACK 09ce3501fa2ea2885a857e380eddb74605f7038c 🕓
3nckkzWnlpOSIdef04sGXd+awRj6JrntK6TIzwwCm2K3hhSnHCUyW+8VcOS4IzMWnefV3XMunu4gyoYA/oKIwCg==

ryanofsky approved

ryanofsky commented at 5:14 pm on July 23, 2024: contributor

Code review ACK 09ce3501fa2ea2885a857e380eddb74605f7038c. I think the current code changes are about as small as you could make to fix the bug without introducing a string copy, and the surrounding test improvements are all very nice and welcome.

It seems like may be some suggestions from paplorinc that could be responded to, but they are pretty minor. So this looks ready for merge.

ryanofsky merged this on Jul 23, 2024

ryanofsky closed this on Jul 23, 2024

in src/test/util/setup_common.cpp:84 in 09ce3501fa

78@@ -79,6 +79,18 @@ const std::function<std::string(const char*)> G_TRANSLATION_FUN = nullptr;
79 /** Random context to get unique temp data dirs. Separate from g_insecure_rand_ctx, which can be seeded from a const env var */
80 static FastRandomContext g_insecure_rand_ctx_temp_path;
81 
82+std::ostream& operator<<(std::ostream& os, const arith_uint256& num)
83+{
84+    os << ArithToUint256(num).ToString();

stickies-v commented at 7:41 pm on July 23, 2024:

Is the ArithToUint256 conversion necessary?

hodlinator commented at 7:49 pm on July 23, 2024:

Good point, I missed the base_uint::ToString() method. Not as familiar with arith_uint256 as I am with uint256 yet.

Should probably also document endianness of arith_uint256 strings to mirror the added comments to uint256 in this PR.

hodlinator commented at 11:28 am on July 26, 2024:

Follow-up fixing this issue: #30526

stickies-v commented at 7:42 pm on July 23, 2024: contributor

post-merge ACK 09ce3501fa2ea2885a857e380eddb74605f7038c

hodlinator deleted the branch on Jul 23, 2024

hodlinator referenced this in commit 82312c0c7b on Jul 25, 2024

hodlinator referenced this in commit 7f71275db0 on Jul 25, 2024

hodlinator referenced this in commit f01848fb03 on Jul 26, 2024

hodlinator referenced this in commit 07de45a882 on Jul 29, 2024

hodlinator referenced this in commit f352d9c1a6 on Aug 1, 2024

hodlinator referenced this in commit 0828a3601c on Aug 1, 2024

luke-jr referenced this in commit e75283ac60 on Aug 1, 2024

luke-jr referenced this in commit 7a9f5c07a4 on Aug 1, 2024

luke-jr referenced this in commit f2999640e9 on Aug 1, 2024

luke-jr referenced this in commit f027fb93af on Aug 1, 2024

luke-jr referenced this in commit fffa0ff1f6 on Aug 1, 2024

hodlinator referenced this in commit 73e3fa10b4 on Aug 3, 2024

ryanofsky referenced this in commit 1a7d20509f on Aug 5, 2024

fix: Make TxidFromString() respect string_view length #30436

Problem

Solution

Code Coverage

Reviews

Conflicts