node: reduce unsafe uint256S usage

stickies-v commented at 6:59 pm on August 1, 2024: contributor

Since fad2991ba073de0bd1f12e42bf0fbaca4a265508, uint256S has been deprecated because it is less robust than the base_blob::FromHex() introduced in the same PR. Specifically, it tries to recover from length-mismatches, recover from untrimmed whitespace, 0x-prefix and garbage at the end, instead of simply requiring exactly 64 hex-only characters. (see also #30532)

This PR carves out the few uint256S callsites that may potentially prove a bit more controversial to change because they deal with user input and backwards incompatible behaviour change.

The main behaviour change introduced in this PR is:

-minimumchainwork will raise an error when input is longer than 64 hex digits
-assumevalid will raise an error when input contains invalid hex characters, or when it is longer than 64 hex digits
test: the optional RANDOM_CTX_SEED env var will now cause tests to abort when it contains invalid hex characters, or when it is longer than 64 hex digits

After this PR, the remaining work to remove uint256S completely is almost entirely mechanical and/or test related. I will open that PR once #30560 is merged because it builds on that.

DrahtBot commented at 6:59 pm on August 1, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	hodlinator, ryanofsky, l0rinc, achow101
Stale ACK	maflcko

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#30618 (test: TrySanitizeHexNumber FUZZ and unit testing coverage by l0rinc)
#29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

in src/node/chainstatemanager_args.cpp:38 in c0d508e69c outdated

31@@ -32,13 +32,20 @@ util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManage
32     if (auto value{args.GetBoolArg("-checkpoints")}) opts.checkpoints_enabled = *value;
33 
34     if (auto value{args.GetArg("-minimumchainwork")}) {
35-        if (!IsHexNumber(*value)) {
36-            return util::Error{strprintf(Untranslated("Invalid non-hex (%s) minimum chain work value specified"), *value)};
37+        if (auto min_work{uint256::FromHex(util::RemovePrefixView(*value, "0x"))}) {
38+            opts.minimum_chain_work = UintToArith256(*min_work);
39+        } else {
40+            return util::Error{strprintf(Untranslated("Invalid minimum chain work value specified (%s), must be %d character hex"), *value, uint256::size() * 2)};

maflcko commented at 7:24 pm on August 1, 2024:

I wonder if it makes sense to add an option to FromHex to allow parsing of “hex numbers”, which would remove the prefix and only reject too large or junk input?

In any case, you can remove IsHexNumber?

stickies-v commented at 4:39 pm on August 2, 2024:

Thanks for the suggestion. I’ve been exploring different options here. Adding a bool is_num optional param to uint256::FromHex() is an option, but felt a bit awkward because it’d be a uint256-specific parameter (doesn’t make sense for transaction_identifier).

Since we only need this for user input validation, I’ve now taken a different approach where I’ve repurposed IsHexNumber() -> TrySanitizeHexNumber(). I think this minimizes churn (e.g. the test suite can largely be kept as is), isolates user input validation from code that’s mostly used internally and benefits from being strict and simple.

in src/node/chainstatemanager_args.cpp:46 in c0d508e69c outdated

45-    if (auto value{args.GetArg("-assumevalid")}) opts.assumed_valid_block = uint256S(*value);
46+    if (auto value{args.GetArg("-assumevalid")}) {
47+        if (auto block_hash{uint256::FromHex(util::RemovePrefixView(*value, "0x"))}) {
48+            opts.assumed_valid_block = *block_hash;
49+        } else {
50+            return util::Error{strprintf(Untranslated("Invalid assumevalid block hash specified (%s), must be %d character hex"), *value, uint256::size() * 2)};

maflcko commented at 7:29 pm on August 1, 2024:

I presume this will break, when someone sets -noassumevalid?

0Error: Invalid assumevalid block hash specified (0), must be 64 character hex

stickies-v commented at 8:47 pm on August 1, 2024:

Ah thanks, didn’t think of that. Will add a “0” check:

 0diff --git a/src/node/chainstatemanager_args.cpp b/src/node/chainstatemanager_args.cpp
 1index 8085aa65dc..60cee3dd78 100644
 2--- a/src/node/chainstatemanager_args.cpp
 3+++ b/src/node/chainstatemanager_args.cpp
 4@@ -31,7 +31,7 @@ util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManage
 5 
 6     if (auto value{args.GetBoolArg("-checkpoints")}) opts.checkpoints_enabled = *value;
 7 
 8-    if (auto value{args.GetArg("-minimumchainwork")}) {
 9+    if (auto value{args.GetArg("-minimumchainwork")}; value && *value != "0") {
10         if (auto min_work{uint256::FromHex(util::RemovePrefixView(*value, "0x"))}) {
11             opts.minimum_chain_work = UintToArith256(*min_work);
12         } else {
13@@ -39,7 +39,7 @@ util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManage
14         }
15     }
16 
17-    if (auto value{args.GetArg("-assumevalid")}) {
18+    if (auto value{args.GetArg("-assumevalid")}; value && *value != "0") {
19         if (auto block_hash{uint256::FromHex(util::RemovePrefixView(*value, "0x"))}) {
20             opts.assumed_valid_block = *block_hash;
21         } else {

stickies-v commented at 4:40 pm on August 2, 2024:

-minimumchainwork now accepts hex number input (including “0”) again, and -assumevalid has an extra *value != "0" check. Marking as resolved.

in src/test/util/random.cpp:30 in c0d508e69c outdated

30-        const char* num = std::getenv(RANDOM_CTX_SEED.c_str());
31-        if (num) return uint256S(num);
32+        if (const auto num{std::getenv(RANDOM_CTX_SEED)}) {
33+            uint256 rv;
34+            rv.SetHexDeprecated(num);
35+            return rv;

maflcko commented at 7:30 pm on August 1, 2024:

Seems fine to just use Assert(FromHex(...))?

stickies-v commented at 4:41 pm on August 2, 2024:

Done (and RANDOM_CTX_SEED now also accepts < 64 character input string), thanks.

hodlinator commented at 7:54 pm on August 7, 2024:

To me asserts are for enforcing logical consistency of the code, NOT for handling invalid values of environment variables.

A slight convention seems to be using abort() in these circumstances.

0            if (auto num_parsed{uint256::FromHex(num)}) {
1                return *num_parsed;
2            } else {
3                std::cerr << RANDOM_CTX_SEED << " must be a " << uint256::size() * 2 << " char hex string without '0x'-prefix, was set to: '" << num << "'." << std::endl;
4                std::abort();
5            }

(I also tried BOOST_TEST_FAIL but ran into linker errors for the test_util library so would rather avoid adding a dependency).

Gives output:

0$ RANDOM_CTX_SEED=123 src/test/test_bitcoin
1Running 621 test cases...
2RANDOM_CTX_SEED must be a 64 char hex string without '0x'-prefix, was set to: '123'.
3unknown location(0): fatal error: in "addrman_tests/addrman_simple": signal: SIGABRT (application abort requested)
4test/addrman_tests.cpp(64): last checkpoint: "addrman_simple" fixture ctor
5test_bitcoin: common/args.cpp:576: void ArgsManager::AddArg(const std::string&, const std::string&, unsigned int, const OptionsCategory&): Assertion `ret.second' failed.
6unknown location(0): fatal error: in "addrman_tests/addrman_ports": signal: SIGABRT (application abort requested)
7test/addrman_tests.cpp(108): last checkpoint: "addrman_ports" fixture ctor
8...

maflcko commented at 9:28 pm on August 7, 2024:

To me asserts are for enforcing logical consistency of the code, NOT for handling invalid values of environment variables.

Correct. This rule must be applied to real production code. However, in the tests, personally I like to use it for brevity. No objection to using something else.

stickies-v commented at 10:52 am on August 8, 2024:

However, in the tests, personally I like to use it for brevity.

I agree. The tests fail gracefully, and the error message is helpful, including the location of the assertion which documents that we’re expecting a 64 char hex string. Going to leave as is.

0Running 1 test case...
1test/util/random.cpp:29 operator(): Assertion `uint256::FromHex(num)' failed.
2unknown location:0: fatal error: in "timeoffsets_tests/timeoffsets_warning": signal: SIGABRT (application abort requested)
3test/timeoffsets_tests.cpp:62: last checkpoint: "timeoffsets_warning" fixture ctor
4
5*** 1 failure is detected in the test module "Bitcoin Core Test Suite"

hodlinator commented at 12:08 pm on August 8, 2024:

The inclination to use Assert for any fatal error is understandable, but as I said in my book they are for catching logic errors - not for validating user environment. My non-Bitcoin experience tells me that policies around what should be asserts require upkeep, so I took the time to provide an alternate version.

Was reading https://bitcoincore.academy/bin/onboarding-to-bitcoin-core.html today and came across

We take extra care during the encryption phase to either complete atomically or fail. This includes database writes where we don’t want to write half and crash, for example. Therefore we will throw an assertion if the write fails.

Not sure if the actual code really uses assertions to stop the process if the database reports write failure. Maybe Bitcoin Core has a looser policy than I’m used to, and at least @maflcko is okay with using it in tests.

From a purely functional angle, the std::cerr output provides more immediate and actionable information than the assert. test/util/random.cpp:29 operator(): Assertion `uint256::FromHex(num)' failed. vs RANDOM_CTX_SEED must be a 64 char hex string without '0x'-prefix, was set to: '123'.

I’m happy you incorporated some of my other suggestions but I pointed out this as my major gripe with the PR.

Staying true to my principles and giving this an Approach A-C-K generates cognitive dissonance. But maybe I could be bribed by promises of likely support for a future PR from me cleaning up incorrect Assert usage in tests in general.

maflcko commented at 12:26 pm on August 8, 2024:

Therefore we will throw an assertion if the write fails.

Not sure if the actual code really uses assertions to stop the process if the database reports write failure. Maybe Bitcoin Core has a looser policy than I’m used to, and at least @maflcko is okay with using it in tests.

I haven’t checked the code, but “throw assertion” could mean “throw assertion error”, which seems fine to use in this context the quote was taken from.

Not to be confused with other context, where aborting the program on (some) hardware corruption is actually intentional, and I think also fine. When your hardware runs into a corrupt state, aborting seems preferable, when otherwise execution can’t continue meaningfully anyway. When it happens, the user will usually have to wipe their state or replace their hardware anyway. It is also not external user-input that is passed to the assert, in this case (at least it can be assumed to).

But maybe I could be bribed by promises of likely support for a future PR from me cleaning up incorrect Assert usage in tests in general.

Heh, I am generally happy to review any test-related pull requests.

hodlinator commented at 12:57 pm on August 8, 2024:

Using assertions to prevent the process from continuing in an unsupported state is certainly better than nothing at all.

Once the risk of unsupported state due to hardware failure or environment variable misconfiguration becomes known, one should implement non-assert error handling code, as the unsupported state is now expected in some circumstances.

I concede that in this case the environment variable is usually supposed to be set by Python functional tests, so there is a weak case to be made that it is a logic error. But if a problematic seed value is found and incorrectly copied like in my “123” example above, skipping over the Python layer, it no longer is a code-level logic error.

maflcko commented at 1:12 pm on August 8, 2024:

in this case the environment variable is usually supposed to be set by Python functional tests, so there is a weak case to be made that it is a logic error.

Going back to this case, I don’t think RANDOM_CTX_SEED is set by python at all. It should only be set by a dev (or user), or not at all.

At least when I locally call git grep RANDOM_CTX_SEED, I don’t see it.

hodlinator commented at 1:19 pm on August 8, 2024:

Thanks for calling out my assumption that test_runner.py --randomseed=X could trickle down into unit tests.

Makes my case for it not being a logic error slightly stronger.

stickies-v commented at 1:56 pm on August 8, 2024:

Force pushed to use std::abort() since it seems everyone can get behind that, thanks for the suggestion.

maflcko changes_requested

maflcko commented at 7:31 pm on August 1, 2024: member

Not sure about the breaking changes. -noassumevalid seems common https://github.com/search?q=noassumevalid&type=code

DrahtBot added the label CI failed on Aug 1, 2024

stickies-v force-pushed on Aug 2, 2024

stickies-v commented at 4:44 pm on August 2, 2024: contributor

Thanks a lot for the quick approach feedback, @maflcko. Force pushed to minimize breaking changes by repurposing IsHexNumber() to TrySanitizeHexNumber() and allowing < 64 characters for -minimumchainwork and RANDOM_CTX_SEED, and to address the broken -no<parameter> behaviour.

in src/test/fuzz/hex.cpp:30 in 713d92e29f outdated

26@@ -27,7 +27,7 @@ FUZZ_TARGET(hex)
27     if (IsHex(random_hex_string)) {
28         assert(ToLower(random_hex_string) == hex_data);
29     }
30-    (void)IsHexNumber(random_hex_string);
31+    (void)TrySanitizeHexNumber(random_hex_string);

l0rinc commented at 4:36 pm on August 3, 2024:

would it make sense to test the result_size param here, too?

stickies-v commented at 1:28 pm on August 6, 2024:

Good idea, updated fuzz/hex.cpp to cover result_size with an int16_t. Not sure what the optimal range of values would be here, but I figured not making it too large (int) makes sense given that result_size is not user input?

in src/test/util_tests.cpp:458 in 713d92e29f outdated

475+    BOOST_CHECK(!TrySanitizeHexNumber(" "));      // etc.
476+    BOOST_CHECK(!TrySanitizeHexNumber("0x0ga"));  // invalid character
477+    BOOST_CHECK(!TrySanitizeHexNumber("x0"));     // broken prefix
478+    BOOST_CHECK(!TrySanitizeHexNumber("0x0x00")); // two prefixes not allowed
479+
480+    BOOST_CHECK_EQUAL(TrySanitizeHexNumber("0x1234", /*target_size=*/0).value(), "1234");

l0rinc commented at 4:42 pm on August 3, 2024:

I find it a bit surprising that a result size of 0 returns a value with size 4. Do you think it would make sense to ignore negative values instead, so that a result_size of 0 returns an empty string instead, to make it slightly more intuitive?

stickies-v commented at 2:55 pm on August 5, 2024:

Done (although result_size=0 will always return std::nullopt because empty input strings are not allowed as per the existing IsHexNumber behaviour)

in src/util/strencodings.cpp:57 in 713d92e29f outdated

58+        if (HexDigit(c) < 0) return std::nullopt;
59     }
60-    // Return false for empty string or "0x".
61-    return str.size() > 0;
62+    std::string result{input};
63+    if (input.size() < result_size) {

l0rinc commented at 4:47 pm on August 3, 2024:

since we’ve checked > before, maybe it would be more self-explanatory to check whether we’re equal or not, since it’s a stronger guarantee:

0    if (input.size() != result_size) {

stickies-v commented at 2:52 pm on August 5, 2024:

Not sure about this. I don’t find it more readable, and it also makes the code less robust to changes in other lines. Going to leave as is for now.

l0rinc commented at 11:35 am on August 6, 2024:

👍, thanks for considering

in src/util/strencodings.cpp:49 in 713d92e29f outdated

45@@ -46,14 +46,19 @@ bool IsHex(std::string_view str)
46     return (str.size() > 0) && (str.size()%2 == 0);
47 }
48 
49-bool IsHexNumber(std::string_view str)
50+std::optional<std::string> TrySanitizeHexNumber(std::string_view input, size_t result_size)

l0rinc commented at 4:59 pm on August 3, 2024:

Does it still make sense to suffix it with Number after the change? And besides the TryParseHex below, it doesn’t seem common to prefix optional returning methods with Try - the return type already states that the operation might not succeed, Try seems redundant.

stickies-v commented at 10:16 am on August 6, 2024:

The Number part indicates that we’re dealing with numerical values, where 0x00 means the same thing as 0x0000, so I think that aligns with how the function behaves? Return types are not immediately visible on in the callsite, and I think Try indicating that the function accepts malformed input is helpful for the reader, so I’m going to leave this as is.

l0rinc commented at 11:39 am on August 6, 2024:

ok, the number part can make sense, I’m still not sure about the Try part - seems like https://en.wikipedia.org/wiki/Hungarian_notation, which (as a Hungarian) I’m very much against :p

Return types are not immediately visible on in the callsite

It’s part of the method’s signature, I don’t see how repeating it in the method’s name helps. The parameters also aren’t visible on the call site, yet we’re not encoding them in the method’s name.

If you insist, I won’t block you on this of course, but want to make sure that my arguments against it are clear.

in src/util/strencodings.h:83 in 713d92e29f outdated

81+ *        result_size.
82+ *
83+ * @param input Hex encoding of a number, optionally prefixed with 0x.
84+ * @param result_size Size of the resulting string, causing zero-padding
85+ *                    if input is too short, or a std::nullopt return
86+ *                    value if input is too long. Disabled if <= 0.

l0rinc commented at 5:00 pm on August 3, 2024:

Can it be negative?

stickies-v commented at 2:53 pm on August 5, 2024:

Sorry, leftover from previous version where it could be. Since result_size is now an int again, docs updated to say Disabled if <= -1

in src/node/chainstatemanager_args.cpp:42 in b38a259d1e outdated

40         }
41-        opts.minimum_chain_work = UintToArith256(uint256S(*value));
42     }
43 
44-    if (auto value{args.GetArg("-assumevalid")}) opts.assumed_valid_block = uint256S(*value);
45+    if (auto value{args.GetArg("-assumevalid")}; value && *value != "0") {

l0rinc commented at 7:14 pm on August 3, 2024:

shouldn’t we still set the assumed_valid_block value when it’s 0, given that defaultAssumeValid isn’t 0?

stickies-v commented at 10:13 am on August 6, 2024:

Ugh, you’re right, thanks a lot for catching this. This approach would’ve silently ignored -noassumevalid and -assumevalid=0 by using the defaultAssumeValid value instead. I think it’s not great that this bug didn’t cause any tests to fail. feature_assumevalid.py doesn’t catch this because defaultAssumeValid on regtest is null (and I don’t think there’s a way around this), and we don’t seem to have any unittesting on applying args to options.

I’ve added 1c909b2ccfaee902637b28f890db8dbe168ea926 as a first commit (fails on b38a259d1e2749876d4e3a15f218b4f257049320) to add unittesting on -assumevalid and -minimumchainwork.

in src/util/strencodings.h:81 in b38a259d1e outdated

79+ *        result is padded with leading zeroes until result_size is
80+ *        reached, or std::nullopt returned if input is longer than
81+ *        result_size.
82+ *
83+ * @param input Hex encoding of a number, optionally prefixed with 0x.
84+ * @param result_size Size of the resulting string, causing zero-padding

l0rinc commented at 7:18 pm on August 3, 2024:

I think we could document how to disable this param - currently with 0, I recommend considering negative numbers

stickies-v commented at 2:53 pm on August 5, 2024:

Changed to an int type which takes a negative value to disable.

l0rinc commented at 7:19 pm on August 3, 2024: contributor

Thanks for clearing these up, please see my observations.

in src/test/util_tests.cpp:459 in b38a259d1e outdated

476+    BOOST_CHECK(!TrySanitizeHexNumber("0x0ga"));  // invalid character
477+    BOOST_CHECK(!TrySanitizeHexNumber("x0"));     // broken prefix
478+    BOOST_CHECK(!TrySanitizeHexNumber("0x0x00")); // two prefixes not allowed
479+
480+    BOOST_CHECK_EQUAL(TrySanitizeHexNumber("0x1234", /*target_size=*/0).value(), "1234");
481+    BOOST_CHECK_EQUAL(TrySanitizeHexNumber("0x1234", /*target_size=*/4).value(), "1234");

l0rinc commented at 7:22 pm on August 3, 2024:

0    BOOST_CHECK_EQUAL(TrySanitizeHexNumber("0x1234", /*result_size=*/4).value(), "1234");

stickies-v commented at 2:07 pm on August 5, 2024:

Thanks, leftover from a previous version, updated now!

in src/test/util/random.cpp:30 in b38a259d1e outdated

30     static const uint256 ctx_seed = []() {
31         // If RANDOM_CTX_SEED is set, use that as seed.
32-        const char* num = std::getenv(RANDOM_CTX_SEED.c_str());
33-        if (num) return uint256S(num);
34+        if (const auto num{std::getenv(RANDOM_CTX_SEED)}) {
35+            auto sanitized{*Assert(TrySanitizeHexNumber(num, uint256::size() * 2))}; // RANDOM_CTX_SEED must be up to 64 char hex string

maflcko commented at 11:02 am on August 5, 2024:

I just don’t see the use case to accept accidentally truncated input silently. Why not just fa295215f6fa3b85b0387511920f75eeb3e12b58?

stickies-v commented at 10:24 am on August 6, 2024:

I couldn’t see RANDOM_CTX_SEED be documented anywhere as having to be a 64 char hex string, so I thought trying to be lenient would be good for backwards compatibility, but I don’t have a view on how important that is so happy to go with your simplification. Adopted in latest force push.

maflcko commented at 8:07 am on August 7, 2024:

I couldn’t see RANDOM_CTX_SEED be documented anywhere as having to be a 64 char hex string, so I thought trying to be lenient would be good for backwards compatibility, but I don’t have a view on how important that is so happy to go with your simplification. Adopted in latest force push.

Thanks.

For historic context, it was added in fae43a97ca947cd0802392e9bb86d9d0572c0fba, where it is only printed. Thus, the only source should be to copy-paste from a printed value.

If not, and someone were to provide the seed from “outside”, I think it is reasonable and clearer to require them to provide it in the exact same hex format. Otherwise, it seems fragile.

stickies-v force-pushed on Aug 6, 2024

stickies-v commented at 10:49 am on August 6, 2024: contributor

Force-pushed to address review comments, mainly:

fixed buggy -noassumedvalid behaviour and added unit tests to cover this
changed result_size to an int type and changed the disabled (default) state to -1 instead of 0
changed RANDOM_CTX_SEED to now have to be a 64 char hex string

stickies-v force-pushed on Aug 6, 2024

DrahtBot removed the label CI failed on Aug 6, 2024

in src/test/validation_chainstatemanager_tests.cpp:801 in 1c909b2ccf outdated

796+        return opts;
797+    };
798+
799+    // test -assumevalid
800+    BOOST_CHECK(!set_opts({}).assumed_valid_block.has_value());
801+    BOOST_CHECK(set_opts({"-assumevalid=0"}).assumed_valid_block.value().IsNull());

l0rinc commented at 8:53 am on August 7, 2024:

Thanks for covering these scenarios!

I checked to see how this fails for the previous error:

unknown location:0: fatal error: in “validation_chainstatemanager_tests/chainstatemanager_args”: std::bad_optional_access: bad_optional_access

i.e. it doesn’t get to the IsNull check since it can’t call value() on a null optional, right?

I think we can work around that by comparing the dereferenced optional instead:

0    BOOST_CHECK_EQUAL(*set_opts({"-assumevalid=0"}).assumed_valid_block, uint256::ZERO);

which gives this error for the previously invalid setup:

src/test/validation_chainstatemanager_tests.cpp:801: error: in “validation_chainstatemanager_tests/chainstatemanager_args”: check *set_opts({"-assumevalid=0"}).assumed_valid_block == uint256::ZERO has failed [000000016bcc2e20000000016bcc2e20000000016bcc29a03b66800104140f00 != 0000000000000000000000000000000000000000000000000000000000000000]

Which I find a lot more revealing.

maflcko commented at 9:09 am on August 7, 2024:

has failed [000000016bcc2e20000000016bcc2e20000000016bcc29a03b66800104140f00 != 0000000000000000000000000000000000000000000000000000000000000000]

Which I find a lot more revealing.

Can you run this in valgrind? Dereferencing a nullopt is UB (undefined behavior), which must be avoided in production (obviously), but also in tests.

If you want to learn which methods expose undefined behavior, I find [https://en.cppreference.com/w/cpp/utility/optional/operator*] a good resource.

l0rinc commented at 11:19 am on August 7, 2024:

Can you run this in valgrind?

Thanks for the context @maflcko! Valgrind currently isn’t really compatible with ARM-based Macs.

Dereferencing a nullopt is UB

And we can’t use BOOST_CHECK_EQUAL(set_opts({"-assumevalid=0"}).assumed_valid_block, std::make_optional(uint256::ZERO)) since there’s no standard way to display optionals, right?

Ok, in that case just consider the test fix validated, it failed before the fix, passes now.

maflcko commented at 11:39 am on August 7, 2024:

Can you run this in valgrind?

Thanks for the context @maflcko! Valgrind currently isn’t really compatible with ARM-based Macs.

An alternative on macos may be Asan (or possibly Ubsan), but they require re-compiling the binary.

l0rinc commented at 12:19 pm on August 7, 2024:

I’ve noted it and will give it a try next time, thank you

hodlinator commented at 7:06 pm on August 7, 2024:

And we can’t use BOOST_CHECK_EQUAL(set_opts({"-assumevalid=0"}).assumed_valid_block, std::make_optional(uint256::ZERO)) since there’s no standard way to display optionals, right?

Would this fit your definition of displaying optionals?

https://github.com/bitcoin/bitcoin/pull/16545/files#diff-d4a2fb26adedc27f16bd3778424fa94c473342a695b228220a1810119028be5bR48-R64

l0rinc commented at 9:24 pm on August 7, 2024:

Thanks, I think that would produce better error messages on failure.

stickies-v commented at 10:49 am on August 8, 2024:

Would be nice to have std::optional support in BOOST_CHECK_EQUAL but this PR has already had quite a bit of churn so I’m going to limit the scope and leave as is given that it’s not super relevant.

l0rinc commented at 1:43 pm on August 9, 2024:

Split out to pr: https://github.com/bitcoin/bitcoin/pull/30618

in src/test/util_tests.cpp:461 in f6cda17251 outdated

478+    BOOST_CHECK(!TrySanitizeHexNumber("0x0x00")); // two prefixes not allowed
479+
480+    BOOST_CHECK_EQUAL(TrySanitizeHexNumber("0x1234", /*result_size=*/-1).value(), "1234");
481+    BOOST_CHECK_EQUAL(TrySanitizeHexNumber("0x1234", /*result_size=*/4).value(), "1234");
482+    BOOST_CHECK_EQUAL(TrySanitizeHexNumber("1234", /*result_size=*/5).value(), "01234");
483+    BOOST_CHECK(!TrySanitizeHexNumber("0x1234", /*result_size=*/0));

l0rinc commented at 8:54 am on August 7, 2024:

nice

in src/util/strencodings.cpp:52 in f6cda17251 outdated

51 {
52-    if (str.substr(0, 2) == "0x") str.remove_prefix(2);
53-    for (char c : str) {
54-        if (HexDigit(c) < 0) return false;
55+    input = util::RemovePrefixView(input, "0x");
56+    if (result_size < 0) result_size = input.size(); // negative result_size disables the size check

l0rinc commented at 8:56 am on August 7, 2024:

nit: my first reaction was here - “I wonder if there’s an off-by-one error here”. If you think it makes sense, consider this alternative:

0    if (result_size < 0) result_size = std::numeric_limits<int>::max(); // negative result_size disables the size check

stickies-v commented at 9:18 am on August 7, 2024:

That wouldn’t work. result_size is used both to ensure the input is not too long, as well as to pad with leading zeroes if it’s too short: https://github.com/bitcoin/bitcoin/pull/30569/files#diff-3f688af8f182edecd9c33977b905b3e71dc010574f721fd4e328bdbc7706f574R59

l0rinc commented at 11:21 am on August 7, 2024:

K, in that case, the test cases check the boundaries correctly already, thanks for considering.

in src/util/strencodings.cpp:53 in f6cda17251 outdated

52-    if (str.substr(0, 2) == "0x") str.remove_prefix(2);
53-    for (char c : str) {
54-        if (HexDigit(c) < 0) return false;
55+    input = util::RemovePrefixView(input, "0x");
56+    if (result_size < 0) result_size = input.size(); // negative result_size disables the size check
57+    if (input.empty() || ((int)input.size() > result_size)) return std::nullopt;

l0rinc commented at 8:59 am on August 7, 2024:

I don’t particularly like the alternative, but the developer notes state:

0- Use a named cast or functional cast, not a C-Style cast. When casting
1    between integer types, use functional casts such as `int(x)` or `int{x}`
2    instead of `(int) x`. When casting between more complex types, use `static_cast`.

stickies-v commented at 9:22 am on August 7, 2024:

Thanks, will change to int{x} in next force push.

l0rinc commented at 12:05 pm on August 7, 2024:

That might not work, but extracting might:

0    auto final_size = (result_size < 0) ? input.size() : static_cast<size_t>(result_size);

stickies-v commented at 1:30 pm on August 7, 2024:

Yeah that’ll work, ~taken, thanks.

l0rinc commented at 9:00 am on August 7, 2024: contributor

Will continue reviewing a bit later, I only had time for these so far

in src/test/fuzz/hex.cpp:33 in f6cda17251 outdated

30     const std::string hex_data = HexStr(data);
31     if (IsHex(random_hex_string)) {
32         assert(ToLower(random_hex_string) == hex_data);
33     }
34-    (void)IsHexNumber(random_hex_string);
35+    (void)TrySanitizeHexNumber(random_hex_string, result_size);

l0rinc commented at 11:51 am on August 7, 2024:

Do I understand it correctly that the point of this kind of testing (i.e. just calling the method with random values without validating the result) is to make sure we don’t have memory problems, don’t throw unexpected exceptions, etc?

Since we seem to have many hex related methods, would it make sense to compare their outputs, to make sure we at least have internal consistency?

0    auto sanitized_hex = TrySanitizeHexNumber(random_hex_string, result_size);
1    if (sanitized_hex) {
2        assert(IsHex(*sanitized_hex));
3        assert(result_size < 0 || sanitized_hex->length() == static_cast<size_t>(result_size));
4    }

stickies-v commented at 12:53 pm on August 7, 2024:

is to make sure we don’t have memory problems, don’t throw unexpected exceptions, etc?

That’s my understanding too.

would it make sense to compare their outputs, to make sure we at least have internal consistency?

Sounds reasonable, but I think I’ll leave that for another PR.

l0rinc commented at 1:44 pm on August 9, 2024:

Split out to other pr: https://github.com/bitcoin/bitcoin/pull/30618

in src/test/util_tests.cpp:443 in f6cda17251 outdated

460+    BOOST_CHECK(TrySanitizeHexNumber("0"));
461+    BOOST_CHECK(TrySanitizeHexNumber("0x10"));
462+    BOOST_CHECK(TrySanitizeHexNumber("10"));
463+    BOOST_CHECK(TrySanitizeHexNumber("0xff"));
464+    BOOST_CHECK(TrySanitizeHexNumber("ff"));
465+    BOOST_CHECK(TrySanitizeHexNumber("0xFfa"));

l0rinc commented at 11:58 am on August 7, 2024:

I might have mentioned this before, but what’s the reason for empbracing mixed-case hexadecimal values?

stickies-v commented at 12:44 pm on August 7, 2024:

I don’t have a strong view on it, but this commit just reuses the existing IsHexNumber tests so I’d rather not change that here.

in src/test/util_tests.cpp:444 in f6cda17251 outdated

461+    BOOST_CHECK(TrySanitizeHexNumber("0x10"));
462+    BOOST_CHECK(TrySanitizeHexNumber("10"));
463+    BOOST_CHECK(TrySanitizeHexNumber("0xff"));
464+    BOOST_CHECK(TrySanitizeHexNumber("ff"));
465+    BOOST_CHECK(TrySanitizeHexNumber("0xFfa"));
466+    BOOST_CHECK(TrySanitizeHexNumber("Ffa"));

l0rinc commented at 11:59 am on August 7, 2024:

is it deliberate that inputs don’t always contain an even number of digits?

stickies-v commented at 12:44 pm on August 7, 2024:

Yes, since we’re dealing with numbers specifically.

in src/util/strencodings.cpp:57 in f6cda17251 outdated

55+    input = util::RemovePrefixView(input, "0x");
56+    if (result_size < 0) result_size = input.size(); // negative result_size disables the size check
57+    if (input.empty() || ((int)input.size() > result_size)) return std::nullopt;
58+    for (char c : input) {
59+        if (HexDigit(c) < 0) return std::nullopt;
60     }

l0rinc commented at 12:00 pm on August 7, 2024:

could we use IsHex here instead, i.e. merge it with previous line:

0    if (input.empty() || ((int)input.size() > result_size) || !IsHex(input)) return std::nullopt;

though this contains an extra evennes requirement which we may not want here.

stickies-v commented at 12:41 pm on August 7, 2024:

though this contains an extra evennes requirement which we may not want here.

Exactly

in src/node/chainstatemanager_args.cpp:35 in 00fba2a7d6 outdated

31@@ -32,10 +32,11 @@ util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManage
32     if (auto value{args.GetBoolArg("-checkpoints")}) opts.checkpoints_enabled = *value;
33 
34     if (auto value{args.GetArg("-minimumchainwork")}) {
35-        if (!TrySanitizeHexNumber(*value)) {
36+        if (auto sanitized_hex{TrySanitizeHexNumber(*value, /*result_size=*/64)}) {

l0rinc commented at 12:11 pm on August 7, 2024:

in the next commit we’re using uint256::size() * 2 instead of the hard-coded 64 - consider unifying, if you agree that it documents the behavior slightly better

stickies-v commented at 12:53 pm on August 7, 2024:

Makes sense, taken, thanks.

in src/node/chainstatemanager_args.cpp:43 in 686e6bcf37 outdated

38@@ -39,7 +39,14 @@ util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManage
39         }
40     }
41 
42-    if (auto value{args.GetArg("-assumevalid")}) opts.assumed_valid_block = uint256S(*value);
43+    if (auto value{args.GetArg("-assumevalid")}) {
44+        if (*value == "0") value.emplace(uint256::size() * 2, '0'); // handle -noassumevalid

l0rinc commented at 12:15 pm on August 7, 2024:

nit: if you think it’s more readable, here’s an alternative which avoids reparsing and duplicating uint256::size() * 2 (but assigns twice, so you may not like it):

0    if (auto value{args.GetArg("-assumevalid")}) {
1        if (*value == "0") { // handle -noassumevalid
2            opts.assumed_valid_block = uint256{};
3        } else if (auto block_hash{uint256::FromHex(*value)}) {
4            opts.assumed_valid_block = *block_hash;
5        } else {
6            return util::Error{strprintf(Untranslated("Invalid assumevalid block hash specified (%s), must be %d character hex (or 0 to disable)"), *value, uint256::size() * 2)};
7        }
8    }

stickies-v commented at 12:51 pm on August 7, 2024:

Yeah I like your approach better, taken, thanks.

l0rinc commented at 12:18 pm on August 7, 2024: contributor

Thanks, please see my remaining suggestions

stickies-v force-pushed on Aug 7, 2024

stickies-v commented at 2:54 pm on August 7, 2024: contributor

Thanks for the review @paplorinc, force pushed to address all outstanding comments. Style-only changes:

introduced final_size var to avoid typecasting between (un)signed types and some related cleanup
removed hardcoded 64 value
cleaned up the -noassumevalid logic a bit

l0rinc approved

l0rinc commented at 3:00 pm on August 7, 2024: contributor

ACK d045fc7ac1729cf29140c43518f20375f2aaa1cc

Note that I don’t know the second order effects of constraining previous public input values, somebody else should assess that part

in src/util/strencodings.cpp:60 in a227cb511e outdated

61-    // Return false for empty string or "0x".
62-    return str.size() > 0;
63+    std::string result{input};
64+    if (auto padding{final_size - input.size()}; padding > 0) {
65+        result = std::string(padding, '0') + result;
66+    }

hodlinator commented at 7:19 pm on August 7, 2024:

nit: If padding were on a warm code path I would advocate for reserving the final size of the result string first, avoiding the temporary ‘0’-repeated string and appending input after the zeroes.

stickies-v commented at 10:41 am on August 8, 2024:

Don’t think this is worth spending too much time on but given that the only non-test call is quite likely to require padding, I’ve updated it to:

 0diff --git a/src/util/strencodings.cpp b/src/util/strencodings.cpp
 1index 4e0317cd0e..be946af269 100644
 2--- a/src/util/strencodings.cpp
 3+++ b/src/util/strencodings.cpp
 4@@ -8,6 +8,7 @@
 5 #include <crypto/hex_base.h>
 6 #include <span.h>
 7 
 8+#include <algorithm>
 9 #include <array>
10 #include <cassert>
11 #include <cstring>
12@@ -54,10 +55,8 @@ std::optional<std::string> TrySanitizeHexNumber(std::string_view input, int resu
13     for (char c : input) {
14         if (HexDigit(c) < 0) return std::nullopt;
15     }
16-    std::string result{input};
17-    if (auto padding{final_size - input.size()}; padding > 0) {
18-        result = std::string(padding, '0') + result;
19-    }
20+    std::string result(final_size, '0');
21+    std::copy(input.begin(), input.end(), result.begin() + (final_size - input.size()));
22     return result;
23 }
24

in src/util/strencodings.h:85 in a227cb511e outdated

83+ * @param input Hex encoding of a number, optionally prefixed with 0x.
84+ * @param result_size Size of the resulting string, causing zero-padding
85+ *                    if input is too short, or a std::nullopt return
86+ *                    value if input is too long. Disabled if <= -1.
87+ *                    (default: -1)
88+ * @return std::optional<std::string>

hodlinator commented at 7:22 pm on August 7, 2024:

Why document the type here when it appears in the code below? Seems noisy if we were to do that for return types everywhere.

stickies-v commented at 10:48 am on August 8, 2024:

Agreed, autogenerated and didn’t think to remove it. Done.

in src/node/chainstatemanager_args.cpp:35 in a227cb511e outdated

31@@ -32,7 +32,7 @@ util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManage
32     if (auto value{args.GetBoolArg("-checkpoints")}) opts.checkpoints_enabled = *value;
33 
34     if (auto value{args.GetArg("-minimumchainwork")}) {
35-        if (!IsHexNumber(*value)) {
36+        if (!TrySanitizeHexNumber(*value)) {

hodlinator commented at 7:26 pm on August 7, 2024:

nit: Would personally merge the next commit (e6f81b9d81359a7ffeff6d1830188f00df0e8db0) into this one (a227cb511ec948b37ddbc9ee65de586109ebc1da) since you’re already touching this line and it looks weird as an atomic commit to just throw away the sanitized value.

stickies-v commented at 10:49 am on August 8, 2024:

I disagree, a227cb511ec948b37ddbc9ee65de586109ebc1da is a refactor commit so I prefer keeping the behaviour-changing commits such as e6f81b9d81359a7ffeff6d1830188f00df0e8db0 smaller and separate.

in src/test/util/random.cpp:16 in d045fc7ac1 outdated

12 #include <cstdlib>
13-#include <string>
14 
15 FastRandomContext g_insecure_rand_ctx;
16 
17+inline constexpr const char* RANDOM_CTX_SEED{"RANDOM_CTX_SEED"};

hodlinator commented at 7:38 pm on August 7, 2024:

constexpr implies inline for functions, doesn’t it also for variables? Never used inline on variables myself, but seems it is useful in .h files to avoid ODR-issues. This is a .cpp file though.

stickies-v commented at 10:20 am on August 8, 2024:

I think you’re right that inline was not necessary here. No longer applies as I’ve adopted maflcko’s suggestion which moves the var back inside the SeedRandomForTest scope.

in src/test/validation_chainstatemanager_tests.cpp:781 in d045fc7ac1 outdated

776+{
777+    std::vector<const char*> argv = {"ignore"};
778+    argv.insert(argv.end(), args.begin(), args.end());
779+
780+    std::string error{};
781+    BOOST_REQUIRE(args_man.ParseParameters(argv.size(), argv.data(), error));

hodlinator commented at 9:08 pm on August 7, 2024:

nit:

0    BOOST_REQUIRE(args_man.ParseParameters(argv.size(), argv.data(), error));
1    BOOST_REQUIRE(error.empty());

stickies-v commented at 10:24 am on August 8, 2024:

This feels like a distraction to me tbh, we’re not unit testing ParseParameters here. I think it’s safe to rely on ParseParameters returning false when there’s an error.

hodlinator changes_requested

hodlinator commented at 9:22 pm on August 7, 2024: contributor

Concept ACK d045fc7ac1729cf29140c43518f20375f2aaa1cc

Thanks for chipping away at making hex string handling more robust!

Some minor disagreements + ignorable nits. The most significant disagreement in this thread: #30569 (review)

Tested:

0$ RANDOM_CTX_SEED=1231231231231231231231231231231231231231231231231231231231312312 src/test/test_bitcoin
1$ test/functional/feature_assumevalid.py

in src/test/util/random.cpp:39 in d045fc7ac1 outdated

33+        if (const char* num{std::getenv(RANDOM_CTX_SEED)}) {
34+            return *Assert(uint256::FromHex(num)); // RANDOM_CTX_SEED must be a 64 char hex string
35+        }
36         // Otherwise use a (truly) random value.
37         return GetRandHash();
38     }();

maflcko commented at 9:39 pm on August 7, 2024:

If you re-touch this commit, it could make sense to remove the __func__ printing in the log call below. It doesn’t add any value and may be duplicate. I understand it is unrelated, but when touching a test-only function, may as well fix all trivial issues in one go. (See the commit I referred to in #30569 (review), which also has steps to reproduce and test in the description)

stickies-v commented at 10:18 am on August 8, 2024:

Done, I’ve taken your commit (hadn’t seen the commit message earlier, very nice) but just added the docstring (and smaller num scope) back in.

maflcko approved

maflcko commented at 9:40 pm on August 7, 2024: member

left a nit. Feel free to ignore.

Otherwise this looks good on a first glance.

stickies-v force-pushed on Aug 8, 2024

stickies-v commented at 11:00 am on August 8, 2024: contributor

Force-pushed to address all outstanding review comments, mainly:

avoid string re-allocation
cleaned up util/random.cpp a bit more
removed unnecessary docstring

in src/util/strencodings.cpp:61 in ee63e211e3 outdated

62-    // Return false for empty string or "0x".
63-    return str.size() > 0;
64+    std::string result(final_size, '0');
65+    std::copy(input.begin(), input.end(), result.begin() + (final_size - input.size()));
66+    return result;
67 }

l0rinc commented at 12:18 pm on August 8, 2024:

Given that:

this is only used in a single place in prod (which accept std::string_view as well and has an std::string input)
we’re always copying the result before returning
the overwhelming majority of inputs will probably have the correct size already, i.e. don’t need change.
the performance of the method seems to be important

Could we either return an std::optional<std::string_view> instead or take an std::string& input to be able to return it?

i.e.

 0std::optional<std::string> TrySanitizeHexNumber(std::string input, int result_size)
 1{
 2    input = util::RemovePrefixView(input, "0x");
 3    const auto final_size{(result_size < 0) ? input.size() : static_cast<size_t>(result_size)};
 4    if (input.empty() || (input.size() > final_size)) [[unlikely]] return std::nullopt;
 5    for (char c : input) {
 6        if (HexDigit(c) < 0) [[unlikely]] return std::nullopt;
 7    }
 8    if (input.size() == final_size) [[likely]] return input;
 9
10    std::string result(final_size, '0');
11    std::copy(input.begin(), input.end(), result.begin() + (final_size - input.size()));
12    return result;
13}

hodlinator commented at 12:45 pm on August 8, 2024:

I agree taking input as std::string works better with the return type.

Instead of introducing a 2nd string at the end, one could simply:

0    input.insert(0, final_size - input.size(), '0');
1    return input;

l0rinc commented at 1:26 pm on August 8, 2024:

input.insert(0, final_size - input.size(), ‘0’); return input;

my functional programming past screams at the though of mutating parameters and also returning them

hodlinator commented at 1:30 pm on August 8, 2024:

Yeah, I had a bit of an itch as I was writing return input, but you made your bed when you started taking std::string by non-const value. :)

stickies-v commented at 1:39 pm on August 8, 2024:

the performance of the method seems to be important

How so? There is one callsite, which can be called at most once, to parse one optional debug-only option.

Could we either return an std::optionalstd::string_view

How would that work? std::string_view is a non-owning view.

we’re always copying the result before returning

In 802374b4355bd1dec7a88bba6287c55f935699fe, we’re always allocating exactly once (assuming RVO). In your example, we’d be allocating at least once, and twice if we need padding. That doesn’t look like an improvement to me.

the overwhelming majority of inputs will probably have the correct size already, i.e. don’t need change.

Given that it relies on user input, that seems hard to substantiate. In the functional tests, for example, we’re using quite a bit of non-64 char hex input, which is reasonable given that we’re doing all this work to accept number input.

hodlinator commented at 1:43 pm on August 8, 2024:

(Edit: this was regarding the suggestion in #30569 (review)).

0input = util::RemovePrefixView(input, "0x");

This line gives me the heebie-jeebies if input is std::string. It might be okay as we are assigning a shorter string, but worst case, the assignment operator could free it’s prior heap buffer and allocate a shorter buffer, before attempting to copy from the string_view which now might reference the deallocated buffer. Using RemovePrefix() should alleviate the problem.

l0rinc commented at 2:20 pm on August 8, 2024:

Will let you decide, there are multiple ways to do this, each would be slightly better in a different scenario. Whichever produces the cleanest code, choose that :)

edit: the example I gave didn’t actually use the std::string& input I’ve suggested

stickies-v force-pushed on Aug 8, 2024

stickies-v commented at 1:58 pm on August 8, 2024: contributor

Force pushed to terminate with std::abort() in case of invalid RANDOM_CTX_SEED input.

l0rinc commented at 2:22 pm on August 8, 2024: contributor

ACK 855784d3a0026414159acc42fceeb271f8a28133

DrahtBot requested review from hodlinator on Aug 8, 2024

maflcko commented at 2:26 pm on August 8, 2024: member

This PR carves out the few uint256S callsites that may potentially prove a bit more controversial to change because they deal with user input and - potentially - backwards incompatible behaviour change.

After this PR, the remaining work to remove uint256S completely is almost entirely mechanical and/or test related. I will open that PR once #30560 is merged because it builds on that.

I think the description can be clarified that there are two user-facing settings that are now stricter checked.

stickies-v commented at 2:32 pm on August 8, 2024: contributor

I think the description can be clarified that there are two user-facing settings that are now stricter checked.

Thanks, added a section on introduced behaviour change.

in src/node/chainstatemanager_args.cpp:36 in 3c794b537f outdated

31@@ -32,10 +32,11 @@ util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManage
32     if (auto value{args.GetBoolArg("-checkpoints")}) opts.checkpoints_enabled = *value;
33 
34     if (auto value{args.GetArg("-minimumchainwork")}) {
35-        if (!TrySanitizeHexNumber(*value)) {
36+        if (auto sanitized_hex{TrySanitizeHexNumber(*value, /*result_size=*/uint256::size() * 2)}) {
37+            opts.minimum_chain_work = UintToArith256(*uint256::FromHex(*sanitized_hex));

hodlinator commented at 7:11 pm on August 8, 2024:

nit: An Assume() or Assert() around the FromHex() result before the *-deref might be better here, even though it should work :tm:.

Maybe linking TrySanitizeHexNumber and FromHex together in the fuzz-test would be good for enforcing that the former only allows through something acceptable to the latter (although result_size must also be correct).

Edit: An alternative to Assert/Assume would be to call FromHex(..).value() and have a std::bad_optional_access be thrown should it ever not have a value.

stickies-v commented at 7:39 am on August 9, 2024:

Yeah, even though this assumption seems safe at the moment it’s brittle and blindly dereferencing it is not good practice. Will use .value() either in next force-push or in the follow-up PR removing uint256S completely. Thanks.

maflcko commented at 8:58 am on August 9, 2024:

(In the follow-up you can also adjust the error message to explain the new max-length check. Otherwise, this could be confusing:

Error: Invalid non-hex (0x1234ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff) minimum chain work value specified)

l0rinc commented at 10:30 am on August 9, 2024:

blindly dereferencing it is not good practice

In C++23 we’ll be able to chain them properly via something like auto hex_value = sanitized_hex.and_then(uint256::FromHex); If there’s a C++23 checklist somewhere (haven’t seen any), we could just leave the dereference as is and fix it after we migrate to C++23 instead.

ryanofsky commented at 6:50 pm on August 9, 2024:

In commit “node: use uint256::FromHex for -minimumchainwork parsing” (3c794b537fe8fe000d3545704afc17d1d63bdb5f)

This makes parsing more strict, by explicitly returning an error when the input is longer than 64 hex characters.

Commit message is incomplete. The number is not just rejected when it is too long but also if it has whitespace prefixes or suffixes or trailing non-hex characters. These would have been ignored previously.

stickies-v commented at 0:36 am on August 10, 2024:

The number is not just rejected when it is too long but also if it has whitespace prefixes or suffixes or trailing non-hex characters. These would have been ignored previously.

Is that so? It seems to me that IsHexNumber() would have returned false for all of those cases, which is why 802374b4355bd1dec7a88bba6287c55f935699fe is a refactor commit?

ryanofsky commented at 1:10 am on August 13, 2024:

The number is not just rejected when it is too long but also if it has whitespace prefixes or suffixes or trailing non-hex characters. These would have been ignored previously.

Is that so? It seems to me that IsHexNumber() would have returned false for all of those cases, which is why 802374b is a refactor commit?

Oh, sorry I forgot about the previous call. So this commit message is perfectly accurate as it is.

stickies-v commented at 11:30 am on August 22, 2024:

(In the follow-up you can also adjust the error message to explain the new max-length check.

Done.

Will use .value() either in next force-push

Resolved by using FromUserHex() instead.

hodlinator approved

hodlinator commented at 7:28 pm on August 8, 2024: contributor

ACK 855784d3a0026414159acc42fceeb271f8a28133

Passed make check. Passed RANDOM_CTX_SEED=1231231231231231231231231231231231231231231231231231231231312312 src/test/test_bitcoin. Failed with expected error when removing char or inserting non-hex char into RANDOM_CTX_SEED from above.

Thanks @stickies-v for humoring me regarding the error handling in test/util/random.cpp!

maflcko commented at 8:36 am on August 9, 2024: member

ACK 855784d3a0026414159acc42fceeb271f8a28133 🔋

Signature:

0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
2trusted comment: ACK 855784d3a0026414159acc42fceeb271f8a28133 🔋
3dosCiDfeNxthUmbV00hveQKoOka0fC9nFw0BDCe/HtnnknF+L2/hfD2/aHv9hYT68heKoF0Oc6X5Skx2/ya1Dg==

achow101 commented at 4:29 pm on August 9, 2024: member

-assumevalid will raise an error when input contains invalid hex characters (including 0x prefix), or when it is not exactly 64 characters long

Why not allow 0x prefixes as -minimumchainwork does? It seems plausible to me that someone might have put a 0x prefixed block hash in their conf file and this change breaks that.

in src/test/validation_chainstatemanager_tests.cpp:774 in 1c909b2ccf outdated

769@@ -769,4 +770,44 @@ BOOST_FIXTURE_TEST_CASE(chainstatemanager_snapshot_completion_hash_mismatch, Sna
770     }
771 }
772 
773+/** Helper function to parse args into args_man and return the result of applying them to opts */
774+util::Result<void> SetOptsFromArgs(ArgsManager& args_man, ChainstateManager::Options& opts,

ryanofsky commented at 6:02 pm on August 9, 2024:

In commit “test: unittest chainstatemanager_args” (1c909b2ccfaee902637b28f890db8dbe168ea926)

I don’t think this should be a standalone function when it is only called one place. IMO it would be better to drop it and move the code into the set_opts lambda, which is the only place using it. Getting rid of this function would also avoid unnecessary complexity of converting a char* array to a std::string array and back to a char* array again, and it could avoid introducing an inconsistently named “args_man” variable (majority of code uses args to refer to the ArgsManager instance, and some older code uses argsman, but args_man would be new).

stickies-v commented at 0:54 am on August 10, 2024:

I don’t think this should be a standalone function when it is only called one place.

The reason I organized it this way is because I was quite surprised to see that we don’t have unit testing on setting options from arguments, and SetOptsFromArgs() can trivially (when templated) be reused for any other options setting test, avoiding (non-move) changes to this file. With that said, would you still prefer not having a separate function, or e.g. templating it (prematurely) and moving it into test/util/?

ryanofsky commented at 0:51 am on August 13, 2024:

re: #30569 (review)

With that said, would you still prefer not having a separate function, or e.g. templating it (prematurely) and moving it into test/util/?

Thanks for pointing out that it’s a pretty generic function. Either way seems fine, since it does seem like something that could be reused.

in src/util/strencodings.h:81 in 802374b435 outdated

79+ *        result is padded with leading zeroes until result_size is
80+ *        reached, or std::nullopt returned if input is longer than
81+ *        result_size.
82+ *
83+ * @param input Hex encoding of a number, optionally prefixed with 0x.
84+ * @param result_size Size of the resulting string, causing zero-padding

ryanofsky commented at 6:09 pm on August 9, 2024:

In commit “util: refactor: change IsHexNumber to TrySanitizeHexNumber” (802374b4355bd1dec7a88bba6287c55f935699fe)

Would be good to change “Size of” to “Desired size of”. When I first read this I was confused why this would ask for the size to specified.

stickies-v commented at 1:00 am on August 10, 2024:

Would be good to change “Size of” to “Desired size of”.

“Desired” to me sounds like the result string may not actually be of the requested size, when it is actually guaranteed. I’m not sure that would be an improvement?

ryanofsky commented at 0:56 am on August 13, 2024:

re: #30569 (review)

“Desired” to me sounds like the result string may not actually be of the requested size, when it is actually guaranteed. I’m not sure that would be an improvement?

Makes sense, I think “Requested” would be better than “Desired” in that case, since the function can still fail and return nullopt.

stickies-v commented at 11:29 am on August 22, 2024:

Resolved by removing TrySanitizeHexNumber() in latest force-push.

in src/util/strencodings.h:78 in 802374b435 outdated

76-bool IsHexNumber(std::string_view str);
77+ * @brief Return a sanitized hex string if input is a valid hex number.
78+ *        "0x" prefix is removed, and if result_size is specified, the
79+ *        result is padded with leading zeroes until result_size is
80+ *        reached, or std::nullopt returned if input is longer than
81+ *        result_size.

ryanofsky commented at 6:24 pm on August 9, 2024:

In commit “util: refactor: change IsHexNumber to TrySanitizeHexNumber” (802374b4355bd1dec7a88bba6287c55f935699fe)

This is a somewhat odd function doing a combination of arbitrary things:

Trimming 0x prefix
Returning an error the string is too long
Padding with 0 if the string is too short

And not doing other things that you might expect a sanitize function to do like trimming whitespace or normalizing the case.

I think instead of saying this “sanitizes” the string and pretending like that word means something it would be better to just say this strips 0x prefixes and fixes the length of the string so it can be passed to the base_blob::ParseHex() function.

ryanofsky commented at 6:45 pm on August 9, 2024:

In commit “util: refactor: change IsHexNumber to TrySanitizeHexNumber” (802374b4355bd1dec7a88bba6287c55f935699fe)

std::nullopt returned if input is longer than result_size

Implementation seems to return null when string is empty, not just when string is too long.

stickies-v commented at 0:58 am on August 10, 2024:

Ah yes you’re right, this is to follow existing behaviour of IsHexNumber() but it’s not documented, I’ll fix that, thanks.

stickies-v commented at 1:03 am on August 10, 2024:

I think instead of saying this “sanitizes” the string and pretending like that word means something it would be better to just say this strips 0x prefixes and fixes the length of the string so it can be passed to the base_blob::ParseHex() function.

Yeah, I felt a bit uneasy about the arbitrariness of what the function is doing and how it’s named. I can’t really come up with a better name, though, but I’m open to suggestions?

An alternative would be to, as @maflcko initially suggested, just have this be a parameter of base_blob::FromHex(), but that feels less elegant to me, still. What do you think?

ryanofsky commented at 1:00 am on August 13, 2024:

re: #30569 (review)

An alternative would be to, as @maflcko initially suggested, just have this be a parameter of base_blob::FromHex(), but that feels less elegant to me, still. What do you think?

Yes I think that suggestion would be a lot better! It’s hard to imagine this function being used for anything other than as a preprocessor to FromHex. And I think it would be great if FromHex had an option to parse numbers strictly, or in a nonstrict way that is still safe, with protection from overflows and garbage characters.

stickies-v commented at 10:43 am on August 13, 2024:

It’s hard to imagine this function being used for anything other than as a preprocessor to FromHex

The reason I carved it out instead of having it be a FromHex() parameter is not for reusability but to not bloat FromHex() with an explosion of input validation options (which also leaks into functions wrapping base_blob::FromHex(), such as transaction_identifier::FromHex()). I don’t think there is a single answer to what we’ll ever need for e.g. the below options, it can very much vary for each callsite and context:

do we accept leading and/or trailing whitespaces
do we require fixed size input
do we forbid, allow or require a 0x prefix
is an empty string forbidden or equivalent to 0

I think having an opinionated input validation function (and potentially more in the future, when the need arises) and a pristine FromHex() that does not concern itself with input validation is a much more maintainable and testable approach. For that reason, I’d prefer not parameterizing FromHex().

Would an approach like the below (moved into the uint256.h header), alleviate your concern?

 0/**
 1 * [@brief](/bitcoin-bitcoin/contributor/brief/) Helper function for \ref base_blob::FromHex. Returns a copy of
 2 *        the input without (optional) "0x" prefix, and padded with
 3 *        leading zeroes if it is shorter than what T::FromHex expects.
 4 *        If input contains invalid hex characters, including
 5 *        whitespace, or if it is longer than what T::FromHex expects, a
 6 *        std::nullopt is returned.
 7 *
 8 *        If a value is returned, it will be accepted by T::FromHex().
 9 *
10 * [@param](/bitcoin-bitcoin/contributor/param/) input Hex encoding of a number, optionally prefixed with 0x.
11 */
12template <typename T>
13std::optional<std::string> ValidateAndPadHexNumber(std::string_view input) 
14    requires std::is_base_of_v<base_blob<T::size() * 8>, T>;

ryanofsky commented at 4:25 pm on August 13, 2024:

I think I would probably add base_blob::FromUserHex() function wrapping base_blob::FromHex(). FromHex() would require strict formatting with fixed width strings and no extraneous characters. But when provided with user input, FromUserHex() would allow 0x prefixes, and not require numbers to be padded to a fixed number of digits. Either way, numbers that are too big to fit in the blob or contained unrecognized characters would be errors.

There is no need for either of these functions to trim whitespace as we already have convenient functions to do that, and decision on whether to allow whitespace depends more on how the string arrived (from a file, or over the command line or RPC) than on what produced it.

I think a FromUserHex function would be better than a sidecar ValidateAndPadHexNumber function because it would be easier to use, and more discoverable. The more it was used, the more it would provide a consistent user interface that did not arbitrarily reject things some places which are accepted other places.

stickies-v commented at 11:21 am on August 22, 2024:

Resolved by removing TrySanitizeHexNumber() altogether (and empty strings are still allowed in uint256::FromUserHex, as per current behaviour on master)

stickies-v commented at 11:29 am on August 22, 2024:

I think I would probably add base_blob::FromUserHex() function wrapping base_blob::FromHex().

Thanks for the suggestion, I’ve adopted this approach.

in src/node/chainstatemanager_args.cpp:45 in 1965076857 outdated

38@@ -39,7 +39,15 @@ util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManage
39         }
40     }
41 
42-    if (auto value{args.GetArg("-assumevalid")}) opts.assumed_valid_block = uint256S(*value);
43+    if (auto value{args.GetArg("-assumevalid")}) {
44+        if (*value == "0") { // handle -noassumevalid
45+            opts.assumed_valid_block = uint256{};
46+        } else if (auto block_hash{uint256::FromHex(*value)}) {

ryanofsky commented at 6:36 pm on August 9, 2024:

~~In commit “node: use uint256::FromHex for -minimumchainwork parsing” (3c794b537fe8fe000d3545704afc17d1d63bdb5f)~~ In commit “node: use uint256::FromHex for -assumevalid parsing” (19650768575e3d76d0e06c186896b00ad3e26e57)

This is also now going to reject assumevalid numbers that begin with 0x which isn’t mentioned in the commit message, and also reject numbers with whitespace prefixes and suffixes and trailing non-hex characters.

I think it would be better to just use TrySanitizeHexNumber here to avoid breaking things that work without a clear reason. I can see an argument for rejecting extra whitespace and spurious characters, but 0x prefix should be harmless and it also seems ok to not require numbers to be padded with leading 0’s.

Using TrySanitizeHexNumber here would also allow simplifying the code by dropping the == “0” special case and avoiding the need to modify feature_assumevalid.py.

stickies-v commented at 0:43 am on August 10, 2024:

0x which isn’t mentioned in the commit message, and also reject numbers with whitespace prefixes and suffixes and trailing non-hex characters.

x, whitespace, and trailing non-hex characters are all non-hex characters, so arguably it is mentioned in the commit message, but I agree it wouldn’t hurt to be explicit about the 0x prefix and whitespace.

and it also seems ok to not require numbers to be padded with leading 0’s.

I’m a bit confused about this comment being in the -assumevalid code block, which specifically should only accept hashes and not numbers

Using TrySanitizeHexNumber here would also allow simplifying the code by dropping the == “0” special case and avoiding the need to modify feature_assumevalid.py.

I don’t think we should allow anything besides 0 or a (potentially) valid block hash, that feels brittle to me. But again, given that you’re commenting on -assumedvalid code in a -minimumchainwork commit I think there might be some misunderstanding?

ryanofsky commented at 1:39 am on August 13, 2024:

re: #30569 (review)

Thanks for the response, but I think there are two things that could be improved here:

I think it would be better to call TrySanitizeHexNumber so this continues to accept block hashes with optional 0x prefixes and without leading 0’s. This behavior worked previously and seems safe, so I don’t see a reason to break it. If we do want to break, it would be nice to state some reasoning for doing that in the commit message. I don’t think there is a problem with treating block hashes like numbers since our own code does it and it’s how bitcoin determines block difficulty.
I don’t think current commit message is accurate unless you drop the word “explicitly” because this change is not just reporting errors that weren’t previously reported, it is making things that use to work and were not errors into errors. The commit message is also incomplete because it is not mentioning the things which used to work.

To be clear, IMO:

It’s good this change is making it an error to include trailing non-hex characters instead of silently ignoring them.
It’s good this change is making it an error to specify hex strings that are too long.
It’s probably ok this change makes it an error to include leading and trailing whitespace, instead of trimming it. Maybe that could be safer and prevent configuration bugs.
It’s not good but probably ok this change makes it an error to specify an empty string (-assumedvalid="") to disable the assumevalid feature. That seemed like a useful way to clear the setting.
It’s not good this change makes it an error to include a 0x prefix or write numbers not padded with 0’s. That just seems like being arbitrarily strict and breaking backwards compatibility for no reason.

hodlinator commented at 9:14 am on August 13, 2024:

I didn’t previously realize that this PR breaks use-facing compatibility wrt 0x prefixes in args. Would also like to see such cases be filtered through TrySanitizeHexNumber before going into FromHex so that level of compatibility is maintained.

stickies-v commented at 11:29 am on August 22, 2024:

Thank you for summarizing your concerns @ryanofsky. I believe the latest force-push addresses all of them.

It’s good this change is making it an error to include trailing non-hex characters instead of silently ignoring them. It’s good this change is making it an error to specify hex strings that are too long. It’s probably ok this change makes it an error to include leading and trailing whitespace, instead of trimming it. Maybe that could be safer and prevent configuration bugs.

Relevant behaviour unchanged in latest force-push.

It’s not good but probably ok this change makes it an error to specify an empty string (-assumedvalid="") to disable the assumevalid feature. That seemed like a useful way to clear the setting. It’s not good this change makes it an error to include a 0x prefix or write numbers not padded with 0’s. That just seems like being arbitrarily strict and breaking backwards compatibility for no reason.

Latest force-push continues to allow -assumevalid="" as well as 0x-prefixed and too-short-input (and unit test cases are added to test: unittest chainstatemanager_args), keeping this behaviour the same as it currently is on master.

in src/test/util/random.cpp:30 in 855784d3a0 outdated

29     static const uint256 ctx_seed = []() {
30         // If RANDOM_CTX_SEED is set, use that as seed.
31-        const char* num = std::getenv(RANDOM_CTX_SEED.c_str());
32-        if (num) return uint256S(num);
33+        if (const char* num{std::getenv(RANDOM_CTX_SEED)}) {
34+            if (auto num_parsed{uint256::FromHex(num)}) {

ryanofsky commented at 6:43 pm on August 9, 2024:

In commit “test: Only accept a fully valid RANDOM_CTX_SEED” (855784d3a0026414159acc42fceeb271f8a28133)

I don’t understand the motivation for being so strict here. It seems better to call TrySanitizeHexNumber and keep accepting small numbers. The seed just needs to be any number, and I don’t see why numbers that have 64 digits are better than other numbers, or why somebody manually setting the environment variable manually should need to pad the number and make sure it has 64 digits.

stickies-v commented at 0:48 am on August 10, 2024:

Thanks for your input. I wasn’t sure about this either. See https://github.com/bitcoin/bitcoin/pull/30569/#discussion_r1703942199 for previous discussion on this topic. It seems example values have all been 64 characters in the past, and given that it’s only used in tests it doesn’t seem unreasonably to be a bit more strict to simplify the logic, but I appreciate your point of view too.

maflcko commented at 6:22 am on August 10, 2024:

In commit “test: Only accept a fully valid RANDOM_CTX_SEED” (855784d)

I don’t understand the motivation for being so strict here.

Just for reference, you approved the exact same commit 2 days prior in #30571#pullrequestreview-2222184242 .

Maybe my comment from #30569 (review) can be moved into the commit message for clarity?

ryanofsky commented at 2:10 am on August 13, 2024:

re: #30569 (review)

Thanks for the links. I don’t think I realized this change was made in the other PR, and I think it is not a good change.

I don’t see a reason somebody shouldn’t be able to run RANDOM_CTX_SEED=12345 ./test_bitcoin conveniently from a command line or script without padding the seed with 59 0’s. It seems like this makes it pointlessly difficult to specify a random seed and I don’t get the motivation behind it.

If we want to go ahead with change, I think it should be described accurately as making something that used to work no longer work, not just as adding error checking and implying that anything this now rejects was previously in error.

maflcko commented at 5:21 am on August 13, 2024:

Thanks for the links. I don’t think I realized this change was made in the other PR, and I think it is not a good change.

I don’t see a reason somebody shouldn’t be able to run RANDOM_CTX_SEED=12345 ./test_bitcoin conveniently from a command line or script without padding the seed with 59 0’s. It seems like this makes it pointlessly difficult to specify a random seed and I don’t get the motivation behind it.

No strong opinion, but my thinking was that the seed would at most be copy-pasted from previous logs of failing tests. If someone really wanted to provide a seed themselves, they could easily run echo 12355 | sha256sum and use the result of that.

But this is test-only, so anything is probably fine here.

stickies-v commented at 11:23 am on August 22, 2024:

It seems like this makes it pointlessly difficult to specify a random seed and I don’t get the motivation behind it.

I definitely didn’t mean to introduce user friction with this PR - the philosophy was more to not introduce unnecessary code complexity for features that aren’t used, but since it seems like this might be breaking existing/valuable use cases, I’ve now updated RANDOM_CTX_SEED to use the more lenient uint256::FromUserHex() to still allow 0x-prefixes and shorter input, and it doesn’t add code complexity anyway.

ryanofsky commented at 7:02 pm on August 9, 2024: contributor

Code review 855784d3a0026414159acc42fceeb271f8a28133, mild NACK.

I like the new test, and ideas behind this change, but I think some parts of this are not well documented, and are breaking things that work without a clear reason.

stickies-v commented at 1:05 am on August 10, 2024: contributor

Why not allow 0x prefixes as -minimumchainwork does?

Because my understanding is that 0x is used for hex numbers, and a block hash is not a number. So we have to make a trade-off between not introducing “weirdness” into the code vs maintaining backwards compatibility (for an undocumented feature). I prefer leaning towards the former unless there are good reasons to go for the latter, but that’s exactly the reason I carved out these commits into a smaller PR so thank you for your input.

l0rinc commented at 6:55 am on August 10, 2024: contributor

Because my understanding is that 0x is used for hex numbers, and a block hash is not a number.

If it’s not a number, why are we prefixing it with 0 (by a method called TrySanitizeHexNumber)?

parameter of base_blob::FromHex(),

or alternatively chain the two together like TrimAndParse, i.e. base_blob::FromHexLenient()

-assumedvalid […] -minimumchainwork

Would it be clearer if we just warned about the changes in this release and require sanitized input in the next?

ryanofsky commented at 2:36 am on August 13, 2024: contributor

Code review 855784d3a0026414159acc42fceeb271f8a28133 again (unchanged since last time). I think motivation here is good and this PR could be merged if my concerns are not shared, but IMO this change is a net negative in current state without some compatibility improvements.

re: #30569 (review)

Because my understanding is that 0x is used for hex numbers, and a block hash is not a number.

I do think block hashes are numbers, and it’s not really unusual to treat them that way, but even if it is strange and bad to think of them as numbers, I would want to give more thought to backwards compatibility, and only make things errors when they are (1) definitely errors or (2) have a pretty likelihood of being errors, and not just shorter or different styles of writing things.

hodlinator changes_requested

hodlinator commented at 9:19 am on August 13, 2024: contributor

Approach NACK 855784d3a0026414159acc42fceeb271f8a28133

..in light of #30569 (review)

hodlinator commented at 6:26 am on August 20, 2024: contributor

Would be great to get the final issues resolved. Possible way forward, preserving backwards compatibility:

 0diff --git a/src/node/chainstatemanager_args.cpp b/src/node/chainstatemanager_args.cpp
 1index ff5a7ebd00..e83b5f38b6 100644
 2--- a/src/node/chainstatemanager_args.cpp
 3+++ b/src/node/chainstatemanager_args.cpp
 4@@ -33,7 +33,7 @@ util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManage
 5 
 6     if (auto value{args.GetArg("-minimumchainwork")}) {
 7         if (auto sanitized_hex{TrySanitizeHexNumber(*value, /*result_size=*/uint256::size() * 2)}) {
 8-            opts.minimum_chain_work = UintToArith256(*uint256::FromHex(*sanitized_hex));
 9+            opts.minimum_chain_work = UintToArith256(uint256::FromHex(*sanitized_hex).value());
10         } else {
11             return util::Error{strprintf(Untranslated("Invalid non-hex (%s) minimum chain work value specified"), *value)};
12         }
13@@ -42,10 +42,10 @@ util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManage
14     if (auto value{args.GetArg("-assumevalid")}) {
15         if (*value == "0") { // handle -noassumevalid
16             opts.assumed_valid_block = uint256{};
17-        } else if (auto block_hash{uint256::FromHex(*value)}) {
18-            opts.assumed_valid_block = *block_hash;
19+        } else if (auto sanitized_hex{TrySanitizeHexNumber(*value, /*result_size=*/uint256::size() * 2)}) {
20+            opts.assumed_valid_block = uint256::FromHex(*sanitized_hex).value();
21         } else {
22-            return util::Error{strprintf(Untranslated("Invalid assumevalid block hash specified (%s), must be %d character hex (or 0 to disable)"), *value, uint256::size() * 2)};
23+            return util::Error{strprintf(Untranslated("Invalid assumevalid block hash specified (%s), must be %d hex digits (or 0 to disable)"), *value, uint256::size() * 2)};
24         }
25     }
26 
27diff --git a/src/test/util/random.cpp b/src/test/util/random.cpp
28index 0c09d8fd24..26f114c6d4 100644
29--- a/src/test/util/random.cpp
30+++ b/src/test/util/random.cpp
31@@ -27,8 +27,8 @@ void SeedRandomForTest(SeedRand seedtype)
32     static const uint256 ctx_seed = []() {
33         // If RANDOM_CTX_SEED is set, use that as seed.
34         if (const char* num{std::getenv(RANDOM_CTX_SEED)}) {
35-            if (auto num_parsed{uint256::FromHex(num)}) {
36-                return *num_parsed;
37+            if (auto num_sanitized{TrySanitizeHexNumber(num, /*result_size=*/uint256::size() * 2)}) {
38+                return uint256::FromHex(*num_sanitized).value();
39             } else {
40                 std::cerr << RANDOM_CTX_SEED << " must be a " << uint256::size() * 2 << " char hex string without '0x'-prefix, was set to: '" << num << "'.\n";
41                 std::abort();
42diff --git a/src/test/validation_chainstatemanager_tests.cpp b/src/test/validation_chainstatemanager_tests.cpp
43index 6f81444aa9..63b687a3c4 100644
44--- a/src/test/validation_chainstatemanager_tests.cpp
45+++ b/src/test/validation_chainstatemanager_tests.cpp
46@@ -802,6 +802,8 @@ BOOST_FIXTURE_TEST_CASE(chainstatemanager_args, BasicTestingSetup)
47     BOOST_CHECK(set_opts({"-noassumevalid"}).assumed_valid_block.value().IsNull());
48     const std::string cmd{"-assumevalid=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"};
49     BOOST_CHECK_EQUAL(set_opts({cmd.c_str()}).assumed_valid_block.value().ToString(), cmd.substr(13, cmd.size()));
50+    const std::string prefixed{"-assumevalid=0x0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"};
51+    BOOST_CHECK_EQUAL(set_opts({prefixed.c_str()}).assumed_valid_block.value().ToString(), prefixed.substr(15, prefixed.size()));
52 
53     // test -minimumchainwork
54     BOOST_CHECK(!set_opts({}).minimum_chain_work.has_value());

Side note: make check fails for me in 2 steps after the regular unit tests, same thing happens on the base commit of the PR, so it’s not caused by the PR changes.

test: unittest chainstatemanager_args 85b7cbfcbe

stickies-v force-pushed on Aug 21, 2024

in src/node/chainstatemanager_args.cpp:38 in cf88a55e97 outdated

31@@ -32,13 +32,20 @@ util::Result<void> ApplyArgsManOptions(const ArgsManager& args, ChainstateManage
32     if (auto value{args.GetBoolArg("-checkpoints")}) opts.checkpoints_enabled = *value;
33 
34     if (auto value{args.GetArg("-minimumchainwork")}) {
35-        if (!IsHexNumber(*value)) {
36-            return util::Error{strprintf(Untranslated("Invalid non-hex (%s) minimum chain work value specified"), *value)};
37+        if (auto min_work{uint256::FromUserHex(*value)}) {
38+            opts.minimum_chain_work = UintToArith256(*min_work);
39+        } else {
40+            return util::Error{strprintf(Untranslated("Invalid minimum chain work value specified (%s), must be up to %d characters hex"), *value, uint256::size() * 2)};

l0rinc commented at 8:02 pm on August 21, 2024:

Nit (feel free to disregard): Based on the description (Minimum work assumed to exist on a valid chain in hex), “chain” and “value” might be superfluous here

0            return util::Error{strprintf(Untranslated("Invalid minimum work specified (%s), must be up to %d characters hex"), *value, uint256::size() * 2)};

hodlinator commented at 8:42 pm on August 21, 2024:

nit: “characters hex” -> “hex digits” here and below - better order grammatically, more precise, and shorter.

stickies-v commented at 11:18 am on August 22, 2024:

Taken both suggestions in latest force-push, thanks.

in src/uint256.h:119 in cf88a55e97 outdated

114+ *        the expected length of uintN_t::size()*2.
115+ *
116+ *        Designed to be used when dealing with user input.
117+ */
118+template <class uintN_t>
119+std::optional<uintN_t> FromUserHex(std::string_view input)

l0rinc commented at 8:04 pm on August 21, 2024:

Since this is a “User” centric specialization of FromHex (and not a Hex specialization of FromUser), I’d keep the FromHex prefix, i.e.: FromHexUser or perhaps FromHexLenient, since the User part is just incidental

hodlinator commented at 8:59 pm on August 21, 2024:

(Thread https://github.com/bitcoin/bitcoin/pull/30569/files#r1725693175) I prefer the current uint256::FromUserHex over uint256::FromHexUser (the latter makes me think of sorcerers). (uint256::FromHexLenient is okay to).

stickies-v commented at 11:17 am on August 22, 2024:

I have nothing against any of the suggested names, but personally I still find FromUserHex to be the most intuitive (and it seems at least 2 other reviewers like it as well), so I’m going to stick with that if that’s okay?

ryanofsky commented at 5:37 pm on August 22, 2024:

re: #30569 (review) re: #30569 (comment)

Maybe I have a slightly different perspective because I think it can be good to name functions after how they are intended to be used, instead of what they do, depending on what’s more relevant to callers. In this case, though, I do think the name describes what it does: parse hex input from a user. The assumption is that callers should not make idiosyncratic choices about what user input to accept, and there should be a common standard.

I don’t like name FromHexLenient so much because it sounds like a negative thing that should be avoided, and doesn’t provide much more information about what the function does either.

If we do want the function name to more literally describe what it does, I think FromHexNumber could work because it’s conventional for hex numbers to have 0x prefixes and not need to be padded with 0’s.

But my vote would be for FromUserHex over FromHexNumber over FromHexLenient

in src/uint256.h:122 in cf88a55e97 outdated

117+ */
118+template <class uintN_t>
119+std::optional<uintN_t> FromUserHex(std::string_view input)
120+{
121+    input = util::RemovePrefixView(input, "0x");
122+    constexpr auto expected_size{uintN_t::size() * 2};

l0rinc commented at 8:05 pm on August 21, 2024:

Could we extract this this as a constant and use it throughout this PR (e.g. in error messages and tests)?

hodlinator commented at 9:01 pm on August 21, 2024:

(Thread https://github.com/bitcoin/bitcoin/pull/30569/files#r1725694129) (This has been silently nagging me as well, but totally understand if you want to leave it for someone to follow up on).

stickies-v commented at 10:31 am on August 22, 2024:

I’m open to the idea, but I don’t immediately see a super elegant solution. Thoughts:

a lot of the ::size() * 2 uses are templated (such as in the referenced example here), so constants wouldn’t really work. Adding a base_blob::hex_size() seems inappropriate to me too, that’s not really a base_blob concern.
non-templated use is not very frequent, and mostly in tests. Potentially a constexpr auto UINT256_HEX_SIZE{uint256::size() * 2} could work, but I’m not really convinced that’s worth it either tbh

l0rinc commented at 11:26 am on August 22, 2024:

Thanks for considering

in src/test/validation_chainstatemanager_tests.cpp:810 in cf88a55e97 outdated

805+        return *result;
806+    };
807+
808+    // test -assumevalid
809+    BOOST_CHECK(!get_valid_opts({}).assumed_valid_block.has_value());
810+    BOOST_CHECK(get_valid_opts({"-assumevalid="}).assumed_valid_block.value().IsNull());

l0rinc commented at 8:06 pm on August 21, 2024:

Are we sure this behavior should be written in stone? It rather seems like an error to me

hodlinator commented at 8:52 pm on August 21, 2024:

(Thread https://github.com/bitcoin/bitcoin/pull/30569/files#r1725695133) Good to document current behavior, whatever it is.

stickies-v commented at 10:34 am on August 22, 2024:

This behaviour was mentioned here (and correctly identified as broken by a previous version of this PR), so that’s why I added a test to make it explicit. I don’t have a strong view on whether that behaviour should be allowed, but since it exists, and some people seem to find it useful, and we’re trying to minimize breaking behaviour now, I think I’d prefer keeping it and documenting it.

l0rinc commented at 11:27 am on August 22, 2024:

Haven’t realized it was possible before as well, thanks for clearing that up

in src/test/uint256_tests.cpp:391 in cf88a55e97 outdated

385@@ -386,6 +386,33 @@ BOOST_AUTO_TEST_CASE(from_hex)
386     TestFromHex<Wtxid>();
387 }
388 
389+BOOST_AUTO_TEST_CASE(from_user_hex)
390+{
391+    constexpr std::string_view valid_hex_65{"0x0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0"}; // 0x prefix and 65 hex chars

l0rinc commented at 8:11 pm on August 21, 2024:

could we move this closer to first usage?

stickies-v commented at 11:18 am on August 22, 2024:

Sure, done!

in src/test/uint256_tests.cpp:400 in cf88a55e97 outdated

395+    BOOST_CHECK_EQUAL(uint256::FromUserHex("0x").value(), uint256::ZERO);
396+    BOOST_CHECK_EQUAL(uint256::FromUserHex("0").value(), uint256::ZERO);
397+    BOOST_CHECK_EQUAL(uint256::FromUserHex("0x10").value(), uint256{16});
398+    BOOST_CHECK_EQUAL(uint256::FromUserHex("10").value(), uint256{16});
399+    BOOST_CHECK_EQUAL(uint256::FromUserHex("0xff").value(), uint256{255});
400+    BOOST_CHECK_EQUAL(uint256::FromUserHex("ff").value(), uint256{255});

l0rinc commented at 8:13 pm on August 21, 2024:

0    BOOST_CHECK_EQUAL(uint256::FromUserHex("0x10").value(), uint256{0x10});
1    BOOST_CHECK_EQUAL(uint256::FromUserHex("10").value(), uint256{0x10});
2    BOOST_CHECK_EQUAL(uint256::FromUserHex("0xff").value(), uint256{0xff});
3    BOOST_CHECK_EQUAL(uint256::FromUserHex("ff").value(), uint256{0xff});

stickies-v commented at 11:18 am on August 22, 2024:

Thanks, that makes much more sense indeed, taken.

in src/test/uint256_tests.cpp:393 in cf88a55e97 outdated

391+    constexpr std::string_view valid_hex_65{"0x0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0"}; // 0x prefix and 65 hex chars
392+    static_assert(valid_hex_65.size() == 2 + 65);
393+
394+    BOOST_CHECK_EQUAL(uint256::FromUserHex("").value(), uint256::ZERO);
395+    BOOST_CHECK_EQUAL(uint256::FromUserHex("0x").value(), uint256::ZERO);
396+    BOOST_CHECK_EQUAL(uint256::FromUserHex("0").value(), uint256::ZERO);

l0rinc commented at 8:17 pm on August 21, 2024:

nit: I was wondering if it accepts 00 as well (since numbers usually trim leading zeros) and if it accepts 1 (since other implementations require an even number of digits), consider extending the examples:

0    BOOST_CHECK_EQUAL(uint256::FromUserHex("00").value(), uint256::ZERO);
1    BOOST_CHECK_EQUAL(uint256::FromUserHex("1").value(), uint256::ONE);

stickies-v commented at 11:14 am on August 22, 2024:

Thanks, I’ve added these two test cases!

in src/test/util/random.cpp:42 in cf88a55e97 outdated

42         return GetRandHash();
43     }();
44 
45     const uint256& seed{seedtype == SeedRand::SEED ? ctx_seed : uint256::ZERO};
46-    LogPrintf("%s: Setting random seed for current tests to %s=%s\n", __func__, RANDOM_CTX_SEED, seed.GetHex());
47+    LogPrintf("Setting random seed for current tests to %s=%s\n", RANDOM_CTX_SEED, seed.GetHex());

hodlinator commented at 8:51 pm on August 21, 2024:

nit: Noticed you changed this back from the modern LogInfo in 855784d3a0026414159acc42fceeb271f8a28133 to the legacy LogPrintf, was that intentional?

stickies-v commented at 11:18 am on August 22, 2024:

Oops, it wasn’t, thanks for spotting. Reverted in latest force-push.

in src/test/validation_chainstatemanager_tests.cpp:815 in cf88a55e97 outdated

810+    BOOST_CHECK(get_valid_opts({"-assumevalid="}).assumed_valid_block.value().IsNull());
811+    BOOST_CHECK(get_valid_opts({"-assumevalid=0"}).assumed_valid_block.value().IsNull());
812+    BOOST_CHECK(get_valid_opts({"-noassumevalid"}).assumed_valid_block.value().IsNull());
813+    BOOST_CHECK_EQUAL(get_valid_opts({"-assumevalid=0x1234"}).assumed_valid_block.value().ToString(), std::string(60, '0') + "1234");
814+    const std::string cmd{"-assumevalid=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"};
815+    BOOST_CHECK_EQUAL(get_valid_opts({cmd.c_str()}).assumed_valid_block.value().ToString(), cmd.substr(13, cmd.size()));

hodlinator commented at 8:53 pm on August 21, 2024:

nit: Intentionally leaving out 0x-variant?

stickies-v commented at 11:15 am on August 22, 2024:

Yeah, I’m trying to keep tests focused on the specific function under test. We already have proper “0x” prefix testing in uint256_tests/from_user_hex tests, and as belt-and-suspenders this case also covers a bit of both but I’d rather not bloat it too much.

hodlinator approved

hodlinator commented at 9:24 pm on August 21, 2024: contributor

ACK cf88a55e97719aabd62f0b608df0800fef8304de

I really like the TrySanitizeHexNumber -> FromUserHex change! Sorry for not understanding that you had it in the works with my prior suggestion. Only ignorable nits left.

(Still fails at the end of make check due to unlucky base commit).

DrahtBot requested review from l0rinc on Aug 22, 2024

DrahtBot requested review from maflcko on Aug 22, 2024

stickies-v force-pushed on Aug 22, 2024

in src/test/util/random.cpp:33 in 81554aac80 outdated

32-        if (num) return uint256S(num);
33+        if (const char* num{std::getenv(RANDOM_CTX_SEED)}) {
34+            if (auto num_parsed{uint256::FromUserHex(num)}) {
35+                return *num_parsed;
36+            } else {
37+                std::cerr << RANDOM_CTX_SEED << " must consist only of up to " << uint256::size() * 2 << " hex digits (\"0x\" prefix allowed), was set to: '" << num << "'.\n";

l0rinc commented at 11:30 am on August 22, 2024:

nit, if you touch again, consider:

0                std::cerr << RANDOM_CTX_SEED << " must consist of up to " << uint256::size() * 2 << " hex digits (\"0x\" prefix allowed), it was set to: '" << num << "'.\n";

stickies-v commented at 12:27 pm on August 23, 2024:

Thanks, taken.

stickies-v commented at 11:30 am on August 22, 2024: contributor

Force-pushed (twice) to change the approach to address @achow101’s, @ryanofsky’s and @hodlinator’s concerns about being too strict about user hex input for -assumevalid and RANDOM_CTX_SEED. Thanks again for sharing your concerns about breaking changes (and the detailed review), in hindsight I think this is indeed the better approach.

Specifically, these last two force-pushes:

introduces a uint256::FromUserHex() to allow for lenient user input parsing and does away with the TrySanitizeHexNumber, as suggested by l0rinc and ryanofsky
use the new uint256::FromUserHex() function for -minimumchainwork, -assumevalid and RANDOM_CTX_SEED parsing, minimizing the behaviour change in this PR and allowing 0x prefixes and too-short-input in all 3 cases. The main behaviour change now introduced by this PR is limited to more explicit handling of invalid hex characters and too-long-input, which I believe won’t be controversial.
remove the now unused IsHexNumber(). Most of its unit tests have been migrated to uint256_tests/from_user_hex.
improved testing in validation_chainstatemanager_tests:
- refactored SetOptsFromArgs to be templated and reusable, and allow for better error reporting by moving the BOOST_ calls to the edges as much as ergonomically possible
- added unit tests for invalid-hex-char or too-many-char cases for -minimumchainwork and -assumevalid
added and cleaned up some of the `uint256_tests/from_user_hex tests 1, 2, e
terminology: consistently use “hex digits” whenever the character is indeed a [0-9, a-f] value, and “(non-hex) character” when it isn’t or we’re not sure yet.

l0rinc commented at 11:44 am on August 22, 2024: contributor

I like it a lot more now, thanks for considering the feedbacks.

The only part that still bugs me a bit is the naming of FromUserHex -> the “user” is on a different abstraction level, I’d prefer mentioning the behavior change compared to the sibling method (i.e. it’s more forgiving/leninent for the inputs), rather than the motivation for adding the method (the users sometimes prefer different formats).

ACK 81554aac80bf2270db977c110c37acc7e8034194

DrahtBot requested review from hodlinator on Aug 22, 2024

stickies-v commented at 4:30 pm on August 22, 2024: contributor

Sorry for not understanding that you had it in the works with my prior suggestion.

No problem at all, I decided to give this PR a bit of a rest to focus on reviewing and aligning with related PRs, and think about the suggestions made here, hence the delay in updating this. Will quickly address feedback again from here on.

The only part that still bugs me a bit is the naming of FromUserHex -> the “user” is on a different abstraction level, I’d prefer mentioning the behavior change compared to the sibling method (i.e. it’s more forgiving/leninent for the inputs), rather than the motivation for adding the method (the users sometimes prefer different formats).

I think it’s a valid concern, and I agree that generally functions should be named in line with what they do, not with how they’re used - and FromHexLenient would be a good name. The reason I’m sticking with FromUserHex is that it’s 100% a convenience function that we’ve implemented specifically for dealing with user input, and the name makes that very clear imo. I’d also prefer to avoid bikeshedding over this, since it does touch quite a few LoC, commits, commit messages, PR descriptions, … and I think both options are good.

Thank you both for your continued review!

in src/test/uint256_tests.cpp:406 in 4e05d9459a outdated

401+    constexpr std::string_view valid_hex_65{"0x0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0"}; // 0x prefix and 65 hex chars
402+    static_assert(valid_hex_65.size() == 2 + 65);
403+    BOOST_CHECK_EQUAL(uint256::FromUserHex(valid_hex_65.substr(2, 64)).value().ToString(), valid_hex_65.substr(2, 64));
404+    BOOST_CHECK_EQUAL(uint256::FromUserHex(valid_hex_65.substr(0, 66)).value().ToString(), valid_hex_65.substr(2, 64));
405+
406+    BOOST_CHECK(!uint256::FromUserHex("0x0 "));                 // no spaces at end,

ryanofsky commented at 5:00 pm on August 22, 2024:

In commit “refactor: add uint256::FromUserHex helper” (4e05d9459ad275d5929bab8f55e7edcd0be9dca9)

Not important but could also add a test that this does not allow spaces after a 64 digit string (not just a short string).

stickies-v commented at 12:38 pm on August 23, 2024:

Done, and reorganized the test a bit to use std::string valid_hex_64 instead of std::string_view valid_hex_65 which makes cat operations significantly less verbose, and makes the test more readable imo.

in src/test/uint256_tests.cpp:404 in 4e05d9459a outdated

399+    BOOST_CHECK_EQUAL(uint256::FromUserHex("ff").value(), uint256{0xff});
400+    // a value is expected when the last (65th) character is trimmed off valid_hex_65
401+    constexpr std::string_view valid_hex_65{"0x0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0"}; // 0x prefix and 65 hex chars
402+    static_assert(valid_hex_65.size() == 2 + 65);
403+    BOOST_CHECK_EQUAL(uint256::FromUserHex(valid_hex_65.substr(2, 64)).value().ToString(), valid_hex_65.substr(2, 64));
404+    BOOST_CHECK_EQUAL(uint256::FromUserHex(valid_hex_65.substr(0, 66)).value().ToString(), valid_hex_65.substr(2, 64));

ryanofsky commented at 5:05 pm on August 22, 2024:

In commit “refactor: add uint256::FromUserHex helper” (4e05d9459ad275d5929bab8f55e7edcd0be9dca9)

I found these two tests and especially the comment above “a value is expected when the last (65th) character is trimmed off valid_hex_65” confusing because they make it sound like the function is supposed to trim off the 65th hex character and ignore it.

Would change the comment above to plainly describe the string: // 65 digit hex number with 0x prefix

And add a comment to explain the thing these two tests are actually checking: // Pass views of first 64 hex digits to verify string_view size is respected.

stickies-v commented at 12:39 pm on August 23, 2024:

Resolved by replacing valid_hex_65 with valid_hex_64 which I think addresses the confusion, too?

in src/test/uint256_tests.cpp:399 in 4e05d9459a outdated

394+    BOOST_CHECK_EQUAL(uint256::FromUserHex("00").value(), uint256::ZERO);
395+    BOOST_CHECK_EQUAL(uint256::FromUserHex("1").value(), uint256::ONE);
396+    BOOST_CHECK_EQUAL(uint256::FromUserHex("0x10").value(), uint256{0x10});
397+    BOOST_CHECK_EQUAL(uint256::FromUserHex("10").value(), uint256{0x10});
398+    BOOST_CHECK_EQUAL(uint256::FromUserHex("0xff").value(), uint256{0xff});
399+    BOOST_CHECK_EQUAL(uint256::FromUserHex("ff").value(), uint256{0xff});

ryanofsky commented at 5:47 pm on August 22, 2024:

In commit “refactor: add uint256::FromUserHex helper” (4e05d9459ad275d5929bab8f55e7edcd0be9dca9)

We seem to be dropping the mixed case checks from the previous code:

0    BOOST_CHECK(IsHexNumber("0xFfa"));
1    BOOST_CHECK(IsHexNumber("Ffa"));
2    BOOST_CHECK(IsHexNumber("0x00112233445566778899aabbccddeeffAABBCCDDEEFF"));
3    BOOST_CHECK(IsHexNumber("00112233445566778899aabbccddeeffAABBCCDDEEFF"));

I tend to think it would be good to keep these, so code is changing less. But if these are intentionally being dropped, it could be good to add a test comment saying what this test is intentionally not checking, and pointing to other tests where the checks are.

stickies-v commented at 12:51 pm on August 23, 2024:

Hmm, fair point. I think I dropped it because of #30377 dropping mixed case support for the consteval ctor, but that’s unrelated to this pull of course so keeping the test is the right approach. I’ve added mixed case into 4 existing checks, but I’d still like to drop the Ffa tests since we can’t use the uint8_t ctor which would make an equality test quite verbose for I think nothing that’s not already tested in the other lines?

0    BOOST_CHECK_EQUAL(uint256::FromUserHex("0xFf").value(), uint256{0xff});
1    BOOST_CHECK_EQUAL(uint256::FromUserHex("Ff").value(), uint256{0xff});
2    const std::string valid_hex_64{"0x0123456789abcdef0123456789abcdef0123456789ABDCEF0123456789ABCDEF"};
3    BOOST_REQUIRE_EQUAL(valid_hex_64.size(), 2 + 64); // 0x prefix and 64 hex digits
4    BOOST_CHECK_EQUAL(uint256::FromUserHex(valid_hex_64.substr(2)).value().ToString(), ToLower(valid_hex_64.substr(2)));
5    BOOST_CHECK_EQUAL(uint256::FromUserHex(valid_hex_64.substr(0)).value().ToString(), ToLower(valid_hex_64.substr(2)));

ryanofsky commented at 6:02 pm on August 22, 2024: contributor

Code review ACK 81554aac80bf2270db977c110c37acc7e8034194. Thanks for all the changes. This seems simpler and more backwards compatible now.

re: #30569#issue-2443189383

In PR description:

test: the optional RANDOM_CTX_SEED env var is now required to consist only of up to 64 hex digits (optionally prefixed with “0x”), otherwise the program will abort

I think it would be clearer and more consistent with the rest of the description if it said “test: the optional RANDOM_CTX_SEED env var will now cause tests to abort when it contains invalid hex characters, or when it is longer than 64 hex digits”

ryanofsky approved

in test/functional/feature_minchainwork.py:26 in 81554aac80 outdated

22@@ -23,7 +23,6 @@
23 
24 # 2 hashes required per regtest block (with no difficulty adjustment)
25 REGTEST_WORK_PER_BLOCK = 2
26-

hodlinator commented at 7:27 pm on August 22, 2024:

nit: Inadvertent newline removal?

stickies-v commented at 12:52 pm on August 23, 2024:

yeah, I could’ve sworn a previous version introduced this newline and this was me reverting it, but I suppose it must be a rebase artifact, thanks for spotting and fixed now.

hodlinator approved

hodlinator commented at 7:28 pm on August 22, 2024: contributor

re-ACK 81554aac80bf2270db977c110c37acc7e8034194

git range-diff master cf88a55e97719aabd62f0b608df0800fef8304de 81554aac80bf2270db977c110c37acc7e8034194

Only changes BOOST_AUTO_TEST_CASE(from_user_hex) tests, error strings, commit messages and revives the LogPrintf -> LogInfo update.

refactor: add uint256::FromUserHex helper

FromUserHex will be used in future commits to construct
uint256 instances from user hex input without being
unnecessarily restrictive on formatting by allowing
0x-prefixed input that is shorter than 64 characters.

70e2c87737

node: use uint256::FromUserHex for -minimumchainwork parsing

Removes dependency on unsafe and deprecated uint256S.

This makes parsing more strict, by returning an error
when the input contains more than 64 hex digits.

8a44d7d3c1

util: remove unused IsHexNumber

The relevant unit tests have been incorporated in
uint256_tests/from_user_hex in a previous commit.

2e58fdb544

node: use uint256::FromUserHex for -assumevalid parsing

Removes dependency on unsafe and deprecated uint256S.

This makes parsing more strict, by returning an error
when the input contains non-hex characters, or when it
contains more than 64 hex digits.

Also make feature_assumevalid.py more robust by using CBlock.hash
which is guaranteed to be 64 characters long, as opposed to the
variable-length hex(CBlock.sha256)

6819e5a329

test: use uint256::FromUserHex for RANDOM_CTX_SEED

Removes dependency on unsafe and deprecated uint256S.

This makes parsing more strict, by requiring RANDOM_CTX_SEED
to be a string of up to 64 hex digits (optionally prefixed with
"0x"), whereas previously any string would be accepted, with
non-hex characters silently ignored and input longer than
64 characters (ignoring "0x" prefix) silently trimmed.

Can be tested with:

$ RANDOM_CTX_SEED=z ./src/test/test_bitcoin --log_level=all --run_test=timeoffsets_tests/timeoffsets_warning -- -printtoconsole=1 | grep RANDOM_CTX_SEED
RANDOM_CTX_SEED must consist of up to 64 hex digits ("0x" prefix allowed), it was set to: 'z'.

Co-Authored-By: MarcoFalke <*~=`'#}+{/-|&$^_@721217.xyz>

18d65d2772

stickies-v force-pushed on Aug 23, 2024

stickies-v commented at 12:59 pm on August 23, 2024: contributor

Force-pushed to address all review comments, limited to nits and tests so should be an easy re-review:

grammar nit for RANDOM_CTX_SEED
added from_user_hex test case to cover more > 64 char cases and cleaned up the test a bit by replacing valid_hex_65 with valid_hex_64 which should be less confusing
re-added mixed case testing in from_user_hex
undid unintentional newline removal

I think it would be clearer and more consistent with the rest of the description if it said “test: the optional RANDOM_CTX_SEED env var will now cause tests to abort when it contains invalid hex characters, or when it is longer than 64 hex digits”

Thanks, PR description updated with your suggestion.

hodlinator commented at 1:13 pm on August 23, 2024: contributor

re-ACK 18d65d27726bf9fc7629b8e794047a10c9cf6156

git range-diff master 81554aac80bf2270db977c110c37acc7e8034194 18d65d27726bf9fc7629b8e794047a10c9cf6156

Changes: Grammar, added&adjusted tests (passed), undid unintentional new line.

DrahtBot requested review from l0rinc on Aug 23, 2024

DrahtBot requested review from ryanofsky on Aug 23, 2024

ryanofsky approved

ryanofsky commented at 2:05 pm on August 23, 2024: contributor

Code review ACK 18d65d27726bf9fc7629b8e794047a10c9cf6156. Very nice change that cleans up the API, adds checking for invalid values, makes parsing of values more consistent, and adds test coverage.

Just small test, error message, and comment changes since last review

l0rinc commented at 3:09 pm on August 23, 2024: contributor

While I’m not a fan of mixed casing, I agree that it’s probably outside the scope of current change.

ACK 18d65d27726bf9fc7629b8e794047a10c9cf6156

achow101 commented at 8:34 pm on August 27, 2024: member

ACK 18d65d27726bf9fc7629b8e794047a10c9cf6156

achow101 merged this on Aug 27, 2024

achow101 closed this on Aug 27, 2024

stickies-v deleted the branch on Aug 29, 2024

node: reduce unsafe uint256S usage #30569

Code Coverage

Reviews

Conflicts