util: ParseByteUnits - Parse a string with suffix unit #23249

dougEfresh commented at 10:25 AM on October 11, 2021: contributor

A convenience utility for parsing human readable strings sizes e.g. 500G is 500 * 1 << 30

The argument/setting maxuploadtarget now accept human readable byte units [k|K|m|M|g|G||t|T] This change backward compatible, defaults to M if no unit specified.

DrahtBot added the label Utils/log/libs on Oct 11, 2021

in src/init.cpp:1753 in 3f900a6946 outdated

1749 | @@ -1749,7 +1750,7 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
1750 |      connOptions.nReceiveFloodSize = 1000 * args.GetIntArg("-maxreceivebuffer", DEFAULT_MAXRECEIVEBUFFER);
1751 |      connOptions.m_added_nodes = args.GetArgs("-addnode");
1752 |  
1753 | -    connOptions.nMaxOutboundLimit = 1024 * 1024 * args.GetIntArg("-maxuploadtarget", DEFAULT_MAX_UPLOAD_TARGET);
1754 | +    connOptions.nMaxOutboundLimit = ParseByteUnits(args.GetArg("-maxuploadtarget", "")).value_or(DEFAULT_MAX_UPLOAD_TARGET);

laanwj commented at 2:10 PM on October 11, 2021:

When we change this, it would be nice to do actual error handling instead of silently using the default if the value is unparseable.

dougEfresh commented at 4:26 PM on October 11, 2021:

The current/master branch implementation is silent, I agree we should do some type of error handling, should we:

Return InitError()
Log a warning

If we throw an InitError, some people won't notice it, as this part of the code comes after level DB initialize. The error is shown like ~45-60 seconds after start up. (I know I didn't notice it myself)

ryanofsky has other PR(s) that seems to address this by refactoring ArgsManager #22766 / #16545 . Perhaps wait for that.

ryanofsky commented at 11:46 PM on October 11, 2021:

ryanofsky has other PR(s) that seems to address this by refactoring ArgsManager #22766 / #16545 . Perhaps wait for that.

I'd definitely suggest fatal, explicit InitError for any bad or ambiguous values provided. ArgsManager class and PRs are a mess, I'd stay far away and keep this separate!

EDIT: More concretely, I'd suggest changing DEFAULT_MAX_UPLOAD_TARGET to a string and writing something like:

const std::string DEFAULT_MAX_UPLOAD_TARGET = "0M";
...
auto max_upload_str = args.GetArg("-maxuploadtarget", DEFAULT_MAX_UPLOAD_TARGET);
auto max_upload = ParseByteUnits(max_upload_str);
if (!max_upload) {
    return InitError(strprintf("Bad -maxuploadtarget %s", max_upload_str));
}
connOptions.nMaxOutboundLimit = *max_upload;

in src/util/strencodings.cpp:542 in 3f900a6946 outdated

 537 | +    size_t nBytes{0};
 538 | +
 539 | +    v.pop_back();
 540 | +    switch (ToLower(unit)) {
 541 | +        case 'k':
 542 | +            return ParseUInt64(v, &nBytes) ? std::optional<size_t>{nBytes << 10} : std::nullopt;

laanwj commented at 2:11 PM on October 11, 2021:

Could factor out the ParseUInt64(v, &nBytes) part here, it's the same for all cases.

in src/util/strencodings.h:315 in 3f900a6946 outdated

 310 | + *
 311 | + * Examples: 4m,6M,9g,78T
 312 | + *
 313 | + * @param[in] str                  the string to convert into bytes
 314 | + * @param[in] default_multiplier   if no unit is found in str, use this multiplier. default is MiB
 315 | + * @returns                        optional size_t bytes from str

laanwj commented at 2:12 PM on October 11, 2021:

Please document under which conditions it returns a value or nullopt (e.g. parse errors).

in src/util/strencodings.cpp:551 in 3f900a6946 outdated

 546 | +            break;
 547 | +        case 'g':
 548 | +            return ParseUInt64(v, &nBytes) ? std::optional<size_t>{nBytes << 30} : std::nullopt;
 549 | +            break;
 550 | +        case 't':
 551 | +            return ParseUInt64(v, &nBytes) ? std::optional<size_t>{nBytes << 40} : std::nullopt;

laanwj commented at 4:06 PM on October 11, 2021:

Are people going to fight over factor 1000 versus 1024 or has this been settled?

sipa commented at 4:14 PM on October 11, 2021:

I'd support both, lowercase for 1000-based, uppercase for 1024-based? That hopefully avoids most controversy.

Also, this looks like it could easily overflow.

dougEfresh force-pushed on Oct 13, 2021

dougEfresh commented at 2:58 PM on October 13, 2021: contributor

Updated with

Replacing size_t in favor of uint64_t (windows/mac didn't play nice with size_t)
Adding InitError on parsing failure
Simplified case statement in ParseByteUnit
1000 for lowercase 1024 base for UPPERCASE

What I need help with is overflows. I use __builtin_mul_overflow but there must be a better/easier solution.

dougEfresh renamed this:
~~util: ParseByteUnits - Parse a string with a suffix unit~~
util: ParseByteUnits - Parse a string with suffix unit
on Oct 13, 2021

DrahtBot commented at 8:37 PM on October 14, 2021: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#22834 (net: respect -onlynet= when making outbound connections by vasild)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

in src/util/strencodings.cpp:578 in 7d96a50d56 outdated

 573 | +            // use default_multipiler
 574 | +            break;
 575 | +    }
 576 | +
 577 | +    int64_t total{0};
 578 | +#if defined(HAVE_BUILTIN_MUL_OVERFLOW)

practicalswift commented at 8:13 AM on October 15, 2021:

Could use MultiplicationOverflow instead if it is moved out from src/test/fuzz/util.h and proper unit tests are added fot it?

dougEfresh commented at 4:32 PM on October 15, 2021:

Since I'm checking an unsigned int, I used

nBytes > std::numeric_limits<uint64_t>::max() / multiplier

If there is interest, I can create another PR with MultiplicationOverflow out of src/test/fuzz/util.h

dougEfresh force-pushed on Oct 15, 2021

in src/util/strencodings.cpp:542 in 0d786a1094 outdated

 537 | +    auto unit = v.back();
 538 | +    uint64_t nBytes{0};
 539 | +    uint64_t multiplier{default_multiplier};
 540 | +
 541 | +    v.pop_back();
 542 | +    if (!(ParseUInt64(v, &nBytes) || ParseUInt64(str, &nBytes))) {

promag commented at 9:00 PM on October 15, 2021:

I think parsing fails for number > 9 without unit?

dougEfresh commented at 11:04 AM on October 16, 2021:

Added this test case:

BOOST_CHECK_EQUAL(ParseByteUnits("10").value_or(0), 10ULL << 20UL);

in src/util/strencodings.cpp:573 in 0d786a1094 outdated

 568 | +            break;
 569 | +        case 'T':
 570 | +            multiplier = 1ULL << 40;
 571 | +            break;
 572 | +        default:
 573 | +            // use default_multipiler

promag commented at 9:00 PM on October 15, 2021:

Should be digit?

dougEfresh commented at 11:04 AM on October 16, 2021:

Updated to parse the original str in the default case

promag commented at 9:03 PM on October 15, 2021: member

Concept ACK. However, I think the implementation can be improved. Why not check the last char first? If it's not a digit then extract it and validate to get the multiplier. Only then parse the number.

dougEfresh force-pushed on Oct 16, 2021

dougEfresh force-pushed on Nov 1, 2021

in src/init.cpp:1754 in c5d4a59cf9 outdated

1749 | @@ -1749,7 +1750,11 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
1750 |      connOptions.nReceiveFloodSize = 1000 * args.GetIntArg("-maxreceivebuffer", DEFAULT_MAXRECEIVEBUFFER);
1751 |      connOptions.m_added_nodes = args.GetArgs("-addnode");
1752 |  
1753 | -    connOptions.nMaxOutboundLimit = 1024 * 1024 * args.GetIntArg("-maxuploadtarget", DEFAULT_MAX_UPLOAD_TARGET);
1754 | +    auto opt_max_upload = ParseByteUnits(args.GetArg("-maxuploadtarget", DEFAULT_MAX_UPLOAD_TARGET));
1755 | +    if (!opt_max_upload) {

dougEfresh commented at 8:13 AM on November 1, 2021:

I want to point out to reviewers that this changes start up behavior. Previously, -maxuploadtarget would silently fail and use DEFAULT_MAX_UPLOAD_TARGET. This change will throw an error upon parse failure, but only after 30-50 seconds after start-up

ryanofsky commented at 4:12 PM on November 2, 2021:

re: #23249 (review)

This change will throw an error upon parse failure, but only after 30-50 seconds after start-up

I'm not seeing this take 30-50 seconds, just a second or two. Still it would be a good to provide faster checking, and just moving the parsing call up a little higher in the AppInitMain function seems to give good results. I tried:

diff --git a/src/init.cpp b/src/init.cpp
index 517ad2342ff..513cc24faeb 100644
--- a/src/init.cpp
+++ b/src/init.cpp
@@ -1114,6 +1114,12 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
 {
     const ArgsManager& args = *Assert(node.args);
     const CChainParams& chainparams = Params();
+
+    auto opt_max_upload = ParseByteUnits(args.GetArg("-maxuploadtarget", DEFAULT_MAX_UPLOAD_TARGET));
+    if (!opt_max_upload) {
+        return InitError(strprintf(_("Unable to parse -maxuploadtarget: '%s' (possible integer overflow?)"), args.GetArg("-maxuploadtarget","")));
+    }
+
     // ********************************************************* Step 4a: application initialization
     if (!CreatePidFile(args)) {
         // Detailed error printed inside CreatePidFile().
@@ -1749,11 +1755,6 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
     connOptions.nSendBufferMaxSize = 1000 * args.GetIntArg("-maxsendbuffer", DEFAULT_MAXSENDBUFFER);
     connOptions.nReceiveFloodSize = 1000 * args.GetIntArg("-maxreceivebuffer", DEFAULT_MAXRECEIVEBUFFER);
     connOptions.m_added_nodes = args.GetArgs("-addnode");
-
-    auto opt_max_upload = ParseByteUnits(args.GetArg("-maxuploadtarget", DEFAULT_MAX_UPLOAD_TARGET));
-    if (!opt_max_upload) {
-        return InitError(strprintf(_("Unable to parse -maxuploadtarget: '%s' (possible integer overflow?)"), args.GetArg("-maxuploadtarget","")));
-    }
     connOptions.nMaxOutboundLimit = *opt_max_upload;;
     connOptions.m_peer_connect_timeout = peer_connect_timeout;

dougEfresh commented at 10:13 PM on November 2, 2021:

Moved

in src/util/strencodings.cpp:588 in c5d4a59cf9 outdated

 583 | +
 584 | +    if (multiplier == 0) {
 585 | +        return opt_uint64{nBytes};
 586 | +    }
 587 | +    // check overflow
 588 | +    return nBytes > std::numeric_limits<uint64_t>::max() / multiplier ? std::nullopt : opt_uint64{nBytes * multiplier};

ryanofsky commented at 6:19 PM on November 2, 2021:

In commit "util: ParseByteUnits - Parse a string with suffix unit [k|K] [m|M] [g|G] [t|T]" (c5d4a59cf91f54a2027851b32cf83cfcb04cd210)

There's repetition and unneeded variables in this function. Also nBytes holds the number before multiplication instead of the actual number of bytes, and the name doesn't follow current coding convention. Would suggest an overall cleanup here like:

diff --git a/src/util/strencodings.cpp b/src/util/strencodings.cpp
index c490cc7f3e1..0075e69c5e7 100644
--- a/src/util/strencodings.cpp
+++ b/src/util/strencodings.cpp
@@ -530,19 +530,8 @@ std::string HexStr(const Span<const uint8_t> s)
 
 std::optional<uint64_t> ParseByteUnits(const std::string& str, const uint64_t default_multiplier)
 {
-    if (str.empty()) {
-        return std::nullopt;
-    }
-
-    using opt_uint64 = std::optional<uint64_t>;
-
-    uint64_t nBytes{0};
-    uint64_t multiplier{default_multiplier};
-    auto unit = str.back();
-    auto v = str;
-
-    v.pop_back();
-
+    auto multiplier = default_multiplier ? default_multiplier : 1;
+    char unit = str.empty() ? 0 : str.back();
     switch (unit) {
         case 'k':
             multiplier = 1000;
@@ -569,21 +558,13 @@ std::optional<uint64_t> ParseByteUnits(const std::string& str, const uint64_t de
             multiplier = 1ULL << 40;
             break;
         default:
-            // unit should be digit, parse original str
-            if (!ParseUInt64(str, &nBytes)) {
-                return std::nullopt;
-            }
-            //check overflow and use default_multiplier
-            return default_multiplier == 0 ? opt_uint64{nBytes} : nBytes > std::numeric_limits<uint64_t>::max() / default_multiplier ? std::nullopt : opt_uint64{nBytes * default_multiplier};
+            unit = 0;
+            break;
     }
-
-    if (!ParseUInt64(v, &nBytes)) {
+    uint64_t num;
+    if (!ParseUInt64(unit ? str.substr(0, str.size()-1) : str, &num) ||
+        num > std::numeric_limits<uint64_t>::max() / multiplier) {
         return std::nullopt;
     }
-
-    if (multiplier == 0) {
-        return opt_uint64{nBytes};
-    }
-    // check overflow
-    return nBytes > std::numeric_limits<uint64_t>::max() / multiplier ? std::nullopt : opt_uint64{nBytes * multiplier};
+    return num * multiplier;
 }

dougEfresh commented at 10:13 PM on November 2, 2021:

Great suggestion, code now looks cleaner.

in src/test/util_tests.cpp:2478 in c5d4a59cf9 outdated

2473 | +
2474 | +    BOOST_CHECK_EQUAL(ParseByteUnits("4t").value_or(0), 4ULL * 1000 * 1000 * 1000 * 1000);
2475 | +    BOOST_CHECK_EQUAL(ParseByteUnits("4T").value_or(0), 4ULL << 40);
2476 | +
2477 | +    // override default multiplier
2478 | +    BOOST_CHECK_EQUAL(ParseByteUnits("5", 1ULL << 10).value_or(0), 5ULL << 10);

ryanofsky commented at 6:27 PM on November 2, 2021:

In commit "util: ParseByteUnits - Parse a string with suffix unit [k|K] [m|M] [g|G] [t|T]" (c5d4a59cf91f54a2027851b32cf83cfcb04cd210)

Would be good to check behavior with 0 multiplier to make sure it doesn't accidentally change in the future.

dougEfresh commented at 10:14 PM on November 2, 2021:

Added 0 multiplier test

in src/test/util_tests.cpp:2491 in c5d4a59cf9 outdated

2486 | +
2487 | +    // be positive
2488 | +    BOOST_CHECK(!ParseByteUnits("-123m").has_value());
2489 | +
2490 | +    // zero padding
2491 | +    BOOST_CHECK_EQUAL(ParseByteUnits("020M").value_or(1), 20ULL << 20);

ryanofsky commented at 6:31 PM on November 2, 2021:

In commit "util: ParseByteUnits - Parse a string with suffix unit [k|K] [m|M] [g|G] [t|T]" (c5d4a59cf91f54a2027851b32cf83cfcb04cd210)

Minor inconsistency here where this case is using .value_or(1) and cases above are using .value_or(0). I think it would best if all of these cases were consistent and better if they were using .value() instead of .value_or()

dougEfresh commented at 10:14 PM on November 2, 2021:

Agreed

ryanofsky approved

ryanofsky commented at 6:47 PM on November 2, 2021: contributor

Code review and lightly tested ACK c5d4a59cf91f54a2027851b32cf83cfcb04cd210. I left some suggestions below, but none are important so feel free to ignore them.

It might be good to add a release note in the doc/ directory mentioning the new argument verification, since in theory it could make a configuration that used to work previously now cause an error on startup.

It would also be good to mention new behavior in the PR description, since current description is a little ambiguous about whether this is only adding a utility function or changing behavior.

dougEfresh force-pushed on Nov 2, 2021

dougEfresh commented at 10:15 PM on November 2, 2021: contributor

It might be good to add a release note in the doc/ directory mentioning the new argument verification, since in theory it could make a configuration that used to work previously now cause an error on startup.

Added to release docs

It would also be good to mention new behavior in the PR description, since current description is a little ambiguous about whether this is only adding a utility function or changing behavior.

Updated PR description.

Thanks for feedback @ryanofsky

dougEfresh requested review from laanwj on Nov 2, 2021

ryanofsky approved

ryanofsky commented at 5:31 PM on November 8, 2021: contributor

Code review ACK f1f8a772b0daf8cc0c40f0fc54e82618ad242ddd. Just suggested changes since last review (thanks!)

in src/init.cpp:1768 in f1f8a772b0 outdated

1754 | @@ -1748,8 +1755,7 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
1755 |      connOptions.nSendBufferMaxSize = 1000 * args.GetIntArg("-maxsendbuffer", DEFAULT_MAXSENDBUFFER);
1756 |      connOptions.nReceiveFloodSize = 1000 * args.GetIntArg("-maxreceivebuffer", DEFAULT_MAXRECEIVEBUFFER);
1757 |      connOptions.m_added_nodes = args.GetArgs("-addnode");
1758 | -
1759 | -    connOptions.nMaxOutboundLimit = 1024 * 1024 * args.GetIntArg("-maxuploadtarget", DEFAULT_MAX_UPLOAD_TARGET);
1760 | +    connOptions.nMaxOutboundLimit = *opt_max_upload;;

vasild commented at 9:05 AM on November 9, 2021:

nit: double ;; at the end

dougEfresh commented at 12:19 PM on November 9, 2021:

removed

in src/test/util_tests.cpp:2459 in f1f8a772b0 outdated

2455 | @@ -2456,4 +2456,49 @@ BOOST_AUTO_TEST_CASE(remove_prefix)
2456 |      BOOST_CHECK_EQUAL(RemovePrefix("", ""), "");
2457 |  }
2458 |  
2459 | +BOOST_AUTO_TEST_CASE(util_ParseByteUnits)

vasild commented at 9:17 AM on November 9, 2021:

Maybe add a test for unknown suffix, e.g. 123X.

dougEfresh commented at 12:17 PM on November 9, 2021:

Added invalid unit test

in src/util/strencodings.cpp:566 in f1f8a772b0 outdated

 561 | +            unit = 0;
 562 | +            break;
 563 | +    }
 564 | +
 565 | +    uint64_t num;
 566 | +    if (!ParseUInt64(unit ? str.substr(0, str.size()-1) : str, &num) ||

vasild commented at 9:19 AM on November 9, 2021:

nit: missing spaces around -.

dougEfresh commented at 12:20 PM on November 9, 2021:

When I added the space, it didn't look aesthetically pleasing to me, normally I do add a space, but this if statement looks better without I don't mind adding a space if you like or if someone else asks.

vasild commented at 1:51 PM on November 9, 2021:

The thing is that lots of people have lots of different preferences. "Aesthetically pleasant" for somebody may not be for another. The project uses clang-format to avoid endless low-value discussions about styling.

dougEfresh commented at 4:48 PM on November 9, 2021:

@vasild I changed const uint64_t to uint64_t

in src/init.cpp:440 in f1f8a772b0 outdated

 435 | @@ -435,7 +436,7 @@ void SetupServerArgs(ArgsManager& argsman)
 436 |      argsman.AddArg("-maxreceivebuffer=<n>", strprintf("Maximum per-connection receive buffer, <n>*1000 bytes (default: %u)", DEFAULT_MAXRECEIVEBUFFER), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);
 437 |      argsman.AddArg("-maxsendbuffer=<n>", strprintf("Maximum per-connection send buffer, <n>*1000 bytes (default: %u)", DEFAULT_MAXSENDBUFFER), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);
 438 |      argsman.AddArg("-maxtimeadjustment", strprintf("Maximum allowed median peer time offset adjustment. Local perspective of time may be influenced by peers forward or backward by this amount. (default: %u seconds)", DEFAULT_MAX_TIME_ADJUSTMENT), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);
 439 | -    argsman.AddArg("-maxuploadtarget=<n>", strprintf("Tries to keep outbound traffic under the given target (in MiB per 24h). Limit does not apply to peers with 'download' permission. 0 = no limit (default: %d)", DEFAULT_MAX_UPLOAD_TARGET), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);
 440 | +    argsman.AddArg("-maxuploadtarget=<n>", strprintf("Tries to keep outbound traffic under the given target (in MiB, GiB or TiB per 24h). Limit does not apply to peers with 'download' permission or blocks created within past week. 0 = no limit (default: %s). Optional suffix units [m|M] MiB, [g|G] GiB, [t|T] TiB. Lowercase is 1000 base while uppercase is 1024 base (default).", DEFAULT_MAX_UPLOAD_TARGET), ArgsManager::ALLOW_ANY, OptionsCategory::CONNECTION);

vasild commented at 9:38 AM on November 9, 2021:

I find this text confusing:

Optional suffix units [m|M] MiB, [g|G] GiB, [t|T] TiB. Lowercase is 1000 base while uppercase is 1024 base

Maybe:

Optional suffix units [m|M|g|G|t|T]. Lowercase is 1000 base while uppercase is 1024 base

dougEfresh commented at 12:18 PM on November 9, 2021:

@vasild Good suggestion, I've changed the doc to:

Tries to keep outbound traffic under the given target per 24h. Limit does not apply to peers with 'download' permission or blocks created within past week. 0 = no limit (default: %s). Optional suffix units [k|K|m|M|g|G|t|T] (default: M). Lowercase is 1000 base while uppercase is 1024 base

dougEfresh force-pushed on Nov 9, 2021

in src/util/strencodings.h:319 in f3f88062f4 outdated

 314 | + * @param[in] str                  the string to convert into bytes
 315 | + * @param[in] default_multiplier   if no unit is found in str, use this multiplier. default is MiB
 316 | + * @returns                        optional uint64_t bytes from str or nullopt
 317 | + *                                 if ParseUInt64 is false, str is empty, trailing whitespace or overflow
 318 | + */
 319 | +std::optional<uint64_t> ParseByteUnits(const std::string& str, const uint64_t default_multiplier = 1024UL*1024UL);

vasild commented at 1:37 PM on November 9, 2021:

nit: drop unnecessary const from the integer argument style: missing spaces around *:

std::optional<uint64_t> ParseByteUnits(const std::string& str, uint64_t default_multiplier = 1024UL * 1024UL);

dougEfresh commented at 2:34 PM on November 9, 2021:

I ran clang-format-diff.py

vasild commented at 2:48 PM on November 9, 2021:

The const is still there. I think there was discussion somewhere about it, but I can't find it now...

in src/util/strencodings.cpp:568 in f3f88062f4 outdated

 563 | +    }
 564 | +
 565 | +    uint64_t num;
 566 | +    if (!ParseUInt64(unit ? str.substr(0, str.size()-1) : str, &num) ||
 567 | +        num > std::numeric_limits<uint64_t>::max() / multiplier) { // check overflow
 568 | +         return std::nullopt;

vasild commented at 1:41 PM on November 9, 2021:

style: this has 5 spaces for indentation, should be 4.

in src/util/strencodings.cpp:536 in f3f88062f4 outdated

 531 | +std::optional<uint64_t> ParseByteUnits(const std::string& str, const uint64_t default_multiplier)
 532 | +{
 533 | +    auto multiplier = default_multiplier ? default_multiplier : 1;
 534 | +    char unit = str.empty() ? 0 : str.back();
 535 | +    switch (unit) {
 536 | +        case 'k':

vasild commented at 1:42 PM on November 9, 2021:

style: case should be aligned below switch:

    switch (unit) {
    case 'k':
        multiplier = 1000;

in src/init.cpp:1121 in f3f88062f4 outdated

1114 | @@ -1114,6 +1115,12 @@ bool AppInitMain(NodeContext& node, interfaces::BlockAndHeaderTipInfo* tip_info)
1115 |  {
1116 |      const ArgsManager& args = *Assert(node.args);
1117 |      const CChainParams& chainparams = Params();
1118 | +
1119 | +    auto opt_max_upload = ParseByteUnits(args.GetArg("-maxuploadtarget", DEFAULT_MAX_UPLOAD_TARGET));
1120 | +    if (!opt_max_upload) {
1121 | +        return InitError(strprintf(_("Unable to parse -maxuploadtarget: '%s' (possible integer overflow?)"), args.GetArg("-maxuploadtarget","")));

vasild commented at 1:44 PM on November 9, 2021:

style: missing space

        return InitError(strprintf(_("Unable to parse -maxuploadtarget: '%s' (possible integer overflow?)"), args.GetArg("-maxuploadtarget", "")));

vasild approved

vasild commented at 1:47 PM on November 9, 2021: contributor

ACK f3f88062f4cdb0cc2d1af221a5a9da9ee2ea0e46

Some styling nits below, which would be good to address before merge.

The style rules are defined in src/.clang-format. You can use contrib/devtools/clang-format-diff.py to properly format the patch.

dougEfresh force-pushed on Nov 9, 2021

vasild approved

vasild commented at 2:44 PM on November 9, 2021: contributor

ACK 6ebb47626ee30576548e8b3c01ad961761e28562

dougEfresh force-pushed on Nov 9, 2021

dougEfresh commented at 4:49 PM on November 9, 2021: contributor

@MarcoFalke would you mind reviewing? Thanks

vasild approved

vasild commented at 8:21 AM on November 10, 2021: contributor

ACK 4451a0000dda5daba4b1624dce1c95f1914e7251

in src/util/strencodings.cpp:533 in 4451a0000d outdated

 526 | @@ -526,3 +527,45 @@ std::string HexStr(const Span<const uint8_t> s)
 527 |      assert(it == rv.end());
 528 |      return rv;
 529 |  }
 530 | +
 531 | +std::optional<uint64_t> ParseByteUnits(const std::string& str, uint64_t default_multiplier)
 532 | +{
 533 | +    auto multiplier = default_multiplier ? default_multiplier : 1;

promag commented at 9:15 AM on November 10, 2021:

nit, why not assert(default_multiplier > 0)?

dougEfresh commented at 11:38 AM on November 10, 2021:

Depends on how strict or how we want to use this function. For me, a 0 default_multiplier means "don not apply a multiplier" if no unit found, parse str as is. This may not be intuitive. I'm fine with restricting multiplier > 0. @MarcoFalke what's your opinion ? either way, I'll update the docs in strencoding.h to explaining what a default_multiplier of zero means.

MarcoFalke commented at 11:50 AM on November 10, 2021:

Wouldn't "don not apply a multiplier" mean 1?

dougEfresh commented at 11:57 AM on November 10, 2021:

ok, i will add an assert

in src/util/strencodings.cpp:534 in 4451a0000d outdated

 526 | @@ -526,3 +527,45 @@ std::string HexStr(const Span<const uint8_t> s)
 527 |      assert(it == rv.end());
 528 |      return rv;
 529 |  }
 530 | +
 531 | +std::optional<uint64_t> ParseByteUnits(const std::string& str, uint64_t default_multiplier)
 532 | +{
 533 | +    auto multiplier = default_multiplier ? default_multiplier : 1;
 534 | +    char unit = str.empty() ? 0 : str.back();

promag commented at 9:15 AM on November 10, 2021:

nit,

if (str.empty()) return std::nullopt;
char unit = str.back();

promag commented at 9:20 AM on November 10, 2021: member

Code review ACK 4451a0000dda5daba4b1624dce1c95f1914e7251.

in src/util/strencodings.cpp:566 in 4451a0000d outdated

 561 | +        unit = 0;
 562 | +        break;
 563 | +    }
 564 | +
 565 | +    uint64_t num;
 566 | +    if (!ParseUInt64(unit ? str.substr(0, str.size() - 1) : str, &num) ||

MarcoFalke commented at 9:24 AM on November 10, 2021:

With this breaking change (no longer ignoring whitespace), which I think is fine, it might be best to also reject sign characters (+-) and use ToIntegral<uint64_t>.

dougEfresh commented at 4:13 PM on November 10, 2021:

I have tests for +-, whitespace, invalid unit...etc I'm now using ToIntegral. I've added an assert(default_multiplier > 0) but I don't know how to test this. Any ideas examples?

vasild commented at 11:31 AM on November 11, 2021:

You cannot have the test trigger an assert() and continue successfully. Instead, you can throw std::out_of_range from the function and try/catch it from the test (BOOST_CHECK_THROW()). But in this case maybe it is best/simpler to just remove the parameter. Then there is no need to test edge cases for it :)

in src/util/strencodings.h:319 in 4451a0000d outdated

 314 | + * @param[in] str                  the string to convert into bytes
 315 | + * @param[in] default_multiplier   if no unit is found in str, use this multiplier. default is MiB
 316 | + * @returns                        optional uint64_t bytes from str or nullopt
 317 | + *                                 if ParseUInt64 is false, str is empty, trailing whitespace or overflow
 318 | + */
 319 | +std::optional<uint64_t> ParseByteUnits(const std::string& str, uint64_t default_multiplier = 1024UL * 1024UL);

MarcoFalke commented at 9:26 AM on November 10, 2021:

Should the default multiplier have a default? It seems like the default is something that is per-arg and not project-wide.

dougEfresh commented at 11:21 AM on November 10, 2021:

Agreed. I will remove the default.

dougEfresh commented at 3:01 PM on November 10, 2021:

How about 1 as a default multiplier ?

promag commented at 3:19 PM on November 10, 2021:

How about removing the argument? It's not needed. Either way please update doc comment above.

dougEfresh commented at 4:14 PM on November 10, 2021:

default of 1 seems natural, so my latest change defaults to 1

vasild commented at 11:26 AM on November 11, 2021:

How about removing the argument? It's not needed.

Indeed! I would remove that unused code. No parameter, no problem :) (what should be its default value? should it have default value? how to test 0 if we have assert()?)

dougEfresh commented at 2:45 PM on November 11, 2021:

i do need this multiplier argument. the intent of this function is to be used with maxupload, dbcache...etc. they inheritly have a default multiplier (mostly MiB). the main purpose is maintain backwards capability, e. g. maxupload=300 becomes 300m. As I write this, i think I will change the argument to a an enum with the valid units [k|K|m|M...] with no default value. This makes it clear and easy to verify/test.

dougEfresh commented at 8:12 AM on November 12, 2021:

@vasild @MarcoFalke @promag I added an enum ByteUnitMultiplier to src/util/strencodings.src/util/strencodings.h Let me know what you think.

dougEfresh force-pushed on Nov 10, 2021

in src/util/strencodings.cpp:553 in e4e2441eba outdated

 548 | +        break;
 549 | +    case 'M':
 550 | +        multiplier = 1ULL << 20;
 551 | +        break;
 552 | +    case 'g':
 553 | +        multiplier = 1000ULL * 1000 * 1000;

MarcoFalke commented at 4:20 PM on November 10, 2021:

        multiplier = 1000'000'000ULL;

Haven't tried this, but can the compiler parse this?

dougEfresh commented at 8:07 AM on November 12, 2021:

it does!

in src/util/strencodings.cpp:571 in e4e2441eba outdated

 566 | +        break;
 567 | +    }
 568 | +
 569 | +    auto parsed_num = ToIntegral<uint64_t>(unit ? str.substr(0, str.size() - 1) : str);
 570 | +    // check overflow
 571 | +    return !parsed_num || parsed_num > std::numeric_limits<uint64_t>::max() / multiplier ? std::nullopt : std::optional{*parsed_num * multiplier};

MarcoFalke commented at 4:22 PM on November 10, 2021:

    if( !parsed_num || parsed_num > std::numeric_limits<uint64_t>::max() / multiplier){
    return std::nullopt;
    }
    return *parsed_num * multiplier;

Maybe split up into two lines to avoid the std::optional{} constructor for the value?

dougEfresh commented at 8:07 AM on November 12, 2021:

I agree, it looks better split

MarcoFalke commented at 4:22 PM on November 10, 2021: member

feel free to ignore the style nits

in src/util/strencodings.h:47 in 97c0c4477d outdated

  41 | +    M_IEC = 1ULL << 20,
  42 | +    G_SI = 1'000'000'000ULL,
  43 | +    G_IEC = 1ULL << 30,
  44 | +    T_SI = 1'000'000'000'000ULL,
  45 | +    T_IEC = 1ULL << 40,
  46 | +};

dougEfresh commented at 8:06 AM on November 12, 2021:

I use the "technical" terms but I don't think they are intuitive, open to suggestions.

in src/util/strencodings.h:36 in 97c0c4477d outdated

  28 | @@ -29,6 +29,22 @@ enum SafeChars
  29 |      SAFE_CHARS_URI, //!< Chars allowed in URIs (RFC 3986)
  30 |  };
  31 |  
  32 | +/**
  33 | + * Used by ParseByteUnits()
  34 | + * SI are 1000 base while IEC is 1024 base
  35 | + */
  36 | +enum ByteUnitMultiplier {

promag commented at 10:07 AM on November 12, 2021:

Use enum class, which can be used like ByteUnit::g or ByteUnit::K.

dougEfresh commented at 5:27 PM on November 12, 2021:

changed. more intuitive, clear.

dougEfresh force-pushed on Nov 12, 2021

dougEfresh force-pushed on Nov 13, 2021

ryanofsky approved

ryanofsky commented at 5:12 PM on November 15, 2021: contributor

Code review ACK 1f51a25aa5037210ba39e14f048b3c9ec1d342ca. Main changes since last review are removing default multiplier, asserting multiplier is positive, adding enum

in src/test/util_tests.cpp:2501 in 1f51a25aa5 outdated

2496 | +
2497 | +    // fractions not allowed
2498 | +    BOOST_CHECK(!ParseByteUnits("0.5T", noop));
2499 | +
2500 | +    // overflow
2501 | +    BOOST_CHECK(!ParseByteUnits("18446744073709551615g", noop).has_value());

vasild commented at 10:02 AM on November 17, 2021:

nit: remove .has_value() for consistency with other tests:

    BOOST_CHECK(!ParseByteUnits("18446744073709551615g", noop));

dougEfresh commented at 10:49 AM on November 17, 2021:

fixed in 21b58f430fa05fdb7c5db79b545302417a5dbceb

in src/util/strencodings.h:326 in 1f51a25aa5 outdated

 321 | @@ -305,4 +322,17 @@ std::string ToUpper(const std::string& str);
 322 |   */
 323 |  std::string Capitalize(std::string str);
 324 |  
 325 | +/**
 326 | + * Parse a string with suffix unit [k|K|m|M|g|G|t|T] .

vasild commented at 10:20 AM on November 17, 2021:

nit: extra white space

 * Parse a string with suffix unit [k|K|m|M|g|G|t|T].

dougEfresh commented at 10:49 AM on November 17, 2021:

fixed

vasild commented at 10:22 AM on November 17, 2021: contributor

The code looks ok (1f51a25aa5037210ba39e14f048b3c9ec1d342ca). Some nits below, feel free to ignore.

I think the 3 commits should be squashed into one - subsequent ones just change stuff added by the first commit.

util: ParseByteUnits - Parse a string with suffix unit [k|K|m|M|g|G|t|T]

A convenience utility for human readable arguments/config e.g. -maxuploadtarget=500g

21b58f430f

dougEfresh force-pushed on Nov 17, 2021

dougEfresh commented at 10:53 AM on November 17, 2021: contributor

@MarcoFalke hopefully, last review 21b58f43 Main changes since last review are removing assigning a default multiplier, adding enum for ByteUnit(s)

vasild approved

vasild commented at 11:39 AM on November 17, 2021: contributor

ACK 21b58f430fa05fdb7c5db79b545302417a5dbceb

Just squash and 2 minor nits since 1f51a25aa5, verified with

diff -u <(git diff 1f51a25aa5~3..1f51a25aa5) <(git show 21b58f430f) |vim -

Thanks!

ryanofsky approved

ryanofsky commented at 5:31 AM on November 24, 2021: contributor

Code review ACK 21b58f430fa05fdb7c5db79b545302417a5dbceb. Only changes since last review are dropping optional has_value call, fixing comment punctuation, squashing commits.

in src/test/util_tests.cpp:2467 in 21b58f430f

2462 | +
2463 | +    // no multiplier
2464 | +    BOOST_CHECK_EQUAL(ParseByteUnits("1", noop).value(), 1);
2465 | +    BOOST_CHECK_EQUAL(ParseByteUnits("0", noop).value(), 0);
2466 | +
2467 | +    BOOST_CHECK_EQUAL(ParseByteUnits("1k", noop).value(), 1000ULL);

MarcoFalke commented at 9:47 AM on November 24, 2021:

nit: Could also add a test that default_multiplier is ignored completely if passed in via the string? Maybe also add a fuzz test?

MarcoFalke approved

MarcoFalke merged this on Nov 24, 2021

MarcoFalke closed this on Nov 24, 2021

sidhujag referenced this in commit 9a3c515945 on Nov 24, 2021

sidhujag referenced this in commit 9ac3ebeb11 on Nov 24, 2021

rebroad commented at 8:38 PM on December 24, 2021: contributor

This fails to compile on some (unsupported, e.g. cygwin) systems:

  CXX      bitcoind-bitcoind.o
In file included from ./netaddress.h:19,
                 from ./chainparams.h:11,
                 from bitcoind.cpp:10:
./util/strencodings.h:41:10: warning: multi-character character constant [-Wmultichar]
   41 |     m = 1'000'000ULL,
      |          ^~~~~

sipa commented at 8:53 PM on December 24, 2021: member

@rebroad You need a C++17 compiler.

delta1 referenced this in commit 3f0bf16b6a on May 30, 2023

bitcoin locked this on Sep 21, 2023