refactor: Replace ParseHex with consteval ""

hodlinator commented at 10:50 am on July 2, 2024: contributor

Motivation:

Validates and converts the hex string into bytes at compile time instead of at runtime like ParseHex().
Eliminates runtime dependencies: #30377 (comment), #30048 (review)
Has stricter requirements than ParseHex() (disallows whitespace and uppercase hex digits) and replaces it in a bunch of places.
Makes it possible to derive other compile time constants.
Minor: should shave off a few runtime CPU cycles.

""_hex produces std::array<std::byte> as the momentum in the codebase is to use std::byte over uint8_t.

Also makes uint256 hex string constructor disallow uppercase hex digits. Discussed: #30560 (review)

Surprisingly does not change the size of the Guix bitcoind binary (on x86_64-linux-gnu) by 1 single byte.

Spawned already merged PRs: #30436, #30482, #30532, #30560.

DrahtBot commented at 10:50 am on July 2, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	l0rinc, ryanofsky, stickies-v
Stale ACK	maflcko

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#30765 (refactor: Extend CScript with operator<< for std::array by l0rinc)
#30618 (test: increase FromUserHex FUZZ and unit testing coverage by l0rinc)
#28710 (Remove the legacy wallet and BDB dependency by achow101)
#26593 (tracing: Only prepare tracepoint arguments when actually tracing by 0xB10C)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the label Refactoring on Jul 2, 2024

hodlinator force-pushed on Jul 2, 2024

DrahtBot added the label Needs rebase on Jul 4, 2024

maflcko commented at 12:29 pm on July 4, 2024: member

transaction_identifier.h - Fixed dormant bug in TxidFromString() where the string_view length wasn’t respected(!).

This is known, see #28922 (review). Thanks for picking it up!

Maybe submit the fix first?

Realizing my GH history has minimum proof of work, I might have delayed creating a PR, but it felt timely as Testnet 4 is being worked on.

Not sure what this has to do with testnet 4. May be best to remove unrelated non-technical details from the commits and pull request description. (You can still share them in later comments, if you really want)

Concept ACK. The same should be done to ParseHex: #30048 (review)

hodlinator commented at 10:00 pm on July 5, 2024: contributor

transaction_identifier.h - Fixed dormant bug in TxidFromString() where the string_view length wasn’t respected(!).

This is known, see #28922 (comment). Thanks for picking it up!

Maybe submit the fix first?

Aha, good that you are tracking it! I see at least 4 possible fixes:

Switch back to Txid TxidFromString(const std::string& str)
Make TxidFromString() convert from std::string_view back to std::string internally before calling uint256S() to introduce a null-terminator.
Carry over the full SetHex(std::string_view) implementation from this PR without touching the SetHex(const char*) implementation.
Like 3 but implement SetHex(const char* str) by calling the std::string_view version.

Which do you recommend?

Realizing my GH history has minimum proof of work, I might have delayed creating a PR, but it felt timely as Testnet 4 is being worked on.

Not sure what this has to do with testnet 4. May be best to remove unrelated non-technical details from the commits and pull request description. (You can still share them in later comments, if you really want)

Thanks for the feedback. The Testnet 4 PR from this weeks review club introduces new hash-literals to the code, but I concede that it’s a weak argument.

Concept ACK. The same should be done to ParseHex: #30048 (comment)

Thanks for having a look and the pointer to ParseHex! It was on my radar momentarily but I didn’t reconsider it after reaching the current solution for uint256S(). Should probably introduce a consteval ParseHex(const char*) implementation as part of this PR. Moving to draft for now.

hodlinator marked this as a draft on Jul 5, 2024

maflcko commented at 6:47 am on July 8, 2024: member

Like 3 but implement SetHex(const char* str) by calling the std::string_view version.

I don’t think const char* overloads will need to be provided when string_view exists. Seems fine to just have a single sting_view function (and call it a fix at the same time).

ryanofsky commented at 3:22 pm on July 8, 2024: contributor

Concept ACK, and would be very nice for this to cover ParseHex. If it did, it seems like it would fix the unexpected consensus library dependency on the util library that hebasto reported in #29015 (comment):

https://github.com/bitcoin/bitcoin/blob/a83f050dbe1392fc6b1b6c2a140c7346653b40d3/src/pubkey.cpp#L193

hodlinator force-pushed on Jul 12, 2024

hodlinator commented at 9:09 am on July 12, 2024: contributor

transaction_identifier.h - Fixed dormant bug in TxidFromString() where the string_view length wasn’t respected(!).

This is known, see #28922 (comment). Thanks for picking it up!

Maybe submit the fix first?

PR up now: #30436

hodlinator force-pushed on Jul 12, 2024

DrahtBot removed the label Needs rebase on Jul 12, 2024

hodlinator force-pushed on Jul 12, 2024

hodlinator renamed this:
~~refactor: Make uint256S(const char*) consteval~~
refactor: Make uint256S(const char*) and ParseHex(const char*) consteval
on Jul 12, 2024

hodlinator force-pushed on Jul 12, 2024

hodlinator force-pushed on Jul 13, 2024

hodlinator force-pushed on Jul 16, 2024

DrahtBot commented at 9:51 pm on July 16, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/27531471253

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot added the label CI failed on Jul 16, 2024

DrahtBot removed the label CI failed on Jul 17, 2024

in src/uint256.h:119 in dc5bf669e9 outdated

111@@ -111,6 +112,47 @@ class uint256 : public base_blob<256> {
112     static const uint256 ONE;
113 };
114 
115+/* uint256 from const char *.
116+ * Could theoretically have been a constructor but is overload of uint256S() for
117+ * historical reasons.
118+ */
119+consteval uint256 uint256S(const char *str)

maflcko commented at 5:10 am on July 17, 2024:

not sure about overloading this. Otherwise, one may get a runtime evaluation when switching from a raw string literal pointer to a consteval string_view, no?

Seems clearer to make it a constructor taking a string_view? I guess the only confusion could be that the Span<const unsigned char> and string_view (aka Span<const char>) do different things (one is hex decoding and the other is not), but that should be fine, because both are distinct types.

hodlinator commented at 12:21 pm on July 17, 2024:

Thanks for taking a closer look!

I want to enforce consteval for existing string literals. Ideally the string_view overload should be constexpr but from my research on MSVC assembly output in the PR summary it seems to handle lookup tables poorly. One possibility would be to remove the lookup table and use if (c >= '0' && c <= '9'), sacrificing some performance, but we don’t often parse ASCII hex strings in runtime anyway, right?

The main purpose of overloading uint256S() with a consteval version was to avoid changing all the call-sites. If we are okay with changing call-sites, it might be better to introduce a uint256(uint64_t,uint64_t,uint64_t,uint64_t) constructor. That way it would be the compiler parsing hexadecimal integer literals directly. See my raw array (54e0213c9c7a7a942dd320264c1f9224c494b777) and span (bfc2fb049f2a12e9936b938fb073738babac0cbb) explorations.

maflcko commented at 12:29 pm on July 17, 2024:

uint256(uint64_t,uint64_t,uint64_t,uint64_t) constructor. That way it would be the compiler parsing hexadecimal integer literals directly.

Not sure. That’d make it impossible to grep for a (let’s say) block hash. Also, it would be harder to copy-paste one, if the developer has to manually split it into 4 parts and add 0x prefixes. Finally, truncation checks can’t be done, see #30377 (review)

was to avoid changing all the call-sites.

How many are there? Should be trivial to write a scripted-diff to replace uint256S(" with something else, no?

Edit: There are around 50 instances. Most of them in a single file:

0$ git grep 'uint256S("' | grep -v 'src/test/' | wc -l 
154
2$ git grep 'uint256S("' src/kernel/chainparams.cpp  | wc -l 
348

maflcko commented at 12:30 pm on July 17, 2024:

I want to enforce consteval for existing string literals. Ideally the string_view overload should be constexpr but from my research on MSVC assembly output

I think string_view can be used in a consteval context, no?

hodlinator commented at 12:44 pm on July 17, 2024:

I want to enforce consteval for existing string literals. Ideally the string_view overload should be constexpr but from my research on MSVC assembly output

I think string_view can be used in a consteval context, no?

Yes, it’s the rest of that paragraph that poses some concern.

fanquake commented at 12:51 pm on July 17, 2024:

Ideally the string_view overload should be constexpr but from my research on MSVC assembly output

Please don’t be overly concerned about MSVC. We’ve had issues with it in the past, where it’s failed to optimize code properly, or we’ve had to write workarounds for it, when code was otherwise fine in Clang or GCC, i.e (https://github.com/bitcoin/bitcoin/pull/28657#discussion_r1360780446). We don’t use it as a release compiler (and never will), so if it’s failing to do X, that isn’t necessarily a blocker to making changes (if X is working correctly in Clang & GCC). MSVC can always be improved later, and shouldn’t prevent us from writing better code now.

hodlinator commented at 12:43 pm on July 18, 2024:

As of 10baae11ecc376f2250d4a51d2bbfeea56c0a31d I have stopped overloading uint256S():

Changed from consteval uint256 uint256S(const char *str) -> consteval explicit uint256(const char (&str)[65]) and base_blob(std::string_view str) This means the string width is enforced at compile time. (The new parsing code is much more strict and also asserts on the length).

Applied uint256S -> uint256 conversion where applicable, removing “0x”-prefixes. TODO: if there is agreement on the current approach - create scripted diff commit.

Added TxidFromStringS() to mirror uint256S() for runtime use.

Clarified reason for introduction of FixedVec in commit message 10baae11ecc376f2250d4a51d2bbfeea56c0a31d.

in src/uint256.h:133 in dc5bf669e9 outdated

128+            return c - 'A' + 0xA;
129+        else
130+            return -1;
131+    };
132+
133+    // Skip leading spaces.

maflcko commented at 5:11 am on July 17, 2024:

What is the point of this? Seems odd to add space before compilation and then remove it during compilation.

Seems easier to not add the space in the first place and fail compilation when there is one?

hodlinator commented at 12:27 pm on July 17, 2024:

My first stab at this was much more strict but when I added tests I went for full compatibility with the runtime overload. I’m certainly open to making it less permissive though.

hodlinator commented at 12:44 pm on July 18, 2024:

More strict as of 10baae11ecc376f2250d4a51d2bbfeea56c0a31d.

in src/uint256.h:136 in dc5bf669e9 outdated

131+    };
132+
133+    // Skip leading spaces.
134+    while (IsSpace(*str))
135+        str += 1;
136+    // Skip "0x" prefix.

maflcko commented at 5:12 am on July 17, 2024:

Same

hodlinator commented at 12:27 pm on July 17, 2024:

Same (https://github.com/bitcoin/bitcoin/pull/30377#discussion_r1680963050)

in src/uint256.h:143 in dc5bf669e9 outdated

138+        str += 2;
139+
140+    size_t digits = 0;
141+    while (from_hex(str[digits]) != -1)
142+        ++digits;
143+    // 64 = 32 bytes * 2 chars each. 32 bytes = 256 bits.

maflcko commented at 5:19 am on July 17, 2024:

Not sure about hardcoding this for every base_blob.

Seems easier to just implement it once for base_blob and then have it available for all?

hodlinator commented at 12:36 pm on July 17, 2024:

There is no uint160S() outside of uint256_tests.cpp and only uint256 and uint160 use base_blob so it felt alright having it here. If we abandon the uint256S()-overload approach I agree to change this.

hodlinator commented at 12:45 pm on July 18, 2024:

consteval parsing now done inside base_blob(std::string_view) as of 10baae11ecc376f2250d4a51d2bbfeea56c0a31d.

in src/uint256.h:144 in dc5bf669e9 outdated

139+
140+    size_t digits = 0;
141+    while (from_hex(str[digits]) != -1)
142+        ++digits;
143+    // 64 = 32 bytes * 2 chars each. 32 bytes = 256 bits.
144+    assert(digits <= 64);

maflcko commented at 5:23 am on July 17, 2024:

Not sure about fuzzy decoding here. Accidentally truncating an uint256 and thus parsing it as something else seems dangerous.

The convenience of being able to truncate leading zeros of a hex-encoded base_blob are never used, are they? The two constants ZERO and ONE are constructed without hex-decoding, so this isn’t needed there, and I fail to see another use case right now.

It seems easier to just assert WIDTH.

hodlinator commented at 12:41 pm on July 17, 2024:

Generally agree #30377 (review)

(Some call-sites do use “0x”-prefix now that I think of it, while others do not, but aligning that would probably be an easy diff to accept).

hodlinator commented at 12:46 pm on July 18, 2024:

Fixed as of 10baae11ecc376f2250d4a51d2bbfeea56c0a31d.

in src/util/strencodings.h:115 in 92b0fb059f outdated

110+    return std::ranges::equal(a, b);
111+}
112+
113+constexpr inline bool IsSpace(char c) noexcept;
114+template <typename Byte = uint8_t, size_t Size>
115+consteval FixedVec<Byte, Size / 2> ParseHex(const char (&hex_str)[Size])

maflcko commented at 5:27 am on July 17, 2024:

Same here. Overloading ParseHex seems fragile, because a compile time string_view will be evaluate at runtime, while a string literal will be evaluated at compile time.

hodlinator commented at 12:50 pm on July 18, 2024:

Still kept it as an overload as of 10baae11ecc376f2250d4a51d2bbfeea56c0a31d. If you prefer different names for runtime/compile time, which do you suggest?

in src/uint256.h:123 in dc5bf669e9 outdated

118+ */
119+consteval uint256 uint256S(const char *str)
120+{
121+   // Non lookup table version of HexDigit().
122+    auto from_hex = [](const char c) -> int8_t {
123+        if (c >= '0' && c <= '9')

maflcko commented at 5:29 am on July 17, 2024:

Please add {, } for multiline-if, according to the dev notes.

hodlinator commented at 12:40 pm on July 18, 2024:

Fixed as of 10baae11ecc376f2250d4a51d2bbfeea56c0a31d.

hodlinator force-pushed on Jul 18, 2024

hodlinator renamed this:
~~refactor: Make uint256S(const char*) and ParseHex(const char*) consteval~~
refactor: Add consteval uint256(const char (&str)[65]) and ParseHex(const char (&s)[Size])
on Jul 18, 2024

hodlinator renamed this:
~~refactor: Add consteval uint256(const char (&str)[65]) and ParseHex(const char (&s)[Size])~~
refactor: Add consteval uint256("str") and ParseHex("str")
on Jul 18, 2024

in src/util/transaction_identifier.h:77 in 10baae11ec outdated

74+}
75+
76+consteval Txid TxidFromString(const char (&str)[65])
77+{
78+    return Txid::FromUint256(uint256(str));
79 }

maflcko commented at 1:17 pm on July 18, 2024:

Why add a consteval overload here? Generally I am not a fan of adding test-only code (code that is only used in tests) to the real program. Performance or exe-size shouldn’t matter for tests, unless it is significant. If you still want to add it to the tests, that is fine, but then please move it to the test code. But please change the function name of the test-only function to something else. It seems odd that test-only code forces real code to be renamed.

hodlinator commented at 1:28 pm on July 18, 2024:

Good call, will fix.

hodlinator force-pushed on Jul 18, 2024

DrahtBot commented at 1:23 pm on July 18, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/27614934599

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot added the label CI failed on Jul 18, 2024

hodlinator commented at 1:24 pm on July 18, 2024: contributor

(Fixed QT/GUI CI errors).

hodlinator force-pushed on Jul 18, 2024

in src/uint256.h:137 in 83a4d2f1cf outdated

133@@ -108,25 +134,16 @@ class uint256 : public base_blob<256> {
134     constexpr uint256() = default;
135     constexpr explicit uint256(uint8_t v) : base_blob<256>(v) {}
136     constexpr explicit uint256(Span<const unsigned char> vch) : base_blob<256>(vch) {}
137+    consteval explicit uint256(const char (&str)[65]) : base_blob<256>(str) {}

maflcko commented at 1:48 pm on July 18, 2024:

A more flexible alternative would be to just accept a string_view and rely on the length assert. However, I guess this prints a more confusing compile error.

Seems fine to change later to string_view, if this is needed.

However, I wonder if you can replace 65 by WIDTH+1?

maflcko commented at 1:50 pm on July 18, 2024:

Also, I wonder if the three duplicate redirects can be replaced by a single uinsg base_blob::base_blob; (or similar) to import all constructors.

hodlinator commented at 9:42 pm on July 19, 2024:

Seems fine to change later to string_view, if this is needed.

However, I wonder if you can replace 65 by WIDTH+1?

Done now in 79921003ffc858ca4b47e0fb187ed83c1667bc27 along with added comment and more descriptive parameter name.

Also, I wonder if the three duplicate redirects can be replaced by a single uinsg base_blob::base_blob; (or similar) to import all constructors.

I guess that would only be possible if switching the uint256-ctor to string_view, which I’d rather hold off on for now.

in src/util/strencodings.h:141 in 83a4d2f1cf outdated

136+        }
137+        auto c1 = from_hex(hex_str[it++]);
138+        if (it >= Size)
139+            return {};
140+        if (!c1.has_value())
141+            return {};

maflcko commented at 2:33 pm on July 18, 2024:

Not sure about these. It should fail compilation instead if non-hex is detected.

hodlinator commented at 8:55 pm on July 19, 2024:

I think it’s more acceptable to diverge on behavior if we call the consteval function something different, see #30377 (review)

maflcko commented at 7:43 am on July 22, 2024:

Maybe BytesFromHex?

hodlinator commented at 8:10 pm on July 23, 2024:

Yes, that would be along the lines of uint256::FromHex() that you have in the works in #30482.

One thing that comes to mind now though is that uint256::FromHex() can return a failure state. Maybe it would be better to have a constructor for this consteval thing - rename FixedVec into ByteVec with an added consteval constructor.

hodlinator commented at 10:21 pm on July 23, 2024:

Having problems inferring the size of the underlying std::array from the char-array size using the constructor approach. Looks like a free function is where it’s at for this one. (And maybe going back to raw std::array for the container if we fail compilation on whitespace/invalid input).

maflcko commented at 5:33 am on July 24, 2024:

I am a bit worried that putting the bytes directly onto the stack (or easily allowing the dev to do so) may lead to high stack usage in some code paths and stack overflow on some platforms that have tighter limits.

Maybe a better overall approach is to validate the hex string at compile time, but then parse into a (heap) vector at runtime?

I don’t think there are any performance concerns where hex parsing is used right now, so doing the parsing at runtime or compile time shouldn’t matter. The important thing is compile-time checking, to catch bugs before the code compiles.

hodlinator commented at 7:51 am on July 24, 2024:

Good point.. the compile-time parsed bytes would go into a data section in the binary instead of the hex string literal, which should :tm: make the binary smaller. But when the std::array containers are initialized during runtime they will take up stack space, instead of the former std::vector taking up heap-space.

Heap is usually slower than the stack, but if the vector was just allocated in the local function, the data should be “hot” anyway.

Not sure how to elegantly achieve compile time validation + runtime parsing.

Will ruminate on this, cheers!

maflcko commented at 8:02 am on July 24, 2024:

Heap is usually slower than the stack, but if the vector was just allocated in the local function, the data should be “hot” anyway.

Right, and I think there is no performance critical path.

Not sure how to elegantly achieve compile time validation + runtime parsing.

Should be easy:

0
1struct ConstevalHexLiteral {
2  const char* const hex;
3  consteval ConstevalHexLiteral(const char* str) : hex{str} { assert(IsCHex(str)); }
4  consteval ConstevalHexLiteral(std::nullptr_t) = delete;
5};
6
7auto BytesFromHex(ConstevalHexLiteral hex) { return std::vector{*Assert(RuntimeParse(hex))}; }

hodlinator commented at 8:35 am on July 24, 2024:

Thanks! That is a piece of art. :)

(Think I would put any Assert()s and vector initialization inside of RuntimeParse()).

maflcko commented at 8:53 am on July 24, 2024:

Yeah, RuntimeParse is probably just TryParseHex, which returns a vector already, so you can drop the vector-move-constructor from my suggestion.

hodlinator commented at 9:49 pm on July 25, 2024:

I am a bit worried that putting the bytes directly onto the stack (or easily allowing the dev to do so) may lead to high stack usage in some code paths and stack overflow on some platforms that have tighter limits.

Outside of tests & benchmarks there would currently only be 4 places where ParseHex() is replaced with BytesFromHex. One has static storage file scope, so wouldn’t affect the stack. The longest one is 336 chars, which becomes 168 bytes on the stack.

4 runtime places tentatively replaced with `BytesFromHex`

net_processing.cpp, the big one:

0// If the peer is old enough to have the old alert system, send it the final alert.
1if (greatest_common_version <= 70012) {
2    constexpr auto finalAlert{BytesFromHex("60010000000000000000000000ffffff7f00000000ffffff7ffeffff7f01ffffff7f00000000ffffff7f00ffffff7f002f555247454e543a20416c657274206b657920636f6d70726f6d697365642c2075706772616465207265717569726564004630440220653febd6410f470f6bae11cad19c48413becb1ac2c17f908fd0fd53bdc3abd5202206d0e9c96fe88d4a0f01ed9dedae2b6f9e00da94cad0fecaae66ecf689bf71b50")};
3    MakeAndPushMessage(pfrom, "alert", Span{finalAlert});
4}

0const CScript genesisOutputScript = CScript() << BytesFromHex("04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f") << OP_CHECKSIG;

0bin = ToByteVector(BytesFromHex("512103ad5e0edad18cb1f0fc0d28a3d4f1f3e445640337489abb10404f2d1e086be430210359ef5021964fe22d6f8e05b2463c9540ce96883fe3b278760f048f5189f2e6c452ae"));

0constexpr XOnlyPubKey XOnlyPubKey::NUMS_H{BytesFromHex("50929b74c1a04954b78b4b6035e97a5e078a5a0f28ec96d547bfee9ace803ac0")};

For reference, src/leveldb/util/posix_logger.h has:

0    constexpr const int kStackBufferSize = 512;
1    char stack_buffer[kStackBufferSize];

(But it may be dead code, and would also just be part of a leaf in the call graph).

On my workstation I have an 8MB stack (50000x 160 bytes) according to:

0    struct rlimit rl;
1    int result = getrlimit(RLIMIT_STACK, &rl);
2    if (result == 0) {
3        LogInfo("Stack size: %ld", rl.rlim_cur);
4    } else {
5        LogInfo("getrlimit failed: %d", result);
6    }

This article gives 135168 bytes of stack size (~800x 168 bytes) for unnamed Raspberry Pi device in 2020: https://www.embeddedrelated.com/showarticle/1330.php

sipa commented at 10:13 pm on July 25, 2024:

You can also use std::is_constant_evaluated to make a function behave differently inside constexpr and outside of it.

hodlinator commented at 7:38 am on July 26, 2024:

Yes, I was experimenting with std::is_constant_evaluated early on (don’t think it was discussed). If I remember correctly, the reason I abandoned it in favor of consteval overloads taking string literals was that I was never getting compile-time evaluation where I expected it. There’s also the divergence in behavior that runtime-functions may need to return optional or heap-allocated types, while compile time-functions can keep all error handling internal and need to return stack-allocated types.

If we were to go with compile time validation when possible, but always heap-allocation for ParseHex, I can see how it may be useful.

maflcko commented at 9:12 am on July 26, 2024:

Outside of tests & benchmarks there would currently only be 4 places where ParseHex() is replaced with BytesFromHex.

Thanks for enumerating. Looking at them:

finalAlert -> Shouldn’t matter much, either way (Span-serialization doesn’t care about the underlying type)
genesisOutputScript -> This must be a vector and can not be an array, because array-serialization is different from vector-serialization
ToByteVector -> Yeah, vector as well
XOnlyPubKey -> Shouldn’t matter?

Again, up to you what you see is a better fit.

If you want to provide a function for arrays and another for vectors, it seems fine as well.

hodlinator commented at 11:57 am on July 26, 2024:

genesisOutputScript -> This must be a vector and can not be an array, because array-serialization is different from vector-serialization

Thanks for pointing that difference out, already ran into some other issues with local changes in test code regarding this. Seems a bit brittle that vectors serialize with the size prefixed while Spans don’t, as vectors can implicitly convert to Spans.

This difference in behavior affects DataStream and possibly other types using serialize.h, but as far as I see CScript only has CScript& operator<<(const std::vector<unsigned char>& b).

I am considering proposing the CScript method take std::span instead. Either way I think genesisOutputScript should still be okay.

maflcko commented at 12:25 pm on July 26, 2024:

Seems a bit brittle that vectors serialize with the size prefixed while Spans don’t, as vectors can implicitly convert to Spans.

As for serialize code, an implicit conversion should not happen. If it does anywhere, then that is a bug and should be fixed.

I am considering proposing the CScript method take std::span instead. Either way I think genesisOutputScript should still be okay.

I am not sure if it is beneficial to have cscript-serialization differ from streams-serialization here. In any case, it shouldn’t be required for the changes here, or otherwise touch the same lines of code as the changes here?

maflcko commented at 8:39 am on July 30, 2024:

(Feel free to resolve the discussion for now, given that you dropped those changes here for now)

hodlinator commented at 1:23 pm on August 13, 2024:

Went with ArrayFromBytes for all in the latest tip ~~d6d7a0b5221935518d2797aec7abc5c9632cbf68~~ 240bc10e3f5f90fcc386b1a72ee2067a156abff3.

~~Avoided adding a public CScript::operator<<(std::span) and went for a private method + public std::array overload.~~ (Latest doesn’t touch script.h-header).

Have a variant with consteval validation in VecFromHex-branch, but didn’t feel it was urgent to use in any of the cases, so currently dropped from this PR: https://github.com/hodlinator/bitcoin/blob/2024-07/uint256S_consteval_VecFromHex/src/util/strencodings.h#L95-L110

DrahtBot removed the label CI failed on Jul 18, 2024

hodlinator force-pushed on Jul 19, 2024

hodlinator force-pushed on Jul 21, 2024

in src/uint256.h:110 in 146a2db9e1 outdated

105+        assert(false);
106+    };
107+
108+    // 2 chars per byte.
109+    assert(str.length() == m_data.size() * 2);
110+    auto writeIt = m_data.begin();

maflcko commented at 7:48 am on July 22, 2024:

nit: snake_case for new code: write_it, str_it, according to the dev notes.

hodlinator force-pushed on Jul 22, 2024

ryanofsky referenced this in commit 7cc00bfc86 on Jul 23, 2024

DrahtBot added the label Needs rebase on Jul 23, 2024

hodlinator force-pushed on Jul 26, 2024

hodlinator renamed this:
~~refactor: Add consteval uint256("str") and ParseHex("str")~~
refactor: Add consteval uint256{"str"}
on Jul 26, 2024

hodlinator commented at 10:29 pm on July 26, 2024: contributor

I’ve scaled back this PR again to only concern itself with uint256 and not ParseHex in the hopes of getting it merged.

Even if it doesn’t get merged in favor of work by @stickies-v (https://github.com/bitcoin/bitcoin/pull/30482#discussion_r1693314022) at least it might provide valuable input without too much distraction.

Will try to make use of discussion around ParseHex approaches here in later PR.

hodlinator force-pushed on Jul 26, 2024

DrahtBot removed the label Needs rebase on Jul 27, 2024

hodlinator marked this as ready for review on Jul 27, 2024

maflcko commented at 8:37 am on July 30, 2024: member

While this may not be a hot spot, it’s better hygiene in C++20 to store the parsed data blob directly in the binary, without any parsing at runtime.

I don’t understand the motivation for this change. You seem to imply that this is some vague performance improvement. However, this is questionable.

Without a benchmark or otherwise steps to reproduce, those claims are hard to follow. Also, I’d argue performance is irrelevant in the touched code parts.

The real reason is that there is now strict compile-time checking for full validity of the hex string. Previously, any invalid string was accepted silently. I’d say this should be explained in the motivation and not silently omitted.

An alternative to the changes in this pull request would be to change the changed lines to *Assert(uint256::FromHex(...)) to get some of the same benefits (at runtime).

maflcko commented at 8:42 am on July 30, 2024: member

Maybe even close this pull request and open a fresh one, given that most discussion and conceptual feedback is about something that is now dropped from the pull? #30377 (comment)

ryanofsky commented at 12:51 pm on July 30, 2024: contributor

I think I don’t understand the suggestion to this close PR, or maybe I just disagree with it. It seems like there are a lot of reasons to think this is a good change. It does seem like the main benefit is that constants can be checked at compile time rather than runtime, but this change can also eliminate runtime dependencies(*), make it possible to derive other constants programmatically rather than hardcoding them as opaque binary strings, make the binary smaller, and make it start up faster. The term “hygiene” can be a little vague, but if this change enables all of that, it seems like good hygiene to me.

(*) Even if this change doesn’t make ParseHex consteval right now, it seems like it would help make it consteval in the future, unless I’m missing something.

maflcko commented at 1:21 pm on July 30, 2024: member

I think I don’t understand the suggestion to this close PR

To clarify with “close this pull request and open a fresh one” I meant “close this pull request and open a fresh one with the exact same commits, including a proper motivation and pull request description”. The reason being that most of the discussion comments are not related to the code changes in this pull request anymore. A good chunk of the discussion was about ParseHex and about a change that has since been split up and merged (https://github.com/bitcoin/bitcoin/pull/30377#issuecomment-2225154172).

But anything is fine here. My main feedback is to clarify the motivation (pull request description). This will have to be done whether it is re-opened or not. Otherwise, every single reviewer and future reader will have to do it themselves, like in #30436 (comment).

Also, to clarify:

Concept ACK on consteval uint256.
Concept ACK on consteval vector hex parsing. (Looking forward to review a pull request with this)

hodlinator marked this as a draft on Jul 31, 2024

hodlinator commented at 10:22 am on July 31, 2024: contributor

I’ll open a new PR for uint256{"str"} only (with a clearer motivation) as suggested and possibly re-use this one for ParseHex later, unless I don’t make that into it’s own PR too.

hodlinator commented at 1:42 pm on July 31, 2024: contributor

(#30560 extracted).

Differences

Comparison

Acquiesced to using string_view in the uint256-constructor instead of char[65]. Makes it more suitable for being called from other consteval/constexpr contexts.
Changed arg name to hex_str to clarify.
Cleaned up hex_str -> byte loop à la @paplorinc. :)

DrahtBot added the label Needs rebase on Aug 5, 2024

hodlinator renamed this:
~~refactor: Add consteval uint256{"str"}~~
on hold: refactor: Add consteval ParseHex variant
on Aug 5, 2024

ryanofsky referenced this in commit 21c2879f37 on Aug 5, 2024

hodlinator force-pushed on Aug 13, 2024

hodlinator renamed this:
~~on hold: refactor: Add consteval ParseHex variant~~
refactor: Add consteval ArrayFromBytes()
on Aug 13, 2024

hodlinator marked this as ready for review on Aug 13, 2024

DrahtBot removed the label Needs rebase on Aug 13, 2024

in src/util/strencodings.h:102 in 00dc16bc29 outdated

 97+    }
 98+    return rv;
 99+}
100+
101+template <typename T, size_t N>
102+constexpr bool operator==(const std::vector<T>& a, const std::array<T, N>& b)

ryanofsky commented at 4:47 pm on August 13, 2024:

In commit “refactor: Add consteval ArrayFromHex()” (00dc16bc29124dfbaf9d47c5411e4ebcf95e30b5)

Is this necessary? I’d be concerned adding such a generic top-level operator might cause this to get called in places we are not expecting, and maybe lead to hard to explain behavior or bugs. Would be good to at least have a comment explaining what this is intended to be used for.

hodlinator commented at 2:00 am on August 14, 2024:

Agree, it turned out it could be worked around by switching to BOOST_CHECK_EQUAL_COLLECTIONS in key_tests.cpp.

in src/util/strencodings.h:115 in 00dc16bc29 outdated

81+
82+/** Converts from hex string literal to byte array at compile time.
83+ * @warning Uses a stack based container (array). Care needs to be
84+ *          taken to avoid using up too much stack memory especially in
85+ *          recursive functions or non-leaf nodes of the call graph.
86+ */

ryanofsky commented at 5:00 pm on August 13, 2024:

In commit “refactor: Add consteval ArrayFromHex()” (00dc16bc29124dfbaf9d47c5411e4ebcf95e30b5)

This warning seems unnecessarily scary and maybe not true. I don’t think the function itself would use any stack space with NRVO, and callers could be calling this to initialize an array on the heap not the stack. Would suggest deleting this comment, because it seems like a warning about std::array in general, and I don’t think there is anything that really distinguishes this function from other functions using std::array.

hodlinator commented at 2:16 am on August 14, 2024:

This is in response to #30377 (review)

To summarize - the concern is not stack memory being used inside ArrayFromBytes itself, but rather that the std::array returned contains an inner C-array (https://github.com/gcc-mirror/gcc/blob/61715e9340ab8941d40d62158fe437e9dbe3e068/libstdc%2B%2B-v3/include/std/array) that is not allocated from the heap and so will bump the stack pointer in the calling function by a fair bit for long hex strings.

Should clarify the comment a bit on next re-touch.

ryanofsky commented at 2:49 pm on August 14, 2024:

re: #30377 (review)

The comment seems not true and not helpful.

If the concern is not stack space used inside the function, and this warning is only being added because the return type of the function is a variable-sized std::array, then I’m not sure why every function that returns a variable-sized std::array shouldn’t have the same warning. This actually seems less deserving of a warning than other functions returning a std::array object, because the size of the returned array is not determined by a number passed by the caller, but by a literal string passed by the caller, and is half the size of that string, so pretty much guaranteed to be small.
It is also inaccurate to say std::array is a stack based container or that calling this function will use stack space to store the array. This function can be called to initialize not just stack objects, but also global and heap objects due to NRVO, without being allocated on or copied from the stack.

hodlinator commented at 6:26 pm on August 14, 2024:

New version of the comment in latest push:

0 * [@warning](/bitcoin-bitcoin/contributor/warning/) Unlike VecFromHex which returns a vector, the returned array may use
1 *          significant stack space if called inside a function.

ryanofsky commented at 6:36 pm on August 14, 2024:

re: #30377 (review)

New version of the comment in latest push:

Thank you! That seems accurate now.

in src/prevector.h:396 in 05c16ae864 outdated

391@@ -392,7 +392,10 @@ class prevector {
392             change_capacity(new_size + (new_size >> 1));
393         }
394         T* ptr = item_ptr(p);
395-        memmove(ptr + count, ptr, (size() - p) * sizeof(T));
396+        // Passing dst variable with explicit type instead of temporary
397+        // expression calms down GCC under some circumstances.

ryanofsky commented at 5:05 pm on August 13, 2024:

In commit “refactor: Make code more tolerant of constexpr std::arrays” (05c16ae8649ee3b6bbcdbdf0541a65ccf4477537)

What is this referring to?

hodlinator commented at 2:19 am on August 14, 2024:

I was running into an issue with CScript (inheriting prevector) and std::arrays, similar to here: https://gitlab.freedesktop.org/pipewire/media-session/-/issues/4

Reverted this change in the latest push as I’m no longer modifying CScript. (Was running into MSVC CI-failures stemming from inability to infer N-parameter in constructor taking std::array<unsigned char, N>::const_iterator).

ryanofsky approved

ryanofsky commented at 5:09 pm on August 13, 2024: contributor

Did not review in detail yet but light code review ACK d6d7a0b5221935518d2797aec7abc5c9632cbf68. All changes seem like what I would expect.

I think I would suggest changing title of PR to “refactor: Replace ParseHex with consteval ArrayFromHex” so it mentions ParseHex and it is more obvious how this affects existing code.

DrahtBot requested review from maflcko on Aug 13, 2024

hodlinator force-pushed on Aug 14, 2024

hodlinator renamed this:
~~refactor: Add consteval ArrayFromBytes()~~
refactor: Replace ParseHex with consteval ArrayFromHex
on Aug 14, 2024

Mahmoud198425 approved

in src/uint256.h:131 in 9dab917088 outdated

127@@ -128,13 +128,13 @@ consteval base_blob<BITS>::base_blob(std::string_view hex_str)
128     // Non-lookup table version of HexDigit().
129     auto from_hex = [](const char c) -> int8_t {
130         if (c >= '0' && c <= '9') return c - '0';
131+        // Only lowercase letters are supported, for consistency.

l0rinc commented at 9:08 am on August 14, 2024:

<3

in src/pubkey.h:256 in 29090eca42 outdated

253@@ -254,7 +254,11 @@ class XOnlyPubKey
254     bool IsNull() const { return m_keydata.IsNull(); }
255 
256     /** Construct an x-only pubkey from exactly 32 bytes. */

l0rinc commented at 9:17 am on August 14, 2024:

nit: now that the code is available here, the comment bacame redundant, since it doesn’t provide any info that the code doesn’t already explain

hodlinator commented at 5:35 pm on August 14, 2024:

Taken in latest push.

in src/test/util_tests.cpp:180 in 1953d21e3f outdated

153@@ -154,28 +154,29 @@ BOOST_AUTO_TEST_CASE(parse_hex)
154     BOOST_CHECK_EQUAL_COLLECTIONS(result.begin(), result.end(), expected.begin(), expected.end());
155 
156     // Spaces between bytes must be supported
157+    expected = {0x12, 0x34, 0x56, 0x78};
158     result = ParseHex("12 34 56 78");
159-    BOOST_CHECK(result.size() == 4 && result[0] == 0x12 && result[1] == 0x34 && result[2] == 0x56 && result[3] == 0x78);
160+    BOOST_CHECK_EQUAL_COLLECTIONS(result.begin(), result.end(), expected.begin(), expected.end());

l0rinc commented at 9:31 am on August 14, 2024:

Nice!

Alternatively, since we claim to be testing whitespace support - and we’ve already tested that non-whitespace values are parsed correctly -, we might as well compare ParseHex("12 34 56 78") to ParseHex("12345678") instead - otherwise all tests would fail all the time when an error is introduced, this way only the appropriate tests would fail (+ testing is simpler).

hodlinator commented at 1:08 pm on August 14, 2024:

Responding to https://github.com/bitcoin/bitcoin/pull/30377/files#r1716610850: I think it’s more rigorous/strict to not use ParseHex on both left & right sides of the comparison.

l0rinc commented at 7:02 pm on August 14, 2024:

By testing these transitively, we wouldn’t test everything everywhere. But the latter can have the advantage of making mistakes really obvious :)

in src/test/util_tests.cpp:193 in 1953d21e3f outdated

171-    BOOST_CHECK(result.size() == 4 && result[0] == 0x89 && result[1] == 0x34 && result[2] == 0x56 && result[3] == 0x78);
172+    BOOST_CHECK_EQUAL_COLLECTIONS(result.begin(), result.end(), expected.begin(), expected.end());
173 
174     // Mixed case and spaces are supported
175+    expected = {0xff, 0xaa};
176     result = ParseHex("     Ff        aA    ");

l0rinc commented at 9:35 am on August 14, 2024:

internal whitespaces are weird enough - but do we still encourage mixed casing?

hodlinator commented at 2:16 pm on August 14, 2024:

Responding to https://github.com/bitcoin/bitcoin/pull/30377/files#r1716616576: Enough casing changes for one PR. :)

in src/test/util_tests.cpp:149 in 09458eadc9 outdated

147@@ -148,6 +148,8 @@ BOOST_AUTO_TEST_CASE(parse_hex)
148     std::vector<unsigned char> result;
149     std::vector<unsigned char> expected(ParseHex_expected, ParseHex_expected + sizeof(ParseHex_expected));

l0rinc commented at 9:40 am on August 14, 2024:

Can me modernized/simplified:

0    std::vector<unsigned char> expected(std::begin(ParseHex_expected), std::end(ParseHex_expected));

(nit: weird ParseHex_expected naming)

hodlinator commented at 5:32 pm on August 14, 2024:

Taken in latest push.

in src/util/strencodings.h:91 in 09458eadc9 outdated

86+ */
87+template <typename Byte = uint8_t, size_t Size>
88+consteval std::array<Byte, (Size - 1) / 2> ArrayFromHex(const char (&hex_str)[Size])
89+{
90+    // - 1 comes from implicit null terminator
91+    if ((Size - 1) % 2 == 1) throw "2 hex digits per byte is required";

l0rinc commented at 9:44 am on August 14, 2024:

x - 1 is odd means x is even
digits is plural, are required seems better

0    if (Size % 2 == 0) throw "2 hex digits per byte are required";

or

0    if (Size % 2 == 0) throw "2 hex digits required per byte";

hodlinator commented at 5:33 pm on August 14, 2024:

Taken in latest push.

in src/util/strencodings.h:92 in 09458eadc9 outdated

87+template <typename Byte = uint8_t, size_t Size>
88+consteval std::array<Byte, (Size - 1) / 2> ArrayFromHex(const char (&hex_str)[Size])
89+{
90+    // - 1 comes from implicit null terminator
91+    if ((Size - 1) % 2 == 1) throw "2 hex digits per byte is required";
92+    std::array<Byte, (Size - 1) / 2> rv{};

l0rinc commented at 9:58 am on August 14, 2024:

we’ve checked already that Size is always odd, it seems to me that we can safely truncate:

0    std::array<Byte, Size  / 2> rv{};

and

0std::array<Byte, Size / 2>

in the signature

hodlinator commented at 5:34 pm on August 14, 2024:

Taken in latest push, should add co-authorship on next re-touch.

hodlinator commented at 9:30 pm on August 14, 2024:

Added 4 of you in b2fbe40cdf08b56d98e38ce261bfeeaf23b5f795, hopefully you don’t mind the company. :) @maflcko let me know if you prefer a different identifier to make GitHub rendering prettier.

in src/wallet/test/wallet_crypto_tests.cpp:64 in 29090eca42 outdated

60@@ -63,11 +61,11 @@ static void TestEncryptSingle(const CCrypter& crypt, const CKeyingMaterial& vchP
61     TestDecrypt(crypt, vchCiphertext, vchPlaintext2);
62 }
63 
64-static void TestEncrypt(const CCrypter& crypt, const std::vector<unsigned char>& vchPlaintextIn, \
65+static void TestEncrypt(const CCrypter& crypt, std::span<const unsigned char> vchPlaintextIn, \

maflcko commented at 10:07 am on August 14, 2024:

style nit in 29090eca423327353b474615d02ed7c3190e4a50: Can drop the trailing \, since this is a normal function and not a macro.

l0rinc commented at 11:25 am on August 14, 2024:

My original comment was:

Q: what the purpose of the trailing backslash, is it an automatic formatter trick?

But this answers it, thanks

in src/uint256.h:134 in 9dab917088 outdated

127@@ -128,13 +128,13 @@ consteval base_blob<BITS>::base_blob(std::string_view hex_str)
128     // Non-lookup table version of HexDigit().
129     auto from_hex = [](const char c) -> int8_t {
130         if (c >= '0' && c <= '9') return c - '0';
131+        // Only lowercase letters are supported, for consistency.
132         if (c >= 'a' && c <= 'f') return c - 'a' + 0xA;
133-        if (c >= 'A' && c <= 'F') return c - 'A' + 0xA;
134 
135-        assert(false); // Reached if ctor is called with an invalid hex digit.
136+        throw "Called ctor with an invalid hex digit";

maflcko commented at 10:11 am on August 14, 2024:

nit in 9dab917088e83f627786ed5caafec859a3481b78: If you change the behavior, it also needs to adjust the error string. Now it should say: throw "Hex string must only contain lowercase hex chars";.

hodlinator commented at 5:40 pm on August 14, 2024:

Updated with slightly different string in latest push and removed comment about consistency above as well.

0throw "Only lowercase hex digits are allowed, for consistency";

in src/kernel/chainparams.cpp:74 in 6a256b96af outdated

70@@ -71,7 +71,7 @@ static CBlock CreateGenesisBlock(const char* pszTimestamp, const CScript& genesi
71 static CBlock CreateGenesisBlock(uint32_t nTime, uint32_t nNonce, uint32_t nBits, int32_t nVersion, const CAmount& genesisReward)
72 {
73     const char* pszTimestamp = "The Times 03/Jan/2009 Chancellor on brink of second bailout for banks";
74-    const CScript genesisOutputScript = CScript() << ParseHex("04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f") << OP_CHECKSIG;
75+    const CScript genesisOutputScript = CScript() << ToByteVector(ArrayFromHex("04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f")) << OP_CHECKSIG;

l0rinc commented at 10:21 am on August 14, 2024:

Checked manually to make sure this still produces the same CScript as before 👍

hodlinator commented at 12:54 pm on August 14, 2024:

Responding to https://github.com/bitcoin/bitcoin/pull/30377/files#r1716678159:

Thanks, better get this part right. :)

ryanofsky commented at 6:30 pm on August 14, 2024:

In commit “refactor: Hand-replace some ParseHex -> ArrayFromHex + VecFromHex” (8ad7e7d3fdfed4b126859dd83ab81cf8a8ae82a9)

It’s not obvious why an extra ToByteVector conversion is necessary here and why ArrayFromHex is not sufficient. Does CScript not accept std::array, or does it not accept uint8_t? I think this change is ok but it would be good to say in commit message what is happening here and how we could simplify this in the future, assuming it is something we should be able to clean up later.

l0rinc commented at 7:03 pm on August 14, 2024:

Yeah :))

l0rinc commented at 7:07 pm on August 14, 2024:

Seems related to: #30377 (review)

hodlinator commented at 9:16 pm on August 14, 2024:

Now explained in commit message 0a82c18457ec81e911b835b2ac76ad7475384983

maflcko commented at 9:56 am on August 15, 2024:

style nit in 0a82c18457ec81e911b835b2ac76ad7475384983:

I know this has been mentioned before, but performance doesn’t matter here, so using just VecFromHex seems shorter and clearer?

hodlinator commented at 8:47 pm on August 15, 2024:

It’s because part of my goal with this PR was to etch these bytes into the runtime binary. Things that can be done at compile time without too much hassle should be done at compile time IMO.

If we go the Vec(HexLiteral route at least it becomes shorter.

maflcko commented at 5:55 am on August 16, 2024:

It’s because part of my goal with this PR was to etch these bytes into the runtime binary. Things that can be done at compile time without too much hassle should be done at compile time IMO.

If we go the Vec(HexLiteral route at least it becomes shorter.

Makes sense, thanks for looking into it!

in src/test/script_tests.cpp:1394 in 6a256b96af outdated

1389@@ -1383,60 +1390,60 @@ BOOST_AUTO_TEST_CASE(script_FindAndDelete)
1390     BOOST_CHECK_EQUAL(FindAndDelete(s, d), 4);
1391     BOOST_CHECK(s == expect);
1392 
1393-    s = ScriptFromHex("0302ff03"); // PUSH 0x02ff03 onto stack
1394-    d = ScriptFromHex("0302ff03");
1395+    s = ToScript(ArrayFromHex("0302ff03")); // PUSH 0x02ff03 onto stack
1396+    d = ToScript(ArrayFromHex("0302ff03"));

maflcko commented at 10:23 am on August 14, 2024:

style nit in 6a256b96af8c94804538eb9e78964557c032b74f: I think in tests it is more important that they are short, easy to read, and write. Compile time enforcements are neat at best, but shouldn’t be a goal. Generally, the unit tests are single threaded and fully deterministic, so if the test ran once (and passed), it should always pass.

So my preference would be to just call data = *Assert(TryParseHex(str)) inside ScriptFromHex.

This makes the diff smaller and also ensures that the same validation is done for ScriptFromHex that remain untouched in this pull.

hodlinator commented at 2:13 pm on August 14, 2024:

Attempted to minimize the diff in latest push, but went with VecFromHex for string literals and introduced a new ScriptFromHexStr for the 2 runtime validated call sites.

in src/test/base58_tests.cpp:75 in 6a256b96af outdated

71@@ -72,7 +72,7 @@ BOOST_AUTO_TEST_CASE(base58_DecodeBase58)
72     // check that DecodeBase58 skips whitespace, but still fails with unexpected non-whitespace at the end.
73     BOOST_CHECK(!DecodeBase58(" \t\n\v\f\r skip \r\f\v\n\t a", result, 3));
74     BOOST_CHECK( DecodeBase58(" \t\n\v\f\r skip \r\f\v\n\t ", result, 3));
75-    std::vector<unsigned char> expected = ParseHex("971a55");
76+    constexpr std::array<unsigned char, 3> expected{ArrayFromHex("971a55")};

l0rinc commented at 10:27 am on August 14, 2024:

Could we simplify some of these declarations (at least in tests) to e.g.

0    auto expected{ArrayFromHex("971a55")};

hodlinator commented at 1:02 pm on August 14, 2024:

Responding to https://github.com/bitcoin/bitcoin/pull/30377/files#r1716685019: I’ll meet you half-way with constexpr auto. :)

l0rinc commented at 7:03 pm on August 14, 2024:

I’ll take what I can get

in src/kernel/chainparams.cpp:410 in 6a256b96af outdated

406@@ -407,7 +407,7 @@ class SigNetParams : public CChainParams {
407         vSeeds.clear();
408 
409         if (!options.challenge) {
410-            bin = ParseHex("512103ad5e0edad18cb1f0fc0d28a3d4f1f3e445640337489abb10404f2d1e086be430210359ef5021964fe22d6f8e05b2463c9540ce96883fe3b278760f048f5189f2e6c452ae");
411+            bin = ToByteVector(ArrayFromHex("512103ad5e0edad18cb1f0fc0d28a3d4f1f3e445640337489abb10404f2d1e086be430210359ef5021964fe22d6f8e05b2463c9540ce96883fe3b278760f048f5189f2e6c452ae"));

maflcko commented at 10:34 am on August 14, 2024:

ToByteVector seems to be called in a few places, though mostly in tests.

I wonder if it makes sense to just go with https://github.com/hodlinator/bitcoin/blob/2024-07/uint256S_consteval_VecFromHex/src/util/strencodings.h#L95-L110

l0rinc commented at 11:19 am on August 14, 2024:

Or maybe we could change the CScript& operator<< to accept std::span instead:

 0diff --git a/src/script/script.h b/src/script/script.h
 1--- a/src/script/script.h	(revision 8509f5ade3ad7e1f1a727fe027483faaf2f2d4fe)
 2+++ b/src/script/script.h	(date 1723634083423)
 3@@ -463,7 +463,7 @@
 4         return *this;
 5     }
 6 
 7-    CScript& operator<<(const std::vector<unsigned char>& b) LIFETIMEBOUND
 8+    CScript& operator<<(const std::span<const unsigned char>& b) LIFETIMEBOUND
 9     {
10         if (b.size() < OP_PUSHDATA1)
11         {

Edit: these would work for CScript() << OP_RETURN << ArrayFromHex("") only (eagerly merged my comment with @maflcko’s, but it wasn’t a perfect match)

hodlinator commented at 1:01 pm on August 14, 2024:

Responding to https://github.com/bitcoin/bitcoin/pull/30377/files#r1716693711: @maflcko: Might introduce VecFromHex for tests but I like that this constant is fully compile time. @paplorinc: I have been trying out exactly that approach quite a bit, but was warned by maflcko that there may be lying dragons among the different serializations of vector/array/span/Span. Also had some annoying MSVC compiler errors when just trying to add an overload for std::array. So undid my changes to that class.

in src/test/script_tests.cpp:1364 in 6a256b96af outdated

1357@@ -1358,6 +1358,13 @@ static CScript ScriptFromHex(const std::string& str)
1358     return CScript(data.begin(), data.end());
1359 }
1360 
1361+template <size_t N>
1362+static CScript ToScript(const std::array<uint8_t, N>& data)
1363+{
1364+    auto v = ToByteVector(data);

l0rinc commented at 10:36 am on August 14, 2024:

To keep previous behavior fully, we could we add a CheckedParseHex style validation here as well (since ArrayFromHex("") is theoretically accepted, while ScriptFromHex failed for empty inputs).

hodlinator commented at 2:12 pm on August 14, 2024:

ScriptFromHex failed for empty inputs.

How come? ParseHex("") returns 0 as verified by util_tests.cpp. Besides that, I think it’s not really what we are testing in this file.

in src/wallet/crypter.cpp:14 in 29090eca42 outdated

10@@ -11,7 +11,7 @@
11 #include <vector>
12 
13 namespace wallet {
14-int CCrypter::BytesToKeySHA512AES(const std::vector<unsigned char>& chSalt, const SecureString& strKeyData, int count, unsigned char *key,unsigned char *iv) const
15+int CCrypter::BytesToKeySHA512AES(std::span<const unsigned char> chSalt, const SecureString& strKeyData, int count, unsigned char* key, unsigned char* iv) const

maflcko commented at 10:42 am on August 14, 2024:

in 29090eca423327353b474615d02ed7c3190e4a50: When touching all of those lines, I wonder if it makes sense to switch them to std::byte, but that will probably be more involved.

hodlinator commented at 1:12 pm on August 14, 2024:

Responding to https://github.com/bitcoin/bitcoin/pull/30377/files#r1716703571: Tried to start switching to std::byte after your comment, but it quickly propagated to untouched files.

maflcko commented at 10:42 am on August 14, 2024: member

Left some style nits / questions, feel free to ignore.

in src/test/script_tests.cpp:1365 in 6a256b96af outdated

1357@@ -1358,6 +1358,13 @@ static CScript ScriptFromHex(const std::string& str)
1358     return CScript(data.begin(), data.end());
1359 }
1360 
1361+template <size_t N>
1362+static CScript ToScript(const std::array<uint8_t, N>& data)
1363+{
1364+    auto v = ToByteVector(data);
1365+    return CScript(v.begin(), v.end());

l0rinc commented at 10:43 am on August 14, 2024:

Do we even need the vector conversion here?

0    return CScript(data.begin(), data.end());

hodlinator commented at 1:04 pm on August 14, 2024:

Clang and GCC don’t, but MSVC does. :(

Removed this function in latest push.

in src/test/script_tests.cpp:1389 in 6a256b96af outdated

1389@@ -1383,60 +1390,60 @@ BOOST_AUTO_TEST_CASE(script_FindAndDelete)
1390     BOOST_CHECK_EQUAL(FindAndDelete(s, d), 4);
1391     BOOST_CHECK(s == expect);
1392 
1393-    s = ScriptFromHex("0302ff03"); // PUSH 0x02ff03 onto stack
1394-    d = ScriptFromHex("0302ff03");

l0rinc commented at 10:47 am on August 14, 2024:

Checked that ScriptFromHex is only used for non-static values now 👍

Though I’m not sure ToScript(ArrayFromHex("0302ff03")) is more readable (or performant) than ScriptFromHex("0302ff03".

in src/pubkey.h:259 in 240bc10e3f outdated

253@@ -254,7 +254,11 @@ class XOnlyPubKey
254     bool IsNull() const { return m_keydata.IsNull(); }
255 
256     /** Construct an x-only pubkey from exactly 32 bytes. */
257-    explicit XOnlyPubKey(Span<const unsigned char> bytes);
258+    constexpr explicit XOnlyPubKey(Span<const unsigned char> bytes)
259+    {
260+        assert(bytes.size() == 32);

stickies-v commented at 11:06 am on August 14, 2024:

nit: while touching, might be worth updating to

0        assert(bytes.size() == m_keydata.size());

hodlinator commented at 5:31 pm on August 14, 2024:

Taken in latest push.

hodlinator commented at 9:37 pm on August 14, 2024:

Further changed to just:

0constexpr explicit XOnlyPubKey(Span<const unsigned char> bytes) : m_keydata{bytes} {}

Since base_blob(Span<const unsigned char>) does the exact length assert internally.

in src/uint256.h:131 in 09458eadc9 outdated

137     if (hex_str.length() != m_data.size() * 2) throw "Hex string must fit exactly";
138     auto str_it = hex_str.rbegin();
139     for (auto& elem : m_data) {
140-        auto lo = from_hex(*(str_it++));
141-        elem = (from_hex(*(str_it++)) << 4) | lo;
142+        auto lo = ConstevalHexDigit(*(str_it++));

stickies-v commented at 11:23 am on August 14, 2024:

I’m unable to compile 09458eadc9a4484ba37a70d1b378ed3f3c9e31d0 using:

0% clang --version
1Homebrew clang version 17.0.6
2Target: arm64-apple-darwin23.3.0
3Thread model: posix
4InstalledDir: /opt/homebrew/opt/llvm/bin

 0git clean -xdf && git checkout 09458eadc9a4484ba37a70d1b378ed3f3c9e31d0 && ./autogen.sh && ./configure --without-gui && make -j 7
 1
 2./uint256.h:131:19: error: call to consteval function 'ConstevalHexDigit' is not a constant expression
 3        auto lo = ConstevalHexDigit(*(str_it++));
 4                  ^
 5./uint256.h:173:60: note: in instantiation of member function 'base_blob<256>::base_blob' requested here
 6    consteval explicit uint256(std::string_view hex_str) : base_blob<256>(hex_str) {}
 7                                                           ^
 8/Library/Developer/CommandLineTools/SDKs/MacOSX14.0.sdk/usr/include/c++/v1/__iterator/reverse_iterator.h:168:56: note: read of non-constexpr variable 'str_it' is not allowed in a constant expression
 9    reverse_iterator operator++(int) {reverse_iterator __tmp(*this); --current; return __tmp;}
10                                                       ^
11/Library/Developer/CommandLineTools/SDKs/MacOSX14.0.sdk/usr/include/c++/v1/__iterator/reverse_iterator.h:168:56: note: in call to 'reverse_iterator(str_it)'
12./uint256.h:131:39: note: in call to '&str_it->operator++(0)'
13        auto lo = ConstevalHexDigit(*(str_it++));
14                                      ^
15./uint256.h:129:10: note: declared here
16    auto str_it = hex_str.rbegin();

maflcko commented at 12:26 pm on August 14, 2024:

What is your macOS and XCode version? IIRC Xcode 14 is no longer supported, see also #29934, but I think this isn’t documented, nor tested, so probably up for debate.

The CI passes, because it is using Xcode 15, see d742be3d3f5d5063d7160f72422bce2fec953f38

l0rinc commented at 12:30 pm on August 14, 2024:

It was working for me with:

0 % clang --version
1Homebrew clang version 18.1.8
2Target: arm64-apple-darwin23.5.0
3Thread model: posix
4InstalledDir: /opt/homebrew/opt/llvm/bin
5
6 % xcodebuild -version
7Xcode 15.4
8Build version 15F31d

hodlinator commented at 1:10 pm on August 14, 2024:

Responding to https://github.com/bitcoin/bitcoin/pull/30377/files#r1716752262:

Weird, actually works for me on same clang version as @stickies-v, but under Linux:

0$ clang --version
1clang version 17.0.6
2Target: x86_64-unknown-linux-gnu

Maybe it’s the standard library that is somehow different (missing some constexpr)?

stickies-v commented at 1:16 pm on August 14, 2024:

What is your macOS and XCode version? IIRC Xcode 14 is no longer supported, see also #29934, but I think this isn’t documented, nor tested, so probably up for debate.

Thanks, I’m on macOS 14.3.1 and just bumped XCode from 14.3.1.0.1.1683849156 to 15.3.0.0.1.1708646388 which resolves the issue 👍

maflcko commented at 1:43 pm on August 14, 2024:

So I guess this commit drops support for XCode on macOS Ventura 13. Not sure if doc/build-osx.md needs to be adjusted, similar to https://github.com/bitcoin/bitcoin/pull/29934/files

hodlinator commented at 9:49 pm on August 14, 2024:

So I guess this commit drops support for XCode on macOS Ventura 13. Not sure if doc/build-osx.md needs to be adjusted, similar to https://github.com/bitcoin/bitcoin/pull/29934/files

According to my earlier (pending and therefore out of order) message, #30377 (review), the clang version is not the issue. So simply changing to llvm@18 in the docs might not be correct.

What is your clang version after the XCode upgrade @stickies-v?

maflcko commented at 8:27 am on August 15, 2024:

Correct, the issue is not the clang version, but the stdlib version. MacOSX14.0.sdk does seem to have an issue in the iterators implementation. The minimum clang (and libc++) version is documented in doc/dependencies.md to be clang-16.

However, Apple somehow ships a completely separately versioned clang and stdlib with Xcode. Apples Xcode 14 stdlib seems to be no longer supported after this commit.

Given that Xcode 15 dropped support for macOS Ventura 13, made me leave the previous comment #30377 (review) about possibly extending the commit to mention that.

Again, I don’t use Apple, so I don’t care, but I wanted to mention it.

l0rinc commented at 11:27 am on August 14, 2024: contributor

I see this has been open for some time, if I missed an old comment that is still relevant, let me know.

I would prefer some simplifications, since the code becase slightly more complex and I think there are a few simple fixes for that.

stickies-v commented at 11:49 am on August 14, 2024: contributor

Concept ACK for more compile time validation.

~~Strangely enough, 09458eadc9a4484ba37a70d1b378ed3f3c9e31d0 doesn’t compile for me even though CI seems fine.~~ edit: fixed by bumping to XCode 15

in src/wallet/crypter.cpp:90 in 240bc10e3f outdated

86@@ -87,7 +87,7 @@ bool CCrypter::Encrypt(const CKeyingMaterial& vchPlaintext, std::vector<unsigned
87     return true;
88 }
89 
90-bool CCrypter::Decrypt(const std::vector<unsigned char>& vchCiphertext, CKeyingMaterial& vchPlaintext) const
91+bool CCrypter::Decrypt(std::span<const unsigned char> vchCiphertext, CKeyingMaterial& vchPlaintext) const

stickies-v commented at 1:50 pm on August 14, 2024:

nit: vch var name prefixes are annoying now that they’re not vectors anymore. It’ll increase the diff but I still think it might be worth modernizing the names here to avoid that inconsistency? (here + a bunch of other instances)

hodlinator commented at 9:23 pm on August 14, 2024:

Done, slightly, vchCiphertext -> chCiphertext (already some precedence for ch-prefix). Also changed vchSalt -> salt in some places.

in src/util/strencodings.h:73 in 240bc10e3f outdated

67@@ -67,6 +68,36 @@ std::vector<Byte> ParseHex(std::string_view hex_str)
68 {
69     return TryParseHex<Byte>(hex_str).value_or(std::vector<Byte>{});
70 }
71+
72+// Non lookup table version of HexDigit().
73+consteval uint8_t ConstevalHexDigit(const char c)

stickies-v commented at 2:52 pm on August 14, 2024:

Testing consteval functions is quite constrained, but I think it still might be useful to add some happy path test cases to util_tests.cpp?

0BOOST_AUTO_TEST_CASE(consteval_hex_digit)
1{
2    BOOST_CHECK_EQUAL(ConstevalHexDigit('0'), 0);
3    BOOST_CHECK_EQUAL(ConstevalHexDigit('9'), 9);
4    BOOST_CHECK_EQUAL(ConstevalHexDigit('a'), 0xa);
5    BOOST_CHECK_EQUAL(ConstevalHexDigit('f'), 0xf);
6}

hodlinator commented at 9:17 pm on August 14, 2024:

Now part of b2fbe40cdf08b56d98e38ce261bfeeaf23b5f795

in src/util/strencodings.h:123 in 240bc10e3f outdated

88+consteval std::array<Byte, (Size - 1) / 2> ArrayFromHex(const char (&hex_str)[Size])
89+{
90+    // - 1 comes from implicit null terminator
91+    if ((Size - 1) % 2 == 1) throw "2 hex digits per byte is required";
92+    std::array<Byte, (Size - 1) / 2> rv{};
93+    size_t it = 0;

stickies-v commented at 3:46 pm on August 14, 2024:

nit: calling this non-iterator i would be more clear

hodlinator commented at 9:21 pm on August 14, 2024:

Done.

hodlinator force-pushed on Aug 14, 2024

in src/util/strencodings.h:117 in 65bc6dcd57 outdated

113+/** Converts from hex string literal to byte array at compile time.
114+ * @warning Unlike VecFromHex which returns a vector, the returned array may use
115+ *          significant stack space if called inside a function.
116+ */
117+template <typename Byte = uint8_t, size_t Size>
118+consteval std::array<Byte, Size / 2> ArrayFromHex(const char (&hex_str)[Size])

stickies-v commented at 5:38 pm on August 14, 2024:

nit: The size requirement could also be expressed as a requires clause:

0consteval std::array<Byte, (Size - 1) / 2> ArrayFromHex(const char (&hex_str)[Size])
1    requires (Size % 2 == 1)

It provides more helpful information on why the constraints aren’t met, e.g.:

0net_processing.cpp:3934:39: error: no matching function for call to 'ArrayFromHex'
1            constexpr auto finalAlert{ArrayFromHex("60010000000000000000000000ffffff7f00000000ffffff7ffeffff7f01ffffff7f00000000ffffff7f00ffffff7f002f555247454e543a20416c657274206b657920636f6d70726f6d697365642c2075706772616465207265717569726564004630440220653febd6410f470f6bae11cad19c48413becb1ac2c17f908fd0fd53bdc3abd5202206d0e9c96fe88d4a0f01ed9dedae2b6f9e00da94cad0fecaae66ecf689bf71b50")};
2                                      ^~~~~~~~~~~~
3./util/strencodings.h:88:44: note: candidate template ignored: constraints not satisfied [with Byte = unsigned char, Size = 337]
4consteval std::array<Byte, (Size - 1) / 2> ArrayFromHex(const char (&hex_str)[Size])    
5                                           ^
6./util/strencodings.h:89:15: note: because '337UL % 2 == 0' (1 == 0) evaluated to false
7    requires (Size % 2 == 0)

l0rinc commented at 7:20 pm on August 14, 2024:

This hex parsing duplication is a bit painful, will try to come up with a deduplicated alternative tomorrow

hodlinator commented at 9:22 pm on August 14, 2024:

Done.

in src/test/util_tests.cpp:204 in c6e1d5bff4 outdated

185-    result = ParseHex("");
186-    BOOST_CHECK(result.size() == 0);
187-    result = TryParseHex<uint8_t>("").value();
188-    BOOST_CHECK(result.size() == 0);
189+    BOOST_CHECK_EQUAL(ParseHex("").size(), 0);
190+    BOOST_CHECK_EQUAL(TryParseHex<uint8_t>("").value().size(), 0);

ryanofsky commented at 6:00 pm on August 14, 2024:

In commit “test refactor: util_tests - Use BOOST_CHECK_EQUAL_COLLECTIONS()” (c6e1d5bff44b7d382a5a91b6722743b5fe07a95d)

Might be worth mentioning this change in commit message or saying it has other test cleanups.

hodlinator commented at 6:57 pm on August 14, 2024:

In the PR summary you mean?

ryanofsky commented at 8:06 pm on August 14, 2024:

re: #30377 (review)

In the PR summary you mean?

I do actually mean the commit message. The current commit message ddd06a0ec0a3b2c1d128fe5986a1363e7cf8e365 makes it sounds like it only switching to BOOST_CHECK_EQUAL_COLLECTIONS, but in reality is doing that and also making a seemingly unrelated test cleanup. Would suggest mentioning the other cleanup in the commit message, so for example, if someone is debugging something and wants to see if this commit is relevant, they can judge that based on the commit message.

in src/util/strencodings.h:121 in b997d266e5 outdated

116+ */
117+template <typename Byte = uint8_t, size_t Size>
118+consteval std::array<Byte, Size / 2> ArrayFromHex(const char (&hex_str)[Size])
119+{
120+    // Size should be uneven due to implicit null terminator.
121+    if (Size % 2 == 0) throw "2 hex digits required per byte";

ryanofsky commented at 6:19 pm on August 14, 2024:

In commit “refactor: Add consteval ArrayFromHex and VecFromHex” (b997d266e549ad6745d01a696cd9fbf1d3525944)

It seems like this is assuming the last character in the string is \0 and then ignoring it? I guess that is good because it makes this convenient to call, but in theory it means somebody could call this with a character array that is does not end in \0 and then the last character of the string would be ignored.

Would suggest changing this to:

0size_t size{hex_str[Size-1] == '\0' ? Size-1 : Size};
1if (size % 2 != 0) throw "2 hex digits required per byte";

to work correctly in either case and actually verify that the last character is \0 if an odd-size array is passed.

hodlinator commented at 7:01 pm on August 14, 2024:

Changing it to a requires-clause as per https://github.com/bitcoin/bitcoin/pull/30377/files#r1717324499 with additional check for null terminator in the function body. Don’t think someone would do:

0constexpr char my_hex[] = {'f', 'f'};
1constexpr std::array<uint8_t, 1> arr = ArrayFromHex(my_hex);

when they could just do:

0constexpr std::array<uint8_t, 1> arr = {0xff};

l0rinc commented at 7:06 pm on August 14, 2024:

Do we really need to support both?

ryanofsky commented at 8:08 pm on August 14, 2024:

re: #30377 (review)

Do we really need to support both?

It seems easier to support both than to add an error message explaining that the last byte has to be null, when there is no particular reason it needs to be null. But mainly I didn’t think the last byte should be silently ignored, so at least that problem is fixed now.

hodlinator commented at 12:10 pm on August 16, 2024:

Continued in #30377 (review)

Resolving this one for now.

ryanofsky approved

ryanofsky commented at 6:41 pm on August 14, 2024: contributor

Code review ACK 65bc6dcd573fe74d2ce175466c30ed830d17f0fc. Looks good, thanks for working on this, and I think this will avoid the need to add the ParseHex suppression in #30415. Left a few comments but nothing critical.

DrahtBot requested review from stickies-v on Aug 14, 2024

DrahtBot requested review from maflcko on Aug 14, 2024

stickies-v commented at 6:53 pm on August 14, 2024: contributor

Approach ACK

in src/util/strencodings.h:84 in 65bc6dcd57 outdated

79+}
80+
81+struct ConstevalHexLiteral {
82+    const std::string_view inner;
83+    template <size_t N>
84+    consteval ConstevalHexLiteral(const char (&hex_str)[N]) : inner{hex_str} { Validate(inner); }

stickies-v commented at 7:12 pm on August 14, 2024:

Since it’s a consteval ctor, I think taking a const char* parameter should behave the same but avoid having to instantiate for each differently sized hex_str input? But I’m still easily confused by how consteval behaves.

0   consteval ConstevalHexLiteral(const char* hex_str) : inner{hex_str} { Validate(inner); }

hodlinator commented at 7:33 pm on August 14, 2024:

Clang unfortunately gives me:

candidate function template not viable: no known conversion from 'const char[65]' to 'ConstevalHexLiteral' for 1st argument

stickies-v commented at 10:28 am on August 15, 2024:

That’s strange, I don’t see how a string literal would need a conversion when there’s a const char* ctor? It seems to work fine for ConstevalStringLiteral too, so pardon my insistence but I think perhaps this compiler error might be because of another issue? Would you mind trying this diff on 734ac5a9002493013c0f8afe763f751ac99f89c8?

 0diff --git a/src/util/strencodings.h b/src/util/strencodings.h
 1index fbd116c77c..63f46396d3 100644
 2--- a/src/util/strencodings.h
 3+++ b/src/util/strencodings.h
 4@@ -80,8 +80,7 @@ consteval uint8_t ConstevalHexDigit(const char c)
 5 
 6 struct ConstevalHexLiteral {
 7     const std::string_view inner;
 8-    template <size_t N>
 9-    consteval ConstevalHexLiteral(const char (&hex_str)[N]) : ConstevalHexLiteral{std::string_view{hex_str}} {}
10+    consteval ConstevalHexLiteral(const char* hex_str) : ConstevalHexLiteral{std::string_view{hex_str}} {}
11     consteval ConstevalHexLiteral(const std::string_view hex_str) : inner{hex_str} { Validate(inner); }
12     consteval ConstevalHexLiteral(std::nullptr_t) = delete;
13

hodlinator commented at 10:25 pm on August 15, 2024:

Thank you for being persistent! Must have had other local changes interacting badly with it when I tried it. Got it working with only your diff, so if we end up resurrecting ConstevalHexLiteral I’ll go with your version.

in src/kernel/chainparams.cpp:350 in 65bc6dcd57 outdated

346@@ -347,7 +347,7 @@ class CTestNet4Params : public CChainParams {
347         m_assumed_chain_state_size = 0;
348 
349         const char* testnet4_genesis_msg = "03/May/2024 000000000000000000001ebd58c244970b3aa9d783bb001011fbe8ea8e98e00e";
350-        const CScript testnet4_genesis_script = CScript() << ParseHex("000000000000000000000000000000000000000000000000000000000000000000") << OP_CHECKSIG;
351+        const CScript testnet4_genesis_script = CScript() << ToByteVector(ArrayFromHex("000000000000000000000000000000000000000000000000000000000000000000")) << OP_CHECKSIG;

l0rinc commented at 7:16 pm on August 14, 2024:

leftover?

0        const CScript testnet4_genesis_script = CScript() << VecFromHex("000000000000000000000000000000000000000000000000000000000000000000") << OP_CHECKSIG;

hodlinator commented at 9:15 pm on August 14, 2024:

Now explained in commit message 0a82c18457ec81e911b835b2ac76ad7475384983 as suggested in #30377 (review).

in src/util/strencodings.h:99 in 65bc6dcd57 outdated

 95+
 96+/** Validates input at compile time, but allocates the vector at runtime in
 97+ *  order to cross the compile time/runtime dynamic memory barrier.
 98+ */
 99+template <typename Byte = uint8_t>
100+std::vector<Byte> VecFromHex(ConstevalHexLiteral hex_str)

l0rinc commented at 7:22 pm on August 14, 2024:

Nit: “vec” vs “array” (abbreviation vs full) -> can we either name them “VecFromHex” and “ArrFromHex” or rather “VectorFromHex” and “ArrayFromHex” (or even shorter: “HexToVector” and “HexToArray”?

hodlinator commented at 7:37 pm on August 14, 2024:

I think *FromHex jives well with uint256::FromHex().. only thing is the latter stores the bytes in reverse order. :face_in_clouds:

Could go for VectorFromHex, but there is already some weak precedence in the type VecDeque.

in src/util/strencodings.h:90 in 65bc6dcd57 outdated

85+    consteval ConstevalHexLiteral(const std::string_view hex_str) : inner{hex_str} { Validate(inner); }
86+    consteval ConstevalHexLiteral(std::nullptr_t) = delete;
87+
88+    static consteval void Validate(const std::string_view hex_str)
89+    {
90+        std::ranges::for_each(hex_str, ConstevalHexDigit);

l0rinc commented at 7:24 pm on August 14, 2024:

super-nit (feel free to ignore): we might want to do the content checking after the size check, it’s cheaper

hodlinator commented at 9:33 pm on August 14, 2024:

Done in latest.

in src/util/strencodings.h:90 in 65bc6dcd57 outdated

87+
88+    static consteval void Validate(const std::string_view hex_str)
89+    {
90+        std::ranges::for_each(hex_str, ConstevalHexDigit);
91+
92+        if (hex_str.size() % 2 == 1) throw "2 hex digits required per byte";

l0rinc commented at 7:25 pm on August 14, 2024:

super-nit:

0        if (hex_str.size() % 2) throw "2 hex digits required per byte";

hodlinator commented at 9:19 pm on August 14, 2024:

I’m burned by old MSVC warnings (probably not current) “implicit conversion to bool” so I’d rather not.

l0rinc commented at 7:06 am on August 15, 2024:

In other cases we’re adding and subtracting booleans, but I’m fine with both, see: https://github.com/bitcoin/bitcoin/pull/30535/files#diff-09e6cf871236bf03d32cca9405837d9b7927690b2296a2de17c9be6ea0e75959R74

hodlinator commented at 7:21 am on August 15, 2024:

That’s some next level terseness. Okay, will change if I re-touch.

in src/util/strencodings.h:120 in 65bc6dcd57 outdated

115+ *          significant stack space if called inside a function.
116+ */
117+template <typename Byte = uint8_t, size_t Size>
118+consteval std::array<Byte, Size / 2> ArrayFromHex(const char (&hex_str)[Size])
119+{
120+    // Size should be uneven due to implicit null terminator.

l0rinc commented at 7:27 pm on August 14, 2024:

0    // Size should be odd due to implicit null terminator.

in src/util/strencodings.h:114 in b997d266e5 outdated

110+    return rv;
111+}
112+
113+/** Converts from hex string literal to byte array at compile time.
114+ * @warning Unlike VecFromHex which returns a vector, the returned array may use
115+ *          significant stack space if called inside a function.

ryanofsky commented at 8:17 pm on August 14, 2024:

re: #30377 (review)

On second thought, while this warning message is more accurate, I don’t think it is clear, because it is implying that return value will use stack space, when we should not expect that to be the case normally. Would suggest:

It may be preferable to call VecFromHex instead of this function to save stack space, or to declare the returned constexpr to avoid using stack space, if returned array is large and being used to initialize a local variable.

hodlinator force-pushed on Aug 14, 2024

in src/util/strencodings.h:84 in b2fbe40cdf outdated

80+
81+struct ConstevalHexLiteral {
82+    const std::string_view inner;
83+    template <size_t N>
84+    consteval ConstevalHexLiteral(const char (&hex_str)[N]) : ConstevalHexLiteral{std::string_view{hex_str}} {}
85+    consteval ConstevalHexLiteral(const std::string_view hex_str) : inner{hex_str} { Validate(inner); }

ryanofsky commented at 2:59 am on August 15, 2024:

In commit “refactor: Add consteval ArrayFromHex and VecFromHex” (b2fbe40cdf08b56d98e38ce261bfeeaf23b5f795)

I can’t see a benefit to defining a string_view constructor here and depending on string_view when this class is only supposed to accept null terminated string literals. This class could be simplified by dropping this constructor, dropping the string_view member and just declaring a const char* member.

hodlinator commented at 9:05 pm on August 15, 2024:

The intention was that the string_view length which is used at runtime would have a higher chance of being calculated at compile time.

in src/util/strencodings.h:84 in b2fbe40cdf outdated

79+}
80+
81+struct ConstevalHexLiteral {
82+    const std::string_view inner;
83+    template <size_t N>
84+    consteval ConstevalHexLiteral(const char (&hex_str)[N]) : ConstevalHexLiteral{std::string_view{hex_str}} {}

ryanofsky commented at 3:04 am on August 15, 2024:

In commit “refactor: Add consteval ArrayFromHex and VecFromHex” (b2fbe40cdf08b56d98e38ce261bfeeaf23b5f795)

This will do a weird thing when passed a non-null terminated string and give an error about an out of bounds array access (“error: array subscript value ‘2’ is outside the bounds of array ‘hex’ of type ‘const char [2]’”) for:

0constexpr char hex[] = {'a', 'b'};
1VecFromHex(hex);

This could be simplified and made less confusing by not requiring character arrays to be null terminated.

hodlinator commented at 9:01 pm on August 15, 2024:

As mentioned in #30377 (review), who would do that?

Someone could theoretically do

0constexpr char hex[] = {'a', 'b', '\0', '\0', '\0'};
1VecFromHex(hex);

and have your version fail to compile.

in src/util/strencodings.h:81 in b2fbe40cdf outdated

76+    if (c >= 'a' && c <= 'f') return c - 'a' + 0xa;
77+
78+    throw "Only lowercase hex digits are allowed, for consistency";
79+}
80+
81+struct ConstevalHexLiteral {

ryanofsky commented at 3:17 am on August 15, 2024:

In commit “refactor: Add consteval ArrayFromHex and VecFromHex” (b2fbe40cdf08b56d98e38ce261bfeeaf23b5f795)

I don’t think it makes sense to have a class that check a hex string at compile time, store the literal hex string in the binary, and then parse the hex string again. It makes the implementation unnecessarily complicated because code loops over the hex string ConstevalHexLiteral 3 different places instead of 1, duplicates checks like throw "2 hex digits required per byte" twice, and defines an extra class to implement an implicit conversion all for the benefit of storing hex instead of binary data in the compiled code and doing unnecessary parsing at runtime. It don’t think there is a justification for this and would recommend going for the alternate approach suggested above using util::HexLiteral and util::Vec functions.

in src/util/strencodings.h:108 in b2fbe40cdf outdated

103+    // since ConstevalHexLiteral has already validated.
104+    std::vector<Byte> rv(hex_str.inner.size() / 2);
105+    auto it = hex_str.inner.begin();
106+    for (auto& elem : rv) {
107+        auto hi = HexDigit(*(it++)) << 4;
108+        elem = Byte(hi | HexDigit(*(it++)));

ryanofsky commented at 3:19 am on August 15, 2024:

In commit “refactor: Add consteval ArrayFromHex and VecFromHex” (b2fbe40cdf08b56d98e38ce261bfeeaf23b5f795)

I think it would be safer and more explicit to use static_cast<Byte>(...) instead of functional cast Byte(...). Not actually sure about this particular situation, but in general there are unsafe conversions that C casts allow which static_cast does not allow.

hodlinator commented at 9:07 pm on August 15, 2024:

Thanks for finding this! Remains from battling the compiler.

in src/wallet/test/wallet_crypto_tests.cpp:46 in 9c4aad3fe4 outdated

48+static void TestDecrypt(const CCrypter& crypt, std::span<const unsigned char> chCiphertext, \
49                         const std::vector<unsigned char>& vchPlaintext = std::vector<unsigned char>())
50 {
51     CKeyingMaterial vchDecrypted;
52-    crypt.Decrypt(vchCiphertext, vchDecrypted);
53+    crypt.Decrypt(chCiphertext, vchDecrypted);

maflcko commented at 8:48 am on August 15, 2024:

style nit in 9c4aad3fe4cb74bb1d9afeece78e2dd6ae1ad08e: When renaming this, I’d suggest to use the “fully” correct naming style ciphertext (snake_case without type prefix), according to the style guide.

(An alternative would be to leave it completely untouched)

stickies-v commented at 10:29 am on August 15, 2024:

I agree. Would either use the fully correct naming style, or not do the rename at all.

hodlinator commented at 9:12 pm on August 15, 2024:

Good suggestion, seems to work out well.

in src/util/strencodings.h:98 in b2fbe40cdf outdated

94+};
95+
96+/** Validates input at compile time, but allocates the vector at runtime in
97+ *  order to cross the compile time/runtime dynamic memory barrier.
98+ */
99+template <typename Byte = uint8_t>

maflcko commented at 9:32 am on August 15, 2024:

style nit in https://github.com/bitcoin/bitcoin/commit/b2fbe40cdf08b56d98e38ce261bfeeaf23b5f795:

I wonder if it makes sense to pick the default as Byte = std::byte now. Obviously the diff will be larger, but longer term it seems cleaner to converge to std::byte, so having that as default will make the future easier, no?

hodlinator commented at 9:10 pm on August 15, 2024:

See https://github.com/bitcoin/bitcoin/pull/30377/files#r1716903532

in src/net_processing.cpp:3935 in 0a82c18457 outdated

3930@@ -3931,7 +3931,7 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
3931 
3932         // If the peer is old enough to have the old alert system, send it the final alert.
3933         if (greatest_common_version <= 70012) {
3934-            const auto finalAlert{ParseHex("60010000000000000000000000ffffff7f00000000ffffff7ffeffff7f01ffffff7f00000000ffffff7f00ffffff7f002f555247454e543a20416c657274206b657920636f6d70726f6d697365642c2075706772616465207265717569726564004630440220653febd6410f470f6bae11cad19c48413becb1ac2c17f908fd0fd53bdc3abd5202206d0e9c96fe88d4a0f01ed9dedae2b6f9e00da94cad0fecaae66ecf689bf71b50")};
3935+            constexpr auto finalAlert{ArrayFromHex("60010000000000000000000000ffffff7f00000000ffffff7ffeffff7f01ffffff7f00000000ffffff7f00ffffff7f002f555247454e543a20416c657274206b657920636f6d70726f6d697365642c2075706772616465207265717569726564004630440220653febd6410f470f6bae11cad19c48413becb1ac2c17f908fd0fd53bdc3abd5202206d0e9c96fe88d4a0f01ed9dedae2b6f9e00da94cad0fecaae66ecf689bf71b50")};
3936             MakeAndPushMessage(pfrom, "alert", Span{finalAlert});

maflcko commented at 9:57 am on August 15, 2024:

style nit in https://github.com/bitcoin/bitcoin/commit/0a82c18457ec81e911b835b2ac76ad7475384983:

Span and array serialization is the same, so in theory you could drop this call to Span.

hodlinator commented at 8:51 pm on August 15, 2024:

I got lost around SerializeMany :), but agree, serialize.h does the std::array -> Span conversion internally.

in src/test/script_tests.cpp:1369 in 0a82c18457 outdated

1361 
1362+static CScript ScriptFromHex(ConstevalHexLiteral str)
1363+{
1364+    std::vector<unsigned char> data = VecFromHex(str);
1365+    return CScript(data.begin(), data.end());
1366+}

maflcko commented at 10:01 am on August 15, 2024:

style nit in https://github.com/bitcoin/bitcoin/commit/0a82c18457ec81e911b835b2ac76ad7475384983:

Still not sure about this. The benefit seems limited to offer two functions that effectively do the same in tests. It is just extra stuff for test writers (and readers) to keep in mind, when reviewing or writing new tests. Just having one function (the already existing one) seems easier.

The benefit of compile-time enforcement in tests seems limited, because running this test is way faster than compiling it.

in src/test/util_tests.cpp:151 in b2fbe40cdf outdated

147@@ -148,10 +148,14 @@ BOOST_AUTO_TEST_CASE(parse_hex)
148     std::vector<unsigned char> result;
149     std::vector<unsigned char> expected(std::begin(ParseHex_expected), std::end(ParseHex_expected));
150     // Basic test vector
151+    result = ToByteVector(ArrayFromHex("04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f"));

l0rinc commented at 10:05 am on August 15, 2024:

nit: this is closely tied to ParseHex_expected, if you modify again, consider extracting all 5 usages next to it.

hodlinator commented at 2:01 pm on August 15, 2024:

To be clear, you are suggesting extracting the repeated string literal “04678afdb0fe5548271967f1a67130b7105c…” into a constexpr variable?

maflcko commented at 10:06 am on August 15, 2024: member

Left some style nits, feel free to ignore.

in src/test/key_tests.cpp:147 in 734ac5a900 outdated

142@@ -143,19 +143,28 @@ BOOST_AUTO_TEST_CASE(key_test1)
143     BOOST_CHECK(key1.Sign(hashMsg, detsig));
144     BOOST_CHECK(key1C.Sign(hashMsg, detsigc));
145     BOOST_CHECK(detsig == detsigc);
146-    BOOST_CHECK(detsig == ParseHex("304402205dbbddda71772d95ce91cd2d14b592cfbc1dd0aabd6a394b6c2d377bbe59d31d022014ddda21494a4e221f0824f0b8b924c43fa43c0ad57dccdaa11f81a6bd4582f6"));
147+    constexpr auto expected_sig1{ArrayFromHex("304402205dbbddda71772d95ce91cd2d14b592cfbc1dd0aabd6a394b6c2d377bbe59d31d022014ddda21494a4e221f0824f0b8b924c43fa43c0ad57dccdaa11f81a6bd4582f6")};
148+    BOOST_CHECK_EQUAL_COLLECTIONS(detsig.begin(), detsig.end(), expected_sig1.begin(), expected_sig1.end());

l0rinc commented at 10:33 am on August 15, 2024:

This was quite simple before, now it might be faster by a few nanos, but it’s less readable. Given that HexStr was already tested properly, could we maybe go the other way and convert detsig to hex string and compare against the hard coded value, i.e.

0    BOOST_CHECK_EQUAL(HexStr(detsig), "304402205dbbddda71772d95ce91cd2d14b592cfbc1dd0aabd6a394b6c2d377bbe59d31d022014ddda21494a4e221f0824f0b8b924c43fa43c0ad57dccdaa11f81a6bd4582f6");

hodlinator commented at 8:53 pm on August 15, 2024:

Thank you!

in src/util/strencodings.h:121 in 734ac5a900 outdated

117+template <typename Byte = uint8_t, size_t Size>
118+consteval std::array<Byte, Size / 2> ArrayFromHex(const char (&hex_str)[Size])
119+    // 2 hex digits required per byte + implicit null terminator
120+    requires (Size % 2 == 1)
121+{
122+    if (hex_str[Size - 1] != '\0') throw "null terminator required";

l0rinc commented at 10:47 am on August 15, 2024:

If we insist that this is an important check, it seems a bit jumpy to do the check at the beginning, after which we iterate over the elements and arrive exactly at that element - and just silently skip it. Alternatively, we could check for \0 when we get there, after we’ve processed the rest of the chars, i.e.

0    assert(!hex_str[i]);
1    return rv;

hodlinator commented at 9:09 pm on August 15, 2024:

I see what you mean and tried it out, but ended up preferring input validation at the top of the function.

in src/bench/bech32.cpp:16 in 734ac5a900 outdated

12@@ -13,9 +13,9 @@
13 
14 static void Bech32Encode(benchmark::Bench& bench)
15 {
16-    std::vector<uint8_t> v = ParseHex("c97f5a67ec381b760aeaf67573bc164845ff39a3bb26a1cee401ac67243b48db");
17+    constexpr std::array<uint8_t, 32> v{ArrayFromHex("c97f5a67ec381b760aeaf67573bc164845ff39a3bb26a1cee401ac67243b48db")};

l0rinc commented at 10:48 am on August 15, 2024:

btw, this is in the preprocessing phase of the benchmark, the performance doesn’t matter here at all - only inside the .run( part, so I’d go with the cleanest code here, too (if this is it, just resolve the comment).

hodlinator commented at 8:40 pm on August 15, 2024:

Yes, this change isn’t to improve benchmark performance.

IMO the changed version is cleaner - the number of bytes to be converted is explicitly enforced in the type and can be connected to the calculation below.

l0rinc commented at 10:51 am on August 15, 2024: contributor

We’re getting closer, I still hope we don’t have to sacrifice some readability.

ryanofsky commented at 12:46 pm on August 15, 2024: contributor

Code review 734ac5a9002493013c0f8afe763f751ac99f89c8

After experimenting with simplifying this. I don’t think I like this approach anymore. I think adding the ConstevalHexLiteral class is basically going out of the way to make the code less efficient and more confusing, because instead of having a clear separation between runtime and compile time functions, we are duplicating code at runtime and compile time to provide a hybrid function that checks hex strings at compile time and then stores the unparsed hex strings in the binary to be parsed later at runtime, instead of just doing the obvious thing and storing bytes so nothing needs to be parsed at runtime.

I implemented a simpler approach in https://github.com/ryanofsky/bitcoin/commits/pr/hex providing straightforward util::HexLiteral and util::Vec functions that I think would be good to adopt here. The changed commits are;

b4b923565b4adaa5e3bcb22a6bc03f1f7ac4cdde util: Add util::HexLiteral and util::Vec functions
8ea5ece5a05967e02338f4392103513dbdb5c3f7 refactor: Hand-replace some ParseHex -> HexLiteral
c1409f4df5489df211861ce9cba1922c7a20e744 refactor: add util::HexLiteral and util::Vec using statements
4a20aa89ffc82077c941493b61b8a13e87742c91 scripted-diff: Replace ParseHex(“str”) -> ArrayFromHex(“str”)

The main change is commit b4b923565b4adaa5e3bcb22a6bc03f1f7ac4cdde “util: Add util::HexLiteral and util::Vec functions”. The other commits have only minor changes.

ryanofsky commented at 3:27 pm on August 15, 2024: contributor

The main change is commit b4b923565b4adaa5e3bcb22a6bc03f1f7ac4cdde “util: Add util::HexLiteral and util::Vec functions”. The other commits have only minor changes.

Design note about b4b923565b4adaa5e3bcb22a6bc03f1f7ac4cdde: I spent hours yesterday trying many ways to implement VectorFromHex(...) and ScriptFromHex(...) hybrid compile/runtime functions that would be equivalent to Vec(HexLiteral(...)) and Script(HexLiteral(...) in this commit and concluded it was impossible because:

In order for these functions to be evaluate char[] arguments at compile time, they would need to be constexpr or consteval, which would make it impossible for them to return std::vector and CScript objects which are usable at runtime.
If the functions could not take char[] arguments, they would have to take implicitly converted arguments of an intermediate type like ConstevalHexLiteral with consteval constructors. But unfortunately, because of the way function template parameter deduction works in C++, the intermediate type would have to be a non-template class instead of a template class, which would make it it impossible for its size to vary based on the size of the string, so not possible for it to represent arbitrary sized binary data.

Eventually, I did find it was possible to implement hybrid functions that evaluated arguments at compile time but returned values that could be used at runtime if they were written like VectorFromHex<"1234">() instead of VectorFromHex("1234"). But at that point I became convinced this was a bad approach and that is just better to have a separate compile time function HexLiteral() and a runtime function Vec() that can be used together simply and clearly instead of trying to combine different things into one function.

hodlinator force-pushed on Aug 15, 2024

hodlinator commented at 10:16 pm on August 15, 2024: contributor

Thanks for taking the time @ryanofsky!

Your troubles with runtime vs compile time memory is indeed part of what fed into prior solutions. Cool that VectorFromHex<"1234">() works, but I’m happy you’re not a fan either.

While I did like the symmetry of ArrayFromHex with the recently added Txid::FromHex and uint256::FromHex, the latter ones like to reverse the byte order, and HexLiteral is 2 chars less. HexLiteral feels more like prime real estate, so hiding behind the fig-leaf of util seems wise. I tried experimenting with user defined literals in response now but ran into issues with both making them consteval and accepting a size_t-templated char-array argument.

The implementation of VecFromHex was a compromise, and when un-drafting the PR a few days ago my initial version did not include it. Some pressure from @maflcko made me flip and add it, but your feedback has made me flop back gain. May him and other reviewers have mercy on my soul.

I called the new helper function in script_tests.cpp ToScript() in my original version, rather than Script(). Are you not concerned that some day the CScript -> Script rename will happen? You evidently have less aversion to noun-named functions whereas I gravitate towards verb-adjacent.

Edit: The recent pushes were from 734ac5a9002493013c0f8afe763f751ac99f89c8 to d543ac9702b559e0ea46af216518964e27aae7ce to e4032e253c2dfc8d75defd450dbb23ccf689c390.

hodlinator renamed this:
~~refactor: Replace ParseHex with consteval ArrayFromHex~~
refactor: Replace ParseHex with consteval HexLiteral
on Aug 15, 2024

in src/test/script_tests.cpp:1367 in ced7cec1c6 outdated

1360@@ -1361,6 +1361,12 @@ static CScript ScriptFromHex(const std::string& str)
1361     return CScript(data.begin(), data.end());
1362 }
1363 
1364+template<size_t N>
1365+CScript Script(const std::array<uint8_t, N>& array)
1366+{
1367+    return {array.begin(), array.end()};

ryanofsky commented at 11:23 pm on August 15, 2024:

In commit “refactor: add util::HexLiteral and util::Vec using statements” (ced7cec1c6d79159f8212d3a92a8f7583ef11884)

It seems like there is a problem with this on windows, looks like because begin and end are returning std::_Array_const_iterator types, instead of character pointers. Following change might fix it:

0-    return {array.begin(), array.end()};
1+    return {array.data(), array.data() + array.size()};

Error is:

https://github.com/bitcoin/bitcoin/actions/runs/10411271977/job/28834888600?pr=30377#step:20:2031

 0D:\a\bitcoin\bitcoin\src\test\script_tests.cpp(1367,12): error C2440: 'return': cannot convert from 'initializer list' to 'CScript' [D:\a\bitcoin\bitcoin\build_msvc\test_bitcoin\test_bitcoin.vcxproj]
 1  D:\a\bitcoin\bitcoin\src\test\script_tests.cpp(1367,5): note: 'CScript::CScript': no overloaded function could convert all the argument types
 2  D:\a\bitcoin\bitcoin\src\script\script.h(436,5): note: could be 'CScript::CScript(const unsigned char *,const unsigned char *)'
 3  D:\a\bitcoin\bitcoin\src\test\script_tests.cpp(1367,5): note: 'CScript::CScript(const unsigned char *,const unsigned char *)': cannot convert argument 1 from 'std::_Array_const_iterator<_Ty,4>' to 'const unsigned char *'
 4          with
 5          [
 6              _Ty=uint8_t
 7          ]
 8  D:\a\bitcoin\bitcoin\src\test\script_tests.cpp(1367,24): note: No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
 9  D:\a\bitcoin\bitcoin\src\script\script.h(435,5): note: or       'CScript::CScript(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<_Ty>>>,std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<_Ty>>>)'
10          with
11          [
12              _Ty=unsigned char
13          ]
14  D:\a\bitcoin\bitcoin\src\test\script_tests.cpp(1367,5): note: 'CScript::CScript(std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<_Ty>>>,std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<_Ty>>>)': cannot convert argument 1 from 'std::_Array_const_iterator<_Ty,4>' to 'std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<_Ty>>>'
15          with
16          [
17              _Ty=unsigned char
18          ]
19          and
20          [
21              _Ty=uint8_t
22          ]
23          and
24          [
25              _Ty=unsigned char
26          ]
27  D:\a\bitcoin\bitcoin\src\test\script_tests.cpp(1367,24): note: No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
28  D:\a\bitcoin\bitcoin\src\script\script.h(434,5): note: or       'CScript::CScript(prevector<28,unsigned char,uint32_t,int32_t>::const_iterator,prevector<28,unsigned char,uint32_t,int32_t>::const_iterator)'
29  D:\a\bitcoin\bitcoin\src\test\script_tests.cpp(1367,5): note: 'CScript::CScript(prevector<28,unsigned char,uint32_t,int32_t>::const_iterator,prevector<28,unsigned char,uint32_t,int32_t>::const_iterator)': cannot convert argument 1 from 'std::_Array_const_iterator<_Ty,4>' to 'prevector<28,unsigned char,uint32_t,int32_t>::const_iterator'
30          with
31          [
32              _Ty=uint8_t
33          ]
34  D:\a\bitcoin\bitcoin\src\test\script_tests.cpp(1367,24): note: No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
35  D:\a\bitcoin\bitcoin\src\test\script_tests.cpp(1367,5): note: while trying to match the argument list '(std::_Array_const_iterator<_Ty,4>, std::_Array_const_iterator<_Ty,4>)'
36          with
37          [
38              _Ty=uint8_t
39          ]
40  D:\a\bitcoin\bitcoin\src\test\script_tests.cpp(1367,5): note: the template instantiation context (the oldest one first) is
41  D:\a\bitcoin\bitcoin\src\test\script_tests.cpp(1395,9): note: see reference to function template instantiation 'CScript script_tests::Script<4>(const std::array<uint8_t,4> &)' being compiled
42  serfloat_tests.cpp
43D:\a\bitcoin\bitcoin\src\test\script_tests.cpp(1367,12): error C2440: 'return': cannot convert from 'initializer list' to 'CScript' [D:\a\bitcoin\bitcoin\build_msvc\test_bitcoin\test_bitcoin.vcxproj]

maflcko commented at 5:18 am on August 16, 2024:

Is #29369 related or possibly a fix?

hodlinator commented at 11:51 am on August 16, 2024:

Yes, #29369 looks like a possible fix. Applied @ryanofsky’s workaround for now but will have a look at that PR.

DrahtBot added the label CI failed on Aug 15, 2024

DrahtBot commented at 11:28 pm on August 15, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/28834906136

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

ryanofsky approved

ryanofsky commented at 11:48 pm on August 15, 2024: contributor

Code review ACK e4032e253c2dfc8d75defd450dbb23ccf689c390 if visual studio and clang-tidy errors are fixed

https://cirrus-ci.com/task/5821174167371776 https://github.com/bitcoin/bitcoin/actions/runs/10411271977/job/28834888600?pr=30377#step:20:2031

Thanks for updating. To be clear I don’t have really have any issue with embedding hex constants instead byte constants in the compiled code. I mainly just thought behavior of VectorFromHex function was hard to explain and confusing, arbitrarily doing certain things at runtime and other things at compile time, and the implementation was pretty involved, duplicating code and logic. Adding one simple compile-time function seems preferable.

I do think ToScript might be a better name than Script, I just didn’t put much thought into it since it is a local test function. I’d want to look into semantics more (not sure if it is deserializing or pushing a number), because probably there is a better, more descriptive name. (EDIT: It looks like this is just interpreting the bytes as a serialized script, so a name like AsScript or ToScript \ might be more descriptive than Script. I think ideally CScript would probably just have private constructors and we would use construct named functions like ScriptCode and ScriptData to construct CScript objects in a less ambiguous way.) In general, I agree verb names can be better than noun names but I think noun and other types of names can be ok when functions are acting like constructors.

DrahtBot requested review from stickies-v on Aug 15, 2024

DrahtBot requested review from maflcko on Aug 15, 2024

in src/util/strencodings.h:390 in 84c830f27f outdated

385+ * @note It may be preferable to use Vec(HexLiteral(...)) instead of
386+ * HexLiteral() to save stack space when declaring a local variable, if the hex
387+ * string is large. Alternately the variable could be declared constexpr to
388+ * avoid using stack space.
389+ */
390+template <typename Byte = uint8_t, size_t N>

maflcko commented at 5:38 am on August 16, 2024:

style nit in 84c830f27fd62db4a9cb93bf6d28a86f7751e504: Could use std::byte as default? (See below for reasoning)

(Reply to #30377 (review))

I think your link should say #30377 (review)

This is a newly introduced function, so I think changing it should not extend to untouched files. To clarify, I am not saying that you should switch callers to use std::byte, only the default.

What I meant is that (for now) in this pull request you make the default choice std::byte for Byte and adjust the scripted-diff to use HexLiteral<uint8_t>. Obviously the diff will be larger more verbose, but longer term it seems cleaner to converge to std::byte. This means that the future change converging to std::byte will be cleaning up the code by being able to drop the <uint8_t> and fall back on the default. Also, using the explicit <uint8_t> makes it easier to spot places that still use the “legacy” description of bytes, and it acts as a sort-of “TODO-comment”.

Also, “modern” places that require (or allow) std::byte today already, can simply use the function without having to specify that a Byte should mean std::byte.

This is in symmetry with the recently introduced TryParseHex, which also defaults to std::byte.

hodlinator commented at 12:01 pm on August 16, 2024:

Ah, hadn’t seen your #25227 before. It seems easier to default to std::byte when introducing TryParseHex without any prior callers. But thanks to your elaboration on the effort already under way to use std::byte, I did some more grepping and decided it might be worth attempting. Changed defaults in latest push 01e18d94d9577f415748869376988e3f0f59ced0 and worked through the fallout. The scripted-diff ended up with a lot less work in that version.

(The push right before it in 3a96b9f4b56c2ef47f41e9baa9ce36a268aa9440 applies the other more minor fixes without changing the default to std::byte).

Good to see that you were already thinking about compile-time versions in #25227 (comment).

DrahtBot requested review from maflcko on Aug 16, 2024

hodlinator force-pushed on Aug 16, 2024

DrahtBot removed the label CI failed on Aug 16, 2024

in src/pubkey.h:256 in 1dddb4fb0f outdated

252@@ -253,8 +253,7 @@ class XOnlyPubKey
253      *  !IsFullyValid(). */
254     bool IsNull() const { return m_keydata.IsNull(); }
255 
256-    /** Construct an x-only pubkey from exactly 32 bytes. */

maflcko commented at 2:19 pm on August 16, 2024:

nit in 1dddb4fb0ff88162e58e50e56d5e889bb5727a81: Any reason to remove the comment? Seems important to keep, otherwise a dev may be surprised when the function crashes when it isn’t exactly 32 bytes.

hodlinator commented at 9:25 pm on August 16, 2024:

Was removed it in response to #30377 (review) but then I went one step further and deferred the assert to the base_blob Span-ctor, making the comment more relevant again. Brought back now.

in src/pubkey.h:256 in 1dddb4fb0f outdated

252@@ -253,8 +253,7 @@ class XOnlyPubKey
253      *  !IsFullyValid(). */
254     bool IsNull() const { return m_keydata.IsNull(); }
255 
256-    /** Construct an x-only pubkey from exactly 32 bytes. */
257-    explicit XOnlyPubKey(Span<const unsigned char> bytes);
258+    constexpr explicit XOnlyPubKey(Span<const unsigned char> bytes) : m_keydata{bytes} {}

maflcko commented at 2:23 pm on August 16, 2024:

nit in https://github.com/bitcoin/bitcoin/commit/1dddb4fb0ff88162e58e50e56d5e889bb5727a81: Use std::span to avoid having to touch this again in the future for that reason?

hodlinator commented at 10:01 pm on August 16, 2024:

Started going down the route of std::span<const unsigned char, 32> to enforce the constraint previously, got burnt and backed out a bit too far, std::span with dynamic extent and comment will do for this PR. :+1:

in src/wallet/crypter.cpp:27 in 1dddb4fb0f outdated

23@@ -24,8 +24,8 @@ int CCrypter::BytesToKeySHA512AES(const std::vector<unsigned char>& chSalt, cons
24     unsigned char buf[CSHA512::OUTPUT_SIZE];
25     CSHA512 di;
26 
27-    di.Write((const unsigned char*)strKeyData.data(), strKeyData.size());
28-    di.Write(chSalt.data(), chSalt.size());
29+    di.Write((const unsigned char*)key_data.data(), key_data.size());

maflcko commented at 2:25 pm on August 16, 2024:

nit in https://github.com/bitcoin/bitcoin/commit/1dddb4fb0ff88162e58e50e56d5e889bb5727a81: Use the safe UCharCast while touching this, instead of the possibly unsafe pointer cast?

in src/wallet/crypter.cpp:123 in 1dddb4fb0f outdated

120     std::vector<unsigned char> chIV(WALLET_CRYPTO_IV_SIZE);
121     memcpy(chIV.data(), &nIV, WALLET_CRYPTO_IV_SIZE);
122     if(!cKeyCrypter.SetKey(vMasterKey, chIV))
123         return false;
124-    return cKeyCrypter.Decrypt(vchCiphertext, vchPlaintext);
125+    return cKeyCrypter.Decrypt(chCiphertext, vchPlaintext);

maflcko commented at 2:29 pm on August 16, 2024:

nit in https://github.com/bitcoin/bitcoin/commit/1dddb4fb0ff88162e58e50e56d5e889bb5727a81: Wrong rename?

hodlinator commented at 10:09 pm on August 16, 2024:

Leftover from half-way rename, fixed here and function below.

in src/kernel/chainparams.cpp:77 in 82c9080f23 outdated

73@@ -71,7 +74,7 @@ static CBlock CreateGenesisBlock(const char* pszTimestamp, const CScript& genesi
74 static CBlock CreateGenesisBlock(uint32_t nTime, uint32_t nNonce, uint32_t nBits, int32_t nVersion, const CAmount& genesisReward)
75 {
76     const char* pszTimestamp = "The Times 03/Jan/2009 Chancellor on brink of second bailout for banks";
77-    const CScript genesisOutputScript = CScript() << ParseHex("04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f") << OP_CHECKSIG;
78+    const CScript genesisOutputScript = CScript() << Vec(HexLiteral<uint8_t>("04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f")) << OP_CHECKSIG;

maflcko commented at 3:43 pm on August 16, 2024:

nit in 82c9080f235f6ffbe983b1360dc5741ce95e9a7d: Follow-up nit: I think it was me who suggested to avoid array/span serialization in script, but I think it could be considered when the vector serialization is mirrored. I understand that this would be different from the serialize.h serialization, but the two are separate anyway (and script inherits the whole prevector functions to insert raw bytes at any point anyway).

In any case, this doesn’t remove the need for Vec and can be done in a follow-up.

maflcko commented at 3:47 pm on August 16, 2024: member

review ACK 44bb5a12c4389dc18e181387356901787844e89f

(I’ll do the rest later)

hodlinator force-pushed on Aug 16, 2024

in src/test/script_tests.cpp:1401 in 67fc994bed outdated

1400     BOOST_CHECK_EQUAL(FindAndDelete(s, d), 1);
1401     BOOST_CHECK(s == expect);
1402 
1403-    s = ScriptFromHex("0302ff030302ff03"); // PUSH 0x2ff03 PUSH 0x2ff03
1404-    d = ScriptFromHex("0302ff03");
1405+    s = ToScript(HexLiteral<uint8_t>("0302ff030302ff03")); // PUSH 0x2ff03 PUSH 0x2ff03

l0rinc commented at 9:55 am on August 17, 2024:

super-nit: during parsing we’re checking now that the inputs contain an even number of hexadecimal digits, if you edit again, consider applying that to the comments as well (similarly to other comments, like 0x02ff03)

0    s = ToScript(HexLiteral<uint8_t>("0302ff030302ff03")); // PUSH 0x02ff03 PUSH 0x02ff03

in src/test/transaction_tests.cpp:873 in 67fc994bed outdated

874     // OP_RESERVED *is* considered to be a PUSHDATA type opcode by IsPushOnly()!
875-    t.vout[0].scriptPubKey = CScript() << OP_RETURN << OP_RESERVED << -1 << 0 << ParseHex("01") << 2 << 3 << 4 << 5 << 6 << 7 << 8 << 9 << 10 << 11 << 12 << 13 << 14 << 15 << 16;
876+    t.vout[0].scriptPubKey = CScript() << OP_RETURN << OP_RESERVED << -1 << 0 << Vec(HexLiteral<uint8_t>("01")) << 2 << 3 << 4 << 5 << 6 << 7 << 8 << 9 << 10 << 11 << 12 << 13 << 14 << 15 << 16;
877     CheckIsStandard(t);
878-    t.vout[0].scriptPubKey = CScript() << OP_RETURN << 0 << ParseHex("01") << 2 << ParseHex("ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff");
879+    t.vout[0].scriptPubKey = CScript() << OP_RETURN << 0 << Vec(HexLiteral<uint8_t>("01")) << 2 << Vec(HexLiteral<uint8_t>("ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"));

l0rinc commented at 10:39 am on August 17, 2024:

Given that

0assert(Vec(HexLiteral<uint8_t>("01")) == valtype{1});
1assert(Vec(HexLiteral<uint8_t>("ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff")) == valtype(36, 0xff));

consider simplifying these to e.g.

0    t.vout[0].scriptPubKey = CScript() << OP_RETURN << 0 << valtype{1} << 2 << valtype(36, 0xff);

and many more, I think it would simplify a few trivial usages, hexadecimal encoding only makes sense for bigger values.

ryanofsky commented at 2:02 pm on August 17, 2024:

re: #30377 (review)

and many more, I think it would simplify a few trivial usages, hexadecimal encoding only makes sense for bigger values.

This would change which cscript overload is called, which could be good, but I think it would be better to do in a PR dedicated to improving cscript. That way this change can be focused on replacing runtime hex parsing with compile time parsing, and reviewers don’t need to consider ways it is impacting test coverage of different CScript methods.

l0rinc commented at 4:38 pm on August 17, 2024:

Both of them produce a vector, right? Edit:

typedef std::vector<unsigned char> valtype
std::vector<T> Vec(const std::array<T, N>& array)

i.e. both would call CScript& operator<<(const std::vector<unsigned char>& b) LIFETIMEBOUND, right?

ryanofsky commented at 3:33 pm on August 18, 2024:

re: #30377 (review)

Yes, you’re right , I misread the suggestion. Using valtype to construct vectors would not change which CScript method is called, so it should be not a problem to make that change in this PR, and maybe it would be better. I do think tests would be more readable and consistent if they used HexLiteral as much as possible to represent raw bytes, but that’s a subjective opinion and your approach seems fine too.

hodlinator commented at 7:53 am on August 19, 2024:

Yeah, when it came to short/empty hex strings I’ve been tempted to use something like std::vector{0xFF} but felt inconsistent. valtype is a good find, but doesn’t provide so much of a win once we have full std::array<std::byte> support in CSscript: valtype{0xFF} vs HexLiteral("FF") valtype{} vs HexLiteral("") (I think switching to base 10 is or using valtype(36, 0xff); is too inconsistent). So keeping as-is for now.

l0rinc commented at 8:21 am on August 19, 2024:

once we have full std::arraystd::byte support in CSscript:

Valid point.

What about repeated values, which are already using the vector constructor, i.e.

0valtype(36, 0xff)

instead of

0Vec(HexLiteral<uint8_t>("ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"))

maflcko commented at 8:34 am on August 19, 2024:

What about repeated values, which are already using the vector constructor, i.e.

I’d say that there is more value in a test being consistent (using the same code patterns within a single unit test), than to use the shortest possible code. However that is a style question up to the author. Personally, I’d leave this line in this pull request as-is.

l0rinc commented at 9:14 am on August 19, 2024:

more value in a test being consistent

Agree, that’s why I’m suggesting this, transaction_tests has many non-hexadecimal pushes.

hodlinator commented at 8:28 pm on August 19, 2024:

What about repeated values, which are already using the vector constructor, i.e.

See my message before yours @paplorinc:

(I think switching to base 10 is or using valtype(36, 0xff); is too inconsistent).

l0rinc commented at 8:40 pm on August 19, 2024:

valtype is not the point here, rather that for repeated values we’ve used the vector constructor. But we can close this comment, maybe we’ll do it in another PR.

hodlinator commented at 8:45 pm on August 19, 2024:

Yes. valtype was not my only point, it was also the repeated values.

l0rinc commented at 8:48 pm on August 19, 2024:

But aren’t we already doing that throughout the test, e.g. https://github.com/bitcoin/bitcoin/blob/master/src/test/transaction_tests.cpp#L1031?

hodlinator commented at 8:52 pm on August 19, 2024:

Yes, we are. But I’d rather not change that aspect of this region in this PR. Arguably the current “flattened out” version is clearer for the casual reader who isn’t interested in the exact number of bytes being repeated.

l0rinc commented at 8:54 pm on August 19, 2024:

Ah, so it’s just out of scope - I can work with that. :)

in src/test/transaction_tests.cpp:861 in 67fc994bed outdated

858     BOOST_CHECK_EQUAL(MAX_OP_RETURN_RELAY, t.vout[0].scriptPubKey.size());
859     CheckIsStandard(t);
860 
861     // MAX_OP_RETURN_RELAY+1-byte TxoutType::NULL_DATA (non-standard)
862-    t.vout[0].scriptPubKey = CScript() << OP_RETURN << ParseHex("04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef3804678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef3800");
863+    t.vout[0].scriptPubKey = CScript() << OP_RETURN << Vec(HexLiteral<uint8_t>("04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef3804678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef3800"));

l0rinc commented at 10:56 am on August 17, 2024:

Is Vec needed here because the existing ToByteVector has a long name? Or because of some funny visual studio exception? If none, maybe we could get rid of either.

ryanofsky commented at 2:05 pm on August 17, 2024:

re: #30377 (review)

Is Vec needed here because the existing ToByteVector has a long name? Or because of some funny visual studio exception? If none, maybe we could get rid of either.

CScript only supports « with std::vector, not std::array. It would be natural for it support std::array too, but I think that would be better be done in a followup improving CScript so this PR can be focused on replacing runtime hex parsing with compile time parsing and not make sensitive changes to the cscript implementation.

l0rinc commented at 4:39 pm on August 17, 2024:

I mean both Vec(HexLiteral( and the existing ToByteVector(HexLiteral( would work here, right?

ryanofsky commented at 3:58 pm on August 18, 2024:

re: #30377 (review)

I mean both Vec(HexLiteral( and the existing ToByteVector(HexLiteral( would work here, right?

Right, ToByteVector could be used here instead of Vec, and Vec was just chosen because it is shorter. Sorry I misunderstood your original question.

I do think in general Vec has some benefits over ToByteVector:

Vec only changes the type of the container, not the type of the elements inside the container, so it seems preferable to use in cases like this where the internal type is not changing.
The name ToByteVector seems not great because it returns a vector of unsigned char rather than a vector of std::byte. If intention is to be switched to std::byte later though then I think it is a good name.

But nice thing is neither of these should be neccessary if cscript supports std::array

l0rinc commented at 7:41 pm on August 18, 2024:

But nice thing is neither of these should be necessary if cscript supports std::array

Would it be too early for me to provide a separate PR for that?

hodlinator commented at 7:37 am on August 19, 2024:

I would be happy if you took the ball on that. But would appreciate if you worked on it in a personal branch (still pushing to GitHub for CI) and waited a few days more with making it into a PR.

~6 pushes ago on this PR I had commit 5be34598c4683b2b44f607b28592a9e68e089761 which added targeted std::array support to script.h. It is conservative in that it doesn’t publicly expose the possibility of appending raw std::span. IIRC MSVC had some issue with it, but could have been a different push.

maflcko commented at 8:36 am on August 19, 2024:

I agree with Ryan, that this should be a follow-up PR, and that the changes in this line of the diff should probably be kept as-is.

l0rinc commented at 9:03 am on August 19, 2024:

I haven’t seen https://github.com/bitcoin/bitcoin/commit/5be34598c4683b2b44f607b28592a9e68e089761 It seems to me there’s a simpler solution than that, since the method contains two separate concerns (inserting size + inserting elements) - which are using different parts of the vector/array.

 0diff --git a/src/script/script.h b/src/script/script.h
 1--- a/src/script/script.h	(revision d5eed066d33780b12c1f0813a2adf021eac0ca5d)
 2+++ b/src/script/script.h	(date 1724057506103)
 3@@ -412,6 +412,34 @@
 4 /** Serialized script, used inside transaction inputs and outputs */
 5 class CScript : public CScriptBase
 6 {
 7+private:
 8+    void InsertSize(const size_t size)
 9+    {
10+        if (size < OP_PUSHDATA1)
11+        {
12+            insert(end(), static_cast<unsigned char>(size));
13+        }
14+        else if (size <= 0xff)
15+        {
16+            insert(end(), OP_PUSHDATA1);
17+            insert(end(), static_cast<unsigned char>(size));
18+        }
19+        else if (size <= 0xffff)
20+        {
21+            insert(end(), OP_PUSHDATA2);
22+            uint8_t _data[2];
23+            WriteLE16(_data, size);
24+            insert(end(), _data, _data + sizeof(_data));
25+        }
26+        else
27+        {
28+            insert(end(), OP_PUSHDATA4);
29+            uint8_t _data[4];
30+            WriteLE32(_data, size);
31+            insert(end(), _data, _data + sizeof(_data));
32+        }
33+    }
34+
35 protected:
36     CScript& push_int64(int64_t n)
37     {
38@@ -465,29 +493,15 @@
39 
40     CScript& operator<<(const std::vector<unsigned char>& b) LIFETIMEBOUND
41     {
42-        if (b.size() < OP_PUSHDATA1)
43-        {
44-            insert(end(), (unsigned char)b.size());
45-        }
46-        else if (b.size() <= 0xff)
47-        {
48-            insert(end(), OP_PUSHDATA1);
49-            insert(end(), (unsigned char)b.size());
50-        }
51-        else if (b.size() <= 0xffff)
52-        {
53-            insert(end(), OP_PUSHDATA2);
54-            uint8_t _data[2];
55-            WriteLE16(_data, b.size());
56-            insert(end(), _data, _data + sizeof(_data));
57-        }
58-        else
59-        {
60-            insert(end(), OP_PUSHDATA4);
61-            uint8_t _data[4];
62-            WriteLE32(_data, b.size());
63-            insert(end(), _data, _data + sizeof(_data));
64-        }
65+        InsertSize(b.size());
66+        insert(end(), b.begin(), b.end());
67+        return *this;
68+    }
69+
70+    template<size_t N>
71+    CScript& operator<<(const std::array<unsigned char, N>& b) LIFETIMEBOUND
72+    {
73+        InsertSize(N);
74         insert(end(), b.begin(), b.end());
75         return *this;
76     }

i.e. for vector:

0InsertSize(b.size());
1insert(end(), b.begin(), b.end());

and for array:

0InsertSize(N);
1insert(end(), b.begin(), b.end());

that this should be a follow-up PR

I’m fine with both, I’m just providing alternatives.

hodlinator commented at 11:38 am on August 19, 2024:

I like that my version just forwards everything in one statement, while your leaves 3, duplicated. But either one works for me.

l0rinc commented at 11:58 am on August 19, 2024:

In my suggestion there is no std::span involved and the size + data writing parts are clearly separated. But I can also imagine a method accepting b.size(), b.begin(), b.end() vs N, b.begin(), b.end(), if you think that’s cleaner.

hodlinator commented at 2:06 pm on August 19, 2024:

In my suggestion there is no std::span

Yeah, I guess that introduces needless confusion, better to not mention span. I like your solution better now. :)

But I can also imagine a method accepting b.size(), b.begin(), b.end() vs N, b.begin(), b.end(), if you think that’s cleaner.

Possibly just begin() & end() and just using the difference instead of passing size() explicitly?

l0rinc commented at 8:42 pm on August 19, 2024:

I’m fine with any of those - having a slight preference for the one which clarifies the two separate roles of the serialization - length + values

l0rinc commented at 12:49 pm on August 30, 2024:

Split this out to a separate PR: 176d42b (#30765)

in src/test/transaction_tests.cpp:887 in 67fc994bed outdated

883@@ -882,13 +884,13 @@ BOOST_AUTO_TEST_CASE(test_IsStandard)
884 
885     // Only one TxoutType::NULL_DATA permitted in all cases
886     t.vout.resize(2);
887-    t.vout[0].scriptPubKey = CScript() << OP_RETURN << ParseHex("04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38");
888+    t.vout[0].scriptPubKey = CScript() << OP_RETURN << Vec(HexLiteral<uint8_t>("04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38"));

l0rinc commented at 11:05 am on August 17, 2024:

nit: I’m not in love with the ParseHex -> Vec(HexLiteral<uint8_t> change, seems more noisy than needed. What’s the practical problem with keeping ParseHex in the tests?

ryanofsky commented at 2:12 pm on August 17, 2024:

re: #30377 (review)

nit: I’m not in love with the ParseHex -> Vec(HexLiteral<uint8_t> change, seems more noisy than needed. What’s the practical problem with keeping ParseHex in the tests?

I feel like we could add a HexLiteralU or similar convenience function that is equivalent to HexLiteral<uint8_t>, because HexLiteral<uint8_t> does seem to be needed a lot of places after the std::byte change (107 instances currently, which is more than I would have expected).

It would also be nice to have a followup adding std::array support to CScript so more Vec calls can be dropped, too, but we should avoid touching CScript implementation in this PR. It looks like there are 28 Vec(HexLiteral()) calls currently, and if std::array support is added to CScript 15 of them can be dropped.

I think suggestion to keep ParseHex in tests would be a step backward because compile time checking of strings in tests is as much (or more) of a practical benefit there as in other parts of the code. I think most of the practical benefit is just knowing code works without having to wait for tests to run.

l0rinc commented at 9:33 pm on August 17, 2024:

Thanks @ryanofsky

hodlinator commented at 7:57 am on August 19, 2024:

I guess we could add an interim HexLiteralU, but that would decrease the motivation to adapt to std::byte. :)

in src/wallet/test/wallet_crypto_tests.cpp:78 in 67fc994bed outdated

112     // These are expensive.
113 
114-    TestCrypter::TestPassphrase(ParseHex("0000deadbeef0000"), "test", 25000, \
115-                                ParseHex("fc7aba077ad5f4c3a0988d8daa4810d0d4a0e3bcb53af662998898f33df0556a"), \
116-                                ParseHex("cf2f2691526dd1aa220896fb8bf7c369"));
117+    TestCrypter::TestPassphrase(HexLiteral<uint8_t>("0000deadbeef0000"), "test", 25000, \

l0rinc commented at 11:14 am on August 17, 2024:

nit: as stated before, the trailing \ can likely be removed.

hodlinator commented at 8:03 am on August 19, 2024:

These lines are part of the scripted diff commit (67fc994bedf14e360b3e51fa1a71dc6c1684b532), and I’d rather not complicate the regexps just for that.

hodlinator commented at 9:14 pm on August 19, 2024:

Settled for fixing this in 41d97d38bfbdacdf56e730c6b082f992e4f15ec1, before the scripted-diff happens.

in src/test/util_tests.cpp:233 in 67fc994bed outdated

233+        HexStr(Span{HEX_PARSE_OUTPUT}.first(0)),
234         "");
235 
236     {
237-        const std::vector<char> in_s{ParseHex_expected, ParseHex_expected + 5};
238+        const std::vector<char> in_s{HEX_PARSE_OUTPUT, HEX_PARSE_OUTPUT + 5};

l0rinc commented at 11:17 am on August 17, 2024:

nit: 5 seems to be a random value, we might as well signal that with:

0        const std::string out_exp{"04678afdb0"};
1        const std::vector<char> in_s{HEX_PARSE_OUTPUT, HEX_PARSE_OUTPUT + out_exp.size() / 2};

hodlinator commented at 8:01 am on August 19, 2024:

Will do if I retouch.

in src/util/strencodings.h:390 in 67fc994bed outdated

386+ * HexLiteral() to save stack space when declaring a local variable, if the hex
387+ * string is large. Alternately the variable could be declared constexpr to
388+ * avoid using stack space.
389+ */
390+template <typename Byte = std::byte, size_t N>
391+consteval std::array<Byte, N / 2> HexLiteral(const char (&hex_str)[N])

l0rinc commented at 12:22 pm on August 17, 2024:

Given that we already seem to be using custom string overloads (e.g. https://github.com/bitcoin/bitcoin/blob/master/src/script/miniscript.cpp#L87), would it maybe make the usages less noisy if we tried something like:

0consteval auto operator""_hex(const char* hex_str, const std::size_t len)

to be able to write "0000deadbeef0000"_hex instead of HexLiteral<uint8_t>("0000deadbeef0000")?

I haven’t spent a lot of time with this to make it work since I’m not yet sure it’s a good idea.

hodlinator commented at 1:50 pm on August 17, 2024:

As I said here: #30377 (comment)

I tried experimenting with user defined literals in response now but ran into issues with both making them consteval and accepting a size_t-templated char-array argument.

If we are to return a std::array I think we need to take the size as a template argument I think. And no compiler seems to accept consteval user defined literals.

l0rinc commented at 9:10 am on August 19, 2024:

I haven’t realized user defined literals meant that you’ve already tried it, my mistake.

I’m not advocating for it, but I think we can get it to compile by something like:

 0consteval auto operator"" _hex(const char* hex_str, size_t _)
 1{
 2    constexpr std::size_t len = sizeof(hex_str);
 3    static_assert(len % 2 == 0, "Hex string must have an even number of characters");
 4
 5    if (hex_str[len - 1] != '\0') throw "null terminator required";
 6    std::array<std::byte, len / 2> rv{};
 7    size_t i = 0;
 8    for (auto& elem: rv) {
 9        auto hi = ConstevalHexDigit(hex_str[i++]) << 4;
10        elem = static_cast<std::byte>(hi | ConstevalHexDigit(hex_str[i++]));
11    }
12    return rv;
13}

but it’s kinda’ ugly and might not address our problems. Please resolve the comment if this isn’t helpful.

hodlinator commented at 11:33 am on August 19, 2024:

I tried before your suggestion and then tried again when you suggested on all 3 compilers and none worked. Which I stated.

Then you pasted a longer example. Have you tested and got anything remotely close to that working on any compiler, C++ or other language? :)

Resolving.

l0rinc commented at 12:00 pm on August 19, 2024:

The above seemed to be working (that’s why I suggested it originally) - but I’m not in love with this version either - thanks for considering.

l0rinc commented at 4:14 pm on August 20, 2024:

I see that my example above wasn’t working with GCC and had a stupid sizeof bug.

Here’s an alternative using string literal operator template which seems to be working correctly for me:

 0template<size_t N>
 1struct _Hex {
 2    std::array<std::byte, N / 2> hex{};
 3    consteval _Hex(const char (&hex_str)[N]) requires (N % 2 == 1) {
 4        if (hex_str[N - 1]) throw "null terminator required";
 5        for (std::size_t i = 0; i < hex.size(); ++i) {
 6            hex[i] = static_cast<std::byte>(
 7                (ConstevalHexDigit(hex_str[2 * i]) << 4) |
 8                 ConstevalHexDigit(hex_str[2 * i + 1]));
 9        }
10    }
11};
12
13template<_Hex str>
14consteval auto operator"" _hex() { return str.hex; }
15
16template<_Hex str>
17constexpr auto operator"" _hex_v() { return Vec(str.hex); }

used as

0auto hex_array = "9caaf126043eb5"_hex;
1auto hex_vector = "9caaf126043eb5"_hex_v;

 0diff --git a/src/test/miner_tests.cpp b/src/test/miner_tests.cpp
 1--- a/src/test/miner_tests.cpp	(revision 424f8f4ef5042adf64764c65e85d5b25faf60e66)
 2+++ b/src/test/miner_tests.cpp	(date 1724169991324)
 3@@ -29,6 +29,7 @@
 4 
 5 using node::BlockAssembler;
 6 using node::CBlockTemplate;
 7+using util::ConstevalHexDigit;
 8 using util::HexLiteral;
 9 using util::Vec;
10 
11@@ -660,4 +661,55 @@
12     TestPrioritisedMining(scriptPubKey, txFirst);
13 }
14 
15+/////////
16+
17+template<size_t N>
18+struct _Hex {
19+    std::array<std::byte, N / 2> hex;
20+    consteval _Hex(const char (&hex_str)[N]) requires (N % 2 == 1) {
21+        if (hex_str[N - 1]) throw "null terminator required";
22+        for (std::size_t i = 0; i < hex.size(); ++i) {
23+            hex[i] = static_cast<std::byte>(
24+                (ConstevalHexDigit(hex_str[2 * i]) << 4) |
25+                ConstevalHexDigit(hex_str[2 * i + 1]));
26+        }
27+    }
28+};
29+
30+template<_Hex str>
31+consteval auto operator"" _hex() { return str.hex; }
32+
33+template<_Hex str>
34+constexpr auto operator"" _hex_v() { return Vec(str.hex); }
35+
36+/////////
37+
38+template <typename Byte = std::byte, size_t N>
39+consteval std::array<Byte, N / 2> HexLiteral(const char (&hex_str)[N]) requires (N % 2 == 1) {
40+    if (hex_str[N - 1] != '\0') throw "null terminator required";
41+    std::array<Byte, N / 2> rv{};
42+    size_t i = 0;
43+    for (auto& elem : rv) {
44+        auto hi = ConstevalHexDigit(hex_str[i++]) << 4;
45+        elem = static_cast<Byte>(hi | ConstevalHexDigit(hex_str[i++]));
46+    }
47+    return rv;
48+}
49+
50+BOOST_AUTO_TEST_CASE(string_hex)
51+{
52+    auto original = HexLiteral("04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f");
53+    static_assert(std::is_same_v<decltype(original), std::array<std::byte, 65>>);
54+    std::cout << HexStr(original) << std::endl;
55+
56+    auto hex_value = "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f"_hex;
57+    static_assert(std::is_same_v<decltype(hex_value), std::array<std::byte, 65>>);
58+    std::cout << HexStr(hex_value) << std::endl;
59+
60+    auto hex_value2 = "9caaf126043eb5"_hex_v;
61+    static_assert(std::is_same_v<decltype(hex_value2), std::vector<std::byte>>);
62+    assert(hex_value2.size() == 7);
63+    std::cout << HexStr(hex_value2) << std::endl;
64+}
65+
66 BOOST_AUTO_TEST_SUITE_END()

See: https://godbolt.org/z/qPz4Ex8dT

ryanofsky commented at 4:46 pm on August 20, 2024:

re: #30377 (review)

Here’s an alternative using string literal operator template which seems to be working correctly for me:

Wow, that looks great! Assuming this works on compilers we need, it looks like a very nice solution that completely avoids problems I was struggling with trying to get a function parameter to act like a template parameter.

hodlinator commented at 6:47 pm on August 20, 2024:

That is awesome! Didn’t realize I had to enable C++20 support on Godbolt. :man_facepalming: Thank you for persisting @paplorinc! Will experiment with this.

maflcko commented at 8:31 am on August 21, 2024:

Heh, nice. I didn’t know that string literal operator templates are possible since C++20.

hodlinator commented at 1:20 pm on August 21, 2024:

Even though the prior version had many ACKs, this reaches another level of awesome. Initial version using ""_hex now pushed in daba1a25a62e72e9797a134c6377d17a9274a25f with @l0rinc as main author of that commit! :tada:

Resolving this specific thread.

l0rinc commented at 12:25 pm on August 17, 2024: contributor

I’m mostly fine with it, would love if we could make a few usages more compact, since I think we’ve lost readability a bit

in src/test/miniscript_tests.cpp:26 in 9c8d091382 outdated

21@@ -22,6 +22,9 @@
22 #include <script/script_error.h>
23 #include <script/signingprovider.h>
24 
25+using util::HexLiteral;
26+using util::Vec;

maflcko commented at 7:45 am on August 19, 2024:

9c8d091382909f44c2fc61b8c726362356db2bff: Why is vec needed in this file?

hodlinator commented at 7:04 pm on August 19, 2024:

If I change

0std::vector<unsigned char> nonminpush = ParseHex("0000210232780000feff00ffffffffffff21ff005f00ae21ae00000000060602060406564c2102320000060900fe00005f00ae21ae00100000060606060606000000000000000000000000000000000000000000000000000000000000000000");
1const CScript nonminpush_script(nonminpush.begin(), nonminpush.end());

->

0const auto nonminpush = HexLiteral<uint8_t>("0000210232780000feff00ffffffffffff21ff005f00ae21ae00000000060602060406564c2102320000060900fe00005f00ae21ae00100000060606060606000000000000000000000000000000000000000000000000000000000000000000");
1const CScript nonminpush_script(nonminpush.begin(), nonminpush.end());

I can indeed omit Vec() and it compiles.. on everything except MSVC, because the std::array iterators do not satisfy any CScript ctors. So it’s still needed until your #29369 lands.

in src/test/script_tests.cpp:1365 in 309d8b7831 outdated

1360@@ -1361,6 +1361,12 @@ static CScript ScriptFromHex(const std::string& str)
1361     return CScript(data.begin(), data.end());
1362 }
1363 
1364+template <size_t N>
1365+CScript ToScript(const std::array<uint8_t, N>& array)

maflcko commented at 8:11 am on August 19, 2024:

nit in 309d8b783109eb52a5596712b19210271c8f882e (and the next commit):

I know I’ve raised this before in #30377 (review), but I still think it would be cleaner to drop this function (and related changes) from this pull request, because:

It isn’t strictly needed, because replacing the test-only ScriptFromHex wasn’t a direct goal of this pull
It is incomplete, because it adds a bit of compile-time checking for the new paths using ToScript, but leaves the checking out completely for the remaining ScriptFromHex paths. It would be better to add the check data = *Assert(TryParseHex(str)) to ScriptFromHex instead and drop this function.
It seems inconsistent and harder to read when there are two functions doing the exact same thing (turn hex into a script) in the same test case. (Are we going to duplicate every test-only helper just because there is a compile-time and a run-time checked function to to the same?)
It makes the test more verbose
It makes the diff larger and review take longer

Whereas the benefits are limited, because tests are deterministically executed, and this test runs faster than it compiles, so any errors are found similarly fast by the developer introducing them.

hodlinator commented at 3:11 pm on August 19, 2024:

It isn’t strictly needed, because replacing the test-only ScriptFromHex wasn’t a direct goal of this pull

It represents indirect use of ParseHex with string literals.

It is incomplete, because it adds a bit of compile-time checking for the new paths using ToScript, but leaves the checking out completely for the remaining ScriptFromHex paths. It would be better to add the check data = *Assert(TryParseHex(str)) to ScriptFromHex instead and drop this function.

Will add your suggestion to ScriptFromHex. :+1:

It seems inconsistent and harder to read when there are two functions doing the exact same thing (turn hex into a script) in the same test case. (Are we going to duplicate every test-only helper just because there is a compile-time and a run-time checked function to to the same?)

They are not the same function (I know they were initially named very similarly though). With some adjustment one can be partially implemented using the other:

0template <typename T>
1CScript ToScript(const T& container)
2{
3    return {container.data(), container.data() + container.size()};
4}
5
6static CScript ScriptFromHex(const std::string& str)
7{
8    return ToScript(ParseHex(str));
9}

It makes the test more verbose

This is in part due to your suggestion of std::byte defaults. Once that is fixed I don’t think…

0s = ToScript(HexLiteral("0302ff030302ff03"));

…will be too bad.

It makes the diff larger and review take longer

This is part of why we have scripted diffs, right?

Whereas the benefits are limited, because tests are deterministically executed, and this test runs faster than it compiles, so any errors are found similarly fast by the developer introducing them.

It’s down to “what can at low cost run at compile time, should”.

That said, I am open to not touching ScriptFromHex-usages if you insist again (no need to motivate this time, just write :-1: ). Thank you for your patience.

maflcko commented at 8:01 am on August 20, 2024:

I don’t think…
0s = ToScript(HexLiteral("0302ff030302ff03"));
…will be too bad.

Sure, up to you, it is just a style nit :)

I still think s = ScriptFromHex("0302ff030302ff03") is better, because it is shorter, more consistent in the test case, and comes with the same checks at roughly the same CPU cost.

If you really want to change it, maybe it can be done in the follow-up that allows std::byte serialization into script? This would also avoid having to touch the same lines of code twice?

Edit: If you really want to keep it here, shouldn’t it accept any byte container, to avoid having to touch the same lines of code twice?

0template <typename T>
1CScript ToScript(const T& byte_container)
2{
3   auto container{MakeUCharSpan(byte_container)};
4    return {container.data(), container.data() + container.size()};
5}

hodlinator commented at 9:06 am on August 20, 2024:

Are you suggesting adding MakeUCharSpan so that one could send in a raw array that would otherwise not have .data() and .size()?

Wouldn’t the call to MakeUCharSpan have to be re-touched if we switch to using only std::byte for CScript?

maflcko commented at 9:08 am on August 20, 2024:

Are you suggesting adding MakeUCharSpan so that one could send in a raw array that would otherwise not have .data() and .size()?

No, just to hide the reinterpret cast behind a curtain

Wouldn’t the call to MakeUCharSpan have to be re-touched if we switch to using only std::byte for CScript?

Yes, but that would be just two lines of code touched, as opposed to “many”, no?

hodlinator commented at 9:19 am on August 20, 2024:

No, just to hide the reinterpret cast behind a curtain

Aha, so you are concerned about the element type of the container mismatching, but still within allowable realm of UCharCast, got it.

Wait.. that will allow me to change ToScript(HexLiteral<uint8_t>( into ToScript(HexLiteral( - thanks!

hodlinator commented at 8:13 am on August 19, 2024: contributor

Added section on std::array<std::byte> to PR summary.

maflcko commented at 8:22 am on August 19, 2024: member

I haven’t reviewed the last two commits, because I think the test-only changes offers the least amount of benefits, while being the hardest to review, because the type is changed and thus one has to make sure the call graph is still the same. Also, they seemingly are attracting the most bike-shedding.

I think it would be better to remember the commit and then just update the called sites to accept std::byte (and then use that as an excuse to change the tests one-by-one to use the new HexLiteral function) in a follow-up. Otherwise, the tests will be changed again anyway for that reason (to replace HexLiteral<uint8_t> with HexLiteral).

If you decide to keep the last two commits, it would be good to correct the scripted-diff, because I think it is wrong and just happens to work by accident. The replacement is HexLiteral\1<uint8_t>, where \1 refers to the original inner Byte type, for example <std::byte>. However HexLiteral<std::byte><uint8_t> wouldn’t be valid C++ code, when the scripted-diff happens to pick it up in the future.

review ACK 01e18d94d9577f415748869376988e3f0f59ced0 🕶

Signature:

0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
2trusted comment: review ACK 01e18d94d9577f415748869376988e3f0f59ced0 🕶
3UHkELMR2tX1aCqoW34kTrN54RD5uxnY1x/QSLH2gY0/HJ15FhM9ZEnlKU16r7AkBB6d6Dlgr99qRlQ2o6wScBA==

maflcko approved

maflcko commented at 9:01 am on August 19, 2024: member

Still haven’t reviewed the last two commits, otherwise the changes since my previous review were:

Add back the comment to the XOnlyPubKey constructor (taking a span)
Use safer UCharCast in one instance
Some refactoring in the crypter module (Personally, I’d prefer if multi-line refactoring were done in a separate commit, not mixed with renaming and type-changes, but not sure if it makes sense to change this pull now)

re-ACK 67fc994bedf14e360b3e51fa1a71dc6c1684b532 🍠

Signature:

0untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
1RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
2trusted comment: re-ACK 67fc994bedf14e360b3e51fa1a71dc6c1684b532 🍠
3bFF4oTk98kWCPHMIS+g5dBIiIgb5V1Amx9ULwWvE9TGCIN3neWeUKgPR9J9A2tUOvyYV0NUITo8uy7hQz/qcAw==

DrahtBot requested review from ryanofsky on Aug 19, 2024

in src/wallet/test/fuzz/crypter.cpp:38 in 67fc994bed outdated

35@@ -36,10 +36,10 @@ FUZZ_TARGET(crypter, .init = initialize_crypter)
36         const unsigned int derivation_method = fuzzed_data_provider.ConsumeBool() ? 0 : fuzzed_data_provider.ConsumeIntegral<unsigned int>();
37 
38         // Limiting the value of nRounds since it is otherwise uselessly expensive and causes a timeout when fuzzing.

stickies-v commented at 10:20 am on August 19, 2024:

rename nit

0        // Limiting the value of rounds since it is otherwise uselessly expensive and causes a timeout when fuzzing.

l0rinc commented at 1:44 pm on August 19, 2024:

I think it’s referring to: https://github.com/bitcoin/bitcoin/blob/master/src/wallet/crypter.cpp#L40 i.e.

0const unsigned int nRounds

stickies-v commented at 1:52 pm on August 19, 2024:

If you follow the call graph, I don’t see how that can be true?

l0rinc commented at 1:59 pm on August 19, 2024:

You’re right, had an older version locally!

in src/test/util_tests.cpp:154 in 67fc994bed outdated

151+static_assert((sizeof(HEX_PARSE_INPUT) - 1) == 2 * sizeof(HEX_PARSE_OUTPUT));
152 BOOST_AUTO_TEST_CASE(parse_hex)
153 {
154     std::vector<unsigned char> result;
155-    std::vector<unsigned char> expected(ParseHex_expected, ParseHex_expected + sizeof(ParseHex_expected));
156+    std::vector<unsigned char> expected(std::begin(HEX_PARSE_OUTPUT), std::end(HEX_PARSE_OUTPUT));

stickies-v commented at 10:47 am on August 19, 2024:

nit: unnecessary std::begin

0    std::vector<unsigned char> expected(HEX_PARSE_OUTPUT, std::end(HEX_PARSE_OUTPUT));

hodlinator commented at 9:11 pm on August 19, 2024:

Unless someone finds a way to maintain symmetry using ranges or something, I’m keeping as-is.

in src/test/util_tests.cpp:220 in 67fc994bed outdated

215+    BOOST_CHECK_EQUAL(ConstevalHexDigit('f'), 0xf);
216+}
217+
218 BOOST_AUTO_TEST_CASE(util_HexStr)
219 {
220     BOOST_CHECK_EQUAL(

stickies-v commented at 10:56 am on August 19, 2024:

nit: these checks can be tidied up a bit:

 0diff --git a/src/test/util_tests.cpp b/src/test/util_tests.cpp
 1index 49bc4b1b50..792fdfde3e 100644
 2--- a/src/test/util_tests.cpp
 3+++ b/src/test/util_tests.cpp
 4@@ -217,17 +217,9 @@ BOOST_AUTO_TEST_CASE(consteval_hex_digit)
 5 
 6 BOOST_AUTO_TEST_CASE(util_HexStr)
 7 {
 8-    BOOST_CHECK_EQUAL(
 9-        HexStr(HEX_PARSE_OUTPUT),
10-        HEX_PARSE_INPUT);
11-
12-    BOOST_CHECK_EQUAL(
13-        HexStr(Span{HEX_PARSE_OUTPUT}.last(0)),
14-        "");
15-
16-    BOOST_CHECK_EQUAL(
17-        HexStr(Span{HEX_PARSE_OUTPUT}.first(0)),
18-        "");
19+    BOOST_CHECK_EQUAL(HexStr(HEX_PARSE_OUTPUT), HEX_PARSE_INPUT);
20+    BOOST_CHECK_EQUAL(HexStr(Span{HEX_PARSE_OUTPUT}.last(0)), "");
21+    BOOST_CHECK_EQUAL(HexStr(Span{HEX_PARSE_OUTPUT}.first(0)), "");
22 
23     {
24         const std::vector<char> in_s{HEX_PARSE_OUTPUT, HEX_PARSE_OUTPUT + 5};

in src/util/strencodings.h:70 in 67fc994bed outdated

66@@ -67,6 +67,7 @@ std::vector<Byte> ParseHex(std::string_view hex_str)
67 {
68     return TryParseHex<Byte>(hex_str).value_or(std::vector<Byte>{});
69 }
70+

stickies-v commented at 10:57 am on August 19, 2024:

nit: phantom newline

stickies-v approved

stickies-v commented at 1:37 pm on August 19, 2024: contributor

ACK 67fc994bedf14e360b3e51fa1a71dc6c1684b532, but

I think it would be better to remember the commit and then just update the called sites to accept std::byte (and then use that as an excuse to change the tests one-by-one to use the new HexLiteral function) in a follow-up. Otherwise, the tests will be changed again anyway for that reason (to replace HexLiteral<uint8_t> with HexLiteral).

I agree with this approach, although I also wouldn’t object to keeping the PR as-is.

Otherwise, a few trivial nits, but nothing blocking.

in src/wallet/crypter.cpp:120 in 72256f9d78 outdated

120-    CCrypter cKeyCrypter;
121-    std::vector<unsigned char> chIV(WALLET_CRYPTO_IV_SIZE);
122-    memcpy(chIV.data(), &nIV, WALLET_CRYPTO_IV_SIZE);
123-    if(!cKeyCrypter.SetKey(vMasterKey, chIV))
124+    CCrypter key_crypter;
125+    if (!key_crypter.SetKey(master_key, {iv.data(), iv.data() + WALLET_CRYPTO_IV_SIZE}))

ryanofsky commented at 1:40 pm on August 19, 2024:

In commit “refactor: Make code more tolerant of constexpr std::arrays” (72256f9d78d2187411f55aa3f779170bed9bde1e):

Would be nice to add:

0static_assert(WALLET_CRYPTO_IV_SIZE <= iv.size());

Since it’s not obvious iv size matches WALLET_CRYPTO_IV_SIZE (I was surprised to see it’s actually twice the size).

ryanofsky approved

ryanofsky commented at 2:16 pm on August 19, 2024: contributor

Code review ACK 67fc994bedf14e360b3e51fa1a71dc6c1684b532, but it’d be fine to drop last two commits of this PR. I do think they would be improvement despite drawbacks Marco listed, but they aren’t necessary and could be saved for a followup which fixes their shortcomings.

hodlinator commented at 8:38 pm on August 19, 2024: contributor

@maflcko:

If you decide to keep the last two commits, it would be good to correct the scripted-diff, because I think it is wrong and just happens to work by accident. The replacement is HexLiteral\1<uint8_t>, where \1 refers to the original inner Byte type, for example <std::byte>. However HexLiteral<std::byte><uint8_t> wouldn’t be valid C++ code, when the scripted-diff happens to pick it up in the future.

Well spotted! I wrote it that way for it to fail compilation and prompt manual adjustments. But I’m replacing it with:

0sed -i --regexp-extended 's/\bParseHex(\("[^"]*"\))/HexLiteral<uint8_t>\1/g' $(git grep -l ParseHex -- :src ':(exclude)src/test/util_tests.cpp')
1sed -i --regexp-extended 's/\bParseHex<std::byte>(\("[^"]*"\))/HexLiteral\1/g' $(git grep -l ParseHex -- :src ':(exclude)src/test/util_tests.cpp')
2sed -i --regexp-extended 's/\bParseHex(<[^>]*>)(\("[^"]*"\))/HexLiteral\1\2/g' $(git grep -l ParseHex -- :src ':(exclude)src/test/util_tests.cpp')

…which should :tm: not require manual adjustments.

hodlinator force-pushed on Aug 19, 2024

hodlinator commented at 9:09 pm on August 19, 2024: contributor

The major change in the latest push is that I’ve broken out the changes to CCrypter + tests into several more commits (code improvement, de-Hungarianization rename, vector->span, separate XOnlyPubKey) to try to clarify what is actually happening. Suggested by maflcko.

I’ve implemented ScriptFromHex in terms of a modified ToScript in a last attempt before maflcko gets to veto it.

Done some util_tests.cpp adjustments as suggested by stickies-v & paplorinc. Added ryanofsky’s assert with some <type_traits> adjustments.

Also improved the scripted diff as mentioned in the message right above.

l0rinc commented at 9:19 pm on August 19, 2024: contributor

Thanks for your patience, re-reviewed everything from scratch - the focused commits help a lot. ACK 80a596ecca5df9f471d9fbcd9fcd15ddd296cdca

DrahtBot requested review from ryanofsky on Aug 19, 2024

DrahtBot requested review from stickies-v on Aug 19, 2024

DrahtBot requested review from maflcko on Aug 19, 2024

hodlinator force-pushed on Aug 20, 2024

hodlinator commented at 7:38 am on August 20, 2024: contributor

Apologies for another push.

Noticed that some of the commit messages were incorrect/imprecise: “Only touching functions that will be modified in upcoming 2 commits.” -> “Only touching functions that will be modified in next commit.” (A prior local version had code fixups happening after the de-Hungarianization and vector->span, which wasn’t as smooth). “Lines will be touched in upcoming commits within this PR.” -> “Lines will be touched in next 2 commits.”

Content change: The de-Hungarianization commit was adding a space after if, which now happens in the commit before.

in src/wallet/crypter.cpp:101 in 08a880de6c outdated

101 
102     AES256CBCDecrypt dec(vchKey.data(), vchIV.data(), true);
103-    nLen = dec.Decrypt(vchCiphertext.data(), vchCiphertext.size(), vchPlaintext.data());
104-    if(nLen == 0)
105+    int nLen = dec.Decrypt(vchCiphertext.data(), vchCiphertext.size(), vchPlaintext.data());
106+    if (nLen == 0)

maflcko commented at 7:43 am on August 20, 2024:

Content change: The de-Hungarianization commit was adding a space after if, which now happens in the commit before.

style nit in 08a880de6c94bc84c1f43a3845e1645d3eb67607: Not sure about adding missing space to lines in one commit and then modifying the lines again in a rename commit. I think adding “missing” space can be done in the rename commit. (Same for the other clang-format changes in “unrelated lines” in commit 08a880de6c94bc84c1f43a3845e1645d3eb67607) (But just a style nit)

hodlinator commented at 8:18 am on August 20, 2024:

The commit is modifying the line just above the if, and then the second commit modifies both (and it keeps the second commit a pure rename).

Is the concern that devs running git blame will have to exclude multiple commits?

maflcko commented at 8:30 am on August 20, 2024:

Is the concern that devs running git blame will have to exclude multiple commits?

Yes. Repeatedly modifying the same lines of code can not be avoided sometimes, but seems better to minimize, because:

(as you say) It increases the git blame depth, and decreases the usefulness of the git log -S parity search (or similar tools)
Reviewers will have to look at the same line twice, which may take more time

Obviously this conflicts with the desire to keep unrelated changes in separate commits, but I think changing whitespace can almost always be done in the commit that touches the line. (At least in this change, it seems applicable)

l0rinc commented at 11:32 am on August 20, 2024:

I think adding “missing” space can be done in the rename commit.

Agree, would have preferred it that way as well. But to be fair, git can ignore whitespaces during blame: https://git-scm.com/docs/git-blame#Documentation/git-blame.txt--w

in src/wallet/crypter.cpp:122 in 8ad548f389 outdated

122-    std::vector<unsigned char> chIV{nIV.begin(), nIV.begin() + WALLET_CRYPTO_IV_SIZE};
123-    if (!cKeyCrypter.SetKey(vMasterKey, chIV))
124+    CCrypter key_crypter;
125+    static_assert(WALLET_CRYPTO_IV_SIZE <= std::remove_reference<decltype(iv)>::type::size());
126+    std::vector<unsigned char> iv_prefix{iv.begin(), iv.begin() + WALLET_CRYPTO_IV_SIZE};
127+    if (!key_crypter.SetKey(master_key, iv_prefix))

maflcko commented at 7:46 am on August 20, 2024:

style nit in 8ad548f389f6abd16db39dadafe3fbeefaedec3a: (up to you) If you want, you can add the missing {} after all if, according to the dev notes, in this commit. Feel free to ignore.

hodlinator commented at 8:26 am on August 20, 2024:

Aha, hadn’t fully internalized that braces were only allowed to be skip if the then-statement appeared on the same line.

Will fix here and possibly other places if I retouch.

Guess you prefer not hoisting up the return false; onto the same line since it disrupts git blame?

hodlinator commented at 10:31 am on August 20, 2024:

Done in latest push without any hoisting.

in src/test/util_tests.cpp:211 in 2c68649f09 outdated

218+    BOOST_CHECK_EQUAL(HexStr(Span{HEX_PARSE_OUTPUT}.last(0)), "");
219+    BOOST_CHECK_EQUAL(HexStr(Span{HEX_PARSE_OUTPUT}.first(0)), "");
220 
221     {
222-        const std::vector<char> in_s{ParseHex_expected, ParseHex_expected + 5};
223+        constexpr char out_exp[] = "04678afdb0";

maflcko commented at 7:49 am on August 20, 2024:

style nit in 2c68649f0932dac146b941fb6b4e2fbc52fbd8a0: Not sure about switching string/string_views to a raw C-array in C++ code when there is no reason for it. Modern compilers can optimize away sting/string_view sizes on literals, so my preference would be to not change the type here.

Example:

0int main() {
1    std::string sv{"04678afdb0"};
2    return sv.size();
3}

Optimized into:

0int main() {
1    return 10;
2}

on modern compilers.

hodlinator commented at 8:44 am on August 20, 2024:

My goal was to make it constexpr. (MakeUCharSpan and MakeByteSpan used on the lines below are faux-constexpr since they call functions that cannot be constexpr as they use reinterpret_cast).

A possible but noisier alternative, keeping string would be:

0static constexpr std::string out_exp{"04678afdb0"};

string_view might be a good compromise:

0constexpr std::string_view out_exp{"04678afdb0"};

I general having a plain C-array is more terse and narrows what the programmer (and compiler) can expect the type to do, so I prefer it in cases like this. But will switch to string_view if I retouch.

maflcko commented at 9:05 am on August 20, 2024:

I general having a plain C-array is more terse and narrows what the programmer (and compiler) can expect the type to do, so I prefer it in cases like this. But will switch to string_view if I retouch.

I guess I disagree that C-arrays should be preferred. Let’s recall the beginning of this pull request (https://github.com/bitcoin/bitcoin/pull/30377#issuecomment-2208857200) which fixed an issue where a raw C-pointer was passed incorrectly. (C-arrays allow silent conversion into a raw C-pointer, so at least std::array should be used, if you really want to use arrays).

Also, sizeof (which is called one line below) on the raw C-array may or may not count the null terminator, depending on how the array was constructed, which seems like another footgun.

Though, your code is correct and this is just a style nit, so anything is fine here.

hodlinator commented at 10:31 am on August 20, 2024:

Agree sizeof was more of a footgun.

hodlinator force-pushed on Aug 20, 2024

in src/wallet/crypter.cpp:124 in e29741857e outdated

124-    CCrypter cKeyCrypter;
125-    std::vector<unsigned char> chIV(WALLET_CRYPTO_IV_SIZE);
126-    memcpy(chIV.data(), &nIV, WALLET_CRYPTO_IV_SIZE);
127-    if(!cKeyCrypter.SetKey(vMasterKey, chIV))
128+    CCrypter key_crypter;
129+    static_assert(WALLET_CRYPTO_IV_SIZE <= std::remove_reference<decltype(iv)>::type::size());

stickies-v commented at 10:41 am on August 20, 2024:

nit: std::remove_reference_t

0    static_assert(WALLET_CRYPTO_IV_SIZE <= std::remove_reference_t<decltype(iv)>::size());

hodlinator commented at 11:41 am on August 20, 2024:

Thanks, taking that!

in src/wallet/crypter.cpp:103 in bad1ffb725 outdated

102+    plaintext.resize(ciphertext.size());
103 
104     AES256CBCDecrypt dec(vchKey.data(), vchIV.data(), true);
105-    int nLen = dec.Decrypt(vchCiphertext.data(), vchCiphertext.size(), vchPlaintext.data());
106-    if(nLen == 0)
107+    int len = dec.Decrypt(ciphertext.data(), ciphertext.size(), plaintext.data());

stickies-v commented at 11:16 am on August 20, 2024:

nit to reduce variable scope and make it slightly more readable:

 0diff --git a/src/wallet/crypter.cpp b/src/wallet/crypter.cpp
 1index a574c8909e..a4b1fd9b3b 100644
 2--- a/src/wallet/crypter.cpp
 3+++ b/src/wallet/crypter.cpp
 4@@ -100,12 +100,11 @@ bool CCrypter::Decrypt(const std::vector<unsigned char>& ciphertext, CKeyingMate
 5     plaintext.resize(ciphertext.size());
 6 
 7     AES256CBCDecrypt dec(vchKey.data(), vchIV.data(), true);
 8-    int len = dec.Decrypt(ciphertext.data(), ciphertext.size(), plaintext.data());
 9-    if (len == 0) {
10-        return false;
11+    if (int len{dec.Decrypt(ciphertext.data(), ciphertext.size(), plaintext.data())}) {
12+        plaintext.resize(len);
13+        return true;
14     }
15-    plaintext.resize(len);
16-    return true;
17+    return false;
18 }
19 
20 bool EncryptSecret(const CKeyingMaterial& vMasterKey, const CKeyingMaterial &vchPlaintext, const uint256& nIV, std::vector<unsigned char> &vchCiphertext)

hodlinator commented at 11:41 am on August 20, 2024:

Started changing it based on your comment, but as there is already another if-return false above in the untouched code, I think it’s more consistent to keep as-is.

stickies-v approved

stickies-v commented at 11:17 am on August 20, 2024: contributor

re-ACK e29741857e900d933b5cf0fb22e3a63bfa1ecd6a

DrahtBot requested review from maflcko on Aug 20, 2024

DrahtBot requested review from l0rinc on Aug 20, 2024

maflcko added the label DrahtBot Guix build requested on Aug 20, 2024

hodlinator force-pushed on Aug 20, 2024

l0rinc commented at 11:47 am on August 20, 2024: contributor

reACK c3fa29db1b05aa51f74cc8be5bdae59be4f3b7c0

https://github.com/bitcoin/bitcoin/compare/80a596ecca5df9f471d9fbcd9fcd15ddd296cdca..c3fa29db1b05aa51f74cc8be5bdae59be4f3b7c0:

Removed many <uint8_t> inside ToScript
out_exp -> string_view
braces in wallet_crypto_tests.cpp and crypter.cpp
remove_reference -> remove_reference_t

DrahtBot requested review from stickies-v on Aug 20, 2024

stickies-v commented at 12:16 pm on August 20, 2024: contributor

re-ACK c3fa29db1b05aa51f74cc8be5bdae59be4f3b7c0

l0rinc commented at 2:45 pm on August 20, 2024: contributor

does not make the binary smaller (built x86_64-linux-gnu guix bitcoind binary) - actually grows by 0.04% (~49.6 KB).

Not sure if it’s a dealbreaker of not, but it seems that when compiling with GCC with -O1 we’re not actually storing the hexadecimal values as an array of bytes, but rather as separate stack pushes, which might be the reason for the size increase, i.e.

0HexLiteral("04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f")

Seems to produce

0    mov     BYTE PTR [rbp-80], 4
1    mov     BYTE PTR [rbp-79], 103
2    mov     BYTE PTR [rbp-78], -118
3    mov     BYTE PTR [rbp-77], -3
4    ...
5    mov     BYTE PTR [rbp-18], -15
6    mov     BYTE PTR [rbp-17], 29
7    mov     BYTE PTR [rbp-16], 95

See: https://godbolt.org/z/7xGb1oYvq

 0main:
 1    push    rbp
 2    mov     rbp, rsp
 3    sub     rsp, 80
 4    mov     dword ptr [rbp - 4], 0
 5    lea     rdi, [rbp - 69]
 6    lea     rsi, [rip + .L__const.main.hex_value]
 7    mov     edx, 65
 8    call    memcpy@PLT
 9    xor     eax, eax
10    add     rsp, 80
11    pop     rbp
12    ret
13
14.L__const.main.hex_value:
15    .ascii  "\004g\212\375\260\376UH'\031g\361\246q0\267\020\\\326\250(\3409\t\246yb\340\352\037a\336\266I\366\274?L\3578\304\363U\004\345\036\301\022\336\\8M\367\272\013\215W\212Lp+k\361\035_"

in src/test/script_tests.cpp:1361 in af751d7030 outdated

1354@@ -1355,10 +1355,16 @@ BOOST_AUTO_TEST_CASE(script_GetScriptAsm)
1355     BOOST_CHECK_EQUAL(derSig + "83 " + pubKey, ScriptToAsmStr(CScript() << ToByteVector(ParseHex(derSig + "83")) << vchPubKey));
1356 }
1357 
1358+template <typename T>
1359+CScript ToScript(const T& byte_container)
1360+{
1361+    auto container{MakeUCharSpan(byte_container)};

ryanofsky commented at 3:30 pm on August 20, 2024:

In commit “refactor: add util::HexLiteral and util::Vec using statements” (af751d7030ccb5c2e3e699bd90655a53529953a3)

Renaming container variable to span would seem clearer, since this is now a span not a generic container.

Also this commit message is out of date since it is no longer using util::Vec, and the commit is no longer mostly about adding using statements. I’d probably just call it “Prepare for HexLiteral scripted-diff”

in src/wallet/test/wallet_crypto_tests.cpp:76 in 7713b5fb5c outdated

102 {
103-    TestEncryptSingle(crypt, CKeyingMaterial{vchPlaintextIn.begin(), vchPlaintextIn.end()}, vchCiphertextCorrect);
104-    for(std::vector<unsigned char>::const_iterator i(vchPlaintextIn.begin()); i != vchPlaintextIn.end(); ++i)
105-        TestEncryptSingle(crypt, CKeyingMaterial(i, vchPlaintextIn.end()));
106+    TestEncryptSingle(crypt, CKeyingMaterial{plaintext.begin(), plaintext.end()}, correct_ciphertext);
107+    for (auto it{plaintext.begin()}; it != plaintext.end(); ++it) {

ryanofsky commented at 3:40 pm on August 20, 2024:

In commit “refactor: de-Hungarianize CCrypter” (7713b5fb5c5bf0897037b9857a4adfbbc7e8db56)

Commit message seems to suggest that this is only changing names and adding braces, but this is also switching an iterator type to auto. Might be good to mention in commit message, assuming this change wasn’t intended for a different commit.

in src/bench/bech32.cpp:17 in 288f995f71 outdated

 9@@ -10,12 +10,13 @@
10 #include <string>
11 #include <vector>
12 
13+using util::HexLiteral;
14 
15 static void Bech32Encode(benchmark::Bench& bench)
16 {
17-    std::vector<uint8_t> v = ParseHex("c97f5a67ec381b760aeaf67573bc164845ff39a3bb26a1cee401ac67243b48db");
18+    constexpr std::array<uint8_t, 32> v{HexLiteral<uint8_t>("c97f5a67ec381b760aeaf67573bc164845ff39a3bb26a1cee401ac67243b48db")};

ryanofsky commented at 4:28 pm on August 20, 2024:

In commit “refactor: Hand-replace some ParseHex -> util::HexLiteral” (288f995f71fd8e6f750d668e47237e64d893cc1f)

I feel like commit message could benefit from a summary of the changes it is making. Maybe would suggest:

The following scripted-diff commit will replace ParseHex(...) with HexLiteral<uint8_t>(...) but replacement will not work in cases where vectors are needed instead of arrays, and is not ideal in cases where std::byte can be used instead of uint8_t, so make replacements in these cases manually.

ryanofsky approved

ryanofsky commented at 4:37 pm on August 20, 2024: contributor

Code review ACK c3fa29db1b05aa51f74cc8be5bdae59be4f3b7c0. Main changes since last review were cleanup commits being splitup, and ToScript being changed to accept std::byte so it works better with HexLiteral

sipa commented at 4:55 pm on August 20, 2024: member

@paplorinc

when compiling with GCC with -O1

I don’t think we care about performance/binary properties when compiling with anything below -O2.

l0rinc commented at 5:03 pm on August 20, 2024: contributor

I don’t think we care about performance/binary properties when compiling with anything below -O2.

But the final binary seems to be bigger than before - @hodlinator, did you compare the before/after with -O3?

hodlinator commented at 6:42 pm on August 20, 2024: contributor

But the final binary seems to be bigger than before - @hodlinator, did you compare the before/after with -O3? @paplorinc I compared before/after using a guix-build since that should be representative, don’t know which optimization level it has. Updated the relative size increase in the PR summary ~6 hours ago, based on the latest push (c3fa29db1b05aa51f74cc8be5bdae59be4f3b7c0). +0.04% seems okay if you ask me.

DrahtBot commented at 8:31 pm on August 20, 2024: contributor

Guix builds (on x86_64) [untrusted test-only build, possibly unsafe, not for production use]

File	commit d79ea809d28197b1b4e3748aa1715272b53601d0(master)	commit e769f5579d7b0e147b6e1c9fbacb855d4d9cf679(master and this pull)
SHA256SUMS.part	`3d911b898e9ed91f...`	`d62651e018619c11...`
*-aarch64-linux-gnu-debug.tar.gz	`f334765cfd2ddfad...`	`73218910dbc0f0f4...`
*-aarch64-linux-gnu.tar.gz	`71815acb3280990d...`	`54f18176f2f90839...`
*-arm-linux-gnueabihf-debug.tar.gz	`04f24e107a1964ca...`	`f2be579f2b2499a1...`
*-arm-linux-gnueabihf.tar.gz	`5c9d374a903ee25c...`	`10d9657ef8ad1c53...`
*-arm64-apple-darwin-unsigned.tar.gz	`557d7dff56ee8e09...`	`4ebe258f0aea2a52...`
*-arm64-apple-darwin-unsigned.zip	`abeac21881c99a45...`	`f450e81f40976e67...`
*-arm64-apple-darwin.tar.gz	`e01f42186a42b4cf...`	`94a976ffa8dcaf05...`
*-powerpc64-linux-gnu-debug.tar.gz	`92c87b02f1f7902e...`	`56053c8c11417f64...`
*-powerpc64-linux-gnu.tar.gz	`d380db9025c9edb2...`	`967c7f121efe4315...`
*-riscv64-linux-gnu-debug.tar.gz	`efc52e2d31119579...`	`dd945e671aa7dd10...`
*-riscv64-linux-gnu.tar.gz	`13d42c6099da8cf0...`	`a95bc2166ae09e63...`
*-x86_64-apple-darwin-unsigned.tar.gz	`69766bcdb0e7fbe4...`	`082bda236d4804c1...`
*-x86_64-apple-darwin-unsigned.zip	`476c4d4b5f427258...`	`017d0f23e006ad5d...`
*-x86_64-apple-darwin.tar.gz	`0c22dd8521ff4994...`	`245ad23101d2b8f1...`
*-x86_64-linux-gnu-debug.tar.gz	`c89a5bfd3eb1648c...`	`3570a9db9b29ef88...`
*-x86_64-linux-gnu.tar.gz	`439f2639afcc8383...`	`9f6368f903a294f2...`
*.tar.gz	`dca2b46624c3038f...`	`c63b6302c66f7789...`
guix_build.log	`21734aba68dd5df4...`	`4a82c98c2e2c1996...`
guix_build.log.diff		`53d56c56ccd829f2...`

DrahtBot removed the label DrahtBot Guix build requested on Aug 20, 2024

l0rinc commented at 9:09 pm on August 20, 2024: contributor

+0.04% seems okay if you ask me.

Maybe, but I was expecting smaller, that’s why I started digging.

maflcko commented at 8:25 am on August 21, 2024: member

An O2 godbolt would be https://godbolt.org/z/Kqxe6353P

+0.04% seems okay if you ask me.

Are you sure? I checked the DrahtBot guix build and everything I checked was smaller, but maybe the build is malicious or I made a mistake.

hodlinator commented at 8:30 am on August 21, 2024: contributor

This is how I checked the sizes of current tip vs base:

0$ ls -al guix-build-c3fa29db1b05/distsrc-c3fa29db1b05-x86_64-linux-gnu/src/bitcoind
1-rwxr-xr-x 1 hodlinator users 115048128 aug 20 14:10 guix-build-c3fa29db1b05/distsrc-c3fa29db1b05-x86_64-linux-gnu/src/bitcoind
2
3$ ls -al guix-build-5fdbc8b4ee60/distsrc-5fdbc8b4ee60-x86_64-linux-gnu/src/bitcoind
4-rwxr-xr-x 1 hodlinator users 114998544 aug 13 11:29 guix-build-5fdbc8b4ee60/distsrc-5fdbc8b4ee60-x86_64-linux-gnu/src/bitcoind

maflcko commented at 8:39 am on August 21, 2024: member

I see, so I guess you are counting the increase in the debug symbols, which seems plausible, given that the pull request includes refactoring changes to introduce more implicit or explicit function or constructor calls (vector->span conversions, as well as newly introduced array->vector conversions)

hodlinator commented at 9:06 am on August 21, 2024: contributor

Aha.. was assuming Guix builds weren’t producing debug symbols.. do the non-debug archives under /output/ contain the release-binaries? Results after untaring them:

0$ ls -al guix-build-c3fa29db1b05/output/x86_64-linux-gnu/bitcoin-c3fa29db1b05/bin/bitcoind 
1-rwxr-xr-x 1 hodlinator users 16509032 jan  1  1980 guix-build-c3fa29db1b05/output/x86_64-linux-gnu/bitcoin-c3fa29db1b05/bin/bitcoind
2
3$ ls -al guix-build-5fdbc8b4ee60/output/x86_64-linux-gnu/bitcoin-5fdbc8b4ee60/bin/bitcoind 
4-rwxr-xr-x 1 hodlinator users 16509032 jan  1  1980 guix-build-5fdbc8b4ee60/output/x86_64-linux-gnu/bitcoin-5fdbc8b4ee60/bin/bitcoind
5
6$ diff guix-build-c3fa29db1b05/output/x86_64-linux-gnu/bitcoin-c3fa29db1b05/bin/bitcoind guix-build-5fdbc8b4ee60/output/x86_64-linux-gnu/bitcoin-5fdbc8b4ee60/bin/bitcoind
7Binary files guix-build-c3fa29db1b05/output/x86_64-linux-gnu/bitcoin-c3fa29db1b05/bin/bitcoind and guix-build-5fdbc8b4ee60/output/x86_64-linux-gnu/bitcoin-5fdbc8b4ee60/bin/bitcoind differ

hodlinator force-pushed on Aug 21, 2024

hodlinator renamed this:
~~refactor: Replace ParseHex with consteval HexLiteral~~
refactor: Replace ParseHex with consteval ""_hex literals
on Aug 21, 2024

hodlinator force-pushed on Aug 21, 2024

DrahtBot added the label CI failed on Aug 21, 2024

DrahtBot commented at 1:25 pm on August 21, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/29056977746

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

hodlinator force-pushed on Aug 21, 2024

in src/util/strencodings.h:413 in 4672a451ef outdated

408+ * @note It may be preferable to use _hex_v instead of _hex to save stack space
409+ * when declaring a local variable, if the hex string is large. Alternately the
410+ * variable could be declared constexpr to avoid using stack space.
411+ */
412+template <util::detail::Hex str>
413+consteval auto operator"" _hex() { return str.bytes; }

ryanofsky commented at 3:19 pm on August 21, 2024:

In commit “util: Add consteval “"_hex[_v][_u8] literals” (4672a451efe65ed190b943be3cb2646344d06b56)

I think I’d suggest putting these in util namespace to avoid risk of name collisions, especially since the util library is used by libbitcoinkernel and can be used externally.

0namespace util {
1inline namespace hex_literals {
2// ...consteval auto operator""_hex...
3} // inline namespace hex_literals
4} // namespace util

Should allow code to use these by adding using namespace util::hex_literals if I am reading https://stackoverflow.com/questions/38950008/using-string-literals-without-using-namespace-std correctly

ryanofsky commented at 3:23 pm on August 21, 2024:

In commit “util: Add consteval “"_hex[_v][_u8] literals” (4672a451efe65ed190b943be3cb2646344d06b56)

I think from https://en.cppreference.com/w/cpp/language/user_literal#Literal_operators you want to get rid of the space between "" and _hex, because that seems to be a deprecated way of declaring literals. Though I’m not really clear why. (I found discussion about the space in https://github.com/fmtlib/fmt/issues/3607 though that didn’t really clear things up either.)

DrahtBot removed the label CI failed on Aug 21, 2024

in src/util/strencodings.h:408 in 4672a451ef outdated

403+} // namespace util
404+
405+/**
406+ * Converts from hex string literal to std::array<std::byte, N> at compile time.
407+ *
408+ * @note It may be preferable to use _hex_v instead of _hex to save stack space

ryanofsky commented at 4:06 pm on August 21, 2024:

In commit “util: Add consteval “"_hex[_v][_u8] literals” (4672a451efe65ed190b943be3cb2646344d06b56)

I think comment could also warn that difference between _hex_v and _hex suffixes is important when serializing, because vectors are always assumed to be variable-length and serialized with a size prefix, while arrays are considered fixed length and serialized without a prefix.

in src/net_processing.cpp:3937 in ef20ce0586 outdated

3930@@ -3931,8 +3931,8 @@ void PeerManagerImpl::ProcessMessage(CNode& pfrom, const std::string& msg_type,
3931 
3932         // If the peer is old enough to have the old alert system, send it the final alert.
3933         if (greatest_common_version <= 70012) {
3934-            const auto finalAlert{ParseHex("60010000000000000000000000ffffff7f00000000ffffff7ffeffff7f01ffffff7f00000000ffffff7f00ffffff7f002f555247454e543a20416c657274206b657920636f6d70726f6d697365642c2075706772616465207265717569726564004630440220653febd6410f470f6bae11cad19c48413becb1ac2c17f908fd0fd53bdc3abd5202206d0e9c96fe88d4a0f01ed9dedae2b6f9e00da94cad0fecaae66ecf689bf71b50")};
3935-            MakeAndPushMessage(pfrom, "alert", Span{finalAlert});
3936+            constexpr auto finalAlert{"60010000000000000000000000ffffff7f00000000ffffff7ffeffff7f01ffffff7f00000000ffffff7f00ffffff7f002f555247454e543a20416c657274206b657920636f6d70726f6d697365642c2075706772616465207265717569726564004630440220653febd6410f470f6bae11cad19c48413becb1ac2c17f908fd0fd53bdc3abd5202206d0e9c96fe88d4a0f01ed9dedae2b6f9e00da94cad0fecaae66ecf689bf71b50"_hex};
3937+            MakeAndPushMessage(pfrom, "alert", finalAlert);

ryanofsky commented at 4:20 pm on August 21, 2024:

In commit “util: Add consteval “"_hex[_v][_u8] literals” (4672a451efe65ed190b943be3cb2646344d06b56)

I think commit message is not explaining reasons why different types of literals are being chosen, and not pointing out dangers of choosing the wrong type. For example, I think if _hex_v were used instead of _hex on this line, the code would still compile, but an improperly formatted network message would be sent.

Would suggest:

refactor: Hand-replace some ParseHex -> “"_hex

A subsequent scripted-diff commit will replace ParseHex(”…”) with “…"_hex_u8, but this replacement will not work in cases where vectors are needed instead of arrays, and is not ideal in cases where std::byte is accepted. For example, it is currently necessary to use _hex_v_u8 when calling CScript operator« because that operator does not currently support std::array or std::byte. Conversely, it is incorrect to use _hex_v instead of _hex in net_processing.cpp for the MakeAndPushMessage argument, because if the argument is a std::vector it is considered variable-length and serialized with a size prefix, but if the argument is a std::array or Span is it considered fixed length and serialized without a prefix. By the same logic, it is also safe to change the NUMS_H constant in pubkey.cpp from a std::vector to std::array because it is never serialized.

ryanofsky approved

ryanofsky commented at 4:38 pm on August 21, 2024: contributor

Code review ACK c139a788e0052ece8f6c5689e4cd04b406032875. Switched to template literals since last review, so final state of this is a lot nicer. I like the new choice of suffixes, they seem to provide clarity and convenience.

I did leave one code suggestion to move literal operators to an inline namespace, but it could easily be a followup if it would complicate this PR, since it should only add new code.

l0rinc commented at 5:23 pm on August 21, 2024: contributor

ACK c139a788e0052ece8f6c5689e4cd04b406032875

hodlinator force-pushed on Aug 21, 2024

hodlinator commented at 8:10 pm on August 21, 2024: contributor

Thanks @ryanofsky, incorporated your latest feedback in 7a4d249267cb5f63ace96a5fcc03452acc5468b5.

l0rinc commented at 8:26 pm on August 21, 2024: contributor

namespace + formatting changes ACK 7a4d249267cb5f63ace96a5fcc03452acc5468b5

DrahtBot requested review from ryanofsky on Aug 21, 2024

maflcko added the label DrahtBot Guix build requested on Aug 22, 2024

in src/util/strencodings.h:410 in b8b10cea4f outdated

405+/**
406+ * Converts from hex string literal to std::array<std::byte, N> at compile time.
407+ *
408+ * @note It may be preferable to use _hex_v instead of _hex to save stack space
409+ * when declaring a local variable, if the hex string is large. Alternately the
410+ * variable could be declared constexpr to avoid using stack space.

ryanofsky commented at 2:50 pm on August 22, 2024:

In commit “util: Add consteval “"_hex[_v][_u8] literals” (b8b10cea4f4331b5f587cfee6ac401fbbf0697a5)

I think this comment is still missing a lot of important information. It doesn’t mention that this is an alternative to ParseHex, and only mentions one of variant suffixes below, not describing the real considerations required to choose between these alternatives. The commit message does have this information, but it is unlikely someone trying use these functions will see the commit message. Would suggest improving local documentation here and probably dropping anything redundant in the commit message. I think the following would be a good comment:

 0/**
 1 * ""_hex is a compile-time user-defined literal returning a
 2 * `std::array<std::byte>`, equivalent to `ParseHex`. Variants include:
 3 *
 4 * - ""_hex_v: Returns `std::vector<std::byte>`, useful for heap allocation or
 5 *   variable-length serialization.
 6 *
 7 * - ""_hex_u8: Returns `std::array<uint8_t>`, for cases where `std::byte` is
 8 *   incompatible.
 9 *
10 * - ""_hex_v_u8: Returns `std::vector<uint8_t>`, combining heap allocation with
11 *   `uint8_t`.
12 *
13 * [@warning](/bitcoin-bitcoin/contributor/warning/) It may be necessary to use _v variants when serializing, or vice
14 *   versa, because vectors are assumed to be variable-length and serialized
15 *   with a size prefix, while arrays are considered fixed length and serialized
16 *   with no prefix.
17 *
18 * [@warning](/bitcoin-bitcoin/contributor/warning/) It may be preferable to use _v variants to save stack space when
19 *   declaring local variables if hex strings are large. Alternately variables
20 *   could be declared constexpr to avoid using stack space.
21 *
22 * [@warning](/bitcoin-bitcoin/contributor/warning/) Avoid _u8 variants when not necessary, as the codebase migrates to
23 *   use `std::byte` instead of `unsigned char` and `uint8_t``.
24 */

hodlinator commented at 8:43 pm on August 22, 2024:

 0--- before	2024-08-22 22:41:41.949431649 +0200
 1+++ after	2024-08-22 22:41:56.957571134 +0200
 2@@ -1,7 +1,7 @@
 3-/**
 4+/** 
 5  * ""_hex is a compile-time user-defined literal returning a
 6- * `std::array<std::byte>`, equivalent to `ParseHex`. Variants include:
 7- *
 8+ * `std::array<std::byte>`, equivalent to ParseHex(). Variants provided:
 9+ * 
10  * - ""_hex_v: Returns `std::vector<std::byte>`, useful for heap allocation or
11  *   variable-length serialization.
12  *
13@@ -21,5 +21,8 @@
14  *   could be declared constexpr to avoid using stack space.
15  *
16  * [@warning](/bitcoin-bitcoin/contributor/warning/) Avoid _u8 variants when not necessary, as the codebase migrates to
17- *   use `std::byte` instead of `unsigned char` and `uint8_t``.
18+ *   use `std::byte` instead of `unsigned char` and `uint8_t`.
19+ * 
20+ * [@note](/bitcoin-bitcoin/contributor/note/) The reason ""_hex uses `std::array` is because a heap-based
21+ *   `std::vector` cannot cross the compile-time/runtime barrier.
22  */

hodlinator commented at 8:44 pm on August 22, 2024:

Just spotted the whitespace at end of line.. fixing. :)

(And yes this made me set my editor to remove trailing spaces by default).

hodlinator commented at 8:49 pm on August 22, 2024:

(“ParseHex()” becomes a link in Doxygen).

ryanofsky approved

ryanofsky commented at 3:16 pm on August 22, 2024: contributor

Code review ACK 7a4d249267cb5f63ace96a5fcc03452acc5468b5. Changes since last review were just adding hex_literals namespace and improving comments as suggested (thanks!)

hodlinator force-pushed on Aug 22, 2024

DrahtBot commented at 8:36 pm on August 22, 2024: contributor

Guix builds (on x86_64) [untrusted test-only build, possibly unsafe, not for production use]

File	commit bc87ad98543299e1990ee1994d0653df3ac70093(master)	commit 3d1c8b1143f71644ab6339c535888efb5bf6e502(master and this pull)
SHA256SUMS.part	`5a610e983a4fcc01...`	`8530068160c65660...`
*-aarch64-linux-gnu-debug.tar.gz	`c0172136d30ba6d9...`	`dd9cd54dbac35062...`
*-aarch64-linux-gnu.tar.gz	`e839a3d864b3df80...`	`a9cbec45c80c4a41...`
*-arm-linux-gnueabihf-debug.tar.gz	`0fb0f55e2fc0ddb3...`	`fb7cf87f23cc9f85...`
*-arm-linux-gnueabihf.tar.gz	`d999afba72e1f77b...`	`3a572311975c9f94...`
*-arm64-apple-darwin-unsigned.tar.gz	`8ead23a66a5db4ba...`	`d9ff133f276d8d04...`
*-arm64-apple-darwin-unsigned.zip	`d339e2be0a131b6f...`	`8aacc66cb52e2906...`
*-arm64-apple-darwin.tar.gz	`d83c4fd2a0d0ec3d...`	`778848da9139613c...`
*-powerpc64-linux-gnu-debug.tar.gz	`68106d41333bfc6d...`	`5b3f721fbaf77ef9...`
*-powerpc64-linux-gnu.tar.gz	`d4e70fa8303a3d41...`	`81a446b39c932895...`
*-riscv64-linux-gnu-debug.tar.gz	`15fc2c6a2ffb489f...`	`51cdf9ef4c293ac2...`
*-riscv64-linux-gnu.tar.gz	`7769e1b123de3991...`	`1cfb9e47d7d45d61...`
*-x86_64-apple-darwin-unsigned.tar.gz	`0fe2a4cd1d6f4a65...`	`667c61801612cd75...`
*-x86_64-apple-darwin-unsigned.zip	`6595b69b243f19e0...`	`c80173bbac4fbe11...`
*-x86_64-apple-darwin.tar.gz	`fd8bde8242b702b9...`	`4c87a8468daf789f...`
*-x86_64-linux-gnu-debug.tar.gz	`973533336d1e65d8...`	`4069401f6f409655...`
*-x86_64-linux-gnu.tar.gz	`7792be2c3dc1a33e...`	`0cde2435a0a4f0a2...`
*.tar.gz	`4e4718eaffc52514...`	`961cb39cebf18bfa...`
guix_build.log	`d28572665bf76fdb...`	`1fcc45c4c0bdf649...`
guix_build.log.diff		`95ccdc6c1c71e697...`

DrahtBot removed the label DrahtBot Guix build requested on Aug 22, 2024

in src/util/strencodings.h:407 in 3bf9424826 outdated

402+ * @warning It may be preferable to use _v variants to save stack space when
403+ *   declaring local variables if hex strings are large. Alternately variables
404+ *   could be declared constexpr to avoid using stack space.
405+ *
406+ * @warning Avoid _u8 variants when not necessary, as the codebase migrates to
407+ *   use `std::byte` instead of `unsigned char` and `uint8_t``.

l0rinc commented at 8:38 pm on August 22, 2024:

0 *   use `std::byte` instead of `unsigned char` and `uint8_t`.

hodlinator force-pushed on Aug 22, 2024

DrahtBot added the label CI failed on Aug 22, 2024

DrahtBot commented at 8:40 pm on August 22, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/29134897722

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

hodlinator force-pushed on Aug 22, 2024

l0rinc commented at 8:49 pm on August 22, 2024: contributor

ACK df92661444f46790b12d5061344d72106ef820d9

Doc updates

DrahtBot requested review from ryanofsky on Aug 22, 2024

DrahtBot removed the label CI failed on Aug 22, 2024

ryanofsky approved

ryanofsky commented at 2:04 am on August 23, 2024: contributor

Code review ACK df92661444f46790b12d5061344d72106ef820d9. Just documentation update since last review (thanks!)

in src/util/strencodings.h:413 in df92661444 outdated

411+ */
412+inline namespace hex_literals {
413+namespace detail {
414+
415+template <size_t N>
416+struct Hex {

stickies-v commented at 10:58 am on August 23, 2024:

Is there a specific reason to put Hex in the detail namespace? I think it could be useful to have consteval functions take a Hex parameter. For example, the base_blob(string_view) ctor can be quite elegantly deduplicated:

 0diff --git a/src/uint256.h b/src/uint256.h
 1index 4124a34719..3616b0f9cb 100644
 2--- a/src/uint256.h
 3+++ b/src/uint256.h
 4@@ -16,6 +16,7 @@
 5 #include <cstdint>
 6 #include <cstring>
 7 #include <optional>
 8+#include <ranges>
 9 #include <string>
10 
11 /** Template base class for fixed-sized opaque blobs. */
12@@ -41,7 +42,13 @@ public:
13         std::copy(vch.begin(), vch.end(), m_data.begin());
14     }
15 
16-    consteval explicit base_blob(std::string_view hex_str);
17+    consteval base_blob(util::detail::Hex<WIDTH * 2 + 1> str)
18+    {
19+        std::ranges::reverse_copy(
20+            std::bit_cast<std::array<uint8_t, WIDTH>>(str.bytes),
21+            m_data.begin()
22+        );
23+    }
24 
25     constexpr bool IsNull() const
26     {
27@@ -122,17 +129,6 @@ public:
28     }
29 };
30 
31-template <unsigned int BITS>
32-consteval base_blob<BITS>::base_blob(std::string_view hex_str)
33-{
34-    if (hex_str.length() != m_data.size() * 2) throw "Hex string must fit exactly";
35-    auto str_it = hex_str.rbegin();
36-    for (auto& elem : m_data) {
37-        auto lo = util::ConstevalHexDigit(*(str_it++));
38-        elem = (util::ConstevalHexDigit(*(str_it++)) << 4) | lo;
39-    }
40-}
41-
42 namespace detail {
43 /**
44  * Writes the hex string (in reverse byte order) into a new uintN_t object
45@@ -170,7 +166,7 @@ class uint256 : public base_blob<256> {
46 public:
47     static std::optional<uint256> FromHex(std::string_view str) { return detail::FromHex<uint256>(str); }
48     constexpr uint256() = default;
49-    consteval explicit uint256(std::string_view hex_str) : base_blob<256>(hex_str) {}
50+    consteval explicit uint256(util::detail::Hex<65> hex_str) : base_blob<256>(hex_str) {}
51     constexpr explicit uint256(uint8_t v) : base_blob<256>(v) {}
52     constexpr explicit uint256(Span<const unsigned char> vch) : base_blob<256>(vch) {}
53     static const uint256 ZERO;

hodlinator commented at 5:47 pm on August 23, 2024:

Interesting idea!

Should move it out of the detail namespace. That makes the naming of Hex more important. Are people still happy with it?

ryanofsky commented at 10:51 pm on August 23, 2024:

re: #30377 (review)

This is an interesting idea. I kind of think now that we have support for these suffixes, we should just add an _rhex reverse hex suffix and drop this constructor entirely, avoid the footgun of having two constructors where one reverses its input and the other does not.

Also, IMO if adopting this suggestion it would be a little better to keep detail namespace than to expose Hex class as something that would be reasonable to use in other places. But anything should be fine.

hodlinator commented at 7:12 am on August 24, 2024:

Not sure about how _rhex would work, if it returned std::array<std::byte, N> one could accidentally substitute for _hex. Keeping Hex in detail for now though.

hodlinator commented at 1:58 pm on August 24, 2024:

Think there may still be something to @stickies-v intuition here. Maybe the consteval base_blob-ctor could somehow be implemented in terms of ""_hex_u8 reverse-copied into base_blob::m_data instead of only borrowing ConstevalHexDigit. Don’t have the stamina to reconcile Hex<N> with std::string_view right now though.

l0rinc commented at 7:28 pm on August 24, 2024:

I also like the suggestion, but there are other hex conversion methods in uint256 which could use some compile-time love - I’m fine with doing it in a separate PR.

stickies-v commented at 11:14 am on August 28, 2024:

I’m happy to keep things as is for now, this can be considered in follow-ups indeed.

ryanofsky commented at 4:28 am on August 29, 2024:

re: #30377 (review)

The reason I think it would be good to drop this constructor later is that after this PR you can write both:

0uint256{"ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"} // bytes are reversed
1uint256{"ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"_hex_u8} // bytes are not reversed

where one constructor reverses the bytes and the other constructor does not reverse the bytes, and the reverse constructor accepts an unlabeled string with no suffix. I think more ideally, we would have just one constructor, instead of two constructors with opposite behaviors.

Supporting an _rhex suffix would be one way to have a single constructor and make the argument explicit:

0uint256{"ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"_rhex} // bytes are reversed
1uint256{"ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"_hex} // bytes are not reversed

but there could be other ways like adding a constexpr Reverse function:

0uint256{Reverse("ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"_hex)} // bytes are reversed
1uint256{"ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"_hex} // bytes are not reversed

or supporting string prefixes instead of suffixes:

0uint256{"<ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"_hex)} // bytes are reversed
1uint256{">ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"_hex} // bytes are not reversed

I don’t feel strongly about these ideas but I think they would all simplify the uint256 class and make it more obvious when bytes are being reversed.

l0rinc commented at 6:49 am on August 29, 2024:

I like the _rhex vs _hex suggestion - it’s easy to navigate to the definition in the source code and check the difference. It’s a lot more awkward to check it out with an in-string prefix - since it starts with a non-standard, non-hex, arbitrary character.

hodlinator commented at 7:00 am on August 29, 2024:

Makes me think maybe _hex[_v][_u8] should be changed to return something else than raw std::array/std::vector. Perhaps some thin wrapper struct that can be implicitly converted to the inner type, but can be caught in the uint256-ctors so that ForwardHex is explicitly deleted (or handled, reversed), but ReverseHex is allowed and treated as if the uint256(Span<const unsigned char>)-ctor had been called.

in src/util/strencodings.h:397 in f9fed4d4e6 outdated

392+ *   incompatible.
393+ *
394+ * - ""_hex_v_u8: Returns `std::vector<uint8_t>`, combining heap allocation with
395+ *   `uint8_t`.
396+ *
397+ * @warning It may be necessary to use _v variants when serializing, or vice

ryanofsky commented at 1:11 pm on August 23, 2024:

In commit “util: Add consteval “"_hex[_v][_u8] literals” (f9fed4d4e6daa2bdd715b92b6bfffdbe8202a9c8)

It may be necessary to use _v variants when serializing, or vice versa, because

Would make more sense as “It could be necessary to use vector instead of array variants when serializing, or vice versa, because”

hodlinator commented at 2:13 pm on August 23, 2024:

Might do if I re-touch.

in src/util/strencodings.h:410 in f9fed4d4e6 outdated

405+ *
406+ * @warning Avoid _u8 variants when not necessary, as the codebase migrates to
407+ *   use `std::byte` instead of `unsigned char` and `uint8_t`.
408+ *
409+ * @note The reason ""_hex uses `std::array` is because a heap-based
410+ *   `std::vector` cannot cross the compile-time/runtime barrier.

ryanofsky commented at 1:22 pm on August 23, 2024:

In commit “util: Add consteval “"_hex[_v][_u8] literals” (f9fed4d4e6daa2bdd715b92b6bfffdbe8202a9c8)

Maybe change “The reason” to “One reason”. I think even if C++ did provide a way to convert constexpr vectors to runtime vectors(*) it would still make sense for this to use std::array because:

Semantically this returns a fixed size array, not a variable sized one
std::array is more flexible because array values can be located on the heap, or on stack, or in static storage, while std::vector is inflexible and can basically only allocate values on a heap
Array type is simpler and doesn’t add unnecessary overhead.

(*) C++ doesn’t provide a way to convert constexpr vectors to runtime vectors, but it could, and I assume we could even write our own ToRuntime<constexpr_vector>() function doing this conversion.

hodlinator commented at 2:16 pm on August 23, 2024:

Yeah, my goal was to contrast it against ParseHex returning a vector. Will do if I re-touch.

in src/util/strencodings.h:440 in f9fed4d4e6 outdated

438+
439+template <util::detail::Hex str>
440+constexpr auto operator""_hex_v() { return std::vector<std::byte>{str.bytes.begin(), str.bytes.end()}; }
441+
442+template <util::detail::Hex str>
443+inline auto operator""_hex_v_u8() { return std::vector<uint8_t>{UCharCast(str.bytes.data()), UCharCast(str.bytes.data() + str.bytes.size())}; }

ryanofsky commented at 1:39 pm on August 23, 2024:

In commit “util: Add consteval “"_hex[_v][_u8] literals” (f9fed4d4e6daa2bdd715b92b6bfffdbe8202a9c8)

This could be constexpr and I think switching between consteval and constexpr just makes the definitions inconsistent and adds noise. Would suggest:

 0--- a/src/util/strencodings.h
 1+++ b/src/util/strencodings.h
 2@@ -431,16 +431,16 @@ struct Hex {
 3 } // namespace detail
 4 
 5 template <util::detail::Hex str>
 6-consteval auto operator""_hex() { return str.bytes; }
 7+constexpr auto operator""_hex() { return str.bytes; }
 8 
 9 template <util::detail::Hex str>
10-consteval auto operator""_hex_u8() { return std::bit_cast<std::array<uint8_t, str.bytes.size()>>(str.bytes); }
11+constexpr auto operator""_hex_u8() { return std::bit_cast<std::array<uint8_t, str.bytes.size()>>(str.bytes); }
12 
13 template <util::detail::Hex str>
14 constexpr auto operator""_hex_v() { return std::vector<std::byte>{str.bytes.begin(), str.bytes.end()}; }
15 
16 template <util::detail::Hex str>
17-inline auto operator""_hex_v_u8() { return std::vector<uint8_t>{UCharCast(str.bytes.data()), UCharCast(str.bytes.data() + str.bytes.size())}; }
18+constexpr inline auto operator""_hex_v_u8() { return std::vector<uint8_t>{UCharCast(str.bytes.data()), UCharCast(str.bytes.data() + str.bytes.size())}; }
19 
20 } // inline namespace hex_literals
21 } // namespace util

hodlinator commented at 2:24 pm on August 23, 2024:

I want to explicitly enforce compile time where possible, hence the consteval.

In the ""_hex_v_u8 case, constexpr would be a lie, as UCharCast being called is inline, not constexpr, and calls reinterpret_cast which is not supported at compile time. (This means that UCharSpanCast and MakeUCharSpan are constexpr in name only, except for if called on a collection that is already unsigned char and doesn’t require reinterpret_cast).

stickies-v commented at 2:36 pm on August 23, 2024:

Since we’re templating these on Hex instances, this string literal approach will produce quite a rather large number of template instantiations. I wonder if keeping these consteval would also better allow the compiler to make sure these instantiations don’t end up in the binary?

I want to explicitly enforce compile time where possible, hence the consteval.

Given that they’re templated on a consteval Hex, I don’t think this suggestion would change anything there? It would, in my view, make it more confusing though, by hiding that these operators are in fact consteval. I might be missing some nuance here, though?

l0rinc commented at 3:16 pm on August 23, 2024:

makes the definitions inconsistent and adds noise

I agree that it’s likely just a documentation change for the reader (unsurprisingly, my trivial godbolt experiment also showed the same compiled output), but the methods are a bit different, isn’t it useful signal if we mark them as such?

hodlinator commented at 5:43 pm on August 23, 2024:

Since the consteval/constexpr/inline choices are not obvious, but have fairly solid reasoning behind them, I could add non-Doxygen comments like

0// Calls non-constexpr function so can only be inline
1inline auto operator""_hex_v_u8() { return std::vector<uint8_t>{UCharCast(...

What do you think?

l0rinc commented at 5:46 pm on August 23, 2024:

That wouldn’t explain to me why the compiler seems to allow it

hodlinator commented at 8:38 pm on August 23, 2024:

@stickies-v:

It would, in my view, make it more confusing though, by hiding that these operators are in fact consteval. I might be missing some nuance here, though?

Sorry for not addressing this before, was a bit rushed in my previous post. My understanding is the Hex-instantiations are consteval, and the operators are instantiated at compile time, but the last two are commonly executed at runtime (with a compile time baked Hex-value).

ryanofsky commented at 9:32 pm on August 23, 2024:

re: #30377 (review)

I want to explicitly enforce compile time where possible, hence the consteval.

I don’t think this is a good idea. IMO, the interface should simple and consistent, not noisy with unnecessary / unexplained differences that obscure more real and relevant differences between the functions. It’s not good to expose internal implementation details unnecessarily or to tell compilers how to do their jobs unnecessarily. The consteval on the hex constructor enforces that the hex is parsed into bytes at compile time, which is the thing we care about. What happens to the bytes after that, how they are stored and converted should be up to the user and the compiler.

In the “"_hex_v_u8 case, constexpr would be a lie

Oh, that’s interesting. If adding constexpr to that line would be misleading I agree it would be good to omit it, and consider adding a comment explaining the difference. I also think it would be nice to follow up and make this code consistent and working at compile or run time.

the methods are a bit different, isn’t it useful signal if we mark them as such?

If this is signaling anything it’s a confusing and muddled signal. All 4 functions evaluate their argument at compile time but can execute and return values usable at both compile time and run time (modulo the reinterpret_cast bug) so they can just be marked accurately to reflect what they do.

Current code is ok, but either trying to do something more complicated than necessary, or just not explaining clearly what it’s trying to do.

hodlinator commented at 7:17 am on August 24, 2024:

As long as Hex is consteval I’m happy. Changed the literals that could be into constexpr and added comment about the inline outlier.

ryanofsky approved

in src/test/util_tests.cpp:165 in df92661444 outdated

163+
164+    const std::vector<std::byte> hex_literal_vector{operator""_hex_v<util::detail::Hex(HEX_PARSE_INPUT)>()};
165+    hex_literal_span = MakeUCharSpan(hex_literal_vector);
166+    BOOST_CHECK_EQUAL_COLLECTIONS(hex_literal_span.begin(), hex_literal_span.end(), expected.begin(), expected.end());
167+
168+    constexpr std::array<uint8_t, 65> hex_literal_array_uint8{operator""_hex_u8<util::detail::Hex(HEX_PARSE_INPUT)>()};

stickies-v commented at 1:59 pm on August 23, 2024:

nit: could be made a bit more readable by using auto and implicitly constructing Hex:

 0diff --git a/src/test/util_tests.cpp b/src/test/util_tests.cpp
 1index 644c8ffc72..050aa2a5a9 100644
 2--- a/src/test/util_tests.cpp
 3+++ b/src/test/util_tests.cpp
 4@@ -154,15 +154,14 @@ BOOST_AUTO_TEST_CASE(parse_hex)
 5 
 6     // Basic test vector
 7     std::vector<unsigned char> expected(std::begin(HEX_PARSE_OUTPUT), std::end(HEX_PARSE_OUTPUT));
 8-    constexpr std::array<std::byte, 65> hex_literal_array{operator""_hex<util::detail::Hex(HEX_PARSE_INPUT)>()};
 9-    auto hex_literal_span{MakeUCharSpan(hex_literal_array)};
10+    auto hex_literal_span{MakeUCharSpan(operator""_hex<HEX_PARSE_INPUT>())};
11     BOOST_CHECK_EQUAL_COLLECTIONS(hex_literal_span.begin(), hex_literal_span.end(), expected.begin(), expected.end());
12 
13-    const std::vector<std::byte> hex_literal_vector{operator""_hex_v<util::detail::Hex(HEX_PARSE_INPUT)>()};
14+    const auto hex_literal_vector{operator""_hex_v<HEX_PARSE_INPUT>()};
15     hex_literal_span = MakeUCharSpan(hex_literal_vector);
16     BOOST_CHECK_EQUAL_COLLECTIONS(hex_literal_span.begin(), hex_literal_span.end(), expected.begin(), expected.end());
17 
18-    constexpr std::array<uint8_t, 65> hex_literal_array_uint8{operator""_hex_u8<util::detail::Hex(HEX_PARSE_INPUT)>()};
19+    constexpr auto hex_literal_array_uint8{operator""_hex_u8<HEX_PARSE_INPUT>()};
20     BOOST_CHECK_EQUAL_COLLECTIONS(hex_literal_array_uint8.begin(), hex_literal_array_uint8.end(), expected.begin(), expected.end());
21 
22     result = operator""_hex_v_u8<util::detail::Hex(HEX_PARSE_INPUT)>();

hodlinator commented at 8:46 pm on August 23, 2024:

Will see if all compilers are fine with removing util::detail::Hex(...) here. Hopefully they are!

I prefer documenting the full type in for these specific tests, but admit it is quite verbose.

hodlinator commented at 10:02 am on August 24, 2024:

Bummer, broke Win64.

Aiming to back out that change later today.

in src/test/key_tests.cpp:361 in df92661444 outdated

355@@ -352,7 +356,7 @@ BOOST_AUTO_TEST_CASE(key_ellswift)
356 
357 BOOST_AUTO_TEST_CASE(bip341_test_h)
358 {
359-    std::vector<unsigned char> G_uncompressed = ParseHex("0479be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8");
360+    constexpr auto G_uncompressed{"0479be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8"_hex};
361     HashWriter hw;
362     hw.write(MakeByteSpan(G_uncompressed));

stickies-v commented at 2:28 pm on August 23, 2024:

nit in 6954b1f1143a73d4024f5564e287d8eeb34ff577: no more need for MakeByteSpan:

0    hw.write(G_uncompressed);

hodlinator commented at 7:13 am on August 24, 2024:

Done!

stickies-v approved

stickies-v commented at 2:56 pm on August 23, 2024: contributor

ACK df92661444f46790b12d5061344d72106ef820d9 .

The new “"_hex literals approach makes for a much more ergonomic interface that is concise, clear and easy to use. Since the “"_hex operators are templated on the string literal, It does generate a lot of template instantiations, ~~but since these are all consteval, I believe these should not end up in the binary, so that seems like a reasonable trade-off.~~

Left a few nits and a possible approach to further clean up the base_blob::base_blob(string_view) constructor, but nothing that needs to be done now or here.

hodlinator force-pushed on Aug 24, 2024

l0rinc commented at 2:37 pm on August 24, 2024: contributor

docs + constexpr mostly

ACK e63d7e54d26ddabedc5cd2cb8e1180da520fd063

DrahtBot requested review from ryanofsky on Aug 24, 2024

DrahtBot requested review from stickies-v on Aug 24, 2024

in src/util/strencodings.h:443 in e63d7e54d2 outdated

438+constexpr auto operator""_hex_u8() { return std::bit_cast<std::array<uint8_t, str.bytes.size()>>(str.bytes); }
439+
440+template <util::detail::Hex str>
441+constexpr auto operator""_hex_v() { return std::vector<std::byte>{str.bytes.begin(), str.bytes.end()}; }
442+
443+// inline because it calls non-constexpr UCharCast().

l0rinc commented at 2:38 pm on August 24, 2024:

nit: to be fair, this comment confuses me even more

hodlinator force-pushed on Aug 24, 2024

l0rinc commented at 7:30 pm on August 24, 2024: contributor

~~I’m not sure what’s with the Instruction clone failed CI failures, but the change looks good to me~~ the build fixed itself

ACK 92e599d57c507f35d889c3a804d2a7020dcc6b1d

DrahtBot added the label CI failed on Aug 24, 2024

DrahtBot removed the label CI failed on Aug 24, 2024

ryanofsky approved

ryanofsky commented at 2:29 pm on August 26, 2024: contributor

Code review ACK 92e599d57c507f35d889c3a804d2a7020dcc6b1d, just small suggested tweaks since last review (comments, constexpr, MakeByteSpan)

hodlinator commented at 7:01 am on August 28, 2024: contributor

Rebased due to conflict. Also took advantage of now merged #29369 that fully resolves #30377 (review) and #30377 (review).

Changes are:

3 out of 5 replacements in miniscript_tests.cpp are moved to the scripted-diff commit.
The new ToScript in script_tests.cpp can now use the more natural {span.begin(), span.end()} instead of {span.data(), span.data() + span.size()}.

DrahtBot added the label Needs rebase on Aug 28, 2024

hodlinator force-pushed on Aug 28, 2024

in src/util/strencodings.h:399 in dae42d474d outdated

394+ *   serializing, or vice versa, because vectors are assumed to be variable-
395+ *   length and serialized with a size prefix, while arrays are considered fixed
396+ *   length and serialized with no prefix.
397+ *
398+ * @warning It may be preferable to use vector variants to save stack space when
399+ *   declaring local variables if hex strings are large. Alternately variables

stickies-v commented at 10:20 am on August 28, 2024:

grammar nit: https://grammarist.com/usage/alternately-vs-alternatively/

0 *   declaring local variables if hex strings are large. Alternatively, variables

hodlinator commented at 11:31 am on August 28, 2024:

Thanks! Fixed in latest force-push.

in src/util/strencodings.h:437 in dae42d474d outdated

432+
433+template <util::detail::Hex str>
434+constexpr auto operator""_hex_u8() { return std::bit_cast<std::array<uint8_t, str.bytes.size()>>(str.bytes); }
435+
436+template <util::detail::Hex str>
437+constexpr auto operator""_hex_v() { return std::vector<std::byte>{str.bytes.begin(), str.bytes.end()}; }

stickies-v commented at 11:12 am on August 28, 2024:

Note: the _hex_v operator is currently unused, except for in unit tests on this operator:

 0diff --git a/src/test/util_tests.cpp b/src/test/util_tests.cpp
 1index 46eca2f0d8..ab521f5628 100644
 2--- a/src/test/util_tests.cpp
 3+++ b/src/test/util_tests.cpp
 4@@ -158,10 +158,6 @@ BOOST_AUTO_TEST_CASE(parse_hex)
 5     auto hex_literal_span{MakeUCharSpan(hex_literal_array)};
 6     BOOST_CHECK_EQUAL_COLLECTIONS(hex_literal_span.begin(), hex_literal_span.end(), expected.begin(), expected.end());
 7 
 8-    const std::vector<std::byte> hex_literal_vector{operator""_hex_v<util::detail::Hex(HEX_PARSE_INPUT)>()};
 9-    hex_literal_span = MakeUCharSpan(hex_literal_vector);
10-    BOOST_CHECK_EQUAL_COLLECTIONS(hex_literal_span.begin(), hex_literal_span.end(), expected.begin(), expected.end());
11-
12     constexpr std::array<uint8_t, 65> hex_literal_array_uint8{operator""_hex_u8<util::detail::Hex(HEX_PARSE_INPUT)>()};
13     BOOST_CHECK_EQUAL_COLLECTIONS(hex_literal_array_uint8.begin(), hex_literal_array_uint8.end(), expected.begin(), expected.end());
14 
15@@ -198,7 +194,6 @@ BOOST_AUTO_TEST_CASE(parse_hex)
16     // Empty string is supported
17     static_assert(""_hex.empty());
18     static_assert(""_hex_u8.empty());
19-    BOOST_CHECK_EQUAL(""_hex_v.size(), 0);
20     BOOST_CHECK_EQUAL(""_hex_v_u8.size(), 0);
21     BOOST_CHECK_EQUAL(ParseHex("").size(), 0);
22     BOOST_CHECK_EQUAL(TryParseHex<uint8_t>("").value().size(), 0);
23diff --git a/src/util/strencodings.h b/src/util/strencodings.h
24index f9076de047..2939527a82 100644
25--- a/src/util/strencodings.h
26+++ b/src/util/strencodings.h
27@@ -381,9 +381,6 @@ consteval uint8_t ConstevalHexDigit(const char c)
28  * ""_hex is a compile-time user-defined literal returning a
29  * `std::array<std::byte>`, equivalent to ParseHex(). Variants provided:
30  *
31- * - ""_hex_v: Returns `std::vector<std::byte>`, useful for heap allocation or
32- *   variable-length serialization.
33- *
34  * - ""_hex_u8: Returns `std::array<uint8_t>`, for cases where `std::byte` is
35  *   incompatible.
36  *
37@@ -433,9 +430,6 @@ constexpr auto operator""_hex() { return str.bytes; }
38 template <util::detail::Hex str>
39 constexpr auto operator""_hex_u8() { return std::bit_cast<std::array<uint8_t, str.bytes.size()>>(str.bytes); }
40 
41-template <util::detail::Hex str>
42-constexpr auto operator""_hex_v() { return std::vector<std::byte>{str.bytes.begin(), str.bytes.end()}; }
43-
44 template <util::detail::Hex str>
45 inline auto operator""_hex_v_u8() { return std::vector<uint8_t>{UCharCast(str.bytes.data()), UCharCast(str.bytes.data() + str.bytes.size())}; }
46

I think it might make sense the remove the operator? Since we’re dealing with compile-time string literals, I think we shouldn’t encourage new users to take vectors instead of arrays?

hodlinator commented at 11:43 am on August 28, 2024:

Yup, noticed it earlier as well, but decided to keep for now. Could drop if more people agree.

As noted in the docs, vector is currently used to signal that a size-prefix should be added during serialization, so if the conversion to accept std::byte in more places happens before adaptation to accept std::array for size-prefixed data, _hex_v will become necessary.

It also feels more complete to have all byte/uint8_t and array/vector combinations.

stickies-v commented at 11:58 am on August 28, 2024:

so if the conversion to accept std::byte in more places happens before adaptation to accept std::array for size-prefixed data, _hex_v will become necessary.

That is a good point, so we are indeed expecting to use it in the future. In that case, since it’s already reviewed, let’s keep it in and you can mark this as resolved.

stickies-v approved

stickies-v commented at 11:17 am on August 28, 2024: contributor

re-ACK dae42d474d04168fba077f30bf5113721e9fc108

DrahtBot requested review from l0rinc on Aug 28, 2024

DrahtBot requested review from ryanofsky on Aug 28, 2024

hodlinator force-pushed on Aug 28, 2024

stickies-v commented at 11:59 am on August 28, 2024: contributor

re-ACK a096215c9c71a2ec03e76f1fd0bcdda0727996e0

DrahtBot removed the label Needs rebase on Aug 28, 2024

l0rinc commented at 4:00 pm on August 28, 2024: contributor

Changes since 92e599d57c507f35d889c3a804d2a7020dcc6b1d:

 0diff --git a/src/test/miniscript_tests.cpp b/src/test/miniscript_tests.cpp
 1index 452baebff3..e9b0d0ec1d 100644
 2--- a/src/test/miniscript_tests.cpp
 3+++ b/src/test/miniscript_tests.cpp
 4@@ -596,11 +596,11 @@ BOOST_AUTO_TEST_CASE(fixed_tests)
 5     //  - no pubkey before the CHECKSIG
 6     constexpr KeyConverter tap_converter{miniscript::MiniscriptContext::TAPSCRIPT};
 7     constexpr KeyConverter wsh_converter{miniscript::MiniscriptContext::P2WSH};
 8-    const auto no_pubkey{"ac519c"_hex_v_u8};
 9+    const auto no_pubkey{"ac519c"_hex_u8};
10     BOOST_CHECK(miniscript::FromScript({no_pubkey.begin(), no_pubkey.end()}, tap_converter) == nullptr);
11-    const auto incomplete_multi_a{"ba20c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5ba519c"_hex_v_u8};
12+    const auto incomplete_multi_a{"ba20c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5ba519c"_hex_u8};
13     BOOST_CHECK(miniscript::FromScript({incomplete_multi_a.begin(), incomplete_multi_a.end()}, tap_converter) == nullptr);
14-    const auto incomplete_multi_a_2{"ac2079be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798ac20c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5ba519c"_hex_v_u8};
15+    const auto incomplete_multi_a_2{"ac2079be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798ac20c6047f9441ed7d6d3045406e95c07cd85c778e4b8cef3ca7abac09b95c709ee5ba519c"_hex_u8};
16     BOOST_CHECK(miniscript::FromScript({incomplete_multi_a_2.begin(), incomplete_multi_a_2.end()}, tap_converter) == nullptr);
17     // Can use multi_a under Tapscript but not P2WSH.
18     Test("and_v(v:multi_a(2,03d01115d548e7561b15c38f004d734633687cf4419620095bc5b0f47070afe85a,025601570cb47f238d2b0286db4a990fa0f3ba28d1a319f5e7cf55c2a2444da7cc),after(1231488000))", "?", "20d01115d548e7561b15c38f004d734633687cf4419620095bc5b0f47070afe85aac205601570cb47f238d2b0286db4a990fa0f3ba28d1a319f5e7cf55c2a2444da7ccba529d0400046749b1", TESTMODE_VALID | TESTMODE_NONMAL | TESTMODE_NEEDSIG | TESTMODE_P2WSH_INVALID, 4, 2, {}, {}, 3);
19@@ -645,12 +645,12 @@ BOOST_AUTO_TEST_CASE(fixed_tests)
20 
21     // Misc unit tests
22     // A Script with a non minimal push is invalid
23-    const std::vector<unsigned char> nonminpush{"0000210232780000feff00ffffffffffff21ff005f00ae21ae00000000060602060406564c2102320000060900fe00005f00ae21ae00100000060606060606000000000000000000000000000000000000000000000000000000000000000000"_hex_v_u8};
24+    constexpr auto nonminpush{"0000210232780000feff00ffffffffffff21ff005f00ae21ae00000000060602060406564c2102320000060900fe00005f00ae21ae00100000060606060606000000000000000000000000000000000000000000000000000000000000000000"_hex_u8};
25     const CScript nonminpush_script(nonminpush.begin(), nonminpush.end());
26     BOOST_CHECK(miniscript::FromScript(nonminpush_script, wsh_converter) == nullptr);
27     BOOST_CHECK(miniscript::FromScript(nonminpush_script, tap_converter) == nullptr);
28     // A non-minimal VERIFY (<key> CHECKSIG VERIFY 1)
29-    const std::vector<unsigned char> nonminverify{"2103a0434d9e47f3c86235477c7b1ae6ae5d3442d49b1943c2b752a68e2a47e247c7ac6951"_hex_v_u8};
30+    constexpr auto nonminverify{"2103a0434d9e47f3c86235477c7b1ae6ae5d3442d49b1943c2b752a68e2a47e247c7ac6951"_hex_u8};
31     const CScript nonminverify_script(nonminverify.begin(), nonminverify.end());
32     BOOST_CHECK(miniscript::FromScript(nonminverify_script, wsh_converter) == nullptr);
33     BOOST_CHECK(miniscript::FromScript(nonminverify_script, tap_converter) == nullptr);
34diff --git a/src/test/script_tests.cpp b/src/test/script_tests.cpp
35index ef6ca8dc0d..5825684947 100644
36--- a/src/test/script_tests.cpp
37+++ b/src/test/script_tests.cpp
38@@ -1358,7 +1358,7 @@ template <typename T>
39 CScript ToScript(const T& byte_container)
40 {
41     auto span{MakeUCharSpan(byte_container)};
42-    return {span.data(), span.data() + span.size()};
43+    return {span.begin(), span.end()};
44 }
45 
46 static CScript ScriptFromHex(const std::string& str)
47diff --git a/src/util/strencodings.h b/src/util/strencodings.h
48index f9076de047..1543de03ab 100644
49--- a/src/util/strencodings.h
50+++ b/src/util/strencodings.h
51@@ -396,7 +396,7 @@ consteval uint8_t ConstevalHexDigit(const char c)
52  *   length and serialized with no prefix.
53  *
54  * [@warning](/bitcoin-bitcoin/contributor/warning/) It may be preferable to use vector variants to save stack space when
55- *   declaring local variables if hex strings are large. Alternately variables
56+ *   declaring local variables if hex strings are large. Alternatively variables
57  *   could be declared constexpr to avoid using stack space.
58  *
59  * [@warning](/bitcoin-bitcoin/contributor/warning/) Avoid `uint8_t` variants when not necessary, as the codebase

ACK a096215c9c71a2ec03e76f1fd0bcdda0727996e0

DrahtBot added the label Needs rebase on Aug 28, 2024

refactor: Enforce lowercase hex digits for consteval uint256

Also changes compile-time asserts with comments into throws.

7e1d9a8468

refactor: Improve CCrypter related lines

Lines will be touched in next 2 commits.

d99c816971

refactor: de-Hungarianize CCrypter

Beyond renaming it also adjusts whitespace and adds braces to conform to current doc/developer-notes.md.

TestEncrypt: Change iterator type to auto in ahead of vector -> span conversion.

Only touches functions that will be modified in next commit.

bd0830bbd4

refactor: vector -> span in CCrypter

TestEncryptSingle: Remove no longer needed plaintext2-variable that existed because vectors had different allocators.

403d86f1cc

refactor: Make XOnlyPubKey tolerate constexpr std::arrays

Length was already asserted inside of base_blob-ctor.

2b5e6eff36

test refactor: util_tests - parse_hex clean up

* Use BOOST_CHECK_EQUAL_COLLECTIONS and BOOST_CHECK_EQUAL instead of deprecated BOOST_CHECK.
* Avoid repeating expected values.
* Break out repeated HEX_PARSE_INPUT and rename ParseHex_expected to HEX_PARSE_OUTPUT.

Done in preparation for adding a couple more tests in the next commit.

Co-Authored-By: l0rinc <pap.lorinc@gmail.com>

dc5f6f6812

util: Add consteval ""_hex[_v][_u8] literals

""_hex is a compile-time user-defined literal returning std::array<std::byte>, equivalent of ParseHex.

Variants:
- ""_hex_v returns std::vector<std::byte>
- ""_hex_u8 returns std::array<uint8_t>
- ""_hex_v_u8 returns std::vector<uint8_t> - Directly serializable as a size-prefixed OP_PUSH CScript payload using operator<<.

Also extracts from_hex into shared util::ConstevalHexDigit function.

Co-Authored-By: hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-Authored-By: MarcoFalke <*~=`'#}+{/-|&$^_@721217.xyz>
Co-Authored-By: Ryan Ofsky <ryan@ofsky.org>
Co-Authored-By: stickies-v <stickies-v@protonmail.com>

5b74a849cf

refactor: Hand-replace some ParseHex -> ""_hex

The following scripted-diff commit will replace ParseHex("...") with "..."_hex_u8, but this replacement will not work in cases where vectors are needed instead of arrays, and is not ideal in cases where std::byte is accepted.

For example, it is currently necessary to use _hex_v_u8 when calling CScript operator<< because that operator does not currently support std::array or std::byte.

Conversely, it is incorrect to use _hex_v instead of _hex in net_processing.cpp for the MakeAndPushMessage argument, because if the argument is a std::vector it is considered variable-length and serialized with a size prefix, but if the argument is a std::array or Span is it considered fixed length and serialized without a prefix.

By the same logic, it is also safe to change the NUMS_H constant in pubkey.cpp from a std::vector to std::array because it is never serialized.

50bc017040

refactor: Prepare for ParseHex -> ""_hex scripted-diff

- Adds using namespace.
- Extracts ToScript helper function from ScriptFromHex, to be used heavily in the next commit.
- Changes ScriptFromHex from using ParseHex to TryParseHex, now asserting the string is valid.
- Use even number of hex digits in comment (and apply replacement from next commit to only touch line once).

9cb687351f

scripted-diff: Replace ParseHex[<std::byte>]("str") -> "str"_hex[_u8]

Ideally all call sites should accept std::byte instead of uint8_t but those transformations are left to future PRs.

-BEGIN VERIFY SCRIPT-
sed -i --regexp-extended 's/\bParseHex\(("[^"]*")\)/\1_hex_u8/g' $(git grep -l ParseHex -- :src ':(exclude)src/test/util_tests.cpp')
sed -i --regexp-extended 's/\bParseHex<std::byte>\(("[^"]*")\)/\1_hex/g' $(git grep -l ParseHex -- :src ':(exclude)src/test/util_tests.cpp')
sed -i --regexp-extended 's/\bScriptFromHex\(("[^"]*")\)/ToScript(\1_hex)/g' src/test/script_tests.cpp
-END VERIFY SCRIPT-

Co-Authored-By: MarcoFalke <*~=`'#}+{/-|&$^_@721217.xyz>
Co-Authored-By: Ryan Ofsky <ryan@ofsky.org>

8756ccd712

hodlinator force-pushed on Aug 28, 2024

DrahtBot removed the label Needs rebase on Aug 28, 2024

l0rinc commented at 9:43 pm on August 28, 2024: contributor

Non-trivial rebase with conflicts in bloom_tests, crypto_tests and txpackage_tests. The resolutions seem correct, all 3 are passing locally. ACK 8756ccd71218c8e013181473720b10d3c4a94957

DrahtBot requested review from stickies-v on Aug 28, 2024

ryanofsky approved

ryanofsky commented at 3:25 am on August 29, 2024: contributor

Code review ACK 8756ccd71218c8e013181473720b10d3c4a94957, just rebasing since last review and taking advantage of CScript constructors in #29369, also tweaking a code comment

stickies-v commented at 9:43 am on August 29, 2024: contributor

Rebase re-ACK 8756ccd71218c8e013181473720b10d3c4a94957, no changes since a096215c9c71a2ec03e76f1fd0bcdda0727996e0

ryanofsky merged this on Aug 31, 2024

ryanofsky closed this on Aug 31, 2024

maflcko commented at 10:34 am on September 4, 2024: member

macOS follow-up idea: Someone could check on macOS 13, if compilation with XCode still works. If not, build-osx.md could be updated (Ref: #30377 (review))

edit: i guess this is already checked by CI?

hodlinator deleted the branch on Sep 4, 2024

ryanofsky referenced this in commit 3a90dd0df6 on Sep 4, 2024

ryanofsky referenced this in commit c1e8012b0a on Sep 5, 2024

ryanofsky referenced this in commit 3c99f5a38a on Sep 5, 2024

refactor: Replace ParseHex with consteval “"_hex literals #30377

Code Coverage

Reviews

Conflicts

4 runtime places tentatively replaced with BytesFromHex

Differences

Guix builds (on x86_64) [untrusted test-only build, possibly unsafe, not for production use]

Guix builds (on x86_64) [untrusted test-only build, possibly unsafe, not for production use]

4 runtime places tentatively replaced with `BytesFromHex`