refactor: Allow CScript’s operator<< to accept spans, not just vectors #30765

pull l0rinc wants to merge 3 commits into bitcoin:master from l0rinc:l0rinc/CScript-vector-and-array changing 7 files +64 −43
  1. l0rinc commented at 12:48 pm on August 30, 2024: contributor

    Split out of #30377 (review).

    Replace _hex_v_u8 for CScript appends to _hex, to skip vector conversion before serializing to the prevector in CScript.

    To enable both unsigned char and std::byte values, I’ve extracted the existing serialization to append the size & data in separate private methods to clarify that it does more than just a simple data insertion.

    There were also discussion on eliminating the operators here completely to obviate when we’re serializing fixed-size collections as raw bytes, and when we’re prefixing them with their size - should also be done in a separate PR.

    Note that after this change, it would technically be possible to remove _hex_v_u8. Let me know if you think that’s a good idea.

    The PR is still in a draft phase since there isn’t an overall agreement about the exact direction.

  2. DrahtBot commented at 12:48 pm on August 30, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK ryanofsky, hodlinator

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28710 (Remove the legacy wallet and BDB dependency by achow101)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Refactoring on Aug 30, 2024
  4. l0rinc force-pushed on Aug 31, 2024
  5. l0rinc marked this as ready for review on Aug 31, 2024
  6. in src/script/script.h:501 in 873fe22015 outdated
    520+        InsertInput(b.begin(), b.end());
    521+        return *this;
    522+    }
    523+
    524+    template<size_t N>
    525+    CScript& operator<<(const std::array<unsigned char, N>& b) LIFETIMEBOUND
    


    maflcko commented at 9:51 am on September 2, 2024:

    not sure about this. If someone in the future has an std::array<std::byte, N> and want to serialize it, will this be touched again? What if someone wants to serialize the first N/2 bytes only? Will they need to create a copy? Though, avoiding the vector conversion was the whole point of this pull request, so it just seems like right now this is adding more code while leaving the underlying issue.

    Maybe this could be a span or a range?

    In any case, the documentation needs to be clear that this is a pushdata and not a plain insert of the raw bytes into the underlying CScriptBase byte blob.


    l0rinc commented at 2:40 pm on September 2, 2024:

    will this be touched again?

    Not sure, isn’t that the case with vector as well?

    so it just seems like right now this is adding more code while leaving the underlying issue.

    Can you please detail why you think this is the case? _hex_v_u8 was mostly eliminated and the production code is basically the same (+3 lines for the other overload).

    Maybe this could be a span or a range?

    I understood we wanted to avoid doing that in #30377 (review)


    maflcko commented at 2:50 pm on September 2, 2024:

    will this be touched again?

    Not sure, isn’t that the case with vector as well?

    Yes, so that is another reason that either the code should be either left as-is, (to avoid changing it now, and then again in the future), or to just change it once.

    so it just seems like right now this is adding more code while leaving the underlying issue.

    Can you please detail why you think this is the case?

    The goal of this change is to avoid a copy/conversion, however the change only fixes it for one very specific instance, leaving the problem otherwise unfixed.

    Maybe this could be a span or a range?

    I understood we wanted to avoid doing that in #30377 (comment)

    Array serialization is equal to span serialization, so if array-serialization would be fine to diverge on, then why wouldn’t span/range serialization?


    l0rinc commented at 3:17 pm on September 2, 2024:

    however the change only fixes it for one very specific instance

    Seems I missed some other cases, which other ones should we include here?

    leaving the problem otherwise unfixed

    It does? This is what we’ve mentioned throughout the original PR by @hodlinator, @ryanofsky and yourself. I don’t mind changing to span (it’s what I suggested originally), if you think that would solve it in a cleaner way - but I need to understand the problem better since I though this fully solved our concerns regarding script data copying of hexadecimal values.


    ryanofsky commented at 3:32 pm on September 2, 2024:

    I think this should just take a span, not an array or a vector because the container type should not be relevant.

    The only reason to not use span here would be to try to make the cscript « operator act like the serialization « operator and treat vectors differently from spans and arrays. This behavior makes some sense for serialization, but doesn’t make sense for pushing cscript data, so I don’t think should be a concern.

    (For serialization we use Span{} casts as a way of telling serialization API that data is fixed-length, not variable-length so should be formatted without a size prefix. But this is a dubious and confusing use of span. Seems like it would be better to have FixedSize and VarSize serialization formatters that would make code clearer and allow serializing variable size containers without size prefixes, and fixed size containers with them, since there are use-cases for both. Adding span or vector conversions just to get data serialized in the right format is unnecessarily confusing, and in some cases inefficient.)


    sipa commented at 3:38 pm on September 2, 2024:

    Would it make sense to drop operator« from CScript, and instead have explicit “Append” and “Push” member functions (that take spans), and do the expected thing?

    (I have not followed this PR in detail, feel free to ignore if I’m missing things)


    ryanofsky commented at 3:39 pm on September 2, 2024:
    It would also be great if this could be more clearly documented as pushing data as marco suggests, and if it could use std::byte instead of char. If using std::byte is not easily possible, maybe it could use BasicByte so the PR can change _hex_v_u8 to _hex instead of _hex_u8

    l0rinc commented at 4:28 pm on September 2, 2024:
    I’ll investigate these tomorrow, thanks for the comments!

    ryanofsky commented at 4:53 pm on September 2, 2024:

    Would it make sense to drop operator« from CScript, and instead have explicit “Append” and “Push” member functions (that take spans), and do the expected thing?

    I think it makes sense, especially because cscripts are serializable, so it is confusing that x << script does an entirely different operation than script << x. But making this change would expand the scope of the PR beyond what it is currently doing which is just making cscript work a little better with hex literals from #30377.

    On the idea of Append and Push methods, I think an Append method is not actually necessary, but may be useful. And if we think it is a good idea to disambiguate the CScript << operator, it probably also makes sense to disambiguate the CScript constructor, since it currently has one overload to add bytes to the script and other overloads to add opcodes and data. Probably a better interface would provide separate functions with descriptive names returning CScript objects, instead of providing multiple CScript constructors where you can’t know what a constructor may be doing without knowing the type of the argument.

    But I do think it would be nice if this PR just added support for std::span and std::byte without necessarily making bigger changes.


    hodlinator commented at 8:46 am on September 3, 2024:

    Having an elegant way of expressing scripts in code is nice. How about (omitted some hex digits):

    0script << "046711d5f"_sized_hex << OP_CHECKSIG;
    

    _sized_hex could possibly create something like a SizePrefixed(std::array<std::byte>) object and CScript could have a << operator taking such objects and prepending the size.

    But I do think it would be nice if this PR just added support for std::span and std::byte without necessarily making bigger changes.

    That was what I was originally doing as part of #30377, I dropped it so that this PR here could let us work toward some final form of this aspect. So I would rather see a bigger step being made.


    l0rinc commented at 11:07 am on September 6, 2024:

    But I do think it would be nice if this PR just added support for std::span and std::byte without necessarily making bigger changes.

    I’ve added support for const std::span<const unsigned char> (for vector & array), but std::byte seems like a bigger change (as explained now in the PR’s description) - unless you’d like me to add it as an extra overload - let me know if that’s what you meant.


    maflcko commented at 12:59 pm on September 6, 2024:
    An overload for std::byte could be one more line of code, but would avoid having to touch all the touched non-header lines in this pull request again. Unless I am missing something obvious, there shouldn’t be any downside.

    ryanofsky commented at 1:26 pm on September 6, 2024:

    An overload for std::byte could be one more line of code, but would avoid having to touch all the touched non-header lines in this pull request again.

    Yeah sorry for any confusion. My suggestion in #30765 (review) was to use std::byte if it was easy or BasicByte as a fallback if would be too much work to convert other code to use std::byte. Having two overloads is also fine. The point is this PR should switch callers from _hex_v_u8 to _hex, not _hex_u8, so there is no need to change them again later.


    maflcko commented at 1:36 pm on September 6, 2024:

    BasicByte

    IIRC the reason that serialize code uses a template on the inner type of the span is that call sites have to explicitly specify that they want the “different” span-serialization.

    I think the concern doesn’t apply to CScript, so a template may be better to avoid, but I haven’t tried this myself.


    l0rinc commented at 3:35 pm on September 6, 2024:

    I have added an std::byte overload in a separate commit, so now we have:

    0CScript& operator<<(const std::span<const std::byte> b) LIFETIMEBOUND
    1{
    2    return *this << std::bit_cast<std::span<const uint8_t>>(b);
    3}
    

    which delegates to:

    0CScript& operator<<(const std::span<const uint8_t> b) LIFETIMEBOUND
    1{
    2    AppendDataSize(b.size());
    3    AppendData(b.begin(), b.end());
    4    return *this;
    5}
    

    I haven’t tried BasicByte, but this seems to work - if the other compilers agree…

  7. l0rinc marked this as a draft on Sep 3, 2024
  8. l0rinc force-pushed on Sep 6, 2024
  9. l0rinc renamed this:
    refactor: Extend CScript with `operator<<` for `std::array`
    refactor: Generalize `CScript`'s `std::vector` push to `std::span` to accept `std::array`, too
    on Sep 6, 2024
  10. l0rinc renamed this:
    refactor: Generalize `CScript`'s `std::vector` push to `std::span` to accept `std::array`, too
    refactor: Generalize `CScript`'s `std::vector` push to accept `std::array`, too
    on Sep 6, 2024
  11. l0rinc force-pushed on Sep 6, 2024
  12. DrahtBot added the label CI failed on Sep 6, 2024
  13. DrahtBot commented at 3:30 pm on September 6, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/29778092690

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  14. in src/script/script.h:503 in 14080bb53d outdated
    523         return *this;
    524     }
    525 
    526+    CScript& operator<<(const std::span<const std::byte> b) LIFETIMEBOUND
    527+    {
    528+        return *this << std::bit_cast<std::span<uint8_t>>(b);
    


    maflcko commented at 3:44 pm on September 6, 2024:

    Probably fine here, but IIRC it is UB to cast away const from something that is always const.

    It would be better to use a safer alternative instead of a cast.


    sipa commented at 3:47 pm on September 6, 2024:
    I believe casting away constness on itself is never UB, but modifying a const object (through a pointer/reference whose constness was cast away) is UB.

    maflcko commented at 4:05 pm on September 6, 2024:

    Ah yes, according to https://eel.is/c++draft/expr.const.cast#6

    I still have a slight preference to keep the const unless removing it has a clear benefit.


    l0rinc commented at 4:05 pm on September 6, 2024:
    Very good points, in this version I can indeed add back the const, thanks! Also added a simple test to make sure CScript() << _hex_v_u8 and CScript() << _hex produce the same CScript() (can be removed if it passes CI and is deemed superfluous) See: https://github.com/bitcoin/bitcoin/compare/14080bb53d649ed1c99f121f377d2c8f2019c99e..03bdefff8c6663da895d0603469dee15aa4ad35d

    maflcko commented at 4:13 pm on September 6, 2024:
    The last push looks fine, but I still have a slight preference to use UCharSpanCast or std::as_bytes, which preserve constness by default.

    l0rinc commented at 7:29 pm on September 6, 2024:

    std::as_bytes converts to std::span<const std::byte>, but we need a const std::span<const uint8_t> since we can only call prevector#insert with class iterator { T* ptr{}; which isn’t trivial to migrate to a byte span as far as I can tell (i.e. the std::byte operator has to delegate to the uint8_t operator). And UCharSpanCast seems to work with Span, which I understood we’re trying to get away from. Or I could do

    0return *this << std::span<const uint8_t>(UCharCast(b.data()), b.size());
    

    instead of the current:

    0return *this << std::bit_cast<std::span<const uint8_t>>(b);
    

    if you think it’s more correct, but I would appreciate some explanation for why that’s better.


    l0rinc commented at 12:23 pm on September 7, 2024:
    Seems we’re getting an error with mingw around genesisOutputScript, where prevector#fill is trying to write 65 bytes of data into a 32-byte buffer. I can’t reproduce it locally, any help would be appreciated.

    hodlinator commented at 7:55 pm on September 7, 2024:
    The error looks very much like what I ran into in my attempt to make CScript take a span, see my change to prevector in 05c16ae8649ee3b6bbcdbdf0541a65ccf4477537 (referenced in #30377 (review)), I believe that might surprisingly enough fix the issue.

    l0rinc commented at 8:04 pm on September 7, 2024:
    Wow, what a weird fix! Thanks a lot, added it in a separate commit, let’s see if this fixes the build on Windows: 18870ac (#30765) Edit: it did fix the windows problem, but will have to rebase once master is stable again.

    maflcko commented at 7:36 am on September 9, 2024:

    And UCharSpanCast seems to work with Span, which I understood we’re trying to get away from.

    Correct. Sorry for the brevity. I wanted to say that my preference would be to find a way that is applicable to all places in the future that need to cast the inner type of a span from one byte type to another. (I don’t think all code will ever be converted to std::byte, because some interfaces (to C code) really only work with unsigned char, so a conversion will be needed in the future for some places in the code).

    So my preference would be to have (and use) a safe helper function that does not allow casting away the const by accident (even if this happens to not be UB in almost all cases).


    l0rinc commented at 9:38 am on September 9, 2024:
    Good idea, added BytesToUInt8s and UInt8sToBytes.
  15. l0rinc force-pushed on Sep 6, 2024
  16. l0rinc force-pushed on Sep 7, 2024
  17. l0rinc force-pushed on Sep 7, 2024
  18. l0rinc force-pushed on Sep 7, 2024
  19. l0rinc force-pushed on Sep 8, 2024
  20. l0rinc renamed this:
    refactor: Generalize `CScript`'s `std::vector` push to accept `std::array`, too
    refactor: Generalize `CScript`'s `std::vector` push to accept `std::array`
    on Sep 8, 2024
  21. l0rinc force-pushed on Sep 8, 2024
  22. l0rinc force-pushed on Sep 9, 2024
  23. l0rinc marked this as ready for review on Sep 9, 2024
  24. l0rinc force-pushed on Sep 9, 2024
  25. l0rinc force-pushed on Sep 9, 2024
  26. DrahtBot removed the label CI failed on Sep 9, 2024
  27. in src/prevector.h:394 in c4c102de73 outdated
    389@@ -390,7 +390,10 @@ class prevector {
    390             change_capacity(new_size + (new_size >> 1));
    391         }
    392         T* ptr = item_ptr(p);
    393-        memmove(ptr + count, ptr, (size() - p) * sizeof(T));
    394+        // Passing dst variable with explicit type instead of temporary expression
    395+        // calms down GCC under some circumstances.
    


    ryanofsky commented at 12:52 pm on September 9, 2024:

    In commit “refactor: Make code more tolerant of constexpr std::arrays” (c4c102de7336e6c0cc2e5187feedf437ebadb565)

    Sorry I didn’t read the previous threads which might explain this, but this is a very vague comment which should be improved. It’s is not clear looking at the new code what “temporary expression” or “explicit type” the comment is referring to, what “calms down GCC” could mean or what “some circumstances” could be. Would suggest writing a more understandable comment like // Passing dst to memmove instead of (ptr + count) prevents GCC error/warning "<whatever the error/warning is>"


    l0rinc commented at 2:55 pm on September 9, 2024:
    Related to #30765 (review), will see what I can do to document it better. I guess the explanation should rather go to the commit message instead of the code, since the code itself isn’t doing anything special.

    l0rinc commented at 7:17 pm on September 9, 2024:
    Dropped the commit after the suggested simplification

    l0rinc commented at 10:03 am on September 10, 2024:

    Got the error again, trying to create a godbolt reproducer to understand it better

     0In file included from /ci_container_base/src/script/script.h:11,
     1                 from /ci_container_base/src/primitives/transaction.h:11,
     2                 from /ci_container_base/src/primitives/block.h:9,
     3                 from /ci_container_base/src/kernel/chainparams.h:11,
     4                 from /ci_container_base/src/kernel/chainparams.cpp:6:
     5In member function void prevector<N, T, Size, Diff>::fill(T*, InputIterator, InputIterator) [with InputIterator = const unsigned char*; unsigned int N = 28; T = unsigned char; Size = unsigned int; Diff = int],
     6    inlined from void prevector<N, T, Size, Diff>::insert(iterator, InputIterator, InputIterator) [with InputIterator = const unsigned char*; unsigned int N = 28; T = unsigned char; Size = unsigned int; Diff = int] at /ci_container_base/src/prevector.h:395:13,
     7    inlined from void CScript::PushData(const prevector<28, unsigned char>::value_type*, size_t) at /ci_container_base/src/script/script.h:439:15,
     8    inlined from CScript& CScript::operator<<(std::span<const std::byte>) at /ci_container_base/src/script/script.h:496:17,
     9    inlined from CBlock CreateGenesisBlock(uint32_t, uint32_t, uint32_t, int32_t, const CAmount&) at /ci_container_base/src/kernel/chainparams.cpp:76:54:
    10/ci_container_base/src/prevector.h:216:13: error: writing 65 bytes into a region of size 32 [-Werror=stringop-overflow=]
    11  216 |             new(static_cast<void*>(dst)) T(*first);
    12      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    13/ci_container_base/src/kernel/chainparams.cpp: In function CBlock CreateGenesisBlock(uint32_t, uint32_t, uint32_t, int32_t, const CAmount&):
    14/ci_container_base/src/kernel/chainparams.cpp:76:49: note: destination object <anonymous> of size 32
    15   76 |     const CScript genesisOutputScript = CScript() << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f"_hex << OP_CHECKSIG;
    16      |                                                 ^
    17In file included from /usr/lib/gcc/x86_64-w64-mingw32/12-posix/include/c++/cstring:42,
    18                 from /ci_container_base/src/crypto/common.h:11,
    19                 from /ci_container_base/src/uint256.h:9,
    20                 from /ci_container_base/src/consensus/params.h:9,
    21                 from /ci_container_base/src/kernel/chainparams.h:9:
    22In function void* memmove(void*, const void*, size_t),
    23    inlined from void prevector<N, T, Size, Diff>::insert(iterator, InputIterator, InputIterator) [with InputIterator = const unsigned char*; unsigned int N = 28; T = unsigned char; Size = unsigned int; Diff = int] at /ci_container_base/src/prevector.h:393:16,
    24    inlined from void CScript::PushData(const prevector<28, unsigned char>::value_type*, size_t) at /ci_container_base/src/script/script.h:439:15,
    25    inlined from CScript& CScript::operator<<(std::span<const std::byte>) at /ci_container_base/src/script/script.h:496:17,
    26    inlined from CBlock CreateGenesisBlock(uint32_t, uint32_t, uint32_t, int32_t, const CAmount&) at /ci_container_base/src/kernel/chainparams.cpp:76:54:
    27/usr/share/mingw-w64/include/string.h:214:33: warning: void* __builtin_memmove(void*, const void*, long long unsigned int) offset [65, 35] is out of the bounds [0, 32] of object <anonymous> with type CScript [-Warray-bounds]
    28  214 |   return __builtin___memmove_chk(__dst, __src, __n, __mingw_bos(__dst, 0));
    29      |          ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    30/ci_container_base/src/kernel/chainparams.cpp: In function CBlock CreateGenesisBlock(uint32_t, uint32_t, uint32_t, int32_t, const CAmount&):
    31/ci_container_base/src/kernel/chainparams.cpp:76:49: note: <anonymous> declared here
    32   76 |     const CScript genesisOutputScript = CScript() << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f"_hex << OP_CHECKSIG;
    33      |                                                 ^
    34cc1plus: all warnings being treated as errors
    35make[2]: *** [src/CMakeFiles/bitcoin_common.dir/build.make:422: src/CMakeFiles/bitcoin_common.dir/kernel/chainparams.cpp.obj] Error 1
    36make[1]: *** [CMakeFiles/Makefile2:864: src/CMakeFiles/bitcoin_common.dir/all] Error 2
    37make: *** [Makefile:146: all] Error 2
    

    l0rinc commented at 6:18 pm on September 10, 2024:

    This error was reported many times for GCC 12.2, managed to reproduce the error via godbolt (it’s fixed in GCC 12.3) and on a local Dockerfile:

     0FROM docker.io/amd64/debian:bookworm
     1
     2# Update and install dependencies
     3RUN apt update && apt install -y \
     4    build-essential \
     5    cmake \
     6    git \
     7    curl \
     8    wget \
     9    pkg-config \
    10    bsdmainutils \
    11    python3 \
    12    libevent-dev \
    13    libboost-system-dev \
    14    libboost-filesystem-dev \
    15    libboost-chrono-dev \
    16    libboost-test-dev \
    17    libboost-thread-dev \
    18    libminiupnpc-dev \
    19    libzmq3-dev \
    20    libsqlite3-dev \
    21    libdb-dev \
    22    libdb++-dev \
    23    libssl-dev \
    24    libprotobuf-dev \
    25    protobuf-compiler \
    26    libqrencode-dev \
    27    libqt5gui5 \
    28    libqt5core5a \
    29    libqt5dbus5 \
    30    qttools5-dev \
    31    qttools5-dev-tools \
    32    libseccomp-dev \
    33    libcap-dev \
    34    nsis \
    35    g++-mingw-w64-x86-64-posix \
    36    wine-binfmt \
    37    wine64 \
    38    file
    39
    40RUN gcc --version && cmake --version
    41
    42COPY . /bitcoin
    43WORKDIR /bitcoin
    44RUN git clean -fdx
    45RUN cmake -B build -DREDUCE_EXPORTS=ON -DBUILD_GUI_TESTS=OFF -DCMAKE_CXX_FLAGS='-Werror'
    46RUN cmake --build build -j8
    

    Fixed in a39f57f (#30765), added all related info to the commit message, let me know if more info is still needed. Thanks @hodlinator and @ryanofsky for the reviews and suggestions!

  28. in src/script/script.h:420 in 402101e20b outdated
    412@@ -412,6 +413,33 @@ bool GetScriptOp(CScriptBase::const_iterator& pc, CScriptBase::const_iterator en
    413 /** Serialized script, used inside transaction inputs and outputs */
    414 class CScript : public CScriptBase
    415 {
    416+private:
    417+    inline void AppendDataSize(const size_t size)
    418+    {
    419+        if (size < OP_PUSHDATA1) {
    420+            insert(this->end(), uint8_t(size));
    


    ryanofsky commented at 1:09 pm on September 9, 2024:

    In commit “refactor: Make code more tolerant of constexpr std::arrays” (c4c102de7336e6c0cc2e5187feedf437ebadb565)

    I’m not sure I see benefit of changing this from unsigned char to uint8_t now when there’s a good chance this is going to be switched again to std::byte in the future. It could also be good eliminate the C cast on this line if it is changing anyway.

    So not important, but would consider changing uint8_t(size) to static_cast<value_type(size)> here and changing uint8_t generally to value_type so this code can be unchanged if the surrounding code switches to std::byte.


    l0rinc commented at 7:18 pm on September 9, 2024:

    I’m not sure I see benefit of changing this from unsigned char to uint8_t

    uint8_t was already used in the other branches

    changing uint8_t generally to value_type

    I like that, thanks!

  29. in src/script/script.h:494 in 402101e20b outdated
    490@@ -463,32 +491,10 @@ class CScript : public CScriptBase
    491         return *this;
    492     }
    493 
    494-    CScript& operator<<(const std::vector<unsigned char>& b) LIFETIMEBOUND
    495+    CScript& operator<<(const std::span<const uint8_t> b) LIFETIMEBOUND
    


    ryanofsky commented at 1:19 pm on September 9, 2024:

    In commit “Generalize std::vector push to std::span to accept std::array” (402101e20bc2086e9237c2e8a51471ac97069e01)

    Title of this commit is misleading it does not mention CScript. It sounds like it is referring to a vector push method not a script push method. Would suggest something more like “Allow CScript operator« to accept spans, not just vectors”


    l0rinc commented at 7:18 pm on September 9, 2024:
    Thanks
  30. in src/test/util_tests.cpp:157 in e6c02b9f25 outdated
    168-
    169-    result = operator""_hex_v_u8<util::detail::Hex(HEX_PARSE_INPUT)>();
    170-    BOOST_CHECK_EQUAL_COLLECTIONS(result.begin(), result.end(), expected.begin(), expected.end());
    171+    {
    172+        constexpr std::array<std::byte, 65> hex_literal{operator""_hex<util::detail::Hex(HEX_PARSE_INPUT)>()};
    173+        auto hex_span{BytesToUInt8s(hex_literal)};
    


    ryanofsky commented at 1:21 pm on September 9, 2024:

    In commit “Add std::byte span overload to CScript” (e6c02b9f2544032481743b4f8d3148622ccc73dd)

    I don’t think these changes which don’t have to do with CScript belong in a commit that is supposed to be adding a new operator to CScript.

    Would suggest moving definition of BytesToUInt8s and UInt8sToBytes into a new commit before any of the cscript changes so this can be more understandable.

    It would also be good if commit gave a hint about what purpose of these util test changes are. They seem ok, but it is not clear what motivated them or what test coverage is being added and removed.

    Also would suggest squashing the CScript portion of commits 402101e20bc2086e9237c2e8a51471ac97069e01 and e6c02b9f2544032481743b4f8d3148622ccc73dd into a single commit so surrounding code changes once directly from _hex_v_u8 to _hex instead of _hex_v_u8 -> _hex_u8 -> _hex


    l0rinc commented at 7:18 pm on September 9, 2024:
    Dropped these after your simplification suggestion
  31. in src/span.h:297 in e6c02b9f25 outdated
    291@@ -290,6 +292,15 @@ inline const unsigned char* UCharCast(const std::byte* c) { return reinterpret_c
    292 template <typename B>
    293 concept BasicByte = requires { UCharCast(std::span<B>{}.data()); };
    294 
    295+inline constexpr std::span<const uint8_t> BytesToUInt8s(const std::span<const std::byte> s) noexcept
    296+{
    297+    return std::bit_cast<std::span<const uint8_t>>(s);
    


    ryanofsky commented at 1:41 pm on September 9, 2024:

    In commit “Add std::byte span overload to CScript” (e6c02b9f2544032481743b4f8d3148622ccc73dd)

    I think this could use some explanation because it seems unexpected and potentially unsafe to bit cast std::span objects and make assumptions about internal layout of std::span. I can see why bit casting std::array is ok because of restrictions it has, but unclear why bit-casting std::span objects is ok when std::span objects are just wrappers around pointers and bit-casting pointers is not ok. At least according to https://stackoverflow.com/questions/69918542/can-stdbit-cast-be-used-to-cast-from-stdspana-to-stdspanb-and-access-a “there is no requirement that any particular span<T> has the same layout of span<U>…” (and it goes on) so it would be good know why that reasoning does not apply here.

    If you are sure this is safe, then some having some high level explanation of the reasoning in a comment here would be helpful. But from the previous stack exchange answer https://lists.isocpp.org/std-proposals/2022/05/4059.php https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2747r0.html it seems constexpr functions aren’t actually meant to be able to do all the pointer conversions which are ok to do at runtime.

    If this current approach is not actually safe, would suggest just dropping constexpr overload here, making CScript operator« accept span<const std::byte> and making other operator call UInt8sToBytes and be not constexpr.

    Or if I’m wrong about this and it is ok to use bit_cast on pointers would suggest just using it on the pointer not the whole span object:

    0inline constexpr std::span<const uint8_t> BytesToUInt8s(const std::span<const std::byte> s) noexcept
    1{
    2    return {bit_cast<const uint8_t*>(s.data()), s.size()};
    3}
    

    l0rinc commented at 7:23 pm on September 9, 2024:
    Based on what I read C++23 solves some of these problems (e.g. Every specialization of std::span is a TriviallyCopyable type. (since C++23)) and while array and vector casts would likely be safe, thanks for pointing out that the layout may not always be the same and we need something more predictable here. Wanted to avoid explicit pointer arithmetic or extra copying, let me know if the latest push is a good compromise in this area.
  32. in src/script/script.h:503 in e6c02b9f25 outdated
    497@@ -498,6 +498,11 @@ class CScript : public CScriptBase
    498         return *this;
    499     }
    500 
    501+    CScript& operator<<(const std::span<const std::byte> b) LIFETIMEBOUND
    502+    {
    503+        return *this << BytesToUInt8s(b);
    


    ryanofsky commented at 1:50 pm on September 9, 2024:

    In commit “Add std::byte span overload to CScript” (e6c02b9f2544032481743b4f8d3148622ccc73dd)

    If code is moving away from char and uint8_t and towards std::byte, it would seem better for the uint8_t overload to call the std::byte overload (with UInt8sToBytes) instead of vice versa so the uint8_t overload could be dropped in the future without having to change this code again.


    l0rinc commented at 2:53 pm on September 9, 2024:
    Yes, that’s what I tried at first, but I don’t yet see how that would be possible without changing prevector, as explained here

    ryanofsky commented at 3:17 pm on September 9, 2024:

    Yes, that’s what I tried at first, but I don’t yet see how that would be possible without changing prevector, as explained here

    Thanks that’s helpful to know, and it’s clear why changing prevector could fix this but not clear why it would be necessary to change prevector to fix that as opposed to just passing a start and end pointer of the type it is expecting


    l0rinc commented at 7:24 pm on September 9, 2024:
    Because we were casting back and forth this way. But I think I found a good enough generalization with your help, we don’t need to connect the two operators this way.
  33. ryanofsky commented at 2:24 pm on September 9, 2024: contributor
    Code review e6c02b9f2544032481743b4f8d3148622ccc73dd
  34. ryanofsky commented at 3:27 pm on September 9, 2024: contributor

    I think this PR as of e6c02b9f2544032481743b4f8d3148622ccc73dd has been good exploration of possible changes, but is doing too much and is too complicated. I think it’s worth starting with a minimal change that just supports std::span and std::byte in CScript and replaces _hex_v_u8 with _hex:

      0--- a/src/script/script.h
      1+++ b/src/script/script.h
      2@@ -463,7 +463,7 @@ public:
      3         return *this;
      4     }
      5 
      6-    CScript& operator<<(const std::vector<unsigned char>& b) LIFETIMEBOUND
      7+    CScript& operator<<(std::span<const std::byte> b) LIFETIMEBOUND
      8     {
      9         if (b.size() < OP_PUSHDATA1)
     10         {
     11@@ -488,10 +488,17 @@ public:
     12             WriteLE32(_data, b.size());
     13             insert(end(), _data, _data + sizeof(_data));
     14         }
     15-        insert(end(), b.begin(), b.end());
     16+        const value_type* _data{reinterpret_cast<const value_type*>(b.data())};
     17+        insert(end(), _data, _data + b.size());
     18         return *this;
     19     }
     20 
     21+    // For backwards compatibility. Please prefer std::byte over uint8_t in new code.
     22+    CScript& operator<<(std::span<const uint8_t> b) LIFETIMEBOUND
     23+    {
     24+        return *this << std::as_bytes(b);
     25+    }
     26+
     27     bool GetOp(const_iterator& pc, opcodetype& opcodeRet, std::vector<unsigned char>& vchRet) const
     28     {
     29         return GetScriptOp(pc, end(), opcodeRet, &vchRet);
     30--- a/src/test/miner_tests.cpp
     31+++ b/src/test/miner_tests.cpp
     32@@ -608,7 +608,7 @@ void MinerTestingSetup::TestPrioritisedMining(const CScript& scriptPubKey, const
     33 BOOST_AUTO_TEST_CASE(CreateNewBlock_validity)
     34 {
     35     // Note that by default, these tests run with size accounting enabled.
     36-    CScript scriptPubKey = CScript() << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f"_hex_v_u8 << OP_CHECKSIG;
     37+    CScript scriptPubKey = CScript() << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f"_hex << OP_CHECKSIG;
     38     std::unique_ptr<CBlockTemplate> pblocktemplate;
     39 
     40     CTxMemPool& tx_mempool{*m_node.mempool};
     41--- a/src/test/script_tests.cpp
     42+++ b/src/test/script_tests.cpp
     43@@ -1421,7 +1421,7 @@ BOOST_AUTO_TEST_CASE(script_FindAndDelete)
     44     // prefix, leaving 02ff03 which is push-two-bytes:
     45     s = ToScript("0302ff030302ff03"_hex);
     46     d = ToScript("03"_hex);
     47-    expect = CScript() << "ff03"_hex_v_u8 << "ff03"_hex_v_u8;
     48+    expect = CScript() << "ff03"_hex << "ff03"_hex;
     49     BOOST_CHECK_EQUAL(FindAndDelete(s, d), 2);
     50     BOOST_CHECK(s == expect);
     51 
     52--- a/src/test/transaction_tests.cpp
     53+++ b/src/test/transaction_tests.cpp
     54@@ -852,24 +852,24 @@ BOOST_AUTO_TEST_CASE(test_IsStandard)
     55     CheckIsNotStandard(t, "scriptpubkey");
     56 
     57     // MAX_OP_RETURN_RELAY-byte TxoutType::NULL_DATA (standard)
     58-    t.vout[0].scriptPubKey = CScript() << OP_RETURN << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef3804678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38"_hex_v_u8;
     59+    t.vout[0].scriptPubKey = CScript() << OP_RETURN << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef3804678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38"_hex;
     60     BOOST_CHECK_EQUAL(MAX_OP_RETURN_RELAY, t.vout[0].scriptPubKey.size());
     61     CheckIsStandard(t);
     62 
     63     // MAX_OP_RETURN_RELAY+1-byte TxoutType::NULL_DATA (non-standard)
     64-    t.vout[0].scriptPubKey = CScript() << OP_RETURN << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef3804678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef3800"_hex_v_u8;
     65+    t.vout[0].scriptPubKey = CScript() << OP_RETURN << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef3804678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef3800"_hex;
     66     BOOST_CHECK_EQUAL(MAX_OP_RETURN_RELAY + 1, t.vout[0].scriptPubKey.size());
     67     CheckIsNotStandard(t, "scriptpubkey");
     68 
     69     // Data payload can be encoded in any way...
     70-    t.vout[0].scriptPubKey = CScript() << OP_RETURN << ""_hex_v_u8;
     71+    t.vout[0].scriptPubKey = CScript() << OP_RETURN << ""_hex;
     72     CheckIsStandard(t);
     73-    t.vout[0].scriptPubKey = CScript() << OP_RETURN << "00"_hex_v_u8 << "01"_hex_v_u8;
     74+    t.vout[0].scriptPubKey = CScript() << OP_RETURN << "00"_hex << "01"_hex;
     75     CheckIsStandard(t);
     76     // OP_RESERVED *is* considered to be a PUSHDATA type opcode by IsPushOnly()!
     77-    t.vout[0].scriptPubKey = CScript() << OP_RETURN << OP_RESERVED << -1 << 0 << "01"_hex_v_u8 << 2 << 3 << 4 << 5 << 6 << 7 << 8 << 9 << 10 << 11 << 12 << 13 << 14 << 15 << 16;
     78+    t.vout[0].scriptPubKey = CScript() << OP_RETURN << OP_RESERVED << -1 << 0 << "01"_hex << 2 << 3 << 4 << 5 << 6 << 7 << 8 << 9 << 10 << 11 << 12 << 13 << 14 << 15 << 16;
     79     CheckIsStandard(t);
     80-    t.vout[0].scriptPubKey = CScript() << OP_RETURN << 0 << "01"_hex_v_u8 << 2 << "ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"_hex_v_u8;
     81+    t.vout[0].scriptPubKey = CScript() << OP_RETURN << 0 << "01"_hex << 2 << "ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff"_hex;
     82     CheckIsStandard(t);
     83 
     84     // ...so long as it only contains PUSHDATA's
     85@@ -883,13 +883,13 @@ BOOST_AUTO_TEST_CASE(test_IsStandard)
     86 
     87     // Only one TxoutType::NULL_DATA permitted in all cases
     88     t.vout.resize(2);
     89-    t.vout[0].scriptPubKey = CScript() << OP_RETURN << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38"_hex_v_u8;
     90+    t.vout[0].scriptPubKey = CScript() << OP_RETURN << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38"_hex;
     91     t.vout[0].nValue = 0;
     92-    t.vout[1].scriptPubKey = CScript() << OP_RETURN << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38"_hex_v_u8;
     93+    t.vout[1].scriptPubKey = CScript() << OP_RETURN << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38"_hex;
     94     t.vout[1].nValue = 0;
     95     CheckIsNotStandard(t, "multi-op-return");
     96 
     97-    t.vout[0].scriptPubKey = CScript() << OP_RETURN << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38"_hex_v_u8;
     98+    t.vout[0].scriptPubKey = CScript() << OP_RETURN << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38"_hex;
     99     t.vout[1].scriptPubKey = CScript() << OP_RETURN;
    100     CheckIsNotStandard(t, "multi-op-return");
    101 
    102--- a/src/wallet/test/ismine_tests.cpp
    103+++ b/src/wallet/test/ismine_tests.cpp
    104@@ -684,7 +684,7 @@ BOOST_AUTO_TEST_CASE(ismine_standard)
    105         BOOST_CHECK(keystore.GetLegacyScriptPubKeyMan()->AddKey(keys[0]));
    106 
    107         scriptPubKey.clear();
    108-        scriptPubKey << OP_0 << "aabb"_hex_v_u8;
    109+        scriptPubKey << OP_0 << "aabb"_hex;
    110 
    111         result = keystore.GetLegacyScriptPubKeyMan()->IsMine(scriptPubKey);
    112         BOOST_CHECK_EQUAL(result, ISMINE_NO);
    113@@ -699,7 +699,7 @@ BOOST_AUTO_TEST_CASE(ismine_standard)
    114         BOOST_CHECK(keystore.GetLegacyScriptPubKeyMan()->AddKey(keys[0]));
    115 
    116         scriptPubKey.clear();
    117-        scriptPubKey << OP_16 << "aabb"_hex_v_u8;
    118+        scriptPubKey << OP_16 << "aabb"_hex;
    119 
    120         result = keystore.GetLegacyScriptPubKeyMan()->IsMine(scriptPubKey);
    121         BOOST_CHECK_EQUAL(result, ISMINE_NO);
    

    If more complicated changes are needed they could be followup commits or PRs.

  35. l0rinc renamed this:
    refactor: Generalize `CScript`'s `std::vector` push to accept `std::array`
    refactor: Allow `CScript`'s `operator<<` to accept spans, not just vectors
    on Sep 9, 2024
  36. l0rinc marked this as a draft on Sep 9, 2024
  37. l0rinc force-pushed on Sep 9, 2024
  38. in src/script/script.h:417 in 9b76c1a0b3 outdated
    412@@ -412,6 +413,32 @@ bool GetScriptOp(CScriptBase::const_iterator& pc, CScriptBase::const_iterator en
    413 /** Serialized script, used inside transaction inputs and outputs */
    414 class CScript : public CScriptBase
    415 {
    416+private:
    417+    inline void AppendDataSize(const size_t size)
    


    hodlinator commented at 7:55 pm on September 9, 2024:
    IMO Append -> Push to stay more consistent with the OP-codes themselves (if we keep this change of breaking out named methods from operator<<).

    l0rinc commented at 8:45 pm on September 9, 2024:
    It’s appending to the end - but if you say push is the accepted terminology, I’ll rename them, thanks.
  39. l0rinc force-pushed on Sep 9, 2024
  40. in src/script/script.h:433 in 872b016e89 outdated
    428+            insert(this->end(), std::cbegin(_data), std::cend(_data));
    429+        } else {
    430+            insert(this->end(), OP_PUSHDATA4);
    431+            value_type _data[4];
    432+            WriteLE32(_data, size);
    433+            insert(this->end(), std::cbegin(_data), std::cend(_data));
    


    hodlinator commented at 8:05 pm on September 9, 2024:

    Why this-> everywhere? Is your plan to change all code you touch to use this-> for clarity going forward?

    Like the static_casts and cbegin()/cend() though.


    l0rinc commented at 8:45 pm on September 9, 2024:
    You’re right, we don’t have an end parameter anymore so I’ll revert these back to just end(), thanks!
  41. in src/script/script.h:428 in 872b016e89 outdated
    423+            insert(this->end(), static_cast<value_type>(size));
    424+        } else if (size <= 0xffff) {
    425+            insert(this->end(), OP_PUSHDATA2);
    426+            value_type _data[2];
    427+            WriteLE16(_data, size);
    428+            insert(this->end(), std::cbegin(_data), std::cend(_data));
    


    hodlinator commented at 8:10 pm on September 9, 2024:
    nit: Don’t like the _-prefix of _data, but it’s from the original. _-prefixes otherwise signify something internal or reserved. Since you are touching all lines, maybe switch it to data/data2/ data16 + data/data4/data32?

    l0rinc commented at 8:45 pm on September 9, 2024:
    or just data since they don’t overlap.
  42. in src/test/script_tests.cpp:1371 in 4574e45a99 outdated
    1367@@ -1368,6 +1368,13 @@ static CScript ScriptFromHex(const std::string& str)
    1368     return ToScript(*Assert(TryParseHex(str)));
    1369 }
    1370 
    1371+BOOST_AUTO_TEST_CASE(script_hex_conversion)
    


    hodlinator commented at 8:15 pm on September 9, 2024:
    nit: Might call it something like script_byte_array_u8_vector_equivalence?

    l0rinc commented at 8:45 pm on September 9, 2024:
    done
  43. hodlinator commented at 8:36 pm on September 9, 2024: contributor

    Concept ACK 4574e45a99d8e11ccadd839f2ef8a80356e8955d.

    Sad to see my weird fix in #30765 (review) go /s, but also glad the workaround was so smooth. Hope that case won’t show up again or one of us remembers it.

  44. l0rinc force-pushed on Sep 9, 2024
  45. DrahtBot added the label CI failed on Sep 10, 2024
  46. DrahtBot commented at 9:08 am on September 10, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/29897924592

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  47. l0rinc force-pushed on Sep 10, 2024
  48. ryanofsky approved
  49. ryanofsky commented at 8:15 pm on September 10, 2024: contributor

    Code review ACK 48b67d8294eb62c5371e7dbc30b7cf811fb7e51c. This looks great! PR is much simpler now, and it is nice to see the GCC workaround expanded and documented so well.

    Would suggest a few changes, implemented in diff below:

    • Replacing PushDataSize size_t parameter with uint32_t parameter to document fact that it can’t accept larger sizes, and to call WriteLE32 without a type conversion.
    • Replacing PushData const value_type* data, size_t size parameters with std::span<const value_type> data because using span is more idiomatic and self-documenting.
    • Fixing grammar in operator<< comment so it doesn’t sound like std::byte is preferred for backwards compatibility.
    • Making the char operator call the std::byte operator to make it smaller, deduplicate code, and make it obvious the operators are equivalent.

    Suggested changes:

     0--- a/src/script/script.h
     1+++ b/src/script/script.h
     2@@ -414,7 +414,7 @@ bool GetScriptOp(CScriptBase::const_iterator& pc, CScriptBase::const_iterator en
     3 class CScript : public CScriptBase
     4 {
     5 private:
     6-    inline void PushDataSize(const size_t size)
     7+    inline void PushDataSize(const uint32_t size)
     8     {
     9         if (size < OP_PUSHDATA1) {
    10             insert(end(), static_cast<value_type>(size));
    11@@ -434,9 +434,9 @@ private:
    12         }
    13     }
    14 
    15-    void PushData(const value_type* data, size_t size)
    16+    void PushData(std::span<const value_type> data)
    17     {
    18-        insert(end(), data, data + size);
    19+        insert(end(), data.begin(), data.end());
    20     }
    21 
    22 protected:
    23@@ -493,16 +493,14 @@ public:
    24     CScript& operator<<(std::span<const std::byte> b) LIFETIMEBOUND
    25     {
    26         PushDataSize(b.size());
    27-        PushData(reinterpret_cast<const value_type*>(b.data()), b.size());
    28+        PushData({reinterpret_cast<const value_type*>(b.data()), b.size()});
    29         return *this;
    30     }
    31 
    32-    // For backwards compatibility, please prefer std::byte over uint8_t in new code
    33+    // For backwards compatibility. Please prefer std::byte over uint8_t in new code.
    34     CScript& operator<<(std::span<const value_type> b) LIFETIMEBOUND
    35     {
    36-        PushDataSize(b.size());
    37-        PushData(b.data(), b.size());
    38-        return *this;
    39+        return *this << std::as_bytes(b);
    40     }
    41 
    42     bool GetOp(const_iterator& pc, opcodetype& opcodeRet, std::vector<unsigned char>& vchRet) const
    
  50. DrahtBot requested review from hodlinator on Sep 10, 2024
  51. l0rinc commented at 10:20 pm on September 10, 2024: contributor

    Replacing PushDataSize size_t parameter with uint32_t parameter

    Good observation, thanks. I was being overly cautious with modifying that code.

    Making the char operator call the std::byte operator to make it smaller, deduplicate code, and make it obvious the operators are equivalent.

    My goal was to avoid extra casting and wrapping, especially in performance-critical areas like this.

    Right now, we avoid intermediary vectors, but with this change, we’d wrap the data in a std::span, recast via std::as_bytes, only to unwrap it again during insertion. While std::span is lightweight, these extra steps feel unnecessary here, especially in the case of operator<<(std::span<const value_type>), where we’re recasting and rewrapping data we already have in its final form, only to unwrap it again during insertion. It feels like overkill for something that could be done directly - it’s why I’ve extracted the size and data insertion. I don’t see that as repetition (especially compared to the insertion methods in prevector).

    I don’t like raw pointers either, so if you insist, I’ll change it to span, but I want to make sure you understand why I prefer the current version.

  52. l0rinc force-pushed on Sep 10, 2024
  53. hodlinator approved
  54. hodlinator commented at 3:01 pm on September 11, 2024: contributor

    ACK e6a5ab7637e60d3ae731c2ac854c170bc009ab99

    git range-diff master 4574e45 e6a5ab7

    a39f57f1a062eb6c3872a35defaf6d893df1d4a4

    Nice work on the GCC bug! Wish the example error in the commit message didn’t come from code that only becomes possible in later commit. Maybe worth at least noting that in the message?

    e6a5ab7637e60d3ae731c2ac854c170bc009ab99

    Commit message: “Replace _hex_v_u8 CScript appends to _hex” -> “Replace _hex_v_u8 CScript appends with _hex”

  55. DrahtBot requested review from ryanofsky on Sep 11, 2024
  56. l0rinc marked this as ready for review on Sep 11, 2024
  57. prevector: avoid GCC bogus warnings in insert method
    When compiling with GCC 12.2, both `-Warray-bounds` and `-Wstringop-overflow` warnings were triggered in the `prevector::insert` method during CScript prevector operations.
    
    GCC incorrectly assumed that operator new could modify the state of class members, leading to false positives during the memmove operation.
    
    Following the approach in https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=cca06f0d6d76b0, we introduced local copies for the destination pointer in memmove operations. This prevents GCC from misinterpreting memory manipulation as unsafe.
    
    A minimal reproducer triggering this issue in GCC 12.2 and passing in GCC 12.3 can be found at https://godbolt.org/z/8r9TKKoxv.
    
    -------
    
    Full error (with changes from the next commit as well):
    ```
    In file included from /ci_container_base/src/script/script.h:11,
                     from /ci_container_base/src/primitives/transaction.h:11,
                     from /ci_container_base/src/primitives/block.h:9,
                     from /ci_container_base/src/kernel/chainparams.h:11,
                     from /ci_container_base/src/kernel/chainparams.cpp:6:
    In member function ‘void prevector<N, T, Size, Diff>::fill(T*, InputIterator, InputIterator) [with InputIterator = const unsigned char*; unsigned int N = 28; T = unsigned char; Size = unsigned int; Diff = int]’,
        inlined from ‘void prevector<N, T, Size, Diff>::insert(iterator, InputIterator, InputIterator) [with InputIterator = const unsigned char*; unsigned int N = 28; T = unsigned char; Size = unsigned int; Diff = int]’ at /ci_container_base/src/prevector.h:395:13,
        inlined from ‘void CScript::AppendData(const prevector<28, unsigned char>::value_type*, size_t)’ at /ci_container_base/src/script/script.h:439:15,
        inlined from ‘CScript& CScript::operator<<(std::span<const std::byte>)’ at /ci_container_base/src/script/script.h:496:17,
        inlined from ‘CBlock CreateGenesisBlock(uint32_t, uint32_t, uint32_t, int32_t, const CAmount&)’ at /ci_container_base/src/kernel/chainparams.cpp:76:54:
    /ci_container_base/src/prevector.h:216:13: error: writing 65 bytes into a region of size 32 [-Werror=stringop-overflow=]
      216 |             new(static_cast<void*>(dst)) T(*first);
          |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /ci_container_base/src/kernel/chainparams.cpp: In function ‘CBlock CreateGenesisBlock(uint32_t, uint32_t, uint32_t, int32_t, const CAmount&)’:
    /ci_container_base/src/kernel/chainparams.cpp:76:49: note: destination object ‘<anonymous>’ of size 32
       76 |     const CScript genesisOutputScript = CScript() << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f"_hex << OP_CHECKSIG;
          |                                                 ^
    In file included from /usr/lib/gcc/x86_64-w64-mingw32/12-posix/include/c++/cstring:42,
                     from /ci_container_base/src/crypto/common.h:11,
                     from /ci_container_base/src/uint256.h:9,
                     from /ci_container_base/src/consensus/params.h:9,
                     from /ci_container_base/src/kernel/chainparams.h:9:
    In function ‘void* memmove(void*, const void*, size_t)’,
        inlined from ‘void prevector<N, T, Size, Diff>::insert(iterator, InputIterator, InputIterator) [with InputIterator = const unsigned char*; unsigned int N = 28; T = unsigned char; Size = unsigned int; Diff = int]’ at /ci_container_base/src/prevector.h:393:16,
        inlined from ‘void CScript::AppendData(const prevector<28, unsigned char>::value_type*, size_t)’ at /ci_container_base/src/script/script.h:439:15,
        inlined from ‘CScript& CScript::operator<<(std::span<const std::byte>)’ at /ci_container_base/src/script/script.h:496:17,
        inlined from ‘CBlock CreateGenesisBlock(uint32_t, uint32_t, uint32_t, int32_t, const CAmount&)’ at /ci_container_base/src/kernel/chainparams.cpp:76:54:
    /usr/share/mingw-w64/include/string.h:214:33: warning: ‘void* __builtin_memmove(void*, const void*, long long unsigned int)’ offset [65, 35] is out of the bounds [0, 32] of object ‘<anonymous>’ with type ‘CScript’ [-Warray-bounds]
      214 |   return __builtin___memmove_chk(__dst, __src, __n, __mingw_bos(__dst, 0));
          |          ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /ci_container_base/src/kernel/chainparams.cpp: In function ‘CBlock CreateGenesisBlock(uint32_t, uint32_t, uint32_t, int32_t, const CAmount&)’:
    /ci_container_base/src/kernel/chainparams.cpp:76:49: note: ‘<anonymous>’ declared here
       76 |     const CScript genesisOutputScript = CScript() << "04678afdb0fe5548271967f1a67130b7105cd6a828e03909a67962e0ea1f61deb649f6bc3f4cef38c4f35504e51ec112de5c384df7ba0b8d578a4c702b6bf11d5f"_hex << OP_CHECKSIG;
          |                                                 ^
    ```
    
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    c78d8ff4cb
  58. Allow CScript's operator<< to accept spans, not just vectors
    Extracted existing serialization to append size & data in separate private methods to clarify that it does more than just a simple data insertion.
    
    * the C style casts were changed to static_cast
    * `unsigned char` and `uint8_t` were changed to value_type for forward compatibility
    * `data + sizeof(data)` was changed to `std::cend`
    * data insertion (in AppendData) relies on pointer arithmetic now to enable both `std::span<const value_type>` and `std::span<const std::byte>` operators
    * use uint32_t for data size instead of size_t
    * used span instead of raw pointers in the new methods
    
    Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
    Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    cac846c2fb
  59. Replace CScript _hex_v_u8 appends with _hex
    This will skip vector conversion before serializing to the prevector in CScript.
    5e190cd11f
  60. in src/script/script.h:440 in e6a5ab7637 outdated
    435+    }
    436+
    437+    void PushData(const value_type* data, uint32_t size)
    438+    {
    439+        insert(end(), data, data + size);
    440+    }
    


    maflcko commented at 3:20 pm on September 11, 2024:

    not sure about the naming here. This PushData does not push any data (at least not in a sense what the OP_PUSHDATA* are referring to). InsertData may be a better name. However, this would also be confusing, because the method is private and just a wrapper around the public insert method. It seems easier to just call insert directly instead of creating another wrapper.

    Also, it may be easier to review to leave the code where it was, instead of moving it and coming up with new function names. The “deprecated” wrapper could be a simple return *this << std::as_bytes(b); which compilers should be able to optimize.


    l0rinc commented at 3:23 pm on September 11, 2024:

    It seems easier to just call insert directly instead of creating another wrapper.

    This always adds element to the end(), so it’s either an append or a push or similar.

    The “deprecated” wrapper could be a simple return *this « std::as_bytes(b); which compilers should be able to optimize.

    Please see my arguments against that in #30765 (comment)


    sipa commented at 3:32 pm on September 11, 2024:

    Pushing has a specific meaning in the context of Bitcoin Script (it’s an opcode that pushes data onto the stack), and that’s not what’s happening here so that would be very confusing.

    I don’t understand the argument you link to. *this << std::as_bytes(b); is well-defined, obviously correct, concise, and trivially optimized out.


    l0rinc commented at 3:42 pm on September 11, 2024:

    I’ve called it Append* originally but Push* was requested. Thanks, renamed back.

    *this << std::as_bytes(b); is well-defined, obviously correct, concise, and trivially optimized out.

    Seems there’s an overwhelming majority for this, thanks for the inputs, done!


    hodlinator commented at 9:05 pm on September 11, 2024:

    Pushing has a specific meaning in the context of Bitcoin Script (it’s an opcode that pushes data onto the stack), and that’s not what’s happening here so that would be very confusing.

    The new pair of private functions together implement OP_PUSHDATA, and are only used in that context, that’s the reason I recommended the naming.

  61. l0rinc force-pushed on Sep 11, 2024
  62. ryanofsky approved
  63. ryanofsky commented at 4:20 pm on September 11, 2024: contributor

    Code review ACK 5e190cd11f6ed8b7ab4db0f01192de63deaf6fd7. Looks good!

    Since last review, just renamed push methods to append, and applied a few other suggested tweaks

  64. DrahtBot requested review from hodlinator on Sep 11, 2024
  65. in src/script/script.h:496 in 5e190cd11f
    516-            WriteLE32(_data, b.size());
    517-            insert(end(), _data, _data + sizeof(_data));
    518-        }
    519-        insert(end(), b.begin(), b.end());
    520+        AppendDataSize(b.size());
    521+        AppendData({reinterpret_cast<const value_type*>(b.data()), b.size()});
    


    maflcko commented at 5:15 pm on September 11, 2024:

    Is there a reason to extract those function? I think it would be better to not extract them, because:

    • The diff would be smaller
    • They are only used and called once
    • There shouldn’t be a reason for them to be called from somewhere else internally, because operator<< is available internally and externally
    • If someone wanted to use them externally (for whatever reason), they’d have to be moved again, because they are private right now.

    Seems easier to extract them and possibly make them public when there is need, instead of repeatedly changing the code.


    l0rinc commented at 5:34 pm on September 11, 2024:

    Not reuse, but to obviate that the method does two separate things: complex logic for how to serialize a number, followed by inserting bytes - so basically a separation of concerns.

    And also #30765 (review) indicated that operators may not be here to stay, so it makes sense to unburden them early.

  66. hodlinator approved
  67. hodlinator commented at 9:31 pm on September 11, 2024: contributor

    re-ACK 5e190cd11f6ed8b7ab4db0f01192de63deaf6fd7

    git range-diff master e6a5ab7 5e190cd

    • Disagree with AppendData/AppendDataSize names over PushData/PushDataSize, but will let it go.
    • Nice switch back from pointer arithmetic to span in AppendData.
    • Touched up commit messages per my latest review, cheers!
    • Implements legacy operator<< in terms of the other. I agree with l0rinc’s reservations against the added casting uint8_t->std::byte->uint8_t until CScript::value_type becomes std::byte, but shouldn’t cause any runtime overhead.
  68. DrahtBot removed the label CI failed on Sep 13, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-20 01:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me