util: Safer MakeByteSpan with ByteSpanCast #27973

pull maflcko wants to merge 2 commits into bitcoin:master from maflcko:2306-byte-span- changing 5 files +22 −16
  1. maflcko commented at 8:40 am on June 26, 2023: member

    The AsBytes helpers (or std::as_bytes helpers) are architecture dependent. For example, the below test case diff [0], applied before this commit will pass on x86_64, but fail on s390x [1].

    Fix this by replacing AsBytes with a new ByteSpanCast in the MakeByteSpan helper. This will turn the test case diff into a compile failure instead of a runtime error.

    [0] test case diff:

     0diff --git a/src/test/serialize_tests.cpp b/src/test/serialize_tests.cpp
     1index 09f77d2b61..4e56ffa351 100644
     2--- a/src/test/serialize_tests.cpp
     3+++ b/src/test/serialize_tests.cpp
     4@@ -241,6 +241,15 @@ BOOST_AUTO_TEST_CASE(class_methods)
     5     ss2 << intval << boolval << stringval << charstrval << txval;
     6     ss2 >> methodtest3;
     7     BOOST_CHECK(methodtest3 == methodtest4);
     8+
     9+    {
    10+        DataStream out;
    11+        std::vector<uint8_t> in_8{0x00, 0x11, 0x22, 0x33};
    12+        std::vector<uint16_t> in_16{0xaabb, 0xccdd};
    13+        out.write(MakeByteSpan(in_8));
    14+        out.write(MakeByteSpan(in_16));
    15+        BOOST_CHECK_EQUAL(HexStr(out), "00112233bbaaddcc");
    16+    }
    17 }
    18
    19 BOOST_AUTO_TEST_SUITE_END()
    

    [1] test case failure on s390x: test/serialize_tests.cpp(251): error: in "serialize_tests/class_methods": check HexStr(out) == "00112233bbaaddcc" has failed [00112233aabbccdd != 00112233bbaaddcc]

  2. DrahtBot commented at 8:40 am on June 26, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #27479 (BIP324: ElligatorSwift integrations by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot renamed this:
    util: Safer MakeByteSpan with ByteSpanCast
    util: Safer MakeByteSpan with ByteSpanCast
    on Jun 26, 2023
  4. DrahtBot added the label Utils/log/libs on Jun 26, 2023
  5. DrahtBot added the label CI failed on Jun 26, 2023
  6. DrahtBot removed the label CI failed on Jun 26, 2023
  7. ryanofsky commented at 4:05 pm on June 26, 2023: contributor

    fad126dae2bc654a28b0a77e35aa43cae42d1e0c seems like a nice improvement. It makes existing MakeByteSpan function safer by only allowing it to be called on arrays of characters, and not array of other things like uint16_t that could have platform specific representations. Some thoughts though:

    • MakeByteSpan probably deserves to have some documentation since it is one of the more useful function in that file and is called a lot of places.
    • I don’t like introducing the ByteSpanCast function. It doesn’t seem useful except as an implementation detail, and I think it’d be nicer if we got rid of some confusing sounding functions instead of adding another one.
    • I don’t like adding another use of the AsBytePtr function. As mentioned in #27927 I think AsBytePtr is unsafe, and less clear than reinterpret_cast.

    Would suggest tweaking the PR as follows:

     0--- a/src/span.h
     1+++ b/src/span.h
     2@@ -270,25 +270,28 @@ inline const unsigned char* UCharCast(const std::byte* c) { return reinterpret_c
     3 
     4 // Helper function to safely convert a Span to a Span<[const] unsigned char>.
     5 template <typename T> constexpr auto UCharSpanCast(Span<T> s) -> Span<typename std::remove_pointer<decltype(UCharCast(s.data()))>::type> { return {UCharCast(s.data()), s.size()}; }
     6-// Helper function to safely convert a Span to a Span<[const] std::byte>.
     7-template <typename B>
     8-auto ByteSpanCast(Span<B> s)
     9-{
    10-    return Span{AsBytePtr(UCharCast(s.data())), s.size_bytes()}; // Use UCharCast to enforce B is a byte-like type.
    11-}
    12 
    13 /** Like the Span constructor, but for (const) unsigned char member types only. Only works for (un)signed char containers. */
    14 template <typename V> constexpr auto MakeUCharSpan(V&& v) -> decltype(UCharSpanCast(Span{std::forward<V>(v)})) { return UCharSpanCast(Span{std::forward<V>(v)}); }
    15 
    16+// Helper functions to turn arrays of characters into std::byte spans. Usable on
    17+// C++ std::string, std::string_view, and std::vector<char> objects, as well as
    18+// on C string literals and C fixed-size char[] arrays.
    19+//
    20+// Note that when constructing a span directly from a C string literal, the span
    21+// will include the \0 null byte. This can be avoided by constructing the span
    22+// indirectly through std::string_view.
    23 template <typename V>
    24 Span<const std::byte> MakeByteSpan(V&& v) noexcept
    25 {
    26-    return ByteSpanCast(Span{std::forward<V>(v)});
    27+    Span data{std::forward<V>(v)};
    28+    return AsBytes(Span{UCharCast(data.data()), data.size_bytes()});
    29 }
    30 template <typename V>
    31 Span<std::byte> MakeWritableByteSpan(V&& v) noexcept
    32 {
    33-    return ByteSpanCast(Span{std::forward<V>(v)});
    34+    Span data{std::forward<V>(v)};
    35+    return AsWritableBytes(Span{UCharCast(data.data()), data.size_bytes()});
    36 }
    37 
    38 #endif // BITCOIN_SPAN_H
    
  8. DrahtBot added the label Needs rebase on Jun 26, 2023
  9. DrahtBot commented at 10:58 pm on June 26, 2023: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  10. maflcko commented at 7:36 am on June 27, 2023: member

    Maybe MakeByteSpan should just be removed?

    • Currently it (as well as Span) accepts a range that is not borrowed (see https://en.cppreference.com/w/cpp/ranges/borrowed_range), leading to potential lifetime issues. Someone should check if making the Span deduction guidelines safer will automatically make the MakeByteSpan function safer.
    • Given that std::as_bytes is in the standard library and we can’t prevent devs from using it (or AsBytes) at compile time, it seems pointless trying to offer a safe MakeByteSpan wrapper with only a docstring suggesting to use it. It may be better to just write a clang-tidy plugin to review the code to disallow calling AsBytes on a span of uint16_t (or similar)?
  11. maflcko marked this as a draft on Jun 27, 2023
  12. ryanofsky commented at 3:03 pm on June 27, 2023: contributor

    Maybe MakeByteSpan should just be removed?

    I don’t think I agree. MakeByteSpan(x) is just equivalent to AsBytes(Span{x}) currently, so right now it is pointless. But if we go ahead with this PR and restrict it to only work on character arrays, that will make it safer and actually useful.

    it seems pointless trying to offer a safe MakeByteSpan wrapper with only a docstring suggesting to use it.

    I don’t think it’s pointless to have a safe function even though a dangerous alternative exists. If I’m writing or reviewing code that uses a safe implementation of MakeByteSpan I can be more confident it is doing the right thing than if I see code using AsBytes. Just like if I’m reviewing code that is using UCharCast I can be more confident in it than code using an (unsigned char) cast.

    I think it would be a good thing to discourage use of AsBytes and provide some safer alternative where possible. That alternative doesn’t have to be MakeByteSpan, but if we have a MakeByteSpan function already and can document it and make it safer, I think it’d be good to do that.

  13. maflcko commented at 3:12 pm on June 27, 2023: member

    If I’m writing or reviewing code that uses a safe implementation of MakeByteSpan I can be more confident it is doing the right thing than if I see code using AsBytes.

    Would you be less confident if there was a clang-tidy plugin doing the check? For example, instead of having two functions that do a similar thing, but it is unclear which one is safer/preferred from just looking at the name, we could have an additional UnsafeAsBytes (or similar), where it is clear that using it without looking up what it does is unsafe.

  14. ryanofsky commented at 3:40 pm on June 27, 2023: contributor

    If I’m writing or reviewing code that uses a safe implementation of MakeByteSpan I can be more confident it is doing the right thing than if I see code using AsBytes.

    Would you be less confident if there was a clang-tidy plugin doing the check? For example, instead of having two functions that do a similar thing, but it is unclear which one is safer/preferred from just looking at the name, we could have an additional UnsafeAsBytes (or similar), where it is clear that using it without looking up what it does is unsafe.

    I think that sounds fine, though I’m not sure what the clang plugin adds exactly. If the goal is just to forbid AsBytes from being called on non-char arrays, we could change the definition of AsBytes to do that (probably using UCharCast internally), and provide an additional UnsafeAsBytes escape hatch for the cases where that doesn’t work. But maybe a clang plugin could be useful to prevent std::as_bytes calls?.

    I just think as long as we have a MakeByteSpan function and can make it safer, it seems good to do that. Or at least that doing that would be better than getting rid of MakeByteSpan and replacing it with calls to AsBytes when AsBytes is not very safe right now. But if there’s something that can we can do to make AsBytes safer, I’d agree there be little use for a safer MakeByteSpan

  15. sipa commented at 3:51 pm on June 27, 2023: member
    FWIW, I don’t believe AsBytes (or std::as_bytes) are ever undefined behavior; every object has a memory representation, which you’re allowed to observe. It is of course always implementation-defined behavior when anything but byte and byte-like arrays are involved, but the point of these functions is allowing access to that.
  16. ryanofsky commented at 4:06 pm on June 27, 2023: contributor

    FWIW, I don’t believe AsBytes (or std::as_bytes) are ever undefined behavior

    Yes that should be true as long as span is pointing to live memory. I should replace “platform specific or undefined” with “implementation specific” in the #27978 change description

  17. PRADACANDI18 commented at 6:41 pm on June 28, 2023: none
    General feed back have I have not given approval just yet on a merge. I have to review some more
  18. test: Use shorter explicit Span construction
    This removes bloat that is not needed.
    4f521747c0
  19. Safer MakeByteSpan with ByteSpanCast
    The AsBytes helpers (or std::as_bytes helpers) are architecture
    dependent. For example, the below test case diff [0], applied before
    this commit will pass on x86_64, but fail on s390x [1].
    
    Fix this by replacing AsBytes with a new ByteSpanCast where possible.
    
    [0] test case diff:
    
    ```
    diff --git a/src/test/serialize_tests.cpp b/src/test/serialize_tests.cpp
    index 09f77d2b61..4e56ffa351 100644
    --- a/src/test/serialize_tests.cpp
    +++ b/src/test/serialize_tests.cpp
    @@ -241,6 +241,15 @@ BOOST_AUTO_TEST_CASE(class_methods)
         ss2 << intval << boolval << stringval << charstrval << txval;
         ss2 >> methodtest3;
         BOOST_CHECK(methodtest3 == methodtest4);
    +
    +    {
    +        DataStream out;
    +        std::vector<uint8_t> in_8{0x00, 0x11, 0x22, 0x33};
    +        std::vector<uint16_t> in_16{0xaabb, 0xccdd};
    +        out.write(MakeByteSpan(in_8));
    +        out.write(MakeByteSpan(in_16));
    +        BOOST_CHECK_EQUAL(HexStr(out), "00112233bbaaddcc");
    +    }
     }
    
     BOOST_AUTO_TEST_SUITE_END()
    ```
    
    [1] test case failure on s390x:
    test/serialize_tests.cpp(251): error: in "serialize_tests/class_methods": check HexStr(out) == "00112233bbaaddcc" has failed [00112233aabbccdd != 00112233bbaaddcc]
    3c4891044a
  20. maflcko force-pushed on Jun 29, 2023
  21. maflcko commented at 8:27 am on June 29, 2023: member
    Closing for now to allow for more brainstorming
  22. maflcko closed this on Jun 29, 2023

  23. maflcko deleted the branch on Jun 29, 2023
  24. achow101 referenced this in commit 561915f35f on Jun 29, 2023
  25. sidhujag referenced this in commit efd2416d43 on Jun 30, 2023
  26. bitcoin locked this on Jun 28, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-15 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me