util: Allow std::byte and char Span serialization #27927

pull maflcko wants to merge 2 commits into bitcoin:master from maflcko:2306-ser-span- changing 11 files +36 −26
  1. maflcko commented at 2:30 pm on June 21, 2023: member
    Seems odd to require developers to cast all byte-like spans passed to serialization to unsigned char-spans. Fix that by passing and accepting byte-like spans as-is. Finally, add tests and update the code to use just Span where possible.
  2. DrahtBot commented at 2:30 pm on June 21, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK ryanofsky, sipa, achow101
    Concept ACK theuni

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. DrahtBot renamed this:
    util: Allow std::byte and char Span serialization
    util: Allow std::byte and char Span serialization
    on Jun 21, 2023
  4. DrahtBot added the label Utils/log/libs on Jun 21, 2023
  5. maflcko force-pushed on Jun 21, 2023
  6. in src/span.h:288 in faa17da7ef outdated
    283 template <typename T> constexpr auto UCharSpanCast(Span<T> s) -> Span<typename std::remove_pointer<decltype(UCharCast(s.data()))>::type> { return {UCharCast(s.data()), s.size()}; }
    284+// Helper function to safely convert a Span to a Span<[const] std::byte>.
    285+template <typename B>
    286+auto ByteSpanCast(Span<B> s)
    287+{
    288+    // Use UCharCast to enforce B is a byte-like type.
    


    theuni commented at 5:55 pm on June 21, 2023:
    Use static_assert or enable_if to make this more obvious to the caller?

    maflcko commented at 6:29 pm on June 21, 2023:

    I think it should already be obvious to the developer? static_assert or enable_if in this context can only turn one compile failure error message into another compile failure error message, because there are no other overloads for ByteSpanCast and I don’t expect there to ever be ones. As this is not (standard) library code, any error message should be fine.

    Currently I get:

    0./span.h:289:27: error: no matching function for call to 'UCharCast'
    1    return Span{AsBytePtr(UCharCast(s.data())), s.size_bytes()};
    2                          ^~~~~~~~~
    3note: in instantiation of function template specialization 'ByteSpanCast<const unsigned long>'...
    

    Happy to move the code comment you commented on the same line as the return to include it in the error message if you think it helps?


    theuni commented at 6:34 pm on June 21, 2023:

    It’s only obvious because the implementation happens to be just below the definition :)

    But sure, I guess that’s good enough.

  7. in src/serialize.h:205 in faa17da7ef outdated
    201@@ -201,8 +202,7 @@ template<typename Stream> inline void Serialize(Stream& s, int64_t a ) { ser_wri
    202 template<typename Stream> inline void Serialize(Stream& s, uint64_t a) { ser_writedata64(s, a); }
    203 template<typename Stream, int N> inline void Serialize(Stream& s, const char (&a)[N]) { s.write(MakeByteSpan(a)); }
    204 template<typename Stream, int N> inline void Serialize(Stream& s, const unsigned char (&a)[N]) { s.write(MakeByteSpan(a)); }
    205-template<typename Stream> inline void Serialize(Stream& s, const Span<const unsigned char>& span) { s.write(AsBytes(span)); }
    206-template<typename Stream> inline void Serialize(Stream& s, const Span<unsigned char>& span) { s.write(AsBytes(span)); }
    207+template <typename Stream, typename B> void Serialize(Stream& s, Span<B> span) { s.write(ByteSpanCast(span)); }
    


    theuni commented at 5:57 pm on June 21, 2023:
    These seem too greedy I think. I guess you’re counting on ByteSpanCast to enforce that it’s byte-like?

    maflcko commented at 6:29 pm on June 21, 2023:

    I guess you’re counting on ByteSpanCast to enforce that it’s byte-like?

    Yes, if ByteSpanCast accepts non-byte spans, there are likely bugs in other places as well, which should be fixed.

    These seem too greedy I think.

    I think it is fine, because non-byte spans can not be serialized without a formatter for the elements. Happy to add a comment here as well, if you think it helps?


    theuni commented at 6:48 pm on June 21, 2023:
    Just a difference of opinion/style I guess. I’d rather not consider the function in the first place, but I suppose anything that causes a compile failure wfm.

    maflcko commented at 7:06 pm on June 21, 2023:
    C++20 Constraints and Concepts is probably going to fix the compiler errors in the long run. May even be less than 5 years until we can switch to that.
  8. theuni commented at 6:00 pm on June 21, 2023: member
    Concept ACK
  9. ryanofsky commented at 8:00 pm on June 21, 2023: contributor

    Concept ACK. It makes sense to allow Span<std::byte> to serialize fixed size blobs, like Span<unsigned char> already does so there is no need for serialize callers to cast to unsigned char if they are already using std::byte.

    I also think it is nice that the implementation uses a Span<T> parameter type instead of Span<std::byte> or Span<unsigned char> so you can tell just by looking at the Serialize declaration that an actual Span argument needs to be passed, not some argument that can be implicitly converted to span. It is true as you pointed out in your earlier comment #27790 (review) that the presence of the Serialize(Stream& os, const T& a) overload below already prevents implicit conversions, but it’s nice if you can look directly at a declaration and see that it doesn’t allow implicit conversions, without having to look at other overloads.

    Looking at surrounding code, I also think there are some followups that could be done to make it cleaner / safer / nicer (but would probably save these for a followup PR):

    1. I think it’s not ideal that ByteSpanCast is using UCharCast cast internally. Now code is using std::byte, I think would be good to drop the UCharCast function and replace it with a similar function that accepts char, unsigned char and std::byte pointers, but returns a std::byte pointer rather than an unsigned char pointer. Doing this would remove an unnecessary series of casts from byte to char and back to byte again in ByteSpanCast, and also just generally move code further in the direction of using std::byte over char
    2. I think it might be good to refuse the serialize Span<char> and Span<unsigned char> arguments, and only serialize Span<std::byte> arguments as fixed size blobs. Again it would push code more in direction of using std::byte over char for binary data. It could also perhaps avoid cases where someone is trying to serialize a variable sized string and ends up through some misunderstood conversions accidentally serializing a fixed size span.
    3. I think it would be good to drop the AsBytePtr function, which looks like a safe conversion but actually can cast any pointer to a byte pointer, is not actually safe or useful to call on most object types. I think if suggestion 1 above is implemented, some uses of AsBytePtr would no longer be necessary, and the other uses would be clearer if they just used reinterpret_cast directly instead of AsBytePtr
  10. maflcko commented at 8:07 am on June 22, 2023: member

    I think it's not ideal that ByteSpanCast is using UCharCast cast internally. Now code is using std::byte, I think would be good to drop the UCharCast function and replace it with a similar function that accepts char, unsigned char and std::byte pointers, but returns a std::byte pointer rather than an unsigned char pointer. Doing this would remove an unnecessary series of casts from byte to char and back to byte again in ByteSpanCast, and also just generally move code further in the direction of using std::byte over char

    I thought about this, but concluded it isn’t worth it. UCharCast won’t be going away, because it is needed to speak with libbitcoinconsensus or libsecp256k1, see for example the conflicting pull. So adding the same 8 function to only differ in return type seems bloaty and verbose. However, I agree with you that new code shouldn’t use UCharCast unless necessary.

    I think it might be good to refuse the serialize Span<char> and Span<unsigned char> arguments, and only serialize Span<std::byte> arguments as fixed size blobs. Again it would push code more in direction of using std::byte over char for binary data. It could also perhaps avoid cases where someone is trying to serialize a variable sized string and ends up through some misunderstood conversions accidentally serializing a fixed size span.

    This may be a bit too much hand-holding, with a small benefit. Also, it would make code more bloaty, as there are quite a few cases in the current code base where a std::string(_view) is serialized via Span<char>, or a view on unsigned char is serialized via Span<unsigned char>.

    Also, Span<B> could be used internally to serialize byte-holding (pre)vectors. In fact this is likely a performance bug-fix, because currently, serialization accepts std::vector<B> for B=signed char, or B=int8_t. (B=char is deleted). However, serialization seems to only apply the optimization to only call write once for B=unsigned char. See https://github.com/bitcoin/bitcoin/blob/f1b4975461364d5d40d2bfafc6b165dd5d7eec30/src/serialize.h#L651 It may be good to treat all byte-like types the same, which can be achieved with Span<B>, or alternatively also delete the other B-types that are not special-cased, or alternatively delete the special-case code to treat all vectors the same.

    I think it would be good to drop the AsBytePtr function, which looks like a safe conversion but actually can cast any pointer to a byte pointer, is not actually safe or useful to call on most object types. I think if suggestion 1 above is implemented, some uses of AsBytePtr would no longer be necessary, and the other uses would be clearer if they just used reinterpret_cast directly instead of AsBytePtr

    I don’t think this is possible either. It is required to speak with libsqlite3, libbdb, and internally in serialize.h. I think we should instead think about nuking the AsBytes span helpers. While they are ported from the standard library, I don’t think they are appropriate for us, because they are likely architecture dependent when it comes to endian-ness.

  11. maflcko force-pushed on Jun 26, 2023
  12. maflcko marked this as a draft on Jun 26, 2023
  13. maflcko force-pushed on Jun 26, 2023
  14. DrahtBot added the label CI failed on Jun 26, 2023
  15. maflcko commented at 9:19 am on June 26, 2023: member
    Rebased on #27973 for now, please review that first.
  16. DrahtBot removed the label CI failed on Jun 26, 2023
  17. DrahtBot added the label Needs rebase on Jun 26, 2023
  18. util: Allow std::byte and char Span serialization fa257bc831
  19. Use only Span{} constructor for byte-like types where possible
    This removes bloat that is not needed.
    fa38d86235
  20. maflcko force-pushed on Jun 27, 2023
  21. maflcko marked this as ready for review on Jun 27, 2023
  22. maflcko commented at 9:30 am on June 27, 2023: member
    Did a rebase on clean master to drop the unneeded dependency.
  23. DrahtBot removed the label Needs rebase on Jun 27, 2023
  24. ryanofsky approved
  25. ryanofsky commented at 2:41 pm on June 27, 2023: contributor
    Code review ACK fa38d862358b87219b12bf31236c52f28d9fc5d6. This looks great. The second commit really removes a lot of boilerplate and shows why the first commit is useful.
  26. ryanofsky commented at 2:42 pm on June 27, 2023: contributor

    Fix that by introducing a helper to convert any byte-like span to a std::byte-span, which is then accepted by serialization.

    Can drop this line from PR description (https://github.com/bitcoin/bitcoin/pull/27927#issue-1767744286)

  27. ryanofsky commented at 3:18 pm on June 27, 2023: contributor

    re: #27927 (comment)

    Thanks for the responses. You convinced me there probably little benefit to preventing serialization of char spans, and it would just add boilerplate and not make things safer.

    I do think we can drop the AsBytePtr function, though, so I tried to do that in #27978.

    I also still think in the longer run it would be good to convert more internal code to use std::byte instead of unsigned char and replace UCharCast with a ByteCast alternative and MakeUCharSpan with a safer MakeUCharSpan alternative. If code calling leveldb or libsecp needs uchar functions, it implement them locally, or we could keep existing functions, but implement them in terms of ByteCast. Not suggesting to do this now. I just think we might end up doing it when more code uses std::byte.

  28. sipa commented at 3:45 pm on June 27, 2023: member

    utACK fa38d862358b87219b12bf31236c52f28d9fc5d6

    I do wonder: is there any reason why we wouldn’t want to support serializing Spans in general (where its serialization is defined as the concatenation of the serialization of its elements)?

  29. maflcko commented at 3:54 pm on June 27, 2023: member

    I do wonder: is there any reason why we wouldn’t want to support serializing Spans in general (where its serialization is defined as the concatenation of the serialization of its elements)?

    I think that should be fine to add, once there is a use case. Or is there already one?

  30. achow101 commented at 7:01 pm on June 28, 2023: member
    ACK fa38d862358b87219b12bf31236c52f28d9fc5d6
  31. achow101 merged this on Jun 28, 2023
  32. achow101 closed this on Jun 28, 2023

  33. maflcko deleted the branch on Jun 29, 2023
  34. in src/serialize.h:220 in fa38d86235
    216@@ -217,10 +217,11 @@ template<typename Stream> inline void Unserialize(Stream& s, int64_t& a ) { a =
    217 template<typename Stream> inline void Unserialize(Stream& s, uint64_t& a) { a = ser_readdata64(s); }
    218 template<typename Stream, int N> inline void Unserialize(Stream& s, char (&a)[N]) { s.read(MakeWritableByteSpan(a)); }
    219 template<typename Stream, int N> inline void Unserialize(Stream& s, unsigned char (&a)[N]) { s.read(MakeWritableByteSpan(a)); }
    220-template<typename Stream> inline void Unserialize(Stream& s, Span<unsigned char>& span) { s.read(AsWritableBytes(span)); }
    221+template <typename Stream, typename B> void Unserialize(Stream& s, Span<B> span) { (void)/* force byte-type */UCharCast(span.data()); s.read(AsWritableBytes(span)); }
    


    maflcko commented at 10:16 am on June 29, 2023:

    Just as another follow-up comment to the earlier question if this is too greedy: Yes, it is too greedy, but will actually print a nicer error message compared to using enable_if or concepts, because of the general serialization catch-all fallback to the member method.

    For example, the following test diff would fail with the general error message for a non-greedy template:

     0diff --git a/src/test/serialize_tests.cpp b/src/test/serialize_tests.cpp
     1index b445ff8ffc..73b9e63011 100644
     2--- a/src/test/serialize_tests.cpp
     3+++ b/src/test/serialize_tests.cpp
     4@@ -245,10 +245,8 @@ BOOST_AUTO_TEST_CASE(class_methods)
     5         DataStream ds;
     6         const std::string in{"ab"};
     7         ds << Span{in};
     8-        std::array<std::byte, 2> out;
     9+        std::array<uint16_t, 2> out;
    10         ds >> Span{out};
    11-        BOOST_CHECK_EQUAL(out.at(0), std::byte{'a'});
    12-        BOOST_CHECK_EQUAL(out.at(1), std::byte{'b'});
    13     }
    14 }
    15 
    
     0  CXX      test/test_bitcoin-serialize_tests.o
     1In file included from test/serialize_tests.cpp:5:
     2In file included from ./hash.h:14:
     3./serialize.h:706:7: error: no member named 'Unserialize' in 'Span<unsigned short>'
     4    a.Unserialize(is);
     5    ~ ^
     6./streams.h:308:11: note: in instantiation of function template specialization 'Unserialize<DataStream, Span<unsigned short> &>' requested here
     7        ::Unserialize(*this, obj);
     8          ^
     9test/serialize_tests.cpp:249:12: note: in instantiation of function template specialization 'DataStream::operator>><Span<unsigned short>>' requested here
    10        ds >> Span{out};
    11           ^
    121 error generated.
    13make[2]: *** [Makefile:19030: test/test_bitcoin-serialize_tests.o] Error 1
    

    I tested this with concepts (C++20), but the result should be the same for C++17 enable_if:

     0diff --git a/src/serialize.h b/src/serialize.h
     1index 348a6ae4f1..5c775458ae 100644
     2--- a/src/serialize.h
     3+++ b/src/serialize.h
     4@@ -202,7 +202,7 @@ template<typename Stream> inline void Serialize(Stream& s, int64_t a ) { ser_wri
     5 template<typename Stream> inline void Serialize(Stream& s, uint64_t a) { ser_writedata64(s, a); }
     6 template<typename Stream, int N> inline void Serialize(Stream& s, const char (&a)[N]) { s.write(MakeByteSpan(a)); }
     7 template<typename Stream, int N> inline void Serialize(Stream& s, const unsigned char (&a)[N]) { s.write(MakeByteSpan(a)); }
     8-template <typename Stream, typename B> void Serialize(Stream& s, Span<B> span) { (void)/* force byte-type */UCharCast(span.data()); s.write(AsBytes(span)); }
     9+template <typename Stream, ByteType B> void Serialize(Stream& s, Span<B> span) { s.write(AsBytes(span)); }
    10 
    11 #ifndef CHAR_EQUALS_INT8
    12 template <typename Stream> void Unserialize(Stream&, char) = delete; // char serialization forbidden. Use uint8_t or int8_t
    13@@ -217,7 +217,7 @@ template<typename Stream> inline void Unserialize(Stream& s, int64_t& a ) { a =
    14 template<typename Stream> inline void Unserialize(Stream& s, uint64_t& a) { a = ser_readdata64(s); }
    15 template<typename Stream, int N> inline void Unserialize(Stream& s, char (&a)[N]) { s.read(MakeWritableByteSpan(a)); }
    16 template<typename Stream, int N> inline void Unserialize(Stream& s, unsigned char (&a)[N]) { s.read(MakeWritableByteSpan(a)); }
    17-template <typename Stream, typename B> void Unserialize(Stream& s, Span<B> span) { (void)/* force byte-type */UCharCast(span.data()); s.read(AsWritableBytes(span)); }
    18+template <typename Stream, ByteType B> void Unserialize(Stream& s, Span<B> span) { s.read(AsWritableBytes(span)); }
    19 
    20 template <typename Stream> inline void Serialize(Stream& s, bool a) { uint8_t f = a; ser_writedata8(s, f); }
    21 template <typename Stream> inline void Unserialize(Stream& s, bool& a) { uint8_t f = ser_readdata8(s); a = f; }
    22diff --git a/src/span.h b/src/span.h
    23index c98784aee4..dbee63ba77 100644
    24--- a/src/span.h
    25+++ b/src/span.h
    26@@ -271,18 +271,24 @@ Span<std::byte> MakeWritableByteSpan(V&& v) noexcept
    27     return AsWritableBytes(Span{std::forward<V>(v)});
    28 }
    29 
    30-// Helper functions to safely cast to unsigned char pointers.
    31+// Helper functions to safely cast a byte-like type to unsigned char pointers.
    32 inline unsigned char* UCharCast(char* c) { return (unsigned char*)c; }
    33 inline unsigned char* UCharCast(unsigned char* c) { return c; }
    34 inline unsigned char* UCharCast(std::byte* c) { return (unsigned char*)c; }
    35 inline const unsigned char* UCharCast(const char* c) { return (unsigned char*)c; }
    36 inline const unsigned char* UCharCast(const unsigned char* c) { return c; }
    37 inline const unsigned char* UCharCast(const std::byte* c) { return reinterpret_cast<const unsigned char*>(c); }
    38+// Helper concept for byte-like types.
    39+template <typename B>
    40+concept ByteType = requires
    41+{
    42+    UCharCast(Span<B>{}.data());
    43+};
    44 
    45 // Helper function to safely convert a Span to a Span<[const] unsigned char>.
    46 template <typename T> constexpr auto UCharSpanCast(Span<T> s) -> Span<typename std::remove_pointer<decltype(UCharCast(s.data()))>::type> { return {UCharCast(s.data()), s.size()}; }
    47 
    48-/** Like the Span constructor, but for (const) unsigned char member types only. Only works for (un)signed char containers. */
    49+/** Like the Span constructor, but for (const) unsigned char member types only. Only works for byte-like type containers. */
    50 template <typename V> constexpr auto MakeUCharSpan(V&& v) -> decltype(UCharSpanCast(Span{std::forward<V>(v)})) { return UCharSpanCast(Span{std::forward<V>(v)}); }
    51 
    52 #endif // BITCOIN_SPAN_H
    
  35. sipa commented at 7:43 pm on June 29, 2023: member

    I think that should be fine to add, once there is a use case. Or is there already one?

    Not that I know, no. I was more wondering why we’d only add a specialized one for byte-like types when a generic one for all spans would work equally well. And I think the answer is that even with a generic one, we’d probably want a more efficient specialization for byte-like types anyway, to avoid processing one byte at a time.

  36. maflcko commented at 7:57 pm on June 29, 2023: member

    Yeah, I am planning to do a more thorough cleanup with C++20, see my comment directly above and the other one earlier in the thread:

    Also, Span<B> could be used internally to serialize byte-holding (pre)vectors. In fact this is likely a performance bug-fix, because currently, serialization accepts std::vector<B> for B=signed char, or B=int8_t. (B=char is deleted). However, serialization seems to only apply the optimization to only call write once for B=unsigned char. See

    https://github.com/bitcoin/bitcoin/blob/f1b4975461364d5d40d2bfafc6b165dd5d7eec30/src/serialize.h#L651 It may be good to treat all byte-like types the same, which can be achieved with Span<B>, or alternatively also delete the other B-types that are not special-cased, or alternatively delete the special-case code to treat all vectors the same.

  37. achow101 referenced this in commit 561915f35f on Jun 29, 2023
  38. sidhujag referenced this in commit 69caf91ba8 on Jun 30, 2023
  39. sidhujag referenced this in commit efd2416d43 on Jun 30, 2023
  40. achow101 referenced this in commit b40d10787b on Feb 21, 2024
  41. achow101 referenced this in commit 6acfc4324c on Feb 21, 2024
  42. kwvg referenced this in commit 33fbe98319 on Feb 23, 2024
  43. kwvg referenced this in commit 3515543d39 on Feb 23, 2024
  44. kwvg referenced this in commit ab467e74c1 on Feb 23, 2024
  45. kwvg referenced this in commit f30b0e1676 on Feb 23, 2024
  46. kwvg referenced this in commit bbbac2e450 on Feb 23, 2024
  47. kwvg referenced this in commit ad055c8c0f on Feb 23, 2024
  48. kwvg referenced this in commit 821f2ea5b3 on Feb 23, 2024
  49. kwvg referenced this in commit 3accf754df on Feb 23, 2024
  50. kwvg referenced this in commit 4eda82718e on Feb 24, 2024
  51. kwvg referenced this in commit 3babdf3bd4 on Feb 24, 2024
  52. kwvg referenced this in commit 00c76230d0 on Feb 24, 2024
  53. kwvg referenced this in commit 099682b38a on Feb 24, 2024
  54. fanquake referenced this in commit 1ce5accc32 on Feb 26, 2024
  55. kwvg referenced this in commit 52e643c195 on Feb 27, 2024
  56. kwvg referenced this in commit de15c82aed on Feb 27, 2024
  57. PastaPastaPasta referenced this in commit cb2fa8360a on Feb 28, 2024
  58. PastaPastaPasta referenced this in commit 4eeafa267c on Feb 28, 2024
  59. PastaPastaPasta referenced this in commit 96c4442253 on Feb 28, 2024
  60. backpacker69 referenced this in commit 702df4ad20 on Jun 14, 2024
  61. backpacker69 referenced this in commit b063c9935f on Jun 14, 2024
  62. bitcoin locked this on Jun 28, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-15 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me