unsigned char
-spans. Fix that by passing and accepting byte-like spans as-is. Finally, add tests and update the code to use just Span
where possible.
unsigned char
-spans. Fix that by passing and accepting byte-like spans as-is. Finally, add tests and update the code to use just Span
where possible.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
See the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
No conflicts as of last run.
283 template <typename T> constexpr auto UCharSpanCast(Span<T> s) -> Span<typename std::remove_pointer<decltype(UCharCast(s.data()))>::type> { return {UCharCast(s.data()), s.size()}; }
284+// Helper function to safely convert a Span to a Span<[const] std::byte>.
285+template <typename B>
286+auto ByteSpanCast(Span<B> s)
287+{
288+ // Use UCharCast to enforce B is a byte-like type.
static_assert
or enable_if
to make this more obvious to the caller?
I think it should already be obvious to the developer? static_assert
or enable_if
in this context can only turn one compile failure error message into another compile failure error message, because there are no other overloads for ByteSpanCast
and I don’t expect there to ever be ones. As this is not (standard) library code, any error message should be fine.
Currently I get:
0./span.h:289:27: error: no matching function for call to 'UCharCast'
1 return Span{AsBytePtr(UCharCast(s.data())), s.size_bytes()};
2 ^~~~~~~~~
3note: in instantiation of function template specialization 'ByteSpanCast<const unsigned long>'...
Happy to move the code comment you commented on the same line as the return to include it in the error message if you think it helps?
It’s only obvious because the implementation happens to be just below the definition :)
But sure, I guess that’s good enough.
201@@ -201,8 +202,7 @@ template<typename Stream> inline void Serialize(Stream& s, int64_t a ) { ser_wri
202 template<typename Stream> inline void Serialize(Stream& s, uint64_t a) { ser_writedata64(s, a); }
203 template<typename Stream, int N> inline void Serialize(Stream& s, const char (&a)[N]) { s.write(MakeByteSpan(a)); }
204 template<typename Stream, int N> inline void Serialize(Stream& s, const unsigned char (&a)[N]) { s.write(MakeByteSpan(a)); }
205-template<typename Stream> inline void Serialize(Stream& s, const Span<const unsigned char>& span) { s.write(AsBytes(span)); }
206-template<typename Stream> inline void Serialize(Stream& s, const Span<unsigned char>& span) { s.write(AsBytes(span)); }
207+template <typename Stream, typename B> void Serialize(Stream& s, Span<B> span) { s.write(ByteSpanCast(span)); }
ByteSpanCast
to enforce that it’s byte-like?
I guess you’re counting on ByteSpanCast to enforce that it’s byte-like?
Yes, if ByteSpanCast
accepts non-byte spans, there are likely bugs in other places as well, which should be fixed.
These seem too greedy I think.
I think it is fine, because non-byte spans can not be serialized without a formatter for the elements. Happy to add a comment here as well, if you think it helps?
Concept ACK. It makes sense to allow Span<std::byte>
to serialize fixed size blobs, like Span<unsigned char>
already does so there is no need for serialize callers to cast to unsigned char
if they are already using std::byte
.
I also think it is nice that the implementation uses a Span<T>
parameter type instead of Span<std::byte>
or Span<unsigned char>
so you can tell just by looking at the Serialize declaration that an actual Span
argument needs to be passed, not some argument that can be implicitly converted to span. It is true as you pointed out in your earlier comment #27790 (review) that the presence of the Serialize(Stream& os, const T& a)
overload below already prevents implicit conversions, but it’s nice if you can look directly at a declaration and see that it doesn’t allow implicit conversions, without having to look at other overloads.
Looking at surrounding code, I also think there are some followups that could be done to make it cleaner / safer / nicer (but would probably save these for a followup PR):
std::byte
, I think would be good to drop the UCharCast
function and replace it with a similar function that accepts char
, unsigned char
and std::byte
pointers, but returns a std::byte
pointer rather than an unsigned char
pointer. Doing this would remove an unnecessary series of casts from byte to char and back to byte again in ByteSpanCast, and also just generally move code further in the direction of using std::byte
over char
Span<char>
and Span<unsigned char>
arguments, and only serialize Span<std::byte>
arguments as fixed size blobs. Again it would push code more in direction of using std::byte
over char
for binary data. It could also perhaps avoid cases where someone is trying to serialize a variable sized string and ends up through some misunderstood conversions accidentally serializing a fixed size span.AsBytePtr
function, which looks like a safe conversion but actually can cast any pointer to a byte pointer, is not actually safe or useful to call on most object types. I think if suggestion 1 above is implemented, some uses of AsBytePtr
would no longer be necessary, and the other uses would be clearer if they just used reinterpret_cast
directly instead of AsBytePtr
I think it's not ideal that ByteSpanCast is using UCharCast cast internally. Now code is using std::byte, I think would be good to drop the UCharCast function and replace it with a similar function that accepts char, unsigned char and std::byte pointers, but returns a std::byte pointer rather than an unsigned char pointer. Doing this would remove an unnecessary series of casts from byte to char and back to byte again in ByteSpanCast, and also just generally move code further in the direction of using std::byte over char
I thought about this, but concluded it isn’t worth it. UCharCast
won’t be going away, because it is needed to speak with libbitcoinconsensus or libsecp256k1, see for example the conflicting pull. So adding the same 8 function to only differ in return type seems bloaty and verbose. However, I agree with you that new code shouldn’t use UCharCast
unless necessary.
I think it might be good to refuse the serialize Span<char> and Span<unsigned char> arguments, and only serialize Span<std::byte> arguments as fixed size blobs. Again it would push code more in direction of using std::byte over char for binary data. It could also perhaps avoid cases where someone is trying to serialize a variable sized string and ends up through some misunderstood conversions accidentally serializing a fixed size span.
This may be a bit too much hand-holding, with a small benefit. Also, it would make code more bloaty, as there are quite a few cases in the current code base where a std::string(_view)
is serialized via Span<char>
, or a view on unsigned char
is serialized via Span<unsigned char>
.
Also, Span<B>
could be used internally to serialize byte-holding (pre)vectors. In fact this is likely a performance bug-fix, because currently, serialization accepts std::vector<B>
for B=signed char
, or B=int8_t
. (B=char
is deleted). However, serialization seems to only apply the optimization to only call write
once for B=unsigned char
. See https://github.com/bitcoin/bitcoin/blob/f1b4975461364d5d40d2bfafc6b165dd5d7eec30/src/serialize.h#L651
It may be good to treat all byte-like types the same, which can be achieved with Span<B>
, or alternatively also delete the other B
-types that are not special-cased, or alternatively delete the special-case code to treat all vectors the same.
I think it would be good to drop the AsBytePtr function, which looks like a safe conversion but actually can cast any pointer to a byte pointer, is not actually safe or useful to call on most object types. I think if suggestion 1 above is implemented, some uses of AsBytePtr would no longer be necessary, and the other uses would be clearer if they just used reinterpret_cast directly instead of AsBytePtr
I don’t think this is possible either. It is required to speak with libsqlite3, libbdb, and internally in serialize.h. I think we should instead think about nuking the AsBytes
span helpers. While they are ported from the standard library, I don’t think they are appropriate for us, because they are likely architecture dependent when it comes to endian-ness.
This removes bloat that is not needed.
Fix that by introducing a helper to convert any byte-like span to a std::byte-span, which is then accepted by serialization.
Can drop this line from PR description (https://github.com/bitcoin/bitcoin/pull/27927#issue-1767744286)
re: #27927 (comment)
Thanks for the responses. You convinced me there probably little benefit to preventing serialization of char spans, and it would just add boilerplate and not make things safer.
I do think we can drop the AsBytePtr function, though, so I tried to do that in #27978.
I also still think in the longer run it would be good to convert more internal code to use std::byte
instead of unsigned char
and replace UCharCast
with a ByteCast
alternative and MakeUCharSpan
with a safer MakeUCharSpan
alternative. If code calling leveldb or libsecp needs uchar functions, it implement them locally, or we could keep existing functions, but implement them in terms of ByteCast
. Not suggesting to do this now. I just think we might end up doing it when more code uses std::byte
.
utACK fa38d862358b87219b12bf31236c52f28d9fc5d6
I do wonder: is there any reason why we wouldn’t want to support serializing Span
s in general (where its serialization is defined as the concatenation of the serialization of its elements)?
I do wonder: is there any reason why we wouldn’t want to support serializing Spans in general (where its serialization is defined as the concatenation of the serialization of its elements)?
I think that should be fine to add, once there is a use case. Or is there already one?
216@@ -217,10 +217,11 @@ template<typename Stream> inline void Unserialize(Stream& s, int64_t& a ) { a =
217 template<typename Stream> inline void Unserialize(Stream& s, uint64_t& a) { a = ser_readdata64(s); }
218 template<typename Stream, int N> inline void Unserialize(Stream& s, char (&a)[N]) { s.read(MakeWritableByteSpan(a)); }
219 template<typename Stream, int N> inline void Unserialize(Stream& s, unsigned char (&a)[N]) { s.read(MakeWritableByteSpan(a)); }
220-template<typename Stream> inline void Unserialize(Stream& s, Span<unsigned char>& span) { s.read(AsWritableBytes(span)); }
221+template <typename Stream, typename B> void Unserialize(Stream& s, Span<B> span) { (void)/* force byte-type */UCharCast(span.data()); s.read(AsWritableBytes(span)); }
Just as another follow-up comment to the earlier question if this is too greedy: Yes, it is too greedy, but will actually print a nicer error message compared to using enable_if
or concepts, because of the general serialization catch-all fallback to the member method.
For example, the following test diff would fail with the general error message for a non-greedy template:
0diff --git a/src/test/serialize_tests.cpp b/src/test/serialize_tests.cpp
1index b445ff8ffc..73b9e63011 100644
2--- a/src/test/serialize_tests.cpp
3+++ b/src/test/serialize_tests.cpp
4@@ -245,10 +245,8 @@ BOOST_AUTO_TEST_CASE(class_methods)
5 DataStream ds;
6 const std::string in{"ab"};
7 ds << Span{in};
8- std::array<std::byte, 2> out;
9+ std::array<uint16_t, 2> out;
10 ds >> Span{out};
11- BOOST_CHECK_EQUAL(out.at(0), std::byte{'a'});
12- BOOST_CHECK_EQUAL(out.at(1), std::byte{'b'});
13 }
14 }
15
0 CXX test/test_bitcoin-serialize_tests.o
1In file included from test/serialize_tests.cpp:5:
2In file included from ./hash.h:14:
3./serialize.h:706:7: error: no member named 'Unserialize' in 'Span<unsigned short>'
4 a.Unserialize(is);
5 ~ ^
6./streams.h:308:11: note: in instantiation of function template specialization 'Unserialize<DataStream, Span<unsigned short> &>' requested here
7 ::Unserialize(*this, obj);
8 ^
9test/serialize_tests.cpp:249:12: note: in instantiation of function template specialization 'DataStream::operator>><Span<unsigned short>>' requested here
10 ds >> Span{out};
11 ^
121 error generated.
13make[2]: *** [Makefile:19030: test/test_bitcoin-serialize_tests.o] Error 1
I tested this with concepts (C++20), but the result should be the same for C++17 enable_if
:
0diff --git a/src/serialize.h b/src/serialize.h
1index 348a6ae4f1..5c775458ae 100644
2--- a/src/serialize.h
3+++ b/src/serialize.h
4@@ -202,7 +202,7 @@ template<typename Stream> inline void Serialize(Stream& s, int64_t a ) { ser_wri
5 template<typename Stream> inline void Serialize(Stream& s, uint64_t a) { ser_writedata64(s, a); }
6 template<typename Stream, int N> inline void Serialize(Stream& s, const char (&a)[N]) { s.write(MakeByteSpan(a)); }
7 template<typename Stream, int N> inline void Serialize(Stream& s, const unsigned char (&a)[N]) { s.write(MakeByteSpan(a)); }
8-template <typename Stream, typename B> void Serialize(Stream& s, Span<B> span) { (void)/* force byte-type */UCharCast(span.data()); s.write(AsBytes(span)); }
9+template <typename Stream, ByteType B> void Serialize(Stream& s, Span<B> span) { s.write(AsBytes(span)); }
10
11 #ifndef CHAR_EQUALS_INT8
12 template <typename Stream> void Unserialize(Stream&, char) = delete; // char serialization forbidden. Use uint8_t or int8_t
13@@ -217,7 +217,7 @@ template<typename Stream> inline void Unserialize(Stream& s, int64_t& a ) { a =
14 template<typename Stream> inline void Unserialize(Stream& s, uint64_t& a) { a = ser_readdata64(s); }
15 template<typename Stream, int N> inline void Unserialize(Stream& s, char (&a)[N]) { s.read(MakeWritableByteSpan(a)); }
16 template<typename Stream, int N> inline void Unserialize(Stream& s, unsigned char (&a)[N]) { s.read(MakeWritableByteSpan(a)); }
17-template <typename Stream, typename B> void Unserialize(Stream& s, Span<B> span) { (void)/* force byte-type */UCharCast(span.data()); s.read(AsWritableBytes(span)); }
18+template <typename Stream, ByteType B> void Unserialize(Stream& s, Span<B> span) { s.read(AsWritableBytes(span)); }
19
20 template <typename Stream> inline void Serialize(Stream& s, bool a) { uint8_t f = a; ser_writedata8(s, f); }
21 template <typename Stream> inline void Unserialize(Stream& s, bool& a) { uint8_t f = ser_readdata8(s); a = f; }
22diff --git a/src/span.h b/src/span.h
23index c98784aee4..dbee63ba77 100644
24--- a/src/span.h
25+++ b/src/span.h
26@@ -271,18 +271,24 @@ Span<std::byte> MakeWritableByteSpan(V&& v) noexcept
27 return AsWritableBytes(Span{std::forward<V>(v)});
28 }
29
30-// Helper functions to safely cast to unsigned char pointers.
31+// Helper functions to safely cast a byte-like type to unsigned char pointers.
32 inline unsigned char* UCharCast(char* c) { return (unsigned char*)c; }
33 inline unsigned char* UCharCast(unsigned char* c) { return c; }
34 inline unsigned char* UCharCast(std::byte* c) { return (unsigned char*)c; }
35 inline const unsigned char* UCharCast(const char* c) { return (unsigned char*)c; }
36 inline const unsigned char* UCharCast(const unsigned char* c) { return c; }
37 inline const unsigned char* UCharCast(const std::byte* c) { return reinterpret_cast<const unsigned char*>(c); }
38+// Helper concept for byte-like types.
39+template <typename B>
40+concept ByteType = requires
41+{
42+ UCharCast(Span<B>{}.data());
43+};
44
45 // Helper function to safely convert a Span to a Span<[const] unsigned char>.
46 template <typename T> constexpr auto UCharSpanCast(Span<T> s) -> Span<typename std::remove_pointer<decltype(UCharCast(s.data()))>::type> { return {UCharCast(s.data()), s.size()}; }
47
48-/** Like the Span constructor, but for (const) unsigned char member types only. Only works for (un)signed char containers. */
49+/** Like the Span constructor, but for (const) unsigned char member types only. Only works for byte-like type containers. */
50 template <typename V> constexpr auto MakeUCharSpan(V&& v) -> decltype(UCharSpanCast(Span{std::forward<V>(v)})) { return UCharSpanCast(Span{std::forward<V>(v)}); }
51
52 #endif // BITCOIN_SPAN_H
I think that should be fine to add, once there is a use case. Or is there already one?
Not that I know, no. I was more wondering why we’d only add a specialized one for byte-like types when a generic one for all spans would work equally well. And I think the answer is that even with a generic one, we’d probably want a more efficient specialization for byte-like types anyway, to avoid processing one byte at a time.
Yeah, I am planning to do a more thorough cleanup with C++20, see my comment directly above and the other one earlier in the thread:
Also,
Span<B>
could be used internally to serialize byte-holding (pre)vectors. In fact this is likely a performance bug-fix, because currently, serialization acceptsstd::vector<B>
forB=signed char
, orB=int8_t
. (B=char
is deleted). However, serialization seems to only apply the optimization to only callwrite
once forB=unsigned char
. Seehttps://github.com/bitcoin/bitcoin/blob/f1b4975461364d5d40d2bfafc6b165dd5d7eec30/src/serialize.h#L651 It may be good to treat all byte-like types the same, which can be achieved with
Span<B>
, or alternatively also delete the otherB
-types that are not special-cased, or alternatively delete the special-case code to treat all vectors the same.