Seems odd to require developers to cast all byte-like spans passed to serialization to unsigned char-spans. Fix that by passing and accepting byte-like spans as-is. Finally, add tests and update the code to use just Span where possible.
util: Allow std::byte and char Span serialization #27927
pull maflcko wants to merge 2 commits into bitcoin:master from maflcko:2306-ser-span- changing 11 files +36 −26-
maflcko commented at 2:30 PM on June 21, 2023: member
-
DrahtBot commented at 2:30 PM on June 21, 2023: contributor
<!--e57a25ab6845829454e8d69fc972939a-->
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
<!--021abf342d371248e50ceaed478a90ca-->
Reviews
See the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
<!--174a7506f384e20aa4161008e828411d-->
Conflicts
No conflicts as of last run.
- DrahtBot renamed this:
util: Allow std::byte and char Span serialization
util: Allow std::byte and char Span serialization
on Jun 21, 2023 - DrahtBot added the label Utils/log/libs on Jun 21, 2023
- maflcko force-pushed on Jun 21, 2023
-
in src/span.h:288 in faa17da7ef outdated
283 | template <typename T> constexpr auto UCharSpanCast(Span<T> s) -> Span<typename std::remove_pointer<decltype(UCharCast(s.data()))>::type> { return {UCharCast(s.data()), s.size()}; } 284 | +// Helper function to safely convert a Span to a Span<[const] std::byte>. 285 | +template <typename B> 286 | +auto ByteSpanCast(Span<B> s) 287 | +{ 288 | + // Use UCharCast to enforce B is a byte-like type.
theuni commented at 5:55 PM on June 21, 2023:Use
static_assertorenable_ifto make this more obvious to the caller?
maflcko commented at 6:29 PM on June 21, 2023:I think it should already be obvious to the developer?
static_assertorenable_ifin this context can only turn one compile failure error message into another compile failure error message, because there are no other overloads forByteSpanCastand I don't expect there to ever be ones. As this is not (standard) library code, any error message should be fine.Currently I get:
./span.h:289:27: error: no matching function for call to 'UCharCast' return Span{AsBytePtr(UCharCast(s.data())), s.size_bytes()}; ^~~~~~~~~ note: in instantiation of function template specialization 'ByteSpanCast<const unsigned long>'...Happy to move the code comment you commented on the same line as the return to include it in the error message if you think it helps?
theuni commented at 6:34 PM on June 21, 2023:It's only obvious because the implementation happens to be just below the definition :)
But sure, I guess that's good enough.
in src/serialize.h:205 in faa17da7ef outdated
201 | @@ -201,8 +202,7 @@ template<typename Stream> inline void Serialize(Stream& s, int64_t a ) { ser_wri 202 | template<typename Stream> inline void Serialize(Stream& s, uint64_t a) { ser_writedata64(s, a); } 203 | template<typename Stream, int N> inline void Serialize(Stream& s, const char (&a)[N]) { s.write(MakeByteSpan(a)); } 204 | template<typename Stream, int N> inline void Serialize(Stream& s, const unsigned char (&a)[N]) { s.write(MakeByteSpan(a)); } 205 | -template<typename Stream> inline void Serialize(Stream& s, const Span<const unsigned char>& span) { s.write(AsBytes(span)); } 206 | -template<typename Stream> inline void Serialize(Stream& s, const Span<unsigned char>& span) { s.write(AsBytes(span)); } 207 | +template <typename Stream, typename B> void Serialize(Stream& s, Span<B> span) { s.write(ByteSpanCast(span)); }
theuni commented at 5:57 PM on June 21, 2023:These seem too greedy I think. I guess you're counting on
ByteSpanCastto enforce that it's byte-like?
maflcko commented at 6:29 PM on June 21, 2023:I guess you're counting on ByteSpanCast to enforce that it's byte-like?
Yes, if
ByteSpanCastaccepts non-byte spans, there are likely bugs in other places as well, which should be fixed.These seem too greedy I think.
I think it is fine, because non-byte spans can not be serialized without a formatter for the elements. Happy to add a comment here as well, if you think it helps?
theuni commented at 6:48 PM on June 21, 2023:Just a difference of opinion/style I guess. I'd rather not consider the function in the first place, but I suppose anything that causes a compile failure wfm.
maflcko commented at 7:06 PM on June 21, 2023:C++20 Constraints and Concepts is probably going to fix the compiler errors in the long run. May even be less than 5 years until we can switch to that.
theuni commented at 6:00 PM on June 21, 2023: memberConcept ACK
ryanofsky commented at 8:00 PM on June 21, 2023: contributorConcept ACK. It makes sense to allow
Span<std::byte>to serialize fixed size blobs, likeSpan<unsigned char>already does so there is no need for serialize callers to cast tounsigned charif they are already usingstd::byte.I also think it is nice that the implementation uses a
Span<T>parameter type instead ofSpan<std::byte>orSpan<unsigned char>so you can tell just by looking at the Serialize declaration that an actualSpanargument needs to be passed, not some argument that can be implicitly converted to span. It is true as you pointed out in your earlier comment #27790 (review) that the presence of theSerialize(Stream& os, const T& a)overload below already prevents implicit conversions, but it's nice if you can look directly at a declaration and see that it doesn't allow implicit conversions, without having to look at other overloads.Looking at surrounding code, I also think there are some followups that could be done to make it cleaner / safer / nicer (but would probably save these for a followup PR):
- I think it's not ideal that ByteSpanCast is using UCharCast cast internally. Now code is using
std::byte, I think would be good to drop theUCharCastfunction and replace it with a similar function that acceptschar,unsigned charandstd::bytepointers, but returns astd::bytepointer rather than anunsigned charpointer. Doing this would remove an unnecessary series of casts from byte to char and back to byte again in ByteSpanCast, and also just generally move code further in the direction of usingstd::byteoverchar - I think it might be good to refuse the serialize
Span<char>andSpan<unsigned char>arguments, and only serializeSpan<std::byte>arguments as fixed size blobs. Again it would push code more in direction of usingstd::byteovercharfor binary data. It could also perhaps avoid cases where someone is trying to serialize a variable sized string and ends up through some misunderstood conversions accidentally serializing a fixed size span. - I think it would be good to drop the
AsBytePtrfunction, which looks like a safe conversion but actually can cast any pointer to a byte pointer, is not actually safe or useful to call on most object types. I think if suggestion 1 above is implemented, some uses ofAsBytePtrwould no longer be necessary, and the other uses would be clearer if they just usedreinterpret_castdirectly instead ofAsBytePtr
maflcko commented at 8:07 AM on June 22, 2023: memberI think it's not ideal that ByteSpanCast is using UCharCast cast internally. Now code is using std::byte, I think would be good to drop the UCharCast function and replace it with a similar function that accepts char, unsigned char and std::byte pointers, but returns a std::byte pointer rather than an unsigned char pointer. Doing this would remove an unnecessary series of casts from byte to char and back to byte again in ByteSpanCast, and also just generally move code further in the direction of using std::byte over charI thought about this, but concluded it isn't worth it.
UCharCastwon't be going away, because it is needed to speak with libbitcoinconsensus or libsecp256k1, see for example the conflicting pull. So adding the same 8 function to only differ in return type seems bloaty and verbose. However, I agree with you that new code shouldn't useUCharCastunless necessary.I think it might be good to refuse the serialize Span<char> and Span<unsigned char> arguments, and only serialize Span<std::byte> arguments as fixed size blobs. Again it would push code more in direction of using std::byte over char for binary data. It could also perhaps avoid cases where someone is trying to serialize a variable sized string and ends up through some misunderstood conversions accidentally serializing a fixed size span.This may be a bit too much hand-holding, with a small benefit. Also, it would make code more bloaty, as there are quite a few cases in the current code base where a
std::string(_view)is serialized viaSpan<char>, or a view onunsigned charis serialized viaSpan<unsigned char>.Also,
Span<B>could be used internally to serialize byte-holding (pre)vectors. In fact this is likely a performance bug-fix, because currently, serialization acceptsstd::vector<B>forB=signed char, orB=int8_t. (B=charis deleted). However, serialization seems to only apply the optimization to only callwriteonce forB=unsigned char. See https://github.com/bitcoin/bitcoin/blob/f1b4975461364d5d40d2bfafc6b165dd5d7eec30/src/serialize.h#L651 It may be good to treat all byte-like types the same, which can be achieved withSpan<B>, or alternatively also delete the otherB-types that are not special-cased, or alternatively delete the special-case code to treat all vectors the same.I think it would be good to drop the AsBytePtr function, which looks like a safe conversion but actually can cast any pointer to a byte pointer, is not actually safe or useful to call on most object types. I think if suggestion 1 above is implemented, some uses of AsBytePtr would no longer be necessary, and the other uses would be clearer if they just used reinterpret_cast directly instead of AsBytePtrI don't think this is possible either. It is required to speak with libsqlite3, libbdb, and internally in serialize.h. I think we should instead think about nuking the
AsBytesspan helpers. While they are ported from the standard library, I don't think they are appropriate for us, because they are likely architecture dependent when it comes to endian-ness.maflcko force-pushed on Jun 26, 2023maflcko marked this as a draft on Jun 26, 2023maflcko force-pushed on Jun 26, 2023DrahtBot added the label CI failed on Jun 26, 2023DrahtBot removed the label CI failed on Jun 26, 2023DrahtBot added the label Needs rebase on Jun 26, 2023util: Allow std::byte and char Span serialization fa257bc831fa38d86235Use only Span{} constructor for byte-like types where possible
This removes bloat that is not needed.
maflcko force-pushed on Jun 27, 2023maflcko marked this as ready for review on Jun 27, 2023maflcko commented at 9:30 AM on June 27, 2023: memberDid a rebase on clean master to drop the unneeded dependency.
DrahtBot removed the label Needs rebase on Jun 27, 2023ryanofsky approvedryanofsky commented at 2:41 PM on June 27, 2023: contributorCode review ACK fa38d862358b87219b12bf31236c52f28d9fc5d6. This looks great. The second commit really removes a lot of boilerplate and shows why the first commit is useful.
ryanofsky commented at 2:42 PM on June 27, 2023: contributorFix that by introducing a helper to convert any byte-like span to a std::byte-span, which is then accepted by serialization.
Can drop this line from PR description (https://github.com/bitcoin/bitcoin/pull/27927#issue-1767744286)
ryanofsky commented at 3:18 PM on June 27, 2023: contributorre: #27927 (comment)
Thanks for the responses. You convinced me there probably little benefit to preventing serialization of char spans, and it would just add boilerplate and not make things safer.
I do think we can drop the AsBytePtr function, though, so I tried to do that in #27978.
I also still think in the longer run it would be good to convert more internal code to use
std::byteinstead ofunsigned charand replaceUCharCastwith aByteCastalternative andMakeUCharSpanwith a saferMakeUCharSpanalternative. If code calling leveldb or libsecp needs uchar functions, it implement them locally, or we could keep existing functions, but implement them in terms ofByteCast. Not suggesting to do this now. I just think we might end up doing it when more code usesstd::byte.sipa commented at 3:45 PM on June 27, 2023: memberutACK fa38d862358b87219b12bf31236c52f28d9fc5d6
I do wonder: is there any reason why we wouldn't want to support serializing
Spans in general (where its serialization is defined as the concatenation of the serialization of its elements)?maflcko commented at 3:54 PM on June 27, 2023: memberI do wonder: is there any reason why we wouldn't want to support serializing Spans in general (where its serialization is defined as the concatenation of the serialization of its elements)?
I think that should be fine to add, once there is a use case. Or is there already one?
achow101 commented at 7:01 PM on June 28, 2023: memberACK fa38d862358b87219b12bf31236c52f28d9fc5d6
achow101 merged this on Jun 28, 2023achow101 closed this on Jun 28, 2023maflcko deleted the branch on Jun 29, 2023in src/serialize.h:220 in fa38d86235
216 | @@ -217,10 +217,11 @@ template<typename Stream> inline void Unserialize(Stream& s, int64_t& a ) { a = 217 | template<typename Stream> inline void Unserialize(Stream& s, uint64_t& a) { a = ser_readdata64(s); } 218 | template<typename Stream, int N> inline void Unserialize(Stream& s, char (&a)[N]) { s.read(MakeWritableByteSpan(a)); } 219 | template<typename Stream, int N> inline void Unserialize(Stream& s, unsigned char (&a)[N]) { s.read(MakeWritableByteSpan(a)); } 220 | -template<typename Stream> inline void Unserialize(Stream& s, Span<unsigned char>& span) { s.read(AsWritableBytes(span)); } 221 | +template <typename Stream, typename B> void Unserialize(Stream& s, Span<B> span) { (void)/* force byte-type */UCharCast(span.data()); s.read(AsWritableBytes(span)); }
maflcko commented at 10:16 AM on June 29, 2023:Just as another follow-up comment to the earlier question if this is too greedy: Yes, it is too greedy, but will actually print a nicer error message compared to using
enable_ifor concepts, because of the general serialization catch-all fallback to the member method.For example, the following test diff would fail with the general error message for a non-greedy template:
diff --git a/src/test/serialize_tests.cpp b/src/test/serialize_tests.cpp index b445ff8ffc..73b9e63011 100644 --- a/src/test/serialize_tests.cpp +++ b/src/test/serialize_tests.cpp @@ -245,10 +245,8 @@ BOOST_AUTO_TEST_CASE(class_methods) DataStream ds; const std::string in{"ab"}; ds << Span{in}; - std::array<std::byte, 2> out; + std::array<uint16_t, 2> out; ds >> Span{out}; - BOOST_CHECK_EQUAL(out.at(0), std::byte{'a'}); - BOOST_CHECK_EQUAL(out.at(1), std::byte{'b'}); } }CXX test/test_bitcoin-serialize_tests.o In file included from test/serialize_tests.cpp:5: In file included from ./hash.h:14: ./serialize.h:706:7: error: no member named 'Unserialize' in 'Span<unsigned short>' a.Unserialize(is); ~ ^ ./streams.h:308:11: note: in instantiation of function template specialization 'Unserialize<DataStream, Span<unsigned short> &>' requested here ::Unserialize(*this, obj); ^ test/serialize_tests.cpp:249:12: note: in instantiation of function template specialization 'DataStream::operator>><Span<unsigned short>>' requested here ds >> Span{out}; ^ 1 error generated. make[2]: *** [Makefile:19030: test/test_bitcoin-serialize_tests.o] Error 1I tested this with concepts (C++20), but the result should be the same for C++17
enable_if:diff --git a/src/serialize.h b/src/serialize.h index 348a6ae4f1..5c775458ae 100644 --- a/src/serialize.h +++ b/src/serialize.h @@ -202,7 +202,7 @@ template<typename Stream> inline void Serialize(Stream& s, int64_t a ) { ser_wri template<typename Stream> inline void Serialize(Stream& s, uint64_t a) { ser_writedata64(s, a); } template<typename Stream, int N> inline void Serialize(Stream& s, const char (&a)[N]) { s.write(MakeByteSpan(a)); } template<typename Stream, int N> inline void Serialize(Stream& s, const unsigned char (&a)[N]) { s.write(MakeByteSpan(a)); } -template <typename Stream, typename B> void Serialize(Stream& s, Span<B> span) { (void)/* force byte-type */UCharCast(span.data()); s.write(AsBytes(span)); } +template <typename Stream, ByteType B> void Serialize(Stream& s, Span<B> span) { s.write(AsBytes(span)); } #ifndef CHAR_EQUALS_INT8 template <typename Stream> void Unserialize(Stream&, char) = delete; // char serialization forbidden. Use uint8_t or int8_t @@ -217,7 +217,7 @@ template<typename Stream> inline void Unserialize(Stream& s, int64_t& a ) { a = template<typename Stream> inline void Unserialize(Stream& s, uint64_t& a) { a = ser_readdata64(s); } template<typename Stream, int N> inline void Unserialize(Stream& s, char (&a)[N]) { s.read(MakeWritableByteSpan(a)); } template<typename Stream, int N> inline void Unserialize(Stream& s, unsigned char (&a)[N]) { s.read(MakeWritableByteSpan(a)); } -template <typename Stream, typename B> void Unserialize(Stream& s, Span<B> span) { (void)/* force byte-type */UCharCast(span.data()); s.read(AsWritableBytes(span)); } +template <typename Stream, ByteType B> void Unserialize(Stream& s, Span<B> span) { s.read(AsWritableBytes(span)); } template <typename Stream> inline void Serialize(Stream& s, bool a) { uint8_t f = a; ser_writedata8(s, f); } template <typename Stream> inline void Unserialize(Stream& s, bool& a) { uint8_t f = ser_readdata8(s); a = f; } diff --git a/src/span.h b/src/span.h index c98784aee4..dbee63ba77 100644 --- a/src/span.h +++ b/src/span.h @@ -271,18 +271,24 @@ Span<std::byte> MakeWritableByteSpan(V&& v) noexcept return AsWritableBytes(Span{std::forward<V>(v)}); } -// Helper functions to safely cast to unsigned char pointers. +// Helper functions to safely cast a byte-like type to unsigned char pointers. inline unsigned char* UCharCast(char* c) { return (unsigned char*)c; } inline unsigned char* UCharCast(unsigned char* c) { return c; } inline unsigned char* UCharCast(std::byte* c) { return (unsigned char*)c; } inline const unsigned char* UCharCast(const char* c) { return (unsigned char*)c; } inline const unsigned char* UCharCast(const unsigned char* c) { return c; } inline const unsigned char* UCharCast(const std::byte* c) { return reinterpret_cast<const unsigned char*>(c); } +// Helper concept for byte-like types. +template <typename B> +concept ByteType = requires +{ + UCharCast(Span<B>{}.data()); +}; // Helper function to safely convert a Span to a Span<[const] unsigned char>. template <typename T> constexpr auto UCharSpanCast(Span<T> s) -> Span<typename std::remove_pointer<decltype(UCharCast(s.data()))>::type> { return {UCharCast(s.data()), s.size()}; } -/** Like the Span constructor, but for (const) unsigned char member types only. Only works for (un)signed char containers. */ +/** Like the Span constructor, but for (const) unsigned char member types only. Only works for byte-like type containers. */ template <typename V> constexpr auto MakeUCharSpan(V&& v) -> decltype(UCharSpanCast(Span{std::forward<V>(v)})) { return UCharSpanCast(Span{std::forward<V>(v)}); } #endif // BITCOIN_SPAN_Hsipa commented at 7:43 PM on June 29, 2023: memberI think that should be fine to add, once there is a use case. Or is there already one?
Not that I know, no. I was more wondering why we'd only add a specialized one for byte-like types when a generic one for all spans would work equally well. And I think the answer is that even with a generic one, we'd probably want a more efficient specialization for byte-like types anyway, to avoid processing one byte at a time.
maflcko commented at 7:57 PM on June 29, 2023: memberYeah, I am planning to do a more thorough cleanup with C++20, see my comment directly above and the other one earlier in the thread:
Also,
Span<B>could be used internally to serialize byte-holding (pre)vectors. In fact this is likely a performance bug-fix, because currently, serialization acceptsstd::vector<B>forB=signed char, orB=int8_t. (B=charis deleted). However, serialization seems to only apply the optimization to only callwriteonce forB=unsigned char. Seehttps://github.com/bitcoin/bitcoin/blob/f1b4975461364d5d40d2bfafc6b165dd5d7eec30/src/serialize.h#L651 It may be good to treat all byte-like types the same, which can be achieved with
Span<B>, or alternatively also delete the otherB-types that are not special-cased, or alternatively delete the special-case code to treat all vectors the same.achow101 referenced this in commit 561915f35f on Jun 29, 2023sidhujag referenced this in commit 69caf91ba8 on Jun 30, 2023sidhujag referenced this in commit efd2416d43 on Jun 30, 2023achow101 referenced this in commit b40d10787b on Feb 21, 2024achow101 referenced this in commit 6acfc4324c on Feb 21, 2024kwvg referenced this in commit 33fbe98319 on Feb 23, 2024kwvg referenced this in commit 3515543d39 on Feb 23, 2024kwvg referenced this in commit ab467e74c1 on Feb 23, 2024kwvg referenced this in commit f30b0e1676 on Feb 23, 2024kwvg referenced this in commit bbbac2e450 on Feb 23, 2024kwvg referenced this in commit ad055c8c0f on Feb 23, 2024kwvg referenced this in commit 821f2ea5b3 on Feb 23, 2024kwvg referenced this in commit 3accf754df on Feb 23, 2024kwvg referenced this in commit 4eda82718e on Feb 24, 2024kwvg referenced this in commit 3babdf3bd4 on Feb 24, 2024kwvg referenced this in commit 00c76230d0 on Feb 24, 2024kwvg referenced this in commit 099682b38a on Feb 24, 2024fanquake referenced this in commit 1ce5accc32 on Feb 26, 2024kwvg referenced this in commit 52e643c195 on Feb 27, 2024kwvg referenced this in commit de15c82aed on Feb 27, 2024PastaPastaPasta referenced this in commit cb2fa8360a on Feb 28, 2024PastaPastaPasta referenced this in commit 4eeafa267c on Feb 28, 2024PastaPastaPasta referenced this in commit 96c4442253 on Feb 28, 2024backpacker69 referenced this in commit 702df4ad20 on Jun 14, 2024backpacker69 referenced this in commit b063c9935f on Jun 14, 2024bitcoin locked this on Jun 28, 2024
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-02 21:13 UTC
More mirrored repositories can be found on mirror.b10c.me