Serialization improvements step 5 (blockencodings)

sipa commented at 9:38 pm on February 10, 2020: member

This is probably the most involved change in the sequence of changes extracted from #10785.

In order to implement the differential encoding of BIP152, this change changes VectorFormatter to permit a stateful sub-formatter, which is then used by DifferenceFormatter. A CustomUintFormatter is added as well to do the 48-bit serialization of short ids.

sipa requested review from ryanofsky on Feb 10, 2020

DrahtBot commented at 10:10 pm on February 10, 2020: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

fanquake added this to the "Blockers" column in a project

DrahtBot added the label Needs rebase on Feb 11, 2020

laanwj added the label Utils/log/libs on Feb 13, 2020

sipa force-pushed on Feb 13, 2020

sipa commented at 5:16 pm on February 13, 2020: member

Rebased.

DrahtBot removed the label Needs rebase on Feb 13, 2020

in src/serialize.h:557 in 99fa6174c5 outdated

557-    explicit CCompactSize(uint64_t& nIn) : n(nIn) { }
558+    template<typename Stream, typename I>
559+    void Unser(Stream& s, I& v)
560+    {
561+        static_assert(std::is_unsigned<I>::value, "CompactSize only supported for unsigned integers");
562+        static_assert(std::numeric_limits<I>::max() <= std::numeric_limits<uint64_t>::max(), "CompactSize only supports 64-bit integers and below");

ryanofsky commented at 9:58 pm on February 13, 2020:

In commit “Convert CCompactSize to proper formatter” (8b8ebf016c1de3c3827beff7666c0adb628355bf)

I’d consider dropping the static asserts here, because they don’t really make sense for deserializing (they do make sense in the Ser function below). I don’t thinks a reason to disallow deserializing into variables wider than 64 bits or signed variables if the value fits.

sipa commented at 3:24 am on February 15, 2020:

Done.

in src/serialize.h:560 in 99fa6174c5 outdated

563 
564-    template<typename Stream>
565-    void Serialize(Stream &s) const {
566-        WriteCompactSize<Stream>(s, n);
567+        uint64_t n = ReadCompactSize<Stream>(s);
568+        if (n > std::numeric_limits<I>::max()) {

ryanofsky commented at 10:02 pm on February 13, 2020:

In commit “Convert CCompactSize to proper formatter” (8b8ebf016c1de3c3827beff7666c0adb628355bf)

It’s unlikely to make a difference in practice unless I is some crazy type that starts above 0, but I’d add || n < std::numeric_limits<I>::min() to be complete here

sipa commented at 3:24 am on February 15, 2020:

Done.

in src/blockencodings.h:14 in f3480e04d7 outdated

10@@ -11,17 +11,60 @@
11 class CTxMemPool;
12 
13 // Dumb helper to handle CTransaction compression at serialize-time
14-struct TransactionCompressor {
15-private:
16-    CTransactionRef& tx;
17-public:
18-    explicit TransactionCompressor(CTransactionRef& txIn) : tx(txIn) {}
19+struct TransactionCompression

ryanofsky commented at 10:13 pm on February 13, 2020:

In commit “Convert blockencodings.h to new serialization framework” (d46dfeccaf1992ba92bff7e0d4988ae3f66b327a)

I think you could drop this class and use DefaultFormatter below instead

sipa commented at 3:24 am on February 15, 2020:

Done, by adding a type alias (keeping it obvious where new implementations would go).

in src/blockencodings.h:25 in f3480e04d7 outdated

25+struct Uint48Formatter
26+{
27+    template <typename Stream, typename I> void Ser(Stream& s, I v)
28+    {
29+        static_assert(std::is_unsigned<I>::value, "Uint48Formatter needs an unsigned integer");
30+        static_assert(sizeof(I) >= 6, "Uint48Formatter needs a 48+ bit type");

ryanofsky commented at 10:46 pm on February 13, 2020:

In commit “Convert blockencodings.h to new serialization framework” (d46dfeccaf1992ba92bff7e0d4988ae3f66b327a)

Would drop these lines. There should be no problem serializing a value smaller than 48 bits as 48 bits (or serializing a signed value value that’s positive). I think a runtime check would be more appropriate:

0if (v >> 48 || v < 0) {
1    throw std::ios_base::failure("Uint48Formatter value out of range");
2}

sipa commented at 3:24 am on February 15, 2020:

Done.

in src/blockencodings.h:35 in f3480e04d7 outdated

36 
37-    ADD_SERIALIZE_METHODS;
38+    template <typename Stream, typename I> void Unser(Stream& s, I& v)
39+    {
40+        static_assert(std::is_unsigned<I>::value, "Uint48Formatter needs an unsigned integer");
41+        static_assert(sizeof(I) >= 6, "Uint48Formatter needs a 48+ bit type");

ryanofsky commented at 6:36 pm on February 14, 2020:

In commit “Convert blockencodings.h to new serialization framework” (d46dfeccaf1992ba92bff7e0d4988ae3f66b327a)

These checks seem too strict. Could just check for a big enough range and not care about the format

0static_assert(std::numeric_limits<I>::max() >= 0xffffffffffff && std::numeric_limits<I>::min() <= 0, "Uint48Formatter needs a 48+ bit type");

sipa commented at 3:27 am on February 15, 2020:

Done.

in src/blockencodings.h:50 in f3480e04d7 outdated

54+{
55+    uint16_t m_subtract = 0;
56+
57+    uint16_t operator()(uint16_t val)
58+    {
59+        uint16_t tmp = val - m_subtract;

ryanofsky commented at 6:46 pm on February 14, 2020:

In commit “Convert blockencodings.h to new serialization framework” (d46dfeccaf1992ba92bff7e0d4988ae3f66b327a)

Should this raise an error if val <= tmp? It seems like the serializer shouldn’t create a sequence that the deserializer would reject.

sipa commented at 3:25 am on February 15, 2020:

I think the code is written assuming that the data fed to the serializer is sane.

I’ve merged the two differential transforms into one; I think this is more readable too.

in src/blockencodings.h:48 in f3480e04d7 outdated

52+
53+struct DifferentialSerTransform
54+{
55+    uint16_t m_subtract = 0;
56+
57+    uint16_t operator()(uint16_t val)

ryanofsky commented at 7:18 pm on February 14, 2020:

In commit “Convert blockencodings.h to new serialization framework” (d46dfeccaf1992ba92bff7e0d4988ae3f66b327a)

Maybe not worth extra effort or complexity, but I think the transforms would be safer and actually more understandable if they were templated. Safer because by hardcoding uint16_t as the sequence and diff types, lets c++ get away with truncating wider values to 16 bits with no checking. I think the ideal would look something like:

0template<typename DT> struct DifferentialSerTransform {
1   template<typename T> DT operator()(T val);
2};
3
4template<typename T> struct DifferentialUnserTransform {
5  template<typename DT> T operator()(DT diff);
6};

sipa commented at 3:27 am on February 15, 2020:

That doesn’t really work, because VectorFormatter needs the input and output of the transform to be identical (or at least compatible)… suspect there is a way to make it infer what the input type to the transform on deserialization is, but right now, it creates a V::value_type to deserialize into.

sipa commented at 3:53 am on February 16, 2020:

No longer relevant in new code.

in src/blockencodings.h:95 in f3480e04d7 outdated

114-    inline void SerializationOp(Stream& s, Operation ser_action) {
115-        uint64_t idx = index;
116-        READWRITE(COMPACTSIZE(idx));
117-        if (idx > std::numeric_limits<uint16_t>::max())
118+    template<typename Stream>
119+    void Unserialize(Stream& s)

ryanofsky commented at 8:10 pm on February 14, 2020:

In commit “Convert blockencodings.h to new serialization framework” (d46dfeccaf1992ba92bff7e0d4988ae3f66b327a)

There should be no need to have custom serialize and deserialize methods, and do overflow checking with a temporary variable now that CompactSizeFormatter has the same checking. The following seems to be sufficient:

0SERIALIZE_METHODS(PrefilledTransaction, obj) { READWRITE(COMPACTSIZE(obj.index), obj.tx); }

sipa commented at 3:34 am on February 15, 2020:

Nice, done!

in src/blockencodings.h:136 in f3480e04d7 outdated

197-        if (BlockTxCount() > std::numeric_limits<uint16_t>::max())
198+    template <typename Stream>
199+    inline void Unserialize(Stream& s)
200+    {
201+        static_assert(SHORTTXIDS_LENGTH == 6, "shorttxids serialization assumes 6-byte shorttxids");
202+        s >> header >> nonce >> Using<VectorFormatter<Uint48Formatter>>(shorttxids) >> prefilledtxn;

ryanofsky commented at 8:29 pm on February 14, 2020:

In commit “Convert blockencodings.h to new serialization framework” (d46dfeccaf1992ba92bff7e0d4988ae3f66b327a)

Maybe a matter of preference, but it might be nice to avoid needing to repeat s << header... ss >> header... serialization order. Could do:

 0SERIALIZE_METHODS(CBlockHeaderAndShortTxIDs, obj)
 1{
 2    static_assert(SHORTTXIDS_LENGTH == 6, "shorttxids serialization assumes 6-byte shorttxids");
 3    READWRITE(obj.header, obj.nonce, Using<VectorFormatter<Uint48Formatter>>(obj.shorttxids), obj.prefilledtxn);
 4    if (ser_action.ForRead()) {
 5        if (obj.BlockTxCount() > std::numeric_limits<uint16_t>::max()) {
 6            throw std::ios_base::failure("indexes overflowed 16 bits");
 7        }
 8        obj.FillShortTxIDSelector();
 9    }
10}

sipa commented at 3:38 am on February 15, 2020:

Ah indeed, that works because shorttxidkX are mutable, which they probably shouldn’t be. But no need to fix that here.

Done.

ryanofsky approved

ryanofsky commented at 8:45 pm on February 14, 2020: member

Code review ACK d46dfeccaf1992ba92bff7e0d4988ae3f66b327a. I really like this change a lot. Serialization code in BlockTransactionsRequest, BlockTransactions, and CBlockHeaderAndShortTxIDs is much less repetitive and more understandable now, IMO. I left some suggestions below, but nothing important, so feel free to ignore.

Convert CCompactSize to proper formatter 3ca574cef0

sipa force-pushed on Feb 15, 2020

ryanofsky referenced this in commit f777879968 on Feb 15, 2020

ryanofsky approved

ryanofsky commented at 2:38 pm on February 15, 2020: member

Code review ACK 40fdd5faefe835a9868be1a667511780811a814e. Changes since last review were just suggestions tweaking asserts and switching to SERIALIZE_METHODS. Also, there’s a new combined DifferenceTransform class.

I think you could go further and make DifferenceTransform into a formatter: f7778799685a0d3d70cf6e18d0911e676cc02b90 (branch). This is nice because it collapses transforming and formatting into one function and does more size/overflow checking

Make VectorFormatter support stateful formatters 56dd9f04c7

Add DifferenceFormatter 10633398f2

sipa force-pushed on Feb 16, 2020

sipa commented at 3:56 am on February 16, 2020: member

@ryanofsky That’s amazing, it looks so much cleaner than my approach. I’ve taken your code and restructured it as commits in this PR.

I’ve also replaced Uint48Formatter with a CustomUintFormatter in serialize.h instead, as a separate commit.

in src/serialize.h:524 in 96b18829d7 outdated

519+    static_assert(Bytes > 0 && Bytes <= 8, "CustomUintFormatter Bytes out of range");
520+    static constexpr uint64_t MASK = 0xffffffffffffffff >> (8 * (8 - Bytes));
521+
522+    template <typename Stream, typename I> void Ser(Stream& s, I v)
523+    {
524+        if (v < 0 || (v & ~I(MASK)) != 0) throw std::ios_base::failure("CustomUintFormatter value out of range");

ryanofsky commented at 5:21 pm on February 24, 2020:

In commit “Add CustomUintFormatter” (96b18829d750baea8577cf409d85c0cd5186b67c)

Could replace (v & ~I(MASK)) != 0 with v <= MASK to be more consistent with other assert and avoid making an assumption about representation of v.

sipa commented at 10:11 pm on February 25, 2020:

Done, also renamed MASK to MAX.

ryanofsky approved

ryanofsky commented at 5:31 pm on February 24, 2020: member

Code review ACK 3efc7543662fe1d638ae9c94c00a1107b878d83a. Changes since last review are using the DifferenceFormatter -> DifferenceTransform replacement, reorganizing commits, and also replacing Uint48Formatter -> CustomUintFormatter

Add CustomUintFormatter e574fff53e

Convert blockencodings.h to new serialization framework 353f376277

sipa force-pushed on Feb 25, 2020

ryanofsky approved

ryanofsky commented at 11:01 pm on February 25, 2020: member

Code review ACK 353f376277ad9b87e03c9ccbc1028c4b6d12e8ea. Only changes since last review are suggested assert change and MASK->MAX rename

laanwj commented at 4:45 pm on March 2, 2020: member

ACK 353f376277ad9b87e03c9ccbc1028c4b6d12e8ea, nice change Thanks @ryanofsky for the suggestion making it even more concise.

laanwj merged this on Mar 5, 2020

laanwj closed this on Mar 5, 2020

laanwj removed this from the "Blockers" column in a project

sidhujag referenced this in commit 69e8548a31 on Mar 5, 2020

sidhujag referenced this in commit 4c7ebce962 on Nov 10, 2020

Fabcien referenced this in commit 7c7cbd48c1 on Jan 6, 2021

Fabcien referenced this in commit 99225de342 on Jan 6, 2021

Fabcien referenced this in commit 1be72a31df on Jan 6, 2021

Fabcien referenced this in commit 92575da52f on Jan 6, 2021

deadalnix referenced this in commit 7b24ab1964 on Jan 6, 2021

kittywhiskers referenced this in commit 739ac62b4f on Mar 10, 2021

kittywhiskers referenced this in commit f2a0751122 on Mar 10, 2021

kittywhiskers referenced this in commit 4a17e11fe9 on Mar 10, 2021

kittywhiskers referenced this in commit de4e4684e3 on Mar 10, 2021

kittywhiskers referenced this in commit 568c91756a on Mar 10, 2021

kittywhiskers referenced this in commit 2ba56d7d94 on Mar 10, 2021

kittywhiskers referenced this in commit 513e1d1b17 on Mar 10, 2021

kittywhiskers referenced this in commit 9f5b8c8b77 on Mar 10, 2021

kittywhiskers referenced this in commit 90368750c4 on Mar 16, 2021

kittywhiskers referenced this in commit 3797c12dc2 on Mar 16, 2021

PastaPastaPasta referenced this in commit 97caca3033 on Apr 18, 2021

UdjinM6 referenced this in commit 20b71700dc on May 14, 2021

kittywhiskers referenced this in commit c6a1395c8a on May 20, 2021

furszy referenced this in commit 5c93f159bc on Jul 5, 2021

random-zebra referenced this in commit b4751e10ce on Aug 11, 2021

in src/serialize.h:525 in 353f376277

520+    static constexpr uint64_t MAX = 0xffffffffffffffff >> (8 * (8 - Bytes));
521+
522+    template <typename Stream, typename I> void Ser(Stream& s, I v)
523+    {
524+        if (v < 0 || v > MAX) throw std::ios_base::failure("CustomUintFormatter value out of range");
525+        uint64_t raw = htole64(v);

cculianu commented at 2:23 pm on December 4, 2021:

Late to the party here – but this does not seem equivalent to the previous encoding mechanism. In particular, if the host machine is big endian, the previous code and this change will produce different, incompatible results.

sipa commented at 2:31 pm on December 4, 2021:

Can you give an example of how it would fail?

htole64 converts a number in host native representation (= the value the uint64_t input has in C) to 64-bit LE representation. On LE systems htole64 is a no-op. On BE systems it performs a byteswap on the representation of the number.

cculianu commented at 3:39 pm on December 4, 2021:

Oh, nevermind. False alarm. The serializations end up equivalent, they just take different paths to get there. :)

gades referenced this in commit 036ae6c2ed on Apr 26, 2022

ftrader referenced this in commit 2d7d885597 on May 29, 2022

ftrader referenced this in commit 9bc9c239d9 on May 29, 2022

ftrader referenced this in commit 97b37e0344 on May 29, 2022

ftrader referenced this in commit 717ec52af7 on May 29, 2022

ftrader referenced this in commit dff81a926b on May 29, 2022

malbit referenced this in commit 0781039dcd on Aug 7, 2022

DrahtBot locked this on Dec 4, 2022

Serialization improvements step 5 (blockencodings) #18112

Conflicts