Serialization improvements step 5 (blockencodings) #18112

pull sipa wants to merge 5 commits into bitcoin:master from sipa:202002_noncastserial_5 changing 3 files +87 −126
  1. sipa commented at 9:38 pm on February 10, 2020: member

    This is probably the most involved change in the sequence of changes extracted from #10785.

    In order to implement the differential encoding of BIP152, this change changes VectorFormatter to permit a stateful sub-formatter, which is then used by DifferenceFormatter. A CustomUintFormatter is added as well to do the 48-bit serialization of short ids.

  2. sipa requested review from ryanofsky on Feb 10, 2020
  3. DrahtBot commented at 10:10 pm on February 10, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    No conflicts as of last run.

  4. fanquake added this to the "Blockers" column in a project

  5. DrahtBot added the label Needs rebase on Feb 11, 2020
  6. laanwj added the label Utils/log/libs on Feb 13, 2020
  7. sipa force-pushed on Feb 13, 2020
  8. sipa commented at 5:16 pm on February 13, 2020: member
    Rebased.
  9. DrahtBot removed the label Needs rebase on Feb 13, 2020
  10. in src/serialize.h:557 in 99fa6174c5 outdated
    557-    explicit CCompactSize(uint64_t& nIn) : n(nIn) { }
    558+    template<typename Stream, typename I>
    559+    void Unser(Stream& s, I& v)
    560+    {
    561+        static_assert(std::is_unsigned<I>::value, "CompactSize only supported for unsigned integers");
    562+        static_assert(std::numeric_limits<I>::max() <= std::numeric_limits<uint64_t>::max(), "CompactSize only supports 64-bit integers and below");
    


    ryanofsky commented at 9:58 pm on February 13, 2020:

    In commit “Convert CCompactSize to proper formatter” (8b8ebf016c1de3c3827beff7666c0adb628355bf)

    I’d consider dropping the static asserts here, because they don’t really make sense for deserializing (they do make sense in the Ser function below). I don’t thinks a reason to disallow deserializing into variables wider than 64 bits or signed variables if the value fits.


    sipa commented at 3:24 am on February 15, 2020:
    Done.
  11. in src/serialize.h:560 in 99fa6174c5 outdated
    563 
    564-    template<typename Stream>
    565-    void Serialize(Stream &s) const {
    566-        WriteCompactSize<Stream>(s, n);
    567+        uint64_t n = ReadCompactSize<Stream>(s);
    568+        if (n > std::numeric_limits<I>::max()) {
    


    ryanofsky commented at 10:02 pm on February 13, 2020:

    In commit “Convert CCompactSize to proper formatter” (8b8ebf016c1de3c3827beff7666c0adb628355bf)

    It’s unlikely to make a difference in practice unless I is some crazy type that starts above 0, but I’d add || n < std::numeric_limits<I>::min() to be complete here


    sipa commented at 3:24 am on February 15, 2020:
    Done.
  12. in src/blockencodings.h:14 in f3480e04d7 outdated
    10@@ -11,17 +11,60 @@
    11 class CTxMemPool;
    12 
    13 // Dumb helper to handle CTransaction compression at serialize-time
    14-struct TransactionCompressor {
    15-private:
    16-    CTransactionRef& tx;
    17-public:
    18-    explicit TransactionCompressor(CTransactionRef& txIn) : tx(txIn) {}
    19+struct TransactionCompression
    


    ryanofsky commented at 10:13 pm on February 13, 2020:

    In commit “Convert blockencodings.h to new serialization framework” (d46dfeccaf1992ba92bff7e0d4988ae3f66b327a)

    I think you could drop this class and use DefaultFormatter below instead


    sipa commented at 3:24 am on February 15, 2020:
    Done, by adding a type alias (keeping it obvious where new implementations would go).
  13. in src/blockencodings.h:25 in f3480e04d7 outdated
    25+struct Uint48Formatter
    26+{
    27+    template <typename Stream, typename I> void Ser(Stream& s, I v)
    28+    {
    29+        static_assert(std::is_unsigned<I>::value, "Uint48Formatter needs an unsigned integer");
    30+        static_assert(sizeof(I) >= 6, "Uint48Formatter needs a 48+ bit type");
    


    ryanofsky commented at 10:46 pm on February 13, 2020:

    In commit “Convert blockencodings.h to new serialization framework” (d46dfeccaf1992ba92bff7e0d4988ae3f66b327a)

    Would drop these lines. There should be no problem serializing a value smaller than 48 bits as 48 bits (or serializing a signed value value that’s positive). I think a runtime check would be more appropriate:

    0if (v >> 48 || v < 0) {
    1    throw std::ios_base::failure("Uint48Formatter value out of range");
    2}
    

    sipa commented at 3:24 am on February 15, 2020:
    Done.
  14. in src/blockencodings.h:35 in f3480e04d7 outdated
    36 
    37-    ADD_SERIALIZE_METHODS;
    38+    template <typename Stream, typename I> void Unser(Stream& s, I& v)
    39+    {
    40+        static_assert(std::is_unsigned<I>::value, "Uint48Formatter needs an unsigned integer");
    41+        static_assert(sizeof(I) >= 6, "Uint48Formatter needs a 48+ bit type");
    


    ryanofsky commented at 6:36 pm on February 14, 2020:

    In commit “Convert blockencodings.h to new serialization framework” (d46dfeccaf1992ba92bff7e0d4988ae3f66b327a)

    These checks seem too strict. Could just check for a big enough range and not care about the format

    0static_assert(std::numeric_limits<I>::max() >= 0xffffffffffff && std::numeric_limits<I>::min() <= 0, "Uint48Formatter needs a 48+ bit type");
    

    sipa commented at 3:27 am on February 15, 2020:
    Done.
  15. in src/blockencodings.h:50 in f3480e04d7 outdated
    54+{
    55+    uint16_t m_subtract = 0;
    56+
    57+    uint16_t operator()(uint16_t val)
    58+    {
    59+        uint16_t tmp = val - m_subtract;
    


    ryanofsky commented at 6:46 pm on February 14, 2020:

    In commit “Convert blockencodings.h to new serialization framework” (d46dfeccaf1992ba92bff7e0d4988ae3f66b327a)

    Should this raise an error if val <= tmp? It seems like the serializer shouldn’t create a sequence that the deserializer would reject.


    sipa commented at 3:25 am on February 15, 2020:

    I think the code is written assuming that the data fed to the serializer is sane.

    I’ve merged the two differential transforms into one; I think this is more readable too.

  16. in src/blockencodings.h:48 in f3480e04d7 outdated
    52+
    53+struct DifferentialSerTransform
    54+{
    55+    uint16_t m_subtract = 0;
    56+
    57+    uint16_t operator()(uint16_t val)
    


    ryanofsky commented at 7:18 pm on February 14, 2020:

    In commit “Convert blockencodings.h to new serialization framework” (d46dfeccaf1992ba92bff7e0d4988ae3f66b327a)

    Maybe not worth extra effort or complexity, but I think the transforms would be safer and actually more understandable if they were templated. Safer because by hardcoding uint16_t as the sequence and diff types, lets c++ get away with truncating wider values to 16 bits with no checking. I think the ideal would look something like:

    0template<typename DT> struct DifferentialSerTransform {
    1   template<typename T> DT operator()(T val);
    2};
    3
    4template<typename T> struct DifferentialUnserTransform {
    5  template<typename DT> T operator()(DT diff);
    6};
    

    sipa commented at 3:27 am on February 15, 2020:
    That doesn’t really work, because VectorFormatter needs the input and output of the transform to be identical (or at least compatible)… suspect there is a way to make it infer what the input type to the transform on deserialization is, but right now, it creates a V::value_type to deserialize into.

    sipa commented at 3:53 am on February 16, 2020:
    No longer relevant in new code.
  17. in src/blockencodings.h:95 in f3480e04d7 outdated
    114-    inline void SerializationOp(Stream& s, Operation ser_action) {
    115-        uint64_t idx = index;
    116-        READWRITE(COMPACTSIZE(idx));
    117-        if (idx > std::numeric_limits<uint16_t>::max())
    118+    template<typename Stream>
    119+    void Unserialize(Stream& s)
    


    ryanofsky commented at 8:10 pm on February 14, 2020:

    In commit “Convert blockencodings.h to new serialization framework” (d46dfeccaf1992ba92bff7e0d4988ae3f66b327a)

    There should be no need to have custom serialize and deserialize methods, and do overflow checking with a temporary variable now that CompactSizeFormatter has the same checking. The following seems to be sufficient:

    0SERIALIZE_METHODS(PrefilledTransaction, obj) { READWRITE(COMPACTSIZE(obj.index), obj.tx); }
    

    sipa commented at 3:34 am on February 15, 2020:
    Nice, done!
  18. in src/blockencodings.h:136 in f3480e04d7 outdated
    197-        if (BlockTxCount() > std::numeric_limits<uint16_t>::max())
    198+    template <typename Stream>
    199+    inline void Unserialize(Stream& s)
    200+    {
    201+        static_assert(SHORTTXIDS_LENGTH == 6, "shorttxids serialization assumes 6-byte shorttxids");
    202+        s >> header >> nonce >> Using<VectorFormatter<Uint48Formatter>>(shorttxids) >> prefilledtxn;
    


    ryanofsky commented at 8:29 pm on February 14, 2020:

    In commit “Convert blockencodings.h to new serialization framework” (d46dfeccaf1992ba92bff7e0d4988ae3f66b327a)

    Maybe a matter of preference, but it might be nice to avoid needing to repeat s << header... ss >> header... serialization order. Could do:

     0SERIALIZE_METHODS(CBlockHeaderAndShortTxIDs, obj)
     1{
     2    static_assert(SHORTTXIDS_LENGTH == 6, "shorttxids serialization assumes 6-byte shorttxids");
     3    READWRITE(obj.header, obj.nonce, Using<VectorFormatter<Uint48Formatter>>(obj.shorttxids), obj.prefilledtxn);
     4    if (ser_action.ForRead()) {
     5        if (obj.BlockTxCount() > std::numeric_limits<uint16_t>::max()) {
     6            throw std::ios_base::failure("indexes overflowed 16 bits");
     7        }
     8        obj.FillShortTxIDSelector();
     9    }
    10}
    

    sipa commented at 3:38 am on February 15, 2020:

    Ah indeed, that works because shorttxidkX are mutable, which they probably shouldn’t be. But no need to fix that here.

    Done.

  19. ryanofsky approved
  20. ryanofsky commented at 8:45 pm on February 14, 2020: member
    Code review ACK d46dfeccaf1992ba92bff7e0d4988ae3f66b327a. I really like this change a lot. Serialization code in BlockTransactionsRequest, BlockTransactions, and CBlockHeaderAndShortTxIDs is much less repetitive and more understandable now, IMO. I left some suggestions below, but nothing important, so feel free to ignore.
  21. Convert CCompactSize to proper formatter 3ca574cef0
  22. sipa force-pushed on Feb 15, 2020
  23. sipa force-pushed on Feb 15, 2020
  24. sipa force-pushed on Feb 15, 2020
  25. ryanofsky referenced this in commit f777879968 on Feb 15, 2020
  26. ryanofsky approved
  27. ryanofsky commented at 2:38 pm on February 15, 2020: member

    Code review ACK 40fdd5faefe835a9868be1a667511780811a814e. Changes since last review were just suggestions tweaking asserts and switching to SERIALIZE_METHODS. Also, there’s a new combined DifferenceTransform class.

    I think you could go further and make DifferenceTransform into a formatter: f7778799685a0d3d70cf6e18d0911e676cc02b90 (branch). This is nice because it collapses transforming and formatting into one function and does more size/overflow checking

  28. Make VectorFormatter support stateful formatters 56dd9f04c7
  29. Add DifferenceFormatter 10633398f2
  30. sipa force-pushed on Feb 16, 2020
  31. sipa commented at 3:56 am on February 16, 2020: member

    @ryanofsky That’s amazing, it looks so much cleaner than my approach. I’ve taken your code and restructured it as commits in this PR.

    I’ve also replaced Uint48Formatter with a CustomUintFormatter in serialize.h instead, as a separate commit.

  32. in src/serialize.h:524 in 96b18829d7 outdated
    519+    static_assert(Bytes > 0 && Bytes <= 8, "CustomUintFormatter Bytes out of range");
    520+    static constexpr uint64_t MASK = 0xffffffffffffffff >> (8 * (8 - Bytes));
    521+
    522+    template <typename Stream, typename I> void Ser(Stream& s, I v)
    523+    {
    524+        if (v < 0 || (v & ~I(MASK)) != 0) throw std::ios_base::failure("CustomUintFormatter value out of range");
    


    ryanofsky commented at 5:21 pm on February 24, 2020:

    In commit “Add CustomUintFormatter” (96b18829d750baea8577cf409d85c0cd5186b67c)

    Could replace (v & ~I(MASK)) != 0 with v <= MASK to be more consistent with other assert and avoid making an assumption about representation of v.


    sipa commented at 10:11 pm on February 25, 2020:
    Done, also renamed MASK to MAX.
  33. ryanofsky approved
  34. ryanofsky commented at 5:31 pm on February 24, 2020: member
    Code review ACK 3efc7543662fe1d638ae9c94c00a1107b878d83a. Changes since last review are using the DifferenceFormatter -> DifferenceTransform replacement, reorganizing commits, and also replacing Uint48Formatter -> CustomUintFormatter
  35. Add CustomUintFormatter e574fff53e
  36. Convert blockencodings.h to new serialization framework 353f376277
  37. sipa force-pushed on Feb 25, 2020
  38. ryanofsky approved
  39. ryanofsky commented at 11:01 pm on February 25, 2020: member
    Code review ACK 353f376277ad9b87e03c9ccbc1028c4b6d12e8ea. Only changes since last review are suggested assert change and MASK->MAX rename
  40. laanwj commented at 4:45 pm on March 2, 2020: member
    ACK 353f376277ad9b87e03c9ccbc1028c4b6d12e8ea, nice change Thanks @ryanofsky for the suggestion making it even more concise.
  41. laanwj merged this on Mar 5, 2020
  42. laanwj closed this on Mar 5, 2020

  43. laanwj removed this from the "Blockers" column in a project

  44. sidhujag referenced this in commit 69e8548a31 on Mar 5, 2020
  45. sidhujag referenced this in commit 4c7ebce962 on Nov 10, 2020
  46. Fabcien referenced this in commit 7c7cbd48c1 on Jan 6, 2021
  47. Fabcien referenced this in commit 99225de342 on Jan 6, 2021
  48. Fabcien referenced this in commit 1be72a31df on Jan 6, 2021
  49. Fabcien referenced this in commit 92575da52f on Jan 6, 2021
  50. deadalnix referenced this in commit 7b24ab1964 on Jan 6, 2021
  51. kittywhiskers referenced this in commit 739ac62b4f on Mar 10, 2021
  52. kittywhiskers referenced this in commit f2a0751122 on Mar 10, 2021
  53. kittywhiskers referenced this in commit 4a17e11fe9 on Mar 10, 2021
  54. kittywhiskers referenced this in commit de4e4684e3 on Mar 10, 2021
  55. kittywhiskers referenced this in commit 568c91756a on Mar 10, 2021
  56. kittywhiskers referenced this in commit 2ba56d7d94 on Mar 10, 2021
  57. kittywhiskers referenced this in commit 513e1d1b17 on Mar 10, 2021
  58. kittywhiskers referenced this in commit 9f5b8c8b77 on Mar 10, 2021
  59. kittywhiskers referenced this in commit 90368750c4 on Mar 16, 2021
  60. kittywhiskers referenced this in commit 3797c12dc2 on Mar 16, 2021
  61. PastaPastaPasta referenced this in commit 97caca3033 on Apr 18, 2021
  62. UdjinM6 referenced this in commit 20b71700dc on May 14, 2021
  63. kittywhiskers referenced this in commit c6a1395c8a on May 20, 2021
  64. furszy referenced this in commit 5c93f159bc on Jul 5, 2021
  65. random-zebra referenced this in commit b4751e10ce on Aug 11, 2021
  66. in src/serialize.h:525 in 353f376277
    520+    static constexpr uint64_t MAX = 0xffffffffffffffff >> (8 * (8 - Bytes));
    521+
    522+    template <typename Stream, typename I> void Ser(Stream& s, I v)
    523+    {
    524+        if (v < 0 || v > MAX) throw std::ios_base::failure("CustomUintFormatter value out of range");
    525+        uint64_t raw = htole64(v);
    


    cculianu commented at 2:23 pm on December 4, 2021:
    Late to the party here – but this does not seem equivalent to the previous encoding mechanism. In particular, if the host machine is big endian, the previous code and this change will produce different, incompatible results.

    sipa commented at 2:31 pm on December 4, 2021:

    Can you give an example of how it would fail?

    htole64 converts a number in host native representation (= the value the uint64_t input has in C) to 64-bit LE representation. On LE systems htole64 is a no-op. On BE systems it performs a byteswap on the representation of the number.


    cculianu commented at 3:39 pm on December 4, 2021:
    Oh, nevermind. False alarm. The serializations end up equivalent, they just take different paths to get there. :)
  67. gades referenced this in commit 036ae6c2ed on Apr 26, 2022
  68. ftrader referenced this in commit 2d7d885597 on May 29, 2022
  69. ftrader referenced this in commit 9bc9c239d9 on May 29, 2022
  70. ftrader referenced this in commit 97b37e0344 on May 29, 2022
  71. ftrader referenced this in commit 717ec52af7 on May 29, 2022
  72. ftrader referenced this in commit dff81a926b on May 29, 2022
  73. malbit referenced this in commit 0781039dcd on Aug 7, 2022
  74. DrahtBot locked this on Dec 4, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-18 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me