Serialization improvements step 4 (undo.h) #18021

pull sipa wants to merge 4 commits into bitcoin:master from sipa:202001_noncastserial_4 changing 2 files +81 −88
  1. sipa commented at 7:30 pm on January 29, 2020: member

    The next step of changes from #10785.

    This one adds:

    • A meta-formatter for vectors, which serializes the vector elements using another formatter
    • Switch the undo.h code to the new framework, using the above (where undo entries are serialized as a vector, each of which uses a modified serializer for the UTXOs).
  2. sipa commented at 7:31 pm on January 29, 2020: member
    @ryanofsky I added another small improvement here that wasn’t present in #10785, namely reusing the VectorUsing logic in the vector/prevector serializers themselves (avoiding code duplication).
  3. laanwj added this to the "Blockers" column in a project

  4. in src/serialize.h:28 in 416300bd0c outdated
    24@@ -25,6 +25,9 @@
    25 
    26 static const unsigned int MAX_SIZE = 0x02000000;
    27 
    28+/** Maximum amount of memory to allocate at once when deserializing vectors. */
    


    Empact commented at 11:33 pm on January 29, 2020:
    nit: specify units?

    sipa commented at 4:57 am on January 30, 2020:
    Done.
  5. fanquake requested review from jamesob on Jan 29, 2020
  6. Add a constant for the maximum vector allocation (5 Mbyte) 37d800bea0
  7. sipa force-pushed on Jan 30, 2020
  8. kallewoof commented at 6:27 am on February 3, 2020: member
    Code review ACK 24235e50e9d532cbe5836dcebbce0c6a04be8652
  9. in src/serialize.h:726 in 24235e50e9 outdated
    722@@ -673,6 +723,16 @@ inline void Unserialize(Stream& is, T&& a)
    723     a.Unserialize(is);
    724 }
    725 
    726+/** Identity formatter. Serializes objects as themselves. */
    


    laanwj commented at 10:10 am on February 5, 2020:
    Maybe add a comment when this is to be used, on first glance it seems like a no-op.

    sipa commented at 3:09 am on February 7, 2020:
    Done.
  10. in src/serialize.h:635 in 24235e50e9 outdated
    629+        size_t allocated = 0;
    630+        while (allocated < size) {
    631+            // For DoS prevention, do not blindly allocate as much as the stream claims to contain.
    632+            // Instead, allocate in 5MiB batches, so that an attacker actually needs to provide
    633+            // X MiB of data to make us allocate X+5 Mib.
    634+            allocated = std::min(size, allocated + MAX_VECTOR_ALLOCATE / sizeof(typename V::value_type));
    


    laanwj commented at 10:13 am on February 5, 2020:
    A static assert that sizeof(typename V::value_type) <= MAX_VECTOR_ALLOCATE could catch an edge case here.

    sipa commented at 3:09 am on February 7, 2020:
    Done.
  11. in src/serialize.h:640 in 741e27bfe5 outdated
    635+            v.reserve(allocated);
    636+            while (v.size() < allocated) {
    637+                typename V::value_type val;
    638+                auto wrapped = Using<Formatter>(val);
    639+                s >> wrapped;
    640+                v.push_back(std::move(val));
    


    ryanofsky commented at 7:30 pm on February 5, 2020:

    In commit “Add custom vector-element formatter” (741e27bfe5eb6bdc61f99b300c16f4b2f92e8ec7)

    Could avoid creating the temporary:

    0v.emplace_back();
    1s >> Using<Formatter>(v.back());
    

    sipa commented at 3:09 am on February 7, 2020:
    Done.
  12. in src/undo.h:92 in 24235e50e9 outdated
     98-        // TODO: avoid reimplementing vector deserializer
     99-        uint64_t count = 0;
    100-        ::Unserialize(s, COMPACTSIZE(count));
    101-        if (count > MAX_INPUTS_PER_BLOCK) {
    102-            throw std::ios_base::failure("Too many input undo records");
    103-        }
    


    ryanofsky commented at 7:50 pm on February 5, 2020:

    In commit “Convert undo.h to new serialization framework” (24235e50e9d532cbe5836dcebbce0c6a04be8652)

    Note: With this check going away, there doesn’t appear to be any limit on the total size of the deserialized vprevout vector. The incremental allocation size is now limited to MAX_VECTOR_ALLOCATE, though.


    sipa commented at 3:10 am on February 7, 2020:
    Yeah, I’m aware. I don’t think this is significant. The earlier code needed a sanity limit, and had custom code, so it used a custom limit. The new code is safe by default (even though the limit in practice is much higher).
  13. in src/serialize.h:613 in 741e27bfe5 outdated
    608+ *
    609+ * V is not required to be an std::vector type. It works for any class that
    610+ * exposes a value_type, size, reserve, push_back, and const iterators.
    611+ */
    612+template<class Formatter>
    613+struct VectorUsing
    


    ryanofsky commented at 8:07 pm on February 5, 2020:

    I don’t think the VectorUsing and UsingSelf names for formatters are the best. Would suggest sticking to the FooFormatter convention and renaming VectorUsing to VectorFormatter and UsingSelf to DefaultFormatter.

    Aside from giving more consistent class names, it should also make usages more readable: Using<VectorUsing<UsingSelf>>(v) -> Using<VectorFormatter<DefaultFormatter>>(v)


    sipa commented at 3:10 am on February 7, 2020:
    That’s indeed better. Done.
  14. ryanofsky approved
  15. ryanofsky commented at 8:13 pm on February 5, 2020: member
    Code review ACK 24235e50e9d532cbe5836dcebbce0c6a04be8652. Left minor suggestions, feel free to ignore
  16. Add custom vector-element formatter
    This allows a very compact notation for serialization of vectors whose
    elements are not serialized using their default encoding.
    abf8624356
  17. Make std::vector and prevector reuse the VectorFormatter logic 3cd8ab9d11
  18. Convert undo.h to new serialization framework 3c94b0039d
  19. sipa force-pushed on Feb 7, 2020
  20. DrahtBot commented at 10:06 am on February 7, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #18088 (build: ensure we aren’t using GNU extensions by fanquake)
    • #18087 (Get rid of VARINT default argument by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  21. in src/serialize.h:633 in 3c94b0039d
    628+        size_t size = ReadCompactSize(s);
    629+        size_t allocated = 0;
    630+        while (allocated < size) {
    631+            // For DoS prevention, do not blindly allocate as much as the stream claims to contain.
    632+            // Instead, allocate in 5MiB batches, so that an attacker actually needs to provide
    633+            // X MiB of data to make us allocate X+5 Mib.
    


    jonatack commented at 2:00 pm on February 7, 2020:
    Lines 632 and 633, so the documentation doesn’t go out of date if the value of MAX_VECTOR_ALLOCATE changes, perhaps replace “5 MiB” and “5 Mib” with “MAX_VECTOR_ALLOCATE” (and s/X Mib/X/ as the constant is defined in bytes).
  22. in src/serialize.h:635 in 3c94b0039d
    630+        while (allocated < size) {
    631+            // For DoS prevention, do not blindly allocate as much as the stream claims to contain.
    632+            // Instead, allocate in 5MiB batches, so that an attacker actually needs to provide
    633+            // X MiB of data to make us allocate X+5 Mib.
    634+            static_assert(sizeof(typename V::value_type) <= MAX_VECTOR_ALLOCATE, "Vector element size too large");
    635+            allocated = std::min(size, allocated + MAX_VECTOR_ALLOCATE / sizeof(typename V::value_type));
    


    jonatack commented at 2:09 pm on February 7, 2020:
    Now that the static assert suggested in #18021 (review) has been added, possibly worthwhile (or not) to calculate sizeof(typename V::value_type) once and cache it.

    jonatack commented at 2:54 pm on February 7, 2020:
    Noting that the result of sizeof is always nonzero, even if applied to an empty class type, so (normally) no risk of division by zero.

    kallewoof commented at 3:31 pm on February 7, 2020:
    Unless I’m confused, sizeof(..) is determined at compile-time, so no reason to cache it.

    jonatack commented at 4:55 pm on February 7, 2020:
    Oops yes, static_assert requires a bool constexpr, otherwise a compile-time error is issued.
  23. jonatack commented at 2:44 pm on February 7, 2020: member

    Qualified ACK 3c94b0039d2c

    Code review, built/ran tests/ran bitcoind. I am not yet highly competent wrt C++ templates but this looks like a worthwhile refactoring. Would review again. Some thoughts below, feel free to ignore.

  24. ryanofsky approved
  25. ryanofsky commented at 7:02 pm on February 7, 2020: member
    Code review ACK 3c94b0039d2ca2a8c41fd6127ff5019a2afc304e. Changes since last review: renaming formatter classes, adding suggested static_assert, and removing temporary in VectorFormatter
  26. laanwj commented at 2:47 pm on February 10, 2020: member
    code review ACK 3c94b0039d2ca2a8c41fd6127ff5019a2afc304e
  27. laanwj referenced this in commit 4c2578706c on Feb 10, 2020
  28. laanwj merged this on Feb 10, 2020
  29. laanwj closed this on Feb 10, 2020

  30. fanquake removed this from the "Blockers" column in a project

  31. sidhujag referenced this in commit ee101ba39b on Feb 18, 2020
  32. sidhujag referenced this in commit fc5b6b9daf on Nov 10, 2020
  33. deadalnix referenced this in commit b05d4ce895 on Dec 23, 2020
  34. deadalnix referenced this in commit 4ce8d58f5c on Dec 23, 2020
  35. Fabcien referenced this in commit 96abf36e50 on Dec 23, 2020
  36. Fabcien referenced this in commit 74eb604218 on Dec 23, 2020
  37. kittywhiskers referenced this in commit 98b18b83d4 on Mar 10, 2021
  38. kittywhiskers referenced this in commit aeb9240e41 on Mar 10, 2021
  39. kittywhiskers referenced this in commit 603db4699b on Mar 10, 2021
  40. kittywhiskers referenced this in commit 5bd678db41 on Mar 16, 2021
  41. kittywhiskers referenced this in commit ef07309697 on Mar 16, 2021
  42. PastaPastaPasta referenced this in commit 0802bdfa37 on Apr 18, 2021
  43. UdjinM6 referenced this in commit 20b71700dc on May 14, 2021
  44. kittywhiskers referenced this in commit b9007ef802 on May 20, 2021
  45. furszy referenced this in commit 5c93f159bc on Jul 5, 2021
  46. random-zebra referenced this in commit b4751e10ce on Aug 11, 2021
  47. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-22 03:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me