Various serialization simplifcations and optimizations #9039

pull sipa wants to merge 9 commits into bitcoin:master from sipa:simpleserial changing 39 files +409 −543
  1. sipa commented at 1:45 am on October 29, 2016: member

    The commits in this pull request implement a sequence of changes:

    • Simplifications:
      • Remove unused ReadVersion and WriteVersion CDataStream and CAutoFile had a ReadVersion and WriteVersion method that was never used. Remove them.
      • Make nType and nVersion private and sometimes const Make the various stream implementations’ nType and nVersion private and const (except in CDataStream where we really need a setter).
      • Make streams’ read and write return void The stream implementations have two layers (the upper one with operator« and operator», and a lower one with read and write). The lower layer’s return values are never used (nor should they, as they should only be used from the higher layer), so make them void.
      • Make GetSerializeSize a wrapper on top of CSizeComputer Given that in default GetSerializeSize implementations we’re already using CSizeComputer(), get rid of the specialized GetSerializeSize methods everywhere, and just use CSizeComputer. This removes a lot of code which isn’t actually used anywhere. In a few places, this removes an actually more efficient size computing algorithm, which we’ll bring back in the “Add optimized CSizeComputer serializers” commit later.
      • Get rid of nType and nVersion The big change: remove the nType and nVersion as parameters to all serialization methods and functions. There is only one place where it’s read and has an impact (in CAddress), and even there it does not impact any of the member objects’ serializations. Instead, the few places that need nType or nVersion read it directly from the stream, through GetType and GetVersion calls which are added to all streams.
      • Avoid -Wshadow errors As suggested by @paveljanik, remove the few remaining cases of variable shadowing in the serialization code.
    • Optimizations:
      • Make CSerAction’s ForRead() constexpr The CSerAction’s ForRead() method does not depend on any runtime data, so guarantee that requests to it can be optimized out by making it constexpr (suggested by @theuni in #8580).
      • Add optimized CSizeComputer serializers To get the advantages of faster GetSerializeSize implementations back, reintroduce them in the few places where they actually make a difference, in the form of a specialized Serialize implementation. This actually gets us in a better state than before, as these even get used when they’re nested inside the serialization of another object.
      • Use fixed preallocation instead of costly GetSerializeSize dbwrapper uses GetSerializeSize to compute the size of the buffer to preallocate. For some cases (specifically: CCoins) this requires a costly compression call. Avoid this by just using fixed size preallocations instead.

    This will make it easier to address @TheBlueMatt’s comments in #8580, resulting is a simpler and more efficient way to simultaneously deserialize+construct objects with const members from streams.

    I assume this conflicts/supersedes #8468.

  2. sipa added the label Refactoring on Oct 29, 2016
  3. sipa force-pushed on Oct 29, 2016
  4. sipa force-pushed on Oct 29, 2016
  5. sipa force-pushed on Oct 29, 2016
  6. sipa force-pushed on Oct 29, 2016
  7. sipa force-pushed on Oct 29, 2016
  8. sipa commented at 6:19 am on October 30, 2016: member
    A benchmark of a -reindex-chainstate until height 290000, with default dbcache, shows that this is around 1% faster (4 runs on master with times ranging 1370.99-1377.95s, 4 runs on this PR with times ranging 1359.90-1365.74s; measured using -bench=debug).
  9. dcousens approved
  10. in src/serialize.h: in 87d523c7b7 outdated
    446@@ -474,7 +447,7 @@ class LimitedString
    447     LimitedString(std::string& string) : string(string) {}
    


    paveljanik commented at 3:19 pm on October 30, 2016:

    Can you please also change this into

    0LimitedString(std::string& _string) : string(_string) {}
    

    This will remove 204 -Wshadow warnings…

  11. in src/serialize.h: in 87d523c7b7 outdated
    829-    int nVersion;
    830-
    831     CSizeComputer(int nTypeIn, int nVersionIn) : nSize(0), nType(nTypeIn), nVersion(nVersionIn) {}
    832 
    833-    CSizeComputer& write(const char *psz, size_t nSize)
    834+    void write(const char *psz, size_t nSize)
    


    paveljanik commented at 3:20 pm on October 30, 2016:
    _nSize please.
  12. in src/serialize.h: in 87d523c7b7 outdated
    834+    void write(const char *psz, size_t nSize)
    835+    {
    836+        this->nSize += nSize;
    837+    }
    838+
    839+    void increment(size_t nSize)
    


    paveljanik commented at 3:20 pm on October 30, 2016:
    ditto, _nSize please.
  13. paveljanik commented at 3:21 pm on October 30, 2016: contributor
    Concept ACK. I like it!
  14. sipa force-pushed on Oct 30, 2016
  15. sipa commented at 8:48 pm on October 30, 2016: member
    Addressed @paveljanik’s nits.
  16. paveljanik commented at 9:15 pm on October 30, 2016: contributor

    Compiling with clang brings new warnings (diff master->#9039):

    0+In file included from init.cpp:41:
    1+./wallet/wallet.h:137:13: warning: 'static' function 'ReadOrderPos' declared in header file should be declared 'static inline' [-Wunneeded-internal-declaration]
    2+static void ReadOrderPos(int64_t& nOrderPos, mapValue_t& mapValue)
    3             ^
    4+./wallet/wallet.h:148:13: warning: 'static' function 'WriteOrderPos' declared in header file should be declared 'static inline' [-Wunneeded-internal-declaration]
    5+static void WriteOrderPos(const int64_t& nOrderPos, mapValue_t& mapValue)
    6             ^
    

    The overall efect on build logs is brutal here (-Wshadow turned on by default):

    0-rw-r--r--  1 pavel  wheel   536727 Oct 30 22:20 9039.log
    1-rw-r--r--  1 pavel  wheel  2873995 Oct 30 22:18 master.log
    

    Great work @sipa! Thanks.

    utACK https://github.com/bitcoin/bitcoin/pull/9039/commits/50473bb142e66f42a82f5fe57919776522d299aa

    I will continue testing tomorrow.

  17. sipa commented at 9:21 pm on October 30, 2016: member
    @paveljanik That seems completely unrelated to this PR. Is it possible that clang only shows a limited number of warnings, and only gets to those new ones after the current ones fixed by this PR are fixed?
  18. paveljanik commented at 9:23 pm on October 30, 2016: contributor
    @sipa Well, maybe. I do not know yet 8)
  19. paveljanik commented at 11:26 am on October 31, 2016: contributor

    Hmm, compiling e.g. src/wallet/wallet.cpp prints NO warning in the current master. But this is printed after applying this PR:

    0In file included from wallet/wallet.cpp:6:
    1./wallet/wallet.h:137:13: warning: 'static' function 'ReadOrderPos' declared in header file should be declared 'static inline' [-Wunneeded-internal-declaration]
    2static void ReadOrderPos(int64_t& nOrderPos, mapValue_t& mapValue)
    3            ^
    4./wallet/wallet.h:148:13: warning: 'static' function 'WriteOrderPos' declared in header file should be declared 'static inline' [-Wunneeded-internal-declaration]
    5static void WriteOrderPos(const int64_t& nOrderPos, mapValue_t& mapValue)
    6            ^
    72 warnings generated.
    

    The command line used is the same, ccache removed.

  20. theuni commented at 10:31 pm on November 1, 2016: member
    Concept ACK and quick review utACK. I can confirm @paveljanik’s new warning, but can’t figure out why. Inlining shuts it up.
  21. sipa commented at 8:55 pm on November 2, 2016: member
    @theuni @paveljanik I can’t explain the new warning, but it is correct. I’ll add an (technically unrelated) commit here to fix it?
  22. paveljanik commented at 9:14 pm on November 2, 2016: contributor
    Yes, please.
  23. MarcoFalke commented at 10:23 pm on November 2, 2016: member

    I’ll add an (technically unrelated) commit here to fix it?

    If it is unrelated, better move it to a new pull request.

    (Could not see the warning here, when rebasing on current master and typing make)

  24. sipa commented at 0:30 am on November 3, 2016: member
  25. TheBlueMatt commented at 10:59 pm on November 3, 2016: member
    Your commit message on “Make streams’ read and write return void” has the description of “Make nType and nVersion private and sometimes const” and vice versa.
  26. sipa force-pushed on Nov 3, 2016
  27. sipa commented at 11:44 pm on November 3, 2016: member
    @TheBlueMatt Fixed!
  28. in src/coins.h: in f700506516 outdated
    164-        // version
    165-        nSize += ::GetSerializeSize(VARINT(this->nVersion), nType, nVersion);
    166-        // size of header code
    167-        nSize += ::GetSerializeSize(VARINT(nCode), nType, nVersion);
    168-        // spentness bitmask
    169-        nSize += nMaskSize;
    


    TheBlueMatt commented at 3:19 pm on November 4, 2016:
    Strictly speaking this is also a performance regression…do we care about the time taken to iterate over the nMastkSize bits (ie do we ever care about the performance of CCoins serialization that much?)

    sipa commented at 0:21 am on November 5, 2016:
    I actually think this will be optimized out. It’s a loop with known count in CCoin::Serialize, where each iteration results in a counter increment by one. Also, yes, I don’t think we care.
  29. sipa force-pushed on Nov 4, 2016
  30. in src/uint256.h: in 8d72ac410c outdated
    77@@ -78,11 +78,6 @@ class base_blob
    78         return sizeof(data);
    79     }
    80 
    81-    unsigned int GetSerializeSize(int nType, int nVersion) const
    


    TheBlueMatt commented at 6:41 pm on November 4, 2016:
    This breaks the uint256_tests compile, but you fixed it in “Get rid of nType and nVersion”, should probably move that few-line diff back to this commit.

    sipa commented at 9:56 pm on November 7, 2016:
    Fixed.
  31. in src/dbwrapper.h: in c39050a22a outdated
    16@@ -17,6 +17,9 @@
    17 #include <leveldb/db.h>
    18 #include <leveldb/write_batch.h>
    19 
    20+static const size_t DBWRAPPER_PREALLOC_KEY_SIZE = 64;
    21+static const size_t DBWRAPPER_PREALLOC_VALUE_SIZE = 1024;
    


    TheBlueMatt commented at 10:42 pm on November 4, 2016:
    I see no reason not to go higher here? Keys should ~always be 35 bytes (ish, but 64 is fine), but values could be higher…but might as well also prealloc more than 1K for values since they’re occasionally higher and the memory usage isnt an issue. Did you benchmark this?

    sipa commented at 0:24 am on November 5, 2016:
    I did benchmark the change overall, and the “Flush to disk” time went down by around 5% from just this. I did not benchmark any other sizes, but my expectation is that as long as the key size is over 35, and the value over 100 or so, there will not be any measurable difference, as these cases occur so infrequently, and a vector resize isn’t the end of the world.
  32. TheBlueMatt commented at 10:43 pm on November 4, 2016: member
    utACK f700506516d47b7f010b4954d277b17c9e2e43a9. Few nits and questions, but none of them really matter.
  33. gmaxwell commented at 5:39 am on November 7, 2016: contributor
    21:22 < fanquake> Has anyone benched #9039 ? Sipa posted a 1% speedup, but I’m seeing nearly 10%. Wondering if I’m missing something.. 21:34 < sipa> fanquake: highly dependent on your system 21:34 < sipa> i saw 5% in cpu reduction in the disk flushing code
  34. laanwj commented at 9:37 am on November 7, 2016: member
    Needs rebase after #8708
  35. MarcoFalke added this to the milestone 0.14.0 on Nov 7, 2016
  36. Remove unused ReadVersion and WriteVersion
    CDataStream and CAutoFile had a ReadVersion and WriteVersion method
    that was never used. Remove them.
    50e8a9ccd7
  37. Make streams' read and write return void
    The stream implementations had two cascading layers (the upper one
    with operator<< and operator>>, and a lower one with read and write).
    The lower layer's functions are never cascaded (nor should they, as
    they should only be used from the higher layer), so make them return
    void instead.
    c2c5d42f36
  38. sipa force-pushed on Nov 7, 2016
  39. sipa commented at 9:23 pm on November 7, 2016: member
    Rebased on top of #8708.
  40. Make nType and nVersion private and sometimes const
    Make the various stream implementations' nType and nVersion private
    and const (except in CDataStream where we really need a setter).
    fad9b66504
  41. Make GetSerializeSize a wrapper on top of CSizeComputer
    Given that in default GetSerializeSize implementations created by
    ADD_SERIALIZE_METHODS we're already using CSizeComputer(), get rid
    of the specialized GetSerializeSize methods everywhere, and just use
    CSizeComputer. This removes a lot of code which isn't actually used
    anywhere.
    
    For CCompactSize and CVarInt this actually removes a more efficient
    size computing algorithm, which is brought back in a later commit.
    657e05ab2e
  42. Get rid of nType and nVersion
    Remove the nType and nVersion as parameters to all serialization methods
    and functions. There is only one place where it's read and has an impact
    (in CAddress), and even there it does not impact any of the recursively
    invoked serializers.
    
    Instead, the few places that need nType or nVersion are changed to read
    it directly from the stream object, through GetType() and GetVersion()
    methods which are added to all stream classes.
    528472111b
  43. Avoid -Wshadow errors
    Suggested by Pavel Janik.
    a603925c77
  44. Make CSerAction's ForRead() constexpr
    The CSerAction's ForRead() method does not depend on any runtime
    data, so guarantee that requests to it can be optimized out by
    making it constexpr.
    
    Suggested by Cory Fields.
    a2929a26f5
  45. Add optimized CSizeComputer serializers
    To get the advantages of faster GetSerializeSize() implementations
    back that were removed in "Make GetSerializeSize a wrapper on top of
    CSizeComputer", reintroduce them in the few places in the form of a
    specialized Serialize() implementation. This actually gets us in a
    better state than before, as these even get used when they're invoked
    indirectly in the serialization of another object.
    25a211aa9e
  46. Use fixed preallocation instead of costly GetSerializeSize
    Dbwrapper used GetSerializeSize() to compute the size of the buffer
    to preallocate. For some cases (specifically: CCoins) this requires
    a costly compression call. Avoid this by just using fixed size
    preallocations instead.
    d59a518466
  47. sipa force-pushed on Nov 7, 2016
  48. gmaxwell commented at 0:05 am on November 8, 2016: contributor
    Rescan of recent blocks appears to be 4% faster with this on my laptop.
  49. gmaxwell commented at 10:53 pm on November 8, 2016: contributor
    tested ACK.
  50. laanwj commented at 12:52 pm on November 9, 2016: member
    Code review + concept ACK d59a518
  51. laanwj merged this on Nov 9, 2016
  52. laanwj closed this on Nov 9, 2016

  53. laanwj referenced this in commit e81df49644 on Nov 9, 2016
  54. jtimon commented at 10:21 pm on November 14, 2016: contributor
    Fast review on the parts related to removing nType and nVersion for serialization d59a518.
  55. codablock referenced this in commit 19a2d668cf on Jan 15, 2018
  56. gladcow referenced this in commit cdd8bfc8f7 on Mar 5, 2018
  57. gladcow referenced this in commit b3dee2b0fa on Mar 13, 2018
  58. gladcow referenced this in commit c7455fa6a3 on Mar 14, 2018
  59. gladcow referenced this in commit 3b414ed8f3 on Mar 15, 2018
  60. gladcow referenced this in commit e6d36cec9c on Mar 15, 2018
  61. gladcow referenced this in commit e49a2606c5 on Mar 15, 2018
  62. gladcow referenced this in commit e29f44711d on Mar 15, 2018
  63. gladcow referenced this in commit e12f297813 on Mar 24, 2018
  64. gladcow referenced this in commit a526e9bf89 on Apr 4, 2018
  65. UdjinM6 referenced this in commit bc45a2f87a on Apr 11, 2018
  66. zkbot referenced this in commit 3b68ab255f on Apr 17, 2018
  67. zkbot referenced this in commit d408e23ab7 on Apr 18, 2018
  68. zkbot referenced this in commit 0753a0e8a9 on Apr 19, 2018
  69. andvgal referenced this in commit ccefcb6d29 on Jan 6, 2019
  70. andvgal referenced this in commit fd5c50bc2b on Jan 6, 2019
  71. CryptoCentric referenced this in commit 5185fb954a on Feb 24, 2019
  72. CryptoCentric referenced this in commit dd3fd51204 on Feb 28, 2019
  73. furszy referenced this in commit 8cd9c592a9 on May 29, 2020
  74. UdjinM6 referenced this in commit ade894eb0f on May 28, 2021
  75. UdjinM6 referenced this in commit cb0eeee0df on May 28, 2021
  76. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-01-22 03:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me