speed up Unserialize_impl for prevector #12324

pull AkioNak wants to merge 1 commits into bitcoin:master from AkioNak:unserialize changing 3 files +46 −3
  1. AkioNak commented at 11:51 am on February 1, 2018: contributor

    The unserializer for prevector uses resize() for reserve the area, but it’s prefer to use reserve() because resize() have overhead to call its constructor many times.

    However, reserve() does not change the value of _size (a private member of prevector).

    This PR make the logic of read from stream to callback function, and prevector handles initilizing new values with that call-back and ajust the value of _size.

    The changes are as follows:

    1. prevector.h Add a public member function named ‘append’. This function has 2 params, number of elemenst to append and call-back function that initilizing new appended values.

    2. serialize.h In the following two function:

    • Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)
    • Unserialize_impl(Stream& is, prevector<N, T>& v, const V&) Make a callback function from each original logic of reading values from stream, and call prevector’s append().
    1. test/prevector_tests.cpp Add a test for append().

    A benchmark result is following:

    [Machine] MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

    [result] DeserializeAndCheckBlockTest => 22% faster DeserializeBlockTest => 29% faster

    [before PR] # Benchmark, evals, iterations, total, min, max, median DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339 DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187

    [After PR] # Benchmark, evals, iterations, total, min, max, median DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757 DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527

  2. fanquake added the label Refactoring on Feb 1, 2018
  3. fanquake commented at 11:56 am on February 1, 2018: member
    @AkioNak You might also want to look at #10785.
  4. AkioNak commented at 2:18 pm on February 1, 2018: contributor
    @fanquake Thank you for pointing to #10785. I will check if this PR is still useful even if #10785 is merged.
  5. laanwj commented at 6:43 pm on February 1, 2018: member
    Thanks for adding benchmarks! That’s the way to do optimization PRs.
  6. in src/test/prevector_tests.cpp:187 in 74de77f76e outdated
    182@@ -183,6 +183,20 @@ class prevector_tester {
    183         pre_vector = pre_vector_alt;
    184     }
    185 
    186+    void append(realtype values) {
    187+        for(auto v : values) {
    


    promag commented at 2:25 pm on February 6, 2018:
    Nit, space after for.
  7. in src/test/prevector_tests.cpp:192 in 74de77f76e outdated
    187+        for(auto v : values) {
    188+            real_vector.push_back(v);
    189+        }
    190+        auto p = pre_vector.size();
    191+        auto f = [&]() {
    192+            for(auto v : values) {
    


    promag commented at 2:25 pm on February 6, 2018:
    Nit, space after for.
  8. promag commented at 2:28 pm on February 6, 2018: member
    Please squash.
  9. AkioNak force-pushed on Feb 7, 2018
  10. AkioNak commented at 6:54 am on February 7, 2018: contributor
    @promag Thank you for your review. I fixed them and squashed commits.
  11. AkioNak commented at 10:19 am on February 7, 2018: contributor

    @fanquake Fortunately, I think that there was no collision or adverse effect between #10785 and #12324. Also, #12324 still usefull even if #10785 has been merged.

    Confirmation summary:

    1. enviroment : MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)
    2. merge - git merge (master d3e4675 + both #10785 and #12324) : succeed.
    3. build - make clean && make : succeed.
    4. test - test_runner.py : passed. (exclude wallet_encription.py)
    5. benchmark [result] DeserializeAndCheckBlockTest => 25% faster DeserializeBlockTest => 30% faster
    0[#10785]
    1 # Benchmark, evals, iterations, total, min, max, median
    2 DeserializeAndCheckBlockTest, 50, 160, 76.7465, 0.00941822, 0.00986061, 0.00958263
    3 DeserializeBlockTest, 50, 130, 52.3447, 0.00791727, 0.00828939, 0.00805472
    4
    5[#10785 + [#12324](/bitcoin-bitcoin/12324/)]
    6 # Benchmark, evals, iterations, total, min, max, median
    7 DeserializeAndCheckBlockTest, 50, 160, 61.3164, 0.00750302, 0.00797864, 0.00765575
    8 DeserializeBlockTest, 50, 130, 40.1209, 0.00602097, 0.0063615, 0.00617751
    
  12. eklitzke commented at 2:54 am on March 11, 2018: contributor

    Concept ACK. I like the idea of making this faster (I’ve seen this taking a lot of time in my profiles), but using a template to pass a lambda seems unnecessarily complex. If it gets inlined it will be fast. But if it ends up not being inlined, the lambda will turn into a full std::function object (since it captures) and that will be slow.

    Can you either:

    • Remove the func template (it’s only called in two places…)
    • Or check that GCC 4.8 inlines the lambda properly?
  13. AkioNak commented at 1:37 pm on March 11, 2018: contributor
    @eklitzke thank you for your comment. I will try it.
  14. kallewoof commented at 7:16 am on March 13, 2018: member

    I like the idea of making this faster (I’ve seen this taking a lot of time in my profiles), but using a template to pass a lambda seems unnecessarily complex. If it gets inlined it will be fast. But if it ends up not being inlined, the lambda will turn into a full std::function object (since it captures) and that will be slow.

    Adding inline to the template should do the trick, I think.

    But I agree that lambdas and callbacks are a bit complex here. I personally think a caveat note on a method in prevector that leaves garbage in the vector is fine, e.g.

    0/**
    1 * Grow the size of the prevector by b bytes.
    2 * NOTE: The added capacity must be overwritten, or it will contain garbage data.
    3 */
    
  15. sipa commented at 0:30 am on March 17, 2018: member

    Agree with @kallewoof. It seems the goal of using a callback here is to avoid having a public method that brings the vector in a (partially) undefined state. However, the result is that now we have a callback that needs to run in this state.

    I would either:

    • change the method name to resize_uninitialized or so, and initialize explicitly after it returns
    • pass a begin and end iterator to the callback (avoiding the need for the code in the callback to interact with the prevector while it’s in an undefined state).
  16. in src/prevector.h:386 in f04d8093ed outdated
    381@@ -382,6 +382,20 @@ class prevector {
    382         }
    383     }
    384 
    385+    inline void resize_uninitialized(size_type new_size) {
    386+        // resize_uninitialized change the size of the prevector but dose not initialize.
    


    eklitzke commented at 6:50 am on March 19, 2018:
    not: s/dose/does/
  17. in src/prevector.h:387 in f04d8093ed outdated
    381@@ -382,6 +382,20 @@ class prevector {
    382         }
    383     }
    384 
    385+    inline void resize_uninitialized(size_type new_size) {
    386+        // resize_uninitialized change the size of the prevector but dose not initialize.
    387+        // If size < new_size, the added elements must be initialized explicitly after it return.
    


    eklitzke commented at 6:51 am on March 19, 2018:
    nit: s/return/returns/
  18. AkioNak commented at 7:53 am on March 19, 2018: contributor
    @eklitzke @kallewoof @sipa Thank you for suggestions. I introduced resize_uninitialized() and explicitly initialized instead of lambdas and callbacks.
  19. eklitzke commented at 7:37 am on March 20, 2018: contributor
    This looks good! Just some typos in the comments.
  20. AkioNak commented at 10:15 am on March 20, 2018: contributor
    @eklitzke Thank you for your pointing out for my typos. Fixed them.
  21. eklitzke commented at 4:26 am on March 21, 2018: contributor

    This looks good, although you still need to squash. I’m curious: do you still see the speedup from your intial benchmark? I know we changed other logic in this file since then.

    On master:

    0$ ./src/bench/bench_bitcoin -filter='Deser.*' --evals=10
    1# Benchmark, evals, iterations, total, min, max, median
    2DeserializeAndCheckBlockTest, 10, 160, 9.86197, 0.00603752, 0.00640901, 0.00616365
    3DeserializeBlockTest, 10, 130, 6.7487, 0.00486321, 0.0065349, 0.00496556
    

    With your branch:

    0$ ./src/bench/bench_bitcoin -filter='Deser.*' --evals=10
    1# Benchmark, evals, iterations, total, min, max, median
    2DeserializeAndCheckBlockTest, 10, 160, 9.96894, 0.00611369, 0.00647763, 0.00622066
    3DeserializeBlockTest, 10, 130, 6.43537, 0.00484567, 0.00539326, 0.00490343
    
  22. MarcoFalke commented at 12:52 pm on March 21, 2018: member
    Could make sense to squash and rebase on master to ease benchmarking?
  23. AkioNak force-pushed on Mar 21, 2018
  24. AkioNak commented at 5:54 pm on March 21, 2018: contributor

    Squashed and rebased. Now, speed up is still exist but a little (2.06% - 3.35%).

    my enviroment : iMac late 2013 (macOS 10.13.3/i5 2.9GHz/mem 16GB/SSD) [on master]

    0# Benchmark, evals, iterations, total, min, max, median
    1DeserializeAndCheckBlockTest, 1000, 160, 1169.53, 0.00713881, 0.00842435, 0.00718528
    2DeserializeBlockTest, 1000, 130, 756.064, 0.00574035, 0.006433, 0.00575841
    

    [my PR]

    0# Benchmark, evals, iterations, total, min, max, median
    1DeserializeAndCheckBlockTest, 1000, 160, 1131.6, 0.0070475, 0.00744498, 0.00706667
    2DeserializeBlockTest, 1000, 130, 740.836, 0.00567827, 0.00600557, 0.00569649
    
  25. sipa commented at 6:20 pm on March 21, 2018: member
    utACK d85530db45b327eecf408bc8e9636fa60e886208
  26. eklitzke commented at 0:59 am on March 22, 2018: contributor
    Thanks for checking. utACK d85530db45b327eecf408bc8e9636fa60e886208
  27. AkioNak referenced this in commit 07696d1493 on Jul 4, 2018
  28. AkioNak force-pushed on Jul 7, 2018
  29. AkioNak commented at 3:42 pm on July 7, 2018: contributor

    Rebased and add a bench. This benchmark measures the part specialized for unserialization.

    PrevectorDeserializeNontrivial => 3% faster PrevectorDeserializeTrivial => 24% faster

    my enviroment : iMac late 2013 (macOS 10.13.3/i5 2.9GHz/mem 16GB/SSD)

    [on master ] commit 0212187fc624ea4a02fc99bc57ebd413499a9ee1

     0[#1](/bitcoin-bitcoin/1/)
     1# Benchmark, evals, iterations, total, min, max, median
     2PrevectorDeserializeNontrivial, 5, 6800, 10.3091, 0.00030134, 0.000308502, 0.000301828
     3PrevectorDeserializeTrivial, 5, 52000, 6.21001, 2.3817e-05, 2.39649e-05, 2.387e-05
     4
     5[#2](/bitcoin-bitcoin/2/)
     6# Benchmark, evals, iterations, total, min, max, median
     7PrevectorDeserializeNontrivial, 5, 6800, 10.2418, 0.000300878, 0.000301709, 0.000301096
     8PrevectorDeserializeTrivial, 5, 52000, 6.2122, 2.38438e-05, 2.39243e-05, 2.39157e-05
     9
    10[#3](/bitcoin-bitcoin/3/)
    11# Benchmark, evals, iterations, total, min, max, median
    12PrevectorDeserializeNontrivial, 5, 6800, 10.2464, 0.000300893, 0.000301853, 0.000301164
    13PrevectorDeserializeTrivial, 5, 52000, 6.20238, 2.38275e-05, 2.3929e-05, 2.38356e-05
    

    [my PR] commit 23afe7acfa7908905e826f09601c9564ff685be0

     0[#1](/bitcoin-bitcoin/1/)
     1# Benchmark, evals, iterations, total, min, max, median
     2PrevectorDeserializeNontrivial, 5, 6800, 9.95062, 0.000292415, 0.000292987, 0.000292503
     3PrevectorDeserializeTrivial, 5, 52000, 4.99719, 1.9179e-05, 1.92648e-05, 1.92245e-05
     4
     5[#2](/bitcoin-bitcoin/2/)
     6# Benchmark, evals, iterations, total, min, max, median
     7PrevectorDeserializeNontrivial, 5, 6800, 9.94901, 0.000292216, 0.000292962, 0.00029258
     8PrevectorDeserializeTrivial, 5, 52000, 4.99091, 1.91576e-05, 1.92298e-05, 1.92036e-05
     9
    10[#3](/bitcoin-bitcoin/3/)
    11# Benchmark, evals, iterations, total, min, max, median
    12PrevectorDeserializeNontrivial, 5, 6800, 9.94274, 0.000292245, 0.00029272, 0.000292385
    13PrevectorDeserializeTrivial, 5, 52000, 4.99286, 1.91848e-05, 1.92303e-05, 1.92037e-05
    
  30. MarcoFalke commented at 6:58 am on July 8, 2018: member

    Rebased and add a bench.

    Could add the bench in a separate commit/pull request to make it easier to check for the speedup.

  31. AkioNak force-pushed on Jul 8, 2018
  32. AkioNak commented at 12:22 pm on July 8, 2018: contributor
    @MarcoFalke Thank you for your suggestion. Separated this new benchmark from the original commit.
  33. AkioNak force-pushed on Jul 9, 2018
  34. AkioNak commented at 4:31 am on July 9, 2018: contributor
    Re-orderd commits. First commit (ee9867c) is adding a benchmark fucntion. Second one (f9083e5) is refactoring(speed up) and tests.
  35. in src/prevector.h:400 in f9083e53e3 outdated
    393@@ -394,6 +394,20 @@ class prevector {
    394         fill(ptr, first, last);
    395     }
    396 
    397+    inline void resize_uninitialized(size_type new_size) {
    398+        // resize_uninitialized change the size of the prevector but does not initialize.
    399+        // If size < new_size, the added elements must be initialized explicitly after it returns.
    400+        difference_type count = new_size - size();
    


    kallewoof commented at 4:55 am on July 9, 2018:
    Is this always guaranteed to give a signed result back? I.e. if new_size is 4 and size() is 5, will count always be -1 or will it sometimes be cast to (uint32_t)-1? (E.g. for other platforms and/or compilers)

    AkioNak commented at 5:58 am on July 10, 2018:
    @kallewoof Thanks. I will judge whether the value of ‘_size’ should need to increase by comparing new_size and size () instead of the sign of pre-computed difference of them.

    AkioNak commented at 10:19 am on July 10, 2018:
    done.
  36. in src/test/prevector_tests.cpp:202 in f9083e53e3 outdated
    190+        auto p = pre_vector.size();
    191+        pre_vector.resize_uninitialized(p + values.size());
    192+        for (auto v : values) {
    193+            pre_vector[p] = v;
    194+            ++p;
    195+        }
    


    kallewoof commented at 4:59 am on July 9, 2018:
    Also test shrinking the prevector.

    AkioNak commented at 5:58 am on July 10, 2018:
    @kallewoof Indeed. I will add.

    AkioNak commented at 10:19 am on July 10, 2018:
    done.
  37. kallewoof commented at 5:01 am on July 9, 2018: member
    utACK ee9867ce781c3849a8969f0cc870775cbf8956d3
  38. AkioNak commented at 10:22 am on July 10, 2018: contributor

    @kallewoof fixed your pointed out. please re-review.

    A benchmark result is following: [Machine] MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

    [result] DeserializeAndCheckBlockTest => 2.4% faster DeserializeBlockTest => 2.9% faster PrevectorDeserializeNontrivial => 2.2% faster PrevectorDeserializeTrivial => 20.0% faster

    [before] commit ee9867ce781c3849a8969f0cc870775cbf8956d3

    0# Benchmark, evals, iterations, total, min, max, median
    1DeserializeAndCheckBlockTest, 10, 1600, 106.29, 0.00663181, 0.00666301, 0.00664266
    2DeserializeBlockTest, 10, 1300, 80.2574, 0.00615166, 0.00620452, 0.00617665
    3PrevectorDeserializeNontrivial, 10, 68000, 216.451, 0.00031795, 0.000319471, 0.000318262
    4PrevectorDeserializeTrivial, 10, 520000, 130.563, 2.5033e-05, 2.52096e-05, 2.51162e-05
    

    [After] commit 26bbf08a6ecb3f5876d2843166073b23179e527e

    0# Benchmark, evals, iterations, total, min, max, median
    1DeserializeAndCheckBlockTest, 10, 1600, 103.765, 0.00647724, 0.00649993, 0.00648584
    2DeserializeBlockTest, 10, 1300, 78.2138, 0.00599649, 0.00612872, 0.00600488
    3PrevectorDeserializeNontrivial, 10, 68000, 211.805, 0.000311148, 0.000312017, 0.00031137
    4PrevectorDeserializeTrivial, 10, 520000, 109.057, 2.08899e-05, 2.13558e-05, 2.09365e-05
    
  39. in src/test/prevector_tests.cpp:196 in c8cab143bd outdated
    192@@ -193,6 +193,9 @@ class prevector_tester {
    193             pre_vector[p] = v;
    194             ++p;
    195         }
    196+        size_t s = real_vector.size() - (InsecureRand32() % values.size());
    


    kallewoof commented at 2:23 am on July 11, 2018:
    Avoid randomness in tests. It is slow and results are needlessly unpredictable. Maybe just do real_vector.size() / 2 or something?

    AkioNak commented at 7:39 am on July 11, 2018:
    @kallewoof Ok. But I think if shrink here, added elements may be gone. So I will move these line to the top of this function.
  40. in src/prevector.h:409 in 26bbf08a6e outdated
    407-        if (count < 0) {
    408+        if (new_size < cur_size) {
    409             erase(item_ptr(new_size), end());
    410         } else {
    411-            _size += count;
    412+            _size += new_size - cur_size;
    


    kallewoof commented at 2:24 am on July 11, 2018:
    Does it impact performance a lot if you just use size() in both these places?

    AkioNak commented at 7:32 am on July 11, 2018:
    @kallewoof Now I mesured a bench that using size() instead of cur_size; I am surprised that compiler optimization is excellent. In this patch, rather than referring to local variables, function calls are faster.
  41. in src/test/prevector_tests.cpp:204 in c8cab143bd outdated
    192@@ -193,6 +193,9 @@ class prevector_tester {
    193             pre_vector[p] = v;
    194             ++p;
    195         }
    196+        size_t s = real_vector.size() - (InsecureRand32() % values.size());
    197+        real_vector.resize(s);
    198+        pre_vector.resize_uninitialized(s);
    199     }
    


    AkioNak commented at 7:41 am on July 11, 2018:
    self review. need calling test() at the enf of this function.
  42. AkioNak force-pushed on Jul 11, 2018
  43. AkioNak commented at 8:56 am on July 11, 2018: contributor

    @kallewoof done.

    A benchmark result update:

    [result] Compared to commit ee9867ce781c3849a8969f0cc870775cbf8956d3 DeserializeAndCheckBlockTest => 2.7% faster DeserializeBlockTest => 3.1% faster PrevectorDeserializeNontrivial => 2.4% faster PrevectorDeserializeTrivial => 23.7% faster

    [After] commit ece98807208328ce17c1d30c44a68d272b58b90c

    0# Benchmark, evals, iterations, total, min, max, median
    1DeserializeAndCheckBlockTest, 10, 1600, 103.461, 0.00644834, 0.00650016, 0.00646659
    2DeserializeBlockTest, 10, 1300, 77.9655, 0.00597518, 0.00606764, 0.00599208
    3PrevectorDeserializeNontrivial, 10, 68000, 211.382, 0.000310146, 0.000312558, 0.000310812
    4PrevectorDeserializeTrivial, 10, 520000, 105.636, 2.02662e-05, 2.04059e-05, 2.03115e-05
    
  44. MarcoFalke commented at 12:22 pm on July 11, 2018: member
    Could add the benchmarks in a separate pull request to get them in faster?
  45. AkioNak force-pushed on Jul 19, 2018
  46. AkioNak commented at 8:51 am on July 19, 2018: contributor
    @kallewoof @MarcoFalke move the benchmarks in a new pull request #13711, and squashed.
  47. in src/prevector.h:398 in 6a511fbd66 outdated
    393@@ -394,6 +394,21 @@ class prevector {
    394         fill(ptr, first, last);
    395     }
    396 
    397+    inline void resize_uninitialized(size_type new_size) {
    398+        // resize_uninitialized change the size of the prevector but does not initialize.
    


    kallewoof commented at 8:45 am on July 27, 2018:
    // resize_uninitialized changes the size of the prevector but does not initialize it

    AkioNak commented at 1:56 pm on July 27, 2018:
    @kallewoof Thanks. fixed.
  48. in src/prevector.h:399 in 6a511fbd66 outdated
    393@@ -394,6 +394,21 @@ class prevector {
    394         fill(ptr, first, last);
    395     }
    396 
    397+    inline void resize_uninitialized(size_type new_size) {
    398+        // resize_uninitialized change the size of the prevector but does not initialize.
    399+        // If size < new_size, the added elements must be initialized explicitly after it returns.
    


    kallewoof commented at 8:45 am on July 27, 2018:
    after it returns. is unnecessary I think. must be initialized explicitly

    AkioNak commented at 1:56 pm on July 27, 2018:
    @kallewoof Thanks. fixed.
  49. kallewoof commented at 8:48 am on July 27, 2018: member

    utACK 6a511fbd660dc9ba307e7c401271978055c16fb4

    I am not super happy with using randomness in tests as it slows down and makes the tests needlessly unpredictable, but otherwise looks good to me.

  50. AkioNak force-pushed on Jul 27, 2018
  51. MarcoFalke referenced this in commit f98d1e0008 on Jul 27, 2018
  52. AkioNak commented at 1:41 pm on July 29, 2018: contributor
    @kallewoof I understand your concern of slow-down caused from randomness. But I don’t worry so much because of deterministic psued random introduce by #10321. It may be possible to remove randomness from this test (or more widely), but I think it would be better to use a different PR(s).
  53. kallewoof commented at 3:55 am on July 30, 2018: member

    Cool about #10321, didn’t realize that change was made.

    utACK b91962ecf0a9b90c989068e3f12e5699bc90ef6f

    [Edited to fix commit reference.]

  54. in src/test/prevector_tests.cpp:280 in b91962ecf0 outdated
    275@@ -260,6 +276,14 @@ BOOST_AUTO_TEST_CASE(PrevectorTestInt)
    276             if (InsecureRandBits(5) == 18) {
    277                 test.move();
    278             }
    279+            if (InsecureRandBits(5) == 19) {
    280+                int num = 1 + (InsecureRandBits(4));
    


    practicalswift commented at 1:38 pm on October 11, 2018:
    Switch to unsigned? InsecureRandBits returns unsigned :-)

    AkioNak commented at 7:16 am on October 15, 2018:
    @practicalswift Thank you. Oh, I see. unsigned int is better. I have addressed it ( and rebased to be992701).
  55. sipa commented at 8:06 pm on October 12, 2018: member
    utACK b91962ecf0a9b90c989068e3f12e5699bc90ef6f
  56. AkioNak force-pushed on Oct 15, 2018
  57. in src/test/prevector_tests.cpp:193 in 5573129cfe outdated
    182@@ -183,6 +183,22 @@ class prevector_tester {
    183         pre_vector = pre_vector_alt;
    184     }
    185 
    186+    void resize_uninitialized(realtype values) {
    187+        size_t s = real_vector.size() / 2;
    188+        real_vector.resize(s);
    189+        pre_vector.resize_uninitialized(s);
    


    shahzadlone commented at 7:32 am on February 1, 2019:
    Perhaps here you could make sure before going in loop to reserve values.size() many more memory to optimize the vector.

    AkioNak commented at 1:02 pm on February 3, 2019:
    @shahzadlone Thank you for your review. Addressed that you pointed out.
  58. speed up Unserialize_impl for prevector
    The unserializer for prevector uses resize() for reserve the area,
    but it's prefer to use reserve() because resize() have overhead
    to call its constructor many times.
    
    However, reserve() does not change the value of "_size"
    (a private member of prevector).
    
    This PR introduce resize_uninitialized() to prevector that similar to
    resize() but does not call constructor, and added elements are
    explicitly initialized in Unserialize_imple().
    
    The changes are as follows:
    1. prevector.h
    Add a public member function named 'resize_uninitialized'.
    This function processes like as resize() but does not call constructors.
    So added elemensts needs explicitly initialized after this returns.
    
    2. serialize.h
    In the following two function:
     Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)
     Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)
    Calls resize_uninitialized() instead of resize()
    
    3. test/prevector_tests.cpp
    Add a test for resize_uninitialized().
    86b47fa741
  59. AkioNak force-pushed on Feb 3, 2019
  60. laanwj commented at 3:10 pm on June 18, 2019: member
    utACK 86b47fa741408b061ab0bda784b8678bfd7dfa88
  61. laanwj merged this on Jun 18, 2019
  62. laanwj closed this on Jun 18, 2019

  63. laanwj referenced this in commit 8777a80706 on Jun 18, 2019
  64. sidhujag referenced this in commit 1fd27cc371 on Jun 19, 2019
  65. MarcoFalke commented at 7:26 pm on June 19, 2019: member
    Has anyone checked that this actually improves the benchmark as claimed in the OP. It does not for me.
  66. kallewoof commented at 5:11 am on June 20, 2019: member

    I ran the benchmarks on two linux machines (one pretty powerful (GCO) and one not so (Lefty)), and a MacBook Pro. Raw numbers at bottom. I see improvements in master compared to e2182b02b in PrevectorDeserialize*rivial, but not in the other benchmarks:

    (Sorry, I wasn’t sure how to make graphs for benchmarks. Raw data below.)

      0git checkout e2182b02b
      1
      2MacBook Pro
      3
      4$ ./bench_bitcoin -filter=".*Deserialize.*"
      5# Benchmark, evals, iterations, total, min, max, median
      6DeserializeAndCheckBlockTest, 5, 160, 6.38381, 0.00792733, 0.00802109, 0.00799184
      7DeserializeBlockTest, 5, 130, 4.23839, 0.00634135, 0.00694236, 0.00644537
      8PrevectorDeserializeNontrivial, 5, 6800, 13.3873, 0.000390055, 0.000397128, 0.000393687
      9PrevectorDeserializeTrivial, 5, 52000, 7.96211, 2.98404e-05, 3.17966e-05, 3.04728e-05
     10$ ./bench_bitcoin -filter=".*Deserialize.*"
     11# Benchmark, evals, iterations, total, min, max, median
     12DeserializeAndCheckBlockTest, 5, 160, 7.26932, 0.00838346, 0.00972242, 0.00942287
     13DeserializeBlockTest, 5, 130, 4.31697, 0.00627366, 0.00738568, 0.0063276
     14PrevectorDeserializeNontrivial, 5, 6800, 13.2609, 0.000386639, 0.000394633, 0.000387913
     15PrevectorDeserializeTrivial, 5, 52000, 6.47209, 2.45058e-05, 2.51857e-05, 2.49892e-05
     16$ ./bench_bitcoin -filter=".*Deserialize.*"
     17# Benchmark, evals, iterations, total, min, max, median
     18DeserializeAndCheckBlockTest, 5, 160, 6.32675, 0.00777852, 0.00802965, 0.00789037
     19DeserializeBlockTest, 5, 130, 4.20692, 0.00634361, 0.00674539, 0.00637073
     20PrevectorDeserializeNontrivial, 5, 6800, 13.3495, 0.000390075, 0.00039594, 0.000391467
     21PrevectorDeserializeTrivial, 5, 52000, 6.39515, 2.44061e-05, 2.48503e-05, 2.45071e-05
     22$
     23
     24Ubuntu Linux (gco)
     25
     26$ ./bench_bitcoin -filter=".*Deserialize.*"
     27# Benchmark, evals, iterations, total, min, max, median
     28DeserializeAndCheckBlockTest, 5, 160, 3.66615, 0.0045421, 0.00466559, 0.00455856
     29DeserializeBlockTest, 5, 130, 2.46865, 0.00379088, 0.00381237, 0.00379707
     30PrevectorDeserializeNontrivial, 5, 6800, 1.98757, 5.78348e-05, 6.06301e-05, 5.79352e-05
     31PrevectorDeserializeTrivial, 5, 52000, 2.46781, 9.44648e-06, 9.56601e-06, 9.48185e-06
     32$ ./bench_bitcoin -filter=".*Deserialize.*"
     33# Benchmark, evals, iterations, total, min, max, median
     34DeserializeAndCheckBlockTest, 5, 160, 3.66454, 0.00455685, 0.00465467, 0.00456468
     35DeserializeBlockTest, 5, 130, 2.48794, 0.00379826, 0.00392414, 0.00380655
     36PrevectorDeserializeNontrivial, 5, 6800, 1.99268, 5.78273e-05, 6.03986e-05, 5.83746e-05
     37PrevectorDeserializeTrivial, 5, 52000, 2.49502, 9.40815e-06, 9.87399e-06, 9.63512e-06
     38$ ./bench_bitcoin -filter=".*Deserialize.*"
     39# Benchmark, evals, iterations, total, min, max, median
     40DeserializeAndCheckBlockTest, 5, 160, 3.68505, 0.00456129, 0.00469898, 0.00458263
     41DeserializeBlockTest, 5, 130, 2.47728, 0.00378338, 0.00390047, 0.00378849
     42PrevectorDeserializeNontrivial, 5, 6800, 1.98537, 5.76469e-05, 6.05225e-05, 5.77919e-05
     43PrevectorDeserializeTrivial, 5, 52000, 2.46844, 9.41589e-06, 9.66261e-06, 9.43065e-06
     44$
     45
     46
     47Ubuntu Linux (lefty)
     48
     49$ ./bench_bitcoin -filter=".*Deserialize.*"
     50# Benchmark, evals, iterations, total, min, max, median
     51DeserializeAndCheckBlockTest, 5, 160, 4.026, 0.00500104, 0.00506093, 0.00503687
     52DeserializeBlockTest, 5, 130, 2.75244, 0.00420178, 0.00429237, 0.00423144
     53PrevectorDeserializeNontrivial, 5, 6800, 2.1691, 6.35412e-05, 6.42218e-05, 6.3725e-05
     54PrevectorDeserializeTrivial, 5, 52000, 2.74612, 1.05157e-05, 1.05976e-05, 1.05714e-05
     55$ ./bench_bitcoin -filter=".*Deserialize.*"
     56# Benchmark, evals, iterations, total, min, max, median
     57DeserializeAndCheckBlockTest, 5, 160, 4.06636, 0.0050483, 0.00515482, 0.00507011
     58DeserializeBlockTest, 5, 130, 2.78506, 0.00421478, 0.00449306, 0.00424098
     59PrevectorDeserializeNontrivial, 5, 6800, 2.18859, 6.39355e-05, 6.50351e-05, 6.42038e-05
     60PrevectorDeserializeTrivial, 5, 52000, 2.77944, 1.05288e-05, 1.10395e-05, 1.06421e-05
     61$ ./bench_bitcoin -filter=".*Deserialize.*"
     62# Benchmark, evals, iterations, total, min, max, median
     63DeserializeAndCheckBlockTest, 5, 160, 4.02973, 0.00501044, 0.00507903, 0.00502341
     64DeserializeBlockTest, 5, 130, 2.77102, 0.00423808, 0.00430167, 0.00425302
     65PrevectorDeserializeNontrivial, 5, 6800, 2.21241, 6.38159e-05, 6.64756e-05, 6.51687e-05
     66PrevectorDeserializeTrivial, 5, 52000, 2.76094, 1.05595e-05, 1.07031e-05, 1.06016e-05
     67$
     68
     69git checkout master (44d81723236114f9370f386f3b3310477a6dde43)
     70
     71MacBook Pro
     72
     73$ ./bench_bitcoin -filter=".*Deserialize.*"
     74# Benchmark, evals, iterations, total, min, max, median
     75DeserializeAndCheckBlockTest, 5, 160, 6.17814, 0.0076679, 0.00785118, 0.00768925
     76DeserializeBlockTest, 5, 130, 4.06245, 0.00622414, 0.00628035, 0.00624402
     77PrevectorDeserializeNontrivial, 5, 6800, 13.073, 0.000382798, 0.000387096, 0.00038391
     78PrevectorDeserializeTrivial, 5, 52000, 5.15493, 1.96871e-05, 1.99912e-05, 1.9814e-05
     79$ ./bench_bitcoin -filter=".*Deserialize.*"
     80# Benchmark, evals, iterations, total, min, max, median
     81DeserializeAndCheckBlockTest, 5, 160, 6.1623, 0.00761647, 0.00792896, 0.00765533
     82DeserializeBlockTest, 5, 130, 4.445, 0.00621718, 0.00774078, 0.00645273
     83PrevectorDeserializeNontrivial, 5, 6800, 14.5375, 0.000414608, 0.00044193, 0.000425773
     84PrevectorDeserializeTrivial, 5, 52000, 5.79235, 1.98752e-05, 2.50009e-05, 2.20261e-05
     85$ ./bench_bitcoin -filter=".*Deserialize.*"
     86# Benchmark, evals, iterations, total, min, max, median
     87DeserializeAndCheckBlockTest, 5, 160, 6.22612, 0.00771692, 0.00791803, 0.00773467
     88DeserializeBlockTest, 5, 130, 4.09782, 0.00627166, 0.00638606, 0.00628699
     89PrevectorDeserializeNontrivial, 5, 6800, 13.0198, 0.000381592, 0.000384941, 0.000382887
     90PrevectorDeserializeTrivial, 5, 52000, 5.17657, 1.98519e-05, 2.00462e-05, 1.98927e-05
     91$
     92
     93Linux Ubuntu (gco)
     94
     95$ ./bench_bitcoin -filter=".*Deserialize.*"
     96# Benchmark, evals, iterations, total, min, max, median
     97DeserializeAndCheckBlockTest, 5, 160, 3.71635, 0.00461716, 0.00470119, 0.00462609
     98DeserializeBlockTest, 5, 130, 2.51346, 0.00384798, 0.0039135, 0.00385253
     99PrevectorDeserializeNontrivial, 5, 6800, 1.64983, 4.74319e-05, 5.09874e-05, 4.79242e-05
    100PrevectorDeserializeTrivial, 5, 52000, 1.66145, 6.2768e-06, 6.69577e-06, 6.32143e-06
    101$ ./bench_bitcoin -filter=".*Deserialize.*"
    102# Benchmark, evals, iterations, total, min, max, median
    103DeserializeAndCheckBlockTest, 5, 160, 3.6407, 0.00452042, 0.00463718, 0.00453418
    104DeserializeBlockTest, 5, 130, 2.46214, 0.00373507, 0.00390495, 0.00376771
    105PrevectorDeserializeNontrivial, 5, 6800, 1.64648, 4.78473e-05, 5.03235e-05, 4.79951e-05
    106PrevectorDeserializeTrivial, 5, 52000, 1.67024, 6.32431e-06, 6.73776e-06, 6.3539e-06
    107$ ./bench_bitcoin -filter=".*Deserialize.*"
    108# Benchmark, evals, iterations, total, min, max, median
    109DeserializeAndCheckBlockTest, 5, 160, 3.77728, 0.00471439, 0.00474315, 0.00471631
    110DeserializeBlockTest, 5, 130, 2.49195, 0.00381892, 0.00388194, 0.00382307
    111PrevectorDeserializeNontrivial, 5, 6800, 1.68265, 4.85277e-05, 5.10689e-05, 4.96072e-05
    112PrevectorDeserializeTrivial, 5, 52000, 1.68869, 6.41578e-06, 6.60289e-06, 6.50186e-06
    113$
    114
    115Ubuntu Linux (lefty)
    116
    117$ ./bench_bitcoin -filter=".*Deserialize.*"
    118# Benchmark, evals, iterations, total, min, max, median
    119DeserializeAndCheckBlockTest, 5, 160, 4.05288, 0.00505502, 0.00507194, 0.00506818
    120DeserializeBlockTest, 5, 130, 2.80491, 0.0042465, 0.00449445, 0.00427763
    121PrevectorDeserializeNontrivial, 5, 6800, 1.86039, 5.34824e-05, 5.64816e-05, 5.4309e-05
    122PrevectorDeserializeTrivial, 5, 52000, 1.88765, 7.17239e-06, 7.30816e-06, 7.28207e-06
    123$ ./bench_bitcoin -filter=".*Deserialize.*"
    124# Benchmark, evals, iterations, total, min, max, median
    125DeserializeAndCheckBlockTest, 5, 160, 3.97765, 0.0049599, 0.00499381, 0.00496968
    126DeserializeBlockTest, 5, 130, 2.70895, 0.00415038, 0.00417582, 0.00417205
    127PrevectorDeserializeNontrivial, 5, 6800, 1.80413, 5.28583e-05, 5.33843e-05, 5.30097e-05
    128PrevectorDeserializeTrivial, 5, 52000, 1.82675, 6.98311e-06, 7.08135e-06, 7.019e-06
    129$ ./bench_bitcoin -filter=".*Deserialize.*"
    130# Benchmark, evals, iterations, total, min, max, median
    131DeserializeAndCheckBlockTest, 5, 160, 4.03341, 0.00500616, 0.00510331, 0.0050347
    132DeserializeBlockTest, 5, 130, 2.74251, 0.00420575, 0.0042337, 0.00421798
    133PrevectorDeserializeNontrivial, 5, 6800, 1.80552, 5.28383e-05, 5.37108e-05, 5.29898e-05
    134PrevectorDeserializeTrivial, 5, 52000, 1.8427, 6.96723e-06, 7.47701e-06, 7.0036e-06
    135$
    
  67. AkioNak commented at 10:40 am on June 20, 2019: contributor

    @MarcoFalke @kallewoof Thank you for the comment and reporting the benchmark result.

    The #12549, which is an improvement to prevector and merged on 1 Mar 2018, is very efficient, so the improvement of this PR is relatively hidden in the benchmark of Deserialize*BlockTest. However, focusing on prevector deserialization itself, there are some speedup with this PR.

    So I had have to change the PR description to PrevectorDeserializerivial instead of DeserializeBlockTest to clarify where is improved.

  68. jamesob commented at 2:55 pm on June 24, 2019: member

    Can verify I’m seeing speedups in this branch (as rebased onto master) relative to the commit that came before its merge into master, but only for gcc:

    e2182b02b5af13f0de38cf8b08bb81723387c570 vs. unserialize (relative)

    bench name x e2182b02b5af13f0de38cf8b08bb81723387c570 unserialize
    micro.gcc.PrevectorDeserializeNontrivial.total_secs 3 1.143 1.000
    micro.gcc.PrevectorDeserializeTrivial.total_secs 3 1.333 1.000
    micro.gcc.PrevectorDestructorNontrivial.total_secs 3 1.018 1.000
    micro.clang.PrevectorDeserializeNontrivial.total_secs 3 1.000 1.020
    micro.clang.PrevectorDeserializeTrivial.total_secs 3 1.106 1.000
    micro.clang.PrevectorDestructorNontrivial.total_secs 3 1.000 1.019
    micro.clang.PrevectorDestructorTrivial.total_secs 3 1.000 1.003

    These changes do not notably affect IBD time though.

  69. codablock referenced this in commit 7c6ebbe058 on Oct 1, 2019
  70. codablock referenced this in commit 833ebbef5c on Oct 1, 2019
  71. codablock referenced this in commit 2be67c7605 on Oct 2, 2019
  72. barrystyle referenced this in commit d6a98dd1a3 on Jan 22, 2020
  73. PastaPastaPasta referenced this in commit c59dcbba59 on Jul 29, 2020
  74. jasonbcox referenced this in commit 7289af88e9 on Nov 12, 2020
  75. furszy referenced this in commit 07b88da888 on Jan 25, 2021
  76. gades referenced this in commit 2968f93c4f on Jul 1, 2021
  77. ftrader referenced this in commit 70752f8e64 on Nov 21, 2021
  78. MarcoFalke locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me