speed up Unserialize_impl for prevector

AkioNak commented at 11:51 am on February 1, 2018: contributor

The unserializer for prevector uses resize() to reserve the area, but reserve() is preferable because resize() has the overhead of calling the element constructor many times.

However, reserve() does not change the value of _size (a private member of prevector).
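The difference between resize() and reserve() described above can be seen with std::vector, which behaves the same way in this respect. The following is only a minimal sketch: the Counted and Observation types and the two observe_* helpers are hypothetical names introduced for illustration and are not part of prevector or of this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type that counts default-constructor calls.
struct Counted {
    static int constructed;
    int value;
    Counted() : value(0) { ++constructed; }
};
int Counted::constructed = 0;

// What we observe after one operation on an empty vector.
struct Observation {
    std::size_t size;
    int constructed;
};

// reserve() only allocates capacity: size stays 0, no constructor runs.
Observation observe_reserve(std::size_t n) {
    Counted::constructed = 0;
    std::vector<Counted> v;
    v.reserve(n);
    return {v.size(), Counted::constructed};
}

// resize() changes the size and default-constructs every new element.
Observation observe_resize(std::size_t n) {
    Counted::constructed = 0;
    std::vector<Counted> v;
    v.resize(n);
    return {v.size(), Counted::constructed};
}
```

This is why reserve() alone is not enough for deserialization: the elements read from the stream would land in allocated memory that the container does not yet count as part of its size.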

This PR turns the logic that reads from the stream into a callback function; prevector initializes the new values through that callback and adjusts the value of _size.

The changes are as follows:

prevector.h: Add a public member function named ‘append’. This function takes two parameters: the number of elements to append and a callback function that initializes the newly appended values.

serialize.h: In the following two functions:

Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)
Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)

make a callback function from the original stream-reading logic of each, and call prevector’s append().

test/prevector_tests.cpp: Add a test for append().

The benchmark results are as follows:

[Machine] MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

[result]
DeserializeAndCheckBlockTest => 22% faster
DeserializeBlockTest => 29% faster

[before PR]
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339
DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187

[After PR]
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757
DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527

fanquake added the label Refactoring on Feb 1, 2018

fanquake commented at 11:56 am on February 1, 2018: member

@AkioNak You might also want to look at #10785.

AkioNak commented at 2:18 pm on February 1, 2018: contributor

@fanquake Thank you for pointing to #10785. I will check if this PR is still useful even if #10785 is merged.

laanwj commented at 6:43 pm on February 1, 2018: member

Thanks for adding benchmarks! That’s the way to do optimization PRs.

in src/test/prevector_tests.cpp:187 in 74de77f76e outdated

182@@ -183,6 +183,20 @@ class prevector_tester {
183         pre_vector = pre_vector_alt;
184     }
185 
186+    void append(realtype values) {
187+        for(auto v : values) {

promag commented at 2:25 pm on February 6, 2018:

Nit, space after for.

in src/test/prevector_tests.cpp:192 in 74de77f76e outdated

187+        for(auto v : values) {
188+            real_vector.push_back(v);
189+        }
190+        auto p = pre_vector.size();
191+        auto f = [&]() {
192+            for(auto v : values) {

promag commented at 2:25 pm on February 6, 2018:

Nit, space after for.

promag commented at 2:28 pm on February 6, 2018: member

Please squash.

AkioNak force-pushed on Feb 7, 2018

AkioNak commented at 6:54 am on February 7, 2018: contributor

@promag Thank you for your review. I fixed them and squashed commits.

AkioNak commented at 10:19 am on February 7, 2018: contributor

@fanquake Fortunately, I think that there is no collision or adverse effect between #10785 and #12324. Also, #12324 is still useful even if #10785 has been merged.

Confirmation summary:

environment : MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)
merge - git merge (master d3e4675 + both #10785 and #12324) : succeeded.
build - make clean && make : succeeded.
test - test_runner.py : passed. (excluding wallet_encription.py)
benchmark [result]
DeserializeAndCheckBlockTest => 25% faster
DeserializeBlockTest => 30% faster

[#10785]
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 50, 160, 76.7465, 0.00941822, 0.00986061, 0.00958263
DeserializeBlockTest, 50, 130, 52.3447, 0.00791727, 0.00828939, 0.00805472

[#10785 + #12324]
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 50, 160, 61.3164, 0.00750302, 0.00797864, 0.00765575
DeserializeBlockTest, 50, 130, 40.1209, 0.00602097, 0.0063615, 0.00617751

eklitzke commented at 2:54 am on March 11, 2018: contributor

Concept ACK. I like the idea of making this faster (I’ve seen this taking a lot of time in my profiles), but using a template to pass a lambda seems unnecessarily complex. If it gets inlined it will be fast. But if it ends up not being inlined, the lambda will turn into a full std::function object (since it captures) and that will be slow.

Can you either:

Remove the func template (it’s only called in two places…)
Or check that GCC 4.8 inlines the lambda properly?

AkioNak commented at 1:37 pm on March 11, 2018: contributor

@eklitzke thank you for your comment. I will try it.

kallewoof commented at 7:16 am on March 13, 2018: member

I like the idea of making this faster (I’ve seen this taking a lot of time in my profiles), but using a template to pass a lambda seems unnecessarily complex. If it gets inlined it will be fast. But if it ends up not being inlined, the lambda will turn into a full std::function object (since it captures) and that will be slow.

Adding inline to the template should do the trick, I think.

But I agree that lambdas and callbacks are a bit complex here. I personally think a caveat note on a method in prevector that leaves garbage in the vector is fine, e.g.

/**
 * Grow the size of the prevector by b bytes.
 * NOTE: The added capacity must be overwritten, or it will contain garbage data.
 */

sipa commented at 0:30 am on March 17, 2018: member

Agree with @kallewoof. It seems the goal of using a callback here is to avoid having a public method that brings the vector in a (partially) undefined state. However, the result is that now we have a callback that needs to run in this state.

I would either:

change the method name to resize_uninitialized or so, and initialize explicitly after it returns
pass a begin and end iterator to the callback (avoiding the need for the code in the callback to interact with the prevector while it’s in an undefined state).
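The first option (resize without initializing, then initialize explicitly) can be sketched as follows. This is only an illustration under stated assumptions: ToyBuffer and append_bytes are hypothetical names, the container is a simplified stand-in restricted to trivial element types, and none of it is the actual prevector code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical toy container, NOT the real prevector.
template <typename T>
class ToyBuffer {
    static_assert(std::is_trivial<T>::value, "sketch only handles trivial types");
    std::unique_ptr<T[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;

public:
    // Allocate room for at least n elements; existing elements are kept.
    void reserve(std::size_t n) {
        if (n <= capacity_) return;
        std::unique_ptr<T[]> bigger(new T[n]); // default-init: trivial T is not zeroed
        if (size_ > 0) std::memcpy(bigger.get(), data_.get(), size_ * sizeof(T));
        data_ = std::move(bigger);
        capacity_ = n;
    }

    // Change the size without initializing any added elements.
    // The caller must write to every added element before reading it.
    void resize_uninitialized(std::size_t new_size) {
        reserve(new_size);
        size_ = new_size; // covers both growing and shrinking
    }

    std::size_t size() const { return size_; }
    T* data() { return data_.get(); }
    T& operator[](std::size_t i) { return data_[i]; }
};

// Grow first, then explicitly initialize the new tail from the source bytes.
void append_bytes(ToyBuffer<unsigned char>& buf, const unsigned char* src, std::size_t n) {
    const std::size_t old_size = buf.size();
    buf.resize_uninitialized(old_size + n);
    if (n > 0) std::memcpy(buf.data() + old_size, src, n);
}
```

The design trade-off is the one discussed above: the container is briefly in a state where some elements hold garbage, so the method name and its comment have to make the caller's obligation obvious.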

in src/prevector.h:386 in f04d8093ed outdated

381@@ -382,6 +382,20 @@ class prevector {
382         }
383     }
384 
385+    inline void resize_uninitialized(size_type new_size) {
386+        // resize_uninitialized change the size of the prevector but dose not initialize.

eklitzke commented at 6:50 am on March 19, 2018:

nit: s/dose/does/

in src/prevector.h:387 in f04d8093ed outdated

381@@ -382,6 +382,20 @@ class prevector {
382         }
383     }
384 
385+    inline void resize_uninitialized(size_type new_size) {
386+        // resize_uninitialized change the size of the prevector but dose not initialize.
387+        // If size < new_size, the added elements must be initialized explicitly after it return.

eklitzke commented at 6:51 am on March 19, 2018:

nit: s/return/returns/

AkioNak commented at 7:53 am on March 19, 2018: contributor

@eklitzke @kallewoof @sipa Thank you for suggestions. I introduced resize_uninitialized() and explicitly initialized instead of lambdas and callbacks.

eklitzke commented at 7:37 am on March 20, 2018: contributor

This looks good! Just some typos in the comments.

AkioNak commented at 10:15 am on March 20, 2018: contributor

@eklitzke Thank you for pointing out my typos. I fixed them.

eklitzke commented at 4:26 am on March 21, 2018: contributor

This looks good, although you still need to squash. I’m curious: do you still see the speedup from your initial benchmark? I know we changed other logic in this file since then.

On master:

$ ./src/bench/bench_bitcoin -filter='Deser.*' --evals=10
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 10, 160, 9.86197, 0.00603752, 0.00640901, 0.00616365
DeserializeBlockTest, 10, 130, 6.7487, 0.00486321, 0.0065349, 0.00496556

With your branch:

$ ./src/bench/bench_bitcoin -filter='Deser.*' --evals=10
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 10, 160, 9.96894, 0.00611369, 0.00647763, 0.00622066
DeserializeBlockTest, 10, 130, 6.43537, 0.00484567, 0.00539326, 0.00490343

MarcoFalke commented at 12:52 pm on March 21, 2018: member

Could make sense to squash and rebase on master to ease benchmarking?

AkioNak force-pushed on Mar 21, 2018

AkioNak commented at 5:54 pm on March 21, 2018: contributor

Squashed and rebased. The speedup still exists, but it is now small (2.06% - 3.35%).

My environment: iMac late 2013 (macOS 10.13.3/i5 2.9GHz/mem 16GB/SSD)

[on master]

# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 1000, 160, 1169.53, 0.00713881, 0.00842435, 0.00718528
DeserializeBlockTest, 1000, 130, 756.064, 0.00574035, 0.006433, 0.00575841

[my PR]

# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 1000, 160, 1131.6, 0.0070475, 0.00744498, 0.00706667
DeserializeBlockTest, 1000, 130, 740.836, 0.00567827, 0.00600557, 0.00569649

sipa commented at 6:20 pm on March 21, 2018: member

utACK d85530db45b327eecf408bc8e9636fa60e886208

eklitzke commented at 0:59 am on March 22, 2018: contributor

Thanks for checking. utACK d85530db45b327eecf408bc8e9636fa60e886208

AkioNak referenced this in commit 07696d1493 on Jul 4, 2018

AkioNak force-pushed on Jul 7, 2018

AkioNak commented at 3:42 pm on July 7, 2018: contributor

Rebased and add a bench. This benchmark measures the part specialized for unserialization.

PrevectorDeserializeNontrivial => 3% faster
PrevectorDeserializeTrivial => 24% faster

My environment: iMac late 2013 (macOS 10.13.3/i5 2.9GHz/mem 16GB/SSD)

[on master ] commit 0212187fc624ea4a02fc99bc57ebd413499a9ee1

#1
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 10.3091, 0.00030134, 0.000308502, 0.000301828
PrevectorDeserializeTrivial, 5, 52000, 6.21001, 2.3817e-05, 2.39649e-05, 2.387e-05

#2
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 10.2418, 0.000300878, 0.000301709, 0.000301096
PrevectorDeserializeTrivial, 5, 52000, 6.2122, 2.38438e-05, 2.39243e-05, 2.39157e-05

#3
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 10.2464, 0.000300893, 0.000301853, 0.000301164
PrevectorDeserializeTrivial, 5, 52000, 6.20238, 2.38275e-05, 2.3929e-05, 2.38356e-05

[my PR] commit 23afe7acfa7908905e826f09601c9564ff685be0

#1
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 9.95062, 0.000292415, 0.000292987, 0.000292503
PrevectorDeserializeTrivial, 5, 52000, 4.99719, 1.9179e-05, 1.92648e-05, 1.92245e-05

#2
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 9.94901, 0.000292216, 0.000292962, 0.00029258
PrevectorDeserializeTrivial, 5, 52000, 4.99091, 1.91576e-05, 1.92298e-05, 1.92036e-05

#3
# Benchmark, evals, iterations, total, min, max, median
PrevectorDeserializeNontrivial, 5, 6800, 9.94274, 0.000292245, 0.00029272, 0.000292385
PrevectorDeserializeTrivial, 5, 52000, 4.99286, 1.91848e-05, 1.92303e-05, 1.92037e-05

MarcoFalke commented at 6:58 am on July 8, 2018: member

Rebased and add a bench.

Could add the bench in a separate commit/pull request to make it easier to check for the speedup.

AkioNak force-pushed on Jul 8, 2018

AkioNak commented at 12:22 pm on July 8, 2018: contributor

@MarcoFalke Thank you for your suggestion. Separated this new benchmark from the original commit.

AkioNak force-pushed on Jul 9, 2018

AkioNak commented at 4:31 am on July 9, 2018: contributor

Reordered the commits. The first commit (ee9867c) adds a benchmark function. The second one (f9083e5) is the refactoring (speedup) and tests.

in src/prevector.h:400 in f9083e53e3 outdated

393@@ -394,6 +394,20 @@ class prevector {
394         fill(ptr, first, last);
395     }
396 
397+    inline void resize_uninitialized(size_type new_size) {
398+        // resize_uninitialized change the size of the prevector but does not initialize.
399+        // If size < new_size, the added elements must be initialized explicitly after it returns.
400+        difference_type count = new_size - size();

kallewoof commented at 4:55 am on July 9, 2018:

Is this always guaranteed to give a signed result back? I.e. if new_size is 4 and size() is 5, will count always be -1 or will it sometimes be cast to (uint32_t)-1? (E.g. for other platforms and/or compilers)
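The concern can be illustrated with a small sketch. It assumes a 32-bit unsigned size type; the function names are hypothetical and not taken from the PR. Subtracting unsigned values wraps modulo 2^32, so whether the result later looks negative depends on the signed type it is converted to. Converting the wrapped value to a 32-bit signed type yields -1 on two's-complement platforms (implementation-defined before C++20), while converting to a wider signed type keeps the large positive value. Comparing the two sizes directly avoids the question entirely.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned subtraction wraps: 4 - 5 becomes 2^32 - 1.
std::uint32_t unsigned_difference(std::uint32_t new_size, std::uint32_t cur_size) {
    return new_size - cur_size;
}

// Widening the wrapped value to a 64-bit signed type does NOT make it negative.
std::int64_t widened_difference(std::uint32_t new_size, std::uint32_t cur_size) {
    const std::uint32_t wrapped = new_size - cur_size;
    return static_cast<std::int64_t>(wrapped);
}

// Portable alternative: compare the sizes instead of inspecting a sign.
bool needs_growth(std::uint32_t new_size, std::uint32_t cur_size) {
    return new_size > cur_size;
}
```

With new_size = 4 and cur_size = 5, a check such as `count < 0` would silently fail if the difference type were wider than the size type, which is why the comparison form is the safer choice.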

AkioNak commented at 5:58 am on July 10, 2018:

@kallewoof Thanks. I will decide whether ‘_size’ needs to increase by comparing new_size and size() directly, instead of checking the sign of their precomputed difference.

AkioNak commented at 10:19 am on July 10, 2018:

done.

in src/test/prevector_tests.cpp:202 in f9083e53e3 outdated

190+        auto p = pre_vector.size();
191+        pre_vector.resize_uninitialized(p + values.size());
192+        for (auto v : values) {
193+            pre_vector[p] = v;
194+            ++p;
195+        }

kallewoof commented at 4:59 am on July 9, 2018:

Also test shrinking the prevector.

AkioNak commented at 5:58 am on July 10, 2018:

@kallewoof Indeed. I will add.

AkioNak commented at 10:19 am on July 10, 2018:

done.

kallewoof commented at 5:01 am on July 9, 2018: member

utACK ee9867ce781c3849a8969f0cc870775cbf8956d3

AkioNak commented at 10:22 am on July 10, 2018: contributor

@kallewoof Fixed the points you raised. Please re-review.

The benchmark results are as follows:

[Machine] MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)

[result]
DeserializeAndCheckBlockTest => 2.4% faster
DeserializeBlockTest => 2.9% faster
PrevectorDeserializeNontrivial => 2.2% faster
PrevectorDeserializeTrivial => 20.0% faster

[before] commit ee9867ce781c3849a8969f0cc870775cbf8956d3

# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 10, 1600, 106.29, 0.00663181, 0.00666301, 0.00664266
DeserializeBlockTest, 10, 1300, 80.2574, 0.00615166, 0.00620452, 0.00617665
PrevectorDeserializeNontrivial, 10, 68000, 216.451, 0.00031795, 0.000319471, 0.000318262
PrevectorDeserializeTrivial, 10, 520000, 130.563, 2.5033e-05, 2.52096e-05, 2.51162e-05

[After] commit 26bbf08a6ecb3f5876d2843166073b23179e527e

# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 10, 1600, 103.765, 0.00647724, 0.00649993, 0.00648584
DeserializeBlockTest, 10, 1300, 78.2138, 0.00599649, 0.00612872, 0.00600488
PrevectorDeserializeNontrivial, 10, 68000, 211.805, 0.000311148, 0.000312017, 0.00031137
PrevectorDeserializeTrivial, 10, 520000, 109.057, 2.08899e-05, 2.13558e-05, 2.09365e-05
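The "% faster" figures in this comment appear to be consistent with the ratio of the median times, that is, median_before / median_after - 1. This is an inference from the numbers, not something the thread states; the helper name percent_faster is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Speedup expressed as "percent faster", computed from two median timings.
// Assumption: this is the formula behind the reported figures.
double percent_faster(double median_before, double median_after) {
    return (median_before / median_after - 1.0) * 100.0;
}
```

For example, the PrevectorDeserializeTrivial medians above (2.51162e-05 before, 2.09365e-05 after) give roughly 20%, and the PrevectorDeserializeNontrivial medians (0.000318262 before, 0.00031137 after) give roughly 2.2%.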

in src/test/prevector_tests.cpp:196 in c8cab143bd outdated

192@@ -193,6 +193,9 @@ class prevector_tester {
193             pre_vector[p] = v;
194             ++p;
195         }
196+        size_t s = real_vector.size() - (InsecureRand32() % values.size());

kallewoof commented at 2:23 am on July 11, 2018:

Avoid randomness in tests. It is slow and results are needlessly unpredictable. Maybe just do real_vector.size() / 2 or something?

AkioNak commented at 7:39 am on July 11, 2018:

@kallewoof OK. But I think that if we shrink here, the added elements may be lost, so I will move these lines to the top of this function.

in src/prevector.h:409 in 26bbf08a6e outdated

407-        if (count < 0) {
408+        if (new_size < cur_size) {
409             erase(item_ptr(new_size), end());
410         } else {
411-            _size += count;
412+            _size += new_size - cur_size;

kallewoof commented at 2:24 am on July 11, 2018:

Does it impact performance a lot if you just use size() in both these places?

AkioNak commented at 7:32 am on July 11, 2018:

@kallewoof I have now measured a benchmark that uses size() instead of cur_size, and I am surprised at how good the compiler optimization is. In this patch, the function calls are faster than referring to the local variable.

in src/test/prevector_tests.cpp:204 in c8cab143bd outdated

192@@ -193,6 +193,9 @@ class prevector_tester {
193             pre_vector[p] = v;
194             ++p;
195         }
196+        size_t s = real_vector.size() - (InsecureRand32() % values.size());
197+        real_vector.resize(s);
198+        pre_vector.resize_uninitialized(s);
199     }

AkioNak commented at 7:41 am on July 11, 2018:

Self review: test() needs to be called at the end of this function.

AkioNak force-pushed on Jul 11, 2018

AkioNak commented at 8:56 am on July 11, 2018: contributor

@kallewoof done.

A benchmark result update:

[result] Compared to commit ee9867ce781c3849a8969f0cc870775cbf8956d3
DeserializeAndCheckBlockTest => 2.7% faster
DeserializeBlockTest => 3.1% faster
PrevectorDeserializeNontrivial => 2.4% faster
PrevectorDeserializeTrivial => 23.7% faster

[After] commit ece98807208328ce17c1d30c44a68d272b58b90c

# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 10, 1600, 103.461, 0.00644834, 0.00650016, 0.00646659
DeserializeBlockTest, 10, 1300, 77.9655, 0.00597518, 0.00606764, 0.00599208
PrevectorDeserializeNontrivial, 10, 68000, 211.382, 0.000310146, 0.000312558, 0.000310812
PrevectorDeserializeTrivial, 10, 520000, 105.636, 2.02662e-05, 2.04059e-05, 2.03115e-05

MarcoFalke commented at 12:22 pm on July 11, 2018: member

Could add the benchmarks in a separate pull request to get them in faster?

AkioNak force-pushed on Jul 19, 2018

AkioNak commented at 8:51 am on July 19, 2018: contributor

@kallewoof @MarcoFalke Moved the benchmarks into a new pull request, #13711, and squashed.

in src/prevector.h:398 in 6a511fbd66 outdated

393@@ -394,6 +394,21 @@ class prevector {
394         fill(ptr, first, last);
395     }
396 
397+    inline void resize_uninitialized(size_type new_size) {
398+        // resize_uninitialized change the size of the prevector but does not initialize.

kallewoof commented at 8:45 am on July 27, 2018:

// resize_uninitialized changes the size of the prevector but does not initialize it

AkioNak commented at 1:56 pm on July 27, 2018:

@kallewoof Thanks. fixed.

in src/prevector.h:399 in 6a511fbd66 outdated

393@@ -394,6 +394,21 @@ class prevector {
394         fill(ptr, first, last);
395     }
396 
397+    inline void resize_uninitialized(size_type new_size) {
398+        // resize_uninitialized change the size of the prevector but does not initialize.
399+        // If size < new_size, the added elements must be initialized explicitly after it returns.

kallewoof commented at 8:45 am on July 27, 2018:

after it returns. is unnecessary I think. must be initialized explicitly

AkioNak commented at 1:56 pm on July 27, 2018:

@kallewoof Thanks. fixed.

kallewoof commented at 8:48 am on July 27, 2018: member

utACK 6a511fbd660dc9ba307e7c401271978055c16fb4

I am not super happy with using randomness in tests as it slows down and makes the tests needlessly unpredictable, but otherwise looks good to me.

AkioNak force-pushed on Jul 27, 2018

MarcoFalke referenced this in commit f98d1e0008 on Jul 27, 2018

AkioNak commented at 1:41 pm on July 29, 2018: contributor

@kallewoof I understand your concern about the slowdown caused by randomness. But I am not too worried, because of the deterministic pseudo-randomness introduced by #10321. It may be possible to remove randomness from this test (or more widely), but I think it would be better to do that in a separate PR (or PRs).

kallewoof commented at 3:55 am on July 30, 2018: member

Cool about #10321, didn’t realize that change was made.

utACK b91962ecf0a9b90c989068e3f12e5699bc90ef6f

[Edited to fix commit reference.]

in src/test/prevector_tests.cpp:280 in b91962ecf0 outdated

275@@ -260,6 +276,14 @@ BOOST_AUTO_TEST_CASE(PrevectorTestInt)
276             if (InsecureRandBits(5) == 18) {
277                 test.move();
278             }
279+            if (InsecureRandBits(5) == 19) {
280+                int num = 1 + (InsecureRandBits(4));

practicalswift commented at 1:38 pm on October 11, 2018:

Switch to unsigned? InsecureRandBits returns unsigned :-)

AkioNak commented at 7:16 am on October 15, 2018:

@practicalswift Thank you. Oh, I see. unsigned int is better. I have addressed it (and rebased onto be992701).

sipa commented at 8:06 pm on October 12, 2018: member

utACK b91962ecf0a9b90c989068e3f12e5699bc90ef6f

AkioNak force-pushed on Oct 15, 2018

in src/test/prevector_tests.cpp:193 in 5573129cfe outdated

182@@ -183,6 +183,22 @@ class prevector_tester {
183         pre_vector = pre_vector_alt;
184     }
185 
186+    void resize_uninitialized(realtype values) {
187+        size_t s = real_vector.size() / 2;
188+        real_vector.resize(s);
189+        pre_vector.resize_uninitialized(s);

shahzadlone commented at 7:32 am on February 1, 2019:

Perhaps, before entering the loop, you could reserve memory for values.size() more elements to optimize the vector.

AkioNak commented at 1:02 pm on February 3, 2019:

@shahzadlone Thank you for your review. Addressed what you pointed out.

speed up Unserialize_impl for prevector

The unserializer for prevector uses resize() to reserve the area,
but reserve() is preferable because resize() has the overhead
of calling the element constructor many times.

However, reserve() does not change the value of "_size"
(a private member of prevector).

This PR introduces resize_uninitialized() to prevector, which is similar to
resize() but does not call constructors, and the added elements are
explicitly initialized in Unserialize_impl().

The changes are as follows:
1. prevector.h
Add a public member function named 'resize_uninitialized'.
This function works like resize() but does not call constructors,
so the added elements need to be explicitly initialized after it returns.

2. serialize.h
In the following two functions:
 Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)
 Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)
call resize_uninitialized() instead of resize().

3. test/prevector_tests.cpp
Add a test for resize_uninitialized().

86b47fa741

AkioNak force-pushed on Feb 3, 2019

laanwj commented at 3:10 pm on June 18, 2019: member

utACK 86b47fa741408b061ab0bda784b8678bfd7dfa88

laanwj merged this on Jun 18, 2019

laanwj closed this on Jun 18, 2019

laanwj referenced this in commit 8777a80706 on Jun 18, 2019

sidhujag referenced this in commit 1fd27cc371 on Jun 19, 2019

MarcoFalke commented at 7:26 pm on June 19, 2019: member

Has anyone checked that this actually improves the benchmark as claimed in the OP? It does not for me.

kallewoof commented at 5:11 am on June 20, 2019: member

I ran the benchmarks on two linux machines (one pretty powerful (GCO) and one not so (Lefty)), and a MacBook Pro. Raw numbers at bottom. I see improvements in master compared to e2182b02b in PrevectorDeserialize*rivial, but not in the other benchmarks:

(Sorry, I wasn’t sure how to make graphs for benchmarks. Raw data below.)

git checkout e2182b02b

MacBook Pro

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.38381, 0.00792733, 0.00802109, 0.00799184
DeserializeBlockTest, 5, 130, 4.23839, 0.00634135, 0.00694236, 0.00644537
PrevectorDeserializeNontrivial, 5, 6800, 13.3873, 0.000390055, 0.000397128, 0.000393687
PrevectorDeserializeTrivial, 5, 52000, 7.96211, 2.98404e-05, 3.17966e-05, 3.04728e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 7.26932, 0.00838346, 0.00972242, 0.00942287
DeserializeBlockTest, 5, 130, 4.31697, 0.00627366, 0.00738568, 0.0063276
PrevectorDeserializeNontrivial, 5, 6800, 13.2609, 0.000386639, 0.000394633, 0.000387913
PrevectorDeserializeTrivial, 5, 52000, 6.47209, 2.45058e-05, 2.51857e-05, 2.49892e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.32675, 0.00777852, 0.00802965, 0.00789037
DeserializeBlockTest, 5, 130, 4.20692, 0.00634361, 0.00674539, 0.00637073
PrevectorDeserializeNontrivial, 5, 6800, 13.3495, 0.000390075, 0.00039594, 0.000391467
PrevectorDeserializeTrivial, 5, 52000, 6.39515, 2.44061e-05, 2.48503e-05, 2.45071e-05
$

Ubuntu Linux (gco)

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.66615, 0.0045421, 0.00466559, 0.00455856
DeserializeBlockTest, 5, 130, 2.46865, 0.00379088, 0.00381237, 0.00379707
PrevectorDeserializeNontrivial, 5, 6800, 1.98757, 5.78348e-05, 6.06301e-05, 5.79352e-05
PrevectorDeserializeTrivial, 5, 52000, 2.46781, 9.44648e-06, 9.56601e-06, 9.48185e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.66454, 0.00455685, 0.00465467, 0.00456468
DeserializeBlockTest, 5, 130, 2.48794, 0.00379826, 0.00392414, 0.00380655
PrevectorDeserializeNontrivial, 5, 6800, 1.99268, 5.78273e-05, 6.03986e-05, 5.83746e-05
PrevectorDeserializeTrivial, 5, 52000, 2.49502, 9.40815e-06, 9.87399e-06, 9.63512e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.68505, 0.00456129, 0.00469898, 0.00458263
DeserializeBlockTest, 5, 130, 2.47728, 0.00378338, 0.00390047, 0.00378849
PrevectorDeserializeNontrivial, 5, 6800, 1.98537, 5.76469e-05, 6.05225e-05, 5.77919e-05
PrevectorDeserializeTrivial, 5, 52000, 2.46844, 9.41589e-06, 9.66261e-06, 9.43065e-06
$


Ubuntu Linux (lefty)

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.026, 0.00500104, 0.00506093, 0.00503687
DeserializeBlockTest, 5, 130, 2.75244, 0.00420178, 0.00429237, 0.00423144
PrevectorDeserializeNontrivial, 5, 6800, 2.1691, 6.35412e-05, 6.42218e-05, 6.3725e-05
PrevectorDeserializeTrivial, 5, 52000, 2.74612, 1.05157e-05, 1.05976e-05, 1.05714e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.06636, 0.0050483, 0.00515482, 0.00507011
DeserializeBlockTest, 5, 130, 2.78506, 0.00421478, 0.00449306, 0.00424098
PrevectorDeserializeNontrivial, 5, 6800, 2.18859, 6.39355e-05, 6.50351e-05, 6.42038e-05
PrevectorDeserializeTrivial, 5, 52000, 2.77944, 1.05288e-05, 1.10395e-05, 1.06421e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.02973, 0.00501044, 0.00507903, 0.00502341
DeserializeBlockTest, 5, 130, 2.77102, 0.00423808, 0.00430167, 0.00425302
PrevectorDeserializeNontrivial, 5, 6800, 2.21241, 6.38159e-05, 6.64756e-05, 6.51687e-05
PrevectorDeserializeTrivial, 5, 52000, 2.76094, 1.05595e-05, 1.07031e-05, 1.06016e-05
$

git checkout master (44d81723236114f9370f386f3b3310477a6dde43)

MacBook Pro

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.17814, 0.0076679, 0.00785118, 0.00768925
DeserializeBlockTest, 5, 130, 4.06245, 0.00622414, 0.00628035, 0.00624402
PrevectorDeserializeNontrivial, 5, 6800, 13.073, 0.000382798, 0.000387096, 0.00038391
PrevectorDeserializeTrivial, 5, 52000, 5.15493, 1.96871e-05, 1.99912e-05, 1.9814e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.1623, 0.00761647, 0.00792896, 0.00765533
DeserializeBlockTest, 5, 130, 4.445, 0.00621718, 0.00774078, 0.00645273
PrevectorDeserializeNontrivial, 5, 6800, 14.5375, 0.000414608, 0.00044193, 0.000425773
PrevectorDeserializeTrivial, 5, 52000, 5.79235, 1.98752e-05, 2.50009e-05, 2.20261e-05
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 6.22612, 0.00771692, 0.00791803, 0.00773467
DeserializeBlockTest, 5, 130, 4.09782, 0.00627166, 0.00638606, 0.00628699
PrevectorDeserializeNontrivial, 5, 6800, 13.0198, 0.000381592, 0.000384941, 0.000382887
PrevectorDeserializeTrivial, 5, 52000, 5.17657, 1.98519e-05, 2.00462e-05, 1.98927e-05
$

Linux Ubuntu (gco)

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.71635, 0.00461716, 0.00470119, 0.00462609
DeserializeBlockTest, 5, 130, 2.51346, 0.00384798, 0.0039135, 0.00385253
PrevectorDeserializeNontrivial, 5, 6800, 1.64983, 4.74319e-05, 5.09874e-05, 4.79242e-05
PrevectorDeserializeTrivial, 5, 52000, 1.66145, 6.2768e-06, 6.69577e-06, 6.32143e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.6407, 0.00452042, 0.00463718, 0.00453418
DeserializeBlockTest, 5, 130, 2.46214, 0.00373507, 0.00390495, 0.00376771
PrevectorDeserializeNontrivial, 5, 6800, 1.64648, 4.78473e-05, 5.03235e-05, 4.79951e-05
PrevectorDeserializeTrivial, 5, 52000, 1.67024, 6.32431e-06, 6.73776e-06, 6.3539e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.77728, 0.00471439, 0.00474315, 0.00471631
DeserializeBlockTest, 5, 130, 2.49195, 0.00381892, 0.00388194, 0.00382307
PrevectorDeserializeNontrivial, 5, 6800, 1.68265, 4.85277e-05, 5.10689e-05, 4.96072e-05
PrevectorDeserializeTrivial, 5, 52000, 1.68869, 6.41578e-06, 6.60289e-06, 6.50186e-06
$

Ubuntu Linux (lefty)

$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.05288, 0.00505502, 0.00507194, 0.00506818
DeserializeBlockTest, 5, 130, 2.80491, 0.0042465, 0.00449445, 0.00427763
PrevectorDeserializeNontrivial, 5, 6800, 1.86039, 5.34824e-05, 5.64816e-05, 5.4309e-05
PrevectorDeserializeTrivial, 5, 52000, 1.88765, 7.17239e-06, 7.30816e-06, 7.28207e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 3.97765, 0.0049599, 0.00499381, 0.00496968
DeserializeBlockTest, 5, 130, 2.70895, 0.00415038, 0.00417582, 0.00417205
PrevectorDeserializeNontrivial, 5, 6800, 1.80413, 5.28583e-05, 5.33843e-05, 5.30097e-05
PrevectorDeserializeTrivial, 5, 52000, 1.82675, 6.98311e-06, 7.08135e-06, 7.019e-06
$ ./bench_bitcoin -filter=".*Deserialize.*"
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 5, 160, 4.03341, 0.00500616, 0.00510331, 0.0050347
DeserializeBlockTest, 5, 130, 2.74251, 0.00420575, 0.0042337, 0.00421798
PrevectorDeserializeNontrivial, 5, 6800, 1.80552, 5.28383e-05, 5.37108e-05, 5.29898e-05
PrevectorDeserializeTrivial, 5, 52000, 1.8427, 6.96723e-06, 7.47701e-06, 7.0036e-06
$

AkioNak commented at 10:40 am on June 20, 2019: contributor

@MarcoFalke @kallewoof Thank you for the comment and reporting the benchmark result.

#12549, an improvement to prevector that was merged on 1 Mar 2018, is very effective, so the improvement from this PR is relatively hidden in the Deserialize*BlockTest benchmarks. However, focusing on prevector deserialization itself, there is some speedup with this PR.

So I should have changed the PR description to refer to PrevectorDeserialize*rivial instead of Deserialize*BlockTest, to clarify what is improved.

jamesob commented at 2:55 pm on June 24, 2019: member

Can verify I’m seeing speedups in this branch (as rebased onto master) relative to the commit that came before its merge into master, but only for gcc:

e2182b02b5af13f0de38cf8b08bb81723387c570 vs. unserialize (relative)

bench name	x	e2182b02b5af13f0de38cf8b08bb81723387c570	unserialize
micro.gcc.PrevectorDeserializeNontrivial.total_secs	3	1.143	1.000
micro.gcc.PrevectorDeserializeTrivial.total_secs	3	1.333	1.000
micro.gcc.PrevectorDestructorNontrivial.total_secs	3	1.018	1.000
micro.clang.PrevectorDeserializeNontrivial.total_secs	3	1.000	1.020
micro.clang.PrevectorDeserializeTrivial.total_secs	3	1.106	1.000
micro.clang.PrevectorDestructorNontrivial.total_secs	3	1.000	1.019
micro.clang.PrevectorDestructorTrivial.total_secs	3	1.000	1.003

These changes do not notably affect IBD time though.

codablock referenced this in commit 7c6ebbe058 on Oct 1, 2019

codablock referenced this in commit 833ebbef5c on Oct 1, 2019

codablock referenced this in commit 2be67c7605 on Oct 2, 2019

barrystyle referenced this in commit d6a98dd1a3 on Jan 22, 2020

PastaPastaPasta referenced this in commit c59dcbba59 on Jul 29, 2020

jasonbcox referenced this in commit 7289af88e9 on Nov 12, 2020

furszy referenced this in commit 07b88da888 on Jan 25, 2021

gades referenced this in commit 2968f93c4f on Jul 1, 2021

ftrader referenced this in commit 70752f8e64 on Nov 21, 2021

MarcoFalke locked this on Dec 16, 2021
