util: implement `noexcept` move assignment & move ctor for `prevector`

martinus commented at 2:55 pm on March 26, 2023: contributor

prevector’s move assignment and move constructor were not noexcept, which makes it inefficient to use inside STL containers like std::vector. That’s the case e.g. for CScriptCheck. This PR adds noexcept, and also implements the move assignment & ctor, which makes it quite a bit more efficient to use prevector in an std::vector.

The PR also adds a benchmark which grows an std::vector by adding prevector objects to it.

merge-base:

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
6,440.29	155,272.42	0.2%	40,713.01	20,473.84	1.989	7,132.01	0.2%	0.44	`PrevectorFillVectorDirectNontrivial`
3,213.19	311,217.35	0.7%	35,373.01	10,214.07	3.463	6,945.00	0.2%	0.43	`PrevectorFillVectorDirectTrivial`
34,749.70	28,777.23	0.1%	364,396.05	110,521.94	3.297	78,568.37	0.1%	0.43	`PrevectorFillVectorIndirectNontrivial`
32,535.05	30,736.09	0.4%	353,823.31	103,464.53	3.420	79,871.80	0.2%	0.40	`PrevectorFillVectorIndirectTrivial`

util: prevector’s move ctor and move assignment is noexcept:

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
6,603.87	151,426.40	0.2%	23,734.01	21,009.63	1.130	2,445.01	0.3%	0.44	`PrevectorFillVectorDirectNontrivial`
1,980.93	504,813.15	0.1%	13,784.00	6,304.32	2.186	2,258.00	0.3%	0.44	`PrevectorFillVectorDirectTrivial`
19,110.54	52,327.15	0.1%	139,816.41	51,987.72	2.689	28,512.18	0.1%	0.43	`PrevectorFillVectorIndirectNontrivial`
12,334.37	81,074.27	0.7%	125,655.12	39,253.58	3.201	27,854.46	0.2%	0.44	`PrevectorFillVectorIndirectTrivial`

util: implement prevector’s move ctor & move assignment

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
5,262.66	190,018.01	0.2%	20,157.01	16,745.26	1.204	2,445.01	0.3%	0.44	`PrevectorFillVectorDirectNontrivial`
1,687.07	592,744.35	0.2%	12,742.00	5,368.02	2.374	2,258.00	0.3%	0.44	`PrevectorFillVectorDirectTrivial`
17,930.80	55,769.95	0.1%	136,237.69	47,903.31	2.844	28,512.02	0.2%	0.42	`PrevectorFillVectorIndirectNontrivial`
11,893.75	84,077.78	0.2%	126,182.02	37,852.91	3.333	28,152.01	0.1%	0.44	`PrevectorFillVectorIndirectTrivial`

As can be seen, mostly thanks to just noexcept the benchmark becomes about 2 times faster because std::vector can now use move operations instead of having to fall back to copy everything

I had a look at how this change affects the other benchmarks, and they are all pretty much the same, the only noticable difference is CCheckQueueSpeedPrevectorJob goes from 364.56ns down to 346.21ns.

bench: Add benchmark for prevector usage in std::vector d380d2877e

util: prevector's move ctor and move assignment is `noexcept`

Move operations already are `noexcept`, so add the keyword to the methods.
This makes the `PrevectorFillVectorIndirect...` benchmarks about twice
as fast on my machine, because otherwise `std::vector` has to use a copy
when the vector resizes.

81f67977f5

test: CScriptCheck is used a lot in std::vector, make sure that's efficient

Adds a few static_asserts so CScriptCheck stays is_nothrow_move_assignable,
is_nothrow_move_constructible, and is_nothrow_destructible

fffc86f49f

util: implement prevector's move ctor & move assignment

Using swap() was rather wasteful because it had to copy the whole direct
memory data twice. Also, due to the swap() in move assignment the moved-from
object might hold on to unused memory for longer than necessary.

bfb9291a86

DrahtBot commented at 2:55 pm on March 26, 2023: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	theStack, jonatack, achow101
Concept ACK	hebasto
Approach ACK	hernanmarino, john-moffett

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

DrahtBot added the label Utils/log/libs on Mar 26, 2023

hebasto commented at 3:59 pm on March 26, 2023: member

Concept ACK. Thanks for picking this up.

hernanmarino commented at 11:58 pm on March 26, 2023: contributor

Approach ACK

in src/prevector.h:280 in bfb9291a86

277@@ -276,8 +278,13 @@ class prevector {
278         return *this;
279     }
280 
281-    prevector& operator=(prevector<N, T, Size, Diff>&& other) {
282-        swap(other);

jonatack commented at 9:59 pm on April 4, 2023:

After removing these two swap calls in the last commit bfb9291, should we remove the unused swap function?

 0diff --git a/src/prevector.h b/src/prevector.h
 1index bcab1ff00cd..2e8510acbe6 100644
 2--- a/src/prevector.h
 3+++ b/src/prevector.h
 4@@ -459,12 +459,6 @@ public:
 5         return *item_ptr(size() - 1);
 6     }
 7 
 8-    void swap(prevector<N, T, Size, Diff>& other) noexcept
 9-    {
10-        std::swap(_union, other._union);
11-        std::swap(_size, other._size);
12-    }
13-
14     ~prevector() {
15         if (!is_direct()) {
16             free(_union.indirect_contents.indirect);
17diff --git a/src/test/fuzz/prevector.cpp b/src/test/fuzz/prevector.cpp
18index 9cea32e304f..312e20a0998 100644
19--- a/src/test/fuzz/prevector.cpp
20+++ b/src/test/fuzz/prevector.cpp
21@@ -161,12 +161,6 @@ public:
22         pre_vector.shrink_to_fit();
23     }
24 
25-    void swap() noexcept
26-    {
27-        real_vector.swap(real_vector_alt);
28-        pre_vector.swap(pre_vector_alt);
29-    }
30-
31     void move()
32     {
33         real_vector = std::move(real_vector_alt);
34@@ -211,7 +205,7 @@ FUZZ_TARGET(prevector)
35 
36     LIMITED_WHILE(prov.remaining_bytes(), 3000)
37     {
38-        switch (prov.ConsumeIntegralInRange<int>(0, 13 + 3 * (test.size() > 0))) {
39+        switch (prov.ConsumeIntegralInRange<int>(0, 12 + 3 * (test.size() > 0))) {
40         case 0:
41             test.insert(prov.ConsumeIntegralInRange<size_t>(0, test.size()), prov.ConsumeIntegral<int>());
42             break;
43@@ -261,21 +255,18 @@ FUZZ_TARGET(prevector)
44             test.assign(prov.ConsumeIntegralInRange<size_t>(0, 32767), prov.ConsumeIntegral<int>());
45             break;
46         case 11:
47-            test.swap();
48-            break;
49-        case 12:
50             test.copy();
51             break;
52-        case 13:
53+        case 12:
54             test.move();
55             break;
56-        case 14:
57+        case 13:
58             test.update(prov.ConsumeIntegralInRange<size_t>(0, test.size() - 1), prov.ConsumeIntegral<int>());
59             break;
60-        case 15:
61+        case 14:
62             test.erase(prov.ConsumeIntegralInRange<size_t>(0, test.size() - 1));
63             break;
64-        case 16:
65+        case 15:
66             test.pop_back();
67             break;
68         }
69diff --git a/src/test/prevector_tests.cpp b/src/test/prevector_tests.cpp
70index 1559011fcd1..375a8772d8f 100644
71--- a/src/test/prevector_tests.cpp
72+++ b/src/test/prevector_tests.cpp
73@@ -166,13 +166,6 @@ public:
74         test();
75     }
76 
77-    void swap() noexcept
78-    {
79-        real_vector.swap(real_vector_alt);
80-        pre_vector.swap(pre_vector_alt);
81-        test();
82-    }
83-
84     void move() {
85         real_vector = std::move(real_vector_alt);
86         real_vector_alt.clear();
87@@ -273,9 +266,6 @@ BOOST_AUTO_TEST_CASE(PrevectorTestInt)
88             if (InsecureRandBits(9) == 12) {
89                 test.assign(InsecureRandBits(5), int(InsecureRand32()));
90             }
91-            if (InsecureRandBits(3) == 3) {
92-                test.swap();
93-            }
94             if (InsecureRandBits(4) == 8) {
95                 test.copy();
96             }

john-moffett commented at 6:55 pm on May 12, 2023:

I’d normally agree, but the documentation says that prevector is meant as a drop-in replacement for std::vector<T>, so I think it’s best to leave it.

jonatack commented at 10:38 pm on April 4, 2023: contributor

Tested Approach ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110, ~ACK modulo re-verifying the implementation of the last commit.

Rebased to current master, built and tested at each commit on macOS Ventura 13.3, ARM64 M1 Max, regular (non-debug) build, and I am seeing the following speedups in the added benchmarks after the second commit 81f67977f543f and the fourth commit bfb9291a8661fe.

The only exception is a slowdown at 81f67977f54 for the DirectNontrivial benchmark, but all the other improvements look quite significant.

Ran the benchmarks a few times at each step to re-verify.

current master @ 369d4c0

0|               ns/op |                op/s |    err% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------:|:----------
2|            3,390.34 |          294,955.41 |    0.3% |      0.01 | `PrevectorFillVectorDirectNontrivial`
3|            2,455.29 |          407,283.18 |    0.2% |      0.01 | `PrevectorFillVectorDirectTrivial`
4|           23,923.60 |           41,799.74 |    1.0% |      0.01 | `PrevectorFillVectorIndirectNontrivial`
5|           16,638.89 |           60,100.17 |    0.6% |      0.01 | `PrevectorFillVectorIndirectTrivial`

prevector’s move ctor and move assignment is noexcept

0|            3,720.82 |          268,758.17 |    0.1% |      0.01 | `PrevectorFillVectorDirectNontrivial`
1|            2,028.66 |          492,935.82 |    0.2% |      0.01 | `PrevectorFillVectorDirectTrivial`
2|           11,149.71 |           89,688.44 |    2.2% |      0.01 | `PrevectorFillVectorIndirectNontrivial`
3|            5,943.73 |          168,244.45 |    0.1% |      0.01 | `PrevectorFillVectorIndirectTrivial`

implement prevector’s move ctor & move assignment

0|            2,761.52 |          362,119.10 |    0.2% |      0.01 | `PrevectorFillVectorDirectNontrivial`
1|            1,655.06 |          604,207.56 |    0.8% |      0.01 | `PrevectorFillVectorDirectTrivial`
2|            9,561.75 |          104,583.42 |    1.4% |      0.01 | `PrevectorFillVectorIndirectNontrivial`
3|            5,192.09 |          192,600.85 |    0.4% |      0.01 | `PrevectorFillVectorIndirectTrivial`

We use std::vector<CScriptCheck> data structures in CheckInputScripts() and ConnectBlock().

Some relevant links on this topic:

jonatack commented at 10:53 pm on April 4, 2023: contributor

I had a look at how this change affects the other benchmarks, and they are all pretty much the same, the only noticable difference is CCheckQueueSpeedPrevectorJob goes from 364.56ns down to 346.21ns.

Here are my results with ./src/bench/bench_bitcoin -filter='CCheckQueueSpeedPrevectorJob*.*'.

master @ 369d4c03b7084de967576759545ba36a17fc18bb

0|              ns/job |               job/s |    err% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------:|:----------
2|              249.56 |        4,007,033.05 |    8.3% |      0.09 | :wavy_dash: `CCheckQueueSpeedPrevectorJob` (Unstable with ~10.9 iters. Increase `minEpochIterations` to e.g. 109)
3|              260.60 |        3,837,227.86 |   10.3% |      0.09 | :wavy_dash: `CCheckQueueSpeedPrevectorJob` (Unstable with ~10.9 iters. Increase `minEpochIterations` to e.g. 109)
4|              235.15 |        4,252,693.75 |    7.0% |      0.08 | :wavy_dash: `CCheckQueueSpeedPrevectorJob` (Unstable with ~10.9 iters. Increase `minEpochIterations` to e.g. 109)

this pull (rebased to master)

0|              ns/job |               job/s |    err% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------:|:----------
2|              237.13 |        4,217,094.35 |    6.6% |      0.09 | :wavy_dash: `CCheckQueueSpeedPrevectorJob` (Unstable with ~10.9 iters. Increase `minEpochIterations` to e.g. 109)
3|              246.34 |        4,059,432.96 |    6.6% |      0.08 | :wavy_dash: `CCheckQueueSpeedPrevectorJob` (Unstable with ~10.9 iters. Increase `minEpochIterations` to e.g. 109)
4|              230.56 |        4,337,272.00 |    5.5% |      0.09 | :wavy_dash: `CCheckQueueSpeedPrevectorJob` (Unstable with ~10.9 iters. Increase `minEpochIterations` to e.g. 109)

martinus commented at 5:45 am on April 5, 2023: contributor

The only exception is a slowdown at https://github.com/bitcoin/bitcoin/commit/81f67977f543faca2dcc35846f73e2917375ae79 for the DirectNontrivial benchmark, but all the other improvements look quite significant.

I think the reason why DirectNontrivial got slower when only adding noexcept is that now the supposedly faster move operations are used when resizing the vector, but since this was implemented as a swap, the implementation had to swap the union which I think involves 3 copies behind the scenes: copy a to tmp, copy b to a, copy tmp to b.

The copy constructor for direct storage which was used before the noexcept is simpler, it only had to copy the union once.

So only in combination with noexcept and non-swap based implementation the direct storage version is faster

in src/prevector.h:286 in bfb9291a86

283+    prevector& operator=(prevector<N, T, Size, Diff>&& other) noexcept {
284+        if (!is_direct()) {
285+            free(_union.indirect_contents.indirect);
286+        }
287+        _union = std::move(other._union);
288+        _size = other._size;

john-moffett commented at 7:07 pm on May 12, 2023:

Maybe a good idea to:

0if (!other.is_direct()) {
1        other._union.indirect_contents.indirect = nullptr;
2}

in case anyone tries to free it?

martinus commented at 6:38 am on May 13, 2023:

I think it’s enough to have other._size = 0; because then other.is_direct() will return false and no free happens

john-moffett approved

john-moffett commented at 7:10 pm on May 12, 2023: contributor

Approach ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110

theStack commented at 5:39 pm on June 9, 2023: contributor

Concept ACK

Very nice find! Here are the benchmark results on one of my slower machines (running OpenBSD 7.3, amd64). Interestingly, the speedup on the PrevectorFillVectorIndirect* benchmarks gained by marking noexcept seems to be even higher than 2x (close to 3x):

merge-base:

ns/op	op/s	err%	total	benchmark
13,998.14	71,438.05	1.5%	0.01	`PrevectorFillVectorDirectNontrivial`
9,583.81	104,342.67	2.0%	0.01	`PrevectorFillVectorDirectTrivial`
204,851.50	4,881.58	3.4%	0.01	`PrevectorFillVectorIndirectNontrivial`
180,488.00	5,540.53	4.1%	0.01	`PrevectorFillVectorIndirectTrivial`

commit “util: prevector’s move ctor and move assignment is noexcept”

ns/op	op/s	err%	total	benchmark
16,886.54	59,218.75	3.3%	0.01	`PrevectorFillVectorDirectNontrivial`
6,777.13	147,555.06	3.8%	0.01	`PrevectorFillVectorDirectTrivial`
70,299.40	14,224.87	1.6%	0.01	`PrevectorFillVectorIndirectNontrivial`
62,582.36	15,978.94	1.2%	0.01	`PrevectorFillVectorIndirectTrivial`

util: implement prevector’s move ctor & move assignment

ns/op	op/s	err%	total	benchmark
13,087.47	76,408.96	2.9%	0.01	`PrevectorFillVectorDirectNontrivial`
6,000.36	166,656.77	0.7%	0.01	`PrevectorFillVectorDirectTrivial`
71,839.56	13,919.91	3.6%	0.01	`PrevectorFillVectorIndirectNontrivial`
61,748.83	16,194.64	1.9%	0.01	`PrevectorFillVectorIndirectTrivial`

theStack approved

theStack commented at 5:00 pm on June 10, 2023: contributor

Tested and light code-review ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110

(My understanding of move semantics is limited, but the change looks correct to me. [All the links @jonatack posted above](/bitcoin-bitcoin/27334/#pullrequestreview-1371788591) for further reading were very helpful, especially this one: https://quuxplusone.github.io/blog/2022/08/26/vector-pessimization/) @ryanofsky: you might be interested in this PR?

fanquake requested review from ryanofsky on Jun 12, 2023

jonatack approved

jonatack commented at 3:42 pm on June 15, 2023: contributor

Tested Approach ACK bfb9291, ~ACK modulo re-verifying the implementation of the last commit.

We use std::vector<CScriptCheck> data structures in CheckInputScripts() and ConnectBlock().

ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110 re-reviewed, verified no current merge conflict by rebasing to current master, and have been running on mainnet for a few days.

Edit: I’d be happy to review a pull that documents the magic numbers in the prevector benchmarks.

achow101 commented at 7:32 pm on June 27, 2023: member

ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110

achow101 merged this on Jun 27, 2023

achow101 closed this on Jun 27, 2023

jonatack commented at 8:58 pm on June 27, 2023: contributor

The merge script picked my quoted github snippet rather than the ACK; maybe it should ignore lines that begin with > .

0    ACKs for top commit:
1      achow101:
2        ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110
3      jonatack:
4        > Tested Approach ACK [bfb9291](https://github.com/bitcoin/bitcoin/commit/bfb9291a8661fe5b26c14ed755cfa89d27c37110), ~ACK modulo re-verifying the implementation of the last commit.
5      john-moffett:
6        Approach ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110
7      theStack:
8        Tested and light code-review ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110

martinus deleted the branch on Jun 28, 2023

sidhujag referenced this in commit aa869ee933 on Jun 30, 2023

achow101 referenced this in commit 9e350dc74b on Mar 12, 2024

bitcoin locked this on Jun 27, 2024

util: implement noexcept move assignment & move ctor for prevector #27334

Reviews

util: implement `noexcept` move assignment & move ctor for `prevector` #27334