util: implement `noexcept` move assignment & move ctor for `prevector` #27334

pull martinus wants to merge 4 commits into bitcoin:master from martinus:2023-03-prevector-move changing 3 files +43 −4
  1. martinus commented at 2:55 PM on March 26, 2023: contributor

    prevector's move assignment and move constructor were not noexcept, which makes it inefficient to use inside STL containers like std::vector. That's the case e.g. for CScriptCheck. This PR adds noexcept, and also implements the move assignment & ctor, which makes it quite a bit more efficient to use prevector in an std::vector.

    The PR also adds a benchmark which grows an std::vector by adding prevector objects to it.

    merge-base: | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 6,440.29 | 155,272.42 | 0.2% | 40,713.01 | 20,473.84 | 1.989 | 7,132.01 | 0.2% | 0.44 | PrevectorFillVectorDirectNontrivial | 3,213.19 | 311,217.35 | 0.7% | 35,373.01 | 10,214.07 | 3.463 | 6,945.00 | 0.2% | 0.43 | PrevectorFillVectorDirectTrivial | 34,749.70 | 28,777.23 | 0.1% | 364,396.05 | 110,521.94 | 3.297 | 78,568.37 | 0.1% | 0.43 | PrevectorFillVectorIndirectNontrivial | 32,535.05 | 30,736.09 | 0.4% | 353,823.31 | 103,464.53 | 3.420 | 79,871.80 | 0.2% | 0.40 | PrevectorFillVectorIndirectTrivial

    util: prevector's move ctor and move assignment is noexcept: | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 6,603.87 | 151,426.40 | 0.2% | 23,734.01 | 21,009.63 | 1.130 | 2,445.01 | 0.3% | 0.44 | PrevectorFillVectorDirectNontrivial | 1,980.93 | 504,813.15 | 0.1% | 13,784.00 | 6,304.32 | 2.186 | 2,258.00 | 0.3% | 0.44 | PrevectorFillVectorDirectTrivial | 19,110.54 | 52,327.15 | 0.1% | 139,816.41 | 51,987.72 | 2.689 | 28,512.18 | 0.1% | 0.43 | PrevectorFillVectorIndirectNontrivial | 12,334.37 | 81,074.27 | 0.7% | 125,655.12 | 39,253.58 | 3.201 | 27,854.46 | 0.2% | 0.44 | PrevectorFillVectorIndirectTrivial

    util: implement prevector's move ctor & move assignment | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------- | 5,262.66 | 190,018.01 | 0.2% | 20,157.01 | 16,745.26 | 1.204 | 2,445.01 | 0.3% | 0.44 | PrevectorFillVectorDirectNontrivial | 1,687.07 | 592,744.35 | 0.2% | 12,742.00 | 5,368.02 | 2.374 | 2,258.00 | 0.3% | 0.44 | PrevectorFillVectorDirectTrivial | 17,930.80 | 55,769.95 | 0.1% | 136,237.69 | 47,903.31 | 2.844 | 28,512.02 | 0.2% | 0.42 | PrevectorFillVectorIndirectNontrivial | 11,893.75 | 84,077.78 | 0.2% | 126,182.02 | 37,852.91 | 3.333 | 28,152.01 | 0.1% | 0.44 | PrevectorFillVectorIndirectTrivial

    As can be seen, mostly thanks to just noexcept the benchmark becomes about 2 times faster because std::vector can now use move operations instead of having to fall back to copy everything

    I had a look at how this change affects the other benchmarks, and they are all pretty much the same, the only noticable difference is CCheckQueueSpeedPrevectorJob goes from 364.56ns down to 346.21ns.

  2. bench: Add benchmark for prevector usage in std::vector d380d2877e
  3. util: prevector's move ctor and move assignment is `noexcept`
    Move operations already are `noexcept`, so add the keyword to the methods.
    This makes the `PrevectorFillVectorIndirect...` benchmarks about twice
    as fast on my machine, because otherwise `std::vector` has to use a copy
    when the vector resizes.
    81f67977f5
  4. test: CScriptCheck is used a lot in std::vector, make sure that's efficient
    Adds a few static_asserts so CScriptCheck stays is_nothrow_move_assignable,
    is_nothrow_move_constructible, and is_nothrow_destructible
    fffc86f49f
  5. util: implement prevector's move ctor & move assignment
    Using swap() was rather wasteful because it had to copy the whole direct
    memory data twice. Also, due to the swap() in move assignment the moved-from
    object might hold on to unused memory for longer than necessary.
    bfb9291a86
  6. DrahtBot commented at 2:55 PM on March 26, 2023: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK theStack, jonatack, achow101
    Concept ACK hebasto
    Approach ACK hernanmarino, john-moffett

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  7. DrahtBot added the label Utils/log/libs on Mar 26, 2023
  8. hebasto commented at 3:59 PM on March 26, 2023: member

    Concept ACK. Thanks for picking this up.

  9. hernanmarino commented at 11:58 PM on March 26, 2023: contributor

    Approach ACK

  10. in src/prevector.h:280 in bfb9291a86
     277 | @@ -276,8 +278,13 @@ class prevector {
     278 |          return *this;
     279 |      }
     280 |  
     281 | -    prevector& operator=(prevector<N, T, Size, Diff>&& other) {
     282 | -        swap(other);
    


    jonatack commented at 9:59 PM on April 4, 2023:

    After removing these two swap calls in the last commit bfb9291, should we remove the unused swap function?

    <details><summary>sample diff</summary><p>

    diff --git a/src/prevector.h b/src/prevector.h
    index bcab1ff00cd..2e8510acbe6 100644
    --- a/src/prevector.h
    +++ b/src/prevector.h
    @@ -459,12 +459,6 @@ public:
             return *item_ptr(size() - 1);
         }
     
    -    void swap(prevector<N, T, Size, Diff>& other) noexcept
    -    {
    -        std::swap(_union, other._union);
    -        std::swap(_size, other._size);
    -    }
    -
         ~prevector() {
             if (!is_direct()) {
                 free(_union.indirect_contents.indirect);
    diff --git a/src/test/fuzz/prevector.cpp b/src/test/fuzz/prevector.cpp
    index 9cea32e304f..312e20a0998 100644
    --- a/src/test/fuzz/prevector.cpp
    +++ b/src/test/fuzz/prevector.cpp
    @@ -161,12 +161,6 @@ public:
             pre_vector.shrink_to_fit();
         }
     
    -    void swap() noexcept
    -    {
    -        real_vector.swap(real_vector_alt);
    -        pre_vector.swap(pre_vector_alt);
    -    }
    -
         void move()
         {
             real_vector = std::move(real_vector_alt);
    @@ -211,7 +205,7 @@ FUZZ_TARGET(prevector)
     
         LIMITED_WHILE(prov.remaining_bytes(), 3000)
         {
    -        switch (prov.ConsumeIntegralInRange<int>(0, 13 + 3 * (test.size() > 0))) {
    +        switch (prov.ConsumeIntegralInRange<int>(0, 12 + 3 * (test.size() > 0))) {
             case 0:
                 test.insert(prov.ConsumeIntegralInRange<size_t>(0, test.size()), prov.ConsumeIntegral<int>());
                 break;
    @@ -261,21 +255,18 @@ FUZZ_TARGET(prevector)
                 test.assign(prov.ConsumeIntegralInRange<size_t>(0, 32767), prov.ConsumeIntegral<int>());
                 break;
             case 11:
    -            test.swap();
    -            break;
    -        case 12:
                 test.copy();
                 break;
    -        case 13:
    +        case 12:
                 test.move();
                 break;
    -        case 14:
    +        case 13:
                 test.update(prov.ConsumeIntegralInRange<size_t>(0, test.size() - 1), prov.ConsumeIntegral<int>());
                 break;
    -        case 15:
    +        case 14:
                 test.erase(prov.ConsumeIntegralInRange<size_t>(0, test.size() - 1));
                 break;
    -        case 16:
    +        case 15:
                 test.pop_back();
                 break;
             }
    diff --git a/src/test/prevector_tests.cpp b/src/test/prevector_tests.cpp
    index 1559011fcd1..375a8772d8f 100644
    --- a/src/test/prevector_tests.cpp
    +++ b/src/test/prevector_tests.cpp
    @@ -166,13 +166,6 @@ public:
             test();
         }
     
    -    void swap() noexcept
    -    {
    -        real_vector.swap(real_vector_alt);
    -        pre_vector.swap(pre_vector_alt);
    -        test();
    -    }
    -
         void move() {
             real_vector = std::move(real_vector_alt);
             real_vector_alt.clear();
    @@ -273,9 +266,6 @@ BOOST_AUTO_TEST_CASE(PrevectorTestInt)
                 if (InsecureRandBits(9) == 12) {
                     test.assign(InsecureRandBits(5), int(InsecureRand32()));
                 }
    -            if (InsecureRandBits(3) == 3) {
    -                test.swap();
    -            }
                 if (InsecureRandBits(4) == 8) {
                     test.copy();
                 }
    

    </p></details>


    john-moffett commented at 6:55 PM on May 12, 2023:

    I'd normally agree, but the documentation says that prevector is meant as a drop-in replacement for std::vector<T>, so I think it's best to leave it.

  11. jonatack commented at 10:38 PM on April 4, 2023: contributor

    Tested Approach ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110, ~ACK modulo re-verifying the implementation of the last commit.

    Rebased to current master, built and tested at each commit on macOS Ventura 13.3, ARM64 M1 Max, regular (non-debug) build, and I am seeing the following speedups in the added benchmarks after the second commit 81f67977f543f and the fourth commit bfb9291a8661fe.

    The only exception is a slowdown at 81f67977f54 for the DirectNontrivial benchmark, but all the other improvements look quite significant.

    Ran the benchmarks a few times at each step to re-verify.

    current master @ 369d4c0

    |               ns/op |                op/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |            3,390.34 |          294,955.41 |    0.3% |      0.01 | `PrevectorFillVectorDirectNontrivial`
    |            2,455.29 |          407,283.18 |    0.2% |      0.01 | `PrevectorFillVectorDirectTrivial`
    |           23,923.60 |           41,799.74 |    1.0% |      0.01 | `PrevectorFillVectorIndirectNontrivial`
    |           16,638.89 |           60,100.17 |    0.6% |      0.01 | `PrevectorFillVectorIndirectTrivial`
    

    prevector's move ctor and move assignment is noexcept

    |            3,720.82 |          268,758.17 |    0.1% |      0.01 | `PrevectorFillVectorDirectNontrivial`
    |            2,028.66 |          492,935.82 |    0.2% |      0.01 | `PrevectorFillVectorDirectTrivial`
    |           11,149.71 |           89,688.44 |    2.2% |      0.01 | `PrevectorFillVectorIndirectNontrivial`
    |            5,943.73 |          168,244.45 |    0.1% |      0.01 | `PrevectorFillVectorIndirectTrivial`
    

    implement prevector's move ctor & move assignment

    |            2,761.52 |          362,119.10 |    0.2% |      0.01 | `PrevectorFillVectorDirectNontrivial`
    |            1,655.06 |          604,207.56 |    0.8% |      0.01 | `PrevectorFillVectorDirectTrivial`
    |            9,561.75 |          104,583.42 |    1.4% |      0.01 | `PrevectorFillVectorIndirectNontrivial`
    |            5,192.09 |          192,600.85 |    0.4% |      0.01 | `PrevectorFillVectorIndirectTrivial`
    

    We use std::vector<CScriptCheck> data structures in CheckInputScripts() and ConnectBlock().

    Some relevant links on this topic:

  12. jonatack commented at 10:53 PM on April 4, 2023: contributor

    I had a look at how this change affects the other benchmarks, and they are all pretty much the same, the only noticable difference is CCheckQueueSpeedPrevectorJob goes from 364.56ns down to 346.21ns.

    Here are my results with ./src/bench/bench_bitcoin -filter='CCheckQueueSpeedPrevectorJob*.*'.

    master @ 369d4c03b7084de967576759545ba36a17fc18bb

    |              ns/job |               job/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |              249.56 |        4,007,033.05 |    8.3% |      0.09 | :wavy_dash: `CCheckQueueSpeedPrevectorJob` (Unstable with ~10.9 iters. Increase `minEpochIterations` to e.g. 109)
    |              260.60 |        3,837,227.86 |   10.3% |      0.09 | :wavy_dash: `CCheckQueueSpeedPrevectorJob` (Unstable with ~10.9 iters. Increase `minEpochIterations` to e.g. 109)
    |              235.15 |        4,252,693.75 |    7.0% |      0.08 | :wavy_dash: `CCheckQueueSpeedPrevectorJob` (Unstable with ~10.9 iters. Increase `minEpochIterations` to e.g. 109)
    

    this pull (rebased to master)

    |              ns/job |               job/s |    err% |     total | benchmark
    |--------------------:|--------------------:|--------:|----------:|:----------
    |              237.13 |        4,217,094.35 |    6.6% |      0.09 | :wavy_dash: `CCheckQueueSpeedPrevectorJob` (Unstable with ~10.9 iters. Increase `minEpochIterations` to e.g. 109)
    |              246.34 |        4,059,432.96 |    6.6% |      0.08 | :wavy_dash: `CCheckQueueSpeedPrevectorJob` (Unstable with ~10.9 iters. Increase `minEpochIterations` to e.g. 109)
    |              230.56 |        4,337,272.00 |    5.5% |      0.09 | :wavy_dash: `CCheckQueueSpeedPrevectorJob` (Unstable with ~10.9 iters. Increase `minEpochIterations` to e.g. 109)
    
  13. martinus commented at 5:45 AM on April 5, 2023: contributor

    The only exception is a slowdown at https://github.com/bitcoin/bitcoin/commit/81f67977f543faca2dcc35846f73e2917375ae79 for the DirectNontrivial benchmark, but all the other improvements look quite significant.

    I think the reason why DirectNontrivial got slower when only adding noexcept is that now the supposedly faster move operations are used when resizing the vector, but since this was implemented as a swap, the implementation had to swap the union which I think involves 3 copies behind the scenes: copy a to tmp, copy b to a, copy tmp to b.

    The copy constructor for direct storage which was used before the noexcept is simpler, it only had to copy the union once.

    So only in combination with noexcept and non-swap based implementation the direct storage version is faster

  14. in src/prevector.h:286 in bfb9291a86
     283 | +    prevector& operator=(prevector<N, T, Size, Diff>&& other) noexcept {
     284 | +        if (!is_direct()) {
     285 | +            free(_union.indirect_contents.indirect);
     286 | +        }
     287 | +        _union = std::move(other._union);
     288 | +        _size = other._size;
    


    john-moffett commented at 7:07 PM on May 12, 2023:

    Maybe a good idea to:

    if (!other.is_direct()) {
            other._union.indirect_contents.indirect = nullptr;
    }
    

    in case anyone tries to free it?


    martinus commented at 6:38 AM on May 13, 2023:

    I think it's enough to have other._size = 0; because then other.is_direct() will return false and no free happens

  15. john-moffett approved
  16. john-moffett commented at 7:10 PM on May 12, 2023: contributor

    Approach ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110

  17. theStack commented at 5:39 PM on June 9, 2023: contributor

    Concept ACK

    Very nice find! Here are the benchmark results on one of my slower machines (running OpenBSD 7.3, amd64). Interestingly, the speedup on the PrevectorFillVectorIndirect* benchmarks gained by marking noexcept seems to be even higher than 2x (close to 3x):

    merge-base:

    | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 13,998.14 | 71,438.05 | 1.5% | 0.01 | PrevectorFillVectorDirectNontrivial | 9,583.81 | 104,342.67 | 2.0% | 0.01 | PrevectorFillVectorDirectTrivial | 204,851.50 | 4,881.58 | 3.4% | 0.01 | PrevectorFillVectorIndirectNontrivial | 180,488.00 | 5,540.53 | 4.1% | 0.01 | PrevectorFillVectorIndirectTrivial

    commit "util: prevector's move ctor and move assignment is noexcept"

    | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 16,886.54 | 59,218.75 | 3.3% | 0.01 | PrevectorFillVectorDirectNontrivial | 6,777.13 | 147,555.06 | 3.8% | 0.01 | PrevectorFillVectorDirectTrivial | 70,299.40 | 14,224.87 | 1.6% | 0.01 | PrevectorFillVectorIndirectNontrivial | 62,582.36 | 15,978.94 | 1.2% | 0.01 | PrevectorFillVectorIndirectTrivial

    util: implement prevector's move ctor & move assignment

    | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 13,087.47 | 76,408.96 | 2.9% | 0.01 | PrevectorFillVectorDirectNontrivial | 6,000.36 | 166,656.77 | 0.7% | 0.01 | PrevectorFillVectorDirectTrivial | 71,839.56 | 13,919.91 | 3.6% | 0.01 | PrevectorFillVectorIndirectNontrivial | 61,748.83 | 16,194.64 | 1.9% | 0.01 | PrevectorFillVectorIndirectTrivial

  18. theStack approved
  19. theStack commented at 5:00 PM on June 10, 2023: contributor

    Tested and light code-review ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110

    (My understanding of move semantics is limited, but the change looks correct to me. All the links [@jonatack](/bitcoin-bitcoin/contributor/jonatack/) posted above for further reading were very helpful, especially this one: https://quuxplusone.github.io/blog/2022/08/26/vector-pessimization/) @ryanofsky: you might be interested in this PR?

  20. fanquake requested review from ryanofsky on Jun 12, 2023
  21. jonatack approved
  22. jonatack commented at 3:42 PM on June 15, 2023: contributor

    Tested Approach ACK bfb9291, ~ACK modulo re-verifying the implementation of the last commit.

    We use std::vector<CScriptCheck> data structures in CheckInputScripts() and ConnectBlock().

    ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110 re-reviewed, verified no current merge conflict by rebasing to current master, and have been running on mainnet for a few days.

    Edit: I'd be happy to review a pull that documents the magic numbers in the prevector benchmarks.

  23. achow101 commented at 7:32 PM on June 27, 2023: member

    ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110

  24. achow101 merged this on Jun 27, 2023
  25. achow101 closed this on Jun 27, 2023

  26. jonatack commented at 8:58 PM on June 27, 2023: contributor

    The merge script picked my quoted github snippet rather than the ACK; maybe it should ignore lines that begin with > .

        ACKs for top commit:
          achow101:
            ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110
          jonatack:
            > Tested Approach ACK [bfb9291](https://github.com/bitcoin/bitcoin/commit/bfb9291a8661fe5b26c14ed755cfa89d27c37110), ~ACK modulo re-verifying the implementation of the last commit.
          john-moffett:
            Approach ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110
          theStack:
            Tested and light code-review ACK bfb9291a8661fe5b26c14ed755cfa89d27c37110
    
  27. martinus deleted the branch on Jun 28, 2023
  28. sidhujag referenced this in commit aa869ee933 on Jun 30, 2023
  29. achow101 referenced this in commit 9e350dc74b on Mar 12, 2024
  30. bitcoin locked this on Jun 27, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-13 15:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me