Make prevector::resize() and other prevector operations much faster #12549

pull eklitzke wants to merge 3 commits into bitcoin:master from eklitzke:prevector changing 5 files +151 −88
  1. eklitzke commented at 4:16 am on February 27, 2018: contributor

    This branch optimizes various prevector operations, especially resizing vectors. While profiling the loadblk thread I noticed that a lot of time was being spent in prevector::resize() which led to this work. I have some data here indicating that it takes up 37% of the time in ReadBlockFromDisk(): https://monad.io/readblockfromdisk.svg

    This branch improves things significantly. For trivial types, the new results for the prevector benchmark are:

    • PrevectorClearTrivial which tests prevector::clear() becomes 24.6x faster
    • PrevectorDestructorTrivial which tests prevector::~prevector() becomes 20.5x faster
    • PrevectorResizeTrivial which tests prevector::resize() becomes 20.3x faster

    Note that in practice it looks like the prevector is only used to contain unsigned char types, which is a trivial type. The benchmarks are testing a bit of an extreme case, but the changes here are motivated by the profiling data for ReadBlockFromDisk() I linked to above.

    The pull request here consists of a series of three commits:

    • The first adds new benchmarks but does not change the prevector code.
    • The second is from @AkioNak , and merges some prevector optimizations he submitted in #11988
    • The third optimizes prevector::resize() to use memset() when the prevector contains trivially constructible types
  2. meshcollider commented at 4:25 am on February 27, 2018: contributor
    Nice speedup! Concept ACK
  3. kallewoof commented at 4:37 am on February 27, 2018: member
    This exists in #11988 no?
  4. eklitzke commented at 4:40 am on February 27, 2018: contributor

    It looks like this branch is failing because std::is_trivially_constructible wasn’t added until GCC 5 (according to https://www.gnu.org/software/gcc/gcc-5/changes.html ), even though it is part of C++11. I can find a workaround for GCC 4.8.

    This isn’t quite the same as #11988 because that still uses placement new.

  5. kallewoof commented at 4:48 am on February 27, 2018: member
    If it’s not hugely different consider reviewing #11988 and suggesting changes to make it better instead.
  6. eklitzke commented at 4:52 am on February 27, 2018: contributor
    That’s a good idea, I’ll see if I can cherry-pick @AkioNak’s commit into my branch while I’m testing the GCC 4.8 compatibility issues.
  7. eklitzke force-pushed on Feb 27, 2018
  8. eklitzke commented at 6:46 am on February 27, 2018: contributor

    This new branch cherry-picks @AkioNak ’s commit from #11988 and then adds my memset optimization. I constructed this PR such that it has three commits:

    • First commit just adds new benchmarks
    • Second commit is from #11988 (unmodified, preserves authorship)
    • Third commit implements my memset() optimization

    I have a script that benchmarks the performance for each of the three commits. Here’s the summarized data. The first commit is considered to have a baseline speedup of 1x, and then each of the subsequent commits shows the speedup from the base:

     034d238     PrevectorClear                 1.00x
     134d238     PrevectorDestructor            1.00x
     234d238     PrevectorResizeNontrivial      1.00x
     334d238     PrevectorResizeTrivial         1.00x
     4d1d28e     PrevectorClear                 2.36x
     5d1d28e     PrevectorDestructor            2.16x
     6d1d28e     PrevectorResizeNontrivial      3.24x
     7d1d28e     PrevectorResizeTrivial         2.76x
     8f8b078     PrevectorClear                 8.31x
     9f8b078     PrevectorDestructor            7.62x
    10f8b078     PrevectorResizeNontrivial      6.56x
    11f8b078     PrevectorResizeTrivial         18.99x
    

    Here’s the raw data I generated the above from:

     0git 34d2387b1d5c753d6ba130eb6b4730dcc15072c5
     1# Benchmark, evals, iterations, total, min, max, median
     2PrevectorClear, 5, 5600, 3.47561, 0.00012317, 0.000125752, 0.000123494
     3PrevectorDestructor, 5, 5700, 3.2223, 0.000112914, 0.000113208, 0.000113068
     4PrevectorResizeNontrivial, 5, 4100, 3.24072, 0.000158005, 0.000158212, 0.000158075
     5PrevectorResizeTrivial, 5, 4900, 3.24407, 0.000131094, 0.000133749, 0.000132495
     6
     7git d1d28e72154a9f2e7fb56001f2b790074d1cfa30
     8# Benchmark, evals, iterations, total, min, max, median
     9PrevectorClear, 5, 5600, 1.4682, 5.23405e-05, 5.2611e-05, 5.24265e-05
    10PrevectorDestructor, 5, 5700, 1.49587, 5.24034e-05, 5.26179e-05, 5.2448e-05
    11PrevectorResizeNontrivial, 5, 4100, 1.00747, 4.87854e-05, 5.04511e-05, 4.88389e-05
    12PrevectorResizeTrivial, 5, 4900, 1.17581, 4.79055e-05, 4.81338e-05, 4.79753e-05
    13
    14git f8b078c8cda6d4ac6e3b112ce605b08f6ac002b9
    15# Benchmark, evals, iterations, total, min, max, median
    16PrevectorClear, 5, 5600, 0.419837, 1.48182e-05, 1.55947e-05, 1.48556e-05
    17PrevectorDestructor, 5, 5700, 0.422994, 1.48187e-05, 1.48808e-05, 1.48327e-05
    18PrevectorResizeNontrivial, 5, 4100, 0.494272, 2.40686e-05, 2.41813e-05, 2.4084e-05
    19PrevectorResizeTrivial, 5, 4900, 0.171037, 6.89859e-06, 7.06054e-06, 6.97555e-06
    
  9. in src/prevector.h:343 in f8b078c8cd outdated
    343         }
    344-        while (size() < new_size) {
    345-            _size++;
    346-            new(static_cast<void*>(item_ptr(size() - 1))) T();
    347-        }
    348+        size_t increase = new_size - cur_size;
    


    Empact commented at 7:35 am on February 27, 2018:

    Maybe use ptrdiff_t here and in fill? Would allow you to assert against the negative case rather than have a very large value.

    Not that the negative case is possible with the current code, but it could be protective going forward.

  10. in src/bench/prevector.cpp:53 in f8b078c8cd outdated
    48+        }
    49+    }
    50+}
    51+
    52+static void PrevectorResizeTrivial(benchmark::State& state) {
    53+    static_assert(IS_TRIVIALLY_CONSTRUCTIBLE<unsigned char>::value);
    


    AkioNak commented at 2:27 pm on February 27, 2018:
    I think diganostic message can not be omitted on c++11.
  11. in src/bench/prevector.cpp:63 in f8b078c8cd outdated
    58+    int x;
    59+    NontrivialStruct() :x(-1) {}
    60+};
    61+
    62+static void PrevectorResizeNontrivial(benchmark::State& state) {
    63+    static_assert(!IS_TRIVIALLY_CONSTRUCTIBLE<NontrivialStruct>::value);
    


    AkioNak commented at 2:28 pm on February 27, 2018:
    I think diganostic message can not be omitted on c++11.
  12. laanwj commented at 2:57 pm on February 27, 2018: member
    Ping @sipa who wrote this code.
  13. eklitzke force-pushed on Feb 27, 2018
  14. eklitzke force-pushed on Feb 27, 2018
  15. eklitzke force-pushed on Feb 27, 2018
  16. eklitzke commented at 5:30 pm on February 27, 2018: contributor

    I’ve addressed the feedback I’ve gotten so far, and expanded the other benchmarks to test both trivial and non-trivial types.

    One thing that I also found interesting while doing this change was the way the current code is written to not use SFINAE. On GCC 7.3 (what I have locally) the compiler is smart enough to do the type checks at compile time and not generate runtime checks, meaning it generates optimal code (i.e. the same code it would generate using SFINAE). On GCC 4.8 it’s smart enough to do that for trivial types, but does not generate optimal code in the nontrivial path. This is probably fine because the main use of prevector is to hold unsigned char (a trivially constructible and destructible type).

  17. sipa commented at 6:55 pm on February 27, 2018: member
    I believe your first commit won’t compile on GCC 4.8. Also, if you’re going to use IS_TRIVIALLY_CONSTRUCTIBLE outside of prevector.h, perhaps it’s better to move it elsewhere (like util.h or compat.h)?
  18. eklitzke commented at 7:03 pm on February 27, 2018: contributor
    Good catch, I only tested the last commit. I’ll rebase this branch so all of the commits build cleanly, and move the macro definition.
  19. Add new prevector benchmarks.
    This prepares for a series of two additional commits which optimize
    prevector performance.
    f0e7aa7020
  20. Reduce redundant code of prevector and speed it up
    In prevector.h, the code which like item_ptr(size()) apears in the loop.
    Both item_ptr() and size() judge whether values are held directly or
    indirectly, but in most cases it is sufficient to make that judgement
    once outside the loop.
    
    This PR adds 2 private function fill() which has the loop to initialize
    by specified value (or iterator of the other prevector's element),
    but don't call item_ptr() in their loop.
    Other functions(assign(), constructor, operator=(), insert())
    that has similar loop, call fill() instead of original loop.
    
    Also, resize() was changed like fill(), but it calls the default
    constructor for that element each time.
    e46be25f0e
  21. eklitzke force-pushed on Feb 27, 2018
  22. eklitzke commented at 8:49 pm on February 27, 2018: contributor

    Here are the latest results, tested under both GCC 7.3 and GCC 4.8. For both compilers I ran the full test suite and benchmarks for each of the three commits in this PR.

    GCC 4.8

     0Commit     Test                                     Median (s)   Speedup
     1------------------------------------------------------------------------
     2f0e7aa     PrevectorClearNontrivial                 8.15034e-05   1.00x
     3f0e7aa     PrevectorClearTrivial                    1.49740e-04   1.00x
     4f0e7aa     PrevectorDestructorNontrivial            8.27660e-05   1.00x
     5f0e7aa     PrevectorDestructorTrivial               1.48346e-04   1.00x
     6f0e7aa     PrevectorResizeNontrivial                8.26464e-05   1.00x
     7f0e7aa     PrevectorResizeTrivial                   1.50579e-04   1.00x
     8e46be2     PrevectorClearNontrivial                 2.59100e-05   3.15x
     9e46be2     PrevectorClearTrivial                    4.80695e-05   3.12x
    10e46be2     PrevectorDestructorNontrivial            2.54231e-05   3.26x
    11e46be2     PrevectorDestructorTrivial               4.80365e-05   3.09x
    12e46be2     PrevectorResizeNontrivial                2.54693e-05   3.24x
    13e46be2     PrevectorResizeTrivial                   4.82063e-05   3.12x
    143f9abf     PrevectorClearNontrivial                 2.60759e-05   3.13x
    153f9abf     PrevectorClearTrivial                    6.08490e-06  24.61x
    163f9abf     PrevectorDestructorNontrivial            2.55872e-05   3.23x
    173f9abf     PrevectorDestructorTrivial               6.09886e-06  24.32x
    183f9abf     PrevectorResizeNontrivial                2.56024e-05   3.23x
    193f9abf     PrevectorResizeTrivial                   6.15214e-06  24.48x
    

    GCC 7.3

     0Commit     Test                                     Median (s)   Speedup
     1------------------------------------------------------------------------
     2f0e7aa     PrevectorClearNontrivial                 1.58989e-04   1.00x
     3f0e7aa     PrevectorClearTrivial                    1.31495e-04   1.00x
     4f0e7aa     PrevectorDestructorNontrivial            1.58322e-04   1.00x
     5f0e7aa     PrevectorDestructorTrivial               1.31456e-04   1.00x
     6f0e7aa     PrevectorResizeNontrivial                1.58518e-04   1.00x
     7f0e7aa     PrevectorResizeTrivial                   1.29851e-04   1.00x
     8e46be2     PrevectorClearNontrivial                 4.89286e-05   3.25x
     9e46be2     PrevectorClearTrivial                    4.81217e-05   2.73x
    10e46be2     PrevectorDestructorNontrivial            4.94412e-05   3.20x
    11e46be2     PrevectorDestructorTrivial               4.79828e-05   2.74x
    12e46be2     PrevectorResizeNontrivial                4.89689e-05   3.24x
    13e46be2     PrevectorResizeTrivial                   4.80065e-05   2.70x
    143f9abf     PrevectorClearNontrivial                 2.59529e-05   6.13x
    153f9abf     PrevectorClearTrivial                    6.39799e-06  20.55x
    163f9abf     PrevectorDestructorNontrivial            2.58583e-05   6.12x
    173f9abf     PrevectorDestructorTrivial               6.40997e-06  20.51x
    183f9abf     PrevectorResizeNontrivial                2.56362e-05   6.18x
    193f9abf     PrevectorResizeTrivial                   6.40775e-06  20.26x
    

    It’s interesting that GCC 4.8 generates better code in some cases. Anyway, hopefully this is enough data to convince people that this is a good change.

  23. JeremyRubin commented at 8:54 pm on February 27, 2018: contributor

    utACK

    This is a good optimization @AkioNak / @eklitzke! I didn’t have a hack for getting cross platform support for trivial constructors when I wrote #9505, so glad to see trivial types handled more optimally now (noting this to clarify that there was no other reason not to do this that I came across in #9505)

    I would only nit that you should amend the line:

    prevector::~prevector() for trivial types becomes 7.6x faster (most of its time was spent in resize())
    

    to make it clear you are only referring to the Benchmark, not to the actual destructor (there should be no performance change under this patchset, and it would be good for you to demonstrate that – ironically it means I probably under-reported the performance advantage when I added the trivial_destructor commit, because I counted the resizes).

  24. in src/compat.h:21 in 3f9abf49d7 outdated
    16+// https://www.gnu.org/software/gcc/gcc-5/changes.html
    17+#if defined(__GNUC__) && __GNUC__ < 5
    18+#define IS_TRIVIALLY_CONSTRUCTIBLE std::is_trivial
    19+#else
    20+#define IS_TRIVIALLY_CONSTRUCTIBLE std::is_trivially_constructible
    21+#endif
    


    Empact commented at 9:00 pm on February 27, 2018:
    Is this redundant with the same in prevector.h?

    eklitzke commented at 9:28 pm on February 27, 2018:
    Good catch, pushing a rebase that fixes this.
  25. Use memset() to optimize prevector::resize()
    Further optimize prevector::resize() (which is called by a number of
    other prevector methods) to use memset to initialize memory when the
    prevector contains trivial types.
    5aad635b78
  26. eklitzke force-pushed on Feb 27, 2018
  27. sipa commented at 9:44 pm on February 27, 2018: member
    utACK 5aad635b78c8359adae9b2af015b67b7325c0e0b. Those benchmark numbers look very nice; thanks @AkioNak as well (sorry I never got around to reviewing your previous PR).
  28. eklitzke commented at 11:18 pm on February 27, 2018: contributor
    Updated profile for ReadBlockFromDisk() with these changes: https://monad.io/readblockfromdisk2.svg . The time spent waiting on prevector::resize() has dropped from about 32% of the total time to 4% of the total time.
  29. AkioNak commented at 11:55 pm on February 27, 2018: contributor
    @sipa please never mind.
  30. fanquake added the label Refactoring on Feb 28, 2018
  31. laanwj commented at 11:08 am on March 1, 2018: member
    utACK 5aad635b78c8359adae9b2af015b67b7325c0e0b
  32. laanwj merged this on Mar 1, 2018
  33. laanwj closed this on Mar 1, 2018

  34. laanwj referenced this in commit 32987d5aeb on Mar 1, 2018
  35. eklitzke commented at 8:50 pm on March 1, 2018: contributor

    I just discovered this causes a warning in GCC 8, which added a new -Wclass-memaccess warning. The warning is spurious/incorrect, and GCC 8 is still not released as stable so I’m effectively testing a pre-release compiler. I filed a bug upstream: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84656

    If GCC 8 is released without fixing this we can add an explicit void* cast in the memset() call to suppress the warning. The other option would be to SFINAE the entire class (more work).

  36. eklitzke deleted the branch on Mar 4, 2018
  37. codablock referenced this in commit f599b4759f on Oct 1, 2019
  38. codablock referenced this in commit e5852a2bcd on Oct 1, 2019
  39. codablock referenced this in commit 595826ad63 on Oct 2, 2019
  40. barrystyle referenced this in commit 2ceaa3eabb on Jan 22, 2020
  41. furszy referenced this in commit 95ed10a37e on Dec 20, 2020
  42. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me