Make prevector::resize() and other prevector operations much faster #12549

pull eklitzke wants to merge 3 commits into bitcoin:master from eklitzke:prevector changing 5 files +151 −88
  1. eklitzke commented at 4:16 AM on February 27, 2018: contributor

    This branch optimizes various prevector operations, especially resizing vectors. While profiling the loadblk thread I noticed that a lot of time was being spent in prevector::resize() which led to this work. I have some data here indicating that it takes up 37% of the time in ReadBlockFromDisk(): https://monad.io/readblockfromdisk.svg

    This branch improves things significantly. For trivial types, the new results for the prevector benchmark are:

    • PrevectorClearTrivial which tests prevector::clear() becomes 24.6x faster
    • PrevectorDestructorTrivial which tests prevector::~prevector() becomes 20.5x faster
    • PrevectorResizeTrivial which tests prevector::resize() becomes 20.3x faster

    Note that in practice it looks like the prevector is only used to contain unsigned char types, which is a trivial type. The benchmarks are testing a bit of an extreme case, but the changes here are motivated by the profiling data for ReadBlockFromDisk() I linked to above.

    The pull request here consists of a series of three commits:

    • The first adds new benchmarks but does not change the prevector code.
    • The second is from @AkioNak , and merges some prevector optimizations he submitted in #11988
    • The third optimizes prevector::resize() to use memset() when the prevector contains trivially constructible types
  2. meshcollider commented at 4:25 AM on February 27, 2018: contributor

    Nice speedup! Concept ACK

  3. kallewoof commented at 4:37 AM on February 27, 2018: member

    This exists in #11988 no?

  4. eklitzke commented at 4:40 AM on February 27, 2018: contributor

    It looks like this branch is failing because std::is_trivially_constructible wasn't added until GCC 5 (according to https://www.gnu.org/software/gcc/gcc-5/changes.html ), even though it is part of C++11. I can find a workaround for GCC 4.8.

    This isn't quite the same as #11988 because that still uses placement new.

  5. kallewoof commented at 4:48 AM on February 27, 2018: member

    If it's not hugely different consider reviewing #11988 and suggesting changes to make it better instead.

  6. eklitzke commented at 4:52 AM on February 27, 2018: contributor

    That's a good idea, I'll see if I can cherry-pick @AkioNak's commit into my branch while I'm testing the GCC 4.8 compatibility issues.

  7. eklitzke force-pushed on Feb 27, 2018
  8. eklitzke commented at 6:46 AM on February 27, 2018: contributor

    This new branch cherry-picks @AkioNak 's commit from #11988 and then adds my memset optimization. I constructed this PR such that it has three commits:

    • First commit just adds new benchmarks
    • Second commit is from #11988 (unmodified, preserves authorship)
    • Third commit implements my memset() optimization

    I have a script that benchmarks the performance for each of the three commits. Here's the summarized data. The first commit is considered to have a baseline speedup of 1x, and then each of the subsequent commits shows the speedup from the base:

    34d238     PrevectorClear                 1.00x
    34d238     PrevectorDestructor            1.00x
    34d238     PrevectorResizeNontrivial      1.00x
    34d238     PrevectorResizeTrivial         1.00x
    d1d28e     PrevectorClear                 2.36x
    d1d28e     PrevectorDestructor            2.16x
    d1d28e     PrevectorResizeNontrivial      3.24x
    d1d28e     PrevectorResizeTrivial         2.76x
    f8b078     PrevectorClear                 8.31x
    f8b078     PrevectorDestructor            7.62x
    f8b078     PrevectorResizeNontrivial      6.56x
    f8b078     PrevectorResizeTrivial         18.99x
    

    Here's the raw data I generated the above from:

    git 34d2387b1d5c753d6ba130eb6b4730dcc15072c5
    # Benchmark, evals, iterations, total, min, max, median
    PrevectorClear, 5, 5600, 3.47561, 0.00012317, 0.000125752, 0.000123494
    PrevectorDestructor, 5, 5700, 3.2223, 0.000112914, 0.000113208, 0.000113068
    PrevectorResizeNontrivial, 5, 4100, 3.24072, 0.000158005, 0.000158212, 0.000158075
    PrevectorResizeTrivial, 5, 4900, 3.24407, 0.000131094, 0.000133749, 0.000132495
    
    git d1d28e72154a9f2e7fb56001f2b790074d1cfa30
    # Benchmark, evals, iterations, total, min, max, median
    PrevectorClear, 5, 5600, 1.4682, 5.23405e-05, 5.2611e-05, 5.24265e-05
    PrevectorDestructor, 5, 5700, 1.49587, 5.24034e-05, 5.26179e-05, 5.2448e-05
    PrevectorResizeNontrivial, 5, 4100, 1.00747, 4.87854e-05, 5.04511e-05, 4.88389e-05
    PrevectorResizeTrivial, 5, 4900, 1.17581, 4.79055e-05, 4.81338e-05, 4.79753e-05
    
    git f8b078c8cda6d4ac6e3b112ce605b08f6ac002b9
    # Benchmark, evals, iterations, total, min, max, median
    PrevectorClear, 5, 5600, 0.419837, 1.48182e-05, 1.55947e-05, 1.48556e-05
    PrevectorDestructor, 5, 5700, 0.422994, 1.48187e-05, 1.48808e-05, 1.48327e-05
    PrevectorResizeNontrivial, 5, 4100, 0.494272, 2.40686e-05, 2.41813e-05, 2.4084e-05
    PrevectorResizeTrivial, 5, 4900, 0.171037, 6.89859e-06, 7.06054e-06, 6.97555e-06
    
  9. in src/prevector.h:343 in f8b078c8cd outdated
     343 |          }
     344 | -        while (size() < new_size) {
     345 | -            _size++;
     346 | -            new(static_cast<void*>(item_ptr(size() - 1))) T();
     347 | -        }
     348 | +        size_t increase = new_size - cur_size;
    


    Empact commented at 7:35 AM on February 27, 2018:

    Maybe use ptrdiff_t here and in fill? Would allow you to assert against the negative case rather than have a very large value.

    Not that the negative case is possible with the current code, but it could be protective going forward.

  10. in src/bench/prevector.cpp:53 in f8b078c8cd outdated
      48 | +        }
      49 | +    }
      50 | +}
      51 | +
      52 | +static void PrevectorResizeTrivial(benchmark::State& state) {
      53 | +    static_assert(IS_TRIVIALLY_CONSTRUCTIBLE<unsigned char>::value);
    


    AkioNak commented at 2:27 PM on February 27, 2018:

    I think diganostic message can not be omitted on c++11.

  11. in src/bench/prevector.cpp:63 in f8b078c8cd outdated
      58 | +    int x;
      59 | +    NontrivialStruct() :x(-1) {}
      60 | +};
      61 | +
      62 | +static void PrevectorResizeNontrivial(benchmark::State& state) {
      63 | +    static_assert(!IS_TRIVIALLY_CONSTRUCTIBLE<NontrivialStruct>::value);
    


    AkioNak commented at 2:28 PM on February 27, 2018:

    I think diganostic message can not be omitted on c++11.

  12. laanwj commented at 2:57 PM on February 27, 2018: member

    Ping @sipa who wrote this code.

  13. eklitzke force-pushed on Feb 27, 2018
  14. eklitzke force-pushed on Feb 27, 2018
  15. eklitzke force-pushed on Feb 27, 2018
  16. eklitzke commented at 5:30 PM on February 27, 2018: contributor

    I've addressed the feedback I've gotten so far, and expanded the other benchmarks to test both trivial and non-trivial types.

    One thing that I also found interesting while doing this change was the way the current code is written to not use SFINAE. On GCC 7.3 (what I have locally) the compiler is smart enough to do the type checks at compile time and not generate runtime checks, meaning it generates optimal code (i.e. the same code it would generate using SFINAE). On GCC 4.8 it's smart enough to do that for trivial types, but does not generate optimal code in the nontrivial path. This is probably fine because the main use of prevector is to hold unsigned char (a trivially constructible and destructible type).

  17. sipa commented at 6:55 PM on February 27, 2018: member

    I believe your first commit won't compile on GCC 4.8. Also, if you're going to use IS_TRIVIALLY_CONSTRUCTIBLE outside of prevector.h, perhaps it's better to move it elsewhere (like util.h or compat.h)?

  18. eklitzke commented at 7:03 PM on February 27, 2018: contributor

    Good catch, I only tested the last commit. I'll rebase this branch so all of the commits build cleanly, and move the macro definition.

  19. Add new prevector benchmarks.
    This prepares for a series of two additional commits which optimize
    prevector performance.
    f0e7aa7020
  20. Reduce redundant code of prevector and speed it up
    In prevector.h, the code which like item_ptr(size()) apears in the loop.
    Both item_ptr() and size() judge whether values are held directly or
    indirectly, but in most cases it is sufficient to make that judgement
    once outside the loop.
    
    This PR adds 2 private function fill() which has the loop to initialize
    by specified value (or iterator of the other prevector's element),
    but don't call item_ptr() in their loop.
    Other functions(assign(), constructor, operator=(), insert())
    that has similar loop, call fill() instead of original loop.
    
    Also, resize() was changed like fill(), but it calls the default
    constructor for that element each time.
    e46be25f0e
  21. eklitzke force-pushed on Feb 27, 2018
  22. eklitzke commented at 8:49 PM on February 27, 2018: contributor

    Here are the latest results, tested under both GCC 7.3 and GCC 4.8. For both compilers I ran the full test suite and benchmarks for each of the three commits in this PR.

    GCC 4.8

    Commit     Test                                     Median (s)   Speedup
    ------------------------------------------------------------------------
    f0e7aa     PrevectorClearNontrivial                 8.15034e-05   1.00x
    f0e7aa     PrevectorClearTrivial                    1.49740e-04   1.00x
    f0e7aa     PrevectorDestructorNontrivial            8.27660e-05   1.00x
    f0e7aa     PrevectorDestructorTrivial               1.48346e-04   1.00x
    f0e7aa     PrevectorResizeNontrivial                8.26464e-05   1.00x
    f0e7aa     PrevectorResizeTrivial                   1.50579e-04   1.00x
    e46be2     PrevectorClearNontrivial                 2.59100e-05   3.15x
    e46be2     PrevectorClearTrivial                    4.80695e-05   3.12x
    e46be2     PrevectorDestructorNontrivial            2.54231e-05   3.26x
    e46be2     PrevectorDestructorTrivial               4.80365e-05   3.09x
    e46be2     PrevectorResizeNontrivial                2.54693e-05   3.24x
    e46be2     PrevectorResizeTrivial                   4.82063e-05   3.12x
    3f9abf     PrevectorClearNontrivial                 2.60759e-05   3.13x
    3f9abf     PrevectorClearTrivial                    6.08490e-06  24.61x
    3f9abf     PrevectorDestructorNontrivial            2.55872e-05   3.23x
    3f9abf     PrevectorDestructorTrivial               6.09886e-06  24.32x
    3f9abf     PrevectorResizeNontrivial                2.56024e-05   3.23x
    3f9abf     PrevectorResizeTrivial                   6.15214e-06  24.48x
    

    GCC 7.3

    Commit     Test                                     Median (s)   Speedup
    ------------------------------------------------------------------------
    f0e7aa     PrevectorClearNontrivial                 1.58989e-04   1.00x
    f0e7aa     PrevectorClearTrivial                    1.31495e-04   1.00x
    f0e7aa     PrevectorDestructorNontrivial            1.58322e-04   1.00x
    f0e7aa     PrevectorDestructorTrivial               1.31456e-04   1.00x
    f0e7aa     PrevectorResizeNontrivial                1.58518e-04   1.00x
    f0e7aa     PrevectorResizeTrivial                   1.29851e-04   1.00x
    e46be2     PrevectorClearNontrivial                 4.89286e-05   3.25x
    e46be2     PrevectorClearTrivial                    4.81217e-05   2.73x
    e46be2     PrevectorDestructorNontrivial            4.94412e-05   3.20x
    e46be2     PrevectorDestructorTrivial               4.79828e-05   2.74x
    e46be2     PrevectorResizeNontrivial                4.89689e-05   3.24x
    e46be2     PrevectorResizeTrivial                   4.80065e-05   2.70x
    3f9abf     PrevectorClearNontrivial                 2.59529e-05   6.13x
    3f9abf     PrevectorClearTrivial                    6.39799e-06  20.55x
    3f9abf     PrevectorDestructorNontrivial            2.58583e-05   6.12x
    3f9abf     PrevectorDestructorTrivial               6.40997e-06  20.51x
    3f9abf     PrevectorResizeNontrivial                2.56362e-05   6.18x
    3f9abf     PrevectorResizeTrivial                   6.40775e-06  20.26x
    

    It's interesting that GCC 4.8 generates better code in some cases. Anyway, hopefully this is enough data to convince people that this is a good change.

  23. JeremyRubin commented at 8:54 PM on February 27, 2018: contributor

    utACK

    This is a good optimization @AkioNak / @eklitzke! I didn't have a hack for getting cross platform support for trivial constructors when I wrote #9505, so glad to see trivial types handled more optimally now (noting this to clarify that there was no other reason not to do this that I came across in #9505)

    I would only nit that you should amend the line:

    prevector::~prevector() for trivial types becomes 7.6x faster (most of its time was spent in resize())

    to make it clear you are only referring to the Benchmark, not to the actual destructor (there should be no performance change under this patchset, and it would be good for you to demonstrate that -- ironically it means I probably under-reported the performance advantage when I added the trivial_destructor commit, because I counted the resizes).

  24. in src/compat.h:21 in 3f9abf49d7 outdated
      16 | +// https://www.gnu.org/software/gcc/gcc-5/changes.html
      17 | +#if defined(__GNUC__) && __GNUC__ < 5
      18 | +#define IS_TRIVIALLY_CONSTRUCTIBLE std::is_trivial
      19 | +#else
      20 | +#define IS_TRIVIALLY_CONSTRUCTIBLE std::is_trivially_constructible
      21 | +#endif
    


    Empact commented at 9:00 PM on February 27, 2018:

    Is this redundant with the same in prevector.h?


    eklitzke commented at 9:28 PM on February 27, 2018:

    Good catch, pushing a rebase that fixes this.

  25. Use memset() to optimize prevector::resize()
    Further optimize prevector::resize() (which is called by a number of
    other prevector methods) to use memset to initialize memory when the
    prevector contains trivial types.
    5aad635b78
  26. eklitzke force-pushed on Feb 27, 2018
  27. sipa commented at 9:44 PM on February 27, 2018: member

    utACK 5aad635b78c8359adae9b2af015b67b7325c0e0b. Those benchmark numbers look very nice; thanks @AkioNak as well (sorry I never got around to reviewing your previous PR).

  28. eklitzke commented at 11:18 PM on February 27, 2018: contributor

    Updated profile for ReadBlockFromDisk() with these changes: https://monad.io/readblockfromdisk2.svg . The time spent waiting on prevector::resize() has dropped from about 32% of the total time to 4% of the total time.

  29. AkioNak commented at 11:55 PM on February 27, 2018: contributor

    @sipa please never mind.

  30. fanquake added the label Refactoring on Feb 28, 2018
  31. laanwj commented at 11:08 AM on March 1, 2018: member

    utACK 5aad635b78c8359adae9b2af015b67b7325c0e0b

  32. laanwj merged this on Mar 1, 2018
  33. laanwj closed this on Mar 1, 2018

  34. laanwj referenced this in commit 32987d5aeb on Mar 1, 2018
  35. eklitzke commented at 8:50 PM on March 1, 2018: contributor

    I just discovered this causes a warning in GCC 8, which added a new -Wclass-memaccess warning. The warning is spurious/incorrect, and GCC 8 is still not released as stable so I'm effectively testing a pre-release compiler. I filed a bug upstream: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84656

    If GCC 8 is released without fixing this we can add an explicit void* cast in the memset() call to suppress the warning. The other option would be to SFINAE the entire class (more work).

  36. eklitzke deleted the branch on Mar 4, 2018
  37. codablock referenced this in commit f599b4759f on Oct 1, 2019
  38. codablock referenced this in commit e5852a2bcd on Oct 1, 2019
  39. codablock referenced this in commit 595826ad63 on Oct 2, 2019
  40. barrystyle referenced this in commit 2ceaa3eabb on Jan 22, 2020
  41. furszy referenced this in commit 95ed10a37e on Dec 20, 2020
  42. DrahtBot locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-29 12:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me