optimization: Reduce cache lookups in CCoinsViewCache::FetchCoin #30326

pull paplorinc wants to merge 1 commits into bitcoin:master from paplorinc:paplorinc/FetchCoin changing 1 files +11 −12
  1. paplorinc commented at 10:01 pm on June 23, 2024: contributor

    Enhanced efficiency and readability of CCoinsViewCache::FetchCoin by replacing separate find() and emplace() calls with a single try_emplace(), reducing map lookups and potential insertions.

    AssembleBlock shows FetchCoin as one of its bottlenecks:

    These changes result in a modest performance improvement:

    ./src/bench/bench_bitcoin –filter=‘AssembleBlock’ –min-time=10000

    before:

    ns/op op/s err% total benchmark
    156,160.70 6,403.66 0.6% 10.91 AssembleBlock

    after:

    ns/op op/s err% total benchmark
    152,971.97 6,537.15 0.2% 10.95 AssembleBlock

    Further benchmarks: #30326 (comment)

  2. DrahtBot commented at 10:01 pm on June 23, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK andrewtoth, sipa, achow101
    Stale ACK luke-jr

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. paplorinc force-pushed on Jun 23, 2024
  4. paplorinc renamed this:
    optimizations: Reduce cache lookups in CCoinsViewCache::FetchCoin
    optimization: Reduce cache lookups in CCoinsViewCache::FetchCoin
    on Jun 23, 2024
  5. paplorinc force-pushed on Jun 24, 2024
  6. paplorinc marked this as ready for review on Jun 24, 2024
  7. fanquake commented at 2:07 pm on June 24, 2024: member

    These changes result in a modest performance improvement:

    I ran the benchmark on an aarch64 machine, and saw mostly worse performance when using the latest GCC, or Clang, with libstdc++. There was a small improvement when using Clang and libc++.

    GCC 14.1.1 20240620 (Red Hat 14.1.1-6)

    Master

    0|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    2|          298,908.49 |            3,345.51 |    0.1% |    2,175,461.97 |      883,257.19 |  2.463 |     314,883.84 |    0.5% |     10.99 | `AssembleBlock`
    

    PR

    0|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    2|          317,848.00 |            3,146.16 |    0.5% |    2,243,862.65 |      918,949.53 |  2.442 |     316,609.60 |    0.5% |     11.00 | `AssembleBlock`
    

    Clang 18.1.6 (Fedora 18.1.6-4.fc41)

    Master

    0|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    2|          290,702.08 |            3,439.95 |    0.2% |    2,047,925.12 |      858,324.72 |  2.386 |     290,750.85 |    0.6% |     10.99 | `AssembleBlock`
    

    PR

    0|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    2|          301,625.77 |            3,315.37 |    0.2% |    2,138,789.21 |      891,494.06 |  2.399 |     294,669.20 |    0.5% |     10.96 | `AssembleBlock`
    

    Clang libc++

    Master

    0|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    2|          285,978.35 |            3,496.77 |    0.2% |    1,960,603.15 |      845,960.63 |  2.318 |     294,904.09 |    0.6% |     10.98 | `AssembleBlock`
    

    PR

    0|               ns/op |                op/s |    err% |          ins/op |          cyc/op |    IPC |         bra/op |   miss% |     total | benchmark
    1|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
    2|          279,828.66 |            3,573.62 |    0.1% |    1,944,499.08 |      825,776.61 |  2.355 |     296,404.49 |    0.6% |     11.01 | `AssembleBlock`
    
  8. paplorinc commented at 2:17 pm on June 24, 2024: contributor
    Thanks for checking @fanquake! As the profiler showed, this is only 5% of AssembleBlock, and the benchmark varied a lot on my machine as well, it might be why you also see the discrepancy. Do you think I should add a dedicated benchmark for this only?
  9. fanquake commented at 2:27 pm on June 24, 2024: member

    I think that rather than trying to micro-optimise these somewhat sensitive functions, if you’re interested in benchmarking / performance related changes, your time might be better spent reviewing Pull Requests like #28280 (which your changes conflict with).

    I’d also note that all of your benchmarking / profiling seems to be being done on an arm64 macOS machine, using (I assume) Apple Clang as the compiler, and libc++ as the standard library. So when you create changes like these, even if you see improvments locally, they aren’t necessarily going to be improvements for the compiler/std lib we primarily care about (because it’s used to build the majority of our release binaries), which is GCC and libstdc++.

  10. sipa commented at 5:08 pm on June 24, 2024: member
    If you can benchmark pre-assumevalid IBD I think you’ll see FetchCoins have a bigger impact than in AssembleBlock. It may need an actual -reindex though, I don’t know if we have a benchmark with realistic workload.
  11. paplorinc commented at 9:09 am on June 25, 2024: contributor

    Thanks for the compiler hints @fanquake, I had a lot of problems with MacOSX14.sdk, so was using Apple clang version 15.0.0 (clang-1500.3.9.4). Might give GCC and libstdc++ another try if I can find a readme of how to do that on a mac properly, everything just failed catastrophically so far.

     0Index: src/validation.cpp
     1IDEA additional info:
     2Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
     3<+>UTF-8
     4===================================================================
     5diff --git a/src/validation.cpp b/src/validation.cpp
     6--- a/src/validation.cpp	(revision a114ced7192204963f19abb09db05d6be14cbf61)
     7+++ b/src/validation.cpp	(date 1719320825375)
     8@@ -2133,7 +2133,7 @@
     9                        bool cacheFullScriptStore, PrecomputedTransactionData& txdata,
    10                        std::vector<CScriptCheck>* pvChecks)
    11 {
    12-    if (tx.IsCoinBase()) return true;
    13+    return true;
    14 
    15     if (pvChecks) {
    16         pvChecks->reserve(tx.vin.size());
    

    I ran bitcoind -reindex-chainstate -assumevalid=0 -par=1 -stopatheight=200000 for 20 iterations (skipped the first in both as a warmup) as such:

    0git checkout master && make -j10 >/dev/null 2>&1 && for i in {1..20}; do (time (./src/bitcoind -reindex-chainstate -assumevalid=0 -par=1 -stopatheight=200000 >/dev/null 2>&1)) 2>&1 | awk '{print $(NF-1)}'; done; git checkout paplorinc/FetchCoin && make -j10 >/dev/null 2>&1 && for i in {1..20}; do (time (./src/bitcoind -reindex-chainstate -assumevalid=0 -par=1 -stopatheight=200000 >/dev/null 2>&1)) 2>&1 | awk '{print $(NF-1)}'; done
    

    which reveals a small but measurable difference between the two runs (~4%):

    Is there any other test or benchmark or change you’d like me to do?

  12. paplorinc commented at 4:41 pm on June 28, 2024: contributor

    your time might be better spent reviewing Pull Requests like #28280

    Thanks for the hint @fanquake, it’s a tough PR, I have added my two sats to it: #28280#pullrequestreview-2138727793

  13. andrewtoth commented at 10:59 pm on July 4, 2024: contributor

    Just a style nit, but I think if you return early on if (!inserted) it would make a cleaner diff and make it easier to see the benefit of this change.

     0diff --git a/src/coins.cpp b/src/coins.cpp
     1index a4e4d4ad32..9e8072ba02 100644
     2--- a/src/coins.cpp
     3+++ b/src/coins.cpp
     4@@ -41,20 +41,20 @@ size_t CCoinsViewCache::DynamicMemoryUsage() const {
     5 }
     6 
     7 CCoinsMap::iterator CCoinsViewCache::FetchCoin(const COutPoint &outpoint) const {
     8-    CCoinsMap::iterator it = cacheCoins.find(outpoint);
     9-    if (it != cacheCoins.end())
    10+    const auto [it, inserted] = cacheCoins.try_emplace(outpoint);
    11+    if (!inserted)
    12         return it;
    13-    Coin tmp;
    14-    if (!base->GetCoin(outpoint, tmp))
    15+    if (!base->GetCoin(outpoint, it->second.coin)) {
    16+        cacheCoins.erase(it);
    17         return cacheCoins.end();
    18-    CCoinsMap::iterator ret = cacheCoins.emplace(std::piecewise_construct, std::forward_as_tuple(outpoint), std::forward_as_tuple(std::move(tmp))).first;
    19-    if (ret->second.coin.IsSpent()) {
    20+    }
    21+    if (it->second.coin.IsSpent()) {
    22         // The parent only has an empty entry for this outpoint; we can consider our
    23         // version as fresh.
    24-        ret->second.flags = CCoinsCacheEntry::FRESH;
    25+        it->second.flags = CCoinsCacheEntry::FRESH;
    26     }
    27-    cachedCoinsUsage += ret->second.coin.DynamicMemoryUsage();
    28-    return ret;
    29+    cachedCoinsUsage += it->second.coin.DynamicMemoryUsage();
    30+    return it;
    31 }
    32 
    33 bool CCoinsViewCache::GetCoin(const COutPoint &outpoint, Coin &coin) const {
    
  14. paplorinc force-pushed on Jul 5, 2024
  15. paplorinc commented at 6:59 am on July 5, 2024: contributor
    Thanks for the review @andrewtoth. I’m not sure I see the advantage of inverting the if (inserted) { call, just to have two return it; calls after. But I did rename it to ret, since we’re getting the most likely return value immediately now.
  16. andrewtoth commented at 1:40 pm on July 5, 2024: contributor

    disabling unrelated validations

    I don’t think @sipa said to disable validations. “benchmark pre-assumevalid IBD” means testing IBD (or reindex) before the assumevalid block, which right now is block 824,000. By setting -assumevalid=0 that means you disabled using assumevalid entirely, so you will do signature checks from genesis.

    limiting -stopatheight=200000

    With this limit you should not see any benefit from this change. Using default dbcache settings the cache is not flushed until around block 250000 for the first time. This means that all cache entries will exist in the coinsCache map when calling FetchCoin so it will never enter the if (inserted) { block.

    I think you should benchmark bitcoind -reindex-chainstate -stopatheight=820000 or so. And you should not change anything in validation.cpp, that could skew the benchmark.

    I’m not sure I see the advantage of inverting the if (inserted) { call, just to have two return it; calls after.

    It was because the diff is then +9/-9 and no lines change indentation, and the current diff is +12/-13 with indentation changing. So just a style nit, no advantage to the code. Also, after your change of renaming it to ret, the change I suggest would have a diff of only +6/-6 and would no longer conflict with #28280

  17. luke-jr approved
  18. luke-jr commented at 7:35 pm on July 6, 2024: member

    utACK 00052ecc0402f2c53e3602b901a937b64aa7f7cc

    There’s a potential race condition for negative lookups, but I assume it’s fine since the old code would also have a (different) race.

    Performance might be slightly worse for negative lookups, but that seems an acceptable tradeoff since the positive lookup path is much more likely, especially during IBD.

  19. paplorinc commented at 11:18 am on July 7, 2024: contributor

    Thanks for the review @andrewtoth and @luke-jr!

    I think you should benchmark bitcoind -reindex-chainstate -stopatheight=820000

    I don’t have that much space yet, but ran it with:

    0sudo hyperfine \
    1--runs 3 \
    2--show-output \
    3--export-markdown bench.md \
    4--parameter-list commit bd5d16,00052ec \
    5--prepare "sudo purge; sync; git checkout {commit}; git reset --hard; make -j10" \
    6"./src/bitcoind -reindex-chainstate -stopatheight=400000 -printtoconsole=0"
    

    resulting in

    Command Mean [s] Min [s] Max [s] Relative
    ./src/bitcoind -reindex-chainstate -stopatheight=400000 -printtoconsole=0 (commit = bd5d16) 1012.260 ± 1.206 1011.416 1013.641 1.02 ± 0.00
    ./src/bitcoind -reindex-chainstate -stopatheight=400000 -printtoconsole=0 (commit = 00052ec) 994.387 ± 3.077 991.018 997.050 1.00

    i.e. a modest 2% speedup.

    would no longer conflict with #28280

    good point, if more changes will be needed, I’ll do that!

    positive lookup path is much more likely, especially during IBD.

    Thanks Luke, that was my thinking as well. I’ll do a full IBD this week to see how it behaves (once I free up some extra space locally).

  20. andrewtoth approved
  21. andrewtoth commented at 4:49 pm on July 7, 2024: contributor

    ACK 00052ecc0402f2c53e3602b901a937b64aa7f7cc

    I benchmarked IBD from a local node connection to block 820k, twice on this branch twice on master. This branch: 22265 seconds mean time Master: 22523 seconds mean time

    Also benchmarked -reindex-chainstate up to block 820k, twice on this branch and twice on master. This branch: 22085 seconds mean time Master: 22255 seconds mean time

    So it appears to shave off ~3 minutes of time.

  22. paplorinc commented at 7:32 pm on July 7, 2024: contributor
    Thanks a lot for checking, @andrewtoth!
  23. DrahtBot added the label CI failed on Jul 22, 2024
  24. DrahtBot commented at 11:57 am on July 22, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/27072041125

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  25. DrahtBot removed the label CI failed on Jul 22, 2024
  26. DrahtBot added the label Needs rebase on Aug 8, 2024
  27. Reduce cache lookups in CCoinsViewCache::FetchCoin
    Enhanced efficiency and readability of CCoinsViewCache::FetchCoin by replacing separate find() and emplace() calls with a single try_emplace(), reducing map lookups and potential insertions.
    204ca67bba
  28. paplorinc force-pushed on Aug 8, 2024
  29. paplorinc commented at 8:55 pm on August 8, 2024: contributor
  30. DrahtBot removed the label Needs rebase on Aug 8, 2024
  31. andrewtoth approved
  32. andrewtoth commented at 5:57 pm on August 9, 2024: contributor
    re-ACK 204ca67bba263018374fe86d7a6867362d09536f
  33. DrahtBot requested review from luke-jr on Aug 9, 2024
  34. sipa commented at 8:04 pm on August 9, 2024: member
    utACK 204ca67bba263018374fe86d7a6867362d09536f
  35. achow101 commented at 8:06 pm on August 12, 2024: member
    ACK 204ca67bba263018374fe86d7a6867362d09536f
  36. achow101 merged this on Aug 12, 2024
  37. achow101 closed this on Aug 12, 2024

  38. paplorinc deleted the branch on Aug 12, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-16 19:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me