sync: improve CCoinsViewCache ReallocateCache - 2nd try #30370

pull fjahr wants to merge 5 commits into bitcoin:master from fjahr:2024-07-pr28945 changing 5 files +18 −17
  1. fjahr commented at 2:45 pm on July 1, 2024: contributor

    Rebased and reopened #28945

    Original description:

    TL;DR: this change improves sync time on my PC by ~6%.

    While syncing when the CCoinsViewCache gets full it is flushed to disk and then memory is freed. Depending on available cache size, this happens several times. This PR removes the unnecessary reallocations for temporary CCoinsViewCache and reserves the cacheCoins’s bucket array to its previous size.

    I benchmarked the change on an AMD Ryzen 9 7950X with this command:

    0hyperfine \
    1--show-output --parameter-list commit b5a271334ca81a6adcb1c608d85c83621a9eae47,ceeb71816512642e92a110b2cf5d2549d090fe78 \
    2--setup 'git checkout {commit} && make -j$(nproc) src/bitcoind' \
    3--prepare 'sync; sudo /sbin/sysctl vm.drop_caches=3' \
    4-M 3 './src/bitcoind -dbcache=500 -reindex-chainstate -printtoconsole=0 -stopatheight=500000'
    

    Where b5a2713 is mergebase, and ceeb718 this PR.

    Which runs ./src/bitcoind -dbcache=500 -reindex-chainstate -printtoconsole=0 -stopatheight=500000 3 times and compares both commits, giving this result:

     0Benchmark 1: ./src/bitcoind -dbcache=500 -reindex-chainstate -printtoconsole=0 -stopatheight=500000 (commit = b5a271334ca81a6adcb1c608d85c83621a9eae47)
     1  Time (mean ± σ):     2815.648 s ± 70.720 s    [User: 2639.959 s, System: 640.051 s]
     2  Range (min … max):   2736.616 s … 2872.964 s    3 runs
     3
     4Benchmark 2: ./src/bitcoind -dbcache=500 -reindex-chainstate -printtoconsole=0 -stopatheight=500000 (commit = ceeb71816512642e92a110b2cf5d2549d090fe78)
     5  Time (mean ± σ):     2661.520 s ± 23.205 s    [User: 2473.010 s, System: 624.040 s]
     6  Range (min … max):   2635.816 s … 2680.926 s    3 runs
     7
     8Summary
     9  ./src/bitcoind -dbcache=500 -reindex-chainstate -printtoconsole=0 -stopatheight=500000 (commit = ceeb71816512642e92a110b2cf5d2549d090fe78) ran
    10    1.06 ± 0.03 times faster than ./src/bitcoind -dbcache=500 -reindex-chainstate -printtoconsole=0 -stopatheight=500000 (commit = b5a271334ca81a6adcb1c608d85c83621a9eae47)
    
  2. coins: Add parameter to specify if ReallocateCache is necessary cd7c826845
  3. validation: don't reallocate cache for short lived CCoinsViewCache
    The short lived CCoinsViewCache's are destructed right after the Flush(), therefore it is not necessary to call ReallocateCache.
    27c574f456
  4. coins: reserve bucket count ad922bd7bd
  5. DrahtBot commented at 2:45 pm on July 1, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30370.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept NACK l0rinc

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30425 (kernel: De-globalize static validation variables by TheCharlatan)
    • #28280 (Don’t empty dbcache on prune flushes: >30% faster IBD by andrewtoth)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  6. Add fuzz coverage and documentation on Flush parameter 86416b52c2
  7. fuzz: Clean up fuzz/coinscache_sim
    Since Flush() includes a call to ReallocateCache() it does not seem to be necessary to have another test that calls it explicitly again.
    43e02afa44
  8. fjahr commented at 4:02 pm on July 1, 2024: contributor
    CI failure is unrelated (#30368).
  9. DrahtBot added the label CI failed on Jul 1, 2024
  10. andrewtoth commented at 5:58 pm on July 1, 2024: contributor
    I have the same comment from #28945: Why can we not revert https://github.com/bitcoin/bitcoin/commit/5e4ac5abf54f8e6d6330df0c73119aa0cca4c103 to speed this up instead? I’m not clear why we need to have the coinsCache free up all associated memory to use the PoolAllocator.
  11. DrahtBot removed the label CI failed on Jul 12, 2024
  12. DrahtBot added the label Needs rebase on Jul 16, 2024
  13. DrahtBot commented at 4:14 pm on July 16, 2024: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  14. in src/coins.h:310 in cd7c826845 outdated
    306@@ -307,7 +307,7 @@ class CCoinsViewCache : public CCoinsViewBacked
    307      * to be forgotten.
    308      * If false is returned, the state of this cache (and its backing view) will be undefined.
    309      */
    310-    bool Flush();
    311+    bool Flush(bool reallocate_cache = true);
    


    l0rinc commented at 9:15 am on September 11, 2024:
    I would prefer not having a default value, I’d like each call to specify exactly why it needs a reallocation or not, i.e. adding each call in a separate commit, explaining why that instance of Flush was called with a true/ false (we can leave the default value in the first commit, add explicit parameter for the usages and remove the default value in the last commit). E.g. I want to understand why FlushSnapshotToDisk calls are always reallocated (or at least see that it was thoroughly investigated)
  15. in src/coins.cpp:328 in ad922bd7bd outdated
    323     ::new (&m_cache_coins_memory_resource) CCoinsMapMemoryResource{};
    324     ::new (&cacheCoins) CCoinsMap{0, SaltedOutpointHasher{/*deterministic=*/m_deterministic}, CCoinsMap::key_equal{}, &m_cache_coins_memory_resource};
    325+
    326+    // After a flush all the cacheCoins's memory has now been freed. It is likely that the map gets full again so flush is necessary, and it will
    327+    // happen around the same memory usage. So we can already reserve the bucket array to the previous size, reducing the number of rehashes needed.
    328+    cacheCoins.reserve(current_bucket_count);
    


    l0rinc commented at 9:18 am on September 11, 2024:

    doesn’t this contradict the purpose of cache reallocation, i.e. https://github.com/bitcoin/bitcoin/blob/master/src/coins.h#L469-L473?

    So it’s either

    the map’s allocator may be hanging onto a lot of memory despite having called .clear()

    or

    It is likely that the map gets full again so flush is necessary, and it will happen around the same memory usage

    Which begs the question: why not remove ReallocateCache completely? (might be related to #30370 (comment), not sure)

    i.e since clear basically just deallocates and zeros out the map already, why do we create a new one and reserve it to the same size? Why not just delete the ReallocateCache method?

  16. in src/test/fuzz/coins_view.cpp:79 in 86416b52c2 outdated
    75@@ -76,7 +76,7 @@ FUZZ_TARGET(coins_view, .init = initialize_coins_view)
    76                 assert(expected_code_path);
    77             },
    78             [&] {
    79-                (void)coins_view_cache.Flush();
    80+                (void)coins_view_cache.Flush(fuzzed_data_provider.ConsumeBool());
    


    l0rinc commented at 9:20 am on September 11, 2024:

    shouldn’t we at least assert the return value, given that the doc states:

    If false is returned, the state of this cache (and its backing view) will be undefined.

  17. l0rinc changes_requested
  18. l0rinc commented at 1:48 pm on September 11, 2024: contributor
    concept NACK, this seems to completely contradict what ReallocateCache was introduced in the first place. I left comments explaining my concerns, please let me know if I’m mistaken.
  19. fjahr commented at 10:02 pm on September 21, 2024: contributor

    this seems to completely contradict what ReallocateCache was introduced in the first place

    Can you explain what you mean by this? I don’t understand how making it optional contradicts it.

  20. l0rinc commented at 10:08 pm on September 21, 2024: contributor

    Can you explain what you mean by this

    I expanded on it in #30370 (review).

    My understanding is that ReallocateCache was introduced to save memory when shrinking the map after evictions, and this PR prohibits shrinking and reallocates the same capacity we were just trying to get rid of. If I’m mistaken here, I’d appreciate an explanation for why we can’t solve it in a simpler way - e.g. by getting rid of ReallocateCache (or evicting the least recently used instead, since we have a linked list of dirty entries now), or something similar.

  21. andrewtoth commented at 3:38 pm on September 22, 2024: contributor
    Researching this more, I think I can answer my own question. The pool allocator introduced in #25325 cannot shrink in memory. It can only allocate more chunks, and freed memory just adds some nodes in the chunks to a freelist. But, we don’t discount the available nodes in the freelist. So, even though we clear all elements, the memory usage reported by the map stays the same. This works great if we just fill up the map and then reallocate it, but it doesn’t really work well now when we can sync the cache to disk without reallocating the cache. It also prevents us from being smarter about cache usage by tracking and deleting least recently added entries instead of just wiping it all when it gets full.
  22. bitcoin deleted a comment on Sep 23, 2024
  23. andrewtoth commented at 11:04 am on September 23, 2024: contributor

    In light of this, I think a better approach is to track the available memory in the freelist in the PoolAllocator and expose it in a public getter. Then, discount that available memory in CCoinsViewCache::DynamicMemoryUsage. And finally, revert 5e4ac5abf54f8e6d6330df0c73119aa0cca4c103 and achieve the speedup.

    I think this will result in even more speedup, since the actual chunks in the pool allocator will be recycled and won’t have to be reallocated.

  24. andrewtoth commented at 9:27 pm on October 18, 2024: contributor

    While trying to implement the approach above, it became apparent that we still need the call to ReallocateCache here. This is for the case when IBD is done with memory normally reserved for the mempool. After IBD is finished and the mempool starts filling up and encroaching on the memory used by the cache, the cache must be reallocated. There is no other way to shrink it.

    I think it’s good that we are not calling reallocate on the ephemeral views, but I think we can do better with the preallocation. Since we know the total cache size with https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L2760-L2761, we can divide that by sizeof(CoinsCachePair) and get an upper bound on how many entries we could ever add to that cache until we have to flush. Using that number with reserve is more accurate than just using the previous bucket size. The amount of extra space might have changed, or it could have been flushed spuriously with an rpc call. This way we are guaranteed to not have to resize the cache until the next flush.

    We should also take this a step further and reserve the amount of entries needed for the ephemeral views used to connect and disconnect blocks. The only time the pool allocator can’t be used and has to fall back to new is when the cache has to be resized, so these are allocations we want to avoid. By tallying the number of inputs and outputs in the block before connecting/disconnecting, we can then reserve and only do a single allocation per block.

  25. l0rinc commented at 3:24 pm on January 4, 2025: contributor

    @fjahr, I’ve measured the effect of this change on a full IBD - <1%, doesn’t seem significant.

    master

    0Benchmark 1: COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0                             
    1  Time (mean ± σ):     39690.021 s ± 333.717 s    [User: 51244.778 s, System: 3430.580 s]                                                                                                 
    2  Range (min … max):   39454.047 s … 39925.995 s    2 runs 
    

    this PR rebased on top of master

    0Benchmark 1: COMMIT=ba36ac6c4ae46b1dc4386326844ce67664fc9206 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0
    1  Time (mean ± σ):     39587.720 s ± 527.298 s    [User: 51105.792 s, System: 3430.869 s]
    2  Range (min … max):   39214.864 s … 39960.575 s    2 runs
    

    Given this and @andrewtoth’s excellent analysis above, does this PR still make sense?

    I will try to provide an alternative, but it might take some time since it depends on other changes that are meant to improve memory accounting (e.g. #18086, #28531), and I still think that dropping the cache completely is really wasteful and we should try some form of MRU cache like #31102 instead.

    This is what an IBD with default max cache size looks like: image

  26. fjahr commented at 3:06 pm on January 5, 2025: contributor
    Alright, closing if the effect isn’t significant. I have a few too many things on my plate at the moment anyway so maybe someone else wants to investigate further from here.
  27. fjahr closed this on Jan 5, 2025


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-02-05 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me