Use number of dirty cache entries in flush warnings/logs #31703

pull sipa wants to merge 2 commits into bitcoin:master from sipa:202501_dirty_coin_count changing 5 files +50 −15
  1. sipa commented at 7:27 pm on January 21, 2025: member

    Since #28280 and #28233, the percentage of dirty entries in the cache when a flush happen may in practice be far from 100%. The log output, the decision to warn about a large flush (#31534), and the disk-full detection, however always use the entire size of the cache.

    Fix this by keeping track of the number of dirty entries in CCoinsViewCache, and using that number. I’ve dropped the usage of DynamicMemoryUsage as it’s the wrong metric, even before the non-wiping flushes were introduced (if everything is dirty, memory usage is correlated with, but not the same as, the amount of disk space written or used). They could be improved by keeping track of proper write size estimates in a follow-up, but here I’m just using a fixed 48 bytes per entry estimate.

  2. DrahtBot commented at 7:27 pm on January 21, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31703.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK TheCharlatan, andrewtoth, l0rinc, tdb3

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #31132 (validation: fetch block inputs on parallel threads 10% faster IBD by andrewtoth)
    • #30673 (coins: remove logic for spent-and-FRESH cache entries and writing non-DIRTY entries by andrewtoth)
    • #30611 (validation: write chainstate to disk every hour by andrewtoth)
    • #28216 (fuzz: a new target for the coins database by darosior)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. sipa renamed this:
    202501 dirty coin count
    Use number of dirty cache entries in flush warnings/logs
    on Jan 21, 2025
  4. TheCharlatan commented at 7:59 pm on January 21, 2025: contributor
    Nice, Concept ACK
  5. in src/coins.h:472 in 19f1f69802 outdated
    467@@ -463,6 +468,9 @@ class CCoinsViewCache : public CCoinsViewBacked
    468     //! Calculate the size of the cache (in number of transaction outputs)
    469     unsigned int GetCacheSize() const;
    470 
    471+    //! Calculate the number of dirty cache entries (transaction outputs)
    472+    unsigned int GetDirtyCount() const noexcept { return m_num_dirty; }
    


    andrewtoth commented at 8:20 pm on January 21, 2025:

    nit

    0    size_t GetDirtyCount() const noexcept { return m_num_dirty; }
    

    sipa commented at 8:52 pm on January 21, 2025:
    Done.
  6. in src/validation.cpp:2957 in 19f1f69802 outdated
    2953@@ -2953,8 +2954,8 @@ bool Chainstate::FlushStateToDisk(
    2954             TRACEPOINT(utxocache, flush,
    2955                    int64_t{Ticks<std::chrono::microseconds>(SteadyClock::now() - nNow)},
    2956                    (uint32_t)mode,
    2957-                   (uint64_t)coins_count,
    2958-                   (uint64_t)coins_mem_usage,
    


    andrewtoth commented at 8:26 pm on January 21, 2025:

    This tracepoint was previously recording an estimate of memory usage of the entire cache, now it is an estimate of the disk usage of the dirty entries that were just written. Is that what we want to record here?

    cc @0xB10C


    sipa commented at 8:52 pm on January 21, 2025:
    Reverted this change.

    andrewtoth commented at 1:02 am on January 22, 2025:
    Hmm now we need to only declare coins_mem_usage if building trace points.

    sipa commented at 2:36 pm on January 22, 2025:
    Added [[maybe_unused]], which seems to suffice.
  7. andrewtoth commented at 8:26 pm on January 21, 2025: contributor
    Concept ACK
  8. sipa force-pushed on Jan 21, 2025
  9. DrahtBot added the label CI failed on Jan 21, 2025
  10. DrahtBot commented at 10:37 pm on January 21, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35957028069

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  11. sipa force-pushed on Jan 22, 2025
  12. DrahtBot removed the label CI failed on Jan 22, 2025
  13. in src/validation.cpp:2937 in d99b0ec47c outdated
    2936-            LOG_TIME_MILLIS_WITH_CATEGORY(strprintf("write coins cache to disk (%d coins, %.2fKiB)",
    2937-                coins_count, coins_mem_usage >> 10), BCLog::BENCH);
    2938-
    2939             // Typical Coin structures on disk are around 48 bytes in size.
    2940+            size_t approx_disk_usage = 48 * coins_dirty_count;
    2941+            if (approx_disk_usage >= WARN_FLUSH_COINS_SIZE) LogWarning("Flushing large (%.1f GiB) UTXO set to disk, it may take several minutes", double(approx_disk_usage) / (1 << 30));
    


    l0rinc commented at 11:40 am on January 22, 2025:

    This warning became more useful now - a few questions:

    • Would it make sense to mention the coin count and do the warning based on that instead? May be too much information, not sure, it’s just an idea:
    0Flushing %s dirty coins to disk...
    
    • If we keep GiB (while double is likely still precise enough for the billions range), I wonder if the users are really interested in fraction of GiB here
    • The limit for triggering this has changed here - does the WARN_FLUSH_COINS_SIZE threshold still make sense?

    l0rinc commented at 12:33 pm on January 22, 2025:

    Should we make this consistent with FlushSnapshotToDisk and ChainstateManager::PopulateAndValidateSnapshot as well to log the estimated disk usage based on dirty count instead of DynamicMemoryUsage?

    02025-01-22T11:56:22Z FlushSnapshotToDisk: saving snapshot chainstate (27440 MB) started
    

    (and maybe use MiB as well to avoid further confusion) and

    02025-01-22T11:56:22Z [snapshot] loaded 176948713 (27440 MB)
    

    l0rinc commented at 1:44 pm on January 22, 2025:
    This is likely my mistake in #31534, but loading the UTXO set (as described in #31645) will call Flush via FlushSnapshotToDisk at the end (taking a really long time but never warning) https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L5878, reset m_num_dirty = 0 so when we get to FlushStateToDisk (through ActivateSnapshot) we’re never warning since coins_dirty_count is always 0.

    sipa commented at 3:08 pm on January 22, 2025:
    I have switched everything to be dirty-count based, rather than size-based.

    sipa commented at 3:15 pm on January 22, 2025:
    I think that can wait for an independent improvement.

    sipa commented at 3:16 pm on January 22, 2025:
    I’m not familiar with the snapshot code.

    l0rinc commented at 7:28 pm on January 22, 2025:

    I posted the steps to reproduce it at #31645 - but we can move this warning to Flush in a different PR to make sure it works for AssumeUTXO as well.


    Checked that the current PR displays the warning correctly.

    2025-01-22T19:20:54Z UpdateTip: new best=0000000000000000000162ce2492545600d6e6780ee22bf464166a4faa28ba28 height=824665 version=0x32000000 log2_work=94.650129 tx=948674213 date=‘2024-01-06T21:32:54Z’ progress=0.822859 cache=9687.2MiB(67719102txo) … 2025-01-22T19:20:55Z [warning] Flushing large (67721262 entries) UTXO set to disk, it may take several minutes


    l0rinc commented at 5:10 pm on January 30, 2025:

    I’m not familiar with the snapshot code.

    It seems to me that it may make more sense to move the warning to CCoinsViewDB::BatchWrite instead, which could already have m_dirty available through the CoinsViewCacheCursor, and since it’s called from Flush and Sync, an AssumeUTXO load-and-dump would also trigger the warning (not just an IBD).

    I also don’t mind doing it myself in a follow-up, if you think it’s a good idea.

  14. in src/coins.h:383 in d99b0ec47c outdated
    379@@ -377,6 +380,8 @@ class CCoinsViewCache : public CCoinsViewBacked
    380 
    381     /* Cached dynamic memory usage for the inner Coin objects. */
    382     mutable size_t cachedCoinsUsage{0};
    383+    /* Cached number of dirty Coin objects. */
    


    l0rinc commented at 11:53 am on January 22, 2025:

    We’re already in a Cache object, maybe we could clarify that this is a running count:

    0    /* Running count of dirty Coin entries. */
    

    sipa commented at 3:08 pm on January 22, 2025:
    Done.
  15. in src/validation.cpp:2936 in d99b0ec47c outdated
    2931@@ -2931,16 +2932,17 @@ bool Chainstate::FlushStateToDisk(
    2932         }
    2933         // Flush best chain related state. This can only be done if the blocks / block index write was also done.
    2934         if (fDoFullFlush && !CoinsTip().GetBestBlock().IsNull()) {
    2935-            if (coins_mem_usage >= WARN_FLUSH_COINS_SIZE) LogWarning("Flushing large (%d GiB) UTXO set to disk, it may take several minutes", coins_mem_usage >> 30);
    2936-            LOG_TIME_MILLIS_WITH_CATEGORY(strprintf("write coins cache to disk (%d coins, %.2fKiB)",
    2937-                coins_count, coins_mem_usage >> 10), BCLog::BENCH);
    2938-
    2939             // Typical Coin structures on disk are around 48 bytes in size.
    2940+            size_t approx_disk_usage = 48 * coins_dirty_count;
    


    l0rinc commented at 11:55 am on January 22, 2025:

    I understand that the hard-coded value was here before, and was wondering if we could explain how we got to this magic number (and not mention the value in comments at all since code can express that better), maybe something like:

    0static constexpr size_t COIN_DISK_USAGE_BYTES{sizeof(Coin)}; // Approximate size on disk
    

    Loading the UTXO set to memory:

    2025-01-22T11:56:22Z [snapshot] loaded 176948713 (27440 MB) coins from snapshot 0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5

    and checking the resulting disk size:

    0du -sk /Users/lorinc/IdeaProjects/bitcoin/demo/chainstate_snapshot | cut -f1
    111524472
    

    Suggests we may want to add a ~40% overhead here:

    0((11524472*1024)/176948713)/48
    

    i.e.

    0static constexpr size_t COIN_DISK_USAGE_BYTES{static_cast<size_t>(sizeof(Coin) * 1.4)}; // Approximate size on disk
    

    sipa commented at 3:14 pm on January 22, 2025:

    Estimating sizes of the UTXO set is hard, because these are all very distinct numbers:

    • The size on disk in a snapshot of a UTXO set
    • The size on disk in the chainstate LevelDB database (the database has different per-entry overhead, prefix compression, bloom filters, …).
    • The amount of data that will be written to disk to flush a number of dirty entries (because e.g. writing a removal/spending of a UTXO temporarily takes up space in the log before it’s compacted into the tables)
    • The amount of memory a UTXO entry takes up in RAM (as approximated by DynamicMemoryUsage()).

    If we actually care about disk sizes, we should actually try to estimate the thing we care about, and not use snapshot size as a proxy.


    l0rinc commented at 7:31 pm on January 22, 2025:
    I understand that it’s complex, but we seem to be off here by 40% - I don’t mind if we tackle that in a separate PR/issue and would appreciate your opinion on how to handle this better.

    sipa commented at 7:35 pm on January 22, 2025:
    I don’t understand why you’re looking at snapshot size at all, that’s not a good measure of data written. We may be off by a lot more, or a lot less. If we want to do better, we should look at how much is actually written, and then add to CCoinsViewCache a way to predict that number. None of that seems a good fit for this PR, though.

    l0rinc commented at 5:20 pm on January 30, 2025:

    I don’t understand why you’re looking at snapshot size

    not a good measure of data written

    and not use snapshot size as a proxy

    I understand that the dirty entries in the snapshot are not a perfect representation (except for the AssumeUTXO tests I was doing, which likely skewed my recommendations), but I though it’s what we’re already doing here when we’re calculating “approx_disk_usage” from “coins_dirty_count”.

    None of that seems a good fit for this PR, though

    I agree, if we want a better representation. But replacing the magic 48 with sizeof(Coin) could be. But I’m also fine if not, it’s just an alternative that I found.

  16. in src/coins.cpp:320 in d99b0ec47c outdated
    316@@ -306,6 +317,7 @@ void CCoinsViewCache::ReallocateCache()
    317 {
    318     // Cache should be empty when we're calling this.
    319     assert(cacheCoins.size() == 0);
    320+    assert(m_num_dirty == 0);
    


    l0rinc commented at 11:57 am on January 22, 2025:
    given that this is mainly used for logging, do we want to crash the client when this isn’t accurate?

    sipa commented at 3:14 pm on January 22, 2025:
    Changed to Assume.
  17. in src/test/coins_tests.cpp:103 in d99b0ec47c outdated
     99@@ -100,6 +100,7 @@ class CCoinsViewCacheTest : public CCoinsViewCache
    100     CCoinsMap& map() const { return cacheCoins; }
    101     CoinsCachePair& sentinel() const { return m_sentinel; }
    102     size_t& usage() const { return cachedCoinsUsage; }
    103+    size_t& GetNumDirty() const { return m_num_dirty; }
    


    l0rinc commented at 12:00 pm on January 22, 2025:

    Nit 1: num doesn’t obviate what the number represents, just its type. It could mean bytes or count or an int representation of a boolean. Given that in other cases we called this DirtyCount, could we use that here as well? Nit 2: not sure it matters, but in other cases here we didn’t add the “Get” prefix, I don’t see the value it adds (and it kinda’ gets confusing when we’re mutating it in SingleEntryCacheTest), might as well trim it I guess.

    0    size_t& DirtyCount() const { return m_dirty_count; }
    

    sipa commented at 3:14 pm on January 22, 2025:
    Changed to m_dirty_count and GetDirtyCount() everywhere.
  18. in src/coins.cpp:154 in d99b0ec47c outdated
    150@@ -145,6 +151,7 @@ bool CCoinsViewCache::SpendCoin(const COutPoint &outpoint, Coin* moveout) {
    151     } else {
    152         CCoinsCacheEntry::SetDirty(*it, m_sentinel);
    153         it->second.coin.Clear();
    154+        ++m_num_dirty;
    


    l0rinc commented at 12:11 pm on January 22, 2025:
    Super-nit: m_num_dirty increment is related to the SetDirty call, not the Clear call, if you think it makes sense, we could group them as such (like we do in EmplaceCoinInternalDANGER and BatchWrite)

    sipa commented at 3:15 pm on January 22, 2025:
    Done.
  19. in src/coins.h:290 in d99b0ec47c outdated
    286@@ -286,6 +287,7 @@ struct CoinsViewCacheCursor
    287         const auto next_entry{current.second.Next()};
    288         // If we are not going to erase the cache, we must still erase spent entries.
    289         // Otherwise, clear the state of the entry.
    290+        if (current.second.IsDirty()) --m_dirty;
    


    l0rinc commented at 12:17 pm on January 22, 2025:

    I don’t mind either the conditional increment or the implicit boolean-to-int conversion, but I get distracted when both are used without an easy to understand reason. Unless there’s a good reason here, could we change this to:

    0        m_dirty -= current.second.IsDirty();
    

    sipa commented at 3:15 pm on January 22, 2025:
    Done.
  20. in src/test/coins_tests.cpp:628 in d99b0ec47c outdated
    624@@ -624,15 +625,15 @@ static void SetCoinsValue(const CAmount value, Coin& coin)
    625     }
    626 }
    627 
    628-static size_t InsertCoinsMapEntry(CCoinsMap& map, CoinsCachePair& sentinel, const CoinEntry& cache_coin)
    629+static std::pair<size_t, size_t> InsertCoinsMapEntry(CCoinsMap& map, CoinsCachePair& sentinel, const CoinEntry& cache_coin)
    


    l0rinc commented at 12:26 pm on January 22, 2025:

    Nit: it’s test code so it’s not super-important, but we’re basically returning the state of a modified parameter here - seems a bit hacky to me. I think we could return the same value as before and call IsDirty() on the call site, e.g.

    0size_t usage{cache_coin ? InsertCoinsMapEntry(map, sentinel, *cache_coin) : 0};
    1size_t dirty{cache_coin->IsDirty()};
    

    and

    0cache.usage() += InsertCoinsMapEntry(cache.map(), cache.sentinel(), *cache_coin);
    1cache.GetNumDirty() += cache_coin->IsDirty();
    

    sipa commented at 3:15 pm on January 22, 2025:
    Done.
  21. in src/validation.cpp:2945 in d99b0ec47c outdated
    2945             // Pushing a new one to the database can cause it to be written
    2946             // twice (once in the log, and once in the tables). This is already
    2947             // an overestimation, as most will delete an existing entry or
    2948             // overwrite one. Still, use a conservative safety factor of 2.
    2949-            if (!CheckDiskSpace(m_chainman.m_options.datadir, 48 * 2 * 2 * CoinsTip().GetCacheSize())) {
    2950+            if (!CheckDiskSpace(m_chainman.m_options.datadir, 2 * 2 * approx_disk_usage)) {
    


    l0rinc commented at 12:36 pm on January 22, 2025:
    Slightly unrelated, but we probably need something similar for memory as well. Currently, there’s no explicit warning when -dbcache is set to a value exceeding available system memory (especially problematic after #28358). A similar warning in this scenario would be very helpful - but likely outside the scope of this PR.
  22. in src/coins.cpp:247 in d99b0ec47c outdated
    243@@ -235,6 +244,7 @@ bool CCoinsViewCache::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &ha
    244                     itUs->second.coin = it->second.coin;
    245                 }
    246                 cachedCoinsUsage += itUs->second.coin.DynamicMemoryUsage();
    247+                m_num_dirty += !itUs->second.IsDirty();
    


    l0rinc commented at 1:15 pm on January 22, 2025:
    It took me a while to understand this one - still not sure I follow completely.

    sipa commented at 3:15 pm on January 22, 2025:
    I’ve reworked it a bit. LMK if it’s clearer now.

    l0rinc commented at 5:14 pm on January 30, 2025:
    Yes, thank you. The only remaining nit is that in other cases we set it dirty first, before incrementing (edit: to obviate that the call doesn’t rely on the value of m_dirty_count).
  23. in src/coins.cpp:81 in d99b0ec47c outdated
    77@@ -78,6 +78,7 @@ void CCoinsViewCache::AddCoin(const COutPoint &outpoint, Coin&& coin, bool possi
    78     bool fresh = false;
    79     if (!inserted) {
    80         cachedCoinsUsage -= it->second.coin.DynamicMemoryUsage();
    81+        m_num_dirty -= it->second.IsDirty();
    


    l0rinc commented at 1:22 pm on January 22, 2025:
    We’re modifying m_num_dirty multiple times in the AddCoin method, possibly even cancelling out previous modifications. If it’s not too involved, could we calculate the final diff and only update it a single time in this method for clarity? Same applies to SpendCoin.
  24. in src/coins.cpp:362 in d99b0ec47c outdated
    358@@ -343,6 +359,7 @@ void CCoinsViewCache::SanityCheck() const
    359     }
    360     assert(count_linked == count_flagged);
    361     assert(recomputed_usage == cachedCoinsUsage);
    362+    assert(num_dirty == m_num_dirty);
    


    l0rinc commented at 1:24 pm on January 22, 2025:
    Before we merge I’d like to run a full IBD with SanityCheck enabled throughout the code - can you recommend other settings that I should enable?

    sipa commented at 3:16 pm on January 22, 2025:
    Anything that involves a reorg would perhaps be interesting to test.

    l0rinc commented at 4:56 pm on January 24, 2025:

    I ran an IBD until 800k with -DCMAKE_BUILD_TYPE=Debug for the current PR (it’s really slow this way) - no issues found.

    a reorg would perhaps be interesting to test

    Do we have a way of triggering random reorgs during IBD (at least undo/redo style)? If not, would it be helpful to create a PR that adds a hidden -simulatereorgonibd argument (similarly to -dbcrashratio) that randomly triggers small, temporary reorgs during IBD on mainnet? Specifically, at a random height between every ~[10,000, 100,000] blocks, we disconnect and reconnect ~[1, 100] blocks (favoring smaller counts), performing a CoinsTip#SanityCheck() before and after (and maybe validating that we have the UTXOs in cache that will be removed, asserting after undo that they’re missing, and revalidating after reapplication that they’re back in cache). This would test reorg logic in a semi-realistic scenario - though it wouldn’t stress diverging chains, as it reconnects the same blocks.


    l0rinc commented at 5:26 pm on January 30, 2025:

    Alternatively, we could collect a few hundred stale blocks (found 219 raw blocks so far via https://github.com/bitcoin-data/stale-blocks and https://learnmeabitcoin.com) and create a stale-block-proxy node for end-to-end regression testing of reorgs with historical data:

    • A freshly built test node would connect only to this proxy, which would simulate the original chain forks.
    • It serves headers & blocks exactly as they were originally seen (without storing them, requesting non-stale ones from the network).
    • The proxy pretends a stale block is the tip, letting the test node sync up to it.
    • Once the test node reaches that stale block, the proxy sets the next stale block as the new tip, triggering a natural reorg.

    Would that work and be a useful way to test reorg behavior during IBD?

  25. in src/validation.cpp:2831 in d99b0ec47c outdated
    2827@@ -2828,7 +2828,8 @@ bool Chainstate::FlushStateToDisk(
    2828     bool full_flush_completed = false;
    2829 
    2830     const size_t coins_count = CoinsTip().GetCacheSize();
    2831-    const size_t coins_mem_usage = CoinsTip().DynamicMemoryUsage();
    2832+    [[maybe_unused]] const size_t coins_mem_usage = CoinsTip().DynamicMemoryUsage();
    


    l0rinc commented at 1:47 pm on January 22, 2025:
    I guess we can’t just inline this, but maybe we can guard it with #ifdef ENABLE_TRACING alternatively

    sipa commented at 3:16 pm on January 22, 2025:
    I think [[maybe_unused]] is fine.

    l0rinc commented at 5:24 pm on January 30, 2025:
    Sure, though this won’t warn us if it actually becomes unused.
  26. l0rinc changes_requested
  27. l0rinc commented at 2:10 pm on January 22, 2025: contributor

    Concept ACK

    I have tried reproducing it via AssumeUTXO but the warning is never triggered. That’s my main concern (it’s probably my mistake in the related PR), other than not applying it in other cases where the sizes are displayed.

  28. sipa force-pushed on Jan 22, 2025
  29. sipa force-pushed on Jan 22, 2025
  30. DrahtBot added the label CI failed on Jan 22, 2025
  31. DrahtBot commented at 3:18 pm on January 22, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36002777403

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  32. coins: keep track of number of dirty entries in cache 586f2a6b1b
  33. validation: use dirty coins count in flush warnings 15619a1c99
  34. sipa force-pushed on Jan 22, 2025
  35. tdb3 commented at 11:12 pm on January 22, 2025: contributor
    Concept ACK for now. Makes sense to ensure warnings use a more accurate data point (m_dirty_count). Helps prevent a “chicken little” effect.
  36. in src/test/coins_tests.cpp:103 in 15619a1c99
     99@@ -100,6 +100,7 @@ class CCoinsViewCacheTest : public CCoinsViewCache
    100     CCoinsMap& map() const { return cacheCoins; }
    101     CoinsCachePair& sentinel() const { return m_sentinel; }
    102     size_t& usage() const { return cachedCoinsUsage; }
    103+    size_t& GetDirtyCount() const { return m_dirty_count; }
    


    l0rinc commented at 12:20 pm on February 1, 2025:

    This now hides GetDirtyCount in CCoinsViewCache with a size_t return value (changed to size_t& here), used later as cache.usage() += and cache.GetDirtyCount() +=.

    To avoid the confusing override and the method return value assignment, please consider making them explicit setters/adders:

    0void AddUsage(size_t usage) const { cachedCoinsUsage += usage; }
    1void AddDirtyCount(size_t count) const { m_dirty_count += count; }
    
    0cache.AddUsage(InsertCoinsMapEntry(cache.map(), cache.sentinel(), *cache_coin));
    1cache.AddDirtyCount(cache_coin->IsDirty());
    
  37. in src/validation.cpp:92 in 15619a1c99
    87@@ -88,8 +88,8 @@ using node::CBlockIndexHeightOnlyComparator;
    88 using node::CBlockIndexWorkComparator;
    89 using node::SnapshotMetadata;
    90 
    91-/** Size threshold for warning about slow UTXO set flush to disk. */
    92-static constexpr size_t WARN_FLUSH_COINS_SIZE = 1 << 30; // 1 GiB
    93+/** Threshold for warning when writing this many dirty cache entries to disk. */
    94+static constexpr size_t WARN_FLUSH_COINS_COUNT = 10'000'000;
    


    l0rinc commented at 12:30 pm on February 1, 2025:

    👍 10m seems like a good approximation for the previous value:

    2025-01-31T09:45:42Z UpdateTip: new best=0000000000000000cc22cc938f92ecc3a3844f23775c4ad4e66404563c8594c8 height=287000 version=0x00000002 log2_work=76.752105 tx=33340892 date=‘2014-02-2 1T05:18:42Z’ progress=0.028794 cache=1245.8MiB(10004443txo)

  38. in src/coins.cpp:115 in 15619a1c99
    114@@ -113,7 +115,10 @@ void CCoinsViewCache::AddCoin(const COutPoint &outpoint, Coin&& coin, bool possi
    115 void CCoinsViewCache::EmplaceCoinInternalDANGER(COutPoint&& outpoint, Coin&& coin) {
    


    l0rinc commented at 12:44 pm on February 1, 2025:

    This change doesn’t seem to be covered by coins_tests at all. Can we extend SimulationTest to call stack.back()->EmplaceCoinInternalDANGER instead of AddCoin randomly?

    Maybe something like:

    0if (COutPoint op(txid, 0); !stack.back()->map().contains(op) && !(coin.IsSpent() || (m_rng.rand32() & 1))) {
    1    stack.back()->EmplaceCoinInternalDANGER(std::move(op), std::move(newcoin));
    2} else {
    3    stack.back()->AddCoin(std::move(op), std::move(newcoin), /*possible_overwrite=*/(!coin.IsSpent() || (m_rng.rand32() & 1)));
    4}
    
  39. in src/coins.cpp:119 in 15619a1c99
    114@@ -113,7 +115,10 @@ void CCoinsViewCache::AddCoin(const COutPoint &outpoint, Coin&& coin, bool possi
    115 void CCoinsViewCache::EmplaceCoinInternalDANGER(COutPoint&& outpoint, Coin&& coin) {
    116     cachedCoinsUsage += coin.DynamicMemoryUsage();
    117     auto [it, inserted] = cacheCoins.try_emplace(std::move(outpoint), std::move(coin));
    118-    if (inserted) CCoinsCacheEntry::SetDirty(*it, m_sentinel);
    119+    if (inserted) {
    120+        CCoinsCacheEntry::SetDirty(*it, m_sentinel);
    


    l0rinc commented at 1:23 pm on February 1, 2025:

    The documentation of EmplaceCoinInternalDANGER claims that:

    Emplace a coin into cacheCoins without performing any checks, marking the emplaced coin as dirty.

    How come we’re updating cachedCoinsUsage unconditionally in this method - but m_dirty_count only when inserted?


    andrewtoth commented at 2:06 pm on February 1, 2025:
    That’s a bug from #28280. It should be done conditionally. I fix it in the first commit of #31132. The only caller of this method will not add coins that already exist in the cache though, so it will always enter the if (inserted) block.

    l0rinc commented at 2:29 pm on February 1, 2025:
    Thanks for validating it, I only noticed it now, when trying to test EmplaceCoinInternalDANGER in SimulationTest
  40. l0rinc changes_requested
  41. l0rinc commented at 2:00 pm on February 1, 2025: contributor
    I have reviewed it in multiple steps, I have a few remaining questions and concerns, let me know if I can help in any other way.
  42. DrahtBot removed the label CI failed on Feb 3, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-02-22 21:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me