Use number of dirty cache entries in flush warnings/logs #31703

pull sipa wants to merge 2 commits into bitcoin:master from sipa:202501_dirty_coin_count changing 5 files +50 −15
  1. sipa commented at 7:27 pm on January 21, 2025: member

    Since #28280 and #28233, the percentage of dirty entries in the cache when a flush happen may in practice be far from 100%. The log output, the decision to warn about a large flush (#31534), and the disk-full detection, however always use the entire size of the cache.

    Fix this by keeping track of the number of dirty entries in CCoinsViewCache, and using that number. I’ve dropped the usage of DynamicMemoryUsage as it’s the wrong metric, even before the non-wiping flushes were introduced (if everything is dirty, memory usage is correlated with, but not the same as, the amount of disk space written or used). They could be improved by keeping track of proper write size estimates in a follow-up, but here I’m just using a fixed 48 bytes per entry estimate.

  2. DrahtBot commented at 7:27 pm on January 21, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31703.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK TheCharlatan, andrewtoth, l0rinc, tdb3

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32128 (Draft: CCoinMap Experiments by martinus)
    • #31132 (validation: fetch block inputs on parallel threads 10% faster IBD by andrewtoth)
    • #30673 (coins: remove logic for spent-and-FRESH cache entries and writing non-DIRTY entries by andrewtoth)
    • #30611 (validation: write chainstate to disk every hour by andrewtoth)
    • #29700 (kernel, refactor: return error status on all fatal errors by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. sipa renamed this:
    202501 dirty coin count
    Use number of dirty cache entries in flush warnings/logs
    on Jan 21, 2025
  4. TheCharlatan commented at 7:59 pm on January 21, 2025: contributor
    Nice, Concept ACK
  5. in src/coins.h:472 in 19f1f69802 outdated
    467@@ -463,6 +468,9 @@ class CCoinsViewCache : public CCoinsViewBacked
    468     //! Calculate the size of the cache (in number of transaction outputs)
    469     unsigned int GetCacheSize() const;
    470 
    471+    //! Calculate the number of dirty cache entries (transaction outputs)
    472+    unsigned int GetDirtyCount() const noexcept { return m_num_dirty; }
    


    andrewtoth commented at 8:20 pm on January 21, 2025:

    nit

    0    size_t GetDirtyCount() const noexcept { return m_num_dirty; }
    

    sipa commented at 8:52 pm on January 21, 2025:
    Done.
  6. in src/validation.cpp:2957 in 19f1f69802 outdated
    2953@@ -2953,8 +2954,8 @@ bool Chainstate::FlushStateToDisk(
    2954             TRACEPOINT(utxocache, flush,
    2955                    int64_t{Ticks<std::chrono::microseconds>(SteadyClock::now() - nNow)},
    2956                    (uint32_t)mode,
    2957-                   (uint64_t)coins_count,
    2958-                   (uint64_t)coins_mem_usage,
    


    andrewtoth commented at 8:26 pm on January 21, 2025:

    This tracepoint was previously recording an estimate of memory usage of the entire cache, now it is an estimate of the disk usage of the dirty entries that were just written. Is that what we want to record here?

    cc @0xB10C


    sipa commented at 8:52 pm on January 21, 2025:
    Reverted this change.

    andrewtoth commented at 1:02 am on January 22, 2025:
    Hmm now we need to only declare coins_mem_usage if building trace points.

    sipa commented at 2:36 pm on January 22, 2025:
    Added [[maybe_unused]], which seems to suffice.
  7. andrewtoth commented at 8:26 pm on January 21, 2025: contributor
    Concept ACK
  8. sipa force-pushed on Jan 21, 2025
  9. DrahtBot added the label CI failed on Jan 21, 2025
  10. DrahtBot commented at 10:37 pm on January 21, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35957028069

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  11. sipa force-pushed on Jan 22, 2025
  12. DrahtBot removed the label CI failed on Jan 22, 2025
  13. in src/validation.cpp:2937 in d99b0ec47c outdated
    2936-            LOG_TIME_MILLIS_WITH_CATEGORY(strprintf("write coins cache to disk (%d coins, %.2fKiB)",
    2937-                coins_count, coins_mem_usage >> 10), BCLog::BENCH);
    2938-
    2939             // Typical Coin structures on disk are around 48 bytes in size.
    2940+            size_t approx_disk_usage = 48 * coins_dirty_count;
    2941+            if (approx_disk_usage >= WARN_FLUSH_COINS_SIZE) LogWarning("Flushing large (%.1f GiB) UTXO set to disk, it may take several minutes", double(approx_disk_usage) / (1 << 30));
    


    l0rinc commented at 11:40 am on January 22, 2025:

    This warning became more useful now - a few questions:

    • Would it make sense to mention the coin count and do the warning based on that instead? May be too much information, not sure, it’s just an idea:
    0Flushing %s dirty coins to disk...
    
    • If we keep GiB (while double is likely still precise enough for the billions range), I wonder if the users are really interested in fraction of GiB here
    • The limit for triggering this has changed here - does the WARN_FLUSH_COINS_SIZE threshold still make sense?

    l0rinc commented at 12:33 pm on January 22, 2025:

    Should we make this consistent with FlushSnapshotToDisk and ChainstateManager::PopulateAndValidateSnapshot as well to log the estimated disk usage based on dirty count instead of DynamicMemoryUsage?

    02025-01-22T11:56:22Z FlushSnapshotToDisk: saving snapshot chainstate (27440 MB) started
    

    (and maybe use MiB as well to avoid further confusion) and

    02025-01-22T11:56:22Z [snapshot] loaded 176948713 (27440 MB)
    

    l0rinc commented at 1:44 pm on January 22, 2025:
    This is likely my mistake in #31534, but loading the UTXO set (as described in #31645) will call Flush via FlushSnapshotToDisk at the end (taking a really long time but never warning) https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L5878, reset m_num_dirty = 0 so when we get to FlushStateToDisk (through ActivateSnapshot) we’re never warning since coins_dirty_count is always 0.

    sipa commented at 3:08 pm on January 22, 2025:
    I have switched everything to be dirty-count based, rather than size-based.

    sipa commented at 3:15 pm on January 22, 2025:
    I think that can wait for an independent improvement.

    sipa commented at 3:16 pm on January 22, 2025:
    I’m not familiar with the snapshot code.

    l0rinc commented at 7:28 pm on January 22, 2025:

    I posted the steps to reproduce it at #31645 - but we can move this warning to Flush in a different PR to make sure it works for AssumeUTXO as well.


    Checked that the current PR displays the warning correctly.

    2025-01-22T19:20:54Z UpdateTip: new best=0000000000000000000162ce2492545600d6e6780ee22bf464166a4faa28ba28 height=824665 version=0x32000000 log2_work=94.650129 tx=948674213 date=‘2024-01-06T21:32:54Z’ progress=0.822859 cache=9687.2MiB(67719102txo) … 2025-01-22T19:20:55Z [warning] Flushing large (67721262 entries) UTXO set to disk, it may take several minutes


    l0rinc commented at 5:10 pm on January 30, 2025:

    I’m not familiar with the snapshot code.

    It seems to me that it may make more sense to move the warning to CCoinsViewDB::BatchWrite instead, which could already have m_dirty available through the CoinsViewCacheCursor, and since it’s called from Flush and Sync, an AssumeUTXO load-and-dump would also trigger the warning (not just an IBD).

    I also don’t mind doing it myself in a follow-up, if you think it’s a good idea.


    l0rinc commented at 9:49 am on March 18, 2025:

    To clarify, currently AssumeUTXO doesn’t display the warning:

    02025-03-18T10:08:47Z [snapshot] 184000000 coins loaded (99.56%, 29332 MB)
    12025-03-18T10:08:48Z [snapshot] loaded 184821030 (29456 MB) coins from snapshot 000000000000000000010b17283c3c400507969a9c2afd1dcf2082ec5cca2880
    22025-03-18T10:08:48Z FlushSnapshotToDisk: saving snapshot chainstate (29456 MB) started
    32025-03-18T10:09:57Z FlushSnapshotToDisk: completed (69461.66ms)
    

    But if we moved the warning introduced in #31534 into to the method that does the actual long operation it would show the warning for AssumeUTXO case as well (i.e. both for Flush and Sync):

     0diff --git a/src/coins.h b/src/coins.h
     1index f897bce749..a23b663a1a 100644
     2--- a/src/coins.h
     3+++ b/src/coins.h
     4@@ -300,6 +300,7 @@ struct CoinsViewCacheCursor
     5     }
     6 
     7     inline bool WillErase(CoinsCachePair& current) const noexcept { return m_will_erase || current.second.coin.IsSpent(); }
     8+    size_t GetDirtyCount() const noexcept { return m_dirty; }
     9 private:
    10     size_t& m_usage;
    11     size_t& m_dirty;
    12diff --git a/src/txdb.cpp b/src/txdb.cpp
    13index 1622039d63..083cab3279 100644
    14--- a/src/txdb.cpp
    15+++ b/src/txdb.cpp
    16@@ -25,6 +25,9 @@ static constexpr uint8_t DB_HEAD_BLOCKS{'H'};
    17 // Keys used in previous version that might still be found in the DB:
    18 static constexpr uint8_t DB_COINS{'c'};
    19 
    20+// Threshold for warning when writing this many dirty cache entries to disk.
    21+static constexpr size_t WARN_FLUSH_COINS_COUNT{10'000'000};
    22+
    23 bool CCoinsViewDB::NeedsUpgrade()
    24 {
    25     std::unique_ptr<CDBIterator> cursor{m_db->NewIterator()};
    26@@ -109,6 +112,8 @@ bool CCoinsViewDB::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &hashB
    27         }
    28     }
    29 
    30+    if (cursor.GetDirtyCount() > WARN_FLUSH_COINS_COUNT) LogWarning("Flushing large (%d entries) UTXO set to disk, it may take several minutes", cursor.GetDirtyCount());
    31+
    32     // In the first batch, mark the database as being in the middle of a
    33     // transition from old_tip to hashBlock.
    34     // A vector is used for future extensibility, as we may want to support
    35diff --git a/src/validation.cpp b/src/validation.cpp
    36index 1f4f2e1ae8..3323a037dc 100644
    37--- a/src/validation.cpp
    38+++ b/src/validation.cpp
    39@@ -88,8 +88,6 @@ using node::CBlockIndexHeightOnlyComparator;
    40 using node::CBlockIndexWorkComparator;
    41 using node::SnapshotMetadata;
    42 
    43-/** Threshold for warning when writing this many dirty cache entries to disk. */
    44-static constexpr size_t WARN_FLUSH_COINS_COUNT = 10'000'000;
    45 /** Time to wait between writing blocks/block index to disk. */
    46 static constexpr std::chrono::hours DATABASE_WRITE_INTERVAL{1};
    47 /** Time to wait between flushing chainstate to disk. */
    48@@ -2932,7 +2930,6 @@ bool Chainstate::FlushStateToDisk(
    49         }
    50         // Flush best chain related state. This can only be done if the blocks / block index write was also done.
    51         if (fDoFullFlush && !CoinsTip().GetBestBlock().IsNull()) {
    52-            if (coins_dirty_count >= WARN_FLUSH_COINS_COUNT) LogWarning("Flushing large (%d entries) UTXO set to disk, it may take several minutes", coins_dirty_count);
    53             LOG_TIME_MILLIS_WITH_CATEGORY(strprintf("write coins cache to disk (%d out of %d cached coins)",
    54                 coins_dirty_count, coins_count), BCLog::BENCH);
    

    Reproducer:

    0cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc)
    1mkdir -p demo && rm -rfd demo/chainstate demo/chainstate_snapshot demo/debug.log
    2build/bin/bitcoind -datadir=demo -stopatheight=1
    3build/bin/bitcoind -datadir=demo -daemon -blocksonly=1 -connect=0 -dbcache=30000
    4build/bin/bitcoin-cli -datadir=demo -rpcclienttimeout=0 loadtxoutset ~/utxo-880000.dat
    5build/bin/bitcoin-cli -datadir=demo stop
    

    Result:

    02025-03-18T09:59:40Z [snapshot] 184000000 coins loaded (99.56%, 29332 MB)
    12025-03-18T09:59:41Z [snapshot] loaded 184821030 (29456 MB) coins from snapshot 000000000000000000010b17283c3c400507969a9c2afd1dcf2082ec5cca2880
    22025-03-18T09:59:41Z FlushSnapshotToDisk: saving snapshot chainstate (29456 MB) started
    3*2025-03-18T09:59:41Z [warning] Flushing large (184821030 entries) UTXO set to disk, it may take several minutes*
    42025-03-18T10:00:50Z FlushSnapshotToDisk: completed (68776.27ms)
    

    Would it be simpler if I pushed this change in a separate PR instead?

  14. in src/coins.h:383 in d99b0ec47c outdated
    379@@ -377,6 +380,8 @@ class CCoinsViewCache : public CCoinsViewBacked
    380 
    381     /* Cached dynamic memory usage for the inner Coin objects. */
    382     mutable size_t cachedCoinsUsage{0};
    383+    /* Cached number of dirty Coin objects. */
    


    l0rinc commented at 11:53 am on January 22, 2025:

    We’re already in a Cache object, maybe we could clarify that this is a running count:

    0    /* Running count of dirty Coin entries. */
    

    sipa commented at 3:08 pm on January 22, 2025:
    Done.
  15. in src/validation.cpp:2936 in d99b0ec47c outdated
    2931@@ -2931,16 +2932,17 @@ bool Chainstate::FlushStateToDisk(
    2932         }
    2933         // Flush best chain related state. This can only be done if the blocks / block index write was also done.
    2934         if (fDoFullFlush && !CoinsTip().GetBestBlock().IsNull()) {
    2935-            if (coins_mem_usage >= WARN_FLUSH_COINS_SIZE) LogWarning("Flushing large (%d GiB) UTXO set to disk, it may take several minutes", coins_mem_usage >> 30);
    2936-            LOG_TIME_MILLIS_WITH_CATEGORY(strprintf("write coins cache to disk (%d coins, %.2fKiB)",
    2937-                coins_count, coins_mem_usage >> 10), BCLog::BENCH);
    2938-
    2939             // Typical Coin structures on disk are around 48 bytes in size.
    2940+            size_t approx_disk_usage = 48 * coins_dirty_count;
    


    l0rinc commented at 11:55 am on January 22, 2025:

    I understand that the hard-coded value was here before, and was wondering if we could explain how we got to this magic number (and not mention the value in comments at all since code can express that better), maybe something like:

    0static constexpr size_t COIN_DISK_USAGE_BYTES{sizeof(Coin)}; // Approximate size on disk
    

    Loading the UTXO set to memory:

    2025-01-22T11:56:22Z [snapshot] loaded 176948713 (27440 MB) coins from snapshot 0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5

    and checking the resulting disk size:

    0du -sk /Users/lorinc/IdeaProjects/bitcoin/demo/chainstate_snapshot | cut -f1
    111524472
    

    Suggests we may want to add a ~40% overhead here:

    0((11524472*1024)/176948713)/48
    

    i.e.

    0static constexpr size_t COIN_DISK_USAGE_BYTES{static_cast<size_t>(sizeof(Coin) * 1.4)}; // Approximate size on disk
    

    sipa commented at 3:14 pm on January 22, 2025:

    Estimating sizes of the UTXO set is hard, because these are all very distinct numbers:

    • The size on disk in a snapshot of a UTXO set
    • The size on disk in the chainstate LevelDB database (the database has different per-entry overhead, prefix compression, bloom filters, …).
    • The amount of data that will be written to disk to flush a number of dirty entries (because e.g. writing a removal/spending of a UTXO temporarily takes up space in the log before it’s compacted into the tables)
    • The amount of memory a UTXO entry takes up in RAM (as approximated by DynamicMemoryUsage()).

    If we actually care about disk sizes, we should actually try to estimate the thing we care about, and not use snapshot size as a proxy.


    l0rinc commented at 7:31 pm on January 22, 2025:
    I understand that it’s complex, but we seem to be off here by 40% - I don’t mind if we tackle that in a separate PR/issue and would appreciate your opinion on how to handle this better.

    sipa commented at 7:35 pm on January 22, 2025:
    I don’t understand why you’re looking at snapshot size at all, that’s not a good measure of data written. We may be off by a lot more, or a lot less. If we want to do better, we should look at how much is actually written, and then add to CCoinsViewCache a way to predict that number. None of that seems a good fit for this PR, though.

    l0rinc commented at 5:20 pm on January 30, 2025:

    I don’t understand why you’re looking at snapshot size

    not a good measure of data written

    and not use snapshot size as a proxy

    I understand that the dirty entries in the snapshot are not a perfect representation (except for the AssumeUTXO tests I was doing, which likely skewed my recommendations), but I though it’s what we’re already doing here when we’re calculating “approx_disk_usage” from “coins_dirty_count”.

    None of that seems a good fit for this PR, though

    I agree, if we want a better representation. But replacing the magic 48 with sizeof(Coin) could be. But I’m also fine if not, it’s just an alternative that I found.

  16. in src/coins.cpp:320 in d99b0ec47c outdated
    316@@ -306,6 +317,7 @@ void CCoinsViewCache::ReallocateCache()
    317 {
    318     // Cache should be empty when we're calling this.
    319     assert(cacheCoins.size() == 0);
    320+    assert(m_num_dirty == 0);
    


    l0rinc commented at 11:57 am on January 22, 2025:
    given that this is mainly used for logging, do we want to crash the client when this isn’t accurate?

    sipa commented at 3:14 pm on January 22, 2025:
    Changed to Assume.
  17. in src/test/coins_tests.cpp:103 in d99b0ec47c outdated
     99@@ -100,6 +100,7 @@ class CCoinsViewCacheTest : public CCoinsViewCache
    100     CCoinsMap& map() const { return cacheCoins; }
    101     CoinsCachePair& sentinel() const { return m_sentinel; }
    102     size_t& usage() const { return cachedCoinsUsage; }
    103+    size_t& GetNumDirty() const { return m_num_dirty; }
    


    l0rinc commented at 12:00 pm on January 22, 2025:

    Nit 1: num doesn’t obviate what the number represents, just its type. It could mean bytes or count or an int representation of a boolean. Given that in other cases we called this DirtyCount, could we use that here as well? Nit 2: not sure it matters, but in other cases here we didn’t add the “Get” prefix, I don’t see the value it adds (and it kinda’ gets confusing when we’re mutating it in SingleEntryCacheTest), might as well trim it I guess.

    0    size_t& DirtyCount() const { return m_dirty_count; }
    

    sipa commented at 3:14 pm on January 22, 2025:
    Changed to m_dirty_count and GetDirtyCount() everywhere.
  18. in src/coins.cpp:154 in d99b0ec47c outdated
    150@@ -145,6 +151,7 @@ bool CCoinsViewCache::SpendCoin(const COutPoint &outpoint, Coin* moveout) {
    151     } else {
    152         CCoinsCacheEntry::SetDirty(*it, m_sentinel);
    153         it->second.coin.Clear();
    154+        ++m_num_dirty;
    


    l0rinc commented at 12:11 pm on January 22, 2025:
    Super-nit: m_num_dirty increment is related to the SetDirty call, not the Clear call, if you think it makes sense, we could group them as such (like we do in EmplaceCoinInternalDANGER and BatchWrite)

    sipa commented at 3:15 pm on January 22, 2025:
    Done.
  19. in src/coins.h:290 in d99b0ec47c outdated
    286@@ -286,6 +287,7 @@ struct CoinsViewCacheCursor
    287         const auto next_entry{current.second.Next()};
    288         // If we are not going to erase the cache, we must still erase spent entries.
    289         // Otherwise, clear the state of the entry.
    290+        if (current.second.IsDirty()) --m_dirty;
    


    l0rinc commented at 12:17 pm on January 22, 2025:

    I don’t mind either the conditional increment or the implicit boolean-to-int conversion, but I get distracted when both are used without an easy to understand reason. Unless there’s a good reason here, could we change this to:

    0        m_dirty -= current.second.IsDirty();
    

    sipa commented at 3:15 pm on January 22, 2025:
    Done.
  20. in src/test/coins_tests.cpp:628 in d99b0ec47c outdated
    624@@ -624,15 +625,15 @@ static void SetCoinsValue(const CAmount value, Coin& coin)
    625     }
    626 }
    627 
    628-static size_t InsertCoinsMapEntry(CCoinsMap& map, CoinsCachePair& sentinel, const CoinEntry& cache_coin)
    629+static std::pair<size_t, size_t> InsertCoinsMapEntry(CCoinsMap& map, CoinsCachePair& sentinel, const CoinEntry& cache_coin)
    


    l0rinc commented at 12:26 pm on January 22, 2025:

    Nit: it’s test code so it’s not super-important, but we’re basically returning the state of a modified parameter here - seems a bit hacky to me. I think we could return the same value as before and call IsDirty() on the call site, e.g.

    0size_t usage{cache_coin ? InsertCoinsMapEntry(map, sentinel, *cache_coin) : 0};
    1size_t dirty{cache_coin->IsDirty()};
    

    and

    0cache.usage() += InsertCoinsMapEntry(cache.map(), cache.sentinel(), *cache_coin);
    1cache.GetNumDirty() += cache_coin->IsDirty();
    

    sipa commented at 3:15 pm on January 22, 2025:
    Done.
  21. in src/validation.cpp:2945 in d99b0ec47c outdated
    2945             // Pushing a new one to the database can cause it to be written
    2946             // twice (once in the log, and once in the tables). This is already
    2947             // an overestimation, as most will delete an existing entry or
    2948             // overwrite one. Still, use a conservative safety factor of 2.
    2949-            if (!CheckDiskSpace(m_chainman.m_options.datadir, 48 * 2 * 2 * CoinsTip().GetCacheSize())) {
    2950+            if (!CheckDiskSpace(m_chainman.m_options.datadir, 2 * 2 * approx_disk_usage)) {
    


    l0rinc commented at 12:36 pm on January 22, 2025:
    Slightly unrelated, but we probably need something similar for memory as well. Currently, there’s no explicit warning when -dbcache is set to a value exceeding available system memory (especially problematic after #28358). A similar warning in this scenario would be very helpful - but likely outside the scope of this PR.
  22. in src/coins.cpp:247 in d99b0ec47c outdated
    243@@ -235,6 +244,7 @@ bool CCoinsViewCache::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &ha
    244                     itUs->second.coin = it->second.coin;
    245                 }
    246                 cachedCoinsUsage += itUs->second.coin.DynamicMemoryUsage();
    247+                m_num_dirty += !itUs->second.IsDirty();
    


    l0rinc commented at 1:15 pm on January 22, 2025:
    It took me a while to understand this one - still not sure I follow completely.

    sipa commented at 3:15 pm on January 22, 2025:
    I’ve reworked it a bit. LMK if it’s clearer now.

    l0rinc commented at 5:14 pm on January 30, 2025:
    Yes, thank you. The only remaining nit is that in other cases we set it dirty first, before incrementing (edit: to obviate that the call doesn’t rely on the value of m_dirty_count).
  23. in src/coins.cpp:81 in d99b0ec47c outdated
    77@@ -78,6 +78,7 @@ void CCoinsViewCache::AddCoin(const COutPoint &outpoint, Coin&& coin, bool possi
    78     bool fresh = false;
    79     if (!inserted) {
    80         cachedCoinsUsage -= it->second.coin.DynamicMemoryUsage();
    81+        m_num_dirty -= it->second.IsDirty();
    


    l0rinc commented at 1:22 pm on January 22, 2025:
    We’re modifying m_num_dirty multiple times in the AddCoin method, possibly even cancelling out previous modifications. If it’s not too involved, could we calculate the final diff and only update it a single time in this method for clarity? Same applies to SpendCoin.
  24. in src/coins.cpp:362 in d99b0ec47c outdated
    358@@ -343,6 +359,7 @@ void CCoinsViewCache::SanityCheck() const
    359     }
    360     assert(count_linked == count_flagged);
    361     assert(recomputed_usage == cachedCoinsUsage);
    362+    assert(num_dirty == m_num_dirty);
    


    l0rinc commented at 1:24 pm on January 22, 2025:
    Before we merge I’d like to run a full IBD with SanityCheck enabled throughout the code - can you recommend other settings that I should enable?

    sipa commented at 3:16 pm on January 22, 2025:
    Anything that involves a reorg would perhaps be interesting to test.

    l0rinc commented at 4:56 pm on January 24, 2025:

    I ran an IBD until 800k with -DCMAKE_BUILD_TYPE=Debug for the current PR (it’s really slow this way) - no issues found.

    a reorg would perhaps be interesting to test

    Do we have a way of triggering random reorgs during IBD (at least undo/redo style)? If not, would it be helpful to create a PR that adds a hidden -simulatereorgonibd argument (similarly to -dbcrashratio) that randomly triggers small, temporary reorgs during IBD on mainnet? Specifically, at a random height between every ~[10,000, 100,000] blocks, we disconnect and reconnect ~[1, 100] blocks (favoring smaller counts), performing a CoinsTip#SanityCheck() before and after (and maybe validating that we have the UTXOs in cache that will be removed, asserting after undo that they’re missing, and revalidating after reapplication that they’re back in cache). This would test reorg logic in a semi-realistic scenario - though it wouldn’t stress diverging chains, as it reconnects the same blocks.


    l0rinc commented at 5:26 pm on January 30, 2025:

    Alternatively, we could collect a few hundred stale blocks (found 219 raw blocks so far via https://github.com/bitcoin-data/stale-blocks and https://learnmeabitcoin.com) and create a stale-block-proxy node for end-to-end regression testing of reorgs with historical data:

    • A freshly built test node would connect only to this proxy, which would simulate the original chain forks.
    • It serves headers & blocks exactly as they were originally seen (without storing them, requesting non-stale ones from the network).
    • The proxy pretends a stale block is the tip, letting the test node sync up to it.
    • Once the test node reaches that stale block, the proxy sets the next stale block as the new tip, triggering a natural reorg.

    Would that work and be a useful way to test reorg behavior during IBD?


    l0rinc commented at 10:26 am on March 18, 2025:
    I will do this later to test reorg behavior in a functional test, you can resolve this comment.
  25. in src/validation.cpp:2831 in d99b0ec47c outdated
    2827@@ -2828,7 +2828,8 @@ bool Chainstate::FlushStateToDisk(
    2828     bool full_flush_completed = false;
    2829 
    2830     const size_t coins_count = CoinsTip().GetCacheSize();
    2831-    const size_t coins_mem_usage = CoinsTip().DynamicMemoryUsage();
    2832+    [[maybe_unused]] const size_t coins_mem_usage = CoinsTip().DynamicMemoryUsage();
    


    l0rinc commented at 1:47 pm on January 22, 2025:
    I guess we can’t just inline this, but maybe we can guard it with #ifdef ENABLE_TRACING alternatively

    sipa commented at 3:16 pm on January 22, 2025:
    I think [[maybe_unused]] is fine.

    l0rinc commented at 5:24 pm on January 30, 2025:
    Sure, though this won’t warn us if it actually becomes unused.
  26. l0rinc changes_requested
  27. l0rinc commented at 2:10 pm on January 22, 2025: contributor

    Concept ACK

    I have tried reproducing it via AssumeUTXO but the warning is never triggered. That’s my main concern (it’s probably my mistake in the related PR), other than not applying it in other cases where the sizes are displayed.

  28. sipa force-pushed on Jan 22, 2025
  29. sipa force-pushed on Jan 22, 2025
  30. DrahtBot added the label CI failed on Jan 22, 2025
  31. DrahtBot commented at 3:18 pm on January 22, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36002777403

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  32. coins: keep track of number of dirty entries in cache 586f2a6b1b
  33. validation: use dirty coins count in flush warnings 15619a1c99
  34. sipa force-pushed on Jan 22, 2025
  35. tdb3 commented at 11:12 pm on January 22, 2025: contributor
    Concept ACK for now. Makes sense to ensure warnings use a more accurate data point (m_dirty_count). Helps prevent a “chicken little” effect.
  36. in src/test/coins_tests.cpp:103 in 15619a1c99
     99@@ -100,6 +100,7 @@ class CCoinsViewCacheTest : public CCoinsViewCache
    100     CCoinsMap& map() const { return cacheCoins; }
    101     CoinsCachePair& sentinel() const { return m_sentinel; }
    102     size_t& usage() const { return cachedCoinsUsage; }
    103+    size_t& GetDirtyCount() const { return m_dirty_count; }
    


    l0rinc commented at 12:20 pm on February 1, 2025:

    This now hides GetDirtyCount in CCoinsViewCache with a size_t return value (changed to size_t& here), used later as cache.usage() += and cache.GetDirtyCount() +=.

    To avoid the confusing override and the method return value assignment, please consider making them explicit setters/adders:

    0void AddUsage(size_t usage) const { cachedCoinsUsage += usage; }
    1void AddDirtyCount(size_t count) const { m_dirty_count += count; }
    
    0cache.AddUsage(InsertCoinsMapEntry(cache.map(), cache.sentinel(), *cache_coin));
    1cache.AddDirtyCount(cache_coin->IsDirty());
    
  37. in src/validation.cpp:92 in 15619a1c99
    87@@ -88,8 +88,8 @@ using node::CBlockIndexHeightOnlyComparator;
    88 using node::CBlockIndexWorkComparator;
    89 using node::SnapshotMetadata;
    90 
    91-/** Size threshold for warning about slow UTXO set flush to disk. */
    92-static constexpr size_t WARN_FLUSH_COINS_SIZE = 1 << 30; // 1 GiB
    93+/** Threshold for warning when writing this many dirty cache entries to disk. */
    94+static constexpr size_t WARN_FLUSH_COINS_COUNT = 10'000'000;
    


    l0rinc commented at 12:30 pm on February 1, 2025:

    👍 10m seems like a good approximation for the previous value:

    2025-01-31T09:45:42Z UpdateTip: new best=0000000000000000cc22cc938f92ecc3a3844f23775c4ad4e66404563c8594c8 height=287000 version=0x00000002 log2_work=76.752105 tx=33340892 date=‘2014-02-2 1T05:18:42Z’ progress=0.028794 cache=1245.8MiB(10004443txo)

  38. in src/coins.cpp:115 in 15619a1c99
    114@@ -113,7 +115,10 @@ void CCoinsViewCache::AddCoin(const COutPoint &outpoint, Coin&& coin, bool possi
    115 void CCoinsViewCache::EmplaceCoinInternalDANGER(COutPoint&& outpoint, Coin&& coin) {
    


    l0rinc commented at 12:44 pm on February 1, 2025:

    This change doesn’t seem to be covered by coins_tests at all. Can we extend SimulationTest to call stack.back()->EmplaceCoinInternalDANGER instead of AddCoin randomly?

    Maybe something like:

    0if (COutPoint op(txid, 0); !stack.back()->map().contains(op) && !(coin.IsSpent() || (m_rng.rand32() & 1))) {
    1    stack.back()->EmplaceCoinInternalDANGER(std::move(op), std::move(newcoin));
    2} else {
    3    stack.back()->AddCoin(std::move(op), std::move(newcoin), /*possible_overwrite=*/(!coin.IsSpent() || (m_rng.rand32() & 1)));
    4}
    
  39. in src/coins.cpp:119 in 15619a1c99
    114@@ -113,7 +115,10 @@ void CCoinsViewCache::AddCoin(const COutPoint &outpoint, Coin&& coin, bool possi
    115 void CCoinsViewCache::EmplaceCoinInternalDANGER(COutPoint&& outpoint, Coin&& coin) {
    116     cachedCoinsUsage += coin.DynamicMemoryUsage();
    117     auto [it, inserted] = cacheCoins.try_emplace(std::move(outpoint), std::move(coin));
    118-    if (inserted) CCoinsCacheEntry::SetDirty(*it, m_sentinel);
    119+    if (inserted) {
    120+        CCoinsCacheEntry::SetDirty(*it, m_sentinel);
    


    l0rinc commented at 1:23 pm on February 1, 2025:

    The documentation of EmplaceCoinInternalDANGER claims that:

    Emplace a coin into cacheCoins without performing any checks, marking the emplaced coin as dirty.

    How come we’re updating cachedCoinsUsage unconditionally in this method - but m_dirty_count only when inserted?


    andrewtoth commented at 2:06 pm on February 1, 2025:
    That’s a bug from #28280. It should be done conditionally. I fix it in the first commit of #31132. The only caller of this method will not add coins that already exist in the cache though, so it will always enter the if (inserted) block.

    l0rinc commented at 2:29 pm on February 1, 2025:
    Thanks for validating it, I only noticed it now, when trying to test EmplaceCoinInternalDANGER in SimulationTest

    l0rinc commented at 10:23 am on March 18, 2025:

    Following @andrewtoth’s solution we can do:

    0void CCoinsViewCache::EmplaceCoinInternalDANGER(COutPoint&& outpoint, Coin&& coin) {
    1    const auto mem_usage{coin.DynamicMemoryUsage()};
    2    auto [it, inserted] = cacheCoins.try_emplace(std::move(outpoint), std::move(coin));
    3    if (inserted) {
    4        cachedCoinsUsage += mem_usage;
    5        CCoinsCacheEntry::SetDirty(*it, m_sentinel);
    6        ++m_dirty_count;
    7    }
    8}
    
  40. l0rinc changes_requested
  41. l0rinc commented at 2:00 pm on February 1, 2025: contributor
    I have reviewed it in multiple steps, I have a few remaining questions and concerns, let me know if I can help in any other way.
  42. DrahtBot removed the label CI failed on Feb 3, 2025
  43. in src/coins.h:275 in 15619a1c99
    271@@ -272,10 +272,11 @@ struct CoinsViewCacheCursor
    272     //! Calling CCoinsMap::clear() afterwards is faster because a CoinsCachePair cannot be coerced back into a
    273     //! CCoinsMap::iterator to be erased, and must therefore be looked up again by key in the CCoinsMap before being erased.
    274     CoinsViewCacheCursor(size_t& usage LIFETIMEBOUND,
    275+                        size_t& dirty LIFETIMEBOUND,
    


    l0rinc commented at 12:47 pm on February 17, 2025:
    nit: parameters slightly unaligned with first parameter
  44. in src/coins.h:279 in 15619a1c99
    271@@ -272,10 +272,11 @@ struct CoinsViewCacheCursor
    272     //! Calling CCoinsMap::clear() afterwards is faster because a CoinsCachePair cannot be coerced back into a
    273     //! CCoinsMap::iterator to be erased, and must therefore be looked up again by key in the CCoinsMap before being erased.
    274     CoinsViewCacheCursor(size_t& usage LIFETIMEBOUND,
    275+                        size_t& dirty LIFETIMEBOUND,
    276                         CoinsCachePair& sentinel LIFETIMEBOUND,
    277                         CCoinsMap& map LIFETIMEBOUND,
    278                         bool will_erase) noexcept
    279-        : m_usage(usage), m_sentinel(sentinel), m_map(map), m_will_erase(will_erase) {}
    280+        : m_usage(usage), m_dirty(dirty), m_sentinel(sentinel), m_map(map), m_will_erase(will_erase) {}
    


    l0rinc commented at 12:58 pm on February 17, 2025:

    Do we still need to track CCoinsViewDB::BatchWrite#changed now that we’re tracking dirty count already? I think we should be able to use the new m_dirty(_count) there instead, i.e. something like:

    0size_t dirty_count{cursor.GetDirtyCount()};
    1size_t changed{0};
    2...
    3if (dirty_count > WARN_FLUSH_COINS_COUNT) LogWarning("Flushing large (%d entries) UTXO set to disk, it may take several minutes", dirty_count);
    4...
    5assert(changed == dirty_count); // TODO remove `changed` after all scenarios pass
    6LogDebug(BCLog::COINDB, "Committed %u changed transaction outputs (out of %u) to coin database...", (unsigned int)dirty_count, (unsigned int)count);
    

    For me locally this passes all tests, quick IBD (ran it with -dbcache=5 -maxmempool=5 -stopatheight=120000), reindex and AssumeUTXO.


    l0rinc commented at 10:34 am on March 18, 2025:

    I don’t mind if we’re mixing booleans and ints as long as the types and names are obvious, e.g.

    0size_t dirty = cache_coin ? cache_coin->IsDirty() : 0;
    1auto cursor{CoinsViewCacheCursor(usage, dirty, sentinel, map, /*will_erase=*/true)};
    

    where we’re setting a parameter called dirty to the result of IsDirty, but it gets converted to dirty count. We could obviate this by calling it dirty_count here as well:

    0    CoinsViewCacheCursor(size_t& usage LIFETIMEBOUND,
    1                         size_t& dirty_count LIFETIMEBOUND,
    2                         CoinsCachePair& sentinel LIFETIMEBOUND,
    3                         CCoinsMap& map LIFETIMEBOUND,
    4                         bool will_erase) noexcept
    5        : m_usage(usage), m_dirty_count(dirty_count), m_sentinel(sentinel), m_map(map), m_will_erase(will_erase) {}
    

    l0rinc commented at 2:10 pm on March 30, 2025:
    This could also be useful to avoid the needless DB_HEAD_BLOCKS erase and rewrite for empty batches.
  45. laanwj added the label Utils/log/libs on Mar 4, 2025
  46. in src/coins.cpp:345 in 15619a1c99
    341@@ -327,6 +342,9 @@ void CCoinsViewCache::SanityCheck() const
    342         // Recompute cachedCoinsUsage.
    343         recomputed_usage += entry.coin.DynamicMemoryUsage();
    344 
    345+        // Recompute m_num_dirty;
    


    l0rinc commented at 9:31 am on March 18, 2025:

    leftover num_dirty (& other comments here end with fullstop)

    0        // Recompute m_dirty_count.
    
  47. in src/test/fuzz/coins_view.cpp:126 in 15619a1c99
    122@@ -123,6 +123,7 @@ FUZZ_TARGET(coins_view, .init = initialize_coins_view)
    123                 CoinsCachePair sentinel{};
    124                 sentinel.second.SelfRef(sentinel);
    125                 size_t usage{0};
    126+                size_t num_dirty{0};
    


    l0rinc commented at 10:30 am on March 18, 2025:
    Leftover num_dirty
  48. in src/coins.cpp:249 in 15619a1c99
    243@@ -235,7 +244,10 @@ bool CCoinsViewCache::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &ha
    244                     itUs->second.coin = it->second.coin;
    245                 }
    246                 cachedCoinsUsage += itUs->second.coin.DynamicMemoryUsage();
    247-                CCoinsCacheEntry::SetDirty(*itUs, m_sentinel);
    248+                if (!itUs->second.IsDirty()) {
    249+                    ++m_dirty_count;
    250+                    CCoinsCacheEntry::SetDirty(*itUs, m_sentinel);
    


    l0rinc commented at 10:39 am on March 18, 2025:

    in other cases we set it to dirty first and only increment m_dirty_count after (shouldn’t matter on single thread but at least we don’t have to think about whether that’s the case or not):

    0                    CCoinsCacheEntry::SetDirty(*itUs, m_sentinel);
    1                    ++m_dirty_count;
    
  49. l0rinc changes_requested
  50. l0rinc commented at 11:54 am on March 18, 2025: contributor

    After our last discussion (and with the new AssumeUTXO included) I’ve rebased and re-reviewed and added examples to my suggestions to simplify the review.

    My remaining concerns are that AssumeUTXO doesn’t warn in the same way as other dumps do, a few leftover refactoring misses, redundant calculation now that we have a dirty count, missing code coverage for modified code, a leftover memory accounting bug in affected area, and a few code nits to make the change less surprising.

      0diff --git a/src/coins.cpp b/src/coins.cpp
      1index 30d82e1d08..b126622490 100644
      2--- a/src/coins.cpp
      3+++ b/src/coins.cpp
      4@@ -113,10 +113,11 @@ void CCoinsViewCache::AddCoin(const COutPoint &outpoint, Coin&& coin, bool possi
      5 }
      6 
      7 void CCoinsViewCache::EmplaceCoinInternalDANGER(COutPoint&& outpoint, Coin&& coin) {
      8-    cachedCoinsUsage += coin.DynamicMemoryUsage();
      9+    const auto mem_usage{coin.DynamicMemoryUsage()};
     10     auto [it, inserted] = cacheCoins.try_emplace(std::move(outpoint), std::move(coin));
     11     if (inserted) {
     12         CCoinsCacheEntry::SetDirty(*it, m_sentinel);
     13+        cachedCoinsUsage += mem_usage;
     14         ++m_dirty_count;
     15     }
     16 }
     17@@ -245,8 +246,8 @@ bool CCoinsViewCache::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &ha
     18                 }
     19                 cachedCoinsUsage += itUs->second.coin.DynamicMemoryUsage();
     20                 if (!itUs->second.IsDirty()) {
     21-                    ++m_dirty_count;
     22                     CCoinsCacheEntry::SetDirty(*itUs, m_sentinel);
     23+                    ++m_dirty_count;
     24                 }
     25                 // NOTE: It isn't safe to mark the coin as FRESH in the parent
     26                 // cache. If it already existed and was spent in the parent
     27@@ -342,7 +343,7 @@ void CCoinsViewCache::SanityCheck() const
     28         // Recompute cachedCoinsUsage.
     29         recomputed_usage += entry.coin.DynamicMemoryUsage();
     30 
     31-        // Recompute m_num_dirty;
     32+        // Recompute m_dirty_count.
     33         dirty_count += entry.IsDirty();
     34 
     35         // Count the number of entries we expect in the linked list.
     36diff --git a/src/coins.h b/src/coins.h
     37index f897bce749..474cf77fdb 100644
     38--- a/src/coins.h
     39+++ b/src/coins.h
     40@@ -272,11 +272,11 @@ struct CoinsViewCacheCursor
     41     //! Calling CCoinsMap::clear() afterwards is faster because a CoinsCachePair cannot be coerced back into a
     42     //! CCoinsMap::iterator to be erased, and must therefore be looked up again by key in the CCoinsMap before being erased.
     43     CoinsViewCacheCursor(size_t& usage LIFETIMEBOUND,
     44-                        size_t& dirty LIFETIMEBOUND,
     45-                        CoinsCachePair& sentinel LIFETIMEBOUND,
     46-                        CCoinsMap& map LIFETIMEBOUND,
     47-                        bool will_erase) noexcept
     48-        : m_usage(usage), m_dirty(dirty), m_sentinel(sentinel), m_map(map), m_will_erase(will_erase) {}
     49+                         size_t& dirty_count LIFETIMEBOUND,
     50+                         CoinsCachePair& sentinel LIFETIMEBOUND,
     51+                         CCoinsMap& map LIFETIMEBOUND,
     52+                         bool will_erase) noexcept
     53+        : m_usage(usage), m_dirty_count(dirty_count), m_sentinel(sentinel), m_map(map), m_will_erase(will_erase) {}
     54 
     55     inline CoinsCachePair* Begin() const noexcept { return m_sentinel.second.Next(); }
     56     inline CoinsCachePair* End() const noexcept { return &m_sentinel; }
     57@@ -285,7 +285,7 @@ struct CoinsViewCacheCursor
     58     inline CoinsCachePair* NextAndMaybeErase(CoinsCachePair& current) noexcept
     59     {
     60         const auto next_entry{current.second.Next()};
     61-        m_dirty -= current.second.IsDirty();
     62+        m_dirty_count -= current.second.IsDirty();
     63         // If we are not going to erase the cache, we must still erase spent entries.
     64         // Otherwise, clear the state of the entry.
     65         if (!m_will_erase) {
     66@@ -300,9 +300,11 @@ struct CoinsViewCacheCursor
     67     }
     68 
     69     inline bool WillErase(CoinsCachePair& current) const noexcept { return m_will_erase || current.second.coin.IsSpent(); }
     70+    size_t GetDirtyCount() const noexcept { return m_dirty_count; }
     71+
     72 private:
     73     size_t& m_usage;
     74-    size_t& m_dirty;
     75+    size_t& m_dirty_count;
     76     CoinsCachePair& m_sentinel;
     77     CCoinsMap& m_map;
     78     bool m_will_erase;
     79diff --git a/src/test/coins_tests.cpp b/src/test/coins_tests.cpp
     80index 261859ff9a..0cef222024 100644
     81--- a/src/test/coins_tests.cpp
     82+++ b/src/test/coins_tests.cpp
     83@@ -99,8 +99,8 @@ public:
     84 
     85     CCoinsMap& map() const { return cacheCoins; }
     86     CoinsCachePair& sentinel() const { return m_sentinel; }
     87-    size_t& usage() const { return cachedCoinsUsage; }
     88-    size_t& GetDirtyCount() const { return m_dirty_count; }
     89+    void AddUsage(size_t usage) const { cachedCoinsUsage += usage; }
     90+    void AddDirtyCount(size_t count) const { m_dirty_count += count; }
     91 };
     92 
     93 } // namespace
     94@@ -197,8 +197,11 @@ void SimulationTest(CCoinsView* base, bool fake_best_block)
     95                     (coin.IsSpent() ? added_an_entry : updated_an_entry) = true;
     96                     coin = newcoin;
     97                 }
     98-                bool is_overwrite = !coin.IsSpent() || m_rng.rand32() & 1;
     99-                stack.back()->AddCoin(COutPoint(txid, 0), std::move(newcoin), is_overwrite);
    100+                if (COutPoint op(txid, 0); !stack.back()->map().contains(op) && !(coin.IsSpent() || (m_rng.rand32() & 1))) {
    101+                    stack.back()->EmplaceCoinInternalDANGER(std::move(op), std::move(newcoin));
    102+                } else {
    103+                    stack.back()->AddCoin(std::move(op), std::move(newcoin), /*possible_overwrite=*/(!coin.IsSpent() || (m_rng.rand32() & 1)));
    104+                }
    105             } else {
    106                 // Spend the coin.
    107                 removed_an_entry = true;
    108@@ -653,8 +656,8 @@ static void WriteCoinsViewEntry(CCoinsView& view, const MaybeCoin& cache_coin)
    109     CCoinsMapMemoryResource resource;
    110     CCoinsMap map{0, CCoinsMap::hasher{}, CCoinsMap::key_equal{}, &resource};
    111     auto usage{cache_coin ? InsertCoinsMapEntry(map, sentinel, *cache_coin) : 0};
    112-    size_t dirty = cache_coin ? cache_coin->IsDirty() : 0;
    113-    auto cursor{CoinsViewCacheCursor(usage, dirty, sentinel, map, /*will_erase=*/true)};
    114+    size_t dirty_count = cache_coin ? cache_coin->IsDirty() : 0;
    115+    auto cursor{CoinsViewCacheCursor(usage, dirty_count, sentinel, map, /*will_erase=*/true)};
    116     BOOST_CHECK(view.BatchWrite(cursor, {}));
    117 }
    118 
    119@@ -666,8 +669,8 @@ public:
    120         auto base_cache_coin{base_value == ABSENT ? MISSING : CoinEntry{base_value, CoinEntry::State::DIRTY}};
    121         WriteCoinsViewEntry(base, base_cache_coin);
    122         if (cache_coin) {
    123-            cache.usage() += InsertCoinsMapEntry(cache.map(), cache.sentinel(), *cache_coin);
    124-            cache.GetDirtyCount() += cache_coin->IsDirty();
    125+            cache.AddUsage(InsertCoinsMapEntry(cache.map(), cache.sentinel(), *cache_coin));
    126+            cache.AddDirtyCount(cache_coin->IsDirty());
    127         }
    128     }
    129 
    130diff --git a/src/test/fuzz/coins_view.cpp b/src/test/fuzz/coins_view.cpp
    131index a9b91b9efe..64cec03fd5 100644
    132--- a/src/test/fuzz/coins_view.cpp
    133+++ b/src/test/fuzz/coins_view.cpp
    134@@ -123,7 +123,7 @@ FUZZ_TARGET(coins_view, .init = initialize_coins_view)
    135                 CoinsCachePair sentinel{};
    136                 sentinel.second.SelfRef(sentinel);
    137                 size_t usage{0};
    138-                size_t num_dirty{0};
    139+                size_t dirty_count{0};
    140                 CCoinsMapMemoryResource resource;
    141                 CCoinsMap coins_map{0, SaltedOutpointHasher{/*deterministic=*/true}, CCoinsMap::key_equal{}, &resource};
    142                 LIMITED_WHILE(good_data && fuzzed_data_provider.ConsumeBool(), 10'000)
    143@@ -145,11 +145,11 @@ FUZZ_TARGET(coins_view, .init = initialize_coins_view)
    144                     if (dirty) CCoinsCacheEntry::SetDirty(*it, sentinel);
    145                     if (fresh) CCoinsCacheEntry::SetFresh(*it, sentinel);
    146                     usage += it->second.coin.DynamicMemoryUsage();
    147-                    num_dirty += dirty;
    148+                    dirty_count += dirty;
    149                 }
    150                 bool expected_code_path = false;
    151                 try {
    152-                    auto cursor{CoinsViewCacheCursor(usage, num_dirty, sentinel, coins_map, /*will_erase=*/true)};
    153+                    auto cursor{CoinsViewCacheCursor(usage, dirty_count, sentinel, coins_map, /*will_erase=*/true)};
    154                     coins_view_cache.BatchWrite(cursor, fuzzed_data_provider.ConsumeBool() ? ConsumeUInt256(fuzzed_data_provider) : coins_view_cache.GetBestBlock());
    155                     expected_code_path = true;
    156                 } catch (const std::logic_error& e) {
    157diff --git a/src/txdb.cpp b/src/txdb.cpp
    158index 1622039d63..9587a62db7 100644
    159--- a/src/txdb.cpp
    160+++ b/src/txdb.cpp
    161@@ -25,6 +25,9 @@ static constexpr uint8_t DB_HEAD_BLOCKS{'H'};
    162 // Keys used in previous version that might still be found in the DB:
    163 static constexpr uint8_t DB_COINS{'c'};
    164 
    165+// Threshold for warning when writing this many dirty cache entries to disk.
    166+static constexpr size_t WARN_FLUSH_COINS_COUNT{10'000'000};
    167+
    168 bool CCoinsViewDB::NeedsUpgrade()
    169 {
    170     std::unique_ptr<CDBIterator> cursor{m_db->NewIterator()};
    171@@ -93,6 +96,7 @@ std::vector<uint256> CCoinsViewDB::GetHeadBlocks() const {
    172 bool CCoinsViewDB::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &hashBlock) {
    173     CDBBatch batch(*m_db);
    174     size_t count = 0;
    175+    size_t dirty_count{cursor.GetDirtyCount()};
    176     size_t changed = 0;
    177     assert(!hashBlock.IsNull());
    178 
    179@@ -109,6 +113,8 @@ bool CCoinsViewDB::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &hashB
    180         }
    181     }
    182 
    183+    if (dirty_count > WARN_FLUSH_COINS_COUNT) LogWarning("Flushing large (%d entries) UTXO set to disk, it may take several minutes", dirty_count);
    184+
    185     // In the first batch, mark the database as being in the middle of a
    186     // transition from old_tip to hashBlock.
    187     // A vector is used for future extensibility, as we may want to support
    188@@ -145,9 +151,10 @@ bool CCoinsViewDB::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &hashB
    189     batch.Erase(DB_HEAD_BLOCKS);
    190     batch.Write(DB_BEST_BLOCK, hashBlock);
    191 
    192-    LogDebug(BCLog::COINDB, "Writing final batch of %.2f MiB\n", batch.SizeEstimate() * (1.0 / 1048576.0));
    193+    LogDebug(BCLog::COINDB, "Writing final batch of %.2f MiB", batch.SizeEstimate() * (1.0 / 1048576.0));
    194     bool ret = m_db->WriteBatch(batch);
    195-    LogDebug(BCLog::COINDB, "Committed %u changed transaction outputs (out of %u) to coin database...\n", (unsigned int)changed, (unsigned int)count);
    196+    assert(changed == dirty_count); // TODO remove `changed` after all scenarios pass
    197+    LogDebug(BCLog::COINDB, "Committed %u changed transaction outputs (out of %u) to coin database...", (unsigned int)dirty_count, (unsigned int)count);
    198     return ret;
    199 }
    200 
    201diff --git a/src/validation.cpp b/src/validation.cpp
    202index 1f4f2e1ae8..3323a037dc 100644
    203--- a/src/validation.cpp
    204+++ b/src/validation.cpp
    205@@ -88,8 +88,6 @@ using node::CBlockIndexHeightOnlyComparator;
    206 using node::CBlockIndexWorkComparator;
    207 using node::SnapshotMetadata;
    208 
    209-/** Threshold for warning when writing this many dirty cache entries to disk. */
    210-static constexpr size_t WARN_FLUSH_COINS_COUNT = 10'000'000;
    211 /** Time to wait between writing blocks/block index to disk. */
    212 static constexpr std::chrono::hours DATABASE_WRITE_INTERVAL{1};
    213 /** Time to wait between flushing chainstate to disk. */
    214@@ -2932,7 +2930,6 @@ bool Chainstate::FlushStateToDisk(
    215         }
    216         // Flush best chain related state. This can only be done if the blocks / block index write was also done.
    217         if (fDoFullFlush && !CoinsTip().GetBestBlock().IsNull()) {
    218-            if (coins_dirty_count >= WARN_FLUSH_COINS_COUNT) LogWarning("Flushing large (%d entries) UTXO set to disk, it may take several minutes", coins_dirty_count);
    219             LOG_TIME_MILLIS_WITH_CATEGORY(strprintf("write coins cache to disk (%d out of %d cached coins)",
    220                 coins_dirty_count, coins_count), BCLog::BENCH);
    
  51. sipa commented at 3:11 pm on March 31, 2025: member
    @l0rinc I’ve lost track a bit of your comments/suggestions. Is the PR as-is broken when interacting with assumeutxo?

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-04-02 00:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me