coins: remove logic for spent-and-FRESH cache entries and writing non-DIRTY entries #30673

pull andrewtoth wants to merge 10 commits into bitcoin:master from andrewtoth:no-spent-and-fresh changing 7 files +85 −120
  1. andrewtoth commented at 9:31 pm on August 18, 2024: contributor

    Following up from #28280 (review), which suggested a revival of #18746.

    GetCoin will never return true for a spent entry, so we can safely assume that any entry we fetch will not be spent. This lets us remove the only non-test code path which adds a FRESH-but-not-DIRTY entry to the flagged linked list. This in turn ensures all entries being sent to BatchWrite are DIRTY entries.

    A corollary is that all spent coins must be DIRTY. The only time a coin can be spent and not DIRTY is when the CCoinsViewCacheEntry is created with an empty coin. The last commit makes this more clear by checking for freshness if inserted instead of if spent and not DIRTY.

    This is a pure refactor removing dead code which handles a non-existent corner case of a FRESH-but-not-DIRTY entry in the CCoinsViewCache or a spent entry in CCoinsViewDB. There is a lot of test code which tries to exercise this corner case, which is also updated in this PR to behave like non-test code.

  2. DrahtBot commented at 9:31 pm on August 18, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK l0rinc

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30906 (refactor: prohibit direct flags access in CCoinsCacheEntry and remove invalid tests by l0rinc)
    • #30849 (refactor: migrate bool GetCoin to return optional<Coin> by l0rinc)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label UTXO Db and Indexes on Aug 18, 2024
  4. andrewtoth force-pushed on Aug 18, 2024
  5. DrahtBot added the label CI failed on Aug 18, 2024
  6. DrahtBot commented at 9:39 pm on August 18, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/28917867890

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. andrewtoth force-pushed on Aug 18, 2024
  8. DrahtBot removed the label CI failed on Aug 18, 2024
  9. in src/coins.cpp:53 in feaea74b04 outdated
    48@@ -49,10 +49,8 @@ CCoinsMap::iterator CCoinsViewCache::FetchCoin(const COutPoint &outpoint) const
    49             cacheCoins.erase(ret);
    50             return cacheCoins.end();
    51         }
    52-        if (ret->second.coin.IsSpent()) {
    53-            // The parent only has an empty entry for this outpoint; we can consider our version as fresh.
    54-            ret->second.AddFlags(CCoinsCacheEntry::FRESH, *ret, m_sentinel);
    55-        }
    56+        // GetCoin will never return true if the coin is spent
    57+        Assume(!ret->second.coin.IsSpent());
    


    l0rinc commented at 9:47 am on August 19, 2024:

    when do we throw and when do we Assume here?

    * Assume is the identity function. * * - Should be used to run non-fatal checks. In debug builds it behaves like * Assert()/assert() to notify developers and testers about non-fatal errors. * In production it doesn’t warn or log anything. * - For fatal errors, use Assert().

    Wouldn’t this be a fatal error?

    Alternatively, if we want to separate the test runs from production runs, would it maybe make sense to call SanityCheck in test&debug mode every time we want to be absolutely sure the state is correct? I know it’s expensive, but might be reassuring during testing - and similarly to Assume, would be skipped in prod.


    andrewtoth commented at 7:26 pm on August 31, 2024:

    It would be a fatal error when running in DEBUG mode, but is not compiled in for release.

    when do we throw and when do we Assume here?

    I only added throws to BatchWrite because there are invalid states that can happen in our fuzz test harness which we want to handle properly. A follow up to remove setting the flags directly can help make that more clear and we wouldn’t need to have the throws there anymore.


    l0rinc commented at 1:39 pm on September 15, 2024:
  10. in src/coins.cpp:326 in feaea74b04 outdated
    320@@ -325,14 +321,14 @@ void CCoinsViewCache::SanityCheck() const
    321         if (entry.IsDirty()) attr |= 1;
    322         if (entry.IsFresh()) attr |= 2;
    323         if (entry.coin.IsSpent()) attr |= 4;
    324-        // Only 5 combinations are possible.
    325-        assert(attr != 2 && attr != 4 && attr != 7);
    326+        // Only 4 combinations are possible.
    327+        assert(attr != 2 && attr != 4 && attr != 6 && attr != 7);
    


    l0rinc commented at 11:51 am on August 19, 2024:

    So basically if we lay out the allowed combinations we’d get:

    0assert((!entry.IsDirty() & !entry.IsFresh() & !entry.coin.IsSpent())  |  // attr = 0
    1        (entry.IsDirty() & !entry.IsFresh() & !entry.coin.IsSpent())  |  // attr = 1
    2        (entry.IsDirty() &  entry.IsFresh() & !entry.coin.IsSpent())  |  // attr = 3
    3        (entry.IsDirty() & !entry.IsFresh() &  entry.coin.IsSpent()));   // attr = 5
    

    Which could be simplified to

    0assert((entry.IsDirty() && !(entry.coin.IsSpent() && entry.IsFresh())) || !(entry.coin.IsSpent() || entry.IsFresh()));
    

    which can be laid out as an if expression which would probably document the situation better than != 2 && != 4 && != 6 && != 7:

    0if (entry.coin.IsSpent()) {
    1    assert(entry.IsDirty() && !entry.IsFresh());
    2} else {
    3    assert(entry.IsDirty() || !entry.IsFresh());
    4}
    
     0BOOST_AUTO_TEST_CASE(sanity_check_assertions)
     1{
     2    for (int is_dirty = 0; is_dirty <= 1; ++is_dirty) {
     3        for (int is_fresh = 0; is_fresh <= 1; ++is_fresh) {
     4            for (int is_spent = 0; is_spent <= 1; ++is_spent) {
     5                // Original check
     6                unsigned attr = 0;
     7                if (is_dirty) attr |= 1;
     8                if (is_fresh) attr |= 2;
     9                if (is_spent) attr |= 4;
    10                bool original_result = (attr != 2 && attr != 4 && attr != 6 && attr != 7);
    11
    12                bool new_result = (!is_dirty & !is_fresh & !is_spent) |  // attr = 0
    13                                  (is_dirty & !is_fresh & !is_spent) |   // attr = 1
    14                                  (is_dirty & is_fresh & !is_spent) |    // attr = 3
    15                                  (is_dirty & !is_fresh & is_spent);     // attr = 5
    16                BOOST_CHECK_EQUAL(original_result, new_result);
    17
    18                new_result = ((is_dirty && !(is_spent && is_fresh)) || !(is_spent || is_fresh));
    19                BOOST_CHECK_EQUAL(original_result, new_result);
    20
    21                if (is_dirty) {
    22                    new_result = !(is_spent && is_fresh);
    23                } else {
    24                    new_result = !(is_spent || is_fresh);
    25                }
    26                BOOST_CHECK_EQUAL(original_result, new_result);
    27            }
    28        }
    29    }
    30}
    

    andrewtoth commented at 7:18 pm on August 31, 2024:
    Done.
  11. andrewtoth force-pushed on Aug 19, 2024
  12. in src/test/fuzz/coinscache_sim.cpp:148 in 963804c5f1 outdated
    147@@ -151,14 +148,10 @@ class CoinsViewBottom final : public CCoinsView
    148     {
    


    l0rinc commented at 12:41 pm on August 19, 2024:

    I was surprised to see that the method called GetCoin now:

    • overwrites the dummy coin parameter with the value found in the cache
    • returns a boolean indicating whether the coin exists in the cache and is not spent (the name doesn’t have any hints about it being spent or not)

    I don’t mind if you think it’s outside the scope of the current change, but checking the usages it seems to me we can now simplify GetCoin to actually return the coin and let the callsite check for spent and it might simplify the usages as well:

    0std::optional<Coin> CCoinsViewCache::GetCoin(const COutPoint& outpoint) const
    1{
    2    auto it = FetchCoin(outpoint);
    3    if (it != cacheCoins.end()) {
    4        return it->second.coin;
    5    }
    6    return std::nullopt;
    7}
    

    Which would simplify usages from e.g.

    0Coin coin_using_get_coin;
    1const bool exists_using_get_coin = coins_view_cache.GetCoin(random_out_point, coin_using_get_coin);
    2if (exists_using_get_coin) {
    3    assert(coin_using_get_coin == coin_using_access_coin);
    4}
    5assert((exists_using_access_coin && exists_using_have_coin_in_cache && exists_using_have_coin && exists_using_get_coin) ||
    6       (!exists_using_access_coin && !exists_using_have_coin_in_cache && !exists_using_have_coin && !exists_using_get_coin));
    

    to something like:

    0if (auto coin_optional = coins_view_cache.GetCoin(random_out_point); coin_optional && !coin_optional->IsSpent()) {
    1    assert(*coin_optional == coin_using_access_coin);
    2    assert(exists_using_access_coin && exists_using_have_coin_in_cache && exists_using_have_coin);
    3} else {
    4    assert(!exists_using_access_coin && !exists_using_have_coin_in_cache && !exists_using_have_coin);
    5}
    

    What do you think?


    andrewtoth commented at 1:35 pm on August 19, 2024:
    This behavior is documented here, so not sure why test code bothered to do this. Indeed, switching to std::optional would be cleaner, but I think a much bigger refactor. This is also suggested in the original PR here.

    l0rinc commented at 1:40 pm on August 19, 2024:

    but I think a much bigger refactor

    Maybe, would it help if I did that in a different PR?


    andrewtoth commented at 1:45 pm on August 19, 2024:
    Yes, but it would conflict heavily with this one. Many of the tests would have to be removed as well.

    l0rinc commented at 2:24 pm on August 19, 2024:
    I’m fine with that, extracted to a branch for now, will push as a draft after your PR stabilizes

    l0rinc commented at 7:59 pm on August 19, 2024:
    you can resolve this comment, I see now that this should definitely be investigated in a separate PR (it’s a big change).

    andrewtoth commented at 0:18 am on August 20, 2024:
    I think you can resolve your own comment threads too 😉

  13. in src/coins.h:182 in ffabb2ddcd outdated
    176@@ -184,13 +177,13 @@ struct CCoinsCacheEntry
    177     inline bool IsDirty() const noexcept { return m_flags & DIRTY; }
    178     inline bool IsFresh() const noexcept { return m_flags & FRESH; }
    179 
    180-    //! Only call Next when this entry is DIRTY, FRESH, or both
    181+    //! Only call Next when this entry is DIRTY
    182     inline CoinsCachePair* Next() const noexcept {
    183         Assume(m_flags);
    


    l0rinc commented at 12:46 pm on August 19, 2024:
    0        Assume(IsDirty());
    

    nit: it’s a cheap call, why not make it an assert instead? Note that after that change the comments will become redundant


    andrewtoth commented at 7:24 pm on August 31, 2024:
    Done.

    l0rinc commented at 2:08 pm on September 1, 2024:
    I don’t see it in doc: update comments to reflect possible code paths (b0737df12ed6922624d3e78bbb2bc86e610522cd)
  14. in src/test/coins_tests.cpp:66 in 946a035cdc outdated
    52@@ -53,14 +53,11 @@ class CCoinsViewTest : public CCoinsView
    53 
    54     bool BatchWrite(CoinsViewCacheCursor& cursor, const uint256& hashBlock) override
    55     {
    56-        for (auto it{cursor.Begin()}; it != cursor.End(); it = cursor.NextAndMaybeErase(*it)){
    57-            if (it->second.IsDirty()) {
    


    l0rinc commented at 12:50 pm on August 19, 2024:

    is this is always the case, consider adding an assertion

    0assert(it->second.IsDirty());
    

    (we can remove them in the last commit, but would be useful for the intermediary commits at least)


    andrewtoth commented at 7:18 pm on August 31, 2024:
    Done.
  15. in src/coins.cpp:281 in 286f841c8e outdated
    274@@ -275,7 +275,7 @@ bool CCoinsViewCache::Sync()
    275 void CCoinsViewCache::Uncache(const COutPoint& hash)
    276 {
    277     CCoinsMap::iterator it = cacheCoins.find(hash);
    278-    if (it != cacheCoins.end() && !it->second.IsDirty() && !it->second.IsFresh()) {
    


    l0rinc commented at 12:51 pm on August 19, 2024:
    same, can we add a (temporary) assertion for the removed check?
  16. l0rinc commented at 12:54 pm on August 19, 2024: contributor

    The code became a lot simpler now - have a few suggestion I’d like us to consider, but I understand if you think it’s out of scope.

    Once we get to a state that is ACK-worthy, I’ll enable hard assertions everywhere and run an IBD and check if it crashes anywhere.

  17. in src/coins.cpp:52 in 1f2f8e516b outdated
    48@@ -49,10 +49,8 @@ CCoinsMap::iterator CCoinsViewCache::FetchCoin(const COutPoint &outpoint) const
    49             cacheCoins.erase(ret);
    50             return cacheCoins.end();
    51         }
    52-        if (ret->second.coin.IsSpent()) {
    53-            // The parent only has an empty entry for this outpoint; we can consider our version as fresh.
    54-            ret->second.AddFlags(CCoinsCacheEntry::FRESH, *ret, m_sentinel);
    55-        }
    56+        // GetCoin will never return true if the coin is spent
    


    l0rinc commented at 8:01 pm on August 19, 2024:
    It doesn’t sound like it’s GetCoin’s responsibility, I think the call-sites should be checking for whether the coins are spent or not, not the cache accessor method

    andrewtoth commented at 1:39 am on August 20, 2024:
    It’s specifically documented that this is the behavior. Why shouldn’t we be able to rely on it?

    l0rinc commented at 5:53 am on August 20, 2024:
    If we need to document the behavior of a method, it suggests that we’ve acknowledged it may not be intuitive for everyone, indicating that we’ve conceded on finding a better design. I think there’s a way we can avoid the comment: by separating the two concerns of getting a value from cache and checking if it’s spent or not. Or if you insist, we can call it “GetUnspent” in which case we can also get rid of the comment and call-site asserts and keep the current behavior. What do you think?

    andrewtoth commented at 4:40 pm on August 20, 2024:
    What you are suggesting is basically a concept NACK on this PR. If we aren’t confident that we will never get a spent coin from GetCoin, then we need to keep the logic to handle spent and FRESH coins.

    sipa commented at 4:43 pm on August 20, 2024:
    @paplorinc I feel your criticism here is about the naming/expectations/documentation of the existing code, rather than about this PR. Do you feel that addressing that first should be a blocker for this PR?

    l0rinc commented at 5:06 pm on August 20, 2024:

    I won’t block this since I’m fine with discussing/addressing any issues in later commits as well. I also don’t yet understand the second order effects of these changes, so treat my comments as curious observations.

    So my concern is that in this PR we’re tying coin.IsSpent() with retrieving values from the cache. So if my understanding is correct we should either:

    • be able to get a coin from the cache and ignore/assert whether it’s spent or not - e.g. HaveCoin will now return false when the cache contains it but it’s spent… and if that’s not possible in certain situations I would assert instead of returning the boolean. So in that case GetCoin would just get it from the cache and the call-site will decide whether it should assert for IsSpent or branch based on it, something like this: https://github.com/bitcoin/bitcoin/blob/9db8b30788eff2b8e9c9f629a6a03886a28a3281/src/test/fuzz/coins_view.cpp#L167
    • rename the method to reflect the new responsibility that it has

    There’s also a third possibility that I’m completely off on what this PR does - your patience is appreciated :)


    andrewtoth commented at 8:13 pm on August 22, 2024:

    @l0rinc It seems to me like you are thinking this PR has some behavior change. Any coin retrieved from GetCoin will be unspent if GetCoin returns true. You can follow this logically from seeing the return value here and then the only base class is CCoinsViewDB which does not save spent coins so cannot retrieve them in GetCoin.

    This PR is recognizing that FetchCoin logically cannot get to here with a spent coin. This is a dead code path.


    l0rinc commented at 8:26 pm on August 22, 2024:

    Yeah, I understand that.

    What I’m struggling with is the cache retrieval, and especially the invalidation logic, which seems to be scattered throughout the code and tightly coupled with the business logic.

    Would you be open to a screen-sharing session to discuss this? It would help me better understand the constraints and likely speed up my review process as well.


    l0rinc commented at 11:24 am on August 26, 2024:
    So my understanding is that if a coin is already spent it’s basically “less likely” to be in the cache (0% without reorgs, right?), so we want to get rid of it ASAP to have more likely cache hits. Given that the complete UTXO set doesn’t fit into memory anymore, would it maybe make sense to have a more complex retainment heuristic for each cache layer, e.g. spentness, recency, dust amount, unspendables, time-locked, etc?

    andrewtoth commented at 7:23 pm on August 31, 2024:

    Given that the complete UTXO set doesn’t fit into memory anymore, would it maybe make sense to have a more complex retainment heuristic for each cache layer,

    Since we now have a linked list of entries, we could have another sentinel that tracks coins that have their flags cleared. Then, when doing a flush we could just remove a chunk from the bottom of the linked list (entries added least recently).


    l0rinc commented at 2:24 pm on September 1, 2024:
    comment states exactly what the code already states - it’s just noise, can be removed

    l0rinc commented at 3:16 pm on September 1, 2024:
    That sounds like a really good idea that we should investigate after this PR!

    l0rinc commented at 8:52 pm on September 8, 2024:
    Added #30849 to somewhat mitigate the GetCoin mutations in tests - hoping this will get us closer to excluding invalid tests states.
  18. andrewtoth commented at 6:36 pm on August 20, 2024: contributor

    Once we get to a state that is ACK-worthy, I’ll enable hard assertions everywhere and run an IBD and check if it crashes anywhere.

    I think for a change like this, spending time fuzzing would be more valuable. I believe it already runs in debug mode for that so Assumes would trigger crashes.

  19. in src/txdb.cpp:116 in 1f2f8e516b outdated
    120-                batch.Erase(entry);
    121-            else
    122-                batch.Write(entry, it->second.coin);
    123-            changed++;
    124-        }
    125+    for (auto it{cursor.Begin()}; it != cursor.End(); it = cursor.NextAndMaybeErase(*it)) {
    


    l0rinc commented at 9:05 am on August 26, 2024:
    +1 for moving the mutation to the loop nit: for consistency consider adding braces to the condition
  20. in src/coins.h:187 in 1f2f8e516b outdated
    184         return m_next;
    185     }
    186 
    187-    //! Only call Prev when this entry is DIRTY, FRESH, or both
    188+    //! Only call Prev when this entry is DIRTY
    189     inline CoinsCachePair* Prev() const noexcept {
    


  21. in src/test/fuzz/coinscache_sim.cpp:153 in 1f2f8e516b outdated
    152-                coin.Clear();
    153-                return true;
    154-            }
    155             return false;
    156         } else {
    157             coin = it->second;
    


    l0rinc commented at 9:58 am on August 26, 2024:

    The docs state:

    When false is returned, coin’s value is unspecified.

    Eventually I hope we can untangle these two “return values” (coin & spentness), but temporarily maybe we can improve this guarantee (since I’m very much against unspecified mutations of non-local state) so that we only set the value of coin when it’s unspent (similarly to how Uncache handles this case), e.g.:

    0bool GetCoin(const COutPoint& outpoint, Coin& coin) const final
    1{
    2    if (auto it = m_data.find(outpoint); it != m_data.end() && !it->second.IsSpent()) {
    3        coin = it->second;
    4        return true;
    5    } else {
    6        return false;
    7    }
    8}
    
  22. in src/test/coins_tests.cpp:889 in 1f2f8e516b outdated
    874@@ -883,8 +875,13 @@ BOOST_AUTO_TEST_CASE(ccoins_write)
    875     for (const CAmount parent_value : {ABSENT, SPENT, VALUE1})
    876         for (const CAmount child_value : {ABSENT, SPENT, VALUE2})
    877             for (const char parent_flags : parent_value == ABSENT ? ABSENT_FLAGS : FLAGS)
    878-                for (const char child_flags : child_value == ABSENT ? ABSENT_FLAGS : CLEAN_FLAGS)
    879-                    CheckWriteCoins(parent_value, child_value, parent_value, parent_flags, child_flags, parent_flags);
    880+                for (const char child_flags : child_value == ABSENT ? ABSENT_FLAGS : CLEAN_FLAGS) {
    881+                    if (child_flags == FRESH) {
    


    l0rinc commented at 10:04 am on August 26, 2024:
    nit: are we repeating business logic here? Can we make it more black-box somehow?
  23. in src/coins.cpp:340 in 1f2f8e516b outdated
    337@@ -341,7 +338,7 @@ void CCoinsViewCache::SanityCheck() const
    338         assert(it->second.Next()->second.Prev() == it);
    339         assert(it->second.Prev()->second.Next() == it);
    340         // Verify they are actually flagged.
    341-        assert(it->second.IsDirty() || it->second.IsFresh());
    342+        assert(it->second.IsDirty());
    


    l0rinc commented at 10:06 am on August 26, 2024:
    this will be redundant once we assert it in Prev and Next

    l0rinc commented at 2:54 pm on September 1, 2024:
    We can remove this now
  24. in src/coins.cpp:276 in 1f2f8e516b outdated
    274@@ -278,7 +275,7 @@ bool CCoinsViewCache::Sync()
    275 void CCoinsViewCache::Uncache(const COutPoint& hash)
    276 {
    277     CCoinsMap::iterator it = cacheCoins.find(hash);
    278-    if (it != cacheCoins.end() && !it->second.IsDirty() && !it->second.IsFresh()) {
    279+    if (it != cacheCoins.end() && !it->second.IsDirty()) {
    


    l0rinc commented at 10:07 am on August 26, 2024:

    nit: can be modernized to (note: can likely be deduplicated with GetCoin):

    0    if (auto it = cacheCoins.find(hash); it != cacheCoins.end() && !it->second.IsDirty()) {
    
  25. in src/coins.cpp:325 in 1f2f8e516b outdated
    321@@ -325,14 +322,14 @@ void CCoinsViewCache::SanityCheck() const
    322         if (entry.IsDirty()) attr |= 1;
    323         if (entry.IsFresh()) attr |= 2;
    324         if (entry.coin.IsSpent()) attr |= 4;
    325-        // Only 5 combinations are possible.
    326-        assert(attr != 2 && attr != 4 && attr != 7);
    327+        // Only 4 combinations are possible.
    


    l0rinc commented at 10:39 am on August 26, 2024:

    I’m having a hard time with https://github.com/bitcoin/bitcoin/blob/master/src/coins.cpp#L97, where the coin is obviously spent and can also be non-dirty (according to updatecoins_simulation_test, ccoins_flush_behavior and others), which violates attr != 4.

    If it’s valid for coin to (temporarily?) have that state, shouldn’t SanityCheck allow it?

    Or if this is the state when the parent-cache gets invalidated because of a re-org, could we maybe simplify the possible states by always Sync-ing when a reorg is detected and/or only deleting spent entries from parent after a certain block maturity?


    andrewtoth commented at 7:20 pm on August 31, 2024:

    A coin cannot be spent and not DIRTY. I made a new commit to address this in AddCoin. The only time it will be in this state is when a CCoinsViewCacheEntry is constructed with an empty coin, which then has the coin moved into it a few lines later.

    I think doing a follow up where we remove direct access to flags can remove a lot of code paths that we test for.


    l0rinc commented at 4:10 pm on September 1, 2024:
    That makes sense, as stated in other comments, I would be more comfortable doing that before this refactoring.
  26. in src/coins.cpp:188 in 1f2f8e516b outdated
    187-            continue;
    188+            throw std::logic_error("A non-DIRTY coin was returned from the cursor in BatchWrite");
    189+        }
    190+        if (it->second.IsFresh() && it->second.coin.IsSpent()) {
    191+            throw std::logic_error("A FRESH coin was not removed when it was spent");
    192         }
    


    l0rinc commented at 11:25 am on August 26, 2024:
    I find it inconsistent that we’re throwing here only for testability - since we assume this cannot happen in reality. Basically, this is another “return value” of the method, so declaring it a bool was essentially a lie (especially since it always seems to just return true). Could we use the return value instead to signal failure (and since we think this is impossible, skip the error messages)?

    andrewtoth commented at 6:15 pm on August 31, 2024:
    This can happen in our fuzz tests. I think a follow up to this change would be to remove direct access to the flags, and replace with methods SetDirty and SetDirtyAndFresh. Then we can get rid of fuzzing all kinds of flags and simplify this.

    l0rinc commented at 4:09 pm on September 1, 2024:
    Can we consider fixing that before this PR? I wouldn’t mind doing parts of that myself!

    andrewtoth commented at 4:19 pm on September 8, 2024:
    Are you suggesting to refactor direct flags access with SetDirty and SetDirtyAndFresh first? If we do that first, we would still need a SetFresh setter since we still have a case here with setting FRESH-but-not-DIRTY. It’s kind of a chicken-and-egg problem.

    l0rinc commented at 6:22 pm on September 8, 2024:
    I suggest we make our tests reflect reality before we do this change, since currently if we add asserts about the internal consistency of the code, the tests seem to constantly violate them.
  27. in src/coins.cpp:210 in 1f2f8e516b outdated
    230+            // We can mark it FRESH in the parent if it was FRESH in the child
    231+            // Otherwise it might have just been flushed from the parent's cache
    232+            // and already exist in the grandparent
    233+            if (it->second.IsFresh()) {
    234+                entry.AddFlags(CCoinsCacheEntry::FRESH, *itUs, m_sentinel);
    235             }
    


    l0rinc commented at 11:31 am on August 26, 2024:

    Instead of calling AddFlags twice, in other cases we’ve compressed this into a single line:

    0            entry.AddFlags(CCoinsCacheEntry::DIRTY | (it->second.IsFresh() ? CCoinsCacheEntry::FRESH : 0), *itUs, m_sentinel);
    

    andrewtoth commented at 7:21 pm on August 31, 2024:
    I think I only touch this in whitespace, so I don’t want to refactor this right now.
  28. in src/coins.cpp:194 in 1f2f8e516b outdated
    214-                if (it->second.IsFresh()) {
    215-                    entry.AddFlags(CCoinsCacheEntry::FRESH, *itUs, m_sentinel);
    216-                }
    217+            // Create the coin in the parent cache, move the data up
    218+            // and mark it as dirty.
    219+            itUs = cacheCoins.try_emplace(it->first).first;
    


    l0rinc commented at 11:33 am on August 26, 2024:

    Now that this structure is simpler, we could modernize (and optimize) this find / try_emplace construct as:

    0auto [itUs, inserted] = cacheCoins.try_emplace(it->first);
    1if (inserted) {
    

    andrewtoth commented at 6:17 pm on August 31, 2024:
    Yes, that would be a good follow up!
  29. l0rinc changes_requested
  30. l0rinc commented at 11:51 am on August 26, 2024: contributor
    Dug deeper and added more relevant questions - bear with me if they’re outside the scope of this PR.
  31. DrahtBot added the label Needs rebase on Aug 28, 2024
  32. andrewtoth force-pushed on Aug 31, 2024
  33. andrewtoth force-pushed on Aug 31, 2024
  34. DrahtBot added the label CI failed on Aug 31, 2024
  35. DrahtBot removed the label Needs rebase on Aug 31, 2024
  36. DrahtBot removed the label CI failed on Aug 31, 2024
  37. in src/coins.cpp:88 in 7d2b49926c outdated
    87-        // we would remove it from this cache and would never flush spentness
    88-        // to the parent cache.
    89+        // A spent FRESH coin cannot exist in the cache because a FRESH coin
    90+        // is simply erased when it is spent.
    91         //
    92         // Re-adding a spent coin can happen in the case of a re-org (the coin
    


    l0rinc commented at 2:23 pm on September 1, 2024:
    can this still happen during a reorg? The comment here is really scary, do we really need so much context here? I’d prefer a test that fails when the code is wrong instead of a really long description…
  38. in src/coins.cpp:93 in 7d2b49926c outdated
     95         //
     96-        // If the coin doesn't exist in the current cache, or is spent but not
     97-        // DIRTY, then it can be marked FRESH.
     98-        fresh = !it->second.IsDirty();
     99+        // If the coin doesn't exist in the current cache then it can be marked FRESH.
    100+        fresh = inserted;
    


    l0rinc commented at 2:51 pm on September 1, 2024:

    The freshness flag depend on inserted, possible_overwrite and it->second.coin.IsSpent(), assigned and validated in different places. Could we obviate the freshness flag calculation instead with something like:

    0bool fresh_flag = inserted && !possible_overwrite ? CCoinsCacheEntry::FRESH : 0;
    

    Either here or in a follow-up we could simplify & modernize the method to something like:

     0void CCoinsViewCache::AddCoin(const COutPoint &outpoint, Coin&& coin, bool possible_overwrite) {
     1    assert(!coin.IsSpent());
     2    if (coin.out.scriptPubKey.IsUnspendable()) return;
     3
     4    auto [it, inserted] = cacheCoins.try_emplace(outpoint);
     5    if (!inserted) {
     6        if (!possible_overwrite && !it->second.coin.IsSpent()) throw std::logic_error("Attempted to overwrite an unspent coin (when possible_overwrite is false)");
     7        cachedCoinsUsage -= it->second.coin.DynamicMemoryUsage();
     8    }
     9    cachedCoinsUsage += coin.DynamicMemoryUsage();
    10    it->second.coin = std::move(coin);
    11
    12    bool fresh_flag = inserted && !possible_overwrite ? CCoinsCacheEntry::FRESH : 0;
    13    it->second.AddFlags(CCoinsCacheEntry::DIRTY | fresh_flag, *it, m_sentinel);
    14
    15    TRACE5(utxocache, add,
    16           outpoint.hash.data(),
    17           (uint32_t) outpoint.n,
    18           (uint32_t) it->second.coin.nHeight,
    19           (int64_t) it->second.coin.out.nValue,
    20           (bool) it->second.coin.IsCoinBase());
    21}
    
  39. in src/coins.cpp:195 in 7d2b49926c outdated
    190         }
    191         CCoinsMap::iterator itUs = cacheCoins.find(it->first);
    192         if (itUs == cacheCoins.end()) {
    193             // The parent cache does not have an entry, while the child cache does.
    194-            // We can ignore it if it's both spent and FRESH in the child
    195-            if (!(it->second.IsFresh() && it->second.coin.IsSpent())) {
    


    l0rinc commented at 2:51 pm on September 1, 2024:
    still think we should add an assertion here (temporarily at least)
  40. in src/coins.h:180 in 7d2b49926c outdated
    176@@ -184,15 +177,15 @@ struct CCoinsCacheEntry
    177     inline bool IsDirty() const noexcept { return m_flags & DIRTY; }
    178     inline bool IsFresh() const noexcept { return m_flags & FRESH; }
    179 
    180-    //! Only call Next when this entry is DIRTY, FRESH, or both
    181+    //! Only call Next when this entry is DIRTY
    


    l0rinc commented at 2:54 pm on September 1, 2024:
    the comments are redundant now, the code already makes this clear
  41. in src/coins.h:186 in 7d2b49926c outdated
    184+        Assume(IsDirty());
    185         return m_next;
    186     }
    187 
    188-    //! Only call Prev when this entry is DIRTY, FRESH, or both
    189+    //! Only call Prev when this entry is DIRTY
    


    l0rinc commented at 2:55 pm on September 1, 2024:
    same
  42. in src/test/coins_tests.cpp:787 in 7d2b49926c outdated
    786@@ -793,7 +787,7 @@ BOOST_AUTO_TEST_CASE(ccoins_add)
    787      */
    


    l0rinc commented at 3:30 pm on September 1, 2024:
    unrelated: could coins_cache_simulation_testbe moved into a fuzz test - or would it be too difficult to track all those states?. If you think any of these recommendations can be done in a parallel PR, let me know and I’ll do them myself before this is merged
  43. in src/test/coins_tests.cpp:888 in 7d2b49926c outdated
    884@@ -892,8 +885,13 @@ BOOST_AUTO_TEST_CASE(ccoins_write)
    885     for (const CAmount parent_value : {ABSENT, SPENT, VALUE1})
    886         for (const CAmount child_value : {ABSENT, SPENT, VALUE2})
    887             for (const char parent_flags : parent_value == ABSENT ? ABSENT_FLAGS : FLAGS)
    888-                for (const char child_flags : child_value == ABSENT ? ABSENT_FLAGS : CLEAN_FLAGS)
    889-                    CheckWriteCoins(parent_value, child_value, parent_value, parent_flags, child_flags, parent_flags);
    890+                for (const char child_flags : child_value == ABSENT ? ABSENT_FLAGS : CLEAN_FLAGS) {
    


    l0rinc commented at 4:05 pm on September 1, 2024:

    I might be completely off here, but in case of e.g. CheckWriteCoins(ABSENT, SPENT , FAIL , NO_ENTRY , DIRTY|FRESH, NO_ENTRY ); we’re calling .IsDirty() which returns false, but m_next and m_prev are set because CheckWriteCoins calls them with different flags.

    Can this happen during actual usage, i.e. shouldn’t they only be part of the internal linked list when they’re dirty?

    It bothers me a lot how many of the simulated combinations don’t always seem to reflect real usage - while comments warn us of situations that may not be covered with tests at all. Would it maybe make sense to fix the reliability of the tests before this PR?

  44. l0rinc changes_requested
  45. l0rinc commented at 4:11 pm on September 1, 2024: contributor
    The unreliability of many of our tests worries me, would you be open to fixing them in a PR before this one?
  46. DrahtBot added the label Needs rebase on Sep 2, 2024
  47. test: never return true from GetCoins for spent entries 273b9e4a0c
  48. coins: Assume coins returned from GetCoin cannot be spent ed9cd0cea4
  49. coins: throw errors for non-DIRTY or spent and FRESH coins in BatchWrite
    There are no code paths which add a non-DIRTY entry to the cursor in
    BatchWrite, so we can safely make this a logic error and test for it.
    
    There are no code paths which add a spent and FRESH coin to the cursor
    in BatchWrite, so we can safely make this a logic error and test for it.
    a414da3304
  50. doc: update comments to reflect possible code paths cbed14a423
  51. coins: remove redundant IsDirty checks in BatchWrite
    It is no longer possible to get non-DIRTY entries in BatchWrite
    f724cac945
  52. coins: remove IsFresh check in Uncache
    It is not possible for an entry to be FRESH if not already DIRTY.
    9ae555f347
  53. test: don't check for FRESH but not DIRTY entries in SanityCheck d8eb28998c
  54. coins: assume entry is dirty in Next and Prev 8c3b887c7b
  55. test: simplify coins sanity check 5d95794171
  56. coins: set all inserted spent coins to fresh
    A spent coins must be DIRTY, so remove
    references to spent but not DIRTY coins.
    The only way a spent coin can be not DIRTY
    is when creating the CCoinsCacheEntry with
    an empty coin. This can be made more clear
    by setting fresh if inserted, instead of
    checking if an unspent coin is not DIRTY.
    34a101d4ba
  57. andrewtoth force-pushed on Sep 2, 2024
  58. DrahtBot removed the label Needs rebase on Sep 3, 2024
  59. in src/test/fuzz/coinscache_sim.cpp:155 in 34a101d4ba
    147@@ -151,14 +148,10 @@ class CoinsViewBottom final : public CCoinsView
    148     {
    149         auto it = m_data.find(outpoint);
    150         if (it == m_data.end()) {
    151-            if ((outpoint.n % 5) == 3) {
    152-                coin.Clear();
    


    l0rinc commented at 1:25 pm on September 7, 2024:
    I found this very confusing - a coin getter checking the cache, and if it’s not found, it clears the coin that it was supposed to return?
  60. in src/test/fuzz/coinscache_sim.cpp:151 in 34a101d4ba
    147@@ -151,14 +148,10 @@ class CoinsViewBottom final : public CCoinsView
    148     {
    149         auto it = m_data.find(outpoint);
    150         if (it == m_data.end()) {
    151-            if ((outpoint.n % 5) == 3) {
    152-                coin.Clear();
    153-                return true;
    154-            }
    155             return false;
    


    l0rinc commented at 1:56 pm on September 7, 2024:
  61. andrewtoth referenced this in commit de0d22eb6c on Sep 7, 2024
  62. in src/test/coins_tests.cpp:54 in 34a101d4ba
    50@@ -51,25 +51,19 @@ class CCoinsViewTest : public CCoinsView
    51             return false;
    52         }
    53         coin = it->second;
    54-        if (coin.IsSpent() && m_rng.randbool() == 0) {
    


    l0rinc commented at 8:31 pm on September 8, 2024:
    Did this change require any other test adjustment?

    andrewtoth commented at 8:38 pm on September 8, 2024:
    No, but without this the next commit would break.
  63. in src/test/coins_tests.cpp:790 in 34a101d4ba
    786@@ -793,7 +787,7 @@ BOOST_AUTO_TEST_CASE(ccoins_add)
    787      */
    788     CheckAddCoin(ABSENT, VALUE3, VALUE3, NO_ENTRY   , DIRTY|FRESH, false);
    789     CheckAddCoin(ABSENT, VALUE3, VALUE3, NO_ENTRY   , DIRTY      , true );
    790-    CheckAddCoin(SPENT , VALUE3, VALUE3, 0          , DIRTY|FRESH, false);
    791+    CheckAddCoin(SPENT , VALUE3, VALUE3, 0          , DIRTY      , false);
    


    l0rinc commented at 11:21 am on September 12, 2024:

    It seems to me that production code never actually calls AddFlags with a 0 value — we are always setting DIRTY, even when FRESH is set. Therefore, it would make sense to change AddFlags from using a custom uint8_t flags parameter (which is always DIRTY and sometimes FRESH in practice, and only called with a 0 in tests) to using a freshness parameter, such as SetDirtyAndFresh(CoinsCachePair& self, CoinsCachePair& sentinel, bool is_fresh = false).

    This change would allow us to remove a lot of invalid test cases like this one (i.e. where flag is 0 or non-dirty).

    As you’ve also mentioned, this would also allow us to remove the internal flags entirely. Instead of exposing the flags, we could simplify the implementation like this:

    0inline bool IsDirty() const noexcept
    1{
    2    assert(!m_next == !m_prev);
    3    return !!m_next;
    4}
    5inline bool IsFresh() const noexcept { return m_is_fresh; }
    

    andrewtoth commented at 12:27 pm on September 12, 2024:

    Yes, absolutely! But, this PR or something else would need to be merged first, or we would need to have a method SetFresh because there’s still a case where we have FRESH but not DIRTY.

    I think we should keep the internal flags as a bitmap, since if we introduce new fields for each state it will increase the memory footprint of each CoinsCachePair.


    l0rinc commented at 12:34 pm on September 12, 2024:

    because there’s still a case where we have FRESH but not DIRTY.

    I only see https://github.com/bitcoin/bitcoin/blob/34a101d4babd0cd2df86b414da385c3bded84027/src/coins.cpp#L204-L210 and https://github.com/bitcoin/bitcoin/blob/34a101d4babd0cd2df86b414da385c3bded84027/src/coins.cpp#L96 in production code - in both cases we’re only setting FRESH together with DIRTY. Did I miss anything in production code?

    I think we should keep the internal flags as a bitmap, since if we introduce new fields for each state it will increase the memory footprint of each CoinsCachePair.

    I would rather we simplify to boolean, and if we need more flags, change it to an uint8_t again - I don’t like over-generalizing for hypothetical scenarios - unless we’re sure about that future direction (which I understood isn’t the case).


    andrewtoth commented at 1:28 pm on September 12, 2024:

    This case will still have to be handled if this PR is not merged first: https://github.com/bitcoin/bitcoin/blob/master/src/coins.cpp#L52-L55

    I would rather we simplify to boolean, and if we need more flags, change it to an uint8_t again

    What I meant was adding a new field for DIRTY and a new field for FRESH. Since bools must be at least 1 byte it could make the pair larger depending on alignment.


    l0rinc commented at 2:29 pm on September 12, 2024:
    But we don’t need bool for dirty anymore, only for freshness - so we can even get rid of the enum (moving it over to the test maybe). I’ll experiment to see what the next smallest change is before this PR, we need to have tests that are more representative of what’s actually possible.
  64. in src/coins.cpp:342 in 34a101d4ba
    336@@ -341,7 +337,7 @@ void CCoinsViewCache::SanityCheck() const
    337         assert(it->second.Next()->second.Prev() == it);
    338         assert(it->second.Prev()->second.Next() == it);
    339         // Verify they are actually flagged.
    340-        assert(it->second.IsDirty() || it->second.IsFresh());
    341+        assert(it->second.IsDirty());
    342         // Count the number of entries actually in the list.
    343         ++count_linked;
    


    l0rinc commented at 12:23 pm on September 12, 2024:
    to avoid an infinite loop when the linking is incorrect (e.g. next returns itself), we could assert(count_linked++ < count_flagged);
  65. andrewtoth marked this as a draft on Sep 19, 2024
  66. achow101 referenced this in commit 74fb19317a on Oct 24, 2024
  67. DrahtBot commented at 6:14 pm on October 24, 2024: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  68. DrahtBot added the label Needs rebase on Oct 24, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me