coins: allow write to disk without cache drop #17487

pull jamesob wants to merge 4 commits into bitcoin:master from jamesob:2019-11-au-coins-erase changing 5 files +288 −28
  1. jamesob commented at 4:17 pm on November 15, 2019: member

    This is part of the assumeutxo project:

    Parent PR: #15606 Issue: #15605 Specification: https://github.com/jamesob/assumeutxo-docs/tree/master/proposal


    In certain circumstances, we may want to flush chainstate data to disk without emptying cacheCoins, which affects performance. UTXO snapshot activation is one such case, as we populate cacheCoins with the snapshot contents and want to persist immediately afterwards but also enter IBD.

    See also #15265, which makes the case that under normal operation a flush-without-erase doesn’t necessarily add much benefit. I open this PR even in light of the previous discussion because (i) flush-without-erase almost certainly provides benefit in the case of snapshot activation (especially on spinning disk hardware) and (ii) this diff is fairly small and gives us convenient options for more granular cache management without changing existing policy.

    See also #15218.

  2. jamesob force-pushed on Nov 15, 2019
  3. DrahtBot added the label Tests on Nov 15, 2019
  4. DrahtBot added the label UTXO Db and Indexes on Nov 15, 2019
  5. in src/coins.cpp:154 in bd44f51d79 outdated
    151+        if (erase) {
    152+            return mapCoins.erase(it);
    153+        } else {
    154+            return it++;
    155+        }
    156+    };
    


    maflcko commented at 6:45 pm on November 15, 2019:

    Could use the ternary operator instead for less code and no captures?

    0for ( ... ; it = erase ? mapCoins.erase(it) : ++it) {
    
  6. maflcko commented at 6:46 pm on November 15, 2019: member
    Does this speed up IBD with low dbcache? If so, a benchmark would be nice
  7. jamesob commented at 6:51 pm on November 15, 2019: member

    Does this speed up IBD with low dbcache? If so, a benchmark would be nice

    No, this PR doesn’t change any existing behavior - it simply introduces the option to avoid erasing the cache (which is later used by assumeutxo’s snapshot activation code). This changeset shouldn’t cause any sort of performance difference.

  8. in src/coins.h:278 in bd44f51d79 outdated
    274@@ -275,7 +275,7 @@ class CCoinsViewCache : public CCoinsViewBacked
    275      * Failure to call this method before destruction will cause the changes to be forgotten.
    276      * If false is returned, the state of this cache (and its backing view) will be undefined.
    277      */
    278-    bool Flush();
    279+    bool Flush(bool clear_cache = true);
    


    ariard commented at 8:30 pm on November 15, 2019:
    Isn’t passing the argument explicitly better than with a default one? Also it’s worthy to extend method comment a bit with effect of argument on cache.
  9. ariard commented at 8:55 pm on November 15, 2019: contributor

    Code Review ACK bd44f51

    Can you slightly note in commit message than this change is not used right now ?

    CCoinsViewCache::Flush is called in FlushStateToDisk, DisconnectTip, ConnectTip, ReplayBlocks but as new default argument is true there shouldn’t be behavior change.

  10. DrahtBot commented at 9:08 pm on November 15, 2019: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK sipa, achow101, Sjors
    Concept ACK ryanofsky
    Stale ACK ariard, dongcarl, jnewbery, jonatack, andrewtoth, fjahr

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #25862 (refactor, kernel: Remove gArgs accesses from dbwrapper and txdb by ryanofsky)
    • #25325 (Add pool based memory resource by martinus)
    • #9384 (CCoinsViewCache code cleanup & deduplication by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  11. dongcarl commented at 9:25 pm on November 15, 2019: contributor
    Code Review ACK bd44f51d790edc07b7e1224e0f8bfce4252c74d5 Checked that it doesn’t change any behavior, if the compiler is smart enough about detecting unused codepaths/function signatures, it will probably even emit an unchanged binary.
  12. maflcko removed the label Tests on Nov 15, 2019
  13. jamesob force-pushed on Nov 15, 2019
  14. jamesob commented at 9:38 pm on November 15, 2019: member
    Thanks for the looks, everyone. I’ve incorporated feedback from @MarcoFalke and @ariard. (diff)
  15. Sjors commented at 4:31 pm on November 19, 2019: member
  16. maflcko commented at 4:57 pm on November 19, 2019: member

    Given that this does not improve performance (see #15265 (comment)), is this needed at all? If so, sharing a benchmark with steps to reproduce would be helpful.

    Anyway, I’d like to see the tests in src/test/coins_tests.cpp updated to cover the newly added paths.

  17. ryanofsky commented at 5:08 pm on November 19, 2019: contributor

    Concept ACK, but:

    • I think the PR description should be clearer that this change by itself doesn’t change behavior.
    • I don’t understand how this speeds up UTXO snapshot activation if conclusion from #15265 is that the cache is only really functioning as a write cache and not a read cache.
    • I agree with Marco about testing. This definitely needs unit tests or some coverage to unsure this isn’t adding broken functionality, or code that could easily be broken in the future with no one noticing.
    • I agree with Antoine about the Flush method signature. Using a default boolean argument seems error prone. Explicit parameter would be ok, but better would just be to have separate methods like Flush() and Sync() or Flush() and Write()
  18. jamesob commented at 5:18 pm on November 19, 2019: member

    Alright, given the controversy here now is probably as good a time as any to back up and figure out exactly what cacheCoins’ impact is on performance, for both this PR and posterity, so I’m going to benchmark

    1. a reindex to 550,000 with and without an in-memory UTXO cache, and then
    2. UTXO snapshot sync-to-tip with and without this changeset.

    I’ll get back with some results in the next few days, and hopefully will write up an elucidation on where exactly CCoinsViewCache does or doesn’t provide benefit.

    (Though I wonder how cross-platform differences beyond SSD-or-not play into this, e.g. the cache may or may not show importance based on leveldb’s use of mmap & max_open_files, which is particularly relevant on Win32.)

  19. laanwj commented at 11:42 am on November 20, 2019: member

    the cache may or may not show importance based on leveldb’s use of mmap & max_open_files, which is particularly relevant on Win32

    FWIW, #17398 adds leveldb mmap support on Windows, might want to try with and without that PR when doing benchmarking on Windows.

  20. jamesob commented at 7:04 pm on November 21, 2019: member

    After benching, I stand by this change as being significantly useful. I compare HEAD of the assumeutxo parent PR (utxo-dumpload-compressed) to a branch of it which drops use of the erase=false parameter, erasing the coins after snapshot load (bench/au.no-erase, changes). The benchmark compares the time taken to sync to block 604,667 after loading a snapshot taken at height 600,000.

    The result is a significant degradation in performance on spinning disk hosts (4.5 hours vs. 1.5 hours) - see the bench-hdd-3 results. There is a modest but probably negligible degradation on SSD hosts.

     0host         tag                      time       time% maxmem  cpu%  dbcache
     1bench-ssd-4  bench/au.no-erase        47:42.80   1.00  6689.85MB 125%  5000MB
     2bench-ssd-4  bench/au.no-erase        44:15.66   0.93  6560.45MB 136%  5000MB
     3bench-ssd-4  utxo-dumpload-compressed 40:13.30   0.84  6662.75MB 149%  5000MB
     4bench-ssd-4  utxo-dumpload-compressed 41:36.01   0.87  6652.87MB 143%  5000MB
     5
     6bench-ssd-5  bench/au.no-erase        41:33.65   0.95  6754.54MB 144%  5000MB
     7bench-ssd-5  bench/au.no-erase        41:32.42   0.95  6561.39MB 145%  5000MB
     8bench-ssd-5  utxo-dumpload-compressed 43:51.50   1.00  6758.98MB 135%  5000MB
     9bench-ssd-5  utxo-dumpload-compressed 36:39.94   0.84  6649.31MB 162%  5000MB
    10
    11bench-hdd-3  utxo-dumpload-compressed 1:47:08    0.38  6654.23MB 57%   5000MB
    12bench-hdd-3  utxo-dumpload-compressed 1:30:48    0.32  6612.49MB 66%   5000MB
    13bench-hdd-3  bench/au.no-erase        4:24:53    0.93  6578.77MB 23%   5000MB
    14bench-hdd-3  bench/au.no-erase        4:45:29    1.00  6583.10MB 21%   5000MB
    

    This benchmark is reproducible by running this script:

    0./bin/run_remote.py au --hosts [hostnames ...]
    

    I’ll add tests to make sure this change works as-advertised, but I think it’s pretty clearly a useful option for us to have.

  21. jamesob commented at 7:37 pm on November 21, 2019: member
    @sdaftuar just pointed out to me that this code is very wrong - because I don’t wipe the flags of flushed coins, this code ends up with an incorrect on-disk UTXO set (since a coin with the FRESH flag will only be removed from the in-memory cache and the delete will not propagate to leveldb). Going to fix this bug and reevaluate and pledge to never doubt Suhas’ benchmarks again.
  22. jamesob commented at 7:38 pm on November 21, 2019: member
    (FWIW @Sjors wins Employee of the Month for spotting this bug in a previous comment.)
  23. Sjors commented at 11:59 am on November 22, 2019: member

    Took me a while to understand what you’re doing here. To rephrase in my own words: when you load the snapshot from disk you end up with a coin cache at e.g. 600,000. Depending on the users dbcache setting, part of this is already in RAM, which is potentially nice when continuing IBD.

    https://github.com/jamesob/bitcoin/commit/85a73a074389c80c09e2acc877f45e06ab3f5976#diff-24efdb00bfbe56b140fb006b562cc70bL5514-L5530

    If -dbcache is less than the size of the snapshot, the only the most recent (?) coins of the snapshot will in RAM, the rest would already have been flushed. This is probably similar to what happens during a normal IBD when dbcache overflows. (we could use some heuristics to sort coins by life expectancy)

    For some reason (why?) you need to flush at the end of loading the snapshot, which normally means no coins are in RAM. This PR changes that last flush to keep stuff around.

    IIUC the main benefit of this cache is to reduce the number of unnecessary writes, i.e. when a coin is created and then destroyed we save 2 disk writes. But when we flush, even without deleting the coins from RAM, we expect 1 write if the coin is spent before the tip, otherwise no write.

    Looking forward to the new benchmark. I suggest doing this with a ~1 month old snapshot (realistic for users who download immediately after a 1 month release candidates) and a 4 month old snapshot (the average age if we ship every 6 months).

  24. jamesob commented at 4:16 pm on November 22, 2019: member

    I’m surprised to report that even after fixing the (consensus!) bug that @Sjors and @sdaftuar pointed out, this change still shows considerable improvement for the assumeutxo snapshot-load use. We’re still seeing 3x reduction in time with this change applied for spinning disks while doing a from-600k snapshot IBD.

     0host         tag                      time       time% maxmem  cpu%  dbcache
     1bench-ssd-4  utxo-dumpload-compressed 31:48.95   0.70  6787.93MB 179%  5000MB
     2bench-ssd-4  utxo-dumpload-compressed 36:06.14   0.80  6689.20MB 157%  5000MB
     3bench-ssd-4  bench/au.no-erase.1      43:21.22   0.96  6689.82MB 137%  5000MB
     4bench-ssd-4  bench/au.no-erase.1      45:14.75   1.00  6739.20MB 131%  5000MB
     5
     6bench-ssd-5  utxo-dumpload-compressed 33:08.03   0.77  6787.14MB 172%  5000MB
     7bench-ssd-5  utxo-dumpload-compressed 31:13.55   0.73  6731.26MB 182%  5000MB
     8bench-ssd-5  bench/au.no-erase.1      42:49.74   1.00  6754.30MB 139%  5000MB
     9bench-ssd-5  bench/au.no-erase.1      41:22.71   0.97  6557.05MB 144%  5000MB
    10
    11bench-hdd-3  bench/au.no-erase.1      4:34:00    0.98  6530.24MB 22%   5000MB
    12bench-hdd-3  bench/au.no-erase.1      4:39:21    1.00  6562.12MB 21%   5000MB
    13bench-hdd-3  utxo-dumpload-compressed 1:30:11    0.32  6736.32MB 63%   5000MB
    14bench-hdd-3  utxo-dumpload-compressed 1:33:19    0.33  6702.03MB 61%   5000MB
    

    (host specs here)

    Applying @sdaftuar’s fix (here in bench/au.no-erase.1, and here in utxo-dumpload-compressed) I think actually improves runtime for the single Flush(erase=false) call that we make after finishing the snapshot load because now we’re actually removing spent coins from cacheCoins, thus freeing up more cache space going into the IBD.


    For some reason (why?) you need to flush at the end of loading the snapshot, which normally means no coins are in RAM. This PR changes that last flush to keep stuff around.

    Thanks for the nice summary, @Sjors. I’m not sure that we necessarily need to flush after loading the snapshot, but I figured it was prudent to ensure that we’d resume the snapshot-based sync even after an unclean shutdown - maybe it’s better to just omit the Flush() call altogether after loading the snapshot?

    Tangentially, @sdaftuar mentioned the other day a potential technique for partial flushes that may accomplish the kind of optimization that this branch and #15265 are aiming for. The idea would be to allocate some of the cacheCoins memory to a secondary data structure that would serve as an index into cacheCoins, allowing fast lookup by spentness and height of the in-memory coins. That way, on FlushStateToDisk() we could do a partial flush, only making disk writes for those coins which are either unspent and of a certain age or spent and on-disk. This would make use of the fact that most coins are very short-lived (see figure 2 in @adiabat’s excellent utreexo paper).

    I think this is a promising idea and I’ll be experimenting with it soon - though of course that’s outside the scope of this PR.

  25. andrewtoth commented at 8:22 pm on November 22, 2019: contributor
    What about using Flush(erase=false) for all periodic flushes after IBD? That way when running with a high enough dbcache a flush would never empty cacheCoins, which would surely improve performance. That’s also the only real issue with #15218 .
  26. jamesob force-pushed on Dec 3, 2019
  27. jamesob commented at 8:42 pm on December 3, 2019: member

    I’ve pushed some test coverage for the new erase parameter (and more generally some explicit tests for coins cache behavior). Notably, the new test cases use CCoinsViewDB at the base of the cache structure, which isn’t currently tested explicitly anywhere else in the unittest suite - though of course an instance lives at the heart of every CChainState.

    Throughout the course of writing (and debugging) the tests, I found yet another bug that had to do with std::moveing the coins in CCoinsView::BatchWrite even when erase=false, which was causing the coin.out.scriptPubKey to appear null in the child-most cache. This is just another testament to how tricky the coins cache code is to get right (as I was warned months ago by @sdaftuar). But that said, I feel pretty confident that this change is now correct.

  28. in src/coins.cpp:213 in 1e68aad46e outdated
    212+        cacheCoins.clear();
    213+        cachedCoinsUsage = 0;
    214+    } else {
    215+        // Instead of clearing the cache, just clear the FRESH/DIRTY
    216+        // flags, and erase any spent coins
    217+        for (auto it = cacheCoins.begin(); it != cacheCoins.end(); ) {
    


    maflcko commented at 9:00 pm on December 3, 2019:

    Is it true that this loop is very slow with high dbcache? If so, a comment would be appropriate.

    Other than that I also agree with @ryanofsky comment to have “separate methods like Flush() and Sync() or Flush() and Write()”. As a rule of thumb, if two functions have a completely different implementation, they should not be bundled in one and switched by a boolean.


    jamesob commented at 5:10 pm on December 4, 2019:

    In my own benchmarking, I haven’t seen anything to support the notion that this loop takes a long time. In the newest benchmarks (to be posted), calling Flush(erase=true) on a cache of size 3826 MB takes 7:17 and calling Flush(erase=false) on a cache of the same size takes 6:03 so the difference is either negligible or favorable.

    I got the times by subtracting timestamps from the last and second-to-last loglines below for each branch during snapshot activation.

    Before this change

    02019-12-03T23:18:26Z [snapshot] 63000000 coins loaded (99.39%, 3826 MB)
    12019-12-03T23:18:26Z [snapshot] loaded 63389760 (3877 MB) coins from snapshot 00000000000000000007316856900e76b4f7a9139cfbfba89842c8d196cd5f91
    22019-12-03T23:18:26Z [snapshot] flushing snapshot chainstate to disk
    32019-12-03T23:25:43Z [snapshot] validated snapshot (592 MB)
    

    After this change

    02019-12-03T23:48:09Z [snapshot] 63000000 coins loaded (99.39%, 3826 MB)
    12019-12-03T23:48:10Z [snapshot] loaded 63389760 (3877 MB) coins from snapshot 00000000000000000007316856900e76b4f7a9139cfbfba89842c8d196cd5f91
    22019-12-03T23:48:10Z [snapshot] flushing snapshot chainstate to disk
    32019-12-03T23:54:13Z [snapshot] validated snapshot (3877 MB)
    

    I’ll make the Flush() + Sync() change.

  29. jamesob commented at 5:12 pm on December 4, 2019: member

    Latest round of benchmarks for the current tip of this branch continues to show good performance improvements for UTXO snapshot load, even on SSD. (same setup as #17487 (comment))

     0host         tag                      time       time% maxmem  cpu%  dbcache
     1bench-ssd-4  bench/au.no-erase.1      43:27.79   0.98  6572.45MB 137%  5000MB
     2bench-ssd-4  bench/au.no-erase.1      44:10.18   1.00  6690.36MB 135%  5000MB
     3bench-ssd-4  utxo-dumpload.54         32:59.07   0.75  6733.55MB 173%  5000MB
     4bench-ssd-4  utxo-dumpload.54         33:19.00   0.75  6725.27MB 171%  5000MB
     5
     6bench-ssd-5  utxo-dumpload.54         31:32.39   0.60  6769.57MB 182%  5000MB
     7bench-ssd-5  utxo-dumpload.54         31:11.39   0.60  6699.07MB 183%  5000MB
     8bench-ssd-5  bench/au.no-erase.1      41:41.13   0.80  6772.39MB 142%  5000MB
     9bench-ssd-5  bench/au.no-erase.1      52:16.76   1.00  6567.82MB 114%  5000MB
    10
    11bench-hdd-3  bench/au.no-erase.1      4:37:10    1.00  6569.48MB 21%   5000MB
    12bench-hdd-3  bench/au.no-erase.1      4:36:03    1.00  6523.61MB 21%   5000MB
    13bench-hdd-3  utxo-dumpload.54         1:30:22    0.33  6705.20MB 63%   5000MB
    14bench-hdd-3  utxo-dumpload.54         1:36:20    0.35  6735.74MB 59%   5000MB
    
  30. jamesob force-pushed on Dec 5, 2019
  31. jamesob renamed this:
    coins: add `erase` parameter to control cacheCoins drop on flush
    coins: allow Flush() without cache drop
    on Dec 5, 2019
  32. jamesob commented at 4:53 pm on December 5, 2019: member
    I’ve split Flush(bool erase) into Sync() and Flush() as requested by @ryanofsky @MarcoFalke. (changes)
  33. ryanofsky commented at 9:02 pm on January 27, 2020: contributor
  34. Sjors commented at 11:57 am on January 29, 2020: member

    Review club notes:

    Preparation notes; it’s scheduled for ~Wednesday the 29th~ tonight, which explains why I couldn’t find the minutes :-)

  35. in src/test/coins_tests.cpp:159 in 640861a52a outdated
    149@@ -150,9 +150,16 @@ BOOST_AUTO_TEST_CASE(coins_cache_simulation_test)
    150             bool test_havecoin_after = InsecureRandBits(2) == 0;
    151 
    152             bool result_havecoin = test_havecoin_before ? stack.back()->HaveCoin(COutPoint(txid, 0)) : false;
    153-            const Coin& entry = (InsecureRandRange(500) == 0) ? AccessByTxid(*stack.back(), txid) : stack.back()->AccessCoin(COutPoint(txid, 0));
    154+
    155+            // Infrequently, test usage of AccessByTxid instead of AccessCoin - the
    156+            // former just delegates to the latter and returns the first unspent in a txn.
    


    Sjors commented at 12:26 pm on January 29, 2020:
    If COutPoint(txid, 0) doesn’t exist and COutPoint(txid, 0) does, then AccessByTxid will randomly behave different from AccessCoin. But the coins are random anyway.

    jamesob commented at 5:46 pm on May 28, 2020:
    I don’t understand the original comment - is there a typo in one of your COutPoints?
  36. in src/test/coins_tests.cpp:241 in b12c4b092d outdated
    232         if (InsecureRandRange(100) == 0) {
    233             // Every 100 iterations, change the cache stack.
    234             if (stack.size() > 0 && InsecureRandBool() == 0) {
    235                 //Remove the top cache
    236-                BOOST_CHECK(stack.back()->Flush());
    237+                bool should_erase = InsecureRandRange(4) < 3;
    


    Sjors commented at 1:52 pm on January 29, 2020:
    Comment above should be “Flush or sync the top cache”. Are you sure this test is working as advertised? What does delete stack.back(); and stack.pop_back(); do here? Making them conditional doesn’t break the test.

    jamesob commented at 5:43 pm on May 28, 2020:
    Yeah, the point is to exercise cases where we do delete/pop parts of the cache stack - the test should pass in any case.
  37. in src/test/coins_tests.cpp:909 in eebaca7620 outdated
    888+    coin.fCoinBase = 0;
    889+    return coin;
    890+}
    891+
    892+
    893+//! For CCoinsViewCache instances backed by either another cache instance and
    


    Sjors commented at 2:12 pm on January 29, 2020:
    nit: “either … and”?

    jamesob commented at 5:49 pm on May 28, 2020:
    fixed
  38. Sjors commented at 2:30 pm on January 29, 2020: member

    I left a few comments mid-review to help with tonights review club. So far eebaca7620bbd0af0ec385c6c7d47b2b4b524d55 looks good, but I need to stare at it a bit more. The new (clarification of) tests are great.

    ~You may want to rebase, because without that on macOS I get errors during make:~ (this was an issue with my machine)

    0Undefined symbols for architecture x86_64:
    1  "_CRYPTO_num_locks", referenced from:
    2      (anonymous namespace)::RNGState::RNGState() in libbitcoin_util.a(libbitcoin_util_a-random.o)
    3  "_RAND_cleanup", referenced from:
    
  39. jonatack commented at 4:23 pm on January 29, 2020: member

    You may want to rebase, because without that on macOS I get errors during make:

    With gcc (Debian 8.3.0-6) and --enable-debug --enable-bench eebaca7620bbd0af0ec385c6c7d47b2b4b524d55 builds and all tests are passing for me. Starting code review.

  40. jonatack commented at 5:53 pm on January 29, 2020: member
    Concept ACK. Code review ACK of commit b2abb396c64e96585a0c38a269372b35cced08bd. The changes are straightforward and do not appear to change behavior.
  41. ryanofsky commented at 6:09 pm on January 29, 2020: contributor
    PR title “coins: allow Flush() without cache drop” seems out of date. Would suggest something like “coins: Add new unused CCoinsViewCache::Sync() method”
  42. jamesob renamed this:
    coins: allow Flush() without cache drop
    coins: allow write to disk without cache drop
    on Jan 29, 2020
  43. jnewbery commented at 1:55 pm on February 2, 2020: contributor
    Concept ACK
  44. in src/coins.cpp:245 in eebaca7620 outdated
    210     return fOk;
    211 }
    212 
    213+bool CCoinsViewCache::Sync()
    214+{
    215+    bool fOk = base->BatchWrite(cacheCoins, hashBlock, /*erase*/ false);
    


    jnewbery commented at 11:18 pm on February 10, 2020:
    This is simply copying the pattern of Flush(), but BatchWrite() never returns false, so this parameter does nothing. I think all three functions should be cleaned up to not return anything so that the calling pattern is clear, but that can be done in a follow-up PR.
  45. in src/test/coins_tests.cpp:936 in eebaca7620 outdated
    915+
    916+    auto flush_all = [&all_caches](bool erase) {
    917+        // Flush in reverse order to ensure that flushes happen from children up.
    918+        for (auto i = all_caches.rbegin(); i != all_caches.rend(); ++i) {
    919+            auto cache = *i;
    920+            cache->SetBestBlock(InsecureRand256());  // Hack to allow flushing.
    


    jnewbery commented at 11:20 pm on February 10, 2020:
    nit: “Hack to allow flushing” isn’t very illuminating. Perhaps “A cache’s hashBlock must be filled before flushing to disk. That’s normally done in connect/disconnect block. Do it manually here”

    Sjors commented at 6:49 pm on May 19, 2020:
  46. in src/test/coins_tests.cpp:1010 in eebaca7620 outdated
    989+
    990+    // --- 6. Spend the coin.
    991+    //
    992+    BOOST_CHECK(view->SpendCoin(outp));
    993+
    994+    // The coin should be in the cache, but pruned and marked dirty.
    


    jnewbery commented at 11:21 pm on February 10, 2020:
    nit: I don’t think pruned is the right terminolgy, but this can be cleaned up with all the rest in a future PR.

    jamesob commented at 5:51 pm on May 28, 2020:
    fixed
  47. in src/test/coins_tests.cpp:1050 in eebaca7620 outdated
    1029+    BOOST_CHECK(!all_caches[0]->HaveCoin(outp));
    1030+    BOOST_CHECK(!all_caches[1]->HaveCoin(outp));
    1031+
    1032+    flush_all(/*erase*/ true); // Erase all cache content.
    1033+
    1034+    // --- Bonus check 2: ensure that a FRESH coin is deleted by Sync()
    


    jnewbery commented at 11:22 pm on February 10, 2020:
    I think this should say “ensure that a FRESH spent coin is deleted by Sync()”. If the coin is FRESH but not spent, then a Sync() won’t delete it.

    Sjors commented at 6:51 pm on May 19, 2020:

    jamesob commented at 5:52 pm on May 28, 2020:
    fixed
  48. in src/test/coins_tests.cpp:1073 in eebaca7620 outdated
    1055+    all_caches[0]->Sync();
    1056+
    1057+    // Ensure there is no sign of the coin after spend/flush.
    1058+    GetCoinsMapEntry(all_caches[0]->map(), value, flags, outp);
    1059+    BOOST_CHECK_EQUAL(value, ABSENT);
    1060+    BOOST_CHECK_EQUAL(flags, NO_ENTRY);
    


    jnewbery commented at 11:23 pm on February 10, 2020:
    nit: I don’t think we need these three lines. Just checking that HaveCoinInCache() fails is enough to show that the coin doesn’t exist in the cache.

    jamesob commented at 5:53 pm on May 28, 2020:
    I’m going to keep these in because I think it’s worth being painfully explicit in coins tests.

    jnewbery commented at 10:21 pm on June 1, 2020:
    If HaveCoinInCache() isn’t explicit enough, should it be removed from lower in the test, or is that different somehow?
  49. jnewbery commented at 11:26 pm on February 10, 2020: contributor

    tested ACK eebaca7620bbd0af0ec385c6c7d47b2b4b524d55

    I’ve added a few nits, but I think this is ready for merge now, and those nits could be cleaned up later.

  50. in src/test/coins_tests.cpp:935 in eebaca7620 outdated
    916+    auto flush_all = [&all_caches](bool erase) {
    917+        // Flush in reverse order to ensure that flushes happen from children up.
    918+        for (auto i = all_caches.rbegin(); i != all_caches.rend(); ++i) {
    919+            auto cache = *i;
    920+            cache->SetBestBlock(InsecureRand256());  // Hack to allow flushing.
    921+            erase ? cache->Flush() : cache->Sync();
    


    fjahr commented at 11:00 pm on February 14, 2020:
    nit: I like the naming destinction between Sync() and Flush() in the code because it made it easier to parse and that naming could have been used more here in the test as well. Now the destinction it is kind of hidden behind this do_erasing_flush parameter. There could have been a sync_all helper method for example. But it’s probably not worth the effort of changing now.
  51. in src/coins.cpp:226 in b2abb396c6 outdated
    211 }
    212 
    213+bool CCoinsViewCache::Sync()
    214+{
    215+    bool fOk = base->BatchWrite(cacheCoins, hashBlock, /*erase*/ false);
    216+    // Instead of clearing the cache, just clear the FRESH/DIRTY
    


    fjahr commented at 11:33 pm on February 14, 2020:
    nit: If I look at this comment outside of the context of this PR “Instead of clearing the cache” might not make much sense because there is no cache clearing going on inside of this function and the relation to Flush() may not be immediately clear.

    jamesob commented at 6:09 pm on May 28, 2020:
    rephrased
  52. fjahr approved
  53. fjahr commented at 11:54 pm on February 14, 2020: contributor

    ACK eebaca7620bbd0af0ec385c6c7d47b2b4b524d55

    Reviewed code and tested locally.

  54. jamesob commented at 4:37 pm on February 17, 2020: member
    Thanks for the reviews, all. I’ll address some of the nits if there’s a need to rebase, otherwise I’m going for ACK-preservation at this point.
  55. jnewbery commented at 5:20 pm on March 13, 2020: contributor
    This has two ACKs on the latest branch (@jnewbery and @fjahr), but previous concept ACKs from @ryanofsky @jonatack @dongcarl @ariard , and other review comments from @Sjors @andrewtoth @MarcoFalke . What can we do to get this over the line?
  56. in src/coins.cpp:174 in eebaca7620 outdated
    157@@ -158,7 +158,7 @@ bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn
    158                 // Otherwise we will need to create it in the parent
    159                 // and move the data up and mark it as dirty
    160                 CCoinsCacheEntry& entry = cacheCoins[it->first];
    161-                entry.coin = std::move(it->second.coin);
    162+                entry.coin = erase ? std::move(it->second.coin) : it->second.coin;
    


    luke-jr commented at 3:52 am on April 2, 2020:
    Is the std::move gain really worth the added code complexity here?

    sipa commented at 4:08 am on April 2, 2020:
    This has no effect, I think. All std::move does is cast to an rvalue reference. The ternary operator merges it back to the same type with the other branch.

    jamesob commented at 2:33 pm on April 5, 2020:

    It is indeed required. Try recompiling without this change and running the coins tests (./src/test/test_bitcoin -t coins_tests).

    The std::move is necessary to erase the contents of the map being written to the parent.


    sipa commented at 6:20 pm on April 5, 2020:
    @jamesob The std::move here has no effect; it is equivalent to entry.coin = it->second.coin. If you want to conditionally move, you need two separate statements.

    jamesob commented at 6:37 pm on April 5, 2020:
    Did you actually try recompiling with your suggested change and running the tests? The result on my platform (gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0) is failure.

    luke-jr commented at 6:56 pm on April 5, 2020:

    @jamesob It sounds like you’re relying on the side effect of the moved-from object being cleared or something, but AFAIK it is in fact left in an undefined state…

    Probably should just copy and explicitly clear.


    sipa commented at 7:00 pm on April 5, 2020:

    It seems I’m wrong; your code is correct.

    Apparently what happens in this case is that a temporary variable is created, which it->second.coin is either copied or moved to (depending on erase), and then that temporary variable is move-assigned to entry.coin. This has the desired effect, though there is possibly an unnecessary (but cheap) move.

    If you write it as

    0if (erase) {
    1    entry.coin = std::move(it->second.coin);
    2} else {
    3    entry.coin = it->second.coin;
    4}
    

    I think you get the same semantics, but without the additional move.


    sipa commented at 7:04 pm on April 5, 2020:
    @luke-jr Well, the only member with move semantics is CScript, which is a prevector, and move assigning is well defined (it’s in our own code), namely swapping.

    luke-jr commented at 7:06 pm on April 5, 2020:
    Abstractly, it is undefined, and std::move AFAIK does not guarantee move assignment, only allows it?

    sipa commented at 7:11 pm on April 5, 2020:

    @luke-jr std::move just casts to an rvalue reference, nothing else. It’s up to the involved classes to define what to do when assigned with or constructed from an rvalue.

    The standard also defines what standard library classes do in this case: moved-from objects are in a valid but unspecified state (which means that you’re only allowed to run operations on them that have no preconditions; e.g. empty() on them or assigning to them is ok).

    However, no standard library classes are involved here, so it doesn’t apply.

    I agree with you insofar that if the desired effect is clearing, the code should just explicitly clear (potentially after a move, if that’s more efficient). But it isn’t UB to rely on the rvalue semantics of a class we defined ourselves.


    Sjors commented at 4:55 pm on May 19, 2020:

    I like sipa’s alternative.

    FWIW I also get a flood of errors when using entry.coin = std::move(it->second.coin); with erase = false. Conversely if I drop std::move it works fine, but I haven’t looked at performance. In other words, we’re not depending on some side-effect std::move, it’s just a performance enhancer.


    jamesob commented at 6:08 pm on May 28, 2020:

    @Sjors please see above - we do rely on the std::move to clear the entry from the child cache, it isn’t just a performance optimization. If you remove the move, the coins unittests will fail.

    That said I agree it seems tricky to rely upon this behavior implicitly based upon classes we’ve define ourselves, but I haven’t tested the performance implications of adding mapCoins.erase(it). Given that I’m essentially leaving the behavior as it was before this change in the case of the move, and we have unittests that will catch a change in behavior here, I don’t feel the need to add an explicit erase() call. I have however clarified the phrasing with an explicit if-else as @sipa suggests.


    Sjors commented at 7:25 pm on May 28, 2020:

    If you remove the move, the coins unittests will fail.

    Odd, I tried that. Will try again.


    jamesob commented at 7:58 pm on May 28, 2020:

    Odd, I tried that. Will try again.

    Perhaps you made the change in the relevant commit, but then didn’t apply subsequent commits including the new tests? Just a guess.


    andrewtoth commented at 8:22 pm on May 28, 2020:
    FWIW I did try this before acking. I checked out this branch and changed the ternary to just use the copy assignment and the tests errored out. Will try again with updated commits.

    jamesob commented at 10:34 pm on May 28, 2020:
    @jnewbery has found that I was wrong - the std::move invocation isn’t necessary for correctness, it’s just an optimization. Thanks for pointing that out @Sjors. Will remove the new comment.

    jnewbery commented at 2:21 am on May 29, 2020:

    Since this has caused so much confusion, I think it’d be better to update the comment than remove it entirely. Perhaps something like:

    0// We'll erase this entry from the map in the for increase expression.
    1// Move the coin to the parent cache instead of copying as an optimization.
    

    Sjors commented at 12:41 pm on June 1, 2020:
    Pfew, I’m not insane :-) Thanks for adding the comment.
  57. DrahtBot added the label Needs rebase on Apr 16, 2020
  58. jamesob force-pushed on May 6, 2020
  59. DrahtBot removed the label Needs rebase on May 6, 2020
  60. andrewtoth commented at 3:46 am on May 18, 2020: contributor
    tested ACK 8bb1163805121b2863fef1c16c9b104803e8dfdd I’ve been running this patch on top of this branch for the last week to test non-erasing periodic flushes. I haven’t had any issues.
  61. jonatack commented at 8:15 am on May 18, 2020: member

    ACK 8bb1163805121b2

    Reviewed, built/tested/ran bitcoind locally. No change in current behavior. Thanks for the work on the tests.

  62. in src/coins.cpp:229 in 8bb1163805 outdated
    225+{
    226+    bool fOk = base->BatchWrite(cacheCoins, hashBlock, /*erase*/ false);
    227+    // Instead of clearing the cache, just clear the FRESH/DIRTY
    228+    // flags, and erase any spent coins
    229+    for (auto it = cacheCoins.begin(); it != cacheCoins.end(); ) {
    230+        it->second.flags = 0;
    


    jnewbery commented at 4:18 pm on May 19, 2020:

    nit: it seems a bit odd to clear the flags immediately before deleting in the case where the coin is spent. If you retouch this PR, the flag clearing can be moved into the else branch below.

    You might also want to update the comment to reflect that change: “Erase any spent coins and clear the FRESH/DIRTY flags for unspent coins”.


    Sjors commented at 6:21 pm on May 19, 2020:
    Agreed. In addition, let’s not assume there won’t be new flags in the future, so maybe clear the FRESH/DIRTY flags explicitly with an =&?

    jamesob commented at 6:11 pm on May 28, 2020:
    fixed
  63. in src/txdb.cpp:119 in 8bb1163805 outdated
    115@@ -116,7 +116,9 @@ bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock) {
    116         }
    117         count++;
    118         CCoinsMap::iterator itOld = it++;
    119-        mapCoins.erase(itOld);
    120+        if (erase) {
    


    jnewbery commented at 4:27 pm on May 19, 2020:

    nit: I think it’d be clearer to avoid this use-once local variable by using the same pattern that you used in coins.cpp:

    0it = erase ? mapCoins.erase(it) : ++it;
    

    Sjors commented at 6:28 pm on May 19, 2020:
    Does this need an exception, to always delete FRESH spent coins? Or those can’t exist at this level?

    jamesob commented at 6:31 pm on May 28, 2020:

    Good suggestion, thanks! Fixed.

    FRESH spent coins won’t exist here because they will have been removed from the child cache first - for proof of this note that nothing calls CCoinsViewDB::BatchWrite() aside from CCoinsViewCache, and SpendCoin() only exists on CCoinsViewCache instances.

  64. jnewbery commented at 4:29 pm on May 19, 2020: contributor
    A couple more comments inline.
  65. in src/coins.h:309 in d5f4f7bcdb outdated
    306      */
    307     bool Flush();
    308 
    309+    /**
    310+     * Push the modifications applied to this cache to its base while retaining
    311+     * the contents of this cache.
    


    Sjors commented at 6:23 pm on May 19, 2020:
    nit: * the contents of this cache except for FRESH spent coins.

    jamesob commented at 6:12 pm on May 28, 2020:
    fixed
  66. in src/test/coins_tests.cpp:186 in f2abc63382 outdated
    182                     BOOST_CHECK(newcoin.out.scriptPubKey.IsUnspendable());
    183                     added_an_unspendable_entry = true;
    184                 } else {
    185-                    newcoin.out.scriptPubKey.assign(InsecureRandBits(6), 0); // Random sizes so we can test memory usage accounting
    186+                    // Random sizes so we can test memory usage accounting
    187+                    newcoin.out.scriptPubKey.assign(InsecureRandBits(6), 1);
    


    Sjors commented at 6:36 pm on May 19, 2020:
    Why did you switch the second argument of assign to 1?

    jamesob commented at 5:48 pm on May 28, 2020:
    Good question… can’t remember doing that. Will revert.
  67. Sjors approved
  68. Sjors commented at 6:53 pm on May 19, 2020: member

    utACK 8bb1163805121b2863fef1c16c9b104803e8dfdd

    Note that #18746 also touches BatchWrite, and I have a light preference for merging that first. But it shouldn’t be rocket science to rebase either this PR or that one.

    On a related note, I find the existing terminology of base, mapCoins, parent and grandparent extremely confusing.

  69. jamesob commented at 4:27 pm on May 20, 2020: member
    Thanks for the feedback, all. I’ll push some responses later today.
  70. jamesob force-pushed on May 28, 2020
  71. jamesob commented at 6:38 pm on May 28, 2020: member

    Thanks for the feedback, all. I’ll push some responses later today.

    8 days later… revisions pushed. Thanks for feedback.

  72. in src/coins.cpp:208 in 1ce5cd389f outdated
    204@@ -199,7 +205,7 @@ bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn
    205             } else {
    206                 // A normal modification.
    207                 cachedCoinsUsage -= itUs->second.coin.DynamicMemoryUsage();
    208-                itUs->second.coin = std::move(it->second.coin);
    209+                itUs->second.coin = erase ? std::move(it->second.coin) : it->second.coin;
    


    andrewtoth commented at 1:56 am on May 31, 2020:
    Should this also be updated to use an if/else block instead of a ternary like above?

    jamesob commented at 3:23 pm on May 31, 2020:
    Good catch.
  73. jamesob force-pushed on May 31, 2020
  74. in src/coins.cpp:211 in fdc1d80ce4 outdated
    205@@ -199,7 +206,14 @@ bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn
    206             } else {
    207                 // A normal modification.
    208                 cachedCoinsUsage -= itUs->second.coin.DynamicMemoryUsage();
    209-                itUs->second.coin = std::move(it->second.coin);
    210+                if (erase) {
    211+                    // The `move` call here is purely an optimization; we rely on the
    212+                    // `mapCoins.erase` call in the `for` predicate to actually remove
    


    jnewbery commented at 10:12 pm on May 31, 2020:
    nit: the third part of the for loop is an expression, not a predicate (more precisely, the increase or iteration expression). A predicate is a function which returns true/false.
  75. jamesob force-pushed on Jun 1, 2020
  76. Sjors commented at 1:46 pm on June 1, 2020: member
    re-utACK 8d5994ef2235c6706440bc18968006cd04c22fb4
  77. jnewbery commented at 10:21 pm on June 1, 2020: contributor
    utACK 8d5994ef2235c6706440bc18968006cd04c22fb4
  78. jonatack commented at 3:18 pm on June 2, 2020: member

    re-ACK 8d5994e modulo question at #17487 (review), reviewed changes since my previous review per git diff 8bb1163 8d5994e and verified that this builds cleanly on Debian gcc and clang locally with the works.

      0$ (pr/17487)$ git diff 8bb1163 8d5994e
      1diff --git a/src/coins.cpp b/src/coins.cpp
      2index 060a23a09f..0a76509705 100644
      3--- a/src/coins.cpp
      4+++ b/src/coins.cpp
      5@@ -171,7 +171,14 @@ bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn
      6                 // Create the coin in the parent cache, move the data up
      7                 // and mark it as dirty.
      8                 CCoinsCacheEntry& entry = cacheCoins[it->first];
      9-                entry.coin = erase ? std::move(it->second.coin) : it->second.coin;
     10+                if (erase) {
     11+                    // The `move` call here is purely an optimization; we rely on the
     12+                    // `mapCoins.erase` call in the `for` expression to actually remove
     13+                    // the entry from the child map.
     14+                    entry.coin = std::move(it->second.coin);
     15+                } else {
     16+                    entry.coin = it->second.coin;
     17+                }
     18                 cachedCoinsUsage += entry.coin.DynamicMemoryUsage();
     19                 entry.flags = CCoinsCacheEntry::DIRTY;
     20                 // We can mark it FRESH in the parent if it was FRESH in the child
     21@@ -199,7 +206,14 @@ bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn
     22             } else {
     23                 // A normal modification.
     24                 cachedCoinsUsage -= itUs->second.coin.DynamicMemoryUsage();
     25-                itUs->second.coin = erase ? std::move(it->second.coin) : it->second.coin;
     26+                if (erase) {
     27+                    // The `move` call here is purely an optimization; we rely on the
     28+                    // `mapCoins.erase` call in the `for` expression to actually remove
     29+                    // the entry from the child map.
     30+                    itUs->second.coin = std::move(it->second.coin);
     31+                } else {
     32+                    itUs->second.coin = it->second.coin;
     33+                }
     34                 cachedCoinsUsage += itUs->second.coin.DynamicMemoryUsage();
     35                 itUs->second.flags |= CCoinsCacheEntry::DIRTY;
     36                 // NOTE: It isn't safe to mark the coin as FRESH in the parent
     37@@ -223,14 +237,14 @@ bool CCoinsViewCache::Flush() {
     38 bool CCoinsViewCache::Sync()
     39 {
     40     bool fOk = base->BatchWrite(cacheCoins, hashBlock, /*erase*/ false);
     41-    // Instead of clearing the cache, just clear the FRESH/DIRTY
     42-    // flags, and erase any spent coins
     43+    // Instead of clearing `cacheCoins` as we would in Flush(), just clear the
     44+    // FRESH/DIRTY flags of any coin that isn't spent.
     45     for (auto it = cacheCoins.begin(); it != cacheCoins.end(); ) {
     46-        it->second.flags = 0;
     47         if (it->second.coin.IsSpent()) {
     48             cachedCoinsUsage -= it->second.coin.DynamicMemoryUsage();
     49             it = cacheCoins.erase(it);
     50         } else {
     51+            it->second.flags = 0;
     52             ++it;
     53         }
     54     }
     55diff --git a/src/coins.h b/src/coins.h
     56index b912b01fb4..e91c299de2 100644
     57--- a/src/coins.h
     58+++ b/src/coins.h
     59@@ -306,7 +306,7 @@ public:
     60 
     61     /**
     62      * Push the modifications applied to this cache to its base while retaining
     63-     * the contents of this cache.
     64+     * the contents of this cache (except for FRESH spent coins, which we erase).
     65      * Failure to call this method or Flush() before destruction will cause the changes
     66      * to be forgotten.
     67      * If false is returned, the state of this cache (and its backing view) will be undefined.
     68diff --git a/src/test/coins_tests.cpp b/src/test/coins_tests.cpp
     69index 9375cd69b0..0bf65014ff 100644
     70--- a/src/test/coins_tests.cpp
     71+++ b/src/test/coins_tests.cpp
     72@@ -183,7 +183,7 @@ void SimulationTest(CCoinsView* base, bool fake_best_block)
     73                     added_an_unspendable_entry = true;
     74                 } else {
     75                     // Random sizes so we can test memory usage accounting
     76-                    newcoin.out.scriptPubKey.assign(InsecureRandBits(6), 1);
     77+                    newcoin.out.scriptPubKey.assign(InsecureRandBits(6), 0);
     78                     (coin.IsSpent() ? added_an_entry : updated_an_entry) = true;
     79                     coin = newcoin;
     80                 }
     81@@ -906,7 +906,7 @@ Coin MakeCoin()
     82 }
     83 
     84 
     85-//! For CCoinsViewCache instances backed by either another cache instance and
     86+//! For CCoinsViewCache instances backed by either another cache instance or
     87 //! leveldb, test cache behavior and flag state (DIRTY/FRESH) by
     88 //!
     89 //! 1. Adding a random coin to the child-most cache,
     90@@ -933,7 +933,9 @@ void TestFlushBehavior(
     91         // Flush in reverse order to ensure that flushes happen from children up.
     92         for (auto i = all_caches.rbegin(); i != all_caches.rend(); ++i) {
     93             auto cache = *i;
     94-            cache->SetBestBlock(InsecureRand256());  // Hack to allow flushing.
     95+            // hashBlock must be filled before flushing to disk; value is
     96+            // unimportant here. This is normally done during connect/disconnect block.
     97+            cache->SetBestBlock(InsecureRand256());
     98             erase ? cache->Flush() : cache->Sync();
     99         }
    100     };
    101@@ -1007,7 +1009,7 @@ void TestFlushBehavior(
    102     //
    103     BOOST_CHECK(view->SpendCoin(outp));
    104 
    105-    // The coin should be in the cache, but pruned and marked dirty.
    106+    // The coin should be in the cache, but spent and marked dirty.
    107     GetCoinsMapEntry(view->map(), value, flags, outp);
    108     BOOST_CHECK_EQUAL(value, SPENT);
    109     BOOST_CHECK_EQUAL(flags, DIRTY);
    110@@ -1047,7 +1049,7 @@ void TestFlushBehavior(
    111 
    112     flush_all(/*erase*/ true); // Erase all cache content.
    113 
    114-    // --- Bonus check 2: ensure that a FRESH coin is deleted by Sync()
    115+    // --- Bonus check 2: ensure that a FRESH, spent coin is deleted by Sync()
    116     //
    117     txid = InsecureRand256();
    118     outp = COutPoint(txid, 0);
    119diff --git a/src/txdb.cpp b/src/txdb.cpp
    120index c6dfe39595..2588dc914f 100644
    121--- a/src/txdb.cpp
    122+++ b/src/txdb.cpp
    123@@ -115,10 +115,7 @@ bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock, boo
    124             changed++;
    125         }
    126         count++;
    127-        CCoinsMap::iterator itOld = it++;
    128-        if (erase) {
    129-            mapCoins.erase(itOld);
    130-        }
    131+        it = erase ? mapCoins.erase(it) : ++it;
    132         if (batch.SizeEstimate() > batch_size) {
    133             LogPrint(BCLog::COINDB, "Writing partial batch of %.2f MiB\n", batch.SizeEstimate() * (1.0 / 1048576.0));
    134             db.WriteBatch(batch);
    
  79. in src/txdb.cpp:115 in 8d5994ef22 outdated
    114@@ -115,8 +115,7 @@ bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock) {
    115             changed++;
    116         }
    117         count++;
    118-        CCoinsMap::iterator itOld = it++;
    119-        mapCoins.erase(itOld);
    120+        it = erase ? mapCoins.erase(it) : ++it;
    


    jonatack commented at 3:23 pm on June 2, 2020:
    Should ++it be outside the ternary conditional here?

    jnewbery commented at 4:13 pm on June 2, 2020:
    No. std::unordered_map::erase() returns an iterator to the next item in the map: https://en.cppreference.com/w/cpp/container/unordered_map/erase.

    jonatack commented at 4:17 pm on June 2, 2020:
    Oh – and it appears to be a convention. Thanks @jnewbery.

    sipa commented at 4:26 pm on June 2, 2020:
    it = ++it; is kind of strange though. How about it = mapCoins.erase(it) : std::next(it); ?

    jamesob commented at 5:56 pm on June 3, 2020:
    Will change if I rebase.

    jamesob commented at 8:18 pm on August 1, 2020:
    Fixed, thanks.
  80. in src/coins.cpp:199 in 8d5994ef22 outdated
    170@@ -171,7 +171,14 @@ bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn
    171                 // Create the coin in the parent cache, move the data up
    172                 // and mark it as dirty.
    173                 CCoinsCacheEntry& entry = cacheCoins[it->first];
    174-                entry.coin = std::move(it->second.coin);
    175+                if (erase) {
    176+                    // The `move` call here is purely an optimization; we rely on the
    177+                    // `mapCoins.erase` call in the `for` expression to actually remove
    178+                    // the entry from the child map.
    179+                    entry.coin = std::move(it->second.coin);
    


    andrewtoth commented at 5:18 pm on June 3, 2020:
    Thinking about this more, is it not more than a pure optimization? If we copy instead of move, then we are increasing the memory footprint of our dbcache. For a large batch write won’t all dirty entries exist in memory twice, potentially increasing the memory requirements of the node temporarily until the parent cache writes to disk?

    jamesob commented at 5:55 pm on June 3, 2020:
    @andrewtoth I think this is offset by the fact that we’re erasing as we iterate through the map, so coins will only be duplicated in memory one by one. But that’s a good consideration, thanks for raising the point.

    andrewtoth commented at 6:17 pm on June 3, 2020:
    @jamesob For this change we are no longer erasing as we iterate through the map, and also copying, right? For a Sync() with a large amount of dirty entries won’t this cause memory usage to increase?

    andrewtoth commented at 10:28 pm on June 3, 2020:
    Hmm ok I guess I’m missing how we’re erasing during the iteration with erase=false. I tested a reindex-chainstate with max dbcache and did a Sync instead of Flush on shutdown, and it didn’t increase memory usage at all when syncing. LGTM.

    jamesob commented at 8:21 pm on June 4, 2020:

    @andrewtoth you don’t see a big memory spike because in practice this implementation of BatchWrite is only called for small sets of coins during ConnectTip() to make UTXO set changes atomic (i.e. a CCoinsViewCache on top of another CCoinsViewCache); calling BatchWrite for really sizable sets of coins happens on CCoinsViewDB (leveldb) objects (i.e. writing down from a CCoinsViewCache -> CCoinsViewDB). CCoinsViewDB doesn’t have an in-memory cache (at least that we maintain) so there is no sizable duplication.

    But once again, thanks for raising the point and testing.

  81. andrewtoth commented at 10:28 pm on June 3, 2020: contributor
    re-ACK 8d5994ef2235c6706440bc18968006cd04c22fb4
  82. DrahtBot added the label Needs rebase on Jul 29, 2020
  83. jamesob force-pushed on Aug 1, 2020
  84. jamesob commented at 3:53 pm on August 1, 2020: member

    au.coins-erase.7 -> au.coins-erase.8

     0% git range-diff master au.coins-erase.7 au.coins-erase.8
     1
     21:  8b9b51da04 ! 1:  58616e0b8b coins: add Sync() method to allow flush without cacheCoins drop
     3    @@ -187,7 +187,7 @@
     4
     5     -bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock) {
     6     +bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock, bool erase) {
     7    -     CDBBatch batch(db);
     8    +     CDBBatch batch(*m_db);
     9          size_t count = 0;
    10          size_t changed = 0;
    11     @@
    12    @@ -199,7 +199,7 @@
    13     +        it = erase ? mapCoins.erase(it) : ++it;
    14              if (batch.SizeEstimate() > batch_size) {
    15                  LogPrint(BCLog::COINDB, "Writing partial batch of %.2f MiB\n", batch.SizeEstimate() * (1.0 / 1048576.0));
    16    -             db.WriteBatch(batch);
    17    +             m_db->WriteBatch(batch);
    18
    19      diff --git a/src/txdb.h b/src/txdb.h
    20      --- a/src/txdb.h
    212:  09ef24a814 = 2:  48d3e92b06 test: refactor: clarify the coins simulation
    223:  c50373bfb7 = 3:  7239b0cee5 test: add use of Sync() to coins tests
    234:  8d5994ef22 = 4:  0292b01cc4 test: add test for coins view flush behavior using Sync()
    

    This PR has gotten substantial review, testing, and ACK from a number of people (@andrewtoth, @jnewbery, @jonatack, @Sjors) but personally I would appreciate evaluation for at least a Concept ACK from someone who has a lot of experience in the coins department (e.g. @sipa, @sdaftuar, @ryanofsky) before this moves forward.

    To reiterate, this has shown to be noticeably helpful when loading UTXO snapshots with assumeutxo and can potentially help shave some time off ConnectBlock (https://github.com/bitcoin/bitcoin/pull/18941#issuecomment-633684774 cc @andrewtoth). But I don’t want to push a string here.

  85. in src/coins.h:309 in 58616e0b8b outdated
    306      */
    307     bool Flush();
    308 
    309+    /**
    310+     * Push the modifications applied to this cache to its base while retaining
    311+     * the contents of this cache (except for FRESH spent coins, which we erase).
    


    sipa commented at 6:21 pm on August 1, 2020:

    I don’t think except for FRESH spent coins, which we erase is true?

    If a FRESH coin is spent, it is deleted immediately, so there shouldn’t be any such entries in the first place.


    jamesob commented at 8:17 pm on August 1, 2020:
    Yep, you’re right, not sure what I was thinking there.
  86. jamesob force-pushed on Aug 1, 2020
  87. jamesob commented at 8:18 pm on August 1, 2020: member

    au.coins-erase.8 -> au.coins-erase.9

     01:  58616e0b8b ! 1:  47d71fdb70 coins: add Sync() method to allow flush without cacheCoins drop
     1    @@ -154,7 +154,7 @@
     2
     3     +    /**
     4     +     * Push the modifications applied to this cache to its base while retaining
     5    -+     * the contents of this cache (except for FRESH spent coins, which we erase).
     6    ++     * the contents of this cache (except for spent coins, which we erase).
     7     +     * Failure to call this method or Flush() before destruction will cause the changes
     8     +     * to be forgotten.
     9     +     * If false is returned, the state of this cache (and its backing view) will be undefined.
    10    @@ -196,7 +196,7 @@
    11              count++;
    12     -        CCoinsMap::iterator itOld = it++;
    13     -        mapCoins.erase(itOld);
    14    -+        it = erase ? mapCoins.erase(it) : ++it;
    15    ++        it = erase ? mapCoins.erase(it) : std::next(it);
    16              if (batch.SizeEstimate() > batch_size) {
    17                  LogPrint(BCLog::COINDB, "Writing partial batch of %.2f MiB\n", batch.SizeEstimate() * (1.0 / 1048576.0));
    18                  m_db->WriteBatch(batch);
    192:  48d3e92b06 = 2:  342ef935c4 test: refactor: clarify the coins simulation
    203:  7239b0cee5 = 3:  77897654a5 test: add use of Sync() to coins tests
    214:  0292b01cc4 = 4:  f4b56059b4 test: add test for coins view flush behavior using Sync()
    

    Addressed feedback from @sipa (thanks).

  88. DrahtBot removed the label Needs rebase on Aug 3, 2020
  89. Sjors commented at 6:33 pm on May 5, 2021: member
    I was able to rebase f4b5605 on master and run the test. Difference with my previous utACK seems trivial, but I haven’t re-reviewed it.
  90. DrahtBot added the label Needs rebase on Jun 23, 2021
  91. jamesob force-pushed on Aug 17, 2021
  92. DrahtBot removed the label Needs rebase on Aug 17, 2021
  93. in src/coins.cpp:237 in 0c2fc5154e outdated
    233@@ -220,12 +234,29 @@ bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn
    234 }
    235 
    236 bool CCoinsViewCache::Flush() {
    237-    bool fOk = base->BatchWrite(cacheCoins, hashBlock);
    238+    bool fOk = base->BatchWrite(cacheCoins, hashBlock, /*erase*/ true);
    


    jonatack commented at 11:53 am on December 11, 2021:

    The following TODO comment in src/validation.cpp was removed in #23738. Making a note here so the TODO can be either be done in this PR or in a follow-up.

    0    coins_cache.Flush(); // TODO: if [#17487](/bitcoin-bitcoin/17487/) is merged, add erase=false here for better performance.
    

    fjahr commented at 12:22 pm on January 9, 2022:
    Note: I found that the tests don’t fail if erase is flipped to false on this line, other than in Sync(). The tests would need to become more granular to catch this case but it certainly has not gotten worse than from before this PR so it can be kept for a follow-up if it seems beneficial to test.

    fanquake commented at 10:31 am on January 18, 2023:

    You can add the = to these comments so that clang-tidy will check them.

    0    bool fOk = base->BatchWrite(cacheCoins, hashBlock, /*erase=*/ true);
    

    jamesob commented at 3:40 pm on January 20, 2023:
    Fixed, thanks.
  94. in src/test/coins_tests.cpp:943 in 0c2fc5154e outdated
    938+            cache->SetBestBlock(InsecureRand256());
    939+            erase ? cache->Flush() : cache->Sync();
    940+        }
    941+    };
    942+
    943+    txid = InsecureRand256();
    


    fjahr commented at 9:39 pm on January 8, 2022:
    nit: These three variables could have been initialized here.
  95. fjahr commented at 12:23 pm on January 9, 2022: contributor

    tACK 0c2fc5154e88b46725b8a1c2149aa94c552d1e03

    Reviewed code, ran tests locally including some basic mutations.

  96. achow101 commented at 11:17 pm on January 17, 2023: member
    This PR has several stale ACKs. It seems like the most recent revision could be in a good enough state to go in (no conflicts with master, almost all comments have been addressed). Would any of the previous reviewers (@sjors @ryanofsky @jonatack @andrewtoth) like to give this a re-review?
  97. achow101 commented at 11:35 pm on January 17, 2023: member
    ACK 0c2fc5154e88b46725b8a1c2149aa94c552d1e03
  98. sipa commented at 1:13 am on January 18, 2023: member
    Concept ACK, and rough code review ACK the first commit. I’d still like to review the tests.
  99. in src/test/coins_tests.cpp:998 in 0c2fc5154e outdated
     999+    }
    1000+
    1001+    // Can't overwrite an entry without specifying that an overwrite is
    1002+    // expected.
    1003+    Coin tmpCoin2{coin};
    1004+    BOOST_CHECK_THROW(
    


    fanquake commented at 10:29 am on January 18, 2023:
    0test/coins_tests.cpp:1004:39: error: 'tmpCoin2' used after it was moved [bugprone-use-after-move,-warnings-as-errors]
    1        view->AddCoin(outp, std::move(tmpCoin2), /*possible_overwrite*/ false),
    2                                      ^
    3test/coins_tests.cpp:1004:15: note: move occurred here
    4        view->AddCoin(outp, std::move(tmpCoin2), /*possible_overwrite*/ false),
    5              ^
    6test/coins_tests.cpp:1004:39: note: the use happens in a later loop iteration than the move
    7        view->AddCoin(outp, std::move(tmpCoin2), /*possible_overwrite*/ false),
    

    jamesob commented at 1:57 pm on January 20, 2023:
    Someone correct me if I’m wrong, but I think this is a spurious check. tmpCoin2 is initialized fresh for every invocation in this function, and then never used again after move. Clang-tidy is throwing this error because the containing function, TestFlushBehavior(), is called in a loop below.

    maflcko commented at 2:24 pm on January 20, 2023:

    It may also be because the BOOST_CHECK_THROW macro expands to a loop?

    Either way, it should be trivial to work around by just replacing std::move(tmpCoin2) with Coin{coin}.

  100. in src/test/coins_tests.cpp:1057 in 0c2fc5154e outdated
    1057+    BOOST_CHECK(!base.HaveCoin(outp));
    1058+    BOOST_CHECK(!all_caches[0]->HaveCoin(outp));
    1059+    BOOST_CHECK(!all_caches[1]->HaveCoin(outp));
    1060+
    1061+    // Add and spend from same cache without flushing.
    1062+    all_caches[0]->AddCoin(outp, std::move(coin), false);
    


    fanquake commented at 10:29 am on January 18, 2023:
    0test/coins_tests.cpp:1065:30: error: 'coin' used after it was moved [bugprone-use-after-move,-warnings-as-errors]
    1    BOOST_CHECK_EQUAL(value, coin.out.nValue);
    2                             ^
    3test/coins_tests.cpp:1061:20: note: move occurred here
    4    all_caches[0]->AddCoin(outp, std::move(coin), false);
    5                   ^
    
  101. jamesob commented at 10:11 pm on January 18, 2023: member
    Thanks for resurrecting this one from the graveyard; I’ll fix it up tomorrow.
  102. jamesob force-pushed on Jan 20, 2023
  103. coins: add Sync() method to allow flush without cacheCoins drop
    In certain circumstances, we may want to flush to disk without
    emptying `cacheCoins`, which affects performance. UTXO snapshot
    activation is one such case.
    
    This method is currently unused and this commit does not
    change any behavior.
    
    Incorporates feedback from John Newbery.
    
    Co-authored-by: Suhas Daftuar <sdaftuar@gmail.com>
    79cedc36af
  104. test: refactor: clarify the coins simulation
    Adds comments, slight refactor clarifications to make the code
    easier to follow.
    6d8affca96
  105. test: add use of Sync() to coins tests 2c3cbd6c00
  106. test: add test for coins view flush behavior using Sync()
    Thanks to Marco Falke for help with move semantics.
    1d7935b45a
  107. in src/coins.cpp:167 in a5b8b9a305 outdated
    162@@ -163,8 +163,8 @@ void CCoinsViewCache::SetBestBlock(const uint256 &hashBlockIn) {
    163     hashBlock = hashBlockIn;
    164 }
    165 
    166-bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn) {
    167-    for (CCoinsMap::iterator it = mapCoins.begin(); it != mapCoins.end(); it = mapCoins.erase(it)) {
    168+bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn, bool erase) {
    169+    for (CCoinsMap::iterator it = mapCoins.begin(); it != mapCoins.end(); it = erase ? mapCoins.erase(it) : ++it) {
    


    sipa commented at 3:30 pm on January 20, 2023:
    The it = ... ++it evaluation modifies it twice, which has complicated rules about its validity: https://en.cppreference.com/w/cpp/language/eval_order. My reading of the rules there is that this is in fact well-defined since C++11, but I’d still recommend avoiding it by using it = erase ? mapCoins.erase(it) : std::next(it) instead for example.

    jamesob commented at 3:40 pm on January 20, 2023:
    Good thought. Fixed.
  108. jamesob force-pushed on Jan 20, 2023
  109. jamesob commented at 3:41 pm on January 20, 2023: member
    All feedback so far has been addressed as far as I’m aware. I had to push a fully rebased version because I was having issues compiling (the base branch is from 2019!).
  110. jonatack commented at 4:37 pm on January 20, 2023: member

    Reviewed the first commit https://github.com/bitcoin/bitcoin/commit/79cedc36afe2e72e42839d861734d73d545d21b8. Suggestions:

    • drop the Sync() function from the first commit and add it instead in the 3rd commit where it is actually used
    • rename the first commit to something like “add bool erase param to BatchWrite() to flush without emptying…”
    • avoid adding a default value for erase in the BatchWrite() declarations in favor of passing it explicitly in two calls in the tests
      0jon|(79cedc36af...):~/bitcoin/bitcoin$ gd 
      1diff --git a/src/coins.cpp b/src/coins.cpp
      2index ecbd625cd7..fec0856513 100644
      3--- a/src/coins.cpp
      4+++ b/src/coins.cpp
      5@@ -248,30 +248,14 @@ bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn
      6     return true;
      7 }
      8 
      9-bool CCoinsViewCache::Flush() {
     10-    bool fOk = base->BatchWrite(cacheCoins, hashBlock, /*erase=*/ true);
     11+bool CCoinsViewCache::Flush()
     12+{
     13+    bool fOk = base->BatchWrite(cacheCoins, hashBlock, /*erase=*/true);
     14     cacheCoins.clear();
     15     cachedCoinsUsage = 0;
     16     return fOk;
     17 }
     18 
     19-bool CCoinsViewCache::Sync()
     20-{
     21-    bool fOk = base->BatchWrite(cacheCoins, hashBlock, /*erase=*/ false);
     22-    // Instead of clearing `cacheCoins` as we would in Flush(), just clear the
     23-    // FRESH/DIRTY flags of any coin that isn't spent.
     24-    for (auto it = cacheCoins.begin(); it != cacheCoins.end(); ) {
     25-        if (it->second.coin.IsSpent()) {
     26-            cachedCoinsUsage -= it->second.coin.DynamicMemoryUsage();
     27-            it = cacheCoins.erase(it);
     28-        } else {
     29-            it->second.flags = 0;
     30-            ++it;
     31-        }
     32-    }
     33-    return fOk;
     34-}
     35-
     36 void CCoinsViewCache::Uncache(const COutPoint& hash)
     37 {
     38     CCoinsMap::iterator it = cacheCoins.find(hash);
     39diff --git a/src/coins.h b/src/coins.h
     40index 4edc146a14..c6f0e048c8 100644
     41--- a/src/coins.h
     42+++ b/src/coins.h
     43@@ -176,7 +176,7 @@ public:
     44 
     45     //! Do a bulk modification (multiple Coin changes + BestBlock change).
     46     //! The passed mapCoins can be modified.
     47-    virtual bool BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock, bool erase = true);
     48+    virtual bool BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock, bool erase);
     49 
     50     //! Get a cursor to iterate over the whole state
     51     virtual std::unique_ptr<CCoinsViewCursor> Cursor() const;
     52@@ -202,7 +202,7 @@ public:
     53     uint256 GetBestBlock() const override;
     54     std::vector<uint256> GetHeadBlocks() const override;
     55     void SetBackend(CCoinsView &viewIn);
     56-    bool BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock, bool erase = true) override;
     57+    bool BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock, bool erase) override;
     58     std::unique_ptr<CCoinsViewCursor> Cursor() const override;
     59     size_t EstimateSize() const override;
     60 };
     61@@ -235,7 +235,7 @@ public:
     62     bool HaveCoin(const COutPoint &outpoint) const override;
     63     uint256 GetBestBlock() const override;
     64     void SetBestBlock(const uint256 &hashBlock);
     65-    bool BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock, bool erase = true) override;
     66+    bool BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock, bool erase) override;
     67     std::unique_ptr<CCoinsViewCursor> Cursor() const override {
     68         throw std::logic_error("CCoinsViewCache cursor iteration not supported.");
     69     }
     70diff --git a/src/test/coins_tests.cpp b/src/test/coins_tests.cpp
     71index d55dea4f83..1d84d4c3bc 100644
     72--- a/src/test/coins_tests.cpp
     73+++ b/src/test/coins_tests.cpp
     74@@ -53,7 +53,7 @@ public:
     75 
     76     uint256 GetBestBlock() const override { return hashBestBlock_; }
     77 
     78-    bool BatchWrite(CCoinsMap& mapCoins, const uint256& hashBlock, bool erase = true) override
     79+    bool BatchWrite(CCoinsMap& mapCoins, const uint256& hashBlock, bool erase) override
     80     {
     81         for (CCoinsMap::iterator it = mapCoins.begin(); it != mapCoins.end(); ) {
     82             if (it->second.flags & CCoinsCacheEntry::DIRTY) {
     83@@ -610,7 +610,7 @@ void WriteCoinsViewEntry(CCoinsView& view, CAmount value, char flags)
     84 {
     85     CCoinsMap map;
     86     InsertCoinsMapEntry(map, value, flags);
     87-    BOOST_CHECK(view.BatchWrite(map, {}));
     88+    BOOST_CHECK(view.BatchWrite(map, /*hashBlock=*/{}, /*erase=*/true));
     89 }
     90 
     91 class SingleEntryCacheTest
     92diff --git a/src/test/fuzz/coins_view.cpp b/src/test/fuzz/coins_view.cpp
     93index 46026d8df3..149082425e 100644
     94--- a/src/test/fuzz/coins_view.cpp
     95+++ b/src/test/fuzz/coins_view.cpp
     96@@ -129,7 +129,7 @@ FUZZ_TARGET_INIT(coins_view, initialize_coins_view)
     97                 }
     98                 bool expected_code_path = false;
     99                 try {
    100-                    coins_view_cache.BatchWrite(coins_map, fuzzed_data_provider.ConsumeBool() ? ConsumeUInt256(fuzzed_data_provider) : coins_view_cache.GetBestBlock());
    101+                    coins_view_cache.BatchWrite(coins_map, fuzzed_data_provider.ConsumeBool() ? ConsumeUInt256(fuzzed_data_provider) : coins_view_cache.GetBestBlock(), /*erase=*/true);
    102                     expected_code_path = true;
    
  111. in src/txdb.cpp:149 in 79cedc36af outdated
    145@@ -146,8 +146,7 @@ bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock) {
    146             changed++;
    147         }
    148         count++;
    149-        CCoinsMap::iterator itOld = it++;
    150-        mapCoins.erase(itOld);
    151+        it = erase ? mapCoins.erase(it) : std::next(it);
    


    jonatack commented at 4:46 pm on January 20, 2023:
    In the first commit 79cedc36afe2, I’m unsure if this diff is a change of behavior, though haven’t looked more into #17487 (review) and if it applies here. Thoughts?

    Sjors commented at 7:05 pm on January 25, 2023:

    Calling erase(it) on a std::unordered_map returns an iterator for the next element, which is was itOld was. So for erase == true the behaviour should be unchanged.

    I don’t think @sipa’s comment applies here, because the original code didn’t update it twice, but used a temporary variable itOld instead. But even it did apply, that original was also well defined - at least since we moved to c++11. So this refactor goes from something that is well defined but ugly, to something nice.


    jonatack commented at 5:28 pm on January 26, 2023:

    Thanks @Sjors, agree, as a refresher I just verified this with https://godbolt.org/z/z54n51P5h

     0#include <iostream>
     1#include <iterator>
     2#include <string>
     3#include <unordered_map>
     4
     5int main()
     6{
     7    auto print_key_value = [](const auto& key, const auto& value)
     8    {
     9        std::cout << "Key:[" << key << "] Value:[" << value << "]\n";
    10    };
    11
    12    std::unordered_map<std::string, std::string> m {
    13        {"RED","#FF0000"},
    14        {"GREEN","#00FF00"},
    15        {"BLUE","#0000FF"},
    16        {"BLACK", "#000000"},
    17        {"WHITE", "#FFFFFF"},
    18    };
    19    const auto m2{m};
    20
    21    std::cout << "Iterate on unordered_map using std::next\n";
    22    for (auto it = m.begin(); it != m.end();) {
    23        print_key_value(it->first, it->second);
    24        it = std::next(it);
    25    }
    26
    27    std::cout << "\nSame, using increment iterator and erase\n";
    28    for (auto it = m.begin(); it != m.end();) {
    29        print_key_value(it->first, it->second);
    30        auto itOld = it++;
    31        m.erase(itOld);
    32    }
    33    if (m.empty()) {
    34        std::cout << "Unordered_map is now empty\n";
    35    }
    36
    37    m = m2;
    38
    39    std::cout << "\nSame, using only erase\n";
    40    for (auto it = m.begin(); it != m.end();) {
    41        print_key_value(it->first, it->second);
    42        it = m.erase(it);
    43    }
    44    if (m.empty()) {
    45        std::cout << "Unordered_map is now empty\n";
    46    }
    47
    48    return 0;
    49}
    
     0Iterate on unordered_map using std::next
     1Key:[BLACK] Value:[#000000]
     2Key:[BLUE] Value:[#0000FF]
     3Key:[WHITE] Value:[#FFFFFF]
     4Key:[GREEN] Value:[#00FF00]
     5Key:[RED] Value:[#FF0000]
     6
     7Same, using increment iterator and erase
     8Key:[BLACK] Value:[#000000]
     9Key:[BLUE] Value:[#0000FF]
    10Key:[WHITE] Value:[#FFFFFF]
    11Key:[GREEN] Value:[#00FF00]
    12Key:[RED] Value:[#FF0000]
    13Unordered_map is now empty
    14
    15Same, using only erase
    16Key:[BLACK] Value:[#000000]
    17Key:[BLUE] Value:[#0000FF]
    18Key:[RED] Value:[#FF0000]
    19Key:[GREEN] Value:[#00FF00]
    20Key:[WHITE] Value:[#FFFFFF]
    21Unordered_map is now empty
    
  112. in src/test/coins_tests.cpp:160 in 1d7935b45a
    153@@ -154,9 +154,16 @@ void SimulationTest(CCoinsView* base, bool fake_best_block)
    154             bool test_havecoin_after = InsecureRandBits(2) == 0;
    155 
    156             bool result_havecoin = test_havecoin_before ? stack.back()->HaveCoin(COutPoint(txid, 0)) : false;
    157-            const Coin& entry = (InsecureRandRange(500) == 0) ? AccessByTxid(*stack.back(), txid) : stack.back()->AccessCoin(COutPoint(txid, 0));
    158+
    159+            // Infrequently, test usage of AccessByTxid instead of AccessCoin - the
    160+            // former just delegates to the latter and returns the first unspent in a txn.
    161+            const Coin& entry = (InsecureRandRange(500) == 0) ?
    


    jonatack commented at 5:20 pm on January 20, 2023:

    In the second commit 6d8affca96c7a34f5f104 while adding the explanatory comment, it may be worth describing why it is run infrequently. If I run ./src/test/test_bitcoin -t coins_tests with the following change, the test run appears to hang (or take a very long time) with this loop being run NUM_SIMULATION_ITERATIONS = 40000 iterations.

    0            // Infrequently, test usage of AccessByTxid instead of AccessCoin - the
    1            // former just delegates to the latter and returns the first unspent in a txn.
    2-            const Coin& entry = (InsecureRandRange(500) == 0) ?
    3-                AccessByTxid(*stack.back(), txid) : stack.back()->AccessCoin(COutPoint(txid, 0));
    4+            const Coin& entry = AccessByTxid(*stack.back(), txid);
    

    6d8affca96c7a34f5f104 nit, can drop redundant brackets while touching the line

    0            const Coin& entry = InsecureRandRange(500) == 0 ?
    
  113. in src/test/coins_tests.cpp:58 in 1d7935b45a
    52@@ -53,9 +53,9 @@ class CCoinsViewTest : public CCoinsView
    53 
    54     uint256 GetBestBlock() const override { return hashBestBlock_; }
    55 
    56-    bool BatchWrite(CCoinsMap& mapCoins, const uint256& hashBlock) override
    57+    bool BatchWrite(CCoinsMap& mapCoins, const uint256& hashBlock, bool erase = true) override
    58     {
    59-        for (CCoinsMap::iterator it = mapCoins.begin(); it != mapCoins.end(); ) {
    60+        for (CCoinsMap::iterator it = mapCoins.begin(); it != mapCoins.end(); it = erase ? mapCoins.erase(it) : ++it) {
    


    jonatack commented at 5:37 pm on January 20, 2023:

    In the third commit 2c3cbd6c007a588e667, it looks like #17487 (review) by @sipa would apply here as well.

    0        for (CCoinsMap::iterator it = mapCoins.begin(); it != mapCoins.end(); it = erase ? mapCoins.erase(it) : std::next(it)) {
    
  114. jamesob commented at 5:53 pm on January 20, 2023: member
    @jonatack thanks for the look, but given my level of interest and time, I’m not going to be doing a lot of shuffling on this PR. Unless there are problems, I’m going to leave it as-is and may incorporate your smaller suggestions if I have to retouch.
  115. Sjors commented at 5:39 pm on January 21, 2023: member
    Rebase looks innocent at first glance, but this has been so long ago that I should re-review the change.
  116. sipa commented at 10:01 pm on January 24, 2023: member

    ACK 1d7935b45ac61791399989effc18aece8b368fbb

    I’ve reviewed the code carefully, and am happy about the extent of behaviors tested. I’ve also experimented by (combined with #25325) by adding code that regularly triggers a non-erasing sync of the active chain, while reindexing a mainnet node, running in valgrind, and running with -sanitizers=undefined,address, including restarts and killall -KILL bitcoinds. No problems observed.

  117. Sjors commented at 1:29 pm on January 25, 2023: member

    I added a variant of #15265 on top of this (https://github.com/Sjors/bitcoin/commit/4a2c802d6e2b1bdcf1d8546f3834344436a94522) and am currently running IBD on mainnet with a node pruned to 10GB with 16GB of dbcache. I also enabled -sanitizers=undefined,address and set --enable-c++20 for some variation.

    The original PR found there was no performance benefit, but it makes a good stress test. So far so good (365K blocks synced, multiple flushes, a manual node restart and a kill -9 after several prune events). It should take about 2 days on my AMD Ryzen 7950x.

    I also noticed that, as expected, the cache size slightly decreases at every prune event, since we’re deleting spent coins.

  118. jamesob commented at 1:35 pm on January 25, 2023: member
    @Sjors it’s worth noting that based on benchmarks I’ve done in the past (https://github.com/bitcoin/bitcoin/pull/17487#issuecomment-557595582), most of the performance benefits are observable on spinning disks. And certainly in the case of assumeutxo, when the flush is done before a sizable period of IBD, the difference is probably more pronounced than during normal tip maintenance. Edit: I see that your bench includes cases like this.
  119. in src/coins.cpp:252 in 79cedc36af outdated
    248@@ -233,12 +249,29 @@ bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn
    249 }
    250 
    251 bool CCoinsViewCache::Flush() {
    252-    bool fOk = base->BatchWrite(cacheCoins, hashBlock);
    253+    bool fOk = base->BatchWrite(cacheCoins, hashBlock, /*erase=*/ true);
    


    Sjors commented at 2:30 pm on January 25, 2023:
    79cedc36afe2e72e42839d861734d73d545d21b8: nit drop space /*erase=*/true
  120. sipa commented at 4:47 pm on January 25, 2023: member
    Perhaps for a follow-up, but we shouldn’t forget: add a call to coins_view_cache.Sync() in src/test/fuzz/coins_view.cpp.
  121. sipa commented at 4:54 pm on January 25, 2023: member

    Regarding performance, I think #15265 is a bit naive, and we shouldn’t conclude too much based on benchmarks of just that.

    The trade-off when syncing without wiping the cache is that on the next flush/sync, there will be more non-dirty entries in the cache that need to be iterated over. That iterating has a non-zero cost which is only a fraction of that of actually writing something to disk, but for millions of them, it still adds up. A better strategy is probably keeping some statistics on the ratio between dirty/nondirty entries in the cache, and decide whether to sync/flush based on that.

  122. in src/coins.h:294 in 79cedc36af outdated
    291      */
    292     bool Flush();
    293 
    294+    /**
    295+     * Push the modifications applied to this cache to its base while retaining
    296+     * the contents of this cache (except for spent coins, which we erase).
    


    Sjors commented at 6:55 pm on January 25, 2023:
    Is it strictly necessary or just nice to have to erase spent coins?
  123. achow101 commented at 7:02 pm on January 25, 2023: member
    ACK 1d7935b45ac61791399989effc18aece8b368fbb
  124. jamesob commented at 7:05 pm on January 25, 2023: member

    Perhaps for a follow-up, but we shouldn’t forget: add a call to coins_view_cache.Sync() in src/test/fuzz/coins_view.cpp.

    I’ll leave this as-is to avoid disturbing the delicate congregation of ACKs (sounds like a nature documentary), but can do a quick follow-up.

  125. in src/coins.cpp:263 in 79cedc36af outdated
    259+bool CCoinsViewCache::Sync()
    260+{
    261+    bool fOk = base->BatchWrite(cacheCoins, hashBlock, /*erase=*/ false);
    262+    // Instead of clearing `cacheCoins` as we would in Flush(), just clear the
    263+    // FRESH/DIRTY flags of any coin that isn't spent.
    264+    for (auto it = cacheCoins.begin(); it != cacheCoins.end(); ) {
    


    Sjors commented at 7:19 pm on January 25, 2023:
    Perhaps in a future refactor we could avoid this second loop by moving this into base->BatchWrite. Assuming that improves performance.

    andrewtoth commented at 1:28 pm on July 16, 2024:
    This is done in #28280.
  126. in src/coins.cpp:253 in 79cedc36af outdated
    248@@ -233,12 +249,29 @@ bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn
    249 }
    250 
    251 bool CCoinsViewCache::Flush() {
    252-    bool fOk = base->BatchWrite(cacheCoins, hashBlock);
    253+    bool fOk = base->BatchWrite(cacheCoins, hashBlock, /*erase=*/ true);
    254     cacheCoins.clear();
    


    Sjors commented at 7:32 pm on January 25, 2023:

    So this was redundant all along? Since BatchWrite was already deleting all the elements.

    Seems so, since adding the following line above doesn’t break any tests:

    0if(cacheCoins.size() != 0) throw std::logic_error("Not all cached coins were erased");
    
  127. Sjors commented at 8:26 pm on January 25, 2023: member
    I still need to review CCoinsViewCache::BatchWrite which IIUC only gets involved when there’s a stack of two CCoinsViewCache. See IRC chat of January 25th around 19:20 UTC.
  128. in src/coins.cpp:179 in 1d7935b45a
    175@@ -176,8 +176,10 @@ void CCoinsViewCache::SetBestBlock(const uint256 &hashBlockIn) {
    176     hashBlock = hashBlockIn;
    177 }
    178 
    179-bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn) {
    180-    for (CCoinsMap::iterator it = mapCoins.begin(); it != mapCoins.end(); it = mapCoins.erase(it)) {
    181+bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn, bool erase) {
    


    Sjors commented at 9:13 am on January 26, 2023:

    I’m starting to think we don’t need this at all, because the change to CCoinsViewDB::BatchWrite is enough.

    #15265 had an assert in CCoinsViewCache::BatchWrite:

    0assert(erase_from_map); // We do not implement writes to a cache where we don't erase from the child!
    

    jamesob commented at 5:45 pm on January 26, 2023:
    (see my comment here: #17487 (comment))
  129. Sjors commented at 9:32 am on January 26, 2023: member

    I think this PR can be simplified by not touching the body of CCoinsViewDB::BatchWrite (other than adding an assert).

    My understanding from yesterdays IRC discussion is that the only time CCoinsViewCache::BatchWrite is called is when you have two CCoinsViewCache’s stacked on top of each other.

    The big Flush() only happens in Chainstate::FlushStateToDisk. This involves CoinsTip() which is a CCoinsViewCache on top of a CCoinsViewDB. So in base-> BatchWrite is called on the CCoinsViewDB version.

    Same for FlushSnapshotToDisk.

    There’s also a Flush() call in Chainstate::ReplayBlocks(). This also involves a CCoinsViewCache on top of a CCoinsViewDB, but since this is only used at startup time, it’s tiny anyway.

    The only two places where we have two CCoinsViewCache objects stacked on top of each other are ConnectTip and DisconnectTip(). But here the top-most CCoinsViewCache is tiny and is not worth preserving.

    Unless something changes in AssumeUTXO followup PRs?

  130. jamesob commented at 5:41 pm on January 26, 2023: member

    I think this PR can be simplified by not touching the body of CCoinsViewDB::BatchWrite (other than adding an assert).

    I don’t think this is workable for my purposes. My motivation for filing the PR was for use in assumeutxo, where we load in a large number of coins to a CCoinsViewCache hierarchy. After all coins are loaded, the coins cache is committed to disk, which results in a BatchWrite() call on a CCoinsViewDB object. If that method does not support erase=false, we will evict all freshly-loaded coins from the cache.

    But perhaps you meant that we might be able to get away with not modifying CCoinsViewCache::BatchWrite() in this changeset. In a strict sense this may be possible, but then we would break the interface for CCoinsView, since one of its subclasses would support that parameter and another wouldn’t. I think this would complicate the implementation somewhat, and might lead to surprising usages.

    It is plausible to me that we could still add the erase parameter to all BatchWrite()s to maintain interface compatbility, but then do some kind of conditional assertion that erase != true if the type of base is CCoinsViewCache, but from what I’m reading, C++ doesn’t like that kind of dynamic type-checking. This would also complicate our tests, which randomly build cache hierarchies and then call BatchWrite willy-nilly.


    So to sum up, I think what you’re suggesting is possible, but IMO it’s kind of a wash in terms of the complexity it might introduce. I’m personally not interested in spending the time to do the experimentation since my interest in assumeutxo (and subsequently this change) is waning. If others are interested in picking up this change and trying it, you have my full support!

  131. Sjors commented at 10:17 am on January 27, 2023: member

    I think this PR can be simplified by not touching the body of CCoinsViewDB::BatchWrite

    Oops, my comment was the wrong way around. It should have said not touching the body of CCoinsViewCache. That’s what the assert achieved in #15265 (in CCoinsViewCache, not in the base class).

    I agree there’s a trade-off for where you’re adding complexity: this PR makes CCoinsViewDB::BatchWrite more complicated, but the alternative might add make the tests more complicated.

    I’ll continue to review the changed CCoinsViewCache method.

  132. Sjors approved
  133. Sjors commented at 1:15 pm on January 27, 2023: member

    tACK 1d7935b45ac61791399989effc18aece8b368fbb

    IBD with debug stuff ~still needs half a day or so, but I’ll update things here if it fails~ is done!

  134. jamesob commented at 2:26 pm on January 30, 2023: member

    Any updates @Sjors?

    Is there more review needed here, or is this looking ready to go?

    image

  135. fanquake merged this on Jan 30, 2023
  136. fanquake closed this on Jan 30, 2023

  137. sidhujag referenced this in commit 8ff3d69a6a on Jan 30, 2023
  138. achow101 referenced this in commit c8cb62272e on Jan 30, 2023
  139. sidhujag referenced this in commit 04f83d37fb on Jan 30, 2023
  140. jamesob referenced this in commit d6f54dac8d on Jan 31, 2023
  141. jamesob referenced this in commit d2a9939c8a on Feb 6, 2023
  142. jamesob referenced this in commit 129d8f55ee on Feb 22, 2023
  143. jamesob referenced this in commit 17f636883d on May 5, 2023
  144. bitcoin locked this on Jan 30, 2024
  145. bitcoin unlocked this on Jul 16, 2024
  146. Prabhat1308 referenced this in commit d6069cb8d6 on Jul 16, 2024
  147. achow101 referenced this in commit 27a770b34b on Aug 8, 2024
  148. kwvg referenced this in commit 2a5fc3148e on Aug 9, 2024
  149. kwvg referenced this in commit 7e798d9808 on Aug 10, 2024
  150. kwvg referenced this in commit e86dce77a2 on Aug 10, 2024
  151. PastaPastaPasta referenced this in commit 5d852c3e23 on Aug 13, 2024
  152. PastaPastaPasta referenced this in commit b119bd4e0d on Aug 13, 2024
  153. paplorinc referenced this in commit d0f52dce9e on Aug 14, 2024
  154. paplorinc referenced this in commit c98d57900e on Aug 14, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-21 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me