andrewtoth
commented at 10:38 pm on August 16, 2023:
contributor
Since #17487 we no longer need to clear the coins cache when syncing to disk. A warm coins cache significantly speeds up block connection, and only needs to be fully flushed when nearing the dbcache limit.
For frequent pruning flushes there’s no need to empty the cache and kill connect block speed. However, simply using Sync in place of Flush actually slows down a pruned full IBD with a high dbcache value. This is because as the cache grows, sync takes longer since every coin in the cache is scanned to check if it’s dirty. For frequent prune flushes and a large cache this constant scanning starts to really slow IBD down, and just emptying the cache on every prune becomes faster.
To fix this, we can add two pointers to each cache entry and construct a doubly linked list of dirty entries. We can then only iterate through all dirty entries on each Sync, and simply clear the pointers after.
With this approach a full IBD with dbcache=16384 and prune=550 was 32% faster than master. For default dbcache=450 speedup was ~9%. All benchmarks were run with stopatheight=800000.
| | prune | dbcache | time | max RSS | speedup |
|---|---|---|---|---|---|
| master | 550 | 16384 | 8:52:57 | 2,417,464k | - |
| branch | 550 | 16384 | 6:01:00 | 16,216,736k | 32% |
| branch | 550 | 450 | 8:05:08 | 2,818,072k | 8.8% |
| master | 10000 | 5000 | 8:19:59 | 2,962,752k | - |
| branch | 10000 | 5000 | 5:56:39 | 6,179,764k | 28.8% |
| master | 0 | 16384 | 4:51:53 | 14,726,408k | - |
| branch | 0 | 16384 | 4:43:11 | 16,526,348k | 2.7% |
| master | 0 | 450 | 7:08:07 | 3,005,892k | - |
| branch | 0 | 450 | 6:57:24 | 3,013,556k | 2.6% |
While the two pointers add memory to each cache entry, they did not slow down IBD; non-pruned IBD times were similar on this branch and master. When I performed the initial IBD, the full UTXO set could be held in memory at the max dbcache value, so a non-pruned IBD to tip with max dbcache used 12% more memory on this branch, yet was also 2.7% faster. For smaller dbcache values the dbcache limit is respected, so this branch does not consume more memory, and the potentially more frequent flushes were not significant enough to cause any slowdown.
For reviewers, the commits in order do the following:
- The first 4 commits encapsulate all accesses to flags on cache entries, and the 5th makes the flags private.
- Commits *refactor: add CoinsCachePair alias* and *coins: call ClearFlags in CCoinsCacheEntry destructor* create the linked list head nodes and the cache entry self references, and pass them into AddFlags.
- Commit *coins: track flagged cache entries in linked list* actually adds entries to the linked list when they are flagged DIRTY or FRESH, and removes them when they are destroyed or their flags are cleared manually. However, the linked list is not yet used anywhere.
- Commit *test: add cache entry linked list tests* adds unit tests for the linked list.
- Commit *coins: pass linked list of flagged entries to BatchWrite* uses the linked list to iterate through DIRTY entries instead of iterating the entire coins cache.
- Commit *validation: don't erase coins cache on prune flushes* uses Sync instead of Flush for pruning flushes, so the cache is no longer cleared.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#30326 (optimization: Reduce cache lookups in CCoinsViewCache::FetchCoin by paplorinc)
#28216 (fuzz: a new target for the coins database by darosior)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
in src/coins.h:109 (commit dbd663c446, outdated)
```diff
@@ -103,8 +103,16 @@ class Coin
  */
 struct CCoinsCacheEntry
 {
+private:
+    //! These are used to create a doubly linked list of dirty entries. They are
+    //! set in SetFlags and unset in ClearFlags.
+    CCoinsCacheEntry* m_prev{nullptr};
```
Actually, it seems iterators may get invalidated when a map rehash occurs, but pointers to elements don’t.
I think you can however store pointers to std::pair<const COutPoint, CoinsCacheEntry> (the value_type of the map). Would that help alleviate the need for m_outpoint?
andrewtoth
commented at 2:28 pm on August 17, 2023:
Brilliant, thanks! Was just trying this last night, this looks like the way. Will give it a shot.
andrewtoth
commented at 8:21 pm on August 17, 2023:
DrahtBot added the label
CI failed
on Aug 17, 2023
DrahtBot removed the label
CI failed
on Aug 18, 2023
andrewtoth force-pushed
on Aug 18, 2023
andrewtoth force-pushed
on Aug 18, 2023
andrewtoth force-pushed
on Aug 19, 2023
andrewtoth force-pushed
on Aug 19, 2023
andrewtoth force-pushed
on Aug 20, 2023
andrewtoth force-pushed
on Aug 23, 2023
andrewtoth force-pushed
on Aug 24, 2023
andrewtoth force-pushed
on Aug 24, 2023
jamesob
commented at 3:14 pm on August 24, 2023:
contributor
Concept ACK! I’ll try to repro the bench results in the next week or so.
DrahtBot added the label
CI failed
on Aug 24, 2023
DrahtBot removed the label
CI failed
on Aug 25, 2023
0xB10C
commented at 7:52 am on August 30, 2023:
contributor
@andrewtoth Not sure if you have seen #20827 which is a different approach but has a similar effect.
andrewtoth
commented at 7:13 pm on August 30, 2023:
contributor
@0xB10C indeed I have seen that one. I believe they complement each other. #20827 reduces IBD time for larger prune values and this PR reduces it even more for smaller prune values. I should benchmark this on top of that one.
Also, I’m curious how you are performing your benchmarks and generating all those nice graphs. Do you mind sharing what tools you use?
0xB10C
commented at 6:35 am on August 31, 2023:
contributor
Also, I’m curious how you are performing your benchmarks and generating all those nice graphs. Do you mind sharing what tools you use?
At the time I had a spare and otherwise idle machine set up I could use. I don’t have the bash script at hand but IIRC I had a merge base (MB) and a pull-request (PR) binary I’d run with different prune and dbcache configurations. After each run, the script would rename and move the debug log (with extra -debug=coindb -debug=prune) to a separate folder. After all runs were done, I loaded the debug logs into the following Jupyter notebooks to parse the debug logs and to generate the graphs. For anyone revisiting this, these are the relevant comments #20827 (comment), #20827 (comment), and #20827 (comment) for my setup and results.
I’ve uploaded the jupyter nodebooks here https://gist.github.com/0xB10C/89c2903290cfb1840792d41dcd079646. The first was used for a partial IBD from 500k-600k and the second one with a full IBD. Both times syncing from a local node on the same host. The notebooks expect the debug log files with the name debug_{binary}_{cset}_{run}.log where {binary} is either MB or PR, {cset} is an integer specifying the configuration set (pruning and dbcache parameter), and {run} is an integer numbering the run of this binary-cset combination if I did multiple. Feel free to remix and share in any way you want. Keep in mind that parsing logs is quite fragile as log message format can change. Feel free to reach out if there are questions.
DrahtBot added the label
CI failed
on Sep 4, 2023
DrahtBot removed the label
CI failed
on Sep 5, 2023
jamesob
commented at 5:47 pm on September 20, 2023:
contributor
Bench running now, should have some results in the next day or so.
jamesob
commented at 10:25 am on September 27, 2023:
contributor
Local IBD of 30_000 blocks with -prune=550 from a Taproot-enabled height didn’t show any difference. I didn’t have -log=prune enabled, so my bench debug.log was basically useless. Rerunning…
DrahtBot added the label
Needs rebase
on Nov 8, 2023
andrewtoth force-pushed
on Nov 30, 2023
DrahtBot removed the label
Needs rebase
on Dec 1, 2023
andrewtoth force-pushed
on Dec 3, 2023
andrewtoth
commented at 6:30 pm on December 3, 2023:
contributor
Rebased, ran the following benchmark using hyperfine. Sync to block 800k 3 times on master commit and the last commit of this branch (I updated the commit recently, but no code changes only commit order changes to satisfy CI), dbcache=4000 and prune=550:
The second commit is this branch, so it runs 24% faster than master.
@jamesob I suppose 30k blocks is not enough to appreciate the effect of syncing with a full cache?
maflcko
commented at 11:46 am on December 5, 2023:
One thing I don’t like about modernize-use-equals-default is that it makes aggregate initialization possible again, when a developer disabled it by declaring a constructor.
Though, now that C++20 will be required soon, I guess it can be enabled. C.f.
```cpp
struct A {
    // A() {}; // "main" fn fails to compile under C++17 and 20
    A() = default; // "main" fn fails to compile under C++20
    int b{};
};

int main() { (void)A{1}; }
```
martinus
commented at 12:00 pm on December 5, 2023:
martinus
commented at 7:25 pm on December 4, 2023:
I think SetFlags is a bit of a confusing name, since it's really an AddFlags: it doesn't simply set the flags, it ORs them together.
Alternatively (and I think this is the better solution), really use m_flags = flags instead of m_flags |= flags.
andrewtoth
commented at 2:15 am on December 5, 2023:
Only 2 of the 6 previous flags assignments (aside from = 0) used simple assignment; the other 4 use OR assignment. For the 2 that use simple assignment, we know logically that the entry must have just been created, so its flags are 0 and we are not overwriting any existing flags. Therefore an OR assignment in those 2 cases is logically equivalent.
So if we were to go with your second suggestion, we would need to modify behavior to make sure we assign the new flags OR'd with the current flags so we don't overwrite existing flags, or create separate Set and Add methods. I think that would make this patch more cumbersome.
It seems to me your first suggestion is the path of least resistance. Updated to AddFlags.
It seems to me your first suggestion would be the path of least resistance. Updated to AddFlags.
martinus
commented at 7:32 pm on December 4, 2023:
Not sure if that is possible in the call, but when emplace doesn’t create a new object because the outpoint is already present, then previously the flags didn’t change. Now, the flag DIRTY is always added, regardless if the outpoint was already present in cacheCoins or not.
in src/coins.cpp:234 (commit ba09f5039c, outdated)
```diff
-            if (!(it->second.flags & CCoinsCacheEntry::FRESH && it->second.coin.IsSpent())) {
+            if (!(it->second.GetFlags() & CCoinsCacheEntry::FRESH && it->second.coin.IsSpent())) {
                 // Create the coin in the parent cache, move the data up
                 // and mark it as dirty.
-                CCoinsCacheEntry& entry = cacheCoins[it->first];
+                itUs = cacheCoins.emplace(std::piecewise_construct, std::forward_as_tuple(it->first), std::tuple<>()).first;
```
martinus
commented at 7:40 pm on December 4, 2023:
I’d use itUs = cacheCoins.try_emplace(it->first).first; instead, it’s just less code and does the same thing
in src/coins.cpp:317 (commit ba09f5039c, outdated)
```diff
+            for (auto it{m_flagged_head.second.Next()}; it != nullptr; ) {
                 if (it->second.coin.IsSpent()) {
+                    // We need to get the next entry before we erase this entry.
+                    // Since cacheCoins owns the entry, it will be deallocated before we
+                    // can access the next entry if we erase first.
+                    const auto next_entry{it->second.Next(/*clear_flags=*/true)};
```
martinus
commented at 8:12 pm on December 4, 2023:
nit, you could move that line in front of the if, then remove the else, like so:
andrewtoth
commented at 3:03 am on December 5, 2023:
contributor
@martinus thank you very much for your review! I updated the PR with all your suggestions.
andrewtoth force-pushed
on Dec 5, 2023
DrahtBot added the label
CI failed
on Dec 5, 2023
DrahtBot removed the label
CI failed
on Dec 5, 2023
martinus
commented at 5:55 am on December 5, 2023:
contributor
I didn’t think this through yet, but this might be an alternative way to achieve the same thing: Instead of the double linked list, split the cacheCoins map up into 4 different maps, one map for each flag setting, e.g. as a std::array<CCoinMap, 4> m_cache_coins_per_flag;. Index is the flag, e.g. m_cache_coins_per_flag[FRESH | DIRTY] to get the map with all entries that are fresh & dirty. Instead of setting a flag & linking the list, use the unordered_map’s auto node = extract(...) and insert(std::move(node)) to move the entry to the corresponding map. This only relinks the unordered_map node and causes no allocation, pointers to the entry stay valid.
This approach would have a few advantages:
- Less memory usage: no m_flags member is needed (which actually required 8 bytes due to alignment), no 2 pointers are needed for interlinking, and multiple maps mean the overall bucket sizes stay smaller.
- Processing the maps might be faster: iteration is fast because e.g. skipping non-dirty entries is trivial (they are in a different map). There might be a hashing overhead from extract() and insert(), though.

For this to work, all 4 maps need to use the same m_cache_coins_memory_resource (insert() into a different map whose allocator doesn't compare equal is undefined behavior).
andrewtoth
commented at 1:19 pm on December 5, 2023:
contributor
@martinus something similar was mentioned in #15265 (comment). However, with your approach every access would require 4 lookups.
martinus
commented at 7:43 pm on December 5, 2023:
contributor
@martinus something similar was mentioned in #15265 (comment). However, with your approach every access would require 4 lookups.
Ha, of course; I didn’t think about that at all.
andrewtoth force-pushed
on Dec 13, 2023
DrahtBot added the label
CI failed
on Jan 12, 2024
andrewtoth force-pushed
on Mar 8, 2024
andrewtoth force-pushed
on Mar 8, 2024
DrahtBot removed the label
CI failed
on Mar 8, 2024
andrewtoth force-pushed
on Mar 14, 2024
andrewtoth force-pushed
on Mar 14, 2024
andrewtoth force-pushed
on Mar 14, 2024
DrahtBot added the label
CI failed
on Mar 14, 2024
DrahtBot
commented at 6:08 pm on March 14, 2024:
contributor
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot removed the label
CI failed
on Mar 14, 2024
andrewtoth force-pushed
on Mar 16, 2024
in src/coins.h:110 (commit 56faed97cd, outdated)
```diff
@@ -103,6 +106,14 @@ class Coin
  */
 struct CCoinsCacheEntry
 {
+private:
+    //! These are used to create a doubly linked list of flagged entries. They
```
In commit “coins: track dirty cache entries in doubly linked list”
Use a C++17 structured binding:
```cpp
auto [it, inserted] = cacheCoins.emplace(
```
in src/coins.cpp:54 (commit 75a248f770, outdated)
```diff
@@ -51,7 +51,7 @@ CCoinsMap::iterator CCoinsViewCache::FetchCoin(const COutPoint &outpoint) const
     if (ret->second.coin.IsSpent()) {
         // The parent only has an empty entry for this outpoint; we can consider our
         // version as fresh.
-        ret->second.flags = CCoinsCacheEntry::FRESH;
+        ret->second.AddFlags(CCoinsCacheEntry::FRESH, *ret, m_flagged_head);
```
In commit “coins: track dirty cache entries in doubly linked list”
Should FRESH-but-not-DIRTY be flagged? I believe those correspond to spent but unmodified coins fetched from the parent cache.
If the rule is flagged == dirty, then perhaps it makes sense to replace AddFlags and ClearFlags with a single SetFlags which knows whether the new state is supposed to be in the list or not (moving the responsibility of flagged-or-not into that function instead of the caller). EDIT: I notice you used to have a SetFlags, actually, but my comment is more about where the responsibility lies for deciding whether an entry ought to be flagged than about the actual function names.
andrewtoth
commented at 6:32 pm on March 20, 2024:
Should FRESH-but-not-DIRTY be flagged? I believe those correspond to spent but unmodified coins fetched from the parent cache.
Yes, but only because in this case they are also spent. We erase all spent coins from the cache after doing a non-erasing batch write. There would be no way to track spent-but-not-DIRTY otherwise except by scanning the entire cache.
Since the only case where we have FRESH-but-not-DIRTY is where the coin is spent, it makes sense to me to just add this to the linked list as well. Before proposing this PR I did call the linked list head m_dirty_head, but after not finding an efficient way to remove non-dirty spent entries, I decided to call it m_flagged_head.
If this makes sense to you I can add a comment here explaining this.
see #28280 (review), I think that FRESH-but-not-DIRTY coins may not exist.
andrewtoth force-pushed
on Mar 21, 2024
andrewtoth
commented at 2:11 pm on March 21, 2024:
contributor
@0xB10C rebased after #20827 was merged, and ran the following benchmark:
```shell
hyperfine --export-markdown ~/bench.md --show-output --parameter-list commit 015ac13dcc964a31ef06dfdb565f88f901607f0e,25bdbc7f340884b3ab871c7948a4f2693037b871 --prepare 'git checkout {commit} && make -j$(nproc) src/bitcoind; sync; sudo /sbin/sysctl vm.drop_caches=3; rm -rf ~/.bitcoin/blocks; rm -rf ~/.bitcoin/chainstate;' -M 3 './src/bitcoind -dbcache=4000 -prune=550 -printtoconsole=0 -connect=<local network node ip> -stopatheight=800000'
...

Benchmark 1: ./src/bitcoind -dbcache=4000 -prune=550 -printtoconsole=0 -connect=<local network node ip> -stopatheight=800000 (commit = 015ac13dcc964a31ef06dfdb565f88f901607f0e)

  Time (mean ± σ):     27241.365 s ± 33.693 s    [User: 34791.898 s, System: 7357.098 s]
  Range (min … max):   27202.978 s … 27266.037 s    3 runs

Benchmark 2: ./src/bitcoind -dbcache=4000 -prune=550 -printtoconsole=0 -connect=<local network node ip> -stopatheight=800000 (commit = 25bdbc7f340884b3ab871c7948a4f2693037b871)

  Time (mean ± σ):     22153.568 s ± 37.663 s    [User: 27818.681 s, System: 4309.878 s]
  Range (min … max):   22119.682 s … 22194.118 s    3 runs

Summary
  ./src/bitcoind -dbcache=4000 -prune=550 -printtoconsole=0 -connect=192.168.2.13 -stopatheight=800000 (commit = 25bdbc7f340884b3ab871c7948a4f2693037b871) ran
    1.23 ± 0.00 times faster than ./src/bitcoind -dbcache=4000 -prune=550 -printtoconsole=0 -connect=192.168.2.13 -stopatheight=800000 (commit = 015ac13dcc964a31ef06dfdb565f88f901607f0e)
```
So this is 23% faster than master when using dbcache=4000, prune=550. (I just rebased again but 015ac13dcc964a31ef06dfdb565f88f901607f0e is master and 25bdbc7f340884b3ab871c7948a4f2693037b871 is my branch).
@sipa thank you for your review! Addressed your nits and added more comments about flagged entries.
DrahtBot added the label
Needs rebase
on Mar 22, 2024
andrewtoth force-pushed
on Apr 3, 2024
DrahtBot removed the label
Needs rebase
on Apr 3, 2024
DrahtBot added the label
CI failed
on Apr 3, 2024
andrewtoth
commented at 9:26 pm on April 4, 2024:
contributor
Rebased and ran some faster benchmarks, 2 each but only until 400k. Will do some more going to 800k.
DrahtBot removed the label
CI failed
on Apr 4, 2024
andrewtoth
commented at 4:03 am on April 9, 2024:
contributor
Ran benchmarks to 800k blocks, 2 times each. As expected, results are very good when running pruned. When running non-pruned with default dbcache, however, master was 1% faster. I think this slowdown is negligible though and the improvements to syncing with pruning enabled make this worth the changes here. Also, potential improvements like #28945 will make up for it.
| | prune | dbcache | mean time | speedup |
|---|---|---|---|---|
| master | 550 | 16384 | 7h 29m | - |
| branch | 550 | 16384 | 4h 56m | 34% |
| branch | 550 | 450 | 6h 33m | 12% |
| master | 0 | 16384 | 3h 34m | - |
| branch | 0 | 16384 | 3h 29m | 2% |
| master | 0 | 450 | 5h 34m | 1% |
| branch | 0 | 450 | 5h 40m | - |
hernanmarino
commented at 11:37 pm on April 12, 2024:
contributor
Concept ACK. I'd really like to see this merged.
DrahtBot added the label
CI failed
on Apr 21, 2024
DrahtBot removed the label
CI failed
on Apr 23, 2024
achow101
commented at 8:21 pm on May 13, 2024:
member
For frequent pruning flushes there’s no need to empty the cache and kill connect block speed. However, simply using Sync in place of Flush actually slows down a pruned full IBD with a high dbcache value. This is because as the cache grows, sync takes longer since every coin in the cache is scanned to check if it’s dirty. For frequent prune flushes and a large cache this constant scanning starts to really slow IBD down, and just emptying the cache on every prune becomes faster.
To fix this, we can add two pointers to each cache entry and construct a doubly linked list of dirty entries. We can then only iterate through all dirty entries on each Sync, and simply clear the pointers after.
Perhaps I’m missing something, but it seems to me that the obvious solution to this problem is to have BatchWrite always delete dirty entries and clear the flags on all entries, rather than implementing a whole new data structure? BatchWrite already iterates the entire cache anyways, so it would make sense to me that we can consolidate all of the operations that occur for each cache entry.
achow101 referenced this in commit
d6069cb8d6
on May 13, 2024
@achow101 That works, but it misses the point. If you delete the dirty entries on a prune flush, then those entries need to be re-read from disk when they are (likely soon) spent.
achow101
commented at 8:36 pm on May 13, 2024:
member
@achow101 That works, but it misses the point. If you delete the dirty entries on a prune flush, then those entries need to be re-read from disk when they are (likely soon) spent.
Sorry, I meant clear the Dirty and spent ones. Basically just move the stuff that Sync() does into the loop that BatchWrite is doing.
DrahtBot added the label
Needs rebase
on May 13, 2024
andrewtoth
commented at 2:32 am on May 14, 2024:
contributor
@achow101 I see what you mean, clear the flags and delete spent entries in the loop in BatchWrite. That would reduce each Sync from 2 full scans of the coinsCache to 1. It still scans the entire coinsCache, while the current change here only loops through flagged entries. Will experiment with how that performs, and possibly combining both ideas to do a single loop of only the flagged entries.
Am I right that after this change (or alternatives which let BatchWrite just iterate dirty/fresh entries), it would be possible to (perhaps dramatically) reduce the periodic flush time, so that after IBD, we basically always have a mostly non-dirty cache? Such a change could be made today of course, but the cost of iterating the entire cache every time would add up quickly.
andrewtoth
commented at 4:44 pm on May 21, 2024:
contributor
Am I right that after this change (or alternatives which let BatchWrite just iterate dirty/fresh entries), it would be possible to (perhaps dramatically) reduce the periodic flush time, so that after IBD, we basically always have a mostly non-dirty cache? Such a change could be made today of course, but the cost of iterating the entire cache every time would add up quickly.
I believe you are right, but I’m unsure what the benefit is that you see. From my understanding, the periodic flushes are solely to prevent data loss in case of unclean shutdown (otherwise, flushing from filling the dbcache or flushing from shutting down cleanly prevent data loss). With a 24 hour periodic flush schedule and #15218, the most that is at risk is ~144 blocks worth of data?
If you are talking about periodic flushes during IBD though, that might be a benefit that could obviate #15218 and properly fix #11600. If we reduce the periodic flush interval to an hour or so, then only an hour or so of IBD data would be at risk of rolling back. This would of course mean that there is less opportunity for the cache to delete FRESH & DIRTY entries on spend so they never have to hit the disk, so it would be less performant. But it would still keep the cache for connecting blocks quickly.
@andrewtoth I was talking about post-IBD. And one advantage to more frequent flushing is reducing the latency spikes you get when a flush is triggered.
andrewtoth
commented at 12:43 pm on June 3, 2024:
contributor
BatchWrite already iterates the entire cache anyways
@achow101 I think this is what you might not be clear about. BatchWrite iterates the entire cache, but this patch does not. After this patch BatchWrite only iterates flagged entries, which is a much smaller subset of the cache.
I tried moving everything in the loop in Sync into BatchWrite, along with cachedCoinsUsage so it can be decremented as we delete spent coins from the child cache, and it looks like it will take many times longer to sync with dbcache=16384 prune=550 than current master. So I don’t think that’s a viable alternative solution to what is proposed here.
andrewtoth force-pushed
on Jun 3, 2024
andrewtoth
commented at 2:17 pm on June 3, 2024:
contributor
Rebased, and added a small optimization based on @achow101’s suggestion. In BatchWrite, we also clear the flags when not erasing if the coin is unspent. This reduces the amount of flagged entries needed to be iterated in Sync to find the spent entries to delete. We can also throw a logic error now if we find a flagged entry in Sync that is not spent.
andrewtoth force-pushed
on Jun 3, 2024
DrahtBot removed the label
Needs rebase
on Jun 3, 2024
achow101
commented at 4:16 pm on June 3, 2024:
member
@achow101 I think this is what you might not be clear about. BatchWrite iterates the entire cache, but this patch does not. After this patch BatchWrite only iterates flagged entries, which is a much smaller subset of the cache.
Yes, that was what I was missing. It just wasn’t clear to me from the description that this was implemented as another performance improvement in addition to clearing dirty and spent UTXOs.
DrahtBot added the label
CI failed
on Jun 3, 2024
andrewtoth force-pushed
on Jun 3, 2024
DrahtBot removed the label
CI failed
on Jun 4, 2024
Would it make sense to use cacheCoins.try_emplace without the forward_as_tuple parts (similarly to what I've tried in a related PR), in a separate commit or PR?
Sorry, can you elaborate on what benefit that would have here, and any possible downsides? I’m not 100% sure what all the differences are between the two methods, so preferred to stick to status quo.
```diff
+     * the parent cache to do a batch write. This is a performance optimization
+     * compared to giving all entries in the cache to the parent and having the
+     * parent scan for only modified entries.
+     *
+     * FRESH entries are tracked because a FRESH-but-not-DIRTY entry can only be
+     * a spent coin that exists in the parent cache. When performing a batch
```
This comment and the implementation further persists the notion of FRESH-but-not-DIRTY coin, which I think is unfortunate, because it should be impossible for them to exist:
The only place in which they could be created is here, while fetching a coin from the parent that is spent. However that is not possible because we’d only get to that point if the GetCoin() call a few lines above returned true - which is impossible for spent coins because according to the abstract CCoinsView interface, GetCoin “Returns true only when an unspent coin was found, which is returned in coin.” This is followed both by CCoinsViewCache::GetCoin, which returns false if spent and CCoinsViewDB::GetCoin which doesn’t return spent coins because they are never saved to disk.
Maybe it would be good to revive #18746 which attempted to fix this but got closed.
With that, this PR could add only DIRTY coins and ignore the FRESH flag, which would be simpler.
With that, this PR could add only DIRTY coins and ignore the FRESH flag, which would be simpler.
I’m not sure how that would be simpler. The two flags are coupled and so adding one and not the other would be a more complex diff. Perhaps I don’t understand what you are suggesting?
I could just replace the single FRESH-but-not-DIRTY check with an assertion and remove this comment about tracking FRESH coins?
Yes, I thought some more and don’t find a great way to simplify the logic. I think it would be enough to remove / amend the comment, maybe mentioning instead that FRESH-but-not-DIRTY coins are not expected to occur but would be handled well if they did.
replace the single FRESH-but-not-DIRTY check
Not sure if that’s the “check” you mean, but replacing this with an assertion was essentially what #18746 did in its latest version. Since that PR was mildly controversial (I think mostly because it originally had the spin of “These coins may be unnecessarily added today; don’t add them anymore” instead of “We already don’t add them. Remove unreachable code”) I think the question of adding an assert should be separated from this PR.
I think that it should be explained in a comment similar as it’s already done in CCoinsViewCache::BatchWrite why we keep unspent coins because this is not obvious.
mzumsande
commented at 6:45 pm on June 27, 2024:
contributor
A non-pruned IBD with full dbcache to tip ended up using 12% more memory, but it was also 2.7% faster somehow.
I’m not sure if I understand this, is this still the case?
Why would a node use more memory after this pull? Shouldn’t the 2 added pointers be included in the dbcache accounting, resulting in more frequent flushes due to the cache filling up faster, but not in an increase in memory when the cache is full?
l0rinc
commented at 7:36 pm on June 27, 2024:
contributor
Also, potential improvements like #28945 will make up for it.
andrewtoth
commented at 7:40 pm on June 27, 2024:
contributor
A non-pruned IBD with full dbcache to tip ended up using 12% more memory, but it was also 2.7% faster somehow.
I’m not sure if I understand this, is this still the case? Why would a node use more memory after this pull? Shouldn’t the 2 added pointers be included in the dbcache accounting, resulting in more frequent flushes due to the cache filling up faster, but not in an increase in memory when the cache is full?
This is no longer the case. When I did the initial benchmarks the full UTXO set could be stored in memory, and so a full IBD on mainnet would be 14.7 GB for master and 16.5 GB for this pull. Today both master and this pull will hit the max limit and flush at some point, so their memory usage should be near identical. You are correct that if the dbcache limit is set below the maximum UTXO set size then this pull will not use more memory.
For clarity, this pull only uses more memory if the entire IBD can occur without a single flush. What I meant in the description is this pull uses more memory when the entire utxo set is stored in memory and never flushed.
This pull could use more memory again on mainnet if #28358 is merged, and this pull will use more memory on other chains than mainnet if the UTXO size is smaller than the dbcache setting.
andrewtoth
commented at 7:44 pm on June 27, 2024:
contributor
Also, potential improvements like #28945 will make up for it.
In my recent benchmarks it was again slightly faster on this branch for default dbcache and no pruning. So I think this PR is essentially the same.
in src/coins.h:122 (commit 0a2bc92b0c, outdated)
```diff
+     * compared to giving all entries in the cache to the parent and having the
+     * parent scan for only modified entries.
+     *
+     * FRESH entries are tracked because a FRESH-but-not-DIRTY entry can only be
+     * a spent coin that exists in the parent cache. When performing a batch
+     * write but not clearing the cache (CCoinsViewCache::Sync), we must also
```

```diff
+     * These are used to create a doubly linked list of flagged entries.
+     * They are set in AddFlags and unset in ClearFlags.
+     * A flagged entry is any entry that is either DIRTY, FRESH, or both.
+     *
+     * DIRTY entries are tracked so that only modified entries can be passed to
+     * the parent cache to do a batch write. This is a performance optimization
```
Is this really obvious though that we need to clear flags before destruction? Normally flags wouldn’t affect destroying an object, but in this case they also tie the object to a linked list.
No, but it’s in a destructor, calling the ClearFlags method - I didn’t see what the comment added beyond what the code already explained well.
It could explain why you chose this style, or what alternatives you’ve considered, or something else that the code cannot, but this just seemed redundant to me.
Probably not a realistic scenario (may require multithreaded access because of the m_flags guard, not sure), but m_prev will never be null when we get here, right?
Yes, this code is single threaded so not possible to get past the previous line if m_prev is null.
It is possible for m_prev to point to a pair that has been freed if the linked list head is destroyed, which is why cacheCoins must always be destroyed before the linked list head.
in
src/test/coinscachepair_tests.cpp:168
in
6508bdfe43 (outdated)
163+ // Check that we can clear flags then re-add them
164+ n1.second.ClearFlags();
165+ BOOST_CHECK_EQUAL(n1.second.GetFlags(), 0);
166+ BOOST_CHECK_EQUAL(n2.second.Next(), nullptr);
167+
168+ // Check that calling `ClearFlags` with 0 flags has no effect
197+
198+ inline uint8_t GetFlags() const { return m_flags; }
199+
200+ //! Get the next entry in the flagged linked list, optionally removing the
201+ //! current entry and clearing the flags.
202+ inline CoinsCachePair* Next(bool clear_flags = false)
It’s mostly convenience. Otherwise to clear flags while iterating we would have to bind m_next to another variable first, then call ClearFlags on the current pair, and then assign the other variable holding the old m_next to the current pair variable.
Maybe, but now the method returns AND mutates, that’s a confusing combination.
I would expect it to be either void and mutate or return and not change internal state.
Can we maybe move the mutation out to the call-site?
The std lib uses this pattern though, with std::unordered_map::erase. It mutates and returns next. Surely we should be following std lib conventions, no?
165+ }
166+
167+ //! Adding a flag also requires a self reference to the pair that contains
168+ //! this entry in the CCoinsCache map and a reference to the head of the
169+ //! flagged pair linked list.
170+ inline void AddFlags(uint8_t flags, CoinsCachePair& self, CoinsCachePair& head)
I find it quite confusing that AddFlags modifies the linked list as well (vs GetFlags only returning the flags, of course). Could make sense to split it to multiple methods or add the side-effect to the name
We want to tie adding a flag to adding the entry to the linked list. We don’t want to have an interface that lets something add a flag without adding it to the linked list. I think splitting that functionality into multiple methods would be dangerous.
ClearFlags is similar. Does it matter to external callers how these methods modify the linked list? That should be transparent to callers. They just know that calling Next() from the head gets all flagged entries. The how should not be exposed IMO.
Could the overall intent be better expressed here, something like SetCacheStateAndLink/ResetCacheStateAndUnlink or Sync/Reset or similar?
It’s obviously doing more after this commit than what it announces…
in
src/test/coinscachepair_tests.cpp:43
in
6508bdfe43 (outdated)
38+ CoinsCachePair head;
39+ auto nodes{CreatePairs(head)};
40+
41+ // Check iterating through pairs is identical to iterating through a list
42+ auto lit{nodes.begin()};
43+ for (auto it{head.second.Next()}; it != nullptr; it = it->second.Next(/*clear_flags=*/false), ++lit) {
Since iteration could misbehave severely and this would pass (e.g. if Next() would return nullptr we wouldn’t even enter the loop), it could make more sense to iterate over the target nodes instead, something like:
Hmm well the AddFlags function is in production code, so this is constructing the linked list via production code. The std::list we create is a test harness helper that we use to check that our linked list behaves the same.
in
src/test/coinscachepair_tests.cpp:70
in
6508bdfe43 (outdated)
It just checks that removing the last element sets head to nullptr. I suppose just for completeness, since this case is the same as when we delete n4 above.
doing refactoring and logic changes in the same commit makes reviewing slightly harder - could you extract all non-critical changes in a separate commit so that we can concentrate on high-risk code more than on stylistic changes (which are important, but shouldn’t distract us)
this it->second.GetFlags() & CCoinsCacheEntry::DIRTY appears quite often, maybe it would make sense to extract it to it->second.IsDirty() (and it->second.IsFresh(), of course)
l0rinc
commented at 4:37 pm on June 28, 2024:
contributor
Fantastic change, would be helpful if you could make it a bit more digestible by separating it into smaller commits (especially splitting high-risk changes from low-risk refactoring), explaining the changes in commit messages.
in
src/coins.h:127
in
6ed20408df (outdated)
123+ * erase all spent entries. Since only DIRTY or FRESH entries can be spent,
124+ * we can track them in the linked list to erase them. This is a performance
125+ * optimization compared to scanning all cache entries to find any that are
126+ * spent-but-not-DIRTY.
127+ */
128+ CoinsCachePair* m_prev{nullptr};
I don’t fully understand why we need m_prev here, but have you investigated whether it’s possible to get rid of it and rely on m_next only?
Sometimes we still have a reference to both (current and next, so we wouldn’t need the next’s prev), but I assume that wasn’t always the case, right?
What prohibits us from accessing the previous ones in constant time during iteration?
Nothing during iteration. But we need to delete entries randomly when a FRESH coin is spent, which is not during any iteration.
andrewtoth
commented at 5:56 pm on June 28, 2024:
contributor
Fantastic change, would be helpful if you could make it a bit more digestible by separating it into smaller commits (especially splitting high-risk changes from low-risk refactoring), explaining the changes in commit messages.
Thank you @paplorinc! I did try initially to separate the changes more. However, I found the third commit has to have all those changes done in one commit otherwise the test-all-commits CI job will fail. Changing only parts of it can break the tests, and break building the fuzz targets. I will explore to see if I can break it down any further and keep CI happy.
l0rinc
commented at 7:22 pm on June 28, 2024:
contributor
Changing only parts of it can break the tests
I usually try to recreate every line, after I have all the changes by interactively rebasing the current state and adding small refactoring under the current commit.
E.g. these could be separate commits doing a single thing:
renaming bool erase to bool will_erase
changing it->second.flags usages to it->second.GetFlags() (again, without change in behavior, but you could also add another commit after this which extracts the usages and returns it->second.IsFresh() instead, making commits after it easier to read)
changing it->second.flags |= to use it->second.AddFlags(CCoinsCacheEntry::DIRTY); (without any change in behavior, just doing it in a method)
changing iterators to the new linked list - without any other change in logic.
etc, I think you got the picture. By extracting these to separate trivial commits (which are followed by high risk changes, done in as few lines as possible without any refactorings in the way), we can concentrate on the difficult ones (which are a lot simpler and in separate commits themselves).
Thanks!
andrewtoth force-pushed
on Jun 29, 2024
andrewtoth force-pushed
on Jun 29, 2024
DrahtBot added the label
CI failed
on Jun 29, 2024
DrahtBot
commented at 3:02 pm on June 29, 2024:
contributor
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot removed the label
CI failed
on Jun 29, 2024
andrewtoth
commented at 6:06 pm on June 29, 2024:
contributor
@mzumsande@paplorinc thank you very much for your reviews! I’ve rebased and updated with all your suggested changes.
I also updated the commits for easier review. First, flags is encapsulated whenever used, then made private. Then linked list head and self references are added to AddFlags, followed by AddFlags and ClearFlags actually maintaining a linked list of entries. Lastly, the linked list is used for BatchWrite instead of passing the entire map of entries.
The flags seem like an implementation detail to me which we might not want to expose (we could have stored it in two booleans as well and the outside behavior shouldn’t change), so usages such as:
It isn’t, because entry is constructed on the line above, and we know from the default constructor that flags will be initialized to 0. So `0 | flags` is the same as `flags`.
in
src/test/coinscachepair_tests.cpp:66
in
f5443052bf (outdated)
55+ // Check that head is empty
56+ BOOST_CHECK_EQUAL(head.second.Next(), nullptr);
57+
58+ // Delete the nodes from the list to make sure there are no dangling pointers
59+ for (auto it{nodes.begin()}; it != nodes.end(); it = nodes.erase(it)) {
60+ BOOST_CHECK_EQUAL(it->second.GetFlags(), 0);
so this basically inserts “in the middle” of the linked list all the time, i.e. not at the beginning or the end, right?
I also find that quite confusing, i.e.
head <-> newest node <-> second newest <-> third newest <-> oldest node -> nullptr
what if we iterated the initialized nodes backwards so that they reflect the order of insertion, something like:
l0rinc
commented at 9:02 pm on June 29, 2024:
contributor
Added a few more comments, it’s a lot more understandable to me now - though I’m still lacking a few important details, I’ll investigate those next week.
Thanks for all your effort!
in
src/coins.h:127
in
c36363f6b2 (outdated)
121+ * mean a spent coin exists in the parent CCoinsView and not in the child
122+ * CCoinsViewCache. Nevertheless, if a spent coin is retrieved from the
123+ * parent cache, the FRESH-but-not-DIRTY coin will be tracked by the linked
124+ * list and deleted when Sync or Flush is called on the CCoinsViewCache.
125+ */
126+ CoinsCachePair* m_prev{nullptr};
186- it != mapCoins.end();
187- it = erase ? mapCoins.erase(it) : std::next(it)) {
188+bool CCoinsViewCache::BatchWrite(CoinsCachePair *pairs, const uint256 &hashBlockIn, bool will_erase) {
189+ for (auto it{pairs};
190+ it != nullptr;
191+ it = it->second.Next(/*clear_flags=*/will_erase || !it->second.coin.IsSpent())) { // Keep flags for spent coins so they can be erased in Sync
nit: it seems to me we’re really pushing it with the loop iteration + deletion here, could we do the deletion at the end of the loop instead (like in src/txdb.cpp) to have a cleaner separation or the responsibilities? It may make the Next method cleaner as well (I know it was similar before as well).
l0rinc
commented at 2:18 pm on June 30, 2024:
contributor
This kept me up at night, we should be able to get rid of m_prev by doing insertions and deletions slightly differently:
we’d have an immutable SENTINEL value always serving as the last element (instead of nullptr);
insertion would always be in front of the previous element, which would serve as the new head (iterating from which will always get us to the SENTINEL);
deletion would copy the content (node, flag and next) of the next node instead (unless it’s a SENTINEL), and dispose of that instead.
Something like this:
diff --git a/src/coins.h b/src/coins.h
--- a/src/coins.h	(revision c36363f6b24c7ab2afe198d9855f507ddf096e1f)
+++ b/src/coins.h	(date 1719755615706)
@@ -123,11 +123,11 @@
      * parent cache, the FRESH-but-not-DIRTY coin will be tracked by the linked
      * list and deleted when Sync or Flush is called on the CCoinsViewCache.
      */
-    CoinsCachePair* m_prev{nullptr};
     CoinsCachePair* m_next{nullptr};
     uint8_t m_flags{0};

 public:
+    static CoinsCachePair SENTINEL;
     Coin coin; // The actual cached data.

     enum Flags {
@@ -154,22 +154,19 @@
     //! Adding a flag also requires a self reference to the pair that contains
     //! this entry in the CCoinsCache map and a reference to the head of the
     //! flagged pair linked list.
-    inline void AddFlags(uint8_t flags, CoinsCachePair& self, CoinsCachePair& head)
+    inline void AddFlags(uint8_t flags, CoinsCachePair& next)
     {
         if (!m_flags && flags) {
-            m_prev = &head;
-            m_next = head.second.m_next;
-            if (m_next) m_next->second.m_prev = &self;
-            head.second.m_next = &self;
+            m_next = &next;
         }
         m_flags |= flags;
     }
     inline void ClearFlags()
     {
-        if (m_flags) {
-            if (m_next) m_next->second.m_prev = m_prev;
-            m_prev->second.m_next = m_next;
-            m_prev = m_next = nullptr;
+        if (m_flags && m_next != &SENTINEL) {
+            coin = std::move(m_next->second.coin);
+            m_flags = m_next->second.m_flags;
+            m_next = m_next->second.m_next;
         }
         m_flags = 0;
     }
What do you think?
andrewtoth
commented at 2:50 pm on June 30, 2024:
contributor
we should be able to get rid of m_prev by doing insertions and deletions slightly differently
But we can’t move the coin from the next entry to us. The coinsCache map will still point to the next entry for that outpoint.
l0rinc
commented at 3:06 pm on June 30, 2024:
contributor
The coinsCache map will still point to the next entry for that outpoint
Yeah, but we can update the map in constant time, right? Do you think that would measurably slow down execution?
Could the memory savings of one less pointer allow us to increase cache hit rate, or am I completely off here?
andrewtoth
commented at 3:13 pm on June 30, 2024:
contributor
Using a singly linked list would be better, agreed. I just don’t see how this design would work. How do we clear the flags but keep the coin? That is the purpose of this patch, to not delete an entry after clearing the flags.
andrewtoth
commented at 3:16 pm on June 30, 2024:
contributor
Oh wait, we only need random access for deletion! Otherwise we can clear the flags on iteration!
Brilliant! I think this could work then.
andrewtoth
commented at 9:24 pm on June 30, 2024:
contributor
@paplorinc was trying to implement your suggestion, but it’s much too complex I think.
How do we delete the node that points to the sentinel? If we move the sentinel into the node and reassign sentinel to that node, we would still need to erase that node from the coinsCache map. We could extract the node handle from the map and have the CCoinsViewCache own that, but this seems like far too much code being changed. I don’t think there’s that much benefit for removing one pointer per entry. We would also have to extract the node from the map after the next node moves into it, and then reinsert the node with the next node’s original outpoint.
sipa
commented at 9:28 pm on June 30, 2024:
member
@andrewtoth Just high-level brainstorming, and I really don’t want to push this PR into something more complex than necessary.
But would it be possible to have a map from outpoints to entries, with a singly-linked list through the entries, such that *(map[key].m_prev) stores the data for key key, rather than map[key] itself?
andrewtoth
commented at 9:49 pm on June 30, 2024:
contributor
But would it be possible to have a map from outpoints to entries, with a singly-linked list through the entries, such that *(map[key].m_prev) stores the data for key key, rather than map[key] itself?
@sipa Sorry I’m not sure I follow. What does *(map[key].m_prev) store?
sipa
commented at 10:18 pm on June 30, 2024:
member
So the data structure we want has (a) an index on COutPoint and (b) a single-linked list that stores an iteration order. The issue (I think, I haven’t followed in detail) is that we need the ability to delete entries found through the (a) index, and if we naively do that, the entry that (b) points to the deleted entry will have its prev pointer invalidated.
My suggestion is “shifting” the values of the map by one. map[key] stores the UTXO for the entry that is (b)-after key, so the UTXO for outpoint key is found in *(map[key]->m_prev) instead. This means an extra indirection on lookup (which may make the whole thing already not worth it from a performance perspective), but it does mean entries can be deleted (to delete key, do { auto to_del = map[key]->m_prev; map[key]->m_prev = to_del->m_prev; delete(to_del); }.
andrewtoth
commented at 10:42 pm on June 30, 2024:
contributor
@sipa thank you for the suggestion! I think it’s missing the fact that we want entries in the map to possibly not belong to the linked list. So we want to clear the linked list after a non-erasing flush, but leave the map intact otherwise. I don’t think that’s possible with just this data structure. We would need 2 parallel maps, one with unflagged entries and one with dirty entries. I’m not sure if that would be more performant with 2 lookups for every access? I think if we have 2 maps we can also just get rid of the linked list entirely.
You did mention the 2 maps idea a while ago #15265 (comment) and @martinus had a similar idea earlier in this PR discussion.
sipa
commented at 10:45 pm on June 30, 2024:
member
@andrewtoth You can have 1 map, with 2 sentinels, each being the end of one linked list (one for flagged entries, one for unflagged ones)… but yeah, fair point, this perhaps complicates matters even more. And the performance penalty of this idea is probably not insignificant either.
andrewtoth force-pushed
on Jul 1, 2024
andrewtoth force-pushed
on Jul 1, 2024
in
src/test/coinscachepair_tests.cpp:165
in
5f753defe5 (outdated)
147+ BOOST_CHECK_EQUAL(n1.second.Next(), nullptr);
148+ BOOST_CHECK_EQUAL(head.Next(), &n1);
149+
150+ // Check that adding FRESH flag on new node inserts it after head
151+ n2.second.AddFlags(CCoinsCacheEntry::FRESH, n2, head);
152+ BOOST_CHECK_EQUAL(n2.second.GetFlags(), CCoinsCacheEntry::FRESH);
But since these are unit tests we are kind of testing the internals, so it would be good to catch if any other flags sneak in. With IsFresh it just checks that the FRESH flag is set, but any other flag could be set as well.
in
src/coins.cpp:274
in
670084adc5 (outdated)
284- ++it;
285+ cacheCoins.erase(it->first);
286+ } else if (fOk) {
287+ /* BatchWrite must unset all unspent entries when will_erase=false. */
288+ throw std::logic_error("Not all unspent flagged entries were unset");
289 }
Since inline substitution is unobservable in the standard semantics, compilers are free to use inline substitution for any function that’s not marked inline, and are free to generate function calls to any function
marked inline.
Does it make sense to check what the main compilers do in this case (the ones I checked via godbolt automatically inline), or was that just a waste of time, since seemingly unrelated changes could change the outcome (but if that’s the case, shouldn’t we let the compiler decide)?
I think not inlining would mean I would also have to add these methods to the implementation file instead of the header.
Since you’re checking assembly, is there any difference if I add noexcept to these methods? I can’t find a definitive answer on if that will make any difference.
It allows the function to appear in multiple compilation units (which makes it mandatory here). Templated entities automatically have this behavior (with or without explicit inline).
It suggests to the compiler that its implementation should be inlined in call sites. Because this inlining has no observable effect on compiler behavior, compilers can do this (or not do this) with or without inline. It has become less important as compilers are smart enough to make this decision themselves, but not entirely gone.
@andrewtoth The C++ Core Guidelines suggest adding noexcept whenever you know a function won’t throw. Bitcoin Core treats failure to allocate memory as an unrecoverable condition, so having dynamic memory allocations in a function is not a reason not to add it.
It may or may not have a performance benefit (especially when it’s a function called from a different compilation unit - and we’re not compiling with LTO enabled, or in default/move constructors of types used inside STL containers which may pick more efficient algorithm variants in that case), but even if it doesn’t, it’s useful as human documentation that a function isn’t expected to throw.
That also holds for inline in my opinion, actually: it documents that the author intends for a function to be inlined, whether or not the compiler actually does that (at -O0 it won’t ever inline, at -O2 all tiny functions probably will).
@sipa thank you for the explanation.
@paplorinc that was an example where it made a difference to performance that came to my mind, yes. I was just curious whether there would be any performance benefit of adding noexcept on these methods. If the asm generated with or without is the same then the answer is no. But the advice for adding them for human documentation makes sense.
l0rinc
commented at 9:24 am on July 1, 2024:
contributor
utACK 670084adc53e3d661ea5b4a19743659609a42d5e, will run the benchmarks in the following weeks, thanks for addressing the suggestions.
DrahtBot requested review from jamesob
on Jul 1, 2024
andrewtoth force-pushed
on Jul 3, 2024
andrewtoth force-pushed
on Jul 3, 2024
in
src/coins.h:142
in
9d539b7451 (outdated)
135 @@ -136,7 +136,10 @@ struct CCoinsCacheEntry
136     CCoinsCacheEntry() noexcept = default;
137     explicit CCoinsCacheEntry(Coin&& coin_) noexcept : coin(std::move(coin_)) {}
138
139-    inline void AddFlags(uint8_t flags) noexcept { m_flags |= flags; }
140+    //! Adding a flag also requires a self reference to the pair that contains
141+    //! this entry in the CCoinsCache map and a reference to the head of the
142+    //! flagged pair linked list.
143+    inline void AddFlags(uint8_t flags, CoinsCachePair& self, CCoinsCacheEntry& head) noexcept { m_flags |= flags; }
Yeah I suppose we can rename it m_flagged_sentinel or something? Do you have a suggestion that would have made this clearer if that was the name originally?
we have “sentinels” on both ends now, right? A dummy at the beginning and a nullptr on the end. This is what I found confusing in the test, why not store the actual head and have a nullptr or sentinel at the end only? I haven’t investigated this, maybe I’m completely off again…
we have “sentinels” on both ends now, right? A dummy at the beginning and a nullptr on the end.
Yes, that sounds right.
why not store the actual head and have a nullptr or sentinel at the end only?
I found it was more code to keep track of a changing head in the CCoinsViewCache, instead of a dummy sentinel I called m_flagged_head that remained static and all insertions happened after the sentinel.
If we tracked the actual head, we would have to assign the latest iterator added to be head for each call, and then on deletion we would have to check if it was head as well and then reassign head to the next entry. That last part couldn’t be done without access to the coinsCache, so we couldn’t just add ClearFlags into the destructor of the entry.
I think I will refrain from adding these annotations for now, since they are more than just documentation and may have side effects in the generated code. I might experiment with a separate commit to see if they affect performance.
l0rinc
commented at 6:19 pm on July 3, 2024:
contributor
I ran the benchmarks without prune (on the previous push), with just a lazy -reindex or -reindex-chainstate, but couldn’t measure any difference - do I need more than 200k blocks? Or is prune critical here? Or is lazy reindexing not enough?
andrewtoth
commented at 6:34 pm on July 3, 2024:
contributor
I ran the benchmarks without prune (on the previous push), with just a lazy -reindex or -reindex-chainstate, but couldn’t measure any difference - do I need more than 200k blocks? Or is prune critical here? Or is lazy reindexing not enough?
It shouldn’t have much effect on non-pruning nodes, so it’s good that there’s no regression.
You will need pruning enabled to see the performance benefits. The higher the dbcache and lower the prune settings the better. The higher the block height the better.
l0rinc
commented at 6:35 pm on July 3, 2024:
contributor
You will need pruning enabled to see the performance benefits
That requires a bit more setup to be able to do it offline
andrewtoth
commented at 6:43 pm on July 3, 2024:
contributor
That requires a bit more setup to be able to do it offline
I don’t think it’s possible to reindex the chainstate of a pruned node, since that requires having all blocks. So you will need to redownload blocks to benchmark this. In my setup I have another fully synced non-pruned node on a separate machine in my local network and I only connect the node being benchmarked to that other node.
In commit “refactor: encapsulate flags setting with AddFlags and ClearFlags”
I don’t think this test helps; testing a value is likely slower than just overwriting it. Testing with godbolt does seem like it doesn’t get optimized out, which surprises me a bit.
EDIT: I see this matters in a future commit. Feel free to ignore, or move the conditional there.
in
src/test/fuzz/coins_view.cpp:128
in
4890c85efd (outdated)
121 @@ -122,12 +122,16 @@ FUZZ_TARGET(coins_view, .init = initialize_coins_view)
122 random_mutable_transaction = *opt_mutable_transaction;
123 },
124 [&] {
125+ // flagged_head must go out of scope and be destroyed after coins_map.
126+ // Any remaining flagged entries in coins_map need to reference the head
127+ // when they are destroyed.
128+ CCoinsCacheEntry flagged_head;
If I can suggest a slightly different linked-list design, which would avoid some conditionals.
A doubly-linked, circular, list. The sentinel is both the first and the last element of the list of flagged entries. Every non-flagged entry is also a linked list, containing just itself. In this case, adding/removing from the flagged list is just concatenating/splicing linked lists, which can be done without conditions/edge cases.
CCoinsCacheEntry() noexcept
{
    ...
    m_prev = this;
    m_next = this;
}

/** Insert this after other. */
void InsertAfter(CCoinsCacheEntry* other) noexcept
{
    auto other_next = other->m_next;
    std::swap(other->m_next, m_prev->m_next);
    std::swap(m_prev, other_next->m_prev);
}

/** Insert this before other. */
void InsertBefore(CCoinsCacheEntry* other)
{
    auto other_prev = other->m_prev;
    std::swap(other->m_prev, m_next->m_prev);
    std::swap(m_next, other_prev->m_next);
}

/** Make this a singleton list. Can be done on entries that are already singletons. */
void TakeOut()
{
    auto old_prev = m_prev;
    std::swap(m_next->m_prev, m_prev);
    std::swap(old_prev->m_next, m_next);
}
Actually, even simpler. By delaying the setting of m_next = this and m_prev = this to just before insertion, inlining that, and not bothering with getting m_next and m_prev correct for taken-out elements:
Also, m_prev and m_next would have to be CoinsCachePairs, not CCoinsCacheEntrys, so we can’t set the pointers in the constructor since we don’t have a pair.
Done! Thank you for your review, I think it’s pretty clean now. I just had to add a SelfRef method on CCoinsCacheEntry for the sentinel to init its pointers to itself.
Also, we need to not call Next() randomly on any CCoinsCacheEntry, only from the sentinel and then iterating through. This isn’t done anywhere in the patch, but it’s not protected against if anyone does. Now instead of returning a nullptr if it’s not in the linked list it could potentially return a pointer to an already freed entry.
davidgumberg
commented at 9:53 pm on August 1, 2024:
we need to not call Next() randomly on any CCoinsCacheEntry, only from the sentinel and then iterating through. This isn’t done anywhere in the patch, but it’s not protected against if anyone does. Now instead of returning a nullptr if it’s not in the linked list it could potentially return a pointer to an already freed entry.
Feel free to disregard: Would it make sense for Next()/Prev() to add an Assume(m_flags) or to do if(m_flags) { return m_next; } to protect against the dangling reference?
andrewtoth
commented at 4:21 am on August 2, 2024:
Added Assume(m_flags), thank you for the suggestion @davidgumberg! I did have to add m_flags = DIRTY; to the SelfRef method so that we can call Next on the sentinel to get the first entry.
andrewtoth force-pushed
on Jul 7, 2024
in
src/coins.h:144
in
4cc0f95c40 (outdated)
140+ //! Adding a flag also requires a self reference to the pair that contains
141+ //! this entry in the CCoinsCache map and a reference to the sentinel of the
142+ //! flagged pair linked list.
143+ inline void AddFlags(uint8_t flags, CoinsCachePair& self, CoinsCachePair& sentinel) noexcept
144+ {
145+ assert(&self.second == this);
I see there’s also an assert(!coin.IsSpent()); at the beginning of AddCoin, which would also be in the hot path. When should we use assert vs Assume then?
In commit “coins: track flagged cache entries in linked list”
This doesn’t matter much, but if you’d swap all prevs/nexts here, the entry would get inserted at the end of the list rather than the beginning, making the list be ordered old-to-new. Unless there is a reason to write things out from new to old?
In commit “coins: track flagged cache entries in linked list”
Would it make sense to add code to CCoinsViewCache’s destructor to explicitly wipe the map, and then test that m_sentinel.m_next == &m_sentinel in an Assume or so? To ascertain that no deleted CCoinsCacheEntrys remain in the list?
274- // FRESH/DIRTY flags of any coin that isn't spent.
275- for (auto it = cacheCoins.begin(); it != cacheCoins.end(); ) {
276+ bool fOk = base->BatchWrite(m_sentinel, hashBlock, /*will_erase=*/false);
277+ // Instead of clearing `cacheCoins` as we would in Flush(), just erase the
278+ // spent coins. All coins left in the linked list after BatchWrite should be spent.
279+ for (auto it{m_sentinel.second.Next()}; it != &m_sentinel; ) {
In commit “coins: pass linked list of flagged entries to BatchWrite”
It’s a bit unfortunate that this means iterating through spent entries twice; wouldn’t it be possible to let BatchWrite do the deletion too in the same loop? This will probably mean passing both cacheCoins and m_sentinel in.
I ended up doing it. I pass in sentinel and cachedCoinsUsage and delete the spent entries when not erasing.
I think calling erase with the outpoint only is slower than erasing with the iterator itself, so I only erase when calling Sync, and clear the map at the end of the loop if erase=true.
Hmm, it’s not at all obvious that this will be faster, as an erase by outpoint involves hashing and lookup in the hash table. It may be worth benchmarking the two options to make sure.
Another option is switching the coinscache map from std::unordered_map to a Boost multiindex, which can simultaneously maintain a hashtable and a linked list transparently without needing to hack our own linked list on top. Practically that would mean no issues due to lack of ability to convert from an entry pointer to an iterator.
Hmm, it’s not at all obvious that this will be faster, as an erase by outpoint involves hashing and lookup in the hash table. It may be worth benchmarking the two options to make sure.
Sorry, I may not have been clear. For a normal erasing flush I am not erasing by outpoint. I only erase the spent coins by outpoint when erase=false, because that’s the only way to find them in constant time.
Otherwise, for a normal erasing flush, we just call mapCoins.clear() at the end to wipe them all at once.
I think I have used the fastest options for each path.
I understand now. My point stands that both versions are doing something unnecessary, but I see now it’s already a strict improvement:
For erase=false, you erase spent coins while iterating, which involves a hashtable lookup, even though you already have a pointer to the element you’re erasing (and it’s just due to an interface limitation that this cannot be turned into an iterator that can be erased directly).
For erase=true, you still iterate a second time (inside mapCoins.clear()), even though again you already passed them once (this is very low in CPU cost, but if the cache is big enough to not fit in CPU caches, this will incur a second RAM latency cost).
Using a multiindex would permit getting rid of both, but let’s keep that for a follow-up, if anyone wants to investigate that.
Boost multi-index is headers-only, so it’s just a build-time dependency, not a runtime one. For the mempool we already realistically can’t avoid having something that’s functionally equivalent to boost multiindex, so I’m not too worried about adding more reliance on it. I do know @theuni is looking into having a simple ad-hoc replacement for it, though.
you still iterate a second time (inside mapCoins.clear()),
Hmm I added this in the latest revision because we now have the mapCoins in BatchWrite and could clear it.
Before this patch was just not erasing the entries and removing checking that it is empty in ReallocateCache. If the map is destroyed does it not essentially do a clear as well, so this is a wash whether we clear now or on destruction?
andrewtoth force-pushed
on Jul 7, 2024
andrewtoth force-pushed
on Jul 7, 2024
in src/coins.cpp:199
in 91c262bae9 (outdated)

```diff
-        if (!(it->second.flags & CCoinsCacheEntry::DIRTY)) {
+        if (!(it->second.IsDirty())) {
+            const auto next_entry{it->second.Next(/*clear_flags=*/true)};
+            if (!erase && it->second.coin.IsSpent()) {
+                childCachedCoinsUsage -= it->second.coin.DynamicMemoryUsage();
+                mapCoins.erase(it->first);
```
Yes, but only if !erase. This is what Sync was doing before I moved this into BatchWrite. We only erase the spent outpoint when we don’t want to clear the cache. This is what I already benchmarked to be much faster.
Err, not master, but the previous version of this PR that I force pushed over an hour ago.
In master it is looping through the entire cacheCoins map, and deleting spent entries. With this approach we do have to do a hash and lookup, but it’s constant time instead of linear.
Hmm, right. But so there are still two possible variants for the erase=false case?
(1) One where they are erased during BatchWrite, using outpoint-based erase.
(2) One where all the erasing is done inside Sync, and BatchWrite only does flag clearing.
(1) has a higher CPU cost (due to hashing and hash-table search), while (2) involves accessing entries’ memory twice. Which of the two is faster isn’t clear to me, as it’ll depend on CPU cache sizes etc.
Ah, but (2) necessarily also involves an outpoint-based erase, as you only want to erase flagged entries without iterating the non-flagged ones. Right.
(2) also requires an outpoint-based deletion, so this approach just moves that deletion into the same loop, instead of clearing unspent entries’ flags in the first loop and then looping through the spent entries again in a second loop and deleting them via outpoint.
Ok, but is there no C++-fu incantation that can get us from a std::pair<const COutPoint, CCoinsCacheEntry>* to a CCoinsMap::iterator? I can’t figure out a way to get that to compile but I’m sure there’s a way using void* or something.
I don’t seem to be able to get it to work. It may be that’s because the iterator doesn’t actually encapsulate an element pointer, but a pointer to where in the hashtable the entry exists or so. If so, then that just isn’t possible.
andrewtoth
commented at 5:11 pm on July 7, 2024:
contributor
Thank you for your thorough review @sipa! Will run some more extensive benchmarks that should be done in the coming weeks.
l0rinc
commented at 6:05 pm on July 8, 2024:
contributor
nice, I like this version a lot more.
I haven’t investigated it yet, but now that we have a circular list, does the double linking still offer a measurable advantage or could we do the same with a single link and maybe 2 iterations instead (and would it be worth it, if it’s possible)?
@paplorinc With only a single link, how would you delete arbitrary map entries? Barring large hacks like putting the data in the next entry rather than the entry which the key points to, I don’t see how you could delete entries in less than $\mathcal{O}(n)$ time.
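As a sketch of why the second pointer matters: with both neighbours reachable from the entry itself, unlinking an arbitrary node is O(1), whereas a singly linked list would need an O(n) walk from the head to find the predecessor. This is a hypothetical minimal intrusive list, not the PR’s actual CCoinsCacheEntry code:

```cpp
#include <cassert>

// Minimal intrusive doubly linked list node (hypothetical; the PR stores
// similar prev/next pointers inside each cache entry, with a sentinel).
struct Node {
    Node* m_prev{this};
    Node* m_next{this};

    // Append this node just before the sentinel (end of the list).
    void LinkBefore(Node& sentinel)
    {
        m_prev = sentinel.m_prev;
        m_next = &sentinel;
        m_prev->m_next = this;
        sentinel.m_prev = this;
    }

    // O(1) removal: both neighbours are in hand, so no traversal is needed.
    // With only an m_next pointer we could not patch the predecessor's link
    // without an O(n) walk from the head.
    void Unlink()
    {
        m_prev->m_next = m_next;
        m_next->m_prev = m_prev;
        m_prev = m_next = this;
    }
};
```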
l0rinc
commented at 7:07 pm on July 8, 2024:
contributor
With only a single link, how would you delete arbitrary map entries?
My naive thinking was that maybe we don’t actually need random-element deletion. Based on how ClearFlags and erase are currently called from loops that already have the previous node, we could often call ClearFlags(*prev).
The destructor is trickier. Maybe we can handle this in the CCoinsViewCache destructor instead, batch-destructing entries there, which would involve a single extra $\mathcal{O}(n)$ traversal.
I’m sure this has some other disadvantages (if it works at all), which is why I’m asking if it could potentially work or not.
@paplorinc When UTXOs are spent while FRESH, they get deleted from the cache; this is quite possibly the most significant performance optimization Bitcoin Core has during IBD, as it avoids the need for those UTXOs to ever end up on disk and makes space for new ones. It’s done based on COutPoint lookup, not during iteration.
While it’s not impossible to instead do this in a cleanup step separate from normal operation, I think that would come with a significant performance cost, as it would need frequent iterations over all UTXOs to discover these to-be-deleted entries (not just flagged ones, like this PR does).
l0rinc
commented at 7:19 pm on July 8, 2024:
contributor
andrewtoth
commented at 3:12 am on July 9, 2024:
contributor
Added a CoinsViewCacheCursor struct to encapsulate the iteration of the linked list when passed up to BatchWrite.
andrewtoth force-pushed
on Jul 9, 2024
DrahtBot added the label
CI failed
on Jul 9, 2024
DrahtBot
commented at 3:13 am on July 9, 2024:
contributor
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.
Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot removed the label
CI failed
on Jul 9, 2024
in src/coins.cpp:259
in adcccaae6b (outdated)

```diff
+    bool fOk = base->BatchWrite(cursor, hashBlock, /*erase=*/true);
     if (fOk) {
-        if (!cacheCoins.empty()) {
-            /* BatchWrite must erase all cacheCoins elements when erase=true. */
-            throw std::logic_error("Not all cached coins were erased");
-        }
```
Which test should I run to check?
I tried src/test/test_bitcoin --run_test=coins_tests and make check; they all passed when I commented out the throw. Also, code coverage (I know it’s incomplete, but still) indicates this line is not exercised.
```diff
@@ -93,7 +93,7 @@ void CCoinsViewCache::AddCoin(const COutPoint &outpoint, Coin&& coin, bool possi
     //
     // If the coin doesn't exist in the current cache, or is spent but not
     // DIRTY, then it can be marked FRESH.
-    fresh = !(it->second.flags & CCoinsCacheEntry::DIRTY);
+    fresh = !(it->second.IsDirty());
```

```diff
         // the calling code.
         throw std::logic_error("FRESH flag misapplied to coin that exists in parent cache");
     }

-    if ((itUs->second.flags & CCoinsCacheEntry::FRESH) && it->second.coin.IsSpent()) {
+    if ((itUs->second.IsFresh()) && it->second.coin.IsSpent()) {
```
in which case Begin and Next modify the state but don’t actually return anything.
Or even better, we could get rid of CoinsCachePair* m_curr{nullptr}; (Begin would also become const) and add it to Next(CoinsCachePair* current, bool erase), and the usage would simply become cursor.Next(it, erase) instead.
nit: you haven’t touched this line, but Sonar complains that: "std::move" should not be called on an object of a type with neither move constructor nor move-assignment operator. - ignore it, if it’s unrelated (the rest from https://corecheck.dev/bitcoin/bitcoin/pulls/28280 are also not very useful, but I can’t judge the severity of this one)
andrewtoth
commented at 3:43 am on August 2, 2024:
Here it says an explicit move constructor does not allow certain optimizations. So this is likely complaining due to the only move constructor being explicit. I’m not sure what the implications are exactly for removing the explicit keyword here.
… but, I think inside that constructor, the attempt to construct the coin member of CCoinsCacheEntry using a move is ineffective, because there is no move constructor for Coin.
Note that CCoinsCacheEntry::CCoinsCacheEntry(Coin&&) is not a move constructor at all because the argument type does not match the type being constructed. I think the warning from Sonar is unrelated to this constructor.
My thinking is we should make sure Coin has a move constructor.
l0rinc
commented at 2:55 pm on July 9, 2024:
contributor
The benchmarks indicate that BlockToJsonVerboseWrite was slowed down
I ran a before/after against master, found only a minor (4%) diff - is this expected?
make -j10 && ./src/bench/bench_bitcoin --filter=BlockToJsonVerboseWrite --min-time=10000
before:

| ns/op | op/s | err% | total | benchmark |
|---:|---:|---:|---:|:---|
| 27,574,397.06 | 36.27 | 0.2% | 11.03 | `BlockToJsonVerboseWrite` |
| 27,586,627.40 | 36.25 | 0.1% | 11.01 | `BlockToJsonVerboseWrite` |
| 27,638,839.92 | 36.18 | 0.2% | 10.99 | `BlockToJsonVerboseWrite` |

after:

| ns/op | op/s | err% | total | benchmark |
|---:|---:|---:|---:|:---|
| 28,573,375.00 | 35.00 | 0.2% | 11.06 | `BlockToJsonVerboseWrite` |
| 28,531,835.59 | 35.05 | 0.1% | 11.08 | `BlockToJsonVerboseWrite` |
| 28,602,410.56 | 34.96 | 0.1% | 11.09 | `BlockToJsonVerboseWrite` |
andrewtoth force-pushed
on Jul 9, 2024
andrewtoth
commented at 9:29 pm on July 9, 2024:
contributor
@paplorinc thank you again for your review! I addressed all your nits I think. I don’t think that benchmark is related, could just be noise.
l0rinc
commented at 10:25 am on July 10, 2024:
contributor
I don’t think that benchmark is related, could just be noise.
Indeed, after your rebase the speed seems to be the same as on master.
in src/coins.h:285
in dddb59a63d (outdated)

```diff
@@ -253,8 +281,8 @@ class CCoinsView
     virtual std::vector<uint256> GetHeadBlocks() const;

     //! Do a bulk modification (multiple Coin changes + BestBlock change).
-    //! The passed mapCoins can be modified.
-    virtual bool BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock, bool erase = true);
+    //! The passed cursor is used to iterate through the coins.
+    virtual bool BatchWrite(CoinsViewCacheCursor &cursor, const uint256 &hashBlock, bool erase = true);
```
Ahh yes, this comment is removed in this patch, and a variation of it should actually probably go right here. Perhaps I should do this in a follow-up to not invalidate your ACK?
variation of it should actually probably go right here
Maybe, but I would still be surprised, regardless of the context, if not X led to X. Could the parameter name explain it better than “erase”, so that the reviewer isn’t surprised that it means “keep”?
invalidate your ACK
I expect that I will have to review and ack your changes many more times before it’s merged, don’t worry about it!
So why is erase == true || it->second.coin.IsSpent() == true an erasure in one place, but erase == false && current.second.coin.IsSpent() == true an erasure in the other method? Shouldn’t the Next method rather be something like this instead:
When erase = true, we will be erasing the entire map at the end of BatchWrite. When erase = false, we want to keep all unspent coins in the map and have their flags cleared. This also means we want to erase only the spent coins when erase = false.
While looping through CCoinsMap::iterators, we could just do it = erase ? mapCoins.erase(it) : std::next(it), and then delete all spent coins and clear their flags right after we return from BatchWrite when erase = false.
Now that we are looping through CoinsCachePairs though, we can’t call mapCoins.erase(it) because a CoinsCachePair can not be coerced back to a CCoinsMap::iterator. So, instead of erasing while we iterate when erase = true, we just keep all entries and call mapCoins.clear() after we return from BatchWrite.
But, as a performance optimization for the case where erase = false, we can also delete the spent coins in the same loop so we don’t have to loop through them again when we return. So in this case we can erase by the map key instead of the map iterator. This is slower because we have to hash the key and lookup the entry in the map, whereas with the iterator we could just delete it directly.
When erase = true we don’t want to erase by the key, because it is slower than calling clear at the end of the loop.
So we’re moving when erase = true because we’ve already erased it, right?
We’re moving because we are going to erase it in either case, not because we already have. We can’t move an erased entry because it will have already been destroyed. And we can’t use a moved-from entry later because it is invalidated by the move. So when we want to keep the entry, we need to copy it and not move out of it.
When erase = true, we will be erasing the entry after we return, so all entries can be moved. When coin.IsSpent(), we will be erasing it when erase = false because we only want to keep unspent entries.
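The move-vs-copy rule described above can be illustrated with a small sketch (`Entry` and `Absorb` are hypothetical names, not the real BatchWrite code): move only when the source entry is about to be erased anyway, copy when the child cache keeps it.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a cache entry owning heap-allocated coin data.
struct Entry { std::string coin; };

// Sketch of the decision under discussion: stealing the child's buffer is
// only safe when the child entry is about to be erased; otherwise the
// child would be left holding a moved-from (invalidated) coin.
Entry Absorb(Entry& child, bool will_erase)
{
    Entry parent;
    if (will_erase) {
        parent.coin = std::move(child.coin); // child is going away: move
    } else {
        parent.coin = child.coin;            // child keeps the entry: copy
    }
    return parent;
}
```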
I did have a commit before that changed erase to will_erase, I can’t remember why I removed it. Would that make it clearer if I renamed that again?
I will have to read that again tomorrow, it’s the end of the day here :/
will_erase could help, but I find it hard to keep track of so many independent mutations, I’m hoping there’s a simpler way to do this…
Do you think you could add a test that at least doesn’t pass for my suggested code above? Maybe that will help me understand why something this simple (iteration with/without simple/bulk cleanup) has this many loose parts, where the exact same parameter means the opposite from one line to the next…
You’re right, the logic for moving or not should not leak into BatchWrite. Perhaps we can get rid of erase in BatchWrite and have the cursor return a pair - the iterator and a bool for whether it can move out or not?
I had it like that before but others suggested to move everything into 1 loop instead of looping through the dirty entries again after BatchWrite to clear flags and delete spent entries.
I haven’t been able to come up with something that I think would be an improvement. I did split the second last commit up into two commits though. Maybe that will make it more clear when reviewing.
While reviewing (and before reading this discussion) I also got stuck on this. Because of the expectations the naming raises, it seems like a potential footgun (future code may try to iterate over the linked list without intending to erase entries).
I think the side effect should be documented more clearly for the Next() function, maybe even rename to NextAndMaybeErase() or something similar.
andrewtoth
commented at 10:02 pm on July 22, 2024:
This comment thread is pointing to an out of date version of the PR. There’s more documentation on this struct and methods now, as well as changing a variable and field name. Maybe it is more clear now?
andrewtoth
commented at 3:44 am on August 2, 2024:
Renamed the method to be NextAndMaybeErase.
andrewtoth
commented at 3:45 am on August 2, 2024:
Also added more comments around this struct.
l0rinc
commented at 10:40 am on July 10, 2024:
contributor
Thanks, I will try to properly test the speed in the following weeks (maybe I’ll just rent a VM and do it there)
utACK 5c37dfdda55384358540cfae50715faefa79c933
andrewtoth
commented at 5:31 pm on July 12, 2024:
contributor
Some benchmarks with the latest changes. Did 2 IBD runs with prune=550 and 2 -reindex-chainstate runs with prune=0 to block 820k. Results were similar to before. This time no-prune was 1% faster on this branch with default dbcache, and 1% slower on this branch with max dbcache. This just means there’s no regression IMO.
It’s 34% faster pruned with max dbcache and 13% faster with default dbcache on this branch vs a pruned node on master with max dbcache.
Note I did the dbcache=450 prune=0 benchmarks on a slower machine so I could do these benchmarks in parallel.
| | prune | dbcache | mean time (s) | speedup |
|---|---:|---:|---:|---:|
| master | 550 | 16384 | 30,085 | - |
| branch | 550 | 16384 | 19,984 | 34% |
| branch | 550 | 450 | 26,278 | 13% |
| master | 0 | 16384 | 12,541 | 1% |
| branch | 0 | 16384 | 12,669 | - |
| master | 0 | 450 | 21,942 | - |
| branch | 0 | 450 | 21,671 | 1% |
andrewtoth force-pushed
on Jul 12, 2024
andrewtoth force-pushed
on Jul 12, 2024
l0rinc
commented at 10:01 am on July 13, 2024:
contributor
@andrewtoth, I’m running the benchmark now on a cosy Hetzner dedicated server (without a local Bitcoin node, let’s see if that makes it too unreliable).
Do you think we should try different -dbbatchsize values as well?
in src/coins.cpp:203
in fa0494d7ff (outdated)

```diff
-            // The `move` call here is purely an optimization; we rely on the
-            // `mapCoins.erase` call in the `for` expression to actually remove
-            // the entry from the child map.
+            if (cursor.WillErase(*it)) {
+                // Since this entry will be erased,
+                // we can move the coin into us instead of copying it
```
You might be right here, but this function in particular is very heavily commented due to its importance. I think modifying the comment is better than just removing it.
```diff
+                // Since this entry will be erased,
+                // we can move the coin into us instead of copying it
                 entry.coin = std::move(it->second.coin);
             } else {
                 entry.coin = it->second.coin;
             }
```
I believe in making important parts of the code stick out and letting details of secondary importance fade into the background.
This move is of secondary importance, right? But it occupies a big part of the method; maybe we can simplify it with a ternary:
```diff
@@ -230,6 +230,40 @@ class CCoinsViewCursor
     uint256 hashBlock;
 };

+struct CoinsViewCacheCursor
+{
+    //! will_erase should be set to true in CCoinsViewCache::Flush and false in CCoinsViewCache::Sync
```
In commit “coins: pass linked list of flagged entries to BatchWrite”
I’d prefer a comment that explains what the flag does rather than just when to set it. For example “If will_erase is not set, iterating through the cursor will erase spent coins from the map, and other coins will be unflagged (removing them from the linked list). If will_erase is set, the underlying map will not be modified, as the caller is expected to wipe the entire map anyway.”
```diff
+        }
+        return next_entry;
+    }
+
+    //! Whether the current pair will be erased after being passed to Next
+    inline bool WillErase(CoinsCachePair& current) const noexcept { return m_will_erase || current.second.coin.IsSpent(); }
```
andrewtoth
commented at 2:43 pm on July 13, 2024:
contributor
@andrewtoth, I’m running the benchmark now on a cosy Hetzner dedicated server (without a local Bitcoin node, let’s see if that makes it too unreliable).
Just wanted to note the first commit should probably be 94d56b9def44bfa5e002ef2c666e5961b1deacdc, the one before this PR, and not 0383defd049fa1feb6de7abea259041487ccda1d which is the first in this PR.
Do you think we should try different -dbbatchsize values as well?
This PR doesn’t do anything related to -dbbatchsize, so no I don’t think so.
andrewtoth
commented at 3:13 pm on July 13, 2024:
contributor
Also, you should probably wrap hyperfine with nohup ... & so you can exit the ssh session.
l0rinc
commented at 11:38 am on July 14, 2024:
contributor
the first commit should probably be
The first commit was just an unrelated refactor; it was easier to use that as a ‘before’.
a warmup run is going to take a long time
Yeah, but this makes the measurements simpler (this way I don’t need the sync and drop_caches), both runs start from the same (slightly warmed up) state. Plus I didn’t want to test the exact same settings as you did :)
wrap hyperfine with nohup … &
was using tmux for that, the warmup and the first 2 iterations for 0383de are almost finished now.
l0rinc
commented at 8:53 am on July 16, 2024:
contributor
I come bearing good news: the benchmarks have finished running!
```diff
@@ -230,6 +230,40 @@ class CCoinsViewCursor
     uint256 hashBlock;
 };

+struct CoinsViewCacheCursor
+{
+    //! will_erase should be set to true in CCoinsViewCache::Flush and false in CCoinsViewCache::Sync
+    CoinsViewCacheCursor(size_t& usage, CoinsCachePair& sentinel, CCoinsMap& map, bool will_erase) noexcept
```
In commit “coins: pass linked list of flagged entries to BatchWrite”
Can you add LIFETIMEBOUND to the usage, sentinel, and map arguments? That may make the compiler complain if the CoinsViewCacheCursor object is used after the arguments go out of scope.
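For reference, a minimal sketch of what the suggestion could look like. `CursorLike` is a hypothetical stand-in; in Bitcoin Core, LIFETIMEBOUND is a macro (in src/attributes.h) around Clang’s `[[clang::lifetimebound]]` attribute, which only has an effect under Clang:

```cpp
#include <cassert>

#if defined(__clang__)
#  define LIFETIMEBOUND [[clang::lifetimebound]]
#else
#  define LIFETIMEBOUND
#endif

// Hypothetical cursor that stores a reference to its constructor argument.
// With lifetimebound, Clang can warn at compile time when the cursor is
// bound to an argument whose lifetime ends before the cursor's does.
struct CursorLike {
    explicit CursorLike(int& usage LIFETIMEBOUND) : m_usage(usage) {}
    int& m_usage;
};

// Clang would flag something like this (returning a cursor bound to a
// local) with a dangling-reference warning:
//   CursorLike Dangling() { int tmp{0}; return CursorLike(tmp); }

int UseCursor()
{
    int usage{5};
    CursorLike c(usage);
    c.m_usage += 1; // fine: the referenced object outlives the cursor
    return usage;
}
```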
```diff
@@ -206,10 +204,9 @@ bool CCoinsViewCache::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlockIn
             // and mark it as dirty.
             itUs = cacheCoins.try_emplace(it->first).first;
             CCoinsCacheEntry& entry{itUs->second};
-            if (erase) {
-                // The `move` call here is purely an optimization; we rely on the
-                // `mapCoins.erase` call in the `for` expression to actually remove
-                // the entry from the child map.
+            if (cursor.WillErase(*it)) {
```
In commit “coins: pass linked list of flagged entries to BatchWrite”
The behavior change is a bit confusing here, as cursor.WillErase(*it) will be true for spent items, which was not true for the old code’s erase. Would it make sense to split this commit up in two; one part which introduces the cursor and only iterate over flagged entry, and then a second part which moves the deleting of spent entries from the separate loop in CCoinsViewCache::Sync call into the CCoinsViewCache::BatchWrite loop?
sipa
commented at 4:05 pm on July 16, 2024:
member
ACK 21090d032f5c4d90a41c3ca2030690c75899ec1a
I have a few non-blocking suggested improvements in the new commit. I have also pushed a sanity check commit in https://github.com/sipa/bitcoin/commits/pr_28280, feel free to cherry-pick or squash in.
Now that we have this, could you please cover it with tests as well (where we’re testing that the linked list is constructed correctly)?
Or would it make more sense to just call SanityCheck() since it’s already checking the linked list’s integrity?
Is it just bad luck that it wasn’t triggered by CI or could the fuzz tests be incorrect or just not yet triggered at this stage?
Could we use it in coins_tests or coinscachepair_tests as well?
And if it’s a test-only method, maybe document, that it’s heavy and should only be used in testing, not in prod? Or maybe even move it completely to the tests…
Thanks for the info.
I assumed that some of the randomized tests contributed to the coverage here. Isn’t that why we have Lost baseline coverage for PRs? Or did I misunderstand the reason for unrelated code not being covered anymore?
I assumed that some of the randomized tests contributed to the coverage here.
The section where your screenshot is from is “Uncovered new code”. The screenshot means that the code wasn’t covered previously and isn’t covered now either. There should be no randomness.
yes, of course.
I assumed that the existence of Lost baseline coverage means that there was some randomness in the tests or infrastructure or production code or threading or something (i.e. covered before, but not covered now for a reason that’s unrelated to this change):
Ah, my mistake, this was an intermediary commit - fine either way
achow101 added this to the milestone 28.0
on Jul 18, 2024
andrewtoth force-pushed
on Jul 18, 2024
l0rinc
commented at 4:30 pm on July 18, 2024:
contributor
ACK d7f94e0fa9ec3641405e6909ab7da4e48865e840
DrahtBot requested review from sipa
on Jul 18, 2024
in src/coins.cpp:54
in 5907ace1e6 (outdated)

```diff
@@ -51,7 +51,7 @@ CCoinsMap::iterator CCoinsViewCache::FetchCoin(const COutPoint &outpoint) const
     if (ret->second.coin.IsSpent()) {
         // The parent only has an empty entry for this outpoint; we can consider our
         // version as fresh.
-        ret->second.flags = CCoinsCacheEntry::FRESH;
```
Did you consider setting m_next and m_prev back to nullptr here? Doesn’t seem strictly necessary but maybe a bit cleaner?
andrewtoth
commented at 10:00 pm on July 22, 2024:
An earlier version of this PR that relied on checking for null did do this. I just think it’s unnecessary work to set them to null, because access is guarded by m_flags.
in src/test/fuzz/coins_view.cpp:149
in 81f59cb109 (outdated)
It would have to be coins_map.clear(), which is passed up in the cursor to coins_view_cache. This block is equivalent to Flush, and coins_view_cache is the base.
However, coins_map goes out of scope right after this block and is not reused as in Flush, so clearing it is not necessary.
mzumsande
commented at 9:47 pm on July 22, 2024:
contributor
Code Review ACK d7f94e0fa9ec3641405e6909ab7da4e48865e840
(disclaimer: I’m not very experienced with the coins db but started to look into it more recently).
I did some testing on signet to ensure that the dumptxoutset data is not affected by this.
I’m in the process of doing some test runs on mainnet, but that’ll take me a few more days.
in src/txdb.cpp:117
in 81f59cb109 (outdated)

```diff
@@ -114,7 +114,7 @@ bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock, boo
         batch.Erase(DB_BEST_BLOCK);
         batch.Write(DB_HEAD_BLOCKS, Vector(hashBlock, old_tip));

-    for (CCoinsMap::iterator it = mapCoins.begin(); it != mapCoins.end();) {
+    for (auto it{cursor.Begin()}; it != cursor.End();) {
```
nit: After this, the log
Committed X changed transaction outputs (out of Y) to coin database...
has become less interesting because we only iterate over dirty coins, so X=Y always.
It might be more helpful to report the initial size of m_map of the cursor (before entries are deleted from it) as the total.
There are other places that could be cleaned up, such as obviated dirty checks like here and here, but this should probably be done after a #18746 revival so there’s no confusion.
yes, no need to do that here. I just mentioned the logging here because I was interested in the ratios to assess the efficiency of this PR, and had to write my own log.
refactor: encapsulate flags access for dirty and fresh checks
No behavior change. This prepares moving the cache entry flags field
to private access.
Co-Authored-By: l0rinc <pap.lorinc@gmail.com>
df34a94e57
refactor: encapsulate flags get access for all other checks
No behavior change. This prepares moving the cache entry
flags field to private access.
9715d3bf1e
refactor: encapsulate flags setting with AddFlags and ClearFlags
No behavior change. This prepares moving the cache entry
flags field to private access.
8737c0cefa
refactor: disallow setting flags in CCoinsCacheEntry constructors
No behavior change because any entries that are added in EmplaceCoinInternalDANGER have DIRTY assigned to them after, and if they are not inserted then they will not be modified, as before.
This prepares moving the cache entry flags field to private access.
Co-Authored-By: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
Co-Authored-By: Martin Leitner-Ankerl <martin.ankerl@gmail.com>
4e4fb4cbab
refactor: move flags to private uint8_t and rename to m_flags
No behavior change. This prepares to add CCoinsCacheEntrys
to a doubly linked list when a flag is added.
davidgumberg
commented at 0:07 am on August 1, 2024:
Is this destructor defined just to document this assumption?
andrewtoth
commented at 0:17 am on August 1, 2024:
It is actually here to clear cacheCoins and make sure all CCoinsCacheEntrys are destroyed and call ClearFlags in their destructors. Since the first and last flagged entries have a reference to m_sentinel, we must be sure they remove themselves from the linked list and set m_flags to 0 before m_sentinel is destroyed.
Without this destructor, we can make this requirement implicit by having m_sentinel come before cacheCoins in the CCoinsViewCache members. That way m_sentinel will be destroyed after cacheCoins and all its entries have already been destroyed. But, that is brittle since someone could reorder the fields. By having this destructor here we make sure all entries are unflagged before destroying the sentinel.
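The ordering argument above can be demonstrated with a tiny sketch (hypothetical demonstration types, not the real cache classes): C++ destroys non-static data members in reverse declaration order, so declaring the sentinel first makes it outlive the entries, which is exactly the implicit invariant described as brittle here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which members are destroyed.
static std::vector<std::string> g_destroyed;

struct Member {
    std::string name;
    ~Member() { g_destroyed.push_back(name); }
};

// Members are destroyed in reverse declaration order, so the sentinel
// (declared first) is destroyed last -- after every "entry" member has
// already run its destructor. Reordering the fields silently breaks this.
struct CacheLike {
    Member sentinel{"sentinel"}; // declared first  -> destroyed last
    Member entries{"entries"};   // declared second -> destroyed first
};
```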
davidgumberg
commented at 0:21 am on August 1, 2024:
This was stumping me for a while, thank you!!
andrewtoth
commented at 0:51 am on August 1, 2024:
Perhaps I should comment that here in a follow-up then?
davidgumberg
commented at 1:05 am on August 1, 2024:
Follow-up: Why wouldn’t m_sentinel being destroyed first be OK, since when m_sentinel.second.ClearFlags() happens, it makes the last (youngest) entry’s m_next point to the first (oldest) entry and vice versa?
andrewtoth
commented at 1:08 am on August 1, 2024:
m_sentinel does not have m_flags set. So when m_sentinel is destroyed it doesn’t actually remove itself from the list.
davidgumberg
commented at 1:22 am on August 1, 2024:
Thanks, I missed that! I think a comment in a follow-up could be helpful.
andrewtoth
commented at 3:36 am on August 2, 2024:
Added a comment
andrewtoth
commented at 4:23 am on August 2, 2024:
Now that we mark the sentinel as dirty, my response to your follow-up is outdated. So perhaps we don’t need this destructor anymore? The last entry to be destroyed will just point to itself in ClearFlags, right?
andrewtoth
commented at 4:31 am on August 2, 2024:
I removed the CCoinsViewCache destructor in light of this. Thank you!
andrewtoth force-pushed
on Aug 2, 2024
andrewtoth force-pushed
on Aug 2, 2024
DrahtBot
commented at 4:11 am on August 2, 2024:
contributor
Make sure to run all tests locally, according to the documentation.
The failure may happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot added the label
CI failed
on Aug 2, 2024
andrewtoth
commented at 4:17 am on August 2, 2024:
contributor
Rebased and addressed most comments.
Added more comments in and around CoinsViewCacheCursor
Renamed Next to NextAndMaybeErase in CoinsViewCacheCursor
Added Assume(m_flags) to Next and Prev of CCoinsCacheEntry
Added tests for Prev in coinscachepair_tests and added calls to SanityCheck in coins_tests
Set m_flags = DIRTY in SelfRef of CCoinsCacheEntry so we can call Next on the sentinel with Assume(m_flags) and removed CCoinsViewCache destructor. This last change is the only change in behavior for non-debug, non-test code. This means that we don’t have to worry about clearing the coins map before destroying the linked list sentinel. The sentinel will remove its references from the entries it points to when it gets destroyed. This will leave the last entry pointing to itself when it is destroyed.
refactor: require self and sentinel parameters for AddFlags
No behavior change. Prepares for adding the CoinsCachePairs to
a linked list when flagged.
8bd3959fea
coins: call ClearFlags in CCoinsCacheEntry destructor
No behavior change. Prepares for flags adding CCoinsCacheEntrys to a linked list, from which they need to be removed on destruction.
58b7ed156d
andrewtoth force-pushed
on Aug 2, 2024
DrahtBot removed the label
CI failed
on Aug 2, 2024
andrewtoth
commented at 4:55 pm on August 2, 2024:
contributor
The compare feature is kind of broken now that I rebased. You can check out the changes from the previous version with the following command:
git range-diff 0383defd049fa1feb6de7abea259041487ccda1d..d7f94e0fa9ec3641405e6909ab7da4e48865e840 df34a94e57659c31873c26c86a8de5a032400061..28f0dd2d60d0f9a8d8a5aa0ed4a5eb1bd4f605bf
or smaller
git range-diff 0383def..d7f94e0 df34a94..28f0dd2
there’s a hidden setter and updater here, which is applicable to ClearFlags and SelfRef as well; if you need to change the commits again, please consider extracting the repetition:
this isn’t exactly what the code checks, right?
It can also be both (which you’ve specified in other cases where it was allowed), so we should either add DIRTY, FRESH, or both to the comment, or change the precondition to Assume(IsDirty() ^ IsFresh());
I guess it’s a bit ambiguous, but I’d interpret this as “Only call … when this entry has either the DIRTY or FRESH flags set” and not as “Only call … when this entry’s flags are either exactly DIRTY or exactly FRESH”.
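The distinction the two readings hinge on is inclusive-or versus exactly-one. A hedged sketch (illustrative flag values and helper names, not the actual CCoinsCacheEntry API) of why a DIRTY|FRESH entry satisfies the inclusive reading but fails the XOR precondition:

```cpp
#include <cstdint>

// Illustrative flag bits; an entry may carry either or both.
enum : uint8_t { CLEAN = 0, DIRTY = 1, FRESH = 2 };

struct EntryFlags {
    uint8_t m_flags{CLEAN};
    bool IsDirty() const { return m_flags & DIRTY; }
    bool IsFresh() const { return m_flags & FRESH; }
    // Inclusive reading: "has the DIRTY or FRESH flag set" means at
    // least one bit is set, i.e. the Assume(m_flags) style of check.
    bool DirtyOrFresh() const { return m_flags != CLEAN; }
    // Exclusive reading: exactly one of the two flags is set.
    bool ExactlyOne() const { return IsDirty() ^ IsFresh(); }
};
```

An entry flagged DIRTY | FRESH passes `DirtyOrFresh()` but not `ExactlyOne()`, which is why the comment wording matters.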
andrewtoth
commented at 11:45 pm on August 5, 2024:
Fixed.
in
src/test/coins_tests.cpp:644
in
28f0dd2d60 (outdated)
since we can now do that in the constructor. What do you think? I understand if you want to do it in a separate PR, but I will gladly reapprove as well.
l0rinc
commented at 3:52 pm on August 3, 2024:
contributor
re-ACK 28f0dd2d60d0f9a8d8a5aa0ed4a5eb1bd4f605bf
Left a few nits, if you edit again, please consider them!
DrahtBot requested review from mzumsande
on Aug 3, 2024
DrahtBot requested review from sipa
on Aug 3, 2024
sipa
commented at 1:20 pm on August 4, 2024:
member
re-ACK 28f0dd2d60d0f9a8d8a5aa0ed4a5eb1bd4f605bf
vostrnad
commented at 10:06 am on August 5, 2024:
none
Benchmark ACK (that’s not a thing but it totally should be)
I benchmarked the latest revision (28f0dd2d60d0f9a8d8a5aa0ed4a5eb1bd4f605bf) against the merge base (9774a958b501a6d439a734e18b29e04f59f973f6) with a few runs to block 400,000, syncing from a local node, with pruning enabled and all other settings at default. With dbcache=450 it finished 9.4% faster, dbcache=4096 was 14.3% faster.
dbcache=450:
master: 3203s
pr: 2901s (9.4% speedup)
dbcache=4096:
master: 3241s
pr: 2778s (14.3% speedup)
mzumsande
commented at 4:54 pm on August 5, 2024:
contributor
re-ACK 28f0dd2d60d0f9a8d8a5aa0ed4a5eb1bd4f605bf
in
src/coins.h:277
in
6453d8e38a (outdated)
273@@ -257,9 +274,24 @@ struct CoinsViewCacheCursor
274 inline CoinsCachePair* Begin() const noexcept { return m_sentinel.second.Next(); }
275 inline CoinsCachePair* End() const noexcept { return &m_sentinel; }
276
277- inline CoinsCachePair* NextAndMaybeErase(CoinsCachePair& current) noexcept { return current.second.Next(); }
278+ //! Return the next entry after current, erasing current if WillErase(current) returns true
In 6453d8e38a51311b23c980c5c61a60a0d0bd68b2 “coins: move Sync logic to CoinsViewCacheCursor”
This comment seems to be incorrect: WillErase(current) will return true when m_will_erase == true, but this function will not erase the entry in that case.
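The reason a cursor method like this must fetch the successor before the current entry can be erased is the classic erase-while-iterating pitfall: once the node is gone, its next pointer is gone too. A generic, self-contained illustration of the capture-next-then-erase pattern (std::list standing in for the intrusive list; names are illustrative, not the Bitcoin Core API):

```cpp
#include <iterator>
#include <list>

// Walk the list, consuming every element. The successor is captured
// *before* erasing the current node, so iteration never follows a
// pointer into freed memory.
int SumAndErase(std::list<int>& l) {
    int sum = 0;
    for (auto it = l.begin(); it != l.end();) {
        auto next = std::next(it); // capture successor first
        sum += *it;
        l.erase(it);               // erasing current is now safe
        it = next;
    }
    return sum;
}
```

This mirrors the contract under discussion: the function returns the next entry, while the actual erasure of the current one may happen elsewhere (or not at all, as the comment thread points out).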
andrewtoth
commented at 9:22 pm on August 5, 2024:
Ianilfy
commented at 10:23 am on September 13, 2024:
What do I do
andrewtoth force-pushed
on Aug 5, 2024
coins: track flagged cache entries in linked list
No visible behavior change. This commit tracks the flagged
entries internally but the list is not iterated by anything.
Co-Authored-By: Pieter Wuille <pieter@wuille.net>
Co-Authored-By: l0rinc <pap.lorinc@gmail.com>
24ce37cb86
test: add cache entry linked list tests
Co-Authored-By: l0rinc <pap.lorinc@gmail.com>
a14edada8a
coins: pass linked list of flagged entries to BatchWrite
BatchWrite now iterates through the linked
list of flagged entries instead of the entire
coinsCache map.
Co-Authored-By: l0rinc <pap.lorinc@gmail.com>
7825b8b9ae
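The payoff of the linked list is visible in this commit: a sync pass touches only the flagged entries rather than scanning every coin in the map. A simplified sketch (hypothetical names; a singly linked list here for brevity, where the PR uses a doubly linked one) of a cache whose Sync walks only the dirty chain:

```cpp
#include <string>
#include <unordered_map>

struct CacheEntry {
    int value{0};
    bool dirty{false};
    CacheEntry* next{nullptr}; // intrusive link into the dirty chain
};

struct Cache {
    // unordered_map never invalidates pointers to elements on rehash,
    // so storing raw entry pointers in the chain is safe.
    std::unordered_map<std::string, CacheEntry> map;
    CacheEntry* dirty_head{nullptr};

    void Write(const std::string& key, int value) {
        CacheEntry& e = map[key];
        e.value = value;
        // Link into the dirty chain only on the first modification.
        if (!e.dirty) { e.dirty = true; e.next = dirty_head; dirty_head = &e; }
    }

    // Visit only flagged entries, clear their flags, and reset the chain;
    // clean entries in the map are never touched.
    int Sync() {
        int flushed = 0;
        for (CacheEntry* e = dirty_head; e;) {
            CacheEntry* next = e->next;
            e->dirty = false;
            e->next = nullptr;
            ++flushed;
            e = next;
        }
        dirty_head = nullptr;
        return flushed;
    }
};
```

With a large, mostly-clean cache this turns each Sync from O(cache size) into O(dirty entries), which is exactly why frequent prune flushes stop dominating IBD time.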
andrewtoth force-pushed
on Aug 5, 2024
andrewtoth
commented at 11:45 pm on August 5, 2024:
contributor
Might as well address some nits since I invalidated the ACKs anyway.
Erase spent cache entries and clear flags of unspent
entries inside the BatchWrite loop, instead of an
additional loop after BatchWrite.
Co-Authored-By: Pieter Wuille <pieter@wuille.net>
05cf4e1875
Add linked-list test to CCoinsViewCache::SanityCheck
0e8918755f
validation: don't erase coins cache on prune flushes
589db872e1
andrewtoth force-pushed
on Aug 6, 2024
l0rinc
commented at 6:38 am on August 6, 2024:
contributor
Benchmark 1: COMMIT=df34a9 ./src/bitcoind -datadir=/root/BitcoinData -dbcache=16384 -prune=550 -stopatheight=300000 -printtoconsole=0
  Time (mean ± σ):   993.561 s ± 36.138 s  [User: 1001.395 s, System: 96.911 s]
  Range (min … max): 968.008 s … 1019.115 s  2 runs

Benchmark 2: COMMIT=28f0dd ./src/bitcoind -datadir=/root/BitcoinData -dbcache=16384 -prune=550 -stopatheight=300000 -printtoconsole=0
  Time (mean ± σ):   1168.303 s ± 39.657 s  [User: 919.521 s, System: 95.169 s]
  Range (min … max): 1140.262 s … 1196.345 s  2 runs

Summary
  'COMMIT=df34a9 ./src/bitcoind -datadir=/root/BitcoinData -dbcache=16384 -prune=550 -stopatheight=300000 -printtoconsole=0' ran
    1.18 ± 0.06 times faster than 'COMMIT=28f0dd ./src/bitcoind -datadir=/root/BitcoinData -dbcache=16384 -prune=550 -stopatheight=300000 -printtoconsole=0'
andrewtoth
commented at 11:50 am on August 6, 2024:
contributor
@paplorinc if you’re not connecting to a local trusted node, there will be too much variance in how quickly you can connect to peers and how reliably they serve the blocks. Also, there are barely any prune flushes in the first 300k blocks anyway. That’s not a good benchmark for this.
l0rinc
commented at 12:16 pm on August 6, 2024:
contributor
if you’re not connecting to a local trusted node
This is by design. I want a real-world test. Others have already tested the local-node scenario, so I’ll just work around the volatility by increasing the repetition.
That’s a not a good benchmark for this
I’m running 3 rounds now until 500k - that should be representative, right?
l0rinc
commented at 12:34 pm on August 7, 2024:
contributor
I come bearing good news again \:D/
Benchmark 1: COMMIT=df34a9 ./src/bitcoind -datadir=/mnt/sda1/BitcoinData -dbcache=16384 -prune=550 -stopatheight=500000 -printtoconsole=0
  Time (mean ± σ):   9216.193 s ± 165.608 s  [User: 10809.113 s, System: 1931.597 s]
  Range (min … max): 9065.956 s … 9393.770 s  3 runs

Benchmark 2: COMMIT=589db87 ./src/bitcoind -datadir=/mnt/sda1/BitcoinData -dbcache=16384 -prune=550 -stopatheight=500000 -printtoconsole=0
  Time (mean ± σ):   7087.173 s ± 320.442 s  [User: 8476.776 s, System: 789.276 s]
  Range (min … max): 6737.609 s … 7367.011 s  3 runs

resulting in:

Summary
  'COMMIT=589db87 ./src/bitcoind -datadir=/mnt/sda1/BitcoinData -dbcache=16384 -prune=550 -stopatheight=500000 -printtoconsole=0' ran
    1.30 ± 0.06 times faster than 'COMMIT=df34a9 ./src/bitcoind -datadir=/mnt/sda1/BitcoinData -dbcache=16384 -prune=550 -stopatheight=500000 -printtoconsole=0'
i.e. 30% faster than master (I’ve used the first refactoring commit here as reference, seemed easier to validate) for the first 500k blocks with fully remote nodes to mimic realistic usage.
davidgumberg
commented at 5:41 pm on August 7, 2024:
I think this is right as-is; I just wanted to note that I believe @mzumsande’s earlier comment is relevant here. Namely, that it’s not necessary to check for freshness here if we cannot have fresh-but-not-dirty entries, and even if we could, I’m not sure why we wouldn’t remove them from the cache in this instance. But I think this should be addressed by any revival of #18746.
andrewtoth
commented at 10:10 pm on August 18, 2024:
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-06-17 15:13 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me