Change UpdateForDescendants to use Epochs #18191

pull JeremyRubin wants to merge 1 commit into bitcoin:master from JeremyRubin:epoch-mempool-cache-regression-plus-remove-excluded changing 2 files +80 −48
  1. JeremyRubin commented at 4:19 am on February 22, 2020: contributor

    This patch replaces #18120, fixing some regressions or issues with that PR. The overall change here should be easier to review as well.

    Refactors UpdateForDescendants to use Epochs instead of sets to de-duplicate traversal, and stores the cache entries in a vector instead of a set. This is a straightforward win. The algorithm is a bit clever so as not to require two separate vectors for txiters that need expanding and txiters that have already been processed, but I think it’s relatively easy to reason about (a rough sketch of the traversal shape is included at the end of this comment).

    This is extracted from #18063 as a standalone change because it turns out the DoS issues with this code perhaps merit more extensive review (http://gnusha.org/bitcoin-core-dev/2020-02-11.log). That PR will stay open for review, but this PR is designed as an “obviously better” win that can be merged now to keep up progress on the mempool project.

    sdaftuar can you re-run your bench from #18120?
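    A minimal, self-contained sketch of the traversal shape described above, on a generic DAG rather than the mempool types (Node, CacheMap, DescendantsWithEpochs and absorb are illustrative stand-ins, not the PR’s code, and the PR’s additional rule of excluding members of vHashesToUpdate from the line is omitted): one vector serves as both the breadth-first work queue and the resulting cache line, an epoch stamp replaces the de-duplication set, and a cache hit splices previously computed descendants into the already-walked prefix so they are never expanded again.

     #include <cstddef>
     #include <cstdint>
     #include <unordered_map>
     #include <utility>
     #include <vector>

     struct Node {
         std::vector<const Node*> children;
         mutable uint64_t m_epoch{0}; // mutable: can be stamped even through const access
     };

     // Maps a node to its full, previously computed descendant set.
     using CacheMap = std::unordered_map<const Node*, std::vector<const Node*>>;

     std::vector<const Node*> DescendantsWithEpochs(const Node* root, uint64_t epoch, const CacheMap& cache)
     {
         // visited() both tests and stamps the epoch, replacing a std::set of seen entries.
         auto visited = [epoch](const Node* n) {
             const bool seen = (n->m_epoch == epoch);
             n->m_epoch = epoch;
             return seen;
         };
         visited(root); // the root never belongs in its own cache line

         std::vector<const Node*> line;    // doubles as the work queue and the final cache line
         std::size_t next_idx_to_walk = 0; // entries at index < next_idx_to_walk need no expansion

         // Move an already-expanded entry into the walked prefix so it is never traversed again.
         auto absorb = [&](const Node* n) {
             line.push_back(n);
             std::swap(line[next_idx_to_walk], line.back());
             ++next_idx_to_walk;
         };

         const Node* next = root;
         do {
             for (const Node* child : next->children) {
                 if (visited(child)) continue;
                 const auto hit = cache.find(child);
                 if (hit != cache.end()) {
                     // Cache hit: the child's descendants are already fully known, so splice
                     // them in instead of re-walking them.
                     absorb(child);
                     for (const Node* grand : hit->second) {
                         if (!visited(grand)) absorb(grand);
                     }
                 } else {
                     line.push_back(child); // schedule for later expansion
                 }
             }
             if (next_idx_to_walk >= line.size()) break;
             next = line[next_idx_to_walk++];
         } while (true);
         return line;
     }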

  2. JeremyRubin requested review from ajtowns on Feb 22, 2020
  3. JeremyRubin requested review from sdaftuar on Feb 22, 2020
  4. fanquake added the label Mempool on Feb 22, 2020
  5. JeremyRubin force-pushed on Feb 22, 2020
  6. DrahtBot commented at 9:26 am on February 22, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    No conflicts as of last run.

  7. in src/txmempool.cpp:123 in ec237cf8c6 outdated
    130-            modify_size += child.GetTxSize();
    131-            modify_fee += child.GetModifiedFee();
    132-            modify_count++;
    133-            mapTx.modify(child_it, update_ancestor_state(update_it->GetTxSize(), update_it->GetModifiedFee(), 1, update_it->GetSigOpCost()));
    134-        }
    135+        modify_size += child.GetTxSize();
    


    sdaftuar commented at 8:28 pm on February 25, 2020:

    GitHub won’t let me leave a comment up at line 106, so I’m leaving it here. I think the change in this commit that tries not to include things in exclude is broken: it’s possible you’ll get a cache hit for something that is excluded, and then, because things that are in the cache get swapped at line 106 to an entry in the vector that already_traversed has advanced past, you’ll never pop it before you get down to this loop.

    I tend to think that this improvement of leaving things out of exclude isn’t worth it; it seems like quite a bit of added complexity in this logic without an obvious performance improvement.


    JeremyRubin commented at 8:05 pm on February 26, 2020:
    We hold the invariant that everything in the cache is not excluded? So how would this have a bug?

    JeremyRubin commented at 8:15 pm on February 26, 2020:
    Ah I see what you’re saying. See comment above, will fix.
  8. in src/txmempool.cpp:138 in ec237cf8c6 outdated
    140     mapTx.modify(update_it, update_descendant_state(modify_size, modify_fee, modify_count));
    141     // share the cache (if there is one)
    142-    if (!update_cache.empty()) cache.emplace(update_it, std::move(update_cache));
    143+    if (!update_cache.empty()) {
    144+        // Ask to shrink to fit if we're using space poorly
    145+        if (update_cache.size() > 100 && update_cache.capacity() > 2 * update_cache.size())
    


    sdaftuar commented at 8:28 pm on February 25, 2020:
    I overlooked this issue in #18120 – is this the regression you’re referring to in the commit message?

    JeremyRubin commented at 8:04 pm on February 26, 2020:

    Nope, this is a result of the fix: we sometimes now have a cache that is too big because we remove things from the cache if they are in exclude. We could choose not to do that, and then the cache entries could still carry an excess of O(direct children in exclude), compared to O(exclude) without this patch, and an excess of at most 100 entries or no more than 2*size with this patch. It also only does anything on systems that implement shrink_to_fit, fwiw. (A sketch of the idea follows below.)

    I’ll tag you on the relevant line….
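    A self-contained sketch of the shrink-to-fit heuristic being discussed; Entry, Cache and share_cache_line are placeholder names, the size/capacity condition is taken from the quoted hunk, and the shrink_to_fit() body is an assumption since the quoted hunk cuts off after the condition.

     #include <map>
     #include <utility>
     #include <vector>

     // Placeholder types standing in for txiter/cacheMap; only the heuristic matters here.
     using Entry = int;
     using Cache = std::map<Entry, std::vector<Entry>>;

     void share_cache_line(Cache& cache, Entry key, std::vector<Entry>&& update_cache)
     {
         if (update_cache.empty()) return;
         // Only ask to shrink when the vector is both large and wasting more than half of
         // its capacity; shrink_to_fit() is a non-binding request.
         if (update_cache.size() > 100 && update_cache.capacity() > 2 * update_cache.size()) {
             update_cache.shrink_to_fit();
         }
         // Share the cache line with later callers, as in the quoted hunk.
         cache.emplace(key, std::move(update_cache));
     }

    On standard libraries where shrink_to_fit() is effectively a no-op the branch simply has no effect, which matches the “only does anything on systems that implement shrink_to_fit” caveat above.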

  9. sdaftuar commented at 8:29 pm on February 25, 2020: member

    Only looked at ec237cf8c640f9e85c3c7aad0d3eb243a346b1f9 but I think maybe not worth the added complexity, if I’m understanding correctly.

    Also the shrink_to_fit of the update_cache vector could just be included in #18120?

  10. JeremyRubin added this to the "Epoch MemPool" column in a project

  11. JeremyRubin force-pushed on Mar 3, 2020
  12. JeremyRubin force-pushed on Mar 3, 2020
  13. JeremyRubin commented at 4:46 am on March 3, 2020: contributor
    Fixed and updated comment.
  14. JeremyRubin force-pushed on Mar 3, 2020
  15. JeremyRubin commented at 5:55 am on March 3, 2020: contributor

    Sorry for all the line noise; I realized I forgot to add a continue, and then decided that I could DRY up the code a bit. I’m pretty happy with the final version; I think it’s a bit easier to read because we don’t treat update_it differently from the other iterators we walk (other than it not going into the cache line).

    I can squash this all down into one commit and replace #18120.

  16. JeremyRubin renamed this:
    Epoch mempool cache regression plus remove excluded set.
    Change UpdateForDescendants to use Epochs
    on Mar 6, 2020
  17. JeremyRubin force-pushed on Mar 6, 2020
  18. hebasto commented at 5:43 pm on March 29, 2020: member
    Concept ACK.
  19. in src/txmempool.cpp:120 in 32985d34a0 outdated
    133+    // compute updates now.
    134+    int64_t modify_size = 0;
    135+    CAmount modify_fee = 0;
    136+    int64_t modify_count = 0;
    137+    // All entries in the new_cache_line have
    138+    // already been checked against excluded list
    


    hebasto commented at 8:10 pm on March 29, 2020:
    I’m confused by “… have already been checked against excluded list”. Which list?

    JeremyRubin commented at 2:02 am on March 30, 2020:

    mapMemPoolDescendantsToUpdate grows to contain the set of things to exclude while processing.

    Previously this list was explicitly constructed, now it’s implicit


    hebasto commented at 9:14 am on March 30, 2020:

    Previously this list was explicitly constructed, now it’s implicit

    Yes. And the new logic does not apply any rule or list “to exclude” an element. It seems the comments should avoid any “excluding” notions and describe the logic in terms of “including” or “conditional inclusion”.


    JeremyRubin commented at 10:27 pm on March 30, 2020:

    Well the list does exist – it’s vHashesToUpdate. We just don’t need to actually check or construct it because it’s an implicit invariant held by the code.

    I can’t comment on the semantic difference between excluding and conditional including, but it does seem that what we’re doing is excluding.


    JeremyRubin commented at 10:29 pm on March 30, 2020:
    In any case this comment & the function comment serve to document what the expected invariant is for the caller, so I prefer to leave this comment.
  20. hebasto commented at 8:41 am on March 30, 2020: member

    IIUC, the first pass of the do ... while (true); loop just emplaces all of the direct descendants of the update_it tx, i.e., its children, into the new_cache_line, and taints their m_epoch. All of the following loop passes process grandchildren, great-grandchildren and so on. And two things are confusing to me:

    1. the following comment: https://github.com/bitcoin/bitcoin/blob/32985d34a0a0cfd4ff8c4d96ff461778eaf72a8a/src/txmempool.cpp#L79-L80

    I’d suggest removing it completely.

    2. the following variable name: https://github.com/bitcoin/bitcoin/blob/32985d34a0a0cfd4ff8c4d96ff461778eaf72a8a/src/txmempool.cpp#L87

    I’d suggest:

    • s/cached_grand_children/cached_descendants/
    • s/grand_child_it/descendant_it/
    • s/the grand child/the descendant/ in the comment

    EDIT: I realized that you mean “children” and “grandchildren” relative to an element of the new_cache_line. So, ignore my suggestion (2).

  21. in src/txmempool.cpp:103 in 32985d34a0 outdated
    116+            } else {
    117                 // Schedule for later processing
    118-                stageEntries.insert(childEntry);
    119+                // Element must not be excluded, or it would be in the cache already.
    120+                new_cache_line.emplace_back(child_it);
    121             }
    


    hebasto commented at 9:40 am on March 30, 2020:

    The internal loop seems redundant. Could this work:

     0    do {
     1        for (const txiter child_it : GetMemPoolChildren(next_it)) {
     2            if (visited(child_it)) continue;
     3            new_cache_line.emplace_back(child_it);
     4        }
     5
     6        // finish loop here!
     7        if (next_idx_to_walk >= new_cache_line.size()) break;
     8
     9        // Prepare for next iteration:
    10        next_it = new_cache_line[next_idx_to_walk];
    11        ++next_idx_to_walk;
    12    } while (true);
    

    ?

    UPDATED: … or even

    0    txiter next_it = update_it;
    1    for (size_t next_idx_to_walk = 0; next_idx_to_walk < new_cache_line.size(); ++next_idx_to_walk) {
    2        for (const txiter child_it : GetMemPoolChildren(next_it)) {
    3            if (visited(child_it)) continue;
    4            new_cache_line.emplace_back(child_it);
    5        }
    6        next_it = new_cache_line[next_idx_to_walk];
    7    }
    

    JeremyRubin commented at 8:34 pm on March 30, 2020:

    No because then we aren’t using the cached traversal at all.

    It could be correct, but has some algorithmic complexity blowups.


    JeremyRubin commented at 8:41 pm on March 30, 2020:
    Your update is still not using the cache lines….

    hebasto commented at 9:08 pm on March 30, 2020:

    It could be correct, but has some algorithmic complexity blowups.

    How could it be, if every element is visited only once?


    JeremyRubin commented at 10:22 pm on March 30, 2020:
    I’m saying “could be” because I haven’t bothered to actually check your code above for correctness, given the glaring issue that it doesn’t use the caches which leads to unacceptable algorithmic complexity.

    JeremyRubin commented at 1:46 am on March 31, 2020:
    I guess to pull up a level – I do agree that the loop is a bit atypical in structure, but I do think it’s the most straightforward to reason about, because it matters exactly when certain checks and variable updates happen.

    hebasto commented at 9:47 am on March 31, 2020:

    “… it doesn’t use the caches…”

    Corrected:

    0    txiter next_it = update_it;
    1    for (size_t next_idx_to_walk = 0; next_idx_to_walk < new_cache_line.size(); ++next_idx_to_walk) {
    2        for (const txiter child_it : GetMemPoolChildren(next_it)) {
    3            if (visited(child_it)) continue;
    4            if (cache.find(child_it) != cache.end()) continue;
    5            new_cache_line.emplace_back(child_it);
    6        }
    7        next_it = new_cache_line[next_idx_to_walk];
    8    }
    

    JeremyRubin commented at 8:05 pm on March 31, 2020:

    So now the code checks if there is a cache entry, and then continues… but you’re now both not using the cache and incorrect. If you find a cache line for a child then you must include all of its cache entries in the new_cache_line, otherwise a future call will skip updating descendants properly.

    The code that is in the PR is (I believe) correct & does not have superfluous lines. If there were a good simplification of that logic I’d be for it, but I think it’s already pretty much in its minimal correct form. There’s not an extra internal loop to get rid of.


    hebasto commented at 9:30 am on April 3, 2020:
    Code in the PR passes the test from #18485. All my suggestions fail the test.
  22. in src/txmempool.cpp:122 in 32985d34a0 outdated
    135+    CAmount modify_fee = 0;
    136+    int64_t modify_count = 0;
    137+    // All entries in the new_cache_line have
    138+    // already been checked against excluded list
    139+    for (txiter child_it : new_cache_line) {
    140+        const CTxMemPoolEntry& child = *child_it;
    


    hebasto commented at 10:27 am on March 30, 2020:
    Why is a new variable needed? child_it->GetTxSize() and child_it->GetModifiedFee() seem to work well, no?

    JeremyRubin commented at 8:38 pm on March 30, 2020:

    It would work, but I have a preference for accessing through a const ref where possible, to avoid using non-const APIs where they’re not explicitly offered.

    There’s no performance overhead to this; it shouldn’t even result in an extra variable in the generated code.

  23. hebasto approved
  24. hebasto commented at 8:28 pm on April 3, 2020: member

    ACK 32985d34a0a0cfd4ff8c4d96ff461778eaf72a8a, tested on Linux Mint 19.3 with the functional test from #18485 (cfaa79b9f4d0c220c220ae0d8a6825581aa89133):

    the average time (from 10 runs) reported by the test decreased by 9%, from 0.367s to 0.347s.

  25. MarcoFalke referenced this in commit 978c5a2122 on Apr 29, 2020
  26. sidhujag referenced this in commit b991503208 on Apr 29, 2020
  27. laanwj added this to the "Blockers" column in a project

  28. in src/txmempool.cpp:78 in 32985d34a0 outdated
    89+    // If index is >= next_idx_to_walk, we need to walk it.
    90+    // If next_idx_to_walk >= new_cache_line.size(), we're finished.
    91+    size_t next_idx_to_walk = 0;
    92+    txiter next_it = update_it;
    93+    do {
    94+        for (const txiter child_it : GetMemPoolChildren(next_it)) {
    


    jonatack commented at 3:44 pm on May 24, 2020:
    0        for (const txiter& child_it : GetMemPoolChildren(next_it)) {
    

    to avoid these compile warnings:

    0txmempool.cpp:78:27: warning: loop variable 'child_it' of type 'const CTxMemPool::txiter' (aka 'const hashed_index_iterator<hashed_index_node<ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, index_node_base<CTxMemPoolEntry, std::allocator<CTxMemPoolEntry> > > > >, boost::multi_index::detail::hashed_unique_tag>, bucket_array<std::allocator<CTxMemPoolEntry> >, boost::multi_index::detail::hashed_index_global_iterator_tag>') creates a copy from type 'const CTxMemPool::txiter' [-Wrange-loop-analysis]
    1        for (const txiter child_it : GetMemPoolChildren(next_it)) {
    2                          ^
    3txmempool.cpp:78:14: note: use reference type 'const CTxMemPool::txiter &' (aka 'const hashed_index_iterator<hashed_index_node<ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, index_node_base<CTxMemPoolEntry, std::allocator<CTxMemPoolEntry> > > > >, boost::multi_index::detail::hashed_unique_tag>, bucket_array<std::allocator<CTxMemPoolEntry> >, boost::multi_index::detail::hashed_index_global_iterator_tag> &') to prevent copying
    4        for (const txiter child_it : GetMemPoolChildren(next_it)) {
    

    hebasto commented at 3:57 pm on May 24, 2020:
    Suppose you compiled with clang?

    jonatack commented at 4:23 pm on May 24, 2020:
    Yes, this was a clang warning (I often compile with both gcc and clang for anything non-trivial as they seem to have complementary results).

    JeremyRubin commented at 8:48 pm on May 24, 2020:
    Hmm, I think specifically we do want copying here because it’s an iter, which in boost::multi_index should be just a pointer, so a reference to an iter implies extra unnecessary indirection? @sipa is probably most familiar with multi_index innards to know if this is correct.

    sipa commented at 9:13 pm on May 24, 2020:

    Yeah, CTxMemPool::txiter is just a single pointer, a copy would be desirable here.

    However, I suspect that the compiler will figure this out, and produce pretty much equal code for both.


    JeremyRubin commented at 9:52 pm on May 24, 2020:

    Thanks sipa.

    My preference is to leave the code as-is if the behavior is correct & desirable (even if the compiler is likely to optimize out the indirection), since this seems to be just an aggressive warning from clang; I’d only change it if the warning were excessively bad. The right solution seems to be to annotate somehow that txiter is a copyable type. How can this be done?


    sipa commented at 10:11 pm on May 24, 2020:
    It seems there are two solutions: not mark the loop variable const (if you want something mutable, a copy is needed anyway), or upgrade to clang 10.

    sipa commented at 10:17 pm on May 24, 2020:
    FWIW, at -O2, clang 9 compiles both to the same code.

    MarcoFalke commented at 10:27 pm on May 24, 2020:
    txiter is already const (it would be illegal to modify a mempool entry through the iterator). I don’t think we write ‘const txiter’ anywhere.

    sipa commented at 10:35 pm on May 24, 2020:
    It’s a const_iterator. The const here just prevents modifying the txiter object, not what it points to. It’s a vaguely sensible thing to do when you know a variable isn’t going to be modified.

    JeremyRubin commented at 10:54 pm on May 24, 2020:

    @MarcoFalke this is not quite true, it’s perfectly legal to modify through the iterator via at least two means:

    1. Const cast – the txiter is known to be non-const somewhere, so updating is fine
    2. Mutable fields – the txiter epochs are mutable, same with vTxHashesIdx

    In this case we are modifying the epochs via the mutable fields even though it’s const.
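    A standalone illustration of the language rule being described here (Entry and its members are hypothetical stand-ins, not the mempool types): a constant iterator only forbids writing non-mutable members, which is exactly what allows the epoch stamps to be updated through a const txiter.

     #include <cstdint>
     #include <set>

     struct Entry {
         int value;
         mutable uint64_t m_epoch; // mutable members may be written even through const access

         bool operator<(const Entry& other) const { return value < other.value; }
     };

     int main()
     {
         std::set<Entry> entries{{1, 0}, {2, 0}};
         auto it = entries.begin(); // std::set (like boost::multi_index) hands out const access only
         it->m_epoch = 42;          // fine: m_epoch is mutable
         // it->value = 7;          // would not compile: value is not mutable
     }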


    jonatack commented at 8:54 am on May 25, 2020:

    Yes, on Clang 6 (Debian), making the txiters non-const also removes the warning.

    If the resulting code is the same, my suggestion would be to appease the warnings so as not to add needlessly to the noise and friction for reviewers, here and down the road. It could be added as a follow-up if this is merged.

  29. in src/txmempool.cpp:89 in 32985d34a0 outdated
    100+            // - All keys in the cache are already excluded, no need to add
    101+            //   `child_it` to the new_cache_line (but not all excluded keys
    102+            //   are in the cache yet!)
    103+            cacheMap::iterator cached_grand_children = cache.find(child_it);
    104+            if (cached_grand_children != cache.end()) {
    105+                for (const txiter grand_child_it : cached_grand_children->second) {
    


    jonatack commented at 3:44 pm on May 24, 2020:
    0                for (const txiter& grand_child_it : cached_grand_children->second) {
    

    to avoid these compile warnings:

    0txmempool.cpp:89:35: warning: loop variable 'grand_child_it' of type 'const CTxMemPool::txiter' (aka 'const hashed_index_iterator<hashed_index_node<ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, index_node_base<CTxMemPoolEntry, std::allocator<CTxMemPoolEntry> > > > >, boost::multi_index::detail::hashed_unique_tag>, bucket_array<std::allocator<CTxMemPoolEntry> >, boost::multi_index::detail::hashed_index_global_iterator_tag>') creates a copy from type 'const CTxMemPool::txiter' [-Wrange-loop-analysis]
    1                for (const txiter grand_child_it : cached_grand_children->second) {
    2                                  ^
    3txmempool.cpp:89:22: note: use reference type 'const CTxMemPool::txiter &' (aka 'const hashed_index_iterator<hashed_index_node<ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, index_node_base<CTxMemPoolEntry, std::allocator<CTxMemPoolEntry> > > > >, boost::multi_index::detail::hashed_unique_tag>, bucket_array<std::allocator<CTxMemPoolEntry> >, boost::multi_index::detail::hashed_index_global_iterator_tag> &') to prevent copying
    4                for (const txiter grand_child_it : cached_grand_children->second) {
    
  30. jonatack commented at 7:58 am on May 25, 2020: member
    It seems that performance is the motivation for this PR. Could benchmarks, or a link to them, be added to the PR description?
  31. hebasto commented at 8:04 am on May 25, 2020: member

    @jonatack

    It seems that performance is the motivation for this PR. Could benchmarks, or a link to them, be added to the PR description?

    Some benchmarks are done :)

  32. hebasto commented at 9:27 am on May 25, 2020: member

    Benchmarking:

    • on master (24f70290642c9c5108d3dc62dbe055f5d1bcff9d)
     $ for i in {1..10}; do ./test/functional/mempool_updatefromblock.py | grep -o 'in [0-9]\.[0-9]*'; done
     in 0.2627599239349365
     in 0.26287293434143066
     in 0.26229262351989746
     in 0.2730386257171631
     in 0.29637765884399414
     in 0.29340052604675293
     in 0.32193946838378906
     in 0.26309871673583984
     in 0.2626173496246338
     in 0.2927675247192383
    
    • this PR on top of the master
     $ for i in {1..10}; do ./test/functional/mempool_updatefromblock.py | grep -o 'in [0-9]\.[0-9]*'; done
     in 0.2642529010772705
     in 0.29978084564208984
     in 0.26271796226501465
     in 0.28136396408081055
     in 0.2789490222930908
     in 0.2894265651702881
     in 0.2904317378997803
     in 0.2793881893157959
     in 0.31591129302978516
     in 0.317304134368896
    

    The previous benchmarking was done on a different platform.

  33. jonatack commented at 5:21 pm on May 25, 2020: member

    Thanks @hebasto, I’ve been running your benchmarks all day long on a 2012 MacBookPro that seems to run them 6x slower than you :). Results here when it’s done.

    Initial results for 12 runs on a more recent Debian laptop were showing 1-3% improvement on average but with quite a bit more variance than you, for this PR rebased on master versus master. Not statistically significant given the wide variance in results, which is why I offloaded a longer benchmarking to the older computer that is also doing nothing else.

  34. JeremyRubin commented at 8:07 pm on May 25, 2020: contributor

    Thanks all for running and developing these benches! @jonatack

    I think the thing to look at is that this is being done for a bunch of reasons that cover more of performance than just run time on a benchmark. If the run time is similar on a given bench, that’s a very favorable outcome for this patch. If it’s better, that’s great.

    But this is ultimately more about DoS surface than sheer performance – even if this algorithm were slower on benches but guaranteed to stay within a better asymptotic bound, it would be an improvement over the current code.

    With this patch we allocate way less memory. So the maximally bad block (though it’s unclear whether @hebasto’s bench is maximally bad) is improved. Profilers should also look at the allocation differences between runs of master and this PR. Back of the envelope, in this PR we make N allocations of size N, vs. N^2 allocations of size 1 (a rough illustration follows below). At small N, the N allocations of size N will look the same or worse, but getting rid of an N^2 path is very desirable.
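    A toy illustration of that back-of-the-envelope claim (simplified functions, not the mempool code): with a per-transaction std::set every inserted descendant is its own node allocation, so N transactions with ~N descendants each cost ~N^2 one-element allocations, while a per-transaction std::vector keeps descendants in contiguous storage, for roughly N allocations of size ~N overall.

     #include <cstddef>
     #include <set>
     #include <vector>

     // Old shape: every insert allocates a separate tree node.
     std::set<std::size_t> collect_with_set(std::size_t n)
     {
         std::set<std::size_t> descendants;
         for (std::size_t i = 0; i < n; ++i) descendants.insert(i);
         return descendants;
     }

     // New shape: contiguous storage; a single allocation if the size can be reserved up front.
     std::vector<std::size_t> collect_with_vector(std::size_t n)
     {
         std::vector<std::size_t> descendants;
         descendants.reserve(n); // one allocation of size n
         for (std::size_t i = 0; i < n; ++i) descendants.push_back(i);
         return descendants;
     }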

  35. hebasto commented at 3:55 am on May 26, 2020: member
    @jonatack @JeremyRubin Please note that mempool_updatefromblock.py from #18485 is not specially designed for benchmarking. It’s a functional test that was adapted for benchmarking while we have no dedicated one.
  36. jonatack commented at 7:35 am on May 27, 2020: member

    For what it’s worth, 30 test runs on master versus this branch rebased on master seem to show no significant difference in run time (~0.5%). They do appear to show a significant reduction in run-time variance with this PR. I ran four alternating cycles of 15: master, PR, master, PR; it took most of a day for the 60 runs.

     master

     (/ (+
      1.6158521175384521
      1.5466628074645996
      1.6130828857421875
      1.5822479724884033
      1.8187620639801025
      1.5548689365386963
      1.5336580276489258
      1.5182380676269531
      1.5340137481689453
      1.5922291278839111
      1.6903131008148193
      1.8066248893737793
      1.8288109302520752
      1.5529100894927979
      1.4984791278839111
      1.6140100955963135
      1.6495018005371094
      1.6218130588531494
      1.6670641899108887
      1.550534963607788
      1.540299654006958
      1.6754770278930664
      1.548105001449585
      1.5992188453674316
      1.5701148509979248
      1.5563862323760986
      1.5182199478149414
      1.5484421253204346
      1.578866958618164
      1.5370879173278809
     ) 30)
     1.602063

     (HEAD, origin/pr/18191) rebased on master

     (/ (+
      1.5921480655670166
      1.5645711421966553
      1.5695910453796387
      1.5787348747253418
      1.6047391891479492
      1.563709020614624
      1.5736000537872314
      1.5630147457122803
      1.5653061866760254
      1.599463939666748
      1.5765950679779053
      1.692945957183838
      1.580806016921997
      1.5970890522003174
      1.5502479076385498
      1.7198941707611084
      1.5886380672454834
      1.5441138744354248
      1.601973056793213
      1.5830471515655518
      1.7337110042572021
      1.5389633178710938
      1.5398321151733398
      1.588163137435913
      1.744027853012085
      1.569767951965332
      1.6018469333648682
      1.560020923614502
      1.542902946472168
      1.5561339855194092
     ) 30)
     1.5928533

     (/ 1.5928533 1.602063)
     0.9942514
    
  37. jonatack commented at 7:43 am on May 27, 2020: member

    The build warnings I encountered building in Debian with Clang 6 were also present when building on MacOS with the default Apple clang version 11.

    In both cases, these are the only compiler warnings I’m seeing while building Bitcoin Core. They can be removed by either making the txiters const reference or non-const.

     0txmempool.cpp:89:35: warning: loop variable 'grand_child_it' of type 'const CTxMemPool::txiter' (aka 'const hashed_index_iterator<hashed_index_node<ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, index_node_base<CTxMemPoolEntry, std::__1::allocator<CTxMemPoolEntry> > > > >, boost::multi_index::detail::hashed_unique_tag>, bucket_array<allocator<CTxMemPoolEntry> >, boost::multi_index::detail::hashed_index_global_iterator_tag>') creates a copy from type 'const CTxMemPool::txiter' [-Wrange-loop-analysis]
     1                for (const txiter grand_child_it : cached_grand_children->second) {
     2                                  ^
     3txmempool.cpp:89:22: note: use reference type 'const CTxMemPool::txiter &' (aka 'const hashed_index_iterator<hashed_index_node<ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, index_node_base<CTxMemPoolEntry, std::__1::allocator<CTxMemPoolEntry> > > > >, boost::multi_index::detail::hashed_unique_tag>, bucket_array<allocator<CTxMemPoolEntry> >, boost::multi_index::detail::hashed_index_global_iterator_tag> &') to prevent copying
     4                for (const txiter grand_child_it : cached_grand_children->second) {
     5                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     6txmempool.cpp:78:27: warning: loop variable 'child_it' of type 'const CTxMemPool::txiter' (aka 'const hashed_index_iterator<hashed_index_node<ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, index_node_base<CTxMemPoolEntry, std::__1::allocator<CTxMemPoolEntry> > > > >, boost::multi_index::detail::hashed_unique_tag>, bucket_array<allocator<CTxMemPoolEntry> >, boost::multi_index::detail::hashed_index_global_iterator_tag>') creates a copy from type 'const CTxMemPool::txiter' [-Wrange-loop-analysis]
     7        for (const txiter child_it : GetMemPoolChildren(next_it)) {
     8                          ^
     9txmempool.cpp:78:14: note: use reference type 'const CTxMemPool::txiter &' (aka 'const hashed_index_iterator<hashed_index_node<ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, index_node_base<CTxMemPoolEntry, std::__1::allocator<CTxMemPoolEntry> > > > >, boost::multi_index::detail::hashed_unique_tag>, bucket_array<allocator<CTxMemPoolEntry> >, boost::multi_index::detail::hashed_index_global_iterator_tag> &') to prevent copying
    10        for (const txiter child_it : GetMemPoolChildren(next_it)) {
    11             ^~~~~~~~~~~~~~~~~~~~~~~
    12
    132 warnings generated.
    
  38. jonatack commented at 7:48 am on May 27, 2020: member

    Thanks all for running and developing these benches!

    With this patch we allocate way less memory.

    I think it would help the PR to feature the relevant benchmarks, with instructions to reproduce them, in the PR description.

  39. MarcoFalke commented at 11:38 am on May 27, 2020: member
    I don’t think we have documentation on how to track memory usage. I remember I used massif at one point, but I forgot to write down notes. #17164 (comment)
  40. JeremyRubin commented at 4:21 am on May 28, 2020: contributor

    I think it would help the PR to feature relevant benchmarkings, with instructions to reproduce them, in the PR description.

    Good point. Generally this one is supposed to be an obviously better version that doesn’t really require much benching. You can easily see from the code that std::sets become vectors and a std::set of all conflicts is removed. But I suppose it’s a good standard to want benches/testing for this stuff even if it seems obvious – though as @MarcoFalke points out, benching memory is a bit tricky… @sdaftuar do you think any of the benches you’ve run previously can be published in a reproducible way?

  41. JeremyRubin commented at 4:29 am on May 28, 2020: contributor
    @jonatack I’ll fix the iterator reference thing. Is there a way you know of to make this warning trigger on all platforms?
  42. adamjonas commented at 7:19 pm on June 15, 2020: member
    I encountered identical build warnings with MacOS clang 10.
  43. fjahr commented at 3:50 pm on June 17, 2020: member
    Concept ACK, also encountered the build warnings though.
  44. JeremyRubin commented at 5:05 pm on June 17, 2020: contributor
    Will fix the build warnings soon; thanks all for the review (I’ll tag some people when I attempt the fix, as I can’t trigger the warnings locally).
  45. Change UpdateForDescendants to use Epochs 8e5aa08df4
  46. JeremyRubin force-pushed on Jun 17, 2020
  47. JeremyRubin commented at 8:35 pm on June 17, 2020: contributor
    @jonatack can you confirm this fixes the build warning you encountered?
  48. adamjonas commented at 0:31 am on June 18, 2020: member
    Confirming that 32985d34a0a0cfd4ff8c4d96ff461778eaf72a8a produced the warnings but 8e5aa08df4d0f55eae8842d01b1ab3d9a3b74fc5 does not.
  49. hebasto approved
  50. hebasto commented at 11:58 am on June 18, 2020: member

    re-ACK 8e5aa08df4d0f55eae8842d01b1ab3d9a3b74fc5, tested on Linux Mint 19.3 (x86_64) with Clang 6.0.0:

    • 32985d34a0a0cfd4ff8c4d96ff461778eaf72a8a
     0$ make > /dev/null
     1...
     2txmempool.cpp:89:35: warning: loop variable 'grand_child_it' of type 'const CTxMemPool::txiter' (aka 'const hashed_index_iterator<hashed_index_node<ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, index_node_base<CTxMemPoolEntry, std::allocator<CTxMemPoolEntry> > > > >, boost::multi_index::detail::hashed_unique_tag>, bucket_array<allocator<CTxMemPoolEntry> >, boost::multi_index::detail::hashed_index_global_iterator_tag>') creates a copy from type 'const CTxMemPool::txiter' [-Wrange-loop-analysis]
     3                for (const txiter grand_child_it : cached_grand_children->second) {
     4                                  ^
     5txmempool.cpp:89:22: note: use reference type 'const CTxMemPool::txiter &' (aka 'const hashed_index_iterator<hashed_index_node<ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, index_node_base<CTxMemPoolEntry, std::allocator<CTxMemPoolEntry> > > > >, boost::multi_index::detail::hashed_unique_tag>, bucket_array<allocator<CTxMemPoolEntry> >, boost::multi_index::detail::hashed_index_global_iterator_tag> &') to prevent copying
     6                for (const txiter grand_child_it : cached_grand_children->second) {
     7                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     8txmempool.cpp:78:27: warning: loop variable 'child_it' of type 'const CTxMemPool::txiter' (aka 'const hashed_index_iterator<hashed_index_node<ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, index_node_base<CTxMemPoolEntry, std::allocator<CTxMemPoolEntry> > > > >, boost::multi_index::detail::hashed_unique_tag>, bucket_array<allocator<CTxMemPoolEntry> >, boost::multi_index::detail::hashed_index_global_iterator_tag>') creates a copy from type 'const CTxMemPool::txiter' [-Wrange-loop-analysis]
     9        for (const txiter child_it : GetMemPoolChildren(next_it)) {
    10                          ^
    11txmempool.cpp:78:14: note: use reference type 'const CTxMemPool::txiter &' (aka 'const hashed_index_iterator<hashed_index_node<ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, ordered_index_node<boost::multi_index::detail::null_augment_policy, index_node_base<CTxMemPoolEntry, std::allocator<CTxMemPoolEntry> > > > >, boost::multi_index::detail::hashed_unique_tag>, bucket_array<allocator<CTxMemPoolEntry> >, boost::multi_index::detail::hashed_index_global_iterator_tag> &') to prevent copying
    12        for (const txiter child_it : GetMemPoolChildren(next_it)) {
    13             ^~~~~~~~~~~~~~~~~~~~~~~
    142 warnings generated.
    15...
    
    • 8e5aa08df4d0f55eae8842d01b1ab3d9a3b74fc5 – no such warnings
  51. in src/txmempool.cpp:62 in 8e5aa08df4
    73-            if (cacheIt != cachedDescendants.end()) {
    74-                // We've already calculated this one, just add the entries for this set
    75-                // but don't traverse again.
    76-                for (txiter cacheEntry : cacheIt->second) {
    77-                    setAllDescendants.insert(cacheEntry);
    78+    const auto epoch = GetFreshEpoch();
    


    ariard commented at 5:30 pm on July 9, 2020:

    I think you should lay out the high-level algorithm behavior here, something like: “cacheMap maintains a key-to-elements mapping where each key represents a back-to-the-mempool transaction and the elements represent its children sorted XXX. We merge cache lines based on keys intersecting with the children of a later cache entry evaluation.”

    You can keep the per-line comments, but it would be better to normalize the style; IMO you blur code-behavior descriptions (e.g. “Create something”) with assumptions (“Elements are excluded, …”), which is a bit confusing.
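    One possible shape for such a high-level comment, assembled from the descriptions elsewhere in this thread (hypothetical wording, not text from the PR):

     // cacheMap maps a transaction being re-added to the mempool (a key, once it has been
     // processed) to the vector of its in-mempool descendants (its cache line). Descendants
     // that are themselves members of vHashesToUpdate never appear inside a cache line:
     // their contribution is already reflected in their ancestors' state from when they
     // re-entered the mempool. We walk update_it's children breadth-first, de-duplicating
     // with the current epoch; when a child already has a cache line we splice that line in
     // rather than traversing it again, and the finished vector becomes update_it's own
     // cache line.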

  52. in src/txmempool.cpp:121 in 8e5aa08df4
    134+    int64_t modify_size = 0;
    135+    CAmount modify_fee = 0;
    136+    int64_t modify_count = 0;
    137+    // All entries in the new_cache_line have
    138+    // already been checked against excluded list
    139+    for (txiter child_it : new_cache_line) {
    


    ariard commented at 5:44 pm on July 9, 2020:

    I wonder if the previous accounting logic was correct to exclude already-processed vHashesToUpdate members from the descendant state update. E.g., if back-to-mempool transaction Y is processed first, as reverse_iterate enforces, its parent X, processed second, won’t account for it as part of modify_fee and the rest.

    AFAICT, the new code maintains that behavior by not including Y in X’s cache line of children.

    Correct me if I’m missing some boost::multi_index internals?


    JeremyRubin commented at 6:36 pm on July 9, 2020:

    Ok so suppose vHashesToUpdate is {X, Y}.

    First we submit X and Y to the mempool.

    Then we reverse iterate and update Y, then X.

    Previously, we scanned across all of the direct children in the mempool of Y, then of X, and updated the parent/child relationships. So Y would process only things already in the mempool, and X would see Y in this list but exclude it via setExclude, as that relationship was handled properly when Y was put in the mempool.

    In the new patch, we scan across all of the direct children in the mempool of Y, then of X, and update the parent/child relationships. So Y would process only things already in the mempool, and X would see Y in this list but exclude it via mapMemPoolDescendantsToUpdate, as that relationship was handled properly when Y was put in the mempool; mapMemPoolDescendantsToUpdate gets a cacheline entry for everything in vHashesToUpdate, and because of the iteration order, children will always be visible to their parents.

    When we actually call into UpdateForDescendants, we only update against things that are in the cachelines, and not the keys to those cachelines. This is because earlier when Y was accepted to the mempool it should have already modified X’s fees.


    ariard commented at 7:34 pm on July 9, 2020:
    Gotcha, thanks for the laid-out explanation; I forgot that X’s state, as an ancestor of Y, is already modified at Y’s new mempool insertion. I was just confused by the cryptic comment “because any descendants in cache were added to the mempool after the transaction being updated and hence their state is already reflected in the parent state”. In fact, that should point to AcceptToMemoryPool in validation’s UpdateMempoolForReorg to enhance clarity.
  53. ariard commented at 6:05 pm on July 9, 2020: member

    I think this PR should lay out its optimization goals more, i.e. are these performance changes pursued to improve adversarial scenarios where a burdensome-to-evaluate reorg would unbalance a block race? Or is it still under the whole Epoch mempool goal of supporting more complex packages?

    You should mention the performance improvements in the commit message, namely adding an epoch to the descendants traversal to avoid duplicates and swapping in std::vector for std::set to remove the O(log n) cost at element insertion, and argue why that’s an improvement. Wrt std::vector, is this also an improvement in memory size? You could also spread them over 2 different commits to ease review.

    Overall, maybe you should come up with a benchmark suite with package payloads of different complexity to avoid hitting hidden performance regressions? I don’t think mempool_updatefromblock was designed for this.

  54. laanwj removed this from the "Blockers" column in a project

  55. DrahtBot added the label Needs rebase on Sep 7, 2020
  56. DrahtBot commented at 11:21 am on September 7, 2020: member

    🐙 This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  57. ryihan approved
  58. DrahtBot commented at 12:18 pm on February 22, 2022: member
    • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
    • Is it no longer relevant? ➡️ Please close.
    • Did the author lose interest or time to work on this? ➡️ Please close it and mark it ‘Up for grabs’ with the label, so that it can be picked up in the future.
  59. in src/txmempool.cpp:170 in 8e5aa08df4
    166@@ -139,13 +167,13 @@ void CTxMemPool::UpdateTransactionsFromBlock(const std::vector<uint256> &vHashes
    167                 assert(childIter != mapTx.end());
    168                 // We can skip updating entries we've encountered before or that
    169                 // are in the block (which are already accounted for).
    170-                if (!visited(childIter) && !setAlreadyIncluded.count(childHash)) {
    171+                if (!visited(childIter) && !mapMemPoolDescendantsToUpdate.count(childIter)) {
    


    Crypt-iQ commented at 2:49 pm on April 19, 2022:

    Not sure if this PR is still active, but I believe it may be more efficient to check:

    0
     1+++ if (!visited(childIter) && !childIter->GetMemPoolParentsConst().count(it)) {
    

    since if the child is unvisited and is in vHashesToUpdate, it will have it as a parent, which isn’t the case otherwise. I think this would be more efficient when the childIter’s set of parents is smaller than the cache size, which I would expect to be true more often than not.

  60. glozow commented at 9:58 pm on April 22, 2022: member
    this is a duplicate of #24158 and has needed rebase for 18 months, close?
  61. fanquake commented at 9:01 am on April 25, 2022: member

    close?

    I think so. Looking at the [comments in #24158](/bitcoin-bitcoin/24158/#issuecomment-1042219410):

    oh oops. uh yeah TBH I forgot I had that other PR open. They should be mostly the same. The main difference is that the earlier one also applies an optimization getting rid of setExclude and just using the cache line presence instead, which ends up being redundant with setExclude. We can add that optimization as a separate PR; since it’s a little bit less obvious why it works, I left it out when I rewrote this one.

  62. fanquake closed this on Apr 25, 2022

  63. vijaydasmp referenced this in commit d619cdf2bf on Dec 28, 2022
  64. vijaydasmp referenced this in commit 5b3da6ba6e on Dec 30, 2022
  65. PastaPastaPasta referenced this in commit d033cf5471 on Jan 12, 2023
  66. DrahtBot locked this on Apr 25, 2023
