This adds more functionality to the txgraph module, specifically:
TxGraph::GetMainStagingDiagrams(), a function to obtain feerate diagrams for both the main graph and the staged changes to it, including only the clusters that differ between the two.
TxGraph::GetBlockBuilder(), a function to obtain an object which can efficiently iterate the chunks of the (main) graph from high to low chunk feerate, allowing each to be skipped or included.
TxGraph::GetWorstMainChunk(), a function to obtain the last chunk that would be returned by GetBlockBuilder()'s returned object, intended for eviction.
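For illustration, the intended consumption pattern of such a chunk iterator can be sketched as follows. This is a hypothetical simplification: Chunk and BuildBlock are stand-ins invented here (the real interface works on TxGraph::Ref pointers and FeePerWeight feerates), and the rule that skipping a chunk also suppresses later chunks of the same cluster is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a (transaction chunk, feerate) unit; the real
// txgraph code reports chunks of Ref* together with a FeePerWeight feerate.
struct Chunk {
    int64_t fee;
    int64_t weight;
};

// Greedily walk chunks from highest to lowest chunk feerate (the order in
// which a block builder object would report them), including each chunk that
// still fits in the weight budget and skipping the rest.
std::vector<Chunk> BuildBlock(const std::vector<Chunk>& chunks_by_feerate, int64_t max_weight)
{
    std::vector<Chunk> included;
    int64_t used = 0;
    for (const auto& chunk : chunks_by_feerate) {
        if (used + chunk.weight <= max_weight) {
            included.push_back(chunk); // "include" this chunk
            used += chunk.weight;
        }
        // otherwise "skip" it and continue with lower-feerate chunks
    }
    return included;
}
```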
DrahtBot
commented at 2:50 pm on December 8, 2024:
contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
sipa force-pushed
on Dec 8, 2024
sipa force-pushed
on Dec 8, 2024
DrahtBot removed the label
CI failed
on Dec 8, 2024
glozow added the label
Mempool
on Dec 17, 2024
sipa force-pushed
on Dec 20, 2024
sipa force-pushed
on Dec 22, 2024
sipa force-pushed
on Jan 8, 2025
sipa
commented at 11:30 pm on January 8, 2025:
member
I have simplified the eviction interface. Instead of getting a TxGraph::Evictor object through TxGraph::GetEvictor(), there is now a simpler TxGraph::GetWorstMainChunk() function, which is a pure inspector.
DrahtBot added the label
CI failed
on Jan 9, 2025
DrahtBot
commented at 0:51 am on January 9, 2025:
contributor
sipa force-pushed
on Jan 9, 2025
DrahtBot removed the label
CI failed
on Jan 9, 2025
sipa force-pushed
on Jan 9, 2025
DrahtBot
commented at 11:40 pm on January 9, 2025:
contributor
DrahtBot added the label
CI failed
on Jan 9, 2025
sipa force-pushed
on Jan 10, 2025
DrahtBot removed the label
CI failed
on Jan 10, 2025
sipa force-pushed
on Jan 10, 2025
sipa force-pushed
on Jan 16, 2025
sipa force-pushed
on Jan 22, 2025
sipa
commented at 8:01 pm on January 22, 2025:
member
I modified TxGraph::GetWorstMainChunk() to return the chunk feerate in addition to the list of transaction Refs.
sipa force-pushed
on Jan 24, 2025
DrahtBot
commented at 11:48 pm on January 24, 2025:
contributor
DrahtBot added the label
CI failed
on Jan 24, 2025
sipa force-pushed
on Jan 26, 2025
DrahtBot removed the label
CI failed
on Jan 26, 2025
sipa force-pushed
on Jan 26, 2025
sipa force-pushed
on Jan 30, 2025
DrahtBot added the label
CI failed
on Jan 31, 2025
DrahtBot
commented at 1:16 am on January 31, 2025:
contributor
sipa force-pushed
on Jan 31, 2025
DrahtBot removed the label
CI failed
on Jan 31, 2025
sipa force-pushed
on Jan 31, 2025
DrahtBot added the label
CI failed
on Jan 31, 2025
DrahtBot
commented at 11:33 pm on January 31, 2025:
contributor
sipa force-pushed
on Feb 1, 2025
DrahtBot removed the label
CI failed
on Feb 1, 2025
sipa force-pushed
on Feb 4, 2025
sipa force-pushed
on Feb 6, 2025
sipa force-pushed
on Feb 11, 2025
sipa force-pushed
on Feb 12, 2025
DrahtBot
commented at 11:49 pm on February 12, 2025:
contributor
DrahtBot added the label
CI failed
on Feb 12, 2025
sipa force-pushed
on Feb 13, 2025
DrahtBot removed the label
CI failed
on Feb 13, 2025
sipa force-pushed
on Feb 14, 2025
sipa force-pushed
on Feb 20, 2025
sipa force-pushed
on Feb 21, 2025
in src/cluster_linearize.h:1381 in efad48b239 (outdated)
1364+ // Figure out which elements need to be moved before elem.
1365+ SetType place_before = done & depgraph.Ancestors(elem);
1366+ // Find which position to place elem in (updating j), continuously moving the elements
1367+ // in between forward.
1368+ while (place_before.Any()) {
1369+ // j cannot be 0 here; if it was, then there was necessarily nothing earlier which
yancyribbens
commented at 0:36 am on February 25, 2025:
// j cannot be 0 or less; if it was, then there was necessarily nothing earlier which
bitcoin deleted a comment
on Feb 25, 2025
sipa force-pushed
on Mar 3, 2025
ismaelsadeeq
commented at 8:31 pm on March 6, 2025:
member
Approach ACK, will review in-depth soon
in src/txgraph.cpp:853 in 22821d0bd5 (outdated)
651+{
652+ auto chunk_feerates = ChunkLinearization(m_depgraph, m_linearization);
653+ for (const auto& feerate : chunk_feerates) {
654+ ret.push_back(feerate);
655+ }
656+}
ismaelsadeeq
commented at 1:00 pm on March 13, 2025:
This code is gone, but no, I did mean >= 2 here (you cannot call GetMainStagingDiagrams unless you have at least 2 diagrams).
in src/txgraph.cpp:2106 in 22821d0bd5 (outdated)
1879+ for (Cluster* cluster : main_clusters) {
1880+ cluster->AppendChunkFeerates(main_feerates);
1881+ }
1882+ // Do the same for the Clusters in staging themselves.
1883+ const auto& staging = m_clustersets.back();
1884+ for (int quality = 0; quality < int(QualityLevel::NONE); ++quality) {
ismaelsadeeq
commented at 1:13 pm on March 13, 2025:
why are we iterating the clusters for all quality levels for staging?
sipa force-pushed
on Mar 19, 2025
DrahtBot removed the label
CI failed
on Mar 20, 2025
DrahtBot added the label
Needs rebase
on Mar 20, 2025
sipa force-pushed
on Mar 20, 2025
DrahtBot added the label
CI failed
on Mar 20, 2025
DrahtBot
commented at 1:41 pm on March 20, 2025:
contributor
sipa force-pushed
on Mar 20, 2025
DrahtBot removed the label
Needs rebase
on Mar 20, 2025
DrahtBot removed the label
CI failed
on Mar 20, 2025
sipa force-pushed
on Mar 22, 2025
DrahtBot
commented at 1:37 am on March 22, 2025:
contributor
DrahtBot added the label
CI failed
on Mar 22, 2025
sipa force-pushed
on Mar 22, 2025
DrahtBot removed the label
CI failed
on Mar 22, 2025
That's actually implied by Assume(m_main_clusterset.m_deps_to_add.empty()); below (but added a comment to clarify).
in src/txgraph.cpp:2089 in 7248d26bd0 (outdated)
1926+ for (const auto& cluster : m_staging_clusterset->m_clusters[quality]) {
1927+ cluster->AppendChunkFeerates(staging_feerates);
1928+ }
1929+ }
1930+ // Sort both by decreasing feerate to obtain diagrams, and return them.
1931+ std::sort(main_feerates.begin(), main_feerates.end(), [](auto& a, auto& b) { return a >> b; });
instagibbs
commented at 2:07 pm on March 25, 2025:
why not sort with size tiebreaks here? It isn't clear based on context.
The within-same-feerate ordering is actually irrelevant as far as the diagram is concerned; it's just a sequence of equal-slope sections in the graph, which could arguably even be merged into a single one.
Still, for consistency/repeatability, I've made it use the size-tiebreaking comparison.
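For context, feerate comparisons like the a >> b above can be done without division by cross-multiplying fee and size. A minimal stand-alone sketch of such a size-tiebreaking sort (Frac is a toy stand-in, not the real FeeFrac type, and the larger-size-first tiebreak direction is an assumption made here for illustration):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal stand-in for a fee/size fraction, compared without division.
struct Frac {
    int64_t fee;
    int64_t size;
};

// Strictly-higher-feerate comparison by cross-multiplication:
// fee1/size1 > fee2/size2  <=>  fee1*size2 > fee2*size1 (sizes positive).
bool HigherFeerate(const Frac& a, const Frac& b)
{
    return a.fee * b.size > b.fee * a.size;
}

// Sort chunk feerates into a diagram: decreasing feerate, and (assumed
// convention) larger size first among equal feerates, for repeatable output.
void SortDiagram(std::vector<Frac>& v)
{
    std::sort(v.begin(), v.end(), [](const Frac& a, const Frac& b) {
        if (HigherFeerate(a, b)) return true;
        if (HigherFeerate(b, a)) return false;
        return a.size > b.size; // tiebreak: bigger chunk first
    });
}
```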
in src/test/fuzz/txgraph.cpp:841 in 7248d26bd0 (outdated)
730+ assert(FeeRateCompare(main_cmp_diagram[i], main_cmp_diagram[i - 1]) <= 0);
731+ }
732+ for (size_t i = 1; i < stage_cmp_diagram.size(); ++i) {
733+ assert(FeeRateCompare(stage_cmp_diagram[i], stage_cmp_diagram[i - 1]) <= 0);
734+ }
735+ // Apply total ordering on the feerate diagrams to make them comparable (the exact
instagibbs
commented at 2:11 pm on March 25, 2025:
"comparable" threw me off here since I was thinking about comparable diagrams.
in src/test/fuzz/txgraph.cpp:905 in 7248d26bd0 (outdated)
754+ std::inserter(missing_stage_cmp, missing_stage_cmp.end()),
755+ std::greater{});
756+ assert(stage_cmp_diagram.size() + missing_stage_cmp.size() == stage_diagram.size());
757+ // The missing chunks must be equal across main & staging (otherwise they couldn't have
758+ // been omitted).
759+ assert(missing_main_cmp == missing_stage_cmp);
instagibbs
commented at 2:27 pm on March 25, 2025:
I'm wondering what the best way is of catching entries that should be missing on both sides being erroneously added to both diagrams.
You could imagine a degenerate case where a bug turns this into a whole-mempool check (and implies we're doing a bunch of work maintaining these clusters for no reason).
A rough suggestion to think about; perhaps you can tighten the bounds?
diff --git a/src/test/fuzz/txgraph.cpp b/src/test/fuzz/txgraph.cpp
index 7249827103..f30b650313 100644
--- a/src/test/fuzz/txgraph.cpp
+++ b/src/test/fuzz/txgraph.cpp
@@ -338,10 +338,17 @@ FUZZ_TARGET(txgraph)
         // just feerates).
         std::sort(ret.begin(), ret.end(), std::greater{});
         return ret;
     };

+    // When entering staging reset to 0
+    // This is used to upper-bound the possible diagram size in number of entries
+    // to catch any regression where erroneous entries are on both sides of the
+    // computed diagram when they shouldn't have been added at all.
+    uint32_t num_possible_conflicted_entries_main{0};
+    uint32_t num_possible_pulled_in_entries_staged{0};
+
    LIMITED_WHILE(provider.remaining_bytes() > 0, 200) {
        // Read a one-byte command.
        int command = provider.ConsumeIntegral<uint8_t>();

        /** Use the bottom 2 bits of command to select an entry in the block_builders vector (if
@@ -377,10 +384,11 @@ FUZZ_TARGET(txgraph)
                // Otherwise, use smaller range which consume fewer fuzz input bytes, as just
                // these are likely sufficient to trigger all interesting code paths already.
                fee = provider.ConsumeIntegral<uint8_t>();
                size = provider.ConsumeIntegral<uint8_t>() + 1;
            }
+           num_possible_pulled_in_entries_staged++;
            FeePerWeight feerate{fee, size};
            // Create a real TxGraph::Ref.
            auto ref = real->AddTransaction(feerate);
            // Create a shared_ptr place in the simulation to put the Ref in.
            auto ref_loc = top_sim.AddTransaction(feerate);
@@ -398,10 +406,22 @@ FUZZ_TARGET(txgraph)
                // and if so, skip.
                if (top_sim.graph.Ancestors(pos_par)[pos_chl]) break;
            }
            top_sim.AddDependency(par, chl);
            real->AddDependency(*par, *chl);
+           if (pos_par != SimTxGraph::MISSING && pos_chl != SimTxGraph::MISSING) {
+               if (!top_sim.IsOversized() && !main_sim.IsOversized()) {
+                   // If both are real, add resulting cluster count to num_possible_pulled_in_entries_*
+                   const auto max_count{std::max(real->GetCluster(*par, false).size(), real->GetCluster(*par, true).size())};
+                   num_possible_conflicted_entries_main += max_count;
+                   num_possible_pulled_in_entries_staged += max_count;
+               } else {
+                   // If ever oversized, add max count to "turn it off"
+                   num_possible_conflicted_entries_main += SimTxGraph::MAX_TRANSACTIONS;
+                   num_possible_pulled_in_entries_staged += SimTxGraph::MAX_TRANSACTIONS;
+               }
+           }
            break;
        } else if ((block_builders.empty() || sims.size() > 1) && top_sim.removed.size() < 100 && command-- == 0) {
            // RemoveTransaction. Either all its ancestors or all its descendants are also
            // removed (if any), to make sure TxGraph's reordering of removals and dependencies
            // has no effect.
@@ -410,10 +430,20 @@ FUZZ_TARGET(txgraph)
            top_sim.IncludeAncDesc(to_remove, alt);
            // The order in which these ancestors/descendants are removed should not matter;
            // randomly shuffle them.
            std::shuffle(to_remove.begin(), to_remove.end(), rng);
            for (TxGraph::Ref* ptr : to_remove) {
+               if (!top_sim.IsOversized() && !main_sim.IsOversized()) {
+                   const auto max_count{std::max(real->GetCluster(*ptr, false).size(), real->GetCluster(*ptr, true).size())};
+                   // Overestimate number of pulled in items
+                   num_possible_conflicted_entries_main += max_count;
+                   num_possible_pulled_in_entries_staged += max_count;
+               } else {
+                   // If ever oversized, add max count to "turn it off"
+                   num_possible_conflicted_entries_main += SimTxGraph::MAX_TRANSACTIONS;
+                   num_possible_pulled_in_entries_staged += SimTxGraph::MAX_TRANSACTIONS;
+               }
                real->RemoveTransaction(*ptr);
                top_sim.RemoveTransaction(ptr);
            }
            break;
        } else if (sel_sim.removed.size() > 0 && command-- == 0) {
@@ -446,10 +476,20 @@ FUZZ_TARGET(txgraph)
            }
            // The order in which these ancestors/descendants are destroyed should not matter;
            // randomly shuffle them.
            std::shuffle(to_destroy.begin(), to_destroy.end(), rng);
            for (TxGraph::Ref* ptr : to_destroy) {
+               if (!top_sim.IsOversized() && !main_sim.IsOversized()) {
+                   const auto max_count{std::max(real->GetCluster(*ptr, false).size(), real->GetCluster(*ptr, true).size())};
+                   // Overestimate number of pulled in items
+                   num_possible_conflicted_entries_main += max_count;
+                   num_possible_pulled_in_entries_staged += max_count;
+               } else {
+                   // If ever oversized, add max count to "turn it off"
+                   num_possible_conflicted_entries_main += SimTxGraph::MAX_TRANSACTIONS;
+                   num_possible_pulled_in_entries_staged += SimTxGraph::MAX_TRANSACTIONS;
+               }
                for (size_t level = 0; level < sims.size(); ++level) {
                    sims[level].DestroyTransaction(ptr, level == sims.size() - 1);
                }
            }
            break;
@@ -580,10 +620,14 @@ FUZZ_TARGET(txgraph)
            break;
        } else if (sims.size() < 2 && command-- == 0) {
            // StartStaging.
            sims.emplace_back(sims.back());
            real->StartStaging();
+
+           // Start counting upper-bound on open of staging
+           num_possible_conflicted_entries_main = 0;
+           num_possible_pulled_in_entries_staged = 0;
            break;
        } else if (block_builders.empty() && sims.size() > 1 && command-- == 0) {
            // CommitStaging.
            real->CommitStaging();
            sims.erase(sims.begin());
@@ -668,10 +712,17 @@ FUZZ_TARGET(txgraph)
        auto [main_diagram, staged_diagram] = real->GetMainStagingDiagrams();
        auto sum_main = std::accumulate(main_diagram.begin(), main_diagram.end(), FeeFrac{});
        auto sum_staged = std::accumulate(staged_diagram.begin(), staged_diagram.end(), FeeFrac{});
        auto diagram_gain = sum_staged - sum_main;
        auto real_gain = sims[1].SumAll() - sims[0].SumAll();
+
+       // Sanity check sizes of diagrams
+       assert(main_diagram.size() <= num_possible_conflicted_entries_main);
+       assert(main_diagram.size() <= SimTxGraph::MAX_TRANSACTIONS);
+       assert(staged_diagram.size() <= num_possible_pulled_in_entries_staged);
+       assert(staged_diagram.size() <= SimTxGraph::MAX_TRANSACTIONS);
+
        // Just check that the total fee gained/lost and size gained/lost according to the
        // diagram matches the difference in these values in the simulated graph. A more
        // complete check of the GetMainStagingDiagrams result is performed at the end.
        assert(diagram_gain == real_gain);
        // Check that the feerates in each diagram are monotonically decreasing.
I have added a more accurate approach, I think: the staging trim graph now keeps track of which transactions have been potentially modified since being copied from main. The final comparison checks that the sections of the diagram missing from both GetMainStagingDiagrams results include at least all those that have definitely not been modified.
instagibbs
commented at 2:49 pm on March 28, 2025:
After a quick read this looks great! Simple and makes sense. Since the missing set is equal on each side, the accounting only needs to be done on one side.
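The omitted-chunk accounting discussed here rests on the std::set_difference-over-descending-ranges pattern used in the fuzz test. A self-contained sketch of that pattern, with plain ints standing in for chunk feerates:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <vector>

// Given a full diagram and the diagram restricted to modified clusters (both
// sorted in decreasing order), recover the chunks that were omitted, mirroring
// the std::set_difference calls in the fuzz test. Plain ints stand in for
// chunk feerates here.
std::vector<int> MissingChunks(const std::vector<int>& full, const std::vector<int>& subset)
{
    std::vector<int> missing;
    std::set_difference(full.begin(), full.end(),
                        subset.begin(), subset.end(),
                        std::back_inserter(missing),
                        std::greater{}); // both inputs sorted descending
    return missing;
}
```

As in the fuzz test, the sizes must be consistent: subset.size() + missing.size() == full.size().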
@@ -196,6 +196,10 @@ class TxGraph
196 /** Construct a block builder, drawing chunks in order, from the main graph, which cannot be
197 * oversized. While the returned object exists, no mutators on the main graph are allowed. */
198 virtual std::unique_ptr<BlockBuilder> GetBlockBuilder() noexcept = 0;
199+ /** Get the last chunk in the main graph, i.e., the last chunk that would be returned by a
instagibbs
commented at 2:52 pm on March 25, 2025:
should say that the FeePerWeight is the chunk feerate, and that it cannot be called on an oversized graph
@@ -292,6 +293,8 @@ class TxGraphImpl final : public TxGraph
293 ChunkIndex m_chunkindex;
294 /** Number of index-observing objects in existence (BlockBuilderImpl). */
295 size_t m_chunkindex_observers{0};
296+ /** Cache of discarded ChunkIndex node handles. */
instagibbs
commented at 3:04 pm on March 25, 2025:
/** Cache of discarded ChunkIndex node handles to re-use, avoiding additional allocations. */
glozow referenced this in commit
f1d129d963
on Mar 26, 2025
sipa force-pushed
on Mar 27, 2025
sipa
commented at 3:49 am on March 27, 2025:
member
Rebased after merge of #31363, and on top of #32151.
in src/txgraph.h:171 in 899ce12b59 (outdated)
@@ -162,6 +162,10 @@ class TxGraph
162 * main clusters are counted. Refs that do not exist in the queried graph are ignored. The
163 * queried graph must not be oversized. */
164 virtual GraphIndex CountDistinctClusters(std::span<const Ref* const>, bool main_only = false) noexcept = 0;
165+ /** Get feerate diagrams for both main and staging (which must both exist and not be
166+ * oversized), ignoring unmodified components in both. Use FeeFrac rather than FeePerWeight
167+ * so CompareChunks is usable without type-conversion. */
168+ virtual std::pair<std::vector<FeeFrac>, std::vector<FeeFrac>> GetMainStagingDiagrams() noexcept = 0;
I considered something like this, but it complicates matters for the fuzz test, which reconstructs the diagrams directly using DepGraph code, which produces FeeFrac ones.
Yeah, guess that pushes it too far into the "not worth it"
in src/txgraph.cpp:2090 in 899ce12b59 (outdated)
1931+ cluster->AppendChunkFeerates(staging_feerates);
1932+ }
1933+ }
1934+ // Sort both by decreasing feerate to obtain diagrams, and return them.
1935+ std::sort(main_feerates.begin(), main_feerates.end(), [](auto& a, auto& b) { return a >> b; });
1936+ std::sort(staging_feerates.begin(), staging_feerates.end(), [](auto& a, auto& b) { return a >> b; });
I wonder if getting chunks in feerate order across multiple clusters might be something common/important enough to be worth doing more directly, rather than just getting them all and then sorting them after the fact? I suppose I'd just be suggesting using a heap of affected clusters, which isn't that different to just sorting by chunk, so I guess it's fine.
Hmm. I didn't understand how BlockBuilder was implemented when I wrote that; BlockBuilder maintains an ordered tree of every main chunk, so is O(num_chunks) in size/performance, which is fine. What I had in mind conceptually about cluster mempool is that you'd have a structure that was O(num_clusters) that you could query for the best/worst chunk. That would be O(n log(n)) setup, then O(log(n)) for the equivalent of a "Next()" call, for n=num_clusters. But that's not superior to how BlockBuilder works (and is probably inferior), so it doesn't sound like there's anything worth doing here.
Yeah, I think I considered an on-the-fly heap at some point, and it's probable that it actually has lower overall runtime costs than this (due to better memory locality and less allocation overhead than a tree-structured set), but being able to do the bulk of the work ahead of time is nice.
Thinking some more about this: I guess the advantage of the on-the-fly heap would be that you may never have to calculate the ordering for some chunks in the middle of the mempool (if they get rbf'ed, or conflicted by a block tx, or if they expire, or if they get cpfp'ed and thus brought to the front of the mempool, rather than just waiting until the best block feerate drops to their level, or you stop and restart your bitcoind before they get mined). Probably few enough txs hit those special cases for it to be much of a saving overall, though.
Another thought: consider the "virtual Cluster class" idea, but instead of special-casing each standalone tx into a custom single-tx Cluster, you could have a single cluster for all the standalone txs. That would then just order those txs by feerate, but it would mean that instead of having to maintain a separate chunk/cluster data structure for each of those txs, you only maintain a single cluster data structure that covers all of them. That approach messes up the model for diagram comparisons that this PR introduces (because suddenly the diagram comparisons pull in unrelated singleton txs whenever any one of them is touched), but maybe that could be fixed reasonably.
I guess the advantage of the on-the-fly heap would be that you may never have to calculate the ordering for some chunks in the middle of the mempool
This is possible even with the existing index-based structure, by allowing chunk feerates to be indeterminate, and indexing them by a known lower and upper bound. That would necessitate two precomputed indexes (one for block building, sorted by the upper bound; one for eviction, sorted by the lower bound), and relinearizing whatever cluster was found on the fly during block building or eviction. If we ever do the absorption set idea, we'd also need two indexes, so that's not necessarily a very high cost.
But I'm not sure any of this is worth it, because it doesn't really improve the worst case, where you can have successive individual transactions that each interact with many of the same clusters (e.g. conflict with a transaction in many of them), requiring relinearizing all of them anyway.
Another thought: consider the "virtual Cluster class" idea, but instead of special-casing each standalone tx into a custom single-tx Cluster, you could have a single cluster for all the standalone txs.
A downside here is that the "linearization" for this fat singletons cluster would be O(n log n) to update on every operation involving a singleton, which may be substantial if there are 1000s of singletons?
A variation on this idea though: add a new QualityLevel::SINGLETON, and allow many singleton clusters at the same time, each individually limited to 64 (or whatever) singleton transactions. This might not even need the virtual class idea.
A downside here is that the "linearization" for this fat singletons cluster would be O(n log n) to update on every operation involving a singleton, which may be substantial if there are 1000s of singletons?
I don't think that's true? You'd just store the linearisation as part of the special cluster implementation as a std::set in feerate order, for O(lg n) modification and O(n) reporting? I guess it would be better to call that class something other than "cluster", since you'd probably want GetCluster(singleton) to just return the singleton, and CountDistinctClusters({singleton1, singleton2}) to return 2, etc.
A variation on this idea though: add a new QualityLevel::SINGLETON, and allow many singleton clusters at the same time, each individually limited to 64 (or whatever) singleton transactions. This might not even need the virtual class idea.
Hmm, maybe: you could find a new child tx that would spend 63 txs from 63 different singleton clusters, but I don't know if there's anything about that that would be worse than the number of clusters you could impact by conflicting with pre-existing txs; and 63*64 is only 4032, which doesn't sound very scary. Big mempools seem like they might contain up to 175k txs (300MB limit; or 350k txs unlimited); if they were all singletons, dividing by 64 gives just 2700 (or 5500) clusters, which seems nice. (I guess you'd need something to keep track of which existing singleton clusters aren't full and can be added to.)
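The on-the-fly heap alternative discussed above can be sketched as follows. This is a hypothetical simplification: plain ints stand in for chunk feerates, and each cluster's chunks are assumed already linearized in non-increasing feerate order. Setup is O(n log n) in the number of clusters; each "next" step is O(log n).

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <tuple>
#include <vector>

// Merge per-cluster chunk sequences into one globally feerate-sorted stream,
// keeping a heap keyed by each cluster's current best chunk: pop the best
// cluster, emit its chunk, and re-push the cluster with its next chunk.
std::vector<int> MergeChunksByFeerate(const std::vector<std::vector<int>>& clusters)
{
    // (feerate, cluster index, position within cluster)
    using Entry = std::tuple<int, size_t, size_t>;
    std::priority_queue<Entry> heap;
    for (size_t i = 0; i < clusters.size(); ++i) {
        if (!clusters[i].empty()) heap.emplace(clusters[i][0], i, 0);
    }
    std::vector<int> out;
    while (!heap.empty()) {
        auto [feerate, cl, pos] = heap.top();
        heap.pop();
        out.push_back(feerate);
        if (pos + 1 < clusters[cl].size()) {
            heap.emplace(clusters[cl][pos + 1], cl, pos + 1);
        }
    }
    return out;
}
```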
in src/test/fuzz/txgraph.cpp:863 in ef68aea6fd (outdated)
832+ std::reverse(chunk.begin(), chunk.end());
833+ auto [worst_chunk, worst_chunk_feerate] = real->GetWorstMainChunk();
834+ assert(chunk == worst_chunk);
835+ assert(chunk_feerate == worst_chunk_feerate);
836+
837+ // Check that the implied ordering gives rise to a combined diagram that matches the
instagibbs
commented at 1:43 pm on March 27, 2025:
Took me a second to understand how get_diagram_fn related to the real world; is this correct? The ordering isn't implied, exactly.
// Check that the real ordering gives rise to a combined diagram that matches the
The implied ordering is the one that was reconstructed above, through CompareMainOrder. I've changed the variable names to make this hopefully clearer.
in src/test/fuzz/txgraph.cpp:826 in ef68aea6fd (outdated)
834+ assert(chunk == worst_chunk);
835+ assert(chunk_feerate == worst_chunk_feerate);
836+
837+ // Check that the implied ordering gives rise to a combined diagram that matches the
838+ // diagram constructed from the individual cluster linearization chunkings.
839+ auto main_diagram = get_diagram_fn(true);
instagibbs
commented at 1:45 pm on March 27, 2025:
auto main_full_diagram = get_diagram_fn(/*main_only=*/true);
I've renamed all the variables to be of the form {main,staging}_{real,implied,cmp}_diagram.
in src/test/fuzz/txgraph.cpp:856 in ef68aea6fd (outdated)
864+ main_cmp_diagram.begin(), main_cmp_diagram.end(),
865+ std::inserter(missing_main_cmp, missing_main_cmp.end()),
866+ std::greater{});
867+ assert(main_cmp_diagram.size() + missing_main_cmp.size() == main_diagram.size());
868+ // Do the same for chunks in stage_diagram missign from stage_cmp_diagram.
869+ auto stage_diagram = get_diagram_fn(false);
instagibbs
commented at 1:45 pm on March 27, 2025:
auto stage_full_diagram = get_diagram_fn(/*main_only=*/false);
the argument could also just take an explicit level/staging arg since they're fully determined at each call site for now
Would it make sense to have an int level=MAIN_LEVEL default argument so this could be used for (items changed in) staging as well? Otherwise, should the class name or docstring reflect that it's for main only?
Given that there is only an m_main_lin_order (for now, and no prospective reason why we'd need one in staging), that does not work.
in src/test/fuzz/txgraph.cpp:855 in ef68aea6fd (outdated)
863+ std::set_difference(main_diagram.begin(), main_diagram.end(),
864+ main_cmp_diagram.begin(), main_cmp_diagram.end(),
865+ std::inserter(missing_main_cmp, missing_main_cmp.end()),
866+ std::greater{});
867+ assert(main_cmp_diagram.size() + missing_main_cmp.size() == main_diagram.size());
868+ // Do the same for chunks in stage_diagram missign from stage_cmp_diagram.
instagibbs
commented at 1:56 pm on March 27, 2025:
// Do the same for chunks in stage_diagram missing from stage_cmp_diagram.
183+ /** Determine whether there are more transactions to be included. */
184+ explicit operator bool() noexcept { return m_current_chunk.has_value(); }
185+ /** Get the chunk that is currently suggested to be included. */
186+ const std::span<Ref*>& GetCurrentChunk() noexcept { return m_current_chunk->first; }
187+ /** Get the feerate of the currently suggested chunk. */
188+ const FeePerWeight& GetCurrentChunkFeerate() noexcept { return m_current_chunk->second; }
I'm a bit surprised to see m_current_chunk as a persistent-ish variable that includes Ref*; that makes it a bug to move Refs around while you have a BlockBuilder in existence. I would have thought something more like:
It's still a bug to move Refs around while you have a NextChunk, but at least that's kind-of shorter-lived than a BlockBuilder. I think that would mostly just mean moving the Next() calls into NextChunk().
I've done something like this, but a bit different, as I felt it'd be ugly for the semantics of repeated GetNextChunk calls without Skip/Include in between.
The interface is now:
/** Get the chunk that is currently suggested to be included, plus its feerate, if any. */
virtual std::optional<std::pair<std::vector<Ref*>, FeePerWeight>> GetCurrentChunk() noexcept = 0;
/** Mark the current chunk as included, and progress to the next one. */
virtual void Include() noexcept = 0;
/** Mark the current chunk as skipped, and progress to the next one. Further chunks from
 * the same cluster as the current one will not be reported anymore. */
virtual void Skip() noexcept = 0;
Where GetCurrentChunk may be called multiple times, without (observable) effect on the builder.
in src/txgraph.cpp:296, commit bf7b901627 (outdated)
289@@ -289,6 +290,8 @@ class TxGraphImpl final : public TxGraph
290291 /** Index of ChunkData objects. */
292 ChunkIndex m_chunkindex;
293+ /** Number of index-observing objects in existence (BlockBuilderImpl). */
294+ size_t m_chunkindex_observers{0};
Might be good to introduce m_chunkindex_observers in its own commit? Arguably having a vector of Refs (making it invalid to do Ref b = std::move(a);) is a different constraint from being unable to change the linearisation at the main level (which would mess up BlockBuilder); could potentially have distinct "observer" values for each, which would allow fees to be bumped, when you just have a Ref vec, eg.
It's a possibility, but you mean this in addition to BlockBuilderImpl itself also incrementing the observers count, right? Because it holds an iterator to the chunk index in GraphImpl too.
Yes. (Though, technically, if BlockBuilderImpl stores its own RefsVector it could avoid explicitly incrementing the observers count, because RefsVector does that for it.)
instagibbs
commented at 4:31 pm on March 27, 2025:
member
just a few comments after first read through
in src/txgraph.cpp:437, commit 554888cbb6 (outdated)
433@@ -375,6 +434,7 @@ class TxGraphImpl final : public TxGraph
434 {
435 auto& entry = m_entries[idx];
436 Assume(entry.m_ref != nullptr);
437+ Assume(m_chunkindex_observers == 0 || !entry.m_locator[0].IsPresent());
in src/test/fuzz/txgraph.cpp:351, commit f4c8848c7c (outdated)
315+ }
316+ // Verify the number of transactions after deduplicating clusters. This implicitly verifies
317+ // that GetCluster on each element of a cluster reports the cluster transactions in the same
318+ // order.
319+ assert(num_tx == sim.GetTransactionCount());
320+ // Sort by feerate (we don't care about respecting ordering within clusters, as these are
instagibbs
commented at 2:02 pm on March 28, 2025:
We don't care, or will it be respected within clusters due to CFR?
As far as the diagram is concerned (incl. for FeeRateCompare), the order of equal-feerate elements is irrelevant, as they just represent different sections of an overall line segment with a fixed slope. As far as FeeRateCompare is concerned, they could even be merged into a single line segment with the combined fee and size.
instagibbs
commented at 6:54 pm on March 28, 2025:
Oh, I misread this, but I'm still having trouble understanding what is intended. "Sort by feerate" ... "as these are just feerates".
Maybe I'm making things more confusing by mentioning it at all?
My thinking is this: in general, to construct the full-graph chunk ordering, you cannot just sort by feerate, because you may sort two equal-feerate chunks within the same cluster incorrectly w.r.t. each other, breaking topology. However, here we are just constructing a diagram, and don't care about what chunks they correspond to, so as far as the feerates of line segments in the diagram are concerned, we can just ignore this problem.
instagibbs
commented at 2:39 pm on March 31, 2025:
Yeah I think no comment at all is better than as-is, but maybe we can just point exactly to what case this is ignoring as safe?
"Sort by feerate only, since violating topological constraints within same-feerate chunks won't affect diagram comparisons."
void Skip()
Which avoids the need for it to store Ref* internally.
Improved GetMainStagingDiagrams testing, to verify that unmodified clusters are indeed not included in the diagrams.
Split off the introduction of the chunk index observers to a separate commit.
Split off the generalization of GetClusterRefs to support subsections of a Cluster to a separate commit.
in src/test/fuzz/txgraph.cpp:912, commit 50f173e0b3 (outdated)
761+ // The missing part must include at least all transactions in staging which have not been
762+ // modified, or been in a cluster together with modified transactions, since they were
763+ // copied from main. Note that due to the reordering of removals w.r.t. dependency
764+ // additions, it is possible that the real implementation found more unaffected things.
765+ FeeFrac missing_real;
766+ for (const auto& feerate : missing_main_cmp) missing_real += feerate;
instagibbs
commented at 2:51 pm on March 28, 2025:
micro-nit: I know they're identical, but having the list be built via stage_real_diagram would make it easier to digest what's being computed
instagibbs
commented at 2:39 pm on March 31, 2025:
not worth the nit, mark as resolved please
in src/txgraph.cpp:267, commit 1d500a50b6 (outdated)
251+ struct ChunkData
252+ {
253+ /** The Entry which is the last transaction of the chunk. */
254+ mutable GraphIndex m_graph_index;
255+ /** How many transactions the chunk contains. */
256+ LinearizationIndex m_chunk_count;
instagibbs
commented at 3:26 pm on March 28, 2025:
1d500a50b698a638c840ca79504c2a9a421e8756: stretch goal, have SanityCheck check m_chunk_count directly?
345@@ -303,6 +346,8 @@ class TxGraphImpl final : public TxGraph
346 {
347 /** Pointer to the corresponding Ref object if any, or nullptr if unlinked. */
348 Ref* m_ref{nullptr};
349+ /** Iterator to the corresponding ChunkData, if any. */
instagibbs
commented at 3:39 pm on March 28, 2025:
nit: Could we note that this is initialized first in AddTransaction?
286+
287+ /** Definition for the mining index type. */
288+ using ChunkIndex = std::set<ChunkData, ChunkOrder>;
289+
290+ /** Index of ChunkData objects. */
291+ ChunkIndex m_chunkindex;
instagibbs
commented at 3:43 pm on March 28, 2025:
maybe overkill but m_main_chunkindex since there's a lot of level==0 checks everywhere gating access/modification
147@@ -148,7 +148,7 @@ class Cluster
148 * union of their descendants to output. */
149 void GetDescendantRefs(const TxGraphImpl& graph, std::span<std::pair<Cluster*, DepGraphIndex>>& args, std::vector<TxGraph::Ref*>& output) noexcept;
150 /** Get a vector of Refs for all elements of this Cluster, in linearization order. */
151- std::vector<TxGraph::Ref*> GetClusterRefs(const TxGraphImpl& graph) noexcept;
152+ void GetClusterRefs(TxGraphImpl& graph, std::span<TxGraph::Ref*> range, LinearizationIndex start_pos) noexcept;
instagibbs
commented at 4:28 pm on March 28, 2025:
b130e185f54c7f762da6ef435882335e4f0b98ff
Could the documentation get beefed up a bit for this? What shape the range should look like, etc. Was only clear later when seeing usages.
186+ * the same cluster as the current one will not be reported anymore. */
187+ virtual void Skip() noexcept = 0;
188+ };
189+
190+ /** Construct a block builder, drawing chunks in order, from the main graph, which cannot be
191+ * oversized. While the returned object exists, no mutators on the main graph are allowed. */
instagibbs
commented at 4:57 pm on March 28, 2025:
Should we make sure that the builder outlives the graph? Currently it's allowed, or at least doesn't fail if destructors are swapped in the fuzz target.
That's strange, because the builder outliving the graph is definitely a problem (the observer counter in the graph, which has been destroyed, will be subtracted from). I don't see an easy way of allowing this (it'd need the graph to maintain a set of observers, which seems overkill).
instagibbs
commented at 5:57 pm on March 29, 2025:
Sorry, I meant making sure that the builder doesn't outlive the graph. I was testing to see what happens when it does outlive it, and it's not crashing currently.
in src/test/fuzz/txgraph.cpp:801, commit 545812c3a3 (outdated)
797@@ -715,6 +798,27 @@ FUZZ_TARGET(txgraph)
798 }
799 }
800801+ // The same order should be obtained through a BlockBuilder, if nothing is skipped.
instagibbs
commented at 5:05 pm on March 28, 2025:
// The same order should be obtained through a BlockBuilder as implied by CompareMainOrder, if nothing is skipped.
540+ /** Which TxGraphImpl this object is doing block building for. It will have its
541+ * m_chunkindex_observers incremented as long as this BlockBuilderImpl exists. */
542+ TxGraphImpl* const m_graph;
543+ /** Clusters which we're not including further transactions from. */
544+ std::set<Cluster*> m_excluded_clusters;
545+ /** Iterator to the current chunk (after the current one) in the chunk index. end() if nothing
instagibbs
commented at 5:07 pm on March 28, 2025:
/** Iterator to the current chunk (after the current one) in the chunk index. m_chunkindex.end() if nothing
181+ /** Get the chunk that is currently suggested to be included, plus its feerate, if any. */
182+ virtual std::optional<std::pair<std::vector<Ref*>, FeePerWeight>> GetCurrentChunk() noexcept = 0;
183+ /** Mark the current chunk as included, and progress to the next one. */
184+ virtual void Include() noexcept = 0;
185+ /** Mark the current chunk as skipped, and progress to the next one. Further chunks from
186+ * the same cluster as the current one will not be reported anymore. */
instagibbs
commented at 5:12 pm on March 28, 2025:
Note that this should only be called if GetCurrentChunk() would return something non-nullopt
looks right, and instantly crashes when I add an Assume in the right spot
in src/txgraph.h:184, commit 545812c3a3 (outdated)
178+ public:
179+ /** Support safe inheritance. */
180+ virtual ~BlockBuilder() = default;
181+ /** Get the chunk that is currently suggested to be included, plus its feerate, if any. */
182+ virtual std::optional<std::pair<std::vector<Ref*>, FeePerWeight>> GetCurrentChunk() noexcept = 0;
183+ /** Mark the current chunk as included, and progress to the next one. */
instagibbs
commented at 5:13 pm on March 28, 2025:
Note that this should only be called if GetCurrentChunk() would return something non-nullopt
I've changed the code so that Include() and Skip() are no-ops once the end is reached.
in src/txgraph.h:196, commit d8483d8290 (outdated)
189@@ -190,6 +190,11 @@ class TxGraph
190 /** Construct a block builder, drawing chunks in order, from the main graph, which cannot be
191 * oversized. While the returned object exists, no mutators on the main graph are allowed. */
192 virtual std::unique_ptr<BlockBuilder> GetBlockBuilder() noexcept = 0;
193+ /** Get the last chunk in the main graph, i.e., the last chunk that would be returned by a
194+ * BlockBuilder created now, together with its feerate. The chunk is returned in
195+ * reverse-topological order, so every element is preceded by all its descendants. The main
196+ * graph must not be oversized. If the graph is empty, {} is returned. */
instagibbs
commented at 5:19 pm on March 28, 2025:
in src/test/fuzz/txgraph.cpp:757, commit d8483d8290 (outdated)
748+ // GetWorstMainChunk.
749+ auto [worst_chunk, worst_chunk_feerate] = real->GetWorstMainChunk();
750+ // Just do some sanity checks here. Consistency with GetBlockBuilder is checked
751+ // below.
752+ if (main_sim.GetTransactionCount() == 0) {
753+ assert(worst_chunk.empty());
instagibbs
commented at 5:19 pm on March 28, 2025:
in src/test/fuzz/txgraph.cpp:857, commit d8483d8290 (outdated)
846+ last_chunk_feerate = chunk->second;
847 builder->Include();
848 }
849 assert(vec_builder == vec1);
850851+ // The last chunk returned by the BlockBuilder must match GetWorstMainChunk, in reverse.
instagibbs
commented at 5:32 pm on March 28, 2025:
Love to see it
in src/txgraph.cpp:152, commit a0e19959e8 (outdated)
147@@ -148,8 +148,9 @@ class Cluster
148 /** Process elements from the front of args that apply to this cluster, and append Refs for the
149 * union of their descendants to output. */
150 void GetDescendantRefs(const TxGraphImpl& graph, std::span<std::pair<Cluster*, DepGraphIndex>>& args, std::vector<TxGraph::Ref*>& output) noexcept;
151- /** Get a vector of Refs for all elements of this Cluster, in linearization order. */
152- void GetClusterRefs(TxGraphImpl& graph, std::span<TxGraph::Ref*> range, LinearizationIndex start_pos) noexcept;
153+ /** Get a vector of Refs for all elements of this Cluster, in linearization order. Returns
154+ * the range ends at the end of the cluster. */
instagibbs
commented at 5:55 pm on March 28, 2025:
* true if the range ends at the end of the cluster. */
I don't like that because it means either duplicating the variable, or naming the argument differently in definition & declaration.
I've changed the comment a bit, though.
instagibbs
commented at 5:58 pm on March 29, 2025:
feel free to resolve
in src/txgraph.cpp:151, commit a0e19959e8 (outdated)
147@@ -148,8 +148,9 @@ class Cluster
148 /** Process elements from the front of args that apply to this cluster, and append Refs for the
149 * union of their descendants to output. */
150 void GetDescendantRefs(const TxGraphImpl& graph, std::span<std::pair<Cluster*, DepGraphIndex>>& args, std::vector<TxGraph::Ref*>& output) noexcept;
151- /** Get a vector of Refs for all elements of this Cluster, in linearization order. */
152- void GetClusterRefs(TxGraphImpl& graph, std::span<TxGraph::Ref*> range, LinearizationIndex start_pos) noexcept;
153+ /** Get a vector of Refs for all elements of this Cluster, in linearization order. Returns
instagibbs
commented at 6:18 pm on March 28, 2025:
re: commit message for a0e19959e85b2ce1a90902ab1e3fe274f9d63a1d
Might just stress that this makes m_excluded_clusters a lot smaller, since it takes the size from O(number of clusters) to O(number of non-singleton clusters).
instagibbs
commented at 6:31 pm on March 28, 2025:
member
Looking good 0630995fee22990402771547be1480b8706c76ce
No real show-stoppers except for the diagram check fuzz test conceptual issue I'm still having.
sipa force-pushed
on Mar 28, 2025
in src/txgraph.h:165, commit dac4d915a4 (outdated)
161@@ -162,6 +162,10 @@ class TxGraph
162 * main clusters are counted. Refs that do not exist in the queried graph are ignored. The
163 * queried graph must not be oversized. */
164 virtual GraphIndex CountDistinctClusters(std::span<const Ref* const>, bool main_only = false) noexcept = 0;
165+ /** Get feerate diagrams for both main and staging (which must both exist and not be
instagibbs
commented at 12:57 pm on March 31, 2025:
Actually we need to define exactly what a feerate diagram is here, given it could mean a couple of things. I.e., it's not the aggregated form, but a series of chunks monotonically decreasing in feerate.
instagibbs
commented at 12:58 pm on March 31, 2025:
/** Get feerate diagrams for both main and staging respectively (which must both exist and not be
in src/test/fuzz/txgraph.cpp:910, commit 3fffb59368 (outdated)
763+ assert(missing_main_cmp == missing_stage_cmp);
764+
765+ // The missing part must include at least all transactions in staging which have not been
766+ // modified, or been in a cluster together with modified transactions, since they were
767+ // copied from main. Note that due to the reordering of removals w.r.t. dependency
768+ // additions, it is possible that the real implementation found more unaffected things.
ismaelsadeeq
commented at 11:01 am on April 22, 2025:
In "txgraph: Add GetMainStagingDiagrams function (feature)" 3fffb59368b3a84973c6d775c349008273d72b91
I've read this comment several times, but I still don't fully understand it.
"or been in a cluster together with modified transactions",
/** Which transactions have been modified in the graph since creation, either directly or by
 * being in a cluster which includes modifications. Only relevant for the staging graph. */
SetType modified;
Doesn't this imply that the transaction is also considered modified?
Also, could you please clarify what you mean by "more unaffected things"?
I agree this is pretty intricate. I'll try to explain in the comments, to see if you have any suggestions for making it clearer.
The SetType modified; variable contains a conservative overestimate of which clusters in the real graph may differ between main and staging. Whenever a transaction is added, or removed, or undergoes a dependency addition in the simulated staging graph, it is definitely considered modified. But for example, say transactions A->B exist in main & staging, so they're not in modified. Now a transaction C is added to staging, and a dependency B->C is added. Main is A->B, staging is {A,C}->B. We mark A as modified too, because despite not directly undergoing a change, the cluster it is in is still changed.
Now as for why this may be an overestimate. Say D->E exist in main, as well as a separate F. Staging is created, a dependency E->F is added, and E is removed. The result, in the simulation, is that D & F remain, both modified. However, in the real TxGraphImpl, the dependency addition and removal on E is done lazily, and at application time, the removal happens first, and the dependency addition is a no-op. In the real graph, F is never copied to staging, and thus will not be included in the diagrams, since it is actually the same cluster in both main & staging.
ismaelsadeeq
commented at 9:29 pm on April 22, 2025:
This explanation clarified the comment, thanks.
in src/txgraph.cpp:1708, commit 542df1db60 (outdated)
1631- // Translate all transactions in the Cluster (in linearization order) to Refs.
1632- for (auto idx : m_linearization) {
1633- const auto& entry = graph.m_entries[m_mapping[idx]];
1634+ // Translate the transactions in the Cluster (in linearization order, starting at start_pos in
1635+ // the linearization) to Refs, and fill them in range.
1636+ for (auto& ref : range) {
ismaelsadeeq
commented at 12:06 pm on April 22, 2025:
In "txgraph: Generalize GetClusterRefs to support subsections (preparation)" 542df1db60af9c2bb27f2b357a8b00817a4bfea0
Maybe check to ensure that start_pos is not more than m_linearization.size(), to prevent an out-of-range access?
561+ /** Which TxGraphImpl this object is doing block building for. It will have its
562+ * m_main_chunkindex_observers incremented as long as this BlockBuilderImpl exists. */
563+ TxGraphImpl* const m_graph;
564+ /** Clusters which we're not including further transactions from. */
565+ std::set<Cluster*> m_excluded_clusters;
566+ /** Iterator to the current chunk (after the current one) in the chunk index. end() if nothing
ismaelsadeeq
commented at 4:55 pm on April 22, 2025:
In "txgraph: Introduce BlockBuilder interface (feature)" b76a9b719a9b7ee98b3835aab60369f37f1628a2
in src/test/fuzz/txgraph.cpp:658, commit e1cb50a957 (outdated)
653+ auto diagram_gain = sum_staged - sum_main;
654+ auto real_gain = sims[1].SumAll() - sims[0].SumAll();
655+ // Just check that the total fee gained/lost and size gained/lost according to the
656+ // diagram matches the difference in these values in the simulated graph. A more
657+ // complete check of the GetMainStagingDiagrams result is performed at the end.
658+ assert(diagram_gain == real_gain);
ismaelsadeeq
commented at 7:31 am on April 25, 2025:
In "txgraph: Add GetMainStagingDiagrams function (feature)" e1cb50a9574b70ae6509fc3578af3cbf886f2b63
nit
 auto sim_gain = sims[1].SumAll() - sims[0].SumAll();
 // Just check that the total fee gained/lost and size gained/lost according to the
 // diagram matches the difference in these values in the simulated graph. A more
 // complete check of the GetMainStagingDiagrams result is performed at the end.
 assert(diagram_gain == sim_gain);
Done, and added some more real_ and sim_ prefixes to variables here.
in src/test/fuzz/txgraph.cpp:775, commit e1cb50a957 (outdated)
660+ for (size_t i = 1; i < main_diagram.size(); ++i) {
661+ assert(FeeRateCompare(main_diagram[i], main_diagram[i - 1]) <= 0);
662+ }
663+ for (size_t i = 1; i < staged_diagram.size(); ++i) {
664+ assert(FeeRateCompare(staged_diagram[i], staged_diagram[i - 1]) <= 0);
665+ }
ismaelsadeeq
commented at 7:38 am on April 25, 2025:
In "txgraph: Add GetMainStagingDiagrams function (feature)" e1cb50a9574b70ae6509fc3578af3cbf886f2b63
The sorting in GetMainStagingDiagrams uses the > operator, which additionally compares the size when there is a tie, whereas here we use FeeRateCompare, which only compares the feerates.
Shouldn't this and others below also use the > operator?
I don't think this is needed. TxGraph doesn't promise a specific ordering of equal-feerate diagram segments. The implementation is allowed to be more specific than the interface promises.
in src/test/fuzz/txgraph.cpp:866, commit e1cb50a957 (outdated)
716@@ -639,6 +717,62 @@ FUZZ_TARGET(txgraph)
717 assert(FeeRateCompare(after_feerate, pos_feerate) <= 0);
718 }
719 }
720+
721+ // Check that the implied ordering gives rise to a combined diagram that matches the
722+ // diagram constructed from the individual cluster linearization chunkings.
723+ auto main_real_diagram = get_diagram_fn(/*main_only=*/true);
724+ auto main_implied_diagram = ChunkLinearization(sims[0].graph, vec1);
ismaelsadeeq
commented at 7:40 am on April 25, 2025:
In "txgraph: Add GetMainStagingDiagrams function (feature)" e1cb50a9574b70ae6509fc3578af3cbf886f2b63
Shouldn't main_implied_diagram be main_sim_diagram?
No, because it's computing the diagram for the linearization vec1, which is constructed using the real graph's transaction comparison function.
It's just using sims[0].graph to pull the feerate information from.
in src/test/fuzz/txgraph.cpp:869, commit e1cb50a957 (outdated)
722+ // diagram constructed from the individual cluster linearization chunkings.
723+ auto main_real_diagram = get_diagram_fn(/*main_only=*/true);
724+ auto main_implied_diagram = ChunkLinearization(sims[0].graph, vec1);
725+ assert(CompareChunks(main_real_diagram, main_implied_diagram) == 0);
726+
727+ if (sims.size() >= 2 && !sims[1].IsOversized()) {
ismaelsadeeq
commented at 7:43 am on April 25, 2025:
In "txgraph: Add GetMainStagingDiagrams function (feature)" e1cb50a9574b70ae6509fc3578af3cbf886f2b63
They both have to not be oversized, no?
 if (sims.size() >= 2 && !sims[1].IsOversized() && !sims[0].IsOversized()) {
instagibbs
commented at 1:24 pm on April 28, 2025:
this was already checked for earlier in the first if block
in src/test/fuzz/txgraph.cpp:885, commit e1cb50a957 (outdated)
738+ // Treat the diagrams as sets of chunk feerates, and sort them in the same way so that
739+ // std::set_difference can be used on them below. The exact ordering does not matter
740+ // here, but it has to be consistent with the one used in main_diagram and
741+ // stage_diagram).
742+ std::sort(main_cmp_diagram.begin(), main_cmp_diagram.end(), std::greater{});
743+ std::sort(stage_cmp_diagram.begin(), stage_cmp_diagram.end(), std::greater{});
ismaelsadeeq
commented at 7:53 am on April 25, 2025:
The TxGraph interface doesn't promise a specific order, only a diagram. The implementation may be doing something more specific right now, but since the interface does not promise that, the test shouldn't be relying on it.
In what follows, main_cmp_diagram and stage_cmp_diagram will not be treated as diagrams, but as sets of FeeFrac objects, and they'll be compared against the sets of FeeFrac objects in main_real_diagram and staged_real_diagram respectively. In order to do that, they must be sorted exactly identically, which is a stronger requirement than just the diagram that GetMainStagingDiagrams() promises.
in src/test/fuzz/txgraph.cpp:741, commit e1cb50a957 (outdated)
736+ assert(FeeRateCompare(stage_cmp_diagram[i], stage_cmp_diagram[i - 1]) <= 0);
737+ }
738+ // Treat the diagrams as sets of chunk feerates, and sort them in the same way so that
739+ // std::set_difference can be used on them below. The exact ordering does not matter
740+ // here, but it has to be consistent with the one used in main_diagram and
741+ // stage_diagram).
ismaelsadeeq
commented at 8:00 am on April 25, 2025:
177+ /** Make constructor non-public (use TxGraph::GetBlockBuilder()). */
178+ BlockBuilder() noexcept = default;
179+ public:
180+ /** Support safe inheritance. */
181+ virtual ~BlockBuilder() = default;
182+ /** Get the chunk that is currently suggested to be included, plus its feerate, if any. */
Perhaps mention that GetCurrentChunk should only be called once (which is unusual for a getter), even though I can't think of a scenario where you would actually want to call this more than once.
However I think the best thing to do is actually not use m_cur_cluster for dual purposes and instead have a dedicated variable that tracks when singletons and last chunks don't need to be added to m_excluded_clusters. This would be a tad more comprehensible and there would be no concerns with calling GetCurrentChunk more than once.
Perhaps mention that GetCurrentChunk should only be called once (which is unusual for a getter), even though I can't think of a scenario where you would actually want to call this more than once.
Why? It's perfectly allowed to call it more than once (though, kind of pointless).
However I think the best thing to do is actually not use m_cur_cluster for dual purposes and instead have a dedicated variable that tracks when singletons and last chunks don't need to be added to m_excluded_clusters. This would be a tad more comprehensible and there would be no concerns with calling GetCurrentChunk more than once.
Agreed, done.
monlovesmango
commented at 11:10 pm on May 12, 2025:
Sorry yes you are correct. I misread the usage of m_cur_cluster (and then proceeded to also incorrectly fuzz test when trying to confirm my understanding).
Remember that an Assume() is never necessary. Its only purpose is helping you, the reviewer, gain confidence that a property is likely true.
As for why it isn't here, I didn't think it'd be useful, as ClearChunkData is a very low-level internal function. I put the Assume() calls around maintenance largely in higher-level functions, but if you feel it would help you to have more, feel free to suggest them.
monlovesmango
commented at 11:11 pm on May 12, 2025:
Mostly just trying to get an understanding of your thought process on when an Assume() is appropriate. That is good context to know about Assume() usage.
I expected it in the higher-level functions AddTransaction, RemoveTransaction, AddDependency, SetTransactionFee, and CommitStaging as these all can change linearization of main graph and did indeed find it in these.
I didn't expect it in ClearChunkData and UnlinkRef, but thought maybe it was there to be confident that we don't remove data that BlockBuilderImpl might need/reference. However if this was the correct rationale I would have also expected ClearChunkData to have an Assume(m_main_chunkindex_observers == 0). So was just wondering what the rationale was, if there was one.
in src/txgraph.cpp:305, commit 7208ec2f1f (outdated)
300+ Assume(a_entry.m_locator[0].IsPresent());
301+ Assume(b_entry.m_locator[0].IsPresent());
302+ auto cmp_sequence = CompareClusters(a_entry.m_locator[0].cluster, b_entry.m_locator[0].cluster);
303+ if (cmp_sequence != 0) return cmp_sequence < 0;
304+ // Finally sort by position within the Cluster.
305+ return a_entry.m_main_lin_index < b_entry.m_main_lin_index;
This is pretty much identical to the logic in CompareMainOrder; should this be encapsulated in a single place as the source of truth (rather than having to ensure they stay equivalent)?
Unrelated to this particular PR, but what is the intended caller/use case for CompareMainOrder? Is the reason CompareMainOrder calls MakeAcceptable while ChunkOrder does not simply because we assume it's already been called when ChunkOrder is invoked?
This is pretty much identical to the logic in CompareMainOrder; should this be encapsulated in a single place as the source of truth (rather than having to ensure they stay equivalent)?
Good idea, done.
Unrelated to this particular PR, but what is the intended caller/use case for CompareMainOrder? Is the reason CompareMainOrder calls MakeAcceptable while ChunkOrder does not simply because we assume it's already been called when ChunkOrder is invoked?
ChunkOrder's comparison function is called internally whenever the index changes; it is not an externally-facing function like CompareMainOrder. MakeAcceptable cannot be called by ChunkOrder, because it's the other way around: MakeAcceptable will indirectly modify clusters/chunks, which get added to the index, and that will call ChunkOrder.
monlovesmango
commented at 9:33 pm on May 10, 2025:
contributor
ACK 7208ec2f1f0b9d72e1b4248dae9adbcc5ab625f7
All my comments are mostly just questions.
Sorry it took a while to do my review, just wanted to make sure I actually felt like I had a good understanding of all the changes (which took a while).
Unrelated to this PR and more of an FYI, in src/test/fuzz/txgraph.cpp pick_fn I think that:
glozow
commented at 7:01 pm on May 12, 2025:
member
ACK 7208ec2f1f0
Light code review + tested by implementing a package linearizer + chunker on top of TxGraph and running #26711 unit tests. It is linearizing and chunking the transactions as I expect, which is great (it still has trouble with cases like RBF, chunks not fitting, and skipping non-descendants, but definitely out of scope for this PR).
txgraph: Add GetMainStagingDiagrams function (feature)
This allows determining whether the changes in a staging diagram unambiguously improve
the graph, through CompareChunks().
2614fea17f
txgraph: abstract out transaction ordering (refactor)
87e74e1242
txgraph: Maintain chunk index (preparation)
This is preparation for exposing mining and eviction functionality in
TxGraph.
This is preparation for a next commit which will introduce a class whose
objects hold references to internals in TxGraphImpl, which disallows
modifications to the graph while such objects exist.
c28a602e00
txgraph: Generalize GetClusterRefs to support subsections (preparation)
This is preparation for a next commit which will need a way to extract Refs
for just individual chunks from a cluster.
This interface lets one iterate efficiently over the chunks of the main
graph in a TxGraph, in the same order as CompareMainOrder. Each chunk
can be marked as "included" or "skipped" (and in the latter case,
dependent chunks will be skipped).
Unrelated to this PR and more of an FYI, in src/test/fuzz/txgraph.cpp pick_fn I think that:
Sure, that would be correct, but very confusing, I think!
The number of possible choices is tx_count[0] + sims[0].removed.size() + 1, and this value is stored in the variable choices, representing the number of choices.
A number is then generated in the range between 0 and choices - 1, inclusive.
Of course, operationally, the + 1 and subsequent - 1 are superfluous, but then the variable would probably need to be renamed to choices_minus_one, or max_valid_choice.
--- a/src/txgraph.cpp
+++ b/src/txgraph.cpp
@@ -282,6 +282,27 @@ private:
         return a->m_sequence <=> b->m_sequence;
     }
 
+    /** Compare two entries (which must both exist within the main graph). */
+    std::strong_ordering CompareMainTransactions(GraphIndex a, GraphIndex b) const noexcept
+    {
+        Assume(a < m_entries.size() && b < m_entries.size());
+        const auto& entry_a = m_entries[a];
+        const auto& entry_b = m_entries[b];
+        // Compare chunk feerates, and return result if it differs.
+        auto feerate_cmp = FeeRateCompare(entry_b.m_main_chunk_feerate, entry_a.m_main_chunk_feerate);
+        if (feerate_cmp < 0) return std::strong_ordering::less;
+        if (feerate_cmp > 0) return std::strong_ordering::greater;
+        // Compare Cluster m_sequence as tie-break for equal chunk feerates.
+        const auto& locator_a = entry_a.m_locator[0];
+        const auto& locator_b = entry_b.m_locator[0];
+        Assume(locator_a.IsPresent() && locator_b.IsPresent());
+        if (locator_a.cluster != locator_b.cluster) {
+            return CompareClusters(locator_a.cluster, locator_b.cluster);
+        }
+        // As final tie-break, compare position within cluster linearization.
+        return entry_a.m_main_lin_index <=> entry_b.m_main_lin_index;
+    }
+
     /** Comparator for ChunkData objects in mining order. */
     class ChunkOrder
     {
@@ -291,18 +312,7 @@ private:
 
         bool operator()(const ChunkData& a, const ChunkData& b) const noexcept
         {
-            const auto& a_entry = m_graph->m_entries[a.m_graph_index];
-            const auto& b_entry = m_graph->m_entries[b.m_graph_index];
-            // First sort from high feerate to low feerate.
-            auto cmp_feerate = FeeRateCompare(a_entry.m_main_chunk_feerate, b_entry.m_main_chunk_feerate);
-            if (cmp_feerate != 0) return cmp_feerate > 0;
-            // Then sort by increasing Cluster::m_sequence.
-            Assume(a_entry.m_locator[0].IsPresent());
-            Assume(b_entry.m_locator[0].IsPresent());
-            auto cmp_sequence = CompareClusters(a_entry.m_locator[0].cluster, b_entry.m_locator[0].cluster);
-            if (cmp_sequence != 0) return cmp_sequence < 0;
-            // Finally sort by position within the Cluster.
-            return a_entry.m_main_lin_index < b_entry.m_main_lin_index;
+            return m_graph->CompareMainTransactions(a.m_graph_index, b.m_graph_index) < 0;
         }
     };
 
@@ -575,9 +585,10 @@ class BlockBuilderImpl final : public TxGraph::BlockBuilder
     /** Iterator to the current chunk in the chunk index. end() if nothing further remains. */
     TxGraphImpl::ChunkIndex::const_iterator m_cur_iter;
     /** Which cluster the current chunk belongs to, so we can exclude further transactions from it
-     * when that chunk is skipped, or nullptr if we know we're at the end of the current
-     * cluster. */
+     * when that chunk is skipped. */
     Cluster* m_cur_cluster;
+    /** Whether we know that m_cur_iter points to the last chunk of m_cur_cluster. */
+    bool m_known_end_of_cluster;
 
     // Move m_cur_iter / m_cur_cluster to the next acceptable chunk.
     void Next() noexcept;
@@ -2046,16 +2057,8 @@ std::strong_ordering TxGraphImpl::CompareMainOrder(const Ref& a, const Ref& b) n
     Assume(locator_b.IsPresent());
     MakeAcceptable(*locator_a.cluster);
     MakeAcceptable(*locator_b.cluster);
-    // Compare chunk feerates, and return result if it differs.
-    auto feerate_cmp = FeeRateCompare(entry_b.m_main_chunk_feerate, entry_a.m_main_chunk_feerate);
-    if (feerate_cmp < 0) return std::strong_ordering::less;
-    if (feerate_cmp > 0) return std::strong_ordering::greater;
-    // Compare Cluster* as tie-break for equal chunk feerates.
-    if (locator_a.cluster != locator_b.cluster) {
-        return CompareClusters(locator_a.cluster, locator_b.cluster);
-    }
-    // As final tie-break, compare position within cluster linearization.
-    return entry_a.m_main_lin_index <=> entry_b.m_main_lin_index;
+    // Invoke comparison logic.
+    return CompareMainTransactions(GetRefIndex(a), GetRefIndex(b));
 }
 
 TxGraph::GraphIndex TxGraphImpl::CountDistinctClusters(std::span<const Ref* const> refs, bool main_only) noexcept
@@ -2350,6 +2353,7 @@ void BlockBuilderImpl::Next() noexcept
         const auto& chunk_data = *m_cur_iter;
         const auto& chunk_end_entry = m_graph->m_entries[chunk_data.m_graph_index];
         m_cur_cluster = chunk_end_entry.m_locator[0].cluster;
+        m_known_end_of_cluster = false;
         // If we previously skipped a chunk from this cluster we cannot include more from it.
         if (!m_excluded_clusters.contains(m_cur_cluster)) break;
     }
@@ -2363,25 +2367,21 @@ std::optional<std::pair<std::vector<TxGraph::Ref*>, FeePerWeight>> BlockBuilderI
         ret.emplace();
         const auto& chunk_data = *m_cur_iter;
         const auto& chunk_end_entry = m_graph->m_entries[chunk_data.m_graph_index];
-        auto cluster = chunk_end_entry.m_locator[0].cluster;
         if (chunk_data.m_chunk_count == LinearizationIndex(-1)) {
             // Special case in case just a single transaction remains, avoiding the need to
             // dispatch to and dereference Cluster.
             ret->first.resize(1);
             Assume(chunk_end_entry.m_ref != nullptr);
             ret->first[0] = chunk_end_entry.m_ref;
-            m_cur_cluster = nullptr;
+            m_known_end_of_cluster = true;
         } else {
+            Assume(m_cur_cluster);
             ret->first.resize(chunk_data.m_chunk_count);
             auto start_pos = chunk_end_entry.m_main_lin_index + 1 - chunk_data.m_chunk_count;
-            bool is_end = cluster->GetClusterRefs(*m_graph, ret->first, start_pos);
-            if (is_end) {
-                m_cur_cluster = nullptr;
-                // If the chunk size was 1, then the special case above should have been used.
-                Assume(chunk_data.m_chunk_count > 1);
-            } else {
-                Assume(cluster == m_cur_cluster);
-            }
+            m_known_end_of_cluster = m_cur_cluster->GetClusterRefs(*m_graph, ret->first, start_pos);
+            // If the chunk size was 1 and at end of cluster, then the special case above should
+            // have been used.
+            Assume(!m_known_end_of_cluster || chunk_data.m_chunk_count > 1);
        }
         ret->second = chunk_end_entry.m_main_chunk_feerate;
     }
@@ -2425,10 +2425,12 @@ void BlockBuilderImpl::Include() noexcept
 void BlockBuilderImpl::Skip() noexcept
 {
     // When skipping a chunk we need to not include anything more of the cluster, as that could make
-    // the result topologically invalid. Note that m_cur_cluster == nullptr when this was the last
-    // chunk of a cluster, so in that case nothing is added here. This may significantly reduce the
-    // size of m_excluded_clusters, especially when many singleton clusters are ignored.
-    if (m_cur_cluster != nullptr) m_excluded_clusters.insert(m_cur_cluster);
+    // the result topologically invalid. However, don't do this if the chunk is known to be the last
+    // chunk of the cluster. This may significantly reduce the size of m_excluded_clusters,
+    // especially when many singleton clusters are ignored.
+    if (m_cur_cluster != nullptr && !m_known_end_of_cluster) {
+        m_excluded_clusters.insert(m_cur_cluster);
+    }
     Next();
 }
monlovesmango
commented at 11:28 pm on May 12, 2025:
contributor
The number of possible choices is tx_count[0] + sims[0].removed.size() + 1, and this value is stored in the variable choices, representing the number of choices.
Interesting, I guess my interpretation is different and I don't understand where the +1 comes from. I would have thought the number of choices is simply the tx count plus the removed tx count.
Sure, that would be correct, but very confusing, I think!
Its funny, I only noticed this because I found the current assignment very confusing hahah.
Anyway, I think this can be ignored then. Thanks for addressing it.
Interesting, I guess my interpretation is different and I don't understand where the +1 comes from. I would have thought the number of choices is simply the tx count plus the removed tx count.
The +1 is from the third option: returning the empty Ref object, at the bottom of the function.
monlovesmango
commented at 3:38 am on May 13, 2025:
contributor
reACK 8673e8f01917b48a5f5476792f759f44ea49d5a5
instagibbs
commented at 1:18 pm on May 13, 2025:
member
Consolidated main ordering checks, added an explicit variable for tracking end of cluster in the block builder; hand-checked that m_known_end_of_cluster is set any time it is read from. LGTM.
DrahtBot requested review from glozow
on May 13, 2025
glozow
commented at 8:23 pm on May 13, 2025:
member
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
Content is generated from a GitHub metadata backup.
generated: 2025-06-03 09:12 UTC