sipa
commented at 7:06 PM on December 22, 2024:
member
Part of cluster mempool (#30289).
During reorganisations, it is possible that dependencies get added which, when linking the new from-block transactions to the old from-mempool transactions, would result in clusters that violate policy limits (cluster count, cluster weight). Unlike in RBF scenarios, we cannot simply reject the changes when they are due to received blocks. To accommodate this, add a TxGraph::Trim(), which removes some subset of transactions (including their descendants) so that all resulting clusters satisfy the limits.
Conceptually, the way this is done is by defining a rudimentary linearization for the entire would-be too-large cluster, iterating it from beginning to end, and reasoning about the counts and weights of the clusters that would be reached using transactions up to that point. If a transaction is encountered whose addition would violate the limit, it is removed, together with all its descendants.
This rudimentary linearization is like a merge sort of the chunks of the clusters being combined, but respecting topology. More specifically, it is continuously picking the highest-chunk-feerate remaining transaction among those which have no unmet dependencies left. For efficiency, this rudimentary linearization is computed lazily, by putting all viable transactions in a heap, sorted by chunk feerate, and adding new transactions to it as they become viable.
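The lazy merge described above can be sketched in simplified form (all names here are hypothetical, and the whole group is treated as a single would-be cluster rather than tracking per-component limits as the real code does): a max-heap keyed by chunk feerate holds every transaction whose parents have all been included; a transaction that would exceed the count or size limit is skipped, which implicitly excludes all its descendants because their dependency counts never reach zero.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Toy model of a transaction in the would-be cluster.
struct Tx {
    double chunk_feerate;      // heap key; higher is better
    uint64_t size;             // size/weight contribution
    std::vector<int> children; // indices of transactions depending on this one
    int deps_left;             // number of parents not yet included
};

// Returns the indices of kept transactions, in rudimentary-linearization order.
// Everything not returned is "trimmed" (skipped directly, or blocked forever
// because an ancestor was skipped).
std::vector<int> TrimSketch(std::vector<Tx> txs, uint64_t max_count, uint64_t max_size)
{
    auto cmp = [&](int a, int b) { return txs[a].chunk_feerate < txs[b].chunk_feerate; };
    std::priority_queue<int, std::vector<int>, decltype(cmp)> heap(cmp);
    // Seed the heap with all transactions that have no unmet dependencies.
    for (int i = 0; i < (int)txs.size(); ++i) {
        if (txs[i].deps_left == 0) heap.push(i);
    }
    std::vector<int> kept;
    uint64_t total_size = 0;
    while (!heap.empty()) {
        int best = heap.top();
        heap.pop();
        // Adding this transaction would violate a limit: trim it. Its children
        // never reach deps_left == 0, so its descendants are excluded too.
        if (kept.size() + 1 > max_count || total_size + txs[best].size > max_size) continue;
        kept.push_back(best);
        total_size += txs[best].size;
        // Newly viable transactions become candidates lazily.
        for (int child : txs[best].children) {
            if (--txs[child].deps_left == 0) heap.push(child);
        }
    }
    return kept;
}
```

For a chain of three transactions, a count limit of 2 keeps only the first two, and a tight size limit keeps only the first.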
The Trim() function is rather unusual compared to the TxGraph functionality added in previous PRs, in that Trim() makes its own decisions about what the resulting graph contents will be, without a precise specification of how it makes that decision - it is just a best-effort attempt (which is improved in the last commit). All other TxGraph mutators simply inform the graph about changes the calling mempool code has decided on; this one lets txgraph make the decision.
As part of this, the "oversized" property is expanded to also encompass a configurable cluster weight limit (in addition to cluster count limit).
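The expanded notion of "oversized" boils down to a two-sided check (a minimal sketch with hypothetical names; the actual TxGraph tracks this incrementally per group of clusters):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the two configurable per-cluster limits.
struct ClusterLimits {
    uint64_t max_count;  // maximum number of transactions in a cluster
    uint64_t max_weight; // maximum combined weight of a cluster
};

// A cluster violates policy if it exceeds either limit.
bool ClusterIsOversized(uint64_t tx_count, uint64_t total_weight, const ClusterLimits& limits)
{
    return tx_count > limits.max_count || total_weight > limits.max_weight;
}
```

The limit values below are arbitrary example numbers, not the actual policy defaults.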
DrahtBot
commented at 7:06 PM on December 22, 2024:
contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#31829 (p2p: improve TxOrphanage denial of service bounds by glozow)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
LLM Linter (✨ experimental)
Possible typos and grammar issues:
“Pick an transaction in top_components to attach to.” -> “Pick a transaction in top_components to attach to.” [‘an’ before consonant‐starting ‘transaction’ should be ‘a’]
“Move entry top the back.” -> “Move entry to the back.” [typo ‘top’ instead of ‘to’]
glozow added the label Mempool on Jan 2, 2025
sipa force-pushed on Jan 8, 2025
DrahtBot added the label CI failed on Jan 9, 2025
DrahtBot
commented at 12:52 AM on January 9, 2025:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
sipa force-pushed on Jan 9, 2025
DrahtBot removed the label CI failed on Jan 9, 2025
sipa force-pushed on Jan 9, 2025
in src/txgraph.cpp:1710 in 0c8dc2323e (outdated)
+ Ref ret;
+ // Construct a new Entry, and link it with the Ref.
+ auto idx = m_entries.size();
+ m_entries.emplace_back();
+ auto& entry = m_entries.back();
+ entry.m_ref = &ret;
I'm not sure how this is intended to be used, but storing a stack address seems like a problem? RVO may help but that seems brittle. I imagine the caller should be passing in their own Ref instead?
Why is this doing an effective swap? I would expect this to call UnlinkRef on the moved-from value and reset its m_graph and m_index. Otherwise it wouldn't be unlinked until the moved-from variable goes out of scope, no?
The vector now holds the old ref and UnlinkRef will not be called until that element is removed. I realize it's allowed to be a "valid but unspecified state", but I wouldn't expect a ref to be hanging around.
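The semantics the reviewer suggests can be sketched with a toy model (this is not the actual txgraph code; Graph and UnlinkRef are stand-ins, with m_graph/m_index named after the discussion): move-assignment first unlinks whatever the destination currently refers to, then steals the source's links and resets it, so no stale Ref hangs around in the moved-from object.

```cpp
#include <cassert>
#include <utility>

// Stand-in for TxGraphImpl: just counts unlink operations.
struct Graph {
    int unlinked = 0;
    void UnlinkRef(int /*index*/) { ++unlinked; }
};

struct Ref {
    Graph* m_graph = nullptr;
    int m_index = -1;

    Ref() = default;
    Ref(Ref&& other) noexcept
        : m_graph(std::exchange(other.m_graph, nullptr)),
          m_index(std::exchange(other.m_index, -1)) {}
    Ref& operator=(Ref&& other) noexcept
    {
        if (this != &other) {
            // Unlink what this Ref currently refers to immediately, rather than
            // letting it linger until the moved-from value goes out of scope.
            if (m_graph) m_graph->UnlinkRef(m_index);
            m_graph = std::exchange(other.m_graph, nullptr);
            m_index = std::exchange(other.m_index, -1);
        }
        return *this;
    }
    ~Ref() { if (m_graph) m_graph->UnlinkRef(m_index); }
};
```

After `a = std::move(b)`, a's old entry has been unlinked exactly once and b is left empty, not merely "valid but unspecified".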
in src/txgraph.cpp:188 in 0c8dc2323e (outdated)
+ /** A class of objects held internally in TxGraphImpl, with information about a single
+  * transaction. */
+ struct Entry
+ {
+     /** Pointer to the corresponding Ref object, if any. */
+     Ref* m_ref;
+ LinearizationIndex lin_idx{0};
+ // Iterate over the chunks.
+ for (unsigned chunk_idx = 0; chunk_idx < chunking.NumChunksLeft(); ++chunk_idx) {
+     auto chunk = chunking.GetChunk(chunk_idx);
+     // Iterate over the transactions in the linearization, which must match those in chunk.
+     while (true) {
Trying to convince myself this is guaranteed to terminate...
do{} while (!chunk.transactions.None()) rather than the break for readability? Or just while() if we need to guard against an empty linearization (presumably not?)
Every chunk contains at least one element (added an Assume for that)
In the inner loop, one element from that chunk is Reset() (added an Assume that it indeed resets a bit that was previously set).
I've changed it to a do {} while(chunk.transactions.Any()); loop in the first commits, though it reverts back to a while (true) { ... } loop later, when the loop becomes a bit more complex.
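The termination argument can be illustrated with a toy drain loop (std::bitset standing in for the cluster's SetType; DrainChunk is a hypothetical name): given the precondition that the chunk is non-empty, each iteration resets exactly one previously-set bit, so the `do {} while (chunk.Any())` form runs exactly popcount-many times.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Precondition (mirroring the Assume in the discussion): at least one bit set.
int DrainChunk(std::bitset<64> transactions)
{
    int iterations = 0;
    do {
        // Model of the inner loop's Reset(): clear exactly one set bit.
        for (std::size_t i = 0; i < transactions.size(); ++i) {
            if (transactions.test(i)) { transactions.reset(i); break; }
        }
        ++iterations;
    } while (transactions.any());
    return iterations;
}
```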
in src/txgraph.cpp:838 in 0c8dc2323e (outdated)
+ }
+
+ void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove) noexcept
+ {
+     // Iterate over the prefix of to_remove that applies to this cluster.
+     SetType todo;
Done. I've also added a comment to the Cluster::ApplyRemovals() function definition stating that at least one element from the front of to_remove must belong to this Cluster (which is really why that requirement exists).
sipa force-pushed on Jan 10, 2025
DrahtBot removed the label CI failed on Jan 10, 2025
sdaftuar
commented at 2:10 PM on January 24, 2025:
FYI -- in my rebase of #28676, I'm seeing tx_pool fuzz test failures due to this line. Not clear to me whether we should require the caller to enforce the policy requirement that a single tx be below the cluster size limit, or just let the caller discover a changeset is oversized and then reject?
Right. That rule exists because the alternative would require allowing existing clusters to be oversized, as AddTransaction constructs a singleton cluster immediately. All other forms of oversizedness happen as a result of applying dependencies, which is done lazily.
Done, it is now allowed to have individually oversized transactions.
sipa force-pushed on Jan 24, 2025
sipa
commented at 10:14 PM on January 24, 2025:
member
Some changes:
As a result of dropping Cleanup in the base PR, Trim now reports which transactions it removed, as it becomes the caller's responsibility to destroy the Refs.
DrahtBot added the label CI failed on Jan 24, 2025
DrahtBot
commented at 11:23 PM on January 24, 2025:
contributor
sipa force-pushed on Jan 26, 2025
sipa
commented at 4:42 AM on January 26, 2025:
member
Add support for calling AddTransaction with a feerate whose size already violates the cluster size limit.
DrahtBot removed the label CI failed on Jan 26, 2025
sipa force-pushed on Jan 26, 2025
sipa force-pushed on Jan 30, 2025
DrahtBot
commented at 1:01 AM on January 31, 2025:
contributor
DrahtBot added the label CI failed on Jan 31, 2025
sipa force-pushed on Jan 31, 2025
DrahtBot removed the label CI failed on Jan 31, 2025
sipa force-pushed on Jan 31, 2025
sipa force-pushed on Feb 1, 2025
sipa force-pushed on Feb 4, 2025
DrahtBot added the label CI failed on Feb 4, 2025
DrahtBot removed the label CI failed on Feb 4, 2025
sipa force-pushed on Feb 6, 2025
sipa force-pushed on Feb 11, 2025
sipa force-pushed on Feb 12, 2025
DrahtBot
commented at 11:49 PM on February 12, 2025:
contributor
DrahtBot added the label CI failed on Feb 12, 2025
sipa force-pushed on Feb 13, 2025
DrahtBot removed the label CI failed on Feb 13, 2025
sipa force-pushed on Feb 14, 2025
sipa force-pushed on Feb 20, 2025
sipa force-pushed on Feb 21, 2025
in src/cluster_linearize.h:1381 in e43f6ca3b8 (outdated)
+ // Figure out which elements need to be moved before elem.
+ SetType place_before = done & depgraph.Ancestors(elem);
+ // Find which position to place elem in (updating j), continuously moving the elements
+ // in between forward.
+ while (place_before.Any()) {
+     // j cannot be 0 here; if it was, then there was necessarily nothing earlier which
yancyribbens
commented at 10:20 PM on February 24, 2025:
// j cannot be 0 or less; if it was, then there was necessarily nothing earlier which
yancyribbens
commented at 10:23 PM on February 24, 2025:
if it was, then there was necessarily nothing earlier which elem needs to be place before anymore, and place_before would be empty.
This comment seems jumbled and hard to understand. Is it possible to word this better?
sipa force-pushed on Mar 19, 2025
DrahtBot removed the label CI failed on Mar 20, 2025
DrahtBot added the label Needs rebase on Mar 20, 2025
sipa force-pushed on Mar 20, 2025
DrahtBot added the label CI failed on Mar 20, 2025
DrahtBot
commented at 1:42 PM on March 20, 2025:
contributor
sipa force-pushed on Mar 20, 2025
DrahtBot removed the label Needs rebase on Mar 20, 2025
DrahtBot removed the label CI failed on Mar 20, 2025
sipa force-pushed on Mar 22, 2025
DrahtBot added the label CI failed on Mar 22, 2025
DrahtBot
commented at 1:23 AM on March 22, 2025:
contributor
sipa force-pushed on Mar 22, 2025
DrahtBot removed the label CI failed on Mar 22, 2025
sipa force-pushed on Mar 23, 2025
sipa force-pushed on Mar 24, 2025
glozow referenced this in commit f1d129d963 on Mar 26, 2025
sipa force-pushed on Mar 27, 2025
sipa
commented at 3:49 AM on March 27, 2025:
member
Rebased after merge of #31363, and on top of #32151.
sipa force-pushed on Mar 27, 2025
sipa force-pushed on Mar 27, 2025
sipa force-pushed on Mar 28, 2025
sipa force-pushed on Mar 28, 2025
sipa force-pushed on Mar 28, 2025
sipa force-pushed on Apr 4, 2025
sipa force-pushed on Apr 4, 2025
sipa force-pushed on Apr 7, 2025
sipa force-pushed on Apr 7, 2025
DrahtBot added the label Needs rebase on Apr 17, 2025
sipa force-pushed on Apr 17, 2025
DrahtBot removed the label Needs rebase on Apr 17, 2025
in src/test/fuzz/txgraph.cpp:272 in 462429bc15 (outdated)
@@ -262,12 +268,16 @@ FUZZ_TARGET(txgraph)

  // Decide the maximum number of transactions per cluster we will use in this simulation.
  auto max_count = provider.ConsumeIntegralInRange<DepGraphIndex>(1, MAX_CLUSTER_COUNT_LIMIT);
+ // And the maximum combined size of transactions per cluster.
+ auto max_size = provider.ConsumeIntegralInRange<uint64_t>(1, 0x3fffff * MAX_CLUSTER_COUNT_LIMIT);
I misinterpreted the commit message as saying that the aggregate size should not be hit when adding transactions, rather than that each individual txn shouldn't exceed this limit. Maybe make this crystal clear.
@@ -109,6 +109,8 @@ class Cluster
  }
  /** Get the number of transactions in this Cluster. */
  LinearizationIndex GetTxCount() const noexcept { return m_linearization.size(); }
+ /** Get the total size of the transactions in this Cluster. */
+ uint64_t GetTxSize() const noexcept;
@@ -63,10 +63,10 @@ class TxGraph
  /** Virtual destructor, so inheriting is safe. */
  virtual ~TxGraph() = default;
  /** Construct a new transaction with the specified feerate, and return a Ref to it.
-  * If a staging graph exists, the new transaction is only created there. In all
-  * further calls, only Refs created by AddTransaction() are allowed to be passed to this
-  * TxGraph object (or empty Ref objects). Ref objects may outlive the TxGraph they were
-  * created for. */
+  * If a staging graph exists, the new transaction is only created there. feerate.size cannot
+  * exceed the graph's max cluster size. In all further calls, only Refs created by
@@ -2124,6 +2141,8 @@ void Cluster::SanityCheck(const TxGraphImpl& graph, int level) const
  assert(m_linearization.size() <= graph.m_max_cluster_count);
  // The level must match the level the Cluster occurs in.
  assert(m_level == level);
+ // The sum of their sizes cannot exceed m_max_cluster_size.
+ assert(GetTxSize() <= graph.m_max_cluster_size);
Was a bit confusing at first, but this must be true because clusters are never merged (via dependency application) when to-be-merged-clusters would be oversized.
The commit message could maybe be beefed up a bit with motivation, since at first blush it seems odd to add singletons we wouldn't have relayed in the first place.
I have expanded the commit message of both Trim commits.
in src/txgraph.cpp:40 in 439ccf713b (outdated)
@@ -35,6 +35,9 @@ using ClusterSetIndex = uint32_t;
  /** Quality levels for cached cluster linearizations. */
  enum class QualityLevel
  {
+     /** This is a singleton cluster consisting of a transaction that individually exceeds the
+      * cluster size limit. It cannot be merged with anything. */
+     OVERSIZED,
I have simplified this to just a m_linearization.empty() || in the if conditional below.
in src/txgraph.cpp:861 in 439ccf713b (outdated)
@@ -838,7 +855,10 @@ void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove
  void Cluster::Clear(TxGraphImpl& graph) noexcept
  {
      for (auto i : m_linearization) {
-         graph.ClearLocator(m_level, m_mapping[i]);
+         // We do not care about setting oversized_tx accurately here, because this function is only
+         // applied to main-graph Clusters in CommitStaging, which will overwrite main's
+         // m_txcount_oversized anyway with the staging graph's value.
+         graph.ClearLocator(m_level, m_mapping[i], /*oversized_tx=*/false);
Instead of this big unintuitive comment, I have replaced it with /*oversized_tx=*/m_quality == QualityLevel::OVERSIZED_SINGLETON. With that, I don't think the Assume is necessary?
+ // Replace arg1 and arg2 by their representatives.
+ auto rep1 = find_fn(arg1);
+ auto rep2 = find_fn(arg2);
+ // Bail out if both representatives are the same, because that means arg1 and arg2 are in
+ // the same partition already.
+ if (rep1 == rep2) return rep1;
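The quoted fragment is the union step of a union-find (disjoint-set) structure, used here to partition transactions into connected components. A minimal standalone sketch of the same pattern (find with path halving; the real code's find_fn and representative bookkeeping may differ):

```cpp
#include <cassert>
#include <vector>

struct UnionFind {
    std::vector<int> parent;
    explicit UnionFind(int n) : parent(n) {
        for (int i = 0; i < n; ++i) parent[i] = i; // every node starts as its own representative
    }
    int Find(int x)
    {
        // Path halving: point every other node on the path at its grandparent.
        while (parent[x] != x) x = parent[x] = parent[parent[x]];
        return x;
    }
    int Union(int a, int b)
    {
        // Replace a and b by their representatives.
        int ra = Find(a), rb = Find(b);
        // Bail out if both representatives are the same: already one partition.
        if (ra == rb) return ra;
        parent[rb] = ra;
        return ra;
    }
};
```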
@@ -248,8 +254,11 @@ class TxGraphImpl final : public TxGraph
  /** Total number of transactions in this graph (sum of all transaction counts in all
   * Clusters, and for staging also those inherited from the main ClusterSet). */
  GraphIndex m_txcount{0};
+ /** Total number of individually oversized transactions in the graph. */
+ GraphIndex m_txcount_oversized{0};
  /** Whether this graph is oversized (if known). This roughly matches
-  * m_group_data->m_group_oversized, but may be known even if m_group_data is not. */
+  * m_group_data->m_group_oversized || (m_txcount_oversized > 0), but may be known even if
Do we need m_group_oversized? Seems like a variable we have to keep in sync for no reason:
diff --git a/src/txgraph.cpp b/src/txgraph.cpp
index 01076b763b..853f8530a0 100644
--- a/src/txgraph.cpp
+++ b/src/txgraph.cpp
@@ -231,7 +231,4 @@ private:
/** Which clusters are to be merged. GroupEntry::m_cluster_offset indexes into this. */
std::vector<Cluster*> m_group_clusters;
- /** Whether at least one of the groups cannot be applied because it would result in a
- * Cluster that violates the cluster count limit. */
- bool m_group_oversized;
};
@@ -257,7 +254,5 @@ private:
/** Total number of individually oversized transactions in the graph. */
GraphIndex m_txcount_oversized{0};
- /** Whether this graph is oversized (if known). This roughly matches
- * m_group_data->m_group_oversized || (m_txcount_oversized > 0), but may be known even if
- * m_group_data is not. */
+ /** Whether this graph is oversized (if known). */
std::optional<bool> m_oversized{false};
@@ -1470,7 +1465,7 @@ void TxGraphImpl::GroupClusters(int level) noexcept
clusterset.m_group_data = GroupData{};
clusterset.m_group_data->m_group_clusters.reserve(an_clusters.size());
- clusterset.m_group_data->m_group_oversized = false;
clusterset.m_deps_to_add.clear();
clusterset.m_deps_to_add.reserve(an_deps.size());
+ clusterset.m_oversized = false;
auto an_deps_it = an_deps.begin();
auto an_clusters_it = an_clusters.begin();
@@ -1502,10 +1497,9 @@ void TxGraphImpl::GroupClusters(int level) noexcept
// Detect oversizedness.
if (total_count > m_max_cluster_count || total_size > m_max_cluster_size) {
- clusterset.m_group_data->m_group_oversized = true;
+ clusterset.m_oversized = true;
}
}
Assume(an_deps_it == an_deps.end());
Assume(an_clusters_it == an_clusters.end());
- clusterset.m_oversized = clusterset.m_group_data->m_group_oversized;
Compact();
}
@@ -2371,11 +2365,4 @@ void TxGraphImpl::SanityCheck() const
if (!clusterset.m_removed.empty()) compact_possible = false;
- // If m_group_data exists, and no outstanding removals remain, m_group_oversized must match
- // m_group_oversized || (m_txcount_oversized > 0).
- if (clusterset.m_group_data.has_value() && clusterset.m_to_remove.empty()) {
- assert(clusterset.m_oversized ==
- (clusterset.m_group_data->m_group_oversized || (clusterset.m_txcount_oversized > 0)));
- }
-
// For non-top levels, m_oversized must be known (as it cannot change until the level
// on top is gone).
Nice catch. After the changes in "Permit transactions that exceed cluster size limit", m_group_oversized is never read anymore. I have included this patch as a separate simplification commit.
in src/txgraph.cpp:71 in acb2b6a90a (outdated)
@@ -62,12 +62,29 @@ struct TrimTxData
  TxGraph::GraphIndex m_index;
  /** Number of unmet dependencies this transaction has. -1 if the transaction is included. */
  uint32_t m_deps_left;
+ /** Number of dependencies that apply to this transaction as child. */
Here are a couple suggestions to tighten up the checks on what we're trimming. Attempted to state that:
non-oversized clusters remain untouched
transactions with no ancestors that are undersized should never be trimmed (this seems natural but the interface doesn't guarantee it)
Mostly trying to avoid degenerate behavior like entire mempool being wiped, or any oversized cluster being completely wiped needlessly.
diff --git a/src/test/fuzz/txgraph.cpp b/src/test/fuzz/txgraph.cpp
index 47ad5792bf..bdf759f485 100644
--- a/src/test/fuzz/txgraph.cpp
+++ b/src/test/fuzz/txgraph.cpp
@@ -91,4 +91,28 @@ struct SimTxGraph
}
+ /** Returns true if all positions in the given set are in oversized clusters, false otherwise. */
+ bool AllInOversizedClusters(SetType set)
+ {
+ if (set.Any() && !IsOversized()) return false;
+
+ auto todo = graph.Positions();
+ if (!set.IsSubsetOf(todo)) return false;
+
+ // Walk all clusters, and make sure all of `set` doesn't come from non-oversized clusters
+ while (todo.Any()) {
+ auto component = graph.FindConnectedComponent(todo);
+ bool is_oversized{component.Count() > max_cluster_count};
+ uint64_t component_size{0};
+ for (auto i : component) component_size += graph.FeeRate(i).size;
+ is_oversized |= component_size > max_cluster_size;
+ // Some element existed in a non-oversized cluster
+ if (!is_oversized && set.Overlaps(component)) {
+ return false;
+ }
+ todo -= component;
+ }
+ return true;
+ }
+
void MakeModified(DepGraphIndex index)
{
@@ -798,8 +822,16 @@ FUZZ_TARGET(txgraph)
}
auto removed_set = top_sim.MakeSet(removed);
- // The removed set must contain all its own descendants.
for (auto simpos : removed_set) {
+ // The removed set must contain all its own descendants.
assert(top_sim.graph.Descendants(simpos).IsSubsetOf(removed_set));
+
+ // Nothing removed should have no ancestors and be undersized itself
+ assert(!(top_sim.GetAncDesc(top_sim.GetRef(simpos), /*desc=*/false).Count() == 0 &&
+ (uint64_t) top_sim.graph.FeeRate(simpos).size <= top_sim.max_cluster_size));
}
+
+ // Nothing from not-oversized-clusters should have been removed
+ assert(top_sim.AllInOversizedClusters(removed_set));
+
// Apply all removals to the simulation, and verify the result is no longer
// oversized. Don't query the real graph for oversizedness; it is compared
Ah, I see it now. Both variants of the Trim() algorithm can definitely remove these types of transactions.
f.e. if cluster limit is 2 and you have:
TxA <- TxB <- TxC
TxA <- TxE
TxD <- TxE
You may select TxA+TxB only (after stopping at TxC), evicting TxC+TxD+TxE, if TxD's cfr is low.
If TxD's cfr is higher than TxB or TxC, then you could retain A+B+D
After a bunch of debugging, I was able to figure out a stronger check, if we constrain the scenario from a non-oversized to oversized transition with pending dependencies applied: https://github.com/instagibbs/bitcoin/tree/2025-05-applytrim
@@ -151,6 +168,10 @@ class Cluster
  void Relinearize(TxGraphImpl& graph, uint64_t max_iters) noexcept;
  /** For every chunk in the cluster, append its FeeFrac to ret. */
  void AppendChunkFeerates(std::vector<FeeFrac>& ret) const noexcept;
+ /** Add a TrimTxData entry for every transaction in the Cluster to ret. Implicit dependencies
Done. Also organized the fields of struct TrimTxData a bit more logically, into "Populated by Cluster::AppendTxData" and "Only used internally in TxGraphImpl::Trim" sections.
in src/txgraph.cpp:2669 in ce825731e8 (outdated)
+ // Gather trim data from all involved Clusters.
+ auto cluster_span = std::span{clusterset.m_group_data->m_group_clusters}
+     .subspan(group_data.m_cluster_offset, group_data.m_cluster_count);
+ uint64_t size{0};
+ for (Cluster* cluster : cluster_span) {
+     size += cluster->AppendTrimData(trim_data, deps);
+ }
+ // If this group of Clusters does not violate any limits, continue to the next group.
+ if (trim_data.size() <= m_max_cluster_count && size <= m_max_cluster_size) continue;
+ // Sort the trim data by GraphIndex. In what follows, we will treat this sorted vector as
+ // a map from GraphIndex to TrimTxData, and its ordering will not change anymore.
+ std::sort(trim_data.begin(), trim_data.end(), [](auto& a, auto& b) noexcept { return a.m_index < b.m_index; });
+
+ // Construct deps, and sort it by child.
+ deps.insert(deps.end(),
+     clusterset.m_deps_to_add.begin() + group_data.m_deps_offset,
+     clusterset.m_deps_to_add.begin() + group_data.m_deps_offset + group_data.m_deps_count);
+ std::sort(deps.begin(), deps.end(), [](auto& a, auto& b) noexcept { return a.second < b.second; });
+ // Fill m_deps_left in trim_data. Because of the sort above, all
@@ -101,6 +104,9 @@ class Cluster
  {
      return m_quality == QualityLevel::OPTIMAL;
  }
+ /** Whether this cluster is oversized (just due to the size of its transaction(s), not due to
+  * dependencies that are yet to be added). */
+ bool IsOversized() const noexcept { return m_quality == QualityLevel::OVERSIZED_SINGLETON; }
ismaelsadeeq
commented at 9:18 AM on May 22, 2025:
This method's name aligns with its functionality.
However, with the addition of this commit I think there is currently some ambiguity between being oversized by count, by size, or by both. This makes it a bit confusing to follow.
To be clear, Cluster::IsOversized() returns whether the Cluster is oversized in general. It's just that changes which would bring a cluster over the size/count limit are never applied, so the only way an actually materialized Cluster can be oversized is if it is a singleton whose transaction is oversized on its own.
ismaelsadeeq
commented at 8:42 PM on May 22, 2025:
To be clear, Cluster::IsOversized() returns whether the Cluster is oversized in general.
Huh, I missed that. Thanks for the clarity.
I think you can add this as a comment as well; anything that makes this explicit and clearer would be appreciated.
@@ -818,7 +831,7 @@ void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove
  if (todo.None()) {
      // If no further removals remain, and thus all removals were at the end, we may be able
      // to leave the cluster at a better quality level.
-     if (IsAcceptable(/*after_split=*/true)) {
+     if (m_linearization.empty() || IsAcceptable(/*after_split=*/true)) {
ismaelsadeeq
commented at 11:00 AM on May 22, 2025:
If there is nothing in the cluster, why should it be marked as needs_split_acceptable?
Also, I see no crashes after removing the new or-condition.
Interesting, done! I have renamed the function to MatchesOversizedClusters and added a comment to clarify what it does.
I've also simplified it to just if (is_oversized != set.Overlaps(component)) return false;.
in src/txgraph.h:176 in 412a9e0e69 (outdated)
@@ -169,6 +169,11 @@ class TxGraph
   * that appear identically in both. Use FeeFrac rather than FeePerWeight so CompareChunks is
   * usable without type-conversion. */
  virtual std::pair<std::vector<FeeFrac>, std::vector<FeeFrac>> GetMainStagingDiagrams() noexcept = 0;
+ /** Remove transactions (including their own descendants) according to a fast but best-effort
+  * strategy such that the TxGraph's cluster and size limits are respected. Applies to staging
+  * if it exists, and to main otherwise. Returns the list of all removed transactions in
+  * unspecified order. This has no effect unless the relevant graph is oversized. */
+ virtual std::vector<Ref*> Trim() noexcept = 0;
ismaelsadeeq
commented at 1:30 PM on May 22, 2025:
Indicate that it should not be called when there is an instance of a block builder.
The GetBlockBuilder function already states "While the returned object exists, no mutators on the main graph are allowed.". I'm open to putting a comment to that effect on each of the mutators where it matters (incl. Trim), but I feel like it should be done consistently.
ismaelsadeeq
commented at 2:30 PM on May 22, 2025:
member
Code review up to 412a9e0e69ccf461554660b7c622b768cfb8ccb3
Also did some fuzzing and mutated the new code and tests; crashes occur as expected.
If a transaction is encountered whose addition would violate the limit, it is removed, together with all its descendants.
In real-world scenarios it is unlikely we encounter transactions that would exceed the cluster count limit; however, when it occurs, I assume that the newly connected block will have transactions similar to the reorged block's. I think maintaining an in-memory data structure, separate from the mempool, for the transactions removed due to the limit would reduce bandwidth from re-requesting the txs and also avoid non-contextual checks?
We can get rid of them after the reorg is complete and they aren't included in any block.
In real-world scenarios it is unlikely we encounter transactions that will exceed the cluster count limit; however, when it occurs, I assume that the newly connected block will have similar transactions to the reorged block. I think maintaining an in-memory data structure for these transactions removed due to the limit, separate from the mempool, will reduce bandwidth from re-requesting the tx again and also avoid non-contextual checks?
In theory, almost the entire mempool could be trimmed away (in a pathological case where the re-added block transactions link all mempool transaction together), so any approach for doing this would need to be able to handle picking just an acceptable-sized subset. It may make sense to just throw them into the compact block reconstruction extrapool though, up to a certain limit.
sipa force-pushed on May 22, 2025
in
src/txgraph.h:250
in
0e59c73288 (outdated)
239 | @@ -240,8 +240,9 @@ class TxGraph
240 | };
241 | };
242 |
243 | -/** Construct a new TxGraph with the specified limit on transactions within a cluster. That
244 | - * number cannot exceed MAX_CLUSTER_COUNT_LIMIT. */
245 | -std::unique_ptr<TxGraph> MakeTxGraph(unsigned max_cluster_count) noexcept;
246 | +/** Construct a new TxGraph with the specified limit on the number of transactions within a cluster,
247 | + * and on the sum of transaction sizes within a cluster. max_cluster_count cannot exceed
248 | + * MAX_CLUSTER_COUNT_LIMIT. */
2761 | +    Assume(deps_it == deps_by_parent.end());
2762 | +
2763 | +    // Build a heap of all transactions with 0 unmet dependencies.
2764 | +    std::make_heap(trim_heap.begin(), trim_heap.end(), cmp_fn);
2765 | +
2766 | +    // Iterate over to-be-included transactions. It is possible that the heap empties without
Worked on a test that attempts to cover the claim here. It ended up being more complicated than that (is the claim a bit wrong?) due to the lazy application of dependencies. For this property to hold, I believe you have to touch the TxGraph in some way that applies dependencies while undersized, then transition to an oversized graph, for the circular dependencies not to occur in the implied graph.
Without the dependencies applied, the linearization order can be "nonsensical", which then creates a contradictory dependency, in other words a cycle. This cycle sometimes ends up dropping the entire cluster, à la
This can be tested by commenting out the CountDistinctClusters invocation in the new test.
The good news is this usage pattern is "natural", so in practice each new transaction (package) will have a TxGraph with all dependencies applied except for the new package. It's just weird and I'm unsure how to do better. Perhaps Trim could pre-emptively apply dependencies + FixLinearization for each grouped cluster.
In this graph, T2 is the implied parent of T1 because to-be-applied dependencies have not been applied yet, causing the trim heap to never get an entry:
I've written a test based on this, which randomly adds dependencies but in a "directed" sense that never introduces cycles (rather than just has 2 specific scenarios).
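The "directed" dependency-generation idea can be sketched as a small standalone helper (an illustrative sketch, not the actual test code from the PR; RandomAcyclicDeps is a made-up name): if every added dependency points from a lower transaction index (parent) to a strictly higher index (child), the resulting dependency graph can never contain a cycle.

```cpp
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Generate num_deps random (parent, child) dependencies over num_tx transactions,
// always with parent index < child index, so no cycle can ever be formed.
std::vector<std::pair<int, int>> RandomAcyclicDeps(int num_tx, int num_deps, uint32_t seed)
{
    std::mt19937 rng(seed);
    std::vector<std::pair<int, int>> deps;
    for (int i = 0; i < num_deps; ++i) {
        // Pick a child with index >= 1, then a parent with a strictly lower index.
        int child = 1 + int(rng() % uint32_t(num_tx - 1));
        int parent = int(rng() % uint32_t(child));
        deps.emplace_back(parent, child);
    }
    return deps;
}
```

Because every edge goes "downward" in index order, any topological order of the indices themselves is a valid linearization, which is what lets such a test assert stronger properties than the general fuzz test can.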
instagibbs
commented at 2:46 PM on May 27, 2025:
member
Reviewed through 538e9ff804f8dfba6f6a808e83572fdeab145ab8
Last big Q is about the cycles being generated in the implied graph, and what if anything we should do differently about it.
sipa force-pushed on May 28, 2025
sipa force-pushed on May 28, 2025
DrahtBot added the label CI failed on May 28, 2025
DrahtBot
commented at 3:32 AM on May 28, 2025:
contributor
🚧 At least one of the CI tasks failed.
<sub>Task ARM, unit tests, no functional tests: https://github.com/bitcoin/bitcoin/runs/43009654762</sub>
<sub>LLM reason (✨ experimental): The failure is caused by a type mismatch in the call to std::max due to incompatible argument types.</sub>
<details><summary>Hints</summary>
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
</details>
DrahtBot removed the label CI failed on May 28, 2025
sipa force-pushed on May 28, 2025
instagibbs
commented at 2:13 PM on May 28, 2025:
member
Following the last light ACK, review comments by @instagibbs were addressed:
Increased fuzz test coverage
Clearer and corrected comments
An Assume statement was added
in
src/txgraph.cpp:2612
in
1664ec567f (outdated)
2607 | +    // cluster linearization. So it can be seen as a "merge sort" of the chunks of the clusters,
2608 | +    // but respecting the dependencies being added.
2609 | +    //
2610 | +    // This rudimentary linearization is computed lazily, by putting all eligible (no unmet
2611 | +    // dependencies) transactions in a heap, and popping the highest-feerate one from it. This
2612 | +    // continues as long as the number or size of all picked transactions together does not exceed the
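The lazy "merge sort of chunks" described in this comment can be sketched with a toy model (illustrative only; Tx, HigherFeerate, and RudimentaryLinearize are made-up names, not the real TxGraphImpl code): transactions with no unmet dependencies sit in a max-heap ordered by feerate, and popping one may make its children eligible.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Toy transaction: chunk feerate (fee, size), unmet-parent count, and children.
struct Tx { int64_t fee; int32_t size; int deps_left; std::vector<int> children; };

// Compare feerates fee_a/size_a > fee_b/size_b without division.
bool HigherFeerate(const Tx& a, const Tx& b)
{
    return a.fee * b.size > b.fee * a.size;
}

// Produce the rudimentary linearization: repeatedly pick the highest-feerate
// transaction among those with no unmet dependencies.
std::vector<int> RudimentaryLinearize(std::vector<Tx> txs)
{
    auto cmp = [&](int x, int y) { return HigherFeerate(txs[y], txs[x]); };
    std::priority_queue<int, std::vector<int>, decltype(cmp)> heap(cmp);
    for (int i = 0; i < int(txs.size()); ++i) {
        if (txs[i].deps_left == 0) heap.push(i);  // initially viable
    }
    std::vector<int> order;
    while (!heap.empty()) {
        int idx = heap.top(); heap.pop();
        order.push_back(idx);
        for (int child : txs[idx].children) {
            // A child becomes viable once its last unmet dependency is picked.
            if (--txs[child].deps_left == 0) heap.push(child);
        }
    }
    return order;
}
```

This is Kahn's topological-sort algorithm with a feerate-ordered heap instead of a FIFO queue, which is why the output always respects topology while still greedily preferring high feerates.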
239 | @@ -240,8 +240,9 @@ class TxGraph
240 | };
241 | };
242 |
243 | -/** Construct a new TxGraph with the specified limit on transactions within a cluster. That
244 | - * number cannot exceed MAX_CLUSTER_COUNT_LIMIT. */
245 | -std::unique_ptr<TxGraph> MakeTxGraph(unsigned max_cluster_count) noexcept;
246 | +/** Construct a new TxGraph with the specified limit on the number of transactions within a cluster,
Perhaps worth noting (for future visitors) that these two limits have pretty different purposes even though they both contribute to the oversizedness property. Max cluster count helps bound linearization complexity, but max cluster size's purpose is largely external to TxGraph. It is only a parameter so that TxGraph can help enforce it.
I've come to think of them as both being a feature that TxGraph offers to the mempool/net_processing code to help enforce its policy limits. So in this regard, they are equivalent.
The reason for the existence of the policy limits differs, however: weight is there to bound how good/predictable transaction quality can be through linearization (avoiding binpacking), while count is there to keep the TxGraphImpl implementation efficient. But it isn't the case that this directly determines the count limit; otherwise, more performant machines should have a higher limit, for example.
Fair enough, I suppose they are both external policy limits!
in
src/txgraph.h:69
in
221de30171outdated
63 | @@ -64,9 +64,9 @@ class TxGraph
64 | virtual ~TxGraph() = default;
65 | /** Construct a new transaction with the specified feerate, and return a Ref to it.
66 | * If a staging graph exists, the new transaction is only created there. feerate.size must be
67 | - * strictly positive, and cannot exceed the graph's max cluster size. In all further calls,
68 | - * only Refs created by AddTransaction() are allowed to be passed to this TxGraph object (or
69 | - * empty Ref objects). Ref objects may outlive the TxGraph they were created for. */
70 | + * strictly positive. In all further calls, only Refs created by AddTransaction() are allowed
71 | + * to be passed to this TxGraph object (or empty Ref objects). Ref objects may outlive the
72 | + * TxGraph they were created for. */
in 221de30171acc1a20549079309b545612f727efb, do I understand correctly that we also don't ever expect OVERSIZED_SINGLETON to occur in the wild (i.e. we wouldn't accept any transactions above standard size in reorgs and max standard size > max cluster size), but TxGraph must handle this case as part of handling oversizedness in general?
With this change, the responsibility for enforcing cluster size limits can be localized purely in TxGraph, without callers (and in particular, tests) needing to duplicate the enforcement for individual transactions.
glozow
commented at 5:14 PM on June 17, 2025:
member
Small doc things for now, still reviewing
sipa force-pushed on Jun 17, 2025
sipa
commented at 5:33 PM on June 17, 2025:
member
Addressed nit, and improved comments here and there.
Also two bigger changes:
There was a bug in the second Trim commit (the one that added UF), resulting in it not achieving the desired quality (if anyone has ideas for tests that exercise the improved quality UF brings, I'm all ears).
I've changed the conversion of unincluded transactions to singleton UF partitions to happen at the time they are popped from the heap, rather than when they are added. I think this makes more sense, as it means the heap is purely for unincluded but includable transactions, and the UF structure purely for included or rejected transactions.
Effect of the bugfix, in a benchmark that creates a merge-the-whole-mempool scenario (not included in the PR, because it doesn't really fit in our bench framework):
Before fix:
Benchmarks (in ms):
block_build: 21.7986
oversize: 7.12257
block_linearize: 33.0116
mempool_build: 0.0691267
trim: 10.4955
final_linearize: 1.5371
reset: 0.308329
Transaction counts:
before trim: 64011
after trim: 38743
After fix:
Benchmarks (in ms):
block_build: 23.3451
oversize: 7.70649
block_linearize: 34.05
mempool_build: 0.0766003
trim: 12.6596
final_linearize: 0.0175771
reset: 0.371984
Transaction counts:
before trim: 64011
after trim: 63994
glozow
commented at 6:54 PM on June 17, 2025:
member
if anyone has ideas for tests that exercise the improved quality UF brings, I'm all ears
- if (new_count > m_max_cluster_count || new_size > m_max_cluster_size) break;
+ if (new_count > m_max_cluster_count || new_size > m_max_cluster_size) continue;
In that case, the optimization means we should keep on going in case we're going to create or add to undersized clusters, but this break broke the optimization. In other words, the base performance was still intact?
sipa
commented at 2:01 PM on June 18, 2025:
member
The commit was still an improvement, as it would only stop once an aggregate-of-clusters-so-far would exceed the limit, rather than stopping when the sum of all-clusters-so-far would exceed the limit (the behavior in the initial Trim commit). But with the continue; instead of break;, it isn't stopping, it'll just skip the cluster being aggregated, and its descendants.
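The break-versus-continue distinction can be illustrated with a toy loop (not the real Trim loop, which aggregates per-partition counts and sizes; Accept and its parameters are hypothetical names): with `break`, the first over-limit candidate stops everything, while with `continue`, only that candidate is skipped and later independent candidates can still be accepted.

```cpp
#include <vector>

// stop_at_first=true models the old `break` behavior; false models `continue`.
// would_be_sizes[i] stands in for the size the i-th candidate's would-be
// cluster reaches if the candidate is accepted.
std::vector<int> Accept(const std::vector<int>& would_be_sizes, int limit, bool stop_at_first)
{
    std::vector<int> accepted;
    for (int i = 0; i < int(would_be_sizes.size()); ++i) {
        if (would_be_sizes[i] > limit) {
            if (stop_at_first) break;  // old behavior: give up on everything after
            continue;                  // fixed behavior: skip just this candidate
        }
        accepted.push_back(i);
    }
    return accepted;
}
```

With sizes {10, 200, 20} and limit 100, `break` accepts only the first candidate, while `continue` also accepts the third.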
in
src/test/txgraph_tests.cpp:62
in
4ead162a46 (outdated)
57 | +    }
58 | +
59 | +    // Check that the graph is now oversized. This also forces the graph to
60 | +    // group clusters and compute the oversized status.
61 | +    BOOST_CHECK_EQUAL(graph->GetTransactionCount(), TOTAL_NUM_TX);
62 | +    BOOST_CHECK(graph->IsOversized(false));
instagibbs
commented at 2:18 PM on June 18, 2025:
member
reACK 643fffafdc70da5fbaf1992c36965595b9e63f84
and the UF structure purely for included or rejected transactions
i.e. rejected here means "things with no unmet deps yet never included"
Validated in my mental model that bug fix and heap logic changes are correct. Reviewed new unit tests and makes sense that it would have caught the bug. It's a bit further non-black-box, but they're simple.
sipa
commented at 3:18 PM on June 18, 2025:
member
@instagibbs Done. Also expanded comments slightly, made unit test constant names a bit more consistent, and sprinkled some graph->SanityCheck(); calls over the tests.
instagibbs
commented at 3:54 PM on June 18, 2025:
member
reACK 2b2df98747fdb6380588991167ce2e8cb92f3bfb
sipa
commented at 1:49 PM on June 30, 2025:
member
Would there be interest in adding a benchmark for the Trim() operation here, or rather in a follow-up?
instagibbs
commented at 3:27 PM on June 30, 2025:
member
@sipa linked here, added to follow-up potentially?
It's the same scenario as the txgraph_trim_huge unit test (1000 chains of 64 reorged transactions each, merged together by 11 mempool transactions into one giant would-be cluster, then Trim()ed down to something acceptable). Takes around 15 ms for me.
instagibbs
commented at 5:46 PM on July 1, 2025:
member
@sipa one order of magnitude more for me on my wimpy laptop but well within acceptable bounds I suspect for a reorg. IIUC this is one of the more pessimistic mempool topologies for this since you're going to have to go through the ~entire mempool before finishing
ismaelsadeeq
commented at 8:39 PM on July 1, 2025:
member
Would there be interest in adding a benchmark for the Trim() operation here, or rather in a follow-up?
That will be nice, I don't mind adding it here. I plan to do another round of review of the added tests and diff after my last review so if added will review it as well.
I tried running the bench on the linked commit. MakeTxGraph seems to take two parameters, but the bench call provided three arguments.
I removed the NUM_ACCEPTABLE_ITERS arg; this is the result.
@ismaelsadeeq Yeah, the benchmark is in my "all txgraph" branch which also includes #32263. Removing the NUM_ACCEPTABLE_ITERS variable should be fine.
in
src/test/txgraph_tests.cpp:71
in
2b2df98747 (outdated)
66 | +    // Call Trim() to remove transactions and bring the cluster back within limits.
67 | +    auto removed_refs = graph->Trim();
68 | +    graph->SanityCheck();
69 | +    BOOST_CHECK(!graph->IsOversized(/*main_only=*/false));
70 | +
71 | +    // We only need to trim the middle bottom transaction to end up with 2 clusters each within cluster limits.
2743 | +        ++deps_it;
2744 | +    }
2745 | +    trim_it->m_parent_count = trim_it->m_deps_left;
2746 | +    // If this transaction has no unmet dependencies, and is not oversized, add it to the
2747 | +    // heap (just append for now, the heapification happens below).
2748 | +    if (trim_it->m_deps_left == 0 && trim_it->m_tx_size <= m_max_cluster_size) {
I tested that Trim() always removes oversized singletons, even if there are no clusters to be merged, using a simple unit test
diff --git a/src/test/txgraph_tests.cpp b/src/test/txgraph_tests.cpp
index 9f9ad04ff37..d90afd8819d 100644
--- a/src/test/txgraph_tests.cpp
+++ b/src/test/txgraph_tests.cpp
@@ -248,4 +248,43 @@ BOOST_AUTO_TEST_CASE(txgraph_trim_huge)
BOOST_CHECK(graph->GetTransactionCount() >= (NUM_TOP_CHAINS * NUM_TX_PER_TOP_CHAIN * 99) / 100);
}
+BOOST_AUTO_TEST_CASE(txgraph_trim_big_singletons)
+{
+ // Mempool consists of 100 singleton clusters; there are no dependencies. Some are oversized. Trim() should remove all of the oversized ones.
+ static constexpr int MAX_CLUSTER_COUNT = 64;
+ static constexpr int32_t MAX_CLUSTER_SIZE = 100'000;
+ static constexpr int NUM_TOTAL_TX = 100;
+
+ // Create a new graph for the test.
+ auto graph = MakeTxGraph(MAX_CLUSTER_COUNT, MAX_CLUSTER_SIZE);
+
+ // Add all transactions and store their Refs.
+ std::vector<TxGraph::Ref> refs;
+ refs.reserve(NUM_TOTAL_TX);
+
+ // Add all transactions. They are in individual clusters.
+ for (unsigned int i = 0; i < NUM_TOTAL_TX; ++i) {
+ // The 88th transaction is oversized.
+ // Every 20th transaction is oversized.
+ const FeePerWeight feerate{500 + i, (i == 88 || i % 20 == 0) ? MAX_CLUSTER_SIZE + 1 : 100};
+ refs.push_back(graph->AddTransaction(feerate));
+ }
+ graph->SanityCheck();
+
+ // Check that the graph is now oversized. This also forces the graph to
+ // group clusters and compute the oversized status.
+ BOOST_CHECK(graph->IsOversized(/*main_only=*/false));
+
+ // Call Trim() to remove transactions and bring the cluster back within limits.
+ auto removed_refs = graph->Trim();
+ graph->SanityCheck();
+ BOOST_CHECK_EQUAL(graph->GetTransactionCount(), NUM_TOTAL_TX - 6);
+ BOOST_CHECK(!graph->IsOversized(/*main_only=*/false));
+
+ // Check that all the oversized transactions were removed.
+ for (unsigned int i = 0; i < refs.size(); ++i) {
+ BOOST_CHECK_EQUAL(graph->Exists(refs[i]), i != 88 && i % 20 != 0);
+ }
+}
+
BOOST_AUTO_TEST_SUITE_END()
diff --git a/src/txgraph.cpp b/src/txgraph.cpp
index 1b29a05e534..26eef54d321 100644
--- a/src/txgraph.cpp
+++ b/src/txgraph.cpp
@@ -2745,7 +2745,7 @@ std::vector<TxGraph::Ref*> TxGraphImpl::Trim() noexcept
trim_it->m_parent_count = trim_it->m_deps_left;
// If this transaction has no unmet dependencies, and is not oversized, add it to the
// heap (just append for now, the heapification happens below).
- if (trim_it->m_deps_left == 0 && trim_it->m_tx_size <= m_max_cluster_size) {
+ if (trim_it->m_deps_left == 0) {
trim_heap.push_back(trim_it);
}
}
@@ -2813,7 +2813,7 @@ std::vector<TxGraph::Ref*> TxGraphImpl::Trim() noexcept
new_size += ptr->m_uf_size;
}
// Skip the entry if this would violate any limit.
- if (new_count > m_max_cluster_count || new_size > m_max_cluster_size) continue;
+ if (new_count > m_max_cluster_count) continue;
// Union the partitions this transaction and all its dependencies are in together.
auto rep = &entry;
glozow
commented at 6:31 PM on July 2, 2025:
member
ACK 2b2df98747fdb6380588991167ce2e8cb92f3bfb
Reviewed the code and the tests. I ran the bench to see that we spend a lot longer on the last commit since we are able to consider more potential clusters.
Before:
txgraph: Add ability to configure maximum cluster size/weight (feature)
This is integrated with the oversized property: the graph is oversized when
any connected component within it contains more than the cluster count limit
many transactions, or when their combined size/weight exceeds the cluster size
limit.
It becomes disallowed to call AddTransaction with a size larger than this limit,
though this limit will be lifted in the next commit.
In addition, SetTransactionFeeRate becomes SetTransactionFee, so that we do not
need to deal with the case that a call to this function might affect the
oversizedness.
c4287b9b71
txgraph: Permit transactions that exceed cluster size limit (feature)
This removes the restriction added in the previous commit that individual
transactions do not exceed the max cluster size limit.
With this change, the responsibility for enforcing cluster size limits can
be localized purely in TxGraph, without callers (and in particular, tests)
needing to duplicate the enforcement for individual transactions.
txgraph: Add ability to trim oversized clusters (feature)
During reorganisations, it is possible that dependencies get added which
result in clusters that violate limits (count, size), when linking the
new from-block transactions to the old from-mempool transactions.
Unlike RBF scenarios, we cannot simply reject these policy violations
when they are due to received blocks. To accommodate this, add a Trim()
function to TxGraph, which removes transactions (including descendants)
in order to make all resulting clusters satisfy the limits.
In the initial version of the function added here, the following approach
is used:
- Lazily compute a naive linearization for the to-be-merged cluster (using
an O(n log n) algorithm, optimized for far larger groups of transactions
than the normal linearization code).
- Initialize a set of accepted transactions to {}
- Iterate over the transactions in this cluster one by one:
- If adding the transaction to the set makes it exceed the max cluster size
or count limit, stop.
- Add the transaction to the set.
- Remove all transactions from the cluster that were not included in the set
(note that this necessarily includes all descendants too, because they
appear later in the naive linearization).
Co-authored-by: Greg Sanders <gsanders87@gmail.com>
a04e205ab0
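The accept-until-limit strategy in the commit message above can be sketched in toy form (TrimSuffix is an illustrative name; the real code walks the lazily computed linearization rather than a plain size vector): accept transactions in linearization order until the next one would exceed a limit, then remove everything after the stopping point.

```cpp
#include <cstdint>
#include <vector>

// Given transaction sizes in (topological) linearization order, return the
// indices of the transactions the initial Trim strategy would remove.
std::vector<int> TrimSuffix(const std::vector<int32_t>& sizes_in_lin_order,
                            int max_count, int32_t max_size)
{
    int count = 0;
    int64_t total_size = 0;
    int accepted = 0;
    for (int i = 0; i < int(sizes_in_lin_order.size()); ++i) {
        // Stop as soon as adding the next transaction would violate a limit.
        if (count + 1 > max_count || total_size + sizes_in_lin_order[i] > max_size) break;
        ++count;
        total_size += sizes_in_lin_order[i];
        accepted = i + 1;
    }
    // Everything after the stopping point is removed. Because descendants
    // appear later in a topological linearization, they are removed too.
    std::vector<int> removed;
    for (int i = accepted; i < int(sizes_in_lin_order.size()); ++i) removed.push_back(i);
    return removed;
}
```

Note that removing the whole suffix is what makes the descendant guarantee hold for free: no kept transaction can depend on a removed one.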
txgraph: add unit test for TxGraph::Trim (tests)
Co-Authored-By: Pieter Wuille <pieter@wuille.net>
938e86f8fe
txgraph: add fuzz test scenario that avoids cycles inside Trim() (tests)
Trim internally builds an approximate dependency graph of the merged cluster,
replacing all existing dependencies within existing clusters with a simple
linear chain of dependencies. This helps keep the complexity of the merging
operation down, but may result in cycles appearing in the general case, even
though in real scenarios (where Trim is called for stitching re-added mempool
transactions after a reorg back to the existing mempool transactions) such
cycles are not possible.
Add a test that specifically targets Trim() but in scenarios where it is
guaranteed not to have any cycles. It is a special case, is much more a
whitebox test than a blackbox test, and relies on randomness rather than
fuzz input. The upside is that somewhat stronger properties can be tested.
Co-authored-by: Greg Sanders <gsanders87@gmail.com>
Addressed nits, and also incorporated the benchmark in this PR.
txgraph: add Trim benchmark (benchmark)4608df37e0
txgraph: Track multiple potential would-be clusters in Trim (improvement)
In the existing Trim function, as soon as the set of accepted transactions
would exceed the max cluster size or count limit, the acceptance loop is
stopped, removing all later transactions. However, it is possible that by
excluding some of those transactions the would-be cluster splits apart into
multiple would-be clusters. And those clusters may well permit far more
transactions before their limits are reached.
Take this into account by using a union-find structure inside TrimTxData to
keep track of the count/size of all would-be clusters that would be formed
at any point, and only reject transactions which would cause these resulting
partitions to exceed their limits.
This is not an optimization in terms of CPU usage or memory; it just
improves the quality of the transactions removed by Trim().
1632fc104b
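The union-find bookkeeping this commit message describes can be sketched as follows (illustrative names using path halving and union by count; the real TrimTxData embeds this state in its per-transaction entries rather than in separate vectors): each partition's root carries the running transaction count and total size of the would-be cluster it represents.

```cpp
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Toy union-find tracking, at each partition root, the count and total size
// of the would-be cluster that partition represents.
struct UnionFind {
    std::vector<int> parent;
    std::vector<int> count;     // number of txs in the partition (valid at root)
    std::vector<int64_t> size;  // total tx size in the partition (valid at root)

    explicit UnionFind(const std::vector<int64_t>& tx_sizes)
        : parent(tx_sizes.size()), count(tx_sizes.size(), 1), size(tx_sizes)
    {
        std::iota(parent.begin(), parent.end(), 0);  // every tx starts alone
    }

    // Find the partition root of i, halving paths along the way.
    int Find(int i)
    {
        while (parent[i] != i) i = parent[i] = parent[parent[i]];
        return i;
    }

    // Merge the partitions of a and b (union by count); return the new root.
    int Union(int a, int b)
    {
        a = Find(a); b = Find(b);
        if (a == b) return a;
        if (count[a] < count[b]) std::swap(a, b);
        parent[b] = a;
        count[a] += count[b];
        size[a] += size[b];
        return a;
    }
};
```

Before accepting a transaction, one would sum the count/size of the distinct partitions its dependencies belong to, reject it if the merged partition would exceed the limits, and otherwise Union them together.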
sipa force-pushed on Jul 2, 2025
DrahtBot added the label CI failed on Jul 2, 2025
DrahtBot
commented at 8:12 PM on July 2, 2025:
contributor
🚧 At least one of the CI tasks failed.
<sub>Task tidy: https://github.com/bitcoin/bitcoin/runs/45240651157</sub>
<sub>LLM reason (✨ experimental): Clang-tidy detected errors due to inclusion of deprecated headers, causing CI failure.</sub>
<details><summary>Hints</summary>
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
</details>
instagibbs approved
instagibbs
commented at 8:27 PM on July 2, 2025:
member
reACK 1632fc104be8f171f59a023800b2f9f20d0a3cff
Includes the benchmark and unit test for trimming singletons, and a text cleanup
DrahtBot requested review from glozow on Jul 2, 2025
in
src/test/txgraph_tests.cpp:60
in
938e86f8fe (outdated)
55 | +    for (unsigned int i = 0; i < NUM_BOTTOM_TX; ++i) {
56 | +        graph->AddDependency(/*parent=*/refs[NUM_BOTTOM_TX + i], /*child=*/refs[i]);
57 | +        graph->AddDependency(/*parent=*/refs[NUM_BOTTOM_TX + i + 1], /*child=*/refs[i]);
58 | +    }
59 | +
60 | +    // Check that the graph is now oversized. This also forces the graph to
True, there isn't actually anything to group in this test. Just copied from the other tests.
glozow
commented at 9:19 PM on July 2, 2025:
member
reACK 1632fc104be via range-diff
DrahtBot removed the label CI failed on Jul 2, 2025
in
src/test/txgraph_tests.cpp:55
in
938e86f8fe (outdated)
50 | +    }
51 | +
52 | +    // Create the zigzag dependency structure.
53 | +    // Each transaction in the bottom row depends on two adjacent transactions from the top row.
54 | +    graph->SanityCheck();
55 | +    for (unsigned int i = 0; i < NUM_BOTTOM_TX; ++i) {
ismaelsadeeq
commented at 12:39 PM on July 3, 2025:
We start being oversized at i = 24 iteration.
ismaelsadeeq approved
ismaelsadeeq
commented at 12:22 PM on July 4, 2025:
member
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
Content is generated from a GitHub metadata backup.
generated: 2026-05-02 12:13 UTC