cluster mempool: add TxGraph reorg functionality

sipa commented at 7:06 pm on December 22, 2024: member

Part of cluster mempool (#30289).

During reorganisations, it is possible that dependencies get added which would result in clusters that violate policy limits (cluster count, cluster weight), when linking the new from-block transactions to the old from-mempool transactions. Unlike RBF scenarios, we cannot simply reject the changes when they are due to received blocks. To accommodate this, add a TxGraph::Trim(), which removes some subset of transactions (including descendants) in order to make all resulting clusters satisfy the limits.

Conceptually, the way this is done is by defining a rudimentary linearization for the entire would-be too-large cluster, iterating it from beginning to end, and reasoning about the counts and weights of the clusters that would be reached using transactions up to that point. If a transaction is encountered whose addition would violate the limit, it is removed, together with all its descendants.

This rudimentary linearization is like a merge sort of the chunks of the clusters being combined, but respecting topology. More specifically, it is continuously picking the highest-chunk-feerate remaining transaction among those which have no unmet dependencies left. For efficiency, this rudimentary linearization is computed lazily, by putting all viable transactions in a heap, sorted by chunk feerate, and adding new transactions to it as they become viable.

The Trim() function is rather unusual compared to the TxGraph functionality added in previous PRs, in that Trim() makes it own decisions about what the resulting graph contents will be, without good specification of how it makes that decision - it is just a best-effort attempt (which is improved in the last commit). All other TxGraph mutators are simply to inform the graph about changes the calling mempool code decided on; this one lets the decision be made by txgraph.

As part of this, the “oversized” property is expanded to also encompass a configurable cluster weight limit (in addition to cluster count limit).

DrahtBot commented at 7:06 pm on December 22, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31553.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	instagibbs, glozow, ismaelsadeeq

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#31829 (p2p: improve TxOrphanage denial of service bounds by glozow)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

“Pick an transaction in top_components to attach to.” -> “Pick a transaction in top_components to attach to.” [‘an’ before consonant‐starting ‘transaction’ should be ‘a’]
“Move entry top the back.” -> “Move entry to the back.” [typo ‘top’ instead of ‘to’]

drahtbot_id_4_m

glozow added the label Mempool on Jan 2, 2025

sipa force-pushed on Jan 8, 2025

DrahtBot added the label CI failed on Jan 9, 2025

DrahtBot commented at 0:52 am on January 9, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35343429418

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

sipa force-pushed on Jan 9, 2025

DrahtBot removed the label CI failed on Jan 9, 2025

sipa force-pushed on Jan 9, 2025

in src/txgraph.cpp:1710 in 0c8dc2323e outdated

948+    Ref ret;
949+    // Construct a new Entry, and link it with the Ref.
950+    auto idx = m_entries.size();
951+    m_entries.emplace_back();
952+    auto& entry = m_entries.back();
953+    entry.m_ref = &ret;

theuni commented at 6:43 pm on January 9, 2025:

I’m not sure how this is intended to be used, but storing a stack address seems like a problem? RVO may help but that seems brittle. I imagine the caller should be passing in their own Ref instead?

sipa commented at 8:19 pm on January 9, 2025:

I believe it is safe, both with NRVO and without.

With NRVO, ret is constructed directly in the caller’s target destination, so this isn’t a pointer to local stack space.

Without NRVO, the Ref(Ref&&) move constructor is invoked by return ret;, which will update the pointer to the caller’s destination.

theuni commented at 10:15 pm on January 9, 2025:

Ah, right, I missed that the move ctor would handle the update. Thanks for explaining.

in src/txgraph.cpp:1172 in 0c8dc2323e outdated

1167+    }
1168+}
1169+
1170+TxGraph::Ref& TxGraph::Ref::operator=(Ref&& other) noexcept
1171+{
1172+    // Inform both TxGraphs about the Refs being swapped.

theuni commented at 6:48 pm on January 9, 2025:

Why is this doing an effective swap? I would expect this to call UnlinkRef on the moved-from value and reset its m_graph and m_index. Otherwise it wouldn’t be unlinked until the moved-from variable goes out of scope, no?

sipa commented at 8:38 pm on January 9, 2025:

Why is this doing an effective swap?

I think this is quite common, that move-construction is effectively performing a swap.

I would expect this to call UnlinkRef on the moved-from value and reset its m_graph and m_index

That’s possible too, and slightly more efficient I guess.

Otherwise it wouldn’t be unlinked until the moved-from variable goes out of scope, no?

Indeed. I don’t think that’s a problem.

sipa commented at 9:34 pm on January 9, 2025:

Anyway, done!

theuni commented at 9:56 pm on January 9, 2025:

Why is this doing an effective swap?

I think this is quite common, that move-construction is effectively performing a swap.

I would expect this to call UnlinkRef on the moved-from value and reset its m_graph and m_index

That’s possible too, and slightly more efficient I guess.

Otherwise it wouldn’t be unlinked until the moved-from variable goes out of scope, no?

Indeed. I don’t think that’s a problem.

Afaik the move/swap idiom is only safe if the swapped-to value’s dtor doesn’t have any interesting ordering requirements or side-effects.

As a contrived example, a user may do something like:

0std::vector<TxGraph::Ref> vec;
1
2vec.push_back(txgraph->AddTransaction(fee));
3auto ref = txgraph->AddTransaction(fee2);
4...
5ref = std::move(vec.back());

The vector now holds the old ref and UnlinkRef will not be called until that element is removed. I realize it’s allowed to be a “valid but unspecified state”, but I wouldn’t expect a ref to be hanging around.

in src/txgraph.cpp:188 in 0c8dc2323e outdated

183+    /** A class of objects held internally in TxGraphImpl, with information about a single
184+     *  transaction. */
185+    struct Entry
186+    {
187+        /** Pointer to the corresponding Ref object, if any. */
188+        Ref* m_ref;

theuni commented at 6:51 pm on January 9, 2025:

m_ref{nullptr};

sipa commented at 9:34 pm on January 9, 2025:

Done.

in src/txgraph.cpp:297 in 0c8dc2323e outdated

292+    LinearizationIndex lin_idx{0};
293+    // Iterate over the chunks.
294+    for (unsigned chunk_idx = 0; chunk_idx < chunking.NumChunksLeft(); ++chunk_idx) {
295+        auto chunk = chunking.GetChunk(chunk_idx);
296+        // Iterate over the transactions in the linearization, which must match those in chunk.
297+        while (true) {

theuni commented at 7:02 pm on January 9, 2025:

Trying to convince myself this is guaranteed to terminate…

do{} while (!chunk.transactions.None()) rather than the break for readability? Or just while() if we need to guard against an empty linearization (presumably not?)

sipa commented at 9:39 pm on January 9, 2025:

It terminates because:

Every chunk contains at least one element (added an Assume for that)
In the inner loop, one element from that chunk is Reset() (added an Assume that it indeed resets a bit that was previously set).

I’ve changed it to a do {} while(chunk.transactions.Any()); loop in the first commits, though it reverts back to a while (true) { ... } loop later, when the loop becomes a bit more complex.

in src/txgraph.cpp:838 in 0c8dc2323e outdated

306+}
307+
308+void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove) noexcept
309+{
310+    // Iterate over the prefix of to_remove that applies to this cluster.
311+    SetType todo;

theuni commented at 7:17 pm on January 9, 2025:

Assume !to_remove.empty() or early return if it’s allowed?

sipa commented at 9:40 pm on January 9, 2025:

Done. I’ve also added a comment to the Cluster::ApplyRemovals() function definition stating that at least one element from the front of to_remove must belong to this Cluster (which is really why that requirement exists).

in src/txgraph.cpp:1662 in 0c8dc2323e outdated

1002+    // Make sure the transaction isn't scheduled for removal.
1003+    ApplyRemovals();
1004+    return m_entries[GetRefIndex(arg)].m_locator.IsPresent();
1005+}
1006+
1007+std::vector<TxGraph::Ref*> Cluster::GetAncestorRefs(const TxGraphImpl& graph, ClusterIndex idx) noexcept

theuni commented at 7:42 pm on January 9, 2025:

Looks like these 3 functions could reserve() for their ret vectors.

sipa commented at 9:40 pm on January 9, 2025:

Done. The third one disappears in a later commit, though.

theuni commented at 8:03 pm on January 9, 2025: member

Very quick and shallow pass through the initial impl commit. This PR is a lot to get through :)

sipa force-pushed on Jan 9, 2025

DrahtBot added the label CI failed on Jan 9, 2025

DrahtBot commented at 11:22 pm on January 9, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35398283653

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

sipa force-pushed on Jan 10, 2025

DrahtBot removed the label CI failed on Jan 10, 2025

sipa force-pushed on Jan 10, 2025

sipa force-pushed on Jan 16, 2025

sipa force-pushed on Jan 22, 2025

in src/txgraph.cpp:1548 in 6cb99b067c outdated

1464@@ -1449,6 +1465,7 @@ Cluster::Cluster(TxGraphImpl& graph, const FeeFrac& feerate, GraphIndex graph_in
1465 TxGraph::Ref TxGraphImpl::AddTransaction(const FeeFrac& feerate) noexcept
1466 {
1467     Assume(m_chunkindex_observers == 0 || m_clustersets.size() > 1);
1468+    Assume(feerate.size > 0 && uint64_t(feerate.size) <= m_max_cluster_size);

sdaftuar commented at 2:10 pm on January 24, 2025:

FYI – in my rebase of #28676, I’m seeing tx_pool fuzz test failures due to this line. Not clear to me whether we should require the caller to enforce the policy requirement that a single tx be below the cluster size limit, or just let the caller discover a changeset is oversized and then reject?

sipa commented at 2:34 pm on January 24, 2025:

Right. That rule exists because the alternative requires existing clusters to be oversized as AddTransaction constructs a singleton cluster instantly. All other forms of oversizedness happen as a result of applying dependencies, which are done lazily.

I’ll think about relaxing this.

sipa commented at 8:45 pm on January 26, 2025:

Done, it is now allowed to have individually oversized transactions.

sipa force-pushed on Jan 24, 2025

sipa commented at 10:14 pm on January 24, 2025: member

Some changes:

As a result of dropping Cleanup in the base PR, Trim now reports which transactions it removed, as it becomes the caller’s responsibility of destroying Refs.

DrahtBot added the label CI failed on Jan 24, 2025

DrahtBot commented at 11:23 pm on January 24, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36148923463

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

sipa force-pushed on Jan 26, 2025

sipa commented at 4:42 am on January 26, 2025: member

Add support for calling AddTransaction with a feerate whose size already violates the cluster size limit.

DrahtBot removed the label CI failed on Jan 26, 2025

sipa force-pushed on Jan 26, 2025

sipa force-pushed on Jan 30, 2025

DrahtBot commented at 1:01 am on January 31, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36451226624

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot added the label CI failed on Jan 31, 2025

sipa force-pushed on Jan 31, 2025

DrahtBot removed the label CI failed on Jan 31, 2025

sipa force-pushed on Jan 31, 2025

sipa force-pushed on Feb 1, 2025

sipa force-pushed on Feb 4, 2025

DrahtBot added the label CI failed on Feb 4, 2025

DrahtBot removed the label CI failed on Feb 4, 2025

sipa force-pushed on Feb 6, 2025

sipa force-pushed on Feb 11, 2025

sipa force-pushed on Feb 12, 2025

DrahtBot commented at 11:49 pm on February 12, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37128455883

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot added the label CI failed on Feb 12, 2025

sipa force-pushed on Feb 13, 2025

DrahtBot removed the label CI failed on Feb 13, 2025

sipa force-pushed on Feb 14, 2025

sipa force-pushed on Feb 20, 2025

sipa force-pushed on Feb 21, 2025

in src/cluster_linearize.h:1381 in e43f6ca3b8 outdated

1364+        // Figure out which elements need to be moved before elem.
1365+        SetType place_before = done & depgraph.Ancestors(elem);
1366+        // Find which position to place elem in (updating j), continuously moving the elements
1367+        // in between forward.
1368+        while (place_before.Any()) {
1369+            // j cannot be 0 here; if it was, then there was necessarily nothing earlier which

yancyribbens commented at 10:20 pm on February 24, 2025:

0            // j cannot be 0 or less; if it was, then there was necessarily nothing earlier which

yancyribbens commented at 10:23 pm on February 24, 2025:

0if it was, then there was necessarily nothing earlier which elem needs to be place before anymore, and place_before would be empty.

This comment seems jumbled and hard to understand. Is it possible to word this better?

in src/txgraph.cpp:680 in e43f6ca3b8 outdated

615+
616+void TxGraphImpl::CreateChunkData(GraphIndex idx, LinearizationIndex chunk_count) noexcept
617+{
618+    auto& entry = m_entries[idx];
619+    if (!m_chunkindex_discarded.empty()) {
620+        // Reuse an discarded node handle.

yancyribbens commented at 10:48 pm on February 24, 2025:

0        // Reuse a discarded node handle.

sipa force-pushed on Mar 3, 2025

sipa force-pushed on Mar 19, 2025

DrahtBot added the label CI failed on Mar 19, 2025

DrahtBot commented at 9:06 pm on March 19, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/39067440822

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

sipa force-pushed on Mar 19, 2025

DrahtBot removed the label CI failed on Mar 20, 2025

DrahtBot added the label Needs rebase on Mar 20, 2025

sipa force-pushed on Mar 20, 2025

DrahtBot added the label CI failed on Mar 20, 2025

DrahtBot commented at 1:42 pm on March 20, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/39111492257

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

sipa force-pushed on Mar 20, 2025

DrahtBot removed the label Needs rebase on Mar 20, 2025

DrahtBot removed the label CI failed on Mar 20, 2025

sipa force-pushed on Mar 22, 2025

DrahtBot added the label CI failed on Mar 22, 2025

DrahtBot commented at 1:23 am on March 22, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/39212785621

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

sipa force-pushed on Mar 22, 2025

DrahtBot removed the label CI failed on Mar 22, 2025

sipa force-pushed on Mar 23, 2025

sipa force-pushed on Mar 24, 2025

glozow referenced this in commit f1d129d963 on Mar 26, 2025

sipa force-pushed on Mar 27, 2025

sipa commented at 3:49 am on March 27, 2025: member

Rebased after merge of #31363, and on top of #32151.

sipa force-pushed on Mar 27, 2025

sipa force-pushed on Mar 28, 2025

sipa force-pushed on Apr 4, 2025

sipa force-pushed on Apr 7, 2025

DrahtBot added the label Needs rebase on Apr 17, 2025

sipa force-pushed on Apr 17, 2025

DrahtBot removed the label Needs rebase on Apr 17, 2025

sipa force-pushed on Apr 22, 2025

sipa force-pushed on Apr 30, 2025

sipa force-pushed on May 12, 2025

sipa force-pushed on May 13, 2025

sipa commented at 8:33 pm on May 13, 2025: member

Now ready for review, with #31444 merged.

in src/test/fuzz/txgraph.cpp:272 in 462429bc15 outdated

267@@ -262,12 +268,16 @@ FUZZ_TARGET(txgraph)
268 
269     // Decide the maximum number of transactions per cluster we will use in this simulation.
270     auto max_count = provider.ConsumeIntegralInRange<DepGraphIndex>(1, MAX_CLUSTER_COUNT_LIMIT);
271+    // And the maximum combined size of transactions per cluster.
272+    auto max_size = provider.ConsumeIntegralInRange<uint64_t>(1, 0x3fffff * MAX_CLUSTER_COUNT_LIMIT);

instagibbs commented at 2:50 pm on May 14, 2025:

462429bc159b1b5ddb5443c484c9bdeae985ad26

nitty: max_agg_size

sipa commented at 7:15 pm on May 21, 2025:

I have changed the variable names to max_cluster_count and max_cluster_size everywhere.

in src/test/fuzz/txgraph.cpp:409 in 462429bc15 outdated

406                 } else {
407                     // Otherwise, use smaller range which consume fewer fuzz input bytes, as just
408                     // these are likely sufficient to trigger all interesting code paths already.
409                     fee = provider.ConsumeIntegral<uint8_t>();
410-                    size = provider.ConsumeIntegral<uint8_t>() + 1;
411+                    size = provider.ConsumeIntegralInRange<uint32_t>(1, std::min<uint32_t>(0xff, max_tx_size));

instagibbs commented at 2:56 pm on May 14, 2025:

462429bc159b1b5ddb5443c484c9bdeae985ad26

misinterpreted the commit message that the aggregate size should not be hit when adding transaction, rather than each individual txn shouldn’t exceed this limit. Maybe make this crystal clear.

sipa commented at 7:16 pm on May 21, 2025:

Do you mean in the TxGraph::AddTransaction definition in txgraph.h? I have expanded the comments in the fuzz test around this a bit.

sipa commented at 9:25 pm on May 21, 2025:

Does :+1: mean “That is what I meant” or “It’s fixed now”?

instagibbs commented at 4:59 pm on May 23, 2025:

comments appear clear enough now

in src/txgraph.cpp:113 in 462429bc15 outdated

108@@ -109,6 +109,8 @@ class Cluster
109     }
110     /** Get the number of transactions in this Cluster. */
111     LinearizationIndex GetTxCount() const noexcept { return m_linearization.size(); }
112+    /** Get the total size of the transactions in this Cluster. */
113+    uint64_t GetTxSize() const noexcept;

instagibbs commented at 3:02 pm on May 14, 2025:

462429bc159b1b5ddb5443c484c9bdeae985ad26

nitty: GetTotalTxSize? GetAggTxSize? GetTxsSize?

sipa commented at 7:16 pm on May 21, 2025:

GetTotalTxSize is a nice color.

in src/txgraph.cpp:1473 in 462429bc15 outdated

1468@@ -1453,7 +1469,7 @@ void TxGraphImpl::GroupClusters(int level) noexcept
1469             ++new_entry.m_deps_count;
1470         }
1471         // Detect oversizedness.
1472-        if (total_count > m_max_cluster_count) {
1473+        if (total_count > m_max_cluster_count || total_size > m_max_cluster_size) {
1474             clusterset.m_group_data->m_group_oversized = true;

instagibbs commented at 3:04 pm on May 14, 2025:

462429bc159b1b5ddb5443c484c9bdeae985ad26

nit: docstring for m_group_oversized should be updated

sipa commented at 7:17 pm on May 21, 2025:

Done, but also removed in a later commit.

in src/txgraph.h:67 in 462429bc15 outdated

62@@ -63,10 +63,10 @@ class TxGraph
63     /** Virtual destructor, so inheriting is safe. */
64     virtual ~TxGraph() = default;
65     /** Construct a new transaction with the specified feerate, and return a Ref to it.
66-     *  If a staging graph exists, the new transaction is only created there. In all
67-     *  further calls, only Refs created by AddTransaction() are allowed to be passed to this
68-     *  TxGraph object (or empty Ref objects). Ref objects may outlive the TxGraph they were
69-     *  created for. */
70+     *  If a staging graph exists, the new transaction is only created there. feerate.size cannot
71+     *  exceed the graph's max cluster size. In all further calls, only Refs created by

instagibbs commented at 3:09 pm on May 14, 2025:

and can’t also be 0? Didn’t double check this was already the case, but an assertion is added in 462429bc159b1b5ddb5443c484c9bdeae985ad26

sipa commented at 7:18 pm on May 21, 2025:

Added as comment.

in src/txgraph.cpp:2145 in 462429bc15 outdated

2140@@ -2124,6 +2141,8 @@ void Cluster::SanityCheck(const TxGraphImpl& graph, int level) const
2141     assert(m_linearization.size() <= graph.m_max_cluster_count);
2142     // The level must match the level the Cluster occurs in.
2143     assert(m_level == level);
2144+    // The sum of their sizes cannot exceed m_max_cluster_size.
2145+    assert(GetTxSize() <= graph.m_max_cluster_size);

instagibbs commented at 3:14 pm on May 14, 2025:

462429bc159b1b5ddb5443c484c9bdeae985ad26

Was a bit confusing at first, but this must be true because clusters are never merged (via dependency application) when to-be-merged-clusters would be oversized.

sipa commented at 7:19 pm on May 21, 2025:

I’ve added a comment to highlight that.

in src/test/fuzz/txgraph.cpp:147 in 439ccf713b outdated

133@@ -134,6 +134,8 @@ struct SimTxGraph
134         simmap[simpos] = std::make_shared<TxGraph::Ref>();
135         auto ptr = simmap[simpos].get();
136         simrevmap[ptr] = simpos;
137+        // This may invalidate our cached oversized value.

instagibbs commented at 3:24 pm on May 14, 2025:

439ccf713b8f9de1dff1d02678bb68d0f29ea175

commit message maybe could get beefed up a bit with motivation since at first blush it seems odd to add singletons we wouldn’t have relayed in the first place

sipa commented at 7:15 pm on May 21, 2025:

I have expanded the commit message of both Trim commits.

in src/txgraph.cpp:40 in 439ccf713b outdated

34@@ -35,6 +35,9 @@ using ClusterSetIndex = uint32_t;
35 /** Quality levels for cached cluster linearizations. */
36 enum class QualityLevel
37 {
38+    /** This is a singleton cluster consisting of a transaction that individually exceeds the
39+     *  cluster size limit. It cannot be merged with anything. */
40+    OVERSIZED,

instagibbs commented at 3:25 pm on May 14, 2025:

439ccf713b8f9de1dff1d02678bb68d0f29ea175

nit: SINGLETON_OVERSIZED to reduce later mental gymnastics

sipa commented at 7:19 pm on May 21, 2025:

Good idea, I went with OVERSIZED_SINGLETON.

in src/txgraph.cpp:502 in 439ccf713b outdated

455@@ -447,7 +456,7 @@ class TxGraphImpl final : public TxGraph
456     ClusterSet& GetClusterSet(int level) noexcept;
457     const ClusterSet& GetClusterSet(int level) const noexcept;
458     /** Make a transaction not exist at a specified level. It must currently exist there. */
459-    void ClearLocator(int level, GraphIndex index) noexcept;
460+    void ClearLocator(int level, GraphIndex index, bool oversized_tx) noexcept;

instagibbs commented at 3:27 pm on May 14, 2025:

439ccf713b8f9de1dff1d02678bb68d0f29ea175

please add documentation on what oversized_tx should be

sipa commented at 7:19 pm on May 21, 2025:

Done.

in src/txgraph.cpp:830 in 439ccf713b outdated

826@@ -815,9 +827,14 @@ void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove
827         todo.Reset(m_linearization.back());
828         m_linearization.pop_back();
829     }
830-    if (todo.None()) {
831+    if (m_linearization.empty()) {

instagibbs commented at 3:43 pm on May 14, 2025:

439ccf713b8f9de1dff1d02678bb68d0f29ea175

is this a future optimization? these types of clusters were already being marked as NEED_SPLIT_*

sipa commented at 7:20 pm on May 21, 2025:

I have simplified this to just a m_linearization.empty() || in the if conditional below.

in src/txgraph.cpp:861 in 439ccf713b outdated

854@@ -838,7 +855,10 @@ void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove
855 void Cluster::Clear(TxGraphImpl& graph) noexcept
856 {
857     for (auto i : m_linearization) {
858-        graph.ClearLocator(m_level, m_mapping[i]);
859+        // We do not care about setting oversized_tx accurately here, because this function is only
860+        // applied to main-graph Clusters in CommitStaging, which will overwrite main's
861+        // m_txcount_oversized anyway with the staging graph's value.
862+        graph.ClearLocator(m_level, m_mapping[i], /*oversized_tx=*/false);

instagibbs commented at 3:48 pm on May 14, 2025:

0        Assume(m_level == 0);
1        graph.ClearLocator(m_level, m_mapping[i], /*oversized_tx=*/false);

sipa commented at 7:21 pm on May 21, 2025:

Instead of this big unintuitive comment, I have replaced it with /*oversized_tx=*/m_quality == QualityLevel::OVERSIZED_SINGLETON. With that, I don’t think the Assume is neccesary?

instagibbs commented at 2:33 pm on May 27, 2025:

much better thanks

in src/txgraph.cpp:2684 in acb2b6a90a outdated

2686+        // Replace arg1 and arg2 by their representatives.
2687+        auto rep1 = find_fn(arg1);
2688+        auto rep2 = find_fn(arg2);
2689+        // Bail out if both representatives are the same, because that means arg1 and arg2 are in
2690+        // the same partition already.
2691+        if (rep1 == rep2) return rep1;

instagibbs commented at 9:15 pm on May 14, 2025:

acb2b6a90a47567773268a024bcf2f1ddfcc60df

FYI this condition doesn’t appear to be hit in the fuzz harness

sipa commented at 7:21 pm on May 21, 2025:

Both assert(rep1 == rep2) and assert(rep1 != rep2) instantly trigger failures when fuzzing for me.

instagibbs commented at 5:00 pm on May 23, 2025:

curious. Coverage report may be off.

in src/txgraph.cpp:858 in 439ccf713b outdated

854@@ -838,7 +855,10 @@ void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove
855 void Cluster::Clear(TxGraphImpl& graph) noexcept
856 {
857     for (auto i : m_linearization) {
858-        graph.ClearLocator(m_level, m_mapping[i]);
859+        // We do not care about setting oversized_tx accurately here, because this function is only

instagibbs commented at 2:23 pm on May 19, 2025:

439ccf713b8f9de1dff1d02678bb68d0f29ea175

Should this function have clearer naming/documentation to avoid future regressions in use? Seems brittle to me.

sipa commented at 7:22 pm on May 21, 2025:

See above, I have dropped the constraints on ClearLocator instead.

instagibbs commented at 2:33 pm on May 27, 2025:

better thanks

in src/txgraph.cpp:260 in 439ccf713b outdated

253@@ -248,8 +254,11 @@ class TxGraphImpl final : public TxGraph
254         /** Total number of transactions in this graph (sum of all transaction counts in all
255          *  Clusters, and for staging also those inherited from the main ClusterSet). */
256         GraphIndex m_txcount{0};
257+        /** Total number of individually oversized transactions in the graph. */
258+        GraphIndex m_txcount_oversized{0};
259         /** Whether this graph is oversized (if known). This roughly matches
260-         *  m_group_data->m_group_oversized, but may be known even if m_group_data is not. */
261+         *  m_group_data->m_group_oversized || (m_txcount_oversized > 0), but may be known even if

instagibbs commented at 2:40 pm on May 19, 2025:

439ccf713b8f9de1dff1d02678bb68d0f29ea175

Do we need m_group_oversized? Seems like a variable we have to keep in sync for no reason:

 0diff --git a/src/txgraph.cpp b/src/txgraph.cpp
 1index 01076b763b..853f8530a0 100644
 2--- a/src/txgraph.cpp
 3+++ b/src/txgraph.cpp
 4@@ -231,7 +231,4 @@ private:
 5         /** Which clusters are to be merged. GroupEntry::m_cluster_offset indexes into this. */
 6         std::vector<Cluster*> m_group_clusters;
 7-        /** Whether at least one of the groups cannot be applied because it would result in a
 8-         *  Cluster that violates the cluster count limit. */
 9-        bool m_group_oversized;
10     };
11 
12@@ -257,7 +254,5 @@ private:
13         /** Total number of individually oversized transactions in the graph. */
14         GraphIndex m_txcount_oversized{0};
15-        /** Whether this graph is oversized (if known). This roughly matches
16-         *  m_group_data->m_group_oversized || (m_txcount_oversized > 0), but may be known even if
17-         *  m_group_data is not. */
18+        /** Whether this graph is oversized (if known). */
19         std::optional<bool> m_oversized{false};
20 
21@@ -1470,7 +1465,7 @@ void TxGraphImpl::GroupClusters(int level) noexcept
22     clusterset.m_group_data = GroupData{};
23     clusterset.m_group_data->m_group_clusters.reserve(an_clusters.size());
24-    clusterset.m_group_data->m_group_oversized = false;
25     clusterset.m_deps_to_add.clear();
26     clusterset.m_deps_to_add.reserve(an_deps.size());
27+    clusterset.m_oversized = false;
28     auto an_deps_it = an_deps.begin();
29     auto an_clusters_it = an_clusters.begin();
30@@ -1502,10 +1497,9 @@ void TxGraphImpl::GroupClusters(int level) noexcept
31         // Detect oversizedness.
32         if (total_count > m_max_cluster_count || total_size > m_max_cluster_size) {
33-            clusterset.m_group_data->m_group_oversized = true;
34+            clusterset.m_oversized = true;
35         }
36     }
37     Assume(an_deps_it == an_deps.end());
38     Assume(an_clusters_it == an_clusters.end());
39-    clusterset.m_oversized = clusterset.m_group_data->m_group_oversized;
40     Compact();
41 }
42@@ -2371,11 +2365,4 @@ void TxGraphImpl::SanityCheck() const
43         if (!clusterset.m_removed.empty()) compact_possible = false;
44 
45-        // If m_group_data exists, and no outstanding removals remain, m_group_oversized must match
46-        // m_group_oversized || (m_txcount_oversized > 0).
47-        if (clusterset.m_group_data.has_value() && clusterset.m_to_remove.empty()) {
48-            assert(clusterset.m_oversized ==
49-                   (clusterset.m_group_data->m_group_oversized || (clusterset.m_txcount_oversized > 0)));
50-        }
51-
52         // For non-top levels, m_oversized must be known (as it cannot change until the level
53         // on top is gone).

sipa commented at 7:23 pm on May 21, 2025:

Nice catch, after the changes in “Permit transactions that exceed cluster size limit”, m_group_oversized is never read anymore. Included this patch as a separate simplification commit.

in src/txgraph.cpp:71 in acb2b6a90a outdated

61@@ -62,12 +62,29 @@ struct TrimTxData
62     TxGraph::GraphIndex m_index;
63     /** Number of unmet dependencies this transaction has. -1 if the transaction is included. */
64     uint32_t m_deps_left;
65+    /** Number of dependencies that apply to this transaction as child. */

instagibbs commented at 3:09 pm on May 19, 2025:

acb2b6a90a47567773268a024bcf2f1ddfcc60df

Some of this commit message should probably go in ce825731e8bea294e71b339015eccdb087e760c7 ? First paragraph at least.

sipa commented at 7:25 pm on May 21, 2025:

I have expanded the commit messages.

in src/test/fuzz/txgraph.cpp:828 in ce825731e8 outdated

787@@ -788,6 +788,27 @@ FUZZ_TARGET(txgraph)
788                     assert(sum == worst_chunk_feerate);
789                 }
790                 break;
791+            } else if ((block_builders.empty() || sims.size() > 1) && command-- == 0) {

instagibbs commented at 8:23 pm on May 19, 2025:

Here are a couple suggestions to tighten up the checks on what we’re trimming. Attempted to state that:

non-oversized clusters remain untouched
transactions with no ancestors that are undersized should never be trimmed (this seems natural but the interface doesn’t guarantee it)

Mostly trying to avoid degenerate behavior like entire mempool being wiped, or any oversized cluster being completely wiped needlessly.

 0diff --git a/src/test/fuzz/txgraph.cpp b/src/test/fuzz/txgraph.cpp
 1index 47ad5792bf..bdf759f485 100644
 2--- a/src/test/fuzz/txgraph.cpp
 3+++ b/src/test/fuzz/txgraph.cpp
 4@@ -91,4 +91,28 @@ struct SimTxGraph
 5     }
 6 
 7+    /** Returns true if all positions in the given set are in oversized clusters, false otherwise. */
 8+    bool AllInOversizedClusters(SetType set)
 9+    {
10+        if (set.Any() && !IsOversized()) return false;
11+
12+        auto todo = graph.Positions();
13+        if (!set.IsSubsetOf(todo)) return false;
14+
15+        // Walk all clusters, and make sure all of `set` doesn't come from non-oversized clusters
16+        while (todo.Any()) {
17+            auto component = graph.FindConnectedComponent(todo);
18+            bool is_oversized{component.Count() > max_cluster_count};
19+            uint64_t component_size{0};
20+            for (auto i : component) component_size += graph.FeeRate(i).size;
21+            is_oversized |= component_size > max_cluster_size;
22+            // Some element existed in a non-oversized cluster
23+            if (!is_oversized && set.Overlaps(component)) {
24+                return false;
25+            }
26+            todo -= component;
27+        }
28+        return true;
29+    }
30+
31     void MakeModified(DepGraphIndex index)
32     {
33@@ -798,8 +822,16 @@ FUZZ_TARGET(txgraph)
34                 }
35                 auto removed_set = top_sim.MakeSet(removed);
36-                // The removed set must contain all its own descendants.
37                 for (auto simpos : removed_set) {
38+                    // The removed set must contain all its own descendants.
39                     assert(top_sim.graph.Descendants(simpos).IsSubsetOf(removed_set));
40+
41+                    // Nothing removed should have no ancestors and be undersized itself
42+                    assert(!(top_sim.GetAncDesc(top_sim.GetRef(simpos), /*desc=*/false).Count() == 0 &&
43+                        (uint64_t) top_sim.graph.FeeRate(simpos).size <= top_sim.max_cluster_size));
44                 }
45+
46+                // Nothing from not-oversized-clusters should have been removed
47+                assert(top_sim.AllInOversizedClusters(removed_set));
48+
49                 // Apply all removals to the simulation, and verify the result is no longer
50                 // oversized. Don't query the real graph for oversizedness; it is compared

sipa commented at 7:24 pm on May 21, 2025:

Taken, except for:

0+                  assert(!(top_sim.GetAncDesc(top_sim.GetRef(simpos), /*desc=*/false).Count() == 0 &&
1+                        (uint64_t) top_sim.graph.FeeRate(simpos).size <= top_sim.max_cluster_size));

because Count() == 0 will never be true (ancestors always includes the transaction itself), and the test fails when I change it to Count() == 1.

instagibbs commented at 8:34 pm on May 21, 2025:

Ah, I see it now. Both variants of the Trim() algorithm can definitely remove these types of transactions.

f.e. if cluster limit is 2 and you have: TxA <- TxB <- TxC TxA <- TxE TxD <- TxE

You may select TxA+TxB only (after stopping at TxC), evicting TxC+TxD+TxE, if TxD’s cfr is low.

If TxD’s cfr is higher than TxB or TxC, then you could retain A+B+D

After a bunch of debugging, I was able to figure out a stronger check, if we constrain the scenario from a non-oversized to oversized transition with pending dependencies applied: https://github.com/instagibbs/bitcoin/tree/2025-05-applytrim

sipa commented at 3:38 am on May 28, 2025:

Added a commit with a test like this, with some changes. LMK what you think.

instagibbs commented at 8:26 pm on May 19, 2025: member

acb2b6a90a47567773268a024bcf2f1ddfcc60df

focused on motivation/structure/approach and test coverage for first pass. Haven’t validated the actual trimming logic yet.

in src/txgraph.cpp:919 in ce825731e8 outdated

907@@ -886,6 +908,37 @@ void Cluster::AppendChunkFeerates(std::vector<FeeFrac>& ret) const noexcept
908     ret.insert(ret.end(), chunk_feerates.begin(), chunk_feerates.end());
909 }
910 
911+uint64_t Cluster::AppendTrimData(std::vector<TrimTxData>& ret, std::vector<std::pair<GraphIndex, GraphIndex>>& deps) const noexcept
912+{
913+    LinearizationChunking linchunking(m_depgraph, m_linearization);

instagibbs commented at 2:20 pm on May 20, 2025:

0    const LinearizationChunking linchunking(m_depgraph, m_linearization);

sipa commented at 3:33 am on May 28, 2025:

Done.

in src/txgraph.cpp:189 in ce825731e8 outdated

167@@ -151,6 +168,10 @@ class Cluster
168     void Relinearize(TxGraphImpl& graph, uint64_t max_iters) noexcept;
169     /** For every chunk in the cluster, append its FeeFrac to ret. */
170     void AppendChunkFeerates(std::vector<FeeFrac>& ret) const noexcept;
171+    /** Add a TrimTxData entry for every transaction in the Cluster to ret. Implicit dependencies

instagibbs commented at 2:51 pm on May 20, 2025:

nit: 3 out of the N fields are being set here, could we be specific which ones for clarity?

sipa commented at 3:38 am on May 28, 2025:

Done. Also organized the fields of struct TrimTxData a bit more logically, into “Populated by Cluster::AppendTxData” and “Only used internally in TxGraphImpl::Trim” sections.

in src/txgraph.cpp:2669 in ce825731e8 outdated

2664+        // Gather trim data from all involved Clusters.
2665+        auto cluster_span = std::span{clusterset.m_group_data->m_group_clusters}
2666+                                .subspan(group_data.m_cluster_offset, group_data.m_cluster_count);
2667+        uint64_t size{0};
2668+        for (Cluster* cluster : cluster_span) {
2669+            size += cluster->AppendTrimData(trim_data, deps);

instagibbs commented at 3:12 pm on May 20, 2025:

nit: leave a comment that deps is also being added to here, briefly forgot

sipa commented at 3:37 am on May 28, 2025:

Done.

in src/txgraph.cpp:2677 in ce825731e8 outdated

2672+        if (trim_data.size() <= m_max_cluster_count && size <= m_max_cluster_size) continue;
2673+        // Sort the trim data by GraphIndex. In what follows, we will treat this sorted vector as
2674+        // a map from GraphIndex to TrimTxData, and its ordering will not change anymore.
2675+        std::sort(trim_data.begin(), trim_data.end(), [](auto& a, auto& b) noexcept { return a.m_index < b.m_index; });
2676+
2677+        // Construct deps, and sort it by child.

instagibbs commented at 3:12 pm on May 20, 2025:

0        // Append explicit deps, and sort it by child.

sipa commented at 3:37 am on May 28, 2025:

Done.

in src/txgraph.cpp:2682 in ce825731e8 outdated

2677+        // Construct deps, and sort it by child.
2678+        deps.insert(deps.end(),
2679+                    clusterset.m_deps_to_add.begin() + group_data.m_deps_offset,
2680+                    clusterset.m_deps_to_add.begin() + group_data.m_deps_offset + group_data.m_deps_count);
2681+        std::sort(deps.begin(), deps.end(), [](auto& a, auto& b) noexcept { return a.second < b.second; });
2682+        // Fill m_deps_left in trim_data. Because of the sort above, all

instagibbs commented at 3:17 pm on May 20, 2025:

0        // Fill m_deps_left in trim_data and initially populate trim_heap. Because of the sort above, all

sipa commented at 3:37 am on May 28, 2025:

Done.

in src/txgraph.cpp:2700 in ce825731e8 outdated

2695+                trim_heap.push_back(trim_it);
2696+            }
2697+        }
2698+        Assume(deps_it == deps.end());
2699+
2700+        // Sort deps by parent.

instagibbs commented at 3:19 pm on May 20, 2025:

0        // Sort deps by parent by graph index. The order is unchanged from now on.

sipa commented at 3:36 am on May 28, 2025:

Done.

in src/txgraph.cpp:2713 in ce825731e8 outdated

2669+            size += cluster->AppendTrimData(trim_data, deps);
2670+        }
2671+        // If this group of Clusters does not violate any limits, continue to the next group.
2672+        if (trim_data.size() <= m_max_cluster_count && size <= m_max_cluster_size) continue;
2673+        // Sort the trim data by GraphIndex. In what follows, we will treat this sorted vector as
2674+        // a map from GraphIndex to TrimTxData, and its ordering will not change anymore.

instagibbs commented at 3:23 pm on May 20, 2025:

0        // a map from GraphIndex to TrimTxData via locate_fn, and its ordering will not change anymore.

sipa commented at 3:37 am on May 28, 2025:

Done.

in src/txgraph.cpp:2851 in ce825731e8 outdated

2767+            }
2768+        }
2769+    }
2770+    clusterset.m_group_data.reset();
2771+    clusterset.m_oversized = false;
2772+    return ret;

instagibbs commented at 3:43 pm on May 20, 2025:

0    Assume(!ret.empty());
1    return ret;

sipa commented at 3:35 am on May 28, 2025:

Done.

sipa force-pushed on May 21, 2025

in src/txgraph.cpp:151 in bd95322455 outdated

103@@ -101,6 +104,9 @@ class Cluster
104     {
105         return m_quality == QualityLevel::OPTIMAL;
106     }
107+    /** Whether this cluster is oversized (just due to the size of its transaction(s), not due to
108+     *  dependencies that are yet to be added. */
109+    bool IsOversized() const noexcept { return m_quality == QualityLevel::OVERSIZED_SINGLETON; }

ismaelsadeeq commented at 9:18 am on May 22, 2025:

This method’s name aligns with its functionality.

However with the addition of this commit I think there is ambiguity currently between the size oversize, the count oversize and both. This makes it a bit confusing to follow.

maybe separate the two.

oversize should be for size.

overlimit or something like it for both

sipa commented at 7:54 pm on May 22, 2025:

To be clear, Cluster::IsOversized() returns whether the Cluster is oversized in general. It’s just the case that changes to clusters which bring it over the size/count limit are never applied, so the only way an actually materialized Cluster can be oversized is if it’s a singleton transaction which is oversized on its own.

ismaelsadeeq commented at 8:42 pm on May 22, 2025:

To be clear, Cluster::IsOversized() returns whether the Cluster is oversized in general.

huh, I missed thanks for the clarity. I think you can add this comment as well or anything that would make this explicit and clearer will be appreciated.

sipa commented at 11:50 pm on May 22, 2025:

Expanded the comment to say that.

in src/txgraph.cpp:834 in bd95322455 outdated

830@@ -818,7 +831,7 @@ void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove
831     if (todo.None()) {
832         // If no further removals remain, and thus all removals were at the end, we may be able
833         // to leave the cluster at a better quality level.
834-        if (IsAcceptable(/*after_split=*/true)) {
835+        if (m_linearization.empty() || IsAcceptable(/*after_split=*/true)) {

ismaelsadeeq commented at 11:00 am on May 22, 2025:

if there is nothing in the cluster? why should it be marked as needs_split_acceptable? Also I see no crashes after removing the new or condition.

sipa commented at 7:57 pm on May 22, 2025:

You’re right, this doesn’t matter at all. The Cluster will just be deleted by Split anyway. Removed it.

in src/test/fuzz/txgraph.cpp:835 in 412a9e0e69 outdated

820+                auto removed = real->Trim();
821+                if (!was_oversized) {
822+                    assert(removed.empty());
823+                    break;
824+                }
825+                auto removed_set = top_sim.MakeSet(removed);

ismaelsadeeq commented at 12:38 pm on May 22, 2025:

assert that we trim when oversized ?

0                assert(!removed.empty());
1                auto removed_set = top_sim.MakeSet(removed);

sipa commented at 7:51 pm on May 22, 2025:

Done.

in src/test/fuzz/txgraph.cpp:270 in 412a9e0e69 outdated

265+            for (auto i : component) component_size += graph.FeeRate(i).size;
266+            is_oversized |= component_size > max_cluster_size;
267+            // Some element existed in a non-oversized cluster
268+            if (!is_oversized && set.Overlaps(component)) {
269+                return false;
270+            }

ismaelsadeeq commented at 12:56 pm on May 22, 2025:

check both ways

0            }
1            // No element exist in set and cluster is oversized 
2            if (is_oversized && !set.Overlaps(component)){
3               return false;
4            }

sipa commented at 7:56 pm on May 22, 2025:

Interesting, done! I have renamed the function to MatchesOversizedClusters and added a comment to clarify what it does.

I’ve also simplified it to just if (is_oversized != set.Overlaps(component)) return false;.

in src/txgraph.h:176 in 412a9e0e69 outdated

168@@ -169,6 +169,11 @@ class TxGraph
169      *  that appear identically in both. Use FeeFrac rather than FeePerWeight so CompareChunks is
170      *  usable without type-conversion. */
171     virtual std::pair<std::vector<FeeFrac>, std::vector<FeeFrac>> GetMainStagingDiagrams() noexcept = 0;
172+    /** Remove transactions (including their own descendants) according to a fast but best-effort
173+     *  strategy such that the TxGraph's cluster and size limits are respected. Applies to staging
174+     *  if it exists, and to main otherwise. Returns the list of all removed transactions in
175+     *  unspecified order. This has no effect unless the relevant graph is oversized. */
176+    virtual std::vector<Ref*> Trim() noexcept = 0;

ismaelsadeeq commented at 1:30 pm on May 22, 2025:

Indicate that it should not be called when their is an instance of block builder.

sipa commented at 7:55 pm on May 22, 2025:

The GetBlockBuilder function already states “While the returned object exists, no mutators on the main graph are allowed.”. I’m open to putting a comment to that effect on each of the mutators where it matters (incl. Trim), but I feel like it should be done consistently.

ismaelsadeeq commented at 2:30 pm on May 22, 2025: member

Code review up to 412a9e0e69ccf461554660b7c622b768cfb8ccb3

Also did some fuzzing and mutate the new code added and tests, craches occur as expected.

If a transaction is encountered whose addition would violate the limit, it is removed, together with all its descendants.

In real-world scenarios it is unlikely we encounter those transactions that will exceed the cluster count limit, however when it occur I assume that the newly connected block will have similar transaction with the reorged block. I think maintaining an in-memory data structure for these transactions removed due to the limit seperate from mempool will reduce bandwidth from re-questing the tx again and also avoid non-contextual checks?

We can get rid of them after the reorg is complete and they arent included in any block.

sipa force-pushed on May 22, 2025

sipa commented at 8:02 pm on May 22, 2025: member

In real-world scenarios it is unlikely we encounter those transactions that will exceed the cluster count limit, however when it occur I assume that the newly connected block will have similar transaction with the reorged block. I think maintaining an in-memory data structure for these transactions removed due to the limit seperate from mempool will reduce bandwidth from re-questing the tx again and also avoid non-contextual checks?

In theory, almost the entire mempool could be trimmed away (in a pathological case where the re-added block transactions link all mempool transaction together), so any approach for doing this would need to be able to handle picking just an acceptable-sized subset. It may make sense to just throw them into the compact block reconstruction extrapool though, up to a certain limit.

sipa force-pushed on May 22, 2025

in src/txgraph.h:250 in 0e59c73288 outdated

239@@ -240,8 +240,9 @@ class TxGraph
240     };
241 };
242 
243-/** Construct a new TxGraph with the specified limit on transactions within a cluster. That
244- *  number cannot exceed MAX_CLUSTER_COUNT_LIMIT. */
245-std::unique_ptr<TxGraph> MakeTxGraph(unsigned max_cluster_count) noexcept;
246+/** Construct a new TxGraph with the specified limit on the number of transactions within a cluster,
247+ *  and on the sum of transaction sizes within a cluster. max_cluster_count cannot exceed
248+ *  MAX_CLUSTER_COUNT_LIMIT. */

instagibbs commented at 5:07 pm on May 23, 2025:

nano-nit while we’re driving by: max_cluster_count should be > 0

sipa commented at 3:36 am on May 28, 2025:

Forgot to take this. Will leave open for now.

ismaelsadeeq commented at 3:18 pm on May 26, 2025: member

light code review ACK 538e9ff804f8dfba6f6a808e83572fdeab145ab8

in src/txgraph.cpp:2776 in 538e9ff804 outdated

2761+        Assume(deps_it == deps_by_parent.end());
2762+
2763+        // Build a heap of all transactions with 0 unmet dependencies.
2764+        std::make_heap(trim_heap.begin(), trim_heap.end(), cmp_fn);
2765+
2766+        // Iterate over to-be-included transactions. It is possible that the heap empties without

instagibbs commented at 2:39 pm on May 27, 2025:

https://github.com/instagibbs/bitcoin/tree/2025-05-applytrim

Worked on a test attempts to cover the claim here. It ended up being more complicated than that (claim a bit wrong?) due to the lazy application of dependencies. For this property to hold, I believe you have to touch the TxGraph in any way that applies dependencies when undersized, then transition to oversized graph for the circular dependencies to not occur in the implied graph.

Without the dependencies applied, the linearization order can be “nonsensical”, which then creates a contradictory dependency, iow cycle. This cycle ends up dropping sometimes the entire cluster ala

EhBGLQYGhDOENwYAr6+vr682AAYAAHoAAAAAAEr/EIQGBi0GBoQ0BjYQLUYGBoQzAEAzhAYGhDcG ADIABgAAMwAAMQCEBgY=

This can be tested by commenting out the CountDistinctClusters invocation in the new test.

The good news is this usage pattern is “natural”, so in practice each new transaction (package) will have a TxGraph with all dependencies applied except for the new package. It’s just weird and I’m unsure how to do better. Perhaps Trim could pre-emptively apply dependencies + FixLinearization for each grouped cluster.

In this graph, T2 is the implied parent of T1 thanks to to-be-applied dependencies not being applied yet, causing the trim heap to never get an entry:

sipa commented at 3:36 am on May 28, 2025:

I’ve written a test based on this, which randomly adds dependencies but in a “directed” sense that never introduces cycles (rather than just has 2 specific scenarios).

instagibbs commented at 2:46 pm on May 27, 2025: member

Reviewed through 538e9ff804f8dfba6f6a808e83572fdeab145ab8

Last big Q is about the cycles being generated in the implied graph, and what if anything we should do differently about it.

sipa force-pushed on May 28, 2025

DrahtBot added the label CI failed on May 28, 2025

DrahtBot commented at 3:32 am on May 28, 2025: contributor

🚧 At least one of the CI tasks failed. Task ARM, unit tests, no functional tests: https://github.com/bitcoin/bitcoin/runs/43009654762 LLM reason (✨ experimental): The failure is caused by a type mismatch in the call to std::max due to incompatible argument types.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot removed the label CI failed on May 28, 2025

sipa force-pushed on May 28, 2025

instagibbs commented at 2:13 pm on May 28, 2025: member

ACK https://github.com/bitcoin/bitcoin/pull/31553/commits/e1f501cab224608004e3a1eee7a48eec3cea25fc

I appreciate the additional coverage added; I know it’s not particularly blackbox but should catch the most basic regressions going forward.

DrahtBot requested review from ismaelsadeeq on May 28, 2025

ismaelsadeeq commented at 2:05 pm on May 29, 2025: member

reACK e1f501cab224608004e3a1eee7a48eec3cea25fc deab145ab8..e1f501cab2

Following the last light ACK, review comments by @instagibbs were addressed:

Increased fuzz test coverage
Clearer and corrected comments
An Assume statement was added

in src/txgraph.cpp:2612 in 1664ec567f outdated

2607+    // cluster linearization. So it can be seen as a "merge sort" of the chunks of the clusters,
2608+    // but respecting the dependencies being added.
2609+    //
2610+    // This rudimentary linearization is computed lazily, by putting all eligible (no unmet
2611+    // dependencies) transactions in a heap, and popping the highest-feerate one from it. This
2612+    // continues as long the number or size of all picked transactions together does not exceed the

glozow commented at 9:48 pm on June 13, 2025:

0    // continues as long as the number or size of all picked transactions together does not exceed the

sipa commented at 5:30 pm on June 17, 2025:

Done.

in src/txgraph.h:248 in 416a8d16d0 outdated

239@@ -240,8 +240,9 @@ class TxGraph
240     };
241 };
242 
243-/** Construct a new TxGraph with the specified limit on transactions within a cluster. That
244- *  number cannot exceed MAX_CLUSTER_COUNT_LIMIT. */
245-std::unique_ptr<TxGraph> MakeTxGraph(unsigned max_cluster_count) noexcept;
246+/** Construct a new TxGraph with the specified limit on the number of transactions within a cluster,

glozow commented at 7:41 pm on June 16, 2025:

Perhaps worth noting (for future visitors) that these two limits have pretty different purposes even though they both contribute to the oversizedness property. Max cluster count helps bound linearization complexity, but max cluster size’s purpose is largely external to TxGraph. It is only a parameter so that TxGraph can help enforce it.

sipa commented at 5:29 pm on June 17, 2025:

Yes and no.

I’ve come to think of them as both being a feature that TxGraph offers to the mempool/net_processing code to help enforce its policy limits. So in this regard, they are equivalent.

The reason for the existence of the policy limits differs however: weight is there to bound how good/predictable transaction quality can be through linearization (avoiding binpacking), while count is there to keep the TxGraphImpl implementation efficient. But it isn’t the case that this directly determines the count limit, otherwise e.g. more performant machines should have a higher limit for example.

glozow commented at 7:07 pm on June 17, 2025:

Fair enough, I suppose they are both external policy limits!

in src/txgraph.h:69 in 221de30171 outdated

63@@ -64,9 +64,9 @@ class TxGraph
64     virtual ~TxGraph() = default;
65     /** Construct a new transaction with the specified feerate, and return a Ref to it.
66      *  If a staging graph exists, the new transaction is only created there. feerate.size must be
67-     *  strictly positive, and cannot exceed the graph's max cluster size. In all further calls,
68-     *  only Refs created by AddTransaction() are allowed to be passed to this TxGraph object (or
69-     *  empty Ref objects). Ref objects may outlive the TxGraph they were created for. */
70+     *  strictly positive. In all further calls, only Refs created by AddTransaction() are allowed
71+     *  to be passed to this TxGraph object (or empty Ref objects). Ref objects may outlive the
72+     *  TxGraph they were created for. */

glozow commented at 3:22 pm on June 17, 2025:

in 221de30171acc1a20549079309b545612f727efb, do I understand correctly that we also don’t ever expect OVERSIZED_SINGLETON to occur in the wild (i.e. we wouldn’t accept any transactions above standard size in reorgs and max standard size > max cluster size), but TxGraph must handle this case as part of handling oversizedness in general?

sipa commented at 5:30 pm on June 17, 2025:

I believe that’s correct. @sdaftuar ?

From the commit message:

With this change, the responsibility for enforcing cluster size limits can be localized purely in TxGraph, without callers (and in particular, tests) needing to duplicate the enforcement for individual transactions.

glozow commented at 5:14 pm on June 17, 2025: member

Small doc things for now, still reviewing

sipa force-pushed on Jun 17, 2025

sipa commented at 5:33 pm on June 17, 2025: member

Addressed nit, and improved comments here and there.

Also two bigger changes:

There was a bug in the second Trim commit (the one that added UF), resulting in it not achieving the desired quality (if anyone has ideas for tests that exercise the improved quality UF brings, I’m all ears).
I’ve changed the conversion of unincluded transactions to singleton UF partitions to happen at the time they are popped from the heap, rather than when they are added. I think this makes more sense, as it means the heap is purely for unincluded but includable transactions, and the UF structure purely for included or rejected transactions.

Effect of the bugfix, in a benchmark that creates a merge-the-whole-mempool scenario (not included in the PR, because it doesn’t really fit in our bench framework):

Before fix:

Benchmarks (in ms):
- block_build: 21.7986
- oversize: 7.12257
- block_linearize: 33.0116
- mempool_build: 0.0691267
- trim: 10.4955
- final_linearize: 1.5371
- reset: 0.308329
Transaction counts:
- before trim: 64011
- after trim: 38743

After fix:

Benchmarks (in ms):
- block_build: 23.3451
- oversize: 7.70649
- block_linearize: 34.05
- mempool_build: 0.0766003
- trim: 12.6596
- final_linearize: 0.0175771
- reset: 0.371984
Transaction counts:
- before trim: 64011
- after trim: 63994

glozow commented at 6:54 pm on June 17, 2025: member

if anyone has ideas for tests that exercise the improved quality UF brings, I’m all ears

Maybe a starting point, I wrote a unit test to see the difference in trimming with 0b59916271c43cbedd14aafe88e5baf2dec2d97d: https://github.com/glozow/bitcoin/commits/2025-06-31553-test/

One test is a bunch of singleton transactions (e.g. from disconnected blocks) connected by 1 transaction (e.g. in the mempool).

Initial Trim implementation: trims all but the top max_cluster_count transactions of the parents
With union-find tracking multiple: just trims the child, which leaves everything as singleton clusters

The other test is a trellis (e.g. parents all from disconnected blocks, children all already in mempool)

Initial Trim implementation: trims half of the trellis (top and bottom), leaving just the prefix that is within max_cluster_count size
With union-find tracking multiple: just trims the middle transaction, splitting it into 2 trellises

There was a bug in the second Trim commit

I haven’t fully grokked this commit (there’s something wrong with my eyesight whenever I look at the code), but before the last push, the trellis test was evicting 1/4 of the transactions: https://github.com/glozow/bitcoin/commits/2025-06-31553-trellis/

sipa force-pushed on Jun 17, 2025

sipa commented at 9:05 pm on June 17, 2025: member

@glozow Awesome, thanks. I’ve included your commit (with minor changes) in this PR, and added my own 64000-transactions-merged-together test to it.

sipa force-pushed on Jun 18, 2025

instagibbs commented at 1:57 pm on June 18, 2025: member

@sipa to be clear, was the issue

0    -+            if (new_count > m_max_cluster_count || new_size > m_max_cluster_size) break;
1    ++            if (new_count > m_max_cluster_count || new_size > m_max_cluster_size) continue;

In that the optimization means we should keep on going in case we’re going to create/add to undersized clusters, but this break broke the optimization. In other words, the base performance was still intact?

sipa commented at 2:01 pm on June 18, 2025: member

@instagibbs Yes, exactly.

The commit was still an improvement, as it would only stop once an aggregate-of-clusters-so-far would exceed the limit, rather than stopping when the sum of all-clusters-so-far would exceed the limit (the behavior in the initial Trim commit). But with the continue; instead of break;, it isn’t stopping, it’ll just skip the cluster being aggregated, and its descendants.

in src/test/txgraph_tests.cpp:62 in 4ead162a46 outdated

57+    }
58+
59+    // Check that the graph is now oversized. This also forces the graph to
60+    // group clusters and compute the oversized status.
61+    BOOST_CHECK_EQUAL(graph->GetTransactionCount(), TOTAL_NUM_TX);
62+    BOOST_CHECK(graph->IsOversized(false));

instagibbs commented at 2:15 pm on June 18, 2025:

micro-nit: main_only could be annotated and true to make it clear we’re not looking at staging (even though it doesn’t exist)

sipa commented at 3:17 pm on June 18, 2025:

Done.

in src/test/txgraph_tests.cpp:235 in 4ead162a46 outdated

222+            }
223+        }
224+        bottom_refs.push_back(std::move(bottom_tx));
225+    }
226+
227+    // Now we are oversized (one cluster of 64011).

instagibbs commented at 2:17 pm on June 18, 2025:

nit: can just assert this number rather than put in comment

sipa commented at 3:17 pm on June 18, 2025:

I’ve added a computation for it, and assertion.

instagibbs approved

instagibbs commented at 2:18 pm on June 18, 2025: member

reACK 643fffafdc70da5fbaf1992c36965595b9e63f84

and the UF structure purely for included or rejected transactions

i.e. rejected here means “things with no unmet deps yet never included”

Validated in my mental model that bug fix and heap logic changes are correct. Reviewed new unit tests and makes sense that it would have caught the bug. It’s a bit further non-black-box, but they’re simple.

git range-diff master e1f501cab224608004e3a1eee7a48eec3cea25fc 643fffafdc70da5fbaf1992c36965595b9e63f84

sipa force-pushed on Jun 18, 2025

sipa commented at 3:18 pm on June 18, 2025: member

@instagibbs Done. Also expanded comments slightly, made unit test constant names a bit more consistent, and sprinkled some graph->SanityCheck(); calls over the tests.

instagibbs commented at 3:54 pm on June 18, 2025: member

reACK 2b2df98747fdb6380588991167ce2e8cb92f3bfb

sipa commented at 1:49 pm on June 30, 2025: member

Would there be interest in adding a benchmark for the Trim() operation here, or rather in a follow-up?

instagibbs commented at 3:27 pm on June 30, 2025: member

@sipa linked here, added to follow-up potentially?

sipa commented at 5:20 pm on July 1, 2025: member

@instagibbs @ismaelsadeeq See https://github.com/sipa/bitcoin/commit/0a8e1d6e39bd1ba6d3df149945b8dfd78e1fb7e6

It’s the same scenario as the txgraph_trim_huge unit test (1000 chains of 64 reorged transactions each, merged together by 11 mempool transactions into one giant would-be cluster, then Trim()ed down to something acceptable). Takes around 15 ms for me.

instagibbs commented at 5:46 pm on July 1, 2025: member

@sipa one order of magnitude more for me on my wimpy laptop but well within acceptable bounds I suspect for a reorg. IIUC this is one of the more pessimistic mempool topologies for this since you’re going to have to go through the ~entire mempool before finishing

ismaelsadeeq commented at 8:39 pm on July 1, 2025: member

Would there be interest in adding a benchmark for the Trim() operation here, or rather in a follow-up?

That will be nice, I don’t mind adding it here. I plan to do another round of review of the added tests and diff after my last review so if added will review it as well.

I try running the bench on the linked commit. MakeTxGraph seems to takes two parameters, but the bench call provided three arguments. I removed NUM_ACCEPTABLE_ITERS arg this is the result.

ns/op	op/s	err%	total	benchmark
12,750,500.00	78.43	0.0%	0.01	`TxGraphTrim`

sipa commented at 9:07 pm on July 1, 2025: member

@ismaelsadeeq Yeah, the benchmark is in my “all txgraph” branch which also includes #32263. Removing the NUM_ACCEPTABLE_ITERS variable should be fine.

in src/test/txgraph_tests.cpp:71 in 2b2df98747 outdated

66+    // Call Trim() to remove transactions and bring the cluster back within limits.
67+    auto removed_refs = graph->Trim();
68+    graph->SanityCheck();
69+    BOOST_CHECK(!graph->IsOversized(/*main_only=*/false));
70+
71+    // We only need to trim in the middle bottom transaction to end up with 2 clusters each within cluster limits.

glozow commented at 2:52 pm on July 2, 2025:

nit in 2b2df98747f

0    // We only need to trim the middle bottom transaction to end up with 2 clusters each within cluster limits.

sipa commented at 7:17 pm on July 2, 2025:

Done.

in src/txgraph.cpp:2748 in 2b2df98747 outdated

2743+                ++deps_it;
2744+            }
2745+            trim_it->m_parent_count = trim_it->m_deps_left;
2746+            // If this transaction has no unmet dependencies, and is not oversized, add it to the
2747+            // heap (just append for now, the heapification happens below).
2748+            if (trim_it->m_deps_left == 0 && trim_it->m_tx_size <= m_max_cluster_size) {

glozow commented at 6:24 pm on July 2, 2025:

I tested that Trim() always removes oversized singletons, even if there are no clusters to be merged, using a simple unit test

 0diff --git a/src/test/txgraph_tests.cpp b/src/test/txgraph_tests.cpp
 1index 9f9ad04ff37..d90afd8819d 100644
 2--- a/src/test/txgraph_tests.cpp
 3+++ b/src/test/txgraph_tests.cpp
 4@@ -248,4 +248,43 @@ BOOST_AUTO_TEST_CASE(txgraph_trim_huge)
 5     BOOST_CHECK(graph->GetTransactionCount() >= (NUM_TOP_CHAINS * NUM_TX_PER_TOP_CHAIN * 99) / 100);
 6 }
 7 
 8+BOOST_AUTO_TEST_CASE(txgraph_trim_big_singletons)
 9+{
10+    // Mempool consists of 100 singleton clusters; there are no dependencies. Some are oversized. Trim() should remove all of the oversized ones.
11+    static constexpr int MAX_CLUSTER_COUNT = 64;
12+    static constexpr int32_t MAX_CLUSTER_SIZE = 100'000;
13+    static constexpr int NUM_TOTAL_TX = 100;
14+
15+    // Create a new graph for the test.
16+    auto graph = MakeTxGraph(MAX_CLUSTER_COUNT, MAX_CLUSTER_SIZE);
17+
18+    // Add all transactions and store their Refs.
19+    std::vector<TxGraph::Ref> refs;
20+    refs.reserve(NUM_TOTAL_TX);
21+
22+    // Add all transactions. They are in individual clusters.
23+    for (unsigned int i = 0; i < NUM_TOTAL_TX; ++i) {
24+        // The 88th transaction is oversized.
25+        // Every 20th transaction is oversized.
26+        const FeePerWeight feerate{500 + i, (i == 88 || i % 20 == 0) ? MAX_CLUSTER_SIZE + 1 : 100};
27+        refs.push_back(graph->AddTransaction(feerate));
28+    }
29+    graph->SanityCheck();
30+
31+    // Check that the graph is now oversized. This also forces the graph to
32+    // group clusters and compute the oversized status.
33+    BOOST_CHECK(graph->IsOversized(/*main_only=*/false));
34+
35+    // Call Trim() to remove transactions and bring the cluster back within limits.
36+    auto removed_refs = graph->Trim();
37+    graph->SanityCheck();
38+    BOOST_CHECK_EQUAL(graph->GetTransactionCount(), NUM_TOTAL_TX - 6);
39+    BOOST_CHECK(!graph->IsOversized(/*main_only=*/false));
40+
41+    // Check that all the oversized transactions were removed.
42+    for (unsigned int i = 0; i < refs.size(); ++i) {
43+        BOOST_CHECK_EQUAL(graph->Exists(refs[i]), i != 88 && i % 20 != 0);
44+    }
45+}
46+
47 BOOST_AUTO_TEST_SUITE_END()
48diff --git a/src/txgraph.cpp b/src/txgraph.cpp
49index 1b29a05e534..26eef54d321 100644
50--- a/src/txgraph.cpp
51+++ b/src/txgraph.cpp
52@@ -2745,7 +2745,7 @@ std::vector<TxGraph::Ref*> TxGraphImpl::Trim() noexcept
53             trim_it->m_parent_count = trim_it->m_deps_left;
54             // If this transaction has no unmet dependencies, and is not oversized, add it to the
55             // heap (just append for now, the heapification happens below).
56-            if (trim_it->m_deps_left == 0 && trim_it->m_tx_size <= m_max_cluster_size) {
57+            if (trim_it->m_deps_left == 0) {
58                 trim_heap.push_back(trim_it);
59             }
60         }
61@@ -2813,7 +2813,7 @@ std::vector<TxGraph::Ref*> TxGraphImpl::Trim() noexcept
62                 new_size += ptr->m_uf_size;
63             }
64             // Skip the entry if this would violate any limit.
65-            if (new_count > m_max_cluster_count || new_size > m_max_cluster_size) continue;
66+            if (new_count > m_max_cluster_count) continue;
67 
68             // Union the partitions this transaction and all its dependencies are in together.
69             auto rep = &entry;

sipa commented at 7:13 pm on July 2, 2025:

Great, incorporated. I’m assuming the txgraph.cpp changes above are to demonstrate the test works?

glozow commented at 9:06 pm on July 2, 2025:

Yes, those shouldn’t be included!

glozow commented at 6:31 pm on July 2, 2025: member

ACK 2b2df98747fdb6380588991167ce2e8cb92f3bfb

Reviewed the code and the tests. I ran the bench to see that we spend a lot longer on the last commit since we are able to consider more potential clusters. Before:

0|               ns/op |                op/s |    err% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------:|:----------
2|       55,112,375.00 |               18.14 |    0.0% |      0.06 | `TxGraphTrim`

After:

0|               ns/op |                op/s |    err% |     total | benchmark
1|--------------------:|--------------------:|--------:|----------:|:----------
2|       71,068,959.00 |               14.07 |    0.0% |      0.07 | `TxGraphTrim`

txgraph: Add ability to configure maximum cluster size/weight (feature)

This is integrated with the oversized property: the graph is oversized when
any connected component within it contains more than the cluster count limit
many transactions, or when their combined size/weight exceeds the cluster size
limit.

It becomes disallowed to call AddTransaction with a size larger than this limit,
though this limit will be lifted in the next commit.

In addition, SetTransactionFeeRate becomes SetTransactionFee, so that we do not
need to deal with the case that a call to this function might affect the
oversizedness.

c4287b9b71

txgraph: Permit transactions that exceed cluster size limit (feature)

This removes the restriction added in the previous commit that individual
transactions do not exceed the max cluster size limit.

With this change, the responsibility for enforcing cluster size limits can
be localized purely in TxGraph, without callers (and in particular, tests)
needing to duplicate the enforcement for individual transactions.

19b14e61ea

txgraph: remove unnecessary m_group_oversized (simplification) eabcd0eb6f

txgraph: Add ability to trim oversized clusters (feature)

During reorganisations, it is possible that dependencies get add which
result in clusters that violate limits (count, size), when linking the
new from-block transactions to the old from-mempool transactions.

Unlike RBF scenarios, we cannot simply reject these policy violations
when they are due to received blocks. To accomodate this, add a Trim()
function to TxGraph, which removes transactions (including descendants)
in order to make all resulting clusters satisfy the limits.

In the initial version of the function added here, the following approach
is used:
- Lazily compute a naive linearization for the to-be-merged cluster (using
  an O(n log n) algorithm, optimized for far larger groups of transactions
  than the normal linearization code).
- Initialize a set of accepted transactions to {}
- Iterate over the transactions in this cluster one by one:
  - If adding the transaction to the set makes it exceed the max cluster size
    or count limit, stop.
  - Add the transaction to the set.
- Remove all transactions from the cluster that were not included in the set
  (note that this necessarily includes all descendants too, because they
  appear later in the naive linearization).

Co-authored-by: Greg Sanders <gsanders87@gmail.com>

a04e205ab0

txgraph: add unit test for TxGraph::Trim (tests)

Co-Authored-By: Pieter Wuille <pieter@wuille.net>

938e86f8fe

txgraph: add fuzz test scenario that avoids cycles inside Trim() (tests)

Trim internally builds an approximate dependency graph of the merged cluster,
replacing all existing dependencies within existing clusters with a simple
linear chain of dependencies. This helps keep the complexity of the merging
operation down, but may result in cycles to appear in the general case, even
though in real scenarios (where Trim is called for stitching re-added mempool
transactions after a reorg back to the existing mempool transactions) such
cycles are not possible.

Add a test that specifically targets Trim() but in scenarios where it is
guaranteed not to have any cycles. It is a special case, is much more a
whitebox test than a blackbox test, and relies on randomness rather than
fuzz input. The upside is that somewhat stronger properties can be tested.

Co-authored-by: Greg Sanders <gsanders87@gmail.com>

9c436ff01c

sipa force-pushed on Jul 2, 2025

sipa commented at 7:17 pm on July 2, 2025: member

Addressed nits, and also incorporated the benchmark in this PR.

txgraph: add Trim benchmark (benchmark) 4608df37e0

txgraph: Track multiple potential would-be clusters in Trim (improvement)

In the existing Trim function, as soon as the set of accepted transactions
would exceed the max cluster size or count limit, the acceptance loop is
stopped, removing all later transactions. However, it is possible that by
excluding some of those transactions the would-be cluster splits apart into
multiple would-clusters. And those clusters may well permit far more
transactions before their limits are reached.

Take this into account by using a union-find structure inside TrimTxData to
keep track of the count/size of all would-be clusters that would be formed
at any point, and only reject transactions which would cause these resulting
partitions to exceed their limits.

This is not an optimization in terms of CPU usage or memory; it just
improves the quality of the transactions removed by Trim().

1632fc104b

sipa force-pushed on Jul 2, 2025

DrahtBot added the label CI failed on Jul 2, 2025

DrahtBot commented at 8:12 pm on July 2, 2025: contributor

🚧 At least one of the CI tasks failed. Task tidy: https://github.com/bitcoin/bitcoin/runs/45240651157 LLM reason (✨ experimental): Clang-tidy detected errors due to inclusion of deprecated headers, causing CI failure.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

instagibbs approved

instagibbs commented at 8:27 pm on July 2, 2025: member

reACK 1632fc104be8f171f59a023800b2f9f20d0a3cff

Includes the benchmark and unit test for trimming singletons, and a text cleanup

git range-diff master 2b2df98747fdb6380588991167ce2e8cb92f3bfb 1632fc104be8f171f59a023800b2f9f20d0a3cff

DrahtBot requested review from glozow on Jul 2, 2025

in src/test/txgraph_tests.cpp:60 in 938e86f8fe outdated

55+    for (unsigned int i = 0; i < NUM_BOTTOM_TX; ++i) {
56+        graph->AddDependency(/*parent=*/refs[NUM_BOTTOM_TX + i], /*child=*/refs[i]);
57+        graph->AddDependency(/*parent=*/refs[NUM_BOTTOM_TX + i + 1], /*child=*/refs[i]);
58+    }
59+
60+    // Check that the graph is now oversized. This also forces the graph to

instagibbs commented at 8:30 pm on July 2, 2025:

fwiw I commented out the IsOversized check and the test passes still as expected, with singletons

glozow commented at 9:18 pm on July 2, 2025:

True, there isn’t actually anything to group in this test. Just copied from the other tests.

glozow commented at 9:19 pm on July 2, 2025: member

reACK 1632fc104be via range-diff

DrahtBot removed the label CI failed on Jul 2, 2025

in src/test/txgraph_tests.cpp:55 in 938e86f8fe outdated

50+    }
51+
52+    // Create the zigzag dependency structure.
53+    // Each transaction in the bottom row depends on two adjacent transactions from the top row.
54+    graph->SanityCheck();
55+    for (unsigned int i = 0; i < NUM_BOTTOM_TX; ++i) {

ismaelsadeeq commented at 12:39 pm on July 3, 2025:

We start being oversized at i = 24 iteration.

ismaelsadeeq approved

ismaelsadeeq commented at 12:22 pm on July 4, 2025: member

reACK 1632fc104be8f171f59a023800b2f9f20d0a3cff 🛰️

glozow merged this on Jul 7, 2025

glozow closed this on Jul 7, 2025

cluster mempool: add TxGraph reorg functionality #31553

Code Coverage & Benchmarks

Reviews

Conflicts

LLM Linter (✨ experimental)