cluster mempool: introduce TxGraph

sipa commented at 3:59 pm on November 24, 2024: member

Part of cluster mempool: #30289.

1. Overview

This introduces the TxGraph class, which encapsulates knowledge about the (effective) fees, sizes, and dependencies between all mempool transactions, but nothing else. In particular, it lacks knowledge about CTransaction, inputs, outputs, txids, wtxids, prioritization, validatity, policy rules, and a lot more. Being restricted to just those aspects of the mempool makes the behavior very easy to fully specify (ignoring the actual linearizations produced), and write simulation-based tests for (which are included in this PR).

2. Interface

The interface can be largely categorized into:

Mutation functions:
- AddTransaction (add a new transaction with specified feerate, and get a Ref object back to identify it).
- RemoveTransaction (given a Ref object, remove the transaction).
- AddDependency (given two Ref objects, add a dependency between them).
- SetTransactionFee (modify the fee associated with a Ref object).
Inspector functions:
- GetAncestors (get the ancestor set in the form of Ref* pointers)
- GetAncestorsUnion (like above, but for the union of ancestors of multiple Ref* pointers)
- GetDescendants (get the descendant set in the form of Ref* pointers)
- GetDescendantsUnion (like above, but for the union of ancestors of multiple Ref* pointers)
- GetCluster (get the connected component set in the form of Ref* pointers, in the order they would be mined).
- GetIndividualFeerate (get the feerate of a transaction)
- GetChunkFeerate (get the mining score of a transaction)
- CountDistinctClusters (count the number of distinct clusters a list of Refs belong to)
Staging functions:
- StartStaging (make all future mutations operate on a proposed transaction graph)
- CommitStaging (apply all the changes that are staged)
- AbortStaging (discard all the changes that are staged)
Miscellaneous functions:
- DoWork (do queued-up computations now, so that future operations are fast)

This TxGraph::Ref type used as a “handle” on transactions in the graph can be inherited from, and the idea is that in the full cluster mempool implementation (#28676, after it is rebased on this), CTxMempoolEntry will inherit from it, and all actually used Ref objects will be CTxMempoolEntrys. With that, the mempool code can just cast any Ref* returned by txgraph to CTxMempoolEntry*.

3. Implementation

Internally the graph data is kept in clustered form (partitioned into connected components), for which linearizations are maintained and updated as needed using the cluster_linearize.h algorithms under the hood, but this is hidden from the users of this class. Implementation-wise, mutations are generally applied lazily, appending to queues of to-be-removed transactions and to-be-added dependencies, so they can be batched for higher performance. Inspectors will generally only evaluate as much as is needed to answer queries, with roughly 5 levels of processing to go to fully instantiated and acceptable cluster linearizations, in order:

ApplyRemovals (take batches of to-be-removed transactions and translate them to “holes” in the corresponding Clusters/DepGraphs).
SplitAll (creating holes in Clusters may cause them to break apart into smaller connected components, so make turn them into separate Clusters/linearizations).
GroupClusters (figure out which Clusters will need to be combined in order to add requested to-be-added dependencies, as these may span clusters).
ApplyDependencies (actually merge Clusters as precomputed by GroupClusters, and add the dependencies between them).
MakeAcceptable (perform the LIMO linearization algorithm on Clusters to make sure their linearizations are acceptable).

4. Future work

This is only an initial version of TxGraph, and some functionality is missing before #28676 can be rebased on top of it:

The ability to get comparative feerate diagrams before/after for the set of staged changes (to evaluate RBF incentive-compatibility).
Mining interface (ability to iterate transactions quickly in mining score order) (see #31444).
Eviction interface (reverse of mining order, plus memory usage accounting) (see #31444).
Ability to fix oversizedness of clusters (before or after committing) - this is needed for reorgs where aborting/rejecting the change just is not an option (see #31553).
Interface for controlling how much effort is spent on LIMO. In this PR it is hardcoded.

Then there are further improvements possible which would not block other work:

Making Cluster a virtual class with different implementations based on transaction count (which could dramatically reduce memory usage, as most Clusters are just a single transaction, for which the current implementation is overkill).
The ability to have background thread(s) for improving cluster linearizations.

DrahtBot commented at 3:59 pm on November 24, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31363.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	instagibbs, ajtowns, ismaelsadeeq, glozow
Concept ACK	jonatack

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#30605 (Cluster linearization: separate tests from tests-of-tests by sipa)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

sipa force-pushed on Nov 24, 2024

DrahtBot commented at 4:20 pm on November 24, 2024: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/33443336095

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot added the label CI failed on Nov 24, 2024

sipa force-pushed on Nov 24, 2024

DrahtBot removed the label CI failed on Nov 24, 2024

sipa force-pushed on Nov 25, 2024

glozow added the label Mempool on Nov 28, 2024

sipa force-pushed on Dec 2, 2024

sipa force-pushed on Dec 4, 2024

jonatack commented at 2:22 pm on December 5, 2024: member

Concept ACK

in src/txgraph.cpp:258 in 18184f4e73 outdated

225+        Ref* m_ref;
226+        /** Which Cluster and position therein this Entry appears in. ([0] = main, [1] = staged). */
227+        Locator m_locator[MAX_LEVELS];
228+        /** The chunk feerate of this transaction in main (if present in m_locator[0]) */
229+        FeeFrac m_main_chunk_feerate;
230+        /** The position this transaction in the main linearization (if present). /*/

instagibbs commented at 3:25 pm on December 18, 2024:

0        /** The position this transaction in the main linearization (if present). **/

sipa commented at 10:31 pm on February 12, 2025:

Done.

sipa force-pushed on Dec 20, 2024

sipa commented at 4:33 pm on December 20, 2024: member

I’m aware this is a big chunk of code, but perhaps this can aid review a bit.

The biggest commits is txgraph: (feature) add initial version. To get an idea of what is going on, I think it’s useful to look (in order) at:

The src/txgraph.h file added in that commit, as it defines the public interface.
The src/test/fuzz/txgraph.cpp test added in the next commit (txgraph: (tests) add simulation fuzz test), as it does to an extent show how the interface is used, and what is expected of it.
The data structures (TxGraphImpl and Cluster) in src/txgraph.cpp, and the internal functions corresponding to the 5 processing steps that happen:
- Batch removal of queued transaction removals: TxGraphImpl::ApplyRemovals and Cluster::ApplyRemovals.
- Splitting of clusters into components after removing transactions: TxGraphImpl::SplitAll, TxGraphImpl::Split, and Cluster::Split
- Computing which clusters will need to be combined due to queued dependencies being added between them: TxGraphImpl::GroupClusters.
- Merging of clusters and applying queued dependencies to them: TxGraphImpl::ApplyDependencies, TxGraphImpl::Merge, Cluster::Merge, Cluster::ApplyDependencies.
- Making the linearization of clusters acceptable: TxGraphImpl::MakeAcceptable and Cluster::Relinearize.
The interface implementation functions that are built on top in TxGraphImpl: AddTransaction, RemoveTransaction, AddDependency, SetTransactionFee, Cleanup, GetAncestors, GetDescendants, GetCluster, GetChunkFeerate, etc.

sipa force-pushed on Dec 22, 2024

sipa commented at 5:23 pm on December 22, 2024: member

I have pushed a substantial change to the TxGraphImpl::GroupClusters function (in commit “txgraph: (feature) add initial version”). In a benchmark with 64000 transactions being combined into 1000 clusters, this makes it go from ~160ms to ~16ms. I will add the benchmark to this PR when I clean it up.

ismaelsadeeq commented at 7:33 pm on December 24, 2024: member

Concept ACK

GetChunkFeerate (get the mining score of a transaction)

From reading the description and skimming through the code, IIUC TxGraph encapsulates knowledge about fees and sizes but lacks knowledge about prioritization. This implies that the fee it uses is the effective fee, not the modified fee that includes prioritization.

How are we accounting for the real mining score of a Ref without that prioritization knowledge? The current mining algorithm uses the modified ancestor fee to calculate the mining score.

Could you clarify how prioritization will be handled post-cluster mempool?

sipa commented at 7:41 pm on December 24, 2024: member

@ismaelsadeeq TxGraph (and DepGraph, and FeeFrac) just treat “fee” and “size” as numbers whose ratio is to be maximized. These classes don’t care what meaning they correspond to.

In practice, I expect that the mempool code will provide the modified fee as “fee”, and vsize or weight as “size”.

ismaelsadeeq commented at 8:37 pm on December 24, 2024: member

Thank you for clarifying. Reading ’lacks knowledge about prioritization’ led me to infer that. But I see now you are talking about mapDeltas

sipa force-pushed on Jan 8, 2025

in src/txgraph.h:124 in 972df15af2 outdated

134-    virtual bool IsOversized() noexcept = 0;
135-    /** Get the feerate of the chunk which transaction arg is in. Returns the empty FeeFrac if arg
136-     *  does not exist. The graph must not be oversized. */
137-    virtual FeeFrac GetChunkFeerate(const Ref& arg) noexcept = 0;
138+    virtual bool IsOversized(bool main_only = false) noexcept = 0;
139+    /** Get the feerate of the chunk which transaction arg is in in the main graph. Returns the

ismaelsadeeq commented at 6:28 pm on January 9, 2025:

nit when retouching

In " txgraph: (feature) add staging support" 972df15af28ba6787f096162f776b2973342e674

0    /** Get the feerate of the chunk which transaction arg is in the main graph. Returns the

sipa commented at 9:49 pm on January 9, 2025:

Done.

in src/txgraph.h:142 in 0c8dc2323e outdated

104+    virtual FeeFrac GetChunkFeerate(const Ref& arg) noexcept = 0;
105+    /** Get the individual transaction feerate of transaction arg. Returns the empty FeeFrac if
106+     *  arg does not exist. */
107+    virtual FeeFrac GetIndividualFeerate(const Ref& arg) noexcept = 0;
108+    /** Get pointers to all transactions in the connected component ("cluster") which arg is in.
109+     *  The transactions will be returned in a topologically-valid order of acceptable quality.

ismaelsadeeq commented at 6:29 pm on January 9, 2025:

optional nit when their is need to retouch

In “txgraph: (feature) add initial version” 0c8dc2323eb1ec34357a807f0860cf0a08a63a75 you mean QualityLevel::ACCEPTABLE right we could just reference the enum like you were doing txgraph.cpp?

0     *  The transactions will be returned in a topologically-valid order of acceptable quality (QualityLevel::ACCEPTABLE).

sipa commented at 8:07 pm on January 9, 2025:

I consider the QualityLevel enum to be an implementation detail of the txgraph module that should not leak into its interface.

However, I guess it would be helpful to specify more clearly what acceptable mean. I’ll think about that; it’s hard to define.

instagibbs commented at 4:23 pm on February 5, 2025:

I’m unsure the “Acceptable” word helps much here. It’s just whatever quality it is, at least topologically valid.

instagibbs commented at 6:59 pm on March 4, 2025:

Would it go too far to assure the caller that the ordering reflects the underlying stored linearization i.e. reflects the underlying ordering values like chunk feerates are coming from?

ajtowns commented at 6:02 am on March 14, 2025:

Doesn’t “topologically-valid order” already cover what “acceptable quality” means here, as far as the interface is concerned?

ismaelsadeeq commented at 9:53 am on March 20, 2025:

“acceptable quality” is more than just a topologically valid order. It refers to a linearization that may not be optimal but is the best we can achieve for that cluster within resource limitations.

sipa commented at 3:37 am on March 22, 2025:

This has been rewritten.

in src/cluster_linearize.h:1372 in 1c5f1c4601 outdated

1353+        // Figure out which elements elem needs to be placed before.
1354+        SetType place_before = done & depgraph.Ancestors(elem);
1355+        // Find which position to place elem in (updating j), continuously moving the elements
1356+        // in between forward.
1357+        while (place_before.Any()) {
1358+            auto to_swap = linearization[len - 1 - (j - 1)];

ismaelsadeeq commented at 6:35 pm on January 9, 2025:

In “clusterlin: add FixLinearization function + fuzz test” 1c5f1c4601e0a895258cfe073125361f7a6ea012 Is it safe to get the ClusterIndex when j is 0, we will subtract 1 from uint32_t which will result to a large positive number?

sipa commented at 8:13 pm on January 9, 2025:

j cannot be 0. If it were, there is necessarily nothing elem needs to be moved before anymore, and place_before would be empty, and the loop would have terminated.

Will add an Assume(j > 0); on the next push.

sipa commented at 2:52 pm on January 23, 2025:

Done.

yancyribbens commented at 0:48 am on February 26, 2025:

The comment // j cannot be 0 here; implies this should be j != 0 instead of j > 0. Since it’s unsigned j != 0 is also correct.

sipa force-pushed on Jan 9, 2025

sipa commented at 9:49 pm on January 9, 2025: member

A few changes:

Added a TxGraph::DoWork() functions which performs queued-up computations now, so that future operations are fast.
Make Ref& arguments to TxGraph::AddDependency, TxGraph::RemoveTransaction, and TxGraph::SetTransactionFee const. They do not modify the Ref itself; they only modify what the Ref points to.
Address review comments by @ismaelsadeeq here, and @theuni on #31553.

sipa force-pushed on Jan 16, 2025

sipa commented at 9:14 pm on January 16, 2025: member

Added an additional inspector function:

CountDistinctClusters, which efficiently counts how many distinct clusters a list of specified Refs belong to, for helping with enforcing RBF cluster policy limits.

sipa force-pushed on Jan 22, 2025

sipa commented at 7:59 pm on January 22, 2025: member

I split out two optimizations to TxGraphImpl::GroupClusters() from the initial commit into their own commits.

Pumpkin1234567812 approved

sipa commented at 5:28 pm on January 23, 2025: member

A suggestion for simplifying the removed/destroyed/unlinked/cleanup interface.

Unlinking only happens when the calling code destroys a Ref. Currently, unlinking can happen either through Ref destruction, or through TxGraph::Cleanup() (which unlinks anything not appearing in the graph anymore).
The TxGraphImpl::m_entries vector compaction happens transparently, whenever applicable; there is no TxGraph::Cleanup() any more. This can only happen after the corresponding Ref was destroyed (because that’s now the only way to unlink).
To assist the calling code with cleaning up Refs, a new TxGraph::GetRemoved() function is added, which is a pure inspector, and reports which Refs do not appear in either main or staging (if present) anymore.

instagibbs commented at 5:35 pm on January 23, 2025: member

Unlinking only happens when the calling code destroys a Ref. Currently, unlinking can happen either through Ref destruction, or through TxGraph::Cleanup() (which unlinks anything not appearing in the graph anymore).

this would result in:

exists
removed
unlinked + destroyed

as the three possible states?

sipa commented at 5:38 pm on January 23, 2025: member

@instagibbs Right. Though “removed/exists” is really a per-graph (main vs staging) thing, while destroyed+unlinked are TxGraph-wide.

Maybe it should not be called GetRemoved(), as that term is ambiguous. How about GetUnusedRefs or so?

sipa commented at 9:42 pm on January 23, 2025: member

Actually, having this GetUnusedRefs() is pretty annoying (read: would require extra per-transaction memory) to be efficiently implementable, and talking to @sdaftuar it does not seem important if Trim() can report what it is deleting (because that’s really the only place where TxGraphImpl makes a decision about which transactions belong to the graph).

sipa force-pushed on Jan 24, 2025

sipa commented at 10:14 pm on January 24, 2025: member

Some changes:

Cleanup is gone. Compaction now happens automatically and transparently, but the caller is responsible for destroying Refs when they are no longer needed.
Avoid re-Group-ing when trying to determine oversizedness while it is already known.

DrahtBot commented at 11:23 pm on January 24, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36148884388

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot added the label CI failed on Jan 24, 2025

sipa force-pushed on Jan 26, 2025

sipa commented at 4:41 am on January 26, 2025: member

Split out a memory optimization in GroupClusters to its own commit.
Moved caching of oversizedness to its own commit.

sipa force-pushed on Jan 26, 2025

sipa commented at 8:00 pm on January 26, 2025: member

Remove the TxGraph::Ref::operator bool(); it has little use now that cleanup is implicit.
Add assertion that cluster count limit is at least 1.
Modify fuzz test to use TxGraph::Ref pointers everywhere rather than references; I realized the existing (test) code was not entirely kosher (could sometimes use a reference to an object that was already destroyed).

DrahtBot removed the label CI failed on Jan 26, 2025

in src/test/fuzz/cluster_linearize.cpp:1148 in 83d61884b4 outdated

1143+        uint64_t val{0};
1144+        try {
1145+            reader >> VARINT(val);
1146+        } catch (const std::ios_base::failure&) {}
1147+        val %= todo.Count();
1148+        // Find which element in todo that corresponds to.

ismaelsadeeq commented at 6:59 pm on January 27, 2025:

In 83d61884b41b2a2be620fbe35736b433c93d4c76

I find it a bit difficult to follow this comment

/** How long the prefix of the constructed linearization is which is topological. */

??

    // Find which element in todo that corresponds to.

??

sipa commented at 10:04 pm on January 31, 2025:

I have largely rewritten this, and expanded comments. Please have a look and let me know if it is clearer now.

ismaelsadeeq commented at 10:08 pm on February 10, 2025:

Yes this is much better now! Thanks

in src/txgraph.cpp:32 in 6741590e2a outdated

23+class TxGraphImpl;
24+
25+/** Position of a ClusterIndex within a Cluster::m_linearization. */
26+using LinearizationIndex = uint32_t;
27+/** Position of a Cluster within Graph::m_clusters. */
28+using ClusterSetIndex = uint32_t;

ismaelsadeeq commented at 7:58 pm on January 27, 2025:

In “txgraph: (feature) add initial version” 6741590e2a4632c4f9b4655679df9694223beac4

In cluster_linearize, we have ClusterIndex, which is defined as the “Data type to represent transaction indices in clusters.” (DepGraph implicitly is a cluster without the linearization and the TxGraph context that Cluster class introduced in this commit has.)

Since we now have a separate Cluster class, would it be clearer to rename ClusterIndex to DepGraphIndex? The type alias names and class names are overlapping

sipa commented at 10:04 pm on January 31, 2025:

Done.

in src/txgraph.cpp:34 in 6741590e2a outdated

25+/** Position of a ClusterIndex within a Cluster::m_linearization. */
26+using LinearizationIndex = uint32_t;
27+/** Position of a Cluster within Graph::m_clusters. */
28+using ClusterSetIndex = uint32_t;
29+
30+/** Quality levels for cached linearizations. */

ismaelsadeeq commented at 8:04 pm on January 27, 2025:

In “txgraph: (feature) add initial version” 6741590e2a4632c4f9b4655679df9694223beac4

0/** Quality levels for cached cluster linearizations. */

sipa commented at 10:02 pm on January 31, 2025:

Done.

in src/txgraph.h:15 in 15da6fdd19 outdated

11@@ -12,8 +12,7 @@
12 #ifndef BITCOIN_TXGRAPH_H
13 #define BITCOIN_TXGRAPH_H
14 
15-/** No connected component within TxGraph is allowed to exceed this number of transactions. */
16-static constexpr unsigned CLUSTER_COUNT_LIMIT{64};
17+static constexpr unsigned MAX_CLUSTER_COUNT_LIMIT{64};

ismaelsadeeq commented at 9:21 pm on January 27, 2025:

In “txgraph: (feature) make max cluster count configurable and “oversize” state” 15da6fdd19813a08bf9571147f97e22b5390ecf9

Maybe comment why 64 selected and how having limit above this will hurt the node/miner? fwiw we have seen clusters with size more than 64 in the past https://bitcoincore.reviews/25038, but I think it’s not a regular occurrence.

sipa commented at 10:04 pm on January 31, 2025:

This code is unused, the current value just affects testing, but it can literally be set to any positive integer, and the code will work (though at degraded performance for higher values). The real question is what to set the max_cluster_count value to in MakeTxGraph(), but I think that discussion belongs in #28676.

sipa force-pushed on Jan 30, 2025

sipa commented at 11:47 pm on January 30, 2025: member

Introduce vsize and weight tagged versions of FeeFrac, to avoid accidental type confusion.
Convert all of txgraph to work using FeePerWeight, rather than bare FeeFrac.

DrahtBot added the label CI failed on Jan 31, 2025

sipa force-pushed on Jan 31, 2025

DrahtBot removed the label CI failed on Jan 31, 2025

sipa force-pushed on Jan 31, 2025

sipa force-pushed on Feb 1, 2025

sipa force-pushed on Feb 4, 2025

DrahtBot added the label CI failed on Feb 4, 2025

in src/txgraph.cpp:315 in 68435c7a4f outdated

308+        auto& entry = m_entries[idx];
309+        Assume(entry.m_ref != nullptr);
310+        entry.m_ref = &new_location;
311+    }
312+
313+    /** Only called by Ref::~Ref to unlink Refs. */

ismaelsadeeq commented at 9:27 pm on February 4, 2025:

In “txgraph: (feature) add initial version” c89d147209c91bb0464321f5bc733a4eeab0dea0

It is also called by the move assignment operator of Ref

sipa commented at 8:33 pm on February 11, 2025:

Fixed.

DrahtBot removed the label CI failed on Feb 4, 2025

in src/txgraph.cpp:547 in c89d147209 outdated

542+
543+void TxGraphImpl::ApplyRemovals() noexcept
544+{
545+    auto& to_remove = m_to_remove;
546+    // Skip if there is nothing to remove.
547+    if (to_remove.empty()) return;

ismaelsadeeq commented at 3:12 pm on February 5, 2025:

In “txgraph: (feature) add initial version” c89d147209c91bb0464321f5bc733a4eeab0dea0

We can check whether m_to_remove is empty first so we don’t have to copy the reference when it is, the copied reference is also only used in this if statement, so we can just use the m_to_remove value

sipa commented at 8:35 pm on February 11, 2025:

Creating a reference doesn’t cost anything (in simple cases); it’s just introducing a new name for an existing object. It’s true that it’s kind of pointless at this stage, but it’ll be used in “group per-graph data in ClusterSet”, and in “add staging support”.

in src/cluster_linearize.h:1355 in 1fb067066e outdated

1335@@ -1336,6 +1336,38 @@ std::vector<ClusterIndex> MergeLinearizations(const DepGraph<SetType>& depgraph,
1336     return ret;
1337 }
1338 
1339+/** Make linearization topological, retaining its ordering where possible. */
1340+template<typename SetType>
1341+void FixLinearization(const DepGraph<SetType>& depgraph, Span<ClusterIndex> linearization) noexcept
1342+{
1343+    // This algorithm can be summarized as moving every element in the linearization backwards
1344+    // until it is placed after all this ancestors.

instagibbs commented at 3:20 pm on February 5, 2025:

0    // until it is placed after all its ancestors.

sipa commented at 10:31 pm on February 12, 2025:

Done.

in src/cluster_linearize.h:1364 in 1fb067066e outdated

1348+    for (ClusterIndex i = 0; i < len; ++i) {
1349+        /** The element at that position. */
1350+        ClusterIndex elem = linearization[len - 1 - i];
1351+        /** j represents how far from the back of the linearization elem should be placed. */
1352+        ClusterIndex j = i;
1353+        // Figure out which elements elem needs to be placed before.

instagibbs commented at 3:29 pm on February 5, 2025:

0        // Figure out which elements elem needs to be placed after.

we are putting the other elements “before”

sipa commented at 10:31 pm on February 12, 2025:

Done.

in src/cluster_linearize.h:1371 in 1fb067066e outdated

1355+        // Find which position to place elem in (updating j), continuously moving the elements
1356+        // in between forward.
1357+        while (place_before.Any()) {
1358+            // j cannot be 0 here; if it was, then there was necessarily nothing earlier which
1359+            // elem needs to be place before anymore, and place_before would be empty.
1360+            Assume(j > 0);

instagibbs commented at 3:40 pm on February 5, 2025:

in release builds this will result in j-- wrapping around to max value of uint32_t, causing an OOB access? We should probably just stop trying to fix things if it happens?

sipa commented at 7:09 pm on February 12, 2025:

:man_shrugging: I hope the code is clear enough that reviewers are convinced this is impossible. The Assume hopefully helps with that convincing. I hope an assertion isn’t needed to convince you even more.

in src/test/fuzz/cluster_linearize.cpp:410 in a921ae75c2 outdated

406@@ -407,7 +407,7 @@ FUZZ_TARGET(clusterlin_depgraph_serialization)
407     SanityCheck(depgraph);
408 
409     // Verify the graph is a DAG.
410-    assert(IsAcyclic(depgraph));

instagibbs commented at 3:56 pm on February 5, 2025:

a921ae75c2f7162b3b617ddeccb84e3e60728cf8

Should we add a test that completes a cycle then asserts it’s not acyclic to add coverage that way?

sipa commented at 10:32 pm on February 12, 2025:

Done.

in src/CMakeLists.txt:304 in c89d147209 outdated

279@@ -280,6 +280,7 @@ add_library(bitcoin_node STATIC EXCLUDE_FROM_ALL
280   signet.cpp
281   torcontrol.cpp
282   txdb.cpp
283+  txgraph.cpp

instagibbs commented at 4:02 pm on February 5, 2025:

c89d147209c91bb0464321f5bc733a4eeab0dea0 in commit message: “… and so for.” and so on?

sipa commented at 10:32 pm on February 12, 2025:

Done.

ajtowns commented at 4:54 am on March 14, 2025:

(“and so forth” alternatively)

in src/txgraph.h:45 in c89d147209 outdated

40+        GraphIndex m_index = GraphIndex(-1);
41+    public:
42+        /** Construct an empty Ref. Non-empty Refs can only be created using
43+         *  TxGraph::AddTransaction. */
44+        Ref() noexcept = default;
45+        /** Destroy this Ref. This is only allowed when it is empty, or the transaction it refers

instagibbs commented at 4:07 pm on February 5, 2025:

This is only allowed when it is empty

Question: how is this enforced?

sipa commented at 6:59 pm on February 12, 2025:

You are eaten by a grue (specifically, by His Unpredictableness, the Grue of Undefined Behavior).

ajtowns commented at 5:20 am on March 14, 2025:

Why does it say earlier “Destroying this TxGraph::Ref removes the corresponding transaction” but here “[destroying this ref] is only allowed when .. the transaction it refers to has been removed from the graph” ? I think the destructor “unlinks” but does not “remove” the tx, so failure to remove before destructing will result in an Assume(!entry.m_locator.IsPresent()) failure in Compact().

sipa commented at 9:09 pm on March 19, 2025:

@ajtowns That comment was put in the wrong commit. I’ve fixed it.

in src/txgraph.h:85 in c89d147209 outdated

79+     * its ancestors, are removed as well (which is what always happens in realistic scenarios),
80+     * this reordering will not affect the behavior of TxGraph.
81+     *
82+     * As an example, imagine 3 transactions A,B,C where B depends on A. If a dependency of C on B
83+     * is added, and then B is deleted, C will still depend on A. If the deletion of B is reordered
84+     * before the C->B dependency is added, it has no effect instead. If, together with the

instagibbs commented at 4:14 pm on February 5, 2025:

comment clarity: “it has no effect” meaning the dependency addition?

sipa commented at 10:32 pm on February 12, 2025:

I’ve reworded this a bit.

in src/txgraph.cpp:159 in c89d147209 outdated

154+    /** Which dependencies are to be added ((parent,child) pairs). GroupData::m_deps_offset indexes
155+     *  into this. */
156+    std::vector<std::pair<GraphIndex, GraphIndex>> m_deps_to_add;
157+    /** Information about the merges to be performed, if known. */
158+    std::optional<std::vector<GroupEntry>> m_group_data = std::vector<GroupEntry>{};
159+    /** Total number of transactions in this ClusterSet (explicit + implicit). */

instagibbs commented at 4:59 pm on February 5, 2025:

c89d147209c91bb0464321f5bc733a4eeab0dea0

what does implicit mean here?

sipa commented at 10:33 pm on February 12, 2025:

It’s something that really only has meaning after staging support is added. I have dropped it here, expanded the comment a bit, and expanded on it a lot more in the staging commit.

in src/txgraph.cpp:252 in c89d147209 outdated

177+
178+    /** A class of objects held internally in TxGraphImpl, with information about a single
179+     *  transaction. */
180+    struct Entry
181+    {
182+        /** Pointer to the corresponding Ref object, if any. */

instagibbs commented at 5:12 pm on February 5, 2025:

c89d147209c91bb0464321f5bc733a4eeab0dea0

could mention nullptr -> “unlinked”

sipa commented at 10:33 pm on February 12, 2025:

Done.

sipa force-pushed on Feb 6, 2025

sipa commented at 4:30 am on February 6, 2025: member

Changed:

Added GetAncestorsUnion and GetDescendantsUnion, which act like their non-union version, but take in a span of multiple Ref*, and return the Ref* for the union of their ancestors/descendants

(requested by @sdaftuar while rebasing #28676)

in src/txgraph.cpp:110 in 781c15bfca outdated

82+    /** Push changes to Cluster and its linearization to the TxGraphImpl Entry objects. */
83+    void Updated(TxGraphImpl& graph) noexcept;
84+
85+    // Functions that implement the Cluster-specific side of internal TxGraphImpl mutations.
86+
87+    /** Apply any number of removals from the front of to_remove, popping them off. At least one

instagibbs commented at 5:25 pm on February 6, 2025:

781c15bfca1ebaffe7b634196e19144f5ab10a50

Wasn’t immediately clear what the expected behavior is based on the comment. It will apply and pop off the entries where the prefix of to_remove is all owned by the cluster itself?

sipa commented at 10:33 pm on February 12, 2025:

I have rewritten this.

in src/txgraph.cpp:596 in 781c15bfca outdated

301+}
302+
303+void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove) noexcept
304+{
305+    // Iterate over the prefix of to_remove that applies to this cluster.
306+    Assume(!to_remove.empty());

instagibbs commented at 5:27 pm on February 6, 2025:

UB if this Assume() is wrong, should we Assert() or more gracefully handle it?

sipa commented at 8:47 pm on February 12, 2025:

I don’t feel that’s necessary. The calling code is wrong if it’s called with an empty to_remove. It’s hopefully sufficient to document this, so reviewers can be convinced this isn’t being violated.

ajtowns commented at 7:20 am on March 14, 2025:

What’s the advantage of using Assume() over Assert()? The condition is still checked in production builds, it’s just that the result is discarded and execution continues, rather than triggering an abort, no? My understanding was:

CHECK_NONFATAL(cond) – to throw when there’s an internal error that prevents continuing in RPC code, but is recoverable overall
Assume(x) – when we expect x to be true-ish, but the code will still operate correctly if it is not
Assert(x) or assert(x) when we require x to be true-ish and the code won’t operate correctly if it is not

You could conceivably have ref.m_graph be a debug-only field, and write Assume(ref.CheckGraph(txgraph)) with bool Ref::CheckGraph(const TxGraph* g) { if constexpr (DEBUG) { return g == m_graph; } else return true; } perhaps; but otherwise I don’t think you’re getting any benefit from Assume vs Assert.

sipa commented at 8:54 pm on March 19, 2025:

See my comment here: #31363 (comment) about how Assume() gives you compiler-guaranteed side-effect-equivalence, largely without runtime cost.

in src/txgraph.cpp:599 in 781c15bfca outdated

304+{
305+    // Iterate over the prefix of to_remove that applies to this cluster.
306+    Assume(!to_remove.empty());
307+    SetType todo;
308+    do {
309+        GraphIndex idx = to_remove.front();

instagibbs commented at 5:28 pm on February 6, 2025:

0        GraphIndex idx = to_remove.front();
1        if (!Assume(graph.m_entries.size() > idx)) break;

sipa commented at 10:34 pm on February 12, 2025:

I have added an Assume, but not the break. It just shouldn’t happen, and this is tested extensively in fuzzing (where the Assumes actually have teeth).

in src/txgraph.cpp:62 in 781c15bfca outdated

49+    friend class TxGraphImpl;
50+    using GraphIndex = TxGraph::GraphIndex;
51+    using SetType = BitSet<CLUSTER_COUNT_LIMIT>;
52+    /** The DepGraph for this cluster, holding all feerates, and ancestors/descendants. */
53+    DepGraph<SetType> m_depgraph;
54+    /** m_mapping[i] gives the GraphIndex for the position i transaction in m_depgraph. */

instagibbs commented at 5:33 pm on February 6, 2025:

should probably say what value -1 means

sipa commented at 10:34 pm on February 12, 2025:

Done.

in src/txgraph.cpp:64 in 781c15bfca outdated

51+    using SetType = BitSet<CLUSTER_COUNT_LIMIT>;
52+    /** The DepGraph for this cluster, holding all feerates, and ancestors/descendants. */
53+    DepGraph<SetType> m_depgraph;
54+    /** m_mapping[i] gives the GraphIndex for the position i transaction in m_depgraph. */
55+    std::vector<GraphIndex> m_mapping;
56+    /** The current linearization of the cluster. Size equals m_mapping.TxCount().

instagibbs commented at 5:39 pm on February 6, 2025:

m_mapping is a vector, I presume this is m_depgraph instead?

sipa commented at 10:34 pm on February 12, 2025:

Indeed. Expanded/corrected the comment.

in src/txgraph.cpp:702 in 781c15bfca outdated

349+    std::vector<Cluster*> new_clusters;
350+    bool first{true};
351+    // Iterate over the connected components of this Cluster's m_depgraph.
352+    while (todo.Any()) {
353+        auto component = m_depgraph.FindConnectedComponent(todo);
354+        if (first && component == todo) {

instagibbs commented at 8:23 pm on February 6, 2025:

might help reader instantly recognize what todo means in this case

0        if (first && component == todo) {
1            Assume(todo == m_depgraph.Positions());

sipa commented at 10:34 pm on February 12, 2025:

Done.

in src/txgraph.cpp:637 in 781c15bfca outdated

360+        }
361+        first = false;
362+        // Construct a new Cluster to hold the found component.
363+        auto new_cluster = std::make_unique<Cluster>();
364+        new_clusters.push_back(new_cluster.get());
365+        // Remember that all the component's transaction go to this new Cluster. The positions

instagibbs commented at 8:54 pm on February 6, 2025:

0        // Remember that all the component's transactions go to this new Cluster. The positions

sipa commented at 10:34 pm on February 12, 2025:

Done.

in src/txgraph.cpp:776 in 781c15bfca outdated

419+        }
420+        m_linearization.push_back(new_pos);
421+        // Copy the transaction's dependencies, translating them using remap.
422+        SetType parents;
423+        for (auto par : other.m_depgraph.GetReducedParents(pos)) {
424+            parents.Set(remap[par]);

instagibbs commented at 9:19 pm on February 6, 2025:

note: remap here is already filled out from a topological pov since we’re iterating over other.m_linearization which is assumed to be topologically valid

sipa commented at 10:35 pm on February 12, 2025:

Added a comment to this effect.

in src/txgraph.cpp:798 in 781c15bfca outdated

441+    // between which dependencies are added, which simply concatenates their linearizations. Invoke
442+    // PostLinearize, which has the effect that the linearization becomes a merge-sort of the
443+    // constituent linearizations. Do this here rather than in Cluster::Merge, because this
444+    // function is only invoked once per merged Cluster, rather than once per constituent one.
445+    // This concatenation + post-linearization could be replaced with an explicit merge-sort.
446+    PostLinearize(m_depgraph, m_linearization);

instagibbs commented at 9:33 pm on February 6, 2025:

781c15bfca1ebaffe7b634196e19144f5ab10a50

note: I’m not sure this obvious that PostLinearize obtains this property.

sipa commented at 9:16 pm on February 12, 2025:

It follows from the property that PostLinearize results in chunks that do not consist of multiple disconnected components: if a higher-feerate chunk is appended to a lower-feerate one, it must mean they get separated.

I’m happy to elaborate briefly in this comment, but perhaps this is also not the place to provide deeper more theoretical reasons why the properties hold.

instagibbs commented at 9:45 pm on February 12, 2025:

I am perhaps taking the comment too literally? I just don’t see how it’s a merge-sort of chunks when a PostLinearize may very well end up changing chunk boundaries. Or maybe it doesn’t? I buy that the documented properties otherwise hold.

Either way, is there a presumed standout benefit to calling this once here, then after dependencies have been applied and linearization fixed?

sipa commented at 8:55 pm on March 19, 2025:

Late update (discussed elsewhere): this property actually did not hold. Added comments, and partially rewritten.

in src/txgraph.cpp:118 in 781c15bfca outdated

90+    /** Split this cluster (must have a NEEDS_SPLIT* quality). Returns whether to delete this
91+     *  Cluster afterwards. */
92+    [[nodiscard]] bool Split(TxGraphImpl& graph) noexcept;
93+    /** Move all transactions from cluster to *this (as separate components). */
94+    void Merge(TxGraphImpl& graph, Cluster& cluster) noexcept;
95+    /** Given a span of (parent, child) pairs that all belong to this Cluster (or be removed),

instagibbs commented at 9:40 pm on February 6, 2025:

Cluster (or be removed)

not sure I get this comment, be removed what?

sipa commented at 10:35 pm on February 12, 2025:

This comment was no longer relevant, it dates from an earlier iteration. Gone.

in src/txgraph.cpp:726 in 781c15bfca outdated

449+    std::sort(to_apply.begin(), to_apply.end(), [](auto& a, auto& b) { return a.second < b.second; });
450+    // Iterate over groups of to-be-added dependencies with the same child.
451+    auto it = to_apply.begin();
452+    while (it != to_apply.end()) {
453+        auto& first_child = graph.m_entries[it->second].m_locator;
454+        DepGraphIndex child_idx = first_child.index;

instagibbs commented at 9:42 pm on February 6, 2025:

const this?

sipa commented at 10:35 pm on February 12, 2025:

Done.

in src/txgraph.cpp:772 in 781c15bfca outdated

492+
493+    // Clean up space in quality_cluster.
494+    auto max_setindex = quality_clusters.size() - 1;
495+    if (setindex != max_setindex) {
496+        // If the cluster was not the last element of quality_clusters, move that to take its place.
497+        quality_clusters.back()->m_quality = quality;

instagibbs commented at 9:50 pm on February 6, 2025:

781c15bfca1ebaffe7b634196e19144f5ab10a50

the cluster is from the same quality level, so this is kind of a no-op?

sipa commented at 10:35 pm on February 12, 2025:

Gone.

in src/txgraph.cpp:561 in 781c15bfca outdated

556+        if (cluster != nullptr) {
557+            // If the first to_remove_span entry's Cluster exists, hand to_remove_span to it, so it
558+            // can pop off whatever applies to it.
559+            cluster->ApplyRemovals(*this, to_remove_span);
560+        } else {
561+            // Otherwise, skip this already-removed entry.

instagibbs commented at 10:18 pm on February 6, 2025:

when does this happen?

sipa commented at 10:35 pm on February 12, 2025:

When the same Ref is RemoveTransaction()ed twice. Added a comment to clarify.

in src/txgraph.cpp:708 in 781c15bfca outdated

353+        auto component = m_depgraph.FindConnectedComponent(todo);
354+        if (first && component == todo) {
355+            // The existing Cluster is an entire component. Leave it be, but update its quality.
356+            graph.SetClusterQuality(m_quality, m_setindex, QualityLevel::NEEDS_RELINEARIZE);
357+            // We need to recompute and cache its chunking.
358+            Updated(graph);

ismaelsadeeq commented at 8:23 pm on February 10, 2025:

IIUC we call ApplyRemoval on each cluster before splitting, which calls updated with the graph at the end. so it seems to me that this call to updated before returning here is redundant and repeats work that has already been done?

I also think even if the above assumption is incorrect and an update is necessary, then since the cluster quality is now NEEDS_RELINEARIZE, we only need to update the locator.

sipa commented at 8:40 pm on February 11, 2025:

You’re right, but that’s exactly what the “delay chunking while sub-acceptable” commit is about. In the initial commit, the invariant is that the chunk feerates are maintained at all times (and the sanity check will fail if not). In this “delay chunking while sub-acceptable” commit, some updates are avoided while they are unnecessary. The commit after it (“special-case removal of tail of cluster”) reintroduces some, because with that, performing a Split() can in fact immediately introduce optimal clusters directly.

ismaelsadeeq commented at 10:07 pm on February 13, 2025:

I see this now 👍🏾

in src/txgraph.h:83 in 781c15bfca outdated

82+     * As an example, imagine 3 transactions A,B,C where B depends on A. If a dependency of C on B
83+     * is added, and then B is deleted, C will still depend on A. If the deletion of B is reordered
84+     * before the C->B dependency is added, it has no effect instead. If, together with the
85+     * deletion of B also either A or C is deleted, there is no distinction.
86+     */
87+    virtual void RemoveTransaction(const Ref& arg) noexcept = 0;

ismaelsadeeq commented at 8:50 pm on February 10, 2025:

IIUC you are recommending that callers should remove transactions topologically starting from bottom descendant to the top ancestor.

sipa commented at 8:42 pm on February 11, 2025:

No, the order of removals is irrelevant. The comment is just about pointing out that while TxGraph is allowed to reorder dependency additions and transaction removals w.r.t. each other, this doesn’t matter in practice. As long as every transaction removal is combined (doesn’t matter when) with also removal of all its ancestors, or with all its descendants, this reordering does not affect its observable behavior.

in src/txgraph.cpp:746 in 781c15bfca outdated

391+        for (auto par : m_depgraph.GetReducedParents(i)) new_parents.Set(remap[par].second);
392+        new_cluster->m_depgraph.AddDependencies(new_parents, remap[i].second);
393+    }
394+    // Update all the Locators of moved transactions.
395+    for (Cluster* new_cluster : new_clusters) {
396+        new_cluster->Updated(graph);

ismaelsadeeq commented at 8:54 pm on February 10, 2025:

Don’t we just need to update the locator here, because all NEEDS_RELINEARIZE clusters has to be made acceptable before we can get their chunk feerate?

sipa commented at 8:43 pm on February 11, 2025:

Same comment as before: in the initial commit, the invariant is that the chunk feerates are maintained at all times. It is relaxed later on.

ismaelsadeeq commented at 10:02 pm on February 10, 2025: member

Code review to 741a6a8c4d851cb10ecee810a09187bcbfa5af4c

in src/txgraph.cpp:928 in 781c15bfca outdated

594+    if (!m_deps_to_add.empty()) return;
595+    if (!m_to_remove.empty()) return;
596+
597+    // Sort the GraphIndexes that need to be cleaned up. They are sorted in reverse, so the last
598+    // ones get processed first. This means earlier-processed GraphIndexes will not cause moving of
599+    // later-processed ones during the "swap with end of m_entries" step (which might invalidate

instagibbs commented at 2:59 pm on February 11, 2025:

781c15bfca1ebaffe7b634196e19144f5ab10a50

0    // later-processed ones during the "swap with end of m_entries" step below (which might invalidate

sipa commented at 10:36 pm on February 12, 2025:

Done.

in src/txgraph.cpp:998 in 781c15bfca outdated

654+    if (m_group_data.has_value()) return;
655+
656+    /** Annotated clusters: an entry for each Cluster, together with the representative for the
657+     *  partition it is in if known, or with nullptr if not yet known. */
658+    std::vector<std::pair<Cluster*, Cluster*>> an_clusters;
659+    /** Annotated dependencies: an entry for each m_deps_to_apply entry (excluding ones that apply

instagibbs commented at 3:37 pm on February 11, 2025:

m_deps_to_add?

sipa commented at 10:36 pm on February 12, 2025:

Fixed.

in src/txgraph.cpp:1451 in 781c15bfca outdated

1005+std::vector<TxGraph::Ref*> TxGraphImpl::GetAncestors(const Ref& arg) noexcept
1006+{
1007+    // Return the empty vector if the Ref is empty.
1008+    if (GetRefGraph(arg) == nullptr) return {};
1009+    Assume(GetRefGraph(arg) == this);
1010+    // Apply all dependencies, as the result might be incorrect otherwise.

instagibbs commented at 3:50 pm on February 11, 2025:

Probably worth noting in these comment and elsewhere that it also applies removals if queued.

sipa commented at 10:36 pm on February 12, 2025:

Done.

in src/txgraph.h:93 in 781c15bfca outdated

88+    /** Add a dependency between two specified transactions. Parent may not be a descendant of
89+     *  child already (but may be an ancestor of it already, in which case this is a no-op). If
90+     *  either transaction is already removed, this is a no-op. */
91+    virtual void AddDependency(const Ref& parent, const Ref& child) noexcept = 0;
92+    /** Modify the fee of the specified transaction. If the transaction does not exist (or was
93+     *  removed), this has no effect. */

instagibbs commented at 3:58 pm on February 11, 2025:

should say the Ref if non-null should be connected to this TxGraph object

sipa commented at 10:36 pm on February 12, 2025:

I’ve added a comment to AddTransaction to clarify one can only pass Refs from a given txgraph object to functions of the same object.

in src/txgraph.cpp:1835 in 781c15bfca outdated

1099+
1100+void TxGraphImpl::SetTransactionFee(const Ref& ref, int64_t fee) noexcept
1101+{
1102+    // Don't do anything if the passed Ref is empty.
1103+    if (GetRefGraph(ref) == nullptr) return;
1104+    Assume(GetRefGraph(ref) == this);

instagibbs commented at 3:59 pm on February 11, 2025:

0    if (!Assume(GetRefGraph(ref) == this)) return;

sipa commented at 9:57 pm on February 12, 2025:

If we’re going to expend run-time cost to perform the check, we might as well drop the requirement that Ref refers to this graph entirely. Given that we’re not expecting to ever have multiple TxGraph objects in production, this feels like overkill.

in src/txgraph.cpp:1702 in 781c15bfca outdated

1067+
1068+FeePerWeight TxGraphImpl::GetChunkFeerate(const Ref& arg) noexcept
1069+{
1070+    // Return the empty FeePerWeight if the passed Ref is empty.
1071+    if (GetRefGraph(arg) == nullptr) return {};
1072+    Assume(GetRefGraph(arg) == this);

instagibbs commented at 4:04 pm on February 11, 2025:

0    if (!Assume(GetRefGraph(arg) == this)) return {};

and so on for the public functions?

in src/txgraph.h:47 in 781c15bfca outdated

17+
18+/** Data structure to encapsulate fees, sizes, and dependencies for a set of transactions. */
19+class TxGraph
20+{
21+public:
22+    /** Internal identifier for a transaction within a TxGraph. */

instagibbs commented at 4:26 pm on February 11, 2025:

0    /** Internal identifier for a transaction within a TxGraph corresponding to m_entries order. */

sipa commented at 9:59 pm on February 12, 2025:

I’d rather not refer to implementation details in the public interface.

in src/txgraph.cpp:846 in 781c15bfca outdated

841+    Assume(m_group_data.has_value());
842+    // Nothing to do if there are no dependencies to be added.
843+    if (m_deps_to_add.empty()) return;
844+
845+    // For each group of to-be-merged Clusters.
846+    Assume(m_group_data.has_value());

instagibbs commented at 4:35 pm on February 11, 2025:

this was just assumed a few lines above

edit: in ee4d16ff3a32a0b12a7c1b7837ef5195d9deedf0 the other one is gone, but we’re accessing it two lines above

sipa commented at 10:37 pm on February 12, 2025:

Fixed.

in src/txgraph.cpp:146 in 781c15bfca outdated

141+    /** Information about one group of Clusters to be merged. */
142+    struct GroupEntry
143+    {
144+        /** Which clusters are to be merged. */
145+        std::vector<Cluster*> m_clusters;
146+        /** Which dependencies are to be applied to those merged clusters. */

instagibbs commented at 4:40 pm on February 11, 2025:

0        /** Which dependencies are to be applied to those merged clusters from parent to child. */

sipa commented at 10:37 pm on February 12, 2025:

Added a comment to this effect (the variable is gone in a future commit, though).

in src/txgraph.cpp:1243 in 781c15bfca outdated

845+    // For each group of to-be-merged Clusters.
846+    Assume(m_group_data.has_value());
847+    for (auto& group_data : *m_group_data) {
848+        // Invoke Merge() to merge them into a single Cluster.
849+        Merge(group_data.m_clusters);
850+        // Actually apply all to-be-added dependencies (for each, parent and child belong to the

instagibbs commented at 4:42 pm on February 11, 2025:

0        // Actually apply all to-be-added dependencies (all parents and children from this grouping belong to the

sipa commented at 10:37 pm on February 12, 2025:

Done.

in src/txgraph.cpp:1097 in 781c15bfca outdated

663+
664+    // Construct a an_clusters entry for every parent and child in the to-be-applied dependencies.
665+    for (const auto& [par, chl] : m_deps_to_add) {
666+        auto par_cluster = m_entries[par].m_locator.cluster;
667+        auto chl_cluster = m_entries[chl].m_locator.cluster;
668+        // Skip dependencies for which the parent or child transaction is removed.

instagibbs commented at 5:07 pm on February 11, 2025:

this would be the case where a removal and a dependency addition was queued for the same ref?

sipa commented at 10:18 pm on February 12, 2025:

Indeed.

sipa force-pushed on Feb 11, 2025

in src/txgraph.cpp:1133 in 0eacbd61cb outdated

690+             *  a representative for the entire tree, and can be found by walking upwards from any
691+             *  element. */
692+            PartitionData* parent;
693+            /** (only if this is a root, so when parent == this) An upper bound on the height of
694+             *  tree for this partition. */
695+            unsigned rank;

instagibbs commented at 9:24 pm on February 11, 2025:

we doing just unsigned for a reason? I don’t see this in the codebase often

sipa commented at 10:21 pm on February 12, 2025:

It’s a tiny number, as it’s bounded by $\mathcal{O}(\log n)$, so it didn’t feel right to use the same DepGraphIndex or GraphIndex type.

in src/txgraph.cpp:710 in 0eacbd61cb outdated

705+            Assume(it->cluster == arg);
706+            return &*it;
707+        };
708+
709+        /** Given a PartitionData, find the root of the tree it is in (its representative). */
710+        static constexpr auto find_uf = [](PartitionData* data) noexcept -> PartitionData* {

instagibbs commented at 9:28 pm on February 11, 2025:

s/find_uf/find_root_uf/ ?

find_uf and locate_uf read identical

sipa commented at 10:37 pm on February 12, 2025:

Renamed to find_root_fn, locate_fn, union_fn.

in src/txgraph.cpp:1217 in 0eacbd61cb outdated

759+        for (size_t i = 0; i < partition_data.size(); ++i) {
760+            auto& data = partition_data[i];
761+            // Find the representative of the partition Cluster i is in, and store it with the
762+            // Cluster.
763+            auto rep = find_uf(&data)->cluster;
764+            an_clusters[i].second = rep;

instagibbs commented at 9:43 pm on February 11, 2025:

0            Assume(an_clusters[i].second == nullptr);
1            an_clusters[i].second = rep;

sipa commented at 10:38 pm on February 12, 2025:

Done.

instagibbs commented at 4:22 pm on February 12, 2025: member

reviewed up to “txgraph: (feature) add initial version” 781c15bfca1ebaffe7b634196e19144f5ab10a50 , basically all nits at this point, logic all makes sense

One question I had coming up is use of Assume(), there is a number of places that Assume() is used, then subsequently may result in UB or weird behavior. In a few spots I gave suggestions but was wondering if there is thinking behind doing more (or Assert’ing) or not to handle that in release builds.

in src/test/fuzz/txgraph.cpp:172 in 1f06bc1e4a outdated

167+
168+    // Construct a real and a simulated graph.
169+    auto real = MakeTxGraph();
170+    SimTxGraph sim;
171+
172+    /** Function to pick any Ref (in sim real, sim.removed, or empty). */

ismaelsadeeq commented at 4:34 pm on February 12, 2025:

In “txgraph: (tests) add simulation fuzz test” 1f06bc1e4a8108f1430bcd20fc391c9f663a2e4b

using real in the comments here is a bit confusing as you represent the actual tx graph as real

instagibbs commented at 4:51 pm on February 12, 2025:

sim real

feel like we’re missing a punctuation or I’m unclear what it’s saying

sipa commented at 4:35 am on February 13, 2025:

Agreed, fixed.

ismaelsadeeq commented at 7:20 pm on February 13, 2025:

nit: There is still one remaining in line 180.

sipa commented at 10:11 pm on February 21, 2025:

This has been rewritten already.

in src/txgraph.cpp:605 in 1f06bc1e4a outdated

310@@ -311,7 +311,7 @@ void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove
311         auto& locator = entry.m_locator;
312         // Stop once we hit an entry that applies to another Cluster.
313         if (locator.cluster != this) break;
314-        // - Remember it in a set of to-remove ClusterIndexes.
315+        // - Remember it in a set of to-remove DepGraphIndexes.

instagibbs commented at 4:43 pm on February 12, 2025:

micro-nit: seems this should have been renamed in a different commit 1f06bc1e4a8108f1430bcd20fc391c9f663a2e4b

sipa commented at 10:11 pm on February 21, 2025:

Done.

in src/test/fuzz/txgraph.cpp:142 in 1f06bc1e4a outdated

100+        auto pos = Find(ref);
101+        if (pos == MISSING) return;
102+        graph.RemoveTransactions(SetType::Singleton(pos));
103+        simrevmap.erase(simmap[pos].get());
104+        // Retain the TxGraph::Ref corresponding to this position, until explicitly destroyed.
105+        // to see it when calling Cleanup().

instagibbs commented at 5:14 pm on February 12, 2025:

1f06bc1e4a8108f1430bcd20fc391c9f663a2e4b

Cleanup

probably an old reference?

sipa commented at 10:11 pm on February 21, 2025:

Removed it.

in src/test/fuzz/txgraph.cpp:306 in 1f06bc1e4a outdated

210+        while (true) {
211+            if (sim.GetTransactionCount() < SimTxGraph::MAX_TRANSACTIONS && command-- == 0) {
212+                // AddTransaction.
213+                int64_t fee;
214+                int32_t size;
215+                if (alt) {

instagibbs commented at 5:19 pm on February 12, 2025:

to be clear this is to allow the search space to be split in a bimodal way?

sipa commented at 10:11 pm on February 21, 2025:

It’s mostly to avoid consuming 12 bytes from the fuzz data for every transaction; added a comment.

in src/test/fuzz/txgraph.cpp:255 in 1f06bc1e4a outdated

250+                    if (oversize) break;
251+                }
252+                sim.AddDependency(par, chl);
253+                real->AddDependency(*par, *chl);
254+                break;
255+            } else if (sim.removed.size() < 100 && command-- == 0) {

instagibbs commented at 5:29 pm on February 12, 2025:

sim.removed.size() < 100

Is this because the loop does 200 commands, and we logically speaking cannot remove more than 100 after 100 additions?

sipa commented at 10:12 pm on February 21, 2025:

This dates from a time when the fuzz test allowed up to 1000 steps, and it felt unnecessary to support cases with unboundedly growing numbers of removed transactions. I’ve just dropped it.

in src/test/fuzz/txgraph.cpp:271 in 1f06bc1e4a outdated

266+                    real->RemoveTransaction(*ptr);
267+                    sim.RemoveTransaction(ptr);
268+                }
269+                break;
270+            } else if (sim.removed.size() > 0 && command-- == 0) {
271+                // ~Ref

instagibbs commented at 5:35 pm on February 12, 2025:

1f06bc1e4a8108f1430bcd20fc391c9f663a2e4b

this is the odd command out; what precisely is this simulating, or is this just making sure destruction of the Ref is being exercised in general?

sipa commented at 10:13 pm on February 21, 2025:

I’ve added a comment. ~Ref triggers observable behavior in TxGraph, so the simulation controls when it gets called (and its ordering w.r.t. other operations).

in src/test/fuzz/txgraph.cpp:279 in 1f06bc1e4a outdated

274+                if (removed_pos != sim.removed.size() - 1) {
275+                    std::swap(sim.removed[removed_pos], sim.removed.back());
276+                }
277+                sim.removed.pop_back();
278+                break;
279+            } else if (sim.GetTransactionCount() > 0 && command-- == 0) {

instagibbs commented at 5:36 pm on February 12, 2025:

do we really not want coverage when the graph is empty?

sipa commented at 10:13 pm on February 21, 2025:

Fixed.

in src/test/fuzz/txgraph.cpp:256 in 1f06bc1e4a outdated

173+    auto pick_fn = [&]() noexcept -> TxGraph::Ref* {
174+        auto tx_count = sim.GetTransactionCount();
175+        /** The number of possible choices. */
176+        size_t choices = tx_count + sim.removed.size() + 1;
177+        /** Pick one of them. */
178+        auto choice = provider.ConsumeIntegralInRange<size_t>(0, choices - 1);

ismaelsadeeq commented at 7:01 pm on February 12, 2025:

In “txgraph: (tests) add simulation fuzz test” 1f06bc1e4a8108f1430bcd20fc391c9f663a2e4b

why add 1 and then subtract again when using the choices value?

sipa commented at 4:26 am on February 13, 2025:

Well choices counts the actual number of choices, consisting of:

All transactions in sim.simmap (tx_count)
All transactions in sim.removed (sim.removed.size())
The empty Ref (1).

When picking one of them, we want one in the range from 0 up to and including choices - 1.

ismaelsadeeq commented at 7:19 pm on February 13, 2025:

Makes sense thanks.

in src/test/fuzz/txgraph.cpp:117 in 1f06bc1e4a outdated

77+    }
78+
79+    /** Add a dependency between two positions in this graph. */
80+    void AddDependency(TxGraph::Ref* parent, TxGraph::Ref* child)
81+    {
82+        auto par_pos = Find(parent);

ismaelsadeeq commented at 7:23 pm on February 12, 2025:

In “txgraph: (tests) add simulation fuzz test” 1f06bc1e4a8108f1430bcd20fc391c9f663a2e4b

Should we return early when parent ref is the same as ref child? This will prevent a possibility of having a cycle. As such, we don’t have to check before adding dependency in the fuzz test.

sipa commented at 4:27 am on February 13, 2025:

It doesn’t matter; TxGraph ignores self-dependencies anyway (because DepGraph does), so having it doesn’t hurt, and adds testing for that case.

in src/test/fuzz/txgraph.cpp:282 in 1f06bc1e4a outdated

195+        // Return empty.
196+        assert(choice == 0);
197+        return &empty_ref;
198+    };
199+
200+    LIMITED_WHILE(provider.remaining_bytes() > 0, 200) {

ismaelsadeeq commented at 7:51 pm on February 12, 2025:

In “txgraph: (tests) add simulation fuzz test” 1f06bc1e4a8108f1430bcd20fc391c9f663a2e4b

This loop is quite big, has series of steps that are depending if behavior of the if else branches.

Will be nice to add an overview of how the test works, just like it was done in cluster_linearize fuzz test

sipa commented at 4:29 am on February 13, 2025:

It’s not clear to me what you’re asking for. I would describe the test as “big simulation test, which performs a number of operations on a real TxGraph, and on a simpler reimplementation, and in the end compares the two”. Something like that? Which cluster_linearize fuzz test are you referring to?

ismaelsadeeq commented at 7:20 pm on February 13, 2025:

Yeah something like that that

In most of the fuzz test in src/test/fuzz/cluster_linearize.cpp you provided a really nice description , which is just an overview of what the test is going to do before jumping to the instructions.

Here we just jump into the instructions without that context.

But this is nitty, you can ignore this comment and resolve.

sipa commented at 9:30 pm on February 13, 2025:

Will address on the next push.

sipa commented at 9:11 pm on March 19, 2025:

I added the following comment:

0    // This is a big simulation test for TxGraph, which performs a fuzz-derived sequence of valid
1    // operations on a TxGraph instance, as well as on a simpler (mostly) reimplementation (see
2    // SimTxGraph above), comparing the outcome of functions that return a result, and finally
3    // performing a full comparison between the two.

in src/txgraph.cpp:1201 in 741a6a8c4d outdated

1196+        // ... for all clusters in them ...
1197+        for (ClusterSetIndex setindex = 0; setindex < quality_clusters.size(); ++setindex) {
1198+            const auto& cluster = *quality_clusters[setindex];
1199+            // Remember we saw this Cluster (only if it is non-empty; empty Clusters aren't
1200+            // expected to be referenced by the Entry vector).
1201+            if (cluster.GetTxCount() != 0) {

instagibbs commented at 8:05 pm on February 12, 2025:

Empty clusters can happen anytime ApplyRemovals() is invoked, but not ApplyDependencies since that invokes SplitAll(), correct?

sipa commented at 8:55 pm on February 21, 2025:

Indeed. Empty clusters can only exist in after an ApplyRemovals but before a SplitAll(). After oversizedness caching, ApplyDependencies may end up not calling SplitAll(), though, if the result is known to be oversized already.

in src/txgraph.cpp:1895 in 741a6a8c4d outdated

1158+    assert(m_done == m_depgraph.Positions());
1159+}
1160+
1161+void TxGraphImpl::SanityCheck() const
1162+{
1163+    /** Which GraphIndexes ought to occur in m_wiped, based on m_entries. */

ismaelsadeeq commented at 8:53 pm on February 12, 2025:

In “txgraph: (tests) add internal sanity check function” 741a6a8c4d851cb10ecee810a09187bcbfa5af4c

What is m_wiped

sipa commented at 4:35 am on February 13, 2025:

That was outdated. Fixed.

in src/txgraph.cpp:1149 in 741a6a8c4d outdated

1144+        // Check that the Entry has a locator pointing back to this Cluster & position within it.
1145+        assert(entry.m_locator.cluster == this);
1146+        assert(entry.m_locator.index == lin_pos);
1147+        // Check linearization position and chunk feerate.
1148+        if (!linchunking.GetChunk(0).transactions[lin_pos]) {
1149+            linchunking.MarkDone(linchunking.GetChunk(0).transactions);

ismaelsadeeq commented at 9:01 pm on February 12, 2025:

In “txgraph: (tests) add internal sanity check function” 741a6a8c4d851cb10ecee810a09187bcbfa5af4c

Should we also compare that the list of transactions in chunk 0 matches m_linearization subset from from previous hit of the conditional statement to lin_pos?

sipa commented at 4:32 am on February 13, 2025:

I’m not sure what you mean here.

ismaelsadeeq commented at 7:21 pm on February 13, 2025:

Yeah I was thinking something like

 0diff --git a/src/txgraph.cpp b/src/txgraph.cpp
 1index afaa46e6326..a9b3e446d26 100644
 2--- a/src/txgraph.cpp
 3+++ b/src/txgraph.cpp
 4@@ -1143,6 +1143,7 @@ void Cluster::SanityCheck(const TxGraphImpl& graph) const
 5 
 6     // Verify m_linearization.
 7     SetType m_done;
 8+    SetType last_chunk;
 9     assert(m_depgraph.IsAcyclic());
10     for (auto lin_pos : m_linearization) {
11         assert(lin_pos < m_mapping.size());
12@@ -1155,9 +1156,12 @@ void Cluster::SanityCheck(const TxGraphImpl& graph) const
13         assert(entry.m_locator.index == lin_pos);
14         // Check linearization position and chunk feerate.
15         if (!linchunking.GetChunk(0).transactions[lin_pos]) {
16+            assert(linchunking.GetChunk(0).transactions == last_chunk);
17+            last_chunk = SetType();
18             linchunking.MarkDone(linchunking.GetChunk(0).transactions);
19         }
20         assert(entry.m_chunk_feerate == linchunking.GetChunk(0).feerate);
21+        last_chunk.Set(lin_pos);
22         // If this Cluster has an acceptable quality level, its chunks must be connected.
23         if (m_quality == QualityLevel::ACCEPTABLE || m_quality == QualityLevel::OPTIMAL) {
24             assert(m_depgraph.IsConnected(linchunking.GetChunk(0).transactions));

sipa commented at 9:29 pm on February 13, 2025:

I don’t think this is the place for it, as it amounts to testing of LinearizationChunking, not TxGraph. The point of the code here is just to get the chunks defined by m_linearization out, so we can test them. If LinearizationChunking works correctly (see clusterlin_linearization_chunking for that), then the chunks that come out will correspond to consecutive transactions from the linearization.

sipa force-pushed on Feb 12, 2025

DrahtBot added the label CI failed on Feb 13, 2025

DrahtBot commented at 0:01 am on February 13, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37128519486

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

sipa force-pushed on Feb 13, 2025

DrahtBot removed the label CI failed on Feb 13, 2025

in src/txgraph.cpp:1189 in 56cbdd7889 outdated

833+            while (deps_it != m_deps_to_add.end()) {
834+                auto [par, chl] = *deps_it;
835+                auto chl_cluster = m_entries[chl].m_locator.cluster;
836+                // Skip dependencies that apply to earlier Clusters (those necessary are for
837+                // deleted transactions, as otherwise we'd have processed them already).
838+                if (!std::less{}(chl_cluster, data.cluster)) {

instagibbs commented at 6:02 pm on February 13, 2025:

56cbdd7889e957c29d76681c46c4a7a6983c9be6

I rewrote it so my smooth brain could better decipher it (I believe it’s the same) :

 0diff --git a/src/txgraph.cpp b/src/txgraph.cpp
 1index b2cecffebb..df93eb2a13 100644
 2--- a/src/txgraph.cpp
 3+++ b/src/txgraph.cpp
 4@@ -821,12 +821,13 @@ void TxGraphImpl::GroupClusters() noexcept
 5             while (deps_it != m_deps_to_add.end()) {
 6                 auto [par, chl] = *deps_it;
 7                 auto chl_cluster = m_entries[chl].m_locator.cluster;
 8                 // Skip dependencies that apply to earlier Clusters (those necessary are for
 9                 // deleted transactions, as otherwise we'd have processed them already).
10-                if (!std::less{}(chl_cluster, data.cluster)) {
11-                    if (chl_cluster != data.cluster) break;
12+                if (std::greater{}(chl_cluster, data.cluster)) {
13+                    break;
14+                } else if (chl_cluster == data.cluster) {
15                     auto par_cluster = m_entries[par].m_locator.cluster;
16                     // Also filter out dependencies applying to a removed parent.
17                     if (par_cluster != nullptr) an_deps.emplace_back(*deps_it, rep);
18                 }
19                 ++deps_it;

sipa commented at 10:15 pm on February 21, 2025:

Done, largely.

in src/txgraph.cpp:662 in 693c3df67d outdated

418+           m_quality == QualityLevel::NEEDS_SPLIT_ACCEPTABLE);
419+    // Determine the new quality the split-off Clusters will have.
420+    QualityLevel new_quality = m_quality == QualityLevel::NEEDS_SPLIT ? QualityLevel::NEEDS_RELINEARIZE :
421+                               m_quality == QualityLevel::NEEDS_SPLIT_OPTIMAL ? QualityLevel::OPTIMAL :
422+                               QualityLevel::ACCEPTABLE;
423+    // If the cluster was NEEDS_SPLIT_OPTIMAL, and we're thus going to produce OPTIMAL clusters, we

instagibbs commented at 6:29 pm on February 13, 2025:

693c3df67dc972dd2fcfbeb8179bcd7833c77bd1

Working on convincing myself that PostLinearization is suffcient to result in optimal clusters. I think it’s straight forward that the chunk prefixes before any removed tail transactions are untouched, and once we remove tail transactions, the last touched-but-not-fully-removed chunk may be in a substandard ordering.

So PostLinearization guarantees that the final sub-chunk is reordered to being connected, and that means that sub-chunk is now optimal? I’m going to need help here.

Something I was too lazy to do: A fuzz harness could be constructed via taking verified-optimal depgraphs (from exhaustive solver), chopping the tail off, and verifying all connected components PostLinearize-ation are just as good as whatever the exhaustive linearizer gives for each component-now-cluster?

instagibbs commented at 7:01 pm on February 27, 2025:

Update from offline conversation: This was a confusion about what we’re calling “optimal”. The splits clusters are already optimal in the sense of the diagram check, but not minimal or fully connected. PostLinearization makes sure that all chunks are connected/minimal.

Perhaps a rewording of this section to make it abundantly clear?

instagibbs commented at 3:54 pm on March 3, 2025:

Also, as discussed offline, this seems like an easy scenario to write a fuzz test for as described above ^

sipa commented at 9:15 pm on March 19, 2025:

NEEDS_SPLIT_OPTIMAL is gone now after further discussion. Marking resolved.

in src/txgraph.cpp:1021 in 370c419c33 outdated

1017@@ -997,6 +1018,7 @@ void TxGraphImpl::AddDependency(const Ref& parent, const Ref& child) noexcept
1018     m_group_data.reset();
1019     // Remember that this dependency is to be applied.
1020     m_deps_to_add.emplace_back(GetRefIndex(parent), GetRefIndex(child));
1021+    m_group_data.reset();

ismaelsadeeq commented at 8:09 pm on February 13, 2025:

In “txgraph: (feature) make max cluster count configurable and oversize state” 370c419c330ca73d9ea44a7d48506a5c203e9c17

This is a duplicate I think

sipa commented at 10:28 pm on February 14, 2025:

Indeed, fixed.

in src/txgraph.cpp:324 in 30d7c8ce7f outdated

336-        } while(chunk.transactions.Any());
337+    // If the Cluster's quality is ACCEPTABLE or OPTIMAL, compute its chunking and store its
338+    // information in the Entry's m_chunk_feerate. These fields are only accessed after making
339+    // the entire graph ACCEPTABLE, so it is pointless to compute these if we haven't reached that
340+    // quality level yet.
341+    if (m_quality == QualityLevel::OPTIMAL || m_quality == QualityLevel::ACCEPTABLE) {

ismaelsadeeq commented at 9:35 pm on February 13, 2025:

In “txgraph: (optimization) delay chunking while sub-acceptable” 30d7c8ce7f34e977238d1454dd032196cbfd936b

nit: we can have truthy/false method for this so that we can just call it. we have this conditional statement in quite a few places

sipa commented at 10:29 pm on February 14, 2025:

Done. I have actually added a bunch of functions (IsAcceptable(), IsOptimal(), NeedsSplitting(), and further on also IsOversized()) and mostly rewritten the long conditions involving m_quality to use these functions instead.

in src/txgraph.cpp:263 in 1b207547a9 outdated

259@@ -260,6 +260,8 @@ class TxGraphImpl final : public TxGraph
260     ClusterSetIndex InsertCluster(std::unique_ptr<Cluster>&& cluster, QualityLevel quality) noexcept;
261     /** Change the QualityLevel of a Cluster (identified by old_quality and old_index). */
262     void SetClusterQuality(QualityLevel old_quality, ClusterSetIndex old_index, QualityLevel new_quality) noexcept;
263+    /** Make a transaction not exist. */

instagibbs commented at 3:16 pm on February 14, 2025:

nit

0    /** Make a transaction not exist. Transaction must currently exist. */

sipa commented at 10:15 pm on February 21, 2025:

Done.

in src/txgraph.cpp:308 in e19bbc3282 outdated

304@@ -252,16 +305,21 @@ class TxGraphImpl final : public TxGraph
305 
306     /** Swap the Entrys referred to by a and b. */
307     void SwapIndexes(GraphIndex a, GraphIndex b) noexcept;
308-    /** Extract a Cluster. */
309-    std::unique_ptr<Cluster> ExtractCluster(QualityLevel quality, ClusterSetIndex setindex) noexcept;
310+    /** If idx exists in the specified level ClusterSet (explicitly or implicitly), return the

ismaelsadeeq commented at 3:40 pm on February 14, 2025:

What do you mean by implicitly

sipa commented at 10:29 pm on February 14, 2025:

I have expanded on this. LMK if it’s clearer now.

in src/txgraph.cpp:120 in e19bbc3282 outdated

 94@@ -90,6 +95,17 @@ class Cluster
 95     void UpdateMapping(DepGraphIndex cluster_idx, GraphIndex graph_idx) noexcept { m_mapping[cluster_idx] = graph_idx; }
 96     /** Push changes to Cluster and its linearization to the TxGraphImpl Entry objects. */
 97     void Updated(TxGraphImpl& graph) noexcept;
 98+    /** Create a copy of this Cluster, returning a pointer to it (used by PullIn). */
 99+    Cluster* CopyTo(TxGraphImpl& graph, int to_level) const noexcept;
100+    /** Get the list of Clusters that conflict with this one (at level-1). */

instagibbs commented at 3:47 pm on February 14, 2025:

nit: “level-1” is a bit vague, but understandable once you dive into the code itself. “top level - 1”?

sipa commented at 10:18 pm on February 21, 2025:

That’s not quite what this does. It looks for conflicts one level below the one where the *this Cluster exists (in practice, that’ll always be top_level-1, though). Improved the comment.

in src/txgraph.cpp:229 in e19bbc3282 outdated

208@@ -189,28 +209,59 @@ class TxGraphImpl final : public TxGraph
209         std::vector<std::pair<GraphIndex, GraphIndex>> m_deps_to_add;
210         /** Information about the merges to be performed, if known. */
211         std::optional<GroupData> m_group_data = GroupData{};
212+        /** Which entries were removed in this ClusterSet (so they can be wiped on abort). */

instagibbs commented at 3:49 pm on February 14, 2025:

IIUC this field should become a subset of m_to_remove’s value which ended up being IsPresent? Is there any better way to explain the relationship of these two variables?

sipa commented at 9:20 pm on February 21, 2025:

I think the most accurate description is “All entries which have an (R) removed locator at this level, plus any transactions in m_unlinked”. Would that help?

instagibbs commented at 11:38 pm on February 21, 2025:

yes, thanks

sipa commented at 9:08 pm on March 19, 2025:

Done.

in src/txgraph.cpp:232 in e19bbc3282 outdated

232+     *  - (P)resent: actually occurs in a Cluster at that level.
233+     *
234+     *  - (M)issing: not present in a Cluster at that level. For main, this means the transaction
235+     *               does not exist. For staging this means it its existence is inherited from
236+     *               main. If it doesn't exist in main, it doesn't exist in staging either. If it
237+     *               does existing in main, the cluster it is in is unmodified in staging.

ismaelsadeeq commented at 3:49 pm on February 14, 2025:

0     *               does exist in main, the cluster it is in is unmodified in staging.

sipa commented at 10:29 pm on February 14, 2025:

Done.

in src/txgraph.cpp:122 in e19bbc3282 outdated

102+    /** Mark all the Entry objects belonging to this Cluster as missing. The Cluster must be
103+     *  deleted immediately after. */
104+    void MakeTransactionsMissing(TxGraphImpl& graph) noexcept;
105+    /** Remove all transactions in a Cluster. */
106+    void Clear(TxGraphImpl& graph) noexcept;
107+    /** Change a Cluster's level from level to level-1. */

ismaelsadeeq commented at 4:05 pm on February 14, 2025:

Should just refer to main and staging only instead of level-1, main, staging and top level? It will be consistent this way.

sipa commented at 9:14 pm on March 19, 2025:

Together with a lot of changes (getting rid of the ClusterSet vector), done.

in src/txgraph.cpp:273 in e19bbc3282 outdated

243+     * - (M,M): the transaction doesn't exist in either graph.
244+     * - (P,M): the transaction exists in both, but only exists explicitly in a Cluster object in
245+     *          main. Its existence in staging is inherited from main.
246+     * - (P,P): the transaction exists in both, and is materialized in both. Thus, the clusters
247+     *          and/or their linearizations may be different in main and staging.
248+     * - (M,P): the transaction is added in staging, and does not exist in main.

ismaelsadeeq commented at 4:15 pm on February 14, 2025:

hmm, above you mentioned “If it doesn’t exist in main, it doesn’t exist in staging either.”.

And now there is a state where the transaction exits in staging but not in main

sipa commented at 10:30 pm on February 14, 2025:

I see this was a bit confusing, I have expanded the explanation.

But what I meant to convey was that (M) in staging means that if the transaction does not exist in main, it doesn’t exist in staging either. A (P) in staging just means it exists there. LMK if the new explanation is clearer.

in src/txgraph.cpp:525 in e19bbc3282 outdated

449+    }
450+}
451+
452+std::vector<Cluster*> TxGraphImpl::GetConflicts() const noexcept
453+{
454+    int level = m_clustersets.size() - 1;

ismaelsadeeq commented at 4:46 pm on February 14, 2025:

Should we just do this here and other places?

0    int level = MAX_LEVELS - 1;

sipa commented at 9:14 pm on March 19, 2025:

This code is gone now, resolving.

in src/txgraph.cpp:518 in e19bbc3282 outdated

442+    for (auto i : m_linearization) {
443+        auto& entry = graph.m_entries[m_mapping[i]];
444+        // For every transaction Entry in this Cluster, if it also exists in a lower-level Cluster,
445+        // then that Cluster conflicts.
446+        if (entry.m_locator[m_level - 1].IsPresent()) {
447+            out.push_back(entry.m_locator[m_level - 1].cluster);

ismaelsadeeq commented at 5:11 pm on February 14, 2025:

Should be set to avoid duplicates, will prevent de duplicating later on?

sipa commented at 8:03 pm on February 18, 2025:

Both options are O(n log n), but a vector + sorting + deduplication has much better constant factors. An std::set requires a separate allocation for each element, while a vector uses a single allocator for everything. Having everything in continuous memory also exploits the CPU cache better.

In theory, using an std::unordered_map could asymptotically faster for large numbers as it’s just O(n), but I suspect not for the numbers we care about. I haven’t tried or benchmarked it, though.

ismaelsadeeq commented at 8:06 pm on February 18, 2025:

Agreed, I calculated recently. This is better.

The constant factors for GetDistinctClusters implementation using vector and then sorting is also better 👍🏾 Please resolve.

ismaelsadeeq commented at 5:36 pm on February 14, 2025: member

Code Review e19bbc328236f64716034277857951184309cd14

It seems to me that this is designed with the flexibility to accommodate more levels, which explains the use of std::vector for ClusterSet

If yes then current approach is perfect else would it be better for TxGraph to maintain a std::array of two ClusterSet? mainandstaging and store their indexes in TxGraph as enums? This way, Cluster would only need to maintain that enum.

In ClusterSet, we could introduce a boolean to indicate whether a set is active or not.

The main set could always remain active, while staging would be flexible and toggleable.

Thus, when we start staging, we can simply mark the back array as active, and once done, clear the back array and mark it as inactive.

I believe this approach would reduce Assumes since we explicitly know we can only have two ClusterSet’s.

in src/txgraph.cpp:532 in e19bbc3282 outdated

451+
452+std::vector<Cluster*> TxGraphImpl::GetConflicts() const noexcept
453+{
454+    int level = m_clustersets.size() - 1;
455+    std::vector<Cluster*> ret;
456+    // All Clusters at level-1 containing transactions in m_removed are conflicts.

instagibbs commented at 7:18 pm on February 14, 2025:

i.e., any (P, R)? If so, putting in comment would be nice

sipa commented at 10:18 pm on February 21, 2025:

Indeed, done.

in src/txgraph.cpp:539 in e19bbc3282 outdated

457+    for (auto i : m_clustersets[level].m_removed) {
458+        auto& entry = m_entries[i];
459+        Assume(entry.m_locator[level - 1].IsPresent());
460+        ret.push_back(entry.m_locator[level - 1].cluster);
461+    }
462+    // Then go over all Clusters at this level, and find their conflicts.

instagibbs commented at 7:19 pm on February 14, 2025:

i.e., any (P, P)? If so, putting in comment would be nice

sipa commented at 10:19 pm on February 21, 2025:

Indeed, done.

in src/txgraph.cpp:563 in e19bbc3282 outdated

481+    ptr->m_depgraph = m_depgraph;
482+    ptr->m_mapping = m_mapping;
483+    ptr->m_linearization = m_linearization;
484+    // Insert the new Cluster into the graph.
485+    graph.InsertCluster(to_level, std::move(ret), m_quality);
486+    // Update its Locators (and possibly linearization data in its Entrys).

instagibbs commented at 7:31 pm on February 14, 2025:

and possibly linearization data

e.g. when to_level is 0 and is otherwise acceptable?

sipa commented at 10:19 pm on February 21, 2025:

Yes, but no, because to_level is never 0. I have just dropped it.

in src/txgraph.cpp:125 in e19bbc3282 outdated

100+    /** Get the list of Clusters that conflict with this one (at level-1). */
101+    void GetConflicts(const TxGraphImpl& graph, std::vector<Cluster*>& out) const noexcept;
102+    /** Mark all the Entry objects belonging to this Cluster as missing. The Cluster must be
103+     *  deleted immediately after. */
104+    void MakeTransactionsMissing(TxGraphImpl& graph) noexcept;
105+    /** Remove all transactions in a Cluster. */

instagibbs commented at 7:36 pm on February 14, 2025:

I find this slightly clearer since txns can still exist somewhere

0    /** Remove all transactions from a Cluster. */

sipa commented at 10:19 pm on February 21, 2025:

Done.

in src/txgraph.cpp:905 in e19bbc3282 outdated

826+    int level = cluster->m_level;
827+    Assume(level <= to_level);
828+    // Copy the Cluster from the level it was found at to higher levels, if any.
829+    while (level < to_level) {
830+        // Make the Cluster Acceptable before copying. This isn't strictly necessary, but doing it
831+        // now avoids doing doable work later.

instagibbs commented at 7:59 pm on February 14, 2025:

micro-nit: immediately doable

sipa commented at 10:20 pm on February 21, 2025:

Actually I meant “double”, not “doable”, I think. Fixed.

instagibbs commented at 11:57 pm on February 21, 2025:

hah, was my first guess that I doubted :)

in src/txgraph.cpp:842 in e19bbc3282 outdated

844+    auto& clusterset = m_clustersets[level];
845     auto& to_remove = clusterset.m_to_remove;
846     // Skip if there is nothing to remove.
847     if (to_remove.empty()) return;
848+    // Pull in all Clusters that are not in the top ClusterSet.
849+    for (GraphIndex index : clusterset.m_to_remove) {

instagibbs commented at 8:06 pm on February 14, 2025:

0    for (GraphIndex index : to_remove) {

and elsewhere so we’re only looking at two different to_remove* things vs three?

sipa commented at 10:20 pm on February 21, 2025:

Done.

sipa force-pushed on Feb 14, 2025

in src/txgraph.h:96 in 88454752cd outdated

 98@@ -99,6 +99,11 @@ class TxGraph
 99      *  effect. */
100     virtual void SetTransactionFee(const Ref& arg, int64_t fee) noexcept = 0;
101 
102+    /** TxGraph is internally lazy, and will not compute many things until they are needed.
103+     *  Calling DoWork will compute everything now, so that future operations are fast. This can be
104+     *  invoked while oversized. */

ismaelsadeeq commented at 3:46 pm on February 18, 2025:

It can be called when oversized but for staging DoWork it will be no-op when it is oversized.

sipa commented at 8:45 pm on February 18, 2025:

It might still do something; for example removals may be applied still, and grouping may be calculated. I feel it’s unnecessary to elaborate that much about implementation details in the interface, though.

sipa commented at 8:30 pm on February 18, 2025: member

@ismaelsadeeq

It seems to me that this is designed with the flexibility to accommodate more levels, which explains the use of std::vector for ClusterSet

It’s not really written with the intent of accomodating more levels; it’s more that I found it more elegant to write it generically, because most operations don’t care what level they are operating on. I can see that the increased abstraction may be confusing though; I’d like to hear more reviewer comments on this.

If yes then current approach is perfect else would it be better for TxGraph to maintain a std::array of two ClusterSet? mainandstaging and store their indexes in TxGraph as enums? This way, Cluster would only need to maintain that enum.

In ClusterSet, we could introduce a boolean to indicate whether a set is active or not.

I think it’s more natural to use a vector, which implicitly encodes how many there are, and it means one can always use m_clustersets.back() to refer to the top set, regardless of whether main/staging exist; the alternative would involve checking whether m_clustersets[1].m_active is true or not. Performance isn’t a concern here either; they’re big data structures that aren’t created/destroyed inside tight loops.

The main set could always remain active, while staging would be flexible and toggleable.

Thus, when we start staging, we can simply mark the back array as active, and once done, clear the back array and mark it as inactive.

I believe this approach would reduce Assumes since we explicitly know we can only have two ClusterSet’s.

True, but it would also add a consistency requirement that m_clustersets[0].m_active is always true, for example. And of course, Assumes are never “necessary”; they’re there to help review (by convincing people that if the condition didn’t hold, presumably a test could hit it). If you find an Assume that’s more cluttering than helpful, it can always just be dropped too.

sipa force-pushed on Feb 20, 2025

sipa commented at 5:24 pm on February 20, 2025: member

I have modified (in “add staging support” and further) the TxGraph functions relating to the normalization steps to all take a level argument as input (ApplyRemovals, SplitAll, GroupClusters, ApplyDependencies, and MakeAllAcceptable). This meant a number of repeated sequences like:

0    SplitAll(0);
1    if (m_clustersets.size() == 1) ApplyDependencies();

could be changed into just

0    ApplyDependencies(0);

in src/txgraph.cpp:828 in 913e14e6fd outdated

830+{
831+    Assume(level >= 0 && size_t(level) < m_clustersets.size());
832+    auto& entry = m_entries[idx];
833+    // Search the entry's locators from top to bottom.
834+    for (int l = level; l >= 0; --l) {
835+        // If the locator is missing, dig deeper; it may exist at a lower level.

instagibbs commented at 7:20 pm on February 20, 2025:

micro-nit: documentation was pretty clear in declaration, but just in case…

0        // If the locator is missing, dig deeper; it may exist at a lower level and therefore be implicitly available at this level.

sipa commented at 10:20 pm on February 21, 2025:

Done.

in src/txgraph.cpp:1548 in 913e14e6fd outdated

1575+void TxGraphImpl::AbortStaging() noexcept
1576+{
1577+    Assume(m_clustersets.size() > 1);
1578+    int stage_level = m_clustersets.size() - 1;
1579+    auto& stage = m_clustersets[stage_level];
1580+    // Mark are removed transactions as Missing (so the stage_level locator for these transactions

instagibbs commented at 7:48 pm on February 20, 2025:

s/Mark are/Mark all/

sipa commented at 10:20 pm on February 21, 2025:

Done.

in src/txgraph.cpp:1717 in 913e14e6fd outdated

1578+    int stage_level = m_clustersets.size() - 1;
1579+    auto& stage = m_clustersets[stage_level];
1580+    // Mark are removed transactions as Missing (so the stage_level locator for these transactions
1581+    // can be reused if another staging is created).
1582+    for (auto idx : stage.m_removed) {
1583+        m_entries[idx].m_locator[stage_level].SetMissing();

instagibbs commented at 7:50 pm on February 20, 2025:

0        Assume(m_entries[idx].m_locator[stage_level].IsRemoved());
1        m_entries[idx].m_locator[stage_level].SetMissing();

sipa commented at 10:22 pm on February 21, 2025:

This seems to work in fuzzing, but I’m not entirely convinced this is clearly true. Entries can exist in m_to_remove which are already removed (if they’re unlinked), so I have not taken this.

instagibbs commented at 0:01 am on February 22, 2025:

saw the updated comment/code in SanityCheck, makes sense

in src/txgraph.cpp:1760 in 913e14e6fd outdated

1576+{
1577+    Assume(m_clustersets.size() > 1);
1578+    int stage_level = m_clustersets.size() - 1;
1579+    auto& stage = m_clustersets[stage_level];
1580+    // Mark are removed transactions as Missing (so the stage_level locator for these transactions
1581+    // can be reused if another staging is created).

instagibbs commented at 7:54 pm on February 20, 2025:

it’s an array of Locators, so what else could have been done here? Might be reading too much into this comment.

sipa commented at 10:24 pm on February 21, 2025:

The goal is just that when staging is aborted/committed, and then you start a new staging, the m_locator[1] values will all be cleared already.

in src/txgraph.cpp:2057 in 07632d6ce0 outdated

1773@@ -1755,6 +1774,11 @@ void TxGraphImpl::SanityCheck() const
1774         if (!clusterset.m_deps_to_add.empty()) compact_possible = false;
1775         if (!clusterset.m_to_remove.empty()) compact_possible = false;
1776         if (!clusterset.m_removed.empty()) compact_possible = false;
1777+
1778+        // If m_group_data exists, its m_group_oversized must match m_oversized.
1779+        if (clusterset.m_group_data.has_value()) {
1780+            assert(clusterset.m_oversized == clusterset.m_group_data->m_group_oversized);

instagibbs commented at 8:12 pm on February 20, 2025:

should check that clusterset.m_oversized.has_value()?

sipa commented at 10:24 pm on February 21, 2025:

No, because it’s comparing an std::optional<bool> with a bool, which can only be true if the first isn’t std::nullopt.

in src/txgraph.cpp:308 in fdd68be0ac outdated

302@@ -303,6 +303,8 @@ class TxGraphImpl final : public TxGraph
303         Locator m_locator[MAX_LEVELS];
304         /** The chunk feerate of this transaction in main (if present in m_locator[0]). */
305         FeePerWeight m_main_chunk_feerate;
306+        /** The position this transaction has in the main linearization (if present). */

instagibbs commented at 8:48 pm on February 20, 2025:

This value is set to -1, sometimes, but never acted on with that value. Noting in case you have thoughts on that.

sipa commented at 10:25 pm on February 21, 2025:

Not really. The -1 has no meaning, but hopefully would result in a nice detectable error if it were ever used.

bitcoin deleted a comment on Feb 20, 2025

in src/txgraph.h:149 in 11135464c5 outdated

148+    virtual std::vector<Ref*> GetAncestorsUnion(std::span<const Ref* const> args, bool main_only = false) noexcept = 0;
149     /** Get pointers to all descendants of the specified transaction. If main_only is false and a
150      *  staging graph exists, it is queried; otherwise the main graph is queried. The queried
151      *  graph must not be oversized. Returns {} if arg does not exist in the queried graph. */
152     virtual std::vector<Ref*> GetDescendants(const Ref& arg, bool main_only = false) noexcept = 0;
153+    /** Like GetDescendants, but return the Refs for all transactions in the union of the provided

instagibbs commented at 4:32 pm on February 21, 2025:

nit: might be worthwhile to mention in this and elsewhere that results will be unique

sipa commented at 10:10 pm on February 21, 2025:

It says “(each transaction is only reported once)”, is that not what you mean?

instagibbs commented at 11:35 pm on February 21, 2025:

major reading comprehension issues it seems

instagibbs commented at 5:07 pm on February 21, 2025: member

Looking good 11135464c5c3aef1ce8d2823120467a522ce2c87

I just have conceptual issues with what PostLinerize is guaranteeing us beyond what is explicitly stated in the header file.

Pile of nits/questions otherwise

sipa force-pushed on Feb 21, 2025

sipa commented at 10:25 pm on February 21, 2025: member

Addressed most comments. I will address the PostLinearize related ones later.

instagibbs commented at 0:02 am on February 22, 2025: member

verified changes, just waiting on further PostLinearization commentary

via git range-diff master 11135464c5c3aef1ce8d2823120467a522ce2c87 d5abb86439e79d4adfbfbd46f833268bbca0bf6e

sipa commented at 8:15 pm on March 3, 2025: member

@instagibbs So… it seems that the “optimal linearization, truncate, postlinearize, split” does not actually result in minimal chunks (but it does result in feerate-diagram-optimal ones, and connected ones):

0graph BT;T18["A: 4"];T19["B: 5"];T20["C: 7"];T20-->T18;T20-->T13;T9["D: 7"];T9-->T18;T9-->T19;T13["E: 6"];T13-->T19;

The original (optimal) linearization is [ABEDC]. It’s just a single chunk, so really any ordering is equally good.

Truncation removes C. Postlinearization turns it into [BADE]. However, a 2-chunk alternative exists: [BE,AD].

sipa force-pushed on Mar 3, 2025

sipa commented at 9:54 pm on March 3, 2025: member

I have dropped the QualityLevel::NEEDS_SPLIT_OPTIMAL value, and made the “post-linearize after remove, before split” apply to NEEDS_SPLIT_ACCEPTABLE instead, with updated comments about it.

instagibbs commented at 5:09 pm on March 4, 2025: member

ACK 72a97c0a07ea6e5a95ab37c8d95e1ea02cff8e92

With the weaker claims in the code re:PostLinearization, it works for me, and obviates the need for an additional harness test.

git range-diff master d5abb86439e79d4adfbfbd46f833268bbca0bf6e 72a97c0a07ea6e5a95ab37c8d95e1ea02cff8e92

DrahtBot requested review from jonatack on Mar 4, 2025

DrahtBot requested review from ismaelsadeeq on Mar 4, 2025

ismaelsadeeq approved

ismaelsadeeq commented at 8:29 pm on March 6, 2025: member

Code review ACK 72a97c0a07ea6e5a95ab37c8d95e1ea02cff8e92 🛸

I’ve reviewed each commit and also looked at how the interface is being used by CTxMemPool in the big PR

in src/cluster_linearize.h:23 in 8872339583 outdated

18@@ -19,8 +19,8 @@
19 
20 namespace cluster_linearize {
21 
22-/** Data type to represent transaction indices in clusters. */
23-using ClusterIndex = uint32_t;
24+/** Data type to represent transaction indices in DepGraphs and the clusters they represent. */
25+using DepGraphIndex = uint32_t;

ajtowns commented at 4:52 am on March 14, 2025:

This commit could be a scripted diff.

0-BEGIN VERIFY SCRIPT-
1sed -i 's/Data type to represent transaction indices in clusters./Data type to represent transaction indices in DepGraphs and the clusters they represent./' $(git grep -l 'using ClusterIndex')
2sed -i 's|\<ClusterIndex\>|DepGraphIndex|g' $(git grep -l 'ClusterIndex')
3-END VERIFY SCRIPT-

sipa commented at 9:15 pm on March 19, 2025:

Done.

in src/txgraph.h:77 in e22a0b21f8 outdated

72+    virtual ~TxGraph() = default;
73+    /** Construct a new transaction with the specified feerate, and return a Ref to it. In all
74+     *  further calls, only Refs created by AddTransaction() are allowed to be passed to this
75+     *  TxGraph object (or empty Ref objects). */
76+    [[nodiscard]] virtual Ref AddTransaction(const FeePerWeight& feerate) noexcept = 0;
77+    /** Remove the specified transaction. This is a no-op if the transaction was already removed.

ajtowns commented at 5:31 am on March 14, 2025:

The invalidation behaviour for Refs seems unclear to me? I think the idea is that if you hold onto a Ref you can continue to uniquely refer to a removed transaction, up until the Ref is destructed, at which point the TxGraph is compacted and memory might be reused at that point. This delay then also allows for reordering to occur which can result in optimisations.

I think the intended behaviour could be made a bit clearer in the header file though? It’s not immediately obvious to me under what circumstances the ApplyRemovals() and Compact() are/should be called.

sipa commented at 9:19 pm on March 19, 2025:

As far as the interface is concerned: you get a Ref object from AddTransaction, and as long as the object exists, using it remains valid (which, due to the unlinking in ~TxGraphImpl you’ve suggested I add, may now even outlive the TxGraph). In the initial commit a Ref object may only be destroyed after the transaction has been removed, but this is relaxed in “txgraph: (feature) destroying Ref means removing transaction” (currently 9a04cc18e19aa03d215ce1fb030809d7e6eb98a6).

I’d like to keep the implementation details (like when ApplyRemovals() and Compact() are called) out of the interface definition, but other than that, happy for any suggestions that would clarify things (I’ve adjusted comments a bit, and moved them to the correct commits already).

ajtowns commented at 0:34 am on March 20, 2025:

Maybe it would be helpful to explain in the header that AddTransaction() creates a Ref, which you’re then expected to assign/move into your own container, which may be a subclass containing even more info about the tx, and then that serves as a permanent handle, even as the txgraph gets rearranged internally and that it’s deleted from the graph either with RemoveTransaction explicitly or implicitly by destructing the Ref? Essentially expanding the “Data type used to reference transactions within a TxGraph” comment that’s already there to be a bit more ELI5 maybe?

If you create an empty Ref (ie, construct it directly, not via AddTransaction) is there any way to associate that Ref with a TxGraph or is it forever useless as a Ref? Would it make any sense to have an AddTransaction variant that takes a Ref& argument rather than returning a new Ref? I guess equivalently, is an empty Ref useful for anything other than tests? Maybe “Non-empty Refs can only be created using AddTx” should be more explicit and say “You want to use AddTransaction() rather than constructing a Ref directly”.

sipa commented at 6:10 pm on March 23, 2025:

This works:

0TxGraph::Ref ref;
1...
2ref = txgraph.AddTransaction(fee, size):

So think of an empty Ref as an equivalent to an std::unique_ptr to nullptr or so. It’s potentially useful just so you can have a variable declared as a member in a class or so without immediately initializing it.

It even goes further in that this works even through subclasses:

0class TxMempoolEntry : public TxGraph::Ref { ... }
1
2TxMemoolEntry entry;
3...
4entry = txgraph.AddTransaction(fee, size);

in src/txgraph.cpp:168 in e22a0b21f8 outdated

163+         *  pairs. */
164+        std::vector<std::pair<GraphIndex, GraphIndex>> m_deps;
165+    };
166+
167+    /** The vectors of clusters, one vector per quality level. ClusterSetIndex indexes into each. */
168+    std::vector<std::unique_ptr<Cluster>> m_clusters[int(QualityLevel::NONE)];

ajtowns commented at 6:11 am on March 14, 2025:

std::array<std::vector<std::unique_ptr<Cluster>>,int{QualityLevel::NONE}> m_clusters; might read clearer?

sipa commented at 9:19 pm on March 19, 2025:

Done.

in src/txgraph.cpp:197 in e22a0b21f8 outdated

192+        /** Check if this Locator is present (in some Cluster). */
193+        bool IsPresent() const noexcept { return cluster != nullptr; }
194+    };
195+
196+    /** A class of objects held internally in TxGraphImpl, with information about a single
197+     *  transaction. */

ajtowns commented at 6:29 am on March 14, 2025:

“A class of objects” doesn’t seem like a very useful description? “Internal information about each transaction in a TxGraphImpl” ?

sipa commented at 9:20 pm on March 19, 2025:

Done.

in src/txgraph.cpp:303 in e22a0b21f8 outdated

196+    /** A class of objects held internally in TxGraphImpl, with information about a single
197+     *  transaction. */
198+    struct Entry
199+    {
200+        /** Pointer to the corresponding Ref object if any, or nullptr if unlinked. */
201+        Ref* m_ref{nullptr};

ajtowns commented at 7:06 am on March 14, 2025:

Having each Ref point to the TxGraph and the TxGraph point to (and potentially modify via SwapIndexes) every Ref seems very self-referential, which is probably a bit confusing and not very cache optimal and so forth…

I think you could reduce this to a const Ref* (so that Refs wouldn’t get modified via entry.m_ref) by replacing the use of SwapIndexes on Compact() with a free list/heap of no-longer used entries. Probably not worthwhile though.

sipa commented at 8:53 pm on March 19, 2025:

So this is roughly the thinking that led to this design of the split between Ref and Entry.

The most direct way of accomplishing this is just moving all data currently in Entry into Ref instead, but that requires exposing TxGraphImpl implementation details in the header. It also lacks a way to refer to entries in a stable manner internally. For example, what if a transaction has been marked for removal or dependency adding, and the Ref object is moved by the mempool code… we’d need a way to iterate over all places where a particular Ref is marked, which (I believe) needs several more cyclic references between components.

Another extreme is to have a map from GraphIndex values (which are chosen incrementally and never reused) to Entry objects, making them truly globally and permanently unique identifiers for Refs. No rewriting is ever necessary, as there are no compactions needed, but it implies an additional allocation per transaction, plus probably (even) worse memory locality than the current design.

I think using a freelist-like design can be seen as a variant of this, by storing the map in a support/allocators/pool.h pooled resource, and using map iterators as the graph indexes? I feel that’s probably overkill.

The current design avoids the Entry->Ref map by allocating them continuously, but compacting at stable points, which necessitates a backreference, but that’s still cheaper than an additional map.

in src/txgraph.h:18 in e22a0b21f8 outdated

13+#define BITCOIN_TXGRAPH_H
14+
15+/** No connected component within TxGraph is allowed to exceed this number of transactions. */
16+static constexpr unsigned CLUSTER_COUNT_LIMIT{64};
17+
18+/** Data structure to encapsulate fees, sizes, and dependencies for a set of transactions. */

ajtowns commented at 10:07 am on March 15, 2025:

Might be worth pointing out that the api is designed towards allowing for an implementation that stores the transitive closure of dependencies – when querying, if B spends C, then you can’t differentiate between A spends B and A spends B and C.

sipa commented at 9:20 pm on March 19, 2025:

Done.

in src/txgraph.cpp:1120 in e22a0b21f8 outdated

701+    std::sort(an_clusters.begin(), an_clusters.end());
702+    an_clusters.erase(std::unique(an_clusters.begin(), an_clusters.end()), an_clusters.end());
703+
704+    // Run the union-find algorithm to to find partitions of the input Clusters which need to be
705+    // grouped together. See https://en.wikipedia.org/wiki/Disjoint-set_data_structure.
706+    {

ajtowns commented at 11:47 am on March 16, 2025:

Slightly surprised the “union-find algorithm” section isn’t broken out into a top level class or similar, rather than lambda functions.

sipa commented at 8:59 pm on March 19, 2025:

Yeah, fair point. I will think about this. It’s a bit unfortunate that there is a second union-find implementation added in #31553, but it’s so different (including different per-partition meta-data being kept) that trying to have a single implementation would be too abstract/template-heavy.

sipa commented at 6:16 pm on March 23, 2025:

I tried to encapsulate it in its own class, but it’s hard to give it the same performance (which is relevant in the context of huge reorgs) without making the interface rather involved (needs a “representative” type that contains a pointer to the PartitionData object, separate from the element data type), and even then, the union-find version in Trim() in 31553 would need its own separate implementation still.

Going to leave it for now, we can think about cleanups later.

in src/txgraph.cpp:2089 in e22a0b21f8 outdated

1141+
1142+TxGraph::Ref::~Ref()
1143+{
1144+    if (m_graph) {
1145+        // Inform the TxGraph about the Ref being destroyed.
1146+        m_graph->UnlinkRef(m_index);

ajtowns commented at 4:35 pm on March 17, 2025:

I think this gives you a segfault when aborting the program if the TxGraph is destructed prior to some of its outstanding Ref’s being destructed. It might be worthwhile having the TxGraph destructor iterate through any txs that haven’t already been unlinked and clear the m_graph pointers back to itself. Alternatively, the TxGraph destructor could issue an assertion failure if there are any linked Refs remaining.

sipa commented at 9:00 pm on March 19, 2025:

Done. I’ve made ~TxGraphImpl clean up all Ref::m_graph pointers.

ajtowns commented at 5:30 pm on March 17, 2025: contributor

This seems fairly difficult to review – there’s a lot of mostly implicit background in what TxGraph does and why it does it that way, as evidenced by both the description in this PR and in the review club.

When I started typing this comment, I was wondering if maybe it would make sense to make the commits here into more of a gentle introduction to TxGraph, rather than one commit that does everything, then optimisations. But I don’t really think that approach would survive subsequent modifications to the code very well. Likewise for just adding more comments to the code.

Alternatively, I wonder if it would be helpful both for new devs and debugging to expose the TxGraph API as a python module, possibly with additional debugging handles so that you can examine internal state? I could imagine an interactive python notebook that allows you to interact with a real TxGraph being a useful experimental tool for learning/writing mempool code and also being something that could perhaps be reasonably maintained. Here’s a rough take at what that might look like. It’s using boost-python, which seems reasonable, but doesn’t seem super friendly to pure-virtual classes (hence the tedious wrapper class) ~~or move-only objects (hence AddTransaction being wrapped via a constructor)~~. (EDIT: add link for “here’s”; EDIT2: converting the vector<Ref*> stuff to python made the move-only Ref stuff irrelevant; the wrapper class isn’t quite so tedious any more either really)

Comments below are mostly a bunch of nits/observations while familiarising myself with the code, feel free to ignore.

in src/txgraph.cpp:112 in 72a97c0a07 outdated

107+    /** Given a DepGraphIndex into this Cluster, find the corresponding GraphIndex. */
108+    GraphIndex GetClusterEntry(DepGraphIndex index) const noexcept { return m_mapping[index]; }
109+    /** Only called by Graph::SwapIndexes. */
110+    void UpdateMapping(DepGraphIndex cluster_idx, GraphIndex graph_idx) noexcept { m_mapping[cluster_idx] = graph_idx; }
111+    /** Push changes to Cluster and its linearization to the TxGraphImpl Entry objects. */
112+    void Updated(TxGraphImpl& graph) noexcept;

ajtowns commented at 2:01 pm on March 18, 2025:

Most of the Cluster methods require a TxGraphImpl to keep the quality/staging/mapping things consistent; feels a bit like it might be simpler just to have almost all this code stay in TxGraphImpl?

sipa commented at 8:32 pm on March 19, 2025:

See my comments below: #31363 (comment) about how the division between the two follows whether or not the function needs access to Cluster’s internals (which may become dependent on the subclass).

in src/txgraph.cpp:27 in 72a97c0a07 outdated

22+
23+/** The maximum number of levels a TxGraph can have (0 = main, 1 = staging). */
24+static constexpr int MAX_LEVELS{2};
25+
26+// Forward declare the TxGraph implementation class.
27+class TxGraphImpl;

ajtowns commented at 3:22 pm on March 18, 2025:

I think this is a close to accurate diagram of the data structures here:

txgraph excalidraw

To me, the cycles in that graph seem like a code smell. The GraphIndex backrefs make sense (or are unavoidable), as some stable way to refer to each tx is needed. But, I think it would probably be better for a Cluster not to be aware of its quality level or staging level (removing the m_setindex/m_quality/m_level back links) and moving the code that would query/change those items into TxGraph instead.

I think you could think of GraphIndex as the unique global reference to a tx in a TxGraph, in which case Ref could be viewed as an RAII view of a GraphIndex (which enforces the need for a reference to TxGraph to allow for the resource to be freed on destruction), and which also copes with GraphIndex values being updated.

sipa commented at 8:31 pm on March 19, 2025:

See my comment below: #31363 (comment) regarding Cluster becoming an abstract base class.

ajtowns commented at 0:41 am on March 20, 2025:

Sounds like this is something that can be reconsidered when we get a PR that introduces variant Clusters then. Mark as resolved for now?

sipa commented at 11:57 pm on March 21, 2025:

Marking resolved.

in src/txgraph.cpp:240 in 72a97c0a07 outdated

235+         *  m_group_data->m_group_oversized, but may be known even if m_group_data is not. */
236+        std::optional<bool> m_oversized{false};
237+    };
238+
239+    /** The ClusterSets in this TxGraphImpl. Has exactly 1 (main) or exactly 2 elements (main and staged). */
240+    std::vector<ClusterSet> m_clustersets;

ajtowns commented at 4:14 pm on March 18, 2025:

ClusterSet m_main_clusterset; std::optional<ClusterSet> m_staging_clusterset; might be clearer – the vector stuff seems more confusing than helpful?

Does it ever make sense to have multiple nested staging levels? Parallel staging levels could make sense (I want to compare three RBFs in parallel on multiple cores, having checked that they’re all independent and don’t interact with each others’ clusters) but seems probably more complicated than would be worthwhile.

instagibbs commented at 6:23 pm on March 18, 2025:

I ran into a somewhat contrived use-case, but I have a fairly strong YNGNI feeling of this aspect of the code. That said, I don’t think it hurts understanding too much as is.

ismaelsadeeq commented at 6:28 pm on March 18, 2025:

Curious to know this contrived use case, I also think it’s a bit of an overkill in a previous comment #31363#pullrequestreview-2618143580, but @sipa made a compelling argument for it #31363 (comment)

ajtowns commented at 9:31 am on March 19, 2025:

I think StartStaging() gets a bit simpler if you know that it can only be called with m_clustersets.size()==1:

0 void TxGraphImpl::StartStaging() noexcept
1 {
2+    Assume(m_clustersets.size() == 1);  // vs < MAX_LEVELS
3+    SplitAll(0); // vs m_clustersets.size() - 1
4+    ApplyDependencies(0); // ditto
5     m_clustersets.emplace_back();
6+    auto& stage = m_clustersets[1]; // vs m_clustersets.back()
7+    auto& main = m_clustersets[0]; // vs *(m_clustersets.rbegin() + 1);
8     ...

That’s the only use of m_clustersets.rbegin(), the remaining API of m_clustersets seems to be:

 0    class MainStageClusterSets
 1    {
 2    private:
 3        std::vector<ClusterSet> cs;
 4
 5    public:
 6        MainStageClusterSets()
 7        {
 8            cs.reserve(MAX_LEVELS);
 9            cs.emplace_back();
10        }
11
12        bool UnfinishedTodo() const {
13            for (auto& clusterset : cs) {
14                if (!clusterset.m_deps_to_add.empty()) return true;
15                if (!clusterset.m_to_remove.empty()) return true;
16                if (!clusterset.m_removed.empty()) return true;
17            }
18            return false;
19        }
20
21        const ClusterSet& operator[](size_t i) const { return cs[i]; }
22        ClusterSet& operator[](size_t i) { return cs[i]; }
23        ClusterSet& back() { return cs.back(); }
24        size_t size() const { return cs.size(); }
25
26        void emplace_back() { cs.emplace_back(); }
27        void pop_back() { cs.pop_back(); }
28    };
29    MainStageClusterSets m_clustersets;

which seem pretty straightforward to switch to a main/optional-stage model. Doing some renaming would make sense if so (emplace_back/pop_back to add/remove_staging; back to work maybe; size to has_staging maybe; maybe a templated ForEach() accepting a lambda taking a ClusterSet for iteration).

sipa commented at 9:22 pm on March 19, 2025:

I’ve done something like this; std::vector<ClusterSet> is gone, and the code reduced by a whopping 23 lines as a result.

in src/txgraph.cpp:1887 in 72a97c0a07 outdated

1882+        // Check that the Entry has a locator pointing back to this Cluster & position within it.
1883+        assert(entry.m_locator[level].cluster == this);
1884+        assert(entry.m_locator[level].index == lin_pos);
1885+        // For top-level entries, check linearization position and chunk feerate.
1886+        if (level == 0 && IsAcceptable()) {
1887+            assert(entry.m_main_lin_index == linindex++);

ajtowns commented at 4:56 pm on March 18, 2025:

Avoid side-effects inside asserts if possible?

sipa commented at 9:22 pm on March 19, 2025:

Done.

ajtowns commented at 6:03 pm on March 18, 2025: contributor

Preliminary ACK 72a97c0a07ea6e5a95ab37c8d95e1ea02cff8e92

I think it’s okay to merge this code:

as of this PR, it isn’t actually used, so should not introduce bugs immediately
the fuzz testing is pretty good, so there should not be any catastrophic bugs even if it were used
while maybe the design could be improved, that can be done incrementally after it’s merged
I’m reasonably confident I understand what’s going on and can help find/fix bugs/problems if they occur
I think the design and implementation are a good step forward for implementing cluster mempool

On the other hand, calling this “preliminary” as:

I’m not yet confident that I could maintain this code if its authors all suddenly vanished
I’m a bit concerned that to contribute to mempool code you might need to be a triple expert – C++, bitcoin, and industrial optimisation theory; not a show stopper, but as further PRs progress I’d like to gain confidence either that the optimisation parts can be safely treated as a very stable black box, or (ideally and) that we can reasonably onboard people to understand how the internals of the black box work.

sipa commented at 7:12 pm on March 18, 2025: member

@ajtowns Responding to some of the more conceptual comments

I’m indeed not entirely happy with the cycles between TxGraphImpl and Cluster, though there is a design concern that isn’t really materialized in the current code yet: Cluster may become a virtual class with multiple implementations for the purpose of memory usage (e.g., there could be a specialized SingletonCluster which just stores a single transaction, and no DepGraph, linearization, …, avoiding several vectors and their allocation overhead). The current code is designed to accommodate that, by putting code in Cluster member functions that need to have access to the internals, while having the generic code in TxGraphImpl. It’s still not trivial to introduce, because cluster merging/splitting needs to somehow work for all combination of cluster types, but to the extent I can say without actually having implemented it, I think it’s fairly close to that. I’m happy to hear about alternative approaches.
The above relates somewhat to the m_setindex/m_quality/m_level not feeling like they deserve to be dealt with in Cluster: indeed, they’re per-Cluster data, but not part of the “cluster implementation specific” code, so I was imagining these (the variables, and functions relating to them) to be part of the abstract base class for Cluster, and not in the Cluster derived classes. It would be possible to move the related functions to TxGraphImpl instead, but if this virtual Cluster design appears, maybe having it in the Cluster base class is actually more clear anyway?
Regarding Assume / Assert / assert. It’s true that semantically speaking, Assume() is always evaluated (so it’s fine to have side effects inside). However, it’s also the case that if the compiler can prove a statement is side-effect free, it can optimize the call away in non-debug builds. So I’m using them as a form of cheap assertion that can be used extensively throughout the codebase, most of which without runtime impact in production, compiler-proven to have same side-effects in debug and release builds. Combined with (fuzz) tests with high branch coverage, I believe this gives you the best of both worlds for non-critical assertions that still help reviewers reason about the code (as a “ah, an Assume(), I can be reasonably assured that in fuzz-test-reachable-scenarios, this condition is actually always true”).
Regarding the destruction of Ref objects and removing/unlinking. Note that the behavior changes: in the initial commit you can only destroy Ref objects whose transactions have already been RemoveTransaction()d, so that there is never a need for an inspector function to refer back to it anymore. In a later commit, Ref destroying is changed to automatically trigger a removal in the linked TxGraph if any (in both main and staging, which RemoveTransaction can’t do, as it only works on the top level).

The current design tries very hard to make sure that under all circumstances, TxGraph is internally consistent (including when destroying non-removed Ref objects, which necessitates the m_graph pointer inside), but the responsibility to make sure the graph is externally consistent with the mempool lies outside. However, it doesn’t actually achieve this design entirely when considering locking: one could accidentally destroy a Ref without holding whatever lock protects the TxGraph object, for example, and if we need to restrict to only supporting “sane” behavior, maybe an alternative design is possible, where instead of TxGraph::Ref::~Ref, we have a TxGraph::WipeTransaction(Ref&) function (which does removing/unlinking/compacting as needed), and the actual Ref destructor then just asserts(false); if not wiped first? This would also remove the need for having an m_graph inside a Ref (or making it debug-only).
Regarding having a ClusterSet m_main_clusterset; std::optional<ClusterSet> m_staging_clusterset instead of a vector of ClusterSets (also tagging @instagibbs and @ismaelsadeeq who have commented about this). There is absolutely no intention of trying to accommodate multiple nested (or parallel) levels, and the design isn’t aiming for that. I just found it more elegant to largely avoid treating main and staging as separate concepts, and instead see them as two copies of the same thing. I haven’t tried the alternative, and there is probably a bunch of “interact with level below” that can become a bit more simple/specialized, but maybe also more logic for picking between m_main_clusterset and m_staging_clusterset (which aren’t even the same type). If you all think it’s worth trying, I’m happy to look into it.

ajtowns commented at 12:10 pm on March 19, 2025: contributor

However, it’s also the case that if the compiler can prove a statement is side-effect free, it can optimize the call away in non-debug builds.

That makes sense. Might be a good idea to update doc/developer-notes.md pointing out that use case? It currently says:

Assume should be used to document assumptions when program execution can safely continue even if the assumption is violated.

sipa force-pushed on Mar 19, 2025

DrahtBot added the label CI failed on Mar 19, 2025

DrahtBot commented at 9:08 pm on March 19, 2025: contributor

🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/39067477614

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

sipa force-pushed on Mar 19, 2025

sipa commented at 9:34 pm on March 19, 2025: member

Big change: std::vector<ClusterSet> is gone, replaced by m_main_clusterset and m_staging_clusterset.

New helper functions GetTopLevel(), GetSpecifiedLevel(), GetClusterSet() made this fairly easy to do.
Simplifications resulted in ClearLocator, GetConflicts, PullIn, MakeTransactionsMissing, StartStaging, CommitStaging, ExtractCluster.
CopyTo was renamed to CopyToStaging.
LevelDown was renamed to MoveToMain.
MakeTransactionsMissing was renamed to MakeStagingTransactionsMissing.

Other changes:

Various comments
Ref objects are now allowed to outlive TxGraph.
GetChunkFeerate (and its testing, plus Entry::m_chunk_feerate maintenance) was moved to a separate commit.

DrahtBot removed the label CI failed on Mar 20, 2025

in src/txgraph.h:117 in 63b8f96e67 outdated

128+     *  configured maximum cluster count). If main_only is false and a staging graph exists, it is
129+     *  queried; otherwise the main graph is queried. Some of the functions below are not available
130+     *  for oversized graphs. The mutators above are always available. Removing a transaction by
131+     *  destroying its Ref while staging exists will not clear main's oversizedness until staging
132+     *  is aborted or committed. */
133+    virtual bool IsOversized(bool main_only = false) noexcept = 0;

ajtowns commented at 6:50 am on March 20, 2025:

If I’m understanding right, this seem like it makes IsOversized() largely unfixable by the client – so the only reasonable strategy I can see is to do:

0    StartStaging();
1    // change transactions
2    if (IsOversized() || otherwise_undesirable()) {
3        AbortStaging();
4    } else {
5        CommitStaging();`
6    }

That’s a very reasonable strategy of course, so this isn’t a complaint! Just that if so, it seems like it should be documented a bit more clearly? I wonder a bit if CommitStaging() should have an Assume(G_FUZZING || !m_oversized) or something…

(In particular, if you ever end up with the main graph being oversized, I don’t see how you can even discover which txs are contributing to the oversized-ness in order to remove them – GetCluster(ref) will fail the m_deps_to_add.empty() check I think, so you can’t use that, and if you do you’ll just get a subset of the actual cluster that is within limits; same with GetDescendants)

ajtowns commented at 7:08 am on March 20, 2025:

cluster mempool: introduce TxGraph #31363

1. Overview

2. Interface

3. Implementation

4. Future work

Code Coverage & Benchmarks

Reviews

Conflicts