This introduces the TxGraph class, which encapsulates knowledge about the (effective) fees, sizes, and dependencies between all mempool transactions, but nothing else. In particular, it lacks knowledge about CTransaction, inputs, outputs, txids, wtxids, prioritization, validatity, policy rules, and a lot more. Being restricted to just those aspects of the mempool makes the behavior very easy to fully specify (ignoring the actual linearizations produced), and write simulation-based tests for (which are included in this PR).
2. Interface
The interface can be largely categorized into:
Mutation functions:
AddTransaction (add a new transaction with specified feerate, and get a Ref object back to identify it).
RemoveTransaction (given a Ref object, remove the transaction).
AddDependency (given two Ref objects, add a dependency between them).
SetTransactionFee (modify the fee associated with a Ref object).
Inspector functions:
GetAncestors (get the ancestor set in the form of Ref* pointers)
GetAncestorsUnion (like above, but for the union of ancestors of multiple Ref* pointers)
GetDescendants (get the descendant set in the form of Ref* pointers)
GetDescendantsUnion (like above, but for the union of ancestors of multiple Ref* pointers)
GetCluster (get the connected component set in the form of Ref* pointers, in the order they would be mined).
GetIndividualFeerate (get the feerate of a transaction)
GetChunkFeerate (get the mining score of a transaction)
CountDistinctClusters (count the number of distinct clusters a list of Refs belong to)
Staging functions:
StartStaging (make all future mutations operate on a proposed transaction graph)
CommitStaging (apply all the changes that are staged)
AbortStaging (discard all the changes that are staged)
Miscellaneous functions:
DoWork (do queued-up computations now, so that future operations are fast)
This TxGraph::Ref type used as a “handle” on transactions in the graph can be inherited from, and the idea is that in the full cluster mempool implementation (#28676, after it is rebased on this), CTxMempoolEntry will inherit from it, and all actually used Ref objects will be CTxMempoolEntrys. With that, the mempool code can just cast any Ref* returned by txgraph to CTxMempoolEntry*.
3. Implementation
Internally the graph data is kept in clustered form (partitioned into connected components), for which linearizations are maintained and updated as needed using the cluster_linearize.h algorithms under the hood, but this is hidden from the users of this class. Implementation-wise, mutations are generally applied lazily, appending to queues of to-be-removed transactions and to-be-added dependencies, so they can be batched for higher performance. Inspectors will generally only evaluate as much as is needed to answer queries, with roughly 5 levels of processing to go to fully instantiated and acceptable cluster linearizations, in order:
ApplyRemovals (take batches of to-be-removed transactions and translate them to “holes” in the corresponding Clusters/DepGraphs).
SplitAll (creating holes in Clusters may cause them to break apart into smaller connected components, so make turn them into separate Clusters/linearizations).
GroupClusters (figure out which Clusters will need to be combined in order to add requested to-be-added dependencies, as these may span clusters).
ApplyDependencies (actually merge Clusters as precomputed by GroupClusters, and add the dependencies between them).
MakeAcceptable (perform the LIMO linearization algorithm on Clusters to make sure their linearizations are acceptable).
4. Future work
This is only an initial version of TxGraph, and some functionality is missing before #28676 can be rebased on top of it:
The ability to get comparative feerate diagrams before/after for the set of staged changes (to evaluate RBF incentive-compatibility).
Mining interface (ability to iterate transactions quickly in mining score order) (see #31444).
Eviction interface (reverse of mining order, plus memory usage accounting) (see #31444).
Ability to fix oversizedness of clusters (before or after committing) - this is needed for reorgs where aborting/rejecting the change just is not an option (see #31553).
Interface for controlling how much effort is spent on LIMO. In this PR it is hardcoded.
Then there are further improvements possible which would not block other work:
Making Cluster a virtual class with different implementations based on transaction count (which could dramatically reduce memory usage, as most Clusters are just a single transaction, for which the current implementation is overkill).
The ability to have background thread(s) for improving cluster linearizations.
DrahtBot
commented at 3:59 pm on November 24, 2024:
contributor
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts
Reviewers, this pull request conflicts with the following ones:
#30605 (Cluster linearization: separate tests from tests-of-tests by sipa)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
sipa force-pushed
on Nov 24, 2024
DrahtBot
commented at 4:20 pm on November 24, 2024:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot added the label
CI failed
on Nov 24, 2024
sipa force-pushed
on Nov 24, 2024
sipa force-pushed
on Nov 24, 2024
DrahtBot removed the label
CI failed
on Nov 24, 2024
sipa force-pushed
on Nov 25, 2024
sipa force-pushed
on Nov 25, 2024
sipa force-pushed
on Nov 25, 2024
glozow added the label
Mempool
on Nov 28, 2024
sipa force-pushed
on Dec 2, 2024
sipa force-pushed
on Dec 4, 2024
jonatack
commented at 2:22 pm on December 5, 2024:
member
Concept ACK
in
src/txgraph.cpp:258
in
18184f4e73outdated
225+ Ref* m_ref;
226+ /** Which Cluster and position therein this Entry appears in. ([0] = main, [1] = staged). */
227+ Locator m_locator[MAX_LEVELS];
228+ /** The chunk feerate of this transaction in main (if present in m_locator[0]) */
229+ FeeFrac m_main_chunk_feerate;
230+ /** The position this transaction in the main linearization (if present). /*/
instagibbs
commented at 3:25 pm on December 18, 2024:
0 /** The position this transaction in the main linearization (if present). **/
The src/txgraph.h file added in that commit, as it defines the public interface.
The src/test/fuzz/txgraph.cpp test added in the next commit (txgraph: (tests) add simulation fuzz test), as it does to an extent show how the interface is used, and what is expected of it.
The data structures (TxGraphImpl and Cluster) in src/txgraph.cpp, and the internal functions corresponding to the 5 processing steps that happen:
Batch removal of queued transaction removals: TxGraphImpl::ApplyRemovals and Cluster::ApplyRemovals.
Splitting of clusters into components after removing transactions: TxGraphImpl::SplitAll, TxGraphImpl::Split, and Cluster::Split
Computing which clusters will need to be combined due to queued dependencies being added between them: TxGraphImpl::GroupClusters.
Merging of clusters and applying queued dependencies to them: TxGraphImpl::ApplyDependencies, TxGraphImpl::Merge, Cluster::Merge, Cluster::ApplyDependencies.
Making the linearization of clusters acceptable: TxGraphImpl::MakeAcceptable and Cluster::Relinearize.
The interface implementation functions that are built on top in TxGraphImpl: AddTransaction, RemoveTransaction, AddDependency, SetTransactionFee, Cleanup, GetAncestors, GetDescendants, GetCluster, GetChunkFeerate, etc.
sipa force-pushed
on Dec 22, 2024
sipa
commented at 5:23 pm on December 22, 2024:
member
I have pushed a substantial change to the TxGraphImpl::GroupClusters function (in commit “txgraph: (feature) add initial version”). In a benchmark with 64000 transactions being combined into 1000 clusters, this makes it go from ~160ms to ~16ms. I will add the benchmark to this PR when I clean it up.
ismaelsadeeq
commented at 7:33 pm on December 24, 2024:
member
Concept ACK
GetChunkFeerate (get the mining score of a transaction)
From reading the description and skimming through the code, IIUC TxGraph encapsulates knowledge about fees and sizes but lacks knowledge about prioritization. This implies that the fee it uses is the effective fee, not the modified fee that includes prioritization.
How are we accounting for the real mining score of a Ref without that prioritization knowledge? The current mining algorithm uses the modified ancestor fee to calculate the mining score.
Could you clarify how prioritization will be handled post-cluster mempool?
sipa
commented at 7:41 pm on December 24, 2024:
member
@ismaelsadeeq TxGraph (and DepGraph, and FeeFrac) just treat “fee” and “size” as numbers whose ratio is to be maximized. These classes don’t care what meaning they correspond to.
In practice, I expect that the mempool code will provide the modified fee as “fee”, and vsize or weight as “size”.
ismaelsadeeq
commented at 8:37 pm on December 24, 2024:
member
Thank you for clarifying. Reading ’lacks knowledge about prioritization’ led me to infer that.
But I see now you are talking about mapDeltas
sipa force-pushed
on Jan 8, 2025
in
src/txgraph.h:124
in
972df15af2outdated
134- virtual bool IsOversized() noexcept = 0;
135- /** Get the feerate of the chunk which transaction arg is in. Returns the empty FeeFrac if arg
136- * does not exist. The graph must not be oversized. */
137- virtual FeeFrac GetChunkFeerate(const Ref& arg) noexcept = 0;
138+ virtual bool IsOversized(bool main_only = false) noexcept = 0;
139+ /** Get the feerate of the chunk which transaction arg is in in the main graph. Returns the
ismaelsadeeq
commented at 6:28 pm on January 9, 2025:
nit when retouching
In " txgraph: (feature) add staging support" 972df15af28ba6787f096162f776b2973342e674
0 /** Get the feerate of the chunk which transaction arg is in the main graph. Returns the
104+ virtual FeeFrac GetChunkFeerate(const Ref& arg) noexcept = 0;
105+ /** Get the individual transaction feerate of transaction arg. Returns the empty FeeFrac if
106+ * arg does not exist. */
107+ virtual FeeFrac GetIndividualFeerate(const Ref& arg) noexcept = 0;
108+ /** Get pointers to all transactions in the connected component ("cluster") which arg is in.
109+ * The transactions will be returned in a topologically-valid order of acceptable quality.
ismaelsadeeq
commented at 6:29 pm on January 9, 2025:
optional nit when their is need to retouch
In “txgraph: (feature) add initial version” 0c8dc2323eb1ec34357a807f0860cf0a08a63a75
you mean QualityLevel::ACCEPTABLE right we could just reference the enum like you were doing txgraph.cpp?
0 * The transactions will be returned in a topologically-valid order of acceptable quality (QualityLevel::ACCEPTABLE).
Would it go too far to assure the caller that the ordering reflects the underlying stored linearization i.e. reflects the underlying ordering values like chunk feerates are coming from?
Doesn’t “topologically-valid order” already cover what “acceptable quality” means here, as far as the interface is concerned?
ismaelsadeeq
commented at 9:53 am on March 20, 2025:
“acceptable quality” is more than just a topologically valid order.
It refers to a linearization that may not be optimal but is the best we can achieve for that cluster within resource limitations.
in
src/cluster_linearize.h:1372
in
1c5f1c4601outdated
1353+ // Figure out which elements elem needs to be placed before.
1354+ SetType place_before = done & depgraph.Ancestors(elem);
1355+ // Find which position to place elem in (updating j), continuously moving the elements
1356+ // in between forward.
1357+ while (place_before.Any()) {
1358+ auto to_swap = linearization[len - 1 - (j - 1)];
ismaelsadeeq
commented at 6:35 pm on January 9, 2025:
In “clusterlin: add FixLinearization function + fuzz test” 1c5f1c4601e0a895258cfe073125361f7a6ea012
Is it safe to get the ClusterIndex when j is 0, we will subtract 1 from uint32_t which will result to a large positive number?
j cannot be 0. If it were, there is necessarily nothing elem needs to be moved before anymore, and place_before would be empty, and the loop would have terminated.
yancyribbens
commented at 0:48 am on February 26, 2025:
The comment // j cannot be 0 here; implies this should be j != 0 instead of j > 0. Since it’s unsigned j != 0 is also correct.
sipa force-pushed
on Jan 9, 2025
sipa
commented at 9:49 pm on January 9, 2025:
member
A few changes:
Added a TxGraph::DoWork() functions which performs queued-up computations now, so that future operations are fast.
Make Ref& arguments to TxGraph::AddDependency, TxGraph::RemoveTransaction, and TxGraph::SetTransactionFee const. They do not modify the Ref itself; they only modify what the Ref points to.
sipa
commented at 9:14 pm on January 16, 2025:
member
Added an additional inspector function:
CountDistinctClusters, which efficiently counts how many distinct clusters a list of specified Refs belong to, for helping with enforcing RBF cluster policy limits.
sipa force-pushed
on Jan 22, 2025
sipa
commented at 7:59 pm on January 22, 2025:
member
I split out two optimizations to TxGraphImpl::GroupClusters() from the initial commit into their own commits.
Pumpkin1234567812 approved
sipa
commented at 5:28 pm on January 23, 2025:
member
A suggestion for simplifying the removed/destroyed/unlinked/cleanup interface.
Unlinking only happens when the calling code destroys a Ref. Currently, unlinking can happen either through Ref destruction, or through TxGraph::Cleanup() (which unlinks anything not appearing in the graph anymore).
The TxGraphImpl::m_entries vector compaction happens transparently, whenever applicable; there is no TxGraph::Cleanup() any more. This can only happen after the corresponding Ref was destroyed (because that’s now the only way to unlink).
To assist the calling code with cleaning up Refs, a new TxGraph::GetRemoved() function is added, which is a pure inspector, and reports which Refs do not appear in either main or staging (if present) anymore.
instagibbs
commented at 5:35 pm on January 23, 2025:
member
Unlinking only happens when the calling code destroys a Ref. Currently, unlinking can happen either through Ref destruction, or through TxGraph::Cleanup() (which unlinks anything not appearing in the graph anymore).
this would result in:
exists
removed
unlinked + destroyed
as the three possible states?
sipa
commented at 5:38 pm on January 23, 2025:
member
@instagibbs Right. Though “removed/exists” is really a per-graph (main vs staging) thing, while destroyed+unlinked are TxGraph-wide.
Maybe it should not be called GetRemoved(), as that term is ambiguous. How about GetUnusedRefs or so?
sipa
commented at 9:42 pm on January 23, 2025:
member
Actually, having this GetUnusedRefs() is pretty annoying (read: would require extra per-transaction memory) to be efficiently implementable, and talking to @sdaftuar it does not seem important if Trim() can report what it is deleting (because that’s really the only place where TxGraphImpl makes a decision about which transactions belong to the graph).
sipa force-pushed
on Jan 24, 2025
sipa
commented at 10:14 pm on January 24, 2025:
member
Some changes:
Cleanup is gone. Compaction now happens automatically and transparently, but the caller is responsible for destroying Refs when they are no longer needed.
Avoid re-Group-ing when trying to determine oversizedness while it is already known.
DrahtBot
commented at 11:23 pm on January 24, 2025:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
DrahtBot added the label
CI failed
on Jan 24, 2025
sipa force-pushed
on Jan 26, 2025
sipa
commented at 4:41 am on January 26, 2025:
member
Split out a memory optimization in GroupClusters to its own commit.
Moved caching of oversizedness to its own commit.
sipa force-pushed
on Jan 26, 2025
sipa
commented at 8:00 pm on January 26, 2025:
member
Remove the TxGraph::Ref::operator bool(); it has little use now that cleanup is implicit.
Add assertion that cluster count limit is at least 1.
Modify fuzz test to use TxGraph::Ref pointers everywhere rather than references; I realized the existing (test) code was not entirely kosher (could sometimes use a reference to an object that was already destroyed).
DrahtBot removed the label
CI failed
on Jan 26, 2025
in
src/test/fuzz/cluster_linearize.cpp:1148
in
83d61884b4outdated
1143+ uint64_t val{0};
1144+ try {
1145+ reader >> VARINT(val);
1146+ } catch (const std::ios_base::failure&) {}
1147+ val %= todo.Count();
1148+ // Find which element in todo that corresponds to.
ismaelsadeeq
commented at 6:59 pm on January 27, 2025:
In 83d61884b41b2a2be620fbe35736b433c93d4c76
I find it a bit difficult to follow this comment
/** How long the prefix of the constructed linearization is which is topological. */
??
// Find which element in todo that corresponds to.
I have largely rewritten this, and expanded comments. Please have a look and let me know if it is clearer now.
ismaelsadeeq
commented at 10:08 pm on February 10, 2025:
Yes this is much better now!
Thanks
in
src/txgraph.cpp:32
in
6741590e2aoutdated
23+class TxGraphImpl;
24+
25+/** Position of a ClusterIndex within a Cluster::m_linearization. */
26+using LinearizationIndex = uint32_t;
27+/** Position of a Cluster within Graph::m_clusters. */
28+using ClusterSetIndex = uint32_t;
ismaelsadeeq
commented at 7:58 pm on January 27, 2025:
In “txgraph: (feature) add initial version” 6741590e2a4632c4f9b4655679df9694223beac4
In cluster_linearize, we have ClusterIndex, which is defined as the “Data type to represent transaction indices in clusters.” (DepGraph implicitly is a cluster without the linearization and the TxGraph context that Cluster class introduced in this commit has.)
Since we now have a separate Cluster class, would it be clearer to rename ClusterIndex to DepGraphIndex? The type alias names and class names are overlapping
25+/** Position of a ClusterIndex within a Cluster::m_linearization. */
26+using LinearizationIndex = uint32_t;
27+/** Position of a Cluster within Graph::m_clusters. */
28+using ClusterSetIndex = uint32_t;
29+
30+/** Quality levels for cached linearizations. */
ismaelsadeeq
commented at 8:04 pm on January 27, 2025:
In “txgraph: (feature) add initial version” 6741590e2a4632c4f9b4655679df9694223beac4
0/** Quality levels for cached cluster linearizations. */
11@@ -12,8 +12,7 @@
12 #ifndef BITCOIN_TXGRAPH_H
13 #define BITCOIN_TXGRAPH_H
1415-/** No connected component within TxGraph is allowed to exceed this number of transactions. */
16-static constexpr unsigned CLUSTER_COUNT_LIMIT{64};
17+static constexpr unsigned MAX_CLUSTER_COUNT_LIMIT{64};
ismaelsadeeq
commented at 9:21 pm on January 27, 2025:
In “txgraph: (feature) make max cluster count configurable and “oversize” state” 15da6fdd19813a08bf9571147f97e22b5390ecf9
Maybe comment why 64 selected and how having limit above this will hurt the node/miner? fwiw we have seen clusters with size more than 64 in the past https://bitcoincore.reviews/25038, but I think it’s not a regular occurrence.
This code is unused, the current value just affects testing, but it can literally be set to any positive integer, and the code will work (though at degraded performance for higher values). The real question is what to set the max_cluster_count value to in MakeTxGraph(), but I think that discussion belongs in #28676.
sipa force-pushed
on Jan 30, 2025
sipa
commented at 11:47 pm on January 30, 2025:
member
Introduce vsize and weight tagged versions of FeeFrac, to avoid accidental type confusion.
Convert all of txgraph to work using FeePerWeight, rather than bare FeeFrac.
DrahtBot added the label
CI failed
on Jan 31, 2025
sipa force-pushed
on Jan 31, 2025
DrahtBot removed the label
CI failed
on Jan 31, 2025
sipa force-pushed
on Jan 31, 2025
sipa force-pushed
on Feb 1, 2025
sipa force-pushed
on Feb 4, 2025
DrahtBot added the label
CI failed
on Feb 4, 2025
in
src/txgraph.cpp:315
in
68435c7a4foutdated
308+ auto& entry = m_entries[idx];
309+ Assume(entry.m_ref != nullptr);
310+ entry.m_ref = &new_location;
311+ }
312+
313+ /** Only called by Ref::~Ref to unlink Refs. */
ismaelsadeeq
commented at 9:27 pm on February 4, 2025:
In “txgraph: (feature) add initial version” c89d147209c91bb0464321f5bc733a4eeab0dea0
It is also called by the move assignment operator of Ref
DrahtBot removed the label
CI failed
on Feb 4, 2025
in
src/txgraph.cpp:547
in
c89d147209outdated
542+
543+void TxGraphImpl::ApplyRemovals() noexcept
544+{
545+ auto& to_remove = m_to_remove;
546+ // Skip if there is nothing to remove.
547+ if (to_remove.empty()) return;
ismaelsadeeq
commented at 3:12 pm on February 5, 2025:
In “txgraph: (feature) add initial version” c89d147209c91bb0464321f5bc733a4eeab0dea0
We can check whether m_to_remove is empty first so we don’t have to copy the reference when it is, the copied reference is also only used in this if statement, so we can just use the m_to_remove value
Creating a reference doesn’t cost anything (in simple cases); it’s just introducing a new name for an existing object. It’s true that it’s kind of pointless at this stage, but it’ll be used in “group per-graph data in ClusterSet”, and in “add staging support”.
in
src/cluster_linearize.h:1355
in
1fb067066eoutdated
1335@@ -1336,6 +1336,38 @@ std::vector<ClusterIndex> MergeLinearizations(const DepGraph<SetType>& depgraph,
1336 return ret;
1337 }
13381339+/** Make linearization topological, retaining its ordering where possible. */
1340+template<typename SetType>
1341+void FixLinearization(const DepGraph<SetType>& depgraph, Span<ClusterIndex> linearization) noexcept
1342+{
1343+ // This algorithm can be summarized as moving every element in the linearization backwards
1344+ // until it is placed after all this ancestors.
instagibbs
commented at 3:20 pm on February 5, 2025:
in
src/cluster_linearize.h:1364
in
1fb067066eoutdated
1348+ for (ClusterIndex i = 0; i < len; ++i) {
1349+ /** The element at that position. */
1350+ ClusterIndex elem = linearization[len - 1 - i];
1351+ /** j represents how far from the back of the linearization elem should be placed. */
1352+ ClusterIndex j = i;
1353+ // Figure out which elements elem needs to be placed before.
instagibbs
commented at 3:29 pm on February 5, 2025:
0 // Figure out which elements elem needs to be placed after.
in
src/cluster_linearize.h:1371
in
1fb067066eoutdated
1355+ // Find which position to place elem in (updating j), continuously moving the elements
1356+ // in between forward.
1357+ while (place_before.Any()) {
1358+ // j cannot be 0 here; if it was, then there was necessarily nothing earlier which
1359+ // elem needs to be place before anymore, and place_before would be empty.
1360+ Assume(j > 0);
instagibbs
commented at 3:40 pm on February 5, 2025:
in release builds this will result in j-- wrapping around to max value of uint32_t, causing an OOB access? We should probably just stop trying to fix things if it happens?
:man_shrugging: I hope the code is clear enough that reviewers are convinced this is impossible. The Assume hopefully helps with that convincing. I hope an assertion isn’t needed to convince you even more.
in
src/test/fuzz/cluster_linearize.cpp:410
in
a921ae75c2outdated
406@@ -407,7 +407,7 @@ FUZZ_TARGET(clusterlin_depgraph_serialization)
407 SanityCheck(depgraph);
408409 // Verify the graph is a DAG.
410- assert(IsAcyclic(depgraph));
instagibbs
commented at 3:56 pm on February 5, 2025:
a921ae75c2f7162b3b617ddeccb84e3e60728cf8
Should we add a test that completes a cycle then asserts it’s not acyclic to add coverage that way?
40+ GraphIndex m_index = GraphIndex(-1);
41+ public:
42+ /** Construct an empty Ref. Non-empty Refs can only be created using
43+ * TxGraph::AddTransaction. */
44+ Ref() noexcept = default;
45+ /** Destroy this Ref. This is only allowed when it is empty, or the transaction it refers
instagibbs
commented at 4:07 pm on February 5, 2025:
Why does it say earlier “Destroying this TxGraph::Ref removes the corresponding transaction” but here “[destroying this ref] is only allowed when .. the transaction it refers to has been removed from the graph” ? I think the destructor “unlinks” but does not “remove” the tx, so failure to remove before destructing will result in an Assume(!entry.m_locator.IsPresent()) failure in Compact().
@ajtowns That comment was put in the wrong commit. I’ve fixed it.
in
src/txgraph.h:85
in
c89d147209outdated
79+ * its ancestors, are removed as well (which is what always happens in realistic scenarios),
80+ * this reordering will not affect the behavior of TxGraph.
81+ *
82+ * As an example, imagine 3 transactions A,B,C where B depends on A. If a dependency of C on B
83+ * is added, and then B is deleted, C will still depend on A. If the deletion of B is reordered
84+ * before the C->B dependency is added, it has no effect instead. If, together with the
instagibbs
commented at 4:14 pm on February 5, 2025:
comment clarity: “it has no effect” meaning the dependency addition?
154+ /** Which dependencies are to be added ((parent,child) pairs). GroupData::m_deps_offset indexes
155+ * into this. */
156+ std::vector<std::pair<GraphIndex, GraphIndex>> m_deps_to_add;
157+ /** Information about the merges to be performed, if known. */
158+ std::optional<std::vector<GroupEntry>> m_group_data = std::vector<GroupEntry>{};
159+ /** Total number of transactions in this ClusterSet (explicit + implicit). */
instagibbs
commented at 4:59 pm on February 5, 2025:
It’s something that really only has meaning after staging support is added. I have dropped it here, expanded the comment a bit, and expanded on it a lot more in the staging commit.
in
src/txgraph.cpp:252
in
c89d147209outdated
177+
178+ /** A class of objects held internally in TxGraphImpl, with information about a single
179+ * transaction. */
180+ struct Entry
181+ {
182+ /** Pointer to the corresponding Ref object, if any. */
instagibbs
commented at 5:12 pm on February 5, 2025:
sipa
commented at 4:30 am on February 6, 2025:
member
Changed:
Added GetAncestorsUnion and GetDescendantsUnion, which act like their non-union version, but take in a span of multiple Ref*, and return the Ref* for the union of their ancestors/descendants
82+ /** Push changes to Cluster and its linearization to the TxGraphImpl Entry objects. */
83+ void Updated(TxGraphImpl& graph) noexcept;
84+
85+ // Functions that implement the Cluster-specific side of internal TxGraphImpl mutations.
86+
87+ /** Apply any number of removals from the front of to_remove, popping them off. At least one
instagibbs
commented at 5:25 pm on February 6, 2025:
781c15bfca1ebaffe7b634196e19144f5ab10a50
Wasn’t immediately clear what the expected behavior is based on the comment. It will apply and pop off the entries where the prefix of to_remove is all owned by the cluster itself?
301+}
302+
303+void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove) noexcept
304+{
305+ // Iterate over the prefix of to_remove that applies to this cluster.
306+ Assume(!to_remove.empty());
instagibbs
commented at 5:27 pm on February 6, 2025:
UB if this Assume() is wrong, should we Assert() or more gracefully handle it?
I don’t feel that’s necessary. The calling code is wrong if it’s called with an empty to_remove. It’s hopefully sufficient to document this, so reviewers can be convinced this isn’t being violated.
What’s the advantage of using Assume() over Assert()? The condition is still checked in production builds, it’s just that the result is discarded and execution continues, rather than triggering an abort, no? My understanding was:
CHECK_NONFATAL(cond) – to throw when there’s an internal error that prevents continuing in RPC code, but is recoverable overall
Assume(x) – when we expect x to be true-ish, but the code will still operate correctly if it is not
Assert(x) or assert(x) when we require x to be true-ish and the code won’t operate correctly if it is not
You could conceivably have ref.m_graph be a debug-only field, and write Assume(ref.CheckGraph(txgraph)) with bool Ref::CheckGraph(const TxGraph* g) { if constexpr (DEBUG) { return g == m_graph; } else return true; } perhaps; but otherwise I don’t think you’re getting any benefit from Assume vs Assert.
See my comment here: #31363 (comment) about how Assume() gives you compiler-guaranteed side-effect-equivalence, largely without runtime cost.
in
src/txgraph.cpp:599
in
781c15bfcaoutdated
304+{
305+ // Iterate over the prefix of to_remove that applies to this cluster.
306+ Assume(!to_remove.empty());
307+ SetType todo;
308+ do {
309+ GraphIndex idx = to_remove.front();
instagibbs
commented at 5:28 pm on February 6, 2025:
I have added an Assume, but not the break. It just shouldn’t happen, and this is tested extensively in fuzzing (where the Assumes actually have teeth).
in
src/txgraph.cpp:62
in
781c15bfcaoutdated
49+ friend class TxGraphImpl;
50+ using GraphIndex = TxGraph::GraphIndex;
51+ using SetType = BitSet<CLUSTER_COUNT_LIMIT>;
52+ /** The DepGraph for this cluster, holding all feerates, and ancestors/descendants. */
53+ DepGraph<SetType> m_depgraph;
54+ /** m_mapping[i] gives the GraphIndex for the position i transaction in m_depgraph. */
instagibbs
commented at 5:33 pm on February 6, 2025:
51+ using SetType = BitSet<CLUSTER_COUNT_LIMIT>;
52+ /** The DepGraph for this cluster, holding all feerates, and ancestors/descendants. */
53+ DepGraph<SetType> m_depgraph;
54+ /** m_mapping[i] gives the GraphIndex for the position i transaction in m_depgraph. */
55+ std::vector<GraphIndex> m_mapping;
56+ /** The current linearization of the cluster. Size equals m_mapping.TxCount().
instagibbs
commented at 5:39 pm on February 6, 2025:
m_mapping is a vector, I presume this is m_depgraph instead?
349+ std::vector<Cluster*> new_clusters;
350+ bool first{true};
351+ // Iterate over the connected components of this Cluster's m_depgraph.
352+ while (todo.Any()) {
353+ auto component = m_depgraph.FindConnectedComponent(todo);
354+ if (first && component == todo) {
instagibbs
commented at 8:23 pm on February 6, 2025:
might help reader instantly recognize what todo means in this case
360+ }
361+ first = false;
362+ // Construct a new Cluster to hold the found component.
363+ auto new_cluster = std::make_unique<Cluster>();
364+ new_clusters.push_back(new_cluster.get());
365+ // Remember that all the component's transaction go to this new Cluster. The positions
instagibbs
commented at 8:54 pm on February 6, 2025:
0 // Remember that all the component's transactions go to this new Cluster. The positions
419+ }
420+ m_linearization.push_back(new_pos);
421+ // Copy the transaction's dependencies, translating them using remap.
422+ SetType parents;
423+ for (auto par : other.m_depgraph.GetReducedParents(pos)) {
424+ parents.Set(remap[par]);
instagibbs
commented at 9:19 pm on February 6, 2025:
note: remap here is already filled out from a topological pov since we’re iterating over other.m_linearization which is assumed to be topologically valid
441+ // between which dependencies are added, which simply concatenates their linearizations. Invoke
442+ // PostLinearize, which has the effect that the linearization becomes a merge-sort of the
443+ // constituent linearizations. Do this here rather than in Cluster::Merge, because this
444+ // function is only invoked once per merged Cluster, rather than once per constituent one.
445+ // This concatenation + post-linearization could be replaced with an explicit merge-sort.
446+ PostLinearize(m_depgraph, m_linearization);
instagibbs
commented at 9:33 pm on February 6, 2025:
781c15bfca1ebaffe7b634196e19144f5ab10a50
note: I’m not sure this obvious that PostLinearize obtains this property.
It follows from the property that PostLinearize results in chunks that do not consist of multiple disconnected components: if a higher-feerate chunk is appended to a lower-feerate one, it must mean they get separated.
I’m happy to elaborate briefly in this comment, but perhaps this is also not the place to provide deeper more theoretical reasons why the properties hold.
instagibbs
commented at 9:45 pm on February 12, 2025:
I am perhaps taking the comment too literally? I just don’t see how it’s a merge-sort of chunks when a PostLinearize may very well end up changing chunk boundaries. Or maybe it doesn’t? I buy that the documented properties otherwise hold.
Either way, is there a presumed standout benefit to calling this once here, then after dependencies have been applied and linearization fixed?
Late update (discussed elsewhere): this property actually did not hold. Added comments, and partially rewritten.
in
src/txgraph.cpp:118
in
781c15bfcaoutdated
90+ /** Split this cluster (must have a NEEDS_SPLIT* quality). Returns whether to delete this
91+ * Cluster afterwards. */
92+ [[nodiscard]] bool Split(TxGraphImpl& graph) noexcept;
93+ /** Move all transactions from cluster to *this (as separate components). */
94+ void Merge(TxGraphImpl& graph, Cluster& cluster) noexcept;
95+ /** Given a span of (parent, child) pairs that all belong to this Cluster (or be removed),
instagibbs
commented at 9:40 pm on February 6, 2025:
This comment was no longer relevant, it dates from an earlier iteration. Gone.
in
src/txgraph.cpp:726
in
781c15bfcaoutdated
449+ std::sort(to_apply.begin(), to_apply.end(), [](auto& a, auto& b) { return a.second < b.second; });
450+ // Iterate over groups of to-be-added dependencies with the same child.
451+ auto it = to_apply.begin();
452+ while (it != to_apply.end()) {
453+ auto& first_child = graph.m_entries[it->second].m_locator;
454+ DepGraphIndex child_idx = first_child.index;
instagibbs
commented at 9:42 pm on February 6, 2025:
492+
493+ // Clean up space in quality_cluster.
494+ auto max_setindex = quality_clusters.size() - 1;
495+ if (setindex != max_setindex) {
496+ // If the cluster was not the last element of quality_clusters, move that to take its place.
497+ quality_clusters.back()->m_quality = quality;
instagibbs
commented at 9:50 pm on February 6, 2025:
781c15bfca1ebaffe7b634196e19144f5ab10a50
the cluster is from the same quality level, so this is kind of a no-op?
556+ if (cluster != nullptr) {
557+ // If the first to_remove_span entry's Cluster exists, hand to_remove_span to it, so it
558+ // can pop off whatever applies to it.
559+ cluster->ApplyRemovals(*this, to_remove_span);
560+ } else {
561+ // Otherwise, skip this already-removed entry.
instagibbs
commented at 10:18 pm on February 6, 2025:
When the same Ref is RemoveTransaction()ed twice. Added a comment to clarify.
in
src/txgraph.cpp:708
in
781c15bfcaoutdated
353+ auto component = m_depgraph.FindConnectedComponent(todo);
354+ if (first && component == todo) {
355+ // The existing Cluster is an entire component. Leave it be, but update its quality.
356+ graph.SetClusterQuality(m_quality, m_setindex, QualityLevel::NEEDS_RELINEARIZE);
357+ // We need to recompute and cache its chunking.
358+ Updated(graph);
ismaelsadeeq
commented at 8:23 pm on February 10, 2025:
IIUC we call ApplyRemoval on each cluster before splitting, which calls updated with the graph at the end. so it seems to me that this call to updated before returning here is redundant and repeats work that has already been done?
I also think even if the above assumption is incorrect and an update is necessary, then since the cluster quality is now NEEDS_RELINEARIZE, we only need to update the locator.
You’re right, but that’s exactly what the “delay chunking while sub-acceptable” commit is about. In the initial commit, the invariant is that the chunk feerates are maintained at all times (and the sanity check will fail if not). In this “delay chunking while sub-acceptable” commit, some updates are avoided while they are unnecessary. The commit after it (“special-case removal of tail of cluster”) reintroduces some, because with that, performing a Split() can in fact immediately introduce optimal clusters directly.
ismaelsadeeq
commented at 10:07 pm on February 13, 2025:
I see this now 👍🏾
in
src/txgraph.h:83
in
781c15bfcaoutdated
82+ * As an example, imagine 3 transactions A,B,C where B depends on A. If a dependency of C on B
83+ * is added, and then B is deleted, C will still depend on A. If the deletion of B is reordered
84+ * before the C->B dependency is added, it has no effect instead. If, together with the
85+ * deletion of B also either A or C is deleted, there is no distinction.
86+ */
87+ virtual void RemoveTransaction(const Ref& arg) noexcept = 0;
ismaelsadeeq
commented at 8:50 pm on February 10, 2025:
IIUC you are recommending that callers should remove transactions topologically starting from bottom descendant to the top ancestor.
No, the order of removals is irrelevant. The comment is just about pointing out that while TxGraph is allowed to reorder dependency additions and transaction removals w.r.t. each other, this doesn’t matter in practice. As long as every transaction removal is combined (doesn’t matter when) with also removal of all its ancestors, or with all its descendants, this reordering does not affect its observable behavior.
in
src/txgraph.cpp:746
in
781c15bfcaoutdated
391+ for (auto par : m_depgraph.GetReducedParents(i)) new_parents.Set(remap[par].second);
392+ new_cluster->m_depgraph.AddDependencies(new_parents, remap[i].second);
393+ }
394+ // Update all the Locators of moved transactions.
395+ for (Cluster* new_cluster : new_clusters) {
396+ new_cluster->Updated(graph);
ismaelsadeeq
commented at 8:54 pm on February 10, 2025:
Don’t we just need to update the locator here, because all NEEDS_RELINEARIZE clusters has to be made acceptable before we can get their chunk feerate?
Same comment as before: in the initial commit, the invariant is that the chunk feerates are maintained at all times. It is relaxed later on.
ismaelsadeeq
commented at 10:02 pm on February 10, 2025:
member
Code review to 741a6a8c4d851cb10ecee810a09187bcbfa5af4c
in
src/txgraph.cpp:928
in
781c15bfcaoutdated
594+ if (!m_deps_to_add.empty()) return;
595+ if (!m_to_remove.empty()) return;
596+
597+ // Sort the GraphIndexes that need to be cleaned up. They are sorted in reverse, so the last
598+ // ones get processed first. This means earlier-processed GraphIndexes will not cause moving of
599+ // later-processed ones during the "swap with end of m_entries" step (which might invalidate
instagibbs
commented at 2:59 pm on February 11, 2025:
781c15bfca1ebaffe7b634196e19144f5ab10a50
0 // later-processed ones during the "swap with end of m_entries" step below (which might invalidate
654+ if (m_group_data.has_value()) return;
655+
656+ /** Annotated clusters: an entry for each Cluster, together with the representative for the
657+ * partition it is in if known, or with nullptr if not yet known. */
658+ std::vector<std::pair<Cluster*, Cluster*>> an_clusters;
659+ /** Annotated dependencies: an entry for each m_deps_to_apply entry (excluding ones that apply
instagibbs
commented at 3:37 pm on February 11, 2025:
1005+std::vector<TxGraph::Ref*> TxGraphImpl::GetAncestors(const Ref& arg) noexcept
1006+{
1007+ // Return the empty vector if the Ref is empty.
1008+ if (GetRefGraph(arg) == nullptr) return {};
1009+ Assume(GetRefGraph(arg) == this);
1010+ // Apply all dependencies, as the result might be incorrect otherwise.
instagibbs
commented at 3:50 pm on February 11, 2025:
Probably worth noting in these comment and elsewhere that it also applies removals if queued.
88+ /** Add a dependency between two specified transactions. Parent may not be a descendant of
89+ * child already (but may be an ancestor of it already, in which case this is a no-op). If
90+ * either transaction is already removed, this is a no-op. */
91+ virtual void AddDependency(const Ref& parent, const Ref& child) noexcept = 0;
92+ /** Modify the fee of the specified transaction. If the transaction does not exist (or was
93+ * removed), this has no effect. */
instagibbs
commented at 3:58 pm on February 11, 2025:
should say the Ref if non-null should be connected to this TxGraph object
If we’re going to expend run-time cost to perform the check, we might as well drop the requirement that Ref refers to this graph entirely. Given that we’re not expecting to ever have multiple TxGraph objects in production, this feels like overkill.
in
src/txgraph.cpp:1702
in
781c15bfcaoutdated
1067+
1068+FeePerWeight TxGraphImpl::GetChunkFeerate(const Ref& arg) noexcept
1069+{
1070+ // Return the empty FeePerWeight if the passed Ref is empty.
1071+ if (GetRefGraph(arg) == nullptr) return {};
1072+ Assume(GetRefGraph(arg) == this);
instagibbs
commented at 4:04 pm on February 11, 2025:
0 if (!Assume(GetRefGraph(arg) == this)) return {};
and so on for the public functions?
in
src/txgraph.h:47
in
781c15bfcaoutdated
17+
18+/** Data structure to encapsulate fees, sizes, and dependencies for a set of transactions. */
19+class TxGraph
20+{
21+public:
22+ /** Internal identifier for a transaction within a TxGraph. */
instagibbs
commented at 4:26 pm on February 11, 2025:
0 /** Internal identifier for a transaction within a TxGraph corresponding to m_entries order. */
I’d rather not refer to implementation details in the public interface.
in
src/txgraph.cpp:846
in
781c15bfcaoutdated
841+ Assume(m_group_data.has_value());
842+ // Nothing to do if there are no dependencies to be added.
843+ if (m_deps_to_add.empty()) return;
844+
845+ // For each group of to-be-merged Clusters.
846+ Assume(m_group_data.has_value());
instagibbs
commented at 4:35 pm on February 11, 2025:
this was just assumed a few lines above
edit: in ee4d16ff3a32a0b12a7c1b7837ef5195d9deedf0 the other one is gone, but we’re accessing it two lines above
141+ /** Information about one group of Clusters to be merged. */
142+ struct GroupEntry
143+ {
144+ /** Which clusters are to be merged. */
145+ std::vector<Cluster*> m_clusters;
146+ /** Which dependencies are to be applied to those merged clusters. */
instagibbs
commented at 4:40 pm on February 11, 2025:
0 /** Which dependencies are to be applied to those merged clusters from parent to child. */
Added a comment to this effect (the variable is gone in a future commit, though).
in
src/txgraph.cpp:1243
in
781c15bfcaoutdated
845+ // For each group of to-be-merged Clusters.
846+ Assume(m_group_data.has_value());
847+ for (auto& group_data : *m_group_data) {
848+ // Invoke Merge() to merge them into a single Cluster.
849+ Merge(group_data.m_clusters);
850+ // Actually apply all to-be-added dependencies (for each, parent and child belong to the
instagibbs
commented at 4:42 pm on February 11, 2025:
0 // Actually apply all to-be-added dependencies (all parents and children from this grouping belong to the
663+
664+ // Construct a an_clusters entry for every parent and child in the to-be-applied dependencies.
665+ for (const auto& [par, chl] : m_deps_to_add) {
666+ auto par_cluster = m_entries[par].m_locator.cluster;
667+ auto chl_cluster = m_entries[chl].m_locator.cluster;
668+ // Skip dependencies for which the parent or child transaction is removed.
instagibbs
commented at 5:07 pm on February 11, 2025:
this would be the case where a removal and a dependency addition was queued for the same ref?
690+ * a representative for the entire tree, and can be found by walking upwards from any
691+ * element. */
692+ PartitionData* parent;
693+ /** (only if this is a root, so when parent == this) An upper bound on the height of
694+ * tree for this partition. */
695+ unsigned rank;
instagibbs
commented at 9:24 pm on February 11, 2025:
we doing just unsigned for a reason? I don’t see this in the codebase often
It’s a tiny number, as it’s bounded by $\mathcal{O}(\log n)$, so it didn’t feel right to use the same DepGraphIndex or GraphIndex type.
in
src/txgraph.cpp:710
in
0eacbd61cboutdated
705+ Assume(it->cluster == arg);
706+ return &*it;
707+ };
708+
709+ /** Given a PartitionData, find the root of the tree it is in (its representative). */
710+ static constexpr auto find_uf = [](PartitionData* data) noexcept -> PartitionData* {
instagibbs
commented at 9:28 pm on February 11, 2025:
759+ for (size_t i = 0; i < partition_data.size(); ++i) {
760+ auto& data = partition_data[i];
761+ // Find the representative of the partition Cluster i is in, and store it with the
762+ // Cluster.
763+ auto rep = find_uf(&data)->cluster;
764+ an_clusters[i].second = rep;
instagibbs
commented at 9:43 pm on February 11, 2025:
instagibbs
commented at 4:22 pm on February 12, 2025:
member
reviewed up to “txgraph: (feature) add initial version” 781c15bfca1ebaffe7b634196e19144f5ab10a50 , basically all nits at this point, logic all makes sense
One question I had coming up is use of Assume(), there is a number of places that Assume() is used, then subsequently may result in UB or weird behavior. In a few spots I gave suggestions but was wondering if there is thinking behind doing more (or Assert’ing) or not to handle that in release builds.
in
src/test/fuzz/txgraph.cpp:172
in
1f06bc1e4aoutdated
167+
168+ // Construct a real and a simulated graph.
169+ auto real = MakeTxGraph();
170+ SimTxGraph sim;
171+
172+ /** Function to pick any Ref (in sim real, sim.removed, or empty). */
ismaelsadeeq
commented at 4:34 pm on February 12, 2025:
In “txgraph: (tests) add simulation fuzz test” 1f06bc1e4a8108f1430bcd20fc391c9f663a2e4b
using real in the comments here is a bit confusing as you represent the actual tx graph as real
instagibbs
commented at 4:51 pm on February 12, 2025:
sim real
feel like we’re missing a punctuation or I’m unclear what it’s saying
310@@ -311,7 +311,7 @@ void Cluster::ApplyRemovals(TxGraphImpl& graph, std::span<GraphIndex>& to_remove
311 auto& locator = entry.m_locator;
312 // Stop once we hit an entry that applies to another Cluster.
313 if (locator.cluster != this) break;
314- // - Remember it in a set of to-remove ClusterIndexes.
315+ // - Remember it in a set of to-remove DepGraphIndexes.
instagibbs
commented at 4:43 pm on February 12, 2025:
micro-nit: seems this should have been renamed in a different commit 1f06bc1e4a8108f1430bcd20fc391c9f663a2e4b
in
src/test/fuzz/txgraph.cpp:142
in
1f06bc1e4aoutdated
100+ auto pos = Find(ref);
101+ if (pos == MISSING) return;
102+ graph.RemoveTransactions(SetType::Singleton(pos));
103+ simrevmap.erase(simmap[pos].get());
104+ // Retain the TxGraph::Ref corresponding to this position, until explicitly destroyed.
105+ // to see it when calling Cleanup().
instagibbs
commented at 5:14 pm on February 12, 2025:
This dates from a time when the fuzz test allowed up to 1000 steps, and it felt unnecessary to support cases with unboundedly growing numbers of removed transactions. I’ve just dropped it.
in
src/test/fuzz/txgraph.cpp:271
in
1f06bc1e4aoutdated
I’ve added a comment. ~Ref triggers observable behavior in TxGraph, so the simulation controls when it gets called (and its ordering w.r.t. other operations).
in
src/test/fuzz/txgraph.cpp:279
in
1f06bc1e4aoutdated
Well choices counts the actual number of choices, consisting of:
All transactions in sim.simmap (tx_count)
All transactions in sim.removed (sim.removed.size())
The empty Ref (1).
When picking one of them, we want one in the range from 0 up to and including choices - 1.
ismaelsadeeq
commented at 7:19 pm on February 13, 2025:
Makes sense thanks.
in
src/test/fuzz/txgraph.cpp:117
in
1f06bc1e4aoutdated
77+ }
78+
79+ /** Add a dependency between two positions in this graph. */
80+ void AddDependency(TxGraph::Ref* parent, TxGraph::Ref* child)
81+ {
82+ auto par_pos = Find(parent);
ismaelsadeeq
commented at 7:23 pm on February 12, 2025:
In “txgraph: (tests) add simulation fuzz test” 1f06bc1e4a8108f1430bcd20fc391c9f663a2e4b
Should we return early when parent ref is the same as ref child? This will prevent a possibility of having a cycle.
As such, we don’t have to check before adding dependency in the fuzz test.
It’s not clear to me what you’re asking for. I would describe the test as “big simulation test, which performs a number of operations on a real TxGraph, and on a simpler reimplementation, and in the end compares the two”. Something like that? Which cluster_linearize fuzz test are you referring to?
ismaelsadeeq
commented at 7:20 pm on February 13, 2025:
Yeah something like that that
In most of the fuzz test in src/test/fuzz/cluster_linearize.cpp you provided a really nice description , which is just an overview of what the test is going to do before jumping to the instructions.
Here we just jump into the instructions without that context.
But this is nitty, you can ignore this comment and resolve.
0 // This is a big simulation test for TxGraph, which performs a fuzz-derived sequence of valid
1 // operations on a TxGraph instance, as well as on a simpler (mostly) reimplementation (see
2 // SimTxGraph above), comparing the outcome of functions that return a result, and finally
3 // performing a full comparison between the two.
in
src/txgraph.cpp:1201
in
741a6a8c4doutdated
1196+ // ... for all clusters in them ...
1197+ for (ClusterSetIndex setindex = 0; setindex < quality_clusters.size(); ++setindex) {
1198+ const auto& cluster = *quality_clusters[setindex];
1199+ // Remember we saw this Cluster (only if it is non-empty; empty Clusters aren't
1200+ // expected to be referenced by the Entry vector).
1201+ if (cluster.GetTxCount() != 0) {
instagibbs
commented at 8:05 pm on February 12, 2025:
Empty clusters can happen anytime ApplyRemovals() is invoked, but not ApplyDependencies since that invokes SplitAll(), correct?
Indeed. Empty clusters can only exist in after an ApplyRemovals but before a SplitAll(). After oversizedness caching, ApplyDependencies may end up not calling SplitAll(), though, if the result is known to be oversized already.
in
src/txgraph.cpp:1895
in
741a6a8c4doutdated
1158+ assert(m_done == m_depgraph.Positions());
1159+}
1160+
1161+void TxGraphImpl::SanityCheck() const
1162+{
1163+ /** Which GraphIndexes ought to occur in m_wiped, based on m_entries. */
ismaelsadeeq
commented at 8:53 pm on February 12, 2025:
In “txgraph: (tests) add internal sanity check function” 741a6a8c4d851cb10ecee810a09187bcbfa5af4c
1144+ // Check that the Entry has a locator pointing back to this Cluster & position within it.
1145+ assert(entry.m_locator.cluster == this);
1146+ assert(entry.m_locator.index == lin_pos);
1147+ // Check linearization position and chunk feerate.
1148+ if (!linchunking.GetChunk(0).transactions[lin_pos]) {
1149+ linchunking.MarkDone(linchunking.GetChunk(0).transactions);
ismaelsadeeq
commented at 9:01 pm on February 12, 2025:
In “txgraph: (tests) add internal sanity check function” 741a6a8c4d851cb10ecee810a09187bcbfa5af4c
Should we also compare that the list of transactions in chunk 0 matches m_linearization subset from from previous hit of the conditional statement to lin_pos?
I don’t think this is the place for it, as it amounts to testing of LinearizationChunking, not TxGraph. The point of the code here is just to get the chunks defined by m_linearization out, so we can test them. If LinearizationChunking works correctly (see clusterlin_linearization_chunking for that), then the chunks that come out will correspond to consecutive transactions from the linearization.
sipa force-pushed
on Feb 12, 2025
DrahtBot added the label
CI failed
on Feb 13, 2025
DrahtBot
commented at 0:01 am on February 13, 2025:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
sipa force-pushed
on Feb 13, 2025
DrahtBot removed the label
CI failed
on Feb 13, 2025
in
src/txgraph.cpp:1189
in
56cbdd7889outdated
833+ while (deps_it != m_deps_to_add.end()) {
834+ auto [par, chl] = *deps_it;
835+ auto chl_cluster = m_entries[chl].m_locator.cluster;
836+ // Skip dependencies that apply to earlier Clusters (those necessary are for
837+ // deleted transactions, as otherwise we'd have processed them already).
838+ if (!std::less{}(chl_cluster, data.cluster)) {
instagibbs
commented at 6:02 pm on February 13, 2025:
56cbdd7889e957c29d76681c46c4a7a6983c9be6
I rewrote it so my smooth brain could better decipher it (I believe it’s the same) :
0diff --git a/src/txgraph.cpp b/src/txgraph.cpp
1index b2cecffebb..df93eb2a13 100644
2--- a/src/txgraph.cpp
3+++ b/src/txgraph.cpp
4@@ -821,12 +821,13 @@ void TxGraphImpl::GroupClusters() noexcept
5 while (deps_it != m_deps_to_add.end()) {
6 auto [par, chl] = *deps_it;
7 auto chl_cluster = m_entries[chl].m_locator.cluster;
8 // Skip dependencies that apply to earlier Clusters (those necessary are for
9 // deleted transactions, as otherwise we'd have processed them already).
10- if (!std::less{}(chl_cluster, data.cluster)) {
11- if (chl_cluster != data.cluster) break;
12+ if (std::greater{}(chl_cluster, data.cluster)) {
13+ break;
14+ } else if (chl_cluster == data.cluster) {
15 auto par_cluster = m_entries[par].m_locator.cluster;
16 // Also filter out dependencies applying to a removed parent.
17 if (par_cluster != nullptr) an_deps.emplace_back(*deps_it, rep);
18 }
19 ++deps_it;
418+ m_quality == QualityLevel::NEEDS_SPLIT_ACCEPTABLE);
419+ // Determine the new quality the split-off Clusters will have.
420+ QualityLevel new_quality = m_quality == QualityLevel::NEEDS_SPLIT ? QualityLevel::NEEDS_RELINEARIZE :
421+ m_quality == QualityLevel::NEEDS_SPLIT_OPTIMAL ? QualityLevel::OPTIMAL :
422+ QualityLevel::ACCEPTABLE;
423+ // If the cluster was NEEDS_SPLIT_OPTIMAL, and we're thus going to produce OPTIMAL clusters, we
instagibbs
commented at 6:29 pm on February 13, 2025:
693c3df67dc972dd2fcfbeb8179bcd7833c77bd1
Working on convincing myself that PostLinearization is suffcient to result in optimal clusters. I think it’s straight forward that the chunk prefixes before any removed tail transactions are untouched, and once we remove tail transactions, the last touched-but-not-fully-removed chunk may be in a substandard ordering.
So PostLinearization guarantees that the final sub-chunk is reordered to being connected, and that means that sub-chunk is now optimal? I’m going to need help here.
Something I was too lazy to do: A fuzz harness could be constructed via taking verified-optimal depgraphs (from exhaustive solver), chopping the tail off, and verifying all connected components PostLinearize-ation are just as good as whatever the exhaustive linearizer gives for each component-now-cluster?
instagibbs
commented at 7:01 pm on February 27, 2025:
Update from offline conversation: This was a confusion about what we’re calling “optimal”. The splits clusters are already optimal in the sense of the diagram check, but not minimal or fully connected. PostLinearization makes sure that all chunks are connected/minimal.
Perhaps a rewording of this section to make it abundantly clear?
336- } while(chunk.transactions.Any());
337+ // If the Cluster's quality is ACCEPTABLE or OPTIMAL, compute its chunking and store its
338+ // information in the Entry's m_chunk_feerate. These fields are only accessed after making
339+ // the entire graph ACCEPTABLE, so it is pointless to compute these if we haven't reached that
340+ // quality level yet.
341+ if (m_quality == QualityLevel::OPTIMAL || m_quality == QualityLevel::ACCEPTABLE) {
ismaelsadeeq
commented at 9:35 pm on February 13, 2025:
In “txgraph: (optimization) delay chunking while sub-acceptable” 30d7c8ce7f34e977238d1454dd032196cbfd936b
nit: we can have truthy/false method for this so that we can just call it.
we have this conditional statement in quite a few places
Done. I have actually added a bunch of functions (IsAcceptable(), IsOptimal(), NeedsSplitting(), and further on also IsOversized()) and mostly rewritten the long conditions involving m_quality to use these functions instead.
in
src/txgraph.cpp:263
in
1b207547a9outdated
259@@ -260,6 +260,8 @@ class TxGraphImpl final : public TxGraph
260 ClusterSetIndex InsertCluster(std::unique_ptr<Cluster>&& cluster, QualityLevel quality) noexcept;
261 /** Change the QualityLevel of a Cluster (identified by old_quality and old_index). */
262 void SetClusterQuality(QualityLevel old_quality, ClusterSetIndex old_index, QualityLevel new_quality) noexcept;
263+ /** Make a transaction not exist. */
instagibbs
commented at 3:16 pm on February 14, 2025:
nit
0 /** Make a transaction not exist. Transaction must currently exist. */
304@@ -252,16 +305,21 @@ class TxGraphImpl final : public TxGraph
305306 /** Swap the Entrys referred to by a and b. */
307 void SwapIndexes(GraphIndex a, GraphIndex b) noexcept;
308- /** Extract a Cluster. */
309- std::unique_ptr<Cluster> ExtractCluster(QualityLevel quality, ClusterSetIndex setindex) noexcept;
310+ /** If idx exists in the specified level ClusterSet (explicitly or implicitly), return the
ismaelsadeeq
commented at 3:40 pm on February 14, 2025:
94@@ -90,6 +95,17 @@ class Cluster
95 void UpdateMapping(DepGraphIndex cluster_idx, GraphIndex graph_idx) noexcept { m_mapping[cluster_idx] = graph_idx; }
96 /** Push changes to Cluster and its linearization to the TxGraphImpl Entry objects. */
97 void Updated(TxGraphImpl& graph) noexcept;
98+ /** Create a copy of this Cluster, returning a pointer to it (used by PullIn). */
99+ Cluster* CopyTo(TxGraphImpl& graph, int to_level) const noexcept;
100+ /** Get the list of Clusters that conflict with this one (at level-1). */
instagibbs
commented at 3:47 pm on February 14, 2025:
nit: “level-1” is a bit vague, but understandable once you dive into the code itself. “top level - 1”?
That’s not quite what this does. It looks for conflicts one level below the one where the *this Cluster exists (in practice, that’ll always be top_level-1, though). Improved the comment.
in
src/txgraph.cpp:229
in
e19bbc3282outdated
208@@ -189,28 +209,59 @@ class TxGraphImpl final : public TxGraph
209 std::vector<std::pair<GraphIndex, GraphIndex>> m_deps_to_add;
210 /** Information about the merges to be performed, if known. */
211 std::optional<GroupData> m_group_data = GroupData{};
212+ /** Which entries were removed in this ClusterSet (so they can be wiped on abort). */
instagibbs
commented at 3:49 pm on February 14, 2025:
IIUC this field should become a subset of m_to_remove’s value which ended up being IsPresent? Is there any better way to explain the relationship of these two variables?
I think the most accurate description is “All entries which have an (R) removed locator at this level, plus any transactions in m_unlinked”. Would that help?
instagibbs
commented at 11:38 pm on February 21, 2025:
232+ * - (P)resent: actually occurs in a Cluster at that level.
233+ *
234+ * - (M)issing: not present in a Cluster at that level. For main, this means the transaction
235+ * does not exist. For staging this means it its existence is inherited from
236+ * main. If it doesn't exist in main, it doesn't exist in staging either. If it
237+ * does existing in main, the cluster it is in is unmodified in staging.
ismaelsadeeq
commented at 3:49 pm on February 14, 2025:
0 * does exist in main, the cluster it is in is unmodified in staging.
102+ /** Mark all the Entry objects belonging to this Cluster as missing. The Cluster must be
103+ * deleted immediately after. */
104+ void MakeTransactionsMissing(TxGraphImpl& graph) noexcept;
105+ /** Remove all transactions in a Cluster. */
106+ void Clear(TxGraphImpl& graph) noexcept;
107+ /** Change a Cluster's level from level to level-1. */
ismaelsadeeq
commented at 4:05 pm on February 14, 2025:
Should just refer to main and staging only instead of level-1, main, staging and top level?
It will be consistent this way.
Together with a lot of changes (getting rid of the ClusterSet vector), done.
in
src/txgraph.cpp:273
in
e19bbc3282outdated
243+ * - (M,M): the transaction doesn't exist in either graph.
244+ * - (P,M): the transaction exists in both, but only exists explicitly in a Cluster object in
245+ * main. Its existence in staging is inherited from main.
246+ * - (P,P): the transaction exists in both, and is materialized in both. Thus, the clusters
247+ * and/or their linearizations may be different in main and staging.
248+ * - (M,P): the transaction is added in staging, and does not exist in main.
ismaelsadeeq
commented at 4:15 pm on February 14, 2025:
hmm, above you mentioned “If it doesn’t exist in main, it doesn’t exist in staging either.”.
And now there is a state where the transaction exits in staging but not in main
I see this was a bit confusing, I have expanded the explanation.
But what I meant to convey was that (M) in staging means that if the transaction does not exist in main, it doesn’t exist in staging either. A (P) in staging just means it exists there. LMK if the new explanation is clearer.
442+ for (auto i : m_linearization) {
443+ auto& entry = graph.m_entries[m_mapping[i]];
444+ // For every transaction Entry in this Cluster, if it also exists in a lower-level Cluster,
445+ // then that Cluster conflicts.
446+ if (entry.m_locator[m_level - 1].IsPresent()) {
447+ out.push_back(entry.m_locator[m_level - 1].cluster);
ismaelsadeeq
commented at 5:11 pm on February 14, 2025:
Should be set to avoid duplicates, will prevent de duplicating later on?
Both options are O(n log n), but a vector + sorting + deduplication has much better constant factors. An std::set requires a separate allocation for each element, while a vector uses a single allocator for everything. Having everything in continuous memory also exploits the CPU cache better.
In theory, using an std::unordered_map could asymptotically faster for large numbers as it’s just O(n), but I suspect not for the numbers we care about. I haven’t tried or benchmarked it, though.
ismaelsadeeq
commented at 8:06 pm on February 18, 2025:
Agreed, I calculated recently.
This is better.
The constant factors for GetDistinctClusters implementation using vector and then sorting is also better 👍🏾
Please resolve.
ismaelsadeeq
commented at 5:36 pm on February 14, 2025:
member
It seems to me that this is designed with the flexibility to accommodate more levels, which explains the use of std::vector for ClusterSet
If yes then current approach is perfect else would it be better for TxGraph to maintain a std::array of two ClusterSet? mainandstaging and store their indexes in TxGraph as enums? This way, Cluster would only need to maintain that enum.
In ClusterSet, we could introduce a boolean to indicate whether a set is active or not.
The main set could always remain active, while staging would be flexible and toggleable.
Thus, when we start staging, we can simply mark the back array as active, and once done, clear the back array and mark it as inactive.
I believe this approach would reduce Assumes since we explicitly know we can only have two ClusterSet’s.
in
src/txgraph.cpp:532
in
e19bbc3282outdated
451+
452+std::vector<Cluster*> TxGraphImpl::GetConflicts() const noexcept
453+{
454+ int level = m_clustersets.size() - 1;
455+ std::vector<Cluster*> ret;
456+ // All Clusters at level-1 containing transactions in m_removed are conflicts.
instagibbs
commented at 7:18 pm on February 14, 2025:
i.e., any (P, R)? If so, putting in comment would be nice
457+ for (auto i : m_clustersets[level].m_removed) {
458+ auto& entry = m_entries[i];
459+ Assume(entry.m_locator[level - 1].IsPresent());
460+ ret.push_back(entry.m_locator[level - 1].cluster);
461+ }
462+ // Then go over all Clusters at this level, and find their conflicts.
instagibbs
commented at 7:19 pm on February 14, 2025:
i.e., any (P, P)? If so, putting in comment would be nice
481+ ptr->m_depgraph = m_depgraph;
482+ ptr->m_mapping = m_mapping;
483+ ptr->m_linearization = m_linearization;
484+ // Insert the new Cluster into the graph.
485+ graph.InsertCluster(to_level, std::move(ret), m_quality);
486+ // Update its Locators (and possibly linearization data in its Entrys).
instagibbs
commented at 7:31 pm on February 14, 2025:
and possibly linearization data
e.g. when to_level is 0 and is otherwise acceptable?
Yes, but no, because to_level is never 0. I have just dropped it.
in
src/txgraph.cpp:125
in
e19bbc3282outdated
100+ /** Get the list of Clusters that conflict with this one (at level-1). */
101+ void GetConflicts(const TxGraphImpl& graph, std::vector<Cluster*>& out) const noexcept;
102+ /** Mark all the Entry objects belonging to this Cluster as missing. The Cluster must be
103+ * deleted immediately after. */
104+ void MakeTransactionsMissing(TxGraphImpl& graph) noexcept;
105+ /** Remove all transactions in a Cluster. */
instagibbs
commented at 7:36 pm on February 14, 2025:
I find this slightly clearer since txns can still exist somewhere
826+ int level = cluster->m_level;
827+ Assume(level <= to_level);
828+ // Copy the Cluster from the level it was found at to higher levels, if any.
829+ while (level < to_level) {
830+ // Make the Cluster Acceptable before copying. This isn't strictly necessary, but doing it
831+ // now avoids doing doable work later.
instagibbs
commented at 7:59 pm on February 14, 2025:
Actually I meant “double”, not “doable”, I think. Fixed.
instagibbs
commented at 11:57 pm on February 21, 2025:
hah, was my first guess that I doubted :)
in
src/txgraph.cpp:842
in
e19bbc3282outdated
844+ auto& clusterset = m_clustersets[level];
845 auto& to_remove = clusterset.m_to_remove;
846 // Skip if there is nothing to remove.
847 if (to_remove.empty()) return;
848+ // Pull in all Clusters that are not in the top ClusterSet.
849+ for (GraphIndex index : clusterset.m_to_remove) {
instagibbs
commented at 8:06 pm on February 14, 2025:
0 for (GraphIndex index : to_remove) {
and elsewhere so we’re only looking at two different to_remove* things vs three?
98@@ -99,6 +99,11 @@ class TxGraph
99 * effect. */
100 virtual void SetTransactionFee(const Ref& arg, int64_t fee) noexcept = 0;
101102+ /** TxGraph is internally lazy, and will not compute many things until they are needed.
103+ * Calling DoWork will compute everything now, so that future operations are fast. This can be
104+ * invoked while oversized. */
ismaelsadeeq
commented at 3:46 pm on February 18, 2025:
It can be called when oversized but for staging DoWork it will be no-op when it is oversized.
It might still do something; for example removals may be applied still, and grouping may be calculated. I feel it’s unnecessary to elaborate that much about implementation details in the interface, though.
sipa
commented at 8:30 pm on February 18, 2025:
member
It seems to me that this is designed with the flexibility to accommodate more levels, which explains the use of std::vector for ClusterSet
It’s not really written with the intent of accomodating more levels; it’s more that I found it more elegant to write it generically, because most operations don’t care what level they are operating on. I can see that the increased abstraction may be confusing though; I’d like to hear more reviewer comments on this.
If yes then current approach is perfect else would it be better for TxGraph to maintain a std::array of two ClusterSet? mainandstaging and store their indexes in TxGraph as enums? This way, Cluster would only need to maintain that enum.
In ClusterSet, we could introduce a boolean to indicate whether a set is active or not.
I think it’s more natural to use a vector, which implicitly encodes how many there are, and it means one can always use m_clustersets.back() to refer to the top set, regardless of whether main/staging exist; the alternative would involve checking whether m_clustersets[1].m_active is true or not. Performance isn’t a concern here either; they’re big data structures that aren’t created/destroyed inside tight loops.
The main set could always remain active, while staging would be flexible and toggleable.
Thus, when we start staging, we can simply mark the back array as active, and once done, clear the back array and mark it as inactive.
I believe this approach would reduce Assumes since we explicitly know we can only have two ClusterSet’s.
True, but it would also add a consistency requirement that m_clustersets[0].m_active is always true, for example. And of course, Assumes are never “necessary”; they’re there to help review (by convincing people that if the condition didn’t hold, presumably a test could hit it). If you find an Assume that’s more cluttering than helpful, it can always just be dropped too.
sipa force-pushed
on Feb 20, 2025
sipa
commented at 5:24 pm on February 20, 2025:
member
I have modified (in “add staging support” and further) the TxGraph functions relating to the normalization steps to all take a level argument as input (ApplyRemovals, SplitAll, GroupClusters, ApplyDependencies, and MakeAllAcceptable). This meant a number of repeated sequences like:
830+{
831+ Assume(level >= 0 && size_t(level) < m_clustersets.size());
832+ auto& entry = m_entries[idx];
833+ // Search the entry's locators from top to bottom.
834+ for (int l = level; l >= 0; --l) {
835+ // If the locator is missing, dig deeper; it may exist at a lower level.
instagibbs
commented at 7:20 pm on February 20, 2025:
micro-nit: documentation was pretty clear in declaration, but just in case…
0 // If the locator is missing, dig deeper; it may exist at a lower level and therefore be implicitly available at this level.
1575+void TxGraphImpl::AbortStaging() noexcept
1576+{
1577+ Assume(m_clustersets.size() > 1);
1578+ int stage_level = m_clustersets.size() - 1;
1579+ auto& stage = m_clustersets[stage_level];
1580+ // Mark are removed transactions as Missing (so the stage_level locator for these transactions
instagibbs
commented at 7:48 pm on February 20, 2025:
1578+ int stage_level = m_clustersets.size() - 1;
1579+ auto& stage = m_clustersets[stage_level];
1580+ // Mark are removed transactions as Missing (so the stage_level locator for these transactions
1581+ // can be reused if another staging is created).
1582+ for (auto idx : stage.m_removed) {
1583+ m_entries[idx].m_locator[stage_level].SetMissing();
instagibbs
commented at 7:50 pm on February 20, 2025:
This seems to work in fuzzing, but I’m not entirely convinced this is clearly true. Entries can exist in m_to_remove which are already removed (if they’re unlinked), so I have not taken this.
instagibbs
commented at 0:01 am on February 22, 2025:
saw the updated comment/code in SanityCheck, makes sense
in
src/txgraph.cpp:1760
in
913e14e6fdoutdated
1576+{
1577+ Assume(m_clustersets.size() > 1);
1578+ int stage_level = m_clustersets.size() - 1;
1579+ auto& stage = m_clustersets[stage_level];
1580+ // Mark are removed transactions as Missing (so the stage_level locator for these transactions
1581+ // can be reused if another staging is created).
instagibbs
commented at 7:54 pm on February 20, 2025:
it’s an array of Locators, so what else could have been done here? Might be reading too much into this comment.
No, because it’s comparing an std::optional<bool> with a bool, which can only be true if the first isn’t std::nullopt.
in
src/txgraph.cpp:308
in
fdd68be0acoutdated
302@@ -303,6 +303,8 @@ class TxGraphImpl final : public TxGraph
303 Locator m_locator[MAX_LEVELS];
304 /** The chunk feerate of this transaction in main (if present in m_locator[0]). */
305 FeePerWeight m_main_chunk_feerate;
306+ /** The position this transaction has in the main linearization (if present). */
instagibbs
commented at 8:48 pm on February 20, 2025:
This value is set to -1, sometimes, but never acted on with that value. Noting in case you have thoughts on that.
Not really. The -1 has no meaning, but hopefully would result in a nice detectable error if it were ever used.
bitcoin deleted a comment
on Feb 20, 2025
in
src/txgraph.h:149
in
11135464c5outdated
148+ virtual std::vector<Ref*> GetAncestorsUnion(std::span<const Ref* const> args, bool main_only = false) noexcept = 0;
149 /** Get pointers to all descendants of the specified transaction. If main_only is false and a
150 * staging graph exists, it is queried; otherwise the main graph is queried. The queried
151 * graph must not be oversized. Returns {} if arg does not exist in the queried graph. */
152 virtual std::vector<Ref*> GetDescendants(const Ref& arg, bool main_only = false) noexcept = 0;
153+ /** Like GetDescendants, but return the Refs for all transactions in the union of the provided
instagibbs
commented at 4:32 pm on February 21, 2025:
nit: might be worthwhile to mention in this and elsewhere that results will be unique
It says “(each transaction is only reported once)”, is that not what you mean?
instagibbs
commented at 11:35 pm on February 21, 2025:
major reading comprehension issues it seems
instagibbs
commented at 5:07 pm on February 21, 2025:
member
Looking good 11135464c5c3aef1ce8d2823120467a522ce2c87
I just have conceptual issues with what PostLinerize is guaranteeing us beyond what is explicitly stated in the header file.
Pile of nits/questions otherwise
sipa force-pushed
on Feb 21, 2025
sipa
commented at 10:25 pm on February 21, 2025:
member
Addressed most comments. I will address the PostLinearize related ones later.
instagibbs
commented at 0:02 am on February 22, 2025:
member
verified changes, just waiting on further PostLinearization commentary
via git range-diff master 11135464c5c3aef1ce8d2823120467a522ce2c87 d5abb86439e79d4adfbfbd46f833268bbca0bf6e
sipa
commented at 8:15 pm on March 3, 2025:
member
@instagibbs So… it seems that the “optimal linearization, truncate, postlinearize, split” does not actually result in minimal chunks (but it does result in feerate-diagram-optimal ones, and connected ones):
The original (optimal) linearization is [ABEDC]. It’s just a single chunk, so really any ordering is equally good.
Truncation removes C. Postlinearization turns it into [BADE]. However, a 2-chunk alternative exists: [BE,AD].
sipa force-pushed
on Mar 3, 2025
sipa
commented at 9:54 pm on March 3, 2025:
member
I have dropped the QualityLevel::NEEDS_SPLIT_OPTIMAL value, and made the “post-linearize after remove, before split” apply to NEEDS_SPLIT_ACCEPTABLE instead, with updated comments about it.
instagibbs
commented at 5:09 pm on March 4, 2025:
member
ACK72a97c0a07ea6e5a95ab37c8d95e1ea02cff8e92
With the weaker claims in the code re:PostLinearization, it works for me, and obviates the need for an additional harness test.
I’ve reviewed each commit and also looked at how the interface is being used by CTxMemPool in the big PR
in
src/cluster_linearize.h:23
in
8872339583outdated
18@@ -19,8 +19,8 @@
1920 namespace cluster_linearize {
2122-/** Data type to represent transaction indices in clusters. */
23-using ClusterIndex = uint32_t;
24+/** Data type to represent transaction indices in DepGraphs and the clusters they represent. */
25+using DepGraphIndex = uint32_t;
0-BEGIN VERIFY SCRIPT-
1sed -i 's/Data type to represent transaction indices in clusters./Data type to represent transaction indices in DepGraphs and the clusters they represent./' $(git grep -l 'using ClusterIndex')
2sed -i 's|\<ClusterIndex\>|DepGraphIndex|g' $(git grep -l 'ClusterIndex')
3-END VERIFY SCRIPT-
72+ virtual ~TxGraph() = default;
73+ /** Construct a new transaction with the specified feerate, and return a Ref to it. In all
74+ * further calls, only Refs created by AddTransaction() are allowed to be passed to this
75+ * TxGraph object (or empty Ref objects). */
76+ [[nodiscard]] virtual Ref AddTransaction(const FeePerWeight& feerate) noexcept = 0;
77+ /** Remove the specified transaction. This is a no-op if the transaction was already removed.
The invalidation behaviour for Refs seems unclear to me? I think the idea is that if you hold onto a Ref you can continue to uniquely refer to a removed transaction, up until the Ref is destructed, at which point the TxGraph is compacted and memory might be reused at that point. This delay then also allows for reordering to occur which can result in optimisations.
I think the intended behaviour could be made a bit clearer in the header file though? It’s not immediately obvious to me under what circumstances the ApplyRemovals() and Compact() are/should be called.
As far as the interface is concerned: you get a Ref object from AddTransaction, and as long as the object exists, using it remains valid (which, due to the unlinking in ~TxGraphImpl you’ve suggested I add, may now even outlive the TxGraph). In the initial commit a Ref object may only be destroyed after the transaction has been removed, but this is relaxed in “txgraph: (feature) destroying Ref means removing transaction” (currently 9a04cc18e19aa03d215ce1fb030809d7e6eb98a6).
I’d like to keep the implementation details (like when ApplyRemovals() and Compact() are called) out of the interface definition, but other than that, happy for any suggestions that would clarify things (I’ve adjusted comments a bit, and moved them to the correct commits already).
Maybe it would be helpful to explain in the header that AddTransaction() creates a Ref, which you’re then expected to assign/move into your own container, which may be a subclass containing even more info about the tx, and then that serves as a permanent handle, even as the txgraph gets rearranged internally and that it’s deleted from the graph either with RemoveTransaction explicitly or implicitly by destructing the Ref? Essentially expanding the “Data type used to reference transactions within a TxGraph” comment that’s already there to be a bit more ELI5 maybe?
If you create an empty Ref (ie, construct it directly, not via AddTransaction) is there any way to associate that Ref with a TxGraph or is it forever useless as a Ref? Would it make any sense to have an AddTransaction variant that takes a Ref& argument rather than returning a new Ref? I guess equivalently, is an empty Ref useful for anything other than tests? Maybe “Non-empty Refs can only be created using AddTx” should be more explicit and say “You want to use AddTransaction() rather than constructing a Ref directly”.
So think of an empty Ref as an equivalent to an std::unique_ptr to nullptr or so. It’s potentially useful just so you can have a variable declared as a member in a class or so without immediately initializing it.
It even goes further in that this works even through subclasses:
192+ /** Check if this Locator is present (in some Cluster). */
193+ bool IsPresent() const noexcept { return cluster != nullptr; }
194+ };
195+
196+ /** A class of objects held internally in TxGraphImpl, with information about a single
197+ * transaction. */
196+ /** A class of objects held internally in TxGraphImpl, with information about a single
197+ * transaction. */
198+ struct Entry
199+ {
200+ /** Pointer to the corresponding Ref object if any, or nullptr if unlinked. */
201+ Ref* m_ref{nullptr};
Having each Ref point to the TxGraph and the TxGraph point to (and potentially modify via SwapIndexes) every Ref seems very self-referential, which is probably a bit confusing and not very cache optimal and so forth…
I think you could reduce this to a const Ref* (so that Refs wouldn’t get modified via entry.m_ref) by replacing the use of SwapIndexes on Compact() with a free list/heap of no-longer used entries. Probably not worthwhile though.
So this is roughly the thinking that led to this design of the split between Ref and Entry.
The most direct way of accomplishing this is just moving all data currently in Entry into Ref instead, but that requires exposing TxGraphImpl implementation details in the header. It also lacks a way to refer to entries in a stable manner internally. For example, what if a transaction has been marked for removal or dependency adding, and the Ref object is moved by the mempool code… we’d need a way to iterate over all places where a particular Ref is marked, which (I believe) needs several more cyclic references between components.
Another extreme is to have a map from GraphIndex values (which are chosen incrementally and never reused) to Entry objects, making them truly globally and permanently unique identifiers for Refs. No rewriting is ever necessary, as there are no compactions needed, but it implies an additional allocation per transaction, plus probably (even) worse memory locality than the current design.
I think using a freelist-like design can be seen as a variant of this, by storing the map in a support/allocators/pool.h pooled resource, and using map iterators as the graph indexes? I feel that’s probably overkill.
The current design avoids the Entry->Ref map by allocating them continuously, but compacting at stable points, which necessitates a backreference, but that’s still cheaper than an additional map.
in
src/txgraph.h:18
in
e22a0b21f8outdated
13+#define BITCOIN_TXGRAPH_H
14+
15+/** No connected component within TxGraph is allowed to exceed this number of transactions. */
16+static constexpr unsigned CLUSTER_COUNT_LIMIT{64};
17+
18+/** Data structure to encapsulate fees, sizes, and dependencies for a set of transactions. */
Might be worth pointing out that the api is designed towards allowing for an implementation that stores the transitive closure of dependencies – when querying, if B spends C, then you can’t differentiate between A spends B and A spends B and C.
701+ std::sort(an_clusters.begin(), an_clusters.end());
702+ an_clusters.erase(std::unique(an_clusters.begin(), an_clusters.end()), an_clusters.end());
703+
704+ // Run the union-find algorithm to to find partitions of the input Clusters which need to be
705+ // grouped together. See https://en.wikipedia.org/wiki/Disjoint-set_data_structure.
706+ {
Yeah, fair point. I will think about this. It’s a bit unfortunate that there is a second union-find implementation added in #31553, but it’s so different (including different per-partition meta-data being kept) that trying to have a single implementation would be too abstract/template-heavy.
I tried to encapsulate it in its own class, but it’s hard to give it the same performance (which is relevant in the context of huge reorgs) without making the interface rather involved (needs a “representative” type that contains a pointer to the PartitionData object, separate from the element data type), and even then, the union-find version in Trim() in 31553 would need its own separate implementation still.
Going to leave it for now, we can think about cleanups later.
in
src/txgraph.cpp:2089
in
e22a0b21f8outdated
1141+
1142+TxGraph::Ref::~Ref()
1143+{
1144+ if (m_graph) {
1145+ // Inform the TxGraph about the Ref being destroyed.
1146+ m_graph->UnlinkRef(m_index);
I think this gives you a segfault when aborting the program if the TxGraph is destructed prior to some of its outstanding Ref’s being destructed. It might be worthwhile having the TxGraph destructor iterate through any txs that haven’t already been unlinked and clear the m_graph pointers back to itself. Alternatively, the TxGraph destructor could issue an assertion failure if there are any linked Refs remaining.
Done. I’ve made ~TxGraphImpl clean up all Ref::m_graph pointers.
ajtowns
commented at 5:30 pm on March 17, 2025:
contributor
This seems fairly difficult to review – there’s a lot of mostly implicit background in what TxGraph does and why it does it that way, as evidenced by both the description in this PR and in the review club.
When I started typing this comment, I was wondering if maybe it would make sense to make the commits here into more of a gentle introduction to TxGraph, rather than one commit that does everything, then optimisations. But I don’t really think that approach would survive subsequent modifications to the code very well. Likewise for just adding more comments to the code.
Alternatively, I wonder if it would be helpful both for new devs and debugging to expose the TxGraph API as a python module, possibly with additional debugging handles so that you can examine internal state? I could imagine an interactive python notebook that allows you to interact with a real TxGraph being a useful experimental tool for learning/writing mempool code and also being something that could perhaps be reasonably maintained. Here’s a rough take at what that might look like. It’s using boost-python, which seems reasonable, but doesn’t seem super friendly to pure-virtual classes (hence the tedious wrapper class) or move-only objects (hence AddTransaction being wrapped via a constructor). (EDIT: add link for “here’s”; EDIT2: converting the vector<Ref*> stuff to python made the move-only Ref stuff irrelevant; the wrapper class isn’t quite so tedious any more either really)
Comments below are mostly a bunch of nits/observations while familiarising myself with the code, feel free to ignore.
in
src/txgraph.cpp:112
in
72a97c0a07outdated
107+ /** Given a DepGraphIndex into this Cluster, find the corresponding GraphIndex. */
108+ GraphIndex GetClusterEntry(DepGraphIndex index) const noexcept { return m_mapping[index]; }
109+ /** Only called by Graph::SwapIndexes. */
110+ void UpdateMapping(DepGraphIndex cluster_idx, GraphIndex graph_idx) noexcept { m_mapping[cluster_idx] = graph_idx; }
111+ /** Push changes to Cluster and its linearization to the TxGraphImpl Entry objects. */
112+ void Updated(TxGraphImpl& graph) noexcept;
Most of the Cluster methods require a TxGraphImpl to keep the quality/staging/mapping things consistent; feels a bit like it might be simpler just to have almost all this code stay in TxGraphImpl?
See my comments below: #31363 (comment) about how the division between the two follows whether or not the function needs access to Cluster’s internals (which may become dependent on the subclass).
in
src/txgraph.cpp:27
in
72a97c0a07outdated
22+
23+/** The maximum number of levels a TxGraph can have (0 = main, 1 = staging). */
24+static constexpr int MAX_LEVELS{2};
25+
26+// Forward declare the TxGraph implementation class.
27+class TxGraphImpl;
I think this is a close to accurate diagram of the data structures here:
To me, the cycles in that graph seem like a code smell. The GraphIndex backrefs make sense (or are unavoidable), as some stable way to refer to each tx is needed. But, I think it would probably be better for a Cluster not to be aware of its quality level or staging level (removing the m_setindex/m_quality/m_level back links) and moving the code that would query/change those items into TxGraph instead.
I think you could think of GraphIndex as the unique global reference to a tx in a TxGraph, in which case Ref could be viewed as an RAII view of a GraphIndex (which enforces the need for a reference to TxGraph to allow for the resource to be freed on destruction), and which also copes with GraphIndex values being updated.
235+ * m_group_data->m_group_oversized, but may be known even if m_group_data is not. */
236+ std::optional<bool> m_oversized{false};
237+ };
238+
239+ /** The ClusterSets in this TxGraphImpl. Has exactly 1 (main) or exactly 2 elements (main and staged). */
240+ std::vector<ClusterSet> m_clustersets;
ClusterSet m_main_clusterset; std::optional<ClusterSet> m_staging_clusterset; might be clearer – the vector stuff seems more confusing than helpful?
Does it ever make sense to have multiple nested staging levels? Parallel staging levels could make sense (I want to compare three RBFs in parallel on multiple cores, having checked that they’re all independent and don’t interact with each others’ clusters) but seems probably more complicated than would be worthwhile.
instagibbs
commented at 6:23 pm on March 18, 2025:
I ran into a somewhat contrived use-case, but I have a fairly strong YNGNI feeling of this aspect of the code. That said, I don’t think it hurts understanding too much as is.
ismaelsadeeq
commented at 6:28 pm on March 18, 2025:
Curious to know this contrived use case, I also think it’s a bit of an overkill in a previous comment #31363#pullrequestreview-2618143580, but @sipa made a compelling argument for it #31363 (comment)
which seem pretty straightforward to switch to a main/optional-stage model. Doing some renaming would make sense if so (emplace_back/pop_back to add/remove_staging; back to work maybe; size to has_staging maybe; maybe a templated ForEach() accepting a lambda taking a ClusterSet for iteration).
I’ve done something like this; std::vector<ClusterSet> is gone, and the code reduced by a whopping 23 lines as a result.
in
src/txgraph.cpp:1887
in
72a97c0a07outdated
1882+ // Check that the Entry has a locator pointing back to this Cluster & position within it.
1883+ assert(entry.m_locator[level].cluster == this);
1884+ assert(entry.m_locator[level].index == lin_pos);
1885+ // For top-level entries, check linearization position and chunk feerate.
1886+ if (level == 0 && IsAcceptable()) {
1887+ assert(entry.m_main_lin_index == linindex++);
as of this PR, it isn’t actually used, so should not introduce bugs immediately
the fuzz testing is pretty good, so there should not be any catastrophic bugs even if it were used
while maybe the design could be improved, that can be done incrementally after it’s merged
I’m reasonably confident I understand what’s going on and can help find/fix bugs/problems if they occur
I think the design and implementation are a good step forward for implementing cluster mempool
On the other hand, calling this “preliminary” as:
I’m not yet confident that I could maintain this code if its authors all suddenly vanished
I’m a bit concerned that to contribute to mempool code you might need to be a triple expert – C++, bitcoin, and industrial optimisation theory; not a show stopper, but as further PRs progress I’d like to gain confidence either that the optimisation parts can be safely treated as a very stable black box, or (ideally and) that we can reasonably onboard people to understand how the internals of the black box work.
sipa
commented at 7:12 pm on March 18, 2025:
member
@ajtowns Responding to some of the more conceptual comments
I’m indeed not entirely happy with the cycles between TxGraphImpl and Cluster, though there is a design concern that isn’t really materialized in the current code yet: Cluster may become a virtual class with multiple implementations for the purpose of memory usage (e.g., there could be a specialized SingletonCluster which just stores a single transaction, and no DepGraph, linearization, …, avoiding several vectors and their allocation overhead). The current code is designed to accommodate that, by putting code in Cluster member functions that need to have access to the internals, while having the generic code in TxGraphImpl. It’s still not trivial to introduce, because cluster merging/splitting needs to somehow work for all combination of cluster types, but to the extent I can say without actually having implemented it, I think it’s fairly close to that. I’m happy to hear about alternative approaches.
The above relates somewhat to the m_setindex/m_quality/m_level not feeling like they deserve to be dealt with in Cluster: indeed, they’re per-Cluster data, but not part of the “cluster implementation specific” code, so I was imagining these (the variables, and functions relating to them) to be part of the abstract base class for Cluster, and not in the Cluster derived classes. It would be possible to move the related functions to TxGraphImpl instead, but if this virtual Cluster design appears, maybe having it in the Cluster base class is actually more clear anyway?
Regarding Assume / Assert / assert. It’s true that semantically speaking, Assume() is always evaluated (so it’s fine to have side effects inside). However, it’s also the case that if the compiler can prove a statement is side-effect free, it can optimize the call away in non-debug builds. So I’m using them as a form of cheap assertion that can be used extensively throughout the codebase, most of which without runtime impact in production, compiler-proven to have same side-effects in debug and release builds. Combined with (fuzz) tests with high branch coverage, I believe this gives you the best of both worlds for non-critical assertions that still help reviewers reason about the code (as a “ah, an Assume(), I can be reasonably assured that in fuzz-test-reachable-scenarios, this condition is actually always true”).
Regarding the destruction of Ref objects and removing/unlinking. Note that the behavior changes: in the initial commit you can only destroy Ref objects whose transactions have already been RemoveTransaction()d, so that there is never a need for an inspector function to refer back to it anymore. In a later commit, Ref destroying is changed to automatically trigger a removal in the linked TxGraph if any (in both main and staging, which RemoveTransaction can’t do, as it only works on the top level).
The current design tries very hard to make sure that under all circumstances, TxGraph is internally consistent (including when destroying non-removed Ref objects, which necessitates the m_graph pointer inside), but the responsibility to make sure the graph is externally consistent with the mempool lies outside. However, it doesn’t actually achieve this design entirely when considering locking: one could accidentally destroy a Ref without holding whatever lock protects the TxGraph object, for example, and if we need to restrict to only supporting “sane” behavior, maybe an alternative design is possible, where instead of TxGraph::Ref::~Ref, we have a TxGraph::WipeTransaction(Ref&) function (which does removing/unlinking/compacting as needed), and the actual Ref destructor then just asserts(false); if not wiped first? This would also remove the need for having an m_graph inside a Ref (or making it debug-only).
Regarding having a ClusterSet m_main_clusterset; std::optional<ClusterSet> m_staging_clusterset instead of a vector of ClusterSets (also tagging @instagibbs and @ismaelsadeeq who have commented about this). There is absolutely no intention of trying to accommodate multiple nested (or parallel) levels, and the design isn’t aiming for that. I just found it more elegant to largely avoid treating main and staging as separate concepts, and instead see them as two copies of the same thing. I haven’t tried the alternative, and there is probably a bunch of “interact with level below” that can become a bit more simple/specialized, but maybe also more logic for picking between m_main_clusterset and m_staging_clusterset (which aren’t even the same type). If you all think it’s worth trying, I’m happy to look into it.
ajtowns
commented at 12:10 pm on March 19, 2025:
contributor
However, it’s also the case that if the compiler can prove a statement is side-effect free, it can optimize the call away in non-debug builds.
That makes sense. Might be a good idea to update doc/developer-notes.md pointing out that use case? It currently says:
Assume should be used to document assumptions when program execution can
safely continue even if the assumption is violated.
sipa force-pushed
on Mar 19, 2025
sipa force-pushed
on Mar 19, 2025
DrahtBot added the label
CI failed
on Mar 19, 2025
DrahtBot
commented at 9:08 pm on March 19, 2025:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
sipa force-pushed
on Mar 19, 2025
sipa
commented at 9:34 pm on March 19, 2025:
member
Big change: std::vector<ClusterSet> is gone, replaced by m_main_clusterset and m_staging_clusterset.
New helper functions GetTopLevel(), GetSpecifiedLevel(), GetClusterSet() made this fairly easy to do.
Simplifications resulted in ClearLocator, GetConflicts, PullIn, MakeTransactionsMissing, StartStaging, CommitStaging, ExtractCluster.
CopyTo was renamed to CopyToStaging.
LevelDown was renamed to MoveToMain.
MakeTransactionsMissing was renamed to MakeStagingTransactionsMissing.
Other changes:
Various comments
Ref objects are now allowed to outlive TxGraph.
GetChunkFeerate (and its testing, plus Entry::m_chunk_feerate maintenance) was moved to a separate commit.
DrahtBot removed the label
CI failed
on Mar 20, 2025
in
src/txgraph.h:117
in
63b8f96e67outdated
128+ * configured maximum cluster count). If main_only is false and a staging graph exists, it is
129+ * queried; otherwise the main graph is queried. Some of the functions below are not available
130+ * for oversized graphs. The mutators above are always available. Removing a transaction by
131+ * destroying its Ref while staging exists will not clear main's oversizedness until staging
132+ * is aborted or committed. */
133+ virtual bool IsOversized(bool main_only = false) noexcept = 0;
If I’m understanding right, this seem like it makes IsOversized() largely unfixable by the client – so the only reasonable strategy I can see is to do:
That’s a very reasonable strategy of course, so this isn’t a complaint! Just that if so, it seems like it should be documented a bit more clearly? I wonder a bit if CommitStaging() should have an Assume(G_FUZZING || !m_oversized) or something…
(In particular, if you ever end up with the main graph being oversized, I don’t see how you can even discover which txs are contributing to the oversized-ness in order to remove them – GetCluster(ref) will fail the m_deps_to_add.empty() check I think, so you can’t use that, and if you do you’ll just get a subset of the actual cluster that is within limits; same with GetDescendants)
Ability to inspect the would-be-constructed clusters (so they can be “fixed” if they violate policy rules, as opposed to just accepted/rejected, before applying - this is needed for reorgs which are not optional).
See #31553, which adds a TxGraph::Trim function that removes some subset of transactions + their descendants, such that the result is no longer oversized, in O(n log n) time.
When I started this PR, the idea was to expose ways to inspect would-be-oversized clusters, so they could be fixed by the user, before committing.
It turned out that really nothing beats an internal “figure it out, and fix it” method in terms of efficiency and convenience, so that’s what 31553 does. It’s a bit of a design break compared to the rest of TxGraph, which lets the user make all decisions about the content of the graph, but it’s very practical.
:+1: I think it makes sense for TxGraph to take care of all the topology checks and feerate comparison decisions, which includes finding the best txs (block building), finding the worst txs (eviction) and finding good subsets when limits would otherwise be exceeded (reorgs, “rbf”/“carve out” cases maybe). So this doesn’t feel like a design break at all to me, but rather pretty much exactly parallel to GetWorstMainChunk() – “here’s a bunch of txs that you want to remove”.
instagibbs
commented at 1:40 pm on March 20, 2025:
so the only reasonable strategy I can see is to do:
The other reason to not follow this strategy, aside from reorgs, is “kindred eviction”, where the caller is in charge of gathering sufficient information try to get back to an un-oversized state in the staging area, using non-oversized main stage information as a guiding tool.
@ajtowns That’s fair; I liked to think of TxGraph as something with well-specified behavior, but there are several instances of unspecified/best-effort behavior already:
The cluster linearizations used (as no optimality is guaranteed)
Reordering of tx removal / dep adding due to lazy batching
Tie-breaking of order of same-chunk-feerate transactions from different clusters
The exact logic Trim uses
Going to mark this resolved.
ajtowns
commented at 7:10 am on March 20, 2025:
contributor
(conflicts with just merged #31519, Span to std::span)
in
src/txgraph.cpp:1015
in
63b8f96e67outdated
1010+ if (!m_main_clusterset.m_to_remove.empty()) return;
1011+ if (!m_main_clusterset.m_removed.empty()) return;
1012+ if (m_staging_clusterset.has_value()) {
1013+ if (!m_staging_clusterset->m_deps_to_add.empty()) return;
1014+ if (!m_staging_clusterset->m_to_remove.empty()) return;
1015+ if (!m_staging_clusterset->m_removed.empty()) return;
Good idea. Done through a ClusterSet::HoldsGraphIndexes() function.
ajtowns
commented at 9:01 am on March 20, 2025:
contributor
ReACK 63b8f96e67f9ad649070a231532d48fb6c3573e4 . Needs a rebase anyway though.
The main/staging changes read better to me fwiw. The level stuff in the locators could be ungeneralised as well I suppose, but that doesn’t seem likely to be much of an improvement.
DrahtBot requested review from instagibbs
on Mar 20, 2025
DrahtBot requested review from ismaelsadeeq
on Mar 20, 2025
DrahtBot added the label
Needs rebase
on Mar 20, 2025
in
src/txgraph.cpp:199
in
101a8ee328outdated
171- std::vector<std::pair<GraphIndex, GraphIndex>> m_deps;
172+ /** Where the clusters to be merged start in m_group_clusters. */
173+ uint32_t m_cluster_offset;
174+ /** How many clusters to merge. */
175+ uint32_t m_cluster_count;
176+ /** Where the dependencies for this cluster group in m_deps_to_add start. */
ismaelsadeeq
commented at 10:27 am on March 20, 2025:
In 101a8ee3280e50c3272a80939b46a67faca838e4 “txgraph: (optimization) avoid per-group vectors for clusters & dependencies”
0 /** Where the dependencies for this cluster group in ClusterSet::m_deps_to_add start. */
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
sipa force-pushed
on Mar 20, 2025
DrahtBot removed the label
Needs rebase
on Mar 20, 2025
instagibbs
commented at 2:55 pm on March 20, 2025:
member
reviewed via git range-diff master 72a97c0a07ea6e5a95ab37c8d95e1ea02cff8e92 1601906941fa559ebbee7898453fa77f4606ad38
Ref objects are now allowed to outlive TxGraph
Can this get surfaced in the fuzz coverage with something that would previously fail?
DrahtBot removed the label
CI failed
on Mar 20, 2025
ajtowns
commented at 7:56 am on March 21, 2025:
contributor
Can this get surfaced in the fuzz coverage with something that would previously fail?
As long as you actually run the fuzz binary with your changes, and not the fuzz binary in the old location that you compiled a few days ago (cough), the following should work:
0--- a/src/test/fuzz/txgraph.cpp
1+++ b/src/test/fuzz/txgraph.cpp
2@@ -723,4 +723,6 @@ FUZZ_TARGET(txgraph)
34 // Sanity check again (because invoking inspectors may modify internal unobservable state).
5 real->SanityCheck();
6+ real.reset(); // kill the txgraph
7+ sims.clear(); // clear the sim container and any Refs
8 }
Dropping the TxGraphImpl destructor body then causes segfaults (rather than a nice assertion failure).
ajtowns
commented at 4:02 pm on March 21, 2025:
contributor
ReACK 1601906941fa559ebbee7898453fa77f4606ad38
instagibbs
commented at 4:30 pm on March 21, 2025:
member
reACK1601906941fa559ebbee7898453fa77f4606ad38
Dropping the TxGraphImpl destructor body then causes segfaults (rather than a nice assertion failure).
Right, I got myself turned around on this thinking it was harder to achieve. I’d suggest adding coverage here or in a quick follow-up.
sipa force-pushed
on Mar 22, 2025
sipa
commented at 0:06 am on March 22, 2025:
member
Changes:
Did a big pass through the interface documentation in txgraph.h
Moved the support for Refs outliving TxGraph to a separate commit, and added the test suggested in #31363 (comment).
Some smaller things listed below:
DrahtBot added the label
CI failed
on Mar 22, 2025
DrahtBot
commented at 1:22 am on March 22, 2025:
contributor
Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the
affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
sipa force-pushed
on Mar 22, 2025
DrahtBot removed the label
CI failed
on Mar 22, 2025
ismaelsadeeq
commented at 11:14 am on March 24, 2025:
As it is right now, it’s unused in the main PR.
I’ve been thinking when DoWork can be called in the mempool lifecycle and whether that would make a difference because everything is done lazily and we only apply staged changes when we want to get the current state. (specifically a scenario where after some lazy mutations it’s cheap to just call DoWork and not wait until we want to get a state.)
Maybe after trimming due to a reorg? or after multiple additions of transactions e.g loading new mempool.
I believe the plan is to basically run it at every “stable” point in time, so after any transaction or block processing.
ismaelsadeeq
commented at 11:54 am on March 24, 2025:
member
re-ACK41b4434fed169570ce0976c6e58db0d4a9614aaa after #31363#pullrequestreview-2707582068
The new doc is better, and it’s explicit that ref’s can outlive their txgraph.
DrahtBot requested review from ajtowns
on Mar 24, 2025
in
src/txgraph.h:85
in
bf8f04b11doutdated
80+ /** Determine whether arg exists in this graph (i.e., was not removed). */
81+ virtual bool Exists(const Ref& arg) noexcept = 0;
82+ /** Get the individual transaction feerate of transaction arg. Returns the empty FeePerWeight
83+ * if arg does not exist. */
84+ virtual FeePerWeight GetIndividualFeerate(const Ref& arg) noexcept = 0;
85+ /** Get pointers to all transactions in the which arg is in. The transactions will be returned
87+ virtual std::vector<Ref*> GetCluster(const Ref& arg) noexcept = 0;
88+ /** Get pointers to all ancestors of the specified transaction, in unspecified order. Returns
89+ * {} if arg does not exist in the graph. */
90+ virtual std::vector<Ref*> GetAncestors(const Ref& arg) noexcept = 0;
91+ /** Get pointers to all descendants of the specified transaction, in unspecified order. Returns
92+ * {} if arg does not exist in the graph. */
20+ * The connected components within the transaction graph are called clusters: whenever one
21+ * transaction is reachable from another, through any sequence of is-parent-of or is-child-of
22+ * relations, they belong to the same cluster (so clusters include parents, children, but also
23+ * grandparents, siblings, cousins twice removed, ...).
24+ *
25+ * TxGraph implicitly maintains an associated total ordering on its transactions (its linearization)
I don’t think this is strictly true until the chunk indexes are added in #31444? Prior to that, it’s possible to obtain such an ordering, but only by requesting all the clusters, breaking them into chunks manually (via chunk feerate calls on each tx in a cluster), then sorting the chunks.
I’ve replaced “maintains” with “defines”. As far as the interface is concerned, what the actual implementation does in this regard is irrelevant.
in
src/txgraph.h:121
in
41b4434fedoutdated
116+ * is aborted or committed. */
117+ virtual bool IsOversized(bool main_only = false) noexcept = 0;
118+ /** Get the feerate of the chunk which transaction arg is in the main graph. Returns the empty
119+ * FeePerWeight if arg does not exist in the main graph. The main graph must not be
120+ * oversized. */
121+ virtual bool Exists(const Ref& arg, bool main_only = false) noexcept = 0;
ajtowns
commented at 1:07 pm on March 24, 2025:
contributor
range-diff review of 41b4434fed169570ce0976c6e58db0d4a9614aaa
Good to see it’s now explicit which calls you can make on an oversized txgraph.
DrahtBot requested review from ajtowns
on Mar 24, 2025
clusterlin: Add FixLinearization function + fuzz test
This function takes an existing ordering for transactions in a DepGraph, and
makes it a valid linearization for it (i.e., topological). Any topological
prefix of the input remains untouched.
0aa874a357
clusterlin: Make IsAcyclic() a DepGraph member function
... instead of being a separate test-only function.
Also add a fuzz test for it returning false.
Since cluster_linearize.h does not actually have a Cluster type anymore, it is more
appropriate to rename the index type to DepGraphIndex.
-BEGIN VERIFY SCRIPT-
sed -i 's/Data type to represent transaction indices in clusters./Data type to represent transaction indices in DepGraphs and the clusters they represent./' $(git grep -l 'using ClusterIndex')
sed -i 's|\<ClusterIndex\>|DepGraphIndex|g' $(git grep -l 'ClusterIndex')
-END VERIFY SCRIPT-
d449773899
feefrac: Introduce tagged wrappers to distinguish vsize/WU rates6eab3b2d73
txgraph: Add initial version (feature)
This adds an initial version of the txgraph module, with the TxGraph class.
It encapsulates knowledge about the fees, sizes, and dependencies between all
mempool transactions, but nothing else.
In particular, it lacks knowledge about txids, inputs, outputs, CTransactions,
... and so forth. Instead, it exposes a generic TxGraph::Ref type to reference
nodes in the TxGraph, which can be passed around and stored by layers on top.
8ad3ed2681
txgraph: Add simulation fuzz test (tests)
This adds a simulation fuzz test for txgraph, by comparing with a naive
reimplementation that models the entire graph as a single DepGraph, and
clusters in TxGraph as connected components within that DepGraph.
05abf336f9
txgraph: Add internal sanity check function (tests)
To make testing more powerful, expose a function to perform an internal sanity
check on the state of a TxGraph. This is especially important as TxGraphImpl
contains many redundantly represented pieces of information:
* graph contains clusters, which refer to entries, but the entries refer back
* graph maintains pointers to Ref objects, which point back to the graph.
This lets us make sure they are always in sync.
ee57e93099
txgraph: Avoid per-group vectors for clusters & dependencies (optimization)
Instead construct a single vector with the list of all clusters in all groups,
and then store per-group offset/range in that list.
For dependencies, reuse m_deps_to_add, and store offset/range into that.
c80aecc24d
txgraph: Add GetChunkFeerate function (feature)
This adds a function to query the chunk feerate of a transaction, by caching it
inside the Entry objects.
1d27b74c8e
txgraph: Make max cluster count configurable and "oversize" state (feature)
Instead of leaving the responsibility on higher layers to guarantee that
no connected component within TxGraph (a barely exposed concept, except through
GetCluster()) exceeds the cluster count limit, move this responsibility to
TxGraph itself:
* TxGraph retains a cluster count limit, but it becomes configurable at construction
time (this primarily helps with testing that it is properly enforced).
* It is always allowed to perform mutators on TxGraph, even if they would cause the
cluster count limit to be exceeded. Instead, TxGraph exposes an IsOversized()
function, which queries whether it is in a special "oversize" state.
* During oversize state, many inspectors are unavailable, but mutators remain valid,
so the higher layer can "fix" the oversize state before continuing.
64f69ec8c3
txgraph: Avoid representative lookup for each dependency (optimization)
The m_deps_to_add vector is sorted by child Cluster*, which matches the
order of an_clusters. This means we can walk through m_deps_to_add while
doing the representative lookups for an_clusters, and reuse them.
1171953ac6
txgraph: Avoid looking up the same child cluster repeatedly (optimization)
Since m_deps_to_add has been sorted by child Cluster* already, all dependencies
with the same child will be processed consecutively. Take advantage of this by
remember the last partition merged with, and reusing that if applicable.
57f5499882
txgraph: Delay chunking while sub-acceptable (optimization)
Chunk-based information (primarily, chunk feerates) are never accessed without
first bringing the relevant Clusters to an "acceptable" quality level. Thus,
while operations are ongoing and Clusters are not acceptable, we can omit
computing the chunkings and chunk feerates for Clusters.
5801e0fb2b
txgraph: Special-case removal of tail of cluster (Optimization)
When transactions are removed from the tail of a cluster, we know the existing
linearization remains acceptable (if it already was), but may just need splitting
and postlinearization, so special case these into separate quality levels.
36dd5edca5
txgraph: Group per-graph data in ClusterSet (refactor)
This is a preparation for a next commit where a TxGraph will start representing
potentially two distinct graphs (a main one, and a staging one with proposed
changes).
34aa3da5ad
txgraph: Abstract out ClearLocator (refactor)
Move a number of related modifications to TxGraphImpl into a separate
function for removal of transactions. This is preparation for a later
commit where this will be useful in more than one place.
c99c7300b4
txgraph: Add staging support (feature)
In order to make it easy to evaluate proposed changes to a TxGraph, introduce a
"staging" mode, where mutators (AddTransaction, AddDependency, RemoveTransaction)
do not modify the actual graph, but just a staging version of it. That staging
graph can then be commited (replacing the main one with it), or aborted (discarding
the staging).
8c70688965
txgraph: Cache oversizedness of graphs (optimization)6b037ceddf
txgraph: Destroying Ref means removing transaction (feature)
Before this commit, if a TxGraph::Ref object is destroyed, it becomes impossible
to refer to, but the actual corresponding transaction node in the TxGraph remains,
and remains indefinitely as there is no way to remove it.
Fix this by making the destruction of TxGraph::Ref trigger immediate removal of
the corresponding transaction in TxGraph, both in main and staging if it exists.
82fa3573e1
txgraph: Allow Refs to outlive the TxGraph (feature)22c68cd153
txgraph: Expose ability to compare transactions (feature)
In order to make it possible for higher layers to compare transaction quality
(ordering within the implicit total ordering on the mempool), expose a comparison
function and test it.
295a1ca8bb
txgraph: Add DoWork function (feature)
This can be called when the caller has time to spend now, and wants future operations
to be fast.
b685d322c9
txgraph: Add CountDistinctClusters function (feature)aded047019
txgraph: Multiple inputs to Get{Ancestors,Descendant}Refs (preparation)
This is a preparation for the next commit, which adds a feature to request
the Refs to multiple ancestors/descendants at once.
DrahtBot requested review from ismaelsadeeq
on Mar 24, 2025
in
src/txgraph.cpp:882
in
c80aecc24doutdated
878@@ -857,22 +879,27 @@ void TxGraphImpl::ApplyDependencies() noexcept
879 if (m_deps_to_add.empty()) return;
880881 // For each group of to-be-merged Clusters.
882- for (auto& group_data : *m_group_data) {
883+ for (const auto& group_data : m_group_data->m_groups) {
67@@ -65,14 +68,16 @@ class Cluster
68 QualityLevel m_quality{QualityLevel::NONE};
69 /** Which position this Cluster has in Graph::ClusterSet::m_clusters[m_quality]. */
70 ClusterSetIndex m_setindex{ClusterSetIndex(-1)};
71+ /** Which level this Cluster is at in the graph (-1=not inserted, 0=main, 1=staging). */
sipa
commented at 7:24 pm on March 25, 2025:
member
It occurs to me that it may be possible to get rid of TxGraphImpl::Entry::m_main_lin_index, by storing a transaction linearization position rather than its DepGraphIndex inside the TxGraphImpl::Locator. So instead of using the DepGraphIndex as primary way to identify a transaction within a cluster, make the linearization index the primary, and look up the other one through m_linearization when needed. It’s not clear to me if it’s better to leave m_mapping be indexed by DepGraphIndex or linearization index, but I think it can be done.
Anyway, after looking into it, I don’t think it’s worth changing at this stage in the PR, but maybe it can be done as a follow-up.
in
src/txgraph.cpp:341
in
8c70688965outdated
343- void SetClusterQuality(QualityLevel old_quality, ClusterSetIndex old_index, QualityLevel new_quality) noexcept;
344- /** Make a transaction not exist. It must currently exist. */
345- void ClearLocator(GraphIndex index) noexcept;
346+ void SetClusterQuality(int level, QualityLevel old_quality, ClusterSetIndex old_index, QualityLevel new_quality) noexcept;
347+ /** Get the index of the top level ClusterSet (staging if it exists, main otherwise). */
348+ int GetTopLevel() const noexcept { return m_staging_clusterset.has_value(); }
Some parts of the file seem to suggest a max_level-agnostic implementation (e.g. looping on level from 0 to GetTopLevel()), but this casting from a bool to an int suggests it has to be 2. I think it would be useful to have more levels (e.g. for package test acceptance) in the future, but perhaps a static_assert(MAX_LEVELS == 2) might be good documentation here?
ajtowns
commented at 6:25 pm on March 26, 2025:
contributor
reACKb2ea3656481b4196acaf6a1b5f3949a9ba4cf48f
ismaelsadeeq
commented at 8:11 pm on March 26, 2025:
member
reACKb2ea3656481b4196acaf6a1b5f3949a9ba4cf48f 🚀
in
src/txgraph.cpp:1368
in
b2ea365648
1363+ // that the chunks of the resulting linearization are all connected.
1364+ if (!optimal) PostLinearize(m_depgraph, linearization);
1365+ // Update the linearization.
1366+ m_linearization = std::move(linearization);
1367+ // Update the Cluster's quality.
1368+ auto new_quality = optimal ? QualityLevel::OPTIMAL : QualityLevel::ACCEPTABLE;
fwiw, I don’t think there is test coverage for this (i.e. fine if this is always set to optimal), though I expect any problems would become more obvious when mempool things are hooked up to txgraph
Right. The difference between OPTIMAL and ACCEPTABLE is not observable as far as the interface is concerned, because it only promises a best effort.
There is an operational difference, though: DoWork() will not bother putting more effort into OPTIMAL clusters.
glozow
commented at 9:06 pm on March 26, 2025:
member
ACKb2ea3656481
Reviewed the structure to convince myself this approach makes sense to support the features required, and reviewed the SimTxGraph + differential fuzzer in closer detail. I haven’t read through all the TxGraphImpl functions but did some manual mutation testing.
in
src/test/fuzz/txgraph.cpp:583
in
b2ea365648
578+ for (auto i : component) {
579+ component |= sel_sim.graph.Ancestors(i);
580+ component |= sel_sim.graph.Descendants(i);
581+ }
582+ if (component == old_component) break;
583+ }
556+ refs.resize(count);
557+ for (size_t i = 0; i < count; ++i) {
558+ refs[i] = pick_fn();
559+ }
560+ // Their order should not matter, shuffle them.
561+ std::shuffle(refs.begin(), refs.end(), rng);
Well, pick_fn picks based on fuzz input, which is arbitrary, but definitely not random. There is a point to be made that we should leave it up to the fuzzer entirely here, but I felt like adding actual random ordering might more easily cover strange orderings. Probably doesn’t add much, but also does not hurt much.
in
src/cluster_linearize.h:1359
in
0aa874a357outdated
1354+ SetType place_before = done & depgraph.Ancestors(elem);
1355+ // Find which position to place elem in (updating j), continuously moving the elements
1356+ // in between forward.
1357+ while (place_before.Any()) {
1358+ // j cannot be 0 here; if it was, then there was necessarily nothing earlier which
1359+ // elem needs to be place before anymore, and place_before would be empty.
Where does the MAX_TRANSACTIONS come from? It should limit the size of removed to 100. This was added when the fuzzer ran for much longer (up to 1000 iterations, I think), and I wanted to avoid it just creating and removing transactions. It currently doesn’t do much anymore with the lower iteration count.
I was thinking that if all the transactions are descended from this tx, you’d add all of them to to_remove here. So if you started out with 99, you can end up with a lot more than 100 afterward. You wouldn’t do more removals afterward, but you’d have more than 100.
This is a metadata mirror of the GitHub repository
bitcoin/bitcoin.
This site is not affiliated with GitHub.
Content is generated from a GitHub metadata backup.
generated: 2025-06-17 12:13 UTC
This site is hosted by @0xB10C More mirrored repositories can be found on mirror.b10c.me