Follow-up to #32545, part of #30289.
This contains many of the optimizations that were originally part of #32545 itself.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34023.
See the guideline for information on the review process. A summary of reviews will appear here.
🚧 At least one of the CI tasks failed.
Task 32 bit ARM: https://github.com/bitcoin/bitcoin/actions/runs/19993781856/job/57338323718
LLM reason (✨ experimental): Cluster linearize tests fail due to a size mismatch (linearization size != depgraph TxCount) triggering assertion failures and aborted tests.
Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:
Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.
Leave a comment here, if you need help tracking down a confusing failure.
This also adds a per-cost variant of each.
This adds a data structure representing the optimization state for the spanning-forest
linearization algorithm (SFL), plus a fuzz test for its correctness.
This is preparation for switching over Linearize() to use this algorithm.
See https://delvingbitcoin.org/t/spanning-forest-cluster-linearization/1419 for
a description of the algorithm.
This replaces the existing LIMO linearization algorithm (which internally uses
ancestor set finding and candidate set finding) with the much more performant
spanning-forest linearization algorithm.
This removes the old candidate-set search algorithm, and several of its tests,
benchmarks, and needed utility code.
This introduces a queue of chunks that still need processing, in both
MakeTopological() and OptimizationStep(). This is simultaneously:
* A preparation for introducing randomization, by allowing permuting the
queue.
* An improvement to the fairness of suboptimal solutions, by distributing
the work more fairly over chunks.
* An optimization, by avoiding retrying chunks over and over again which
are already known to be optimal.
This introduces a local RNG inside the SFL state, which is used to randomize
various decisions inside the algorithm, in order to make it hard to create
pathological clusters which predictably have bad performance.
The decisions being randomized are:
* When deciding what chunk to attempt to split, the queue order is
randomized.
* When deciding which dependency to split on, a uniformly random one is
chosen among those with higher top feerate than bottom feerate within
the chosen chunk.
* When deciding which chunks to merge, a uniformly random one among those
with the higher feerate difference is picked.
* When merging two chunks, a uniformly random dependency between them is
now activated.
* When making the state topological, the queue of chunks to process is
randomized.
This places equal-feerate chunks (with no dependencies between them) in random
order in the linearization output, hiding information about DepGraph insertion
order from the output. Likewise, it randomizes the order of transactions within
chunks for the same reason.
This ended up never being used in txgraph.
With MergeLinearizations() gone and the LIMO-based Linearize() replaced by SFL, we do not
need a class (LinearizationChunking) that can maintain an incrementally-improving chunk
set anymore.
Replace it with a function (ChunkLinearizationInfo) that just computes the chunks as
SetInfos once, and returns them as a vector. This simplifies several call sites too.
This avoids the need for a loop over all parents of a transaction while walking
a chunk, and removes the need to store the set of parent dependencies explicitly.
This introduces the notion of gain to the SFL algorithm. Given a chunk c,
an active dependency d in it, and the chunks (t, b) that c would split into
if d were deactivated, the gain is defined as either (they are equivalent):
(feerate(t) - feerate(b)) * size(t) * size(b)
fee(t) * size(b) - fee(b) * size(t)
It happens to also be equal to these:
(feerate(t) - feerate(c)) * size(t) * size(c)
fee(t) * size(c) - fee(c) * size(t)
Its relevance is that this metric is proportional to a lower bound on the area
under the fee-size diagram which would be gained IF a deactivation of d does not
result in a self-merge of t and b again.
This commit adds logic to find, within each chunk, the dependency with the highest
gain. In benchmarks, this appears to be a very good heuristic for deciding which
splits are worth making.
This reduces the number of allocations required inside the SFL algorithm, and works
because the number of dependencies per transaction is at most n-1.
To minimize the memory usage from this pre-allocation (which might impact memory
locality), change the data type of DepIdx from uint32_t to uint8_t or uint16_t when
possible.
Within the per-transaction child dependency list, keep the active ones before all
inactive ones. This improves the complexity over iterating over active dependencies
from O(m) to O(n), as at most n-1 dependencies can be active within any given chunk
at any given time.
This avoids adding them a second time when they happen to already be there.
Out of an abundance of caution that adversarially-constructed clusters might
reliably result in bad chunk split decisions with the maximum-gain strategy,
make every third consecutive attempt to split the same chunk use a random
strategy instead.
We do not need to actually keep track of whether a dependency is active
or not; it is implied by whether or not it appears within the active
prefix of its parent's child_deps, and its child's parent_deps.
Just remove setting and checking it.
After the normal optimization process finishes, and finds an optimal
spanning forest, run a second process (while computation budget remains)
to split chunks into minimal equal-feerate chunks.
his adds a rough estimate of algorithm runtime, so it can be interrupted if no
solution is found in time. Due to inherent differences between platforms, this
will not be extremely accurate, but it is preferable over directly measuring
time for consistency.
This also updates FixLinearization to just be a thin wrapper around Linearize.
In a future commit, FixLinearization will be removed entirely.
With the SFL algorithm, we will practically be capable of keeping
most if not all clusters optimal. With that, it seems less valuable
to avoid doing work after splitting an acceptable cluster, because by
doing some work we may get it to OPTIMAL.
This reduces the complexity of the code a bit as well.
With the new SFL algorithm, the process of loading an existing linearization into the
SFL state is very similar to what PostLinearize does. This means there is little benefit
to performing an explicit PostLinearize step before linearizing inside txgraph. Instead,
it seems better to use our allotted CPU time to perform more SFL optimization steps.