Replace cluster linearization algorithm with SFL

sipa commented at 11:24 pm on May 17, 2025: member

Part of cluster mempool: #30289. Based on #30605.

This replaces the cluster linearization algorithm introduced in #30126 and #30286 (a combination of LIMO with candidate-set search), with a completely different algorithm: spanning-forest linearization, which appears to have much better performance for hard clusters. See this post for a comparison between various linearization algorithms, and this post for benchmarks comparing them. Replaying historical mempool data on it shows that it can effectively linearize every observed cluster up to 64 transactions optimally within tens of microseconds, though pathological examples can be created which take longer.

The algorithm is effectively a very specialized version of the simplex algorithm to the problem of finding high-feerate topological subsets of clusters, but modified to find all consecutive such subsets concurrently rather than just the first one. See the post above for how it is related.

It represents the cluster as partitioned into a set of chunks, each with a spanning tree of its internal dependencies connecting the transactions. Randomized improvements are made by selecting dependencies to add and remove to these spanning trees, merging and splitting chunks, until no more improvements are possible. Like simplex, it does not necessarily make progress in every step, and thus has no upper bound on its runtime, but randomization makes long runtimes very unlikely, and additionally makes it hard to adversarially construct clusters in which the algorithm reliably makes bad choices.

DrahtBot commented at 11:24 pm on May 17, 2025: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32545.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	jonatack

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#32263 (cluster mempool: add TxGraph work controls by sipa)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

“must be subset” -> “must be a subset” [missing article makes the requirement unclear] No other comprehension-impacting typos were found.

drahtbot_id_4_m

sipa force-pushed on May 17, 2025

DrahtBot added the label CI failed on May 17, 2025

DrahtBot commented at 11:40 pm on May 17, 2025: contributor

🚧 At least one of the CI tasks failed. Task ARM, unit tests, no functional tests: https://github.com/bitcoin/bitcoin/runs/42417371062 LLM reason (✨ experimental): The CI failure is due to a build error during the compilation of txgraph.cpp.o.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

sipa force-pushed on May 18, 2025

DrahtBot removed the label CI failed on May 18, 2025

sipa force-pushed on May 18, 2025

sipa force-pushed on May 19, 2025

DrahtBot added the label CI failed on May 19, 2025

DrahtBot commented at 3:39 am on May 19, 2025: contributor

🚧 At least one of the CI tasks failed. Task ARM, unit tests, no functional tests: https://github.com/bitcoin/bitcoin/runs/42447412610 LLM reason (✨ experimental): The CI failure is due to a failed CTest test: cluster_linearize_tests.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

sipa force-pushed on May 20, 2025

DrahtBot removed the label CI failed on May 20, 2025

sipa added the label Mempool on May 20, 2025

in src/cluster_linearize.h:558 in 23072f2b0e outdated

554@@ -539,492 +555,651 @@ class LinearizationChunking
555     }
556 };
557 
558-/** Class encapsulating the state needed to find the best remaining ancestor set.
559+/** Class to represent the internal state of the spanning-forest linearization algorithm.

jonatack commented at 4:09 pm on May 22, 2025:

Appreciate the excellent doxygen documentation here.

jonatack commented at 4:09 pm on May 22, 2025: member

Concept ACK

sipa force-pushed on May 23, 2025

sipa force-pushed on May 24, 2025

sipa force-pushed on May 25, 2025

DrahtBot added the label CI failed on May 25, 2025

DrahtBot commented at 4:44 pm on May 25, 2025: contributor

🚧 At least one of the CI tasks failed. Task CentOS, depends, gui: https://github.com/bitcoin/bitcoin/runs/42858330826 LLM reason (✨ experimental): The CI failure is due to assertion failures within the cluster_linearize_tests and bench_sanity_check tests.

Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.
A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.
An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

DrahtBot removed the label CI failed on May 25, 2025

sipa force-pushed on May 25, 2025

sipa force-pushed on May 28, 2025

sipa commented at 2:57 pm on May 28, 2025: member

I’m interested in seeing benchmarks of this on various systems, especially high-end/modern ones (as their performance is likely most predictive of what future hardware will be like).

0./build_dev_mode/bin/bench_bitcoin --filter="Linearize.*Example.*" -min-time=10000

Here are some I have gathered:

ns/cost	cost/s	err%	total	benchmark
1.68	595,052,476.10	0.4%	11.01	`LinearizeBoundedExample0`
1.92	521,264,873.78	0.1%	11.01	`LinearizeBoundedExample1`
1.79	557,860,504.16	0.3%	10.58	`LinearizeBoundedExample2`
1.89	528,599,597.23	0.1%	10.97	`LinearizeBoundedExample3`
1.81	552,507,201.82	0.4%	10.81	`LinearizeBoundedExample4`
1.97	508,721,677.18	1.0%	11.05	`LinearizeBoundedExample5`
1.89	528,912,231.85	0.4%	11.06	`LinearizeBoundedExample6`
1.79	557,221,394.89	0.8%	11.01	`LinearizeBoundedExample7`
1.79	557,162,554.42	0.3%	11.05	`LinearizeBoundedExample8`
1.84	543,573,897.10	0.3%	10.99	`LinearizeBoundedExample9`

ns/op	op/s	err%	total	benchmark
1,576.22	634,428.95	0.4%	10.95	`LinearizeOptimallyExample00`
3,010.14	332,209.92	0.3%	10.99	`LinearizeOptimallyExample01`
5,871.01	170,328.44	0.4%	11.09	`LinearizeOptimallyExample02`
6,321.90	158,180.35	0.6%	10.55	`LinearizeOptimallyExample03`
15,081.30	66,307.30	0.4%	10.99	`LinearizeOptimallyExample04`
12,227.15	81,785.18	0.2%	11.01	`LinearizeOptimallyExample05`
12,703.52	78,718.34	0.4%	11.02	`LinearizeOptimallyExample06`
11,519.71	86,807.71	0.3%	11.01	`LinearizeOptimallyExample07`
14,768.84	67,710.11	0.5%	10.93	`LinearizeOptimallyExample08`
19,655.67	50,875.90	0.3%	11.03	`LinearizeOptimallyExample09`
11,776.61	84,914.06	0.1%	10.97	`LinearizeOptimallyExample10`
17,396.59	57,482.51	0.5%	10.97	`LinearizeOptimallyExample11`
25,232.95	39,630.72	0.3%	11.04	`LinearizeOptimallyExample12`
28,941.27	34,552.73	0.2%	10.98	`LinearizeOptimallyExample13`
36,964.92	27,052.68	0.7%	11.00	`LinearizeOptimallyExample14`
37,103.54	26,951.61	0.3%	11.02	`LinearizeOptimallyExample15`
43,747.38	22,858.51	0.3%	10.98	`LinearizeOptimallyExample16`
59,132.17	16,911.27	0.8%	11.04	`LinearizeOptimallyExample17`
61,824.04	16,174.94	0.2%	10.54	`LinearizeOptimallyExample18`
73,274.33	13,647.34	0.6%	10.65	`LinearizeOptimallyExample19`

ns/cost	cost/s	err%	total	benchmark
1.31	763,277,399.95	0.0%	10.99	`LinearizeBoundedExample0`
1.53	652,463,095.02	0.1%	11.00	`LinearizeBoundedExample1`
1.49	672,057,837.39	0.0%	11.00	`LinearizeBoundedExample2`
1.66	602,362,799.74	0.0%	10.69	`LinearizeBoundedExample3`
1.52	656,146,353.18	0.1%	10.71	`LinearizeBoundedExample4`
1.65	607,795,039.45	0.0%	10.76	`LinearizeBoundedExample5`
1.59	628,108,657.72	0.0%	10.82	`LinearizeBoundedExample6`
1.50	668,384,115.59	0.1%	11.00	`LinearizeBoundedExample7`
1.51	663,161,838.88	0.0%	11.01	`LinearizeBoundedExample8`
1.53	654,682,223.87	0.1%	10.99	`LinearizeBoundedExample9`

ns/op	op/s	err%	total	benchmark
1,209.24	826,967.78	0.3%	10.98	`LinearizeOptimallyExample00`
2,393.97	417,716.84	0.0%	11.01	`LinearizeOptimallyExample01`
4,764.63	209,879.68	0.0%	11.01	`LinearizeOptimallyExample02`
5,067.95	197,318.41	0.1%	11.01	`LinearizeOptimallyExample03`
12,716.36	78,638.87	0.0%	11.01	`LinearizeOptimallyExample04`
10,231.29	97,739.43	0.0%	11.01	`LinearizeOptimallyExample05`
10,649.64	93,899.84	0.0%	11.01	`LinearizeOptimallyExample06`
9,564.17	104,556.95	0.1%	11.01	`LinearizeOptimallyExample07`
11,884.47	84,143.41	0.3%	11.00	`LinearizeOptimallyExample08`
16,436.12	60,841.62	0.1%	11.00	`LinearizeOptimallyExample09`
9,937.05	100,633.51	0.1%	11.00	`LinearizeOptimallyExample10`
14,575.73	68,607.20	0.2%	11.02	`LinearizeOptimallyExample11`
21,355.98	46,825.28	0.1%	11.01	`LinearizeOptimallyExample12`
24,439.13	40,917.99	0.0%	11.00	`LinearizeOptimallyExample13`
31,407.36	31,839.67	0.1%	11.00	`LinearizeOptimallyExample14`
31,366.41	31,881.24	0.1%	11.01	`LinearizeOptimallyExample15`
37,182.88	26,894.10	0.1%	11.00	`LinearizeOptimallyExample16`
50,528.12	19,790.96	0.3%	11.02	`LinearizeOptimallyExample17`
52,714.16	18,970.24	0.1%	11.00	`LinearizeOptimallyExample18`
62,357.87	16,036.47	0.1%	10.57	`LinearizeOptimallyExample19`

ns/cost	cost/s	err%	total	benchmark
1.53	655,436,526.57	0.7%	1.10	`LinearizeBoundedExample0`
1.81	552,194,379.23	0.2%	1.10	`LinearizeBoundedExample1`
1.52	656,148,852.84	0.4%	1.09	`LinearizeBoundedExample2`
1.75	572,294,606.97	0.7%	1.06	`LinearizeBoundedExample3`
1.62	615,593,105.29	0.3%	1.07	`LinearizeBoundedExample4`
1.78	561,989,024.83	1.2%	1.08	`LinearizeBoundedExample5`
1.69	592,612,447.36	0.2%	1.10	`LinearizeBoundedExample6`
1.58	633,450,827.45	0.3%	1.10	`LinearizeBoundedExample7`
1.56	640,690,050.49	0.1%	1.10	`LinearizeBoundedExample8`
1.59	628,148,004.87	0.2%	1.09	`LinearizeBoundedExample9`

ns/op	op/s	err%	total	benchmark
1,499.94	666,693.94	0.4%	1.09	`LinearizeOptimallyExample00`
2,884.51	346,679.16	0.2%	1.10	`LinearizeOptimallyExample01`
5,815.27	171,961.12	0.1%	1.10	`LinearizeOptimallyExample02`
6,097.37	164,005.14	0.2%	1.05	`LinearizeOptimallyExample03`
13,502.59	74,059.89	0.1%	1.10	`LinearizeOptimallyExample04`
11,447.25	87,357.24	0.1%	1.10	`LinearizeOptimallyExample05`
11,639.59	85,913.71	0.1%	1.10	`LinearizeOptimallyExample06`
9,557.65	104,628.26	0.1%	1.10	`LinearizeOptimallyExample07`
12,139.84	82,373.43	0.1%	1.10	`LinearizeOptimallyExample08`
18,237.62	54,831.73	0.1%	1.10	`LinearizeOptimallyExample09`
11,806.18	84,701.39	0.1%	1.10	`LinearizeOptimallyExample10`
17,519.95	57,077.80	0.2%	1.09	`LinearizeOptimallyExample11`
24,397.16	40,988.37	0.2%	1.10	`LinearizeOptimallyExample12`
28,264.97	35,379.48	0.1%	1.10	`LinearizeOptimallyExample13`
36,230.91	27,600.74	0.5%	1.10	`LinearizeOptimallyExample14`
36,251.16	27,585.33	0.3%	1.10	`LinearizeOptimallyExample15`
42,480.07	23,540.45	0.3%	1.10	`LinearizeOptimallyExample16`
59,497.62	16,807.39	0.4%	1.10	`LinearizeOptimallyExample17`
62,192.70	16,079.06	0.2%	1.06	`LinearizeOptimallyExample18`
73,604.11	13,586.20	0.7%	1.06	`LinearizeOptimallyExample19`

ns/cost	cost/s	err%	total	benchmark
4.39	227,695,874.58	0.4%	10.99	`LinearizeBoundedExample0`
5.32	187,925,655.80	2.3%	11.09	`LinearizeBoundedExample1`
4.96	201,470,069.83	1.6%	11.13	`LinearizeBoundedExample2`
5.47	182,878,034.13	1.9%	10.75	`LinearizeBoundedExample3`
5.02	199,278,377.58	2.6%	10.73	`LinearizeBoundedExample4`
5.69	175,678,783.21	2.7%	11.08	`LinearizeBoundedExample5`
5.53	180,751,956.64	4.1%	11.00	`LinearizeBoundedExample6`
5.80	172,338,240.84	2.0%	10.81	`LinearizeBoundedExample7`
5.24	190,865,678.33	3.1%	10.63	`LinearizeBoundedExample8`
5.51	181,564,438.26	2.9%	11.38	`LinearizeBoundedExample9`

ns/op	op/s	err%	total	benchmark
4,399.38	227,304.78	1.9%	11.18	`LinearizeOptimallyExample00`
8,789.91	113,766.77	0.9%	10.84	`LinearizeOptimallyExample01`
15,890.75	62,929.71	1.6%	11.07	`LinearizeOptimallyExample02`
19,082.82	52,403.17	2.2%	11.00	`LinearizeOptimallyExample03`
44,627.25	22,407.83	1.8%	11.20	`LinearizeOptimallyExample04`
33,718.11	29,657.66	1.1%	11.03	`LinearizeOptimallyExample05`
35,473.62	28,189.96	1.1%	11.05	`LinearizeOptimallyExample06`
33,162.95	30,154.13	1.5%	10.88	`LinearizeOptimallyExample07`
45,473.12	21,991.01	0.9%	11.12	`LinearizeOptimallyExample08`
57,919.95	17,265.21	1.1%	11.04	`LinearizeOptimallyExample09`
32,640.19	30,637.08	2.1%	11.16	`LinearizeOptimallyExample10`
46,559.58	21,477.86	1.5%	10.98	`LinearizeOptimallyExample11`
66,614.50	15,011.75	1.4%	10.55	`LinearizeOptimallyExample12`
76,783.09	13,023.70	0.5%	10.74	`LinearizeOptimallyExample13`
97,635.10	10,242.22	1.7%	11.00	`LinearizeOptimallyExample14`
99,854.80	10,014.54	0.8%	11.00	`LinearizeOptimallyExample15`
119,444.66	8,372.08	1.2%	11.11	`LinearizeOptimallyExample16`
156,160.25	6,403.68	0.9%	10.98	`LinearizeOptimallyExample17`
163,364.57	6,121.28	0.8%	10.90	`LinearizeOptimallyExample18`
193,524.10	5,167.32	1.2%	11.14	`LinearizeOptimallyExample19`

ns/cost	cost/s	err%	total	benchmark
1.75	571,671,353.08	0.1%	10.99	`LinearizeBoundedExample0`
2.05	487,547,934.48	0.1%	11.00	`LinearizeBoundedExample1`
1.97	506,752,217.57	0.1%	10.64	`LinearizeBoundedExample2`
2.24	447,162,026.45	0.1%	11.00	`LinearizeBoundedExample3`
2.08	480,557,495.14	0.1%	11.01	`LinearizeBoundedExample4`
2.30	434,505,839.40	0.1%	11.12	`LinearizeBoundedExample5`
2.21	451,557,580.61	0.0%	11.00	`LinearizeBoundedExample6`
2.25	444,374,372.72	0.0%	11.00	`LinearizeBoundedExample7`
2.12	471,017,980.18	0.1%	10.93	`LinearizeBoundedExample8`
2.22	450,256,469.71	1.1%	11.07	`LinearizeBoundedExample9`

ns/op	op/s	err%	total	benchmark
1,586.71	630,233.45	0.1%	10.98	`LinearizeOptimallyExample00`
3,268.23	305,976.43	0.0%	11.00	`LinearizeOptimallyExample01`
6,858.84	145,797.34	0.2%	10.54	`LinearizeOptimallyExample02`
7,873.51	127,008.11	0.1%	10.73	`LinearizeOptimallyExample03`
17,277.23	57,879.64	0.1%	11.00	`LinearizeOptimallyExample04`
14,178.66	70,528.55	0.1%	11.00	`LinearizeOptimallyExample05`
16,009.26	62,463.84	0.0%	10.99	`LinearizeOptimallyExample06`
12,586.44	79,450.57	1.6%	10.87	`LinearizeOptimallyExample07`
15,396.54	64,949.64	0.0%	11.00	`LinearizeOptimallyExample08`
24,369.54	41,034.82	0.0%	11.00	`LinearizeOptimallyExample09`
13,610.19	73,474.38	0.1%	10.99	`LinearizeOptimallyExample10`
20,073.90	49,815.92	0.1%	11.01	`LinearizeOptimallyExample11`
30,462.32	32,827.44	0.3%	11.12	`LinearizeOptimallyExample12`
34,713.56	28,807.18	0.3%	10.93	`LinearizeOptimallyExample13`
44,015.70	22,719.17	0.1%	11.00	`LinearizeOptimallyExample14`
43,492.08	22,992.69	0.1%	11.00	`LinearizeOptimallyExample15`
51,596.62	19,381.12	0.1%	11.01	`LinearizeOptimallyExample16`
71,384.39	14,008.67	0.2%	10.69	`LinearizeOptimallyExample17`
73,714.84	13,565.79	0.1%	10.69	`LinearizeOptimallyExample18`
87,910.27	11,375.24	0.0%	10.82	`LinearizeOptimallyExample19`

davidgumberg commented at 3:16 am on May 29, 2025: contributor

(built with Clang)

AMD Ryzen 9 9950X 16-Core Processor

0./build/bin/bench_bitcoin --filter="Linearize.*Example.*" -min-time=10000

ns/cost	cost/s	err%	ins/cost	cyc/cost	IPC	bra/cost	miss%	total	benchmark
1.19	836,855,166.72	0.4%	20.42	5.13	3.984	2.29	2.4%	11.05	`LinearizeBoundedExample0`
1.44	695,776,928.37	0.1%	24.33	6.17	3.945	2.74	2.4%	10.99	`LinearizeBoundedExample1`
1.17	852,566,861.36	0.2%	18.78	5.03	3.733	2.33	3.2%	11.04	`LinearizeBoundedExample2`
1.36	735,968,415.25	0.1%	19.54	5.83	3.349	2.36	4.6%	10.57	`LinearizeBoundedExample3`
1.24	805,653,703.91	0.0%	18.58	5.33	3.487	2.27	4.2%	10.56	`LinearizeBoundedExample4`
1.34	745,442,570.61	0.1%	20.46	5.76	3.553	2.50	4.1%	10.62	`LinearizeBoundedExample5`
1.27	784,639,270.59	0.0%	19.10	5.47	3.490	2.35	4.3%	10.65	`LinearizeBoundedExample6`
1.18	847,347,367.90	0.2%	18.45	5.07	3.642	2.23	3.6%	10.70	`LinearizeBoundedExample7`
1.19	841,752,012.32	0.2%	18.65	5.10	3.657	2.29	3.5%	10.70	`LinearizeBoundedExample8`
1.20	833,486,862.05	0.1%	19.07	5.15	3.703	2.31	3.2%	10.79	`LinearizeBoundedExample9`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
1,062.27	941,380.47	0.2%	23,154.64	4,561.62	5.076	2,732.15	1.1%	10.99	`LinearizeOptimallyExample00`
2,199.32	454,686.52	0.1%	44,955.91	9,443.49	4.761	5,125.75	1.2%	11.01	`LinearizeOptimallyExample01`
4,655.43	214,803.15	0.1%	76,020.42	19,989.80	3.803	8,376.75	2.7%	11.01	`LinearizeOptimallyExample02`
4,687.57	213,330.01	0.2%	79,606.35	20,125.95	3.955	8,838.43	2.7%	10.99	`LinearizeOptimallyExample03`
11,198.28	89,299.41	0.1%	180,121.05	48,080.45	3.746	21,333.25	2.9%	11.01	`LinearizeOptimallyExample04`
8,806.12	113,557.33	0.0%	148,315.36	37,809.30	3.923	17,597.94	2.9%	10.83	`LinearizeOptimallyExample05`
8,881.81	112,589.68	0.0%	129,414.56	38,127.56	3.394	15,799.78	4.3%	10.83	`LinearizeOptimallyExample06`
7,396.23	135,204.04	0.0%	139,880.41	31,756.73	4.405	16,345.08	2.7%	10.67	`LinearizeOptimallyExample07`
9,691.58	103,182.40	0.0%	201,859.77	41,612.38	4.851	23,957.84	1.1%	11.01	`LinearizeOptimallyExample08`
13,912.45	71,878.08	0.1%	205,829.89	59,733.52	3.446	25,060.12	4.2%	11.00	`LinearizeOptimallyExample09`
9,799.39	102,047.14	0.1%	131,704.66	42,074.45	3.130	13,959.94	5.7%	11.00	`LinearizeOptimallyExample10`
14,178.88	70,527.42	0.1%	181,891.98	60,879.32	2.988	18,797.61	5.9%	11.02	`LinearizeOptimallyExample11`
19,417.52	51,499.89	0.1%	231,655.76	83,370.63	2.779	26,582.04	6.6%	11.01	`LinearizeOptimallyExample12`
21,875.47	45,713.31	0.1%	262,258.87	93,911.73	2.793	30,218.20	6.7%	11.00	`LinearizeOptimallyExample13`
28,875.31	34,631.67	0.1%	336,971.73	123,981.98	2.718	38,520.06	6.5%	11.01	`LinearizeOptimallyExample14`
27,860.30	35,893.37	0.1%	334,160.29	119,619.93	2.794	37,727.10	6.5%	11.00	`LinearizeOptimallyExample15`
33,495.46	29,854.79	0.1%	395,850.86	143,808.78	2.753	45,419.38	6.8%	11.00	`LinearizeOptimallyExample16`
46,820.27	21,358.27	0.2%	527,868.16	201,024.15	2.626	60,683.66	7.4%	11.01	`LinearizeOptimallyExample17`
47,420.53	21,087.91	0.1%	549,255.39	203,589.67	2.698	62,108.84	7.0%	11.01	`LinearizeOptimallyExample18`
56,870.07	17,583.94	0.2%	665,564.77	244,167.60	2.726	75,210.75	6.7%	10.98	`LinearizeOptimallyExample19`

AMD Ryzen 9 7900X 12-Core Processor

0./build/bin/bench_bitcoin --filter="Linearize.*Example.*" -min-time=10000

ns/cost	cost/s	err%	ins/cost	cyc/cost	IPC	bra/cost	miss%	total	benchmark
1.36	737,410,406.03	0.2%	20.58	6.34	3.247	2.34	2.5%	11.02	`LinearizeBoundedExample0`
1.65	606,854,042.04	0.1%	24.57	7.70	3.190	2.76	2.5%	10.99	`LinearizeBoundedExample1`
1.31	760,547,140.52	0.1%	18.87	6.14	3.071	2.36	3.4%	11.02	`LinearizeBoundedExample2`
1.52	657,532,700.47	0.0%	19.61	7.11	2.758	2.38	4.9%	10.63	`LinearizeBoundedExample3`
1.39	718,444,894.17	0.1%	18.63	6.51	2.864	2.30	4.3%	10.63	`LinearizeBoundedExample4`
1.52	656,965,976.43	0.1%	20.51	7.11	2.882	2.53	4.1%	10.71	`LinearizeBoundedExample5`
1.44	694,175,863.25	0.1%	19.12	6.73	2.839	2.39	4.5%	10.72	`LinearizeBoundedExample6`
1.34	745,455,006.56	0.0%	18.49	6.27	2.948	2.27	3.8%	10.81	`LinearizeBoundedExample7`
1.35	740,869,763.95	0.2%	18.70	6.31	2.965	2.32	3.7%	10.83	`LinearizeBoundedExample8`
1.36	733,610,430.06	0.0%	19.17	6.37	3.009	2.34	3.4%	11.00	`LinearizeBoundedExample9`

ns/op	op/s	err%	ins/op	cyc/op	IPC	bra/op	miss%	total	benchmark
1,231.66	811,912.24	0.0%	23,357.46	5,757.88	4.057	2,788.34	1.0%	11.01	`LinearizeOptimallyExample00`
2,608.32	383,388.20	0.1%	44,890.34	12,192.83	3.682	5,176.76	1.2%	11.02	`LinearizeOptimallyExample01`
5,129.41	194,954.12	0.1%	76,112.07	23,977.96	3.174	8,574.18	2.6%	10.99	`LinearizeOptimallyExample02`
5,283.32	189,274.88	0.1%	79,355.44	24,692.88	3.214	8,960.39	2.7%	11.01	`LinearizeOptimallyExample03`
12,510.48	79,933.00	0.1%	178,410.73	58,475.18	3.051	21,948.73	3.0%	11.00	`LinearizeOptimallyExample04`
9,699.21	103,101.15	0.1%	147,263.70	45,334.12	3.248	18,517.16	2.7%	10.99	`LinearizeOptimallyExample05`
9,890.03	101,111.94	0.1%	128,779.01	46,228.98	2.786	16,175.51	4.5%	11.00	`LinearizeOptimallyExample06`
8,853.35	112,951.56	0.1%	138,404.58	41,381.10	3.345	16,972.04	2.6%	10.83	`LinearizeOptimallyExample07`
11,429.49	87,493.00	0.0%	199,933.47	53,426.08	3.742	24,782.93	1.0%	11.00	`LinearizeOptimallyExample08`
15,278.31	65,452.27	0.1%	203,844.43	71,404.74	2.855	25,811.52	4.2%	10.98	`LinearizeOptimallyExample09`
10,430.03	95,877.04	0.1%	130,835.87	48,749.71	2.684	14,322.22	5.7%	11.01	`LinearizeOptimallyExample10`
14,955.33	66,865.78	0.1%	181,388.73	69,905.18	2.595	19,826.07	5.7%	11.01	`LinearizeOptimallyExample11`
20,192.09	49,524.34	0.1%	229,811.63	94,374.34	2.435	27,200.27	6.6%	11.01	`LinearizeOptimallyExample12`
23,265.71	42,981.71	0.3%	260,056.43	108,734.94	2.392	30,956.74	6.7%	10.99	`LinearizeOptimallyExample13`
29,690.95	33,680.30	0.1%	334,363.41	138,778.90	2.409	39,510.12	6.4%	10.99	`LinearizeOptimallyExample14`
29,188.83	34,259.68	0.1%	331,541.94	136,412.59	2.430	38,603.88	6.3%	10.99	`LinearizeOptimallyExample15`
35,084.30	28,502.78	0.0%	391,658.97	163,986.36	2.388	46,505.21	6.6%	11.00	`LinearizeOptimallyExample16`
47,904.99	20,874.65	0.1%	521,828.04	223,869.99	2.331	62,246.01	7.0%	11.01	`LinearizeOptimallyExample17`
49,498.85	20,202.49	0.1%	543,459.92	231,331.22	2.349	63,630.07	6.7%	11.01	`LinearizeOptimallyExample18`
59,173.83	16,899.36	0.1%	658,965.47	276,557.48	2.383	77,128.66	6.3%	11.00	`LinearizeOptimallyExample19`

ismaelsadeeq commented at 1:51 pm on May 29, 2025: member

I’m interested in seeing benchmarks of this on various systems, especially high-end/modern ones (as their performance is likely most predictive of what future hardware will be like).

Below are benchmarks on MacBook M2 PRO 2023. Compiled using clang with only build bench cmake option.

0cmake -B build -DBUILD_BENCH=ON && cmake --build build

0abubakarismail@Abubakars-MacBook-Pro ~/D/W/b/bitcoin ((47bdf8fa))> ./build/bin/bench_bitcoin --filter="Linearize.*Example.*" -min-ti
1me=1000

ns/cost	cost/s	err%	total	benchmark
1.85	539,881,102.87	6.1%	10.64	:wavy_dash: `LinearizeBoundedExample0` (Unstable with ~171,993.8 iters. Increase `minEpochIterations` to e.g. 1719938)
2.16	463,765,560.64	1.9%	11.52	`LinearizeBoundedExample1`
1.89	529,370,444.85	3.4%	11.88	`LinearizeBoundedExample2`
2.09	478,266,850.10	2.1%	10.69	`LinearizeBoundedExample3`
2.00	500,459,378.92	3.4%	10.99	`LinearizeBoundedExample4`
2.23	448,896,053.69	4.1%	12.62	`LinearizeBoundedExample5`
2.21	453,201,053.94	6.2%	9.74	:wavy_dash: `LinearizeBoundedExample6` (Unstable with ~70,645.3 iters. Increase `minEpochIterations` to e.g. 706453)
1.88	531,282,530.06	1.7%	11.59	`LinearizeBoundedExample7`
1.82	550,292,679.97	1.5%	11.85	`LinearizeBoundedExample8`
1.81	553,506,180.26	1.1%	10.68	`LinearizeBoundedExample9`

ns/op	op/s	err%	total	benchmark
1,927.91	518,696.06	1.2%	11.39	`LinearizeOptimallyExample00`
3,227.99	309,790.35	1.0%	10.99	`LinearizeOptimallyExample01`
6,619.50	151,068.72	2.3%	10.17	`LinearizeOptimallyExample02`
6,648.74	150,404.54	1.1%	10.67	`LinearizeOptimallyExample03`
15,989.38	62,541.52	1.7%	11.11	`LinearizeOptimallyExample04`
13,491.31	74,121.78	4.1%	12.15	`LinearizeOptimallyExample05`
12,958.96	77,166.66	1.0%	10.63	`LinearizeOptimallyExample06`
11,525.25	86,766.02	3.7%	11.77	`LinearizeOptimallyExample07`
14,095.38	70,945.21	0.6%	10.80	`LinearizeOptimallyExample08`
21,410.11	46,706.90	1.1%	11.46	`LinearizeOptimallyExample09`
13,372.29	74,781.50	1.5%	11.06	`LinearizeOptimallyExample10`
19,803.65	50,495.75	3.2%	10.79	`LinearizeOptimallyExample11`
26,699.86	37,453.38	1.1%	11.15	`LinearizeOptimallyExample12`
36,106.95	27,695.50	8.0%	11.03	:wavy_dash: `LinearizeOptimallyExample13` (Unstable with ~24,184.5 iters. Increase `minEpochIterations` to e.g. 241845)
41,346.30	24,185.96	3.3%	10.49	`LinearizeOptimallyExample14`
41,673.85	23,995.86	1.6%	9.89	`LinearizeOptimallyExample15`
47,425.14	21,085.87	1.1%	11.07	`LinearizeOptimallyExample16`
66,540.78	15,028.38	2.4%	11.26	`LinearizeOptimallyExample17`
66,934.73	14,939.93	1.4%	10.54	`LinearizeOptimallyExample18`
81,828.99	12,220.61	1.8%	11.44	`LinearizeOptimallyExample19`

~~Almost 10x speedup~~ edit: bench cases changed, so not comparable with master.

sipa commented at 1:54 pm on May 29, 2025: member

@ismaelsadeeq Not an apples-to-apples comparison I’m afraid, because the benchmark cases are changed in this PR too. Can you run the LinearizeBoundedExample? benchmarks too?

ismaelsadeeq commented at 2:35 pm on May 29, 2025: member

@ismaelsadeeq Not an apples-to-apples comparison I’m afraid, because the benchmark cases are changed in this PR too

Oops sorry haven’t take a look, I was hoping I can compare locally myself. I saw a link to the bench results comparing them in the description 👍🏾

Can you run the LinearizeBoundedExample? benchmarks too?

Yep I did and updated the bench results. previous one was not pointing to your latest push.

instagibbs commented at 4:28 pm on May 29, 2025: member

Ran apples to apples out of curiosity. One example ran marginally slower on this PR (but still very fast), many are many multiples faster and some see 500x+ improvements.

gmaxwell commented at 6:47 pm on May 29, 2025: contributor

@instagibbs I believe the only cases that should be slower with SFL are either very small examples (where their time is irrelevant because its very low per tx compared to bigger clusters) and ones with huge numbers of dependencies. Beyond huge dependencies being generally unrepresentative they’re also expensive to generate because every dependency needs a vin, they also tend to be so slow with both that I guess neither will run to completion in practice.

A better way to compare across implementations might be some kind of optimization time vs amount of fee required to create a cluster with that dependency geometry (just assuming a 1s/vb feerate or something, not adding up the actual costs in the cluster because another cluster could exist with the same topology but different fees/size). like fee = 16*txn + 41*inputs_consumed + 9*outputs_consumed_within_cluster – so I’d hope to find that SFL always beats the current exponential algorithm on this kind of metric (except for very small inputs).

The exponential algorithm gets faster with dependency rich clusters because the linearization essentially becomes topologically forced, so the current algorithm doesn’t have many options to consider. If it massive high dependency clusters became common on the network in the future it might make sense to bring back the exponential algorithm for them. :P but I doubt that would happen.

fjahr commented at 8:26 pm on May 30, 2025: contributor

I’m interested in seeing benchmarks of this on various systems

$ build/bin/bench_bitcoin –filter=“Linearize.Example.” -min-time=10000

ns/cost	cost/s	err%	total	benchmark
1.12	893,382,447.85	0.5%	11.06	`LinearizeBoundedExample0`
1.34	743,601,759.92	0.7%	11.07	`LinearizeBoundedExample1`
1.14	880,692,144.98	0.3%	10.99	`LinearizeBoundedExample2`
1.33	749,663,261.00	0.3%	10.98	`LinearizeBoundedExample3`
1.24	807,545,779.18	0.5%	10.57	`LinearizeBoundedExample4`
1.37	730,046,090.82	0.4%	10.64	`LinearizeBoundedExample5`
1.28	779,862,831.19	0.4%	10.66	`LinearizeBoundedExample6`
1.17	856,937,081.55	0.4%	10.71	`LinearizeBoundedExample7`
1.19	840,624,085.43	0.2%	10.74	`LinearizeBoundedExample8`
1.19	840,938,970.51	0.4%	10.74	`LinearizeBoundedExample9`

ns/op	op/s	err%	total	benchmark
1,155.37	865,524.35	0.2%	11.00	`LinearizeOptimallyExample00`
2,170.00	460,828.59	0.2%	11.04	`LinearizeOptimallyExample01`
4,507.21	221,866.90	0.2%	11.01	`LinearizeOptimallyExample02`
4,435.85	225,435.87	0.3%	10.96	`LinearizeOptimallyExample03`
10,189.93	98,136.12	0.2%	11.02	`LinearizeOptimallyExample04`
8,854.57	112,936.09	0.4%	10.79	`LinearizeOptimallyExample05`
8,685.08	115,140.05	0.1%	10.82	`LinearizeOptimallyExample06`
7,162.17	139,622.57	0.0%	10.64	`LinearizeOptimallyExample07`
9,120.88	109,638.49	0.0%	11.00	`LinearizeOptimallyExample08`
12,780.73	78,242.79	0.1%	11.00	`LinearizeOptimallyExample09`
9,242.70	108,193.53	0.1%	10.98	`LinearizeOptimallyExample10`
13,653.81	73,239.62	0.3%	11.04	`LinearizeOptimallyExample11`
19,211.80	52,051.34	0.2%	11.00	`LinearizeOptimallyExample12`
22,243.75	44,956.45	0.1%	11.00	`LinearizeOptimallyExample13`
28,409.81	35,199.11	0.2%	10.99	`LinearizeOptimallyExample14`
28,265.23	35,379.16	0.1%	10.99	`LinearizeOptimallyExample15`
33,767.58	29,614.20	0.3%	11.02	`LinearizeOptimallyExample16`
46,851.80	21,343.90	0.4%	11.03	`LinearizeOptimallyExample17`
48,470.35	20,631.17	0.1%	10.98	`LinearizeOptimallyExample18`
57,668.81	17,340.40	0.4%	10.98	`LinearizeOptimallyExample19`

sipa force-pushed on May 31, 2025

sipa commented at 1:24 pm on May 31, 2025: member

@instagibbs @ismaelsadeeq @gmaxwell FWIW, I didn’t include apples-to-apples comparison because the large-scale comparison benchmarks I did (especially, this one) were just overwhelmingly in favor of SFL over the existing one. The benchmarks were mostly done to decide between SFL and another replacement algorithm, GGT, but see the reasons for choosing SFL here.

A fair apples-to-apples comparison I think needs to use a dataset large enough to encompass both good and bad cases for each of the algorithms, and not be specifically biased for one of them, which is hard in a benchmark I can include in the PR directly. If people are interested, I can try to clean up the branch I used to create the graphs above, however.

sipa commented at 9:25 pm on May 31, 2025: member

Thanks for all the benchmarks, here are some graphs created from them to show their relative performance:

(code & data to generate them is here)

sipa commented at 1:43 am on June 2, 2025: member

If people are interested in a much larger scale comparison between the old and the new algorithm, checkout https://github.com/sipa/bitcoin/commits/bench_sfl_css, build its bin/bench_bitcoin, and:

0wget https://bitcoin.sipa.be/clusters/clusters.tgz
1tar -xzf clusters.tgz
2for F in clusters_sim2023_large clusters_medium clusters_spanning clusters_bipartite; do time ./build/bin/bench_bitcoin --filter="LinearizeDataSet" <$F >$F.out; done
3zstd -19 clusters_*.out

And then PR the resulting *.out.zst files to https://github.com/sipa/lin-benches/tree/main/data.

l0rinc commented at 10:21 am on June 4, 2025: contributor

I’ve run the command on a Raspberry Pi 4 Model B - Cortex-A72 4-Core ARM with SanDisk SSD PLUS 1TB (USB 3.0) with to have data about lower-end hardware as well.

0git remote add sipa https://github.com/sipa/bitcoin.git || true && git fetch sipa bench_sfl_css && git switch -C bench_sfl_css sipa/bench_sfl_css && \
1git clean -fxd && git reset --hard && \
2cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release && cmake --build build -j"$(nproc)" && \
3wget -q https://bitcoin.sipa.be/clusters/clusters.tgz && tar -xzf clusters.tgz && \
4for F in clusters_sim2023_large clusters_medium clusters_spanning clusters_bipartite; do \
5    time build/bin/bench_bitcoin --filter="LinearizeDataSet" <"$F" >"$F.out"; \
6done && \
7tar -czf clusters.out.tgz clusters_*.out

It took considerably longer than expected (~13h):

clusters_sim2023_large: 11h 39m
clusters_medium: 26m 33s
clusters_spanning: 19m 37s
clusters_bipartite: 37m 17s

Let’s see if I can attach it here: clusters.out.tgz

sipa commented at 12:44 pm on June 4, 2025: member

I’ve created a repo for benchmark data, and added my results plus @l0rinc’s: https://github.com/sipa/lin-benches/tree/main/data.

Also, is this with 32-bit or 64-bit userspace?

l0rinc commented at 1:22 pm on June 4, 2025: contributor

Also, is this with 32-bit or 64-bit userspace?

64-bit userspace (AArch64)

0$ file bitcoind
1bitcoind: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=7e059ec01f7460042910ca4ed15270382269c9d5, for GNU/Linux 3.7.0, with debug_info, not stripped

0$ getconf LONG_BIT
164
2$ dpkg --print-architecture
3arm64
4$ uname -m
5aarch64

sipa force-pushed on Jun 12, 2025

sipa commented at 5:28 pm on June 12, 2025: member

Rebased on top of #30605.

sipa force-pushed on Jun 12, 2025

sipa force-pushed on Jun 14, 2025

clusterlin: add known-correct optimal linearization tests (tests) 70082855b9

clusterlin: replace benchmarks with SFL-hard ones (bench) 22d59ccce2

clusterlin: replace cluster linearization with SFL implementation (feature)

This replaces the existing LIMO linearization algorithm (which internally uses
ancestor set finding and candidate set finding) with the much more performant
spanning-forest linearization algorithm.

See https://delvingbitcoin.org/t/spanning-forest-cluster-linearization/1419

9bcfee7a19

clusterlin: remove unused {Ancestor,Search}CandidateFinder code (cleanup)

This removes the candidate set finding classes, as well as related tests
and benchmarks for them.

e4cdaeaacb

clusterlin: keep track of the active parents of each transaction (optimization)

This avoids the need for a loop over all parents of a transaction while walking
a chunk, and removes the need to store the set of parent dependencies explicitly.

542f770888

clusterlin: split chunks based on maximum gain (optimization)

This introduces the notion of gain to the SFL algorithm. Given a chunk c,
an active dependency d in it, and the chunks (t, b) that c would split into
if d were deactivated, the gain is defined as either (they are equivalent):

  (feerate(t) - feerate(b)) * size(t) * size(b)
  fee(t) * size(b) - fee(b) * size(t)

It happens to also be equal to these:

  (feerate(t) - feerate(c)) * size(t) * size(c)
  fee(t) * size(c) - fee(c) * size(t)

Its relevance is that this metric is proportional to a lower bound on the area
under the fee-size diagram which would be gained IF a deactivation of d does not
result in a self-merge of t and b again.

This commit adds logic to find, within each chunk, the dependency with the highest
gain. In benchmarks, this appears to be a very good heuristic for deciding which
splits are worth making.

b305cbd7b0

clusterlin: use pre-allocated array for dependencies instead of vector (optimization)

This reduces the number of allocations required inside the SFL algorithm, and works
because the number of dependencies per transaction is at most n-1.

To minimize the memory usage from this pre-allocation (which might impact memory locality),
change the data type of DepIdx from uint32_t to uint8_t or uint16_t when possible.

0f089ad6c1

clusterlin: keep child dependencies split into active and inactive (optimization)

Within the per-transaction child dependency list, keep the active ones before all
inactive ones. This improves the complexity over iterating over active dependencies
from O(m) to O(n), as at most n-1 dependencies can be active within any given chunk
at any given time.

00bb7a6fe4

clusterlin: keep FIFO queue of improvable chunks (optimization)

This distributes the work over the various chunks fairly, and simultaneously
avoids retrying chunks over and over again which are already known to be optimal.

44c8a8d685

clusterlin: randomize merges/splits in SFL (feature) c612fec2c5

clusterlin: use random split in SFL on every 3rd attempt (feature)

Out of an abundance of caution that adversarially-constructed clusters might
reliably result in bad chunk split decisions with the maximum-gain strategy,
make every third consecutive attempt to split the same chunk use a random
strategy instead.

5331dd6e5b

clusterlin: make dependency being active implicit (optimization)

We do not need to actually keep track of whether a dependency is active
or not; it is implied by whether or not it appears within the active
prefix of its parent's child_deps, and its child's parent_deps.

Just remove setting and checking it.

51cc2e2ac1

clusterlin: add more accurate cost modeling (feature)

This adds a rough estimate of algorithm runtime, so it can be interrupted if no
solution is found in time. Due to inherent differences between platforms, this
will not be extremely accurate, but it is preferable over directly measuring
time for consistency.

0f6bdd810d

clusterlin: minimize chunks (feature)

After the normal optimization process finishes, and finds an optimal
spanning forest, run a second process (while computation budget remains)
to split chunks into minimal equal-feerate chunks.

As a side-effect, this also guarantees that the optimal chunk order is
deterministic.

545c11da46

sipa force-pushed on Jul 11, 2025

DrahtBot added the label Needs rebase on Jul 29, 2025

DrahtBot commented at 1:38 pm on July 29, 2025: contributor

🐙 This pull request conflicts with the target branch and needs rebase.

Replace cluster linearization algorithm with SFL #32545

Code Coverage & Benchmarks

Reviews

Conflicts

LLM Linter (✨ experimental)

AMD Ryzen 9 9950X 16-Core Processor

AMD Ryzen 9 7900X 12-Core Processor