Cluster mempool: SFL cost model (take 2) #34616

pull sipa wants to merge 2 commits into bitcoin:master from sipa:202512_sfl_costs changing 4 files +102 −22
  1. sipa commented at 5:26 pm on February 18, 2026: member

    Part of #30289, replaces earlier #34138.

    This introduces a more accurate cost model for SFL, to control how much CPU time is spent inside the algorithm for clusters that cannot be linearized perfectly within a reasonable amount of time.

    The goal is having a metric for the amount of work performed, so that txmempool can impose limits on that work: a lower bound that is always performed (unless optimality is reached before that point, of course), and an upper bound to limit the latency and total CPU time spent on this. There are conflicting design goals here:

    • On the one hand, it seems ideal if this metric is closely correlated to actual CPU time, because otherwise the limits become inaccurate.
    • On the other hand, it seems a nightmare to have the metric be platform/system dependent, as it makes network-wide reasoning nearly impossible. It’s expected that slower systems take longer to do the same thing; this holds for everything, and we don’t need to compensate for this.

    There are multiple solutions to this:

    • One extreme is just measuring the time. This is very accurate, but extremely platform dependent, and also non-deterministic due to random scheduling/cache effects.
    • The other extreme is using a very abstract metric, like counting how many times certain loops/functions inside the algorithm run. That is what is implemented in master right now: just counting the sum of the numbers of transactions updated across all UpdateChunks() calls. However, it necessarily fails to account for significant portions of runtime spent elsewhere, resulting in a rather wide range of “ns per cost” values.
    • This PR takes a middle ground, counting many function calls / branches / loops, with weights determined through benchmarking, averaged over a number of systems.
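
    The middle-ground idea can be sketched as follows. This is purely illustrative, NOT the PR's actual model: the class, operation names, and weights are all made up, but it shows the shape of a deterministic cost accumulator with trained per-operation weights.

```python
# Illustrative sketch of a weighted cost accumulator: each counted
# operation adds a fixed trained weight, possibly plus a per-element
# weight, so "cost" tracks work deterministically without measuring
# wall-clock time. Names and weights here are hypothetical.
class ToyCostModel:
    def __init__(self):
        self.cost = 0

    def update_chunks(self, num_tx):
        # hypothetical weights: fixed call overhead + per-transaction term
        self.cost += 40 + 12 * num_tx

    def activate_dependency(self):
        self.cost += 25

    def exceeded(self, budget):
        return self.cost > budget

model = ToyCostModel()
while not model.exceeded(64000):
    model.update_chunks(8)       # 40 + 12*8 = 136 units
    model.activate_dependency()  # 25 units
print(model.cost)  # first value above the 64000 budget
```

    Because the weights are fixed integers, the same cluster costs the same number of units on every machine, even though the wall-clock time differs.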

    Specifically, the cost model was obtained by:

    • For a variety of machines:
      • Running a fixed collection of ~385000 clusters found through random generation and fuzzing, optimizing for difficulty of linearization.
        • Linearize each ~3000 times, with different random seeds. Sometimes without input linearization, sometimes with a bad one.
          • Gather cycle counts for each of the operations included in this cost model, broken down by their parameters.
      • Correct the data by subtracting the runtime of obtaining the cycle count.
      • Drop the 5% top and bottom samples from each cycle count dataset, and compute the average of the remaining samples.
      • For each operation, fit a least-squares linear function approximation through the samples.
    • Rescale all machine expressions to make their total time match, as we only care about relative cost of each operation.
    • Take the per-operation average of operation expressions across all machines, to construct expressions for an average machine.
    • Approximate the result with integer coefficients.
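
    The trimming and per-operation fitting steps can be sketched like this. The function names and sample data are made up for illustration (the real data and scripts are in the author's gist); each sample is assumed to be a (parameter value, measured cycles) pair.

```python
# Sketch of the per-operation fitting described above: drop outliers,
# average, then fit cycles ~= a*param + b by least squares.
def trimmed_mean(samples, frac=0.05):
    # drop the top and bottom 5% of samples, average the rest
    # (with only a few samples here, nothing is actually dropped)
    s = sorted(samples)
    k = int(len(s) * frac)
    kept = s[k:len(s) - k] if k else s
    return sum(kept) / len(kept)

def linear_fit(points):
    # closed-form least-squares line through (param, cycles) pairs
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# made-up cycle counts for one operation at parameter values 1..4
data = {1: [50, 52, 48, 50], 2: [62, 60, 58, 60],
        3: [71, 69, 70, 70], 4: [80, 82, 78, 80]}
points = [(p, trimmed_mean(v)) for p, v in data.items()]
a, b = linear_fit(points)
print(a, b)  # per-unit slope and fixed overhead for this operation
```

    The resulting (a, b) pairs are then rescaled across machines, averaged, and rounded to integer coefficients as described above.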

    The benchmarks were performed by l0rinc <pap.lorinc@gmail.com> and myself, on AMD Ryzen 5950X, AMD Ryzen 7995WX, AMD Ryzen 9980X, Apple M4 Max, Intel Core i5-12500H, Intel Core Ultra 7 155H, Intel N150 (Umbrel), Intel Core i7-7700, Intel Core i9-9900K, Intel Haswell (VPS, virtualized), ARM Cortex-A76 (Raspberry Pi 5).

    Based on final benchmarking, the “acceptable” iteration count (the minimum amount of work spent on every cluster) is set to 64000 units, which corresponds to roughly 50 μs on a Ryzen 5950X and similar modern desktop hardware.
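
    As a back-of-the-envelope check of that calibration (assuming only the quoted figures), the implied per-unit time is:

```python
# 64000 cost units ~= 50 microseconds on a Ryzen 5950X implies roughly
# how many nanoseconds of CPU time one cost unit represents there.
ACCEPTABLE_COST = 64000        # cost units spent on every cluster
time_us = 50.0                 # quoted wall-clock time for that budget
ns_per_unit = time_us * 1000 / ACCEPTABLE_COST
print(round(ns_per_unit, 3))   # ns of CPU time per cost unit
```

    So on that class of hardware, one cost unit is a bit under a nanosecond; slower machines simply take proportionally longer for the same unit count.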

  2. DrahtBot commented at 5:26 pm on February 18, 2026: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

  3. clusterlin: introduce CostModel class (preparation)
    This parametrizes the cost model for the SFL algorithm with another
    class. Right now, the behavior of that class matches the existing
    naive cost model, but it will be replaced with a more advanced one
    in a future commit.
    
    The reason for abstracting this out is that it makes benchmarking for
    creating such cost models easy, by instantiating the cost model class
    with one that tracks time.
    0fd47359c8
  4. clusterlin: adopt trained cost model (feature)
    See the comments for the SFLDefaultCostModel class for details on how
    the numbers were obtained.
    4673a5870f
  5. sipa force-pushed on Feb 18, 2026
  6. DrahtBot added the label CI failed on Feb 18, 2026
  7. sipa commented at 6:33 pm on February 18, 2026: member
    I’m currently weighing each machine equally, which may not be what we want (e.g., it means x86_64 systems contribute 82%). It’s easy to redo the computation with other factors. The data and scripts are here: https://gist.github.com/sipa/68109fe5f412f6c42f9b157420f2911b
  8. DrahtBot removed the label CI failed on Feb 18, 2026
  9. murchandamus commented at 9:32 pm on February 19, 2026: member

    I have not reviewed the SFL implementation before, but this seems pretty straightforward: just counting various operations at different weights.

    I’m kinda curious how much the additional counting impacts the performance.

  10. gmaxwell commented at 1:47 am on February 20, 2026: contributor
    A measurement would be good for certainty, but I wasn’t worried about the cost: instead of running inside inner loops, most of it just adds to an existing count, like how many transactions there are at some step or another.
  11. sipa commented at 1:45 pm on February 20, 2026: member

    Running this branch on all clusters seen in a simulated replay of 2023 mempool data, I get the following maximum costs (from 1 to 64 transaction clusters, inclusive): 148, 413, 717, 1196, 1505, 2413, 2222, 2772, 3357, 3847, 4899, 5557, 6461, 7431, 8507, 9144, 10737, 12169, 13691, 14034, 16550, 18539, 19804, 21090, 22750, 24219, 25492, 27941, 26319, 28270, 26858, 26835, 30743, 32388, 34586, 33729, 32543, 34478, 23431, 36431, 39179, 25413, 34125, 29534, 30974, 34904, 45576, 68614, 35684, 31717, 37144, 34917, 36022, 43473, 50659, 41848, 48538, 64825, 43945, 49389, 45978, 45961, 46946, 52467.

    The same numbers for recent dumps from Oct 2025 through yesterday (not a simulated replay): 148, 413, 717, 1176, 1616, 2430, 3000, 3624, 3944, 4764, 4800, 6147, 9162, 11460, 8832, 11347, 9785, 12437, 13388, 14681, 12882, 19168, 16532, 23370, 22523, 19992, 14850, 19338, 26696, 23344, 15424, 23684, 30767, 24648, 25410, 27899, 20638, 25316, 24522, 26477, 24069, 34044, 24520, 40842, 22827, 29580, 27610, 25332, 26773, 30432, 30786, 36058, 37928, 37084, 35088, 40434, 39613, 40156, 42154, 49497, 42668, 55243, 39423, 55692.

    So almost all sizes (all except ntx=48 and ntx=58 in the 2023 data; all of the recent ones) saw no clusters that would need more than the acceptable level of 64000 cost. These numbers were created by running every cluster 100 times, with 100 different RNG seeds, and with “worst case” input linearizations (including random ones, worst-possible inputs, or no input at all). In practice, I expect that most large clusters will be built up gradually, and will thus be passed a nearly-perfect input linearization, yielding better numbers than these. Also note that linearization efforts perform up to 5 times the acceptable level (so 320000 cost), spread over multiple clusters. If there is a single non-optimal cluster left, all of that effort may be spent on it.
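
    For reference, the per-size maxima above can be computed from raw (cluster size, cost) samples like this. The sample data here is made up, chosen only so that the resulting maxima match the first three reported values:

```python
# Group observed linearization costs by cluster size and keep the
# maximum seen for each size; sample data is illustrative only.
from collections import defaultdict

samples = [(1, 120), (1, 148), (2, 413), (2, 390), (3, 650), (3, 717)]
max_cost = defaultdict(int)
for ntx, cost in samples:
    max_cost[ntx] = max(max_cost[ntx], cost)
maxima = [max_cost[n] for n in sorted(max_cost)]
print(maxima)  # [148, 413, 717]
```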

    I’ll try to make a histogram.


    For reference, here are the same numbers with the cost model in current master (acceptable limit = 1700), for the recent dumps: 0, 4, 10, 34, 48, 92, 122, 146, 169, 264, 242, 305, 545, 664, 491, 660, 628, 800, 868, 844, 842, 1088, 1172, 1387, 1424, 1211, 920, 1129, 1583, 1413, 978, 1404, 1838, 1611, 1492, 1701, 1248, 1506, 1541, 1792, 1458, 2045, 1533, 2656, 1380, 1938, 1789, 1534, 1726, 1800, 1825, 2181, 2532, 2213, 2216, 2437, 2600, 2535, 2581, 3101, 2571, 3472, 2351, 3552.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-02-20 18:13 UTC
