cluster mempool: add txgraph diagrams/mining/eviction #31444

pull sipa wants to merge 9 commits into bitcoin:master from sipa:202412_txgraph_index changing 3 files +695 −24
  1. sipa commented at 2:50 pm on December 8, 2024: member

    Part of cluster mempool: #30289. Builds on top of #31363.

    This adds more functionality to the txgraph module, specifically:

    • TxGraph::GetMainStagingDiagrams(), a function to obtain feerate diagrams for both the main graph and the staged changes to it, including only the clusters that differ between the two.
    • TxGraph::GetBlockBuilder(), a function to obtain an object which can efficiently iterate the chunks of the (main) graph from high to low chunk feerate, allowing each to be skipped or included.
    • TxGraph::GetWorstMainChunk(), a function to obtain the last chunk that would be returned by GetBlockBuilder()’s returned object, intended for eviction.
  2. DrahtBot commented at 2:50 pm on December 8, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31444.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK instagibbs
    Approach ACK ismaelsadeeq

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. sipa force-pushed on Dec 8, 2024
  4. DrahtBot added the label CI failed on Dec 8, 2024
  5. DrahtBot commented at 2:59 pm on December 8, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/34093447174

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  6. sipa force-pushed on Dec 8, 2024
  7. sipa force-pushed on Dec 8, 2024
  8. DrahtBot removed the label CI failed on Dec 8, 2024
  9. glozow added the label Mempool on Dec 17, 2024
  10. sipa force-pushed on Dec 20, 2024
  11. sipa force-pushed on Dec 22, 2024
  12. sipa force-pushed on Jan 8, 2025
  13. sipa commented at 11:30 pm on January 8, 2025: member
    I have simplified the eviction interface. Instead of getting a TxGraph::Evictor object through TxGraph::GetEvictor(), there is now a simpler TxGraph::GetWorstMainChunk() function, which is a pure inspector.
  14. DrahtBot added the label CI failed on Jan 9, 2025
  15. DrahtBot commented at 0:51 am on January 9, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35343422864

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  16. sipa force-pushed on Jan 9, 2025
  17. DrahtBot removed the label CI failed on Jan 9, 2025
  18. sipa force-pushed on Jan 9, 2025
  19. DrahtBot commented at 11:40 pm on January 9, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/35398305833

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  20. DrahtBot added the label CI failed on Jan 9, 2025
  21. sipa force-pushed on Jan 10, 2025
  22. DrahtBot removed the label CI failed on Jan 10, 2025
  23. sipa force-pushed on Jan 10, 2025
  24. sipa force-pushed on Jan 16, 2025
  25. sipa force-pushed on Jan 22, 2025
  26. sipa commented at 8:01 pm on January 22, 2025: member
    I modified TxGraph::GetWorstMainChunk() to return the chunk feerate in addition to the list of transaction Refs.
  27. sipa force-pushed on Jan 24, 2025
  28. DrahtBot commented at 11:48 pm on January 24, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36148920396

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  29. DrahtBot added the label CI failed on Jan 24, 2025
  30. sipa force-pushed on Jan 26, 2025
  31. DrahtBot removed the label CI failed on Jan 26, 2025
  32. sipa force-pushed on Jan 26, 2025
  33. sipa force-pushed on Jan 30, 2025
  34. DrahtBot added the label CI failed on Jan 31, 2025
  35. DrahtBot commented at 1:16 am on January 31, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36451236768

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  36. sipa force-pushed on Jan 31, 2025
  37. DrahtBot removed the label CI failed on Jan 31, 2025
  38. sipa force-pushed on Jan 31, 2025
  39. DrahtBot added the label CI failed on Jan 31, 2025
  40. DrahtBot commented at 11:33 pm on January 31, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/36505832220

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  41. sipa force-pushed on Feb 1, 2025
  42. DrahtBot removed the label CI failed on Feb 1, 2025
  43. sipa force-pushed on Feb 4, 2025
  44. sipa force-pushed on Feb 6, 2025
  45. sipa force-pushed on Feb 11, 2025
  46. sipa force-pushed on Feb 12, 2025
  47. DrahtBot commented at 11:49 pm on February 12, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/37128497464

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  48. DrahtBot added the label CI failed on Feb 12, 2025
  49. sipa force-pushed on Feb 13, 2025
  50. DrahtBot removed the label CI failed on Feb 13, 2025
  51. sipa force-pushed on Feb 14, 2025
  52. sipa force-pushed on Feb 20, 2025
  53. sipa force-pushed on Feb 21, 2025
  54. in src/cluster_linearize.h:1381 in efad48b239 outdated
    1364+        // Figure out which elements need to be moved before elem.
    1365+        SetType place_before = done & depgraph.Ancestors(elem);
    1366+        // Find which position to place elem in (updating j), continuously moving the elements
    1367+        // in between forward.
    1368+        while (place_before.Any()) {
    1369+            // j cannot be 0 here; if it was, then there was necessarily nothing earlier which
    


    yancyribbens commented at 0:36 am on February 25, 2025:
    0            // j cannot be 0 or less; if it was, then there was necessarily nothing earlier which
    
  55. bitcoin deleted a comment on Feb 25, 2025
  56. sipa force-pushed on Mar 3, 2025
  57. ismaelsadeeq commented at 8:31 pm on March 6, 2025: member
    Approach ACK will review in-depth soon
  58. in src/txgraph.cpp:825 in 22821d0bd5 outdated
    651+{
    652+    auto chunk_feerates = ChunkLinearization(m_depgraph, m_linearization);
    653+    for (const auto& feerate : chunk_feerates) {
    654+        ret.push_back(feerate);
    655+    }
    656+}
    


    ismaelsadeeq commented at 1:00 pm on March 13, 2025:
    0void Cluster::AppendChunkFeerates(std::vector<FeeFrac>& ret) const noexcept
    1{
    2    auto chunk_feerates = ChunkLinearization(m_depgraph, m_linearization);
    3    ret.reserve(ret.size() + chunk_feerates.size());
    4    ret.insert(ret.end(), chunk_feerates.begin(), chunk_feerates.end());
    5}
    

    sipa commented at 2:30 pm on March 28, 2025:
    Done (a while ago, forgot to respond).
  59. in src/txgraph.cpp:1870 in 22821d0bd5 outdated
    1864@@ -1854,6 +1865,33 @@ TxGraph::GraphIndex TxGraphImpl::CountDistinctClusters(std::span<const Ref* cons
    1865     return ret;
    1866 }
    1867 
    1868+std::pair<std::vector<FeeFrac>, std::vector<FeeFrac>> TxGraphImpl::GetMainStagingDiagrams() noexcept
    1869+{
    1870+    Assume(m_clustersets.size() >= 2);
    


    ismaelsadeeq commented at 1:04 pm on March 13, 2025:

    Shouldn’t this be <= instead.

    0    Assume(m_clustersets.size() <= 2);
    

    sipa commented at 2:30 pm on March 28, 2025:
    This code is gone, but no, I did mean >= 2 here (you cannot call GetMainStagingDiagrams unless you have at least 2 diagrams).
  60. in src/txgraph.cpp:2085 in 22821d0bd5 outdated
    1879+    for (Cluster* cluster : main_clusters) {
    1880+        cluster->AppendChunkFeerates(main_feerates);
    1881+    }
    1882+    // Do the same for the Clusters in staging themselves.
    1883+    const auto& staging = m_clustersets.back();
    1884+    for (int quality = 0; quality < int(QualityLevel::NONE); ++quality) {
    


    ismaelsadeeq commented at 1:13 pm on March 13, 2025:
    why are we iterating the clusters for all quality levels for staging?

    sipa commented at 7:54 pm on March 27, 2025:
    I think it’s more obviously correct. We should only have ACCEPTABLE and OPTIMAL ones, but going through all of them shouldn’t hurt here.
  61. sipa force-pushed on Mar 19, 2025
  62. sipa force-pushed on Mar 19, 2025
  63. DrahtBot added the label CI failed on Mar 19, 2025
  64. DrahtBot commented at 9:06 pm on March 19, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/39067512256

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  65. sipa force-pushed on Mar 19, 2025
  66. DrahtBot removed the label CI failed on Mar 20, 2025
  67. DrahtBot added the label Needs rebase on Mar 20, 2025
  68. sipa force-pushed on Mar 20, 2025
  69. DrahtBot added the label CI failed on Mar 20, 2025
  70. DrahtBot commented at 1:41 pm on March 20, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/39111487630

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  71. sipa force-pushed on Mar 20, 2025
  72. DrahtBot removed the label Needs rebase on Mar 20, 2025
  73. DrahtBot removed the label CI failed on Mar 20, 2025
  74. sipa force-pushed on Mar 22, 2025
  75. DrahtBot commented at 1:37 am on March 22, 2025: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/39212805888

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  76. DrahtBot added the label CI failed on Mar 22, 2025
  77. sipa force-pushed on Mar 22, 2025
  78. DrahtBot removed the label CI failed on Mar 22, 2025
  79. sipa force-pushed on Mar 23, 2025
  80. sipa force-pushed on Mar 24, 2025
  81. in src/txgraph.cpp:2073 in 7248d26bd0 outdated
    1906@@ -1897,6 +1907,32 @@ TxGraph::GraphIndex TxGraphImpl::CountDistinctClusters(std::span<const Ref* cons
    1907     return ret;
    1908 }
    1909 
    1910+std::pair<std::vector<FeeFrac>, std::vector<FeeFrac>> TxGraphImpl::GetMainStagingDiagrams() noexcept
    1911+{
    1912+    Assume(m_staging_clusterset.has_value());
    1913+    MakeAllAcceptable(0);
    


    instagibbs commented at 1:21 pm on March 25, 2025:
    Assume neither cluster set is oversized?

    sipa commented at 7:54 pm on March 27, 2025:
    That’s actually implied by Assume(m_main_clusterset.m_deps_to_add.empty()); below (but added a comment to clarify).
  82. in src/txgraph.cpp:2089 in 7248d26bd0 outdated
    1926+        for (const auto& cluster : m_staging_clusterset->m_clusters[quality]) {
    1927+            cluster->AppendChunkFeerates(staging_feerates);
    1928+        }
    1929+    }
    1930+    // Sort both by decreasing feerate to obtain diagrams, and return them.
    1931+    std::sort(main_feerates.begin(), main_feerates.end(), [](auto& a, auto& b) { return a >> b; });
    


    instagibbs commented at 2:07 pm on March 25, 2025:
    why not sort with size tiebreaks here? It isn’t clear based on context.

    sipa commented at 8:02 pm on March 27, 2025:

    The within-same-feerate ordering is actually irrelevant as far as the diagram is concerned, it’s just a sequence of equal-slope sections in the graph; they could arguably be merged into a single one even.

    Still, for consistently/repeatability, I’ve made it use the size-tiebreaking comparison.

  83. in src/test/fuzz/txgraph.cpp:841 in 7248d26bd0 outdated
    730+                assert(FeeRateCompare(main_cmp_diagram[i], main_cmp_diagram[i - 1]) <= 0);
    731+            }
    732+            for (size_t i = 1; i < stage_cmp_diagram.size(); ++i) {
    733+                assert(FeeRateCompare(stage_cmp_diagram[i], stage_cmp_diagram[i - 1]) <= 0);
    734+            }
    735+            // Apply total ordering on the feerate diagrams to make them comparable (the exact
    


    instagibbs commented at 2:11 pm on March 25, 2025:

    “comparable” through me off here since I was thinking about comparable diagrams.

    Can’t immediately think of better wording.


    sipa commented at 8:00 pm on March 27, 2025:
    I have reworded this. Please have a look.
  84. in src/test/fuzz/txgraph.cpp:901 in 7248d26bd0 outdated
    754+                                std::inserter(missing_stage_cmp, missing_stage_cmp.end()),
    755+                                std::greater{});
    756+            assert(stage_cmp_diagram.size() + missing_stage_cmp.size() == stage_diagram.size());
    757+            // The missing chunks must be equal across main & staging (otherwise they couldn't have
    758+            // been omitted).
    759+            assert(missing_main_cmp == missing_stage_cmp);
    


    instagibbs commented at 2:27 pm on March 25, 2025:

    I’m wondering what the best way of catching stuff that should be missing on both sides being added to both diagrams erroneously.

    You could imagine a degenerate case where a bug turns this into a whole mempool check. (and implies we’re doing a bunch of work maintaining these clusters for no reason)

    A rough suggestion to think about, perhaps you can tighten the bounds?

      0diff --git a/src/test/fuzz/txgraph.cpp b/src/test/fuzz/txgraph.cpp
      1index 7249827103..f30b650313 100644
      2--- a/src/test/fuzz/txgraph.cpp
      3+++ b/src/test/fuzz/txgraph.cpp
      4@@ -338,10 +338,17 @@ FUZZ_TARGET(txgraph)
      5         // just feerates).
      6         std::sort(ret.begin(), ret.end(), std::greater{});
      7         return ret;
      8     };
      9 
     10+    // When entering staging reset to 0
     11+    // This is used to upper-bound the possible diagram size in number of entries
     12+    // to catch any regression where erroneous entries are on both sides of the
     13+    // computed diagram when they shouldn't have been added at all.
     14+    uint32_t num_possible_conflicted_entries_main{0};
     15+    uint32_t num_possible_pulled_in_entries_staged{0};
     16+
     17     LIMITED_WHILE(provider.remaining_bytes() > 0, 200) {
     18         // Read a one-byte command.
     19         int command = provider.ConsumeIntegral<uint8_t>();
     20 
     21         /** Use the bottom 2 bits of command to select an entry in the block_builders vector (if
     22@@ -377,10 +384,11 @@ FUZZ_TARGET(txgraph)
     23                     // Otherwise, use smaller range which consume fewer fuzz input bytes, as just
     24                     // these are likely sufficient to trigger all interesting code paths already.
     25                     fee = provider.ConsumeIntegral<uint8_t>();
     26                     size = provider.ConsumeIntegral<uint8_t>() + 1;
     27                 }
     28+                num_possible_pulled_in_entries_staged++;
     29                 FeePerWeight feerate{fee, size};
     30                 // Create a real TxGraph::Ref.
     31                 auto ref = real->AddTransaction(feerate);
     32                 // Create a shared_ptr place in the simulation to put the Ref in.
     33                 auto ref_loc = top_sim.AddTransaction(feerate);
     34@@ -398,10 +406,22 @@ FUZZ_TARGET(txgraph)
     35                     // and if so, skip.
     36                     if (top_sim.graph.Ancestors(pos_par)[pos_chl]) break;
     37                 }
     38                 top_sim.AddDependency(par, chl);
     39                 real->AddDependency(*par, *chl);
     40+                if (pos_par != SimTxGraph::MISSING && pos_chl != SimTxGraph::MISSING) {
     41+                    if (!top_sim.IsOversized() && !main_sim.IsOversized()) {
     42+                        // If both are real, add resulting cluster count to num_possible_pulled_in_entries_*
     43+                        const auto max_count{std::max(real->GetCluster(*par, false).size(), real->GetCluster(*par, true).size())};
     44+                        num_possible_conflicted_entries_main += max_count;
     45+                        num_possible_pulled_in_entries_staged += max_count;
     46+                    } else {
     47+                        // If ever oversized, add max count to "turn it off"
     48+                        num_possible_conflicted_entries_main += SimTxGraph::MAX_TRANSACTIONS;
     49+                        num_possible_pulled_in_entries_staged += SimTxGraph::MAX_TRANSACTIONS;
     50+                    }
     51+                }
     52                 break;
     53             } else if ((block_builders.empty() || sims.size() > 1)  && top_sim.removed.size() < 100 && command-- == 0) {
     54                 // RemoveTransaction. Either all its ancestors or all its descendants are also
     55                 // removed (if any), to make sure TxGraph's reordering of removals and dependencies
     56                 // has no effect.
     57@@ -410,10 +430,20 @@ FUZZ_TARGET(txgraph)
     58                 top_sim.IncludeAncDesc(to_remove, alt);
     59                 // The order in which these ancestors/descendants are removed should not matter;
     60                 // randomly shuffle them.
     61                 std::shuffle(to_remove.begin(), to_remove.end(), rng);
     62                 for (TxGraph::Ref* ptr : to_remove) {
     63+                    if (!top_sim.IsOversized() && !main_sim.IsOversized()) {
     64+                        const auto max_count{std::max(real->GetCluster(*ptr, false).size(), real->GetCluster(*ptr, true).size())};
     65+                        // Overestimate number of pulled in items
     66+                        num_possible_conflicted_entries_main += max_count;
     67+                        num_possible_pulled_in_entries_staged += max_count;
     68+                    } else {
     69+                        // If ever oversized, add max count to "turn it off"
     70+                        num_possible_conflicted_entries_main += SimTxGraph::MAX_TRANSACTIONS;
     71+                        num_possible_pulled_in_entries_staged += SimTxGraph::MAX_TRANSACTIONS;
     72+                    }
     73                     real->RemoveTransaction(*ptr);
     74                     top_sim.RemoveTransaction(ptr);
     75                 }
     76                 break;
     77             } else if (sel_sim.removed.size() > 0 && command-- == 0) {
     78@@ -446,10 +476,20 @@ FUZZ_TARGET(txgraph)
     79                 }
     80                 // The order in which these ancestors/descendants are destroyed should not matter;
     81                 // randomly shuffle them.
     82                 std::shuffle(to_destroy.begin(), to_destroy.end(), rng);
     83                 for (TxGraph::Ref* ptr : to_destroy) {
     84+                    if (!top_sim.IsOversized() && !main_sim.IsOversized()) {
     85+                        const auto max_count{std::max(real->GetCluster(*ptr, false).size(), real->GetCluster(*ptr, true).size())};
     86+                        // Overestimate number of pulled in items
     87+                        num_possible_conflicted_entries_main += max_count;
     88+                        num_possible_pulled_in_entries_staged += max_count;
     89+                    } else {
     90+                        // If ever oversized, add max count to "turn it off"
     91+                        num_possible_conflicted_entries_main += SimTxGraph::MAX_TRANSACTIONS;
     92+                        num_possible_pulled_in_entries_staged += SimTxGraph::MAX_TRANSACTIONS;
     93+                    }
     94                     for (size_t level = 0; level < sims.size(); ++level) {
     95                         sims[level].DestroyTransaction(ptr, level == sims.size() - 1);
     96                     }
     97                 }
     98                 break;
     99@@ -580,10 +620,14 @@ FUZZ_TARGET(txgraph)
    100                 break;
    101             } else if (sims.size() < 2 && command-- == 0) {
    102                 // StartStaging.
    103                 sims.emplace_back(sims.back());
    104                 real->StartStaging();
    105+
    106+                // Start counting upper-bound on open of staging
    107+                num_possible_conflicted_entries_main = 0;
    108+                num_possible_pulled_in_entries_staged = 0;
    109                 break;
    110             } else if (block_builders.empty() && sims.size() > 1 && command-- == 0) {
    111                 // CommitStaging.
    112                 real->CommitStaging();
    113                 sims.erase(sims.begin());
    114@@ -668,10 +712,17 @@ FUZZ_TARGET(txgraph)
    115                 auto [main_diagram, staged_diagram] = real->GetMainStagingDiagrams();
    116                 auto sum_main = std::accumulate(main_diagram.begin(), main_diagram.end(), FeeFrac{});
    117                 auto sum_staged = std::accumulate(staged_diagram.begin(), staged_diagram.end(), FeeFrac{});
    118                 auto diagram_gain = sum_staged - sum_main;
    119                 auto real_gain = sims[1].SumAll() - sims[0].SumAll();
    120+
    121+                // Sanity check sizes of diagrams
    122+                assert(main_diagram.size() <= num_possible_conflicted_entries_main);
    123+                assert(main_diagram.size() <= SimTxGraph::MAX_TRANSACTIONS);
    124+                assert(staged_diagram.size() <= num_possible_pulled_in_entries_staged);
    125+                assert(staged_diagram.size() <= SimTxGraph::MAX_TRANSACTIONS);
    126+
    127                 // Just check that the total fee gained/lost and size gained/lost according to the
    128                 // diagram matches the difference in these values in the simulated graph. A more
    129                 // complete check of the GetMainStagingDiagrams result is performed at the end.
    130                 assert(diagram_gain == real_gain);
    131                 // Check that the feerates in each diagram are monotonically decreasing.
    

    sipa commented at 7:52 pm on March 27, 2025:
    I will think about this. I think there may be a better way to track what we expect to be missing in the diagram.

    sipa commented at 2:26 pm on March 28, 2025:
    I have added a more accurate approach, I think: the staging trim graph now keeps track of which transactions have been potentially modified since being copied from main. The final comparison checks that the sections of the diagram missing from both GetMainStagingDiagrams results includes at least all those that have definitely not been modified.

    instagibbs commented at 2:49 pm on March 28, 2025:

    After a quick read this looks great! Simple and makes sense. Since the missing is equal on each side, the accounting only needs to be done on one side.

    I’m a bit confused still how this isn’t true:

    0assert(missing_real.fee >= missing_expected.fee);
    

    It quickly fails, and I haven’t dove into the examples. If we’re conservatively counting the un-affected clusters, why would this fail?


    sipa commented at 6:49 pm on March 28, 2025:
    Because fees can be negative in the fuzz test. If they were all non-negative, then I think this suggested assertion should hold.

    instagibbs commented at 6:50 pm on March 28, 2025:
    ah! please leave a note as to that, I always forget that’s possible

    sipa commented at 8:59 pm on March 28, 2025:
    Added a comment.
  85. in src/txgraph.h:194 in e37c5a6afe outdated
    195@@ -196,6 +196,10 @@ class TxGraph
    196     /** Construct a block builder, drawing chunks in order, from the main graph, which cannot be
    197      *  oversized. While the returned object exists, no mutators on the main graph are allowed. */
    198     virtual std::unique_ptr<BlockBuilder> GetBlockBuilder() noexcept = 0;
    199+    /** Get the last chunk in the main graph, i.e., the last chunk that would be returned by a
    


    instagibbs commented at 2:52 pm on March 25, 2025:
    should say that the FeePerWeight is the cfr, and that it cannot be called on an oversized graph

    sipa commented at 7:56 pm on March 27, 2025:
    Done.
  86. in src/txgraph.cpp:297 in 35beed7245 outdated
    292@@ -292,6 +293,8 @@ class TxGraphImpl final : public TxGraph
    293     ChunkIndex m_chunkindex;
    294     /** Number of index-observing objects in existence (BlockBuilderImpl). */
    295     size_t m_chunkindex_observers{0};
    296+    /** Cache of discarded ChunkIndex node handles. */
    


    instagibbs commented at 3:04 pm on March 25, 2025:
    0    /** Cache of discarded ChunkIndex node handles to re-use, avoiding additional allocations. */
    

    sipa commented at 8:00 pm on March 27, 2025:
    Done.
  87. glozow referenced this in commit f1d129d963 on Mar 26, 2025
  88. sipa force-pushed on Mar 27, 2025
  89. sipa commented at 3:49 am on March 27, 2025: member
    Rebased after merge of #31363, and on top of #32151.
  90. in src/txgraph.h:170 in 899ce12b59 outdated
    161@@ -162,6 +162,10 @@ class TxGraph
    162      *  main clusters are counted. Refs that do not exist in the queried graph are ignored. The
    163      *  queried graph must not be oversized. */
    164     virtual GraphIndex CountDistinctClusters(std::span<const Ref* const>, bool main_only = false) noexcept = 0;
    165+    /** Get feerate diagrams for both main and staging (which must both exist and not be
    166+     *  oversized), ignoring unmodified components in both. Use FeeFrac rather than FeePerWeight
    167+     *  so CompareChunks is usable without type-conversion. */
    168+    virtual std::pair<std::vector<FeeFrac>, std::vector<FeeFrac>> GetMainStagingDiagrams() noexcept = 0;
    


    ajtowns commented at 1:32 pm on March 27, 2025:

    Not sure if it’s worth it, but I think you could return vector<FeePerWeight> if you also had:

    0template <typename T>
    1requires std::is_base_of_v<FeeFrac, T> && (sizeof(FeeFrac) == sizeof(T))
    2std::partial_ordering CompareChunks(std::span<const T> chunks0, std::span<const T> chunks1)
    3{
    4    return CompareChunks(std::span<const FeeFrac>(reinterpret_cast<const FeeFrac*>(chunks0.data()), chunks0.size()), std::span<const FeeFrac>(reinterpret_cast<const FeeFrac*>(chunks1.data()), chunks1.size()));
    5}
    

    sipa commented at 7:56 pm on March 27, 2025:
    I considered something like this, but it complicates matters for the fuzz test, which reconstructs diagram directly using DepGraph code, which produces FeeFrac ones.

    ajtowns commented at 10:48 pm on March 28, 2025:
    Yeah, guess that pushes it too far into the “not worth it”
  91. in src/txgraph.cpp:2090 in 899ce12b59 outdated
    1931+            cluster->AppendChunkFeerates(staging_feerates);
    1932+        }
    1933+    }
    1934+    // Sort both by decreasing feerate to obtain diagrams, and return them.
    1935+    std::sort(main_feerates.begin(), main_feerates.end(), [](auto& a, auto& b) { return a >> b; });
    1936+    std::sort(staging_feerates.begin(), staging_feerates.end(), [](auto& a, auto& b) { return a >> b; });
    


    ajtowns commented at 1:41 pm on March 27, 2025:
    I wonder if getting chunks in feerate order across multiple clusters might be something common/important enough to be worth doing more directly, rather than just getting them all then sorting them after the fact? I suppose I’d just be suggesting using a heap of affected clusters, which isn’t that different to just sorting by chunk, so I guess it’s fine.

    sipa commented at 7:59 pm on March 27, 2025:
    That’s exactly what BlockBuilder does? This is test code that simulates the behavior of that (and, also works for staging, which has no index).

    ajtowns commented at 10:59 pm on March 28, 2025:
    Hmm. I didn’t understand how BlockBuilder was implemented when I wrote that; BlockBuilder maintains an ordered tree of every main chunk, so is O(num_chunks) in size/performance, which is fine. What I had in mind conceptually about cluster mempool is that you’d have a structure that was O(num_clusters) that you could query for the the best/worst chunk. That would be O(n log(n)) setup, then O(log(n)) for the equivalent of a “Next()” call, for n=num_clusters. But that’s not superior to how BlockBuilder works (and is probably inferior), so, doesn’t sound like there’s anything worth doing here.

    sipa commented at 1:38 am on March 29, 2025:
    Yeah, I think I considered an on-the-fly heap at some point, and it’s probable that that actually has overall lower runtime costs than this (due do better memory locality and less allocation overhead than a tree-structured set), but being to do the bulk of the work ahead of time is nice.

    ajtowns commented at 5:05 am on March 29, 2025:

    Thinking some more about this… I guess the advantage of the on-the-fly heap would be that you may never have to calculate the ordering for some chunks in the middle of the mempool (if they get rbf’ed, or conflicted by a block tx, or if they expire, or if they get cpfp’ed and thus brought to the front of the mempool, rather than just waiting until the best block feerate drops to their level, or you stop and restart your bitcoind before they get mined). Probably few enough txs hit those special cases to be much of a saving overall though

    Another thought: consider the “virtual Cluster class” idea, but instead of special casing each standalone tx into a custom single-tx-Cluster, you could have a single cluster for all the standalone txs. That would then just order those txs by feerate, but it would mean that instead of having to maintain a separate chunk/cluster data structure for each of those txs, you only maintain a single cluster data structure that covers all of them. That approach messes up the model for diagram comparisons that this PR introduces (because suddenly the diagram comparisons pull in unrelated singleton txs whenever any one of them is touched), but maybe that could be fixed reasonably.


    sipa commented at 1:00 pm on March 29, 2025:

    I guess the advantage of the on-the-fly heap would be that you may never have to calculate the ordering for some chunks in the middle of the mempool

    This is possible even with the existing index-based structure, by allowing chunk feerates to be indeterminate, and index them by a known lower and upper bound. That would necessitate two precomputed indexes (one for block building, sorted by the upper bound; one for eviction, sorted by the lower bound), and relinearizing whatever cluster was found on the fly during block building eviction. If we ever do the absorption set idea, we’d also need two indexes, so that’s not necessarily a very high cost.

    But I’m not sure any of this is worth it, because it doesn’t really improve the worst case, where you can have successive indidividual transactions that each interact with many of the same clusters (e.g. conflict with a transaction in many of them), requiring relinearizing all of them anyway.


    Another thought: consider the “virtual Cluster class” idea, but instead of special casing each standalone tx into a custom single-tx-Cluster, you could have a single cluster for all the standalone txs.

    A downside here is that the “linearization” for this fat singletons cluster would be O(n log n) to update on every operation involving a singleton, which may be substantial if there are 1000s of singletons?

    A variation on this idea though: add a new QualityLevel::SINGLETON, and allow many singleton clusters at the same time, each individually limited to 64 (or whatever) singleton transactions. This might not even need the virtual class idea.


    ajtowns commented at 8:53 pm on March 30, 2025:

    A downside here is that the “linearization” for this fat singletons cluster would be O(n log n) to update on every operation involving a singleton, which may be substantial if there are 1000s of singletons?

    I don’t think that’s true? You’d just store the linearisation as part of the special cluster implementation as a std::set by feerate order, for O(lg n) modification and O(n) reporting? I guess it would be better to call that class something other than “cluster”, since you’d probably want GetCluster(singleton) to just return the singleton, and CountDistinctClusters({singleton1, singleton2}) to return 2, etc.

    A variation on this idea though: add a new QualityLevel::SINGLETON, and allow many singleton clusters at the same time, each individually limited to 64 (or whatever) singleton transactions. This might not even need the virtual class idea.

    Hmm, maybe – you could find a new child tx would spend 63 txs from 63 different singleton clusters, but I don’t know if there’s anything about that that would be worse than the number of clusters you could impact by conflicting with pre-existing txs; and 63*64 is only 4032 which doesn’t sound very scary… Big mempools seem like they might contain up to 175k txs (300MB limit; or 350k txs unlimited), if they were all singletons, dividing by 64 gives just 2700 (or 5500) clusters, which seems nice. (I guess you’d need something to keep track of which existing singleton clusters aren’t full and can be added to)

  92. in src/test/fuzz/txgraph.cpp:859 in ef68aea6fd outdated
    832+        std::reverse(chunk.begin(), chunk.end());
    833+        auto [worst_chunk, worst_chunk_feerate] = real->GetWorstMainChunk();
    834+        assert(chunk == worst_chunk);
    835+        assert(chunk_feerate == worst_chunk_feerate);
    836+
    837+        // Check that the implied ordering gives rise to a combined diagram that matches the
    


    instagibbs commented at 1:43 pm on March 27, 2025:

    Took me a second to understand how get_diagram_fn related to the real world, is this correct? The ordering isn’t implied, exactly.

    0        // Check that the real ordering gives rise to a combined diagram that matches the
    

    sipa commented at 7:52 pm on March 27, 2025:
    The implied ordering is the one that was reconstructed above, through CompareMainOrder. I’ve changed the variable names to make this hopefully clearer.
  93. in src/test/fuzz/txgraph.cpp:826 in ef68aea6fd outdated
    834+        assert(chunk == worst_chunk);
    835+        assert(chunk_feerate == worst_chunk_feerate);
    836+
    837+        // Check that the implied ordering gives rise to a combined diagram that matches the
    838+        // diagram constructed from the individual cluster linearization chunkings.
    839+        auto main_diagram = get_diagram_fn(true);
    


    instagibbs commented at 1:45 pm on March 27, 2025:
    0        auto main_full_diagram = get_diagram_fn(/*main_only=*/true);
    

    sipa commented at 8:00 pm on March 27, 2025:
    I’ve renamed all the variables to be of the form {main,staging}_{real,implied,cmp}_diagram.
  94. in src/test/fuzz/txgraph.cpp:856 in ef68aea6fd outdated
    864+                                main_cmp_diagram.begin(), main_cmp_diagram.end(),
    865+                                std::inserter(missing_main_cmp, missing_main_cmp.end()),
    866+                                std::greater{});
    867+            assert(main_cmp_diagram.size() + missing_main_cmp.size() == main_diagram.size());
    868+            // Do the same for chunks in stage_diagram missign from stage_cmp_diagram.
    869+            auto stage_diagram = get_diagram_fn(false);
    


    instagibbs commented at 1:45 pm on March 27, 2025:
    0            auto stage_full_diagram = get_diagram_fn(/*main_only=*/false);
    

    the argument could also just take an explicit level/staging arg since they’re fully determined at each call site for now


    sipa commented at 8:00 pm on March 27, 2025:
    I’ve renamed all the variables to be of the form {main,staging}_{real,implied,cmp}_diagram.
  95. in src/txgraph.cpp:271 in 3b5e027585 outdated
    266+    /** Comparator for ChunkData objects in mining order. */
    267+    class ChunkOrder
    268+    {
    269+        const TxGraphImpl* const m_graph;
    270+    public:
    271+        explicit ChunkOrder(const TxGraphImpl* graph) : m_graph(graph) {}
    


    ajtowns commented at 1:52 pm on March 27, 2025:
    Would it make sense to have , int level=MAIN_LEVEL as a default argument so this could be used for (items changed in) staging as well? Otherwise, should the class name or docstring reflect it’s for main only?

    sipa commented at 7:53 pm on March 27, 2025:
    Given that there is only an m_main_lin_order (for now, and no prospective reason why we’d need one in staging), that does not work.
  96. in src/test/fuzz/txgraph.cpp:855 in ef68aea6fd outdated
    863+            std::set_difference(main_diagram.begin(), main_diagram.end(),
    864+                                main_cmp_diagram.begin(), main_cmp_diagram.end(),
    865+                                std::inserter(missing_main_cmp, missing_main_cmp.end()),
    866+                                std::greater{});
    867+            assert(main_cmp_diagram.size() + missing_main_cmp.size() == main_diagram.size());
    868+            // Do the same for chunks in stage_diagram missign from stage_cmp_diagram.
    


    instagibbs commented at 1:56 pm on March 27, 2025:
    0            // Do the same for chunks in stage_diagram missing from stage_cmp_diagram.
    

    sipa commented at 7:59 pm on March 27, 2025:
    Fixed.
  97. sipa force-pushed on Mar 27, 2025
  98. in src/txgraph.h:188 in bf7b901627 outdated
    183+        /** Determine whether there are more transactions to be included. */
    184+        explicit operator bool() noexcept { return m_current_chunk.has_value(); }
    185+        /** Get the chunk that is currently suggested to be included. */
    186+        const std::span<Ref*>& GetCurrentChunk() noexcept { return m_current_chunk->first; }
    187+        /** Get the feerate of the currently suggested chunk. */
    188+        const FeePerWeight& GetCurrentChunkFeerate() noexcept { return m_current_chunk->second; }
    


    ajtowns commented at 4:00 pm on March 27, 2025:

    I’m a bit surprised to see m_current_chunk as a persistent-ish variable that includes Ref* – that makes it a bug to move Refs around while you have a BlockBuilder in existence. I would have thought something more like:

     0    struct NextChunk { std::vector<Ref*> refs; FeePerWeight feerate; }
     1    optional<NextChunk> GetNextChunk();
     2    void Include(); void Skip();
     3    
     4    ...
     5
     6    while (auto x = bb.GetNextChunk(); x) {
     7        if (x->feerate.size > remaining) { bb.Skip(); continue; }
     8        bb.Include();
     9        mine(x->refs);
    10    }
    

    It’s still a bug to move Refs around while you have a NextChunk, but at least that’s kind-of shorter-lived than a BlockBuilder. I think that would mostly just mean moving the Next() calls into NextChunk().


    sipa commented at 7:58 pm on March 27, 2025:

    I’ve done something like this, but a bit different, as I felt it’d be ugly for the semantics of repeated GetNextChunk calls without Skip/Include in between.

    The interface is now:

    0        /** Get the chunk that is currently suggested to be included, plus its feerate, if any. */
    1        virtual std::optional<std::pair<std::vector<Ref*>, FeePerWeight>> GetCurrentChunk() noexcept = 0;
    2        /** Mark the current chunk as included, and progress to the next one. */
    3        virtual void Include() noexcept = 0;
    4        /** Mark the current chunk as skipped, and progress to the next one. Further chunks from
    5         *  the same cluster as the current one will not be reported anymore. */
    6        virtual void Skip() noexcept = 0;
    

    Where GetCurrentChunk may be called multiple times, without (observable) effect on the builder.

  99. in src/txgraph.cpp:296 in bf7b901627 outdated
    289@@ -289,6 +290,8 @@ class TxGraphImpl final : public TxGraph
    290 
    291     /** Index of ChunkData objects. */
    292     ChunkIndex m_chunkindex;
    293+    /** Number of index-observing objects in existence (BlockBuilderImpl). */
    294+    size_t m_chunkindex_observers{0};
    


    ajtowns commented at 4:13 pm on March 27, 2025:

    Perhaps this stuff could be a little bit more RAII-ish? Something like:

    0class RefsVector : public std::vector<Ref*>
    1{
    2    TxGraph& m_graph;
    3public:
    4    template<typename... T>
    5    RefsView(TxGraph& graph, T&&... args) : std::vector<Ref*>{args...}, m_graph{graph} { ++m_graph.m_chunkindex_observers; }
    6    ~RefsView() { --m_graph.m_chunkindex_observers; }
    7};
    

    Might be good to introduce m_chunkindex_observers in its own commit? Arguably having a vector of Ref’s (making it invalid to do Ref b = std::move(a);) is a different constraint from being unable to change the linearisation at the main level (which would mess up BlockBuilder); could potentially have distinct “observer” values for each, which would allow fees to be bumped, when you just have a Ref vec, eg.


    sipa commented at 7:54 pm on March 27, 2025:
    It’s a possibility, but you mean this in addition to BlockBuildImpl itself also incrementing the observers count, right? Because it holds an iterator to the chunk index in GraphImpl too.

    sipa commented at 2:33 pm on March 28, 2025:
    I have moved the introduction of m_chunkindex_observers to its own commit.

    ajtowns commented at 11:02 pm on March 28, 2025:

    It’s a possibility, but you mean this in addition to BlockBuildImpl itself also incrementing the observers count, right? Because it holds an iterator to the chunk index in GraphImpl too.

    Yes. (though, technically, if BlockBuildImpl stores its own RefsVector it could avoid explicitly incrementing the observers count, because RefsVector does that for it)

  100. instagibbs commented at 4:31 pm on March 27, 2025: member
    just a few comments after first read through
  101. in src/txgraph.cpp:437 in 554888cbb6 outdated
    433@@ -375,6 +434,7 @@ class TxGraphImpl final : public TxGraph
    434     {
    435         auto& entry = m_entries[idx];
    436         Assume(entry.m_ref != nullptr);
    437+        Assume(m_chunkindex_observers == 0 || !entry.m_locator[0].IsPresent());
    


    ajtowns commented at 4:45 pm on March 27, 2025:
    Should check observers==0 in UpdateRef too?

    sipa commented at 2:26 pm on March 28, 2025:
    This is no longer needed now that BlockBuilderImpl no longer holds on to a vector of Ref*, I think.
  102. ajtowns commented at 4:46 pm on March 27, 2025: contributor
    Looks good
  103. sipa force-pushed on Mar 27, 2025
  104. sipa force-pushed on Mar 28, 2025
  105. in src/test/fuzz/txgraph.cpp:337 in f4c8848c7c outdated
    301+            auto ref = sim.GetRef(i);
    302+            clusters.insert(real->GetCluster(*ref, main_only));
    303+        }
    304+        // Compute the chunkings of each (deduplicated) cluster.
    305+        size_t num_tx{0};
    306+        std::vector<FeeFrac> ret;
    


    instagibbs commented at 2:00 pm on March 28, 2025:

    micro-nit

    0        std::vector<FeeFrac> chunk_feerates;
    

    sipa commented at 9:03 pm on March 28, 2025:
    Done.
  106. in src/test/fuzz/txgraph.cpp:351 in f4c8848c7c outdated
    315+        }
    316+        // Verify the number of transactions after deduplicating clusters. This implicitly verifies
    317+        // that GetCluster on each element of a cluster reports the cluster transactions in the same
    318+        // order.
    319+        assert(num_tx == sim.GetTransactionCount());
    320+        // Sort by feerate (we don't care about respecting ordering within clusters, as these are
    


    instagibbs commented at 2:02 pm on March 28, 2025:
    We don’t care or it will be respected within clusters due to CFR?

    sipa commented at 6:48 pm on March 28, 2025:
    As far as diagram is concerned (incl. for FeeRateCompare), the order of equal-feerate element is irrelevant, as they just represent different sections of an overall line segment with a fixed slope. As far as FeeRateCompare is concerned, they could even be merged into a single line segment with the combined fee and size.

    instagibbs commented at 6:54 pm on March 28, 2025:
    Oh, I misread this, but I’m still having trouble understanding what is intended. “Sort by feerate” … “as these are just feerates”.

    sipa commented at 6:59 pm on March 28, 2025:

    Maybe I’m making things more confusing by mentioning it at all?

    My thinking is this: in general, to construct the full-graph chunk ordering, you cannot just sort by feerate, because you may sort two equal-feerate chunks within the same cluster incorrectly w.r.t. each other, breaking topology. However, here we are just construct a diagram, and don’t care about what chunks they correspond to, so as far as the feerates of line segments in the diagram is concerned, we can just ignore this problem.


    instagibbs commented at 2:39 pm on March 31, 2025:

    Yeah I think no comment at all is better than as-is, but maybe we can just point exactly to what case this is ignoring as safe?

    “Sort by feerate only, since violating topological constraints within same-feerate chunks won’t affect diagram comparisons.”

  107. sipa force-pushed on Mar 28, 2025
  108. sipa commented at 2:35 pm on March 28, 2025: member

    Changes:

    • BlockBuilder interface was changed, it now just has:
      • std::optional<std::pair<std::vector<Ref*, FeePerWeight>>> GetCurrentChunk()
      • void Include()
      • void Skip() Which avoids the need for it to store Ref* internally.
    • Improved GetMainStagingDiagrams testing, to verify that unmodified clusters are indeed not included in the diagrams.
    • Split off the introduction of the chunk index observers to a separate commit.
    • Split off the generalization of GetClusterRefs to support subsections of a Cluster to a separate commit.
  109. in src/test/fuzz/txgraph.cpp:908 in 50f173e0b3 outdated
    761+            // The missing part must include at least all transactions in staging which have not been
    762+            // modified, or been in a cluster together with modified transactions, since they were
    763+            // copied from main. Note that due to the reordering of removals w.r.t. dependency
    764+            // additions, it is possible that the real implementation found more unaffected things.
    765+            FeeFrac missing_real;
    766+            for (const auto& feerate : missing_main_cmp) missing_real += feerate;
    


    instagibbs commented at 2:51 pm on March 28, 2025:
    micro-nit: I know they’re identical, but having the list being built via stage_real_diagram seems easier to digest what’s being computed

    sipa commented at 7:03 pm on March 28, 2025:
    Can you elaborate, or suggest code?

    instagibbs commented at 2:39 pm on March 31, 2025:
    not worth the nit, mark as resolved please
  110. in src/txgraph.cpp:260 in 1d500a50b6 outdated
    251+    struct ChunkData
    252+    {
    253+        /** The Entry which is the last transaction of the chunk. */
    254+        mutable GraphIndex m_graph_index;
    255+        /** How many transactions the chunk contains. */
    256+        LinearizationIndex m_chunk_count;
    


    instagibbs commented at 3:26 pm on March 28, 2025:
    1d500a50b698a638c840ca79504c2a9a421e8756 stretch goal, have SanityCheck check m_chunk_count directly?

    sipa commented at 9:02 pm on March 28, 2025:
    Done.
  111. in src/txgraph.cpp:349 in 1d500a50b6 outdated
    345@@ -303,6 +346,8 @@ class TxGraphImpl final : public TxGraph
    346     {
    347         /** Pointer to the corresponding Ref object if any, or nullptr if unlinked. */
    348         Ref* m_ref{nullptr};
    349+        /** Iterator to the corresponding ChunkData, if any. */
    


    instagibbs commented at 3:39 pm on March 28, 2025:
    nit: Could we note that this is initialized first in AddTransaction?

    sipa commented at 9:02 pm on March 28, 2025:
    Done.
  112. in src/txgraph.cpp:291 in 1d500a50b6 outdated
    286+
    287+    /** Definition for the mining index type. */
    288+    using ChunkIndex = std::set<ChunkData, ChunkOrder>;
    289+
    290+    /** Index of ChunkData objects. */
    291+    ChunkIndex m_chunkindex;
    


    instagibbs commented at 3:43 pm on March 28, 2025:
    maybe overkill but m_main_chunkindex since there’s a lot of level==0 checks everywhere gating access/modification

    sipa commented at 9:03 pm on March 28, 2025:
    Done.
  113. in src/txgraph.cpp:656 in 1d500a50b6 outdated
    579@@ -524,17 +580,25 @@ void Cluster::Updated(TxGraphImpl& graph) noexcept
    580         // Iterate over the chunks.
    581         for (unsigned chunk_idx = 0; chunk_idx < chunking.NumChunksLeft(); ++chunk_idx) {
    582             auto chunk = chunking.GetChunk(chunk_idx);
    583-            Assume(chunk.transactions.Any());
    584+            auto chunk_count = chunk.transactions.Count();
    


    instagibbs commented at 3:56 pm on March 28, 2025:
    nit: const chunk_count

    sipa commented at 9:00 pm on March 28, 2025:
    No, it’s being assigned to below.
  114. in src/txgraph.cpp:151 in b130e185f5 outdated
    147@@ -148,7 +148,7 @@ class Cluster
    148      *  union of their descendants to output. */
    149     void GetDescendantRefs(const TxGraphImpl& graph, std::span<std::pair<Cluster*, DepGraphIndex>>& args, std::vector<TxGraph::Ref*>& output) noexcept;
    150     /** Get a vector of Refs for all elements of this Cluster, in linearization order. */
    151-    std::vector<TxGraph::Ref*> GetClusterRefs(const TxGraphImpl& graph) noexcept;
    152+    void GetClusterRefs(TxGraphImpl& graph, std::span<TxGraph::Ref*> range, LinearizationIndex start_pos) noexcept;
    


    instagibbs commented at 4:28 pm on March 28, 2025:

    b130e185f54c7f762da6ef435882335e4f0b98ff

    Could the documentation get beefed up a bit for this? What the shape that range should look like, etc. Was only clear later when seeing usages.


    sipa commented at 9:04 pm on March 28, 2025:
    Done.
  115. in src/txgraph.h:191 in 545812c3a3 outdated
    186+         *  the same cluster as the current one will not be reported anymore. */
    187+        virtual void Skip() noexcept = 0;
    188+    };
    189+
    190+    /** Construct a block builder, drawing chunks in order, from the main graph, which cannot be
    191+     *  oversized. While the returned object exists, no mutators on the main graph are allowed. */
    


    instagibbs commented at 4:57 pm on March 28, 2025:
    Should we make sure that the builder outlives the graph? Currently it’s allowed, or at least doesn’t fail if destructors are swapped in the fuzz target.

    sipa commented at 9:05 pm on March 28, 2025:
    That’s strange, because the builder outliving the graph is definitely a problem (the observer counter in the graph, which has been destroyed, will be subtracted from). I don’t see an easy way of allowing this (it’d need the graph to maintain a set of observers, which seems overkill).

    instagibbs commented at 5:57 pm on March 29, 2025:
    Sorry, that the builder doesn’t outlive the graph. I was testing to see what happens when it does outlive it, and it’s not crashing currently.
  116. in src/test/fuzz/txgraph.cpp:801 in 545812c3a3 outdated
    797@@ -715,6 +798,27 @@ FUZZ_TARGET(txgraph)
    798             }
    799         }
    800 
    801+        // The same order should be obtained through a BlockBuilder, if nothing is skipped.
    


    instagibbs commented at 5:05 pm on March 28, 2025:
    0        // The same order should be obtained through a BlockBuilder as implied by CompareMainOrder, if nothing is skipped.
    

    sipa commented at 9:05 pm on March 28, 2025:
    Done.
  117. in src/txgraph.cpp:557 in 545812c3a3 outdated
    540+    /** Which TxGraphImpl this object is doing block building for. It will have its
    541+     *  m_chunkindex_observers incremented as long as this BlockBuilderImpl exists. */
    542+    TxGraphImpl* const m_graph;
    543+    /** Clusters which we're not including further transactions from. */
    544+    std::set<Cluster*> m_excluded_clusters;
    545+    /** Iterator to the current chunk (after the current one) in the chunk index. end() if nothing
    


    instagibbs commented at 5:07 pm on March 28, 2025:
    0    /** Iterator to the current chunk (after the current one) in the chunk index. m_chunkindex.end() if nothing
    

    sipa commented at 8:59 pm on March 28, 2025:
    Done.
  118. in src/txgraph.h:186 in 545812c3a3 outdated
    181+        /** Get the chunk that is currently suggested to be included, plus its feerate, if any. */
    182+        virtual std::optional<std::pair<std::vector<Ref*>, FeePerWeight>> GetCurrentChunk() noexcept = 0;
    183+        /** Mark the current chunk as included, and progress to the next one. */
    184+        virtual void Include() noexcept = 0;
    185+        /** Mark the current chunk as skipped, and progress to the next one. Further chunks from
    186+         *  the same cluster as the current one will not be reported anymore. */
    


    instagibbs commented at 5:12 pm on March 28, 2025:
    Note that this only should be called if GetCurrentChunk() would return something non-nullopt

    sipa commented at 9:02 pm on March 28, 2025:
    Same here, made it have no effect in that case.

    instagibbs commented at 4:13 pm on March 31, 2025:
    both cases could use fuzz coverage :pray:
  119. in src/txgraph.h:183 in 545812c3a3 outdated
    178+    public:
    179+        /** Support safe inheritance. */
    180+        virtual ~BlockBuilder() = default;
    181+        /** Get the chunk that is currently suggested to be included, plus its feerate, if any. */
    182+        virtual std::optional<std::pair<std::vector<Ref*>, FeePerWeight>> GetCurrentChunk() noexcept = 0;
    183+        /** Mark the current chunk as included, and progress to the next one. */
    


    instagibbs commented at 5:13 pm on March 28, 2025:
    Note that this only should be called if GetCurrentChunk() would return something non-nullopt

    sipa commented at 9:01 pm on March 28, 2025:
    I’ve changed the code so that Include() and Skip() are no-ops once the end is reached.
  120. in src/txgraph.h:196 in d8483d8290 outdated
    189@@ -190,6 +190,11 @@ class TxGraph
    190     /** Construct a block builder, drawing chunks in order, from the main graph, which cannot be
    191      *  oversized. While the returned object exists, no mutators on the main graph are allowed. */
    192     virtual std::unique_ptr<BlockBuilder> GetBlockBuilder() noexcept = 0;
    193+    /** Get the last chunk in the main graph, i.e., the last chunk that would be returned by a
    194+     *  BlockBuilder created now, together with its feerate. The chunk is returned in
    195+     *  reverse-topological order, so every element is preceded by all its descendants. The main
    196+     *  graph must not be oversized. If the graph is empty, {} is returned. */
    


    instagibbs commented at 5:19 pm on March 28, 2025:
    a {} and empty FeePerWeight pair?

    sipa commented at 9:06 pm on March 28, 2025:
    Done.
  121. in src/test/fuzz/txgraph.cpp:753 in d8483d8290 outdated
    748+                // GetWorstMainChunk.
    749+                auto [worst_chunk, worst_chunk_feerate] = real->GetWorstMainChunk();
    750+                // Just do some sanity checks here. Consistency with GetBlockBuilder is checked
    751+                // below.
    752+                if (main_sim.GetTransactionCount() == 0) {
    753+                    assert(worst_chunk.empty());
    


    instagibbs commented at 5:19 pm on March 28, 2025:
    check worst_chunk_feerate too?

    sipa commented at 8:59 pm on March 28, 2025:
    Done.
  122. in src/test/fuzz/txgraph.cpp:853 in d8483d8290 outdated
    846+            last_chunk_feerate = chunk->second;
    847             builder->Include();
    848         }
    849         assert(vec_builder == vec1);
    850 
    851+        // The last chunk returned by the BlockBuilder must match GetWorstMainChunk, in reverse.
    


    instagibbs commented at 5:32 pm on March 28, 2025:
    Love to see it
  123. in src/txgraph.cpp:152 in a0e19959e8 outdated
    147@@ -148,8 +148,9 @@ class Cluster
    148     /** Process elements from the front of args that apply to this cluster, and append Refs for the
    149      *  union of their descendants to output. */
    150     void GetDescendantRefs(const TxGraphImpl& graph, std::span<std::pair<Cluster*, DepGraphIndex>>& args, std::vector<TxGraph::Ref*>& output) noexcept;
    151-    /** Get a vector of Refs for all elements of this Cluster, in linearization order. */
    152-    void GetClusterRefs(TxGraphImpl& graph, std::span<TxGraph::Ref*> range, LinearizationIndex start_pos) noexcept;
    153+    /** Get a vector of Refs for all elements of this Cluster, in linearization order. Returns
    154+     *  the range ends at the end of the cluster. */
    


    instagibbs commented at 5:55 pm on March 28, 2025:
    0    * true if the range ends at the end of the cluster. */
    

    sipa commented at 9:02 pm on March 28, 2025:
    This has been rewritten.
  124. in src/txgraph.cpp:1690 in a0e19959e8 outdated
    1676@@ -1675,6 +1677,8 @@ void Cluster::GetClusterRefs(TxGraphImpl& graph, std::span<TxGraph::Ref*> range,
    1677         Assume(entry.m_ref != nullptr);
    1678         ref = entry.m_ref;
    1679     }
    1680+    // Return whether this was the end of the Cluster.
    1681+    return start_pos == m_linearization.size();
    


    instagibbs commented at 6:07 pm on March 28, 2025:
    nit: fwiw forgot start_pos was actually end_pos by this point, maybe this function should have cur_pos?

    sipa commented at 9:01 pm on March 28, 2025:

    I don’t like that because it means either duplicating the variable, or naming the argument different in definition & declaration.

    I’ve changed the comment a bit, though.


    instagibbs commented at 5:58 pm on March 29, 2025:
    feel free to resolve
  125. in src/txgraph.cpp:151 in a0e19959e8 outdated
    147@@ -148,8 +148,9 @@ class Cluster
    148     /** Process elements from the front of args that apply to this cluster, and append Refs for the
    149      *  union of their descendants to output. */
    150     void GetDescendantRefs(const TxGraphImpl& graph, std::span<std::pair<Cluster*, DepGraphIndex>>& args, std::vector<TxGraph::Ref*>& output) noexcept;
    151-    /** Get a vector of Refs for all elements of this Cluster, in linearization order. */
    152-    void GetClusterRefs(TxGraphImpl& graph, std::span<TxGraph::Ref*> range, LinearizationIndex start_pos) noexcept;
    153+    /** Get a vector of Refs for all elements of this Cluster, in linearization order. Returns
    


    instagibbs commented at 6:18 pm on March 28, 2025:

    re: commit message for a0e19959e85b2ce1a90902ab1e3fe274f9d63a1d

    Might just stress that this makes m_excluded_cluster a lot smaller since this takes the size from O(number cluster) to O(number non-singleton clusters).


    sipa commented at 9:02 pm on March 28, 2025:
    Done, but elsewhere.
  126. in src/txgraph.cpp:2355 in 0630995fee outdated
    2343-            Assume(cluster == m_cur_cluster);
    2344+            ret->first.resize(chunk_data.m_chunk_count);
    2345+            auto start_pos = chunk_end_entry.m_main_lin_index + 1 - chunk_data.m_chunk_count;
    2346+            bool is_end = cluster->GetClusterRefs(*m_graph, ret->first, start_pos);
    2347+            if (is_end) {
    2348+                m_cur_cluster = nullptr;
    


    instagibbs commented at 6:23 pm on March 28, 2025:
    could Assume() here that chunk_data.m_chunk_count > 1, since it would have been given the special value otherwise?

    sipa commented at 9:01 pm on March 28, 2025:
    Done, plus a comment.
  127. instagibbs commented at 6:31 pm on March 28, 2025: member

    Looking good 0630995fee22990402771547be1480b8706c76ce

    No real show-stoppers except for the diagram check fuzz test conceptual issue I’m still having.

  128. txgraph: Add GetMainStagingDiagrams function (feature)
    This allows determining whether the changes in a staging diagram unambiguously improve
    the graph, through CompareChunks().
    dac4d915a4
  129. txgraph: Maintain chunk index (preparation)
    This is preparation for exposing mining and eviction functionality in
    TxGraph.
    1a9fb7fd5a
  130. txgraph: Introduce TxGraphImpl observer tracking (preparation)
    This is preparation for a next commit which will introduce a class whose
    objects hold references to internals in TxGraphImpl, which disallows
    modifications to the graph while such objects exist.
    9bfd68d667
  131. txgraph: Generalize GetClusterRefs to support subsections (preparation)
    This is preparation for a next commit which will need a way to extract Refs
    for just individual chunks from a cluster.
    e610a1df3a
  132. txgraph: Introduce BlockBuilder interface (feature)
    This interface lets one iterate efficiently over the chunks of the main
    graph in a TxGraph, in the same order as CompareMainOrder. Each chunk
    can be marked as "included" or "skipped" (and in the latter case,
    dependent chunks will be skipped).
    f01ce7f2c9
  133. txgraph: Introduce TxGraph::GetWorstMainChunk (feature)
    It returns the last chunk that would be suggested for mining by BlockBuilder
    objects. This is intended for eviction.
    eb10f3a370
  134. txgraph: Reuse discarded chunkindex entries (optimization) 33f97b568f
  135. txgraph: Skipping end of cluster has no impact (optimization) 74bb8e3e21
  136. txgraph: Special-case singletons in chunk index (optimization) d8e4b92fb4
  137. sipa force-pushed on Mar 28, 2025
  138. in src/txgraph.h:165 in dac4d915a4 outdated
    161@@ -162,6 +162,10 @@ class TxGraph
    162      *  main clusters are counted. Refs that do not exist in the queried graph are ignored. The
    163      *  queried graph must not be oversized. */
    164     virtual GraphIndex CountDistinctClusters(std::span<const Ref* const>, bool main_only = false) noexcept = 0;
    165+    /** Get feerate diagrams for both main and staging (which must both exist and not be
    


    instagibbs commented at 12:57 pm on March 31, 2025:
    Actually we need to define exactly what a feerate diagram is here given it could mean a couple things. i.e., it’s not the aggregated form, but a series of monotonically decreasing in feerate chunks

    instagibbs commented at 12:58 pm on March 31, 2025:
    0    /** Get feerate diagrams for both main and staging respectively (which must both exist and not be
    

    or be even more pedantic about ordering

  139. in src/txgraph.cpp:583 in 1a9fb7fd5a outdated
    581@@ -524,17 +582,25 @@ void Cluster::Updated(TxGraphImpl& graph) noexcept
    582         // Iterate over the chunks.
    583         for (unsigned chunk_idx = 0; chunk_idx < chunking.NumChunksLeft(); ++chunk_idx) {
    


    instagibbs commented at 1:27 pm on March 31, 2025:
    hindsight nit: chunking could be const to make it clear loop end condition isn’t being modified
  140. in src/txgraph.cpp:2302 in f01ce7f2c9 outdated
    2297+        ret.emplace();
    2298+        const auto& chunk_data = *m_cur_iter;
    2299+        const auto& chunk_end_entry = m_graph->m_entries[chunk_data.m_graph_index];
    2300+        ret->first.resize(chunk_data.m_chunk_count);
    2301+        auto start_pos = chunk_end_entry.m_main_lin_index + 1 - chunk_data.m_chunk_count;
    2302+        m_cur_cluster->GetClusterRefs(*m_graph, ret->first, start_pos);
    


    instagibbs commented at 2:22 pm on March 31, 2025:

    f01ce7f2c9d50a95bb7c694028525c8cc3112a56

    0        Assume(m_cur_cluster);
    1        m_cur_cluster->GetClusterRefs(*m_graph, ret->first, start_pos);
    
  141. instagibbs approved
  142. instagibbs commented at 5:24 pm on March 31, 2025: member

    ACK d8e4b92fb4aea16e18285d25c4d5bba3b3b51c98

    non-blocking comments

    git range-diff master 0630995fee22990402771547be1480b8706c76ce d8e4b92fb4aea16e18285d25c4d5bba3b3b51c98

  143. DrahtBot requested review from ismaelsadeeq on Mar 31, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-31 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me