cluster mempool: merging & postprocessing of linearizations #30285

pull sipa wants to merge 5 commits into bitcoin:master from sipa:202406_clusterlin_meta changing 3 files +653 −16
  1. sipa commented at 2:12 am on June 14, 2024: member

    Part of cluster mempool: #30289

    Depends on #30126, and was split off from it. #28676 depends on this.

    This adds the algorithms for merging & postprocessing linearizations.

    The PostLinearize(depgraph, linearization) function performs an in-place improvement of linearization, using two iterations of the Linearization post-processing algorithm. The first running from back to front, the second from front to back.

    The MergeLinearizations(depgraph, linearization1, linearization2) function computes a new linearization for the provided cluster, given two existing linearizations for that cluster, which is at least as good as both inputs. The algorithm is described at a high level in merging incomparable linearizations.

    For background and references, see Introduction to cluster linearization.

  2. DrahtBot commented at 2:12 am on June 14, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK glozow, instagibbs, sdaftuar

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #28676 ([WIP] Cluster mempool implementation by sdaftuar)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. sipa renamed this:
    Low-level cluster linearization code: merging & postprocessing
    cluster mempool: merging and postprocessing for linearizations
    on Jun 14, 2024
  4. sipa renamed this:
    cluster mempool: merging and postprocessing for linearizations
    cluster mempool: merging & postprocessing of linearizations
    on Jun 14, 2024
  5. sipa added the label Mempool on Jun 14, 2024
  6. sipa force-pushed on Jul 2, 2024
  7. sipa force-pushed on Jul 10, 2024
  8. sipa force-pushed on Jul 10, 2024
  9. DrahtBot added the label CI failed on Jul 10, 2024
  10. DrahtBot commented at 10:20 pm on July 10, 2024: contributor

    🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

    Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    Leave a comment here, if you need help tracking down a confusing failure.

    Debug: https://github.com/bitcoin/bitcoin/runs/27291659212

  11. sipa force-pushed on Jul 11, 2024
  12. sipa force-pushed on Jul 11, 2024
  13. DrahtBot removed the label CI failed on Jul 11, 2024
  14. sipa force-pushed on Jul 19, 2024
  15. sipa force-pushed on Jul 19, 2024
  16. DrahtBot added the label Needs rebase on Jul 26, 2024
  17. sipa force-pushed on Jul 26, 2024
  18. DrahtBot removed the label Needs rebase on Jul 26, 2024
  19. in src/cluster_linearize.h:807 in 8eaca20d60 outdated
    802+    //
    803+    // During an odd pass, the high-level operation is:
    804+    // - Start with an empty list of groups L=[].
    805+    // - For every transaction i in the old linearization, from front to back:
    806+    //   - Append a new group C=[i], containing just i, to the back of L.
    807+    //   - While there is a group P in L immediately preceding C, which has lower feerate than C:
    


    instagibbs commented at 1:44 pm on July 30, 2024:
    0    //   - While there is a group P in L immediately preceding C which has lower feerate than C:
    

    sipa commented at 6:11 pm on August 1, 2024:
    This was rewritten.
  20. in src/cluster_linearize.h:815 in 8eaca20d60 outdated
    785+ * @param[in,out] linearization  On input, an existing linearization for depgraph. On output, a
    786+ *                               potentially better linearization for the same graph.
    787+ *
    788+ * Postlinearization guarantees:
    789+ * - The resulting chunks are connected.
    790+ * - If the input has a tree shape (either all transactions have at most one child, or all
    


    instagibbs commented at 2:10 pm on July 30, 2024:
    intuitively, and from attempting to fuzz this, each pass corresponds to one type of tree shape. Might be good to note this, and if state if there are other motivations as well to do two passes and under what topologies.

    sipa commented at 8:13 pm on August 1, 2024:
    I’ve explained this in more detail (in the comment below, inside the function body).
  21. in src/cluster_linearize.h:834 in 8eaca20d60 outdated
    829+    {
    830+        /** The index of the previous transaction in this group; 0 if this is the first entry of
    831+         *  a group. */
    832+        ClusterIndex prev_tx;
    833+
    834+        // Fields that are only used for transactions that are the last one in a group:
    


    instagibbs commented at 2:23 pm on July 30, 2024:
    next field is first_tx, not last

    sipa commented at 6:40 pm on July 30, 2024:
    The first_tx field (just like the prev_group, group, deps, feerate fields) are only used for transactions which are the last one in a group.

    sipa commented at 1:27 pm on July 31, 2024:
    I have tried to make the comments here a bit clearer.
  22. in src/test/fuzz/cluster_linearize.cpp:789 in 8eaca20d60 outdated
    784+    SanityCheck(depgraph, linearization);
    785+
    786+    // Produce a post-processed version.
    787+    auto post_linearization = linearization;
    788+    PostLinearize(depgraph, post_linearization);
    789+    SanityCheck(depgraph, linearization);
    


    instagibbs commented at 2:27 pm on July 30, 2024:
    sanity checking wrong linearization?

    sipa commented at 10:10 pm on July 30, 2024:
    Indeed, fixed (here and below).
  23. in src/test/fuzz/cluster_linearize.cpp:804 in 8eaca20d60 outdated
    791+    // Compare diagrams: post-linearization cannot worsen anywhere.
    792+    auto old_chunking = ChunkLinearization(depgraph, linearization);
    793+    auto new_chunking = ChunkLinearization(depgraph, post_linearization);
    794+    auto cmp = CompareChunks(new_chunking, old_chunking);
    795+    assert(cmp >= 0);
    796+
    


    instagibbs commented at 2:31 pm on July 30, 2024:
    0
    1    // Run again, things can keep improving (and never get worse)
    2    PostLinearize(depgraph, post_linearization);
    3    SanityCheck(depgraph, post_linearization);
    4    auto newest_chunking = ChunkLinearization(depgraph, post_linearization);
    5    cmp = CompareChunks(newest_chunking, new_chunking);
    6    assert(cmp >= 0);
    7    
    

    sipa commented at 10:11 pm on July 30, 2024:
    Added.
  24. in src/test/fuzz/cluster_linearize.cpp:871 in 8eaca20d60 outdated
    858+
    859+    // Compare diagrams.
    860+    auto old_chunking = ChunkLinearization(depgraph_tree, linearization);
    861+    auto new_chunking = ChunkLinearization(depgraph_tree, post_linearization);
    862+    auto cmp = CompareChunks(new_chunking, old_chunking);
    863+    assert(cmp >= 0);
    


    instagibbs commented at 2:34 pm on July 30, 2024:
    Could test that running PostLinearization again doesn’t do anything

    sipa commented at 10:11 pm on July 30, 2024:
    Done.
  25. in src/test/fuzz/cluster_linearize.cpp:892 in 8eaca20d60 outdated
    870+    assert(cmp_opt == 0);
    871+}
    872+
    873+FUZZ_TARGET(clusterlin_postlinearize_moved_leaf)
    874+{
    875+    // Verify that taking an existing linearization, and moving a leaf to the back, potentially
    


    instagibbs commented at 2:34 pm on July 30, 2024:
    is this specifically for prioritisetransaction or something?

    sipa commented at 10:12 pm on July 30, 2024:
    No, it means that a particular approach for RBF will never worsen single-transaction leaf replacements that do not change the size of a transaction. I’ve added a comment to clarify.

    instagibbs commented at 10:16 pm on July 30, 2024:
    That was my initial thought but the size isn’t changing so I’m unsure how it would map to an RBF?

    sipa commented at 10:19 pm on July 30, 2024:
    “laboratory conditions” I guess. It’s just a nice property that holds; even if the conditions for it don’t exactly hold often in reality, it probably means they’re not far off. I’ll look into whether I can generalize it a bit to support changing size.

    sdaftuar commented at 3:11 pm on July 31, 2024:

    To add on to that – at one point in an earlier draft of #28676, we were seeing examples of RBFs where a single transaction was being replaced with one that had identical size, identical parents, but higher fee, yet due to quirks in how linearization worked the replacement was being rejected because the diagram wasn’t improving.

    We’ve since generalized this concern around accidentally discovering a better linearization for a cluster while processing a potential replacement, and I think we will be addressing this in a more robust way, but it’s nice that PostLinearization solves the specific problem we observed in the wild.

  26. in src/test/fuzz/cluster_linearize.cpp:857 in 8eaca20d60 outdated
    852+    SanityCheck(depgraph_tree, linearization);
    853+
    854+    // Produce a post-processed version.
    855+    auto post_linearization = linearization;
    856+    PostLinearize(depgraph_tree, post_linearization);
    857+    SanityCheck(depgraph_tree, linearization);
    


    instagibbs commented at 2:35 pm on July 30, 2024:
    sanity checking wrong linearization?

    sipa commented at 10:11 pm on July 30, 2024:
    Indeed, fixed.
  27. in src/cluster_linearize.h:992 in c17480e144 outdated
    987+        // prefixes of remaining chunks of the other linearization.
    988+        SetInfo<SetType> best;
    989+        const auto& lin1_firstchunk = chunking1.GetChunk(0);
    990+        const auto& lin2_firstchunk = chunking2.GetChunk(0);
    991+        if (lin2_firstchunk.feerate >> lin1_firstchunk.feerate) {
    992+            best = chunking1.Intersect(lin2_firstchunk);
    


    instagibbs commented at 3:06 pm on July 30, 2024:
    hm, Intersect had me quite confused in this context, consider renaming this as something like PrefixIntersect?

    sipa commented at 10:10 pm on July 30, 2024:
    Added a commit that renames it.
  28. in src/cluster_linearize.h:1042 in c17480e144 outdated
    994+            best = chunking2.Intersect(lin1_firstchunk);
    995+        }
    996+        // Append the result to the output and mark it as done.
    997+        depgraph.AppendTopo(ret, best.transactions);
    998+        chunking1.MarkDone(best.transactions);
    999+        if (chunking1.NumChunksLeft() == 0) break;
    


    instagibbs commented at 3:12 pm on July 30, 2024:
    breaking here allows you to skip one last MarkDone on completion vs putting the count in the while loop, yes?

    sipa commented at 10:03 pm on July 30, 2024:
    Exactly.
  29. instagibbs commented at 3:21 pm on July 30, 2024: member

    f4a183cd17c7c770fd6b9d32b9d6c6f791bceb4e

    did not yet review implementation of PostLinearize, didn’t verify benchmarks with optimizations

  30. sipa force-pushed on Jul 30, 2024
  31. DrahtBot added the label CI failed on Jul 30, 2024
  32. DrahtBot removed the label CI failed on Jul 31, 2024
  33. in src/cluster_linearize.h:842 in 2df78c9cd6 outdated
    811+    // - Start with an empty list of groups L=[].
    812+    // - For every transaction i in the old linearization, from front to back:
    813+    //   - Append a new group C=[i], containing just i, to the back of L.
    814+    //   - While L has at least one group before C, and the group immediately before C has feerate
    815+    //     lower than C:
    816+    //       - Merge P into C, making C the concatenation of P+C, continuing with the combined C.
    


    sdaftuar commented at 2:01 pm on July 31, 2024:
    I think in your update to this comment, you dropped the condition that C depends on P. (But thanks for redrafting – this is easier for me to read.)

    sipa commented at 8:14 pm on August 1, 2024:
    Indeed, fixed.
  34. in src/test/fuzz/cluster_linearize.cpp:855 in 2df78c9cd6 outdated
    850+                parents -= depgraph_gen.Ancestors(j);
    851+                parents.Set(j);
    852+            }
    853+            if (parents.Any()) depgraph_tree.AddDependency(parents.First(), i);
    854+        }
    855+    }
    


    sdaftuar commented at 3:06 pm on July 31, 2024:
    Just to test my understanding: at this point, the graph we’ve constructed may not be connected, right?

    sipa commented at 7:08 pm on August 1, 2024:

    Certainly, for two reasons:

    • depgraph_gen may not be connected to begin with (there is no MakeConnected(depgraph_gen); perhaps there should be?)
    • Even if depgraph_gen is connected (imagine it being a trellis for example), removing all but the first parent, or all but the first child, may split it up.
  35. in src/test/fuzz/cluster_linearize.cpp:896 in 2df78c9cd6 outdated
    891+{
    892+    // Verify that taking an existing linearization, and moving a leaf to the back, potentially
    893+    // increasing its fee, and then post-linearizing, results in something as good as the
    894+    // original. This guarantees that in an RBF that replaces a transaction with one of the same
    895+    // size but higher fee, applying the "remove conflicts, append new transaction, postlinearize"
    896+    // process to an optimalized linearization will retain optimality.
    


    sdaftuar commented at 3:13 pm on July 31, 2024:
    optimalized –> optimal

    sipa commented at 8:14 pm on August 1, 2024:
    Rewritten (the claim wasn’t just grammatically bad, it was also just wrong).
  36. sdaftuar commented at 4:21 pm on July 31, 2024: member
    code review ACK 157464f3a9444ac9b2337227a2fee2f3adbd0918, will fuzz
  37. in src/cluster_linearize.h:183 in 157464f3a9 outdated
    178+     * Specifically, this finds the connected component which contains the first transaction of
    179+     * todo (if any).
    180+     *
    181+     * Two transactions are considered connected if there is a path from one to the other inside
    182+     * todo, where each is an ancestor or descendant of the next one in the entire graph (not just
    183+     * within todo). Thus, if if todo contains a transaction and a grandparent, but misses the
    


    instagibbs commented at 4:37 pm on July 31, 2024:
    0     * within todo). Thus, if todo contains a transaction and a grandparent, but misses the
    

    sipa commented at 8:14 pm on August 1, 2024:
    Fixed.
  38. in src/test/fuzz/cluster_linearize.cpp:806 in 157464f3a9 outdated
    801+    auto post_post_chunking = ChunkLinearization(depgraph, post_post_linearization);
    802+    cmp = CompareChunks(post_post_chunking, post_chunking);
    803+    assert(cmp >= 0);
    804+
    805+    // The chunks that come out of postlinearizing are always connected.
    806+    LinearizationChunking linchunking(depgraph, post_linearization);
    


    instagibbs commented at 4:41 pm on July 31, 2024:
    sounds like a tasty cantonese restaurant
  39. in src/cluster_linearize.h:181 in 157464f3a9 outdated
    172@@ -171,6 +173,50 @@ class DepGraph
    173         return ret;
    174     }
    175 
    176+    /** Find some connected component within the subset "todo" of this graph.
    177+     *
    178+     * Specifically, this finds the connected component which contains the first transaction of
    179+     * todo (if any).
    180+     *
    181+     * Two transactions are considered connected if there is a path from one to the other inside
    


    instagibbs commented at 4:51 pm on July 31, 2024:
    0     * Two transactions are considered connected if there is a path from one to the other and both are inside
    

    ?

    in general this description and rewrite was very helpful


    sipa commented at 7:33 pm on August 1, 2024:
    Not just the two transactions, but all transactions in the path need to be in todo.

    sipa commented at 8:15 pm on August 1, 2024:
    I have rewritten this. Please have a look if it’s better.

    instagibbs commented at 8:22 pm on August 1, 2024:
    definitely clearer on what the “path” is :+1: (and matches my understanding of the code now)
  40. in src/cluster_linearize.h:861 in 157464f3a9 outdated
    856+    {
    857+        /** The index of the previous transaction in this group; 0 if this is the first entry of
    858+         *  a group. */
    859+        ClusterIndex prev_tx;
    860+
    861+        // The fields below are only used for transactions that are the last one in a group:
    


    instagibbs commented at 5:27 pm on July 31, 2024:
    0        // The fields below are only used for transactions that are the last one in a group, also known as a tail transaction:
    

    sipa commented at 8:15 pm on August 1, 2024:
    Done-ish.
  41. in src/cluster_linearize.h:920 in 157464f3a9 outdated
    915+            ClusterIndex cur_group = idx + 1;
    916+            entries[cur_group].group = SetType::Singleton(idx);
    917+            entries[cur_group].deps = rev ? depgraph.Descendants(idx): depgraph.Ancestors(idx);
    918+            entries[cur_group].feerate = depgraph.FeeRate(idx);
    919+            if (rev) entries[cur_group].feerate.fee = -entries[cur_group].feerate.fee;
    920+            entries[cur_group].prev_tx = 0; // No previous transaction in group.
    


    instagibbs commented at 8:15 pm on July 31, 2024:
    except for here, I think all uses of 0 are sentinel references, could add a constant to denote that meaning?

    sipa commented at 8:15 pm on August 1, 2024:
    Done. I have introduced two constants SENTINEL and NO_PREV_TX (both with value 0) and used them instead of the literal 0s here.
  42. in src/cluster_linearize.h:958 in 157464f3a9 outdated
    938+                Assume(cur_group != 0);
    939+                Assume(prev_group != 0);
    940+                if (entries[cur_group].deps.Overlaps(entries[prev_group].group)) {
    941+                    // There is a dependency between cur_group and prev_group; merge prev_group
    942+                    // into cur_group.
    943+                    entries[cur_group].group |= entries[prev_group].group;
    


    instagibbs commented at 8:18 pm on July 31, 2024:
    could note that after these lines, entries[prev_group]’s group + deps + feerate fields are still set but no longer used

    sipa commented at 8:15 pm on August 1, 2024:
    Indeed, added a comment.
  43. sdaftuar commented at 8:24 pm on July 31, 2024: member
    ACK 157464f3a9444ac9b2337227a2fee2f3adbd0918
  44. in src/test/fuzz/cluster_linearize.cpp:930 in 157464f3a9 outdated
    925+    // Compare diagrams (applying the fee delta after computing the old one).
    926+    auto old_chunking = ChunkLinearization(depgraph, lin);
    927+    depgraph.FeeRate(lin_leaf.back()).fee += fee_inc;
    928+    auto new_chunking = ChunkLinearization(depgraph, lin_moved);
    929+    auto cmp = CompareChunks(new_chunking, old_chunking);
    930+    assert(cmp >= 0);
    


    instagibbs commented at 8:32 pm on July 31, 2024:

    nice to have a non-tree example of things strictly improving, also interesting to see that improvements still happen even if no fees are added edit: hm, still not quite showing strict improvement since fees are after the fact; it’s a different “cluster”. Oh well.

    0    if (fee_inc > 0) {
    1        // It's more fees; should be superior
    2        assert(cmp > 0);
    3    } else {
    4        assert(cmp >= 0);
    5    }
    
  45. instagibbs approved
  46. instagibbs commented at 8:37 pm on July 31, 2024: member

    ACK 157464f3a9444ac9b2337227a2fee2f3adbd0918

    I didn’t verify benchmark timings but opts seemed ok

  47. in src/cluster_linearize.h:1045 in 17b29eae7c outdated
    1005+        depgraph.AppendTopo(ret, best.transactions);
    1006+        chunking1.MarkDone(best.transactions);
    1007+        if (chunking1.NumChunksLeft() == 0) break;
    1008+        chunking2.MarkDone(best.transactions);
    1009+    }
    1010+
    


    glozow commented at 11:24 am on August 1, 2024:
    could do an Assume(ret.size() == depgraph.TxCount()); before returning here, or in the fuzz test

    sipa commented at 8:16 pm on August 1, 2024:
    Done.
  48. sdaftuar commented at 3:37 pm on August 1, 2024: member

    Here are my linearization benchmarks (ryzen 7995wx) after this PR (compare with #30126 (comment)):

    ns/op op/s err% total benchmark
    973.43 1,027,297.78 0.5% 0.01 LinearizeNoIters16TxWorstCaseAnc
    1,464.85 682,664.84 0.2% 0.01 LinearizeNoIters16TxWorstCaseLIMO
    2,503.93 399,372.47 0.3% 0.01 LinearizeNoIters32TxWorstCaseAnc
    5,091.86 196,392.05 0.2% 0.01 LinearizeNoIters32TxWorstCaseLIMO
    4,881.34 204,861.99 0.1% 0.01 LinearizeNoIters48TxWorstCaseAnc
    10,880.48 91,907.74 0.0% 0.01 LinearizeNoIters48TxWorstCaseLIMO
    7,843.67 127,491.30 0.0% 0.01 LinearizeNoIters64TxWorstCaseAnc
    18,803.07 53,182.80 0.1% 0.01 LinearizeNoIters64TxWorstCaseLIMO
    11,747.48 85,124.62 0.1% 0.01 LinearizeNoIters75TxWorstCaseAnc
    28,902.79 34,598.74 0.0% 0.01 LinearizeNoIters75TxWorstCaseLIMO
    19,030.02 52,548.55 0.1% 0.01 LinearizeNoIters99TxWorstCaseAnc
    49,663.43 20,135.54 0.1% 0.01 LinearizeNoIters99TxWorstCaseLIMO
    ns/iters iters/s err% total benchmark
    13.66 73,231,349.71 0.1% 0.01 LinearizePerIter16TxWorstCase
    9.63 103,869,988.01 0.3% 0.01 LinearizePerIter32TxWorstCase
    9.38 106,626,518.23 0.3% 0.01 LinearizePerIter48TxWorstCase
    9.44 105,920,646.37 0.3% 0.01 LinearizePerIter64TxWorstCase
    10.38 96,365,108.12 0.3% 0.01 LinearizePerIter75TxWorstCase
    10.38 96,374,023.73 0.2% 0.01 LinearizePerIter99TxWorstCase
    ns/op op/s err% total benchmark
    690.49 1,448,239.70 0.4% 0.01 MergeLinearizations16TxWorstCase
    2,703.90 369,835.96 0.1% 0.01 MergeLinearizations32TxWorstCase
    6,143.66 162,769.31 0.0% 0.01 MergeLinearizations48TxWorstCase
    11,300.76 88,489.66 0.1% 0.01 MergeLinearizations64TxWorstCase
    17,576.48 56,894.21 0.0% 0.01 MergeLinearizations75TxWorstCase
    31,100.42 32,153.91 0.0% 0.01 MergeLinearizations99TxWorstCase
    278.14 3,595,258.27 0.0% 0.01 PostLinearize16TxWorstCase
    863.81 1,157,659.26 0.0% 0.01 PostLinearize32TxWorstCase
    2,657.66 376,271.23 0.2% 0.01 PostLinearize48TxWorstCase
    4,652.11 214,956.22 0.0% 0.01 PostLinearize64TxWorstCase
    5,392.31 185,449.25 0.2% 0.01 PostLinearize75TxWorstCase
    9,431.09 106,032.27 0.1% 0.01 PostLinearize99TxWorstCase
  49. in src/test/fuzz/cluster_linearize.cpp:927 in 2df78c9cd6 outdated
    922+    PostLinearize(depgraph, lin_moved);
    923+    SanityCheck(depgraph, lin_moved);
    924+
    925+    // Compare diagrams (applying the fee delta after computing the old one).
    926+    auto old_chunking = ChunkLinearization(depgraph, lin);
    927+    depgraph.FeeRate(lin_leaf.back()).fee += fee_inc;
    


    glozow commented at 4:45 pm on August 1, 2024:
    I’m a bit confused trying to understand the “remove conflicts, append new tx to the end, postlinearize” RBF idea. I kind of expected this fee increment to happen before PostLinearize() is called, if it’s similar to appending the replacement with higher fees to the end.

    sipa commented at 8:08 pm on August 1, 2024:

    So imagine your have an existing cluster C with an existing linearization L. It contains a leaf transaction R (which has no descendants) somewhere, but not at the very end of L.

    A new transaction A comes in which has the same size as R, same ancestors as R, and the same or higher fee as R.

    Imagine that in such a scenario the RBF code starts by taking L, removing all conflicts (in this case just R) from it, and appending the new transaction A at the end, and runs PostLinearize(), before invoking LIMO and whatnot.

    The test here guarantees that the result of that PostLinearize() will already be as good as the old linearization was, despite having “forgotten” where inside L R was placed. In other words, PostLinearize is - in this very limited scenario - able to “restore” the quality the linearization had.


    glozow commented at 10:49 am on August 2, 2024:
    Ah okay my bad, I thought it was simulating the RBF strategy but it’s actually asserting a more general property, that PostLinearize can bring us “back” to something at least as good as before the leaf was moved to the back.
  50. clusterlin: rename Intersect -> IntersectPrefixes
    This makes it clearer what the function does.
    0e52728a2d
  51. clusterlin: add algorithms for connectedness/connected components
    Add utility functions to DepGraph for finding connected components.
    0e2812d293
  52. clusterlin: add PostLinearize + benchmarks + fuzz tests 4f8958d756
  53. clusterlin: add MergeLinearizations function + fuzz test + benchmark 04d7a04ea4
  54. clusterlin: improve rechunking in LinearizationChunking (optimization)
    When the transactions being marked done exactly match the first chunk of
    what remains of the linearization, we can just remember to skip that
    chunk instead of computing a full rechunking.
    
    Further, chop off prefixes of the input linearization that are already done,
    so they don't need to be reconsidered for further rechunkings.
    bbcee5a0d6
  55. sipa force-pushed on Aug 1, 2024
  56. in src/cluster_linearize.h:833 in bbcee5a0d6
    828+    // than the one started from.
    829+    // - One pass in either direction guarantees that the resulting chunks are connected.
    830+    // - Each direction corresponds to one shape of tree being linearized optimally (forward passes
    831+    //   guarantee this for graphs where each transaction has at most one child; backward passes
    832+    //   guarantee this for graphs where each transaction has at most one parent).
    833+    // - Starting with a backward pass guarantees the moved-tree property.
    


    instagibbs commented at 8:24 pm on August 1, 2024:
    moved-leaf? (or point to another usage of moved tree)
  57. in src/cluster_linearize.h:914 in bbcee5a0d6
    909+    //                                         \-F-/
    910+    //
    911+    // During an even pass, the diagram above would correspond to linearization [2,3,0,1], with
    912+    // groups [2] and [3,0,1].
    913+
    914+    std::vector<TxEntry> entries(linearization.size() + 1);
    


    glozow commented at 10:27 am on August 2, 2024:
    Are we persisting entries between passes just so we don’t need to reallocate this vector? I don’t see that we need to keep any of the information from a previous pass. To check, added a entries.clear() which didn’t seem to break anything for me.

    sipa commented at 11:31 am on August 2, 2024:
    Indeed, the only reuse is the vector allocation.
  58. in src/cluster_linearize.h:815 in bbcee5a0d6
    810+ * @param[in,out] linearization  On input, an existing linearization for depgraph. On output, a
    811+ *                               potentially better linearization for the same graph.
    812+ *
    813+ * Postlinearization guarantees:
    814+ * - The resulting chunks are connected.
    815+ * - If the input has a tree shape (either all transactions have at most one child, or all
    


    glozow commented at 10:37 am on August 2, 2024:
    This is a subset of the “there exists a maximum of 1 path from any node to another” definition of a tree, right? This definition seems to be like a… botanical tree?

    sipa commented at 12:10 pm on August 2, 2024:
    According to https://en.wikipedia.org/wiki/Tree_(graph_theory), a transaction graph where each transaction has at most one child would be an arborescence or out-tree, and the opposite an anti-arborescence or in-tree.
  59. glozow commented at 11:14 am on August 2, 2024: member
    code review ACK bbcee5a0d67
  60. DrahtBot requested review from sdaftuar on Aug 2, 2024
  61. DrahtBot requested review from instagibbs on Aug 2, 2024
  62. instagibbs approved
  63. sdaftuar commented at 6:40 pm on August 2, 2024: member
    ACK bbcee5a0d67db46526ba29a1a4a7c590d303de03
  64. glozow merged this on Aug 5, 2024
  65. glozow closed this on Aug 5, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-26 12:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me