cluster mempool: extend DepGraph functionality #30857

pull sipa wants to merge 7 commits into bitcoin:master from sipa:202409_clusterlin_partialdep changing 8 files +543 −280
  1. sipa commented at 0:48 am on September 10, 2024: member

    Part of cluster mempool: #30289

    This adds:

    • DepGraph::AddDependencies to add 0 or more dependencies to a single transaction at once (identical to calling DepGraph::AddDependency once for each, but more efficient).
    • DepGraph::RemoveTransactions to remove 0 or more transactions from a depgraph.
    • DepGraph::GetReducedParents (and DepGraph::GetReducedChildren) to get the (reduced) direct parents and children of a transaction in a depgraph.

    After which, the Cluster type is removed.

    This is the result of fleshing out the design for the “intermediate layer” (“TxGraph”, no PR yet) between the cluster linearization layer and the mempool layer. My earlier thinking was that TxGraph would store Cluster objects (vectors of pairs of FeeFracs and sets of parents), and convert them to DepGraph on the fly whenever needed. However, after more consideration, it seems better to have TxGraph store DepGraph objects, and manipulate them directly without constantly re-creating them. This requires DepGraph to have some additional functionality.

    The bulk of the complexity here is the addition of DepGraph::RemoveTransactions, which leaves the remaining transactions’ positions within the DepGraph untouched (we want existing identifiers to remain valid), so this implies that graphs can now have “holes” (positions that are unused, but followed by positions that are used). To enable that, an extension of the fuzz/test serialization format DepGraphFormatter is included to deal with such holes.

  2. DrahtBot commented at 0:48 am on September 10, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK instagibbs, ismaelsadeeq, sdaftuar, glozow

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30605 (Cluster linearization: separate tests from tests-of-tests by sipa)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. sipa renamed this:
    cluster mempool: extend DepGraph (multiple dependencies, removing transactions, parents/children)
    cluster mempool: extend DepGraph functionality
    on Sep 10, 2024
  4. sipa force-pushed on Sep 10, 2024
  5. DrahtBot added the label CI failed on Sep 10, 2024
  6. DrahtBot commented at 1:33 am on September 10, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/29906098019

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  7. sipa force-pushed on Sep 10, 2024
  8. fanquake commented at 12:39 pm on September 10, 2024: member

    https://cirrus-ci.com/task/4580438562308096?logs=ci#L1723

     0Run clusterlin_ancestor_finder with args ['/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/fuzz', '-runs=1', PosixPath('/ci_container_base/ci/scratch/qa-assets/fuzz_corpora/clusterlin_ancestor_finder')]INFO: Running with entropic power schedule (0xFF, 100).
     1INFO: Seed: 2205311796
     2INFO: Loaded 1 modules   (615399 inline 8-bit counters): 615399 [0x55b1778f3be8, 0x55b177989fcf), 
     3INFO: Loaded 1 PC tables (615399 PCs): 615399 [0x55b177989fd0,0x55b1782ede40), 
     4INFO:       57 files found in /ci_container_base/ci/scratch/qa-assets/fuzz_corpora/clusterlin_ancestor_finder
     5INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
     6INFO: seed corpus: files: 57 min: 1b max: 516b total: 4662b rss: 111Mb
     7/ci_container_base/src/test/util/cluster_linearize.h:242:29: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'ClusterIndex' (aka 'unsigned int')
     8    [#0](/bitcoin-bitcoin/0/) 0x55b17470e4e4 in void (anonymous namespace)::DepGraphFormatter::Unser<SpanReader, bitset_detail::IntBitSet<unsigned int>>(SpanReader&, cluster_linearize::DepGraph<bitset_detail::IntBitSet<unsigned int>>&)::'lambda'(unsigned int)::operator()(unsigned int) const ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/./src/test/util/cluster_linearize.h:242:29
     9    [#1](/bitcoin-bitcoin/1/) 0x55b1746e5efa in void (anonymous namespace)::DepGraphFormatter::Unser<SpanReader, bitset_detail::IntBitSet<unsigned int>>(SpanReader&, cluster_linearize::DepGraph<bitset_detail::IntBitSet<unsigned int>>&) ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/./src/test/util/cluster_linearize.h:268:17
    10    [#2](/bitcoin-bitcoin/2/) 0x55b1746e5efa in void Wrapper<(anonymous namespace)::DepGraphFormatter, cluster_linearize::DepGraph<bitset_detail::IntBitSet<unsigned int>>&>::Unserialize<SpanReader>(SpanReader&) ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/./src/serialize.h:481:73
    11    [#3](/bitcoin-bitcoin/3/) 0x55b1746e5efa in void Unserialize<SpanReader, Wrapper<(anonymous namespace)::DepGraphFormatter, cluster_linearize::DepGraph<bitset_detail::IntBitSet<unsigned int>>&>&>(T&, T0&&) ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/./src/serialize.h:761:7
    12    [#4](/bitcoin-bitcoin/4/) 0x55b1746e5efa in SpanReader& SpanReader::operator>><Wrapper<(anonymous namespace)::DepGraphFormatter, cluster_linearize::DepGraph<bitset_detail::IntBitSet<unsigned int>>&>>(Wrapper<(anonymous namespace)::DepGraphFormatter, cluster_linearize::DepGraph<bitset_detail::IntBitSet<unsigned int>>&>&&) ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/./src/streams.h:114:9
    13    [#5](/bitcoin-bitcoin/5/) 0x55b1746eb7c4 in clusterlin_ancestor_finder_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>) ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/./src/test/fuzz/cluster_linearize.cpp:454:16
    14    [#6](/bitcoin-bitcoin/6/) 0x55b174d1203d in std::function<void (std::span<unsigned char const, 18446744073709551615ul>)>::operator()(std::span<unsigned char const, 18446744073709551615ul>) const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9
    15    [#7](/bitcoin-bitcoin/7/) 0x55b174d1203d in LLVMFuzzerTestOneInput ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/util/./src/test/fuzz/fuzz.cpp:209:5
    16    [#8](/bitcoin-bitcoin/8/) 0x55b1743ee834 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x1bcb834) (BuildId: c3b4f6b7a5dfbb78ebde90aae8831e87214c3cf3)
    17    [#9](/bitcoin-bitcoin/9/) 0x55b1743edf29 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) (/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x1bcaf29) (BuildId: c3b4f6b7a5dfbb78ebde90aae8831e87214c3cf3)
    18    [#10](/bitcoin-bitcoin/10/) 0x55b1743efb46 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x1bccb46) (BuildId: c3b4f6b7a5dfbb78ebde90aae8831e87214c3cf3)
    19    [#11](/bitcoin-bitcoin/11/) 0x55b1743f0057 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x1bcd057) (BuildId: c3b4f6b7a5dfbb78ebde90aae8831e87214c3cf3)
    20    [#12](/bitcoin-bitcoin/12/) 0x55b1743dd54f in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x1bba54f) (BuildId: c3b4f6b7a5dfbb78ebde90aae8831e87214c3cf3)
    21    [#13](/bitcoin-bitcoin/13/) 0x55b174407bd6 in main (/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x1be4bd6) (BuildId: c3b4f6b7a5dfbb78ebde90aae8831e87214c3cf3)
    22    [#14](/bitcoin-bitcoin/14/) 0x7fe0a47af1c9  (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
    23    [#15](/bitcoin-bitcoin/15/) 0x7fe0a47af28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
    24    [#16](/bitcoin-bitcoin/16/) 0x55b1743d2534 in _start (/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu/src/test/fuzz/fuzz+0x1baf534) (BuildId: c3b4f6b7a5dfbb78ebde90aae8831e87214c3cf3)
    25
    26SUMMARY: UndefinedBehaviorSanitizer: unsigned-integer-overflow /ci_container_base/src/test/util/cluster_linearize.h:242:29 
    27MS: 0 ; base unit: 0000000000000000000000000000000000000000
    280xa,0x7e,0x7e,0x7e,0x3,0x0,0x7e,0x7e,0x7e,0x0,0x8,0x0,0x3f,
    29\012~~~\003\000~~~\000\010\000?
    30artifact_prefix='./'; Test unit written to ./crash-8c0d970971a66a2d84b05f68d2956932fb44a077
    31Base64: Cn5+fgMAfn5+AAgAPw==
    

    The previous releases failure is unrelated.

  9. sipa force-pushed on Sep 10, 2024
  10. sipa commented at 12:52 pm on September 10, 2024: member

    @fanquake Thanks! Fixed, I think.

    EDIT: Narrator: it wasn’t.

  11. sipa force-pushed on Sep 10, 2024
  12. instagibbs commented at 4:27 pm on September 10, 2024: member
    I’m going to hold off reviewing this until I see the TxGraph PR(dirty draft ok) to better understand its intended usage. :+1:
  13. sipa commented at 5:14 pm on September 10, 2024: member

    @instagibbs Sure.

    To give a bit more background, my idea to store things in Cluster form and convert to DepGraph, was due to a (mistaken) belief that incremental changes to a DepGraph would be $\mathcal{O}(n)$ per-dependency, meaning $\mathcal{O}(n^2)$ per-transaction, and thus potentially $\mathcal{O}(n^3)$ for a full $n$ transactions. @sdaftuar pointed out the possibility of an $\mathcal{O}(n)$ per-transaction algorithm, or $\mathcal{O}(n^2)$ for $n$ transactions, implemented in this PR as DepGraph::AddDependencies. As that’s equal in complexity to the current bulk Cluster->DepGraph conversion, there is little reason left to prefer that bulk conversion algorithm, and leaving things in DepGraph form means the same module can be used for more purposes (e.g. computing a transaction’s ancestry, as it’s literally precomputed already inside DepGraph).

  14. glozow added the label Utils/log/libs on Sep 10, 2024
  15. DrahtBot removed the label CI failed on Sep 11, 2024
  16. in src/cluster_linearize.h:200 in 77e07eba11 outdated
    183@@ -184,6 +184,46 @@ class DepGraph
    184         }
    185     }
    186 
    187+    /** Compute the (reduced) set of parents of node i in this graph.
    


    instagibbs commented at 2:43 pm on September 16, 2024:

    commit 77e07eba11391868f83db22bf2280e02b5a8cd78

    add a motivation for these additions in commit message?


    sipa commented at 5:49 pm on September 18, 2024:
    Done. I’ve expanded the commit message of all commits.
  17. in src/test/util/cluster_linearize.h:337 in 77e07eba11 outdated
    273+        for (auto parent : parents) {
    274+            assert((depgraph.Ancestors(parent) & parents).IsSubsetOf(SetType::Singleton(parent)));
    275+        }
    276+        // Similar for children and descendants.
    277+        for (auto child : children) {
    278+            assert((depgraph.Descendants(child) & children).IsSubsetOf(SetType::Singleton(child)));
    


    instagibbs commented at 3:10 pm on September 16, 2024:

    ?

    0            assert((depgraph.Descendants(child) & children) == SetType::Singleton(child));
    

    sipa commented at 5:51 pm on September 18, 2024:
    Same reason; won’t work for cyclic graphs.
  18. in src/test/util/cluster_linearize.h:333 in 77e07eba11 outdated
    269+        auto children = depgraph.GetReducedChildren(i);
    270+        assert(!parents[i]);
    271+        assert(!children[i]);
    272+        // Parents of a transactions do not have ancestors inside those parents (except itself).
    273+        for (auto parent : parents) {
    274+            assert((depgraph.Ancestors(parent) & parents).IsSubsetOf(SetType::Singleton(parent)));
    


    instagibbs commented at 3:10 pm on September 16, 2024:

    ?

    0            assert((depgraph.Ancestors(parent) & parents) == SetType::Singleton(parent));
    

    sipa commented at 5:51 pm on September 18, 2024:
    Sadly that won’t work when there are cyclic dependencies. Of course, this test could be moved down to the “if acyclic” branch, but there is a stronger test there already.
  19. in src/cluster_linearize.h:106 in 44afa83b8f outdated
    114-        // Fill in descendant information by transposing the ancestor information.
    115-        for (ClusterIndex i = 0; i < entries.size(); ++i) {
    116-            for (auto j : entries[i].ancestors) {
    117-                entries[j].descendants.Set(i);
    118-            }
    119+            // Fill in dependencies.
    


    instagibbs commented at 3:35 pm on September 16, 2024:

    nit

    0            // Fill in dependencies using direct parents.
    

    sipa commented at 5:46 pm on September 18, 2024:
    Done. Made it “Fill in dependencies by remapping direct parents.”.
  20. in src/cluster_linearize.h:224 in 44afa83b8f outdated
    164+     *
    165+     * Complexity: O(N) where N=TxCount().
    166+     */
    167+    void AddDependencies(const SetType& parents, ClusterIndex child) noexcept
    168+    {
    169+        // Compute the ancestors of parents that are not yet ancestors of child.
    


    instagibbs commented at 3:44 pm on September 16, 2024:

    nit

    0        // Compute the ancestors of parents that are not already ancestors of child.
    

    sipa commented at 5:54 pm on September 18, 2024:
    Done.
  21. in src/cluster_linearize.h:177 in 44afa83b8f outdated
    162+     * The behavior is identical to repeatedly calling AddDependency for each element of parents,
    163+     * but is more efficient.
    164+     *
    165+     * Complexity: O(N) where N=TxCount().
    166+     */
    167+    void AddDependencies(const SetType& parents, ClusterIndex child) noexcept
    


    instagibbs commented at 4:15 pm on September 16, 2024:

    could use this in AddDependency as a one-liner?

    0AddDependencies(SetType::Singleton(parent), child);
    

    sipa commented at 5:48 pm on September 18, 2024:
    I went a lot further and dropped AddDependency entirely.

    instagibbs commented at 6:30 pm on September 18, 2024:
    even better
  22. in src/test/util/cluster_linearize.h:169 in dde3959a24 outdated
    172-                ++insert_distance;
    173-            }
    174-            rebuilt_order.insert(rebuilt_order.end() - insert_distance, idx);
    175-            s << VARINT(diff + insert_distance);
    176+            // The new transaction is be inserted N positions back from the end of the cluster.
    177+            // Emit N to indicate that many insertion choices are skipped.
    


    instagibbs commented at 5:03 pm on September 16, 2024:

    dde3959a24a12cd7ab37e772b55ae3c60a646528

    no behavior change? would be good to say in commit message explicitly


    sipa commented at 5:54 pm on September 18, 2024:
    Done.
  23. in src/cluster_linearize.h:52 in 7510a65127 outdated
    56@@ -57,9 +57,20 @@ class DepGraph
    57     /** Data for each transaction, in the same order as the Cluster it was constructed from. */
    58     std::vector<Entry> entries;
    59 
    60+    /** Which positions are used. */
    


    instagibbs commented at 6:12 pm on September 16, 2024:

    in commit message for 7510a65127db610df4be9cdd95eaef0c23b3e2ff

    “This commits starts introducing support” are we expecting more support later?


    sipa commented at 5:46 pm on September 18, 2024:
    Fixed. I initially intended to introduce this in multiple commits, but then didn’t go back to editing the commit message after everything ended up in one commit.
  24. in src/test/cluster_linearize_tests.cpp:23 in 7510a65127 outdated
    17@@ -18,6 +18,10 @@ using namespace cluster_linearize;
    18 
    19 namespace {
    20 
    21+/** Special magic value that indicates to TestDepGraphSerialization that a cluster entry represents
    22+ *  a hole. */
    23+constexpr std::pair<FeeFrac, TestBitSet> HOLE{FeeFrac{1901858123504646, 3635904}, {}};
    


    instagibbs commented at 6:26 pm on September 16, 2024:
    humor me: why these values? Values outside what the de-serializer is otherwise allowed to make?

    sipa commented at 5:54 pm on September 18, 2024:
    They were randomly generated, but are legal for the serializer (SanityCheck() would fail otherwise). I’ve simplified them to something more obviously nonsensical (transaction larger than 4M).
  25. DrahtBot added the label Needs rebase on Sep 16, 2024
  26. in src/test/fuzz/cluster_linearize.cpp:930 in 7510a65127 outdated
    863-        depgraph_tree.AddTransaction(depgraph_gen.FeeRate(i));
    864+    for (ClusterIndex i = 0; i < depgraph_gen.PositionRange(); ++i) {
    865+        if (depgraph_gen.Positions()[i]) {
    866+            depgraph_tree.AddTransaction(depgraph_gen.FeeRate(i));
    867+        } else {
    868+            // For holes, add a dummy transaction which is deleted below.
    


    instagibbs commented at 6:44 pm on September 16, 2024:
    I don’t see where these are being deleted? The only removal below is removing things that aren’t in Positions(), which the newly added transactions should be? Likely misreading this.

    sipa commented at 5:53 pm on September 18, 2024:
    This is adding transactions to depgraph_tree corresponding to the holes in depgraph_gen, which are then deleted below. The point is making sure that the non-holes have the same positions in both. I have added a comment to explain.

    instagibbs commented at 6:40 pm on September 18, 2024:
    ok, seems clear to me now, not sure why I was so confused.
  27. sipa force-pushed on Sep 16, 2024
  28. sipa commented at 8:44 pm on September 16, 2024: member
    Rebased after merge of #30286.
  29. DrahtBot removed the label Needs rebase on Sep 16, 2024
  30. sipa force-pushed on Sep 17, 2024
  31. in src/cluster_linearize.h:151 in 9f67d6dcdc outdated
    175     }
    176 
    177+    /** Remove the specified positions from this DepGraph.
    178+     *
    179+     * The specified positions will no longer be part of Positions(), and dependencies with them are
    180+     * removed. Note that due to DepGraph only tracking ancestors/descendants (and not direct
    


    instagibbs commented at 4:32 pm on September 17, 2024:

    Ergonomically speaking, I expect this to be used to remove a set of “direct conflicts” and all descendants of that set, so this wouldn’t cause invalid views of the underlying cluster’s graph relationships.

    Is this correct?


    sipa commented at 5:18 pm on September 18, 2024:
    In practice, this will be used to either remove a set of conflicts (together with all their descendants), or to remove a set of mined transactions (together with all their ancestors). In both cases, no transactions will remain which are both ancestors of some deleted transactions and descendants of other deleted transactions.

    instagibbs commented at 5:19 pm on September 18, 2024:
    ah, right! I hadn’t considered the mining side. Maybe leave something to that effect?

    sipa commented at 5:47 pm on September 18, 2024:
    I feel like that’s better left to the layer above; as far as DepGraph is concerned, it’s pefectly well-defined to do this, even though we don’t care.
  32. in src/cluster_linearize.h:188 in 9f67d6dcdc outdated
    188+        m_used -= del;
    189+        // Remove now-unused trailing entries.
    190+        while (!entries.empty() && !m_used[entries.size() - 1]) {
    191+            entries.pop_back();
    192+        }
    193+        // Remove the deleted transactions from ancestors/descendants of other transactions.
    


    instagibbs commented at 4:33 pm on September 17, 2024:
    do we want to leave a note that now-unused entries not at the end can be non-empty/dirty? If that’s not possible, a comment explaining that would be helpful too.

    sipa commented at 5:52 pm on September 18, 2024:
    Added a comment.
  33. in src/cluster_linearize.h:159 in 9f67d6dcdc outdated
    183+     *
    184+     * Complexity: O(N) where N=TxCount().
    185+     */
    186+    void RemoveTransactions(const SetType& del) noexcept
    187+    {
    188+        m_used -= del;
    


    instagibbs commented at 4:34 pm on September 17, 2024:
    Do we want to Assume that del is in m_used?

    sipa commented at 5:48 pm on September 18, 2024:
    There is no need for that I think. The code doesn’t get simpler by enforcing that.

    instagibbs commented at 6:34 pm on September 18, 2024:
    If we’re allowing it, let’s make sure there’s overage for it in the fuzzer

    sipa commented at 10:09 pm on September 18, 2024:
    Good idea, done (in clusterlin_cluster_serialization).
  34. instagibbs commented at 5:13 pm on September 17, 2024: member

    First run through.

    suggestion for clusterlin_add_dependencies harness: https://github.com/instagibbs/bitcoin/commit/c5c44da5a0a630a6725ca85e90e4c67b2eb1f8bc

    I think this gets similar or better coverage and is easier for me to understand? (alternatively, take an extra bit and use that as decision bit to batch or individually add, and have the other depgraph do the opposite?)

    edit: ah, I’m hodling it wrong! Deleted question.

  35. in src/cluster_linearize.h:178 in f484f5a872 outdated
    163+     * but is more efficient.
    164+     *
    165+     * Complexity: O(N) where N=TxCount().
    166+     */
    167+    void AddDependencies(const SetType& parents, ClusterIndex child) noexcept
    168+    {
    


    instagibbs commented at 6:03 pm on September 17, 2024:

    Was reconstructing graphs incorrectly leading to inconsistent sets. Better to fail here early.

    0    {
    1         // We can't add relationships to holes
    2        Assume(parents.IsSubsetOf(m_used));
    

    a check like this for AddDependency would also be good:

    Assume(SetType::Singleton(parent).IsSubsetOf(m_used));


    sipa commented at 5:49 pm on September 18, 2024:
    Done.
  36. instagibbs commented at 6:46 pm on September 17, 2024: member

    it’s likely the wrong place for the coverage, but I added some coverage for iterative additional and removal of transactions. Take what you think is useful if any:

      0diff --git a/src/test/fuzz/cluster_linearize.cpp b/src/test/fuzz/cluster_linearize.cpp
      1index be06898a6e..62b5ec2343 100644
      2--- a/src/test/fuzz/cluster_linearize.cpp
      3+++ b/src/test/fuzz/cluster_linearize.cpp
      4@@ -1,19 +1,20 @@
      5 // Copyright (c) The Bitcoin Core developers
      6 // Distributed under the MIT software license, see the accompanying
      7 // file COPYING or http://www.opensource.org/licenses/mit-license.php.
      8 
      9 #include <cluster_linearize.h>
     10 #include <serialize.h>
     11 #include <streams.h>
     12 #include <test/fuzz/fuzz.h>
     13 #include <test/fuzz/FuzzedDataProvider.h>
     14+#include <test/fuzz/util.h>
     15 #include <test/util/cluster_linearize.h>
     16 #include <util/bitset.h>
     17 #include <util/feefrac.h>
     18 
     19 #include <algorithm>
     20 #include <stdint.h>
     21 #include <vector>
     22 #include <utility>
     23 
     24 using namespace cluster_linearize;
     25@@ -308,30 +309,89 @@ FUZZ_TARGET(clusterlin_cluster_serialization)
     26             if (i == j) continue;
     27             if (provider.ConsumeBool()) cluster[i].second.Set(j);
     28         }
     29     }
     30 
     31     // Construct dependency graph, and verify it matches the cluster (which includes a round-trip
     32     // check for the serialization).
     33     DepGraph depgraph(cluster);
     34     VerifyDepGraphFromCluster(cluster, depgraph);
     35 
     36+    // Copy to store deleted information
     37+    DepGraph depgraph_copy(depgraph);
     38+
     39     // Remove an arbitrary subset (in order to construct a graph with holes) and verify that it
     40     // still sanity checks (incl. round-tripping serialization).
     41     uint64_t del = provider.ConsumeIntegralInRange<uint64_t>(1, (uint64_t{1} << num_tx) - 1);
     42     TestBitSet setdel;
     43     for (ClusterIndex i : depgraph.Positions()) {
     44         if (del & 1) setdel.Set(i);
     45         del >>= 1;
     46     }
     47+    assert(del == 0);
     48+
     49     depgraph.RemoveTransactions(setdel);
     50     SanityCheck(depgraph);
     51+
     52+    FastRandomContext rng{ConsumeUInt256(provider)};
     53+
     54+    // Arbitary addition or removal order for transactions we're deleting
     55+    std::vector<ClusterIndex> del_indexes;
     56+    for (const auto del_index : setdel) {
     57+        del_indexes.push_back(del_index);
     58+    }
     59+
     60+    // Tracks deletion index vs re-addition index for dependency re-addition
     61+    std::map<ClusterIndex, ClusterIndex> map_del_add_index;
     62+
     63+    // Re-add everything back in; should not fail sanity checks
     64+    // But this will have transactions "out of order" and missing
     65+    // connections from children to previously deleted ancestors.
     66+    std::shuffle(del_indexes.begin(), del_indexes.end(), rng);
     67+    for (ClusterIndex del_index : del_indexes) {
     68+        const auto add_index{depgraph.AddTransaction(depgraph_copy.FeeRate(del_index))};
     69+        map_del_add_index[del_index] = add_index;
     70+    }
     71+
     72+    std::shuffle(del_indexes.begin(), del_indexes.end(), rng);
     73+    for (ClusterIndex del_index : del_indexes) {
     74+       depgraph.AddDependencies(depgraph_copy.Ancestors(del_index), map_del_add_index[del_index]);
     75+    }
     76+
     77+    SanityCheck(depgraph);
     78+
     79+    // But if we repeat this N times, we should get identical depgraphs
     80+    for (int i = 0; i < provider.ConsumeIntegralInRange<int>(1, 5); ++i) {
     81+        depgraph_copy = depgraph;
     82+        // Batch removal or shuffled iterated removal
     83+        if (provider.ConsumeBool()) {
     84+            depgraph.RemoveTransactions(setdel);
     85+        } else {
     86+            std::shuffle(del_indexes.begin(), del_indexes.end(), rng);
     87+            for (const auto del_index : del_indexes) {
     88+                depgraph.RemoveTransactions(TestBitSet::Singleton(del_index));
     89+            }
     90+        }
     91+        SanityCheck(depgraph);
     92+
     93+        std::shuffle(del_indexes.begin(), del_indexes.end(), rng);
     94+        map_del_add_index.clear();
     95+        for (ClusterIndex del_index : setdel) {
     96+            const auto add_index{depgraph.AddTransaction(depgraph_copy.FeeRate(del_index))};
     97+            map_del_add_index[del_index] = add_index;
     98+        }
     99+        std::shuffle(del_indexes.begin(), del_indexes.end(), rng);
    100+        for (ClusterIndex del_index : setdel) {
    101+            depgraph.AddDependencies(depgraph_copy.Ancestors(del_index), map_del_add_index[del_index]);
    102+        }
    103+        assert(depgraph == depgraph_copy);
    104+    }
    105 }
    106 
    107 FUZZ_TARGET(clusterlin_depgraph_serialization)
    108 {
    109     // Verify that any deserialized depgraph is acyclic and roundtrips to an identical depgraph.
    110 
    111     // Construct a graph by deserializing.
    112     SpanReader reader(buffer);
    113     DepGraph<TestBitSet> depgraph;
    114     try {
    
  37. in src/cluster_linearize.h:208 in 9f67d6dcdc outdated
    245+     * This is the set of dependencies of i, excluding i itself, and excluding any parents of other
    246+     * dependencies.
    247+     *
    248+     * Complexity: O(N) where N=Ancestors(i).Count() (which is bounded by TxCount()).
    249+     */
    250+    SetType GetReducedParents(ClusterIndex i) const noexcept
    


    glozow commented at 11:57 pm on September 17, 2024:
    Is the hope to be able to reconstruct direct parents of a transaction from the txgraph? Even if a direct parent is also an indirect ancestor?

    sipa commented at 0:05 am on September 18, 2024:
    No, that’s impossible (A->B->C is indistinguishable from (A->B->C,A->C)); the goal is just computing a minimal set of dependencies whose corresponding ancestry matches the DepGraph (and this will necessarily be a subset of the real direct parents).

    glozow commented at 0:28 am on September 18, 2024:
    thought so… how is it intended to be used?

    sipa commented at 5:50 pm on September 18, 2024:
    I have expanded the commit message.
  38. sipa commented at 0:32 am on September 18, 2024: member

    @glozow It’s already used in two places in this PR.

    Maybe I should clarify in the PR description that this is just a generic utility feature, not something the next layer up will need per se (AFAIK). But because I was introducing a second place where it’s invoked, and it’s already used somewhere in the test, I felt it made sense to turn it into a proper DepGraph function, with proper tests.

  39. glozow commented at 0:44 am on September 18, 2024: member
    Sorry, I don’t know how I missed the calls within this PR :facepalm: I think I was grepping only on the first commit.
  40. sipa force-pushed on Sep 18, 2024
  41. sipa commented at 5:58 pm on September 18, 2024: member

    I addressed most of the feedback received. Thanks @instagibbs and @glozow!

    In addition, the DepGraphFormatter::Unser logic was changed significantly. The result is simpler to follow I think (doesn’t involve adding a transaction once and then undoing the insertion and redoing it once the position is known), and reduces the complexity from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2)$ in the process. For this, the commits were reordered, and their commit messages expanded to explain the rationale. @instagibbs I have added your testing of re-adding in clusterlin_cluster_serialization as a separate commit with some minor changes (mostly to keep the result efficient).

  42. instagibbs commented at 7:01 pm on September 18, 2024: member

    looked at git range-diff master 9f67d6dcdcf7e9c75a93ad78f8902537599f4f6d 4476915bc76b48eff9b5e67fd6bd7647cc12793f

    LGTM, though I haven’t re-validated the serializer, trusting the tests of tests. I want to think more about the fuzz targets to see if there are any other easy pickups in coverage.

  43. in src/test/fuzz/cluster_linearize.cpp:262 in 0e9af35e49 outdated
    273+    DepGraph depgraph_batch(cluster);
    274+    SanityCheck(depgraph_batch);
    275+    // Read (parent, child) and (parents, child) pairs, and add them to the cluster and depgraph.
    276+    std::vector<std::pair<ClusterIndex, ClusterIndex>> deps_list;
    277+    LIMITED_WHILE(provider.remaining_bytes() > 0, TestBitSet::Size()) {
    278+        const auto parents_mask = provider.ConsumeIntegralInRange<uint64_t>(1, (uint64_t{1} << num_tx) - 1);
    


    instagibbs commented at 7:11 pm on September 18, 2024:

    I think this gives additional coverage since no deps added in a AddDependencies call is allowed

    0        const auto parents_mask = provider.ConsumeIntegralInRange<uint64_t>(0, (uint64_t{1} << num_tx) - 1);
    

    sipa commented at 10:12 pm on September 18, 2024:
    Indeed, done.
  44. in src/cluster_linearize.h:142 in 0e9af35e49 outdated
    136@@ -164,21 +137,29 @@ class DepGraph
    137         return new_idx;
    138     }
    139 
    140-    /** Modify this transaction graph, adding a dependency between a specified parent and child.
    141+    /** Modify this transaction graph, adding multiple parents to a specified child.
    142+     *
    143+     * The behavior is identical to repeatedly calling AddDependency for each element of parents,
    


    instagibbs commented at 7:15 pm on September 18, 2024:
    AddDependency doesn’t exist anymore

    sipa commented at 10:12 pm on September 18, 2024:
    Gone.
  45. in src/cluster_linearize.h:89 in a9d5f0619e outdated
    111@@ -100,21 +112,34 @@ class DepGraph
    112      *
    113      * Complexity: O(N^2) where N=depgraph.TxCount().
    114      */
    115-    DepGraph(const DepGraph<SetType>& depgraph, Span<const ClusterIndex> mapping) noexcept : DepGraph(depgraph.TxCount())
    116+    DepGraph(const DepGraph<SetType>& depgraph, Span<const ClusterIndex> mapping, ClusterIndex pos_range) noexcept : entries(pos_range)
    


    instagibbs commented at 7:37 pm on September 18, 2024:
    Could use some documentation on how to use mapping and pos_range, with holes it’s not immediately obvious.

    sipa commented at 10:08 pm on September 18, 2024:
    I have added some documentation and Assumes.
  46. in src/cluster_linearize.h:127 in a9d5f0619e outdated
    111@@ -100,21 +112,34 @@ class DepGraph
    112      *
    113      * Complexity: O(N^2) where N=depgraph.TxCount().
    114      */
    115-    DepGraph(const DepGraph<SetType>& depgraph, Span<const ClusterIndex> mapping) noexcept : DepGraph(depgraph.TxCount())
    116+    DepGraph(const DepGraph<SetType>& depgraph, Span<const ClusterIndex> mapping, ClusterIndex pos_range) noexcept : entries(pos_range)
    117     {
    118-        Assert(mapping.size() == depgraph.TxCount());
    


    instagibbs commented at 8:09 pm on September 18, 2024:
    is this not still true?

    sipa commented at 10:08 pm on September 18, 2024:
    No; mapping.size() should be equal to depgraph.PositionRange() though, which I’ve clarified and added an Assume for.
  47. in src/cluster_linearize.h:134 in a9d5f0619e outdated
    133             SetType parents;
    134             for (auto j : depgraph.GetReducedParents(i)) parents.Set(mapping[j]);
    135             AddDependencies(parents, mapping[i]);
    136         }
    137+        // Verify that the provided pos_range was correct (no unused positions at the end).
    138+        Assume(pos_range == 0 || m_used[pos_range - 1]);
    


    instagibbs commented at 8:14 pm on September 18, 2024:

    if not erroneous, I think a bit easier to read

    0        Assume((m_used.None() && pos_range == 0) || m_used.Last() == pos_range - 1);
    

    sipa commented at 10:11 pm on September 18, 2024:
    I made it even more explicit with Assume(m_used.None() ? (pos_range == 0) : (pos_range == m_used.Last() + 1));.
  48. in src/test/util/cluster_linearize.h:281 in a9d5f0619e outdated
    280+                diff %= (SetType::Size() - reordering.size());
    281+                SetType holes = SetType::Fill(SetType::Size());
    282+                for (auto pos : reordering) holes.Reset(pos);
    283+                for (auto pos : holes) {
    284+                    if (diff == 0) {
    285+                        reordering.push_back(pos);
    


    instagibbs commented at 8:17 pm on September 18, 2024:
    Haven’t looked close enough to the new format, but pushing another element to reordering after hitting SetType::Size already seems to make the DepGraph construction later odd? Couldn’t this make the mapping argument too large?

    sipa commented at 10:11 pm on September 18, 2024:
    reordering has a size equal to the number of transactions (excluding holes) in the reconstructed DepGraph, and can indeed not exceed SetType::Size() (this is also asserted above). However, total_size does include holes, and can reach SetType::Size() (that’s exactly what this branch is for, and it does not increment total_size anymore).

    instagibbs commented at 1:36 pm on September 19, 2024:
    Ah, missed the line where it was being set to diff.
  49. sipa force-pushed on Sep 18, 2024
  50. sipa commented at 10:13 pm on September 18, 2024: member

    @instagibbs I noticed that your extending of clusterlin_cluster_serialization makes the fuzz test 10x slower (when sanitizers are enabled), and I don’t understand why.

    EDIT: wiped, re-cmaked, recompiled, and the slowdown is gone.

  51. DrahtBot added the label CI failed on Sep 18, 2024
  52. sipa force-pushed on Sep 18, 2024
  53. sipa force-pushed on Sep 18, 2024
  54. DrahtBot removed the label CI failed on Sep 19, 2024
  55. instagibbs approved
  56. instagibbs commented at 1:46 pm on September 19, 2024: member

    ACK aab53ddcd8fcbc3c0be0da9383f8e06abe5badda

    reviewed via git range-diff master 4476915bc76b48eff9b5e67fd6bd7647cc12793f aab53ddcd8fcbc3c0be0da9383f8e06abe5badda

    Fuzz coverage seems good now, with additional documentation/asserts where requested. I didn’t dive deep into the new un-serializer logic.

    Did you figure out the slowdown? I didn’t reproduce locally.

  57. in src/test/util/cluster_linearize.h:272 in 40224019cd outdated
    263@@ -264,9 +264,23 @@ void SanityCheck(const DepGraph<SetType>& depgraph)
    264         for (ClusterIndex j = 0; j < depgraph.TxCount(); ++j) {
    265             assert(depgraph.Ancestors(i)[j] == depgraph.Descendants(j)[i]);
    266         }
    267+        // No transaction is a parent or child of itself.
    268+        auto parents = depgraph.GetReducedParents(i);
    269+        auto children = depgraph.GetReducedChildren(i);
    270+        assert(!parents[i]);
    271+        assert(!children[i]);
    272+        // Parents of a transactions do not have ancestors inside those parents (except itself).
    


    glozow commented at 11:12 am on September 20, 2024:

    nit in 40224019cd6e8259118412529c554d8b4a260130

    0        // Parents of a transaction do not have ancestors inside those parents (except itself).
    

    sipa commented at 11:29 pm on September 20, 2024:
    Fixed.
  58. in src/cluster_linearize.h:129 in aab53ddcd8 outdated
    161@@ -151,39 +162,120 @@ class DepGraph
    162     /** Get the descendants of a given transaction i. Complexity: O(1). */
    163     const SetType& Descendants(ClusterIndex i) const noexcept { return entries[i].descendants; }
    164 
    165-    /** Add a new unconnected transaction to this transaction graph (at the end), and return its
    166-     *  ClusterIndex.
    167+    /** Add a new unconnected transaction to this transaction graph (in the first available
    168+     *  position), and return its ClusterIndex.
    


    glozow commented at 8:58 pm on September 20, 2024:

    Ok, so iiuc, if you add + remove + add + remove the same number of transactions, the depgraph isn’t going to get larger and larger; the new transactions can take the place of the holes.

    Edit: and can see that it is tested in clusterlin_cluster_serialization


    sipa commented at 11:30 pm on September 20, 2024:
    Indeed, and DepGraph itself is responsible for allocating available positions.
  59. glozow commented at 9:12 pm on September 20, 2024: member
    Read through the code and it makes sense to me, but haven’t tested much. It would be nice to have a fuzz coverage report? @dergoegge
  60. sipa force-pushed on Sep 20, 2024
  61. sipa commented at 11:30 pm on September 20, 2024: member
    @instagibbs I deleted by build directory, redid the cmake generation, recompiled, and the slowdown was gone. Weird.
  62. in src/test/util/cluster_linearize.h:168 in 4f5916f956 outdated
    171-                if (idx > *(rebuilt_order.end() - 1 - insert_distance)) break;
    172-                ++insert_distance;
    173-            }
    174-            rebuilt_order.insert(rebuilt_order.end() - insert_distance, idx);
    175-            s << VARINT(diff + insert_distance);
    176+            // The new transaction is be inserted N positions back from the end of the cluster.
    


    sdaftuar commented at 3:27 pm on September 21, 2024:
    nit: Should say “is to be inserted”?

    sipa commented at 6:34 pm on October 1, 2024:
    Indeed. Will address if I retouch.

    sipa commented at 5:53 pm on October 7, 2024:
    Done.
  63. sipa commented at 12:37 pm on September 23, 2024: member

    @glozow @instagibbs It occurs to me that with the change in this PR, the Cluster will really only be used in tests anymore, as DepGraph objects can and will be built incrementally using AddTransaction, AddDependencies, and RemoveTransactions.

    Because of that, I think it would be simpler to get rid of Cluster entirely, and reformulate the tests to all operate on DepGraph directly. I can do this in this PR, or as a follow-up. What do you think?

    EDIT: went ahead and did that.

  64. sipa force-pushed on Sep 23, 2024
  65. sipa commented at 3:58 pm on September 23, 2024: member
    I have replaced the clusterlin_add_dependency and clusterlin_cluster_serialization tests with a new clusterlin_depgraph_sim test that performs an arbitrary sequence of AddTransaction, AddDependencies, and RemoveTransactions calls, and compares the final result with a simulated version. With that, Cluster was easy to delete.
  66. in src/test/fuzz/cluster_linearize.cpp:243 in de688caf29 outdated
    240@@ -241,103 +241,149 @@ std::vector<ClusterIndex> ReadLinearization(const DepGraph<BS>& depgraph, SpanRe
    241 
    242 } // namespace
    243 
    


    instagibbs commented at 4:22 pm on September 30, 2024:

    clusterlin: merge several DepGraph fuzz tests into simulation test

    Seems to only merge 2 fwiw, might just enumerate them?


    sipa commented at 9:37 pm on September 30, 2024:

    Changed to:

    clusterlin: merge two DepGraph fuzz tests into simulation test

    This combines the clusterlin_add_dependency and clusterlin_cluster_serialization fuzz tests into a single clusterlin_depgraph_sim fuzz test. This tests starts from an empty DepGraph and performs a arbitrary number of AddTransaction, AddDependencies, and RemoveTransactions operations on it, and compares the resulting state with a naive reimplementation.

  67. in src/test/fuzz/cluster_linearize.cpp:286 in de688caf29 outdated
    341-FUZZ_TARGET(clusterlin_cluster_serialization)
    342-{
    343-    // Verify that any graph of transactions has its ancestry correctly computed by DepGraph, and
    344-    // if it is a DAG, that it can be serialized as a DepGraph in a way that roundtrips. This
    345-    // guarantees that any acyclic cluster has a corresponding DepGraph serialization.
    346+    /** Read any set of transactions from the provider. */
    


    instagibbs commented at 4:25 pm on September 30, 2024:
    0    /** Read any set of transactions from the provider including empty slots. */
    

    sipa commented at 9:39 pm on September 30, 2024:

    Changed to:

    /** Read any set of transactions from the provider (including unused positions). */

  68. in src/test/fuzz/cluster_linearize.cpp:256 in de688caf29 outdated
    277+    DepGraph<TestBitSet> real;
    278+    /** Simulated DepGraph (sim[i] is std::nullopt if position i does not exist; otherwise,
    279+     *  sim[i]->first is its individual feerate, and sim[i]->second is its set of ancestors. */
    280+    std::array<std::optional<std::pair<FeeFrac, TestBitSet>>, TestBitSet::Size()> sim;
    281+    /** The number of non-nullopt position in sim. */
    282+    ClusterIndex num_tx{0};
    


    instagibbs commented at 4:31 pm on September 30, 2024:
    mu-nit: s/num_tx/sim_num_tx/

    sipa commented at 9:40 pm on September 30, 2024:
    Done. I have also added an actual counting of the number of transactions in real at the end, and an assert to compare it with num_tx_sim.
  69. in src/test/fuzz/cluster_linearize.cpp:347 in de688caf29 outdated
    398+                if (!sim[i].has_value()) {
    399+                    assert(idx == i);
    400+                    break;
    401+                }
    402+            }
    403+            // Update sim.
    


    instagibbs commented at 4:38 pm on September 30, 2024:

    In case somehow the sim had no non-empty slots above and misses the idx == i assert

    0            // Update sim.
    1            assert(!sim[i].has_value());
    

    sipa commented at 9:21 pm on September 30, 2024:
    i is not in scope here anymore.

    instagibbs commented at 0:47 am on October 1, 2024:
    d’oh
  70. instagibbs approved
  71. instagibbs commented at 4:52 pm on September 30, 2024: member

    reACK 7051fcdda6fdc95677a34d5c14321d7580eceed8

    via git range-diff master aab53ddcd8fcbc3c0be0da9383f8e06abe5badda 7051fcdda6fdc95677a34d5c14321d7580eceed8

    New fuzzer harness is a nice generalization of what existed prior.

  72. sipa force-pushed on Sep 30, 2024
  73. sipa force-pushed on Sep 30, 2024
  74. sipa force-pushed on Sep 30, 2024
  75. sipa force-pushed on Sep 30, 2024
  76. sipa commented at 10:33 pm on September 30, 2024: member

    Sorry for the repeated force pushes; I made a change in the wrong commit, and then made some minor improvements to the new fuzz too.

    Absent non-nitty review that needs addressing, I don’t intend to make change here anymore.

  77. instagibbs commented at 0:48 am on October 1, 2024: member

    reACK 2b21aebd9754c503bac12d1dbf566b3aa27451e8

    via git range-diff master 7051fcdda6fdc95677a34d5c14321d7580eceed8 2b21aebd9754c503bac12d1dbf566b3aa27451e8

  78. in src/cluster_linearize.h:687 in 8f44c6a1f8 outdated
    728@@ -649,7 +729,8 @@ class SearchCandidateFinder
    729             m_original_to_sorted[m_sorted_to_original[i]] = i;
    


    sdaftuar commented at 3:32 pm on October 1, 2024:
    Just to verify my understanding: I was wondering why this loop was just up to depgraph.TxCount() and not depgraph.PositionRange(). I think this is ok because when we instantiate a new DepGraph using a reordering, we only look at the mapping positions for entries that are in m_used, so it doesn’t matter how the unused entries are mapped at all. Is that correct?

    sipa commented at 6:42 pm on October 1, 2024:

    Indeed. I think this may actually have been an oversight, but but it is correct: only the first depgraph.TxCount() positions within m_sorted_to_original are actually ever accessed (because the holes are all moved to the end), and similarly, only the used positions within m_original_to_sorted matter.

    This made me realize that this code can be improved slightly. Will do if I retouch:

     0--- a/src/cluster_linearize.h
     1+++ b/src/cluster_linearize.h
     2@@ -668,27 +668,22 @@ public:
     3      */
     4     SearchCandidateFinder(const DepGraph<SetType>& depgraph, uint64_t rng_seed) noexcept :
     5         m_rng(rng_seed),
     6-        m_sorted_to_original(depgraph.PositionRange()),
     7+        m_sorted_to_original(depgraph.TxCount()),
     8         m_original_to_sorted(depgraph.PositionRange())
     9     {
    10         // Determine reordering mapping, by sorting by decreasing feerate. All unused positions
    11         // are placed at the end.
    12-        ClusterIndex good_pos{0}, bad_pos{depgraph.PositionRange()};
    13-        for (ClusterIndex i = 0; i < depgraph.PositionRange(); ++i) {
    14-            if (depgraph.Positions()[i]) {
    15-                m_sorted_to_original[good_pos++] = i;
    16-            } else {
    17-                m_sorted_to_original[--bad_pos] = i;
    18-            }
    19+        ClusterIndex good_pos{0};
    20+        for (auto i : depgraph.Positions()) {
    21+            m_sorted_to_original[good_pos++] = i;
    22         }
    23-        Assume(good_pos == bad_pos);
    24         std::sort(m_sorted_to_original.begin(), m_sorted_to_original.begin() + good_pos, [&](auto a, auto b) {
    25             auto feerate_cmp = depgraph.FeeRate(a) <=> depgraph.FeeRate(b);
    26             if (feerate_cmp == 0) return a < b;
    27             return feerate_cmp > 0;
    28         });
    29         // Compute reverse mapping.
    30-        for (ClusterIndex i = 0; i < depgraph.TxCount(); ++i) {
    31+        for (ClusterIndex i = 0; i < good_pos; ++i) {
    32             m_original_to_sorted[m_sorted_to_original[i]] = i;
    33         }
    34         // Compute reordered dependency graph.
    

    sipa commented at 5:51 pm on October 7, 2024:
    I have incorporated this change.
  79. in src/test/util/cluster_linearize.h:305 in 8f44c6a1f8 outdated
    306+    // Verify Positions and PositionRange consistency.
    307+    ClusterIndex num_positions{0};
    308+    ClusterIndex position_range{0};
    309+    for (ClusterIndex i : depgraph.Positions()) {
    310+        ++num_positions;
    311+        position_range = std::max(position_range, i + 1);
    


    sdaftuar commented at 5:51 pm on October 1, 2024:
    nit: position_range will always just be updated to i+1 at each loop iteration, so we could simplify this if we wanted to.

    sipa commented at 6:44 pm on October 1, 2024:
    Indeed, will do if I retouch.

    sipa commented at 5:53 pm on October 7, 2024:
    Done.
  80. sdaftuar approved
  81. sdaftuar commented at 6:17 pm on October 1, 2024: member
    Code review ACK, just some nits.
  82. sdaftuar commented at 11:42 pm on October 1, 2024: member
    ACK 2b21aebd9754c503bac12d1dbf566b3aa27451e8
  83. ismaelsadeeq commented at 4:26 pm on October 4, 2024: member
    Concept ACK
  84. in src/test/util/cluster_linearize.h:325 in 57244eb52d outdated
    320+                SetType old = chl_union;
    321+                for (auto j : chl_union) chl_union |= children[j];
    322+                if (old == chl_union) break;
    323+            }
    324+            assert(chl_union == depgraph.Descendants(i));
    325+        }
    


    ismaelsadeeq commented at 11:32 am on October 7, 2024:

    nit: This loop can be modified to be more readable by using using descriptive variable names and some helpful comment on what the inner loops are doing.

     0        for (ClusterIndex i = 0; i < depgraph.TxCount(); ++i) {
     1            // Initialize the set of ancestors with just the current transaction itself.
     2            SetType ancestor_union = SetType::Singleton(i);
     3            // Iteratively add parents and parents of parents to the ancestor set until there is no more ancestor to add.
     4            while (true) {
     5                SetType old_ancestor_union = ancestor_union;
     6                for (auto ancestor: ancestor_union) {
     7                    ancestor_union |= parents[ancestor];
     8                }
     9                // exit when new ancestor was not added
    10                if (old_ancestor_union == ancestor_union) break;
    11            }
    12            assert(ancestor_union == depgraph.Ancestors(i));
    13
    14            // Initialize the set of descendants with just the current transaction itself.
    15            SetType descendant_union = SetType::Singleton(i);
    16            // Iteratively add children and children of children to the descendant set until there is no more to add.
    17            while (true) {
    18                SetType old_descendant_union = descendant_union;
    19                for (auto descendant: descendant_union) {
    20                    descendant_union |= children[descendant];
    21                }
    22                // exit when no new descendant was added
    23                if (old_descendant_union == descendant_union) break;
    24            }
    25            assert(descendant_union == depgraph.Descendants(i));
    26        }
    

    sipa commented at 5:53 pm on October 7, 2024:
    Done, with a few changes.
  85. in src/cluster_linearize.h:115 in 8f44c6a1f8 outdated
    108@@ -97,24 +109,50 @@ class DepGraph
    109     }
    110 
    111     /** Construct a DepGraph object given another DepGraph and a mapping from old to new.
    112+     *
    113+     * @param depgraph   The original DepGraph that is being remapped.
    114+     *
    115+     * @param mapping    A vector, such that mapping[i] gives the position in the new DepGraph
    


    ismaelsadeeq commented at 2:37 pm on October 7, 2024:

    nit maybe calling mapping a container will be better.

    0      *@param mapping A container, such that mapping[i] gives the position in the new DepGraph
    

    sipa commented at 5:52 pm on October 7, 2024:
    I wouldn’t call a Span a container; I have changed it to just saying “A Span such that …”.

    ismaelsadeeq commented at 5:59 pm on October 7, 2024:
    much better 👍🏾
  86. ismaelsadeeq commented at 3:33 pm on October 7, 2024: member

    first pass light code review ACK 2b21aebd9754c503bac12d1dbf566b3aa27451e8

    Mostly trying to understand what the code is doing!

    comments are just nits

  87. in src/cluster_linearize.h:191 in 57244eb52d outdated
    183@@ -184,6 +184,48 @@ class DepGraph
    184         }
    185     }
    186 
    187+    /** Compute the (reduced) set of parents of node i in this graph.
    188+     *
    189+     * This returns the minimal subset of the parents of i whose ancestors together equal all of
    190+     * i's ancestors (unless i is part of a cycle of dependencies). Note that DepGraph does not
    191+     * store the actual set of parents.
    


    ismaelsadeeq commented at 3:40 pm on October 7, 2024:

    nit: can go further with the explanation in the comments with something like

    0     * i's ancestors (unless i is part of a cycle of dependencies). Note that DepGraph does not
    1     * store set of parents; it has to be reduced from ancestor set.
    

    for both the parent and child reducing methods.


    sipa commented at 5:52 pm on October 7, 2024:
    I have phrased it as “this information has to be inferred from the ancestor sets” (and similar for GetReducedChildren).
  88. clusterlin: abstract out DepGraph::GetReduced{Parents,Children}
    A fuzz test already relies on these operations, and a future commit will need
    the same logic too. Therefore, abstract them out into proper member functions,
    with proper testing.
    5901cf7100
  89. clusterlin: rework DepGraphFormatter::Unser
    This commit does not change the serialization format. Its purpose is making a
    few changes already in order to reduce the diff size of the later commit that
    introduces support for holes in DepGraph.
    
    The previous approach was to immediately construct a transaction as soon as its
    feerate was known in a preliminary position, and then undo that, and place it
    in the correct position once the position information is known (such that a
    deserialization error in between would not result in an inconsistent state).
    
    The new approach is to delay the actual transaction creation until all its
    information is known, avoiding the need to undo and redo. This requires a
    different means of determining whether dependencies are redundant, but that has
    the advantage that a later commit can apply all dependencies at once, reducing
    the complexity of deserialization.
    eaab55ffc8
  90. clusterlin: simplify DepGraphFormatter::Ser
    This does not change the serialization format.
    
    It turns out that it is unnecessary to keep track of the order of transactions
    in the so-far reconstructed DepGraph to decide how far from the end to insert
    a new transaction.
    abf50649d1
  91. clusterlin: make DepGraph::AddDependency support multiple dependencies at once
    This changes DepGraph::AddDependency into DepGraph::AddDependencies, which takes
    in a single child, but a set of parent transactions, making them all dependencies
    at once.
    
    This is important for performance. N transactions can have O(N^2) parents combined,
    so constructing a full DepGraph using just AddDependency (which is O(N) on its own)
    could take O(N^3) time, while doing the same with AddDependencies (also O(N) on its
    own) only takes O(N^2).
    
    Notably, this matters for DepGraphFormatter::Unser, which goes from O(N^3) to O(N^2).
    
    Co-Authored-By: Greg Sanders <gsanders87@gmail.com>
    75b5d42419
  92. clusterlin: add DepGraph::RemoveTransactions and support for holes in DepGraph
    This commits introduces support in DepGraph for the transaction positions to be
    non-continuous. Specifically, it adds:
    * DepGraph::RemoveTransactions which removes 0 or more positions from a DepGraph.
    * DepGraph::Positions() to get a set of which positions are in use.
    * DepGraph::PositionRange() to get the highest used position in a DepGraph + 1.
    
    In addition, it extends the DepGraphFormatter format to support holes in a
    compatible way (it serializes non-holey DepGraphs identically to the old code,
    and deserializes them the same way)
    0606e66fdb
  93. clusterlin: merge two DepGraph fuzz tests into simulation test
    This combines the clusterlin_add_dependency and clusterlin_cluster_serialization
    fuzz tests into a single clusterlin_depgraph_sim fuzz test. This tests starts
    from an empty DepGraph and performs a arbitrary number of AddTransaction,
    AddDependencies, and RemoveTransactions operations on it, and compares the
    resulting state with a naive reimplementation.
    1c24c62510
  94. clusterlin: remove Cluster type 0b3ec8c59b
  95. sipa force-pushed on Oct 7, 2024
  96. DrahtBot added the label CI failed on Oct 7, 2024
  97. DrahtBot removed the label CI failed on Oct 7, 2024
  98. instagibbs approved
  99. instagibbs commented at 2:27 pm on October 8, 2024: member

    reACK 0b3ec8c59b2b6d598531cd4e688eb276e597c825

    via git range-diff master 2b21aebd9754c503bac12d1dbf566b3aa27451e8 0b3ec8c59b2b6d598531cd4e688eb276e597c825

  100. DrahtBot requested review from sdaftuar on Oct 8, 2024
  101. DrahtBot requested review from ismaelsadeeq on Oct 8, 2024
  102. ismaelsadeeq commented at 3:02 pm on October 8, 2024: member
    reACK 0b3ec8c59b2b6d598531cd4e688eb276e597c825
  103. sdaftuar commented at 10:42 pm on October 8, 2024: member
    reACK 0b3ec8c59b2b6d598531cd4e688eb276e597c825
  104. sdaftuar approved
  105. sdaftuar approved
  106. in src/cluster_linearize.h:711 in 0606e66fdb outdated
    710-        m_todo(SetType::Fill(depgraph.TxCount()))
    711+        m_original_to_sorted(depgraph.PositionRange())
    712     {
    713-        // Determine reordering mapping, by sorting by decreasing feerate.
    714-        std::iota(m_sorted_to_original.begin(), m_sorted_to_original.end(), ClusterIndex{0});
    715+        // Determine reordering mapping, by sorting by decreasing feerate. Unusued positions are
    


    glozow commented at 2:10 pm on October 10, 2024:
    tiny typo unusued
  107. in src/cluster_linearize.h:22 in 0b3ec8c59b
    18@@ -19,14 +19,6 @@
    19 
    20 namespace cluster_linearize {
    21 
    22-/** Data type to represent cluster input.
    


    glozow commented at 2:21 pm on October 10, 2024:
    Nice, no more naming clash for Cluster in my mind
  108. glozow commented at 2:37 pm on October 10, 2024: member
    ACK 0b3ec8c59b2b6d598531cd4e688eb276e597c825, reviewed range-diff from aab53ddcd8fcbc3c0be0da9383f8e06abe5badda and clusterlin_depgraph_sim
  109. glozow merged this on Oct 10, 2024
  110. glozow closed this on Oct 10, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-31 03:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me