Cluster linearization: separate tests from tests-of-tests #30605

pull sipa wants to merge 11 commits into bitcoin:master from sipa:202408_cluster_tests changing 1 files +304 −72
  1. sipa commented at 8:48 pm on August 7, 2024: member

    Part of the cluster mempool project: #30289

    The current cluster linearization fuzz tests contain two tests which combine testing of production code with testing of the test code itself:

    • clusterlin_search_finder: establishes the correctness of SearchCandidateFinder by comparing against both SimpleCandidateFinder and ExhaustiveCandidateFinder (which is even more simple than SimpleCandidateFinder). If SimpleCandidateFinder works correctly, then this comparison with ExhaustiveCandidateFinder is redundant. If it isn’t, we ought to find that in a test specific to SimpleCandidateFinder rather than as a side-effect of testing SearchCandidateFinder. Split this functionality out into a new clusterlin_simple_finder.
    • clusterlin_linearize: establishes the correctness of Linearize by comparing against both SimpleLinearize and literally every valid linearization for the cluster. Again, if SimpleLinearize works correctly, then this comparison with all valid linearizations is redundant, and if it isn’t we should find it in a test for SimpleLinearize. Do so by splitting off that functionality into clusterlin_simple_linearize.

    After that, a few general improvements to the affected tests are made (comparing with linearizations and subsets read from the fuzz input, plus a performance improvement).

  2. DrahtBot commented at 8:48 pm on August 7, 2024: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30605.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK ismaelsadeeq, marcofleon

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    No conflicts as of last run.

  3. DrahtBot added the label CI failed on Aug 7, 2024
  4. DrahtBot commented at 10:13 pm on August 7, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/28483985918

    Make sure to run all tests locally, according to the documentation.

    The failure may happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  5. sipa force-pushed on Aug 7, 2024
  6. DrahtBot removed the label CI failed on Aug 8, 2024
  7. glozow added the label Tests on Aug 8, 2024
  8. sipa force-pushed on Sep 4, 2024
  9. DrahtBot added the label Needs rebase on Sep 16, 2024
  10. sipa force-pushed on Sep 17, 2024
  11. sipa commented at 3:24 pm on September 17, 2024: member
    Rebased after merge of #30286.
  12. DrahtBot removed the label Needs rebase on Sep 17, 2024
  13. ismaelsadeeq commented at 7:09 pm on September 24, 2024: member
    Concept ACK
  14. in src/test/fuzz/cluster_linearize.cpp:137 in d5d6839fe7 outdated
    72@@ -74,6 +73,7 @@ class SimpleCandidateFinder
    73                 // transactions that share ancestry with inc so far (which means only connected
    74                 // sets will be considered).
    75                 if (inc_none || inc.Overlaps(m_depgraph.Ancestors(split))) {
    76+                    --iterations_left;
    


    ismaelsadeeq commented at 4:37 pm on September 25, 2024:

    I believe this new behavior is correct, as it reduces iteration_left when work is being done. This aligns with the behavior in SearchCandidateFinder.

    On a side note, I think “iterations” might be a misnomer here. A more accurate term could be “maximum optimization” or “optimizations left”.


    ismaelsadeeq commented at 6:04 pm on September 25, 2024:
    could be in it’s own commit?

    sipa commented at 2:36 am on September 26, 2024:
    I’ve moved it to its own commit. I left the name iterations_left as I think calling it “optimizations” is misleading; most steps don’t optimize anything. It’s true that the iterations are not actually the loop’s iterations anymore, but they’re still very correlated to the number of iterations the algorithm makes.
  15. in src/test/fuzz/cluster_linearize.cpp:562 in d5d6839fe7 outdated
    557+                for (auto t1 : todo) {
    558+                    const auto& a1 = depgraph.Ancestors(t1);
    559+                    for (auto t2 : todo - a1) {
    560+                        const auto& a2 = depgraph.Ancestors(t2);
    561+                        assert(found.feerate >= depgraph.FeeRate((a1 | a2) & todo));
    562+                    }
    


    ismaelsadeeq commented at 5:11 pm on September 25, 2024:

    This can be in the same loop above?

    Why not just compare with the result of ancestor finder?


    sipa commented at 2:38 am on September 26, 2024:
    I was hoping to do a bit better than comparing with AncestorCandidateFinder (which only tries ancestor sets, not unions of two of them), but it’s really not worth the complexity here. I’ve changed it to using AncestorCandidateFinder (here and below), and in a separate commit added an arbitrary subset (from fuzz input) to compare with.
  16. in src/test/fuzz/cluster_linearize.cpp:666 in d5d6839fe7 outdated
    677+                for (auto t1 : todo) {
    678+                    const auto& a1 = depgraph.Ancestors(t1);
    679+                    for (auto t2 : todo - a1) {
    680+                        const auto& a2 = depgraph.Ancestors(t2);
    681+                        assert(found.feerate >= depgraph.FeeRate((a1 | a2) & todo));
    682+                    }
    


    ismaelsadeeq commented at 5:19 pm on September 25, 2024:
    We could wrap this in a helper to be dry, as same code is used above.

    sipa commented at 2:38 am on September 26, 2024:
    You’re right that it shouldn’t be repeated, but I don’t think it’s worth much effort to keep this. I’ve dropped it.
  17. in src/test/fuzz/cluster_linearize.cpp:831 in d5d6839fe7 outdated
    544+
    545+            if (todo.Count() <= 12) {
    546+                // Compare with ExhaustiveCandidateFinder. This quickly gets computationally
    547+                // expensive for large clusters (O(2^n)), so only do it for sufficiently small ones.
    548+                auto exhaustive = exh_finder.FindCandidateSet();
    549+                assert(exhaustive.feerate == found.feerate);
    


    ismaelsadeeq commented at 5:37 pm on September 25, 2024:
    Should we also assert that the transactions in the set are the same?

    sipa commented at 2:39 am on September 26, 2024:
    They cannot be guaranteed to be identical; there may be multiple distinct optimal candidate sets.

    polespinasa commented at 3:46 pm on June 11, 2025:
    Are sets with different sizes but same fee rate considered equally optimal? In case the smaller one “wins” would make sense to assert the size also?

    sipa commented at 7:27 pm on June 11, 2025:
    The FeeRate::operator< and similar operators treat equal-feerate sets as better when they’re smaller. So the == means both feerate and size (and thus fee) equal. Still, there can be multiple equal such sets.

    polespinasa commented at 7:39 pm on June 11, 2025:
    oh right, it’s using FeeFrac to compare. Noted!
  18. in src/test/fuzz/cluster_linearize.cpp:500 in d5d6839fe7 outdated
    493@@ -495,11 +494,93 @@ FUZZ_TARGET(clusterlin_ancestor_finder)
    494 
    495 static constexpr auto MAX_SIMPLE_ITERATIONS = 300000;
    496 
    497+FUZZ_TARGET(clusterlin_simple_finder)
    498+{
    499+    // Verify that SimpleCandidateFinder works as expected by sanity checking the results
    500+    // and comparing with the results from ExhaustiveCandidateFinder. Note that
    


    ismaelsadeeq commented at 5:39 pm on September 25, 2024:
    Maybe indicate that we only compare SimpleCandidateFinder with ExhaustiveCandidateFinder when we optimally found a subset and the cluster size is small.

    sipa commented at 2:58 pm on September 26, 2024:
    I’ve improved the comments a bit.
  19. ismaelsadeeq commented at 5:59 pm on September 25, 2024: member

    Code review bf1b1b09ca4dd18b7848fb1807df8533cf930df7

    This is a nice refactor, really improve the cluster linearize test a lot.

  20. sipa force-pushed on Sep 26, 2024
  21. sipa force-pushed on Sep 26, 2024
  22. ismaelsadeeq commented at 10:43 am on September 27, 2024: member
    Code review ACK 0cae0c487cb8b32b96f888a31e818f2a90f848cc
  23. sipa force-pushed on Sep 28, 2024
  24. in src/test/fuzz/cluster_linearize.cpp:829 in 465469f1c1 outdated
    553+            auto anc = anc_finder.FindCandidateSet();
    554+            assert(anc.feerate <= found.feerate);
    555+
    556+            if (todo.Count() <= 12) {
    557+                // Compare with ExhaustiveCandidateFinder. This quickly gets computationally
    558+                // expensive for large clusters (O(2^n)), so only do it for sufficiently small ones.
    


    ismaelsadeeq commented at 9:56 am on October 4, 2024:
    Is this not $O(N * 2{^N})$

    sipa commented at 11:56 am on October 4, 2024:
    Yeah, it’s $\mathcal{O}(2^N)$ iterations, but overall runtime is indeed $\mathcal{O}(N \cdot 2^N)$ (or, arguably, $\mathcal{O}(N^2 \cdot 2^N)$ even if you take into account that for larger $N$, the bitsets involved need to grow linearly too, so bitset operations themselves are really $\mathcal{O}(N)$ ).

    sipa commented at 3:09 pm on October 4, 2024:
    Fixed.
  25. in src/test/fuzz/cluster_linearize.cpp:844 in 465469f1c1 outdated
    563+
    564+        // Find a topologically valid subset of transactions to remove from the graph.
    565+        auto del_set = ReadTopologicalSet(depgraph, todo, reader);
    566+        // If we did not find anything, use found itself, because we should remove something.
    567+        if (del_set.None()) del_set = found.transactions;
    568+        todo -= del_set;
    


    ismaelsadeeq commented at 10:25 am on October 4, 2024:

    I might be missing something here, so I would appreciate some clarification.

    1. Why don’t we directly remove the candidate set transactions found by the search?
    2. It seems like there could be multiple valid topological sets at a given time. How does it know which set was actually found?

    sipa commented at 12:00 pm on October 4, 2024:

    The goal here is simply continuing the test, so this isn’t just exercising the first candidate found for a linearization, but can hit arbitrary remainders in later iterations too. To that end, we read a topological subset from the fuzzer, which can be anything, but since ReadTopologicalSet may return an empty set, we need to specially deal with that case.

    One possibility would be adding to ReadTopologicalSet a way to force it to return a non-empty set. Instead, the code here uses the found candidate as fallback. This is just a last-resort fallback to avoid ending up in an infinite loop; the choice doesn’t really matter. Arguably, it could just as well be if (del_set.None()) break;.


    ismaelsadeeq commented at 12:33 pm on October 4, 2024:

    Thanks for clarifying, initially I thought the aim is to remove the subset found by the search continuously until there are no transactions left in m_todo.

    Maybe it will be helpful to document the point you gave as comment?


    sipa commented at 3:10 pm on October 4, 2024:
    I have instead added a commit that allows requesting a non-empty subset from ReadTopologicalSubset, and added some clarifying comments about why/why not empty sets are requested in all call sites.
  26. in src/test/fuzz/cluster_linearize.cpp:564 in 560402264f outdated
    558@@ -559,6 +559,10 @@ FUZZ_TARGET(clusterlin_simple_finder)
    559                 auto exhaustive = exh_finder.FindCandidateSet();
    560                 assert(exhaustive.feerate == found.feerate);
    561             }
    562+
    563+            // Compare with a topological set read from the fuzz input.
    564+            auto read_topo = ReadTopologicalSet(depgraph, todo, reader);
    


    ismaelsadeeq commented at 11:02 am on October 4, 2024:
    We can cache the subset and then reuse it here and when finding topological subset to be deleted.

    sipa commented at 12:01 pm on October 4, 2024:
    Sure we could, but I don’t think there is reason to assume that’s necessarily the best choice.
  27. in src/test/fuzz/cluster_linearize.cpp:667 in 560402264f outdated
    662@@ -659,6 +663,10 @@ FUZZ_TARGET(clusterlin_search_finder)
    663             // Compare with AncestorCandidateFinder;
    664             auto anc = anc_finder.FindCandidateSet();
    665             assert(found.feerate >= anc.feerate);
    666+
    667+            // Compare with a topological set read from the fuzz input.
    


    ismaelsadeeq commented at 11:03 am on October 4, 2024:
    Same comment applies here as well.

    sipa commented at 3:11 pm on October 4, 2024:

    And the same response applies.

    The goal of fuzz testing is testing as many possible valid inputs as possible. Fixing the subset being compared with, and the subset being removed is reducing the number of combinations of tested inputs.


    ismaelsadeeq commented at 3:27 pm on October 4, 2024:
    I see your point now. both comments can be resolved.
  28. ismaelsadeeq commented at 11:24 am on October 4, 2024: member

    I reviewed the latest changes in the last push

    Code review 8eb70e3ca467ac66bc1f43e3e0b3338d63806ee5

    Changes since my last review were

    1. Addressed my previous review comments.

    2. An optimization was introduced in the clusterlin_simple_linearize fuzz test that skips to the last non-topological permutation. For example, given transactions [A, B, C, D, E]. if A-B-C are a valid topological ancestor set, and the first permutation results in [A, E, B, C, D], which is not topologically valid, we can skip to [A, E, D, C, B]. This optimization saves iterations, allowing us to increase the maximum dependency graph we can check to 8.

    3. Assert that the found subsets are compared against the first valid topological set, verifying that we achieve better results when optimal.

    4. Compared the optimal linearization to the default chunking of the dependency graph.

    Overall, the changes look good to me. I left some comments. I also have a single question regarding the process of marking topological set as done.

  29. sipa force-pushed on Oct 4, 2024
  30. sipa force-pushed on Oct 4, 2024
  31. ismaelsadeeq approved
  32. ismaelsadeeq commented at 3:47 pm on October 4, 2024: member

    Code review ACK 46b6a67eccb411903deb09af3301c2499d2c0217

    All comments were addressed.

  33. DrahtBot added the label Needs rebase on Oct 10, 2024
  34. sipa force-pushed on Oct 10, 2024
  35. sipa commented at 5:25 pm on October 10, 2024: member
    Rebased after the merge of #30857.
  36. DrahtBot removed the label Needs rebase on Oct 10, 2024
  37. DrahtBot added the label CI failed on Oct 10, 2024
  38. DrahtBot commented at 6:49 pm on October 10, 2024: contributor

    🚧 At least one of the CI tasks failed. Debug: https://github.com/bitcoin/bitcoin/runs/31368623698

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  39. sipa force-pushed on Oct 10, 2024
  40. DrahtBot removed the label CI failed on Oct 10, 2024
  41. ismaelsadeeq commented at 11:16 am on October 15, 2024: member

    re-ACK 30f30bbcaae67d725360c3e7b83207904e1a9b19

    The recent push was rebase on master after https://github.com/bitcoin/bitcoin/pull/30857

  42. marcofleon commented at 5:43 pm on December 12, 2024: contributor
    0echo "oz0QFv7///////8HPRCGEBAQEBA7Ozs7OwYGBgYGd3d3d3d3d5sAAAAAAAAAd3d3d3cGBgYGBgb/BgYGBgYGBgYGCAgICAgICAgICAgICAgICAgICAgICAgIBgYGBgYGBgYGBgY7OzsQEBAQEBAQEBAQEBAQLT09PRA9PT09PRAQEBAQEC0QEBAQLT09PRA9PT09EA==" | base64 --decode > clusterlin_linearize.crash
    1FUZZ=clusterlin_linearize ./clusterlinfuzzbuild/src/test/fuzz/fuzz clusterlin_linearize.crash
    2src/test/fuzz/cluster_linearize.cpp:1069: void clusterlin_linearize_fuzz_target(FuzzBufferType): Assertion `cmp_read >= 0' failed.
    

    Just did some initial fuzzing for now, but I’m looking to do a more thorough review of this soon.

  43. sipa force-pushed on Dec 12, 2024
  44. sipa commented at 7:35 pm on December 12, 2024: member
    @marcofleon Thank you! It was a bug in the fuzz test, comparing the wrong things. Should be fixed.
  45. sipa force-pushed on Mar 3, 2025
  46. DrahtBot added the label Needs rebase on Mar 26, 2025
  47. sipa force-pushed on Mar 27, 2025
  48. sipa commented at 11:37 am on March 27, 2025: member
    Rebased after the merge of #31363.
  49. DrahtBot removed the label Needs rebase on Mar 27, 2025
  50. DrahtBot added the label CI failed on May 17, 2025
  51. DrahtBot removed the label CI failed on May 18, 2025
  52. in src/test/fuzz/cluster_linearize.cpp:678 in 5aa18902de outdated
    673+        if (MAX_SIMPLE_ITERATIONS > (uint64_t{1} << (todo.Count() - 1))) assert(optimal);
    674+
    675+        // Perform quality checks only if SimpleCandidateFinder claims an optimal result.
    676+        if (optimal) {
    677+            // Optimal sets are always connected.
    678+            assert(depgraph.IsConnected(found.transactions));
    


    ismaelsadeeq commented at 1:51 pm on June 10, 2025:

    IIUC FindCandidateSet only adds connected sets and there is is comment in line 74-75 that explains that. so even when not optimal this assertion will hold.

    I’ve been running a version of this commit with the assertion above if optimal and no crash after a long time.


    sipa commented at 9:35 pm on June 10, 2025:
    Nice catch! I’ve added a commit that always performs the check for SimpleCandidateFinder (even if non-optimal).

    sipa commented at 10:06 pm on June 10, 2025:
    Huh, I hit fuzz failures with this if I let it run for a while. Weird, will investigate.

    sipa commented at 2:36 am on June 11, 2025:

    Fixed!

    The exception was that SimpleCandidateFinder::FindCandidateSet initialized best to be the entire set of remaining transactions, which are not necessarily connected. If the search then stops without ever finding a better subset (because max_iterations was hit), the entire set was returned.

  53. in src/test/fuzz/cluster_linearize.cpp:1206 in 50a18d7143 outdated
    1074@@ -1067,6 +1075,12 @@ FUZZ_TARGET(clusterlin_linearize)
    1075         // If SimpleLinearize finds the optimal result too, they must be equal (if not,
    1076         // SimpleLinearize is broken).
    1077         if (simple_optimal) assert(cmp == 0);
    1078+
    1079+        // Compare with a linearization read from the fuzz input.
    1080+        auto read = ReadLinearization(depgraph, reader);
    1081+        auto read_chunking = ChunkLinearization(depgraph, read);
    1082+        auto cmp_read = CompareChunks(chunking, read_chunking);
    


    ismaelsadeeq commented at 2:37 pm on June 10, 2025:

    The fuzz crash reported by @marcofleon in #30605 (comment) occurred because we were comparing the simple chunking with the chunking read from the fuzzer.

    However, in the block, simple_chunking might not be optimal, while chunking from Linearize is optimal. In such cases, the linearization provided by the fuzzer might be better than what is produced by SimpleLinearize in simple_chunking.

    Can be reproduced by appending this diff

     0diff --git a/src/test/fuzz/cluster_linearize.cpp b/src/test/fuzz/cluster_linearize.cpp
     1index df2cc312704..3f3111fa30e 100644
     2--- a/src/test/fuzz/cluster_linearize.cpp
     3+++ b/src/test/fuzz/cluster_linearize.cpp
     4@@ -1079,7 +1079,7 @@ FUZZ_TARGET(clusterlin_linearize)
     5         // Compare with a linearization read from the fuzz input.
     6         auto read = ReadLinearization(depgraph, reader);
     7         auto read_chunking = ChunkLinearization(depgraph, read);
     8-        auto cmp_read = CompareChunks(chunking, read_chunking);
     9+        auto cmp_read = CompareChunks(simple_chunking, read_chunking);
    10         assert(cmp_read >= 0);
    11     }
    12 }
    

    sipa commented at 9:36 pm on June 10, 2025:
    Indeed.
  54. in src/test/fuzz/cluster_linearize.cpp:1198 in d2fc10b645 outdated
    1093@@ -1090,6 +1094,9 @@ FUZZ_TARGET(clusterlin_linearize)
    1094         // If SimpleLinearize finds the optimal result too, they must be equal (if not,
    1095         // SimpleLinearize is broken).
    1096         if (simple_optimal) assert(cmp == 0);
    


    ismaelsadeeq commented at 2:54 pm on June 10, 2025:
    Should we also do more comparison when both are optimal, that the chunking matches, Edit disregard we do it below 👍🏾

    sipa commented at 9:36 pm on June 10, 2025:
    The chunking won’t necessarily match exactly, but the number of elements in the chunking must match if both are optimal.
  55. ismaelsadeeq commented at 3:02 pm on June 10, 2025: member

    reACK d2fc10b64505512a70bed05e7bb21d76c8f748cd

    With some comments, even though some things added here are deleted in #32545 I think this is quite useful and should get in before SFL.

  56. sipa force-pushed on Jun 10, 2025
  57. sipa force-pushed on Jun 11, 2025
  58. ismaelsadeeq approved
  59. ismaelsadeeq commented at 2:24 pm on June 11, 2025: member
    reACK 30c699dea39157631515d867c65a03987bd46cce
  60. sipa commented at 7:22 pm on June 11, 2025: member

    A suggestion from during the review club: abstracting out the std::next_permuation iteration loop into a ExhaustiveLinearize function, which could be done using a similar interface as Linearize and SimpleLinearize (which would then mimick the search algorithm progress SearchCandidateFinder -> SimpleCandidateFinder -> ExhaustiveCandidateFinder).

    EDIT: the problem below doesn’t matter, because it always iterates through all (valid) permutations anyway, so it will always encounter the optimal.


    I’ve thought about this a bit, and it is possible, but not entirely a trivial operation, because contrary to what the current code does (which compares every found permutation individual with the linearization result), it would need an interface that returns a single “found” linearization. However, this is a complication, because found linearization may be incomparable. We know an optimal linearization always exists, which is at least as good or better as every other linearization, but our iterating over permutations may not find the optimal.

    There are two ways of dealing with this:

    • Only replace the “best” found so far in our hypothetical ExhaustiveLinearize whenever a linearization is found that is strictly better than the best so far.
    • Use MergeLinearizations to combine all linearizations when they are incomparable. This would be a bit slower, but may find better results when there aren’t enough iterations. However, it would also complicate the code, and would “depend” on the correctness of MergeLinearization (tested through yet another fuzz test) to convey confidence in the “obvious correctness”.

    Worth doing? Which approach?

  61. sipa force-pushed on Jun 12, 2025
  62. DrahtBot added the label CI failed on Jun 12, 2025
  63. DrahtBot commented at 2:54 pm on June 12, 2025: contributor

    🚧 At least one of the CI tasks failed. Task ARM, unit tests, no functional tests: https://github.com/bitcoin/bitcoin/runs/43981343548 LLM reason (✨ experimental): Compilation error due to a multi-line comment causing a warning treated as error, leading to build failure.

    Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons, for example:

    • Possibly due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

    • A sanitizer issue, which can only be found by compiling with the sanitizer and running the affected test.

    • An intermittent issue.

    Leave a comment here, if you need help tracking down a confusing failure.

  64. sipa commented at 3:14 pm on June 12, 2025: member
    I have abstracted out the std::next_permutation logic into a separate ExhaustiveLinearize() function, and also added a comment that explains how the various functions/classes/tests are related.
  65. in src/test/fuzz/cluster_linearize.cpp:267 in 9877e275b9 outdated
    262+            // linearization so far.
    263+            auto perm_chunking = ChunkLinearization(depgraph, perm_linearization);
    264+            auto cmp = CompareChunks(perm_chunking, chunking);
    265+            // If the diagram is better, or if it is equal but with more chunks (because we
    266+            // prefer minimal chunks), consider this better.
    267+            if (linearization.empty() ||cmp > 0 || (cmp == 0 && perm_chunking.size() > chunking.size())) {
    


    marcofleon commented at 4:53 pm on June 12, 2025:
    nittiest of nits: Missing a space here. Could be fixed in #32545 if this is isn’t touched anymore.

    sipa commented at 10:37 pm on June 14, 2025:
    Fixed.
  66. DrahtBot removed the label CI failed on Jun 12, 2025
  67. marcofleon commented at 6:19 pm on June 12, 2025: contributor

    code review ACK 9877e275b95400e46f336bc34ce41ed68ce43683

    Previously, the test helpers (SimpleCandidateFinder and SimpleLinearize) were being checked for correctness within the same fuzz targets that are testing the production cluster linearization. Separating out these checks into their own targets reduces redundancy and makes the hierarchy of all these tests easier to follow.

    I ran clusterlin_linearize, clusterlin_search_finder, clusterlin_simple_finder, and clusterlin_simple_linearize for ~50 CPU hours each, no issues.

    The added diagram comment at the beginning is helpful. It also looks a lot cleaner in #32545 😎

  68. DrahtBot requested review from ismaelsadeeq on Jun 12, 2025
  69. in src/test/fuzz/cluster_linearize.cpp:23 in 9877e275b9 outdated
    16@@ -17,6 +17,67 @@
    17 #include <vector>
    18 #include <utility>
    19 
    20+/*
    21+ * The tests in this file primarily test the candidate finder classes and linearization algorithms.
    22+ *
    23+ *   <----: the right side is tested in the named test by comparison with the left side
    


    ismaelsadeeq commented at 4:27 pm on June 13, 2025:

    In “clusterlin: add big comment explaning the relation between tests” 9877e275b95400e46f336bc34ce41ed68ce43683

    I like the new comment addition, just the first description is a bit unclear because the line is diagonal

    Maybe this suggestion will be clearer?

     0diff --git a/src/test/fuzz/cluster_linearize.cpp b/src/test/fuzz/cluster_linearize.cpp
     1index 7d0fb5fbec0..8820e3b3eaa 100644
     2--- a/src/test/fuzz/cluster_linearize.cpp
     3+++ b/src/test/fuzz/cluster_linearize.cpp
     4@@ -18,10 +18,11 @@
     5 #include <utility>
     6 
     7 /*
     8- * The tests in this file primarily test the candidate finder classes and linearization algorithms.
     9+ * The tests in this file primarily cover the candidate finder classes and linearization algorithms.
    10  *
    11- *   <----: the right side is tested in the named test by comparison with the left side
    12- *   <<---: right side is implemented using left side
    13+ *   <----: An implementation (at the start of the line --) is tested in the test marked with *
    14+ *          by comparison with other implementations (at the end of the line ->).
    15+ *   <<---: The right side is implemented using the left side.
    16  *
    17  *   +-----------------------+
    18  *   | SearchCandidateFinder | <<---------------------\
    19@@ -38,7 +39,7 @@
    20  *     |                            |                    |        ||
    21  *     |                            |                    |        vv  TEST CODE
    22  *     |                            |                    |
    23- *     |-clusterlin_search_finder   |                    |-clusterlin_linearize
    24+ *     |-clusterlin_search_finder*  |                    |-clusterlin_linearize*
    25  *     |                            |                    |
    26  *     v                            |                    v
    27  *   +-----------------------+      |               +-----------------+
    28@@ -47,7 +48,7 @@
    29  *                  |               |                    |
    30  *                  +---------------/                    |
    31  *                  |                                    |
    32- *                  |-clusterlin_simple_finder           |-clusterlin_simple_linearize
    33+ *                  |-clusterlin_simple_finder*          |-clusterlin_simple_linearize*
    34  *                  v                                    v
    35  *   +---------------------------+                  +---------------------+
    36  *   | ExhaustiveCandidateFinder |                  | ExhaustiveLinearize |
    37abubakarismail@Abubakars-MacBook-Pro ~/D/W/b/bitcoin ((9877e275))> 
    

    sipa commented at 10:37 pm on June 14, 2025:
    Done. I have also integrated the mention of clusterlin_ancestor_finder into the diagram.
  70. DrahtBot requested review from ismaelsadeeq on Jun 13, 2025
  71. ismaelsadeeq commented at 4:28 pm on June 13, 2025: member
    re-ACK 9877e275b95400e46f336bc34ce41ed68ce43683
  72. clusterlin tests: count SimpleCandidateFinder iterations better
    Only count the number of actual new subsets added. If the queue contains
    a work item that completely covers a component, no transaction can be added
    to it without creating a disconnected component. In this case, also don't
    count it as an iteration.
    
    With this, the number of iterations performed by SimpleCandidateFinder is
    bounded by the number of distinct connected topologically-valid subsets of
    the cluster.
    77a432ee70
  73. clusterlin tests: separate testing of Search- and SimpleCandidateFinder
    This separates the existing fuzz test into:
    
    * clusterlin_search_finder: establishes SearchCandidateFinder's correctness using the
                                simpler SimpleCandidateFinder.
    * clusterlin_simple_finder: establishes SimpleCandidateFinder's correctness using the
                                (even) simpler ExhaustiveCandidateFinder.
    
    rather than trying to do both at once.
    a38c38951e
  74. clusterlin tests: make SimpleCandidateFinder always find connected
    Make a small change to guarantee that SimpleCandidateFinder only ever returns
    connected solutions, even when non-optimal. Then test this property.
    10e90f7aef
  75. clusterlin tests: separate testing of SimpleLinearize and Linearize
    The separates the existing fuzz test into:
    
    * clusterlin_linearize: establishes the correctness of Linearize() using the
                            simpler SimpleLinearize() function.
    * clusterlin_simple_linearize: establishes the correctness of SimpleLinearize() by
                            comparing with all valid linearizations computed by
                            std::next_permutation.
    
    rather than combining the first two into a single fuzz test.
    98c1c88b6f
  76. clusterlin tests: optimize clusterlin_simple_linearize
    Whenever a non-topological permutation is encountered, fast forward to the
    last permutation with the same non-topological prefix, skipping over
    potentially many permutations that are non-topological for the same reason.
    
    With that, increase the checking of all permutations to clusters of size 8
    instead of 7.
    6e37824ac3
  77. clusterlin tests: compare with fuzz-provided topological sets 5f92ebee0d
  78. clusterlin tests: compare with fuzz-provided linearizations 94f3e17c33
  79. clusterlin tests: support non-empty ReadTopologicalSubset()
    In several call sites for ReadTopologicalSubset, a non-empty result is
    expected, necessitating a special case at the call site for empty results.
    
    Fix this by adding a bool non_empty argument, which does this special
    casing (more efficiently) inside ReadTopologicalSubset itself.
    da23ecef29
  80. clusterlin tests: verify that chunks are minimal 1fa55a64ed
  81. clusterlin: abstract try-permutations into ExhaustiveLinearize function
    Rather than this exhaustive linearization check happening inline inside
    clusterlin_simple_linearize, abstract it out into a Linearize()-like
    function for clarity.
    
    Note that this isn't exactly a refactor, because the old code would compare the
    found linearization against all (valid) permutations, while the new code instead
    first computes the best linearization from all valid permutations, and then
    compares it with the found one.
    b64e61d2de
  82. clusterlin: add big comment explaning the relation between tests d7fca5c171
  83. sipa force-pushed on Jun 14, 2025
  84. ismaelsadeeq approved
  85. ismaelsadeeq commented at 1:44 pm on June 15, 2025: member
    re-ACK d7fca5c171f450621112457ea6f7c99b2e21d354
  86. DrahtBot requested review from marcofleon on Jun 15, 2025
  87. marcofleon commented at 12:13 pm on June 16, 2025: contributor
    Re ACK d7fca5c171f450621112457ea6f7c99b2e21d354

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-06-19 12:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me