Make TxGraph fuzz tests more deterministic #32191

pull sipa wants to merge 2 commits into bitcoin:master from sipa:202504_txgraph_deterministic changing 2 files +102 −76
  1. sipa commented at 8:36 pm on April 1, 2025: member

    Part of cluster mempool: #30289

    The implicit transaction ordering for transactions in a TxGraphImpl is defined by:

    1. higher chunk feerate first
    2. lower Cluster* object pointer first
    3. lower position within cluster linearization first.

    Number (2) is not deterministic, as it intricately depends on the heap allocation algorithm. Fix this by giving each Cluster a unique uint64_t m_sequence value, and sorting by those instead.

    The second commit then uses this new approach to optimize GroupClusters a bit more, avoiding some repeated checks and dereferences, by making a local copy of the involved sequence numbers.

    Thanks to @dergoegge for pointing this out.

  2. DrahtBot commented at 8:36 pm on April 1, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32191.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK instagibbs, marcofleon

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32263 (cluster mempool: add TxGraph work controls by sipa)
    • #31553 (cluster mempool: add TxGraph reorg functionality by sipa)
    • #31444 (cluster mempool: add txgraph diagrams/mining/eviction by sipa)
    • #28676 ([WIP] Cluster mempool implementation by sdaftuar)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. in src/txgraph.cpp:253 in e58b825305 outdated
    248@@ -244,6 +249,8 @@ class TxGraphImpl final : public TxGraph
    249     ClusterSet m_main_clusterset;
    250     /** The staging ClusterSet, if any. */
    251     std::optional<ClusterSet> m_staging_clusterset;
    252+    /** Next sequence number to assign to created Clusters. */
    253+    uint64_t m_sequence_counter{0};
    


    instagibbs commented at 5:11 pm on April 2, 2025:

    light pref

    0    uint64_t m_next_sequence_counter{0};
    

    sipa commented at 7:27 pm on April 2, 2025:
    Done.
  4. in src/txgraph.cpp:1204 in 631244831b outdated
    1203@@ -1205,7 +1204,8 @@ void TxGraphImpl::GroupClusters(int level) noexcept
    1204         // the partitions their Clusters are in.
    


    instagibbs commented at 6:24 pm on April 2, 2025:
    comment above this line should say an_deps

    sipa commented at 7:28 pm on April 2, 2025:
    Indeed, done (and elsewhere).
  5. in src/txgraph.cpp:1207 in 631244831b outdated
    1203@@ -1205,7 +1204,8 @@ void TxGraphImpl::GroupClusters(int level) noexcept
    1204         // the partitions their Clusters are in.
    1205         Cluster* last_chl_cluster{nullptr};
    1206         PartitionData* last_partition{nullptr};
    1207-        for (const auto& [par, chl] : clusterset.m_deps_to_add) {
    1208+        for (const auto& [dep, seq] : an_deps) {
    


    instagibbs commented at 6:27 pm on April 2, 2025:
    0        for (const auto& [dep, _] : an_deps) {
    

    sipa commented at 7:28 pm on April 2, 2025:
    Done.
  6. in src/txgraph.cpp:1209 in 631244831b outdated
    1203@@ -1205,7 +1204,8 @@ void TxGraphImpl::GroupClusters(int level) noexcept
    1204         // the partitions their Clusters are in.
    1205         Cluster* last_chl_cluster{nullptr};
    1206         PartitionData* last_partition{nullptr};
    1207-        for (const auto& [par, chl] : clusterset.m_deps_to_add) {
    1208+        for (const auto& [dep, seq] : an_deps) {
    1209+            auto [par, chl] = dep;
    1210             auto par_cluster = FindCluster(par, level);
    


    instagibbs commented at 6:37 pm on April 2, 2025:
    par_cluster and chl_cluster must both not be nulltptr, correct now that we’re gating an_deps construction on both clusters existing? Conditional few lines below should never fire.

    sipa commented at 7:28 pm on April 2, 2025:
    Right, done!
  7. in src/txgraph.cpp:1238 in 631244831b outdated
    1248             // Find all dependencies whose child Cluster is Cluster i, and annotate them with rep.
    1249-            while (deps_it != clusterset.m_deps_to_add.end()) {
    1250-                auto [par, chl] = *deps_it;
    1251+            while (deps_it != an_deps.end()) {
    1252+                auto [par, chl] = deps_it->first;
    1253                 auto chl_cluster = FindCluster(chl, level);
    


    instagibbs commented at 6:39 pm on April 2, 2025:
    Assume() FindCluster succeeded

    sipa commented at 7:28 pm on April 2, 2025:
    Done.

    instagibbs commented at 7:34 pm on April 2, 2025:
    I think this got missed

    sipa commented at 7:41 pm on April 2, 2025:
    Now actually done.
  8. instagibbs commented at 6:57 pm on April 2, 2025: member

    looks good, just a few cleanups suggested 631244831ba273fe72c6803f531bc344d0bf1579

    confirmed that no sorts remain based on pointers

  9. sipa force-pushed on Apr 2, 2025
  10. in src/txgraph.cpp:1203 in 3796c16550 outdated
    1201             partition_data[i].rank = 0;
    1202         }
    1203 
    1204-        // Run through all parent/child pairs in m_deps_to_add, and union the
    1205-        // the partitions their Clusters are in.
    1206+        // Run through all parent/child pairs in m_an_deps, and union the partitions their Clusters
    


    instagibbs commented at 7:31 pm on April 2, 2025:
    0        // Run through all parent/child pairs in an_deps, and union the partitions their Clusters
    

    sipa commented at 7:41 pm on April 2, 2025:
    Right, fixed.
  11. in src/txgraph.cpp:1217 in 3796c16550 outdated
    1219-            if (par_cluster == nullptr || chl_cluster == nullptr) continue;
    1220             Assume(par != chl);
    1221             if (chl_cluster == last_chl_cluster) {
    1222                 // If the child Clusters is the same as the previous iteration, union with the
    1223-                // tree they were in, avoiding the need for another lookup. Note that m_deps_to_add
    1224+                // tree they were in, avoiding the need for another lookup. Note that m_an_deps
    


    instagibbs commented at 7:32 pm on April 2, 2025:
    0                // tree they were in, avoiding the need for another lookup. Note that an_deps
    

    sipa commented at 7:41 pm on April 2, 2025:
    Fixed.
  12. instagibbs approved
  13. instagibbs commented at 7:36 pm on April 2, 2025: member

    ACK 3796c16550938fc09fa560d7060b3626506ce526

    Couple of typos and one last Assume that’d be nice

    git range-diff master 631244831ba273fe72c6803f531bc344d0bf1579 3796c16550938fc09fa560d7060b3626506ce526

  14. sipa force-pushed on Apr 2, 2025
  15. instagibbs approved
  16. laanwj added the label Tests on Apr 7, 2025
  17. maflcko commented at 1:30 pm on April 11, 2025: member

    Not sure if you just want to make it more determinisic, or fully deterministic, but I ran:

    0git rebase bitcoin-core/master
    1rm -rf ./bld-cmake && cmake -B ./bld-cmake  -DCMAKE_C_COMPILER='clang' -DCMAKE_CXX_COMPILER='clang++;-stdlib=libc++' -DBUILD_FOR_FUZZING=ON    -DCMAKE_CXX_FLAGS='-fprofile-instr-generate -fcoverage-mapping'                && cmake --build ./bld-cmake
    2cargo run --manifest-path ./contrib/devtools/deterministic-fuzz-coverage/Cargo.toml -- $PWD/bld-cmake/ $PWD/../b-c-qa-assets/fuzz_corpora/ txgraph  32
    

    I only ran it on a subset of the files txgraph/00* for a few times, and it printed: The coverage was not deterministic between runs.

    The output had too little context, so I re-created it with more context:

     0$ git diff -U9 --no-index  -- bld-cmake/fuzz_det_cov.show.t0.*.txt
     1diff --git a/bld-cmake/fuzz_det_cov.show.t0.a.txt b/bld-cmake/fuzz_det_cov.show.t0.b.txt
     2index 0f4fcc3a1d..d5e901bd30 100644
     3--- a/bld-cmake/fuzz_det_cov.show.t0.a.txt
     4+++ b/bld-cmake/fuzz_det_cov.show.t0.b.txt
     5@@ -265929,20 +265929,20 @@
     6   ------------------
     7   330|     19|            return (a != nullptr) <=> (b != nullptr);
     8   331|     19|        }
     9   332|       |        // If neither pointer is nullptr, compare the Clusters' sequence numbers.
    10   333|  2.37k|        Assume(a == b || a->m_sequence != b->m_sequence);
    11   ------------------
    12   |  |   97|  3.55k|#define Assume(val) inline_assertion_check<false>(val, __FILE__, __LINE__, __func__, #val)
    13   |  |                                  ^2.37k                             ^2.37k    ^2.37k              ^2.37k
    14   |  |  ------------------
    15-  |  |  |  Branch (97:51): [True: 1.19k, False: 1.17k]
    16-  |  |  |  Branch (97:51): [True: 1.17k, False: 0]
    17+  |  |  |  Branch (97:51): [True: 1.19k, False: 1.18k]
    18+  |  |  |  Branch (97:51): [True: 1.18k, False: 0]
    19   |  |  ------------------
    20   ------------------
    21   334|  2.37k|        return a->m_sequence <=> b->m_sequence;
    22   335|  2.39k|    }
    23   336|       |
    24   337|       |public:
    25   338|       |    /** Construct a new TxGraphImpl with the specified maximum cluster count. */
    26   339|       |    explicit TxGraphImpl(DepGraphIndex max_cluster_count) noexcept :
    27   340|      7|        m_max_cluster_count(max_cluster_count)
    28@@ -266889,23 +266889,23 @@
    29   974|     32|                auto cluster = FindCluster(index, level);
    30   975|     32|                if (cluster != nullptr) PullIn(cluster);
    31                                                       ^30
    32   ------------------
    33   |  Branch (975:21): [True: 30, False: 2]
    34   ------------------
    35   976|     32|            }
    36   977|      5|        }
    37   978|       |        // Group the set of to-be-removed entries by Cluster::m_sequence.
    38-  979|     93|        std::sort(to_remove.begin(), to_remove.end(), [&](GraphIndex a, GraphIndex b) noexcept {
    39-  980|     93|            Cluster* cluster_a = m_entries[a].m_locator[level].cluster;
    40-  981|     93|            Cluster* cluster_b = m_entries[b].m_locator[level].cluster;
    41-  982|     93|            return CompareClusters(cluster_a, cluster_b) < 0;
    42-  983|     93|        });
    43+  979|     94|        std::sort(to_remove.begin(), to_remove.end(), [&](GraphIndex a, GraphIndex b) noexcept {
    44+  980|     94|            Cluster* cluster_a = m_entries[a].m_locator[level].cluster;
    45+  981|     94|            Cluster* cluster_b = m_entries[b].m_locator[level].cluster;
    46+  982|     94|            return CompareClusters(cluster_a, cluster_b) < 0;
    47+  983|     94|        });
    48   984|       |        // Process per Cluster.
    49   985|     28|        std::span to_remove_span{to_remove};
    50   986|     78|        while (!to_remove_span.empty()) {
    51   ------------------
    

    This only shows the symptom. That is, a == b does not hold for the same amount of times, even when running over the same set of files. Not sure why that is, though.

    For reference, the two inputs I ran against:

     0sh-5.2$ base64 ./txgraph/00dc359d7b7edffe613622aa7f592960eb4db15d
     1//9/f///f6wAZIw7AAap5QBkiDsABqnlAAaqq5UABqvTAAauQgAGrJsABq085AatAAAAAAASBrkn
     2AAa6NgAAAAAAAAatPOQGrdcABq5ZAAaYAAcwVQAHOf4AB+Q6AAc7BqybAAatPOQGrdcABq5ZAAaY
     3AAcwVQAHOf4AB+Q6AAc7Pwa2JgAGtsAABrd8AAbRagAAAAAAAAAAEga5JwAGujYABgBSAH84lgAH
     4OUjXBzn+BqqrlQAGq9MABq5CAAasmwAGrQAHOuQABzs/AAc7hQAHCidF9P85lgAHOVUABzn+AAc6
     55AAHOz8ABzvlAAaKq1oABgrTAAauQgAGrJsABgCtAGUA1wAGrlkABpgAAAAAAACpAAAAAAAAAAAA
     6AAAAAAAAAAAAAP////8AAABrAAAAAAAAAAAAAAAAAAAAAAAAAAAA/+oAAAAAAAAAAAAAAAAAAAAA
     7/+oAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC0xwAGtXMABrYmAga2AAAAAAAA+///////1ato
     8cgAAAAAAAAAAAAAAAAAAYBEAAAdg3/////8sAP//
     9
    10sh-5.2$ base64 ./txgraph/0002cb30d28317066ed82c933c477212d1680c53
    11//9/f///f6wAZIw7AAap5QBkiDsABqnlAAaqq5UABqvTAAauQgAGrJsABq085AatAAAAAAASBrkn
    12AAa6NgAGAFIAfzjx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx
    138fE7AAap5QAGqquVAAar0wAGrkoABqybAAYAAAAAAAAAAPHx8fHx8fHx8QAAAAAAAAAAAAAAAAAW
    14AAAADgAAABUAAAASAAAAEwAAACMAAAAcAAAAJgAAADIRAAAAHVXPRlXPRkZVVc/PRlXPRs/PRlVG
    15Ws/PBgAAAAUAAAACAAAAAAAAAAAAABgKAAAAAAAAABAAAAAMAAAACAAAAA8AAAAEAAAADQAAACEA
    16AAAAAAAAAQAAAAkAAAALAAAABwoHRQQANJYABzlVAAaqq5UABqvTAAauQgAGrJsABq085Aat1wAG
    17rlkABpgAnTBVAAc5/gAHOuQABzs/BrYmAAa2wAAGt3wABtFqAAAHAAAAAAASBrknAAa6NgAGAFIA
    18fziWAAc5SNcHOf4ABzrkAAc7PwAHO4UABwonRfT/OJYABzlVAAc5/gAHOuQABzs/AAc75QAGqqta
    19AAar0wAGrkIABqybAAYAAABlrdcABgCumAZZAAAAAAAAqQAAAAAAAAAAAAAAAAAAAAAAAAD/////
    20AAAAawAAAAAAAAAAAAAAAAAAAAAAAAAAAP/qAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAtMcA
    21BrVzAAa2JgIGtgAAAAAAAAAAAAAAACpUaHIAQAAAAAAAAAAAAC4AAAA/AAAAJwAAABEAAAAJAAAA
    22AAAAADwAAAAIAAAAFQAAADkAAAATAAAAEAAAADYAAAACAAAAHQAAADAAAAAAAAAAAABdKQAHXnkA
    23B18AAABKAAdgEQAAB2Df/////ywA//8=
    
  18. sipa commented at 1:41 pm on April 11, 2025: member
    @maflcko Huh, interesting, will investigate. TxGraphImpl does hold a default-constructed FastRandomContext, but those should within the fuzz tests be seeded deterministically now, right?
  19. maflcko commented at 1:49 pm on April 11, 2025: member
    Yes, as long as the random context is not global, it should be fine.
  20. txgraph: compare sequence numbers instead of Cluster* (bugfix)
    This makes fuzz testing more deterministic, by avoiding the (arbitrary) pointer
    value ordering in comparing transactions.
    c72c8d5d45
  21. txgraph: make GroupClusters use partition numbers directly (optimization) 2835216ec0
  22. sipa force-pushed on Apr 11, 2025
  23. sipa commented at 2:44 pm on April 11, 2025: member

    @maflcko Fixed, I think! There was another pointer comparison in the fuzz test itself.

     0+++ b/src/test/fuzz/txgraph.cpp
     1@@ -206,11 +206,14 @@ struct SimTxGraph
     2                 ret.push_back(ptr);
     3             }
     4         }
     5-        // Deduplicate.
     6-        std::sort(ret.begin(), ret.end());
     7-        ret.erase(std::unique(ret.begin(), ret.end()), ret.end());
     8-        // Replace input.
     9-        arg = std::move(ret);
    10+        // Construct deduplicated version in input (do not use std::sort/std::unique for
    11+        // deduplication as it'd rely on non-deterministic pointer comparison).
    12+        arg.clear();
    13+        for (auto ptr : ret) {
    14+            if (std::find(arg.begin(), arg.end(), ptr) == arg.end()) {
    15+                arg.push_back(ptr);
    16+            }
    17+        }
    18     }
    19 };
    
  24. maflcko commented at 4:48 pm on April 11, 2025: member
    Nice. cargo run --manifest-path ./contrib/devtools/deterministic-fuzz-coverage/Cargo.toml -- $PWD/bld-cmake/ $PWD/../b-c-qa-assets/fuzz_corpora/ txgraph 32 from above looks stable now for me.
  25. sipa commented at 4:50 pm on April 11, 2025: member
    @maflcko Great, nice tool. I tried it as well with a hastily-constructed txgraph corpus (after -set_cover_merge=1‘ing it from 30000 to 700 files), and found no issues.
  26. instagibbs commented at 5:44 pm on April 14, 2025: member

    reACK 2835216ec09cc2d86b091824376b15601e7c7b8a

    good catch, didn’t check the fuzzer harness for sorts like I did TxGraph itself…

    git range-diff master 133813c503f2894c36442ec5a34c466d95961f45 2835216ec09cc2d86b091824376b15601e7c7b8a

  27. marcofleon commented at 2:52 pm on April 15, 2025: contributor

    ACK 2835216ec09cc2d86b091824376b15601e7c7b8a

    On master, using an old corpus, AFL++ shows 30% stability. On this PR, after generating a new corpus (~20 cpu hours), AFL++ is showing 93%. I also ran the determinism script several times using the new corpus and it’s passed every time.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-04-16 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me