Make TxGraph fuzz tests more deterministic #32191

pull sipa wants to merge 2 commits into bitcoin:master from sipa:202504_txgraph_deterministic changing 2 files +102 −76
  1. sipa commented at 8:36 PM on April 1, 2025: member

    Part of cluster mempool: #30289

    The implicit transaction ordering for transactions in a TxGraphImpl is defined by:

    1. higher chunk feerate first
    2. lower Cluster* object pointer first
    3. lower position within cluster linearization first.

    Number (2) is not deterministic, as it intricately depends on the heap allocation algorithm. Fix this by giving each Cluster a unique uint64_t m_sequence value, and sorting by those instead.

    The second commit then uses this new approach to optimize GroupClusters a bit more, avoiding some repeated checks and dereferences, by making a local copy of the involved sequence numbers.

    Thanks to @dergoegge for pointing this out.

  2. DrahtBot commented at 8:36 PM on April 1, 2025: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32191.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK instagibbs, marcofleon, glozow

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32263 (cluster mempool: add TxGraph work controls by sipa)
    • #31553 (cluster mempool: add TxGraph reorg functionality by sipa)
    • #31444 (cluster mempool: add txgraph diagrams/mining/eviction by sipa)
    • #28676 ([WIP] Cluster mempool implementation by sdaftuar)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. in src/txgraph.cpp:253 in e58b825305 outdated
     248 | @@ -244,6 +249,8 @@ class TxGraphImpl final : public TxGraph
     249 |      ClusterSet m_main_clusterset;
     250 |      /** The staging ClusterSet, if any. */
     251 |      std::optional<ClusterSet> m_staging_clusterset;
     252 | +    /** Next sequence number to assign to created Clusters. */
     253 | +    uint64_t m_sequence_counter{0};
    


    instagibbs commented at 5:11 PM on April 2, 2025:

    light pref

        uint64_t m_next_sequence_counter{0};
    

    sipa commented at 7:27 PM on April 2, 2025:

    Done.

  4. in src/txgraph.cpp:1204 in 631244831b outdated
    1203 | @@ -1205,7 +1204,8 @@ void TxGraphImpl::GroupClusters(int level) noexcept
    1204 |          // the partitions their Clusters are in.
    


    instagibbs commented at 6:24 PM on April 2, 2025:

    comment above this line should say an_deps


    sipa commented at 7:28 PM on April 2, 2025:

    Indeed, done (and elsewhere).

  5. in src/txgraph.cpp:1207 in 631244831b outdated
    1203 | @@ -1205,7 +1204,8 @@ void TxGraphImpl::GroupClusters(int level) noexcept
    1204 |          // the partitions their Clusters are in.
    1205 |          Cluster* last_chl_cluster{nullptr};
    1206 |          PartitionData* last_partition{nullptr};
    1207 | -        for (const auto& [par, chl] : clusterset.m_deps_to_add) {
    1208 | +        for (const auto& [dep, seq] : an_deps) {
    


    instagibbs commented at 6:27 PM on April 2, 2025:
            for (const auto& [dep, _] : an_deps) {
    

    sipa commented at 7:28 PM on April 2, 2025:

    Done.

  6. in src/txgraph.cpp:1209 in 631244831b outdated
    1203 | @@ -1205,7 +1204,8 @@ void TxGraphImpl::GroupClusters(int level) noexcept
    1204 |          // the partitions their Clusters are in.
    1205 |          Cluster* last_chl_cluster{nullptr};
    1206 |          PartitionData* last_partition{nullptr};
    1207 | -        for (const auto& [par, chl] : clusterset.m_deps_to_add) {
    1208 | +        for (const auto& [dep, seq] : an_deps) {
    1209 | +            auto [par, chl] = dep;
    1210 |              auto par_cluster = FindCluster(par, level);
    


    instagibbs commented at 6:37 PM on April 2, 2025:

    par_cluster and chl_cluster must both not be nulltptr, correct now that we're gating an_deps construction on both clusters existing? Conditional few lines below should never fire.


    sipa commented at 7:28 PM on April 2, 2025:

    Right, done!

  7. in src/txgraph.cpp:1238 in 631244831b outdated
    1248 |              // Find all dependencies whose child Cluster is Cluster i, and annotate them with rep.
    1249 | -            while (deps_it != clusterset.m_deps_to_add.end()) {
    1250 | -                auto [par, chl] = *deps_it;
    1251 | +            while (deps_it != an_deps.end()) {
    1252 | +                auto [par, chl] = deps_it->first;
    1253 |                  auto chl_cluster = FindCluster(chl, level);
    


    instagibbs commented at 6:39 PM on April 2, 2025:

    Assume() FindCluster succeeded


    sipa commented at 7:28 PM on April 2, 2025:

    Done.


    instagibbs commented at 7:34 PM on April 2, 2025:

    I think this got missed


    sipa commented at 7:41 PM on April 2, 2025:

    Now actually done.

  8. instagibbs commented at 6:57 PM on April 2, 2025: member

    looks good, just a few cleanups suggested 631244831ba273fe72c6803f531bc344d0bf1579

    confirmed that no sorts remain based on pointers

  9. sipa force-pushed on Apr 2, 2025
  10. in src/txgraph.cpp:1203 in 3796c16550 outdated
    1201 |              partition_data[i].rank = 0;
    1202 |          }
    1203 |  
    1204 | -        // Run through all parent/child pairs in m_deps_to_add, and union the
    1205 | -        // the partitions their Clusters are in.
    1206 | +        // Run through all parent/child pairs in m_an_deps, and union the partitions their Clusters
    


    instagibbs commented at 7:31 PM on April 2, 2025:
            // Run through all parent/child pairs in an_deps, and union the partitions their Clusters
    

    sipa commented at 7:41 PM on April 2, 2025:

    Right, fixed.

  11. in src/txgraph.cpp:1217 in 3796c16550 outdated
    1219 | -            if (par_cluster == nullptr || chl_cluster == nullptr) continue;
    1220 |              Assume(par != chl);
    1221 |              if (chl_cluster == last_chl_cluster) {
    1222 |                  // If the child Clusters is the same as the previous iteration, union with the
    1223 | -                // tree they were in, avoiding the need for another lookup. Note that m_deps_to_add
    1224 | +                // tree they were in, avoiding the need for another lookup. Note that m_an_deps
    


    instagibbs commented at 7:32 PM on April 2, 2025:
                    // tree they were in, avoiding the need for another lookup. Note that an_deps
    

    sipa commented at 7:41 PM on April 2, 2025:

    Fixed.

  12. instagibbs approved
  13. instagibbs commented at 7:36 PM on April 2, 2025: member

    ACK 3796c16550938fc09fa560d7060b3626506ce526

    Couple of typos and one last Assume that'd be nice

    git range-diff master 631244831ba273fe72c6803f531bc344d0bf1579 3796c16550938fc09fa560d7060b3626506ce526

  14. sipa force-pushed on Apr 2, 2025
  15. instagibbs approved
  16. laanwj added the label Tests on Apr 7, 2025
  17. maflcko commented at 1:30 PM on April 11, 2025: member

    Not sure if you just want to make it more determinisic, or fully deterministic, but I ran:

    git rebase bitcoin-core/master
    rm -rf ./bld-cmake && cmake -B ./bld-cmake  -DCMAKE_C_COMPILER='clang' -DCMAKE_CXX_COMPILER='clang++;-stdlib=libc++' -DBUILD_FOR_FUZZING=ON    -DCMAKE_CXX_FLAGS='-fprofile-instr-generate -fcoverage-mapping'                && cmake --build ./bld-cmake
    cargo run --manifest-path ./contrib/devtools/deterministic-fuzz-coverage/Cargo.toml -- $PWD/bld-cmake/ $PWD/../b-c-qa-assets/fuzz_corpora/ txgraph  32
    

    I only ran it on a subset of the files txgraph/00* for a few times, and it printed: The coverage was not deterministic between runs.

    The output had too little context, so I re-created it with more context:

    $ git diff -U9 --no-index  -- bld-cmake/fuzz_det_cov.show.t0.*.txt
    diff --git a/bld-cmake/fuzz_det_cov.show.t0.a.txt b/bld-cmake/fuzz_det_cov.show.t0.b.txt
    index 0f4fcc3a1d..d5e901bd30 100644
    --- a/bld-cmake/fuzz_det_cov.show.t0.a.txt
    +++ b/bld-cmake/fuzz_det_cov.show.t0.b.txt
    @@ -265929,20 +265929,20 @@
       ------------------
       330|     19|            return (a != nullptr) <=> (b != nullptr);
       331|     19|        }
       332|       |        // If neither pointer is nullptr, compare the Clusters' sequence numbers.
       333|  2.37k|        Assume(a == b || a->m_sequence != b->m_sequence);
       ------------------
       |  |   97|  3.55k|#define Assume(val) inline_assertion_check<false>(val, __FILE__, __LINE__, __func__, #val)
       |  |                                  ^2.37k                             ^2.37k    ^2.37k              ^2.37k
       |  |  ------------------
    -  |  |  |  Branch (97:51): [True: 1.19k, False: 1.17k]
    -  |  |  |  Branch (97:51): [True: 1.17k, False: 0]
    +  |  |  |  Branch (97:51): [True: 1.19k, False: 1.18k]
    +  |  |  |  Branch (97:51): [True: 1.18k, False: 0]
       |  |  ------------------
       ------------------
       334|  2.37k|        return a->m_sequence <=> b->m_sequence;
       335|  2.39k|    }
       336|       |
       337|       |public:
       338|       |    /** Construct a new TxGraphImpl with the specified maximum cluster count. */
       339|       |    explicit TxGraphImpl(DepGraphIndex max_cluster_count) noexcept :
       340|      7|        m_max_cluster_count(max_cluster_count)
    @@ -266889,23 +266889,23 @@
       974|     32|                auto cluster = FindCluster(index, level);
       975|     32|                if (cluster != nullptr) PullIn(cluster);
                                                           ^30
       ------------------
       |  Branch (975:21): [True: 30, False: 2]
       ------------------
       976|     32|            }
       977|      5|        }
       978|       |        // Group the set of to-be-removed entries by Cluster::m_sequence.
    -  979|     93|        std::sort(to_remove.begin(), to_remove.end(), [&](GraphIndex a, GraphIndex b) noexcept {
    -  980|     93|            Cluster* cluster_a = m_entries[a].m_locator[level].cluster;
    -  981|     93|            Cluster* cluster_b = m_entries[b].m_locator[level].cluster;
    -  982|     93|            return CompareClusters(cluster_a, cluster_b) < 0;
    -  983|     93|        });
    +  979|     94|        std::sort(to_remove.begin(), to_remove.end(), [&](GraphIndex a, GraphIndex b) noexcept {
    +  980|     94|            Cluster* cluster_a = m_entries[a].m_locator[level].cluster;
    +  981|     94|            Cluster* cluster_b = m_entries[b].m_locator[level].cluster;
    +  982|     94|            return CompareClusters(cluster_a, cluster_b) < 0;
    +  983|     94|        });
       984|       |        // Process per Cluster.
       985|     28|        std::span to_remove_span{to_remove};
       986|     78|        while (!to_remove_span.empty()) {
       ------------------
    

    This only shows the symptom. That is, a == b does not hold for the same amount of times, even when running over the same set of files. Not sure why that is, though.

    For reference, the two inputs I ran against:

    sh-5.2$ base64 ./txgraph/00dc359d7b7edffe613622aa7f592960eb4db15d
    //9/f///f6wAZIw7AAap5QBkiDsABqnlAAaqq5UABqvTAAauQgAGrJsABq085AatAAAAAAASBrkn
    AAa6NgAAAAAAAAatPOQGrdcABq5ZAAaYAAcwVQAHOf4AB+Q6AAc7BqybAAatPOQGrdcABq5ZAAaY
    AAcwVQAHOf4AB+Q6AAc7Pwa2JgAGtsAABrd8AAbRagAAAAAAAAAAEga5JwAGujYABgBSAH84lgAH
    OUjXBzn+BqqrlQAGq9MABq5CAAasmwAGrQAHOuQABzs/AAc7hQAHCidF9P85lgAHOVUABzn+AAc6
    5AAHOz8ABzvlAAaKq1oABgrTAAauQgAGrJsABgCtAGUA1wAGrlkABpgAAAAAAACpAAAAAAAAAAAA
    AAAAAAAAAAAAAP////8AAABrAAAAAAAAAAAAAAAAAAAAAAAAAAAA/+oAAAAAAAAAAAAAAAAAAAAA
    /+oAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC0xwAGtXMABrYmAga2AAAAAAAA+///////1ato
    cgAAAAAAAAAAAAAAAAAAYBEAAAdg3/////8sAP//
    
    sh-5.2$ base64 ./txgraph/0002cb30d28317066ed82c933c477212d1680c53
    //9/f///f6wAZIw7AAap5QBkiDsABqnlAAaqq5UABqvTAAauQgAGrJsABq085AatAAAAAAASBrkn
    AAa6NgAGAFIAfzjx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx8fHx
    8fE7AAap5QAGqquVAAar0wAGrkoABqybAAYAAAAAAAAAAPHx8fHx8fHx8QAAAAAAAAAAAAAAAAAW
    AAAADgAAABUAAAASAAAAEwAAACMAAAAcAAAAJgAAADIRAAAAHVXPRlXPRkZVVc/PRlXPRs/PRlVG
    Ws/PBgAAAAUAAAACAAAAAAAAAAAAABgKAAAAAAAAABAAAAAMAAAACAAAAA8AAAAEAAAADQAAACEA
    AAAAAAAAAQAAAAkAAAALAAAABwoHRQQANJYABzlVAAaqq5UABqvTAAauQgAGrJsABq085Aat1wAG
    rlkABpgAnTBVAAc5/gAHOuQABzs/BrYmAAa2wAAGt3wABtFqAAAHAAAAAAASBrknAAa6NgAGAFIA
    fziWAAc5SNcHOf4ABzrkAAc7PwAHO4UABwonRfT/OJYABzlVAAc5/gAHOuQABzs/AAc75QAGqqta
    AAar0wAGrkIABqybAAYAAABlrdcABgCumAZZAAAAAAAAqQAAAAAAAAAAAAAAAAAAAAAAAAD/////
    AAAAawAAAAAAAAAAAAAAAAAAAAAAAAAAAP/qAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAtMcA
    BrVzAAa2JgIGtgAAAAAAAAAAAAAAACpUaHIAQAAAAAAAAAAAAC4AAAA/AAAAJwAAABEAAAAJAAAA
    AAAAADwAAAAIAAAAFQAAADkAAAATAAAAEAAAADYAAAACAAAAHQAAADAAAAAAAAAAAABdKQAHXnkA
    B18AAABKAAdgEQAAB2Df/////ywA//8=
    
  18. sipa commented at 1:41 PM on April 11, 2025: member

    @maflcko Huh, interesting, will investigate. TxGraphImpl does hold a default-constructed FastRandomContext, but those should within the fuzz tests be seeded deterministically now, right?

  19. maflcko commented at 1:49 PM on April 11, 2025: member

    Yes, as long as the random context is not global, it should be fine.

  20. txgraph: compare sequence numbers instead of Cluster* (bugfix)
    This makes fuzz testing more deterministic, by avoiding the (arbitrary) pointer
    value ordering in comparing transactions.
    c72c8d5d45
  21. txgraph: make GroupClusters use partition numbers directly (optimization) 2835216ec0
  22. sipa force-pushed on Apr 11, 2025
  23. sipa commented at 2:44 PM on April 11, 2025: member

    @maflcko Fixed, I think! There was another pointer comparison in the fuzz test itself.

    +++ b/src/test/fuzz/txgraph.cpp
    @@ -206,11 +206,14 @@ struct SimTxGraph
                     ret.push_back(ptr);
                 }
             }
    -        // Deduplicate.
    -        std::sort(ret.begin(), ret.end());
    -        ret.erase(std::unique(ret.begin(), ret.end()), ret.end());
    -        // Replace input.
    -        arg = std::move(ret);
    +        // Construct deduplicated version in input (do not use std::sort/std::unique for
    +        // deduplication as it'd rely on non-deterministic pointer comparison).
    +        arg.clear();
    +        for (auto ptr : ret) {
    +            if (std::find(arg.begin(), arg.end(), ptr) == arg.end()) {
    +                arg.push_back(ptr);
    +            }
    +        }
         }
     };
    
  24. maflcko commented at 4:48 PM on April 11, 2025: member

    Nice. cargo run --manifest-path ./contrib/devtools/deterministic-fuzz-coverage/Cargo.toml -- $PWD/bld-cmake/ $PWD/../b-c-qa-assets/fuzz_corpora/ txgraph 32 from above looks stable now for me.

  25. sipa commented at 4:50 PM on April 11, 2025: member

    @maflcko Great, nice tool. I tried it as well with a hastily-constructed txgraph corpus (after -set_cover_merge=1'ing it from 30000 to 700 files), and found no issues.

  26. instagibbs commented at 5:44 PM on April 14, 2025: member

    reACK 2835216ec09cc2d86b091824376b15601e7c7b8a

    good catch, didn't check the fuzzer harness for sorts like I did TxGraph itself...

    git range-diff master 133813c503f2894c36442ec5a34c466d95961f45 2835216ec09cc2d86b091824376b15601e7c7b8a

  27. marcofleon commented at 2:52 PM on April 15, 2025: contributor

    ACK 2835216ec09cc2d86b091824376b15601e7c7b8a

    On master, using an old corpus, AFL++ shows 30% stability. On this PR, after generating a new corpus (~20 cpu hours), AFL++ is showing 93%. I also ran the determinism script several times using the new corpus and it's passed every time.

  28. glozow commented at 5:47 PM on April 17, 2025: member

    utACK 2835216ec09cc2d86b091824376b15601e7c7b8a

  29. glozow merged this on Apr 17, 2025
  30. glozow closed this on Apr 17, 2025

  31. venpisey commented at 6:55 PM on April 19, 2025: none

    5167170015919920


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-19 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me