Add simulation-based CCoinsViewCache fuzzer #27011

pull sipa wants to merge 4 commits into bitcoin:master from sipa:202301_coinscache_fuzz changing 7 files +514 −6
  1. sipa commented at 11:12 pm on January 31, 2023: member
    The fuzzer goes through a sequence of operations that get applied to both a real stack of CCoinsViewCache objects, and to simulation data, comparing the two at the end.
  2. DrahtBot commented at 11:12 pm on January 31, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK jamesob, dergoegge

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #25325 (Add pool based memory resource by martinus)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. sipa force-pushed on Feb 1, 2023
  4. in src/test/fuzz/coinscache_sim.cpp:243 in f87382af7a outdated
    238+    // to both the real cache stack and the simulation.
    239+    auto it = buffer.begin();
    240+    unsigned iters = 0;
    241+    while (it != buffer.end()) {
    242+        // Upper limit on number of operations.
    243+        if (++iters == 10000U) break;
    


    maflcko commented at 8:24 am on February 1, 2023:
    nit: Could replace those four lines with just LIMITED_WHILE?

    dergoegge commented at 11:35 am on February 1, 2023:

    Why not consume until the buffer is empty?

    Input sizes are limited by the fuzzing engines anyway (e.g. libFuzzer’s default is 4096 bytes when starting from an empty corpus), so the max. number of ops is capped by that.


    maflcko commented at 12:00 pm on February 1, 2023:

    libFuzzer’s default is 4096 bytes

    We use more fuzz engines than libFuzzer on default settings ;)

    You can look up the history of LIMITED_WHILE to get an idea of how much is broken if the macro is removed.


    sipa commented at 2:07 pm on February 1, 2023:

    nit: Could replace those four lines with just LIMITED_WHILE?

    Done.

    Why not consume until the buffer is empty?

    It does, by the way (in addition to having an upper limit).

  5. in src/test/fuzz/coinscache_sim.cpp:254 in f87382af7a outdated
    249+        if (caches.empty()) {
    250+            caches.emplace_back(new CCoinsViewCache(&bottom));
    251+            sim_caches[caches.size()].Wipe();
    252+        }
    253+
    254+        unsigned cmd = (*(it++)) % 17U;
    


    maflcko commented at 8:24 am on February 1, 2023:

    nit: If you do switch to FuzzedDataProvider, could use CallOneOf?

    .


    sipa commented at 2:07 pm on February 1, 2023:
    Done.
  6. in src/test/fuzz/coinscache_sim.cpp:258 in f87382af7a outdated
    253+
    254+        unsigned cmd = (*(it++)) % 17U;
    255+        switch (cmd) {
    256+
    257+        case 0: { // GetCoin
    258+            uint32_t outpointidx = (it == buffer.end()) ? 0 : *(it++);
    


    maflcko commented at 8:27 am on February 1, 2023:

    nit: If you do switch to FuzzedDataProvider, could use the <uint8_t> int getter? Alternatively, maybe write a lambda: const auto get_uint8_t{[](auto& it) { return it == buffer.end() ? 0 : *(it++); }} and use it in all places with

    0            uint32_t outpointidx{get_uint8_t(it)};
    

    sipa commented at 2:08 pm on February 1, 2023:
    Done, switched provider.ConsumeIntegralInRange<uint32_t>(0, NUM_OUTPOINTS - 1);, which makes it not strongly tied to the range of a uint8_t.
  7. maflcko commented at 8:34 am on February 1, 2023: member
    Left some style nits. (Can be ignored unless there is other stuff to push)
  8. fanquake requested review from dergoegge on Feb 1, 2023
  9. in src/test/fuzz/coinscache_sim.cpp:194 in f87382af7a outdated
    189+} // namespace
    190+
    191+FUZZ_TARGET(coinscache_sim)
    192+{
    193+    /** Precomputed COutPoint and CCoins values. */
    194+    const PrecomputedData data;
    


    dergoegge commented at 11:20 am on February 1, 2023:
    0    static const PrecomputedData data;
    

    PrecomputedData is generated deterministically, right? Given that and it being const, there should be no need to generate it on each iteration.


    sipa commented at 2:08 pm on February 1, 2023:
    Oops, good catch. Made it static, which makes fuzzing several times faster.
  10. dergoegge commented at 11:52 am on February 1, 2023: member
    Concept ACK
  11. in src/test/fuzz/coinscache_sim.cpp:26 in f87382af7a outdated
    20+constexpr uint32_t NUM_OUTPOINTS = 256;
    21+/** Number of distinct Coin values used in this test (ignoring nHeight). */
    22+constexpr uint32_t NUM_COINS = 256;
    23+/** Maximum number CCoinsViewCache objects used in this test. */
    24+constexpr uint32_t MAX_CACHES = 4;
    25+/** Data type large enough to hold 2^NUM_COINS-1. */
    


    jamesob commented at 1:10 pm on February 1, 2023:
    Confused by this comment - 2**8 doesn’t seem to be able to fit 2**256-1. Did you mean to write NUM_COINS - 1?

    sipa commented at 2:28 pm on February 1, 2023:
    Indeed!
  12. sipa force-pushed on Feb 1, 2023
  13. in src/test/fuzz/coinscache_sim.cpp:397 in c7d20096b6 outdated
    391+                // Apply to real caches.
    392+                caches.back()->Flush();
    393+            },
    394+
    395+            [&]() { // Sync.
    396+                // Apply to simulation data (note that in our simulation, syncing and flushing is the same thing).
    


    jamesob commented at 2:13 pm on February 1, 2023:
    Hm. Can’t you emulate Sync() by adding an erase parameter to flush() that just doesn’t set the cache entry to EntryType::NONE? If you don’t do this, won’t the contents of the simulation cache differ from the real caches?

    sipa commented at 2:30 pm on February 1, 2023:

    Semantically, Flush() and Sync() have the same effect: they push the changes made in this cache to the parent coinsview. Whether or not entries stay in a cache is an implementation aspect that’s not observable (except through CacheSize() etc).

    The NONE entry in the simulated cache is also different from “missing” in the real cache, in that lookups in the simulated cache never pull in entries from parent caches, so you can’t really compare the actual cache entrytype with anything the real caches do. They’re wholly different implementations, but their observable behavior should be identical.

    You’re right that I could choose to not reset things to NONE in the simulated cache, but that’d in effect be extending the test to test the behavior of the simulation: resetting to NONE is just a non-observable optimization too there.

  14. jamesob approved
  15. jamesob commented at 2:20 pm on February 1, 2023: member

    ACK c7d20096b6808b11aadda7a652273298b3dd98a8 (jamesob/ackr/27011.2.sipa.add_simulation_based_cco)

    Built and reviewed locally. Always happy to see more coins tests.

    Just a few questions, but as-is this represents a mergeable improvement.

     0-----BEGIN PGP SIGNED MESSAGE-----
     1Hash: SHA512
     2
     3ACK c7d20096b6808b11aadda7a652273298b3dd98a8 ([`jamesob/ackr/27011.2.sipa.add_simulation_based_cco`](https://github.com/jamesob/bitcoin/tree/ackr/27011.2.sipa.add_simulation_based_cco))
     4
     5Built and reviewed locally. Always happy to see more coins tests. 
     6
     7Just a few questions, but as-is this represents a mergeable improvement.
     8-----BEGIN PGP SIGNATURE-----
     9
    10iQIzBAEBCgAdFiEEGNRVI1NPYuZCSIrGepNdrbLETwUFAmPadUMACgkQepNdrbLE
    11TwUs4w//V3HDJYwqiLfMmjKTT1Mo80v0cyFG2WQrUlnLIEBOdSZSor1ur2AgzOjk
    124paKsSCuMzUVfm+CHVvuIRxpPyg9Q4sEONql979/H1yB8VvK7zbddJJ1nyKuEHJQ
    13mex99u0TWspnQRst6T1B1STwSX7Sjc6UUAcVG1dkIZxHlaR0ll8ZDaQcHo9vUWyy
    14gHVRi6V+YsiyVqTJ59hEYgkbHQASjLIFFSGyghMDJTp3jTtOKEDJZzdzpL8KgqSE
    151Mj1/IsEasgekgxAWKuAC29PetSC+XktzHVrLGfR1IHnWnMQ1eoe8lLg86FD1byS
    16U+QpTjb4KMf6zWiflzWcIE0EHgmxaBqug4uAuufPrZKd5r2OQyDxy1LkiawatTgu
    17iwSvFriNH6X2B7I8J34l+/9g9NxubwxVoJvscuuk14vW80IK6lUyPsxc/ujIqiRl
    18ganCc9Obon1JViEdh/t/sjmmHzCFpYmftvjNn+Kwl0XVSMa2a1zKIKMDoR18l/8L
    19v60G6haqyl49H2mmjIPEnomsujwtQQW6mGYolYhAI+LuYOCmkPHQ6Mw8VsGZRmdg
    20xxoATDjx5wMoGCjg3/p8TWszUTHDo3UxpkcBlR9lteWpOTCi3LtaNIYuv8/aBYJJ
    21nbtJlxNXUGFoYBbC/RxLhTx6orbkcldElwXPtxhx/CYKmxWXMiA=
    22=f9q/
    23-----END PGP SIGNATURE-----
    
    0Tested on Linux-6.1.6-arch1-3-x86_64-with-glibc2.36
    1
    2Configured with ./configure LDFLAGS=-L/home/james/src/bitcoin2/db4/lib/ CPPFLAGS=-I/home/james/src/bitcoin2/db4/include/ 'CXXFLAGS=-fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis' --enable-wallet --enable-debug --with-daemon --enable-natpmp-default CXX=/usr/bin/clang++ CC=/usr/bin/clang --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --disable-module-ecdh --no-create --no-recursion
    3
    4Compiled with /usr/bin/ccache /usr/bin/clang++ -std=c++17 -mavx -mavx2 -mpclmul -fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -Wsign-compare -Werror=thread-safety-analysis -O0 -g3 -ftrapv -fdebug-prefix-map=$(abs_top_srcdir)=.  -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -msse4.1 -msse4.2 -msse4 -msha  i
    5
    6Compiler version: clang version 15.0.7
    
  16. sipa force-pushed on Feb 1, 2023
  17. jamesob approved
  18. jamesob commented at 4:15 pm on February 1, 2023: member

    re-ACK fd84485

    A small refactoring change and comment updates.

    0git range-diff upstream/master ackr/27011.2.sipa.add_simulation_based_cco ackr/27011.3.sipa.add_simulation_based_cco
    
  19. Add simulation-based CCoinsViewCache fuzzer
    The fuzzer goes through a sequence of operations that get applied to both a
    real stack of CCoinsViewCache objects, and to simulation data, comparing
    the two at the end.
    3c9cea1340
  20. sipa force-pushed on Feb 1, 2023
  21. in src/coins.cpp:336 in bc2cc7243f outdated
    328+        // Recompute cachedCoinsUsage.
    329+        recomputed_usage += entry.coin.DynamicMemoryUsage();
    330+    }
    331+    assert(recomputed_usage == cachedCoinsUsage);
    332+}
    333+
    


    unknown commented at 11:53 pm on February 1, 2023:

    Not sure about 0, 1, 2 etc. used here:

    Readability: Instead of using numbers, meaningful names for the different coin attributes could be used. This makes the code easier to understand and maintain.


    sipa commented at 11:59 pm on February 1, 2023:
    There really is no meaning at all; they’re arbitrary bits assigned to certain properties.
  22. sipa force-pushed on Feb 1, 2023
  23. Add CCoinsViewCache::SanityCheck() and use it in fuzz test b0ff310840
  24. sipa force-pushed on Feb 2, 2023
  25. Add deterministic mode to CCoinsViewCache 59e6828bb5
  26. in src/coins.cpp:36 in 883c2c8258 outdated
    31@@ -32,7 +32,9 @@ bool CCoinsViewBacked::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock,
    32 std::unique_ptr<CCoinsViewCursor> CCoinsViewBacked::Cursor() const { return base->Cursor(); }
    33 size_t CCoinsViewBacked::EstimateSize() const { return base->EstimateSize(); }
    34 
    35-CCoinsViewCache::CCoinsViewCache(CCoinsView* baseIn) : CCoinsViewBacked(baseIn) {}
    36+CCoinsViewCache::CCoinsViewCache(CCoinsView* baseIn, bool deterministic) :
    37+    CCoinsViewBacked(baseIn), cacheCoins(0, SaltedOutpointHasher(deterministic))
    


    dergoegge commented at 11:21 am on February 2, 2023:

    I think currently the determinism is lost when ReallocateCache is called, because we call the constructor there without the deterministic SaltedOutpointHasher.

    https://github.com/bitcoin/bitcoin/blob/883c2c8258fb10cfd5ab5e60735284b274fb1158/src/coins.cpp#L311-L317


    sipa commented at 2:58 pm on February 2, 2023:
    Fixed, I think.
  27. sipa force-pushed on Feb 2, 2023
  28. dergoegge approved
  29. dergoegge commented at 4:48 pm on February 2, 2023: member
    Code review ACK 59e6828bb5b56a2354a80059d3f660f551f3e207
  30. fanquake requested review from jamesob on Feb 2, 2023
  31. in src/test/fuzz/coinscache_sim.cpp:177 in 59e6828bb5 outdated
    172+                    m_data[it->first] = it->second.coin;
    173+                }
    174+            } else {
    175+                /* For non-dirty entries being written, compare them with what we have. */
    176+                if (it->second.coin.IsSpent()) {
    177+                    assert(m_data.count(it->first) == 0);
    



    sipa commented at 10:00 pm on February 2, 2023:

    That’s not what I would have expected when I wrote it, but it makes sense; it’s related to how line 55 of coins.cpp (https://drahtbot.space/host_reports/DrahtBot/reports/coverage_fuzz/monotree/59e6828bb5b56a23/fuzz.coverage/src/coins.cpp.gcov.html) isn’t covered either.

    When a coin is looked up in a child cache, and the parent reports it doesn’t exist, the child does not cache that absence (this is in contrast to a successful lookup, which does get copied into the child). As a result, we never actually exercise the case where non-DIRTY IsSpent() appear in the caches.

    I don’t think this is a problem. This fuzz test is designed to simulate a relatively realistic scenario of cache creations/deletions/flushes/syncs/additions/spends/lookups, and I believe the above just means that a lack of non-DIRTY spent coins also doesn’t appear in actual block processing.


    sipa commented at 3:34 pm on February 3, 2023:
    I added a commit that ought to exercise these cases. I’ll also submit a follow-up to the qa-assets repo to make use of these.
  32. Exercise non-DIRTY spent coins in caches in fuzz test 561848aaf2
  33. Sjors commented at 2:32 pm on February 8, 2023: member
    I ran the fuzzer using the latest corpus for about an hour and it didn’t find any crashes, so that’s nice.
  34. sipa commented at 2:44 pm on February 8, 2023: member
    The seeds I’ve added to qa-assets for this fuzzer (https://github.com/bitcoin-core/qa-assets/pull/103) are the result of a few months of CPU time, FWIW.
  35. DrahtBot requested review from dergoegge on Feb 10, 2023
  36. dergoegge commented at 12:08 pm on February 13, 2023: member
    Code review ACK 561848aaf2d67791e92754f3d11813bc53959a8f
  37. maflcko merged this on Feb 13, 2023
  38. maflcko closed this on Feb 13, 2023

  39. sidhujag referenced this in commit fed4b784cb on Feb 13, 2023
  40. martinus commented at 6:40 am on February 14, 2023: contributor

    @sipa While rebasing #25325 I saw that the fuzzer in coins_view.cpp still uses CCoinsMap coins_map; and doesn’t use the deterministic seeds, shouldn’t this fuzzer do that as well?

    I can add it while rebasing, I’m touching that line anyways. Unless you want to do this separately

    EDIT: I had a look what happens when I remove the default bool deterministic = false in SaltedOutpointHasher, and found another use of the map in the fuzzer, in tx_pool.cpp through const CCoinsViewMemPool amount_view{WITH_LOCK(::cs_main, return &chainstate.CoinsTip()), tx_pool};

  41. sipa commented at 0:14 am on February 15, 2023: member

    I saw that the fuzzer in coins_view.cpp still uses CCoinsMap coins_map;

    I think that should probably be addressed.

    and doesn’t use the deterministic seeds, shouldn’t this fuzzer do that as well?

    I believe it does?

    0    CCoinsViewCache coins_view_cache{&backend_coins_view, /*deterministic=*/true};
    

    EDIT: I had a look what happens when I remove the default bool deterministic = false in SaltedOutpointHasher, and found another use of the map in the fuzzer, in tx_pool.cpp through const CCoinsViewMemPool amount_view{WITH_LOCK(::cs_main, return &chainstate.CoinsTip()), tx_pool};

    Makes sense to change that too, I think.

  42. martinus commented at 5:15 am on February 15, 2023: contributor

    I believe it does?

    It does in line 49 for CCoinsViewCache, but not in line 118 for the CCoinsMap coins_map;. I’ve added that in #25325 here

  43. bitcoin locked this on Feb 18, 2024

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 06:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me