fuzz: target CDBWrapper #34887

pull andrewtoth wants to merge 3 commits into bitcoin:master from andrewtoth:fuzz-dbwrapper changing 5 files +280 −2
  1. andrewtoth commented at 1:12 AM on March 21, 2026: contributor

    Inspired by #34866 (comment).

    We currently don't have a dedicated harness targeting CDBWrapper. OSS-Fuzz has a rudimentary harness for levelDB which fails, so doesn't appear maintained.

    This PR adds a harness targeting CDBWrapper against an in-memory oracle to verify correctness.

    A DeterministicEnv wraps levelDB's memenv to eliminate non-determinism by capturing background compaction and running it at fuzzer-chosen points. The fuzzer also controls the cache_bytes and max_file_size sizes so that small values trigger memtable flushes and compaction.

  2. DrahtBot added the label Fuzzing on Mar 21, 2026
  3. DrahtBot commented at 1:12 AM on March 21, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK marcofleon, l0rinc

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #34866 (fuzz: target concurrent leveldb reads by andrewtoth)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  4. DrahtBot added the label CI failed on Mar 21, 2026
  5. andrewtoth force-pushed on Mar 21, 2026
  6. DrahtBot removed the label CI failed on Mar 21, 2026
  7. andrewtoth force-pushed on Mar 21, 2026
  8. andrewtoth force-pushed on Mar 21, 2026
  9. andrewtoth force-pushed on Mar 21, 2026
  10. DrahtBot added the label CI failed on Mar 22, 2026
  11. andrewtoth force-pushed on Mar 22, 2026
  12. andrewtoth force-pushed on Mar 22, 2026
  13. DrahtBot removed the label CI failed on Mar 22, 2026
  14. dbwrapper: make max_file_size a configurable DBParams field
    Useful for fuzzing different values.
    9e444624bb
  15. andrewtoth force-pushed on Mar 29, 2026
  16. marcofleon commented at 3:22 PM on April 2, 2026: contributor

    Concept ACK

    Nice differential fuzz test. I'm running it now and I'll do a more thorough review soon.

  17. andrewtoth force-pushed on Apr 11, 2026
  18. andrewtoth commented at 3:10 PM on April 11, 2026: contributor

    Improved the assertions around iterators. This now exhausts the DB iterator completely instead of using LIMITED_WHILE, and asserts that the oracle iterator is also exhausted, ensuring no entries are missing.

    Also added coverage for failing to deserialize a value by reading with a type that always throws on deserialization.

    git diff 430f357f7e304336d2b61f72d2bc5439dea557d8..b967e255e7620b8366ec1aeb8932138455be8701

  19. in src/dbwrapper.cpp:226 in 9e444624bb outdated
     222 | @@ -224,6 +223,7 @@ CDBWrapper::CDBWrapper(const DBParams& params)
     223 |      DBContext().syncoptions.sync = true;
     224 |      DBContext().options = GetOptions(params.cache_bytes);
     225 |      DBContext().options.create_if_missing = true;
     226 | +    DBContext().options.max_file_size = params.max_file_size;
    


    l0rinc commented at 9:11 AM on April 14, 2026:

    9e44462 dbwrapper: make max_file_size a configurable DBParams field:

    Is this useful? Changing this back and forth is not something we do and will likely make running this very slow (based on the chosen values)


    andrewtoth commented at 10:34 PM on April 15, 2026:

    Yes, this is very useful. We want to exercise file compactions at different sizes. At 32 MB default there is no way for input to trigger compactions without very large input lengths. That would be very slow. How do you mean this will be slow otherwise?


    l0rinc commented at 9:52 AM on April 16, 2026:

    .max_file_size = provider.ConsumeIntegralInRange<size_t>(1_MiB, 4_MiB),

    I’m fine with exposing the knob, but I still think the actually used current value should be exercised explicitly. Right now the harness samples only 1–4 MiB and never hits DBWRAPPER_MAX_FILE_SIZE

  20. in src/dbwrapper.cpp:229 in e903203536 outdated
     223 | @@ -224,7 +224,9 @@ CDBWrapper::CDBWrapper(const DBParams& params)
     224 |      DBContext().options = GetOptions(params.cache_bytes);
     225 |      DBContext().options.create_if_missing = true;
     226 |      DBContext().options.max_file_size = params.max_file_size;
     227 | -    if (params.memory_only) {
     228 | +    if (params.testing_env) {
     229 | +        DBContext().options.env = params.testing_env;
     230 | +    } else if (params.memory_only) {
    


    l0rinc commented at 9:12 AM on April 14, 2026:

    e903203 dbwrapper: accept optional testing leveldb::Env in DBParams:

    Are these two configs mutually exclusive now? If so, we should add an assert to document that we can't use in-memory leveldb in a testing env.

  21. in test/sanitizer_suppressions/ubsan:22 in b967e255e7 outdated
      18 | @@ -19,6 +19,7 @@ implicit-integer-sign-change:*/include/c++/
      19 |  implicit-integer-sign-change:*/new_allocator.h
      20 |  implicit-integer-sign-change:*/qarraydata.h
      21 |  implicit-integer-sign-change:crc32c/
      22 | +implicit-integer-sign-change:leveldb/
    


    l0rinc commented at 9:14 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    Can we document in the commit message why this is needed and add fixes to https://github.com/bitcoin-core/leveldb-subtree?


    andrewtoth commented at 2:04 AM on April 16, 2026:

    Documented in commit message, but changing leveldb is out of scope here. There are already existing suppressions for leveldb.


    l0rinc commented at 9:50 AM on April 16, 2026:

    Adds implicit-integer-sign-change:leveldb/ to the test UBSAN suppressions.

    The code already shows that the suppression was added, I don't think this is useful. What is still unclear is why a broad implicit-integer-sign-change:leveldb/ suppression is justified here, and whether it could be hiding a real issue in the harness or bundled LevelDB code.

  22. in src/test/fuzz/dbwrapper.cpp:108 in b967e255e7
     103 | +/** Comparator matching LevelDB's lexicographic order on little-endian
     104 | + *  serialized uint16_t keys (low byte first, then high byte). */
     105 | +struct SerializedU16Cmp {
     106 | +    bool operator()(uint16_t a, uint16_t b) const
     107 | +    {
     108 | +        const auto swap{[](uint16_t v) -> uint16_t { return static_cast<uint16_t>((v << 8) | (v >> 8)); }};
    


    l0rinc commented at 9:17 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    Can we reuse existing helpers and mention the LevelDB comparator we're cloning here?

    /** Equivalent to `leveldb::BytewiseComparator()` on 2-byte little-endian
     *  serialized uint16_t keys, while keeping the oracle keyed by uint16_t. */
    struct LevelDBBytewiseU16Cmp {
        bool operator()(uint16_t a, uint16_t b) const { return internal_bswap_16(a) < internal_bswap_16(b); }
    };
    
  23. in src/test/fuzz/dbwrapper.cpp:113 in b967e255e7
     108 | +        const auto swap{[](uint16_t v) -> uint16_t { return static_cast<uint16_t>((v << 8) | (v >> 8)); }};
     109 | +        return swap(a) < swap(b);
     110 | +    }
     111 | +};
     112 | +
     113 | +struct FailDeserialize {
    


    l0rinc commented at 9:18 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    nit: I don't like the name, but we're calling Unserialize, not Deserialize

  24. in src/test/fuzz/dbwrapper.cpp:133 in b967e255e7 outdated
     128 | +    FuzzedDataProvider provider{buffer.data(), buffer.size()};
     129 | +
     130 | +    const auto memenv{std::unique_ptr<leveldb::Env>{leveldb::NewMemEnv(leveldb::Env::Default())}};
     131 | +    DeterministicEnv det_env{memenv.get()};
     132 | +
     133 | +    const bool obfuscate{provider.ConsumeBool()};
    


    l0rinc commented at 9:18 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    Non-obfuscated run should probably have a lot lower probability

        const bool obfuscate{provider.ConsumeIntegralInRange<uint8_t>(0, 10) != 0};
    

    andrewtoth commented at 10:23 PM on April 15, 2026:

    Fuzzing is coverage-guided, not random-sampling. There isn't really "probability" in the way you're thinking about it. The fuzzer discovers and maintains inputs that reach each unique path. ConsumeBool() gives a single-byte toggle that makes both paths trivially reachable. Changing to 1/11 or 1/8 just makes one path harder to reach without any benefit to coverage exploration. The fuzzer naturally allocates corpus entries based on which paths produce unique coverage.


    l0rinc commented at 9:56 AM on April 16, 2026:

    Changing to 1/11 or 1/8 just makes one path harder to reach without any benefit to coverage exploration

    Was just a tuning suggestion to incentivize the fuzzer to spend more time with realistic scenarios (starting from more realistic state). Not a big deal either way, please resolve.


    andrewtoth commented at 1:50 PM on April 16, 2026:

    to incentivize the fuzzer to spend more time

    The fuzzer is incentivized to spend more time based on new coverage. If the first byte does obfuscation and the second byte does non-obfuscation, it will explore both paths. If just 0 of the first 8 bytes does non-obfuscation, and any other value of 8 bytes does obfuscation, then the fuzzer will still explore both paths equally. It's just taking more time for the fuzzer to understand that any other value than 0 does not add new coverage.


    l0rinc commented at 1:53 PM on April 16, 2026:

    It's just taking more time for the fuzzer to understand that any other value than 0 does not add new coverage.

    Exactly.

  25. in src/test/fuzz/dbwrapper.cpp:135 in b967e255e7
     130 | +    const auto memenv{std::unique_ptr<leveldb::Env>{leveldb::NewMemEnv(leveldb::Env::Default())}};
     131 | +    DeterministicEnv det_env{memenv.get()};
     132 | +
     133 | +    const bool obfuscate{provider.ConsumeBool()};
     134 | +    const size_t cache_bytes{provider.ConsumeIntegralInRange<size_t>(64 << 10, 1_MiB)};
     135 | +    const size_t max_file_size{provider.ConsumeIntegralInRange<size_t>(1_MiB, 4_MiB)};
    


    l0rinc commented at 9:20 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    How come we're not even exercising the actual value that we're using internally? The previous value was 2_MiB, the current one is DBWRAPPER_MAX_FILE_SIZE = 32_MiB - if you insist on changing this can we simply choose between these two values only?

        const size_t max_file_size{provider.ConsumeBool() ? 2_MiB : DBWRAPPER_MAX_FILE_SIZE};
    

    or

        const size_t max_file_size{provider.PickValueInArray<std::size_t>({2_MiB, DBWRAPPER_MAX_FILE_SIZE})};
    

    andrewtoth commented at 10:36 PM on April 15, 2026:

    #34887 (review)

    More than 4 MB will not be very useful, since we won't have large enough inputs to do any kind of compaction. Choosing between 2 values is artificially limiting the fuzzer.


    l0rinc commented at 9:58 AM on April 16, 2026:

    Choosing between 2 values is artificially limiting the fuzzer.

    Yes, that's the point - we want it to spend more time on valid cases instead of made-up ones. Making it a range will just result in the actual values used (2_MiB and 32_MiB) in the code being chosen less often.

  26. in src/test/fuzz/dbwrapper.cpp:1 in b967e255e7 outdated


    l0rinc commented at 9:24 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    This last commit is huge, maybe we could split it into a few self-contained commits that each add one layer of coverage and still build on their own? For example:

    1. basic deterministic differential harness with the core CDBWrapper methods
    2. deserialization-failure coverage (FailUnserialize / PickExistingKey)
    3. key-ordering and range checks (LevelDBBytewiseU16Cmp / EstimateSize)
    4. iterator and restart/reopen coverage.

    That would make it much easier to review the basic harness separately from the trickier ordering/iterator/lifecycle pieces.


    l0rinc commented at 9:59 AM on April 16, 2026:

    This still stands. The issue is not correctness, it is reviewability: the basic deterministic harness is much easier to reason about separately from the ordering / iterator / reopen logic.

  27. in src/dbwrapper.h:53 in e903203536 outdated
      47 | @@ -44,6 +48,9 @@ struct DBParams {
      48 |      bool obfuscate = false;
      49 |      //! Passed-through options.
      50 |      DBOptions options{};
      51 | +    //! If non-null, use this as the leveldb::Env instead of the default.
      52 | +    //! Caller retains ownership.
      53 | +    leveldb::Env* testing_env = nullptr;
    


    l0rinc commented at 9:30 AM on April 14, 2026:

    e903203 dbwrapper: accept optional testing leveldb::Env in DBParams:

    Can we cover this with tests in dbwrapper_tests.cpp to explain the expected behavior, e.g. where we reuse an injected memenv across reopens and verify that wipe_data still works (which I think we're silently bypassing here).


    andrewtoth commented at 2:06 AM on April 16, 2026:

    This is only used for this fuzz harness, which covers it. I think having a unit test for a test only construct where it's already tested is unnecessary.


    l0rinc commented at 10:07 AM on April 16, 2026:

    This is only used for this fuzz harness, which covers it

    The fuzzer depends on this behavior, but it is not a substitute for a small API-level test of the constructor semantics here, especially the testing_env precedence and wipe_data behavior.

  28. in src/test/fuzz/dbwrapper.cpp:148 in b967e255e7
     143 | +    }};
     144 | +
     145 | +    // Oracle: key → value length. Content is reconstructed via MakeValue().
     146 | +    std::map<uint16_t, uint16_t, SerializedU16Cmp> oracle;
     147 | +
     148 | +    LIMITED_WHILE(provider.ConsumeBool(), 10000)
    


    l0rinc commented at 9:33 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    nit:

        LIMITED_WHILE(provider.ConsumeBool(), 10'000)
    
  29. in src/test/fuzz/dbwrapper.cpp:156 in b967e255e7
     151 | +            provider,
     152 | +            // --- Mutations ---
     153 | +            [&] {
     154 | +                const uint16_t key{provider.ConsumeIntegral<uint16_t>()};
     155 | +                const uint16_t len{provider.ConsumeIntegralInRange<uint16_t>(0, MAX_VALUE_SIZE)};
     156 | +                const bool sync{provider.ConsumeBool()};
    


    l0rinc commented at 9:35 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    We rarely call fSync (the default is false for a reason), can we also make this more rare (to make sure we detect problems when it's not called and to make execution faster)? We could extract to a helper:

    bool ConsumeSync(FuzzedDataProvider& provider)
    {
        return provider.ConsumeIntegralInRange<uint8_t>(0, 7) == 0;
    }
    

    and use it as

    dbw->Write(key, MakeValue(key, len), ConsumeSync(provider));
    

    andrewtoth commented at 10:24 PM on April 15, 2026:

    l0rinc commented at 10:15 AM on April 16, 2026:

    Can be resolved

  30. in src/test/fuzz/dbwrapper.cpp:191 in b967e255e7
     186 | +                        }
     187 | +                    }
     188 | +                }};
     189 | +                fill();
     190 | +                if (provider.ConsumeBool()) {
     191 | +                    (void)batch.ApproximateSize();
    


    l0rinc commented at 9:37 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    nit: we could assert something here, e.g. that the value is always >= 12 (kHeader)

    constexpr size_t WRITE_BATCH_HEADER{12}; // See `kHeader` in `db/write_batch.cc`
    ...
    assert(batch.ApproximateSize() >= WRITE_BATCH_HEADER);
    batch.Clear();
    assert(batch.ApproximateSize() == WRITE_BATCH_HEADER);
    

    l0rinc commented at 10:06 AM on April 16, 2026:

    Please resolve.

  31. in src/test/fuzz/dbwrapper.cpp:210 in b967e255e7
     205 | +                const uint16_t key{provider.ConsumeIntegral<uint16_t>()};
     206 | +                std::vector<uint8_t> value;
     207 | +                const bool found{dbw.Read(key, value)};
     208 | +                const auto it{oracle.find(key)};
     209 | +                assert(found == (it != oracle.end()));
     210 | +                if (found) assert(value == MakeValue(key, it->second));
    


    l0rinc commented at 9:46 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    Alternatively to make the found/not-found cases more symmetric, we could split based on whether we expect to find the value (like we did with obfiscation), something like:

    if (const auto it{oracle.find(key)}; it != oracle.end()) {
        assert(found && value == MakeValue(key, it->second));
    } else {
        assert(!found);
    }
    

    l0rinc commented at 10:06 AM on April 16, 2026:

    Please resolve.

  32. in src/test/fuzz/dbwrapper.cpp:244 in b967e255e7 outdated
     239 | +                        ++oracle_it;
     240 | +                    } else {
     241 | +                        assert(obfuscate);
     242 | +                        std::string key_str;
     243 | +                        assert(it->GetKey(key_str));
     244 | +                        assert(key_str == std::string("\000obfuscate_key", 14));
    


    l0rinc commented at 9:49 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    could we use the constant here (we could expose it via dbwrapper_private)?

    Thinking a bit more about this, this obfuscation is a bit intertwined with dbwrapper and it complicates the fuzzer since we can't assume the keys are always numbers and the code adds values to the database that aren't under our control, which hinders what we can assert. I don't think we can safely disable it (without introducing a heavy bias), but we can probably separate it from the rest of the iteration at least:

    std::optional<uint16_t> GetIteratorUserKey(CDBIterator& it, bool obfuscate)
    {
        struct RawKeyBytes {
            std::vector<std::byte> key;
    
            void Unserialize(DataStream& s)
            {
                key.resize(s.size());
                if (!key.empty()) s.read(std::span{key});
            }
        };
    
        // The harness only writes uint16_t keys, but an obfuscated DB also stores
        // the internal obfuscation metadata entry under a different key type.
        RawKeyBytes raw_key;
        assert(it.GetKey(raw_key));
        if (raw_key.key.size() != sizeof(uint16_t)) {
            assert(obfuscate);
            std::string key_str;
            assert(it.GetKey(key_str));
            assert(key_str == dbwrapper_private::GetObfuscationKey());
            return std::nullopt;
        }
        return ReadLE16(raw_key.key.data());
    }
    

    Which could help with the VerifyIterator to avoid the iteration falling back only when it disagrees with the oracle.


    andrewtoth commented at 2:08 AM on April 16, 2026:

    I don't think this is worth the extra code complexity, just to pull out this constant. This PR touches very minimal production code, and I'd prefer to keep it like that.


    l0rinc commented at 10:06 AM on April 16, 2026:

    This PR touches very minimal production code, and I'd prefer to keep it like that.

    Sure, does that contradict my suggestion above? The point wasn’t necessarily just to reuse the constant. It was to separate the internal obfuscation-key path from normal user-key iteration so the iterator logic does not only recognize the metadata entry when it disagrees with the oracle.

  33. in src/test/fuzz/dbwrapper.cpp:261 in b967e255e7
     256 | +                }
     257 | +            },
     258 | +            [&] {
     259 | +                uint16_t k1{provider.ConsumeIntegral<uint16_t>()};
     260 | +                uint16_t k2{provider.ConsumeIntegral<uint16_t>()};
     261 | +                if (k1 > k2) std::swap(k1, k2);
    


    l0rinc commented at 9:53 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    Shouldn't we use the comparator here? And use std::minmax to avoid the swap, something like:

    const auto [k1, k2]{std::minmax({provider.ConsumeIntegral<uint16_t>(), provider.ConsumeIntegral<uint16_t>()}, LevelDBBytewiseU16Cmp{})};
    const size_t estimate_size{dbw.EstimateSize(k1, k2)};
    assert((estimate_size == 0) == (k1 == k2));
    

    but that seems to fail sometimes for me locally, not sure how EstimateSize can be 0 otherwise, but maybe if (k1 == k2) assert(estimate_size == 0); is a better assertion: https://github.com/bitcoin/bitcoin/blob/20a6babfa9a66f5432ef19c6c433b4357560f853/src/leveldb/include/leveldb/db.h#L133

  34. in src/test/fuzz/dbwrapper.cpp:265 in b967e255e7
     260 | +                uint16_t k2{provider.ConsumeIntegral<uint16_t>()};
     261 | +                if (k1 > k2) std::swap(k1, k2);
     262 | +                (void)dbw.EstimateSize(k1, k2);
     263 | +            },
     264 | +            [&] {
     265 | +                (void)dbw.DynamicMemoryUsage();
    


    l0rinc commented at 9:53 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    this returns 0 when it fails - can we guard against that? I don't have a concrete suggestion here - except modifying DynamicMemoryUsage to return an std::optional but that would be outside the scope of this change.


    l0rinc commented at 10:09 AM on April 16, 2026:

    Can be resolved.

  35. in src/test/fuzz/dbwrapper.cpp:138 in b967e255e7
     133 | +    const bool obfuscate{provider.ConsumeBool()};
     134 | +    const size_t cache_bytes{provider.ConsumeIntegralInRange<size_t>(64 << 10, 1_MiB)};
     135 | +    const size_t max_file_size{provider.ConsumeIntegralInRange<size_t>(1_MiB, 4_MiB)};
     136 | +
     137 | +    CDBWrapper dbw{{
     138 | +        .path = "",
    


    l0rinc commented at 10:21 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

            .path = "dbwrapper_fuzz",
    
  36. in src/test/fuzz/dbwrapper.cpp:115 in b967e255e7
     110 | +    }
     111 | +};
     112 | +
     113 | +struct FailDeserialize {
     114 | +    template <typename Stream>
     115 | +    void Unserialize(Stream&) { throw std::runtime_error{"intentional"}; }
    


    l0rinc commented at 10:30 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    Alternatively, since ios_base::failure would be the natural failure mode for streams:

        void Unserialize(Stream&) { throw std::ios_base::failure{"always fail"}; }
    
  37. in src/test/fuzz/dbwrapper.cpp:120 in b967e255e7
     115 | +    void Unserialize(Stream&) { throw std::runtime_error{"intentional"}; }
     116 | +};
     117 | +
     118 | +void initialize_dbwrapper()
     119 | +{
     120 | +    LogInstance().DisableLogging();
    


    l0rinc commented at 10:31 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    When are we using this and when static const auto testing_setup{MakeNoLogFileContext<>()};?

    Seems the context is more general and we can probably inline it:

    FUZZ_TARGET(dbwrapper, .init = [] { static auto setup{MakeNoLogFileContext<>()}; })
    
  38. in src/test/fuzz/dbwrapper.cpp:127 in b967e255e7 outdated
     122 | +
     123 | +} // namespace
     124 | +
     125 | +FUZZ_TARGET(dbwrapper, .init = initialize_dbwrapper)
     126 | +{
     127 | +    SeedRandomStateForTest(SeedRand::ZEROS);
    


    l0rinc commented at 10:35 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    This seems to be needed even though the default is already https://github.com/bitcoin/bitcoin/blob/261d229455e30fee874a5215a618adf6210cbe3a/src/test/fuzz/fuzz.cpp#L100 - whatever...


    l0rinc commented at 10:09 AM on April 16, 2026:

    Resolve please.

  39. in src/test/fuzz/dbwrapper.cpp:222 in b967e255e7
     217 | +                const uint16_t key{provider.ConsumeIntegral<uint16_t>()};
     218 | +                FailDeserialize wrong_type;
     219 | +                assert(!dbw.Read(key, wrong_type));
     220 | +            },
     221 | +            [&] {
     222 | +                const std::unique_ptr<CDBIterator> it{dbw.NewIterator()};
    


    l0rinc commented at 10:56 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    This seems like a very important invariant that we could extract to a helper and use it more often, e.g.:

                [&] {
                    const auto seek_key{provider.ConsumeBool() ? std::nullopt : std::optional{provider.ConsumeIntegral<uint16_t>()}};
                    VerifyIterator(dbw, oracle, obfuscate, seek_key);
                },
    

    and at the very end:

        // The fuzzer may end without choosing another iterator pass after the last
        // deferred compaction step, so drain and verify one final time here.
        det_env.DrainWork();
        VerifyIterator(dbw, oracle, obfuscate);
    

    andrewtoth commented at 10:27 PM on April 15, 2026:

    #34887 (review)

    The fuzzer may or may not choose another iterator pass based on coverage. We don't need to add this code at the end if there is already a way for the fuzzer to reach it somewhere else.


    l0rinc commented at 10:10 AM on April 16, 2026:

    Yes, the fuzzer may reach that path elsewhere, but making it a final invariant is still useful: the state should be valid at the end of every run regardless of which branches were taken.


    andrewtoth commented at 1:04 PM on April 16, 2026:

    making it a final invariant is still useful

    Not really. If there is deterministic input that causes the state to be invalid at the end of the run, then the fuzzer will be able to reach that invalid state again and trivially add the branch to reach the iterator check above.


    l0rinc commented at 4:20 PM on April 16, 2026:

    Ok, seems we're talking past each other here, will unsubscribe and leave it to others to review.

  40. in src/test/fuzz/dbwrapper.cpp:134 in b967e255e7
     129 | +
     130 | +    const auto memenv{std::unique_ptr<leveldb::Env>{leveldb::NewMemEnv(leveldb::Env::Default())}};
     131 | +    DeterministicEnv det_env{memenv.get()};
     132 | +
     133 | +    const bool obfuscate{provider.ConsumeBool()};
     134 | +    const size_t cache_bytes{provider.ConsumeIntegralInRange<size_t>(64 << 10, 1_MiB)};
    


    l0rinc commented at 11:09 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    These are often bigger in reality, can we add a few meaningful values here instead?

    const size_t cache_bytes{provider.PickValueInArray<std::size_t>({64_KiB, 1_MiB, 2_MiB})};
    

    or if we want to check arbitrary values to avoid introducing any bias:

     const size_t cache_bytes{provider.ConsumeIntegralInRange<size_t>(64_KiB, 4_MiB)};
    

    andrewtoth commented at 10:38 PM on April 15, 2026:

    We want to trigger memtable flushes. Larger values will not allow us to do that with our input size, so are not interesting. Similar to #34887 (review).


    l0rinc commented at 10:11 AM on April 16, 2026:

    so are not interesting

    I get the goal of keeping values small enough to trigger flushes cheaply, but I still think production-relevant values should be sampled some of the time, especially the real 2_MiB block-tree setting.

  41. in src/test/fuzz/dbwrapper.cpp:98 in b967e255e7
      93 | + *  controls only the key and a 16-bit length; the actual content is
      94 | + *  derived from a fixed pattern and doubled so that fuzz bytes aren't
      95 | + *  wasted on bulk data. */
      96 | +std::vector<uint8_t> MakeValue(uint16_t key, uint16_t len)
      97 | +{
      98 | +    std::vector<uint8_t> v(static_cast<size_t>(len) * 2);
    


    l0rinc commented at 11:29 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    Not sure I understand why MakeValue is always even - and if we're multiplying MAX_VALUE_SIZE by 2 it's not the actual max size, right?


    andrewtoth commented at 2:09 AM on April 16, 2026:

    Made it 3 instead of 2 so it can be odd. Also updated the comment to clarify MAX_VALUE_SIZE.


    l0rinc commented at 10:03 AM on April 16, 2026:

    Smart, this fixes the "always even" concern, but it replaces it with a stronger size bias (e.g. no prime numbers, powers of 2, etc). What's the reason for multiplying at all? Why not just

        std::vector<uint8_t> v(len);
    

    ? Please unresolve.

  42. in src/test/fuzz/dbwrapper.cpp:219 in b967e255e7
     214 | +                assert(dbw.Exists(key) == oracle.contains(key));
     215 | +            },
     216 | +            [&] {
     217 | +                const uint16_t key{provider.ConsumeIntegral<uint16_t>()};
     218 | +                FailDeserialize wrong_type;
     219 | +                assert(!dbw.Read(key, wrong_type));
    


    l0rinc commented at 11:40 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    This would already be false without the failure since the key is very unlikely to be present, maybe we could extend it to:

    const uint16_t key{!oracle.empty() && provider.ConsumeBool() ? PickExistingKey(provider, oracle) : provider.ConsumeIntegral<uint16_t>()};
    

    andrewtoth commented at 10:28 PM on April 15, 2026:

    l0rinc commented at 10:13 AM on April 16, 2026:

    The point here is to exercise the deserialization-failure path, not the much more common key-not-found path, so I still think this branch should bias toward existing keys when the oracle is non-empty, see:


    andrewtoth commented at 1:01 PM on April 16, 2026:

    Using words like "unlikely" and "bias" don't make sense when discussing a fuzz harness. The fuzzer will choose keys that are present or not, based on coverage data. It does not choose inputs randomly. We don't want to introduce bias, we want to make it easy for the fuzzer to find all paths.


    l0rinc commented at 1:06 PM on April 16, 2026:

    Using words like "unlikely" and "bias" don't make sense when discussing a fuzz harness.

    We don't want to introduce bias, we want to make it easy for the fuzzer to find all paths.

    We're always introducing bias; I'd rather bias it towards the more realistic uses instead of uniformity. But this comment is not about that, it's about this branch being incorrect since it exercises the not-found path, not the deserialization-error path.


    andrewtoth commented at 1:14 PM on April 16, 2026:

    This is about bias. You are suggesting that it is "unlikely" the value will be present in this codepath. That is not how fuzzers work. The fuzzer will choose a key that is present, and then exercise the deserialization-error path.


    l0rinc commented at 4:14 PM on April 16, 2026:

    You are suggesting that it is "unlikely" the value will be present in this codepath

    I instrumented the FailUnserialize branch to distinguish Read() returning false because the key is absent vs because deserialization throws. Running the dbwrapper fuzzer for 766,281 executions showed 46,127 missing-key hits and 3,041 actual deserialization-failure hits (and 4144181 runs show dbwrapper FailUnserialize stats: missing_key=573326 deserialize_failure=45901). So while libFuzzer does discover the intended path, about 94% of the observed hits to this branch are still the missing-key case rather than the deserialization-failure case. This is what I meant by unlikely.

  43. in src/test/fuzz/dbwrapper.cpp:195 in b967e255e7 outdated
     190 | +                if (provider.ConsumeBool()) {
     191 | +                    (void)batch.ApproximateSize();
     192 | +                    batch.Clear();
     193 | +                    batch_writes.clear();
     194 | +                    batch_erases.clear();
     195 | +                    fill();
    


    l0rinc commented at 11:46 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    after batch.Clear(), the harness could avoid always refilling the batch:

                        if (provider.ConsumeBool()) fill();
    

    andrewtoth commented at 10:28 PM on April 15, 2026:

    This is covered already by the fill() looping zero times.


    l0rinc commented at 10:02 AM on April 16, 2026:

    Good point!

  44. in src/test/fuzz/dbwrapper.cpp:137 in b967e255e7
     132 | +
     133 | +    const bool obfuscate{provider.ConsumeBool()};
     134 | +    const size_t cache_bytes{provider.ConsumeIntegralInRange<size_t>(64 << 10, 1_MiB)};
     135 | +    const size_t max_file_size{provider.ConsumeIntegralInRange<size_t>(1_MiB, 4_MiB)};
     136 | +
     137 | +    CDBWrapper dbw{{
    


    l0rinc commented at 11:53 AM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    I would expect some bugs occurring when we randomly close&reopen this - could we add such a branch?

        const auto make_db{[&] {
            return std::make_unique<CDBWrapper>(DBParams{
                .path = "dbwrapper_fuzz",
                .cache_bytes = cache_bytes,
                .obfuscate = obfuscate,
                .testing_env = &det_env,
                .max_file_size = max_file_size,
            });
        }};
        std::unique_ptr<CDBWrapper> dbw{make_db()};
    
                [&] {
                    if (provider.ConsumeBool()) det_env.DrainWork();
                    dbw.reset();
                    dbw = make_db({.force_compact = provider.ConsumeBool()});
                    VerifyIterator(*dbw, oracle, obfuscate);
                },
    

    (we could even exercise force_compact this way)


    andrewtoth commented at 2:10 AM on April 16, 2026:

    Added reopening the db, but we can't exercise force_compact since it needs a background compaction. It deadlocks on DeterminiticEnv. If we introduced a background thread it wouldn't really be Deterministic anymore.


    l0rinc commented at 10:28 AM on April 16, 2026:

    we can't exercise force_compact since it needs a background compaction

    Sounds like a something we should investigate, without a background compaction we're missing out on a fundamental behavior difference that we're not checking at all. Reopen with force_compact is one of the more interesting remaining constructor paths. If deterministic compaction makes it impossible here, I think that should at least be called out explicitly.

  45. in src/test/fuzz/dbwrapper.cpp:229 in b967e255e7 outdated
     224 | +                if (provider.ConsumeBool()) {
     225 | +                    const uint16_t seek_key{provider.ConsumeIntegral<uint16_t>()};
     226 | +                    it->Seek(seek_key);
     227 | +                    oracle_it = oracle.lower_bound(seek_key);
     228 | +                } else {
     229 | +                    it->SeekToFirst();
    


    l0rinc commented at 12:24 PM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    We know the lowest key, we could unify the two branches to:

        it->Seek(seek_key.value_or(uint16_t{0}));
    

    (IsEmpty already covers SeekToFirst)


    andrewtoth commented at 10:29 PM on April 15, 2026:

    This is specifically to cover SeekToFirst.


    l0rinc commented at 6:38 AM on April 16, 2026:

    As mentioned, IsEmpty() already covers SeekToFirst(), so I still think unifying this to Seek(seek_key.value_or(uint16_t{0})) would be fine.

  46. in src/test/fuzz/dbwrapper.cpp:154 in b967e255e7
     149 | +    {
     150 | +        CallOneOf(
     151 | +            provider,
     152 | +            // --- Mutations ---
     153 | +            [&] {
     154 | +                const uint16_t key{provider.ConsumeIntegral<uint16_t>()};
    


    l0rinc commented at 12:39 PM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    This is repeated very often - we could add a dedicated helper to hide the details and make this inlineable:

    uint16_t ConsumeKey(FuzzedDataProvider& provider) { return provider.ConsumeIntegral<uint16_t>(); }
    uint16_t ConsumeValueLength(FuzzedDataProvider& provider) { return provider.ConsumeIntegralInRange<uint16_t>(0, MAX_VALUE_SIZE); }
    bool ConsumeSync(FuzzedDataProvider& provider) { return provider.ConsumeIntegralInRange<uint8_t>(0, 7) == 0; }
    uint16_t PickExistingKey(FuzzedDataProvider& provider, const std::map<uint16_t, uint16_t, LevelDBBytewiseU16Cmp>& oracle)
    {
        assert(!oracle.empty());
        auto it{oracle.begin()};
        std::advance(it, provider.ConsumeIntegralInRange<size_t>(0, oracle.size() - 1));
        return it->first;
    }
    
  47. in src/test/fuzz/dbwrapper.cpp:176 in b967e255e7
     171 | +                std::set<uint16_t> batch_erases;
     172 | +                const auto fill{[&] {
     173 | +                    LIMITED_WHILE(provider.ConsumeBool(), 20)
     174 | +                    {
     175 | +                        if (provider.ConsumeBool()) {
     176 | +                            const uint16_t key{provider.ConsumeIntegral<uint16_t>()};
    


    l0rinc commented at 12:45 PM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    key can be deduplicated in this branch

  48. in src/test/fuzz/dbwrapper.cpp:256 in b967e255e7
     251 | +            [&] {
     252 | +                if (!obfuscate) {
     253 | +                    assert(dbw.IsEmpty() == oracle.empty());
     254 | +                } else {
     255 | +                    assert(!dbw.IsEmpty());
     256 | +                }
    


    l0rinc commented at 12:50 PM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    We could simplify this to a single assert:

                    assert(dbw->IsEmpty() == (oracle.empty() && !obfuscate));
    
  49. in src/test/fuzz/dbwrapper.cpp:49 in b967e255e7
      44 | +    struct Work {
      45 | +        void (*function)(void*);
      46 | +        void* arg;
      47 | +    };
      48 | +
      49 | +    mutable Mutex m_mutex;
    


    l0rinc commented at 12:53 PM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    We could avoid the mutable here if we make bool HasPending() EXCLUSIVE_LOCKS_REQUIRED(!m_mutex) non-const.

  50. in src/test/fuzz/dbwrapper.cpp:45 in b967e255e7
      40 | + * all pending work before every write to avoid a single-threaded deadlock.
      41 | + */
      42 | +class DeterministicEnv final : public leveldb::EnvWrapper
      43 | +{
      44 | +    struct Work {
      45 | +        void (*function)(void*);
    


    l0rinc commented at 1:12 PM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    We could dedup and name this type:

    using WorkFunction = void (*)(void*);
    ..
    struct Work {
        WorkFunction function;
        void* arg;
    };
    ...
    void Schedule(WorkFunction function, void* arg) override EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    
  51. in src/test/fuzz/dbwrapper.cpp:65 in b967e255e7
      60 | +
      61 | +    bool HasPending() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
      62 | +    {
      63 | +        LOCK(m_mutex);
      64 | +        return !m_queue.empty();
      65 | +    }
    


    l0rinc commented at 1:13 PM on April 14, 2026:

    b967e25 test: add fuzz harness for CDBWrapper:

    WITH_LOCK would make this a one-liner:

        bool HasPending() EXCLUSIVE_LOCKS_REQUIRED(!m_mutex) { return WITH_LOCK(m_mutex, return !m_queue.empty()); }
    

    but if we adjust RunOne to return a bool, we can even get rid of it:

    /** Execute one pending background task. The task may schedule a
     *  successor which is left pending for a later call. */
    bool RunOne() EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    {
        Work work{nullptr, nullptr};
        {
            LOCK(m_mutex);
            if (m_queue.empty()) return false;
            work = m_queue.front();
            m_queue.pop_front();
        }
        work.function(work.arg);
        return true;
    }
    
    /** Execute pending background tasks until none remain. */
    void DrainWork() EXCLUSIVE_LOCKS_REQUIRED(!m_mutex) { while (RunOne()) {} }
    
  52. l0rinc changes_requested
  53. l0rinc commented at 1:34 PM on April 14, 2026: contributor

    Concept ACK

    Thanks for tackling this, a dedicated CDBWrapper fuzzer helps with increasing our confidence as we iterate towards #31132.

    I have spent more time with this review than anticipated (again), the main themes of my suggestions are:

    • make the wrapper/env behavior more explicit and covered with focused unit tests, especially injected testing_env precedence and wipe_data
    • keep the fuzz inputs closer to meaningful real values and cheaper execution paths where possible
    • split the large fuzz commit into a few smaller buildable steps so the basic harness can be reviewed separately from the ordering / iterator / reopen logic
    • simplify and reuse helpers in the fuzzer where we can, and make the iterator / obfuscation handling more explicit and robust

    <details><summary>Local changes during review</summary>

    diff --git a/src/dbwrapper.cpp b/src/dbwrapper.cpp
    index f1884c6cf4..3ca8fce9d7 100644
    --- a/src/dbwrapper.cpp
    +++ b/src/dbwrapper.cpp
    @@ -229,13 +229,16 @@ CDBWrapper::CDBWrapper(const DBParams& params)
         } else if (params.memory_only) {
             DBContext().penv = leveldb::NewMemEnv(leveldb::Env::Default());
             DBContext().options.env = DBContext().penv;
    -    } else {
    +    }
    +    if (!params.memory_only) {
             if (params.wipe_data) {
                 LogInfo("Wiping LevelDB in %s", fs::PathToString(params.path));
                 leveldb::Status result = leveldb::DestroyDB(fs::PathToString(params.path), DBContext().options);
                 HandleError(result);
             }
    -        TryCreateDirectories(params.path);
    +        if (!params.testing_env) {
    +            TryCreateDirectories(params.path);
    +        }
             LogInfo("Opening LevelDB in %s", fs::PathToString(params.path));
         }
         // PathToString() return value is safe to pass to leveldb open function,
    @@ -392,4 +395,9 @@ const Obfuscation& GetObfuscation(const CDBWrapper& w)
         return w.m_obfuscation;
     }
     
    +const std::string& GetObfuscationKey()
    +{
    +    return CDBWrapper::OBFUSCATION_KEY;
    +}
    +
     } // namespace dbwrapper_private
    diff --git a/src/dbwrapper.h b/src/dbwrapper.h
    index 3e81a27e37..4b24fefb94 100644
    --- a/src/dbwrapper.h
    +++ b/src/dbwrapper.h
    @@ -73,6 +73,7 @@ namespace dbwrapper_private {
      * specific database.
      */
     const Obfuscation& GetObfuscation(const CDBWrapper&);
    +const std::string& GetObfuscationKey();
     }; // namespace dbwrapper_private
     
     bool DestroyDB(const std::string& path_str);
    @@ -188,6 +189,7 @@ struct LevelDBContext;
     class CDBWrapper
     {
         friend const Obfuscation& dbwrapper_private::GetObfuscation(const CDBWrapper&);
    +    friend const std::string& dbwrapper_private::GetObfuscationKey();
     private:
         //! holds all leveldb-specific fields of this class
         std::unique_ptr<LevelDBContext> m_db_context;
    diff --git a/src/test/CMakeLists.txt b/src/test/CMakeLists.txt
    index 1365d6c147..5e61b40ae6 100644
    --- a/src/test/CMakeLists.txt
    +++ b/src/test/CMakeLists.txt
    @@ -170,6 +170,7 @@ target_link_libraries(test_bitcoin
       libevent::extra
       $<TARGET_NAME_IF_EXISTS:USDT::headers>
     )
    +target_include_directories(test_bitcoin PRIVATE ${PROJECT_SOURCE_DIR}/src/leveldb/include)
     
     if(ENABLE_WALLET)
       add_subdirectory(${PROJECT_SOURCE_DIR}/src/wallet/test wallet)
    diff --git a/src/test/dbwrapper_tests.cpp b/src/test/dbwrapper_tests.cpp
    index 3896ea64da..dfd1f73c78 100644
    --- a/src/test/dbwrapper_tests.cpp
    +++ b/src/test/dbwrapper_tests.cpp
    @@ -3,6 +3,8 @@
     // file COPYING or http://www.opensource.org/licenses/mit-license.php.
     
     #include <dbwrapper.h>
    +#include <leveldb/env.h>
    +#include <leveldb/helpers/memenv/memenv.h>
     #include <test/util/common.h>
     #include <test/util/random.h>
     #include <test/util/setup_common.h>
    @@ -267,6 +269,32 @@ BOOST_AUTO_TEST_CASE(existing_data_no_obfuscate)
         BOOST_CHECK_EQUAL(res3.ToString(), in2.ToString());
     }
     
    +BOOST_AUTO_TEST_CASE(injected_environment)
    +{
    +    std::unique_ptr<leveldb::Env> env{leveldb::NewMemEnv(leveldb::Env::Default())};
    +    const fs::path ph{m_args.GetDataDirBase() / "dbwrapper_injected_environment"};
    +    const uint8_t key{'k'};
    +    const uint256 value{m_rng.rand256()};
    +    uint256 read_value;
    +
    +    {
    +        CDBWrapper dbw({.path = ph, .cache_bytes = 1_MiB, .wipe_data = true, .testing_env = env.get()});
    +        dbw.Write(key, value);
    +    }
    +
    +    {
    +        CDBWrapper dbw({.path = ph, .cache_bytes = 1_MiB, .memory_only = true, .testing_env = env.get()});
    +        BOOST_CHECK(dbw.Read(key, read_value));
    +        BOOST_CHECK_EQUAL(read_value, value);
    +    }
    +
    +    {
    +        CDBWrapper dbw({.path = ph, .cache_bytes = 1_MiB, .wipe_data = true, .testing_env = env.get()});
    +        BOOST_CHECK(dbw.IsEmpty());
    +        BOOST_CHECK(!dbw.Read(key, read_value));
    +    }
    +}
    +
     // Ensure that we start obfuscating during a reindex.
     BOOST_AUTO_TEST_CASE(existing_data_reindex)
     {
    diff --git a/src/test/fuzz/dbwrapper.cpp b/src/test/fuzz/dbwrapper.cpp
    index 433ada799c..5fe2ad69c9 100644
    --- a/src/test/fuzz/dbwrapper.cpp
    +++ b/src/test/fuzz/dbwrapper.cpp
    @@ -3,12 +3,13 @@
     // file COPYING or http://www.opensource.org/licenses/mit-license.php.
     
     #include <dbwrapper.h>
    -#include <logging.h>
    +#include <compat/byteswap.h>
    +#include <crypto/common.h>
     #include <sync.h>
     #include <test/fuzz/FuzzedDataProvider.h>
     #include <test/fuzz/fuzz.h>
     #include <test/fuzz/util.h>
    -#include <test/util/random.h>
    +#include <test/util/setup_common.h>
     #include <util/byte_units.h>
     
     #include <leveldb/env.h>
    @@ -21,7 +22,9 @@
     #include <map>
     #include <memory>
     #include <numeric>
    +#include <optional>
     #include <set>
    +#include <string>
     #include <vector>
     
     namespace {
    @@ -41,53 +44,46 @@ namespace {
      */
     class DeterministicEnv final : public leveldb::EnvWrapper
     {
    +    using WorkFunction = void (*)(void*);
    +
         struct Work {
    -        void (*function)(void*);
    +        WorkFunction function;
             void* arg;
         };
     
    -    mutable Mutex m_mutex;
    +    Mutex m_mutex;
         std::deque<Work> m_queue GUARDED_BY(m_mutex);
     
     public:
         explicit DeterministicEnv(leveldb::Env* base) : EnvWrapper(base) {}
     
    -    void Schedule(void (*function)(void* arg), void* arg) override EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    +    void Schedule(WorkFunction function, void* arg) override EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
         {
             LOCK(m_mutex);
             m_queue.push_back({function, arg});
         }
     
    -    bool HasPending() const EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    -    {
    -        LOCK(m_mutex);
    -        return !m_queue.empty();
    -    }
    -
         /** Execute one pending background task. The task may schedule a
          *  successor which is left pending for a later call. */
    -    void RunOne() EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    +    bool RunOne() EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
         {
             Work work{nullptr, nullptr};
             {
                 LOCK(m_mutex);
    -            if (m_queue.empty()) return;
    +            if (m_queue.empty()) return false;
                 work = m_queue.front();
                 m_queue.pop_front();
             }
             work.function(work.arg);
    +        return true;
         }
     
         /** Execute pending background tasks until none remain. */
    -    void DrainWork() EXCLUSIVE_LOCKS_REQUIRED(!m_mutex)
    -    {
    -        while (HasPending()) {
    -            RunOne();
    -        }
    -    }
    +    void DrainWork() EXCLUSIVE_LOCKS_REQUIRED(!m_mutex) { while (RunOne()) {} }
     };
     
     constexpr size_t MAX_VALUE_SIZE{4096};
    +constexpr size_t WRITE_BATCH_HEADER{12}; // See `kHeader` in `db/write_batch.cc`
     
     /** Generate a deterministic value from key and length. The fuzz input
      *  controls only the key and a 16-bit length; the actual content is
    @@ -100,86 +96,140 @@ std::vector<uint8_t> MakeValue(uint16_t key, uint16_t len)
         return v;
     }
     
    -/** Comparator matching LevelDB's lexicographic order on little-endian
    - *  serialized uint16_t keys (low byte first, then high byte). */
    -struct SerializedU16Cmp {
    -    bool operator()(uint16_t a, uint16_t b) const
    -    {
    -        const auto swap{[](uint16_t v) -> uint16_t { return static_cast<uint16_t>((v << 8) | (v >> 8)); }};
    -        return swap(a) < swap(b);
    -    }
    +/** Equivalent to `leveldb::BytewiseComparator()` on 2-byte little-endian
    + *  serialized uint16_t keys, while keeping the oracle keyed by uint16_t. */
    +struct LevelDBBytewiseU16Cmp {
    +    bool operator()(uint16_t a, uint16_t b) const { return internal_bswap_16(a) < internal_bswap_16(b); }
     };
     
    -struct FailDeserialize {
    +uint16_t ConsumeKey(FuzzedDataProvider& provider) { return provider.ConsumeIntegral<uint16_t>(); }
    +uint16_t ConsumeValueLength(FuzzedDataProvider& provider) { return provider.ConsumeIntegralInRange<uint16_t>(0, MAX_VALUE_SIZE); }
    +bool ConsumeSync(FuzzedDataProvider& provider) { return provider.ConsumeIntegralInRange<uint8_t>(0, 7) == 0; }
    +uint16_t PickExistingKey(FuzzedDataProvider& provider, const std::map<uint16_t, uint16_t, LevelDBBytewiseU16Cmp>& oracle)
    +{
    +    assert(!oracle.empty());
    +    auto it{oracle.begin()};
    +    std::advance(it, provider.ConsumeIntegralInRange<size_t>(0, oracle.size() - 1));
    +    return it->first;
    +}
    +
    +struct FailUnserialize {
         template <typename Stream>
    -    void Unserialize(Stream&) { throw std::runtime_error{"intentional"}; }
    +    void Unserialize(Stream&) { throw std::ios_base::failure{"always fail"}; }
     };
     
    -void initialize_dbwrapper()
    +std::optional<uint16_t> GetIteratorUserKey(CDBIterator& it, bool obfuscate)
    +{
    +    struct RawKeyBytes {
    +        std::vector<std::byte> key;
    +
    +        void Unserialize(DataStream& s)
    +        {
    +            key.resize(s.size());
    +            if (!key.empty()) s.read(std::span{key});
    +        }
    +    };
    +
    +    // The harness only writes uint16_t keys, but an obfuscated DB also stores
    +    // the internal obfuscation metadata entry under a different key type.
    +    RawKeyBytes raw_key;
    +    assert(it.GetKey(raw_key));
    +    if (raw_key.key.size() != sizeof(uint16_t)) {
    +        assert(obfuscate);
    +        std::string key_str;
    +        assert(it.GetKey(key_str));
    +        assert(key_str == dbwrapper_private::GetObfuscationKey());
    +        return std::nullopt;
    +    }
    +    return ReadLE16(raw_key.key.data());
    +}
    +
    +/** Compare the DB iterator against the oracle, allowing only the obfuscation
    + *  metadata entry as an extra DB record when obfuscation is enabled. */
    +void VerifyIterator(CDBWrapper& dbw, const std::map<uint16_t, uint16_t, LevelDBBytewiseU16Cmp>& oracle, bool obfuscate, std::optional<uint16_t> seek_key = std::nullopt)
     {
    -    LogInstance().DisableLogging();
    +    const std::unique_ptr<CDBIterator> it{dbw.NewIterator()};
    +    auto oracle_it{seek_key ? oracle.lower_bound(*seek_key) : oracle.begin()};
    +    size_t entry_count{0};
    +    it->Seek(seek_key.value_or(uint16_t{0}));
    +    for (; it->Valid(); it->Next()) {
    +        if (const auto db_key{GetIteratorUserKey(*it, obfuscate)}) {
    +            assert(oracle_it != oracle.end());
    +            assert(*db_key == oracle_it->first);
    +
    +            std::vector<uint8_t> db_value;
    +            assert(it->GetValue(db_value));
    +            assert(db_value == MakeValue(*db_key, oracle_it->second));
    +            ++oracle_it;
    +        }
    +        ++entry_count;
    +    }
    +    assert(oracle_it == oracle.end());
    +    if (!seek_key) assert(entry_count == oracle.size() + obfuscate);
     }
     
     } // namespace
     
    -FUZZ_TARGET(dbwrapper, .init = initialize_dbwrapper)
    +FUZZ_TARGET(dbwrapper, .init = [] { static auto setup{MakeNoLogFileContext<>()}; })
     {
         SeedRandomStateForTest(SeedRand::ZEROS);
         FuzzedDataProvider provider{buffer.data(), buffer.size()};
     
         const auto memenv{std::unique_ptr<leveldb::Env>{leveldb::NewMemEnv(leveldb::Env::Default())}};
         DeterministicEnv det_env{memenv.get()};
    +    leveldb::EnvWrapper* const testing_env{&det_env};
     
    -    const bool obfuscate{provider.ConsumeBool()};
    -    const size_t cache_bytes{provider.ConsumeIntegralInRange<size_t>(64 << 10, 1_MiB)};
    -    const size_t max_file_size{provider.ConsumeIntegralInRange<size_t>(1_MiB, 4_MiB)};
    +    const bool obfuscate{provider.ConsumeIntegralInRange<uint8_t>(0, 10) != 0};
    +    const size_t cache_bytes{provider.ConsumeIntegralInRange<size_t>(64 << 10, 2_MiB)};
    +    const size_t max_file_size{provider.PickValueInArray<std::size_t>({2_MiB, DBWRAPPER_MAX_FILE_SIZE})};
     
    -    CDBWrapper dbw{{
    -        .path = "",
    -        .cache_bytes = cache_bytes,
    -        .obfuscate = obfuscate,
    -        .testing_env = &det_env,
    -        .max_file_size = max_file_size,
    +    const auto make_db{[&](DBOptions options = {}) {
    +        return std::make_unique<CDBWrapper>(DBParams{
    +            .path = "dbwrapper_fuzz",
    +            .cache_bytes = cache_bytes,
    +            .obfuscate = obfuscate,
    +            .options = options,
    +            .testing_env = testing_env,
    +            .max_file_size = max_file_size,
    +        });
         }};
    +    std::unique_ptr<CDBWrapper> dbw{make_db()};
     
         // Oracle: key → value length. Content is reconstructed via MakeValue().
    -    std::map<uint16_t, uint16_t, SerializedU16Cmp> oracle;
    +    std::map<uint16_t, uint16_t, LevelDBBytewiseU16Cmp> oracle;
     
    -    LIMITED_WHILE(provider.ConsumeBool(), 10000)
    +    LIMITED_WHILE(provider.ConsumeBool(), 10'000)
         {
             CallOneOf(
                 provider,
                 // --- Mutations ---
                 [&] {
    -                const uint16_t key{provider.ConsumeIntegral<uint16_t>()};
    -                const uint16_t len{provider.ConsumeIntegralInRange<uint16_t>(0, MAX_VALUE_SIZE)};
    -                const bool sync{provider.ConsumeBool()};
    +                const auto key{ConsumeKey(provider)};
    +                const auto len{ConsumeValueLength(provider)};
                     det_env.DrainWork();
    -                dbw.Write(key, MakeValue(key, len), sync);
    +                dbw->Write(key, MakeValue(key, len), ConsumeSync(provider));
                     oracle[key] = len;
                 },
                 [&] {
    -                const uint16_t key{provider.ConsumeIntegral<uint16_t>()};
    -                const bool sync{provider.ConsumeBool()};
    +                const auto key{ConsumeKey(provider)};
                     det_env.DrainWork();
    -                dbw.Erase(key, sync);
    +                dbw->Erase(key, ConsumeSync(provider));
                     oracle.erase(key);
                 },
                 [&] {
    -                CDBBatch batch{dbw};
    +                CDBBatch batch{*dbw};
                     std::map<uint16_t, uint16_t> batch_writes;
                     std::set<uint16_t> batch_erases;
                     const auto fill{[&] {
                         LIMITED_WHILE(provider.ConsumeBool(), 20)
                         {
    +                        const auto key{ConsumeKey(provider)};
                             if (provider.ConsumeBool()) {
    -                            const uint16_t key{provider.ConsumeIntegral<uint16_t>()};
    -                            const uint16_t len{provider.ConsumeIntegralInRange<uint16_t>(0, MAX_VALUE_SIZE)};
    +                            const auto len{ConsumeValueLength(provider)};
                                 batch.Write(key, MakeValue(key, len));
                                 batch_writes[key] = len;
                                 batch_erases.erase(key);
                             } else {
    -                            const uint16_t key{provider.ConsumeIntegral<uint16_t>()};
                                 batch.Erase(key);
                                 batch_erases.insert(key);
                                 batch_writes.erase(key);
    @@ -188,81 +238,59 @@ FUZZ_TARGET(dbwrapper, .init = initialize_dbwrapper)
                     }};
                     fill();
                     if (provider.ConsumeBool()) {
    -                    (void)batch.ApproximateSize();
    +                    assert(batch.ApproximateSize() >= WRITE_BATCH_HEADER);
                         batch.Clear();
    +                    assert(batch.ApproximateSize() == WRITE_BATCH_HEADER);
                         batch_writes.clear();
                         batch_erases.clear();
    -                    fill();
    +                    if (provider.ConsumeBool()) fill();
                     }
    -                const bool sync{provider.ConsumeBool()};
                     det_env.DrainWork();
    -                dbw.WriteBatch(batch, sync);
    +                dbw->WriteBatch(batch, ConsumeSync(provider));
                     for (const auto& [k, v] : batch_writes) oracle[k] = v;
                     for (const auto& k : batch_erases) oracle.erase(k);
                 },
    +            [&] {
    +                det_env.DrainWork();
    +                dbw.reset();
    +                dbw = make_db({.force_compact = provider.ConsumeBool()});
    +                VerifyIterator(*dbw, oracle, obfuscate);
    +            },
                 // --- Reads ---
                 [&] {
    -                const uint16_t key{provider.ConsumeIntegral<uint16_t>()};
    +                const auto key{ConsumeKey(provider)};
                     std::vector<uint8_t> value;
    -                const bool found{dbw.Read(key, value)};
    -                const auto it{oracle.find(key)};
    -                assert(found == (it != oracle.end()));
    -                if (found) assert(value == MakeValue(key, it->second));
    +                const bool found{dbw->Read(key, value)};
    +                if (const auto it{oracle.find(key)}; it != oracle.end()) {
    +                    assert(found && value == MakeValue(key, it->second));
    +                } else {
    +                    assert(!found);
    +                }
                 },
                 [&] {
    -                const uint16_t key{provider.ConsumeIntegral<uint16_t>()};
    -                assert(dbw.Exists(key) == oracle.contains(key));
    +                const auto key{ConsumeKey(provider)};
    +                assert(dbw->Exists(key) == oracle.contains(key));
                 },
                 [&] {
    -                const uint16_t key{provider.ConsumeIntegral<uint16_t>()};
    -                FailDeserialize wrong_type;
    -                assert(!dbw.Read(key, wrong_type));
    +                const auto key{!oracle.empty() && provider.ConsumeBool() ? PickExistingKey(provider, oracle) : ConsumeKey(provider)};
    +                FailUnserialize wrong_type;
    +                assert(!dbw->Read(key, wrong_type));
                 },
                 [&] {
    -                const std::unique_ptr<CDBIterator> it{dbw.NewIterator()};
    -                auto oracle_it{oracle.begin()};
    -                if (provider.ConsumeBool()) {
    -                    const uint16_t seek_key{provider.ConsumeIntegral<uint16_t>()};
    -                    it->Seek(seek_key);
    -                    oracle_it = oracle.lower_bound(seek_key);
    -                } else {
    -                    it->SeekToFirst();
    -                }
    -                while (it->Valid()) {
    -                    uint16_t db_key;
    -                    if (!it->GetKey(db_key)) break;
    -
    -                    if (oracle_it != oracle.end() && db_key == oracle_it->first) {
    -                        std::vector<uint8_t> db_value;
    -                        assert(it->GetValue(db_value));
    -                        assert(db_value == MakeValue(db_key, oracle_it->second));
    -                        ++oracle_it;
    -                    } else {
    -                        assert(obfuscate);
    -                        std::string key_str;
    -                        assert(it->GetKey(key_str));
    -                        assert(key_str == std::string("\000obfuscate_key", 14));
    -                    }
    -                    it->Next();
    -                }
    -                assert(oracle_it == oracle.end());
    +                const auto seek_key{provider.ConsumeBool() ? std::nullopt : std::optional{ConsumeKey(provider)}};
    +                VerifyIterator(*dbw, oracle, obfuscate, seek_key);
                 },
                 // --- Stats ---
                 [&] {
    -                if (!obfuscate) {
    -                    assert(dbw.IsEmpty() == oracle.empty());
    -                } else {
    -                    assert(!dbw.IsEmpty());
    -                }
    +                assert(dbw->IsEmpty() == (oracle.empty() && !obfuscate));
                 },
                 [&] {
    -                uint16_t k1{provider.ConsumeIntegral<uint16_t>()};
    -                uint16_t k2{provider.ConsumeIntegral<uint16_t>()};
    -                if (k1 > k2) std::swap(k1, k2);
    -                (void)dbw.EstimateSize(k1, k2);
    +                const auto [k1, k2]{std::minmax({ConsumeKey(provider), ConsumeKey(provider)}, LevelDBBytewiseU16Cmp{})};
    +                const size_t estimate_size{dbw->EstimateSize(k1, k2)};
    +                if (k1 == k2) assert(estimate_size == 0);
                 },
                 [&] {
    -                (void)dbw.DynamicMemoryUsage();
    +                (void)dbw->DynamicMemoryUsage();
                 },
                 // --- Compaction control ---
                 [&] {
    @@ -270,5 +298,8 @@ FUZZ_TARGET(dbwrapper, .init = initialize_dbwrapper)
                 });
         }
     
    +    // The fuzzer may end without choosing another iterator pass after the last
    +    // deferred compaction step, so drain and verify one final time here.
         det_env.DrainWork();
    +    VerifyIterator(*dbw, oracle, obfuscate);
     }
    
    

    </details>

  54. dbwrapper: accept optional testing leveldb::Env in DBParams
    Allow callers to inject a custom leveldb::Env via DBParams::testing_env,
    which takes priority over the memory_only in-memory environment. This
    enables fuzz harnesses to supply a deterministic environment.
    49c0e670c2
  55. test: add fuzz harness for CDBWrapper
    Introduces a libFuzzer harness that exercises CDBWrapper operations
    against a std::map oracle, with a DeterministicEnv that captures LevelDB
    background compaction for single-threaded determinism.
    
    Adds implicit-integer-sign-change:leveldb/ to the test UBSAN suppressions.
    b11a6cd998
  56. andrewtoth force-pushed on Apr 16, 2026
  57. andrewtoth commented at 2:22 AM on April 16, 2026: contributor

    Thank you for your detailed review @l0rinc! I have taken many of your suggestions.

    a dedicated CDBWrapper fuzzer helps with increasing our confidence as we iterate towards #31132.

    I'm not sure I see how this is related to #31132, or what you mean by "iterate towards". If you have tangible suggestions for that PR, please share them there!

    make the wrapper/env behavior more explicit and covered with focused unit tests, especially injected testing_env precedence and wipe_data

    I'm not sure unit tests will add much value for this test only construct that is well covered by this fuzz harness. I opted not to take this.

    keep the fuzz inputs closer to meaningful real values and cheaper execution paths where possible

    I didn't take most of the suggestions regarding this theme. Real values will not allow us to execute many interesting code paths cheaply, in fact the opposite! Please see inline comments and let me know if you still disagree.

    split the large fuzz commit into a few smaller buildable steps so the basic harness can be reviewed separately from the ordering / iterator / reopen logic

    I did not take this suggestion. The first 2 commits touch production code. They are small and atomic so easy to review for correctness. The third large commit is adding the fuzz harness, which touches no other code. It is mostly one large CallOneOf loop that can be reviewed top to bottom. I don't see a benefit in splitting this up.

    simplify and reuse helpers in the fuzzer where we can, and make the iterator / obfuscation handling more explicit and robust

    I took many of these, others I have made comments inline.

    git diff b967e255e7620b8366ec1aeb8932138455be8701..b11a6cd998864defcc4e1e6ae9fb5eae7c6a6e3e

  58. l0rinc changes_requested
  59. sedited requested review from l0rinc on Apr 23, 2026
  60. sedited requested review from marcofleon on Apr 23, 2026

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-05-02 18:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me