fuzz: improve dbwrapper_concurrent_reads performance #35455

pull andrewtoth wants to merge 1 commits into bitcoin:master from andrewtoth:fuzz_db_reads_oom_fix changing 1 files +62 −51
  1. andrewtoth commented at 2:39 AM on June 4, 2026: contributor

    The recently merged fuzz harness targeting concurrent reads suffers from poor performance and memory leaks (https://github.com/bitcoin/bitcoin/pull/34866#issuecomment-4614323925).

    Fix this by

    • using a global thread pool instead of a local one per iteration
    • reduce thread count to 8 from 16
    • use a std::map oracle to check results inline instead of reading from the db to get a baseline and storing results
  2. DrahtBot added the label Fuzzing on Jun 4, 2026
  3. DrahtBot commented at 2:39 AM on June 4, 2026: contributor

    <!--e57a25ab6845829454e8d69fc972939a-->

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    <!--006a51241073e994b41acfe9ec718e94-->

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/35455.

    <!--021abf342d371248e50ceaed478a90ca-->

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK marcofleon, l0rinc, sedited

    If your review is incorrectly listed, please copy-paste <code>&lt;!--meta-tag:bot-skip--&gt;</code> into the comment that the bot should ignore.

    <!--174a7506f384e20aa4161008e828411d-->

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #30342 (kernel, logging: Pass Logger instances to kernel objects by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

    <!--5faf32d7da4f0f540f40219e4f7537a3-->

  4. andrewtoth commented at 2:40 AM on June 4, 2026: contributor
  5. in src/test/fuzz/dbwrapper.cpp:178 in e5bd478b7a
     174 | +Mutex g_read_pool_mutex;
     175 | +
     176 | +void StartReadPoolIfNeeded() EXCLUSIVE_LOCKS_REQUIRED(!g_read_pool_mutex)
     177 | +{
     178 | +    LOCK(g_read_pool_mutex);
     179 | +    if (g_read_pool.WorkersCount() != MAX_READ_WORKERS) g_read_pool.Start(MAX_READ_WORKERS);
    


    l0rinc commented at 7:38 AM on June 4, 2026:

    Please see #35295 (review)

  6. in src/test/fuzz/dbwrapper.cpp:401 in e5bd478b7a
     393 | @@ -414,7 +394,8 @@ FUZZ_TARGET(dbwrapper_concurrent_reads, .init = [] { static auto setup{MakeNoLog
     394 |  
     395 |      // Build query list from seeded and random keys.
     396 |      const size_t num_queries{provider.ConsumeIntegralInRange<size_t>(1, 2'000)};
     397 | -    std::vector<ReadQuery> queries;
     398 | +    enum class ReadOp { Read, Exists, IteratorSeek };
     399 | +    std::vector<std::tuple<ReadOp, uint16_t>> queries;
     400 |      queries.reserve(num_queries);
     401 |      for (size_t i{0}; i < num_queries; ++i) {
     402 |          const auto op{static_cast<ReadOp>(provider.ConsumeIntegralInRange<int>(0, 2))};
    


    l0rinc commented at 7:43 AM on June 4, 2026:

    Related to #34866 (review):

            const auto op{provider.PickValueInArray({ReadOp::Read, ReadOp::Exists, ReadOp::IteratorSeek})};
    

    or

    enum class ReadOp { Read, Exists, IteratorSeek, kMaxValue = IteratorSeek };
    ...
    const auto op{provider.ConsumeEnum<ReadOp>()};
    

    or if we want to give the fuzzer more control:

        const auto consume_key{[&] {
            return provider.ConsumeBool()
                       ? keys[provider.ConsumeIntegralInRange<size_t>(0, keys.size() - 1)]
                       : ConsumeKey(provider);
        }};
        for (size_t i{0}; i < num_queries; ++i) {
            CallOneOf(provider,
                [&] { queries.emplace_back(ReadOp::Read, consume_key()); },
                [&] { queries.emplace_back(ReadOp::Exists, consume_key()); },
                [&] { queries.emplace_back(ReadOp::IteratorSeek, consume_key()); });
        }
    
  7. in src/test/fuzz/dbwrapper.cpp:377 in e5bd478b7a outdated
     373 | @@ -397,14 +374,17 @@ FUZZ_TARGET(dbwrapper_concurrent_reads, .init = [] { static auto setup{MakeNoLog
     374 |      const size_t num_entries{provider.ConsumeIntegralInRange<size_t>(100, 5'000)};
     375 |      std::vector<uint16_t> keys;
     376 |      keys.reserve(num_entries);
     377 | +    Oracle oracle;
    


    l0rinc commented at 7:51 AM on June 4, 2026:

    Related to #34866 (review)

  8. in src/test/fuzz/dbwrapper.cpp:422 in e5bd478b7a
     425 | +            std::iota(order.begin(), order.end(), size_t{0});
     426 | +            std::shuffle(order.begin(), order.end(), thread_rng);
     427 | +            std::vector<uint8_t> v;
     428 | +            std::string key_str;
     429 | +            const std::unique_ptr<CDBIterator> it{db.NewIterator()};
     430 |              start_latch.arrive_and_wait();
    


    l0rinc commented at 7:53 AM on June 4, 2026:

    Could we make sure the iterator creation also races the rest and put it after the wait?

    case ReadOp::IteratorSeek: {
        const std::unique_ptr<CDBIterator> it{db.NewIterator()};
        it->Seek(key);
    

    andrewtoth commented at 8:29 PM on June 4, 2026:

    Moved it below latch but still created above the loop for performance.

  9. in src/test/fuzz/dbwrapper.cpp:443 in e5bd478b7a
     447 | +                    it->Seek(key);
     448 | +                    // Skip the obfuscation metadata entry (a non-uint16_t key) if we land
     449 | +                    // on it, so the result matches the oracle, which only tracks user keys.
     450 | +                    if (it->Valid() && it->GetKey(key_str) && key_str == OBFUSCATION_KEY) it->Next();
     451 | +                    if (const auto oit{oracle.lower_bound(key)}; oit != oracle.end()) {
     452 | +                        assert(it->Valid() && it->GetValue(v) && v == MakeValue(oit->first, oit->second));
    


    l0rinc commented at 7:54 AM on June 4, 2026:

    can we assert the key as well?

                            uint16_t actual_key;
                            assert(it->Valid() && it->GetKey(actual_key) && actual_key == oit->first);
                            assert(it->GetValue(v) && v == MakeValue(actual_key, oit->second));
    
  10. in src/test/fuzz/dbwrapper.cpp:418 in e5bd478b7a
     421 | -        return [&, seed = rng.rand256()]() -> Results {
     422 | +        return [&, seed = rng.rand256()] {
     423 |              FastRandomContext thread_rng{seed};
     424 | +            std::vector<size_t> order(queries.size());
     425 | +            std::iota(order.begin(), order.end(), size_t{0});
     426 | +            std::shuffle(order.begin(), order.end(), thread_rng);
    


    l0rinc commented at 8:01 AM on June 4, 2026:

    nit: could use std::ranges here

                std::ranges::shuffle(order, thread_rng);
    
  11. in src/test/fuzz/dbwrapper.cpp:413 in e5bd478b7a
     415 |      // Workers + main thread synchronize on the latch so all reads start together.
     416 |      std::latch start_latch{static_cast<ptrdiff_t>(MAX_READ_WORKERS + 1)};
     417 | -    std::vector<std::function<Results()>> tasks(MAX_READ_WORKERS);
     418 | +    std::vector<std::function<void()>> tasks(MAX_READ_WORKERS);
     419 | +    FastRandomContext rng{ConsumeUInt256(provider)};
     420 |      std::generate(tasks.begin(), tasks.end(), [&] {
    


    l0rinc commented at 8:02 AM on June 4, 2026:

    nit: could use std::ranges here:

        std::ranges::generate(tasks, [&] {
    
  12. andrewtoth force-pushed on Jun 4, 2026
  13. andrewtoth commented at 8:29 PM on June 4, 2026: contributor

    Thanks @l0rinc. I've applied all your suggestions.

  14. l0rinc commented at 8:46 PM on June 4, 2026: contributor

    code review ACK 26cac4b68180446dcef2ea5293de23e8cec551d4

  15. marcofleon commented at 10:59 AM on June 10, 2026: contributor

    tested ACK 26cac4b68180446dcef2ea5293de23e8cec551d4

  16. fuzz: improve dbwrapper_concurrent_reads performance 1ce9e26239
  17. andrewtoth force-pushed on Jun 10, 2026
  18. andrewtoth commented at 6:37 PM on June 10, 2026: contributor

    Reduced num_entries from 5k -> 3k. This should still provide plenty of room to trigger L1->L2 compactions since each value could be expanded up to 32kb * ~320 would exceed 10MB.

    Requested by @marcofleon (thanks!).

    git diff 26cac4b68180446dcef2ea5293de23e8cec551d4..1ce9e262395bb17e03a7203efab0ef5ae168459b

  19. marcofleon commented at 8:25 PM on June 10, 2026: contributor

    reACK 1ce9e262395bb17e03a7203efab0ef5ae168459b

  20. DrahtBot requested review from l0rinc on Jun 10, 2026
  21. l0rinc commented at 5:54 AM on June 11, 2026: contributor

    ACK 1ce9e262395bb17e03a7203efab0ef5ae168459b

  22. sedited approved
  23. sedited commented at 7:40 AM on June 11, 2026: contributor

    ACK 1ce9e262395bb17e03a7203efab0ef5ae168459b

  24. sedited commented at 7:41 AM on June 11, 2026: contributor

    This might need some further parameter tweaking, but I think we are within our typical timeout parameters now.

  25. sedited merged this on Jun 11, 2026
  26. sedited closed this on Jun 11, 2026

  27. in src/test/fuzz/dbwrapper.cpp:173 in 1ce9e26239
     166 | @@ -167,7 +167,16 @@ void VerifyIterator(CDBWrapper& dbw, const Oracle& oracle,
     167 |  }
     168 |  
     169 |  /** Maximum number of concurrent reader threads in dbwrapper_concurrent_reads. */
     170 | -constexpr size_t MAX_READ_WORKERS{16};
     171 | +constexpr size_t MAX_READ_WORKERS{8};
     172 | +
     173 | +ThreadPool g_read_pool{"dbfuzz"};
     174 | +Mutex g_read_pool_mutex;
    


    maflcko commented at 8:19 AM on June 11, 2026:

    unrelated nit: I know you just copy-pasted this from the other fuzz test, but is the Mutex here actually needed? IIUC no fuzz engine supports running the same fuzz target in parallel in the same thread. They'll all either run the fuzz target sequentially in a single thread/process (possibly in several processes), or run them in a loop+occasional fork.

    So I don't think the Mutex here is needed. Also, if it was needed, g_read_pool would have to be annotated with clang TSA?

    Just a style nit, but I think it would be better to remove it for clarity, or properly document/annotate why it is needed.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-06-11 10:51 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me