tests: Add fuzzing harnesses for CAutoFile, CBufferedFile, LoadExternalBlockFile and other FILE* consumers #19143

pull practicalswift wants to merge 8 commits into bitcoin:master from practicalswift:fuzzers-FILE changing 7 files +454 −0
  1. practicalswift commented at 7:14 pm on June 2, 2020: contributor

    Add fuzzing harnesses for CAutoFile, CBufferedFile, LoadExternalBlockFile and other FILE* consumers:

    • Add FuzzedFileProvider which provides a FILE* interface to FuzzedDataProvider using fopencookie
    • Add FuzzedAutoFileProvider which provides a CAutoFile interface to FuzzedDataProvider
    • Add serialization/deserialization fuzzing helpers WriteToStream(…)/ReadFromStream(…)
    • Add fuzzing harness for CAutoFile (streams.h)
    • Add fuzzing harness for CBufferedFile (streams.h)
    • Add fuzzing harness for LoadExternalBlockFile(...) (validation.h)
    • Add fuzzing harness for CBlockPolicyEstimator::Read and CBlockPolicyEstimator::Write (policy/fees.h)

    See doc/fuzzing.md for information on how to fuzz Bitcoin Core. Don’t forget to contribute any coverage increasing inputs you find to the Bitcoin Core fuzzing corpus repo.

    Happy fuzzing :)

  2. DrahtBot added the label Build system on Jun 2, 2020
  3. DrahtBot added the label Tests on Jun 2, 2020
  4. DrahtBot commented at 0:08 am on June 8, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #19259 (tests: Add fuzzing harness for LoadMempool(…) and DumpMempool(…) by practicalswift)
    • #19203 (net: Add regression fuzz harness for CVE-2017-18350. Add FuzzedSocket. Add thin SOCKET wrapper. by practicalswift)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  5. DrahtBot added the label Needs rebase on Jun 11, 2020
  6. practicalswift force-pushed on Jun 11, 2020
  7. DrahtBot removed the label Needs rebase on Jun 11, 2020
  8. practicalswift force-pushed on Jun 12, 2020
  9. practicalswift force-pushed on Jun 15, 2020
  10. in src/test/fuzz/util.h:336 in 97ee502b66 outdated
    295+        const std::vector<uint8_t> random_bytes = fuzzed_file->m_fuzzed_data_provider.ConsumeBytes<uint8_t>(size);
    296+        if (random_bytes.empty()) {
    297+            return 0;
    298+        }
    299+        std::memcpy(buf, random_bytes.data(), random_bytes.size());
    300+        const ssize_t n = fuzzed_file->m_fuzzed_data_provider.ConsumeBool() ? size : random_bytes.size();
    


    Crypt-iQ commented at 10:58 pm on July 10, 2020:
    I don’t understand the need for the call to ConsumeBool(). Isn’t size necessarily equal to random_bytes.size()?

    MarcoFalke commented at 10:18 am on July 11, 2020:

    If the m_fuzzed_data_provider is empty it should give out less bytes, so size would be larger. ConsumeBool would return false in that case and this condition would evaluate to size.

    So it seems n can simply be replaced by size


    Crypt-iQ commented at 10:50 am on July 11, 2020:
    Oh I didn’t think about that edge case. Don’t you mean the condition should evaluate to random_bytes.size() if ConsumeBool() is false. So n replaced by random_bytes.size() instead?

    MarcoFalke commented at 10:51 am on July 11, 2020:
    Ah right off-by-one :man_facepalming:

    practicalswift commented at 1:25 am on July 15, 2020:
    Now simply using random_bytes.size(). I think I was trying to be too clever :) Please re-review :)
  11. Crypt-iQ commented at 11:30 pm on July 10, 2020: contributor
    utACK 27bef8c
  12. DrahtBot added the label Needs rebase on Jul 11, 2020
  13. practicalswift force-pushed on Jul 11, 2020
  14. DrahtBot removed the label Needs rebase on Jul 11, 2020
  15. in src/test/fuzz/buffered_file.cpp:43 in 30d545c15a outdated
    37+                    opt_buffered_file->read((char*)arr.data(), fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 4096));
    38+                } catch (const std::ios_base::failure&) {
    39+                }
    40+                break;
    41+            }
    42+            case 1: {
    


    Crypt-iQ commented at 8:13 pm on July 14, 2020:
    Why did you leave out SetPos when calling CBufferedFile functions?

    practicalswift commented at 0:11 am on July 15, 2020:
    Good catch! Now fuzzing also SetPos :)
  16. Crypt-iQ commented at 9:24 pm on July 14, 2020: contributor

    Combined fuzzing coverage for only the fuzzers in this branch: https://crypt-iq.github.io/pr19143_cov/src/

    Each fuzzer was run for about 24 hours. I used a variation of libFuzzer and AFL. There seems to be some missing coverage in CBlockPolicyEstimator::Write & CBlockPolicyEstimator::Read. That could probably be fixed by better seeds. There’s also missing coverage in LoadExternalBlockFile, and this can also be fixed by better seeds. I don’t know the format of the external block files (I copy pasted the magic bytes from MessageStart() as seeds but no luck). This was the only slow fuzzing harness with ~5 execs/s.

  17. practicalswift force-pushed on Jul 15, 2020
  18. practicalswift force-pushed on Jul 15, 2020
  19. practicalswift commented at 0:58 am on July 15, 2020: contributor

    Combined fuzzing coverage for only the fuzzers in this branch: https://crypt-iq.github.io/pr19143_cov/src/

    Each fuzzer was run for about 24 hours. I used a variation of libFuzzer and AFL.

    Wow, that’s some thorough testing! Thanks a lot!

    There seems to be some missing coverage in CBlockPolicyEstimator::Write & CBlockPolicyEstimator::Read. That could probably be fixed by better seeds.

    I think the perceived lack of coverage is simply due to src/test/fuzz/policy_estimator being very slow (CBlockPolicyEstimator creation is costly): I believe that given enough time the fuzzer will span the full CFG :)

    To speed things up I’ve now added also a dedicated fuzzer for CBlockPolicyEstimator::{Write,Read} which re-uses the CBlockPolicyEstimator object across runs (without sacrificing substantial coverage stability AFAICT): see src/test/fuzz/policy_estimator_io.cpp.

    This dedicated fuzzer is running at >1000 exec/s compared to <50 exec/s for the non-dedicated one.

    There’s also missing coverage in LoadExternalBlockFile, and this can also be fixed by better seeds. I don’t know the format of the external block files (I copy pasted the magic bytes from MessageStart() as seeds but no luck). This was the only slow fuzzing harness with ~5 execs/s.

    libFuzzer from LLVM 10 is able to reach past the magic bytes check within a few seconds. Perhaps you were using an old version of libFuzzer when running the load_external_block_file fuzzer?

  20. practicalswift force-pushed on Jul 15, 2020
  21. in src/test/fuzz/util.h:361 in 46ce3bf4c6 outdated
    356+        FuzzedFileProvider* fuzzed_file = (FuzzedFileProvider*)cookie;
    357+        int64_t new_offset = 0;
    358+        if (whence == SEEK_SET) {
    359+            new_offset = *offset;
    360+        } else if (whence == SEEK_CUR) {
    361+            if (AdditionOverflow(new_offset, *offset)) {
    


    Crypt-iQ commented at 2:23 am on July 15, 2020:
    new_offset should be fuzzed_file->m_offset

    practicalswift commented at 11:42 am on July 15, 2020:
    Oh, nice catch! Thanks for noticing: now fixed!
  22. in src/test/fuzz/util.h:318 in 3f12c9334f outdated
    313+            FuzzedFileProvider::write,
    314+            FuzzedFileProvider::seek,
    315+            FuzzedFileProvider::close,
    316+        };
    317+        return fopencookie(this, mode.c_str(), io_hooks);
    318+#else
    


    Crypt-iQ commented at 8:17 am on July 15, 2020:
    Could return early at the top of the function #ifndef _GNU_SOURCE, then you wouldn’t need these macros here?

    practicalswift commented at 11:25 am on July 15, 2020:
    This is intentional actually: I want exactly the same input bytes to be consumed by an open() call regardless of fopencookie being available or not.
  23. in src/test/fuzz/autofile.cpp:66 in d03f2588a4 outdated
    61+    }
    62+    (void)auto_file.Get();
    63+    (void)auto_file.GetType();
    64+    (void)auto_file.GetVersion();
    65+    (void)auto_file.IsNull();
    66+    if (fuzzed_data_provider.ConsumeBool()) {
    


    Crypt-iQ commented at 8:33 am on July 15, 2020:
    Is this gated by ConsumeBool() to test the behavior that, if this isn’t hit, fclose() is called automatically in the destructor?

    practicalswift commented at 11:27 am on July 15, 2020:
    Yes it is, but I want the fuzzer to also exercise the release() code path here, and it isn’t called in the destructor :)
  24. in src/test/fuzz/buffered_file.cpp:53 in 8cf3f3492d outdated
    48@@ -48,17 +49,30 @@ void test_one_input(const std::vector<uint8_t>& buffer)
    49                 break;
    50             }
    51             case 3: {
    52+                if (!opt_buffered_file->SetPos(fuzzed_data_provider.ConsumeIntegralInRange<uint64_t>(0, 4096))) {
    53+                    setpos_fail = true;
    


    Crypt-iQ commented at 8:40 am on July 15, 2020:
    Should this be set back to false upon successful call to SetPos?

    practicalswift commented at 11:29 am on July 15, 2020:
    Unfortunately I think a single failed SetPos call is sufficient to set a subsequent FindByte call into an infinite loop. A successful SetPos won’t necessarily undo the harm of a prior failed SetPos call :)
  25. practicalswift force-pushed on Jul 15, 2020
  26. in src/test/fuzz/policy_estimator_io.cpp:25 in bb8711426e outdated
    20+    FuzzedDataProvider fuzzed_data_provider(buffer.data(), buffer.size());
    21+    FuzzedAutoFileProvider fuzzed_auto_file_provider = ConsumeAutoFile(fuzzed_data_provider);
    22+    CAutoFile fuzzed_auto_file = fuzzed_auto_file_provider.open();
    23+    // Re-using block_policy_estimator across runs to avoid costly creation of CBlockPolicyEstimator object.
    24+    static CBlockPolicyEstimator block_policy_estimator;
    25+    if (block_policy_estimator.Read(fuzzed_auto_file)) {
    


    Crypt-iQ commented at 8:48 am on July 15, 2020:
    Since block_policy_estimator is declared static, it shouldn’t matter if the Read call fails because the Write call could still succeed so I don’t think it needs to be gated.

    practicalswift commented at 11:33 am on July 15, 2020:

    The thinking here is that a successful Read will “reset” (to some extent) the state of block_policy_estimator. Without the check we might be calling Write(…) on block_policy_estimator that is populated from a successful Read in a previous run: that would introduce unnecessary coverage instability.

    More specifically this is the case I’m guarding against:

    0run n:
    1  block_policy_estimator.Read(…) returns true
    2  block_policy_estimator.Write(…)
    3run n+1:
    4  block_policy_estimator.Read(…) returns false
    5  block_policy_estimator.Write(…) # now with state from run n (instead of n+1)
    

    I want to minimize the unnecessary coverage instability that unavoidably may arise from re-using block_policy_estimator between runs (which in turn is done to get decent performance for this otherwise super slow fuzzing harness).


    Crypt-iQ commented at 11:42 am on July 15, 2020:
    Ok this makes much more sense. Otherwise it’s not always possible to get the same coverage I think? Different machines on different runs may not hit the same Read->Write->Read(false) calls.

    practicalswift commented at 11:50 am on July 15, 2020:
    @Crypt-iQ Exactly! We want to have “coverage stability”: a given input should result in the same coverage for a single run regardless of what has happened in previous runs. Achieving 100% coverage stability is not always possible. It might be due to architectural reasons (globals is a typical example), or it might be due to performance reasons (re-using an object might be required to get decent performance in terms of execs/s). We want to do what we can to at least minimise unnecessary coverage instability.
  27. Crypt-iQ commented at 10:16 am on July 15, 2020: contributor
    With clang-10 I can pass the magic bytes check (I guess it uses coverage in memcmp to solve for it?). Currently running the new policy_estimator_io fuzzer as well.
  28. tests: Add FuzzedFileProvider which provides a FILE* interface to FuzzedDataProvider using fopencookie 9dbcd6854c
  29. tests: Add FuzzedAutoFileProvider which provides a CAutoFile interface to FuzzedDataProvider e48094a506
  30. tests: Add serialization/deserialization fuzzing helpers WriteToStream(…)/ReadFromStream(…) e507c0799d
  31. tests: Add fuzzing harness for CAutoFile (streams.h) f3aa659be6
  32. tests: Add fuzzing harness for CBufferedFile (streams.h) 9823376030
  33. tests: Add fuzzing harness for LoadExternalBlockFile(...) (validation.h) 7bcc71e5f8
  34. tests: Add fuzzing harness for CBufferedFile::{SetPos,GetPos,GetType,GetVersion} (stream.h) 614e0807a8
  35. tests: Add fuzzing harness for CBlockPolicyEstimator::{Read,Write} (policy/fees.h) ad6c34881d
  36. practicalswift force-pushed on Jul 15, 2020
  37. Crypt-iQ commented at 2:45 pm on July 16, 2020: contributor
    Can confirm that the new policy_estimator_io fuzzer is significantly faster with libfuzzer (600-700 exec/s on one core) and hits checks in Read & Write that it otherwise doesn’t hit in the policy_estimator fuzzer due to being slower. Also tried with AFL but was very slow since I forgot to compile with persistent mode :/. Will review the latest set of changes now. Coverage for just the policy_estimator_io fuzzer is here https://crypt-iq.github.io/policy_cov_llvm_outs/src/policy/fees.cpp.gcov.html
  38. Crypt-iQ commented at 2:33 am on July 18, 2020: contributor
    Tested ACK ad6c348
  39. MarcoFalke merged this on Jul 18, 2020
  40. MarcoFalke closed this on Jul 18, 2020

  41. sidhujag referenced this in commit 60a95acd67 on Jul 18, 2020
  42. Fabcien referenced this in commit b52804961d on Feb 3, 2021
  43. practicalswift deleted the branch on Apr 10, 2021
  44. kittywhiskers referenced this in commit a6bb4410ab on May 7, 2022
  45. kittywhiskers referenced this in commit 4a5287d999 on May 7, 2022
  46. kittywhiskers referenced this in commit 48937a60eb on Jun 14, 2022
  47. kittywhiskers referenced this in commit 8950416efc on Jun 14, 2022
  48. kittywhiskers referenced this in commit c18816d0bd on Jun 14, 2022
  49. kittywhiskers referenced this in commit 8049c7962c on Jun 18, 2022
  50. kittywhiskers referenced this in commit 67997d3e44 on Jun 18, 2022
  51. kittywhiskers referenced this in commit b12b2b7a36 on Jul 4, 2022
  52. kittywhiskers referenced this in commit f83469f3fd on Jul 4, 2022
  53. kittywhiskers referenced this in commit a211038703 on Jul 6, 2022
  54. kittywhiskers referenced this in commit 8bde6cf816 on Jul 6, 2022
  55. kittywhiskers referenced this in commit 915dcab3dc on Jul 6, 2022
  56. kittywhiskers referenced this in commit c92c2cc09b on Jul 13, 2022
  57. kittywhiskers referenced this in commit 60aa00a590 on Jul 13, 2022
  58. kittywhiskers referenced this in commit eb2fa81288 on Jul 15, 2022
  59. PastaPastaPasta referenced this in commit 30d6584cb6 on Jul 17, 2022
  60. DrahtBot locked this on Aug 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-19 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me