fuzz: how to scale fuzzing with the number of fuzz targets #20088

issue MarcoFalke opened this issue on October 5, 2020
  1. MarcoFalke commented at 5:30 pm on October 5, 2020: member

    Having different fuzz targets is useful to give the fuzzer a specific and well-defined task to work on. This also makes it easier for developers to see what an individual fuzz test/target is doing. Moreover, the fuzzer might find new inputs more efficiently because both the input directory and the search space are smaller.

    However, there are also several downsides:

    • Limiting the overall search space the fuzzer can explore will make it impossible to reach coverage for the code paths that have been excluded.
    • Building numerous small fuzz targets, instrumenting them and linking them with debug symbols is costly in CPU time and disk space. A quick build is not only important for devs, but also for CI.

    Similar to how the unit tests are compiled and linked into one binary, we could look into linking the fuzz targets into one binary. Individual targets could be selected with some kind of runtime argument.
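    A minimal sketch of what such a single binary could look like, assuming a libFuzzer-style entry point. The registry, the RegisterFuzzTarget helper, and the two toy targets are hypothetical names made up for illustration, not existing Bitcoin Core code. The FUZZ environment variable is borrowed from the command quoted later in this thread; a command-line flag would work equally well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>

// Hypothetical signature shared by all fuzz targets in this sketch.
using FuzzFn = std::function<void(const std::uint8_t*, std::size_t)>;

// Name -> target map. A function-local static avoids static
// initialization order problems when targets live in many source files.
std::map<std::string, FuzzFn>& FuzzTargets()
{
    static std::map<std::string, FuzzFn> targets;
    return targets;
}

// Hypothetical helper: constructing one of these registers a target.
struct RegisterFuzzTarget {
    RegisterFuzzTarget(const std::string& name, FuzzFn fn)
    {
        const bool inserted = FuzzTargets().emplace(name, std::move(fn)).second;
        assert(inserted); // target names must be unique
        (void)inserted;
    }
};

// Two toy targets standing in for real ones; they only update counters.
std::size_t g_count_bytes_total = 0;
std::size_t g_count_zeros_total = 0;

static RegisterFuzzTarget reg_count_bytes(
    "count_bytes",
    [](const std::uint8_t*, std::size_t size) { g_count_bytes_total += size; });

static RegisterFuzzTarget reg_count_zeros(
    "count_zeros",
    [](const std::uint8_t* data, std::size_t size) {
        for (std::size_t i = 0; i < size; ++i) {
            if (data[i] == 0) ++g_count_zeros_total;
        }
    });

// Look up a target by name; returns nullptr if the name is missing or unknown.
const FuzzFn* SelectTarget(const char* name)
{
    if (name == nullptr) return nullptr;
    const auto it = FuzzTargets().find(name);
    return it == FuzzTargets().end() ? nullptr : &it->second;
}

// libFuzzer entry point. The target is resolved once, on the first call,
// so the lookup cost is not paid per input.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size)
{
    static const FuzzFn* target = SelectTarget(std::getenv("FUZZ"));
    if (target == nullptr) {
        std::cerr << "Set FUZZ to one of the registered target names\n";
        std::abort();
    }
    (*target)(data, size);
    return 0;
}
```

    Built against a fuzzing engine, each target would still get its own corpus directory, for example FUZZ=count_bytes ./fuzz corpus_count_bytes/ (binary and directory names are placeholders).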

  2. MarcoFalke added the label Feature on Oct 5, 2020
  3. MarcoFalke added the label Brainstorming on Oct 5, 2020
  4. MarcoFalke added the label Tests on Oct 5, 2020
  5. MarcoFalke commented at 5:31 pm on October 5, 2020: member
    I know that in the beginning we had one fuzz binary and one fuzz corpus. To clarify, this is not what I am suggesting. I am suggesting to have one fuzz binary, but a fuzz corpus for each target.
  6. sipa commented at 5:34 pm on October 5, 2020: member
    @kcc In the past we’ve followed your advice of creating one fuzz binary for each fuzzer. Is there a problem with instead having a single binary, but choosing the test with a command-line argument (and thus still having separate corpora for different tests)?
  7. MarcoFalke commented at 5:39 pm on October 5, 2020: member
    I guess there will be a runtime overhead of parsing which test to run, but that is a constant cost, only paid once when the binary is started. So at least it shouldn’t be an issue for in-process fuzzers such as libFuzzer.
  8. fanquake commented at 7:54 am on October 12, 2020: member
  9. practicalswift commented at 9:46 am on October 13, 2020: contributor

    Thanks for the ping @fanquake!

    When doing large scale fuzzing with the goal of reaching as good coverage as possible I think we definitely need support for one-binary-per-target.

    Don’t take my word for it though: @kcc of C++ sanity fame who introduced the sanitizers, libFuzzer, OSS-Fuzz, CFI, etc. wrote this in #11045 (comment): “I always advocate for one-binary-per-target because it makes fuzzing more efficient.” :)

    Almost all fuzzing platforms are written assuming such a structure with one binary per fuzzer and one corpus per binary. If passing runtime arguments is supported at all then fuzzing_harness -t foo and fuzzing_harness -t bar are typically expected to share the same corpus.

    With that said, I think it could make sense to add a configure option which would build the fuzzing harnesses as one binary (with individual targets selected as a runtime argument) to speed up normal day-to-day compilation. That could hopefully be enabled by default to make sure the fuzzing harnesses still compile after non-fuzzing code changes (currently that is often noticed only at CI stage).

    Would that make sense?

  10. MarcoFalke commented at 4:39 pm on December 3, 2020: member

    “Almost all fuzzing platforms are written assuming such a structure with one binary per fuzzer and one corpus per binary”

    Our primary concern is to be able to run the fuzzers ourselves, locally and on CI. With an increasing number of targets and an increasing number of seeds, the compile time and run time for this will only ever increase. By default devs already don’t compile the fuzz tests (#19388), let alone run them. We rely purely on CI for this, which again doesn’t come with infinite resources.

    Supporting fuzzing platforms is a secondary goal. If they are not flexible enough to be configurable for our setting, then :shrug:. Regardless, have you tried creating wrapper bash scripts with the content FUZZ=addr_deserialize ./src/test/fuzz.exe and checked whether that works for your favourite fuzzing platform?

  11. MarcoFalke closed this on Dec 15, 2020

  12. DrahtBot locked this on Aug 16, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 12:12 UTC
