fuzz: how to deal with fuzz input invalidation? #20837

MarcoFalke opened this issue on January 3, 2021
  1. MarcoFalke commented at 9:08 am on January 3, 2021: member

    Existing fuzz seeds are invalidated (i.e. may yield less code coverage) whenever the fuzz input “format” changes. Generally, this can happen whenever the behaviour of any executed code changes.

    It would be nonsensical to hold back on changing Bitcoin Core code (e.g. validation code) to not invalidate fuzz inputs. Thus, we need to run a fuzz engine 24/7 to adapt the fuzz inputs to the latest code.

    While it is possible to avoid some seed invalidations when fuzz code (or other test code) is changed, I think that, given we already have to anticipate input invalidation anyway, we don’t need to spend extra resources on preserving fuzz inputs across fuzz-code (or other test-code) changes.

  2. MarcoFalke added the label Brainstorming on Jan 3, 2021
  3. MarcoFalke added the label Tests on Jan 3, 2021
  4. MarcoFalke commented at 9:11 am on January 3, 2021: member
    One example where it would have been possible to keep the format by making the fuzz code more complicated is commit fa0f4157098ea68169ced44730986d0ed2c3a5aa. I chose to invalidate the seeds instead.
  5. michaelfolkson commented at 12:03 pm on January 3, 2021: contributor

    It would be nonsensical to hold back on changing Bitcoin Core code (e.g. validation code) to not invalidate fuzz inputs.

    Agreed. I have been wondering what aspirational goal we should aim for in the future: an informal fuzzing “team” that monitors when existing seeds are invalidated by changes to Core code, or all long term contributors updating fuzz seeds whenever a PR they open invalidates them. At least in the short term the latter doesn’t seem viable, as a number of long term contributors currently lack a basic understanding of fuzzing.

    The fuzz seeds are cheap to produce unless they have been carefully chosen. If a lot of thought was expended in choosing particular seeds, this should be documented somewhere so that they can easily be replaced when invalidated. At the moment there doesn’t seem to be much documentation of the thought process behind the seeds in the seed corpus.

  6. MarcoFalke commented at 12:45 pm on January 3, 2021: member

    A number of long term contributors are missing basic understanding of fuzzing currently.

    Running the fuzzers should be straightforward: https://github.com/bitcoin/bitcoin/blob/master/doc/fuzzing.md . Also, the concept of fuzzing isn’t really that hard. However, the Bitcoin Core code base isn’t “optimized” for fuzzing (e.g. globals, background threads, non-mockable disk access, net processing also dealing with deserialization …), so working on fuzzers for Bitcoin Core sometimes becomes tricky. For general guidelines on writing fuzzers, you can take a look at https://github.com/google/fuzzing/blob/master/docs/intro-to-fuzzing.md

    The fuzz seeds are cheap to produce unless they have been carefully chosen.

    Correct. I think we should aim to write fuzzers that can explore the search space from scratch in a reasonable amount of time. As mentioned before, this isn’t the case for all Bitcoin Core fuzz targets. There are plenty of hashes, checksums, signatures, PoW checks, serialization specifics, etc. that make it harder to find good fuzz inputs. Code coverage reports help in finding weak spots, and then specifically crafted seeds (either written manually or extracted from real-world data like the block files or socket buffers) can help to increase coverage.

    It is a pity if those seeds get lost due to invalidation, and I think our best bet is to rely on the fuzz engine to translate them for us where possible. For example, the breakage caused by commit fa0f415 should be fixable with some trivial bit flips or similar.

  7. ajtowns commented at 0:47 am on January 4, 2021: member

    100% agree on continuous fuzz-testing and using it to update the qa-assets corpus, and on improving the code so fuzzing inputs are simpler for the fuzzer to generate.

    But I still think we should make modest efforts to minimise corpus invalidation – mostly just so that adding/removing cases doesn’t automatically invalidate the test’s corpus (i.e., replace disappearing tests with a no-op; if a case is selected that’s above the number of cases we have, don’t reduce it modulo the case count, just do a no-op – see the sketch after this comment). Switching to that approach does invalidate current cases, but no worse than adding a case already does, I think; and it seems pretty maintainable to me?

    Note that renumbering cases means that an input making k iterations through the loop needs each of its k choices to be fuzzed from the old value to the new one to remain equivalent, which isn’t a big deal, but I don’t think it’s entirely trivial? Mostly it just seems unnecessarily wasteful to me.
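
    A rough sketch of the idea (the names and action bodies are placeholders, assuming the usual FuzzedDataProvider-driven target loop): removed actions keep their slot as a no-op, and out-of-range indices do nothing instead of wrapping around.

      #include <cstddef>
      #include <functional>
      #include <vector>
      #include <fuzzer/FuzzedDataProvider.h>

      void PickAction(FuzzedDataProvider& provider)
      {
          const std::vector<std::function<void()>> actions{
              [] { /* e.g. ban_man.ClearBanned(); */ },
              [] {}, // slot of a removed action, kept so later indices don't shift
              [] { /* e.g. ban_man.DumpBanlist(); */ },
          };
          // Pick from a fixed range rather than actions.size(): indices without
          // an action (yet) do nothing instead of wrapping around via modulo.
          const size_t index = provider.ConsumeIntegralInRange<size_t>(0, 31);
          if (index < actions.size()) actions[index]();
      }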

  8. MarcoFalke commented at 8:22 am on January 4, 2021: member

    mostly just so that adding/removing cases doesn’t automatically invalidate the test’s corpus

    This is only one specific case where invalidation can be avoided somewhat trivially. And it doesn’t even help in all cases. E.g. commit fa0f415 wouldn’t be fixed by just that. Starting to accommodate some cases of seed invalidation opens the door to accommodating all or most of them. And then our fuzzing code will just explode into an unmaintainable mess. For example, how would you deal with the removal/insertion of serialized values in the fuzz input format? See commit faaa4f2b6af4ce249fc4809a030240afa45b1a33, which removes a serialized 32-bit value.
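
    (To illustrate why no-op slots can’t absorb a change like that: deleting a serialized field shifts how every subsequent byte of an old input is decoded. A sketch, with hypothetical field names, not code from the commit:)

      #include <cstdint>
      #include <fuzzer/FuzzedDataProvider.h>

      void OldFormat(FuzzedDataProvider& provider)
      {
          const uint32_t removed_field = provider.ConsumeIntegral<uint32_t>();
          const uint8_t action = provider.ConsumeIntegral<uint8_t>();
          (void)removed_field;
          (void)action;
      }

      void NewFormat(FuzzedDataProvider& provider)
      {
          // With the 32-bit field gone, `action` is now decoded from bytes the
          // old format assigned to `removed_field`, so every old seed re-parses
          // into a different action sequence.
          const uint8_t action = provider.ConsumeIntegral<uint8_t>();
          (void)action;
      }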

  9. ajtowns commented at 0:01 am on January 5, 2021: member

    This is only one specific case where invalidation can be avoided somewhat trivially.

    Yeah, I’m only suggesting avoiding it when it can be avoided somewhat trivially.

    For example, how would you deal with …

    If we wanted something to seriously minimise invalidation, I think we’d need to approach fuzz input like an upgradable file format – perhaps something like an initial extensible header controlling setup, followed by a sequence of type-length-value blocks controlling how the test should continue. Something like that might be worth exploring – it might make it easier to write tests, not just avoid invalidating things; but it’s not especially easy, and even then, I do think we’d still want to invalidate cases every now and then to avoid accumulating cruft indefinitely.
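
    A rough sketch of such a format (the details are placeholders): a version byte up front, then self-delimiting records that a reader skips when it doesn’t know the type.

      #include <cstddef>
      #include <cstdint>

      void RunFromTlv(const uint8_t* data, size_t size)
      {
          if (size < 1) return;
          const uint8_t version = data[0]; // extensible setup header, one byte for now
          (void)version;
          size_t offset = 1;
          // Each record: 1-byte type, 1-byte length, then `length` value bytes.
          while (offset + 2 <= size) {
              const uint8_t type = data[offset];
              const size_t length = data[offset + 1];
              if (offset + 2 + length > size) break; // truncated record
              const uint8_t* value = data + offset + 2;
              (void)value;
              switch (type) {
              case 0: /* drive one action with `value` */ break;
              case 1: /* another action */ break;
              default: break; // unknown type: skipped, so old inputs stay valid
              }
              offset += 2 + length;
          }
      }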

  10. MarcoFalke commented at 9:04 am on January 5, 2021: member

    followed by a sequence of type-length-value blocks controlling how the test should continue

    Fuzz engines typically can’t deal with TLVs (see https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#type-length-value ). So we’d probably have to use LLVMFuzzerCustomMutator, which again comes with downsides, the most important one likely being that our fuzz targets would no longer be fuzz engine agnostic.
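
    (For reference, a sketch of what that hook could look like for a TLV layout like the one sketched above – LLVMFuzzerCustomMutator and LLVMFuzzerMutate are libFuzzer’s entry points, the record-walking is made up. Defining the hook is exactly what ties the target to one engine:)

      #include <cstddef>
      #include <cstdint>

      // Provided by libFuzzer: applies its built-in mutations in place.
      extern "C" size_t LLVMFuzzerMutate(uint8_t* data, size_t size, size_t max_size);

      extern "C" size_t LLVMFuzzerCustomMutator(uint8_t* data, size_t size,
                                                size_t max_size, unsigned int seed)
      {
          // Walk the TLV records (1-byte type, 1-byte length, value) and mutate
          // one record's value in place, keeping the framing intact.
          size_t offset = 1; // skip the version byte
          while (offset + 2 <= size) {
              const size_t length = data[offset + 1];
              if (offset + 2 + length > size) break;
              if (length > 0 && seed % 4 == 0) {
                  LLVMFuzzerMutate(data + offset + 2, length, length);
                  return size;
              }
              seed /= 4;
              offset += 2 + length;
          }
          // Nothing picked: fall back to the default whole-buffer mutation.
          return LLVMFuzzerMutate(data, size, max_size);
      }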

    we’d still want to invalidate cases every now and then to avoid accumulating cruft indefinitely.

    This is just asking for more code churn. Every format change would be accompanied by two code changes, and without a clear guideline this leaves room for bikeshedding. Maybe if there was a TODO FUZZ_CLEANUP_22_0 comment, the cleanups could be done in a single sweep, but I’d still be sceptical.

  11. ajtowns commented at 3:41 am on January 13, 2021: member

    Like I said, I’m only suggesting avoiding invalidation when it’s basically trivial. Don’t really want to get bogged down in hypotheticals…

    I think it’d probably be good to automatically generate fuzzer data for:

    • master
    • release branches
    • open PRs

    At present, if a PR invalidates part of the corpus (or just introduces a new fuzzer) and the author doesn’t update qa-assets themselves, that PR’s fuzzing will only be useful after it’s merged, rather than in CI, which seems less than great? Maybe it’s detectable by code-coverage stats going down, but I’m not sure that’s easy to notice currently?

    Maybe just having different corpora for each of those, and merging/minimising them when a PR gets merged into master, would work?

  12. MarcoFalke commented at 5:15 pm on January 13, 2021: member

    Like I said, I’m only suggesting avoiding invalidation when it’s basically trivial. Don’t really want to get bogged down in hypotheticals…

    Fair enough. Though those trivial cases should be rare enough that handling them doesn’t yield any benefit, given the overhead of having to remember when and how to invalidate them at a later point in time. Except for script_flags and process_messages, all remaining 150 fuzz targets can reach the current coverage with just minutes of CPU time. (Citation/Benchmark needed)

    At present, if a PR invalidates …

    Sounds like a good idea. I was thinking about passing -max_total_time to the fuzz engine even when run in CI mode, but fuzz targets all take different times, so it would be nicer if there was an -additional_fuzz_time_after_init option. The generated corpus would be lost as soon as the CI box cycles, but that doesn’t hurt because the coverage will quickly be re-generated by our fuzz farms once the pull is merged.

    Maybe it is good enough to just set -max_total_time=30 and thus skip generation if the existing seeds take longer than that to read.

  13. practicalswift commented at 6:38 pm on January 13, 2021: contributor

    As suggested by @MarcoFalke in #20828, the following comment I wrote in #20828 (comment) about avoiding unnecessary invalidation might be of interest in this thread:


    When using the switch (fuzzed_data_provider.ConsumeIntegralInRange<int>(0, N)) idiom, it is worth noting that every time N is increased by one to accommodate a new case, all existing seeds are invalidated.

    Example (all using the same fixed seed input in the form of a finite scream AAAAAAAAAAAAAAAAAAAAAAAAA):

      FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 1) == 1
      FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 2) == 2
      FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 3) == 1
      FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 4) == 0
      FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 5) == 5
      FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 6) == 2
      FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 7) == 1
      FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 8) == 2
      FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 9) == 5
      FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 10) == 10
      FuzzedDataProvider{buffer.data(), buffer.size()}.ConsumeIntegralInRange<int>(0, 11) == 5
    

    One trick that can be used to tackle this is to choose from a larger range such as [0, 32] even if we only have, say, 12 cases ([0, 11]). The non-matching numbers [12, 32] will simply be “ignored” by the coverage-guided fuzzer.

    That way we’ll only have to invalidate existing seeds when we’ve exhausted the entire range [0, 32] with matching cases. Then we can bump to [0, 64], and so on.

    In other words: instead of invalidating every time we add a new case, we’ll only be invalidating each time the number of cases crosses a power of two :)
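
    (The table above is easy to reproduce with a standalone program; a sketch, assuming LLVM’s fuzzer/FuzzedDataProvider.h header. For ranges this small only one byte ends up being consumed, so the result is simply 65, i.e. ‘A’, modulo the range size:)

      #include <cstdint>
      #include <cstdio>
      #include <string>
      #include <fuzzer/FuzzedDataProvider.h>

      int main()
      {
          const std::string buffer(25, 'A'); // the finite scream from above
          for (int n = 1; n <= 11; ++n) {
              // A fresh provider per range: the same bytes decode to a
              // different choice for every bound n.
              FuzzedDataProvider provider{reinterpret_cast<const uint8_t*>(buffer.data()), buffer.size()};
              std::printf("ConsumeIntegralInRange<int>(0, %2d) == %d\n", n,
                          provider.ConsumeIntegralInRange<int>(0, n));
          }
      }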

  14. ajtowns commented at 7:06 am on January 14, 2021: member

    all remaining 150 fuzz targets can reach the current coverage with just minutes of CPU time.

    I’m not sure whether that’s awesome or a sign our fuzzing doesn’t go deep enough :)

  15. MarcoFalke commented at 10:12 am on January 14, 2021: member

    Whoever is going to invalidate the banman seeds should at least apply this patch:

      diff --git a/src/test/fuzz/banman.cpp b/src/test/fuzz/banman.cpp
      index e703fa39c1..23b2de288a 100644
      --- a/src/test/fuzz/banman.cpp
      +++ b/src/test/fuzz/banman.cpp
      @@ -51,7 +51,6 @@ FUZZ_TARGET_INIT(banman, initialize_banman)
                       [&] {
                           ban_man.ClearBanned();
                       },
      -                [] {},
                       [&] {
                           ban_man.IsBanned(ConsumeNetAddr(fuzzed_data_provider));
                       },
      @@ -71,7 +70,6 @@ FUZZ_TARGET_INIT(banman, initialize_banman)
                       [&] {
                           ban_man.DumpBanlist();
                       },
      -                [] {},
                       [&] {
                           ban_man.Discourage(ConsumeNetAddr(fuzzed_data_provider));
                       });
    
  16. MarcoFalke commented at 8:15 am on July 31, 2021: member

    all remaining 150 fuzz targets can reach the current coverage with just minutes of CPU time.

    I’m not sure whether that’s awesome or a sign our fuzzing doesn’t go deep enough :)

    For reference, with targets having increased in complexity (rpc, banman, …) this is no longer true.

  17. MarcoFalke commented at 9:33 am on August 25, 2021: member

    In other words: instead of invalidating every time we add a new case we’ll be invalidating every some-power-of-2:th time we add a new case :)

    I looked into something like this, but concluded it was not worth it. While it is cheap to implement, adding new cases is rarer than removing existing cases. And removing existing cases would still break the inputs with this method, just like before.

  18. MarcoFalke commented at 3:41 pm on March 29, 2022: member
    Closing for now, but please let us know if there is a practical solution.
  19. MarcoFalke closed this on Mar 29, 2022

  20. DrahtBot locked this on Mar 29, 2023
