We currently use FuzzedDataProvider and a suite of Consume* functions for targets that require input formats other than a byte array. This approach is good for a lot of targets but has issues when it comes to more complex input formats.
- The input corpora consist of custom input serialization formats, which means that the inputs have no meaning outside of the target itself. Seeding or sharing inputs is basically impossible when every target has its own custom format. This matters because mutation-based fuzzers are particularly effective when provided with an initial seed corpus (coverage-guided fuzzers like libFuzzer are able to start from an empty corpus, but that is less effective).
- The fuzzer is not able to make useful mutations efficiently, because it only deals with raw bytes and is not aware of the input format. Fuzzers will still be able to create useful mutations, however only after many iterations.
- Changing the target often leads to invalidation of the existing input corpus. For example, if the target is modified to interpret the input data in a more useful way, then the previous input corpus is invalidated, as the serialization format is modified.
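To illustrate the last point, here is a minimal sketch of how a small change to a target can silently reinterpret every existing input. `ByteReader` is a hypothetical, simplified stand-in for `FuzzedDataProvider` (it is not the real API), and `Input`, `DecodeV1` and `DecodeV2` are made-up names for this example only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for FuzzedDataProvider: hands out bytes
// front to back and returns 0 once the input is exhausted.
class ByteReader {
public:
    explicit ByteReader(const std::vector<uint8_t>& data) : m_data(data) {}
    uint8_t ConsumeByte() { return m_pos < m_data.size() ? m_data[m_pos++] : 0; }
    uint16_t ConsumeU16() {
        // Read in separate statements so the byte order is well defined.
        const uint16_t lo = ConsumeByte();
        const uint16_t hi = ConsumeByte();
        return static_cast<uint16_t>(lo | (hi << 8));
    }
private:
    const std::vector<uint8_t>& m_data;
    size_t m_pos{0};
};

struct Input {
    uint8_t version;
    uint16_t amount;
    bool flag;
};

// Original target: reads the version, then the amount.
Input DecodeV1(const std::vector<uint8_t>& bytes) {
    ByteReader r(bytes);
    Input in{};
    in.version = r.ConsumeByte();
    in.amount = r.ConsumeU16();
    return in;
}

// Modified target: a new flag is consumed first, which shifts every later field.
Input DecodeV2(const std::vector<uint8_t>& bytes) {
    ByteReader r(bytes);
    Input in{};
    in.flag = (r.ConsumeByte() & 1) != 0;
    in.version = r.ConsumeByte();
    in.amount = r.ConsumeU16();
    return in;
}
```

The same corpus file decodes to a different version and amount under `DecodeV2`, so whatever coverage the old input used to reach is most likely lost.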
libFuzzer provides an interface for dealing with structured input formats: LLVMFuzzerCustomMutator and LLVMFuzzerCustomCrossOver. Using this interface it is possible to curate input corpora with highly structured input formats (e.g. png files, json, encrypted, compressed, base64 encoded). This is described here in detail.
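As a rough sketch of what such a custom mutator can look like, the example below keeps inputs valid for a toy format. The format (payload bytes followed by a one-byte checksum) and the helpers `Checksum` and `IsValid` are assumptions made up for this illustration; only the `LLVMFuzzerCustomMutator` signature comes from libFuzzer.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

// Assumed toy format: payload bytes followed by one checksum byte
// (sum of the payload modulo 256).
static uint8_t Checksum(const uint8_t* data, size_t size) {
    uint8_t sum = 0;
    for (size_t i = 0; i < size; ++i) sum = static_cast<uint8_t>(sum + data[i]);
    return sum;
}

// Returns true if the buffer is a well-formed input of the toy format.
bool IsValid(const uint8_t* data, size_t size) {
    return size >= 1 && Checksum(data, size - 1) == data[size - 1];
}

// libFuzzer calls this hook for mutations when the fuzz binary defines it.
extern "C" size_t LLVMFuzzerCustomMutator(uint8_t* data, size_t size,
                                          size_t max_size, unsigned int seed) {
    if (max_size < 2) return 0; // no room for one payload byte plus checksum
    std::minstd_rand rng(seed);
    // Treat a malformed input as an empty payload and start over.
    size_t payload = IsValid(data, size) ? size - 1 : 0;
    if (payload + 1 > max_size) payload = max_size - 1;
    if (payload == 0 || (rng() % 4 == 0 && payload + 1 < max_size)) {
        // Grow: append one random payload byte.
        data[payload++] = static_cast<uint8_t>(rng());
    } else {
        // Mutate: overwrite one existing payload byte.
        data[rng() % payload] = static_cast<uint8_t>(rng());
    }
    // Re-encode: recompute the checksum so the result stays valid.
    data[payload] = Checksum(data, payload);
    return payload + 1;
}
```

A byte-level mutator would almost always break the checksum, so the target would reject the input early. Decoding, mutating the payload, and re-encoding keeps every generated input past that check.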
libprotobuf-mutator is a library for mutating protocol buffers that also provides an interface around libFuzzer's custom mutator API. It allows us to specify input grammars using protobufs and exclusively provides useful mutations (i.e. mutations of the specified input format).
Using libprotobuf-mutator can address most of the issues of the `FuzzedDataProvider` approach.
- Input corpora exclusively consist of valid protobuf serializations. This makes seeding corpora quite easy, as all you need to do is provide your initial test cases in the protobuf format (i.e. have a script that produces useful initial test cases, similar to `feature_taproot --dumptests`, except that it should spit out protobufs instead of JSON objects). Sharing inputs between targets becomes much easier (e.g. if two targets make use of transactions as inputs, then copying the transactions from one target's corpus to the other can easily be automated).
- By default the protobufs are serialized into a human-readable format, which makes debugging of crashes easier and also enables hand-rolling (initial) test cases.
- IMO, writing protobuf definitions to define input grammars is very easy and maintainable. Looking at the protobuf definition gives an immediate overview of the input type a target takes (vs having to understand the combination of `FuzzedDataProvider` and `Consume*` calls).
- Modifying the target is possible without invalidating the existing inputs.
- (We could likely get rid of quite a few of our `Consume*` functions, meaning that there is less test-only code to maintain.)
I have provided three examples in this PR that make use of libprotobuf-mutator.
- Fuzzing mempool acceptance
- Fuzzing the version handshake
- Fuzzing validation (ProcessNewBlock, ProcessNewBlockHeaders, ProcessTransaction)
Further reading/watching:
- https://github.com/google/fuzzing/blob/master/docs/structure-aware-fuzzing.md
- https://github.com/google/fuzzing/blob/master/docs/split-inputs.md
- https://www.youtube.com/watch?v=U60hC16HEDY
- https://media.ccc.de/v/35c3-9579-attacking_chrome_ipc
Building this PR
First clone and build libprotobuf-mutator; instructions can be found in their readme.
Then compile the protobuf definitions in this PR to C++:
cd src/test/fuzz/proto/
protoc *.proto --cpp_out .
Next configure and build the proto fuzzer binaries:
./configure --enable-fuzz --enable-proto-fuzz --with-sanitizers=fuzzer && make
If you did not install the libprotobuf-mutator libraries and headers onto your system, then you might have to set LDFLAGS and CPPFLAGS to point to your local LPM build.
If you manage to build and run the fuzzers, you can inspect the generated inputs with cat or any editor of your choosing.
Looking for conceptual review