MarcoFalke commented at 11:56 am on September 27, 2021: member

Most fuzz engines use coverage signals (for example line coverage or edge coverage) to determine which fuzz inputs to keep. This works generally well, unless the fuzzed logic is stateful.

One example is the script interpreter / script parsing. I believe there is no difference in coverage signals between a 1-of-1 multisig and a 1-of-16 multisig, which is presumably why it took several months to find a bug in the script fuzz target: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39152

There is -use_value_profile=1, which might help a bit here, but apparently it didn’t help enough.

I am wondering what the best way is to address this shortcoming, so that it is possible to find the above bug (and similar ones) in less CPU time?

cc @guidovranken @agroce @practicalswift

agroce commented at 2:58 pm on September 27, 2021: contributor

The same issue comes up in machine learning: every path through a forest of decision trees looks alike, because the code’s just a loop through a structure that’s the real code.

People have done various things on this I don’t recall offhand, not sure any that are out there work way better than libFuzzer value profiles, unless custom targeted.

One idea I had but have never explored is to use some kind of annotation on key structures that, in a fuzz build, expands to explicit coverage targets somehow. I suspect in C++ a modest annotation + evil template magic could do this. My (very unsupported) guess is that -use_value_profiles=1 is not great at this because it has no way to know which state is noise and which is important. I have an NSF proposal in (no clue if we’ll get funded) that has adding support for this kind of thing to DeepState as a major component.

Writing an #IF_FUZZ hand “fake coverage” that won’t get optimized away for critical targets might be doable in some cases, like here?

guidovranken commented at 5:10 pm on September 27, 2021: contributor

@MarcoFalke

Input size control

Limit the maximum size (libFuzzer flag: -max_len=N). In my experience the vast majority of bugs for most fuzzers (in general) can be constructed with very small files (in the order of several hundreds of bytes). I think OSS-Fuzz uses a max_len of 1 megabyte which is far too large for most targets. The benefit of limiting the max input size to 4KB or smaller is that the search space in terms of volume becomes much smaller, each mutation has more chance of making a meaningful change and hitting a bug. -max_len=4096 is often demonstrably faster in finding bugs than say -max_len=1048576 The only downside of a hard limit is that you will miss bugs which can only be expressed with an input larger than the hard limit.

libFuzzer by default favors smaller inputs and gradually move towards larger inputs. This is controlled by the len_control flag:

0 len_control                         	100	Try generating small inputs first, then try larger inputs over time.  Specifies the rate at which the length limit is increased (smaller == faster).  If 0, immediately try inputs with size up to max_len. Default value is 0, if LLVMFuzzerCustomMutator is used.

So if a hard limit is not desirable you can consider tweaking len_control.

Extra counters

You can use an additional coverage signal by defining (at the global scope, and use extern "C" {} when using C++):

0extern "C" {
1__attribute__((section("__libfuzzer_extra_counters")))
2uint8_t bitcoin_extra_counters[1024];
3}

libFuzzer will see this on startup and use its contents as an additional signal. You don’t need to memset it to 0 each iteration, libFuzzer does that for you.

You can use some arbitrary logic to signal progress to libFuzzer, e.g.

0extern "C" {
1__attribute__((section("__libfuzzer_extra_counters")))
2uint64_t  bitcoin_multisig_size;
3}

and in the script code

0bitcoin_multisig_size = multisig_size;

Alternatively or additionally you can consider hashing some core characteristics of the script environment, for example opcode + program counter + stack size, e.g.:

0extern "C" {
1__attribute__((section("__libfuzzer_extra_counters")))
2uint8_t bitcoin_extra_counters[65535];
3}

and in the script instruction loop:

0bitcoin_extra_counters[ (op^pc^stack.size()) % 65535 ] ++;

or something to that effect.

This ought to signal the VM state more accurately to libFuzzer and should expedite the bug finding.

Some fuzzers even work exclusively using extra counters, such as go-fuzz in libFuzzer mode: https://github.com/dvyukov/go-fuzz/blob/4980593459a186bd2a389fe4557a260cce742594/go-fuzz-build/main.go#L857-L863 And this works very well.

guidovranken commented at 6:00 am on September 28, 2021: contributor

I looked at this specific fuzzer and found that none of the files in the https://github.com/bitcoin-core/qa-assets/tree/main/fuzz_seed_corpus/script directory hit this block: https://github.com/bitcoin/bitcoin/blob/e826b22da252e0599c61d21c98ff89f366b3120f/src/script/standard.cpp#L284-L296

Specifically, no input satisfies this condition:

https://github.com/bitcoin/bitcoin/blob/e826b22da252e0599c61d21c98ff89f366b3120f/src/script/standard.cpp#L141

So in this particular case the problem seems to be that the fuzzer is not able to construct a multisig script in the first place, and a custom mutator might help.

MarcoFalke referenced this in commit fa4baf0756 on Nov 1, 2021

MarcoFalke commented at 11:31 am on November 1, 2021: member

Thanks for all the responses.

I looked a bit more at the particular multisig problem and suspect the issue is that multisig is too structured for a fuzz engine to quickly figure it out.

Basically, it needs to guess:

OP_n, where n is exactly the number of pubkeys
The serialization of n pubkeys (Correct serialized length before the serialized byte vector, and inside the byte vector, the correct CPubKey::GetLen() prefix)
OP_k and no further opcodes

It shouldn’t be too hard for the fuzz engine to find an 1-of-1 multisig with the help of use_value_profile. Though, extending it even to 1-of-2 multisig is quite a lot harder, since it requires to guess the correct OP_n and a correctly pushed pubkey at the right position.

I tried aiding the fuzz engine by reworking ConsumeScript to keep a byte buffer which may be reused. I suspected that this will make it possible to find a 1-of-2 multisig in less than 100 CPU hours on my laptop. Without success I aborted after 200-300 CPU hours.

I also tried using a dictionary with pubkeys as entries. This didn’t seem to improve either.

Finally, adding an explicit path to ConsumeScript to “fake” a multisig worked the best. See #23408.

Closing this for now, but feel free to continue the discussion if there is anything to add.

MarcoFalke closed this on Nov 1, 2021

darosior commented at 8:58 am on November 11, 2021: member

Do you think it would be worth adding mutators using Miniscript?

First it would allow to more efficiently fuzz Miniscript [0] once/if included (finding well structured Miniscript nodes sounds way harder than finding a multisig structure, at least for higher-level nodes). It could also help generating more interesting valid scripts for other targets.

[0] https://github.com/sipa/miniscript/pull/78

MarcoFalke commented at 1:40 pm on November 11, 2021: member

Sure, also one for descriptors could make sense.

MarcoFalke referenced this in commit 024e4debc5 on Nov 15, 2021

sidhujag referenced this in commit c4ff2e2987 on Nov 16, 2021

laanwj referenced this in commit d492dc1cda on Apr 5, 2022

janus referenced this in commit f5af74462f on Jul 3, 2022

DrahtBot locked this on Nov 11, 2022

How to fuzz stateful logic? #23105

Input size control

Extra counters