fuzz: Generate with random libFuzzer settings

maflcko commented at 8:45 am on July 28, 2023: member

Sometimes a libFuzzer setting like -use_value_profile=1 helps [0], sometimes it hurts [1].

[0] #20789 (comment) [1] #27888 (comment)

By picking a random value, it is ensured that at least some of the runs will have the beneficial configuration set.

Also, set -max_total_time to prevent slow fuzz targets from getting a larger time share, or possibly peg to a single core for a long time and block the python script from exiting for a long time. This can be improved in the future. For example, the python script can exit after some time (https://github.com/bitcoin/bitcoin/pull/20752#discussion_r549248791). Alternatively, it can measure if coverage progress was made and run for less time if no progress has been made recently anyway, so that more time can be spent on targets that are new or still make progress.

DrahtBot commented at 8:45 am on July 28, 2023: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	dergoegge, brunoerg, murchandamus
Stale ACK	darosior

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

DrahtBot added the label Tests on Jul 28, 2023

in test/fuzz/test_runner.py:264 in fa23beccc3 outdated

260@@ -259,9 +261,16 @@ def job(command, t, t_env):
261     for target, t_env in targets:
262         target_corpus_dir = corpus_dir / target
263         os.makedirs(target_corpus_dir, exist_ok=True)
264+        use_value_profile = int(random.random() < .3)
265+        max_len = 0 if random.random() < .7 else 2**random.randint(6, 12)

darosior commented at 8:59 am on July 28, 2023:

If we are going to set max_len, should we also set len_control?

maflcko commented at 9:06 am on July 28, 2023:

The default of -len_control=100 seems fine and applicable to all values of -max_len, no?

darosior commented at 9:19 am on July 28, 2023:

Yes, i guess 100 minutes is long enough for the fuzzer to eventually get to try larger inputs.

maflcko commented at 3:28 pm on February 5, 2024:

(removed max_len randomization for now)

maflcko force-pushed on Jul 28, 2023

DrahtBot added the label CI failed on Jul 28, 2023

darosior commented at 9:00 am on July 28, 2023: member

Concept ACK.

Also, randomize -mutate_depth, for fun.

Could you expand?

maflcko commented at 9:07 am on July 28, 2023: member

Concept ACK.

Also, randomize -mutate_depth, for fun.

Could you expand?

Not sure. I am happy to drop this, but it was part of https://www.github.com/bitcoin/bitcoin/pull/20752/commits/1ff0dc525f051bbc7a93312dd622340ca8f4f52c, which is why I picked it up.

darosior commented at 9:17 am on July 28, 2023: member

Not sure. I am happy to drop this

No, i was just curious about the rationale. I guess i’ve a mild preference for sticking to defaults if there isn’t any, but i don’t think it hurts either. ——- Original Message ——- On Friday, July 28th, 2023 at 11:07 AM, MacrabFalke @.***> wrote:

Concept ACK.

Also, randomize -mutate_depth, for fun.

Could you expand?

Not sure. I am happy to drop this, but it was part of https://www.github.com/bitcoin/bitcoin/pull/20752/commits/1ff0dc525f051bbc7a93312dd622340ca8f4f52c, which is why I picked it up.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

DrahtBot removed the label CI failed on Jul 28, 2023

maflcko force-pushed on Jul 28, 2023

maflcko commented at 11:44 am on July 28, 2023: member

Thanks, generated some data and edited the description to support the mutate_depth choice. (Limited to [1,10] for now, but this can be changed back to [1,15], if there is at least one data point showing that values larger than 10 are useful)

in test/fuzz/test_runner.py:271 in fad32eb02c outdated

264+        max_len = 0 if random.random() < .7 else 2**random.randint(6, 12)
265+        mutate_depth = random.randint(1, 10)
266         command = [
267             os.path.join(build_dir, 'src', 'test', 'fuzz', 'fuzz'),
268-            "-runs=100000",
269+            "-max_total_time=6000",

darosior commented at 12:34 pm on July 28, 2023:

That’s a large increase in runtime isn’t it? Won’t 100k iterations take much less than 100 minutes to complete on most of our targets currently?

We’ve got ~~115~~ 169 targets at the moments, this change makes it so running this script with --generate for all targets would take ~~around 8 days~~ more than a dozen of days.

maflcko commented at 12:49 pm on July 28, 2023:

The number of tasks that are spawned is ~300 (294 for me, but it will likely be more in the future). So even if you run each task only for a minute, it will take longer than 4 hours already. But running for a minute seems entirely useless, so I am not sure if it is worth optimizing for. (For smoke testing I just edit this script to -max_total_time=6).

In any way, you can pass --par 999 to run everything at the same time and be done in 100 minutes.

However, to avoid an edit for smoke testing, someone can add a config option in a follow-up or separate pull.

darosior commented at 1:29 pm on July 28, 2023:

Ok i’m confused. I’m aware i just gave this a superficial look but i’m absolutely not following here. (Last comment and if i don’t get it i’ll look into it more in depth.)

The number of tasks that are spawned is ~300

How comes? The default for --par is 4.

So even if you run each task only for a minute, it will take longer than 4 hours already

How could, with 300 tasks, running 169 tasks for 1 minute take 4 hours?

My main worry would be that increasing the runtime of the script by, i don’t know, 100X, would make it so people would always stop it before it completes thereby never generating seeds for the last (alphabetically-sorted) targets.

I may be missing underestimating how many tasks people usually spawn when running this script though. But --par 169 sounds like a lot.

maflcko commented at 1:40 pm on July 28, 2023:

How comes?

Sorry, with “The number of tasks that are spawned is ~300” I meant “The number of tasks that are spawned over the full runtime of the python script is ~300”.

How could

For example, with -j1: Running 300 1-minute-tasks spawned sequentially on one thread will take 300 minutes. (5 hours). Though, the main point is that 1 minute of runtime is useless. So optimizing for a short default runtime will quickly get you to make the entire script useless.

I am happy to change the number or drop this change, but generally I don’t think it will be possible to find a default number that makes everyone happy. There will always be a need to change the number some way or another.

darosior commented at 1:46 pm on July 28, 2023:

Indeed a too short runtime could make the script useless (since people would basically start running it in a loop itself), but a too long one would likely introduce a bias toward the first (alphabetically sorted) targets.

I guess it would be helpful to know who uses this script and how. For instance if they are continuously generating coverage most likely they are already running it in a loop of some sort and therefore a runtime of like 10minutes per target would spead the time spent over all targets instead of generating more for the first ones.

But i’ll stop bikeshedding now, your change looks good. If someone wants to change it they can open a PR and we’ll learn more about how the script is being used.

maflcko commented at 1:56 pm on July 28, 2023:

@murchandamus may be using it ?

murchandamus commented at 6:58 pm on October 4, 2023:

I tried using this script, but got away from that because it always tries to run all fuzz targets. I would either only be able to fuzz each target very briefly of which a big portion would just be the initialization, or if I were to kill the fuzzing when I need my computer for other things, always fuzz the same targets due to its determinism. I recently ended up making myself a bash script instead that randomly picks ten fuzz targets and fuzzes them for one hour each (with 12 processes in parallel). I run this nightly per a cronjob:

for i in $(find ../qa-assets-active-fuzzing/fuzz_seed_corpus/ -mindepth 1 -maxdepth 1 -type d | shuf | head -n10); do FUZZ=$(basename $i) src/test/fuzz/fuzz -jobs=12 -reload=1 -max_total_time=3600 $i; done

maflcko commented at 3:27 pm on February 5, 2024:

I am happy to review a pull request to change from running all targets to run N randomly drawn targets. However, I don’t need the feature, so I won’t be working on this personally.

maflcko commented at 3:28 pm on February 5, 2024:

In any case, the change to max_total_time seems fine, given that https://github.com/bitcoin/bitcoin/pull/28178/files#r1346327126 also does it.

in test/fuzz/test_runner.py:272 in fad32eb02c outdated

265+        mutate_depth = random.randint(1, 10)
266         command = [
267             os.path.join(build_dir, 'src', 'test', 'fuzz', 'fuzz'),
268-            "-runs=100000",
269+            "-max_total_time=6000",
270+            "-reload=0",

darosior commented at 1:48 pm on July 28, 2023:

Nice, do you happen to know why they don’t only set it to 1 when -jobs is not 0?

maflcko commented at 1:55 pm on July 28, 2023:

It is possible to launch multiple libFuzzer manually on the same folder, but this script doesn’t do it, and it seems unlikely that a user would do it.

murchandamus commented at 6:59 pm on October 4, 2023:

Oh, I’ve done that :)

maflcko commented at 3:26 pm on February 5, 2024:

Sure, most have done it, but have you done it while this script is running? :)

darosior approved

darosior commented at 1:49 pm on July 28, 2023: member

utACK fad32eb02c98d841e0cd9b585b61a698feec361a. I’m not using this script myself but the changes make sense to me.

fanquake requested review from dergoegge on Jul 28, 2023

dergoegge commented at 1:37 pm on August 1, 2023: member

I think I’d prefer continuing to use the defaults but I also don’t use the runner much for generating, so no super strong opinion. I would assume that the defaults are based on data/heuristics and probably work well most of the time, so maybe make this optional?

Sometimes a libFuzzer setting like -use_value_profile=1 helps [0], sometimes it hurts [1]. By picking a random value, it is ensured that at least some of the runs will have the beneficial configuration set.

Wouldn’t this only make sense if the “help” outweighs the “hurt”? For example, the metric you used in [1] is “iterations until bug found”, so if randomizing the fuzzer options results in roughly equal speedup (help) and slow down (hurt) then there is no gain on average.

maflcko commented at 1:55 pm on August 1, 2023: member

Except for mutate_depth, 70% of the time the default value is used for each setting. Happy to change mutate_depth as well, which would mean in .7**3 runs you get a “vanilla” libfuzzer.

I’d say the main goal here is to prevent a case where a bug isn’t found at all (or only after a “outlier” long time) due to the default settings. I don’t think we’ve seen such a bug in reality, so I am happy to close this pull.

Regardless, max_len should probably be considered separate, where the goal is to reduce the storage requirements of the qa-assets fuzz inputs folder. I haven’t collected any data on this either yet. Do you think it is worth it to try?

dergoegge commented at 12:06 pm on August 2, 2023: member

I’d say the main goal here is to prevent a case where a bug isn’t found at all (or only after a “outlier” long time) due to the default settings.

This sounds reasonable but also quite rare. Could you give me an example of a theoretical target where this could happen?

Are there any other projects that do this (e.g. oss-fuzz)?

I don’t think we’ve seen such a bug in reality, so I am happy to close this pull.

On a general note, I think it is fine to add to/improve on our fuzzing tools & techniques even if those haven’t found any bugs yet but there should be some evidence that they are useful.

Regardless, max_len should probably be considered separate, where the goal is to reduce the storage requirements of the qa-assets fuzz inputs folder. I haven’t collected any data on this either yet. Do you think it is worth it to try?

For max_len we could also consider adding size guards to the targets themselves, e.g. an early if (fuzz_data_size > x) return; which would result in fuzzers not adding (most) bigger inputs to the corpus.

maflcko commented at 12:23 pm on August 2, 2023: member

Are there any other projects that do this (e.g. oss-fuzz)?

IIRC oss-fuzz did that with AFL build time settings, but removed it again because it would turn up bugs intermittently/non-deterministically. I don’t think they do it for libFuzzer runtime settings, but I found this, which claims that exponential search is turned into linear search.

 0oss-fuzz]$ git grep -W -I use_value_profile 
 1projects/java-example/ExampleValueProfileFuzzer.java=public class ExampleValueProfileFuzzer {
 2projects/java-example/ExampleValueProfileFuzzer.java-  private static String base64(byte[] input) {
 3projects/java-example/ExampleValueProfileFuzzer.java-    return Base64.getEncoder().encodeToString(input);
 4projects/java-example/ExampleValueProfileFuzzer.java-  }
 5projects/java-example/ExampleValueProfileFuzzer.java-
 6projects/java-example/ExampleValueProfileFuzzer.java-  private static long insecureEncrypt(long input) {
 7projects/java-example/ExampleValueProfileFuzzer.java-    long key = 0xefe4eb93215cb6b0L;
 8projects/java-example/ExampleValueProfileFuzzer.java-    return input ^ key;
 9projects/java-example/ExampleValueProfileFuzzer.java-  }
10projects/java-example/ExampleValueProfileFuzzer.java-
11projects/java-example/ExampleValueProfileFuzzer.java-  public static void fuzzerTestOneInput(FuzzedDataProvider data) {
12projects/java-example/ExampleValueProfileFuzzer.java:    // Without -use_value_profile=1, the fuzzer gets stuck here as there is no direct correspondence
13projects/java-example/ExampleValueProfileFuzzer.java-    // between the input bytes and the compared string. With value profile, the fuzzer can guess the
14projects/java-example/ExampleValueProfileFuzzer.java-    // expected input byte by byte, which takes linear rather than exponential time.
15projects/java-example/ExampleValueProfileFuzzer.java-    if (base64(data.consumeBytes(6)).equals("SmF6emVy")) {
16projects/java-example/ExampleValueProfileFuzzer.java-      long[] plaintextBlocks = data.consumeLongs(2);
17projects/java-example/ExampleValueProfileFuzzer.java-      if (plaintextBlocks.length != 2)
18projects/java-example/ExampleValueProfileFuzzer.java-        return;
19projects/java-example/ExampleValueProfileFuzzer.java-      if (insecureEncrypt(plaintextBlocks[0]) == 0x9fc48ee64d3dc090L) {
20projects/java-example/ExampleValueProfileFuzzer.java:        // Without --fake_pcs (enabled by default with -use_value_profile=1), the fuzzer would get
21projects/java-example/ExampleValueProfileFuzzer.java-        // stuck here as the value profile information for long comparisons would not be able to
22projects/java-example/ExampleValueProfileFuzzer.java-        // distinguish between this comparison and the one above.
23projects/java-example/ExampleValueProfileFuzzer.java-        if (insecureEncrypt(plaintextBlocks[1]) == 0x888a82ff483ad9c2L) {
24projects/java-example/ExampleValueProfileFuzzer.java-          mustNeverBeCalled();
25projects/java-example/ExampleValueProfileFuzzer.java-        }
26projects/java-example/ExampleValueProfileFuzzer.java-      }
27projects/java-example/ExampleValueProfileFuzzer.java-    }
28projects/java-example/ExampleValueProfileFuzzer.java-  }
29projects/java-example/ExampleValueProfileFuzzer.java-
30projects/java-example/ExampleValueProfileFuzzer.java-  private static void mustNeverBeCalled() {
31projects/java-example/ExampleValueProfileFuzzer.java-    throw new IllegalStateException("mustNeverBeCalled has been called");
32projects/java-example/ExampleValueProfileFuzzer.java-  }
33projects/java-example/ExampleValueProfileFuzzer.java-}

there should be some evidence that they are useful.

I think I provided some evidence, but I guess you are asking about evidence that they are useful in the average case, which is a bit harder to show.

achow101 requested review from murchandamus on Sep 20, 2023

achow101 requested review from brunoerg on Sep 20, 2023

murchandamus commented at 7:12 pm on October 4, 2023: contributor

ACK fad32eb02c98d841e0cd9b585b61a698feec361a

I would use this script instead of my hacky bash script, if there were a way of running just n random fuzz targets instead of all of them. I don’t feel like I have a good intuition on the values you’re setting, but doing something non-default in 30% of the cases look reasonable to me, based on the evidence that those settings sometimes help.

DrahtBot removed review request from murchandamus on Oct 4, 2023

brunoerg commented at 5:21 pm on October 6, 2023: contributor

I would use this script instead of my hacky bash script, if there were a way of running just n random fuzz targets instead of all of them

like this, @murchandamus?

 0diff --git a/test/fuzz/test_runner.py b/test/fuzz/test_runner.py
 1index a60f51f40a..6510102e6c 100755
 2--- a/test/fuzz/test_runner.py
 3+++ b/test/fuzz/test_runner.py
 4@@ -80,6 +80,13 @@ def main():
 5              ' the given targets for a finite number of times. Outputs them to'
 6              ' the passed corpus_dir.'
 7     )
 8+    parser.add_argument(
 9+        '--max_targets',
10+        '-mt',
11+        type=int,
12+        default=0,
13+        help='Max number of targets to run (chosen randomly). 0 means to run all of them.',
14+    )
15 
16     args = parser.parse_args()
17     args.corpus_dir = Path(args.corpus_dir)
18@@ -121,6 +128,8 @@ def main():
19                 logging.error("Target \"{}\" not found in current target list.".format(excluded_target))
20                 continue
21             test_list_selection.remove(excluded_target)
22+    if args.max_targets > 0:
23+        test_list_selection = random.sample(test_list_selection, args.max_targets)
24     test_list_selection.sort()
25 
26     logging.info("{} of {} detected fuzz target(s) selected: {}".format(len(test_list_selection), len(test_list_all), " ".join(test_list_selection)))

brunoerg commented at 5:38 pm on October 6, 2023: contributor

I don’t have any strong opinion on this. Although ramdomizing it may make sense for me. In practice, I couldn’t see an effective case. I’m using mutation testing to see if I get any interesting stuff with this, I will have a better opinion soon.

in test/fuzz/test_runner.py:265 in fad32eb02c outdated

259@@ -259,9 +260,16 @@ def job(command, t, t_env):
260     for target, t_env in targets:
261         target_corpus_dir = corpus_dir / target
262         os.makedirs(target_corpus_dir, exist_ok=True)
263+        use_value_profile = int(random.random() < .3)
264+        max_len = 0 if random.random() < .7 else 2**random.randint(6, 12)
265+        mutate_depth = random.randint(1, 10)

brunoerg commented at 5:40 pm on October 6, 2023:

What is the default value for mutate_depth? 5?

maflcko commented at 3:29 pm on February 5, 2024:

mutate_depth was dropped from this pull, for now

DrahtBot added the label CI failed on Oct 25, 2023

DrahtBot commented at 2:34 pm on February 5, 2024: contributor

🤔 There hasn’t been much activity lately and the CI seems to be failing.

If no one reviewed the current pull request by commit hash, a rebase can be considered. While the CI failure may be a false positive, the CI hasn’t been running for some time, so there may be a real issue hiding as well. A rebase triggers the latest CI and makes sure that no silent merge conflicts have snuck in.

fuzz: Generate with random libFuzzer settings fa4e396e1d

maflcko force-pushed on Feb 5, 2024

maflcko commented at 3:21 pm on February 5, 2024: member

Removed the max_len and mutate_depth randomization, because it seemed too controversial? Removed this from OP:

Also, randomize -max_len= to possibly get some runs with faster iterations, or to produce smaller reduced fuzz inputs over time.

Also, randomize -mutate_depth, as lower values seem to be beneficial as well. [2]

[2] #27888 (comment)

This picks up the work started in commit https://www.github.com/bitcoin/bitcoin/pull/20752/commits/1ff0dc525f051bbc7a93312dd622340ca8f4f52c

fuzz: Set -rss_limit_mb=8000 for generate as well

This is set by merge, so set it here as well, to avoid OOM.

fa3a4102ef

maflcko commented at 3:22 pm on February 5, 2024: member

Also, added missing rss limit to avoid OOM

maflcko commented at 3:30 pm on February 5, 2024: member

cc @dergoegge are you still unconvinced about this? Is there more references or data I can provide?

dergoegge approved

dergoegge commented at 3:58 pm on February 5, 2024: member

utACK fa3a4102ef0ae06d8930d7a7b567759e2a5b5fde

DrahtBot requested review from murchandamus on Feb 5, 2024

DrahtBot requested review from darosior on Feb 5, 2024

DrahtBot removed the label CI failed on Feb 5, 2024

maflcko commented at 10:52 am on February 16, 2024: member

cc @murchandamus You may be interested in the rss_limit_mb (last commit). Not sure if you set it in your fuzz script, or if you ever ran into OOM.

brunoerg commented at 1:16 pm on February 16, 2024: contributor

light ACK fa3a4102ef0ae06d8930d7a7b567759e2a5b5fde

I did a light test of these values using differential fuzzing and mutation testing. I applied differential fuzzing between the original coin selection functions and their mutants. In general:

Running from seed corpus, -runs=100000 was able to kill most mutants (>55%) but -max_total_time=6000 is so much more effective (>90%), as expected.
Running without seed corpus, -runs=100000 was useless, -max_total_time=6000 was effective.
Running from seed corpus, -use_value_profile=1 reduced the time to kill some mutants in 55%. Especially for changes like “>” to >=" and similars.

Running only fuzzing, even knowing it is not so effective in killing mutants, I applied the mutations in part of the code that could impact the target. Results:

Running without seed corpus, -runs=100000 was useless, -max_total_time=6000 could kill some mutants.
Running from seed corpus, -use_value_profile=1 increased the number of killed mutants in 12% (running with -max_total_time=6000).

murchandamus commented at 6:42 pm on February 26, 2024: contributor

cc @murchandamus You may be interested in the rss_limit_mb (last commit). Not sure if you set it in your fuzz script, or if you ever ran into OOM.

Thanks, memory has not been an issue for me.

murchandamus commented at 6:58 pm on February 26, 2024: contributor

utACK fa3a4102ef0ae06d8930d7a7b567759e2a5b5fde

DrahtBot removed review request from murchandamus on Feb 26, 2024

fanquake merged this on Feb 27, 2024

fanquake closed this on Feb 27, 2024

maflcko deleted the branch on Feb 27, 2024

PastaPastaPasta referenced this in commit 8f04ba19ab on Oct 24, 2024

PastaPastaPasta referenced this in commit 92bad90e6c on Oct 24, 2024

PastaPastaPasta referenced this in commit aaccc9ea51 on Oct 24, 2024

bitcoin locked this on Feb 26, 2025

fuzz: Generate with random libFuzzer settings #28178

Code Coverage

Reviews