This adds a startup option to test/fuzz/test_runner.py which allows to generate seed corpus to the passed seed_dir instead of using them.
Add a seed corpus generation option to the fuzzing test_runner #19659
pull darosior wants to merge 1 commits into bitcoin:master from darosior:20/08/04/fuzz_runner_gen_seeds changing 1 files +60 −12-
darosior commented at 4:46 PM on August 4, 2020: member
-
in test/fuzz/test_runner.py:111 in 5adbd1b2a8 outdated
107 | @@ -100,6 +108,11 @@ def main(): 108 | 109 | logging.info("{} of {} detected fuzz target(s) selected: {}".format(len(test_list_selection), len(test_list_all), " ".join(test_list_selection))) 110 | 111 | + if args.generate:
MarcoFalke commented at 5:03 PM on August 4, 2020:Any reason to not reuse the fuzz_pool below? Benefit is also that it will check for libFuzzer and abort early if not found
darosior commented at 5:09 PM on August 4, 2020:The main reason was to avoid the seedless check just below
MarcoFalke commented at 5:11 PM on August 4, 2020:The check is just a debug print, no? If it does any harm, you could disable it when
args.generate.Though, I like the libFuzzer check to run before any libFuzzer specific args are passed to a subprocess.
darosior commented at 5:13 PM on August 4, 2020:Ok, will change
in test/fuzz/test_runner.py:182 in 5adbd1b2a8 outdated
177 | + def job(command): 178 | + try: 179 | + logging.debug("Running {}..".format(" ".join(command))) 180 | + logging.debug("Output: {}\n\n".format( 181 | + subprocess.run(command, check=True, 182 | + stderr=subprocess.PIPE, timeout=n*5).stderr
MarcoFalke commented at 5:04 PM on August 4, 2020:Any reason to set a time based timeout here, but max runs by number below?
darosior commented at 5:12 PM on August 4, 2020:Yeah, the max runs is for the number of targets per batch to run in parallel, while this timeout is because some targets will have a harder time finding new coverage before some others and is to avoid obstructing the queue with them.
darosior commented at 5:15 PM on August 4, 2020:Hmm actually i used something different and timed out slow fuzzing processes because they were blocking the whole next batch.. With a thread pool maybe it's not necessary ?
MarcoFalke commented at 5:05 PM on August 4, 2020: memberConcept ACK
DrahtBot added the label Tests on Aug 4, 2020darosior force-pushed on Aug 4, 2020darosior commented at 5:51 PM on August 4, 2020: memberOk, removed the early return in main to keep the libfuzzer check, and removed the timeout
in test/fuzz/test_runner.py:176 in 749f6cb52c outdated
171 | + """Generates new corpus seeds. 172 | + 173 | + Run {n} {targets} at a time without input, and outputs the generated corpus 174 | + seeds to {seed_dir}. 175 | + """ 176 | + logging.info(f"Generating corpus seeds to {seed_dir}")
Crypt-iQ commented at 5:26 AM on August 5, 2020:Extraneous
finf"Generating..."
darosior commented at 9:17 AM on August 5, 2020:Had to drop it anyway because we need to support python 3.5 !
Crypt-iQ commented at 5:29 AM on August 5, 2020: contributorAfter locally fixing the log statement, I was able to run this and get a starting corpus pretty quickly. Why wouldn't you just use the populated corpus from
qa-assets?in test/fuzz/test_runner.py:63 in 749f6cb52c outdated
55 | @@ -56,6 +56,14 @@ def main(): 56 | '--m_dir', 57 | help='Merge inputs from this directory into the seed_dir. Needs /target subdirectory.', 58 | ) 59 | + parser.add_argument( 60 | + '-g', 61 | + '--generate', 62 | + action='store_true', 63 | + help='Create new corpus seeds by running the given targets by batches '
MarcoFalke commented at 7:04 AM on August 5, 2020:help='Create new corpus seeds (or extend the existing ones) by running the given targets by batches 'This will also take into account existing seeds, right?
darosior commented at 9:12 AM on August 5, 2020:Yep
in test/fuzz/test_runner.py:170 in 749f6cb52c outdated
166 | @@ -152,6 +167,35 @@ def main(): 167 | ) 168 | 169 | 170 | +def generate_corpus_seeds(fuzz_pool, build_dir, seed_dir, targets, n):
MarcoFalke commented at 7:06 AM on August 5, 2020:def generate_corpus_seeds(*, fuzz_pool, build_dir, seed_dir, targets, n):python is not type safe, so please use named arguments to not accidentally mix up the order of the args and to simplify review
MarcoFalke commented at 7:08 AM on August 5, 2020:nis unused, no?
darosior commented at 9:13 AM on August 5, 2020:Done
darosior commented at 9:14 AM on August 5, 2020:Yes, forgot to remove it along with the ThreadPool. Thanks!
in test/fuzz/test_runner.py:181 in 749f6cb52c outdated
176 | + logging.info(f"Generating corpus seeds to {seed_dir}") 177 | + 178 | + def job(command): 179 | + logging.debug("Running {}..".format(" ".join(command))) 180 | + logging.debug("Output: {}\n\n".format( 181 | + subprocess.run(command, check=True, stderr=subprocess.PIPE).stderr
MarcoFalke commented at 7:09 AM on August 5, 2020:is stderr bytes or an utf-8 string?
darosior commented at 9:21 AM on August 5, 2020:Bytes :( Will pass encoding to
run().darosior force-pushed on Aug 5, 2020darosior commented at 9:29 AM on August 5, 2020: memberAfter locally fixing the log statement, I was able to run this and get a starting corpus pretty quickly. Why wouldn't you just use the populated corpus from qa-assets?
This is used to actually populate and extend the seed corpus, which we use to test against.
in test/fuzz/test_runner.py:184 in 191a161e85 outdated
179 | + 180 | + def job(command): 181 | + logging.debug("Running {}..".format(" ".join(command))) 182 | + logging.debug("Output: {}\n\n".format( 183 | + subprocess.run(command, check=True, stderr=subprocess.PIPE, 184 | + text=True).stderr
MarcoFalke commented at 9:36 AM on August 5, 2020:universal_newlines=True).stderrotherwise it would be python3.7 only
in test/fuzz/test_runner.py:182 in 191a161e85 outdated
177 | + """ 178 | + logging.info("Generating corpus seeds to {}".format(seed_dir)) 179 | + 180 | + def job(command): 181 | + logging.debug("Running {}..".format(" ".join(command))) 182 | + logging.debug("Output: {}\n\n".format(
MarcoFalke commented at 9:37 AM on August 5, 2020:Would probably be good to print the command again, otherwise it can't be correlated
darosior commented at 10:04 AM on August 5, 2020:Done
MarcoFalke approvedMarcoFalke commented at 9:38 AM on August 5, 2020: memberre-ACK. Might test soon
darosior force-pushed on Aug 5, 2020Crypt-iQ commented at 10:40 AM on August 5, 2020: contributorAfter locally fixing the log statement, I was able to run this and get a starting corpus pretty quickly. Why wouldn't you just use the populated corpus from qa-assets?
This is used to actually populate and extend the seed corpus, which we use to test against.
Is there ever a scenario in which you want to run this from scratch? I would imagine you could always just use your existing coverage instead of running from scratch unless I'm thinking about this all wrong.
darosior commented at 12:51 PM on August 5, 2020: memberIs there ever a scenario in which you want to run this from scratch?
Generating from scratch, yes to create the initial corpus to test against. Generating not from scratch, yes to increase coverage to test against.
Crypt-iQ commented at 1:09 PM on August 5, 2020: contributorIs there ever a scenario in which you want to run this from scratch?
Generating from scratch, yes to create the initial corpus to test against. Generating not from scratch, yes to increase coverage to test against.
I understand the second point, and I could see in some ways the first could be useful so Concept ACK from me. It's really for my own edification. Since some of these harnesses require very specific, structured input (see
process_messages), the coverage won't be extensive if generated from scratch. Is the idea to eventually increase the coverage that can be generated from scratch via structured mutators and such?darosior force-pushed on Aug 8, 2020Crypt-iQ commented at 12:37 PM on August 8, 2020: contributorA bit unrelated to this PR, but thought I'd ask since it's kind of confusing me. When trying to run
test/fuzz/test_runner.pywith:ASAN_OPTIONS="detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1" test/fuzz/test_runner.py seed_dir=~/qa-assets/fuzz_seed_corpus --loglevel DEBUGThe fuzzing harness is only run a very small number of times which is certainly not enough to run every corpus input against the harnesses. Example output:
Run pub_key_deserialize with args ['/root/bitcoin/src/test/fuzz/pub_key_deserialize', '-runs=1', 'seed_dir=/root/qa-assets/fuzz_seed_corpus/pub_key_deserialize']INFO: Seed: 3809663835 INFO: Loaded 1 modules (504 inline 8-bit counters): 504 [0x55ed26ff4ba8, 0x55ed26ff4da0), INFO: Loaded 1 PC tables (504 PCs): 504 [0x55ed26ff4da0,0x55ed26ff6d20), INFO: 3 files found in seed_dir=/root/qa-assets/fuzz_seed_corpus/pub_key_deserialize INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes INFO: seed corpus: files: 3 min: 2b max: 4b total: 9b rss: 28Mb [#4](/bitcoin-bitcoin/4/) INITED cov: 22 ft: 27 corp: 3/9b exec/s: 0 rss: 29Mb [#4](/bitcoin-bitcoin/4/) DONE cov: 22 ft: 27 corp: 3/9b lim: 4 exec/s: 0 rss: 29Mb Done 4 runs in 0 second(s) Run process_messages with args ['/root/bitcoin/src/test/fuzz/process_messages', '-runs=1', 'seed_dir=/root/qa-assets/fuzz_seed_corpus/process_messages']INFO: Seed: 3809693158 INFO: Loaded 1 modules (177287 inline 8-bit counters): 177287 [0x55751462da08, 0x557514658e8f), INFO: Loaded 1 PC tables (177287 PCs): 177287 [0x557514658e90,0x55751490d700), INFO: 8 files found in seed_dir=/root/qa-assets/fuzz_seed_corpus/process_messages INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes INFO: seed corpus: files: 8 min: 1b max: 4b total: 18b rss: 120Mb [#9](/bitcoin-bitcoin/9/) INITED cov: 611 ft: 1382 corp: 6/12b exec/s: 0 rss: 130Mb [#9](/bitcoin-bitcoin/9/) DONE cov: 611 ft: 1382 corp: 6/12b lim: 4 exec/s: 0 rss: 130Mb Done 9 runs in 0 second(s)An argument could be made for
-runs=1being raised as running a couple inputs against a fuzzing harness isn't too effective I feel like. Could raise it a little to get some decent amount of fuzzing coverage every time this python script is run. It can't be raised too much as I know some fuzzing harnesses (process_messages) take ~10 hours to go through every single input file. Thoughts?EDIT: It's just a libFuzzer option and when
-runs=1, it doesn't log all the output.in test/fuzz/test_runner.py:64 in 395e7d38db outdated
55 | @@ -56,6 +56,14 @@ def main(): 56 | '--m_dir', 57 | help='Merge inputs from this directory into the seed_dir. Needs /target subdirectory.', 58 | ) 59 | + parser.add_argument( 60 | + '-g', 61 | + '--generate', 62 | + action='store_true', 63 | + help='Create new corpus seeds (or extend the existing ones) by running' 64 | + ' the given targets for a finite amount of time. Outputs them to '
Crypt-iQ commented at 4:25 PM on August 9, 2020:These no longer run for a finite amount of time, maybe finite amount of runs instead?
darosior commented at 8:23 AM on August 10, 2020:
Crypt-iQ commented at 11:57 AM on August 10, 2020:Oh sorry, I meant to modify the comment
darosior commented at 12:01 PM on August 10, 2020:Oh yeah, you right: I misunderstood :)
darosior commented at 12:03 PM on August 10, 2020:Amended to outline that i meant time not as in timestamp but as in "1 time, 2 times":
diff --git a/test/fuzz/test_runner.py b/test/fuzz/test_runner.py index cfe2e67b5..5ad1c9b9f 100755 --- a/test/fuzz/test_runner.py +++ b/test/fuzz/test_runner.py @@ -61,8 +61,8 @@ def main(): '--generate', action='store_true', help='Create new corpus seeds (or extend the existing ones) by running' - ' the given targets for a finite amount of time. Outputs them to ' - 'the passed seed_dir.' + ' the given targets for a finite number of times. Outputs them to' + ' the passed seed_dir.' ) args = parser.parse_args()Crypt-iQ approvedCrypt-iQ commented at 6:29 PM on August 9, 2020: contributorTested ACK 395e7d3
Running with
--generateand no seed directory onprocess_messages:['/root/bitcoin/src/test/fuzz/process_messages', '-runs=100000', '/root/seed-run/process_messages'] output: INFO: Seed: 4176416250 INFO: Loaded 1 modules (177403 inline 8-bit counters): 177403 [0x561f7c684dc8, 0x561f7c6b02c3), INFO: Loaded 1 PC tables (177403 PCs): 177403 [0x561f7c6b02c8,0x561f7c965278), INFO: 0 files found in /root/seed-run/process_messages INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes INFO: A corpus is not provided, starting from an empty corpus [#2](/bitcoin-bitcoin/2/) INITED cov: 591 ft: 592 corp: 1/1b exec/s: 0 rss: 76Mb [#3](/bitcoin-bitcoin/3/) NEW cov: 594 ft: 912 corp: 2/2b lim: 4 exec/s: 0 rss: 77Mb L: 1/1 MS: 1 ChangeByte- [#4](/bitcoin-bitcoin/4/) NEW cov: 594 ft: 1346 corp: 3/3b lim: 4 exec/s: 0 rss: 77Mb L: 1/1 MS: 1 ChangeByte- [#5](/bitcoin-bitcoin/5/) NEW cov: 594 ft: 1347 corp: 4/4b lim: 4 exec/s: 0 rss: 77Mb L: 1/1 MS: 1 ChangeBit- [#6](/bitcoin-bitcoin/6/) NEW cov: 597 ft: 1350 corp: 5/6b lim: 4 exec/s: 0 rss: 77Mb L: 2/2 MS: 1 CopyPart- ... [#95703](/bitcoin-bitcoin/95703/) REDUCE cov: 889 ft: 3018 corp: 233/14597b lim: 198 exec/s: 2990 rss: 80Mb L: 142/198 MS: 1 EraseBytes- [#95823](/bitcoin-bitcoin/95823/) REDUCE cov: 889 ft: 3018 corp: 233/14573b lim: 198 exec/s: 2994 rss: 80Mb L: 133/198 MS: 5 InsertRepeatedBytes-InsertByte-CopyPart-ChangeBit-EraseBytes- [#95980](/bitcoin-bitcoin/95980/) REDUCE cov: 889 ft: 3018 corp: 233/14568b lim: 198 exec/s: 2999 rss: 80Mb L: 143/198 MS: 2 ChangeBinInt-EraseBytes- [#96372](/bitcoin-bitcoin/96372/) REDUCE cov: 889 ft: 3018 corp: 233/14545b lim: 198 exec/s: 3011 rss: 80Mb L: 98/198 MS: 2 ChangeBinInt-EraseBytes- [#96999](/bitcoin-bitcoin/96999/) REDUCE cov: 889 ft: 3018 corp: 233/14522b lim: 198 exec/s: 3031 rss: 80Mb L: 158/198 MS: 2 InsertByte-EraseBytes- [#97228](/bitcoin-bitcoin/97228/) REDUCE cov: 889 ft: 3018 corp: 233/14519b lim: 198 exec/s: 3038 rss: 80Mb L: 51/198 MS: 4 CMP-ShuffleBytes-EraseBytes-ShuffleBytes- DE: "\xff\xff\xff\xff"- [#97241](/bitcoin-bitcoin/97241/) REDUCE cov: 889 ft: 3018 corp: 233/14507b lim: 198 exec/s: 3038 rss: 80Mb L: 131/198 MS: 3 ChangeBinInt-InsertRepeatedBytes-EraseBytes- [#97387](/bitcoin-bitcoin/97387/) REDUCE cov: 889 ft: 3018 corp: 233/14485b lim: 198 exec/s: 3043 rss: 80Mb L: 146/198 MS: 1 EraseBytes- [#98259](/bitcoin-bitcoin/98259/) NEW cov: 889 ft: 3019 corp: 234/14655b lim: 205 exec/s: 3070 rss: 80Mb L: 170/198 MS: 2 ChangeBit-InsertRepeatedBytes- [#99240](/bitcoin-bitcoin/99240/) REDUCE cov: 889 ft: 3019 corp: 234/14652b lim: 212 exec/s: 3007 rss: 80Mb L: 31/198 MS: 1 EraseBytes- [#99898](/bitcoin-bitcoin/99898/) REDUCE cov: 889 ft: 3019 corp: 234/14650b lim: 212 exec/s: 3027 rss: 80Mb L: 87/198 MS: 3 ShuffleBytes-ChangeBit-EraseBytes- [#99917](/bitcoin-bitcoin/99917/) NEW cov: 889 ft: 3020 corp: 235/14861b lim: 212 exec/s: 3027 rss: 80Mb L: 211/211 MS: 4 InsertByte-InsertRepeatedBytes-InsertRepeatedBytes-CopyPart- [#100000](/bitcoin-bitcoin/100000/) DONE cov: 889 ft: 3020 corp: 235/14861b lim: 212 exec/s: 3030 rss: 80MbSince it's super structured data, coverage isn't that high (running with
qa-assetscorpus and some local seeds, I get aroundcov: 4700). I think anything that has low coverage with no starting seed directory could be a target for structured fuzzing?darosior force-pushed on Aug 10, 2020darosior force-pushed on Aug 11, 2020darosior force-pushed on Aug 13, 2020practicalswift commented at 3:49 PM on August 14, 2020: contributorTested ACK 0b83151f1042f9d3978ad07a573e89847733b456
Thanks for improving
test/fuzz/test_runner.py-- nice stuff! :)in test/fuzz/test_runner.py:182 in 0b83151f10 outdated
177 | + """ 178 | + logging.info("Generating corpus seeds to {}".format(seed_dir)) 179 | + 180 | + def job(command): 181 | + logging.debug("Running {}..".format(" ".join(command))) 182 | + logging.debug("{} output: {}\n\n".format(
MarcoFalke commented at 9:23 AM on August 22, 2020:Seems odd to append two dots to the command. Could at least add
'around the command?diff --git a/test/fuzz/test_runner.py b/test/fuzz/test_runner.py index 5ad1c9b9fa..f2a8aefd05 100755 --- a/test/fuzz/test_runner.py +++ b/test/fuzz/test_runner.py @@ -178,9 +178,9 @@ def generate_corpus_seeds(*, fuzz_pool, build_dir, seed_dir, targets): logging.info("Generating corpus seeds to {}".format(seed_dir)) def job(command): - logging.debug("Running {}..".format(" ".join(command))) - logging.debug("{} output: {}\n\n".format( - command, + logging.debug("Running '{}' ...\n".format(" ".join(command))) + logging.debug("Command '{}' output:\n{}\n".format( + ' '.join(command), subprocess.run(command, check=True, stderr=subprocess.PIPE, universal_newlines=True).stderr ))
MarcoFalke commented at 9:24 AM on August 22, 2020:Also a newline between the two logs would help, as they are unrelated in the overall log once more than one thread is running (see above suggested diff)
darosior commented at 4:01 PM on August 23, 2020:Added
in test/fuzz/test_runner.py:200 in 0b83151f10 outdated
195 | + target_seed_dir, 196 | + ] 197 | + futures.append(fuzz_pool.submit(job, command)) 198 | + 199 | + for future in as_completed(futures): 200 | + continue
MarcoFalke commented at 9:38 AM on August 22, 2020:Any reason to ignore failures?
future.result()
darosior commented at 3:59 PM on August 23, 2020:Nope, corrected.
MarcoFalke commented at 9:38 AM on August 22, 2020: memberSome questions
ac9927d517c833eb01e02ada097d095559f1f435
darosior force-pushed on Aug 23, 2020darosior force-pushed on Aug 23, 202015ae4a17c4test/fuzz: add a seed corpus generation option to the test_runner
Signed-off-by: Antoine Poinsot <darosior@protonmail.com>
darosior force-pushed on Aug 23, 2020MarcoFalke commented at 6:02 AM on August 24, 2020: memberACK 15ae4a17c4
MarcoFalke merged this on Aug 24, 2020MarcoFalke closed this on Aug 24, 2020jasonbcox referenced this in commit 10f2337d21 on Nov 6, 2020darosior deleted the branch on Jan 12, 2021DrahtBot locked this on Aug 16, 2022ContributorsLabels
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-24 21:14 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me