fuzz: execute each file in dir without fuzz engine #21496

pull ghost wants to merge 2 commits into bitcoin:master from changing 1 files +57 −2
  1. ghost commented at 8:47 pm on March 21, 2021: none

    Fixes #21461

    This supports whole directory runs of the fuzzing seeds without needing a fuzzing library. A few design decisions that I went with but happy to change:

    1. Very basic multi arg support a. Can have 0 args and continue to feed via file input like before b. Can have 1 or more args that is a file or directory d. Subdirectories are ignored if you pass in a directory (though you can use dir/* to include files from subdirectories)
    2. Based the file reading off of util/readwritefile.cpp with small changes to make it a buffer instead of a string (could reuse that but turn string to buffer?)
    3. The new changes are ignored in the __AFL_LOOP path or if built without DPROVIDE_FUZZ_MAIN_FUNCTION

    To build without libFuzzer, exclude the sanitizers. IE:

    0CC=clang CXX=clang++ ./configure BDB_LIBS="-L${BDB_PREFIX}/lib -ldb_cxx-4.8" BDB_CFLAGS="-I${BDB_PREFIX}/include" --without-gui --with-zmq --enable-fuzz
    

    That should have the CPPFLAGS of -DPROVIDE_FUZZ_MAIN_FUNCTION to indicicate it was built with the main function to use instead of the libFuzzer LLVMFuzzerInitialize & LLVMFuzzerTestOneInput methods.

    Tests:

     0# clean and build
     1make clean
     2make -j "$(($(nproc)+1))"
     3
     4# get qa-assets if you don't have already
     5git clone https://github.com/bitcoin-core/qa-assets
     6
     7# existing way to feed 1 at a time, still supported
     8FUZZ=process_message src/test/fuzz/fuzz < qa-assets/fuzz_seed_corpus/process_message/1258dd51f2a5f3221b33a306279ef7290c5fca6d
     9
    10# new with this PR: one at a time
    11FUZZ=process_message src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/process_message/1258dd51f2a5f3221b33a306279ef7290c5fca6d
    12
    13# or multiple files at the same time
    14FUZZ=process_message src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/process_message/1258dd51f2a5f3221b33a306279ef7290c5fca6d qa-assets/fuzz_seed_corpus/process_message/32230c42df1caa3220a3f95ee8a8d2865731ca3e qa-assets/fuzz_seed_corpus/process_message/90329d105d238ed81452077b4e0287a2040db9e9
    15
    16# new with this PR: whole directory at a time
    17FUZZ=process_message src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/process_message
    18
    19# or mix of files and directories at the same time
    20FUZZ=process_message src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/process_message/1258dd51f2a5f3221b33a306279ef7290c5fca6d qa-assets/fuzz_seed_corpus/process_message/32230c42df1caa3220a3f95ee8a8d2865731ca3e qa-assets/fuzz_seed_corpus/process_message/90329d105d238ed81452077b4e0287a2040db9e9 qa-assets/fuzz_seed_corpus/process_message/
    21
    22# new with this PR: wildcard support
    23FUZZ=process_messages src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/process_messages/* 
    24
    25# new with this PR: run all seeds in all targets, one target/directory at a time 
    26for D in qa-assets/fuzz_seed_corpus/*; do [ -d "${D}" ] && echo "${D##*/}" && FUZZ="${D##*/}" src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/"${D##*/}"; done
    

    Happy to make any changes and very much welcome code feedback, my c++ a bit rusty and still figuring out the codebase.

  2. DrahtBot added the label Tests on Mar 21, 2021
  3. fuzz: execute each file in dir without fuzz engine c8ee766ed7
  4. ghost commented at 1:40 am on March 22, 2021: none

    Oh, didn’t realize that filesystem isn’t available right now, was just going by the issue suggestion when I implemented it here. Saw this issue: #20744

    I guess the alternative is to either wait for the switch or to go with the boost approach for iterating through files in a directory?

  5. ajtowns commented at 1:48 am on March 22, 2021: member
    Concept ACK. Not sure if there’s much point to passing in a single file (as opposed to < single_file). If a file can’t be read, maybe continue to try the rest of them, or assert(false) to indicate an error rather than return 0 though?
  6. ajtowns commented at 1:55 am on March 22, 2021: member
    Might be good to update test/fuzz/test_runner.py to work with these changes – it’s got a couple of hardcoded things that expect libFuzzer/ENABLE_FUZZ and a few things that won’t work without fuzzing (m_dir/merge_inputs), and adds a -runs=1 arg that’s not needed without fuzzing, but aside from that works fine with phuzzing (phony fuzzing (tm)).
  7. fanquake commented at 5:05 am on March 22, 2021: member

    Oh, didn’t realize that filesystem isn’t available right now,

    It wont be available for general use in any bitcoind code for a while yet. @MarcoFalke is your intention to start using <filesystem> just in the fuzzing related code? We might have to cherry-pick a change out of #20744.

  8. MarcoFalke commented at 5:25 am on March 22, 2021: member
    This will be build in the regular “non-fuzz-engine” target, so it can’t yet use std::fs (my bad).
  9. in src/test/fuzz/fuzz.cpp:61 in c8ee766ed7 outdated
    56@@ -56,6 +57,23 @@ static bool read_stdin(std::vector<uint8_t>& data)
    57 }
    58 #endif
    59 
    60+#if defined(PROVIDE_FUZZ_MAIN_FUNCTION)
    61+static bool read_file(std::filesystem::path p, std::vector<uint8_t>& data)
    


    MarcoFalke commented at 5:30 am on March 22, 2021:
    0static bool read_file(fs::path p, std::vector<uint8_t>& data)
    

    The wrapper should work fine here?


    unknown commented at 9:23 pm on April 19, 2021:
    Yup, that worked, thanks!
  10. practicalswift commented at 4:14 pm on March 22, 2021: contributor

    Concept ACK

    It would be nice if multiple files or directories could be specified in order to mimic the libFuzzer command-line interface.

    I’m thinking something like this:

     0$ FUZZ=harness src/test/fuzz/fuzz
     1# Would use test case provided via stdin.
     2
     3$ FUZZ=harness src/test/fuzz/fuzz dir1
     4# Would use test cases from `dir1/` assuming `dir1` is a directory.
     5
     6$ FUZZ=harness src/test/fuzz/fuzz file1
     7# Would use test case `file1` assuming `file1` is a file.
     8
     9$ FUZZ=harness src/test/fuzz/fuzz dir1 dir2 file1
    10# Would use test cases from `dir1/`, `dir2/` and also the test case `file1`.
    

    This would allow for picking inputs via shell expanded wildcards.

  11. ghost commented at 4:44 pm on March 22, 2021: none

    Great feedback so far, thanks all!

    I’ll swap out filesystem for boost, handle consecutive dir/file args, look into whats going on with test/fuzz/test_runner.py, and some of the other points brought up.

  12. MarcoFalke commented at 11:19 am on March 28, 2021: member
    I think the test_runner changes can be done in a follow-up, since they are not required to add the feature. Also, they will raise the question how test_runner should deal with different fuzz engines in general.
  13. DrahtBot commented at 11:25 am on March 28, 2021: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #21936 (fuzz: Terminate immediately if a fuzzing harness tries to create a TCP socket (belt and suspenders) by practicalswift)
    • #21795 (fuzz: Terminate immediately if a fuzzing harness tries to perform a DNS lookup (belt and suspenders) by practicalswift)
    • #21538 (fuzz: Add fuzzing syscall sandbox: detect use of unexpected syscalls when fuzzing (“syscall sanitizer”) by practicalswift)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  14. ghost commented at 5:05 pm on April 5, 2021: none
    Sorry about the delay on this. Updated the code and the PR description to include boost, multiple arguments that can be either files or directories (including wildcard support), as well as some asserts in the file readings. Ignored test_runner for now but can look into it on a follow up.
  15. in src/test/fuzz/fuzz.cpp:63 in ee7c173c3c outdated
    55@@ -56,6 +56,23 @@ static bool read_stdin(std::vector<uint8_t>& data)
    56 }
    57 #endif
    58 
    59+#if defined(PROVIDE_FUZZ_MAIN_FUNCTION)
    60+static bool read_file(fs::path p, std::vector<uint8_t>& data)
    61+{
    62+    uint8_t buffer[1024];
    63+    FILE *f = fsbridge::fopen(p.string(), "rb");
    


    practicalswift commented at 9:38 am on April 6, 2021:
    Nit: Could clang-format -i src/test/fuzz/fuzz.cpp to get expected formatting? :)

    unknown commented at 9:23 pm on April 19, 2021:
    Thanks! Wasn’t aware of clang-format yet!
  16. in src/test/fuzz/fuzz.cpp:68 in 24b10b4843 outdated
    63+    FILE* f = fsbridge::fopen(p.string(), "rb");
    64+    if (f == nullptr)
    65+        return false;
    66+    do {
    67+        const size_t length = fread(buffer, sizeof(uint8_t), sizeof(buffer), f);
    68+        Assert(!ferror(f));
    


    ajtowns commented at 9:23 pm on April 19, 2021:
    if (ferror(f)) return false; ? (Same behaviour as fopen failure)

    unknown commented at 10:04 pm on April 22, 2021:
    Good point, the assert is done in the parent method. Changed.
  17. in src/test/fuzz/fuzz.cpp:142 in 24b10b4843 outdated
    123+    for (int i = 1; i < argc; ++i) {
    124+        fs::path seed_path(*(argv + i));
    125+        if (fs::is_directory(seed_path)) {
    126+            for (fs::directory_iterator it(seed_path); it != fs::directory_iterator(); ++it) {
    127+                if (!fs::is_regular_file(it->path())) continue;
    128+                Assert(read_file(it->path(), buffer));
    


    ajtowns commented at 9:26 pm on April 19, 2021:

    if (!is_regular_file(..)) { std::cerr << "Not a file: " << it->path() << endl; abort(); } ? if (!read_file(..)) { std::cerr << "Error reading file: " << it->path() << endl; abort(); } ?

    Seems like it would be helpful to know what file is causing a problem. Not sure that skipping non-regular files rather than erroring on them makes sense?


    ajtowns commented at 2:04 pm on April 20, 2021:

    Actually, would be helpful to know what file is causing a problem if the fuzz testing fails too. One way to do that would be to use a signal handler to catch SIGABRT, and pass the filename via a global, something like:

     0#if defined(PROVIDE_FUZZ_MAIN_FUNCTION) && !defined(__AFL_LOOP)
     1fs::path g_seed_path;
     2void signal_handler(int signal) 
     3{
     4    if (signal == SIGABRT) {
     5        std::cerr << "Error processing seed " << g_seed_path << std::endl;
     6    } else {
     7        std::cerr << "Unexpected signal " << signal << " received\n";
     8    }
     9    std::_Exit(EXIT_FAILURE);
    10}
    11#endif
    12...
    13    auto previous_handler = std::signal(SIGABRT, signal_handler);
    14...
    15    g_seed_path = seed_path;
    16    Assert(read_file(seed_path, buffer));
    17    test_one_input(buffer);
    

    (EDIT: maybe these are better for a followup PR, if thinking about this stuff would delaying this one too much)


    unknown commented at 10:05 pm on April 22, 2021:
    Great suggestion, otherwise how is one to know if there’s an issue. Added the suggested code!
  18. in src/test/fuzz/fuzz.cpp:146 in 24b10b4843 outdated
    126+            for (fs::directory_iterator it(seed_path); it != fs::directory_iterator(); ++it) {
    127+                if (!fs::is_regular_file(it->path())) continue;
    128+                Assert(read_file(it->path(), buffer));
    129+                test_one_input(buffer);
    130+                buffer.clear();
    131+            }
    


    ajtowns commented at 10:00 pm on April 19, 2021:
    Is it worth reporting the number of tests run when all the inputs succeed? It’s a bit unclear that anything happened otherwise.

    unknown commented at 10:05 pm on April 22, 2021:
    Added a counter
  19. ajtowns commented at 10:12 pm on April 19, 2021: member
    Seems to work fine for me; some comments below.
  20. in src/test/fuzz/fuzz.cpp:64 in 24b10b4843 outdated
    55@@ -56,6 +56,23 @@ static bool read_stdin(std::vector<uint8_t>& data)
    56 }
    57 #endif
    58 
    59+#if defined(PROVIDE_FUZZ_MAIN_FUNCTION)
    60+static bool read_file(fs::path p, std::vector<uint8_t>& data)
    61+{
    62+    uint8_t buffer[1024];
    63+    FILE* f = fsbridge::fopen(p.string(), "rb");
    64+    if (f == nullptr)
    


    MarcoFalke commented at 6:57 pm on April 20, 2021:
    0    if (f == nullptr) {
    

    According to style-guide


    unknown commented at 10:06 pm on April 22, 2021:

    Good catch, I actually went with

    0if (f == nullptr) return false;
    

    if that’s alright instead, to be consistent with one liners.

  21. fuzz: execute each file in dir without fuzz engine 3e3bf7ac6f
  22. DrahtBot added the label Needs rebase on May 21, 2021
  23. DrahtBot commented at 8:53 am on May 21, 2021: member

    🐙 This pull request conflicts with the target branch and needs rebase.

    Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a “draft”.

  24. adamjonas commented at 11:19 pm on August 5, 2021: member
    @AnthonyRonning friendly ping to rebase.
  25. MarcoFalke added the label Up for grabs on Aug 18, 2021
  26. MarcoFalke commented at 5:12 pm on August 18, 2021: member
    Marked “up for grabs” for now.
  27. fanquake closed this on Aug 18, 2021

  28. MarcoFalke removed the label Up for grabs on Oct 20, 2021
  29. MarcoFalke removed the label Needs rebase on Oct 20, 2021
  30. MarcoFalke referenced this in commit bf2c0fb2a2 on Mar 17, 2022
  31. sidhujag referenced this in commit 16b7f5f572 on Mar 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-04-02 00:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me