fuzzing: Write a fuzzer for structured data (e.g. libprotobuf-mutator) #17657

MarcoFalke opened this issue on December 3, 2019
  1. MarcoFalke commented at 6:20 PM on December 3, 2019: member

    Messages in Bitcoin are structured, so a fuzzer that operates on structured data might be more efficient in practice than our current "blind" fuzzers. https://github.com/google/libprotobuf-mutator looks like a good place to start. The goal of this issue is to write one (or more) fuzzers that are based on structured input data. For example, a transaction or PSBT could be expressed in a structured way and fed into the existing tx or psbt fuzz paths.

    Useful skills: Background in fuzzing and structured data formats

    The purpose of the good first issue label is to highlight which issues are suitable for a new contributor without a deep understanding of the codebase.

    Want to work on this issue?

    You do not need to request permission to start working on this. You are encouraged to comment on the issue if you are planning to work on it. This will help other contributors monitor which issues are actively being addressed and is also an effective way to request assistance if and when you need it.

    For guidance on contributing, please read CONTRIBUTING.md before opening your pull request.

  2. MarcoFalke added the label good first issue on Dec 3, 2019
  3. MarcoFalke added the label Tests on Dec 3, 2019
  4. brakmic commented at 5:17 PM on December 7, 2019: contributor

    Hi,

    I've implemented a very basic structure for "structured fuzzing".

    After unsuccessfully trying to integrate this variant with the existing fuzzers in src/test/fuzz, I moved the code into the sub-dir src/test/fuzz/structured.

    However, the new code is still based on the existing logic, for example the transaction source. The difference is that it also uses the additional libFuzzer APIs LLVMFuzzerMutate and LLVMFuzzerCustomMutator, which execute logic that comes from new mutator classes.

    Currently, there's only a very basic mutator class available, which I have modelled after the original one from libprotobuf-mutator. Although very sophisticated, the code in libprotobuf-mutator is also very complex (at least for me), so I avoided mindless copy/pasting.

    At this stage, I think it's better to start really small and introduce only as much as I can understand (I have never worked with fuzzing before... in fact, I've known about it for less than 24 hours).

    If I am not totally mistaken, a specialized mutator class should be able to modify Bitcoin's messages by changing their properties and not only raw "byte vectors". For example, a mutator should be able to take a transaction and modify it in some way to check if anything problematic will happen.
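
    A minimal sketch of that idea, using scripts as the message type: the raw bytes are split into script elements (opcode plus pushed data), one element-level change is applied, and the result is written back through libFuzzer's LLVMFuzzerCustomMutator hook. The names Element, SplitScript and MutateElements are illustrative and not taken from the branch; only the push encoding (direct pushes 0x01 to 0x4b, OP_PUSHDATA1/2/4 at 0x4c to 0x4e) and the hook signature are assumed from Bitcoin Script and libFuzzer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <utility>
#include <vector>

// One script element: opcode byte, optional length prefix and pushed data, kept verbatim.
using Element = std::vector<uint8_t>;

// Split raw script bytes into elements. Returns false if a push runs past the end.
bool SplitScript(const uint8_t* data, size_t size, std::vector<Element>& out)
{
    size_t pos = 0;
    while (pos < size) {
        const uint8_t op = data[pos];
        size_t header = 1;  // opcode byte plus any length-prefix bytes
        size_t payload = 0; // number of pushed bytes
        if (op >= 0x01 && op <= 0x4b) {
            payload = op; // direct push of `op` bytes
        } else if (op >= 0x4c && op <= 0x4e) { // OP_PUSHDATA1 / 2 / 4
            const size_t len_bytes = size_t{1} << (op - 0x4c);
            if (size - pos - 1 < len_bytes) return false;
            for (size_t i = 0; i < len_bytes; ++i) {
                payload |= size_t{data[pos + 1 + i]} << (8 * i); // little-endian length
            }
            header += len_bytes;
        }
        if (size - pos - header < payload) return false;
        out.emplace_back(data + pos, data + pos + header + payload);
        pos += header + payload;
    }
    return true;
}

// Apply one element-level mutation: delete, duplicate, or swap.
void MutateElements(std::vector<Element>& elems, std::mt19937& rng)
{
    if (elems.empty()) return;
    const size_t i = rng() % elems.size();
    const size_t j = rng() % elems.size();
    switch (rng() % 3) {
    case 0:
        elems.erase(elems.begin() + i);
        break;
    case 1: {
        Element copy = elems[i]; // copy first so the insert cannot alias
        elems.insert(elems.begin() + j, std::move(copy));
        break;
    }
    default:
        std::swap(elems[i], elems[j]);
        break;
    }
}

// libFuzzer hook: mutate `data` in place and return the new size (at most max_size).
extern "C" size_t LLVMFuzzerCustomMutator(uint8_t* data, size_t size, size_t max_size, unsigned int seed)
{
    std::vector<Element> elems;
    if (!SplitScript(data, size, elems)) return size; // unparsable: leave input unchanged
    std::mt19937 rng(seed);
    MutateElements(elems, rng);
    std::vector<uint8_t> result;
    for (const Element& e : elems) result.insert(result.end(), e.begin(), e.end());
    if (result.empty() || result.size() > max_size) return size; // keep the input as-is
    std::memcpy(data, result.data(), result.size());
    return result.size();
}
```

    Because every mutation moves whole elements, a script that parsed before the change still parses afterwards, which is the property a byte-level mutator cannot guarantee. A fuller version would probably fall back to LLVMFuzzerMutate for inputs that do not parse.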

    Example

    This is what the output looks like (of course, it doesn't go far enough, as the current fuzzing capabilities aren't specialized enough):

    ./src/test/fuzz/structured/transaction test/fuzz/qa-assets/fuzz_seed_corpus/transaction/
    INFO: Seed: 2167175973
    INFO: Loaded 1 modules   (1206386 inline 8-bit counters): 1206386 [0x10f6f1648, 0x10f817eba), 
    INFO: Loaded 1 PC tables (1206386 PCs): 1206386 [0x10f817ec0,0x110a805e0), 
    INFO:      295 files found in test/fuzz/qa-assets/fuzz_seed_corpus/transaction/
    INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 52575 bytes
    INFO: seed corpus: files: 295 min: 1b max: 52575b total: 491499b rss: 76Mb
    #128    pulse  cov: 4823 ft: 12238 corp: 107/3309b exec/s: 64 rss: 156Mb
    #256    pulse  cov: 5691 ft: 23700 corp: 211/46Kb exec/s: 85 rss: 161Mb
    #296    INITED cov: 5695 ft: 25566 corp: 243/391Kb exec/s: 74 rss: 173Mb
    #512    pulse  cov: 5695 ft: 25566 corp: 243/391Kb lim: 52575 exec/s: 128 rss: 173Mb
    #1024   pulse  cov: 5695 ft: 25566 corp: 243/391Kb lim: 52575 exec/s: 256 rss: 174Mb
    #2048   pulse  cov: 5695 ft: 25566 corp: 243/391Kb lim: 52575 exec/s: 409 rss: 176Mb
    #4096   pulse  cov: 5695 ft: 25566 corp: 243/391Kb lim: 52575 exec/s: 682 rss: 179Mb
    #8192   pulse  cov: 5695 ft: 25566 corp: 243/391Kb lim: 52575 exec/s: 1024 rss: 186Mb
    #16384  pulse  cov: 5695 ft: 25566 corp: 243/391Kb lim: 52575 exec/s: 1365 rss: 199Mb
    

    Learning Resources

    If there's someone else also interested in working with structured fuzzing I'd recommend these videos and texts:

    Or, if there's someone else with more experience, please, grab my code, adapt it and share your changes.

    Any help is very much appreciated! 👍


    A few words for people out there struggling with macOS. 😱

    I'm working on macOS Catalina, so maybe I should also put a few words on compiling the fuzzing capability with it:

    • Take care of having an LLVM/Clang environment that contains fuzzing libraries. The default one from Apple is not enough, so that you will have to install it with brew, if not already done.

    • When running ./configure, pass --disable-asm to avoid errors with some of Bitcoin Core's assembly code. There's an entry about it here, and it seems to be related to the sanitizers you have to compile in for fuzzing.

    • Take care of giving the correct path for clang and clang++, like CC=/path/to/clang CXX=/path/to/clang++

    • If you run into problems with "boost sleep" or some of boost's libraries can't be found, like boost.thread or boost.filesystem, add this to your configure:

    CXXFLAGS="-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk"
    

    Notice: I am using Catalina 10.15.1, so your SDK might be different and you should adapt the path accordingly.

    Here's my complete configure, just in case.

    ./configure --disable-ccache --enable-fuzz --with-sanitizers=fuzzer,address,undefined --with-boost CPPFLAGS="-I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include" CXXFLAGS="-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk" CC=/usr/local/opt/llvm/bin/clang CXX=/usr/local/opt/llvm/bin/clang++ LDFLAGS="-L/usr/local/lib/darwin/" --disable-asm
    

    Regards,

  5. practicalswift commented at 9:24 PM on December 7, 2019: contributor

    @brakmic

    I'm very glad to see that you are interested in adding more fuzzing harnesses to the project. Welcome!

    If you want to work on improving fuzzing coverage in Bitcoin Core there is a lot of low-hanging fruit in the form of currently non-fuzz-covered code that could be covered simply by adding small, simple and dumb fuzzing harnesses (without any dependency on libprotobuf-mutator or similar). See the simple fuzzers linked below for inspiration.

    Coverage-guided fuzzers like libFuzzer are surprisingly good these days, so I think you'll be surprised how deep also simple fuzzing harnesses can reach :)

    After adding a few fuzzers to the project you'll get a feel for the limits of simple fuzzing harnesses and you might notice cases where measurements indicate that a fuzzer gets stuck because of the lack of more sophisticated structure awareness. Then it might make sense to look at bringing in libprotobuf-mutator or similar, but my suggestion is to start with the simplest possible fuzzing techniques first and then add complexity only when required.

    Fuzzing harnesses should be as simple as possible, but not simpler :)
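
    As a rough illustration of how small such a harness can be, here is a sketch built directly on libFuzzer's LLVMFuzzerTestOneInput entry point. The target IsEvenLengthHex is a hypothetical stand-in, not a Bitcoin Core function; a real harness would call the function under test instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical target: accepts only even-length strings of hex digits.
bool IsEvenLengthHex(const std::string& s)
{
    if (s.size() % 2 != 0) return false;
    for (const char c : s) {
        const bool digit = c >= '0' && c <= '9';
        const bool lower = c >= 'a' && c <= 'f';
        const bool upper = c >= 'A' && c <= 'F';
        if (!digit && !lower && !upper) return false;
    }
    return true;
}

// libFuzzer entry point: called once for every generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
{
    const std::string input(reinterpret_cast<const char*>(data), size);
    // The result is discarded; crashes and sanitizer reports are the signal.
    (void)IsEvenLengthHex(input);
    return 0;
}
```

    The harness contains no knowledge of the input format at all; coverage feedback from the instrumented target is what steers libFuzzer towards inputs that pass the checks.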

    If you are interested in fuzzing Bitcoin Core, please consider reviewing any of the fuzzing PRs awaiting review:

    • #17050 – "tests: Add fuzzing harnesses for functions parsing scripts, numbers, JSON and HD keypaths (bip32)"
    • #17071 – "tests: Add fuzzing harness for CheckBlock(...) and other CBlock related functions"
    • #17093 – "tests: Add fuzzing harness for various CTx{In,Out} related functions"
    • #17109 – "tests: Add fuzzing harness for various functions consuming only integrals"
    • #17225 – "tests: Test serialisation as part of deserialisation fuzzing. Test round-trip equality where possible."
    • #17229 – "tests: Add fuzzing harnesses for various Base{32,58,64} and hex related functions"

    I would be glad to help if you run into any problems during your fuzzing journey :) Also, don't hesitate to ping me if you want any fuzzing PR reviewed :)

    Again: welcome! We need more fuzzing in Bitcoin Core! :)

  6. brakmic commented at 9:41 PM on December 7, 2019: contributor

    @practicalswift

    Many thanks for your support and the list of fuzzing PR's! Now I can work on something that's concrete. :)

    This also will produce proper feedback, so I can adapt the code accordingly.

    Regards,

  7. brakmic commented at 9:01 PM on December 12, 2019: contributor

    @practicalswift

    Meanwhile, I've created a small structure that should help build various structured fuzzers. It's nothing complex, just a single interface that all fuzzing classes must implement.

    #include <cstdint>
    #include <vector>

    // IDescriptor and PostProcessFunction are declared elsewhere (not shown here).
    class IMutator {
    public:
      // Virtual destructor so implementations can be deleted through this interface.
      virtual ~IMutator() = default;
      // Initialize random number generator.
      virtual void Seed(unsigned int value) = 0;
      // Default mutate function.
      // All Bitcoin messages are vectors of bytes that can be converted into
      // structures like Transactions, PSBTs, Scripts etc.
      virtual void Mutate(std::vector<uint8_t>& data) = 0;
      // Register callback for postprocessing of mutated messages.
      virtual void RegisterPostProcessor(const IDescriptor* descriptor, PostProcessFunction callback) = 0;
    };
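
    To show how a concrete class could plug into such an interface, here is a hedged sketch. IDescriptor and PostProcessFunction are hypothetical stand-ins, since their real definitions are not part of this comment, and BitFlipMutator is an illustrative name rather than a class from the branch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-ins for types the interface refers to.
struct IDescriptor {
    virtual ~IDescriptor() = default;
};
using PostProcessFunction = std::function<void(std::vector<uint8_t>&)>;

class IMutator {
public:
    virtual ~IMutator() = default;
    virtual void Seed(unsigned int value) = 0;
    virtual void Mutate(std::vector<uint8_t>& data) = 0;
    virtual void RegisterPostProcessor(const IDescriptor* descriptor, PostProcessFunction callback) = 0;
};

// Illustrative implementation: flips one random bit, then runs all post-processors.
class BitFlipMutator final : public IMutator {
public:
    void Seed(unsigned int value) override { m_rng.seed(value); }

    void Mutate(std::vector<uint8_t>& data) override
    {
        if (!data.empty()) {
            const size_t byte = m_rng() % data.size();
            data[byte] ^= static_cast<uint8_t>(1u << (m_rng() % 8));
        }
        // Post-processors can repair invariants (lengths, checksums) after mutation.
        for (const auto& entry : m_post) entry.second(data);
    }

    void RegisterPostProcessor(const IDescriptor* descriptor, PostProcessFunction callback) override
    {
        m_post.emplace_back(descriptor, std::move(callback));
    }

private:
    std::mt19937 m_rng;
    std::vector<std::pair<const IDescriptor*, PostProcessFunction>> m_post;
};
```

    In this sketch the descriptor is only stored next to its callback; a structure-aware implementation would presumably use it to decide which post-processor applies to which message type.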
    

    I have then taken the original script and transaction fuzzers and extended them with custom fuzzing APIs from libFuzzer. However, the transaction fuzzer is still very primitive, so only the script fuzzer should be considered for now.

    In the current implementation I'm letting this fuzzer create tons of (im)possible Bitcoin Scripts. These would look like this (every line is a separate script that will be fired against various functions):

    2 OP_PICK OP_BOOLOR OP_PUSHDATA2 12 OP_ADD 1 OP_SIZE OP_NOTIF OP_MAX OP_VERIF OP_TOALTSTACK OP_CHECKSEQUENCEVERIFY OP_NIP 
    OP_PICK OP_BOOLOR OP_PUSHDATA2 12 
    OP_PICK OP_BOOLOR OP_PUSHDATA2 12 OP_ADD 1 OP_SIZE OP_NOTIF OP_MAX OP_VERIF OP_TOALTSTACK OP_CHECKSEQUENCEVERIFY OP_NIP OP_SWAP 
    OP_BOOLOR OP_PUSHDATA2 12 
    12 1 OP_SIZE OP_NOTIF OP_MAX 
    12 OP_ADD 1 OP_SIZE OP_NOTIF OP_MAX OP_VERIF 
    12 OP_ADD 1 OP_SIZE 
    OP_ADD 1 
    1 OP_SIZE OP_NOTIF OP_MAX OP_VERIF OP_TOALTSTACK OP_CHECKSEQUENCEVERIFY OP_NIP OP_SWAP 
    OP_SIZE OP_NOTIF OP_MAX OP_VERIF OP_TOALTSTACK OP_CHECKSEQUENCEVERIFY 
    OP_NOP OP_MAX OP_VERIF OP_TOALTSTACK OP_CHECKSEQUENCEVERIFY 
    OP_NOP OP_VERIF OP_TOALTSTACK OP_CHECKSEQUENCEVERIFY OP_NIP 
    13 5 -1 OP_CHECKSIG OP_2DUP OP_WITHIN OP_IF OP_2DIV OP_ELSE 12 
    13 5 -1 OP_CHECKSIG OP_2DUP OP_WITHIN OP_IF OP_2DIV OP_ELSE 12 
    -1 OP_CHECKSIG OP_2DUP OP_WITHIN OP_IF OP_2DIV OP_ELSE 12 OP_ADD OP_NOP6 OP_XOR OP_TUCK OP_2DUP OP_VER OP_VERIF OP_FROMALTSTACK 
    -1 OP_CHECKSIG OP_2DUP OP_WITHIN OP_IF OP_2DIV OP_ELSE 12 OP_ADD OP_NOP6 OP_XOR OP_TUCK OP_2DUP OP_VER OP_VERIF OP_FROMALTSTACK 
    OP_CHECKSIG OP_2DUP OP_WITHIN OP_IF OP_2DIV OP_ELSE 12 OP_ADD OP_NOP6 OP_XOR OP_TUCK 
    OP_2DIV OP_ELSE 12 
    12 OP_ADD OP_NOP6 OP_XOR OP_TUCK
    

    I am not sure if this is useful at all, so maybe I should not try to introduce any additional complexities before letting others double check it. Maybe the whole interface-implementation stuff is already too much for this task, so any help in this case is very much appreciated.

    One more thing, however...

    During my experiments I encountered these UB-sanitizer warnings when starting the script fuzzer:

    INFO: Seed: 1708339462
    INFO: Loaded 1 modules   (1093525 inline 8-bit counters): 1093525 [0x1106584c8, 0x11076345d), 
    INFO: Loaded 1 PC tables (1093525 PCs): 1093525 [0x110763460,0x111812db0), 
    INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
    prevector.h:453:19: runtime error: reference binding to misaligned address 0x7ffee318f162 for type 'prevector<28, unsigned char, unsigned int, int>::size_type' (aka 'unsigned int'), which requires 4 byte alignment
    0x7ffee318f162: note: pointer points here
     00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00
                  ^ 
    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior prevector.h:453:19 in 
    /usr/local/opt/llvm/bin/../include/c++/v1/type_traits:3699:25: runtime error: reference binding to misaligned address 0x7ffee318f162 for type 'unsigned int', which requires 4 byte alignment
    0x7ffee318f162: note: pointer points here
     00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00
                  ^ 
    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/local/opt/llvm/bin/../include/c++/v1/type_traits:3699:25 in 
    /usr/local/opt/llvm/bin/../include/c++/v1/type_traits:2281:12: runtime error: reference binding to misaligned address 0x7ffee318f162 for type '_Up' (aka 'unsigned int'), which requires 4 byte alignment
    0x7ffee318f162: note: pointer points here
     00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00
                  ^ 
    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/local/opt/llvm/bin/../include/c++/v1/type_traits:2281:12 in 
    /usr/local/opt/llvm/bin/../include/c++/v1/type_traits:3699:13: runtime error: load of misaligned address 0x7ffee318f162 for type 'typename remove_reference<unsigned int &>::type' (aka 'unsigned int'), which requires 4 byte alignment
    0x7ffee318f162: note: pointer points here
     00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00
    

    First, I thought that it must have been because of my sloppy coding, but no matter what I did, the warnings remained.

    Then I started (de)activating functions from LLVMFuzzerTestOneInput in test/fuzz/structured/script.cpp one by one.

    And it seems that this call is the culprit, but I still can't explain why:

    (void)IsSolvable(signing_provider, script);
    

    This function was taken, like all the others, from the original test/fuzz/script.cpp.

    However, I am still convinced that it has something to do with my code.

    Regards,

  8. MarcoFalke commented at 9:04 PM on December 12, 2019: member

    During my experiments I encountered these UB-sanitizer warnings when starting the script fuzzer:

    I recommend activating all known suppressions:

    export LSAN_OPTIONS="suppressions=$(pwd)/test/sanitizer_suppressions/lsan"
    export TSAN_OPTIONS="suppressions=$(pwd)/test/sanitizer_suppressions/tsan"
    export UBSAN_OPTIONS="suppressions=$(pwd)/test/sanitizer_suppressions/ubsan:print_stacktrace=1:halt_on_error=1"
    
  9. practicalswift commented at 10:54 PM on December 12, 2019: contributor

    @brakmic

    The prevector alignment issue is known and fixed by PR #17708. Please consider reviewing that PR - it would be nice to have it solved :)

    Regarding the fuzzing-experiments branch: try to measure what results you get from the fuzzing harness in that +564 LOC branch in terms of coverage and then compare that to what you achieve using the simplest possible ~20 LOC fuzzer you can think of for the same target function. What were the results? Did the extra abstractions pay off?

  10. brakmic commented at 10:55 AM on December 13, 2019: contributor

    @practicalswift

    Many thanks for the hint regarding alignment issues. Now I don't have to make my code even more ugly ;)

    Here's the output of script fuzzers. The first one is with custom fuzzing function activated, the second one is exactly the same fuzzer but without the custom function. I just put

    #ifdef CUSTOM_FUZZER 
    ... 
    #endif
    

    around it.

    INFO: Seed: 1052026202
    INFO: Loaded 1 modules   (1093525 inline 8-bit counters): 1093525 [0x10ada24c8, 0x10aead45d), 
    INFO: Loaded 1 PC tables (1093525 PCs): 1093525 [0x10aead460,0x10bf5cdb0), 
    INFO:      440 files found in test/fuzz/qa-assets/fuzz_seed_corpus/script/
    INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
    INFO: seed corpus: files: 440 min: 1b max: 3948b total: 136723b rss: 72Mb
    #442    INITED cov: 6325 ft: 12399 corp: 320/91Kb exec/s: 221 rss: 89Mb
    #512    pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 256 rss: 89Mb
    #1024   pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 512 rss: 90Mb
    #2048   pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 682 rss: 91Mb
    #4096   pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1024 rss: 93Mb
    #8192   pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1170 rss: 97Mb
    #16384  pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1260 rss: 105Mb
    #32768  pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1260 rss: 123Mb
    #65536  pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1213 rss: 156Mb
    #131072 pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1310 rss: 223Mb
    #262144 pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1337 rss: 359Mb
    #524288 pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1383 rss: 543Mb
    #1048576 pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1286 rss: 544Mb
    #2097152 pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1304 rss: 544Mb
    #4194304 pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1326 rss: 544Mb
    

    It ran for some 30+ minutes before I stopped it, and this is the maximum coverage it was able to achieve. I am pretty sure that a more "intelligent" script-randomizing technique might have achieved a bit more, but for this I would need to find a way to construct more "realistic" scripts, that is, scripts which are "almost" correct. Right now, it's more or less creating batches of randomly selected opcodes.
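
    One possible way to get "almost" correct scripts is to start from a valid template and break exactly one thing. The sketch below uses the standard P2PKH layout (OP_DUP OP_HASH160 <20-byte hash> OP_EQUALVERIFY OP_CHECKSIG) and replaces a single opcode byte; MakeP2PKH and MakeAlmostP2PKH are illustrative names, and this is only one of many conceivable strategies, not what the branch does.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Build a standard P2PKH script around the given 20-byte hash.
std::vector<uint8_t> MakeP2PKH(const std::array<uint8_t, 20>& hash)
{
    std::vector<uint8_t> script{0x76, 0xa9, 0x14}; // OP_DUP, OP_HASH160, push 20 bytes
    script.insert(script.end(), hash.begin(), hash.end());
    script.push_back(0x88); // OP_EQUALVERIFY
    script.push_back(0xac); // OP_CHECKSIG
    return script;
}

// Replace exactly one opcode byte (never the pushed hash or its length) with a
// different byte, so the result differs from a valid script in one position.
std::vector<uint8_t> MakeAlmostP2PKH(const std::array<uint8_t, 20>& hash, std::mt19937& rng)
{
    std::vector<uint8_t> script = MakeP2PKH(hash);
    const std::array<size_t, 4> opcode_positions{0, 1, 23, 24};
    const size_t pos = opcode_positions[rng() % opcode_positions.size()];
    uint8_t replacement = static_cast<uint8_t>(rng());
    if (replacement == script[pos]) replacement ^= 0x01; // guarantee a change
    script[pos] = replacement;
    return script;
}
```

    Whether inputs like these reach code that random opcode batches do not is exactly the kind of question the coverage numbers have to answer.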

    And here the "normal" fuzzer output.

    INFO: Seed: 1849818327
    INFO: Loaded 1 modules   (1092492 inline 8-bit counters): 1092492 [0x10ccef048, 0x10cdf9bd4),
    INFO: Loaded 1 PC tables (1092492 PCs): 1092492 [0x10cdf9bd8,0x10dea5498), 
    INFO:      276 files found in test/fuzz/qa-assets/fuzz_seed_corpus/script/
    INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
    INFO: seed corpus: files: 276 min: 1b max: 3948b total: 56003b rss: 71Mb
    ...[snip]...
    #79124  REDUCE cov: 3502 ft: 8202 corp: 307/117Kb lim: 4096 exec/s: 277 rss: 542Mb L: 2244/3940 MS: 2 ChangeASCIIInt-EraseBytes-
    #80005  REDUCE cov: 3502 ft: 8202 corp: 307/117Kb lim: 4096 exec/s: 277 rss: 542Mb L: 918/3940 MS: 1 EraseBytes-
    #80168  REDUCE cov: 3502 ft: 8202 corp: 307/117Kb lim: 4096 exec/s: 278 rss: 542Mb L: 195/3940 MS: 3 InsertRepeatedBytes-InsertByte-EraseBytes-
    #80409  REDUCE cov: 3502 ft: 8202 corp: 307/117Kb lim: 4096 exec/s: 278 rss: 542Mb L: 2239/3940 MS: 1 EraseBytes-
    #81560  REDUCE cov: 3502 ft: 8202 corp: 307/116Kb lim: 4096 exec/s: 278 rss: 542Mb L: 2047/3940 MS: 1 EraseBytes-
    #81994  REDUCE cov: 3502 ft: 8202 corp: 307/116Kb lim: 4096 exec/s: 278 rss: 542Mb L: 1731/3940 MS: 4 ChangeBinInt-ChangeASCIIInt-ChangeBit-EraseBytes-
    #82232  REDUCE cov: 3502 ft: 8202 corp: 307/116Kb lim: 4096 exec/s: 278 rss: 542Mb L: 989/3940 MS: 3 InsertRepeatedBytes-CMP-EraseBytes- DE: "\x01\x00\x00\x00\x00\x00\x00\x10"-
    #83897  REDUCE cov: 3502 ft: 8202 corp: 307/115Kb lim: 4096 exec/s: 280 rss: 542Mb L: 930/3940 MS: 5 ChangeBit-ChangeByte-ShuffleBytes-ChangeBit-EraseBytes-
    #84769  REDUCE cov: 3502 ft: 8202 corp: 307/115Kb lim: 4096 exec/s: 280 rss: 542Mb L: 992/3940 MS: 2 EraseBytes-CMP- DE: "\xb7\x01"-
    #87330  REDUCE cov: 3502 ft: 8202 corp: 307/115Kb lim: 4096 exec/s: 282 rss: 542Mb L: 240/3940 MS: 1 EraseBytes-
    #87545  REDUCE cov: 3502 ft: 8202 corp: 307/115Kb lim: 4096 exec/s: 282 rss: 542Mb L: 890/3940 MS: 5 ChangeByte-ShuffleBytes-ChangeBit-InsertByte-EraseBytes-
    #89626  REDUCE cov: 3502 ft: 8202 corp: 307/114Kb lim: 4096 exec/s: 283 rss: 542Mb L: 135/3940 MS: 1 EraseBytes-
    #90207  REDUCE cov: 3502 ft: 8202 corp: 307/114Kb lim: 4096 exec/s: 283 rss: 542Mb L: 778/3940 MS: 1 EraseBytes-
    #91059  REDUCE cov: 3502 ft: 8202 corp: 307/114Kb lim: 4096 exec/s: 283 rss: 542Mb L: 3382/3923 MS: 2 ChangeASCIIInt-EraseBytes-
    #91787  NEW    cov: 3502 ft: 8205 corp: 308/114Kb lim: 4096 exec/s: 284 rss: 542Mb L: 25/3923 MS: 3 ChangeBit-PersAutoDict-EraseBytes- DE: "\x17\x04\x00\x00"-
    #94658  REDUCE cov: 3502 ft: 8205 corp: 308/114Kb lim: 4096 exec/s: 285 rss: 542Mb L: 942/3923 MS: 1 EraseBytes-
    #99369  REDUCE cov: 3502 ft: 8205 corp: 308/114Kb lim: 4096 exec/s: 286 rss: 542Mb L: 1655/3923 MS: 1 EraseBytes-
    #100062 REDUCE cov: 3502 ft: 8205 corp: 308/113Kb lim: 4096 exec/s: 286 rss: 542Mb L: 3586/3923 MS: 3 InsertByte-ChangeByte-EraseBytes-
    #100503 REDUCE cov: 3502 ft: 8205 corp: 308/113Kb lim: 4096 exec/s: 286 rss: 542Mb L: 60/3923 MS: 1 EraseBytes-
    #100894 REDUCE cov: 3502 ft: 8205 corp: 308/113Kb lim: 4096 exec/s: 286 rss: 542Mb L: 134/3923 MS: 1 EraseBytes-
    #101858 REDUCE cov: 3502 ft: 8205 corp: 308/113Kb lim: 4096 exec/s: 284 rss: 542Mb L: 450/3923 MS: 4 ChangeBit-ShuffleBytes-InsertByte-EraseBytes-
    ...[snip]...
    

    However, being inexperienced in fuzzing, I am not going to experiment "too much", because it's really easy to get lost in complexities when you're dealing with new things.

    Maybe you or others have better ideas how to build a proper structured fuzzer?

    Regards,

  11. practicalswift commented at 2:06 PM on December 13, 2019: contributor

    @brakmic

    To make it a proper shoot-out you'll need to make sure the two fuzzing sessions start with exactly the same seed input corpus and that the fuzzing binaries are given the same run-time.

    In what you posted above the initial corpus sizes differ:

    INFO: seed corpus: files: 440 min: 1b max: 3948b total: 136723b rss: 72Mb
    vs.
    INFO: seed corpus: files: 276 min: 1b max: 3948b total: 56003b rss: 71Mb
    

    Try doing the shoot-out by giving each fuzzing harness a fresh copy of qa-assets/fuzz_seed_corpus/script/. It should be a fresh copy without any saved findings. Avoid the mistake of sharing the same directory between the fuzzers: they should have a separate directory each since libFuzzer will write to these directories.

    Also make sure they are given the same runtime using -max_total_time.

    Can you repeat your experiment with these adjustments and post the full results to a GitHub gist? :) Make sure to include all initial INFO: lines and also the ending DONE line in the output.

    […] it's really easy to get lost in complexities when you're dealing with new things.

    A good point. A way to avoid that is to go super simple to start with and introduce abstractions/complexities gradually, only when evidence suggests they are needed.

    In this specific case: my suggestion is that you start with writing a few basic non-structured fuzzers and wait with introducing structured fuzzing until you have experimental results suggesting that such a move would allow you to reach code paths unreachable by simpler methods (or finding such code paths much quicker).

  12. brakmic commented at 3:07 PM on December 13, 2019: contributor

    @practicalswift

    Many thanks! Now I understand a few things better. I've executed the two tests. For both of them I cloned Bitcoin's qa-assets anew: git clone https://github.com/bitcoin-core/qa-assets

    I also compiled src/test/fuzz/structured/script.cpp with and without the custom function.

    I executed them with the same arguments:

    ./src/test/fuzz/structured/script -max_total_time=240 test/fuzz/qa-assets/fuzz_seed_corpus/script/
    

    The output of the non-custom script.cpp is:

    INFO: Seed: 3994611508
    INFO: Loaded 1 modules   (1092492 inline 8-bit counters): 1092492 [0x10b467048, 0x10b571bd4), 
    INFO: Loaded 1 PC tables (1092492 PCs): 1092492 [0x10b571bd8,0x10c61d498), 
    INFO:      284 files found in test/fuzz/qa-assets/fuzz_seed_corpus/script/
    INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
    INFO: seed corpus: files: 284 min: 1b max: 3948b total: 56352b rss: 71Mb
    #286    INITED cov: 6325 ft: 12060 corp: 240/38Kb exec/s: 286 rss: 82Mb
    #299    NEW    cov: 6325 ft: 12062 corp: 241/39Kb lim: 3913 exec/s: 299 rss: 82Mb L: 107/3913 MS: 3 InsertRepeatedBytes-ChangeBit-ChangeByte-
    #300    NEW    cov: 6325 ft: 12064 corp: 242/39Kb lim: 3913 exec/s: 300 rss: 82Mb L: 227/3913 MS: 1 ChangeByte-
    #301    NEW    cov: 6325 ft: 12065 corp: 243/39Kb lim: 3913 exec/s: 301 rss: 82Mb L: 23/3913 MS: 1 ChangeBit-
    #305    NEW    cov: 6325 ft: 12068 corp: 244/43Kb lim: 3913 exec/s: 305 rss: 83Mb L: 3913/3913 MS: 4 EraseBytes-CopyPart-ChangeBinInt-CrossOver-
    #315    NEW    cov: 6325 ft: 12083 corp: 245/43Kb lim: 3913 exec/s: 315 rss: 83Mb L: 154/3913 MS: 5 EraseBytes-InsertByte-ChangeByte-InsertByte-EraseBytes-
    #352    NEW    cov: 6325 ft: 12095 corp: 246/43Kb lim: 3913 exec/s: 352 rss: 83Mb L: 463/3913 MS: 2 InsertRepeatedBytes-InsertRepeatedBytes-
    #364    NEW    cov: 6325 ft: 12097 corp: 247/43Kb lim: 3913 exec/s: 364 rss: 83Mb L: 76/3913 MS: 2 ShuffleBytes-InsertRepeatedBytes-
    #367    NEW    cov: 6325 ft: 12100 corp: 248/43Kb lim: 3913 exec/s: 367 rss: 83Mb L: 20/3913 MS: 3 EraseBytes-CMP-CMP- DE: "\x01\x00\x00\x00\x00\x00\x00\x00"-"R\xe4\x00\x00 `\x00\x00"-
    [..snip..]
    #89194  REDUCE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 405 rss: 515Mb L: 317/3913 MS: 2 InsertByte-EraseBytes-
    #91850  REDUCE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 406 rss: 516Mb L: 460/3913 MS: 1 EraseBytes-
    #92497  REDUCE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 405 rss: 516Mb L: 1146/3913 MS: 2 ChangeBinInt-EraseBytes-
    #93653  REDUCE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 407 rss: 516Mb L: 223/3913 MS: 1 EraseBytes-
    #96209  REDUCE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 409 rss: 516Mb L: 103/3913 MS: 1 EraseBytes-
    #97245  REDUCE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 408 rss: 516Mb L: 201/3913 MS: 1 EraseBytes-
    #97902  REDUCE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 409 rss: 516Mb L: 209/3913 MS: 2 CMP-EraseBytes- DE: "\x00\x00\x00\x00\x00\x00\x00\x00"-
    #98507  DONE   cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 408 rss: 516Mb
    ###### Recommended dictionary. ######
    "\x01\x00\x00\x00\x00\x00\x00\x00" # Uses: 571
    "R\xe4\x00\x00 `\x00\x00" # Uses: 671
    "\x00\x00\x00\x00\x00\x00\x00R" # Uses: 621
    "\x1a\x00\x00\x00\x00\x00\x00\x00" # Uses: 630
    "\x83\xe0\x04\x00\xa0a\x00\x00" # Uses: 593
    "\xff\xff~\xfe\xe86\xec\xc0" # Uses: 647
    "\x13\x00\x00\x00" # Uses: 596
    "\x01\x00\x00\x00\x00\x00\x02\x08" # Uses: 577
    "\x10\x00\x00\x00\x00\x00\x00\x00" # Uses: 629
    "\xb6\x02\x00\x00" # Uses: 574
    "\x01\x00\x0a\xfa" # Uses: 567
    "\xbb\x01\x00\x00" # Uses: 502
    "\xff\x96" # Uses: 463
    "\x1c\x00" # Uses: 387
    "\xff\x00\x00\x00" # Uses: 269
    "\x01\x00\x00\xaf" # Uses: 200
    "\x00\xad" # Uses: 94
    "\x00\x00\x00\x00\x00\x00\x00\x00" # Uses: 2
    ###### End of recommended dictionary. ######
    Done 98507 runs in 241 second(s)
    

    And the output of customized script.cpp is:

    INFO: Seed: 160415445
    INFO: Loaded 1 modules   (1093525 inline 8-bit counters): 1093525 [0x104e224c8, 0x104f2d45d), 
    INFO: Loaded 1 PC tables (1093525 PCs): 1093525 [0x104f2d460,0x105fdcdb0), 
    INFO:      284 files found in test/fuzz/qa-assets/fuzz_seed_corpus/script/
    INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
    INFO: seed corpus: files: 284 min: 1b max: 3948b total: 56352b rss: 71Mb
    #286    INITED cov: 6325 ft: 12059 corp: 241/38Kb exec/s: 286 rss: 82Mb
    #361    NEW    cov: 6325 ft: 12060 corp: 242/38Kb lim: 4096 exec/s: 361 rss: 82Mb L: 13/3913 MS: 5 Custom-Custom-Custom-Custom-Custom-
    #437    NEW    cov: 6325 ft: 12061 corp: 243/38Kb lim: 4096 exec/s: 437 rss: 83Mb L: 10/3913 MS: 1 Custom-
    #1024   pulse  cov: 6325 ft: 12061 corp: 243/38Kb lim: 4096 exec/s: 512 rss: 83Mb
    #1084   REDUCE cov: 6325 ft: 12061 corp: 243/38Kb lim: 4096 exec/s: 542 rss: 83Mb L: 4/3913 MS: 2 Custom-Custom-
    #2048   pulse  cov: 6325 ft: 12061 corp: 243/38Kb lim: 4096 exec/s: 682 rss: 89Mb
    #2436   NEW    cov: 6325 ft: 12064 corp: 244/38Kb lim: 4096 exec/s: 812 rss: 89Mb L: 11/3913 MS: 3 Custom-Custom-Custom-
    #3514   NEW    cov: 6325 ft: 12067 corp: 245/38Kb lim: 4096 exec/s: 878 rss: 90Mb L: 13/3913 MS: 3 Custom-Custom-Custom-
    #4096   pulse  cov: 6325 ft: 12067 corp: 245/38Kb lim: 4096 exec/s: 819 rss: 90Mb
    #5590   NEW    cov: 6325 ft: 12071 corp: 246/38Kb lim: 4096 exec/s: 931 rss: 92Mb L: 17/3913 MS: 1 Custom-
    #8192   pulse  cov: 6325 ft: 12071 corp: 246/38Kb lim: 4096 exec/s: 910 rss: 94Mb
    #16384  pulse  cov: 6325 ft: 12071 corp: 246/38Kb lim: 4096 exec/s: 1024 rss: 101Mb
    #17775  NEW    cov: 6325 ft: 12075 corp: 247/38Kb lim: 4096 exec/s: 1045 rss: 102Mb L: 16/3913 MS: 5 Custom-Custom-Custom-Custom-Custom-
    #30092  NEW    cov: 6325 ft: 12079 corp: 248/38Kb lim: 4096 exec/s: 1037 rss: 112Mb L: 17/3913 MS: 2 Custom-Custom-
    #32768  pulse  cov: 6325 ft: 12079 corp: 248/38Kb lim: 4096 exec/s: 1024 rss: 114Mb
    #65536  pulse  cov: 6325 ft: 12079 corp: 248/38Kb lim: 4096 exec/s: 1110 rss: 140Mb
    #103081 NEW    cov: 6325 ft: 12081 corp: 249/38Kb lim: 4096 exec/s: 1108 rss: 170Mb L: 18/3913 MS: 4 Custom-Custom-Custom-Custom-
    #103610 NEW    cov: 6325 ft: 12083 corp: 250/38Kb lim: 4096 exec/s: 1114 rss: 170Mb L: 14/3913 MS: 4 Custom-Custom-Custom-Custom-
    #131072 pulse  cov: 6325 ft: 12083 corp: 250/38Kb lim: 4096 exec/s: 1139 rss: 192Mb
    #262144 pulse  cov: 6325 ft: 12083 corp: 250/38Kb lim: 4096 exec/s: 1202 rss: 294Mb
    #291145 DONE   cov: 6325 ft: 12083 corp: 250/38Kb lim: 4096 exec/s: 1208 rss: 317Mb
    Done 291145 runs in 241 second(s)
    

    The results are the same, so I need to figure out how to manipulate the Op-Codes to increase the coverage. Or maybe there is no way to increase it with the current logic? Maybe playing around with Op-Codes is a dead end?

  13. practicalswift commented at 4:15 PM on December 13, 2019: contributor

    @brakmic

    Thanks for sharing your results.

    #98507  DONE   cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 408 rss: 516Mb
    vs
    #291145 DONE   cov: 6325 ft: 12083 corp: 250/38Kb lim: 4096 exec/s: 1208 rss: 317Mb
    

    The results are not literally the same, actually: judging only from the numbers, the simple original version is slightly better than the more complex custom version. While they reach the same number of basic blocks or edges (the cov number), the original version has a higher "feature" count (the ft number). libFuzzer uses different signals to evaluate the code coverage: edge coverage, edge counters, value profiles, indirect caller/callee pairs, etc. These signals combined are called features. I don't think the difference between the two ft numbers is of any major significance in this case though: just making a point of not forgetting to look at the ft number too :)
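
    To make the cov/ft comparison above concrete, here is a minimal sketch that pulls both counters out of a libFuzzer status line. The names FuzzStats, FieldAfter and ParseStats are hypothetical helpers made up for this illustration; they are not part of libFuzzer or Bitcoin Core, and the parsing simply assumes the whitespace-separated "key: value" layout seen in the logs quoted in this thread.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Counters taken from one libFuzzer status line (hypothetical helper type).
struct FuzzStats {
    long cov; // covered edges/basic blocks, the "cov:" field
    long ft;  // combined coverage signals ("features"), the "ft:" field
};

// Return the integer token that directly follows `key` (e.g. "cov:") in
// `line`, or std::nullopt if the key is missing or not followed by a number.
std::optional<long> FieldAfter(const std::string& line, const std::string& key)
{
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        if (token == key) {
            long value;
            if (in >> value) return value;
            return std::nullopt;
        }
    }
    return std::nullopt;
}

// Parse a status line such as
//   "#98507 DONE cov: 6325 ft: 12389 corp: 309/92Kb ..."
// Lines without both fields (e.g. "Done 291145 runs ...") yield std::nullopt.
std::optional<FuzzStats> ParseStats(const std::string& line)
{
    const auto cov = FieldAfter(line, "cov:");
    const auto ft = FieldAfter(line, "ft:");
    if (!cov || !ft) return std::nullopt;
    return FuzzStats{*cov, *ft};
}
```

    Feeding the two DONE lines quoted above into ParseStats gives equal cov values (6325 each) and ft values of 12389 and 12083, so the two runs differ only in the feature count.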

    The results are the same, so I need to figure out how to manipulate the Op-Codes to increase the coverage.

    You're making the assumption that better coverage can be reached by changing the fuzzing harness and/or the fuzzing technique. That is not necessarily the case :)

    Have you looked at what lines of code in the file you are actually hitting when fuzzing with the simple existing fuzzer versus what you would like to hit? If not, that would be a good place to start -- that will tell you if there is anything to "fix" :)

  14. brakmic commented at 9:20 AM on December 14, 2019: contributor

    Have you looked at what lines of code in the file you are actually hitting when fuzzing with the simple existing fuzzer versus what you would like to hit? If not, that would be a good place to start -- that will tell you if there is anything to "fix" :)

    No, I didn't. Many thanks for the hint. Maybe the custom script fuzzer would make more sense in other environments, for example the wallet. The wallet code deals with scripts too, and I could take the wallet test environment from the unit tests to create a similar one for the fuzzer.

    However, there's also a risk that my custom script fuzzing simply becomes a solution in search of a problem: time-consuming without bringing any significant improvement.

  15. MarcoFalke removed the label good first issue on Dec 15, 2019
  16. MarcoFalke referenced this in commit 3b5b276734 on Jan 29, 2020
  17. michaelfolkson commented at 12:08 PM on November 11, 2020: contributor

    Thanks for the guidance on macOS troubleshooting and for collecting fuzzing resources, @brakmic. Very helpful. #17657 (comment)

    Added to this StackExchange post on fuzzing.

  18. MarcoFalke closed this on Mar 8, 2021

  19. DrahtBot locked this on Aug 18, 2022
