fuzzing: Write a fuzzer for structured data (e.g. libprotobuf-mutator) #17657

issue MarcoFalke opened this issue on December 3, 2019
  1. MarcoFalke commented at 6:20 pm on December 3, 2019: member

    Messages in Bitcoin are structured, so a fuzzer that operates on structured data might be more efficient in practice than our current “blind” fuzzers. https://github.com/google/libprotobuf-mutator looks like a good place to start. The goal of this issue is to write one (or more) fuzzers that are based on structured input data. For example, a transaction or PSBT could be expressed in a structured way and fed into the existing tx or psbt fuzz paths.

    Useful skills: Background in fuzzing and structured data formats

    The purpose of the good first issue label is to highlight which issues are suitable for a new contributor without a deep understanding of the codebase.

    Want to work on this issue?

    You do not need to request permission to start working on this. You are encouraged to comment on the issue if you are planning to work on it. This will help other contributors monitor which issues are actively being addressed and is also an effective way to request assistance if and when you need it.

    For guidance on contributing, please read CONTRIBUTING.md before opening your pull request.

  2. MarcoFalke added the label good first issue on Dec 3, 2019
  3. MarcoFalke added the label Tests on Dec 3, 2019
  4. brakmic commented at 5:17 pm on December 7, 2019: contributor

    Hi,

    I’ve implemented a very basic structure for “structured fuzzing”.

    After unsuccessfully trying to integrate this variant with the existing fuzzers in src/test/fuzz, I moved the code into the subdirectory src/test/fuzz/structured.

    However, the new code is still based on the existing logic, for example the transaction source. The difference is that it also uses the additional libFuzzer APIs LLVMFuzzerMutate and LLVMFuzzerCustomMutator, which execute logic that comes from new mutator classes.

    Currently, there’s only a very basic mutator class available, which I have modelled after the original one from libprotobuf-mutator. Although very sophisticated, the code in libprotobuf-mutator is also very complex (at least for me), so I avoided mindless copy/pasting.

    At this stage, I think it’s better to start really small and introduce only as much as I can understand (I have never worked with fuzzing before; in fact, I have known about it for less than 24 hours).

    If I am not totally mistaken, a specialized mutator class should be able to modify Bitcoin’s messages by changing their properties and not only raw “byte vectors”. For example, a mutator should be able to take a transaction and modify it in some way to check if anything problematic will happen.
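
    As a rough illustration of property-level mutation, here is a minimal sketch. ToyTx, ToyTxOut and MutateToyTx are hypothetical names invented for this example; they are not Bitcoin Core's CTransaction or anything from libprotobuf-mutator. The point is only that a mutation applied to the structure always yields a well-formed object, which a random byte flip in the serialization does not guarantee.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical, heavily simplified transaction; NOT Bitcoin Core's CTransaction.
struct ToyTxOut {
    int64_t value;
    std::vector<uint8_t> script;
};

struct ToyTx {
    int32_t version;
    uint32_t lock_time;
    std::vector<ToyTxOut> outputs;
};

// Mutate one randomly chosen property of the transaction in place.
void MutateToyTx(ToyTx& tx, std::mt19937& rng)
{
    switch (rng() % 4) {
    case 0: // Replace the version field with a random value.
        tx.version = static_cast<int32_t>(static_cast<uint32_t>(rng()));
        break;
    case 1: // Move the lock time to one of its boundary values.
        tx.lock_time = (rng() % 2 == 0) ? 0u : 0xFFFFFFFFu;
        break;
    case 2: // Append an empty output.
        tx.outputs.push_back(ToyTxOut{0, {}});
        break;
    default: // Drop the last output, if there is one.
        if (!tx.outputs.empty()) tx.outputs.pop_back();
        break;
    }
}
```

    A real mutator would then serialize the mutated object and hand the bytes to the existing fuzz target.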

    Example

    This is what the output looks like (of course, it doesn’t go far enough, as the current fuzzing capabilities aren’t specialized enough):

    ./src/test/fuzz/structured/transaction test/fuzz/qa-assets/fuzz_seed_corpus/transaction/
    INFO: Seed: 2167175973
    INFO: Loaded 1 modules   (1206386 inline 8-bit counters): 1206386 [0x10f6f1648, 0x10f817eba),
    INFO: Loaded 1 PC tables (1206386 PCs): 1206386 [0x10f817ec0,0x110a805e0),
    INFO:      295 files found in test/fuzz/qa-assets/fuzz_seed_corpus/transaction/
    INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 52575 bytes
    INFO: seed corpus: files: 295 min: 1b max: 52575b total: 491499b rss: 76Mb
    #128    pulse  cov: 4823 ft: 12238 corp: 107/3309b exec/s: 64 rss: 156Mb
    #256    pulse  cov: 5691 ft: 23700 corp: 211/46Kb exec/s: 85 rss: 161Mb
    #296    INITED cov: 5695 ft: 25566 corp: 243/391Kb exec/s: 74 rss: 173Mb
    #512    pulse  cov: 5695 ft: 25566 corp: 243/391Kb lim: 52575 exec/s: 128 rss: 173Mb
    #1024   pulse  cov: 5695 ft: 25566 corp: 243/391Kb lim: 52575 exec/s: 256 rss: 174Mb
    #2048   pulse  cov: 5695 ft: 25566 corp: 243/391Kb lim: 52575 exec/s: 409 rss: 176Mb
    #4096   pulse  cov: 5695 ft: 25566 corp: 243/391Kb lim: 52575 exec/s: 682 rss: 179Mb
    #8192   pulse  cov: 5695 ft: 25566 corp: 243/391Kb lim: 52575 exec/s: 1024 rss: 186Mb
    #16384  pulse  cov: 5695 ft: 25566 corp: 243/391Kb lim: 52575 exec/s: 1365 rss: 199Mb
    

    Learning Resources

    If there’s someone else also interested in working with structured fuzzing I’d recommend these videos and texts:

    Or, if there’s someone else with more experience, please, grab my code, adapt it and share your changes.

    Any help is very much appreciated! 👍


    A few words for people out there struggling with macOS. 😱

    I’m working on macOS Catalina, so maybe I should also put a few words on compiling the fuzzing capability with it:

    • Make sure you have an LLVM/Clang environment that contains the fuzzing libraries. The default one from Apple is not enough, so you will have to install it with brew if you have not already done so.

    • When executing ./configure you should pass --disable-asm to avoid errors with certain assembly code in Bitcoin Core. There’s an entry about it here and it seems to have something to do with the sanitizers you have to compile for fuzzing.

    • Take care of giving the correct path for clang and clang++, like CC=/path/to/clang CXX=/path/to/clang++

    • If you run into problems with “boost sleep” or some of boost’s libraries can’t be found, like boost.thread or boost.filesystem, add this to your configure:

    CXXFLAGS="-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk"
    

    Notice: I am using Catalina 10.15.1, so your SDK might be different and you should adapt the path accordingly.

    Here’s my complete configure, just in case.

    ./configure --disable-ccache --enable-fuzz --with-sanitizers=fuzzer,address,undefined --with-boost CPPFLAGS="-I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include" CXXFLAGS="-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk" CC=/usr/local/opt/llvm/bin/clang CXX=/usr/local/opt/llvm/bin/clang++ LDFLAGS="-L/usr/local/lib/darwin/" --disable-asm
    

    Regards,

  5. practicalswift commented at 9:24 pm on December 7, 2019: contributor

    @brakmic

    I’m very glad to see that you are interested in adding more fuzzing harnesses to the project. Welcome!

    If you want to work on improving fuzzing coverage in Bitcoin Core there is a lot of low-hanging fruit in the form of currently non-fuzz-covered code that could be covered simply by adding small, simple and dumb fuzzing harnesses (without any dependency on libprotobuf-mutator or similar). See the simple fuzzers linked below for inspiration.

    Coverage-guided fuzzers like libFuzzer are surprisingly good these days, so I think you’ll be surprised how deep even simple fuzzing harnesses can reach :)

    After adding a few fuzzers to the project you’ll get a feel for the limits of simple fuzzing harnesses and you might notice cases where measurements indicate that a fuzzer gets stuck because of the lack of more sophisticated structure awareness. Then it might make sense to look at bringing in libprotobuf-mutator or similar, but my suggestion is to start with the simplest possible fuzzing techniques first and then add complexity only when required.

    Fuzzing harnesses should be as simple as possible, but not simpler :)

    If you are interested in fuzzing Bitcoin Core, please consider reviewing any of the fuzzing PRs awaiting review:

    • #17050 – “tests: Add fuzzing harnesses for functions parsing scripts, numbers, JSON and HD keypaths (bip32)”
    • #17071 – “tests: Add fuzzing harness for CheckBlock(…) and other CBlock related functions”
    • #17093 – “tests: Add fuzzing harness for various CTx{In,Out} related functions”
    • #17109 – “tests: Add fuzzing harness for various functions consuming only integrals”
    • #17225 – “tests: Test serialisation as part of deserialisation fuzzing. Test round-trip equality where possible.”
    • #17229 – “tests: Add fuzzing harnesses for various Base{32,58,64} and hex related functions”

    I would be glad to help if you run into any problems during your fuzzing journey :) Also, don’t hesitate to ping me if you want any fuzzing PR reviewed :)

    Again: welcome! We need more fuzzing in Bitcoin Core! :)

  6. brakmic commented at 9:41 pm on December 7, 2019: contributor

    @practicalswift

    Many thanks for your support and the list of fuzzing PR’s! Now I can work on something that’s concrete. :)

    This also will produce proper feedback, so I can adapt the code accordingly.

    Regards,

  7. brakmic commented at 9:01 pm on December 12, 2019: contributor

    @practicalswift

    Meanwhile, I’ve created a small structure that should help build various structured fuzzers. It’s nothing complex, just a single interface that all fuzzing classes must implement.

    class IMutator {
    public:
      // Initialize random number generator
      virtual void Seed(unsigned int value) = 0;
      // Default mutate function.
      // All Bitcoin messages are vectors of bytes that can be converted into
      // structures like Transactions, PSBT's, Scripts etc.
      virtual void Mutate(std::vector<uint8_t>& data) = 0;
      // Register callback for postprocessing of mutated messages.
      virtual void RegisterPostProcessor(const IDescriptor* descriptor, PostProcessFunction callback) = 0;
    };
    

    I have then taken the original script and transaction fuzzers and extended them with the custom fuzzing APIs from libFuzzer. However, the transaction fuzzer is still very primitive, so only the script fuzzer should be considered for now.
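
    To make the interface idea concrete, here is a minimal sketch of one possible implementation. IMutatorSketch is a reduced stand-in for the interface above (the post-processor registration is left out because IDescriptor and PostProcessFunction are not shown in this thread), and BitFlipMutator is a hypothetical class invented for this example. The sketch also adds a virtual destructor so that deleting an implementation through the base pointer is well defined.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Reduced stand-in for the interface above; post-processing is omitted.
class IMutatorSketch {
public:
    virtual ~IMutatorSketch() = default;
    // Initialize the random number generator.
    virtual void Seed(unsigned int value) = 0;
    // Mutate the serialized message in place.
    virtual void Mutate(std::vector<uint8_t>& data) = 0;
};

// Hypothetical implementation: flips exactly one random bit per call.
class BitFlipMutator final : public IMutatorSketch {
public:
    void Seed(unsigned int value) override { m_rng.seed(value); }

    void Mutate(std::vector<uint8_t>& data) override
    {
        if (data.empty()) return;
        const std::size_t index = m_rng() % data.size();
        data[index] ^= static_cast<uint8_t>(1u << (m_rng() % 8));
    }

private:
    std::mt19937 m_rng;
};
```

    A structure-aware implementation would decode the bytes in Mutate, change a field, and re-encode, while keeping the same interface.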

    In the current implementation I’m letting this fuzzer create tons of (im)possible Bitcoin Scripts. These would look like this (every line is a separate script that will be fired against various functions):

    2 OP_PICK OP_BOOLOR OP_PUSHDATA2 12 OP_ADD 1 OP_SIZE OP_NOTIF OP_MAX OP_VERIF OP_TOALTSTACK OP_CHECKSEQUENCEVERIFY OP_NIP
    OP_PICK OP_BOOLOR OP_PUSHDATA2 12
    OP_PICK OP_BOOLOR OP_PUSHDATA2 12 OP_ADD 1 OP_SIZE OP_NOTIF OP_MAX OP_VERIF OP_TOALTSTACK OP_CHECKSEQUENCEVERIFY OP_NIP OP_SWAP
    OP_BOOLOR OP_PUSHDATA2 12
    12 1 OP_SIZE OP_NOTIF OP_MAX
    12 OP_ADD 1 OP_SIZE OP_NOTIF OP_MAX OP_VERIF
    12 OP_ADD 1 OP_SIZE
    OP_ADD 1
    1 OP_SIZE OP_NOTIF OP_MAX OP_VERIF OP_TOALTSTACK OP_CHECKSEQUENCEVERIFY OP_NIP OP_SWAP
    OP_SIZE OP_NOTIF OP_MAX OP_VERIF OP_TOALTSTACK OP_CHECKSEQUENCEVERIFY
    OP_NOP OP_MAX OP_VERIF OP_TOALTSTACK OP_CHECKSEQUENCEVERIFY
    OP_NOP OP_VERIF OP_TOALTSTACK OP_CHECKSEQUENCEVERIFY OP_NIP
    13 5 -1 OP_CHECKSIG OP_2DUP OP_WITHIN OP_IF OP_2DIV OP_ELSE 12
    13 5 -1 OP_CHECKSIG OP_2DUP OP_WITHIN OP_IF OP_2DIV OP_ELSE 12
    -1 OP_CHECKSIG OP_2DUP OP_WITHIN OP_IF OP_2DIV OP_ELSE 12 OP_ADD OP_NOP6 OP_XOR OP_TUCK OP_2DUP OP_VER OP_VERIF OP_FROMALTSTACK
    -1 OP_CHECKSIG OP_2DUP OP_WITHIN OP_IF OP_2DIV OP_ELSE 12 OP_ADD OP_NOP6 OP_XOR OP_TUCK OP_2DUP OP_VER OP_VERIF OP_FROMALTSTACK
    OP_CHECKSIG OP_2DUP OP_WITHIN OP_IF OP_2DIV OP_ELSE 12 OP_ADD OP_NOP6 OP_XOR OP_TUCK
    OP_2DIV OP_ELSE 12
    12 OP_ADD OP_NOP6 OP_XOR OP_TUCK
    

    I am not sure if this is useful at all, so maybe I should not try to introduce any additional complexities before letting others double check it. Maybe the whole interface-implementation stuff is already too much for this task, so any help in this case is very much appreciated.
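
    For comparison, an opcode-level mutator can be sketched directly on top of libFuzzer's LLVMFuzzerCustomMutator hook, without any class hierarchy. The hook name and signature are libFuzzer's; everything else is an assumption made for illustration: the small opcode table, and above all the simplification that every byte is one opcode (real scripts contain push operations followed by data bytes, which this toy ignores). This is not the code from the branch discussed here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// A few single-byte opcodes (byte values as used in Bitcoin Script).
static const uint8_t kOpcodes[] = {
    0x51, // OP_1
    0x76, // OP_DUP
    0x7c, // OP_SWAP
    0x82, // OP_SIZE
    0x93, // OP_ADD
    0xac, // OP_CHECKSIG
};

// If this symbol is defined, libFuzzer calls it to mutate an input.
// The mutated input is written back into `data`; the return value is its new size.
extern "C" size_t LLVMFuzzerCustomMutator(uint8_t* data, size_t size, size_t max_size, unsigned int seed)
{
    std::mt19937 rng(seed);
    std::vector<uint8_t> script(data, data + size);
    const size_t n_ops = sizeof(kOpcodes) / sizeof(kOpcodes[0]);

    switch (rng() % 3) {
    case 0: { // Insert a random opcode at a random position.
        const size_t pos = rng() % (script.size() + 1);
        const uint8_t op = kOpcodes[rng() % n_ops];
        script.insert(script.begin() + static_cast<std::ptrdiff_t>(pos), op);
        break;
    }
    case 1: // Remove a random element.
        if (!script.empty()) {
            const size_t pos = rng() % script.size();
            script.erase(script.begin() + static_cast<std::ptrdiff_t>(pos));
        }
        break;
    default: // Swap two random elements.
        if (script.size() >= 2) {
            const size_t a = rng() % script.size();
            const size_t b = rng() % script.size();
            std::swap(script[a], script[b]);
        }
        break;
    }

    // Never write more bytes than libFuzzer allows.
    const size_t new_size = std::min(script.size(), max_size);
    std::copy(script.begin(), script.begin() + static_cast<std::ptrdiff_t>(new_size), data);
    return new_size;
}
```

    Because the generator is seeded from the `seed` argument, the same input and seed always produce the same mutation, which keeps fuzzing runs reproducible.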

    One more thing, however…

    During my experiments I encountered these UB-sanitizer warnings when starting the script fuzzer:

    INFO: Seed: 1708339462
    INFO: Loaded 1 modules   (1093525 inline 8-bit counters): 1093525 [0x1106584c8, 0x11076345d),
    INFO: Loaded 1 PC tables (1093525 PCs): 1093525 [0x110763460,0x111812db0),
    INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
    prevector.h:453:19: runtime error: reference binding to misaligned address 0x7ffee318f162 for type 'prevector<28, unsigned char, unsigned int, int>::size_type' (aka 'unsigned int'), which requires 4 byte alignment
    0x7ffee318f162: note: pointer points here
     00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00
                  ^
    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior prevector.h:453:19 in
    /usr/local/opt/llvm/bin/../include/c++/v1/type_traits:3699:25: runtime error: reference binding to misaligned address 0x7ffee318f162 for type 'unsigned int', which requires 4 byte alignment
    0x7ffee318f162: note: pointer points here
     00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00
                  ^
    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/local/opt/llvm/bin/../include/c++/v1/type_traits:3699:25 in
    /usr/local/opt/llvm/bin/../include/c++/v1/type_traits:2281:12: runtime error: reference binding to misaligned address 0x7ffee318f162 for type '_Up' (aka 'unsigned int'), which requires 4 byte alignment
    0x7ffee318f162: note: pointer points here
     00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00
                  ^
    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/local/opt/llvm/bin/../include/c++/v1/type_traits:2281:12 in
    /usr/local/opt/llvm/bin/../include/c++/v1/type_traits:3699:13: runtime error: load of misaligned address 0x7ffee318f162 for type 'typename remove_reference<unsigned int &>::type' (aka 'unsigned int'), which requires 4 byte alignment
    0x7ffee318f162: note: pointer points here
     00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00
    

    At first, I thought it must have been caused by my sloppy coding, but no matter what I did, the warnings remained.

    Then I started (de)activating functions from LLVMFuzzerTestOneInput in test/fuzz/structured/script.cpp one by one.

    And it seems that this call is the culprit, but I still can’t explain why:

    (void)IsSolvable(signing_provider, script);
    

    This call was taken, like all the others, from the original test/fuzz/script.cpp.

    However, I am still convinced that it has something to do with my code.

    Regards,

  8. MarcoFalke commented at 9:04 pm on December 12, 2019: member

    During my experiments I encountered these UB-sanitizer warnings when starting the script fuzzer:

    I recommend activating all known suppressions:

    export LSAN_OPTIONS="suppressions=$(pwd)/test/sanitizer_suppressions/lsan"
    export TSAN_OPTIONS="suppressions=$(pwd)/test/sanitizer_suppressions/tsan"
    export UBSAN_OPTIONS="suppressions=$(pwd)/test/sanitizer_suppressions/ubsan:print_stacktrace=1:halt_on_error=1"
    
  9. practicalswift commented at 10:54 pm on December 12, 2019: contributor

    @brakmic

    The prevector alignment issue is known and fixed by PR #17708. Please consider reviewing that PR - it would be nice to have it solved :)
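
    For readers unfamiliar with this class of warning: UBSan reports it when a reference or pointer to a type such as unsigned int is bound to an address that is not a multiple of the type's alignment. The sketch below shows one common, general way to read a value from a possibly misaligned address without undefined behaviour. ReadUnaligned32 is a hypothetical helper written for this example; it is not taken from prevector.h and is not a description of what PR #17708 does.

```cpp
#include <cstdint>
#include <cstring>

// Read a 32-bit value from a possibly misaligned address.
// Casting `p` to `const uint32_t*` and dereferencing it would be undefined
// behaviour when `p` is not suitably aligned; std::memcpy has no such requirement.
inline uint32_t ReadUnaligned32(const unsigned char* p)
{
    uint32_t value;
    std::memcpy(&value, p, sizeof(value));
    return value;
}
```

    Compilers typically turn the fixed-size memcpy into a single load on platforms that allow unaligned access.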

    Regarding the fuzzing-experiments branch: try to measure what results you get from the fuzzing harness in that +564 LOC branch in terms of coverage and then compare that to what you achieve using the simplest possible ~20 LOC fuzzer you can think of for the same target function. What were the results? Did the extra abstractions pay off?
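
    To give an idea of what such a minimal harness looks like, here is a sketch of roughly that size. LLVMFuzzerTestOneInput is libFuzzer's entry point; ParseUInt32Toy is a hypothetical target function made up for this example and stands in for whatever function is actually being fuzzed.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical target function standing in for the real code under test.
// Succeeds only for a non-empty string of decimal digits that fits in 32 bits.
bool ParseUInt32Toy(const std::string& s, uint32_t& out)
{
    if (s.empty()) return false;
    uint64_t acc = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return false;
        acc = acc * 10 + static_cast<uint64_t>(c - '0');
        if (acc > 0xFFFFFFFFull) return false;
    }
    out = static_cast<uint32_t>(acc);
    return true;
}

// libFuzzer calls this once per generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
{
    const std::string input(reinterpret_cast<const char*>(data), size);
    uint32_t value = 0;
    (void)ParseUInt32Toy(input, value); // Crashes and sanitizer reports are the findings.
    return 0;
}
```

    Assuming the file is saved as harness.cpp, it can be built with `clang++ -g -fsanitize=fuzzer,address,undefined harness.cpp -o harness` and run against a corpus directory.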

  10. brakmic commented at 10:55 am on December 13, 2019: contributor

    @practicalswift

    Many thanks for the hint regarding alignment issues. Now I don’t have to make my code even more ugly ;)

    Here’s the output of the script fuzzers. The first one has the custom fuzzing function activated; the second one is exactly the same fuzzer but without the custom function. I just put

    #ifdef CUSTOM_FUZZER
    ...
    #endif
    

    around it.

    INFO: Seed: 1052026202
    INFO: Loaded 1 modules   (1093525 inline 8-bit counters): 1093525 [0x10ada24c8, 0x10aead45d),
    INFO: Loaded 1 PC tables (1093525 PCs): 1093525 [0x10aead460,0x10bf5cdb0),
    INFO:      440 files found in test/fuzz/qa-assets/fuzz_seed_corpus/script/
    INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
    INFO: seed corpus: files: 440 min: 1b max: 3948b total: 136723b rss: 72Mb
    #442    INITED cov: 6325 ft: 12399 corp: 320/91Kb exec/s: 221 rss: 89Mb
    #512    pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 256 rss: 89Mb
    #1024   pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 512 rss: 90Mb
    #2048   pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 682 rss: 91Mb
    #4096   pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1024 rss: 93Mb
    #8192   pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1170 rss: 97Mb
    #16384  pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1260 rss: 105Mb
    #32768  pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1260 rss: 123Mb
    #65536  pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1213 rss: 156Mb
    #131072 pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1310 rss: 223Mb
    #262144 pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1337 rss: 359Mb
    #524288 pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1383 rss: 543Mb
    #1048576 pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1286 rss: 544Mb
    #2097152 pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1304 rss: 544Mb
    #4194304 pulse  cov: 6325 ft: 12399 corp: 320/91Kb lim: 4096 exec/s: 1326 rss: 544Mb
    

    It ran for some 30+ minutes before I stopped it, and this is the maximum coverage it was able to achieve. I am pretty sure that a more “intelligent” script-randomizing technique might have achieved a bit more, but for this I would need to find a way to construct more “realistic” scripts, that is, scripts which are “almost” correct. Right now, it’s more or less creating batches of randomly selected opcodes.

    And here is the “normal” fuzzer output.

    INFO: Seed: 1849818327
    INFO: Loaded 1 modules   (1092492 inline 8-bit counters): 1092492 [0x10ccef048, 0x10cdf9bd4),
    INFO: Loaded 1 PC tables (1092492 PCs): 1092492 [0x10cdf9bd8,0x10dea5498),
    INFO:      276 files found in test/fuzz/qa-assets/fuzz_seed_corpus/script/
    INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
    INFO: seed corpus: files: 276 min: 1b max: 3948b total: 56003b rss: 71Mb
    ...[snip]...
    #79124  REDUCE cov: 3502 ft: 8202 corp: 307/117Kb lim: 4096 exec/s: 277 rss: 542Mb L: 2244/3940 MS: 2 ChangeASCIIInt-EraseBytes-
    #80005  REDUCE cov: 3502 ft: 8202 corp: 307/117Kb lim: 4096 exec/s: 277 rss: 542Mb L: 918/3940 MS: 1 EraseBytes-
    #80168  REDUCE cov: 3502 ft: 8202 corp: 307/117Kb lim: 4096 exec/s: 278 rss: 542Mb L: 195/3940 MS: 3 InsertRepeatedBytes-InsertByte-EraseBytes-
    #80409  REDUCE cov: 3502 ft: 8202 corp: 307/117Kb lim: 4096 exec/s: 278 rss: 542Mb L: 2239/3940 MS: 1 EraseBytes-
    #81560  REDUCE cov: 3502 ft: 8202 corp: 307/116Kb lim: 4096 exec/s: 278 rss: 542Mb L: 2047/3940 MS: 1 EraseBytes-
    #81994  REDUCE cov: 3502 ft: 8202 corp: 307/116Kb lim: 4096 exec/s: 278 rss: 542Mb L: 1731/3940 MS: 4 ChangeBinInt-ChangeASCIIInt-ChangeBit-EraseBytes-
    #82232  REDUCE cov: 3502 ft: 8202 corp: 307/116Kb lim: 4096 exec/s: 278 rss: 542Mb L: 989/3940 MS: 3 InsertRepeatedBytes-CMP-EraseBytes- DE: "\x01\x00\x00\x00\x00\x00\x00\x10"-
    #83897  REDUCE cov: 3502 ft: 8202 corp: 307/115Kb lim: 4096 exec/s: 280 rss: 542Mb L: 930/3940 MS: 5 ChangeBit-ChangeByte-ShuffleBytes-ChangeBit-EraseBytes-
    #84769  REDUCE cov: 3502 ft: 8202 corp: 307/115Kb lim: 4096 exec/s: 280 rss: 542Mb L: 992/3940 MS: 2 EraseBytes-CMP- DE: "\xb7\x01"-
    #87330  REDUCE cov: 3502 ft: 8202 corp: 307/115Kb lim: 4096 exec/s: 282 rss: 542Mb L: 240/3940 MS: 1 EraseBytes-
    #87545  REDUCE cov: 3502 ft: 8202 corp: 307/115Kb lim: 4096 exec/s: 282 rss: 542Mb L: 890/3940 MS: 5 ChangeByte-ShuffleBytes-ChangeBit-InsertByte-EraseBytes-
    #89626  REDUCE cov: 3502 ft: 8202 corp: 307/114Kb lim: 4096 exec/s: 283 rss: 542Mb L: 135/3940 MS: 1 EraseBytes-
    #90207  REDUCE cov: 3502 ft: 8202 corp: 307/114Kb lim: 4096 exec/s: 283 rss: 542Mb L: 778/3940 MS: 1 EraseBytes-
    #91059  REDUCE cov: 3502 ft: 8202 corp: 307/114Kb lim: 4096 exec/s: 283 rss: 542Mb L: 3382/3923 MS: 2 ChangeASCIIInt-EraseBytes-
    #91787  NEW    cov: 3502 ft: 8205 corp: 308/114Kb lim: 4096 exec/s: 284 rss: 542Mb L: 25/3923 MS: 3 ChangeBit-PersAutoDict-EraseBytes- DE: "\x17\x04\x00\x00"-
    #94658  REDUCE cov: 3502 ft: 8205 corp: 308/114Kb lim: 4096 exec/s: 285 rss: 542Mb L: 942/3923 MS: 1 EraseBytes-
    #99369  REDUCE cov: 3502 ft: 8205 corp: 308/114Kb lim: 4096 exec/s: 286 rss: 542Mb L: 1655/3923 MS: 1 EraseBytes-
    #100062 REDUCE cov: 3502 ft: 8205 corp: 308/113Kb lim: 4096 exec/s: 286 rss: 542Mb L: 3586/3923 MS: 3 InsertByte-ChangeByte-EraseBytes-
    #100503 REDUCE cov: 3502 ft: 8205 corp: 308/113Kb lim: 4096 exec/s: 286 rss: 542Mb L: 60/3923 MS: 1 EraseBytes-
    #100894 REDUCE cov: 3502 ft: 8205 corp: 308/113Kb lim: 4096 exec/s: 286 rss: 542Mb L: 134/3923 MS: 1 EraseBytes-
    #101858 REDUCE cov: 3502 ft: 8205 corp: 308/113Kb lim: 4096 exec/s: 284 rss: 542Mb L: 450/3923 MS: 4 ChangeBit-ShuffleBytes-InsertByte-EraseBytes-
    ...[snip]...
    

    However, being inexperienced in fuzzing, I am not going to experiment “too much”, because it’s really easy to get lost in complexities when you’re dealing with new things.

    Maybe you or others have better ideas on how to build a proper structured fuzzer?

    Regards,

  11. practicalswift commented at 2:06 pm on December 13, 2019: contributor

    @brakmic

    To make it a proper shoot-out you’ll need to make sure the two fuzzing sessions start with exactly the same seed input corpus and that the fuzzing binaries are given the same run-time.

    In what you posted above the initial corpus sizes differ:

    INFO: seed corpus: files: 440 min: 1b max: 3948b total: 136723b rss: 72Mb
    vs.
    INFO: seed corpus: files: 276 min: 1b max: 3948b total: 56003b rss: 71Mb
    

    Try doing the shoot-out by giving each fuzzing harness a fresh copy of qa-assets/fuzz_seed_corpus/script/. It should be a fresh copy without any saved findings. Avoid the mistake of sharing the same directory between the fuzzers: they should have a separate directory each since libFuzzer will write to these directories.

    Also make sure they are given the same runtime using -max_total_time.

    Can you repeat your experiment with these adjustments and post the full results to a GitHub gist? :) Make sure to include all initial INFO: lines and also the ending DONE line in the output.

    […] it’s really easy to get lost in complexities when you’re dealing with new things.

    A good point. A way to avoid that is to go super simple to start with and introduce abstractions/complexities gradually, only when evidence suggests they are needed.

    In this specific case: my suggestion is that you start with writing a few basic non-structured fuzzers and wait with introducing structured fuzzing until you have experimental results suggesting that such a move would allow you to reach code paths unreachable by simpler methods (or finding such code paths much quicker).

  12. brakmic commented at 3:07 pm on December 13, 2019: contributor

    @practicalswift

    Many thanks! Now I understand a few things better. I’ve executed the two tests. For both of them I cloned Bitcoin’s qa-assets anew: git clone https://github.com/bitcoin-core/qa-assets

    I also compiled src/test/fuzz/structured/script.cpp with and without the custom function.

    I executed them with the same arguments:

    ./src/test/fuzz/structured/script -max_total_time=240 test/fuzz/qa-assets/fuzz_seed_corpus/script/
    

    The output of the non-custom script.cpp is:

    INFO: Seed: 3994611508
    INFO: Loaded 1 modules   (1092492 inline 8-bit counters): 1092492 [0x10b467048, 0x10b571bd4),
    INFO: Loaded 1 PC tables (1092492 PCs): 1092492 [0x10b571bd8,0x10c61d498),
    INFO:      284 files found in test/fuzz/qa-assets/fuzz_seed_corpus/script/
    INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
    INFO: seed corpus: files: 284 min: 1b max: 3948b total: 56352b rss: 71Mb
    #286    INITED cov: 6325 ft: 12060 corp: 240/38Kb exec/s: 286 rss: 82Mb
    #299    NEW    cov: 6325 ft: 12062 corp: 241/39Kb lim: 3913 exec/s: 299 rss: 82Mb L: 107/3913 MS: 3 InsertRepeatedBytes-ChangeBit-ChangeByte-
    #300    NEW    cov: 6325 ft: 12064 corp: 242/39Kb lim: 3913 exec/s: 300 rss: 82Mb L: 227/3913 MS: 1 ChangeByte-
    #301    NEW    cov: 6325 ft: 12065 corp: 243/39Kb lim: 3913 exec/s: 301 rss: 82Mb L: 23/3913 MS: 1 ChangeBit-
    #305    NEW    cov: 6325 ft: 12068 corp: 244/43Kb lim: 3913 exec/s: 305 rss: 83Mb L: 3913/3913 MS: 4 EraseBytes-CopyPart-ChangeBinInt-CrossOver-
    #315    NEW    cov: 6325 ft: 12083 corp: 245/43Kb lim: 3913 exec/s: 315 rss: 83Mb L: 154/3913 MS: 5 EraseBytes-InsertByte-ChangeByte-InsertByte-EraseBytes-
    #352    NEW    cov: 6325 ft: 12095 corp: 246/43Kb lim: 3913 exec/s: 352 rss: 83Mb L: 463/3913 MS: 2 InsertRepeatedBytes-InsertRepeatedBytes-
    #364    NEW    cov: 6325 ft: 12097 corp: 247/43Kb lim: 3913 exec/s: 364 rss: 83Mb L: 76/3913 MS: 2 ShuffleBytes-InsertRepeatedBytes-
    #367    NEW    cov: 6325 ft: 12100 corp: 248/43Kb lim: 3913 exec/s: 367 rss: 83Mb L: 20/3913 MS: 3 EraseBytes-CMP-CMP- DE: "\x01\x00\x00\x00\x00\x00\x00\x00"-"R\xe4\x00\x00 `\x00\x00"-
    [..snip..]
    #89194  REDUCE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 405 rss: 515Mb L: 317/3913 MS: 2 InsertByte-EraseBytes-
    #91850  REDUCE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 406 rss: 516Mb L: 460/3913 MS: 1 EraseBytes-
    #92497  REDUCE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 405 rss: 516Mb L: 1146/3913 MS: 2 ChangeBinInt-EraseBytes-
    #93653  REDUCE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 407 rss: 516Mb L: 223/3913 MS: 1 EraseBytes-
    #96209  REDUCE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 409 rss: 516Mb L: 103/3913 MS: 1 EraseBytes-
    #97245  REDUCE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 408 rss: 516Mb L: 201/3913 MS: 1 EraseBytes-
    #97902  REDUCE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 409 rss: 516Mb L: 209/3913 MS: 2 CMP-EraseBytes- DE: "\x00\x00\x00\x00\x00\x00\x00\x00"-
    #98507  DONE   cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 408 rss: 516Mb
    ###### Recommended dictionary. ######
    "\x01\x00\x00\x00\x00\x00\x00\x00" # Uses: 571
    "R\xe4\x00\x00 `\x00\x00" # Uses: 671
    "\x00\x00\x00\x00\x00\x00\x00R" # Uses: 621
    "\x1a\x00\x00\x00\x00\x00\x00\x00" # Uses: 630
    "\x83\xe0\x04\x00\xa0a\x00\x00" # Uses: 593
    "\xff\xff~\xfe\xe86\xec\xc0" # Uses: 647
    "\x13\x00\x00\x00" # Uses: 596
    "\x01\x00\x00\x00\x00\x00\x02\x08" # Uses: 577
    "\x10\x00\x00\x00\x00\x00\x00\x00" # Uses: 629
    "\xb6\x02\x00\x00" # Uses: 574
    "\x01\x00\x0a\xfa" # Uses: 567
    "\xbb\x01\x00\x00" # Uses: 502
    "\xff\x96" # Uses: 463
    "\x1c\x00" # Uses: 387
    "\xff\x00\x00\x00" # Uses: 269
    "\x01\x00\x00\xaf" # Uses: 200
    "\x00\xad" # Uses: 94
    "\x00\x00\x00\x00\x00\x00\x00\x00" # Uses: 2
    ###### End of recommended dictionary. ######
    Done 98507 runs in 241 second(s)
    

    And the output of customized script.cpp is:

    INFO: Seed: 160415445
    INFO: Loaded 1 modules   (1093525 inline 8-bit counters): 1093525 [0x104e224c8, 0x104f2d45d),
    INFO: Loaded 1 PC tables (1093525 PCs): 1093525 [0x104f2d460,0x105fdcdb0),
    INFO:      284 files found in test/fuzz/qa-assets/fuzz_seed_corpus/script/
    INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
    INFO: seed corpus: files: 284 min: 1b max: 3948b total: 56352b rss: 71Mb
    #286    INITED cov: 6325 ft: 12059 corp: 241/38Kb exec/s: 286 rss: 82Mb
    #361    NEW    cov: 6325 ft: 12060 corp: 242/38Kb lim: 4096 exec/s: 361 rss: 82Mb L: 13/3913 MS: 5 Custom-Custom-Custom-Custom-Custom-
    #437    NEW    cov: 6325 ft: 12061 corp: 243/38Kb lim: 4096 exec/s: 437 rss: 83Mb L: 10/3913 MS: 1 Custom-
    #1024   pulse  cov: 6325 ft: 12061 corp: 243/38Kb lim: 4096 exec/s: 512 rss: 83Mb
    #1084   REDUCE cov: 6325 ft: 12061 corp: 243/38Kb lim: 4096 exec/s: 542 rss: 83Mb L: 4/3913 MS: 2 Custom-Custom-
    #2048   pulse  cov: 6325 ft: 12061 corp: 243/38Kb lim: 4096 exec/s: 682 rss: 89Mb
    #2436   NEW    cov: 6325 ft: 12064 corp: 244/38Kb lim: 4096 exec/s: 812 rss: 89Mb L: 11/3913 MS: 3 Custom-Custom-Custom-
    #3514   NEW    cov: 6325 ft: 12067 corp: 245/38Kb lim: 4096 exec/s: 878 rss: 90Mb L: 13/3913 MS: 3 Custom-Custom-Custom-
    #4096   pulse  cov: 6325 ft: 12067 corp: 245/38Kb lim: 4096 exec/s: 819 rss: 90Mb
    #5590   NEW    cov: 6325 ft: 12071 corp: 246/38Kb lim: 4096 exec/s: 931 rss: 92Mb L: 17/3913 MS: 1 Custom-
    #8192   pulse  cov: 6325 ft: 12071 corp: 246/38Kb lim: 4096 exec/s: 910 rss: 94Mb
    #16384  pulse  cov: 6325 ft: 12071 corp: 246/38Kb lim: 4096 exec/s: 1024 rss: 101Mb
    #17775  NEW    cov: 6325 ft: 12075 corp: 247/38Kb lim: 4096 exec/s: 1045 rss: 102Mb L: 16/3913 MS: 5 Custom-Custom-Custom-Custom-Custom-
    #30092  NEW    cov: 6325 ft: 12079 corp: 248/38Kb lim: 4096 exec/s: 1037 rss: 112Mb L: 17/3913 MS: 2 Custom-Custom-
    #32768  pulse  cov: 6325 ft: 12079 corp: 248/38Kb lim: 4096 exec/s: 1024 rss: 114Mb
    #65536  pulse  cov: 6325 ft: 12079 corp: 248/38Kb lim: 4096 exec/s: 1110 rss: 140Mb
    #103081 NEW    cov: 6325 ft: 12081 corp: 249/38Kb lim: 4096 exec/s: 1108 rss: 170Mb L: 18/3913 MS: 4 Custom-Custom-Custom-Custom-
    #103610 NEW    cov: 6325 ft: 12083 corp: 250/38Kb lim: 4096 exec/s: 1114 rss: 170Mb L: 14/3913 MS: 4 Custom-Custom-Custom-Custom-
    #131072 pulse  cov: 6325 ft: 12083 corp: 250/38Kb lim: 4096 exec/s: 1139 rss: 192Mb
    #262144 pulse  cov: 6325 ft: 12083 corp: 250/38Kb lim: 4096 exec/s: 1202 rss: 294Mb
    #291145 DONE   cov: 6325 ft: 12083 corp: 250/38Kb lim: 4096 exec/s: 1208 rss: 317Mb
    Done 291145 runs in 241 second(s)
    

    The results are the same, so I need to figure out how to manipulate the Op-Codes to increase the coverage. Or maybe there is no way to increase it with the current logic? Maybe playing around with Op-Codes is a dead end?

  13. practicalswift commented at 4:15 pm on December 13, 2019: contributor

    @brakmic

    Thanks for sharing your results.

    #98507  DONE   cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 408 rss: 516Mb
    vs
    #291145 DONE   cov: 6325 ft: 12083 corp: 250/38Kb lim: 4096 exec/s: 1208 rss: 317Mb
    

    The results are not literally the same, actually: judging only from the numbers, the simple original version is slightly better than the more complex custom version. While both reach the same number of basic blocks or edges (the cov number), the original version has a higher “feature” count (the ft number). libFuzzer uses several signals to evaluate code coverage: edge coverage, edge counters, value profiles, indirect caller/callee pairs, etc. These signals combined are called features. I don’t think the difference between the two ft numbers is of any major significance in this case, though: I am just making the point that the ft number is worth looking at too :)
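To make the comparison concrete, here is a minimal Python sketch (not part of the original thread) that extracts the `cov`, `ft`, `corp`, and `exec/s` fields from the two `DONE` lines quoted above and prints the differences. The regular expression and the helper name `parse_status` are assumptions made for illustration; they only cover the status-line layout that appears in this log, not every line libFuzzer can print.

```python
import re

# Matches the fields of a libFuzzer status line as they appear in this thread, e.g.
# "#98507 DONE cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 408 rss: 516Mb".
# The "lim:" field is optional because the INITED line in the log does not have it.
STATUS_RE = re.compile(
    r"#(?P<runs>\d+)\s+(?P<event>\w+)\s+"
    r"cov: (?P<cov>\d+) ft: (?P<ft>\d+) "
    r"corp: (?P<corp>\d+)/\S+ "
    r"(?:lim: \d+ )?exec/s: (?P<execs>\d+)"
)


def parse_status(line):
    """Return the fields of one status line as a dict, or None if it does not match."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    fields = {key: int(value) for key, value in match.groupdict().items() if key != "event"}
    fields["event"] = match.group("event")
    return fields


# The two final status lines compared in this comment.
original = parse_status(
    "#98507  DONE   cov: 6325 ft: 12389 corp: 309/92Kb lim: 4096 exec/s: 408 rss: 516Mb"
)
custom = parse_status(
    "#291145 DONE   cov: 6325 ft: 12083 corp: 250/38Kb lim: 4096 exec/s: 1208 rss: 317Mb"
)

cov_delta = original["cov"] - custom["cov"]
ft_delta = original["ft"] - custom["ft"]
print(f"cov delta: {cov_delta}, ft delta: {ft_delta}")  # cov delta: 0, ft delta: 306
```

The `cov` values are identical, while the original harness ends with 306 more features, which is the gap described above.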

    The results are the same, so I need to figure out how to manipulate the Op-Codes to increase the coverage.

    You’re making the assumption that better coverage can be reached by changing the fuzzing harness and/or the fuzzing technique. That is not necessarily the case :)

    Have you looked at what lines of code in the file you are actually hitting when fuzzing with the simple existing fuzzer versus what you would like to hit? If not, that would be a good place to start – that will tell you if there is anything to “fix” :)
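One way to act on this suggestion is sketched below in Python. It assumes that per-line hit counts for the file of interest have already been exported from a coverage tool into plain `{line_number: hit_count}` dictionaries; the export step, the function name `compare_line_coverage`, and the sample numbers are all hypothetical and are not taken from the thread.

```python
def compare_line_coverage(hits_a, hits_b):
    """Compare two {line_number: hit_count} maps for the same source file."""
    covered_a = {line for line, count in hits_a.items() if count > 0}
    covered_b = {line for line, count in hits_b.items() if count > 0}
    all_lines = set(hits_a) | set(hits_b)
    return {
        "only_a": sorted(covered_a - covered_b),               # reached only by run A
        "only_b": sorted(covered_b - covered_a),               # reached only by run B
        "neither": sorted(all_lines - covered_a - covered_b),  # reached by neither run
    }


# Hypothetical hit counts, made up purely for illustration.
simple_fuzzer = {10: 5, 11: 5, 12: 0, 13: 2, 14: 0}
custom_fuzzer = {10: 9, 11: 9, 12: 0, 13: 0, 14: 0}

report = compare_line_coverage(simple_fuzzer, custom_fuzzer)
print(report)  # {'only_a': [13], 'only_b': [], 'neither': [12, 14]}
```

The `neither` list is the interesting one here: if the lines you would like to hit are not in it, a different harness or mutator has nothing left to "fix" for that file.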

  14. brakmic commented at 9:20 am on December 14, 2019: contributor

    Have you looked at what lines of code in the file you are actually hitting when fuzzing with the simple existing fuzzer versus what you would like to hit? If not, that would be a good place to start – that will tell you if there is anything to “fix” :)

    No, I didn’t. Many thanks for the hint. Maybe the custom script fuzzer would make more sense in other environments, for example the wallet. The wallet code deals with scripts too, and I could take the wallet test environment from the unit tests to create a similar one for the fuzzer.

    However, there’s also a risk that my custom script fuzzing simply becomes a solution in search of a problem: time-consuming without bringing any significant improvement.

  15. MarcoFalke removed the label good first issue on Dec 15, 2019
  16. MarcoFalke referenced this in commit 3b5b276734 on Jan 29, 2020
  17. michaelfolkson commented at 12:08 pm on November 11, 2020: contributor

    Thanks for the guidance on macOS troubleshooting and for collecting fuzzing resources, @brakmic. Very helpful. #17657 (comment)

    Added to this StackExchange post on fuzzing.

  18. MarcoFalke closed this on Mar 8, 2021

  19. DrahtBot locked this on Aug 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-29 01:12 UTC
