Offering my Bitcoin fuzzers #11045

issue guidovranken openend this issue on August 14, 2017
  1. guidovranken commented at 6:13 pm on August 14, 2017: contributor

    I’ve implemented about 13 different fuzzers that each fuzz different parts of the Bitcoin core source code. Collectively these result in much more code coverage than src/test/test_bitcoin_fuzzy.cpp but are not exhaustive (yet).

    I could not find any issues despite running the fuzzers for a long time.

    If you want it, I’ll make a pull request. I’m also willing to work together with someone who has more intimate knowledge of the Bitcoin core source code to complement the current set of fuzzers.

  2. sipa commented at 6:17 pm on August 14, 2017: member
    That sounds very cool! Our current fuzzers are indeed hardly more than proof of concepts.
  3. guidovranken commented at 6:26 pm on August 14, 2017: contributor

    Great. Currently it is not integrated into the build script. It’s just the directory /fuzzers with a number of cpp files and a Makefile, which references hardcoded relative paths (../src/libbitcoin_util.a ../src/crypto/libbitcoin_crypto.a etc). If this is not what you want you must integrate it into the build scripts yourself because that is not my forte.

    I’m gonna prepare it now and make the PR.

  4. practicalswift commented at 10:02 am on August 16, 2017: contributor

    @guidovranken That’s awesome! I’ve played around with different ways to extend the fuzzing coverage myself, so I’d be really interested in checking your stuff out and how you’ve attacked the problem.

    If you are not yet ready to submit a proper pull request I’d be glad to take a look at it as-is. Just dump the code in its current state in a gist, repo or somewhere convenient for you and I’ll figure out the rest :-)

    For reference, see some previous discussions about improving Bitcoin fuzzing :

    • [tests] Add fuzz testing for BlockTransactions, BlockTransactionsRequest and CBlockHeaderAndShortTxIDs deserialization, #10409
    • [tests] Add libFuzzer support, #10440
    • [tests] Speed up fuzzing by ~200x when using afl-fuzz, #10415
    • Feature request: Make Bitcoin libFuzzer-friendly and consider integration into the OSS-Fuzz project, #10364
  5. guidovranken commented at 1:05 am on August 17, 2017: contributor

    Here they are: https://github.com/guidovranken/bitcoin/tree/fuzzing/fuzzers

    You need to compile Bitcoin with AddressSanitizer (preferably) and instrumentation, something like this:

    -fsanitize=address -g -fsanitize-coverage=trace-pc-guard,indirect-calls,trace-cmp

    and put libFuzzer.a in the fuzzers/ directory.

    Instructions how to build libFuzzer can be found here (it’s trivial): https://llvm.org/docs/LibFuzzer.html

  6. practicalswift commented at 7:26 am on August 17, 2017: contributor
    @guidovranken Wow, really nice! Thanks for sharing! 👍
  7. guidovranken commented at 2:52 pm on August 17, 2017: contributor
    You’re welcome. There’s more coming, like a prevector fuzzer (using class prevector_tester, from tests), will push it later today.
  8. practicalswift commented at 2:57 pm on August 17, 2017: contributor
    @guidovranken Great! Keep it coming! 👍 Are you still writing new fuzzers or are all these the fuzzers you had been running for a while without finding any issues?
  9. practicalswift commented at 3:17 pm on August 17, 2017: contributor
    @guidovranken A small nit - would it be possible to rewrite fuzzer-blockchain-load.cpp so that it doesn’t write to the filesystem? It would be nice to avoid filesystem access in all fuzzers. @sipa What is your opinion - should new tests such as these be added as additional tests in test_bitcoin_fuzzy.cpp (extending the enum TEST_ID), or should we add them somewhere else to keep test_bitcoin_fuzzy.cpp “deserialization-only”? :-)
  10. sipa commented at 3:30 pm on August 17, 2017: member

    @practicalswift I believe @kcc has advised against the approach of switching between tests based on input. It risks confusing the fuzzer, and reusing information from one test and trying to apply it in another.

    The suggestion, I believe, was to have a separate binary for each test (which could be done automatically using #defines that select which test to build for).

  11. guidovranken commented at 7:16 pm on August 17, 2017: contributor
    @practicalswift I’m still updating them and writing new fuzzers. I want to update the blockchain loader, but LoadExternalBlockFile takes a FILE*. What do you suggest?
  12. laanwj commented at 8:31 pm on August 17, 2017: member

    Collectively these result in much more code coverage than src/test/test_bitcoin_fuzzy.cpp but are not exhaustive (yet).

    Awesome! Can these be integrated into bitcoin_fuzzy?

  13. laanwj added the label Tests on Aug 17, 2017
  14. practicalswift commented at 9:10 pm on August 17, 2017: contributor

    @guidovranken Very glad to hear that you are actively writing new fuzzers! Excellent!

    Regarding the fuzzing of LoadExternalBlockFile in fuzzer-blockchain-load.cpp – perhaps we could use fmemopen(…) where available and use std::tmpfile() as the fall-back? Perhaps going with std::tmpfile() only is good enough.

  15. kcc commented at 9:26 pm on August 17, 2017: none

    I believe @kcc has advised against the approach of switching between tests based on input. The suggestion, I believe, was to have a separate binary for each test

    Yes. This approach makes fuzzing more efficient,

    std::tmpfile() only is good enough.

    Depends on your speed requirements. If LoadExternalBlockFile is slow enough to neglect the cost of creating a file, it’s fine.

    however this approach may cripple some fuzzing approaches, e.g. the ones that rely on taint analysis (data flow propagation) and can not propagate taint data through the file system.

  16. practicalswift commented at 9:29 pm on August 17, 2017: contributor

    @laanwj I think @sipa raised a good point above that there might be benefits to have a separate binary for each fuzzer. In addition to avoid confusing the fuzzer (unnecessary re-use of fuzzing inputs between the fuzzers - just confirmed by @kcc! :-)) it has the benefit of making it more obvious when a given fuzzing session has reached a state where no new paths are found.

    With the current “first part of input data chooses the fuzzer”-approach taken by test_bitcoin_fuzzy it is quite hard to see which fuzzers that are still hitting new paths (i.e. keep fuzzing!), and which have reached their max depth/“max path count” (i.e. consider stop fuzzing).

    +1 for splitting the fuzzers into one binary each

    Given a one-binary-per-fuzzer setup it would be quite trivial to write a script that gives each fuzzer a running time of X time units each, and then measures the achieved coverage. Having this kind of fuzzer shootout would be quite fun and will probably increase the interest in writing Bitcoin fuzzers!

    Also it would make it easy to measure the incremental coverage gained from adding a new fuzzer which would be handy during fuzzing development.

  17. sipa commented at 9:45 pm on August 17, 2017: member
    Here’s an idea for a fuzzer: after #10699, it would be interesting to write a fuzz test that mimicks the test added in https://github.com/bitcoin/bitcoin/pull/10699/commits/2dd6f80680b8cbf9d4c99c6b2a8310af541e9aa3 (find a script/flag combination that proves a flag is not a softfork).
  18. kcc commented at 9:46 pm on August 17, 2017: none

    have reached their max depth/“max path count” (i.e. consider stop fuzzing).

    I’d encourage you to keep fuzzing even after reaching all possible edges. Example: BN_mod_exp may produce incorrect results on x86_64 (CVE-2017-3732) . This bug took us several CPU years to find, and the coverage was not changing for ~3 months.

  19. practicalswift commented at 9:53 pm on August 17, 2017: contributor

    @kcc Yes, that is a good point (and a very nice example, didn’t know the CVE-2017-3732 discovery details before - thanks!). Given enough resources that is the way to do it. Now if we only could get Bitcoin onto Google’s OSS-Fuzz infrastructure! :-)

    Personally I fuzz Bitcoin and the Swift compiler 24/7 (see results here), but unfortunately I meet a very real budget constraint that forces me to choose wisely between fuzzing targets based on path discovery rate :-)

    When I have you here in this thread - I just wanted to take the time to send a huge thank you for libFuzzer and the sanitizers. Amazing tools! Really impressed by your work!

  20. kcc commented at 10:07 pm on August 17, 2017: none

    if we only could get Bitcoin onto Google’s OSS-Fuzz i

    You are welcome at any moment if/when the bitcoin team decides that our disclosure timelines work for you.

    But you can build a much simpler continuous fuzzing yourself at a very low cost. E.g. n1-highcpu-8 (a VM with 8 cores and 7.2Gb RAM) will cost you 6c per hour ($44 per month). Other cloud compute vendors have comparable prices, AFAICT

    See also: https://github.com/google/fuzzer-test-suite/blob/master/tutorial/libFuzzerTutorial.md#distributed-fuzzing

    Amazing tools!

    Thank you!

  21. TheBlueMatt commented at 4:22 pm on August 18, 2017: member
    @practicalswift sadly the google OSS-Fuzz infrastructure has requirements about disclosure that are unrealistically short for Bitcoin Core, meaning we cant really accept it at this time. Assuming they actually did find something they would expect disclosure within a month or two, which is much too fast for this project (and also faster than Chromium often discloses bugs which they find internally).
  22. practicalswift commented at 8:17 pm on August 22, 2017: contributor

    #10440 adds libFuzzer support to test_bitcoin_fuzzy.cpp (in addition to the already existing afl-fuzz support including AFL persistent mode).

    When that PR is merged, what about using that as the base for the new one-binary-per-fuzzer structure suggested above?

    That way we’ll get both libFuzzer and AFL persistent mode support out of the box for all new fuzzers added.

    BTW, if anyone is willing to review the libFuzzer support patch (#10440) I’d be very happy! :-)

  23. guidovranken commented at 0:08 am on August 23, 2017: contributor

    The blockchain loader now use std::tmpfile.

    I also added a compile flag to the prevector fuzzer which makes it easy to fuzz prevector.h as if it were dealing with very large amounts of data (4GB – the threshold at which a uint32_t overflows). If you compile with -DSIZETYPE_UINT16_T it uses uint16_t as the internal size type, so it doesn’t require 4GB buffers but only 64K (the overflow threshold of uint16_t) buffers. Hope that makes sense? See also https://github.com/bitcoin/bitcoin/issues/11115

  24. guidovranken commented at 5:13 pm on August 25, 2017: contributor
    If you have any ideas about which components to fuzz next, let me know, I’ll implement them (as long as time permits).
  25. kcc commented at 5:33 am on August 29, 2017: none

    Yes, that is a good point (and a very nice example

    And one more example: https://www.openssl.org/news/secadv/20170828.txt Took us 9 months to find.

  26. practicalswift commented at 6:00 am on August 29, 2017: contributor
    @kcc Wow, really nice! :-)
  27. kcc commented at 9:24 pm on August 31, 2017: none

    perhaps we could use fmemopen

    I’ve played with fmemopen in another target and it was horrible. If at all possible try to avoid fmemopen and std::tmpfile – they may cost you 100x or 1000x slowdown

  28. laanwj commented at 4:10 pm on October 5, 2017: member

    +1 for splitting the fuzzers into one binary each

    An intermediate idea would be to have a command-line argument determine the type of fuzzer, instead of the first part of the input data (which I agree might be a bad idea).

    The reason I prefer it as one executable is that there is little overhead building it by default with the tests (which means code rot will not break it). If you want to switch to multiple executable, it will have to be a separate configure flag that is disabled by default e.g. --enable-fuzzers.

  29. kcc commented at 4:17 pm on October 5, 2017: none

    I always advocate for one-binary-per-target because it makes fuzzing more efficient. At least in libFuzzer, some parts of the logic are O(BinarySize). Bit rot is an important concern and we always encourage to use the fuzz targets as part of the regular continuous testing (not for fuzzing, just to run the targets on their corpus).

    Here are our recommendations, please consider them regardless of whether you want to join oss-fuzz. Openssl is an example of well integrated fuzzing with one-binary-per-target: https://github.com/openssl/openssl/tree/master/fuzz

  30. laanwj commented at 7:54 am on October 6, 2017: member
    Fine with me, but as said we shouldn’t build them by default anymore then (they could be built as part of one of the Travis runs for continuous integration, sure). It’s pointless to build 100 fuzzers by default that 99% of people won’t use, people are already complaining that building the tests takes long. Bitcoin has a lot of code and (inter)dependencies and every extra linker invocation will add significant time to the build process. And we don’t want to encourage building entirely without tests.
  31. MarkusTeufelberger commented at 5:32 pm on October 17, 2017: none

    According to @gmaxwell “Bitcoin Core is fuzzed with more resources than “OSS-Fuzz” already."(https://www.reddit.com/r/Bitcoin/comments/76v747/bitcoin_core_code_was_tested_so_thoroughly_that/dohc25t/).

    Is there documentation about this process somewhere? I’m especially interested in the backend, since ClusterFuzz unfortunately is not Open Source and there were no public re-implementations I could find.

  32. Crypt-iQ commented at 8:41 pm on December 24, 2018: contributor
    Sorry to gravedig, but can either of you (@kcc @sipa) explain / link to why fuzzing multiple messages in one harness can “confuse” the harness. gmaxwell gave me an answer about “hashing branch targets causing them not to be taken sometimes” that I’m not too clear on. Thanks!
  33. Dor1s commented at 11:47 pm on December 26, 2018: none

    A fuzzing engine (e.g. libFuzzer or AFL) mostly relies on code coverage signal (plus some others) as a feedback whether the testcase it has generated is “interesting” or not. “Interesting” testcases are saved and used for further mutations, non-interesting ones are discarded.

    Think of it this way:

    • we need to go through a maze, but it’s not interactive
    • we can specify the sequence of steps (UP, UP, LEFT, UP, RIGHT, etc) and execute all the steps at once
    • after trying a sequence of steps, we’ll find out whether we hit any new path in the maze or not
    • if we hit anything new, we keep improving that sequence; otherwise, forget it

    If the maze doesn’t change, we kinda can develop and improve our sequence of steps over time. While with a one-fuzzer-for-all hash based approach, every change in the sequence of the steps may change the maze to another one, and therefore the feedback we’re getting will be mostly useless and confusing.

  34. Crypt-iQ commented at 2:57 am on December 30, 2018: contributor
    Hey @Dor1s great response. I read a bit more about AFL specifically and gmaxwell’s response is starting to make a bit more sense. AFL suffers from a hash collision problem in the 64Kb shared_mem array (http://lcamtuf.coredump.cx/afl/technical_details.txt) and I’m wondering if that is also another motivation to remove this one-fuzzer-for-all approach? But it seems that getting rid of, say, ~20 edges/branches wouldn’t really mitigate this problem at all if the execution traces are particularly deep (i.e. have lots of edges and therefore a potential for collision). Removing the switch specifically to avoid collision doesn’t seem to be much of a motivation - the real motivation would be to avoid confusing the fuzzer with erroneous feedback that’s applied to the wrong messages. What do you think?
  35. practicalswift commented at 10:47 am on December 30, 2018: contributor
    @guidovranken Thanks for the fuzzers! Now that #15043 has been submitted, would you mind submitting your fuzzers as a new PR based on #15043? :-)
  36. Dor1s commented at 3:42 pm on January 2, 2019: none

    I’m wondering if that is also another motivation to remove this one-fuzzer-for-all approach?

    Yes, I think this is one more reason.

    the real motivation would be to avoid confusing the fuzzer with erroneous feedback that’s applied to the wrong messages.

    Yes, that’s the biggest point.

  37. MarcoFalke commented at 7:45 pm on February 15, 2019: member
    If someone wants to move the fuzzers into the repo, the files should be added like this: fab15ff70e8db26942db83a9aab7ebf974e8b6b2 (Just copy-paste everything and adjust test_one_input)
  38. MarcoFalke commented at 8:20 pm on April 16, 2020: member
    I think this can be closed. All offered fuzzers are now covered in at least one of our fuzz targets: https://github.com/bitcoin/bitcoin/tree/master/src/test/fuzz
  39. MarcoFalke closed this on Apr 16, 2020

  40. DrahtBot locked this on Aug 18, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-07-03 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me