test: Build fuzz targets into seperate executables #15043

pull MarcoFalke wants to merge 2 commits into bitcoin:master from MarcoFalke:Mf1812-buildFuzzTargets changing 7 files +531 −243
  1. MarcoFalke commented at 4:59 pm on December 27, 2018: member

    Currently our fuzzer is a single binary that decides on the first few bits of the buffer what target to pick. This is ineffective as the fuzzer needs to “learn” how the fuzz targets are organized and could get easily confused. Not to mention that the (seed) corpus can not be categorized by target, since targets might “leak” into each other. Also the corpus would potentially become invalid if we ever wanted to remove a target…

    Solve that by building each fuzz target into their own executable.

  2. MarcoFalke added the label Tests on Dec 27, 2018
  3. MarcoFalke added the label Build system on Dec 27, 2018
  4. MarcoFalke added the label Needs gitian build on Dec 27, 2018
  5. MarcoFalke force-pushed on Dec 27, 2018
  6. MarcoFalke force-pushed on Dec 27, 2018
  7. practicalswift commented at 5:39 pm on December 27, 2018: contributor

    Concept ACK

    Thanks for doing this!

  8. MarcoFalke force-pushed on Dec 27, 2018
  9. MarcoFalke force-pushed on Dec 27, 2018
  10. in src/test/test_bitcoin_fuzzy-blockheader_deserialize.cpp:16 in 0293c5d91e outdated
    11+    CDataStream ds(buffer, SER_NETWORK, INIT_PROTO_VERSION);
    12+    try {
    13+        int nVersion;
    14+        ds >> nVersion;
    15+        ds.SetVersion(nVersion);
    16+    } catch (const std::ios_base::failure& e) {
    


    practicalswift commented at 10:33 am on December 28, 2018:
    Drop the unused e variable. Applies throughout this PR.
  11. in src/test/test_bitcoin_fuzzy-transaction_deserialize.cpp:9 in 0293c5d91e outdated
    0@@ -0,0 +1,25 @@
    1+// Copyright (c) 2009-2018 The Bitcoin Core developers
    2+// Distributed under the MIT software license, see the accompanying
    3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
    4+
    5+#include <primitives/transaction.h>
    6+
    7+#include <test/test_bitcoin_fuzzy.hpp>
    8+
    9+static void test_one_input(std::vector<uint8_t> buffer)
    


    practicalswift commented at 10:35 am on December 28, 2018:
    Make buffer const reference. Applies throughout this PR :-)
  12. in src/test/test_bitcoin_fuzzy.hpp:51 in 0293c5d91e outdated
    46+    test_one_input(std::vector<uint8_t>(data, data + size));
    47+    return 0;
    48+}
    49+
    50+// This function is used by libFuzzer
    51+extern "C" int LLVMFuzzerInitialize(int* argc, char*** argv)
    


    practicalswift commented at 10:36 am on December 28, 2018:
    Nit: Could drop unused variable names argc and argv.

    fanquake commented at 10:46 am on December 28, 2018:
  13. MarcoFalke commented at 11:25 am on December 28, 2018: member
    Would be nice if someone could look at the build system changes, since the code is mostly just moved around.
  14. DrahtBot commented at 11:56 am on December 28, 2018: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #14912 ([WIP] External signer support (e.g. hardware wallet) by Sjors)
    • #13989 (add avx512 instrinsic by fingera)
    • #10443 (Add fee_est tool for debugging fee estimation code by ryanofsky)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  15. MarcoFalke force-pushed on Dec 31, 2018
  16. MarcoFalke force-pushed on Dec 31, 2018
  17. MarcoFalke force-pushed on Dec 31, 2018
  18. MarcoFalke force-pushed on Dec 31, 2018
  19. MarcoFalke force-pushed on Dec 31, 2018
  20. MarcoFalke force-pushed on Dec 31, 2018
  21. MarcoFalke force-pushed on Dec 31, 2018
  22. MarcoFalke force-pushed on Dec 31, 2018
  23. laanwj commented at 11:53 am on January 2, 2019: member

    Concept ACK

    I think this should be behind a configure flag, building so many executables is going to be slow when statically linking, or with slow filesystems, and it contributes nothing to testing unless the user is planning to do fuzzing.

  24. cjdelisle commented at 12:35 pm on January 2, 2019: none

    As a non-stakeholder, feel free to ignore, but as someone who is using the same test methodology, I would like to understand why you’re proposing to change.

    I have a number of reasons why I tend to prefer a single fuzz entry point, but I would like to know if there is any evidence that putting the test-case in the data can harm the fuzzer’s ability to efficiently find paths.

    My reasoning is as follows:

    1. Personally, I find it better to allow someone with a large machine to be able to run a simple command and begin fuzzing with minimal effort, if each case is fuzzed separately then this is not really practical.
    2. Secondly, I personally am a big fan of maximum fuzz coverage, which means any test that can plausibly make use of randomized data should be accessible from the fuzzer. So clearly I would not want to require people to launch each test separately.
    3. It seems like if one would like to fuzz a particular test case, their need can be easily supported by allowing them to pass an argument to the running which causes it to return 0 if the test is specified to anything else, thus the fuzzer will quickly prune attempts which touch other test cases.
    4. If there is a large corpus of generated test data, it seems that one could use a simple python script to rename the test files to contain the name of the test case which they’re touching, solving the concern about organization.

    Thanks, Caleb

  25. Crypt-iQ commented at 9:25 pm on January 2, 2019: contributor

    Hey @cjdelisle,

    I think the fuzz tests should be split up. AFL (and other fuzzers) can splice test cases with one another and for AFL, this can discover 20% additional execution paths. From my understanding, this can cause efficiency problems if the fuzzer is not fuzzing just one target.

    If we have two inputs, input A meant for address_deserialize and input B meant for banentry_deserialize and they are spliced together to create input C (which is a nonsense input), input C could be selected to be in the queue for further mutation if it provides the same edge coverage as input A or B (depending on its prefix) and cause even more of an efficiency problem. The splicing mutation is really meant to be used on similar, but diverse test cases.

    Also I think seeding the fuzzer with “bad” corpus inputs in general isn’t a good idea because of efficiency and because of the splicing issue, even if we are only running one test and all others are disabled with a flag.

    Anyways I’m pretty much a noob to fuzzing, but this is what I could gather from reading the AFL technical_notes, lcamtuf’s blog, and the discussion on https://github.com/bitcoin/bitcoin/issues/11045

  26. practicalswift commented at 11:29 pm on January 2, 2019: contributor
    @cjdelisle See @kcc:s comment in #11045 (comment) :-)
  27. cjdelisle commented at 8:25 am on January 3, 2019: none

    Thanks @Crypt-iQ and @practicalswift , for those who weren’t following carefully, my understanding is that it is the recommendation of Google’s OSS-Fuzz project that fuzz targets should be broken up ( https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md ) because:

    • Some of the fuzzers have O(executable_size) performance
    • Forcing all fuzzing to enter through one entry-point multiplies the number of possible search paths and fuzzers have limited memory

    So I hereby reverse my opinion, because any kind of ease of management is negated by the fact that it’s always better to do what works.

  28. Crypt-iQ commented at 3:34 pm on January 3, 2019: contributor

    I think that the fuzzing targets should be run on the corpus as part of regression testing. This would require the corpus to be included in this project. Is there a reason why it’s not currently included? Maybe this can be done in another PR.

    See @kcc comment: #11045 (comment)

    OSS Fuzz recommendations: https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md#regression-testing

  29. in doc/fuzzing.md:29 in f0d75314cb outdated
    25@@ -26,7 +26,7 @@ To build Bitcoin Core using AFL instrumentation (this assumes that the
    26 ./configure --disable-ccache --disable-shared --enable-tests CC=${AFLPATH}/afl-gcc CXX=${AFLPATH}/afl-g++
    27 export AFL_HARDEN=1
    28 cd src/
    29-make test/test_bitcoin_fuzzy
    30+make -C src/test
    


    Crypt-iQ commented at 4:11 pm on January 4, 2019:
    User is already in the src directory so this will fail. I ran make with no arguments in the src directory and it built the fuzz targets.

    MarcoFalke commented at 7:19 pm on January 14, 2019:
    Done (changed it to make)
  30. laanwj commented at 1:45 pm on January 8, 2019: member

    I think that the fuzzing targets should be run on the corpus as part of regression testing. This would require the corpus to be included in this project. Is there a reason why it’s not currently included?

    The idea is to have a separate repository with the corpus. The problem with including it in the main repository, besides taking up space, is that e.g. all changes have to go through the bottleneck of maintainers here.

  31. Crypt-iQ commented at 3:28 pm on January 8, 2019: contributor
    @laanwj Ok that makes sense. I think the corpus should be split up into directories by message type / fuzzing target to avoid erroneous feedback while fuzzing.
  32. laanwj commented at 6:59 pm on January 9, 2019: member
    Probably want to remove the links to test inputs from doc/fuzzing.md here, as they’ll no longer work as-is and one of the links is broken (#15028).
  33. DrahtBot added the label Needs rebase on Jan 10, 2019
  34. DrahtBot removed the label Needs gitian build on Jan 11, 2019
  35. MarcoFalke force-pushed on Jan 14, 2019
  36. MarcoFalke force-pushed on Jan 14, 2019
  37. MarcoFalke force-pushed on Jan 14, 2019
  38. MarcoFalke added the label Needs gitian build on Jan 14, 2019
  39. MarcoFalke deleted a comment on Jan 14, 2019
  40. DrahtBot removed the label Needs rebase on Jan 14, 2019
  41. DrahtBot removed the label Needs gitian build on Jan 15, 2019
  42. MarcoFalke deleted a comment on Jan 15, 2019
  43. MarcoFalke force-pushed on Jan 15, 2019
  44. MarcoFalke force-pushed on Jan 15, 2019
  45. MarcoFalke force-pushed on Jan 16, 2019
  46. MarcoFalke force-pushed on Jan 16, 2019
  47. MarcoFalke force-pushed on Jan 16, 2019
  48. MarcoFalke commented at 4:20 am on January 16, 2019: member
    @laanwj The previous corpus can still be used by stripping the first few (4?) bytes off of the seeds. I will do that (and move them to a separate repo) after this pull has been merged.
  49. MarcoFalke commented at 7:19 pm on January 24, 2019: member
    This hasn’t received any review, but I am going to merge it tomorrow unless someone objects to that.
  50. sipa commented at 0:43 am on January 25, 2019: member
    Wouldn’t it be easier to accomplish this by having a since source file for all fuzz target, but with a macro define (-D… argument) to select which fuzzer is invoked by main?
  51. MarcoFalke commented at 1:58 am on January 25, 2019: member

    @sipa Wouldn’t that require to pass in and compile for each fuzz target? That seems to move the verbosity to the user/ci script/fuzz runner.

    Also having a single source file makes it harder to see what each single fuzz test does (and depends on).

  52. sipa commented at 4:32 am on January 25, 2019: member

    @MarcoFalke The build system would automatically build all of them, supplying the necessary -D arguments for each binary.

    This would avoid all the boilerplate in all those source files.

  53. MarcoFalke force-pushed on Jan 25, 2019
  54. MarcoFalke force-pushed on Jan 25, 2019
  55. MarcoFalke commented at 11:07 pm on January 25, 2019: member
    Ah, I see. Done and removed 400 lines of boilerplate and headers.
  56. MarcoFalke force-pushed on Jan 25, 2019
  57. [test] fuzz: make test_one_input return void
    The return value is always 0 and not used, so might as well return void
    fab4bed68a
  58. MarcoFalke force-pushed on Jan 26, 2019
  59. MarcoFalke force-pushed on Jan 26, 2019
  60. MarcoFalke commented at 0:10 am on January 26, 2019: member

    Now split into two commits, where the top commit is some move-only:

    0git diff 2ca632e5b44a8385989c8539cc4e30e60fdee16c~ 2ca632e5b44a8385989c8539cc4e30e60fdee16c --color-moved=dimmed-zebra src/test
    
  61. MarcoFalke added the label Needs gitian build on Jan 26, 2019
  62. DrahtBot removed the label Needs gitian build on Jan 28, 2019
  63. test: Build fuzz targets into seperate executables 2ca632e5b4
  64. in configure.ac:150 in 02606835e6 outdated
    145@@ -147,6 +146,11 @@ AC_ARG_ENABLE([extended-functional-tests],
    146     [use_extended_functional_tests=$enableval],
    147     [use_extended_functional_tests=no])
    148 
    149+AC_ARG_ENABLE([fuzz],
    150+    AS_HELP_STRING([--enable-fuzz],[enable building of fuzz targets (default no)]),
    


    practicalswift commented at 7:08 pm on January 28, 2019:
    Bikeshedding perhaps but --enable-fuzzing feels more natural to me than --enable-fuzz.
  65. MarcoFalke force-pushed on Jan 30, 2019
  66. MarcoFalke deleted a comment on Jan 30, 2019
  67. in src/Makefile.test.include:177 in 2ca632e5b4
    194+if ENABLE_FUZZ
    195+test_fuzz_block_deserialize_SOURCES = $(FUZZ_SUITE) test/test_bitcoin_fuzzy.cpp
    196+test_fuzz_block_deserialize_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES) -DBLOCK_DESERIALIZE=1
    197+test_fuzz_block_deserialize_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS)
    198+test_fuzz_block_deserialize_LDFLAGS = $(RELDFLAGS) $(AM_LDFLAGS) $(LIBTOOL_APP_LDFLAGS)
    199+test_fuzz_block_deserialize_LDADD = \
    


    ryanofsky commented at 6:38 pm on January 30, 2019:

    You could deduplicate some boilerplate by defining common variables like:

    0fuzz_common_ldflags = $(RELDFLAGS) $(AM_LDFLAGS) $(LIBTOOL_APP_LDFLAGS)
    1fuzz_common_ldadd = $(LIBUNIVALUE) $(LIBBITCOIN_SERVER) $(LIBBITCOIN_COMMON)
    

    and then individual targets could be shortened to:

    0test_fuzz_transaction_deserialize_LDFLAGS = $(fuzz_common_ldflags)
    1test_fuzz_transaction_deserialize_LDADD = $(fuzz_common_ldadd)
    2...
    3
    4test_fuzz_block_deserialize_LDFLAGS = $(fuzz_common_ldflags)
    5test_fuzz_block_deserialize_LDADD = $(fuzz_common_ldadd)
    6...
    

    At least this is what I did in 2060a30063866ed1599134fe28846a69deda773c for #10102. One catch is that LDFLAGS, LDADD, etc suffix can’t be capitalized or the variables will be treated specially by automake.

  68. in src/test/test_bitcoin_fuzzy.cpp:40 in 2ca632e5b4
    84     }
    85 
    86-    switch(test_id) {
    87-        case CBLOCK_DESERIALIZE:
    88-        {
    89+#if BLOCK_DESERIALIZE
    


    ryanofsky commented at 6:43 pm on January 30, 2019:
    I think it would make things less complicated just to use separate source files, rather than using these preprocessor defines. These defines don’t seem to really decrease duplication, but I guess they they do have the advantage of making it easy to see the different test cases all in one file.

    MarcoFalke commented at 8:00 pm on January 30, 2019:
    This was my initial solution, but I changed it back to minimize the diff
  69. ryanofsky approved
  70. ryanofsky commented at 6:47 pm on January 30, 2019: member
    utACK 2ca632e5b44a8385989c8539cc4e30e60fdee16c
  71. laanwj merged this on Jan 30, 2019
  72. laanwj closed this on Jan 30, 2019

  73. laanwj referenced this in commit 9553102c38 on Jan 30, 2019
  74. MarcoFalke deleted the branch on Jan 30, 2019
  75. random-zebra referenced this in commit 44b5327e61 on May 28, 2021
  76. kittywhiskers referenced this in commit 8614068123 on Aug 2, 2021
  77. kittywhiskers referenced this in commit cdd3ee5626 on Aug 5, 2021
  78. kittywhiskers referenced this in commit cacbbffa90 on Aug 5, 2021
  79. PastaPastaPasta referenced this in commit 9c16a4122d on Aug 6, 2021
  80. kittywhiskers referenced this in commit de2b7eee83 on Aug 8, 2021
  81. kittywhiskers referenced this in commit 0e7fe9e6ab on Aug 11, 2021
  82. PastaPastaPasta referenced this in commit 90e7119a8b on Aug 11, 2021
  83. 5tefan referenced this in commit 8f4c933f3e on Aug 12, 2021
  84. DrahtBot locked this on Dec 16, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-09-28 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me