test: Build fuzz targets into seperate executables

MarcoFalke commented at 4:59 pm on December 27, 2018: member

Currently our fuzzer is a single binary that decides on the first few bits of the buffer what target to pick. This is ineffective as the fuzzer needs to “learn” how the fuzz targets are organized and could get easily confused. Not to mention that the (seed) corpus can not be categorized by target, since targets might “leak” into each other. Also the corpus would potentially become invalid if we ever wanted to remove a target…

Solve that by building each fuzz target into their own executable.

MarcoFalke added the label Tests on Dec 27, 2018

MarcoFalke added the label Build system on Dec 27, 2018

MarcoFalke added the label Needs gitian build on Dec 27, 2018

MarcoFalke force-pushed on Dec 27, 2018

practicalswift commented at 5:39 pm on December 27, 2018: contributor

Concept ACK

Thanks for doing this!

MarcoFalke force-pushed on Dec 27, 2018

in src/test/test_bitcoin_fuzzy-blockheader_deserialize.cpp:16 in 0293c5d91e outdated

11+    CDataStream ds(buffer, SER_NETWORK, INIT_PROTO_VERSION);
12+    try {
13+        int nVersion;
14+        ds >> nVersion;
15+        ds.SetVersion(nVersion);
16+    } catch (const std::ios_base::failure& e) {

practicalswift commented at 10:33 am on December 28, 2018:

Drop the unused e variable. Applies throughout this PR.

in src/test/test_bitcoin_fuzzy-transaction_deserialize.cpp:9 in 0293c5d91e outdated

0@@ -0,0 +1,25 @@
1+// Copyright (c) 2009-2018 The Bitcoin Core developers
2+// Distributed under the MIT software license, see the accompanying
3+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
4+
5+#include <primitives/transaction.h>
6+
7+#include <test/test_bitcoin_fuzzy.hpp>
8+
9+static void test_one_input(std::vector<uint8_t> buffer)

practicalswift commented at 10:35 am on December 28, 2018:

Make buffer const reference. Applies throughout this PR :-)

in src/test/test_bitcoin_fuzzy.hpp:51 in 0293c5d91e outdated

46+    test_one_input(std::vector<uint8_t>(data, data + size));
47+    return 0;
48+}
49+
50+// This function is used by libFuzzer
51+extern "C" int LLVMFuzzerInitialize(int* argc, char*** argv)

practicalswift commented at 10:36 am on December 28, 2018:

Nit: Could drop unused variable names argc and argv.

fanquake commented at 10:46 am on December 28, 2018:

Related: https://llvm.org/docs/LibFuzzer.html#startup-initialization

MarcoFalke commented at 11:25 am on December 28, 2018: member

Would be nice if someone could look at the build system changes, since the code is mostly just moved around.

DrahtBot commented at 11:56 am on December 28, 2018: member

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#14912 ([WIP] External signer support (e.g. hardware wallet) by Sjors)
#13989 (add avx512 instrinsic by fingera)
#10443 (Add fee_est tool for debugging fee estimation code by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

MarcoFalke force-pushed on Dec 31, 2018

laanwj commented at 11:53 am on January 2, 2019: member

Concept ACK

I think this should be behind a configure flag, building so many executables is going to be slow when statically linking, or with slow filesystems, and it contributes nothing to testing unless the user is planning to do fuzzing.

cjdelisle commented at 12:35 pm on January 2, 2019: none

As a non-stakeholder, feel free to ignore, but as someone who is using the same test methodology, I would like to understand why you’re proposing to change.

I have a number of reasons why I tend to prefer a single fuzz entry point, but I would like to know if there is any evidence that putting the test-case in the data can harm the fuzzer’s ability to efficiently find paths.

My reasoning is as follows:

Personally, I find it better to allow someone with a large machine to be able to run a simple command and begin fuzzing with minimal effort, if each case is fuzzed separately then this is not really practical.
Secondly, I personally am a big fan of maximum fuzz coverage, which means any test that can plausibly make use of randomized data should be accessible from the fuzzer. So clearly I would not want to require people to launch each test separately.
It seems like if one would like to fuzz a particular test case, their need can be easily supported by allowing them to pass an argument to the running which causes it to return 0 if the test is specified to anything else, thus the fuzzer will quickly prune attempts which touch other test cases.
If there is a large corpus of generated test data, it seems that one could use a simple python script to rename the test files to contain the name of the test case which they’re touching, solving the concern about organization.

Thanks, Caleb

Crypt-iQ commented at 9:25 pm on January 2, 2019: contributor

Hey @cjdelisle,

I think the fuzz tests should be split up. AFL (and other fuzzers) can splice test cases with one another and for AFL, this can discover 20% additional execution paths. From my understanding, this can cause efficiency problems if the fuzzer is not fuzzing just one target.

If we have two inputs, input A meant for address_deserialize and input B meant for banentry_deserialize and they are spliced together to create input C (which is a nonsense input), input C could be selected to be in the queue for further mutation if it provides the same edge coverage as input A or B (depending on its prefix) and cause even more of an efficiency problem. The splicing mutation is really meant to be used on similar, but diverse test cases.

Also I think seeding the fuzzer with “bad” corpus inputs in general isn’t a good idea because of efficiency and because of the splicing issue, even if we are only running one test and all others are disabled with a flag.

Anyways I’m pretty much a noob to fuzzing, but this is what I could gather from reading the AFL technical_notes, lcamtuf’s blog, and the discussion on https://github.com/bitcoin/bitcoin/issues/11045

practicalswift commented at 11:29 pm on January 2, 2019: contributor

@cjdelisle See @kcc:s comment in #11045 (comment) :-)

cjdelisle commented at 8:25 am on January 3, 2019: none

Thanks @Crypt-iQ and @practicalswift , for those who weren’t following carefully, my understanding is that it is the recommendation of Google’s OSS-Fuzz project that fuzz targets should be broken up ( https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md ) because:

Some of the fuzzers have O(executable_size) performance
Forcing all fuzzing to enter through one entry-point multiplies the number of possible search paths and fuzzers have limited memory

So I hereby reverse my opinion, because any kind of ease of management is negated by the fact that it’s always better to do what works.

Crypt-iQ commented at 3:34 pm on January 3, 2019: contributor

I think that the fuzzing targets should be run on the corpus as part of regression testing. This would require the corpus to be included in this project. Is there a reason why it’s not currently included? Maybe this can be done in another PR.

See @kcc comment: #11045 (comment)

OSS Fuzz recommendations: https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md#regression-testing

in doc/fuzzing.md:29 in f0d75314cb outdated

25@@ -26,7 +26,7 @@ To build Bitcoin Core using AFL instrumentation (this assumes that the
26 ./configure --disable-ccache --disable-shared --enable-tests CC=${AFLPATH}/afl-gcc CXX=${AFLPATH}/afl-g++
27 export AFL_HARDEN=1
28 cd src/
29-make test/test_bitcoin_fuzzy
30+make -C src/test

Crypt-iQ commented at 4:11 pm on January 4, 2019:

User is already in the src directory so this will fail. I ran make with no arguments in the src directory and it built the fuzz targets.

MarcoFalke commented at 7:19 pm on January 14, 2019:

Done (changed it to make)

laanwj commented at 1:45 pm on January 8, 2019: member

I think that the fuzzing targets should be run on the corpus as part of regression testing. This would require the corpus to be included in this project. Is there a reason why it’s not currently included?

The idea is to have a separate repository with the corpus. The problem with including it in the main repository, besides taking up space, is that e.g. all changes have to go through the bottleneck of maintainers here.

Crypt-iQ commented at 3:28 pm on January 8, 2019: contributor

@laanwj Ok that makes sense. I think the corpus should be split up into directories by message type / fuzzing target to avoid erroneous feedback while fuzzing.

laanwj commented at 6:59 pm on January 9, 2019: member

Probably want to remove the links to test inputs from doc/fuzzing.md here, as they’ll no longer work as-is and one of the links is broken (#15028).

DrahtBot added the label Needs rebase on Jan 10, 2019

DrahtBot removed the label Needs gitian build on Jan 11, 2019

MarcoFalke force-pushed on Jan 14, 2019

MarcoFalke added the label Needs gitian build on Jan 14, 2019

MarcoFalke deleted a comment on Jan 14, 2019

DrahtBot removed the label Needs rebase on Jan 14, 2019

DrahtBot removed the label Needs gitian build on Jan 15, 2019

MarcoFalke deleted a comment on Jan 15, 2019

MarcoFalke force-pushed on Jan 15, 2019

MarcoFalke force-pushed on Jan 16, 2019

MarcoFalke commented at 4:20 am on January 16, 2019: member

@laanwj The previous corpus can still be used by stripping the first few (4?) bytes off of the seeds. I will do that (and move them to a separate repo) after this pull has been merged.

MarcoFalke commented at 7:19 pm on January 24, 2019: member

This hasn’t received any review, but I am going to merge it tomorrow unless someone objects to that.

sipa commented at 0:43 am on January 25, 2019: member

Wouldn’t it be easier to accomplish this by having a since source file for all fuzz target, but with a macro define (-D… argument) to select which fuzzer is invoked by main?

MarcoFalke commented at 1:58 am on January 25, 2019: member

@sipa Wouldn’t that require to pass in and compile for each fuzz target? That seems to move the verbosity to the user/ci script/fuzz runner.

Also having a single source file makes it harder to see what each single fuzz test does (and depends on).

sipa commented at 4:32 am on January 25, 2019: member

@MarcoFalke The build system would automatically build all of them, supplying the necessary -D arguments for each binary.

This would avoid all the boilerplate in all those source files.

MarcoFalke force-pushed on Jan 25, 2019

MarcoFalke commented at 11:07 pm on January 25, 2019: member

Ah, I see. Done and removed 400 lines of boilerplate and headers.

MarcoFalke force-pushed on Jan 25, 2019

[test] fuzz: make test_one_input return void

The return value is always 0 and not used, so might as well return void

fab4bed68a

MarcoFalke force-pushed on Jan 26, 2019

MarcoFalke commented at 0:10 am on January 26, 2019: member

Now split into two commits, where the top commit is some move-only:

0git diff 2ca632e5b44a8385989c8539cc4e30e60fdee16c~ 2ca632e5b44a8385989c8539cc4e30e60fdee16c --color-moved=dimmed-zebra src/test

MarcoFalke added the label Needs gitian build on Jan 26, 2019

DrahtBot removed the label Needs gitian build on Jan 28, 2019

test: Build fuzz targets into seperate executables 2ca632e5b4

in configure.ac:150 in 02606835e6 outdated

145@@ -147,6 +146,11 @@ AC_ARG_ENABLE([extended-functional-tests],
146     [use_extended_functional_tests=$enableval],
147     [use_extended_functional_tests=no])
148 
149+AC_ARG_ENABLE([fuzz],
150+    AS_HELP_STRING([--enable-fuzz],[enable building of fuzz targets (default no)]),

practicalswift commented at 7:08 pm on January 28, 2019:

Bikeshedding perhaps but --enable-fuzzing feels more natural to me than --enable-fuzz.

MarcoFalke force-pushed on Jan 30, 2019

MarcoFalke deleted a comment on Jan 30, 2019

in src/Makefile.test.include:177 in 2ca632e5b4

194+if ENABLE_FUZZ
195+test_fuzz_block_deserialize_SOURCES = $(FUZZ_SUITE) test/test_bitcoin_fuzzy.cpp
196+test_fuzz_block_deserialize_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES) -DBLOCK_DESERIALIZE=1
197+test_fuzz_block_deserialize_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS)
198+test_fuzz_block_deserialize_LDFLAGS = $(RELDFLAGS) $(AM_LDFLAGS) $(LIBTOOL_APP_LDFLAGS)
199+test_fuzz_block_deserialize_LDADD = \

ryanofsky commented at 6:38 pm on January 30, 2019:

You could deduplicate some boilerplate by defining common variables like:

0fuzz_common_ldflags = $(RELDFLAGS) $(AM_LDFLAGS) $(LIBTOOL_APP_LDFLAGS)
1fuzz_common_ldadd = $(LIBUNIVALUE) $(LIBBITCOIN_SERVER) $(LIBBITCOIN_COMMON)

and then individual targets could be shortened to:

0test_fuzz_transaction_deserialize_LDFLAGS = $(fuzz_common_ldflags)
1test_fuzz_transaction_deserialize_LDADD = $(fuzz_common_ldadd)
2...
3
4test_fuzz_block_deserialize_LDFLAGS = $(fuzz_common_ldflags)
5test_fuzz_block_deserialize_LDADD = $(fuzz_common_ldadd)
6...

At least this is what I did in 2060a30063866ed1599134fe28846a69deda773c for #10102. One catch is that LDFLAGS, LDADD, etc suffix can’t be capitalized or the variables will be treated specially by automake.

in src/test/test_bitcoin_fuzzy.cpp:40 in 2ca632e5b4

84     }
85 
86-    switch(test_id) {
87-        case CBLOCK_DESERIALIZE:
88-        {
89+#if BLOCK_DESERIALIZE

ryanofsky commented at 6:43 pm on January 30, 2019:

I think it would make things less complicated just to use separate source files, rather than using these preprocessor defines. These defines don’t seem to really decrease duplication, but I guess they they do have the advantage of making it easy to see the different test cases all in one file.

MarcoFalke commented at 8:00 pm on January 30, 2019:

This was my initial solution, but I changed it back to minimize the diff

ryanofsky approved

ryanofsky commented at 6:47 pm on January 30, 2019: member

utACK 2ca632e5b44a8385989c8539cc4e30e60fdee16c

laanwj merged this on Jan 30, 2019

laanwj closed this on Jan 30, 2019

laanwj referenced this in commit 9553102c38 on Jan 30, 2019

MarcoFalke deleted the branch on Jan 30, 2019

random-zebra referenced this in commit 44b5327e61 on May 28, 2021

kittywhiskers referenced this in commit 8614068123 on Aug 2, 2021

kittywhiskers referenced this in commit cdd3ee5626 on Aug 5, 2021

kittywhiskers referenced this in commit cacbbffa90 on Aug 5, 2021

PastaPastaPasta referenced this in commit 9c16a4122d on Aug 6, 2021

kittywhiskers referenced this in commit de2b7eee83 on Aug 8, 2021

kittywhiskers referenced this in commit 0e7fe9e6ab on Aug 11, 2021

PastaPastaPasta referenced this in commit 90e7119a8b on Aug 11, 2021

5tefan referenced this in commit 8f4c933f3e on Aug 12, 2021

DrahtBot locked this on Dec 16, 2021

test: Build fuzz targets into seperate executables #15043

Conflicts