fuzz: Link all targets once #20560

pull MarcoFalke wants to merge 2 commits into bitcoin:master from MarcoFalke:2012-fuzzLinkOnce changing 100 files +476 −1307
  1. MarcoFalke commented at 4:23 pm on December 3, 2020: member

    Currently the linker is invoked more than 150 times when compiling with --enable-fuzz. This is problematic for several reasons:

    • It wastes disk space north of 20 GB, as all libraries and sanitizers are linked more than 150 times
    • It wastes CPU time, as the link step can practically not be cached (similar to ccache for object files)
    • It makes it a blocker to compile the fuzz tests by default for non-fuzz builds #19388, for the aforementioned reasons
    • The build file is several thousand lines of code, without doing anything meaningful except listing each fuzz target in a highly verbose manner
    • It makes writing new fuzz tests unnecessarily hard, as build system knowledge is required; Compare that to boost unit tests, which can be added by simply editing an existing cpp file
    • It encourages fuzz tests that re-use the buffer or assume the buffer to be concatenations of seeds, which increases complexity of seeds and complexity for the fuzz engine to explore; Thus reducing the effectiveness of the affected fuzz targets

    Fixes #20088

  2. MarcoFalke added the label Tests on Dec 3, 2020
  3. sipa commented at 5:57 pm on December 3, 2020: member
    Concept ACK, under the condition that we can do some basic testing that this doesn’t meaningfully affect the speed at which fuzzers find issues.
  4. MarcoFalke commented at 5:59 pm on December 3, 2020: member
    The cost is dereferencing one pointer (to a function). I highly doubt that this affects performance, but I am happy to test.
  5. sipa commented at 6:00 pm on December 3, 2020: member

    The cost is dereferencing one pointer (to a function). I highly doubt that this affects performance, but I am happy to test.

    That seems entirely reasonable, and is my expectation too. But given the pervasive “one binary per test” recommendation, I’d rather make sure.

  6. MarcoFalke force-pushed on Dec 3, 2020
  7. DrahtBot commented at 0:30 am on December 4, 2020: member

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #20516 (Well-defined CAddress disk serialization, and addrv2 anchors.dat by sipa)
    • #20487 (draft: Add syscall sandboxing using seccomp-bpf (Linux secure computing mode) by practicalswift)
    • #20457 (util: Make Parse{Int,UInt}{32,64} use locale independent std::from_chars(…) (C++17) instead of locale dependent strto{l,ll,ul,ull} by practicalswift)
    • #20452 (util: Replace use of locale dependent atoi(…) with locale-independent std::from_chars(…) (C++17) by practicalswift)
    • #20377 (fuzz: Fill various small fuzzing gaps by practicalswift)
    • #19972 ([draft] fuzz/net: Add fuzzing harness for node eviction logic. Make node eviction logic testable. by practicalswift)
    • #19920 (test: Fuzzing siphash against reference implementation [Request for feedback] by elichai)
    • #19415 (net: Make DNS lookup mockable, add fuzzing harness by practicalswift)
    • #19288 (fuzz: Add fuzzing harness for TorController by practicalswift)
    • #19259 (fuzz: Add fuzzing harness for LoadMempool(…) and DumpMempool(…) by practicalswift)
    • #19203 (net: Add regression fuzz harness for CVE-2017-18350. Add FuzzedSocket. Add thin SOCKET wrapper. by practicalswift)
    • #19055 (Add MuHash3072 implementation by fjahr)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  8. MarcoFalke commented at 6:32 am on December 4, 2020: member
    @RandyMcMillan Good catch on macOS. Should be fixed now (hopefully)
  9. DrahtBot commented at 9:50 am on December 4, 2020: member

    🕵️ @achow101 @sipa @practicalswift @harding have been requested to review this pull request as specified in the REVIEWERS file.

  10. curtcurt380 approved
  11. laanwj commented at 6:59 pm on December 4, 2020: member
    I’m very much in favor of this with regard to organization. The large number of binaries was bothering me. That said, I know nothing about fuzzing, so only ACK if it doesn’t make fuzzing less useful.
  12. MarcoFalke force-pushed on Dec 5, 2020
  13. MarcoFalke commented at 7:58 am on December 5, 2020: member

    Benchmarks

    Compilation

    Measured was wall clock time and disk usage of the full build pipeline with a warm ccache: autogen, configure with ccache, make clean, make -j 9.

    master this pull
    --with-sanitizers=address,fuzzer,undefined 3m8.335s @ 15G 0m56.090s @ 350M
    --with-sanitizers=fuzzer 2m5.419s @ 12G 0m54.408s @ 274M

    Running CI

    Measured was the time to iterate over all seeds, typically done by CI:

    ./test/fuzz/test_runner.py -j 9

    master this pull
    --with-sanitizers=address,fuzzer,undefined 10m4.286s 10m13.070s
    --with-sanitizers=fuzzer 1m45.550s 1m48.252s

    Coverage

    Coverage decreased slightly because the tinyformat fuzzer had to be renamed, should recover once the seed dir is renamed as well.

    Generating seeds

    Measured was the number of iterations required to find a synthetic bug, typically done on a fuzzing farm (dedicated hardware). The bug used:

     0diff --git a/src/net_processing.cpp b/src/net_processing.cpp
     1index ec5400c3d8..f15a7a990a 100644
     2--- a/src/net_processing.cpp
     3+++ b/src/net_processing.cpp
     4@@ -2696,6 +2696,7 @@ void PeerManager::ProcessMessage(CNode& pfrom, const std::string& msg_type, CDat
     5                     return;
     6                 } else if (!fAlreadyHave && !m_chainman.ActiveChainstate().IsInitialBlockDownload()) {
     7                     AddTxAnnouncement(pfrom, gtxid, current_time);
     8+                    Assert(false);
     9                 }
    10             } else {
    11                 LogPrint(BCLog::NET, "Unknown inv type \"%s\" received from peer=%d\n", inv.ToString(), pfrom.GetId());
    

    process_message_inv harness:

    master this pull
    --with-sanitizers=address,fuzzer,undefined -use_value_profile=1 master_san_val this_san_val
    --with-sanitizers=address,fuzzer,undefined -use_value_profile=0 master_san_noval this_san_noval
    --with-sanitizers=fuzzer -use_value_profile=1 master_nosan_val this_nosan_val
    --with-sanitizers=fuzzer -use_value_profile=0 more than 11kk (aborted) more than 11kk (aborted)
  14. MarcoFalke force-pushed on Dec 5, 2020
  15. MarcoFalke commented at 2:26 pm on December 5, 2020: member
    benchmarks updated with images
  16. sipa commented at 8:35 am on December 6, 2020: member

    Doing a benchmark as well.

    Error introducing patch (similar to a bug I had during development of #19988):

     0diff --git a/src/txrequest.cpp b/src/txrequest.cpp
     1index e54c073328..8a68e4fd8a 100644
     2--- a/src/txrequest.cpp
     3+++ b/src/txrequest.cpp
     4@@ -553,16 +553,17 @@ public:
     5             //   In other words, the situation where std::next(it) is deleted can only occur if std::next(it)
     6             //   belongs to a different peer but the same txhash as 'it'. This is covered by the first bulletpoint
     7             //   already, and we'll have set it_next to end().
     8-            auto it_next = (std::next(it) == index.end() || std::next(it)->m_peer != peer) ? index.end() :
     9-                std::next(it);
    10             // If the announcement isn't already COMPLETED, first make it COMPLETED (which will mark other
    11             // CANDIDATEs as CANDIDATE_BEST, or delete all of a txhash's announcements if no non-COMPLETED ones are
    12             // left).
    13             if (MakeCompleted(m_index.project<ByTxHash>(it))) {
    14                 // Then actually delete the announcement (unless it was already deleted by MakeCompleted).
    15-                Erase<ByPeer>(it);
    16+                it = Erase<ByPeer>(it);
    17+            } else {
    18+                it = std::next(it);
    19             }
    20-            it = it_next;
    21         }
    22     }
    

    Number of iterations until crash is detected (avg +- stddev measured over 250000+ crashes for each):

    • 751ffaabad82f7904fd1d9742a0b323a0ab7bfee (master), -use_value_profile=0: 786.8 +- 529.2
    • fac1885e4ceb622413b1d4609ba6c860a4af3e74 (this PR), -use_value_profile=0: 812.4 +- 536.1
    • 751ffaabad82f7904fd1d9742a0b323a0ab7bfee (master), -use_value_profile=1: 1452.4 +- 1180.3
    • fac1885e4ceb622413b1d4609ba6c860a4af3e74 (this PR), -use_value_profile=1: 1362.0 +- 1126.7
  17. sipa commented at 8:03 pm on December 6, 2020: member

    Updated my numbers. It appears that there is a (statistically) significant difference in iteration count, but for -use_value_profile=0 it’s in the other direction than -use_value_profile=1. So I’m going to assume it’s just due to arbitrary alignment changes in the binary or so.

    Concept ACK. I’m not concerned anymore about the impact on fuzzing speed. Will review the code changes soon.

  18. fuzz: Link all targets once 44444ba759
  19. in test/fuzz/test_runner.py:206 in fac1885e4c outdated
    202@@ -194,7 +203,7 @@ def job(command):
    203             "-runs=100000",
    204             target_seed_dir,
    205         ]
    206-        futures.append(fuzz_pool.submit(job, command))
    207+        futures.append(fuzz_pool.submit(job, command, target))
    


    sipa commented at 3:41 am on December 10, 2020:
    Command here is still the old binary path.

    MarcoFalke commented at 6:16 am on December 10, 2020:
    nice catch. Fixed
  20. MarcoFalke force-pushed on Dec 10, 2020
  21. MarcoFalke commented at 7:49 am on December 10, 2020: member
    @practicalswift Mind taking a look here as well?
  22. practicalswift commented at 4:46 pm on December 11, 2020: contributor

    @practicalswift Mind taking a look here as well?

    I think the addition of an option to link all targets once is a great idea for the reasons you mentioned.

    My only concern is that this removes the ability to link the targets separately.

    That is problematic since the de facto standard in fuzzing is the one-binary-per-harness mode like we currently use: all fuzzing management platforms I know of assume such structure. They typically don’t support setting special environment variables or wrapping binaries in shell scripts that would set such environment variables.

    There are two main ways to fuzz:

    • Case 1. Quick ad-hoc execution of a single harness: typically performed manually by a developer testing a specific part of the code base or a specific PR
    • Case 2. Continuous long term fuzzing of all harnesses in a project: typically performed automatically using a fuzzing platform

    This PR is great for case 1 but breaks case 2.

    Hopefully we can find a way to make case 1 nicer without breaking case 2?

    FWIW the issues I’ve found and reported to security@ which were found via fuzzing have mostly been found by continuous long term fuzzing (case 2 above).

  23. MarcoFalke commented at 5:14 pm on December 11, 2020: member
    I am using test/fuzz/test_runner.py to do “case 2” continuous fuzzing and it works just fine. If there is something broken specifically, it would be good to explain how to reproduce the failure and we can take it from there.
  24. MarcoFalke commented at 6:52 pm on December 11, 2020: member

    To address your feedback, I’ve written a oneline bash script to link each target individually.

    0for f in $(git grep --extended-regexp  'FUZZ_TARGET\(.*)$' | sed --regexp-extended 's/.*FUZZ_TARGET.(.*).$/\1/g'); do echo "Compiling target $f ..." && git reset --hard HEAD && sed -i "s/std::getenv(\"FUZZ\")/\"$f\"/g" ./src/test/fuzz/fuzz.cpp && make -j 9 && mv ./src/test/fuzz/fuzz ./src/test/fuzz/$f ; done
    
  25. practicalswift commented at 12:15 pm on December 12, 2020: contributor

    To address your feedback, I’ve written a oneline bash script to link each target individually.

    0for f in $(git grep --extended-regexp  'FUZZ_TARGET\(.*)$' | sed --regexp-extended 's/.*FUZZ_TARGET.(.*).$/\1/g'); do echo "Compiling target $f ..." && git reset --hard HEAD && sed -i "s/std::getenv(\"FUZZ\")/\"$f\"/g" ./src/test/fuzz/fuzz.cpp && make -j 9 && mv ./src/test/fuzz/fuzz ./src/test/fuzz/$f ; done
    

    That’s a really nice idea: thanks! :)

    That gives us the desired property that find src/test/fuzz/ -type f -executable finds all fuzzing harnesses (which we depend on in our FuzzBuzz integration as an example: see find_targets_command in https://github.com/bitcoin/bitcoin/blob/master/.fuzzbuzz.yml).

    What about making it so that ./configure --enable-fuzz --enable-one-binary-per-fuzzing-harness --with-sanitizers=address,fuzzer,undefined preserves the current behaviour?

    If so everyone should be happy regardless of type of fuzzing platform used, and this PR would not break anyones workflow (+/- changing a single configure parameter). We can assume that everyone currently fuzzing Bitcoin Core using a fuzzing platform is using a platform that supports changing configure parameters easily since that is already needed for -enable-fuzz and --with-sanitizers=address,fuzzer,undefined :)

     0$ gh pr checkout 20560
     1$ make distclean
     2$ ./autogen.sh
     3$ CC=clang CXX=clang++ ./configure --enable-fuzz --with-sanitizers=address,fuzzer,undefined
     4$ make
     5$ find src/test/fuzz/ -type f -executable
     6src/test/fuzz/fuzz
     7$ cat contrib/devtools/build_one_binary_per_fuzzing_harness.sh
     8#!/bin/bash
     9
    10set -e
    11git checkout src/test/fuzz/fuzz.cpp
    12for FUZZING_HARNESS in $(git grep --extended-regexp 'FUZZ_TARGET\(.*)$' | sed --regexp-extended 's/.*FUZZ_TARGET.(.*).$/\1/g' | sort -u); do
    13    echo "Building src/test/fuzz/${FUZZING_HARNESS} ..."
    14    sed -i "s/std::getenv(\"FUZZ\")/\"${FUZZING_HARNESS}\"/g" src/test/fuzz/fuzz.cpp
    15    make -j "$(nproc)" > /dev/null
    16    git checkout src/test/fuzz/fuzz.cpp
    17    mv src/test/fuzz/fuzz "src/test/fuzz/${FUZZING_HARNESS}"
    18done
    19echo "Successfully built all targets."
    20$ contrib/devtools/build_one_binary_per_fuzzing_harness.sh
    21Building src/test/fuzz/addition_overflow ...
    22Building src/test/fuzz/addrdb ...
    23Building src/test/fuzz/asmap ...
    24Building src/test/fuzz/asmap_direct ...
    25Building src/test/fuzz/autofile ...
    2627Successfully built all targets.
    28$ find src/test/fuzz/ -type f -executable | sort -u | head -5
    29src/test/fuzz/addition_overflow
    30src/test/fuzz/addrdb
    31src/test/fuzz/asmap
    32src/test/fuzz/asmap_direct
    33src/test/fuzz/autofile
    
  26. MarcoFalke force-pushed on Dec 14, 2020
  27. MarcoFalke commented at 3:00 pm on December 14, 2020: member
    Addressed all feedback. Anything left to do here?
  28. MarcoFalke force-pushed on Dec 14, 2020
  29. build: Add option --enable-danger-fuzz-link-all fa13e1b0c5
  30. MarcoFalke force-pushed on Dec 14, 2020
  31. jonatack commented at 5:13 pm on December 14, 2020: member
    Began looking over the code. Concept ACK.
  32. practicalswift commented at 7:39 pm on December 14, 2020: contributor

    Thanks for adding --enable-danger-fuzz-link-all for backwards compatibility!

    Concept ACK

    Will review.

  33. practicalswift commented at 4:57 pm on December 15, 2020: contributor

    Tested ACK fa13e1b0c52738492310b6b421d8e38cb04da5b1

    Great work! Thanks! ❤️

     0$ gh pr checkout 20560
     1$ make distclean
     2$ ./autogen.sh
     3$ CC=clang CXX=clang++ ./configure --enable-fuzz --with-sanitizers=address,fuzzer,undefined
     4$ make
     5$ find src/test/fuzz/ -type f -executable
     6src/test/fuzz/fuzz
     7$ PRINT_ALL_FUZZ_TARGETS_AND_ABORT=1 src/test/fuzz/fuzz | head -10
     8addition_overflow
     9addr_info_deserialize
    10addrdb
    11address_deserialize
    12addrman
    13addrman_deserialize
    14asmap
    15asmap_direct
    16autofile
    17banentry_deserialize
    18$ FUZZ=asmap src/test/fuzz/fuzz
    19INFO: Running with entropic power schedule (0xFF, 100).
    20INFO: Seed: 1585518257
    21INFO: Loaded 1 modules   (354516 inline 8-bit counters): 354516 [0x564092304428, 0x56409235acfc),
    22INFO: Loaded 1 PC tables (354516 PCs): 354516 [0x56409235ad00,0x5640928c3a40),
    23INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
    24INFO: A corpus is not provided, starting from an empty corpus
    25[#2](/bitcoin-bitcoin/2/)      INITED cov: 76 ft: 77 corp: 1/1b exec/s: 0 rss: 96Mb
    2627$ make clean
    28$ CC=clang CXX=clang++ ./configure --enable-fuzz --with-sanitizers=address,fuzzer,undefined --enable-danger-fuzz-link-all
    29$ make
    30$ find src/test/fuzz/ -type f -executable | sort -u | head -10
    31src/test/fuzz/addition_overflow
    32src/test/fuzz/addrdb
    33src/test/fuzz/address_deserialize
    34src/test/fuzz/addr_info_deserialize
    35src/test/fuzz/addrman
    36src/test/fuzz/addrman_deserialize
    37src/test/fuzz/asmap
    38src/test/fuzz/asmap_direct
    39src/test/fuzz/autofile
    40src/test/fuzz/banentry_deserialize
    41$ src/test/fuzz/asmap
    42INFO: Running with entropic power schedule (0xFF, 100).
    43INFO: Seed: 2730480136
    44INFO: Loaded 1 modules   (354515 inline 8-bit counters): 354515 [0x555b6b9523e8, 0x555b6b9a8cbb),
    45INFO: Loaded 1 PC tables (354515 PCs): 354515 [0x555b6b9a8cc0,0x555b6bf119f0),
    46INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
    47INFO: A corpus is not provided, starting from an empty corpus
    48[#2](/bitcoin-bitcoin/2/)      INITED cov: 76 ft: 77 corp: 1/1b exec/s: 0 rss: 96Mb
    49
  34. MarcoFalke commented at 5:07 pm on December 15, 2020: member
    @sipa Mind to re-ACK?
  35. sipa commented at 5:41 pm on December 15, 2020: member

    ACK fa13e1b0c52738492310b6b421d8e38cb04da5b1. Reviewed the code changes, and tested the 3 different test_runner.py modes (run once, merge, generate). I also tested building with the new –enable-danger-fuzz-link-all

    As a potential follow-up, if there is interesting in building the separate-binary strategy faster it may be useful to instead of modifying the source code in place, make the script create copies of fuzz.cpp, plus an alternative Makefile to build them all. That’d avoid messing with the source tree, and permit building all the target binaries in parallel.

  36. MarcoFalke merged this on Dec 15, 2020
  37. MarcoFalke closed this on Dec 15, 2020

  38. MarcoFalke deleted the branch on Dec 15, 2020
  39. MarcoFalke commented at 6:05 pm on December 15, 2020: member
    Just for reference (not sure if you noticed during review), one of the targets had to be renamed, which is why I also pushed https://github.com/bitcoin-core/qa-assets/commit/70083a813b9cb1c226d480be25996e5f2567e024
  40. sidhujag referenced this in commit cf56ca50c1 on Dec 15, 2020
  41. dongcarl commented at 9:58 pm on December 18, 2020: member

    Is it expected behaviour that

    0./configure --enable-fuzz CC=clang CXX=clang++
    1make
    

    no longer works, and now requires the --with-sanitizers=address,fuzzer,undefined configure flag to work?

    The error I’m seeing during make:

    0/usr/bin/ld: /usr/bin/ld: DWARF error: could not find variable specification at offset c4
    1libtest_fuzz.a(libtest_fuzz_a-fuzz.o): in function `unsigned long const& std::max<unsigned long>(unsigned long const&, unsigned long const&)':
    2/home/dongcarl/src/bitcoin/master/src/./test/fuzz/fuzz.h:18: multiple definition of `FuzzFrameworkEmptyFun()'; test/fuzz/fuzz-addition_overflow.o:/home/dongcarl/src/bitcoin/master/src/./test/fuzz/fuzz.h:18: first defined here
    
  42. jonatack commented at 11:13 pm on December 18, 2020: member
    It’s late and I could be misremembering but I think that flag was always needed.
  43. MarcoFalke commented at 6:01 am on December 19, 2020: member

    @dongcarl I think this is a bug (typo). Functions with a body in a header file need to be inline unless they are member functions, because those are already inline.

    So you might be able to fix it by adding inline.

  44. MarcoFalke referenced this in commit e9efb64a07 on Dec 21, 2020
  45. sidhujag referenced this in commit a26447cc8f on Dec 21, 2020
  46. MarcoFalke referenced this in commit 6e70674cda on Jan 3, 2021
  47. PastaPastaPasta referenced this in commit bacdd0914b on Jun 27, 2021
  48. PastaPastaPasta referenced this in commit 0e64719ca8 on Jun 28, 2021
  49. PastaPastaPasta referenced this in commit 3889f921e8 on Jun 29, 2021
  50. PastaPastaPasta referenced this in commit c459c8c0ac on Jul 1, 2021
  51. PastaPastaPasta referenced this in commit 7d6c7e6151 on Jul 1, 2021
  52. PastaPastaPasta referenced this in commit f4efea7ecf on Sep 17, 2021
  53. PastaPastaPasta referenced this in commit f01f7603ce on Sep 19, 2021
  54. thelazier referenced this in commit d331e2b359 on Sep 25, 2021
  55. DrahtBot locked this on Aug 16, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-11-17 09:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me