wip: Split fuzz binary #29010

pull dergoegge wants to merge 17 commits into bitcoin:master from dergoegge:2023-12-split-fuzz changing 42 files +2346 −2051
  1. dergoegge commented at 1:56 pm on December 6, 2023: member

    Closes #28971

    • Split each harness into its own file
    • Support compiling individual harness through CPPFLAGS="-DFUZZ_HARNESS=<harness name>"
    • Build individual binaries

    The cumulative size of the individual binaries (compiled with LTO) is 3.6GB vs 14GB when search and replacing std::getenv("FUZZ") (like we do in oss-fuzz).

    TODOs:

    • Introduce option for building individual binaries
    • Simplify building individual binaries (i.e. don’t hard code each binary in Makefile.test.include)
    • Deal with test/fuzz/{tx_pool, tx_package_eval, deserialize}.cpp
    • Introduce a “one harness per file” linter
  2. refactor, fuzz: Rename fuzz harness files 8266eca66c
  3. refactor, fuzz: Split mini_miner harnesses into separate files f2d3d3b9a7
  4. refactor, fuzz: Split out addrman utils 2facf07e91
  5. refactor, fuzz: Split addrman harnesses into separate files ff94c48c70
  6. refactor, fuzz: Split key fuzz harnesses into separate files c26f1a0cee
  7. refactor, fuzz: Move SimualtionTest to fuzz/util 2efafc0fa3
  8. refactor, fuzz: Split p2p transport harnesses into separate files 2e05eb4569
  9. refactor, fuzz: Split miniscript utils from harness 0867a2d04a
  10. refactor, fuzz: Split miniscript harnesses into separate files 9776fc4df7
  11. refactor, fuzz: Split ChaCha20SplitFuzz to fuzz crypto utils 897a65e526
  12. refactor, fuzz: Split chacha20 harnesses into separate files 36eec55b4d
  13. refactor, fuzz: Move TestDescriptor to descriptor fuzz utils 037470b99d
  14. refactor, fuzz: Split descriptor parse harnesses into separate files 14263e0fc4
  15. refactor, fuzz: Split poly1305 harnesses into separate files 071bb3c7fe
  16. NO MERGE: Remove mempool and deser harnesses 0d13e49693
  17. fuzz: Support compiling only one harness ff31cc16ad
  18. NO MERGE: build individual fuzz binaries 8118b24539
  19. DrahtBot commented at 1:56 pm on December 6, 2023: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage

    For detailed information about the code coverage, see the test coverage report.

    Reviews

    See the guideline for information on the review process. A summary of reviews will appear here.

  20. dergoegge commented at 1:59 pm on December 6, 2023: member

    cc @maflcko

    turns out we can actually keep the map if we just don’t add all the harness functions into it

  21. maflcko commented at 2:39 pm on December 6, 2023: member

    Nice, but I am not sure if we want to go down to route to put everyone back into the makefile hell, because this will just make writing new fuzz tests bloaty and impossible to maintain/review (Who is going to read those repetitive and ugly 1k LOC of build code). (Unrelated: Not even sure how this will interact with cmake)

    It seems easier to just put a one-line bash command into the readme to achieve the same, for anyone that needs it (oss-fuzz, fuzz-introspector, afl-lto, etc …).

    Happy to provide more feedback, but for now it would be good to explain why each target needs to be in a separate file.

    In C++ it should be possible to have more than one function in the same file. Also, given that FuzzFrameworkRegisterTarget isn’t called for the function pointer, LTO should see this and nuke the function?

  22. dergoegge commented at 2:46 pm on December 6, 2023: member
    I split the harnesses into individual files so that we can use the file list in src/test/fuzz as the harness list. It should be possible to use that to have a loop (?) in the makefile to build each binary instead of the hard-coded list I have here (I haven’t figured out how to do that yet). If that works, adding a new fuzz harness becomes easier as no makefile would need to be edited.
  23. maflcko commented at 3:06 pm on December 6, 2023: member

    turns out we can actually keep the map if we just don’t add all the harness functions into it

    Are you sure on this? I don’t know how afl or fuzz-introspector detect reachable code paths, but I’d be surprised if the static analysis can follow a function pointer hidden in a map.

  24. DrahtBot added the label Needs rebase on Dec 6, 2023
  25. DrahtBot commented at 3:55 pm on December 6, 2023: contributor

    🐙 This pull request conflicts with the target branch and needs rebase.

  26. dergoegge commented at 3:59 pm on December 6, 2023: member

    Are you sure on this? I don’t know how afl or fuzz-introspector detect reachable code paths, but I’d be surprised if the static analysis can follow a function pointer hidden in a map.

    Yea maybe not… should be quite easy to avoid the map.

    I’ll try to setup a fuzz-introspector instance at some point, I have a feeling it’ll still be a while before we can use oss-fuzz’s instance.

  27. ajtowns commented at 0:36 am on December 7, 2023: contributor

    The cumulative size of the individual binaries (compiled with LTO) is 3.6GB vs 14GB when search and replacing std::getenv("FUZZ") (like we do in oss-fuzz).

    What does this mean? I’m seeing file sizes of 448MB (with fuzzing enabled) and 146MB (with a normal build) for test/fuzz/fuzz.

    This seems like it’s reverting #20560?

  28. maflcko commented at 7:45 am on December 7, 2023: member

    This seems like it’s reverting #20560?

    yeah, in the current form this is reverting that pull, so I don’t think it can be merged. Though, I guess the goal is to somehow come up with magic makefile code (which can optionally be enabled) to extend the build code from the file names automatically.

  29. maflcko commented at 8:45 am on December 7, 2023: member
    I guess the “one fuzz target per file” makes the “one fuzz binary per message/rpc type” idea a bit harder to implement: #28015 (comment) ?
  30. dergoegge commented at 10:50 am on December 7, 2023: member

    What does this mean? I’m seeing file sizes of 448MB (with fuzzing enabled) and 146MB (with a normal build) for test/fuzz/fuzz.

    If you build this pull request you should see a binary per fuzz harness in src/test/fuzz (e.g. src/test/fuzz/process_message), as well as the usual fuzz binary src/test/fuzz/fuzz. If you sum up the size of the individual per harness binaries (assuming you compile with LTO) you should end up with roughly 3.6GB (maybe this depends on compiler version etc.). The 14GB is the size of src/test/fuzz/fuzz multiplied by the number harnesses we have (in this PR 127 harnesses), since that is what we have in oss-fuzz at the moment.

    How I compiled this PR to get these stats:

    0CC=afl-clang-lto CXX=afl-clang-lto++ ./configure --enable-fuzz
    1make
    
  31. in src/test/fuzz/fuzz.h:58 in 8118b24539
    61+    void name##_fuzz_target(FuzzBufferType);                                           \
    62+    struct name##_Before_Main {                                                        \
    63+        name##_Before_Main()                                                           \
    64+        {                                                                              \
    65+            if constexpr (should_compile_harness(#name)) {                             \
    66+                FuzzFrameworkRegisterTarget(#name, name##_fuzz_target, {__VA_ARGS__}); \
    


    maflcko commented at 10:57 am on December 7, 2023:
    Is the change in this header file needed for your approach? If there is only one fuzz target per file, and only one file is compiled, you wouldn’t need to check whether it needs to be compiled, no?

    dergoegge commented at 11:06 am on December 7, 2023:

    Right, I think once building all the individual binaries is optional, this could be used to build just one of the individual binaries, e.g.:

    0CPPFLAGS="-DFUZZ_HARNESS=process_message"  ./configure --enable-fuzz
    1make
    

    which would produce src/test/fuzz/fuzz that only has the process_message harness. Building and linking just one harness is faster than all of them, maybe useful when reproducing testcases?

    Maybe there is a better way of accomplishing this?


    maflcko commented at 11:13 am on December 7, 2023:

    Building and linking just one harness is faster than all of them, maybe useful when reproducing testcases?

    Not sure. This isn’t supported for any other tests (bench, unit, …), so maybe leave as a follow-up.

    If building takes a long time, you can use more CPUs or a populated ccache.

    Also, it could be done easier inside the makefile by just skipping over the other files, if needed?

  32. ajtowns commented at 10:26 pm on December 7, 2023: contributor

    What does this mean? I’m seeing file sizes of 448MB (with fuzzing enabled) and 146MB (with a normal build) for test/fuzz/fuzz.

    If you build this pull request you should see a binary per fuzz harness in src/test/fuzz (e.g. src/test/fuzz/process_message), as well as the usual fuzz binary src/test/fuzz/fuzz. If you sum up the size of the individual per harness binaries (assuming you compile with LTO) you should end up with roughly 3.6GB (maybe this depends on compiler version etc.).

    Hmm, if I switch to this PR and try compiling, I get bunches of linker errors about multiple definitions, eg

    /usr/bin/ld: test/fuzz/fuzz-addrman_serdeser.o: in function `std::iterator_traits<std::byte const*>::difference_type std::__distance<std::byte const*>(std::byte const*, std::byte const*, std::random_access_iterator_tag)':
    

    ./src/./tinyformat.h:666: multiple definition of `MakeV1Transport(long)’; test/fuzz/fuzz-addrman.o:./src/./tinyformat.h:666: first defined here

    Dropping from 14GB to 3.6GB seems worthwhile for oss-fuzz (and similar), but not worthwhile otherwise to me – it’s not compiling so I can’t say for sure, but 450MB bloating out to 3.6GB sounds terrible? Also LTO seems pretty slow, and not amenable to ccache…

    Shouldn’t it be possible to just setup test/fuzz/fuzz.cpp so it (alone) can be recompiled with FUZZ_HARNESS defined, and rely on LTO for everything else (rather than should_compile_harness)? Something like: https://github.com/ajtowns/bitcoin/tree/202312-fuzz-many-exes ?

    You could maybe use something like:

    0$ find -name '*.cpp' | sort | while read a; do grep ^FUZZ_TARGET $a | sed 's/^FUZZ_TARGET[_A-Z]*[(] *\([^,) ]*\) *[,)].*/\1/'; done
    

    to get a list of fuzz targets without having to compile the code first, or separate things into different files.

  33. dergoegge commented at 12:37 pm on December 11, 2023: member

    Hmm, if I switch to this PR and try compiling, I get bunches of linker errors about multiple definitions, eg

    Yea weird, it compiles for me with afl-clang-lto but not with regular clang.

    Dropping from 14GB to 3.6GB seems worthwhile for oss-fuzz (and similar), but not worthwhile otherwise to me

    I listed more benefits in #28971. It is not gonna have a noticeable effect for contributors that aren’t regularly fuzzing.

    to get a list of fuzz targets without having to compile the code first, or separate things into different files.

    The splitting is super straight forward though and the more maintainable thing IMO (your script does not work for e.g. test/fuzz/deserialize.cpp). If we are going to continue employing sed hacks, we can just keep doing that in oss-fuzz.

    I guess the goal is to somehow come up with magic makefile code (which can optionally be enabled) to extend the build code from the file names automatically.

    After consulting our build system gods, it looks like this is not possible with automake. It is apparently very easy to do with cmake though, so I guess we can consider this blocked until cmake is in.

  34. dergoegge commented at 11:55 am on February 1, 2024: member

    I’ll re-open this once we switched to cmake.

    I’ve rebased the commits on the current cmake staging branch and building individual binaries using cmake is quite easy (assuming there is one harness per file):

     0  file(GLOB fuzz_harness_file_list RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "./*.cpp")
     1  # Remove fuzz.cpp and util.cpp as they don't contain fuzz harnesses
     2  list(REMOVE_ITEM fuzz_harness_file_list "fuzz.cpp")
     3  list(REMOVE_ITEM fuzz_harness_file_list "util.cpp")
     4
     5  foreach(harness_file ${fuzz_harness_file_list})
     6    get_filename_component(harness ${harness_file} NAME_WLE)
     7    add_executable(fuzz_${harness} ${harness_file})
     8    target_compile_definitions(fuzz_${harness} PUBLIC SINGLE_FUZZ_HARNESS)
     9    target_link_libraries(fuzz_${harness}
    10      bitcoin_crypto # TODO ??? memory_cleanse undefined errors
    11      core_interface
    12      test_fuzz
    13      bitcoin_cli
    14      bitcoin_common
    15      minisketch
    16      leveldb
    17      univalue
    18      secp256k1
    19      Boost::headers
    20      libevent::libevent
    21      bitcoin_consensus
    22    )
    23  endforeach()
    
  35. dergoegge closed this on Feb 1, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-06-29 10:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me