fuzz: Make partially_downloaded_block more deterministic #32158

pull maflcko wants to merge 5 commits into bitcoin:master from maflcko:2503-fuzz-det changing 3 files +89 −30
  1. maflcko commented at 2:04 pm on March 28, 2025: member

    This should make the partially_downloaded_block fuzz target even more deterministic.

    Follow-up to #31841. Tracking issue: #29018.

    This bundles several changes:

    • First, speed up the deterministic-fuzz-coverage helper by introducing parallelism.
    • Then, a fix to remove spawned test threads or spawn them deterministically. (While testing this, high parallelism and thread contention may be needed)

    Testing

    It can be tested via (setting 32 parallel threads):

    0cargo run --manifest-path ./contrib/devtools/deterministic-fuzz-coverage/Cargo.toml -- $PWD/bld-cmake/ $PWD/../b-c-qa-assets/fuzz_corpora/ partially_downloaded_block 32
    

    Locally, on a failure, the output would look like:

    0 ....
    1-  150|      0|            m_worker_threads.emplace_back([this, n]() {
    2-  151|      0|                util::ThreadRename(strprintf("scriptch.%i", n));
    3+  150|      1|            m_worker_threads.emplace_back([this, n]() {
    4+  151|      1|                util::ThreadRename(strprintf("scriptch.%i", n));
    5 ...
    

    This excerpt likely indicates that the script threads were started after the fuzz init function returned.

    Similarly, for the scheduler thread, it would look like:

    0 ...
    1   227|      0|        m_node.scheduler = std::make_unique<CScheduler>();
    2-  228|      1|        m_node.scheduler->m_service_thread = std::thread(util::TraceThread, "scheduler", [&] { m_node.scheduler->serviceQueue(); });
    3+  228|      0|        m_node.scheduler->m_service_thread = std::thread(util::TraceThread, "scheduler", [&] { m_node.scheduler->serviceQueue(); });
    4   229|      0|        m_node.validation_signals =
    5 ...
    
  2. DrahtBot commented at 2:04 pm on March 28, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32158.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK hodlinator, janb84, Prabhat1308

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    Conflicts

    Reviewers, this pull request conflicts with the following ones:

    • #32113 (fuzz: enable running fuzz test cases in Debug mode by ajtowns)

    If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

  3. DrahtBot added the label Tests on Mar 28, 2025
  4. fanquake requested review from dergoegge on Mar 29, 2025
  5. fanquake requested review from marcofleon on Mar 29, 2025
  6. maflcko commented at 10:34 am on March 29, 2025: member
    I’ve run cargo run --manifest-path ./contrib/devtools/deterministic-fuzz-coverage/Cargo.toml -- $PWD/bld-cmake/ $PWD/../b-c-qa-assets/fuzz_corpora/ partially_downloaded_block 128 for about 300 times and it passed. So hopefully this is good enough for now. (In theory the scheduler thread may still be woken spuriously, even if there is no work, but the only solution to that would be to disable it completely for all fuzz targets that don’t need it.)
  7. Prabhat1308 commented at 8:34 pm on March 29, 2025: contributor

    tACK fa1e299

    Tested 10 runs with each 32 and 128 parallel threads on MacOS. Steps followed

    0cmake -B build -DCMAKE_C_COMPILER="$(brew --prefix llvm)/bin/clang" \ 
    1   -DCMAKE_CXX_COMPILER="$(brew --prefix llvm)/bin/clang++" \
    2   -DCMAKE_C_FLAGS="-fprofile-instr-generate -fcoverage-mapping" \
    3   -DCMAKE_CXX_FLAGS="-fprofile-instr-generate -fcoverage-mapping" \
    4   -DBUILD_FOR_FUZZING=ON
    5   
    6   cmake --build build -j$(sysctl -n hw.ncpu) 
    7   
    8   cargo run --manifest-path ./contrib/devtools/deterministic-fuzz-coverage/Cargo.toml -- $PWD/build/ $PWD/qa-assets/fuzz_corpora/ partially_downloaded_block 32/128
    
  8. in contrib/devtools/deterministic-fuzz-coverage/src/main.rs:28 in fa1e2995d9 outdated
    24     format!(
    25         r#"
    26 Error: {err}
    27 
    28-Usage: program ./build_dir ./qa-assets/fuzz_corpora fuzz_target_name
    29+Usage: program ./build_dir ./qa-assets/fuzz_corpora fuzz_target_name (parallelism={DEFAULT_PAR})
    


    hodlinator commented at 9:11 am on March 31, 2025:

    nit: []-brackets are commonly used for optional arguments.

    0Usage: program ./build_dir ./qa-assets/fuzz_corpora fuzz_target_name [parallelism={DEFAULT_PAR}]
    

    Might be nice to standardize on -jN, but keeping the logic simple is also good.


    maflcko commented at 12:40 pm on March 31, 2025:
    Thx, will fixup, if I have to re-touch
  9. in src/test/util/setup_common.cpp:258 in fa1e2995d9 outdated
    253@@ -247,7 +254,8 @@ ChainTestingSetup::ChainTestingSetup(const ChainType chainType, TestOpts opts)
    254             .check_block_index = 1,
    255             .notifications = *m_node.notifications,
    256             .signals = m_node.validation_signals.get(),
    257-            .worker_threads_num = 2,
    258+            // Use no worker threads while fuzzing to avoid non-determinism
    259+            .worker_threads_num = G_FUZZING ? 0 : 2,
    


    hodlinator commented at 9:36 am on March 31, 2025:

    I’m worried this means we get decreased fuzz-coverage. Maybe could introduce another var?

    0            .worker_threads_num = G_FUZZING && G_DETERMINISTIC? 0 : 2,
    

    Although being able to reproduce fuzz failures is also nice. :\


    maflcko commented at 12:40 pm on March 31, 2025:
    Not sure. If someone wants to fuzz test the script worker threads, a dedicated fuzz target seems ideal (See ./src/test/fuzz/checkqueue.cpp). Relying on unrelated fuzz targets to achieve coverage here seems brittle at best, because those targets may change at any time, dropping the coverage silently. Also, I am not aware of any meaningful coverage here in any fuzz target that would be more than what the unit tests and functional tests already achieve.
  10. in contrib/devtools/deterministic-fuzz-coverage/src/main.rs:229 in fa1e2995d9 outdated
    234+            };
    235+            if r.is_err() {
    236+                *res = r;
    237+            }
    238+        };
    239+        for (i, entry) in entries.iter().enumerate() {
    


    hodlinator commented at 12:01 pm on March 31, 2025:
    Found a way to reduce this code by 14 lines. What do you think about f8a0a32280d7636135f6821401d7f3b18d10476b?

    hodlinator commented at 12:14 pm on March 31, 2025:
    Would be happy to be listed as commit co-author if you take this or the output spam reduction (https://github.com/bitcoin/bitcoin/pull/32158#discussion_r2020914099) as this is partially inspired by #32074 (review).

    maflcko commented at 12:40 pm on March 31, 2025:

    I don’t think your suggestion works. In Rust, the ? operator is the early-return operator. This means, if you use it in a while loop that joins threads, it may return early and leave some threads dangling.

    This is not much of an issue here, as thread::scope takes care of joining them, once they go out of scope. However, the main thread may now sometimes return an Err, or panic, when a child panics. This seems confusing. Also, if in the future all errors want to be collected and returned, one would have to revert your change as well?


    hodlinator commented at 1:16 pm on March 31, 2025:

    I don’t think your suggestion works.

    I’ve verified it works by executing something that fails determinism:

    0cargo run --manifest-path ./contrib/devtools/deterministic-fuzz-coverage/Cargo.toml -- $PWD/build_fuzz ../qa-assets/fuzz_corpora/ connman 16
    

    it may return early and leave some threads dangling.

    https://doc.rust-lang.org/stable/std/thread/struct.ScopedJoinHandle.html states:

    “An owned permission to join on a scoped thread (block on its termination).”

    However, the main thread may now sometimes return an Err, or panic, when a child panics.

    The change doesn’t modify how panics are handled, it still has Err("a scoped thread panicked".to_string()).

    Also, if in the future all errors want to be collected and returned, one would have to revert your change as well?

    I don’t think the diff would be significantly larger when changing the current version of this PR.


    maflcko commented at 1:31 pm on March 31, 2025:

    You should be able to test my claim by panicking the threads and observing that the Err never makes it out.

    See https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=b7091d91fc333d2e260327c46a90dec6

    Of course, this is fine here. I just wanted to explain the change in behavior here and why I choose the more consistent behavior myself here.


    hodlinator commented at 2:01 pm on March 31, 2025:

    Thanks for the correction.

    I believe this would work, even though rustfmt makes the line count less of a win: 67e6d9ece87606297865761f0fb222db9604c180


    maflcko commented at 3:49 pm on March 31, 2025:
    thx, pushed something similar

    hodlinator commented at 7:56 pm on March 31, 2025:
    Nice touch with getting rid of the expect() with the if let ... else and draining handles when res.is_err().
  11. in contrib/devtools/deterministic-fuzz-coverage/src/main.rs:129 in fa1e2995d9 outdated
    126+        let cov_txt_path = build_dir.join(format!("fuzz_det_cov.show.t{thread_id}.r{run_id}.txt"));
    127+        let profraw_file = build_dir.join(format!("fuzz_det_cov.t{thread_id}.r{run_id}.profraw"));
    128+        let profdata_file = build_dir.join(format!("fuzz_det_cov.t{thread_id}.r{run_id}.profdata"));
    129         if !{
    130             {
    131                 let mut cmd = Command::new(fuzz_exe);
    


    hodlinator commented at 12:04 pm on March 31, 2025:
    Appreciate you adding the “[N/M]” output, but it’s spammed away by fuzz-output even on success. Suggest capturing stdio/stderr and only output them on failure: ea3e89e48250013ea818abcefd7be8e72d31f23d

    maflcko commented at 12:41 pm on March 31, 2025:

    Seems fine, but I’d say, if you want less libfuzzer output, it would be easier to compile without libfuzzer, especially given that it is not needed for this tool.

    Also, code-wise, it would probably be easier to just use https://doc.rust-lang.org/std/process/struct.Command.html#method.output instead of spawning and handling a child manually.


    hodlinator commented at 1:31 pm on March 31, 2025:

    if you want less libfuzzer output, it would be easier to compile without libfuzzer

    Given the defaults in contrib/devtools/README.md I still think this would be a good change.

    Missed that output() still allowed checking the exit code. You are right in that it cleaned things up: a7f4e4b7cd0a04ae7c84ad7afa284006a600e369


    hodlinator commented at 1:56 pm on March 31, 2025:
    Now run through rustfmt: 142e65da352e77f061da3142bff8e2b19e56fc86

    maflcko commented at 3:49 pm on March 31, 2025:
    thx, pushed something similar with emojis
  12. hodlinator commented at 12:07 pm on March 31, 2025: contributor
    Concept ACK fa1e2995d9996641e79f92549e99a6b37e36d140
  13. in contrib/devtools/deterministic-fuzz-coverage/src/main.rs:230 in fa1e2995d9 outdated
    235+            if r.is_err() {
    236+                *res = r;
    237+            }
    238+        };
    239+        for (i, entry) in entries.iter().enumerate() {
    240+            println!("[{i}/{}]", entries.len());
    


    hodlinator commented at 3:21 pm on March 31, 2025:

    Off-by-one:

    0            println!("[{}/{}]", i + 1, entries.len());
    

    maflcko commented at 3:49 pm on March 31, 2025:
    thx, done
  14. maflcko force-pushed on Mar 31, 2025
  15. contrib: Add optional parallelism to deterministic-fuzz-coverage
    Co-Authored-By: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    fa7e931130
  16. contrib: Use -Xdemangler=llvm-cxxfilt in deterministic-*-coverage
    This makes the result more readable.
    fa82fe2c73
  17. maflcko force-pushed on Mar 31, 2025
  18. maflcko force-pushed on Mar 31, 2025
  19. in contrib/devtools/deterministic-fuzz-coverage/src/main.rs:124 in fa74244e97 outdated
    119@@ -110,10 +120,12 @@ fn deterministic_coverage(
    120         .map(|entry| entry.expect("IO error"))
    121         .collect::<Vec<_>>();
    122     entries.sort_by_key(|entry| entry.file_name());
    123-    let run_single = |run_id: u8, entry: &Path| -> Result<PathBuf, AppError> {
    124-        let cov_txt_path = build_dir.join(format!("fuzz_det_cov.show.{run_id}.txt"));
    125-        if !{
    126-            {
    127+    let run_single = |run_id: u8, entry: &Path, thread_id: usize| -> Result<PathBuf, AppError> {
    128+        let cov_txt_path = build_dir.join(format!("fuzz_det_cov.show.t{thread_id}.r{run_id}.txt"));
    


    hodlinator commented at 7:53 pm on March 31, 2025:

    nit: Current failure output:

    0diff --git a/home/hodlinator/bitcoin/build_fuzz/fuzz_det_cov.show.t5.r0.txt b/home/hodlinator/bitcoin/build_fuzz/fuzz_det_cov.show.t5.r1.txt
    1index da0315b1e4..93623d5c15 100644
    2--- a/home/hodlinator/bitcoin/build_fuzz/fuzz_det_cov.show.t5.r0.txt
    3+++ b/home/hodlinator/bitcoin/build_fuzz/fuzz_det_cov.show.t5.r1.txt
    4...
    

    If you re-touch, it might be nice to use a/b instead of r0/r1:

    0diff --git a/home/hodlinator/bitcoin/build_fuzz/fuzz_det_cov.show.t5.a.txt b/home/hodlinator/bitcoin/build_fuzz/fuzz_det_cov.show.t5.b.txt
    1index da0315b1e4..93623d5c15 100644
    2--- a/home/hodlinator/bitcoin/build_fuzz/fuzz_det_cov.show.t5.a.txt
    3+++ b/home/hodlinator/bitcoin/build_fuzz/fuzz_det_cov.show.t5.b.txt
    
    0    let run_single = |run_id: char, entry: &Path, thread_id: usize| -> Result<PathBuf, AppError> {
    1        let cov_txt_path = build_dir.join(format!("fuzz_det_cov.show.t{thread_id}.{run_id}.txt"));
    
  20. janb84 commented at 7:55 pm on March 31, 2025: contributor

    The 128 run fails on my machine: fuzz failed: Too many open files (os error 24)

    Have used the following command:

    0 cargo run --manifest-path ./contrib/devtools/deterministic-fuzz-coverage/Cargo.toml -- $PWD/build/ $PWD/qa-assets/fuzz_corpora/ partially_downloaded_block 128
    

    The 32 run works. I’m on a M1 MAX, in a nix-shell clang 19.1.7, cargo 1.85.0

    EDIT: the ulimit was default 256 on NixOS, have now set it to 4096 and it runs. ulimit -n 4096

  21. in contrib/devtools/deterministic-fuzz-coverage/src/main.rs:232 in fa74244e97 outdated
    232+        let mut handles = VecDeque::with_capacity(par);
    233+        let mut res = Ok(());
    234+        for (i, entry) in entries.iter().enumerate() {
    235+            println!("[{}/{}]", i + 1, entries.len());
    236+            handles.push_back(s.spawn(move || check_individual(entry, i % par)));
    237+            while handles.len() >= par || i == (entries.len() - 1) || res.is_err() {
    


    janb84 commented at 9:12 pm on March 31, 2025:
    0            while handles.len() >= par || i == (entries.len() - 1) || res.is_err() {
    

    Is the condition || res.is_err() intended to cause an early termination? it seems to keep running until handles is empty


    maflcko commented at 6:20 am on April 1, 2025:

    Yes, both are correct. The while loop will run while res.is_err() is true (res will remain err). The loop terminates when it is broken out from via the return res;.

    It is not possible to terminate the program even earlier (other than doing an unclean abort), before all thread handles are empty. Because, if they weren’t empty, the thread scope will implicitly join the remaining “dangling” threads.

    This can be observed in the docs (https://doc.rust-lang.org/stable/std/thread/struct.Scope.html#method.spawn), or by testing (https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a077ca0d4aa8d08e832a97fb28692417).


    janb84 commented at 11:50 am on April 1, 2025:
    Thanks for the detailed response! It answered my questions and gave some new insights !
  22. janb84 commented at 9:12 pm on March 31, 2025: contributor

    Concept ACK fa74244

    -Tested, with 8,32 and 128 ✅ (8 will saturate most of the available cores already)

  23. hodlinator commented at 7:53 am on April 1, 2025: contributor

    ACK 9999358962ebbff7e2821db7081a8c782b0179c8

    Concept

    Fixes fuzz/unit test nondeterminism

    • Waits service thread/scheduler to start up at beginning of tests.
    • Disables worker threads for CCheckQueue/script checking.

    Contrib utilities

    • Changes checking for determinism over a multitude of fuzz inputs from a single-threaded slug into an optionally highly parallel affair.
    • Limits output from fuzz and adds [N/M] progress, so humans can get a feeling for when testing would complete (if successful for all inputs).
    • Demangles symbols in coverage output.

    Testing

    Tested consistently failing determinism output through:

    0cargo run --manifest-path ./contrib/devtools/deterministic-fuzz-coverage/Cargo.toml -- $PWD/build_fuzz ../qa-assets/fuzz_corpora/ connman 16
    

    I’m having issues reproducing diffs containing m_service_thread|scriptch without the C++ fix. Managed to produce a scriptch-containing diff 1-2 times out of ~40 runs. Any further tips on which qa-assets commit to use, or any other conditions that might help are welcome. I’ve seen it succeed multiple times with parallelism of 400, when it fails, I typically don’t see those strings. This is the only thing preventing me from ACKing the last commit.

  24. maflcko commented at 8:00 am on April 1, 2025: member

    I’m having issues reproducing diffs containing m_service_thread|scriptch without the C++ fix. Managed to produce a scriptch-containing diff 1-2 times out of ~40 runs. Any further tips on which qa-assets commit to use, or any other conditions that might help are welcome.

    What is your clang version and do you mind sharing the cmake configure command and the full diff? I just used the tip of qa-assets.

  25. maflcko force-pushed on Apr 1, 2025
  26. contrib: Only print fuzz output on failure
    This makes it humanly possible to track progress as only "[N/M]"-lines are printed as long as we succeed.
    
    Also, use char (a, b) to indicate run_id instead of u8 (0, 1).
    Also, use emojis to indicate final success or error.
    
    Co-Authored-By: Hodlinator <172445034+hodlinator@users.noreply.github.com>
    fa900bb2dc
  27. test: Avoid script check worker threads while fuzzing
    Threads may execute their function any time after they are spawned, so
    coverage could be non-deterministic.
    
    Fix this,
    
    * for the script check worker threads by disabling them while fuzzing.
    * for the scheduler thread by waiting for it to fully start and run the
      service queue.
    fa17cdb191
  28. maflcko force-pushed on Apr 1, 2025
  29. contrib: Warn about using libFuzzer for coverage check fa51310121
  30. in src/test/util/setup_common.cpp:61 in fa17cdb191 outdated
    59@@ -60,6 +60,7 @@
    60 #include <walletinitinterface.h>
    61 
    


    hodlinator commented at 8:31 am on April 1, 2025:

    (Inline comment thread to not spam main)

    What is your clang version and do you mind sharing the cmake configure command and the full diff? I just used the tip of qa-assets.

     0 cmake -B build_fuzz
     1CMake Warning at CMakeLists.txt:216 (message):
     2  BUILD_FOR_FUZZING=ON will disable all other targets and force
     3  BUILD_FUZZ_BINARY=ON.
     4
     5
     6
     7Configuring secp256k1 subtree...
     8-- Could NOT find Valgrind (missing: Valgrind_INCLUDE_DIR Valgrind_WORKS) 
     9
    10
    11secp256k1 configure summary
    12===========================
    13Build artifacts:
    14  library type ........................ Static
    15Optional modules:
    16  ECDH ................................ OFF
    17  ECDSA pubkey recovery ............... ON
    18  extrakeys ........................... ON
    19  schnorrsig .......................... ON
    20  musig ............................... OFF
    21  ElligatorSwift ...................... ON
    22Parameters:
    23  ecmult window size .................. 15
    24  ecmult gen table size ............... 86 KiB
    25Optional features:
    26  assembly ............................ x86_64
    27  external callbacks .................. OFF
    28Optional binaries:
    29  benchmark ........................... OFF
    30  noverify_tests ...................... OFF
    31  tests ............................... OFF
    32  exhaustive tests .................... OFF
    33  ctime_tests ......................... OFF
    34  examples ............................ OFF
    35
    36Cross compiling ....................... FALSE
    37Valgrind .............................. OFF
    38Preprocessor defined macros ........... ENABLE_MODULE_ELLSWIFT=1 ENABLE_MODULE_SCHNORRSIG=1 ENABLE_MODULE_EXTRAKEYS=1 ENABLE_MODULE_RECOVERY=1 ECMULT_WINDOW_SIZE=15 COMB_BLOCKS=43 COMB_TEETH=6 USE_ASM_X86_64=1
    39C compiler ............................ Clang 19.1.7, /nix/store/ls0g67nsklb2vn3vc9dnksa1adfgq32a-clang-wrapper-19.1.7/bin/clang
    40CFLAGS ................................ -ftrivial-auto-var-init=pattern
    41Compile options ....................... -pedantic -Wall -Wcast-align -Wconditional-uninitialized -Wextra -Wnested-externs -Wno-long-long -Wno-overlength-strings -Wno-unused-function -Wreserved-identifier -Wshadow -Wstrict-prototypes -Wundef
    42Build type:
    43 - CMAKE_BUILD_TYPE ................... RelWithDebInfo
    44 - CFLAGS ............................. -O2 -g 
    45 - LDFLAGS for executables ............ 
    46 - LDFLAGS for shared libraries ....... 
    47SECP256K1_APPEND_CFLAGS ............... -fsanitize=undefined,address,fuzzer
    48SECP256K1_APPEND_LDFLAGS .............. -fsanitize=undefined,address,fuzzer
    49
    50
    51
    52Configure summary
    53=================
    54Executables:
    55  bitcoind ............................ OFF
    56  bitcoin-node (multiprocess) ......... OFF
    57  bitcoin-qt (GUI) .................... OFF
    58  bitcoin-gui (GUI, multiprocess) ..... OFF
    59  bitcoin-cli ......................... OFF
    60  bitcoin-tx .......................... OFF
    61  bitcoin-util ........................ OFF
    62  bitcoin-wallet ...................... OFF
    63  bitcoin-chainstate (experimental) ... OFF
    64  libbitcoinkernel (experimental) ..... OFF
    65Optional features:
    66  wallet support ...................... ON
    67   - legacy wallets (Berkeley DB) ..... OFF
    68  external signer ..................... OFF
    69  ZeroMQ .............................. OFF
    70  USDT tracing ........................ OFF
    71  QR code (GUI) ....................... OFF
    72  DBus (GUI, Linux only) .............. OFF
    73Tests:
    74  test_bitcoin ........................ OFF
    75  test_bitcoin-qt ..................... OFF
    76  bench_bitcoin ....................... OFF
    77  fuzz binary ......................... ON
    78
    79Cross compiling ....................... FALSE
    80C++ compiler .......................... Clang 19.1.7, /nix/store/ls0g67nsklb2vn3vc9dnksa1adfgq32a-clang-wrapper-19.1.7/bin/clang++
    81CMAKE_BUILD_TYPE ...................... RelWithDebInfo
    82Preprocessor defined macros ........... FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
    83C++ compiler flags .................... -fprofile-instr-generate -fcoverage-mapping -O2 -g -std=c++20 -fPIC -fdebug-prefix-map=/home/hodlinator/bitcoin/src=. -fmacro-prefix-map=/home/hodlinator/bitcoin/src=. -fsanitize=undefined,address,fuzzer -Wall -Wextra -Wgnu -Wformat -Wformat-security -Wvla -Wshadow-field -Wthread-safety -Wloop-analysis -Wredundant-decls -Wunused-member-function -Wdate-time -Wconditional-uninitialized -Woverloaded-virtual -Wsuggest-override -Wimplicit-fallthrough -Wunreachable-code -Wdocumentation -Wself-assign -Wundef -Wno-unused-parameter -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection
    84Linker flags .......................... -fprofile-instr-generate -fcoverage-mapping -O2 -g -fsanitize=undefined,address,fuzzer -fstack-protector-all -fcf-protection=full -fstack-clash-protection -Wl,-z,relro -Wl,-z,now -Wl,-z,separate-code -fPIE -pie
    85
    86NOTE: The summary above may not exactly match the final applied build flags
    87      if any additional CMAKE_* or environment variables have been modified.
    88      To see the exact flags applied, build with the --verbose option.
    89
    90Attempt to harden executables ......... ON
    91Treat compiler warnings as errors ..... OFF
    92Use ccache for compiling .............. ON
    93
    94
    95-- Configuring done (0.4s)
    96-- Generating done (0.1s)
    97-- Build files have been written to: /home/hodlinator/bitcoin/build_fuzz
    

    maflcko commented at 8:41 am on April 1, 2025:

    cmake -B build_fuzz

    I guess this is an unclean build, because it the command doesn’t set the cxx flags, as explained in contrib/devtools/README.md, however they do end up in configure summary, so this may be fine.

    Also, this is building with sanitizers, which may or may not have an effect on runtime behavior and thus make the non-determinism manifest differently.

    Also, could you please share the full diff --git a/home/hodlinator/bitcoin/build_fuzz/fuzz_det_cov.show.tN.r0.txt b/home/hodlinator/bitcoin/build_fuzz/fuzz_det_cov.show.tN.r1.txt, if you don’t mind?


    hodlinator commented at 8:54 am on April 1, 2025:

    Log/diff with above build config: without_fix_400t.gz

    Will try with a fresh build directory. Would make sense that sanitizers change behavior.


    maflcko commented at 9:14 am on April 1, 2025:

    It looks like this may be a bug in libFuzzer where an input is run twice, instead of once, when it should only be run once?

    It is not visible in the output, as that is now hidden after your suggestion to hide the output, but it could make sense to reject libFuzzer in the script completely.


    hodlinator commented at 9:23 am on April 1, 2025:

    It is not visible in the output, as that is now hidden after your suggestion to hide the input

    Maybe one could return the std::process::Output objects from run_single and pass them into check_diff to be printed upon failure?


    First run with fresh build tree and without the C++ thread fix from this PR failed (no determinism-issue detected).

    ✨ Second run did produce a diff containing both scriptch. and m_service_thread = .... ✨

    (Third run again detected no determinism).

    ✨ Fourth run again produced diff with expected substrings. ✨


    maflcko commented at 9:37 am on April 1, 2025:

    Maybe one could return the std::process::Output objects from run_single and pass them into check_diff to be printed upon failure?

    Not sure. If libFuzzer is broken, it would be good to fix it upstream. Also, it isn’t needed here at all, so it would be better to not use it.

    (Third run again detected no determinism).

    Are you still using libFuzzer? If not, is the full diff the same? If yes, and you really want to debug with libFuzzer more, please revert fa900bb2dce8ef3ee11d5980f008995d66877155.


    hodlinator commented at 10:40 am on April 1, 2025:

    The old build tree has libFuzzer enabled, the new one does not.

    I made it always print the fuzz output, ran with the old tree, and saw no evidence in the fuzz-output of it running twice, but you are right, there is a 2x pattern in coverage.


    maflcko commented at 12:11 pm on April 1, 2025:

    The old build tree has libFuzzer enabled, the new one does not.

    I made it always print the fuzz output, ran with the old tree, and saw no evidence in the fuzz-output of it running twice, but you are right, there is a 2x pattern in coverage.

    I am happy to take a look at the output, if you still have it and want to provide it.


    hodlinator commented at 2:07 pm on April 1, 2025:

    Was using > file 2> file and the output is a big garbled: without_fix_400t_libFuzzOutput.gz

    Had it output the thread id when printing the fuzz-output (fuzz success thread <thread-id>), so one can correlate with t<thead id> in the diff filenames.


    maflcko commented at 3:26 pm on April 1, 2025:

    Looking at the output, it seems to indicate that the fuzz target ran twice:

    0    46|       |FUZZ_TARGET(partially_downloaded_block, .init = initialize_pdb)
    1-   47|      2|{
    2-   48|      2|    SeedRandomStateForTest(SeedRand::ZEROS);
    3-   49|      2|    FuzzedDataProvider fuzzed_data_provider{buffer.data(), buffer.size()};
    4-   50|      2|    SetMockTime(ConsumeTime(fuzzed_data_provider));
    5+   47|      1|{
    6+   48|      1|    SeedRandomStateForTest(SeedRand::ZEROS);
    7+   49|      1|    FuzzedDataProvider fuzzed_data_provider{buffer.data(), buffer.size()};
    8+   50|      1|    SetMockTime(ConsumeTime(fuzzed_data_provider));
    9    51|       |
    

    Thus, it seems there are two options:

    • There is a bug in the code here and two processes are started by the same thread for the same run-id (or some other bug)
    • There is a bug in libFuzzer and the fuzz target is run twice in the same process (most likely)

    If you want to check this further, and if you want to check if the fuzz process itself is at fault, it should be possible by aborting it, if the process runs the target twice. (Also, could mock out the meat of the logic here):

     0diff --git a/contrib/devtools/deterministic-fuzz-coverage/src/main.rs b/contrib/devtools/deterministic-fuzz-coverage/src/main.rs
     1index 9c1738396b..d5b7181c80 100644
     2--- a/contrib/devtools/deterministic-fuzz-coverage/src/main.rs
     3+++ b/contrib/devtools/deterministic-fuzz-coverage/src/main.rs
     4@@ -150,40 +150,6 @@ fn deterministic_coverage(
     5                 ))?;
     6             }
     7         }
     8-        if !Command::new(LLVM_PROFDATA)
     9-            .arg("merge")
    10-            .arg("--sparse")
    11-            .arg(&profraw_file)
    12-            .arg("-o")
    13-            .arg(&profdata_file)
    14-            .status()
    15-            .map_err(|e| format!("{LLVM_PROFDATA} merge failed with {e}"))?
    16-            .success()
    17-        {
    18-            Err(format!("{LLVM_PROFDATA} merge failed. This can be a sign of compiling without code coverage support."))?;
    19-        }
    20-        let cov_file = File::create(&cov_txt_path)
    21-            .map_err(|e| format!("Failed to create coverage txt file ({e})"))?;
    22-        if !Command::new(LLVM_COV)
    23-            .args([
    24-                "show",
    25-                "--show-line-counts-or-regions",
    26-                "--show-branches=count",
    27-                "--show-expansions",
    28-                "--show-instantiation-summary",
    29-                "-Xdemangler=llvm-cxxfilt",
    30-                &format!("--instr-profile={}", profdata_file.display()),
    31-            ])
    32-            .arg(fuzz_exe)
    33-            .stdout(cov_file)
    34-            .spawn()
    35-            .map_err(|e| format!("{LLVM_COV} show failed with {e}"))?
    36-            .wait()
    37-            .map_err(|e| format!("{LLVM_COV} show failed with {e}"))?
    38-            .success()
    39-        {
    40-            Err(format!("{LLVM_COV} show failed"))?;
    41-        };
    42         Ok(cov_txt_path)
    43     };
    44     let check_diff = |a: &Path, b: &Path, err: &str| -> AppResult {
    45@@ -221,11 +187,6 @@ The coverage was not deterministic between runs.
    46         }
    47         let cov_txt_base = run_single('a', &entry, thread_id)?;
    48         let cov_txt_repeat = run_single('b', &entry, thread_id)?;
    49-        check_diff(
    50-            &cov_txt_base,
    51-            &cov_txt_repeat,
    52-            &format!("The fuzz target input was {}.", entry.display()),
    53-        )?;
    54         Ok(())
    55     };
    56     thread::scope(|s| -> AppResult {
    57@@ -254,18 +215,6 @@ The coverage was not deterministic between runs.
    58     // This can catch issues where mutable global state is leaked from one fuzz input execution to
    59     // the next.
    60     println!("Check all fuzz inputs in one go ...");
    61-    {
    62-        if !corpus_dir.is_dir() {
    63-            Err(format!("{} should be a folder", corpus_dir.display()))?;
    64-        }
    65-        let cov_txt_base = run_single('a', &corpus_dir, 0)?;
    66-        let cov_txt_repeat = run_single('b', &corpus_dir, 0)?;
    67-        check_diff(
    68-            &cov_txt_base,
    69-            &cov_txt_repeat,
    70-            &format!("All fuzz inputs in {} were used.", corpus_dir.display()),
    71-        )?;
    72-    }
    73     println!("✨ Coverage test passed for {fuzz_target}. ✨");
    74     Ok(())
    75 }
    76diff --git a/src/test/fuzz/partially_downloaded_block.cpp b/src/test/fuzz/partially_downloaded_block.cpp
    77index 8c6565e48b..466548ee35 100644
    78--- a/src/test/fuzz/partially_downloaded_block.cpp
    79+++ b/src/test/fuzz/partially_downloaded_block.cpp
    80@@ -45,6 +45,9 @@ PartiallyDownloadedBlock::CheckBlockFn FuzzedCheckBlock(std::optional<BlockValid
    81 
    82 FUZZ_TARGET(partially_downloaded_block, .init = initialize_pdb)
    83 {
    84+static bool g_once{true};
    85+Assert(g_once);
    86+g_once=false;
    87     SeedRandomStateForTest(SeedRand::ZEROS);
    88     FuzzedDataProvider fuzzed_data_provider{buffer.data(), buffer.size()};
    89     SetMockTime(ConsumeTime(fuzzed_data_provider));
    

    hodlinator commented at 10:07 pm on April 1, 2025:

    Result with provided patch: ../src/test/fuzz/partially_downloaded_block.cpp:49 partially_downloaded_block_fuzz_target: Assertion `g_once' failed.

    So it’s within the same process. libFuzzer doesn’t take -runs=1 too seriously. Seems weird that it would repeat when only given 1 input… I guess they don’t expect deterministic tests.

     0⚠️
     1fuzz failed!
     2stdout:
     3
     4stderr:
     5INFO: Running with entropic power schedule (0xFF, 100).
     6INFO: Seed: 921745561
     7INFO: Loaded 1 modules   (543062 inline 8-bit counters): 543062 [0x563af8979360, 0x563af89fdcb6), 
     8INFO: Loaded 1 PC tables (543062 PCs): 543062 [0x563af89fdcb8,0x563af9247218), 
     9/home/hodlinator/bitcoin/build_fuzz/bin/fuzz: Running 1 inputs 1 time(s) each.
    10Running: ../qa-assets/fuzz_corpora/partially_downloaded_block/4eccfaddc420749f641d81b122425972c4f5108e
    11../src/test/fuzz/partially_downloaded_block.cpp:49 partially_downloaded_block_fuzz_target: Assertion `g_once' failed.
    12==91561== ERROR: libFuzzer: deadly signal
    13    [#0](/bitcoin-bitcoin/0/) 0x563af549e44a in __sanitizer_print_stack_trace (/home/hodlinator/bitcoin/build_fuzz/bin/fuzz+0x192744a)
    14    [#1](/bitcoin-bitcoin/1/) 0x563af539cb90 in fuzzer::PrintStackTrace() (/home/hodlinator/bitcoin/build_fuzz/bin/fuzz+0x1825b90)
    15    [#2](/bitcoin-bitcoin/2/) 0x563af5377d16 in fuzzer::Fuzzer::CrashCallback() (.part.0) FuzzerLoop.cpp.o
    16    [#3](/bitcoin-bitcoin/3/) 0x563af5377dda in fuzzer::Fuzzer::StaticCrashSignalCallback() (/home/hodlinator/bitcoin/build_fuzz/bin/fuzz+0x1800dda)
    17    [#4](/bitcoin-bitcoin/4/) 0x7f0d6b440f2f  (/nix/store/6q2mknq81cyscjmkv72fpcsvan56qhmg-glibc-2.40-66/lib/libc.so.6+0x40f2f) (BuildId: d9765725d929c713f4481765fa13ed815149985f)
    18    [#5](/bitcoin-bitcoin/5/) 0x7f0d6b49916b in __pthread_kill_implementation (/nix/store/6q2mknq81cyscjmkv72fpcsvan56qhmg-glibc-2.40-66/lib/libc.so.6+0x9916b) (BuildId: d9765725d929c713f4481765fa13ed815149985f)
    19    [#6](/bitcoin-bitcoin/6/) 0x7f0d6b440e85 in gsignal (/nix/store/6q2mknq81cyscjmkv72fpcsvan56qhmg-glibc-2.40-66/lib/libc.so.6+0x40e85) (BuildId: d9765725d929c713f4481765fa13ed815149985f)
    20    [#7](/bitcoin-bitcoin/7/) 0x7f0d6b428939 in abort (/nix/store/6q2mknq81cyscjmkv72fpcsvan56qhmg-glibc-2.40-66/lib/libc.so.6+0x28939) (BuildId: d9765725d929c713f4481765fa13ed815149985f)
    21    [#8](/bitcoin-bitcoin/8/) 0x563af60b3c74 in assertion_fail(std::basic_string_view<char, std::char_traits<char>>, int, std::basic_string_view<char, std::char_traits<char>>, std::basic_string_view<char, std::char_traits<char>>) /home/hodlinator/bitcoin/build_fuzz/../src/util/check.cpp:34:5
    22    [#9](/bitcoin-bitcoin/9/) 0x563af5950a77 in bool& inline_assertion_check<true, bool&>(bool&, char const*, int, char const*, char const*) /home/hodlinator/bitcoin/build_fuzz/../src/util/check.h:59:13
    23    [#10](/bitcoin-bitcoin/10/) 0x563af59db569 in partially_downloaded_block_fuzz_target(std::span<unsigned char const, 18446744073709551615ul>) /home/hodlinator/bitcoin/build_fuzz/../src/test/fuzz/partially_downloaded_block.cpp:49:1
    24    [#11](/bitcoin-bitcoin/11/) 0x563af5cff4c0 in std::function<void (std::span<unsigned char const, 18446744073709551615ul>)>::operator()(std::span<unsigned char const, 18446744073709551615ul>) const /nix/store/dih8vf5naf93c0wcfxqa9pll3k7iv9bm-gcc-14-20241116/include/c++/14-20241116/bits/std_function.h:591:9
    25    [#12](/bitcoin-bitcoin/12/) 0x563af5cff4c0 in test_one_input(std::span<unsigned char const, 18446744073709551615ul>) /home/hodlinator/bitcoin/build_fuzz/../src/test/fuzz/fuzz.cpp:86:5
    26    [#13](/bitcoin-bitcoin/13/) 0x563af5cff4c0 in LLVMFuzzerTestOneInput /home/hodlinator/bitcoin/build_fuzz/../src/test/fuzz/fuzz.cpp:205:5
    27    [#14](/bitcoin-bitcoin/14/) 0x563af53785d8 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (/home/hodlinator/bitcoin/build_fuzz/bin/fuzz+0x18015d8)
    28    [#15](/bitcoin-bitcoin/15/) 0x563af5378b5e in fuzzer::Fuzzer::TryDetectingAMemoryLeak(unsigned char const*, unsigned long, bool) (/home/hodlinator/bitcoin/build_fuzz/bin/fuzz+0x1801b5e)
    29    [#16](/bitcoin-bitcoin/16/) 0x563af5353e9f in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) (/home/hodlinator/bitcoin/build_fuzz/bin/fuzz+0x17dce9f)
    30    [#17](/bitcoin-bitcoin/17/) 0x563af535f904 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (/home/hodlinator/bitcoin/build_fuzz/bin/fuzz+0x17e8904)
    31    [#18](/bitcoin-bitcoin/18/) 0x563af52a9242 in main (/home/hodlinator/bitcoin/build_fuzz/bin/fuzz+0x1732242)
    32    [#19](/bitcoin-bitcoin/19/) 0x7f0d6b42a1fd in __libc_start_call_main (/nix/store/6q2mknq81cyscjmkv72fpcsvan56qhmg-glibc-2.40-66/lib/libc.so.6+0x2a1fd) (BuildId: d9765725d929c713f4481765fa13ed815149985f)
    33    [#20](/bitcoin-bitcoin/20/) 0x7f0d6b42a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/6q2mknq81cyscjmkv72fpcsvan56qhmg-glibc-2.40-66/lib/libc.so.6+0x2a2b8) (BuildId: d9765725d929c713f4481765fa13ed815149985f)
    34    [#21](/bitcoin-bitcoin/21/) 0x563af5349574 in _start (/home/hodlinator/bitcoin/build_fuzz/bin/fuzz+0x17d2574)
    35
    36NOTE: libFuzzer has rudimentary signal handlers.
    37      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
    38SUMMARY: libFuzzer: deadly signal
    

    maflcko commented at 7:40 am on April 2, 2025:

    So it’s within the same process. libFuzzer doesn’t take -runs=1 too seriously. Seems weird that it would repeat when only given 1 input… I guess they don’t expect deterministic tests.

    Thanks for checking. I’d say it would be nice to report this upstream to llvm (with a minimal reproducer), and then eventually fix it. Though, google stopped maintaining it, so they providing the fix is unlikely.


    maflcko commented at 8:00 am on April 2, 2025:

    I suspect the minimal reproducer would be:

     0cat << EOF > /tmp/fuzz_once.cpp
     1#include <cstddef>
     2#include <cstdint>
     3#include <cstdlib>
     4
     5extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
     6  static bool g_once{true};
     7  if (g_once) {
     8    g_once = false;
     9    return 0;
    10  }
    11  std::abort(); // Ran twice :(
    12}
    13EOF
    

    and then:

    0clang++ -fsanitize=fuzzer -g -o /tmp/fuzz_once.exe /tmp/fuzz_once.cpp && while ( /tmp/fuzz_once.exe /tmp/fuzz_once.cpp ) ; do true ; done
    

    However, I can’t reproduce. So it would be good if someone else checked this.

  31. in contrib/devtools/deterministic-fuzz-coverage/src/main.rs:115 in fa51310121
    112-    let profdata_file = build_dir.join("fuzz_det_cov.profdata");
    113+    if using_libfuzzer {
    114+        println!("Warning: The fuzz executable was compiled with libFuzzer as sanitizer.");
    115+        println!("This tool may be tripped by libFuzzer misbehavior.");
    116+        println!("It is recommended to compile without libFuzzer.");
    117+    }
    


    hodlinator commented at 11:36 am on April 1, 2025:

    nit: Would be nice to get

    0Check if using libFuzzer ... NO
    

    For the negative case. Right now on only gets the ... and then just the omission of this warning.


    maflcko commented at 12:12 pm on April 1, 2025:
    May do if I have to re-touch for other reasons.
  32. hodlinator approved
  33. hodlinator commented at 11:42 am on April 1, 2025: contributor

    re-ACK fa513101212327f45965092652f6497aa28362ec

    Changes since last review:

    • Added warning for libFuzzer.
    • Changed run ids from r0/r1 to a/b.

    Tests

    Fresh build tree:

    0cmake -B build_fuzz2 -DCMAKE_C_COMPILER='clang' -DCMAKE_CXX_COMPILER='clang++' -DBUILD_FOR_FUZZING=ON -DCMAKE_CXX_FLAGS='-fprofile-instr-generate -fcoverage-mapping'
    

    After building, ran:

    0cargo run --manifest-path ./contrib/devtools/deterministic-fuzz-coverage/Cargo.toml -- $PWD/build_fuzz2 ../qa-assets/fuzz_corpora/ partially_downloaded_block 400
    

    With qa-assets at tip (3846351…).

    ✨ 2/4 runs without the C++ fix fail with expected differences in scheduler + script check. ✨ 5/5 runs with the C++ fix run deterministically.


    For concept thoughts, see previous review.

  34. DrahtBot requested review from janb84 on Apr 1, 2025
  35. DrahtBot requested review from Prabhat1308 on Apr 1, 2025
  36. janb84 commented at 11:52 am on April 1, 2025: contributor
    Re-ACK fa51310
  37. Prabhat1308 commented at 6:46 pm on April 1, 2025: contributor

    re-ACK fa51310

    Tested with various multiple values of parallel threads. All tests passed.

  38. fanquake merged this on Apr 2, 2025
  39. fanquake closed this on Apr 2, 2025

  40. maflcko deleted the branch on Apr 2, 2025

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-05-05 15:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me