Fuzzer enhancement: Explicitly check output for uninitialized memory #22064

issue guidovranken openend this issue on May 25, 2021
  1. guidovranken commented at 9:15 pm on May 25, 2021: contributor

    Is your feature request related to a problem? Please describe.

    Both MemorySanitizer and Valgrind will only detect uninitialized memory if it is used for branching or IO.

    E.g. the following program performs a computation using an uninitialized variable (a) but this won’t trigger MSAN/Valgrind:

    0int main(void)
    1{
    2    int a; int b = a + 10;
    3    return 0;
    4}
    

    Describe the solution you’d like

    Call

    0extern "C" void __msan_check_mem_is_initialized(const volatile void *x, size_t size);
    

    on the data to make MSAN evaluate it.

    Describe alternatives you’ve considered

    Alternative solution that also works with Valgrind: write the data to /dev/null:

    0#include <stdio.h>
    1
    2int main(void)
    3{
    4    int a; int b = a + 10;
    5    FILE* fp = fopen("/dev/null", "wb");
    6    fwrite(&b, sizeof(b), 1, fp);
    7    fclose(fp);
    8    return 0;
    9}
    

    Additional context

    Proposal: Create a wrapper for __msan_check_mem_is_initialized (as a C++ method), e.g.:

    0void TestMsan(const void* data, const size_t size) {
    1   __msan_check_mem_is_initialized(x, size);
    2}
    

    And use overloaded methods for special types, e.g.

    0void TestMsan(const std::string& s) {
    1   TestMsan(s.data(), s.size());
    2}
    

    Then edit all fuzzer harnesses and call TestMsan with the output of each non-void method.

    E.g. the parse_script harness would become:

     0// Copyright (c) 2009-2020 The Bitcoin Core developers
     1// Distributed under the MIT software license, see the accompanying
     2// file COPYING or http://www.opensource.org/licenses/mit-license.php.
     3
     4#include <core_io.h>
     5#include <script/script.h>
     6#include <test/fuzz/fuzz.h>
     7
     8FUZZ_TARGET(parse_script)
     9{
    10    const std::string script_string(buffer.begin(), buffer.end());
    11    try {
    12        TestMsan(ParseScript(script_string));
    13    } catch (const std::runtime_error&) {
    14    }
    15}
    

    The same concept can be applied to the unit tests.

  2. guidovranken added the label Feature on May 25, 2021
  3. maflcko added the label Brainstorming on May 26, 2021
  4. maflcko commented at 7:36 am on June 5, 2021: member

    Concept ACK, obviously. Though, I am worried that this will make our code overly verbose, and hard to maintain. Maybe it would help if there was a compiler knob to adjust the aggressiveness of the memory sanitizers?

    For “easy” memory violations, -ftrivial-auto-var-init=pattern might be enough to corrupt values and then cause a logic error down the line. At least it will detect the memory issue fixed in commit 37371268d14ed6d5739af5b65d8bdb38b0e8dda2.

  5. practicalswift commented at 8:27 am on October 3, 2021: contributor
    @guidovranken This technique was used in #23152 (comment). Thanks! :)
  6. maflcko commented at 3:23 pm on September 27, 2023: member

    For reference -ftrivial-auto-var-init=pattern, caught another issue (in leveldb) that both valgrind and msan missed: #28359

    As mentioned previously, I don’t think there is anything that can be done here, other than adding a compiler flag upstream.

    In theory the wrapper code can be enforced with a clang-tidy plugin in fuzz code (cc @dergoegge), but the downsides of being incomplete and making the code overly verbose still hold.

  7. maflcko added the label Tests on Sep 27, 2023
  8. maflcko removed the label Feature on Sep 27, 2023
  9. dergoegge commented at 1:53 pm on September 28, 2023: member

    In theory the wrapper code can be enforced with a clang-tidy plugin in fuzz code (cc @dergoegge), but the downsides of being incomplete and making the code overly verbose still hold.

    (maybe this is what you mean by “enforce” but) My idea was to have the clang-tidy plugin auto refactor all our code to insert the wrappers prior to running the msan/valgrind job in CI. I think this should be possible and would avoid the verbosity of having the wrappers present in the actual code.

  10. maflcko commented at 6:05 pm on September 28, 2023: member

    refactor all our code

    So every statement in the source code is wrapped and the memory is read by msan? Probably fine, but I’d suspect a massive slow-down.

    I guess doing this once per release can’t hurt.

  11. real-or-random commented at 1:58 pm on April 9, 2024: contributor

    Alternative solution that also works with Valgrind: write the data to /dev/null:

    There’s also VALGRIND_CHECK_MEM_IS_DEFINED, see also https://github.com/bitcoin-core/secp256k1/blob/master/src/checkmem.h for an abstraction layer that works with both MSan and Valgrind.


    What’s suggested in this issue, i.e., reporting every read of an uninitialized value, may just be too much. The Valgrind FAQ says this:

    As for eager reporting of copies of uninitialised memory values, this has been suggested multiple times. Unfortunately, almost all programs legitimately copy uninitialised memory values around (because compilers pad structs to preserve alignment) and eager checking leads to hundreds of false positives. Therefore Memcheck does not support eager checking at this time.

    But starting with clang 16, at least MSan gets us closer to this. Returning an uninitialized variables from a function, or passing uninitialized values to a function as a parameter is now considered a “use” of uninitialized memory, and MSan will report it by default. See the Clang 16.0.09 Release Notes:

    -fsanitize-memory-param-retval is turned on by default. With -fsanitize=memory, passing uninitialized variables to functions and returning uninitialized variables from functions is more aggressively reported. -fno-sanitize-memory-param-retval restores the previous behavior.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-19 00:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me