memcmp with constants that contain zero bytes are broken in GCC #20005

issue sipa openend this issue on September 23, 2020
  1. sipa commented at 5:57 pm on September 23, 2020: member

    It appears that there is a bug in certain GCC releases (in the version 9 and 10 series) where an optimization step breaks correctness of memcmp when at least one of the arguments is a compile-time constant array that contains at least one zero byte.

    GCC bug is here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189. It was stumbled upon by @roconnor-blockstream in https://github.com/bitcoin-core/secp256k1/pull/822. It is being tracked for libsecp256k1 in https://github.com/bitcoin-core/secp256k1/issues/823.

    I have verified that in some instances it also affects C++, and may even affect std::lexicographical_compare.

    This may be relevant in some of our code (in particular, the CNetAddr IP range checking does comparisons with constants that contain zeroes, but perhaps more).

    Solutions:

    • Build with -fno-builtin-memcmp, but we should measure performance impact.
    • Very carefully inspect the codebase for potential cases, and use a custom memcmp for those.

    TODO:

    • Verify if compiler-generated memcmp calls may be affected as well
  2. sipa added the label Bug on Sep 23, 2020
  3. maflcko added this to the milestone 0.21.0 on Sep 23, 2020
  4. sipa commented at 6:36 pm on September 23, 2020: member
    Another observation by @roconnor-blockstream: it seems memcmp(...) <= 0 is not affected, while memcmp(...) < 0.
  5. roconnor-blockstream commented at 8:49 pm on September 23, 2020: contributor
    Technically my claim was, from my very limited understanding, the bug can only change non-zero results from memcpy to zero results. Still, even in this case if the result of memcmp(x,y) is changed from a 1 to a 0, the value of memcmp(...) <= 0 will change from false to true.
  6. elichai commented at 8:28 pm on September 24, 2020: contributor

    Here is the example of the bug appearing when using the > operator on 2 arrays: https://godbolt.org/z/j5en9d Here it appears with std::lexicographical_compare: https://godbolt.org/z/a6scYK

    -fno-builtin-memcmp Doesn’t do anything for either of these (as they didn’t call memcmp)

  7. sipa commented at 8:58 pm on September 24, 2020: member

    @elichai Great find. That’s… unfortunate.

    It appears that code paths that use memcmp for (in)equality checks (e.g. memcmp(…) == 0) are unaffected. I suspect this means that automatically generated operator== on simple classes/structs aren’t affected either. @real-or-random also found evidence for this in the bug’s code.

    However, @elichai’s finding means that -fno-builtin-memcmp is not enough, as there are C++ headers that directly invoke __builtin_memcmp. Some possibilities (we’re not restricted to just one of them, though):

    • Outlaw compiling with affected compilers entirely (by introducing a sanity check, and a unit test)
    • Figure out exactly which GCC -f... optimization flag controls the presence of the bug, and disabled that (always, or selectively on known-bad compilers)
    • Find the subset of affected C++ headers, and remove them from the codebase (replace them with my_memcmp where needed or so).
  8. elichai commented at 9:26 pm on September 24, 2020: contributor

    Another worth mentioning is that arith_uint256 managed to dodge this: https://github.com/bitcoin/bitcoin/blob/master/src/arith_uint256.h#L217-L222

    while manually going over all the -O2 flags in my GCC 10.2 I found that disabling these 5 flags make the bug disappear: -fno-tree-dce -fno-tree-dominator-opts -fno-tree-fre -fno-tree-pre -fno-code-hoisting

    quickly reading the description of these they don’t sound like something we’d like to disable that quickly :(

  9. real-or-random commented at 6:14 pm on September 25, 2020: contributor

    This is a patch against the broken GCC versions that emits a warning when the bug could be triggered: https://gist.github.com/real-or-random/1548239059158370416dfdc6a866329a

    When I compile the project with this patch, I don’t get any warning. That’s good news. But unfortunately, this is not perfeclty sound. It catches all known examples but it fails to catch the std::lexicographical_compare example in this thread…. Maybe the reason is buried somewhere in this function but I don’t know. https://gcc.gnu.org/onlinedocs/gcc-10.2.0/libstdc++/api/a00596_source.html#l01592

  10. roconnor-blockstream commented at 10:25 pm on September 25, 2020: contributor

    With

    0               location_t loc = EXPR_HAS_LOCATION(exp) ? EXPR_LOCATION (exp) : UNKNOWN_LOCATION;
    1               expanded_location s = expand_location (loc);
    2               fnotice(stderr, "%s:%d: Potentially miscompiled memcmp\n", s.file, s.line);
    

    I get a not so usefully located message, but a message none the less:

    0/nix/store/034wplfrxzhmljmajhmyya2ahg4z0nbl-gcc-9.3.0/include/c++/9.3.0/bits/stl_algobase.h:940: Potentially miscompiled memcmp
    
  11. roconnor-blockstream commented at 2:27 am on September 26, 2020: contributor

    Okay, I think I’ve cracked the puzzle:

    0               location_t loc = EXPR_HAS_LOCATION(exp) ? EXPR_LOCATION (exp) : UNKNOWN_LOCATION;
    1               loc = expansion_point_location_if_in_system_header(loc);
    2               emit_diagnostic(DK_DEBUG, loc, 0, "Potentially miscompiled memcmp");
    
    0In file included from /nix/store/23rhjr5qzbvf4kr0nwb5q3lkbxxl62jq-gcc-9.3.0/include/c++/9.3.0/algorithm:61,
    1                 from test4.cpp:1:
    2/nix/store/23rhjr5qzbvf4kr0nwb5q3lkbxxl62jq-gcc-9.3.0/include/c++/9.3.0/bits/stl_algobase.h: In function ‘int square(std::array<unsigned char, 3>)’:
    3/nix/store/23rhjr5qzbvf4kr0nwb5q3lkbxxl62jq-gcc-9.3.0/include/c++/9.3.0/bits/stl_algobase.h:940:41: debug: Potentially miscompiled memcmp
    4  940 |      if (int __result = __builtin_memcmp(__first1, __first2, __len))
    5      |   
    

    expansion_point_location_if_in_system_header probably doesn’t play a role, but maybe it doesn’t hurt either.

  12. roconnor-blockstream commented at 3:52 am on September 26, 2020: contributor
    I rebuilt bitcoin-0.20.1 (including libsecp256k1) using emit_diagnostic, and I also did not get any miscompiled memcmp messages.
  13. theuni commented at 0:26 am on September 30, 2020: member

    I was curious to see how much work it would be to work around this, so I hacked up (and vendored) my libstdc++ headers in order to add detection for the guilty functions, and started a branch of workarounds.

    The guilty v9 headers (which call __builtin_memcmp themselves) are:

    • bits/char_traits.h
    • bits/stl_algobase.h

    v10 adds:

    • array (I haven’t looked into this one yet)

    The workarounds mostly involve creating comparators for containers to avoid the default lexicographical_compare and getting rid of some std::equal usage. Those are pretty straightforward. But then I discovered that many of the boost string parsing functions end up being guilty.

    For example: boost::split(vDeploymentParams, strDeployment, boost::is_any_of(":")); ends up internally calling std::__equal<true>::equal().

    And while this might be a good excuse to get rid of some of boost usage, I’m afraid that the boost libs themselves are going to be a problem. And I don’t think we want to be patching boost to fix this.

    Even worse than boost, glibc and libstdc++ themselves are likely affected as well (for different reasons).

    I can push up my partial work if anyone is interested, but it’s not looking like this is a good way forward.

  14. theuni commented at 0:38 am on September 30, 2020: member

    I suspect this means that automatically generated operator== on simple classes/structs aren’t affected either.

    That seems to be the case from what I’ve seen so far, but we do end up in __builtin_memcmp when using a map/set of an array/vector of bytes, as those will use std::lexicographical_compare by default for comparison. There are lots of these in psbt.h, for example.

    Thankfully there’s an easy way to specify our own comparator:

     0struct VecByteComp {
     1    bool operator()(const std::vector<unsigned char>& lhs, const std::vector<unsigned char>& rhs) const {
     2        //TODO: fill in
     3        return true;
     4    }
     5};
     6
     7...
     8
     9-        std::set<std::vector<unsigned char>> key_lookup;
    10+        std::set<std::vector<unsigned char>, VecByteComp> key_lookup;
    

    But as I mentioned above, working around those doesn’t get us very far :(

  15. jjyr referenced this in commit 6a60c27353 on Oct 2, 2020
  16. laanwj removed this from the milestone 0.21.0 on Oct 8, 2020
  17. maflcko referenced this in commit ca85449f22 on Feb 8, 2021
  18. sidhujag referenced this in commit f2c9a6f37e on Feb 8, 2021
  19. UdjinM6 referenced this in commit 43ee2ef541 on Oct 23, 2021
  20. UdjinM6 referenced this in commit 1bfb6b9543 on Oct 23, 2021
  21. UdjinM6 referenced this in commit 76528449b5 on Oct 23, 2021
  22. UdjinM6 referenced this in commit 6c797b13e8 on Dec 4, 2021
  23. fanquake referenced this in commit 711ce2e533 on Jan 7, 2022
  24. sidhujag referenced this in commit 870a4d39b0 on Jan 7, 2022
  25. fanquake commented at 1:33 pm on August 4, 2022: member

    According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189#c33, this has been fixed in GCC 10.3.0, which we currently use to build releases. The relevant backported commit, on the releases/gcc-10 branch, is https://github.com/gcc-mirror/gcc/commit/e4c9aac98611f63847ef6c57916808d9a2d7abcb.

    So the fix is in GCC master, 12, 11 and 10 branches. Currently we support GCC 8.1+.

  26. fanquake referenced this in commit 06bd2f2756 on Aug 14, 2022
  27. fanquake referenced this in commit 155c96e841 on Aug 14, 2022
  28. fanquake added the label Upstream on Aug 18, 2022
  29. fanquake referenced this in commit f6e62c6314 on Aug 31, 2022
  30. fanquake commented at 10:05 am on August 31, 2022: member
    Does anyone have an opinion on what we should do here going forward? We can just wait until we bump out minimum required GCC to 10.3.0+ (unclear if that’d happen for 25.x), and then mark this as resolved. We could introduce a new compiler bug unit test, something like https://github.com/fanquake/bitcoin/commit/f6e62c6314c5707da8236a30efa7f9a44cf1eaef, which will fail when someone is using a broken GCC, but that’s hard to do while we are still supporting / testing older compilers in CI. Or we can just continue to leave this open as a known (potential) problem?
  31. ryanofsky commented at 10:28 am on August 31, 2022: contributor

    We could introduce a new compiler bug unit test, something like fanquake@f6e62c6, which will fail when someone is using a broken GCC, but that’s hard to do while we are still supporting / testing older compilers in CI.

    Could add the test but make it pass if IGNORE_GCC_BUG_95189 environment variable is set, and set the environment variable in relevant CI variants. That seems like a way to resolve the issue.

  32. maflcko commented at 10:47 am on August 31, 2022: member
    Instead of setting IGNORE_GCC_BUG_95189 in the CI, the CI could be patched to assert when the bug is hit in real code
  33. theuni commented at 7:19 pm on August 31, 2022: member

    I worked up a clang AST query that can be used to detect the test-case. I’m not sure if it’s perfect though as I don’t know exactly what gcc will screw up (copies of pointers, de-references, etc.). If this is useful, it’d be trivial to hook it up to a clang-tidy check, or even a compiler plugin to make it easier to check depends as well.

    memcmp.cpp

    (exact test-case taken from gcc bug report):

    0using size_t = decltype(sizeof(0));
    1int memcmp(const void *s1, const void *s2, size_t n);
    2
    3static const float z[1] = {0};
    4int f(float x)
    5{
    6    return memcmp(&x, z, sizeof(x));
    7}
    

    In action:

    0$ clang-query ../memcmp.cpp -- -std=c++17
    1clang-query> m callExpr(hasAnyArgument(ignoringImpCasts(declRefExpr(to(varDecl(hasType(qualType(constantArrayType(),isConstQualified())),hasDescendant(integerLiteral(equals(0)))))))),callee(functionDecl(hasName("memcmp"))))
    2
    3Match [#1](/bitcoin-bitcoin/1/):
    4
    5/home/cory/dev/bitcoin-tidy-experiments/build/../memcmp.cpp:7:12: note: "root" binds here
    6    return memcmp(&x, z, sizeof(x));
    7           ^~~~~~~~~~~~~~~~~~~~~~~~
    81 match.
    

    The query:

     0callExpr(
     1    hasAnyArgument(ignoringImpCasts(
     2        declRefExpr(to(varDecl(
     3            hasType(qualType(
     4                constantArrayType(),
     5                isConstQualified()
     6            )),
     7            hasDescendant(integerLiteral(equals(0)))
     8        ))))
     9    ),
    10    callee(functionDecl(hasName("memcmp")))
    11)
    

    also indentation

  34. roconnor-blockstream commented at 2:28 pm on September 1, 2022: contributor
    Particularly for C++ code, I don’t think the clang query is going to be that useful because indirect calls to memcmp can also be problematic. We searched for problematic code by instrumenting GCC itself and I don’t think anything else would be particularly reliable. (See https://blog.cri.epita.fr/post/2020-12-05-a-gcc-bug-tracking-affected-software/)
  35. theuni commented at 9:27 pm on September 1, 2022: member

    @roconnor-blockstream Yep, makes perfect sense to use a patched GCC instead. I also hadn’t considered implicit or self-generated calls to memcmp, which I assume may be created like for memcpy.

    I took a quick look at some of the offenders, many of which are comparisons against string literals like this one: https://github.com/xiph/flac/blob/f764434a39e8a8715d5871bb263189e5a7298280/src/flac/decode.c#L1310

    Which indeed the above query doesn’t catch. I won’t bother with this any further as hacking on gcc clearly makes more sense. Thanks for that work!

  36. PastaPastaPasta referenced this in commit 525e2d8159 on May 10, 2023
  37. maflcko commented at 6:52 pm on February 8, 2024: member

    Closing for now. Feel free to reopen, but I don’t think anything is going to be done about gcc 10.1 and 10.2 at this point.

    See also https://github.com/bitcoin/bitcoin/pull/29091

  38. maflcko closed this on Feb 8, 2024


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-04 22:12 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me