Add simple static checker based on clang-query #833

pull real-or-random wants to merge 2 commits into bitcoin-core:master from real-or-random:202010-clang-query changing 5 files +76 −0
  1. real-or-random commented at 4:43 pm on October 15, 2020: contributor

    This add a simple static checker based on clang-query, which is a tool that could be described as a “clever grep” for abstract syntax trees (ASTs).

    As an initial proof of usefulness, this commit adds these checks:

    • No uses of floating point types.
    • No use of certain reserved identifiers (e.g., “mem(…)”, #829).
    • No use of memcmp (#823).

    The checks are easily extensible.

    The main purpose is to run the checker on CI, and this commit adds the checker to the Travis CI script.

    This currently requires clang-query version at least 10. (However, it’s not required not compile with clang version 10, or with clang at all. Just the compiler flags must be compatible with clang.) Clang-query simply uses the clang compiler as a backend for generating the AST.

    In order to determine the compile in which the code is supposed to be compiled (e.g., compiler flags such as -D defines and include paths), it reads a compilation database in JSON format. There are multiple ways to generate this database. The easiest way to obtain such a database is to use a tool that intercepts the make process and build the database.

    On Travis CI, we currently use “bear” for this purpose. It’s a natural choice because there is an Ubuntu package for it. If you want to run this locally, bear is a good choice but other tools such as compiledb (Python) are available.

  2. real-or-random force-pushed on Oct 15, 2020
  3. real-or-random force-pushed on Oct 15, 2020
  4. real-or-random force-pushed on Oct 15, 2020
  5. real-or-random commented at 7:38 pm on October 15, 2020: contributor

    Here’s another interesting one: match binaryOperator(allOf(unless(isExpansionInSystemHeader()), hasOperatorName("/")))

    Unfortunately we use quite a few divisions, so this does not lead anywhere.

    We could also check easily for reserved identifiers.

    More suggestions?

  6. real-or-random cross-referenced this on Oct 15, 2020 from issue memcmp may be miscompiled by GCC by real-or-random
  7. real-or-random commented at 9:15 pm on October 15, 2020: contributor
    grml, apparently clang-query uses a compile_commands.json file to infer compile options/flags, which I had lying around in my local tree from some old experiments in 2018… So I need to rework this.
  8. gmaxwell commented at 10:01 pm on October 15, 2020: contributor

    Unfortunately we use quite a few divisions, so this does not lead anywhere.

    Is there any way to tell if the divisor is a compile time constant? We generally don’t want divisions ending up in the machine code (though if its in some setup routine it’s not that big a deal) as they’re non-constant time[1] and slow (e.g. 50-100 cycles on x86, worse on a lot of arm). … but divisions with constants get converted by the compiler to multiples via strength reduction. I believe the codebase is free or nearly free of divisions with non-constant divisors (maybe there is one in scratch space sizing or something like that).

    I’ve failed to yet fully impress my practice of always using masking over division onto Pieter yet… but there are some cases where masking doesn’t easily work.

    [1] also, stock valgrind ctime check will not catch divisions involving secret data. I have a pretty trivial patch to valgrind on my laptop so that it does… and it passed for me as of the last time I ran it. I’m not really eager to figure out how to make travis use a custom compiled copy of valgrind…

  9. real-or-random force-pushed on Oct 16, 2020
  10. real-or-random commented at 1:18 pm on October 16, 2020: contributor

    Unfortunately we use quite a few divisions, so this does not lead anywhere.

    Is there any way to tell if the divisor is a compile time constant? We generally don’t want divisions ending up in the machine code (though if its in some setup routine it’s not that big a deal) as they’re non-constant time[1] and slow (e.g. 50-100 cycles on x86, worse on a lot of arm). … but divisions with constants get converted by the compiler to multiples via strength reduction. I believe the codebase is free or nearly free of divisions with non-constant divisors (maybe there is one in scratch space sizing or something like that).

    Right, for example.

    0Match [#103](/bitcoin-core-secp256k1/103/):
    1
    2./src/ecmult_impl.h:1030:27: note: "root" binds here
    3    *n_batch_points = 1 + (n - 1) / *n_batches;
    4                          ^~~~~~~~~~~~~~~~~~~~
    

    This is the reference for the matcher: https://clang.llvm.org/docs/LibASTMatchersReference.html

    You can match on “constantExpr” but this is not what we want. This matches constant expression as seen from the point of view of the AST, i.e., this matches places where only constant expressions are allowed (such as after case).

    This technique in general does not enable us to get insights on what the compiler will do with the code. It’s really just pattern matching on the AST, so it’s merely a clever grep. (One can do more advanced things with clang tooling but this requires more effort than writing matching expressions.)

    What we could do is to check if the divisor is an integer literal, but even this is not precise we have also a few (x-1) divisors for example.

  11. real-or-random force-pushed on Oct 16, 2020
  12. real-or-random force-pushed on Oct 16, 2020
  13. real-or-random force-pushed on Oct 16, 2020
  14. real-or-random commented at 2:40 pm on October 16, 2020: contributor
    This uses https://github.com/rizsotto/Bear, which intercepts make, to generate a JSON compilation database (see https://clang.llvm.org/docs/JSONCompilationDatabase.html). Bear is nice because we can easily install it on Travis. There are other tools that may work better when used locally, e.g., https://github.com/nickdiego/compiledb .
  15. real-or-random force-pushed on Oct 16, 2020
  16. real-or-random force-pushed on Oct 16, 2020
  17. real-or-random force-pushed on Oct 16, 2020
  18. real-or-random force-pushed on Oct 16, 2020
  19. real-or-random force-pushed on Oct 16, 2020
  20. real-or-random force-pushed on Oct 16, 2020
  21. real-or-random force-pushed on Oct 16, 2020
  22. real-or-random force-pushed on Oct 16, 2020
  23. real-or-random force-pushed on Oct 16, 2020
  24. real-or-random force-pushed on Oct 16, 2020
  25. real-or-random commented at 9:56 pm on October 16, 2020: contributor

    Ready for review.

    Travis fails but this is intentionally. See it in action here:

    https://travis-ci.org/github/bitcoin-core/secp256k1/jobs/736511002#L1254 (complains about memczero declaration, see #829) https://travis-ci.org/github/bitcoin-core/secp256k1/jobs/736511002#L1313 (complains about reserved _t type declared in benchmarks)

    Edit: argh, the links to the exact lines don’t work due to the output folding. I might add a commit that restructures the Travis output in general by adding folds to the main commands: https://github.com/travis-ci/docs-travis-ci-com/issues/949#issuecomment-276755003 Then all the main commands be folded unless they fail, so see only the errors. Moreover we don’t need the cats at the end (which have confused people in the past) and we don’t need to ugly hack that I introduced here to make the color output work with cat.)

  26. real-or-random marked this as ready for review on Oct 16, 2020
  27. sipa commented at 7:16 am on October 17, 2020: contributor
    This is pretty cool. I’ll have to play with it a bit.
  28. real-or-random force-pushed on Oct 19, 2020
  29. real-or-random renamed this:
    Add simple static checker using clang-query
    Add simple static checker based on clang-query
    on Oct 19, 2020
  30. real-or-random commented at 3:57 pm on October 19, 2020: contributor
    I added a long commit message and updated the PR description to this message.
  31. real-or-random referenced this in commit e89278f211 on Oct 20, 2020
  32. real-or-random cross-referenced this on Oct 20, 2020 from issue Don't use reserved identifiers memczero and benchmark_verify_t by real-or-random
  33. Add simple static checker based on clang-query
    This add a simple static checker based on clang-query, which is a
    tool that could be described as a "clever grep" for abstract syntax
    trees (ASTs).
    
    As an initial proof of usefulness, this commit adds these checks:
      - No uses of floating point types.
      - No use of certain reserved identifiers (e.g., "mem(...)", #829).
      - No use of memcmp (#823).
    
    The checks are easily extensible.
    
    The main purpose is to run the checker on CI, and this commit adds
    the checker to the Travis CI script.
    
    This currently requires clang-query version at least 10. (However,
    it's not required not compile with clang version 10, or with clang
    at all. Just the compiler flags must be compatible with clang.)
    Clang-query simply uses the clang compiler as a backend for
    generating the AST.
    
    In order to determine the compile in which the code is supposed to be
    compiled (e.g., compiler flags such as -D defines and include paths),
    it reads a compilation database in JSON format. There are multiple ways
    to generate this database. The easiest way to obtain such a database is
    to use a tool that intercepts the make process and build the database.
    
    On Travis CI, we currently use "bear" for this purpose. It's a natural
    choice because there is an Ubuntu package for it. If you want to run
    this locally, bear is a good choice but other tools such as compiledb
    (Python) are available.
    4831f2ed4b
  34. real-or-random force-pushed on Oct 22, 2020
  35. elichai commented at 10:46 am on October 22, 2020: contributor

    FWIW

    Names beginning with a capital ‘E’ followed a digit or uppercase letter Names that begin with either ‘is’ or ‘to’ followed by a lowercase letter Names that begin with ‘LC_’ followed by an uppercase letter Names of all existing mathematics functions suffixed with ‘f’ or ‘l’ Names that begin with ‘SIG’ followed by an uppercase letter Names that begin with ‘SIG_’ followed by an uppercase letter Names beginning with ‘str’, ‘mem’, or ‘wcs’ followed by a lowercase letter Names that end with ‘_t’

  36. real-or-random commented at 10:53 am on October 22, 2020: contributor
    @elichai Yeah, where did you get this list from? Is this really all about all names? I thought E is only for macros. We can’t check for macros using clang-query but we could add a simple grep for #define[[:space:]+]E etc. I also have uncommitted changes to add https://clang.llvm.org/docs/DiagnosticsReference.html#wreserved-id-macro if the compiler supports it (this covers only _ and __ macros). Not sure if overkill but it’s not a lot of work.
  37. elichai commented at 11:15 am on October 22, 2020: contributor

    @elichai Yeah, where did you get this list from? Is this really all about all names? I thought E is only for macros.

    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html

    About E it’s weird, in the gcc docs it says for additional error code names which are macros, and the exact wording of the standard is:(7.1.4)

    Macros that begin with E and a digit or E and an uppercase letter (followed by any combination of digits, letters, and underscore) may be added to the declarations in the <ermo. h> header.

    But on 7.1.3 “Reserved identifies” it says:

    Each header declares or defines all identifiers listed in its associated subclause, and optionally declares or defines identifiers listed in its associated future library directions subclause and identifiers which are always reserved either for any use or for use as file scope identifiers.

    This paper proposing to change that also agrees on that interpretation: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2572.pdf (See “Future Library Directions”)

    However, p1 makes it clear that all identifiers reserved from this subclause are reserved identifiers regardless of what header files are included, meaning that these rules apply to all C code

    In effect, these identifiers are reserved for all uses in C regardless of what header files (if any) are included

    EDIT: I realize now that you asked about identifier vs macro, and not about if it’s related to errno.h or not. it sounds like it’s only for macros but the lawyering here is delicate

  38. elichai commented at 11:55 am on October 22, 2020: contributor

    Standard on 6.8.3 Semantics:

    The identifier immediately following the define is called the macro name

    and also Aaron Ballman(the author of that paper, and part of WG14 committee) told me:

    macros are defined using an identifier, so “macro name” and “identifier” are interchangable “reserved for use as a macro name” is just saying how it’s intended to be used, not that macro names are a special thing

    So yeah :/

  39. gmaxwell commented at 1:52 pm on October 22, 2020: contributor
    It makes sense: if some macro is going to get called E1234 then you obviously can’t have a variable named E1234 without trouble.
  40. sipa referenced this in commit 9e5939d284 on Nov 4, 2020
  41. Add -Wreserved-id-macro to compile flags (only clang)
    This warns when we define macros with some reserved identifiers, e.g.,
    with underscores. It does not catch all reserved identifiers though.
    bb8cf03a28
  42. real-or-random cross-referenced this on Mar 17, 2021 from issue schnorrsig API overhaul by jonasnick
  43. real-or-random cross-referenced this on Nov 3, 2021 from issue Make fe magnitude implied statically by real-or-random
  44. real-or-random cross-referenced this on Mar 14, 2022 from issue This may be superior to clang-query by real-or-random
  45. real-or-random cross-referenced this on Aug 23, 2022 from issue Add `tools/symbol-check.py` by hebasto
  46. real-or-random commented at 5:36 pm on May 8, 2023: contributor

    Hm, closing for now… clang-query is a neat idea, but I’m not convinced that the checks introduced here (namely mostly “reserved identifier” stuff) are worth maintaining this, in particular because clang-query is a bit finicky.

    This conclusion could certainly change if we want to use something like this here for tracking magnitude (https://github.com/bitcoin-core/secp256k1/issues/1001) or other advanced stuff. And then, we may also want to look into other similar tools such as CodeQL.

  47. real-or-random closed this on May 8, 2023


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-22 09:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me