ci: Run ASan/LSan and reorganize sanitizer and Valgrind jobs #846

real-or-random commented at 5:02 PM on November 9, 2020: contributor

This enables asan (including lsan), and restructures the sanitizer builds. It also removes -fno-omit-frame-pointer again. This is debatable. The main reason for removing this is that GCC (but not clang) fails to build (runs out of registers) when combining asan, -fno-omit-frame-pointer, and the x86_64 asm. But I believe without -fno-omit-frame-pointer, the instrumented binary may be closer to the real binary, which is good. If we get unusable debugging output on CI without frame pointers, we can still reproduce the build locally.

ECMULTGENPRECISION=8 : We should disable this entirely, not only on CI, because it's slow and needs too much stack space

real-or-random force-pushed on Nov 11, 2020

real-or-random force-pushed on Nov 12, 2020

real-or-random commented at 12:53 PM on November 12, 2020: contributor

A few things are weird on MacOS, and I don't really understand what's going on:

jobs 49 and 50 fail because lsan does not work with Apple's clang, see config.log. That's expected according to https://github.com/google/sanitizers/issues/1026
jobs 67 succeeds but 68 fails, with the same error message as above. But why the difference? I have no idea, they should use the same toolchain etc. Is this related to gmp? But 67 seem to fail with the "native compiler", and we don't set $CC_FOR_BUILD correctly for gcc. I've added this now.
I'm not convinced that the "gcc" jobs (including 67, 68) really use gcc. It's again Apple's fake gcc according to the $CC --version in after_script. @elichai I added a $CC --version in the script, let's see if this makes a difference. (This will be obsolete anyway when we switch away from travis).

And job 32 times out... well that's not nice but at least I understand what's going on ...

elichai commented at 1:51 PM on November 12, 2020: contributor

I'm not convinced that the "gcc" jobs (including 67, 68) really use gcc. It's again Apple's fake gcc according to the $CC --version in after_script. @elichai I added a $CC --version in the script, let's see if this makes a difference. (This will be obsolete anyway when we switch away from travis).

IIRC that's why I did CC=gcc-9 because OSX has an alias for gcc=clang https://github.com/bitcoin-core/secp256k1/pull/750/files#diff-5c41b2883cfdc97e6b8a07ebe6456643de0160099b7f15ed04afeb2fd627f630R14

https://github.com/bitcoin-core/secp256k1/blob/4232e5b7da0a68adc14fa4b481f7e106403c200d/contrib/travis.sh#L10-L13

real-or-random commented at 6:42 PM on November 13, 2020: contributor

IIRC that's why I did CC=gcc-9 because OSX has an alias for gcc=clang

Apparently that does not seem to work (anymore?).

I think we shouldn't try to fix this now but first move to another CI platform. For example, Cirrus seems to ship with GCC https://github.com/cirruslabs/cirrus-ci-docs/issues/67 .

real-or-random force-pushed on May 14, 2021

real-or-random commented at 11:57 AM on May 14, 2021: contributor

Rebased to work with Cirrus. I also updated the initial comment.

real-or-random marked this as ready for review on May 14, 2021

real-or-random renamed this:
~~Enable more sanitizers on Travis~~
ci: Run asan and add more sanitizer/valgrind jobs
on May 14, 2021

real-or-random force-pushed on May 17, 2021

real-or-random renamed this:
~~ci: Run asan and add more sanitizer/valgrind jobs~~
ci: Run ASan/LSan and reorganize sanitizer and Valgrind jobs
on May 17, 2021

sipa commented at 10:17 PM on May 18, 2021: contributor

utACK 1191bb07b0198ed605b30efebc7a542fbc14958f

in .cirrus.yml:267 in 1191bb07b0 outdated

 269 | +        WRAPPER_CMD: env
 270 | +        CFLAGS: "-fsanitize=undefined,address"
 271 | +        CFLAGS_FOR_BUILD: "-fsanitize=undefined,address"
 272 | +        UBSAN_OPTIONS: "print_stacktrace=1:halt_on_error=1"
 273 | +        ASAN_OPTIONS: "strict_string_checks=1:detect_stack_use_after_return=1:detect_leaks=1"
 274 | +        LSAN_OPTIONS: "use_unaligned=0"

jonasnick commented at 1:41 AM on May 21, 2021:

that's the default according to https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer

real-or-random commented at 8:21 AM on May 21, 2021:

Hm indeed. To be honest, I can't remember where this is from, it was already in my old branch. I think we should set it to 1 actually. Let me do this.

Some background: Both Valgrind and LSan can check for memory leaks, so we would not need LSan. But LSan is very cheap and none of these tools are perfect. So as long as we don't see false positives, it makes sense to just run LSan even though we have Valgrind. And it turns out that use_unaligned=1 is indeed one thing where LSan has better coverage than Valgrind: https://stackoverflow.com/questions/33178242/valgrind-and-pragma-pack2 On my machine the difference in running time of tests 64 was 99,95s with use_unaligned=0 vs 100,02s use_unaligned=1, so this is just noise.

jonasnick commented at 11:52 AM on May 21, 2021:

cool, sgtm

in .cirrus.yml:268 in 1191bb07b0 outdated

 263 | +      env:
 264 | +        # The `--error-exitcode` is required to make the test fail if valgrind found errors, otherwise it'll return 0 (https://www.valgrind.org/docs/manual/manual-core.html)
 265 | +        WRAPPER_CMD: "valgrind --error-exitcode=42"
 266 | +    - name: "UBSan, ASan, LSan"
 267 | +      env:
 268 | +        # Use dummy wrapper command to run the tests outside of make check

jonasnick commented at 1:48 AM on May 21, 2021:

Remind me, what's the reason for not running this (and valgrind) in make check? If it was just about being able to set the test "count"/TEST_ITERS separately then we do have an env var for this now, and could remove it like this https://github.com/jonasnick/secp256k1/commits/202010_ci_matrix-jn (untested).

sipa commented at 2:10 AM on May 21, 2021:

I think it's just that make check will try to run without the wrappers.

jonasnick commented at 2:44 AM on May 21, 2021:

You mean make check is run without a wrapper even when a wrapper is set? Afaics no, because we unset BUILD whenever we set WRAPPER_CMD.

real-or-random commented at 8:01 AM on May 21, 2021:

The main issue (and I think this is what Pieter is partly saying) is that ${WRAPPER_CMD} make check typically won't work for all wrapper commands.

valgrind make check will memcheck make but will not memcheck our tests because they'll be child processes. (Now you could try valgrind --trace-children but then it will still additionally memcheck make which gives false positives and wastes CI time.)
wine make check just won't work at all because make is not a Windows executable.
I haven't tried QEMU but I expect the same as for Wine (because we cross-build).

What we could is to tell make to use the wrapper command. Now that I'm reading up on this again, I recall that it's not even more work to do this. There's the (obscurely named) LOG_COMPILER env variable, which does exactly this. You can even pass LOG_COMPILER="./libtool --mode=execute valgrind" etc, which we don't need now but maybe in the future. Running all tests using make check will have the benefit that the logs always end up in the same place in the CI output. Let me give it a shot.

And thanks for reminding me of SECP256K1_TEST_ITERS. When I was writing this, I didn't know that this exists (I reinvented the name TEST_ITERS). So we should simply set this variable instead, like we do for the benchmarks. This has the nice side effect that it would apply to the ordinary make check builds if we'd like to set it there for some reason.

real-or-random force-pushed on May 21, 2021

real-or-random commented at 10:09 AM on May 21, 2021: contributor

updated as described above

edit: pushed again, I had forgotten the use_aligned=1.

ci: Run ASan/LSan and reorganize sanitizer and Valgrind jobs de4157f13a

real-or-random force-pushed on May 21, 2021

in ci/cirrus.sh:41 in 22e980425b outdated

  53 | -    $QEMU_CMD ./exhaustive_tests
  54 | +    make "$BUILD"
  55 |  fi
  56 |  
  57 | -if [ -n "$WINE_CMD" ]
  58 | +if [ -n "$WRAPPER_CMD" ]

jonasnick commented at 11:50 AM on May 21, 2021:

If LOG_COMPILER works as described, can't we remove this branch? That's also why the CI jobs currently time out: the tests are being run twice.

real-or-random commented at 12:32 PM on May 21, 2021:

fixed

in ci/cirrus.sh:39 in 22e980425b outdated

  45 | +# This limits the iterations in the tests and benchmarks.
  46 | +export SECP256K1_TEST_ITERS="$TEST_ITERS"
  47 | +export SECP256K1_BENCH_ITERS="$BENCH_ITERS"
  48 |  
  49 | -if [ -n "$QEMU_CMD" ]
  50 | +if [ -n "$BUILD" ]

jonasnick commented at 11:51 AM on May 21, 2021:

We don't need to check for this condition $BUILD is always set.

real-or-random commented at 12:32 PM on May 21, 2021:

fixed

real-or-random force-pushed on May 21, 2021

real-or-random commented at 6:43 PM on May 21, 2021: contributor

Hm, it's still timing out. But everything looks correct including the test iterations (like in the revision yesterday, when the longest job took ~ 40 min.) If I'm not mistaken, the only difference to now is that we run the tests in make check and this means the only real difference is that they run concurrently with -j2 and we allocate only one CPU. I pushed a commit to see if this fixes the timeout.

ci: Simplify to use generic wrapper for QEMU, Valgrind, etc fcfcb97e74

tests: Treat empty SECP2561_TEST_ITERS as if it was unset 489ff5c20a

ci: Make test iterations configurable and tweak for sanitizer builds 02dcea1ad9

real-or-random force-pushed on May 21, 2021

real-or-random commented at 6:49 PM on May 21, 2021: contributor

Ok, ignore my last comment. The tests were running twice because I screwed up when rebasing. I hope it's really ready now...

jonasnick commented at 9:16 PM on May 21, 2021: contributor

Nice!

ACK 02dcea1ad9441f857c7768e2b7d304bb19fd2a0c spot-checked ci output, checked that when valgrind ./tests crashes then LOG_COMPILER=valgrind make check also crashes.

sipa commented at 9:38 PM on May 21, 2021: contributor

utACK 02dcea1ad9441f857c7768e2b7d304bb19fd2a0c

jonasnick merged this on May 21, 2021

jonasnick closed this on May 21, 2021

Fabcien referenced this in commit 5f47cd0547 on Jan 9, 2023

deadalnix referenced this in commit 826d64fb6d on Jan 10, 2023