General CI discussion #707

issue real-or-random opened this issue on January 6, 2020
  1. real-or-random commented at 12:53 pm on January 6, 2020: contributor

    When thinking about #705 I realized it’s maybe a good idea to take a step back and look at this in the context of a general CI revamp. There are multiple things that would be nice and I think some of these probably have a higher priority than coverage reports:

    Is there anything else you would like to see?

    If we want to work on this, this is a good time to think about the CI platform. I don’t mind Travis, and it’s been reasonably stable in the past months, but if we think that a different (or additional) provider is a better fit, it’s better to discuss this before we start working on these issues.

    By the way, Bitcoin is currently looking into GitHub Actions for Windows at least: https://github.com/bitcoin/bitcoin/issues/17803

  2. gmaxwell commented at 1:16 am on January 7, 2020: contributor

    I think BE is basically solved in travis since there is s390 (lol). It sounds like the arm is arm64 which is useful but less interesting than 32bit arm.

    The big gaps I see in travis:

    • You cannot usefully benchmark, so you can’t watch for performance regressions– the systems it runs on are wildly different in performance and wildly inconsistent in performance because they’re VPSes which are concurrently running other loads.

    • Getting work product output is not supported (a workaround seems possible), meaning you can’t get CI coverage analysis or static analysis reports… except via integrating with other services, which creates an unnecessary disparity between the free software tools the developers can use locally and the (usually commercial) services that the project ends up beholden to.

    • Travis CPU time is extremely limited (the fact that the travis executors are slow disguises how limited it is, e.g. it takes minutes for what would be tens of seconds locally). This means that stuff like automated fuzz testing at a scale which is likely to find interesting bugs is essentially off the table, at least as part of CI tests. This also clearly distorts testing priorities, with stuff like not using valgrind or cutting down tests just to fit under travis’ limits.

    To me the obvious thing to do is to run a jenkins instance. Jenkins is extremely simple to operate (in my experience much easier to use than travis, because you’re less subject to trial and error– particularly because if you’re the group running it you can just ssh in and try the tests locally until they work). Jenkins supports remote executors (and you can confine jobs to only run on particular executors), so it’s trivial to have an executor that runs on dedicated hardware that allows benchmarking… or on fast systems.

    The big challenge I see is that the travis ecosystem has created an expectation that PRs from random people will get CI run against them. And having random people’s code run on your own hosts/networks is super scary. One option would be to just run the jenkins on PRs by whitelisted people and on merged commits (this is apparently a standard feature of the jenkins PR integration module, which I’ve never used– my experience has been w/ just running it on merged commits). I can offer to run some executors (and can do arm7, big endian systems, whatever) on an isolated network that just vpns back to the jenkins install and has no network access. But the limitation there is that if I get hit by a bus you might have issues finding someone else willing to volunteer a half rack of assorted old hardware.

    An extension of the “can’t benchmark” point is that the library really should have regression tests for sidechannel behaviour. Right now someone could flip a compiler flag, the cmovs stop being constant, and you get an enormous sidechannel and no one would notice until it shows up in an attack (or hopefully, attack paper). There are a couple different ways of dealing with that which would allow CI integration of a sidechannel test, but most of them require being able to run tests on bizarre hardware (e.g. with a jenkins remote executor). The one option that wouldn’t would be running tests in a cycle accurate cpu simulator– essentially like running in valgrind but could get cycle counts and dram access traces as a side effect which could be compared across different secrets to be sure they were constant, but when I went down that path a couple years ago none that I could find simulated dram. @tdaede might have more recent suggestions.
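    A rough approximation of such a check, well short of a cycle-accurate simulator, can be sketched with valgrind’s callgrind tool; here ./bench_sign and the key files are hypothetical stand-ins, not existing binaries in the repo:

```shell
# Sketch: compare instruction counts across two different secrets.
# (./bench_sign and the key files are hypothetical stand-ins; callgrind
# counts instructions rather than cycles, so this only catches
# secret-dependent control flow, not variable-latency instructions or
# cache/dram effects.)
valgrind --tool=callgrind --callgrind-out-file=cg_a ./bench_sign key_a.bin
valgrind --tool=callgrind --callgrind-out-file=cg_b ./bench_sign key_b.bin

# Identical totals for different secrets is weak evidence of
# secret-independent execution; a mismatch is a red flag.
grep '^summary:' cg_a cg_b
```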

  3. gmaxwell commented at 7:18 am on February 4, 2020: contributor
    Travis hiding the output of ./tests failing really stinks; did it use to do that?
  4. real-or-random commented at 10:29 am on February 4, 2020: contributor
    We cat the log files in the end, but you need to “unfold” the output by clicking the arrow. https://travis-ci.org/bitcoin-core/secp256k1/jobs/641332511 Or are you talking about something else?
  5. gmaxwell commented at 1:31 pm on February 4, 2020: contributor
    Ah, I missed the ‘unfold’. Thanks.
  6. fanquake commented at 10:23 pm on February 4, 2020: member

    By the way, Bitcoin is currently looking into GitHub Actions for Windows at least: bitcoin/bitcoin#17803

    FYI. We’ve removed the GitHub Actions workflow from our repository. See discussion in that issue and https://github.com/bitcoin/bitcoin/pull/18031.

  7. gmaxwell commented at 5:41 am on February 11, 2020: contributor

    The CI should probably do a build on a system without libgmp to catch accidental header dependencies.

    The CI should probably have a makefileless build (e.g. just a big GCC command line that sets all the required defines).
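    A sketch of what such a make-less build could look like; the -D defines would have to mirror whatever ./configure writes into the generated config header, and the macro names below are illustrative rather than a vetted set:

```shell
# Sketch of a make-less build: compile the single translation unit
# directly, passing by hand the configuration macros that ./configure
# would normally generate (macro names illustrative, not a vetted set).
gcc -O2 -I. -Iinclude \
    -DECMULT_WINDOW_SIZE=15 -DECMULT_GEN_PREC_BITS=4 \
    -DUSE_NUM_NONE=1 -DUSE_FIELD_INV_BUILTIN=1 -DUSE_SCALAR_INV_BUILTIN=1 \
    -c src/secp256k1.c -o secp256k1.o
```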

  8. real-or-random commented at 8:41 pm on April 15, 2020: contributor
  9. real-or-random commented at 3:49 pm on May 7, 2020: contributor
    We may want to look into Azure Pipelines: https://azure.microsoft.com/de-de/services/devops/pipelines/
  10. real-or-random commented at 3:47 pm on November 4, 2020: contributor
    We have the ubsan sanitizer now on Travis (https://github.com/bitcoin-core/secp256k1/pull/839). Has anyone tried other sanitizers that we may want to enable?
  11. sipa commented at 4:09 pm on November 4, 2020: contributor

    asan/lsan is generally as easy as ubsan to enable (and they can all be enabled simultaneously) though has a higher performance impact. It can catch out-of-bound accesses (also on the stack), and memory leaks.
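    For reference, a sketch of what enabling these together could look like, with flag spellings as documented for GCC/Clang:

```shell
# Sketch: ASan+UBSan in one build; LSan is enabled as part of ASan on
# Linux. -fno-omit-frame-pointer keeps stack traces readable.
./configure CC=clang \
    CFLAGS="-g -O1 -fsanitize=address,undefined -fno-omit-frame-pointer"
make -j2 check
```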

    tsan isn’t useful for us as there is no threading.

    msan is a pain to integrate (as it requires building libc and other dependencies with msan as well), and we already have valgrind which largely detects the same issues.

  12. elichai commented at 4:13 pm on November 4, 2020: contributor

    msan is a pain to integrate (as it requires building libc and other dependencies with msan as well), and we already have valgrind which largely detects the same issues.

    This isn’t really correct for C; msan contains shims for the common libc uses, so there shouldn’t be a need to rebuild libc to run msan here.

  13. real-or-random commented at 10:31 am on November 11, 2020: contributor

    Travis is effectively shutting down for open source: https://github.com/bitcoin/bitcoin/issues/17802

    Options:

    • Be lazy and pay them: but I don’t think we’re happy with Travis in general
    • migrate to Cirrus CI: Cirrus would work. We need to move s390x to qemu but native s390x is unreliable on Travis anyway
    • migrate to Shippable: I was playing around with this a few weeks ago (https://app.shippable.com/github/real-or-random/secp256k1/dashboard). This looked good at first glance and builds are fast. But it seems that the project will be dead soon (no activity in their github repo, not even activity or replies on twitter). So I don’t think we’ll be happy with this for long.
    • do our own thing (e.g., Jenkins run by @gmaxwell )
    • look into other CIs, e.g., Azure Pipelines

    Non-options:

    edit: Azure is not good either (screenshot from 2020-11-11 omitted)

  14. gmaxwell commented at 12:26 pm on November 11, 2020: contributor

    Any public “free” solution is going to either be short lived or extremely stingy with cpu time for obvious economic reasons– or, likely, both. Any “custom rolled” solution has the life limits from their sponsors not having time for maintenance, hardware failures, etc. but presumably wouldn’t be cpu resources limited. Custom rolled can also guarantee uncontested access to systems, allowing for real benchmarking.

    I suggest regardless: All testing should be set up as generic shell scripts (and/or python if really needed) that could be invoked from any CI system– they’re pretty much there already except for the travis matrixing. Then it becomes easier to move things around.
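    A minimal sketch of that shape, assuming illustrative variable and flag names rather than the project’s actual Travis matrix variables:

```shell
# Sketch of a CI-agnostic driver: environment variables in, one
# ./configure flag string out, so any CI system (or a developer shell)
# can call it the same way. Names are illustrative.
build_flags() {
    flags=""
    if [ "${WITH_VALGRIND:-no}" = "yes" ]; then
        flags="$flags --enable-valgrind"
    fi
    if [ -n "${HOST:-}" ]; then
        flags="$flags --host=$HOST"
    fi
    printf '%s\n' "$flags"
}
```

    A CI job (or a developer locally) would then run something like `./configure $(build_flags) && make check`, with the matrix expressed purely as environment variables.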

    The project isn’t limited to just using one thing. If some free hosted thing works then it would probably be good to run it in parallel with whatever else gets set up.

    [I would already have a jenkins setup running but for ongoing power usage limitations here which are finally almost resolved.]

    We need to move s390x to qemu but native s390x is unreliable on Travis anyway

    I don’t think having s390x is particularly important, esp since it’s slightly broken for reasons out of our hands– but having a BE target is nice. Debian on MIPS BE (with valgrind too) and Sparc run fine in qemu-system though both quite slow, and I assume BE PPC works but I haven’t tried it. …and each of these is way more important than s390x: They’re all found in really cool spacecraft. I don’t know where s390x machines are found, but certainly not in space. Armonk, New York, probably. Space is much better than Armonk: For one thing, space doesn’t have any computer crackers.
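    For completeness, the user-mode qemu route (as opposed to the qemu-system setups described above) can be enough for plain test runs and is easy to wire into a hosted CI; a sketch assuming Debian’s cross toolchain packages and triplets:

```shell
# Sketch: cross-compile for big-endian MIPS and run the tests under
# user-mode qemu (Debian package names/triplets assumed; full-system
# qemu as described above is needed for anything beyond plain runs).
./autogen.sh
./configure --host=mips-linux-gnu
make -j2
qemu-mips -L /usr/lib/mips-linux-gnu ./tests 16
```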

  15. real-or-random commented at 12:40 pm on November 11, 2020: contributor

    I don’t think having s390x is particularly important, esp since it’s slightly broken for reasons out of our hands– but having a BE target is nice. Debian on MIPS BE (with valgrind too) and Sparc run fine in qemu-system though both quite slow, and I assume BE PPC works but I haven’t tried it. …and each of these is way more important than s390x:

    Yes, sure. I don’t care about s390x in particular. We should just have some BE architecture. If it’s just for BE testing, we should probably just take the one which is fastest on qemu. Of course, having valgrind is better than not having it, and the probability that the Sparc build is relevant for some users is larger than for s390x.

  16. real-or-random cross-referenced this on Nov 11, 2020 from issue ci: Migration from AppVeyor to GitHub Actions by hebasto
  17. gmaxwell commented at 9:25 pm on November 11, 2020: contributor
    Well I know mips with valgrind works, and I successfully got debian in sparc working but there is no valgrind. (which is too bad, sparc otherwise tends to expose some issues due to register windowing).
  18. real-or-random commented at 12:02 pm on November 12, 2020: contributor

    The project isn’t limited to just using one thing. If some free hosted thing works then it would probably be good to run it in parallel with whatever else gets set up.

    Indeed.

    More on the free thing: I changed my mind on Github Actions. I think it’s acceptable, see https://github.com/bitcoin/bitcoin/issues/17803#issuecomment-725551444. But Marko seems to disagree and he has some valid points.

    I think Cirrus CI is the best free/“cloud” option for now.

  19. jonasschnelli commented at 12:13 pm on November 12, 2020: contributor

    There are plans to increase the stability and performance (as well as supported architectures) on bitcoinbuilds.org. There is also a GitHub integration (running only here for now). The CI is completely custom built and easy to extend. I’m currently trying to set up a team of devs/admins.

    Ideally it would support running tests on hardware provided by authenticated developers/users. Probably not what you are looking for today but could be something for the long run.

  20. real-or-random commented at 12:56 pm on November 12, 2020: contributor

    Probably not what you are looking for today but could be something for the long run.

    Thanks for letting us know. This sounds great and is certainly an option for the long run.

  21. apoelstra commented at 6:27 pm on November 16, 2020: contributor
    I am working on building a new x86 system with many cores and am happy to replicate any @gmaxwell script infrastructure, to reduce the bus number. But I don’t have any oddball hardware, nor do I have plans to amass any.
  22. JeremyRand commented at 7:00 pm on December 10, 2020: none

    edit: Azure is not good either: @real-or-random The Azure docs claim that it can be used with generic Git repos (not requiring GitHub). Would that mode be suitable? I assume it wouldn’t demand problematic GitHub permissions since it has no knowledge of GitHub in that mode, but I may be missing something.

  23. kroggen commented at 6:08 pm on December 16, 2020: none
    This is how I am building my fork (with VRF support) for Windows using GitHub Actions and Visual C: here
  24. real-or-random commented at 1:37 pm on December 23, 2020: contributor
    I’m heavily working on Cirrus CI integration in https://github.com/real-or-random/secp256k1/tree/202012-cirrusci . It looks promising so far, but I want to continue to hack on this a little bit and clean it up before I open a PR here. (And anyway, a PR wouldn’t be built currently because Cirrus is not yet enabled.)
  25. real-or-random cross-referenced this on Jan 7, 2021 from issue Add support for Cirrus CI by real-or-random
  26. real-or-random commented at 8:26 pm on January 14, 2021: contributor

    notes from irc:

    20:38 <real_or_random> I didn't know it's so easy to run tests in valgrind: make check LOG_COMPILER=valgrind
    20:38 <real_or_random> or if we add benchmarks, this would also work: make check LOG_COMPILER='./libtool --mode=execute valgrind'
    20:39 <real_or_random> nice thing: it runs exactly like "make check", e.g., it uses multiple threads
    20:41 <real_or_random> I believe this will be very helpful for simplifying the ci shell script
    
  27. real-or-random commented at 2:49 pm on March 8, 2021: contributor

    By the way, I still don’t like the way the results are shown in Cirrus. We run make check, which runs ./tests and ./exhaustive_tests in parallel and writes their outputs to log files. That means that when ./tests fails, the make check step is red, and then you click on it but you don’t see any real error message. Instead, the real error is in a green section below. (See downstream example: https://cirrus-ci.com/task/6253397255913472?command=cat_tests_log#L4) Not a huge deal but simply not great for people who don’t know what’s going on.

    We could simply run the tests without make check but then we don’t test the automake part of make check. Maybe we can grep the log files for PASS or FAIL and make the “cat” script fail depending on this.
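    A sketch of that grep idea; exactly what to match depends on what automake puts into the logs, and matching the :test-result: lines automake writes into the .trs files may be more robust than grepping the raw logs:

```shell
# Sketch: cat each log so the output still shows up in the CI step, and
# fail the step if any log records a failure. (What to grep for depends
# on what automake writes; the .trs result files may be more reliable.)
cat_logs() {
    status=0
    for log in "$@"; do
        cat "$log"
        if grep -q "FAIL" "$log"; then
            echo "error: failure recorded in $log" >&2
            status=1
        fi
    done
    return $status
}
```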

  28. real-or-random commented at 4:24 pm on June 28, 2021: contributor
    Maybe the clang static analyzer (scan-build) is a nice tool we could include in the CI: #959 (comment)
  29. real-or-random commented at 11:56 am on August 20, 2021: contributor
  30. real-or-random cross-referenced this on Nov 15, 2022 from issue More ARM Linux builds, fewer MacOS builds by real-or-random
  31. real-or-random commented at 7:42 pm on November 16, 2022: contributor
    We could add a few more jobs with -O3 to increase the coverage of CI tests.
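    Concretely, such a job would just override the default optimization flags, e.g. (a sketch):

```shell
# Sketch: an extra CI matrix entry exercising -O3 instead of the default.
./configure CFLAGS="-g -O3"
make -j2 check
```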
  32. real-or-random cross-referenced this on Nov 16, 2022 from issue Add MSan CI jobs by sipa

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-10-30 01:15 UTC

This site is hosted by @0xB10C