ci: Move more tasks to GHA?

maflcko commented at 8:01 am on June 19, 2024: member

Motivated by #29274 to make it easier to run the CI on forks, more tasks could be moved to GHA, similar to d97ddbe797f5b8b3bca0ee71b692e542b8990195?

The downside would be that it is harder to re-run a task (only maintainers can do it, not the pull request author).

Another downside would be that caching depends artefacts and docker images is hard on GHA. So ideally only tasks with NO_DEPENDS=1 are moved for now. It would be:

ci/test/00_setup_env_native_fuzz.sh:export NO_DEPENDS=1
ci/test/00_setup_env_native_tidy.sh:export NO_DEPENDS=1
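
The two lines above have the shape of `grep` output. A minimal sketch for reproducing the list from a checkout, assuming the task files follow the `ci/test/00_setup_env_*.sh` naming seen in those two paths (the function name `list_no_depends_tasks` is made up for illustration):

```sh
# Print the task environment files that set NO_DEPENDS=1.
# Assumption: task files are named 00_setup_env_*.sh and live in the
# directory given as $1 (default: ci/test, relative to the repo root).
list_no_depends_tasks() {
  # -l prints only the names of matching files; sort keeps the output stable.
  grep -l -- 'export NO_DEPENDS=1' "${1:-ci/test}"/00_setup_env_*.sh 2>/dev/null | sort
}
```

Run from the repository root, `list_no_depends_tasks` should print one candidate file per line.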

Any other thoughts, or volunteers to move the tasks?

maflcko added the label Brainstorming on Jun 19, 2024

maflcko added the label Tests on Jun 19, 2024

maflcko commented at 8:01 am on June 19, 2024: member

cc @Sjors @m3dwards

bitcoin deleted a comment on Jun 19, 2024

m3dwards commented at 10:34 am on June 19, 2024: contributor

I didn’t know PR authors could re-run tasks on Cirrus.

It is nice that you can run the jobs on your own fork. I quite often now just push a random commit to my fork to trigger the CI jobs as an experiment.

Conceivably the jobs could be on both Cirrus and GHA and only run on GHA for forks. Extra maintenance burden probably not worth it though.

How are the depends artefacts cached on Cirrus? And which docker images are you referring to? The CI build one?

Happy to volunteer to move more tasks.

maflcko commented at 10:47 am on June 19, 2024: member

How are the depends artefacts cached on Cirrus? And which docker images are you referring to? The CI build one?

Cirrus itself has a simple and easy to use cache instruction. However, currently, the cache is implicit, because persistent workers are used.

With images I mean the ones listed by podman image ls, that is:

REPOSITORY                                     TAG         IMAGE ID      CREATED        SIZE
localhost/ci_native_asan                       latest      582be28ff8c1  15 hours ago   1.81 GB
localhost/ci_native_valgrind                   latest      fa2461c0e0d5  3 days ago     1.36 GB
localhost/ci_native_fuzz_msan                  latest      682198747e18  5 days ago     6.02 GB
localhost/ci_native_fuzz_valgrind              latest      84fec02871f5  5 days ago     1.28 GB
localhost/ci_macos_cross                       latest      49b2d3ad6d04  6 days ago     1.62 GB
localhost/ci_s390x                             latest      d5fe9fb0978a  8 days ago     539 MB
localhost/ci_native_msan                       latest      901b867ade25  8 days ago     6.02 GB
localhost/ci_native_tidy                       latest      15fed375c141  8 days ago     2.64 GB
localhost/ci_win64                             latest      5e2364ec8c8c  8 days ago     2.61 GB
localhost/ci_native_previous_releases          latest      c8feaac3f9ea  8 days ago     537 MB
localhost/ci_native_nowallet_libbitcoinkernel  latest      951fd4615e36  8 days ago     909 MB
localhost/ci_native_fuzz                       latest      4875a0dcc4c7  8 days ago     1.21 GB
localhost/ci_i686_centos                       latest      a331b16f0046  11 days ago    704 MB
localhost/ci_arm_linux                         latest      34961f67c7ab  11 days ago    892 MB
localhost/ci_native_tsan                       latest      9d7b28339df2  11 days ago    1.05 GB
localhost/ci_i686_multiprocess                 latest      7e3205a702fd  12 days ago    1.08 GB

When they only cache the result of apt install ..., they mostly serve to avoid outages of the Ubuntu mirror, as well as a small speed-up. However, for heavy images like the msan one, they cache the llvm compilation, which is quite CPU heavy.

m3dwards commented at 11:01 am on June 19, 2024: contributor

Could we use this to cache the images? https://docs.docker.com/build/cache/backends/gha/

We are using the GHA cache at the moment, is there a reason why this wouldn’t work for depends? Or is it just the effort required to split up the current CI script into different steps to take advantage of GHA cache?

maflcko commented at 11:10 am on June 19, 2024: member

We are using the GHA cache at the moment, is there a reason why this wouldn’t work for depends?

It has a limit of 10 GB, so I am not sure if it can fit everything. https://github.com/bitcoin/bitcoin/actions/caches
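
As a rough check against that limit, the image sizes from the `podman image ls` listing earlier in this thread can be summed. This is only a sketch: the sizes are the listed figures converted to MB, a compressed cache would likely be smaller, and 10 GB is treated as 10000 MB for simplicity.

```sh
# Sizes in MB, copied from the SIZE column of the podman listing
# (GB values multiplied by 1000).
images_total_mb=0
for size_mb in 1810 1360 6020 1280 1620 539 6020 2640 2610 537 909 1210 704 892 1050 1080; do
  images_total_mb=$((images_total_mb + size_mb))
done

# Assumption: the 10 GB limit is treated as 10000 MB.
images_limit_mb=10000
echo "images: ${images_total_mb} MB, limit: ${images_limit_mb} MB"
if [ "$images_total_mb" -gt "$images_limit_mb" ]; then
  echo "the images alone would not fit"
fi
```

The listed images add up to 30281 MB, roughly three times the limit, before counting any depends or ccache entries.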

willcl-ark commented at 11:18 am on June 19, 2024: member

I actually meant to ask this in #30193, but why do we cache using run-id in the key ${{ github.job }}-ccache-${{ github.run_id }} ? As we only cache on master, using only ${{ github.job }}-ccache would make more sense to me; a single rolling cache per job.

When we search for the cache to load we use a “wildcard” restore restore-keys: ${{ github.job }}-ccache- (with no run_id).

This would remove some “duplicates”, e.g “macos-native-x86_64-ccache-” has 3 cache entries, when it only needs 1?

Am I missing some reason for doing things this way?

maflcko commented at 11:22 am on June 19, 2024: member

Am I missing some reason for doing things this way?

See #28292 (review) . This is one of the reasons why I personally don’t like GHA: It is a closed, confusing, and brittle ecosystem. The only benefit is that it is free (for now).

m3dwards commented at 11:43 am on June 19, 2024: contributor

It has a limit of 10 GB, so I am not sure if it can fit everything. https://github.com/bitcoin/bitcoin/actions/caches

@fanquake might be able to get us more?

maflcko commented at 11:56 am on June 19, 2024: member

I am not sure how increasing the cache size limit would be possible.

m3dwards commented at 2:32 pm on June 19, 2024: contributor

Is the plan to eventually move everything from Cirrus to GHA?

maflcko commented at 3:06 pm on June 19, 2024: member

If someone finds a solution to all cache issues, then it can be done. (Moving back should be easy in any case)

For now, see the issue description:

ideally only tasks with NO_DEPENDS=1 are moved for now. It would be:
* `ci/test/00_setup_env_native_fuzz.sh:export NO_DEPENDS=1`

* `ci/test/00_setup_env_native_tidy.sh:export NO_DEPENDS=1`

Sjors commented at 8:06 am on June 20, 2024: member

This is one of the reasons why I personally don’t like GHA: It is a closed, confusing, and brittle ecosystem. The only benefit is that it is free (for now).

I’m a bit hesitant as well. As I mentioned in #29274 (review) the only practical need I have currently is to run the native ARM job on Github CI.

However, I’m fine with either skipping it, or following some (clear) instructions to run it on my AMD desktop with some virtualisation (if the performance is acceptable).

The other jobs run fine on my Ubuntu machine(s) with not too much configuration.

I also found, while working on that PR, that Cirrus has better configuration options. E.g. Github CI doesn’t even support custom env variables.

maflcko commented at 8:22 am on June 20, 2024: member

The other jobs run fine on my Ubuntu machine(s) with not too much configuration.

Sure, but for others it may be too much hassle? See https://github.com/bitcoin-inquisition/bitcoin/pull/32#issue-1874824335

Sjors commented at 8:29 am on June 20, 2024: member

They didn’t try self-hosting or paying. It doesn’t seem like a good strategy to constantly flock to whichever company offers free resources. It would be nice if we can make it more flexible in an easy way.

E.g. someone maintaining a fork could upload a yaml file somewhere that specifies which jobs should be run by which cloud provider / self host, and which should be skipped.

maflcko commented at 8:48 am on June 20, 2024: member

Personally I think:

* it is cleaner to not offer (and maintain) a bunch of config options for the CI services
* self-hosting is too much overhead for forks (especially for Windows/macOS builds)
* GHA is already required, and if they charged a price, someone would likely pay for it (if it is reasonably priced)
* Anyone really wanting to self-host can already do it today by writing their own CI provider and CI integration (the CI system itself only requires docker/podman)

Happy to close this issue, if there is no need or interest.

m3dwards commented at 12:00 pm on June 20, 2024: contributor

One nice aspect of GHA is the dev experience: contributors can easily have CI run on their personal forks before submitting a PR upstream. Yes, it should be possible for someone to self-host a runner, but realistically, how many would?

Flocking to a free provider could leave the project vulnerable to a bait and switch, and perhaps a bit of vendor lock-in, but it would also have a democratising effect: putting CI in the hands of any fork by default.

Sjors commented at 12:20 pm on June 20, 2024: member

the only practical need I have currently is to run the native ARM job on Github CI

Which I’ve now solved with the magic power of qemu-user-static.

hebasto commented at 8:45 pm on June 20, 2024: member

I actually meant to ask this in #30193, but why do we cache using run-id in the key ${{ github.job }}-ccache-${{ github.run_id }} ? As we only cache on master, using only ${{ github.job }}-ccache would make more sense to me; a single rolling cache per job.

When we search for the cache to load we use a “wildcard” restore restore-keys: ${{ github.job }}-ccache- (with no run_id).

This would remove some “duplicates”, e.g “macos-native-x86_64-ccache-” has 3 cache entries, when it only needs 1?

Am I missing some reason for doing things this way?

In addition to what @maflcko pointed out, this approach is documented here:

A cache today is immutable and cannot be updated. But some use cases require the cache to be saved even though there was a “hit” during restore. To do so, use a key which is unique for every run and use restore-keys to restore the nearest cache.
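
The effect of a unique save key combined with a prefix in restore-keys can be modelled in a few lines. This is a toy sketch of the matching rule (newest entry whose key starts with the prefix), not GHA code; the function name `restore_nearest` and the numeric run ids in the usage line are made up for illustration.

```sh
# Pick the newest cache key that starts with the given prefix.
# $1: the restore-keys prefix; stdin: existing keys, one per line, oldest first.
restore_nearest() {
  prefix="$1"
  match=""
  while IFS= read -r key; do
    case "$key" in
      # Quoting the prefix makes it match literally; later lines overwrite
      # earlier ones, so the newest matching key wins.
      "$prefix"*) match="$key" ;;
    esac
  done
  printf '%s\n' "$match"
}
```

For example, `printf '%s\n' macos-native-x86_64-ccache-101 win64-native-ccache-102 macos-native-x86_64-ccache-103 | restore_nearest macos-native-x86_64-ccache-` selects the entry ending in 103. Every run saves a new entry under its own key, which is why several entries per prefix show up in the cache list.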

hebasto commented at 9:14 pm on June 20, 2024: member

Here is a summary of the current GHA cache storage usage:

Prefix                              Size (MB)
macos-native-x86_64-ccache          380
win64-native-static-qt              61
win64-native-ccache-installation    3
win64-native-ccache                 160
win64-native-vcpkg-tools            5
win64-native-vcpkg-binary           51

Total: 660 MB.

And 10 GB are available.
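
The total and the remaining headroom can be checked with a short sketch, assuming 10 GB is counted as 10000 MB:

```sh
# Per-prefix sizes in MB, copied from the summary above.
cache_total_mb=0
for size_mb in 380 61 3 160 5 51; do
  cache_total_mb=$((cache_total_mb + size_mb))
done

# Assumption: the 10 GB limit is treated as 10000 MB.
cache_limit_mb=10000
cache_free_mb=$((cache_limit_mb - cache_total_mb))
echo "used: ${cache_total_mb} MB, free: ${cache_free_mb} MB"
```

Under that assumption, about 9340 MB remain for any additional caches.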

Another downside would be that caching depends artefacts and docker images is hard on GHA. So ideally only tasks with NO_DEPENDS=1 are moved for now.

I agree.

vasild commented at 3:52 am on June 21, 2024: contributor

GitHub is convenient and free. IMO it is a vendor lock-in trap from which we will eventually want to get out. Adding more dependencies to it would make that harder.

“GitHub KYC is here boys!” https://x.com/nitesh_btc/status/1802735626032210330

maflcko commented at 6:29 am on June 21, 2024: member

There is no vendor lock-in, because switching back is trivial: call git revert commit_id, where commit_id is the commit that switched the task, e.g. d97ddbe797f5b8b3bca0ee71b692e542b8990195.

The CI system is written with exactly that in mind: it does not care about the outer host and can run anywhere. No one is holding anyone back from spinning up the CI anywhere they want. All they need is the source code at the commit id they want to test and a way to report back the CI logs and results.

If you want to discuss moving away from GitHub completely, I’d suggest to start a separate discussion thread. This one is about the UX of the CI (on forks).

maflcko commented at 10:40 am on July 1, 2024: member

Closing for now, but the discussion can continue. It should be possible and easy to switch more tasks over, starting with the ones mentioned in the starting post. This can be done, if people think it improves the UX on forks, or has other benefits. I just wanted to raise the discussion to explain that it is possible, if people find it useful, but I won’t be working on it myself.

maflcko closed this on Jul 1, 2024

bitcoin locked this on Jul 1, 2025

ci: Move more tasks to GHA? #30304