This changeset migrates all current self-hosted CI jobs over to hosted Cirrus Runners.
These runners cost a flat rate of $150/month per runner, and we qualify for a 50% open source discount, bringing the cost to $75/month/runner.
One “runner” should more accurately be thought of in terms of the number of vCPUs you are purchasing (https://cirrus-runners.app/pricing/), or in terms of “concurrency”, where 1 runner gets you 1.0 concurrency. For example, a Linux x86 runner gets you 16 vCPUs (1.0 concurrency) and 64GB RAM to be provisioned as you choose, amongst one or more jobs.
Cirrus Runners currently only support Linux (x86 and Arm64) and macOS (Arm64). This changeset does not move the existing GitHub Actions native macOS runners away from GitHub’s infrastructure; that could be a follow-up optimisation.
Runs from this changeset using Cirrus Runners can be found at: https://github.com/testing-cirrus-runners/bitcoin2/actions which shows an uncached run on master (CI#1), an outside pull request (CI#3) and an updated push to master (CI#4).
These workflows were run on 10 runners, and we would recommend purchasing a similar number for our CI in this repo to achieve the speed and concurrency we expect.
We include some optional performance commits, but these could be split out and made into followups or dropped entirely.
Benefits
Maintenance
As we are not self-hosting, nobody needs to maintain servers, disks etc.
Bus factor
Currently we have a very small number of people with the know-how to work on server setup and maintenance. This setup fixes that, so that “anyone” familiar with GitHub-style CI systems can work on it.
Scaling
These do not “auto-scale”/have “unlimited concurrency” like some solutions, but if we want more workers/CPUs to increase parallelism, or to increase the runner size of certain jobs for a speed-up, we can simply buy more concurrency using the web interface.
Speed
Runtimes approximate current runtimes pretty well, with some jobs being faster. Caching improvements for pull request (re-)runs are left as future optimisations to the current changeset (see below).
GitHub workflow syntax
With a migration to the more commonly used GitHub workflow syntax, a move to other providers in the future is often as simple as a one-line change (and installing a new GitHub app on the repo).
If we decide to self-host again, then we can also self-host GitHub runners (using https://github.com/actions/runner) and retain the new GH-style CI syntax.
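For example, switching hosted-runner providers typically only means swapping the runner label; both labels below are illustrative placeholders, not the exact labels used here:

```yaml
jobs:
  test:
    # before (hypothetical Cirrus Runners label)
    # runs-on: ghcr.io/cirruslabs/ubuntu-runner-amd64:24.04
    # after (hypothetical label for another hosted-runner provider)
    runs-on: some-other-provider-16vcpu-ubuntu-24.04
```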
Reporting
GitHub workflows provide nicer built-in reporting directly on the “Checks” page of a PR. This includes more detailed action reporting and a host of nice integrated features, such as Workflow Commands for creating annotations that can print messages during runs. See, for example, the bottom of this window, where we report the ccache hit rate if it was below 90%: https://github.com/testing-cirrus-runners/bitcoin/actions/runs/16163449125?pr=1
Such annotations could be added conditionally in our CI scripts to report other interesting information.
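As a rough illustration of the idea (the threshold, stats parsing and step layout are hypothetical, not the exact step used in this changeset):

```yaml
# Hypothetical step: emit a warning annotation when the ccache hit rate is low.
- name: Report low ccache hit rate
  shell: bash
  run: |
    # Parsing is illustrative only; adapt it to the ccache version in the CI image.
    hit_rate=$(ccache -s | grep -oP 'Hits:.*\(\K[0-9]+' | head -n1)
    if [ "${hit_rate:-0}" -lt 90 ]; then
      # GitHub workflow command: creates an annotation visible on the run summary page.
      echo "::warning title=Low ccache hit rate::ccache hit rate was only ${hit_rate:-0}%"
    fi
```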
Costs
Financial
Relative to competitors, Cirrus Runners are cheap for the hosted-CI world. However, they are likely more expensive than our current setup, or a well-configured (new) self-hosted setup.
If we started with 10 runners to be shared amongst all migrated jobs, this would total $750/mo = $9000/yr.
Note that we are not trying to compete here on cost directly.
Dependencies
We would be dependent on Cirrus infra, and on any registry cache that we sign up to (e.g. Quay.io).
Forks
- Forks should be able to run CI without paid Cirrus runners. This behaviour is achieved through a rather verbose `runs-on:` directive (see the sketch after this list).
  - This directive hardcodes the main repo (unfortunately you cannot use the `env` github context in this field in particular, for some reason).
  - This directive also allows a fork which has its own Cirrus runners to set the `USE_CIRRUS_RUNNERS` repository env var to use their own Cirrus runners on `push` to their fork.
  - If this variable is not set, or we are not in the context of the main repo, the workflow will fall back to the GitHub free runners.
- The cirrus cache action transparently falls back to the GitHub Actions cache when not running on Cirrus, so forks will get some free GitHub caching (10GB per repo).

All jobs work on forks, but will run (slowly) on GitHub's native free hosted runners instead of Cirrus runners. They will also suffer from poor cache hit-rates, but there’s nothing that can be done about that, and the situation is an improvement on today.
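A minimal sketch of the idea; the runner label, `ORG/REPO` name and expression details are illustrative placeholders rather than the exact directive in this changeset:

```yaml
jobs:
  example-job:
    # Use a Cirrus runner when we are in the main repo, or when a fork has opted in
    # by setting the USE_CIRRUS_RUNNERS repository variable; otherwise fall back to
    # a free GitHub-hosted runner. 'ORG/REPO' and the runner label are placeholders.
    runs-on: ${{ (github.repository == 'ORG/REPO' || vars.USE_CIRRUS_RUNNERS == 'true') && 'ghcr.io/cirruslabs/ubuntu-runner-amd64:24.04' || 'ubuntu-24.04' }}
```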
Migration process
In addition to pulling the code changes, the main org should also:

- Set up a Quay.io account (free is fine)
- Configure a quay.io “robot account”
- Set the robot account name as the repo environment variable `QUAY_USERNAME`
- Set the robot account access token as the repo secret `QUAY_ROBOT_TOKEN`
- Create a new docker repo on quay.io for the build cache to use
- Give the robot account read/write access to this repo
- Set up a https://cirrus-runners.app/ account, install the GitHub app to the repository, and buy/provision the required number of runners.
- Update this PR with the repo `ORG` and `REPO`.
- Permit the actions `docker/setup-buildx-action@v3` and `docker/login-action@v3` to be run in this repo.
Caching
For the number of CI jobs we have, cache usage on GitHub would be an issue as GH only provides 10GB of cache space per repo. However, Cirrus provides 10GB per runner, which scales better with the number of runners.
The `cirruslabs/actions/[restore|save]` cache actions we use here redirect this to Cirrus’ own cache, which is both faster and larger.
In the case that a user is running CI on a fork, the Cirrus cache falls back transparently to the GitHub default cache without error.
ccache, depends-sources, built-depends
- Cached as blobs via the `cirruslabs/actions/cache` action.
- Current implementation:
  - On `push`: restores and saves caches.
  - On `pull_request`: restores but does not save caches.

This means a new pull request should hit a pretty relevant cache. Old pull requests which are not being rebased on master may suffer from a lower cache hit-rate.
If we save caches on all pull request runs we run the risk of evicting recent (and more relevant) cache blobs. It may be possible in a future optimisation to widen this to save on pull request runs too, but it will also depend on how many runners we provision and what cache churn rates are like in the main repo.
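The restore/save split described above could look roughly like the following; the paths, keys, and the exact cirruslabs action references and versions are illustrative placeholders rather than the exact steps in this changeset:

```yaml
steps:
  # Restore runs on both push and pull_request events.
  - name: Restore ccache
    uses: cirruslabs/actions/cache/restore@v4   # placeholder action ref/version
    with:
      path: ~/.cache/ccache
      key: ccache-${{ runner.os }}-${{ github.sha }}
      restore-keys: ccache-${{ runner.os }}-

  # ... configure, build and test ...

  # Save only on pushes to master, so pull requests cannot evict the more
  # broadly useful cache blobs.
  - name: Save ccache
    if: github.event_name == 'push'
    uses: cirruslabs/actions/cache/save@v4      # placeholder action ref/version
    with:
      path: ~/.cache/ccache
      key: ccache-${{ runner.os }}-${{ github.sha }}
```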
Docker build layer caching
- Cached using a registry backend.
  - Currently configured to use Quay.io.
- For pushing, this requires i) an account with them, and ii) login using a “robot account” (with push access).
- Login is only performed `on: push`, therefore caches are again (like the other caches) essentially only saved on pushes to master.
  - It is insecure to log in to the registry `on: pull_request`, as these originate from external sources.
  - It is possible to log in, but the workflow must be changed to use `on: pull_request_target`, which permits a PR to leak all your repo variables and secrets, amongst other pitfalls such as not using a merge commit with the base branch as the default checkout target. We recommend this be avoided.
- PRs (and forks) can pull layers from the cache without login (allegedly without rate limits on Quay.io, in contrast to Dockerhub).
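A rough sketch of this registry-cache wiring; the image name, file path and build-push-action version are placeholders, and only the two docker actions named in the migration steps are taken from this changeset:

```yaml
steps:
  - uses: docker/setup-buildx-action@v3

  # Log in only on push events; pull_request runs pull cache layers anonymously
  # and never write to the registry.
  - name: Login to Quay.io
    if: github.event_name == 'push'
    uses: docker/login-action@v3
    with:
      registry: quay.io
      username: ${{ vars.QUAY_USERNAME }}
      password: ${{ secrets.QUAY_ROBOT_TOKEN }}

  - name: Build CI image
    uses: docker/build-push-action@v6            # placeholder version
    with:
      context: .
      file: ci/ci_imagefile                      # placeholder path
      # Read the shared layer cache on every run; write it back only on push.
      cache-from: type=registry,ref=quay.io/ORG/ci-cache:latest
      cache-to: ${{ github.event_name == 'push' && 'type=registry,ref=quay.io/ORG/ci-cache:latest,mode=max' || '' }}
```

Locally, the same `--cache-from type=registry,ref=...` argument can be passed to a buildx build to reuse the CI layer cache, as noted further below.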
We have experimented with the `gha` cache backend type, which stores the build cache as a series of blobs (using the `cirruslabs/actions/cache` action), and therefore shares the same 10GB × num_runners budget as all other caches, whilst negating the requirement on any registry.
This seems to simply not work in many cases, and without good visibility into the cache itself (currently there is no UI/CLI access), it is not easy to see why build cache layers are seemingly missed in many runs.
Using a registry cache has advantages and disadvantages:
- Requires an account + login for pushing.
- The registry may in theory rate limit us (despite having “unlimited” usage for public repos).
- We do not know precisely their garbage collection policy on the registry backend (although this likely doesn’t matter if they do something like Least Recently Used).
- Well supported by both docker and podman.
- Appears to operate much better in general; at least we do not see the cache misses we regularly saw using the `gha` backend.
- In theory, CI being run locally can also use the `--cache-from` build args to share the CI build cache.
Both backends require the same network i/o in a CI setting (and so are marginally slower than our current disk i/o cache).
Cirrus are apparently in the process of creating their own container registry for their Runner product. If this materialises then we should be able to trivially switch from Quay.io to the cirrus registry for some speed improvements (and presumably to reduce the risk of being rate-limited).
But what about… `x`?
We have tested many other providers, including Runs-On, BuildJet, WarpBuild, and GitHub hosted runners (and investigated even more). But they all fall short in one way or another.
- Runs-On and BuildJet (and others) require installing GH apps with much too-liberal permissions (e.g. `Administration: Read|Write`) for our use-case.
- GitHub hosted runners suffer from all of: high costs, lower speed, small cache, and the requirement for a GitHub Teams subscription.
- WarpBuild seems to be simply too expensive.
- WarpBuild seems to be simply too expensive.
TODO:
To complete the migration from self-hosted to hosted runners for this repo, the backport branches `27.x`, `28.x` and `29.x` would also need their CI ported, but these are left for followups to this change (and pending review/changes here first).
Work and experimentation undertaken with @m3dwards