Fix issues with CI on forks

Sjors commented at 2:49 pm on January 18, 2024: member

Maintainer note: SKIP_BRANCH_PUSH=true must be set in Cirrus for bitcoin-core/gui before merging this. See https://cirrus-ci.com/github/bitcoin-core/gui -> Settings.

I find myself making pull requests against my fork (mostly on top of #28983, or asking others to do so. Currently only the Github actions are run on forks, because we use self-hosted runners for the Cirrus tasks.

While setting up my own self-hosted runners for my fork, I ran into a number of issues. Some of those were addressed by #29441, but remaining issues are:

When PRs are opened in the fork, cirrus CI jobs are run twice because PRs and branches reside in the same repository, rather than a main repository and a fork repository, as is the case with bitcoin/bitcoin PRs. Fix this by adding a SKIP_BRANCH_PUSH configuration option that allows skipping CI runs not directly associated with a PR. The fix is a generalization of #20328, which fixed a similar problem for the bitcoin-core/gui mirror repository, and it allows removing a hardcoded reference to that repository.

Github actions jobs will still run twice despite this change, see #29274 (comment). Initially this PR tried to prevent that with https://github.com/bitcoin/bitcoin/commit/b9fdd0dc75b5b4944dffc700b0391b38465f754a, but this had some potentially negative side effects, see #29274 (comment), so that commit was dropped for now.
When PRs are opened in the fork, the “test-each-commit” github action can fail due to not being able to find a recent merge commit. This problem doesn’t happen in the bitcoin/bitcoin repository because branches in this repository used as the base for pull requests always point at merge commits.

This PR replaces #29259 using the self hosted workers via Cirrus instead of Github.

You can see this PR in action on this pull request to my fork: https://github.com/Sjors/bitcoin/pull/30

To test it yourself:

spin up at least two self hosted runners. Either use a seperate VM for each, or give them their own user.
Install Podman and other CI dependencies (see .cirrus.yml)
Give Cirrus access to your fork at https://cirrus-ci.com/settings/github/YOU
Get a token from Cirrus and use it to start your worker(s)
Optionally set SKIP_BRANCH_PUSH=true ~~and NO_ARM=true~~ env variables (see .cirrus.yml) make a pull request to your own fork, with this PR as the base branch

Security wise: when dealing with code from strangers on the internet, review it first before running the CI. There’s a Cirrus check-box that requires approval for people without write access to trigger CI.

DrahtBot commented at 2:49 pm on January 18, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	ryanofsky, maflcko

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

in .cirrus.yml:183 in 40bfae2350 outdated

171@@ -161,11 +172,13 @@ task:
172 
173 task:
174   name: 'ASan + LSan + UBSan + integer, no depends, USDT'
175+  # Skip if no Noble worker exists (for a fork)

maflcko commented at 3:13 pm on January 18, 2024:

Is this needed? Won’t this be skipped already by default if the worker is offline?

Sjors commented at 3:31 pm on January 18, 2024:

It stays pending, at least in the Cirrus interface.

maflcko commented at 2:50 pm on February 27, 2024:

It stays pending, at least in the Cirrus interface.

But that seems preferable to me, as it communicates clearly that the task didn’t run.

Sjors commented at 3:55 pm on February 27, 2024:

Trying that UX in https://github.com/Sjors/bitcoin/pull/36 …

Sjors commented at 4:44 pm on February 27, 2024:

On the first push of a branch, due to the UI glitched described below (https://github.com/bitcoin/bitcoin/pull/29274#issuecomment-1967030916) they show as skipped:

That seems fine. On the second push they are put in “queued sate” (forever?).

I agree that hiding the task altogether is not ideal (as ecf01f42c4df4193098e1edb7a8e19c42677d2ef does). Perhaps marking it as skipped is better.

fanquake commented at 6:03 pm on February 27, 2024:

Just generally, I’d rather not add more code/complication to our CI infra to work around quirks/annoying side-effects in other infra, especially when the number of people affected by them are very close to 0.

Sjors commented at 6:14 pm on February 27, 2024:

I ended up marking them as skipped, see #29274 (comment)

especially when the number of people affected by them are very close to 0

That’s partially a chicken-egg problem: it’s currently quite tedious to run CI on forks, which encourages people to make pull requests to the main repo that could be avoided. Everywhere in our repo where we have nested PR’s I think there’s an opportunity to have PR’s to forks instead. That’s a pattern I’d like to try out with Stratum v2.

fanquake commented at 6:20 pm on February 27, 2024:

it’s currently quite tedious to run CI on forks,, which encourages people to make pull requests to the main repo that could be avoided.

Maybe, but anyone can currently run any of the CI jobs they’d like, locally, against any code they’d like (not sure what forks, or nested PRs has to do with it, as long as there is a git branch with our ci infra, the CI job can run?). It seems that the requirement here is to have a Cirrus/GitHub interface?

maflcko commented at 6:28 pm on February 27, 2024:

An alternative may be to move the tasks from cirrus self-hosted to GHA microsoft-hosted, but that requires some more code changes

Sjors commented at 6:42 pm on February 27, 2024:

It’s not that many lines really (apart from documentation), especially as @maflcko has been chipping away at the commits.

A few more lines can be dropped when Ubuntu 24.04 is out. efc22a9ae7b4d1b346e5f1ac8578927a27286917 can be dropped in a followup if the underlying issue is solved.

I don’t mind giving people the option to use Microsoft, but that doesn’t remove the need to support self-hosting imo. Though it would be nice to have a hosted fallback for any missing self-hosted architectures.

the requirement here is to have a Cirrus/GitHub interface?

Yes, for the same reason we have that on the main repo: it’s a much lower barrier to entry. It helps the author by catching mistakes early and it gives reviewers the option to avoid wasting time on things that don’t pass CI.

The use case I have in mind is someone who wants to improve the Template Provider in #29432 (before it’s merged). I have dozens of TODO’s in that PR which I’d like help with. In order to get the benefit of a full CI run, they currently have to open a PR to the main repo. With the tweaks in my PR they can simply open a PR to my fork instead. I think that’s better for everyone.

maflcko commented at 6:50 pm on February 27, 2024:

I don’t mind giving people the option to use Microsoft, but that doesn’t remove the need to support self-hosting imo. Though it would be nice to have a hosted fallback for any missing self-hosted architectures.

I don’t see the benefits of self-hosting in this context. All code and CI results will go through GH and the CI provider in any case. There is no goal to support this long term. It was just the cheapest and easiest option, for now.

Sjors commented at 9:18 am on February 28, 2024:

Well for one thing it’s way faster (using just an idle desktop computer).

Sjors commented at 9:22 am on February 28, 2024:

It’s also not this black and white. Sure, Microsoft and Cirrus can screw us over by faking the CI result. But by not relying on them for hosting for the runners, it’s both easier to catch them doing shenanigans and it’s easier to move elsewhere.

maflcko commented at 9:46 am on February 28, 2024:

easier to catch them

Why? Someone will have to run a fully local backup either way, whose cost is independent of what is running on the other side.

easier to move elsewhere

Why? Moving elsewhere also requires moving the repo off of the GitHub mirror, or writing a GH integration and a CI framework from scratch, both of whose costs are largely independent of where the CI CPUs happen to sit.

Sjors commented at 11:26 am on February 28, 2024:

If I run a self-hosted CI I can see what it’s doing. If it fails on my machine and is marked as successful on Github, I can see that. If I collude with Github that’s a different story of course.

maflcko commented at 11:49 am on February 28, 2024:

The CI cleans up after itself, latest when the next run starts, so I don’t think it is reliably possible to inspect a failure of a remote-triggered CI run locally, even assuming no “attack” from GH.

More importantly, the CI machine is literally doing RCE of downloaded binary executables and random code, so reliably detecting a test failure (assuming a GH attack) seems questionable at best.

in .github/workflows/ci.yml:15 in 40bfae2350 outdated

 8@@ -9,7 +9,10 @@ on:
 9   # See: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#push.
10   push:
11     branches:
12-      - '**'
13+      # Do not run branch pushes on forks, because they are typically redundant with
14+      # pull requests to the main repo.
15+      - 'bitcoin/**'
16+      - 'bitcoin-core/**'

maflcko commented at 3:14 pm on January 18, 2024:

No objection, but I presume this will cause issues on forks (Inquisition, Knots, …)

Sjors commented at 3:34 pm on January 18, 2024:

Good point, this probably needs an ENV flag.

Sjors commented at 3:38 pm on January 18, 2024:

Knots pull requests seem to be typically made against the 25.x-knots branch. Those would run normally. But with the above code CI would not run when e.g. a future 26.x.knots is created (or rebased).

Inquisition seems to follow the some pattern.

Sjors commented at 4:43 pm on January 18, 2024:

Aaargh, afaik Github CI doesn’t support custom ENV variables for forks. So this is solved, but on Github forks are unconditionally skipped.

maflcko commented at 9:33 am on June 18, 2024:

Well, what problem is this trying to solve, given that it introduces new problems?

ryanofsky commented at 11:45 pm on June 24, 2024:

In commit “ci: skip Github CI on branch pushes for forks” (b9fdd0dc75b5b4944dffc700b0391b38465f754a)

re: #29274 (review)

Well, what problem is this trying to solve, given that it introduces new problems?

It seems like this commit is trying to be more efficient by not running CI twice for branch pushes to pull requests. I think it might also stop the “[ryanofsky/bitcoin] Run failed” emails I currently see when I push branches to my repository.

But I think it would better to drop this commit from this PR and move it to separate PR. This PR title and description only mention Cirrus CI, so it’s confusing for it to also affect Github actions. I think this change would also be easier to understand if it was implemented separately and explained on its own terms instead of in relation to a cirrus change.

maflcko commented at 2:11 pm on June 25, 2024:

It seems like this commit is trying to be more efficient by not running CI twice for branch pushes to pull requests. I think it might also stop the “[ryanofsky/bitcoin] Run failed” emails I currently see when I push branches to my repository.

I’d say it is best to disable the CI in your normal fork, if you care about efficiency and not getting emails. Better, you could create a separate fork/repo with CI enabled, and only push to it if you really want to trigger the CI.

in ci/test/02_run_container.sh:60 in 40bfae2350 outdated

55@@ -52,7 +56,7 @@ if [ -z "$DANGER_RUN_CI_ON_HOST" ]; then
56                   --mount "type=volume,src=${CONTAINER_NAME}_depends_sources,dst=$DEPENDS_DIR/sources" \
57                   --mount "type=volume,src=${CONTAINER_NAME}_depends_SDKs_android,dst=$DEPENDS_DIR/SDKs/android" \
58                   --mount "type=volume,src=${CONTAINER_NAME}_previous_releases,dst=$PREVIOUS_RELEASES_DIR" \
59-                  --env-file /tmp/env \
60+                  --env-file /tmp/env-$USER \

maflcko commented at 4:03 pm on January 18, 2024:

My previous comment here wasn’t addressed?

Sjors commented at 4:43 pm on January 18, 2024:

Added a comment above.

maflcko commented at 4:46 pm on January 18, 2024:

But that doesn’t help if two tasks are run at the same time for the same user

Sjors commented at 4:51 pm on January 18, 2024:

I don’t think you’re supposed to do that. One user == one worker.

If we want to support multiple workers per user, we also have to randomise the container names, since you might be running CI for two pull requests at the same time.

maflcko commented at 4:51 pm on January 18, 2024:

For reference, I said to use CONTAINER_NAME in the pull request you closed, which should be a more complete and correct fix?

Sjors commented at 4:56 pm on January 18, 2024:

I missed that.

But the file is owned by the user. In a multi-user setup each user builds its own containers. So this would break afaik. So random UUID could work though.

Sjors commented at 4:57 pm on January 18, 2024:

Could be a separate pull request, because this is a bug fix?

I might split it out if this PR takes too long on its own.

maflcko commented at 5:00 pm on January 18, 2024:

When running locally, usually only one user exists, and someone may be interested in running different tasks for the same (local) code.

Running the same task in parallel is indeed not supported right now.

Sjors commented at 5:07 pm on January 18, 2024:

Ok, so maybe /tmp/env-$USER-$CONTAINER_NAME? (update: done)

in .cirrus.yml:50 in 40bfae2350 outdated

44@@ -45,17 +45,26 @@ env:  # Global defaults
45 # - medium: For an x86_64 machine, recommended to have 4 CPUs and 16 GB of memory.
46 # - noble: For a machine running the Linux kernel shipped with exaclty Ubuntu Noble 24.04. The machine is recommended to have 4 CPUs and 16 GB of memory.
47 # - arm64: For an aarch64 machine, recommended to have 2 CPUs and 8 GB of memory.
48+#
49+# For works without sudo rights, start them with USER_MODE=true and make sure:
50+# - git is installed

maflcko commented at 4:05 pm on January 18, 2024:

Seems better to just require it in the other deps above (next to screen)?

Sjors commented at 4:53 pm on January 18, 2024:

Oh, I assumed that explicit install was there for a reason, the comment “Unconditionally install git (used in fingerprint_script).” is a bit vague.

in .cirrus.yml:66 in 40bfae2350 outdated

64   << : *FILTER_TEMPLATE
65   merge_base_script:
66     # Unconditionally install git (used in fingerprint_script).
67-    - bash -c "$PACKAGE_MANAGER_INSTALL git"
68+    # In USER_MODE this must have been done manually
69+    - if [ ! "$USER_MODE" = "true" ]; then bash -c "$PACKAGE_MANAGER_INSTALL git"; fi

maflcko commented at 4:06 pm on January 18, 2024:

0    - git --version || bash -c "$PACKAGE_MANAGER_INSTALL git"

Maybe without a config flag?

Sjors commented at 4:54 pm on January 18, 2024:

That’s indeed nicer, if the line can’t be dropped entirely.

maflcko commented at 5:01 pm on January 18, 2024:

It can’t be dropped, because then the lint CI would fail, but you are welcome to remove it and try

Sjors force-pushed on Jan 18, 2024

in ci/lint/06_script.sh:14 in cddc867082 outdated

10@@ -11,7 +11,7 @@ set -ex
11 if [ -n "$LOCAL_BRANCH" ]; then
12   # To faithfully recreate CI linting locally, specify all commits on the current
13   # branch.
14-  COMMIT_RANGE="$(git merge-base HEAD master)..HEAD"
15+  COMMIT_RANGE="$(git merge-base HEAD origin/master)..HEAD"

Sjors commented at 4:59 pm on January 18, 2024:

Preserving @maflcko’s comment on the previous PR:

unrelated: Generally I’d find it better if we get rid of the commit range completely. For example, when running the whitespace linter, it seems easier to just lint all code at HEAD, instead of trying to guess which code was modified and only lint that.

maflcko commented at 12:05 pm on February 27, 2024:

See https://github.com/bitcoin/bitcoin/pull/29487

maflcko commented at 12:50 pm on February 27, 2024:

In any case, I don’t understand the change here. This is only run when LOCAL_BRANCH is set, which does not happen in CI, so I doubt that your “fix” fixes anything.

Sjors commented at 1:22 pm on February 27, 2024:

I remember 0ac57f8becd10277549d1612a11b3b7c48b1fd6a fixed an error complaining that there’s no master branch. But I can retest.

Sjors commented at 1:31 pm on February 27, 2024:

IIUC only the cirrus linter CI calls this script. That’s not self-hosted even on forks. So it’s possible this commit was only useful in #29259, where I added that linter to a Github Action. I’ll drop it if CI passes on https://github.com/Sjors/bitcoin/pull/30

maflcko commented at 1:34 pm on February 27, 2024:

I remember https://github.com/bitcoin/bitcoin/commit/0ac57f8becd10277549d1612a11b3b7c48b1fd6a fixed an error complaining that there’s no master branch. But I can retest.

Sure, but I wonder if it is worth it. Given that COMMIT_RANGE is only really required for the scripted-diff check, I think it is fine to force devs to locally specify the range.

There are endless ways in which this can fail. I don’t have a master branch locally either. origin/master isn’t guaranteed to exist either. And finally, all of them (if they exist) may point to different commits, leading to even more bugs

maflcko commented at 1:39 pm on February 27, 2024:

IIUC only the cirrus linter CI calls this script. That’s not self-hosted even on forks. So it’s possible this commit was only useful in #29259, where I added that linter to a Github Action. I’ll drop it if CI passes on Sjors#30

Yeah, obviously it will be passing. As I said, this code in this line is never executed in CI. For pulls, the branch below will be run.

https://cirrus-ci.com/task/6340412349087744?logs=lint#L263

Sjors commented at 2:01 pm on February 27, 2024:

I dropped 0ac57f8becd10277549d1612a11b3b7c48b1fd6a.

Someone can drop COMMIT_RANGE altogether in a separate PR. Maybe move the incantation to a README, because it’s certainly useful to know how to run the linter on each commit.

maflcko commented at 1:49 pm on March 15, 2024:

Fixed in https://github.com/bitcoin/bitcoin/pull/29660

in .github/workflows/ci.yml:33 in baa71097bf outdated

28@@ -29,7 +29,10 @@ jobs:
29   test-each-commit:
30     name: 'test each commit'
31     runs-on: ubuntu-22.04
32-    if: github.event_name == 'pull_request' && github.event.pull_request.commits != 1
33+    # Only run on pull requests, skip if there's only commit.
34+    # Do not run on forks.

Sjors commented at 5:01 pm on January 18, 2024:

It would be nicer to fix the actual problem, but I don’t understand what this job is doing.

Sjors commented at 5:27 pm on January 18, 2024:

@hebasto @ryanofsky any thoughts on how to make the test-each-commit task work for a scenario like in https://github.com/Sjors/bitcoin/pull/30 where the base branch is fork/some-branch and the PR branch is fork/some-pr?

Sjors commented at 5:28 pm on January 18, 2024:

Example of earlier failure: https://github.com/Sjors/bitcoin/actions/runs/7556559882/job/20573828392?pr=28#step:4:42

 0 shell: /usr/bin/bash -e {0}
 1  env:
 2    DANGER_RUN_CI_ON_HOST: 1
 3    CI_FAILFAST_TEST_LEAVE_DANGLING: 1
 4    MAKEJOBS: -j10
 5    MAX_COUNT: 6
 6    FETCH_DEPTH: 8
 7Warning: you are leaving 1 commit behind, not connected to
 8any of your branches:
 9
10  3b5c5d4 ci: add self-hosted runners for GitHub
11
12If you want to keep it by creating a new branch, this may be a good time
13to do so with:
14
15 git branch <new-branch-name> 3b5c5d4
16
17HEAD is now at cbe740b ci: vary /tmp/env
18fatal: bad revision '^^@'

Sjors force-pushed on Jan 18, 2024

DrahtBot added the label CI failed on Jan 18, 2024

maflcko commented at 6:01 pm on January 18, 2024: member

lgtm ACK c65fde483133a04964cc8757c96005b78d9e8ca8

Didn’t review the other commits.

DrahtBot removed the label CI failed on Jan 18, 2024

maflcko commented at 4:09 pm on January 19, 2024: member

As for the other commits, as mentioned previously, I am not sure if this is really needed. But no objection, of course. (https://github.com/bitcoin/bitcoin/pull/29259#issuecomment-1895822246)

Sjors commented at 8:58 am on January 22, 2024: member

@maflcko that comment refers to running CI for checking my own work. But what this pull request enables is running CI for pull requests, which may be from myself but also from others.

I’m somewhat puzzled why Knots (cc @luke-jr) and Inquisition (cc @ajtowns) are not running into the same issues. But I guess there’s at least two differences:

Presumably they run self hosted (virtual) machines via Cirrus but with sudo permissions
They’re both still building on the v25.0 tag, i.e. with somewhat different CI config

ajtowns commented at 11:03 am on January 22, 2024: contributor

I’m somewhat puzzled why Knots (cc @luke-jr) and Inquisition (cc @ajtowns) are not running into the same issues. But I guess there’s at least two differences:

Inquisition disabled all the automatic tests except the trivial linter when cirrus imposed limits earlier in the year, and tests are only run manually by reviewers. Haven’t worked out what it should be doing, so don’t have suggestions to offer, but interested to see what the resolution here is.

luke-jr commented at 4:25 am on January 23, 2024: member

I’m somewhat puzzled why Knots and Inquisition are not running into the same issues

Maybe because I switch the self-hosted CI to not-self-hosted?

in ci/test/02_run_container.sh:34 in e44e9d2bd3 outdated

29@@ -30,6 +30,10 @@ if [ -z "$DANGER_RUN_CI_ON_HOST" ]; then
30   docker volume create "${CONTAINER_NAME}_previous_releases" || true
31 
32   if [ -n "${RESTART_CI_DOCKER_BEFORE_RUN}" ] ; then
33+    # Kill harder, prevents "The container name "/ci_..." is already in use by container"
34+    docker stop "$(docker ps -a -q)" || true

fanquake commented at 4:52 pm on January 23, 2024:

What is “kill harder”? Containers are already being --all --force removed below:

0-a, --all                   Remove all containers
1-f, --force                 Force removal of a running or unusable container

so they shouldn’t need to be “stopped” beforehand?

Also, shouldn’t this be podman?

maflcko commented at 5:02 pm on January 23, 2024:

An alternative (with podman 4.8+) would be to use run --replace, see https://github.com/containers/podman/commit/f21c1d238da393cb11c75e408a736336b979e6d5

Sjors commented at 10:30 am on January 25, 2024:

It was intentionally vague, because I might just be doing something else wrong. I’ll read @maflcko’s suggestion.

maflcko commented at 10:39 am on January 25, 2024:

Looks like https://packages.ubuntu.com/noble/podman and https://packages.debian.org/trixie/podman includes the fixed --replace, so might try that at one point.

Sjors commented at 2:31 pm on January 25, 2024:

Oh, this is only available in Noble? In that case I might just improve the code comment (including sleep) and add a note that can go away with Noble.

Sjors commented at 3:49 pm on January 25, 2024:

I installed Podman 4.8.3 from source on my machine. However I don’t understand what you mean by run --replace. It seems we’re only using podman in the RESTART_CI_DOCKER_BEFORE_RUN section, but still call docker run everywhere. Can we just swap docker run with podman run or is it more complicated?

fanquake commented at 3:53 pm on January 25, 2024:

I don’t have docker installed on any machine I use for CI. Only podman, and a docker -> podman alias (podman-docker).

Sjors commented at 4:21 pm on January 25, 2024:

Mmm, I tried something similar in de26cbc79230f64d7747b8b88f4dee71ebd37e1b by adding a DOCKER_RUN variable and setting it to podman run --replace. But this fails with Error: short-name resolution enforced but cannot prompt without a TTY. Are you passing any other arguments in your alias?

If I can get it to work I can drop 7cdb75d572e825b4c0cf74a516f912f94807853f and add instruction for self-hosters to make sure they have podman >= 4.8

fanquake commented at 4:24 pm on January 25, 2024:

Are you passing any other arguments in your alias?

The alias came from:

0apt install podman-docker
1# podman-docker is already the newest version (4.8.3+ds1-2).
2docker --version
3# podman version 4.8.3
4podman --version
5# podman version 4.8.3

Sjors commented at 4:33 pm on January 25, 2024:

podman-docker is not documented in the manual installation… https://podman.io/docs/installation#building-from-source

Let’s see how I can install this without totally messing up my OS :-)

maflcko commented at 4:37 pm on January 25, 2024:

You can try:

0mkdir -p ~/bin/ && echo 'IyEvdXNyL2Jpbi9lbnYgYmFzaAoKZWNob2VycigpIHsgZWNobyAiJEAiIDE+JjI7IH0KCmVjaG9lcnIgIiAoOjo6ZG9ja2VyLT4pcG9kbWFuICR7QH0iCgpwb2RtYW4gIiR7QH0iCg==' | base64 --decode > ~/bin/docker && chmod +x ~/bin/docker

(obviously untested, but works on a temporary VM)

Sjors commented at 4:42 pm on January 25, 2024:

That base64 decodes to:

0#!/usr/bin/env bash
1
2echoerr() { echo "$@" 1>&2; }
3
4echoerr " (:::docker->)podman ${@}"
5
6podman "${@}"

That doesn’t seem any different from my DOCKER_RUN="podman run --replace" alias.

Sjors commented at 4:43 pm on January 25, 2024:

Or I guess it’s the echoerr doing some magic

maflcko commented at 4:51 pm on January 25, 2024:

It is hard to debug your issue, but I presume you have docker installed on your system.

In this case, you can’t install podman-docker without removing docker. Also, the CI system workers will not clean up after themselves, because the cleanup is done with the podman command, which can not clear docker containers.

My recommendation would be to start from a fresh VM (fresh install of Ubuntu) and then follow the docs: https://github.com/bitcoin/bitcoin/blob/ac923e70e7cec603abd207f104dbabfe675d59b2/.cirrus.yml#L16

This should fix the issue you are trying to “fix” in this commit.

Sjors commented at 5:02 pm on January 25, 2024:

Yes, Docker is installed in rootless mode: https://docs.docker.com/engine/security/rootless/

Forgot to link to the error: https://cirrus-ci.com/task/5247017476161536?logs=ci#L261

The “short-name resolution” error makes no sense to me based on what I can find about it, since the container URL includes docker.io (docker.io/amd64/ubuntu:22.04 in the above example).

you can’t install podman-docker without removing docker

Ok, let’s not go that route, since Docker currently works.

because the cleanup is done with the podman command, which can not clear docker containers

Ah, so that explains why I needed 7cdb75d572e825b4c0cf74a516f912f94807853f.

This should fix the issue you are trying to “fix” in this commit.

Why do we add a dependency on podman at all? Given that docker stop "$(docker ps -a -q)" || true works. I guess it’s less clean because it cleans all docker processes run by the user. But is there some other benefit? Is the plan to move to podman entirely?

maflcko commented at 5:23 pm on January 25, 2024:

Why do we add a dependency on podman at all?

A dependency on a container engine is needed, because this repo is the wrong place to implement a container engine from scratch.

Is the plan to move to podman entirely?

For running the CI locally, any container engine should work. Inside the RESTART_CI_DOCKER_BEFORE_RUN, only podman is supported.

If you want to switch to docker, the podman commands would have to replaced by docker commands inside the RESTART_CI_DOCKER_BEFORE_RUN block. Otherwise, both podman and docker are required to run on docker, which seems confusing.

Sjors commented at 5:33 pm on January 25, 2024:

Oh, I see where my confusion comes from. I thought podman was just a wrapper for Docker. But it’s actually an alternative which can run the same containers. Why not switch over entirely to Podman? It’s quite confusing that we’re using both.

Since Podman doesn’t need root either, I might just nuke Docker and give it a try.

Sjors commented at 7:29 pm on January 25, 2024:

I nuked Docker in favor of Podman. So far it seems to work. I dropped 7cdb75d572e825b4c0cf74a516f912f94807853f.

maflcko commented at 9:00 am on January 26, 2024:

Why not switch over entirely to Podman? It’s quite confusing that we’re using both.

I don’t see the benefit of requiring that docker is uninstalled. If someone wants to use their container engine of their choice locally, just let them do it?

willcl-ark commented at 9:06 am on January 26, 2024:

Not specifically related to the issue at hand: the new podman desktop app is pretty smart, it can show you containers running via docker and podman (and kubernetes clusters) all in the same UI: https://podman-desktop.io/

in ci/test/02_run_container.sh:35 in 7cdb75d572 outdated

29@@ -30,6 +30,10 @@ if [ -z "$DANGER_RUN_CI_ON_HOST" ]; then
30   docker volume create "${CONTAINER_NAME}_previous_releases" || true
31 
32   if [ -n "${RESTART_CI_DOCKER_BEFORE_RUN}" ] ; then
33+    # Kill harder, prevents "The container name "/ci_..." is already in use by container"
34+    docker stop "$(docker ps -a -q)" || true
35+    sleep 3

fanquake commented at 4:52 pm on January 23, 2024:

Also, why is this sleep here?

Sjors commented at 10:31 am on January 25, 2024:

Without that command I got the same behavior as if the command wasn’t there. Presumably it’s asynchronous and we somehow have to wait for it. Maybe the above solution avoids the problem altogether.

Sjors commented at 5:19 pm on January 25, 2024:

I got rid of sleep, in favor of docker rm -f $CONTAINER_NAME (haven’t pushed that change yet, pending discussion on podman)

https://stackoverflow.com/questions/57631654/how-to-wait-for-a-docker-container-to-be-removed

Sjors commented at 5:21 pm on January 25, 2024:

The alternative polling approach suggested in that answer might be worth considering:

it does a SIGKILL rather than the SIGTERM followed by SIGKILL the way docker stop does

Not sure how much this difference matters in our use.

Sjors commented at 7:28 pm on January 25, 2024:

These lines are dropped now.

willcl-ark added the label Tests on Jan 24, 2024

Sjors force-pushed on Jan 25, 2024

Sjors commented at 7:28 pm on January 25, 2024: member

I switched my own machine from Docker to Podman. This removes the need for 7cdb75d572e825b4c0cf74a516f912f94807853f so that’s dropped.

Added a commit that explains the process a bit more clearly.

fanquake referenced this in commit bdddf364c9 on Feb 20, 2024

Sjors force-pushed on Feb 20, 2024

Sjors commented at 1:41 pm on February 20, 2024: member

Rebased after #29441 landed which contains two commits from this PR.

Sjors renamed this:
~~Support self-hosted Cirrus workers on forks (and multi-user)~~
Support self-hosted Cirrus workers on forks
on Feb 20, 2024

Sjors force-pushed on Feb 27, 2024

Sjors commented at 2:01 pm on February 27, 2024: member

Dropped 0ac57f8becd10277549d1612a11b3b7c48b1fd6a since that wasn’t doing anything. Also rebased.

Sjors commented at 4:38 pm on February 27, 2024: member

One annoying side-effect of 80f338c01623469bd3347d8a2e438294512c141c is that if you open a pull request (after pushing the branch) the Cirrus tasks initially all show as skipped. The tasks are actually running though, and as they finish that’s reflected on Github.

This only happens for Cirrus workers, not for Github workers (3048700bde41644f1b8b40825334b6ecc0f7ccc1).

See screenshots: https://github.com/Sjors/bitcoin/pull/36#issuecomment-1966883011

This seems like a bug in Github or in Cirrus’s integration. Dropping 80f338c01623469bd3347d8a2e438294512c141c has two undesirable effects:

It runs CI twice for every push
It runs my self-hosted CI for every push to pull requests on the main repo

The new behaviour is opt-in by setting NO_BRANCH=true, so everyone can decide what they prefer. But perhaps there’s a more elegant solution here?

(marked draft to try a few things)

Sjors marked this as a draft on Feb 27, 2024

Sjors force-pushed on Feb 27, 2024

Sjors commented at 6:10 pm on February 27, 2024: member

I pushed two changes:

80f338c01623469bd3347d8a2e438294512c141c -> 0d084293e3083ffda6309cb80c06852b3010f869: use only_if: instead of skip:: this slightly improves the experience of when you open a new PR.
ecf01f42c4df4193098e1edb7a8e19c42677d2ef -> 7f7f87b6691e222b9488e55b2a41a2e763071ead: unsupported architecture are marked skipped instead of hidden (except for the first time you open a PR, which hopefully is fixed with https://github.com/cirruslabs/cirrus-ci-docs/issues/1233)

As demonstrated in https://github.com/Sjors/bitcoin/pull/37#issuecomment-1967335911, the first push looks like this (with ARM and Asan missing):

And then second: (ARM and Asan shown as skipped)

Sjors marked this as ready for review on Feb 27, 2024

in .cirrus.yml:21 in bfe2f35972 outdated

18+#
19+# The following specific types should exist, with the following requirements:
20+# - small: For an x86_64 machine, recommended to have 2 CPUs and 8 GB of memory.
21+# - medium: For an x86_64 machine, recommended to have 4 CPUs and 16 GB of memory.
22+# - noble: For a machine running the Linux kernel shipped with exaclty Ubuntu Noble 24.04. The machine is recommended to have 4 CPUs and 16 GB of memory.
23+# - arm64: For an aarch64 machine, recommended to have 2 CPUs and 8 GB of memory.

luke-jr commented at 3:54 pm on March 2, 2024:

0# - small: For an x86_64 machine, recommended to have 2 CPU cores and 8 GB of memory.
1# - medium: For an x86_64 machine, recommended to have 4 CPU cores and 16 GB of memory.
2# - noble: For a machine running the Linux kernel shipped with exactly Ubuntu Noble 24.04. The machine is recommended to have 4 CPU cores and 16 GB of memory.
3# - arm64: For an aarch64 machine, recommended to have 2 CPU cores and 8 GB of memory.

Sjors commented at 11:06 am on March 4, 2024:

Done

luke-jr commented at 2:47 am on March 3, 2024: member

This seems more like “don’t support Cirrus on forks”, since it disables it to a large extent. Maybe that should also be behind env vars?

Sjors commented at 10:56 am on March 4, 2024: member

Maybe that should also be behind env vars?

It is. Nothing is disabled for Cirrus on forks, unless you set NO_BRANCH, NO_ARM and/or NO_NOBLE.

On Github CI there’s no way (afaik) to set env vars, so there a few things are disabled for forks.

Sjors force-pushed on Mar 4, 2024

in .cirrus.yml:12 in 61db4fb9bd outdated

 7@@ -8,24 +8,41 @@ env:  # Global defaults
 8   CCACHE_DIR: "/tmp/ccache_dir"
 9   CCACHE_NOHASHDIR: "1"  # Debug info might contain a stale path if the build dir changes, but this is fine
10 
11+# A self-hosted machine(s) can be used via Cirrus CI. It can be configured with a
12+# single user with sudo access, or multiple users without by setting USER_MODE=true

luke-jr commented at 11:31 pm on March 5, 2024:

What acts on this env var?

Sjors commented at 9:14 am on March 7, 2024:

Good catch, nothing does. An earlier version https://github.com/bitcoin/bitcoin/commit/3044813f977a08c05ecdc3de2355ff91cf8058c2 had this variable. I removed it after this comment: #29274 (review), but forgot to change the text.

I’ve now changed it from:

It can be configured with a single user with sudo access, or multiple users without by setting USER_MODE=true

To:

It can be configured with multiple users to run tasks in parallel. No sudo permission is required.

in .cirrus.yml:38 in 61db4fb9bd outdated

39-# - podman-docker-4.1+ is required due to the use of `podman` when
40-#   RESTART_CI_DOCKER_BEFORE_RUN is set and 4.1+ due to the bugfix in 4.1
41+# - podman-docker-4.1+ is required due to the bugfix in 4.1
42 #   (https://github.com/bitcoin/bitcoin/pull/21652#issuecomment-1657098200)
43-# - The ./ci/ depedencies (with cirrus-cli) should be installed:
44+# - The ./ci/ depedencies (with cirrus-cli) should be installed. One-liner example

luke-jr commented at 11:37 pm on March 5, 2024:

dependencies*

Sjors force-pushed on Mar 7, 2024

fanquake referenced this in commit 015ac13dcc on Mar 15, 2024

in .cirrus.yml:45 in 397ee50589 outdated

47 #   ```
48 #   apt update && apt install git screen python3 bash podman-docker curl -y && curl -L -o cirrus "https://github.com/cirruslabs/cirrus-cli/releases/latest/download/cirrus-linux-$(dpkg --print-architecture)" && mv cirrus /usr/local/bin/cirrus && chmod +x /usr/local/bin/cirrus
49 #   ```
50 #
51-# - There are no strict requirements on the hardware, because having less CPUs
52+# - There are no strict requirements on the hardware, because having less CPU cores

maflcko commented at 5:19 pm on March 19, 2024:

I don’t think this change is correct. With CPU I meant vCPU, or CPU thread.

Sjors commented at 5:46 pm on March 19, 2024:

Changed to “having fewer CPU threads run”.

Also rebased.

Sjors force-pushed on Mar 19, 2024

fanquake referenced this in commit 2e1c84b333 on Mar 25, 2024

DrahtBot added the label CI failed on Apr 20, 2024

DrahtBot removed the label CI failed on Apr 23, 2024

DrahtBot added the label Needs rebase on May 15, 2024

Sjors force-pushed on May 28, 2024

Sjors commented at 9:33 am on May 28, 2024: member

Rebased. I dropped NO_NOBLE since Noble has been released. Dropped 8b06d6fb797b1de36abb13f3aaf1a7d1607c2685 after #29660.

Sjors commented at 10:14 am on May 28, 2024: member

I had to increase the fetch depth in 6398efbe145fb297957060600615262d59f2b47c, because otherwise the test-each-commit task can fail with: fatal: bad revision '^^@' and You are not currently on a branch, see example.

This happened in https://github.com/Sjors/bitcoin/pull/30 because it doesn’t build on a merge commit but rather on top of the branch in this PR. I’m surprised it doesn’t also fail on a PR like #28201.

Dianagram009 approved

DrahtBot removed the label Needs rebase on May 28, 2024

in .github/workflows/ci.yml:40 in 6398efbe14 outdated

34@@ -35,7 +35,9 @@ jobs:
35       MAX_COUNT: 6
36     steps:
37       - name: Determine fetch depth
38-        run: echo "FETCH_DEPTH=$((${{ github.event.pull_request.commits }} + 2))" >> "$GITHUB_ENV"
39+        # Fetch some extra commits for pull requests on forks that don't build
40+        # on a merge commit.
41+        run: echo "FETCH_DEPTH=$((${{ github.event.pull_request.commits }} + 100))" >> "$GITHUB_ENV"

ryanofsky commented at 11:54 am on May 28, 2024:

In commit “ci: increase fetch depth for ’test each commit’” (6398efbe145fb297957060600615262d59f2b47c)

I think the following would be a more direct fix. I didn’t test it but it just omits the ^${MERGE_BASE}^@ argument when there is no merge base, which should be fine because there should be no need to exclude merge base ancestors in that case:

0-          echo "TEST_BASE=$(git rev-list -n$((${{ env.MAX_COUNT }} + 1)) --reverse HEAD ^$(git rev-list -n1 --merges HEAD)^@ | head -1)" >> "$GITHUB_ENV"
1+          MERGE_BASE=$(git rev-list -n1 --merges HEAD)
2+          EXCLUDE_MERGE_BASE_ANCESTORS=
3+          if test -n "$MERGE_BASE"; then
4+              EXCLUDE_MERGE_BASE_ANCESTORS=^${MERGE_BASE}^@
5+          fi
6+          echo "TEST_BASE=$(git rev-list -n$((${{ env.MAX_COUNT }} + 1)) --reverse HEAD $EXCLUDE_MERGE_BASE_ANCESTORS | head -1)" >> "$GITHUB_ENV"

Sjors commented at 1:07 pm on May 28, 2024:

Thanks, will try.

Works! See https://github.com/Sjors/bitcoin/actions/runs/9270471465/job/25503545449?pr=30

Sjors force-pushed on May 28, 2024

in .github/workflows/ci.yml:64 in 65abc0b5b5 outdated

61@@ -62,7 +62,12 @@ jobs:
62           # and the ^ prefix is used to exclude these parents and all their
63           # ancestors from the rev-list output as described in:
64           # https://git-scm.com/docs/git-rev-list
65-          echo "TEST_BASE=$(git rev-list -n$((${{ env.MAX_COUNT }} + 1)) --reverse HEAD ^$(git rev-list -n1 --merges HEAD)^@ | head -1)" >> "$GITHUB_ENV"
66+          MERGE_BASE=$(git rev-list -n1 --merges HEAD)
67+          EXCLUDE_MERGE_BASE_ANCESTORS=
68+          if test -n "$MERGE_BASE"; then

ryanofsky commented at 7:06 pm on June 10, 2024:

In commit “ci: test-each-commit merge base optional” (65abc0b5b54edb06cc3bf584691beddbeb802ddc)

Maybe add comment like “MERGE_BASE can be empty due to limited fetch-depth”. Otherwise I don’t think it’s clear why git wouldn’t be able to find the merge commit.

in .cirrus.yml:56 in 8fbf7449af outdated

52+  # No need to run on the read-only mirror, unless it is a PR.
53+  # https://cirrus-ci.org/guide/writing-tasks/#conditional-task-execution
54+  skip: $CIRRUS_REPO_FULL_NAME == "bitcoin-core/gui" && $CIRRUS_PR == ""
55+  # Allow forks to skip CI runs when a branch is pushed.
56+  # Use only_if rather than skip, because otherwise all jobs are marked as
57+  # skipped when a pull request is first opened.

ryanofsky commented at 7:12 pm on June 10, 2024:

In commit “ci: forks can opt-out of CI branch push” (8fbf7449af130126890932986a4d9c0446c2f9c8)

I don’t really understand this comment. Maybe it could say why the jobs should not be marked as skipped, or maybe say how they should be shown instead of how they should not be shown.

Sjors commented at 9:20 am on June 18, 2024:

Improved the comment, see below.

in .cirrus.yml:76 in 8fbf7449af outdated

53+  # https://cirrus-ci.org/guide/writing-tasks/#conditional-task-execution
54+  skip: $CIRRUS_REPO_FULL_NAME == "bitcoin-core/gui" && $CIRRUS_PR == ""
55+  # Allow forks to skip CI runs when a branch is pushed.
56+  # Use only_if rather than skip, because otherwise all jobs are marked as
57+  # skipped when a pull request is first opened.
58+  only_if: $NO_BRANCH != "true" || $CIRRUS_PR != ""

ryanofsky commented at 7:12 pm on June 10, 2024:

In commit “ci: forks can opt-out of CI branch push” (8fbf7449af130126890932986a4d9c0446c2f9c8)

What is $NO_BRANCH? Is it a custom variable like $NO_ARM added later? Could it be documented like NO_ARM is?

Sjors commented at 9:10 am on June 18, 2024:

It’s indeed a custom variable like NO_ARM. I moved the explanation of NO_BRANCH up in the documentation. Leaving only the explanation of skip vs. only_if here, which I also reworded.

in .github/workflows/ci.yml:13 in 1a06d80e7b outdated

 8@@ -9,7 +9,10 @@ on:
 9   # See: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#push.
10   push:
11     branches:
12-      - '**'
13+      # Do not run branch pushes on forks, because they are typically redundant with
14+      # pull requests to the main repo.

ryanofsky commented at 7:35 pm on June 10, 2024:

In commit “ci: skip Github CI on branch pushes for forks” (1a06d80e7bb113b9588efc113262bdd20b60efd4)

I didn’t really understand this comment until I read the PR description. Might be clearer as “Disable CI on branch pushes to forks. It will still run for pull requests. This prevents CI from running twice for typical pull request workflows.”

Also, I’m not really sure how this change relates to the rest of the PR. It seems like it could be unrelated?

Sjors commented at 9:13 am on June 18, 2024:

Updated.

It serves the same purpose as NO_BRANCH on Cirrus, expect that I couldn’t find a way to make that behaviour opt-in on Github CI.

Sjors commented at 9:17 am on June 18, 2024:

I moved ef623fad7b9427dc4b7943e1ac2a1da37c2329d3 to be right after 4c764a48d3e7c1eb3f1c5a8533f7ef2b10f8299a and expanded the commit message to explain why they don’t use the same mechanism. Also added “Cirrus only” the first commit to make this even more clear, hopefully.

ryanofsky approved

ryanofsky commented at 7:36 pm on June 10, 2024: contributor

Code review ACK 65abc0b5b54edb06cc3bf584691beddbeb802ddc with caveat that I don’t know enough about cirrus and github configuration to be very confident in the changes. But they do make sense and appear to do what’s described.

DrahtBot requested review from maflcko on Jun 10, 2024

DrahtBot added the label Needs rebase on Jun 17, 2024

Sjors force-pushed on Jun 18, 2024

Sjors commented at 9:13 am on June 18, 2024: member

Rebased after #30193, which removes “noble” as a machine type. I previously already removed NO_NOBLE, see #29274 (comment). The rebase merely impacts the documented list of persistent workers, which this PR merely moves.

Addressed review comments.

Sjors force-pushed on Jun 18, 2024

in .cirrus.yml:132 in 979110609e outdated

114@@ -115,7 +115,9 @@ task:
115 
116 task:
117   name: 'ARM, unit tests, no functional tests'
118+  # Skip if no ARM worker exists (for a fork)
119   << : *GLOBAL_TASK_TEMPLATE
120+  skip: $NO_ARM == "true"

maflcko commented at 9:35 am on June 18, 2024:

An alternative would be for forks to just supply an x86 machine. The CI system should detect this itself and invoke qemu-aarch64

Sjors commented at 9:37 am on June 18, 2024:

Is that new behavior? Let me test that again…

Sjors commented at 9:39 am on June 18, 2024:

I guess I never tested this, because the documentation just says:

0# - arm64: For an aarch64 machine, recommended to have 2 CPUs and 8 GB of memory.

If the performance difference isn’t too terrible, then I can just drop the commit and mention this in the documentation.

maflcko commented at 9:39 am on June 18, 2024:

You’ll have to run something like podman run --rm --privileged docker.io/multiarch/qemu-user-static --reset -p yes and then register the type as arm64, but I’d say it should work.

Sjors commented at 9:49 am on June 18, 2024:

qemu-aarch64 adds quite a few packages to Ubuntu, so I’d probably want to keep the NO_ARM option around, but document this alternative.

Sjors commented at 9:57 am on June 18, 2024:

I guess this involves partially reverting #28087 since it seems that stripped the qemu-arm call from 00_setup_env_arm.sh.

Sjors commented at 10:28 am on June 18, 2024:

(comments came in out of order for me)

I don’t understand Docker, Podman and/or Qemu well enough to figure this out myself. For now I’ll just leave this commit.

maflcko commented at 7:54 am on June 19, 2024:

If it is too much hassle to set up an arm, one could look into moving it to GHA. However, I am not sure if this is possible. Can macOS ARM run ARM Linux containers in docker?

Sjors commented at 8:00 am on June 20, 2024:

I’m fine with just skipping that action on the stratum v2 based fork.

It’s probably not too much hassle if there’s clear instructions on how to do that on e.g. an AMD based Ubuntu machine.

The other way around doesn’t work for me, because I don’t have a native ARM machine (mac or otherwise). I’ll eventually have one, a laptop. I might then use that for just the ARM task, not the rest of CI.

moving it to GHA

I don’t know how common ARM is for servers these days? I’m assuming it’s mostly used by Apple, and for very low performance devices like Raspberry Pi’s.

I see you opened #30304.

maflcko commented at 8:14 am on June 20, 2024:

I don’t know how common ARM is for servers these days? I’m assuming it’s mostly used by Apple, and for very low performance devices like Raspberry Pi’s.

ARM is common in servers and all major cloud providers offer it. GHA does not offer it for Linux, only macOS M1, which is why I was wondering if it is possible to run ARM Linux containers on macOS M1?

It’s probably not too much hassle if there’s clear instructions on how to do that on e.g. an AMD based Ubuntu machine.

It depends on your Linux distro. On Fedora/CentOS-like it should be dnf install -y bash podman-docker git && podman run --rm --privileged docker.io/multiarch/qemu-user-static --reset -p yes.

On Ubuntu I haven’t tried it, but it may be: apt update && apt install -y bash podman-docker git qemu-user-static && podman run --rm --privileged docker.io/multiarch/qemu-user-static --reset -p yes (If you use docker, qemu may even be picked up by default)

Sjors commented at 8:20 am on June 20, 2024:

After podman run is there some VM running that I have to login to, install the Cirrus job runner? Or is there some change needed to 00_setup_env_arm.sh to make it use this?

maflcko commented at 8:27 am on June 20, 2024:

You don’t need to log in to anything, and everything should work out of the box without code modifications. I can try to test, if you provide more details about your exact system, and whether you intend to use podman-docker or docker.

Sjors commented at 8:44 am on June 20, 2024:

Thanks. I currently use podman-docker on Ubuntu 24.04. I created several cirrus{1,2,3,4,5} users.

For cirrus1 I set type: arm64. I removed the NO_ARM=true variable for the occasion.

I installed qemu-user-static (bash and podman-docker were already there).

I ran your podman run ... incantation as the cirrus1 user. And then started the worker (“Sjimmie I”):

0RESTART_CI_DOCKER_BEFORE_RUN=1  USER_MODE=true MAKEJOBS=-j10 cirrus worker run -f ~/.cirrus.yaml

I then force-pushed an update to https://github.com/Sjors/bitcoin/pull/30. The ARM task fails: https://cirrus-ci.com/task/6683160417665024

0STEP 6/7: RUN ["bash", "-c", "cd /ci_container_base/ && set -o errexit && source ./ci/test/00_setup_env.sh && ./ci/test/01_base_install.sh"]
1exec container process `/usr/bin/bash`: Exec format error
2Error: building at STEP "RUN bash -c cd /ci_container_base/ && set -o errexit && source ./ci/test/00_setup_env.sh && ./ci/test/01_base_install.sh": while running runtime: exit status 1

Sjors commented at 8:48 am on June 20, 2024:

The CPU is an AMD Ryzen 9 7950X.

maflcko commented at 8:51 am on June 20, 2024:

exec container process /usr/bin/bash: Exec format error

You can reproduce this outside the CI system with:

0podman run --rm -ti "docker.io/arm64v8/debian:bookworm" uname --machine

You should be able to fix it by calling once:

0podman run --rm --privileged docker.io/multiarch/qemu-user-static --reset -p yes

(and then double check with the previous command)

(if it passes, the CI should pass as well)

Sjors commented at 9:04 am on June 20, 2024:

That first command:

0WARNING: image platform (linux/arm64/v8) does not match the expected platform (linux/amd64)
1{"msg":"exec container process `/usr/bin/uname`: Exec format error","level":"error","time":"2024-06-20T09:03:31.220776Z"}

Running the second command does not change anything.

maflcko commented at 9:10 am on June 20, 2024:

What is the output of the second command?

Locally, everything works for me:

0 lscpu | grep AMD && cat /etc/os-release | grep Noble && podman --version 
1Vendor ID:                            AuthenticAMD
2Model name:                           AMD EPYC Processor
3VERSION="24.04 LTS (Noble Numbat)"
4podman version 4.9.3

Sjors commented at 9:11 am on June 20, 2024:

0Vendor ID:                            AuthenticAMD
1Model name:                           AMD Ryzen 9 7950X 16-Core Processor
2Virtualization:                       AMD-V
3VERSION="24.04 LTS (Noble Numbat)"
4podman version 4.9.3

maflcko commented at 9:12 am on June 20, 2024:

What is the output of podman run --rm --privileged docker.io/multiarch/qemu-user-static --reset -p yes?

Sjors commented at 9:13 am on June 20, 2024:

 0podman run --rm --privileged docker.io/multiarch/qemu-user-static --reset -p yes
 1Setting /usr/bin/qemu-alpha-static as binfmt interpreter for alpha
 2Setting /usr/bin/qemu-arm-static as binfmt interpreter for arm
 3Setting /usr/bin/qemu-armeb-static as binfmt interpreter for armeb
 4Setting /usr/bin/qemu-sparc-static as binfmt interpreter for sparc
 5Setting /usr/bin/qemu-sparc32plus-static as binfmt interpreter for sparc32plus
 6Setting /usr/bin/qemu-sparc64-static as binfmt interpreter for sparc64
 7Setting /usr/bin/qemu-ppc-static as binfmt interpreter for ppc
 8Setting /usr/bin/qemu-ppc64-static as binfmt interpreter for ppc64
 9Setting /usr/bin/qemu-ppc64le-static as binfmt interpreter for ppc64le
10Setting /usr/bin/qemu-m68k-static as binfmt interpreter for m68k
11Setting /usr/bin/qemu-mips-static as binfmt interpreter for mips
12Setting /usr/bin/qemu-mipsel-static as binfmt interpreter for mipsel
13Setting /usr/bin/qemu-mipsn32-static as binfmt interpreter for mipsn32
14Setting /usr/bin/qemu-mipsn32el-static as binfmt interpreter for mipsn32el
15Setting /usr/bin/qemu-mips64-static as binfmt interpreter for mips64
16Setting /usr/bin/qemu-mips64el-static as binfmt interpreter for mips64el
17Setting /usr/bin/qemu-sh4-static as binfmt interpreter for sh4
18Setting /usr/bin/qemu-sh4eb-static as binfmt interpreter for sh4eb
19Setting /usr/bin/qemu-s390x-static as binfmt interpreter for s390x
20Setting /usr/bin/qemu-aarch64-static as binfmt interpreter for aarch64
21Setting /usr/bin/qemu-aarch64_be-static as binfmt interpreter for aarch64_be
22Setting /usr/bin/qemu-hppa-static as binfmt interpreter for hppa
23Setting /usr/bin/qemu-riscv32-static as binfmt interpreter for riscv32
24Setting /usr/bin/qemu-riscv64-static as binfmt interpreter for riscv64
25Setting /usr/bin/qemu-xtensa-static as binfmt interpreter for xtensa
26Setting /usr/bin/qemu-xtensaeb-static as binfmt interpreter for xtensaeb
27Setting /usr/bin/qemu-microblaze-static as binfmt interpreter for microblaze
28Setting /usr/bin/qemu-microblazeel-static as binfmt interpreter for microblazeel
29Setting /usr/bin/qemu-or1k-static as binfmt interpreter for or1k
30Setting /usr/bin/qemu-hexagon-static as binfmt interpreter for hexagon

Sjors commented at 9:24 am on June 20, 2024:

When I run both commands as root, it does work. But doing so does not change anything for the non-root user. At least that’s a clue.

Sjors commented at 9:33 am on June 20, 2024:

Some background information on what this actually does (though not podman specific): https://stackoverflow.com/a/72890225

Sjors commented at 9:57 am on June 20, 2024:

Adding cirrus1 to /etc/sudoers and running both commands with sudo works. But running only the --reset command with sudo is not enough (even though the qemu-user-static example suggest it should be: https://github.com/multiarch/qemu-user-static?tab=readme-ov-file).

Not really a solution, but at least I can run the CI job once to see what the performance is like.

Sjors commented at 10:20 am on June 20, 2024:

Uhhh … so I did a reboot. And now podman run --rm -ti "docker.io/arm64v8/debian:bookworm" uname --machine just works, doesn’t need the reset command.

Sjors commented at 10:35 am on June 20, 2024:

If the job finishes in reasonable time, I’ll add something like this to the documentation: https://github.com/Sjors/bitcoin/commit/d065dc11fb862f48fc88e674a95b7daf68269ca0

It seems that this either works trivially or is really hard to debug, so I prefer to keep NO_ARM=1 around as an alternative.

Sjors commented at 12:10 pm on June 20, 2024:

1 hour 42 minutes, but that was without using any cache.

Sjors commented at 2:00 pm on June 20, 2024:

58 minutes for the second run, which is acceptable.

maflcko commented at 2:32 pm on June 21, 2024:

In any case, I don’t see (or forgot) what problem is this trying to solve? It is just a display issue where the task is displayed as “skipped” vs something-else? Not sure if the CI config is the right place to provide fixes/fiddles for cosmetic stuff in forks.

Sjors commented at 3:49 pm on June 21, 2024:

Without this change, when there’s no ARM machine, it remains pending forever.

maflcko commented at 2:03 pm on June 25, 2024:

Without this change, when there’s no ARM machine, it remains pending forever.

I understand. However, my point is that this is a cosmetic thing. Who cares? The person that sets up a fork, enables cirrus on it, enables a worker pool on it, knowingly does not enable an arm worker on it, so the person should know that the arm task will not run. Whatever the icon in the UI displays should thus not matter.

However, this change may lead to problems. For example, does yaml allow duplicate keys? If yes, how are they handled? Does the skip via FILTER_TEMPLATE override this one, or vice versa? Will it change in the future?

fanquake commented at 2:17 pm on June 25, 2024:

I understand. However, my point is that this is a cosmetic thing. Who cares? The person that sets up a fork, enables cirrus on it, enables a worker pool on it, knowingly does not enable an arm worker on it, so the person should know that the arm task will not run. Whatever the icon in the UI displays should thus not matter.

I agree. We seem to be spending a lot of time, and happy to take on some complexity just to accommodate use-cases outside this repository; for which, as far as I can see, there is not actually that much demand. We also shouldn’t have to introduce complexity here, just to fix cosmetic issues in other repositories.

To me it’s also unclear what the requirements are here going forward. I assume if any of the outside-of-this-repo functionality breaks, eventually someone might notice, and a fix might happen, but at the same time, I don’t think anyone in this repo is going to be thinking about this when making code changes here (and they shouldn’t have to), so things may break, and that will be unfortunate, but should never be blocking changes here. We are maintaining our infrastructure; not our infrastructure plus the infrastructure of whoever happened to fork this repository.

Sjors commented at 2:32 pm on June 25, 2024:

It’s confusing for both the pull request author and reviewers. Only the fork maintainer will know what’s going on, but they also need to carefully check each time that it’s only the expected job that’s stuck. A single green check, when all jobs pass, is more clear.

does yaml allow duplicate keys

Not sure.

So the simplest solution for now is to drop 859e1c558fd9c8604fda361e465aa4eb4a676823 and make qemu “mandatory”, which works for me.

just to accommodate use-cases outside this repository

But it reduces the number of direct pull requests to repository, so on balance it may save time.

ryanofsky commented at 5:53 pm on June 25, 2024:

re: #29274 (review)

I don’t have a strong opinion, but I do think the NO_ARM commit 859e1c558fd9c8604fda361e465aa4eb4a676823 is a nice change because I don’t think it fixes a purely cosmetic issue. I think it removes confusing behavior of opening a PR and seeing a job hang forever. The actual code change is very simple, just adding skip: $NO_ARM == "true" and I think the documentation change makes the whole “A self-hosted machine(s) can be used via Cirrus CI…” section clearer and more concrete. But I agree with dropping it as long as any objections remain.

maflcko commented at 6:30 am on June 26, 2024:

The actual code change is very simple

If it was so simple, then it would be good to explain how it could possibly work at all. Let me know what I am missing:

However, this change may lead to problems. For example, does yaml allow duplicate keys? If yes, how are they handled? Does the skip via FILTER_TEMPLATE override this one, or vice versa? Will it change in the future?

See also “yaml duplicate keys” in your favourite search engine.

Sjors commented at 6:39 am on June 26, 2024:

It’s conceptually simple, but I agree that we shouldn’t use duplicate keys unless Cirrus docs explicitly says you can. I could work around it in various ways, but for now I found it easier to just drop the commit.

DrahtBot removed the label Needs rebase on Jun 18, 2024

in .cirrus.yml:49 in 3bb1de54f3 outdated

52 #   ```
53 #
54-# - There are no strict requirements on the hardware, because having less CPUs
55-#   runs the same CI script (maybe slower). To avoid rare and intermittent OOM
56+# - There are no strict requirements on the hardware, because having fewer CPU threads
57+#   run the same CI script (maybe slower). To avoid rare and intermittent OOM

maflcko commented at 10:12 am on June 18, 2024:

I can’t parse the new sentence grammatically. Maybe just say , and the CI script can handle slower hardware.?

Sjors commented at 10:35 am on June 18, 2024:

Changed to:

0# - There are no strict requirements on the hardware. Having fewer CPU threads
1#   than recommended merely causes the CI script to run slower.

maflcko commented at 10:39 am on June 18, 2024:

I think this extends to all hardware. For example using a spinning disk will also be slower, regardless of your CPU speed. So being less precise is more correct here.

Sjors commented at 10:56 am on June 18, 2024:

The list of machines only contain recommendations for CPUs and memory. Presumably giving them less memory than recommended can lead to OOM. Giving them fewer CPUs is the only recommendation you can safely ignore (if you’re not in a CI environment that kills jobs after a timeout).

in .cirrus.yml:29 in 3bb1de54f3 outdated

27+# The above machine types are matched to each job by their label. Refer to the
28 # Cirrus CI docs for more details.
29 #
30-# Generally, a persistent worker must run Ubuntu 23.04+ or Debian 12+.
31-# Specifically,
32+# When a contributor makes a pull request against a fork of the repo, two CI

maflcko commented at 10:14 am on June 18, 2024:

I don’t think this has anything to do with forks. It should only depend on whether cirrus is enabled on the repository where the branch is pushed.

Sjors commented at 10:36 am on June 18, 2024:

Oh, so only the maintainer of the fork will have this problem? Maybe there’s another solution then.

Sjors commented at 11:02 am on June 18, 2024:

Ah wait, now I remember, reworded:

0# When a contributor maintains a fork of the repo, any pull request they make
1# to their own fork, or to the main repository, will trigger two CI runs:
2# one for the branch push and one for the pull request.
3# This can be avoided by setting NO_BRANCH=1 as a custom env variable in Cirrus.

Sjors force-pushed on Jun 18, 2024

Sjors force-pushed on Jun 20, 2024

Sjors commented at 12:17 pm on June 20, 2024: member

I turns out it’s fairly straight-forward to run the arm native job on an x86_64 machine using qemu-user-static (ht @maflcko). I added a note on how to do that. But I’m keeping 247b5823b2d5810ac9ddb8c8562232ff3eeb0344 around as an alternative.

maflcko commented at 1:10 pm on June 20, 2024: member

Could split “ci: clarify Cirrus self-hosted workers setup " into a separate doc-only PR? Should be easy to review and merge.

Sjors commented at 2:06 pm on June 20, 2024: member

@maflcko done in #30314, minus the paragraphs about NO_ARM and NO_BRANCH. I’ll rebase this PR if that one lands earlier.

in .cirrus.yml:60 in 1c35e3c47f outdated

56+  # No need to run on the read-only mirror, unless it is a PR.
57+  # https://cirrus-ci.org/guide/writing-tasks/#conditional-task-execution
58+  skip: $CIRRUS_REPO_FULL_NAME == "bitcoin-core/gui" && $CIRRUS_PR == ""
59+  # We use only_if rather than skip when NO_BRANCH is set. Otherwise when a
60+  # contributor opens a pull request, they will initially see all jobs marked
61+  # as skipped.

ryanofsky commented at 2:17 pm on June 20, 2024:

In commit “ci: forks can opt-out of CI branch push (Cirrus only)” (1c35e3c47fe814c402d423e247b26cb6da2e960e)

I’m still confused by this. I understand we don’t want PR contributors to “initially see all jobs marked as skipped”. But what does this cause them to see? Jobs just not showing up at all? Also I don’t understand why it says “initially.” Would the jobs be only initially skipped and not permanently skipped for some reason?

Sjors commented at 4:36 pm on June 20, 2024:

What I remember from a while ago, but I haven’t retested, is that under a pull request all the checks initially are marked as skipped. But then they show as either finished or failed as the results come in. It was rather flaky.

I could try again and make some screenshots…

Sjors commented at 4:50 pm on June 20, 2024:

Glad I checked this, because the glitched described here is no longer happening: #29274 (comment)

ryanofsky commented at 5:01 pm on June 20, 2024:

EDIT: Posted this before I saw reply above, glad this can be simpler now.

I could try again and make some screenshots…

Thanks, I wouldn’t make an extra effort to understand the broken behavior you were trying to avoid. But maybe consider dropping the word “initially” because I think the comment would make more sense without it. The idea of avoiding “skip” directive to avoid tests showing up as “skipped” is fairly clear. It’s just not clear why the tests might initially show up as skipped but then later show up not skipped, if that’s what happens.

It might also help if this said what the code does, instead of just what it does not do. The line only_if: $NO_BRANCH != "true" || $CIRRUS_PR != "" is a hard to parse, but I think it means: Run this job if is part of a PR, and also run it on standalone branches if the project has not set a custom NO_BRANCH variable. Not too important, but I think it could help if comment said before describing why skip was not used.

Sjors force-pushed on Jun 20, 2024

Sjors commented at 5:01 pm on June 20, 2024: member

Simplified c02d96f8f5829d433f4a061480aa37ee0c6ff076 because the UI glitch I noticed by in February no longer happens. See #29274 (review)

Rebased on #30314 to split out the bulk of documentation changes, so will temporarily mark this PR as draft.

Update: #30314 was merged.

Sjors marked this as a draft on Jun 20, 2024

fanquake referenced this in commit 538363738e on Jun 21, 2024

Sjors force-pushed on Jun 21, 2024

Sjors marked this as ready for review on Jun 21, 2024

in .cirrus.yml:70 in 9e30be769d outdated

63@@ -59,7 +64,11 @@ env:  # Global defaults
64 
65 # https://cirrus-ci.org/guide/tips-and-tricks/#sharing-configuration-between-tasks
66 filter_template: &FILTER_TEMPLATE
67-  skip: $CIRRUS_REPO_FULL_NAME == "bitcoin-core/gui" && $CIRRUS_PR == ""  # No need to run on the read-only mirror, unless it is a PR. https://cirrus-ci.org/guide/writing-tasks/#conditional-task-execution
68+  # No need to run on the read-only mirror, unless it is a PR.
69+  # https://cirrus-ci.org/guide/writing-tasks/#conditional-task-execution
70+  skip: $CIRRUS_REPO_FULL_NAME == "bitcoin-core/gui" && $CIRRUS_PR == ""
71+  # Allow forks to skip CI runs when a branch is pushed.
72+  skip: $NO_BRANCH == "true" && $CIRRUS_PR == ""

ryanofsky commented at 11:19 pm on June 24, 2024:

In commit “ci: forks can opt-out of CI branch push (Cirrus only)” (9e30be769dfc06bf122db1ea2042a68c597509a0)

Would suggest expanding comment: “Allow forks to specify NO_BRANCH=true and skip CI runs when a branch is pushed, but still run CI when a PR is created.”

Otherwise it is not clear what the NO_BRANCH variable has to do with forks or why the CIRRUS_PR condition is there.

Sjors commented at 12:45 pm on June 25, 2024:

Done

in .cirrus.yml:69 in 9e30be769d outdated

63@@ -59,7 +64,11 @@ env:  # Global defaults
64 
65 # https://cirrus-ci.org/guide/tips-and-tricks/#sharing-configuration-between-tasks
66 filter_template: &FILTER_TEMPLATE
67-  skip: $CIRRUS_REPO_FULL_NAME == "bitcoin-core/gui" && $CIRRUS_PR == ""  # No need to run on the read-only mirror, unless it is a PR. https://cirrus-ci.org/guide/writing-tasks/#conditional-task-execution
68+  # No need to run on the read-only mirror, unless it is a PR.
69+  # https://cirrus-ci.org/guide/writing-tasks/#conditional-task-execution
70+  skip: $CIRRUS_REPO_FULL_NAME == "bitcoin-core/gui" && $CIRRUS_PR == ""

ryanofsky commented at 11:20 pm on June 24, 2024:

In commit “ci: forks can opt-out of CI branch push (Cirrus only)” (9e30be769dfc06bf122db1ea2042a68c597509a0)

Would it be possible as a followup to delete this skip condition and just specify NO_BRANCH=true in the bitcoin GUI cirrus environment?

Sjors commented at 7:10 am on June 25, 2024:

Yes, I could even change that here if someone sets NO_BRANCH=true before merging this.

Sjors commented at 12:45 pm on June 25, 2024:

Done

in .cirrus.yml:33 in 9e30be769d outdated

26@@ -27,6 +27,11 @@ env:  # Global defaults
27 # The above machine types are matched to each task by their label. Refer to the
28 # Cirrus CI docs for more details.
29 #
30+# When a contributor maintains a fork of the repo, any pull request they make
31+# to their own fork, or to the main repository, will trigger two CI runs:
32+# one for the branch push and one for the pull request.
33+# This can be avoided by setting NO_BRANCH=1 as a custom env variable in Cirrus.

ryanofsky commented at 11:26 pm on June 24, 2024:

In commit “ci: forks can opt-out of CI branch push (Cirrus only)” (9e30be769dfc06bf122db1ea2042a68c597509a0)

How actually do you set a custom in variable in cirrus for a fork? Maybe this could suggest a way to do it or link to the cirrus docs.

Also should this say NO_BRANCH=true instead of NO_BRANCH=1 since code below appears to compare against “true” specifically?

Sjors commented at 12:45 pm on June 25, 2024:

Fixed.

I added an instruction at the top of this PR for where to set NO_BRANCH=true.

I can’t find a Cirrus documentation page to point to for this, but you’ll find it once you find the repo settings page.

in .cirrus.yml:28 in dc98f4c1ab outdated

23@@ -24,6 +24,9 @@ env:  # Global defaults
24 # by installing qemu-user-static, which works out of the box with
25 # podman or docker. Background: https://stackoverflow.com/a/72890225/313633
26 #
27+# Alternatively arm tasks can be skipped by setting NO_ARM=1 as a custom env
28+# variable in Cirrus.

ryanofsky commented at 11:57 pm on June 24, 2024:

In commit “ci: skip arm if no worker is configured” (dc98f4c1ab181c79352f9a14351efe87a58f203d)

Questions / suggestions:

It looks like code below is specifically checking $NO_ARM == “true”, so should this say NO_ARM=true instead of NO_ARM=1?
It might be good to join this paragraph with the previous paragraph so the arm information is in one place.
It might be good to say that forks either need to provide an arm worker or set NO_ARM=true, otherwise ARM tasks will appear pending forever, since that is not obvious. (I am basing this on #29274 (review).)

Sjors commented at 12:45 pm on June 25, 2024:

Fixed NO_ARM=true

Joined the paragraphs.

Explained what happens when you do neither.

Sjors commented at 2:38 pm on June 25, 2024:

But this commit is dropped now.

in .github/workflows/ci.yml:61 in b80a573bb4 outdated

60@@ -61,7 +61,13 @@ jobs:
61           # and the ^ prefix is used to exclude these parents and all their
62           # ancestors from the rev-list output as described in:
63           # https://git-scm.com/docs/git-rev-list
64-          echo "TEST_BASE=$(git rev-list -n$((${{ env.MAX_COUNT }} + 1)) --reverse HEAD ^$(git rev-list -n1 --merges HEAD)^@ | head -1)" >> "$GITHUB_ENV"
65+          MERGE_BASE=$(git rev-list -n1 --merges HEAD)

ryanofsky commented at 0:18 am on June 25, 2024:

In commit “ci: test-each-commit merge base optional” (b80a573bb42e10f1ec550c79133317c1c0b021c6)

Might be useful to expand commit message to explain why this change is needed. Maybe say:

The ci “test-each-commit” job fetches the PR branch being tested with a depth of (# of commits in PR + 2), and then tries to run tests on commits after the most recent merge commit.

When a PR is opened against a bitcoin core branch, a merge commit is always guaranteed to exist within the fetch depth, because bitcoin core branches always point at merge commits.

However, in fork repositories, pull requests can be opened based on other branches that don’t contain recent merge commits, and this will currently cause git rev-list to fail with fatal: bad revision '^^@'.

Work around this problem by not requiring a recent merge commit, and just testing on all fetched commits if a merge commit can’t be found.

ryanofsky approved

ryanofsky commented at 0:29 am on June 25, 2024: contributor

Code review ACK b80a573bb42e10f1ec550c79133317c1c0b021c6, but I would think about dropping the second commit (“ci: skip Github CI on branch pushes for forks” (b9fdd0dc75b5b4944dffc700b0391b38465f754a)) and moving it to another PR, since it is a little confusing and maybe could use more discussion.

Otherwise looks very good, I just left some suggestions to try to improve some comments.

DrahtBot requested review from maflcko on Jun 25, 2024

Sjors force-pushed on Jun 25, 2024

Sjors commented at 12:45 pm on June 25, 2024: member

I dropped b9fdd0dc75b5b4944dffc700b0391b38465f754a as suggested by @ryanofsky so the PR is focussed on Cirrus.

This does cause Github tasks to appear twice, but as long as they pass (or get skipped) it’s fine for now:

I updated 026d86427527a6c58425896066386713b9782275 to drop the bitcoin-core/gui workaround. Updated the PR description to make clear that repo needs to have NO_BRANCH=true set before merging. See #29274 (review)

Sjors force-pushed on Jun 25, 2024

Sjors commented at 2:34 pm on June 25, 2024: member

Dropped the NO_ARM commit as well, see #29274 (review)

So now the first commit is a refactor to replace the hardcoded CIRRUS_REPO_FULL_NAME == "bitcoin-core/gui" with NO_BRANCH. The second commit relaxes the assumption that a merge commit is always guaranteed to exist within the fetch depth.

fanquake commented at 2:48 pm on June 25, 2024: member

Maintainer note: NO_BRANCH=true must be set in Cirrus for bitcoin-core/gui before merging this. See https://cirrus-ci.com/github/bitcoin-core/gui -> Settings.

Can you rename this to something that actually explains what it’s doing; NO_BRANCH is incredibly vauge. Why not DONT_RUN_CI_ON_BRANCH_PUSH_ON_THIS_FORK etc? Then someone looking at the settings in Cirrus will actually know what this is doing.

Sjors force-pushed on Jun 25, 2024

Sjors commented at 3:11 pm on June 25, 2024: member

Renamed NO_BRANCH to SKIP_BRANCH_PUSH.

ryanofsky commented at 4:42 pm on June 25, 2024: contributor

re: #29274#issue-2088449017

Maintainer note: SKIP_BRANCH_PUSH=true must be set in Cirrus for bitcoin-core/gui before merging this. See https://cirrus-ci.com/github/bitcoin-core/gui -> Settings.

I just now went to that URL and set this. There were not any other environment variables set.

I wonder if there is a good place to document this configuration, but probably it’s not too important as the main effect is just to avoiding wastefully running CI twice on pushes to master (at least according to the original PR which added the skip statement: #20328)

in .cirrus.yml:33 in 46be986c2c outdated

26@@ -27,6 +27,11 @@ env:  # Global defaults
27 # The above machine types are matched to each task by their label. Refer to the
28 # Cirrus CI docs for more details.
29 #
30+# When a contributor maintains a fork of the repo, any pull request they make
31+# to their own fork, or to the main repository, will trigger two CI runs:
32+# one for the branch push and one for the pull request.
33+# This can be avoided by setting SKIP_BRANCH_PUSH=true as a custom env variable in Cirrus.

ryanofsky commented at 4:51 pm on June 25, 2024:

In commit “ci: forks can opt-out of CI branch push (Cirrus only)” (46be986c2cff004a081e89433715118637bb4eb3)

Maybe add ... as a custom env variable in Cirrus repository settings, accessible from https://cirrus-ci.com/github/my-organization/my-repository because I don’t think it is obvious where this can be set.

Sjors commented at 6:09 pm on June 25, 2024:

I find this equally confusing, but was a bit reluctant to add a manual for Cirrus. Anyway, added.

ryanofsky approved

ryanofsky commented at 5:59 pm on June 25, 2024: contributor

Code review ACK 0882babb102108a4351714cc3749fceba1ecbb35.

I think the current PR title / description are a little hard to understand. Maybe consider editing it to something like the following:

Fix issues with CI on forks

I find myself making pull requests against my fork (mostly on top of #28983, or asking others to do so. Currently only the Github actions are run on forks, because we use self-hosted runners for the Cirrus tasks.

While setting up my own self-hosted runners for my fork, I ran into a number of issues. Some of those were addressed by #29441, but remaining issues are:
1. When PRs are opened in the fork, cirrus CI jobs are run twice because PRs and branches reside in the same repository, rather than a main repository and a fork repository, as is the case with bitcoin/bitcoin PRs. Fix this by adding a SKIP_BRANCH_PUSH configuration option that allows skipping CI runs not directly associated with a PR. The fix is a generalization of #20328, which fixed a similar problem for the bitcoin-core/gui mirror repository, and it allows removing a hardcoded reference that repository.
  
  Github actions jobs will still run twice despite this change, see #29274 (comment). Initially this PR tried to prevent that with b9fdd0dc75b5b4944dffc700b0391b38465f754a, but this had some potentially negative side effects, see #29274 (comment), so that commit was dropped for now.
2. When PRs are opened in the fork, the “test-each-commit” github action can fail due to not being able to find a recent merge commit. This problem doesn’t happen in the bitcoin/bitcoin repository because branches in this repository used as the base for pull requests always point at merge commits.

This PR replaces #29259 using the self hosted workers via Cirrus instead of Github.

You can see this PR in action on this pull request to my fork: https://github.com/Sjors/bitcoin/pull/30

To test it yourself:

spin up at least two self hosted runners. Either use a seperate VM for each, or give them their own user.
Install Podman and other CI dependencies (see .cirrus.yml)
Give Cirrus access to your fork at https://cirrus-ci.com/settings/github/YOU
Get a token from Cirrus and use it to start your worker(s)
Optionally set NO_BRANCH=true ~~and NO_ARM=true~~ env variables (see .cirrus.yml)
make a pull request to your own fork, with this PR as the base branch

Security wise: when dealing with code from strangers on the internet, review it first before running the CI. There’s a Cirrus check-box that requires approval for people without write access to trigger CI.

ci: forks can opt-out of CI branch push (Cirrus only) e9bfbb5414

ci: test-each-commit merge base optional

The ci "test-each-commit" job fetches the PR branch being tested with a depth of (# of commits in PR + 2), and then tries to run tests on commits after the most recent merge commit.

When a PR is opened against a bitcoin core branch, a merge commit is always guaranteed to exist within the fetch depth, because bitcoin core branches always point at merge commits.

However, in fork repositories, pull requests can be opened based on other branches that don't contain recent merge commits, and this will currently cause git rev-list to fail with fatal: bad revision '^^@'.

Work around this problem by not requiring a recent merge commit, and just testing on all fetched commits if a merge commit can't be found.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>

576828e732

Sjors force-pushed on Jun 25, 2024

Sjors commented at 6:10 pm on June 25, 2024: member

@ryanofsky thanks for the PR description rewrite!

ryanofsky approved

ryanofsky commented at 6:32 pm on June 25, 2024: contributor

Code review ACK 576828e732bacb61b608cd89f669a2936d3e8d00.

I also tweaked formatting in the PR description to fix a few problems in the markdown copy / paste.

The suggested text “Fix issues with CI on forks” in the description was actually meant to be a replacement title for the PR, since the second commit in the PR fixes an issue with a github actions job, not a cirrus job. So could remove this and maybe update the title.

Sjors renamed this:
~~Support self-hosted Cirrus workers on forks~~
Fix issues with CI on forks
on Jun 25, 2024

maflcko approved

maflcko commented at 6:38 am on June 26, 2024: member

lgtm ACK

My preference would be to have the .cirrus.yml as single source of ground truth. However, requiring forks to update this file is too cumbersome, so this alternative seems fine.

(The “config resolution strategy” is another setting outside the yaml that needs to be modified to “merge for prs”, which could be documented in a follow-up)

But this looks good in any case.

maflcko commented at 6:43 am on June 26, 2024: member

Optionally set NO_BRANCH=true ~~and NO_ARM=true~~ env variables (see .cirrus.yml)

This is wrong in the description. The name was changed.

Sjors commented at 6:49 am on June 26, 2024: member

@maflcko fixed in description

ryanofsky commented at 5:21 pm on July 8, 2024: contributor

@maflcko is your “lgtm ACK” more of a partial ACK, or do you think this is good to merge? I think practically speaking this looks ok to merge, but I wouldn’t want to merge it myself if I’m the only full reviewer.

I guess there are also number of possible followups to this PR if it is merged:

Maybe preventing github actions from running twice on PRs, #29274 (comment).
Maybe restoring ability to skip cirrus ARM jobs #29274 (comment).
Maybe documenting need to set bitcoin-core/gui SKIP_BRANCH_PUSH value somewhere #29274 (comment).
Maybe documenting need to set “config resolution strategy” somewhere #29274 (comment).

maflcko commented at 7:48 am on July 9, 2024: member

ACK 576828e732bacb61b608cd89f669a2936d3e8d00

ryanofsky merged this on Jul 10, 2024

ryanofsky closed this on Jul 10, 2024

Sjors commented at 3:40 pm on July 19, 2024: member

Picked @ryanofsky’s first suggestion into #30487.

Re. the second suggestion: so far ARM jobs have not been a source of headaches on my own fork (only TSAN and MSAN are atm).

Sjors deleted the branch on Aug 12, 2024

DashCoreAutoGuix referenced this in commit 6b56220c1f on Jul 31, 2025

DashCoreAutoGuix referenced this in commit 57b7031bcf on Jul 31, 2025

DashCoreAutoGuix referenced this in commit df4964b93a on Aug 9, 2025

bitcoin locked this on Aug 12, 2025

Fix issues with CI on forks #29274

Code Coverage

Reviews

Conflicts