ci, gha: Add “x86_64: Linux (Debian stable)” GitHub Actions job #1396
pull hebasto wants to merge 5 commits into bitcoin-core:master from hebasto:230806-gha-linux changing 4 files +82 −37-
hebasto commented at 6:59 am on August 7, 2023: memberSolves one item in #1392 partially.
-
hebasto cross-referenced this on Aug 7, 2023 from issue ci: Future of CI after Cirrus pricing change by real-or-random
-
MarcoFalke commented at 7:32 am on August 7, 2023: noneWhat are the ghcr usage limits?
-
hebasto commented at 8:18 am on August 7, 2023: member
What are the ghcr usage limits?
GitHub Container Registry is free for public images.
-
in .github/workflows/ci.yml:61 in a6ba359cc5 outdated
56+ if: contains(needs.modified_files.outputs.all, '.github/workflows/docker.yml') || contains(needs.modified_files.outputs.all, 'ci/linux-debian.Dockerfile') 57+ 58+ x86_64_debian_stable: 59+ name: "x86_64: Linux (Debian stable)" 60+ runs-on: ubuntu-latest 61+ container: ghcr.io/${{ github.repository_owner }}/secp256k1-ci-image:latest
MarcoFalke commented at 8:50 am on August 7, 2023:how will this work, when a pull modifies thelatest
tag and breaks all other build, because they skip thedocker_image
step?
hebasto commented at 9:22 am on August 7, 2023:how will this work, when a pull modifies the
latest
tag and breaks all other build, because they skip thedocker_image
step?Do you mean concurrent builds?
MarcoFalke commented at 9:26 am on August 7, 2023:No, I mean any pull that modifies the image, and all other task runs that need the “previous” (aka current) image frommaster
.
hebasto commented at 9:31 am on August 7, 2023:I see. Going to address this issue shortly.
hebasto force-pushed on Aug 7, 2023in ci/cirrus.sh:10 in 7ea529c40b outdated
3@@ -4,6 +4,10 @@ set -eux 4 5 export LC_ALL=C 6 7+if [ "${GITHUB_ACTIONS:-undefined}" = "true" ]; then 8+ git config --global --add safe.directory "$GITHUB_WORKSPACE" 9+fi 10+
real-or-random commented at 4:56 pm on August 7, 2023:I think it will be good to do this in the workflow, and keep this file agnostic of the CI. (Okay, it’s currently calledcirrus.sh
, and we should anyway change this. ^^)
in .github/workflows/ci.yml:16 in 6d3d4b9713 outdated
11+ uses: actions/checkout@v3 12+ 13+ - name: Get modified files 14+ id: changes 15+ env: 16+ COMMIT_BEFORE: ${{ github.event_name != 'pull_request' && github.event.pull_request.base.sha || (github.event.before != '0000000000000000000000000000000000000000' && github.event.before || github.event.repository.default_branch) }}
real-or-random commented at 4:58 pm on August 7, 2023:Can you explain what this achieves?
hebasto commented at 8:17 pm on August 7, 2023:Fixed a typo
!=
–>==
.Can you explain what this achieves?
By default, the
actions/checkout
usesfetch-depth: 1
.We need to fetch
COMMIT_BEFORE
to figure out the modified files.For pull requests, it is a parent commit of the base branch. For pushes, it is the commit from the previous push.
real-or-random commented at 8:44 pm on August 8, 2023:It seems there’s a built-in feature for this https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#onpushpull_requestpull_request_targetpathspaths-ignore
hebasto commented at 8:47 pm on August 8, 2023:It seems there’s a built-in feature for this docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#onpushpull_requestpull_request_targetpathspaths-ignore
It was the place I started from. However, I didn’t find a design with that feature that allows to have two mutually exclusive execution paths depending on whether docker image is invalidated.
real-or-random commented at 4:50 pm on August 9, 2023:I see. But would you need to mutually exclusive exec paths? The way I imagine it is just to skip the rebuilding of the image when it doesn’t need to be rebuilt. But maybe I’m overlooking something.real-or-random commented at 5:00 pm on August 7, 2023: contributorConcept ACK.
Very nice that the prebuilt docker stuff can be moved to GitHub Actions easily. I think only time will tell whether we prefer GH Actions over Cirrus in terms of latency, reliability, support, etc, but I’m happy to try.
hebasto force-pushed on Aug 7, 2023hebasto renamed this:
[POC] ci: Add "x86_64: Linux (Debian stable)" GitHub Actions job
ci: Add "x86_64: Linux (Debian stable)" GitHub Actions job
on Aug 7, 2023hebasto commented at 8:08 pm on August 7, 2023: memberUpdated 7ea529c40bc890cc8c398bb410413010350ee29b -> c066aff6cfedb3ad98cc8f76ad87da7a06a17011 (pr1396.02 -> pr1396.03, diff):
- addressed @MarcoFalke’s comment
- addressed @real-or-random’s comment
UPD. Also the workflow has been disabled for tags. I don’t see any reason in such behaviour, However, please let me know if I am wrong.
MarcoFalke commented at 6:10 am on August 8, 2023: noneGitHub Container Registry is free for public images.
Ok, so that will also need write-access for the CI again. If there is no storage limit, it would still be good to clean old tags, no? I just can’t imagine that GH will run an unlimited, free, append-only storage forever.
hebasto commented at 8:06 am on August 8, 2023: memberGitHub Container Registry is free for public images.
Ok, so that will also need write-access for the CI again.
Not the whole CI, but a single job: https://github.com/bitcoin-core/secp256k1/blob/c066aff6cfedb3ad98cc8f76ad87da7a06a17011/.github/workflows/ci.yml#L72-L77 and its the corresponding reusable workflow: https://github.com/bitcoin-core/secp256k1/blob/c066aff6cfedb3ad98cc8f76ad87da7a06a17011/.github/workflows/docker.yml#L12-L13
If there is no storage limit, it would still be good to clean old tags, no? I just can’t imagine that GH will run an unlimited, free, append-only storage forever.
I agree with you. Going to leave this issue for a follow up as a non-critical one.
(a note to myself: https://docs.github.com/en/packages/learn-github-packages/deleting-and-restoring-a-package)
MarcoFalke commented at 8:26 am on August 8, 2023: noneIt just seems overly complicated to think about. What if a malicious pull request pushes a malicious image to the image tag of another pull request? Wouldn’t it be easier to only have a single tag to solve all issues:
- Storage requirements are constant.
- Write access can be limited to the main development branch only.
- Pull requests always use the cache from the main development branch and always build from scratch on a new push, if they modify the docker image.
hebasto commented at 11:22 am on August 8, 2023: member- Pull requests always use the cache from the main development branch and always build from scratch on a new push, if they modify the docker image.
Unfortunately,
jobs.<job_id>.container.image
supports only published images, not built locally.MarcoFalke commented at 11:51 am on August 8, 2023: noneYou could implement it yourself with three lines? Something like:podman pull $remote_image_from_main
(to init the cache) and thenpodman build ...
andpodman run cirrus.sh
hebasto commented at 11:53 am on August 8, 2023: memberYou could implement it yourself with three lines? Something like:
podman pull $remote_image_from_main
(to init the cache) and thenpodman build ...
andpodman run cirrus.sh
Sure. That is the approach I’m currently working on :)
MarcoFalke commented at 11:57 am on August 8, 2023: none(re #1396 (comment))
Or maybe there could be two tags:
main
andrandom_pull
, where both are downloaded to opportunistically seed the cache. This would double the storage and require write permission in pull requests again, but would save on compute on a sunny day (no malicious pull request or honest concurrent pull request that modifies the docker image).real-or-random commented at 12:28 pm on August 8, 2023: contributor(re #1396 (comment))
Or maybe there could be two tags:
main
andrandom_pull
, where both are downloaded to opportunistically seed the cache. This would double the storage and require write permission in pull requests again, but would save on compute on a sunny day (no malicious pull request or honest concurrent pull request that modifies the docker image).I think what you suggested above is good enough. It’s anyway not a bad to rebuild every time in PRs that modify the Docker image, just to get rid of all potential caching issues during development.
MarcoFalke commented at 12:32 pm on August 8, 2023: noneWhich above do you mean?
IIRC clang is built from scratch, so if there is a matrix of 90 tasks, clang will be built 90 times in a pull, unless there happens to be a cache laying around.
real-or-random commented at 1:43 pm on August 8, 2023: contributorI meant this:
Pull requests always use the cache from the main development branch and always build from scratch on a new push, if they modify the docker image.
IIRC clang is built from scratch, so if there is a matrix of 90 tasks, clang will be built 90 times in a pull, unless there happens to be a cache laying around.
Okay, sure, that’s a problem.
Could we have just one tag per PR, but delete the image when the PR is closed or merged?
real-or-random commented at 2:16 pm on August 8, 2023: contributorI’m not sure if publishing to ghcr is the right approach. Couldn’t we use GHA’s cache? https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows Then we get cache eviction automatically, see https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy.
I can imagine two approaches:
- Cache the built image (similar to what we do now).
- Always “build” the Docker image, but provide the workflow that builds the image with a Docker build cache. (Then
docker build
will always run, but it will immediately succeed when it has a cache hit…) (edit: I think something like this is described here: https://docs.docker.com/build/ci/github-actions/cache/)
hebasto commented at 2:19 pm on August 8, 2023: memberI’m not sure if publishing to ghcr is the right approach. Couldn’t we use GHA’s cache? docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows Then we get cache eviction automatically, see docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy.
I can imagine two approaches:
-
Cache the built image (similar to what we do now).
-
Always “build” the Docker image, but provide the workflow that builds the image with a Docker build cache. (Then
docker build
will always run, but it will immediately succeed when it has a cache hit…)
I have already tried such an approach.
… it will immediately succeed…
That’s a bit exaggerated :)
UPD. I mean, GBs of cache take time to pack / upload / download / unpack…
MarcoFalke commented at 2:29 pm on August 8, 2023: noneUPD. I mean, GBs of cache take time to pack / upload / download / unpack…
How and why is uploading an image to cache different from uploading an image to ghcr, time wise?
hebasto commented at 2:33 pm on August 8, 2023: memberUPD. I mean, GBs of cache take time to pack / upload / download / unpack…
How and why is uploading an image to cache different from uploading an image to ghcr, time wise?
You’re right. There are no significant difference. My point was that there is no reason to expect a success of a cached job “immediately”.
MarcoFalke commented at 2:39 pm on August 8, 2023: noneAh, I see. Maybe a single task can be spawned before the matrix to populate the cache? IIUC the docker image in this repo is static (the same) for all tasks in the matrix?real-or-random commented at 4:56 pm on August 8, 2023: contributorI can imagine two approaches:
- Cache the built image (similar to what we do now).
- Always “build” the Docker image, but provide the workflow that builds the image with a Docker build cache. (Then
docker build
will always run, but it will immediately succeed when it has a cache hit…)
I have already tried such an approach.
Which one have you tried, and why was it not successful?
hebasto commented at 7:50 pm on August 8, 2023: memberI can imagine two approaches:
- Cache the built image (similar to what we do now).
- Always “build” the Docker image, but provide the workflow that builds the image with a Docker build cache. (Then
docker build
will always run, but it will immediately succeed when it has a cache hit…)
I have already tried such an approach.
Which one have you tried, and why was it not successful?
The latter. Its recent incarnation is here: https://github.com/hebasto/secp256k1/tree/230808-gha-linux.3
It fails due to “no space left on device” error. However, the workflow works with simpler
linux-debian.Dockerfile
.real-or-random commented at 8:07 pm on August 8, 2023: contributorIt fails due to “no space left on device” error
I can imagine this is due to our use of
/tmp
.edit: It may be helpful to try
df -h
inside docker to see what’s going on.real-or-random added the label ci on Aug 8, 2023hebasto commented at 9:57 pm on August 8, 2023: memberIt may be helpful to try
df -h
inside docker to see what’s going on.That seems a bit tricky as everything happens within a third-partydocker/build-push-action
.0[#11](/bitcoin-core-secp256k1/11/) 113.9 Filesystem Size Used Avail Use% Mounted on 1[#11](/bitcoin-core-secp256k1/11/) 113.9 overlay 84G 80G 4.2G 96% / 2[#11](/bitcoin-core-secp256k1/11/) 113.9 tmpfs 64M 0 64M 0% /dev 3[#11](/bitcoin-core-secp256k1/11/) 113.9 shm 64M 0 64M 0% /dev/shm 4[#11](/bitcoin-core-secp256k1/11/) 113.9 /dev/root 84G 80G 4.2G 96% /etc/resolv.conf 5[#11](/bitcoin-core-secp256k1/11/) 113.9 tmpfs 3.4G 0 3.4G 0% /proc/acpi 6[#11](/bitcoin-core-secp256k1/11/) 113.9 tmpfs 3.4G 0 3.4G 0% /sys/firmware 7[#11](/bitcoin-core-secp256k1/11/) 113.9 tmpfs 3.4G 0 3.4G 0% /proc/scsi
Maybe just split MSVC part into a separated image?
MarcoFalke commented at 4:54 am on August 9, 2023: noneYou can clear space, see https://github.com/google/oss-fuzz/commit/b7f04d782277638a67bc44865de445977eed4708hebasto commented at 10:10 am on August 9, 2023: memberYou can clear space, see google/oss-fuzz@b7f04d7
Thank you! I’m going to look into your suggestion closely.
Maybe just split MSVC part into a separated image?
Meanwhile, splitting Docker images has its own benefits like an opportunity to build all images in parallel and minimizing the total time needed to rebuild.
hebasto cross-referenced this on Aug 9, 2023 from issue ci: Move tidy to persistent worker by MarcoFalkereal-or-random commented at 11:11 am on August 9, 2023: contributorThere’s also https://github.com/insightsengineering/disk-space-reclaimer (not suggesting to use that action, but this has a some nice stats etc.)
And yep, https://github.com/hebasto/secp256k1/commit/9db9c1ccd531d020d709af3e8b466b3fb250094c is also a good idea. We could also remove the gcc sources/build dir after installing.
MarcoFalke commented at 11:46 am on August 9, 2023: noneThere’s also https://github.com/insightsengineering/disk-space-reclaimer (not suggesting to use that action, but this has a some nice stats etc.)
My approach clears only 4GB less, but doesn’t consume any CPU or disk ;)
hebasto cross-referenced this on Aug 10, 2023 from issue ci, gha: Add Windows jobs based on Linux image by hebastohebasto marked this as a draft on Aug 10, 2023hebasto force-pushed on Aug 11, 2023hebasto force-pushed on Aug 12, 2023hebasto renamed this:
ci: Add "x86_64: Linux (Debian stable)" GitHub Actions job
ci, gha: Add "x86_64: Linux (Debian stable)" GitHub Actions job
on Aug 12, 2023real-or-random commented at 1:44 pm on August 15, 2023: contributorI think, it would be good to cherry-pick https://github.com/real-or-random/secp256k1/commit/20b208df0ab22a2afd80f3cf172f100365766e7b here to see the difference in image size.hebasto force-pushed on Aug 15, 2023hebasto commented at 5:39 pm on August 15, 2023: memberRebased on top of the #1399.
I think, it would be good to cherry-pick real-or-random@20b208d here to see the difference in image size.
Thanks! Cherry-picked.
hebasto force-pushed on Aug 15, 2023ci: Remove GCC build files and sage to reduce size of Docker image ad3e65d9feci, gha: Add "x86_64: Linux (Debian stable)" GitHub Actions job 03c9e6508cci: Remove "x86_64: Linux (Debian stable)" task from Cirrus CI 6617a620d9hebasto force-pushed on Aug 18, 2023hebasto marked this as ready for review on Aug 18, 2023hebasto commented at 1:52 pm on August 18, 2023: memberFWIW, @real-or-random’s commit reduced the cached Docker layers size from ~3.2 GB to ~2.1 GB.hebasto closed this on Aug 18, 2023
hebasto reopened this on Aug 18, 2023
hebasto commented at 3:48 pm on August 18, 2023: memberHere is an example of the last commit’s help: https://github.com/hebasto/secp256k1/actions/runs/5904122537/job/16015555795#step:3:621real-or-random commented at 4:00 pm on August 18, 2023: contributor- Hm, yeah, sad that we need retries…(nit: Call I it
retry_builder
?fallback
sounds like it’s doing something else) - I wonder if artifacts will be a more reliable way of transferring the image from the builder job to the other job. Do you think it’s worth trying? It would be interesting to see if it’s slower or faster.
ci, gha: Add `retry_builder` Docker image builder
This change is aimed at significantly reducing the frequency of failures caused by intermittent network timeouts.
hebasto force-pushed on Aug 18, 2023hebasto commented at 4:09 pm on August 18, 2023: member- Hm, yeah, sad that we need retries…(nit: Call I it
retry_builder
?fallback
sounds like it’s doing something else)
Done.
- I wonder if artifacts will be a more reliable way of transferring the image from the builder job to the other job. Do you think it’s worth trying? It would be interesting to see if it’s slower or faster.
I agree with you. It’s definitely worth trying.
Considering the 1st Sept. deadline, it would be nice to postpone until all tasks are migrated from Cirrus CI.
real-or-random commented at 4:22 pm on August 18, 2023: contributor*
I agree with you. It’s definitely worth trying.
Considering the 1st Sept. deadline, it would be nice to postpone until all tasks are migrated from Cirrus CI.
Agreed, I’ll add an item to the tracking issue.
real-or-random approvedreal-or-random commented at 4:23 pm on August 18, 2023: contributorACK 4ad4914bd15bd856eddb306d86588bdacabb1184 (assuming CI finishes successfully)real-or-random commented at 4:40 pm on August 18, 2023: contributorWell, now one job failed even with the retrying… Perhaps failure probability within a single job are correlated. https://github.com/bitcoin-core/secp256k1/actions/runs/5904705284/job/16017365422?pr=1396
Idk, I think we should accept some flakiness and move on for now… Cirrus was also not perfect in that regard, even though they improved their automatic re-running. So my ACK stands.
hebasto commented at 10:28 am on August 19, 2023: memberI wonder if artifacts will be a more reliable way of transferring the image from the builder job to the other job.
Uploading of the image (2036732288 bytes) takes appr. 15 minutes.
UPD. See builds in my personal repo – https://github.com/hebasto/secp256k1/actions/runs/5912496333.
UPD. Just a reminder that there is one more option that is “abusing” of GitHub Container Registry.
ci, gha: Drop `driver-opts.network` input for `setup-buildx-action` e10878f58ehebasto commented at 8:29 am on August 20, 2023: memberI took the liberty to push one more commit which partially reverts the change introduced earlier and based on this comment.
Actually, https://github.com/moby/buildkit/issues/3969 is related to writing to the GitHub Actions cache, but our
run-in-docker-action
only reads from it.Before pushing, I’ve successfully run it 10 times in my personal repo – https://github.com/hebasto/secp256k1/actions/runs/5912556017.
Here is an example of how network glitches are currently being handled:
0... 1[#11](/bitcoin-core-secp256k1/11/) sha256:6ee24e002191e89c757a065afeab642edbcab93b1d8c700e01985f03d349b115 598.74MB / 1.43GB 348.3s 2[#11](/bitcoin-core-secp256k1/11/) sha256:6ee24e002191e89c757a065afeab642edbcab93b1d8c700e01985f03d349b115 602.93MB / 1.43GB 353.4s 3[#11](/bitcoin-core-secp256k1/11/) 353.8 error: failed to copy: read tcp 172.17.0.2:53852->20.209.52.129:443: read: connection reset by peer 4[#11](/bitcoin-core-secp256k1/11/) 353.8 retrying in 1s 5[#11](/bitcoin-core-secp256k1/11/) sha256:6ee24e002191e89c757a065afeab642edbcab93b1d8c700e01985f03d349b115 604.54MB / 1.43GB 1.7s 6[#11](/bitcoin-core-secp256k1/11/) sha256:6ee24e002191e89c757a065afeab642edbcab93b1d8c700e01985f03d349b115 608.73MB / 1.43GB 6.9s 7[#11](/bitcoin-core-secp256k1/11/) sha256:6ee24e002191e89c757a065afeab642edbcab93b1d8c700e01985f03d349b115 613.97MB / 1.43GB 12.0s 8...
hebasto cross-referenced this on Aug 20, 2023 from issue ci, gha: Move more non-x86_64 tasks from Cirrus CI to GitHub Actions by hebastoreal-or-random commented at 8:36 am on August 21, 2023: contributorHere is an example of how network glitches are currently being handled:
Oh, nice! Just to get this right: the retries are only possible with the dropped
driver-opts
?hebasto commented at 8:41 am on August 21, 2023: memberHere is an example of how network glitches are currently being handled:
Oh, nice! Just to get this right: the retries are only possible with the dropped
driver-opts
?Unfortunately, I cannot confirm it as did not find such a behavior description in the docs.
real-or-random commented at 9:36 am on August 21, 2023: contributorHm, 3 jobs failed during the Checkout stage with HTTP 429 (too many requests). I wonder if this is related to having so many parallel jobs. Have you seen this before, e.g. when testing in your repo?
edit: Possibly related because it was opened yesterday: https://github.com/actions/checkout/issues/1432
hebasto commented at 10:34 am on August 21, 2023: memberHm, 3 jobs failed during the Checkout stage with HTTP 429 (too many requests). I wonder if this is related to having so many parallel jobs. Have you seen this before, e.g. when testing in your repo?
I’ve seen the same error in my personal repo exactly at the same time (a very short period). I think, it was one of intermittent GitHub-wide glitches. FWIW, similar issues were observed in Cirrus CI a few times (still they were caused by GitHub).
real-or-random commented at 1:58 pm on August 21, 2023: contributorI wonder if artifacts will be a more reliable way of transferring the image from the builder job to the other job.
Uploading of the image (2036732288 bytes) takes appr. 15 minutes.
UPD. See builds in my personal repo – hebasto/secp256k1/actions/runs/5912496333.
UPD. Just a reminder that there is one more option that is “abusing” of GitHub Container Registry.
Thanks for testing the artifact thing. Downloading the image is much faster apparently (~ 1.3 min), and then the actual jobs run a bit faster. So I can imagine that approach is not too bad if we could avoid re-uploading existing artifacts, e.g., have another digest file that just stores the SHA256, and re-upload only if the SHA does not match. But all of this is probably not worth the complexity if the current approach with the cache turns out to be good enough.
real-or-random approvedreal-or-random commented at 2:04 pm on August 21, 2023: contributorACK e10878f58e4022dbac6e215a89c980a17b95044breal-or-random merged this on Aug 21, 2023real-or-random closed this on Aug 21, 2023
hebasto deleted the branch on Aug 21, 2023
hebasto MarcoFalke real-or-randomLabels
ci
github-metadata-mirror
This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-26 10:15 UTC
This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me