ci, gha: Add “x86_64: Linux (Debian stable)” GitHub Actions job #1396

pull hebasto wants to merge 5 commits into bitcoin-core:master from hebasto:230806-gha-linux changing 4 files +82 −37
  1. hebasto commented at 6:59 am on August 7, 2023: member
    Solves one item in #1392 partially.
  2. hebasto cross-referenced this on Aug 7, 2023 from issue ci: Future of CI after Cirrus pricing change by real-or-random
  3. MarcoFalke commented at 7:32 am on August 7, 2023: none
    What are the ghcr usage limits?
  4. hebasto commented at 8:18 am on August 7, 2023: member

    What are the ghcr usage limits?

    GitHub Container Registry is free for public images.

  5. in .github/workflows/ci.yml:61 in a6ba359cc5 outdated
    56+    if: contains(needs.modified_files.outputs.all, '.github/workflows/docker.yml') || contains(needs.modified_files.outputs.all, 'ci/linux-debian.Dockerfile')
    57+
    58+  x86_64_debian_stable:
    59+    name: "x86_64: Linux (Debian stable)"
    60+    runs-on: ubuntu-latest
    61+    container: ghcr.io/${{ github.repository_owner }}/secp256k1-ci-image:latest
    


    MarcoFalke commented at 8:50 am on August 7, 2023:
    how will this work, when a pull modifies the latest tag and breaks all other build, because they skip the docker_image step?

    hebasto commented at 9:22 am on August 7, 2023:

    how will this work, when a pull modifies the latest tag and breaks all other build, because they skip the docker_image step?

    Do you mean concurrent builds?


    MarcoFalke commented at 9:26 am on August 7, 2023:
    No, I mean any pull that modifies the image, and all other task runs that need the “previous” (aka current) image from master.

    hebasto commented at 9:31 am on August 7, 2023:
    I see. Going to address this issue shortly.

    hebasto commented at 8:08 pm on August 7, 2023:
    Thanks! Fixed.
  6. hebasto force-pushed on Aug 7, 2023
  7. in ci/cirrus.sh:10 in 7ea529c40b outdated
     3@@ -4,6 +4,10 @@ set -eux
     4 
     5 export LC_ALL=C
     6 
     7+if [ "${GITHUB_ACTIONS:-undefined}" = "true" ]; then
     8+    git config --global --add safe.directory "$GITHUB_WORKSPACE"
     9+fi
    10+
    


    real-or-random commented at 4:56 pm on August 7, 2023:
    I think it will be good to do this in the workflow, and keep this file agnostic of the CI. (Okay, it’s currently called cirrus.sh, and we should anyway change this. ^^)

    hebasto commented at 8:09 pm on August 7, 2023:
    Thanks! Resolved.
  8. in .github/workflows/ci.yml:16 in 6d3d4b9713 outdated
    11+        uses: actions/checkout@v3
    12+
    13+      - name: Get modified files
    14+        id: changes
    15+        env:
    16+          COMMIT_BEFORE: ${{ github.event_name != 'pull_request' && github.event.pull_request.base.sha || (github.event.before != '0000000000000000000000000000000000000000' && github.event.before || github.event.repository.default_branch) }}
    


    real-or-random commented at 4:58 pm on August 7, 2023:
    Can you explain what this achieves?

    hebasto commented at 8:17 pm on August 7, 2023:

    Fixed a typo != –> ==.

    Can you explain what this achieves?

    By default, the actions/checkout uses fetch-depth: 1.

    We need to fetch COMMIT_BEFORE to figure out the modified files.

    For pull requests, it is a parent commit of the base branch. For pushes, it is the commit from the previous push.



    hebasto commented at 8:47 pm on August 8, 2023:

    It seems there’s a built-in feature for this docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#onpushpull_requestpull_request_targetpathspaths-ignore

    It was the place I started from. However, I didn’t find a design with that feature that allows to have two mutually exclusive execution paths depending on whether docker image is invalidated.


    real-or-random commented at 4:50 pm on August 9, 2023:
    I see. But would you need to mutually exclusive exec paths? The way I imagine it is just to skip the rebuilding of the image when it doesn’t need to be rebuilt. But maybe I’m overlooking something.
  9. real-or-random commented at 5:00 pm on August 7, 2023: contributor

    Concept ACK.

    Very nice that the prebuilt docker stuff can be moved to GitHub Actions easily. I think only time will tell whether we prefer GH Actions over Cirrus in terms of latency, reliability, support, etc, but I’m happy to try.

  10. hebasto force-pushed on Aug 7, 2023
  11. hebasto renamed this:
    [POC] ci: Add "x86_64: Linux (Debian stable)" GitHub Actions job
    ci: Add "x86_64: Linux (Debian stable)" GitHub Actions job
    on Aug 7, 2023
  12. hebasto commented at 8:08 pm on August 7, 2023: member

    Updated 7ea529c40bc890cc8c398bb410413010350ee29b -> c066aff6cfedb3ad98cc8f76ad87da7a06a17011 (pr1396.02 -> pr1396.03, diff):

    UPD. Also the workflow has been disabled for tags. I don’t see any reason in such behaviour, However, please let me know if I am wrong.

  13. MarcoFalke commented at 6:10 am on August 8, 2023: none

    GitHub Container Registry is free for public images.

    Ok, so that will also need write-access for the CI again. If there is no storage limit, it would still be good to clean old tags, no? I just can’t imagine that GH will run an unlimited, free, append-only storage forever.

  14. hebasto commented at 8:06 am on August 8, 2023: member

    GitHub Container Registry is free for public images.

    Ok, so that will also need write-access for the CI again.

    Not the whole CI, but a single job: https://github.com/bitcoin-core/secp256k1/blob/c066aff6cfedb3ad98cc8f76ad87da7a06a17011/.github/workflows/ci.yml#L72-L77 and its the corresponding reusable workflow: https://github.com/bitcoin-core/secp256k1/blob/c066aff6cfedb3ad98cc8f76ad87da7a06a17011/.github/workflows/docker.yml#L12-L13

    If there is no storage limit, it would still be good to clean old tags, no? I just can’t imagine that GH will run an unlimited, free, append-only storage forever.

    I agree with you. Going to leave this issue for a follow up as a non-critical one.

    (a note to myself: https://docs.github.com/en/packages/learn-github-packages/deleting-and-restoring-a-package)

  15. MarcoFalke commented at 8:26 am on August 8, 2023: none

    It just seems overly complicated to think about. What if a malicious pull request pushes a malicious image to the image tag of another pull request? Wouldn’t it be easier to only have a single tag to solve all issues:

    • Storage requirements are constant.
    • Write access can be limited to the main development branch only.
    • Pull requests always use the cache from the main development branch and always build from scratch on a new push, if they modify the docker image.
  16. hebasto commented at 11:22 am on August 8, 2023: member
    • Pull requests always use the cache from the main development branch and always build from scratch on a new push, if they modify the docker image.

    Unfortunately, jobs.<job_id>.container.image supports only published images, not built locally.

  17. MarcoFalke commented at 11:51 am on August 8, 2023: none
    You could implement it yourself with three lines? Something like: podman pull $remote_image_from_main (to init the cache) and then podman build ... and podman run cirrus.sh
  18. hebasto commented at 11:53 am on August 8, 2023: member

    You could implement it yourself with three lines? Something like: podman pull $remote_image_from_main (to init the cache) and then podman build ... and podman run cirrus.sh

    Sure. That is the approach I’m currently working on :)

  19. MarcoFalke commented at 11:57 am on August 8, 2023: none

    (re #1396 (comment))

    Or maybe there could be two tags: main and random_pull, where both are downloaded to opportunistically seed the cache. This would double the storage and require write permission in pull requests again, but would save on compute on a sunny day (no malicious pull request or honest concurrent pull request that modifies the docker image).

  20. real-or-random commented at 12:28 pm on August 8, 2023: contributor

    (re #1396 (comment))

    Or maybe there could be two tags: main and random_pull, where both are downloaded to opportunistically seed the cache. This would double the storage and require write permission in pull requests again, but would save on compute on a sunny day (no malicious pull request or honest concurrent pull request that modifies the docker image).

    I think what you suggested above is good enough. It’s anyway not a bad to rebuild every time in PRs that modify the Docker image, just to get rid of all potential caching issues during development.

  21. MarcoFalke commented at 12:32 pm on August 8, 2023: none

    Which above do you mean?

    IIRC clang is built from scratch, so if there is a matrix of 90 tasks, clang will be built 90 times in a pull, unless there happens to be a cache laying around.

  22. real-or-random commented at 1:43 pm on August 8, 2023: contributor

    I meant this:

    Pull requests always use the cache from the main development branch and always build from scratch on a new push, if they modify the docker image.


    IIRC clang is built from scratch, so if there is a matrix of 90 tasks, clang will be built 90 times in a pull, unless there happens to be a cache laying around.

    Okay, sure, that’s a problem.

    Could we have just one tag per PR, but delete the image when the PR is closed or merged?

  23. real-or-random commented at 2:16 pm on August 8, 2023: contributor

    I’m not sure if publishing to ghcr is the right approach. Couldn’t we use GHA’s cache? https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows Then we get cache eviction automatically, see https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy.

    I can imagine two approaches:

    • Cache the built image (similar to what we do now).
    • Always “build” the Docker image, but provide the workflow that builds the image with a Docker build cache. (Then docker build will always run, but it will immediately succeed when it has a cache hit…) (edit: I think something like this is described here: https://docs.docker.com/build/ci/github-actions/cache/)
  24. hebasto commented at 2:19 pm on August 8, 2023: member

    I’m not sure if publishing to ghcr is the right approach. Couldn’t we use GHA’s cache? docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows Then we get cache eviction automatically, see docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy.

    I can imagine two approaches:

    • Cache the built image (similar to what we do now).

    • Always “build” the Docker image, but provide the workflow that builds the image with a Docker build cache. (Then docker build will always run, but it will immediately succeed when it has a cache hit…)

    I have already tried such an approach.

    … it will immediately succeed…

    That’s a bit exaggerated :)

    UPD. I mean, GBs of cache take time to pack / upload / download / unpack…

  25. MarcoFalke commented at 2:29 pm on August 8, 2023: none

    UPD. I mean, GBs of cache take time to pack / upload / download / unpack…

    How and why is uploading an image to cache different from uploading an image to ghcr, time wise?

  26. hebasto commented at 2:33 pm on August 8, 2023: member

    UPD. I mean, GBs of cache take time to pack / upload / download / unpack…

    How and why is uploading an image to cache different from uploading an image to ghcr, time wise?

    You’re right. There are no significant difference. My point was that there is no reason to expect a success of a cached job “immediately”.

  27. MarcoFalke commented at 2:39 pm on August 8, 2023: none
    Ah, I see. Maybe a single task can be spawned before the matrix to populate the cache? IIUC the docker image in this repo is static (the same) for all tasks in the matrix?
  28. real-or-random commented at 4:56 pm on August 8, 2023: contributor

    I can imagine two approaches:

    • Cache the built image (similar to what we do now).
    • Always “build” the Docker image, but provide the workflow that builds the image with a Docker build cache. (Then docker build will always run, but it will immediately succeed when it has a cache hit…)

    I have already tried such an approach.

    Which one have you tried, and why was it not successful?

  29. hebasto commented at 7:50 pm on August 8, 2023: member

    I can imagine two approaches:

    • Cache the built image (similar to what we do now).
    • Always “build” the Docker image, but provide the workflow that builds the image with a Docker build cache. (Then docker build will always run, but it will immediately succeed when it has a cache hit…)

    I have already tried such an approach.

    Which one have you tried, and why was it not successful?

    The latter. Its recent incarnation is here: https://github.com/hebasto/secp256k1/tree/230808-gha-linux.3

    It fails due to “no space left on device” error. However, the workflow works with simpler linux-debian.Dockerfile.

  30. real-or-random commented at 8:07 pm on August 8, 2023: contributor

    It fails due to “no space left on device” error

    I can imagine this is due to our use of /tmp.

    edit: It may be helpful to try df -h inside docker to see what’s going on.

  31. real-or-random added the label ci on Aug 8, 2023
  32. hebasto commented at 9:57 pm on August 8, 2023: member

    It may be helpful to try df -h inside docker to see what’s going on.

    That seems a bit tricky as everything happens within a third-party docker/build-push-action.

    0[#11](/bitcoin-core-secp256k1/11/) 113.9 Filesystem      Size  Used Avail Use% Mounted on
    1[#11](/bitcoin-core-secp256k1/11/) 113.9 overlay          84G   80G  4.2G  96% /
    2[#11](/bitcoin-core-secp256k1/11/) 113.9 tmpfs            64M     0   64M   0% /dev
    3[#11](/bitcoin-core-secp256k1/11/) 113.9 shm              64M     0   64M   0% /dev/shm
    4[#11](/bitcoin-core-secp256k1/11/) 113.9 /dev/root        84G   80G  4.2G  96% /etc/resolv.conf
    5[#11](/bitcoin-core-secp256k1/11/) 113.9 tmpfs           3.4G     0  3.4G   0% /proc/acpi
    6[#11](/bitcoin-core-secp256k1/11/) 113.9 tmpfs           3.4G     0  3.4G   0% /sys/firmware
    7[#11](/bitcoin-core-secp256k1/11/) 113.9 tmpfs           3.4G     0  3.4G   0% /proc/scsi
    

    Maybe just split MSVC part into a separated image?

  33. MarcoFalke commented at 4:54 am on August 9, 2023: none
  34. hebasto commented at 10:10 am on August 9, 2023: member

    @MarcoFalke

    You can clear space, see google/oss-fuzz@b7f04d7

    Thank you! I’m going to look into your suggestion closely.

    Maybe just split MSVC part into a separated image?

    Meanwhile, splitting Docker images has its own benefits like an opportunity to build all images in parallel and minimizing the total time needed to rebuild.

  35. hebasto cross-referenced this on Aug 9, 2023 from issue ci: Move tidy to persistent worker by MarcoFalke
  36. real-or-random commented at 11:11 am on August 9, 2023: contributor

    There’s also https://github.com/insightsengineering/disk-space-reclaimer (not suggesting to use that action, but this has a some nice stats etc.)

    And yep, https://github.com/hebasto/secp256k1/commit/9db9c1ccd531d020d709af3e8b466b3fb250094c is also a good idea. We could also remove the gcc sources/build dir after installing.

  37. MarcoFalke commented at 11:46 am on August 9, 2023: none

    There’s also https://github.com/insightsengineering/disk-space-reclaimer (not suggesting to use that action, but this has a some nice stats etc.)

    My approach clears only 4GB less, but doesn’t consume any CPU or disk ;)

  38. hebasto cross-referenced this on Aug 10, 2023 from issue ci, gha: Add Windows jobs based on Linux image by hebasto
  39. hebasto marked this as a draft on Aug 10, 2023
  40. hebasto commented at 8:05 am on August 10, 2023: member

    Converting to draft.

    Please consider #1398 first that uses GHA cache rather ghcr.io.

  41. hebasto force-pushed on Aug 11, 2023
  42. hebasto commented at 10:13 pm on August 11, 2023: member

    Rebased on top of the #1398.

    The PR description has been updated.

  43. hebasto force-pushed on Aug 12, 2023
  44. hebasto renamed this:
    ci: Add "x86_64: Linux (Debian stable)" GitHub Actions job
    ci, gha: Add "x86_64: Linux (Debian stable)" GitHub Actions job
    on Aug 12, 2023
  45. real-or-random commented at 1:44 pm on August 15, 2023: contributor
    I think, it would be good to cherry-pick https://github.com/real-or-random/secp256k1/commit/20b208df0ab22a2afd80f3cf172f100365766e7b here to see the difference in image size.
  46. hebasto force-pushed on Aug 15, 2023
  47. hebasto commented at 5:39 pm on August 15, 2023: member

    Rebased on top of the #1399.

    I think, it would be good to cherry-pick real-or-random@20b208d here to see the difference in image size.

    Thanks! Cherry-picked.

  48. hebasto force-pushed on Aug 15, 2023
  49. ci: Remove GCC build files and sage to reduce size of Docker image ad3e65d9fe
  50. ci, gha: Add "x86_64: Linux (Debian stable)" GitHub Actions job 03c9e6508c
  51. ci: Remove "x86_64: Linux (Debian stable)" task from Cirrus CI 6617a620d9
  52. hebasto force-pushed on Aug 18, 2023
  53. hebasto marked this as ready for review on Aug 18, 2023
  54. hebasto commented at 1:52 pm on August 18, 2023: member
    FWIW, @real-or-random’s commit reduced the cached Docker layers size from ~3.2 GB to ~2.1 GB.
  55. hebasto closed this on Aug 18, 2023

  56. hebasto reopened this on Aug 18, 2023

  57. hebasto commented at 3:48 pm on August 18, 2023: member
  58. real-or-random commented at 4:00 pm on August 18, 2023: contributor
    • Hm, yeah, sad that we need retries…(nit: Call I it retry_builder? fallback sounds like it’s doing something else)
    • I wonder if artifacts will be a more reliable way of transferring the image from the builder job to the other job. Do you think it’s worth trying? It would be interesting to see if it’s slower or faster.
  59. ci, gha: Add `retry_builder` Docker image builder
    This change is aimed at significantly reducing the frequency of failures
    caused by intermittent network timeouts.
    4ad4914bd1
  60. hebasto force-pushed on Aug 18, 2023
  61. hebasto commented at 4:09 pm on August 18, 2023: member
    • Hm, yeah, sad that we need retries…(nit: Call I it retry_builder? fallback sounds like it’s doing something else)

    Done.

    • I wonder if artifacts will be a more reliable way of transferring the image from the builder job to the other job. Do you think it’s worth trying? It would be interesting to see if it’s slower or faster.

    I agree with you. It’s definitely worth trying.

    Considering the 1st Sept. deadline, it would be nice to postpone until all tasks are migrated from Cirrus CI.

  62. real-or-random commented at 4:22 pm on August 18, 2023: contributor

    *

    I agree with you. It’s definitely worth trying.

    Considering the 1st Sept. deadline, it would be nice to postpone until all tasks are migrated from Cirrus CI.

    Agreed, I’ll add an item to the tracking issue.

  63. real-or-random approved
  64. real-or-random commented at 4:23 pm on August 18, 2023: contributor
    ACK 4ad4914bd15bd856eddb306d86588bdacabb1184 (assuming CI finishes successfully)
  65. real-or-random commented at 4:40 pm on August 18, 2023: contributor

    Well, now one job failed even with the retrying… Perhaps failure probability within a single job are correlated. https://github.com/bitcoin-core/secp256k1/actions/runs/5904705284/job/16017365422?pr=1396

    Idk, I think we should accept some flakiness and move on for now… Cirrus was also not perfect in that regard, even though they improved their automatic re-running. So my ACK stands.

  66. hebasto commented at 10:28 am on August 19, 2023: member

    I wonder if artifacts will be a more reliable way of transferring the image from the builder job to the other job.

    Uploading of the image (2036732288 bytes) takes appr. 15 minutes.

    UPD. See builds in my personal repo – https://github.com/hebasto/secp256k1/actions/runs/5912496333.

    UPD. Just a reminder that there is one more option that is “abusing” of GitHub Container Registry.

  67. ci, gha: Drop `driver-opts.network` input for `setup-buildx-action` e10878f58e
  68. hebasto commented at 8:29 am on August 20, 2023: member

    I took the liberty to push one more commit which partially reverts the change introduced earlier and based on this comment.

    Actually, https://github.com/moby/buildkit/issues/3969 is related to writing to the GitHub Actions cache, but our run-in-docker-action only reads from it.

    Before pushing, I’ve successfully run it 10 times in my personal repo – https://github.com/hebasto/secp256k1/actions/runs/5912556017.

    Here is an example of how network glitches are currently being handled:

    0...
    1[#11](/bitcoin-core-secp256k1/11/) sha256:6ee24e002191e89c757a065afeab642edbcab93b1d8c700e01985f03d349b115 598.74MB / 1.43GB 348.3s
    2[#11](/bitcoin-core-secp256k1/11/) sha256:6ee24e002191e89c757a065afeab642edbcab93b1d8c700e01985f03d349b115 602.93MB / 1.43GB 353.4s
    3[#11](/bitcoin-core-secp256k1/11/) 353.8 error: failed to copy: read tcp 172.17.0.2:53852->20.209.52.129:443: read: connection reset by peer
    4[#11](/bitcoin-core-secp256k1/11/) 353.8 retrying in 1s
    5[#11](/bitcoin-core-secp256k1/11/) sha256:6ee24e002191e89c757a065afeab642edbcab93b1d8c700e01985f03d349b115 604.54MB / 1.43GB 1.7s
    6[#11](/bitcoin-core-secp256k1/11/) sha256:6ee24e002191e89c757a065afeab642edbcab93b1d8c700e01985f03d349b115 608.73MB / 1.43GB 6.9s
    7[#11](/bitcoin-core-secp256k1/11/) sha256:6ee24e002191e89c757a065afeab642edbcab93b1d8c700e01985f03d349b115 613.97MB / 1.43GB 12.0s
    8...
    
  69. hebasto cross-referenced this on Aug 20, 2023 from issue ci, gha: Move more non-x86_64 tasks from Cirrus CI to GitHub Actions by hebasto
  70. hebasto commented at 8:22 pm on August 20, 2023: member
    Continuation of this PR is here: #1406.
  71. real-or-random commented at 8:36 am on August 21, 2023: contributor

    Here is an example of how network glitches are currently being handled:

    Oh, nice! Just to get this right: the retries are only possible with the dropped driver-opts?

  72. hebasto commented at 8:41 am on August 21, 2023: member

    Here is an example of how network glitches are currently being handled:

    Oh, nice! Just to get this right: the retries are only possible with the dropped driver-opts?

    Unfortunately, I cannot confirm it as did not find such a behavior description in the docs.

  73. real-or-random commented at 9:36 am on August 21, 2023: contributor

    Hm, 3 jobs failed during the Checkout stage with HTTP 429 (too many requests). I wonder if this is related to having so many parallel jobs. Have you seen this before, e.g. when testing in your repo?

    edit: Possibly related because it was opened yesterday: https://github.com/actions/checkout/issues/1432

  74. hebasto commented at 10:34 am on August 21, 2023: member

    Hm, 3 jobs failed during the Checkout stage with HTTP 429 (too many requests). I wonder if this is related to having so many parallel jobs. Have you seen this before, e.g. when testing in your repo?

    I’ve seen the same error in my personal repo exactly at the same time (a very short period). I think, it was one of intermittent GitHub-wide glitches. FWIW, similar issues were observed in Cirrus CI a few times (still they were caused by GitHub).

  75. real-or-random commented at 1:58 pm on August 21, 2023: contributor

    I wonder if artifacts will be a more reliable way of transferring the image from the builder job to the other job.

    Uploading of the image (2036732288 bytes) takes appr. 15 minutes.

    UPD. See builds in my personal repo – hebasto/secp256k1/actions/runs/5912496333.

    UPD. Just a reminder that there is one more option that is “abusing” of GitHub Container Registry.

    Thanks for testing the artifact thing. Downloading the image is much faster apparently (~ 1.3 min), and then the actual jobs run a bit faster. So I can imagine that approach is not too bad if we could avoid re-uploading existing artifacts, e.g., have another digest file that just stores the SHA256, and re-upload only if the SHA does not match. But all of this is probably not worth the complexity if the current approach with the cache turns out to be good enough.

  76. real-or-random approved
  77. real-or-random commented at 2:04 pm on August 21, 2023: contributor
    ACK e10878f58e4022dbac6e215a89c980a17b95044b
  78. real-or-random merged this on Aug 21, 2023
  79. real-or-random closed this on Aug 21, 2023

  80. hebasto deleted the branch on Aug 21, 2023

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/secp256k1. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2024-12-26 10:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me