ci: allow passing extra args to docker build with CI_IMAGE_BUILD_EXTRA

0xB10C commented at 7:21 pm on November 26, 2024: contributor

By setting CI_IMAGE_BUILD_EXTRA_ARGS, a CI runner can influence the docker build of the CI container image. This allows to, for example, pass --cache-to and --cache-from to the build, which is useful for ephemeral, self-hosted CI runners.

ci: allow passing extra args to docker build

By setting `CI_IMAGE_BUILD_EXTRA_ARGS`, a CI runner can influence the
docker build of the CI container image. This allows to, for example,
pass `--cache-to` and `--cache-from` to the build, which is useful
for ephemeral CI runners.

f8c8e23c99

DrahtBot commented at 7:21 pm on November 26, 2024: contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31377.

Reviews

See the guideline for information on the review process. A summary of reviews will appear here.

0xB10C commented at 7:21 pm on November 26, 2024: contributor

I’ve been looking into setting up my own ephemeral CI runners. Ideally, these would cache the CI container images that we build. By setting CI_IMAGE_BUILD_EXTRA_ARGS to --cache-to type=local,dest=/cache/docker,mode=max --cache-from type=local,src=/cache/docker (see https://docs.docker.com/build/cache/backends/local/) I can cache the images (and reusable layers of the images) for docker build. This is both nice for a faster build (the docker build in the MSan task takes > 1h), but also avoids failures when e.g. the Ubuntu apt mirrors aren’t reachable.

The current CI runners don’t need this, as they aren’t ephemeral. They cache the images in their internal docker image store which is reused in the next CI task. A general alternative would be to regularly build CI docker images, push them to a registry, and (cache-from)/pull them during the build. However, this seems to be a lot more effort.

DrahtBot added the label Tests on Nov 26, 2024

maflcko commented at 7:46 am on November 27, 2024: member

The current CI runners don’t need this, as they aren’t ephemeral.

Currently, each runner is their own user account, with their own podman image layer cache, so in theory, they’d benefit from this as well.

However, the images are quite large (especially msan), so my worry would be that after a sufficient number of builds, the workers will run out of storage, as the cache isn’t cleared like the local images are.

0xB10C commented at 11:11 am on November 27, 2024: contributor

It occurred to me that the local cache needs to be keyed by e.g. ${CONTAINER_NAME} since different tasks build different images. This wasn’t clear to me from the build cache documentation at first. Setting CI_IMAGE_BUILD_EXTRA_ARGS="--cache-to type=local,dest=/cache/docker/${CONTAINER_NAME}" doesn’t work, as bash does not expand variables in variables. I’ll mark this as draft for now until I find a solution.

However, the images are quite large (especially msan), so my worry would be that after a sufficient number of builds, the workers will run out of storage, as the cache isn’t cleared like the local images are.

Yes, the local build cache dir seems to grow indefinitely as old layers aren’t cleaned up. Based on the “cache versioning” documentation, I think we wouldn’t need to keep old layers around as we wouldn’t use the digest to reference them. A naive approach would be to use --cache-from type=local,src=/path/to/docker/build/cache/${CONTAINER_NAME} with --cache-to type=local,src=$tmpdir. After the build, overwrite /path/to/docker/build/cache/${CONTAINER_NAME} with the contents of $tempdir. Only the most recent build cache would be kept.

0xB10C marked this as a draft on Nov 27, 2024

maflcko commented at 11:21 am on November 27, 2024: member

It occurred to me that the local cache needs to be keyed by e.g. ${CONTAINER_NAME} since different tasks build different images.

Are you sure? The final layers should be different for each task, so should never collide. (Possibly early layers may be shared, if they happen to be the same). If this was an issue, shouldn’t the CI normally fail when running several tasks on the same machine in the same user account?

After the build, overwrite /path/to/docker/build/cache/${CONTAINER_NAME} with the contents of $tempdir. Only the most recent build cache would be kept.

That seems racy, as one worker may still be using the cache when it is (re)moved, which could lead to errors?

0xB10C commented at 12:40 pm on November 27, 2024: contributor

It occurred to me that the local cache needs to be keyed by e.g. ${CONTAINER_NAME} since different tasks build different images.

Are you sure? The final layers should be different for each task, so should never collide. (Possibly early layers may be shared, if they happen to be the same). If this was an issue, shouldn’t the CI normally fail when running several tasks on the same machine in the same user account?

I’m not sure if we are talking about the same thing. I mean the docker build cache of type=local (https://docs.docker.com/build/cache/backends/local/) and not the final images the local docker knows about and lists when calling docker images.

My understanding is that --cache-from type=local,dest=/abc/ will look up /abc/index.json and read the digest. E.g.

 0{
 1  "schemaVersion": 2,
 2  "mediaType": "application/vnd.oci.image.index.v1+json",
 3  "manifests": [
 4    {
 5      "mediaType": "application/vnd.oci.image.index.v1+json",
 6      "digest": "sha256:3a897fa438c7965a7a4919f2d55e5ea578221b4686ebc38f4fb843fac19fc3ec",
 7      "size": 1559,
 8      "annotations": {
 9        "org.opencontainers.image.ref.name": "latest"
10      }
11    }
12  ]
13}

With that, it will start importing the cache from /abc/blobs/sha256/3a897fa438c7965a7a4919f2d55e5ea578221b4686ebc38f4fb843fac19fc3ec. This however doesn’t include all old, now unreferenced blobs/layers.

That seems racy, as one worker may still be using the cache when it is (re)moved, which could lead to errors?

At least on Linux/Unix, when you open a file you get a file descriptor and can read from that descriptor even if the file is removed or overwritten with e.g. mv. This might only fail if you got a file descriptor on an old index.json and try to look up the referenced file in blobs/sha256/3a897f.. which has just been removed. However, --cache-from only warns if it can’t read the cache and then continues treating it as a cache-miss. I changed the digest to start with b10c.. for https://cirrus-ci.com/task/4945786574733312?logs=ci#L176:

0[#5](/bitcoin-bitcoin/5/) importing cache manifest from local:16451918948965588263
1[#5](/bitcoin-bitcoin/5/) ERROR: failed to configure local cache importer: NotFound: rpc error: code = NotFound desc = content sha256:b10c7fa438c7965a7a4919f2d55e5ea578221b4686ebc38f4fb843fac19fc3ec: not found
2[#6](/bitcoin-bitcoin/6/) [1/4] FROM docker.io/library/ubuntu:24.04@sha256:278628f08d4979fb9af9ead44277dbc9c92c2465922310916ad0c46ec9999295
3[#6](/bitcoin-bitcoin/6/) resolve docker.io/library/ubuntu:24.04@sha256:278628f08d4979fb9af9ead44277dbc9c92c2465922310916ad0c46ec9999295
4[#6](/bitcoin-bitcoin/6/) resolve docker.io/library/ubuntu:24.04@sha256:278628f08d4979fb9af9ead44277dbc9c92c2465922310916ad0c46ec9999295 0.9s done
5[#6](/bitcoin-bitcoin/6/) DONE 0.9s

0xB10C referenced this in commit 182892a936 on Dec 19, 2024

0xB10C referenced this in commit f708905428 on Dec 19, 2024

0xB10C referenced this in commit a14506d614 on Dec 19, 2024

0xB10C referenced this in commit 7db407223c on Dec 19, 2024

0xB10C referenced this in commit 92270d4ba5 on Dec 19, 2024

0xB10C referenced this in commit 51c93b7128 on Dec 19, 2024

0xB10C referenced this in commit 8a17ee3cf9 on Dec 19, 2024

0xB10C referenced this in commit 4149b00ae5 on Dec 20, 2024

0xB10C commented at 8:53 am on December 20, 2024: contributor

closing in favor of https://github.com/bitcoin/bitcoin/pull/31545

0xB10C closed this on Dec 20, 2024

ci: allow passing extra args to docker build with CI_IMAGE_BUILD_EXTRA_ARGS #31377

Code Coverage & Benchmarks

Reviews