CI_IMAGE_BUILD_EXTRA_ARGS
, a CI runner can influence the docker build of the CI container image. This allows to, for example, pass --cache-to
and --cache-from
to the build, which is useful for ephemeral, self-hosted CI runners.
CI_IMAGE_BUILD_EXTRA_ARGS
, a CI runner can influence the docker build of the CI container image. This allows to, for example, pass --cache-to
and --cache-from
to the build, which is useful for ephemeral, self-hosted CI runners.
By setting `CI_IMAGE_BUILD_EXTRA_ARGS`, a CI runner can influence the
docker build of the CI container image. This allows to, for example,
pass `--cache-to` and `--cache-from` to the build, which is useful
for ephemeral CI runners.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31377.
See the guideline for information on the review process. A summary of reviews will appear here.
I’ve been looking into setting up my own ephemeral CI runners. Ideally, these would cache the CI container images that we build. By setting CI_IMAGE_BUILD_EXTRA_ARGS
to --cache-to type=local,dest=/cache/docker,mode=max --cache-from type=local,src=/cache/docker
(see https://docs.docker.com/build/cache/backends/local/) I can cache the images (and reusable layers of the images) for docker build
. This is both nice for a faster build (the docker build
in the MSan task takes > 1h), but also avoids failures when e.g. the Ubuntu apt mirrors aren’t reachable.
The current CI runners don’t need this, as they aren’t ephemeral. They cache the images in their internal docker image store which is reused in the next CI task. A general alternative would be to regularly build CI docker images, push them to a registry, and (cache-from)/pull them during the build. However, this seems to be a lot more effort.
The current CI runners don’t need this, as they aren’t ephemeral.
Currently, each runner is their own user account, with their own podman image layer cache, so in theory, they’d benefit from this as well.
However, the images are quite large (especially msan), so my worry would be that after a sufficient number of builds, the workers will run out of storage, as the cache isn’t cleared like the local images are.
It occurred to me that the local cache needs to be keyed by e.g. ${CONTAINER_NAME}
since different tasks build different images. This wasn’t clear to me from the build cache documentation at first. Setting CI_IMAGE_BUILD_EXTRA_ARGS="--cache-to type=local,dest=/cache/docker/${CONTAINER_NAME}"
doesn’t work, as bash does not expand variables in variables. I’ll mark this as draft for now until I find a solution.
However, the images are quite large (especially msan), so my worry would be that after a sufficient number of builds, the workers will run out of storage, as the cache isn’t cleared like the local images are.
Yes, the local build cache dir seems to grow indefinitely as old layers aren’t cleaned up. Based on the “cache versioning” documentation, I think we wouldn’t need to keep old layers around as we wouldn’t use the digest
to reference them. A naive approach would be to use --cache-from type=local,src=/path/to/docker/build/cache/${CONTAINER_NAME}
with --cache-to type=local,src=$tmpdir
. After the build, overwrite /path/to/docker/build/cache/${CONTAINER_NAME}
with the contents of $tempdir
. Only the most recent build cache would be kept.
It occurred to me that the local cache needs to be keyed by e.g.
${CONTAINER_NAME}
since different tasks build different images.
Are you sure? The final layers should be different for each task, so should never collide. (Possibly early layers may be shared, if they happen to be the same). If this was an issue, shouldn’t the CI normally fail when running several tasks on the same machine in the same user account?
After the build, overwrite
/path/to/docker/build/cache/${CONTAINER_NAME}
with the contents of$tempdir
. Only the most recent build cache would be kept.
That seems racy, as one worker may still be using the cache when it is (re)moved, which could lead to errors?
It occurred to me that the local cache needs to be keyed by e.g.
${CONTAINER_NAME}
since different tasks build different images.Are you sure? The final layers should be different for each task, so should never collide. (Possibly early layers may be shared, if they happen to be the same). If this was an issue, shouldn’t the CI normally fail when running several tasks on the same machine in the same user account?
I’m not sure if we are talking about the same thing. I mean the docker build
cache of type=local
(https://docs.docker.com/build/cache/backends/local/) and not the final images the local docker knows about and lists when calling docker images
.
My understanding is that --cache-from type=local,dest=/abc/
will look up /abc/index.json
and read the digest. E.g.
0{
1 "schemaVersion": 2,
2 "mediaType": "application/vnd.oci.image.index.v1+json",
3 "manifests": [
4 {
5 "mediaType": "application/vnd.oci.image.index.v1+json",
6 "digest": "sha256:3a897fa438c7965a7a4919f2d55e5ea578221b4686ebc38f4fb843fac19fc3ec",
7 "size": 1559,
8 "annotations": {
9 "org.opencontainers.image.ref.name": "latest"
10 }
11 }
12 ]
13}
With that, it will start importing the cache from /abc/blobs/sha256/3a897fa438c7965a7a4919f2d55e5ea578221b4686ebc38f4fb843fac19fc3ec
. This however doesn’t include all old, now unreferenced blobs/layers.
That seems racy, as one worker may still be using the cache when it is (re)moved, which could lead to errors?
At least on Linux/Unix, when you open a file you get a file descriptor and can read from that descriptor even if the file is removed or overwritten with e.g. mv
. This might only fail if you got a file descriptor on an old index.json
and try to look up the referenced file in blobs/sha256/3a897f..
which has just been removed. However, --cache-from
only warns if it can’t read the cache and then continues treating it as a cache-miss. I changed the digest
to start with b10c..
for https://cirrus-ci.com/task/4945786574733312?logs=ci#L176:
0[#5](/bitcoin-bitcoin/5/) importing cache manifest from local:16451918948965588263
1[#5](/bitcoin-bitcoin/5/) ERROR: failed to configure local cache importer: NotFound: rpc error: code = NotFound desc = content sha256:b10c7fa438c7965a7a4919f2d55e5ea578221b4686ebc38f4fb843fac19fc3ec: not found
2[#6](/bitcoin-bitcoin/6/) [1/4] FROM docker.io/library/ubuntu:24.04@sha256:278628f08d4979fb9af9ead44277dbc9c92c2465922310916ad0c46ec9999295
3[#6](/bitcoin-bitcoin/6/) resolve docker.io/library/ubuntu:24.04@sha256:278628f08d4979fb9af9ead44277dbc9c92c2465922310916ad0c46ec9999295
4[#6](/bitcoin-bitcoin/6/) resolve docker.io/library/ubuntu:24.04@sha256:278628f08d4979fb9af9ead44277dbc9c92c2465922310916ad0c46ec9999295 0.9s done
5[#6](/bitcoin-bitcoin/6/) DONE 0.9s