ci: Only write docker build images to Cirrus cache #33639

pull maflcko wants to merge 3 commits into bitcoin:master from maflcko:2510-ci-rework-cache-providers changing 3 files +23 −26
  1. maflcko commented at 9:37 am on October 16, 2025: member

    The DOCKER_BUILD_CACHE_ARG env var holds the options on how to use cache providers. Storing the image layers is useful for the Cirrus cache provider, because it offers 10GB per runner (https://cirrus-runners.app/setup/#speeding-up-the-cache). The cached image layers can help to avoid issues when the upstream package manager infra (apt native, apt llvm, pip, apk, git clone, …) has outages or network issues.

    However, on the GitHub Actions cache provider, a total cache of 10GB is offered for the whole repo. This cache must be shared with the depends cache, and the ccache, as well as the previous releases cache. So it is already full and trying to put the docker build layers into it will lead to an overflow.

    Fix it by only writing to the docker cache on Cirrus.

    Also, DOCKER_BUILD_CACHE_ARG requires a shellcheck disable=SC2086 on the full build command. Fix that as well by using shlex.split from Python on just this variable.

  2. DrahtBot renamed this:
    ci: Only write docker build images to Cirrus cache
    ci: Only write docker build images to Cirrus cache
    on Oct 16, 2025
  3. DrahtBot added the label Tests on Oct 16, 2025
  4. DrahtBot commented at 9:37 am on October 16, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33639.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK l0rinc, m3dwards, cedwies, willcl-ark

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

    LLM Linter (✨ experimental)

    Possible typos and grammar issues:

    • Building {os.environ[‘CONTAINER_NAME’]} image tag to run in -> Building {os.environ[‘CONTAINER_NAME’]} image to run [original phrasing is incomplete/ambiguous ("…tag to run in" leaves the phrase “in” without an object), so remove “tag” and “in” to make the intent clear]

    drahtbot_id_5_m

  5. fanquake requested review from willcl-ark on Oct 16, 2025
  6. fanquake requested review from m3dwards on Oct 16, 2025
  7. in ci/test/02_run_container.py:64 in fa099b23f8
    57@@ -45,6 +58,30 @@ def main():
    58                 file.write(f"{k}={v}\n")
    59     run(["cat", env_file])
    60 
    61+    if not os.getenv("DANGER_RUN_CI_ON_HOST"):
    62+        CI_IMAGE_LABEL = "bitcoin-ci-test"
    63+
    64+        maybe_cache_arg = shlex.split(os.getenv('DOCKER_BUILD_CACHE_ARG'))
    


    cedwies commented at 9:34 pm on October 16, 2025:

    What if we run this outside CI (where env var might not exist)? In that case os.getenv(...) returns None and shlex.split(None) throws TypeError.

    Would

    0maybe_cache_arg = shlex.split(os.getenv('DOCKER_BUILD_CACHE_ARG', ''))
    

    make sense here? CI behavior is identical, but if it’s missing locally we just get an empty list (no cache flags) instead of crashing. Or would you rather keep CI-only usability?


    maflcko commented at 8:35 am on October 17, 2025:
    Ah, very nice catch. I had considered this case, but forgot to set the empty string before pushing the branch. Fixed now and pushed.
  8. in ci/test/02_run_container.py:29 in fa099b23f8 outdated
    24+    # approximation to respect $MAKEJOBS somewhat, if cpuset is available.
    25+    if os.getenv("HAVE_CGROUP_CPUSET"):
    26+        P = int(run(['nproc'], stdout=subprocess.PIPE).stdout)
    27+        M = min(P, int(os.getenv("MAKEJOBS").lstrip("-j")))
    28+        cpus = sorted(random.sample(range(P), M))
    29+        return [f"--cpuset-cpus={','.join(map(str, cpus))}"]
    


    cedwies commented at 9:58 pm on October 16, 2025:

    we compute M from MAKEJOBS and then always emit a cpuset. CI normally uses -jN with N > 0, but for edge/local setups, if MAKEJOBS ends up as -j0 (or missing/malformed?), M can be 0 -> cpus is [] → we emit --cpuset-cpus= (empty). Would it be ok to skip the flag when M <= 0 (and optionally guard parsing)?

     0def maybe_cpuset():
     1    if os.getenv("HAVE_CGROUP_CPUSET"):
     2        P = int(run(['nproc'], stdout=subprocess.PIPE).stdout)
     3        mj = os.getenv("MAKEJOBS", "-j1")  # maybe a default for local runs?
     4        try:
     5            M = min(P, int(mj.lstrip("-j")))
     6        except (TypeError, ValueError):
     7            return []  # bad/missing MAKEJOBS -> just skip?
     8        if M > 0:
     9            cpus = sorted(random.sample(range(P), M))
    10            return [f"--cpuset-cpus={','.join(map(str, cpus))}"]
    11    return []
    

    This does not change normal CI behavior but just avoids an odd empty cpuset in edge cases. If you prefer strictness, we could alternatively sys.exit instead?


    maflcko commented at 8:35 am on October 17, 2025:

    I don’t think MAKEJOBS=-j0 was ever supported by the CI. It will eventually be passed down to the test runner, which fails:

    0test/functional/test_runner.py", line 714, in __init__
    1    assert num_tests_parallel >= 1
    2           ^^^^^^^^^^^^^^^^^^^^^^^
    3AssertionError
    

    So I think I’ll leave this as-is for now, to keep the code move-only, as the goal is not to change it. It can be changed in a separate PR, if there is need.

    Just as a hint: If you want to use nproc, with the current CI system, you can just omit all settings, because it is picked by default:

    https://github.com/bitcoin/bitcoin/blob/e744fd1249bf9577274614eaf3997bf4bbb612ff/ci/test/00_setup_env.sh#L37-L38

  9. in ci/test/02_run_container.sh:36 in fa099b23f8 outdated
    31-      --platform="${CI_IMAGE_PLATFORM}" \
    32-      --label="${CI_IMAGE_LABEL}" \
    33-      --tag="${CONTAINER_NAME}" \
    34-      $DOCKER_BUILD_CACHE_ARG \
    35-      "${BASE_READ_ONLY_DIR}"
    36-
    


    cedwies commented at 10:09 pm on October 16, 2025:
    Maybe a short comment is helpful for future maintainers? # The Docker build step has moved into 02_run_container.py (see PR [#33639](/bitcoin-bitcoin/33639/))

    maflcko commented at 8:33 am on October 17, 2025:
    I think it is fine without comment, because the files are numbered in order and not too many, so one can just read/grep them all to search what they are looking for.
  10. cedwies commented at 10:10 pm on October 16, 2025: none
    Approach ACK
  11. maflcko force-pushed on Oct 17, 2025
  12. cedwies commented at 1:12 pm on October 20, 2025: none
    utACK fa29ecd Reviewed the code and the new CI flow. The change to limit cache writes to Cirrus (while still reading cache on GHA) and moving the build step into Python with shlex.split(os.getenv("DOCKER_BUILD_CACHE_ARG", "")) looks good to me. I don’t see any unintended behavior changes beyond what is described in the PR.
  13. in ci/test/02_run_container.py:28 in fa29ecd14b
    23+    # $MAKEJOBS is ignored in the build process. Use --cpuset-cpus as an
    24+    # approximation to respect $MAKEJOBS somewhat, if cpuset is available.
    25+    if os.getenv("HAVE_CGROUP_CPUSET"):
    26+        P = int(run(["nproc"], stdout=subprocess.PIPE).stdout)
    27+        M = min(P, int(os.getenv("MAKEJOBS").lstrip("-j")))
    28+        cpus = sorted(random.sample(range(P), M))
    


    l0rinc commented at 12:30 pm on October 21, 2025:
    hmmm, not sure I get why we need this randomness here, is it to allow multiple runs without synchronization?

    maflcko commented at 1:38 pm on October 21, 2025:

    Yeah, it is a bit undocumented, and currently unused after the CI migration. Removed it for now. If there is ever need in the future to re-add it, it can be done properly. For future reference, the current state was:

    0def maybe_cpuset():
    1    # Env vars during the build can not be changed. For example, a modified
    2    # $MAKEJOBS is ignored in the build process. Use --cpuset-cpus as an
    3    # approximation to respect $MAKEJOBS somewhat, if cpuset is available.
    4    if os.getenv("HAVE_CGROUP_CPUSET"):
    5        P = int(run(["nproc"], stdout=subprocess.PIPE).stdout)
    6        M = min(P, int(os.getenv("MAKEJOBS").lstrip("-j")))
    7        cpus = sorted(random.sample(range(P), M))
    8        return [f"--cpuset-cpus={','.join(map(str, cpus))}"]
    9    return []
    
  14. in ci/test/02_run_container.py:22 in fa29ecd14b
    17@@ -17,6 +18,18 @@ def run(cmd, **kwargs):
    18         sys.exit(e)
    19 
    20 
    21+def maybe_cpuset():
    22+    # Env vars during the build can not be changed. For example, a modified
    


    l0rinc commented at 12:32 pm on October 21, 2025:
    Can’t we use Docker ARG to achieve the same result, or is that something different? The comment mentions env vars can’t be changed during build, but ARG can be set at build time.

    maflcko commented at 1:38 pm on October 21, 2025:
    (removed)
  15. in ci/test/02_run_container.py:78 in fa29ecd14b
    73+            f"--build-arg=BASE_ROOT_DIR={os.getenv('BASE_ROOT_DIR')}",
    74+            f"--platform={os.getenv('CI_IMAGE_PLATFORM')}",
    75+            f"--label={CI_IMAGE_LABEL}",
    76+            f"--tag={os.getenv('CONTAINER_NAME')}",
    77+        ]
    78+        cmd_build += maybe_cpuset()
    


    l0rinc commented at 12:39 pm on October 21, 2025:
    I assume that changing the MAYBE_CPUSET position is irrelevant, right?

    maflcko commented at 1:37 pm on October 21, 2025:
    (yes, it should be irrelevant, but removed)
  16. in ci/test/02_run_container.py:79 in fa29ecd14b
    74+            f"--platform={os.getenv('CI_IMAGE_PLATFORM')}",
    75+            f"--label={CI_IMAGE_LABEL}",
    76+            f"--tag={os.getenv('CONTAINER_NAME')}",
    77+        ]
    78+        cmd_build += maybe_cpuset()
    79+        cmd_build += maybe_cache_arg
    


    l0rinc commented at 12:39 pm on October 21, 2025:
    for symmetry can we inline + unpack this?

    maflcko commented at 1:37 pm on October 21, 2025:
    thx, inlined
  17. in ci/test/02_run_container.py:82 in fa29ecd14b
    77+        ]
    78+        cmd_build += maybe_cpuset()
    79+        cmd_build += maybe_cache_arg
    80+        cmd_build += [os.getenv("BASE_READ_ONLY_DIR")]
    81+
    82+        print(f"Building {os.getenv('CONTAINER_NAME')} image tag to run in")
    


    l0rinc commented at 12:43 pm on October 21, 2025:
    nit: we were logging CI_IMAGE_NAME_TAG before (just resolve if that’s intentional)

    maflcko commented at 1:37 pm on October 21, 2025:
    Yes, CI_IMAGE_NAME_TAG is just a general ubuntu:24.04. I think logging the more specific name, like ci_native_tidy seems useful.
  18. in ci/test/02_run_container.py:67 in faeb36d5d2 outdated
    78+        cmd_build += maybe_cpuset()
    79+        cmd_build += maybe_cache_arg
    80+        cmd_build += [os.getenv("BASE_READ_ONLY_DIR")]
    81+
    82+        print(f"Building {os.getenv('CONTAINER_NAME')} image tag to run in")
    83+        run(cmd_build)
    


    l0rinc commented at 12:45 pm on October 21, 2025:

    Can we inline cmd_build?

     0run(["docker", "buildx", "build",
     1    f"--file={os.environ['BASE_READ_ONLY_DIR']}/ci/test_imagefile",
     2    f"--build-arg=CI_IMAGE_NAME_TAG={os.environ['CI_IMAGE_NAME_TAG']}",
     3    f"--build-arg=FILE_ENV={os.environ['FILE_ENV']}",
     4    f"--build-arg=BASE_ROOT_DIR={os.environ['BASE_ROOT_DIR']}",
     5    f"--platform={os.environ['CI_IMAGE_PLATFORM']}",
     6    f"--label={CI_IMAGE_LABEL}",
     7    f"--tag={os.environ['CONTAINER_NAME']}",
     8    *maybe_cpuset(),
     9    *shlex.split(os.getenv("DOCKER_BUILD_CACHE_ARG", "")),
    10    os.environ["BASE_READ_ONLY_DIR"]
    11])
    

    maflcko commented at 1:37 pm on October 21, 2025:
    No, a follow-up requires it to be separate :)

    l0rinc commented at 2:11 pm on October 21, 2025:
    In the followup please consider avoiding concats (e.g. via *list), as shown above
  19. in ci/test/02_run_container.py:25 in faeb36d5d2 outdated
    17@@ -17,6 +18,18 @@ def run(cmd, **kwargs):
    18         sys.exit(e)
    19 
    20 
    21+def maybe_cpuset():
    22+    # Env vars during the build can not be changed. For example, a modified
    23+    # $MAKEJOBS is ignored in the build process. Use --cpuset-cpus as an
    24+    # approximation to respect $MAKEJOBS somewhat, if cpuset is available.
    25+    if os.getenv("HAVE_CGROUP_CPUSET"):
    


    l0rinc commented at 12:51 pm on October 21, 2025:

    Do we still need this? My understanding is that we’re not really using this - can we delete and add back when there’s a well-defined usecase for it?

    If we do, we might want to add MAKEJOBS guard as well since that is also needed


    maflcko commented at 1:37 pm on October 21, 2025:
    (removed)
  20. in .github/actions/configure-docker/action.yml:52 in fa29ecd14b outdated
    47@@ -48,8 +48,9 @@ runs:
    48         # Always optimistically --cache‑from in case a cache blob exists
    49         args=(--cache-from "type=gha${url_args:+,${url_args}},scope=${CONTAINER_NAME}")
    50 
    51-        # If this is a push to the default branch, also add --cache‑to to save the cache
    52-        if [[ ${{ github.event_name }} == "push" && ${{ github.ref_name }} == ${{ github.event.repository.default_branch }} ]]; then
    53+        # Only when using the Cirrus cache provider and this is a push to the default
    54+        # branch, add --cache-to to save the cache.
    55+        if [[ ${{ inputs.cache-provider }} == 'cirrus' && ${{ github.event_name }} == "push" && ${{ github.ref_name }} == ${{ github.event.repository.default_branch }} ]]; then
    


    l0rinc commented at 12:52 pm on October 21, 2025:
    slightly tangential, but maybe there’s room for optimizing the docker layers by putting the variable parts as the last layers, using ARGs over ENVs, combining multiple layers, pinning versions and paths and variables early, pruning old layers, multi-stage builds, etc.

    maflcko commented at 1:37 pm on October 21, 2025:
    In theory yes, but in practise it does not work due to a docker bug. Also, I don’t think it is worth it in practise, see #33620
  21. in ci/test/02_run_container.py:64 in fa29ecd14b
    57@@ -45,6 +58,30 @@ def main():
    58                 file.write(f"{k}={v}\n")
    59     run(["cat", env_file])
    60 
    61+    if not os.getenv("DANGER_RUN_CI_ON_HOST"):
    62+        CI_IMAGE_LABEL = "bitcoin-ci-test"
    63+
    64+        maybe_cache_arg = shlex.split(os.getenv("DOCKER_BUILD_CACHE_ARG", ""))
    


    l0rinc commented at 12:57 pm on October 21, 2025:
    Is this related to the mentioned https://www.shellcheck.net/wiki/SC2086? Because we need split to handle quotes and escaping properly?

    maflcko commented at 1:37 pm on October 21, 2025:
    Yes, exactly, see also the commit message :)
  22. in ci/test/02_run_container.py:29 in fa29ecd14b
    24+    # approximation to respect $MAKEJOBS somewhat, if cpuset is available.
    25+    if os.getenv("HAVE_CGROUP_CPUSET"):
    26+        P = int(run(["nproc"], stdout=subprocess.PIPE).stdout)
    27+        M = min(P, int(os.getenv("MAKEJOBS").lstrip("-j")))
    28+        cpus = sorted(random.sample(range(P), M))
    29+        return [f"--cpuset-cpus={','.join(map(str, cpus))}"]
    


    l0rinc commented at 12:59 pm on October 21, 2025:
    I don’t yet understand why we’re pinning CPUs here, what problem is that solving that --cpus= wouldn’t? The comment is a bit misleading since my understanding is that --cpuset-cpus doesn’t actually limit CPU usage.

    maflcko commented at 1:37 pm on October 21, 2025:
    (removed)
  23. in .github/actions/configure-docker/action.yml:52 in fa29ecd14b
    47@@ -48,8 +48,9 @@ runs:
    48         # Always optimistically --cache‑from in case a cache blob exists
    49         args=(--cache-from "type=gha${url_args:+,${url_args}},scope=${CONTAINER_NAME}")
    50 
    51-        # If this is a push to the default branch, also add --cache‑to to save the cache
    52-        if [[ ${{ github.event_name }} == "push" && ${{ github.ref_name }} == ${{ github.event.repository.default_branch }} ]]; then
    53+        # Only when using the Cirrus cache provider and this is a push to the default
    54+        # branch, add --cache-to to save the cache.
    


    l0rinc commented at 1:03 pm on October 21, 2025:
    0        # Only add --cache-to when using the Cirrus cache provider and pushing to the default branch.
    

    maflcko commented at 1:37 pm on October 21, 2025:
    thx, took your doc comment
  24. in ci/test/02_run_container.py:70 in fa29ecd14b
    65+
    66+        # Use buildx unconditionally
    67+        # Using buildx is required to properly load the correct driver, for use with registry caching. Neither build, nor BUILDKIT=1 currently do this properly
    68+        cmd_build = ["docker", "buildx", "build"]
    69+        cmd_build += [
    70+            f"--file={os.getenv('BASE_READ_ONLY_DIR')}/ci/test_imagefile",
    


    l0rinc commented at 1:07 pm on October 21, 2025:
    Doing os.environ['BASE_READ_ONLY_DIR'] would rather raise a KeyError and the failure pinpoints the source of failure better

    maflcko commented at 1:36 pm on October 21, 2025:
    Ah nice, changed all
  25. l0rinc approved
  26. l0rinc commented at 1:16 pm on October 21, 2025: contributor
    Concept ACK, I understand that you already have an ACK but please consider my suggestions
  27. ci: Remove unused MAYBE_CPUSET
    The option is currently unused. If it is used again in the future, it
    could trivially be added back.
    
    Also, the logic is just a single undocumented python command one-liner.
    
    So remove it for now.
    fa72a2bd5c
  28. maflcko force-pushed on Oct 21, 2025
  29. ci: Move buildx command to python script
    This has a few benefits:
    
    * The shellcheck SC2086 warning is disabled for the whole command, but
      is only needed for the DOCKER_BUILD_CACHE_ARG env var.  So in Python,
      only pass this one env var to shlex.split() for proper word splitting.
    * Future logic improvements can be implemented in Python.
    
    The comments are moved, which can be checked via the git options:
    --color-moved=dimmed-zebra --color-moved-ws=ignore-all-space
    fab64a5d6f
  30. ci: Only write docker build images to Cirrus cache
    Other cache providers offer too little space for this to be useful.
    fabe0e07de
  31. maflcko force-pushed on Oct 21, 2025
  32. l0rinc approved
  33. l0rinc commented at 2:14 pm on October 21, 2025: contributor

    Code review ACK fabe0e07de1ad2f26da62f3ebe0e9be3f939b1f8

    Let’s see how much slowdown this would cause without caching.

  34. DrahtBot requested review from cedwies on Oct 21, 2025
  35. maflcko commented at 2:22 pm on October 21, 2025: member

    Let’s see how much slowdown this would cause without caching.

    I don’t think this has any measurable effect on the runtime. The only benefit could be avoiding intermittent network errors.

  36. m3dwards commented at 8:28 am on October 22, 2025: contributor

    ACK fabe0e07de1ad2f26da62f3ebe0e9be3f939b1f8

    This will save a lot of space and help to prevent the other caches from getting evicted which are much more important than the docker build layer caches. Should only slow down the jobs by a few minutes at most and hopefully be a net positive as the other more important caches (like ccache) will not have been evicted.

    Ran on master on my own fork.

  37. m3dwards approved
  38. cedwies commented at 9:57 am on October 22, 2025: none
    reACK fabe0e0
  39. willcl-ark approved
  40. willcl-ark commented at 10:08 am on October 22, 2025: member

    ACK fabe0e07de1ad2f26da62f3ebe0e9be3f939b1f8

    Agree with the rationale here and the changes look correct to me. There is a run on-going on a fork (non Cirrus runners) at https://github.com/m3dwards/bitcoin/actions/runs/18709746522

    …and the caches in that repo do not contain docker build images as intended.

  41. fanquake merged this on Oct 22, 2025
  42. fanquake closed this on Oct 22, 2025

  43. maflcko deleted the branch on Oct 22, 2025
  44. maflcko commented at 10:57 am on October 23, 2025: member

    Hmm, so I did a fresh run in a repo with a clean GHA cache, and it now says:

    Approaching total cache storage limit (8.89 GB of 10 GB Used)

    So I guess we are still close to overlapping the cache and certainly will reach cache churn with more than one push to the default branch.


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-11-03 06:13 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me