This PR mitigates network issues when vcpkg downloads source tarballs by caching the entire vcpkg/downloads directory.
Closes #34996.
<!--e57a25ab6845829454e8d69fc972939a-->
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
<!--021abf342d371248e50ceaed478a90ca-->
See the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste <code><!--meta-tag:bot-skip--></code> into the comment that the bot should ignore.
<!--5faf32d7da4f0f540f40219e4f7537a3-->
The size of windows-native-dll-vcpkg-downloads is about 1.2 GB, compared to 470 MB for windows-native-dll-vcpkg-tools.
Perhaps we could exclude vcpkg/downloads/tools from the cache.
Perhaps we could exclude
vcpkg/downloads/toolsfrom the cache.
Done.
The size of
windows-native-dll-vcpkg-downloadsis about 1.2 GB...
It's 800 MB now.
Can you explain why the tools were previously cached, what they are, and why it is fine to no longer cache them?
Can you explain why the tools were previously cached...
To skip downloading and unpacking them whenever possible.
... what they are...
They are Windows binaries used for building packages for the target triplet. Here are the contents of the vcpkg/downloads/tools directory from the CI:
$ ls -l ~/AppData/Local/vcpkg/downloads/tools
total 164
drwxr-xr-x 1 runneradmin 197121 0 Apr 7 14:37 jom
drwxr-xr-x 1 runneradmin 197121 0 Apr 7 14:46 meson-1.8.2-3d2461
drwxr-xr-x 1 runneradmin 197121 0 Apr 7 14:49 msys2
drwxr-xr-x 1 runneradmin 197121 0 Apr 7 14:37 nasm
drwxr-xr-x 1 runneradmin 197121 0 Apr 7 14:29 ninja
drwxr-xr-x 1 runneradmin 197121 0 Apr 7 14:37 perl
drwxr-xr-x 1 runneradmin 197121 0 Apr 7 14:29 powershell-core-7.5.4-windows
drwxr-xr-x 1 runneradmin 197121 0 Apr 7 14:46 python
drwxr-xr-x 1 runneradmin 197121 0 Apr 7 14:46 win_bison
... and why it is fine to no longer cache them?
It is a tradeoff. The downloaded tool archives are now cached along with the package source tarballs. However, vcpkg has to unpack them on every run.
262 | - path: ~/AppData/Local/vcpkg/downloads/tools 263 | - key: ${{ github.job }}-vcpkg-tools-${{ github.run_id }} 264 | - restore-keys: ${{ github.job }}-vcpkg-tools- 265 | + path: | 266 | + ~/AppData/Local/vcpkg/downloads/* 267 | + !~/AppData/Local/vcpkg/downloads/tools
Is there a need to set a path on a download? Wouldn't it be easier to just set the correct path on upload only?
Yes, there is a need. The path input participates in cache entity identification. Therefore, a pair of actions/cache/restore and actions/cache/save must have the exact same path configured, or it will result in a cache miss.
291 | with: 292 | - path: ~/AppData/Local/vcpkg/downloads/tools 293 | - key: ${{ github.job }}-vcpkg-tools-${{ github.run_id }} 294 | + path: | 295 | + ~/AppData/Local/vcpkg/downloads/* 296 | + !~/AppData/Local/vcpkg/downloads/tools
!~/AppData/Local/vcpkg/downloads/tools # Cache the tools once as archives, but not redundantly in extracted form.
nit: Maybe add a comment here, as per my question and your answer?
Thanks! The comment has been added.
This avoids code duplication and improves readability.
The new cache is keyed with the hash of 'vcpkg.json', which reduces
cache storage consumption compared to keying by run ID.
The `vcpkg/downloads/tools` subdirectory is excluded to further save
space.
210 | @@ -211,7 +211,7 @@ jobs: 211 | with: 212 | path: ${{ env.CCACHE_DIR }} 213 | # https://github.com/actions/cache/blob/main/tips-and-workarounds.md#update-a-cache 214 | - key: ${{ github.job }}-${{ matrix.job-type }}-ccache-${{ github.run_id }} 215 | + key: ${{ steps.ccache-cache.outputs.cache-primary-key }}
you claim this is a refactor, but it would be good to link to the relevant docs, so that reviewers can confirm this.
to me this looks like dropping the run_id, making it impossible to save a new cache?
you claim this is a refactor, but it would be good to link to the relevant docs, so that reviewers can confirm this.
Sure.
From the docs:
cache-primary-key- Cache primary key passed in the input to use in subsequent steps of the workflow.
to me this looks like dropping the run_id, making it impossible to save a new cache?
It now refers to:https://github.com/bitcoin/bitcoin/blob/dc9309108381cd5704b8e9eedc931bd5337dad78/.github/workflows/ci.yml#L192
If you think cache-primary-key is confusing, Iām happy to drop that commit.
Note for Maintainers: To properly populate the new CI cache, all current vcpkg binary caches must be cleared to trigger a rebuild.
Would it not be easier and clearer to pick a new name for the new cache?
Note for Maintainers: To properly populate the new CI cache, all current vcpkg binary caches must be cleared to trigger a rebuild.
Would it not be easier and clearer to pick a new name for the new cache?
Thanks! Done.
review ACK 09c0e3778998eece67a908711efdac44ed3d3630 š¢
<details><summary>Show signature</summary>
Signature:
untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: review ACK 09c0e3778998eece67a908711efdac44ed3d3630 š¢
RRwSVexeMqLj/eb1FyUZqhz3Iay3uPWRPi6Mxp2/s7OqYsgtSvDZD2Hw3w1MlR3B74xykHEvLttPcuni2M1VCw==
</details>
utACK 09c0e3778998eece67a908711efdac44ed3d3630
Hmm, the windows-native-dll-vcpkg-downloads cache was successfully saved yesterday:
Cache saved with key: windows-native-dll-vcpkg-downloads-10b54c5a23d7f316a997f1ffa95a48c85318ffb3895a9a21dceef6021fabc38e
But today it is not available:
Cache not found for input keys: windows-native-dll-vcpkg-downloads-10b54c5a23d7f316a997f1ffa95a48c85318ffb3895a9a21dceef6021fabc38e
Not sure what the root cause is here, but assuming eviction/thrashing.
If so we could try increasing cache size: https://docs.github.com/en/actions/reference/workflows-and-actions/dependency-caching#increasing-cache-size ?
Not sure what the root cause is here, but assuming eviction/thrashing.
Yeah, I presume the macos task is fast enough to build twice when other tasks are not yet done, so it can easily eat 2 * 2 * 2 GB = 8 GB with ccache data out of the 10 GB. Another ccache from the arm task will already overflow the limit, without even giving any space for the Windows cache.
So I guess the cache needs to be bumped, or #34474 reverted for the GHA runners?
Opened #35176 one approach to fix the cache churn we see here.