ci: cache gnu32 nix store #256

pull Sjors wants to merge 1 commit into bitcoin-core:master from Sjors:2026/03/nix-cache, changing 4 files (+62 −2)
  1. Sjors commented at 1:33 pm on March 12, 2026: member

    gnu32 is by far the slowest CI job, and it spends most of its time building Nix stuff.

    Caching the Nix store drops subsequent runs to just 3 minutes.

    Not caching the other jobs, because the cache is quite large and GitHub limits us to 10 GB total.

    Using https://github.com/nix-community/cache-nix-action

    Closes #254

  2. DrahtBot commented at 1:33 pm on March 12, 2026: none

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    ACK ryanofsky

    If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

  3. Sjors force-pushed on Mar 12, 2026
  4. Sjors commented at 1:42 pm on March 12, 2026: member
    @ryanofsky can you whitelist nix-community/cache-nix-action@v7?
  5. ryanofsky closed this on Mar 12, 2026

  6. ryanofsky reopened this on Mar 12, 2026

  7. ryanofsky commented at 6:21 pm on March 12, 2026: collaborator

    @ryanofsky can you whitelist nix-community/cache-nix-action@v7?

    Thanks, added. Sorry for the delay. If there is a way we could give you GitHub permissions to change these settings, that would be nice too.

  8. Sjors commented at 6:43 pm on March 12, 2026: member
    There’s no rush and I don’t expect to have to make many such changes. Basic ccache and more tailored nix cache should do the trick.
  9. Sjors commented at 7:21 pm on March 12, 2026: member

    About 800 MB, not too bad.

    Runtime was 60 minutes. I’m going to amend the commit (date) to trigger a re-run which should use this cache…

  10. Sjors force-pushed on Mar 12, 2026
  11. Sjors force-pushed on Mar 12, 2026
  12. Sjors commented at 7:54 pm on March 12, 2026: member

    Looks like the cache got trimmed on the first write. Trying again.

    This push to 34da50f8499f51179a08a35777fa884b61f2d13b is not expected to be substantially faster, the next push should be.

  13. Sjors commented at 8:35 pm on March 12, 2026: member
    Despite the incomplete cache, it ran in only 30 minutes. The new cache entry is 1.7 GB. Let’s run it again…
  14. Sjors force-pushed on Mar 12, 2026
  15. in .github/workflows/ci.yml:170 in 6dc9700245
    167+      # incomplete cache while restores still match on the stable prefix.
    168+      - name: Cache Nix store
    169+        if: matrix.config == 'gnu32'
    170+        uses: nix-community/cache-nix-action@v7
    171+        with:
    172+          primary-key: nix-${{ runner.os }}-${{ matrix.config }}-${{ env.NIXPKGS_CHANNEL }}-${{ hashFiles('shell.nix', 'ci/patches/*.patch', 'ci/configs/gnu32.bash') }}-${{ github.sha }}
    


    Sjors commented at 8:39 pm on March 12, 2026:
    It might be better to drop -${{ github.sha }} so our cache won’t get flushed as often.
  16. Sjors commented at 8:41 pm on March 12, 2026: member
    Down to 3 minutes, nice!
  17. Sjors marked this as ready for review on Mar 12, 2026
  18. Sjors force-pushed on Mar 12, 2026
  19. Sjors commented at 9:14 pm on March 12, 2026: member

    Trying a different caching approach: keying on the latest Nix channel commit, which doesn’t change that often for nixos-25.05. It should now save the cache only once.

    If it can’t get an exact match it will try a broader match, so IIUC even if something in nix changes from under us, we might still get some benefit out of earlier cache entries. Once the build succeeds, it produces a fresh one that’s then used in subsequent runs.
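The "exact match first, then broader match" behaviour described above can be sketched in bash. This is a hypothetical illustration, not cache-nix-action's actual implementation: `find_cache_key` is an invented name, and it assumes the existing cache keys are passed newest first so that the first hit for a prefix is the most recent entry.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an "exact key first, then broader prefixes" cache
# lookup. Not cache-nix-action's real implementation.
# Usage: find_cache_key PRIMARY_KEY "PREFIX1 PREFIX2 ..." KEY...
# Assumption: KEY... lists the existing cache keys, newest first.
find_cache_key() {
  local primary=$1 prefixes=$2
  shift 2
  local key prefix

  # 1. An exact match on the primary key wins outright.
  for key in "$@"; do
    if [[ "$key" == "$primary" ]]; then
      echo "$key"
      return 0
    fi
  done

  # 2. Otherwise try each prefix in order (most specific first) and return
  #    the first, i.e. newest, key that starts with it.
  for prefix in $prefixes; do
    for key in "$@"; do
      if [[ "$key" == "$prefix"* ]]; then
        echo "$key"
        return 0
      fi
    done
  done

  # 3. Nothing usable: no cache is restored.
  return 1
}
```

For example, `find_cache_key nix-Linux-gnu32-aaa-new "nix-Linux-gnu32-aaa- nix-Linux-gnu32- nix-Linux-" nix-Linux-gnu32-bbb-1 nix-Linux-gnu32-aaa-old` prints `nix-Linux-gnu32-aaa-old`: trying the most specific prefix first prefers an entry built from the same channel commit over a newer but less related one.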

  20. Sjors force-pushed on Mar 12, 2026
  21. Sjors force-pushed on Mar 12, 2026
  22. Sjors marked this as a draft on Mar 12, 2026
  23. Sjors force-pushed on Mar 12, 2026
  24. Sjors commented at 9:35 pm on March 12, 2026: member

    Alright, 76d3e51fdd165654b3fec778971b1e4b6796f901 created a new entry nix-Linux-gnu32-ac62194c3917d5f474c1a844b6fd6da2db95077d-67a1dfa5824ce68902b067211c47c342ba705ad71d773f96e6885ba6d3defc3e which the next run should reuse and not resave.

    Pushed an amended date commit 6562d22a6dd6abc66a1a3191934240d4f7a0d0fc to test that.

  25. Sjors commented at 9:39 pm on March 12, 2026: member

    As expected, the last gnu32 CI run found a match for the primary key and did not re-save it.

    https://github.com/bitcoin-core/libmultiprocess/actions/runs/23025035895/job/66870747483?pr=256

    Runtime slightly under 3 minutes.

    So this should be good to go now.

  26. Sjors marked this as ready for review on Mar 12, 2026
  27. in .github/workflows/ci.yml:192 in 6562d22a6d
    189           CI_CONFIG: ci/configs/${{ matrix.config }}.bash
    190         run: ci/scripts/run.sh
    191+
    192+      # Use an explicit save step instead of the action post-step so we only
    193+      # archive the store after the build succeeded and the shell closure is
    194+      # rooted against the save-time garbage collection pass.
    


    Sjors commented at 8:52 am on March 13, 2026:
    I have no idea what “shell closure is rooted against the save-time garbage collection pass” means, so it would be good for someone who knows Nix to sanity check this.

    ryanofsky commented at 6:03 pm on March 23, 2026:

    I have no idea what “shell closure is rooted against the save-time garbage collection pass” means, so it would be good for someone who knows Nix to sanity check this.

    It makes sense, but is a confusing comment. Would suggest splitting it up like “Use an explicit save step instead of the action post-step so we only save after a successful build. Create a GC root for the gnu32 shell closure here, so the cache-nix-action step below does not garbage-collect it.”

  28. ryanofsky approved
  29. ryanofsky commented at 6:31 pm on March 23, 2026: collaborator

    Code review ACK 6562d22a6dd6abc66a1a3191934240d4f7a0d0fc. Nice to speed up this very slow job.

    I am happy to merge as-is, but it would be nice not to hardcode this logic for the gnu32 job, and also not to have such complicated bash commands in the YAML. Ideally, I think this would do something like:

    • Add CI_CACHE_STORE=true in ci/configs/gnu32.bash
    • Add a ci/scripts/config.sh script and build step that sources the config and outputs whatever variables the yml file needs to do caching
    • Add command to ci/scripts/run.sh to create the .nix-gc-roots link needed to prevent garbage collection.

    Also from https://github.com/nix-community/cache-nix-action/#a-typical-job it seems we don’t need two separate cache-nix-action steps before and after the build. Just a single step before the build should be sufficient if the goal is to only save the cache on success.
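One possible shape for the suggested `ci/scripts/config.sh`, as a bash sketch. It assumes the config file sets `CI_CACHE_NIX_STORE` (the name the PR ends up using; the bullet above says `CI_CACHE_STORE`) and uses GitHub Actions' `$GITHUB_OUTPUT` file for step outputs. The function name `emit_ci_outputs` and the output name `cache_nix_store` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the suggested ci/scripts/config.sh: source a CI
# config and expose the values the workflow YAML needs as step outputs.
set -o errexit -o nounset -o pipefail

# Usage: emit_ci_outputs CONFIG_FILE   (appends to "$GITHUB_OUTPUT")
emit_ci_outputs() {
  local config=$1
  # Source the config in a subshell so its variables stay contained.
  (
    # shellcheck source=/dev/null
    source "$config"
    # Any non-empty value enables caching, matching how CI_CLEAN is treated.
    if [ -n "${CI_CACHE_NIX_STORE-}" ]; then
      echo "cache_nix_store=true"
    else
      echo "cache_nix_store=false"
    fi
  ) >> "${GITHUB_OUTPUT:?GITHUB_OUTPUT must be set}"
}
```

A workflow step (say with `id: config`) could run `emit_ci_outputs "$CI_CONFIG"`, and the cache step could then use `if: steps.config.outputs.cache_nix_store == 'true'` instead of hardcoding `matrix.config == 'gnu32'`.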

  30. Sjors commented at 10:57 am on March 24, 2026: member

    not have such complicated bash commands in the yaml

    If you merge #253 first, I’ll rebase this to take advantage of the bash scripts introduced there.

  31. Sjors marked this as a draft on Mar 24, 2026
  32. Sjors force-pushed on Mar 25, 2026
  33. Sjors commented at 6:52 pm on March 25, 2026: member

    TSan tripped over: AssertionError: [node 0] Expected message(s) ['canceled while executing'] not found in log while “Running disconnect during BlockTemplate.waitNext”: https://github.com/bitcoin-core/libmultiprocess/actions/runs/23555980083/job/68582763611?pr=256#step:21:2025

    Probably need to bump the TEST_RUNNER_TIMEOUT_FACTOR.

    If the gnu32 job works (so far so good), I’ll push a comment improvement to see if it picks up the cache.

  34. Sjors force-pushed on Mar 25, 2026
  35. Sjors marked this as ready for review on Mar 25, 2026
  36. Sjors commented at 7:11 pm on March 25, 2026: member

    From one hour to two minutes, nice.


    The ASan failure looks like another timeout, see #263.

    Expected message(s) ['canceled while executing'] not found in log
    

    Though the tail of the error looks more worrying:

     node0 stderr /usr/include/capnp/capability.h:1129:16: runtime error: member call on address 0x511000092940 which does not point to an object of type 'capnp::CallContextHook'
    0x511000092940: note: object has invalid vptr
     02 00 00 00  6d 7f 00 00 02 00 00 00  20 35 1f 06 6b 7f 00 00  00 00 00 00 be be be be  88 18 06 00
                  ^~~~~~~~~~~~~~~~~~~~~~~
                  invalid vptr
    

    Or is that in the Python framework?

    https://github.com/bitcoin-core/libmultiprocess/actions/runs/23559019643/job/68593574084?pr=256

  37. in .github/workflows/ci.yml:170 in 293b0e0f19
    167+          extra_nix_config: |
    168+            ${{ env.NIX_EXTRA_CONFIG }}
    169+            ${{ github.actor == 'nektos/act' && env.NIX_EXTRA_CONFIG_ACT || '' }}
    170+
    171+      # Cache the heaviest Nix job to stay within GitHub's cache budget while
    172+      # still avoiding repeated gnu32 cross-toolchain downloads and builds.
    


    ryanofsky commented at 9:41 pm on March 25, 2026:

    In commit “ci: cache gnu32 nix store” (293b0e0f197e20e74f729322d0c2e5b5fe4172d8)

    Probably should just drop this comment since there is no longer any code here referencing gnu32.


    Sjors commented at 7:23 am on March 26, 2026:
    I’ll move it to the gnu32 config script to explain why the other configs don’t enable caching.
  38. in ci/scripts/run.sh:17 in 293b0e0f19
    10 @@ -11,3 +11,13 @@ set -o errexit -o nounset -o pipefail -o xtrace
    11 [ "${CI_CONFIG+x}" ] && source "$CI_CONFIG"
    12 
    13 nix develop --ignore-environment --keep CI_CONFIG --keep CI_CLEAN "${NIX_ARGS[@]+"${NIX_ARGS[@]}"}" -f shell.nix --command ci/scripts/ci.sh
    14+
    15+# Create a GC root for the shell closure so the cache-nix-action save step
    16+# does not garbage-collect it.
    17+if [[ "${CI_CACHE_NIX_STORE:-}" == "true" ]]; then
    


    ryanofsky commented at 9:56 pm on March 25, 2026:

    In commit “ci: cache gnu32 nix store” (293b0e0f197e20e74f729322d0c2e5b5fe4172d8)

    This is ok, but for consistency with CI_CLEAN would change the check to [ -n "${CI_CACHE_NIX_STORE-}" ] and just treat any nonempty value as true, not just “true” specifically.
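A small bash illustration of how the two checks differ. `CI_CACHE_NIX_STORE` is the variable from the diff; the helper names `enabled_if_true`, `enabled_if_nonempty` and `report` are hypothetical.

```bash
#!/usr/bin/env bash
# Compare the two ways of testing the opt-in flag discussed above.
# Both use a default expansion, so they are safe under `set -o nounset`.

# Check in the diff: only the literal string "true" enables caching.
enabled_if_true() {
  [[ "${CI_CACHE_NIX_STORE:-}" == "true" ]]
}

# Suggested check (same style as CI_CLEAN): any non-empty value enables it.
enabled_if_nonempty() {
  [ -n "${CI_CACHE_NIX_STORE-}" ]
}

report() {
  local a=no b=no
  if enabled_if_true; then a=yes; fi
  if enabled_if_nonempty; then b=yes; fi
  echo "value='${CI_CACHE_NIX_STORE-<unset>}' true-check=$a nonempty-check=$b"
}

for v in true 1 yes ""; do
  CI_CACHE_NIX_STORE=$v
  report
done
unset CI_CACHE_NIX_STORE
report
```

Values such as `1` or `yes` only enable caching under the non-empty check, while an empty or unset variable disables it under both.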

  39. in ci/scripts/run.sh:22 in 293b0e0f19
    17+if [[ "${CI_CACHE_NIX_STORE:-}" == "true" ]]; then
    18+  mkdir -p .nix-gc-roots
    19+  nix-build shell.nix \
    20+    -o .nix-gc-roots/shell \
    21+    "${NIX_ARGS[@]+"${NIX_ARGS[@]}"}"
    22+  nix-store --query --requisites .nix-gc-roots/shell >/dev/null
    


    ryanofsky commented at 10:05 pm on March 25, 2026:

    In commit “ci: cache gnu32 nix store” (293b0e0f197e20e74f729322d0c2e5b5fe4172d8)

    Maybe add a comment about why this is here. It seems like it doesn’t do anything by default but could print helpful errors if the closure is not complete? Not sure. Would be good to explain or remove.


    Sjors commented at 7:43 am on March 26, 2026:

    Added a comment, it’s a guard against creating a bad cache.

    Good results are ignored, and IIUC if something goes wrong it will go to stderr, but I didn’t test that.

  40. in ci/scripts/run.sh:20 in 293b0e0f19
    15+# Create a GC root for the shell closure so the cache-nix-action save step
    16+# does not garbage-collect it.
    17+if [[ "${CI_CACHE_NIX_STORE:-}" == "true" ]]; then
    18+  mkdir -p .nix-gc-roots
    19+  nix-build shell.nix \
    20+    -o .nix-gc-roots/shell \
    


    ryanofsky commented at 10:11 pm on March 25, 2026:

    In commit “ci: cache gnu32 nix store” (293b0e0f197e20e74f729322d0c2e5b5fe4172d8)

    Would probably be good if the CI script did not write any files outside of $CI_DIR, so different configs do not interfere by writing the same files when run locally. Would suggest changing this to -o $CI_DIR/gcroot, or something else inside $CI_DIR.

  41. ryanofsky approved
  42. ryanofsky commented at 10:20 pm on March 25, 2026: collaborator

    Code review ACK 293b0e0f197e20e74f729322d0c2e5b5fe4172d8. This seems more understandable now with one cache step instead of two and more code moved to bash scripts. Left some comments and suggestions, but nothing blocking.

    re: #256 (comment)

    The capnp::CallContextHook “object has invalid vptr” error is pretty bad looking, but it looks like that happens after a test timeout so #263 should prevent it from happening. The actual bug causing that error might also be fixed by one of the fixes in #249, but that’s speculative.

  43. ryanofsky referenced this in commit 975270b619 on Mar 25, 2026
  44. ci: cache gnu32 nix store be8622816d
  45. Sjors force-pushed on Mar 26, 2026
  46. Sjors commented at 7:45 am on March 26, 2026: member

    Rebased and addressed comments.

    The script change invalidated the cache primary key, but the fallback worked, so it’s still fast. This fallback mechanism is also why the sanity check before storing cache is important: #256 (review)

    Searching for a cache with the key "nix-Linux-gnu32-ac62194c3917d5f474c1a844b6fd6da2db95077d-129eacb3982a51870d08f654586be1b8b91c598a2ff3b8b1384a2bcea6fac601".
    Could not find a cache with the given "primary-key" and "paths".
    Searching for a cache using the "restore-prefixes-first-match":
    ["nix-Linux-gnu32-ac62194c3917d5f474c1a844b6fd6da2db95077d-","nix-Linux-gnu32-","nix-Linux-"]
    Cache hit for: nix-Linux-gnu32-ac62194c3917d5f474c1a844b6fd6da2db95077d-bc4e3ad07d6fb99b43a0ff79cb58a5af0d4496c460a796783fa938faefe49cef
    ...
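The prefix list in that log is the primary key with trailing dash-separated components dropped one at a time. A bash sketch that reproduces such a list, assuming that key structure and that no single component contains a dash (`restore_prefixes` is a hypothetical name; this does not claim to be how the workflow actually builds the list):

```bash
#!/usr/bin/env bash
# Hypothetical helper: list progressively broader restore prefixes for a
# dash-separated cache key, most specific first.
# Usage: restore_prefixes KEY MIN_PARTS
#   MIN_PARTS = number of leading components kept by the broadest prefix.
# Assumes no single component (OS, config name, hash) contains a dash.
restore_prefixes() {
  local key=$1 min_parts=$2
  local -a parts
  IFS=- read -r -a parts <<< "$key"

  local n i prefix
  # Drop the last component, then the one before it, down to MIN_PARTS.
  for (( n = ${#parts[@]} - 1; n >= min_parts; n-- )); do
    prefix=""
    for (( i = 0; i < n; i++ )); do
      prefix+="${parts[i]}-"
    done
    echo "$prefix"
  done
}
```

Run on the primary key from the log with `MIN_PARTS=2`, it prints the same three prefixes. Since only the last component (the hash of the CI files) changed, the most specific prefix still matched an entry built for the same nixpkgs commit.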
    
  47. maflcko commented at 8:30 am on March 26, 2026: contributor

    CI failure:

    2026-03-26T08:10:29.5265259Z Temporary test directory at /tmp/test_runner__🏃_20260326_081029
    2026-03-26T08:10:57.0704247Z 1/600 - interface_ipc.py passed, Duration: 22 s
    2026-03-26T08:10:57.0711459Z 2/600 - interface_ipc.py passed, Duration: 22 s
    2026-03-26T08:10:57.3870521Z 3/600 - interface_ipc.py passed, Duration: 22 s
    2026-03-26T08:10:57.4656979Z 4/600 - interface_ipc.py passed, Duration: 22 s
    2026-03-26T08:10:57.4887463Z 5/600 - interface_ipc.py passed, Duration: 22 s
    2026-03-26T08:10:57.5960759Z 6/600 - interface_ipc.py passed, Duration: 22 s
    2026-03-26T08:10:57.6598446Z 7/600 - interface_ipc.py passed, Duration: 22 s
    2026-03-26T08:10:57.9096218Z 8/600 - interface_ipc.py passed, Duration: 23 s
    2026-03-26T08:10:58.1457589Z 9/600 - interface_ipc.py passed, Duration: 23 s
    2026-03-26T08:10:58.1540397Z 10/600 - interface_ipc.py passed, Duration: 23 s
    2026-03-26T08:10:58.2227770Z 11/600 - interface_ipc.py passed, Duration: 23 s
    2026-03-26T08:10:58.2308253Z 12/600 - interface_ipc.py passed, Duration: 23 s
    2026-03-26T08:10:58.2974238Z 13/600 - interface_ipc.py passed, Duration: 23 s
    2026-03-26T08:10:58.3686416Z 14/600 - interface_ipc.py passed, Duration: 23 s
    2026-03-26T08:10:58.3736080Z 15/600 - interface_ipc.py passed, Duration: 23 s
    2026-03-26T08:10:58.4542169Z 16/600 - interface_ipc.py passed, Duration: 23 s
    2026-03-26T08:10:58.4688903Z 17/600 - interface_ipc.py passed, Duration: 23 s
    2026-03-26T08:10:58.4693534Z 18/600 - interface_ipc.py passed, Duration: 23 s
    2026-03-26T08:10:58.4842191Z 19/600 - interface_ipc.py passed, Duration: 23 s
    2026-03-26T08:10:58.6343568Z 20/600 - interface_ipc.py passed, Duration: 23 s
    2026-03-26T08:11:12.9192987Z 21/600 - interface_ipc.py passed, Duration: 15 s
    2026-03-26T08:11:13.0433624Z 22/600 - interface_ipc.py passed, Duration: 15 s
    2026-03-26T08:11:13.4416511Z 23/600 - interface_ipc.py passed, Duration: 16 s
    2026-03-26T08:11:13.5990535Z 24/600 - interface_ipc.py passed, Duration: 16 s
    2026-03-26T08:11:14.2660376Z 25/600 - interface_ipc.py passed, Duration: 15 s
    2026-03-26T08:11:14.4723777Z 26/600 - interface_ipc.py passed, Duration: 16 s
    2026-03-26T08:11:14.5929731Z 27/600 - interface_ipc.py passed, Duration: 16 s
    2026-03-26T08:11:14.7083757Z 28/600 - interface_ipc.py passed, Duration: 16 s
    2026-03-26T08:11:14.8849051Z 29/600 - interface_ipc.py passed, Duration: 16 s
    2026-03-26T08:11:14.9215983Z 30/600 - interface_ipc.py passed, Duration: 16 s
    2026-03-26T08:11:22.4715850Z 31/600 - interface_ipc.py passed, Duration: 9 s
    2026-03-26T08:11:22.8364245Z 32/600 - interface_ipc.py passed, Duration: 9 s
    2026-03-26T08:11:22.8723167Z 33/600 - interface_ipc.py passed, Duration: 8 s
    2026-03-26T08:11:23.8324036Z 34/600 - interface_ipc.py passed, Duration: 8 s
    2026-03-26T08:11:24.5614193Z 35/600 - interface_ipc.py passed, Duration: 9 s
    2026-03-26T08:12:01.8098556Z ##[error]The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
    2026-03-26T08:12:01.8165252Z ##[error]Process completed with exit code 143.
    2026-03-26T08:12:02.7667771Z Cleaning up orphan processes
    2026-03-26T08:12:05.7207024Z Terminate orphan process: pid (14932) (b-shutoff)
    2026-03-26T08:12:05.8932171Z Terminate orphan process: pid (14934) (b-shutoff)
    2026-03-26T08:12:05.9839329Z Terminate orphan process: pid (14936) (b-shutoff)
    2026-03-26T08:12:06.0753851Z Terminate orphan process: pid (14945) (b-shutoff)
    2026-03-26T08:12:06.1362572Z Terminate orphan process: pid (14951) (b-shutoff)
    2026-03-26T08:12:06.2211871Z Terminate orphan process: pid (14963) (bitcoin-node)
    2026-03-26T08:12:06.2538740Z Terminate orphan process: pid (14978) (b-shutoff)
    2026-03-26T08:12:06.2638800Z Terminate orphan process: pid (14981) (b-shutoff)
    2026-03-26T08:12:06.2910155Z Terminate orphan process: pid (14982) (b-shutoff)
    2026-03-26T08:12:06.3320706Z Terminate orphan process: pid (16439) (b-shutoff)
    2026-03-26T08:12:06.3381871Z Terminate orphan process: pid (16440) (b-shutoff)
    2026-03-26T08:12:06.3418611Z ##[warning]Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/cache/restore@v4, actions/cache@v4, actions/checkout@v4. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
    
  48. Sjors commented at 9:26 am on March 26, 2026: member
    Another 143 exit, maybe they’re running out of RAM with the longer timeouts?
  49. ryanofsky commented at 9:41 am on March 26, 2026: collaborator

    Another 143 exit, maybe they’re running out of RAM with the longer timeouts?

    Yes these seem to happen reliably on master since #263 was merged. I suspect what is happening is that with the longer timeouts, the test waits forever for node.assert_debug_log(expected_msgs=["IPC server: socket disconnected", "canceled while executing"], timeout=2) log messages that never arrive. Then the jobs are eventually killed with SIGTERM (exit code 143). If this is the case, it’s possible #249 could fix this.

    EDIT: This explanation is not right. After rebasing #249, we see SIGTERM/143 errors after the test has only run for 6 minutes (https://github.com/bitcoin-core/libmultiprocess/actions/runs/23587830246/job/68685257218?pr=249). So it seems like SIGTERM is not sent because the test is hanging for a very long time. More discussion about this is in #263 (comment)
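For reference, exit code 143 follows the shell convention of reporting 128 plus the signal number for a process killed by a signal, and SIGTERM is signal 15. A quick bash check of that convention (it says nothing about why the runner sent the signal):

```bash
#!/usr/bin/env bash
# Start a long-running child, terminate it with SIGTERM, and inspect the
# status bash reports for it.
sleep 30 &
pid=$!
kill -TERM "$pid"
status=0
wait "$pid" || status=$?
echo "exit status: $status"          # 128 + 15 (SIGTERM)
echo "signal: $(kill -l "$status")"  # bash maps the status back to the signal name
```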

  50. ryanofsky approved
  51. ryanofsky commented at 9:56 am on March 26, 2026: collaborator

    Code review ACK be8622816da467bb6ba86646f7ccab248b6b4931. Thanks for the updates! Just suggested changes updating comments and tweaking CI_CACHE_NIX_STORE evaluation since last review.

    I think I will go ahead and merge this, since the Bitcoin Core CI failures should be unrelated and are also happening in master.

  52. ryanofsky merged this on Mar 26, 2026
  53. ryanofsky closed this on Mar 26, 2026


github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin-core/libmultiprocess. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-03-29 21:30 UTC
