contrib: turn off compression of macOS SDK to fix determinism (across distros) #32009

pull fanquake wants to merge 2 commits into bitcoin:master from fanquake:macos_sdk_select changing 2 files +6 −4
  1. fanquake commented at 4:23 pm on March 6, 2025: member

    This includes two changes. The first is to more selectively pick files for inclusion into our macOS SDK tarball (skip manpages, binaries etc), which is nice because it reduces the size of the tarball (from ~80mb to 35mb) and makes the size increase that happens with the next commit less bad.

    The second change turns off compression of the tarball. Starting with Python 3.11, Python's gzip might delegate to zlib. Depending on the OS, e.g. Ubuntu vs Fedora, the underlying zlib implementation might differ, resulting in different output.

    For now, or until a better solution exists, disable compression. This results in the SDK increasing in size to ~230mb, which is not unreasonable to regain determinism (and would be significantly worse without the previous commit).

    See: https://docs.python.org/3/library/gzip.html#gzip.compress

    Would fix #31873. We could probably also put this into 29.x.
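
As an illustration of the idea (not the code in gen-sdk, which is not shown in this thread), here is a minimal Python sketch of writing an uncompressed tar with normalized metadata so that the same inputs always hash the same. The helper name `build_tar` and the sample file names are hypothetical.

```python
import hashlib
import io
import tarfile


def build_tar(files):
    """Return an uncompressed tar archive (bytes) with normalized metadata."""
    buf = io.BytesIO()
    # Mode "w" writes a plain tar stream, so no zlib code path is involved.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):  # fixed member order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = 0  # fixed timestamp
            info.uid = info.gid = 0  # fixed ownership
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


# Hypothetical sample inputs, inserted in two different orders.
files = {"usr/include/b.h": b"#define B 1\n", "usr/include/a.h": b"#define A 1\n"}
first = hashlib.sha256(build_tar(files)).hexdigest()
second = hashlib.sha256(build_tar(dict(reversed(list(files.items()))))).hexdigest()
print(first == second)
```

The sketch only shows that a plain tar stream with pinned metadata leaves no room for compressor differences; it makes no claim about how the script in this PR disables compression.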

  2. contrib: more selectively pick files for macOS SDK
    Only include what we really need. Skip 100s of mb of manpages. Note that
    System/Library is only needed for the Qt build.
    6998e933f9
  3. RFC: disable compression in macOS gen-sdk script
    Starting with Python 3.11, Pythons gzip might delegate to zlib.
    Depending on the OS, i.e Ubuntu vs Fedora, the underlying zlib
    implementation might differ, resulting in different output.
    
    For now, or until a better solution exists, disable compression. This
    results in the SDK increasing in size to ~230mb. Which is not
    unreasonable, to regain determinism (and would be significantly worse
    without the previous commit).
    
    See: https://docs.python.org/3/library/gzip.html#gzip.compress
    20778eb023
  4. DrahtBot commented at 4:23 pm on March 6, 2025: contributor

    The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

    Code Coverage & Benchmarks

    For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32009.

    Reviews

    See the guideline for information on the review process.

    Type Reviewers
    Concept ACK davidgumberg, hebasto, willcl-ark, laanwj

    If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

  5. achow101 commented at 4:29 pm on March 6, 2025: member
    Is determinism of that tarball necessary? IIRC it doesn’t actually affect the build.
  6. fanquake commented at 4:31 pm on March 6, 2025: member

    Is determinism of that tarball necessary?

    I don’t see why we wouldn’t want it. If we don’t care about it, we should remove: “The sha256sum should be c0c2e7bb92c1fee0c4e9f3a485e4530786732d6c6dd9e9f418c282aa6892f55d.” So that issues like #31873 aren’t opened.

  7. davidgumberg commented at 9:37 pm on March 6, 2025: contributor

    Concept ACK, while it might not be strictly necessary for this tarball to be deterministic, I think matching hashes of tarballs is the simplest way to verify that you’ve followed the same procedure as others to build the macOS sdk and have gotten the same result.

    150MB seems to me a small price to pay to fix #31873 and reduce surface area for nondeterminism in generating the macOS sdk.
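
The hash comparison described above can be sketched in a few lines of Python. This is a minimal example, assuming the expected digest is copied by hand from contrib/macdeploy/README.md; the function names are hypothetical.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large SDK tarball is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's digest with a documented value, ignoring case and whitespace."""
    return sha256_file(path) == expected_hex.strip().lower()
```

A call such as `matches_expected("Xcode-15.0-15A240d-extracted-SDK-with-libcxx-headers.tar.gz", documented_hash)` would then return True only if the local result matches what others produced.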

  8. hebasto commented at 6:58 am on March 7, 2025: member
    Concept ACK for the same reasoning as in #32009 (comment).
  9. willcl-ark commented at 10:06 am on March 7, 2025: member

    Concept ACK.

    I don’t think a few hundred MB more disk space required for (power-)users doing guix builds will be a great concern to anyone, and I agree with @davidgumberg that having as many steps in the guix workflow as possible be reproducible is probably best.

  10. fanquake renamed this:
    RFC contrib: turn off compression of macOS SDK to fix determinism (across distros)
    contrib: turn off compression of macOS SDK to fix determinism (across distros)
    on Mar 7, 2025
  11. DrahtBot added the label Scripts and tools on Mar 7, 2025
  12. laanwj commented at 5:29 am on March 8, 2025: member

    Concept ACK, imo determinism is good because it makes it possible to check if mistakes have been made in the process at earlier steps.

    One comment i was about to give is “why use a .tar.gz instead of .tar, that’s the only way to be sure to remove dependency on zlib”, but that’s a much more spread out code change. This is most likely fine.

    edit: Could do that when we have to upgrade the macOS SDK anyway, but definitely not for 29.0

  13. Sjors commented at 3:33 pm on March 10, 2025: member

    I’m not too worried about the size increase. A typical guix build eats 30GB on my Ubuntu VM (3 GB for the outputs).

    it makes it possible to check if mistakes have been made in the process at earlier steps.

    Agreed.

    It seems simple enough to backport, especially because we’re not bumping the version.

    I’ll see if I can reproduce the new hashes.

  14. Sjors commented at 3:43 pm on March 10, 2025: member

    One slight problem is that Apple no longer makes Xcode 15.0 available for download. They do have Xcode 15.0.1: https://download.developer.apple.com/Developer_Tools/Xcode_15.0.1/Xcode_15.0.1.xip

    I still have the original xip on one of my machines, I’ll upload it somewhere for testing. But we should probably bump it.

    Here you go: https://download.sprovoost.nl/download.php?id=21&token=ee9af1cfea8fa495cc842a407249126d

    (sha256 checksum is in contrib/macdeploy/README.md)

    Update: Xcode 15 is still there, see below.


    I was able to reproduce the hash for the first commit 6998e933f935a379c3ad55c2fb16eca9b854f40b, but not for the second commit 20778eb0235df70397fc285f9e3b72270bd4aaf4. I get 8e085768391abfceae619a89ab151d148afe09f4867f1b4c4ce9c5693b92ec82 instead on macOS 15.3.1, Python 3.10.14. Ditto on my Ubuntu 24.04 VM inside the same machine.


    It looks like Apple Silicon has hardware acceleration for zip, maybe that’s a factor? It would also explain why extract_xcode.py is super fast on my VM running on the M4 Mac, while super slow on the AMD Ryzen 7950x. (Probably not, because the first commit did produce identical results.)


    I also get 8e085768391abfceae619a89ab151d148afe09f4867f1b4c4ce9c5693b92ec82 for Xcode-15.0-15A240d-extracted-SDK-with-libcxx-headers.tar.gz on the AMD Ubuntu machine.

  15. davidgumberg commented at 0:13 am on March 11, 2025: contributor

    One slight problem is that Apple no longer makes Xcode 15.0 available for download. They do have Xcode 15.0.1: https://download.developer.apple.com/Developer_Tools/Xcode_15.0.1/Xcode_15.0.1.xip

    I’m still seeing it here: https://developer.apple.com/download/all/?q=Xcode%2015 , and with this download link: https://download.developer.apple.com/Developer_Tools/Xcode_15/Xcode_15.xip, it’s labeled Xcode 15 rather than Xcode 15.0.

  16. Sjors commented at 8:14 am on March 11, 2025: member

    @davidgumberg ah indeed, I see it as well. You have to click on the download link in order for the download URL to work at all, so that’s probably why it didn’t work for me.

    So that just leaves the hash mismatch to figure out.

  17. willcl-ark commented at 1:11 pm on March 19, 2025: member

    At commit 20778eb0235df70397fc285f9e3b72270bd4aaf4 I get the following guix output:

        env HOSTS="x86_64-apple-darwin arm64-apple-darwin" ./contrib/guix/guix-build

        <snip>

        $ find guix-build-$(git rev-parse --short=12 HEAD)/output/ -type f -print0 | env LC_ALL=C sort -z | xargs -r0 sha256sum
        30aa32906f879c5347be27bd9cb1a65b3d839acd72805fe3185276f788971fdb  guix-build-20778eb0235d/output/arm64-apple-darwin/SHA256SUMS.part
        0763a089bcc5d1d22191414f556bf461cefa8ea8a18971b171cf6fc5570b8790  guix-build-20778eb0235d/output/arm64-apple-darwin/bitcoin-20778eb0235d-arm64-apple-darwin-codesigning.tar.gz
        18cd712ac4cbd63122483a083a9f29d7c184ffaead45078fb27b3d6381b691a6  guix-build-20778eb0235d/output/arm64-apple-darwin/bitcoin-20778eb0235d-arm64-apple-darwin-unsigned.tar.gz
        c3fdc3b50077f2a4b772d4c1cf50e230f358542844e800e77194bf85dee9eb28  guix-build-20778eb0235d/output/arm64-apple-darwin/bitcoin-20778eb0235d-arm64-apple-darwin-unsigned.zip
        b97da95ff464b8bdcb9ce31dc783f947dd24d93eb6bfdfea71c3070d51991058  guix-build-20778eb0235d/output/dist-archive/bitcoin-20778eb0235d.tar.gz
        e180a9ef657ea3f0a6e42c4babe7b38a8e585902fbac301a1f16bccb0c53a18b  guix-build-20778eb0235d/output/x86_64-apple-darwin/SHA256SUMS.part
        5b032a2b6dd2dad11f3884194bcb85aea37237246162dfad27099e6c5175071d  guix-build-20778eb0235d/output/x86_64-apple-darwin/bitcoin-20778eb0235d-x86_64-apple-darwin-codesigning.tar.gz
        39d23ec95f34cd7d8bbdb6228e55f7f8a0f48a3997c3dcb146a2ff3744f37629  guix-build-20778eb0235d/output/x86_64-apple-darwin/bitcoin-20778eb0235d-x86_64-apple-darwin-unsigned.tar.gz
        76e36fa1af2f788dea631a7fcc93451fdf794a40dded0e3c7c12f74ab136aefb  guix-build-20778eb0235d/output/x86_64-apple-darwin/bitcoin-20778eb0235d-x86_64-apple-darwin-unsigned.zip

        $ eza -al guix-build-20778eb0235d/output/*/*apple*.tar.gz
        .rw-r--r-- 52M will 19 Mar 13:02 guix-build-20778eb0235d/output/arm64-apple-darwin/bitcoin-20778eb0235d-arm64-apple-darwin-codesigning.tar.gz
        .rw-r--r-- 35M will 19 Mar 13:02 guix-build-20778eb0235d/output/arm64-apple-darwin/bitcoin-20778eb0235d-arm64-apple-darwin-unsigned.tar.gz
        .rw-r--r-- 56M will 19 Mar 12:55 guix-build-20778eb0235d/output/x86_64-apple-darwin/bitcoin-20778eb0235d-x86_64-apple-darwin-codesigning.tar.gz
        .rw-r--r-- 38M will 19 Mar 12:55 guix-build-20778eb0235d/output/x86_64-apple-darwin/bitcoin-20778eb0235d-x86_64-apple-darwin-unsigned.tar.gz
    
  18. davidgumberg commented at 5:37 am on March 22, 2025: contributor

    I don’t have an explanation for this yet, but I am able to reproduce the provided hash on systems with python versions >= 3.12.0, with various zlib and zlib-ng versions, but with python < 3.12.0 I get varying hashes.

    pkgdiff again reports the mismatching archives as identical. But, now that the archives are uncompressed, the binary diff output is a little more revealing, here’s the start of the diff:

    git diff <(xxd rocky9.3/Xcode-15.0-15A240d-extracted-SDK-with-libcxx-headers.tar.gz) <(xxd fedora41/Xcode-15.0-15A240d-extracted-SDK-with-libcxx-headers.tar.gz)

        @@ -2050,7 +2050,7 @@
         00008010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
         00008020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
         00008030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
        -00008040: 0000 0000 0000 0000 00a0 815f 7e58 636f  ..........._~Xco
        +00008040: 0000 0000 0000 0000 0000 d0ff 2f58 636f  ............/Xco
         00008050: 6465 2d31 352e 302d 3135 4132 3430 642d  de-15.0-15A240d-
         00008060: 6578 7472 6163 7465 642d 5344 4b2d 7769  extracted-SDK-wi
         00008070: 7468 2d6c 6962 6378 782d 6865 6164 6572  th-libcxx-header
        @@ -4124,30 +4124,30 @@
         000101b0: 6129 2041 5050 4c45 5f41 5243 4849 5645  a) APPLE_ARCHIVE
         000101c0: 5f53 5749 4654 5f50 5249 5641 5445 3b0a  _SWIFT_PRIVATE;.
         000101d0: 0a23 6966 6465 6620 5f5f 6370 6c75 7370  .#ifdef __cplusp
        -000101e0: 6c75 730a 7d0a 2365 6e64 6966 0a00 608e  lus.}.#endif..`.
        -000101f0: 9f71 0000 0000 0000 0000 0000 0000 0000  .q..............
        +000101e0: 6c75 730a 7d0a 2365 6e64 6966 0a00 0000  lus.}.#endif....
        +000101f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
         00010200: 0000 0000 0000 0000 0000 0000 0000 0000  ................
         00010210: 0000 0000 0000 0000 0000 0000 0000 0000  ................
         00010220: 0000 0000 0000 0000 0000 0000 0000 0000  ................
         00010230: 0000 0000 0000 0000 0000 0000 0000 0000  ................
        -00010240: 0000 0000 0000 0000 0000 0000 0000 0000  ................
        -00010250: 0000 5863 6f64 652d 3135 2e30 2d31 3541  ..Xcode-15.0-15A
        -00010260: 3234 3064 2d65 7874 7261 6374 6564 2d53  240d-extracted-S
        -00010270: 444b 2d77 6974 682d 6c69 6263 7878 2d68  DK-with-libcxx-h
        -00010280: 6561 6465 7273 2f75 7372 2f69 6e63 6c75  eaders/usr/inclu
        -00010290: 6465 2f41 7070 6c65 4172 6368 6976 652f  de/AppleArchive/
        -000102a0: 4141 456e 7472 7958 4154 426c 6f62 2e68  AAEntryXATBlob.h
        -000102b0: 0000 0000 0000 3030 3030 3634 3400 3030  ......0000644.00
        -000102c0: 3030 3030 3000 3030 3030 3030 3000 3030  00000.0000000.00
        -000102d0: 3030 3030 3135 3437 3400 3030 3030 3030  000015474.000000
        -000102e0: 3030 3030 3000 3032 3631 3631 0020 3000  00000.026161. 0.
        +00010240: 0000 0000 0000 0000 0000 0000 0058 636f  .............Xco
        +00010250: 6465 2d31 352e 302d 3135 4132 3430 642d  de-15.0-15A240d-
        +00010260: 6578 7472 6163 7465 642d 5344 4b2d 7769  extracted-SDK-wi
        +00010270: 7468 2d6c 6962 6378 782d 6865 6164 6572  th-libcxx-header
        +00010280: 732f 7573 722f 696e 636c 7564 652f 4170  s/usr/include/Ap
        +00010290: 706c 6541 7263 6869 7665 2f41 4145 6e74  pleArchive/AAEnt
        +000102a0: 7279 5841 5442 6c6f 622e 6800 0000 0000  ryXATBlob.h.....
        +000102b0: 0030 3030 3036 3434 0030 3030 3030 3030  .0000644.0000000
        +000102c0: 0030 3030 3030 3030 0030 3030 3030 3031  .0000000.0000001
        +000102d0: 3534 3734 0030 3030 3030 3030 3030 3030  5474.00000000000
        +000102e0: 0030 3236 3136 3100 2030 0000 0000 0000  .026161. 0......
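
As a lighter-weight alternative to diffing two full xxd dumps, the first few differing byte offsets can be located directly. The following is a minimal Python sketch; the function name and its parameters are hypothetical and it is not part of the PR.

```python
def first_differences(path_a, path_b, limit=5, chunk_size=1 << 16):
    """Return up to `limit` byte offsets at which the two files differ."""
    offsets = []
    position = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while len(offsets) < limit:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if not chunk_a and not chunk_b:
                break  # both files exhausted
            longest = max(len(chunk_a), len(chunk_b))
            for i in range(longest):
                # A missing byte (one file is shorter) counts as a difference.
                byte_a = chunk_a[i] if i < len(chunk_a) else None
                byte_b = chunk_b[i] if i < len(chunk_b) else None
                if byte_a != byte_b:
                    offsets.append(position + i)
                    if len(offsets) == limit:
                        break
            position += longest
    return offsets
```

Printing the results with `hex()` gives values that can be matched against the offsets in the left-hand column of an xxd dump.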
    

    Testing methodology

    Modified from the repro instructions in #31873 to be faster, most importantly reusing the result of the lengthy extraction step, and just downloading the relevant branches once.

    Setup phase:

        # assumes you have acquired `Xcode_15.xip` based on the instructions in
        # contrib/macdeploy/README.md and placed it at ~/xcode/Xcode_15.xip
        XCODE=~/xcode # feel free to change this

        cd $XCODE
        sha256sum Xcode_15.xip
        # 4daaed2ef2253c9661779fa40bfff50655dc7ec45801aba5a39653e7bcdde48e  /home/user/xcode/Xcode_15.xip
        git clone --depth 1 https://github.com/bitcoin/bitcoin.git bitcoin-master
        cd bitcoin-master && git fetch --depth 1 origin pull/32009/head:32009 && git worktree add ../bitcoin-32009 32009 && cd ../ # fetch the pull branch

        git clone --depth 1 https://github.com/bitcoin-core/apple-sdk-tools.git
        python3 apple-sdk-tools/extract_xcode.py -f Xcode_15.xip | cpio -d -i # single threaded, slow, we want to reuse this.
    

    To verify that the source material for gen-sdk is good, we can generate the sdk using a known good setup, ubuntu 24.04:

        docker pull ubuntu:24.04
        docker run -it \
          -v $XCODE:/xcode \
          ubuntu:24.04 \
          /bin/bash
    

    Inside the container:

        export DEBIAN_FRONTEND=noninteractive # prevents apt from halting to interact
        sha256sum /xcode/Xcode_15.xip
        # 4daaed2ef2253c9661779fa40bfff50655dc7ec45801aba5a39653e7bcdde48e
        apt update
        apt install python3 -y

        xcode/bitcoin-master/contrib/macdeploy/gen-sdk xcode/Xcode.app/  # we are reusing the extracted result from above
        sha256sum Xcode-15.0-15A240d-extracted-SDK-with-libcxx-headers.tar.gz
        # c0c2e7bb92c1fee0c4e9f3a485e4530786732d6c6dd9e9f418c282aa6892f55d  Xcode-15.0-15A240d-extracted-SDK-with-libcxx-headers.tar.gz
    

    We should get the same hash here as what contrib/macdeploy/README.md in master promises.


    To test this branch’s version of gen-sdk on a variety of platforms:

    1. Container setup

        XCODE=~/xcode # where the files from the setup above went.
        # specify the container platform
        PLATFORM=fedora:40

        docker pull $PLATFORM && \
          docker run -it \
            -v $XCODE:/xcode \
            $PLATFORM \
            /bin/bash
    

    2. In the container

    Compare the output of the final sha256sum to what contrib/macdeploy/README.md in this branch promises.

    Debian/Ubuntu

        export DEBIAN_FRONTEND=noninteractive # prevents apt from halting to interact
        apt update > /dev/null
        apt install python3 -y > /dev/null
        /xcode/bitcoin-32009/contrib/macdeploy/gen-sdk /xcode/Xcode.app/
        sha256sum Xcode-15.0-15A240d-extracted-SDK-with-libcxx-headers.tar.gz
    

    Fedora/CentOS

        dnf install python -y --quiet # python3 on rocky8.9
        /xcode/bitcoin-32009/contrib/macdeploy/gen-sdk /xcode/Xcode.app/
        sha256sum Xcode-15.0-15A240d-extracted-SDK-with-libcxx-headers.tar.gz
    

    Arch

        pacman -Sy
        pacman --noconfirm -S python
        /xcode/bitcoin-32009/contrib/macdeploy/gen-sdk /xcode/Xcode.app/
        sha256sum Xcode-15.0-15A240d-extracted-SDK-with-libcxx-headers.tar.gz
    

    Results Tables

    Success

    image             python --version  python -c "import zlib; print(zlib.ZLIB_VERSION)"
    fedora:39         3.12.7            1.2.13
    fedora:40         3.12.9            1.3.1-zlib-ng
    fedora:41         3.13.2            1.3.1-zlib-ng
    ubuntu:24.04      3.12.3            1.3
    ubuntu:24.10      3.12.7            1.3.1
    archlinux:latest  3.13.2            1.3.1

    Failed to reproduce:

    image            python --version  python -c "import zlib; print(zlib.ZLIB_VERSION)"  hash
    debian:bookworm  3.11.2            1.2.13                                             8e085768391abfceae619a89ab151d148afe09f4867f1b4c4ce9c5693b92ec82
    rockylinux:8.9   3.6.8             1.2.11                                             e779914636e6a3a417bf2a19dbce6f0bf8fab10b16717df769d107a5aad6aa2e
    rockylinux:9.3   3.9.18            1.2.11                                             07b12c2a489c241bbc8c853fe78f2e92faf8ff51631311d142aeb8c7e20e7268
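
The two version columns in the tables above can be collected in one step on each platform. This is a small Python sketch with a hypothetical helper name; it also reports `zlib.ZLIB_RUNTIME_VERSION`, which the tables do not include, since the library loaded at runtime can differ from the one Python was compiled against.

```python
import platform
import zlib


def environment_summary():
    """Collect the interpreter and zlib versions relevant to reproducing a hash."""
    return {
        "python": platform.python_version(),
        "zlib_compiled": zlib.ZLIB_VERSION,  # version Python was built against
        "zlib_runtime": zlib.ZLIB_RUNTIME_VERSION,  # version actually loaded
    }


for key, value in environment_summary().items():
    print(f"{key}: {value}")
```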

    I downgraded to python 3.11.3 on fedora 41 with zlib 1.3 and got a bad hash; I upgraded to python 3.12.0 with zlib-ng 1.3 on debian:bookworm and got a good hash.

    Also see #31873 (comment)

        dnf install -y gcc git make openssl-devel xz-devel git
        curl -L https://github.com/madler/zlib/releases/download/v1.3/zlib-1.3.tar.gz | tar xzvf -
        cd zlib-1.3 && ./configure && make -j $(nproc) && make install && cd ..

        curl https://pyenv.run | bash
        export PYENV_ROOT="$HOME/.pyenv"
        [[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"
        eval "$(pyenv init - bash)"

        pyenv install 3.11.3
        pyenv global 3.11.3

        # verify zlib version
        python -c "import zlib; print(zlib.ZLIB_VERSION)"
        # 1.3

        /xcode/bitcoin-32009/contrib/macdeploy/gen-sdk /xcode/Xcode.app/
        sha256sum Xcode-15.0-15A240d-extracted-SDK-with-libcxx-headers.tar.gz
        # get something bad

        pyenv install 3.12.0
        pyenv global 3.12.0
        /xcode/bitcoin-32009/contrib/macdeploy/gen-sdk /xcode/Xcode.app/
        sha256sum Xcode-15.0-15A240d-extracted-SDK-with-libcxx-headers.tar.gz
    

    I don’t have strong evidence for this, but I suspect this is a second source of nondeterministic behavior that was already present before this change. In my previous testing of master, when I reported this issue and was trying different combinations of python and zlib, I saw a similar pattern: with some python versions the hashes were mismatched seemingly at random, and the rest were one of two hashes, one right and one wrong. I avoided sharing the random hashes in my table in #31873, and all of the values I shared there are for python 3.12.x.

  19. maflcko commented at 6:05 am on March 22, 2025: member

    I don’t have an explanation for this yet, but I am able to reproduce the provided hash on systems with python versions >= 3.12.0, with various zlib and zlib-ng versions, but with python < 3.12.0 I get varying hashes.

    A wild guess: Could this be related to https://github.com/python/cpython/issues/95385?

  20. laanwj commented at 4:07 pm on March 26, 2025: member

    git diff <(xxd rocky9.3/Xcode-15.0-15A240d-extracted-SDK-with-libcxx-headers.tar.gz) <(xxd fedora41/Xcode-15.0-15A240d-extracted-SDK-with-libcxx-headers.tar.gz)

    This seems to suggest the divergence is in the tarring, not the gzipping. Do you still see a difference if you gunzip the .gzs and compare the tars only?
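
One way to run the check asked about here is to hash each file twice: once as stored on disk, and once after decompression. The sketch below is a minimal Python example with hypothetical function names, and it assumes both files are valid gzip streams (otherwise `gzip.open` raises an error on read).

```python
import gzip
import hashlib


def _digest(fileobj, chunk_size=1 << 20):
    """Hash an open binary file object in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def container_and_payload_hashes(path):
    """Return (sha256 of the file as stored, sha256 of its decompressed contents)."""
    with open(path, "rb") as raw:
        container = _digest(raw)
    with gzip.open(path, "rb") as payload:
        return container, _digest(payload)


def diverges_in_tar_layer(path_a, path_b):
    """True if the decompressed tar payloads differ, False if only the gzip wrapping does."""
    return container_and_payload_hashes(path_a)[1] != container_and_payload_hashes(path_b)[1]
```

If the container hashes differ while the payload hashes match, the difference lies in the gzip layer; if the payload hashes differ as well, the tar stream itself diverges.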


This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2025-03-31 09:12 UTC