contrib: replace binary verification script verify.sh with python rewrite #20689

pull theStack wants to merge 2 commits into bitcoin:master from theStack:202012-contrib-replace-verify-binaries-script-with-python changing 3 files +190 −184
  1. theStack commented at 7:00 PM on December 17, 2020: member

    The rationale for the PR is the same as for #18132:

    Most of our test scripts are written in python. We don't have enough reviewers for bash scripts and they tend to be clumsy anyway. Especially when it comes to argument parsing.

    Note that there are still a lot of things that could be improved in this replacement (e.g. using regexps for version string parsing, adding type annotations, dividing up into more functions, getting a pylint score closer to 10, etc.), but I found the original shell script quite hard to read, so it's possibly still a good first step for an improvement. ~Not sure though if it's worth the reviewers time, and if it's even continued to be used long-term (maybe there are plans to merge it with get_previous_releases.py, which partly does the same?), so chasing for Concept ACKs right now.~

  2. DrahtBot added the label Docs on Dec 17, 2020
  3. DrahtBot added the label Scripts and tools on Dec 17, 2020
  4. jonatack commented at 7:38 PM on December 17, 2020: member

    Concept ACK (modulo context / future plans for this script; I don't know).

  5. in contrib/verifybinaries/verify.py:75 in bb801302ee outdated
      70 | +        return False
      71 | +    subprocess.run(f"mv {filename}-tmp {filename}", shell=True)
      72 | +    return True
      73 | +
      74 | +
      75 | +def verify_with_sha256sum(files_to_verify, hashfile):
    


    laanwj commented at 11:30 AM on December 18, 2020:

    This is a missed opportunity; please use python's internal hashing instead of calling out to an utility here.


    theStack commented at 7:12 PM on December 18, 2020:

    Good point! When I thought about using internal hashing I was initially put off about potentially loading dozens of MBs into memory (I guess sha256sum does this in a smart way by loading chunk after chunk, instead of the whole file at once), but OTOH this should not be a problem at all and the advantage of getting rid of an external tool outweighs this by far.


    shesek commented at 7:53 PM on January 29, 2021:

    I'm late to the party, but you can do stream hashing in python:

    import hashlib as hash
    
    CHUNKSIZE = 65536
    
    sha = hash.sha256()
    with open('/path/to/foo.bar', 'rb') as fh:
        buff = fh.read(CHUNKSIZE)
        while len(buff) > 0:
            sha.update(buff)
            buff = fh.read(CHUNKSIZE)
    digest = sha.hexdigest()
    

    theStack commented at 9:26 PM on January 29, 2021:

    @shesek: Thanks for posting! I didn't doubt that stream hashing is possible at all in python, I just was put off by the idea of partly reinventing the wheel of sha256 here and investing more than 2-3 LOC for getting the hash of a file.

    I'm late to the party, ...

    Well, luckily in open source development there is no "late to the party", you can always open a follow-up PR :)

  6. laanwj commented at 11:31 AM on December 18, 2020: member

    Concept ACK, I generally prefer python to shell code as things become more complex, I find it easier to review and structure. It also doesn't need to rely so much on tooling that happens to be scattered around the system (like your OS's favorite command line sha256 hashers).

  7. practicalswift commented at 2:38 PM on December 18, 2020: contributor

    Concept ACK on the general idea of replacing this shell script with a Python script.

    The current version of the Python script shells out to wget, gpg, grep, mv, diff and sha256sum.

    AFAICT the only necessary external command that would be non-trivial to replace by using only what is supplied by the Python standard library is gpg.

    Would it be possible to re-write this script to make it shell out only to gpg?

  8. theStack commented at 7:12 PM on December 18, 2020: member

    Thanks for all the conceptual reviews! I agree with the idea to reduce calls to external tools, if we move to a language that has 'batteries included' we should also use them to be more deterministic, robust and platform-independent.

    It also doesn't need to rely so much on tooling that happens to be scattered around the system (like your OS's favorite command line sha256 hashers).

    I didn't know it was that bad and not even sha256sum is available consistently across Linux and *BSDs 👀

    Would it be possible to re-write this script to make it shell out only to gpg?

    Challenge accepted :-)

  9. theStack force-pushed on Dec 20, 2020
  10. theStack commented at 11:25 PM on December 20, 2020: member

    Force-pushed with the following changes, eliminating several external tool call (subprocess.run) instances, according to the reviewers suggestions:

    • replaced call to external grep and mv with simple manual filtering (we don't need the contents in a file anyways)
    • replaced call to external sha256sum with python's internal hashing (hashlib.sha256)
    • replaced call to external diff with a trivial comparison routine (new function files_are_equal(...))

    For now I decided to keep the call to wget for two reasons:

    1. we currently take advantage of its timestamping function (parameter -N, see https://linux.die.net/man/1/wget), that only downloads a file if both local and remote timestamp and size differ: "When running Wget with -N, with or without -r or -p, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file." This is useful for not needing to re-download binaries that are already there, and I don't think its worth it to implement a similar mechanism in the script.

    2. IMHO the verbose step-by-step-output in case of failure can be quite useful for diagnosing the problem, that would possibly be quite hard to get in the same detail level in python (w/o possibly catching lots of individual exceptions), e.g.:

    wget output:
            --2020-12-21 00:17:29--  https://bitcoincore.org/bin/bitcoin-core-0.20.0/test.rc23/SHA256SUMS.asc
            Resolving bitcoincore.org (bitcoincore.org)... 198.251.83.116, 107.191.99.5
            Connecting to bitcoincore.org (bitcoincore.org)|198.251.83.116|:443... connected.
            HTTP request sent, awaiting response... 404 Not Found
            2020-12-21 00:17:29 ERROR 404: Not Found.
    

    So the remaining external calls would be to gpg and wget right now. Happy though to also get rid of the latter one, if that is the general wish.

  11. laanwj commented at 9:23 AM on December 21, 2020: member

    I was thinking of this script yesterday during a twitter thread, and wondered if it would be useful to make it do verification against gitian deterministic build attestations. This would be more thorough than validating my signature. At the least rewriting it in Python makes it easier to do this.

    This is out of scope for this PR of course.

    So the remaining external calls would be to gpg and wget right now. Happy though to also get rid of the latter one, if that is the general wish.

    wget is not low-hanging fruit in this regard. It's fairly easy to implement a poor man's wget in Python, but adding other features such as a progress bar makes it a lot more work. Especially as we don't want to introduce external Python dependencies. Could also be done in a later PR. I think it's fine like this.

  12. laanwj commented at 9:34 AM on December 21, 2020: member

    Testing a bit

    $ ./verify.py 0.20.1-linux
    Downloading bitcoin-0.20.1-aarch64-linux-gnu.tar.gz
    Downloading bitcoin-0.20.1-arm-linux-gnueabihf.tar.gz
    Downloading bitcoin-0.20.1-riscv64-linux-gnu.tar.gz
    Downloading bitcoin-0.20.1-x86_64-linux-gnu.tar.gz
    Keep the binaries in /tmp/bitcoin_verify_binaries
    Verified hashes of
    bitcoin-0.20.1-aarch64-linux-gnu.tar.gz
    bitcoin-0.20.1-arm-linux-gnueabihf.tar.gz
    bitcoin-0.20.1-riscv64-linux-gnu.tar.gz
    bitcoin-0.20.1-x86_64-linux-gnu.tar.gz
    $ ./verify.py 0.19.0.1-linux
    Downloading bitcoin-0.19.0.1-aarch64-linux-gnu.tar.gz
    Downloading bitcoin-0.19.0.1-arm-linux-gnueabihf.tar.gz
    Downloading bitcoin-0.19.0.1-i686-pc-linux-gnu.tar.gz
    Downloading bitcoin-0.19.0.1-riscv64-linux-gnu.tar.gz
    Downloading bitcoin-0.19.0.1-x86_64-linux-gnu.tar.gz
    Keep the binaries in /tmp/bitcoin_verify_binaries
    Verified hashes of
    bitcoin-0.19.0.1-aarch64-linux-gnu.tar.gz
    bitcoin-0.19.0.1-arm-linux-gnueabihf.tar.gz
    bitcoin-0.19.0.1-i686-pc-linux-gnu.tar.gz
    bitcoin-0.19.0.1-riscv64-linux-gnu.tar.gz
    bitcoin-0.19.0.1-x86_64-linux-gnu.tar.gz
    $ ./verify.py 0.20.0-linux  # <---------------------------------------------------
    bitcoin.org and bitcoincore.org signature files were not equal?
    $ ./verify.py 0.20.0-linux
    Downloading bitcoin-0.20.0-aarch64-linux-gnu.tar.gz
    Downloading bitcoin-0.20.0-aarch64-linux-gnu.tar.gz
    Downloading bitcoin-0.20.0-arm-linux-gnueabihf.tar.gz
    Downloading bitcoin-0.20.0-riscv64-linux-gnu.tar.gz
    Downloading bitcoin-0.20.0-x86_64-linux-gnu.tar.gz
    Keep the binaries in /tmp/bitcoin_verify_binaries
    Verified hashes of
    bitcoin-0.20.0-aarch64-linux-gnu.tar.gz
    bitcoin-0.20.0-arm-linux-gnueabihf.tar.gz
    bitcoin-0.20.0-riscv64-linux-gnu.tar.gz
    bitcoin-0.20.0-x86_64-linux-gnu.tar.gz
    

    The intermittent issue was strange. I'd love to see more detailed output in case of such an error, or at the least not removing the files immediately.

  13. contrib: binary verification script verify.sh rewritten in python c84838e7af
  14. contrib: remove verify.sh
    This script has been replaced by verify.py.
    c86b9a65eb
  15. theStack force-pushed on Dec 21, 2020
  16. theStack commented at 1:09 PM on December 21, 2020: member

    I was thinking of this script yesterday during a twitter thread, and wondered if it would be useful to make it do verification against gitian deterministic build attestations. This would be more thorough than validating my signature. At the least rewriting it in Python makes it easier to do this.

    Would that mean that we wouldn't need to check any signature at all, but instead simply fetch each versions ...-build.assert file from the gitian.sigs repository (e.g. from a random contributor) and compare the hashes?

    The intermittent issue was strange. I'd love to see more detailed output in case of such an error, or at the least not removing the files immediately.

    Thanks for testing! I could reproduce this behaviour with both the shell and the python script by starting with an empty working directory, first verifying version 0.19.1, then 0.18.1:

    $ ./verify.sh 0.19.1-linux
    Downloading bitcoin-0.19.1-aarch64-linux-gnu.tar.gz
    Downloading bitcoin-0.19.1-arm-linux-gnueabihf.tar.gz
    Downloading bitcoin-0.19.1-i686-pc-linux-gnu.tar.gz
    Downloading bitcoin-0.19.1-riscv64-linux-gnu.tar.gz
    Downloading bitcoin-0.19.1-x86_64-linux-gnu.tar.gz
    Keep the binaries in /tmp/bitcoin_verify_binaries
    Verified hashes of
    bitcoin-0.19.1-aarch64-linux-gnu.tar.gz
    bitcoin-0.19.1-arm-linux-gnueabihf.tar.gz
    bitcoin-0.19.1-i686-pc-linux-gnu.tar.gz
    bitcoin-0.19.1-riscv64-linux-gnu.tar.gz
    bitcoin-0.19.1-x86_64-linux-gnu.tar.gz
    $ ./verify.sh 0.18.1-linux
    bitcoin.org and bitcoincore.org signature files were not equal?
    

    This seems to turn up whenever a signature file of an older release is downloaded and it has the same size than the one from the previous run -- using wgets timestamping (-N) is obviously counterproductive in this case, and we should ensure to always freshly fetch the signatures.

    Force-pushed with the following changes:

    • don't use timestamping mechanism for downloading the signature files (only for the release binaries)
    • if the two signature files don't match, keep them and show a more verbose message with file paths
    • add missing error message if no files match the specified platform filter
  17. instagibbs commented at 2:24 AM on December 24, 2020: member

    bash scripts longer than 5 lines are a mistake ;)

    concept ACK

  18. luke-jr commented at 12:43 AM on January 3, 2021: member

    Weak concept NACK. sh scripts are generally preferable over Python. You don't need to run tests to use Bitcoin Core, but verifying is kind of important...

    (Weak because only embedded would be missing Python, and ... we use a lot of resources anyway)

  19. theStack commented at 6:33 PM on January 17, 2021: member

    Current conceptual review score state, after 1 month: 4 Concept ACKs (jonatack, laanwj, practicalswift, instagibbs) vs. 1 weak Concept NACK (luke-jr)

    Weak concept NACK. sh scripts are generally preferable over Python. You don't need to run tests to use Bitcoin Core, but verifying is kind of important...

    (Weak because only embedded would be missing Python, and ... we use a lot of resources anyway)

    Though I get your point and agree that it's good to minimize dependencies, I'd argue that the advantage of increased readability/maintainability outweighs the drawback in this case. Looking at the other external dependencies in the script, I think having wget and pgp available on an embedded target is even much less likely than a Python interpreter... Not even being able to do something seemingly simple as calculating a SHA256 in a shell script without pain (as pointed out by laanwj) doesn't convince me either to the "sh scripts are generally preferable over Python" argument.

  20. MarcoFalke commented at 4:53 PM on January 21, 2021: member

    Concept ACK

  21. laanwj commented at 7:03 PM on January 29, 2021: member

    Tested and code review ACK c86b9a65eb0d6d1e659415880702c4dc889c34e6

  22. laanwj merged this on Jan 29, 2021
  23. laanwj closed this on Jan 29, 2021

  24. in contrib/verifybinaries/verify.py:71 in c86b9a65eb
      66 | +        contents2 = file2.read()
      67 | +    return contents1 == contents2
      68 | +
      69 | +
      70 | +def verify_with_gpg(signature_filename, output_filename):
      71 | +    result = subprocess.run(['gpg', '--yes', '--decrypt', '--output',
    


    MarcoFalke commented at 8:05 AM on January 31, 2021:

    Obviously unrelated, but I wondered why we'd use --decrypt to --verify when the files aren't encrypted


    MarcoFalke commented at 8:15 AM on January 31, 2021:

    Oh, I see it is used to protect against attacks where some parts of the file are not covered by the signature.


    shesek commented at 2:31 AM on February 1, 2021:

    What attack are you referring to?

    My understanding is that --decrypt is used here because it makes gpg output the signed message to STDOUT (or the file specified by --output), without the header and signature, while --verify won't give you that.

    I commonly use it like that: curl https://foo.bar/SHA256SUMS.asc | gpg --decrypt - | sha256sum -c -.

    Yes, it is kind of strange that its called decrypt when in reality it doesn't decrypt anything...


    shesek commented at 2:46 AM on February 1, 2021:

    Another thing to note is that gpg checks the signature against any of the public keys in the local keyring. This can be improved by creating a keyring with just Bitcoin Core's signing key and verifying against that:

    $ gpg --export 90C8019E36C2E964 |  gpg --no-default-keyring --keyring ./bitcoincore.keyring --import
    $ gpg --no-default-keyring --keyring ./bitcoincore.keyring --decrypt SHA256SUMS.asc
    

    (Note the ./ for --keyring -- the keyring file will be created in ~/.gnupg instead of the PWD without it)

    I would also consider switching to gpgv, which is part of the gnupg suite but does only signature verification, is much smaller/simpler, can be installed independently and is more widely available (i.e. its usually there on these "slim" docker images and similar barer-bones environments).


    MarcoFalke commented at 6:25 AM on February 1, 2021:

    You can append/prepend any data to any historically valid signature and it will happily "verify". E.g.

    gpg --verify SHA256SUMS.asc && tail -2 SHA256SUMS.asc 
    gpg: Signature made Wed 03 Jun 2020 11:59:52 AM CEST
    gpg:                using RSA key 90C8019E36C2E964
    gpg: Good signature ...
    ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff  bitcoin-22.0-windows.zip
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  bitcoin-22.0-linux.zip
    
  25. in contrib/verifybinaries/verify.py:54 in c86b9a65eb
      49 | +
      50 | +def download_with_wget(remote_file, local_file=None):
      51 | +    if local_file:
      52 | +        wget_args = ['wget', '-O', local_file, remote_file]
      53 | +    else:
      54 | +        # use timestamping mechanism if local filename is not explicitely set
    


    hebasto commented at 12:23 PM on February 1, 2021:

    typo: explicitely ==> explicitly

  26. in contrib/verifybinaries/verify.py:88 in c86b9a65eb
      83 | +    # sanity check
      84 | +    if len(args) < 1:
      85 | +        print("Error: need to specify a version on the command line")
      86 | +        return 3
      87 | +
      88 | +    # determine remote dir dependend on provided version string
    


    hebasto commented at 12:25 PM on February 1, 2021:

    typo: dependend ==> dependent

  27. DrahtBot locked this on Aug 16, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-14 21:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me