contrib: replace binary verification script verify.sh with python rewrite #20689

theStack commented at 7:00 PM on December 17, 2020: contributor

The rationale for the PR is the same as for #18132:

Most of our test scripts are written in python. We don't have enough reviewers for bash scripts and they tend to be clumsy anyway. Especially when it comes to argument parsing.

Note that there are still a lot of things that could be improved in this replacement (e.g. using regexps for version string parsing, adding type annotations, dividing up into more functions, getting a pylint score closer to 10, etc.), but I found the original shell script quite hard to read, so it's possibly still a good first step for an improvement. ~Not sure though if it's worth the reviewers time, and if it's even continued to be used long-term (maybe there are plans to merge it with get_previous_releases.py, which partly does the same?), so chasing for Concept ACKs right now.~

DrahtBot added the label Docs on Dec 17, 2020

DrahtBot added the label Scripts and tools on Dec 17, 2020

jonatack commented at 7:38 PM on December 17, 2020: contributor

Concept ACK (modulo context / future plans for this script; I don't know).

in contrib/verifybinaries/verify.py:75 in bb801302ee outdated

  70 | +        return False
  71 | +    subprocess.run(f"mv {filename}-tmp {filename}", shell=True)
  72 | +    return True
  73 | +
  74 | +
  75 | +def verify_with_sha256sum(files_to_verify, hashfile):

laanwj commented at 11:30 AM on December 18, 2020:

This is a missed opportunity; please use python's internal hashing instead of calling out to an utility here.

theStack commented at 7:12 PM on December 18, 2020:

Good point! When I thought about using internal hashing I was initially put off about potentially loading dozens of MBs into memory (I guess sha256sum does this in a smart way by loading chunk after chunk, instead of the whole file at once), but OTOH this should not be a problem at all and the advantage of getting rid of an external tool outweighs this by far.

shesek commented at 7:53 PM on January 29, 2021:

I'm late to the party, but you can do stream hashing in python:

import hashlib as hash

CHUNKSIZE = 65536

sha = hash.sha256()
with open('/path/to/foo.bar', 'rb') as fh:
    buff = fh.read(CHUNKSIZE)
    while len(buff) > 0:
        sha.update(buff)
        buff = fh.read(CHUNKSIZE)
digest = sha.hexdigest()

theStack commented at 9:26 PM on January 29, 2021:

@shesek: Thanks for posting! I didn't doubt that stream hashing is possible at all in python, I just was put off by the idea of partly reinventing the wheel of sha256 here and investing more than 2-3 LOC for getting the hash of a file.

I'm late to the party, ...

Well, luckily in open source development there is no "late to the party", you can always open a follow-up PR :)

laanwj commented at 11:31 AM on December 18, 2020: member

Concept ACK, I generally prefer python to shell code as things become more complex, I find it easier to review and structure. It also doesn't need to rely so much on tooling that happens to be scattered around the system (like your OS's favorite command line sha256 hashers).

practicalswift commented at 2:38 PM on December 18, 2020: contributor

Concept ACK on the general idea of replacing this shell script with a Python script.

The current version of the Python script shells out to wget, gpg, grep, mv, diff and sha256sum.

AFAICT the only necessary external command that would be non-trivial to replace by using only what is supplied by the Python standard library is gpg.

Would it be possible to re-write this script to make it shell out only to gpg?

theStack commented at 7:12 PM on December 18, 2020: contributor

Thanks for all the conceptual reviews! I agree with the idea to reduce calls to external tools, if we move to a language that has 'batteries included' we should also use them to be more deterministic, robust and platform-independent.

It also doesn't need to rely so much on tooling that happens to be scattered around the system (like your OS's favorite command line sha256 hashers).

I didn't know it was that bad and not even sha256sum is available consistently across Linux and *BSDs 👀

Would it be possible to re-write this script to make it shell out only to gpg?

Challenge accepted :-)

theStack force-pushed on Dec 20, 2020

theStack commented at 11:25 PM on December 20, 2020: contributor

Force-pushed with the following changes, eliminating several external tool call (subprocess.run) instances, according to the reviewers suggestions:

replaced call to external grep and mv with simple manual filtering (we don't need the contents in a file anyways)
replaced call to external sha256sum with python's internal hashing (hashlib.sha256)
replaced call to external diff with a trivial comparison routine (new function files_are_equal(...))

For now I decided to keep the call to wget for two reasons:

we currently take advantage of its timestamping function (parameter -N, see https://linux.die.net/man/1/wget), that only downloads a file if both local and remote timestamp and size differ: "When running Wget with -N, with or without -r or -p, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file." This is useful for not needing to re-download binaries that are already there, and I don't think its worth it to implement a similar mechanism in the script.
IMHO the verbose step-by-step-output in case of failure can be quite useful for diagnosing the problem, that would possibly be quite hard to get in the same detail level in python (w/o possibly catching lots of individual exceptions), e.g.:

wget output:
        --2020-12-21 00:17:29--  https://bitcoincore.org/bin/bitcoin-core-0.20.0/test.rc23/SHA256SUMS.asc
        Resolving bitcoincore.org (bitcoincore.org)... 198.251.83.116, 107.191.99.5
        Connecting to bitcoincore.org (bitcoincore.org)|198.251.83.116|:443... connected.
        HTTP request sent, awaiting response... 404 Not Found
        2020-12-21 00:17:29 ERROR 404: Not Found.

So the remaining external calls would be to gpg and wget right now. Happy though to also get rid of the latter one, if that is the general wish.

laanwj commented at 9:23 AM on December 21, 2020: member

I was thinking of this script yesterday during a twitter thread, and wondered if it would be useful to make it do verification against gitian deterministic build attestations. This would be more thorough than validating my signature. At the least rewriting it in Python makes it easier to do this.

This is out of scope for this PR of course.

So the remaining external calls would be to gpg and wget right now. Happy though to also get rid of the latter one, if that is the general wish.

wget is not low-hanging fruit in this regard. It's fairly easy to implement a poor man's wget in Python, but adding other features such as a progress bar makes it a lot more work. Especially as we don't want to introduce external Python dependencies. Could also be done in a later PR. I think it's fine like this.

laanwj commented at 9:34 AM on December 21, 2020: member

Testing a bit

$ ./verify.py 0.20.1-linux
Downloading bitcoin-0.20.1-aarch64-linux-gnu.tar.gz
Downloading bitcoin-0.20.1-arm-linux-gnueabihf.tar.gz
Downloading bitcoin-0.20.1-riscv64-linux-gnu.tar.gz
Downloading bitcoin-0.20.1-x86_64-linux-gnu.tar.gz
Keep the binaries in /tmp/bitcoin_verify_binaries
Verified hashes of
bitcoin-0.20.1-aarch64-linux-gnu.tar.gz
bitcoin-0.20.1-arm-linux-gnueabihf.tar.gz
bitcoin-0.20.1-riscv64-linux-gnu.tar.gz
bitcoin-0.20.1-x86_64-linux-gnu.tar.gz
$ ./verify.py 0.19.0.1-linux
Downloading bitcoin-0.19.0.1-aarch64-linux-gnu.tar.gz
Downloading bitcoin-0.19.0.1-arm-linux-gnueabihf.tar.gz
Downloading bitcoin-0.19.0.1-i686-pc-linux-gnu.tar.gz
Downloading bitcoin-0.19.0.1-riscv64-linux-gnu.tar.gz
Downloading bitcoin-0.19.0.1-x86_64-linux-gnu.tar.gz
Keep the binaries in /tmp/bitcoin_verify_binaries
Verified hashes of
bitcoin-0.19.0.1-aarch64-linux-gnu.tar.gz
bitcoin-0.19.0.1-arm-linux-gnueabihf.tar.gz
bitcoin-0.19.0.1-i686-pc-linux-gnu.tar.gz
bitcoin-0.19.0.1-riscv64-linux-gnu.tar.gz
bitcoin-0.19.0.1-x86_64-linux-gnu.tar.gz
$ ./verify.py 0.20.0-linux  # <---------------------------------------------------
bitcoin.org and bitcoincore.org signature files were not equal?
$ ./verify.py 0.20.0-linux
Downloading bitcoin-0.20.0-aarch64-linux-gnu.tar.gz
Downloading bitcoin-0.20.0-aarch64-linux-gnu.tar.gz
Downloading bitcoin-0.20.0-arm-linux-gnueabihf.tar.gz
Downloading bitcoin-0.20.0-riscv64-linux-gnu.tar.gz
Downloading bitcoin-0.20.0-x86_64-linux-gnu.tar.gz
Keep the binaries in /tmp/bitcoin_verify_binaries
Verified hashes of
bitcoin-0.20.0-aarch64-linux-gnu.tar.gz
bitcoin-0.20.0-arm-linux-gnueabihf.tar.gz
bitcoin-0.20.0-riscv64-linux-gnu.tar.gz
bitcoin-0.20.0-x86_64-linux-gnu.tar.gz

The intermittent issue was strange. I'd love to see more detailed output in case of such an error, or at the least not removing the files immediately.

contrib: binary verification script verify.sh rewritten in python c84838e7af

contrib: remove verify.sh

This script has been replaced by verify.py.

c86b9a65eb

theStack force-pushed on Dec 21, 2020

theStack commented at 1:09 PM on December 21, 2020: contributor

I was thinking of this script yesterday during a twitter thread, and wondered if it would be useful to make it do verification against gitian deterministic build attestations. This would be more thorough than validating my signature. At the least rewriting it in Python makes it easier to do this.

Would that mean that we wouldn't need to check any signature at all, but instead simply fetch each versions ...-build.assert file from the gitian.sigs repository (e.g. from a random contributor) and compare the hashes?

The intermittent issue was strange. I'd love to see more detailed output in case of such an error, or at the least not removing the files immediately.

Thanks for testing! I could reproduce this behaviour with both the shell and the python script by starting with an empty working directory, first verifying version 0.19.1, then 0.18.1:

$ ./verify.sh 0.19.1-linux
Downloading bitcoin-0.19.1-aarch64-linux-gnu.tar.gz
Downloading bitcoin-0.19.1-arm-linux-gnueabihf.tar.gz
Downloading bitcoin-0.19.1-i686-pc-linux-gnu.tar.gz
Downloading bitcoin-0.19.1-riscv64-linux-gnu.tar.gz
Downloading bitcoin-0.19.1-x86_64-linux-gnu.tar.gz
Keep the binaries in /tmp/bitcoin_verify_binaries
Verified hashes of
bitcoin-0.19.1-aarch64-linux-gnu.tar.gz
bitcoin-0.19.1-arm-linux-gnueabihf.tar.gz
bitcoin-0.19.1-i686-pc-linux-gnu.tar.gz
bitcoin-0.19.1-riscv64-linux-gnu.tar.gz
bitcoin-0.19.1-x86_64-linux-gnu.tar.gz
$ ./verify.sh 0.18.1-linux
bitcoin.org and bitcoincore.org signature files were not equal?

This seems to turn up whenever a signature file of an older release is downloaded and it has the same size than the one from the previous run -- using wgets timestamping (-N) is obviously counterproductive in this case, and we should ensure to always freshly fetch the signatures.

Force-pushed with the following changes:

don't use timestamping mechanism for downloading the signature files (only for the release binaries)
if the two signature files don't match, keep them and show a more verbose message with file paths
add missing error message if no files match the specified platform filter

instagibbs commented at 2:24 AM on December 24, 2020: member

bash scripts longer than 5 lines are a mistake ;)

concept ACK

luke-jr commented at 12:43 AM on January 3, 2021: member

Weak concept NACK. sh scripts are generally preferable over Python. You don't need to run tests to use Bitcoin Core, but verifying is kind of important...

(Weak because only embedded would be missing Python, and ... we use a lot of resources anyway)

theStack commented at 6:33 PM on January 17, 2021: contributor

Current conceptual review score state, after 1 month: 4 Concept ACKs (jonatack, laanwj, practicalswift, instagibbs) vs. 1 weak Concept NACK (luke-jr)

Weak concept NACK. sh scripts are generally preferable over Python. You don't need to run tests to use Bitcoin Core, but verifying is kind of important...

(Weak because only embedded would be missing Python, and ... we use a lot of resources anyway)

Though I get your point and agree that it's good to minimize dependencies, I'd argue that the advantage of increased readability/maintainability outweighs the drawback in this case. Looking at the other external dependencies in the script, I think having wget and pgp available on an embedded target is even much less likely than a Python interpreter... Not even being able to do something seemingly simple as calculating a SHA256 in a shell script without pain (as pointed out by laanwj) doesn't convince me either to the "sh scripts are generally preferable over Python" argument.

MarcoFalke commented at 4:53 PM on January 21, 2021: member

Concept ACK

laanwj commented at 7:03 PM on January 29, 2021: member

Tested and code review ACK c86b9a65eb0d6d1e659415880702c4dc889c34e6

laanwj merged this on Jan 29, 2021

laanwj closed this on Jan 29, 2021

in contrib/verifybinaries/verify.py:71 in c86b9a65eb

  66 | +        contents2 = file2.read()
  67 | +    return contents1 == contents2
  68 | +
  69 | +
  70 | +def verify_with_gpg(signature_filename, output_filename):
  71 | +    result = subprocess.run(['gpg', '--yes', '--decrypt', '--output',

MarcoFalke commented at 8:05 AM on January 31, 2021:

Obviously unrelated, but I wondered why we'd use --decrypt to --verify when the files aren't encrypted

MarcoFalke commented at 8:15 AM on January 31, 2021:

Oh, I see it is used to protect against attacks where some parts of the file are not covered by the signature.

shesek commented at 2:31 AM on February 1, 2021:

What attack are you referring to?

My understanding is that --decrypt is used here because it makes gpg output the signed message to STDOUT (or the file specified by --output), without the header and signature, while --verify won't give you that.

I commonly use it like that: curl https://foo.bar/SHA256SUMS.asc | gpg --decrypt - | sha256sum -c -.

Yes, it is kind of strange that its called decrypt when in reality it doesn't decrypt anything...

shesek commented at 2:46 AM on February 1, 2021:

Another thing to note is that gpg checks the signature against any of the public keys in the local keyring. This can be improved by creating a keyring with just Bitcoin Core's signing key and verifying against that:

$ gpg --export 90C8019E36C2E964 |  gpg --no-default-keyring --keyring ./bitcoincore.keyring --import
$ gpg --no-default-keyring --keyring ./bitcoincore.keyring --decrypt SHA256SUMS.asc

(Note the ./ for --keyring -- the keyring file will be created in ~/.gnupg instead of the PWD without it)

I would also consider switching to gpgv, which is part of the gnupg suite but does only signature verification, is much smaller/simpler, can be installed independently and is more widely available (i.e. its usually there on these "slim" docker images and similar barer-bones environments).

MarcoFalke commented at 6:25 AM on February 1, 2021:

You can append/prepend any data to any historically valid signature and it will happily "verify". E.g.

gpg --verify SHA256SUMS.asc && tail -2 SHA256SUMS.asc 
gpg: Signature made Wed 03 Jun 2020 11:59:52 AM CEST
gpg:                using RSA key 90C8019E36C2E964
gpg: Good signature ...
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff  bitcoin-22.0-windows.zip
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  bitcoin-22.0-linux.zip

in contrib/verifybinaries/verify.py:54 in c86b9a65eb

  49 | +
  50 | +def download_with_wget(remote_file, local_file=None):
  51 | +    if local_file:
  52 | +        wget_args = ['wget', '-O', local_file, remote_file]
  53 | +    else:
  54 | +        # use timestamping mechanism if local filename is not explicitely set

hebasto commented at 12:23 PM on February 1, 2021:

typo: explicitely ==> explicitly

in contrib/verifybinaries/verify.py:88 in c86b9a65eb

  83 | +    # sanity check
  84 | +    if len(args) < 1:
  85 | +        print("Error: need to specify a version on the command line")
  86 | +        return 3
  87 | +
  88 | +    # determine remote dir dependend on provided version string

hebasto commented at 12:25 PM on February 1, 2021:

typo: dependend ==> dependent

bitcoin locked this on Aug 16, 2022