Migrate verify-commits script to python, run in travis #13066

pull ken2812221 wants to merge 1 commits into bitcoin:master from ken2812221:verify-commits changing 7 files +169 −161
  1. ken2812221 commented at 12:42 PM on April 24, 2018: contributor

    The cron job that runs every day would fail because of git checkout a single commit, not a branch.

    #12708 introduce a method to check whether merges are clean. However, there are four merges are not clean. So, I add a list of merges that are dirty and ignore them.

    Also, I modify the current shell script to python, it makes the script speed up a lot. The python code tree_sha512sum was copied from github-merge.py

    I've re-designed this. Now we verify all the things by default.

    • Add --disable-tree-check option, not to check SHA-512 tree
    • Add --clean-merge NUMBER option, only verify commits after <NUMBER> days ago

    Travis running time:

    option time
    verify-commits.py 25m47.02s(1547.02s)
    verify-commits.py --disable-tree-check 19m10.08s(1150.08s)
    verify-commits.py --clean-merge 30 9m18.18s(558.18s)
    verify-commits.py --disable-tree-check --clean-merge 30 1m16.51s(76.51s)

    Since the cron job always fail, I've created a respository to verify this daily. Build Status

  2. ken2812221 renamed this:
    Make travis to run verify-commits
    Make travis run verify-commits
    on Apr 24, 2018
  3. ken2812221 force-pushed on Apr 24, 2018
  4. fanquake added the label Tests on Apr 24, 2018
  5. MarcoFalke commented at 1:35 PM on April 24, 2018: member

    The cron job that runs every day would fail because of git checkout a single commit, not a branch.

    Mind to elaborate why it wouldn't work on a "single commit"?

  6. ken2812221 commented at 1:49 PM on April 24, 2018: contributor

    git rev-parse --abbrev-ref HEAD does not work with commit, it always returns HEAD. And git would not return to latest commit after the script. But I don't know if this is the exact reason. I just test on my PC, the script failed to check signature at random commit.

  7. laanwj commented at 3:27 PM on April 24, 2018: member

    Also, I modify the current shell script to python, it makes the script speed up a lot. The python code tree_sha512sum was copied from github-merge.py

    Thanks! Yes, the current implementation has been slow especially the treehash512 checking, my implementation from github-merge.py only uses a single call to ls-tree which makes it more efficient.

    Concept ACK.

  8. MarcoFalke commented at 3:35 PM on April 24, 2018: member

    I think we should remove the shell script after this has been converted to python. Objections, @TheBlueMatt ?

  9. in contrib/verify-commits/verify-commits.py:80 in 902e9f9866 outdated
      75 | +    VERIFY_TREE = True if len(sys.argv) >= 3 and sys.argv[2] == "--tree-check" else False
      76 | +    NO_SHA1 = True
      77 | +    PREV_COMMIT = ""
      78 | +    INITIAL_COMMIT = CURRENT_COMMIT
      79 | +
      80 | +    BRANCH = subprocess.check_output([GIT,'rev-parse','--abbrev-ref','HEAD'], universal_newlines=True).splitlines()[0]
    


    MarcoFalke commented at 3:40 PM on April 24, 2018:

    Shouldn't this be INITIAL_COMMIT instead of HEAD?


    laanwj commented at 3:46 PM on April 24, 2018:

    It's a direct transcription from the shell script at least:

    BRANCH="$(git rev-parse --abbrev-ref HEAD)"
    

    ken2812221 commented at 3:49 PM on April 24, 2018:

    No, it is a string 'HEAD', it would return a branch name. If you put a commit id, it returns nothing.


    MarcoFalke commented at 4:00 PM on April 24, 2018:

    'HEAD', it would return a branch name. If you put a commit id, it returns nothing.

    It always returns a commit id for me.


    ken2812221 commented at 6:24 AM on April 25, 2018:

    Well, the git version might be different. I'm using git 2.1.4


    MarcoFalke commented at 3:13 PM on April 25, 2018:

    Oh yeah, that's it. To me this seems like a thing to fix (The script shouldn't depends on what version of git you are using)

    Also, I still think that the correct thing to use here is INITIAL_COMMIT. I.e. git log --format="%H" -1 "${INITIAL_COMMIT}". Otherwise a comment might be appropriate to say why HEAD is used here.


    ken2812221 commented at 5:35 PM on April 25, 2018:

    I use it just because the original shell script, so I change it to git show -s --format="%H" ${INITIAL_COMMIT} now, and looks like we don't need to change the travis script.

  10. sipa commented at 6:17 PM on April 24, 2018: member

    All the unclean merge commits are from 2016 or earlier. Perhaps instead there can just be a threshold that skips the validation below a certain age (say, 1 month)?

  11. ken2812221 force-pushed on Apr 25, 2018
  12. ken2812221 force-pushed on Apr 25, 2018
  13. ken2812221 commented at 6:16 AM on April 25, 2018: contributor

    All the unclean merge commits are from 2016 or earlier. Perhaps instead there can just be a threshold that skips the validation below a certain age (say, 1 month)?

    Or do we need a trusted-clean-merge-root? This can be when that merge policy start.

  14. ken2812221 commented at 1:44 PM on April 25, 2018: contributor

    Test for only checking commits in one month: https://travis-ci.org/ken2812221/bitcoin/jobs/371267929

  15. ken2812221 force-pushed on Apr 25, 2018
  16. ken2812221 force-pushed on Apr 25, 2018
  17. ken2812221 force-pushed on Apr 25, 2018
  18. ken2812221 commented at 8:50 PM on April 25, 2018: contributor

    Squashed and delete shell script.

    We can run the run the script in about 80 seconds, so can we revert fa6f12af6baa97961dc4ae03708bd52cb7918ea0?

  19. ken2812221 force-pushed on Apr 25, 2018
  20. ken2812221 force-pushed on Apr 26, 2018
  21. ken2812221 force-pushed on Apr 26, 2018
  22. jonasschnelli commented at 8:14 AM on April 26, 2018: contributor

    I think the PR title should be "Migrate verify-commits script to python, run in travis" (or similar)

  23. ken2812221 renamed this:
    Make travis run verify-commits
    Migrate verify-commits script to python, run in travis
    on Apr 26, 2018
  24. ken2812221 commented at 8:17 AM on April 26, 2018: contributor
  25. laanwj commented at 2:04 PM on April 27, 2018: member

    I did a timing comparison locally, this is an impressive win (given that the checks are correct):

    contrib/verify-commits/verify-commits.py 
    (all the way to dc0305d15aa02819cd4763e1efe3876d674faea7, after verifying 2627)
    real    5m36.844s
    user    3m52.696s
    sys     1m1.720s
    
    contrib/verify-commits/verify-commits.sh 
    (and it wasn't even completely finished yet, only up to commit 6052d509105790a26b3ad5df43dd61e7f1b24a12, after verifying 1940)
    real    32m1.084s
    user    25m57.068s
    sys     4m16.484s
    
  26. laanwj commented at 2:54 PM on April 27, 2018: member

    IRC discussion:

    <wumpus> ken2812221: so to be clear: the only expected difference in behavior between your python script and the shell script is that your script only verifies merges up to a certain depth? <wumpus> (which is a sensible change, as the shell script throws an error there, not allowing it to verify the gpg signatures deeper either) <ken2812221> Yes.

    As far as I see this is the case, so utACK 058d18bdf302976b6d59752ab816463c89f70404.

  27. laanwj requested review from TheBlueMatt on Apr 27, 2018
  28. TheBlueMatt commented at 4:52 PM on April 27, 2018: member

    I'm fine with migrating this to python, but I certainly dont know python well enough to verify something that is intended to be a secure way of checking our git tree, nor maintain it in the future. If this change means I dont have to and others will, however, thats fine.

  29. TheBlueMatt commented at 4:54 PM on April 27, 2018: member

    NACK changing the script to only verify up to N commits instead of up to a trusted root by default. Its fine to do that on travis but that obviously defeats the point of "checking the git commits come from a trusted source and don't just have a different timestamp on them".

  30. ken2812221 commented at 8:06 PM on April 27, 2018: contributor

    I have two ways to do this. The first, add allow-unclean-merge-commits. The list is following.

    6052d509105790a26b3ad5df43dd61e7f1b24a12
    3798e5de334c3deb5f71302b782f6b8fbd5087f1
    326ffed09bfcc209a2efd6a2ebc69edf6bd200b5
    97d83739db0631be5d4ba86af3616014652c00ec
    

    The second way is add trusted-unclean-merge-commit-root. It would be

    6052d509105790a26b3ad5df43dd61e7f1b24a12
    

    @TheBlueMatt Which way would you prefer?

  31. MarcoFalke commented at 8:10 PM on April 27, 2018: member

    Given that the speedup is just based on skipping verification, I'd slightly prefer to stick around with the bash script for now and do the python transcribe in a follow up pr.

  32. ken2812221 commented at 8:16 PM on April 27, 2018: contributor

    @MarcoFalke If we do the entire verification in python, it still just take 18 mins on travis.

  33. ken2812221 force-pushed on May 1, 2018
  34. ken2812221 commented at 3:37 AM on May 1, 2018: contributor

    I've re-designed this. Now we verify all the things by default.

    • Add --disable-tree-check option, not to check SHA-512 tree
    • Add --clean-merge NUMBER option, only verify commits after <NUMBER> days ago

    Travis running time:

    option time
    verify-commits.py 25m47.02s(1547.02s)
    verify-commits.py --disable-tree-check 19m10.08s(1150.08s)
    verify-commits.py --clean-merge 30 9m18.18s(558.18s)
    verify-commits.py --disable-tree-check --clean-merge 30 1m16.51s(76.51s)

    Since the cron job always fail, I've created a respository to verify this daily. Build Status

  35. in contrib/verify-commits/verify-commits.py:8 in e537dc7d21 outdated
       0 | @@ -0,0 +1,131 @@
       1 | +#!/usr/bin/env python3
       2 | +import argparse
       3 | +import hashlib
       4 | +import os
       5 | +import subprocess
       6 | +import sys
       7 | +import time
       8 | +from subprocess import PIPE
    


    jnewbery commented at 3:16 PM on May 8, 2018:

    You're already importing subprocess. Suggest you just use subprocess.PIPE throughout.

    Same for sys.stderr below.

  36. ken2812221 force-pushed on May 10, 2018
  37. ken2812221 force-pushed on May 10, 2018
  38. ken2812221 force-pushed on Jun 3, 2018
  39. in contrib/verify-commits/verify-commits.py:1 in 3ef8327086 outdated
       0 | @@ -0,0 +1,131 @@
       1 | +#!/usr/bin/env python3
    


    jnewbery commented at 4:35 PM on June 11, 2018:

    Please include a copyright notice and module docstring, eg:

    # Copyright (c) 2018 The Bitcoin Core developers
    # Distributed under the MIT software license, see the accompanying
    # file COPYING or http://www.opensource.org/licenses/mit-license.php.
    """Verify commits against a trusted keys list."""
    
  40. in contrib/verify-commits/verify-commits.py:20 in 3ef8327086 outdated
       9 | +from sys import stderr
      10 | +
      11 | +GIT = os.getenv('GIT','git')
      12 | +
      13 | +def tree_sha512sum(commit='HEAD'):
      14 | +    # request metadata for entire tree, recursively
    


    jnewbery commented at 4:38 PM on June 11, 2018:

    Nit: update the comment to say that this is copied from github-merge.py

  41. in contrib/verify-commits/verify-commits.py:68 in 3ef8327086 outdated
      57 | +    p.stdin.close()
      58 | +    if p.wait():
      59 | +        raise IOError('Non-zero return value executing git cat-file')
      60 | +    return overall.hexdigest()
      61 | +
      62 | +def main():
    


    jnewbery commented at 4:39 PM on June 11, 2018:

    A few line breaks would be helpful here!

  42. jnewbery commented at 7:24 PM on June 11, 2018: member

    utACK 3ef832708661ddcfa9b38b5de68c86aeab6a9085.

    This looks generally good to me. I started adding some nits but decided to just go ahead and implement the changes myself here: https://github.com/jnewbery/bitcoin/tree/pr13066.1 . Feel free to take as much of that as you want. It's mostly nits and style - adding comments, fixing flake8 warnings, etc.

    I'm not a shell expert so something subtle may have escaped me, but it looks functionally the same as verify-commits.sh.

    I haven't verified the incorrect sha 512 commits and unclean merge commits.

  43. ken2812221 commented at 12:19 PM on June 12, 2018: contributor

    @jnewbery Thanks for your review! I would like to take your nits.

  44. jnewbery commented at 2:31 PM on June 12, 2018: member

    I would like to take your nits.

    Great! Feel free to squash the changes into your commit.

    I've now tested this and verified that:

    • f8feaa4636260b599294c7285bcf1c8b7737f74e and 8040ae6fc576e9504186f2ae3ff2c8125de1095c do have missing/incorrect Tree_SHA512 digests in the git logs, and do not contain malicious code.
    • 6052d509105790a26b3ad5df43dd61e7f1b24a12, 3798e5de334c3deb5f71302b782f6b8fbd5087f1, 326ffed09bfcc209a2efd6a2ebc69edf6bd200b5 and 97d83739db0631be5d4ba86af3616014652c00ec are not clean merges and do not contain malicious code.

    Tested ACK 8c1515e4393afe12d76f12565f712ca8b26152a8.

  45. Use python instead of slow shell script on verify-commits e5b2cd8e75
  46. ken2812221 force-pushed on Jun 12, 2018
  47. ken2812221 commented at 2:49 PM on June 12, 2018: contributor

    Squashed

  48. TheBlueMatt commented at 3:06 PM on June 12, 2018: member

    I'm still wondering why migrate to python? It seems like most of the new code just does the same stuff that the old code did but instead of bash pipes it does a bunch of stdin/out reading, which I find way less readable.

    Definitely Concept ACK the allow-unclean-merge-commits file and just making travis only go back a ways instead of doing a full check.

  49. ken2812221 commented at 3:15 PM on June 12, 2018: contributor

    @TheBlueMatt The major reason I do this PR is that bash script is too slow. It took about three hours to verify commits to trusted git root and unclean merges with bash and took about one hour with python on my PC.

    Also, current bash script only check Tree-SHA512 with commit HEAD, it would take even longer if we check it every commit.

  50. laanwj merged this on Jun 12, 2018
  51. laanwj closed this on Jun 12, 2018

  52. laanwj referenced this in commit fa4b9065a8 on Jun 12, 2018
  53. ken2812221 deleted the branch on Jun 12, 2018
  54. in contrib/verify-commits/verify-commits.py:63 in e5b2cd8e75
      58 | +        # update overall hash with file hash
      59 | +        overall.update(dig.encode("utf-8"))
      60 | +        overall.update("  ".encode("utf-8"))
      61 | +        overall.update(f)
      62 | +        overall.update("\n".encode("utf-8"))
      63 | +    p.stdin.close()
    


    ryanofsky commented at 3:52 PM on June 12, 2018:

    It seems possible this code could hang depending on internals of git or the operating system, if reads from stdin are buffered. I think moving this stdin.close() line up to where the stdin.flush() is could avoid this. Another option is to use https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate, which avoids deadlocks of this type.

  55. in contrib/verify-commits/verify-commits.py:57 in e5b2cd8e75
      52 | +                intern.update(piece)
      53 | +            else:
      54 | +                raise IOError('Premature EOF reading git cat-file output')
      55 | +            ptr += bs
      56 | +        dig = intern.hexdigest()
      57 | +        assert p.stdout.read(1) == b'\n'  # ignore LF that follows blob data
    


    ryanofsky commented at 3:55 PM on June 12, 2018:

    It's better to move code that has side effects outside of assert statements, because assert statements are stripped out with python's -O option.

  56. MarcoFalke referenced this in commit 4b1edd3185 on Jun 13, 2018
  57. UdjinM6 referenced this in commit f92a7b8e6e on May 5, 2021
  58. UdjinM6 referenced this in commit 9dac547275 on May 6, 2021
  59. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-17 09:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me