lint: PR description linter #18399

pull fjahr wants to merge 1 commits into bitcoin:master from fjahr:pr-desc-lint changing 3 files +83 −0
  1. fjahr commented at 5:17 PM on March 21, 2020: member

    This is a proof of concept of my idea briefly outlined here #18380 (comment). The code is partially based on the merge script.

    The new linter uses the pull request number available in the Travis env and the Github API to retrieve the PR description and checks it for common errors, currently @-prefixed GH usernames and fractional left-overs of the pull request template (identified by HTML comments). Currently, users mostly have to be reminded here manually to fix these things or they are caught in the merge process. Obviously this linter can be expanded to check for more similar issues, those are just the ones I am familiar with. For example, usernames in ACK messages could be added as well, as mentioned here: https://github.com/bitcoin-core/bitcoin-maintainer-tools/pull/51

    If #18380 gets merged, the HTML comments part is useless but we could put a unique string in the bottom of the pull request template and then match against that. Alternatively, this PR could be a replacement for #18358

    Open question: I was not sure if the script should be more forgiving. It expects to be run in Travis and otherwise fails. I think this makes sense because the user would have to provide the PR number manually if they want to run it locally. It also fails if there is an issue with the Github API which seems to have problems quite frequently. On the other hand, if Github is down, the CI checks should probably not be running anyway.

  2. fjahr force-pushed on Mar 21, 2020
  3. fjahr commented at 5:44 PM on March 21, 2020: member

    Rate limiting of the Travis IPs seems to make the use of an authenticated request mandatory, added that.

  4. fjahr renamed this:
    PR description linter
    lint: PR description linter
    on Mar 21, 2020
  5. DrahtBot added the label Tests on Mar 21, 2020
  6. laanwj commented at 5:42 PM on March 22, 2020: member

    Concept ACK, thanks for working on this.

  7. practicalswift commented at 8:26 PM on March 22, 2020: contributor

    Concept ACK: Automation is good

  8. promag commented at 11:49 AM on March 23, 2020: member

    Concept ACK.

  9. in test/lint/check-pr-description.py:72 in 1f866ba44f outdated
      67 | +
      68 | +    # Good enough username regex
      69 | +    gh_username = r"/\B@([a-z0-9](?:-?[a-z0-9]){0,38})/gi"
      70 | +    assert not bool(re.search(gh_username, body)), "Please remove any GitHub @-prefixed usernames from your PR description"
      71 | +
      72 | +    html_comment_start = r"/<!--/"
    


    theStack commented at 1:22 PM on March 25, 2020:

    I guess you were inspired by the JavaScript/ECMAScript syntax here with slashes as regex delimiters (https://javascript.info/regexp-escaping#a-slash)? In Python regexs, those don't have any special meaning, as I just tried out, hence they are also part of the searched string (and thus HTML comments are never found!).

        html_comment_start = "<!--"
    

    fjahr commented at 9:38 PM on March 25, 2020:

    Fixed! I think it was more some of my ruby background showing but also I was not sure if I would get ACKs for this so I did not do proper testing yet. Thanks a lot!

  10. in test/lint/check-pr-description.py:74 in 1f866ba44f outdated
      69 | +    gh_username = r"/\B@([a-z0-9](?:-?[a-z0-9]){0,38})/gi"
      70 | +    assert not bool(re.search(gh_username, body)), "Please remove any GitHub @-prefixed usernames from your PR description"
      71 | +
      72 | +    html_comment_start = r"/<!--/"
      73 | +    assert not bool(re.search(html_comment_start, body)), "Please remove the pull request template from your PR description"
      74 | +    html_comment_end = r"/-->/"
    


    theStack commented at 1:25 PM on March 25, 2020:

    See my comment just two lines above. By the way, in Python the prefix "r" has nothing to do with regexs per se, but just indicates raw strings, i.e. no need to escape special characters, but our search string doesn't contain any of those anyway.

        html_comment_end = "-->"
    

    fjahr commented at 9:38 PM on March 25, 2020:

    Fixed!

  11. theStack commented at 1:32 PM on March 25, 2020: member

    Concept ACK

    Testing the script, I couldn't trigger a linting error with the script when run on a PR with HTML comments in the description. E.g. run on PR #18376 $ TRAVIS_PULL_REQUEST=18376 ./check-pr-description.py didn't output anything when it should, see https://github.com/bitcoin/bitcoin/commit/a421e0a22f1230abd69b4661a019bed39b72205f for the commit message. That likely has to do with the search strings that have to be adapted (see my two code review comments plus change suggestions).

  12. MarcoFalke commented at 2:30 PM on March 25, 2020: member

    Not sure if this is worth it. If it is extended to check for @ in any comment (as suggested in the OP), it is a DOS vector where anyone can block the travis run/make it fail by just posting a comment. In that case maintainers would have to run around and remove the @ to just get travis to run.

    I think it should be sufficient to have the check in the merge script

  13. fjahr commented at 2:32 PM on March 25, 2020: member

    Not sure if this is worth it. If it is extended to check for @ in any comment (as suggested in the OP), it is a DOS vector where anyone can block the travis run/make it fail by just posting a comment. In that case maintainers would have to run around and remove the @ to just get travis to run.

    I think it should be sufficient to have the check in the merge script

    That makes a lot of sense, thanks!

  14. fjahr force-pushed on Mar 25, 2020
  15. fjahr commented at 9:45 PM on March 25, 2020: member

    Thanks to @theStack 's feedback the regex should work properly now. I also simplified the GH username regex and added some documentation.

    Here are some test vectors:

    Fail due to HTML tags:

    $ TRAVIS_PULL_REQUEST=16439 ./check-pr-description.py
    

    Fail due to GH username:

    $ TRAVIS_PULL_REQUEST=18376 ./check-pr-description.py
    

    Since these are active the descriptions could be fixed any day so if it does not work withthese please check first if that is the case.

    For the fun of it, you can test ranges of PRs like so but you will see 404 errors for each number that is an issue and not a pull:

    COUNTER=18000; while [ $COUNTER -lt 18020 ]; do TRAVIS_PULL_REQUEST=$COUNTER test/lint/check-pr-description.py; let COUNTER+=1; done
    
  16. fjahr commented at 10:15 PM on March 25, 2020: member

    I thought there was a github access token available in the git config inside Travis but it appears their isn't and not running into rate limiting last time was a false positive. I am trying to find out the variable name if there is one available in the Travis environment, otherwise it would need to be added.

  17. theStack commented at 10:32 PM on March 25, 2020: member

    FWIW, with the help of a quick git log --grep="-->" and further manual inspection I could find two false positives, i.e. PRs that contain no HTML comments but still the linter would complain:

    • #17283 ($ TRAVIS_PULL_REQUEST=17283 ./check-pr-description.py)
    • #12759 ($ TRAVIS_PULL_REQUEST=12759 ./check-pr-description.py)
  18. fjahr commented at 1:28 PM on March 30, 2020: member

    FWIW, with the help of a quick git log --grep="-->" and further manual inspection I could find two false positives, i.e. PRs that contain no HTML comments but still the linter would complain:

    • #17283 ($ TRAVIS_PULL_REQUEST=17283 ./check-pr-description.py)
    • #12759 ($ TRAVIS_PULL_REQUEST=12759 ./check-pr-description.py)

    Thanks for discovering these. So yes, the current version does give false positives any time someone wants to write something in the PR description that actually includes HTML comments. The only effective way I could think of to mitigate that would be to use the solution with the unique string in the template. It would work like this: at the bottom of the template, there is something like TEMPLEATE_END_PLEASE_REMOVE_ME or so and then this linter checks for that string. I am happy to hear feedback if that solution is preferred to prevent these false positives.

  19. fjahr force-pushed on Apr 3, 2020
  20. fjahr force-pushed on Apr 3, 2020
  21. lint: Add PR description linter 0f56acbc3c
  22. fjahr force-pushed on Apr 3, 2020
  23. fjahr commented at 2:54 PM on April 4, 2020: member

    Unfortunately, it looks like this is blocked. I was trying to get a Github Oauth Token into the Travis build environment as a secure (encrypted) variable but this variable can still not be accessed by pull requests triggered through forks since the variable would be exposed to unknown code (see travis docs) This makes sense and I currently only see this working if the pull body does get added to the standard Travis environment variables. There exists an issue for this https://github.com/travis-ci/travis-ci/issues/9302 and a pull request is open to add the title at least https://github.com/travis-ci/travis-build/pull/1319 but it has been open for 2 years.

    I am happy to hear if there are other ideas on how to achieve this. Otherwise, I will close this and reopen when the pull body becomes available in Travis somehow. But I don't think there are any other obvious solutions as they should have come up in the Travis issues on this topic.

  24. fjahr closed this on Apr 4, 2020

  25. DrahtBot locked this on Feb 15, 2022

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-15 15:14 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me