Use TravisCI to enforce copyright header rules for source files #9452

pull isle2983 wants to merge 2 commits into bitcoin:master from isle2983:PR-travisci-copyright-enforce changing 13 files +886 −376
  1. isle2983 commented at 6:41 PM on December 31, 2016: contributor

    As the title indicates, this PR adds the enforcement of MIT Licence copyright headers. The idea is that missing headers are caught ASAP and any deviation from the normal The Bitcoin Core developers header needs to be explicitly marked as such in the script.

    In addition this PR does a significant iteration on contrib/devtools/copyright_header.py. Like the preceding version, this script is able to parse through the source files and report on the state of the copyright headers as well as update the year.

    The first major improvement in this new iteration is that it now has a ruleset specific for each subtree to detect the local proper header style (the subtrees being secp256k1, leveldb, univalue and ctaes).

    The second major improvement in this iteration makes more elegant use of Python's regular expressions to detect and manipulate the header text. This approach should be maintainable and extensible while keeping the script's quick execution time.

    To provide the TravisCI integration, a ci_check subcommand is implemented. This executes the logic of the script to catch submissions early where a header was not added pre-merge. A non-zero status is returned to the shell if a problem is found. If everything is fine, TravisCI will look like this:

    noissuesfound

    If, for example, a source file is added with no copyright header it will be errored in TravisCI it like this:

    novalidheader

    If a file has a proper copyright, but is added with some other copyright text such as "Copyright (c) 2016 Some Other Entity", the error will be:

    otherheaderinstance

    In total there are six failure conditions as per what is loaded into the FAILURE_REASONS list of the script. They each have a 'resolution' string attached to them to point the reader to the appropriate resolution - either fix the file (because it is a problem) or update the script (because it is a legitimate change involving non-standard copyright that the script must tolerate). The rules and exceptions for each subtree are loaded into the HEADER_RULES list.

    There is a commit preceding the update to the script which makes small adjustments to the set of MIT copyright headers to make them uniform in instances where they are not. Also, this commit adds the MIT header to contrib/dev/tools/gen-manpages.sh, and hence requires an ACK from @nomnombtc.

  2. fanquake added the label Docs and Output on Jan 1, 2017
  3. TheBlueMatt commented at 3:31 PM on January 1, 2017: member

    Hmm, do we care about enforcing the listed copyright holders? As long as it appears in the MIT header, I'm not sure it matters all that much. Also, would it be simple to enforce that all of the copyright date of files modified (according to git timestamp) after date X (ie ignore the issues from #9450) have the right copyright year on them?

  4. isle2983 commented at 6:29 PM on January 1, 2017: contributor

    If we want it to not care about the holder's name, that would be simple to add in the regex. However, there are two side effects I can think of:

    1. It would then have to tolerate bad spelling/punctuation/capitalization.
    2. It would require some more sophisticated logic in the update part of the script to make sure that the last holder is the one that is incremented and not necessarily the one with The Bitcoin Core developers.

    It would be simple to enforce that the end year gets bumped upon file modification. There are a couple downsides, so I left it out (for now):

    1. It takes a minute or two to pull the file modification history info from git. In CI that probably isn't much of a problem though. In other codebases, I have seen similar checks as part of make test scripting so a check that is reasonably fast is desirable.
    2. It might take a period of adjustment to train everyone to update the year in sync with changes and this might create demand for "increment every file on January 1st dammit!".
  5. nomnombtc commented at 2:17 AM on January 2, 2017: contributor

    ... and hence requires an ACK from @nomnombtc

    Sure whatever. ACK :)

  6. laanwj commented at 8:27 AM on January 2, 2017: member

    To be honest I'd prefer not to add to many of these extra bureaucratic rules to Travis. It's reason for existence is to test the code. If too many auxiliary checks are added it becomes too pesky it increases the chance people will just skip it and merge anyway, as sometimes something needs to be merged.

    Copyright is a grey area but please don't go on to add code style checks and such...

  7. isle2983 commented at 10:01 PM on January 2, 2017: contributor

    Thanks @laanwj - getting your take is most important. Keeping header adjustments periodic and asynchronous is perfectly reasonable.

    The script improvements here are still useful for assessing and fixing up the headers as they drift over time. I have re-submitted as #9459 with the CI part and 'enforcement' language taken out.

    Copyright is a grey area but please don't go on to add code style checks and such...

    Could you clarify your perspective on automated code style checks? Some sense of what is and isn't on the wish list would be helpful.

    The way I see it, there are a couple levels of similar things that can be done for improving code style:

    1. a script to assess and highlight problems with basic style (spaces over tabstops, no trailing whitespace, don't end lines with double semicolons, etc...)
    2. attaching this basic style script to CI for enforcement
    3. a script wrapper to run clang-format/pylint/whatever and highlight deviations from a preferred style
    4. attaching this script wrapper CI to lock down a preferred style in the codebase (probably starting out with an include/exclude list to manage pain, but then ratchet up)

    1 and 3 can be done in a similar way to copyright_header.py to assess and periodically nudge the code in a good direction asynchronous to day-to-day development PRs.

    2 and 4 are more for automating away style nits to better focus on PR contents. Also, ratcheting towards and achieving a consistent code style helps make reading the code easier and bugs shallower. I would expect this to be more controversial since it requires some irritation and short-term pain. However if 1 and 3 are well-built and mature first, demand for CI integration may naturally follow.

  8. TheBlueMatt commented at 11:16 AM on January 3, 2017: member

    @laanwj On the flip side, everything that travis enforces is one less thing we need to check in review...Obviously code style checks aren't coming anytime soon, but I think this is reasonable.

  9. TheBlueMatt commented at 11:17 AM on January 3, 2017: member

    Regarding code style: we're a long way from really enforcing anything regarding code style...would be cool if we could detect indentation errors (since those are likely to introduce real bugs, whereas other code style issues may or may not be as likely), but I'm not sure if we could do ONLY that with many tools.

  10. MarcoFalke commented at 11:20 AM on January 7, 2017: member

    Well, considering white space errors that caused bugs such as #9319 (review) or de8980df9d3a1fc0b257139cef10a0e6ba29a8bd, I think having an indicator that the touched code is clang formatted really makes sense. Some months ago I hacked up a script for travis, which fails when the new code is not formatted, but a failing travis is obviously not something we want in pull requests.

    Would it be possible to have another service run that provides this indicator, but is still not too prevalent so that a "failing" format check can be ignored?

  11. MarcoFalke added the label Brainstorming on Jan 7, 2017
  12. isle2983 commented at 5:41 PM on January 7, 2017: contributor

    I am not deeply familiar with Travis CI, but from looking at their docs there doesn't seem to be an obvious way to 'warn' instead of 'fail'.

    One interesting thing in the docs is a workflow they discuss for coverity scans:

    https://docs.travis-ci.com/user/coverity-scan/#Build-Submission-Frequency

    Similarly, we could have a collection of tools and corresponding side branches that do thorough static code checks in CI. Any found issues can be fixed on those branches and then periodically cherry picked back into the mainline.

  13. Minor adjustments to MIT Licence copyright headers to make uniform b64b1c6000
  14. Use TravisCI to enforce copyright header rules for source files 28f8a3642b
  15. isle2983 force-pushed on Jan 7, 2017
  16. isle2983 commented at 6:12 PM on January 7, 2017: contributor

    Rebased. The conflict was in the files:

    contrib/linearize/linearize-data.py contrib/linearize/linearize-hashes.py share/rpcuser/rpcuser.py

    For three files, the conflict was with 27765b6403cece54320374b37afb01a0cfe571c3 which incremented the end year to 2016.

    For linearize-data.py and linearize-hashes.py it also conflicted with 3c8f63ba7c7be62d462727f4d67633e1064f0f79 which changed the shebang at the top of the file to point to python3

  17. TheBlueMatt commented at 6:50 PM on January 7, 2017: member

    Heh, #9487 is pretty much exactly what I was referring to that it'd be ideal to fix in #9452 (comment).

  18. MarcoFalke commented at 7:51 PM on January 7, 2017: member
  19. fanquake commented at 12:07 AM on January 21, 2017: member

    Closing this in favour of #9603. Enforcing copyright header styling (to a sub-tree level) is currently overkill.

  20. fanquake closed this on Jan 21, 2017

  21. MarcoFalke locked this on Sep 8, 2021

github-metadata-mirror

This is a metadata mirror of the GitHub repository bitcoin/bitcoin. This site is not affiliated with GitHub. Content is generated from a GitHub metadata backup.
generated: 2026-04-15 15:15 UTC

This site is hosted by @0xB10C
More mirrored repositories can be found on mirror.b10c.me