The objective here is to expose data to inform the code style discussion. The tool makes it easy to swap the clang-format binary and the .clang-format style file to help reason about the problem.
This adds contrib/devtools/clang_format.py, with the following usage:
$ contrib/devtools/clang_format.py -h
usage: clang_format.py [-h] [-b BIN_PATH] [-s STYLE_FILE] [-j JOBS] [-f]
                       {report,check,format} [target [target ...]]

A utility for invoking clang-format to observe the state of C++ code
formatting in the repository. It produces reports of style metrics and also
can apply formatting.

positional arguments:
  {report,check,format}
                        Selects the action to be taken. 'report' produces a
                        report with analysis of the selected files taken as a
                        group. 'check' validates that the selected files match
                        the style and gives a per-file report and returns a
                        non-zero bash status if there are any format issues
                        discovered. 'format' applies the style formatting to
                        the selected files.
  target                A list of files and/or directories that select the
                        subset of files for this action. If a directory is
                        given as a target, all files contained in it and its
                        subdirectories are recursively selected. All targets
                        must be tracked in the same git repository clone.
                        (default=The current directory)

optional arguments:
  -h, --help            show this help message and exit
  -b BIN_PATH, --bin-path BIN_PATH
                        The path to the clang-format binary to be used.
                        (default=clang-format-[0-9]\.[0-9] installed in PATH
                        with the highest version number)
  -s STYLE_FILE, --style-file STYLE_FILE
                        The path to the style file to be used. (default=The
                        src/.clang_format file of the repository which holds
                        the targets)
  -j JOBS, --jobs JOBS  Parallel jobs for computing diffs. (default=6)
  -f, --force           Force proceeding with 'check' or 'format' if clang-
                        format doesn't support all parameters in the style
                        file. (default=False)
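For reference, the --bin-path default described in the help text (the highest-versioned clang-format-X.Y found in PATH) could be resolved roughly as below. This is a minimal sketch and not necessarily how the script does it; the function name find_highest_clang_format and its path_dirs parameter are hypothetical.

```python
import os
import re

# Matches names such as clang-format-3.8, following the pattern in the help text.
BIN_RE = re.compile(r"^clang-format-([0-9])\.([0-9])$")


def find_highest_clang_format(path_dirs=None):
    """Return the path of the highest-versioned clang-format-X.Y, or None.

    Hypothetical helper: path_dirs defaults to the directories listed in PATH.
    """
    if path_dirs is None:
        path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    best = None  # ((major, minor), full_path)
    for directory in path_dirs:
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            match = BIN_RE.match(name)
            if not match:
                continue
            full_path = os.path.join(directory, name)
            if not os.access(full_path, os.X_OK):
                continue  # skip files that are not executable
            version = (int(match.group(1)), int(match.group(2)))
            if best is None or version > best[0]:
                best = (version, full_path)
    return best[1] if best else None
```

Comparing (major, minor) tuples rather than the raw strings keeps the ordering numeric.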
The report command looks like this when run against master:
$ contrib/devtools/clang_format.py report
--------------------------------------------------------------------------------
1466 files tracked according to 'git ls-files'
406 files in scope according to SOURCE_FILES and ALWAYS_IGNORE settings
406 files examined according to listed targets
--------------------------------------------------------------------------------
clang-format bin: /usr/lib/llvm-3.8/bin/clang-format
clang-format version: 3.8.0
Using style in: /home/isle/muhbitcoin/src/.clang-format
--------------------------------------------------------------------------------
Parallel jobs for diffs: 6
Elapsed time: 38.60s
Slowest diffs:
33.88s for src/chainparamsseeds.h
3.45s for src/qt/bitcoinstrings.cpp
--------------------------------------------------------------------------------
Files 100% matching: 47
Files <100% matching: 359
Formatted content MD5: 91ee2c53ca93ccca584457770dbf606c
--------------------------------------------------------------------------------
Files 90%-99% matching: 132
Files 80%-89% matching: 151
Files 70%-79% matching: 46
Files 60%-69% matching: 14
Files 50%-59% matching: 10
Files 40%-49% matching: 1
Files 30%-39% matching: 1
Files 20%-29% matching: 1
Files 10%-19% matching: 1
Files 0%- 9% matching: 2
--------------------------------------------------------------------------------
+-------+         +------------+--------+---------+-----------+-------------+
| score |         | pre-format |  added | removed | unchanged | post-format |
+-------+ +-------+------------+--------+---------+-----------+-------------+
|   83% | | lines |     109469 |  16807 |   17917 |     91552 |      108359 |
+-------+ +-------+------------+--------+---------+-----------+-------------+
--------------------------------------------------------------------------------
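The numbers in the summary table are consistent with post-format = pre-format + added - removed, unchanged = pre-format - removed, and score = unchanged / pre-format rounded down (91552 / 109469 is about 83.6%, shown as 83%). That relationship is inferred from the report output above, not taken from the script itself. A rough sketch of such a metric, using Python's difflib and the hypothetical names line_stats and score_percent:

```python
import difflib


def line_stats(original, formatted):
    """Count added, removed, and unchanged lines between two texts."""
    before = original.splitlines()
    after = formatted.splitlines()
    added = removed = unchanged = 0
    matcher = difflib.SequenceMatcher(None, before, after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            unchanged += i2 - i1
        else:
            # 'delete' and 'replace' consume lines of the original;
            # 'insert' and 'replace' produce lines of the formatted text.
            removed += i2 - i1
            added += j2 - j1
    return {
        "pre-format": len(before),
        "added": added,
        "removed": removed,
        "unchanged": unchanged,
        "post-format": len(after),
    }


def score_percent(unchanged, pre_format):
    """Percentage of original lines left untouched, rounded down (assumed)."""
    if pre_format == 0:
        return 100
    return (unchanged * 100) // pre_format
```

With the figures from the report, score_percent(91552, 109469) gives 83.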
It also accepts a list of specific files or directories, which narrows the report to that set instead of the entire repo.
The check subcommand output displays a specific report for each non-matching file in the set of targets and provides a status code to the shell:
$ contrib/devtools/clang_format.py check src/init.cpp
--------------------------------------------------------------------------------
1466 files tracked according to 'git ls-files'
406 files in scope according to SOURCE_FILES and ALWAYS_IGNORE settings
1 files examined according to listed targets
--------------------------------------------------------------------------------
A code format issue was detected in src/init.cpp
+-------+         +------------+--------+---------+-----------+-------------+
| score |         | pre-format |  added | removed | unchanged | post-format |
+-------+ +-------+------------+--------+---------+-----------+-------------+
|   90% | | lines |       1641 |    151 |     163 |      1478 |        1629 |
+-------+ +-------+------------+--------+---------+-----------+-------------+
--------------------------------------------------------------------------------
These files can be auto-formatted by running:
$ contrib/devtools/clang_format.py format [target [target ...]]
--------------------------------------------------------------------------------
*** Format issues found!
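Because check returns a non-zero status when it finds format issues, a wrapper (for example a CI step) can gate on the exit code alone. A minimal sketch, assuming the script is invoked from the repository root as in the examples here; the helper names build_check_command and passes are hypothetical:

```python
import subprocess
import sys


def build_check_command(targets, script="contrib/devtools/clang_format.py"):
    """Build the argv for the 'check' subcommand over the given targets."""
    return [script, "check"] + list(targets)


def passes(command):
    """Run a command and report whether it exited with status 0."""
    return subprocess.call(command) == 0


# Example use (not executed here):
#   if not passes(build_check_command(["src/init.cpp"])):
#       sys.exit(1)
```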
The format subcommand applies clang-format to the set of targets, much like its predecessor clang-format.py (which was recently removed in #9649).
The usage is overloaded for convenience:
To limit the reporting to just src/qt/:
$ contrib/devtools/clang_format.py report src/qt/
To check three random files:
$ contrib/devtools/clang_format.py check src/foo.cpp src/bar.h src/baz.cpp
To apply format to a single random file and also all files under a subdir:
$ contrib/devtools/clang_format.py format src/fizz.cpp src/buzz/
I have tested with all versions 3.4 through 3.9.0 on Ubuntu 14.04 and most of the same up to 3.9.1 on Debian 8. The applied formatting does have small differences between major versions (e.g. different result between 3.8.0 and 3.9.0 but the same result between 3.9.0 and 3.9.1). Some of the parameters in src/.clang-format were not added to clang-format until version 3.6.0. The script warns if a parameter is rejected. With the --force flag, it proceeds on a 'best-effort' basis, dropping the rejected parameters.
This is the fourth script in a series (the others being #9459 copyright_header.py, #9603 basic_style.py and #9632 clang_static_analysis.py) that provide a report command to measure the state of the codebase along a particular dimension.
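To illustrate the 'best-effort' idea, here is one way rejected parameters could be dropped from a style file before handing it to an older clang-format. This is only a sketch under stated assumptions: the set of rejected keys is taken as input (how the script detects them is not shown here), the style file is assumed to use simple top-level "Key: value" lines with indented children, and drop_rejected is a hypothetical name.

```python
def drop_rejected(style_text, rejected):
    """Return style_text without the top-level keys named in rejected.

    Indented lines following a dropped key (nested settings) are dropped too.
    """
    kept = []
    skipping = False
    for line in style_text.splitlines():
        if line[:1] in (" ", "\t"):
            # Indented line: belongs to the most recent top-level key.
            if not skipping:
                kept.append(line)
            continue
        key = line.split(":", 1)[0].strip()
        skipping = key in rejected
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

For example, dropping BraceWrapping from a style file removes that key together with its indented sub-settings and leaves the other lines untouched.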
A suggested trajectory from here is:
- automating a suite of scripts to generate reports with consistent and known tool dependencies - a build matrix item in TravisCI looks like a good first thought
- exposing metrics so they can be observed and monitored.
After that, the discussion can progress to:
- considering some bulk actions to improve the codebase - in a safe, measured, conscientious way, of course.
- observing the 'drift' in the wrong direction as PRs are merged
- considering enforcement of pre-merge CI criteria where it adds clear value - the check subcommand in these scripts shows a way that this can be done.
- considering areas where similar automation scripts might be helpful (pep8 etc.)
Comments on all this are welcome.