Historically, there was an attempt via test/lint/lint-python-utf8-encoding.py to enforce explicit UTF8 in every Python IO statement (open, subprocess, …). However, the lint check has many problems:
- The check is incomplete and many IO statements lack the explicit UTF8 specification.
- It was added at a time when some systems were not UTF8 by default.
- The check is brittle, as it depends on a fragile regex.
In theory, now that the minimum Python version is 3.10 (since commit 2123c94448ed142e78942421c597a1f264859c48), the check could be replaced by PYTHONWARNDEFAULTENCODING=1 (see https://docs.python.org/3/whatsnew/3.10.html#optional-encodingwarning-and-encoding-locale-option). However, this comes with problems of its own:
- All our Python scripts already assume and require UTF8 to be set externally. On almost all modern systems, this is already the default. Some Windows versions do not have UTF8 by default and already require PYTHONUTF8=1 to be set for the tests to run today (with or without the changes in this pull). Also, the CI and many other Bash scripts force UTF8 via LC_ALL. Finally, Python 3.15 will likely enable UTF8 on all systems by default, per https://peps.python.org/pep-0686/#abstract.
- So adding UTF8 to every single IO call is redundant, verbose, and confusing, given that it is the expected default.
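For reference, a minimal sketch of what PYTHONWARNDEFAULTENCODING=1 does on Python 3.10 or later. The helper name open_without_encoding_warns is hypothetical and not part of the repository. It runs a child interpreter that calls open() without an encoding and uses -W error::EncodingWarning so that the opt-in warning turns into a failure.

```python
import os
import subprocess
import sys
import tempfile


def open_without_encoding_warns(enable_warning: bool) -> bool:
    """Return True if a child interpreter fails on open() without encoding=.

    Hypothetical helper for illustration only; assumes Python >= 3.10.
    """
    # Create a small file for the child process to open.
    with tempfile.NamedTemporaryFile("w", encoding="utf-8", delete=False) as tmp:
        tmp.write("data\n")
        path = tmp.name

    # Start from the current environment, but control the opt-in flag ourselves.
    env = dict(os.environ)
    env.pop("PYTHONWARNDEFAULTENCODING", None)
    if enable_warning:
        env["PYTHONWARNDEFAULTENCODING"] = "1"

    # The child opens the file in text mode without an explicit encoding.
    child_code = "import sys; open(sys.argv[1]).close()"
    try:
        result = subprocess.run(
            # Turn EncodingWarning into an exception so it affects the exit code.
            [sys.executable, "-W", "error::EncodingWarning", "-c", child_code, path],
            env=env,
            capture_output=True,
        )
    finally:
        os.unlink(path)
    return result.returncode != 0 and b"EncodingWarning" in result.stderr


if __name__ == "__main__":
    print("with PYTHONWARNDEFAULTENCODING=1:", open_without_encoding_warns(True))
    print("without it:", open_without_encoding_warns(False))
```

With the variable set, every IO call that omits the encoding would be flagged, which is exactly the verbosity this pull wants to avoid.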
So fix all issues by:
- Removing the test/lint/lint-python-utf8-encoding.py check.
- Removing the encoding on the individual IO calls.
- Clarifying the existing docs around the existing UTF8 requirement and assumption.
Obviously, every IO call is still free to specify UTF8 or any other encoding explicitly, if there is a documented need for it in the future.
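To illustrate the externally set UTF8 assumption, here is a rough sketch that reports which encoding open() uses by default in a child interpreter, optionally with PYTHONUTF8=1. The helper name default_open_encoding is hypothetical and not part of the repository.

```python
import os
import subprocess
import sys


def default_open_encoding(force_utf8_mode: bool) -> str:
    """Return the normalized codec name that open() uses by default.

    Hypothetical helper for illustration only. When force_utf8_mode is
    False, the current environment is passed through unchanged.
    """
    env = dict(os.environ)
    if force_utf8_mode:
        # Enable Python's UTF-8 Mode for the child interpreter.
        env["PYTHONUTF8"] = "1"

    # Open a file in text mode without encoding= and report what was picked.
    # codecs.lookup() normalizes spellings such as "UTF-8" to "utf-8".
    child_code = (
        "import codecs, os\n"
        "with open(os.devnull) as f:\n"
        "    print(codecs.lookup(f.encoding).name)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("current environment:", default_open_encoding(False))
    print("with PYTHONUTF8=1:", default_open_encoding(True))
```

On a system where UTF8 is not the default, the first line would show the locale encoding instead, which is the case where PYTHONUTF8=1 needs to be set.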