The encoding detection code was trying to catch encoding-related
exceptions when the file is opened. This doesn't make sense: at that
point no data has been read, so no encoding errors can be
detected. Instead, catch encoding-related exceptions when the file
contents are read.
Also avoid bailing out with `Exception('Unknown encoding')` on empty
files.
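
A minimal sketch of the idea, not the actual patch: the candidate encoding list and the helper name `read_lines` are assumptions made for illustration. The `try` wraps the read, because that is where decoding happens, and an empty file simply yields an empty list instead of raising.

```python
# Hypothetical candidate list; the real script may try other encodings.
CANDIDATE_ENCODINGS = ("utf-8", "iso-8859-1")


def read_lines(filename):
    """Return (lines, encoding) for the first encoding that decodes the file."""
    for encoding in CANDIDATE_ENCODINGS:
        # open() does not decode anything yet, so it is not guarded here.
        with open(filename, encoding=encoding, newline="") as f:
            try:
                # Decoding happens while reading; errors surface at this call.
                lines = f.readlines()
            except UnicodeDecodeError:
                # Wrong guess: fall through to the next candidate encoding.
                continue
        # An empty file decodes fine and returns an empty list.
        return lines, encoding
    raise Exception("Unknown encoding")
```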
On Win11 I always get:
codespell................................................................Failed
- hook id: codespell
- exit code: 1
Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Joerg.WORLDWARTWEB\.cache\pre-commit\repox5gyvwwr\py_env-python3\Scripts\codespell.EXE\__main__.py", line 7, in <module>
  File "C:\Users\Joerg.WORLDWARTWEB\.cache\pre-commit\repox5gyvwwr\py_env-python3\lib\site-packages\codespell_lib\_codespell.py", line 747, in _script_main
    return main(*sys.argv[1:])
  File "C:\Users\Joerg.WORLDWARTWEB\.cache\pre-commit\repox5gyvwwr\py_env-python3\lib\site-packages\codespell_lib\_codespell.py", line 862, in main
    build_exclude_hashes(options.exclude_file, exclude_lines)
  File "C:\Users\Joerg.WORLDWARTWEB\.cache\pre-commit\repox5gyvwwr\py_env-python3\lib\site-packages\codespell_lib\_codespell.py", line 443, in build_exclude_hashes
    for line in f:
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2244: character maps to <undefined>
The proposed change fixes this for me.
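
The traceback shows the exclude file being decoded with the platform default (cp1252 on this machine). One way to avoid that, sketched here under the assumption that the exclude file is UTF-8, is to pass an explicit encoding to `open()`. Only the function name and the `for line in f:` loop come from the traceback; the rest of the body is illustrative and may differ from the real code.

```python
def build_exclude_hashes(filename, exclude_lines):
    # Assumption: the exclude file is UTF-8. Without encoding=..., open()
    # uses the locale's preferred encoding, which is cp1252 in the report.
    with open(filename, encoding="utf-8") as f:
        for line in f:
            # Illustrative body: collect each line into the given set.
            exclude_lines.add(line)
```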
The list comprehension is shorter than the map() version.
I feel it is also simpler, although that is debatable.
This is consistent with the previous commit.
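
For comparison, a small sketch of the two styles. The word list and the `strip()` call are made up for illustration and are not taken from the changed code.

```python
words = [" teh ", "recieve\n"]  # hypothetical input

# map() version: needs a lambda (or a named function) plus list().
stripped_with_map = list(map(lambda w: w.strip(), words))

# List comprehension: same result, with the expression written inline.
stripped_with_comprehension = [w.strip() for w in words]
```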
For an end user who is not familiar with the inner details of the
script, the meaning of the old definition was ambiguous. It wasn't
clear whether using this dictionary flags the rare words as errors
(false positives) or on the contrary does not flag these rare words
(false negatives).
Co-authored-by: Peter Newman <peternewman@users.noreply.github.com>
* Read options from config
* Fix assert in test
* Fix tests
* Fix readme to pass rst checks in Travis
* Also support .codespellrc config file
* Fix flake8 error
* CLI args override config args
* Rename tool:codespell to just codespell in config
* Fix typo in readme
* Remove unnecessary check for existence of config files (configparser already handles this case)
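
A rough sketch of how these pieces could fit together, not the actual implementation. The helper name `parse_options`, the `setup.cfg` file name, and the two example options are assumptions; the `codespell` section name and the `.codespellrc` file come from the bullets above. `ConfigParser.read()` silently skips files that do not exist, which is why no existence check is needed, and values from the config are installed as parser defaults so that command-line arguments override them.

```python
import argparse
import configparser


def parse_options(argv, config_files=("setup.cfg", ".codespellrc")):
    parser = argparse.ArgumentParser()
    # Example options only; the real script defines many more.
    parser.add_argument("--skip", default="")
    parser.add_argument("--ignore-words-list", default="")

    config = configparser.ConfigParser()
    # read() ignores missing files, so there is no need to check for them.
    config.read(config_files)

    if config.has_section("codespell"):
        # Config values become defaults; explicit CLI arguments still win.
        defaults = {
            key.replace("-", "_"): value
            for key, value in config["codespell"].items()
        }
        parser.set_defaults(**defaults)

    return parser.parse_args(argv)
```

With a config file containing `skip = *.po` under `[codespell]`, calling `parse_options([])` picks up that value, while `parse_options(["--skip", "*.js"])` uses the command-line value instead.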