When one line had the same misspelled word more than once, codespell was
incorrectly fixing that line, even introducing new typos. This was because
the list of misspelled words was not updated according to the fixes.
Instead of always updating this list and making the loop more complex,
we do the following:
- Cache the words that have already been fixed in a given line
- Fix all occurrences of a misspelled word in each line (this means that
  interactive mode will fix all occurrences with the same suggestion... not
  awesome, but it simplifies the code a lot)
- Use a regex with re.sub() instead of the naive string.replace()
  function. This eliminates cases of matching and modifying partial
  words. E.g., addres->address would modify addressable to
  addresssable.
- Skip words that were already fixed by a previous iteration.
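The approach in the list above can be sketched in Python as follows. This is a minimal illustration, not codespell's actual code: the `MISSPELLINGS` dictionary and the `fix_line()` helper are hypothetical, and a "word" is assumed to be a run of `\w` characters.

```python
import re

# Hypothetical dictionary (misspelling -> correction), for illustration only.
MISSPELLINGS = {"addres": "address", "tis": "this"}

# Assumption: a word is a run of \w characters.
WORD_RE = re.compile(r"\w+")


def fix_line(line, misspellings):
    """Fix every occurrence of each misspelled word in a single line."""
    fixed_words = set()  # cache of words already fixed in this line
    for word in WORD_RE.findall(line):
        if word in fixed_words:
            continue  # all occurrences were replaced by an earlier iteration
        correction = misspellings.get(word)
        if correction is None:
            continue  # not a known misspelling
        # \b anchors keep the match from landing inside longer words
        # such as "addressable".
        pattern = r"\b" + re.escape(word) + r"\b"
        line = re.sub(pattern, correction, line)
        fixed_words.add(word)
    return line
```

With the `\b` anchors, `fix_line("addres addressable addres", MISSPELLINGS)` leaves `addressable` untouched, whereas the naive `"addres addressable".replace("addres", "address")` yields `address addresssable`.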
Thanks to Bruce Cran <bruce@cran.org.uk> for reporting this issue.

If there were two misspelled words in a line, codespell was detecting
both, but when writing the file only the last one was actually being
fixed. This was because the wrong line string was used to apply the fixes.
Example:
    $ ./codespell.py -d example/dict.txt example/code.c
    example/code.c:9: clas ==> class | disabled due to name clash in c++
    example/code.c:10: tis ==> this
    example/code.c:14: opem ==> open
    example/code.c:16: buring ==> burying, burning, burin, during
    example/code.c:17: clas ==> class | disabled due to name clash in c++
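The two-misspellings bug described above amounts to deriving every fix from the original line instead of from the line that already carries the previous fixes. A minimal sketch, assuming hypothetical `fix_line_buggy()` and `fix_line()` helpers and a plain `fixes` dict (this is not codespell's real code):

```python
import re


def fix_line_buggy(line, fixes):
    """Buggy: each fix starts again from the ORIGINAL line."""
    new_line = line
    for wrong, right in fixes.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        new_line = re.sub(pattern, right, line)  # discards earlier fixes
    return new_line


def fix_line(line, fixes):
    """Fixed: each fix is applied on top of the previous ones."""
    for wrong, right in fixes.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        line = re.sub(pattern, right, line)  # keeps earlier fixes
    return line
```

With `fixes = {"tis": "this", "opem": "open"}`, the buggy version turns `"tis is opem"` into `"tis is open"` (only the last fix survives), while the corrected version returns `"this is open"`.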