72 Commits

SHA1 Message Date
0d23934116 only parse regular, non-empty files
I have unix sockets sitting in my source dir:

$ ls -al /home/vagrant/pdns/pdns/pdns.controlsocket
srwxr-xr-x 1 root root 0 Jul 23 14:58 /home/vagrant/pdns/pdns/pdns.controlsocket
This makes codespell abort:

$ ./codespell.py -s ~/pdns
[...]
Traceback (most recent call last):
  File "./codespell.py", line 527, in <module>
    sys.exit(main(*sys.argv))
  File "./codespell.py", line 516, in main
    parse_file(os.path.join(root, file), colors, summary)
  File "./codespell.py", line 378, in parse_file
    if not istextfile(filename):
  File "./codespell.py", line 298, in istextfile
    with open(filename, mode='rb') as f:
IOError: [Errno 6] No such device or address: '/home/vagrant/pdns/pdns/pdns.controlsocket'
2013-10-23 22:29:49 -02:00
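A minimal sketch of the check this commit describes, assuming it is done with `os.stat()` and `stat.S_ISREG()` before the file is opened. The helper name `should_parse` is hypothetical and not taken from codespell.py.

```python
import os
import stat


def should_parse(path):
    """Return True only for regular files that contain at least one byte.

    Sockets, FIFOs, device nodes, directories and empty files are skipped,
    so open() is never attempted on them (a unix socket raises ENXIO,
    errno 6, "No such device or address").
    """
    try:
        st = os.stat(path)  # follows symlinks, just like open() would
    except OSError:
        return False  # missing or unreadable path: nothing to parse
    return stat.S_ISREG(st.st_mode) and st.st_size > 0
```

The main loop would then call `should_parse()` before `istextfile()`, so special files never reach `open()`.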
02ac2c132c codespell.py
Fix help text: the -r option was removed, so don't
refer to it.
2013-07-23 22:42:50 -03:00
8284c608a1 codespell 1.6 2013-05-06 14:29:53 -03:00
f1c09d814d Move dictionary name to optional parameter
Most often users just need the default dictionary, so let's create an option
with default value set to dictionary.txt. Users can override this parameter with
their own dictionary.

By default codespell will check the $PWD/data/dictionary.txt file. This is where
the dictionary is stored if one uses codespell from the source tarball or git repo.
It is expected that package managers will install the dictionary to a shared data
folder and change the default value accordingly.
2013-04-17 11:02:20 -07:00
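A sketch of an optional dictionary parameter with an overridable default. It uses `argparse` as a stand-in for whatever option parser codespell.py used at the time; the `-D/--dictionary` flag names and the `build_parser` helper are assumptions for illustration.

```python
import argparse
import os

# Assumed default: data/dictionary.txt relative to the current working
# directory, which is where it sits in a source tarball or git checkout.
DEFAULT_DICTIONARY = os.path.join("data", "dictionary.txt")


def build_parser(default_dictionary=DEFAULT_DICTIONARY):
    """Build a parser whose dictionary default a packager can swap out."""
    parser = argparse.ArgumentParser(prog="codespell")
    # Optional: most users never pass it and silently get the default.
    parser.add_argument("-D", "--dictionary", default=default_dictionary,
                        help="custom dictionary file (default: %(default)s)")
    parser.add_argument("files", nargs="*")
    return parser
```

A package manager would only need to change the single default, e.g. `build_parser("/usr/share/codespell/dictionary.txt")`, while `-D my.txt` still wins on the command line.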
dcb1a6867d Fix spell error using codespell tool 2013-04-17 02:13:32 -03:00
b914fd88a1 Use 'check current directory' as common case
Checking the current directory is more common than checking STDIN,
so set it as the common case.
2013-04-17 02:10:41 -03:00
c2c573cdb1 Remove '-r' recursive flag
If the user specifies a directory, it means the user wants to check it
recursively.
2013-04-17 02:07:43 -03:00
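The two commits above (default to the current directory, always recurse into directories) can be sketched together. `iter_files` is a hypothetical helper, and the sorted traversal order is an added assumption to make the output deterministic.

```python
import os


def iter_files(paths):
    """Yield every file that should be checked.

    With no paths at all, fall back to the current directory (the common
    case). Any directory is walked recursively, so no -r flag is needed.
    """
    for path in paths or ["."]:
        if os.path.isdir(path):
            for root, dirs, files in os.walk(path):
                dirs.sort()  # in-place sort also fixes os.walk's descent order
                for name in sorted(files):
                    yield os.path.join(root, name)
        else:
            yield path  # plain files are passed through unchanged
```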
c92bdd0804 codespell 1.5 2013-04-10 14:35:23 -03:00
77739491c6 codespell 1.4 2012-02-28 23:12:43 -03:00
8e0a59aecd Allow hyphen in words 2012-02-28 13:09:14 -03:00
c169262497 codespell 1.3 2011-12-09 13:16:24 -02:00
b038c2e70f Do not rely on environment to know charset of known file
When opening the dict, we know it's in UTF-8, so do not rely on the
environment.

Thanks to Ettl Martin for reporting the problem and finding the
solution.
2011-12-06 03:54:37 -02:00
be9d1ace36 codespell 1.2 2011-09-30 10:44:54 -03:00
4ccd5cb5c6 Add option to skip paths matching glob 2011-09-29 11:24:15 -03:00
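A sketch of glob-based skipping with the standard `fnmatch` module. The comma-separated option format and matching against both the full path and the basename are assumptions; `parse_skip` and `is_skipped` are hypothetical names.

```python
import fnmatch
import os


def parse_skip(value):
    """Split a comma-separated option value into a list of glob patterns."""
    return [g.strip() for g in value.split(",") if g.strip()]


def is_skipped(path, globs):
    """True if the full path or just its basename matches any glob."""
    name = os.path.basename(path)
    return any(fnmatch.fnmatch(path, g) or fnmatch.fnmatch(name, g)
               for g in globs)
```

During a directory walk the same test can prune entries from `dirs`, so a skipped directory such as `.git` is never descended into.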
6183f803ce codespell itself 2011-09-28 16:18:53 -03:00
fe34d74317 Add missing spaces in description 2011-09-28 16:16:50 -03:00
b22e8d51ec Allow words with different case on one line, but ask the user only once. 2011-07-25 10:41:02 +02:00
4e116261d5 Show and return words in the proper case in interactive mode. 2011-07-25 10:39:00 +02:00
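A sketch of case handling for the two commits above, assuming the dictionary keys are lowercase. Looking words up by `word.lower()` means "Teh" and "teh" share one entry (so an interactive prompt keyed the same way would fire only once), and the fix is returned in the original word's case. `match_case` and `fix_word` are hypothetical names.

```python
def match_case(original, fix):
    """Return `fix` styled like `original`: UPPER, Capitalized or lower."""
    if original.isupper():
        return fix.upper()
    if original[:1].isupper():
        return fix[:1].upper() + fix[1:]
    return fix


def fix_word(word, typos):
    """Look the word up case-insensitively; return the cased fix or None."""
    fix = typos.get(word.lower())
    return None if fix is None else match_case(word, fix)
```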
2957743f01 Don't init chardet if it's not being used
This way codespell can be run on a system without chardet installed.
2011-07-12 11:52:10 -03:00
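The usual Python pattern for an optional dependency is to defer the import into the branch that needs it, which is presumably what "don't init chardet" amounts to here. A minimal sketch; `make_detector` is hypothetical, while `chardet.detect` is chardet's top-level detection function.

```python
def make_detector(use_chardet):
    """Return chardet.detect when requested, otherwise None.

    chardet is imported only inside this branch, so a system without
    chardet installed works fine as long as the option is not enabled.
    """
    if not use_chardet:
        return None
    import chardet  # deferred: third-party, needed only for this option
    return chardet.detect
```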
e47b050d4d Add option to use chardet for encoding detection
The tradeoff is that it's much, much, much slower. In my tests, it was
roughly 10 times slower than without chardet. But it always uses the right
encoding.

Maybe the right thing to do is to use chardet only as a fallback, since most
source code is in ASCII/UTF-8/ISO-8859-1. This will be left undecided
until 1.2 comes out.
2011-07-02 16:43:11 -03:00
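The "fallback only" idea left undecided above could look roughly like this: try the cheap UTF-8 decode first (ASCII is a subset of UTF-8) and consult the slow detector only on failure, ending at ISO-8859-1, which accepts any byte. This is a sketch of the proposal, not necessarily what 1.2 shipped; `decode_bytes` is hypothetical and `detector` is assumed to behave like `chardet.detect`, returning a dict with an `"encoding"` key.

```python
def decode_bytes(data, detector=None):
    """Decode `data` and return (text, encoding_used)."""
    try:
        return data.decode("utf-8"), "utf-8"  # fast path for most sources
    except UnicodeDecodeError:
        pass
    if detector is not None:
        guess = detector(data).get("encoding")  # slow path, only on failure
        if guess:
            try:
                return data.decode(guess), guess
            except (LookupError, UnicodeDecodeError):
                pass  # unknown codec name or wrong guess
    # ISO-8859-1 maps every byte to a character, so this never fails.
    return data.decode("iso-8859-1"), "iso-8859-1"
```

Passing the detector in keeps chardet optional and makes the function testable with a stub.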
892ad79af4 Move encoding detection to its own function 2011-07-02 12:49:21 -03:00
d308b7a591 codespell 1.1 2011-06-18 14:23:01 -03:00
95c5ea62f4 codespell 1.1-rc2 2011-06-14 19:48:54 -03:00
3ef81ccf3f Fix replacement of same word in one line
When one line had the same misspelled word more than once, codespell was
incorrectly fixing that line, even introducing new typos. This was because
the list of misspelled words was not updated according to the fixes.

Instead of always updating this list and making the loop more difficult,
we do the following:

- Cache the words that are fixed in a certain line
- Fix all occurrences of a misspelled word in each line (this means that
  interactive mode will fix all occurrences with the same suggestion... not
  awesome, but it simplifies the code a lot)
- Use a regex with re.sub() instead of the naive string.replace()
  function. This eliminates dumb cases of matching partial words and
  modifying them. E.g.: addres->address would modify addressable to
  addresssable.
- Skip words that were already fixed by a previous iteration.

Thanks to Bruce Cran <bruce@cran.org.uk> for reporting this issue.
2011-06-14 01:58:40 -03:00
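A sketch of the loop described above: cache fixed words, fix every occurrence at once with `re.sub()`, and use `\b` word boundaries so partial words are left alone. For simplicity words are assumed to be runs of `\w` characters (the real tool's word pattern differs, e.g. it handles apostrophes), and `fix_line` is a hypothetical name.

```python
import re

WORD_RE = re.compile(r"\w+")  # simplified word pattern (assumption)


def fix_line(line, typos):
    """Fix every misspelled word in `line`, one distinct word at a time."""
    fixed = set()  # words already fixed in this line
    for word in WORD_RE.findall(line):
        if word in fixed or word not in typos:
            continue  # skip: unknown word, or handled by a previous iteration
        # \b anchors stop "addres" from matching inside "addressable".
        pattern = r"\b%s\b" % re.escape(word)
        # A function replacement avoids backslash escapes in the fix text.
        line = re.sub(pattern, lambda _m, fix=typos[word]: fix, line)
        fixed.add(word)
    return line
```

For example, with `{"addres": "address"}`, `fix_line("addres addressable addres", ...)` yields `"address addressable address"`, where a naive `str.replace()` would have produced `addresssable`.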
2fe2752b92 codespell 1.1-rc1 2011-06-02 00:02:03 -03:00
8d26e40d1d Write file with the same encoding it was opened 2011-05-30 10:54:09 -03:00
83dcc1b694 Change order of scissors and SUMMARY 2011-05-28 14:10:23 -03:00
c64a4f500b Implement quiet levels
Add -q (--quiet-level) argument so the user can disable messages. Values
are a bit mask, and any combination of them is possible.
2011-05-27 23:39:11 -03:00
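A sketch of how a bit-mask quiet level can gate messages. The three message classes and their bit values below are hypothetical placeholders, not codespell's actual assignments; only the "one bit per class, any combination allowed" scheme comes from the commit.

```python
from enum import IntFlag


class Quiet(IntFlag):
    """Hypothetical message classes; each one owns a single bit."""
    ENCODING = 1  # warnings about wrong encoding
    BINARY = 2    # warnings about binary files
    FIXES = 4     # messages about fixes


def should_print(message_class, quiet_level):
    """A message is printed unless its bit is set in the quiet level."""
    return not (quiet_level & message_class)
```

With these placeholder values, `-q 3` would silence encoding and binary-file warnings while still reporting fixes.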
5e28817da1 Reword comment 2011-05-27 21:16:13 -03:00
8ca6ac1d64 Do not hash twice
Sets already use the __hash__() method of each object to decide whether an
object is in them. When we also use the sha1, we are therefore hashing twice.

The impact is on performance. Below is the performance before and after
this patch when parsing the entire Linux kernel tree with a big exclude
list.

Before:
	real	2m20.959s
	user	2m16.888s
	sys	0m1.386s

After:
	real	1m35.169s
	user	1m28.719s
	sys	0m1.354s
2011-05-25 11:24:38 -03:00
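The before/after difference can be sketched as follows, assuming the exclusion list holds whole lines read from an exclude file. Both helpers are hypothetical; the point is that a `set` of strings already hashes each member with `str.__hash__()`, so a SHA-1 digest on top is a second, redundant hash.

```python
import hashlib


def build_exclude_sha1(lines):
    """Before: hash each line with SHA-1, then the set hashes the digest."""
    return {hashlib.sha1(line.encode("utf-8")).hexdigest() for line in lines}


def build_exclude(lines):
    """After: store the lines directly; the set does the only hashing."""
    return set(lines)
```

Membership checks against the second form are simply `line in excluded`, with no per-line SHA-1 computation during the scan.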
5884c59139 Implement exclusion list 2011-05-24 00:37:19 -03:00
eef1ea640a Remove commented out code 2011-05-02 16:32:22 -03:00
810848dbba Add interactive mode option 2011-04-28 19:06:31 -03:00
1f6deb0da3 Implement interactive mode 2011-04-28 19:06:14 -03:00
b684551ef7 Refactor loop to add interactive mode
Do not expose the interactive mode yet, but add the necessary code to
implement it.
2011-04-27 14:20:07 -07:00
c01530dae7 Implement simple summary of the changes 2011-04-27 01:15:48 -07:00
3ddcab5674 Do not treat apostrophe as a word boundary
In case the word to be fixed has apostrophes, codespell was not making
the right fix. E.g.:

i) "doesn't" was read as two separate words: "doesn" and "t"
ii) "doesnt'" was read as "doesnt"
iii) "doens't" was read as two separate words: "doens" and "t"

(i) is not a big deal since the spelling is right. In (ii) the fix would
obviously be wrong: the doesnt->doesn't rule would apply and the net
result would be "doesn't'". (iii) is even worse, since the doens->does
rule would apply and the result would be "does't".

By adding the apostrophe to the list of characters treated as part of a
word, (i) and (iii) are fixed, and new rules are added to the dictionary
in order to fix (ii).
2011-04-18 10:51:40 -03:00
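The three cases above can be reproduced with two word patterns. The regexes, the `fix_words` helper and the extra `"doesnt'"` dictionary rule are illustrative assumptions, not the exact code or dictionary of this commit.

```python
import re

WORD_BEFORE = re.compile(r"\w+")     # apostrophe splits words
WORD_AFTER = re.compile(r"[\w']+")   # apostrophe is part of the word


def fix_words(text, pattern, typos):
    """Replace each token matched by `pattern` that appears in `typos`."""
    return pattern.sub(lambda m: typos.get(m.group(0), m.group(0)), text)


typos = {"doens": "does", "doesnt": "doesn't"}
print(fix_words("doens't", WORD_BEFORE, typos))  # case (iii): mangled
print(fix_words("doesnt'", WORD_BEFORE, typos))  # case (ii): stray quote
print(fix_words("doens't", WORD_AFTER, typos))   # left untouched now
```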
e835701832 Do not override previous fix
If there were two misspelled words in a line, codespell detected both,
but when writing them to the file only the last one was actually fixed.
This was because we used the wrong line string to fix them.
2011-04-14 13:54:11 -03:00
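A minimal illustration of the bug class described above, assuming fixes are applied one after another to a line. Both functions are hypothetical, and plain `str.replace()` is used only to keep the contrast short: the buggy version restarts from the original line on every iteration, so only the last fix survives.

```python
def fix_line_buggy(line, fixes):
    fixed = line
    for wrong, right in fixes:
        fixed = line.replace(wrong, right)  # bug: always starts from `line`
    return fixed


def fix_line(line, fixes):
    for wrong, right in fixes:
        line = line.replace(wrong, right)  # builds on the previous fix
    return line
```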
86701bcda5 Update copyrights 2011-04-14 13:53:53 -03:00
3a0c196609 codespell 1.0 2011-03-29 17:42:39 -03:00
990a73f9f0 Add license
codespell is licensed under GPLv2
2011-03-22 17:41:02 -03:00
09b4baa68b codespell 1.0-rc2 2011-02-21 23:59:00 -03:00
1002cedf3f Print right lower/upper/capitalized word 2011-02-17 08:52:18 -02:00
5a1c2ad67c replace one word at a time
Since we already iterate over all words, there's no need to substitute
all the words at once.
2011-02-16 15:01:42 -02:00
1fcef67f6f codespell 1.0-rc1 2011-02-03 01:32:02 -02:00
12ca129d9b Fix 'reason' not appearing 2011-02-03 01:12:57 -02:00
e46c8a7244 Use WARNING prefix when file is binary 2011-02-03 01:05:08 -02:00
2588f4aba7 Try iso-8859-1 encoding if utf-8 fails 2011-02-03 01:05:08 -02:00
1073b60660 fix condition for closing file 2011-02-03 00:20:12 -02:00
160d3d3649 Do not open links 2011-02-02 21:21:58 -02:00