flame/scripts/cspell-verify.sh
Luan Nico a036f19517 chore: Add a script to validate and fix all dictionary files (#2780)
The script has two purposes:

- remove orphan words
- alphabetize the files

I set it up to run as a checker in a GitHub Action, but a --fix option is also available for running locally.
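
A minimal sketch of the check-versus-fix split for alphabetization, run on a throwaway file. The file contents and the helper names `is_sorted` and `sort_in_place` are made up for illustration; the real script applies the same idea to the dictionaries under .github/.cspell.

```bash
# Toy dictionary: a header comment followed by words (hypothetical content).
dict=$(mktemp)
printf '%s\n' '# Example dictionary' 'banana' 'Apple' 'cherry' > "$dict"

# Returns 0 when the non-comment lines are sorted case-insensitively.
is_sorted() {
  grep -v '^#' "$1" | sort -f -c 2>/dev/null
}

# Keeps the first line (the header) in place and sorts the remaining lines.
sort_in_place() {
  tmp=$(mktemp)
  head -n 1 "$1" > "$tmp"
  tail -n +2 "$1" | sort -f >> "$tmp"
  mv "$tmp" "$1"
}

is_sorted "$dict" || echo "not sorted"   # check mode: report only
sort_in_place "$dict"                     # fix mode: rewrite the file
is_sorted "$dict" && echo "sorted"
```

Check mode leaves the file untouched and only signals the problem, which suits CI; fix mode rewrites the file, which is why it is meant for local runs.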

When running, I noticed that there are A LOT of orphaned words.
At first, I thought cSpell might be missing words in our docs that were clearly used, which would be a HUGE issue. I made this PR to validate that: #2735

But upon proper investigation, and using cSpell's trace command, I realized that we import multiple standard dictionaries: "en_US" and "softwareTerms", and they are constantly being updated. The word "cypher" was just added 12 hours ago, for example.
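
The trace check described above can be repeated by hand. The wrapper below is a hypothetical helper (`trace_word` is not part of the script); it assumes `cspell` is on the PATH and only reports the problem when it is not.

```bash
# Hypothetical helper: ask cSpell which of its dictionaries contain a word.
trace_word() {
  if ! command -v cspell >/dev/null 2>&1; then
    echo "cspell is not installed; try: npm install -g cspell" >&2
    return 127
  fi
  cspell trace "$1"
}
```

For example, `trace_word cypher` lists the dictionaries cSpell consulted for that word. The result depends on the dictionary versions installed locally.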

Turns out ALL of the current orphan words are properly being detected in our files, but are now included in the official dictionaries! Which is amazing.
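
A rough sketch of how an orphan word can be detected, on toy data. The two temporary files and the function name `find_orphans` are assumptions for illustration, and the sketch assumes one single-word entry per line: a dictionary entry is an orphan when it does not appear in the list of words cSpell needed from the custom dictionaries.

```bash
# Hypothetical inputs: words the project still needs, and a custom dictionary.
used=$(mktemp)
dict=$(mktemp)
printf '%s\n' 'flame' 'tiled' > "$used"
printf '%s\n' '# Example dictionary' 'Flame' 'cypher # example comment' 'tiled' > "$dict"

# Prints every dictionary word that is absent from the used-word list.
find_orphans() {
  while IFS= read -r line; do
    # Drop any trailing comment, strip all whitespace, and lowercase the word.
    word=$(printf '%s\n' "$line" | cut -d '#' -f 1 | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
    if [ -n "$word" ] && ! grep -qxF "$word" "$2"; then
      echo "$word"
    fi
  done < "$1"
}

find_orphans "$dict" "$used"
```

Here `cypher` is reported because nothing in the used-word list needs it any more. This is the same situation as a word that has moved into an official dictionary.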

Note that I did have to stop using the GitHub Action to run cSpell. The reason is twofold: (1) I need to install cSpell anyway to run my script and didn't want the action to download it again; and (2) the version in the GitHub Action (even though it is the same 7.3.7 from npm that I have locally) doesn't have the latest updates (for example, it lacks the word "cypher" that was added 12h ago). This would make my script and the CI script incompatible.
2023-10-01 17:44:58 +00:00

76 lines
2.0 KiB
Bash
Executable File

#!/bin/bash

# Pass --fix to rewrite the dictionaries instead of only reporting problems.
fix=$([[ "$*" == *--fix* ]] && echo true || echo false)

# Case-insensitive sort; extra arguments (such as -c) are forwarded to sort.
function sort_fn() {
  sort --ignore-case "$@"
}

# Sorts a dictionary in place, keeping its first line where it is.
function sort_dictionary() {
  local file="$1"
  local tmp_file
  tmp_file=$(mktemp)
  head -n 1 "$file" > "$tmp_file"
  tail -n +2 "$file" | sort_fn >> "$tmp_file"
  mv "$tmp_file" "$file"
}

# Deletes the line holding the given word (with an optional trailing comment).
function delete_unused() {
  local file="$1"
  local word="$2"
  perl -i -ne "print unless /^\s*${word}\s*([# ].*)?$/i" "$file"
}

function lowercase() {
  tr 'A-Z' 'a-z'
}

word_list="word_list.tmp"
dictionary_dir=".github/.cspell"
tmp_dir=".cspell.tmp"

# Temporarily swap the dictionaries for empty files with the same names, so
# that cspell reports every word that depends on a custom dictionary.
mv "$dictionary_dir" "$tmp_dir"
mkdir "$dictionary_dir"
for file in "$tmp_dir"/*; do
  if [[ -f "$file" ]]; then
    touch "$dictionary_dir/$(basename "$file")"
  fi
done

cspell --dot --no-progress --unique --words-only "**/*.{md,dart}" | lowercase | sort -f > "$word_list" || exit 1

# Restore the real dictionaries.
rm -r "$dictionary_dir"
mv "$tmp_dir" "$dictionary_dir"

error=0
for file in "$dictionary_dir"/*.txt; do
  echo "Processing dictionary '$file'..."
  # sort -c reports the first out-of-order line on stderr; capture it.
  # (-C would check silently, leaving nothing to report.)
  violation=$(awk '!/^#/' "$file" | sort_fn -c 2>&1 || true)
  if [ -n "$violation" ]; then
    echo "Error: The dictionary '$file' is not in alphabetical order. First violation: '$violation'" >&2
    error=1
    if $fix; then
      echo "Fixing the dictionary '$file'"
      sort_dictionary "$file"
    fi
  fi

  while IFS= read -r line; do
    # split the line by # to remove comments
    word=$(echo "$line" | cut -d '#' -f 1 | xargs | lowercase) # xargs trims whitespace
    # check if the word exists in the project
    if [[ -n "$word" ]] && ! grep -wxF "$word" "$word_list" >/dev/null; then
      echo "Error: The word '$word' in the dictionary '$file' is not needed." >&2
      error=1
      if $fix; then
        echo "Fixing the dictionary '$file' with excess word $word"
        delete_unused "$file" "$word"
      fi
    fi
  done < "$file"
done

rm "$word_list"
exit $error