mirror of
https://github.com/trekhleb/javascript-algorithms.git
synced 2025-07-06 17:44:08 +08:00
Add Levenshtein Distance algorithm explanations.
@@ -40,7 +40,76 @@ three edits:
2. sitt**e**n → sitt**i**n (substitution of "i" for "e")
3. sittin → sittin**g** (insertion of "g" at the end).
## Applications

The Levenshtein distance has a wide range of applications, for instance spell checkers, correction
systems for optical character recognition, fuzzy string searching, and software
to assist natural language translation based on translation memory.
## Dynamic Programming Approach Explanation

Let’s take a simple example of finding the minimum edit distance between the
strings `ME` and `MY`. Intuitively you already know that the minimum edit distance
here is `1` operation: replacing `E` with `Y`. But
let’s formalize it as an algorithm so that we can handle
more complex examples, like transforming `Saturday` into `Sunday`.

To apply the mathematical formula mentioned above to the `ME → MY` transformation
we first need to know the minimum edit distances of the `ME → M`, `M → MY` and `M → M` transformations.
Here they are `1`, `1` and `0`. We then pick the minimum one and add _one_ operation to
transform the last letters `E → Y`, which gives `0 + 1 = 1`. So the minimum edit distance of the `ME → MY` transformation
is calculated from three previously computed transformations.

To explain this further let’s draw the following matrix:

![Levenshtein Matrix](https://cdn-images-1.medium.com/max/1600/1*2d46ug_PL5LfeqztckoYGw.jpeg)
- Cell `(0:1)` contains red number 1. It means that we need 1 operation to
  transform `M` to an empty string: deleting `M`. This is why this number is red.
- Cell `(0:2)` contains red number 2. It means that we need 2 operations
  to transform `ME` to an empty string: deleting `E` and `M`.
- Cell `(1:0)` contains green number 1. It means that we need 1 operation
  to transform an empty string to `M`: inserting `M`. This is why this number is green.
- Cell `(2:0)` contains green number 2. It means that we need 2 operations
  to transform an empty string to `MY`: inserting `M` and `Y`.
- Cell `(1:1)` contains number 0. It means that it costs nothing
  to transform `M` into `M`.
- Cell `(1:2)` contains red number 1. It means that we need 1 operation
  to transform `ME` to `M`: deleting `E`.
- And so on...

This looks easy for such a small matrix as ours (it is only `3x3`). But here you
can find the basic concepts that may be applied to calculate all those numbers for
bigger matrices (let’s say a `9x7` one, for the `Saturday → Sunday` transformation).

According to the formula you only need three adjacent cells `(i-1:j)`, `(i-1:j-1)`, and `(i:j-1)` to
calculate the number for the current cell `(i:j)`. Take the cell above `(i-1:j)` plus `1`
(an insertion), the cell to the left `(i:j-1)` plus `1` (a deletion), and the diagonal cell
`(i-1:j-1)` plus `1` only if the letters of the `i`-th row and the `j`-th column differ
(a substitution). The current cell is the minimum of these three values.
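The rule for a single cell can be written almost literally as a recursive function. The following is a minimal sketch, not the repository's implementation: the name `editDistanceRecursive` and its parameters are assumptions made for illustration. It follows the matrix layout used here, where `i` counts letters of the target word (rows) and `j` counts letters of the source word (columns).

```javascript
/**
 * Hypothetical helper: minimum number of edits needed to turn the first `j`
 * letters of `source` into the first `i` letters of `target`,
 * i.e. the value of cell (i:j) in the matrix.
 */
function editDistanceRecursive(source, target, i = target.length, j = source.length) {
  // First row: delete all `j` letters to reach an empty string.
  if (i === 0) return j;
  // First column: insert all `i` letters starting from an empty string.
  if (j === 0) return i;

  // Substitution is free when the two letters already match.
  const substitutionCost = source[j - 1] === target[i - 1] ? 0 : 1;

  return Math.min(
    editDistanceRecursive(source, target, i - 1, j) + 1, // cell above: insertion
    editDistanceRecursive(source, target, i, j - 1) + 1, // cell to the left: deletion
    editDistanceRecursive(source, target, i - 1, j - 1) + substitutionCost, // diagonal cell
  );
}

console.log(editDistanceRecursive('ME', 'MY')); // 1
```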
You may clearly see the recursive nature of the problem.

![Levenshtein Matrix](https://cdn-images-1.medium.com/max/2000/1*JdHQ5TeKiDlE-iKK1s_2vw.jpeg)

Let's draw a decision graph for this problem.

![Minimum Edit Distance Decision Graph](https://cdn-images-1.medium.com/max/1600/1*SGwYUpXH9H1xUeTvJk0e7Q.jpeg)

You may see a number of overlapping sub-problems in the picture that are marked
in red. Also, there is no way to reduce the number of operations and make it
less than the minimum of those three adjacent cells from the formula.
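One way to observe the overlap is to count how often a naive recursion asks for the same cell. The sketch below is only an illustration under assumed names (`countSubProblemVisits`, `visits`). It instruments the recursion with a `Map` keyed by `i:j`, using the same row and column convention as the matrix above.

```javascript
// Hypothetical instrumentation: record how many times the naive recursion
// requests each cell (i:j) while computing the distance.
function countSubProblemVisits(source, target) {
  const visits = new Map();

  function distance(i, j) {
    const key = `${i}:${j}`;
    visits.set(key, (visits.get(key) || 0) + 1);

    if (i === 0) return j; // first row: only deletions
    if (j === 0) return i; // first column: only insertions

    const substitutionCost = source[j - 1] === target[i - 1] ? 0 : 1;
    return Math.min(
      distance(i - 1, j) + 1,
      distance(i, j - 1) + 1,
      distance(i - 1, j - 1) + substitutionCost,
    );
  }

  const result = distance(target.length, source.length);
  return { result, visits };
}

const meToMy = countSubProblemVisits('ME', 'MY');
console.log(meToMy.result); // 1
// Cell (1:1) is requested once by each of (2:2), (1:2) and (2:1).
console.log(meToMy.visits.get('1:1')); // 3
```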
Also you may notice that each cell number in the matrix is calculated
from previous ones. Thus the tabulation technique (filling the cache in
the bottom-up direction) is applied here.
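A bottom-up fill of the matrix might look like the sketch below. It is an illustration rather than the repository's code: `buildEditDistanceMatrix` is an assumed name, and the layout (rows for the target word, columns for the source word) is chosen to match the cells described earlier. Each cell is computed once, after the cells above, to the left, and on the diagonal are already known.

```javascript
// Hypothetical helper that fills the whole matrix bottom-up.
// matrix[i][j] = edits needed to turn source.slice(0, j) into target.slice(0, i).
function buildEditDistanceMatrix(source, target) {
  const matrix = Array.from(
    { length: target.length + 1 },
    () => new Array(source.length + 1).fill(0),
  );

  // First row: deleting `j` letters leads to an empty string.
  for (let j = 0; j <= source.length; j += 1) matrix[0][j] = j;
  // First column: inserting `i` letters builds the target from an empty string.
  for (let i = 0; i <= target.length; i += 1) matrix[i][0] = i;

  for (let i = 1; i <= target.length; i += 1) {
    for (let j = 1; j <= source.length; j += 1) {
      const substitutionCost = source[j - 1] === target[i - 1] ? 0 : 1;
      matrix[i][j] = Math.min(
        matrix[i - 1][j] + 1, // insertion
        matrix[i][j - 1] + 1, // deletion
        matrix[i - 1][j - 1] + substitutionCost, // match or substitution
      );
    }
  }

  return matrix;
}

// Rows for ME -> MY: [0, 1, 2], [1, 0, 1], [2, 1, 1]
console.log(buildEditDistanceMatrix('ME', 'MY'));

// The bottom-right cell holds the distance for the full words.
const saturdayToSunday = buildEditDistanceMatrix('Saturday', 'Sunday');
console.log(saturdayToSunday[6][8]); // 3
```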
Applying these principles further, we can solve more complicated cases such as
the `Saturday → Sunday` transformation.

![Levenshtein distance](https://cdn-images-1.medium.com/max/1600/1*fPEHiImYLKxSTUhrGbYq3g.jpeg)
## References

- [Wikipedia](https://en.wikipedia.org/wiki/Levenshtein_distance)
- [YouTube](https://www.youtube.com/watch?v=We3YDTzNXEk&list=PLLXdhg_r2hKA7DPDsunoDZ-Z769jWn4R8)
- [ITNext](https://itnext.io/dynamic-programming-vs-divide-and-conquer-2fea680becbe)