Distance Measures: Levenshtein Distance

From exact search to in-exact search to phonetics, to change and edits in text, we use these algorithms on a daily basis. How about we explore how they really work?
We will be exploring one particular algorithm that was probably used by your device whe…

From exact search to in-exact search to phonetics, to change and edits in text, we use these algorithms on a daily basis. How about we explore how they really work?
We will be exploring one particular algorithm that was probably used by your device when you were searching for this post. You also use it every day when you text your family and friends. Sometimes, you blame it for mistakes in your spelling. Sit back and enjoy this ride.



The Levenshtein Distance.

This algorithm was designed in 1965 by a Russian Mathematician, Vladimir Levenshtein. It measures the minimum number of insertions, deletions, or replacements that are required to change one string to another. The higher the distance value, the greater the difference between the strings.
A simple example looks like this. The Levenshtein distance between the word “run” and “run” is 0 because both words are identical. No insertion, deletion, or replacement is needed. In contrast hand, the Levenshtein distance between the words “run” and “ran” is 1. We need one substitution to change the word “run” to “ran” i.e. “u” to “a”.



Implementation in Python

To check how many operations we need to transform the word MONDAY to FRIDAY, we will draw a dynamic programming table to visualize.

This is our initial table:
Alt Text
All values are currently initialized to zero. Notice some changes in the second column of the table.
The reason for that will be explained using the second row because the same process occurs.
Alt Text
The next task is to start comparing substrings and calculate distances. To fill up the second row of the table, perform the following operations:
Alt Text
Alt Text
Alt Text
Alt Text
Notice the way the numbers and colors are changing for each element.
This key will be used as a reference when traversing the table.
Alt Text
This means at any particular location table[x,y] where source[x]!=target[y], a replacement will cost table[x-1,y-1]+1, an insertion table[x,y-1]+1 and a deletion table[x-1,y] +1. The solution table[x,y] is the minumum of either of those values. When source[x]==target[y], table[x,y] will be min(table[x-1,y-1],table[x,y-1]+1,table[x-1,y] +1 ). The intuition behind previous statements will become clearer in the code and subsequent tables. See how to apply them in the table.
Alt Text
Intuition :
Alt Text

After all the iterations are completed, the global solution to the problem can be found at the bottom right coner of the table.
Alt Text

import numpy as np
def levenshtein(token_1,token_2):
    len_token_1 =len(token_1) +1
    len_token_2 = len(token_2) +1
    matrix  = np.zeros((len_token_1, len_token_2))

    for i in range(len_token_1):
        matrix[i,0] = i
    for j in range(len_token_2):
        matrix[0,j] = j

    for x in range(1,len_token_1):
        for y in range(1,len_token_2):
            if token_1[x-1] == token_2[y-1]:
                matrix [x,y] = min(
                    matrix[x-1,y-1],
                    matrix[x,y-1]+1,
                    matrix[x-1,y]+1
                )  
            else:
                matrix [x,y] = min(
                    matrix[x-1,y-1]+1,
                    matrix[x,y-1]+1,
                    matrix[x-1,y]+1
                )
    return matrix[len_token_1-1, len_token_2-1]
print(levenshtein("monday","friday"))

Time complexity : O(x*y)
Space complexity: O(x*y)

Dynamic programming helps us break big problems into smaller problems. There are other approaches to solving this problem. This algorithm can also be enhanced to give more information about the operation to be performed on either character.

If you have any questions, contact me @forthecode__. This can also help you understand better.


Print Share Comment Cite Upload Translate
APA
Bamimore-Tomi | Sciencx (2024-03-29T13:26:04+00:00) » Distance Measures: Levenshtein Distance. Retrieved from https://www.scien.cx/2021/04/26/distance-measures-levenshtein-distance/.
MLA
" » Distance Measures: Levenshtein Distance." Bamimore-Tomi | Sciencx - Monday April 26, 2021, https://www.scien.cx/2021/04/26/distance-measures-levenshtein-distance/
HARVARD
Bamimore-Tomi | Sciencx Monday April 26, 2021 » Distance Measures: Levenshtein Distance., viewed 2024-03-29T13:26:04+00:00,<https://www.scien.cx/2021/04/26/distance-measures-levenshtein-distance/>
VANCOUVER
Bamimore-Tomi | Sciencx - » Distance Measures: Levenshtein Distance. [Internet]. [Accessed 2024-03-29T13:26:04+00:00]. Available from: https://www.scien.cx/2021/04/26/distance-measures-levenshtein-distance/
CHICAGO
" » Distance Measures: Levenshtein Distance." Bamimore-Tomi | Sciencx - Accessed 2024-03-29T13:26:04+00:00. https://www.scien.cx/2021/04/26/distance-measures-levenshtein-distance/
IEEE
" » Distance Measures: Levenshtein Distance." Bamimore-Tomi | Sciencx [Online]. Available: https://www.scien.cx/2021/04/26/distance-measures-levenshtein-distance/. [Accessed: 2024-03-29T13:26:04+00:00]
rf:citation
» Distance Measures: Levenshtein Distance | Bamimore-Tomi | Sciencx | https://www.scien.cx/2021/04/26/distance-measures-levenshtein-distance/ | 2024-03-29T13:26:04+00:00
https://github.com/addpipe/simple-recorderjs-demo