How to Calculate Edit Distance in Python
Today, we will learn about the edit distance in Python. We will also explore the character string’s insertion, deletion, substitution, and recursive implementation.
Edit Distance in Python
Edit distance is the amount required for the transposition of a String to another String. This transposition is done by recursion, substitution, insertion and deletion of the characters of a string.
The number of character deletions, insertions, and replacements required to turn one word into another depends on how closely the two words were edited.
It is also addressed as the Levenshtein
distance in Python because it does not need that two strings to be of identical length to be compared; the Levenshtein
distance has a significant influence.
The Levenshtein
distance is straightforward to comprehend intuitively. The total number of single-character modifications determines the distance between the two.
The output distance indicates how many modifications were required to make the two words identical. The smaller the output distance, the fewer changes will be required.
Let’s understand the edit distance using the following example code for insertion, deletion, substitution and recursion implementation of the character string.
Insertion of the Character String
In the following code, t
is inserted between i
and n
in the word machine.
Example Code:
w = "Machine"
w = w[:5] + "t" + w[5:]
print(w)
Output:
Machitne
Deletion of the Character String
In the following code, c
is deleted between a
and h
in the word machine.
Example Code:
w = "Machine"
w = w[:2] + w[3:]
print(w)
Output:
Mahine
Substitution of the Character String
In the following code, c
is replaced by a
between a
and h
in the word machine.
Example Code:
w = "Machine"
w = w[:2] + "a" + w[3:]
print(w)
Output:
Maahine
Recursive Implementation of the Character String
The edit distance between machine
and learning
is five, as shown in the output below.
We can take any string input from the user, and the Levenshtein
distance is implemented recursively in the following Python method.
This recursive technique is inefficient since the Levenshtein
distances of the identical substrings are computed repeatedly.
Example Code:
w1 = input("Type here for word 1: ")
w2 = input("Type here for word 2: ")
len1 = len(w1)
len2 = len(w2)
a = [[0] * (len2 + 1) for _ in range(len1 + 1)]
for m in range(0, len1 + 1):
a[m][0] = m
for n in range(0, len2 + 1):
a[0][n] = n
for m in range(1, len1 + 1):
for n in range(1, len2 + 1):
if w1[m - 1] == w2[n - 1]:
a[m][n] = a[m - 1][n - 1]
else:
a[m][n] = min(a[m][n - 1], a[m - 1][n], a[m - 1][n - 1]) + 1
print(a[m][n])
Output:
Type here for word 1: machine
Type here for word 2: learning
5
Zeeshan is a detail oriented software engineer that helps companies and individuals make their lives and easier with software solutions.
LinkedIn