2025-01-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how to calculate the edit distance in Python. It covers the principle of the algorithm first, then a dynamic programming implementation, and finally the Levenshtein package.
Algorithm principle
Edit distance is often used when calculating the similarity of texts. Edit distance, also known as Levenshtein distance, is the minimum number of editing operations required to transform one string into another. Generally speaking, the smaller the edit distance, the more similar the two texts are. There are three editing operations:
Insert: insert a character into a string
Delete: delete a character in a string
Replace: replace one character in a string with another.
Let's look at an example.
What is the edit distance when changing the string batyu into beauty? The change takes the following steps:
1. batyu becomes beatyu (insert the character e)
2. beatyu becomes beaty (delete the character u)
3. beaty becomes beauty (insert the character u)
No shorter sequence of operations works, so the edit distance is 3.
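The three steps can be replayed in Python to confirm that they really turn batyu into beauty. This is only an illustrative sketch: the helper names insert_char and delete_char are hypothetical and are not part of the article's code.

```python
def insert_char(s, index, ch):
    """Return s with ch inserted before position index (hypothetical helper)."""
    return s[:index] + ch + s[index:]


def delete_char(s, index):
    """Return s with the character at position index removed (hypothetical helper)."""
    return s[:index] + s[index + 1:]


word = "batyu"
word = insert_char(word, 1, "e")  # batyu  -> beatyu
word = delete_char(word, 5)       # beatyu -> beaty
word = insert_char(word, 3, "u")  # beaty  -> beauty
print(word)  # beauty
```

Three operations are enough here, which gives an upper bound of 3 on the edit distance; the algorithm described next shows that 3 is also the minimum.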
So how do we calculate the edit distance in Python? We can start the analysis from the simplest cases.
When both strings are empty, the edit distance is 0.
When one of the strings is empty, the edit distance is the length of the other, non-empty string.
When both strings are non-empty (with lengths i and j respectively), take the minimum of the following three cases:
1. If the edit distance for strings of length i-1 and j is known, add 1 to it.
2. If the edit distance for strings of length i and j-1 is known, add 1 to it.
3. If the edit distance for strings of length i-1 and j-1 is known, consider two cases: if the i-th character and the j-th character are different, add 1; if they are the same, there is no need to add 1.
Building the answer from the answers to smaller subproblems in this way is dynamic programming.
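The cases above translate almost word for word into a recursive function. The following is a minimal sketch, assuming memoization with functools.lru_cache so that each pair of prefix lengths is computed only once; the name edit_distance_recursive is chosen here for illustration and does not come from the article.

```python
from functools import lru_cache


def edit_distance_recursive(str1, str2):
    """Edit distance via the three-case recursion, with memoization."""

    @lru_cache(maxsize=None)
    def edit(i, j):
        # Distance between the first i characters of str1
        # and the first j characters of str2.
        if i == 0 or j == 0:
            return i + j  # at least one prefix is empty
        d = 0 if str1[i - 1] == str2[j - 1] else 1
        return min(
            edit(i - 1, j) + 1,      # case 1: lengths i-1 and j
            edit(i, j - 1) + 1,      # case 2: lengths i and j-1
            edit(i - 1, j - 1) + d,  # case 3: lengths i-1 and j-1
        )

    return edit(len(str1), len(str2))


print(edit_distance_recursive("batyu", "beauty"))  # 3
```

The recursive form is convenient for short strings. For long inputs it can hit Python's recursion limit, which is one reason to prefer a table filled in with loops.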
To calculate the edit distance of two strings of lengths m and n, first define the function edit(i, j), which represents the edit distance between the first i characters of the first string and the first j characters of the second string. The dynamic programming recurrence can be written as follows:
If i = 0 and j = 0, edit(i, j) = 0
If (i = 0 and j > 0) or (i > 0 and j = 0), edit(i, j) = i + j
If i ≥ 1 and j ≥ 1, edit(i, j) = min{edit(i-1, j) + 1, edit(i, j-1) + 1, edit(i-1, j-1) + d(i, j)}, where d(i, j) = 1 when the i-th character of the first string is not equal to the j-th character of the second string, and d(i, j) = 0 otherwise.
The final edit distance is edit(m, n).
[Figure: edit matrix for the batyu and beauty example]
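The edit matrix for batyu and beauty can be regenerated with a few lines of Python. This is a sketch based on the recurrence for edit(i, j); the function name edit_matrix is an assumption made for this example.

```python
def edit_matrix(str1, str2):
    """Return the full table of edit(i, j) values as a list of rows."""
    rows, cols = len(str1) + 1, len(str2) + 1
    # Row 0 and column 0 hold i + j, the length of the non-empty prefix.
    table = [[i + j for j in range(cols)] for i in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            d = 0 if str1[i - 1] == str2[j - 1] else 1
            table[i][j] = min(table[i - 1][j] + 1,
                              table[i][j - 1] + 1,
                              table[i - 1][j - 1] + d)
    return table


table = edit_matrix("batyu", "beauty")
for row in table:
    print(" ".join(str(value) for value in row))
```

Each printed row corresponds to a prefix of batyu and each column to a prefix of beauty. The bottom-right cell is edit(m, n), which is 3 for this pair.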
Python code implementation
Talk is cheap. Show me the code. The Python code is extremely concise, which is the charm of dynamic programming:
def editdistance(str1, str2):
    '''
    Calculate the edit distance between the strings str1 and str2
    :param str1: the first string
    :param str2: the second string
    :return: the edit distance between str1 and str2
    '''
    # edit[i][j] is the distance between str1[:i] and str2[:j].
    # Row 0 and column 0 start out correct, since i + j is the length of the non-empty prefix.
    edit = [[i + j for j in range(len(str2) + 1)] for i in range(len(str1) + 1)]
    for i in range(1, len(str1) + 1):
        for j in range(1, len(str2) + 1):
            if str1[i - 1] == str2[j - 1]:
                d = 0
            else:
                d = 1
            edit[i][j] = min(edit[i - 1][j] + 1, edit[i][j - 1] + 1, edit[i - 1][j - 1] + d)
    return edit[len(str1)][len(str2)]
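Each row of the table depends only on the row directly above it, so the full matrix is not strictly needed when only the final distance matters. The following variant is a sketch of that idea, not the article's own code; the name edit_distance_two_rows is hypothetical.

```python
def edit_distance_two_rows(str1, str2):
    """Edit distance keeping only two rows of the table in memory."""
    previous = list(range(len(str2) + 1))  # row for the empty prefix of str1
    for i in range(1, len(str1) + 1):
        current = [i]  # distance from str1[:i] to the empty string
        for j in range(1, len(str2) + 1):
            d = 0 if str1[i - 1] == str2[j - 1] else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + d))
        previous = current
    return previous[-1]


print(edit_distance_two_rows("batyu", "beauty"))  # 3
```

The running time is the same as for the full table, but memory use drops from one cell per pair of prefixes to two rows of len(str2) + 1 cells.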
Expansion
So, since Python is so powerful, is there a package that calculates the edit distance?
The answer is yes. The Levenshtein package can be used to calculate the edit distance. Installation is simple and can be done directly with pip:
pip install python-Levenshtein
With the package installed, we can import it and calculate the edit distance directly:
import Levenshtein

str1 = 'batyu'
str2 = 'beauty'
print(Levenshtein.distance(str1, str2))
So, does the Levenshtein package offer other ways to calculate distance?
The package has several distance and similarity functions, including the following:
hamming(str1, str2): calculates the Hamming distance of the equal-length strings str1 and str2, that is, the number of positions at which the corresponding characters differ.
ratio(str1, str2): calculates the Levenshtein ratio with the formula r = (sum - ldist) / sum, where sum is the total length of str1 and str2 and ldist is a quasi edit distance. Note that in this quasi edit distance, deletion and insertion still cost 1, but replacement costs 2.
jaro(str1, str2), jaro_winkler(str1, str2) and so on.
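To make the descriptions of hamming and ratio concrete, here is a pure-Python sketch of both measures. These functions only mirror the definitions given above and are not the package's implementation, so edge-case behaviour may differ; the names hamming_sketch and ratio_sketch are hypothetical, and returning 1.0 for two empty strings is an assumption.

```python
def hamming_sketch(str1, str2):
    """Count the positions at which two equal-length strings differ."""
    if len(str1) != len(str2):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(str1, str2))


def ratio_sketch(str1, str2):
    """r = (sum - ldist) / sum, where a replacement costs 2 in ldist."""
    total = len(str1) + len(str2)
    if total == 0:
        return 1.0  # assumption: two empty strings count as identical
    previous = list(range(len(str2) + 1))
    for i in range(1, len(str1) + 1):
        current = [i]
        for j in range(1, len(str2) + 1):
            d = 0 if str1[i - 1] == str2[j - 1] else 2  # replacement costs 2
            current.append(min(previous[j] + 1,      # deletion costs 1
                               current[j - 1] + 1,   # insertion costs 1
                               previous[j - 1] + d))
        previous = current
    ldist = previous[-1]
    return (total - ldist) / total


print(hamming_sketch("karolin", "kathrin"))  # 3
print(ratio_sketch("batyu", "beauty"))       # 8/11, about 0.727
```

For batyu and beauty the quasi edit distance is 3, because the shortest path uses only insertions and deletions, and the total length is 11, so the ratio is (11 - 3) / 11.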
Summary
Dynamic programming can be used to calculate the edit distance between two strings.
The PyPI package python-Levenshtein can be used to calculate the edit distance between strings, as well as other types of distance.