How to Implement Chinese Error Correction with a Fault-Tolerant Prefix Tree in Python

Updated 2025-01-14. Source: SLTechnology News&Howtos (shulou.com)

This article shows how to implement Chinese error correction in Python with a fault-tolerant prefix tree. It presents the implementation and then walks through the result of a sample query.

Introduction

This article implements a prefix tree (trie) in Python that supports queries with an edit-distance tolerance. For illustration, the tree stores only three entries, each in the form (word, frequency): ("中海晋西园", 2), ("中海西园", 24), and ("中南海", 4), romanized as Zhonghai Jinxi Yuan, Zhonghai Xi Yuan, and Zhongnanhai. The Chinese characters are reconstructed from the romanized names, since the edit distance is counted per character. You can replace this data with entries from your own file. A query takes a string and the maximum edit distance to tolerate.
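Before adding fault tolerance, it may help to see what storing (word, frequency) entries in a prefix tree looks like. The following is a minimal sketch using nested dictionaries rather than the node class used in this article; the names build_trie and words_with_prefix and the "$" end marker are illustrative assumptions, and the sample entries are written in Chinese characters reconstructed from the romanized names.

```python
def build_trie(entries):
    """Build a nested-dict prefix tree from (word, freq) pairs."""
    root = {}
    for word, freq in entries:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        # "$" marks the end of a stored word (assumes "$" never occurs in a word).
        node["$"] = (word, freq)
    return root


def words_with_prefix(root, prefix):
    """Return every stored (word, freq) pair whose word starts with prefix."""
    node = root
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    found = []
    stack = [node]
    while stack:  # walk the whole subtree below the prefix
        current = stack.pop()
        for key, value in current.items():
            if key == "$":
                found.append(value)
            else:
                stack.append(value)
    return sorted(found)


entries = [("中海晋西园", 2), ("中海西园", 24), ("中南海", 4)]
trie = build_trie(entries)
print(words_with_prefix(trie, "中海"))
```

This lookup is exact: a single wrong character in the prefix returns nothing, which is the limitation that the edit-distance tolerance addresses.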

Implementation

    class Word:
        """A stored entry: the word and its frequency."""

        def __init__(self, word, freq):
            self.word = word
            self.freq = freq


    class Trie:
        def __init__(self):
            self.root = LetterNode("")
            self.START = 3

        def insert(self, word, freq):
            self.root.insert(word, freq, 0)

        def findAll(self, query, maxDistance):
            suggestions = self.root.recommend(query, maxDistance, self.START)
            # De-duplicate, then order by ascending frequency.
            return sorted(set(suggestions), key=lambda x: x.freq)


    class LetterNode:
        def __init__(self, char):
            # Codes for the edit action taken on the previous step.
            self.REMOVE = -1
            self.ADD = 1
            self.SAME = 0
            self.CHANGE = 2
            self.START = 3
            self.pointers = []  # child nodes
            self.char = char
            self.word = None    # set when a stored word ends at this node

        def charIs(self, c):
            return self.char == c

        def insert(self, word, freq, depth):
            # Space-separated input is treated as a sequence of tokens.
            if " " in word:
                word = [i for i in word.split(" ")]
            if depth < len(word):
                c = word[depth].lower()
                for child in self.pointers:
                    if child.charIs(c):
                        return child.insert(word, freq, depth + 1)
                nextNode = LetterNode(c)
                self.pointers.append(nextNode)
                return nextNode.insert(word, freq, depth + 1)
            else:
                self.word = Word(word, freq)

        def recommend(self, query, movesLeft, lastAction):
            suggestions = []
            length = len(query)

            # A word ends here and the remaining query fits in the edit budget.
            if length >= 0 and movesLeft - length >= 0 and self.word:
                suggestions.append(self.word)

            if movesLeft == 0 and length > 0:
                # No edits left: only an exact match may continue.
                for child in self.pointers:
                    if child.charIs(query[0]):
                        suggestions += child.recommend(query[1:], movesLeft, self.SAME)
                        break

            elif movesLeft > 0:
                for child in self.pointers:
                    if length > 0:
                        if child.charIs(query[0]):
                            suggestions += child.recommend(query[1:], movesLeft, self.SAME)
                        else:
                            # Substitute the current query character.
                            suggestions += child.recommend(query[1:], movesLeft - 1, self.CHANGE)
                            # Add a character that the query is missing.
                            if lastAction != self.CHANGE and lastAction != self.REMOVE:
                                suggestions += child.recommend(query, movesLeft - 1, self.ADD)
                            # Remove one or two extra characters from the query.
                            if lastAction != self.ADD and lastAction != self.CHANGE:
                                if length > 1 and child.charIs(query[1]):
                                    suggestions += child.recommend(query[2:], movesLeft - 1, self.REMOVE)
                                elif length > 2 and child.charIs(query[2]) and movesLeft == 2:
                                    suggestions += child.recommend(query[3:], movesLeft - 2, self.REMOVE)
                    else:
                        # Query exhausted: keep adding characters while edits remain.
                        if lastAction != self.CHANGE and lastAction != self.REMOVE:
                            suggestions += child.recommend(query, movesLeft - 1, self.ADD)
            return suggestions


    def buildTrieFromFile():
        trie = Trie()
        # Sample data: Zhonghai Jinxi Yuan, Zhonghai Xi Yuan, Zhongnanhai.
        # The Chinese characters are reconstructed from the romanized names.
        rows = [("中海晋西园", 2), ("中海西园", 24), ("中南海", 4)]
        for row in rows:
            trie.insert(row[0], int(row[1]))
        return trie


    def suggestor(trie, s, maxDistance):
        if " " in s:
            s = [x for x in s.split(" ")]
        suggestions = trie.findAll(s, maxDistance)
        return [str(x.word) for x in suggestions]


    if __name__ == "__main__":
        trie = buildTrieFromFile()
        r = suggestor(trie, "中海晋西园", 1)
        print(r)
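Despite its name, buildTrieFromFile above fills the tree from a hardcoded list. The sketch below shows one way the rows could be read from a file instead. The file layout (UTF-8, one "word<TAB>frequency" pair per line) and the name load_rows are assumptions for illustration, not part of the original code.

```python
import csv
import os
import tempfile


def load_rows(path):
    """Read (word, freq) pairs from a UTF-8, tab-separated file."""
    rows = []
    with open(path, encoding="utf-8", newline="") as handle:
        for record in csv.reader(handle, delimiter="\t"):
            if len(record) < 2:
                continue  # skip blank or malformed lines
            rows.append((record[0].strip(), int(record[1])))
    return rows


# Demo: write a small sample file (hypothetical name), then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words.tsv")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("中海晋西园\t2\n中海西园\t24\n\n中南海\t4\n")
    rows = load_rows(path)

print(rows)
```

Each returned pair can then be passed to trie.insert(word, freq) in place of the hardcoded rows list.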

Analysis

Printed result:

['中海晋西园', '中海西园']

"中海晋西园" (Zhonghai Jinxi Yuan) is identical to the input string, so its edit distance is 0. That is within the maximum edit distance of 1, so it is returned.

"中海西园" (Zhonghai Xi Yuan) is the input string with the character "晋" (Jin) removed. Its edit distance is 1, which is also within the maximum, so it is returned as well.

The edit distance between "中南海" (Zhongnanhai) and the input string is 4, which exceeds the maximum edit distance of 1, so it does not appear in the results.
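The distances quoted above can be checked independently of the prefix tree. The sketch below uses the standard dynamic-programming form of Levenshtein distance, counted per character, and a plain linear scan over the vocabulary. It is a cross-check under those assumptions, not the traversal that the prefix tree performs, and the name edit_distance is illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, counted per character."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


query = "中海晋西园"
vocabulary = [("中海晋西园", 2), ("中海西园", 24), ("中南海", 4)]

distances = {word: edit_distance(query, word) for word, _ in vocabulary}
print(distances)

# Keep words within distance 1, ordered by ascending frequency as findAll does.
within_one = [
    word
    for word, freq in sorted(vocabulary, key=lambda entry: entry[1])
    if distances[word] <= 1
]
print(within_one)
```

A linear scan like this compares the query against every word, whereas the prefix tree shares work across words with a common prefix and stops exploring a branch once the edit budget is spent.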
