How to Implement Chinese Error Correction with a Fault-Tolerant Prefix Tree in Python

2025-01-14 Update From: SLTechnology News&Howtos

Shulou (Shulou.com), 05/31

Many readers are unfamiliar with how to implement Chinese error correction using a fault-tolerant prefix tree in Python. This article summarizes the approach step by step, with a complete listing and a worked example.

Introduction

This article implements a prefix tree (trie) in Python that supports queries tolerant of a bounded edit distance. The trie here stores only three entries, each in the form (word, frequency): ('中海晋西园', 2), ('中海西园', 24), and ('中南海', 4), romanized as Zhonghai Jin Xiyuan, Zhonghai Xiyuan, and Zhongnanhai. You can replace this data with the contents of your own file. A query takes a string and the maximum edit distance to tolerate.
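As a point of comparison, here is a minimal sketch of a plain prefix tree that stores (word, frequency) entries and supports only exact lookup. The names `Node`, `insert`, and `lookup` are illustrative and do not come from the article's listing, and the Chinese strings are an assumed rendering of the three sample entries.

```python
class Node:
    """One character of the trie; freq is set only where a stored word ends."""
    def __init__(self):
        self.children = {}   # maps a character to the child Node
        self.freq = None     # frequency of the word ending here, if any


def insert(root, word, freq):
    """Walk down the trie one character at a time, creating nodes as needed."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
    node.freq = freq


def lookup(root, word):
    """Return the stored frequency for an exact match, else None."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.freq


root = Node()
for word, freq in [('中海晋西园', 2), ('中海西园', 24), ('中南海', 4)]:
    insert(root, word, freq)

print(lookup(root, '中海西园'))   # exact match: prints 24
print(lookup(root, '中海东园'))   # one wrong character: prints None
```

Because an exact lookup misses as soon as one character differs, a fault-tolerant query has to explore sibling branches while spending a limited budget of edits.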

Implementation

class Word:
    """A stored entry: the word itself and its frequency."""
    def __init__(self, word, freq):
        self.word = word
        self.freq = freq


class Trie:
    def __init__(self):
        self.root = LetterNode('')
        self.START = 3

    def insert(self, word, freq):
        self.root.insert(word, freq, 0)

    def findAll(self, query, maxDistance):
        suggestions = self.root.recommend(query, maxDistance, self.START)
        # Deduplicate, then sort by ascending frequency.
        return sorted(set(suggestions), key=lambda x: x.freq)


class LetterNode:
    def __init__(self, char):
        # Codes for the last edit action taken on the path to this node.
        self.REMOVE = -1
        self.ADD = 1
        self.SAME = 0
        self.CHANGE = 2
        self.START = 3
        self.pointers = []   # child nodes
        self.char = char
        self.word = None     # set when a stored word ends at this node

    def charIs(self, c):
        return self.char == c

    def insert(self, word, freq, depth):
        # Space-separated input is treated as a sequence of tokens.
        if ' ' in word:
            word = word.split(' ')
        if depth < len(word):
            c = word[depth].lower()
            for child in self.pointers:
                if child.charIs(c):
                    return child.insert(word, freq, depth + 1)
            nextNode = LetterNode(c)
            self.pointers.append(nextNode)
            return nextNode.insert(word, freq, depth + 1)
        else:
            self.word = Word(word, freq)

    def recommend(self, query, movesLeft, lastAction):
        suggestions = []
        length = len(query)
        # Accept this node's word if the remaining moves cover the rest of the query.
        if length >= 0 and movesLeft - length >= 0 and self.word:
            suggestions.append(self.word)
        if movesLeft == 0 and length > 0:
            # No edits left: only an exact match may continue.
            for child in self.pointers:
                if child.charIs(query[0]):
                    suggestions += child.recommend(query[1:], movesLeft, self.SAME)
                    break
        elif movesLeft > 0:
            for child in self.pointers:
                if length > 0:
                    if child.charIs(query[0]):
                        suggestions += child.recommend(query[1:], movesLeft, self.SAME)
                    else:
                        # Substitute the current query character.
                        suggestions += child.recommend(query[1:], movesLeft - 1, self.CHANGE)
                        # Add a character that is missing from the query.
                        if lastAction != self.CHANGE and lastAction != self.REMOVE:
                            suggestions += child.recommend(query, movesLeft - 1, self.ADD)
                        # Remove one or two extra characters from the query.
                        if lastAction != self.ADD and lastAction != self.CHANGE:
                            if length > 1 and child.charIs(query[1]):
                                suggestions += child.recommend(query[2:], movesLeft - 1, self.REMOVE)
                            elif length > 2 and child.charIs(query[2]) and movesLeft == 2:
                                suggestions += child.recommend(query[3:], movesLeft - 2, self.REMOVE)
                else:
                    # Query exhausted: only adding characters can still match.
                    if lastAction != self.CHANGE and lastAction != self.REMOVE:
                        suggestions += child.recommend(query, movesLeft - 1, self.ADD)
        return suggestions


def buildTrieFromFile():
    trie = Trie()
    # (word, frequency) entries; romanized: Zhonghai Jin Xiyuan, Zhonghai Xiyuan, Zhongnanhai
    rows = [('中海晋西园', 2), ('中海西园', 24), ('中南海', 4)]
    for row in rows:
        trie.insert(row[0], int(row[1]))
    return trie


def suggestor(trie, s, maxDistance):
    if ' ' in s:
        s = s.split(' ')
    suggestions = trie.findAll(s, maxDistance)
    return [str(x.word) for x in suggestions]


if __name__ == "__main__":
    trie = buildTrieFromFile()
    r = suggestor(trie, '中海晋西园', 1)
    print(r)
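Despite its name, buildTrieFromFile in the listing hard-codes its rows. One possible way to read them from a file instead is sketched below. The helper `read_rows` and the file layout (UTF-8, one entry per line, word and frequency separated by a tab) are assumptions made for illustration; the article does not specify a format.

```python
import os
import tempfile


def read_rows(path):
    """Read (word, frequency) pairs from a tab-separated UTF-8 text file.

    Assumed layout: one entry per line, "<word>\t<frequency>".
    Lines that do not have exactly two fields are skipped.
    """
    rows = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip('\n').split('\t')
            if len(parts) != 2:
                continue  # blank or malformed line
            word, freq = parts
            rows.append((word.strip(), int(freq)))
    return rows


# Demonstration with a temporary file standing in for your own data.
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.tsv',
                                 delete=False) as tmp:
    tmp.write('中海晋西园\t2\n中海西园\t24\n\n中南海\t4\n')

rows = read_rows(tmp.name)
os.remove(tmp.name)
print(rows)
```

The returned list has the same shape as the hard-coded rows, so it could be passed to the same insert loop.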

Analysis

Printed result:

['中海晋西园', '中海西园']

中海晋西园 (Zhonghai Jin Xiyuan) is identical to the input string, so its edit distance is 0. That is within the maximum edit distance of 1, so it is returned.

中海西园 (Zhonghai Xiyuan) is the input with the character 晋 (Jin) removed. Its edit distance is 1, which is also within the limit, so it is returned as well.

The edit distance between 中南海 (Zhongnanhai) and the input 中海晋西园 is 4, which exceeds the maximum edit distance of 1, so it does not appear in the results.
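The distances quoted above can be checked independently of the trie. The sketch below assumes the standard Levenshtein definition of edit distance (insert, delete, or substitute one character); the function `edit_distance` is illustrative and not part of the article's listing, and the Chinese strings are an assumed rendering of the sample entries.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b, computed row by row."""
    prev = list(range(len(b) + 1))           # distance from '' to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]                            # distance from a[:i] to ''
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


query = '中海晋西园'
words = ['中海晋西园', '中海西园', '中南海']

for word in words:
    print(word, edit_distance(query, word))   # prints 0, 1 and 4 respectively

# Brute-force baseline: keep every word within the allowed distance.
within_one = [w for w in words if edit_distance(query, w) <= 1]
print(within_one)
```

Comparing the query with every stored word like this costs time proportional to the size of the dictionary, whereas the trie can abandon a whole branch once the edit budget runs out.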

That covers how to use a fault-tolerant prefix tree in Python to implement Chinese error correction.
