Shulou
2025-03-28 Update From: SLTechnology News&Howtos
This article explains how SpringBoot can use a prefix tree to filter sensitive words. It covers both the idea behind the algorithm and a complete implementation. I hope it helps readers who are interested in the topic.
1. Prefix tree
Websites often have features that let users post questions or other content. A key part of these features is filtering sensitive words. Without it, users could publish inappropriate information, or content mixed with code fragments that may be malicious. The basic algorithm for filtering sensitive words is the prefix tree, also called a dictionary tree. Matching against a prefix tree speeds up detection, because all sensitive words that share a prefix are checked in a single walk down the tree instead of one word at a time.
A prefix tree is also known as a trie, a dictionary tree, or a lookup tree. Its main trade-off is high search efficiency at the cost of high memory consumption. It is mainly used for string retrieval, word frequency counting, and string sorting.
What exactly is a prefix tree, and how does it work?
Here is a concrete example. Given the string "xwabfabcff" and the sensitive words "abc", "bf", and "be", implement an algorithm that scans the string and replaces each sensitive word it finds with "*".
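The sketch below makes the string-retrieval use case concrete. It is a minimal trie written in Java, the language of the article's code. `SimpleTrie` and its methods `insert`, `contains`, and `startsWith` are illustrative names, not part of the article or of any library.

Looking up a word costs time proportional to the word's length, no matter how many words are stored. The price is one map per node, which is where the extra memory goes.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal trie sketch; class and method names are hypothetical.
public class SimpleTrie {
    // Each node maps one character to its child node.
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWordEnd; // true if the path root -> this node spells a stored word
    }

    private final Node root = new Node();

    // Inserts a word, creating any missing nodes along its path.
    public void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWordEnd = true;
    }

    // Follows the path for s; returns the last node, or null if the path breaks.
    private Node walk(String s) {
        Node node = root;
        for (char c : s.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    // True only if s was inserted as a whole word.
    public boolean contains(String s) {
        Node node = walk(s);
        return node != null && node.isWordEnd;
    }

    // True if at least one inserted word starts with prefix.
    public boolean startsWith(String prefix) {
        return walk(prefix) != null;
    }

    public static void main(String[] args) {
        SimpleTrie trie = new SimpleTrie();
        trie.insert("abc");
        trie.insert("bf");
        trie.insert("be");
        System.out.println(trie.contains("abc"));  // true
        System.out.println(trie.contains("ab"));   // false: "ab" is only a prefix
        System.out.println(trie.startsWith("ab")); // true
        System.out.println(trie.contains("bx"));   // false
    }
}
```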
Characteristics of the prefix tree:
1. The root node is an empty node that holds no character.
2. Apart from the root, each node holds exactly one character.
3. The children of any node all hold different characters. For example, "bf" and "be" would both give the root a 'b' child, but only one 'b' node is kept and the two words share it.
4. The last node of each sensitive word carries an end mark. The mark means that the string spelled from the root down to this node is a sensitive word. The path from the root to an unmarked node does not form a sensitive word.
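Building and printing the tree for the example words "abc", "bf", and "be" makes the four characteristics visible. This sketch assumes a hypothetical `ExampleTree` class. A `TreeMap` is used only so that children print in sorted order.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo: builds the prefix tree for the example words and prints it.
public class ExampleTree {
    static final class Node {
        // TreeMap keeps children sorted so the printout is stable
        final Map<Character, Node> children = new TreeMap<>();
        boolean keywordEnd;
    }

    static Node build(String... words) {
        Node root = new Node(); // characteristic 1: the root holds no character
        for (String w : words) {
            Node node = root;
            for (char c : w.toCharArray()) {
                // characteristic 3: reuse an existing child instead of adding a duplicate
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.keywordEnd = true; // characteristic 4: mark the end of a sensitive word
        }
        return root;
    }

    // Depth-first print; "(end)" marks nodes that finish a sensitive word.
    static void print(Node node, String indent, StringBuilder out) {
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            out.append(indent).append(e.getKey());
            if (e.getValue().keywordEnd) {
                out.append(" (end)");
            }
            out.append('\n');
            print(e.getValue(), indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        print(build("abc", "bf", "be"), "", out);
        System.out.print(out);
        // Prints:
        // a
        //   b
        //     c (end)
        // b
        //   e (end)
        //   f (end)
    }
}
```

In the output, "bf" and "be" share a single 'b' node. Only c, e, and f carry the end mark, so "ab" or "b" on its own is not a sensitive word.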
How the prefix tree algorithm works:
1. Preparation: we need three pointers. Pointer ① points into the prefix tree and starts at the root node. Pointers ② and ③ point into the string being checked, and both start at its first character. This is a same-direction two-pointer technique: ② moves from beginning to end and marks the start of a possible sensitive word, while ③ moves ahead of ② and marks its end. We also need a StringBuilder to hold the result.
2. ① checks the first level of the tree and finds no 'x', so ② and ③ both move one step forward and 'x' is appended to the StringBuilder. The same happens for 'w'.
3. Now ② and ③ point to 'a'. ① finds 'a' in the first level of the tree, but that node is not marked, so no sensitive word has been matched yet. ② stays put while ① and ③ keep moving down until they reach either a marked node or a mismatch. Here 'a' is followed by 'b', which matches, but then by 'f', which does not. So no sensitive word starts at 'a': 'a' is appended to the StringBuilder, ① returns to the root, ② moves one step forward, and ③ moves back to where ② now points. These steps then repeat.
4. When a sensitive word is detected, "*" is appended to the StringBuilder and ② skips over the word. ② and ③ both move to the position just after where ③ was, and ① returns to the root.
5. When ② and ③ reach the end of the string, detection is complete. Each sensitive word ("bf" and "abc") becomes one "*", so the final result is "xwa**ff".
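Steps 1-5 can be sketched as a standalone Java class without Spring. `ThreePointerFilter` is a hypothetical name, and symbol skipping is left out. The variables `node`, `begin`, and `position` play the roles of pointers ①, ②, and ③.

Running off the end of the text in the middle of a partial match is treated as a mismatch. This means a word that starts inside an unfinished match is still found, for example "b" in "xab" when the words are "abc" and "b".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone sketch of the three-pointer scan (no symbol skipping).
public class ThreePointerFilter {
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean keywordEnd;
    }

    private final Node root = new Node();

    public ThreePointerFilter(String... words) {
        for (String w : words) {
            Node node = root;
            for (char c : w.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.keywordEnd = true;
        }
    }

    public String filter(String text) {
        Node node = root;  // pointer 1: current node in the tree
        int begin = 0;     // pointer 2: start of the candidate word
        int position = 0;  // pointer 3: character being compared
        StringBuilder sb = new StringBuilder();
        while (begin < text.length()) {
            // Running off the end of the text counts as a mismatch.
            Node next = position < text.length()
                    ? node.children.get(text.charAt(position))
                    : null;
            if (next == null) {
                // No sensitive word starts at begin: keep that character,
                // advance pointer 2, and reset pointers 3 and 1.
                sb.append(text.charAt(begin));
                position = ++begin;
                node = root;
            } else if (next.keywordEnd) {
                // text[begin..position] is a sensitive word: emit one "*" and skip it.
                sb.append('*');
                begin = ++position;
                node = root;
            } else {
                // Partial match: pointers 1 and 3 move down together.
                node = next;
                position++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ThreePointerFilter f = new ThreePointerFilter("abc", "bf", "be");
        System.out.println(f.filter("xwabfabcff")); // xwa**ff
    }
}
```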
2. Sensitive word filter
When developing a project, we need a reusable tool for filtering sensitive words, a sensitive word filter, so that it can be reused throughout the project.
Developing the sensitive word filter takes three main steps:
1. Define the prefix tree
2. Initialize the prefix tree from the sensitive words
3. Write the method that filters sensitive words
The code is implemented as follows:
```java
import org.apache.commons.lang3.CharUtils;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

@Component
public class SensitiveFilter {

    // Logger
    private static final Logger logger = LoggerFactory.getLogger(SensitiveFilter.class);

    // Replacement emitted for each sensitive word
    private static final String REPLACEMENT = "*";

    // Root node of the prefix tree
    private final TrieNode rootNode = new TrieNode();

    /**
     * 2. Initialize the prefix tree from the sensitive words.
     * When the server starts, the container instantiates this bean, calls its
     * constructor, and then calls this @PostConstruct method automatically.
     */
    @PostConstruct
    public void init() {
        try (
            // Load the sensitive word file; sensitive-words.txt is a self-created file, one word per line
            InputStream is = this.getClass().getClassLoader().getResourceAsStream("sensitive-words.txt");
            // Byte stream -> character stream -> buffered character stream (UTF-8 so non-ASCII words load correctly)
            BufferedReader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))
        ) {
            String keyword;
            while ((keyword = reader.readLine()) != null) {
                // Add the word to the prefix tree (addKeyword is a helper defined below)
                this.addKeyword(keyword);
            }
        } catch (IOException e) {
            logger.error("Failed to load sensitive word file: " + e.getMessage());
        }
    }

    // Helper: add one sensitive word to the prefix tree
    private void addKeyword(String keyword) {
        TrieNode tempNode = rootNode;
        for (int i = 0; i < keyword.length(); i++) {
            char c = keyword.charAt(i);
            TrieNode subNode = tempNode.getSubNode(c);
            if (subNode == null) {
                // No child for this character yet: create one and attach it to the tree
                subNode = new TrieNode();
                tempNode.addSubNode(c, subNode);
            }
            // Move to the child node for the next iteration
            tempNode = subNode;
            // Set the end marker on the last character
            if (i == keyword.length() - 1) {
                tempNode.setKeywordEnd(true);
            }
        }
    }

    /**
     * 3. Find and filter sensitive words.
     * @param text the text to filter
     * @return the filtered text
     */
    public String filter(String text) {
        if (StringUtils.isBlank(text)) {
            return null;
        }
        // Pointer ①
        TrieNode tempNode = rootNode;
        // Pointer ②
        int begin = 0;
        // Pointer ③
        int position = 0;
        // Holds the result
        StringBuilder sb = new StringBuilder();

        while (begin < text.length()) {
            if (position < text.length()) {
                char c = text.charAt(position);
                // Skip symbols
                if (isSymbol(c)) {
                    // If pointer ① is at the root, keep the symbol and move pointer ② one step
                    if (tempNode == rootNode) {
                        sb.append(c);
                        begin++;
                    }
                    // Pointer ③ always moves one step, whether or not a match is in progress
                    // (outside a match it moves together with ②; inside a match ② stays put)
                    position++;
                    continue;
                }
                // Check the child node
                tempNode = tempNode.getSubNode(c);
                if (tempNode == null) {
                    // The string starting at begin is not a sensitive word
                    sb.append(text.charAt(begin));
                    // Move to the next position
                    begin++;
                    position = begin;
                    // Reset pointer ① to the root
                    tempNode = rootNode;
                } else if (tempNode.isKeywordEnd()) {
                    // Found a sensitive word: replace the string from begin to position
                    sb.append(REPLACEMENT);
                    // Move to the next position
                    position++;
                    begin = position;
                    // Reset pointer ① to the root
                    tempNode = rootNode;
                } else {
                    // Check the next character
                    position++;
                }
            } else {
                // Pointer ③ reached the end during a partial match, so no sensitive word
                // starts at begin: keep that character and restart the scan from begin + 1
                sb.append(text.charAt(begin));
                begin++;
                position = begin;
                tempNode = rootNode;
            }
        }
        return sb.toString();
    }

    // Helper: decide whether a character is a special symbol
    private boolean isSymbol(Character c) {
        // 0x2E80~0x9FFF is the East Asian script range; those characters are not treated as symbols
        return !CharUtils.isAsciiAlphanumeric(c) && (c < 0x2E80 || c > 0x9FFF);
    }

    /**
     * 1. Define the prefix tree
     */
    private static class TrieNode {
        // End marker for a sensitive word (keyword)
        private boolean isKeywordEnd = false;
        // Child nodes (key: the next character, value: the child node)
        private final Map<Character, TrieNode> subNodes = new HashMap<>();

        public boolean isKeywordEnd() {
            return isKeywordEnd;
        }

        public void setKeywordEnd(boolean keywordEnd) {
            isKeywordEnd = keywordEnd;
        }

        // Add a child node
        public void addSubNode(Character c, TrieNode node) {
            subNodes.put(c, node);
        }

        // Get a child node
        public TrieNode getSubNode(Character c) {
            return subNodes.get(c);
        }
    }
}
```

What is SpringBoot? SpringBoot is a framework built on top of Spring, designed to simplify the initial setup and development of new Spring applications. Its main contribution is reducing the amount of configuration those applications need.
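The symbol-skipping branch in `filter` stops users from dodging the filter by inserting symbols between characters, for example writing "a☆b☆c" instead of "abc". Below is a standalone sketch of the same scan with that branch included.

It assumes plain Java instead of commons-lang3. The helper `isSymbol` re-implements the `CharUtils.isAsciiAlphanumeric` check by hand, and the class name `SymbolSkipDemo` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone demo of the symbol-skipping scan; not the article's class.
public class SymbolSkipDemo {
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean keywordEnd;
    }

    private final Node root = new Node();

    SymbolSkipDemo(String... words) {
        for (String w : words) {
            Node node = root;
            for (char c : w.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.keywordEnd = true;
        }
    }

    // Same rule as the article's isSymbol: not an ASCII letter or digit,
    // and outside the East Asian range 0x2E80-0x9FFF.
    static boolean isSymbol(char c) {
        boolean asciiAlnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
        return !asciiAlnum && (c < 0x2E80 || c > 0x9FFF);
    }

    String filter(String text) {
        Node node = root;
        int begin = 0;
        int position = 0;
        StringBuilder sb = new StringBuilder();
        while (begin < text.length()) {
            if (position < text.length() && isSymbol(text.charAt(position))) {
                if (node == root) {  // outside a match: keep the symbol
                    sb.append(text.charAt(position));
                    begin++;
                }
                position++;          // inside a match: the symbol is skipped
                continue;
            }
            Node next = position < text.length() ? node.children.get(text.charAt(position)) : null;
            if (next == null) {
                sb.append(text.charAt(begin));
                position = ++begin;
                node = root;
            } else if (next.keywordEnd) {
                sb.append('*');
                begin = ++position;
                node = root;
            } else {
                node = next;
                position++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SymbolSkipDemo f = new SymbolSkipDemo("abc");
        System.out.println(f.filter("x☆a☆b☆c☆y")); // x☆*☆y
        System.out.println(f.filter("a☆x"));       // a☆x
    }
}
```

Symbols inside a matched word disappear into its single "*", while symbols outside any match are kept. Under this rule a space also counts as a symbol, so "a b c" is filtered to "*" as well.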
That concludes this look at how SpringBoot can use a prefix tree to filter sensitive words. I hope the content above is useful.