2025-03-29 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to use Hutool-dfa, the DFA-based keyword matching module of the Java library Hutool.
This module focuses on keyword lookup.
Hutool-dfa usage documentation
1. Overview
Origin
At one of my first companies I was mainly responsible for the content business, and most of my work was cleaning and organizing content. Filtering by keyword is an unavoidable part of that cleaning. The requirement was as follows:
Back-office staff add N keywords, the entire content of the main site is then scanned, and every item that contains any of these keywords is marked as invalid.
Approach
My first plan for this requirement was crude: create a HashSet of the keywords, then traverse the entire database and, for each article, traverse the Set to check whether the article contains each keyword. I admit this is not a good approach: as the number of keywords and the amount of data grow, the time this takes grows rapidly, because every article has to be checked against every keyword.
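The crude plan described above can be sketched roughly as follows. This is only an illustration of the idea, not the original code: the class name `NaiveKeywordFilter`, the method `findInvalid`, and the use of in-memory lists in place of a database are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NaiveKeywordFilter {
    // Returns the articles that contain at least one keyword.
    // Every article is checked against every keyword, so the number of
    // contains() calls is up to (number of articles) x (number of keywords).
    public static List<String> findInvalid(List<String> articles, Set<String> keywords) {
        List<String> invalid = new ArrayList<>();
        for (String article : articles) {
            for (String keyword : keywords) {
                if (article.contains(keyword)) {
                    invalid.add(article);
                    break; // one hit is enough to invalidate the article
                }
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the database contents.
        Set<String> keywords = new HashSet<>(Arrays.asList("potato", "onion"));
        List<String> articles = Arrays.asList("I have a big potato", "nothing to see here");
        System.out.println(findInvalid(articles, keywords));
    }
}
```

Each `contains` call rescans the article from the start, which is why adding keywords makes every article more expensive to check.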
So I turned to Baidu and found an algorithm: DFA.
DFA introduction
DFA stands for Deterministic Finite Automaton. Theory of computation is not my strong suit, so interested readers can refer to this blog post: "A Brief Analysis of a Sensitive-Word Lookup Algorithm Based on DFA".
The principle is not hard to explain: build a tree out of all the keywords, then walk the tree with the text. Reaching a leaf node means that the keyword is present in the article.
Setting aside the time needed to build the keyword tree, each lookup over a text takes only O(n) time.
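A minimal sketch of the keyword tree idea, using only the JDK, might look like the following. It is not Hutool's implementation: the class `KeywordTreeSketch` and its methods are hypothetical names chosen for this example. One deliberate difference from the description above is that the sketch flags the node where a keyword ends instead of relying on leaf nodes, because one keyword can be a prefix of another.

```java
import java.util.HashMap;
import java.util.Map;

public class KeywordTreeSketch {
    // One node per character; 'end' marks that a keyword finishes at this node.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean end;
    }

    private final Node root = new Node();

    // Adds a keyword by creating (or reusing) one node per character.
    public void addWord(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.end = true;
    }

    // Walks the tree from every start position in the text and
    // returns true as soon as the end of any keyword is reached.
    public boolean containsAny(String text) {
        for (int i = 0; i < text.length(); i++) {
            Node node = root;
            for (int j = i; j < text.length(); j++) {
                node = node.children.get(text.charAt(j));
                if (node == null) {
                    break; // no keyword continues with this character
                }
                if (node.end) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        KeywordTreeSketch tree = new KeywordTreeSketch();
        tree.addWord("big");
        tree.addWord("potato");
        System.out.println(tree.containsAny("I have a big potato")); // prints true
        System.out.println(tree.containsAny("I have a carrot"));     // prints false
    }
}
```

The inner walk is bounded by the length of the longest keyword rather than by the number of keywords, so adding keywords does not add a pass over the text.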
Building on the DFA algorithm and several implementations found online, Hutool consolidated and improved them, and the result is the current Hutool-dfa module.
2. DFA lookup
1. Build the keyword tree
    WordTree tree = new WordTree();
    tree.addWord("big");
    tree.addWord("big potato");
    tree.addWord("potato");
    tree.addWord("just out of the pot");
    tree.addWord("out of the pot");
2. Look up keywords
    // body text
    String text = "I have a big potato, just out of the pot";
Case 1: standard match; match the shortest keyword and skip keywords that have already been matched
    // Once [big] is matched, matching stops there, so [big potato] is not matched.
    // Once [just out of the pot] is matched, those characters are skipped, so [out of the pot] is not matched
    // ("just" is reached first, so the longer keyword is matched; shortest match only picks the shortest among keywords that start at the same character).
    List<String> matchAll = tree.matchAll(text, -1, false, false);
    Assert.assertEquals(matchAll.toString(), "[big, potato, just out of the pot]");
Case 2: match the shortest keyword and do not skip keywords that have already been matched
    // [big] is matched; by the shortest-match rule [big potato] is skipped, and [potato] is still matched.
    // [just out of the pot] is matched; because matched text is not skipped, [out of the pot] is matched as well.
    matchAll = tree.matchAll(text, -1, true, false);
    Assert.assertEquals(matchAll.toString(), "[big, potato, just out of the pot, out of the pot]");
Case 3: match the longest keyword and skip keywords that have already been matched
    // [big] is matched; because of longest matching, [big potato] is matched next.
    // Because [big potato] is matched, [potato] is skipped; because [just out of the pot] is matched, [out of the pot] is skipped.
    matchAll = tree.matchAll(text, -1, false, true);
    Assert.assertEquals(matchAll.toString(), "[big, big potato, just out of the pot]");
Case 4: match the longest keyword and do not skip keywords that have already been matched (the most complete set of keywords)
    // [big] is matched; because of longest matching, [big potato] is matched next; because matched text is not skipped, [potato] is matched as well.
    // [just out of the pot] is matched; because matched text is not skipped, [out of the pot] is matched as well.
    matchAll = tree.matchAll(text, -1, true, true);
    Assert.assertEquals(matchAll.toString(), "[big, big potato, potato, just out of the pot, out of the pot]");
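To make the interaction of the two boolean flags easier to follow, here is a simplified, JDK-only sketch of how such a scan could work. It is an assumption-laden stand-in, not Hutool's source: the class `MatchModesSketch` and the parameter names `density` and `greedy` are hypothetical, and the limit argument (the `-1` above) and special-character handling are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MatchModesSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean end; // true if a keyword ends at this node
    }

    private final Node root = new Node();

    public void addWord(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.end = true;
    }

    // density: if false, scanning resumes after a matched keyword instead of inside it.
    // greedy:  if false, the walk from a start position stops at the first (shortest) hit.
    public List<String> matchAll(String text, boolean density, boolean greedy) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            int start = i; // keep the start fixed; i may be moved forward below
            Node node = root;
            for (int j = start; j < text.length(); j++) {
                node = node.children.get(text.charAt(j));
                if (node == null) {
                    break;
                }
                if (node.end) {
                    found.add(text.substring(start, j + 1));
                    if (!density) {
                        i = j; // skip the characters that were just matched
                    }
                    if (!greedy) {
                        break; // shortest match only
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        MatchModesSketch tree = new MatchModesSketch();
        tree.addWord("big");
        tree.addWord("big potato");
        tree.addWord("potato");
        tree.addWord("just out of the pot");
        tree.addWord("out of the pot");
        String text = "I have a big potato, just out of the pot";
        System.out.println(tree.matchAll(text, false, false));
        System.out.println(tree.matchAll(text, true, true));
    }
}
```

With the same keywords and text as in the article, this sketch returns the same four lists as cases 1 to 4, which suggests the flags can be read as "where does the next scan start" and "does the walk stop at the first hit".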
Besides matchAll, WordTree also provides two methods, match and isMatch, which only look for the first matching result. Matching stops as soon as the first keyword is found, which makes them much more efficient when a single hit is enough.
3. Special characters
Keywords in real text are often broken up by special characters, for example "☆ keyword". For this, Hutool provides a StopChar class for skipping special characters: they are ignored automatically when the match or matchAll method runs.
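One simple way to get a similar effect outside Hutool is to drop the special characters before matching. The sketch below only illustrates that idea: the class `StopCharSketch`, its methods, and the small set of stop characters are hypothetical and do not reproduce the list or the mechanism used by Hutool's StopChar.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class StopCharSketch {
    // Hypothetical stop set for illustration only: '\u2606' is the star, plus '*', '-' and space.
    private static final Set<Character> STOP_CHARS =
            new HashSet<>(Arrays.asList('\u2606', '*', '-', ' '));

    public static boolean isStopChar(char c) {
        return STOP_CHARS.contains(c);
    }

    // Returns the text with every stop character removed,
    // so that a keyword split by such characters becomes contiguous again.
    public static String strip(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!isStopChar(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String noisy = "key\u2606word"; // "key", a star, then "word"
        System.out.println(strip(noisy).contains("keyword")); // prints true
    }
}
```

Stripping up front changes character positions, so a matcher that has to report where a keyword occurred would skip stop characters during the scan instead.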