2025-03-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
I have recently been learning Hadoop, and along the way I have also looked into natural language processing. There are many articles about natural language processing on the Internet; today I would like to share some notes on HanLP.
Natural language processing is a general term for all the techniques by which computers process natural language. Its purpose is to let computers understand and accept the instructions we enter in natural language, that is, to translate human language into an unambiguous form that a computer can understand. Given the current momentum of big data and artificial intelligence, rapid progress in natural language processing can in turn contribute to the development of artificial intelligence.
(Daxuai DKhadoop integrated development framework)
The HanLP I want to share here is the natural language processing technology I used while learning the DKhadoop big data integrated platform. With it, natural language processing tasks can be handled very efficiently, for example summarizing articles, semantic discrimination, and improving the accuracy and effectiveness of content retrieval.
I wanted to introduce HanLP through an accessible example, but I could not think of a good one offhand, so let's simply go through HanLP's data structures and then its word segmentation.
First, let's take a look at the data structures in HanLP:
Binary trie: A trie is a prefix-compression structure that can store a large number of strings compactly and provides get operations that are faster than Map. The trie in HanLP stores child nodes in an ordered array and finds them by binary search, which gives faster queries than TreeMap.
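To make the ordered-array idea concrete, here is a minimal Python sketch. It is not HanLP's Java implementation; the class names `Node` and `BinTrie` and the sample words are hypothetical. Each node keeps its child characters in a sorted list and locates a child with binary search.

```python
from bisect import bisect_left

class Node:
    def __init__(self):
        self.keys = []       # child characters, kept sorted
        self.children = []   # child nodes, parallel to self.keys
        self.value = None    # payload if a word ends at this node

    def child(self, ch):
        i = bisect_left(self.keys, ch)          # binary search
        if i < len(self.keys) and self.keys[i] == ch:
            return self.children[i]
        return None

    def add_child(self, ch):
        i = bisect_left(self.keys, ch)
        if i < len(self.keys) and self.keys[i] == ch:
            return self.children[i]
        node = Node()
        self.keys.insert(i, ch)                 # insertion keeps the order
        self.children.insert(i, node)
        return node

class BinTrie:
    def __init__(self):
        self.root = Node()

    def put(self, word, value):
        node = self.root
        for ch in word:
            node = node.add_child(ch)
        node.value = value

    def get(self, word):
        node = self.root
        for ch in word:
            node = node.child(ch)
            if node is None:
                return None
        return node.value

trie = BinTrie()
for w in ["hadoop", "hanlp", "hand"]:
    trie.put(w, "noun")
print(trie.get("hanlp"), trie.get("han"))
```

Shared prefixes such as "ha" are stored only once, which is the prefix compression described above.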
Double-array trie: Unlike an ordinary trie, in which the parent node stores references to its child nodes, the double-array trie turns the dependency between nodes into addition and check operations on integer character codes.
For a transition from state s to state t on input character c, the following conditions must hold:
base[s] + c = t
check[t] = s
For example:
base[No.1] + store = No.1 store
check[No.1 store] = No.1
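The two conditions can be tried out with a small Python sketch. This is a simplified builder and not HanLP's algorithm. It assumes the character code is `ord(ch)`, places the root at state 0, and tracks word ends in a separate `terminal` set; all function names are hypothetical.

```python
def build_double_array(words):
    # Step 1: an ordinary trie as nested dicts; the key "" marks a word end.
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[""] = True

    base, check, terminal = [0], [-1], set()   # state 0 is the root

    def ensure(size):
        while len(base) < size:
            base.append(0)
            check.append(-1)                   # -1 means "slot is free"

    # Step 2: for each state, find an offset b so that every child slot
    # b + code is still free, then record base[s] = b and check[t] = s.
    stack = [(0, root)]
    while stack:
        s, node = stack.pop()
        if "" in node:
            terminal.add(s)
        codes = [ord(ch) for ch in node if ch != ""]
        if not codes:
            continue
        b = 1
        while True:
            ensure(b + max(codes) + 1)
            if all(check[b + c] == -1 for c in codes):
                break
            b += 1
        base[s] = b
        for ch, child in node.items():
            if ch != "":
                t = b + ord(ch)
                check[t] = s
                stack.append((t, child))
    return base, check, terminal

def transition(base, check, s, ch):
    t = base[s] + ord(ch)                      # base[s] + c = t
    if t < len(check) and check[t] == s:       # check[t] = s
        return t
    return -1

def contains(dat, word):
    base, check, terminal = dat
    s = 0
    for ch in word:
        s = transition(base, check, s, ch)
        if s < 0:
            return False
    return s in terminal

dat = build_double_array(["he", "she", "his", "hers"])
print(contains(dat, "hers"), contains(dat, "her"))
```

A lookup costs one addition and one comparison per character, with no child list to search, which is why this layout is attractive for large dictionaries.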
AC automaton: Compared with a trie, which only performs prefix compression (the success table), the AC automaton also implements suffix compression (the output table).
When a match fails, the AC automaton jumps to the state from which a match is most likely to succeed (the fail pointer).
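The three tables can be sketched in Python as follows. This is a plain dictionary-based Aho-Corasick automaton for illustration, not HanLP's implementation; the names `build_ac` and `search` and the sample patterns are hypothetical.

```python
from collections import deque

def build_ac(patterns):
    goto = [{}]     # success table: goto[state][char] -> next state
    fail = [0]      # fail pointer of each state
    output = [[]]   # output table: patterns recognised at each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].append(p)
    # Breadth-first pass: a state's fail pointer leads to the longest
    # proper suffix of its path that is also a prefix of some pattern.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            output[t] = output[t] + output[fail[t]]   # inherit suffix matches
            queue.append(t)
    return goto, fail, output

def search(text, ac):
    goto, fail, output = ac
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]                 # mismatch: follow the fail pointer
        s = goto[s].get(ch, 0)
        for p in output[s]:
            hits.append((i - len(p) + 1, p))   # (start offset, pattern)
    return hits

ac = build_ac(["he", "she", "his", "hers"])
print(search("ushers", ac))
```

The text is scanned once from left to right; after reading "she", the fail pointer lets the automaton continue into "hers" without going back in the input.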
HanLP word segmentation
1. Dictionary word segmentation
Longest-match dictionary segmentation based on a double-array trie or ACDAT (an AC automaton built on a double-array trie): find all possible words in the dictionary and select the longest word at each position in sequence.
Output: [HanLP/noun, is it/null, especially/adverb, convenient/adjective, ?/null]
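A minimal Python sketch of the longest-match idea follows. It uses a plain `set` as a stand-in for the double-array trie, and the dictionary and sentence are made-up examples rather than HanLP data.

```python
def longest_match(text, dictionary):
    max_len = max(map(len, dictionary))
    words, i = [], 0
    while i < len(text):
        end = i + 1                       # fall back to a single character
        # Try the longest candidate first, then shorter ones.
        for j in range(min(len(text), i + max_len), i + 1, -1):
            if text[i:j] in dictionary:
                end = j
                break
        words.append(text[i:end])
        i = end
    return words

dictionary = {"商品", "和服", "服务", "和"}
print(longest_match("商品和服务", dictionary))
```

On this input, greedy matching picks "和服" (kimono) and leaves "务" stranded, although "商品 / 和 / 服务" (goods and services) is the intended reading. Ambiguities of this kind are what statistical segmenters are meant to resolve.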
2. NGram word segmentation
Count the bigrams in a corpus and, according to the transition probabilities between adjacent words, select the most probable segmentation of the sentence, thereby resolving ambiguity.
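The selection step can be sketched in Python as a dynamic program over all candidate segmentations. The vocabulary, the bigram probabilities, and the `floor` value for unseen word pairs are invented for illustration; a real system would estimate them from corpus counts.

```python
import math

def bigram_segment(text, vocab, bigram_prob, floor=1e-6):
    # best[i][w] = (log-probability, previous word) of the best path that
    # covers text[:i] and ends with the word w.
    best = [dict() for _ in range(len(text) + 1)]
    best[0]["<s>"] = (0.0, None)
    for i in range(len(text)):
        for prev, (score, _) in best[i].items():
            for j in range(i + 1, len(text) + 1):
                w = text[i:j]
                if j > i + 1 and w not in vocab:
                    continue              # single characters are always allowed
                cand = score + math.log(bigram_prob.get((prev, w), floor))
                if w not in best[j] or cand > best[j][w][0]:
                    best[j][w] = (cand, prev)
    # Backtrack from the best final word.
    w = max(best[-1], key=lambda k: best[-1][k][0])
    pos, words = len(text), []
    while w != "<s>":
        words.append(w)
        prev = best[pos][w][1]
        pos -= len(w)
        w = prev
    return words[::-1]

vocab = {"商品", "和服", "服务", "和"}
bigram_prob = {                            # hypothetical P(word | previous word)
    ("<s>", "商品"): 0.5,
    ("商品", "和"): 0.3,
    ("和", "服务"): 0.4,
    ("商品", "和服"): 0.01,
    ("和服", "务"): 0.01,
}
print(bigram_segment("商品和服务", vocab, bigram_prob))
```

With these numbers the path "商品 / 和 / 服务" scores higher than "商品 / 和服 / 务", so the transition probabilities resolve an ambiguity that pure longest matching would get wrong.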
3. HMM2 word segmentation
This is a generative model that builds words from characters, with sequence labeling provided by a second-order hidden Markov model.
The approach is known as the TnT tagger. It is characterized by using low-order events to smooth high-order events, which compensates for the data sparsity of high-order models.
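Smoothing high-order events with low-order ones is usually done by linear interpolation of unigram, bigram, and trigram estimates. The Python sketch below shows that idea under stated assumptions: the B/M/E/S tag set, the toy tag sequences, and the fixed weights in `lambdas` are illustrative (TnT estimates its weights from data), and the denominators are simplified.

```python
from collections import Counter

def train_counts(tag_sequences):
    uni, bi, tri = Counter(), Counter(), Counter()
    for tags in tag_sequences:
        for i, t in enumerate(tags):
            uni[t] += 1
            if i >= 1:
                bi[(tags[i - 1], t)] += 1
            if i >= 2:
                tri[(tags[i - 2], tags[i - 1], t)] += 1
    return uni, bi, tri

def smoothed_prob(t1, t2, t3, counts, lambdas=(0.1, 0.3, 0.6)):
    """Interpolated estimate of P(t3 | t1, t2)."""
    uni, bi, tri = counts
    total = sum(uni.values())
    p_uni = uni[t3] / total if total else 0.0
    p_bi = bi[(t2, t3)] / uni[t2] if uni[t2] else 0.0
    p_tri = tri[(t1, t2, t3)] / bi[(t1, t2)] if bi[(t1, t2)] else 0.0
    l1, l2, l3 = lambdas
    return l1 * p_uni + l2 * p_bi + l3 * p_tri

# Toy training data: one tag per character (hypothetical sequences).
counts = train_counts([list("BMES"), list("BES"), list("SS")])
print(smoothed_prob("B", "M", "E", counts))   # trigram seen in training
print(smoothed_prob("S", "S", "E", counts))   # trigram never seen
```

The unseen trigram still receives a small non-zero probability from the unigram term, so a single missing count does not zero out a whole tag sequence during decoding.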
4. CRF word segmentation
This model also builds words from characters, with sequence labeling provided by a CRF (conditional random field). Unlike the HMM, a CRF is a discriminative model rather than a generative one.
Compared with the HMM, the CRF has the advantage that it can use more features and handles OOV (out-of-vocabulary) words well; its disadvantages are that it takes up a lot of memory and is slow to decode.
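In character-based segmenters of this kind, the sequence labeler assigns one tag to each character and the words are then assembled from the tags. The Python sketch below shows only that last step. It assumes the common B/M/E/S scheme (begin, middle, end, single), which this article does not specify, and the tag string is written by hand rather than predicted by a model.

```python
def tags_to_words(chars, tags):
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):      # a word ends on E (end) or S (single)
            words.append(current)
            current = ""
    if current:                    # tolerate a sequence that ends mid-word
        words.append(current)
    return words

print(tags_to_words("商品和服务", "BESBE"))
```

Because the labeler works on characters rather than dictionary entries, it can emit a B...E span for a word it has never seen, which is why this family of models copes with out-of-vocabulary words.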