This article shows how to use the IK analyzer (a Chinese word segmentation plugin) in Elasticsearch 5.x. The walkthrough is short and hands-on.
The IK analyzer releases are published at https://github.com/medcl/elasticsearch-analysis-ik/releases. The plugin version must match the Elasticsearch version.
Since our Elasticsearch is version 5.6.16, we download the matching 5.6.16 release:
https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.16/elasticsearch-analysis-ik-5.6.16.zip
Unpack the archive into the plugins directory of each ES node and rename the resulting directory to ik.
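A minimal sketch of these steps (the Elasticsearch install path is an assumption; adjust it to your environment):

cd /usr/local/elasticsearch/plugins        # the plugins directory of the ES node (path assumed)
mkdir ik && cd ik
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.16/elasticsearch-analysis-ik-5.6.16.zip
unzip elasticsearch-analysis-ik-5.6.16.zip # if the zip wraps everything in a top-level folder, move its contents up into plugins/ik
rm elasticsearch-analysis-ik-5.6.16.zip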
Restart ES to test the effect of IK word segmentation
(1) The effect without the IK analyzer (the default standard analyzer), using the sample text 安徽省长江流域 ("the Yangtze River basin in Anhui Province"):

GET _analyze?pretty
{ "text": "安徽省长江流域" }
The result:
{"tokens": [{"token": "an", "start_offset": 0, "end_offset": 1, "type": "," position ": 0}, {" token ":" emblem "," start_offset ": 1," end_offset ": 2," type ":", "position": 1} {"token": "province", "start_offset": 2, "end_offset": 3, "type": "", "position": 2}, {"token": "long", "start_offset": 3, "end_offset": 4, "type": "," position ": 3} {"token": "Jiang", "start_offset": 4, "end_offset": 5, "type": "," position ": 4}, {" token ":" flow "," start_offset ": 5," end_offset ": 6," type ":", "position": 5} {"token": "start_offset": 6, "end_offset": 7, "type": "," position ": 6}]}
As you can see, every character of 安徽省长江流域 becomes its own single-character token.
(2) The effect with the IK ik_smart analyzer:

GET _analyze?pretty
{ "analyzer": "ik_smart", "text": "安徽省长江流域" }
Result
{"tokens": [{"token": "Anhui Province", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0}, {"token": "Yangtze River Basin", "start_offset": 3, "end_offset": 7, "type": "CN_WORD" "position": 1}]}
(3) The effect with the IK ik_max_word analyzer:

GET _analyze?pretty
{ "analyzer": "ik_max_word", "text": "安徽省长江流域" }
Result
{"tokens": [{"token": "Anhui", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0}, {"token": "Anhui", "start_offset": 0, "end_offset": 2, "type": "CN_WORD" "position": 1}, {"token": "Governor", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 2}, {"token": "Yangtze River Basin", "start_offset": 3, "end_offset": 7, "type": "CN_WORD" "position": 3}, {"token": "Yangtze River", "start_offset": 3, "end_offset": 5, "type": "CN_WORD", "position": 4}, {"token": "River flow", "start_offset": 4, "end_offset": 6, "type": "CN_WORD" "position": 5}, {"token": "Watershed", "start_offset": 5, "end_offset": 7, "type": "CN_WORD", "position": 6}]}
Why can the IK analyzer recognize Chinese words? Because it ships with dictionaries in its config directory.
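For reference, the exact file set varies by release, but the plugin's config directory typically contains the main dictionary plus several specialized ones, for example:

plugins/ik/config/
  IKAnalyzer.cfg.xml   # plugin configuration (extension dictionaries are registered here)
  main.dic             # the main built-in dictionary of Chinese words
  quantifier.dic       # measure words / quantifiers
  stopword.dic         # stop words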
So what if we need to recognize new words that are not in the built-in dictionaries? For example, the TV series title 权力的游戏 ("Game of Thrones").
Custom dictionary
Create a tv directory under the IK plugin's config directory and create a tv.dic file in it, with one word per line (the file must be UTF-8 without BOM).
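A minimal sketch of this step (the plugin path is an assumption; adjust it to your install):

cd /usr/local/elasticsearch/plugins/ik/config  # IK plugin config directory (path assumed)
mkdir tv
printf '权力的游戏\n' > tv/tv.dic               # one word per line; printf emits UTF-8 without a BOM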
Then register the new dictionary in the IKAnalyzer.cfg.xml file in the same config directory.
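IKAnalyzer.cfg.xml is a Java properties XML file; registering the extension dictionary amounts to pointing its ext_dict entry at tv/tv.dic. A sketch showing only the relevant entries (paths are relative to the config directory; multiple files can be separated by semicolons):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- extension word dictionaries -->
    <entry key="ext_dict">tv/tv.dic</entry>
    <!-- extension stop word dictionaries -->
    <entry key="ext_stopwords"></entry>
</properties>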
Restart ES and Kibana to try the effect
GET _ analyze?pretty {"analyzer": "ik_smart", "text": "Game of Thrones"}
Word segmentation result
{"tokens": [{"token": "Game of Thrones", "start_offset": 0, "end_offset": 5, "type": "CN_WORD", "position": 0}]} the above is how to use the IK participle in elasticsearch 5.x. Have you learned any knowledge or skills? If you want to learn more skills or enrich your knowledge reserve, you are welcome to follow the industry information channel.