
How to use the IK analyzer in Elasticsearch 5.x

2025-02-21 Update From: SLTechnology News&Howtos (Shulou.com)

This article shows how to install and use the IK analyzer, a Chinese word-segmentation plug-in, in Elasticsearch 5.x, and how to extend it with a custom dictionary.

The IK analyzer releases are published at https://github.com/medcl/elasticsearch-analysis-ik/releases. The plug-in version must match the ES version.

Since ES here is version 5.6.16, we download IK 5.6.16:

https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.16/elasticsearch-analysis-ik-5.6.16.zip
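To keep the plug-in and ES versions in step, the download link can be derived from the ES version. The following is a minimal sketch; the helper name ik_release_url is hypothetical, and the URL pattern is inferred only from the single 5.6.16 link above, so other releases may not follow it.

```python
def ik_release_url(es_version: str) -> str:
    """Build the IK release URL for a given ES version.

    Assumption: every release follows the same naming pattern as 5.6.16.
    """
    base = "https://github.com/medcl/elasticsearch-analysis-ik/releases/download"
    return f"{base}/v{es_version}/elasticsearch-analysis-ik-{es_version}.zip"


print(ik_release_url("5.6.16"))
```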

After unpacking, place the extracted folder in the plugins directory of the ES node and rename it to ik.
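The unpack-and-rename step could be scripted roughly as follows. This is a sketch under assumptions: install_ik is a hypothetical helper, and the archive may or may not wrap its files in a single top-level folder, so the code handles both layouts.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_ik(zip_path, plugins_dir):
    """Extract the IK archive so that its files end up in <plugins_dir>/ik."""
    target = Path(plugins_dir) / "ik"
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        entries = list(Path(tmp).iterdir())
        # If the archive wraps everything in one folder, that folder becomes "ik";
        # otherwise the extracted files themselves are copied into "ik".
        wrapped = len(entries) == 1 and entries[0].is_dir()
        source = entries[0] if wrapped else Path(tmp)
        shutil.copytree(source, target)
    return target
```

A call such as install_ik("elasticsearch-analysis-ik-5.6.16.zip", "/path/to/elasticsearch/plugins") would then leave the plug-in in plugins/ik; both paths here are placeholders.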

Restart ES, then test the effect of IK word segmentation.

(1) The effect without the IK analyzer (the default standard analyzer is used)

GET _analyze?pretty
{"text": "安徽省长江流域"}

The sample text 安徽省长江流域 means "Yangtze River Basin in Anhui Province". The request returns:

{
  "tokens": [
    {"token": "安", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0},
    {"token": "徽", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1},
    {"token": "省", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2},
    {"token": "长", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3},
    {"token": "江", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4},
    {"token": "流", "start_offset": 5, "end_offset": 6, "type": "<IDEOGRAPHIC>", "position": 5},
    {"token": "域", "start_offset": 6, "end_offset": 7, "type": "<IDEOGRAPHIC>", "position": 6}
  ]
}

Each of the seven characters in "安徽省长江流域" becomes a separate token.
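The same _analyze call can also be issued outside Kibana. The sketch below uses only the Python standard library; the address http://localhost:9200 and the helper names are assumptions, and the sample text is the Chinese phrase 安徽省长江流域 ("Yangtze River Basin in Anhui Province").

```python
import json
import urllib.request


def build_analyze_request(text, analyzer=None, base_url="http://localhost:9200"):
    """Prepare an _analyze request; omit `analyzer` to use the default one."""
    body = {"text": text}
    if analyzer is not None:
        body["analyzer"] = analyzer
    return urllib.request.Request(
        f"{base_url}/_analyze?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, analyzer=None, base_url="http://localhost:9200"):
    """Send the request and return the parsed JSON (needs a running ES node)."""
    request = build_analyze_request(text, analyzer, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling analyze("安徽省长江流域") corresponds to the request without an analyzer, and analyze("安徽省长江流域", "ik_smart") selects an IK analyzer once the plug-in is installed.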

(2) The effect with the IK analyzer ik_smart

GET _analyze?pretty
{"analyzer": "ik_smart", "text": "安徽省长江流域"}

Result

{
  "tokens": [
    {"token": "安徽省", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0},
    {"token": "长江流域", "start_offset": 3, "end_offset": 7, "type": "CN_WORD", "position": 1}
  ]
}

ik_smart returns the coarsest segmentation: 安徽省 (Anhui Province) and 长江流域 (Yangtze River Basin).

(3) The effect with the IK analyzer ik_max_word

GET _analyze?pretty
{"analyzer": "ik_max_word", "text": "安徽省长江流域"}

Result

{
  "tokens": [
    {"token": "安徽省", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0},
    {"token": "安徽", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 1},
    {"token": "省长", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 2},
    {"token": "长江流域", "start_offset": 3, "end_offset": 7, "type": "CN_WORD", "position": 3},
    {"token": "长江", "start_offset": 3, "end_offset": 5, "type": "CN_WORD", "position": 4},
    {"token": "江流", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 5},
    {"token": "流域", "start_offset": 5, "end_offset": 7, "type": "CN_WORD", "position": 6}
  ]
}

ik_max_word emits every dictionary word it finds, including overlapping ones such as 省长 ("governor"), which spans the boundary between 安徽省 and 长江流域.
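To see what ik_max_word adds on top of ik_smart, the two responses can be compared token by token. The sketch below works on trimmed copies of the two results (only the token field is kept, for the Chinese sample text 安徽省长江流域); the helper names are hypothetical.

```python
def token_texts(response):
    """Return the token strings of an _analyze response, in order."""
    return [t["token"] for t in response["tokens"]]


def extra_tokens(smart, max_word):
    """Tokens produced by ik_max_word that ik_smart did not produce."""
    seen = set(token_texts(smart))
    return [t for t in token_texts(max_word) if t not in seen]


# Trimmed copies of the ik_smart and ik_max_word results.
smart = {"tokens": [{"token": "安徽省"}, {"token": "长江流域"}]}
max_word = {"tokens": [{"token": t} for t in
                       ["安徽省", "安徽", "省长", "长江流域", "长江", "江流", "流域"]]}

print(extra_tokens(smart, max_word))
# ['安徽', '省长', '长江', '江流', '流域']
```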

Why can the IK analyzer segment Chinese words? Because it ships with built-in dictionaries in its config directory.

So what if we need to recognize new words, for example the TV series "Game of Thrones" (权力的游戏)?

Custom dictionary

Create a tv directory under the config directory of the IK plug-in, and in it create a new tv.dic file that lists the new words, one per line (be sure to save it as UTF-8 without BOM).
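One way to be sure the dictionary has no byte-order mark is to write it from a script. This is a minimal sketch; write_dictionary is a hypothetical helper, and the demonstration writes into a throwaway directory, whereas a real install would target the tv/tv.dic path under the IK config directory.

```python
import tempfile
from pathlib import Path


def write_dictionary(path, words):
    """Write one word per line as UTF-8 without a BOM."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # "utf-8" writes no byte-order mark; "utf-8-sig" would prepend one.
    path.write_text("\n".join(words) + "\n", encoding="utf-8")
    return path


# Demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    dic = write_dictionary(Path(tmp) / "tv" / "tv.dic", ["权力的游戏"])
    print(dic.read_bytes().startswith(b"\xef\xbb\xbf"))  # False: no BOM
```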

Then register the new dictionary in the IKAnalyzer.cfg.xml file.
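The article does not reproduce the configuration itself. As a sketch under assumptions, the snippet below adds tv/tv.dic to the ext_dict entry of IKAnalyzer.cfg.xml, with multiple dictionaries separated by semicolons. The SAMPLE text only approximates the file shipped with the plug-in, the path is assumed to be relative to the IK config directory, and add_ext_dict is a hypothetical helper; check the result against your own file.

```python
import re

# Matches the extension-dictionary entry and captures its current value.
ENTRY = re.compile(r'(<entry key="ext_dict">)(.*?)(</entry>)', re.DOTALL)


def add_ext_dict(xml_text, dict_path):
    """Append dict_path to the ext_dict entry, leaving the rest untouched."""
    def repl(match):
        current = [p for p in match.group(2).strip().split(";") if p]
        if dict_path not in current:
            current.append(dict_path)
        return match.group(1) + ";".join(current) + match.group(3)

    new_text, count = ENTRY.subn(repl, xml_text, count=1)
    if count == 0:
        raise ValueError("no ext_dict entry found")
    return new_text


# Assumed layout of IKAnalyzer.cfg.xml (illustrative only).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <entry key="ext_dict"></entry>
    <entry key="ext_stopwords"></entry>
</properties>
"""

print(add_ext_dict(SAMPLE, "tv/tv.dic"))
```

A plain text substitution is used here instead of an XML library so that the DOCTYPE line and comments in the file are preserved as they are.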

Restart ES and Kibana, then try the effect:

GET _analyze?pretty
{"analyzer": "ik_smart", "text": "权力的游戏"}

Word segmentation result

{
  "tokens": [
    {"token": "权力的游戏", "start_offset": 0, "end_offset": 5, "type": "CN_WORD", "position": 0}
  ]
}

The title 权力的游戏 (Game of Thrones) is now kept as a single token. The above is how to use the IK analyzer in Elasticsearch 5.x.
