
An Elasticsearch word segmentation plug-in based on HanLP

2025-03-29 update. From: SLTechnology News&Howtos (Shulou.com), Internet Technology

Shulou (Shulou.com), 06/02 report

Abstract: Elasticsearch (ES) is a widely used distributed search engine. ES provides a single-character word segmentation tool, and the IK word segmentation plug-in is also widely used. HanLP is a natural language processing package that segments words more accurately by taking context semantics into account and by recognizing person names, place names, organization names, and so on.

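Comparison of analyzers in Elasticsearch (the request and output for each were shown as images that are not preserved here): the default analyzer, the IK analyzer, and the HanLP analyzer.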

The IK analyzer does not segment words according to the meaning of the sentence, whereas HanLP segments words correctly according to the meaning of the sentence.

Installation steps:

1. Go to https://github.com/pengcong90/elasticsearch-analysis-hanlp, download the plug-in, and decompress it into the plugins directory of ES. Then edit the hanlp.properties file in the analysis-hanlp directory and set the root property to the path of the data directory under analysis-hanlp.
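The edit to hanlp.properties in step 1 can also be scripted. The following is a minimal sketch, not part of the plug-in: the function name set_hanlp_root is hypothetical, and it assumes the file uses plain key=value lines with a single active root key.

```python
from pathlib import Path


def set_hanlp_root(properties_file: Path, root_value: str) -> None:
    """Set the `root` key of a hanlp.properties file, adding it if missing."""
    lines = properties_file.read_text(encoding="utf-8").splitlines()
    updated = []
    found = False
    for line in lines:
        # Only an active `root=...` line counts; `#root=...` comments are kept as-is.
        key = line.split("=", 1)[0].strip()
        if "=" in line and key == "root":
            updated.append(f"root={root_value}")
            found = True
        else:
            updated.append(line)
    if not found:
        updated.append(f"root={root_value}")
    properties_file.write_text("\n".join(updated) + "\n", encoding="utf-8")


# Usage (both paths are placeholders for your own installation):
# set_hanlp_root(Path("plugins/analysis-hanlp/hanlp.properties"),
#                "/path/to/elasticsearch/plugins/analysis-hanlp/data")
```

Forward slashes are the safer choice for the path value, since a backslash is an escape character in Java properties files.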

2. Edit the jvm.options file in the ES config directory and add the following as the last line:

-Djava.security.policy=../plugins/analysis-hanlp/plugin-security.policy
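As an illustration of step 2, the option can be appended with a small script so that it is not added twice on repeated runs. This is a sketch under assumptions: the helper name ensure_jvm_option is hypothetical, and the jvm.options path in the usage comment is a placeholder.

```python
from pathlib import Path

# The option line quoted in step 2.
POLICY_OPTION = "-Djava.security.policy=../plugins/analysis-hanlp/plugin-security.policy"


def ensure_jvm_option(jvm_options: Path, option: str = POLICY_OPTION) -> bool:
    """Append `option` as the last line unless it is already present.

    Returns True if the file was modified, False otherwise.
    """
    text = jvm_options.read_text(encoding="utf-8")
    if option in (line.strip() for line in text.splitlines()):
        return False
    # Make sure the new option starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    jvm_options.write_text(text + option + "\n", encoding="utf-8")
    return True


# Usage (placeholder path):
# ensure_jvm_option(Path("config/jvm.options"))
```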

Restart ES, then test whether the installation succeeded:

GET /_analyze?analyzer=hanlp-index&pretty=true
{
  "text": "Cecilia Cheung cake shop"
}
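The test request above can also be sent from a script. The sketch below uses only the Python standard library; the function names and the base URL http://localhost:9200 are assumptions, not something the article specifies. It passes the analyzer as a query parameter, as the article does; newer Elasticsearch versions expect the analyzer inside the JSON body instead.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_analyze_request(base_url: str, analyzer: str, text: str) -> Request:
    """Build the _analyze request shown in the article (analyzer as a query parameter)."""
    query = urlencode({"analyzer": analyzer, "pretty": "true"})
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/_analyze?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def analyze(base_url: str, analyzer: str, text: str) -> dict:
    """Send the request and return the parsed JSON response."""
    with urlopen(build_analyze_request(base_url, analyzer, text)) as resp:
        return json.load(resp)


# Usage (requires a running ES node with the plug-in installed; URL is assumed):
# result = analyze("http://localhost:9200", "hanlp-index", "Cecilia Cheung cake shop")
# print([t["token"] for t in result.get("tokens", [])])
```

If the plug-in is not loaded, the request should fail with an error about an unknown analyzer rather than return tokens, which makes this a quick installation check.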

The plug-in provides two analyzers: hanlp-index (index mode) and hanlp-smart (smart mode).

Custom dictionary

Edit the "my dictionary" .txt file (我的词典.txt in HanLP's stock data package) under plugins/analysis-hanlp/data/dictionary/custom.

Each line follows the format [word] [part of speech A] [frequency of A].

After editing, delete the CustomDictionary.txt.bin file in the same directory so that the cached dictionary is rebuilt.

Restart the ES service.
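The custom dictionary steps above can be sketched as a small helper that appends one entry and removes the cached .bin file. The function name add_custom_word, the dictionary file name passed in, and the sample word, part-of-speech tag, and frequency are all hypothetical; only the directory layout, the entry format, and the CustomDictionary.txt.bin name come from the article.

```python
from pathlib import Path


def add_custom_word(custom_dir: Path, dict_name: str, word: str, pos: str, freq: int) -> None:
    """Append a `[word] [part of speech] [frequency]` entry and drop the cached .bin file."""
    dict_file = custom_dir / dict_name
    existing = dict_file.read_text(encoding="utf-8") if dict_file.exists() else ""
    # Start a fresh line if the file does not already end with a newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with dict_file.open("a", encoding="utf-8") as fh:
        fh.write(f"{separator}{word} {pos} {freq}\n")
    # Removing the cache forces the dictionary to be rebuilt on the next start.
    cache = custom_dir / "CustomDictionary.txt.bin"
    if cache.exists():
        cache.unlink()


# Usage (placeholder values; restart the ES service afterwards):
# add_custom_word(Path("plugins/analysis-hanlp/data/dictionary/custom"),
#                 "my_dictionary.txt", "exampleword", "nz", 100)
```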
