How to Achieve Elasticsearch Chinese Word Segmentation Integration
This article explains how to integrate Chinese word segmentation into Elasticsearch, walking through the installation and configuration of two analysis plugins, ik and mmseg.
For an index, the most important component is arguably the word segmenter. Generally speaking, the default smartcn analyzer in Elasticsearch does not segment Chinese text very well. Two better options are ik and mmseg. Both usages are described below, and the procedure is essentially the same for each: first install the plugin from the command line.
Install the ik plugin:

bin/plugin -install medcl/elasticsearch-analysis-ik/1.1.0

Download the ik dictionary and configuration files into the config directory, then:

unzip ik.zip
rm ik.zip
Word segmentation configuration

ik analyzer configuration, added to the elasticsearch.yml file:
index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
or, equivalently, as a flat property:

index.analysis.analyzer.ik.type: "ik"

Install the mmseg plugin:
bin/plugin -install medcl/elasticsearch-analysis-mmseg/1.1.0

Download the related configuration dictionary files into the config directory:

cd config
wget http://github.com/downloads/medcl/elasticsearch-analysis-mmseg/mmseg.zip --no-check-certificate
unzip mmseg.zip
rm mmseg.zip
mmseg analyzer configuration, also in the elasticsearch.yml file:
index:
  analysis:
    analyzer:
      mmseg:
        alias: [news_analyzer, mmseg_analyzer]
        type: org.elasticsearch.index.analysis.MMsegAnalyzerProvider
or, to make mmseg the default analyzer for the index:

index.analysis.analyzer.default.type: "mmseg"
Some more fine-grained settings for the mmseg tokenizer are as follows:
index:
  analysis:
    tokenizer:
      mmseg_maxword:
        type: mmseg
        seg_type: "max_word"
      mmseg_complex:
        type: mmseg
        seg_type: "complex"
      mmseg_simple:
        type: mmseg
        seg_type: "simple"
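Note that these entries only define tokenizers; to use one, it must be referenced from an analyzer. A minimal sketch of wiring the max_word variant into a custom analyzer (the analyzer name mmseg_maxword_analyzer is an assumption, not part of the original configuration):

index:
  analysis:
    analyzer:
      mmseg_maxword_analyzer:     # hypothetical name; references the tokenizer defined above
        type: custom
        tokenizer: mmseg_maxword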
After this configuration is in place, the plugin installation is complete; Elasticsearch will load the plugins on startup.
Define mapping
When adding a mapping for an index, you can specify the analyzers like this:
{"page": {"properties": {"title": {"type": "string", "indexAnalyzer": "ik", "searchAnalyzer": "ik"}, "content": {"type": "string", "indexAnalyzer": "ik" SearchAnalyzer: "ik"}
indexAnalyzer is the analyzer used at index time, and searchAnalyzer is the analyzer used at search time.
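For reference, such a mapping can also be applied over the REST API. A minimal sketch using the old (pre-5.x) endpoint style; the index name indexname is a placeholder and the field list is shortened:

curl -XPUT 'http://localhost:9200/indexname/page/_mapping' -d '{
  "page": {
    "properties": {
      "title": {"type": "string", "indexAnalyzer": "ik", "searchAnalyzer": "ik"}
    }
  }
}'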
The equivalent Java mapping code is as follows:
XContentBuilder content = XContentFactory.jsonBuilder()
    .startObject()
        .startObject("page")
            .startObject("properties")
                .startObject("title")
                    .field("type", "string")
                    .field("indexAnalyzer", "ik")
                    .field("searchAnalyzer", "ik")
                .endObject()
                .startObject("content")
                    .field("type", "string")
                    .field("indexAnalyzer", "ik")
                    .field("searchAnalyzer", "ik")
                .endObject()
            .endObject()
        .endObject()
    .endObject();
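To actually apply this builder, something along these lines should work with the Java client of that era; this is a sketch only, assuming a connected Client instance named client and an existing index named indexname (both hypothetical):

// Apply the mapping built above to the "page" type of index "indexname".
client.admin().indices()
      .preparePutMapping("indexname")
      .setType("page")
      .setSource(content)
      .execute().actionGet();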
You can test word segmentation by calling the _analyze API, as sketched below. Note that indexname is the index name; any existing index will do.
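A minimal sketch of the old (pre-5.x) _analyze syntax, with the analyzer passed as a query parameter and an arbitrary sample text as the request body:

curl 'http://localhost:9200/indexname/_analyze?analyzer=ik&pretty=true' -d '中华人民共和国'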
This is the end of "how to achieve elasticsearch Chinese word segmentation integration". Thank you for reading!