2025-03-29 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how to install HanLP, a Chinese word-segmentation plugin, for Elasticsearch (ES).
The advantage of HanLP is that its dictionary data is relatively complete.
On GitHub there is a community-written plugin that adds HanLP support to ES:
https://github.com/pengcong90/elasticsearch-analysis-hanlp
Download its release package.
After downloading and extracting it and following its installation instructions, the hanlp.properties file cannot be found at runtime.
After cloning the source with git, it turns out that the path to the file is built incorrectly in the following class:
package org.elasticsearch.index.analysis;

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.utility.Predefine;
import com.hankcs.lucene4.HanLPIndexAnalyzer;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexSettings;

/**
 * Provides the HanLP analyzer to Elasticsearch.
 */
public class HanLPAnalyzerProvider extends AbstractIndexAnalyzerProvider<HanLPIndexAnalyzer> {

    private final HanLPIndexAnalyzer analyzer;

    // Working directory of the ES process
    private static String sysPath = String.valueOf(System.getProperties().get("user.dir"));

    @Inject
    public HanLPAnalyzerProvider(IndexSettings indexSettings, Environment env,
                                 @Assisted String name, @Assisted Settings settings) {
        super(indexSettings, name, settings);
        // original path
        // Predefine.HANLP_PROPERTIES_PATH = sysPath.substring(0, sysPath.length()-4) + "/plugins/analysis-hanlp/hanlp.properties";
        // correct path after modification
        Predefine.HANLP_PROPERTIES_PATH = sysPath + "/plugins/analysis-hanlp/hanlp.properties";
        analyzer = new HanLPIndexAnalyzer(true);
    }

    public static HanLPAnalyzerProvider getIndexAnalyzerProvider(IndexSettings indexSettings, Environment env,
                                                                 String name, Settings settings) {
        return new HanLPAnalyzerProvider(indexSettings, env, name, settings);
    }

    public static HanLPAnalyzerProvider getSmartAnalyzerProvider(IndexSettings indexSettings, Environment env,
                                                                 String name, Settings settings) {
        return new HanLPAnalyzerProvider(indexSettings, env, name, settings);
    }

    @Override
    public HanLPIndexAnalyzer get() {
        return this.analyzer;
    }
}
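The effect of the two path expressions can be compared in isolation. The following is a minimal sketch, not part of the plugin: the class and method names are illustrative, and the idea that the original code expected user.dir to end in "/bin" (four characters) is an assumption made only to explain the substring call.

```java
final class HanlpPathDemo {
    private static final String SUFFIX = "/plugins/analysis-hanlp/hanlp.properties";

    // Original logic: drop the last four characters of user.dir, then append the suffix.
    static String originalPath(String userDir) {
        return userDir.substring(0, userDir.length() - 4) + SUFFIX;
    }

    // Modified logic from the article: use user.dir unchanged.
    static String fixedPath(String userDir) {
        return userDir + SUFFIX;
    }

    public static void main(String[] args) {
        String home = "/opt/elasticsearch-5.5.1"; // install path used later in the article

        // Assumed case: ES started from $ES_HOME/bin, so stripping "/bin" yields $ES_HOME.
        System.out.println(originalPath(home + "/bin"));

        // ES started from $ES_HOME: the same substring cuts into the directory name.
        System.out.println(originalPath(home));

        // The modified expression is only correct when user.dir is $ES_HOME.
        System.out.println(fixedPath(home));
    }
}
```

Both variants depend on the working directory of the ES process, so the modified line is correct only when ES is started with $ES_HOME as its working directory.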
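The plugin is built against HanLP 1.2.8, while the latest version is 1.5.4, so change the dependency in pom.xml to:
<dependency>
    <groupId>com.hankcs</groupId>
    <artifactId>hanlp</artifactId>
    <version>portable-1.5.4</version>
</dependency>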
Then compile and package the plugin.
Create an analysis-hanlp directory under $ES_HOME/plugins.
The directory contains the following files.
hanlp.properties (you can take the file from the release download at https://github.com/hankcs/HanLP and only change the root path):
#Root directory for the paths in this configuration file; root directory + other path = full path (relative paths are supported, see https://github.com/hankcs/HanLP/pull/254)
#Windows users please note: always use / as the path separator
root=/opt/elasticsearch-5.5.1/plugins/analysis-hanlp
#Core dictionary path
CoreDictionaryPath=data/dictionary/CoreNatureDictionary.txt
#2-gram dictionary path
BiGramDictionaryPath=data/dictionary/CoreNatureDictionary.ngram.txt
#Stop word dictionary path
CoreStopWordDictionaryPath=data/dictionary/stopwords.txt
#Synonym dictionary path
CoreSynonymDictionaryDictionaryPath=data/dictionary/synonym/CoreSynonym.txt
#Person name dictionary path
PersonDictionaryPath=data/dictionary/person/nr.txt
#Person name dictionary transition matrix path
PersonDictionaryTrPath=data/dictionary/person/nr.tr.txt
#Root directory of the Traditional/Simplified Chinese dictionaries
tcDictionaryRoot=data/dictionary/tc
#Custom dictionary paths, separated by ;. A leading space means the file is in the same directory as the previous entry. The form "file name part-of-speech" sets the default part of speech for that dictionary. Priority decreases from left to right.
#In addition, data/dictionary/custom/CustomDictionary.txt is a high-quality dictionary, please do not delete it. All dictionaries use UTF-8 encoding.
CustomDictionaryPath=data/dictionary/custom/CustomDictionary.txt; 现代汉语补充词库.txt; 全国地名大全.txt ns; 人名词典.txt; 机构名词典.txt; 上海地名.txt ns;data/dictionary/person/nrf.txt nrf;
#CRF segmentation model path
CRFSegmentModelPath=data/model/segment/CRFSegmentModel.txt
#HMM segmentation model path
HMMSegmentModelPath=data/model/segment/HMMSegmentModel.bin
#Whether segmentation results show the part of speech
ShowTermNature=true
#IO adapter: implement the com.hankcs.hanlp.corpus.io.IIOAdapter interface to run HanLP on different platforms (Hadoop, Redis, etc.)
#The default IO adapter is shown below; it is based on the ordinary file system
#IOAdapter=com.hankcs.hanlp.corpus.io.FileIOAdapter
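The rule "root directory + other path = full path" can be checked before starting ES. The sketch below is illustrative only: it uses java.util.Properties rather than HanLP's own loader, the class and method names are made up, and inserting a single "/" between root and the relative path is an assumption about how the two parts should be joined.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class HanlpPropertiesDemo {

    // Parses properties text; lines starting with '#' are treated as comments.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Joins the root entry and a relative entry with exactly one "/" between them.
    static String resolve(Properties props, String key) {
        String root = props.getProperty("root", "").trim();
        String relative = props.getProperty(key, "").trim();
        if (!root.isEmpty() && !root.endsWith("/")) {
            root += "/";
        }
        return root + relative;
    }

    public static void main(String[] args) {
        String sample =
              "#Root directory\n"
            + "root=/opt/elasticsearch-5.5.1/plugins/analysis-hanlp\n"
            + "CoreDictionaryPath=data/dictionary/CoreNatureDictionary.txt\n";
        Properties props = parse(sample);
        // Prints the full path that must exist on disk for the core dictionary.
        System.out.println(resolve(props, "CoreDictionaryPath"));
    }
}
```

If the printed path does not exist on disk, the root entry or the location of the extracted data directory is the first thing to check.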
The plugin-descriptor.properties and plugin-security.policy files are adapted from the ones in the elasticsearch-analysis-hanlp release package.
Modify the ES startup options, then start ES:
vim /opt/elasticsearch-5.5.1/config/jvm.options
#Add the following line
-Djava.security.policy=/opt/elasticsearch-5.5.1/plugins/analysis-hanlp/plugin-security.policy
Test whether the installation succeeded with the following request:
GET /_analyze?analyzer=hanlp-index&pretty=true
{
  "text": "Ministry of Public Security: School buses everywhere will enjoy the highest right of way"
}
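The same check can be issued from Java. This is a rough sketch under several assumptions: ES is reachable at a base URL such as http://localhost:9200 (not stated in the article), Java 11 or later is available for java.net.http, and the request is sent as POST, which the _analyze endpoint also accepts. The class and method names are illustrative, and the JSON escaping only covers quotes and backslashes.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class HanlpAnalyzeDemo {

    // Minimal JSON string escaping: backslashes and double quotes only.
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the request body used in the article: {"text":"..."}
    static String body(String text) {
        return "{\"text\":\"" + escapeJson(text) + "\"}";
    }

    // Builds the _analyze URI with the analyzer name used in the article.
    static URI uri(String baseUrl) {
        return URI.create(baseUrl + "/_analyze?analyzer=hanlp-index&pretty=true");
    }

    // Sends the request and returns the raw response body.
    static String send(String baseUrl, String text) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri(baseUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(text)))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        String text = "Ministry of Public Security: School buses everywhere will enjoy the highest right of way";
        if (args.length > 0) {
            // Pass the base URL, e.g. http://localhost:9200 (assumed), to contact ES.
            System.out.println(send(args[0], text));
        } else {
            // Without arguments, only show what would be sent.
            System.out.println(uri("http://localhost:9200"));
            System.out.println(body(text));
        }
    }
}
```

A response listing tokens indicates that the hanlp-index analyzer was loaded; an error about an unknown analyzer suggests the plugin directory or the security policy setting needs another look.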
The data dictionary files can be downloaded from https://github.com/hankcs/HanLP/releases; simply extract the archive.
That is all for "how to install the Elasticsearch Chinese word-segmentation plugin HanLP". Thank you for reading; I hope it helps.