This article walks through the main knowledge points of Elasticsearch: what it is, its basic concepts, how to set up an environment with Docker, basic and advanced retrieval, and Chinese word segmentation.
Preface
In our projects, we routinely use the Kibana interface to search the logs of the test or production environment and check for abnormal information. Kibana is the K in what we usually call ELK.
The Kibana interface is shown in the following figure:
Kibana interface
But how does this log retrieval actually work? That is where the Elasticsearch search engine comes in.
I. A brief introduction to Elasticsearch
1.1 What is Elasticsearch?
Elasticsearch is a distributed open source search and analysis engine for all types of data, including text, numbers, geospatial, structured and unstructured data. To put it simply, ES can do anything related to search and analysis.
1.2 What is Elasticsearch used for?
Elasticsearch performs well in terms of speed and scalability, and has the ability to index many types of content, which means it can be used in a variety of use cases:
For example, an online store where you can allow customers to search for the products you sell. In this case, you can use Elasticsearch to store the entire product catalog and inventory and provide them with search and auto-completion suggestions.
Search for mobile phones
For example, collect log or transaction data, and analyze and mine this data to find trends, statistics, summaries, or anomalies. In this case, you can use Logstash (part of the Elasticsearch / Logstash / Kibana stack) to collect, aggregate, and parse the data, and then have Logstash provide the data to Elasticsearch. After the data is put into Elasticsearch, you can run searches and aggregations to mine any information you are interested in.
1.3 How does Elasticsearch work?
ELK schematic diagram
Elasticsearch is built on top of Lucene, with many enhancements added on top of it.
Lucene is a sub-project of the Apache Software Foundation's Jakarta project: an open-source full-text search engine toolkit. It is not a complete full-text search engine but rather a full-text search engine library, providing a complete query engine and indexing engine as well as some text analysis engines (for the two Western languages English and German). The purpose of Lucene is to offer software developers a simple, easy-to-use toolkit for conveniently implementing full-text retrieval in a target system, or to serve as the basis for building a complete full-text search engine. (From Baidu Encyclopedia)
Where does the raw data of Elasticsearch come from?
Raw data flows into Elasticsearch from multiple sources, including logs, system metrics, and web applications.
How is data ingested into Elasticsearch?
Data ingestion refers to the process of parsing, normalizing, and enriching the raw data before it is indexed in Elasticsearch. Once the data is indexed, users can run complex queries against it and use aggregations to retrieve complex summaries of their data. Logstash is used here, and it is described later.
How do you visually view the data you want to retrieve?
This is where Kibana comes in: users can search their own data, view data visualizations, and so on.
1.4 What is an Elasticsearch index?
An Elasticsearch index is a collection of documents that are related to each other. Elasticsearch stores data as JSON documents. Each document establishes a relationship between a set of keys (names of fields or properties) and their corresponding values (strings, numbers, Booleans, dates, arrays of values, geolocations, or other types of data).
Elasticsearch uses a data structure called inverted index, which is designed to allow full-text search to be done very quickly. The inverted index lists each unique word that appears in all documents, and all documents that contain each word can be found.
During the indexing process, Elasticsearch stores the document and builds an inverted index so that users can search the document data in near real time. The indexing process is initiated in the index API, which allows you to add JSON documents to or change JSON documents in a specific index.
1.5 What is Logstash used for?
Logstash is the L of ELK.
Logstash is one of the core products of Elastic Stack, which can be used to aggregate and process data and send it to Elasticsearch. Logstash is an open source server-side data processing pipeline that allows you to collect, enrich and transform data from multiple sources at the same time before indexing it to Elasticsearch.
1.6 What is Kibana used for?
Kibana is a data visualization and management tool for Elasticsearch that provides real-time histograms, line charts, and more.
1.7 Why use Elasticsearch?
ES is a fast, near real-time search platform.
ES is distributed in nature.
ES includes a wide range of functions, such as data aggregation and index lifecycle management.
Official document: https://www.elastic.co/cn/what-is/elasticsearch
II. Basic concepts of ES
2.1 Index (Index)
Verb: equivalent to insert in MySQL.
Noun: equivalent to a database in MySQL.
Comparison with MySQL:
No. | MySQL | Elasticsearch
1 | MySQL service | ES cluster service
2 | Database | Index
3 | Table | Type
4 | Record (row) | Document (JSON format)
2.2 Inverted index
If the database has the following movie records:
1 - A Chinese Odyssey
2 - A Chinese Odyssey Side Story
3 - Analysis of A Chinese Odyssey
4 - Journey to the West: Conquering the Demons
5 - Exclusive Analysis of Fantasy Westward Journey
Word segmentation: splitting a whole sentence into individual words.
No. | Word saved in ES | Record numbers of the movies containing it
A | Journey to the West (西游) | 1, 2, 3, 4, 5
B | A Chinese Odyssey (大话) | 1, 2, 3
C | Side Story (外传) | 2
D | Analysis (解析) | 3, 5
E | Conquering the Demons (降魔) | 4
F | Fantasy (梦幻) | 5
G | Exclusive (独家) | 5
Search: Exclusive A Chinese Odyssey (独家大话西游).
The search phrase is segmented into three words: Exclusive (独家), A Chinese Odyssey (大话), and Journey to the West (西游).
These three words correspond to entries A, B, and G in the inverted index, so records 1, 2, 3, 4, and 5 are all hit.
Record 1 hits A and B (2 hits) and contains 2 words; relevance score: 2 hits / 2 words = 1.
Record 2 hits A and B (2 hits) and contains 3 words; relevance score: 2 hits / 3 words = 0.67.
Record 3 hits A and B (2 hits) and contains 3 words; relevance score: 2 hits / 3 words = 0.67.
Record 4 hits A (1 hit) and contains 3 words; relevance score: 1 hit / 3 words = 0.33.
Record 5 hits A and G (2 hits) and contains 4 words; relevance score: 2 hits / 4 words = 0.5.
So the order of the retrieved records is as follows:
1 - A Chinese Odyssey (relevance score: 1)
2 - A Chinese Odyssey Side Story (relevance score: 0.67)
3 - Analysis of A Chinese Odyssey (relevance score: 0.67)
5 - Exclusive Analysis of Fantasy Westward Journey (relevance score: 0.5)
4 - Journey to the West: Conquering the Demons (relevance score: 0.33)
III. Setting up the environment with Docker
3.1 Build the Elasticsearch environment
1) Download the image:
docker pull elasticsearch:7.4.2
2) Create an instance
Map the configuration files:
# create the config mapping folder
mkdir -p /mydata/elasticsearch/config
# create the data mapping folder
mkdir -p /mydata/elasticsearch/data
# make the folders readable and writable by any user
chmod 777 /mydata/elasticsearch -R
# configure http.host
echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml
Start the elasticsearch container
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e ES_JAVA_OPTS="-Xms64m -Xmx128m" \
  -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  -d elasticsearch:7.4.2
Access the elasticsearch service
Visit: http://192.168.56.10:9200
Returned response:
{
  "name": "8448ec5f3312",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "xC72O3nKSjWavYZ-EPt9Gw",
  "version": {
    "number": "7.4.2",
    "build_flavor": "default",
    "build_type": "docker",
    "build_hash": "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
    "build_date": "2019-10-28T20:40:44.881551Z",
    "build_snapshot": false,
    "lucene_version": "8.2.0",
    "minimum_wire_compatibility_version": "6.8.0",
    "minimum_index_compatibility_version": "6.0.0-beta1"
  },
  "tagline": "You Know, for Search"
}
Visit http://192.168.56.10:9200/_cat/nodes to view node information:
127.0.0.1 62 90 0 0.06 0.10 0.05 dilm * 8448ec5f3312
3.2 Build the Kibana environment
docker pull kibana:7.4.2
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.56.10:9200 -p 5601:5601 -d kibana:7.4.2
Visit kibana: http://192.168.56.10:5601/
IV. Basic retrieval
4.1 _cat usage
GET /_cat/nodes: view all nodes
GET /_cat/health: view ES health status
GET /_cat/master: view the master node
GET /_cat/indices: view all indices
A summary of all _cat queries:
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates
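Note (not in the original article): these _cat endpoints return plain-text tables, and appending the standard ?v parameter adds column headers, which makes the output easier to read. For example:
GET /_cat/indices?v
GET /_cat/nodes?v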
4.2 Index a document (save)
Example: save a record with id 1 under the external type of the member index.
Use Kibana's Dev Tools to create
PUT member/external/1
{
  "name": "jay huang"
}
Response:
{
  "_index": "member",     // which index the record is in
  "_type": "external",    // which type it is in
  "_id": "2",             // record id
  "_version": 7,          // version number
  "result": "updated",    // operation type
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 9,
  "_primary_term": 1
}
You can also send a request through the Postman tool to create a record.
Create a record
Note:
Both PUT and POST can create records.
POST: if no id is specified, an id is generated automatically. If an id is specified and the record already exists, the record is modified and a new version number is assigned.
PUT: an id must be specified. If the record does not exist, it is added; if it exists, it is updated.
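A small sketch illustrating the difference (the document bodies below are made up for illustration):
// POST without an id: a new document is created and an id is generated automatically
POST member/external
{
  "name": "auto id demo"
}

// PUT requires an id: creates the document with id 3 if it does not exist, otherwise updates it
PUT member/external/3
{
  "name": "put demo"
}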
4.3 Query documents
Request: http://192.168.56.10:9200/member/external/2
Response:
{
  "_index": "member",    // which index the record is in
  "_type": "external",   // which type it is in
  "_id": "2",            // record id
  "_version": 7,         // version number
  "_seq_no": 9,          // concurrency control field; incremented on every update, used for optimistic locking
  "_primary_term": 1,    // similar; changes when the primary shard is reassigned, for example after a restart
  "found": true,
  "_source": {           // the actual content
    "name": "jay huang"
  }
}
_seq_no as an optimistic lock
Each time the data is updated, _seq_no is incremented by 1, so it can be used for concurrency control.
When updating a record, if the given _seq_no does not match the current value, the record has already been updated at least once in the meantime, and this update is rejected.
The usage is as follows:
Request to update record 2: http://192.168.56.10:9200/member/external/2?if_seq_no=9&if_primary_term=1
Returned result:
{
  "_index": "member",
  "_type": "external",
  "_id": "2",
  "_version": 9,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 11,
  "_primary_term": 1
}
The data is updated only when _seq_no matches the given value and _primary_term = 1. Once the request has been executed, the document's _seq_no changes, so repeating the same request reports a version conflict error:
{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[2]: version conflict, required seqNo [10], primary term [1]. current document has seqNo [11] and primary term [1]",
        "index_uuid": "CX6uwPBKRByWpuym9rMuxQ",
        "shard": "0",
        "index": "member"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[2]: version conflict, required seqNo [10], primary term [1]. current document has seqNo [11] and primary term [1]",
    "index_uuid": "CX6uwPBKRByWpuym9rMuxQ",
    "shard": "0",
    "index": "member"
  },
  "status": 409
}
4.4 Update documents
Usage
A POST update that goes through _update: if the original data has not changed, the result field in the response returns noop (no operation is performed) and the version number does not change.
The request data needs to be wrapped in a doc object in the request body.
POST request: http://192.168.56.10:9200/member/external/2/_update
{
  "doc": {
    "name": "jay huang"
  }
}
Response:
{
  "_index": "member",
  "_type": "external",
  "_id": "2",
  "_version": 12,
  "result": "noop",
  "_shards": {
    "total": 0,
    "successful": 0,
    "failed": 0
  },
  "_seq_no": 14,
  "_primary_term": 1
}
Usage scenario: for high-concurrency updates, it is recommended not to use _update; for scenarios with many concurrent queries and only occasional updates, _update works well.
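For comparison, a sketch (not from the original article) of updating without _update by sending the document directly: this always re-indexes the document and increases the version number even if the content is identical, whereas _update would return noop in that case.
POST member/external/2
{
  "name": "jay huang"
}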
Add attributes when updating
Request: add an age attribute.
POST http://192.168.56.10:9200/member/external/2/_update
Request body:
{
  "doc": {
    "name": "jay huang",
    "age": 18
  }
}
Response:
{
  "_index": "member",
  "_type": "external",
  "_id": "2",
  "_version": 13,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 15,
  "_primary_term": 1
}
4.5 Delete documents and indexes
Delete document
DELETE request: http://192.168.56.10:9200/member/external/2
Response:
{
  "_index": "member",
  "_type": "external",
  "_id": "2",
  "_version": 2,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}
Delete index
DELETE request: http://192.168.56.10:9200/member
Response:
{
  "acknowledged": true
}
There is no API for deleting a type.
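As a possible workaround (a sketch, not from the original article), _delete_by_query can remove all documents matching a query while keeping the index itself:
POST member/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}
This deletes every document in the member index; narrow the query if only part of the data should be removed.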
4.6 Import data in bulk
Using Kibana's Dev Tools, enter the following statement:
POST /member/external/_bulk
{"index": {"_id": "1"}}
{"name": "Jay Huang"}
{"index": {"_id": "2"}}
{"name": "Jackson Huang"}
The execution result is shown in the following figure:
Copy official sample data
https://raw.githubusercontent.com/elastic/elasticsearch/master/docs/src/test/resources/accounts.json
Execute the script in Kibana:
POST /bank/account/_bulk
{"index": {"_id": "1"}}
{"account_number": 1, "balance": 39225, "firstname": "Amber", "lastname": "Duke", "age": 32, "gender": "M", "address": "880 Holmes Lane", "employer": "Pyrami", "email": "amberduke@pyrami.com", "city": "Brogan", "state": "IL"}
{"index": {"_id": "6"}}
...
Execution results of batch insertion of sample data
View all indexes
View all indexes
You can see from the returned results that the bank index has 1,000 documents, taking up 440.2 KB of storage space.
V. Advanced retrieval
5.1 Two query methods
5.1.1 Passing parameters in the URL
GET bank/_search?q=*&sort=account_number:asc
This queries all the data: 1,000 records in total, taking 1 ms; only 10 records are shown because of ES pagination.
URL followed by parameters
Attribute value description:
took - how long (in milliseconds) ES took to execute the search
timed_out - whether the search timed out
_shards - how many shards were searched, with counts of successful / failed / skipped shards
max_score - the highest score
hits.total.value - the number of records hit
hits.sort - the sort key of a result (absent when sorting by score)
hits._score - the relevance score
Reference document: https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-search.html
5.1.2 URL plus request body for retrieval (QueryDSL)
Write query conditions in the request body
Syntax:
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "account_number": "asc"
    }
  ]
}
Example: query everything, sorting first by account_number in ascending order and then by balance in descending order.
URL plus request body for retrieval
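The original article shows this query only as a screenshot; a sketch of what it likely looks like, assuming the bank sample index used above:
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "account_number": "asc" },
    { "balance": "desc" }
  ]
}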
5.2 QueryDSL queries in detail
DSL: Domain Specific Language
5.2.1 Match all (match_all)
Example: query all records, sort by balance in descending order, return only records 11 to 20, and show only the balance and firstname fields.
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "balance": {
        "order": "desc"
      }
    }
  ],
  "from": 10,
  "size": 10,
  "_source": ["balance", "firstname"]
}
5.2.2 Match query (match)
Basic types (non-string): exact match.
GET bank/_search
{
  "query": {
    "match": {
      "account_number": "30"
    }
  }
}
Strings: full-text search.
GET bank/_search
{
  "query": {
    "match": {
      "address": "mill road"
    }
  }
}
String full-text retrieval
Full-text search results are sorted by relevance score; the search terms are segmented into words before matching.
This queries all records whose address contains mill or road or mill road, and gives each a relevance score.
32 records were found. The highest-scoring record has address "990 Mill Road" with a score of 8.926605; the record with address "198 Mill Lane" scores only 5.4032025 because only the word Mill matched.
5.2.3 Phrase matching (match_phrase)
The value to be matched is treated as a whole phrase (no word segmentation).
GET bank/_search
{
  "query": {
    "match_phrase": {
      "address": "mill road"
    }
  }
}
This finds all records whose address contains the phrase mill road, and gives each a relevance score.
5.2.4 Multi-field matching (multi_match)
GET bank/_search
{
  "query": {
    "multi_match": {
      "query": "mill land",
      "fields": ["state", "address"]
    }
  }
}
The query in multi_match is also segmented into words.
This queries records whose state or address contains mill or land.
5.2.5 Compound query (bool)
Compound statements can merge any other query statements, including compound statements. Compound statements can be nested with each other to express complex logic.
must, must_not, and should can be used in combination:
must: the specified conditions must be met. (Affects the relevance score.)
must_not: the specified conditions must not be met. (Does not affect the relevance score.)
should: records that satisfy the should conditions get a higher score; records that do not satisfy them can still be returned. (Affects the relevance score.)
Example: query records whose address contains mill, whose gender is M, and whose age is not 28, and rank records whose firstname contains Winnie higher.
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "gender": "M" } }
      ],
      "must_not": [
        { "match": { "age": "28" } }
      ],
      "should": [
        { "match": { "firstname": "Winnie" } }
      ]
    }
  }
}
5.2.6 Filtering (filter)
Filters do not affect the relevance score; they only return the records that meet the filter conditions.
filter is used inside bool.
GET bank/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "age": {
              "gte": 18,
              "lte": 40
            }
          }
        }
      ]
    }
  }
}
5.2.7 Term query (term)
Matches the exact value of an attribute.
Use match for full-text fields; use term for exact matching on non-text fields.
keyword: exact text matching (the whole value must match).
match_phrase: text phrase matching.
Exact match on a non-text field:
GET bank/_search
{
  "query": {
    "term": {
      "age": "20"
    }
  }
}
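For comparison, a hedged sketch of exact matching on a text field: the bank sample data is dynamically mapped, so address also gets an address.keyword sub-field by default, and matching against it only hits documents whose whole address equals the given string (the value "990 Mill Road" is taken from the earlier example).
GET bank/_search
{
  "query": {
    "match": {
      "address.keyword": "990 Mill Road"
    }
  }
}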
5.2.8 Aggregations
Aggregation: grouping data and extracting statistics from it, similar to SQL GROUP BY and SQL aggregate functions.
Elasticsearch can return hit results and multiple aggregate results at the same time.
Aggregation syntax:
"aggregations": {
  "<aggregation_name>": {
    "<aggregation_type>": {
      <aggregation_body>
    }
    [, "meta": { [<meta_data_body>] } ]?
    [, "aggregations": { [<sub_aggregation>]+ } ]?
  }
  [, "<aggregation_name_2>": { ... } ]*
}
Example 1: search for all people whose address contains mill, and return their age distribution (top 10), average age, and average salary.
GET bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  },
  "aggs": {
    "ageAggr": {
      "terms": {
        "field": "age",
        "size": 10
      }
    },
    "ageAvg": {
      "avg": {
        "field": "age"
      }
    },
    "balanceAvg": {
      "avg": {
        "field": "balance"
      }
    }
  }
}
The search results are as follows:
The hits records are returned, along with the three aggregation results: the average age is 34 and the average salary is 25208.0. The age distribution: two people are 38, one is 28, and one is 32.
Example 1
If you do not want the hits results to be returned, you can set "size": 0 at the end.
GET bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  },
  "aggs": {
    "ageAggr": {
      "terms": {
        "field": "age",
        "size": 10
      }
    }
  },
  "size": 0
}
Example 2: aggregate by age and query the average salary for those age groups
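The original article shows this query only as a screenshot; a sketch of what it likely looks like, following the same pattern as Example 3 below:
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAggr": {
      "terms": {
        "field": "age",
        "size": 100
      },
      "aggs": {
        "balanceAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}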
From the results, we can see that there are 61 people aged 31, with an average salary of 28312.9; the aggregation results for the other ages are similar.
Example 2
Example 3: group by age, then group each age group by gender, and then query the average salary within those groups.
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAggr": {
      "terms": {
        "field": "age",
        "size": 10
      },
      "aggs": {
        "genderAggr": {
          "terms": {
            "field": "gender.keyword",
            "size": 10
          },
          "aggs": {
            "balanceAvg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}
From the results we can see that there are 61 people aged 31; among them 35 are of gender M with an average salary of 29565.6, and 26 are of gender F with an average salary of 26626.6. The aggregation results for the other ages are similar.
Aggregate result
5.2.9 Mapping
Mapping is used to define a document and how the fields (properties) it contains are stored and indexed. For example, mapping defines:
which string properties should be treated as full-text fields;
which properties contain numbers, dates, or geolocations;
whether all properties of the document can be indexed (the _all configuration);
the format of dates;
custom mapping rules for dynamically added properties.
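As a small illustration of several of these points, a hedged sketch of a mapping that declares a full-text field, a numeric field, and a date field with a custom format (the index name and field names are made up for this example):
PUT /my-test-index
{
  "mappings": {
    "properties": {
      "title":      { "type": "text" },
      "view_count": { "type": "integer" },
      "created_at": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" }
    }
  }
}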
Elasticsearch 7 removes the concept of type:
In a relational database, two tables are independent of each other even if they have columns with the same name, but this is not the case in ES. Elasticsearch is a search engine built on Lucene, and fields with the same name under different types in ES are stored in the same way in Lucene.
To distinguish fields with the same name under different types, Lucene would have to handle the conflicts, which reduces retrieval efficiency.
ES 7.x: the type parameter in the URL is optional.
ES 8.x: the type parameter in the URL is no longer supported.
For all types, please refer to the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html
Query the mapping of an index
For example, query the mapping of the my-index index:
GET /my-index/_mapping
Returned result:
{
  "my-index": {
    "mappings": {
      "properties": {
        "age": {
          "type": "integer"
        },
        "email": {
          "type": "keyword"
        },
        "employee-id": {
          "type": "keyword",
          "index": false
        },
        "name": {
          "type": "text"
        }
      }
    }
  }
}
Create an index and specify a mapping
For example, create a my-index index with three fields, age, email, and name, whose types are integer, keyword, and text respectively.
PUT /my-index
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "email": { "type": "keyword" },
      "name": { "type": "text" }
    }
  }
}
Returned result:
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "my-index"
}
Add a new field mapping
For example, add an employee-id field to the my-index index and specify its type as keyword:
PUT /my-index/_mapping
{
  "properties": {
    "employee-id": {
      "type": "keyword",
      "index": false
    }
  }
}
Update Mapping
We cannot update an existing mapping field; we have to create a new index and migrate the data.
Data migration:
POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}
VI. Chinese word segmentation
ES has many built-in tokenizers, but they are not friendly to Chinese, so we need a third-party Chinese word segmentation plugin.
6.1 The principle of word segmentation in ES
6.1.1 The concept of a tokenizer in ES
A tokenizer in ES receives a stream of characters, splits it into individual tokens, and outputs a token stream.
ES provides many built-in tokenizers, which can also be used to build custom analyzers.
6.1.2 The principle of the standard tokenizer
For example, the standard tokenizer splits text on whitespace. A tokenizer is also responsible for recording the order or position of each term (used for phrase queries and word-proximity queries) and the character offsets of each word (used for highlighting matched content).
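The _analyze examples in the next subsections only list the resulting tokens; the full _analyze response also carries this position and offset information for every token, roughly in the following shape (the values here are illustrative):
{
  "tokens": [
    {
      "token": "do",
      "start_offset": 0,
      "end_offset": 2,
      "type": "<ALPHANUM>",
      "position": 0
    }
    // ... one entry per token
  ]
}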
6.1.3 Example of English and punctuation word segmentation
An example query is as follows:
POST _analyze
{
  "analyzer": "standard",
  "text": "Do you know why I want to study ELK? 2 333..."
}
Query results:
do, you, know, why, i, want, to, study, elk, 2, 333
As can be seen from the query results:
(1) Punctuation marks are not turned into tokens.
(2) Numbers are kept and tokenized.
English sentence segmentation
6.1.4 Example of Chinese word segmentation
However, this tokenizer is not friendly to Chinese: it splits Chinese text into individual characters. For example, in the following query the phrase 悟空聊架构 ("Wukong Chats Architecture") is split into the single characters 悟, 空, 聊, 架, 构, while the expected segmentation is 悟空 (Wukong), 聊 (chats), 架构 (architecture).
POST _analyze
{
  "analyzer": "standard",
  "text": "悟空聊架构"
}
Chinese word segmentation of 悟空聊架构
We can install the ik tokenizer to get friendlier Chinese word segmentation.
6.2 Install the ik tokenizer
6.2.1 ik tokenizer address
ik tokenizer address:
https://github.com/medcl/elasticsearch-analysis-ik/releases
Check the ES version first. The version I installed is 7.4.2, so we also choose 7.4.2 when we install the ik participle.
Visit http://192.168.56.10:9200/
{
  "name": "8448ec5f3312",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "xC72O3nKSjWavYZ-EPt9Gw",
  "version": {
    "number": "7.4.2",
    "build_flavor": "default",
    "build_type": "docker",
    "build_hash": "2f90bbf7b93631e52bafb59b3b049cb44ec25e96",
    "build_date": "2019-10-28T20:40:44.881551Z",
    "build_snapshot": false,
    "lucene_version": "8.2.0",
    "minimum_wire_compatibility_version": "6.8.0",
    "minimum_index_compatibility_version": "6.0.0-beta1"
  },
  "tagline": "You Know, for Search"
}
6.2.2 How to install the ik tokenizer
6.2.2.1 Method 1: install the ik tokenizer inside the container
Enter the ES container (it was started with the name elasticsearch):
docker exec -it elasticsearch /bin/bash
Download the ik tokenizer package:
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
Unzip the ik package:
unzip elasticsearch-analysis-ik-7.4.2.zip
Delete the downloaded package:
rm -rf *.zip
6.2.2.2 Method 2: install the ik tokenizer in the mapped folder
Go to the mapped folder:
cd /mydata/elasticsearch/plugins
Download the installation package:
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
Unzip the ik package:
unzip elasticsearch-analysis-ik-7.4.2.zip
Delete the downloaded package:
rm -rf *.zip
6.2.2.3 Method 3: upload the package to the mapped directory with Xftp
First use the XShell tool to connect to the virtual machine (you can refer to the earlier article "02. Quickly build a Linux environment, essential for operation and maintenance"), and then use Xftp to copy the downloaded package to the virtual machine.
6.3 Unzip the ik tokenizer inside the container
If the unzip tool is not installed, install it first:
apt install unzip
Unzip the ik tokenizer into an ik folder in the current directory.
Command format: unzip <archive> -d <target directory>
Example:
unzip ELK-IKv7.4.2.zip -d ./ik
Unzip the ik tokenizer
Modify the folder permissions to be readable and writable:
chmod -R 777 ik/
Delete the ik tokenizer package:
rm ELK-IKv7.4.2.zip
6.4 Check the ik tokenizer installation
Enter the container:
docker exec -it elasticsearch /bin/bash
List the Elasticsearch plugins:
elasticsearch-plugin list
The result is as follows, which shows that the ik tokenizer is installed. Easy, isn't it?
ik
Then exit and restart the Elasticsearch container:
exit
docker restart elasticsearch
6.5 Use the ik Chinese tokenizer
The ik tokenizer has two modes:
Smart segmentation mode (ik_smart)
Maximum-combination segmentation mode (ik_max_word)
Let's look at the smart segmentation mode first. For example, segmenting 一颗小星星 ("a little star") yields two words: 一颗 (one) and 小星星 (little star).
We enter the following query in Dev Tools Console
POST _analyze
{
  "analyzer": "ik_smart",
  "text": "一颗小星星"
}
The following result is obtained: the phrase is segmented into 一颗 and 小星星.
Segmentation result for 一颗小星星 (ik_smart)
Now let's look at the maximum-combination segmentation mode. Enter the following query statement:
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "一颗小星星"
}
一颗小星星 is divided into six words: 一颗, 一, 颗, 小星星, 小星, and 星星.
Segmentation result for 一颗小星星 (ik_max_word)
Let's look at another Chinese phrase. For example, searching for 悟空哥聊架构 ("Brother Wukong chats architecture"), we expect three words: 悟空哥, 聊, and 架构.
The actual result is four words: 悟, 空哥, 聊, 架构. The ik tokenizer does not know the word 悟空哥 (Brother Wukong) and treats 空哥 as a word. So we need to tell the ik tokenizer that 悟空哥 is a single word that should not be split. How do we do that?
Segmentation result for 悟空哥聊架构
6.6 Custom word segmentation dictionary
6.6.1 Scheme for a custom dictionary
Scheme
Create a new dictionary file and specify its path in the configuration file of the ik tokenizer. The path can point to a local file or to a file on a remote server. Here we use the remote-server option because it supports hot updates: when the server file is updated, the ik tokenizer reloads the dictionary.
Modify the configuration file
The path of the ik tokenizer's configuration file inside the container is:
/usr/share/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
You can also modify it through the mapped file; the mapped path is:
/mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
Edit the configuration file:
vim /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
The contents of the configuration file are as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- local extension dictionaries -->
    <entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
    <!-- local extension stop-word dictionary -->
    <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
    <!-- remote extension dictionary -->
    <entry key="remote_ext_dict">location</entry>
    <!-- remote extension stop-word dictionary -->
    <entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
</properties>
Modify the property value of the configuration remote_ext_dict to specify the path to a remote Web site file, such as http://www.xxx.com/ikwords.text.
Here we can build our own nginx environment, and then put ikwords.text into the nginx root directory.
6.6.2 Build an nginx environment
Plan: first pull the nginx image and start an nginx container, copy its configuration files out to the mapped directory, delete the original container, and then start a new nginx container with the folders mapped.
Install the nginx environment with a docker container:
docker run -p 80:80 --name nginx -d nginx:1.10
Copy the configuration file of the nginx container to the conf folder of the mydata directory
cd /mydata
docker container cp nginx:/etc/nginx ./conf
Create a nginx directory in the mydata directory
mkdir nginx
Move the conf folder to the nginx mapping folder
mv conf nginx/
Stop and delete the original nginx container:
docker stop nginx
docker rm nginx
Start a new container
docker run -p 80:80 --name nginx \
  -v /mydata/nginx/html:/usr/share/nginx/html \
  -v /mydata/nginx/logs:/var/log/nginx \
  -v /mydata/nginx/conf:/etc/nginx \
  -d nginx:1.10
Access the nginx service
192.168.56.10
A 403 Forbidden page showing nginx/1.10.3 indicates that the nginx service started normally; the error appears only because there is no file in the nginx html directory yet.
Create a new html file in the nginx html directory:
cd /mydata/nginx/html
vim index.html
Write the following content into index.html:
hello passjava
Access the nginx service again
The browser prints hello passjava, which shows that pages served by nginx can be accessed without problems.
Create the ik word segmentation dictionary file:
cd /mydata/nginx/html
mkdir ik
cd ik
vim ik.txt
Write 悟空哥 (Brother Wukong) into the file and save it.
Access the dictionary file:
http://192.168.56.10/ik/ik.txt
The browser outputs some garbled characters, which can be ignored for now; it shows that the dictionary file is accessible.
Modify the ik tokenizer configuration:
cd /mydata/elasticsearch/plugins/ik/config
vim IKAnalyzer.cfg.xml
Modify ik word splitter configuration
Restart the Elasticsearch container and set it to start automatically whenever the machine reboots:
docker restart elasticsearch
docker update elasticsearch --restart=always
Query the result of word segmentation again
We can see that 悟空哥聊架构 is now segmented into three words: 悟空哥, 聊, and 架构, which shows that the custom dictionary entry 悟空哥 has taken effect.
Segmentation result after adding the custom dictionary